What is Apache Spark's stringWritableConverter used for?

2skhul33 · asked 6 months ago · in Apache

In SparkContext.scala, there are two implicit conversions defined from primitive types such as Int, Long and String to their Writable counterparts.
Taking String as an example, there are two definitions, as follows:

implicit def stringWritableConverter(): WritableConverter[String] =
    simpleWritableConverter[String, Text](_.toString)


implicit val stringWritableConverterFn: () => WritableConverter[String] =
    () => simpleWritableConverter[String, Text](_.toString)


I would like to ask how these methods and values are used. They are meant for implicit type conversion, yet they take no input parameters (stringWritableConverter takes no arguments, and neither does stringWritableConverterFn).
How do these implicits actually get resolved into a WritableConverter?
Since they take no parameters, I just don't see how or when these conversions are applied.
Thanks.

Update

To illustrate the problem better in code, I wrote the following simplified version of the Spark classes:

import scala.reflect.{ClassTag, classTag}

trait Writable

class IntWritable(value: Int) extends Writable

class Text(value: String) extends Writable

class WritableConverter[T](
                            val writableClass: ClassTag[T] => Class[_ <: Writable],
                            val convert: Writable => T)

object implicit_test {

  private def simpleWritableConverter[T, W <: Writable : ClassTag](convert: W => T)
  : WritableConverter[T] = {
    println("Hello, simpleWritableConverter")
    val wClass = classTag[W].runtimeClass.asInstanceOf[Class[W]]
    new WritableConverter[T](_ => wClass, x => convert(x.asInstanceOf[W]))
  }

  implicit val stringWritableConverterFn: () => WritableConverter[String] = {
    println("Hello  stringWritableConverterFn")
    () => simpleWritableConverter[String, Text](_.toString)
  }

  implicit def stringWritableConverter(): WritableConverter[String] = {
    println("Hello  stringWritableConverter")
    simpleWritableConverter[String, Text](_.toString)
  }

  def do_convert(a: String) = println(s"a is $a")

  def main(args: Array[String]): Unit = {

    //Compile Error: Required String, Found Text
    do_convert(new Text("abc"))
  }

}


When I call do_convert(new Text("abc")), the compiler complains Required String, Found Text, which means the implicit conversion does not kick in.


fcwjkofz1#

I don't understand the full story 100%, but I think I can offer some useful insights. I will use the Spark 3.4.1 source code (and the corresponding Hadoop version 3.3.4) for the explanation.
The whole story looks roughly like this:

/**
 * A class encapsulating how to convert some type `T` from `Writable`. It stores both the `Writable`
 * class corresponding to `T` (e.g. `IntWritable` for `Int`) and a function for doing the
 * conversion.
 * The getter for the writable class takes a `ClassTag[T]` in case this is a generic object
 * that doesn't know the type of `T` when it is created. This sounds strange but is necessary to
 * support converting subclasses of `Writable` to themselves (`writableWritableConverter()`).
 */
private[spark] class WritableConverter[T](
    val writableClass: ClassTag[T] => Class[_ <: Writable],
    val convert: Writable => T)
  extends Serializable

object WritableConverter {

  // Helper objects for converting common types to Writable
  private[spark] def simpleWritableConverter[T, W <: Writable: ClassTag](convert: W => T)
  : WritableConverter[T] = {
    val wClass = classTag[W].runtimeClass.asInstanceOf[Class[W]]
    new WritableConverter[T](_ => wClass, x => convert(x.asInstanceOf[W]))
  }

  // The following implicit functions were in SparkContext before 1.3 and users had to
  // `import SparkContext._` to enable them. Now we move them here to make the compiler find
  // them automatically. However, we still keep the old functions in SparkContext for backward
  // compatibility and forward to the following functions directly.

  // The following implicit declarations have been added on top of the very similar ones
  // below in order to enable compatibility with Scala 2.12. Scala 2.12 deprecates eta
  // expansion of zero-arg methods and thus won't match a no-arg method where it expects
  // an implicit that is a function of no args.

  ...

  implicit val stringWritableConverterFn: () => WritableConverter[String] =
    () => simpleWritableConverter[String, Text](_.toString)
  
  ...

  // These implicits remain included for backwards-compatibility. They fulfill the
  // same role as those above.

  ...

  implicit def stringWritableConverter(): WritableConverter[String] =
    simpleWritableConverter[String, Text](_.toString)

  ...
}


What are these Writable objects?

I'll quote part of this great answer here:
We already know that in a distributed computing environment, data needs to be transferred between nodes. This requires serializing and deserializing the data to convert it from a structured format into a byte stream and vice versa. Hadoop therefore uses a simple and efficient serialization protocol to serialize data between the map and reduce phases, and these are called Writable(s). Some examples of Writables, as already mentioned, are IntWritable, LongWritable, BooleanWritable and FloatWritable.
So we know that when working with Hadoop, we will run into a bunch of these Writable objects. In order to do our calculations with primitive types, we need a way to go from a Writable to the corresponding primitive.
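For intuition, here is a minimal hand-written sketch (my own example, not from the Spark source) of going from a Writable back to a primitive using the Hadoop classes directly:

import org.apache.hadoop.io.{IntWritable, Text}

// Converting Writables back to primitives by hand:
val i: Int    = new IntWritable(42).get() // IntWritable -> Int
val s: String = new Text("abc").toString  // Text -> String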

Why do we seem to have two stringWritableConverters?

If you look closely at the code comments I pasted above, you'll see that stringWritableConverter and stringWritableConverterFn do exactly the same thing (the def variant is only kept for backwards compatibility).
So we can simplify the question: only stringWritableConverterFn matters here.

What happens with this stringWritableConverterFn?

In Scala, implicits are resolved at compile time. So at compile time, the following code will be called:

simpleWritableConverter[String, Text](_.toString)


Pay close attention to the types here, [String, Text]: String is the primitive type, and Text is the Writable (org.apache.hadoop.io.Text).
Let's look at the simpleWritableConverter function on its own:

private[spark] def simpleWritableConverter[T, W <: Writable: ClassTag](convert: W => T)
  : WritableConverter[T] = {
    val wClass = classTag[W].runtimeClass.asInstanceOf[Class[W]]
    new WritableConverter[T](_ => wClass, x => convert(x.asInstanceOf[W]))
  }


So a new WritableConverter object is created, which stores two things:

  • the Writable class
  • a convert function, with signature W => T; here this is the toString method of Hadoop's Text (see the rough expansion sketched below)
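To make that concrete, here is roughly what simpleWritableConverter[String, Text](_.toString) expands to, as a hand-written sketch (stringConverter is my own name, not Spark's):

// Hand-expanded equivalent of simpleWritableConverter[String, Text](_.toString):
val stringConverter: WritableConverter[String] =
  new WritableConverter[String](
    _ => classOf[Text],                              // writableClass: which Writable backs String
    (w: Writable) => w.asInstanceOf[Text].toString)  // convert: Writable => String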

Summary

Writable classes are used all over the place in Hadoop. If we want to retrieve the primitive types they represent, we need a way to convert these classes to their primitive counterparts. That is what these WritableConverter objects are for.
In the end, we get a WritableConverter[String] that contains a way to turn a Text writable into a String primitive via its toString method!
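For context on where these implicits actually get picked up: as far as I can tell, they are consumed as implicit parameters, for example by SparkContext.sequenceFile, whose signature looks roughly like this (abridged):

def sequenceFile[K, V]
    (path: String, minPartitions: Int = defaultMinPartitions)
    (implicit km: ClassTag[K], vm: ClassTag[V],
     kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)]

// Usage: the compiler fills in stringWritableConverterFn for both converter parameters.
val rdd = sc.sequenceFile[String, String]("/path/to/seqfile")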

Reply to your update

If you want to see this converter in action, you can look at a test example:

test("BytesWritable implicit conversion is correct") {
    // Regression test for SPARK-3121
    val bytesWritable = new BytesWritable()
    val inputArray = (1 to 10).map(_.toByte).toArray
    bytesWritable.set(inputArray, 0, 10)
    bytesWritable.set(inputArray, 0, 5)

    val converter = WritableConverter.bytesWritableConverter()
    val byteArray = converter.convert(bytesWritable)
    assert(byteArray.length === 5)

    bytesWritable.set(inputArray, 0, 0)
    val byteArray2 = converter.convert(bytesWritable)
    assert(byteArray2.length === 0)
  }


Here you can see that we create this converter object, which can do the conversion for you through its convert method (from the WritableConverter[T] class at the start of this post).
The difference in your example is that you never call the convert method of the WritableConverter. So Scala does not find any function that converts a Text into a String.
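If you want your simplified example from the update to compile, a minimal sketch is to consume the implicit the way Spark does, i.e. as an implicit parameter, and call convert explicitly (readAs is a made-up helper name, and your simplified Text would need to override toString to return its value, as Hadoop's Text effectively does):

// Hypothetical helper that takes the converter factory as an implicit parameter:
def readAs[T](w: Writable)(implicit makeConverter: () => WritableConverter[T]): T =
  makeConverter().convert(w)

def main(args: Array[String]): Unit = {
  // The compiler supplies stringWritableConverterFn for the implicit parameter:
  do_convert(readAs[String](new Text("abc"))) // prints: a is abc
}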
