In Spark, how do you write a file to Hadoop without an RDD?

uwopmtnx posted on 2021-06-02 in Hadoop
Follow (0) | Answers (2) | Views (261)

Spark RDDs have a saveAsTextFile method. But how do I open a file and write a simple string to Hadoop storage?

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val sparkConf: SparkConf = new SparkConf().setAppName("example")
val sc: SparkContext = new SparkContext(sparkConf)

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "...")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "...")

val lines: RDD[String] = sc.textFile("s3n://your-output-bucket/lines.txt")
val lengths: RDD[Int] = lines.map(_.length)
lengths.saveAsTextFile("s3n://your-output-bucket/lengths.txt")

val numLines: Long = lines.count
val resultString: String = s"numLines: $numLines"
// how to save resultString to "s3n://your-output-bucket/result.txt"

sc.stop()

1wnzp6jl #1

Assuming you have a SparkContext bound to sc:

import java.io.{BufferedWriter, OutputStreamWriter}

val hdfs = org.apache.hadoop.fs.FileSystem.get(sc.hadoopConfiguration)

val outputPath =
  new org.apache.hadoop.fs.Path("hdfs://localhost:9000/tmp/hello.txt")

val overwrite = true

val bw = 
  new BufferedWriter(new OutputStreamWriter(hdfs.create(outputPath, overwrite)))
bw.write("Hello, world")
bw.close()

Note: for simplicity, this code does not close the writer if an exception occurs.
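If you need the writer closed even when a write fails, a minimal sketch using try/finally, reusing the imports, hdfs handle, and outputPath from above:

val writer =
  new BufferedWriter(new OutputStreamWriter(hdfs.create(outputPath, overwrite)))
try {
  writer.write("Hello, world")
} finally {
  // close() also flushes buffered bytes, so it must run even on failure
  writer.close()
}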


qrjkbowd #2

Why not just do this?

val strings = sc.parallelize(Seq("hello", "there"), <numPartitions>)
strings.saveAsTextFile("<path-to-file>")
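Applied to the question's resultString, it would look something like the following. Note that saveAsTextFile creates a directory of part files at that path, not a single plain file:

// single partition, so the output is one part-00000 file under result.txt/
sc.parallelize(Seq(resultString), 1)
  .saveAsTextFile("s3n://your-output-bucket/result.txt")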

Otherwise, you may need to look at the Hadoop API to write a file and invoke that code explicitly from the driver.
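For that driver-side route, a minimal sketch using the Hadoop FileSystem API directly, assuming the s3n credentials from the question are already set on sc.hadoopConfiguration; passing the target URI to FileSystem.get resolves the s3n filesystem rather than the cluster default:

import java.net.URI
import java.nio.charset.StandardCharsets

val resultPath = new org.apache.hadoop.fs.Path("s3n://your-output-bucket/result.txt")
// look up the FileSystem implementation that handles the s3n:// scheme
val fs = org.apache.hadoop.fs.FileSystem.get(new URI(resultPath.toString), sc.hadoopConfiguration)
val out = fs.create(resultPath, true) // overwrite if the file already exists
try {
  out.write(resultString.getBytes(StandardCharsets.UTF_8))
} finally {
  out.close()
}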
