How to append to text files in HDFS using the Hadoop client in Scala?

Posted by qojgxg4l on 2021-05-29 in Hadoop

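I want to write text files to HDFS. The path each file must be written to is generated dynamically. If the file path (including the file name) is new, the file should be created and the text written to it. If the file path (including the file name) already exists, the string must be appended to the existing file.
I used the following code. Creating a file works fine, but appending text to an existing file fails.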

import java.io.{BufferedWriter, OutputStreamWriter}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// JValue, compact and render appear to come from json4s; Time and
// generateFilePath are defined elsewhere in the asker's project.
def writeJson(uri: String, json: JValue, time: Time): Unit = {
  val path = new Path(generateFilePath(json, time))
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  conf.set("dfs.replication", "1")
  conf.set("dfs.support.append", "true")
  conf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "false")

  // One JSON document per line.
  val message = compact(render(json)) + "\n"
  try {
    val fileSystem = FileSystem.get(conf)
    if (fileSystem.exists(path)) {
      // Existing file: this is the branch that fails with the error below.
      println("File exists.")
      val outputStream = fileSystem.append(path)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Appended to file in path : " + path)
    } else {
      // New path: create the file (overwrite = true) and write the first line.
      println("File does not exist.")
      val outputStream = fileSystem.create(path, true)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Created file in path : " + path)
    }
  } catch {
    case e: Exception =>
      e.printStackTrace()
  }
}

Hadoop version: 2.7.0
Whenever an append has to be performed, the following error is raised:
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException)

hadoop hdfs scala Append

Source: https://stackoverflow.com/questions/34529061/how-to-append-text-files-in-hdfs-using-hadoop-client-using-scala

1 Answer

6jjcrrmo #1

I can see 3 possibilities:
1. Probably the easiest way is to use the hdfs program that is located on your Hadoop cluster, see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html . Or even the WebHDFS REST functionality: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
2. If you don't want to use the hdfs commands, you can use the hadoop-hdfs library: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
3. If you want a clean Scala solution, use Spark: http://spark.apache.org/docs/latest/programming-guide.html or https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter3/save_the_rdd_to_files.html
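
For the library route, a minimal sketch of the create-or-append logic with the Hadoop FileSystem API might look like the following. It is not the answerer's code and it does not claim to fix the ArrayIndexOutOfBoundsException from the question. The object name HdfsAppendSketch, the helper appendOrCreate, the NameNode address hdfs://localhost:9000 and the path /tmp/events/part-0.json are all hypothetical, chosen only for illustration.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsAppendSketch {

  /** Appends `text` to `path` if the file exists, otherwise creates it.
    * (Hypothetical helper, not part of the Hadoop API.)
    */
  def appendOrCreate(fs: FileSystem, path: Path, text: String): Unit = {
    val out =
      if (fs.exists(path)) fs.append(path) // reopen the existing file at its end
      else fs.create(path, false)          // overwrite = false: never clobber a file
    try out.write(text.getBytes(StandardCharsets.UTF_8))
    finally out.close()                    // close even if the write fails
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Assumed NameNode address; replace with the cluster's real fs.defaultFS.
    conf.set("fs.defaultFS", "hdfs://localhost:9000")
    val fs = FileSystem.get(conf)
    try appendOrCreate(fs, new Path("/tmp/events/part-0.json"), "{\"k\":1}\n")
    finally fs.close()
  }
}
```

Note that the exists-then-append check is not atomic: if several writers may target the same path at once, one of them can still fail, so the caller would need its own retry or coordination.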

Answered on 2021-05-30
