Checking files in subdirectories using Scala

Posted by j91ykkif on 2021-05-31 in Hadoop


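Hi, I have a folder structure in HDFS at /user/hdfs/001/p1, /user/hdfs/002/p2, and /user/hdfs/003/p3. The folders p1, p2, and p3 contain .csv files, and I want to get the names and paths of all files in those folders (p1, p2, p3). I wrote the following code: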

import org.apache.hadoop.fs.{FileSystem, Path}

// `sparksession` is an existing SparkSession
val fs = FileSystem.get(sparksession.sparkContext.hadoopConfiguration)
val path = "/user/hdfs/"

// First-level directories only: 001, 002, 003
val folders = fs.listStatus(new Path(path)).filter(_.isDirectory).map(_.getPath).toList

for (folder <- folders) {
  // Lists only the files directly inside 001, 002, 003;
  // it does not descend into p1, p2, p3
  val files = fs.listStatus(folder).filter(_.isFile)
  for (status <- files) {
    val filePath = status.getPath.toString
    val fileName = status.getPath.getName
    val fileSize: Long = status.getLen
  }
}

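But this code only checks down to the folders 001, 002, and 003 (for example /user/hdfs/001). Is there a way to descend into folders such as p1, p2, and p3 and get the file names from there? Could someone please guide me on how to modify my code?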
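
One possible approach, offered as a minimal sketch rather than a definitive answer: Hadoop's `FileSystem.listFiles(path, recursive)` walks a directory tree and returns only files, so there is no need to loop over each level by hand. The helper name `listFilesRecursively` and the tuple layout (path, name, size) are assumptions made for illustration; they are not part of the original post.

```scala
import org.apache.hadoop.fs.{FileSystem, LocatedFileStatus, Path, RemoteIterator}
import scala.collection.mutable.ListBuffer

// Hypothetical helper (not from the original post): collects
// (full path, file name, size in bytes) for every file under `root`,
// descending into all subdirectories.
def listFilesRecursively(fs: FileSystem, root: Path): List[(String, String, Long)] = {
  // recursive = true makes listFiles return files at any depth, never directories
  val it: RemoteIterator[LocatedFileStatus] = fs.listFiles(root, true)
  val found = ListBuffer.empty[(String, String, Long)]
  while (it.hasNext) {
    val status = it.next()
    found += ((status.getPath.toString, status.getPath.getName, status.getLen))
  }
  found.toList
}

// Usage, assuming an existing SparkSession named `sparksession` as in the question:
//   val fs = FileSystem.get(sparksession.sparkContext.hadoopConfiguration)
//   val csvFiles = listFilesRecursively(fs, new Path("/user/hdfs/"))
//     .filter { case (_, name, _) => name.endsWith(".csv") }
```

If the depth is always fixed, as in /user/hdfs/001/p1, a glob such as `fs.globStatus(new Path("/user/hdfs/*/*/*.csv"))` may be a simpler alternative, since it returns the matching `FileStatus` entries directly.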

hadoop hdfs scala

Source: https://stackoverflow.com/questions/62338652/checking-files-in-subdirectories-using-scala