Error reading an Excel file with Spark Scala when the file name is passed as a parameter

w3nuxt5m  posted 2021-05-27 in Spark

Can someone help me read an Excel file with the Spark Scala read API? I installed com.crealytics:spark-excel_2.11:0.13.1 (from Maven) on Databricks Runtime 6.5 and 6.6 clusters (Apache Spark 2.4.5, Scala 2.11), but the read only works when the file path is hard-coded.

val df = spark.read
    .format("com.crealytics.spark.excel")
    .option("sheetName", "Listing_Attributed")
    .option("header", "true")
    .option("inferSchema", "false")
    .option("addColorColumns", "true") // Optional, default: false
    .option("badRecordsPath", Vars.rootSourcePath + "BadRecords/" + DataCategory)
    .option("dateFormat", "dd-MON-yy")
    .option("timestampFormat", "MM/dd/yyyy hh:mm:ss")
    .option("ignoreLeadingWhiteSpace",true)
    .option("ignoreTrailingWhiteSpace",true)
    .option("escape"," ")
    .load("/ABC/Test_Filename_6.12.20.xlsx")  // hard-coded path works...
//  .load(filepath)    //Filepath is a parameter and throws error, "java.io.IOException: GC overhead limit exceeded" (edited)
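
For reference, a minimal sketch of how the parameterized path might be wired up in a Databricks notebook. The widget name fileName is an assumption for illustration; only the /ABC/ prefix and the file name come from the original post:

// Minimal sketch, assuming the file name is supplied via a notebook widget.
// The widget name "fileName" is hypothetical.
dbutils.widgets.text("fileName", "Test_Filename_6.12.20.xlsx")
val filepath = "/ABC/" + dbutils.widgets.get("fileName")
println(s"Reading from: $filepath")  // sanity-check the resolved path before reading

val df = spark.read
    .format("com.crealytics.spark.excel")
    .option("sheetName", "Listing_Attributed")
    .option("header", "true")
    .load(filepath)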

js5cn81o 1#

Use .option("location", inputPath) as shown below:

val df = spark.read
        .format("com.crealytics.spark.excel")
        .option("sheetName", "Listing_Attributed")
        .option("header", "true")
        .option("location", inputPath)
        .load()
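
A hedged usage sketch, assuming inputPath is a plain String; the value below simply reuses the path from the question:

val inputPath = "/ABC/Test_Filename_6.12.20.xlsx"  // hypothetical value for illustration

val df = spark.read
        .format("com.crealytics.spark.excel")
        .option("sheetName", "Listing_Attributed")
        .option("header", "true")
        .option("location", inputPath)
        .load()

df.show(5)  // quick check that the sheet loaded

Since the hard-coded .load(path) call in the question already works, spark-excel 0.13.x also accepts the path directly in load(), so .load(inputPath) without the location option should be an equivalent alternative.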
