Can anyone help me read an Excel file with the Spark Scala read API? I tried installing com.crealytics:spark-excel_2.11:0.13.1 (from Maven) on Databricks Runtime 6.5 and 6.6 clusters (Apache Spark 2.4.5, Scala 2.11), but it only works when the file path is hard-coded.
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("sheetName", "Listing_Attributed")
  .option("header", "true")
  .option("inferSchema", "false")
  .option("addColorColumns", "true") // Optional, default: false
  .option("badRecordsPath", Vars.rootSourcePath + "BadRecords/" + DataCategory)
  .option("dateFormat", "dd-MON-yy")
  .option("timestampFormat", "MM/dd/yyyy hh:mm:ss")
  .option("ignoreLeadingWhiteSpace", true)
  .option("ignoreTrailingWhiteSpace", true)
  .option("escape", " ")
  .load("/ABC/Test_Filename_6.12.20.xlsx") // hard-coded path works...
  // .load(filepath) // filepath is a parameter and throws the error "java.io.IOException: GC overhead limit exceeded"
1 Answer
Use `.option("location", inputPath)`.
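A minimal sketch of what the answer seems to suggest, assuming a `SparkSession` named `spark` and a caller-supplied `inputPath` parameter (the `location` option comes from older spark-excel releases; in 0.13.x the path is normally passed to `.load(path)` directly, so treat this as the answerer's workaround, not the documented API):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: inputPath is a parameter, not a hard-coded literal.
def readExcel(spark: SparkSession, inputPath: String): DataFrame =
  spark.read
    .format("com.crealytics.spark.excel")
    .option("sheetName", "Listing_Attributed")
    .option("header", "true")
    .option("inferSchema", "false")
    .option("location", inputPath) // per the answer; with 0.13.x, .load(inputPath) is the usual route
    .load()
```

If the parameterized path still fails with "GC overhead limit exceeded", it is worth confirming that `inputPath` resolves to exactly the same string as the working hard-coded path before suspecting the reader itself.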