How to read an xlsx file using PySpark without the help of pandas

Posted by ego6inou on 2021-05-27 in Spark

I am using the code below to read an xlsx file on my local PC, but I cannot read the file. I am also using the "com.crealytics.spark.excel" library.

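from pyspark import SparkContext
from pyspark.sql import SQLContext, SparkSession

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "local[*]" uses all available cores; "local[0]" asks for zero threads, which Spark rejects.
spark = SparkSession.builder \
    .appName("test") \
    .master("local[*]") \
    .getOrCreate()

empFile = "C:/Users/Dev/Downloads/SAMPLE.xlsx"

# Option names are kept as posted; the names accepted depend on the spark-excel version.
employeesDF = sqlContext.read.format("com.crealytics.spark.excel") \
    .option("sheetName", "Sheet1") \
    .option("useHeader", "true") \
    .option("treatEmptyValuesAsNulls", "false") \
    .option("inferSchema", "false") \
    .option("location", empFile) \
    .option("addColorColumns", "False") \
    .load()

employeesDF.createOrReplaceTempView("EMP")

expLevel = sqlContext.sql("Select * from EMP")
expLevel.show()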

When I run this code, I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o35.load. : java.lang.NoClassDefFoundError: scala/Product$class
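
A NoClassDefFoundError for scala/Product$class usually points to a Scala binary version mismatch, for example a spark-excel jar built for Scala 2.11 loaded into a Spark build that runs on Scala 2.12. The following is a minimal sketch, not a confirmed fix for this post: it assumes the jar is resolved from Maven through the `spark.jars.packages` setting, and the helper names `scala_binary_version`, `excel_package`, and `build_session` are hypothetical. The package version is left as a parameter because the right value depends on the installed Spark.

```python
def scala_binary_version(full_version: str) -> str:
    """Reduce a full Scala version such as '2.12.10' to its binary version '2.12'."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def excel_package(full_scala_version: str, package_version: str) -> str:
    """Build a Maven coordinate for spark-excel that matches the given Scala version."""
    suffix = scala_binary_version(full_scala_version)
    return f"com.crealytics:spark-excel_{suffix}:{package_version}"


def build_session(package: str):
    """Create a local SparkSession that downloads `package` at startup.

    The setting only takes effect when a new JVM is launched, so this must run
    before any other SparkContext has been created in the process.
    """
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    return (
        SparkSession.builder
        .appName("test")
        .master("local[*]")
        .config("spark.jars.packages", package)
        .getOrCreate()
    )
```

The Scala version of an installation is printed by `spark-submit --version`. With that value, a call such as `build_session(excel_package("2.12.10", "<version>"))` would request the `_2.12` artifact, where `<version>` stands for a spark-excel release published for that Scala version.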

apache-spark pyspark pyspark-dataframes

Source: https://stackoverflow.com/questions/63208607/how-to-read-xlsx-file-using-pyspark-without-help-of-pandas