There are several threads referencing this same problem, but none of them posts a solution. I hit the same error while trying to save data to Hive with PySpark. In my case, though, it only happens through the SparkSession; the same write worked fine with the SQLContext.
>>> csv_data = spark.read.option("inferSchema", "true").\
... format("csv").\
... option("header", "true").\
... load("file:///home/hands-on/playArea/names.csv")
>>> csv_data.printSchema()
root
|-- EmployeeID: integer (nullable = true)
|-- FirstName: string (nullable = true)
|-- Title: string (nullable = true)
|-- State: string (nullable = true)
|-- Laptop: string (nullable = true)
>>> csv_data.write.mode("overwrite").saveAsTable("names")
Traceback (most recent call last):
File "/opt/spark-2.4.5-bin-without-hadoop/python/pyspark/sql/utils.py", line 63, in deco
return f(*a,**kw)
File "/opt/spark-2.4.5-bin-without-hadoop/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o44.saveAsTable.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:192)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:103)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$3.apply(BaseSessionStateBuilder.scala:133)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$3.apply(BaseSessionStateBuilder.scala:133)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:420)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:414)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveExternalCatalog
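The `Caused by` line is the key symptom: `org.apache.spark.sql.hive.HiveExternalCatalog` ships in Spark's `spark-hive` module, and a `bin-without-hadoop` distribution may simply not bundle those jars. As a hedged sketch (not a confirmed fix for this exact setup), a session that needs `saveAsTable` against the Hive metastore is normally built with Hive support explicitly enabled, and that only works when the Hive jars are on the classpath:

```python
# Sketch, assuming a Spark install whose jars/ directory actually
# contains the spark-hive artifacts; on a "without-hadoop" build
# this same call would raise the IllegalArgumentException above.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive")
         .enableHiveSupport()   # requires spark-hive on the classpath
         .getOrCreate())
```

Checking `$SPARK_HOME/jars` for `spark-hive*` jars would tell you whether this distribution can instantiate `HiveExternalCatalog` at all.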
However, the same write works fine with a DataFrame created from an RDD via the `toDF` method:
from pyspark.sql import Row
df_csv = csv_clean.map(lambda p: Row(EmployeeID=int(p[0]), FirstName=p[1], Title=p[2], State=p[3], Laptop=p[4])).toDF()
df_csv.write.mode("overwrite").saveAsTable("names")
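For reference, the question never shows how `csv_clean` was produced. A plausible reconstruction of the per-record mapping that the `Row(...)` lambda performs, written in plain Python so it can be checked without a Spark cluster (the sample data and the parsing are assumptions, matching the schema printed above):

```python
import csv
import io

# Hypothetical sample matching the names.csv schema from the question.
sample = (
    "EmployeeID,FirstName,Title,State,Laptop\n"
    "1,Alice,Engineer,WA,PC\n"
    "2,Bob,Manager,CA,MAC\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # skip the header row, like option("header", "true")

# Same field mapping as the Row(...) lambda: first column cast to int,
# the rest kept as strings.
records = [
    {"EmployeeID": int(p[0]), "FirstName": p[1],
     "Title": p[2], "State": p[3], "Laptop": p[4]}
    for p in reader
]
```

This only illustrates the mapping; on the RDD path the equivalent step is the `.map(lambda p: Row(...))` call before `toDF()`.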