Problem reading Avro files in PySpark

qhhrdooz · asked 5 months ago · in Spark

I am trying to read an Avro file in PySpark, but I am running into an error.
Spark version on my machine: 3.5.0
Python version on my machine:
I started my PySpark session with: pyspark --packages org.apache.spark:spark-avro_2.13:3.5.0
Code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('test-app').getOrCreate()
df = spark.read.format('avro').load('twitter.avro')

After running this, I get the error below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 307, in load
    return self._df(self._jreader.load(path))
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o47.load.
: java.lang.AbstractMethodError: Receiver class org.apache.spark.sql.avro.AvroFileFormat does not define or inherit an implementation of the resolved method 'abstract scala.Option inferSchema(org.apache.spark.sql.SparkSession, scala.collection.immutable.Map, scala.collection.Seq)' of interface org.apache.spark.sql.execution.datasources.FileFormat.
    at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$11(DataSource.scala:208)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:205)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:407)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:832)


biswetbf1#

Can you check which Scala version your PySpark build uses by looking at the output of pyspark --version?
I suspect your PySpark was built against Scala 2.12.x, but you are loading spark-avro_2.13:3.5.0 (the 2.13 refers to the Scala binary version). The AbstractMethodError is the typical symptom of mixing classes compiled against incompatible Scala binary versions.
Try starting the PySpark shell with 2.12 instead of 2.13:

pyspark --packages org.apache.spark:spark-avro_2.12:3.5.0
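The key point is that the suffix after the underscore in the artifact name is the Scala binary version, and it must match the Scala version your Spark distribution was built with (the PyPI builds of Spark 3.5.x ship with Scala 2.12), not the newest Scala release. A minimal sketch of assembling the coordinate, where `spark_avro_package` is a hypothetical helper for illustration:

```python
def spark_avro_package(spark_version: str, scala_binary: str) -> str:
    """Build the Maven coordinate for the spark-avro package.

    scala_binary is the Scala *binary* version (e.g. "2.12") reported
    by `pyspark --version`; it must match the Spark build, not the
    latest available Scala release.
    """
    return f"org.apache.spark:spark-avro_{scala_binary}:{spark_version}"

# Example: Spark 3.5.0 built against Scala 2.12
print(spark_avro_package("3.5.0", "2.12"))
# org.apache.spark:spark-avro_2.12:3.5.0
```

Passing the resulting coordinate to `pyspark --packages` (or `spark.jars.packages`) with the matching Scala suffix avoids the AbstractMethodError above.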

