Problem reading an Avro file in PySpark


I am trying to read an Avro file in PySpark, but I am hitting an error.

Spark version on my machine: 3.5.0
Python version on my machine: 3.8 (per the site-packages paths in the traceback below)

I started the PySpark session with the following arguments:

pyspark --packages org.apache.spark:spark-avro_2.13:3.5.0
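As background for the fix below: the `--packages` coordinate has three parts — group ID, artifact ID (whose `_2.13` suffix encodes the Scala binary version the artifact was compiled against), and version (which should match Spark's own version). A small illustrative helper (the function name is hypothetical, not part of any Spark API) makes the structure explicit:

```python
def spark_avro_coordinate(spark_version: str, scala_binary: str) -> str:
    """Build the Maven coordinate for spark-avro.

    The artifact suffix (e.g. _2.12) must match the Scala binary
    version Spark itself was built with; the trailing version should
    match the installed Spark version.
    """
    return f"org.apache.spark:spark-avro_{scala_binary}:{spark_version}"

# Coordinate for Spark 3.5.0 built against Scala 2.12:
print(spark_avro_coordinate("3.5.0", "2.12"))
# org.apache.spark:spark-avro_2.12:3.5.0
```

A mismatch between the suffix and Spark's actual Scala version typically surfaces as a binary-incompatibility error at runtime, as seen in the traceback below.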

Code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('test-app').getOrCreate()
df = spark.read.format('avro').load('twitter.avro')

After running this, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 307, in load
    return self._df(self._jreader.load(path))
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
  File "/Users/rajendraprasadpadma/opt/anaconda3/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o47.load.
: java.lang.AbstractMethodError: Receiver class org.apache.spark.sql.avro.AvroFileFormat does not define or inherit an implementation of the resolved method 'abstract scala.Option inferSchema(org.apache.spark.sql.SparkSession, scala.collection.immutable.Map, scala.collection.Seq)' of interface org.apache.spark.sql.execution.datasources.FileFormat.
    at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$11(DataSource.scala:208)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:205)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:407)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:832)
apache-spark pyspark avro
1 Answer

Can you check which Scala version your PySpark build uses by looking at the output of `pyspark --version`?

I suspect you are on Scala version `2.12.x`, but you are loading `spark-avro_2.13:3.5.0` (the `2.13` refers to the Scala version).

Try starting the pyspark shell with the `2.12` artifact instead of `2.13`:

pyspark --packages org.apache.spark:spark-avro_2.12:3.5.0
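The `pyspark --version` banner includes a line of the form `Using Scala version 2.12.18, ...`. If you want to extract the Scala binary version programmatically (e.g. in a setup script), a hedged sketch — the banner text below is illustrative, not captured from your machine:

```python
import re
from typing import Optional

def scala_binary_version(version_banner: str) -> Optional[str]:
    """Extract the Scala binary version (e.g. '2.12') from the text
    printed by `pyspark --version` / `spark-submit --version`."""
    m = re.search(r"Using Scala version (\d+\.\d+)", version_banner)
    return m.group(1) if m else None

banner = "Using Scala version 2.12.18, OpenJDK 64-Bit Server VM, 11.0.20"
print(scala_binary_version(banner))  # 2.12
```

The extracted value is exactly the suffix that belongs in the `spark-avro_<scala>` artifact name.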