Multiple sources found for csv with readStream


I am trying to run the code below to read files as a DataFrame for a Kafka topic (for Spark Streaming). The application is developed in Scala in the Eclipse IDE with the schema defined explicitly, and is run on the server as a thin jar via spark-submit (without pulling in any extra packages). I get the error below. Suggestions found through internet research on similar errors around spark.read.option(...).schema(...).csv(...) did not resolve it.

Has anyone run into a similar Spark Streaming problem when using the readStream option?

Looking forward to your replies!

Error:

Exception in thread "main" java.lang.RuntimeException: Multiple sources found for csv (com.databricks.spark.csv.DefaultSource15, org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please specify the fully qualified class name.

Code:

val csvdf = spark.readStream.option("sep", ",").schema(userSchema).csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("com.databricks.spark.csv").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("org.apache.spark.sql.execution.datasources.csv").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("com.databricks.spark.csv.DefaultSource15").csv("server_path") //does not resolve error
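One detail worth noting about the attempts above: `.csv(path)` is shorthand for `.format("csv").load(path)`, so chaining `.format(...)` and then calling `.csv(...)` silently overrides the format back to the short name "csv", which is exactly the name both registered sources are fighting over. If you want to pin a specific implementation, the fully qualified class has to go in `format(...)` and the call has to end in `load(...)`. A minimal sketch, assuming a stand-in schema and path in place of the question's `userSchema` and `"server_path"`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("CsvStream").getOrCreate()

// Hypothetical schema standing in for the question's userSchema
val userSchema = new StructType()
  .add("id", IntegerType)
  .add("value", StringType)

// format(...) pins the source by fully qualified class name; load(...)
// (not csv(...)) must be used, or the format is reset to "csv" again.
val csvdf = spark.readStream
  .option("sep", ",")
  .schema(userSchema)
  .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
  .load("server_path")
```

This only sidesteps the ambiguity at read time; as the accepted answer notes, the cleaner fix is removing the duplicate jar from the classpath.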
eclipse scala csv apache-spark spark-streaming
1 Answer

The pom.xml did not explicitly pull in the spark-csv jar.

It turned out that the HDP jar path on the server used by Spark 2 contained both the spark-csv and spark-sql jars, which is what caused the conflicting CSV sources. After removing the extra spark-csv jar from the path, the problem was resolved.
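For readers who hit the same conflict through Maven rather than a server classpath, excluding the Databricks source from whichever dependency drags it in is one way to keep a single CSV source on the classpath. A sketch of a hypothetical pom.xml fragment (the groupId/artifactId bringing in spark-csv transitively will vary by project; `com.example:some-spark-helper` here is made up):

```xml
<dependency>
  <groupId>com.example</groupId>                <!-- hypothetical artifact that -->
  <artifactId>some-spark-helper</artifactId>    <!-- transitively pulls spark-csv -->
  <version>1.0.0</version>
  <exclusions>
    <!-- Spark 2 ships its own built-in CSV source, so the external
         Databricks one is redundant and causes the "multiple sources" clash -->
    <exclusion>
      <groupId>com.databricks</groupId>
      <artifactId>spark-csv_2.11</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

`mvn dependency:tree` is a quick way to find which dependency is actually bringing spark-csv in before deciding where to put the exclusion.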
