Spark 1.4 is missing the Kafka library

Question    Votes: 3    Answers: 1

I am trying to run a Python Spark script that works perfectly in Spark 1.3.1. I downloaded Spark 1.4 and tried to run the same script, but it keeps saying

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies in the spark-submit command as $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.4.0 ...
  2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.4.0. Then, include the jar in the spark-submit command as $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...

I explicitly referenced the jars in the submit command, adding them like this:

/opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit --jars spark-streaming_2.10-1.4.0.jar,spark-core_2.10-1.4.0.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar,kafka_2.10-0.8.2.1.jar,kafka-clients-0.8.2.1.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar /root/SparkPySQLNew.py

The log also shows that the jars were added when the application started, so why are they not found?

15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-streaming_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming_2.10-1.4.0.jar with timestamp 1436334277792
15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-core_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-core_2.10-1.4.0.jar with timestamp 1436334277919
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278295
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka_2.10-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka_2.10-0.8.2.1.jar with timestamp 1436334278353
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka-clients-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka-clients-0.8.2.1.jar with timestamp 1436334278357
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278665
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar with timestamp 1436334278666               

So I know they are being loaded. I started with a single jar and ended up adding all of them.
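An "Added JAR" log line only shows that Spark registered the file, not what the file contains. Since a jar is a zip archive, its contents can be inspected directly. The following is a minimal sketch: the helper `jar_contains_class` is hypothetical, and the class name in `HELPER_CLASS` is an assumption about what PySpark's Kafka wrapper tries to load, since the question does not state it.

```python
import zipfile

# Assumption: the JVM class that PySpark's Kafka wrapper looks up.
# The question does not name it; replace it with the class from your stack trace.
HELPER_CLASS = "org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper"


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has a .class entry for class_name."""
    # Java class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    import sys

    # Usage: python check_jar.py /root/spark-streaming-kafka-assembly_2.10-1.4.0.jar
    for path in sys.argv[1:]:
        print(path, jar_contains_class(path, HELPER_CLASS))
```

A `False` result for every jar on the command line would suggest the wrong artifact was downloaded. A `True` result points instead at how the jars reach the driver's class path.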

hadoop apache-spark apache-kafka spark-streaming hortonworks-data-platform
1 Answer
0 votes

I suspect the exact answer differs between Spark versions, but based on this HCC thread, the following seems to have done the trick for others:

spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar 

At first glance, the difference is that it lists a single spark-streaming-kafka-assembly jar, whereas your command lists it twice.
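To spot repeated entries like this before submitting, the comma-separated `--jars` value can be checked with a few lines of Python. This is only a sketch: the helper names `duplicate_jars` and `dedupe_jars` are made up for illustration, and whether removing the duplicate resolves the error is not confirmed here.

```python
from collections import Counter


def _split(jars_arg):
    """Split a comma-separated --jars value into trimmed, non-empty names."""
    return [name.strip() for name in jars_arg.split(",") if name.strip()]


def duplicate_jars(jars_arg):
    """Return the jar names that occur more than once, in first-seen order."""
    names = _split(jars_arg)
    counts = Counter(names)
    # dict.fromkeys keeps insertion order and drops repeats
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


def dedupe_jars(jars_arg):
    """Return the --jars value with repeated entries removed, order preserved."""
    return ",".join(dict.fromkeys(_split(jars_arg)))


# The --jars value from the question's spark-submit command.
JARS = (
    "spark-streaming_2.10-1.4.0.jar,spark-core_2.10-1.4.0.jar,"
    "spark-streaming-kafka-assembly_2.10-1.4.0.jar,kafka_2.10-0.8.2.1.jar,"
    "kafka-clients-0.8.2.1.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar"
)

if __name__ == "__main__":
    print("duplicates:", duplicate_jars(JARS))
    print("deduplicated:", dedupe_jars(JARS))
```

The deduplicated string can be passed to `--jars` as is. Order is preserved in case it matters for class loading.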
