Previewing data published to Kafka from the console using the Spark Structured Streaming API

Question

I am trying to preview data consumed from a Kafka topic using the Spark Structured Streaming API, but instead of starting, my Spark CLI hangs at this point for several minutes.

Please help. I already have my Kafka topic with data written to it, and I then run this command:

pyspark \
    --master yarn \
    --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.3 \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
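For reference, once this shell does come up, a minimal preview of the topic might look like the following sketch. The broker address and topic name are placeholders (not from the original post), and `spark` is the `SparkSession` that the pyspark shell predefines:

```python
# Hedged sketch: batch-read a Kafka topic for a quick preview.
# "localhost:9092" and "my_topic" are hypothetical placeholders --
# substitute your own broker and topic.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",  # placeholder broker
    "subscribe": "my_topic",                      # placeholder topic
    "startingOffsets": "earliest",                # read from the beginning
}

def preview(spark, limit=10):
    """Batch-read the topic and show the first records, keys/values as strings."""
    df = spark.read.format("kafka").options(**KAFKA_OPTIONS).load()
    (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
       .show(limit, truncate=False))
```

Inside the pyspark shell this would be invoked as `preview(spark)`; a batch read (`spark.read` rather than `spark.readStream`) is the simplest way to eyeball what is already in the topic.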

But I keep getting this output, after which it hangs:

Python 3.7.17 (default, Jun  6 2023, 20:10:10) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark-3.3.3-bin-hadoop3/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.3.6/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
:: loading settings :: url = jar:file:/opt/spark-3.3.3-bin-hadoop3/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/dee/.ivy2/cache
The jars for the packages stored in: /home/dee/.ivy2/jars
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-b93cab13-37df-4a4c-9366-88c1ae221422;1.0
        confs: [default]
        found org.apache.spark#spark-sql-kafka-0-10_2.12;3.3.3 in central
        found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.3.3 in central
        found org.apache.kafka#kafka-clients;2.8.1 in central
        found org.lz4#lz4-java;1.8.0 in central
        found org.xerial.snappy#snappy-java;1.1.8.4 in central
        found org.slf4j#slf4j-api;1.7.32 in central
        found org.apache.hadoop#hadoop-client-runtime;3.3.2 in central
        found org.spark-project.spark#unused;1.0.0 in central
        found org.apache.hadoop#hadoop-client-api;3.3.2 in central
        found commons-logging#commons-logging;1.1.3 in central
        found com.google.code.findbugs#jsr305;3.0.0 in central
        found org.apache.commons#commons-pool2;2.11.1 in central
downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.12/3.3.3/spark-sql-kafka-0-10_2.12-3.3.3.jar ...
        [SUCCESSFUL ] org.apache.spark#spark-sql-kafka-0-10_2.12;3.3.3!spark-sql-kafka-0-10_2.12.jar (59ms)
downloading https://repo1.maven.org/maven2/org/apache/spark/spark-token-provider-kafka-0-10_2.12/3.3.3/spark-token-provider-kafka-0-10_2.12-3.3.3.jar ...
        [SUCCESSFUL ] org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.3.3!spark-token-provider-kafka-0-10_2.12.jar (20ms)
downloading https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/2.8.1/kafka-clients-2.8.1.jar ...
        [SUCCESSFUL ] org.apache.kafka#kafka-clients;2.8.1!kafka-clients.jar (279ms)
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar ...
        [SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar (20ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.1/commons-pool2-2.11.1.jar ...
        [SUCCESSFUL ] org.apache.commons#commons-pool2;2.11.1!commons-pool2.jar (24ms)
downloading https://repo1.maven.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar ...
        [SUCCESSFUL ] org.spark-project.spark#unused;1.0.0!unused.jar (17ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-runtime/3.3.2/hadoop-client-runtime-3.3.2.jar ...
        [SUCCESSFUL ] org.apache.hadoop#hadoop-client-runtime;3.3.2!hadoop-client-runtime.jar (914ms)
downloading https://repo1.maven.org/maven2/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar ...
        [SUCCESSFUL ] org.lz4#lz4-java;1.8.0!lz4-java.jar (25ms)
downloading https://repo1.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.8.4/snappy-java-1.1.8.4.jar ...
        [SUCCESSFUL ] org.xerial.snappy#snappy-java;1.1.8.4!snappy-java.jar(bundle) (42ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.32/slf4j-api-1.7.32.jar ...
        [SUCCESSFUL ] org.slf4j#slf4j-api;1.7.32!slf4j-api.jar (20ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-api/3.3.2/hadoop-client-api-3.3.2.jar ...
        [SUCCESSFUL ] org.apache.hadoop#hadoop-client-api;3.3.2!hadoop-client-api.jar (187ms)
downloading https://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar ...
        [SUCCESSFUL ] commons-logging#commons-logging;1.1.3!commons-logging.jar (22ms)
:: resolution report :: resolve 5105ms :: artifacts dl 1660ms
        :: modules in use:
        com.google.code.findbugs#jsr305;3.0.0 from central in [default]
        commons-logging#commons-logging;1.1.3 from central in [default]
        org.apache.commons#commons-pool2;2.11.1 from central in [default]
        org.apache.hadoop#hadoop-client-api;3.3.2 from central in [default]
        org.apache.hadoop#hadoop-client-runtime;3.3.2 from central in [default]
        org.apache.kafka#kafka-clients;2.8.1 from central in [default]
        org.apache.spark#spark-sql-kafka-0-10_2.12;3.3.3 from central in [default]
        org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.3.3 from central in [default]
        org.lz4#lz4-java;1.8.0 from central in [default]
        org.slf4j#slf4j-api;1.7.32 from central in [default]
        org.spark-project.spark#unused;1.0.0 from central in [default]
        org.xerial.snappy#snappy-java;1.1.8.4 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   12  |   12  |   12  |   0   ||   12  |   12  |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-b93cab13-37df-4a4c-9366-88c1ae221422
        confs: [default]
        12 artifacts copied, 0 already retrieved (56631kB/113ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.12-3.3.3.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.3.3.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.apache.kafka_kafka-clients-2.8.1.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/com.google.code.findbugs_jsr305-3.0.0.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.apache.commons_commons-pool2-2.11.1.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.apache.hadoop_hadoop-client-runtime-3.3.2.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.lz4_lz4-java-1.8.0.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.8.4.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.slf4j_slf4j-api-1.7.32.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/org.apache.hadoop_hadoop-client-api-3.3.2.jar added multiple times to distributed cache.
23/09/07 22:17:30 WARN Client: Same path resource file:///home/dee/.ivy2/jars/commons-logging_commons-logging-1.1.3.jar added multiple times to distributed cache.
Tags: java, apache-spark, hadoop, apache-kafka
1 Answer

So I had to check my system resources to see whether the job was actually executing. When I ran the command

jps

it turned out that other SparkSubmit sessions/jobs, which I thought I had already closed, were still running. Memory was therefore exhausted and none could be allocated to my current Spark session, so I had to clean them up by running

kill -9 <job_id>
to terminate/stop those Spark sessions. Afterwards, I also verified this by stopping the Python kernels that had hosted them, and my current Spark job was finally able to load and give me access to the pyspark interface.
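The cleanup described above can be scripted. A hedged sketch follows; the `jps` output is simulated with a string (the PIDs are invented for illustration), and in practice you would capture the real output, e.g. with `subprocess.check_output(["jps"], text=True)`:

```python
# Hedged sketch: pick out the PIDs of stray SparkSubmit JVMs from
# jps-style output ("<pid> <main class>" per line).
jps_output = """\
12345 SparkSubmit
23456 Jps
34567 SparkSubmit
"""

stray_pids = [line.split()[0]
              for line in jps_output.splitlines()
              if line.endswith("SparkSubmit")]
print(stray_pids)  # → ['12345', '34567']

# The answer's manual `kill -9 <job_id>` step would then correspond to:
#   import os, signal
#   for pid in stray_pids:
#       os.kill(int(pid), signal.SIGKILL)  # destructive: stops those jobs
```

The actual kill is left commented out because it is destructive; run it only after confirming the listed PIDs really are abandoned sessions.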
