Why can't I run Spark jobs on a bitnami-spark container?

I am building a bitnami-spark image with the following Dockerfile:

FROM --platform=linux/amd64 bitnami/spark:3.3.1

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt && rm requirements.txt

RUN export PACKAGES="io.delta:delta-core_2.12:1.0.0"
RUN export PYSPARK_SUBMIT_ARGS="--packages ${PACKAGES} pyspark-shell"

My docker-compose.yaml looks like this:

version: '3'

services:
  spark-master:
    build:
      context: .
      dockerfile: Dockerfile.spark
    hostname: spark-master
    environment:
      - INIT_DAEMON_STEP=setup_spark
      - SPARK_MODE=master
    ports:
      - "8080:8080"
    networks:
      - spark-network
    volumes:
      - ./:/opt/bitnami/spark/mounted-data

I have an etl_script.py that looks like this:

from pyspark.sql import SparkSession, Window
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType
import pyspark.sql.functions as F

from delta.pip_utils import configure_spark_with_delta_pip

import pyspark

builder = pyspark.sql.SparkSession.builder.appName("LocalDelta") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

When I try to run the script with the following command:

docker-compose exec spark-master spark-submit --master spark://172.18.0.2:7077 mounted-data/etl_script.py
I get the following error:
    ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
    23/09/20 09:06:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45199.
    23/09/20 09:06:05 INFO NettyBlockTransferService: Server created on spark-master:45199
    23/09/20 09:06:05 WARN AbstractConnector: 
    java.io.IOException: No such file or directory
            at sun.nio.ch.NativeThread.signal(Native Method)
            at sun.nio.ch.ServerSocketChannelImpl.implCloseSelectableChannel(ServerSocketChannelImpl.java:291)
            at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:241)
            at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
            at org.sparkproject.jetty.server.ServerConnector.close(ServerConnector.java:371)
            at org.sparkproject.jetty.server.AbstractNetworkConnector.shutdown(AbstractNetworkConnector.java:104)
            at org.sparkproject.jetty.server.Server.doStop(Server.java:444)
            at org.sparkproject.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:94)
            at org.apache.spark.ui.ServerInfo.stop(JettyUtils.scala:525)
            at org.apache.spark.ui.WebUI.$anonfun$stop$2(WebUI.scala:180)
            at org.apache.spark.ui.WebUI.$anonfun$stop$2$adapted(WebUI.scala:180)
            at scala.Option.foreach(Option.scala:407)
            at org.apache.spark.ui.WebUI.stop(WebUI.scala:180)
            at org.apache.spark.ui.SparkUI.stop(SparkUI.scala:141)
            at org.apache.spark.SparkContext.$anonfun$stop$6(SparkContext.scala:2085)
            at org.apache.spark.SparkContext.$anonfun$stop$6$adapted(SparkContext.scala:2085)
            at scala.Option.foreach(Option.scala:407)
            at org.apache.spark.SparkContext.$anonfun$stop$5(SparkContext.scala:2085)
            at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1484)
            at org.apache.spark.SparkContext.stop(SparkContext.scala:2085)
            at org.apache.spark.SparkContext$$anon$3.run(SparkContext.scala:2049)
    23/09/20 09:06:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    23/09/20 09:06:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-master, 45199, None)
    23/09/20 09:06:05 INFO BlockManagerMasterEndpoint: Registering block manager spark-master:45199 with 366.3 MiB RAM, BlockManagerId(driver, spark-master, 45199, None)
    23/09/20 09:06:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-master, 45199, None)
    23/09/20 09:06:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-master, 45199, None)
    23/09/20 09:06:06 ERROR SparkContext: Error initializing SparkContext.
    java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.

Strangely, the script seemed to run fine the first few times I tried it, and then this started happening. The same script also runs fine in a container built from the jupyter/pyspark-notebook:spark-3.3.1 image. Can anyone point me in the right direction?

docker pyspark docker-compose bitnami
1 Answer

Bitnami does not support Delta Lake out of the box; you need to configure the jars yourself. It also does not support PySpark automatically.
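
As a minimal sketch of what that configuration could look like (the Delta version below is an assumption, not taken from your setup): the two RUN export lines in your Dockerfile have no effect, because each RUN executes in its own shell and the exported variables are discarded when the layer is committed. Declaring them with ENV keeps them visible in the running container. Note also that delta-core 1.0.0 targets Spark 3.1; Spark 3.3.1 is normally paired with delta-core_2.12 2.2.x or later.

FROM --platform=linux/amd64 bitnami/spark:3.3.1

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && rm requirements.txt

# ENV persists into the container at runtime, unlike `RUN export`.
# 2.2.0 is an assumed Spark-3.3-compatible Delta version; check the
# Delta Lake / Spark compatibility matrix for your exact build.
ENV PACKAGES="io.delta:delta-core_2.12:2.2.0"
ENV PYSPARK_SUBMIT_ARGS="--packages ${PACKAGES} pyspark-shell"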

If you want a pyspark-shell, try the following commands in the CLI:

export PYTHONPATH=/opt/bitnami/spark/python/lib/py4j-0.10.9.5-src.zip:/opt/bitnami/spark/python/
export PYTHONSTARTUP=/opt/bitnami/spark/python/pyspark/shell.py
exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main
