Spark dependency failed to start

Question

I am getting a "dependency failed to start" error, but the container is running and reachable, and the log file shows no errors. I can access the UI at localhost:9090. I am running Docker on Windows 11 with 12 cores, 48 GB RAM, and 978 GB of free disk space.

What I have tried:

  • test: ["CMD", "nc", "-z", "localhost", "9090"]
  • The same test with port 8080 (see the note after this list).
  • test: ["CMD", "curl", "-f", "localhost", "7077"], which gives the same error.
  • Removing the healthcheck entirely; it still reports that the dependency failed to start, even though no healthcheck exists.
  • Running docker-compose up -d a few times; sometimes this fixes the error.
  • Getting an answer from ChatGPT.
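
One note on the healthcheck attempts above: the compose file maps spark-master's ports as "9090:8080", so host port 9090 only forwards to container port 8080, and healthchecks run inside the container, where the logs show the MasterUI listening on 8080. A container-internal check therefore has to target 8080, and the probe binary (nc or curl) must actually exist in the bitnami/spark image, which can be verified with docker exec spark-master which curl. A minimal port-corrected sketch, assuming curl is present in the image:

healthcheck:
  # 9090 exists only on the host side; inside the container the MasterUI listens on 8080
  test: ["CMD-SHELL", "curl -f http://localhost:8080 || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 5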

What I get:

(.env) (base) PS D:\ds_projects\cwy_realtime_data_streaming> docker-compose up -d
    [+] Running 10/10
     ✔ Container af-postgres      Running                                                                                                                                    0.0s 
     ✔ Container cassandra        Running                                                                                                                                    0.0s 
     ✔ Container zookeeper        Healthy                                                                                                                                    2.1s 
     ✘ Container spark-master     Error                                                                                                                                    302.6s 
     ✔ Container af-webserver     Healthy                                                                                                                                    2.1s 
     ✔ Container af-scheduler     Running                                                                                                                                    0.0s 
     ✔ Container broker           Healthy                                                                                                                                    2.6s 
     ✔ Container schema-registry  Running                                                                                                                                    0.0s 
     ✔ Container control-center   Recreated                                                                                                                                  0.1s 
     ✔ Container spark-worker     Started                                                                                                                                    1.6s 
    dependency failed to start: container spark-master is unhealthy

Log file:

(.env) (base) PS D:\ds_projects\cwy_realtime_data_streaming> docker logs spark-master
spark 15:26:30.61 
spark 15:26:30.61 Welcome to the Bitnami spark container
spark 15:26:30.61 Subscribe to project updates by watching https://github.com/bitnami/containers
spark 15:26:30.61 Submit issues and feature requests at https://github.com/bitnami/containers/issues
spark 15:26:30.61

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
23/11/04 15:26:31 INFO Master: Started daemon with process name: 1@2880cc9f2499
23/11/04 15:26:31 INFO SignalUtils: Registering signal handler for TERM
23/11/04 15:26:31 INFO SignalUtils: Registering signal handler for HUP
23/11/04 15:26:31 INFO SignalUtils: Registering signal handler for INT
23/11/04 15:26:31 INFO SecurityManager: Changing view acls to: spark
23/11/04 15:26:31 INFO SecurityManager: Changing modify acls to: spark
23/11/04 15:26:31 INFO SecurityManager: Changing view acls groups to:
23/11/04 15:26:31 INFO SecurityManager: Changing modify acls groups to:
23/11/04 15:26:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: spark; groups with view permissions: EMPTY; users with modify permissions: spark; groups with modify permissions: EMPTY
23/11/04 15:26:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/04 15:26:32 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
23/11/04 15:26:32 INFO Master: Starting Spark master at spark://172.24.0.5:7077
23/11/04 15:26:32 INFO Master: Running Spark version 3.5.0
23/11/04 15:26:32 INFO JettyUtils: Start Jetty 0.0.0.0:8080 for MasterUI
23/11/04 15:26:32 INFO Utils: Successfully started service 'MasterUI' on port 8080.
23/11/04 15:26:32 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://2880cc9f2499:8080
23/11/04 15:26:32 INFO Master: I have been elected leader! New state: ALIVE
23/11/04 15:26:33 INFO Master: Registering worker 172.24.0.7:40279 with 2 cores, 1024.0 MiB RAM

docker-compose.yaml:

version: '3'

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    healthcheck:
      test: ['CMD', 'bash', '-c', "echo 'ruok' | nc localhost 2181"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - confluent

  broker:
    image: confluentinc/cp-server:7.4.0
    hostname: broker
    container_name: broker
    depends_on:
      zookeeper:
        condition: service_healthy
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_CONFLUENT_BALANCER_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_JMX_PORT: 9101
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_CONFLUENT_SCHEMA_REGISTRY_URL: http://spark-master:8081
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:29092
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'false'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'
    networks:
      - confluent
    healthcheck:
      test: [ "CMD", "bash", "-c", 'nc -z localhost 9092' ]
      interval: 10s
      timeout: 5s
      retries: 5

  schema-registry:
    image: confluentinc/cp-schema-registry:7.4.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      broker:
        condition: service_healthy
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'broker:29092'
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
    networks:
      - confluent
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost:8081/" ]
      interval: 30s
      timeout: 10s
      retries: 5

  control-center:
    image: confluentinc/cp-enterprise-control-center:7.4.0
    hostname: control-center
    container_name: control-center
    depends_on:
      broker:
        condition: service_healthy
      spark-master:
        condition: service_healthy
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'broker:29092'
      CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://spark-master:8081"
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
      CONFLUENT_METRICS_TOPIC_REPLICATION: 1
      CONFLUENT_METRICS_ENABLE: 'false'
      PORT: 9021
    networks:
      - confluent
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost:9021/health" ]
      interval: 30s
      timeout: 10s
      retries: 5

  webserver:
    image: apache/airflow:2.6.0-python3.9
    container_name: af-webserver
    command: webserver
    entrypoint: ['/opt/airflow/script/entrypoint.sh']
    depends_on:
      - postgres
    env_file:
      - airflow.env
    environment:
      - LOAD_EX=n
      - EXECUTOR=Sequential
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./dags:/opt/airflow/dags
      - ./script/entrypoint.sh:/opt/airflow/script/entrypoint.sh
      - ./requirements.txt:/opt/airflow/requirements.txt
    ports:
      - "8080:8080"
    healthcheck:
      test: ['CMD-SHELL', "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    networks:
      - confluent

  scheduler:
    image: apache/airflow:2.6.0-python3.9
    container_name: af-scheduler
    depends_on:
      webserver:
        condition: service_healthy
    volumes:
      - ./dags:/opt/airflow/dags
      - ./script/entrypoint.sh:/opt/airflow/script/entrypoint.sh
      - ./requirements.txt:/opt/airflow/requirements.txt
    env_file:
      - airflow.env
    environment:
      - LOAD_EX=n
      - EXECUTOR=Sequential
    command: bash -c "pip install -r ./requirements.txt && airflow db upgrade && airflow scheduler"
    networks:
      - confluent

  postgres:
    image: postgres:14.0
    container_name: af-postgres
    env_file:
      - postgres.env
    environment:
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"
    networks:
      - confluent

  spark-master:
    image: bitnami/spark:latest
    container_name: spark-master
    command: bin/spark-class org.apache.spark.deploy.master.Master
    ports:
      - "9090:8080"
      - "7077:7077"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9090/"]
      interval: 30s
      timeout: 10s
      retries: 5
    networks:
      - confluent

  spark-worker:
    image: bitnami/spark:latest
    container_name: spark-worker
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    depends_on:
      - spark-master
    environment:
      SPARK_MODE: worker
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 1g
      SPARK_MASTER_URL: spark://spark-master:7077
    networks:
      - confluent

  cassandra_db:
    image: cassandra:latest
    container_name: cassandra
    hostname: cassandra
    ports:
      - "9042:9042"
    env_file:
      - cassandra.env
    environment:
      - MAX_HEAP_SIZE=512M
      - HEAP_NEWSIZE=100M
    networks:
      - confluent

networks:
  confluent:

Secrets from the ".env" file (this is just a learning project, so I moved them out purely as good practice):

AIRFLOW_WEBSERVER_SECRET_KEY=admin
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow

POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow

CASSANDRA_USERNAME=cassandra
CASSANDRA_PASSWORD=cassandra
1 Answer

The problem was that in control-center I had

CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://spark-master:8081"

but it should be

CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"

This made control-center show up as depending on spark-master, which caused the error.
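
In compose-file terms, the fix is this one-line change in the control-center service:

control-center:
  environment:
    # must point at the schema-registry service, not spark-master
    CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"

(The broker's KAFKA_CONFLUENT_SCHEMA_REGISTRY_URL in the compose file above points at spark-master:8081 in the same way, so it presumably wants the same correction.)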

Lesson: if something reports a dependency error while the actual dependency (the worker, in this case) shows no errors, check for a mis-reference in the configuration.
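
When Compose reports a container as unhealthy, the healthcheck's recent output can be read straight from the Docker CLI, which makes mis-references like this much quicker to spot:

docker inspect --format='{{json .State.Health}}' spark-master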
