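I am trying to build a containerized mini-batch data processing pipeline using PySpark and Docker, after which the processed data will be stored in Cassandra. I am using a docker-compose file to pull the images for Spark and Cassandra. I can run my PySpark file without errors, but I get an error when it reaches the Cassandra statements, such as creating the keyspace and table. That is why I tried using cqlsh inside the container, which fails with the following error: `Connection error: ('Unable to connect to any servers', {'127.0.0.1:9042': ConnectionRefusedError(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})`. I am using the following docker-compose file: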
```yaml
version: '3'
networks:
  app-tier:
    driver: bridge
services:
  spark:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    ports:
      - '8080:8080'
    volumes:
      - ".:/opt/spark"
  spark-worker:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    networks:
      - app-tier
  cassandra:
    image: 'bitnami/cassandra:latest'
    #image: docker.io/bitnami/cassandra:4.1
    #image: cassandra:latest
    ports:
      - '7000:7000'
      - '127.0.0.1:9042:9042'
    volumes:
      #- 'cassandra_data:/bitnami'
      - ".:/opt/cassandra"
    environment:
      - CASSANDRA_SEEDS=cassandra
      - CASSANDRA_PASSWORD_SEEDER=yes
      - CASSANDRA_PASSWORD=cassandra
    networks:
      - app-tier
```
Docker commands I ran:
docker compose up -d
docker ps
I have tried several different Cassandra images, and all of them produce the same error. I have also checked several sources to find out how to schedule this pipeline with Airflow inside the container, to no avail.
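
To help narrow down the "Connection refused" error, here is a minimal sketch of a check that polls a TCP port until something accepts connections. It uses only the Python standard library. The function name `wait_for_port` and the timeout and interval values are my own assumptions for illustration, not part of any Cassandra or Spark API. It only tells me whether anything is listening on the port yet; it is not a fix.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection is accepted, or False if
    `timeout` seconds pass without a successful connection.
    The default values are arbitrary assumptions for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("127.0.0.1", 9042)` inside the Cassandra container (or on the host, given the `127.0.0.1:9042:9042` port mapping) should show whether the CQL port ever starts accepting connections after `docker compose up -d`.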