PostgreSQL connection refused when connecting from PySpark in a Jupyter Notebook running on Docker


I am running into this error:

Py4JJavaError: An error occurred while calling o124.save. : org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
It happens when I run the PySpark code below in a Jupyter Notebook. Everything runs in Docker, except PostgreSQL, which is installed on my local machine (Windows).

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, col, explode
import pyspark.sql.functions as f

spark = SparkSession.builder.appName("ETL Pipeline").config("spark.jars", "./postgresql-42.7.1.jar").getOrCreate()
df = spark.read.text("./Data/WordData.txt")

df2 = df.withColumn("splitedData", f.split("value"," "))
df3 = df2.withColumn("words", explode("splitedData"))
wordsDF = df3.select("words")
wordCount = wordsDF.groupBy("words").count()

driver = "org.postgresql.Driver"
url = "jdbc:postgresql://localhost:5432/local_database"
table = "word_count"
user = "postgres"
password = "12345"

# The save mode is set with .mode(); passing it through .option("mode", ...) has no effect on the JDBC writer
wordCount.write.format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("dbtable", table) \
    .option("user", user) \
    .option("password", password) \
    .mode("append") \
    .save()

spark.stop()

I tried editing postgresql.conf to add "listen_addresses = 'localhost'" and editing pg_hba.conf to add "host all all 0.0.0.0/0 md5", but that did not work for me, so I don't know what else to try.
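
One thing worth checking here: inside a container, localhost normally refers to the container itself, not to the Windows host, so nothing is listening on port 5432 there. A minimal sketch of a TCP probe that can be run from a notebook cell to see which hostname is actually reachable (the helper name `can_connect` is made up for illustration):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames
        return False
```

For example, `can_connect("localhost", 5432)` is expected to return False from inside the Jupyter container in this setup. On Docker Desktop, `host.docker.internal` is the name that usually points at the host machine, but that is an assumption about the environment, and PostgreSQL would also have to listen on an address other than the loopback interface for it to answer.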

java python-3.x postgresql pyspark connection
1 Answer

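I solved the problem by running PostgreSQL in Docker as well (creating a postgres container from this image: https://hub.docker.com/_/postgres/) and creating a network shared by the PySpark container and the PostgreSQL container with this command: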

docker network create my_network

This command starts the postgres container:

docker run --name postgres_container --network my_network -e POSTGRES_PASSWORD=12345 -d -p 5432:5432 postgres:latest

This one starts the Jupyter PySpark container:

docker run --name jupyter_container --network my_network -it -p 8888:8888 -v C:\home\work\path:/home/jovyan/work jupyter/pyspark-notebook:latest
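
With both containers on my_network, the JDBC URL in the notebook presumably has to point at the PostgreSQL container by name instead of localhost, since containers on a user-defined Docker network can resolve each other by container name. A minimal sketch of how the connection options might look (the helper `jdbc_options` is hypothetical, and the database name `postgres` assumes the image's default database, because POSTGRES_DB is not set in the command above):

```python
def jdbc_options(host, database, user, password, table, port=5432):
    """Build the option dict for Spark's JDBC writer."""
    return {
        "driver": "org.postgresql.Driver",
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# The container name doubles as the hostname on the shared network.
opts = jdbc_options(
    host="postgres_container",
    database="postgres",  # assumption: default database created by the image
    user="postgres",
    password="12345",
    table="word_count",
)
print(opts["url"])

# In the notebook, with wordCount defined as in the question:
# wordCount.write.format("jdbc").options(**opts).mode("append").save()
```

The PostgreSQL driver jar still has to be passed through spark.jars as in the question, and the save mode is set with .mode("append") on the writer.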
