I am trying to write a DataFrame to CSV and store it locally, but I run into an error while writing the CSV:
py4j.protocol.Py4JJavaError: An error occurred while calling o44.csv.
: org.apache.spark.SparkException: Job aborted.
Can anyone help me fix or work around this problem?
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('answer').getOrCreate()
data = [("James","Educative","Engg","USA"),
("Michael","Google",None,"Asia"),
("Robert",None,"Marketing","Russia"),
("Maria","Netflix","Finance","Ukraine"),
(None, None, None, None)
]
columns = ["emp name","company","department","country"]
df = spark.createDataFrame(data = data, schema = columns)
csv_file_path = "data_new.csv"
df.coalesce(1).write.option("header", True).option("delimiter",",").csv(csv_file_path)
This is the code I tried. I am facing the issue shown below:
Traceback (most recent call last):
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o44.csv.
: org.apache.spark.SparkException: Job aborted.
Notably, the DataFrame df contains some null values, which can be a problem when writing the data out in CSV format. For example, you can modify the code to map nulls to an explicit token:
csv_file_path = "data_new.csv"
df.coalesce(1).write.option("header", True).option("delimiter",",").option("nullValue", "NULL").csv(csv_file_path)