Writing a DataFrame to CSV with PySpark - getting org.apache.spark.SparkException: Job aborted

Problem description

I am trying to write a DataFrame to CSV and store it locally, but I run into a problem while writing the CSV:

py4j.protocol.Py4JJavaError: An error occurred while calling o44.csv.
: org.apache.spark.SparkException: Job aborted.

Can anyone help me resolve or work around this issue?

import pyspark, os
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('answer').getOrCreate()

data = [("James","Educative","Engg","USA"),
    ("Michael","Google",None,"Asia"),
    ("Robert",None,"Marketing","Russia"),
    ("Maria","Netflix","Finance","Ukraine"),
    (None, None, None, None)
  ]

columns = ["emp name","company","department","country"]
df = spark.createDataFrame(data = data, schema = columns)

csv_file_path = "data_new.csv"
df.coalesce(1).write.option("header", True).option("delimiter",",").csv(csv_file_path)

This is the code I tried. I am facing the issue shown below:

Traceback (most recent call last):
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o44.csv.
: org.apache.spark.SparkException: Job aborted.
apache-spark pyspark export-to-csv
1 Answer

It is worth noting that the DataFrame df contains some null values, which can be problematic when writing the data out as CSV. Two other things to check: Spark's .csv() writes a *directory* of part files at csv_file_path, not a single file, so the target must not already exist (or you must set .mode("overwrite")); and "Job aborted" is a generic wrapper, so look further down the stack trace for the "Caused by:" entry that names the real failure. To make nulls explicit in the output, you can modify the code to:

csv_file_path = "data_new.csv" 
df.coalesce(1).write.option("header", True).option("delimiter",",").option("nullValue", "NULL").csv(csv_file_path)
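For illustration, here is what the nullValue option does conceptually, sketched with Python's built-in csv module so it can run without a Spark cluster (the file name data_demo.csv and the sample rows are assumptions for the demo, not part of the original question):

```python
import csv

# Sample rows mirroring the question's data, including None values
rows = [
    ("James", "Educative", "Engg", "USA"),
    ("Michael", "Google", None, "Asia"),
    (None, None, None, None),
]

# Mimic Spark's .option("nullValue", "NULL"): serialize Python None
# as the literal string NULL instead of an empty field
with open("data_demo.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["emp name", "company", "department", "country"])
    for row in rows:
        writer.writerow(["NULL" if value is None else value for value in row])
```

In Spark itself, an alternative with the same effect is to replace nulls before writing, e.g. df.na.fill("NULL"), which substitutes the string in string-typed columns.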