Adding an array list to a PySpark DataFrame

Problem description

I'm very new to PySpark. I'm trying to add an array list (a list of files), collected with the `mssparkutils.fs.ls(WORK_FOLDER)` command, to a DataFrame. But I get the error "TypeError: StructType can not accept object '20230205' in type <class 'str'>". Your help would be much appreciated.

The code is as follows:

==================================================================

# Validation Id checking
from pyspark.sql.types import StructType, StructField, StringType

columns = StructType([StructField('Name', StringType())])
FileList = []
files = mssparkutils.fs.ls(WORK_FOLDER)
for file in files:
    if file.name.endswith('csv'):
        fileName = file.name
        array = fileName.split("_")
        for word in array:
            index = word.find('Exchange')
            if index != 0:
                FileList.append(str(word))
print(FileList)
df = spark.createDataFrame(data=FileList, schema=columns)

==================================================================
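As a side note on the filter inside the loop: `str.find` returns the index of the substring, or -1 if it is absent, so `index != 0` keeps every word that does not *start* with 'Exchange'. A minimal sketch (the sample words here are hypothetical, mimicking the filenames in the question):

```python
# A word with no 'Exchange' substring: find returns -1, so index != 0 and it is kept.
word = '20230205'
print(word.find('Exchange'))  # -1

# A word starting with 'Exchange': find returns 0, so it is skipped.
word = 'ExchangeRates'
print(word.find('Exchange'))  # 0
```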

The print(FileList) command gives the following output: ['20230205', '001040.csv', '20230205', '200005.csv', '20230206', '200006.csv', '20230207', '200021.csv', '20230208', '200007.csv', '20230209', '200010.csv', '20230210', '200009.csv']

I'm trying to add the 'FileList' values to the DataFrame df, using StringType with the column name 'Name'.
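For reference, a likely cause (an assumption, not verified against your environment): with a StructType schema, `spark.createDataFrame` expects each element of `data` to be row-like (a tuple, list, or Row), so a flat list of plain strings raises this TypeError. A minimal sketch of the reshaping, using a hypothetical sample of FileList:

```python
# Hypothetical sample of the FileList values shown in the question.
FileList = ['20230205', '001040.csv', '20230205', '200005.csv']

# Wrap each string in a one-element tuple so it matches the single-column schema.
rows = [(name,) for name in FileList]
print(rows)  # [('20230205',), ('001040.csv',), ('20230205',), ('200005.csv',)]

# With a live Spark session, the call would then accept the data (sketch):
# df = spark.createDataFrame(data=rows, schema=columns)
```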

azure pyspark apache-synapse