Need help splitting one column (sub-rows) of a CSV file into two or three columns using Python


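I have a CSV file that contains values like the following: [image not available]

I would like to split a column such as "Intakt inkl moms Artikelnummer" on the semicolon into two or three columns (note: the other columns need to be split as well, because the data is not in order), for example: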

Ordernummer: 96049
Intakt inkl moms: 73,00
Artikelnummer1: 27404475
Ordernummer: 96050
Intakt inkl moms: 536,00
Artikelnummer1: 82047448
Artikelnummer2: 75109997

This is what I wrote initially:

import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv('orderexport_new.csv', encoding='latin1')

# Initialize empty lists to store the split values
Intakt_inkl_moms = []
Artikelnr1 = []
Artikelnr2 = []
ordernummer_list = []

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Split the values in the "Intakt inkl moms" column
    values = row['Intakt inkl moms'].split(';')
    num_values = len(values)

    # Determine the number of values and append them to the corresponding lists
    if num_values >= 1:
        Intakt_inkl_moms.append(values[0])
    else:
        Intakt_inkl_moms.append(None)

    if num_values >= 2:
        Artikelnr1.append(values[1])
    else:
        Artikelnr1.append(None)

    if num_values >= 3:
        Artikelnr2.append(values[2])
    else:
        Artikelnr2.append(None)

    # Append the ordernummer to maintain alignment
    ordernummer_list.append(row['Ordernummer'])

# Add the new columns to the DataFrame
df['Intakt_inkl_moms'] = Intakt_inkl_moms
df['Artikelnr1'] = Artikelnr1
df['Artikelnr2'] = Artikelnr2
df['Ordernummer'] = ordernummer_list

# Drop the original "Intakt inkl moms" column (currently disabled)
# df.drop(['Intakt inkl moms'], axis=1, inplace=True)

# Save the modified DataFrame to a new CSV file
df.to_csv('ny_orderdata.csv', index=False)

# Print the DataFrame to verify the changes
print(df)

Output: [image not available]

Please advise and share the code. Thank you very much!

python python-3.x pandas list split
1 answer

The simplest way to handle this is to process each row of the CSV yourself and catch the rows where the Ordernummer is missing.

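  • Read each row; if it has an Ordernummer, create a new record in the output.
  • If it has no Ordernummer, update the last record you added with new fields, using only the fields of the row that you care about.
  • Track the maximum number of fields added so that the header of the output CSV can be generated.
  • Write the CSV yourself, or create a pandas.DataFrame to continue processing in a more familiar way.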

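I recommend using Python's built-in csv library, because it handles many things well, such as the various line endings and different delimiters, and it can save you some trouble.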
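As a small illustration of the delimiter handling, here is a minimal sketch. The question splits values on ';', so the file may well be semicolon-delimited; that is an assumption, and the sample text below is hypothetical rather than taken from the real file.

```python
import csv
import io

# Hypothetical sample; the real file's delimiter and columns may differ.
sample = "Ordernummer;Intakt inkl moms\n96049;73,00\n"

# csv.reader accepts any iterable of lines; delimiter=';' is the assumption here.
reader = csv.reader(io.StringIO(sample), delimiter=';')
rows = list(reader)
print(rows)
```

With a real file, the csv documentation recommends opening it with newline='' and passing an explicit encoding (the question used 'latin1').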

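Disclaimer: I don't know what these words mean; I just kept them as they appear in the question body.

Here is a mock-up that does what you asked for. To adapt it to your needs, just update columns_to_split and columns_to_split_names to match your use case: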

import csv
import pandas as pd

def _main():
    # In file name
    fname = 'orderexport_new.csv'
    # Out file name
    outname = 'ny_orderdata.csv'
    # Which columns do you want to add to new columns?
    columns_to_split = [1, 3]
    # What names to give those new columns?
    columns_to_split_names = ['Artikelnummer', 'Artikelns styckpris inkl moms']
    out_data = []
    # This is to track what the fields in the final CSV should be
    max_cols = None  # (max_cur_artikelnummer, columns_in_that_max_entry)

    with open(fname, newline='') as csvfile:
        csvreader = csv.reader(csvfile)
        # Read the two headers first
        headers1 = next(csvreader)
        headers2 = next(csvreader)
        # Iterate through each row in the CSV
        for row in csvreader:
            # Check if there is a value in the first cell
            #   This relies on the cell being empty, if it is some other thing,
            #   then you should modify this check for that
            if row[0]:  # This row has a new Ordernummer
                # Create a small dict with the headers matched to their field values
                new_record = {header_name: field_value for header_name, field_value in zip(headers1, row)}
                out_data.append(new_record)
                # Reset the sub row counter
                cur_artikelnummer = 0
            else:  # This row does not have a new Ordernummer
                # Increment the cur_artikelnummer so that the columns get a new number
                cur_artikelnummer += 1
                # Iterate through each one so that you can add as many as you want above
                for i, column_to_split in enumerate(columns_to_split):
                    out_data[-1][f'{columns_to_split_names[i]}_{cur_artikelnummer}'] = row[column_to_split]
                # If this is the greatest value so far, update so that the CSV headers can be written
                if not max_cols or max_cols[0] < cur_artikelnummer:
                    max_cols = (cur_artikelnummer, out_data[-1].keys())

    # To create a similar CSV
    with open(outname, 'w', newline='') as fh:
        csvwriter = csv.writer(fh)
        # csvwriter.writerow(headers1)
        # csvwriter.writerow(headers2)
        csvwriter.writerow(max_cols[1])
        for row in out_data:
            csvwriter.writerow(row.values())

    # If you want a pandas dataframe instead:
    df = pd.DataFrame(out_data, columns=list(max_cols[1]))

if __name__ == '__main__':
    _main()
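The check on the first cell in the code above relies on that cell being exactly empty. If the export pads it with spaces instead, a slightly more tolerant test might look like the sketch below. The helper name has_ordernummer is hypothetical and not part of the code above.

```python
def has_ordernummer(cell):
    """Hypothetical helper: True when the first cell holds an order number."""
    # Treat an empty string or pure whitespace as "no Ordernummer".
    return bool(cell and cell.strip())


print(has_ordernummer('96049'))
print(has_ordernummer('   '))
```

It would be used in place of the plain truthiness test, as in `if has_ordernummer(row[0]):`.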

This is the input I replicated:

Ordernummer,Intakt inkl moms,Valutajusterad Intakt inkl moms,Dynamic value
,Artikelnummer,Antal,Artikelns styckpris inkl moms
96049,73.00,745.69,736.95
,27404475,1,678.9
96050,536.0,536.0,536.0
,82047448,1,310
,75109997,1,158
96051,6073.0,6073.0,6073.0
,695352,2,3072.0
96052,550.0,556.65,564.74
,737-188-00,1,378.9
96053,550.0,556.65,564.74

This is the output of the code shown above:

Ordernummer,Intakt inkl moms,Valutajusterad Intakt inkl moms,Dynamic value,Artikelnummer_1,Artikelns styckpris inkl moms_1,Artikelnummer_2,Artikelns styckpris inkl moms_2
96049,73.00,745.69,736.95,27404475,678.9
96050,536.0,536.0,536.0,82047448,310,75109997,158
96051,6073.0,6073.0,6073.0,695352,3072.0
96052,550.0,556.65,564.74,737-188-00,378.9
96053,550.0,556.65,564.74
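In the output above, rows for orders with fewer articles are shorter than the header row. If a downstream tool needs every row to have the same number of fields, one option is csv.DictWriter with restval, which fills in missing keys. This is only a sketch: the records below are hypothetical and merely shaped like the out_data list in the code.

```python
import csv
import io

# Hypothetical records shaped like out_data in the answer's code.
records = [
    {'Ordernummer': '96049', 'Artikelnummer_1': '27404475'},
    {'Ordernummer': '96050', 'Artikelnummer_1': '82047448',
     'Artikelnummer_2': '75109997'},
]

# Collect every field name in first-seen order to build the header.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
# restval='' pads any key a record lacks with an empty field.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='',
                        lineterminator='\n')
writer.writeheader()
writer.writerows(records)
padded_csv = buffer.getvalue()
print(padded_csv)
```

Writing to an in-memory buffer keeps the sketch self-contained; with a real file, the buffer would be replaced by a file opened with newline=''.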

Let me know if you have any questions.
