Need help using Python to split one column (with sub-rows) of a CSV file into two or three columns

Question

I have this CSV file, which contains values like the following example:

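I want to split the "Intakt inkl moms Artikelnummer" column, based on its sub-rows, into two or three columns, for example like this (note: other columns need to be split as well, because the data is not in order):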

Ordernummer:96049
Intakt inkl moms: 73,00
Artikelnummer1: 27404475
Ordernummer:96050
Intakt inkl moms: 536,00
Artikelnummer1: 82047448
Artikelnummer2:75109997
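Read literally, the sample above is a sequence of "key: value" lines in which each Ordernummer line starts a new order. As a minimal sketch, assuming the data really arrived in that line format (the real export is a CSV with sub-rows, as the answer shows), this groups the lines into one dict per order. parse_orders is a hypothetical helper name, and values are kept as strings (so "73,00" keeps its decimal comma):

```python
# Hypothetical helper: assumes each line is "key: value" (or "key:value")
# and that an "Ordernummer" line starts a new order.
def parse_orders(lines):
    orders = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if key == 'Ordernummer':
            orders.append({key: value})  # start a new order record
        elif orders:
            orders[-1][key] = value  # attach the field to the current order
    return orders


sample = """Ordernummer:96049
Intakt inkl moms: 73,00
Artikelnummer1: 27404475
Ordernummer:96050
Intakt inkl moms: 536,00
Artikelnummer1: 82047448
Artikelnummer2:75109997"""

for order in parse_orders(sample.splitlines()):
    print(order)
```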

This is what I wrote initially:

import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv('orderexport_new.csv', encoding='latin1')

# Initialize empty lists to store the split values
Intakt_inkl_moms = []
Artikelnr1 = []
Artikelnr2 = []
ordernummer_list = []

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Split the values in the "Intakt inkl moms" column
    # (str() guards against cells that pandas parsed as numbers or NaN)
    values = str(row['Intakt inkl moms']).split(';')
    num_values = len(values)

    # Determine the number of values and append them to the corresponding lists
    if num_values >= 1:
        Intakt_inkl_moms.append(values[0])
    else:
        Intakt_inkl_moms.append(None)

    if num_values >= 2:
        Artikelnr1.append(values[1])
    else:
        Artikelnr1.append(None)

    if num_values >= 3:
        Artikelnr2.append(values[2])
    else:
        Artikelnr2.append(None)

    # Append the ordernummer to maintain alignment
    ordernummer_list.append(row['Ordernummer'])

# Add the new columns to the DataFrame
df['Intakt_inkl_moms'] = Intakt_inkl_moms
df['Artikelnr1'] = Artikelnr1
df['Artikelnr2'] = Artikelnr2
df['Ordernummer'] = ordernummer_list

# Drop the original "Intakt inkl moms" column
# df.drop(['Intakt inkl moms'], axis=1, inplace=True)

# Save the modified DataFrame to a new CSV file
df.to_csv('ny_orderdata.csv', index=False)

# Print the DataFrame to verify the changes
print(df)
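For reference, the loop above splits each cell on ';' into at most three values. pandas can do the same in one vectorized call with Series.str.split(..., expand=True). This is a minimal sketch on made-up inline data; it assumes the values really share one cell separated by ';', which may not match the actual export, where the article numbers sit on their own sub-rows:

```python
import pandas as pd

# Made-up data: each cell holds up to three ';'-separated values
df = pd.DataFrame({
    'Ordernummer': [96049, 96050],
    'Intakt inkl moms': ['73,00;27404475', '536,00;82047448;75109997'],
})

# n=2 caps the result at three parts; reindex guarantees exactly three
# columns even if no row has three parts (missing parts become None/NaN)
parts = (
    df['Intakt inkl moms']
    .str.split(';', n=2, expand=True)
    .reindex(columns=range(3))
)
parts.columns = ['Intakt_inkl_moms', 'Artikelnr1', 'Artikelnr2']

# Rows are aligned on the index, so join keeps each order with its parts
df = df.join(parts)
print(df)
```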

Output:

Could someone please point me in the right direction and write the code? Thanks a lot!

Answer 1

The simplest way to handle this is to process each row of the CSV yourself and account for the rows where the Ordernummer is missing:

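- Read each row. If it has an Ordernummer, create a new record in the output.
- If it has no Ordernummer, update the last record you added, adding new fields for only the values you care about.
- Keep track of the maximum number of fields added, so that you can generate the header of the output CSV.
- Write the CSV yourself, or create a pandas.DataFrame to continue processing in a more familiar way.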
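The steps above can be sketched as a small, file-independent function that works on rows csv.reader has already produced. group_sub_rows and its parameters are hypothetical names; like the answer's code, it assumes a sub-row is exactly a row whose first cell is empty:

```python
def group_sub_rows(base_headers, rows, split_cols, split_names):
    """Fold sub-rows (empty first cell) into the preceding order's row."""
    records = []     # one flat list per order: main fields, then sub-row fields
    max_subrows = 0  # most sub-rows seen on any single order
    count = 0        # sub-rows seen on the current order
    for row in rows:
        if not row:
            continue  # skip blank lines
        if row[0]:  # a new Ordernummer starts a new record
            records.append(list(row))
            count = 0
        elif records:  # no Ordernummer: extend the last record
            records[-1].extend(row[c] for c in split_cols)
            count += 1
            max_subrows = max(max_subrows, count)
    # Header wide enough for the order with the most sub-rows
    header = list(base_headers) + [
        f'{name}_{i}' for i in range(1, max_subrows + 1) for name in split_names
    ]
    return header, records


rows = [
    ['96049', '73.00', '745.69', '736.95'],
    ['', '27404475', '1', '678.9'],
    ['96050', '536.0', '536.0', '536.0'],
    ['', '82047448', '1', '310'],
    ['', '75109997', '1', '158'],
]
# 'col1'..'col3' are placeholder names for the main header row
header, records = group_sub_rows(
    ['Ordernummer', 'col1', 'col2', 'col3'], rows,
    [1, 3], ['Artikelnummer', 'Artikelns styckpris inkl moms'],
)
print(header)
for record in records:
    print(record)
```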

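I recommend Python's built-in csv library, because it handles a lot of things well, such as different line endings and different delimiters, and can save you some headaches.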
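As a small illustration of what the csv module takes care of: a quoted field that contains the delimiter (here a decimal comma) stays one field, the surrounding quotes are removed, and \r\n line endings are handled. The inline data is made up for the example:

```python
import csv
import io

# Made-up CSV text: a quoted header, a quoted decimal comma, CRLF line endings,
# and a sub-row whose first cell is empty
data = 'Ordernummer,"Intakt inkl moms"\r\n96049,"73,00"\r\n,27404475\r\n'

# newline='' lets the csv module see the raw line endings itself
rows = list(csv.reader(io.StringIO(data, newline='')))
for row in rows:
    print(row)
```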

Disclaimer: I don't know what these words mean; I have just kept them as they appear in the question.

Here is a mock-up that meets your requirements. To adapt it to your needs, just update columns_to_split and columns_to_split_names to match your use case:

import csv
import pandas as pd

def _main():
    # In file name
    fname = 'orderexport_new.csv'
    # Out file name
    outname = 'ny_orderdata.csv'
    # Which columns do you want to add to new columns?
    columns_to_split = [1, 3]
    # What names to give those new columns?
    columns_to_split_names = ['Artikelnummer', 'Artikelns styckpris inkl moms']
    out_data = []
    # This is to track what the fields in the final CSV should be
    max_cols = None  # (max_cur_artikelnummer, columns_in_that_max_entry)
    cur_artikelnummer = 0  # sub-row counter for the current order

    # newline='' is what the csv module documentation recommends
    with open(fname, newline='') as csvfile:
        csvreader = csv.reader(csvfile)
        # Read the two header rows first
        headers1 = next(csvreader)
        headers2 = next(csvreader)  # sub-row field names; not used below
        # Iterate through each row in the CSV
        for row in csvreader:
            if not row:
                continue  # skip blank lines
            # Check if there is a value in the first cell
            #   This relies on the cell being empty; if it holds something else,
            #   modify this check accordingly
            if row[0]:  # This row has a new Ordernummer
                # Create a small dict with the headers matched to their field values
                new_record = dict(zip(headers1, row))
                out_data.append(new_record)
                # Reset the sub-row counter
                cur_artikelnummer = 0
            elif out_data:  # This row does not have a new Ordernummer
                # Increment cur_artikelnummer so that the columns get a new number
                cur_artikelnummer += 1
                # Iterate through each one so that you can add as many as you want above
                for i, column_to_split in enumerate(columns_to_split):
                    out_data[-1][f'{columns_to_split_names[i]}_{cur_artikelnummer}'] = row[column_to_split]
                # If this is the greatest value so far, update so that the CSV headers can be written
                if max_cols is None or max_cols[0] < cur_artikelnummer:
                    max_cols = (cur_artikelnummer, list(out_data[-1].keys()))

    # If no order had any sub-rows, fall back to the first header row
    header = max_cols[1] if max_cols else headers1

    # To create a similar CSV
    with open(outname, 'w', newline='') as fh:
        csvwriter = csv.writer(fh)
        # csvwriter.writerow(headers1)
        # csvwriter.writerow(headers2)
        csvwriter.writerow(header)
        for row in out_data:
            csvwriter.writerow(row.values())

    # If you want a pandas DataFrame instead:
    df = pd.DataFrame(out_data, columns=header)
    return df

if __name__ == '__main__':
    _main()
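If you would rather stay in pandas throughout, one alternative (a different technique from the loop above, not part of the original answer) is to forward-fill Ordernummer, number each order's sub-rows with groupby().cumcount(), and pivot them into wide columns. This sketch assumes the two header rows have been replaced by a single header with made-up names (col1 to col3) and uses the first three orders of the sample input inline:

```python
import io

import pandas as pd

# First three orders of the sample input; 'col1'..'col3' are placeholder names
data = """Ordernummer,col1,col2,col3
96049,73.00,745.69,736.95
,27404475,1,678.9
96050,536.0,536.0,536.0
,82047448,1,310
,75109997,1,158
96051,6073.0,6073.0,6073.0
,695352,2,3072.0
"""
df = pd.read_csv(io.StringIO(data), dtype=str)  # empty cells become NaN

is_order = df['Ordernummer'].notna()           # rows that start a new order
df['Ordernummer'] = df['Ordernummer'].ffill()  # carry the number down to sub-rows

orders = df[is_order].set_index('Ordernummer')
fields = ['Artikelnummer', 'Artikelns styckpris inkl moms']
items = df.loc[~is_order, ['Ordernummer', 'col1', 'col3']].rename(
    columns={'col1': fields[0], 'col3': fields[1]}
)
items['n'] = items.groupby('Ordernummer').cumcount() + 1  # 1, 2, ... per order

# One column per (field, sub-row number), ordered field_1, field_1, field_2, ...
wide = items.pivot(index='Ordernummer', columns='n', values=fields)
wide = wide[[(f, n) for n in sorted(items['n'].unique()) for f in fields]]
wide.columns = [f'{f}_{n}' for f, n in wide.columns]

result = orders.join(wide).reset_index()
print(result)
```

Orders without any sub-rows simply get NaN in the Artikelnummer_* columns, because join keeps every order row (a left join by default).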

Here is the input I copied:

Ordernummer	"Intakt inkl moms"	"Valutajusterad Intakt inkl moms"	"Dynamic value"
	Artikelnummer	Antal	"Artikelns styckpris inkl moms"
96049	73.00	745.69	736.95
	27404475	1	678.9
96050	536.0	536.0	536.0
	82047448	1	310
	75109997	1	158
96051	6073.0	6073.0	6073.0
	695352	2	3072.0
96052	550.0	556.65	564.74
	737-188-00	1	378.9
96053	550.0	556.65	564.74

Here is the output of the code shown above:

Ordernummer	Intakt inkl moms	Valutajusterad Intakt inkl moms	Dynamic value	Artikelnummer_1	Artikelns styckpris inkl moms_1	Artikelnummer_2	Artikelns styckpris inkl moms_2
96049	73.00	745.69	736.95	27404475	678.9
96050	536.0	536.0	536.0	82047448	310	75109997	158
96051	6073.0	6073.0	6073.0	695352	3072.0
96052	550.0	556.65	564.74	737-188-00	378.9
96053	550.0	556.65	564.74

Let me know if you have any questions.
