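I want to split the column "Intakt inkl moms Artikelnummer" on the semicolon into two or three columns (obs: other columns also need to be split, because the data is not in order), for example: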
Ordernummer:96049
Intakt inkl moms: 73,00
Artikelnummer1: 27404475
Ordernummer:96050
Intakt inkl moms: 536,00
Artikelnummer1: 82047448
Artikelnummer2:75109997
This is what I originally wrote:

    import pandas as pd

    # Read the CSV file into a DataFrame
    df = pd.read_csv('orderexport_new.csv', encoding='latin1')

    # Initialize empty lists to store the split values
    # (these must be lists, not dicts, because .append() is used below)
    Intakt_inkl_moms = []
    Artikelnr1 = []
    Artikelnr2 = []
    ordernummer_list = []

    # Iterate through each row in the DataFrame
    for index, row in df.iterrows():
        # Split the values in the "Intakt inkl moms" column
        values = row['Intakt inkl moms'].split(';')
        num_values = len(values)

        # Determine the number of values and append them to the corresponding lists
        if num_values >= 1:
            Intakt_inkl_moms.append(values[0])
        else:
            Intakt_inkl_moms.append(None)
        if num_values >= 2:
            Artikelnr1.append(values[1])
        else:
            Artikelnr1.append(None)
        if num_values >= 3:
            Artikelnr2.append(values[2])
        else:
            Artikelnr2.append(None)

        # Append the ordernummer to maintain alignment
        ordernummer_list.append(row['Ordernummer'])

    # Add the new columns to the DataFrame
    df['Intakt_inkl_moms'] = Intakt_inkl_moms
    df['Artikelnr1'] = Artikelnr1
    df['Artikelnr2'] = Artikelnr2
    df['Ordernummer'] = ordernummer_list

    # Drop the original "Intakt inkl moms" column
    # df.drop(['Intakt inkl moms'], axis=1, inplace=True)

    # Save the modified DataFrame to a new CSV file
    df.to_csv('ny_orderdata.csv', index=False)

    # Print the DataFrame to verify the changes
    print(df)

Any guidance and code would be much appreciated. Thanks a lot!
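The easiest way to handle this is to process each row of the CSV yourself and catch the rows where the Ordernummer is missing. You can then load the result into a `pandas.DataFrame` to continue processing in a more familiar way. I suggest using Python's built-in `csv` library, since it handles a lot of things well, such as various line endings and different delimiters, and will save you some trouble.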
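To illustrate the point about delimiters and line endings, here is a minimal sketch. The sample string is made up for the demonstration (semicolon-delimited, Windows line endings, one quoted field that itself contains a semicolon); it is not taken from your file.

```python
import csv
import io

# Hypothetical sample data: ';' as delimiter, '\r\n' line endings,
# and a quoted field that contains the delimiter itself.
sample = 'Ordernummer;Intakt inkl moms\r\n96049;"73,00"\r\n96050;"536;00"\r\n'

# newline='' leaves the line endings untouched so csv can handle them itself.
reader = csv.reader(io.StringIO(sample, newline=''), delimiter=';')
rows = list(reader)

for row in rows:
    print(row)
# ['Ordernummer', 'Intakt inkl moms']
# ['96049', '73,00']
# ['96050', '536;00']
```

A plain `line.split(';')` would break the last row into three pieces, which is the kind of trouble the `csv` module avoids.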
Disclaimer: I don't know what these words mean, I just kept them as they appear in the question body.
Here is a mockup that does what you asked. To adapt it to your needs, just update `columns_to_split` and `columns_to_split_names` to match your use case:

    import csv
    import pandas as pd


    def _main():
        # In file name
        fname = 'orderexport_new.csv'
        # Out file name
        outname = 'ny_orderdata.csv'
        # Which columns do you want to add to new columns?
        columns_to_split = [1, 3]
        # What names to give those new columns?
        columns_to_split_names = ['Artikelnummer', 'Artikelns styckpris inkl moms']
        out_data = []
        # This is to track what the fields in the final CSV should be
        max_cols = None  # (max_cur_artikelnummer, columns_in_that_max_entry)

        with open(fname) as csvfile:
            csvreader = csv.reader(csvfile)
            # Read the two headers first
            headers1 = next(csvreader)
            headers2 = next(csvreader)
            # Iterate through each row in the CSV
            for row in csvreader:
                # Check if there is a value in the first cell
                # This relies on the cell being empty, if it is some other thing,
                # then you should modify this check for that
                if row[0]:  # This row has a new Ordernummer
                    # Create a small dict with the headers matched to their field values
                    new_record = {header_name: field_value for header_name, field_value in zip(headers1, row)}
                    out_data.append(new_record)
                    # Reset the sub row counter
                    cur_artikelnummer = 0
                else:  # This row does not have a new Ordernummer
                    # Increment the cur_artikelnummer so that the columns get a new number
                    cur_artikelnummer += 1
                    # Iterate through each one so that you can add as many as you want above
                    for i, column_to_split in enumerate(columns_to_split):
                        out_data[-1][f'{columns_to_split_names[i]}_{cur_artikelnummer}'] = row[column_to_split]
                    # If this is the greatest value so far, update so that the CSV headers can be written
                    if not max_cols or max_cols[0] < cur_artikelnummer:
                        max_cols = (cur_artikelnummer, out_data[-1].keys())

        # To create a similar CSV
        with open(outname, 'w', newline='\n') as fh:
            csvwriter = csv.writer(fh)
            # csvwriter.writerow(headers1)
            # csvwriter.writerow(headers2)
            csvwriter.writerow(max_cols[1])
            for row in out_data:
                csvwriter.writerow(row.values())

        # If you want a pandas dataframe instead:
        df = pd.DataFrame(out_data, columns=list(max_cols[1]))


    if __name__ == '__main__':
        _main()

Here is the input that I replicated:
Ordernummer | Intakt inkl moms | Valutajusterad Intakt inkl moms | Dynamic value |
---|---|---|---|
Artikelnummer | Antal | Artikelns styckpris inkl moms | |
96049 | 73.00 | 745.69 | 736.95 |
27404475 | 1 | 678.9 | |
96050 | 536.0 | 536.0 | 536.0 |
82047448 | 1 | 310 | |
75109997 | 1 | 158 | |
96051 | 6073.0 | 6073.0 | 6073.0 |
695352 | 2 | 3072.0 | |
96052 | 550.0 | 556.65 | 564.74 |
737-188-00 | 1 | 378.9 | |
96053 | 550.0 | 556.65 | 564.74 |
Here is the output of the code shown above:
Ordernummer | Intakt inkl moms | Valutajusterad Intakt inkl moms | Dynamic value | Artikelnummer_1 | Artikelns styckpris inkl moms_1 | Artikelnummer_2 | Artikelns styckpris inkl moms_2 |
---|---|---|---|---|---|---|---|
96049 | 73.00 | 745.69 | 736.95 | 27404475 | 678.9 | | |
96050 | 536.0 | 536.0 | 536.0 | 82047448 | 310 | 75109997 | 158 |
96051 | 6073.0 | 6073.0 | 6073.0 | 695352 | 3072.0 | | |
96052 | 550.0 | 556.65 | 564.74 | 737-188-00 | 378.9 | | |
96053 | 550.0 | 556.65 | 564.74 | | | | |
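If you want to try the grouping logic without touching any files, the same idea can be sketched in memory. This is only a sketch under assumptions: `widen` is a hypothetical helper name, the sample is a shortened made-up layout with two header columns, and it assumes item rows start with an empty first cell. It uses `csv.DictWriter` with `restval=''` so that orders with fewer items are padded with empty cells, and it collects the field names as it goes so that a file with no item rows still gets a header.

```python
import csv
import io


def widen(lines, columns_to_split, names):
    """Fold item rows (empty first cell) into the preceding order row."""
    reader = csv.reader(lines)
    headers = next(reader)
    next(reader)  # skip the second header row
    fieldnames = list(headers)
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if row[0]:  # a new Ordernummer starts a new record
            records.append(dict(zip(headers, row)))
            n = 0
        elif records:  # an item row belonging to the last record
            n += 1
            for idx, name in zip(columns_to_split, names):
                key = f'{name}_{n}'
                records[-1][key] = row[idx]
                if key not in fieldnames:
                    fieldnames.append(key)
    return fieldnames, records


# Made-up sample in the assumed layout (not your real file)
sample = (
    'Ordernummer,Intakt inkl moms\n'
    ',Artikelnummer,Antal,Artikelns styckpris inkl moms\n'
    '96049,73.00\n'
    ',27404475,1,678.9\n'
    '96050,536.0\n'
    ',82047448,1,310\n'
    ',75109997,1,158\n'
    '96053,550.0\n'
)

fieldnames, records = widen(
    io.StringIO(sample),
    [1, 3],
    ['Artikelnummer', 'Artikelns styckpris inkl moms'],
)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

The `records` list can also be passed straight to `pd.DataFrame(records, columns=fieldnames)` if you prefer to continue in pandas.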
Let me know if you have any questions.