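I want to convert a CSV file to JSON, merging rows that share the same id so that their emails end up in a single list.
CSV file: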
id,name,email
1,jim,[email protected]
1,jim,[email protected]
2,kim,[email protected]
Expected output:
{"row" : {"id":1,"name":"jim","email": ["[email protected]","[email protected]"]}},
{"row" : {"id":2,"name":"kim","email": "[email protected]"}}
Here is my somewhat clunky implementation:
import csv
import json

with open('data.csv') as csvfile:
    reader = csv.reader(csvfile)
    # Get headers
    headers = next(reader, None)
    result = {}
    for row in reader:
        # Combine header and line to get a dict
        data = dict(zip(headers, row))
        if data['id'] not in result:
            # First time this id is seen: start a list of emails
            data.update({'email': [data.pop('email')]})
            result[data['id']] = data
        else:
            # Make sure the name is consistent for a repeated id
            assert data['name'] == result[data['id']]['name']
            result[data['id']]['email'].append(data['email'])

for rec in result.values():
    try:
        # Try to unpack as a single value; if that fails, leave the list as is
        rec['email'], = rec['email']
    except ValueError:
        pass
    print(json.dumps({'row': rec}))
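For comparison, a slightly tidier standard-library variant might look like the sketch below. It is only one possible approach: `group_rows` is a hypothetical helper name, the email addresses are made-up placeholders, and it groups on the `(id, name)` pair instead of asserting that the name matches. It also converts `id` to an integer so that the output matches the expected `"id":1` rather than `"id":"1"`.

```python
import csv
import io
import json


def group_rows(csvfile):
    """Group CSV rows by (id, name), collecting emails in input order."""
    grouped = {}  # plain dicts keep insertion order (Python 3.7+)
    for row in csv.DictReader(csvfile):
        key = (int(row["id"]), row["name"])
        grouped.setdefault(key, []).append(row["email"])
    return [
        {"row": {"id": id_, "name": name,
                 # a lone email becomes a scalar, several stay a list
                 "email": emails[0] if len(emails) == 1 else emails}}
        for (id_, name), emails in grouped.items()
    ]


# Placeholder data; the addresses are invented for illustration.
sample = io.StringIO(
    "id,name,email\n"
    "1,jim,jim@example.com\n"
    "1,jim,jim2@example.com\n"
    "2,kim,kim@example.com\n"
)
records = group_rows(sample)
for rec in records:
    print(json.dumps(rec))
```

To read from disk instead, pass an open file object, for example `group_rows(open('data.csv', newline=''))`.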
You can do this with pandas:
import pandas as pd
df = pd.read_csv('test.csv', index_col=None)
print(df)
#Output:
   id  name              email
0   1   jim  [email protected]
1   1   jim  [email protected]
2   2   kim  [email protected]
df1 = df.groupby(['id', 'name'])['email'].apply(list).reset_index()
df_json = df1.to_json(orient='index')
print(df_json)
#Output:
{"0":{"id":1,"name":"jim","email":["[email protected]","[email protected]"]},"1":{"id":2,"name":"kim","email":["[email protected]"]}}