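I am trying to read a csv file into a dataframe so I can convert it to nested json. The delimiter in the csv file is ";". After loading the data into the dataframe, I get the following: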
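name | frequency | frequency_start_date | frequency_count | schedule_type | day_of_week | comments |
---|---|---|---|---|---|---|
Pattern 1 | daily | 2022-02-01 | 1 | weekdays | Monday | null |
Pattern 2 | daily | 2024-01-01 | 1, 5 | weekdays,weekdays | Monday, Tuesday | null, null |
Pattern 3 | daily | 2021-03-21 | 1, 2 | weekdays,weekdays | Thursday,Friday | null, null |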
This is the code I wrote to load the csv into a dataframe:
df1_data = pd.read_csv(csv1, delimiter=';', keep_default_na=False, dtype=object)
This is the code I wrote to convert the dataframe to json:
nested_cols = ['frequency_count', 'schedule_type', 'day_of_week']
df1_data['patterns'] = df1_data[nested_cols].to_dict('records')
df1_nested2 = df1_data[['name', 'frequency', 'frequency_start_date', 'patterns', 'comments']].to_json(orient='records', indent=4)
When I run this, I get the following:
[
    {
        "name": "Pattern 1",
        "frequency": "daily",
        "frequency_start_date": "2022-02-01",
        "patterns": {
            "frequency_count": "1",
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    },
    {
        "name": "Pattern 2",
        "frequency": "daily",
        "frequency_start_date": "2024-01-01",
        "patterns": {
            "frequency_count": "1, 5",
            "schedule_type": "weekdays,weekdays",
            "day_of_week": "null, null"
        },
        "comments": null
    },
    {
        "name": "Pattern 3",
        "frequency": "daily",
        "frequency_start_date": "2021-03-21",
        "patterns": {
            "frequency_count": "1, 2",
            "schedule_type": "weekdays,weekdays",
            "day_of_week": "null, null"
        },
        "comments": null
    }
]
But this is what I want:
[
    {
        "name": "Pattern 1",
        "frequency": "daily",
        "frequency_start_date": "2022-02-01",
        "patterns": {
            "frequency_count": "1",
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    },
    {
        "name": "Pattern 2",
        "frequency": "daily",
        "frequency_start_date": "2024-01-01",
        "patterns": {
            "frequency_count": 1,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        {
            "frequency_count": 5,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    },
    {
        "name": "Pattern 3",
        "frequency": "daily",
        "frequency_start_date": "2021-03-21",
        "patterns": {
            "frequency_count": 1,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        {
            "frequency_count": 2,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    }
]
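This is where I'm stuck. Is the problem in reading the csv into the dataframe, or in converting the dataframe to json? It looks like the columns with multiple values are being read as single strings. I've read through the questions posted on StackOverflow, but none of them really match this case, or the answers don't help me solve it.
Any help would be greatly appreciated.
Thanks,
Beta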
Given:
        name  frequency  frequency_start_date  frequency_count       schedule_type      day_of_week    Comments
0  Pattern 1      daily            2022-02-01                1            weekdays           Monday         NaN
1  Pattern 2      daily            2024-01-01             1, 5  weekdays, weekdays  Monday, Tuesday  null, null
2  Pattern 3      daily            2021-03-21             1, 2  weekdays, weekdays  Thursday,Friday  null, null
Doing:
multi_cols = ["frequency_count", "schedule_type", "day_of_week", "Comments"]
for col in multi_cols:
    df[col] = df[col].str.split(", ?") # regex
df = df.explode(multi_cols)
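To see the split-and-explode step above in isolation, here is a minimal sketch on a made-up two-row frame. The frame `demo` and its column names `count` and `day` are hypothetical and only serve the illustration. It assumes pandas 1.4 or later, since it passes `regex=True` explicitly and explodes several columns at once.

```python
import pandas as pd

# Hypothetical frame: one cell with ", " as separator, one with a bare ",".
demo = pd.DataFrame({
    "name": ["A", "B"],
    "count": ["1, 5", "2"],
    "day": ["Monday,Tuesday", "Friday"],
})

for col in ["count", "day"]:
    # ", ?" is a regex: a comma followed by an optional space.
    demo[col] = demo[col].str.split(", ?", regex=True)

# Every listed column must hold the same number of items in a given row,
# otherwise explode raises a ValueError.
exploded = demo.explode(["count", "day"]).reset_index(drop=True)
print(exploded)
```

Each list element ends up on its own row, and the other columns (here `name`) are repeated alongside it.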
Then build the patterns column and fix the Comments column:
df["patterns"] = df[["frequency_count", "schedule_type", "day_of_week"]].to_dict("records")
df["Comments"] = df["Comments"].fillna("null")
output = df.pivot_table(
    index=["name", "frequency", "frequency_start_date", "Comments"],
    values="patterns",
    aggfunc=list,
).reset_index().to_json(orient="records", indent=4)
print(output)
Output:
[
    {
        "name":"Pattern 1",
        "frequency":"daily",
        "frequency_start_date":"2022-02-01",
        "Comments":"null",
        "patterns":[
            {
                "frequency_count":"1",
                "schedule_type":"weekdays",
                "day_of_week":"Monday"
            }
        ]
    },
    {
        "name":"Pattern 2",
        "frequency":"daily",
        "frequency_start_date":"2024-01-01",
        "Comments":"null",
        "patterns":[
            {
                "frequency_count":"1",
                "schedule_type":"weekdays",
                "day_of_week":"Monday"
            },
            {
                "frequency_count":"5",
                "schedule_type":"weekdays",
                "day_of_week":"Tuesday"
            }
        ]
    },
    {
        "name":"Pattern 3",
        "frequency":"daily",
        "frequency_start_date":"2021-03-21",
        "Comments":"null",
        "patterns":[
            {
                "frequency_count":"1",
                "schedule_type":"weekdays",
                "day_of_week":"Thursday"
            },
            {
                "frequency_count":"2",
                "schedule_type":"weekdays",
                "day_of_week":"Friday"
            }
        ]
    }
]
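The output above keeps frequency_count as strings and Comments as the text "null", whereas the question asked for numbers and a real JSON null. One possible variation, as a rough sketch only: it reads the data from an inline string (`csv_text`, a stand-in for the real file with values copied from the question's table), groups with `groupby` instead of `pivot_table`, and converts the counts with `int()`. The helper `clean_comment` is hypothetical, and it assumes that the literal text "null" in the comments column means "no comment".

```python
import io
import json

import pandas as pd

# Inline stand-in for the real csv file (assumption: same layout as the question).
csv_text = """name;frequency;frequency_start_date;frequency_count;schedule_type;day_of_week;comments
Pattern 1;daily;2022-02-01;1;weekdays;Monday;null
Pattern 2;daily;2024-01-01;1, 5;weekdays,weekdays;Monday, Tuesday;null, null
Pattern 3;daily;2021-03-21;1, 2;weekdays,weekdays;Thursday,Friday;null, null
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";",
                 keep_default_na=False, dtype=object)

nested_cols = ["frequency_count", "schedule_type", "day_of_week"]
for col in nested_cols:
    # Split on a comma with an optional trailing space.
    df[col] = df[col].str.split(", ?", regex=True)
df = df.explode(nested_cols)  # one row per pattern


def clean_comment(raw):
    """Hypothetical helper: treat "null" entries as missing, return None if none remain."""
    kept = [p.strip() for p in raw.split(",") if p.strip() not in ("", "null")]
    return ", ".join(kept) if kept else None


records = []
group_keys = ["name", "frequency", "frequency_start_date"]
for (name, frequency, start), group in df.groupby(group_keys, sort=False):
    patterns = [
        # Convert the count from str to a plain Python int for JSON.
        {**row, "frequency_count": int(row["frequency_count"])}
        for row in group[nested_cols].to_dict("records")
    ]
    records.append({
        "name": name,
        "frequency": frequency,
        "frequency_start_date": start,
        "patterns": patterns,
        "comments": clean_comment(group["comments"].iloc[0]),
    })

output = json.dumps(records, indent=4)
print(output)
```

Building the records in plain Python and serializing with `json.dumps` keeps the integer conversion and the null handling explicit; whether that is preferable to `pivot_table` plus `to_json` depends on how large the real file is.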