Hi everyone, I'm currently working on a school project and need to convert my dict into a DataFrame so that I can use it for machine learning.
myDic = {
    'Acura': {
        'CL': {
            '2003': {
                'transmission': '4',
                'engine': '1',
                'drivetrain': 'NHTSA: 13',
                'wheels_hubs': 'NHTSA: 8',
                'seat_belts_air_bags': 'NHTSA: 6',
                'brakes': 'NHTSA: 6',
                'lights': 'NHTSA: 5',
                'body_paint': 'NHTSA: 2',
                'fuel_system': 'NHTSA: 2',
                'electrical': 'NHTSA: 2',
                'suspension': 'NHTSA: 2',
                'miscellaneous': 'NHTSA: 1',
                'steering': 'NHTSA: 1'
            },
            '2002': {
                'transmission': '2',
                'engine': 'NHTSA: 8',
                'brakes': 'NHTSA: 7',
                'electrical': 'NHTSA: 4',
                'accessories-interior': 'NHTSA: 3',
                'seat_belts_air_bags': 'NHTSA: 3',
                'suspension': 'NHTSA: 2',
                'drivetrain': 'NHTSA: 2',
                'body_paint': 'NHTSA: 1',
                'accessories-exterior': 'NHTSA: 1',
                'windows_windshield': 'NHTSA: 1',
                'fuel_system': 'NHTSA: 1',
                'steering': 'NHTSA: 1',
                'miscellaneous': 'NHTSA: 1'
            }
        }
    }
}
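That is what it looks like. I can look up an entry with myDic['Acura']['CL']['2003'],
that is, 'brand'-'model'-'year', and it returns the problems reported for that car. So how can I convert this into a DataFrame? The columns would be brand, model, year, and the problems.
I assume this is what you're looking for: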
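import pandas as pd
# Flatten the three nesting levels into (brand, model, year) tuple keys.
restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
# Tuple keys become MultiIndex columns; transpose so each tuple is a row,
# then reset_index() turns the three index levels into ordinary columns.
df = pd.DataFrame(restructure_dict).T.reset_index()
df = df.rename(columns={'level_0': 'brand', 'level_1': 'model', 'level_2': 'year'})
print(df)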
And the output will be:
   brand  model  year  transmission    engine  drivetrain  wheels_hubs  seat_belts_air_bags    brakes    lights  body_paint  fuel_system  electrical  suspension  miscellaneous  steering  accessories-interior  accessories-exterior  windows_windshield
0  Acura     CL  2003             4         1   NHTSA: 13     NHTSA: 8             NHTSA: 6  NHTSA: 6  NHTSA: 5    NHTSA: 2     NHTSA: 2    NHTSA: 2    NHTSA: 2       NHTSA: 1  NHTSA: 1                   NaN                   NaN                 NaN
1  Acura     CL  2002             2  NHTSA: 8    NHTSA: 2          NaN             NHTSA: 3  NHTSA: 7       NaN    NHTSA: 1     NHTSA: 1    NHTSA: 4    NHTSA: 2       NHTSA: 1  NHTSA: 1              NHTSA: 3              NHTSA: 1            NHTSA: 1
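Since the goal is machine learning, the string values in this table will probably need to become numbers at some point. The following is only a rough sketch of one way to do that. It uses a trimmed-down `sample` dict in place of the full `myDic`, and it assumes (this is a guess, not something stated in the question) that both '4' and 'NHTSA: 13' are counts and that a missing entry means zero. The helper `to_count` is a made-up name for illustration.

```python
import pandas as pd

# Trimmed-down stand-in for myDic with the same nesting:
# brand -> model -> year -> {problem: value}
sample = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'drivetrain': 'NHTSA: 13'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

flat = {
    (brand, model, year): problems
    for brand, models in sample.items()
    for model, years in models.items()
    for year, problems in years.items()
}
df = pd.DataFrame(flat).T.reset_index()
df = df.rename(columns={'level_0': 'brand', 'level_1': 'model', 'level_2': 'year'})


def to_count(value):
    """Hypothetical helper: 'NHTSA: 13' -> 13, '4' -> 4, missing -> 0."""
    if pd.isna(value):
        return 0  # assumption: a missing problem category means no reports
    return int(str(value).replace('NHTSA: ', ''))


id_cols = ['brand', 'model', 'year']
problem_cols = [col for col in df.columns if col not in id_cols]
for col in problem_cols:
    df[col] = df[col].map(to_count)

# The year comes from a dict key, so it is a string until converted.
df['year'] = df['year'].astype(int)
print(df)
```

If the 'NHTSA: ' prefix actually marks a different data source than the bare numbers, it may be better to keep that information in a separate column instead of discarding it.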
Another possible solution is:
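import pandas as pd
# Same flattening as before: one (brand, model, year) tuple per innermost dict.
restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
# Without the transpose, the tuples stay as three-level column headers
# and each problem category is a row.
df = pd.DataFrame(restructure_dict)
print(df)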
The output is:
                          Acura
                             CL
                           2003      2002
transmission                  4         2
engine                        1  NHTSA: 8
drivetrain            NHTSA: 13  NHTSA: 2
wheels_hubs            NHTSA: 8       NaN
seat_belts_air_bags    NHTSA: 6  NHTSA: 3
brakes                 NHTSA: 6  NHTSA: 7
lights                 NHTSA: 5       NaN
body_paint             NHTSA: 2  NHTSA: 1
fuel_system            NHTSA: 2  NHTSA: 1
electrical             NHTSA: 2  NHTSA: 4
suspension             NHTSA: 2  NHTSA: 2
miscellaneous          NHTSA: 1  NHTSA: 1
steering               NHTSA: 1  NHTSA: 1
accessories-interior        NaN  NHTSA: 3
accessories-exterior        NaN  NHTSA: 1
windows_windshield          NaN  NHTSA: 1
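With this layout the columns form a three-level MultiIndex, so the lookup from the question (myDic['Acura']['CL']['2003']) has a direct counterpart on the DataFrame. Here is a small sketch, again using a trimmed-down `sample` dict as a stand-in for `myDic`; the variable names are only illustrative.

```python
import pandas as pd

# Trimmed-down stand-in for myDic with the same nesting.
sample = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'drivetrain': 'NHTSA: 13'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

flat = {
    (brand, model, year): problems
    for brand, models in sample.items()
    for model, years in models.items()
    for year, problems in years.items()
}
df = pd.DataFrame(flat)

# A full (brand, model, year) tuple selects one column as a Series
# indexed by problem category.
acura_cl_2003 = df[('Acura', 'CL', '2003')]
print(acura_cl_2003)

# A partial key selects everything under that brand; the matched
# level is dropped from the resulting column index.
acura_only = df['Acura']
print(acura_only.columns.tolist())
```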
Another option is the transposed version of the result above:
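import pandas as pd
# Same flattening as before: one (brand, model, year) tuple per innermost dict.
restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
# Transposing moves the (brand, model, year) tuples into a three-level row index.
df = pd.DataFrame(restructure_dict).T
print(df)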
The output is:
               transmission    engine  drivetrain  wheels_hubs  seat_belts_air_bags    brakes    lights  body_paint  fuel_system  electrical  suspension  miscellaneous  steering  accessories-interior  accessories-exterior  windows_windshield
Acura CL 2003             4         1   NHTSA: 13     NHTSA: 8             NHTSA: 6  NHTSA: 6  NHTSA: 5    NHTSA: 2     NHTSA: 2    NHTSA: 2    NHTSA: 2       NHTSA: 1  NHTSA: 1                   NaN                   NaN                 NaN
         2002             2  NHTSA: 8    NHTSA: 2          NaN             NHTSA: 3  NHTSA: 7       NaN    NHTSA: 1     NHTSA: 1    NHTSA: 4    NHTSA: 2       NHTSA: 1  NHTSA: 1              NHTSA: 3              NHTSA: 1            NHTSA: 1
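If the question is read literally (columns brand, model, year, and problem), a long layout with one row per reported problem may be closer to what was asked. This is only a sketch of one way to get there from the transposed frame, using `DataFrame.melt`. The `sample` dict is a trimmed-down stand-in for `myDic`, and the column names 'problem' and 'value' are my own choice, not something from the question.

```python
import pandas as pd

# Trimmed-down stand-in for myDic with the same nesting.
sample = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'drivetrain': 'NHTSA: 13'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

flat = {
    (brand, model, year): problems
    for brand, models in sample.items()
    for model, years in models.items()
    for year, problems in years.items()
}
df = pd.DataFrame(flat).T
# Name the index levels so reset_index() yields readable column names.
df.index.names = ['brand', 'model', 'year']

long_df = (
    df.reset_index()
    # Turn every problem column into (problem, value) pairs.
    .melt(id_vars=['brand', 'model', 'year'], var_name='problem', value_name='value')
    # Drop combinations that were never reported for that car and year.
    .dropna(subset=['value'])
    .sort_values(['brand', 'model', 'year', 'problem'])
    .reset_index(drop=True)
)
print(long_df)
```

The wide layout is usually easier to feed straight into a model, while the long layout is handier for filtering and grouping, so which one fits better depends on the next step of the project.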