Convert a 3-level dictionary to a DataFrame


I have this 3-level dictionary:

`import pandas as pd
from operator import add

d1={
'Sweden':{'jan':{
    
'0-5': 5,
'6-8': 8,
'9-10':19,
'11-15': 14,
'16-18': 24},
    
'march':{        
    
'0-5': 5,
'6-8': 18,
'9-10': 9,
'11-15': 14,
'16-18': 24},
      
'feb':{        
'0-5': 5,
'6-8': 7,
'9-10': 3,
'11-15': 14,
'16-18': 24}},

'Norway':{'jan':{ 
'0-5': 25,
'6-8': 8,
'9-10': 45,
'11-15': 14,
'16-18': 24},
'march':{        
'0-5': 2,
'6-8': 8,
'9-10': 88,
'11-15': 14,
'16-18': 24},
      
'feb':{        
'0-5': 5,
'6-8': 48,
'9-10': 9,
'11-15': 39,
'16-18': 24}}

}`

I can unpack it into the DataFrame I want using nested for loops:

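`colnames = ['country', 'month', 'age', 'revenue']
lst = []
for i in d1.keys():
    for j in d1[i].keys():
        # (age, revenue) pairs for this country and month
        revenue = list(d1[i][j].items())
        # prepend (country, month) to each pair via tuple concatenation
        l1 = list(map(add, [(i, j)] * len(revenue), revenue))
        lst = lst + l1

df = pd.DataFrame.from_records(lst, columns=colnames)`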

This produces a DataFrame of shape (30, 4).

Does pandas have a built-in function that does this better, without for loops?

pandas dictionary unpack
1 Answer

You can reshape with pandas functions (`pd.concat`, `stack`, `reset_index`), but this may be less efficient:

out = (pd.concat({k: pd.DataFrame(d).rename_axis(index='age', columns='month')
                  for k, d in d1.items()},
                 names=['country'])
         .stack().reset_index(name='revenue')
      )
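
Note that the reshaping chain above returns its columns in the order country, age, month, revenue, because `stack()` appends the former column level after the existing index levels. The following is a minimal sketch of the same chain with the columns reordered to match the layout asked for in the question. It uses a reduced two-month sample, and the names `small`, `frames`, and `long_df` are hypothetical, not from the original answer.

```python
import pandas as pd

# Reduced sample (hypothetical data with the same nesting as d1)
small = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 7}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 48}},
}

# One frame per country: ages as the index, months as the columns
frames = {
    country: pd.DataFrame(months).rename_axis(index='age', columns='month')
    for country, months in small.items()
}

# Concatenate under a 'country' level, move months into the index,
# then flatten all index levels back into ordinary columns
long_df = pd.concat(frames, names=['country']).stack().reset_index(name='revenue')

# Select the columns in the order requested in the question
long_df = long_df[['country', 'month', 'age', 'revenue']]
print(long_df)
```

The row order may differ from the nested-loop result, so sort explicitly if the order matters.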

A variant of your code using a list comprehension:

out = pd.DataFrame([(k1, k2, k3, v3) for k1, d in d1.items()
                    for k2, d2 in d.items()
                    for k3, v3 in d2.items()],
                    columns=['country', 'month', 'age', 'revenue'])

Output:

   country  month    age  revenue
0   Sweden    jan    0-5        5
1   Sweden    jan    6-8        8
2   Sweden    jan   9-10       19
3   Sweden    jan  11-15       14
4   Sweden    jan  16-18       24
5   Sweden  march    0-5        5
6   Sweden  march    6-8       18
7   Sweden  march   9-10        9
8   Sweden  march  11-15       14
9   Sweden  march  16-18       24
10  Sweden    feb    0-5        5
11  Sweden    feb    6-8        7
12  Sweden    feb   9-10        3
13  Sweden    feb  11-15       14
14  Sweden    feb  16-18       24
15  Norway    jan    0-5       25
16  Norway    jan    6-8        8
17  Norway    jan   9-10       45
18  Norway    jan  11-15       14
19  Norway    jan  16-18       24
20  Norway  march    0-5        2
21  Norway  march    6-8        8
22  Norway  march   9-10       88
23  Norway  march  11-15       14
24  Norway  march  16-18       24
25  Norway    feb    0-5        5
26  Norway    feb    6-8       48
27  Norway    feb   9-10        9
28  Norway    feb  11-15       39
29  Norway    feb  16-18       24
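
A related variant, not part of the original answer, is to flatten the nesting into a dictionary keyed by (country, month, age) tuples and let pandas build a MultiIndex from those keys. This is only a sketch on a reduced sample; the names `small`, `flat`, and `out_small` are hypothetical.

```python
import pandas as pd

# Reduced sample (hypothetical data with the same nesting as d1)
small = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 7}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 48}},
}

# Dictionary comprehension: one (country, month, age) tuple key per revenue value
flat = {
    (country, month, age): revenue
    for country, months in small.items()
    for month, ages in months.items()
    for age, revenue in ages.items()
}

# Tuple keys become a three-level MultiIndex; name the levels,
# then turn them into ordinary columns
out_small = (pd.Series(flat)
               .rename_axis(['country', 'month', 'age'])
               .reset_index(name='revenue'))
print(out_small)
```

The intermediate Series is also convenient if you want to keep the hierarchical index, for example to select one country with `.loc['Sweden']`.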

Timings:

# list comprehension
148 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# pandas reshaping
1.54 ms ± 21.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
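
The figures above look like IPython `%timeit` output. As a rough way to repeat the comparison in a plain script, here is a sketch using the standard `timeit` module on a reduced sample. The function names `with_comprehension` and `with_reshaping`, the sample `small`, and the repeat count `n` are all assumptions for illustration, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Reduced sample (hypothetical data with the same nesting as d1)
small = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 7}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 48}},
}


def with_comprehension(d):
    """Flatten with a list comprehension, then build the frame once."""
    return pd.DataFrame(
        [(k1, k2, k3, v3)
         for k1, d2 in d.items()
         for k2, d3 in d2.items()
         for k3, v3 in d3.items()],
        columns=['country', 'month', 'age', 'revenue'])


def with_reshaping(d):
    """Build one frame per country, then concat, stack, and reset the index."""
    return (pd.concat({k: pd.DataFrame(v).rename_axis(index='age', columns='month')
                       for k, v in d.items()},
                      names=['country'])
              .stack().reset_index(name='revenue'))


n = 200  # number of calls per measurement (arbitrary choice)
t_comp = timeit.timeit(lambda: with_comprehension(small), number=n) / n
t_reshape = timeit.timeit(lambda: with_reshaping(small), number=n) / n
print(f"list comprehension: {t_comp * 1e6:.0f} µs per call")
print(f"pandas reshaping:   {t_reshape * 1e6:.0f} µs per call")
```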