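I have a nested dictionary that I want to turn into a DataFrame with a MultiIndex on both the rows and the columns, as shown below. However, my data somehow gets lost in the resulting table.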
test= {12: {'Category 1': {'TestA': {'att_1': 1, 'att_2': 'whatever'}, 'TestB': {'att_1': 3, 'att_2': 'spring'}}, 'Category 2': {'TestA': {'att_1': 23, 'att_2': 'another'}, 'TestB': {'att_1': 9, 'att_2': 'summer'}}}, 15: {'Category 1': {'TestA': {'att_1': 10, 'att_2': 'foo'}, 'TestB': {'att_1': 29, 'att_2': 'fall'}}, 'Category 2': {'TestA': {'att_1': 30, 'att_2': 'bar'}, 'TestB': {'att_1': 36, 'att_2': 'winter'}}}}
columns=pd.MultiIndex.from_arrays([['TestA','TestA','TestB','TestB'],['att_1','att_2','att_1','att_2']])
The format I want:
               TestA        TestB
               att_1 att_2  att_1 att_2
12 Category 1  NaN   NaN    NaN   NaN
   Category 2  NaN   NaN    NaN   NaN
15 Category 1  NaN   NaN    NaN   NaN
   Category 2  NaN   NaN    NaN   NaN
This is what I tried:
pd.DataFrame(test,index=pd.MultiIndex.from_arrays([[12,12,15,15],['Category 1','Category 2','Category 1','Category 2']]),columns=pd.MultiIndex.from_arrays([['TestA','TestA','TestB','TestB'],['att_1','att_2','att_1','att_2']]))
My data is lost, and every cell comes out as NaN:
               TestA        TestB
               att_1 att_2  att_1 att_2
12 Category 1  NaN   NaN    NaN   NaN
   Category 2  NaN   NaN    NaN   NaN
15 Category 1  NaN   NaN    NaN   NaN
   Category 2  NaN   NaN    NaN   NaN
If I only use a MultiIndex for the rows, it works, but I want a MultiIndex on both the rows and the columns.
pd.DataFrame.from_dict({(i, j): test[i][j]
                        for i in test.keys()
                        for j in test[i].keys()},
                       orient='index')
               TestA                              TestB
12 Category 1  {'att_1': 1, 'att_2': 'whatever'}  {'att_1': 3, 'att_2': 'spring'}
   Category 2  {'att_1': 23, 'att_2': 'another'}  {'att_1': 9, 'att_2': 'summer'}
15 Category 1  {'att_1': 10, 'att_2': 'foo'}      {'att_1': 29, 'att_2': 'fall'}
   Category 2  {'att_1': 30, 'att_2': 'bar'}      {'att_1': 36, 'att_2': 'winter'}
You can get the desired DataFrame like this:
import pandas as pd
import numpy as np
test= {12: {'Category 1': {'TestA': {'att_1': 1, 'att_2': 'whatever'}, 'TestB': {'att_1': 3, 'att_2': 'spring'}}, 'Category 2': {'TestA': {'att_1': 23, 'att_2': 'another'}, 'TestB': {'att_1': 9, 'att_2': 'summer'}}}, 15: {'Category 1': {'TestA': {'att_1': 10, 'att_2': 'foo'}, 'TestB': {'att_1': 29, 'att_2': 'fall'}}, 'Category 2': {'TestA': {'att_1': 30, 'att_2': 'bar'}, 'TestB': {'att_1': 36, 'att_2': 'winter'}}}}
# Row indexes
row_index = [[12,12,15,15],['Category 1','Category 2','Category 1','Category 2']]
# Column indexes
col_index = [['TestA','TestA','TestB','TestB'],['att_1','att_2','att_1','att_2']]
# Values row wise
values =[1,'whatever',3,'spring',23,'another',9,'summer',10,'foo',29,'fall',30,'bar',36,'winter']
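# Convert the list of values to a NumPy array.
# dtype=object keeps the integers as integers; without it NumPy casts every value to a string.
value = np.array(values, dtype=object)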
# Reshape the value as (4,4) array as the matrix/dataframe is of shape (4,4)
value = value.reshape(4,4)
# Get your required data frame
pd.DataFrame(value, index=row_index, columns=col_index)
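As far as I can tell, the original attempt returns only NaN because, when `pd.DataFrame` is given a dict together with `index=` and `columns=`, it looks those labels up among the dict's own keys (here `12` and `15`) and finds no match. Typing the values out by hand also does not scale, so here is a minimal sketch that walks the nested dictionary instead. The helper name `nested_dict_to_frame` is my own and not part of pandas, and the sketch assumes every category contains the same tests and attributes as the first one.

```python
import pandas as pd

test = {
    12: {'Category 1': {'TestA': {'att_1': 1, 'att_2': 'whatever'},
                        'TestB': {'att_1': 3, 'att_2': 'spring'}},
         'Category 2': {'TestA': {'att_1': 23, 'att_2': 'another'},
                        'TestB': {'att_1': 9, 'att_2': 'summer'}}},
    15: {'Category 1': {'TestA': {'att_1': 10, 'att_2': 'foo'},
                        'TestB': {'att_1': 29, 'att_2': 'fall'}},
         'Category 2': {'TestA': {'att_1': 30, 'att_2': 'bar'},
                        'TestB': {'att_1': 36, 'att_2': 'winter'}}},
}


def nested_dict_to_frame(nested):
    """Hypothetical helper: 4-level nested dict -> DataFrame with 2-level rows and columns."""
    # Row labels: one (outer key, category) tuple per row
    row_keys = [(outer, category)
                for outer, categories in nested.items()
                for category in categories]
    # Column labels: taken from the first row (assumes all rows share the same layout)
    first_outer, first_category = row_keys[0]
    col_keys = [(test_name, attr)
                for test_name, attrs in nested[first_outer][first_category].items()
                for attr in attrs]
    # Cell values, row by row, in the same order as the column labels
    data = [[nested[outer][category][test_name][attr]
             for test_name, attr in col_keys]
            for outer, category in row_keys]
    return pd.DataFrame(data,
                        index=pd.MultiIndex.from_tuples(row_keys),
                        columns=pd.MultiIndex.from_tuples(col_keys))


df = nested_dict_to_frame(test)
print(df)
```

Because the values are passed as a list of rows rather than through a single NumPy array, the `att_1` columns should keep a numeric dtype while the `att_2` columns hold strings.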