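I want to convert a MultiIndex DataFrame into a series of adjacency matrices, or into a 3D NumPy array indexed by a time coordinate.
Here is the DataFrame: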
import numpy as np
import pandas as pd

Boxes = {'Date': ['2016-01-01 00:00:00', '2016-01-01 00:00:00',
                  '2016-01-01 00:00:00', '2016-01-01 12:00:00', '2016-01-01 12:00:00',
                  '2016-01-01 12:00:00', '2016-01-01 17:54:00', '2016-01-01 22:44:00'],
         'From': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
         'To': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle'],
         # Numeric quantities, so duplicate rows add up (12 + 3 = 15) instead of concatenating as strings
         'Qty': [12, 3, 43, 125, 34, 76, 9, 222]}
df = pd.DataFrame(Boxes, columns=['Date', 'From', 'To', 'Qty'])
I can create a MultiIndex DataFrame with:
dups = df.pivot_table(index=['Date'], columns=['From', 'To'], values=['Qty'], aggfunc='sum').fillna(0)
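For reference, here is a small self-contained sketch of what that pivot produces. It assumes `Qty` holds integers and uses `aggfunc='sum'`; the names `n_levels` and `pairs` are only illustrative. The point is that passing `values` as a list leaves a three-level column index (`Qty`, then `From`, then `To`), which matters for any later reshape.

```python
import pandas as pd

# Same toy data as in the question, with Qty as integers (an assumption).
df = pd.DataFrame({
    'Date': ['2016-01-01 00:00:00'] * 3 + ['2016-01-01 12:00:00'] * 3
            + ['2016-01-01 17:54:00', '2016-01-01 22:44:00'],
    'From': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'To': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square',
           'Square', 'Square', 'Rectangle'],
    'Qty': [12, 3, 43, 125, 34, 76, 9, 222],
})

dups = df.pivot_table(index=['Date'], columns=['From', 'To'],
                      values=['Qty'], aggfunc='sum').fillna(0)

# Because values=['Qty'] is a list, the columns carry three levels:
# level 0 is 'Qty', level 1 is From, level 2 is To.
n_levels = dups.columns.nlevels

# The (From, To) pairs that actually occur, in the pivot's sorted column order.
pairs = list(dups['Qty'].columns)

print(dups.shape)
print(n_levels)
print(pairs)
```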
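What is the best way to convert this MultiIndex DataFrame into a sequence of adjacency matrices indexed by the time component? Alternatively, how can I create a 3D NumPy array like the following: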
[[[ 0. 0. 0. 15. 43.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 125. 34.]
[ 0. 0. 0. 0. 76.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 9.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 222. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]]
Since these matrices will be sparse, an adjacency list may be a more efficient answer. Thanks!
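As a rough illustration of the adjacency-list idea, the sketch below groups the raw rows and keeps only the edges that actually occur at each timestamp. It assumes `Qty` is numeric; the names `edges` and `adjacency` and the `(source, target, weight)` tuple layout are choices made for this example, not something from the original post.

```python
import pandas as pd

# Same toy data as in the question, with Qty as integers (an assumption).
df = pd.DataFrame({
    'Date': ['2016-01-01 00:00:00'] * 3 + ['2016-01-01 12:00:00'] * 3
            + ['2016-01-01 17:54:00', '2016-01-01 22:44:00'],
    'From': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'To': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square',
           'Square', 'Square', 'Rectangle'],
    'Qty': [12, 3, 43, 125, 34, 76, 9, 222],
})

# Sum duplicate (Date, From, To) rows; only edges present in the data survive.
edges = df.groupby(['Date', 'From', 'To'], as_index=False)['Qty'].sum()

# Map each timestamp to a list of (source, target, weight) tuples.
adjacency = {
    date: [(f, t, int(q)) for f, t, q in zip(g['From'], g['To'], g['Qty'])]
    for date, g in edges.groupby('Date')
}

for date, edge_list in adjacency.items():
    print(date, edge_list)
```

Nothing is stored for absent edges, so memory grows with the number of observed edges rather than with the square of the node count.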
Since you did not provide the expected output, I can only show how to convert it to a 3D array:
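# Column level 0 is 'Qty'; levels 1 and 2 are From and To
d1 = len(dups.columns.get_level_values(1).unique())  # number of distinct From values
d2 = len(dups.columns.get_level_values(2).unique())  # number of distinct To values
# One (From x To) matrix per date
a = dups.values.reshape((len(dups), d1, d2))
a.shape
Out[450]: (4, 3, 2)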
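Note that the reshape needs the pivot to have exactly `d1 * d2` columns, which holds here because every From/To combination occurs at least once. If you want the square 5 x 5 matrices from the question instead, a minimal sketch is to fill a zero array directly from the raw rows. The node order below is an assumption read off the expected output in the question, and `nodes`, `node_idx`, `date_idx`, and `adj` are names made up for this example.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question, with Qty as integers (an assumption).
df = pd.DataFrame({
    'Date': ['2016-01-01 00:00:00'] * 3 + ['2016-01-01 12:00:00'] * 3
            + ['2016-01-01 17:54:00', '2016-01-01 22:44:00'],
    'From': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'To': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square',
           'Square', 'Square', 'Rectangle'],
    'Qty': [12, 3, 43, 125, 34, 76, 9, 222],
})

# Assumed node order, chosen to match the matrices shown in the question.
nodes = ['Green', 'Blue', 'Red', 'Rectangle', 'Square']
dates = sorted(df['Date'].unique())

node_idx = {n: i for i, n in enumerate(nodes)}
date_idx = {d: i for i, d in enumerate(dates)}

# One square adjacency matrix per timestamp.
adj = np.zeros((len(dates), len(nodes), len(nodes)))

# np.add.at accumulates repeated (date, from, to) triples instead of
# overwriting, so the two Green -> Rectangle rows add up to 15.
np.add.at(
    adj,
    (df['Date'].map(date_idx).to_numpy(),
     df['From'].map(node_idx).to_numpy(),
     df['To'].map(node_idx).to_numpy()),
    df['Qty'].to_numpy(),
)

print(adj.shape)
print(adj[0])
```

Here `adj[k]` is the adjacency matrix for `dates[k]`, so `dates` serves as the time index for the first axis.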