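I'm struggling with the output of scipy.stats.binned_statistic_dd(). I have an array of positions and an array of IDs, which I am binning in three directions. I pass lists of bin edges as input rather than a number of bins per direction together with the range option. I have 3 bins in x, 2 bins in y, and 3 bins in z, so 18 bins in total.
However, when I check the bin numbers that are returned, they are all greater than 20. How can I get bin numbers that reflect the number of bins I supplied and get rid of all the extra bins?
I tried to follow the suggestions in this post (Output in scipy.stats.binned_statistic_dd()), which deals with a similar problem, but I don't understand how to apply it to my case. As usual, the documentation is rather cryptic.
Any help getting bin numbers between 1 and 18 in this example would be greatly appreciated!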
import numpy as np
from scipy import stats

pos = np.array([[-0.02042167, -0.0223282 , 0.00123734],
[-0.0420364 , 0.01196078, 0.00694259],
[-0.09625651, -0.00311446, 0.06125461],
[-0.07693234, -0.02749618, 0.03617278],
[-0.07578646, 0.01199925, 0.02991888],
[-0.03258293, -0.00371765, 0.04245596],
[-0.06765955, 0.02798434, 0.07075846],
[-0.02431445, 0.02774102, 0.06719837],
[ 0.02798265, -0.01096739, -0.01658691],
[-0.00584252, 0.02043389, -0.00827088],
[ 0.00623063, -0.02642285, 0.03232817],
[ 0.00884222, 0.01498996, 0.02912483],
[ 0.07189474, -0.01541584, 0.01916607],
[ 0.07239394, 0.0059483 , 0.0740187 ],
[-0.08519159, -0.02894125, 0.10923724],
[-0.10803509, 0.01365444, 0.09555333],
[-0.0442866 , -0.00845725, 0.10361843],
[-0.04246779, 0.00396127, 0.1418258 ],
[-0.08975861, 0.02999023, 0.12713186],
[ 0.01772454, -0.0020405 , 0.08824418]])
ids = np.array([16, 9, 6, 19, 1, 4, 10, 5, 18, 11, 2, 12, 13, 8, 3, 17, 14,
15, 20, 7])
xbinEdges = np.array([-0.15298488, -0.05108961, 0.05080566, 0.15270093])
ybinEdges = np.array([-0.051, 0. , 0.051])
zbinEdges = np.array([-0.053, 0.049, 0.151, 0.253])
# First attempt: bin using explicit edge arrays for each dimension.
ret = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                statistic='count', expand_binnumbers=False)
bincounts = ret.statistic
binnumber = ret.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)
# Second attempt: the same binning given as bin counts plus explicit ranges.
ranges = [[-0.15298488071, 0.15270092971],
[-0.051000000000000004, 0.051000000000000004],
[-0.0530000000000001, 0.25300000000000006]]
ret3 = stats.binned_statistic_dd(pos, ids, bins=(3,2,3), statistic='count', expand_binnumbers=False, range=ranges)
bincounts = ret3.statistic
binnumber = ret3.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)
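OK, after a few days of thinking about this in the background and a quick look through the binned_statistic_dd() source code, I think I have arrived at the right answer, and it is quite simple.
It seems that binned_statistic_dd() adds an extra set of outlier bins on each side of every dimension during the binning stage and then removes them when returning the histogram results, but leaves the bin numbers untouched (I assume this is in case you want to reuse the output for other statistics). The bin numbers are therefore counted on a 5 x 4 x 5 grid rather than the 3 x 2 x 3 grid I asked for, which is why values such as 46 or 72 show up.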
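To make the padding concrete, here is a small sketch that decodes a few of the flat bin numbers from the question on the padded grid. It reflects my reading of the behaviour described above rather than anything stated in the documentation, and the name padded_shape is only illustrative.

```python
import numpy as np

# 3 x 2 x 3 requested bins, plus one outlier bin on each side of every dimension.
padded_shape = (3 + 2, 2 + 2, 3 + 2)   # (5, 4, 5), i.e. 100 possible flat numbers

flat = np.array([46, 26, 72])          # a few values from the binnumber output in the question
ix, iy, iz = np.unravel_index(flat, padded_shape)

# Per-dimension indices on the padded grid; 0 and n + 1 are the outlier bins,
# so in-range samples land in 1..n along each axis.
print(ix, iy, iz)
```

For example, 46 decodes to (2, 1, 1) on the padded grid, which is bin (1, 0, 0) of the 3 x 2 x 3 grid once the outlier offset is removed.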
So if you export the expanded bin numbers (expand_binnumbers=True) and then subtract 1 from each of them to shift the bin indices, you can compute the "correct" bin ids.
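ret2 = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                 statistic='count', expand_binnumbers=True)
bincounts2 = ret2.statistic
binnumber2 = ret2.binnumber   # shape (3, N): one row of per-dimension bin indices
indxnum2 = binnumber2 - 1     # drop the outlier-bin offset to get 0-based indices
# Number of bins per dimension (one fewer than the number of edges).
numX, numY, numZ = len(xbinEdges) - 1, len(ybinEdges) - 1, len(zbinEdges) - 1
# Flat ids run from 0 to 17; add 1 for the 1-18 numbering asked for in the question.
corrected_bin_ids = np.ravel_multi_index(tuple(indxnum2), (numX, numY, numZ))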
Simple and quick in the end!
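If you already have the flat bin numbers (expand_binnumbers=False), the same correction can be sketched as a small helper. Note that compact_bin_ids is a hypothetical name, not part of SciPy, and the sketch assumes every sample falls inside the supplied edges; a sample that landed in an outlier bin would make np.ravel_multi_index raise an error instead of returning an id.

```python
import numpy as np

def compact_bin_ids(flat_binnumber, nbins, one_based=False):
    """Map flat bin numbers on the padded grid to ids on the requested grid.

    Hypothetical helper: assumes the flat numbers were raveled in C order over
    a grid with one extra outlier bin on each side of every dimension.
    """
    padded_shape = tuple(n + 2 for n in nbins)
    # Recover per-dimension indices on the padded grid.
    idx = np.unravel_index(np.asarray(flat_binnumber), padded_shape)
    # Remove the outlier-bin offset so indices start at 0.
    idx = tuple(i - 1 for i in idx)
    # Re-ravel on the grid that was actually requested.
    ids = np.ravel_multi_index(idx, nbins)
    return ids + 1 if one_based else ids

# Flat bin numbers copied from the output shown in the question.
binnumber = np.array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51,
                      46, 51, 66, 72, 27, 32, 47, 52, 32, 47])

bin_ids = compact_bin_ids(binnumber, (3, 2, 3), one_based=True)
print(bin_ids)
```

As a sanity check, np.bincount(compact_bin_ids(binnumber, (3, 2, 3)), minlength=18).reshape(3, 2, 3) should reproduce ret.statistic when statistic='count'.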