scipy.stats.binned_statistic_dd() bin numbers include many extra bins

Question

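I'm struggling with the results of scipy.stats.binned_statistic_dd(). I have an array of positions and another array of IDs, which I bin along 3 directions. I pass lists of bin edges as input, rather than the number of bins in each direction together with the range option. I have 3 bins in x, 2 bins in y and 3 bins in z, i.e. 18 bins.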

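However, when I check the bin numbers that are returned, they are all greater than 20. How do I get bin numbers that reflect the number of bins I provided, and get rid of all the extra bins?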

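I have tried to follow the suggestions in this post (Output in scipy.stats.binned_statistic_dd()), which deals with a similar problem, but I don't understand how to apply them to my case. As usual, the documentation is rather cryptic.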

Any help getting bin numbers between 1 and 18 in this example would be greatly appreciated!

import numpy as np
from scipy import stats

pos = np.array([[-0.02042167, -0.0223282 ,  0.00123734],
       [-0.0420364 ,  0.01196078,  0.00694259],
       [-0.09625651, -0.00311446,  0.06125461],
       [-0.07693234, -0.02749618,  0.03617278],
       [-0.07578646,  0.01199925,  0.02991888],
       [-0.03258293, -0.00371765,  0.04245596],
       [-0.06765955,  0.02798434,  0.07075846],
       [-0.02431445,  0.02774102,  0.06719837],
       [ 0.02798265, -0.01096739, -0.01658691],
       [-0.00584252,  0.02043389, -0.00827088],
       [ 0.00623063, -0.02642285,  0.03232817],
       [ 0.00884222,  0.01498996,  0.02912483],
       [ 0.07189474, -0.01541584,  0.01916607],
       [ 0.07239394,  0.0059483 ,  0.0740187 ],
       [-0.08519159, -0.02894125,  0.10923724],
       [-0.10803509,  0.01365444,  0.09555333],
       [-0.0442866 , -0.00845725,  0.10361843],
       [-0.04246779,  0.00396127,  0.1418258 ],
       [-0.08975861,  0.02999023,  0.12713186],
       [ 0.01772454, -0.0020405 ,  0.08824418]])

ids = np.array([16,  9,  6, 19,  1,  4, 10,  5, 18, 11,  2, 12, 13,  8,  3, 17, 14,
       15, 20,  7])

xbinEdges = np.array([-0.15298488, -0.05108961,  0.05080566,  0.15270093])
ybinEdges = np.array([-0.051,  0.   ,  0.051])
zbinEdges = np.array([-0.053,  0.049,  0.151,  0.253])

ret = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                statistic='count', expand_binnumbers=False)
bincounts = ret.statistic
binnumber = ret.binnumber.T

>>> binnumber
array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
       52, 32, 47], dtype=int64)

ranges = [[-0.15298488071, 0.15270092971],
 [-0.051000000000000004, 0.051000000000000004],
 [-0.0530000000000001, 0.25300000000000006]]

ret3 = stats.binned_statistic_dd(pos, ids, bins=(3,2,3), statistic='count', expand_binnumbers=False, range=ranges)
bincounts = ret3.statistic
binnumber = ret3.binnumber.T

>>> binnumber
array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
       52, 32, 47], dtype=int64)
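To see where numbers such as 46 come from, the reported values can be decoded by hand. The sketch below assumes, consistent with the output above, that each linear bin number is a C-order index into a grid padded with one outlier bin on each side of every axis. It uses only NumPy.

```python
import numpy as np

# Bin counts per axis in the question: 3 in x, 2 in y, 3 in z.
n_bins = (3, 2, 3)

# Assumed layout: one outlier bin below and above each axis,
# giving (3 + 2, 2 + 2, 3 + 2) = (5, 4, 5), i.e. 100 bins in total.
padded_shape = tuple(n + 2 for n in n_bins)

# Decode some reported bin numbers into per-axis indices (C order).
# Per axis, 0 is underflow, 1..n are real bins and n + 1 is overflow.
for b in (26, 46, 72):
    per_axis = tuple(int(i) for i in np.unravel_index(b, padded_shape))
    print(b, per_axis)
# 26 (1, 1, 1)
# 46 (2, 1, 1)
# 72 (3, 2, 2)
```

Under this layout, 26 is the first real bin on all three axes and the last real bin, (3, 2, 3), would be 73. The 18 real bins are scattered between 26 and 73 on the 100-bin padded grid, which is why every reported number is above 20.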

Answer

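OK, after a few days of mulling it over in the background and a quick look through the binned_statistic_dd() source code, I think I've found the right answer, and it's quite simple.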

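It seems that binned_statistic_dd() adds an extra set of outlier bins during the binning stage, one below the first edge and one above the last edge on each axis, and then drops them when it returns the histogram (statistic) result. The bin numbers are not adjusted, though (I assume so that they can be reused to compute other statistics). Here that means the linear bin numbers index a (3+2) × (2+2) × (3+2) = 100-bin grid rather than the 18 real bins.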
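Following that explanation, the linear bin numbers from the original call (expand_binnumbers=False) can also be mapped onto the 18 real bins without rerunning binned_statistic_dd: decode them on the padded grid, shift by 1, and re-encode on the real grid. This is a NumPy-only sketch that assumes the C-order padded layout matching the values in the question; `compact_bin_ids` is a hypothetical helper name.

```python
import numpy as np


def compact_bin_ids(binnumber, n_bins):
    """Hypothetical helper: map linear bin numbers from
    binned_statistic_dd(..., expand_binnumbers=False) to 0-based flat
    indices on the real n_bins grid. Samples in an outlier bin map to -1."""
    padded_shape = tuple(n + 2 for n in n_bins)
    # Per-axis indices on the padded grid, shape (D, N).
    idx = np.array(np.unravel_index(np.asarray(binnumber), padded_shape))
    idx -= 1                                   # first real bin on each axis -> 0
    n = np.asarray(n_bins)[:, None]            # column of per-axis bin counts
    inside = np.all((idx >= 0) & (idx < n), axis=0)
    out = np.full(idx.shape[1], -1, dtype=np.intp)
    # Re-encode only in-range samples on the real (3, 2, 3) grid.
    out[inside] = np.ravel_multi_index(idx[:, inside], n_bins)
    return out


# Bin numbers reported in the question for the 3 x 2 x 3 grid.
reported = np.array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72,
                     27, 32, 47, 52, 32, 47])
print(compact_bin_ids(reported, (3, 2, 3)) + 1)   # 1-based IDs, 1..18
# [ 7 10  2  1  4  7  5 11  7 10  7 10 13 17  2  5  8 11  5  8]
```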

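So if you export the expanded bin numbers (expand_binnumbers=True) and subtract 1 from each to re-base the bin indices, you can compute the "correct" bin IDs with np.ravel_multi_index. These IDs run from 0 to 17; add 1 if you want 1 to 18. This assumes every point lies inside the bin edges: after the shift, an outlier has index -1 or n on some axis, and np.ravel_multi_index raises a ValueError for it.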

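# Number of real bins along each axis (3, 2 and 3 here)
numX, numY, numZ = len(xbinEdges) - 1, len(ybinEdges) - 1, len(zbinEdges) - 1

ret2 = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                statistic='count', expand_binnumbers=True)
bincounts2 = ret2.statistic
binnumber2 = ret2.binnumber   # shape (3, N): per-axis indices, 0 and n+1 are outliers
indxnum2 = binnumber2 - 1     # shift so the first real bin on each axis is 0
corrected_bin_ids = np.ravel_multi_index(indxnum2, (numX, numY, numZ))  # 0..17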
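The same idea can be wrapped so that it returns the 1 to 18 numbering the question asked for and tolerates points outside the edges. This is a sketch that assumes the expanded layout has per-axis indices 0 to n+1, with 0 and n+1 as outlier bins; `expanded_to_bin_ids` is a hypothetical helper name, and the sample indices are decoded from the bin numbers in the question.

```python
import numpy as np


def expanded_to_bin_ids(expanded, n_bins, one_based=True):
    """Hypothetical helper: turn expand_binnumbers=True output (shape (D, N))
    into one flat ID per sample on the real n_bins grid.

    Samples in any outlier bin get ID 0 if one_based, otherwise -1.
    """
    idx = np.asarray(expanded) - 1            # first real bin on each axis -> 0
    n = np.asarray(n_bins)[:, None]           # column of per-axis bin counts
    inside = np.all((idx >= 0) & (idx < n), axis=0)
    ids = np.full(idx.shape[1], -1, dtype=np.intp)
    # np.ravel_multi_index raises ValueError for out-of-range entries,
    # so only in-range samples are passed to it (C order).
    ids[inside] = np.ravel_multi_index(idx[:, inside], n_bins)
    if one_based:
        ids += 1                              # real bins 1..prod(n_bins), outliers 0
    return ids


n_bins = (3, 2, 3)
# Expanded (x, y, z) indices of the first five points in the question,
# plus one hypothetical point beyond the last x edge (x index 4).
expanded = np.array([[2, 2, 1, 1, 1, 4],
                     [1, 2, 1, 1, 2, 1],
                     [1, 1, 2, 1, 1, 1]])
print(expanded_to_bin_ids(expanded, n_bins))
# [ 7 10  2  1  4  0]
```

With the question's data, `expanded_to_bin_ids(ret2.binnumber, (3, 2, 3))` should return IDs from 1 to 18. Since every point there lies inside the edges, no 0s appear, and for statistic='count' `np.bincount(ids - 1, minlength=18).reshape(3, 2, 3)` should reproduce `ret2.statistic`.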

Simple and quick in the end!

© www.soinside.com 2019 - 2024. All rights reserved.