scipy.stats.binned_statistic_dd() bin numbers include many extra bins

Question

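I'm struggling with the results of scipy.stats.binned_statistic_dd(). I have an array of positions and another array of IDs, which I bin in 3 directions. I pass lists of bin edges as input rather than a number of bins per direction together with the range option. I have 3 bins in x, 2 bins in y, and 3 bins in z, for 18 bins in total.

However, when I check the bin numbers that come back, they are all greater than 20. How do I get bin numbers that reflect the number of bins I supplied and get rid of all the extra bins?

I tried to follow the suggestions in this post (Output in scipy.stats.binned_statistic_dd()), which deals with a similar problem, but I don't understand how to apply it to my case. As usual, the documentation is as cryptic as ever.

Any help getting bin numbers between 1 and 18 for this example would be greatly appreciated!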

import numpy as np
from scipy import stats

pos = np.array([[-0.02042167, -0.0223282 ,  0.00123734],
       [-0.0420364 ,  0.01196078,  0.00694259],
       [-0.09625651, -0.00311446,  0.06125461],
       [-0.07693234, -0.02749618,  0.03617278],
       [-0.07578646,  0.01199925,  0.02991888],
       [-0.03258293, -0.00371765,  0.04245596],
       [-0.06765955,  0.02798434,  0.07075846],
       [-0.02431445,  0.02774102,  0.06719837],
       [ 0.02798265, -0.01096739, -0.01658691],
       [-0.00584252,  0.02043389, -0.00827088],
       [ 0.00623063, -0.02642285,  0.03232817],
       [ 0.00884222,  0.01498996,  0.02912483],
       [ 0.07189474, -0.01541584,  0.01916607],
       [ 0.07239394,  0.0059483 ,  0.0740187 ],
       [-0.08519159, -0.02894125,  0.10923724],
       [-0.10803509,  0.01365444,  0.09555333],
       [-0.0442866 , -0.00845725,  0.10361843],
       [-0.04246779,  0.00396127,  0.1418258 ],
       [-0.08975861,  0.02999023,  0.12713186],
       [ 0.01772454, -0.0020405 ,  0.08824418]])

ids = np.array([16,  9,  6, 19,  1,  4, 10,  5, 18, 11,  2, 12, 13,  8,  3, 17, 14,
       15, 20,  7])

xbinEdges = np.array([-0.15298488, -0.05108961,  0.05080566,  0.15270093])
ybinEdges = np.array([-0.051,  0.   ,  0.051])
zbinEdges = np.array([-0.053,  0.049,  0.151,  0.253])

ret = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                statistic='count', expand_binnumbers=False)
bincounts = ret.statistic
binnumber = ret.binnumber.T

>>> binnumber
array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
       52, 32, 47], dtype=int64)

ranges = [[-0.15298488071, 0.15270092971],
 [-0.051000000000000004, 0.051000000000000004],
 [-0.0530000000000001, 0.25300000000000006]]

ret3 = stats.binned_statistic_dd(pos, ids, bins=(3,2,3), statistic='count', expand_binnumbers=False, range=ranges)
bincounts = ret3.statistic
binnumber = ret3.binnumber.T

>>> binnumber
array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
       52, 32, 47], dtype=int64)

Answer 1

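OK, after a few days of thinking it over in the background and a quick look through the binned_statistic_dd() source code, I think I've found the right answer, and it's quite simple.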

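It seems that binned_statistic_dd() adds an extra set of outlier bins in each dimension during the binning stage and then removes them when returning the histogram results, but leaves the bin numbers untouched (I assume this is in case you want to reuse the output for other statistics).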
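To see where numbers such as 46 or 72 come from, here is a small sketch that decodes the flattened bin numbers reported in the question. It assumes the flat number is a C-order index into a grid padded with one outlier bin on each side of every dimension, i.e. a shape of (3+2, 2+2, 3+2). That assumption is inferred from the output above rather than taken from the documentation, and the names `padded_shape` and `per_dim` are purely illustrative.

```python
import numpy as np

# Flattened bin numbers from the question (expand_binnumbers=False).
flat = np.array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51,
                 46, 51, 66, 72, 27, 32, 47, 52, 32, 47])

n_bins = (3, 2, 3)                            # bins requested in x, y, z
padded_shape = tuple(n + 2 for n in n_bins)   # assumed padded grid: (5, 4, 5)

# Recover the per-dimension indices from the flat numbers.
per_dim = np.stack(np.unravel_index(flat, padded_shape))

# Under the assumption above, in-range samples get indices 1..n in each
# dimension; 0 and n+1 would be the outlier bins.
for name, idx, n in zip("xyz", per_dim, n_bins):
    print(name, idx.min(), idx.max(), "of", n)
```

For the first sample this gives (2, 1, 1): second bin in x, first in y, first in z, which matches its coordinates against the supplied edges.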

So, if you export the expanded bin numbers (expand_binnumbers=True) and then subtract 1 from each of them to re-base the bin indices, you can compute the "correct" bin ids.

ret2 = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                statistic='count', expand_binnumbers=True)
bincounts2 = ret2.statistic
binnumber2 = ret2.binnumber
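# Number of bins per dimension (one fewer than the number of edges)
numX, numY, numZ = len(xbinEdges) - 1, len(ybinEdges) - 1, len(zbinEdges) - 1
# Shift the 1-based per-dimension indices to 0-based
indxnum2 = binnumber2 - 1
# Flatten to a single 0-based id (0-17); add 1 for ids in the range 1-18
corrected_bin_ids = np.ravel_multi_index(tuple(indxnum2), (numX, numY, numZ))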

Simple and quick in the end!
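As a sanity check on this approach, the sketch below runs the same steps on synthetic data and compares the resulting ids against the returned histogram. The random sample, the seed, and the variable names are assumptions made for illustration, and every sample is drawn strictly inside the edges; a sample in an outlier bin would produce an index of -1 or n after the shift, which np.ravel_multi_index rejects with a ValueError.

```python
import numpy as np
from scipy import stats

x_edges = np.array([-0.15298488, -0.05108961, 0.05080566, 0.15270093])
y_edges = np.array([-0.051, 0.0, 0.051])
z_edges = np.array([-0.053, 0.049, 0.151, 0.253])
edges = [x_edges, y_edges, z_edges]
n_bins = tuple(len(e) - 1 for e in edges)          # (3, 2, 3)

# Synthetic samples strictly inside the edges (illustrative data only).
rng = np.random.default_rng(0)
low = [e[0] for e in edges]
high = [e[-1] for e in edges]
pos = rng.uniform(low, high, size=(200, 3))
ids = np.arange(len(pos))

ret = stats.binned_statistic_dd(pos, ids, bins=edges,
                                statistic='count', expand_binnumbers=True)

# Per-dimension indices are 1-based; shift them, then flatten.
zero_based = ret.binnumber - 1
bin_ids = np.ravel_multi_index(tuple(zero_based), n_bins) + 1   # 1..18

# Counting samples per id should reproduce the returned histogram.
counts = np.bincount(bin_ids - 1, minlength=int(np.prod(n_bins)))
print(np.array_equal(counts.reshape(n_bins), ret.statistic))
```

If the final comparison holds, the re-based ids index the same 3 x 2 x 3 grid that ret.statistic reports.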
