Bin one column and sum the other in a (2, N) array


Question:

I have a dataset like the following:

import numpy as np 
x = np.arange(0,10000,0.5)
y = np.arange(x.size)/x.size

Plotted in log-log space, it looks like this:

import matplotlib.pyplot as plt
plt.loglog(x, y)
plt.show()

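[image: log-log plot of y versus x]

Clearly, there is a lot of redundant information in this log-log plot. I don't need 20000 points to represent this trend.

My question is: how can I bin this data so that a uniform number of points is shown in each order of magnitude on the log scale? I would like about ten points per order of magnitude. So I need to bin 'x' with exponentially growing bin sizes, and then take the mean of all elements of y that correspond to each bin.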

Attempt:

First, I generate the bins I want to use for x.

# need a nicer way to do this.
# what if I want more than 10 bins per order of magnitude? 
bins = 10**np.arange(1,int(round(np.log10(x.max()))))
bins = np.unique((bins.reshape(-1,1)*np.arange(0,11)).flatten())

#array([    0,     10,    20,    30,    40,    50,    60,    70,    80,
#          90,   100,   200,   300,   400,   500,   600,   700,   800,
#         900,  1000,  2000,  3000,  4000,  5000,  6000,  7000,  8000,
#        9000, 10000])

Second, I find the index of the bin that each element of x falls into:

digits = np.digitize(x, bins) 

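Now for the part where I could really use some help. I want to take the mean of the elements of y that correspond to each bin, and then plot these means against the bin midpoints: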

# need a nicer way to do this.. is there an np.searchsorted() solution?
# this way is quick and dirty, but it does not scale with acceptable speed
averages = []
for d in np.unique(digits):
    mask = digits==d
    y_mean = np.mean(y[mask])
    averages.append(y_mean)
del mask, y_mean, d    

# now plot the averages within each bin against the center of each bin 
plt.loglog((bins[1:]+bins[:-1])/2.0, averages)
plt.show()

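[image: log-log plot of the per-bin averages versus the bin centers]

Summary: is there a smoother way to do this? How can I generate an arbitrary n points per order of magnitude instead of 10?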

python numpy
1 answer

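I will answer two of your several questions: how to create the bins in an alternative way, and how to generate an arbitrary n points per order of magnitude instead of 10.

You can use np.logspace and np.outer to create your bins for any value of n, as shown below. The default base in logspace is 10. It generates logarithmically spaced points, analogous to linspace, which generates a linearly spaced grid.

For n = 10: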
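
To make the two building blocks concrete, here is a small sketch of what each call returns on its own. The variable names (`decades`, `linear`, `multipliers`, `grid`) are illustrative and not part of the answer.

```python
import numpy as np

# Powers of ten: 4 points whose exponents run evenly from 0 to 3.
decades = np.logspace(0, 3, 4)         # 1, 10, 100, 1000

# Linearly spaced counterpart over the same range, for comparison.
linear = np.linspace(1, 1000, 4)       # 1, 334, 667, 1000

# Multipliers 0..n applied to every decade via an outer product.
n = 10
multipliers = np.arange(0, n + 1)
grid = np.outer(decades, multipliers)  # shape (4, n + 1)

# Rows overlap (10 * 1 == 1 * 10), so np.unique drops the duplicates
# and returns the edges flattened and sorted.
bins = np.unique(grid)
```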

n = 10
bins = np.unique(np.outer(np.logspace(0, 3, 4), np.arange(0, n+1)))
# array([0.e+00, 1.e+00, 2.e+00, 3.e+00, 4.e+00, 5.e+00, 6.e+00, 7.e+00,
#        8.e+00, 9.e+00, 1.e+01, 2.e+01, 3.e+01, 4.e+01, 5.e+01, 6.e+01,
#        7.e+01, 8.e+01, 9.e+01, 1.e+02, 2.e+02, 3.e+02, 4.e+02, 5.e+02,
#        6.e+02, 7.e+02, 8.e+02, 9.e+02, 1.e+03, 2.e+03, 3.e+03, 4.e+03,
#        5.e+03, 6.e+03, 7.e+03, 8.e+03, 9.e+03, 1.e+04])

For n = 20:

n = 20
bins = np.unique(np.outer(np.logspace(0, 3, 4), np.arange(0, n+1)))
# array([0.0e+00, 1.0e+00, 2.0e+00, 3.0e+00, 4.0e+00, 5.0e+00, 6.0e+00,
#        7.0e+00, 8.0e+00, 9.0e+00, 1.0e+01, 1.1e+01, 1.2e+01, 1.3e+01,
#        1.4e+01, 1.5e+01, 1.6e+01, 1.7e+01, 1.8e+01, 1.9e+01, 2.0e+01,
#        3.0e+01, 4.0e+01, 5.0e+01, 6.0e+01, 7.0e+01, 8.0e+01, 9.0e+01,
#        1.0e+02, 1.1e+02, 1.2e+02, 1.3e+02, 1.4e+02, 1.5e+02, 1.6e+02,
#        1.7e+02, 1.8e+02, 1.9e+02, 2.0e+02, 3.0e+02, 4.0e+02, 5.0e+02,
#        6.0e+02, 7.0e+02, 8.0e+02, 9.0e+02, 1.0e+03, 1.1e+03, 1.2e+03,
#        1.3e+03, 1.4e+03, 1.5e+03, 1.6e+03, 1.7e+03, 1.8e+03, 1.9e+03,
#        2.0e+03, 3.0e+03, 4.0e+03, 5.0e+03, 6.0e+03, 7.0e+03, 8.0e+03,
#        9.0e+03, 1.0e+04, 1.1e+04, 1.2e+04, 1.3e+04, 1.4e+04, 1.5e+04,
#        1.6e+04, 1.7e+04, 1.8e+04, 1.9e+04, 2.0e+04])
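
Note that for n = 20 these edges are not spread evenly within a decade: between 10 and 20 the spacing is 1, while between 20 and 100 it is 10. If the goal is n bins of equal width in log space per order of magnitude, one possible alternative (not part of this answer) is a single np.logspace call with n bins per decade. The helper name `log_bins` and its parameters are hypothetical.

```python
import numpy as np

def log_bins(lo_exp, hi_exp, n):
    """Return edges from 10**lo_exp to 10**hi_exp with n bins per decade.

    Hypothetical helper: lo_exp and hi_exp are integer exponents
    with hi_exp > lo_exp.
    """
    n_decades = hi_exp - lo_exp
    # n bins per decade need n * n_decades + 1 edges in total.
    return np.logspace(lo_exp, hi_exp, n * n_decades + 1)

bins = log_bins(0, 4, 20)  # edges from 1 to 10000, 20 bins per decade
```

Because these edges start at 10**lo_exp, values below that (such as x = 0 in the question's data) fall outside the first bin and would need to be handled separately.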

Edit

If you want 0, 10, 20, 30...90, 100, 200, 300..., you can do the following:

n = 10
bins = np.unique(np.outer(np.logspace(1, 3, 3), np.arange(0, n+1)))
# array([    0.,    10.,    20.,    30.,    40.,    50.,    60.,    70.,
#           80.,    90.,   100.,   200.,   300.,   400.,   500.,   600.,
#          700.,   800.,   900.,  1000.,  2000.,  3000.,  4000.,  5000.,
#         6000.,  7000.,  8000.,  9000., 10000.])
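
The answer does not cover the averaging loop from the question. One way to avoid the Python loop, sketched here under the assumption that the bin edges are sorted and span all of x, is to sum y per bin with np.bincount and divide by the per-bin counts. The names `sums`, `counts`, and `filled` are illustrative.

```python
import numpy as np

x = np.arange(0, 10000, 0.5)
y = np.arange(x.size) / x.size

n = 10
bins = np.unique(np.outer(np.logspace(1, 3, 3), np.arange(0, n + 1)))

# Bin index of every x value; index 1 means bins[0] <= x < bins[1].
digits = np.digitize(x, bins)

# Per-bin sum of y and per-bin number of points, each in a single pass.
# minlength reserves a slot for every possible index, 0 .. bins.size.
sums = np.bincount(digits, weights=y, minlength=bins.size + 1)
counts = np.bincount(digits, minlength=bins.size + 1)

# Keep only the bins that contain data, to avoid dividing by zero.
filled = counts > 0
averages = sums[filled] / counts[filled]
```

For this data every bin between the first and last edge is populated, so `averages` has one entry per bin and can be plotted against (bins[1:] + bins[:-1]) / 2.0 as in the question. If SciPy is available, scipy.stats.binned_statistic with statistic='mean' is another route worth checking.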