Question:
I have a dataset like the following:
import numpy as np
x = np.arange(0,10000,0.5)
y = np.arange(x.size)/x.size
Plotted in log-log space, it looks like this:
import matplotlib.pyplot as plt
plt.loglog(x, y)
plt.show()
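Clearly, there is a lot of redundant information in this log-log plot. I don't need 20,000 points (x.size) to represent this trend.
My question is: how can I bin this data so that the same number of points is shown in each order of magnitude on the log scale? I'd like roughly ten points per order of magnitude. So I need to bin x with exponentially growing bin sizes, and then take the mean of all the elements of y that fall into each bin.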
Attempt:
First, I generated the bins I want to use for x.
# need a nicer way to do this.
# what if I want more than 10 bins per order of magnitude?
bins = 10**np.arange(1,int(round(np.log10(x.max()))))
bins = np.unique((bins.reshape(-1,1)*np.arange(0,11)).flatten())
#array([ 0, 10, 20, 30, 40, 50, 60, 70, 80,
# 90, 100, 200, 300, 400, 500, 600, 700, 800,
# 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000,
# 9000, 10000])
Second, I find the index of the bin that each element of x falls into:
digits = np.digitize(x, bins)
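On the np.searchsorted() question: for monotonically increasing bins, np.digitize(x, bins) with its default right=False returns the same indices as np.searchsorted(bins, x, side='right'), so either call can produce digits. A quick check on this data, reusing the bin construction from above:

```python
import numpy as np

x = np.arange(0, 10000, 0.5)

# Same bins as in the question: 0, 10, ..., 100, 200, ..., 10000
bins = 10**np.arange(1, int(round(np.log10(x.max()))))
bins = np.unique((bins.reshape(-1, 1) * np.arange(0, 11)).flatten())

# digitize returns i such that bins[i-1] <= x < bins[i]
digits = np.digitize(x, bins)
# searchsorted(side='right') finds the same insertion index
digits_ss = np.searchsorted(bins, x, side="right")

print(np.array_equal(digits, digits_ss))  # True
```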
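Now I could really use some help. I want to take the mean of all the elements of y that fall into each bin, and then plot those means against the bin midpoints: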
# need a nicer way to do this.. is there an np.searchsorted() solution?
# this way is quick and dirty, but it does not scale with acceptable speed
averages = []
for d in np.unique(digits):
    mask = digits==d
    y_mean = np.mean(y[mask])
    averages.append(y_mean)
    del mask, y_mean, d
# now plot the averages within each bin against the center of each bin
plt.loglog((bins[1:]+bins[:-1])/2.0, averages)
plt.show()
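For the averaging step, one loop-free option (a sketch, not something from the question or the answer below) is np.bincount: with digits as bin indices, np.bincount(digits, weights=y) sums y per bin and np.bincount(digits) counts the samples per bin, so their ratio is the per-bin mean.

```python
import numpy as np

x = np.arange(0, 10000, 0.5)
y = np.arange(x.size) / x.size

bins = 10**np.arange(1, int(round(np.log10(x.max()))))
bins = np.unique((bins.reshape(-1, 1) * np.arange(0, 11)).flatten())
digits = np.digitize(x, bins)

# Per-bin sum of y and per-bin sample count, one pass each.
# minlength keeps index k aligned with bin k even if trailing bins are empty.
sums = np.bincount(digits, weights=y, minlength=len(bins) + 1)
counts = np.bincount(digits, minlength=len(bins) + 1)

# Index k (1 <= k < len(bins)) covers [bins[k-1], bins[k]); index 0 and
# index len(bins) hold values outside the bin range, so drop them.
sums, counts = sums[1:len(bins)], counts[1:len(bins)]
nonempty = counts > 0
averages = sums[nonempty] / counts[nonempty]
centers = ((bins[1:] + bins[:-1]) / 2.0)[nonempty]
```

Each y value is visited a fixed number of times instead of once per bin, so this scales linearly with the data. plt.loglog(centers, averages) then draws the same plot as the loop version.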
In summary: is there a smoother way to do this? And how can I generate an arbitrary n points per order of magnitude instead of 10?
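Answer:
I'll answer two of your questions: how to create the bins another way, and how to generate an arbitrary n points per order of magnitude instead of 10.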
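You can use np.logspace and np.outer to create your bins for an arbitrary value of n, as shown below. The default base in logspace is 10. It generates logarithmically spaced points, analogous to linspace, which generates a linearly spaced grid.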
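To make the two building blocks concrete: np.logspace(start, stop, num) returns base**np.linspace(start, stop, num) (base 10 by default), and np.outer(a, b) builds the table a[i] * b[j]. A small check of both:

```python
import numpy as np

# logspace(start, stop, num) == 10**linspace(start, stop, num) with the default base
decades = np.logspace(0, 3, 4)                  # values 1, 10, 100, 1000
same = np.allclose(decades, 10 ** np.linspace(0, 3, 4))

# np.outer: row i is decades[i] * [0, 1, ..., n]
n = 10
grid = np.outer(decades, np.arange(0, n + 1))   # shape (4, n + 1)

# np.unique flattens the table and removes the repeated edges (10, 100, 1000)
bins = np.unique(grid)
print(same, grid.shape, bins.size)              # True (4, 11) 38
```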
For n=10:
n = 10
bins = np.unique(np.outer(np.logspace(0, 3, 4), np.arange(0, n+1)))
# array([0.e+00, 1.e+00, 2.e+00, 3.e+00, 4.e+00, 5.e+00, 6.e+00, 7.e+00,
# 8.e+00, 9.e+00, 1.e+01, 2.e+01, 3.e+01, 4.e+01, 5.e+01, 6.e+01,
# 7.e+01, 8.e+01, 9.e+01, 1.e+02, 2.e+02, 3.e+02, 4.e+02, 5.e+02,
# 6.e+02, 7.e+02, 8.e+02, 9.e+02, 1.e+03, 2.e+03, 3.e+03, 4.e+03,
# 5.e+03, 6.e+03, 7.e+03, 8.e+03, 9.e+03, 1.e+04])
For n=20:
n = 20
bins = np.unique(np.outer(np.logspace(0, 3, 4), np.arange(0, n+1)))
# array([0.0e+00, 1.0e+00, 2.0e+00, 3.0e+00, 4.0e+00, 5.0e+00, 6.0e+00,
#        7.0e+00, 8.0e+00, 9.0e+00, 1.0e+01, 1.1e+01, 1.2e+01, 1.3e+01,
#        1.4e+01, 1.5e+01, 1.6e+01, 1.7e+01, 1.8e+01, 1.9e+01, 2.0e+01,
#        3.0e+01, 4.0e+01, 5.0e+01, 6.0e+01, 7.0e+01, 8.0e+01, 9.0e+01,
#        1.0e+02, 1.1e+02, 1.2e+02, 1.3e+02, 1.4e+02, 1.5e+02, 1.6e+02,
#        1.7e+02, 1.8e+02, 1.9e+02, 2.0e+02, 3.0e+02, 4.0e+02, 5.0e+02,
#        6.0e+02, 7.0e+02, 8.0e+02, 9.0e+02, 1.0e+03, 1.1e+03, 1.2e+03,
#        1.3e+03, 1.4e+03, 1.5e+03, 1.6e+03, 1.7e+03, 1.8e+03, 1.9e+03,
#        2.0e+03, 3.0e+03, 4.0e+03, 5.0e+03, 6.0e+03, 7.0e+03, 8.0e+03,
#        9.0e+03, 1.0e+04, 1.1e+04, 1.2e+04, 1.3e+04, 1.4e+04, 1.5e+04,
#        1.6e+04, 1.7e+04, 1.8e+04, 1.9e+04, 2.0e+04])
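Be aware that for n other than 10 this grid is not uniform on the log axis. With n = 20 the extra edges all land in the lower part of each decade: between 10 and 100 the output above has 10, 11, ..., 19, 20 and then only 30, 40, ..., 90. If you want edges that are evenly spaced on the log axis with exactly n bins per decade, a different approach (not the one in this answer) is to let np.logspace produce every edge. The sketch below assumes the data of interest start at x = 1, because a log-spaced grid cannot reach 0 (x = 0 and x = 0.5 fall below the first edge and are dropped); log_bins is a hypothetical helper name.

```python
import numpy as np

def log_bins(lo_decade, hi_decade, n):
    """Hypothetical helper: edges from 10**lo_decade to 10**hi_decade,
    n bins per decade, evenly spaced in log10."""
    num_decades = hi_decade - lo_decade
    return np.logspace(lo_decade, hi_decade, num_decades * n + 1)

x = np.arange(0, 10000, 0.5)
y = np.arange(x.size) / x.size

n = 20
edges = log_bins(0, 4, n)         # 81 edges from 1 to 10000
digits = np.digitize(x, edges)    # 0 means x < 1; len(edges) means x >= 10000

# Per-bin mean via bincount, keeping only the interior bins 1..len(edges)-1
sums = np.bincount(digits, weights=y, minlength=len(edges) + 1)[1:len(edges)]
counts = np.bincount(digits, minlength=len(edges) + 1)[1:len(edges)]
nonempty = counts > 0
averages = sums[nonempty] / counts[nonempty]

# Geometric mean of each bin's edges: the midpoint on a log axis
centers = np.sqrt(edges[1:] * edges[:-1])[nonempty]
```

Because x is sampled every 0.5, several of the narrow bins just above 1 contain no samples; the nonempty mask drops them instead of dividing by zero. plt.loglog(centers, averages) plots the result as in the question.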
Edit:
If you want 0, 10, 20, 30, ..., 90, 100, 200, 300, ..., you can do the following:
n = 10
bins = np.unique(np.outer(np.logspace(1, 3, 3), np.arange(0, n+1)))
# array([ 0., 10., 20., 30., 40., 50., 60., 70.,
# 80., 90., 100., 200., 300., 400., 500., 600.,
# 700., 800., 900., 1000., 2000., 3000., 4000., 5000.,
# 6000., 7000., 8000., 9000., 10000.])
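This one-liner reproduces the bins the asker built by hand in the question (as floats rather than integers). A quick check, reusing the question's own expressions:

```python
import numpy as np

x = np.arange(0, 10000, 0.5)

# The construction from the question
bins_question = 10**np.arange(1, int(round(np.log10(x.max()))))
bins_question = np.unique((bins_question.reshape(-1, 1) * np.arange(0, 11)).flatten())

# The logspace/outer version from the edit
n = 10
bins_edit = np.unique(np.outer(np.logspace(1, 3, 3), np.arange(0, n + 1)))

print(bins_question.size, bins_edit.size)        # 29 29
print(np.allclose(bins_question, bins_edit))     # True
```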