Calculating the frequency mean of an arbitrary distribution in pandas

Question    Votes: 2    Answers: 1

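I have a large dataset whose values range from 1 to 25 with a resolution of 0.1. The distribution is arbitrary in nature, with a mode of 1. A sample dataset could be: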

1,
1,
23.05,
19.57,
1,
1.56,
1,
23.53,
19.74,
7.07,
1,
22.85,
1,
1,
7.78,
16.89,
12.75,
15.32,
7.7,
14.26,
15.41,
1,
16.34,
8.57,
15,
14.97,
1.18,
14.15,
1.94,
14.61,
1,
15.49,
1,
9.18,
1.71,
1,
10.4,

How can I count the values that fall into different ranges (0-0.5, 0.5-1, etc.) and find their frequency mean using pandas in Python?

The expected output could be:

 Values   Range (f)   Occurrence (n)   f * n

 1
 2.2      1-2         2                3
 2.8      2-3         3                7.5
 3.7      3-4         2                7
 5.5      4-5         1                4.5
 5.8      5-6         3                16.5
 4.3
 2.7      sum         11               38.5
 3.5
 1.8      frequency mean               3.5
 5.9
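
The table above appears to follow the usual grouped-data mean. Assuming f denotes the midpoint of each range and n the number of values in that range (the products in the last column match this reading, e.g. 1.5 × 2 = 3), the arithmetic can be written as:

```latex
\bar{x}
  = \frac{\sum_i f_i \, n_i}{\sum_i n_i}
  = \frac{3 + 7.5 + 7 + 4.5 + 16.5}{2 + 3 + 2 + 1 + 3}
  = \frac{38.5}{11}
  = 3.5
```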
python python-3.x pandas mean frequency-analysis
1 Answer
2
votes

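You need cut for binning, then convert the CategoricalIndex to an IntervalIndex to get the mid value of each bin, multiply the columns with mul, take the sum, and finally divide the two scalars: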

import numpy as np
import pandas as pd

df = pd.DataFrame({'col':[1,2.2,2.8,3.7,5.5,5.8,4.3,2.7,3.5,1.8,5.9]})
print (df)
    col
0   1.0
1   2.2
2   2.8
3   3.7
4   5.5
5   5.8
6   4.3
7   2.7
8   3.5
9   1.8
10  5.9

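# Bin the values into (0.999, 2], (2, 3], ..., (5, 6]
binned = pd.cut(df['col'], np.arange(1, 7), include_lowest=True)
# Count the rows in each bin (observed=False keeps empty bins as well)
df1 = df.groupby(binned, observed=False).size().reset_index(name='val')
# Midpoint of each interval, then midpoint * count
df1['mid'] = pd.IntervalIndex(df1['col']).mid
df1['mul'] = df1['val'].mul(df1['mid'])
print (df1)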
            col  val     mid     mul
0  (0.999, 2.0]    2  1.4995   2.999
1    (2.0, 3.0]    3  2.5000   7.500
2    (3.0, 4.0]    2  3.5000   7.000
3    (4.0, 5.0]    1  4.5000   4.500
4    (5.0, 6.0]    3  5.5000  16.500

# Sum only the numeric columns; the interval column 'col' cannot be summed
a = df1.sum(numeric_only=True)
print (a)
val    11.0000
mid    17.4995
mul    38.4990
dtype: float64

# Frequency mean = sum(mid * count) / sum(count)
b = a['mul'] / a['val']
print (b)
3.49990909091
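
The result above is 3.4999 rather than the expected 3.5 because include_lowest=True shifts the first bin to (0.999, 2.0], so its midpoint is 1.4995 instead of 1.5. One possible variation, offered as a sketch and not as part of the original answer, is to use left-closed bins via right=False so the edges stay exact. The function name frequency_mean and its bin_width parameter are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def frequency_mean(values, bin_width=1.0):
    """Grouped mean: sum(midpoint * count) / sum(count).

    Hypothetical helper for illustration; bins are left-closed [a, b).
    """
    s = pd.Series(values, dtype="float64")

    # Build bin edges that cover the data; the last edge is always
    # strictly greater than the maximum, so no value is dropped.
    start = np.floor(s.min() / bin_width) * bin_width
    stop = (np.floor(s.max() / bin_width) + 1) * bin_width
    edges = np.arange(start, stop + bin_width / 2, bin_width)

    # right=False gives intervals [a, b), so no edge adjustment is needed.
    binned = pd.cut(s, edges, right=False)

    # Count per bin; sort=False keeps the bins in interval order.
    counts = binned.value_counts(sort=False)

    table = pd.DataFrame({
        "n": counts.to_numpy(),
        "mid": [interval.mid for interval in counts.index],
    })
    return float((table["mid"] * table["n"]).sum() / table["n"].sum())


sample = [1, 2.2, 2.8, 3.7, 5.5, 5.8, 4.3, 2.7, 3.5, 1.8, 5.9]
print(frequency_mean(sample))  # 3.5
```

For the 0.5-wide ranges mentioned in the question, the same sketch could be called as frequency_mean(data, bin_width=0.5). Note that with right=False a value sitting exactly on an edge (such as 2.0) is counted in the bin that starts there, which differs from the right-closed bins used in the answer.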