I have the following data:
ll = [25.553885868617463,
1.4285714285714288,
5.0,
14.142857142857142,
2.714285430908202,
-4.428571428571429,
3.428571428571429,
2.8571428571428568,
5.857142857142858,
-2.0,
8.571428571428573,
1.4285714285714288,
1.857142333984374,
21.714285714285715,
3.1428571428571423,
2.428571428571427,
-0.2857142857142856,
-3.0,
4.142857687813894,
0.7142857142857135,
0.714285507202149,
-1.9999995858328674,
4.857142464773997,
2.8571428571428577,
-6.714285714285714,
3.57142848423549,
15.999999901907785,
0.14285714285714413,
-3.0,
0.5687830243791847,
7.857142900739401,
-3.0,
9.0,
2.428571428571427,
2.0000001634870266,
0.7999999999999998,
-5.7142857142857135,
3.1428571428571423,
0.14285714285714235,
22.5,
18.571428527832033,
2.7142857142857135,
0,
3.1428571428571423,
13.142857142857146,
10.428571428571427,
30.71428684779576,
0,
0.2857140350341787,
3.571428571428571,
2.0,
24.428570175170897,
2.428571428571429,
-0.3333333333333339,
4.2857142857142865,
-8.000000216166178,
15.57142857142857,
2.2857142857142856,
8.71428565979004,
0.8571428571428577,
2.1428570447649276,
1.0,
5.000000991821288,
4.714285714285715,
6.0,
2.8571428571428577,
1.6666666666666679,
1.9987989153180798,
12.714285714285715,
9.85714340209961,
7.71428658621652,
-5.857142857142858,
15.857142857142858,
4.428571428571429,
0.5676193237304688,
1.2857142857142847,
0.14285705566406248,
3.428570938110351,
5.142857142857142,
-1.2857142857142856,
-1.0,
11.714285714285715,
-0.7142857142857144,
0.714285888671875,
-1.0,
9.428571428571429,
4.428571428571429,
-2.428571428571429,
-20.571428571428573,
4.0,
1.1428571428571432,
2.2857142857142847,
19.0,
15.142857142857142,
5.571428451538086,
7.428571428571427,
1.0,
4.285714481898715,
3.7142853546142582,
-3.7142854309082036]
I can create a histogram:
import pandas as pd
import plotly.express as px
px.histogram(pd.DataFrame(ll, columns=['val']), x='val', nbins=100)
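How can I compute the mass/density of the distribution around a point?
In other words, I would like to compute the mass between the two red lines for different values of thr and centre: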
thr = 0.5
centre = 0
fig = px.histogram(pd.DataFrame(ll, columns=['val']), x='val', nbins=100)
fig.add_vline(x=centre+thr, line_color='red')
fig.add_vline(x=centre-thr, line_color='red')
fig.show()
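Slice the DataFrame and sum (note that this adds up the values that fall inside the window, not the number of observations):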
df = pd.DataFrame(ll, columns=['val'])
out = df.loc[df['val'].between(centre-thr, centre+thr)].sum()
Output:
val 0.095238
dtype: float64
To express this as a proportion of the total sum:
out.div(df.sum())
Output:
val 0.000224
dtype: float64
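If "mass" is meant as the share of observations inside the window rather than the sum of their values, a count-based variant may be closer to what the question asks. The following is a minimal sketch under that assumption; the `sample` list and the helper name `window_fraction` are hypothetical and stand in for the full `ll` list.

```python
import pandas as pd

# Hypothetical sample; substitute the full `ll` list from the question.
sample = [-3.0, -0.3, 0.0, 0.1, 0.4, 0.7, 2.0, 5.0]
df = pd.DataFrame(sample, columns=['val'])

def window_fraction(values, centre, thr):
    """Share of observations in [centre - thr, centre + thr] (bounds inclusive)."""
    inside = values.between(centre - thr, centre + thr)
    return inside.mean()  # mean of a boolean mask = fraction of True entries

frac = window_fraction(df['val'], centre=0, thr=0.5)
count = int(df['val'].between(-0.5, 0.5).sum())  # raw number of points in the window
```

Because `between` is inclusive on both ends by default, points lying exactly on a red line are counted.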
gaussian_kde performs a kernel density estimate, and its integrate_box_1d method computes the (normalized) density within a given range. For example:
from scipy.stats import gaussian_kde
kde = gaussian_kde(ll)
low = -0.5
high = 0.5
density = kde.integrate_box_1d(low, high)
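If you want to convert back to an unnormalized value (an estimated number of samples in the range), you can multiply by the number of samples.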
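To evaluate several window widths around one centre, the call can be wrapped in a small helper. This is only a sketch: `kde_mass` and the short `sample` array are hypothetical names introduced for illustration, and the default bandwidth of `gaussian_kde` is assumed to be acceptable for the data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical sample; substitute the full `ll` list from the question.
sample = np.array([-3.0, -0.3, 0.0, 0.1, 0.4, 0.7, 2.0, 5.0])
kde = gaussian_kde(sample)  # default bandwidth (Scott's rule)

def kde_mass(kde, centre, thr):
    """Estimated probability mass in [centre - thr, centre + thr]."""
    return kde.integrate_box_1d(centre - thr, centre + thr)

# Mass around 0 for several half-widths
masses = {thr: kde_mass(kde, 0, thr) for thr in (0.25, 0.5, 1.0)}

# Rescale the normalized mass to an expected number of samples in the window
expected_count = masses[0.5] * sample.size
```

The KDE smooths the data, so the estimated mass generally differs from a direct count of points in the window, and it changes with the bandwidth.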
You can create a histogram distribution and get the area over the interval by subtracting the CDF values at its limits:
import numpy as np
from scipy import stats
hist = np.histogram(ll, bins=100)
dist = stats.rv_histogram(hist, density=False)
x = 0
m = 0.5
A = dist.cdf(x + m) - dist.cdf(x - m) # 0.08331471696503406
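The value obtained this way depends on the number of bins, since `rv_histogram` treats the density as constant within each bin and so interpolates the CDF linearly across it. The sketch below compares a few bin counts; `hist_mass` and `sample` are hypothetical names, and the `density` argument is left out on the assumption that the equal-width bins from `np.histogram` make it unnecessary.

```python
import numpy as np
from scipy import stats

# Hypothetical sample; substitute the full `ll` list from the question.
sample = np.array([-3.0, -0.3, 0.0, 0.1, 0.4, 0.7, 2.0, 5.0])

def hist_mass(data, centre, thr, bins):
    """Mass in [centre - thr, centre + thr] under a histogram distribution."""
    hist = np.histogram(data, bins=bins)  # (counts, bin_edges), equal-width bins
    dist = stats.rv_histogram(hist)
    return dist.cdf(centre + thr) - dist.cdf(centre - thr)

# Same window, different bin counts
by_bins = {b: hist_mass(sample, 0, 0.5, b) for b in (10, 50, 100)}
```

With many bins the result tends to approach the plain fraction of points inside the window, while coarse bins spread each point's weight over a wider range.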