Calculating the mass of a distribution around a point in Python

Question  Votes: 0  Answers: 3

I have the following data:

ll = [25.553885868617463,
 1.4285714285714288,
 5.0,
 14.142857142857142,
 2.714285430908202,
 -4.428571428571429,
 3.428571428571429,
 2.8571428571428568,
 5.857142857142858,
 -2.0,
 8.571428571428573,
 1.4285714285714288,
 1.857142333984374,
 21.714285714285715,
 3.1428571428571423,
 2.428571428571427,
 -0.2857142857142856,
 -3.0,
 4.142857687813894,
 0.7142857142857135,
 0.714285507202149,
 -1.9999995858328674,
 4.857142464773997,
 2.8571428571428577,
 -6.714285714285714,
 3.57142848423549,
 15.999999901907785,
 0.14285714285714413,
 -3.0,
 0.5687830243791847,
 7.857142900739401,
 -3.0,
 9.0,
 2.428571428571427,
 2.0000001634870266,
 0.7999999999999998,
 -5.7142857142857135,
 3.1428571428571423,
 0.14285714285714235,
 22.5,
 18.571428527832033,
 2.7142857142857135,
 0,
 3.1428571428571423,
 13.142857142857146,
 10.428571428571427,
 30.71428684779576,
 0,
 0.2857140350341787,
 3.571428571428571,
 2.0,
 24.428570175170897,
 2.428571428571429,
 -0.3333333333333339,
 4.2857142857142865,
 -8.000000216166178,
 15.57142857142857,
 2.2857142857142856,
 8.71428565979004,
 0.8571428571428577,
 2.1428570447649276,
 1.0,
 5.000000991821288,
 4.714285714285715,
 6.0,
 2.8571428571428577,
 1.6666666666666679,
 1.9987989153180798,
 12.714285714285715,
 9.85714340209961,
 7.71428658621652,
 -5.857142857142858,
 15.857142857142858,
 4.428571428571429,
 0.5676193237304688,
 1.2857142857142847,
 0.14285705566406248,
 3.428570938110351,
 5.142857142857142,
 -1.2857142857142856,
 -1.0,
 11.714285714285715,
 -0.7142857142857144,
 0.714285888671875,
 -1.0,
 9.428571428571429,
 4.428571428571429,
 -2.428571428571429,
 -20.571428571428573,
 4.0,
 1.1428571428571432,
 2.2857142857142847,
 19.0,
 15.142857142857142,
 5.571428451538086,
 7.428571428571427,
 1.0,
 4.285714481898715,
 3.7142853546142582,
 -3.7142854309082036]

I can create a histogram:

import pandas as pd
import plotly.express as px

px.histogram(pd.DataFrame(ll, columns=['val']), x='val', nbins=100)

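How can I calculate the mass/density of the distribution around a point?

In other words, I would like to be able to compute the mass between the two red lines for different values of thr and centre: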

thr = 0.5
centre = 0
fig = px.histogram(pd.DataFrame(ll, columns=['val']), x='val', nbins=100)
fig.add_vline(x=centre+thr, line_color='red')
fig.add_vline(x=centre-thr, line_color='red')
fig.show()

python pandas distribution
3 Answers
1 vote

Slice the DataFrame and sum:

df = pd.DataFrame(ll, columns=['val'])
out = df.loc[df['val'].between(centre-thr, centre+thr)].sum()

Output:

val    0.095238
dtype: float64

To express the peak as a proportion of the total:

out.div(df.sum())

Output:

val    0.000224
dtype: float64
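
Note that the sum above adds up the values themselves. If the mass is instead meant as the share of observations that fall inside the window, a count-based variant may be closer to what the histogram shows. A minimal sketch, assuming a short hypothetical sample in place of ll (the names sample, in_window, count and fraction are illustrative):

```python
import pandas as pd

# Hypothetical sample standing in for the question's list `ll`
sample = [25.55, 1.43, 0.0, -0.29, 0.14, 5.0, -3.0, 0.29, 0.71, 2.0]
df = pd.DataFrame(sample, columns=['val'])

centre = 0
thr = 0.5

# Boolean mask: True where a point lies in [centre - thr, centre + thr]
# (Series.between includes both endpoints by default)
in_window = df['val'].between(centre - thr, centre + thr)

count = int(in_window.sum())   # number of points inside the window
fraction = in_window.mean()    # share of all points inside the window
```

Applied to the question's ll with centre = 0 and thr = 0.5, the same mask should select 8 of the 100 values, i.e. a fraction of 0.08.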

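0 votes

You can use SciPy's gaussian_kde to perform a kernel density estimate and then use its integrate_box_1d method to compute the (normalized) density within a given range. For example: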

from scipy.stats import gaussian_kde

kde = gaussian_kde(ll)

low = -0.5
high = 0.5
density = kde.integrate_box_1d(low, high)

If you want to convert back to a non-normalized density value, you can multiply by the number of samples.
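
To evaluate several windows around the same centre, the KDE can be fitted once and reused. The following is only a sketch: the helper kde_mass is hypothetical, and the random sample stands in for the question's ll.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical stand-in for the question's list `ll`
rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=7.0, size=200)

# Fit the kernel density estimate once
kde = gaussian_kde(sample)

def kde_mass(centre, thr):
    """Estimated probability mass in [centre - thr, centre + thr]."""
    return kde.integrate_box_1d(centre - thr, centre + thr)

# Mass around centre = 0 for a few window half-widths
masses = {thr: kde_mass(0, thr) for thr in (0.5, 1.0, 2.0)}

# Multiplying by the sample size gives an estimated number of points
expected_counts = {thr: m * len(sample) for thr, m in masses.items()}
```

Because the KDE smooths the data, these values will generally differ from a direct count of points in the window, and they depend on the bandwidth gaussian_kde chooses.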


0 votes

You can build a histogram distribution and get the area over the interval by subtracting the CDF values at its limits:

import numpy as np
from scipy import stats

hist = np.histogram(ll, bins=100)
dist = stats.rv_histogram(hist, density=False)

x = 0
m = 0.5
A = dist.cdf(x + m) - dist.cdf(x - m)  # 0.08331471696503406
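
The CDF difference above can be wrapped in a small helper so that centre and thr can be varied. This is a sketch under assumptions: hist_mass is a hypothetical name, the random sample stands in for ll, and rv_histogram is called with its default density handling, which should be equivalent here because np.histogram produces equal-width bins.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the question's list `ll`
rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=7.0, size=200)

# Equal-width bins, then a piecewise-constant distribution built from them
hist = np.histogram(sample, bins=100)
dist = stats.rv_histogram(hist)

def hist_mass(centre, thr):
    """Probability mass in [centre - thr, centre + thr] under the histogram."""
    return dist.cdf(centre + thr) - dist.cdf(centre - thr)

# Mass around centre = 0 for a few window half-widths
masses = [hist_mass(0, thr) for thr in (0.5, 1.0, 2.0)]

# Sanity check: the mass over the full data range should be 1
total = dist.cdf(sample.max()) - dist.cdf(sample.min())
```

The histogram CDF interpolates linearly inside each bin, so when the window edges do not coincide with bin edges the result depends on the number of bins and can differ slightly from an exact count of points.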