从皮尔逊r计算p值与scipy不同

问题描述 投票:0回答:1

我正在研究一些统计数据,并认为我会编写自己的函数来进行Pearson的相关性。出于某种原因,我的代码给出了一个与scipy不一致的错误p值。这是我的代码:

x = [438.69461,
404.060132,
198.210118,
386.826542,
346.090443,
330.09382,
249.341107,
324.826081,
303.760307,
388.144244]

y = [115.343021,
89.971217,
174.421423,
155.054388,
22.921773,
352.884202,
254.11353,
176.763594,
265.541998,
222.213654]

N = len(list(zip(x, y)))

# Calculate covariance
cov = sum([(i - np.mean(x)) * (j - np.mean(y)) for i, j in zip(x, y)]) / (N-1)

# Calculate the standard devs of x and y
sx, sy = np.std(x, ddof=1), np.std(y, ddof=1)

r = cov / (sx * sy)

# r is not normally distributed, so convert to z
z_r = np.log((1+r)/(1-r)) / 2
SE_z_r = 1 / np.sqrt(N-3)
# See Field (2009) pp.172 for discussion on z score relative to particular non-zero value
z = z_r / (SE_z_r)
# convert to t value
t_r = r * np.sqrt(N-2) / np.sqrt(1 - (r**2))

# CIs for r as z
lower_z = z_r - (1.96 * SE_z_r)
upper_z = z_r + (1.96 * SE_z_r)
# Finally convert these values back to correlation coefficients...
lower = (np.exp(2*lower_z) - 1) / (np.exp(2*lower_z) + 1)
upper = (np.exp(2*upper_z) - 1) / (np.exp(2*upper_z) + 1)

# Compute the p value from the t-statistic
p = stats.norm.sf(z)*2

result = {'Pearson\'s r': r,
          '95% CI (upper)':upper,
          '95% CI (lower)':lower,
          'Sig. (two-tailed)':p,
          'N':N}

这给出了以下结果:

Out[315]: 
{'95% CI (lower)': -0.8080015100222782,
 '95% CI (upper)': 0.3455450058720938,
 'N': 10,
 "Pearson's r": -0.3630848051556617,
 'Sig. (two-tailed)': 1.6858418386527083}

这与scipy不同:

stats.pearsonr(x, y)
Out[316]: (-0.3630848051556618, 0.30243603632310984)

我想我正在使用错误的p值计算。因此,如果我使用p = stats.t.cdf(t_r, N-2)*2计算p值,它看起来很好,只有很小的差异:

Out[339]: 
{'95% CI (lower)': -0.8080015100222782,
 '95% CI (upper)': 0.3455450058720938,
 'N': 10,
 "Pearson's r": -0.3630848051556617,
 'Sig. (two-tailed)': 0.3024360363231102}

stats.pearsonr(x, y)
Out[340]: (-0.3630848051556618, 0.30243603632310984)

但随后将x和y更改为以下内容:

x = [52, 41, 44, 68, 83]
y = [86, 92, 107, 135, 150]

为p再次给出奇怪的值:

Out[343]: 
{'95% CI (lower)': 0.07262140356975652,
 '95% CI (upper)': 0.9932583155633771,
 'N': 5,
 "Pearson's r": 0.8973956836822184,
 'Sig. (two-tailed)': 1.9611596807577028}

stats.pearsonr(x, y)
Out[344]: (0.8973956836822184, 0.03884031924229741)
python-3.x scipy statistics
1个回答
0
投票

使用绝对值计算p值:

p = stats.norm.sf(np.abs(z))*2

你可以查看其他答案的参考:

© www.soinside.com 2019 - 2024. All rights reserved.