Why do stats.chisquare and stats.chi2_contingency give different results even with the same data?

Question

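I have an array named "data_count" that stores the counts for 9 digit groups. Using Benford's law, I generated an array of expected counts named "expected_counts". I want to test whether the two arrays have the same distribution. I used both stats.chisquare and stats.chi2_contingency, but the results are very different. The [scipy guide](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html) says they should give the same result. Why doesn't that hold in my case? Please help, thanks a million.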

res = chi2_contingency(obs, correction=False)
(res.statistic, res.pvalue) == stats.chisquare(obs.ravel(),
                                               f_exp=ex.ravel(),
                                               ddof=obs.size - 1 - dof)

Here is my code:

import numpy as np
from scipy import stats

data_count = [34, 10, 8, 16, 14, 5, 4, 7, 4]
expected_counts = [31, 18, 13, 10, 8, 7, 6, 5, 5]

expected_percentage=[(i/sum(expected_counts))*100 for i in expected_counts]
data_percentage=[(i/sum(data_count))*100 for i in data_count]

# method 1
res1 = stats.chisquare(f_obs=data_percentage, f_exp=expected_percentage)
print(res1.pvalue)


# method 2
combined = np.array([data_count, expected_counts])

res2 = stats.chi2_contingency(combined, correction=False)

print(res2.pvalue)

The output is: 0.04329908403353834 0.45237501133745583

Answer 1

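The documentation of chi2_contingency does not say that your code will produce the same statistic and p-value for both tests. It shows how the two tests are related when you start from a contingency table, for example: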

import numpy as np
from scipy import stats
# contingency table
observed = np.array([[10, 10, 20],
                     [20, 20, 20]])
# expected under the null hypothesis of independence
expected = stats.contingency.expected_freq(observed)

# according to the documentation of `chi2_contingency`
dof = observed.size - sum(observed.shape) + observed.ndim - 1
res1 = stats.contingency.chi2_contingency(observed, correction=False)
res2 = stats.chisquare(observed.ravel(), f_exp=expected.ravel(),
                       ddof=observed.size - 1 - dof)

np.testing.assert_allclose(res1.statistic, res2.statistic)
np.testing.assert_allclose(res1.pvalue, res2.pvalue)
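
To see why the second method in the question gives a different p-value, it may help to look at what chi2_contingency does with the stacked 2x9 table. The following is a minimal sketch using the counts from the question; the variable names are illustrative. It suggests that both rows are treated as observed samples and that the expected frequencies are derived from the row and column totals, so the Benford-based counts are never used as the expected distribution.

```python
import numpy as np
from scipy import stats

data_count = np.array([34, 10, 8, 16, 14, 5, 4, 7, 4])
expected_counts = np.array([31, 18, 13, 10, 8, 7, 6, 5, 5])

# Stacking the arrays makes both rows "observed" data in a 2x9 table.
table = np.array([data_count, expected_counts])

# Tuple unpacking works for both older and newer SciPy result objects.
stat, pvalue, dof, _ = stats.chi2_contingency(table, correction=False)

# Expected frequencies under independence come from the table margins,
# not from the Benford-based second row.
margin_expected = stats.contingency.expected_freq(table)

print(dof)                          # (2 - 1) * (9 - 1) = 8
print(margin_expected[0].round(2))  # differs from expected_counts
print(pvalue)                       # about 0.452, as reported in the question
```

This is a test of whether two samples share a common distribution, which is a different hypothesis from a goodness-of-fit test against a fixed expected distribution.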

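Your data is not in the form of a contingency table, so you can simply use chisquare on the raw counts. More precisely, you could if the observed and expected counts had the same total; here they sum to 102 and 103.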


data_count = np.asarray([34, 10, 8, 16, 14, 5, 4, 7, 4])
expected_counts = np.asarray([31, 18, 13, 10, 8, 7, 6, 5, 5])

# the sums of the observed and expected counts must be equal;
# assuming that the relative frequencies of your expected counts
# are correct and that it is just not normalized properly:
expected_counts = expected_counts * np.sum(data_count) / np.sum(expected_counts)
# assuming no `ddof` adjustment is needed:
res = stats.chisquare(data_count, expected_counts)
# Power_divergenceResult(statistic=16.255158633621637, pvalue=0.03887025202788101)
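
The first method in the question passes percentages rather than counts. As a hedged illustration (not part of the original answer), the sketch below shows that the chi-square statistic scales with the total that is passed in, so percentages summing to 100 behave like a sample of size 100 rather than the actual 102 observations. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

data_count = np.array([34, 10, 8, 16, 14, 5, 4, 7, 4])
expected_counts = np.array([31, 18, 13, 10, 8, 7, 6, 5, 5])

n = data_count.sum()  # 102 observations

# Expected counts rescaled to the observed total
expected_scaled = expected_counts * n / expected_counts.sum()
res_counts = stats.chisquare(data_count, expected_scaled)

# The same relative frequencies expressed as percentages (sum to 100)
data_pct = data_count / n * 100
expected_pct = expected_counts / expected_counts.sum() * 100
res_pct = stats.chisquare(data_pct, expected_pct)

# The statistic on percentages equals the statistic on counts * 100 / n,
# so the two p-values differ slightly.
print(res_counts.statistic, res_pct.statistic)
print(res_counts.pvalue, res_pct.pvalue)
```

Because the test depends on the sample size, raw counts are the safer input; percentages only give the same answer when the sample happens to contain exactly 100 observations.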
