Confidence interval for the difference between two proportions in Python

Question

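For example, in an A/B test, group A might have 1000 data points, of which 100 are successes, while group B might have 2000 data points and 220 successes. That gives A a success proportion of 0.1 and B a success proportion of 0.11, a difference (delta) of 0.01. How can I calculate a confidence interval around this delta in Python?

Statsmodels can do this for a single sample, but there does not seem to be a package that handles the difference between two samples, which is what an A/B test needs. (http://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportion_confint.html)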

Answer 1

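I couldn't find this functionality in Statsmodels. However, this website goes over the math behind the confidence interval and is the source of the following function: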

import numpy as np
from scipy import stats


def two_proprotions_confint(success_a, size_a, success_b, size_b, significance = 0.05):
    """
    A/B test for two proportions;
    given the number of successes and the trial size of groups A and B,
    compute the confidence interval of their difference;
    resulting confidence interval matches R's prop.test function

    Parameters
    ----------
    success_a, success_b : int
        Number of successes in each group

    size_a, size_b : int
        Size, or number of observations in each group

    significance : float, default 0.05
        Often denoted as alpha. Governs the chance of a false positive.
        A significance level of 0.05 means that there is a 5% chance of
        a false positive. In other words, our confidence level is
        1 - 0.05 = 0.95

    Returns
    -------
    prop_diff : float
        Difference between the two proportions (B minus A)

    confint : 1d ndarray
        Confidence interval of the difference between the two proportions
    """
    prop_a = success_a / size_a
    prop_b = success_b / size_b
    var = prop_a * (1 - prop_a) / size_a + prop_b * (1 - prop_b) / size_b
    se = np.sqrt(var)

    # z critical value
    confidence = 1 - significance
    z = stats.norm(loc = 0, scale = 1).ppf(confidence + significance / 2)

    # standard formula for the confidence interval
    # point-estimate +- z * standard-error
    prop_diff = prop_b - prop_a
    confint = prop_diff + np.array([-1, 1]) * z * se
    return prop_diff, confint
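
As a quick check, the function can be applied to the numbers from the question. The snippet below is only a sketch: it repeats a condensed copy of the function so that it runs on its own, and the variable names in the usage part are illustrative rather than taken from the original post.

```python
import numpy as np
from scipy import stats


def two_proprotions_confint(success_a, size_a, success_b, size_b, significance=0.05):
    # Condensed copy of the function above, kept here so the snippet is standalone.
    prop_a = success_a / size_a
    prop_b = success_b / size_b
    se = np.sqrt(prop_a * (1 - prop_a) / size_a + prop_b * (1 - prop_b) / size_b)
    z = stats.norm.ppf(1 - significance / 2)  # two-sided critical value
    prop_diff = prop_b - prop_a
    return prop_diff, prop_diff + np.array([-1, 1]) * z * se


# Numbers from the question: A has 100 / 1000 successes, B has 220 / 2000.
prop_diff, confint = two_proprotions_confint(100, 1000, 220, 2000)
print(f"difference: {prop_diff:.4f}")
print(f"95% CI:     ({confint[0]:.4f}, {confint[1]:.4f})")
```

With these inputs the interval comes out at roughly (-0.013, 0.033). It contains zero, so at the 5% level this data does not show a significant difference between A and B.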

Answer 2

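The sample sizes do not have to be equal. The confidence interval for the difference between two proportions is

$$(p_1 - p_2) \pm z_{1-\alpha/2} \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$

where p1 and p2 are the observed proportions, computed from their respective samples of sizes n1 and n2.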
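
As a worked example, here are the question's numbers plugged into the usual normal-approximation (Wald) interval. That this is the interval meant here is an assumption, as is the labelling of group B as sample 1 and group A as sample 2; the critical value 1.96 corresponds to a 95% confidence level.

```latex
\begin{aligned}
p_1 &= \tfrac{220}{2000} = 0.11, \qquad p_2 = \tfrac{100}{1000} = 0.10, \qquad p_1 - p_2 = 0.01 \\
\mathrm{SE} &= \sqrt{\frac{0.11 \times 0.89}{2000} + \frac{0.10 \times 0.90}{1000}}
             = \sqrt{0.00004895 + 0.00009} \approx 0.01179 \\
\text{CI}_{95\%} &= 0.01 \pm 1.96 \times 0.01179 \approx 0.01 \pm 0.0231 = (-0.0131,\ 0.0331)
\end{aligned}
```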

See this white paper for more information.

Answer 3

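The statsmodels package now has confint_proportions_2indep, which computes a confidence interval for comparing two proportions. You can check the details in the documentation: https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.confint_proportions_2indep.html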
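
A minimal sketch of how that function might be called on the question's numbers follows. The variable names are illustrative. method="wald" is used for the first call only because it is the same normal-approximation formula as the hand-written function in Answer 1; as far as I know, the library default for a difference (method=None) is the Newcombe method, which gives a slightly different interval.

```python
from statsmodels.stats.proportion import confint_proportions_2indep

# Numbers from the question. B is passed first so the difference is B - A.
success_a, size_a = 100, 1000
success_b, size_b = 220, 2000

# Wald interval: the plain normal-approximation formula.
wald_low, wald_upp = confint_proportions_2indep(
    success_b, size_b, success_a, size_a,
    method="wald", compare="diff", alpha=0.05,
)

# Library default method for compare="diff" (method left as None).
default_low, default_upp = confint_proportions_2indep(
    success_b, size_b, success_a, size_a,
    compare="diff", alpha=0.05,
)

print(f"Wald:    ({wald_low:.4f}, {wald_upp:.4f})")
print(f"Default: ({default_low:.4f}, {default_upp:.4f})")
```

The function takes counts and sample sizes rather than proportions, and the order of the two groups determines the sign of the reported difference.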

Answer 4

@纳兹利·萨布尔 I am running an A/B test and using confint_proportions_2indep to get the CI. However, the result I get is "nan". Do you know why? Here is my code.

from statsmodels.stats.proportion import confint_proportions_2indep

# control and experiment are not defined in the post; they are presumably
# pandas Series of 0/1 sign-up flags
AB_control_cnt = control.sum()           # Control Sign-Up Count
AB_treatment_cnt = experiment.sum()      # Treatment Sign-Up Count
AB_control_size = control.count()        # Control Sample Size
AB_treatment_size = experiment.count()   # Treatment Sample Size

ci = confint_proportions_2indep(AB_treatment_cnt, AB_treatment_size,
                                AB_control_cnt, AB_control_size,
                                method=None, compare='diff',
                                alpha=0.05, correction=True)
lower = ci[0]
upper = ci[1]
