Apply a proportion z-test to every record in a DataFrame

Question    Votes: 1    Answers: 1

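I have the code below, where I am trying to apply a one-sample proportion z-test to the values in each row of my data. The sample data below is in a DataFrame df. For each row, I want to compare the proportion in value against the proportion obtained from count successes out of obs trials. I want a p-value for each record. Instead, I seem to get the same p-value for every record. I have included a few rows of the desired output below to show what I mean. Can someone point out what I am doing wrong and how to fix it? Or suggest an easier way? It seems like there should be a way to do this with pandas.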

# code:

def pvl(x):
    return sm.stats.proportions_ztest(x['count'], 
                              x['value'],
                              x['obs'], 
                              alternative='larger')[1]



df['pval']=df.apply(pvl,
                    axis=1
      )



# sample data:

print(df)

count   value     obs                         
211.0  0.013354  15800.0
18.0   0.001139  15800.0
310.0  0.019620  15800.0
114.0  0.007215  15800.0
 85.0  0.005380  15800.0


# sample output:

count   value     obs     pval                      
211.0  0.013354  15800.0  0.5
18.0   0.001139  15800.0  0.5
310.0  0.019620  15800.0  0.5
114.0  0.007215  15800.0  0.5
 85.0  0.005380  15800.0  0.5


# desired output:

count   value     obs     pval                      
211.0  0.013354  15800.0  0.49
18.0   0.001139  15800.0  4.1454796845134295e-41
310.0  0.019620  15800.0  0.9999999999965842
python-3.x pandas statsmodels hypothesis-test
1 answer
0 votes

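What you have should work, but I believe x['obs'] should come before x['value'] in your call. The signature is proportions_ztest(count, nobs, value, ...), so positional arguments are read in that order: the number of trials second, the null-hypothesis proportion third. I have updated it below.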

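import statsmodels.api as sm

def pvl(x):
    # Positional order is (count, nobs, value): trials before the null proportion.
    _, pval = sm.stats.proportions_ztest(x['count'],
                                         x['obs'],
                                         x['value'],
                                         alternative='larger')
    return pval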
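
To see what the corrected row-wise call computes, here is a dependency-free sketch of the same arithmetic in plain Python. It assumes the default behaviour described in the docstring below (variance taken from the sample proportion) and the 'larger' alternative. The helper name one_sample_ztest_larger is made up for illustration and is not part of statsmodels.

```python
import math

def one_sample_ztest_larger(count, nobs, value):
    """p-value for H1: true proportion > value (one-sample z-test).

    Assumption: the variance uses the sample proportion, which is how the
    docstring describes the default prop_var=False.
    """
    prop = count / nobs                          # observed proportion
    std = math.sqrt(prop * (1.0 - prop) / nobs)  # std. error of the proportion
    z = (prop - value) / std
    # Upper-tail probability of the standard normal, P(Z > z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Rows copied from the question, in the column order (count, value, obs).
rows = [
    (211.0, 0.013354, 15800.0),
    (18.0, 0.001139, 15800.0),
    (310.0, 0.019620, 15800.0),
    (114.0, 0.007215, 15800.0),
    (85.0, 0.005380, 15800.0),
]

# One p-value per row; note that obs is passed as nobs and value as the null.
pvals = [one_sample_ztest_larger(count, obs, value) for count, value, obs in rows]

for (count, value, obs), p in zip(rows, pvals):
    print(f"{count:6.1f}  {value:.6f}  {obs:8.1f}  {p:.4f}")
```

Because value in the sample data is count / obs rounded to six decimals, the observed and null proportions nearly coincide, so each row's p-value lands very close to 0.5 even with the arguments in the right order.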

Documentation:

Signature:
sm.stats.proportions_ztest(
    count,
    nobs,
    value=None,
    alternative='two-sided',
    prop_var=False,
)
Docstring:
Test for proportions based on normal (z) test

Parameters
----------
count : integer or array_like
    the number of successes in nobs trials. If this is array_like, then
    the assumption is that this represents the number of successes for
    each independent sample
nobs : integer or array-like
    the number of trials or observations, with the same length as
    count.
value : float, array_like or None, optional
    This is the value of the null hypothesis equal to the proportion in the
    case of a one sample test. In the case of a two-sample test, the
    null hypothesis is that prop[0] - prop[1] = value, where prop is the
    proportion in the two samples. If not provided value = 0 and the null
    is prop[0] = prop[1]
alternative : string in ['two-sided', 'smaller', 'larger']
    The alternative hypothesis can be either two-sided or one of the one-
    sided tests, smaller means that the alternative hypothesis is
    ``prop < value`` and larger means ``prop > value``. In the two sample
    test, smaller means that the alternative hypothesis is ``p1 < p2`` and
    larger means ``p1 > p2`` where ``p1`` is the proportion of the first
    sample and ``p2`` of the second one.
prop_var : False or float in (0, 1)
    If prop_var is false, then the variance of the proportion estimate is
    calculated based on the sample proportion. Alternatively, a proportion
    can be specified to calculate this variance. Common use case is to
    use the proportion under the Null hypothesis to specify the variance
    of the proportion estimate.

Returns
-------
zstat : float
    test statistic for the z-test
p-value : float
    p-value for the z-test

Examples
--------
>>> count = 5
>>> nobs = 83
>>> value = .05
>>> stat, pval = proportions_ztest(count, nobs, value)
>>> print('{0:0.3f}'.format(pval))
0.695
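
The docstring example can be reproduced by hand, which may help when reading the alternative and prop_var parameters. The following is only a sketch of the one-sample case under my reading of the docstring, not the library's implementation. The names norm_sf and one_sample_prop_ztest are hypothetical, and prop_var=None stands in here for the library's prop_var=False.

```python
import math

def norm_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def one_sample_prop_ztest(count, nobs, value, alternative="two-sided", prop_var=None):
    """Hand-rolled one-sample proportion z-test (illustrative only)."""
    prop = count / nobs
    # Variance from the sample proportion unless a proportion is supplied,
    # e.g. the null value, as the docstring suggests for prop_var.
    p_for_var = prop if prop_var is None else prop_var
    std = math.sqrt(p_for_var * (1.0 - p_for_var) / nobs)
    z = (prop - value) / std
    if alternative == "two-sided":
        pval = 2.0 * norm_sf(abs(z))
    elif alternative == "larger":    # H1: prop > value
        pval = norm_sf(z)
    elif alternative == "smaller":   # H1: prop < value
        pval = norm_sf(-z)
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return z, pval

# Same inputs as the docstring example.
stat, pval = one_sample_prop_ztest(5, 83, 0.05)
print('{0:0.3f}'.format(pval))  # 0.695, matching the docstring

# Variance computed under the null proportion instead of the sample proportion.
stat0, pval0 = one_sample_prop_ztest(5, 83, 0.05, prop_var=0.05)
print('{0:0.3f}'.format(pval0))
```

With these inputs the null proportion (0.05) is smaller than the sample proportion (5/83), so the null-based standard error is smaller and the resulting p-value is somewhat lower than the default one.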