"calculate_bartlett_sphericity" test outputs nan values


I have a DataFrame V that looks like this:

       ECON1     ECON2     ECON3     FOOD1     FOOD2     FOOD3      ENV1  \
28  0.310071  0.096913  0.228500  0.234986  0.260894  0.267858  0.489309   
28  0.353609  0.045075  0.222571  0.222803  0.248388  0.330560  0.060107   
28  0.280600  0.170201  0.232027  0.226792  0.233379  0.316765  0.114550   
28  0.299062  0.127866  0.198080  0.189948  0.222982  0.327082  0.052881   
28  0.346291  0.645534  0.371397  0.389068  0.380557  0.386004  0.186583   

        ENV2      HEA1      HEA2      HEA3     PERS1     PERS2     PERS3  \
28  0.206320  0.252537  0.266968  0.248452  0.184450  0.093345  0.173952   
28 -0.206570  0.263673  0.126182  0.265908  0.134481  0.191341  0.113324   
28  0.237818  0.257337  0.102037  0.214423  0.159002  0.321451  0.165960   
28  0.345857  0.272412  0.069192  0.251301  0.130606  0.132732  0.174925   
28  0.372713  0.382155  0.373531  0.468293  0.364305  0.299510  0.350822   

        COM1      COM2      POL1      POL2  
28  0.781430  0.487822  0.361886  0.233124  
28  0.083918  0.005381  0.266604  0.237078  
28  0.395897  0.257888  0.330607  0.229079  
28  0.000000  0.000000  0.307907  0.238908  
28  0.188402  0.101147  0.410619  0.385933  

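I want to run the bartlett_sphericity test, which compares the observed correlation matrix against the identity matrix, to check whether the observed variables (the columns of DataFrame V) are intercorrelated.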

from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
chi_square_value, p_value=calculate_bartlett_sphericity(V)
print(chi_square_value, p_value)

The problem is that the output looks like this:

nan nan

I'm not sure what I'm doing wrong. All values in V are numeric. Can anyone comment?

python-3.x dataframe factor-analysis
1 Answer

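Bartlett's test of sphericity can return NaN values for several reasons.

Your case appears to be the one where many of the variables are almost perfectly correlated with each other.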
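One way to see what the test is up against is to inspect the correlation matrix directly. The sketch below is an illustration only: diagnose_correlation is a hypothetical helper, and the toy data is random rather than the asker's. It relies on the fact that a correlation matrix estimated from n observations has rank at most n - 1, so with fewer rows than columns (as in the 5-row, 18-column sample shown in the question) its determinant is zero up to rounding error, and the logarithm of that determinant is either undefined or extremely large in magnitude.

```python
import numpy as np
import pandas as pd


def diagnose_correlation(df: pd.DataFrame) -> dict:
    """Summarise properties of the correlation matrix that matter for Bartlett's test.

    Hypothetical helper for illustration; not part of factor_analyzer.
    """
    n_obs, n_vars = df.shape
    corr = df.corr().to_numpy()
    return {
        "n_obs": n_obs,
        "n_vars": n_vars,
        # Rank below n_vars means the correlation matrix is singular.
        "rank": int(np.linalg.matrix_rank(corr)),
        # A determinant at (or numerically around) zero breaks the log in the test.
        "determinant": float(np.linalg.det(corr)),
        # Constant columns or missing values would show up as NaN correlations.
        "has_nan": bool(np.isnan(corr).any()),
    }


# Toy data (an assumption, not the asker's data): 5 observations of 8 variables.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(5, 8)), columns=[f"x{i}" for i in range(8)])

report = diagnose_correlation(toy)
print(report)
```

If the reported rank is smaller than the number of variables, dropping a single column is unlikely to help; the usual remedies are more observations or far fewer variables.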

Load the data:

from io import StringIO

import pandas as pd

data = """
ECON1,ECON2,ECON3,FOOD1,FOOD2,FOOD3,ENV1,ENV2,HEA1,HEA2,HEA3,PERS1,PERS2,PERS3,COM1,COM2,POL1,POL2
0.310071,0.096913,0.2285,0.234986,0.260894,0.267858,0.489309,0.20632,0.252537,0.266968,0.248452,0.18445,0.093345,0.173952,0.78143,0.487822,0.361886,0.233124
0.353609,0.045075,0.222571,0.222803,0.248388,0.33056,0.060107,-0.20657,0.263673,0.126182,0.265908,0.134481,0.191341,0.113324,0.083918,0.005381,0.266604,0.237078
0.2806,0.170201,0.232027,0.226792,0.233379,0.316765,0.11455,0.237818,0.257337,0.102037,0.214423,0.159002,0.321451,0.16596,0.395897,0.257888,0.330607,0.229079
0.299062,0.127866,0.19808,0.189948,0.222982,0.327082,0.052881,0.345857,0.272412,0.069192,0.251301,0.130606,0.132732,0.174925,0.0,0.0,0.307907,0.238908
0.346291,0.645534,0.371397,0.389068,0.380557,0.386004,0.186583,0.372713,0.382155,0.373531,0.468293,0.364305,0.29951,0.350822,0.188402,0.101147,0.410619,0.385933
"""
# Convert the string data to a file-like object
data_io = StringIO(data)
# Read the data into a pandas DataFrame
V = pd.read_csv(data_io)

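Count, for each variable, how many of its correlations exceed 0.95 (the count includes the variable's correlation with itself, which is always 1):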

(V.corr() > .95).sum(1).sort_values(ascending=False)
POL2     9
ECON2    7
PERS1    7
ECON3    6
FOOD1    6
FOOD2    6
PERS3    4
HEA1     4
HEA3     4
COM2     2
COM1     2
POL1     1
ECON1    1
PERS2    1
ENV2     1
ENV1     1
FOOD3    1
HEA2     1
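To see which variables are involved rather than only how many, the highly correlated pairs can be listed explicitly. This is a minimal sketch under stated assumptions: high_corr_pairs is a hypothetical helper name, the 0.95 threshold is taken from the check above, and the toy frame stands in for V. Only the strict upper triangle of the correlation matrix is scanned, so self-correlations are excluded and each pair is reported once.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Return (var_a, var_b, r) for every distinct pair with correlation above threshold."""
    corr = df.corr()
    # k=1 keeps the strict upper triangle: no diagonal, no mirrored duplicates.
    above = np.triu(corr.to_numpy() > threshold, k=1)
    rows, cols = np.where(above)
    pairs = [
        (corr.index[i], corr.columns[j], float(corr.iat[i, j]))
        for i, j in zip(rows, cols)
    ]
    # Strongest correlations first.
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Toy data (an assumption): b is an exact linear function of a, c is unrelated.
toy = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [3.0, 5.0, 7.0, 9.0, 11.0],
        "c": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
)

pairs = high_corr_pairs(toy)
for var_a, var_b, r in pairs:
    print(var_a, var_b, round(r, 3))
```

Calling high_corr_pairs(V) on the real data would list the actual pairs behind the counts.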

Let's drop, one at a time, the variables with the highest counts and see whether Bartlett's test returns a valid value:

for c in ['POL2','ECON2','PERS1']:
    V_fix = V.drop(c, axis=1)
    chi_square_value, p_value = calculate_bartlett_sphericity(V_fix)
    print(c, chi_square_value, p_value)
POL2 nan nan
ECON2 -1181.9125463026403 1.0
PERS1 -1182.5543638437994 1.0
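The negative chi-square values and the p-value of 1.0 above are not meaningful test results either. The sketch below computes the textbook Bartlett statistic, -(n - 1 - (2p + 5)/6) * ln|R|, by hand to show why. It is an illustration under stated assumptions, not factor_analyzer's actual code: bartlett_statistic is a hypothetical helper and the data is random. With n = 5 observations and p = 17 variables (the shape after dropping one column, assuming the five rows shown are the whole dataset), the multiplier is negative and the determinant is zero up to rounding error. The statistic therefore comes out as a large negative number when the computed determinant is a tiny positive value, and as NaN when rounding makes it negative.

```python
import numpy as np


def bartlett_statistic(data):
    """Textbook Bartlett sphericity statistic, written out for illustration.

    Returns (multiplier, determinant, statistic).
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)  # p x p correlation matrix
    det = np.linalg.det(corr)
    multiplier = n - 1 - (2 * p + 5) / 6
    # log of a negative determinant gives NaN, log of zero gives -inf;
    # silence the warnings so the result can be inspected instead.
    with np.errstate(invalid="ignore", divide="ignore"):
        statistic = -multiplier * np.log(det)
    return multiplier, det, statistic


# Toy data (an assumption): same shape as V after dropping one column.
rng = np.random.default_rng(1)
toy = rng.normal(size=(5, 17))

multiplier, det, statistic = bartlett_statistic(toy)
print("multiplier:", multiplier)
print("determinant:", det)
print("statistic:", statistic)
```

A usable result needs a positive multiplier and a non-singular correlation matrix, which in practice means considerably more observations than variables.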