I have a data frame V as follows:
ECON1 ECON2 ECON3 FOOD1 FOOD2 FOOD3 ENV1 \
28 0.310071 0.096913 0.228500 0.234986 0.260894 0.267858 0.489309
28 0.353609 0.045075 0.222571 0.222803 0.248388 0.330560 0.060107
28 0.280600 0.170201 0.232027 0.226792 0.233379 0.316765 0.114550
28 0.299062 0.127866 0.198080 0.189948 0.222982 0.327082 0.052881
28 0.346291 0.645534 0.371397 0.389068 0.380557 0.386004 0.186583
ENV2 HEA1 HEA2 HEA3 PERS1 PERS2 PERS3 \
28 0.206320 0.252537 0.266968 0.248452 0.184450 0.093345 0.173952
28 -0.206570 0.263673 0.126182 0.265908 0.134481 0.191341 0.113324
28 0.237818 0.257337 0.102037 0.214423 0.159002 0.321451 0.165960
28 0.345857 0.272412 0.069192 0.251301 0.130606 0.132732 0.174925
28 0.372713 0.382155 0.373531 0.468293 0.364305 0.299510 0.350822
COM1 COM2 POL1 POL2
28 0.781430 0.487822 0.361886 0.233124
28 0.083918 0.005381 0.266604 0.237078
28 0.395897 0.257888 0.330607 0.229079
28 0.000000 0.000000 0.307907 0.238908
28 0.188402 0.101147 0.410619 0.385933
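I would like to run the bartlett_sphericity test, which compares the observed correlation matrix against the identity matrix, to check whether the observed variables (data frame V) are intercorrelated.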
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
chi_square_value, p_value=calculate_bartlett_sphericity(V)
print(chi_square_value, p_value)
The problem is that the output looks like this:
nan nan
I am not sure what I am doing wrong. All values in V are numeric. Can anyone comment?
The Bartlett sphericity test returns NaN values:
Your case appears to be the last one.
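To see how a NaN can arise, here is a minimal sketch of the textbook form of the statistic, -ln(det R) * (n - 1 - (2p + 5) / 6), written with NumPy. It is an assumption that the library computes it the same way; `bartlett_statistic` and the random `wide` and `tall` arrays are made up for illustration. When the correlation matrix R is singular, its determinant is zero up to floating-point error and may come out slightly negative, in which case the logarithm is NaN.

```python
import numpy as np

def bartlett_statistic(x):
    """Textbook Bartlett sphericity statistic (hypothetical helper, not the library's code)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    corr = np.corrcoef(x, rowvar=False)   # p x p correlation matrix of the columns
    det = np.linalg.det(corr)
    rank = np.linalg.matrix_rank(corr)
    # log of a negative determinant is nan; log of exactly 0 is -inf
    with np.errstate(divide="ignore", invalid="ignore"):
        stat = -np.log(det) * (n - 1 - (2 * p + 5) / 6)
    return stat, det, rank

rng = np.random.default_rng(0)
wide = rng.normal(size=(5, 18))    # 5 rows, 18 columns: same shape as V, random values
tall = rng.normal(size=(200, 4))   # many more rows than columns

for name, x in [("wide", wide), ("tall", tall)]:
    stat, det, rank = bartlett_statistic(x)
    print(name, "rank:", rank, "det:", det, "statistic:", stat)
```

A matrix with fewer rows than columns, like `wide`, always has a rank-deficient correlation matrix, so its determinant is numerically zero and the statistic is unreliable even when it does not come out as NaN.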
Loading the data:
import pandas as pd
from io import StringIO
data = """
ECON1,ECON2,ECON3,FOOD1,FOOD2,FOOD3,ENV1,ENV2,HEA1,HEA2,HEA3,PERS1,PERS2,PERS3,COM1,COM2,POL1,POL2
0.310071,0.096913,0.2285,0.234986,0.260894,0.267858,0.489309,0.20632,0.252537,0.266968,0.248452,0.18445,0.093345,0.173952,0.78143,0.487822,0.361886,0.233124
0.353609,0.045075,0.222571,0.222803,0.248388,0.33056,0.060107,-0.20657,0.263673,0.126182,0.265908,0.134481,0.191341,0.113324,0.083918,0.005381,0.266604,0.237078
0.2806,0.170201,0.232027,0.226792,0.233379,0.316765,0.11455,0.237818,0.257337,0.102037,0.214423,0.159002,0.321451,0.16596,0.395897,0.257888,0.330607,0.229079
0.299062,0.127866,0.19808,0.189948,0.222982,0.327082,0.052881,0.345857,0.272412,0.069192,0.251301,0.130606,0.132732,0.174925,0.0,0.0,0.307907,0.238908
0.346291,0.645534,0.371397,0.389068,0.380557,0.386004,0.186583,0.372713,0.382155,0.373531,0.468293,0.364305,0.29951,0.350822,0.188402,0.101147,0.410619,0.385933
"""
# Convert the string data to a file-like object
data_io = StringIO(data)
# Read the data into a pandas DataFrame
V = pd.read_csv(data_io)
Count how many correlations greater than 0.95 each variable has:
(V.corr() > .95).sum(1).sort_values(ascending=False)
POL2 | ECON2 | PERS1 | ECON3 | FOOD1 | FOOD2 | PERS3 | HEA1 | HEA3 | COM2 | COM1 | POL1 | ECON1 | PERS2 | ENV2 | ENV1 | FOOD3 | HEA2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | 7 | 7 | 6 | 6 | 6 | 4 | 4 | 4 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
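Note that these counts include each variable's correlation with itself, which is always 1, so a count of 1 means no other variable exceeds the threshold. To see which pairs are involved rather than only how many, something like the following sketch could be used. The helper name `high_corr_pairs` and the small `demo` frame are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(df, threshold=0.95):
    """Return (col_a, col_b, r) for each distinct column pair with r > threshold."""
    corr = df.corr()
    cols = corr.columns
    # Strict upper triangle: each pair appears once and the diagonal is skipped
    rows, cs = np.triu_indices(len(cols), k=1)
    pairs = [
        (cols[i], cols[j], corr.iat[i, j])
        for i, j in zip(rows, cs)
        if corr.iat[i, j] > threshold
    ]
    # Strongest correlations first
    return sorted(pairs, key=lambda t: -t[2])

# Tiny made-up frame for illustration only
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],   # almost exactly 2 * a
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],   # not strongly related to a or b
})
for col_a, col_b, r in high_corr_pairs(demo):
    print(col_a, col_b, round(r, 4))
```

Calling `high_corr_pairs(V)` on the frame loaded above would list the pairs behind the counts in the table.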
Let's drop the variables with the most such cases from the dataset, one at a time, and see whether the Bartlett test returns a valid value:
for c in ['POL2','ECON2','PERS1']:
    V_fix = V.drop(c, axis=1)
    chi_square_value, p_value = calculate_bartlett_sphericity(V_fix)
    print(c, chi_square_value, p_value)
POL2 nan nan
ECON2 -1181.9125463026403 1.0
PERS1 -1182.5543638437994 1.0
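The negative chi-square values with a p-value of 1.0 are not valid results either. A possible explanation, assuming the library uses the textbook statistic -ln(det R) * (n - 1 - (2p + 5) / 6), is that the scaling factor itself is negative when there are only 5 rows and 17 remaining columns. The helper names below are hypothetical and the sketch only does the arithmetic.

```python
def bartlett_multiplier(n, p):
    """Scaling factor n - 1 - (2p + 5) / 6 from the textbook statistic."""
    return n - 1 - (2 * p + 5) / 6

def min_rows_for_positive_multiplier(p):
    """Smallest number of rows n for which the scaling factor is positive."""
    n = 2
    while bartlett_multiplier(n, p) <= 0:
        n += 1
    return n

# 5 rows; 17 columns remain after dropping one of the 18
print(bartlett_multiplier(5, 17))            # -2.5
print(min_rows_for_positive_multiplier(17))  # 8
```

A positive scaling factor is necessary but not sufficient. With fewer rows than columns the correlation matrix has rank at most n - 1 and stays singular, so dropping a few columns cannot fix the test here; collecting more observations than variables, or testing a much smaller subset of variables, would be needed.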