最近,我不得不与R和SPSS合作,用Multinomial Regression框架分析数据集。我们调查了一些参与者(10-12岁),我们询问他们最喜欢哪个“专业领域”,然后我们询问他们访问互联网的频率。因此,结果是一个分类变量“:专业领域 - ”军事“,”我不知道“和”其他职业“;而自变量也是一个分类变量(您多久访问一次互联网( “我无法访问”,“1-3小时/天”,“3-5小时/天”)。
我使用R(使用nnet包,通过multinom函数)运行模型,其他统计学家使用SPSS运行。所有参考类别都已正确定义。
现在,当我们比较结果时,他们不同意我的自变量的第二类。第一个是好的。
请看一下整个代码:
library(tidyverse)
library(stargazer)
library(nnet)
ds <- ds %>% mutate(internet = factor(internet))
ds <- ds %>% mutate(internet = relevel(internet, ref = "I dont have internet access"))
ds <- ds %>% mutate(field = factor(field))
ds <- ds %>% mutate(fielf = relevel(field, ref = "I dont know"))
mod <- multinom(field ~ internet, data = ds, maxit=1000, reltol = 1.0e-9)
stargazer(mod, type = 'text')
为了清楚起见,当自变量只有两个类别(如性别,男性和女性)时,R和SPSS都同意其结果
在努力了解两个结果之间的差异之后,我读了nnet estimation could have some problems(优化问题?),and that the discrepancy of results is not so strange as I was thinking at the beginning ..
有人可以向我解释这里发生了什么吗?我错过了什么?!我假设如果我们运行相同的模型,SPSS和R必须具有相同的结果。
谢谢
那是我在这个例子中使用的ds:
ds <- structure(list(sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 1L), .Label = c("male", "female"), class = "factor"), internet = structure(c(3L,
3L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 2L,
2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 3L,
2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 3L, 2L,
2L, 1L, 3L, 2L, 2L, 3L, 2L, 2L), .Label = c("I dont have internet access",
"1-3 hours/day", "3-5 hours/day"), class = "factor"), field = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("I dont know", "Military",
"Other profession"), class = "factor")), class = "data.frame", row.names = c(NA,
-73L))
您可以使用mlogit
,它更接近于SPSS结果。 SPSS值应该是有效的,因为Stata产生类似的结果(-14.88 (982.95), 11.58 (982.95), 11.44 (982.95)
)。剩下的偏差可能源于“其他职业”的荒谬意义。
library(mlogit)
ml.dat <- mlogit.data(ds, choice="field", shape="wide")
ml <- mlogit(field ~ 1 | internet, data=ml.dat)
生产
texreg::screenreg(ml)
=========================================================
Model 1
---------------------------------------------------------
Military:(intercept) -0.41
(0.91)
Other profession:(intercept) -16.89
(2690.89)
Military:factor(internet)1-3 hours/day -1.50
(1.06)
Other profession:factor(internet)1-3 hours/day 13.60
(2690.89)
Military:factor(internet)3-5 hours/day -1.64
(1.06)
Other profession:factor(internet)3-5 hours/day 13.46
(2690.89)
---------------------------------------------------------
AIC 85.49
Log Likelihood -36.74
Num. obs. 73
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05