Multiple imputation of longitudinal data using 2l.pan or panImpute (mice package in R)

Question

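I have a long-format longitudinal (panel) data frame called tradep_red containing 200 countries (country), 26 years (year), a continuous dependent variable gini, and 2 continuous predictors (trade and unempl; there are actually 13, but I reduced them to 2 for this question). Both gini and the predictors contain missing values. Dummy data look like this: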

# Generate dummy data 
set.seed(12345)
country <- as.factor(rep(1:200, each = 26))
year <- rep(1:26, times = 200)
gini <- rnorm(n = 200*26, mean = 20, sd = 4)
trade <- rnorm(n = 200*26, mean = 1000, sd = 7)
unempl <- rnorm(n = 200*26, mean = 4, sd = 0.2)

# Add NA values 
missing_indices_gini <- sample(1:length(gini), 1000)
gini[missing_indices_gini] <- NA
missing_indices_trade <- sample(1:length(trade), 800)
trade[missing_indices_trade] <- NA
missing_indices_unempl <- sample(1:length(unempl), 900)
unempl[missing_indices_unempl] <- NA

# Combine into dataframe
tradep_red <- data.frame(country, year, gini, trade, unempl)
head(tradep_red)
##   country year     gini     trade   unempl
## 1       1    1 22.34212 1006.3982 3.740346
## 2       1    2 22.83786  997.7583 3.801918
## 3       1    3 19.56279  996.9160 3.699202
## 4       1    4       NA        NA 3.838534
## 5       1    5 22.42355  996.0563 3.835563
## 6       1    6       NA 1005.5007 4.115319

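I would like to multiply impute the missing values while explicitly accounting for the multilevel structure of the data (i.e. clustering by country). With the code below (using the mice package), I have been able to create imputed data sets with the pmm method.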

library(mice)

# Multiple imputation
predictorMatrix <- quickpred(tradep_red, 
                             include = c("country", "gini", "trade", "unempl"), 
                             exclude = c("year"), mincor = 0.1)

imp <- mice(data = tradep_red, 
            m = 3, 
            maxit = 5, 
            method = "pmm", 
            predictorMatrix = predictorMatrix,  
            seed = 123)

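However, I would like to use the 2l.pan method (or another approach, such as panImpute) to account for the cluster variable country. The 2l.pan method requires a cluster variable to be specified in the predictorMatrix by giving country the value -2, after which the imputation is run: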

predictorMatrix["country", ] <- -2 # specify country as cluster variable

imp <- mice(data = tradep_red, 
            m = 3, 
            maxit = 5, 
            method = "2l.pan", 
            predictorMatrix = predictorMatrix,  
            seed = 123)

However, this gives the error:

## iter imp variable
##  1   1  giniError in mice.impute.2l.pan(y = c(22.3421152713754, 22.8378640700381,  : 
##  No class variable

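Alternatively, the cluster variable can be specified in the formulas argument using the | operator. In addition, the formula specification must be a list. I have not managed to specify it correctly. The code below shows what I tried: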

formula_imp <- list(gini + trade + unempl ~ (1 | country))

imp <- mice(data = tradep_red, 
            m = 3, 
            maxit = 5, 
            method = "2l.pan", 
            predictorMatrix = predictorMatrix, 
            formulas = formula_imp, 
            seed = 123)

This gives the error:

## iter imp variable
##  1   1  gini trade unempl  giniError in mice.impute.2l.pan(y = c(22.3421152713754, 22.8378640700381,  : 
##  No class variable
## In addition: Warning messages:
## 1: In Ops.factor(1, country) : ‘|’ not meaningful for factors
## 2: In Ops.factor(1, country) : ‘|’ not meaningful for factors
## 3: In Ops.factor(1, country) : ‘|’ not meaningful for factors

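When I try the panImpute function as an alternative to mice, I run into similar errors. How do I correctly specify country as the cluster variable for the multiple imputation procedure? Any help or references are much appreciated!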

Answer 1

The class variable must be an integer. So if you add the following, your first attempt using predictorMatrix will work:

library(dplyr)  # provides %>% and mutate()
tradep_red <- tradep_red %>% mutate(country = as.integer(country))
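To see what that conversion does, here is a minimal base R sketch. The toy factor and the name country_id are made up for illustration; the point is only that as.integer() on a factor returns the level codes, which gives one integer id per cluster.

```r
# Toy factor standing in for the country column (hypothetical values)
country <- factor(c("DE", "DE", "FR", "FR", "US"))

# as.integer() on a factor returns the internal level codes,
# i.e. one integer per distinct country
country_id <- as.integer(country)
country_id
# 1 1 2 2 3
```

For a cluster id the codes only need to be distinct per cluster, so this is enough here. If the factor labels are themselves numbers and you want those numbers back, as.integer(as.character(x)) would be the safer conversion.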

Answer 2

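I'm not 100% sure, but I think the specification needs to be changed so that, when specifying the cluster, country is a predictor for all target (row) variables.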

predictorMatrix[,"country"] <- -2
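Putting both answers together, a minimal sketch of the whole setup might look like the following. It is not tested on the original data: the smaller panel size, the object names dat, pred and meth, and the use of make.predictorMatrix() instead of quickpred() are my own choices for illustration, and it assumes the pan package is installed alongside mice.

```r
library(mice)  # the 2l.pan method additionally needs the 'pan' package installed

# Small dummy panel in the same shape as the question's data (sizes are illustrative)
set.seed(12345)
n_country <- 30
n_year <- 10
n <- n_country * n_year
dat <- data.frame(
  country = rep(1:n_country, each = n_year),  # integer cluster id, not a factor
  year    = rep(1:n_year, times = n_country),
  gini    = rnorm(n, mean = 20, sd = 4),
  trade   = rnorm(n, mean = 1000, sd = 7),
  unempl  = rnorm(n, mean = 4, sd = 0.2)
)
dat$gini[sample(n, 60)]   <- NA
dat$trade[sample(n, 50)]  <- NA
dat$unempl[sample(n, 55)] <- NA

# Rows of the predictor matrix are targets, columns are predictors
pred <- make.predictorMatrix(dat)
pred[, "year"]    <- 0   # leave year out, as in the question
pred[, "country"] <- -2  # column, not row: country is the class variable for every target
diag(pred)        <- 0   # no variable predicts itself

# Use 2l.pan only for the incomplete variables; complete ones keep method ""
meth <- make.method(dat)
meth[c("gini", "trade", "unempl")] <- "2l.pan"

imp <- mice(dat, m = 3, maxit = 5, method = meth,
            predictorMatrix = pred, seed = 123, printFlag = FALSE)
```

The difference from the first attempt in the question is the index: predictorMatrix["country", ] fills the row for country, which is never imputed because it has no missing values, whereas predictorMatrix[, "country"] marks country as the class variable in the rows for gini, trade and unempl.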
