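How should I handle panel data? More specifically, how can I transform a dataset that has a multi-index so that it can be modelled as panel data?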
library(tibble)
library(dplyr)   # provides %>% and select()
library(plm)
library(fastDummies)
dataset <- tribble(
~country, ~year, ~sex, ~age, ~suicides_no,
"Albania", 1987, "male", "15-24", 50,
"Albania", 1987, "male", "35-50", 20,
"Albania", 1987, "male", "50-", 11,
"Albania", 1987, "female", "15-24", 18,
"Albania", 1987, "female", "35-50", 2,
"Albania", 1987, "female", "50-", 1,
"Albania", 1988, "male", "15-24", 50,
"Albania", 1988, "male", "35-50", 2,
"Albania", 1988, "male", "50-", 11,
"Albania", 1988, "female", "15-24", 17,
"Albania", 1988, "female", "35-50", 20,
"Albania", 1988, "female", "50-", 10,
"Albania", 1989, "male", "15-24", 0,
"Albania", 1989, "male", "35-50", 2,
"Albania", 1989, "male", "50-", 1,
"Albania", 1989, "female", "15-24", 7,
"Albania", 1989, "female", "35-50", 2,
"Albania", 1989, "female", "50-", 1,
"Germany", 1987, "male", "15-24", 50,
"Germany", 1987, "male", "35-50", 2,
"Germany", 1987, "male", "50-", 11,
"Germany", 1987, "female", "15-24", 18,
"Germany", 1987, "female", "35-50", 20,
"Germany", 1987, "female", "50-", 1,
"Germany", 1988, "male", "15-24", 0,
"Germany", 1988, "male", "35-50", 2,
"Germany", 1988, "male", "50-", 110,
"Germany", 1988, "female", "15-24", 17,
"Germany", 1988, "female", "35-50", 20,
"Germany", 1988, "female", "50-", 10,
"Germany", 1989, "male", "15-24", 0,
"Germany", 1989, "male", "35-50", 20,
"Germany", 1989, "male", "50-", 1,
"Germany", 1989, "female", "15-24", 73,
"Germany", 1989, "female", "35-50", 2,
"Germany", 1989, "female", "50-", 11
)
dataset %>% tail
dataset2 <- dummy_cols(dataset, "age") %>% select(-age)
panel <- pdata.frame(dataset2, index = c("country", "year"))
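Because of the age groups, each cross-sectional unit (country) has several observations within a single year.
How can I transform this dataset so that it works as panel data and can be estimated with random or fixed effects?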
Using:
library(plm)
fixex = plm(suicides_no ~ factor(sex) + factor(age), index = c("country", "year"), data = dataset, model = "within")
does not work. How can I transform the data so that it can be estimated?
The plm() function requires a unique combination of ID and time for every row. If you run:
library(dplyr)
dataset %>%
count(country, year)
then you will see that every country-year combination has six observations:
country year n
<chr> <dbl> <int>
1 Albania 1987 6
2 Albania 1988 6
3 Albania 1989 6
4 Germany 1987 6
5 Germany 1988 6
6 Germany 1989 6
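The same diagnosis can be reproduced in base R and extended to a candidate key. This is an illustrative sketch, not part of the original answer: it rebuilds only the key columns of the question's data with expand.grid (suicides_no plays no role in whether a key is unique) and assumes one ID per country-sex-age cell.

```r
# Key columns of the question's data, rebuilt with expand.grid:
# 2 countries x 3 years x 2 sexes x 3 age groups = 36 rows.
keys <- expand.grid(
  country = c("Albania", "Germany"),
  year = 1987:1989,
  sex = c("male", "female"),
  age = c("15-24", "35-50", "50-"),
  stringsAsFactors = FALSE
)

# Rows per country-year pair: every cell of this table is 6
print(table(keys$country, keys$year))

# country + year is not a unique key ...
country_year_dup <- anyDuplicated(keys[c("country", "year")]) > 0

# ... but one ID per country-sex-age cell, combined with year, is
keys$ID <- as.integer(interaction(keys$country, keys$sex, keys$age, drop = TRUE))
id_year_unique <- anyDuplicated(keys[c("ID", "year")]) == 0

cat("country-year duplicated:", country_year_dup, "\n")
cat("ID-year unique:", id_year_unique, "\n")
cat("units:", length(unique(keys$ID)), " periods:", length(unique(keys$year)), "\n")
```

Country and year together are duplicated, while ID and year are unique and describe a balanced panel of 12 units observed over 3 years.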
To avoid this, you need to create unique IDs. I think it makes sense to build them from country, age and sex. Then you can do the following:
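library(dplyr)
library(plm)
library(broom)
dataset %>%
  # one ID per country-sex-age cell, so every (ID, year) pair is unique
  group_by(country, sex, age) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup() %>%
  mutate(across(c(sex, age), as.factor)) %>%
  # plm reads index as c(individual, time): here year is the individual
  # dimension and ID the time dimension, so "within" removes year effects
  plm(suicides_no ~ sex + age, index = c("year", "ID"),
      model = "within", data = .) %>%
  tidy()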
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 sexmale 5.17 7.82 0.661 0.514
2 age35-50 -15.5 9.57 -1.62 0.116
3 age50- -10.1 9.57 -1.05 0.301
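Since plm reads index as c(individual, time), index = c("year", "ID") puts year in the individual slot, so the within estimates above come from a model with year fixed effects. With index = c("ID", "year") instead, sex and age would be constant within each unit and the within transformation would sweep them out. As a cross-check, here is a base R sketch (not part of the original answer) that fits the same specification as least-squares dummy variables (LSDV); the data are the question's 36 rows re-entered with rep().

```r
# The question's data, re-entered in its original row order
dataset <- data.frame(
  country = rep(c("Albania", "Germany"), each = 18),
  year = rep(rep(1987:1989, each = 6), times = 2),
  sex = rep(rep(c("male", "female"), each = 3), times = 6),
  age = rep(c("15-24", "35-50", "50-"), times = 12),
  suicides_no = c(50, 20, 11, 18, 2, 1,   50, 2, 11, 17, 20, 10,
                  0, 2, 1, 7, 2, 1,       50, 2, 11, 18, 20, 1,
                  0, 2, 110, 17, 20, 10,  0, 20, 1, 73, 2, 11)
)
dataset$sex <- factor(dataset$sex)   # levels: female, male
dataset$age <- factor(dataset$age)   # levels: 15-24, 35-50, 50-

# LSDV: OLS with one dummy per year, i.e. the effect that
# index = c("year", "ID") makes plm remove with model = "within"
lsdv <- lm(suicides_no ~ sex + age + factor(year), data = dataset)
print(round(coef(lsdv)[c("sexmale", "age35-50", "age50-")], 2))
```

Because every sex x age x year cell contains exactly two rows (one per country), the design is balanced and the coefficients reduce to differences in group means, for example 343/18 - 250/18 = 93/18, about 5.17, for sexmale. The printed values should be 5.17, -15.50 and -10.08, the same point estimates as the plm output above.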