Panel data with a multi-index


How should I handle a dataset, or reshape it, so that it can be modeled as panel data when it has a multi-index?

library(tibble)
library(dplyr)        # provides %>% and select(), both used below
library(plm)
library(fastDummies)

dataset <- tribble(
  ~country, ~year, ~sex, ~age, ~suicides_no,
  "Albania", 1987, "male", "15-24", 50, 
  "Albania", 1987, "male", "35-50", 20, 
  "Albania", 1987, "male", "50-", 11,
  "Albania", 1987, "female", "15-24", 18, 
  "Albania", 1987, "female", "35-50", 2, 
  "Albania", 1987, "female", "50-", 1,
  "Albania", 1988, "male", "15-24", 50, 
  "Albania", 1988, "male", "35-50", 2, 
  "Albania", 1988, "male", "50-", 11,
  "Albania", 1988, "female", "15-24", 17, 
  "Albania", 1988, "female", "35-50", 20, 
  "Albania", 1988, "female", "50-", 10,
  "Albania", 1989, "male", "15-24", 0, 
  "Albania", 1989, "male", "35-50", 2, 
  "Albania", 1989, "male", "50-", 1,
  "Albania", 1989, "female", "15-24", 7, 
  "Albania", 1989, "female", "35-50", 2, 
  "Albania", 1989, "female", "50-", 1,
  "Germany", 1987, "male", "15-24", 50, 
  "Germany", 1987, "male", "35-50", 2, 
  "Germany", 1987, "male", "50-", 11,
  "Germany", 1987, "female", "15-24", 18, 
  "Germany", 1987, "female", "35-50", 20, 
  "Germany", 1987, "female", "50-", 1,
  "Germany", 1988, "male", "15-24", 0, 
  "Germany", 1988, "male", "35-50", 2, 
  "Germany", 1988, "male", "50-", 110,
  "Germany", 1988, "female", "15-24", 17, 
  "Germany", 1988, "female", "35-50", 20, 
  "Germany", 1988, "female", "50-", 10,
  "Germany", 1989, "male", "15-24", 0, 
  "Germany", 1989, "male", "35-50", 20, 
  "Germany", 1989, "male", "50-", 1,
  "Germany", 1989, "female", "15-24", 73, 
  "Germany", 1989, "female", "35-50", 2, 
  "Germany", 1989, "female", "50-", 11

)
dataset %>% tail


dataset2 <- dummy_cols(dataset, "age") %>% select(-age)
panel <- pdata.frame(dataset2, index = c("country", "year"))

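Because of the age bands (and the two sexes), each cross-sectional unit has several observations per year.

How can this dataset be transformed so that it works as panel data and can be estimated with random or fixed effects?

Using: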

library(plm)

fixex = plm(suicides_no ~ factor(sex) + factor(age), index = c("country", "year"), data = dataset, model = "within")

does not work. How can I transform the data so that the model can be estimated?

r dplyr kaggle
1 answer

The plm() function requires each combination of ID and time to be unique. If you run:

library(dplyr)

dataset %>%
 count(country, year)

you will see that every country-year combination has six observations:

  country  year     n
  <chr>   <dbl> <int>
1 Albania  1987     6
2 Albania  1988     6
3 Albania  1989     6
4 Germany  1987     6
5 Germany  1988     6
6 Germany  1989     6
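
The same check can be sketched in base R, without dplyr. The data frame below is a hypothetical miniature with the same shape as the question's data (2 countries, 3 years, 2 sexes, 3 age bands); its suicides_no values are placeholders, not the real counts.

```r
# Hypothetical miniature of the question's data; values are placeholders
dataset <- expand.grid(
  age     = c("15-24", "35-50", "50-"),
  sex     = c("male", "female"),
  year    = 1987:1989,
  country = c("Albania", "Germany"),
  stringsAsFactors = FALSE
)
dataset$suicides_no <- seq_len(nrow(dataset))

# Count the rows for each (country, year) pair
rows_per_pair <- aggregate(suicides_no ~ country + year, data = dataset, FUN = length)
names(rows_per_pair)[3] <- "n"
print(rows_per_pair)  # each pair appears 6 times (2 sexes x 3 age bands)

# anyDuplicated() returns 0 only when every (country, year) row is unique
has_duplicates <- anyDuplicated(dataset[c("country", "year")]) > 0
print(has_duplicates)
```

If has_duplicates is TRUE, country and year alone cannot serve as the panel index.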

To avoid this, you need to create unique IDs. I think they can be built from country, age, and sex. You can then do the following:

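library(broom)

dataset %>%
 # one ID per country/sex/age combination (every column except suicides_no and year)
 mutate(ID = group_indices(., !!!select(., -suicides_no, -year))) %>%
 mutate_at(vars(sex, age), as.factor) %>%
 # note: plm() reads index as c(individual, time), so with this order
 # year is treated as the individual dimension and ID as the time dimension
 do(tidy(plm(suicides_no ~ sex + age, 
             index = c("year", "ID"), 
             model = "within",
             data = .)))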

  term     estimate std.error statistic p.value
  <chr>       <dbl>     <dbl>     <dbl>   <dbl>
1 sexmale      5.17      7.82     0.661   0.514
2 age35-50   -15.5       9.57    -1.62    0.116
3 age50-     -10.1       9.57    -1.05    0.301
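
Before fitting anything, it may help to confirm that the new ID really makes each (ID, year) pair unique. The sketch below assumes dplyr 1.0 or later (for cur_group_id()) and again uses a hypothetical miniature of the data with placeholder values. The names with_id, n_dups, dims, and variation are illustrative only.

```r
library(dplyr)
library(plm)

# Hypothetical miniature of the question's data; values are placeholders
dataset <- expand.grid(
  age     = c("15-24", "35-50", "50-"),
  sex     = c("male", "female"),
  year    = 1987:1989,
  country = c("Albania", "Germany"),
  stringsAsFactors = FALSE
)
dataset$suicides_no <- seq_len(nrow(dataset))

# One ID per country/sex/age cell
with_id <- dataset %>%
  group_by(country, sex, age) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

# Number of (ID, year) pairs that occur more than once; 0 means the index is unique
n_dups <- with_id %>% count(ID, year) %>% filter(n > 1) %>% nrow()

# Declare the panel with the individual index first and the time index second
panel <- pdata.frame(as.data.frame(with_id), index = c("ID", "year"))
dims <- pdim(panel)
print(dims)

# sex and age never change within an ID
variation <- with_id %>%
  group_by(ID) %>%
  summarise(n_sex = n_distinct(sex), n_age = n_distinct(age))
print(variation)
```

Since sex and age are constant within each ID, a within (fixed effects) model indexed by c("ID", "year") cannot identify their coefficients, because the within transformation removes time-invariant regressors. Which dimension should absorb the fixed effects is a modeling choice this sketch does not settle.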