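For an analysis of the European Social Survey (ESS), I am trying to compute the share of respondents whose educational attainment is higher than that of their parents. I planned to do this with a for loop, but I cannot work out how to compute the share separately for each country and year. The rows of the data frame are individual observations (about 400,000), and I have columns indicating each respondent's country (cntry) and survey round (essround). My code looks like this: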
for (i in 1:nrow(ESS_cleann)) {
ESS_cleann$abs_mobility[i] <- ESS_cleann[ESS_cleann[cntry]==i && ESS_cleann[essround]==i] length(ESS_cleann$educ_mobility[i] [ESS_clean$educ_mobility [i] == "U"])/ESS_cleann[ESS_cleann[cntry]==i&& ESS_cleann[essround]==i] length(ESS_cleann$educ_mobility[i])
}
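I am well aware that this is wrong, but I cannot figure out how to tell R to compute the share separately for each country and year. Any help is greatly appreciated! To give you an idea of the data structure, here is the head of the three relevant columns: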
  ESS_cleann.cntry ESS_cleann.essround ESS_cleann.educ_mobility
1               AT                   2                        D
2               AT                   2                        D
3               AT                   3                        U
4               AT                   3                        U
5               AT                   1                        N
6               AT                   3                        N
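library(dplyr)

# Simulate a small data set with the same three columns as the ESS data
set.seed(2020)
cntry <- sample(c("AT", "UK"), 100, replace = TRUE)
essround <- sample(1:3, 100, replace = TRUE)
mobility <- sample(c("D", "U", "N"), 100, replace = TRUE)
ESS <- data.frame(cntry, essround, mobility)

# Count each mobility category per country and round, then turn counts into shares
ESS %>%
  group_by(cntry, essround, mobility, .drop = FALSE) %>%
  summarise(counts = n()) %>%
  mutate(perc = counts / sum(counts))  # still grouped by cntry and essround here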
#> # A tibble: 18 x 5
#> # Groups:   cntry, essround [6]
#>    cntry essround mobility counts  perc
#>    <chr>    <int> <chr>     <int> <dbl>
#>  1 AT           1 D             6 0.429
#>  2 AT           1 N             4 0.286
#>  3 AT           1 U             4 0.286
#>  4 AT           2 D             3 0.273
#>  5 AT           2 N             5 0.455
#>  6 AT           2 U             3 0.273
#>  7 AT           3 D             5 0.333
#>  8 AT           3 N             4 0.267
#>  9 AT           3 U             6 0.4
#> 10 UK           1 D             7 0.318
#> 11 UK           1 N             6 0.273
#> 12 UK           1 U             9 0.409
#> 13 UK           2 D             4 0.25
#> 14 UK           2 N             7 0.438
#> 15 UK           2 U             5 0.312
#> 16 UK           3 D             7 0.318
#> 17 UK           3 N            10 0.455
#> 18 UK           3 U             5 0.227
Created on 2020-05-11 by the reprex package (v0.3.0)
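If only the upward share is needed, the table above can be narrowed to one number per country and round. The following is a minimal base R sketch, assuming the same kind of simulated `ESS` data frame and assuming that "U" marks respondents with more education than their parents. The names `is_up`, `up_share` and `abs_mobility` are chosen here for illustration only.

```r
# Simulated data with the same three columns as the example above
set.seed(2020)
ESS <- data.frame(
  cntry    = sample(c("AT", "UK"), 100, replace = TRUE),
  essround = sample(1:3, 100, replace = TRUE),
  mobility = sample(c("D", "U", "N"), 100, replace = TRUE)
)

# Flag upward mobility; the mean of a 0/1 flag is the share of 1s
ESS$is_up <- as.numeric(ESS$mobility == "U")

# One row per country-round holding the share of "U" respondents
up_share <- aggregate(is_up ~ cntry + essround, data = ESS, FUN = mean)
names(up_share)[3] <- "abs_mobility"
up_share
```

Taking the mean of a logical or 0/1 vector avoids counting the numerator and denominator separately, which is where the loop in the question went wrong.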
library(data.table)  # DT is assumed to be a data.table; the column names are placeholders
DT[, .(share = mean(education.level > parent.education.level)), by = .(country, year)]
If you want to do this with a for loop, I imagine something like this would work:
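# Assumes DT is a data.table with placeholder columns country, year,
# education.level and parent.education.level
shares <- list()  # one share per country-year combination
for (yr in unique(DT$year)) {
  for (ctry in unique(DT$country)) {
    subtable <- DT[year == yr & country == ctry]
    if (nrow(subtable) == 0) next  # skip combinations with no respondents
    shares[[paste(ctry, yr)]] <- nrow(subtable[education.level > parent.education.level]) / nrow(subtable)
  }
}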
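The loop can also be avoided altogether. As a rough sketch in base R, `ave()` computes a group mean and writes it back onto every row of the group, which is close to what the `abs_mobility` column in the question was meant to hold. The toy data frame `df` and its values are invented for illustration; only the column names follow the question.

```r
# Toy data (hypothetical values), columns named as in the question
df <- data.frame(
  cntry         = c("AT", "AT", "AT", "AT", "UK", "UK"),
  essround      = c(2, 2, 3, 3, 2, 2),
  educ_mobility = c("D", "U", "U", "U", "N", "U")
)

# 1 if the respondent is more educated than the parents, else 0
is_up <- as.numeric(df$educ_mobility == "U")

# Group mean per country-round, repeated on every row of that group
df$abs_mobility <- ave(is_up, df$cntry, df$essround, FUN = mean)
df
```

In this toy example the two AT rows from round 2 both receive 0.5, since one of the two respondents is coded "U".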
Hope this helps. Best wishes, JA.