My data frame is messy: it stores several values I care about together in a single column.
eid edu
1 1009467 "A levels/AS levels or equivalent"
2 1016906 "A levels/AS levels or equivalent"
3 1018742 "A levels/AS levels or equivalent"
4 1030778 "A levels/AS levels or equivalent","CSEs or equivalent"
5 1030785 "A levels/AS levels or equivalent","CSEs or equivalent"
Or you can copy it:
structure(list(n = 399:401, edu = c("\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\"",
"\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\",\"O levels/GCSEs or equivalent\"",
"\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\",\"O levels/GCSEs or equivalent\""
)), row.names = c(NA, 3L), class = "data.frame")
The edu column can contain any of 8 education level options:
"A levels/AS levels or equivalent",
"College or University degree",
"CSEs or equivalent",
"NVQ or HND or HNC or equivalent",
"O levels/GCSEs or equivalent",
"Other professional qualifications eg: nursing, teaching",
"Prefer not to answer",
"None of the above"
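The messiest part is that several options can appear at once; for example, the first five can all be selected together.
I want to split my edu column by value into new variables: if edu contains an option, the matching variable should be 1, otherwise 0.
Like this: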
eid edu
1 1009467 "A levels/AS levels or equivalent"
2 1016906 "A levels/AS levels or equivalent"
3 1018742 "A levels/AS levels or equivalent"
4 1030778 "A levels/AS levels or equivalent","CSEs or equivalent"
5 1043561 "A levels/AS levels or equivalent","CSEs or equivalent"
A levels CSEs
1 0
1 0
1 0
1 1
1 1
Thanks!
Try this:
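library(dplyr)  # >= 1.1.0, needed for the .by argument of summarise()
library(tidyr)

df %>%
  # split on the "," between quoted options rather than on every comma, because
  # "Other professional qualifications eg: nursing, teaching" contains a comma itself
  separate_rows(edu, sep = "\",\"") %>%
  # strip the leftover leading/trailing double quote
  mutate(edu = gsub("^\"|\"$", "", edu)) %>%
  mutate(A_levels = ifelse(edu %in% "A levels/AS levels or equivalent", 1, 0),
         College_Uni = ifelse(edu %in% "College or University degree", 1, 0),
         CSEs = ifelse(edu %in% "CSEs or equivalent", 1, 0),
         NVQ_HND_HNC = ifelse(edu %in% "NVQ or HND or HNC or equivalent", 1, 0),
         O_levels_GCSEs = ifelse(edu %in% "O levels/GCSEs or equivalent", 1, 0),
         Other_prof_qual = ifelse(edu %in% "Other professional qualifications eg: nursing, teaching", 1, 0),
         Prefer_not_to_answer = ifelse(edu %in% "Prefer not to answer", 1, 0),
         None_of_the_above = ifelse(edu %in% "None of the above", 1, 0)) %>%
  # collapse back to one row per eid: max() is 1 if any split row matched
  # (for the character column edu, max() keeps the alphabetically last option)
  summarise(across(everything(), ~max(.)), .by = eid)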
eid edu A_levels Colleg…¹ CSEs NVQ_H…² O_lev…³ Other…⁴ Prefe…⁵ None_…⁶
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1009467 A levels/AS levels or equivalent 1 0 0 0 0 0 0 0
2 1016906 A levels/AS levels or equivalent 1 0 0 0 0 0 0 0
3 1018742 A levels/AS levels or equivalent 1 0 0 0 0 0 0 0
4 1030778 CSEs or equivalent 1 0 1 0 0 0 0 0
5 1030785 CSEs or equivalent 1 0 1 0 0 0 0 0
# … with abbreviated variable names ¹College_Uni, ²NVQ_HND_HNC, ³O_levels_GCSEs, ⁴Other_prof_qual,
# ⁵Prefer_not_to_answer, ⁶None_of_the_above
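If you would rather leave the original edu string untouched and avoid splitting rows altogether, a base R variant is possible. The following is only a minimal sketch: the small `df` is made-up sample data in the question's layout, `edu_levels` is a name introduced here for illustration, and it assumes every option inside edu is wrapped in double quotes, as in the dput output above.

```r
# Made-up sample data in the same layout as the question (illustrative only)
df <- data.frame(
  eid = c(1009467L, 1030778L, 1L),
  edu = c('"A levels/AS levels or equivalent"',
          '"A levels/AS levels or equivalent","CSEs or equivalent"',
          '"Other professional qualifications eg: nursing, teaching"'),
  stringsAsFactors = FALSE
)

# The eight options from the question, named with the short column names
# used in the answer above
edu_levels <- c(
  A_levels             = "A levels/AS levels or equivalent",
  College_Uni          = "College or University degree",
  CSEs                 = "CSEs or equivalent",
  NVQ_HND_HNC          = "NVQ or HND or HNC or equivalent",
  O_levels_GCSEs       = "O levels/GCSEs or equivalent",
  Other_prof_qual      = "Other professional qualifications eg: nursing, teaching",
  Prefer_not_to_answer = "Prefer not to answer",
  None_of_the_above    = "None of the above"
)

# One 0/1 vector per option: does the quoted option occur literally in edu?
# fixed = TRUE turns off regex interpretation, so "/" and ":" need no escaping.
flags <- lapply(edu_levels, function(lvl) {
  as.integer(grepl(paste0('"', lvl, '"'), df$edu, fixed = TRUE))
})

# Bind the indicator columns next to the untouched original columns
result <- cbind(df, as.data.frame(flags))
result
```

Because nothing is split on commas, the option that itself contains a comma ("nursing, teaching") is matched like any other, and edu keeps its full original value in every row.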