How do I split a column's values into separate columns?


My data frame is messy: it packs several values of the variable I am interested in into a single column.

My df looks like this:

      eid                                edu
1 1009467 "A levels/AS levels or equivalent"
2 1016906 "A levels/AS levels or equivalent"
3 1018742 "A levels/AS levels or equivalent"
4 1030778 "A levels/AS levels or equivalent","CSEs or equivalent"
5 1030785 "A levels/AS levels or equivalent","CSEs or equivalent"

Or you can copy it: structure(list(n = 399:401, edu = c("\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\"", "\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\",\"O levels/GCSEs or equivalent\"", "\"A levels/AS levels or equivalent\",\"College or University degree\",\"CSEs or equivalent\",\"NVQ or HND or HNC or equivalent\",\"O levels/GCSEs or equivalent\"")), row.names = c(NA, 3L), class = "data.frame")

The edu column can contain any of 8 education-level options:

"A levels/AS levels or equivalent",
"College or University degree",
"CSEs or equivalent",
"NVQ or HND or HNC or equivalent",
"O levels/GCSEs or equivalent",
"Other professional qualifications eg: nursing, teaching",
"Prefer not to answer",
"None of the above"

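The messiest part is that several options can appear in the same cell, for example the first five all selected together.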
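One detail of these labels matters for any splitting approach: "Other professional qualifications eg: nursing, teaching" contains a comma itself, so splitting a cell on a bare comma would cut that label in two. Below is a minimal base R sketch, assuming each cell stores its options as double-quoted labels joined by commas, exactly as displayed above. The variable name `cell` and its value are made up for illustration.

```r
# Hypothetical cell value: two options, one of which contains a comma.
cell <- "\"CSEs or equivalent\",\"Other professional qualifications eg: nursing, teaching\""

# Splitting on a bare comma yields three pieces and breaks the second label.
naive <- strsplit(cell, ",", fixed = TRUE)[[1]]
length(naive)

# Splitting on the quote-comma-quote delimiter keeps each label whole;
# the leftover outer quotes are then stripped.
parts <- strsplit(cell, "\",\"", fixed = TRUE)[[1]]
parts <- gsub("^\"|\"$", "", parts)
parts
```

If the real data has spaces after the separating commas, the delimiter would need to allow for them, for example with a regular expression instead of a fixed string.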

I want to split my edu column by value into new variables: each new variable should be 1 if edu contains that option and 0 if it does not.

Like this:

      eid                                edu 
1 1009467 "A levels/AS levels or equivalent"           
2 1016906 "A levels/AS levels or equivalent"           
3 1018742 "A levels/AS levels or equivalent"           
4 1030778 "A levels/AS levels or equivalent","CSEs or equivalent"        
5 1043561 "A levels/AS levels or equivalent","CSEs or equivalent"       

A levels CSEs
1        0
1        0
1        0
1        1
1        1

Thanks!

r dplyr stringr
1 Answer

Try this:

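library(dplyr) # .by in summarise() needs dplyr >= 1.1.0
library(tidyr)

df %>% 
  # One row per selected option. Split on quote-comma-quote rather than a bare
  # comma, so the comma inside "... nursing, teaching" is not a separator.
  separate_rows(edu, sep = "\",\\s*\"") %>% 
  # Strip the remaining leading/trailing double quote.
  mutate(edu = gsub("^\"|\"$", "", edu)) %>% 
  # One 0/1 indicator per option.
  mutate(A_levels = ifelse(edu %in% "A levels/AS levels or equivalent", 1, 0),
         College_Uni = ifelse(edu %in% "College or University degree", 1, 0),
         CSEs = ifelse(edu %in% "CSEs or equivalent", 1, 0),
         NVQ_HND_HNC = ifelse(edu %in% "NVQ or HND or HNC or equivalent", 1, 0),
         O_levels_GCSEs = ifelse(edu %in% "O levels/GCSEs or equivalent", 1, 0),
         Other_prof_qual = ifelse(edu %in% "Other professional qualifications eg: nursing, teaching", 1, 0),
         Prefer_not_to_answer = ifelse(edu %in% "Prefer not to answer", 1, 0),
         None_of_the_above = ifelse(edu %in% "None of the above", 1, 0)) %>% 
  # Collapse back to one row per eid: an indicator is 1 if any row matched.
  # Note: max() on edu keeps only the alphabetically last option per eid.
  summarise(across(everything(), max), .by = eid)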
      eid edu                              A_levels Colleg…¹  CSEs NVQ_H…² O_lev…³ Other…⁴ Prefe…⁵ None_…⁶
    <int> <chr>                               <dbl>    <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 1009467 A levels/AS levels or equivalent        1        0     0       0       0       0       0       0
2 1016906 A levels/AS levels or equivalent        1        0     0       0       0       0       0       0
3 1018742 A levels/AS levels or equivalent        1        0     0       0       0       0       0       0
4 1030778 CSEs or equivalent                      1        0     1       0       0       0       0       0
5 1030785 CSEs or equivalent                      1        0     1       0       0       0       0       0
# … with abbreviated variable names ¹​College_Uni, ²​NVQ_HND_HNC, ³​O_levels_GCSEs, ⁴​Other_prof_qual,
#   ⁵​Prefer_not_to_answer, ⁶​None_of_the_above
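
An alternative that leaves the original edu column untouched is to test each label directly with fixed-string matching and add one indicator column per option. This is a base R sketch, not part of the pipeline above. The two-row sample data frame and the short column names in `levels_map` are assumptions chosen for illustration, and it assumes the double quotes are part of the stored strings.

```r
# Sample data mirroring the question (the quotes are part of the strings).
df <- data.frame(
  eid = c(1009467L, 1030778L),
  edu = c("\"A levels/AS levels or equivalent\"",
          "\"A levels/AS levels or equivalent\",\"CSEs or equivalent\""),
  stringsAsFactors = FALSE
)

# New column name -> label to look for (column names are illustrative).
levels_map <- c(
  A_levels             = "A levels/AS levels or equivalent",
  College_Uni          = "College or University degree",
  CSEs                 = "CSEs or equivalent",
  NVQ_HND_HNC          = "NVQ or HND or HNC or equivalent",
  O_levels_GCSEs       = "O levels/GCSEs or equivalent",
  Other_prof_qual      = "Other professional qualifications eg: nursing, teaching",
  Prefer_not_to_answer = "Prefer not to answer",
  None_of_the_above    = "None of the above"
)

# Wrapping each label in quotes guards against partial matches;
# fixed = TRUE treats "/" and ":" literally instead of as regex syntax.
for (nm in names(levels_map)) {
  pattern <- paste0("\"", levels_map[[nm]], "\"")
  df[[nm]] <- as.integer(grepl(pattern, df$edu, fixed = TRUE))
}

df
```

Because nothing is split or collapsed, each eid keeps its full edu string next to the new 0/1 columns.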