基于列分隔列表的内容创建二进制分类变量

问题描述 投票:0回答:1

我的数据框中有一个变量,称为“心脏合并症类型”,其中包含NA或各种心脏合并症类型的列分隔列表。我如何为每种可能的合并症创建一列,然后用1/0填充观察值,其中1 =表示合并症,0 =无合并症。

dput(head(et1$`Cardiac Comorbidity Types`,20))
c("MI,", NA, "CAD, Previous CABG or PTCA, MI, Pacemaker,", "Arrhythmia,", 
"CAD, Previous CABG or PTCA, MI, Arrhythmia,", NA, "CAD, Previous CABG or PTCA, MI,", 
"CAD, Previous CABG or PTCA, CHF, Pacemaker,", "CAD, Previous CABG or PTCA,", 
"CAD, Previous CABG or PTCA, Arrhythmia,", "CAD, Previous CABG or PTCA,", 
"CAD, Previous CABG or PTCA, MI,", "CAD, Previous CABG or PTCA, CHF, Arrhythmia,", 
"CAD, Previous CABG or PTCA, Pacemaker,", "CAD, Previous CABG or PTCA, MI, CHF,", 
"CAD, Previous CABG or PTCA, MI, CHF,", NA, "CAD, Previous CABG or PTCA, PVD, Pacemaker,", 
"PVD,", "CAD, Previous CABG or PTCA,")

此外,如果数据以分号分隔,该怎么办?

r database data-cleaning delimited-text medical
1个回答
0
投票

这有点棘手,但是我们可以结合使用unnest中的pivot_widertidyr

data <- data %>% mutate(ID = 1:nrow(data))

data %>% 
  mutate(Cardiac.Comorbidity.Types = str_split(Cardiac.Comorbidity.Types, ", ?")) %>%
  unnest(Cardiac.Comorbidity.Types) %>%
  filter(Cardiac.Comorbidity.Types != "") %>%
  pivot_wider(id_cols = "ID", names_from = Cardiac.Comorbidity.Types, values_from = Cardiac.Comorbidity.Types) %>%
  right_join(data, by="ID") %>%
  mutate_at(vars(-ID,-Cardiac.Comorbidity.Types), ~ as.integer(!is.na(.x))) %>% select(-ID)
© www.soinside.com 2019 - 2024. All rights reserved.