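I have a variable in my data frame called "Cardiac Comorbidity Types" that contains either NA or a comma-separated list of cardiac comorbidity types. How can I create one column for each possible comorbidity and fill it with 1/0 for each observation, where 1 = comorbidity present and 0 = comorbidity absent?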
dput(head(et1$`Cardiac Comorbidity Types`,20))
c("MI,", NA, "CAD, Previous CABG or PTCA, MI, Pacemaker,", "Arrhythmia,",
"CAD, Previous CABG or PTCA, MI, Arrhythmia,", NA, "CAD, Previous CABG or PTCA, MI,",
"CAD, Previous CABG or PTCA, CHF, Pacemaker,", "CAD, Previous CABG or PTCA,",
"CAD, Previous CABG or PTCA, Arrhythmia,", "CAD, Previous CABG or PTCA,",
"CAD, Previous CABG or PTCA, MI,", "CAD, Previous CABG or PTCA, CHF, Arrhythmia,",
"CAD, Previous CABG or PTCA, Pacemaker,", "CAD, Previous CABG or PTCA, MI, CHF,",
"CAD, Previous CABG or PTCA, MI, CHF,", NA, "CAD, Previous CABG or PTCA, PVD, Pacemaker,",
"PVD,", "CAD, Previous CABG or PTCA,")
Also, what would I do if the data were separated by semicolons instead?
This is a little tricky, but we can combine unnest and pivot_wider from tidyr.
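library(dplyr)
library(tidyr)
library(stringr)
# Add a row ID so each observation can be matched up again after reshaping
data <- data %>% mutate(ID = row_number())
data %>%
  # Split each string on commas (optionally followed by a space)
  mutate(Cardiac.Comorbidity.Types = str_split(Cardiac.Comorbidity.Types, ", ?")) %>%
  unnest(Cardiac.Comorbidity.Types) %>%
  # Drop the empty pieces left by trailing commas; NA rows are dropped here too
  filter(Cardiac.Comorbidity.Types != "") %>%
  pivot_wider(id_cols = ID, names_from = Cardiac.Comorbidity.Types, values_from = Cardiac.Comorbidity.Types) %>%
  # Bring back the original column and the observations that were NA
  right_join(data, by = "ID") %>%
  # Flags every column except these two, so data should hold no other variables
  mutate(across(-c(ID, Cardiac.Comorbidity.Types), ~ as.integer(!is.na(.x)))) %>% select(-ID)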
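For the semicolon follow-up, it looks like only the split pattern needs to change. Below is a minimal sketch of that variant, not a definitive implementation. It assumes the column is named Cardiac.Comorbidity.Types as in the pipeline above; the toy data frame and the `result` name are invented here purely for illustration, and `arrange(ID)` is added only to restore the original row order.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy data, invented for illustration only (semicolon-separated, trailing ";")
data <- tibble(
  Cardiac.Comorbidity.Types = c("MI;", NA, "CAD; MI; Pacemaker;", "Arrhythmia;")
)

# Row ID so observations can be re-joined after reshaping
data <- data %>% mutate(ID = row_number())

result <- data %>%
  # Same idea as before; the pattern uses ";" instead of ","
  mutate(Cardiac.Comorbidity.Types = str_split(Cardiac.Comorbidity.Types, "; ?")) %>%
  unnest(Cardiac.Comorbidity.Types) %>%
  # Remove empty pieces from trailing separators (also removes NA rows)
  filter(Cardiac.Comorbidity.Types != "") %>%
  pivot_wider(id_cols = ID,
              names_from = Cardiac.Comorbidity.Types,
              values_from = Cardiac.Comorbidity.Types) %>%
  # Restore the original column and the NA observations
  right_join(data, by = "ID") %>%
  # 1 if the comorbidity was listed, 0 otherwise
  mutate(across(-c(ID, Cardiac.Comorbidity.Types), ~ as.integer(!is.na(.x)))) %>%
  arrange(ID) %>%
  select(-ID)

print(result)
```

If a file mixes both separators, a character-class pattern such as "[,;] ?" in str_split should split on either one.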