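I want to split a column that contains multiple comma-separated responses into multiple columns. I am using the cSplit_e function from the splitstackshape package. Unfortunately, some of the items themselves contain commas, so I am trying to specify that the split should happen only at commas that are not followed by a space.
This is the call I have right now: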
cSplit_e(data=df,split.col="question",sep=",",type="character")
This takes:
Behavior; green, pink, blue,Sleep; indigo, violet, puce
and creates separate columns for:
question_Behavior; green
question_pink
question_blue
question_Sleep; indigo
question_violet
question_puce
But I would like it to split like this:
question_Behavior; green, pink, blue
question_Sleep; indigo, violet, puce
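I'm not sure how to tell cSplit_e that I only want it to split at commas immediately followed by a non-space character, and would appreciate any help!
Sample data frame: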
id_num <- c("1","2","3","4","5")
question <- c("Behavior; green, pink, blue,Sleep; indigo, violet, puce","Behavior; green, pink, blue","","Sleep; indigo, violet, puce","Behavior; green, pink, blue,Sleep; indigo, violet, puce")
df <- data.frame(id_num,question)
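If you don't mind using the tidyr package, here is a suggestion for a possible solution. It may not be as elegant or simple as one using the splitstackshape package, but I can't say for sure.
My code: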
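library(dplyr)
library(tidyr)
df %>%
  # split only at commas that have a non-space character on both sides
  separate_rows(question, sep = "(?<=\\S),(?=\\S)", convert = FALSE) %>%
  # text before the first ";" becomes the question name, the rest its response
  separate(question, into = c("question", "response"), sep = ";", extra = "merge") %>%
  # drop rows without a response (the empty question for id_num 3)
  filter(!is.na(response)) %>%
  pivot_wider(names_from = question, values_from = response) %>%
  rename_all(~gsub("\\.", "_", .))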
Output:
# A tibble: 4 × 3
  id_num Behavior             Sleep
  <chr>  <chr>                <chr>
1 1      " green, pink, blue" " indigo, violet, puce"
2 2      " green, pink, blue" NA
3 4      NA                   " indigo, violet, puce"
4 5      " green, pink, blue" " indigo, violet, puce"
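If one indicator column per full response is what you are after (closer to what cSplit_e produces), the same "comma followed by a non-space" idea can also be sketched in base R. This is only an illustration, not a cSplit_e solution: the names parts, responses, indicators and result are made up for this example, and I am assuming 0/1 indicator columns are acceptable. It uses strsplit with perl = TRUE, because base R's default regex engine does not support lookahead.

```r
# Sample data from the question.
id_num <- c("1", "2", "3", "4", "5")
question <- c("Behavior; green, pink, blue,Sleep; indigo, violet, puce",
              "Behavior; green, pink, blue",
              "",
              "Sleep; indigo, violet, puce",
              "Behavior; green, pink, blue,Sleep; indigo, violet, puce")
df <- data.frame(id_num, question, stringsAsFactors = FALSE)

# Split only at commas that are immediately followed by a non-space character.
# perl = TRUE enables the (?=...) lookahead.
parts <- strsplit(df$question, ",(?=\\S)", perl = TRUE)

# Every distinct response that occurs in any row.
responses <- unique(unlist(parts))

# One column per response: 1 if the row contains it, 0 otherwise.
indicators <- sapply(responses, function(r) {
  vapply(parts, function(p) as.integer(r %in% p), integer(1))
})
colnames(indicators) <- paste0("question_", responses)

# Attach the indicator columns to the original data frame.
result <- cbind(df, indicators)
print(result)
```

The empty question in row 3 splits into zero pieces, so that row gets 0 in every indicator column instead of being dropped as in the tidyr version.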