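I'm trying to separate a list in R by several different values, and I feel like I'm overcomplicating what I need to do.
I want to put the "positive" entries of the list (i.e., whatever starts the list or has a + sign in front of it) into a Positive column.
Anything with a - sign goes into a Negative column,
anything in c("EmilyP", "EmilyS") goes into an Emily column,
and anything in c("Red", "Blue") goes into a Color column.
I've tried dplyr and tidyr but couldn't get it to work, and then I started writing loops, which seems complicated.
Can anyone suggest a better way?
(Input and output below.)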
input <- structure(list(Team.Name = c("Team 1", "Team 2", "Team 3", "Team 4",
"Team 5", "Team 6"), Members = c("Frank + Terry - Joan - Bob + EmilyS + Red",
"Frank + Bob - Neil - Janet - Tim + EmilyP + Blue", "Frank + Blue - Joan - Bob + EmilyP + Red",
"Tom + Jerry - Bill - Jenny", "Tess + Jean + Jill + EmilyS",
"Bill + Bob + Red")), class = "data.frame", row.names = c(NA,
-6L))
And I'm trying to get this:
output <- structure(list(Team.Name = c("Team 1", "Team 2", "Team 3", "Team 4",
"Team 5", "Team 6"), Positive = c("Frank + Terry", "Frank + Bob",
"Frank", "Tom + Jerry", "Tess + Jean + Jill", "Bill + Bob"),
Negative = c("Joan - Bob", "Neil - Janet - Tim", "Joan - Bob",
"Bill - Jenny", "", ""), Emily = c("EmilyS", "EmilyP", "EmilyP",
"", "EmilyS", ""), Color = c("Red", "Blue", "Red + Blue",
"", "", "Red")), class = "data.frame", row.names = c(NA,
-6L))
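The rule being asked for works token by token: split each string just before every sign, treat the unsigned first name as positive, and route Emily and colour names to their own columns regardless of their sign. A minimal base-R sketch of that rule, assuming the Emily values are exactly `EmilyP`/`EmilyS` and the colours exactly `Red`/`Blue`; the helper `classify_members()` is hypothetical, not part of any package:

```r
# Hypothetical helper: classify one Members string into the four target columns.
# Assumes Emily values are exactly "EmilyP"/"EmilyS" and colours are "Red"/"Blue".
classify_members <- function(members,
                             emily = c("EmilyP", "EmilyS"),
                             color = c("Red", "Blue")) {
  # Split at a space followed by a sign, keeping the sign with its name:
  # "Frank + Terry - Joan" -> "Frank", "+ Terry", "- Joan"
  tokens <- strsplit(members, "\\s(?=[+-])", perl = TRUE)[[1]]
  # A token without a leading "-" (including the unsigned first one) is positive
  sign <- ifelse(substr(tokens, 1, 1) == "-", "-", "+")
  name <- trimws(sub("^[+-]\\s*", "", tokens))
  # Emily and colour names get their own type, whatever their sign
  type <- ifelse(name %in% emily, "Emily",
          ifelse(name %in% color, "Color",
          ifelse(sign == "-", "Negative", "Positive")))
  # Join each type with the separator used in the desired output; "" if empty
  joiner <- c(Positive = " + ", Negative = " - ", Emily = " + ", Color = " + ")
  vapply(names(joiner),
         function(ty) paste(name[type == ty], collapse = joiner[[ty]]),
         character(1))
}

classify_members("Frank + Terry - Joan - Bob + EmilyS + Red")

# Applied to several strings at once: one row per string, one column per type
members <- c("Frank + Terry - Joan - Bob + EmilyS + Red",
             "Tess + Jean + Jill + EmilyS")
res <- t(vapply(members, classify_members,
                c(Positive = "", Negative = "", Emily = "", Color = ""),
                USE.NAMES = FALSE))
res
```

Names are kept in order of appearance, so for Team 3 this yields "Blue + Red" rather than the "Red + Blue" shown in the desired output.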
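Here is what I have for now. First, I split Members and used map_dfr() to create a data frame. Then I did some string manipulation. The first member in each group has no +, so I added one to it. I replaced the + or - in front of an Emily entry with Emily, and likewise replaced the + or - in front of a colour name (any name found in R's colors()) with color. Then I split the value column with separate(). For each group, I combined all names with toString(). Finally, I converted the data to wide format.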
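library(tidyverse)
library(stringi)

# Split each Members string before every "+" or "-"; one row per token
map_dfr(.x = stri_split_regex(str = input$Members, pattern = "\\s(?=[+-])"),
        .f = enframe,
        .id = "id") %>%
  # The first member of each team has no sign, so prefix it with "+ "
  mutate(value = if_else(!substr(x = value, start = 1, stop = 1) %in% c("+", "-"),
                         paste("+ ", value, sep = ""), value),
         # Replace the sign in front of EmilyP/EmilyS with the label "Emily"
         value = if_else(grepl(x = value, pattern = "Emily[A-Z]"),
                         sub(x = value, pattern = "[+-]", replacement = "Emily"),
                         value),
         # Replace the sign in front of any colour name known to colors() with "color"
         value = if_else(sub(x = value, pattern = "[+-]\\s", replacement = "") %in%
                           stri_trans_totitle(colors(distinct = TRUE)),
                         sub(x = value, pattern = "[+-]", replacement = "color"),
                         value)) %>%
  # Split "label name" into a type column and a value column
  separate(col = "value", into = c("type", "value"), sep = "\\s") %>%
  # Collapse the names of each type within a team into one string
  group_by(id, type) %>%
  summarise(value = toString(value)) %>%
  # One row per team, one column per type
  pivot_wider(id_cols = "id", names_from = "type", values_from = "value")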
#  id    `-`              `+`              color     Emily
#  <chr> <chr>            <chr>            <chr>     <chr>
#1 1     Joan, Bob        Frank, Terry     Red       EmilyS
#2 2     Neil, Janet, Tim Frank, Bob       Blue      EmilyP
#3 3     Joan, Bob        Frank            Blue, Red EmilyP
#4 4     Bill, Jenny      Tom, Jerry       NA        NA
#5 5     NA               Tess, Jean, Jill NA        EmilyS
#6 6     NA               Bill, Bob        Red       NA
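This result still differs from the requested one: the columns are named `-`/`+`, names are joined with commas, and missing groups are `NA`. A sketch of one way to close that gap with the same packages, assuming the Emily and colour names are the fixed sets from the question (a swap from the `colors()` lookup, so any other colour name would be treated as a person); `emily`, `colour`, `negative`, `joiner` and `result` are illustrative names:

```r
library(tidyverse)

# Sample data from the question
input <- data.frame(
  Team.Name = paste("Team", 1:6),
  Members = c("Frank + Terry - Joan - Bob + EmilyS + Red",
              "Frank + Bob - Neil - Janet - Tim + EmilyP + Blue",
              "Frank + Blue - Joan - Bob + EmilyP + Red",
              "Tom + Jerry - Bill - Jenny", "Tess + Jean + Jill + EmilyS",
              "Bill + Bob + Red"),
  stringsAsFactors = FALSE
)

# Assumed fixed sets, taken from the question (instead of colors())
emily  <- c("EmilyP", "EmilyS")
colour <- c("Red", "Blue")

result <- input %>%
  # One row per member; the lookahead keeps each sign attached to its name
  mutate(Members = str_split(Members, "\\s(?=[+-])")) %>%
  unnest(cols = Members) %>%
  mutate(name = str_remove(Members, "^[+-]\\s*"),
         negative = str_starts(Members, "-"),
         type = case_when(name %in% emily  ~ "Emily",
                          name %in% colour ~ "Color",
                          negative         ~ "Negative",
                          TRUE             ~ "Positive"),
         # Separator used in the desired output for each type
         joiner = if_else(type == "Negative", " - ", " + ")) %>%
  group_by(Team.Name, type) %>%
  summarise(value = paste(name, collapse = joiner[1]), .groups = "drop") %>%
  # Empty strings instead of NA where a team has no member of a type
  pivot_wider(names_from = type, values_from = value,
              values_fill = list(value = "")) %>%
  select(Team.Name, Positive, Negative, Emily, Color)

result
```

As in the pipeline above, names keep their order of appearance, so Team 3's colours come out as "Blue + Red" rather than "Red + Blue".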