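I'm trying to clean up a table to use as an index. Since some entries correspond to multiple columns, I want to concatenate them into a single string where necessary. An example of what I'm trying to do: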
library(dplyr)
library(tidyr)  # nest() comes from tidyr
tibble(
  day = c("Mo", "Tu", "Tu", "We", "Th", "Fr", "Fr", "Fr"),
  see = c("cat", "cat", "dog", NA, "cat", "cat", "dog", "bird")
) %>%
  nest(data = see) %>%
  mutate(see = paste0(data)) %>%
  select(-data)
This almost works, but paste0 doesn't just paste the data in the list; instead I get:
# A tibble: 5 × 2
day see
<chr> <chr>
1 Mo "list(see = \"cat\")"
2 Tu "list(see = c(\"cat\", \"dog\"))"
3 We "list(see = NA)"
4 Th "list(see = \"cat\")"
5 Fr "list(see = c(\"cat\", \"dog\", \"bird\"))"
What do I need to change? I would like the see column to contain only comma- or space-separated strings, or to stay empty where the value is NA.
I don't think we need nest() here; we can use summarise() with paste(..., collapse = ", ") instead.
library(dplyr)
tibble(
day = c("Mo", "Tu","Tu", "We","Th","Fr","Fr","Fr"),
see = c("cat", "cat", "dog", NA, "cat","cat","dog","bird" )
) %>%
summarise(data = paste(see, collapse = ", "), .by = "day")
#> # A tibble: 5 × 2
#> day data
#> <chr> <chr>
#> 1 Mo cat
#> 2 Tu cat, dog
#> 3 We NA
#> 4 Th cat
#> 5 Fr cat, dog, bird
Created on 2023-03-16 with reprex v2.0.2
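The question also asks for the see column to stay empty where the value is NA, whereas paste() converts a missing value into the literal string "NA". A minimal sketch of one way to handle this, assuming it is acceptable to drop missing values before pasting (the names tib and res are only illustrative and not part of the original answer):

```r
library(dplyr)  # the .by argument needs dplyr 1.1.0 or later

tib <- tibble(
  day = c("Mo", "Tu", "Tu", "We", "Th", "Fr", "Fr", "Fr"),
  see = c("cat", "cat", "dog", NA, "cat", "cat", "dog", "bird")
)

# Drop NA values within each group before collapsing; a group that
# contains only NA is left with a zero-length vector, and paste()
# with collapse turns that into an empty string.
res <- tib %>%
  summarise(see = paste(see[!is.na(see)], collapse = ", "), .by = day)
```

With this variant the row for "We" should hold an empty string rather than "NA". If a true missing value is preferred, wrapping the result in dplyr::na_if(..., "") is one option.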
Another approach is to use toString:
tibble(
day = c("Mo", "Tu","Tu", "We","Th","Fr","Fr","Fr"),
see = c("cat", "cat", "dog", NA, "cat","cat","dog","bird" )
) %>%
group_by(day) %>%
summarise(see = toString(see))
# A tibble: 5 × 2
day see
<chr> <chr>
1 Fr cat, dog, bird
2 Mo cat
3 Th cat
4 Tu cat, dog
5 We NA
You could try a tidyverse solution with stringr::str_flatten_comma:
library(dplyr) #1.1.0+
library(stringr)
tib <- tibble(
day = c("Mo", "Tu","Tu", "We","Th","Fr","Fr","Fr"),
see = c("cat", "cat", "dog", NA, "cat","cat","dog","bird" )
)
tib %>%
summarise(see = stringr::str_flatten_comma(see), .by = day)
# A tibble: 5 × 2
day see
<chr> <chr>
1 Mo cat
2 Tu cat, dog
3 We NA
4 Th cat
5 Fr cat, dog, bird
For completeness, the data.table way:
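library(data.table)
# df was not defined in the original answer; assumed to be the question's example data
df <- data.frame(day = c("Mo", "Tu", "Tu", "We", "Th", "Fr", "Fr", "Fr"),
                 see = c("cat", "cat", "dog", NA, "cat", "cat", "dog", "bird"))
setDT(df)  # convert to a data.table by reference
df[, lapply(.SD, toString), by = day]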