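I'm struggling with a seemingly simple task: in df, I want to summarise the values in words into a single string (no problem) along with the Duration values that correspond to those words (problem). Note: I don't want to simply filter the data frame to the rows where words is not NA, because my actual data frame has more columns that need to be summarised in a similar way.
I tried subsetting Duration with [!is.na(words)], but that fails. Instead of only the matching values, all values in Duration are strung together: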
library(tidyverse)
df %>%
  summarise(
    words = str_c(words[!is.na(words)], collapse = ","),
    words_dur = str_c(Duration[!is.na(words)], collapse = ",")
  )
words words_dur
1 hey,how,are,you 44,150,30,55,77,80,99,100,200
The expected output is:
words words_dur
1 hey,how,are,you 44,55,77,99
Data:
df <- data.frame(
words = c("hey", NA, NA, "how", "are", NA, "you", NA, NA),
Duration = c(44, 150, 30, 55, 77, 80, 99, 100, 200)
)
> df |> lapply(\(x) toString(x[!is.na(df$words)])) |> as.data.frame()
words Duration
1 hey, how, are, you 44, 55, 77, 99
> df |> lapply(\(x) list(x[!is.na(df$words)])) |> list2DF()
words Duration
1 hey, how, are, you 44, 55, 77, 99
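Both base R versions work because the mask !is.na(df$words) is computed once from the original data frame and then applied to every column, so nothing inside the loop can overwrite it. Below is a minimal sketch of the same idea wrapped in a reusable function. It assumes an ungrouped data frame, and collapse_where() is a hypothetical name, not part of any package:

```r
# Hypothetical helper: collapse every column of `data` into one string,
# keeping only the rows where `keep` is TRUE.
collapse_where <- function(data, keep, sep = ",") {
  # `keep` is evaluated once, before any column is processed
  stopifnot(is.logical(keep), length(keep) == nrow(data))
  out <- lapply(data, function(x) paste(x[keep], collapse = sep))
  as.data.frame(out, stringsAsFactors = FALSE)
}

df <- data.frame(
  words = c("hey", NA, NA, "how", "are", NA, "you", NA, NA),
  Duration = c(44, 150, 30, 55, 77, 80, 99, 100, 200)
)

res <- collapse_where(df, !is.na(df$words))
res
```

Because the mask is a plain argument, the same call also covers any extra columns in the real data frame, which was the asker's reason for not filtering.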
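In summarise(), the first expression redefines words, and every later expression sees that new value instead of the original column. At that point words is "hey,how,are,you", so !is.na(words) is a single TRUE, which is recycled and selects every value in Duration. Either change the name of the variable created by the first expression, e.g.
words2 = str_c(words[!is.na(words)], collapse = ",")
or reverse the order of the expressions: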
df %>%
  summarise(
    words_dur = str_c(Duration[!is.na(words)], collapse = ","),
    words = str_c(words[!is.na(words)], collapse = ",")
  )
# words_dur words
# 1 44,55,77,99 hey,how,are,you
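The recycling step can be reproduced outside dplyr. This is a minimal base R sketch that assumes the second expression sees words as the already collapsed string. It uses paste() in place of str_c() so that it runs without stringr:

```r
# What the second summarise() expression effectively evaluates
Duration <- c(44, 150, 30, 55, 77, 80, 99, 100, 200)

# After the first expression, `words` is the collapsed string, not the column
words <- "hey,how,are,you"

mask <- !is.na(words)       # a single TRUE, not a logical vector of length 9
selected <- Duration[mask]  # a length-1 logical index is recycled, so every element is kept

paste(selected, collapse = ",")
```

The last line returns the same "44,150,30,55,77,80,99,100,200" string seen in the question, which confirms that the mask, not str_c(), is the culprit.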