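I want to merge every four posts of an author in a wide data frame. If fewer than four posts remain, those should be merged as well (e.g., an author with 11 posts ends up with 2 merged posts of 4 posts each and 1 merged post of 3 posts).
Here is an example of my data frame: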
name  text
bee   _ so we know that right
bee   said so
alma  hello,
alma  Good to hear back from you.
bee   I've currently written an application
alma  I'm happy about it
bee   It was not the last.
alma  Will this ever stop.
alma  Yet another line.
alma  so
I want to turn it into this:
name  text
bee   _ so we know that right said so I've currently written an application It was not the last.
alma  hello, Good to hear back from you. I'm happy about it Will this ever stop.
alma  Yet another line. so
Here is the initial data frame:
df = structure(list(
  name = c("bee", "bee", "alma", "alma", "bee", "alma", "bee", "alma", "alma", "alma"),
  text = c("_ so we know that right", "said so", "hello,", "Good to hear back from you.",
           "I've currently written an application", "I'm happy about it", "It was not the last.",
           "Will this ever stop.", "Yet another line.", "so")),
  .Names = c("name", "text"), row.names = c(NA, -10L), class = "data.frame")
One option using dplyr could be:
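library(dplyr)

df %>%
  group_by(name) %>%
  # number each author's posts in chunks of four: rows 1-4 -> 1, rows 5-8 -> 2, ...
  mutate(ID = ceiling(row_number() / 4)) %>%
  group_by(name, ID) %>%
  # paste the texts within each chunk into a single string
  summarise_all(paste, collapse = " ")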
  name     ID text
  <chr> <dbl> <chr>
1 alma      1 hello, Good to hear back from you. I'm happy about it Will this ever stop.
2 alma      2 Yet another line. so
3 bee       1 _ so we know that right said so I've currently written an application It was…
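To see why ceiling(row_number()/4) produces the grouping, here is a minimal base R sketch, assuming a single author with 11 posts as in the question. The names `pos`, `chunk` and `chunk_sizes` are made up for illustration and are not part of the code above.

```r
# Hypothetical illustration: positions of one author's 11 posts
pos <- 1:11

# Dividing by 4 and rounding up maps positions 1-4 to chunk 1,
# 5-8 to chunk 2, and the remaining 9-11 to chunk 3
chunk <- ceiling(pos / 4)

# Count how many posts land in each chunk
chunk_sizes <- as.vector(table(chunk))

print(chunk)
print(chunk_sizes)
```

The last chunk simply holds whatever is left over, so no special handling is needed for authors whose post count is not a multiple of four.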