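I'm currently trying to do a sentiment analysis, and I want to get my words back into their original format. To do that, I'd like to combine all the words that belong to a unique ID into one row. In other words, I want the opposite of the unnest_tokens function. I tried the following: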
dsWords <- dsWords %>%
group_by(IDReview) %>%
summarize(text = str_c(word, collapse = " ")) %>%
ungroup()
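However, this just combines all the words into 1 row instead of giving one row per unique ID. Can anyone help me out here? Below are a screenshot of my data frame and a subset of the data.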
structure(list(IDReview = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
word = c("love", "love", "author", "side", "end", "show",
"one", "way", "think", "everyon", "also", "idea", "mani",
"amaz", "look", "mani", "idea", "think", "learn", "someth",
"dont", "know", "look", "fact", "see", "right", "dont", "write",
"review", "will", "hero", "will", "hes", "person", "tri",
"short", "certain", "never", "find", "like")), row.names = c("1",
"1.1", "1.2", "1.4", "1.6", "1.13", "1.14", "1.15", "1.16", "1.17",
"1.18", "1.19", "1.20", "1.24", "1.25", "1.27", "1.28", "1.30",
"1.33", "1.34", "1.35", "1.36", "1.37", "1.38", "1.39", "1.41",
"1.42", "1.44", "1.45", "2", "2.3", "2.5", "2.10", "2.12", "2.18",
"2.23", "2.26", "2.27", "2.30", "2.34"), class = "data.frame")
As Bas wrote in the comments, the code below, with explicit package names,
dsWords %>%
dplyr::group_by(IDReview) %>%
dplyr::summarise(text = stringr::str_c(word, collapse = " ")) %>%
dplyr::ungroup()
gives as output
# A tibble: 2 x 2
IDReview text
<int> <chr>
1 1 love love author side end show one way think everyon also idea mani amaz look mani idea think~
2 2 will hero will hes person tri short certain never find like
That's what you want, isn't it?
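Note that problems can occur when plyr is loaded after dplyr: plyr::summarise then masks dplyr::summarise and ignores the grouping, which collapses everything into a single row. See here.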
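To check whether masking is what bites you, and to get the same result without dplyr at all, something like the following might help. This is only a minimal sketch: the small data frame is a stand-in for dsWords (same column names as in the question, only a few rows), and the name collapsed is just for illustration.

```r
# Stand-in for dsWords (assumption: same column names as in the question).
dsWords <- data.frame(
  IDReview = c(1L, 1L, 1L, 2L, 2L),
  word = c("love", "love", "author", "will", "hero"),
  stringsAsFactors = FALSE
)

# Diagnostic: list the attached packages that define summarise, in
# search-path order. The first entry is the one a bare summarise() call uses.
print(find("summarise"))

# Base R alternative that does not depend on dplyr or plyr:
# one row per IDReview, words joined by a single space.
collapsed <- aggregate(word ~ IDReview, data = dsWords,
                       FUN = paste, collapse = " ")
names(collapsed)[2] <- "text"
print(collapsed)
```

If find("summarise") lists package:plyr before package:dplyr, a bare summarise() call resolves to the plyr version, so either use the explicit dplyr:: prefix as above or load plyr first and dplyr second.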