Is there a way to do the opposite of unnest_tokens? I want to combine words into one row per unique ID


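I'm currently working on a sentiment analysis and want to restore the words to their original format, so that all words belonging to a unique ID are merged into a single row. In other words, I want the opposite of the unnest_tokens function. I tried the following: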

dsWords <- dsWords %>% 
  group_by(IDReview) %>% 
  summarize(text = str_c(word, collapse = " ")) %>%
  ungroup()

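However, this just combines all words into a single row instead of one row per unique ID. Can anyone help me out here? Below are a screenshot of my data frame and a subset of the data.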

[Screenshot of the data frame]

structure(list(IDReview = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    word = c("love", "love", "author", "side", "end", "show", 
    "one", "way", "think", "everyon", "also", "idea", "mani", 
    "amaz", "look", "mani", "idea", "think", "learn", "someth", 
    "dont", "know", "look", "fact", "see", "right", "dont", "write", 
    "review", "will", "hero", "will", "hes", "person", "tri", 
    "short", "certain", "never", "find", "like")), row.names = c("1", 
"1.1", "1.2", "1.4", "1.6", "1.13", "1.14", "1.15", "1.16", "1.17", 
"1.18", "1.19", "1.20", "1.24", "1.25", "1.27", "1.28", "1.30", 
"1.33", "1.34", "1.35", "1.36", "1.37", "1.38", "1.39", "1.41", 
"1.42", "1.44", "1.45", "2", "2.3", "2.5", "2.10", "2.12", "2.18", 
"2.23", "2.26", "2.27", "2.30", "2.34"), class = "data.frame")
r sentiment-analysis
1 Answer

As Bas wrote in the comments, the code below, which uses explicit package names,

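# Explicit dplyr:: and stringr:: prefixes avoid picking up a masked
# summarise() from another attached package such as plyr.
dsWords %>% 
  dplyr::group_by(IDReview) %>% 
  dplyr::summarise(text = stringr::str_c(word, collapse = " ")) %>%
  dplyr::ungroup()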

gives the following output:

# A tibble: 2 x 2
  IDReview text                                                                                          
     <int> <chr>                                                                                         
1        1 love love author side end show one way think everyon also idea mani amaz look mani idea think~
2        2 will hero will hes person tri short certain never find like

That is what you want, isn't it?
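For comparison, here is a minimal sketch of the same collapse in base R, which sidesteps function masking because no add-on package is involved. The data frame `toy` is made up for illustration and only mimics the shape of dsWords; it is not the asker's data.

```r
# Hypothetical toy data in the same shape as dsWords (one word per row).
toy <- data.frame(
  IDReview = c(1L, 1L, 1L, 2L, 2L),
  word     = c("love", "author", "side", "will", "hero"),
  stringsAsFactors = FALSE
)

# Collapse the words of each IDReview into one space-separated string.
# `collapse = " "` is passed through aggregate() to paste().
collapsed <- aggregate(word ~ IDReview, data = toy, FUN = paste, collapse = " ")
names(collapsed)[2] <- "text"
print(collapsed)
```

This should yield one row per IDReview with the words joined in their original order, just like the dplyr version.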

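Note that problems can arise when plyr is loaded after dplyr: plyr::summarise then masks dplyr::summarise and ignores the grouping, which produces the single-row result described in the question; see here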
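One way to check whether masking is the cause is to ask R which attached package a bare call to `summarise` resolves to. The sketch below uses only `utils::find`; the helper name `first_provider` is made up for illustration.

```r
# Hypothetical helper: return the first entry on the search path that
# defines a function called `fn_name`, i.e. the version a bare call uses.
# Returns NA if no attached package or environment defines it.
first_provider <- function(fn_name) {
  hits <- utils::find(fn_name, mode = "function")
  if (length(hits) == 0) NA_character_ else hits[1]
}

print(first_provider("summarise"))
```

If this reports package:plyr, either call dplyr::summarise explicitly or load plyr before dplyr.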
