Creating a frequency table from a data frame column of comma-separated strings

Question (votes: 1, answers: 1)

I have a data frame like this:

df <- structure(list(doc_id = c("1", "2"), ner_words = c("John, Google", 
"Amazon, Python, Canada")), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))

How can I get something like table(df$ner_words), but with each word in a row counted separately? Example of the expected result:

data.frame(text = c("John", "Google", "Amazon", "Python", "Canada"), frq = c(1,1,1,1,1))
Tags: r

1 Answer (votes: 1)

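Here is one option. Note that separate_rows() comes from tidyr, so that package has to be loaded as well:

library(dplyr)
library(tidyr)  # provides separate_rows()

df %>% 
  # split each comma-separated string into one row per word
  separate_rows(ner_words, sep = ", ") %>% 
  # add the number of occurrences of each word as a new column
  group_by(ner_words) %>% 
  mutate(freq = n())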

# A tibble: 5 x 3
# Groups:   ner_words [5]
  doc_id ner_words  freq
  <chr>  <chr>     <int>
1 1      John          1
2 1      Google        1
3 2      Amazon        1
4 2      Python        1
5 2      Canada        1
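
The output above keeps doc_id and one row per occurrence, so a word that appears more than once would show up on several rows rather than being collapsed into a single count. As a possible alternative (not part of the original answer), here is a minimal base R sketch that produces a two-column table shaped like the expected result in the question. The names words, counts and freq_table are illustrative only, and the example data is rebuilt as a plain data.frame so the snippet needs no packages.

```r
# Same example data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  doc_id = c("1", "2"),
  ner_words = c("John, Google", "Amazon, Python, Canada"),
  stringsAsFactors = FALSE
)

# Split every string on ", " and flatten the result into one character vector
words <- unlist(strsplit(df$ner_words, ", ", fixed = TRUE))

# Count occurrences; explicit factor levels keep the order of first appearance
counts <- table(factor(words, levels = unique(words)))

# Assemble a data frame with the column names used in the question
freq_table <- data.frame(
  text = names(counts),
  frq = as.integer(counts),
  stringsAsFactors = FALSE
)

print(freq_table)
```

This assumes the separator is always exactly a comma followed by a single space. If the spacing varies, splitting on a regular expression such as ",\\s*" (without fixed = TRUE) may be more robust.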