Errors in text mining: "replacement has length zero" and "number of items to replace is not a multiple of replacement length"

Question

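I am trying to extract multi-word terms from text using a for loop.

The code below gives me an error saying "replacement has length zero", along with "number of items to replace is not a multiple of replacement length". To make the problem clear, consider the following example.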

library(tm)
library(stringr)
library(stringi)
mydata <- data.frame(id = c(1, 2, 3),
                     text = c("This is text mining exercise",
                              "Text analysis is bit confusing",
                              "Hint on this text analysis?"))
multiwords <- c("text", "analysis", "bit confusing")
txt <- freq <- list()
for (i in 1:length(mydata$id)) {
    txt[i] <- str_extract_all(mydata[i, ], paste0(multiwords, collapse = "|"))
    freq[i] <- table(txt[i])
}

Note that not every term in multiwords necessarily occurs in every iteration.

Answer 1

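If you want a single table over all of the extracted elements, paste 'multiwords' together into one pattern, use it in str_extract_all on the 'text' column, then unlist the resulting list and call table on it.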

library(stringr)
lst1 <- str_extract_all(mydata$text, str_c(multiwords, collapse="|"))
table(unlist(lst1))
#    analysis bit confusing          text 
#           2             1             2

If you need to apply table to each element of the list instead, then:

lapply(lst1, table)
#[[1]]

#text 
#   1 

#[[2]]

#     analysis bit confusing 
#            1             1 

#[[3]]

#analysis     text 
#       1        1
