Count occurrences of a specific word in rows of data in R

Question

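I have a dataset with 2 columns and multiple rows. The first column is the ID, the second column is the text that belongs to it.

I want to add more columns that count how many times a certain string occurs in the text of each row. The strings are "\n Positive\n", "\n Neutral\n" and "\n Negativ\n".

Example dataset: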

Id, Content
2356, I like cheese.\n  Positive\nI don't want to be here.\n Negative\n
3456, I am alone.\n Neutral\n

In the end it should look like this:

Id, Content,Positiv, Neutral, Negativ
2356, I like cheese.\n  Positive\nI don't want to be here.\n Negative\n,1 ,0 ,1
3456, I am alone.\n Neutral\n, 0, 1, 0

So far I have tried it like this, but it does not give the right answer:

getCount1 <- function(data, keyword)
{
Positive <- str_count(Dataset$CONTENT, keyword)
return(data.frame(data,Positive))
}
Stufe1 <-getCount1(Dataset,'\n Positive\n')
################################################################
getCount2 <- function(data,  keyword)
{
Neutral <- str_count(Stufe1$CONTENT, keyword)
return(data.frame(data,Neutral))
}
Stufe2 <-getCount2(Stufe1,'\n  Neutral\n')
#####################################################
getCount3 <- function(data,  keyword)
{
Negative <- str_count(Stufe2$CONTENT, keyword)
return(data.frame(data,Negative))
}
Stufe3 <-getCount3(Stufe2,'\n  Negative\n')

Answer 1

I think this is what you need.

Sample data

id <- c(1:4)
text <- c('I have a Dataset with 2 columns a',
          'nd multiple rows. first column ID', 'second column the text which',
          'n the text which belongs to it.')
dataset <- data.frame(id,text)

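Function to find the count

library(stringr)
getCount <- function(data, keyword)
{
  # Count in the data frame passed as an argument, not in the global `dataset`
  wcount <- str_count(data$text, keyword)
  return(data.frame(data, wcount))
}

Calling getCount should give the updated dataset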

> getCount(dataset,'second')
  id                              text wcount
  1   I have a Dataset with 2 columns a      0
  2   nd multiple rows. first column ID      0
  3        second column the text which      1
  4     n the text which belongs to it.      0

Answer 2

To offer some alternatives, let's start with a slightly modified version of @on_the_shores_of_linux_sea's dataset.

id <- c(1:4)
text <- c('I have a Dataset with 2 columns a',
          'nd multiple rows. first column ID rows', 
          'second column the text which',
          'n the text which belongs to it.')
dataset <- data.frame(id,text)

Using base R functions, you could come up with a function like this.

wordCounter <- function(invec, word, ...) {
  vapply(regmatches(invec, gregexpr(word, invec, ...)), length, 1L)
}

You would use it like this:

## allows other arguments to gregexpr
wordCounter(dataset$text, "id", ignore.case = TRUE) 
# [1] 0 1 0 0
wordCounter(dataset$text, "id")
# [1] 0 0 0 0
wordCounter(dataset$text, "rows")
# [1] 0 2 0 0
wordCounter(dataset$text, "second", ignore.case = TRUE)
# [1] 0 0 1 0

If you prefer a ready-made solution, another option is the "stringi" package, which has a nice set of stri_count* functions. Here I use stri_count_fixed:

library(stringi)
stri_count_fixed(dataset$text, "rows")
# [1] 0 2 0 0
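
Applied to the data from the question, the base R approach above could look roughly like this. It is only a sketch: it assumes the \n sequences in Content are real newlines, that the columns are named Id and Content as in the question's example, and that counting the bare label is enough. The column names Positive, Neutral and Negative are chosen here for illustration.

```r
# Sample rows copied from the question; stringsAsFactors = FALSE keeps
# Content as character on older R versions too.
dataset <- data.frame(
  Id = c(2356, 3456),
  Content = c(
    "I like cheese.\n  Positive\nI don't want to be here.\n Negative\n",
    "I am alone.\n Neutral\n"
  ),
  stringsAsFactors = FALSE
)

# Same helper as above: count matches of `word` in each element of `invec`.
wordCounter <- function(invec, word, ...) {
  vapply(regmatches(invec, gregexpr(word, invec, ...)), length, 1L)
}

# Add one count column per label (column names are illustrative).
for (kw in c("Positive", "Neutral", "Negative")) {
  dataset[[kw]] <- wordCounter(dataset$Content, kw, fixed = TRUE)
}
```

Passing fixed = TRUE through to gregexpr treats the label as a literal string rather than a regular expression.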

Answer 3

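This can also be done without loading any additional libraries, as Ananda pointed out. My solution, assuming the 2-column table is named dataset and the string to look for is mystring:

countOccurr = function(text,motif) {
 # gregexpr returns -1 when there is no match, otherwise the match positions
 res = gregexpr(motif,text,fixed=TRUE)[[1]]
 ifelse(res[1] == -1, 0, length(res))
}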

dataset = cbind(dataset, count = vapply(dataset[,2], countOccurr, 1, motif=mystring))

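Note that, to avoid problems, the second column of the data frame must be of mode character (the data frame given as sample data by @on-the-shores-of-linux-sea holds a factor, which works for his solution but not for mine). Otherwise, cast it with as.character(dataset[,2]).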
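
To make the cast concrete, here is a small sketch with made-up data (the two text values and the search string "rows" are assumptions for illustration). It forces a factor column with stringsAsFactors = TRUE and converts it before counting. The helper is a slightly simplified variant of countOccurr that uses if/else and returns an integer.

```r
# Variant of countOccurr: number of literal occurrences of `motif` in `text`.
countOccurr <- function(text, motif) {
  res <- gregexpr(motif, text, fixed = TRUE)[[1]]
  if (res[1] == -1) 0L else length(res)
}

# Illustrative data; the text column is deliberately stored as a factor.
dataset <- data.frame(
  id = 1:2,
  text = c("rows and rows", "none"),
  stringsAsFactors = TRUE
)
was_factor <- is.factor(dataset$text)

# Cast to character before applying the counter to each element.
counts <- vapply(as.character(dataset[, 2]), countOccurr, 1L,
                 motif = "rows", USE.NAMES = FALSE)
dataset <- cbind(dataset, count = counts)
```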

Answer 4

Why not just:

library(stringr)
dataset$Positiv <- str_count(dataset$Content, 'Positiv')
dataset$Neutral <- str_count(dataset$Content, 'Neutral')
dataset$Negativ <- str_count(dataset$Content, 'Negativ')
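
One possible reason the bare keyword works where the patterns in the question did not: in the sample row, "Positive" is preceded by two spaces, while the pattern passed to getCount1 has only one, so an exact match fails. The base R sketch below illustrates this under the assumption that the \n sequences are real newlines. count_matches is a hypothetical helper, not part of any package.

```r
# First sample row from the question ("Positive" is preceded by two spaces).
content <- "I like cheese.\n  Positive\nI don't want to be here.\n Negative\n"

# Hypothetical helper: number of matches of `pattern` in a single string.
count_matches <- function(x, pattern, ...) {
  m <- gregexpr(pattern, x, ...)[[1]]
  if (m[1] == -1) 0L else length(m)
}

# Exact pattern with one space, as in the question: no match.
exact_one_space <- count_matches(content, "\n Positive\n", fixed = TRUE)

# Bare keyword: insensitive to the surrounding whitespace.
bare_keyword <- count_matches(content, "Positive", fixed = TRUE)

# Regex that tolerates any amount of whitespace around the label.
flexible <- count_matches(content, "\n\\s*Positive\\s*\n", perl = TRUE)
```

If the surrounding newlines matter (for example, to avoid counting "Positive" inside a sentence), the whitespace-tolerant pattern may be the safer choice.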
