Convert a variable in a data frame to a term-document matrix

Question    Votes: 0    Answers: 2

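I have a data frame containing passages of text on which I want to run Latent Dirichlet Allocation (LDA). To do that, I need to create a term-document matrix. This sample code produces an error: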

library(qdap)
library(topicmodels)

remove(list=ls())
doc <- c(1,2,3,4)
text <- c("The Quick Brown Fox Jumped Over The Lazy Dog",
        "The Cow Jumped Over The Moon",
        "Moo, Moo, Brown Cow Have You Any Milk",
        "The Fox Went Out One Moonshiny Night")
works.df <- data.frame(doc,text)

works.tdm <- as.tdm(text.var = works.df$text,  grouping.var = works.df$doc)
works.lda <- LDA(works.tdm, k = 2, control = list(seed = 1234))

The error is:

> works.tdm <- as.tdm(text.var = works.df$text, grouping.var = works.df$doc)
Error in .TermDocumentMatrix(x, weighting) :
  argument "weighting" is missing, with no default

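I expected to get a sparse matrix. For example: the term "the" appears in documents 1 (frequency 2), 2 (frequency 2), and 4 (frequency 1); the term "cow" appears in documents 2 and 3 (frequency 1 each); ...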

Can anyone suggest what is missing, or whether there is a better way to accomplish my task? Thanks.

r text lexical-analysis topic-modeling
2 Answers
0
votes

R needs the weighting argument to be supplied:

library(tm)
works.tdm <- as.tdm(text.var = works.df$text,  grouping.var = works.df$doc, weighting = weightTf)

0
votes

It looks like I need to convert the text to a corpus first and use the more common DocumentTermMatrix() from the tm package:

> remove(list=ls())
> doc<-c(1,2,3,4)
> text<-c("The Quick Brown Fox Jumped Over The Lazy Dog",
+         "The Cow Jumped Over The Moon",
+         "Moo, Moo, Brown Cow Have You Any Milk",
+         "The Fox Went Out One Moonshiny Night")
> works.df<-data.frame(doc,text)
> corp <- VCorpus(VectorSource(works.df$text))
> works.tdm <- DocumentTermMatrix(corp, control=list(weighting=weightTf))
> works.tdm
<<DocumentTermMatrix (documents: 4, terms: 20)>>
Non-/sparse entries: 27/53
Sparsity           : 66%
Maximal term length: 9
Weighting          : term frequency (tf)
> as.matrix(works.tdm)
    Terms
Docs any brown cow dog fox have jumped lazy milk moo, moon moonshiny night one out over quick the went
   1   0     1   0   1   1    0      1    1    0    0    0         0     0   0   0    1     1   2    0
   2   0     0   1   0   0    0      1    0    0    0    1         0     0   0   0    1     0   2    0
   3   1     1   1   0   0    1      0    0    1    2    0         0     0   0   0    0     0   0    0
   4   0     0   0   0   1    0      0    0    0    0    0         1     1   1   1    0     0   1    1
    Terms
Docs you
   1   0
   2   0
   3   1
   4   0
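
To finish the task from the question, the matrix built above can be passed to LDA(). The following is a minimal sketch rather than a definitive solution: it assumes the tm and topicmodels packages are installed, reuses k = 2 and seed = 1234 from the question, and the name works.dtm is my own choice (the object is a document-term matrix, with documents in rows, which is the orientation LDA() accepts).

```r
library(tm)
library(topicmodels)

# Same toy data as in the question
works.df <- data.frame(
  doc  = 1:4,
  text = c("The Quick Brown Fox Jumped Over The Lazy Dog",
           "The Cow Jumped Over The Moon",
           "Moo, Moo, Brown Cow Have You Any Milk",
           "The Fox Went Out One Moonshiny Night"),
  stringsAsFactors = FALSE
)

# Build a corpus with one document per row of the data frame
corp <- VCorpus(VectorSource(works.df$text))

# Document-term matrix with plain term-frequency weighting
works.dtm <- DocumentTermMatrix(corp, control = list(weighting = weightTf))

# Fit a 2-topic model; k and seed are taken from the question
works.lda <- LDA(works.dtm, k = 2, control = list(seed = 1234))

terms(works.lda, 3)   # top 3 terms per topic
topics(works.lda)     # most likely topic for each document
```

With only four short documents the topics will not be meaningful; this only shows that the pipeline runs end to end. On real text, LDA() stops with an error if any document ends up with no terms, so empty rows may need to be dropped first.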
