The tidytext book has several examples of tidying topic models:
library(tidyverse)
library(tidytext)
library(topicmodels)
library(broom)
year_word_counts <- tibble(year = c("2007", "2008", "2009"),
                           word = c("dog", "cat", "chicken"),
                           n = c(1753L, 1157L, 1057L))
animal_dtm <- cast_dtm(data = year_word_counts, document = year, term = word, value = n)
animal_lda <- LDA(animal_dtm, k = 5, control = list( seed = 1234))
animal_lda <- tidy(animal_lda, matrix = "beta")
# Console output
Error in as.data.frame.default(x) :
cannot coerce class "structure("LDA_VEM", package = "topicmodels")" to a data.frame
In addition: Warning message:
In tidy.default(animal_lda, matrix = "beta") :
No method for tidying an S3 object of class LDA_VEM , using as.data.frame
The same error is also reported here, but in this case library(tidytext) is loaded.
Below is a list of all the packages and their corresponding versions:
packageVersion("tidyverse")
‘1.2.1’
packageVersion("tidytext")
‘0.1.6’
packageVersion("topicmodels")
‘0.2.7’
packageVersion("broom")
‘0.4.3’
Output of sessionInfo():
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] broom_0.4.3 tidytext_0.1.6 forcats_0.2.0 stringr_1.2.0 dplyr_0.7.4 purrr_0.2.4 readr_1.1.1 tidyr_0.8.0
[9] tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1 topicmodels_0.2-7
loaded via a namespace (and not attached):
[1] modeltools_0.2-21 slam_0.1-42 NLP_0.1-11 reshape2_1.4.3 haven_1.1.1 lattice_0.20-35 colorspace_1.3-2 SnowballC_0.5.1
[9] stats4_3.4.3 yaml_2.1.16 rlang_0.1.6 pillar_1.1.0 foreign_0.8-69 glue_1.2.0 modelr_0.1.1 readxl_1.0.0
[17] bindrcpp_0.2 bindr_0.1 plyr_1.8.4 munsell_0.4.3 gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2 psych_1.7.8
[25] tm_0.7-3 parallel_3.4.3 tokenizers_0.1.4 Rcpp_0.12.15 scales_0.5.0 jsonlite_1.5 mnormt_1.5-5 hms_0.4.1
[33] stringi_1.1.6 grid_3.4.3 cli_1.0.0 tools_3.4.3 magrittr_1.5 lazyeval_0.2.1 janeaustenr_0.1.5 crayon_1.3.4
[41] pkgconfig_2.0.1 Matrix_1.2-12 xml2_1.2.0 lubridate_1.7.2 assertthat_0.2.0 httr_1.3.1 rstudioapi_0.7 R6_2.2.2
[49] nlme_3.1-131 compiler_3.4.3
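Deleting .Rhistory and .RData restored the correct behavior.
Wow, this is very mysterious to me. I can't reproduce the error. I installed all the same versions, etc., except that I am on macOS rather than Windows. I have tests for the LDA tidiers that run and pass on Windows on AppVeyor, so I would expect this to work.
For what it's worth, the code you have should work without loading broom.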
library(tidyverse)
library(tidytext)
library(topicmodels)
year_word_counts <- tibble(year = c("2007", "2008", "2009"),
word = c("dog", "cat", "chicken"),
n = c(1753L, 1157L, 1057L))
animal_dtm <- cast_dtm(data = year_word_counts, document = year, term = word, value = n)
animal_lda <- LDA(animal_dtm, k = 5, control = list( seed = 1234))
class(animal_lda)
#> [1] "LDA_VEM"
#> attr(,"package")
#> [1] "topicmodels"
tidy(animal_lda, matrix = "beta")
#> # A tibble: 15 x 3
#> topic term beta
#> <int> <chr> <dbl>
#> 1 1 dog 0.0000000000000000000000000000000000000000000372
#> 2 2 dog 0.0000000000000000000000000000000000000000000372
#> 3 3 dog 0.0000000000000000000000000000000000000000000372
#> 4 4 dog 1.00
#> 5 5 dog 0.0000000000000000000000000000000000000000000372
#> 6 1 cat 0.0000000000000000000000000000000000000000000372
#> 7 2 cat 0.0000000000000000000000000000000000000000000372
#> 8 3 cat 0.0000000000000000000000000000000000000000000372
#> 9 4 cat 0.0000000000000000000000000000000000000000000372
#> 10 5 cat 1.00
#> 11 1 chicken 0.0000000000000000000000000000000000000000000372
#> 12 2 chicken 0.0000000000000000000000000000000000000000000372
#> 13 3 chicken 1.00
#> 14 4 chicken 0.0000000000000000000000000000000000000000000372
#> 15 5 chicken 0.0000000000000000000000000000000000000000000372
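Created on 2018-02-14 by the reprex package (v0.2.0).
What happens if you load library(methods)?
I had the same problem when I loaded an LDA model I had saved. In the end, for no obvious reason, it worked again after I restarted the R session.
In addition to the very helpful answer from Julia Silge:
I also believe that the interaction between loading the .RData file and the topicmodels package is the culprit. But you can still use a saved workspace:
I was able to get rid of the problem by restarting RStudio, loading the topicmodels package, and then loading the .RData file. Done in that order, the error message disappears. Loading the data first and the package afterwards does not work.
One more word about workspaces: with LDA, using them alongside your R scripts is really the only way I have been able to work effectively. Depending on the parameters and the size of the corpus, fitting an LDA model can take hours. Being able to save the fitted model for later analysis is essential.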
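The load order described above might look like the following in a fresh R session. This is only a sketch: the workspace file name and the object name animal_lda (taken from the question) are assumptions, and the guards simply skip the steps if either is missing.

```r
# Attach the packages first, so the LDA_VEM class definition and the
# tidy() method for it are available before any saved objects are restored.
library(topicmodels)
library(tidytext)

# Assumed location of the saved workspace; adjust to your project.
workspace_file <- ".RData"

# Only then restore the workspace.
if (file.exists(workspace_file)) {
  load(workspace_file)
}

# 'animal_lda' is the object name used in the question; it is only
# tidied here if the workspace actually contained a fitted LDA model.
if (exists("animal_lda") && methods::is(animal_lda, "LDA")) {
  print(tidy(animal_lda, matrix = "beta"))
}
```

If the workspace is restored automatically at startup, the packages are attached too late for this to help, which may be why deleting .RData also made the error go away.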
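As an alternative to saving the whole workspace, a single fitted model can be written to its own file with saveRDS() and read back with readRDS(). This is not what the answer above does; it is a minimal sketch using the toy data from the question, and the file path is a hypothetical placeholder.

```r
library(topicmodels)
library(tidytext)

# Toy data from the question (a plain data.frame is enough for cast_dtm).
year_word_counts <- data.frame(
  year = c("2007", "2008", "2009"),
  word = c("dog", "cat", "chicken"),
  n = c(1753L, 1157L, 1057L),
  stringsAsFactors = FALSE
)

animal_dtm <- cast_dtm(year_word_counts, document = year, term = word, value = n)
animal_lda <- LDA(animal_dtm, k = 5, control = list(seed = 1234))

# Hypothetical path; a real project would use a stable location.
model_path <- file.path(tempdir(), "animal_lda.rds")
saveRDS(animal_lda, model_path)

# In a later session: attach topicmodels and tidytext first (as above),
# then read the model back and tidy it.
restored_lda <- readRDS(model_path)
beta_restored <- tidy(restored_lda, matrix = "beta")
```

Because the model is read back explicitly after the library() calls, the package-before-data order is built into the script rather than depending on how the session was started.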