Every time I run this code, I get different results:
set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5,
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)
terms(lda_seq)
The function comes from the seededlda package, which is built on quanteda.
How can I get reproducible results?
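I have tried different seeds, but every time I run the code with the same seed, the results are completely different. Note that there appears to be no link between set.seed() and the seededlda package.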
Setting
options(seededlda_threads = 1)
gives reproducible results:
library(quanteda)
library(seededlda)

# Set the seededlda thread option to 1 before fitting
options(seededlda_threads = 1)

# Build a document-feature matrix from the movie reviews corpus
corp <- data_corpus_moviereviews
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE,
               remove_numbers = TRUE, remove_url = TRUE)
dfmt <- dfm(toks) |>
  dfm_remove(stopwords("en")) |>
  dfm_remove("*@*") |>
  dfm_trim(max_docfreq = 0.1, docfreq_type = "prop")

# First fit with seed 42
set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5,
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)
x <- terms(lda_seq)

# Second fit with the same seed
set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5,
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)
y <- terms(lda_seq)

# Compare the top terms from the two fits
waldo::compare(x, y)
#> ✔ No differences
Created on 2024-03-30 with reprex v2.1.0