Here is my code so far:
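```r
pacman::p_load(dplyr, ggplot2, stringr, udpipe, lattice)
gnewsheadlines <- read.csv(file.choose(), stringsAsFactors = FALSE)
udmodel_english <- udpipe_load_model(file = "C:/Users/Palam/Documents/english-ewt-ud-2.5-191206.udpipe")

# Step 2 - count headlines by date and plot the results for inspection
# Parse the character dates first: comparing raw "m/d/Y" strings is alphabetical, not chronological
headlinegoogle <- gnewsheadlines %>%
  filter(as.Date(date, format = "%m/%d/%Y") >= as.Date("2022-03-31"),
         as.Date(date, format = "%m/%d/%Y") <= as.Date("2022-04-03"))
s <- udpipe_annotate(udmodel_english, headlinegoogle$headline)
x <- data.frame(s)
```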
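The Step 2 label mentions counting headlines per date and plotting them, which the code above does not do yet. Below is a minimal sketch with dplyr and ggplot2, assuming the `date` column holds month/day/year strings (possibly followed by a time, which `as.Date()` ignores), as the filter bounds suggest. `headlines_df` is a hypothetical toy stand-in for `gnewsheadlines`. The dates are parsed before filtering because string comparison is alphabetical: in R, `"3/4/2022" >= "3/31/2022"` is `TRUE`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data shaped like gnewsheadlines: character `date` (m/d/Y) and `headline`
headlines_df <- data.frame(
  date = c("3/31/2022", "3/31/2022", "4/1/2022", "4/3/2022", "3/4/2022"),
  headline = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

counts <- headlines_df %>%
  # Parse to Date so the filter compares chronologically (drops "3/4/2022")
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  filter(date >= as.Date("2022-03-31"), date <= as.Date("2022-04-03")) %>%
  # One row per date with the number of headlines on that date
  count(date, name = "n_headlines")

# Bar chart of headline counts per day, for visual inspection
p <- ggplot(counts, aes(x = date, y = n_headlines)) +
  geom_col() +
  labs(x = "Date", y = "Number of headlines")
```

Calling `print(p)` draws the chart. With the real data, replacing `headlines_df` with `gnewsheadlines` should give one bar per day in the 3/31/2022 to 4/3/2022 window.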
This is the error I get when running `udpipe_annotate`:
```
Error in `[.data.table`(out, , `:=`(c("token_id", "token", "lemma", "upos", :
  Supplied 10 columns to be assigned an empty list (which may be an empty data.table or data.frame since they are lists too). To delete multiple columns use NULL instead. To add multiple empty list columns, use list(list()).
In addition: Warning message:
In strsplit(x$conllu, "\\n", fixed = TRUE) : input string 1 is invalid UTF-8
```
It looks like `headlinegoogle$headline` is not UTF-8 encoded, but `udpipe_annotate()` expects its input text in UTF-8. See https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-tryitout.html
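Below is a minimal sketch of one way to convert the text in base R. It assumes the non-UTF-8 headlines are in Windows-1252 (CP1252), a common encoding for CSV files saved on Windows, which would need checking against the actual file. `to_utf8()` and the toy `headlines` vector are hypothetical names for illustration. Only the strings that fail `validUTF8()` are re-encoded; valid ones are left untouched.

```r
# Hypothetical helper: re-encode only the elements that are not valid UTF-8.
# Assumption: the invalid strings are CP1252; change `from` if the file uses another encoding.
to_utf8 <- function(x, from = "CP1252") {
  bad <- !validUTF8(x)
  # sub = "" drops any byte that has no mapping in the source encoding
  x[bad] <- iconv(x[bad], from = from, to = "UTF-8", sub = "")
  x
}

# Toy input: one valid headline and one with a raw CP1252 byte 0x92 (right single quote)
headlines <- c("Markets rally on Friday",
               rawToChar(as.raw(c(0x49, 0x74, 0x92, 0x73, 0x20, 0x6f, 0x6b))))

fixed <- to_utf8(headlines)
validUTF8(fixed)  # both elements are now valid UTF-8
```

For the question's data, the call would become `s <- udpipe_annotate(udmodel_english, x = to_utf8(headlinegoogle$headline))`. If the whole file is known to share one encoding, another option is to pass `fileEncoding = "CP1252"` (again an assumption about the file) to `read.csv()`. This converts everything once at read time.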