我有一个脚本,我用来读取多个PDF文件。这是我的代码
corpus_raw <- data.frame("company" = c(),"text" = c(), check.names = FALSE)
for (i in 1:length(pdf_list)){
print(i)
document_text <- pdf_text(paste("V:/CodingProject2_FundOverview/", pdf_list[i],sep = "")) %>%
strsplit("\r\n")
document <- data.frame("company" = gsub(x = pdf_list[i],pattern = ".pdf", replacement = ""),
"text" = document_text, stringsAsFactors = FALSE, check.names = FALSE)
colnames(document) <- c("company", "text")
corpus_raw <- rbind(corpus_raw,document)
}
我收到以下错误消息:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 79, 56
我甚至试图保持check.names = FALSE
,但似乎我做错了什么。任何帮助将不胜感激。谢谢
我知道我做的事情很愚蠢。无论如何,我能够自己找出答案。
for (i in 1:length(pdf_list)){
print(i)
document_text <- pdf_text(paste("V:/CodingProject2_FundOverview/", pdf_list[i],sep = "")) %>%
strsplit("\r\n")
document <- data.frame("company" = gsub(x = pdf_list[i],pattern = ".pdf", replacement = ""),
"text" = I(document_text), stringsAsFactors = FALSE, check.names = FALSE)
colnames(document) <- c("company", "text")
corpus_raw <- rbind(corpus_raw,document)
}