How do I read multiple PDF files in R?


I have a script that I use to read multiple PDF files. Here is my code:

corpus_raw <- data.frame("company" = c(),"text" = c(), check.names = FALSE)

for (i in 1:length(pdf_list)){
  print(i)
  document_text <- pdf_text(paste("V:/CodingProject2_FundOverview/", pdf_list[i],sep = "")) %>% 
    strsplit("\r\n") 

  document <- data.frame("company" = gsub(x = pdf_list[i],pattern = ".pdf", replacement = ""), 
              "text" = document_text, stringsAsFactors = FALSE, check.names = FALSE)

  colnames(document) <- c("company", "text")
  corpus_raw <- rbind(corpus_raw,document) 
}

I get the following error message:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 79, 56

I even tried keeping check.names = FALSE, but it seems I am doing something wrong. Any help would be greatly appreciated. Thanks.

r pdf
1 Answer

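I knew I was doing something silly. In any case, I was able to work out the answer myself. The fix was to wrap document_text in I() when building the data frame: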

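library(pdftools)  # provides pdf_text()
library(magrittr)  # provides %>%

# pdf_list (the vector of PDF file names) is assumed to be defined already
corpus_raw <- data.frame("company" = c(), "text" = c(), check.names = FALSE)

for (i in seq_along(pdf_list)) {
  print(i)
  # pdf_text() returns one string per page; strsplit() gives a list of line vectors
  document_text <- pdf_text(paste0("V:/CodingProject2_FundOverview/", pdf_list[i])) %>%
    strsplit("\r\n")

  # I() keeps the list as a single list column (one row per page)
  document <- data.frame("company" = gsub(x = pdf_list[i], pattern = "\\.pdf$", replacement = ""),
                         "text" = I(document_text), stringsAsFactors = FALSE, check.names = FALSE)

  colnames(document) <- c("company", "text")
  corpus_raw <- rbind(corpus_raw, document)
}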
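
For anyone wondering why I() makes a difference, here is a minimal sketch in base R. It does not read a real PDF: the `pages` vector is a made-up stand-in for what pdf_text() returns (one string per page), and "demo" is a placeholder company name. Without I(), data.frame() tries to turn each list element into its own column, and pages with different line counts cannot share one data frame. With I(), the list is stored as a single list column.

```r
# Hypothetical stand-in for pdf_text() output: one string per page.
pages <- c("line 1\r\nline 2\r\nline 3", "line A\r\nline B")

# strsplit() returns a list with one character vector of lines per page.
page_lines <- strsplit(pages, "\r\n")

# Without I(): each list element becomes a separate column, and the
# differing lengths (3 vs 2) raise the "differing number of rows" error.
result <- tryCatch(
  data.frame(company = "demo", text = page_lines, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(result)

# With I(): the list is kept as one list column, giving one row per page.
document <- data.frame(company = "demo", text = I(page_lines),
                       stringsAsFactors = FALSE)
print(nrow(document))
```

If one row per line is preferred over one row per page, passing unlist(page_lines) as the text column is another option; which shape is better depends on what is done with the corpus afterwards.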