Inconsistent error when scraping Arabic links in RStudio

Question

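I am web scraping an online Arabic-language journal. On my own computer, my code runs fine. However, when I run it on a remote desktop hosted by my school, I get this error:

    Error in `map()`:
    ℹ In index: 1.
    Caused by error in `open.connection()`:
    ! URL using bad/illegal format or missing URL
    Run `rlang::last_trace()` to see where the error occurred.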

I have this function, which scrapes text from each of the individual pages linked from a main data frame.

    # Packages assumed to be loaded by the original script
    library(rvest)  # read_html(), html_elements(), html_text2()
    library(purrr)  # map_df()

    get_AR_date_title <- function(article_link) {
      article_page <- read_html(article_link)

      # Scrape body
      text_AR <- article_page %>%
        html_elements(xpath = "/html/body/section/div[7]/div/div[1]/div[1]/div[2]/div[6]/p") %>%
        html_text2() %>%
        paste(collapse = ",")

      # Scrape title
      text_title <- article_page %>%
        html_elements(xpath = "/html/body/section/div[7]/div/div[1]/div[1]/div[2]/div[1]/h1") %>%
        html_text2() %>%
        paste(collapse = ",")

      # Scrape date
      text_date <- article_page %>%
        html_elements(xpath = "/html/body/section/div[7]/div/div[1]/div[1]/div[2]/div[4]/div[1]/span[3]") %>%
        html_text2() %>%
        paste(collapse = ",")

      # Scrape the heading. Many articles have no heading, so return NA in that case.
      heading_nodes <- article_page %>%
        html_elements(xpath = "/html/body/section/div[7]/div/div[1]/div[1]/div[1]/div[1]/a")
      text_heading <- if (length(heading_nodes) > 0) {
        heading_nodes %>% html_text2() %>% paste(collapse = ",")
      } else {
        NA
      }

      # Put the scraped values into a data frame and return it
      output_text <- data.frame(text_AR, text_title, text_date, text_heading)
      return(output_text)
    }

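The function is then applied to every link. The links themselves are collected in a separate for loop, not included here, which scrapes the link to each page.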

    article_data <- map_df(alrai$article_links, get_AR_date_title)

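The error occurs at the `map_df()` call. I'm not sure what the problem is. Again, it works fine on my PC but not on my school's computer. The links contain Arabic characters, which I think may be why it fails. However, using `url()` tells me the links are readable.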
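One way to test the hypothesis that the Arabic characters are the cause is to compare the locale on both machines and to percent-encode the links before passing them to `read_html()`. The following is a minimal sketch, not a confirmed fix: `encode_url()` is a hypothetical helper name, and the example URL is a placeholder rather than a real article link. It assumes the links are not already percent-encoded.

```r
# Check the character-set locale; a non-UTF-8 locale on the remote machine
# would be consistent with the links being handled differently there.
Sys.getlocale("LC_CTYPE")

# Hypothetical helper: convert to UTF-8, then percent-encode non-ASCII bytes.
# utils::URLencode() leaves reserved characters such as ":" and "/" alone by
# default, and skips strings that already contain a %XX escape.
encode_url <- function(urls) {
  utils::URLencode(enc2utf8(urls))
}

# Placeholder URL containing one Arabic letter (U+0645)
encode_url("https://example.com/\u0645")

# Possible usage with the function above (untested against the real site):
# article_data <- map_df(encode_url(alrai$article_links), get_AR_date_title)
```

If the encoded links work on the remote desktop while the raw ones do not, that would point to the encoding of the links rather than to the scraping function.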
