How to extract information from a dynamic URL with R?

I am unable to extract a value that is displayed at a dynamic URL. The problem seems to be the dynamic nature of the page.

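When I inspect the page source in the browser and use that markup as the HTML content, I can extract the value correctly. When I read the live URL instead, html_node() appears to return nothing, and my code ends up with NA.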
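A likely explanation (not confirmed in the question) is that the HTML the server returns to read_html() does not contain the element at all, for example because the results count is inserted by JavaScript after the page loads in a browser, or because the request receives a different page such as a consent or bot-check page. The browser's inspector shows the rendered DOM, which is why the copied snippet works. The sketch below reproduces the symptom offline; the `unrendered` string and the `count_matches()` helper are hypothetical stand-ins, not Yahoo's actual response.

```r
library(rvest)

# Same XPath as in the question
results_xpath <- '//span[contains(@class, "Mstart") and contains(@class, "Fw") and contains(@class, "Fz")]/span'

# Rendered markup, as copied from the browser's inspector
rendered <- '<span class="Mstart(15px) Fw(500) Fz(s)"><span>1-100 of 1270 results</span></span>'

# Hypothetical server response for a JavaScript-rendered page:
# an empty mount point plus a script that would fill it in a browser
unrendered <- '<div id="app"></div><script>/* builds the table client-side */</script>'

# Hypothetical helper: how many nodes the XPath matches in a given HTML string
count_matches <- function(html, xpath) {
  length(html_elements(read_html(html), xpath = xpath))
}

count_matches(rendered, results_xpath)    # 1
count_matches(unrendered, results_xpath)  # 0

# With no match, html_node() yields a missing node and html_text() returns NA,
# so str_extract() and as.numeric() in the question also produce NA
html_text(html_node(read_html(unrendered), xpath = results_xpath))
```

So the NA is not caused by the XPath or the regex; it means the element is absent from whatever read_html() downloaded.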

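library(rvest)    # read_html(), html_node(), html_text(), %>%
library(stringr)  # str_extract()

# Two sources of HTML: markup copied from the browser, and the live URL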
url_source <- '<span>Earnings on <span>Thu, Aug 03</span></span><span class="Mstart(15px) Fw(500) Fz(s)"><span>1-100 of 1270 results</span></span>'
url_live <- "https://finance.yahoo.com/calendar/earnings?from=2023-07-30&to=2023-08-05&day=2023-08-03"

# HTML content to parse
#html_content <- url_source
html_content <- url_live

# Parse the HTML content
webpage <- read_html(html_content)

# Extract the value using an XPath expression
value <- webpage %>%
  html_node(xpath = '//span[contains(@class, "Mstart") and contains(@class, "Fw") and contains(@class, "Fz")]/span') %>%
  html_text()

# Extract the numeric part from the text
numeric_value <- as.numeric(str_extract(value, "\\d+(?= results)"))

# Print the extracted value
print(numeric_value)
#[1] 1270 from url_source
#[1] NA from url_live
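The original comment mentioned a CSS selector, while the code uses XPath. For reference, a CSS selector equivalent to the question's XPath can use attribute-substring matches: `[class*="Mstart"]` matches when the class attribute contains `Mstart`, like XPath `contains()`, and `>` selects direct children, like `/span`. This is a sketch checked only against the question's `url_source` snippet, not against the live page.

```r
library(rvest)
library(stringr)

page <- read_html('<span>Earnings on <span>Thu, Aug 03</span></span><span class="Mstart(15px) Fw(500) Fz(s)"><span>1-100 of 1270 results</span></span>')

# CSS: substring matches on the class attribute, then the direct child span
css_sel <- 'span[class*="Mstart"][class*="Fw"][class*="Fz"] > span'
# XPath from the question
xp_sel <- '//span[contains(@class, "Mstart") and contains(@class, "Fw") and contains(@class, "Fz")]/span'

by_css <- html_text(html_element(page, css = css_sel))
by_xpath <- html_text(html_element(page, xpath = xp_sel))

identical(by_css, by_xpath)                          # TRUE
as.numeric(str_extract(by_css, "\\d+(?= results)"))  # 1270
```

Both selectors only help once the element is present in the parsed document.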

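If the count is indeed added by JavaScript, the page has to be rendered by a browser before it is parsed. rvest 1.0.4 and later provide `read_html_live()`, which loads the page in headless Chrome through the chromote package and can be queried with `html_elements()`. The sketch below keeps the parsing in a hypothetical helper, `results_total()`, so it can be tested offline on the question's snippet. The commented-out live call is untested here: it needs Chrome and network access, and whether Yahoo serves the same markup to a headless browser is an assumption.

```r
library(rvest)
library(stringr)

# Hypothetical helper: total number of results, or NA if the element is absent.
# html_elements() returns an empty set when nothing matches; taking [1] of the
# resulting character(0) gives NA instead of an error.
results_total <- function(page) {
  nodes <- html_elements(page, css = 'span[class*="Mstart"][class*="Fw"][class*="Fz"] > span')
  txt <- html_text(nodes)[1]
  as.numeric(str_extract(txt, "\\d+(?= results)"))
}

url_source <- '<span>Earnings on <span>Thu, Aug 03</span></span><span class="Mstart(15px) Fw(500) Fz(s)"><span>1-100 of 1270 results</span></span>'
results_total(read_html(url_source))  # 1270

# Untested sketch for the live page (assumes rvest >= 1.0.4, chromote, Chrome):
# live <- read_html_live("https://finance.yahoo.com/calendar/earnings?from=2023-07-30&to=2023-08-05&day=2023-08-03")
# results_total(live)
```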
© www.soinside.com 2019 - 2024. All rights reserved.