How to find and extract a timestamp from an HTML website with Python/BeautifulSoup?

Question

I am trying to write a bot that sends an email whenever the German weather service (Deutscher Wetterdienst, DWD) issues a warning. The bot will be implemented in Python on my Raspberry Pi.

I want to extract some information from the DWD website, for example for Berlin. The URL would be https://www.dwd.de/DE/wetter/warnungen_gemeinden/warnWetter_node.html?ort=Berlin-Mitte

First, I want to extract the latest timestamp. When I inspect the HTML of this page, I can find the corresponding id="HeaderBox" together with the timestamp I want.

Unfortunately, when I fetch the HTML with Python, this date and time are missing. Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://www.dwd.de/DE/wetter/warnungen_gemeinden/warnWetter_node.html?ort=Berlin-Mitte"
r = requests.get(url)                       # download the HTML sent by the server
doc = BeautifulSoup(r.text, "html.parser")  # parse it
doctext = doc.get_text()                    # extract the page's text
print(doctext)

The result is always just "Letzte Aktualisierung:" followed by an "empty" line, even when I try

last_date = doc.find(id="headerBox")

I'm using the PyCharm IDE (Community Edition) and Python 3.11.

Any hints or ideas would be greatly appreciated.

Best regards, Christian

Answer 1

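The problem is most likely caused by how the page is loaded. requests only downloads the HTML that the server sends; it does not execute JavaScript. If the site fills in content such as the timestamp with JavaScript after the page has loaded, that content is not in the HTML you receive, so BeautifulSoup cannot find it.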
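To illustrate the effect without any network access, here is a minimal sketch using a made-up, simplified page (not DWD's actual markup): the server sends an empty element and a script fills it in later. Parsing that HTML with BeautifulSoup, which is effectively what your script does with r.text, finds the element but no timestamp. The markup, the id "timestamp" and the helper extract_timestamp are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the server sends an empty <span>, and a
# script fills in the timestamp only when the page runs in a real browser.
STATIC_HTML = """
<div id="headerBox">
  <h2>Letzte Aktualisierung:</h2>
  <span id="timestamp"></span>
</div>
<script>
  document.getElementById("timestamp").textContent = new Date().toLocaleString();
</script>
"""


def extract_timestamp(html: str) -> str:
    """Return the stripped text of the (hypothetical) #timestamp element, or ''."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id="timestamp")
    return element.get_text(strip=True) if element else ""


# BeautifulSoup only parses the markup; it never executes the <script>,
# so the element is found but is still empty.
print(repr(extract_timestamp(STATIC_HTML)))  # ''
```

The same helper returns the text once it is given HTML in which the script has already run, which is what a browser-based tool provides.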

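To extract information from a site with dynamic content, you can use a browser automation tool such as Selenium. It drives a real browser (optionally headless, i.e. without a visible window), lets the page run its JavaScript, and then lets you read the rendered content. Here is how you could modify your script to extract the timestamp with Selenium: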

First, install Selenium with pip:

pip install selenium

Then modify your script:

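from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Run Chrome/Chromium headless (no visible window)
options = Options()
options.add_argument("--headless")

# Selenium 4.10 and later no longer accept executable_path; pass the driver path via a Service object.
# The chromedriver must match your browser version: https://sites.google.com/chromium.org/driver/
service = Service("/path/to/chromedriver")
driver = webdriver.Chrome(service=service, options=options)

url = "https://www.dwd.de/DE/wetter/warnungen_gemeinden/warnWetter_node.html?ort=Berlin-Mitte"


def timestamp_loaded(d):
    # Return the element once it has non-empty text, otherwise False so WebDriverWait keeps polling
    element = d.find_element(By.ID, "headerBox")
    return element if element.text.strip() else False


try:
    driver.get(url)

    # driver.get() returns after the page's load event, but scripts may fill in the
    # timestamp later, so poll for up to 10 seconds (adjust the timeout if needed)
    timestamp_element = WebDriverWait(driver, 10).until(timestamp_loaded)

    # Extract the timestamp text
    timestamp_text = timestamp_element.text
finally:
    # Always close the browser, even if the element is not found in time
    driver.quit()

# Print the timestamp
print("Timestamp:", timestamp_text)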

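Make sure to replace /path/to/chromedriver with the actual path to the ChromeDriver executable on your Raspberry Pi; its version must match the installed Chrome/Chromium version.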

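This script opens the page in a browser controlled by Selenium, waits for it to load, finds the timestamp element by its ID, and then extracts and prints the timestamp.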
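For the bot, you will probably want the timestamp as a value you can compare, e.g. to notice that the warnings were updated since the last check. Below is a minimal standard-library sketch. It assumes the text looks like "Letzte Aktualisierung: 12.03.2024, 10:15 Uhr"; the question does not show the real format, so check the page and adjust the pattern. parse_timestamp is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import Optional

# ASSUMPTION: the timestamp appears as "DD.MM.YYYY, HH:MM", e.g.
# "Letzte Aktualisierung: 12.03.2024, 10:15 Uhr". Adjust if the real page differs.
TIMESTAMP_PATTERN = re.compile(r"(\d{2}\.\d{2}\.\d{4}),?\s+(\d{1,2}:\d{2})")


def parse_timestamp(text: str) -> Optional[datetime]:
    """Return the first DD.MM.YYYY HH:MM timestamp in text as a naive datetime, or None."""
    match = TIMESTAMP_PATTERN.search(text)
    if match is None:
        return None
    date_part, time_part = match.groups()
    return datetime.strptime(f"{date_part} {time_part}", "%d.%m.%Y %H:%M")


# Usage: compare with the timestamp seen on the previous run
last_seen = parse_timestamp("Letzte Aktualisierung: 12.03.2024, 10:15 Uhr")
current = parse_timestamp("Letzte Aktualisierung: 12.03.2024, 10:45 Uhr")
print(current)              # 2024-03-12 10:45:00
print(current > last_seen)  # True
```

Comparing parsed datetime objects rather than raw strings avoids false alarms from changes in whitespace or label text. The result carries no time zone, so if you compare it with datetime.now(), make sure both refer to the same time zone.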
