I'm trying to scrape data from this table with a simple request, but when I look the table up by its class, it returns None:
table = soup.find("table", class_ = "hp")
Searching for any table at all returns an empty list:
table = soup.find_all("table")
How can I fix this?
The full code is below:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://aviation-safety.net/database/year/2024/1"
response = requests.get(url)
#print(response)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find("table", class_ = "hp")
print(table)
I'm importing pandas so that I can save the result to a .csv file later.
You are being blocked by the site, most likely because of the default User-Agent header (see MDN) that requests sends, python-requests/<version>.
If you inspect the value of response.text, it shows something like:
Sorry, something went wrong. You can contact us via <email>, should the problem persist.
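One way to confirm this diagnosis is to print the User-Agent that requests sends by default and to test the response body for the error message. The following is a minimal sketch, not the site's documented behavior: the helper name looks_blocked and its marker string are assumptions made for illustration, and the marker is taken from the message quoted above.

```python
import requests

# The User-Agent string requests sends when none is set explicitly,
# of the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()
print(default_ua)


def looks_blocked(html, marker="Sorry, something went wrong"):
    """Heuristic check (hypothetical helper): True if the body contains
    the error message instead of the expected page."""
    return marker in html
```

Calling looks_blocked(response.text) right after the request makes the failure explicit, rather than surfacing later as a None result from soup.find.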
You should set the User-Agent header to something else. For example:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://aviation-safety.net/database/year/2024/1"
response = requests.get(url, headers={"User-Agent": "your-user-agent-string"})
#print(response.text)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find("table", class_ = "hp")
print(table)
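Since the goal is to end up with a .csv file, here is a rough sketch of turning the table into a pandas DataFrame once the request succeeds. The function name table_to_dataframe, the sample HTML, and its column names are all hypothetical. The real page's layout may differ, and the sketch assumes the first row of the table holds the headers.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(html, class_name="hp"):
    """Parse the first <table> with the given class into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=class_name)
    if table is None:
        # Same symptom as in the question: no such table in the page,
        # for example because the server returned an error page instead.
        raise ValueError(f"no <table class={class_name!r}> found")

    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    if not rows:
        return pd.DataFrame()
    # Assumption: the first row is the header row.
    return pd.DataFrame(rows[1:], columns=rows[0])


# Hypothetical stand-in for response.text, so the sketch runs offline.
sample_html = """
<table class="hp">
  <tr><th>date</th><th>type</th></tr>
  <tr><td>01 JAN 2024</td><td>Example 100</td></tr>
  <tr><td>02 JAN 2024</td><td>Example 200</td></tr>
</table>
"""

df = table_to_dataframe(sample_html)
csv_text = df.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```

In the real script, pass response.text in place of sample_html and give to_csv a file name to write the archive.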