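I'm scraping a website with links like this one: https://www.europages.co.uk/ARTISAN-4-SEASON-PROJECTS/00000005443350-781559001.html. I can scrape the company name, address, and so on, but I can't get the phone number, which is only shown after clicking a button. Any insight would be helpful. I don't want to use Selenium.
My code: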
import requests
from bs4 import BeautifulSoup
url = "https://www.europages.co.uk/ARTISAN-4-SEASON-PROJECTS/00000005443350-781559001.html"
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
address = soup.find('dd').text.strip()
name = soup.find('h1', class_="ep-epages-header-title text-h6 text-sm-h4").text.strip()
print(name)
print(address)
To get the phone number, you have to make a separate Ajax request:
import re
import requests

def get_phone_url(url):
    # The page URL ends in "<digits>-<digits>.html"; that pair is the ID used in the phones URL.
    id_ = re.search(r"(\d+-\d+)\.html", url).group(1)
    return f"https://www.europages.co.uk/ep-api/v2/epages/{id_}/phones"

url = "https://www.europages.co.uk/ARTISAN-4-SEASON-PROJECTS/00000005443350-781559001.html"
phone_url = get_phone_url(url)
data = requests.get(phone_url).json()
print(data)
Prints:
{"phones": [{"category": 14, "items": [{"type": 3, "number": "+90 <REDACTED>"}]}]}
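If you only want the number strings rather than the whole response, something like the following should work. It is a minimal sketch that assumes the response always has the shape shown above (a "phones" list whose entries each hold an "items" list with a "number" key). The helper name extract_numbers and the sample data are made up for illustration; the meaning of "category" and "type" is not documented here, so the sketch ignores them.

```python
from typing import Any


def extract_numbers(data: dict[str, Any]) -> list[str]:
    """Collect every "number" string from a phones-style response."""
    numbers = []
    # .get() with a default keeps this from raising if a key is missing.
    for group in data.get("phones", []):
        for item in group.get("items", []):
            number = item.get("number")
            if number:
                numbers.append(number)
    return numbers


# Hypothetical sample with the same shape as the output above (placeholder number).
sample = {"phones": [{"category": 14, "items": [{"type": 3, "number": "+00 000 000"}]}]}
print(extract_numbers(sample))
```

With the real response you would call extract_numbers(data) on the dictionary returned by requests.get(phone_url).json().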