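I'm trying to scrape data from https://mapa.targeo.pl/20.878884999999993,50.805207372713255,21?data=eyJmdHMiOnsicSI6IlRyYWZvc3RhY2phIn19, a map-based site that does not use Google Maps. The site fires a large number of separate AJAX calls, so finding the right endpoint is difficult. I have also struggled to get the right cookies to pass with my requests. What is the simplest way to scrape this site?
I tried using selenium-wire to find the right AJAX calls and scrape them. It works, but scraping the whole site this way would take days.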
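You can skip the browser and call the AJAX endpoint directly. The trick is to load the `TargeoLoader` page first with a `requests` session: it sets a `U` cookie, whose value is then passed as the `uu` parameter to `service.html`. Here is an example: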
import requests

# Loader page: requesting it sets the "U" cookie that the search API expects as "uu"
session_url = "https://m40.targeo.pl/TargeoLoader_1_7.html?gz=0&fx=&ln=&k=ODY2NzI1YjgzOWFlMWM4YjM5Zjc2N2U5MTAzNjY1Y2Q5MTE2ODA0NQ==&vn=2_5&v=full&f=ModulesInitialize&jq=&e=mapa-polski-targeo&m=1&elemsent=1"
# Endpoint the map calls via AJAX for full-text search
api_url = "https://m44.targeo.pl/service.html"

# Query parameters copied from the request the browser sends
params = {
    "xhr": "1",
    "djson": "djson_1709163040015_5184459181087188",
    "rpc": "FTS",
    "q": "Trafostacja",  # search term
    "c": '{"x":20.6543, "y":50.715198}',  # appears to be the map center (lon/lat)
    "z": "23",
    "querysource": "link",
    "area": '{"l":0, "t":0, "r":3768, "b":1188}',
    "mapbounds": '{"minX":20.3585275, "minY":50.651679, "maxX":21.0053475, "maxY":50.7808019}',  # visible area (lon/lat)
    "availarea": '{"t":35, "l":50, "b":1173, "r":3396}',
    "querytype": "OTHER",
    "crevgeo": "true",
    "mod": "fts",
    "suggesterCounter": "0",
    "suggester_index": "-1",
    "request_source": "mapa",
    "premium": "1",
    "_data": "{}",
    "tmk": "TargeoMap",
    "k": "ODY2NzI1YjgzOWFlMWM4YjM5Zjc2N2U5MTAzNjY1Y2Q5MTE2ODA0NQ==",
    "vn": "2_5",
    "uu": "f820ae2bd2ffd5bc10ed391484399fe5",  # overwritten below with the session's "U" cookie
    "ln": "pl",
}

with requests.Session() as s:
    # Load the loader page only for its side effect of setting the "U" cookie
    s.get(session_url)
    params["uu"] = s.cookies["U"]
    data = s.get(api_url, params=params).json()
    # print(data)
    for i in data["items"]["list"]["values"]:
        print(i["name"], i["desc"], i["xy"])
Prints:
...
Trafostacja Pierzchnica. {'x': 20.75344, 'y': 50.69492}
Trafostacja Marzysz. {'x': 20.71835, 'y': 50.7674}
Trafostacja Pierzchnica. {'x': 20.75619, 'y': 50.69823}
Trafostacja Bilcza. {'x': 20.61897, 'y': 50.77823}
Trafostacja Bilcza. {'x': 20.62369, 'y': 50.77944}
Trafostacja Brzeziny. {'x': 20.58406, 'y': 50.76547}
Trafostacja Marzysz. {'x': 20.72491, 'y': 50.766}
...
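Since covering the whole site was the slow part with selenium-wire, one way to scale the direct request is to split a large region into tiles and send one request per tile. The sketch below rests on assumptions that have not been verified against the site: that `service.html` limits results to the `mapbounds` box (and may cap how many it returns per call), that `c` can simply be set to the tile's center, and that any fresh value of the form `djson_<millisecond timestamp>_<digits>` is accepted. The helper names `make_djson`, `make_tiles`, `params_for_tile`, `dedupe` and `fetch_all` are hypothetical, not part of any Targeo API.

```python
import json
import random
import time


def make_djson():
    """Hypothetical: mimic the 'djson_<ms timestamp>_<16 digits>' value seen in the captured request."""
    ms = int(time.time() * 1000)
    digits = "".join(random.choices("0123456789", k=16))
    return f"djson_{ms}_{digits}"


def make_tiles(min_x, min_y, max_x, max_y, nx, ny):
    """Split a lon/lat bounding box into an nx-by-ny grid of smaller boxes."""
    dx = (max_x - min_x) / nx
    dy = (max_y - min_y) / ny
    tiles = []
    for i in range(nx):
        for j in range(ny):
            x0 = min_x + i * dx
            y0 = min_y + j * dy
            tiles.append({"minX": x0, "minY": y0, "maxX": x0 + dx, "maxY": y0 + dy})
    return tiles


def params_for_tile(base_params, tile):
    """Copy the base query and point 'c' and 'mapbounds' at one tile (assumed server behavior)."""
    p = dict(base_params)  # leave the caller's dict untouched
    center = {"x": (tile["minX"] + tile["maxX"]) / 2, "y": (tile["minY"] + tile["maxY"]) / 2}
    p["c"] = json.dumps(center)
    p["mapbounds"] = json.dumps(tile)
    p["djson"] = make_djson()
    return p


def dedupe(items):
    """Keep the first occurrence of each result, keyed on name and coordinates."""
    seen = set()
    out = []
    for it in items:
        key = (it["name"], it["xy"]["x"], it["xy"]["y"])
        if key not in seen:
            seen.add(key)
            out.append(it)
    return out


def fetch_all(session, api_url, base_params, tiles, delay=1.0):
    """Query every tile with an existing session and return the unique results."""
    items = []
    for tile in tiles:
        data = session.get(api_url, params=params_for_tile(base_params, tile)).json()
        # Tolerate tiles with no results instead of raising KeyError
        items.extend(data.get("items", {}).get("list", {}).get("values", []))
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return dedupe(items)
```

Usage would be a call such as `fetch_all(s, api_url, params, make_tiles(20.3585275, 50.651679, 21.0053475, 50.7808019, 4, 4))` inside the `with requests.Session() as s:` block above, after `params["uu"]` has been set. Results are deduplicated on name and coordinates because neighboring tiles may return the same point.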