What is the most efficient way to scrape a map?


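I am trying to scrape data from https://mapa.targeo.pl/20.878884999999993,50.805207372713255,21?data=eyJmdHMiOnsicSI6IlRyYWZvc3RhY2phIn19. This is a map-based website (it does not use Google Maps). Because the site sends a large number of separate Ajax calls, finding the right endpoint is complicated. I am also struggling to obtain the correct request cookies to pass along with my requests. What is the simplest way to scrape this site?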

I tried using selenium-wire to find the right Ajax calls and scrape them. It works, but scraping the whole site this way would take several days.

python api web-scraping
1 Answer

Here is an example of how to make the Ajax request to the server:

import requests

session_url = "https://m40.targeo.pl/TargeoLoader_1_7.html?gz=0&fx=&ln=&k=ODY2NzI1YjgzOWFlMWM4YjM5Zjc2N2U5MTAzNjY1Y2Q5MTE2ODA0NQ==&vn=2_5&v=full&f=ModulesInitialize&jq=&e=mapa-polski-targeo&m=1&elemsent=1"
api_url = "https://m44.targeo.pl/service.html"

params = {
    "xhr": "1",
    "djson": "djson_1709163040015_5184459181087188",
    "rpc": "FTS",
    "q": "Trafostacja",
    "c": '{"x":20.6543, "y":50.715198}',
    "z": "23",
    "querysource": "link",
    "area": '{"l":0, "t":0, "r":3768, "b":1188}',
    "mapbounds": '{"minX":20.3585275, "minY":50.651679, "maxX":21.0053475, "maxY":50.7808019}',
    "availarea": '{"t":35, "l":50, "b":1173, "r":3396}',
    "querytype": "OTHER",
    "crevgeo": "true",
    "mod": "fts",
    "suggesterCounter": "0",
    "suggester_index": "-1",
    "request_source": "mapa",
    "premium": "1",
    "_data": "{}",
    "tmk": "TargeoMap",
    "k": "ODY2NzI1YjgzOWFlMWM4YjM5Zjc2N2U5MTAzNjY1Y2Q5MTE2ODA0NQ==",
    "vn": "2_5",
    "uu": "f820ae2bd2ffd5bc10ed391484399fe5",
    "ln": "pl",
}

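with requests.Session() as s:
    # Load the loader page first so the session picks up the "U" cookie.
    s.get(session_url)
    # The cookie's value is passed to the API as the "uu" parameter.
    params["uu"] = s.cookies["U"]

    data = s.get(api_url, params=params).json()
    # print(data)
    for i in data["items"]["list"]["values"]:
        print(i["name"], i["desc"], i["xy"])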

Prints:


...

Trafostacja Pierzchnica. {'x': 20.75344, 'y': 50.69492}
Trafostacja Marzysz. {'x': 20.71835, 'y': 50.7674}
Trafostacja Pierzchnica. {'x': 20.75619, 'y': 50.69823}
Trafostacja Bilcza. {'x': 20.61897, 'y': 50.77823}
Trafostacja Bilcza. {'x': 20.62369, 'y': 50.77944}
Trafostacja Brzeziny. {'x': 20.58406, 'y': 50.76547}
Trafostacja Marzysz. {'x': 20.72491, 'y': 50.766}

...
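
To cover a larger region than a single request returns, one option is to split the area into smaller boxes and repeat the same request for each box. The sketch below is only that, a sketch: the helper names `tile_bounds`, `tile_params` and `dedupe` are made up for illustration, and it assumes (not verified against the site) that the server limits results to the `mapbounds` and `c` values it is sent and that neighbouring boxes may return the same place twice.

```python
import json
from typing import Dict, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (minX, minY, maxX, maxY)


def tile_bounds(min_x: float, min_y: float, max_x: float, max_y: float,
                nx: int, ny: int) -> Iterator[Box]:
    """Split a bounding box into an nx-by-ny grid of smaller boxes."""
    step_x = (max_x - min_x) / nx
    step_y = (max_y - min_y) / ny
    for ix in range(nx):
        for iy in range(ny):
            x0 = min_x + ix * step_x
            y0 = min_y + iy * step_y
            yield (x0, y0, x0 + step_x, y0 + step_y)


def tile_params(base: Dict[str, str], box: Box) -> Dict[str, str]:
    """Copy the base query and point "mapbounds" and "c" at one box.

    Assumption: the API filters by these two parameters.
    """
    x0, y0, x1, y1 = box
    params = dict(base)  # leave the caller's dict untouched
    params["mapbounds"] = json.dumps(
        {"minX": x0, "minY": y0, "maxX": x1, "maxY": y1}
    )
    params["c"] = json.dumps({"x": (x0 + x1) / 2, "y": (y0 + y1) / 2})
    return params


def dedupe(items: List[dict]) -> List[dict]:
    """Drop repeated places, keyed on name and coordinates."""
    seen = set()
    unique = []
    for item in items:
        key = (item["name"], item["xy"]["x"], item["xy"]["y"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Inside the session block, each box from `tile_bounds(...)` would be turned into a query with `tile_params(params, box)`, sent with `s.get(api_url, params=...)`, and the collected `data["items"]["list"]["values"]` entries passed through `dedupe` at the end. A short pause between requests keeps the load on the server reasonable.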