How to scrape dynamically loaded store URLs from a web page


I am working on a web scraping project and am trying to extract a list of store URLs from this page: https://maroof.sa/businesses. Here is what I have tried so far, without success:

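Using BeautifulSoup and Requests in Python to parse the HTML, but I could not find the right tag/class containing the store URLs. Using Selenium to wait for JavaScript to render the store links and then extract them, but the appropriate elements...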

python beautifulsoup scrape
1 answer

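The store list is not in the static HTML, so instead of parsing the page you can try requesting the JSON API directly: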

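import json
import requests


# Search endpoint; skipCount and maxResultCount select which slice of results is returned.
url = "https://api.thiqah.sa/maroof/public/api/app/business/search?keyword=&businessTypeId=&businessTypeSubCategoryId=&regionId=&cityId=&certificationType=&sortBy=2&sortDirection=2&sorting=&skipCount=0&maxResultCount=10"
# The request is sent with an apikey header.
headers = {"apikey": "c1qesecmag8GSbxTHGRjfnMFBzAH7UAN"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop early on an HTTP error instead of failing while parsing
data = response.json()

# Uncomment to inspect the full response:
# print(json.dumps(data, indent=4, ensure_ascii=False))

# Each item has an Arabic name and an id; the id forms the store's details URL.
for item in data["items"]:
    print(item["nameAr"])
    print(f"https://maroof.sa/businesses/details/{item['id']}")
    print()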

This prints:

متجر بياض
https://maroof.sa/businesses/details/229217

متجر مروه
https://maroof.sa/businesses/details/48066

المتسوقة مآثر
https://maroof.sa/businesses/details/168551

متجر أحمد للأنظمة الصوتية والألكترونيات
https://maroof.sa/businesses/details/25650

متعب للأرقام المميزة
https://maroof.sa/businesses/details/253838

متجر شوب تتش
https://maroof.sa/businesses/details/246531

مؤسسة ماجد عبدالله الشهراني للمقاولات
https://maroof.sa/businesses/details/244174

شركة إنجاز للخدمات 
https://maroof.sa/businesses/details/276939

موقع عاملتي الرقمي
https://maroof.sa/businesses/details/261892

مندوبكم
https://maroof.sa/businesses/details/112807
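
The request above returns only the first 10 stores. To collect more, one possible approach is to step through the results page by page. The following is a minimal sketch, not a verified implementation: it assumes that skipCount acts as an offset and maxResultCount as the page size (as the parameter names suggest), and that a page shorter than the page size means there are no more results. The names fetch_page, iter_store_urls and max_pages are hypothetical helpers introduced here for illustration.

```python
API_URL = "https://api.thiqah.sa/maroof/public/api/app/business/search"
HEADERS = {"apikey": "c1qesecmag8GSbxTHGRjfnMFBzAH7UAN"}
DETAILS_URL = "https://maroof.sa/businesses/details/{id}"


def fetch_page(skip_count, page_size):
    """Request one page of search results and return the decoded JSON."""
    # Imported here so the pagination helper below can be used or tested
    # with a different fetch function, without needing requests.
    import requests

    # Same query parameters as the original URL, with only the paging values varied.
    params = {
        "keyword": "",
        "businessTypeId": "",
        "businessTypeSubCategoryId": "",
        "regionId": "",
        "cityId": "",
        "certificationType": "",
        "sortBy": 2,
        "sortDirection": 2,
        "sorting": "",
        "skipCount": skip_count,      # assumption: offset of the first result
        "maxResultCount": page_size,  # assumption: number of results per page
    }
    response = requests.get(API_URL, params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_store_urls(fetch=fetch_page, page_size=10, max_pages=5):
    """Yield store detail URLs page by page, up to max_pages pages."""
    for page in range(max_pages):
        data = fetch(page * page_size, page_size)
        items = data.get("items", [])
        for item in items:
            yield DETAILS_URL.format(id=item["id"])
        # Assumption: a short (or empty) page means the results are exhausted.
        if len(items) < page_size:
            break


# Example usage (performs real HTTP requests):
# for store_url in iter_store_urls(max_pages=3):
#     print(store_url)
```

The max_pages cap keeps the loop bounded if the stopping assumption turns out to be wrong. Pausing between requests (for example with time.sleep) is also worth considering, to avoid putting load on the API.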