Trustpilot review scraping: pagination not working

Question

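I've been trying to scrape customer reviews of DoorDash from several pages on Trustpilot, but for some reason my code only scrapes the first page, over and over (the pagination doesn't seem to work). Here is my code: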

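import json
from random import randint
from time import sleep

import bs4
import numpy as np
import requests

review_text=[]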
review_score=[]
review_date=[]
review_title=[]

pages = np.arange(1, 10, 1)
for page in pages:
    page = requests.get("https://www.trustpilot.com/review/doordash.com" + "?page=" + str(page))
    sleep(randint(2,10))
    if response.status_code == 200:
        soup = bs4.BeautifulSoup(response.text, "html.parser")
        for rev in soup.find_all('div',class_="review-content"):
            nv = rev.find_all('p',class_= 'review-content__text')
            # Use the matched review-text paragraph if there is one, else a placeholder
            review = nv[0].text.strip() if nv else '-'
            review_text.append(review)            
            date_json = json.loads(rev.find('script').string)
            date = date_json['publishedDate']
            review_date.append(date)
        for rev in soup.find_all('div',class_='star-rating star-rating--medium'):
            review_score.append(rev.find('img').get('alt'))
        for rev in soup.find_all('h2',class_='review-content__title'):
            review_title.append(rev.text.strip())
    else:
        print("Issue getting url")

Does anyone know how I can fix this? (Everything other than the pagination works perfectly.) Thanks!
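Going by the code above, one likely cause: each response is stored in `page`, while the status check and the parsing read `response`, a name this loop never assigns. If `response` was left over from an earlier request for the first page (for example, in a previous notebook cell), every iteration would re-parse that same page; if it was undefined, the loop would raise a `NameError` instead. A minimal sketch of the loop with the names kept apart, assuming the site honors the `?page=N` parameter used in the question. `page_url`, `polite_pause`, `fetch_pages` and the injected `fetch` callable (assumed to return `(status_code, html)`) are hypothetical names introduced here so the loop can run without network access:

```python
import random
import time
from typing import Callable, Dict, Iterable, Tuple

BASE_URL = "https://www.trustpilot.com/review/doordash.com"


def page_url(page: int) -> str:
    """URL for one page of reviews, using the ?page=N parameter from the question."""
    return f"{BASE_URL}?page={page}"


def polite_pause() -> None:
    """Random 2-10 s delay between requests, as in the original loop."""
    time.sleep(random.randint(2, 10))


def fetch_pages(
    pages: Iterable[int],
    fetch: Callable[[str], Tuple[int, str]],  # hypothetical: returns (status_code, html)
    pause: Callable[[], None] = polite_pause,
) -> Dict[int, str]:
    """Fetch every page and return {page_number: html} for pages that returned 200."""
    html_by_page: Dict[int, str] = {}
    for page in pages:
        url = page_url(page)
        # The loop counter and the HTTP result live in separate names,
        # so the status check and the parsing see this page's own response.
        status, html = fetch(url)
        pause()
        if status == 200:
            html_by_page[page] = html
        else:
            print("Issue getting url", url)
    return html_by_page
```

With `requests`, `fetch` can be a two-line function that calls `requests.get(url)` and returns `(response.status_code, response.text)`; each returned HTML string can then go through the question's BeautifulSoup parsing loop unchanged.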

Answer 1

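Pagination on Trustpilot isn't done by simply requesting page 1, page 2 and so on; you need to get the URL of the next page and scrape its content. This example shows how to get the next-page URL and use it to scrape each page: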

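import requests
from lxml import html

base_url = "https://trustpilot.com/review/doordash.com"
general = "https://trustpilot.com"
num_pages = 20

def scrape_page(url):
    # Placeholder (hypothetical): put the function that collects reviews from one page here
    print("scraping", url)

for i in range(num_pages):
    # Scrape the current page first, then look for the link to the next one
    scrape_page(base_url)
    page = requests.get(base_url)
    tree = html.fromstring(page.content)
    next_page = tree.xpath("//a[contains(@class, 'next-page')]")
    if not next_page:
        break  # no "next page" link: this was the last page
    base_url = general + next_page[0].get('href')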
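The same next-link idea can be sketched with only the standard library (`html.parser` instead of `lxml`), which also makes the stopping rule explicit: the loop ends when a page has no link whose class contains `next-page`, or after `max_pages` pages. The `next-page` class is taken from the answer above and may not match Trustpilot's current markup. `NextLinkFinder`, `find_next_url`, `crawl` and the injected `fetch` callable (assumed to return the page's HTML as a string) are hypothetical names. `urljoin` replaces the manual `general + href` concatenation, so relative links also resolve correctly.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose class attribute contains 'next-page'."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attributes = dict(attrs)
        # Substring match on the class attribute, like XPath contains(@class, 'next-page')
        if "next-page" in (attributes.get("class") or "") and attributes.get("href"):
            self.href = attributes["href"]


def find_next_url(html: str, current_url: str) -> Optional[str]:
    """Absolute URL of the next page, or None if the page has no next link."""
    finder = NextLinkFinder()
    finder.feed(html)
    # urljoin resolves a relative href such as "/review/...?page=2" against the current URL
    return urljoin(current_url, finder.href) if finder.href else None


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 20) -> List[str]:
    """Visit pages by following next-page links; return the URLs visited, in order."""
    visited: List[str] = []
    url: Optional[str] = start_url
    while url is not None and len(visited) < max_pages:
        html = fetch(url)
        visited.append(url)
        # Collect the reviews from `html` here, before moving on
        url = find_next_url(html, url)
    return visited
```

Capping the loop with `max_pages` guards against a next link that points back to an already visited page.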
