Getting the last page number of a web page - aiohttp

Problem description (votes: 0, answers: 0)

I have a project that scrapes the Amazon website. However, I cannot get the last page number of the pagination.

import asyncio

import aiohttp
import telegram
from requests_html import HTMLSession  # missing from the original imports

async def main(self, links, semaphore=8):
    try:
        s = HTMLSession()
        async with aiohttp.ClientSession() as asession:
            sem = asyncio.Semaphore(semaphore)
            urls = []
            for cat_link in links:
                resp = s.get(cat_link)
                # Try to read the last page number from the last disabled <li>
                pages = resp.html.xpath('//li[@class="a-disabled" and contains(text(),"")][last()]/text()', first=True)
                print(pages)
                if not pages:
                    pages = 1  # fall back to a single page
                urls.extend(f'{cat_link}&page={p}'
                            for p in range(1, int(pages) + 1))
            tasks = [asyncio.ensure_future(self.fetch_eith_sem(sem, url, asession))
                     for url in urls]
            await asyncio.gather(*tasks)
        # no explicit close needed: the "async with" block already closed the session
    except aiohttp.ClientConnectionError:
        print('Error handled')

The part that does not work is this line: pages = resp.html.xpath('//li[@class="a-disabled" and contains(text(),"")][last()]/text()', first=True) (note that contains(text(),"") is always true in XPath, so the predicate effectively just selects disabled list items).
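The selector itself can be tested offline against a static HTML snippet before blaming the XPath. The sketch below uses lxml (which requests_html wraps) on a made-up pagination fragment; the markup and class names here are assumptions modeled on older Amazon pages, which may now use different classes such as "s-pagination-disabled". It also shows a more defensive fallback that takes the maximum numeric label instead of relying on element order:

```python
from lxml import html

# Minimal pagination snippet; the structure and "a-disabled" class are
# assumptions modeled on older Amazon markup, not the live page.
SAMPLE = """
<ul>
  <li class="a-normal"><a href="?page=1">1</a></li>
  <li class="a-normal"><a href="?page=2">2</a></li>
  <li class="a-disabled">...</li>
  <li class="a-disabled">20</li>
</ul>
"""

tree = html.fromstring(SAMPLE)

# The question's approach: text of the last disabled <li>.
last = tree.xpath('//li[@class="a-disabled"][last()]/text()')
print(last)  # ['20']

# Defensive fallback: the largest numeric label anywhere in the list,
# defaulting to 1 when no number is found.
nums = [int(t) for t in tree.xpath('//li//text()') if t.strip().isdigit()]
print(max(nums) if nums else 1)  # 20
```

If the selector returns an empty result against the real page, the likely cause is that the markup differs from what the XPath expects (or that the pagination is rendered by JavaScript, which a plain GET will not see).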

Example link:

https://www.amazon.com/s?i=specialty-aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A502394&ref=nav_em__nav_desktop_sa_intl_camera_and_photo_0_2_5_3
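As an aside, building page URLs with string concatenation (cat_link + f'&page={p}') only works when the link already has a query string and no page parameter. A small sketch using the standard library's urllib.parse handles both cases; the helper name with_page is mine, not part of any library:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def with_page(url: str, page: int) -> str:
    """Return url with its 'page' query parameter added or overwritten."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query['page'] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))

print(with_page('https://www.amazon.com/s?i=specialty-aps&bbn=16225009011', 3))
# https://www.amazon.com/s?i=specialty-aps&bbn=16225009011&page=3
```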

python python-3.x aiohttp