So I'm trying out Pyppeteer (the unofficial Python port of Puppeteer), and I'm attempting to scrape a site and select an element.
For example, I've been trying to wait for elements with the class "tab":
elements = await page.querySelectorAll('.tab')
but nothing is returned and I get a timeout error.
In fact, I get a timeout error whenever I wait for any element with any class on the site I load. I tried to troubleshoot by writing the page's HTML to a file and inspecting it, but not only did I not see any elements — when I opened the file, the page was blank.
Here is the code I have so far:
import asyncio
from pyppeteer import launch
import requests
import sys
import json

print("Test")
print("Starting script...")

def print_cookies_as_json(cookies):
    cookies_json = json.dumps(cookies, indent=4)
    print("Cookies in JSON format:")
    print(cookies_json)

async def main():
    try:
        print("Launching browser...")
        browser = await launch(headless=False)
        page = await browser.newPage()
    except Exception as e:
        print(f"Error launching browser or creating new page: {e}")
        return

    try:
        print("Reading cookies from file...")
        # Load cookies from JSON file
        try:
            with open('/path/to/usrCookies.json', 'r') as f:
                cookies = json.load(f)
            if not cookies:
                raise ValueError("No valid cookies found in the file.")
            await page.setCookie(*cookies)
        except (FileNotFoundError, ValueError) as e:
            print(f"Error reading cookies: {e}")
            return

        # Print cookies in JSON format
        print_cookies_as_json(cookies)

        print("Navigating to URL...")
        url = 'https://example.com'
        await page.goto(url, {'waitUntil': 'networkidle0'})

        response = requests.get(url, cookies={c['name']: c['value'] for c in cookies})
        json_response = response.json()
        print(json_response)
        print("Processing JSON data...")

        await page.screenshot({'path': 'screenshot_TEST.png', 'fullPage': True})

        print("Waiting for page to load...")
        await page.waitForSelector('body', {'timeout': 10000})  # wait for the body to load
        await asyncio.sleep(1)

        # Get the HTML content of the page
        print("Getting HTML content...")
        html = await page.content()

        # Write the HTML content to a file
        with open("index.html", "w", encoding="utf-8") as file:
            file.write(html)
    except Exception as e:
        print("An error occurred:", e)
        await page.screenshot({'path': 'screenshot_ERROR.png', 'fullPage': True})
        print("Screenshot saved as screenshot.png")
    finally:
        # Close the browser
        try:
            print("Closing browser...")
            await browser.close()
        except Exception as e:
            print(f"Error closing browser: {e}")

asyncio.get_event_loop().run_until_complete(main())
This is just a generic snippet of my code. Could there be a flaw in how my code executes?
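As an aside, the `requests.get(...)` line in the snippet builds its cookie jar from the pyppeteer cookie records with a dict comprehension. That conversion can be sketched and checked in isolation; the sample cookie values below are invented for illustration:

```python
def cookies_to_dict(cookie_list):
    """Convert pyppeteer-style cookie records (list of dicts with 'name'/'value'
    keys, plus extra fields like 'domain') into the flat name -> value mapping
    that requests expects for its `cookies` argument."""
    return {c['name']: c['value'] for c in cookie_list}

# Invented sample records, mimicking what page.cookies() / a cookie dump contains
sample = [
    {'name': 'session', 'value': 'abc123', 'domain': 'example.com'},
    {'name': 'theme', 'value': 'dark', 'domain': 'example.com'},
]

print(cookies_to_dict(sample))  # → {'session': 'abc123', 'theme': 'dark'}
```

Note that this drops the `domain`/`path` scoping, so if the cookie file holds cookies for several hosts, requests will send all of them to the one URL.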
Try changing your write call: change `html` to `str(html)`. I'm using BeautifulSoup to parse HTML myself — try parsing your HTML with BS and then saving it:
from bs4 import BeautifulSoup

r = await page.content()
soup = BeautifulSoup(r, 'html.parser')
html = soup.find('table', class_='TABLEBORDER')

with open('files/output.html', 'w', encoding='utf-8') as file:
    file.write(str(html))
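For reference, the suggestion end-to-end: `soup.find` returns a `Tag` (or `None` when nothing matches), and `str()` serializes the tag back to markup for writing. A small self-contained sketch — the table markup here is invented for illustration:

```python
from bs4 import BeautifulSoup

sample_html = """
<html><body>
  <table class="TABLEBORDER"><tr><td>cell</td></tr></table>
</body></html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
table = soup.find('table', class_='TABLEBORDER')

# find() returns None when the selector misses -- guard before writing,
# otherwise the literal string "None" ends up in the output file.
if table is not None:
    print(str(table))
else:
    print("table not found")
```

This also ties back to the original symptom: if the saved page is blank, `find` will return `None`, so checking for that explicitly tells you whether the problem is the parse step or the page content itself.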