So I'm trying out Pyppeteer (the unofficial Python port of Puppeteer), and I'm attempting to scrape a site and select an element.
For example, I've been trying to wait for elements with the class "tab":
elements = await page.querySelectorAll('.tab')
but nothing is returned and I get a timeout error.
In fact, I get a timeout error whenever I wait for any element with any class on the site I load. I tried to troubleshoot by writing the page's HTML to a file and inspecting it, but not only did I not see any elements — when I opened the file, the page was blank.
Here is the code I have so far:
import asyncio
from pyppeteer import launch
import requests
import sys
import json

print("Test")
print("Starting script...")

def print_cookies_as_json(cookies):
    cookies_json = json.dumps(cookies, indent=4)
    print("Cookies in JSON format:")
    print(cookies_json)

async def main():
    try:
        print("Launching browser...")
        browser = await launch(headless=False)
        page = await browser.newPage()
    except Exception as e:
        print(f"Error launching browser or creating new page: {e}")
        return

    try:
        print("Reading cookies from file...")
        # Load cookies from JSON file
        try:
            with open('/path/to/usrCookies.json', 'r') as f:
                cookies = json.load(f)
            if not cookies:
                raise ValueError("No valid cookies found in the file.")
            await page.setCookie(*cookies)
        except (FileNotFoundError, ValueError) as e:
            print(f"Error reading cookies: {e}")
            return

        # Print cookies in JSON format
        print_cookies_as_json(cookies)

        print("Navigating to URL...")
        url = 'https://example.com'
        await page.goto(url, {'waitUntil': 'networkidle0'})

        response = requests.get(url, cookies={c['name']: c['value'] for c in cookies})
        json_response = response.json()
        print(json_response)
        print("Processing JSON data...")

        await page.screenshot({'path': 'screenshot_TEST.png', 'fullPage': True})

        print("Waiting for page to load...")
        await page.waitForSelector('body', {'timeout': 10000})  # wait for the body to load
        await asyncio.sleep(1)

        # Get the HTML content of the page
        print("Getting HTML content...")
        html = await page.content()

        # Write the HTML content to a file
        with open("index.html", "w", encoding="utf-8") as file:
            file.write(html)
    except Exception as e:
        print("An error occurred:", e)
        await page.screenshot({'path': 'screenshot_ERROR.png', 'fullPage': True})
        print("Screenshot saved as screenshot.png")
    finally:
        # Close the browser
        try:
            print("Closing browser...")
            await browser.close()
        except Exception as e:
            print(f"Error closing browser: {e}")

asyncio.get_event_loop().run_until_complete(main())
This is just a generic snippet of my code. Could there be a flaw in how my code executes?
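As an aside, the `requests.get(...)` line in the snippet builds its cookie jar from the pyppeteer cookie records with a dict comprehension. That conversion can be sketched and checked in isolation; the sample cookie values below are invented for illustration:

```python
def cookies_to_dict(cookie_list):
    """Convert pyppeteer-style cookie records (list of dicts with 'name'/'value'
    keys, plus extra fields like 'domain') into the flat name -> value mapping
    that requests expects for its `cookies` argument."""
    return {c['name']: c['value'] for c in cookie_list}

# Invented sample records, mimicking what page.cookies() / a cookie dump contains
sample = [
    {'name': 'session', 'value': 'abc123', 'domain': 'example.com'},
    {'name': 'theme', 'value': 'dark', 'domain': 'example.com'},
]

print(cookies_to_dict(sample))  # → {'session': 'abc123', 'theme': 'dark'}
```

Note that this drops the `domain`/`path` scoping, so if the cookie file holds cookies for several hosts, requests will send all of them to the one URL.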
Try changing your write call: change `html` to `str(html)`. I'm using BeautifulSoup to parse HTML myself — try parsing your HTML with BS and then saving it:
from bs4 import BeautifulSoup

r = await page.content()
soup = BeautifulSoup(r, 'html.parser')
html = soup.find('table', class_='TABLEBORDER')

with open('files/output.html', 'w', encoding='utf-8') as file:
    file.write(str(html))
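For reference, the suggestion end-to-end: `soup.find` returns a `Tag` (or `None` when nothing matches), and `str()` serializes the tag back to markup for writing. A small self-contained sketch — the table markup here is invented for illustration:

```python
from bs4 import BeautifulSoup

sample_html = """
<html><body>
  <table class="TABLEBORDER"><tr><td>cell</td></tr></table>
</body></html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
table = soup.find('table', class_='TABLEBORDER')

# find() returns None when the selector misses -- guard before writing,
# otherwise the literal string "None" ends up in the output file.
if table is not None:
    print(str(table))
else:
    print("table not found")
```

This also ties back to the original symptom: if the saved page is blank, `find` will return `None`, so checking for that explicitly tells you whether the problem is the parse step or the page content itself.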