Selenium application runs fine without headless mode; breaks when headless mode is enabled


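The application is a web scraper. It will eventually be deployed on a live website, so headless mode is essential. I followed several guides and videos, for example: https://www.youtube.com/watch?v=ne3BH9-5H2o

What I ultimately want is for my web application to run on a live website, where users can download a CSV containing the scraped data.

Right now it works perfectly without headless mode. In headless mode it works at first and then crashes. I'm not very familiar with this area. This is my first Python project; I have tried many solutions suggested on Google, as well as AI chatbots, with no success.

Here is the output when I run it in headless mode: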

DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync HTTP/1.1" 200 14
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:GET http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/element/f.BA2702E83329F95F938C6039779FB64F.d.C5F3064420B902B29643EBB8453A64CB.e.26/enabled {"id": "f.BA2702E83329F95F938C6039779FB64F.d.C5F3064420B902B29643EBB8453A64CB.e.26"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "GET /session/47d14aaef0c4b9c8b7af27a70da64850/element/f.BA2702E83329F95F938C6039779FB64F.d.C5F3064420B902B29643EBB8453A64CB.e.26/enabled HTTP/1.1" 200 14
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/element {"using": "css selector", "value": ".VfPpkd-LgbsSe.VfPpkd-LgbsSe-OWXEXe-k8QpJ.VfPpkd-LgbsSe-OWXEXe-dgl2Hf.nCP5yc.AjY5Oe.DuMIQc.LQeN7.XWZjwc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/element HTTP/1.1" 200 126
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/element/f.BA2702E83329F95F938C6039779FB64F.d.C5F3064420B902B29643EBB8453A64CB.e.26/click {"id": "f.BA2702E83329F95F938C6039779FB64F.d.C5F3064420B902B29643EBB8453A64CB.e.26"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/element/f.BA2702E83329F95F938C6039779FB64F.d.C5F3064420B902B29643EBB8453A64CB.e.26/click HTTP/1.1" 200 14
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
INFO:root:
Scraping has started. This could take a few minutes. Please do not close the browser window or click the top and move it (the script will stop if you do so).
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 830
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 830
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
INFO:root:Number of elements: 7
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 830
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 830
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync {"script": "arguments[0].scrollIntoView();", "args": [{"ELEMENT": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.68", "element-6066-11e4-a52e-4f735466cecf": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.68"}]}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync HTTP/1.1" 200 14
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 1415
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 1415
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 1415
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
INFO:root:Number of elements: 12
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 1415
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 1415
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync {"script": "arguments[0].scrollIntoView();", "args": [{"ELEMENT": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.82", "element-6066-11e4-a52e-4f735466cecf": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.82"}]}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync HTTP/1.1" 200 14
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 1415
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/elements {"using": "css selector", "value": ".hfpxzc"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/elements HTTP/1.1" 200 1415
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync {"script": "arguments[0].scrollIntoView();", "args": [{"ELEMENT": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.62", "element-6066-11e4-a52e-4f735466cecf": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.62"}]}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync HTTP/1.1" 200 14
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/element/f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.62/click {"id": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.62"}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/element/f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.62/click HTTP/1.1" 200 14
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:selenium.webdriver.remote.remote_connection:GET http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/source {}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "GET /session/47d14aaef0c4b9c8b7af27a70da64850/source HTTP/1.1" 200 901718
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 301 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://www.taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 301 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://www.taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 301 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://www.taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 301 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://www.taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 301 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://www.taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 301 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.taksihelsinki.fi:443
DEBUG:urllib3.connectionpool:https://www.taksihelsinki.fi:443 "GET /tilaa-taksi/taksiasemat/ HTTP/1.1" 200 None
INFO:root:['Taksiasema Viiskulma', '0100 6203', 'https://taksihelsinki.fi/tilaa-taksi/taksiasemat/', '', 'Laivurinrinne 2, 00120 Helsinki']
DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:37913/session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync {"script": "arguments[0].scrollIntoView();", "args": [{"ELEMENT": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.63", "element-6066-11e4-a52e-4f735466cecf": "f.BA2702E83329F95F938C6039779FB64F.d.5597E6B4F31B589B3519D21B88DB7FD1.e.63"}]}
DEBUG:urllib3.connectionpool:http://127.0.0.1:37913 "POST /session/47d14aaef0c4b9c8b7af27a70da64850/execute/sync HTTP/1.1" 404 853
DEBUG:selenium.webdriver.remote.remote_connection:Finished Request
ERROR:googleMapsScrapingToolweb:Exception on /scrape [POST]
Traceback (most recent call last):
  File "/home/vaahtlnirn1/.local/lib/python3.10/site-packages/flask/app.py", line 1463, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/vaahtlnirn1/.local/lib/python3.10/site-packages/flask/app.py", line 872, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/vaahtlnirn1/.local/lib/python3.10/site-packages/flask_cors/extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/vaahtlnirn1/.local/lib/python3.10/site-packages/flask/app.py", line 870, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/vaahtlnirn1/.local/lib/python3.10/site-packages/flask/app.py", line 855, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/vaahtlnirn1/googleMapsScrapingTool/googleMapsScrapingToolweb.py", line 160, in scrape
    file_path = scraper.scrape()
  File "/home/vaahtlnirn1/googleMapsScrapingTool/googleMapsScrapingToolweb.py", line 46, in scrape
    return self._selenium_extractor(browser)
  File "/home/vaahtlnirn1/googleMapsScrapingTool/googleMapsScrapingToolweb.py", line 76, in _selenium_extractor
    browser.execute_script("arguments[0].scrollIntoView();", element)
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webdriver.py", line 667, in execute_script
    return self.execute(command, {
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webdriver.py", line 318, in execute
    self.error_handler.check_response(response)
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found in the current frame
  (Session info: chrome-headless-shell=124.0.6367.60)

Here is the relevant code:

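import logging
import time

from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class GoogleMapsScraper: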
    def __init__(self, link):
        self.link = link
        self.csv_data = []
        self.elementResults = 0

    def scrape(self):
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument('--disable-gpu')
        browser = webdriver.Chrome(options=options)
        browser.maximize_window()
        browser.get(self.link)
        try:
            WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".VfPpkd-LgbsSe.VfPpkd-LgbsSe-OWXEXe-k8QpJ.VfPpkd-LgbsSe-OWXEXe-dgl2Hf.nCP5yc.AjY5Oe.DuMIQc.LQeN7.XWZjwc")))
            accept_button = browser.find_element(By.CSS_SELECTOR, ".VfPpkd-LgbsSe.VfPpkd-LgbsSe-OWXEXe-k8QpJ.VfPpkd-LgbsSe-OWXEXe-dgl2Hf.nCP5yc.AjY5Oe.DuMIQc.LQeN7.XWZjwc")
            accept_button.click()  # Click the accept button for Google cookies and terms
        except Exception as e:
            logging.error("Error accepting cookies: %s", e)
        return self._selenium_extractor(browser)

    def _selenium_extractor(self, browser):
        prev_length = 0
        logging.info("\nScraping has started. This could take a few minutes. Please do not close the browser window or click the top and move it (the script will stop if you do so).")

        while len(self._get_elements(browser)) < 1000:  # This limits the number of results per page. Google seemingly has a hard limit of 120, but 1000 ensures that it runs smoothly.
            # Acquiring elements to scrape
            logging.info(f"Number of elements: {len(self._get_elements(browser))}")
            var = len(self._get_elements(browser))
            last_element = self._get_elements(browser)[-1]
            browser.execute_script("arguments[0].scrollIntoView();", last_element)
            time.sleep(2)  # Sleep allows time for page to load
            a = self._get_elements(browser)

            try:
                if len(a) == var:
                    self.elementResults += 1
                    if self.elementResults > 20 or len(a) == prev_length:
                        break
                else:
                    self.elementResults = 0
                prev_length = len(a)
            except StaleElementReferenceException:
                continue
python selenium-webdriver beautifulsoup
1 Answer

Use the newer Chrome headless mode:

options.add_argument("--headless=new")

This makes headless Chrome produce the same results as the regular Chrome browser.
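As a rough sketch of how the question's setup might look with this flag: the session info in the traceback reports `chrome-headless-shell`, which suggests the old headless implementation was in use. The helper names `build_chrome_args` and `make_browser` are hypothetical, and the `--window-size` switch is an assumption added here, not part of the original code.

```python
from typing import List


def build_chrome_args(headless: bool = True) -> List[str]:
    """Return Chrome command-line switches for a scraping session."""
    args = [
        "--no-sandbox",             # taken from the question's setup
        "--disable-dev-shm-usage",  # taken from the question's setup
    ]
    if headless:
        # New headless mode, as suggested in this answer.
        args.append("--headless=new")
        # Assumption: a fixed window size is used instead of maximize_window(),
        # which may behave differently when there is no display.
        args.append("--window-size=1920,1080")
    return args


def make_browser(headless: bool = True):
    """Start Chrome with the switches above (needs Selenium and Chrome installed)."""
    # Imported lazily so build_chrome_args() can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the switch list in one function makes it easy to run the same scraper with `headless=False` locally when comparing behaviour.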

If it works in the regular Chrome browser, it should also work in headless Chrome.
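Separately from the headless flag, the traceback in the question ends in a `StaleElementReferenceException` raised from `execute_script`, which usually means the element was replaced in the DOM after it was located. One common guard, sketched here under assumptions, is to locate the element again and retry. The helper `retry_on_stale` is hypothetical, and it takes the exception types as a parameter so it does not depend on Selenium itself.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_stale(
    action: Callable[[], T],
    stale_errors: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call action(); if it raises one of stale_errors, wait and call it again."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except stale_errors:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


# Usage sketch with Selenium (not executed here). The lambda looks the
# element up again on every attempt, so a stale reference is never reused:
#   from selenium.common.exceptions import StaleElementReferenceException
#   retry_on_stale(
#       lambda: browser.execute_script(
#           "arguments[0].scrollIntoView();", self._get_elements(browser)[-1]
#       ),
#       (StaleElementReferenceException,),
#   )
```

The important design choice is that the lookup happens inside the retried callable; retrying with an element reference captured earlier would fail the same way each time.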
