Web scraping with Selenium (not working)

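I'm a beginner at web scraping with Selenium. I'm trying to open a specific Google Chrome profile (because all the sites are already logged in there). Happily, the code does open a window for that profile (the profile is Default). However, it fails to open the site in that window and start scraping. Here is the code: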

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

# Installed chromedriver.exe

# Changing some arguments to set Chrome profile
options = webdriver.ChromeOptions()
# options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")

# Location where Chrome stores profiles
# options.add_arguments = {"user-data-dir": r"C:\Users\Kavipriyan\AppData\Local\Google\Chrome\User Data\Default"}
options.add_argument(r"--user-data-dir=C:\Users\Kavipriyan\AppData\Local\Google\Chrome\User Data")

# Profile name
options.add_argument(r"--profile-directory=Default")

service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(options=options, service=service) 

driver.get("https://twitter.com/home")
time.sleep(3)

driver.close()

This code also prints a huge error in the console (I'm not sure which parts are actually needed, so I pasted everything it printed):

Opening in existing browser session.
Traceback (most recent call last):
  File "d:\COMPUTER FILES\My Stuff\Code Programs - Python\Twitterscraping 2.0\Twitterscraper 3.2.py", line 23, in <module>
    driver = webdriver.Chrome(options=options, service=service)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kavipriyan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\selenium\webdriver\chrome\webdriver.py", line 45, in __init__
    super().__init__(
  File "C:\Users\Kavipriyan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\selenium\webdriver\chromium\webdriver.py", line 61, in __init__
    super().__init__(command_executor=executor, options=options)
  File "C:\Users\Kavipriyan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\selenium\webdriver\remote\webdriver.py", line 208, in __init__
    self.start_session(capabilities)
  File "C:\Users\Kavipriyan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\selenium\webdriver\remote\webdriver.py", line 292, in start_session     
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kavipriyan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\selenium\webdriver\remote\webdriver.py", line 347, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Kavipriyan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response 
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location C:\Program Files\Google\Chrome\Application\chrome.exe is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
        GetHandleVerifier [0x00007FF667BC7062+63090]
        (No symbol) [0x00007FF667B32CB2]
        (No symbol) [0x00007FF6679CEC65]
        (No symbol) [0x00007FF667A00777]
        (No symbol) [0x00007FF6679FB2F4]
        (No symbol) [0x00007FF667A40BFB]
        (No symbol) [0x00007FF667A40830]
        (No symbol) [0x00007FF667A36D83]
        (No symbol) [0x00007FF667A083A8]
        (No symbol) [0x00007FF667A09441]
        GetHandleVerifier [0x00007FF667FC25CD+4238301]
        GetHandleVerifier [0x00007FF667FFF72D+4488509]
        GetHandleVerifier [0x00007FF667FF7A0F+4456479]
        GetHandleVerifier [0x00007FF667CA05A6+953270]
        (No symbol) [0x00007FF667B3E57F]
        (No symbol) [0x00007FF667B39254]
        (No symbol) [0x00007FF667B3938B]
        (No symbol) [0x00007FF667B29BC4]
        BaseThreadInitThunk [0x00007FFDF5047344+20]
        RtlUserThreadStart [0x00007FFDF56426B1+33]

Thanks in advance!

python selenium-webdriver web-scraping twitter selenium-chromedriver
1 Answer

The error indicates that Chrome failed to start properly.

This can happen for several reasons, such as a version incompatibility between Chrome, ChromeDriver, and Selenium.
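
One of the causes named here, a Chrome/ChromeDriver mismatch, can be checked quickly by comparing major versions. The sketch below is only an illustration and not part of the original answer: `major_version` and `same_major_version` are hypothetical helper names, and the version strings are made-up examples. It assumes you copy the real strings from the chrome://version page and from running `chromedriver.exe --version`.

```python
import re


def major_version(version_text: str) -> int:
    """Return the major version from text such as 'ChromeDriver 120.0.6099.109'."""
    # Chrome and ChromeDriver both report four dot-separated numbers.
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def same_major_version(chrome_text: str, driver_text: str) -> bool:
    """True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_text) == major_version(driver_text)


if __name__ == "__main__":
    # Example strings only; replace them with the ones from your machine.
    chrome_text = "Google Chrome 120.0.6099.110"
    driver_text = "ChromeDriver 120.0.6099.109"
    if same_major_version(chrome_text, driver_text):
        print("Major versions match.")
    else:
        print("Major versions differ; download a matching ChromeDriver.")
```

If the major versions differ, replacing chromedriver.exe with a build that matches the installed Chrome is the usual first step.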

Try this:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import time

# Specify the path to your ChromeDriver executable
chrome_driver_path = "chromedriver.exe"

# Set Chrome options
options = Options()

# Specify the path to the user data directory
options.add_argument(r"--user-data-dir=C:\Users\Kavipriyan\AppData\Local\Google\Chrome\User Data")

# Specify the profile directory
options.add_argument(r"--profile-directory=Default")

# Add other optional arguments if needed
# options.add_argument("--headless")  # Uncomment if you want to run Chrome in headless mode
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")

# Initialize the Chrome service and driver
service = Service(executable_path=chrome_driver_path)
driver = webdriver.Chrome(service=service, options=options)

# Load the desired URL
driver.get("https://twitter.com/home")
time.sleep(3)

# Do your scraping here...

# Close the browser session
driver.quit()
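
The first line of the console output in the question, "Opening in existing browser session.", suggests another possible cause that the code above does not address: Chrome may already have been running with the same user data directory, in which case the newly launched process hands off to the existing one and exits. This is a reading of the log, not something the answer confirms. A minimal sketch of a pre-launch check, assuming Windows and its `tasklist` command; `chrome_in_tasklist` and `chrome_is_running` are hypothetical helper names:

```python
import subprocess
import sys


def chrome_in_tasklist(tasklist_output: str) -> bool:
    """True if any line of `tasklist` output names a chrome.exe process."""
    return any(
        line.lstrip().lower().startswith("chrome.exe")
        for line in tasklist_output.splitlines()
    )


def chrome_is_running() -> bool:
    """Windows only: ask tasklist whether any chrome.exe process exists."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq chrome.exe"],
        capture_output=True,
        text=True,
        check=False,
    )
    return chrome_in_tasklist(result.stdout)


if __name__ == "__main__":
    if sys.platform != "win32":
        print("This check is written for Windows (tasklist).")
    elif chrome_is_running():
        print("Chrome is already running; close every Chrome window first.")
    else:
        print("No chrome.exe process found; the profile should be free to use.")
```

Running this check before `webdriver.Chrome(...)` and closing all Chrome windows when it reports a running process is one way to test whether the profile lock is what makes the session fail.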