Scraping with a Selenium headless browser in an environment without a GUI

Question

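I am currently testing my project in an environment without a GUI. It is written in Python and uses the selenium package with a headless browser to scrape data from Facebook Marketplace (project link: https://github.com/lokman-sassi/FMP-Scraper-with-Selenium). For this I am using Ubuntu 22.04 as a subsystem in Windows (terminal only). When I read the Selenium documentation, it said that I do not need a browser installed on my machine at all and that Selenium only uses the browser's driver. However, when I ran my file in Ubuntu, it returned an error saying that Chrome is not installed! This contradicts the documentation. How can I fix this? I want to scrape without installing a browser on my machine.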

Answer 1

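Selenium can indeed run in a headless environment (that is, one without a graphical user interface), but it still needs a browser binary (for example Chrome or Firefox) installed on your system. Headless mode only lets Selenium run the browser in the background without opening a window.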

+-------------------+   +-------------------+   +------------------------+
| Your Python Code  |-->| Selenium Package  |-->| Headless Browser       |
+-------------------+   +-------------------+   | (e.g., Chrome/Firefox) |
                                      ^         +------------------------+
                                      |                  
                                Requires Browser
                                 Binary to Run

Install a browser binary on your system, even if you will only run it in headless mode.

sudo apt update
sudo apt install wget
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
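Once the package is installed, it can help to confirm that the browser binary is actually visible from Python before involving Selenium. The following is a minimal sketch using only the standard library; the list of executable names is an assumption (different distributions and packages name the binary differently), and `find_browser_binary` and `browser_version` are hypothetical helper names, not part of Selenium.

```python
import shutil
import subprocess

# Assumed executable names to probe; adjust for your distribution or package.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_browser_binary(names=CANDIDATE_NAMES):
    """Return the full path of the first browser executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def browser_version(path):
    """Run `<path> --version` and return its output, or None if the call fails."""
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip()


binary = find_browser_binary()
if binary is None:
    print("No Chrome/Chromium binary found on PATH; install one first.")
else:
    print(f"Found {binary}: {browser_version(binary)}")
```

If this reports that no binary was found, Selenium is likely to fail with the same "Chrome is not installed" style of error described in the question.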

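After installing Chrome, make sure your Selenium script is correctly configured to use Chrome in headless mode. (Note: I am assuming Selenium 4+, which uses a Service object; passing executable_path directly to the driver is deprecated.)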

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options

options = Options()
# options.headless was deprecated in Selenium 4.8 and later removed;
# pass the Chrome flag directly instead.
options.add_argument("--headless=new")

# Create a Service object, pointing to the path of chromedriver
service = ChromeService(executable_path='/path/to/chromedriver')

# Initialize the Chrome driver with the service and options
driver = webdriver.Chrome(service=service, options=options)

try:
    driver.get("https://www.facebook.com/marketplace")
    # Your scraping logic here
finally:
    driver.quit()  # Always shut the browser down, even on errors
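For a terminal-only setup such as WSL, a variant that may be more convenient is sketched below. It assumes Selenium 4.6 or newer, where the bundled Selenium Manager can locate or download a matching chromedriver, so no explicit driver path is passed. The extra Chrome flags are common choices for display-less machines rather than requirements, and `headless_chrome_arguments` and `scrape_title` are hypothetical names introduced for illustration.

```python
def headless_chrome_arguments():
    """Chrome flags commonly used on a machine without a display (assumed set)."""
    return [
        "--headless=new",           # headless mode (Chrome 109+)
        "--no-sandbox",             # often needed when running as root or in containers
        "--disable-dev-shm-usage",  # avoid crashes when /dev/shm is small
        "--window-size=1920,1080",  # give pages a realistic viewport
    ]


def scrape_title(url):
    """Open `url` in headless Chrome and return the page title."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)

    # No Service/executable_path: Selenium Manager resolves chromedriver itself.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always release the browser process
```

Calling `scrape_title("https://www.facebook.com/marketplace")` would then launch Chrome, load the page, and return its title. If your Selenium version predates Selenium Manager, pass a Service object with the chromedriver path instead.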
