Google Chrome with Selenium


How do I use Selenium with Google Chrome to scrape websites?

What about virtualenv? Is it required? Why should (or shouldn't) I use virtualenv?

#Install Google Chrome

wget -c https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb
apt-get -f install

#Install Selenium

apt-get install python-dev python-pip
pip install selenium
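On the virtualenv question: it is not required, but an isolated environment keeps Selenium and its dependencies separate from the system Python that apt manages. The sketch below uses the Python 3 standard-library venv module as a stand-in for the third-party virtualenv tool; the helper names create_env and install_selenium are hypothetical.

```python
import os
import subprocess
import venv


def create_env(env_dir, with_pip=True):
    """Create an isolated environment and return the path to its interpreter."""
    # symlinks=True on POSIX mirrors what `python -m venv` does by default.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    # Windows puts the interpreter under Scripts\, other systems under bin/.
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def install_selenium(python_exe):
    """Install Selenium into the environment that owns python_exe (needs pip in it)."""
    subprocess.check_call([python_exe, "-m", "pip", "install", "selenium"])
```

Usage: py = create_env("scrape-env"), then install_selenium(py), then run the scraper with scrape-env/bin/python selenium_scrape.py. Nothing outside the environment is touched, so apt-managed packages cannot conflict with it.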

#selenium_scrape.py

A simple script to check whether it works:

import time
from selenium import webdriver
 
driver = webdriver.Chrome()
time.sleep(5)
driver.quit()

#Command

python selenium_scrape.py

#Error

Traceback (most recent call last):
  File "selenium_scrape.py", line 4, in <module>
    driver = webdriver.Chrome('/lib/modules/3.16.0-4-amd64/kernel/drivers/platform/chrome')
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 61, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 74, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chrome' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.chrome.service.Service object at 0x7f88e9347190>> ignored
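The traceback shows the script passing /lib/modules/3.16.0-4-amd64/kernel/drivers/platform/chrome, a kernel-module directory, to webdriver.Chrome(). Selenium expects the path of the chromedriver executable there; on Linux, trying to execute a directory fails with a "permission denied" error, which Selenium reports as the somewhat misleading "may have wrong permissions" message. A small pre-flight check, sketched here with a hypothetical helper name, makes the real problem obvious before Selenium starts:

```python
import os


def driver_path_problem(path):
    """Describe what is wrong with a chromedriver path, or return None if it looks usable."""
    if os.path.isdir(path):
        return "%s is a directory, not the chromedriver binary" % path
    if not os.path.isfile(path):
        return "%s does not exist" % path
    # os.access checks the execute bit for the current user.
    if not os.access(path, os.X_OK):
        return "%s is not executable (try: chmod a+x %s)" % (path, path)
    return None
```

Usage: call driver_path_problem(path) and stop with its message if it returns a string; otherwise pass the same path to webdriver.Chrome().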

#Full script

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
 
def init_driver():
    driver = webdriver.Chrome()
    driver.wait = WebDriverWait(driver, 5)
    return driver
 
def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnK")))
        box.send_keys(query)
        button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")
 
if __name__ == "__main__":
    driver = init_driver()
    lookup(driver, "Selenium")
    time.sleep(5)
    driver.quit()
3 answers
4 votes

The catch is that the installed Chrome browser alone is not enough; you also need a separate driver program: chromedriver.
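Why a separate driver: Selenium does not control Chrome directly. chromedriver is a small local server (listening on port 9515 by default), and current versions speak the W3C WebDriver protocol: Selenium sends it JSON over HTTP, and chromedriver turns those requests into browser actions. As an illustration only, the Python 3 sketch below builds, but does not send, the "New Session" request that starts a browser; the payload shape follows the WebDriver specification, while the function name is hypothetical and real Selenium sends additional capabilities.

```python
import json
import urllib.request


def new_session_request(port=9515, browser="chrome"):
    """Build the HTTP request a WebDriver client sends to start a browser session."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser}}}
    return urllib.request.Request(
        "http://localhost:%d/session" % port,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With chromedriver running, urllib.request.urlopen(new_session_request()) would open Chrome and return JSON whose "value" contains the new sessionId. Without chromedriver there is nothing listening on that port, which is why webdriver.Chrome() cannot work with the browser alone.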

Download the current release here, choosing one that supports your installed Chrome version: Chromedriver

Now you have two options: either move the downloaded chromedriver somewhere it is always found (option 1), or tell your script where it is (option 2).

Option 1: Move it onto your PATH

Move it into a directory on your PATH so that a plain webdriver.Chrome() call can find it:

sudo mv /path/to/download/chromedriver /usr/bin

Also make it executable:

sudo chmod a+x /usr/bin/chromedriver
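In Selenium 3, webdriver.Chrome() with no arguments launches chromedriver by name, so the operating system has to find it on PATH. To confirm the move and the chmod worked, the standard library's shutil.which performs the same kind of PATH search; find_driver below is a hypothetical convenience wrapper.

```python
import shutil


def find_driver(name="chromedriver", search_path=None):
    """Return the full path of an executable named `name` on PATH (or search_path), else None."""
    # shutil.which skips files that are not executable, so a forgotten
    # `chmod a+x` also shows up as None here.
    return shutil.which(name, path=search_path)
```

After option 1, find_driver() should return /usr/bin/chromedriver; None means webdriver.Chrome() will not find it either.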

Option 2: Leave it where it is

Alternatively, pass the driver's path to Selenium explicitly:

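from selenium import webdriver
chromedriver = "/Users/you/Downloads/chromedriver"  # path to the downloaded binary
# Selenium 3 takes the driver path as the first argument; newer Selenium 4 releases expect
# webdriver.Chrome(service=selenium.webdriver.chrome.service.Service(chromedriver)) instead.
driver = webdriver.Chrome(chromedriver)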

2 votes

(Note: the original question was about Chrome, so my answer is about Chrome, not Firefox.)

For me, it works if I simply extract chromedriver into the same folder as the script.

Then I run it like this:

Xvfb :99 -ac -screen 0 1280x1024x16 &
echo 'Starting the test'
PATH=$PATH:. python selenium_scrape.py

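This starts Xvfb (a virtual X display on :99) and adds the current directory, which contains chromedriver, to the PATH for that one command.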
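If you would rather launch the scraper from Python than from the shell line above, the same effect can be sketched with subprocess by passing an adjusted copy of the environment. The helper names are hypothetical, and DISPLAY=":99" assumes the Xvfb server started above is running.

```python
import os
import subprocess
import sys


def scraper_env(driver_dir, display=":99", base=None):
    """Copy the environment, append driver_dir to PATH, and point DISPLAY at Xvfb."""
    env = dict(os.environ if base is None else base)
    # Skip an empty PATH so we never produce a leading separator.
    env["PATH"] = os.pathsep.join(p for p in (env.get("PATH"), driver_dir) if p)
    env["DISPLAY"] = display
    return env


def run_scraper(script="selenium_scrape.py", driver_dir="."):
    """Run the scraper with the adjusted environment and return its exit code."""
    return subprocess.call([sys.executable, script], env=scraper_env(driver_dir))
```

Like PATH=$PATH:. in the shell, this changes the environment only for the child process; the parent's os.environ stays as it was.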

And here is the modified version that works for me:

import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# comment this out to run on the real display
os.environ['DISPLAY'] = ':99'

def init_driver():
    driver = webdriver.Chrome()
    driver.wait = WebDriverWait(driver, 5)
    return driver

def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        # once we type the query, this button disappears
        # button = driver.wait.until(EC.element_to_be_clickable(
        #     (By.NAME, "btnK")))
        box.send_keys(query)
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnG")))
        button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")

if __name__ == "__main__":
    driver = init_driver()
    lookup(driver, "Selenium")
    time.sleep(5)
    driver.quit()

-1 votes

The problem (as currently posted) is an indentation error, which is easy to fix:

def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnK")))
        box.send_keys(query)
        button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")