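When I scrape a page of products with the headless option enabled, I get inconsistent results. For the same query, one run returns the results unsorted and another returns them correctly sorted.
Selenium Firefox setup: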
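from selenium import webdriver
from selenium.webdriver.firefox.options import Options
firefox_options = Options()
firefox_options.headless = True
# firefox_driver is the path to the geckodriver executable
browser = webdriver.Firefox(options=firefox_options, executable_path=firefox_driver)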
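According to this post: "Firefox does not send different headers when using the headless option."
How can I get consistent results when scraping with the headless option? Alternatively, how can I keep the browser window from popping up when headless is set to False?
Thanks for any suggestions.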
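Ideally, running with or without firefox_options.headless = True should not have any major effect on the elements rendered in the DOM tree, but there can be a significant difference as far as the viewport is concerned.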
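For example, when GeckoDriver / Firefox is initialized with the --headless option, the default viewport is width = 1366px, height = 768px, whereas when GeckoDriver / Firefox is initialized without the --headless option, the default viewport is width = 1382px, height = 744px.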
Sample code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.FirefoxOptions()
options.headless = True
driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
print ("Headless Firefox Initialized")
size = driver.get_window_size()
print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
driver.quit()
driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
print ("Firefox Initialized")
size = driver.get_window_size()
print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
driver.quit()
Console output:
Headless Firefox Initialized
Window size: width = 1366px, height = 768px
Firefox Initialized
Window size: width = 1382px, height = 744px
From these observations it can be inferred that with the --headless option, GeckoDriver / Firefox opens the browsing context with a reduced viewport, so the number of elements identified can be smaller.
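Note that get_window_size() reports the outer window, which in a regular (non-headless) session also includes the browser's own toolbars, so it is not quite the same thing as the viewport the page lays itself out in. A minimal sketch for reading both values side by side follows; the names VIEWPORT_SCRIPT, describe_sizes and report are hypothetical helpers introduced here for illustration, not part of Selenium.

```python
# JavaScript that returns the inner viewport size of the current page.
VIEWPORT_SCRIPT = "return [window.innerWidth, window.innerHeight];"


def describe_sizes(window_size, viewport):
    """Format the outer window size and the inner viewport on one line."""
    width, height = viewport
    return "window = {}x{}px, viewport = {}x{}px".format(
        window_size["width"], window_size["height"], width, height
    )


def report(driver):
    """Query an already-started Selenium driver for both sizes."""
    # execute_script converts the returned JavaScript array into a Python list.
    viewport = driver.execute_script(VIEWPORT_SCRIPT)
    return describe_sizes(driver.get_window_size(), viewport)
```

Calling print(report(driver)) right after driver.get(...) in both the headless and the regular session should show whether the two runs really lay the page out at different sizes.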
When launching a browsing context with GeckoDriver / Firefox, always open it in maximized mode:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.FirefoxOptions()
options.headless = True
options.add_argument("start-maximized")
driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
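As far as I know, start-maximized is a Chromium switch, and a headless session has no screen to maximize against, so it may be more reliable to request an explicit size. The sketch below is one possible way to do that, not a definitive fix. It assumes that geckodriver can be found on PATH (otherwise pass the driver path as in the code above), that Firefox accepts the --width / --height command-line arguments, and the 1920x1080 default is an arbitrary choice. firefox_size_args and make_headless_firefox are hypothetical helper names.

```python
def firefox_size_args(width, height):
    """Build Firefox command-line arguments that request a fixed window size."""
    return ["--width={}".format(width), "--height={}".format(height)]


def make_headless_firefox(width=1920, height=1080):
    """Start headless Firefox with an explicit, repeatable window size."""
    # Imported lazily so firefox_size_args stays usable without Selenium.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("--headless")
    for arg in firefox_size_args(width, height):
        options.add_argument(arg)

    # Assumption: geckodriver is discoverable on PATH.
    driver = webdriver.Firefox(options=options)
    # Set the size again through WebDriver in case the arguments were ignored.
    driver.set_window_size(width, height)
    return driver
```

Using the same width and height for both the headless and the regular run should make the page layout, and therefore the scraped elements, comparable between the two. Remember to call driver.quit() when finished.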