Headless Chrome and the html.parser string

Question

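I'm currently using Selenium and BeautifulSoup to scrape a website, but I've run into two main problems. First, I can't get Chrome to start in headless mode, and it reports several "unexpected end of input" errors. Second, I keep getting an error on the line containing "html.parser" saying that 'str' object is not callable. Any advice on these problems would be greatly appreciated, thanks.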

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import urllib.request
import lxml
import html5lib
import time
from bs4 import BeautifulSoup

#config options
options = Options()
options.headless = True

# Set the URL you want to webscrape from
url = 'https://tokcount.com/?user=mrsam993'

# Connect to the URL
browser = webdriver.Chrome(options=options, executable_path='D:\chromedriver') #chrome_options=options
browser.get(url)

# Parse HTML and save to BeautifulSoup object
soup = BeautifulSoup(browser.page_source(), "html.parser")
browser.quit()

# for i in range(10):
links = soup.findAll('span', class_= 'odometer-value')
print(links)

Answer 1

For headless mode, you need to set up the options like this:

from selenium import webdriver

options = webdriver.ChromeOptions()
...

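page_source is not a method. It is a property that already holds the page's HTML as a str, so adding () tries to call that string, which is exactly what raises TypeError: 'str' object is not callable. Remove the parentheses: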

soup = BeautifulSoup(browser.page_source, "html.parser")
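To see why the parentheses cause the error, here is a small stdlib-only reproduction. FakeBrowser is a hypothetical stand-in for a WebDriver object; it only mimics the fact that page_source is a property returning a str.

```python
class FakeBrowser:
    """Hypothetical stand-in for a WebDriver: page_source is a property."""

    @property
    def page_source(self) -> str:
        return "<html><body><span class='odometer-value'>7</span></body></html>"


browser = FakeBrowser()

# Accessing the property returns the HTML string directly
html = browser.page_source
print(type(html).__name__)  # str

# Adding () tries to call that string, which raises the TypeError
try:
    browser.page_source()
except TypeError as exc:
    print(exc)  # 'str' object is not callable
```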

Answer 2

To start Chrome in headless mode and parse the content as HTML with BeautifulSoup4, you can do the following:

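# Import the necessary packages
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://tokcount.com/?user=mrsam993'

options = webdriver.ChromeOptions()
# options.headless = True is deprecated in newer Selenium 4 releases;
# pass the headless flag as a Chrome argument instead
options.add_argument('--headless=new')

# The with-block calls driver.quit() automatically when it exits
with webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options) as driver:
    driver.get(url)

    print("Page URL: ", driver.current_url)
    print("Page title: ", driver.title)

    # Get the page source (a property, so no parentheses)
    html = driver.page_source

# Parse the HTML with BeautifulSoup
parsed_content = BeautifulSoup(html, 'html.parser')
print(parsed_content.find_all('span', class_='odometer-value'))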

Make sure you have the following packages installed: selenium, webdriver-manager and beautifulsoup4.

pip install selenium
pip install webdriver_manager
pip install beautifulsoup4
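The "html.parser" argument tells BeautifulSoup to use Python's built-in html.parser module as its parsing backend (alternatives such as "lxml" or "html5lib" need extra packages). As a rough, stdlib-only illustration of the kind of work that backend does, and not of how BeautifulSoup itself is implemented, the sketch below uses html.parser.HTMLParser directly to collect the text of span elements with the odometer-value class. The OdometerParser class and the sample markup are hypothetical; the real tokcount.com page may be structured differently.

```python
from html.parser import HTMLParser


class OdometerParser(HTMLParser):
    """Collect the text of every <span> whose class list includes 'odometer-value'."""

    def __init__(self) -> None:
        super().__init__()
        self.values: list[str] = []
        self._in_value = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "odometer-value" in classes:
            self._in_value = True
            self.values.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a matching span
        if self._in_value:
            self.values[-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_value = False


# Hypothetical markup standing in for driver.page_source
sample = (
    '<div class="odometer">'
    '<span class="odometer-value">1</span>'
    '<span class="odometer-value">2</span>'
    '<span class="odometer-value">3</span>'
    '</div>'
)

parser = OdometerParser()
parser.feed(sample)
print(parser.values)  # ['1', '2', '3']
```

With BeautifulSoup the same lookup is the one-liner soup.find_all('span', class_='odometer-value') from the question; either way, the parser must be given the page source as a str.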
