Python, BS and Selenium


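I'm trying to scrape a JavaScript-rendered page with BeautifulSoup and Python. I've read a lot about this topic. For example, here I try to scrape prices that are rendered with JavaScript on a well-known website: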

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.nespresso.com/fr/fr/order/capsules/original/"

browser = webdriver.PhantomJS(executable_path = "C:/phantomjs-2.1.1-windows/bin/phantomjs.exe")
browser.get(url)
html = browser.page_source

soup = BeautifulSoup(html, 'lxml')

soup.find("span", {'class':'ProductListElement__price'}).text

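But the result is only '\xa0', which is the value in the raw page source, not the value rendered by JavaScript. I really don't know what I'm doing wrong...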

Best regards

python selenium web-scraping beautifulsoup
1 Answer

Here are two ways to get the prices:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://www.nespresso.com/fr/fr/order/capsules/original/"

# Use a real browser (Chrome) so the page's JavaScript is executed
browser = webdriver.Chrome()
browser.get(url)
html = browser.page_source

# Getting the prices using bs4
soup = BeautifulSoup(html, 'lxml')
prices = soup.select('.ProductListElement__price')
print([p.text for p in prices])

# Getting the prices using selenium
# (find_elements_by_class_name was removed in Selenium 4; use find_elements with By)
prices = browser.find_elements(By.CLASS_NAME, "ProductListElement__price")
print([p.text for p in prices])

# Close the browser when done
browser.quit()
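
If the prices still come back as '\xa0', one possible cause is that the page source was read before the JavaScript had filled in the price elements. Here is a minimal sketch, assuming that is the cause, which waits until at least one price element contains visible text before collecting the values. The helper names `is_rendered` and `fetch_prices` are made up for this example, the 10-second timeout is arbitrary, and the class name is simply the one from the question (the site's markup may have changed since).

```python
def is_rendered(text):
    """True if text has visible characters, not just whitespace such as '\xa0'."""
    # str.strip() also removes the non-breaking space '\xa0'
    return bool(text.strip())


def fetch_prices(url, class_name="ProductListElement__price", timeout=10):
    """Open url in Chrome, wait for rendered prices, return their texts.

    Hypothetical helper: raises selenium's TimeoutException if no price
    element gets visible text within `timeout` seconds.
    """
    # Imported here so is_rendered() can be used without selenium installed
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Poll until some price element holds more than a placeholder
        wait = WebDriverWait(
            browser, timeout,
            ignored_exceptions=[StaleElementReferenceException],
        )
        wait.until(
            lambda d: any(
                is_rendered(e.text)
                for e in d.find_elements(By.CLASS_NAME, class_name)
            )
        )
        elements = browser.find_elements(By.CLASS_NAME, class_name)
        return [e.text for e in elements if is_rendered(e.text)]
    finally:
        # Always close the browser, even if the wait times out
        browser.quit()
```

Calling `fetch_prices(url)` with the URL from the question should then return only the non-empty price strings, provided the class name still matches the page.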