Python HTML parsing with a partial class name

Question  Votes: 0  Answers: 3

I am trying to parse a web page with bs4, but every element I want to access has a different class name. Example: class='list-item list…id-12984' and class='list-item list…id-10359'

import requests
from bs4 import BeautifulSoup

def preownedaston(url):
    preownedaston_resp = requests.get(url)

    if preownedaston_resp.status_code == 200:
        bs = BeautifulSoup(preownedaston_resp.text, 'lxml')
        posts = bs.find_all('div', class_='') #don't know what to put here
        for p in posts:
            title_year = p.find('div', class_='inset').find('a').find('span', class_='model_year').text
            print(title_year)

preownedaston('https://preowned.astonmartin.com/preowned-cars/search/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&postcode-area=United%20Kingdom&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760')

Is there a way to match on a partial class name, for example class_='list-item '?

python parsing beautifulsoup
3 answers
0
votes
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options


# Run Firefox without opening a window
options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)

driver.get("https://preowned.astonmartin.com/preowned-cars/search/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&postcode-area=United%20Kingdom&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760")

# find_elements_by_css_selector was removed in Selenium 4.3; use find_elements with By
elements = [item.text for item in driver.find_elements(
    By.CSS_SELECTOR, "span.model_year")]
print(elements)

driver.quit()

Output:

['2011', '2011']

0
votes

The CSS selector for matching part of an attribute's value looks like this:

div[class*='list-item'] # the * means match the class with this partial value 
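As a minimal sketch of how that selector can be used from BeautifulSoup itself: the markup below is hypothetical (the class names `id-12984` and `sidebar` are assumptions modelled on the question, not copied from the real page), and it only applies when the elements are actually present in the HTML you download.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question; the real page's classes may differ.
html = """
<div class="list-item id-12984"><div class="inset"><a><span class="model_year">2011</span></a></div></div>
<div class="list-item id-10359"><div class="inset"><a><span class="model_year">2011</span></a></div></div>
<div class="sidebar"><span class="model_year">1999</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. CSS attribute selector: substring match on the whole class attribute
by_selector = soup.select("div[class*='list-item']")

# 2. class_ with a single name matches any tag that has that class among others
by_token = soup.find_all("div", class_="list-item")

# 3. A compiled regex is tested against each individual class name
by_regex = soup.find_all("div", class_=re.compile(r"^id-"))

years = [post.select_one("span.model_year").get_text(strip=True)
         for post in by_selector]
print(years)
```

All three lookups select the same two listing blocks here; the second form is usually enough when the shared part of the class is a complete class name such as `list-item`.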

However, if you look at the page source, you will see that the content you are trying to scrape is generated by JavaScript, so you have three options:

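  1. Use Selenium with a headless browser to render the JavaScript.
  2. Find the Ajax calls and try to replicate them. For example, this Ajax URL is the call the website uses to retrieve the data.
  3. Look for the data you want to scrape inside a script tag, as follows. I prefer this option in cases like this, because you end up parsing JSON: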

import json

import requests
from bs4 import BeautifulSoup
URL = 'https://preowned.astonmartin.com/preowned-cars/search/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&postcode-area=United%20Kingdom&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760'

page = requests.get(URL, headers={"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"})
soup = BeautifulSoup(page.text, 'html.parser')
json_obj = soup.find('script',{'type':"application/ld+json"}).text
#{"@context":"http://schema.org","@graph":[{"@type":"Brand","name":""},{"@type":"OfferCatalog","itemListElement":[{"@type":"Offer","name":"Pre-Owned By Aston Martin","price":"€114,900.00","url":"https://preowned.astonmartin.com/preowned-cars/12984-aston-martin-v12-vantage-v8-volante/","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage V8 Volante","brand":"Aston Martin","model":"V12 Vantage","itemCondition":"Used","category":"Used","productionDate":"2010","releaseDate":"2011","bodyType":"6.0 Litre V12","emissionsCO2":"388","fuelType":"Obsidian Black","mileageFromOdometer":"42000","modelDate":"2011","seatingCapacity":"2","speed":"190","vehicleEngine":"6l","vehicleInteriorColor":"Obsidian Black","color":"Black"}},{"@type":"Offer","name":"Pre-Owned By Aston Martin","price":"€99,900.00","url":"https://preowned.astonmartin.com/preowned-cars/10359-aston-martin-v12-vantage-carbon-edition-coupe/","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage Carbon Edition Coupe","brand":"Aston Martin","model":"V12 Vantage","itemCondition":"Used","category":"Used","productionDate":"2011","releaseDate":"2011","bodyType":"6.0 Litre V12","emissionsCO2":"388","fuelType":"Obsidian Black","mileageFromOdometer":"42000","modelDate":"2011","seatingCapacity":"2","speed":"190","vehicleEngine":"6l","vehicleInteriorColor":"Obsidian Black","color":"Black"}}]},{"@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":"1","item":{"@id":"https://preowned.astonmartin.com/","name":"Homepage"}},{"@type":"ListItem","position":"2","item":{"@id":"https://preowned.astonmartin.com/preowned-cars/","name":"Pre-Owned Cars"}},{"@type":"ListItem","position":"3","item":{"@id":"//preowned.astonmartin.com/preowned-cars/search/","name":"Pre-Owned By Aston Martin"}}]}]}
# The second entry of @graph is the OfferCatalog that holds the listings
items = json.loads(json_obj)['@graph'][1]['itemListElement']
for item in items:
    print(item['itemOffered']['name'])
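Since the question asked for the model year, here is a hedged sketch of the same idea that looks up the `OfferCatalog` node by its `@type` instead of relying on index `[1]`. The HTML is a trimmed copy of the JSON-LD shown in the comment above, embedded inline so the example runs without a network call; the helper name `extract_cars` is hypothetical.

```python
import json
from bs4 import BeautifulSoup

# Trimmed sample of the JSON-LD shown above, embedded so no request is needed
html = """<html><head><script type="application/ld+json">
{"@context":"http://schema.org","@graph":[
 {"@type":"Brand","name":""},
 {"@type":"OfferCatalog","itemListElement":[
  {"@type":"Offer","price":"€114,900.00","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage V8 Volante","modelDate":"2011"}},
  {"@type":"Offer","price":"€99,900.00","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage Carbon Edition Coupe","modelDate":"2011"}}
 ]}
]}
</script></head></html>"""


def extract_cars(page_html):
    """Return (name, model year, price) tuples from the page's JSON-LD block."""
    soup = BeautifulSoup(page_html, "html.parser")
    script = soup.find("script", attrs={"type": "application/ld+json"})
    if script is None or not script.string:
        return []  # no JSON-LD block on this page
    graph = json.loads(script.string).get("@graph", [])
    cars = []
    for node in graph:
        # Skip Brand, BreadcrumbList, and any other node types
        if node.get("@type") != "OfferCatalog":
            continue
        for offer in node.get("itemListElement", []):
            car = offer.get("itemOffered", {})
            cars.append((car.get("name"), car.get("modelDate"), offer.get("price")))
    return cars


cars = extract_cars(html)
for name, year, price in cars:
    print(year, name, price)
```

Filtering on `@type` keeps the code working if the site reorders the entries in `@graph`, and the `.get` calls avoid a `KeyError` when a listing lacks a field.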

0
votes

The information behind the link is actually returned as JSON, which means you can easily extract the fields you need. For example:
