How to extract text and lists with BeautifulSoup


I am just starting with BeautifulSoup; until now I used Scrapy to extract OER data. I have three questions:

1. The counts (number of visits, number of saves, and so on), because the XPath used something like "/dl[2]/dd[2]/".

  • With Scrapy:

    cant_visitas = response.xpath("//i[@class='fa fa-eye']/following-sibling::span[1]/text()").extract_first()

  • With BeautifulSoup I cannot extract the value (does anyone know whether a lookup value containing spaces is possible?):

    cant_guardados = soup.find('li', {'id': 'Number of saves'})  # .find('span').get_text()

2. Lists in BeautifulSoup. In Scrapy, degrees comes from .getall(), which returns a list (JSON type). The equivalent seems to be find_all, but I do not know how to apply it, because with XPath everything is nested under "dl[2]/dd[3]".

  • With Scrapy:

    degrees = response.xpath('normalize-space(//div[@class="material-details"]/dl[2]/dd[3]/text())').getall()

  • With BeautifulSoup:

    degrees = soup.find('div', 'material-details').select('dl:nth-of-type(2) > dd:nth-of-type(3)')

3. How can I extract the average rating, given that it is an attribute of the div?

    amount_stars = soup.find('div', 'item-rating').find('div', 'stars').find_all('i', 'active-star')  # div data-rating-value
    amount_stars = len(amount_stars)

This returns the number of active stars, not the actual rating.

PS: I think the closest thing to XPath in BeautifulSoup is select, but most likely I am wrong.


python beautifulsoup
1 answer
0 votes
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.oercommons.org/courses/randomized-synthesis-project'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    # Attribute selectors accept quoted values that contain spaces.
    number_of_visits = soup.select_one('[title="Number of visits"]').get_text(strip=True)
    number_of_saves = soup.select_one('[title="Number of saves"]').get_text(strip=True)

    # Match the <dt> by its text, then take the <dd> right after it.
    # :-soup-contains needs soupsieve 2.1+; older versions spell it :contains.
    material_type = soup.select_one('dt:-soup-contains("Material Type:") + dd').get_text(strip=True)

    # The rating is stored in an attribute, so index the tag like a dict.
    data_rating = soup.select_one('[data-rating-value]')['data-rating-value']

    print('Number of visits: {}'.format(number_of_visits))
    print('Number of saves : {}'.format(number_of_saves))
    print('Material type : {}'.format(material_type))
    print('Data rating : {}'.format(data_rating))
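The list case from question 2 is not covered by the snippet in this answer, so here is a minimal offline sketch of it. Everything in SAMPLE_HTML is made-up markup that only imitates the nesting described in the question (a second dl with a third dd, and a div carrying data-rating-value); the helper name extract_details is hypothetical too. The real page may be structured differently, so treat the selectors as a starting point rather than a confirmed solution.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from the real page.
SAMPLE_HTML = """
<div class="material-details">
  <dl>
    <dt>Subject:</dt><dd>Statistics</dd>
  </dl>
  <dl>
    <dt>Material Type:</dt><dd>Activity/Lab</dd>
    <dt>Level:</dt><dd>Community College</dd>
    <dt>Grade:</dt><dd><span>Grade 11</span><span>Grade 12</span></dd>
  </dl>
</div>
<div class="item-rating">
  <div class="stars" data-rating-value="4.5">
    <i class="active-star"></i><i class="active-star"></i>
  </div>
</div>
"""


def extract_details(html):
    """Return (all dd texts of the 2nd dl, items of its 3rd dd, rating, star count)."""
    soup = BeautifulSoup(html, "html.parser")
    details = soup.select_one("div.material-details")

    # select() returns a list of tags; turn it into a list of strings,
    # which is roughly what .getall() gives you in Scrapy.
    all_values = [dd.get_text(" ", strip=True)
                  for dd in details.select("dl:nth-of-type(2) > dd")]

    # CSS counterpart of the XPath step dl[2]/dd[3].
    third_dd = details.select_one("dl:nth-of-type(2) > dd:nth-of-type(3)")
    grades = [span.get_text(strip=True) for span in third_dd.find_all("span")]

    # The rating lives in an attribute, so read it from the tag directly.
    rating = float(soup.select_one("[data-rating-value]")["data-rating-value"])

    # Counting the <i> tags only tells you how many stars are drawn as active.
    active_stars = len(soup.select("div.stars i.active-star"))

    return all_values, grades, rating, active_stars


if __name__ == "__main__":
    print(extract_details(SAMPLE_HTML))
```

On this sample the star count and the stored rating differ, which matches the problem in question 3: len(find_all(...)) counts icons, while the attribute holds the actual value.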