1. Counts: number of visits, number of saves. In Scrapy I got these with XPath, using something like "/dl[2]/dd[2]/":
cant_visitas = response.xpath("//i[@class='fa fa-eye']/following-sibling::span[1]/text()").extract_first()
With BeautifulSoup I cannot extract the value (does anyone know how to match a value that contains spaces?):
cant_guardados = soup.find('li', {'id': 'Number of saves'})  # .find('span').get_text()
2. Lists. In Scrapy:
degrees = response.xpath('normalize-space(//div[@class="material-details"]/dl[2]/dd[3]/text())').getall()
With BeautifulSoup:
degrees = soup.find('div', 'material-details').select('dl:nth-of-type(2) > dd:nth-of-type(3)')
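For reference, here is a minimal sketch of how the positional XPath dl[2]/dd[3] might map onto CSS nth-of-type selectors. The HTML below is made up for illustration and is only an assumption about what the page looks like. Note that select() returns a list of Tag objects rather than strings, so the text still has to be pulled out and whitespace-normalized by hand.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example (not copied from the real page)
html = """
<div class="material-details">
  <dl><dt>Subject:</dt><dd>Statistics</dd></dl>
  <dl>
    <dt>Level:</dt><dd>College</dd>
    <dt>Grades:</dt><dd>Grade 11</dd>
    <dt>Type:</dt><dd>  Lesson
        Plan </dd>
  </dl>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
details = soup.find("div", "material-details")

# Second <dl> among its siblings, then the third <dd> inside it
matches = details.select("dl:nth-of-type(2) > dd:nth-of-type(3)")

# Collapse runs of whitespace, roughly what normalize-space() does in XPath
degrees = [" ".join(dd.get_text().split()) for dd in matches]
print(degrees)
```

If only the first match is needed, select_one() with the same selector returns a single Tag (or None) instead of a list.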
3. The rating:
amount_stars = soup.find('div', 'item-rating').find('div', 'stars').find_all('i', 'active-star')  # div data-rating-value
amount_stars = len(amount_stars)
This gives the number of active stars rather than the actual rating.
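A small sketch of the difference between counting stars and reading the rating itself. It assumes the rating is stored in a data-rating-value attribute, as the comment in the snippet above hints. The markup and the value 3.5 are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the attribute placement and values are assumptions
html = """
<div class="item-rating">
  <div class="stars" data-rating-value="3.5">
    <i class="active-star"></i><i class="active-star"></i>
    <i class="active-star"></i><i class="active-star"></i>
    <i class="star"></i>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Counting the highlighted icons only gives a whole number of stars
active_stars = len(soup.select("div.item-rating div.stars i.active-star"))

# Reading the attribute keeps whatever value the page stored
holder = soup.select_one("div.item-rating [data-rating-value]")
rating = float(holder["data-rating-value"]) if holder is not None else None

print(active_stars, rating)
```

Tag attributes are read with dictionary-style indexing, so no text extraction is involved here.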
PS: I think the closest thing to XPath in BeautifulSoup is select, but most likely I am wrong.
[Starting with BeautifulSoup: I use Scrapy to extract OER data, but I have run into 3 problems: 1. Number of visits, number of saves.. because the XPath used something like "/dl[2]/dd[2]/" and...
import requests
from bs4 import BeautifulSoup
url = 'https://www.oercommons.org/courses/randomized-synthesis-project'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
number_of_visits = soup.select_one('[title="Number of visits"]').get_text(strip=True)
number_of_saves = soup.select_one('[title="Number of saves"]').get_text(strip=True)
# :-soup-contains needs soupsieve >= 2.1; older versions spell it :contains (now deprecated)
material_type = soup.select_one('dt:-soup-contains("Material Type:") + dd').get_text(strip=True)
data_rating = soup.select_one('[data-rating-value]')['data-rating-value']
print('Number of visits: {}'.format(number_of_visits))
print('Number of saves : {}'.format(number_of_saves))
print('Material type : {}'.format(material_type))
print('Data rating : {}'.format(data_rating))
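One caveat with the snippet above: select_one() returns None when nothing matches, so chaining .get_text() directly raises AttributeError if the page layout changes. Below is a minimal defensive sketch. The helper name text_or_none and the sample HTML are hypothetical, introduced only for this example.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


# Hypothetical markup: only the "visits" item is present
html = '<ul><li title="Number of visits"><span> 1234 </span></li></ul>'
soup = BeautifulSoup(html, "html.parser")

# Attribute values containing spaces work as long as they are quoted
visits = text_or_none(soup, '[title="Number of visits"]')
saves = text_or_none(soup, '[title="Number of saves"]')

print(visits, saves)
```

Quoting the attribute value inside the selector is what makes values with spaces, such as "Number of saves", match.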