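I'm trying to scrape web reviews into a DataFrame; apologies if this has been asked before. My problem is that it scrapes the same review 10 times instead of 10 different reviews.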
'''
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.marriott.com/hotels/hotel-reviews/amsnt-amsterdam-marriott-hotel'

# Fetch and parse the review page
for page in range(10):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

# Each review sits in an element with class 'bvseo-review'
general_data = soup.find_all(class_='bvseo-review')
i = 1
first = general_data[i]
i += 1

# Pull the fields out of the review
for item in general_data:
    span = first.find_all('span')
    description = first.find_all('span', attrs={'itemprop': 'description'})
    rating = first.find_all('span', attrs={'itemprop': 'ratingValue'})
    auteur = first.find_all('span', attrs={'itemprop': 'author'})

pagereviews = pd.DataFrame({
    "description": description,
    "ratingValue": rating,
    "author": auteur
})
pagereviews
'''
The expected result is a DataFrame containing 10 unique reviews.
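The repetition seems to come from the loop body reading `first` (fixed before the loop starts) rather than the loop variable `item`. A minimal sketch of that pattern, using a hypothetical list of strings in place of the parsed review elements:

```python
# Hypothetical stand-ins for the elements returned by soup.find_all(...)
reviews = ["review A", "review B", "review C"]

# As in the question: one element is picked before the loop starts
first = reviews[1]

# The loop variable `item` is ignored, so the same element is read every time
repeated = [first for item in reviews]

# Reading `item` instead visits each element once
distinct = [item for item in reviews]

print(repeated)
print(distinct)
```

The first list holds the same entry once per iteration, while the second holds each entry once.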
I replaced the for loop with:
'''
span = []
description = []
rating = []
auteur = []
for item in general_data:
    span.append(item.find_all('span'))
    description.append(item.find_all('span', attrs={'itemprop': 'description'}))
    rating.append(item.find_all('span', attrs={'itemprop': 'ratingValue'}))
    auteur.append(item.find_all('span', attrs={'itemprop': 'author'}))
'''
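Since `find_all` returns a list of tags, the lists above end up holding lists of tag objects rather than plain text. One possible refinement is to build one row per review and store only the text of each field. The sketch below is not tested against the live page: `SAMPLE_HTML`, `field_text` and `reviews_to_frame` are hypothetical names, and the sample markup only assumes the same `bvseo-review` class and `itemprop` attributes used in the question.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup that mimics the structure assumed in the question
SAMPLE_HTML = """
<div class="bvseo-review">
  <span itemprop="author">Alice</span>
  <span itemprop="ratingValue">5</span>
  <span itemprop="description">Great stay.</span>
</div>
<div class="bvseo-review">
  <span itemprop="author">Bob</span>
  <span itemprop="ratingValue">3</span>
  <span itemprop="description">Average room.</span>
</div>
"""


def field_text(review, itemprop):
    """Return the stripped text of the first matching span, or None if absent."""
    tag = review.find("span", attrs={"itemprop": itemprop})
    return tag.get_text(strip=True) if tag is not None else None


def reviews_to_frame(html):
    """Parse the HTML and return a DataFrame with one row per review element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all(class_="bvseo-review"):
        # Read from `item`, the current review, on every iteration
        rows.append({
            "description": field_text(item, "description"),
            "ratingValue": field_text(item, "ratingValue"),
            "author": field_text(item, "author"),
        })
    return pd.DataFrame(rows, columns=["description", "ratingValue", "author"])


pagereviews = reviews_to_frame(SAMPLE_HTML)
print(pagereviews)
```

To try it on the real page, `reviews_to_frame(requests.get(url).content)` should work as long as the markup matches these assumptions.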