Web scraping to a pandas DataFrame


I am trying to scrape web reviews into a DataFrame. My problem is that it scrapes the same review 10 times instead of 10 different reviews.

'''
import requests
from bs4 import BeautifulSoup
import pandas as pd

url ='https://www.marriott.com/hotels/hotel-reviews/amsnt-amsterdam-marriott-hotel'

for page in range(10):
    page = requests.get("https://www.marriott.com/hotels/hotel-reviews/amsnt-amsterdam-marriott-hotel")

soup = BeautifulSoup(page.content, 'html.parser')

general_data = soup.find_all(class_='bvseo-review')
i = 1
first = general_data[i]
i+=1

for item in general_data:
    span = first.find_all('span')
    description = first.find_all('span', attrs={'itemprop':'description'})
    rating = first.find_all('span', attrs={'itemprop':'ratingValue'})
    auteur = first.find_all('span', attrs={'itemprop':'author'})

pagereviews = pd.DataFrame({
    "description":description,
    "ratingValue":rating,
    "author":auteur
})

pagereviews

'''

The expected result is a DataFrame containing 10 unique reviews.

python-3.x
1 Answer
0 votes

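I would replace the for loop with the following. The original loop reads from first on every iteration, so it returns the same review each time and overwrites the previous result. Reading from item and appending to lists keeps one entry per review: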

span = []
description = []
rating = []
auteur = []
for item in general_data:
    span.append(item.find_all('span'))
    description.append(item.find_all('span', attrs={'itemprop':'description'}))
    rating.append(item.find_all('span', attrs={'itemprop':'ratingValue'}))
    auteur.append(item.find_all('span', attrs={'itemprop':'author'}))
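
The lists above hold lists of tag objects, not plain text, so the resulting DataFrame cells would contain markup. Below is a minimal sketch of one way to go from the parsed reviews to a DataFrame of strings. It assumes each review has at most one span per itemprop. SAMPLE_HTML is made-up markup standing in for the downloaded page, and field_text and reviews_to_frame are hypothetical helper names, not part of the question or of any library.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup standing in for page.content from the question.
SAMPLE_HTML = """
<div class="bvseo-review">
  <span itemprop="description">Great location.</span>
  <span itemprop="ratingValue">5</span>
  <span itemprop="author">Anna</span>
</div>
<div class="bvseo-review">
  <span itemprop="description">Room was small.</span>
  <span itemprop="ratingValue">3</span>
  <span itemprop="author">Ben</span>
</div>
"""


def field_text(review, itemprop):
    """Return the stripped text of the first matching span, or None if absent."""
    tag = review.find('span', attrs={'itemprop': itemprop})
    return tag.get_text(strip=True) if tag is not None else None


def reviews_to_frame(html):
    """Build one DataFrame row per element with class 'bvseo-review'."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.find_all(class_='bvseo-review'):
        # Read from item (the current review), not from a fixed element.
        rows.append({
            'description': field_text(item, 'description'),
            'ratingValue': field_text(item, 'ratingValue'),
            'author': field_text(item, 'author'),
        })
    return pd.DataFrame(rows, columns=['description', 'ratingValue', 'author'])


pagereviews = reviews_to_frame(SAMPLE_HTML)
print(pagereviews)
```

With the real page, reviews_to_frame(page.content) would take the place of the sample call. Note that the question's outer loop requests the same URL ten times, so it cannot yield reviews beyond those present on that single page.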
