Multi-page web scraping with Python and Beautiful Soup

Question Votes: 0 Answers: 1

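I'm trying to write code that scrapes some data from a page listing hotels. The final information (hotel name and address) should be exported to CSV. The code works, but only for a single page...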

import requests
import pandas as pd
from bs4 import BeautifulSoup # HTML data structure

page_url = requests.get('https://e-turysta.pl/noclegi-krakow/')
soup = BeautifulSoup(page_url.content, 'html.parser')

listing = soup.find(id='nav-lista-obiektow')  # renamed from `list` to avoid shadowing the built-in
items = listing.find_all(class_='et-list__details flex-grow-1 d-flex d-md-block flex-column')

nazwa_noclegu = [item.find(class_='h3 et-list__details__name').get_text() for item in items]
adres_noclegu = [item.find(class_='et-list__city').get_text() for item in items]

dane = pd.DataFrame(
    {
        'nazwa' : nazwa_noclegu,
        'adres' : adres_noclegu
    }
)

print(dane)

dane.to_csv('noclegi.csv')

I tried a loop, but it doesn't work:

for i in range(22):
    url = requests.get('https://e-turysta.pl/noclegi-krakow/'.format(i+1)).text
    soup = BeautifulSoup(url, 'html.parser')

Any ideas?

python pandas loops beautifulsoup python-requests
1 Answer
0
votes

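In your loop you call .format(), but the string being formatted has no placeholder. You need to put a pair of curly braces ({}) in the string where the page number should go. Without them, .format() returns the string unchanged, so every iteration requests the same first page.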
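
A quick way to see the difference, without making any requests, is to format the URL both ways and compare the results. The variable names here are only for illustration:

```python
base = 'https://e-turysta.pl/noclegi-krakow/'

# No placeholder: .format() has nothing to substitute, so the argument is ignored.
without_placeholder = base.format(2)

# With a {} placeholder, the argument is inserted at that position.
with_placeholder = (base + '{}').format(2)

print(without_placeholder)  # https://e-turysta.pl/noclegi-krakow/
print(with_placeholder)     # https://e-turysta.pl/noclegi-krakow/2
```

With the placeholder in place, the loop from the question becomes: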

for i in range(22):
    url = requests.get('https://e-turysta.pl/noclegi-krakow/{}'.format(i+1)).text
    soup = BeautifulSoup(url, 'html.parser')
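
Note that the loop above still overwrites soup on every iteration, so the names and addresses have to be collected inside the loop before anything is written to CSV. Below is a minimal sketch of one way to do that. It reuses the selectors from the question and assumes the page number really is appended to the URL as shown above (check this against the site's actual pagination links). The names URL_TEMPLATE, parse_page and scrape_pages, and the 10-second timeout, are my own choices and not part of the original code.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumption: pages are addressed by appending the page number to the base URL.
URL_TEMPLATE = 'https://e-turysta.pl/noclegi-krakow/{}'


def parse_page(html):
    """Return a list of (name, address) tuples found in one listing page."""
    soup = BeautifulSoup(html, 'html.parser')
    listing = soup.find(id='nav-lista-obiektow')
    if listing is None:  # no listing container on this page
        return []
    items = listing.find_all(
        class_='et-list__details flex-grow-1 d-flex d-md-block flex-column')
    return [
        (item.find(class_='h3 et-list__details__name').get_text(strip=True),
         item.find(class_='et-list__city').get_text(strip=True))
        for item in items
    ]


def scrape_pages(last_page, first_page=1):
    """Fetch pages first_page..last_page and combine the results in one DataFrame."""
    rows = []
    for page in range(first_page, last_page + 1):
        response = requests.get(URL_TEMPLATE.format(page), timeout=10)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        rows.extend(parse_page(response.text))  # accumulate instead of overwriting
    return pd.DataFrame(rows, columns=['nazwa', 'adres'])
```

Calling scrape_pages(22).to_csv('noclegi.csv', index=False) would then write all 22 pages to a single file, provided the URL pattern and the class names match what the site actually serves.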