Iterating over 24 pages from A to Z with a parser script: fetching the hospital lists from the USA down to Cyprus via requests and XPath...

I am currently building a scraper for "Doctors and medical facilities: worldwide list",

which lists English-speaking doctors, medical facilities and practitioners around the world, to help British nationals abroad get health care.

Note: there is an A-to-Z list:

https://www.gov.uk/government/collections/doctors-and-medical-facilities-worldwide-list#b

See, for example, the list of hospitals for Cyprus and the list of medical facilities for Israel:

https://www.gov.uk/government/publications/cyprus-list-of-hospitals
https://www.gov.uk/government/publications/israel-list-of-medical-facilities

My approach is to first get an overview of the pages.

As a first step, I pick one page out of the many (see:

https://www.gov.uk/government/collections/doctors-and-medical-facilities-worldwide-list#b

I added a function to handle the case where the table is not found, plus some error handling to make the script more robust:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Function to scrape data from a given URL
def scrape_medical_facilities(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the table
    table = soup.find('table')
    # Check if the table exists
    if table:
        # Initialize lists to store data
        names = []
        addresses = []
        # Iterate through rows in the table
        for row in table.find_all('tr')[1:]:  # Skip header row
            columns = row.find_all('td')
            name = columns[0].get_text(strip=True)
            address = columns[1].get_text(strip=True)
            names.append(name)
            addresses.append(address)
        # Create a DataFrame
        df = pd.DataFrame({'Name': names, 'Address': addresses})
    else:
        # If the table is not found, create an empty DataFrame
        df = pd.DataFrame(columns=['Name', 'Address'])
    return df

# URLs for medical facilities in Israel
israel_medical_facilities_urls = [
    'https://www.gov.uk/government/publications/israel-list-of-medical-facilities',
    # Add more URLs if there are multiple pages
]

# Scrape data from each URL and concatenate into a single DataFrame
df_israel = pd.concat(
    [scrape_medical_facilities(url) for url in israel_medical_facilities_urls],
    ignore_index=True)

# Save the DataFrame to a CSV file
df_israel.to_csv('israel_medical_facilities.csv', index=False)
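One way to check the table-parsing logic and the empty-table fallback without hitting the network is to run it on an inline HTML sample. The sketch below factors the parsing out of the HTTP request; the function name and the sample markup are mine for illustration, not taken from gov.uk:

```python
from bs4 import BeautifulSoup
import pandas as pd

def parse_facilities(html):
    """Parse a Name/Address table from raw HTML; return an empty frame if no table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    if not table:
        # No table on this page: return an empty DataFrame with the same columns
        return pd.DataFrame(columns=['Name', 'Address'])
    rows = []
    for row in table.find_all('tr')[1:]:   # skip the header row
        cols = row.find_all('td')
        if len(cols) >= 2:                 # guard against short/malformed rows
            rows.append({'Name': cols[0].get_text(strip=True),
                         'Address': cols[1].get_text(strip=True)})
    return pd.DataFrame(rows, columns=['Name', 'Address'])

# Illustrative sample mimicking a two-column facilities table
sample = """
<table>
  <tr><th>Name</th><th>Address</th></tr>
  <tr><td>Example Clinic</td><td>1 Example St</td></tr>
</table>
"""
df = parse_facilities(sample)
print(df)
print(parse_facilities('<p>no table here</p>').empty)
```

Splitting fetching from parsing this way also makes the real scraper easier to harden later (timeouts, `response.raise_for_status()`), since the parsing half stays testable offline.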
python apache web-scraping google-colaboratory
1 Answer
Your question is not clear, but I think I understand what you want: you want to get all the links from A to Z. I have this snippet that does the job:

s = BeautifulSoup(requests.get('https://www.gov.uk/government/collections/doctors-and-medical-facilities-worldwide-list#b').text, 'html.parser')
links = [div.find("a") for div in s.find_all('div', {'class': 'gem-c-document-list__item-title'})]
Basically, it finds all the a tags inside the div tags that have the class gem-c-document-list__item-title.

If this doesn't answer your question, please try rephrasing it and post the correct links.
