Scraping information from a span that contains a nested span

Question (votes: 0, answers: 2)
python web-scraping beautifulsoup
2 Answers
0
votes

If you want to get today's forecast table, you can use this example:

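import pandas as pd
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; some sites reject the default one sent by requests
headers = {"User-Agent": "Mozilla/5.0"}

url = "https://weather.com/en-IN/weather/today/l/a0e0a5a98f7825e44d5b44b26d6f3c2e76a8d70e0426d099bff73e764af3087a"
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")

# Each <a> inside the "Today" card is one row (Morning, Afternoon, ...).
# Note: the class name ends in what looks like a generated suffix ("--globn"),
# so the selector may stop matching if the site's markup changes.
today_forecast = []
for a in soup.select(".TodayWeatherCard--TableWrapper--globn a"):
    # The direct children of the <a> are the columns; separator=" " keeps the
    # text of nested elements from being glued together
    today_forecast.append(
        [t.get_text(strip=True, separator=" ") for t in a.find_all(recursive=False)]
    )

df = pd.DataFrame(
    today_forecast, columns=["Time of day", "Degrees", "Text", "Chance of rain"]
)

print(df)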

Prints:

  Time of day Degrees                 Text          Chance of rain
0     Morning    11 °        Partly Cloudy                      --
1   Afternoon    20 °        Partly Cloudy                      --
2     Evening    14 °  Partly Cloudy Night  Rain Chance of Rain 3%
3   Overnight    10 °               Cloudy  Rain Chance of Rain 5%
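The "Chance of rain" column above still includes the text of the nested elements ("Rain Chance of Rain 3%"). A minimal sketch of one way to keep only the percentage, assuming the scraped values look like the printed output: post-process the column with pandas `Series.str.extract`. The rows are copied from the output above, and the regex `(\d+%)` is an assumption about the value format, not something the page guarantees.

```python
import pandas as pd

# Rows copied from the printed output above
df = pd.DataFrame(
    [
        ["Morning", "11 °", "Partly Cloudy", "--"],
        ["Afternoon", "20 °", "Partly Cloudy", "--"],
        ["Evening", "14 °", "Partly Cloudy Night", "Rain Chance of Rain 3%"],
        ["Overnight", "10 °", "Cloudy", "Rain Chance of Rain 5%"],
    ],
    columns=["Time of day", "Degrees", "Text", "Chance of rain"],
)

# Assumed format: keep only the "N%" part; rows without one ("--") become NaN
df["Chance of rain"] = df["Chance of rain"].str.extract(r"(\d+%)", expand=False)

print(df)
```

With `expand=False` and a single capture group, `str.extract` returns a Series, so it can be assigned straight back to the column.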

-1
votes
from bs4 import BeautifulSoup

# Assuming you have your HTML content in 'html_content'
# (for example, from requests.get(url, headers=headers).text)
soup = BeautifulSoup(html_content, "html.parser")

# Find the parent span and take its last child: the bare text node that
# follows the nested span, so the nested span's text is excluded
precip = soup.find("span", class_="Column--precip--3JCDO")
if precip is not None:
    rain_forecast = precip.contents[-1].strip()
    print(rain_forecast)
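To see why taking the last child works, here is a self-contained sketch on a small hand-written HTML snippet. The markup is hypothetical, modelled on the class name used in this answer and the "Chance of Rain 3%" text in the other answer's output; the real weather.com markup may differ, and the inner class `hidden-label` is made up for illustration. It contrasts `get_text()`, which also returns the nested span's text, with `find_all(string=True, recursive=False)`, which returns only the parent's own text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the parent span's own text ("3%") follows a nested span
html_content = """
<span class="Column--precip--3JCDO">
  <span class="hidden-label">Chance of Rain</span>3%
</span>
"""

soup = BeautifulSoup(html_content, "html.parser")
precip = soup.find("span", class_="Column--precip--3JCDO")

# get_text() walks all descendants, so the nested span's text is included
with_nested = precip.get_text(" ", strip=True)

# Only text nodes that are direct children of the parent span
own_text = "".join(precip.find_all(string=True, recursive=False)).strip()

# This answer's approach: the last child node is the trailing text
last_child = precip.contents[-1].strip()

print(with_nested)  # Chance of Rain 3%
print(own_text)     # 3%
print(last_child)   # 3%
```

The `recursive=False` variant does not depend on the text being the last child, so it keeps working if another element is ever appended after the percentage.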
© www.soinside.com 2019 - 2024. All rights reserved.