BeautifulSoup web scraping with find_all():


I am trying to get the times and prices shown with a green circle, save them, and send the data with https://github.com/pedroslopez/whatsapp-web.js (image: https://imgur.com/NSmNxL7).

Google Colab link: https://colab.research.google.com/drive/1HvO7AWvBhP1_epfiAKSE1aE1zBUHvJ9O#scrollTo=evP9Tallp-d5

When I use this code,

divs = soup.find_all('div', class_="col-xs-11 col-lg-5 template-tlh__colors")

for div in divs:
    print(div)
    print(" ")

it shows me this data:

<div class="col-xs-11 col-lg-5 template-tlh__colors">
<div class="template-tlh__colors--hours">
<div class="template-tlh__colors--hours-info">
<div class="template-tlh__colors--hours-circle template-tlh__background-color-high"> </div>
<span itemprop="description">20:00 - 21:00</span>
</div>
<div class="template-tlh__colors--hours-price template-tlh__color-high">
<span itemprop="price">0.06643
€/kWh</span>
<meta content="EUR" itemprop="priceCurrency"/>
</div>
</div>
</div>
 
<div class="col-xs-11 col-lg-5 template-tlh__colors">
<div class="template-tlh__colors--hours">
<div class="template-tlh__colors--hours-info">
<div class="template-tlh__colors--hours-circle template-tlh__background-color-high"> </div>
<span itemprop="description">21:00 - 22:00</span>
</div>
<div class="template-tlh__colors--hours-price template-tlh__color-high">
<span itemprop="price">0.07924
€/kWh</span>
<meta content="EUR" itemprop="priceCurrency"/>
</div>
</div>
</div>

However, I don't know how to extract the time and price data (I marked them in bold) so that I can send them later with a WhatsApp bot.

The website is https://tarifaluzhora.es/

python web-scraping
1 Answer

Here is an example of how to get the time/price/color from that page into a pandas DataFrame:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://tarifaluzhora.es/"

soup = BeautifulSoup(requests.get(url).content, "html5lib")

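all_data = []
# Select the innermost .row elements that contain a price.
for row in soup.select(".row:not(:has(.row)):has([itemprop=price])"):
    time = row.select_one('[itemprop="description"]')
    price = row.select_one('[itemprop="price"]')

    # The circle <div> just before the time span carries the color class,
    # e.g. "template-tlh__background-color-high".
    color = time.find_previous("div")["class"][-1]

    if "high" in color:
        color = "high"
    elif "low" in color:
        color = "low"
    else:
        color = "default"

    all_data.append(
        {
            "Time": time.text,
            # The price text spans two lines in the HTML; join them with a space.
            "Price": price.text.replace("\n", " "),
            "Color": color,
        }
    )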

df = pd.DataFrame(all_data)
print(df)

Prints:

             Time          Price    Color
0   00:00 - 01:00  0.07835 €/kWh     high
1   01:00 - 02:00  0.07402 €/kWh     high
2   02:00 - 03:00  0.07551 €/kWh     high
3   03:00 - 04:00  0.07142 €/kWh     high
4   04:00 - 05:00   0.0686 €/kWh     high
5   05:00 - 06:00  0.06724 €/kWh     high
6   06:00 - 07:00  0.06988 €/kWh     high
7   07:00 - 08:00  0.06922 €/kWh     high
8   08:00 - 09:00  0.06694 €/kWh     high
9   09:00 - 10:00  0.04585 €/kWh      low
10  10:00 - 11:00  0.04314 €/kWh      low
11  11:00 - 12:00  0.04135 €/kWh      low
12  12:00 - 13:00  0.04095 €/kWh      low
13  13:00 - 14:00  0.04074 €/kWh      low
14  14:00 - 15:00  0.04164 €/kWh      low
15  15:00 - 16:00  0.04227 €/kWh      low
16  16:00 - 17:00  0.04262 €/kWh      low
17  17:00 - 18:00  0.04444 €/kWh      low
18  18:00 - 19:00  0.04552 €/kWh      low
19  19:00 - 20:00  0.05339 €/kWh      low
20  20:00 - 21:00  0.06643 €/kWh     high
21  21:00 - 22:00  0.07924 €/kWh     high
22  22:00 - 23:00  0.06724 €/kWh     high
23  23:00 - 24:00  0.05605 €/kWh  default
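
To connect this back to the find_all loop in the question, here is a minimal sketch (not part of the original answer) that pulls the time and price out of each matching div. It parses a copy of the HTML printed in the question rather than fetching the live page, and it uses the built-in "html.parser" instead of html5lib. The names SAMPLE_HTML and extract_rows are hypothetical, introduced only for illustration, and the sketch assumes the page keeps the itemprop attributes shown in the question.

```python
from bs4 import BeautifulSoup

# Copy of the markup printed in the question, so the sketch runs offline.
SAMPLE_HTML = """
<div class="col-xs-11 col-lg-5 template-tlh__colors">
<div class="template-tlh__colors--hours">
<div class="template-tlh__colors--hours-info">
<div class="template-tlh__colors--hours-circle template-tlh__background-color-high"> </div>
<span itemprop="description">20:00 - 21:00</span>
</div>
<div class="template-tlh__colors--hours-price template-tlh__color-high">
<span itemprop="price">0.06643
€/kWh</span>
<meta content="EUR" itemprop="priceCurrency"/>
</div>
</div>
</div>
<div class="col-xs-11 col-lg-5 template-tlh__colors">
<div class="template-tlh__colors--hours">
<div class="template-tlh__colors--hours-info">
<div class="template-tlh__colors--hours-circle template-tlh__background-color-high"> </div>
<span itemprop="description">21:00 - 22:00</span>
</div>
<div class="template-tlh__colors--hours-price template-tlh__color-high">
<span itemprop="price">0.07924
€/kWh</span>
<meta content="EUR" itemprop="priceCurrency"/>
</div>
</div>
</div>
"""


def extract_rows(html):
    """Return a list of (time_range, price) tuples (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.find_all("div", class_="col-xs-11 col-lg-5 template-tlh__colors"):
        time_span = div.find("span", itemprop="description")
        price_span = div.find("span", itemprop="price")
        if time_span is None or price_span is None:
            continue  # skip blocks that lack either field
        # The price text looks like "0.06643\n€/kWh"; the first token is the number.
        price = float(price_span.get_text().split()[0])
        rows.append((time_span.get_text(strip=True), price))
    return rows


rows = extract_rows(SAMPLE_HTML)
for time_range, price in rows:
    print(f"{time_range}: {price} EUR/kWh")
```

Each tuple holds the time range as a string and the price as a float, so the values can be formatted into a message for the WhatsApp bot. How the bot receives the data is outside this sketch.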