Scraping ARIA tabs


I have read several questions on this topic, so I am asking in case you have a better idea.

Basically, I am trying to scrape the data in the "Prices/Quotes" table at the following URL:

https://www.eurex.com/ex-en/markets/idx/dax/DAX-Options-139884

Inspecting the page, I found this piece of HTML, which should be the relevant part:

</div>
<ul role="tablist" class="dbx-tabs__tab-list" data-scroll-disabled="none"><li id="tabsTab-1.1" role="tab" aria-controls="tabsTabPanel-1.1" class="dbx-tabs__tab dbx-tabs__tab--selected" data-js-tab="" tabindex="0" aria-selected="true">

For example, here is the first part of the markup that leads to the header of the underlying table:

<div class="_scrollable_table_container_16n6d_68" data-scroll-disabled="forward"><table class="react-table"><thead class="bg-white     "><tr><th align="left" class="                           
                            "><span class="fw-bold d-none"> Contract Type </span></th><th align="left" class="
                            "><span class="fw-bold text-notosans"> Last traded </span></th><th align="left" class="
                            "><span class="fw-bold text-notosans"> Open </span></th><th align="left" class="
                            "><span class="fw-bold text-notosans"> High </span></th><th align="left" class="
                            "><span class="fw-bold text-notosans"> Low </span></th><th align="left" class="

                            "><span class="fw-bold text-notosans"> D. Settle </span></th><th align="left" class="
                            "><span class="fw-bold text-notosans"> OI </span></th><th align="left" class="

                            "><span class="fw-bold text-notosans"> Volume </span></th><th align="left" class="

                            "><span class="fw-bold text-notosans"> Last </span></th><th align="left" class="
                            "><span class="fw-bold text-notosans"> Bid </span></th><th align="left" class="

                            "><span class="fw-bold text-notosans"> Ask </span></th></tr></thead><tbody 

I tried writing the following Python code:

import requests
from bs4 import BeautifulSoup

url = "https://www.eurex.com/ex-en/markets/idx/dax/DAX-Options-139884"

# Download the page and parse the HTML returned by the server
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Locate the selected tab, then the table in the <div> that follows it
tab = soup.find("li", {"id": "tabsTab-1.1"})
table = tab.find_next_sibling("div").find("table")

# Column titles come from the <th> cells
headers = [th.text.strip() for th in table.find_all("th")]

# Collect the <td> cells of each row, skipping rows that have none
data = []
for row in table.find_all("tr"):
    cells = [td.text.strip() for td in row.find_all("td")]
    if cells:
        data.append(cells)

print(headers)
print(data)

But running it raises "AttributeError: 'NoneType' object has no attribute 'find_next_sibling'" on the line table = tab.find_next_sibling("div").find("table")
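The message means that tab is None, since BeautifulSoup's find() returns None when nothing matches. A minimal sketch of a guard that turns this into a clearer error is below. The helper name require is hypothetical and not part of any library, and the failed lookup is simulated with a plain None so the snippet runs without network access.

```python
def require(element, description):
    """Return element unchanged, or raise a descriptive error if the lookup found nothing."""
    if element is None:
        raise LookupError(f"{description} not found in the parsed HTML")
    return element


# Simulate a failed lookup: find() returns None when no element matches
tab = None
try:
    require(tab, 'li with id="tabsTab-1.1"')
except LookupError as exc:
    message = str(exc)
    print(message)
```

In the script above, the lookup could be wrapped as require(soup.find("li", {"id": "tabsTab-1.1"}), "tab element"), so the failure points at the missing element rather than at the later attribute access.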

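It seems that the HTML fragment containing id="tabsTab-1.1" is missing from the "soup", so Python cannot read the underlying table mentioned above (put and call strike prices and prices).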
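One way to confirm this suspicion is to check whether the id appears at all in the HTML text the server returned. Below is a rough sketch using only the standard library. The two sample strings are made up for illustration and are not the actual Eurex response, and the names IdCollector and has_id are hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_id(html, element_id):
    """Return True if an element with the given id appears in the HTML text."""
    collector = IdCollector()
    collector.feed(html)
    return element_id in collector.ids


# Made-up samples: a bare static response versus markup as seen in the browser inspector
static_html = '<div id="root"></div><script src="app.js"></script>'
rendered_html = '<ul role="tablist"><li id="tabsTab-1.1" role="tab">Prices/Quotes</li></ul>'

print(has_id(static_html, "tabsTab-1.1"))
print(has_id(rendered_html, "tabsTab-1.1"))
```

With the real page, response.text from the script above could be passed as html. If the check returns False there, the element is presumably added by JavaScript after the page loads, which would explain why the browser inspector shows it but requests does not.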

Do you have any suggestions on how to read the complete HTML so that I can scrape the data?

Thanks!

python web-scraping tabs wai-aria