I am trying to scrape multiple tables from this website.
Here is my code:
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
import pandas as pd

def scrape_ranking(url, sheet_name):
    with sync_playwright() as pw:
        browser = pw.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        soup = BeautifulSoup(page.content(), "html.parser")
        table = soup.select_one(".table_bd")
        print("done step 1")
        if table is None:
            print("Table not found.")
        else:
            df = pd.read_html(str(table))[0]
            print(df)
            with pd.ExcelWriter("jockeyclub.xlsx", engine="openpyxl", mode='a', if_sheet_exists='overlay') as writer:
                df.to_excel(writer, sheet_name=sheet_name, index=True, startrow=70)
url_trainer = "https://racing.hkjc.com/racing/information/english/racing/Draw.aspx#race1.aspx"
scrape_ranking(url_trainer, "Race Card 1")
This code prints the table for Race Card 1. However, when I change the line to
df = pd.read_html(str(table))[1]
or df = pd.read_html(str(table))[2]
it cannot find any other table on the site.
Is there a way to print all of the tables on the website?
Use a selector that returns all matches. soup.select_one(".table_bd") returns only the first matching element, so pd.read_html(str(table)) yields a one-element list and the indices [1] and [2] are out of range. Select every matching element instead. Note that soup.find_all(".table_bd") would look for a tag literally named ".table_bd"; pass the class with class_ or use select, and check for an empty list (find_all/select never return None):

tables = soup.select(".table_bd")
if not tables:
    print("Table not found.")
else:
    for table in tables:
        # do something with the table here
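For example, each selected element can be fed to pd.read_html on its own. This is a minimal sketch with a stand-in HTML snippet containing two ".table_bd" tables (on the real page the HTML would come from page.content() via Playwright; the column names here are invented for illustration):

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for the page HTML: two divs with class "table_bd",
# each wrapping one table.
html = """
<div class="table_bd"><table>
  <tr><th>Horse</th><th>Draw</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table></div>
<div class="table_bd"><table>
  <tr><th>Horse</th><th>Draw</th></tr>
  <tr><td>B</td><td>2</td></tr>
</table></div>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.select(".table_bd")   # all matches, not just the first
if not tables:                      # select() returns [] when nothing matches
    print("Table not found.")
else:
    for i, table in enumerate(tables):
        # read_html parses any HTML fragment that contains a <table>
        df = pd.read_html(StringIO(str(table)))[0]
        print(f"table {i}: shape {df.shape}")
```

Each iteration produces one DataFrame, so indexing past [0] is never needed.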
Just use pandas.read_html(), select the tables with the attrs parameter, and iterate over the list of DataFrames:
import pandas as pd
url_trainer = "https://racing.hkjc.com/racing/information/english/racing/Draw.aspx#race1.aspx"
for table in pd.read_html(url_trainer, attrs={'class': 'table_bd'}):
    print(table)
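If, as in the question, each table should land in its own Excel sheet, the same loop can feed an ExcelWriter. A sketch, assuming openpyxl is installed and using synthetic DataFrames in place of the pd.read_html result (the sheet-name pattern "Race Card N" is taken from the question; no network access is assumed here):

```python
import pandas as pd

# Synthetic stand-ins for the list of DataFrames pd.read_html(url_trainer,
# attrs={'class': 'table_bd'}) would return from the live page.
dfs = [
    pd.DataFrame({"Horse": ["A"], "Draw": [1]}),
    pd.DataFrame({"Horse": ["B"], "Draw": [2]}),
]

# One sheet per table; mode defaults to 'w', creating the workbook fresh.
with pd.ExcelWriter("jockeyclub.xlsx", engine="openpyxl") as writer:
    for i, df in enumerate(dfs, start=1):
        df.to_excel(writer, sheet_name=f"Race Card {i}", index=False)

print(pd.ExcelFile("jockeyclub.xlsx").sheet_names)
```

Swapping the synthetic dfs for the real pd.read_html call gives one sheet per scraped table.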