How do I scrape table data from HTML using BeautifulSoup?

Question (0 votes, 1 answer)

I have been trying to scrape the table from this site, https://www.alphaquery.com/stock/aapl/earnings-history, but I cannot get it to work. I can't even find the table.

import requests
from bs4 import BeautifulSoup

def get_eps(ticker):
    url = f"https://www.alphaquery.com/stock/{ticker}/earnings-history"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    
    # Attempt to more robustly find the table by checking for specific headers
    table = None
    for table_candidate in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table_candidate.find_all("th")]
        if "Estimated EPS" in headers:
            table = table_candidate
            break
    if table:
        rows = table.find_all('tr')[1:6]  
        for row in rows:
            cells = row.find_all('td')
            if len(cells) >= 4:  # Ensure there are enough columns in the row
                try:
                    est_eps = cells[2].text.strip().replace('$', '').replace(',', '')
                except ValueError:
                    continue  # Skip rows where conversion from string to float fails
    else:
        print(f"Failed to find earnings table for {ticker}")

    return est_eps

# Example usage
ticker = 'AAPL'
beats = get_eps(ticker)
print(f'{ticker}  estimates {est_eps}')
python web-scraping beautifulsoup html-table python-requests
1 Answer (0 votes)

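The most popular, and often the simplest, way to scrape a table is pandas.read_html(), which can use beautifulsoup under the hood (by default pandas tries lxml first and falls back to bs4 with html5lib).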

Example
import pandas as pd

# Raw string so that '\$' is a regex-escaped dollar sign, not an invalid string escape
pd.read_html('https://www.alphaquery.com/stock/aapl/earnings-history')[0].replace(r'\$', '', regex=True)
   Announcement Date  Fiscal Quarter End  Estimated EPS  Actual EPS
0         2024-05-02          2024-03-31           1.51        1.53
1         2024-02-01          2023-12-31           2.09        2.18
...
36        2015-04-27          2015-03-31           0.55        0.58
37        2015-01-27          2014-12-31           0.65        0.77
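
If the page contains more than one table, relying on index [0] can be fragile. A minimal sketch of one alternative, assuming lxml (or bs4 with html5lib) is installed: the `match` parameter of pandas.read_html() keeps only tables whose text matches a string. `SAMPLE_HTML` is hypothetical markup standing in for the real page, so the example runs offline; the column names and the leading `$` are assumptions based on the output above.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup standing in for the real page (assumption, not the site's HTML).
SAMPLE_HTML = """
<table>
  <thead><tr><th>Label</th><th>Value</th></tr></thead>
  <tbody><tr><td>placeholder</td><td>0</td></tr></tbody>
</table>
<table>
  <thead>
    <tr><th>Announcement Date</th><th>Fiscal Quarter End</th>
        <th>Estimated EPS</th><th>Actual EPS</th></tr>
  </thead>
  <tbody>
    <tr><td>2024-05-02</td><td>2024-03-31</td><td>$1.51</td><td>$1.53</td></tr>
    <tr><td>2024-02-01</td><td>2023-12-31</td><td>$2.09</td><td>$2.18</td></tr>
  </tbody>
</table>
"""

# match= keeps only the tables whose text contains "Estimated EPS",
# so the unrelated first table is skipped.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Estimated EPS")
df = tables[0]

# Strip the dollar sign and convert the EPS columns to floats.
for col in ["Estimated EPS", "Actual EPS"]:
    df[col] = df[col].str.replace("$", "", regex=False).astype(float)

print(df)
```

For the live site you would pass the URL instead of the StringIO object; whether the site serves the table to a plain HTTP request is not something this sketch can confirm.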

However, to get only the estimates with your own code, you can use yield instead of return:

    # Inside get_eps(): this replaces the original loop and the final `return est_eps`
    if table:
        rows = table.find_all('tr')[1:6]
        for row in rows:
            cells = row.find_all('td')
            if len(cells) >= 4:  # Ensure there are enough columns in the row
                try:
                    est_eps = float(cells[2].text.strip().replace('$', '').replace(',', ''))
                    yield est_eps
                except ValueError:
                    continue  # Skip rows where conversion from string to float fails
    else:
        print(f"Failed to find earnings table for {ticker}")

# Example usage
ticker = 'AAPL'
print(f'{ticker} estimates {list(get_eps(ticker))}')
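
For reference, the yield approach can be sketched as one self-contained generator. This is an illustration under stated assumptions, not the asker's exact function: `iter_estimated_eps` and `SAMPLE_HTML` are hypothetical names, the markup is a stand-in for the real page so the example runs without a network request, and the last sample row is invented to show how non-numeric cells are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption, not the site's HTML).
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Announcement Date</th><th>Fiscal Quarter End</th>
        <th>Estimated EPS</th><th>Actual EPS</th></tr>
  </thead>
  <tbody>
    <tr><td>2024-05-02</td><td>2024-03-31</td><td>$1.51</td><td>$1.53</td></tr>
    <tr><td>2024-02-01</td><td>2023-12-31</td><td>$2.09</td><td>$2.18</td></tr>
    <tr><td>n/a</td><td>n/a</td><td>--</td><td>--</td></tr>
  </tbody>
</table>
"""


def iter_estimated_eps(html, column="Estimated EPS"):
    """Yield the values of `column` as floats, one per usable table row."""
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if column not in headers:
            continue  # not the table we are looking for
        index = headers.index(column)
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) <= index:
                continue  # header row (no <td>) or a row that is too short
            text = cells[index].get_text(strip=True)
            text = text.replace("$", "").replace(",", "")
            try:
                yield float(text)
            except ValueError:
                continue  # the cell did not hold a number
        return  # stop after the first matching table


estimates = list(iter_estimated_eps(SAMPLE_HTML))
print(estimates)
```

Because the function yields instead of returning, every row contributes a value and the caller decides whether to collect them with list() or consume them lazily. Looking the column up by header name rather than a fixed index is a design choice of this sketch, not part of the original answer.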