Empty names after scraping the world indices table with BeautifulSoup


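I am trying to scrape the ticker symbol and full name of each index from the world indices table on Yahoo Finance: https://finance.yahoo.com/world-indices/

This is the code I have so far: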

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the Yahoo Finance world indices page
url = 'https://finance.yahoo.com/world-indices/'

# Send HTTP request to the URL
response = requests.get(url)
response.raise_for_status()  # Raise an exception for HTTP errors

# Parse the HTML content of the page using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the table containing the indices data
table = soup.find('table', {'class': 'W(100%)'})

# Initialize lists to store ticker symbols and names
tickers = []
names = []

# Extract data from the table rows
for row in table.find_all('tr')[1:]:  # Skip the header row
    cells = row.find_all('td')
    ticker = cells[0].text.strip()
    name = cells[1].text.strip()
    tickers.append(ticker)
    names.append(name)

# Create a dataframe using pandas
data = {'Ticker': tickers, 'Name': names}
df = pd.DataFrame(data)

# Save the dataframe to a CSV file
df.to_csv('yahoo_finance_world_indices.csv', index=False)

print('Data saved to yahoo_finance_world_indices.csv')

And the resulting CSV file:

^GSPC,S&P 500
^DJI,Dow 30
^IXIC,Nasdaq
^NYA,
^XAX,
^BUK100P,
^RUT,Russell 2000
^VIX,
^FTSE,FTSE 100
^GDAXI,
^FCHI,
^STOXX50E,
^N100,
^BFX,
IMOEX.ME,
^N225,Nikkei 225
^HSI,
000001.SS,
399001.SZ,
^STI,
^AXJO,
^AORD,
^BSESN,
^JKSE,
^KLSE,
^NZ50,
^KS11,
^TWII,
^GSPTSE,
^BVSP,
^MXX,
^IPSA,
^MERV,
^TA125.TA,
^CASE30,
^JN0U.JO,

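I would like to know why some names are missing from my CSV file. I also noticed that the names that do appear are abbreviated/short versions of the full names shown in the actual table.

I am new to web scraping, so I am not entirely sure what is going wrong here.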

python html csv beautifulsoup
1 answer

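Here is an example of how to get the ticker symbols and names from the site. It reads the full name from the title attribute of each quote link: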

import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/world-indices/"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

for a in soup.select('a[data-test="quoteLink"]'):
    print(f"{a.text:<30} {a['title']}")

Prints:

^GSPC                          S&P 500
^DJI                           Dow Jones Industrial Average
^IXIC                          NASDAQ Composite
^NYA                           NYSE COMPOSITE (DJ)
^XAX                           NYSE AMEX COMPOSITE INDEX
^BUK100P                       Cboe UK 100
^RUT                           Russell 2000
^VIX                           CBOE Volatility Index
^FTSE                          FTSE 100
^GDAXI                         DAX PERFORMANCE-INDEX
^FCHI                          CAC 40
^STOXX50E                      ESTX 50 PR.EUR
^N100                          Euronext 100 Index
^BFX                           BEL 20
IMOEX.ME                       MOEX Russia Index
^N225                          Nikkei 225
^HSI                           HANG SENG INDEX
000001.SS                      SSE Composite Index
399001.SZ                      Shenzhen Index
^STI                           STI Index
^AXJO                          S&P/ASX 200
^AORD                          ALL ORDINARIES
^BSESN                         S&P BSE SENSEX
^JKSE                          IDX COMPOSITE
^KLSE                          FTSE Bursa Malaysia KLCI
^NZ50                          S&P/NZX 50 INDEX GROSS ( GROSS 
^KS11                          KOSPI Composite Index
^TWII                          TSEC weighted index
^GSPTSE                        S&P/TSX Composite index
^BVSP                          IBOVESPA
^MXX                           IPC MEXICO
^IPSA                          S&P/CLX IPSA
^MERV                          MERVAL
^TA125.TA                      TA-125
^CASE30                        EGX 30 Price Return Index
^JN0U.JO                       Top 40 USD Net TRI Index
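
Since the question wanted a CSV file rather than printed output, here is a minimal sketch of how the same selector could feed a CSV writer. It assumes the page still marks quote links with data-test="quoteLink" and keeps the full name in title, as in the example above. The helper names extract_indices and to_csv_text and the sample_html snippet are made up for illustration; the sample stands in for the live page so the code runs without a network request.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_indices(html):
    """Return (ticker, full_name) pairs for every quote link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for a in soup.select('a[data-test="quoteLink"]'):
        ticker = a.get_text(strip=True)
        # Use an empty string if a link happens to have no title attribute.
        name = a.get("title", "")
        rows.append((ticker, name))
    return rows


def to_csv_text(rows):
    """Serialize the pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Ticker", "Name"])
    writer.writerows(rows)
    return buf.getvalue()


# Hand-written sample (an assumption about the page's markup, not a copy of it).
sample_html = """
<table>
  <tr><td><a data-test="quoteLink" title="Dow Jones Industrial Average">^DJI</a></td></tr>
  <tr><td><a data-test="quoteLink" title="NASDAQ Composite">^IXIC</a></td></tr>
  <tr><td><a href="/other">not a quote link</a></td></tr>
</table>
"""

rows = extract_indices(sample_html)
csv_text = to_csv_text(rows)
print(csv_text)
```

To use it on the real page, pass requests.get(url, headers=headers).text to extract_indices and write csv_text to a file. If you prefer to stay with pandas, pd.DataFrame(rows, columns=["Ticker", "Name"]).to_csv(..., index=False) should work just as well.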