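I'm trying to scrape the ticker symbols and full names of the indices from Yahoo Finance's world indices table: https://finance.yahoo.com/world-indices/
This is the code I have so far: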
import requests
from bs4 import BeautifulSoup
import pandas as pd
# URL of the Yahoo Finance world indices page
url = 'https://finance.yahoo.com/world-indices/'
# Send HTTP request to the URL
response = requests.get(url)
response.raise_for_status() # Raise an exception for HTTP errors
# Parse the HTML content of the page using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Find the table containing the indices data
table = soup.find('table', {'class': 'W(100%)'})
# Initialize lists to store ticker symbols and names
tickers = []
names = []
# Extract data from the table rows
for row in table.find_all('tr')[1:]:  # Skip the header row
    cells = row.find_all('td')
    ticker = cells[0].text.strip()
    name = cells[1].text.strip()
    tickers.append(ticker)
    names.append(name)
# Create a dataframe using pandas
data = {'Ticker': tickers, 'Name': names}
df = pd.DataFrame(data)
# Save the dataframe to a CSV file
df.to_csv('yahoo_finance_world_indices.csv', index=False)
print('Data saved to yahoo_finance_world_indices.csv')
And the resulting CSV file:
^GSPC,S&P 500
^DJI,Dow 30
^IXIC,Nasdaq
^NYA,
^XAX,
^BUK100P,
^RUT,Russell 2000
^VIX,
^FTSE,FTSE 100
^GDAXI,
^FCHI,
^STOXX50E,
^N100,
^BFX,
IMOEX.ME,
^N225,Nikkei 225
^HSI,
000001.SS,
399001.SZ,
^STI,
^AXJO,
^AORD,
^BSESN,
^JKSE,
^KLSE,
^NZ50,
^KS11,
^TWII,
^GSPTSE,
^BVSP,
^MXX,
^IPSA,
^MERV,
^TA125.TA,
^CASE30,
^JN0U.JO,
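I'd like to know why some of the names aren't being written to my CSV file. I've also noticed that the names that are written are abbreviated/short versions of the full names shown in the actual table.
I'm new to web scraping, so I'm not entirely sure what's going wrong here.
Here is an example of how to get the tickers and full names from the site: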
import requests
from bs4 import BeautifulSoup
url = "https://finance.yahoo.com/world-indices/"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
for a in soup.select('a[data-test="quoteLink"]'):
    print(f"{a.text:<30} {a['title']}")
Prints:
^GSPC S&P 500
^DJI Dow Jones Industrial Average
^IXIC NASDAQ Composite
^NYA NYSE COMPOSITE (DJ)
^XAX NYSE AMEX COMPOSITE INDEX
^BUK100P Cboe UK 100
^RUT Russell 2000
^VIX CBOE Volatility Index
^FTSE FTSE 100
^GDAXI DAX PERFORMANCE-INDEX
^FCHI CAC 40
^STOXX50E ESTX 50 PR.EUR
^N100 Euronext 100 Index
^BFX BEL 20
IMOEX.ME MOEX Russia Index
^N225 Nikkei 225
^HSI HANG SENG INDEX
000001.SS SSE Composite Index
399001.SZ Shenzhen Index
^STI STI Index
^AXJO S&P/ASX 200
^AORD ALL ORDINARIES
^BSESN S&P BSE SENSEX
^JKSE IDX COMPOSITE
^KLSE FTSE Bursa Malaysia KLCI
^NZ50 S&P/NZX 50 INDEX GROSS ( GROSS
^KS11 KOSPI Composite Index
^TWII TSEC weighted index
^GSPTSE S&P/TSX Composite index
^BVSP IBOVESPA
^MXX IPC MEXICO
^IPSA S&P/CLX IPSA
^MERV MERVAL
^TA125.TA TA-125
^CASE30 EGX 30 Price Return Index
^JN0U.JO Top 40 USD Net TRI Index
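If you want these full names in a CSV file (the goal of the original code) instead of printed, the sketch below is one way to do it. It swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra dependencies, and it assumes, as the output above suggests, that each `<a data-test="quoteLink">` link's text is the ticker and its `title` attribute is the full name. `QuoteLinkParser` and `indices_to_csv` are hypothetical names, and Yahoo's markup may change, so treat the `data-test` selector as an assumption to re-check.

```python
import csv
import io
from html.parser import HTMLParser


class QuoteLinkParser(HTMLParser):
    """Collect (ticker, full name) pairs from <a data-test="quoteLink"> tags.

    Assumption: the link text is the ticker and the link's title
    attribute holds the full index name, as in the answer's output.
    """

    def __init__(self):
        super().__init__()
        self.rows = []       # collected (ticker, name) tuples
        self._title = None   # title of the quoteLink we are inside, else None
        self._text = []      # text chunks seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("data-test") == "quoteLink":
            # Attribute values arrive already unescaped (&amp; -> &)
            self._title = attrs.get("title") or ""
            self._text = []

    def handle_data(self, data):
        if self._title is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._title is not None:
            ticker = "".join(self._text).strip()
            self.rows.append((ticker, self._title.strip()))
            self._title = None


def indices_to_csv(html: str) -> str:
    """Return a CSV string with a Ticker,Name header and one row per quoteLink."""
    parser = QuoteLinkParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Ticker", "Name"])
    writer.writerows(parser.rows)  # csv quotes names containing commas
    return buf.getvalue()
```

Against the live page this could be called as `indices_to_csv(requests.get(url, headers=headers).text)`, reusing `url` and `headers` from the answer, and the returned string written to `yahoo_finance_world_indices.csv`. Reading the `title` attribute sidesteps the separate Name column, whose cell text came out abbreviated or empty in the question's CSV.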