Scraping historical data from Yahoo Finance with Python

Question · Votes: 2 · Answers: 1

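As some of you probably know by now, Yahoo! Finance appears to have discontinued its API for stock market data. I am aware of the fix-yahoo-finance workaround, but I am trying to make my code more stable by scraping the historical data directly from Yahoo.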

So this is what I have so far:

import requests
from bs4 import BeautifulSoup

# Fetch the AAPL history page; period1/period2 are Unix timestamps in seconds
page = requests.get("https://finance.yahoo.com/quote/AAPL/history?period1=345423600&period2=1495922400&interval=1d&filter=history&frequency=1d", timeout=10)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
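
The `period1` and `period2` values in the URL above look like Unix timestamps in seconds (345423600 falls on 1980-12-11 23:00 UTC). A minimal sketch of building such a URL from dates, assuming that interpretation holds; the helper name `history_url` is made up for illustration:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def history_url(symbol, start, end):
    """Build a history-page URL like the one above (hypothetical helper)."""
    params = {
        "period1": int(start.timestamp()),  # start of range, epoch seconds
        "period2": int(end.timestamp()),    # end of range, epoch seconds
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    }
    return f"https://finance.yahoo.com/quote/{symbol}/history?" + urlencode(params)


# Timezone-aware datetimes avoid any dependence on the local clock settings.
url = history_url(
    "AAPL",
    datetime(1980, 12, 12, tzinfo=timezone.utc),
    datetime(2017, 5, 28, tzinfo=timezone.utc),
)
print(url)
```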

To get the data out of Yahoo's table, I can do this:

c=soup.find_all('tbody')
print(c)

My question is: how do I turn `c` into a proper DataFrame? Thanks!

python yahoo-finance
1 Answer
4 votes

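I wrote this to fetch YF historical data directly from the "Download CSV" link. It makes two requests: one to obtain the cookie and the crumb, and a second one to fetch the data. It returns a pandas DataFrame.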

import re
from io import StringIO
from datetime import datetime, timedelta

import requests
import pandas as pd


class YahooFinanceHistory:
    timeout = 2
    crumb_link = 'https://finance.yahoo.com/quote/{0}/history?p={0}'
    crumble_regex = r'CrumbStore":{"crumb":"(.*?)"}'
    quote_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={dfrom}&period2={dto}&interval=1d&events=history&crumb={crumb}'

    def __init__(self, symbol, days_back=7):
        self.symbol = symbol
        self.session = requests.Session()
        self.dt = timedelta(days=days_back)

    def get_crumb(self):
        response = self.session.get(self.crumb_link.format(self.symbol), timeout=self.timeout)
        response.raise_for_status()
        match = re.search(self.crumble_regex, response.text)
        if not match:
            raise ValueError('Could not get crumb from Yahoo Finance')
        self.crumb = match.group(1)

    def get_quote(self):
        if not hasattr(self, 'crumb') or len(self.session.cookies) == 0:
            self.get_crumb()
        # .timestamp() treats naive datetimes as local time, so utcnow() would shift the window
        now = datetime.now()
        dateto = int(now.timestamp())
        datefrom = int((now - self.dt).timestamp())
        url = self.quote_link.format(quote=self.symbol, dfrom=datefrom, dto=dateto, crumb=self.crumb)
        response = self.session.get(url, timeout=self.timeout)
        response.raise_for_status()
        return pd.read_csv(StringIO(response.text), parse_dates=['Date'])
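
The two parsing steps in the class (pulling the crumb out of the page source, then reading the CSV body) can be tried offline. The sketch below uses made-up sample strings; the function names, the crumb value, and the CSV rows are assumptions for illustration only, and the real page source may differ:

```python
import re
from io import StringIO

import pandas as pd

CRUMB_REGEX = r'CrumbStore":{"crumb":"(.*?)"}'

# Hypothetical page fragment and CSV body with made-up values.
SAMPLE_PAGE = '{"context":{"CrumbStore":{"crumb":"abc123XYZ"},"other":{}}}'
SAMPLE_CSV = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2017-05-25,1.0,2.0,0.5,1.5,1.5,100\n"
    "2017-05-26,1.5,2.5,1.0,2.0,2.0,200\n"
)


def extract_crumb(page_text):
    """Return the crumb embedded in the page source, or raise if absent."""
    match = re.search(CRUMB_REGEX, page_text)
    if not match:
        raise ValueError("Could not get crumb from page source")
    return match.group(1)


def parse_history_csv(csv_text):
    """Parse a CSV body into a DataFrame with a datetime 'Date' column."""
    return pd.read_csv(StringIO(csv_text), parse_dates=["Date"])


crumb = extract_crumb(SAMPLE_PAGE)
df = parse_history_csv(SAMPLE_CSV)
print(crumb)
print(df.dtypes)
```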

You can use it like this:

df = YahooFinanceHistory('AAPL', days_back=30).get_quote()
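
For the approach in the question, the rows inside the `tbody` could be turned into a DataFrame roughly as follows. This is only a sketch: the helper name `tbody_to_dataframe`, the column list, the date format, and the sample HTML (made-up values) are assumptions rather than something confirmed against Yahoo's live markup, and it only works if the table is actually present in the HTML that `requests` receives.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Assumed column order of the history table; verify against the real page.
COLUMNS = ["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]

# Made-up sample markup standing in for the scraped page.
SAMPLE_HTML = """
<table><tbody>
<tr><td>May 26, 2017</td><td>10.00</td><td>11.00</td><td>9.50</td>
    <td>10.50</td><td>10.50</td><td>21,927,600</td></tr>
<tr><td>May 25, 2017</td><td>9.00</td><td>10.20</td><td>8.90</td>
    <td>10.00</td><td>10.00</td><td>19,235,600</td></tr>
<tr><td>May 11, 2017</td><td colspan="6">0.63 Dividend</td></tr>
</tbody></table>
"""


def tbody_to_dataframe(html, columns=COLUMNS):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Keep only full price rows; dividend or split notices have fewer cells.
        if len(cells) == len(columns):
            rows.append(cells)
    df = pd.DataFrame(rows, columns=columns)
    df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")
    for col in columns[1:]:
        # Strip thousands separators before converting to numbers.
        cleaned = df[col].str.replace(",", "", regex=False)
        df[col] = pd.to_numeric(cleaned, errors="coerce")
    return df


table = tbody_to_dataframe(SAMPLE_HTML)
print(table)
```

With the question's code, `str(soup)` could be passed in place of `SAMPLE_HTML`.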