pandas datareader has been failing since Yahoo discontinued its API support
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2017, 5, 17)
web.DataReader('GOOGL', 'yahoo', start, end)
HTTPError: HTTP Error 401: Unauthorized
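Is there an unofficial library that lets us work around this for now? Is there perhaps something on Quandl?
The fix_yahoo_finance package has been renamed to yfinance, so you can try this code: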
import yfinance as yf
data = yf.download('MSFT', start = '2012-01-01', end='2017-01-01')
I found the "fix-yahoo-finance" workaround at https://pypi.python.org/pypi/fix-yahoo-finance useful, for example:
from pandas_datareader import data as pdr
import fix_yahoo_finance
data = pdr.get_data_yahoo('AAPL', start='2017-04-23', end='2017-05-24')
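Note that the last 2 data columns now come in the order 'Adj Close', 'Volume', i.e. not the previous format. To reorder them:
# 'Date' is the index, not a column, so it is not listed here
cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
# reindex returns a new DataFrame, so assign the result
data = data.reindex(columns=cols)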
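A minimal sketch of the reordering step, using a tiny hand-made frame in place of downloaded data. The values and the `to_legacy_order` helper name are made up for illustration, and it assumes the dates sit in the index (as pandas_datareader normally returns them), so 'Date' is not among the columns.

```python
import pandas as pd

# Column order of the old Yahoo format (assumption: 'Date' is the index)
LEGACY_ORDER = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

def to_legacy_order(frame):
    """Return a new frame with the columns in the old Yahoo order."""
    # reindex does not modify the frame in place; it returns a copy
    return frame.reindex(columns=LEGACY_ORDER)

# Hand-made sample in the new order ('Adj Close' before 'Volume')
sample = pd.DataFrame(
    {'Open': [1.0], 'High': [2.0], 'Low': [0.5],
     'Close': [1.5], 'Adj Close': [1.4], 'Volume': [100]},
    index=pd.to_datetime(['2017-04-24']),
)
sample.index.name = 'Date'

reordered = to_legacy_order(sample)
print(list(reordered.columns))
```

The original `sample` keeps its column order; only `reordered` has the legacy layout.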
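So they changed their URL and now protect it with cookies (and probably JavaScript), so I solved my own problem using dryscrape, which emulates a browser. This is just an FYI, since it now surely violates their terms and conditions, so use it at your own risk. I'm looking at Quandl for an alternative source of EOD prices.
I couldn't get the cookies to work with CookieJar, so I ended up using dryscrape to "fake" a user download: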
import dryscrape
from bs4 import BeautifulSoup
import time
import datetime
import re
#we visit the main page to initialise sessions and cookies
session = dryscrape.Session()
session.set_attribute('auto_load_images', False)
session.set_header('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
#call this once as it is slow(er) and then you can do multiple download, though there seems to be a limit after which you have to reinitialise...
session.visit("https://finance.yahoo.com/quote/AAPL/history?p=AAPL")
response = session.body()
#get the download link
soup = BeautifulSoup(response, 'lxml')
for taga in soup.find_all('a'):
    if taga.has_attr('download'):
        url_download = taga['href']
        print(url_download)
#now replace the default start and end dates that yahoo provides
s = "2017-02-18"
period1 = '%.0f' % time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d").timetuple())
e = "2017-05-18"
period2 = '%.0f' % time.mktime(datetime.datetime.strptime(e, "%Y-%m-%d").timetuple())
#now we replace the download period with our dates, please feel free to improve, I suck at regex
m = re.search('period1=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period1)
m = re.search('period2=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period2)
#and now visit it, get the body, and you have your csv
session.visit(url_download)
csv_data = session.body()
#and finally if you want to get a dataframe from it
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(csv_data), index_col=[0], parse_dates=True)
df
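Since the comment above invites improvements to the regex step, here is one possible sketch that rewrites `period1` and `period2` by parsing the query string instead. It is not the original author's method. The `example_url`, `to_epoch_utc` and `set_periods` names are hypothetical, and, unlike the `time.mktime` calls above (which use local time), this version assumes the dates should be treated as midnight UTC.

```python
import calendar
import datetime
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def to_epoch_utc(date_string):
    """Convert 'YYYY-MM-DD' to epoch seconds, treating it as midnight UTC."""
    parsed = datetime.datetime.strptime(date_string, "%Y-%m-%d")
    return calendar.timegm(parsed.timetuple())

def set_periods(url, start, end):
    """Return url with its period1/period2 query parameters replaced."""
    parts = urlsplit(url)
    # keep every other parameter (interval, crumb, ...) untouched
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query['period1'] = str(to_epoch_utc(start))
    query['period2'] = str(to_epoch_utc(end))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical download link, shaped like the one scraped above
example_url = ("https://example.com/download/AAPL"
               "?period1=1&period2=2&interval=1d&events=history&crumb=abc")
print(set_periods(example_url, "2017-02-18", "2017-05-18"))
```

Parsing the query avoids accidentally replacing the same digits elsewhere in the URL, which a plain `str.replace` on the matched value could do.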
I switched from Yahoo to Google Finance and it works for me, so from
data.DataReader(ticker, 'yahoo', start_date, end_date)
to
data.DataReader(ticker, 'google', start_date, end_date)
and adapted my "old" Yahoo! symbols from:
tickers = ['AAPL','MSFT','GE','IBM','AA','DAL','UAL', 'PEP', 'KO']
to
tickers = ['NASDAQ:AAPL','NASDAQ:MSFT','NYSE:GE','NYSE:IBM','NYSE:AA','NYSE:DAL','NYSE:UAL', 'NYSE:PEP', 'NYSE:KO']
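The symbol conversion above can be sketched as a small lookup. The exchange for each ticker is copied from the list above; the `EXCHANGES` table and the `to_google_symbol` helper are names made up for this illustration, and any ticker not in the table would have to be added by hand.

```python
# Exchange prefix for each ticker, copied from the list above
EXCHANGES = {
    'AAPL': 'NASDAQ', 'MSFT': 'NASDAQ',
    'GE': 'NYSE', 'IBM': 'NYSE', 'AA': 'NYSE', 'DAL': 'NYSE',
    'UAL': 'NYSE', 'PEP': 'NYSE', 'KO': 'NYSE',
}

def to_google_symbol(ticker, exchanges=EXCHANGES):
    """Prefix a plain ticker with its exchange, e.g. 'GE' -> 'NYSE:GE'."""
    # raises KeyError for tickers missing from the table
    return '{}:{}'.format(exchanges[ticker], ticker)

yahoo_tickers = ['AAPL', 'MSFT', 'GE', 'IBM', 'AA', 'DAL', 'UAL', 'PEP', 'KO']
google_tickers = [to_google_symbol(t) for t in yahoo_tickers]
print(google_tickers)
```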
Try this:
import fix_yahoo_finance as yf
data = yf.download('SPY', start = '2012-01-01', end='2017-01-01')
Yahoo Finance works well with pandas. Use it like this:
import pandas as pd
import pandas_datareader as pdr
from pandas_datareader import data as wb
ticker='GOOGL'
start_date='2019-1-1'
data_source='yahoo'
ticker_data=wb.DataReader(ticker,data_source=data_source,start=start_date)
df=pd.DataFrame(ticker_data)
To add to Tony Shouse's answer above: if you want to collect the 'Adj Close' column for several tickers at once, the following code works for me in Visual Studio Code.
import numpy as np
import pandas as pd
# imported as pdr because the loop below calls pdr.get_data_yahoo
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)
tickers = ['PG', 'MSFT', 'F', 'GE']
portfolio = pd.DataFrame()
for t in tickers:
    portfolio[t] = pdr.get_data_yahoo(t, start="2017-01-01", end="2017-04-30")['Adj Close']
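Make the thread sleep between reads. This will probably work most of the time, so try 5-6 times and save the data to a CSV file so that next time you can read it from the file.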
### code is here ###
import pandas_datareader as web
import time
import datetime as dt
import pandas as pd
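symbols = ['AAPL', 'MSFT', 'AABA', 'DB', 'GLD']
# placeholder dates (assumed, not from the original answer); adjust as needed
startDate = dt.datetime(2017, 1, 1)
endDate = dt.datetime.today()
webData = pd.DataFrame()
for stockSymbol in symbols:
    webData[stockSymbol] = web.DataReader(stockSymbol,
                                          data_source='yahoo', start=startDate,
                                          end=endDate, retry_count=10)['Adj Close']
    time.sleep(22) # thread sleep for 22 seconds.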
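The "retry a few times, then cache to CSV" idea can be sketched roughly as follows. To keep the example runnable without a network connection, the downloader is passed in as a `fetch` function and a deliberately flaky fake stands in for it; `load_with_cache` and `flaky_fetch` are hypothetical names, and in real use `fetch` would wrap a call such as `web.DataReader`.

```python
import os
import tempfile
import time
import pandas as pd

def load_with_cache(symbol, fetch, cache_dir, attempts=5, pause=1.0):
    """Read symbol from a CSV cache, or download it with retries and cache it."""
    path = os.path.join(cache_dir, symbol + '.csv')
    if os.path.exists(path):
        # a previous run already saved the data; no download needed
        return pd.read_csv(path, index_col=0, parse_dates=True)
    last_error = None
    for _ in range(attempts):
        try:
            frame = fetch(symbol)
        except Exception as error:
            last_error = error
            time.sleep(pause)  # wait before the next attempt
            continue
        frame.to_csv(path)
        return frame
    raise RuntimeError('could not download ' + symbol) from last_error

# Fake downloader that fails twice before succeeding (for illustration only)
calls = {'count': 0}
def flaky_fetch(symbol):
    calls['count'] += 1
    if calls['count'] < 3:
        raise IOError('simulated download failure')
    return pd.DataFrame({'Adj Close': [10.0, 11.0]},
                        index=pd.to_datetime(['2017-01-03', '2017-01-04']))

with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache('AAPL', flaky_fetch, tmp, pause=0)
    second = load_with_cache('AAPL', flaky_fetch, tmp, pause=0)  # served from CSV
print(calls['count'])
```

The second call never touches the downloader, which is the point of saving the CSV.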
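This question is old, but here I am. On the yfinance pypi.org project page I found a section titled "pandas_datareader override". It states:
"If your code uses pandas_datareader and you want to download data faster, you can "hijack" pandas_datareader.data.get_data_yahoo() method to use yfinance while making sure the returned data is in the same format as pandas_datareader's get_data_yahoo()."
They also provide the following code sample, which currently works.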
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)
# download dataframe
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")
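For readers wondering what "hijacking" a method means here, the general technique is to rebind a module attribute to a different function. The sketch below only illustrates that idea with a stand-in namespace; it is not yfinance's actual implementation, and every name in it (`fake_pdr_data`, `original_get_data_yahoo`, `replacement_download`, `override`) is hypothetical.

```python
import types

# Stand-in for the pandas_datareader.data module (not the real module)
fake_pdr_data = types.SimpleNamespace()

def original_get_data_yahoo(symbol, start=None, end=None):
    return 'original downloader: ' + symbol

fake_pdr_data.get_data_yahoo = original_get_data_yahoo

def replacement_download(symbol, start=None, end=None):
    # same call signature, so existing callers keep working unchanged
    return 'replacement downloader: ' + symbol

def override(module):
    """Rebind the module's get_data_yahoo to the replacement function."""
    module.get_data_yahoo = replacement_download

before = fake_pdr_data.get_data_yahoo('SPY')
override(fake_pdr_data)
after = fake_pdr_data.get_data_yahoo('SPY')
print(before)
print(after)
```

Code that keeps calling `get_data_yahoo` through the module picks up the replacement without any other change.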
No loop needed:
from pandas_datareader import data as pdr
import yfinance as yf
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
yf.pdr_override()
##Symbol Name
##^IRX 13 WEEK TREASURY BILL
##^FVX Treasury Yield 5 Years
##^TNX CBOE Interest Rate 10 Year T No
##^TYX Treasury Yield 30 Years
symbols = ['^TYX', '^TNX', '^FVX', '^IRX', 'DX=F', '6E=F', '6J=F', 'ES=F', 'GC=F', 'CL=F']
df = pdr.get_data_yahoo(symbols, start= '2015-01-01', end= dt.datetime.today())['Adj Close']
print(df)
# plot the current assets
df.plot(subplots=True, layout=(5, 2), figsize=(12,12), sharex=False, ylabel='%', title='Current Assets')
# tweak the layout
##plt.tight_layout()
plt.show()