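I want to download a dataset for companies on the Indian stock market, so I wrote the code below to do it, but it takes too long because I want to download data for about 1,700 companies.
First I wrote it the regular way, without threads, as shown below: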
import pandas_datareader as web
import pandas as pd
import csv
import requests
import time
import concurrent.futures
import datetime
from threading import Thread
start = datetime.date.today() - datetime.timedelta(days=10)
end = yesterday = datetime.date.today() - datetime.timedelta(days=1)
t1 = time.perf_counter()
df = web.DataReader("RELIANCE.NS", 'yahoo', start,end)
df = web.DataReader("TCS.NS", 'yahoo', start,end)
df = web.DataReader("HINDUNILVR.NS", 'yahoo', start,end)
df = web.DataReader("HDFCBANK.NS", 'yahoo', start,end)
df = web.DataReader("HDFC.NS", 'yahoo', start,end)
df = web.DataReader("INFY.NS", 'yahoo', start,end)
df = web.DataReader("KOTAKBANK.NS", 'yahoo', start,end)
df = web.DataReader("BHARTIARTL.NS", 'yahoo', start,end)
df = web.DataReader("ITC.NS", 'yahoo', start,end)
df = web.DataReader("ICICIBANK.NS", 'yahoo', start,end)
df = web.DataReader("SBIN.NS", 'yahoo', start,end)
df = web.DataReader("ASIANPAINT.NS", 'yahoo', start,end)
df = web.DataReader("DMART.NS", 'yahoo', start,end)
df = web.DataReader("BAJFINANCE.NS", 'yahoo', start,end)
df = web.DataReader("MARUTI.NS", 'yahoo', start,end)
df = web.DataReader("HCLTECH.NS", 'yahoo', start,end)
df = web.DataReader("LT.NS", 'yahoo', start,end)
df = web.DataReader("WIPRO.NS", 'yahoo', start,end)
df = web.DataReader("AXISBANK.NS", 'yahoo', start,end)
df = web.DataReader("ULTRACEMCO.NS", 'yahoo', start,end)
df = web.DataReader("HDFCLIFE.NS", 'yahoo', start,end)
df = web.DataReader("COALINDIA.NS", 'yahoo', start,end)
df = web.DataReader("ONGC.NS", 'yahoo', start,end)
df = web.DataReader("SUNPHARMA.NS", 'yahoo', start,end)
df = web.DataReader("NTPC.NS", 'yahoo', start,end)
t2 = time.perf_counter()
print(f'Finished in {t2-t1} seconds')
And the output:
Finished in 27.4473087 seconds
Then I watched some YouTube videos about threading and converted the same program as follows:
import pandas_datareader as web
import pandas as pd
import csv
import requests
import time
import concurrent.futures
import datetime
from threading import Thread
start = datetime.date.today() - datetime.timedelta(days=10)
end = yesterday = datetime.date.today() - datetime.timedelta(days=1)
t1 = time.perf_counter()
shareSymbols = [
"RELIANCE.NS", "TCS.NS", "HINDUNILVR.NS", "HDFCBANK.NS", "HDFC.NS", "INFY.NS","KOTAKBANK.NS","BHARTIARTL.NS", "ITC.NS", "ICICIBANK.NS", "SBIN.NS", "ASIANPAINT.NS","DMART.NS", "BAJFINANCE.NS", "MARUTI.NS", "HCLTECH.NS","LT.NS", "WIPRO.NS", "AXISBANK.NS", "ULTRACEMCO.NS", "HDFCLIFE.NS" ,"COALINDIA.NS", "ONGC.NS", "SUNPHARMA.NS", "NTPC.NS"
]
def download_data(shareSymbol):
    df = web.DataReader(shareSymbols, 'yahoo', start,end)
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_data, shareSymbols)
t2 = time.perf_counter()
print(f'Finished in {t2-t1} seconds')
Output:
Finished in 83.4883162 seconds
Why does the first program take less time than the second? Do I need to change anything?
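One thing that stands out in the threaded version: download_data receives shareSymbol but passes the whole shareSymbols list to web.DataReader, so each of the 25 tasks appears to request all 25 symbols, which may explain the slowdown. Below is a minimal sketch of the one-symbol-per-task pattern. The names fetch_one and download_all are hypothetical, and fetch_one only simulates latency so the sketch runs offline; in the real script it would presumably call web.DataReader(symbol, 'yahoo', start, end).

```python
import concurrent.futures
import time


# Hypothetical stand-in for the real download, so the sketch runs offline.
# In the real script the body would presumably be:
#     return web.DataReader(symbol, 'yahoo', start, end)
def fetch_one(symbol):
    time.sleep(0.1)  # simulated network latency
    return f"data for {symbol}"


def download_all(symbols, fetch=fetch_one, max_workers=8):
    """Fetch every symbol once in a thread pool; return {symbol: result}."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map hands ONE symbol to each call and yields the
        # results in the same order as the input list.
        results = executor.map(fetch, symbols)
        return dict(zip(symbols, results))


if __name__ == "__main__":
    symbols = ["RELIANCE.NS", "TCS.NS", "INFY.NS", "ITC.NS"]
    t1 = time.perf_counter()
    data = download_all(symbols)
    t2 = time.perf_counter()
    print(f"Fetched {len(data)} symbols in {t2 - t1:.2f} seconds")
```

Returning the result from the worker and collecting it into a dict also keeps the downloaded data, which the versions above overwrite or discard. For around 1,700 symbols a modest max_workers is probably the safer choice, on the assumption that the data source may throttle many simultaneous requests.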
class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None, initializer=None, initargs=())
can be used for this.