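For context, my main dataset is a 24541-row x 1830-column DataFrame containing either NaN or floats (stock prices). I process this DataFrame 11 times, each time setting values in a template DataFrame that has the same index and columns. Examples of both DataFrames are below: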
data = pd.read_csv(filepath, index_col=0, parse_dates=True)
data = pd.DataFrame(data=data, dtype=np.float64)
#dataset of daily prices
data.head()
Out[14]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
1925-12-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-02 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-04 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-05 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-06 NaN NaN NaN NaN ... NaN NaN NaN NaN
[5 rows x 1830 columns]
MA_a_frame = pd.DataFrame(
data=0,
index=data.index,
columns=data.columns)
#bool DataFrame
MA_a_frame.head()
Out[15]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
1925-12-31 0 0 0 0 ... 0 0 0 0
1926-01-02 0 0 0 0 ... 0 0 0 0
1926-01-04 0 0 0 0 ... 0 0 0 0
1926-01-05 0 0 0 0 ... 0 0 0 0
1926-01-06 0 0 0 0 ... 0 0 0 0
[5 rows x 1830 columns]
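Values in MA_a_frame (and in the 10 other identical DataFrames) are set to 1 if a particular condition in the DataFrame "data" is met: namely, if the price in "data" is within 1% (the parameter "j") of a calculated value held in an entirely different DataFrame generated by a previous function. In total, up to 3 large DataFrames are handled in each iteration.
As for my iterators, I simply used data.columns and data.index to create two separate lists ("dates" and "securities"), so I am essentially iterating over the index and columns of data indirectly. Without further ado, here is the basis of the code, which runs a total of 11 times in my program (the part I am trying to speed up!):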
def gen_a():
    for date in dates:
        for security in securities:
            try:
                if type(data.loc[date, security]) is not float:
                    pass
                    #lots of the data is NaN, so skip these altogether
                elif j > math.log(
                        MA_a_csv.loc[date, security]/
                        data.loc[date, security]) > -j:
                    MA_dict['a'].loc[date, security] = 1
                    print(f'Passed {date}, {security}')
            except:
                print(f'Failed {date}, {security}')
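Now, the problem is that one cycle of this code takes roughly 8 hours, so I am looking at nearly 90 hours for every full run. I have an academic paper due as a graduation requirement, and with numbers like these the deadline is really starting to scare me! Assuming my output is perfect, things should be fine, but if anyone has a suggestion that could cut the runtime, I would be eternally grateful. Otherwise I may have to narrow my data range, which would reduce the power of my statistical analysis.
P.S. I am running Spyder on Windows 10 with an Intel i7 3970X. I don't have access to any other computing power. I have considered GPU acceleration, but my GPU is a GTX 670, which is not Pascal and is therefore not compatible with CuDF.
Edit:
Here are the last five rows of the data DataFrame: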
s.head()
Out[16]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
2018-12-24 61.55 232.70000 NaN NaN ... NaN 15.71 NaN NaN
2018-12-26 65.11 244.59000 NaN NaN ... NaN 16.48 NaN NaN
2018-12-27 64.71 252.17999 NaN NaN ... NaN 16.71 NaN NaN
2018-12-28 64.96 249.64999 NaN NaN ... NaN 16.55 NaN NaN
2018-12-31 66.09 254.50000 NaN NaN ... NaN 16.74 NaN NaN
[5 rows x 1830 columns]
And here is an example of one of the comparison DataFrames:
Out[23]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
2018-12-24 76.3430 258.376200 NaN NaN ... NaN 19.8672 NaN NaN
2018-12-26 75.9530 258.143600 NaN NaN ... NaN 19.7980 NaN NaN
2018-12-27 75.5552 258.127199 NaN NaN ... NaN 19.7238 NaN NaN
2018-12-28 75.1382 257.878799 NaN NaN ... NaN 19.6440 NaN NaN
2018-12-31 74.7716 257.683199 NaN NaN ... NaN 19.5600 NaN NaN
[5 rows x 1830 columns]
Edit 2:
As requested, here is data.head().to_dict():
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'44792': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85753': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20220': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12044': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20239': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28433': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12052': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12060': {Timestamp('1925-12-31 00:00:00'): 326.0,
Timestamp('1926-01-02 00:00:00'): 326.5,
Timestamp('1926-01-04 00:00:00'): 325.0,
Timestamp('1926-01-05 00:00:00'): 325.5,
Timestamp('1926-01-06 00:00:00'): 326.25},
'12062': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85792': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12067': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77605': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77606': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20263': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12073': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12076': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12079': {Timestamp('1925-12-31 00:00:00'): 117.5,
Timestamp('1926-01-02 00:00:00'): 124.25,
Timestamp('1926-01-04 00:00:00'): 127.125,
Timestamp('1926-01-05 00:00:00'): 123.75,
Timestamp('1926-01-06 00:00:00'): 124.5},
'61241': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12095': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28484': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53065': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20298': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77644': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28505': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53081': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77659': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12124': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77661': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28513': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61284': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77668': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12140': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85869': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20343': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28548': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77702': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12167': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85908': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12183': {Timestamp('1925-12-31 00:00:00'): 78.5,
Timestamp('1926-01-02 00:00:00'): 78.0,
Timestamp('1926-01-04 00:00:00'): 77.5,
Timestamp('1926-01-05 00:00:00'): 76.875,
Timestamp('1926-01-06 00:00:00'): 76.5},
'44951': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85913': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85914': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12191': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20386': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77730': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28580': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85926': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20394': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69550': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12212': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20407': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12220': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20415': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77768': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85963': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20431': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45014': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61399': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69607': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85991': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53225': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20474': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20482': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86021': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45065': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12298': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69649': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12308': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20503': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45081': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86041': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12319': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20511': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12343': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12345': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20554': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12369': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20562': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86102': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20570': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86111': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12394': {Timestamp('1925-12-31 00:00:00'): 123.5,
Timestamp('1926-01-02 00:00:00'): 124.0,
Timestamp('1926-01-04 00:00:00'): 123.25,
Timestamp('1926-01-05 00:00:00'): 123.5,
Timestamp('1926-01-06 00:00:00'): 122.75},
'36978': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86136': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28804': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86158': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12431': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61583': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20626': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77976': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53401': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86176': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12449': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69796': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12456': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45225': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12458': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20650': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28847': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
...}
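Unfortunately, I've run out of space in this post, but MA_a_csv.head().to_dict() produces the same thing as above, except that it is all NaN rather than having the occasional data point.
I made my own sample data generator based on the examples you gave. I think it matches what you have, but let me know if it doesn't. As long as the data matches, don't worry about the specifics of how I built it.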
import numpy as np
import pandas as pd

rows = 6
cols = 5
j = 0.01  # assumed: the 1% tolerance described in the question
np.random.seed(0)
data = pd.DataFrame(np.random.rand(rows, cols) * 100,
                    index=pd.date_range(start='1928-12-31', periods=rows, freq='D'))
# blank out the start of a few randomly chosen columns
nan_cols = len(data.columns) // 2
random_indices = zip(pd.Series(data.index.values[:-rows // 2])
                     .sample(nan_cols, random_state=1, replace=True),
                     pd.Series(data.columns).sample(nan_cols, random_state=2))
for row, col in random_indices:
    data.loc[:row, col] = np.nan
# comparison frame: each price moved by up to 2% in either direction
MA_a_csv = data * (1 + (np.random.rand(rows, cols) / 50
                        * np.random.choice([-1, 1], size=(rows, cols))))
So `data` looks like
0 1 2 3 4
1928-12-31 54.881350 71.518937 NaN 54.488318 NaN
1929-01-01 64.589411 43.758721 NaN 96.366276 38.344152
1929-01-02 79.172504 52.889492 56.804456 92.559664 7.103606
1929-01-03 8.712930 2.021840 83.261985 77.815675 87.001215
1929-01-04 97.861834 79.915856 46.147936 78.052918 11.827443
1929-01-05 63.992102 14.335329 94.466892 52.184832 41.466194
and `MA_a_csv` looks like
0 1 2 3 4
1928-12-31 55.171734 72.626384 NaN 55.107778 NaN
1929-01-01 63.791557 44.294412 NaN 98.185186 38.867028
1929-01-02 78.603241 53.351780 57.597027 92.448175 7.008877
1929-01-03 8.829794 2.013333 83.047291 77.324770 86.368349
1929-01-04 98.977844 80.616881 45.235708 77.893620 11.876852
1929-01-05 63.785651 14.522579 94.945445 52.671519 41.668902
I ran these through something that looks like your `gen_a`, then wrote a vectorized version that gives the same answer:
logs = np.log(MA_a_csv / data)
ans = ((j > logs) & (logs > -j)).replace({True: 1, False: 0})
ans
where `ans` is
0 1 2 3 4
1928-12-31 1 0 0 0 0
1929-01-01 0 0 0 0 0
1929-01-02 1 1 0 1 0
1929-01-03 0 1 1 1 1
1929-01-04 0 1 0 1 1
1929-01-05 1 0 1 1 1
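`np.log` can operate on the entire array at once, and pandas is presumably doing something fancy to vectorize the greater-than comparisons. `&` is bitwise and, applied elementwise, so it just checks that both conditions are true at each position.
This is 180x faster than my version of `gen_a`, which had no try/except or print statements, so it should be an even bigger improvement over your code.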
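To check that the two approaches agree, a cell-by-cell loop and the vectorized expression can be run side by side on small frames. The sketch below is only an illustration: the frames are made up, `j = 0.01` is assumed from the 1% in the question, and `flag_loop` and `flag_vectorized` are hypothetical names rather than functions from the original code.

```python
import math

import numpy as np
import pandas as pd

j = 0.01  # assumed tolerance, the 1% from the question

# Small made-up frames standing in for `data` and `MA_a_csv`.
rng = np.random.default_rng(0)
idx = pd.date_range("1928-12-31", periods=6, freq="D")
data = pd.DataFrame(rng.random((6, 5)) * 100, index=idx)
data.iloc[:2, 2] = np.nan  # leading NaNs, as in the real price data
MA_a_csv = data * (1 + rng.uniform(-0.02, 0.02, size=data.shape))


def flag_loop(prices, reference, tol):
    """Cell-by-cell version, shaped like the question's gen_a."""
    out = pd.DataFrame(0, index=prices.index, columns=prices.columns, dtype="int64")
    for date in prices.index:
        for security in prices.columns:
            price = prices.loc[date, security]
            ref = reference.loc[date, security]
            if pd.isna(price) or pd.isna(ref):
                continue  # skip missing cells
            if tol > math.log(ref / price) > -tol:
                out.loc[date, security] = 1
    return out


def flag_vectorized(prices, reference, tol):
    """Whole-frame version; comparisons against NaN evaluate to False."""
    logs = np.log(reference / prices)
    return ((logs < tol) & (logs > -tol)).astype("int64")


loop_result = flag_loop(data, MA_a_csv, j)
vec_result = flag_vectorized(data, MA_a_csv, j)
print(loop_result.equals(vec_result))
```

Because any comparison with NaN is False, the vectorized version leaves missing cells at 0 without an explicit NaN check.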
You also don't need the `.replace({True: 1, False: 0})` part - in Python `1 == True` is true, as is `0 == False`, so you should be able to use them interchangeably.
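As a small illustration of that point, a boolean frame can be summed directly, and `astype` gives an explicit 0/1 frame if an integer dtype is wanted later on. The values below are made up and `j = 0.01` is an assumption.

```python
import numpy as np
import pandas as pd

j = 0.01  # assumed tolerance

# Made-up log ratios, including a NaN.
logs = pd.DataFrame({"a": [0.005, 0.02, np.nan], "b": [-0.003, -0.5, 0.0]})
mask = (logs < j) & (logs > -j)  # boolean DataFrame

hits_per_column = mask.sum()     # True counts as 1 when summed
as_int = mask.astype("int64")    # explicit 0/1 if an integer frame is needed

print(hits_per_column.to_dict())
print(as_int)
```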
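Let me know if you have any questions about this. For further reading, I'd recommend Tom Augspurger's Modern Pandas articles - the Fast Pandas section is especially applicable.
Combining two short comments into one answer.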
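1) The statement
j > math.log(
    MA_a_csv.loc[date, security]/
    data.loc[date, security]) > -j
can be simplified with `abs`, e.g. `j > abs(...)`, and can be sped up significantly by computing the logs separately and using the fact that `log(a/b) == log(a) - log(b)`.
Even if the calculation is only done once per cell, you can compute it and write it back, to speed up re-runs.
2) If you have those print statements in your actual code, they will take up a considerable portion of the total time.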
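Applied to whole frames, the two suggestions in 1) might look like the sketch below: `log(data)` is computed once and reused for every comparison frame, and the two-sided test becomes a single `abs`. The frames, the `references` dict and `j = 0.01` are made up for illustration and are not taken from the original code.

```python
import numpy as np
import pandas as pd

j = 0.01  # assumed tolerance

idx = pd.date_range("2018-12-24", periods=3, freq="D")
data = pd.DataFrame({"A": [100.0, 101.0, np.nan], "B": [50.0, 50.2, 49.0]}, index=idx)

# Hypothetical comparison frames keyed by name; the question has 11 of these.
references = {
    "a": data * 1.005,  # within 1% wherever data is present
    "b": data * 1.05,   # outside 1% everywhere
}

log_data = np.log(data)  # computed once, reused for every comparison frame

# log(ref / data) == log(ref) - log(data), and j > x > -j is the same as abs(x) < j
flags = {
    name: (np.abs(np.log(ref) - log_data) < j).astype("int64")
    for name, ref in references.items()
}

print(flags["a"])
```

If the logs are also needed in later runs, `log_data` could be written to disk once and read back instead of being recomputed.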
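Maybe use the `chunksize` parameter when reading the csv. It is measured in rows, and when it is set `read_csv` returns an iterator of DataFrames rather than a single DataFrame. You'll need to play around to find the best size, but I've heard a good rule of thumb is to pick a chunk that takes up about half of the available memory.
chunks = pd.read_csv("your.csv", chunksize=rows_per_chunk)
When writing the results back to a file, you need to make sure the append mode is set:
df.to_csv("yourresults.csv", mode='a')
Either delete the file each time you run the code, or make sure the first call to `to_csv()` is done in write mode (the default).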
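Put together, a chunked read and append might look like the following sketch. The file names, the `DATE` index column, the tiny made-up frames and `rows_per_chunk = 4` are all assumptions for illustration. The first chunk is written in write mode with a header and later chunks are appended without one, so the header is not repeated in the output.

```python
import os
import tempfile

import numpy as np
import pandas as pd

j = 0.01  # assumed tolerance

# Write small made-up price and comparison files so the sketch is self-contained.
tmpdir = tempfile.mkdtemp()
price_path = os.path.join(tmpdir, "prices.csv")
ref_path = os.path.join(tmpdir, "reference.csv")
out_path = os.path.join(tmpdir, "flags.csv")

idx = pd.date_range("2018-12-24", periods=10, freq="D", name="DATE")
prices = pd.DataFrame(
    {"A": np.linspace(100, 109, 10), "B": np.linspace(50, 59, 10)}, index=idx
)
prices.to_csv(price_path)
(prices * 1.005).to_csv(ref_path)

rows_per_chunk = 4  # illustrative; tune to the memory actually available
price_chunks = pd.read_csv(
    price_path, index_col="DATE", parse_dates=True, chunksize=rows_per_chunk
)
ref_chunks = pd.read_csv(
    ref_path, index_col="DATE", parse_dates=True, chunksize=rows_per_chunk
)

for i, (price, ref) in enumerate(zip(price_chunks, ref_chunks)):
    flags = (np.abs(np.log(ref / price)) < j).astype("int64")
    # First chunk creates the file with a header; later chunks append without one.
    flags.to_csv(out_path, mode="w" if i == 0 else "a", header=(i == 0))

result = pd.read_csv(out_path, index_col="DATE", parse_dates=True)
print(result.shape)
```

Chunking mainly helps when the frames do not fit in memory. A 24541 x 1830 float frame is a few hundred megabytes, so on many machines the fully in-memory vectorized approach may already be enough.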
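Other options I would try:
1) Use a cloud resource such as AWS EC2 to rent a high-spec, high-memory machine, transfer your data and code to it and let it run the code. It should be much faster.
2) I was thinking of using something like Pyspark to split the load across multiple machines, but if you're not familiar with it, it could take a while to get up to speed.
Good luck!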