Make this pandas code as lean and fast as possible? [Iterating over large DataFrames and setting values]

Question · Votes: 2 · Answers: 3

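For context, my main dataset is a 24541-row x 1830-column DataFrame containing either NaN or floats (stock prices). I process this DataFrame 11 times, each time setting values in a pre-built DataFrame with the same index and columns. Examples of the two DataFrames follow: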

data = pd.DataFrame.from_csv(filepath)
data = pd.DataFrame(data=data, dtype=np.float64)

#dataset of daily prices
data.head()

Out[14]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-02    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-04    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-05    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-06    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN

[5 rows x 1830 columns]

MA_a_frame = pd.DataFrame(
        data=0,
        index=data.index, 
        columns=data.columns)

#bool DataFrame
MA_a_frame.head()

Out[15]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31      0      0      0      0  ...      0      0      0      0
1926-01-02      0      0      0      0  ...      0      0      0      0
1926-01-04      0      0      0      0  ...      0      0      0      0
1926-01-05      0      0      0      0  ...      0      0      0      0
1926-01-06      0      0      0      0  ...      0      0      0      0

[5 rows x 1830 columns]

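If a certain condition on the DataFrame "data" is met, the corresponding value in MA_a_frame (and in 10 other identical DataFrames) is set to 1. Specifically, the condition is that the price in "data" is within 1% (the parameter "j") of a calculated value in an entirely different DataFrame generated by an earlier function. In total, each iteration touches up to 3 large DataFrames.

As for my iterators, I simply build two separate lists ("dates" and "securities") from data.index and data.columns, so I am effectively iterating over the index and columns of data indirectly. Without further ado, here is the core of the code, which runs 11 times in total in my program (this is the part I am trying to speed up!):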

def gen_a():

    for date in dates:

        for security in securities: 

            try: 

                if type(data.loc[date, security]) is not float:

                    pass
                    #lots of the data is NaN, so skip these altogether

                elif j > math.log(
                        MA_a_csv.loc[date, security]/
                        data.loc[date, security]) > -j:

                    MA_dict['a'].loc[date, security] = 1

                print(f'Passed {date}, {security}')

            except: 

                print(f'Failed {date}, {security}')

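Now, the problem is that one pass of this code takes about 8 hours, so I am looking at nearly 90 hours per full run. I have an academic paper due as a graduation requirement, and with these numbers the deadline is really starting to scare me! If my output is perfect the first time, things should be fine, but if anyone has suggestions that would bring the run time down, I would be eternally grateful. Otherwise I may have to narrow my data range, which reduces the power of my statistical analysis.

P.S. I am running Spyder on Windows 10 with an Intel i7 3970X. I have no other computing power available. I have considered GPU acceleration, but my GPU is a GTX 670, which is not Pascal and therefore not compatible with cuDF.

Edit:

Here are the last five rows of the data DataFrame: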

s.head()
Out[16]: 
            49154      65541  32791  65568  ...  24563  81910  24571  90110
DATE                                        ...                            
2018-12-24  61.55  232.70000    NaN    NaN  ...    NaN  15.71    NaN    NaN
2018-12-26  65.11  244.59000    NaN    NaN  ...    NaN  16.48    NaN    NaN
2018-12-27  64.71  252.17999    NaN    NaN  ...    NaN  16.71    NaN    NaN
2018-12-28  64.96  249.64999    NaN    NaN  ...    NaN  16.55    NaN    NaN
2018-12-31  66.09  254.50000    NaN    NaN  ...    NaN  16.74    NaN    NaN

[5 rows x 1830 columns]

Here is a sample of the comparison DataFrame:

Out[23]: 
              49154       65541  32791  65568  ...  24563    81910  24571  90110
DATE                                           ...                              
2018-12-24  76.3430  258.376200    NaN    NaN  ...    NaN  19.8672    NaN    NaN
2018-12-26  75.9530  258.143600    NaN    NaN  ...    NaN  19.7980    NaN    NaN
2018-12-27  75.5552  258.127199    NaN    NaN  ...    NaN  19.7238    NaN    NaN
2018-12-28  75.1382  257.878799    NaN    NaN  ...    NaN  19.6440    NaN    NaN
2018-12-31  74.7716  257.683199    NaN    NaN  ...    NaN  19.5600    NaN    NaN

[5 rows x 1830 columns]

Edit 2:

As requested, here is data.head().to_dict():

  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '44792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85753': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12044': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20239': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28433': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12052': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12060': {Timestamp('1925-12-31 00:00:00'): 326.0,
  Timestamp('1926-01-02 00:00:00'): 326.5,
  Timestamp('1926-01-04 00:00:00'): 325.0,
  Timestamp('1926-01-05 00:00:00'): 325.5,
  Timestamp('1926-01-06 00:00:00'): 326.25},
 '12062': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12067': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77605': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77606': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20263': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12073': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12076': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12079': {Timestamp('1925-12-31 00:00:00'): 117.5,
  Timestamp('1926-01-02 00:00:00'): 124.25,
  Timestamp('1926-01-04 00:00:00'): 127.125,
  Timestamp('1926-01-05 00:00:00'): 123.75,
  Timestamp('1926-01-06 00:00:00'): 124.5},
 '61241': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12095': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28484': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77644': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28505': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77659': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12124': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77661': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28513': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61284': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77668': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12140': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85869': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28548': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77702': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12167': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85908': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12183': {Timestamp('1925-12-31 00:00:00'): 78.5,
  Timestamp('1926-01-02 00:00:00'): 78.0,
  Timestamp('1926-01-04 00:00:00'): 77.5,
  Timestamp('1926-01-05 00:00:00'): 76.875,
  Timestamp('1926-01-06 00:00:00'): 76.5},
 '44951': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85913': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85914': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12191': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20386': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77730': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28580': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85926': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20394': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69550': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12212': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20407': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20415': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77768': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85963': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45014': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61399': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69607': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85991': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20474': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20482': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86021': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69649': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12308': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20503': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86041': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12319': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20511': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12345': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20554': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12369': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20562': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86102': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20570': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86111': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12394': {Timestamp('1925-12-31 00:00:00'): 123.5,
  Timestamp('1926-01-02 00:00:00'): 124.0,
  Timestamp('1926-01-04 00:00:00'): 123.25,
  Timestamp('1926-01-05 00:00:00'): 123.5,
  Timestamp('1926-01-06 00:00:00'): 122.75},
 '36978': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86136': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28804': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86158': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61583': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20626': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77976': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53401': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86176': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12449': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69796': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12456': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12458': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20650': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28847': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 ...}

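Unfortunately I have run out of space in this post, but MA_a_csv.head().to_dict() produces the same output as above, except that every value is NaN instead of there being the occasional data point.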

python pandas dataframe finance
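3 Answers
1
Votes

I built my own sample data generator based on the examples you gave. I think it matches what you have, but let me know if it doesn't. As long as the data matches, don't worry about exactly how I produced it.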

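import numpy as np
import pandas as pd

rows = 6
cols = 5
j = 0.01  # the question's 1% tolerance
np.random.seed(0)
# pd.date_range replaces the removed pd.DatetimeIndex(start=..., freq=...) form
data = pd.DataFrame(np.random.rand(rows, cols) * 100,
                    index=pd.date_range(start='1928-12-31', periods=rows, freq='D'))
nan_cols = len(data.columns) // 2
# pick (date, column) pairs; everything up to that date in that column becomes NaN
random_indices = zip(pd.Series(data.index.values[:-rows // 2])
                     .sample(nan_cols, random_state=1, replace=True),
                     pd.Series(data.columns).sample(nan_cols, random_state=2))
for row, col in random_indices:
    data.loc[:row, col] = np.nan

# comparison frame: each price perturbed by up to +/-2%
MA_a_csv = data * (1 + (np.random.rand(rows, cols) / 50
                        * np.random.choice([-1, 1], size=(rows, cols))))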

So data looks like

                    0          1          2          3          4
1928-12-31  54.881350  71.518937        NaN  54.488318        NaN
1929-01-01  64.589411  43.758721        NaN  96.366276  38.344152
1929-01-02  79.172504  52.889492  56.804456  92.559664   7.103606
1929-01-03   8.712930   2.021840  83.261985  77.815675  87.001215
1929-01-04  97.861834  79.915856  46.147936  78.052918  11.827443
1929-01-05  63.992102  14.335329  94.466892  52.184832  41.466194

and MA_a_csv looks like

                    0          1          2          3          4
1928-12-31  55.171734  72.626384        NaN  55.107778        NaN
1929-01-01  63.791557  44.294412        NaN  98.185186  38.867028
1929-01-02  78.603241  53.351780  57.597027  92.448175   7.008877
1929-01-03   8.829794   2.013333  83.047291  77.324770  86.368349
1929-01-04  98.977844  80.616881  45.235708  77.893620  11.876852
1929-01-05  63.785651  14.522579  94.945445  52.671519  41.668902

I ran this through something that looks like your gen_a, then wrote a vectorized version that gives the same answer:

logs = np.log(MA_a_csv / data)
ans = ((j > logs) & (logs > -j)).replace({True: 1, False: 0})

where ans is

            0  1  2  3  4
1928-12-31  1  0  0  0  0
1929-01-01  0  0  0  0  0
1929-01-02  1  1  0  1  0
1929-01-03  0  1  1  1  1
1929-01-04  0  1  0  1  1
1929-01-05  1  0  1  1  1

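np.log operates on the whole array at once, and pandas is probably doing something fancy to vectorize the greater-than comparisons as well. & works element-wise, so it simply checks that both conditions are true at each position.

This is 180 times faster than my version of gen_a, which had no try/except or print statements, so the improvement over your code should be even bigger.

You also don't need the .replace({True: 1, False: 0}) part. In Python 1 == True is true, as is 0 == False, so you should be able to use them interchangeably.

Let me know if you have any questions. For further reading I recommend Tom Augspurger's Modern Pandas articles; the Fast Pandas part is especially relevant.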
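As a rough sketch of how the vectorized check could be packaged for reuse across all 11 comparison frames: the function name within_band and the tiny frames below are hypothetical, invented only for illustration. It uses .astype(int) in place of .replace, and relies on the fact that any comparison against NaN is False, so missing prices end up as 0 without an explicit skip.

```python
import numpy as np
import pandas as pd

def within_band(prices, reference, j):
    """Return a 0/1 frame: 1 where |log(reference / price)| < j, else 0."""
    logs = np.log(reference / prices)    # NaN wherever either input is NaN
    return (logs.abs() < j).astype(int)  # NaN comparisons are False, so they become 0

# Hypothetical three-day sample, not taken from the question's data.
idx = pd.date_range("2018-12-24", periods=3, freq="D")
prices = pd.DataFrame({"A": [100.0, np.nan, 50.0],
                       "B": [10.0, 10.0, 10.0]}, index=idx)
reference = pd.DataFrame({"A": [100.5, 101.0, 55.0],
                          "B": [np.nan, 10.05, 9.0]}, index=idx)

flags = within_band(prices, reference, j=0.01)
print(flags)
```

Only the cells where both values are present and within 1% of each other are flagged, which is the behaviour the nested loop was aiming for.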


0
Votes

Combining two short comments into one answer.

1) The statement

j > math.log(
   MA_a_csv.loc[date, security]/
   data.loc[date, security]) > -j

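can be simplified by using abs, e.g. j > abs(...)

and can be sped up significantly by computing the logs separately, using the fact that log(a/b) == log(a) - log(b).

Even if a value is computed only once per cell, you can compute it and write it back to disk to speed up reruns.

2) If your real code contains those print statements, they will account for a considerable share of the total time.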
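A minimal sketch of point 1, using only the standard library: it checks that the chained comparison from the question and the abs form with separated logs agree. The function names and sample pairs are hypothetical. The practical benefit of separating the logs is presumably that the log of the price data can be computed once and reused against every comparison frame.

```python
import math

def in_band_chained(reference, price, j):
    """The question's original test: -j < log(reference / price) < j."""
    return j > math.log(reference / price) > -j

def in_band_abs(reference, price, j):
    """Same test using abs() and log(a/b) == log(a) - log(b)."""
    return abs(math.log(reference) - math.log(price)) < j

# Hypothetical (reference, price) pairs, chosen well away from the 1% boundary.
pairs = [(100.5, 100.0), (110.0, 100.0), (99.2, 100.0), (90.0, 100.0)]
results = [(in_band_chained(r, p, 0.01), in_band_abs(r, p, 0.01))
           for r, p in pairs]
print(results)
```

Right at the boundary the two forms could differ by floating-point rounding, which is unlikely to matter for a 1% tolerance on prices.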


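-1
Votes

Maybe use the chunksize parameter when reading the csv. You will need to experiment to find the best size, but a rule of thumb I have heard is to size chunks to about half of your available memory. Note that chunksize is a number of rows, not bytes, so that budget has to be converted into a row count, and that read_csv then returns an iterator of DataFrames rather than a single DataFrame.

chunks = pd.read_csv("your.csv", chunksize=chunk_rows)  # chunk_rows: integer rows per chunk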

When writing results back to a file, make sure the append mode is set:

df.to_csv("yourresults.csv", mode='a')

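Either delete the file each time you run the code, or make sure the first call to to_csv() is made in write mode (the default).

Other options I would try:

1) Use a cloud resource such as AWS EC2 to rent a high-spec, high-memory machine, transfer your data and code to it, and let it run the code. It should be much faster.

2) I was going to suggest something like PySpark to spread the load across multiple machines, but if you are not familiar with it, it could take a while to get up to speed.

Good luck!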
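A hedged sketch of the chunked read and append pattern described above. The function name process_in_chunks, the file names, and the doubling step are placeholders, not part of the answer. The first chunk is written in write mode with a header; later chunks are appended without one, which avoids having to delete the file between runs.

```python
import os
import tempfile

import pandas as pd

def process_in_chunks(src_path, dst_path, chunk_rows):
    """Read src_path chunk_rows rows at a time and append results to dst_path."""
    first = True
    # With chunksize set, read_csv yields DataFrames instead of returning one.
    for chunk in pd.read_csv(src_path, index_col=0, chunksize=chunk_rows):
        result = chunk * 2  # placeholder for the real per-chunk computation
        # First chunk: write mode with header; later chunks: append, no header.
        result.to_csv(dst_path, mode="w" if first else "a", header=first)
        first = False

# Tiny demonstration on temporary files (hypothetical data).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "prices.csv")
dst = os.path.join(tmp_dir, "results.csv")
pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}).to_csv(src)

process_in_chunks(src, dst, chunk_rows=2)
out = pd.read_csv(dst, index_col=0)
print(out)
```

Chunking mainly helps when the data does not fit in memory; it does not by itself remove the cost of a cell-by-cell Python loop.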
