How to sequentially update each NaN value in a DataFrame column

Question (votes: 1, answers: 2)

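I have the following DataFrame a, with 1762 rows × 9 columns. In the 'ema' column, every element except the 13th is NaN. The 'ind' column contains the index of the corresponding row.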

a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0        NaN   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19

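For every element of the 'ema' column from the 14th row onward (that is, where the value in the 'ind' column is 13 or greater), I want to set it to 0.84 * (the 'ema' value of the previous row) + 0.16 * (the 'open' value of the previous row). I tried the following apply call: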

a['ema']=a.apply(lambda x: (a.loc[x['ind']-1,'open']*0.16 + a.loc[x['ind']-1, 'ema']*0.84) if x['ind']>12 else x['ema'] ,axis=1)

But I see that only the element in the 14th row has changed; the subsequent ones are still NaN.

a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0  16.805099   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19

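Another odd thing is that if I run the command above again and again, each run produces a correct 'ema' value for one more row. Can someone explain what is going wrong here?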

pandas dataframe lambda apply
2 Answers
0
votes

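Problems with the current script

  • if x['ind']>12 else x['ema']: for ind of 12 or below, nothing changes.
  • a.loc[x['ind']-1,'ema']: you are computing ema from the previous row's values of ema and open.
    • At the start, ema contains only one value, so only the next row gets filled.
    • The fill does not happen in place, so the remaining values stay unfilled until the script is run again.
  • When a value is computed from NaN, the result is NaN.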
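The behaviour described in the list above can be reproduced on a toy frame. This is only a sketch: the numbers are made up, the seed sits at ind 1 instead of ind 12, and one_pass is a hypothetical helper name, not something from the question.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up numbers): one seed value in 'ema', NaN everywhere else.
df = pd.DataFrame({
    "open": [10.0, 11.0, 12.0, 13.0, 14.0],
    "ema": [np.nan, 20.0, np.nan, np.nan, np.nan],
})
df["ind"] = np.arange(len(df))

def one_pass(frame):
    # Every row reads from the existing frame; values computed for earlier
    # rows during this same pass are not visible yet.
    return frame.apply(
        lambda x: frame.loc[int(x["ind"]) - 1, "open"] * 0.16
        + frame.loc[int(x["ind"]) - 1, "ema"] * 0.84
        if x["ind"] > 1 else x["ema"],
        axis=1,
    )

df["ema"] = one_pass(df)
filled_after_first = int(df["ema"].notna().sum())   # seed plus one new row

df["ema"] = one_pass(df)
filled_after_second = int(df["ema"].notna().sum())  # one more row filled
```

Each call fills exactly one additional row, which matches the "one correct value per run" effect reported in the question.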

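With np.where

  • First, the function can be vectorized with np.where.
  • The only solution I could come up with is to loop.
    • I don't like it much, because it repeats a vectorized operation many times in a loop.

import numpy as np

condition = df['ind'] > 12
# Each pass fills one more row, so repeat once per row that needs a value.
for _ in range(len(df[condition])):
    df['ema'] = np.where(condition, df['open'].shift()*0.16 + df['ema'].shift()*0.84, df['ema'])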
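As an alternative to re-running a vectorized expression once per row, the recurrence can be walked a single time over plain NumPy arrays. This is a minimal sketch, not part of the original answer: fill_ema is a hypothetical helper, and it assumes the first non-NaN 'ema' is the seed and that every later row should be overwritten.

```python
import numpy as np
import pandas as pd

def fill_ema(frame, alpha=0.16):
    """Apply ema[i] = (1 - alpha) * ema[i-1] + alpha * open[i-1] after the seed row."""
    ema = frame["ema"].to_numpy(dtype=float, copy=True)
    open_ = frame["open"].to_numpy(dtype=float)
    # Position of the first non-NaN value, used as the seed (assumption).
    start = int(np.flatnonzero(~np.isnan(ema))[0])
    for i in range(start + 1, len(ema)):
        ema[i] = (1 - alpha) * ema[i - 1] + alpha * open_[i - 1]
    return pd.Series(ema, index=frame.index, name="ema")

# A few rows around the seed, copied from the question's table.
demo = pd.DataFrame({
    "open": [16.650000, 16.389999, 16.080000, 16.070000],
    "ema": [np.nan, 16.884166, np.nan, np.nan],
})
demo["ema"] = fill_ema(demo)
```

The first filled value agrees with the 16.805099 shown in the question's second table, and rows before the seed stay NaN.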

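With apply

  • Updates a global variable.
  • This does not require traversing the DataFrame multiple times.

import numpy as np

prev_ema = np.nan  # last ema value computed so far

def test(x):
    global prev_ema
    if x['ind'] > 12:
        if not np.isnan(df.loc[x['ind']-1, 'ema']):
            # The previous row already has an ema (the seed row): use it.
            prev_ema = df.loc[x['ind']-1, 'open']*0.16 + df.loc[x['ind']-1, 'ema']*0.84
            return prev_ema
        else:
            # Otherwise continue from the value remembered in prev_ema.
            prev_ema = df.loc[x['ind']-1, 'open']*0.16 + prev_ema*0.84
            return prev_ema
    else:
        return x['ema']

# Assign the result back; apply alone does not modify df.
df['ema'] = df.apply(test, axis=1)

0
votes

The problem is that apply computes the entire new column first and only assigns the result at the end.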
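Because the requested update has the form y[t] = 0.84 * y[t-1] + 0.16 * x[t], it may also be expressible with pandas' exponentially weighted mean using adjust=False, which avoids both apply and an explicit loop. The following is a sketch under assumptions, not something either answer proposed: it assumes the seed is the first non-NaN 'ema', that 'open' has no missing values after the seed, and it uses a small excerpt of the question's data.

```python
import numpy as np
import pandas as pd

# A few rows around the seed, copied from the question's table.
demo = pd.DataFrame({
    "open": [16.650000, 16.389999, 16.080000, 16.070000],
    "ema": [np.nan, 16.884166, np.nan, np.nan],
})

start = demo["ema"].first_valid_index()   # label of the seed row
x = demo["open"].shift()                  # previous row's 'open'
x.loc[start] = demo.loc[start, "ema"]     # the seed becomes the first input

# With adjust=False: y[0] = x[0] and y[t] = (1 - alpha) * y[t-1] + alpha * x[t].
demo.loc[start:, "ema"] = x.loc[start:].ewm(alpha=0.16, adjust=False).mean()
```

Since the whole column is produced in one vectorized call, no row depends on a partially assigned result.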
