How to successively update every NaN value in a DataFrame column

Question

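I have the following DataFrame `a`, with 1762 rows × 9 columns. In the `ema` column, every value except the 13th element (`ind` 12) is NaN. The `ind` column holds the index of the corresponding row.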

a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0        NaN   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19

For every element of the `ema` column from the 14th row onward (that is, `ind` 13 and later), I want to set it to 0.84*(ema value in previous row) + 0.16*(value of 'open' in previous row), using the following `apply` call:

a['ema']=a.apply(lambda x: (a.loc[x['ind']-1,'open']*0.16 + a.loc[x['ind']-1, 'ema']*0.84) if x['ind']>12 else x['ema'] ,axis=1)

Only the element in the 14th row is updated; the subsequent rows are still NaN.

a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0  16.805099   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19

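Running the command repeatedly does produce the correct `ema` values for the later rows, but only one row per run.
Can anyone tell me what is wrong here?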
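For reference, the update described in the question is a first-order recurrence seeded by the single known value at `ind` 12 (the symbols $e_i$ for `ema` and $o_i$ for `open` in row $i$ are our notation, not the asker's):

$$
e_i = 0.84\,e_{i-1} + 0.16\,o_{i-1}, \qquad i \ge 13, \qquad e_{12} = 16.884166
$$

For example, $e_{13} = 0.84 \times 16.884166 + 0.16 \times 16.389999 \approx 16.805099$, which matches row 13 in the second printout. Because each $e_i$ needs $e_{i-1}$, any computation that only reads the original column can add at most one new value per pass, which fits the one-row-per-run behavior described above.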

Answer 1

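Issues with the current script

- `if x['ind']>12 else x['ema']` ensures that nothing up to and including `ind` 12 is changed.
- `a.loc[x['ind']-1,'ema']`: you are calculating `ema` from the previous row's `open` and `ema` values.
  - At the start, there is only one value in `ema`, so only the next row gets filled.
  - The fill does not happen in place, so the remaining values stay unfilled until you run the script again.
- When you calculate a value with NaN, the result is NaN.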
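The NaN point can be checked directly. This small sketch uses row 13's `open` value from the question and treats row 13's `ema` as still NaN, as it is in the original column when row 14 is computed:

```python
import numpy as np
import pandas as pd

prev_ema = np.nan       # row 13's 'ema' as seen in the original, unmodified column
prev_open = 16.080000   # row 13's 'open' from the question

# Any arithmetic involving NaN yields NaN, so row 14 stays NaN
# until row 13's 'ema' has actually been written back.
new_ema = 0.84 * prev_ema + 0.16 * prev_open
print(np.isnan(new_ema))  # True

# pandas propagates NaN element-wise in the same way.
s = pd.Series([16.884166, np.nan])
print((0.84 * s + 0.16 * 16.0).isna().tolist())  # [False, True]
```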

With `apply`

Update a global variable

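import numpy as np
import pandas as pd

# df is the question's DataFrame (named `a` there); it needs a default
# RangeIndex so that df.loc[x['ind'] - 1, ...] selects the previous row.
updated_ema = np.nan  # most recent EMA, carried from one row to the next

def test(x):
    global updated_ema
    if x['ind'] > 12:
        prev_ema = df.loc[x['ind'] - 1, 'ema']
        prev_open = df.loc[x['ind'] - 1, 'open'] * 0.16
        if not np.isnan(prev_ema):
            # Previous row has a real value (the seed at ind 12).
            updated_ema = prev_open + prev_ema * 0.84
        else:
            # Previous row is still NaN in df, so reuse the EMA computed
            # on the previous call (relies on apply visiting rows in order).
            updated_ema = prev_open + updated_ema * 0.84
        return updated_ema
    else:
        return x['ema']


df['ema'] = df.apply(test, axis=1)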
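The same idea can be written without a module-level `global` by keeping the running value in a closure. This is a minimal sketch, not part of the original answer: `make_ema_updater` is a hypothetical helper name, it assumes the row labels equal `ind` and that `apply` visits rows in order, and the three sample rows are copied from the question's data:

```python
import numpy as np
import pandas as pd

def make_ema_updater(frame, start=12, alpha=0.16):
    """Return a row function for frame.apply(..., axis=1) that carries the
    last computed EMA between calls instead of using a global variable."""
    state = {"ema": np.nan}

    def update(row):
        i = int(row["ind"])
        if i <= start:
            # Rows up to the seed row keep their value; the seed is remembered.
            state["ema"] = row["ema"]
            return row["ema"]
        # Previous row's 'open' plus the EMA computed on the previous call.
        state["ema"] = (1 - alpha) * state["ema"] + alpha * frame.loc[i - 1, "open"]
        return state["ema"]

    return update

# Rows ind 12 to 14 from the question; labels equal 'ind'.
df = pd.DataFrame(
    {"open": [16.389999, 16.080000, 16.070000],
     "ema": [16.884166, np.nan, np.nan],
     "ind": [12, 13, 14]},
    index=[12, 13, 14],
)
df["ema"] = df.apply(make_ema_updater(df), axis=1)
print(df["ema"].round(6).tolist())  # [16.884166, 16.805099, 16.689083]
```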

Answer 2

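The problem is that `a.apply` computes the entire new column first and only assigns the result at the very end.

This means every calculation is based on the original, unmodified data, which also explains why only one row gets updated.

One workaround is to loop over the rows and update one cell at a time. Incidentally, there is no reason for this to be slower, since `apply` with `axis=1` also loops over the rows in Python.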
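The effect can be reproduced on three rows copied from the question (`ind` 12 to 14, with a `symbol` column so that `x['ind']` stays an integer as in the real data). The `apply` call is the asker's, unchanged; this is an illustrative sketch, not output from the original frame:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    {"symbol": ["YHOO"] * 3,
     "open": [16.389999, 16.080000, 16.070000],
     "ema": [16.884166, np.nan, np.nan],
     "ind": [12, 13, 14]},
    index=[12, 13, 14],
)

# Every row reads a.loc[...] from the ORIGINAL frame: the new column is
# only assigned to a['ema'] after apply has processed all rows.
a['ema'] = a.apply(lambda x: (a.loc[x['ind']-1, 'open']*0.16 + a.loc[x['ind']-1, 'ema']*0.84)
                   if x['ind'] > 12 else x['ema'], axis=1)
print(a['ema'].isna().tolist())  # [False, False, True]
```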
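A minimal sketch of that loop, assuming the seed value sits at row position `seed_pos` (12 in the question) and that rows are in date order; `fill_ema_loop` and `seed_pos` are hypothetical names. It uses positional `.iat` access so it does not depend on the index labels:

```python
import numpy as np
import pandas as pd

def fill_ema_loop(frame, seed_pos=12, alpha=0.16):
    """Fill frame['ema'] in place for rows after seed_pos, one cell at a time,
    so each row reads the value just written for the previous row."""
    ema_col = frame.columns.get_loc("ema")
    open_col = frame.columns.get_loc("open")
    for i in range(seed_pos + 1, len(frame)):
        frame.iat[i, ema_col] = ((1 - alpha) * frame.iat[i - 1, ema_col]
                                 + alpha * frame.iat[i - 1, open_col])
    return frame

# Rows ind 12 to 14 from the question, so the seed is at position 0 here;
# for the full frame the call would be fill_ema_loop(a).
demo = pd.DataFrame({"open": [16.389999, 16.080000, 16.070000],
                     "ema": [16.884166, np.nan, np.nan]})
fill_ema_loop(demo, seed_pos=0)
print(demo["ema"].round(6).tolist())  # [16.884166, 16.805099, 16.689083]
```

Because each assignment lands in the frame before the next iteration reads it, one pass fills the whole column.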
