How to successively update each NaN value in a DataFrame column

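  • I have the following DataFrame a, with 1762 rows × 9 columns. In the ema column, every value is NaN except the 13th element (ind 12). The ind column contains the index of the corresponding row.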
a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0        NaN   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19
  • For every element of the ema column from the 14th row onward (i.e. ind 13 and later), I want to set it to 0.84*(ema value in previous row) + 0.16*(value of 'open' in previous row), using the apply function as follows:
a['ema']=a.apply(lambda x: (a.loc[x['ind']-1,'open']*0.16 + a.loc[x['ind']-1, 'ema']*0.84) if x['ind']>12 else x['ema'] ,axis=1)
  • Only the element in the 14th row is updated; the subsequent rows remain NaN.
a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0  16.805099   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19
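  • Running the command repeatedly does produce the correct ema values for the later rows, but only one row per run.
  • Can anyone tell me what is wrong here?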
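
Written as a recurrence (with e_t the ema value and o_t the open value in row t), the update rule from the second bullet is seeded by the existing value in row 12 and then applied row by row:

```latex
e_{12} \text{ given}, \qquad e_t = 0.84\, e_{t-1} + 0.16\, o_{t-1}, \quad t = 13, 14, \dots
```

Each row depends on the value just computed for the row above it, which is why the rows have to be filled in order.
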
pandas dataframe lambda apply
Answers

0 votes

Problems with the current script

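  • The condition if x['ind']>12 else x['ema'] means that only rows with ind greater than 12 are changed.
  • With a.loc[x['ind']-1,'ema'] you are computing ema from the previous row's open and ema values.
    • At the start there is only one value in ema, so only the next row gets filled.
    • The filling does not happen in place, so the remaining values stay NaN until you run the script again.
  • Any arithmetic involving NaN yields NaN.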
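
A minimal reproduction of the points above, using a hypothetical five-row frame in which row 1 plays the role of row 12 (the prices and the seed value 10.5 are made up). It applies the same pattern as the question's lambda and shows that only the row directly after the seed gets a value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: ema is seeded only at row 1 (stands in for row 12).
df = pd.DataFrame({
    "symbol": "YHOO",
    "open": [10.0, 11.0, 12.0, 13.0, 14.0],
    "ema": [np.nan, 10.5, np.nan, np.nan, np.nan],
})
df["ind"] = df.index

# Same pattern as the question: each row reads the previous row of df.
# apply builds the whole result first and df["ema"] is only replaced
# afterwards, so every lookup sees the original (mostly NaN) column.
df["ema"] = df.apply(
    lambda x: df.loc[x["ind"] - 1, "open"] * 0.16 + df.loc[x["ind"] - 1, "ema"] * 0.84
    if x["ind"] > 1
    else x["ema"],
    axis=1,
)
print(df["ema"].tolist())  # only row 2 gets a new value; rows 3 and 4 stay NaN

# NaN propagates through arithmetic, which is why rows 3 and 4 stay NaN.
print(np.nan * 0.84 + 1.0)  # nan
```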

With apply

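  • Update a global variable
import numpy as np
import pandas as pd

# df is the question's DataFrame (called a there).
# updated_ema holds the most recently computed value so the next row can
# use it, because apply never sees its own results until it finishes.
updated_ema = np.nan

def test(x):
    global updated_ema
    if x['ind'] > 12:
        prev_ema = df.loc[x['ind'] - 1, 'ema']
        prev_open = df.loc[x['ind'] - 1, 'open'] * 0.16
        if not np.isnan(prev_ema):
            # Row 13: the previous row (12) still has its original ema value.
            updated_ema = prev_open + prev_ema * 0.84
        else:
            # Later rows: the previous ema is still NaN in df, so use the
            # value computed for the previous row on the last call.
            updated_ema = prev_open + updated_ema * 0.84
        return updated_ema
    else:
        return x['ema']


# Relies on apply visiting the rows top to bottom, so updated_ema
# always belongs to the row directly above.
df['ema'] = df.apply(test, axis=1)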
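
Because the update rule ema[t] = 0.84*ema[t-1] + 0.16*open[t-1] is a first-order exponential smoothing, it can also be computed without apply or global state, using pandas' exponentially weighted mean with adjust=False (which computes y[0] = x[0] and y[t] = (1 - alpha)*y[t-1] + alpha*x[t]). This is a swapped-in technique that neither answer proposed. The sketch below assumes a default RangeIndex, no NaN in open, and the seed value at position 12; the function name fill_ema and its parameters are hypothetical:

```python
import numpy as np
import pandas as pd

def fill_ema(df: pd.DataFrame, seed_row: int = 12, alpha: float = 0.16) -> pd.Series:
    """Hypothetical helper: return a copy of df['ema'] with rows after
    seed_row filled by ema[t] = (1 - alpha) * ema[t-1] + alpha * open[t-1].
    Assumes a default RangeIndex and no NaN in 'open'."""
    # ewm inputs: position seed_row holds the seed ema value, and each later
    # position t holds open[t-1] (hence the shift by one row).
    x = df["open"].shift(1).iloc[seed_row:].copy()
    x.iloc[0] = df["ema"].iloc[seed_row]
    # With adjust=False: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t],
    # which is exactly the requested recurrence.
    filled = x.ewm(alpha=alpha, adjust=False).mean()
    out = df["ema"].copy()
    out.iloc[seed_row:] = filled.to_numpy()
    return out
```

With the question's frame this would be used as a['ema'] = fill_ema(a). The input frame is not modified; the function returns a new Series.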

0 votes

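The problem is that a.apply computes the entire new column first, and the result is only assigned at the very end.

This means every calculation is based on the original, unmodified data, which explains why only one row gets updated.

One workaround is to loop over the rows and update one cell at a time (incidentally, there is no reason for this approach to be any slower than apply).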
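
A minimal sketch of that row-by-row loop, assuming the rows are sorted by date and the seed value sits at position 12; the helper name fill_ema_loop is hypothetical. It uses .iat (positional scalar access), so each write lands in the frame before the next iteration reads it:

```python
import numpy as np
import pandas as pd

def fill_ema_loop(df: pd.DataFrame, seed_row: int = 12) -> None:
    """Hypothetical helper: fill df['ema'] in place for every row after
    seed_row with 0.84 * previous ema + 0.16 * previous open."""
    ema_col = df.columns.get_loc("ema")
    open_col = df.columns.get_loc("open")
    for i in range(seed_row + 1, len(df)):
        # Unlike apply, this write is visible to the next iteration.
        df.iat[i, ema_col] = 0.84 * df.iat[i - 1, ema_col] + 0.16 * df.iat[i - 1, open_col]
```

With the question's frame this would be called as fill_ema_loop(a). Positional access avoids relying on the index labels, so it also works if the index is not 0, 1, 2, ...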
