是否有Python的方法可以在时间序列数据帧中按列向下浏览并选择序列中的第一个数字,然后将其向前推直到下一个NaN,然后获取下一个非NaN编号并按该一个向下直到下一个NaN,依此类推(保留索引和NaN)。
例如,我想转换此数据框:
DF = pd.DataFrame(data={'A':[np.nan,1,3,5,7,np.nan,2,4,6,np.nan], 'B':[8,6,4,np.nan,np.nan,9,7,3,np.nan,3], 'C':[np.nan,np.nan,4,2,6,np.nan,1,5,2,8]})
A B C
0 NaN 8.0 NaN
1 1.0 6.0 NaN
2 3.0 4.0 4.0
3 5.0 NaN 2.0
4 7.0 NaN 6.0
5 NaN 9.0 NaN
6 2.0 7.0 1.0
7 4.0 3.0 5.0
8 6.0 NaN 2.0
9 NaN 3.0 8.0
至此数据框:
Result = pd.DataFrame(data={'A':[np.nan,1,1,1,1,np.nan,2,2,2,np.nan], 'B':[8,8,8,np.nan,np.nan,9,9,9,np.nan,3], 'C':[np.nan,np.nan,4,4,4,np.nan,1,1,1,1]})
A B C
0 NaN 8.0 NaN
1 1.0 8.0 NaN
2 1.0 8.0 4.0
3 1.0 NaN 4.0
4 1.0 NaN 4.0
5 NaN 9.0 NaN
6 2.0 9.0 1.0
7 2.0 9.0 1.0
8 2.0 NaN 1.0
9 NaN 3.0 1.0
我知道我可以使用循环来遍历各列来执行此操作,但是希望在较大的数据帧上以更有效的Python方式进行操作时会有所帮助。谢谢。
IIUC:
# where DF is not NaN
mask = DF.notna()
Result = (DF.shift(-1) # fill the original NaN's with their next value
.mask(mask) # replace all the original non-NaN with NaN
.ffill() # forward fill
.fillna(DF.iloc[0]) # starting of the the columns with a non-NaN
.where(mask) # replace the original NaN's back
)
输出:
A B C
0 NaN 8.0 NaN
1 1.0 8.0 NaN
2 1.0 8.0 4.0
3 1.0 NaN 4.0
4 1.0 NaN 4.0
5 NaN 9.0 NaN
6 2.0 9.0 1.0
7 2.0 9.0 1.0
8 2.0 NaN 1.0
9 NaN 3.0 1.0