从dataframe列中获取先前的值

问题描述 投票:2回答:1

我是python的新手。现在,这里有一个看起来像这样的数据框。

DocumentId                    offset        feature
0                              0              2000             
0                              7              2000
0                              16             0
0                              27             0
0                              36             0 
0                              40             0
0                              46             0
0                              57             0
0                              63             0
0                              78             0
0                              88             0
0                              91             0   
0                              103           2200
1                               109           0
1                               113           2200
1                               126           2200  
1                               131           2200 
1                                137           2200
1                                 142            0
1                                 152            0
1                                 157           200 
1                                159           200
1                                 161           200
1                                   167            0
1                                 170           200

现在,在此数据帧中,我有一列称为feature。它有一些值,其中也有0值,其他是positive values

现在,每当正值变为零值时,我们就必须取其前三个值。因此,我们假设它将始终以正值开头。

因此来自此数据集,

我首先得到的结果将是

 offset1     feature   offset2   feature   offset3   feature   
       -           -         0        2000       7       2000  

   for the `(-)` as it does not have the third value.

现在,它会继续检查值,只要正值变为零,我就必须取其中的前三个。

所以,在这里,

after 2200 which is at offset 103 value becomes the 0 which is at 109

现在,我必须取109 offset的前三个值,以便数据成为

offset1     feature    offset2    feature   offset3   feature   
   -           -          0        2000       7       2000  
   103        2200        91         0         88         0   

此后从正到0的值变为偏移142,因此,我必须取前三个。

offset1     feature    offset2    feature   offset3   feature   
       -           -          0        2000       7       2000  
       103        2200        91         0         88         0 
       137        2200        131      2200       126      2200
       161        200         159       200       157      200

我有什么办法可以达到这个结果?谢谢。

我尝试过的是

zero_indexes = list(input_with[input_csv['RFC_PREDICTEDFEATURE'] == 0].index)
df2 = pd.DataFrame()
for each_zero_index in zero_indexes:
    value = input_with['feature'].loc[each_zero_index - 1: each_zero_index].values[0]
    if value != 0 :
        df1 = input_with.loc[each_zero_index - 3: each_zero_index]
        df2 = df2.append(df1) 
python python-3.x pandas numpy
1个回答
0
投票

使用strides

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

n = 3
x = np.concatenate([[np.nan] * (n - 1), df['feature'].values])
#change order by indexing [:, ::-1]
arr1 = rolling_window(x, n)[:, ::-1]

print (arr1)
[[2000.   nan   nan]
 [2000. 2000.   nan]
 [   0. 2000. 2000.]
 [   0.    0. 2000.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [2200.    0.    0.]
 [   0. 2200.    0.]
 [2200.    0. 2200.]
 [2200. 2200.    0.]
 [2200. 2200. 2200.]
 [2200. 2200. 2200.]
 [   0. 2200. 2200.]
 [   0.    0. 2200.]
 [ 200.    0.    0.]
 [ 200.  200.    0.]
 [ 200.  200.  200.]
 [   0.  200.  200.]
 [ 200.    0.  200.]]

x = np.concatenate([[np.nan] * (n - 1), df['offset'].values])
arr2 = rolling_window(x, n)[:, ::-1]

print (arr2)
[[  0.  nan  nan]
 [  7.   0.  nan]
 [ 16.   7.   0.]
 [ 27.  16.   7.]
 [ 36.  27.  16.]
 [ 40.  36.  27.]
 [ 46.  40.  36.]
 [ 57.  46.  40.]
 [ 63.  57.  46.]
 [ 78.  63.  57.]
 [ 88.  78.  63.]
 [ 91.  88.  78.]
 [103.  91.  88.]
 [109. 103.  91.]
 [113. 109. 103.]
 [126. 113. 109.]
 [131. 126. 113.]
 [137. 131. 126.]
 [142. 137. 131.]
 [152. 142. 137.]
 [157. 152. 142.]
 [159. 157. 152.]
 [161. 159. 157.]
 [167. 161. 159.]
 [170. 167. 161.]]

然后为'column'的第一个arr1中的正和下一个0值创建掩码:

m =  np.append(arr1[1:, 0] == 0, False) & (arr1[:, 0] != 0)
arr1 = arr1[m] 
arr2 = arr2[m] 

#change order in first row
arr1[:1] = arr1[:1, ::-1]
arr2[:1] = arr2[:1, ::-1]


#create DataFrames and join together    
df1 = pd.DataFrame(arr1).add_prefix('feature_')
df2 = pd.DataFrame(arr2).add_prefix('offset_')
print (df1)
   feature_0  feature_1  feature_2
0        NaN     2000.0     2000.0
1     2200.0        0.0        0.0
2     2200.0     2200.0     2200.0
3      200.0      200.0      200.0

print (df2)
   offset_0  offset_1  offset_2
0       NaN       0.0       7.0
1     103.0      91.0      88.0
2     137.0     131.0     126.0
3     161.0     159.0     157.0

#https://stackoverflow.com/a/45122187/2901002
#change order of columns
a = np.arange(n).astype(str)
cols = [item for x in a for item in ('offset_' + x, 'feature_' + x)]
df = pd.concat([df1, df2], axis=1)[cols]
print (df)
   offset_0  feature_0  offset_1  feature_1  offset_2  feature_2
0       NaN        NaN       0.0     2000.0       7.0     2000.0
1     103.0     2200.0      91.0        0.0      88.0        0.0
2     137.0     2200.0     131.0     2200.0     126.0     2200.0
3     161.0      200.0     159.0      200.0     157.0      200.0
© www.soinside.com 2019 - 2024. All rights reserved.