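I need to replace values in column x of a dataframe. The result should look like x_new. In detail: I have to keep the values in column x on the rows where y is 1 or 255. Between a 1 and the following 255, I must replace the x values with the x value from the row where y is 1. The values between a 255 and the next 1 should stay the same. So how can I get the column x_new?
I guess it could work with replace and some condition, but I do not know how to combine them. I would appreciate any help and hints.
My dataframe looks like this: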
x y z x_new
12.28 1 1 12.28
11.99 0 1 12.28
11.50 0 1 12.28
11.20 0 1 12.28
11.01 0 1 12.28
9.74 255 0 9.74
13.80 0 0 13.80
15.2 0 0 15.2
17.8 0 0 17.8
12.1 1 1 12.1
11.9 0 1 12.1
11.7 0 1 12.1
11.2 0 1 12.1
10.3 255 0 10.3
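For anyone who wants to reproduce this, here is a minimal sketch that builds the sample above as a DataFrame. The values are copied from the table; the variable name `df` is simply what the answers below assume.

```python
import pandas as pd

# Sample data copied from the table above; x_new is the desired result.
df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
          15.2, 17.8, 12.1, 11.9, 11.7, 11.2, 10.3],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
    "z": [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    "x_new": [12.28, 12.28, 12.28, 12.28, 12.28, 9.74, 13.80,
              15.2, 17.8, 12.1, 12.1, 12.1, 12.1, 10.3],
})
```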
Try:
# mark the occurrences of 1 and 255
df['is_1_255'] = df.y[(df.y==1)|(df.y==255)]
df['x_n'] = float('nan')  # start as a float column so later assignments keep a numeric dtype
# copy the 1's
df.loc[df.is_1_255==1,'x_n'] = df.loc[df.is_1_255==1,'x']
# fill is_1_255 with markers,
#255 means between 255 and 1, 1 means between 1 and 255
df['is_1_255'] = df['is_1_255'].ffill()
# update the 255 values
df.loc[df.is_1_255==255, 'x_n'] = df.loc[df.is_1_255==255,'x']
# update the 1 values
# assign the result back; an in-place ffill on a selected column may not update df
df['x_n'] = df['x_n'].ffill()
Output:
+-----+-------+-----+---+-------+----------+-------+
| idx | x | y | z | x_new | is_1_255 | x_n |
+-----+-------+-----+---+-------+----------+-------+
| 0 | 12.28 | 1 | 1 | 12.28 | 1.0 | 12.28 |
| 1 | 11.99 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 2 | 11.50 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 3 | 11.20 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 4 | 11.01 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 5 | 9.74 | 255 | 0 | 9.74 | 255.0 | 9.74 |
| 6 | 13.80 | 0 | 0 | 13.80 | 255.0 | 13.80 |
| 7 | 15.20 | 0 | 0 | 15.20 | 255.0 | 15.20 |
| 8 | 17.80 | 0 | 0 | 17.80 | 255.0 | 17.80 |
| 9 | 12.10 | 1 | 1 | 12.10 | 1.0 | 12.10 |
| 10 | 11.90 | 0 | 1 | 12.10 | 1.0 | 12.10 |
| 11 | 11.70 | 0 | 1 | 12.10 | 1.0 | 12.10 |
| 12 | 11.20 | 0 | 1 | 12.10 | 1.0 | 12.10 |
| 13 | 10.30 | 255 | 0 | 10.30 | 255.0 | 10.30 |
+-----+-------+-----+---+-------+----------+-------+
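Assuming clean data where 1 and 255 always occur in pairs, we can form groups that run from each 1 up to the matching 255 and use groupby to fill the data.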
s = (df.y.eq(1).cumsum() == df.y.eq(255).cumsum()+1)
df['xnew'] = df.groupby(s.ne(s.shift()).cumsum().where(s)).x.transform('first').fillna(df.x)
x y z xnew
0 12.28 1 1 12.28
1 11.99 0 1 12.28
2 11.50 0 1 12.28
3 11.20 0 1 12.28
4 11.01 0 1 12.28
5 9.74 255 0 9.74
6 13.80 0 0 13.80
7 15.20 0 0 15.20
8 17.80 0 0 17.80
9 12.10 1 1 12.10
10 11.90 0 1 12.10
11 11.70 0 1 12.10
12 11.20 0 1 12.10
13 10.30 255 0 10.30
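For something like this, though, you should really write a thorough unit test, because this kind of logic gets quite tricky and error-prone with improper inputs.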
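As a rough idea of what such a test could look like, here is a minimal sketch. The function name `fill_between_markers` and both test cases are made up for illustration; the second case just pins down what the groupby approach happens to do when a 1 has no closing 255, which may or may not be what you want.

```python
import pandas as pd

def fill_between_markers(df: pd.DataFrame) -> pd.Series:
    """Hypothetical wrapper around the groupby approach above."""
    # True from each 1 up to (but not including) the matching 255
    s = df.y.eq(1).cumsum() == df.y.eq(255).cumsum() + 1
    # Consecutive-run ids, kept only for rows inside a 1..255 run
    groups = s.ne(s.shift()).cumsum().where(s)
    # First x of each run; rows outside a run fall back to their own x
    return df.groupby(groups).x.transform("first").fillna(df.x)

def test_clean_pairs():
    df = pd.DataFrame({"x": [12.28, 11.99, 9.74, 13.80, 12.1, 11.9, 10.3],
                       "y": [1, 0, 255, 0, 1, 0, 255]})
    expected = [12.28, 12.28, 9.74, 13.80, 12.1, 12.1, 10.3]
    assert fill_between_markers(df).tolist() == expected

def test_missing_closing_255():
    # Improper input: a 1 with no closing 255 fills all the way to the end
    df = pd.DataFrame({"x": [5.0, 6.0, 7.0], "y": [1, 0, 0]})
    assert fill_between_markers(df).tolist() == [5.0, 5.0, 5.0]

test_clean_pairs()
test_missing_closing_255()
```

Other cases worth adding would be a 255 before any 1, two 1s in a row, and an empty frame.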
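It takes multiple steps, but it works. Find the index of the rows from each y == 255 up to the next 1 and save it in idx. Then create x_new1 using idx together with the two other conditions (y == 1 or y == 255). Finally, forward-fill the rest.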
import numpy as np

# Index of rows between 255 and 1 in column y
idx = df.loc[df['y'].replace(0, np.nan).ffill() == 255, 'y'].index
# Create x_new1 and assign value of x where index is idx or y == 1 or y ==255
df.loc[idx, 'x_new1'] = df['x']
df.loc[(df['y'] == 1) | (df['y'] == 255) , 'x_new1'] = df['x']
# ffill rest of the values in x_new1
df['x_new1'] = df['x_new1'].ffill()
x y z x_new x_new1
0 12.28 1 1 12.28 12.28
1 11.99 0 1 12.28 12.28
2 11.50 0 1 12.28 12.28
3 11.20 0 1 12.28 12.28
4 11.01 0 1 12.28 12.28
5 9.74 255 0 9.74 9.74
6 13.80 0 0 13.80 13.80
7 15.20 0 0 15.20 15.20
8 17.80 0 0 17.80 17.80
9 12.10 1 1 12.10 12.10
10 11.90 0 1 12.10 12.10
11 11.70 0 1 12.10 12.10
12 11.20 0 1 12.10 12.10
13 10.30 255 0 10.30 10.30
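To see why the first step works, it can help to look at the intermediate series. The sketch below is a condensed variant of the same idea, not a different method; the names `marker`, `keep` and `x_new2` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
          15.2, 17.8, 12.1, 11.9, 11.7, 11.2, 10.3],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
})

# Carry the last non-zero y forward:
# 1 inside a 1..255 run, 255 from a 255 up to the next 1
marker = df["y"].replace(0, np.nan).ffill()

# Rows whose x must stay unchanged: the 1 rows and everything marked 255
keep = marker.eq(255) | df["y"].eq(1)

# Blank out the remaining rows, then forward-fill from the preceding 1 row
df["x_new2"] = df["x"].where(keep).ffill()
```

On the sample data, `marker` is 1 for rows 0 to 4 and 9 to 12 and 255 for the others, so only the rows strictly between a 1 and its 255 get overwritten.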