Preserving group columns/index after applying fillna/ffill/bfill in pandas

Question

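I have the data below. Newer pandas versions do not keep the grouping columns after fillna/ffill/bfill operations. Is there a way to get the filled data with the grouping columns included?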

data = """one;two;three
1;1;10
1;1;nan
1;1;nan
1;2;nan
1;2;20
1;2;nan
1;3;nan
1;3;nan"""

import io
import pandas as pd

df = pd.read_csv(io.StringIO(data), sep=";")
print(df)
   one  two  three
0    1    1   10.0
1    1    1    NaN
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

print(df.groupby(['one','two']).ffill())
   three
0   10.0
1   10.0
2   10.0
3    NaN
4   20.0
5   20.0
6    NaN
7    NaN

Answer 1

With recent pandas versions, if we want to keep the groupby columns, we need to use apply here:

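# group_keys=False keeps the original row index instead of prepending the group keys.
# Note: pandas 2.2+ warns that apply will stop passing the grouping columns to the function.
out = df.groupby(['one','two'], group_keys=False).apply(lambda x: x.ffill())
print(out)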
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2    NaN
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN

Answer 2

Is this what you expect?

df['three']= df.groupby(['one','two'])['three'].ffill()
print(df)

# Output:
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2    NaN
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
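
The same assignment pattern can be extended to several value columns at once. The following is a minimal sketch, not part of the original answer: the extra column `four` and the names `keys`, `value_cols`, and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two value columns ("three" and "four")
df = pd.DataFrame({
    "one": [1, 1, 1, 1, 1, 1],
    "two": [1, 1, 2, 2, 3, 3],
    "three": [10.0, np.nan, np.nan, 20.0, np.nan, np.nan],
    "four": [np.nan, 1.5, 2.5, np.nan, np.nan, np.nan],
})

keys = ["one", "two"]
value_cols = [c for c in df.columns if c not in keys]

# Forward-fill every value column within its group and write the result
# back, so the grouping columns and the original index stay untouched.
filled = df.copy()
filled[value_cols] = df.groupby(keys)[value_cols].ffill()
print(filled)
```

Because the assignment aligns on the index, no set_index/reset_index round trip is needed here.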

Answer 3

Yes. Set the grouping columns as the index first, then group by those index levels so that they are retained, as shown below:

df = pd.read_csv(io.StringIO(data), sep=";")
df.set_index(['one','two'], inplace=True)
df.groupby(['one','two']).ffill()

Answer 4

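For small datasets the .apply approach works well, but it is much slower than using DataFrameGroupBy.ffill. For large datasets, or whenever speed matters, the following approach should be preferred.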

df.set_index(['one', 'two']).groupby(['one', 'two']).ffill().reset_index()
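
As a quick sanity check, the sketch below rebuilds the question's sample data and compares the index-based one-liner with the column assignment from Answer 2. The names `via_index` and `via_assign` are illustrative only, and the comparison assumes the frame starts with a default integer index, since reset_index() produces a fresh one.

```python
import io

import pandas as pd

data = """one;two;three
1;1;10
1;1;nan
1;1;nan
1;2;nan
1;2;20
1;2;nan
1;3;nan
1;3;nan"""

df = pd.read_csv(io.StringIO(data), sep=";")
keys = ["one", "two"]

# Index-based approach: the keys travel through ffill as index levels
via_index = df.set_index(keys).groupby(keys).ffill().reset_index()

# Column assignment: only the value column is replaced
via_assign = df.copy()
via_assign["three"] = df.groupby(keys)["three"].ffill()

print(via_index.equals(via_assign))
```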

On a large dataset of 15 million rows, the difference in execution time is substantial:

import timeit
start_time = timeit.default_timer()
df.set_index(['one', 'two']).groupby(['one', 'two']).ffill().reset_index()
print(f'{timeit.default_timer() - start_time:.3f} seconds')

5.412 seconds

versus

import timeit
start_time = timeit.default_timer()
df.groupby(['one', 'two']).apply(lambda x: x.ffill())
print(f'{timeit.default_timer() - start_time:.3f} seconds')

74.978 seconds
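
The 15-million-row dataset behind these timings is not shown, so the comparison cannot be reproduced as is. The sketch below runs both variants on a small synthetic frame instead. The sizes, the random data, and the helper names `fast` and `slow` are assumptions for illustration, and the absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_groups = 20_000, 500  # hypothetical sizes, far smaller than 15 million rows

# Synthetic data: roughly 70% of "three" is missing
bench = pd.DataFrame({
    "one": 1,
    "two": rng.integers(0, n_groups, size=n_rows),
    "three": np.where(rng.random(n_rows) < 0.7, np.nan, rng.random(n_rows)),
})
keys = ["one", "two"]


def fast():
    # Built-in group-wise forward fill, keys carried along as index levels
    return bench.set_index(keys).groupby(keys).ffill().reset_index()


def slow():
    # Python-level apply, restricted to the value column so that apply
    # never touches the grouping columns
    grouped = bench.groupby(keys, group_keys=False)[["three"]]
    return grouped.apply(lambda g: g.ffill())


t_fast = timeit.timeit(fast, number=3) / 3
t_slow = timeit.timeit(slow, number=3) / 3
print(f"built-in ffill: {t_fast:.3f} s per run")
print(f"apply + ffill:  {t_slow:.3f} s per run")
```

Note that `slow()` returns only the filled value column, so the comparison is about the fill step itself rather than the exact output layout.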

If the original index needs to be preserved, it can be stored as a column and restored afterwards:

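df['old_index'] = df.index
# reset_index() turns the group keys back into columns before the old index is restored
df.set_index(['one', 'two']).groupby(['one', 'two']).ffill().reset_index().set_index('old_index')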
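
To make the index round trip concrete, here is a small sketch on a frame with a non-default index. The sample values and the names `tmp` and `restored` are invented for illustration. It calls reset_index() before set_index('old_index') so that the group keys come back as columns, and rename_axis(None) only drops the leftover index name.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not the default 0..n-1
df = pd.DataFrame(
    {
        "one": [1, 1, 1, 1],
        "two": [1, 1, 2, 2],
        "three": [10.0, np.nan, np.nan, 20.0],
    },
    index=[10, 20, 30, 40],
)

tmp = df.copy()
tmp["old_index"] = tmp.index  # park the original index in a column

restored = (
    tmp.set_index(["one", "two"])
    .groupby(["one", "two"])
    .ffill()
    .reset_index()            # group keys become columns again
    .set_index("old_index")   # original index restored
    .rename_axis(None)        # drop the "old_index" index name
)
print(restored)
```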
