如何在ffill（）期间显示按列分组，而不使用大熊猫进行汇总？

Question

这不是重复的。我已经提到了这个post_1和post_2

我的问题不同，与agg功能无关。它也要显示按列分组的[[ffill操作期间。尽管代码可以正常工作，但只需共享完整的代码即可让您有所了解。 问题在注释行中。在下面寻找那条线。

我有一个如下所示的数据框
df = pd.DataFrame({ 'subject_id':[1,1,1,1,1,1,1,2,2,2,2,2], 'time_1' :['2173-04-03 12:35:00','2173-04-03 12:50:00','2173-04-05 12:59:00','2173-05-04 13:14:00','2173-05-05 13:37:00','2173-07-06 13:39:00','2173-07-08 11:30:00','2173-04-08 16:00:00','2173-04-09 22:00:00','2173-04-11 04:00:00','2173- 04-13 04:30:00','2173-04-14 08:00:00'], 'val' :[5,5,5,5,1,6,5,5,8,3,4,6]}) df['time_1'] = pd.to_datetime(df['time_1']) df['day'] = df['time_1'].dt.day df['month'] = df['time_1'].dt.month
此代码在论坛的Jezrael的帮助下所做的是基于阈值的add missing dates。唯一的问题是，我看不到grouped by column during output
df['time_1'] = pd.to_datetime(df['time_1']) df['day'] = df['time_1'].dt.day df['date'] = df['time_1'].dt.floor('d') df1 = (df.set_index('date') .groupby('subject_id') .resample('d') .last() .index .to_frame(index=False)) df2 = df1.merge(df, how='left') thresh = 5 mask = df2['day'].notna() s = mask.cumsum().mask(mask) df2['count'] = s.map(s.value_counts()) df2 = df2[(df2['count'] < thresh) | (df2['count'].isna())] df2 = df2.groupby(df2['subject_id']).ffill() # problem is here #here is the problem dates = df2['time_1'].dt.normalize() df2['time_1'] += np.where(dates == df2['date'], 0, df2['date'] - dates) df2['day'] = df2['time_1'].dt.day df2['val'] = df2['val'].astype(int)
如上面的代码所示，我尝试了以下方法
df2 = df2.groupby(df2['subject_id']).ffill() # doesn't help df2 = df2.groupby(df2['subject_id']).ffill().reset_index() # doesn't help df2 = df2.groupby('subject_id',as_index=False).ffill() # doesn't help
没有subject_id的错误输出

我希望我的输出也具有subject_id列

Answer 1

这里有2种可能的解决方案-在groupby之后指定列表中的所有列并分配回来：

cols = df2.columns.difference(['subject_id']) df2[cols] = df2.groupby('subject_id')[cols].ffill() # problem is here #here is the problem

或按subject_id列创建索引并按索引分组：#newer pandas versions
df2 = df2.set_index('subject_id').groupby('subject_id').ffill().reset_index()

#oldier pandas versions
df2 = df2.set_index('subject_id').groupby(level=0).ffill().reset_index()


dates = df2['time_1'].dt.normalize() 
df2['time_1'] += np.where(dates == df2['date'], 0, df2['date'] - dates)
df2['day'] = df2['time_1'].dt.day
df2['val'] = df2['val'].astype(int)
print (df2)
     subject_id       date              time_1  val  day  month  count
0             1 2173-04-03 2173-04-03 12:35:00    5    3    4.0    NaN
1             1 2173-04-03 2173-04-03 12:50:00    5    3    4.0    NaN
2             1 2173-04-04 2173-04-04 12:50:00    5    4    4.0    1.0
3             1 2173-04-05 2173-04-05 12:59:00    5    5    4.0    1.0
32            1 2173-05-04 2173-05-04 13:14:00    5    4    5.0    1.0
33            1 2173-05-05 2173-05-05 13:37:00    1    5    5.0    1.0
95            1 2173-07-06 2173-07-06 13:39:00    6    6    7.0    1.0
96            1 2173-07-07 2173-07-07 13:39:00    6    7    7.0    1.0
97            1 2173-07-08 2173-07-08 11:30:00    5    8    7.0    1.0
98            2 2173-04-08 2173-04-08 16:00:00    5    8    4.0    NaN
99            2 2173-04-09 2173-04-09 22:00:00    8    9    4.0    NaN
100           2 2173-04-10 2173-04-10 22:00:00    8   10    4.0    1.0
101           2 2173-04-11 2173-04-11 04:00:00    3   11    4.0    1.0
102           2 2173-04-12 2173-04-12 04:00:00    3   12    4.0    1.0
103           2 2173-04-13 2173-04-13 04:30:00    4   13    4.0    1.0
104           2 2173-04-14 2173-04-14 08:00:00    6   14    4.0    1.0

如何在ffill（）期间显示按列分组，而不使用大熊猫进行汇总？

问题描述投票：1回答：1

1个回答

最新问题

如何在ffill（）期间显示按列分组，而不使用大熊猫进行汇总？

问题描述 投票：1回答：1

1个回答

最新问题

问题描述投票：1回答：1