How do I append a variable number of blank rows to a DataFrame by group?

Question

I have a DataFrame like the one below, with x person IDs (more than 1,000 people), x transactions per person, and x variables (more than 1,000 variables):

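Person_ID	Transaction_ID	Variable_1	Variable_2	Variable_3	Variable_X
person1	transaction1	123	0	1	abc
person1	transaction2	456	1	0	def
person1	transaction3	123	0	1	abc
person2	transaction1	123	0	1	abc
person2	transaction2	456	0	1	def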

I want to pad the start of each person ID group with rows containing -10, so that every person ID group has 6 rows in total, like this:

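Person_ID	Transaction_ID	Variable_1	Variable_2	Variable_3	Variable_X
person1	-10	-10	-10	-10	-10
person1	-10	-10	-10	-10	-10
person1	-10	-10	-10	-10	-10
person1	transaction1	123	0	1	abc
person1	transaction2	456	1	0	def
person1	transaction3	123	0	1	abc
person2	-10	-10	-10	-10	-10
person2	-10	-10	-10	-10	-10
person2	-10	-10	-10	-10	-10
person2	-10	-10	-10	-10	-10
person2	transaction1	123	0	1	abc
person2	transaction2	456	0	1	def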

Here is the code I tried (updated to use concat), with the error it produces below.

df2 = pd.DataFrame([[''] * len(newdf.columns)], columns=newdf.columns)
df2

for row in newdf.groupby('person_id')['transaction_id']:
   x=newdf.groupby('person_id')['person_id'].nunique()
   if x.any() < 6:
       newdf=pd.concat([newdf, df2*(6-x)], ignore_index=True)

RuntimeWarning: '<' not supported between instances of 'int' and 'tuple', sort order is undefined for incomparable objects.
  newdf=pd.concat([newdf, df2*(6-x)], ignore_index=True)

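It appends a few NaN rows to the bottom of the DataFrame, but it does not insert them between the groups as I need. Thanks in advance; I'm a beginner.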

Answer 1

Code

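import pandas as pd

def func1(df):
    n = 6 - len(df)  # filler rows needed to bring this group up to 6
    if n <= 0:
        return df  # group already has 6 or more rows, so keep it unchanged
    pad = pd.DataFrame(-10, columns=df.columns, index=range(n))
    pad['Person_ID'] = df['Person_ID'].iloc[0]  # keep the group's ID on the filler rows
    return pd.concat([pad, df])
out = df.groupby('Person_ID', group_keys=False).apply(func1).reset_index(drop=True)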
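
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. It is not the answer's exact code: it iterates over the groups instead of calling `apply`, so it does not rely on how `groupby.apply` treats the grouping column. The sample data, the column names, and the names `pad_group`, `TARGET_ROWS` and `FILL_VALUE` are assumptions made up for illustration, modeled on the table in the question.

```python
import pandas as pd

# Sample data modeled on the question's table (names and values are assumptions).
df = pd.DataFrame({
    "Person_ID": ["person1"] * 3 + ["person2"] * 2,
    "Transaction_ID": ["transaction1", "transaction2", "transaction3",
                       "transaction1", "transaction2"],
    "Variable_1": [123, 456, 123, 123, 456],
    "Variable_2": [0, 1, 0, 0, 0],
    "Variable_3": [1, 0, 1, 1, 1],
    "Variable_X": ["abc", "def", "abc", "abc", "def"],
})

TARGET_ROWS = 6   # desired number of rows per person
FILL_VALUE = -10  # value written into the filler rows


def pad_group(group: pd.DataFrame, key: str = "Person_ID") -> pd.DataFrame:
    """Return the group with filler rows prepended until it has TARGET_ROWS rows."""
    n_missing = TARGET_ROWS - len(group)
    if n_missing <= 0:
        return group  # already long enough, leave it unchanged
    filler = pd.DataFrame(FILL_VALUE, index=range(n_missing), columns=group.columns)
    filler[key] = group[key].iloc[0]  # keep the person ID on the filler rows
    return pd.concat([filler, group])


# Iterating over a groupby yields (key, sub-frame) pairs that still contain the
# grouping column; sort=False keeps the groups in their order of appearance.
padded = pd.concat(
    [pad_group(g) for _, g in df.groupby("Person_ID", sort=False)],
    ignore_index=True,
)
print(padded)
```

Groups that already have 6 or more rows pass through untouched. Note that columns which receive both -10 and strings (such as `Transaction_ID`) end up with mixed values, so they will no longer be purely numeric or purely text.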

