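How can I apply an arbitrary function to a pandas DataFrame by group? The function should have access to the whole group DataFrame at once, as if it were a complete pandas DataFrame.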
import pandas as pd

def arbitrary_function(df):
    """This function acts on groups of a df. It can see every row and column of a group df."""
    # for example
    # making a new column by accessing other columns in the df
    df['new_col'] = df['data_col'].sum()
    # return the original df with the new column
    return df

df = pd.DataFrame([[1, 2], [1, 3], [2, 6], [2, 1]], columns=["group_col", "data_col"])
Before the group operation:
df
   group_col  data_col
0          1         2
1          1         3
2          2         6
3          2         1
# group the dataframe by group_col
# run arbitrary_function() on the df groups
# the first run of arbitrary_function can see one group df as such:
#    group_col  data_col
# 0          1         2
# 1          1         3
# return to the original data - no more groups
Expected output:
df
   group_col  data_col  new_col
0          1         2        5
1          1         3        5
2          2         6        7
3          2         1        7
This should do it:
IIUC, you can simply group by group_col, apply your function, and then call reset_index with drop=True to discard the group index:
out = (df
   .groupby(['group_col'])
   .apply(arbitrary_function)
   .reset_index(drop=True)
)
Output on the sample data:
   group_col  data_col  new_col
0          1         2        5
1          1         3        5
2          2         6        7
3          2         1        7
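How groupby.apply treats the grouping column and the group index has varied between pandas versions, so here is a rough alternative sketch that avoids apply altogether. It iterates over the groups explicitly, runs the function on a copy of each group DataFrame, and concatenates the pieces. The names pieces and out_transform are only illustrative, and the function is a slightly modified copy of the one in the question (it copies the group first rather than mutating it in place). For this particular example, where the new column is just a per-group sum, transform should give the same result with less code.

```python
import pandas as pd

def arbitrary_function(group):
    """Acts on one group DataFrame; it can see every row and column of the group."""
    # Work on a copy so the original frame is not modified (a choice made for this sketch).
    group = group.copy()
    group["new_col"] = group["data_col"].sum()
    return group

df = pd.DataFrame([[1, 2], [1, 3], [2, 6], [2, 1]], columns=["group_col", "data_col"])

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs;
# each sub-DataFrame keeps the grouping column and the original row index.
pieces = [arbitrary_function(group) for _, group in df.groupby("group_col")]

# Stitch the groups back together and restore the original row order.
out = pd.concat(pieces).sort_index()

# Shortcut for this specific case: broadcast the per-group sum back to each row.
out_transform = df.assign(new_col=df.groupby("group_col")["data_col"].transform("sum"))

print(out)
```

The explicit loop is more verbose than apply, but each call receives a plain DataFrame that includes group_col, which is what the question asks for.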