Consider this df:
import numpy as np
import pandas as pd

data = {'ID': [1071.0, 1072.0, np.nan, 1074.0, 1076.0, np.nan, np.nan, np.nan, 1077.0],
        'Name Type': ['Primary Name', 'Primary Name', 'Also Known As', 'Primary Name', 'Primary Name', 'Low Quality AKA', 'Low Quality AKA', 'Low Quality AKA', 'Primary Name'],
        'Surname': ['Brown', 'Red', 'R', 'Green', 'Purple', 'Pipi', 'Poopa', 'Peep', 'Orange']}
df = pd.DataFrame(data)
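There are more columns that hold information in the Primary Name rows but are empty in the AKA rows. For each Primary Name row, I need to concatenate the Surname values of the rows that follow it (when they are Low Quality AKA or Also Known As) into a single value on that Primary Name row.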
Taking this dataframe df as input:
       ID        Name Type Surname
0  1071.0     Primary Name   Brown
1  1072.0     Primary Name     Red
2     NaN    Also Known As       R
3  1074.0     Primary Name   Green
4  1076.0     Primary Name  Purple
5     NaN  Low Quality AKA    Pipi
6     NaN  Low Quality AKA   Poopa
7     NaN  Low Quality AKA    Peep
8  1077.0     Primary Name  Orange
You can use this approach:
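import pandas as pd

# AKA rows are the ones that have no ID of their own.
df_aka_filter = df["ID"].isna()
# Give each AKA row the ID of the nearest primary row above it.
df["ID"] = df["ID"].ffill()
df_aka = df[df_aka_filter]
# Join the surnames of each ID's AKA rows into one string.
# Selecting only "Surname" keeps extra columns that are NaN in AKA rows out of the join.
df_aka = (
    df_aka.groupby("ID", as_index=False)["Surname"]
    .agg(";".join)
    .rename(columns={"Surname": "AKAs"})
)
# Left merge keeps primary rows that have no AKAs (their AKAs value is NaN).
df = pd.merge(df[~df_aka_filter], df_aka, on="ID", how="left")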
Output:
       ID     Name Type Surname             AKAs
0  1071.0  Primary Name   Brown              NaN
1  1072.0  Primary Name     Red                R
2  1074.0  Primary Name   Green              NaN
3  1076.0  Primary Name  Purple  Pipi;Poopa;Peep
4  1077.0  Primary Name  Orange              NaN
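For reuse, the same idea can be wrapped in a function that leaves the input dataframe unmodified. This is only a sketch under a few assumptions: the function name collapse_akas and its parameters are made up for illustration, and the Country column is a hypothetical stand-in for the "more columns" mentioned in the question (filled in primary rows, NaN in AKA rows). Only the Surname column is joined, so such extra columns should not interfere.

```python
import numpy as np
import pandas as pd


def collapse_akas(df, id_col="ID", name_col="Surname", out_col="AKAs", sep=";"):
    """Fold rows with a missing ID into an AKA column on the preceding primary row.

    Hypothetical helper for illustration; the input dataframe is not modified.
    """
    is_aka = df[id_col].isna()
    # Each AKA row is assigned the ID of the nearest primary row above it.
    owner = df[id_col].ffill()
    # Join the names of the AKA rows that share an owner ID.
    akas = (
        df.loc[is_aka, name_col]
        .groupby(owner[is_aka])
        .agg(sep.join)
        .rename(out_col)
    )
    primary = df.loc[~is_aka].reset_index(drop=True)
    # Left join: primary rows without AKAs are kept and get NaN in out_col.
    return primary.join(akas, on=id_col)


data = {
    "ID": [1071.0, 1072.0, np.nan, 1074.0, 1076.0, np.nan, np.nan, np.nan, 1077.0],
    "Name Type": ["Primary Name", "Primary Name", "Also Known As", "Primary Name",
                  "Primary Name", "Low Quality AKA", "Low Quality AKA",
                  "Low Quality AKA", "Primary Name"],
    "Surname": ["Brown", "Red", "R", "Green", "Purple", "Pipi", "Poopa", "Peep", "Orange"],
    # Hypothetical extra column: populated for primary rows, empty for AKA rows.
    "Country": ["UK", "US", np.nan, "FR", "DE", np.nan, np.nan, np.nan, "IT"],
}
df = pd.DataFrame(data)
result = collapse_akas(df)
print(result)
```

Because the AKA rows are grouped by the forward-filled ID, this relies on every AKA row appearing directly below the primary row it belongs to, as in the sample data.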