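I have two columns, col1 and col2, in a DataFrame, and I need to generate a result column. Each FD row is followed by a few related MS rows, and the col2 values of those MS rows should be filled into the result column on the FD row, as shown in the figure [f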
import pandas as pd

dict_obj = {'col1': ['FD', 'MS', 'MS', 'FD', 'MS', 'MS', 'MS', 'FD', 'MS', 'MS'],
            'col2': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']}
df = pd.DataFrame(dict_obj)
You can use GroupBy.agg to join the strings and assign the result back to the "FD" rows:
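# col3 numbers the blocks: it increases by 1 at every 'FD' row
grp = (df.assign(col3=(df['col1'] == 'FD').cumsum())
         .query("col1 == 'MS'")
         .groupby('col3')['col2'].agg('|'.join))
# Positional assignment: one joined string per 'FD' row, in order
df.loc[df['col1'] == 'FD', 'result'] = grp.values  # or grp.to_numpy() on pandas >= 0.24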
df
  col1 col2 result
0   FD    A    B|C
1   MS    B    NaN
2   MS    C    NaN
3   FD    D  E|F|G
4   MS    E    NaN
5   MS    F    NaN
6   MS    G    NaN
7   FD    H    I|J
8   MS    I    NaN
9   MS    J    NaN
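One caveat with the positional assignment above: it relies on every FD row having at least one MS row, otherwise the number of joined strings no longer matches the number of FD rows. A minimal sketch of a variant that aligns by group key instead is shown below. The sample data (including the FD row "D" with no MS rows) and the helper names `key`, `is_fd`, `is_ms` and `joined` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical sample: the 'FD' row "D" has no 'MS' rows after it.
df = pd.DataFrame({
    "col1": ["FD", "MS", "MS", "FD", "FD", "MS"],
    "col2": ["A", "B", "C", "D", "E", "F"],
})

is_fd = df["col1"] == "FD"
is_ms = df["col1"] == "MS"

# Running count of 'FD' rows: every row in a block shares its FD's number.
key = is_fd.cumsum()

# Join the 'MS' values per block; blocks without 'MS' rows are simply absent.
joined = df.loc[is_ms, "col2"].groupby(key[is_ms]).agg("|".join)

# Look up each row's block string by key, then keep it only on 'FD' rows.
df["result"] = key.map(joined).where(is_fd)
print(df)
```

Because the lookup goes through the block number rather than position, an FD row with no MS rows ends up with NaN instead of raising a length-mismatch error.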
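Alternatively, group on the running count of "FD" rows and, within each group, join every col2 value after the first (the first row of each group is the FD row itself). Here the non-FD rows hold an empty string rather than NaN:
df["result"] = ""
# One joined string per group, assigned positionally to the 'FD' rows
df.loc[df["col1"] == "FD", "result"] = (
    df.groupby((df["col1"] == "FD").cumsum())["col2"]
      .apply(lambda s: s.iloc[1:].str.cat(sep="|"))
      .values
)
df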