My pandas DataFrame looks like this:
      No IsRenew PrevNo
0  IAB19    TRUE      -
1  IAB25   FALSE      -
2  IAB56    TRUE  IAB19
3  IAB22    TRUE  IAB56
4  IAB81    TRUE  IAB22
5  IAB82    TRUE      -
6  IAB89   FALSE  IAB82
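I want to generate a unique ID for each group, where a row belongs to the same group as the row its PrevNo points to. For example: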
      No  UniqueID
0  IAB19         1
1  IAB25         2
2  IAB56         1
3  IAB22         1
4  IAB81         1
5  IAB82         3
6  IAB89         3
How should I group or merge/join the rows to get the output above?
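Use networkx with connected_components. Treat each (No, PrevNo) pair as an edge of an undirected graph; every connected component of that graph is then one group: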
Edit:
import networkx as nx

# Create the graph from the DataFrame
g = nx.Graph()

# Replace '-' in PrevNo with the row's own No, so rows without a
# predecessor become self-loops and still appear as nodes in the graph
df['PrevNo'] = df['PrevNo'].mask(df['PrevNo'] == '-', df['No'])
# If the '-' entries are actually missing values (NaN), use this instead:
# df['PrevNo'] = df['PrevNo'].fillna(df['No'])

# Each (No, PrevNo) row becomes one undirected edge
g.add_edges_from(df[['No', 'PrevNo']].itertuples(index=False))
connected_components = nx.connected_components(g)

# Map every node to the (1-based) ID of the component it belongs to
node2id = {}
for cid, component in enumerate(connected_components):
    for node in component:
        node2id[node] = cid + 1

df['UniqueID'] = df['No'].map(node2id)
print(df)
      No  IsRenew PrevNo  UniqueID
0  IAB19     True  IAB19         1
1  IAB25    False  IAB25         2
2  IAB56     True  IAB19         1
3  IAB22     True  IAB56         1
4  IAB81     True  IAB22         1
5  IAB82     True  IAB82         3
6  IAB89    False  IAB82         3
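If networkx is not available, the same grouping can be sketched with a small union-find (disjoint-set) structure in plain Python. This is an alternative technique, not the method from the answer above. The function name assign_group_ids and its placeholder parameter are hypothetical, and the sketch assumes that rows without a predecessor are marked with '-' as in the question. Groups are numbered in order of first appearance.

```python
def assign_group_ids(nos, prev_nos, placeholder="-"):
    """Return one group ID per row; rows linked through prev_nos share an ID.

    Hypothetical helper for illustration: `placeholder` marks rows that
    have no predecessor (assumed to be '-' as in the question).
    """
    parent = {}

    def find(x):
        # Register unseen nodes, then walk up to the root,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the two groups by attaching one root to the other.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for no, prev in zip(nos, prev_nos):
        find(no)  # make sure the row exists even without a predecessor
        if prev != placeholder:
            union(no, prev)

    # Number the groups in order of first appearance, starting at 1.
    root2id = {}
    ids = []
    for no in nos:
        root = find(no)
        if root not in root2id:
            root2id[root] = len(root2id) + 1
        ids.append(root2id[root])
    return ids


nos = ["IAB19", "IAB25", "IAB56", "IAB22", "IAB81", "IAB82", "IAB89"]
prev_nos = ["-", "-", "IAB19", "IAB56", "IAB22", "-", "IAB82"]
unique_ids = assign_group_ids(nos, prev_nos)
print(unique_ids)  # [1, 2, 1, 1, 1, 3, 3]
```

With a DataFrame, the two columns could be passed as df['No'] and df['PrevNo'], and the returned list assigned to df['UniqueID'].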