I have a DataFrame like this:
Ville match
Paris Talborjt,Talborjt,Ville Nouvelle
Rome Hay Najah,Hay Najah,Najah
I want to keep only the unique values in each row of the match column. The expected output is:
Ville match contains
Paris Talborjt,Talborjt,Ville Nouvelle Talborjt,Ville Nouvelle
Rome Hay Najah,Hay Najah,Najah Hay Najah, Najah
I tried the following, but none of these gave the desired output:
all_quartiers['contains']=all_quartiers['match'].apply(set).apply(list)
all_quartiers['contains']=all_quartiers['match'].apply(lambda x: list(set(x)))
all_quartiers['contains']=all_quartiers.explode('match')['match'].unique()
I hope this solution works for you:
# Ville match
# Paris Talborjt,Talborjt,Ville Nouvelle
# Rome Hay Najah,Hay Najah,Najah
# import pandas library
import pandas as pd
# Create a dataframe
df = pd.DataFrame(data=[
{'Ville': 'Paris', 'match': 'Talborjt,Talborjt,Ville Nouvelle'},
{'Ville': 'Rome', 'match': 'Hay Najah,Hay Najah,Najah'}
])
def return_unique(col):
    # 1. Split the string on ',' to get a list: ['Talborjt', 'Talborjt', 'Ville Nouvelle']
    # 2. Convert the list to a set to drop duplicates: {'Talborjt', 'Ville Nouvelle'}
    # 3. Join the set with ',' to get a single string such as 'Talborjt,Ville Nouvelle'
    #    (a set is unordered, so the order of the joined values is not guaranteed)
    return ','.join(set(col.split(',')))
df['contains'] = df['match'].apply(return_unique)
df
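For context, the attempts in the question call set() directly on the string, which yields its individual characters rather than the comma-separated names. If the order of first appearance matters (as the expected output suggests), a variant like the one below may be closer to what is wanted. This is only a sketch: the helper name unique_in_order and the choice to strip surrounding whitespace and skip empty items are my own assumptions, not part of the question.

```python
def unique_in_order(text, sep=','):
    """Return the sep-separated items of text without duplicates,
    keeping the order in which each item first appears.

    Hypothetical helper: stripping whitespace around items and
    skipping empty items are assumptions made for this sketch.
    """
    items = (part.strip() for part in text.split(sep))
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so dict.fromkeys drops duplicates without reordering.
    return sep.join(dict.fromkeys(item for item in items if item))


print(unique_in_order('Talborjt,Talborjt,Ville Nouvelle'))
print(unique_in_order('Hay Najah,Hay Najah,Najah'))
```

With the DataFrame above, it could be applied as df['contains'] = df['match'].apply(unique_in_order).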