DataFrame: keep only the unique values in each row's list

I have a dataframe like this:

Ville   match
Paris   Talborjt,Talborjt,Ville Nouvelle
Rome    Hay Najah,Hay Najah,Najah

In each row of the match column, I want to keep only the unique values. The desired output is:

Ville   match                               contains
Paris   Talborjt,Talborjt,Ville Nouvelle    Talborjt,Ville Nouvelle
Rome    Hay Najah,Hay Najah,Najah           Hay Najah, Najah

I tried the following approaches, but none of them gave the desired output:

all_quartiers['contains']=all_quartiers['match'].apply(set).apply(list)

all_quartiers['contains']=all_quartiers['match'].apply(lambda x: list(set(x)))

all_quartiers['contains']=all_quartiers.explode('match')['match'].unique()
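The first two attempts fail because each match value is a single string, not a list, so set(x) iterates over its characters rather than over the comma-separated items. The third fails because explode only expands list-like values (plain strings are left unchanged), and .unique() returns the unique values of the whole column, not of each row. A minimal check in plain Python, using the Paris value from the example:

```python
value = 'Talborjt,Talborjt,Ville Nouvelle'

# set() on a string iterates over its characters, not its comma-separated items
chars = set(value)

# Splitting on ',' first yields the items that actually need deduplicating
items = value.split(',')

# dict.fromkeys drops duplicates while keeping first-seen order
unique_items = list(dict.fromkeys(items))

print(sorted(chars))
print(unique_items)
```

So the string has to be split before any deduplication is applied.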
1 Answer

I hope this solution works for you:

# Input data:
# Ville   match
# Paris   Talborjt,Talborjt,Ville Nouvelle
# Rome    Hay Najah,Hay Najah,Najah
import pandas as pd

# Create the dataframe
df = pd.DataFrame(data=[
    {'Ville': 'Paris', 'match': 'Talborjt,Talborjt,Ville Nouvelle'},
    {'Ville': 'Rome', 'match': 'Hay Najah,Hay Najah,Najah'}
])

def return_unique(col):
    # 1. Split the string on ',' to get a list: ['Talborjt', 'Talborjt', 'Ville Nouvelle']
    # 2. dict.fromkeys removes duplicates and keeps first-seen order
    #    (set() would also deduplicate, but its iteration order is arbitrary,
    #    so the result could come out as 'Ville Nouvelle,Talborjt')
    # 3. Join the unique items back with ',' to get 'Talborjt,Ville Nouvelle'
    return ','.join(dict.fromkeys(col.split(',')))

# Apply the function to every value of the 'match' column
df['contains'] = df['match'].apply(return_unique)
print(df)
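The asker's third attempt, based on explode, can also be made to work once the strings are split into lists first. This is an alternative sketch, not the answerer's method, and it assumes every match value is a non-null comma-separated string; the .str.strip() call is an added assumption to tolerate separators written as ", ":

```python
import pandas as pd

df = pd.DataFrame({
    'Ville': ['Paris', 'Rome'],
    'match': ['Talborjt,Talborjt,Ville Nouvelle', 'Hay Najah,Hay Najah,Najah'],
})

# Split each string into a list, then give every item its own row.
# explode repeats the original row index for items from the same row.
items = df['match'].str.split(',').explode().str.strip()

# Group the items back by original row index; within each group,
# drop_duplicates keeps first occurrences in order, then join with ','
df['contains'] = items.groupby(level=0).agg(lambda s: ','.join(s.drop_duplicates()))

print(df)
```

Because the grouped result keeps the original row index, the assignment to df['contains'] lines up with the correct rows.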