How can I flag in pandas whether a combination of columns is unique?

Question · Votes: 0 · Answers: 2

I have this DataFrame:

import pandas as pd
data = {'name': ['Tom', 'nick', 'krish', 'jack','Tom','Tom'],
        'surname': ['smith', 'nielsen', 'hawk', 'boxer','bless','smith'],
        'job': ['boxer', 'writer', 'officer', 'driver','barman','boxer'],
        'salary': [200, 100, 300, 200,500,1000],
        }

df = pd.DataFrame(data)

print(df)

I need to add a new column named "is only" whose value is:

'true' if the combination of values in the columns ['name','surname','job'] is unique, or 'false' if that combination is not unique.

Like this:

data_answer= {'name': ['Tom', 'nick', 'krish', 'jack','Tom','Tom'],
        'surname': ['smith', 'nielsen', 'hawk', 'boxer','bless','smith'],
        'job': ['boxer', 'writer', 'officer', 'driver','barman','boxer'],
        'salary': [200, 100, 300, 200,500,1000],
        'is only': ['false','true','true','true','true','false']
        }

data_answer = pd.DataFrame(data_answer)
print(data_answer)

    name  surname      job  salary is only
0    Tom    smith    boxer     200   false
1   nick  nielsen   writer     100    true
2  krish     hawk  officer     300    true
3   jack    boxer   driver     200    true
4    Tom    bless   barman     500    true
5    Tom    smith    boxer    1000   false

Can anyone help me find a solution?

python pandas unique
2 Answers
0
votes

There is a dedicated method for this:

pandas.DataFrame.duplicated

df["is only"] = ~df.duplicated(subset=["name", "surname", "job"], keep=False)

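The tilde (~) inverts the result, because you want the opposite of the duplicates. Passing keep=False marks every occurrence of a repeated combination as a duplicate, not only the later ones.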
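To make the effect of keep=False concrete, here is a minimal self-contained sketch using the data from the question. The names key_cols, is_only and is_only_default are only illustrative, and the final mapping to lowercase strings is an assumption based on the 'true'/'false' strings in the expected output; if plain booleans are fine, that step can be skipped.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Tom", "nick", "krish", "jack", "Tom", "Tom"],
    "surname": ["smith", "nielsen", "hawk", "boxer", "bless", "smith"],
    "job": ["boxer", "writer", "officer", "driver", "barman", "boxer"],
    "salary": [200, 100, 300, 200, 500, 1000],
})
key_cols = ["name", "surname", "job"]  # illustrative name for the key columns

# keep=False flags every row of a repeated combination, so negating it
# leaves True only for combinations that occur exactly once.
is_only = ~df.duplicated(subset=key_cols, keep=False)

# For comparison: the default keep="first" does not flag the first
# occurrence, so row 0 would incorrectly look unique.
is_only_default = ~df.duplicated(subset=key_cols)

# Assumption: the question shows the strings 'true'/'false', not booleans.
df["is only"] = is_only.map({True: "true", False: "false"})
print(df)
```

With the default keep="first", only row 5 would be flagged, which is why keep=False is needed here.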


0
votes

I think this could help:

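import numpy as np

# helper feature: join the three key columns into one string per row
combined = df['name'] + "__" + df["surname"] + "__" + df["job"]

# count how many times each combined key appears
count_combined = combined.value_counts()

# create feature, True when the row's combined key appears exactly once
# (map looks up each row's count, keeping the original row order)
df["is only"] = np.where(combined.map(count_combined) == 1, True, False)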
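One caveat with the string-concatenation approach is that it only works for string columns, and two different rows can produce the same combined key if a value happens to contain the separator. A possible alternative that counts rows per group without building a helper string is sketched below. It is not part of either answer; it assumes the same data as the question, and the names key_cols and group_size are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Tom", "nick", "krish", "jack", "Tom", "Tom"],
    "surname": ["smith", "nielsen", "hawk", "boxer", "bless", "smith"],
    "job": ["boxer", "writer", "officer", "driver", "barman", "boxer"],
    "salary": [200, 100, 300, 200, 500, 1000],
})
key_cols = ["name", "surname", "job"]  # illustrative name for the key columns

# transform("size") returns, for each row, the number of rows that share
# the same (name, surname, job) combination, aligned to df's index.
group_size = df.groupby(key_cols)["salary"].transform("size")

# A combination is unique when its group contains exactly one row.
df["is only"] = group_size == 1
print(df)
```

This yields booleans rather than the strings shown in the question, and it also makes the group sizes available if they are needed later.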