I have this DataFrame:
import pandas as pd
data = {'name': ['Tom', 'nick', 'krish', 'jack', 'Tom', 'Tom'],
        'surname': ['smith', 'nielsen', 'hawk', 'boxer', 'bless', 'smith'],
        'job': ['boxer', 'writer', 'officer', 'driver', 'barman', 'boxer'],
        'salary': [200, 100, 300, 200, 500, 1000],
        }
df = pd.DataFrame(data)
print(df)
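I need to add a new column named "is only" whose value is:
'true' if the combination of values in the columns ['name','surname','job'] is unique, or 'false' if that combination appears in more than one row.
Like this: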
data_answer = {'name': ['Tom', 'nick', 'krish', 'jack', 'Tom', 'Tom'],
               'surname': ['smith', 'nielsen', 'hawk', 'boxer', 'bless', 'smith'],
               'job': ['boxer', 'writer', 'officer', 'driver', 'barman', 'boxer'],
               'salary': [200, 100, 300, 200, 500, 1000],
               'is only': ['false', 'true', 'true', 'true', 'true', 'false']
               }
data_answer = pd.DataFrame(data_answer)
print(data_answer)
    name  surname      job  salary is only
0    Tom    smith    boxer     200   false
1   nick  nielsen   writer     100    true
2  krish     hawk  officer     300    true
3   jack    boxer   driver     200    true
4    Tom    bless   barman     500    true
5    Tom    smith    boxer    1000   false
Can anyone help me find a solution?
Use pandas.DataFrame.duplicated:
df["is only"] = ~df.duplicated(subset=["name", "surname", "job"], keep=False)
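The tilde (~) inverts the result, because you are interested in the opposite of duplicates. Passing keep=False marks every row of a repeated combination as a duplicate, not only the later occurrences.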
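To make the effect of keep=False visible, here is a small sketch on the question's data. It compares the default keep='first' with keep=False and then, as an optional step, maps the booleans to the strings 'true' and 'false' shown in the expected output. The string mapping and the column name 'is only (str)' are my own additions for illustration; plain booleans are usually easier to work with.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Tom', 'nick', 'krish', 'jack', 'Tom', 'Tom'],
    'surname': ['smith', 'nielsen', 'hawk', 'boxer', 'bless', 'smith'],
    'job': ['boxer', 'writer', 'officer', 'driver', 'barman', 'boxer'],
    'salary': [200, 100, 300, 200, 500, 1000],
})
cols = ['name', 'surname', 'job']

# Default keep='first': only later repeats are flagged, so row 0 is not marked.
dup_first = df.duplicated(subset=cols)

# keep=False: every row of a repeated combination is flagged (rows 0 and 5 here).
dup_all = df.duplicated(subset=cols, keep=False)

# A row is "only" when it is not part of any repeated combination.
df['is only'] = ~dup_all

# Optional (assumption): string labels instead of booleans.
df['is only (str)'] = df['is only'].map({True: 'true', False: 'false'})

print(df)
```

With the default keep='first', row 0 would have been reported as unique, which is not what the question asks for.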
I think this can help:
import numpy as np

# helper feature: one string key per row
combined = df['name'] + "__" + df["surname"] + "__" + df["job"]
# count how many times each key appears
count_combined = combined.value_counts()
# create feature, True when the row's key appears exactly once
df["is only"] = np.where(count_combined[combined] == 1, True, False)
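The string concatenation above assumes all three columns hold strings and that the "__" separator cannot make two different rows collide. A possible variant of the same counting idea, sketched here as an alternative rather than as part of the answer above, is to let groupby count the rows per combination. The name group_size is just illustrative, and I have not considered missing values in the key columns.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Tom', 'nick', 'krish', 'jack', 'Tom', 'Tom'],
    'surname': ['smith', 'nielsen', 'hawk', 'boxer', 'bless', 'smith'],
    'job': ['boxer', 'writer', 'officer', 'driver', 'barman', 'boxer'],
    'salary': [200, 100, 300, 200, 500, 1000],
})
cols = ['name', 'surname', 'job']

# Size of the group each row belongs to, aligned to the original index.
group_size = df.groupby(cols)['salary'].transform('size')

# True when the row's combination of name, surname and job occurs exactly once.
df['is only'] = group_size == 1

print(df)
```

This keeps the key columns as they are, so it also works when they are not strings.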