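I have a survey df, and I want to assign each respondent a new value, either "existing customer" or "new customer", based on their answers. For example, if someone gave 3 answers and one of them matches "Coca-Cola", I want to give them the existing-customer value. This is the dataframe: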
ID Question Answer
101005 what brands did you purchase the past 5 months Coca-Cola or Pepsi or vitamin water
026458 what brands did you purchase the past 5 months None
045987 what brands did you purchase the past 5 months Coca-Cola
This is the table I want:
ID Question Answer Buyer_Type
101005 what brands did you purchase the past 5 months Coca-Cola,Pepsi,fanta Existing Users
026458 what brands did you purchase the past 5 months None New Buyer
045987 what brands did you purchase the past 5 months Coca-Cola Existing Users
I tried this code, but it labels 101005, for example, as a new buyer, even though that ID's answer shows they bought Coca-Cola in the past:
deux['Buyer_Type'] = deux['answer'].apply(lambda x:'existing buyer' if x == 'Coca-Cola' else 'new buyer')
For some reason, it fails to recognize 101005 as an existing user.
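To add to @Quang Hoang's comment: passing case=False and using two separate conditions, one for "coca" and one for "cola", makes the solution more flexible. As the example shows, it then handles different kinds of values in the Answer column: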
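import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':[1,2,3,4],'Answer':['Coca-Cola',None,'coca-cola','cocaCola']})
# na=False makes missing answers (None) count as non-matches
df['Buyer_Type'] = np.where(df['Answer'].str.contains('coca',case=False,na=False) & df['Answer'].str.contains('cola',case=False,na=False),
                            "Existing user","New buyer")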
Output:
ID Answer Buyer_Type
0 1 Coca-Cola Existing user
1 2 None New buyer
2 3 coca-cola Existing user
3 4 cocaCola Existing user
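
For the data in the question, the likely cause of the wrong label is that == compares the whole string, so an answer listing several brands never equals 'Coca-Cola' exactly. Below is a minimal sketch of the same substring approach applied to that data. The sample frame is rebuilt by hand from the question's table, the labels are taken from the desired output, and the variable names exact_match and has_coke are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question's table
# (IDs kept as strings so the leading zeros survive).
df = pd.DataFrame({
    "ID": ["101005", "026458", "045987"],
    "Answer": ["Coca-Cola or Pepsi or vitamin water", None, "Coca-Cola"],
})

# Exact equality is True only when the entire answer is "Coca-Cola".
exact_match = df["Answer"] == "Coca-Cola"

# Substring search finds the brand anywhere in the answer;
# na=False treats missing answers as non-matches.
has_coke = (
    df["Answer"].str.contains("coca", case=False, na=False)
    & df["Answer"].str.contains("cola", case=False, na=False)
)

df["Buyer_Type"] = np.where(has_coke, "Existing Users", "New Buyer")
print(df)
```

Here exact_match is False for 101005, which is the behavior described in the question, while has_coke is True for that row.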