Assign a new word by creating a new column when a match is found


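I have a survey df, and I want to assign each respondent a new value, either "Existing Users" or "New Buyer", based on their answers. For example, if someone gave 3 answers and one of them matches "Coca-Cola", I want to give them the existing-customer value. This is the dataframe: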

ID       Question                                         Answer
101005   what brands did you purchase the past 5 months   Coca-Cola or Pepsi or vitamin water
026458   what brands did you purchase the past 5 months   None
045987   what brands did you purchase the past 5 months   Coca-Cola

This is the table I want:

ID       Question                                         Answer                  Buyer_Type
101005   what brands did you purchase the past 5 months   Coca-Cola,Pepsi,fanta   Existing Users
026458   what brands did you purchase the past 5 months   None                    New Buyer
045987   what brands did you purchase the past 5 months   Coca-Cola               Existing Users

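I tried this code, but for some reason it shows, for example, 101005 as a new buyer, even though that ID's answer says they bought Coca-Cola in the past: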

deux['Buyer_Type'] = deux['answer'].apply(lambda x:'existing buyer' if x == 'Coca-Cola' else 'new buyer') 

For some reason it does not recognize 101005 as an existing user.

python pandas matching
1 Answer

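Adding to @Quang Hoang's comment: passing case=False and checking two conditions, one for "coca" and one for "cola", makes the solution more flexible. As the example shows, it then handles different spellings in the Answer column: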

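import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':[1,2,3,4],'Answer':['Coca-Cola',None,'coca-cola','cocaCola']})
# na=False treats missing answers as "no match" instead of propagating NaN
df['Buyer_Type'] = np.where(df['Answer'].str.contains('coca',case=False,na=False) & df['Answer'].str.contains('cola',case=False,na=False),
                            "Existing user","New buyer")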

Output:

   ID     Answer     Buyer_Type
0   1  Coca-Cola  Existing user
1   2       None      New buyer
2   3  coca-cola  Existing user
3   4   cocaCola  Existing user
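
For comparison, here is a minimal sketch of the same idea applied to the survey data from the question. It assumes the "None" answer is stored as the literal string "None" and that IDs are strings (to keep the leading zeros); neither is confirmed by the question. It also shows why the original `x == 'Coca-Cola'` test misses 101005: equality only matches when the whole cell is exactly that text, while `str.contains` matches a substring.

```python
import numpy as np
import pandas as pd

# Survey data rebuilt from the question (assumption: "None" is a plain string,
# and IDs are strings so the leading zeros survive)
deux = pd.DataFrame({
    "ID": ["101005", "026458", "045987"],
    "Answer": ["Coca-Cola or Pepsi or vitamin water", "None", "Coca-Cola"],
})

# Exact equality: True only when the entire cell equals 'Coca-Cola'
exact = deux["Answer"] == "Coca-Cola"

# Substring match: True when 'coca-cola' appears anywhere in the answer,
# ignoring case; na=False makes missing values count as "no match"
partial = deux["Answer"].str.contains("coca-cola", case=False, na=False)

deux["Buyer_Type"] = np.where(partial, "Existing Users", "New Buyer")
print(deux)
```

With this data, `exact` is True only for 045987, while `partial` is True for both 101005 and 045987, which is the labelling the question asks for.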