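I have the dataset below. I am trying to determine the customer type by assigning a label. Excel crashed when I tried this because there is too much data, so I am trying to do it in Python.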
item customer qty
------------------
ProdA CustA 1
ProdA CustB 1
ProdA CustC 1
ProdA CustD 1
ProdB CustA 1
ProdB CustB 1
In Excel, I would:
1. Create new columns "ProdA", "ProdB", "Type"
2. Remove duplicates for column "customer"
3. COUNTIF Customer = ProdA, COUNTIF customer = ProdB
4. IF(AND(ProdA = 1, ProdB = 1), "Both", "One")
customer ProdA ProdB Type
--------------------------
CustA 1 1 Both
CustB 1 1 Both
CustC 1 0 One
CustD 1 0 One
We can use pd.crosstab, then sum the ProdA and ProdB columns and map the result to a label:
import pandas as pd
dfn = pd.crosstab(df['customer'], df['item']).reset_index()
# A row sum of 2 means the customer has both products, 1 means only one
dfn['Type'] = dfn[['ProdA', 'ProdB']].sum(axis=1).map({2:'Both', 1:'One'})
item customer ProdA ProdB Type
0 CustA 1 1 Both
1 CustB 1 1 Both
2 CustC 1 0 One
3 CustD 1 0 One
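For reference, here is a minimal self-contained sketch of the same idea that rebuilds the sample data from the question so it can be run directly. It rests on one assumption that is not stated in the question: a customer counts as "Both" when they have at least one row for each product. Under that assumption the label is derived with np.where instead of map, since a customer with repeated rows for the same product would produce a row sum other than 1 or 2, which the map above would turn into NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": ["ProdA", "ProdA", "ProdA", "ProdA", "ProdB", "ProdB"],
    "customer": ["CustA", "CustB", "CustC", "CustD", "CustA", "CustB"],
    "qty": [1, 1, 1, 1, 1, 1],
})

# Count rows per (customer, item) pair; pairs that never occur become 0.
dfn = pd.crosstab(df["customer"], df["item"]).reset_index()

# Assumption: "Both" means at least one row exists for each product.
bought_both = (dfn[["ProdA", "ProdB"]] > 0).all(axis=1)
dfn["Type"] = np.where(bought_both, "Both", "One")

print(dfn)
```

Note that pd.crosstab here counts rows and does not use the qty column; on the sample data the two coincide because every qty is 1.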