#first block: calculating last purchase date
from datetime import timedelta
last_purchase_date = (sales_data['TRANSAC_DATE'].max()) + timedelta(days=1)
print("Last purchase Date: ", sales_data['TRANSAC_DATE'].max())
print("Recency/Last purchase Date: ", last_purchase_date)
#Second block: calculating Recency of last purchase in RFM analysis
RFM = sales_data.groupby(['CLIENT_ID']).agg({
'CLIENT_ID': lambda x: (last_purchase_date - x.max()).days,
'Transaction_ID': 'count',
'NET': 'sum'
})
#Error line: lambda x: (last_purchase_date - x.max()).days
RFM.rename(columns={'CLIENT_ID': 'Recency', 'Transaction_ID': 'Frequency', 'NET': 'MonetaryValue'}, inplace= True)
display(RFM)
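Question: I want the recency in days, but I can't subtract x.max(), which comes out as an integer array, from last_purchase_date (a Timestamp).
#Error line: lambda x: (last_purchase_date - x.max()).days
#Error msg: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`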
From the limited information in your question, my understanding is that sales_data is your DataFrame and last_purchase_date is a Timestamp. If my understanding is correct, try this:
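import pandas as pd

# Make sure the reference date is a pandas Timestamp
last_purchase_date = pd.Timestamp(last_purchase_date)

def calculate_recency(x):
    # x holds one client's TRANSAC_DATE values (must be datetime64, not integers)
    return (last_purchase_date - x.max()).days

# Recency has to be aggregated from TRANSAC_DATE; aggregating CLIENT_ID
# passes integers to calculate_recency and raises the Timestamp error
RFM = sales_data.groupby(['CLIENT_ID']).agg({
    'TRANSAC_DATE': calculate_recency,
    'Transaction_ID': 'count',
    'NET': 'sum'
})
RFM.rename(columns={'TRANSAC_DATE': 'Recency', 'Transaction_ID': 'Frequency', 'NET': 'MonetaryValue'}, inplace=True)
display(RFM)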
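
For anyone who wants to check the idea without the original data, here is a minimal self-contained sketch. The sample rows are made up for illustration, and snapshot_date is just a hypothetical name for "latest transaction date plus one day"; only the column names come from the question. It uses named aggregation, which renames the columns in the same step, as one possible alternative to agg() followed by rename().

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
sales_data = pd.DataFrame({
    "CLIENT_ID": [1, 1, 2, 3, 3, 3],
    "Transaction_ID": [101, 102, 103, 104, 105, 106],
    "TRANSAC_DATE": pd.to_datetime([
        "2023-01-01", "2023-01-10", "2023-01-05",
        "2023-01-02", "2023-01-08", "2023-01-12",
    ]),
    "NET": [10.0, 20.0, 5.0, 7.5, 2.5, 30.0],
})

# Reference date: one day after the most recent transaction in the data.
snapshot_date = sales_data["TRANSAC_DATE"].max() + pd.Timedelta(days=1)

# Named aggregation: output_column=(input_column, aggregation).
# Recency is computed from the date column, never from CLIENT_ID.
RFM = sales_data.groupby("CLIENT_ID").agg(
    Recency=("TRANSAC_DATE", lambda d: (snapshot_date - d.max()).days),
    Frequency=("Transaction_ID", "count"),
    MonetaryValue=("NET", "sum"),
)

print(RFM)
```

If TRANSAC_DATE is stored as strings in your data, converting it first with pd.to_datetime(sales_data['TRANSAC_DATE']) should be enough for the subtraction to return a Timedelta with a .days attribute.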