I have this CSV file, "rfm_data.csv":
CustomerID PurchaseDate TransactionAmount ProductInformation
8814 11-04-23 943.31 Product C
2188 11-04-23 463.70 Product A
4608 11-04-23 80.28 Product A
2559 11-04-23 221.29 Product A
I read and transform the data with this code:
import pandas as pd
from datetime import datetime

data = pd.read_csv("rfm_data.csv")
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'], format='%d-%m-%y')
data['Recency'] = (datetime.now().date() - data['PurchaseDate'].dt.date).dt.days
When I run print(data), I get this error message:
AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?
If I remove .dt.days from the last line of code, I get the following result:
CustomerID PurchaseDate TransactionAmount ProductInformation Recency
8814 2023-04-11 943.31 Product C 140 days, 0:00:00
2188 2023-04-11 463.70 Product A 140 days, 0:00:00
4608 2023-04-11 80.28 Product A 140 days, 0:00:00
2559 2023-04-11 221.29 Product A 140 days, 0:00:00
But I want Recency to be just the number of days, so that I can use it in further calculations.
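Your problem is the call to .dt.date, which returns a column of plain Python date objects. That column has object dtype, and so does the result of subtracting it, so there is no .dt accessor to call. Since your input contains only dates, there is no need to normalize to the date at all. If you need to do so anyway, use .dt.floor("d"), which keeps the datetime dtype.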
For example:
from io import StringIO
import pandas as pd
s = """CustomerID PurchaseDate TransactionAmount ProductInformation
8814 11-04-23 943.31 Product-C
2188 11-04-23 463.70 Product-A
4608 11-04-23 80.28 Product-A
2559 11-04-23 221.29 Product-A"""
data = pd.read_csv(StringIO(s), sep=" ")
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'], format='%d-%m-%y')
data['Recency'] = (pd.Timestamp("now").floor("d") - data['PurchaseDate']).dt.days
print(data)
CustomerID PurchaseDate TransactionAmount ProductInformation Recency
0 8814 2023-04-11 943.31 Product-C 140
1 2188 2023-04-11 463.70 Product-A 140
2 4608 2023-04-11 80.28 Product-A 140
3 2559 2023-04-11 221.29 Product-A 140
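
To see why the accessor disappears, it can help to compare dtypes directly. The following is a minimal sketch, not part of the original answer: the two sample dates and the fixed reference date 2023-08-29 are assumptions chosen so the result is reproducible (that date is consistent with the 140 days shown above).

```python
import pandas as pd

# Two sample purchase dates in the same day-month-year format as the question
dates = pd.to_datetime(pd.Series(["11-04-23", "12-04-23"]), format="%d-%m-%y")

# .dt.date turns every element into a plain datetime.date, so the dtype is object
as_py_dates = dates.dt.date
print(as_py_dates.dtype)

# .dt.floor("d") truncates to midnight but keeps a datetime64 dtype
floored = dates.dt.floor("d")
print(floored.dtype)

# Assumed fixed reference date, used instead of "now" so the numbers do not drift
reference = pd.Timestamp("2023-08-29")

# Timestamp minus a datetime64 Series gives a timedelta64 Series, which has .dt.days
recency = (reference - floored).dt.days
print(recency.tolist())
```

Using a fixed reference date is only a convenience for this sketch; with pd.Timestamp("now").floor("d") the same expression returns the number of days up to today.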