I am trying to build simple user-based collaborative filtering for an e-commerce site in Django, using only purchase history. Here is the product model:
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()
Here is the purchase model:
from django.contrib.auth.models import User  # assumption: the default auth User model

class Purchase(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    purchase_date = models.DateTimeField(auto_now_add=True)
Now, to get similar users:
def find_similar_users(user, k=5):
    all_users = User.objects.exclude(id=user.id)
    similarities = [(other_user, jaccard_similarity(user, other_user)) for other_user in all_users]
    # Most similar first, then keep the top k users.
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [user_similarity[0] for user_similarity in similarities[:k]]
and to compute the similarity between each pair:
def jaccard_similarity(user1, user2):
    # Each filter() below is a separate database query.
    user1_purchases = set(Purchase.objects.filter(user=user1).values_list('product_id', flat=True))
    user2_purchases = set(Purchase.objects.filter(user=user2).values_list('product_id', flat=True))
    intersection = user1_purchases.intersection(user2_purchases)
    union = user1_purchases.union(user2_purchases)
    return len(intersection) / len(union) if len(union) > 0 else 0
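To make the similarity measure concrete, here is a small worked example on plain Python sets, with no database involved. The product ids and the names `alice` and `bob` are made up for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical product ids bought by two users.
alice = {1, 2, 3}
bob = {2, 3, 4, 5}

# Shared products: {2, 3}; combined products: {1, 2, 3, 4, 5}.
score = jaccard(alice, bob)
print(score)  # 0.4
```

Two users with identical purchase sets score 1.0, and users with nothing in common score 0.0.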
And this is my entry-point function:
def recommend_products(user, k=5):
    similar_users = find_similar_users(user, k)
    recommended_products = set()
    for similar_user in similar_users:
        # Skip products that have already been collected from earlier similar users.
        purchases = Purchase.objects.filter(user=similar_user).exclude(product__in=recommended_products)
        for purchase in purchases:
            recommended_products.add(purchase.product)
    return recommended_products
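Now, obviously that will be slow, so I am thinking of keeping a copy of the data in a separate NoSQL database.
When user A purchases something, I would copy the data to the other database, run the calculation (obviously in a background service such as Celery), store the resulting recommended products in the NoSQL database, and later retrieve them for user A when needed. Is this the right approach?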
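The flow described above (recompute in the background after a purchase, store the result, read it back cheaply later) could be sketched roughly as follows. This is only an illustration: `RecommendationCache` is a hypothetical name, the dictionary is an in-memory stand-in for the NoSQL store, and `refresh` is where a Celery task body would go.

```python
class RecommendationCache:
    """In-memory stand-in for the NoSQL store (hypothetical)."""

    def __init__(self, compute):
        self._compute = compute  # callable: user_id -> iterable of product ids
        self._store = {}         # user_id -> sorted list of product ids

    def refresh(self, user_id):
        """The expensive step a background task would run after a purchase."""
        self._store[user_id] = sorted(self._compute(user_id))

    def get(self, user_id):
        """Cheap read path used at request time; never recomputes."""
        return self._store.get(user_id, [])


def fake_compute(user_id):
    # Placeholder for the real recommendation calculation.
    return {30, 10, 20}


cache = RecommendationCache(fake_compute)
cache.refresh(7)
print(cache.get(7))  # [10, 20, 30]
print(cache.get(8))  # []
```

The point of the split is that request handling only ever calls `get`, so a slow calculation delays the freshness of the recommendations rather than the page load.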
You can make this considerably more efficient with:
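def find_similar_users(user, k=5):
    # Prefetch every other user's purchases in bulk instead of querying per user.
    all_users = User.objects.exclude(id=user.id).prefetch_related('purchase_set')
    similarities = [
        (other_user, jaccard_similarity(user, other_user))
        for other_user in all_users
    ]
    # Same ranking as in the question: most similar first, keep the top k.
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [other_user for other_user, _ in similarities[:k]]


def jaccard_similarity(user1, user2):
    # .all() on a prefetched relation reads from the prefetch cache.
    user1_purchases = {
        purchase.product_id for purchase in user1.purchase_set.all()
    }
    user2_purchases = {
        purchase.product_id for purchase in user2.purchase_set.all()
    }
    intersection = user1_purchases.intersection(user2_purchases)
    union = user1_purchases.union(user2_purchases)
    return len(intersection) / len(union) if len(union) > 0 else 0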
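This retrieves all the Purchases of the other users in bulk, so they cost only two queries (one for the users and one for their purchases) instead of one query per user, which is probably where the bottleneck is. Note that the target user passed as user1 is not prefetched, so user1.purchase_set.all() still hits the database on every call; building that set once before the loop avoids it.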
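Taking the bulk-retrieval idea one step further, the whole calculation can run in memory on a single list of (user id, product id) pairs. The sketch below is not the code from this answer: `recommend_from_pairs` is a hypothetical helper, and it assumes the pairs were loaded in one query, for example with `Purchase.objects.values_list('user_id', 'product_id')`. Unlike `recommend_products` in the question, it returns product ids and leaves out products the target user already bought, which is a design choice rather than a requirement.

```python
from collections import defaultdict


def recommend_from_pairs(pairs, user_id, k=5):
    """Recommend product ids for user_id from (user_id, product_id) pairs."""
    # Group purchases per user in one pass over the bulk data.
    baskets = defaultdict(set)
    for uid, pid in pairs:
        baskets[uid].add(pid)

    mine = baskets.get(user_id, set())

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Score every other user against the target user's basket, best first.
    scored = sorted(
        ((jaccard(mine, basket), uid)
         for uid, basket in baskets.items() if uid != user_id),
        reverse=True,
    )

    # Collect what the k most similar users bought, minus what the target owns.
    recommended = set()
    for _, uid in scored[:k]:
        recommended |= baskets[uid] - mine
    return recommended


# Made-up purchase data: user 1 is closest to user 2, then user 4.
pairs = [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12), (3, 13), (4, 11), (4, 14)]
print(recommend_from_pairs(pairs, 1, k=2))  # {12, 14}
```

Because no query runs inside the loop, the cost is dominated by the pairwise set operations, which is also the part that would move into a background job if the user base grows.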