Building a user-based collaborative filtering system in Django

Question (votes: 0, answers: 1)

I am trying to build a simple user-based collaborative filtering system for an e-commerce site in Django, using only purchase history.

These are the steps I have so far. I know it needs more improvement, but I don't know what the next step is.

This is the Product model:

from django.db import models


class Product(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()

This is the Purchase model:

class Purchase(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    purchase_date = models.DateTimeField(auto_now_add=True)

Now, finding similar users:

def find_similar_users(user, k=5):
    all_users = User.objects.exclude(id=user.id)
    similarities = [(other_user, jaccard_similarity(user, other_user)) for other_user in all_users]
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [user_similarity[0] for user_similarity in similarities[:k]]

And computing the similarity between each pair of users:

def jaccard_similarity(user1, user2):
    user1_purchases = set(Purchase.objects.filter(user=user1).values_list('product_id', flat=True))
    user2_purchases = set(Purchase.objects.filter(user=user2).values_list('product_id', flat=True))

    intersection = user1_purchases.intersection(user2_purchases)
    union = user1_purchases.union(user2_purchases)

    return len(intersection) / len(union) if len(union) > 0 else 0
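
As a quick sanity check of the formula, here is a minimal standalone sketch of the same Jaccard computation on plain Python sets, with no database involved. The function name `jaccard` and the product ids are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical product ids bought by two users.
alice = {1, 2, 3}
bob = {2, 3, 4, 5}

print(jaccard(alice, bob))  # 2 shared / 5 distinct = 0.4
```

Two users with identical purchase sets score 1.0, and users with nothing in common score 0.0.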

Now this is my entry-point function:

def recommend_products(user, k=5):
    similar_users = find_similar_users(user, k)
    recommended_products = set()

    for similar_user in similar_users:
        purchases = Purchase.objects.filter(user=similar_user).exclude(product__in=recommended_products)
        for purchase in purchases:
            recommended_products.add(purchase.product)

    return recommended_products
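
One possible refinement of the function above, sketched on plain dictionaries rather than the ORM: weight each candidate product by the similarity of the neighbours who bought it, and skip products the target user already owns. Here `purchases` (a mapping of user id to a set of product ids) and `rank_products` are hypothetical names, not part of the code in the question.

```python
from collections import defaultdict


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_products(purchases, user_id, k=5, n=10):
    """Return up to n product ids, ranked by summed similarity of the k nearest users."""
    own = purchases.get(user_id, set())
    # Similarity of every other user to the target user.
    sims = [
        (jaccard(own, items), other)
        for other, items in purchases.items()
        if other != user_id
    ]
    # Keep the k most similar users that share at least one purchase.
    neighbours = sorted(
        (s for s in sims if s[0] > 0), key=lambda s: s[0], reverse=True
    )[:k]

    scores = defaultdict(float)
    for sim, other in neighbours:
        for product_id in purchases[other] - own:  # skip already-bought products
            scores[product_id] += sim
    # Highest score first; ties broken by product id for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:n]


# Made-up purchase data for illustration.
purchases = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {3, 5},
    "d": {9},
}
print(rank_products(purchases, "a"))  # [4, 5]
```

Unlike an unordered set, this returns a ranked list, so the caller can show the strongest suggestions first.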

Now, obviously this will be slow, so I am thinking of keeping a copy of the data in a separate NoSQL database.

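When user A purchases something, I would copy the data to the other database, run the computation (in a background service such as Celery, obviously), store the resulting recommended products in the NoSQL database, and later retrieve them for user A when needed. Is this the right approach?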
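
The precompute-and-store flow described above could look roughly like the sketch below. Everything in it is an assumption for illustration: `InMemoryStore` stands in for whatever NoSQL or key-value store is used, and `refresh_recommendations` is the function a background worker (for example a Celery task triggered after a purchase is saved) would call. The `compute` argument is any callable that returns product ids for a user.

```python
class InMemoryStore:
    """Stand-in for the external key-value / NoSQL store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def refresh_recommendations(user_id, compute, store):
    """Recompute recommendations for one user and store them.

    In a real project this would run in a background task, not in the request.
    """
    product_ids = list(compute(user_id))
    store.set(f"recs:{user_id}", product_ids)
    return product_ids


def get_recommendations(user_id, store):
    """Cheap read path used when rendering a page: no computation, just a lookup."""
    return store.get(f"recs:{user_id}", [])


store = InMemoryStore()
# The lambda is a placeholder for the real recommendation function.
refresh_recommendations(42, lambda uid: [7, 8], store)
print(get_recommendations(42, store))  # [7, 8]
print(get_recommendations(1, store))   # [] (nothing computed yet)
```

The point of the split is that the expensive similarity computation happens off the request path, while page views only do a key lookup.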

python django recommendation-engine collaborative-filtering
1 Answer (1 vote)

You can make this considerably more efficient with:

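def find_similar_users(user, k=5):
    # Load every other user together with their purchases (one extra query).
    all_users = User.objects.exclude(id=user.id).prefetch_related('purchase_set')
    similarities = [
        (other_user, jaccard_similarity(user, other_user))
        for other_user in all_users
    ]
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [user_similarity[0] for user_similarity in similarities[:k]]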


def jaccard_similarity(user1, user2):
    user1_purchases = {
        purchase.product_id for purchase in user1.purchase_set.all()
    }
    user2_purchases = {
        purchase.product_id for purchase in user2.purchase_set.all()
    }

    intersection = user1_purchases.intersection(user2_purchases)
    union = user1_purchases.union(user2_purchases)

    return len(intersection) / len(union) if len(union) > 0 else 0

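This retrieves all Purchases in bulk: the other users and their purchases are loaded in two queries rather than one query per user, which is likely where the bottleneck was. Note that the target user's own purchases are still queried on each jaccard_similarity call unless that user was also loaded with prefetch_related.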
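
Another way to keep the query count low, sketched here under assumptions: fetch all (user_id, product_id) pairs once and do the set arithmetic in memory. The helper names `build_purchase_map` and `find_similar_user_ids` are hypothetical, and the rows are hard-coded so the example runs without a database.

```python
from collections import defaultdict


def build_purchase_map(rows):
    """Group (user_id, product_id) pairs into {user_id: set of product ids}."""
    purchases = defaultdict(set)
    for user_id, product_id in rows:
        purchases[user_id].add(product_id)
    return purchases


def find_similar_user_ids(purchases, user_id, k=5):
    """Return the ids of the k users most similar to user_id by Jaccard similarity."""
    own = purchases.get(user_id, set())
    scored = []
    for other, items in purchases.items():
        if other == user_id:
            continue
        union = own | items
        scored.append((len(own & items) / len(union) if union else 0.0, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


# In Django the rows could come from a single query, for example:
#   rows = Purchase.objects.values_list("user_id", "product_id")
rows = [(1, 10), (1, 11), (2, 10), (2, 11), (3, 11), (3, 12), (4, 99)]
purchases = build_purchase_map(rows)
print(find_similar_user_ids(purchases, 1, k=2))  # [2, 3]
```

Users with no purchases never appear in the map, so they are simply never returned as neighbours. For a large table, loading every row into memory may itself become the limit, so this is a starting point rather than a final design.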
