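I want to create a user-based collaborative filter for an e-commerce site.
These are my steps:
Create an event model with foreign key relations to the product and the user.
First, I create an Event model and a Recommendation model: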

from django.conf import settings
from django.db import models


class Event(models.Model):
    class EventType(models.IntegerChoices):
        seen = 1, "Visited the product page"
        cart = 2, "Added to cart"
        bought = 3, "User purchased the product"
        searched = 4, "User searched for the product"

    user = models.ForeignKey(settings.AUTH_USER_MODEL, models.CASCADE, related_name="user_event_set")
    product = models.ForeignKey('product.Product', models.CASCADE, related_name="product_event_list")
    created_at = models.DateTimeField(auto_now_add=True)
    event_type = models.IntegerField(choices=EventType.choices)


class Recommendation(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, models.CASCADE, related_name="user_recommended_set")
    product = models.ForeignKey('product.Product', models.CASCADE, related_name="product_recommended_set")
    created_at = models.DateTimeField(auto_now_add=True)

Now, to get similar products, this is my code.
To find similar users, I fetch all the other users:

from django.contrib.auth import get_user_model

User = get_user_model()


def get_similar_users(user, limit=5):
    all_users = User.objects.exclude(id=user.id).prefetch_related('user_event_set')
    similarities = [(other_user, calculate_similarity(user, other_user)) for other_user in all_users]
    # Each entry is (other_user, (similarity, weight)), so the score tuple is at index 1, not 2.
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [user_similarity[0] for user_similarity in similarities[:limit]]

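Since only the top few users are needed, the full sort can be replaced by a top-k selection. The following is a minimal sketch in plain Python, not tied to the Django models: `top_similar` and the sample `scores` list are hypothetical names, and the scores are assumed to be `(similarity, weight)` tuples like the ones `calculate_similarity` returns.

```python
import heapq


def top_similar(scores, limit=5):
    """scores: iterable of (user_id, (similarity, weight)) pairs (hypothetical shape)."""
    # The (similarity, weight) tuples compare lexicographically, so the
    # weight only breaks ties between users with equal similarity.
    best = heapq.nlargest(limit, scores, key=lambda pair: pair[1])
    return [user_id for user_id, _ in best]


scores = [("a", (0.5, 3)), ("b", (0.5, 6)), ("c", (0.2, 9)), ("d", (0.0, 0))]
print(top_similar(scores, limit=2))  # ['b', 'a']
```

`heapq.nlargest` keeps only `limit` candidates in memory at a time, which may matter when the user table is large.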
The similarity itself is calculated with this function:
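
def calculate_similarity(user1, user2):
    # The related_name on Event.user is "user_event_set", not "user_event_list".
    user1_events = set(user1.user_event_set.values_list('product_id', flat=True))
    user2_events = set(user2.user_event_set.values_list('product_id', flat=True))
    intersection = user1_events & user2_events
    union = user1_events | user2_events
    similarity = len(intersection) / len(union) if len(union) > 0 else 0
    # A Python set has no aggregate(); assumed intent: sum event_type over
    # user2's events on the shared products.
    # Note: with the current enum values, searched (4) outweighs bought (3).
    weight = user2.user_event_set.filter(product_id__in=intersection).aggregate(
        weight=models.Sum('event_type'))['weight'] or 0
    # return by event priority
    return similarity, weight
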
I use the weight to prioritize users: bought > cart > searched > seen.
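To make the scoring concrete, here is a small worked sketch of Jaccard similarity plus an event-priority weight in plain Python. The `EVENT_WEIGHT` mapping and the function name `jaccard_with_weight` are assumptions for illustration; the weights are chosen so that they follow the stated priority, which the raw `EventType` integer values do not.

```python
# Hypothetical weights that follow the priority bought > cart > searched > seen.
EVENT_WEIGHT = {"seen": 1, "searched": 2, "cart": 3, "bought": 4}


def jaccard_with_weight(events_a, events_b):
    """Each argument maps product_id -> event type name for one user."""
    shared = events_a.keys() & events_b.keys()
    union = events_a.keys() | events_b.keys()
    # Jaccard similarity: size of the intersection over size of the union.
    similarity = len(shared) / len(union) if union else 0.0
    # Weight: how strongly the other user engaged with the shared products.
    weight = sum(EVENT_WEIGHT[events_b[product_id]] for product_id in shared)
    return similarity, weight


alice = {1: "bought", 2: "seen", 3: "cart"}
bob = {2: "bought", 3: "seen", 4: "cart"}
print(jaccard_with_weight(alice, bob))  # (0.5, 5)
```

Here the two users share products 2 and 3 out of four distinct products, giving a similarity of 0.5, and the weight is 4 (bought) + 1 (seen) = 5.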

def recommend_products(user, limit=5):
    similar_users = get_similar_users(user, limit)
    already_recommended_product = set(Recommendation.objects.filter(user=user).values_list('product_id', flat=True))
    to_be_recommended_products = set()
    for similar_user in similar_users:
        # similar_user is a single instance, so filter with user=, not user__in=.
        events = Event.objects.filter(user=similar_user).exclude(
            models.Q(product_id__in=to_be_recommended_products) | models.Q(product_id__in=already_recommended_product)
        )
        for event in events:
            # Collect ids so the set matches the product_id__in filters above.
            to_be_recommended_products.add(event.product_id)
    return to_be_recommended_products

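One possible refinement of the step above is to rank candidate products by how many similar users interacted with them, instead of returning an unordered set. This is only a sketch in plain Python under assumed inputs: `rank_candidates`, `neighbour_products`, and `excluded` are hypothetical names, and `excluded` is assumed to hold product ids that were already recommended to (or already seen by) the target user.

```python
from collections import Counter


def rank_candidates(neighbour_products, excluded, limit=5):
    """neighbour_products: list of sets of product ids, one set per similar user."""
    votes = Counter()
    for products in neighbour_products:
        # Each similar user votes once for every product not excluded.
        votes.update(products - excluded)
    # Most votes first; ties are broken by product id to keep the order stable.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [product_id for product_id, _ in ranked[:limit]]


neighbours = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
print(rank_candidates(neighbours, excluded={3}, limit=3))  # [2, 1, 4]
```

Product 3 is dropped because it is excluded, product 2 comes first with two votes, and the remaining single-vote products follow in id order.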
Now I just use the products returned from the recommend_products function to bulk-create Recommendation objects for the user.
When the user wants recommendations, he/she simply paginates through them via this RESTful API:
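
from rest_framework.generics import ListAPIView
from rest_framework.permissions import IsAuthenticated


class RecommendedProduct(ListAPIView):
    queryset = Recommendation.objects.all().select_related('product')
    serializer_class = RecommendationSerializer
    permission_classes = (IsAuthenticated,)

    def get_queryset(self):
        # Only return the recommendations that belong to the requesting user.
        qs = super().get_queryset()
        qs = qs.filter(user=self.request.user)
        return qs
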
This is only theoretical code; it may have problems, but I will fix them later.
Obviously, all the calculations will run in Celery.
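My questions are: Is this a good approach? Will it consume too many resources, given that this is a multi-tenant project? Should I look for an AI model instead? I am thinking of converting the event data to CSV for use with an AI model, but CSV would be bad for I/O. Is there a different storage method? Should I use NoSQL?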