ValueError: Must have equal len keys and value when setting with an iterable | using .at

Question · Votes: 0 · Answers: 1

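I'm writing a function that recommends 5 products. I use cosine similarity as the similarity measure, computed on length-2 arrays that hold each product's t-SNE feature values, i.e. an x coordinate and a y coordinate.

My input is a product name. I want to iterate over the DataFrame, compute the cosine similarity between the input product and every product, and write that value into each row's 'dist' column with df.at. point_1 has shape (1, 2) and point_2 has shape (2,).

However, when I call the function with an example, I get this error: ValueError: Must have equal len keys and value when setting with an iterable

How should I change my function to fix this?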

def yesstyle_recommender(product, label, df):

    #filter df by label
    filtered_df = df[df['label'] == label].reset_index().drop('index', axis = 1)

    #extract the product name, has to exactly match
    myItem = filtered_df[filtered_df['name'].str.contains(product, case=False)]

    if myItem.empty:
        print("Product not found.")
        return None

    # extract tsne values for the target item
    X = myItem['X'].values
    Y = myItem['Y'].values
    
    point_1 = np.array([X,Y]).T
    
    #instantiate 'dist' column to 0
    filtered_df['dist'] = 0.0

    #iterate through df and calculate cos sim
    for i in range(len(filtered_df)):
        point_2 = np.array([filtered_df['X'][i], filtered_df['Y'][i]])
        filtered_df.at[i, 'dist'] = np.dot(point_1, point_2) / (norm(point_1) * norm(point_2))

    #sort by 'dist' in descending order
    filtered_df = filtered_df.sort_values('dist', ascending=False)

    top_5_recommendations = filtered_df[['product','brand','price','dist']]

    return top_5_recommendations

#call function using an example product
yesstyle_recommender('Relief Sun','spf', yesstyle)


pandas dataframe numpy cosine-similarity tsne
1 Answer
0 votes

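The error comes from the fact that X and Y are each one-element arrays, so np.array([X, Y]) has shape (2, 1) and point_1 ends up as (1, 2) after .T. np.dot(point_1, point_2) then returns a one-element array of shape (1,) instead of a scalar, and .at, which sets a single cell, cannot take that as a value. You need point_1 to be a flat (2,) vector like point_2, which you get by taking the first element of X and Y. The function below does that. I made up the sample data, so adjust it if yours looks very different.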
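To see the shape problem in isolation, here is a minimal sketch. The coordinate values are made up for illustration, and X and Y stand in for what myItem['X'].values and myItem['Y'].values return when exactly one row matches.

```python
import numpy as np
from numpy.linalg import norm

# Stand-ins for myItem['X'].values and myItem['Y'].values (one matching row)
X = np.array([1.0])
Y = np.array([1.0])

point_1 = np.array([X, Y]).T      # shape (1, 2), as in the question
point_2 = np.array([2.0, 1.5])    # shape (2,)

# (1, 2) dot (2,) yields a one-element array, not a scalar
bad = np.dot(point_1, point_2) / (norm(point_1) * norm(point_2))
print(point_1.shape, bad.shape)

# Taking the first element of X and Y gives a flat (2,) vector
flat_point_1 = np.array([X[0], Y[0]])
good = np.dot(flat_point_1, point_2) / (norm(flat_point_1) * norm(point_2))
print(np.ndim(good))              # 0 dimensions, i.e. a scalar
```

Note that str.contains can match more than one row. In that case X and Y hold several values, and indexing with [0] simply uses the first match.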

import numpy as np
import pandas as pd
from numpy.linalg import norm

def yesstyle_recommender(product, label, df):
    filtered_df = df[df['label'] == label].reset_index().drop('index', axis=1)
    myItem = filtered_df[filtered_df['name'].str.contains(product, case=False)]

    if myItem.empty:
        print("Product not found.")
        return None

    X = myItem['X'].values[0]
    Y = myItem['Y'].values[0]

    point_1 = np.array([X, Y])
    filtered_df['dist'] = 0.0

    for i in range(len(filtered_df)):
        point_2 = np.array([filtered_df['X'][i], filtered_df['Y'][i]])
        filtered_df.at[i, 'dist'] = np.dot(point_1, point_2) / (norm(point_1) * norm(point_2))

    filtered_df = filtered_df.sort_values('dist', ascending=False)
    top_5_recommendations = filtered_df[['name', 'brand', 'price', 'dist']].head(5)

    return top_5_recommendations

data = {
    'name': ['Relief Sun', 'Sun Stick', 'Moisture Cream', 'UV Shield', 'Sun Gel', 'SPF Lotion'],
    'label': ['spf', 'spf', 'spf', 'spf', 'spf', 'spf'],
    'brand': ['BrandA', 'BrandB', 'BrandC', 'BrandD', 'BrandE', 'BrandF'],
    'price': [15.99, 20.99, 25.99, 30.99, 35.99, 40.99],
    'X': [1.0, 2.0, 1.5, 3.0, 4.0, 5.0],
    'Y': [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
}

yesstyle = pd.DataFrame(data)

recommendations = yesstyle_recommender('Relief Sun', 'spf', yesstyle)
print(recommendations)

This gives:

             name   brand  price      dist
0      Relief Sun  BrandA  15.99  1.000000
3       UV Shield  BrandD  30.99  0.995893
1       Sun Stick  BrandB  20.99  0.989949
2  Moisture Cream  BrandC  25.99  0.989949
4         Sun Gel  BrandE  35.99  0.989949
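The row-by-row loop with .at is not strictly needed. As a possible alternative, the sketch below computes the cosine similarity for all rows in one NumPy expression. The helper name cosine_to_target is hypothetical (it is not part of pandas or NumPy), and the sketch assumes the DataFrame has numeric 'X' and 'Y' columns with no all-zero rows, since a zero vector would cause a division by zero.

```python
import numpy as np
import pandas as pd

def cosine_to_target(df, target_xy):
    """Cosine similarity between each (X, Y) row of df and target_xy.

    Hypothetical helper for illustration; assumes numeric 'X' and 'Y'
    columns and no zero-length vectors.
    """
    points = df[['X', 'Y']].to_numpy(dtype=float)   # shape (n, 2)
    target = np.asarray(target_xy, dtype=float)     # shape (2,)
    # Row-wise dot products divided by the product of the vector norms
    sims = points @ target / (np.linalg.norm(points, axis=1) * np.linalg.norm(target))
    return pd.Series(sims, index=df.index, name='dist')

# Small made-up example
demo = pd.DataFrame({'X': [1.0, 2.0, 3.0], 'Y': [1.0, 1.5, 2.5]})
demo['dist'] = cosine_to_target(demo, (1.0, 1.0))
print(demo.sort_values('dist', ascending=False))
```

Inside the recommender, this would replace the for loop with a single assignment such as filtered_df['dist'] = cosine_to_target(filtered_df, (X, Y)), where X and Y are the scalar coordinates of the target product.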