I have a DataFrame (new_df) that contains the following data:
Unnamed: 0 Words embedding
0 0 Elephant [-0.017855134320424067, -0.008739002273680945,...
1 1 Lion [-0.001514446088710819, -0.010011047775235734,...
2 2 Tiger [-0.013417221828848814, -0.009594874215361255,...
3 3 Dog [-0.0009933243881749651, -0.015114395874422861...
4 4 Cricket [0.003905495127549335, -0.0072066829816015395,...
5 5 Footbal [-0.011442505362835323, -0.008127146122306163,...
I select one value from the DataFrame like this:
text_embedding = new_df["embedding"][1]
Now, when I run the following snippet to compute the cosine similarity:
import numpy as np

def cosine_similarity(A, B):
    # Calculate dot product
    dot_product = sum(a*b for a, b in zip(A, B))
    # Calculate the magnitude of each vector
    magnitude_A = sum(a*a for a in A)**0.5
    magnitude_B = sum(b*b for b in B)**0.5
    cosine_similarity = dot_product / (magnitude_A * magnitude_B)
    print(f"Cosine Similarity using standard Python: {cosine_similarity}")

array1 = np.array(new_df['embedding'][0])
array2 = np.array(text_embedding)
cosine_similarity(array1,array2)
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[13], line 29
25 array2 = np.array(text_embedding)
27 #print(array1)
28 #print(array2)
---> 29 cosine_similarity(array1,array2)
31 #cosine_similarity([1,2,3,1,1,2,2,2,2,2,2,3,2,2,2,2,2],[1,2,3,1,1,2,2,2,2,2,2,5,2,2,2,2,2])
32
33 #df
Cell In[13], line 10, in cosine_similarity(A, B)
7 def cosine_similarity(A, B):
8
9 # Calculate dot product
---> 10 dot_product = sum(a*b for a, b in zip(A, B))
12 # Calculate the magnitude of each vector
13 magnitude_A = sum(a*a for a in A)**0.5
TypeError: iteration over a 0-d array
When I try the code below, it works fine:
cosine_similarity([1,2,3,1,1,2,2,2,2,2,2,3,2,2,2,2,2],[1,2,3,1,1,2,2,2,2,2,2,5,2,2,2,2,2])
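I don't understand why this error occurs when the lists come from the DataFrame. Both variables should be one-dimensional arrays, so I'm not sure why array1 and array2 fail while manually supplied values work.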
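As @EmiOB suggested in the comments, the problem is the type of new_df['embedding'][0] and text_embedding. Both are strings like '[1.0, 1.1 ,1.2]' rather than lists like [1.0, 1.1 ,1.2]. Calling np.array on a string produces a 0-d array, which cannot be iterated, hence the TypeError.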
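A quick check makes the failure mode concrete. This is only a sketch: cell is a made-up string standing in for one value of new_df['embedding'], not the actual data.

```python
import numpy as np

# Hypothetical stand-in for one cell of new_df['embedding']:
# it looks like a list but is really a string.
cell = "[1.0, 1.1, 1.2]"

arr = np.array(cell)
print(type(cell).__name__)   # str
print(arr.ndim, arr.shape)   # 0 ()

# Iterating a 0-d array fails the same way the question's zip(A, B) does.
error_message = None
try:
    list(zip(arr, arr))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Printing type(new_df['embedding'][0]) on the real frame is the equivalent one-line diagnostic.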
Changing the code to the following fixes it:
from ast import literal_eval
array1 = literal_eval(text_embedding)
array2 = literal_eval(new_df['embedding'][0])
This returns a list of floats instead of the earlier string.
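If every row is needed, it may be simpler to parse the whole column once instead of converting values one at a time. The sketch below assumes a small made-up frame (df and its three-element embeddings are hypothetical, chosen only so the result is easy to check) and swaps the pure-Python loops for NumPy calls. It also returns the value rather than printing it.

```python
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical miniature of new_df, with embeddings stored as strings.
df = pd.DataFrame({
    "Words": ["Elephant", "Lion"],
    "embedding": ["[1.0, 0.0, 1.0]", "[1.0, 1.0, 0.0]"],
})

# Parse every cell once so each entry becomes a real list of floats.
df["embedding"] = df["embedding"].apply(literal_eval)

def cosine_similarity(a, b):
    """Return the cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

similarity = cosine_similarity(df["embedding"][0], df["embedding"][1])
print(similarity)  # approximately 0.5 for these made-up vectors
```

If the frame is loaded from a CSV file (an assumption; the "Unnamed: 0" column hints at it but the post does not say), passing converters={"embedding": literal_eval} to pd.read_csv should do the same parsing at load time.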