Why do I get TypeError: iteration over a 0-d array even though I am passing a 1-d array?


I have a dataframe (new_df) that contains the following data:

Unnamed: 0       Words                                          embedding
0            0    Elephant  [-0.017855134320424067, -0.008739002273680945,...
1            1        Lion  [-0.001514446088710819, -0.010011047775235734,...
2            2       Tiger  [-0.013417221828848814, -0.009594874215361255,...
3            3         Dog  [-0.0009933243881749651, -0.015114395874422861...
4            4     Cricket  [0.003905495127549335, -0.0072066829816015395,...
5            5     Footbal  [-0.011442505362835323, -0.008127146122306163,...

I select one value from the dataframe:

text_embedding  = new_df["embedding"][1]

Now, when I run the following snippet to compute the cosine similarity:

import numpy as np
def cosine_similarity(A, B):

    # Calculate dot product
    dot_product = sum(a*b for a, b in zip(A, B))

    # Calculate the magnitude of each vector
    magnitude_A = sum(a*a for a in A)**0.5
    magnitude_B = sum(b*b for b in B)**0.5
    cosine_similarity = dot_product / (magnitude_A * magnitude_B)
    print(f"Cosine Similarity using standard Python: {cosine_similarity}")

array1 = np.array(new_df['embedding'][0])
array2 = np.array(text_embedding)

cosine_similarity(array1,array2)

I get this error:

 ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Cell In[13], line 29
         25 array2 = np.array(text_embedding)
         27 #print(array1)
         28 #print(array2)
    ---> 29 cosine_similarity(array1,array2)
         31 #cosine_similarity([1,2,3,1,1,2,2,2,2,2,2,3,2,2,2,2,2],[1,2,3,1,1,2,2,2,2,2,2,5,2,2,2,2,2])
         32 
         33 #df
    
    Cell In[13], line 10, in cosine_similarity(A, B)
          7 def cosine_similarity(A, B):
          8 
          9     # Calculate dot product
    ---> 10     dot_product = sum(a*b for a, b in zip(A, B))
         12     # Calculate the magnitude of each vector
         13     magnitude_A = sum(a*a for a in A)**0.5
    
    TypeError: iteration over a 0-d array

When I try the code below, it works fine:

cosine_similarity([1,2,3,1,1,2,2,2,2,2,2,3,2,2,2,2,2],[1,2,3,1,1,2,2,2,2,2,2,5,2,2,2,2,2])

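I don't understand why this error occurs when the values come from the dataframe. As far as I can tell both variables are 1-d arrays, so I'm not sure why array1 and array2 fail while manually supplied lists work.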

1 Answer

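As @EmiOB suggested in the comments, the problem is the type of new_df['embedding'][0] and text_embedding. Both are strings such as '[1.0, 1.1, 1.2]', not lists such as [1.0, 1.1, 1.2]. Calling np.array on a string produces a 0-d array that wraps the whole string, and a 0-d array cannot be iterated, which is why zip(A, B) raises the TypeError.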
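The behaviour described above can be reproduced without the dataframe. The following is a minimal sketch; the variable cell and its short value are hypothetical stand-ins for one entry of the embedding column, not data from the question.

```python
import numpy as np

# Hypothetical stand-in for one cell of the "embedding" column:
# a string that merely looks like a list.
cell = "[1.0, 1.1, 1.2]"

arr = np.array(cell)
print(type(cell).__name__)   # str
print(arr.ndim, arr.shape)   # 0 ()

message = None
try:
    iter(arr)                # zip(A, B) does this internally
except TypeError as exc:
    message = str(exc)
    print(message)
```

Checking type(new_df['embedding'][0]) or array1.ndim in the original session is a quick way to confirm whether this is what is happening.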

Change the code to the following:

from ast import literal_eval
array1 = literal_eval(text_embedding)
array2 = literal_eval(new_df['embedding'][0])

literal_eval parses each string into a list of floats, so array1 and array2 are no longer strings and can be iterated.
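Putting the pieces together, one possible end-to-end version looks like the sketch below. It assumes the embeddings are stored as strings, as described above. The raw_cells values are short hypothetical stand-ins for the real embeddings, and the NumPy-based cosine_similarity is a substitute for the pure-Python loop in the question, not the asker's original function.

```python
from ast import literal_eval

import numpy as np


def cosine_similarity(a, b):
    """Return the cosine similarity of two 1-d vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical stand-ins for two string cells of the "embedding" column.
raw_cells = ["[1.0, 0.0, 0.0]", "[1.0, 1.0, 0.0]"]

# Parse each string into a list of floats, then wrap it in a 1-d array.
vectors = [np.array(literal_eval(cell)) for cell in raw_cells]
print(vectors[0].ndim)          # 1

similarity = cosine_similarity(vectors[0], vectors[1])
print(round(similarity, 4))     # 0.7071
```

With the dataframe from the question, the whole column could be converted once with new_df["embedding"].apply(literal_eval) rather than parsing each cell at the point of use.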
