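Python newbie here. I'm trying to extract values from a DataFrame into a list, but I'm getting extra information I don't need. Is there a better way to do this?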
rating1 = []
rating2 = []
for value in person1["Movie"]:
    for value2 in person2["Movie"]:
        if value == value2:
            rating1.append(person1[person1["Movie"] == value]["Rating"])
            rating2.append(person2[person2["Movie"] == value2]["Rating"])
When I print rating1, I get this:
print(rating1)
[0 2.5
Name: Rating, dtype: float64, 1 3.5
Name: Rating, dtype: float64, 2 2.5
Name: Rating, dtype: float64, 5 3.0
Name: Rating, dtype: float64, 22 3.5
Name: Rating, dtype: float64, 23 3.0
Name: Rating, dtype: float64]
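My goal is just to extract the ratings, without the index and other metadata, so I can compute the Manhattan and Euclidean distances. Something like this: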
[2.5, 3.5, 2.5, 3.0, 3.5, 3.0]
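I found the answer to my own question, so here it is for future reference. Where I had been using the append method, I switched to extend, and the result is exactly what I wanted. append adds each filtered Series to the list as a single object, while extend iterates over the Series and adds its values one by one.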
rating1 = []
rating2 = []
for value in person1["Movie"]:
    for value2 in person2["Movie"]:
        if value == value2:
            rating1.extend(person1[person1["Movie"] == value]["Rating"])
            rating2.extend(person2[person2["Movie"] == value2]["Rating"])
print(rating1)
>>>[2.5, 3.5, 2.5, 3.0, 3.5, 3.0]
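To see why the two methods behave differently, here is a minimal sketch on a one-row Series. The Series `s` is made-up data standing in for what a filter like person1[person1["Movie"] == value]["Rating"] returns; it is not taken from the real data.

```python
import pandas as pd

# Hypothetical one-row Series, similar in shape to the filtered "Rating" column
s = pd.Series([2.5], index=[0], name="Rating")

appended = []
appended.append(s)   # stores the whole Series object as one list element

extended = []
extended.extend(s)   # iterates over the Series and stores each value separately

print(type(appended[0]))  # the element is a pandas Series, index and dtype included
print(extended)           # the elements are the bare rating values
```

With the plain list of numbers that extend produces, the ratings can be converted straight into arrays.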
This way I can call the Euclidean and Manhattan distance functions like this:
import numpy as np
from scipy.spatial import distance
r1 = np.array(rating1)
r2 = np.array(rating2)
euclidean = distance.euclidean(r1, r2)
manhattan = distance.cityblock(r1, r2)
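As for the original "is there a better way" question, one possible alternative to the nested loops is an inner merge on the Movie column. This is only a sketch. The two small DataFrames are invented sample data, the suffixes "_1" and "_2" are names chosen for illustration, and it assumes each movie appears at most once per person.

```python
import pandas as pd
from scipy.spatial import distance

# Hypothetical sample data; the real person1/person2 frames are not shown in the post
person1 = pd.DataFrame({"Movie": ["A", "B", "C"], "Rating": [2.5, 3.5, 3.0]})
person2 = pd.DataFrame({"Movie": ["B", "C", "D"], "Rating": [3.0, 1.0, 4.0]})

# The default inner join keeps only the movies that both people rated
common = person1.merge(person2, on="Movie", suffixes=("_1", "_2"))

# Aligned rating arrays, one entry per shared movie
r1 = common["Rating_1"].to_numpy()
r2 = common["Rating_2"].to_numpy()

euclidean = distance.euclidean(r1, r2)
manhattan = distance.cityblock(r1, r2)
print(euclidean, manhattan)
```

The merge pairs up the ratings row by row, so no intermediate lists are needed before computing the distances.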