How to compare two NumPy ndarrays of different shapes element-wise


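I am learning to write code that separates spam from non-spam email. I have finished the training part. While testing, I had to compare the prediction array with the test-target array and ran into an error, so I wrote two different workarounds. The two workarounds produce different outputs, however. Can anyone help me understand which one is better and more accurate, and whether there is a simpler way?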

The error reads:

DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  correct_docs = (y_test==prediction)
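
A warning like this typically appears when the two arrays have shapes that cannot be broadcast against each other. Below is a minimal sketch for checking that before comparing. It uses made-up toy arrays instead of the real files, `shapes_compatible` is a hypothetical helper name, and it assumes a NumPy version that provides `np.broadcast_shapes` (1.20 or later).

```python
import numpy as np

def shapes_compatible(a, b):
    """Return True if a and b can be broadcast together for an element-wise comparison."""
    try:
        np.broadcast_shapes(a.shape, b.shape)
        return True
    except ValueError:
        return False

# Toy stand-ins (assumptions, not the real data): the prediction is one item short.
y_toy = np.array([1.0, 0.0, 1.0, 0.0])
pred_toy = np.array([True, False, True])

print(y_toy.shape, pred_toy.shape, shapes_compatible(y_toy, pred_toy))
```

Printing `y_test.shape` and `prediction.shape` in the real script would show where the two differ. Note that broadcast-compatible is weaker than equal: shapes `(4,)` and `(1,)` broadcast, yet for counting correct documents the two shapes should match exactly.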

I tried the following code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#file address

TOKEN_SPAM_PROB_FILE="SpamData/03_Testing/prob-spam.txt"
TOKEN_NONSPAM_PROB_FILE="SpamData/03_Testing/prob-nonspam.txt"
TOKEN_ALL_PROB_FILE="SpamData/03_Testing/prob-all-tokens.txt"

TEST_FEATURE_MATRIX="SpamData/03_Testing/test-features.txt"
TEST_TARGET_FILE="SpamData/03_Testing/test-target.txt"



VOCAB_SIZE=2500

#features
x_test=np.loadtxt(TEST_FEATURE_MATRIX, delimiter=" ")
#target
y_test=np.loadtxt(TEST_TARGET_FILE, delimiter=" ")
#token probabilities
prob_token_spam=np.loadtxt(TOKEN_SPAM_PROB_FILE, delimiter=" ")
prob_token_nonspam=np.loadtxt(TOKEN_NONSPAM_PROB_FILE, delimiter=" ")
prob_all_token=np.loadtxt(TOKEN_ALL_PROB_FILE, delimiter=" ")

PROB_SPAM=0.3116

joint_log_spam=x_test.dot(np.log(prob_token_spam) - np.log(prob_all_token)) + np.log(PROB_SPAM)

joint_log_nonspam=x_test.dot(np.log(prob_token_nonspam) - np.log(prob_all_token)) + np.log(1-PROB_SPAM)


prediction=joint_log_spam > joint_log_nonspam

#simplification

joint_log_spam=x_test.dot(np.log(prob_token_spam)) + np.log(PROB_SPAM)

joint_log_nonspam=x_test.dot(np.log(prob_token_nonspam)) + np.log(1-PROB_SPAM)

#number of correct documents

correct_docs = (y_test==prediction)

# I want to use the following sum command as well

correct_docs = (y_test==prediction).sum()
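
For reference, a hedged sketch of how the `.sum()` count could be guarded, assuming both arrays are meant to hold one label per test document. `count_correct` is a hypothetical helper and the arrays are toy data, not the contents of the files above.

```python
import numpy as np

def count_correct(y_true, y_pred):
    """Count matching labels; refuse to compare arrays of different shapes."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    # The comparison yields a boolean array; summing it counts the True entries.
    return int((y_true == y_pred).sum())

# Toy stand-ins (assumptions): float targets, boolean predictions.
y_toy = np.array([1.0, 0.0, 1.0, 0.0])
pred_toy = np.array([True, False, False, False])

n_correct = count_correct(y_toy, pred_toy)
n_wrong = len(y_toy) - n_correct
print(n_correct, n_wrong)
```

Raising on a mismatch, rather than trimming one array, keeps a data-loading problem visible instead of silently scoring fewer documents.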

I then tried the following two pieces of code, but they give different outputs.

#Code 1

#number of correct documents

correct_docs=y_test[:len(prediction)]==prediction[:len(prediction)]

print("Length of correct_docs is:", len(correct_docs))

print("Docs Classified correctly are:", correct_docs)

numbdocs_wrong=x_test.shape[0]-correct_docs

print("Docs classified incorrectly are:", numbdocs_wrong)
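
One thing to note about Code 1: `correct_docs` there is a boolean array, not a number, so subtracting it from `x_test.shape[0]` also gives an array. Slicing with `[:len(prediction)]` also only helps when `y_test` is the longer of the two. The sketch below, on toy data that is purely an assumption, trims both arrays to their common length and reduces the mask to a count.

```python
import numpy as np

# Toy stand-ins (assumptions): four targets, three predictions.
y_toy = np.array([1.0, 0.0, 1.0, 0.0])
pred_toy = np.array([True, False, False])

n = min(len(y_toy), len(pred_toy))   # common length of the two arrays
mask = y_toy[:n] == pred_toy[:n]     # boolean array, one entry per compared document
n_correct = int(mask.sum())          # number of True entries
n_wrong = n - n_correct

print(mask.dtype, n_correct, n_wrong)
```

Trimming hides whatever caused the lengths to differ, so the counts are only meaningful if the first `n` rows of both arrays really refer to the same documents.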

Code 2

#Code 2

#number of correct documents

correct_doc=[np.where(y_test==x)[0][0] for x in prediction]
# print(correct_doc)

total=0
for i in correct_doc:
    if i!=0:
        total+=1
# np.digitize(y_test, prediction)
print(total)
correct_doc_total=total

correct_docs=correct_doc_total

print("Docs Classified correctly are:", correct_docs)
numbdocs_wrong=x_test.shape[0]-correct_docs
print("Docs classified incorrectly are:", numbdocs_wrong)
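
As far as I can tell, Code 2 does not compare the arrays position by position. `np.where(y_test==x)[0][0]` returns the index of the first entry of `y_test` equal to `x`, so the loop counts how many of those first-match indices are non-zero. A small sketch on toy data (an assumption, not the real files) shows that this can differ from the element-wise count:

```python
import numpy as np

# Toy stand-ins (assumptions): every prediction here is actually correct.
y_toy = np.array([1.0, 0.0, 0.0, 0.0])
pred_toy = np.array([True, False, False, False])

# What Code 2 builds: for each prediction, the first index where y_toy holds that value.
first_idx = [int(np.where(y_toy == x)[0][0]) for x in pred_toy]
nonzero_count = sum(1 for i in first_idx if i != 0)

# Position-by-position comparison of the two arrays.
elementwise = int((y_toy == pred_toy).sum())

print(first_idx, nonzero_count, elementwise)
```

Here the first-match approach reports 3 while all 4 predictions match their targets, which suggests the two numbers measure different things.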

All the folders and files are available at: https://drive.google.com/drive/folders/15M7-VcUZw7gkLWxlJ8MDKLm6muYIREoT?usp=share_link

pandas numpy-ndarray array-comparison