How do I extract misclassified labels from a model evaluation (BinaryClassificationEvaluator)?

Question

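I am currently using sentence transformers to measure the similarity between two sentences, and I have data labelled 1 or 0 (similar, not similar). After training my own model, I can evaluate its performance on the dev/test dataset as shown below.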

from sentence_transformers import InputExample, SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

# Build one InputExample per labelled sentence pair
dev_samples = []
for index, row in df.iterrows():
    input_example = InputExample(texts=[row['Sent_1'], row['Sent_2']], label=row['Label_0_1'])
    dev_samples.append(input_example)

model = SentenceTransformer(bi_encoder_model_save_path)
dev_evaluator = BinaryClassificationEvaluator.from_input_examples(dev_samples, name="dev_sample")

# CSV file with performance result is exported in the model folder
dev_evaluator(model, output_path=cwd)

In the example above I am using BinaryClassificationEvaluator.from_input_examples(dev_samples, name="dev_sample"), and the results are saved to a CSV file with specific columns.

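How can I identify the misclassified labels, or extract them into a confusion matrix? Should I call encode() on each input example to compute a score, and then treat the pair as similar (1) if the score is > 50 and as not similar (0) otherwise?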

https://www.sbert.net/docs/usage/semantic_textual_similarity.html

GitHub code

Answer 1

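Based on my understanding of "Semantic Textual Similarity", you need to compare the predicted label with the true label for every example in the dev dataset.

BinaryClassificationEvaluator does not output the misclassified examples directly (see UKPLab/sentence-transformers issue 1516), so you need to compute the predictions manually with the encode() method and then compare them with the true labels.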

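- The encode() method encodes the two sentences of each InputExample, after which you compute a similarity score. The cosine similarity between the two sentence embeddings can serve as that score.
- Derive the predicted label from the similarity score. You mentioned a threshold of 50% (that is, 0.5 if your similarity scores are normalized to the [0, 1] range). Pairs scoring above the threshold are treated as similar (1), and pairs scoring below it as not similar (0).
- For each InputExample, compare the predicted label with the true label (the label attribute of the InputExample).
- From those comparisons, count the true positives, false positives, true negatives and false negatives to build the confusion matrix.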
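
The 0.5 cut-off in the steps above is only one possible choice: raw cosine similarity ranges over [-1, 1], so the most useful threshold depends on the model. The following is a minimal sketch, assuming the similarity scores and true labels have already been collected into two plain lists, that sweeps candidate thresholds and keeps the one with the highest accuracy. The helper name best_accuracy_threshold and the sample numbers are made up for illustration and are not part of sentence-transformers.

```python
def best_accuracy_threshold(scores, labels):
    """Return (threshold, accuracy) for the rule `score >= threshold -> label 1`.

    Hypothetical helper: tries a cut-off between every pair of neighbouring
    distinct scores and keeps the first one with the highest accuracy.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")

    ordered = sorted(set(scores))
    # Candidate cut-offs: below all scores, between neighbours, above all scores.
    candidates = [ordered[0] - 1.0]
    candidates += [(low + high) / 2 for low, high in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)

    best_threshold, best_accuracy = candidates[0], -1.0
    for threshold in candidates:
        correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


# Illustrative scores only, not real model output.
scores = [0.91, 0.78, 0.62, 0.55, 0.30, 0.12]
labels = [1, 1, 0, 1, 0, 0]
threshold, accuracy = best_accuracy_threshold(scores, labels)
print(f"threshold={threshold:.3f} accuracy={accuracy:.3f}")
```

The CSV written by BinaryClassificationEvaluator should also list the thresholds it selected for accuracy and F1, so reusing one of those values is an alternative to recomputing it.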

For example:

from sentence_transformers import SentenceTransformer, util
import numpy as np

# Load the trained bi-encoder (the path and dev_samples come from the question)
model = SentenceTransformer(bi_encoder_model_save_path)

# Encode each sentence pair and score it with cosine similarity
predictions = []
for example in dev_samples:
    embeddings = model.encode(example.texts)
    # cos_sim returns a 1x1 tensor; .item() converts it to a Python float
    similarity_score = util.cos_sim(embeddings[0], embeddings[1]).item()
    predicted_label = 1 if similarity_score >= 0.5 else 0
    predictions.append((predicted_label, example.label))

# Count each outcome for the confusion matrix
true_positives = sum(1 for pred, true in predictions if pred == true == 1)
false_positives = sum(1 for pred, true in predictions if pred == 1 and true == 0)
true_negatives = sum(1 for pred, true in predictions if pred == true == 0)
false_negatives = sum(1 for pred, true in predictions if pred == 0 and true == 1)

# Rows are predicted labels (1, 0); columns are true labels (1, 0)
confusion_matrix = np.array([[true_positives, false_positives],
                             [false_negatives, true_negatives]])

print("Confusion Matrix:")
print(confusion_matrix)
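
The confusion matrix only gives counts, while the question also asks which pairs were misclassified. Here is a minimal sketch of that step, assuming the embeddings for the first and second sentences are already available as two parallel sequences. In practice they could come from model.encode([ex.texts[0] for ex in dev_samples]) and model.encode([ex.texts[1] for ex in dev_samples]); the toy 2-d vectors below only stand in for that output. The names cosine_similarity and find_misclassified are hypothetical helpers, written in plain Python so the logic is easy to check.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_misclassified(pairs, embeddings_1, embeddings_2, labels, threshold=0.5):
    """Return one record per pair whose predicted label differs from the true label."""
    errors = []
    rows = zip(pairs, embeddings_1, embeddings_2, labels)
    for index, (pair, u, v, label) in enumerate(rows):
        score = cosine_similarity(u, v)
        predicted = 1 if score >= threshold else 0
        if predicted != label:
            errors.append({
                "index": index,
                "texts": pair,
                "score": score,
                "label": label,
                "predicted": predicted,
            })
    return errors


# Toy data standing in for dev_samples and model.encode(...) output.
pairs = [("a cat", "a kitten"), ("a cat", "stock prices"), ("hello", "hi there")]
embeddings_1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
embeddings_2 = [[0.9, 0.1], [0.8, 0.6], [1.0, 0.2]]
labels = [1, 0, 1]

errors = find_misclassified(pairs, embeddings_1, embeddings_2, labels)
for error in errors:
    print(error)
```

Each record keeps the index and the score, so the misclassified rows can be looked up in the original DataFrame or sorted by how far the score falls from the threshold.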
