Find the threshold that returns the highest precision

Question

I have data like this:

(26.5625,0)
(29.5625,0)
(30.390625,0)
(18.640625,0)
(27.984375,0)
(26.984375,0)
(25.703125,0)
(25.78125,0)
(32.09375,0)
(25.59375,0)
(27.703125,0)
(30.828125,0)
(23.578125,0)
(21.890625,0)
(25.734375,0)
(24.65625,0)
(27.46875,0)
(31.640625,0)
(26.53125,0)
(25.078125,0)
(30.65625,0)
(24.515625,0)
(25.21875,0)
(21.78125,0)
(28.984375,0)
(29.765625,0)
(27.171875,1)
(30.46875,1)
(35.3125,1)
(27.90625,1)
(34.9375,1)
(33.4375,1)
(30.90625,1)
(31.671875,1)
(32.40625,1)
(26.078125,1)
(31.171875,1)
(36.21875,1)
(35.0625,1)
(35.65625,1)
(36.65625,1)
(37.96875,1)
(31.953125,1)
(33.15625,1)
(37.34375,1)

The corresponding ordered labels used to compute precision are:

ordered_labels: [1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

average precision: 0.7338

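I am trying to find the threshold (for example, 27.0) that returns the highest precision (0.7338 in this case). I tried logistic regression, but the thresholds it returns are values like "0.7" rather than numbers like 27.0. Should I use linear regression or a support vector machine for this kind of data?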

My output (code below):

Precision: [0.33333333 0.         0.         1.        ]
Recall: [1. 0. 0. 0.]
Threshold: [0.13154558 0.7006058  0.72969373]

This is the code I am using:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
from ast import literal_eval

# Create a simple dataset
scores_labels_path = 'data.txt'
X, y = [], []
with open(scores_labels_path) as file:
    for line in file:
        line = literal_eval(line.rstrip())
        X.append(line[0])
        y.append(line[1])

X = np.array(X).reshape(-1, 1)
y = np.array(y)
# X1, y1 = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=7)
lr = LogisticRegression(random_state=42)
lr.fit(X_train, y_train)
y_scores = lr.predict_proba(X_test)
precision, recall, threshold = precision_recall_curve(y_test, y_scores[:, 1])

print("Precision: {}".format(precision))
print("Recall: {}".format(recall))
print("Threshold: {}".format(threshold))

Answer 1

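The problem is that you are using a logistic regression model, which is a probabilistic model. The thresholds you get back are not values of your input feature; they are probabilities output by the logistic regression model.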
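
To illustrate the difference between the two scales, here is a minimal sketch of how a probability threshold could be mapped back to a feature value for a single-feature logistic regression, by inverting p = sigmoid(coef * x + intercept). The function name `probability_to_feature_threshold` and the parameter values are hypothetical and chosen only for illustration; with scikit-learn the fitted values would come from `lr.coef_[0][0]` and `lr.intercept_[0]`.

```python
import math


def probability_to_feature_threshold(p, coef, intercept):
    """Invert p = sigmoid(coef * x + intercept) for a single-feature model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if coef == 0:
        raise ValueError("coef must be non-zero")
    logit = math.log(p / (1.0 - p))  # log-odds of the probability threshold
    return (logit - intercept) / coef


# Hypothetical fitted parameters (assumptions, not taken from the question's model).
coef, intercept = 0.5, -15.0

# Feature value at which this hypothetical model outputs a probability of 0.7.
x_threshold = probability_to_feature_threshold(0.7, coef, intercept)
print(x_threshold)
```

If `coef` is positive, predicting class 1 when the probability is at least `p` is the same as predicting class 1 when the feature is at least the returned value.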

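You can do this without a machine learning model: for each unique feature value, compute the precision obtained by classifying all points with a feature value <= the current value as class 0 and all points with a feature value > the current value as class 1.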

Implementation:

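import numpy as np
import pandas as pd
from ast import literal_eval

# Each line of data.txt looks like "(26.5625,0)", so parse it as a Python tuple
# (pd.read_csv would keep the parentheses inside the values)
with open('data.txt') as file:
    rows = [literal_eval(line.strip()) for line in file if line.strip()]
data = pd.DataFrame(rows, columns=['feature', 'label'])
data = data.sort_values('feature')

precisions = []
thresholds = data['feature'].unique()
for threshold in thresholds:
    # Points above the threshold are predicted as class 1, the rest as class 0
    predicted_labels = np.where(data['feature'] <= threshold, 0, 1)
    tp = np.sum((predicted_labels == 1) & (data['label'] == 1))
    fp = np.sum((predicted_labels == 1) & (data['label'] == 0))
    # Precision is undefined when nothing is predicted as class 1
    # (this happens at the largest feature value), so score it as 0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    precisions.append(precision)

# np.argmax returns the first (smallest) threshold that reaches the maximum
max_precision_index = np.argmax(precisions)
best_threshold = thresholds[max_precision_index]
print("Best threshold: {}".format(best_threshold))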
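
As a small worked example of the same threshold scan, here is a sketch in plain Python that runs on a handful of points copied from the question, so it can be tried without `data.txt`. The function name `best_precision_threshold` and the choice of sample points are assumptions made for illustration. It skips thresholds above which no point is predicted as class 1, since precision is undefined there.

```python
def best_precision_threshold(points):
    """Return (threshold, precision) maximizing the precision of the rule
    'feature > threshold -> class 1' over all unique feature values."""
    best = None
    for threshold in sorted({feature for feature, _ in points}):
        # Labels of the points that the rule predicts as class 1
        predicted_positive = [label for feature, label in points if feature > threshold]
        if not predicted_positive:
            continue  # no predicted positives: precision is undefined
        precision = sum(predicted_positive) / len(predicted_positive)
        # Strict comparison keeps the smallest threshold among ties
        if best is None or precision > best[1]:
            best = (threshold, precision)
    return best


# A few (feature, label) pairs taken from the question's data
sample = [
    (18.640625, 0), (25.59375, 0), (26.078125, 1),
    (27.171875, 1), (29.5625, 0), (30.46875, 1),
    (32.09375, 0), (33.4375, 1), (35.3125, 1),
]

threshold, precision = best_precision_threshold(sample)
print(threshold, precision)
```

Several thresholds can reach the same maximum precision. Keeping the smallest one leaves more points classified as class 1, which tends to give higher recall at the same precision.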
