Calculating roc_curve thresholds for binary classification

Question

This question is similar to the one at the link below; please read it for reference.

How does sklearn calculate the area under the roc curve for two binary inputs?

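I know that everything happens in sklearn.metrics._binary_clf_curve.

But for binary classification, how are the multiple thresholds computed or determined in that function? The function returns y_score[threshold_idxs] as the thresholds used to plot the roc_curve, and I cannot understand how y_score[threshold_idxs] is computed or why it serves as the thresholds.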

Answer 1

Let's use the scikit-learn 0.22.2 documentation as a compass to understand each component of the function and the final result.

Function:

sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)

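"Active" parameters (when called with the defaults):
- y_true: array, shape = [n_samples]. True binary labels.
- y_score: array, shape = [n_samples]. Target scores; these can be probability estimates of the positive class, confidence values, or non-thresholded measures of decisions.
- drop_intermediate: boolean, optional (default=True). Whether to drop some suboptimal thresholds that would not appear on a plotted ROC curve.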

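Outputs:

- fpr: array, shape = [>2]. Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].
- tpr: array, shape = [>2]. Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].
- thresholds: array, shape = [n_thresholds]. Decreasing thresholds on the decision function used to compute fpr and tpr.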
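
To make the meaning of these outputs concrete, here is a minimal sketch that computes the two rates straight from the definition "predictions with score >= thresholds[i]". The toy data and the helper rates_at are illustrative assumptions and are not part of scikit-learn; only NumPy is used.

```python
import numpy as np

# Toy data (illustrative assumption): four samples, positive label is 1.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])


def rates_at(threshold):
    """Return (fpr, tpr) when every score >= threshold is predicted positive."""
    pred = y_score >= threshold
    tp = np.sum(pred & (y_true == 1))  # true positives at this threshold
    fp = np.sum(pred & (y_true == 0))  # false positives at this threshold
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return float(fpr), float(tpr)


# Candidate thresholds: the distinct scores, in decreasing order.
thresholds = np.sort(np.unique(y_score))[::-1]
points = [rates_at(t) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold={t:.2f}  fpr={fpr:.2f}  tpr={tpr:.2f}")
```

Each distinct score yields one (fpr, tpr) point. As far as I know, roc_curve additionally prepends a starting point at (0, 0) paired with an extra threshold above every score, so its arrays are one element longer than the ones built here.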

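Now, looking at the code of roc_curve(), it calls the function _binary_clf_curve(), which, after the appropriate manipulation and sorting, computes:

distinct_value_indices = np.where(np.diff(y_score))[0]
threshold_idxs = np.r_[distinct_value_indices, y_true.size - 1]

These lines are explained in the source comment:

y_score typically has many tied values. Here we extract the indices associated with the distinct values. We also concatenate a value for the end of the curve.

Then it computes:

tps = stable_cumsum(y_true * weight)[threshold_idxs]
fps = 1 + threshold_idxs - tps

and returns:

return fps, tps, y_score[threshold_idxs]

Because y_score has already been sorted in decreasing order at this point, y_score[threshold_idxs] is simply the list of distinct score values, and each of them acts as a threshold: tps and fps count the positives and negatives whose score is >= that value.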
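
The steps above can be sketched with plain NumPy. This is a simplified re-implementation for illustration, not the library's code: it assumes unit sample weights, labels already encoded as 0/1 with 1 as the positive class, and it uses np.cumsum in place of scikit-learn's stable_cumsum. The function name binary_clf_curve_sketch and the data are hypothetical.

```python
import numpy as np


def binary_clf_curve_sketch(y_true, y_score):
    """Simplified sketch of the threshold logic (unit weights, 0/1 labels)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)

    # Sort scores in decreasing order and reorder the labels to match.
    order = np.argsort(y_score, kind="mergesort")[::-1]
    y_score = y_score[order]
    y_true = y_true[order]

    # Last index of each run of tied scores, plus the final index.
    distinct_value_indices = np.where(np.diff(y_score))[0]
    threshold_idxs = np.r_[distinct_value_indices, y_true.size - 1]

    # Positives seen so far at each threshold; the rest are negatives.
    tps = np.cumsum(y_true)[threshold_idxs]
    fps = 1 + threshold_idxs - tps
    return fps, tps, y_score[threshold_idxs]


# Hypothetical data with tied scores (0.9 and 0.4 each appear twice).
fps, tps, thresholds = binary_clf_curve_sketch(
    [1, 0, 1, 1, 0, 0], [0.9, 0.9, 0.7, 0.4, 0.4, 0.1]
)
print(fps, tps, thresholds)
```

Six samples collapse to four thresholds because tied scores share one entry. The term 1 + threshold_idxs is the number of samples with score >= the threshold, so subtracting tps leaves the false positives.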
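Back in the main function roc_curve(), when the condition

if drop_intermediate and len(fps) > 2:

holds, it attempts to drop thresholds corresponding to points that lie between, and are collinear with, other points. fps, tps and thresholds then take their "new", filtered values.

After that you can see other operations, but the core is what I highlighted above.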
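
As a rough illustration of the collinearity filter, the sketch below keeps only the points where the second difference of fps or tps is non-zero, plus the two endpoints. The expression for optimal_idxs is my reading of the scikit-learn source for this version and should be treated as an assumption; the fps, tps and thresholds arrays are made-up data.

```python
import numpy as np

# Made-up curve: the first four points lie on one horizontal line (tps == 1).
fps = np.array([0, 1, 2, 3, 3])
tps = np.array([1, 1, 1, 1, 2])
thresholds = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

# A non-zero second difference in fps or tps means the curve bends there;
# the first and last points are always kept.
optimal_idxs = np.where(
    np.r_[True, np.logical_or(np.diff(fps, 2), np.diff(tps, 2)), True]
)[0]

fps_new = fps[optimal_idxs]
tps_new = tps[optimal_idxs]
thresholds_new = thresholds[optimal_idxs]
print(optimal_idxs, thresholds_new)
```

The two interior points of the horizontal segment are removed, which leaves the plotted curve unchanged while shortening the thresholds array.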
