Random forest predict_proba outputs rounded values

Question

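I am using a random forest in scikit-learn for classification, and I call predict_proba to get the class probabilities. However, the probabilities it returns appear to be rounded to one decimal place.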

I tried it with the sample iris dataset:

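import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Randomly mark roughly 75% of the rows as training data
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
# Map the integer targets to species names
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.head()

train, test = df[df['is_train']], df[~df['is_train']]

# The first four columns are the measurement features
features = df.columns[:4]
clf = RandomForestClassifier(n_jobs=2)
y, _ = pd.factorize(train['species'])
clf.fit(train[features], y)
clf.predict_proba(train[features])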

Output probabilities:

   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  0.8,  0.2],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],

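Is this the default output? Is it possible to increase the number of decimal places?

Note: I found a solution. The default number of trees is 10. After I increased the number of trees to 100, the probabilities became more fine-grained.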

Answer 1

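Apparently the default is ten trees (this was the default before scikit-learn 0.22, which raised it to 100), and your code relies on that default: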

Parameters: 
n_estimators : integer, optional (default=10)
The number of trees in the forest.

Try something like this, increasing the number of trees to 25 or some other number larger than 10:

RandomForestClassifier(n_estimators=25, n_jobs=2)

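If you are only getting the proportion of votes from the 10 default trees, that could well produce the probabilities you are seeing: with 10 trees, each probability is a multiple of 0.1.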
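The vote-proportion explanation above can be checked with a small sketch. It assumes fully grown trees (the scikit-learn defaults for depth and leaf size), so that each tree outputs 0 or 1 for a training sample; the helper name `distinct_probabilities` and the `random_state` value are illustrative choices, not part of the original question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

def distinct_probabilities(n_trees):
    """Return the sorted distinct probability values a forest of n_trees produces."""
    # random_state is fixed only to make the sketch repeatable (an assumption)
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf.fit(X, y)
    proba = clf.predict_proba(X)
    # Round away floating-point noise before collecting the distinct values
    return np.unique(np.round(proba, 6))

coarse = distinct_probabilities(10)   # expected to be multiples of 1/10
fine = distinct_probabilities(100)    # expected to be multiples of 1/100
print(coarse)
print(fine)
```

With more trees the step between possible values shrinks, which is why the output looks less "rounded" even though nothing is actually being rounded.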

The small size of the iris dataset may also play a part. If I remember correctly, it has fewer than 200 observations.

The documentation for predict_proba() reads as follows:

The predicted class probabilities of an input sample is computed as the
mean predicted class probabilities of the trees in the forest. The class
probability of a single tree is the fraction of samples of the same 
class in a leaf.
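As a rough illustration of the documentation excerpt above, the following sketch averages the per-tree probabilities by hand and compares them with the forest's own output. It relies on the fitted `estimators_` attribute of `RandomForestClassifier`; the variable names and the `random_state` value are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Class probabilities from each individual tree: shape (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X) for tree in clf.estimators_])

# Average over the trees, as the documentation describes
manual = per_tree.mean(axis=0)
forest = clf.predict_proba(X)

print(np.allclose(manual, forest))
```

If the two arrays agree, the granularity of the output is determined by the number of trees being averaged rather than by any rounding step.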

I could not find any parameter in the documentation for adjusting the decimal precision of the predicted probabilities.
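One possible reason no such parameter exists is that predict_proba returns an ordinary NumPy float array, so the number of digits shown is decided by NumPy's printing rather than by scikit-learn. A minimal sketch, using two rows typed in by hand from the question's output as stand-in data:

```python
import numpy as np

# Stand-in values copied from two rows of the output shown in the question
proba = np.array([[0.0, 0.8, 0.2],
                  [0.0, 1.0, 0.0]])

# Format with a fixed number of decimals; this changes only the display
text = np.array2string(proba, precision=3, floatmode="fixed")
print(text)
```

Printing more digits does not make the estimates any finer; the values stay multiples of 0.1 until the number of trees is increased.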
