Classification in a random forest


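How can I run a classification with a random forest on data that contains missing values? My plan is to classify without a data preprocessing step first. I am struggling to do this without either imputing or dropping the missing values. Does anyone know how the random forest algorithm can handle missing data in Python?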

When I run my code, the result is:

ValueError                                Traceback (most recent call last)
<ipython-input-22-1ef59d3c2482> in <cell line: 2>()
      1 rf = RandomForestClassifier(n_estimators=9, max_features='sqrt',max_depth=None, random_state=42)
----> 2 rf.fit(X_balanced, y_balanced)

4 frames
/usr/local/lib/python3.9/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    159                 "#estimators-that-handle-nan-values"
    160             )
--> 161         raise ValueError(msg_err)
    162 
    163 

ValueError: Input X contains NaN.
RandomForestClassifier does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

What I am hoping for is to classify a dataset that contains missing values with a random forest, without dropping or imputing those values.
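
One thing worth checking is the installed scikit-learn version: to my knowledge, RandomForestClassifier gained native NaN support in release 1.4, whereas the traceback above comes from an older release that rejects NaN. The sketch below only probes the installed version and reports the outcome. The tiny arrays X and y and the flag nan_supported are hypothetical names used for illustration.

```python
import numpy as np
import sklearn
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data with a few missing entries.
X = np.array([
    [1.0, np.nan],
    [2.0, 0.5],
    [np.nan, 1.5],
    [4.0, 2.5],
    [5.0, np.nan],
    [6.0, 3.5],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Same hyperparameters as in the question.
rf = RandomForestClassifier(
    n_estimators=9, max_features="sqrt", max_depth=None, random_state=42
)

# Older releases raise ValueError ("Input X contains NaN"); newer ones
# are expected to fit without complaint.
try:
    rf.fit(X, y)
    nan_supported = True
except ValueError:
    nan_supported = False

print("scikit-learn", sklearn.__version__, "accepts NaN in RandomForestClassifier:", nan_supported)
```

If this prints False, upgrading scikit-learn is one way to keep the random forest without imputing or dropping values; otherwise one of the estimators listed on the page linked in the error message is the alternative.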

python classification random-forest missing-data