Scikit-learn IterativeImputer: change and scaled tolerance

Question

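As part of a school project, I have to explore a given dataset and apply data analysis and machine learning methods to it. The problem is that my dataset is quite large (12651 rows, 810 columns) and contains a lot of missing values. I wanted to impute these values with Scikit-learn's IterativeImputer, and this is what I got: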

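# IterativeImputer is experimental and must be enabled explicitly before import
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

imp = IterativeImputer(estimator=LinearRegression(),
                       max_iter=10,
                       random_state=0,
                       verbose=2,
                       n_nearest_features=10,
                       initial_strategy="most_frequent")

imp.fit(data)  # data: the 12651 x 810 DataFrame described above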

Output:

[IterativeImputer] Completing matrix with shape (12651, 810)
[IterativeImputer] Ending imputation round 1/10, elapsed time 8.38
[IterativeImputer] Change: 101844979.96577276, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 2/10, elapsed time 18.76
[IterativeImputer] Change: 633298988.0233588, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 3/10, elapsed time 28.81
[IterativeImputer] Change: 591554347.9059296, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 4/10, elapsed time 37.43
[IterativeImputer] Change: 1289773197.9995384, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 5/10, elapsed time 46.58
[IterativeImputer] Change: 1291562921.1247401, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 6/10, elapsed time 54.32
[IterativeImputer] Change: 32943821.50762498, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 7/10, elapsed time 64.13
[IterativeImputer] Change: 58342050.73579848, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 8/10, elapsed time 73.44
[IterativeImputer] Change: 1559818227.7418892, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 9/10, elapsed time 81.46
[IterativeImputer] Change: 164792431487.71582, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 10/10, elapsed time 90.07
[IterativeImputer] Change: 13045775634991.55, scaled tolerance: 208867.02000000002
/usr/local/lib/python3.9/dist-packages/sklearn/impute/_iterative.py:785: ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached.
  warnings.warn(
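
A note on reading the log: as far as I understand the scikit-learn implementation, the "scaled tolerance" is the `tol` parameter multiplied by the largest absolute observed (non-missing) value in the data, so it reflects the raw scale of the features. The sketch below reproduces that rule on a made-up toy array (the array values are purely illustrative, and the rule itself is an assumption about the library internals rather than documented API behaviour). It then works backwards from the logged value, assuming the default `tol=1e-3` was in effect.

```python
import numpy as np

# Toy data with missing entries (values are made up for illustration).
X = np.array([
    [1.0, 2000.0, np.nan],
    [np.nan, 1500.0, 3.0],
    [4.0, np.nan, 250000.0],
])

tol = 1e-3  # default tol of IterativeImputer
observed = ~np.isnan(X)

# Assumed rule: scaled tolerance = tol * largest absolute observed value.
scaled_tol = tol * np.max(np.abs(X[observed]))
print("scaled tolerance for the toy data:", scaled_tol)

# Working backwards from the log: which maximum would give 208867.02?
implied_max = 208867.02 / tol
print("implied largest absolute observed value:", implied_max)
```

If that reading is right, at least one column holds values on the order of 2e8, which suggests the features live on very different scales.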

I don't know whether there is anything I can do to make it converge? Thanks!

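P.S.: One thing I didn't mention is that many of the columns actually represent categorical variables (but because of the NaN values, pandas converted those columns to float64).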

What I tried: switching the estimator from BayesianRidge to LinearRegression, setting n_nearest_features to 10 and 20, and setting initial_strategy="most_frequent" instead of "mean". None of it seems to work :/
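
One further variation that could be tried alongside the attempts listed above is standardizing the features before imputation, keeping a regularized estimator, and clipping imputed values. The following is only a sketch on synthetic data, not a confirmed fix: `X_demo`, the column scales, and the clipping bounds of plus or minus 10 standard deviations are all hypothetical choices made for illustration. It relies on `StandardScaler` ignoring NaN values when fitting and passing them through when transforming.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: columns on very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 1e3, 1e5, 1e8])
X_demo[rng.random(X_demo.shape) < 0.2] = np.nan  # about 20% missing

pipe = make_pipeline(
    StandardScaler(),  # NaNs are ignored in fit and kept in transform
    IterativeImputer(
        estimator=BayesianRidge(),  # regularized, unlike LinearRegression
        max_iter=10,
        random_state=0,
        initial_strategy="mean",
        min_value=-10,  # clip imputations, in standardized units
        max_value=10,
    ),
)

# Impute in standardized space, then map back to the original units.
X_scaled_imputed = pipe.fit_transform(X_demo)
X_imputed = pipe.named_steps["standardscaler"].inverse_transform(X_scaled_imputed)
print(np.isnan(X_imputed).sum())
```

Clipping keeps a single runaway regression from feeding huge values into the next round, which is one plausible reason for a "Change" that keeps growing. Columns that are really categorical would probably still need separate handling, since a linear model treats their codes as continuous numbers.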
