Scikit-learn IterativeImputer: change and scaled tolerance


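As part of a school project, I have to explore a given dataset and apply data analysis and machine learning methods to it. The problem is that my dataset is very large (12651 rows, 810 columns) and contains a lot of missing values. I want to impute these values with scikit-learn's IterativeImputer, and this is what I have: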

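# IterativeImputer is experimental and must be enabled explicitly before import
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# data: the (12651, 810) dataset containing NaN values
imp = IterativeImputer(estimator=LinearRegression(),
                       max_iter=10,
                       random_state=0,
                       verbose=2,
                       n_nearest_features=10,
                       initial_strategy="most_frequent")

imp.fit(data)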

Result:

[IterativeImputer] Completing matrix with shape (12651, 810)
[IterativeImputer] Ending imputation round 1/10, elapsed time 8.38
[IterativeImputer] Change: 101844979.96577276, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 2/10, elapsed time 18.76
[IterativeImputer] Change: 633298988.0233588, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 3/10, elapsed time 28.81
[IterativeImputer] Change: 591554347.9059296, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 4/10, elapsed time 37.43
[IterativeImputer] Change: 1289773197.9995384, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 5/10, elapsed time 46.58
[IterativeImputer] Change: 1291562921.1247401, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 6/10, elapsed time 54.32
[IterativeImputer] Change: 32943821.50762498, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 7/10, elapsed time 64.13
[IterativeImputer] Change: 58342050.73579848, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 8/10, elapsed time 73.44
[IterativeImputer] Change: 1559818227.7418892, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 9/10, elapsed time 81.46
[IterativeImputer] Change: 164792431487.71582, scaled tolerance: 208867.02000000002 
[IterativeImputer] Ending imputation round 10/10, elapsed time 90.07
[IterativeImputer] Change: 13045775634991.55, scaled tolerance: 208867.02000000002
/usr/local/lib/python3.9/dist-packages/sklearn/impute/_iterative.py:785: ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached.
  warnings.warn(
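
For reference, the two numbers on each "Change" line can be reproduced by hand. This is a minimal sketch, assuming (from a reading of scikit-learn's `_iterative.py`, not from its documentation) that the scaled tolerance is `tol` times the largest absolute observed value and that the change is the largest absolute difference between two consecutive imputed matrices. The toy array and all variable names are made up for illustration.

```python
import numpy as np

# Toy data (hypothetical); NaN marks a missing entry.
X = np.array([[1.0, 2000.0],
              [np.nan, 4000.0],
              [3.0, np.nan]])
observed = ~np.isnan(X)

tol = 1e-3  # IterativeImputer's default tol
# Assumed definition: tol scaled by the largest absolute observed value.
scaled_tolerance = tol * np.max(np.abs(X[observed]))

# Two consecutive imputation rounds with made-up values in the missing cells.
X_prev = np.where(observed, X, 0.0)
X_prev[1, 0], X_prev[2, 1] = 2.0, 3000.0
X_curr = X_prev.copy()
X_curr[1, 0], X_curr[2, 1] = 2.5, 3100.0

# Assumed definition: infinity norm of the difference between rounds.
change = np.max(np.abs(X_curr - X_prev))
converged = change < scaled_tolerance
print(scaled_tolerance, change, converged)
```

Under that assumption, a scaled tolerance of 208867.02 with the default `tol=1e-3` would mean the largest observed value in the data is roughly 2.09e8, so a few large-scale columns may dominate both numbers in the log.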

Is there anything I can do to make it converge? Thanks!

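P.S.: One thing I did not mention is that many of the columns actually represent categorical variables (but because of the NaN values, pandas converted these columns to float64).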
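
To make the categorical point concrete, here is a sketch of one way the two kinds of columns could be imputed separately: `SimpleImputer(strategy="most_frequent")` for the categorical codes and `IterativeImputer` for the numeric columns, combined with `ColumnTransformer`. The column names and the tiny DataFrame are hypothetical, and this is only an illustration of the idea, not something tested on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

# Hypothetical stand-in for the real data: two numeric columns and one
# categorical column that pandas stores as float64 because of the NaN.
df = pd.DataFrame({
    "height": [1.7, np.nan, 1.8, 1.6, 1.75, 1.65],
    "weight": [70.0, 80.0, np.nan, 55.0, 72.0, 60.0],
    "category_code": [0.0, 1.0, 1.0, np.nan, 2.0, 1.0],
})
numeric_cols = ["height", "weight"]
categorical_cols = ["category_code"]

ct = ColumnTransformer([
    # Regression-based imputation only for the truly numeric columns.
    ("num", IterativeImputer(random_state=0), numeric_cols),
    # Categorical codes get the most frequent observed code instead.
    ("cat", SimpleImputer(strategy="most_frequent"), categorical_cols),
])

# ColumnTransformer outputs the columns in the order the transformers are listed.
imputed = pd.DataFrame(ct.fit_transform(df),
                       columns=numeric_cols + categorical_cols)
print(imputed)
```

This keeps the imputed categorical values inside the set of observed codes, which a linear regression on the raw codes does not guarantee.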

What I tried: changing the estimator from BayesianRidge to LinearRegression, setting n_nearest_features to 10 and 20, and setting initial_strategy="most_frequent" instead of "mean". None of this seemed to help :/
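
One option not in the list above is to put the columns on a comparable scale before imputing. The sketch below assumes `StandardScaler` (which ignores NaN when fitting and leaves them in place when transforming) in front of `IterativeImputer`, and uses a small synthetic array in place of the real data. Whether this makes the real dataset converge has not been verified.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: one column on a much larger scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = 1e6 * X[:, 0] + rng.normal(size=200)
X[rng.random(X.shape) < 0.1] = np.nan  # knock out about 10% of the entries

pipe = make_pipeline(
    StandardScaler(),  # NaN are skipped in fit and kept in transform
    IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0),
)
X_imputed_scaled = pipe.fit_transform(X)

# Map the imputed values back to the original units.
scaler = pipe.named_steps["standardscaler"]
X_imputed = scaler.inverse_transform(X_imputed_scaled)
print(X_imputed.shape, np.isnan(X_imputed).any())
```

After scaling, the tolerance is no longer driven by the single largest raw value, so the "Change" and "scaled tolerance" figures become easier to compare across columns.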

python scikit-learn missing-data