How do I get categorical data back after imputing with LabelEncoder + IterativeImputer?

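I am trying to impute missing values in a categorical column. The imputation itself succeeded, but now I want to convert the imputed values back to their original categories. How can I do that? I used LabelEncoder and IterativeImputer.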

Here is what I did:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder
    from sklearn.experimental import enable_iterative_imputer
    from sklearn.impute import IterativeImputer
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    
    df = pd.read_csv("/kaggle/input/viintage-analysis/dwd.csv")
    categorical = ['OCCUPATION_TYPE']
    
    df[categorical] = df[categorical].apply(lambda series: pd.Series(
        LabelEncoder().fit_transform(series[series.notnull()]),
        index=series[series.notnull()].index
    ))
    
    print(df)
    imp_cat = IterativeImputer(estimator=RandomForestClassifier(), 
                               initial_strategy='most_frequent',
                               max_iter=10, random_state=0)
    
    df[categorical] = imp_cat.fit_transform(df[categorical])
    #df[categorical] = imp_cat.transform(df[categorical])
    
    print(df)

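The output is numeric [image: imputer output]. I want to convert the values back to categories; how do I do that? (I tried inverse_transform, but it did not work, and I ran into a similar problem when trying KNNImputer.)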
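One likely reason inverse_transform is unavailable here is that the LabelEncoder is created inside the lambda and discarded, so no fitted encoder is left to decode with. Below is a minimal sketch of one way to round-trip the labels, not a definitive fix. It assumes a small made-up DataFrame in place of the CSV, and the `encoders` dictionary is a hypothetical helper name. The idea is to keep one fitted encoder per column, impute on the numeric codes, then round and clip the result to valid integer codes before calling `inverse_transform`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for the CSV from the question.
df = pd.DataFrame({
    "OCCUPATION_TYPE": ["Laborers", "Managers", np.nan,
                        "Laborers", "Drivers", np.nan],
})
categorical = ["OCCUPATION_TYPE"]

# Fit one encoder per column on the non-missing values and KEEP it,
# so the codes can be decoded again after imputation.
encoders = {}
for col in categorical:
    mask = df[col].notna().to_numpy()
    enc = LabelEncoder()
    codes = np.full(len(df), np.nan)          # missing rows stay NaN
    codes[mask] = enc.fit_transform(df.loc[mask, col])
    df[col] = codes
    encoders[col] = enc

# Impute on the numeric codes, with the same settings as in the question.
imp_cat = IterativeImputer(estimator=RandomForestClassifier(),
                           initial_strategy="most_frequent",
                           max_iter=10, random_state=0)
df[categorical] = imp_cat.fit_transform(df[categorical])

# Decode: round to the nearest integer code, clip to the valid range,
# then map the codes back to the original labels.
for col in categorical:
    enc = encoders[col]
    codes = np.rint(df[col].to_numpy())
    codes = np.clip(codes, 0, len(enc.classes_) - 1).astype(int)
    df[col] = enc.inverse_transform(codes)

print(df)
```

The rounding and clipping step matters mostly for imputers that average neighbours or regress (KNNImputer, for example), since those can return fractional codes that `inverse_transform` would reject. Also note that, as far as I can tell, when IterativeImputer receives only a single column it has no other features to learn from and simply returns the `initial_strategy` fill, so passing additional feature columns is probably needed for the estimator to have any effect.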

machine-learning data-science data-cleaning missing-data imputation