Could not convert string to float: 'N' when running StandardScaler

Problem description

Code:

num_features = [feature for feature in x.columns if x[feature].dtypes != 'O']    

x[num_features].replace('N',value=0)

from sklearn.preprocessing import StandardScaler
stds = StandardScaler()
x[num_features]= stds.fit(x)

Error:

ValueError: could not convert string to float: 'N'

python scikit-learn preprocessor
1 Answer

To select the numeric columns:

numeric_features = df.select_dtypes(include=['number'])

Your current approach does not exclude boolean columns (True/False), Category columns, or Date and Time columns.
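
As a small sketch of the difference, using a hypothetical DataFrame with mixed dtypes (the column names below are made up for illustration):

import pandas as pd
import numpy as np

# hypothetical DataFrame with mixed dtypes, for illustration only
df = pd.DataFrame({
    'age': [25, 32, 47],                          # int64
    'income': [50000.0, np.nan, 72000.0],         # float64
    'is_member': [True, False, True],             # bool
    'city': pd.Categorical(['NY', 'LA', 'NY']),   # category
    'joined': pd.to_datetime(['2020-01-01', '2021-06-15', '2022-03-30']),  # datetime64
})

# the question's check only drops object columns, so bool, category
# and datetime columns slip through
loose = [c for c in df.columns if df[c].dtypes != 'O']
print(loose)   # ['age', 'income', 'is_member', 'city', 'joined']

# select_dtypes(include=['number']) keeps only int/float columns
strict = df.select_dtypes(include=['number']).columns.tolist()
print(strict)  # ['age', 'income']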

Having only numeric values is not enough, either: there may be missing values, and they need to be handled before standardising.

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# numeric transformer
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

# before scaling, the imputer fills missing values with each column's median
normalise_numeric_features = numeric_transformer.fit_transform(numeric_features)
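
Note that fit_transform returns a plain NumPy array. If you want to keep working with a labelled DataFrame, one option (a sketch, assuming numeric_features is the DataFrame selected above) is to wrap the result back up with the original index and column names:

import pandas as pd

# restore the DataFrame structure around the scaled NumPy array
normalised_df = pd.DataFrame(
    normalise_numeric_features,
    index=numeric_features.index,
    columns=numeric_features.columns,
)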

This may still not be enough: your dataset can contain different data types, and each type needs its own transformation. You can do that like this:

import pandas as pd
import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

# select feature names by type (ColumnTransformer expects column names)
numeric_features = df.select_dtypes(include=['number']).columns
categorical_features = df.select_dtypes(include=['category']).columns

# create a transformer for each feature type
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# combine them into a single preprocessor
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# if you have a model, you can add it to the pipeline too
clf = Pipeline(steps=[('preprocessor', preprocessor),
                  ('classifier', LogisticRegression(solver='lbfgs'))])  
# now we can train the model
clf.fit(df[train_features], df[train_target])

# fit takes the features, runs the preprocessing (impute, scale, one-hot encode) via fit_transform, and then trains the model.

clf.predict(df[test_features])

# predict only transforms the features, using the statistics learned from the training features.
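
Tying this back to the original error: StandardScaler was fitted on the whole of x, including object columns containing 'N', and the result of replace was never assigned back. A minimal sketch of a direct fix, assuming 'N' is simply a missing-value marker in your x DataFrame:

import numpy as np

# replace() returns a new DataFrame, so assign the result back;
# the original code discarded it
x = x.replace('N', np.nan)

# scale only the numeric columns -- fitting the scaler on all of x
# is what raised "could not convert string to float: 'N'"
num_features = x.select_dtypes(include=['number']).columns
x[num_features] = numeric_transformer.fit_transform(x[num_features])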

Hope this helps you get the most out of scikit-learn.
