How to use string and float data types in the sklearn KNN .fit() method


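I have a dataset containing both string and float data types, and I want to train my KNN model on it, but it raises a ValueError saying "could not convert string to float".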

inputs=data.drop(columns=['HeartDisease'])
output=data.drop(columns=['Age', 'Sex', 'ChestPainType', 'RestingBP', 'Cholesterol', 'FastingBS', 'RestingECG', 'MaxHR', 'ExerciseAngina', 'Oldpeak', 'ST_Slope'])

import sklearn
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(inputs,output,train_size=0.8)

from sklearn.neighbors import KNeighborsClassifier
model=KNeighborsClassifier(n_neighbors=31)
model.fit(x_train,y_train)

I have also attached an image of the dataset.

I expected the model to train on this dataset.

python pandas machine-learning scikit-learn knn
1 Answer

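Machine learning models generally cannot use string data as-is. You have to preprocess the inputs to convert them to a numeric type. Outside of natural language processing, text columns usually hold a small number of distinct values (categorical features).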
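To make the failure concrete, here is a minimal sketch that reproduces it on a tiny made-up table. The column values and `n_neighbors=1` are assumptions chosen for illustration, not taken from the real dataset.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one string column and one numeric column.
X = pd.DataFrame({"Sex": ["M", "F", "M", "F"], "Age": [40, 49, 37, 48]})
y = [0, 1, 0, 1]

model = KNeighborsClassifier(n_neighbors=1)
try:
    # fit() tries to convert every column to float and fails on "M"/"F".
    model.fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The error comes from the string column alone, so only that column needs encoding; the numeric column can stay as it is.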

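Take the 'ChestPainType' column as an example: it should contain only 4 values, ['ATA', 'NAP', 'ASY', 'TA']. You now have to convert these strings to numbers: 'ATA': 0, 'NAP': 1, 'ASY': 2, 'TA': 3. In Pandas, you can use pd.factorize or pd.get_dummies to do this, but if you use sklearn, try LabelEncoder (especially when you need it for the y target) or OneHotEncoder (and sometimes OrdinalEncoder).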
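The options named above behave slightly differently, which a short sketch can show. The five-row sample column is made up for illustration; only the four category names come from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample of the 'ChestPainType' column.
df = pd.DataFrame({"ChestPainType": ["ATA", "NAP", "ASY", "TA", "ATA"]})

# pd.factorize numbers categories in order of first appearance.
codes, uniques = pd.factorize(df["ChestPainType"])

# pd.get_dummies creates one indicator column per category.
dummies = pd.get_dummies(df["ChestPainType"])

# OrdinalEncoder numbers categories in sorted order and expects 2-D input.
encoder = OrdinalEncoder()
ordinal = encoder.fit_transform(df[["ChestPainType"]])

print(list(codes))             # integer code per row
print(list(uniques))           # category for each code
print(list(dummies.columns))   # one column per category
print(ordinal.ravel())         # float code per row, sorted-category order
```

Note that pd.factorize and OrdinalEncoder assign different integers to the same category, so whichever mapping is used for training must also be used at prediction time.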

The simplest way is to use ColumnTransformer.

Reproducible example:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OrdinalEncoder

# https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction
data = pd.read_csv('heart.csv')

features = data.drop(columns=['HeartDisease'])
target = data['HeartDisease']

# Text features to convert to numeric codes, e.g. 'F': 0, 'M': 1
feat_cols = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']

# Encode the text columns and pass the numeric columns through unchanged
ct = ColumnTransformer(
    transformers=[('le', OrdinalEncoder(), feat_cols)],
    remainder='passthrough'
)

# Convert your data to numeric values
X = ct.fit_transform(features)
y = target.to_numpy()

# Create 2 datasets for train and test
# (train_test_split returns X_train, X_test, y_train, y_test, in that order)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

# Missing step, use `StandardScaler` to normalize numeric values

# Train your model
model = KNeighborsClassifier(n_neighbors=31)
model.fit(X_train, y_train)

# Evaluate your model (63% here)
model.score(X_test, y_test)
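The example leaves the `StandardScaler` step as a comment. One possible way to fill it in is sketched below, swapping in OneHotEncoder for the text columns and wrapping everything in a Pipeline. The synthetic data, the reduced set of columns, the made-up label rule, and `n_neighbors=5` are all assumptions so the snippet runs without heart.csv.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for heart.csv.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Sex": rng.choice(["M", "F"], size=n),
    "ChestPainType": rng.choice(["ATA", "NAP", "ASY", "TA"], size=n),
    "Age": rng.integers(30, 80, size=n),
    "MaxHR": rng.integers(80, 200, size=n),
})
labels = (toy["Age"] > 55).astype(int)  # made-up target, for illustration only

cat_cols = ["Sex", "ChestPainType"]
num_cols = ["Age", "MaxHR"]

# One-hot encode the text columns and standardize the numeric ones.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ("scale", StandardScaler(), num_cols),
    ]
)

pipeline = Pipeline(steps=[
    ("preprocess", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

X_train, X_test, y_train, y_test = train_test_split(
    toy, labels, train_size=0.8, random_state=0
)

# The encoder and scaler are fitted on the training split only.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print(pipeline.score(X_test, y_test))
```

Scaling matters for KNN because distances are otherwise dominated by the columns with the largest numeric range. Keeping the preprocessing inside the pipeline also means the test split is transformed with statistics learned from the training split.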