Unable to run ColumnTransformer

Problem description

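I am trying to preprocess some data. I have already filled in the missing values. However, when I try to encode the categorical data as integers, the X dataset is encoded correctly, but I get an error on the y column. I have found very few articles on this topic so far. Please help.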

  • Please see the attached error image and the original dataset; the error is shown in the image.
  • Original dataset:
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes
  • Python code:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])

# Encoding categorical data
# Encoding the Independent Variable
#from sklearn.preprocessing import LabelEncoder 
from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder()

ct = ColumnTransformer(
    [('one_hot_encoder', ohe, [0])],
    remainder='passthrough'
)

print(dataset)
x = np.array(ct.fit_transform(x), dtype=np.int)
y = np.array(ct.fit_transform(y), dtype=np.int)

[error image][1]


  [1]: https://i.stack.imgur.com/YPR66.png
python pandas scikit-learn
1 answer

y is your target variable, i.e. the variable you want to predict. It is a one-dimensional array; if you call y.shape, you get

>>> y.shape
(10,)

That is probably why you get an index error: y has no second axis, so y.shape[1] is out of bounds.
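As a minimal sketch of the point above, the snippet below rebuilds the target column by hand (the values are copied from the dataset in the question; the names `n_columns` and `y_2d` are only illustrative) and shows that a 1-D array has no second axis, while a reshaped copy has one column.

```python
import numpy as np

# Target column from the question's dataset, typed in by hand for illustration.
y = np.array(["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"])

# y is one-dimensional, so its shape tuple has a single entry.
print(y.ndim, y.shape)

# Asking for a second axis fails, which matches the kind of index error described.
try:
    n_columns = y.shape[1]
except IndexError as exc:
    n_columns = None
    print("IndexError:", exc)

# A 2-D version with a single column, the layout that column-based
# transformers operate on.
y_2d = y.reshape(-1, 1)
print(y_2d.shape)
```

Reshaping only explains the error; for a binary target, encoding it as integer labels is usually the simpler choice.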

You should not one-hot encode your target variable; label-encode it instead. That is, replace the last line with:

y = pd.Categorical(y).codes

Then y will be

array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1], dtype=int8)

0 corresponds to "No" (not purchased) and 1 corresponds to "Yes" (purchased).
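To make the code-to-label mapping explicit rather than relying on the default sorted order, one option is to pass the categories yourself. This is only a sketch with the question's target values typed in by hand; the names `cat`, `codes` and `mapping` are illustrative.

```python
import pandas as pd

# Target values from the question's "Purchased" column.
y = ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

# Pin the category order so that "No" is always 0 and "Yes" is always 1.
cat = pd.Categorical(y, categories=["No", "Yes"])

# Integer codes, one per row, in the same order as y.
codes = cat.codes

# Lookup table from integer code back to the original label.
mapping = dict(enumerate(cat.categories))

print(mapping)
print(codes.tolist())
```

Keeping `mapping` around makes it easy to translate predictions back into the original labels later.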
