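All transformed output is merged into a single column: my DataFrame has shape (445132, 34), but after the transform it is reduced to (445132, 1).
The categorical list contains all categorical columns except "GeneralHealth"; numerical contains all numeric columns.
Here is my code: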
trans = ColumnTransformer(transformers=[
("encoder", OrdinalEncoder(categories=[["Excellent","Very good","Good","Fair","Poor"]]), ["GeneralHealth"]),
("encoder1", OneHotEncoder(drop="first"), categorical),
("scaler", StandardScaler(), numerical)
], remainder="passthrough")
f_transformed = trans.fit_transform(f)
transformed_data = pd.DataFrame(f_transformed, columns=trans.get_feature_names_out())
transformed_data.head(4)
I made a small addition (verbose_feature_names_out=False):
trans = ColumnTransformer(transformers=[
("encoder", OrdinalEncoder(categories=[["Excellent","Very good","Good","Fair","Poor"]]), ["GeneralHealth"]),
("encoder1", OneHotEncoder(drop="first"), categorical),
("scaler", StandardScaler(), numerical)
], remainder="passthrough", verbose_feature_names_out=False)
f_transformed = trans.fit_transform(f)
transformed_data = pd.DataFrame(f_transformed, columns=trans.get_feature_names_out())
transformed_data.head(4)
If you are using sklearn version 1.2 or later, try the following:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler
# Define the transformer
trans = ColumnTransformer(transformers=[
("encoder", OrdinalEncoder(categories=[["Excellent", "Very good", "Good", "Fair", "Poor"]]), ["GeneralHealth"]),
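# sparse_output=False is needed here: pandas output does not support sparse data
("encoder1", OneHotEncoder(drop="first", sparse_output=False), categorical),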
("scaler", StandardScaler(), numerical)
], remainder="passthrough")
# Set the output of the transformer to a pandas DataFrame
trans.set_output(transform="pandas")
# Fit and transform the data
f_transformed = trans.fit_transform(f)
# Now f_transformed should be a DataFrame with the appropriate column names
transformed_data = f_transformed
# Display the first few rows of the DataFrame
print(transformed_data.head(4))