Pandas get_dummies changes shape


I want to one-hot encode the categorical features of a Pandas DataFrame. Previously, each feature was stored as a variable of shape (60,). See the code below:

ohe_features = ["Gender", "Married", "Self_Employed"]
num_features = ["Dependents"]

df = pd.get_dummies(df, columns=ohe_features, dtype=int)

After calling get_dummies, df now has columns with the following shapes:

Column 'Gender_Female' has shape (60, 2)
Column 'Gender_Male' has shape (60, 2)
Column 'Married_No' has shape (60, 2)
Column 'Married_Yes' has shape (60, 2)
Column 'Self_Employed_No' has shape (60, 2)
Column 'Self_Employed_Yes' has shape (60, 2)

How can I encode the categorical variables without changing the original dimensions of the features?

Reproducible example:

Dependents  Gender  Married Self_Employed
0          Female  Yes      No

1 Answer

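If you want to keep the features' original dimensions, use LabelEncoder() from sklearn.preprocessing. However, you need to know the difference between LabelEncoder() and pandas get_dummies: LabelEncoder() maps each category to a single integer code, so the feature stays one column, whereas get_dummies creates one 0/1 indicator column per category.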

LabelEncoder() example:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Example categories to encode
bridge_types = ('Arch', 'Beam', 'Truss', 'Cantilever',
                'Tied Arch', 'Suspension', 'Cable')
df = pd.DataFrame(bridge_types, columns=['Bridge_Types'])

# Map each category to an integer code and store it in a new column
labelencoder = LabelEncoder()
df['Bridge_Types_Cat'] = labelencoder.fit_transform(df['Bridge_Types'])
df
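
To make the difference concrete on the question's own columns, here is a minimal sketch. The three-row DataFrame is made-up illustration data, not the asker's dataset. Note also that scikit-learn documents LabelEncoder as intended for target labels; it is used on feature columns here only because that is what this answer suggests.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "Dependents": [0, 1, 2],
    "Gender": ["Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes"],
    "Self_Employed": ["No", "No", "Yes"],
})
ohe_features = ["Gender", "Married", "Self_Employed"]

# Label encoding: one integer code per category, column count unchanged.
label_encoded = df.copy()
for col in ohe_features:
    label_encoded[col] = LabelEncoder().fit_transform(label_encoded[col])

# One-hot encoding: one 0/1 column per category, so the frame gets wider.
one_hot = pd.get_dummies(df, columns=ohe_features, dtype=int)

print(label_encoded.shape)             # (3, 4)
print(one_hot.shape)                   # (3, 7)
print(one_hot["Gender_Female"].shape)  # (3,)
```

In both cases each individual column stays one-dimensional; only the number of columns differs.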

More information: link
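
A side note on the (60, 2) shapes reported in the question: a uniquely named column produced by get_dummies is normally one-dimensional. One possible cause, which the question does not confirm, is that df ended up with duplicate column labels, in which case selecting a label returns a two-column DataFrame. The sketch below simulates that situation with made-up data and shows one way to detect and drop the duplicates.

```python
import pandas as pd

# Hypothetical data: two rows with one categorical column from the question.
df = pd.DataFrame({"Gender": ["Female", "Male"]})
dummies = pd.get_dummies(df, columns=["Gender"], dtype=int)

# A single, uniquely named column is a 1-D Series.
print(dummies["Gender_Female"].shape)  # (2,)

# Simulate the suspected problem: the same dummy columns joined in twice.
duplicated = pd.concat([dummies, dummies], axis=1)
print(duplicated["Gender_Female"].shape)  # (2, 2)

# Detect repeated labels, then keep only the first occurrence of each.
mask = duplicated.columns.duplicated()
print(duplicated.columns[mask].tolist())  # ['Gender_Female', 'Gender_Male']
deduplicated = duplicated.loc[:, ~mask]
print(deduplicated["Gender_Female"].shape)  # (2,)
```

If df.columns.duplicated().any() is False on the real data, this explanation does not apply and the cause lies elsewhere.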
