Find the one-hot binary vector assigned to each category

Question (votes: 0, answers: 1)

I have a pandas DataFrame df, and df['downtime'] produces the following output:

R1
R1
R2
R3
R1
NA

I encoded df['downtime'] as follows:

from sklearn.preprocessing import OneHotEncoder
enc_downtime_code = OneHotEncoder()
downtime_code_enc = enc_downtime_code.fit_transform(df['downtime'].values.reshape(-1, 1)).toarray()

Now I want to find the one-hot binary vectors assigned to R1, R2, R3 and NA. How can I do that?

pandas scikit-learn one-hot-encoding
1 Answer
0 votes

You can do it like this:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    'downtime': ['R1', 'R1', 'R2', 'R3', 'R1', 'NA']})

# Fit the encoder and get the dense one-hot matrix (one row per sample)
enc_downtime_code = OneHotEncoder()
downtime_code_enc = enc_downtime_code.fit_transform(df['downtime'].values.reshape(-1, 1)).toarray()

# Label each column with the category it represents (requires scikit-learn >= 1.0)
feature_names = enc_downtime_code.get_feature_names_out(['downtime'])
encoded_matrix = pd.DataFrame(downtime_code_enc, columns=feature_names)
# Reorder the columns; the encoder itself sorts the categories, so NA comes first
subset_matrix = encoded_matrix[['downtime_R1', 'downtime_R2', 'downtime_R3', 'downtime_NA']]

print(subset_matrix)

This returns:

   downtime_R1  downtime_R2  downtime_R3  downtime_NA
0          1.0          0.0          0.0          0.0
1          1.0          0.0          0.0          0.0
2          0.0          1.0          0.0          0.0
3          0.0          0.0          1.0          0.0
4          1.0          0.0          0.0          0.0
5          0.0          0.0          0.0          1.0
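The table above has one row per sample, so R1 appears three times. If what you want is a single vector per category, one option is to read the fitted categories from the encoder and transform each of them once. The following is a minimal sketch under that assumption; the names enc, categories, codes and mapping are illustrative and do not come from the question.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'downtime': ['R1', 'R1', 'R2', 'R3', 'R1', 'NA']})

enc = OneHotEncoder()
enc.fit(df[['downtime']])

# categories_ holds one array per input column; the columns of the
# encoded matrix follow this (sorted) order.
categories = enc.categories_[0]

# Encode each distinct category exactly once to get its one-hot row.
codes = enc.transform(pd.DataFrame({'downtime': categories})).toarray()

# Map every category to its vector (hypothetical helper structure).
mapping = {cat: row.astype(int).tolist() for cat, row in zip(categories, codes)}

for cat, row in mapping.items():
    print(cat, row)
```

Because the encoder sorts the categories, the string 'NA' takes the first column here, ahead of R1, R2 and R3. That is why the column order differs from the manually reordered table above.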
