I have a pandas DataFrame df. The column df['downtime'] has the following output:
R1
R1
R2
R3
R1
NA
I encoded df['downtime'] as follows:
from sklearn.preprocessing import OneHotEncoder
enc_downtime_code = OneHotEncoder()
downtime_code_enc = enc_downtime_code.fit_transform(df['downtime'].values.reshape(-1, 1)).toarray()
Now I want to find the one-hot binary vectors assigned to R1, R2, R3, and NA. How can I do that?
You can do it like this:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'downtime': ['R1', 'R1', 'R2', 'R3', 'R1', 'NA']})

# Fit the encoder and get the dense one-hot matrix
enc_downtime_code = OneHotEncoder()
downtime_code_enc = enc_downtime_code.fit_transform(df['downtime'].values.reshape(-1, 1)).toarray()

# Column names have the form downtime_<category>, in the encoder's column order
feature_names = enc_downtime_code.get_feature_names_out(['downtime'])
encoded_matrix = pd.DataFrame(downtime_code_enc, columns=feature_names)

# Reorder the columns as R1, R2, R3, NA
subset_matrix = encoded_matrix[['downtime_R1', 'downtime_R2', 'downtime_R3', 'downtime_NA']]
print(subset_matrix)
This returns:
   downtime_R1  downtime_R2  downtime_R3  downtime_NA
0          1.0          0.0          0.0          0.0
1          1.0          0.0          0.0          0.0
2          0.0          1.0          0.0          0.0
3          0.0          0.0          1.0          0.0
4          1.0          0.0          0.0          0.0
5          0.0          0.0          0.0          1.0
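
If what you want is the vector assigned to each category rather than the encoded rows, one option is to read the fitted encoder's categories_ attribute and encode each category once. This is only a minimal sketch under that assumption; the names enc, categories, codes, and mapping are made up for illustration and are not from the question.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'downtime': ['R1', 'R1', 'R2', 'R3', 'R1', 'NA']})

enc = OneHotEncoder()
enc.fit(df['downtime'].values.reshape(-1, 1))

# categories_ holds one array per input column; its order is the
# column order of the encoded matrix
categories = enc.categories_[0]

# Encode each category once to get the one-hot row assigned to it
codes = enc.transform(categories.reshape(-1, 1)).toarray()

# Illustrative lookup table: category -> one-hot vector
mapping = {cat: row.astype(int).tolist() for cat, row in zip(categories, codes)}
for cat, row in mapping.items():
    print(cat, row)
```

Here the categories come out sorted (NA, R1, R2, R3), so NA takes the first column of the raw encoded matrix. The answer above reorders the columns by name, which is why NA appears last there.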