ValueError when implementing one-hot encoding

Question

Below is an implementation of one-hot encoding for a column that contains many categories.

Ttop_10_LMR = [x for x in tdata.Loan_Amount_Requested.value_counts().sort_values(ascending=False).head(10).index]

The Ttop_10_LMR variable holds the 10 most frequent values in the Loan_Amount_Requested column.

def one_hot_top_x(df, variable, top_x_labels):
    for label in top_x_labels:
        df[variable+'_'+str(label)] = np.where(data[variable]==label, 1, 0)

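The one_hot_top_x function adds one column to the DataFrame for each label in top_x_labels, filled with 1 where the row matches that label and 0 otherwise.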

But when I run the code below:

one_hot_top_x(tdata, 'Loan_Amount_Requested', Ttop_10_LMR)

I get a ValueError: Length of values does not match length of index

Thanks.

Answer 1

df[variable+'_'+str(label)] = np.where(tdata[variable]==label, 1, 0)

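The function body uses data where it should use tdata :) Presumably data is a different DataFrame with a different number of rows, so the array returned by np.where does not match the length of df's index.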
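
A slightly more general variant is to read from the df argument itself instead of any global name, so the function works for whichever DataFrame is passed in. The following is only a minimal sketch: the sample values and the top_2 name are hypothetical stand-ins for the real tdata and Ttop_10_LMR.

```python
import numpy as np
import pandas as pd

def one_hot_top_x(df, variable, top_x_labels):
    """Add one 0/1 indicator column to df for each label in top_x_labels."""
    for label in top_x_labels:
        # Compare against df[variable] (the argument), not a global DataFrame,
        # so the result always has the same length as df's index.
        df[variable + '_' + str(label)] = np.where(df[variable] == label, 1, 0)

# Hypothetical sample data standing in for the real tdata
tdata = pd.DataFrame({'Loan_Amount_Requested': [1000, 2000, 1000, 3000, 1000, 2000]})

# value_counts() already sorts by frequency in descending order
top_2 = tdata['Loan_Amount_Requested'].value_counts().head(2).index.tolist()

one_hot_top_x(tdata, 'Loan_Amount_Requested', top_2)
print(tdata)
```

Rows whose value is not among the top labels (3000 here) get 0 in every new indicator column.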
