Output the top 2 classes from a multiclass classification algorithm


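I am working on a multiclass text classification problem with many different classes (more than 15). I have trained a LinearSVC model (the method is just an example), but predict only returns the single top-scoring class. Is there a way to get the two best classes for each sample instead?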

The sample code I am using:

    from sklearn.svm import LinearSVC
    import matplotlib.pyplot as plt
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

    # Count 1- to 3-grams, dropping very common and very rare terms
    count_vect = CountVectorizer(max_df=.9, min_df=.002, encoding='latin-1', ngram_range=(1, 3))
    X_train_counts = count_vect.fit_transform(df_upsampled['text'])
    # Re-weight the raw counts with TF-IDF
    tfidf_transformer = TfidfTransformer(sublinear_tf=True, norm='l2')
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
    clf = LinearSVC().fit(X_train_tfidf, df_upsampled['reason'])
    # X_test is assumed to be the test text passed through count_vect.transform and tfidf_transformer.transform
    y_pred = clf.predict(X_test)

Current output:

       source    user  time  text            reason
    0  hi        neha  0     0:neha:hi       1
    1  there     ram   1     1:ram:there     1
    2  ball      neha  2     2:neha:ball     3
    3  item      neha  3     3:neha:item     6
    4  go there  ram   4     4:ram:go there  7
    5  kk        ram   5     5:ram:kk        1
    6  hshs      neha  6     6:neha:hshs     2
    7  ggsgs     neha  7     7:neha:ggsgs    15

Expected output:

       source    user  time  text            reason  reason2
    0  hi        neha  0     0:neha:hi       1       2
    1  there     ram   1     1:ram:there     1       6
    2  ball      neha  2     2:neha:ball     3       7
    3  item      neha  3     3:neha:item     6       4
    4  go there  ram   4     4:ram:go there  7       9
    5  kk        ram   5     5:ram:kk        1       2
    6  hshs      neha  6     6:neha:hshs     2       3
    7  ggsgs     neha  7     7:neha:ggsgs    15      1

A single output column holding both classes would also be fine, since I can split it into two columns myself.

python-3.x scikit-learn text-classification multiclass-classification
1 answer

LinearSVC has a method called decision_function, which returns a confidence score for each class:

"The confidence score for a sample is the signed distance of that sample to the hyperplane."
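To make that concrete, here is a minimal sketch of where those scores come from. It assumes synthetic four-class data from make_classification as a stand-in (it is not the asker's dataset). For a fitted LinearSVC, each row of decision_function is the linear score of the sample against each class's hyperplane, and predict returns the class with the largest score. The scores are not probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic 4-class data, purely for illustration (an assumption, not the asker's data).
X, y = make_classification(n_samples=200, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
clf = LinearSVC(max_iter=10000).fit(X, y)

# One score per (sample, class): shape (200, 4).
scores = clf.decision_function(X)

# The same scores computed by hand from the fitted weights and intercepts.
manual = X @ clf.coef_.T + clf.intercept_

# predict is equivalent to taking the highest-scoring class for each row.
best = clf.classes_[scores.argmax(axis=1)]

print(scores.shape)
print(np.allclose(scores, manual))
```

Because every class gets its own score, ranking the columns of this matrix gives the second-best class as well as the best one.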

An example with a three-class dataset:
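A minimal sketch of such an example, assuming scikit-learn's bundled iris data as the three-class dataset (a stand-in, not the data from the question). It ranks the decision_function scores per row with np.argsort and maps the two best column indices back to labels through clf.classes_. The names reason and reason2 simply mirror the columns in the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

# Iris has three classes (0, 1, 2); used here only as a small stand-in dataset.
X, y = load_iris(return_X_y=True)
clf = LinearSVC(max_iter=10000).fit(X, y)

# One score per class for every sample: shape (150, 3).
scores = clf.decision_function(X)

# Sort column indices by descending score and keep the two best per row.
top2_idx = np.argsort(scores, axis=1)[:, ::-1][:, :2]

# Map column indices back to the actual class labels.
top2 = clf.classes_[top2_idx]

reason, reason2 = top2[:, 0], top2[:, 1]
print(top2[:5])
```

With the asker's pipeline, the same two lines of ranking could be applied to clf.decision_function(X_test), and the two resulting arrays assigned to new DataFrame columns. With more than 15 classes, np.argpartition may be a cheaper alternative to a full sort, although at this scale the difference is unlikely to matter.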
