Modeling interactions between words in a sentiment analysis task

Question

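I would like to know the most appropriate way to model the interaction between two words/variables in a language model for a sentiment analysis task. For example, in the following dataset: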

You didn't solve my problem,NEU
I never made that purchase,NEU
You never solve my problems,NEG

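Taken individually, the words «solve» and «never» carry no negative sentiment, but when they appear together they do. Formally: suppose we have a feature «solve» that takes the value 0 when the word «solve» is absent and 1 when it is present, and another feature «never» defined the same way. Then the difference in the probability of Y=NEG between «solve»=0 and «solve»=1 is not the same when «never»=0 as when «never»=1.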
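In logistic-regression terms, this kind of dependence is usually written as a product term. A sketch of that formulation follows; the coefficient symbols are notation introduced here for illustration, not something from the original question:

```latex
\operatorname{logit} P(Y=\mathrm{NEG})
  = \beta_0
  + \beta_1\,\mathrm{solve}
  + \beta_2\,\mathrm{never}
  + \beta_3\,(\mathrm{solve}\cdot\mathrm{never})
```

With this form, switching «solve» from 0 to 1 changes the log-odds by \beta_1 when «never»=0 and by \beta_1 + \beta_3 when «never»=1, so a nonzero \beta_3 is exactly the interaction described above.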

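But a basic logistic regression with only main-effect terms (for example, using sklearn) cannot handle this case. sklearn.preprocessing.PolynomialFeatures can be used to add interaction terms, but that is hardly the most efficient option.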

nlp statistics logistic-regression sentiment-analysis
1 Answer

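I don't know whether you have solved your problem yet, but here is how I approached a similar one, comparing the results of two methods (a neural network and a decision tree):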

import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report

# Toy dataset from the question: (sentence, label)
data = [("You didn't solve my problem", "NEU"),
        ("I never made that purchase", "NEU"),
        ("You never solve my problems", "NEG")]

X_train = [d[0] for d in data]
y_train = [d[1] for d in data]
X_test = ["I solved your problem", "You never helped me"]

# Bag-of-words counts. Note: the built-in English stop-word list contains
# "never", so that token is dropped here; use stop_words=None to keep it.
vectorizer = CountVectorizer(stop_words='english')
X_train_features = vectorizer.fit_transform(X_train)

feature_names = vectorizer.get_feature_names_out()

# Reuse the fitted vocabulary for the test sentences
X_test_features = vectorizer.transform(X_test)

# Method 1: a small multilayer perceptron, which can learn non-additive
# combinations of the input features
nn_classifier = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=1)
nn_classifier.fit(X_train_features, y_train)

y_test_pred_nn = nn_classifier.predict(X_test_features)

print("Predicted sentiment using neural network:", y_test_pred_nn)

# Method 2: a decision tree, whose nested splits can also capture interactions
tree_classifier = DecisionTreeClassifier()
tree_classifier.fit(X_train_features, y_train)

# Draw the fitted tree to see which features it splits on
plot_tree(tree_classifier, feature_names=feature_names)
plt.show()

y_test_pred_tree = tree_classifier.predict(X_test_features)

print("Predicted sentiment using decision tree:", y_test_pred_tree)

# Performance on the training data only (three samples, so this is a sanity check)
y_train_pred_nn = nn_classifier.predict(X_train_features)
y_train_pred_tree = tree_classifier.predict(X_train_features)
print("Neural network performance on training data:")
print(classification_report(y_train, y_train_pred_nn))
print("Decision tree performance on training data:")
print(classification_report(y_train, y_train_pred_tree))

Output:

Predicted sentiment using neural network: ['NEU' 'NEU']
Predicted sentiment using decision tree: ['NEU' 'NEU']
Neural network performance on training data:
              precision    recall  f1-score   support

         NEG       1.00      1.00      1.00         1
         NEU       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3

Decision tree performance on training data:
              precision    recall  f1-score   support

         NEG       1.00      1.00      1.00         1
         NEU       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3
