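I would like to know the most appropriate way to model the interaction between two words/variables in a language model for a sentiment analysis task. For example, in the following dataset: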
You didn't solve my problem,NEU
I never made that purchase,NEU
You never solve my problems,NEG
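The words «solve» and «never», taken individually, carry no negative sentiment. When they appear together, however, they do. Formally: suppose we have a feature «solve» that takes the value 0 when the word «solve» is absent and 1 when it is present, and another feature «never» with the same logic. Then the difference in the probability of Y=NEG between «solve»=0 and «solve»=1 is different when «never»=0 than when «never»=1.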
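To make the formal statement concrete, here is a minimal sketch using only the standard library. It assumes a logistic model for P(Y=NEG); the coefficient values and the helper names (`p_neg`, `log_odds`, `interaction_on_logit_scale`) are hypothetical and chosen only for illustration. On the log-odds scale, a model with main effects only gives «solve» the same effect whether or not «never» is present, while a product term lets that effect change.

```python
import math


def p_neg(solve, never, b0, b_solve, b_never, b_inter=0.0):
    """P(Y=NEG) under a logistic model with an optional solve*never term."""
    z = b0 + b_solve * solve + b_never * never + b_inter * solve * never
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def interaction_on_logit_scale(**coefs):
    """Effect of solve when never=1 minus its effect when never=0 (in log-odds)."""
    effect_never1 = log_odds(p_neg(1, 1, **coefs)) - log_odds(p_neg(0, 1, **coefs))
    effect_never0 = log_odds(p_neg(1, 0, **coefs)) - log_odds(p_neg(0, 0, **coefs))
    return effect_never1 - effect_never0


# Hypothetical coefficients, not estimated from any data.
main_only = dict(b0=-2.0, b_solve=0.3, b_never=0.4)
with_inter = dict(main_only, b_inter=2.5)

print("main effects only:", interaction_on_logit_scale(**main_only))
print("with interaction: ", interaction_on_logit_scale(**with_inter))
```

Up to floating-point error, the first value is zero and the second equals the `b_inter` coefficient, which is the quantity an interaction term is meant to capture.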
But a basic logistic regression (for example, using sklearn) cannot handle this situation.
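Interaction terms can be added with sklearn.preprocessing.PolynomialFeatures, but this is hardly the most efficient option, since it generates a column for every pair of features.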
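I don't know whether you have solved your problem, but here is how I approached a similar one, checking the results of two methods (a neural network and a decision tree):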
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report

# Toy dataset from the question: (sentence, label)
data = [("You didn't solve my problem", "NEU"),
        ("I never made that purchase", "NEU"),
        ("You never solve my problems", "NEG")]
X_train = [d[0] for d in data]
y_train = [d[1] for d in data]
X_test = ["I solved your problem", "You never helped me"]

# Bag-of-words counts; the built-in English stop words are removed
vectorizer = CountVectorizer(stop_words='english')
X_train_features = vectorizer.fit_transform(X_train)
feature_names = vectorizer.get_feature_names_out()
X_test_features = vectorizer.transform(X_test)

# Method 1: a small neural network with one hidden layer
nn_classifier = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=1)
nn_classifier.fit(X_train_features, y_train)
y_test_pred_nn = nn_classifier.predict(X_test_features)
print("Predicted sentiment using neural network:", y_test_pred_nn)

# Method 2: a decision tree, which can be plotted to inspect its splits
tree_classifier = DecisionTreeClassifier()
tree_classifier.fit(X_train_features, y_train)
plot_tree(tree_classifier, feature_names=feature_names)
plt.show()  # needed to display the plot outside a notebook
y_test_pred_tree = tree_classifier.predict(X_test_features)
print("Predicted sentiment using decision tree:", y_test_pred_tree)

# Check how well each model fits the training data
y_train_pred_nn = nn_classifier.predict(X_train_features)
y_train_pred_tree = tree_classifier.predict(X_train_features)
print("Neural network performance on training data:")
print(classification_report(y_train, y_train_pred_nn))
print("Decision tree performance on training data:")
print(classification_report(y_train, y_train_pred_tree))
This returns:
Predicted sentiment using neural network: ['NEU' 'NEU']
Predicted sentiment using decision tree: ['NEU' 'NEU']
Neural network performance on training data:
              precision    recall  f1-score   support

         NEG       1.00      1.00      1.00         1
         NEU       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3

Decision tree performance on training data:
              precision    recall  f1-score   support

         NEG       1.00      1.00      1.00         1
         NEU       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3
and the decision tree plot drawn by plot_tree (image not reproduced here).
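One thing worth checking before reading too much into these results: as far as I can tell, scikit-learn's built-in English stop-word list contains «never», so with stop_words='english' that token never reaches either model. The sketch below compares the vocabulary with and without stop-word removal on the same three sentences; the variable names are just for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

texts = ["You didn't solve my problem",
         "I never made that purchase",
         "You never solve my problems"]

# Same settings as the code above, then the default with no stop-word removal.
with_stop = CountVectorizer(stop_words="english").fit(texts)
without_stop = CountVectorizer().fit(texts)

vocab_with = sorted(with_stop.vocabulary_)
vocab_without = sorted(without_stop.vocabulary_)

print("'never' is a built-in stop word:", "never" in ENGLISH_STOP_WORDS)
print("vocabulary with stop words removed:", vocab_with)
print("vocabulary with all tokens kept:   ", vocab_without)
```

If «never» is missing from the first vocabulary, the perfect training scores are probably coming from other tokens rather than from the «never»/«solve» combination, and it may be worth refitting with stop_words=None (the default) before comparing the two methods.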