Semi-supervised SVM model runs forever


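I am experimenting with the Elliptic Bitcoin dataset and trying to compare how supervised and semi-supervised models perform on it. Here is the code for my supervised SVM model: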

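from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# keep only the labeled transactions (classes '1' and '2')
classified = class_features_df[class_features_df['class'].isin(['1','2'])]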

X = classified.drop(columns=['txId', 'class', 'time step']) 
y = classified[['class']]

# in this case, class 2 corresponds to licit transactions, we change this to 0 as our interest is the illicit transactions
y = y['class'].apply(lambda x: 0 if x == '2' else 1 )

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=15, shuffle=False)

model_svm = svm.SVC(kernel='linear') # Linear Kernel

model_svm.fit(X_train, y_train)

# find accuracy score
y_pred = model_svm.predict(X_test)
acc = accuracy_score(y_test, y_pred)

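The code above runs fine and gives good results, but when I try the same approach for semi-supervised learning, I get a warning and the model has now been running for over an hour (whereas the supervised version finished in under a minute).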

import pandas as pd  # used for pd.concat below

X_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=0.30, random_state=1, stratify=y_train)

# .copy() avoids pandas' SettingWithCopyWarning on the 'class' assignment below
unclassified = class_features_df[class_features_df['class'] == 3].copy()

X_unclassified = unclassified[local_features_col + agg_features_col]

# pseudo-label the unlabeled transactions with the supervised model (outputs 0/1)
predictions = model_svm.predict(X_unclassified.values)

unclassified['class'] = predictions

# Combine the labeled and newly labeled unlabeled data
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
# note: `classified` still holds the original '1'/'2' labels here, while `predictions` are 0/1
classified = pd.concat([classified, unclassified])

X = classified.drop(columns=['txId', 'class', 'time step'])
# astype('int') added to remove the "'<' not supported between instances of 'int' and 'str'" error
y = classified['class'].astype('int')

model_svm.fit(X, y)

# Evaluate the model on the test set
y_pred = model_svm.predict(X_test_unlab)
acc = accuracy_score(y_test_unlab, y_pred)
print("Accuracy", acc)
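One thing that may be worth checking in the semi-supervised snippet: the pseudo-labels come from a model trained on the 0/1 encoding, while the labeled rows still carry the original '1'/'2' codes when the two frames are combined. The following is a minimal sketch on a made-up toy frame (the column values and the string encoding of class 3 are assumptions, not taken from the real dataset) showing how that mix can silently turn a binary problem into a three-class one, and how remapping first keeps it binary.

```python
import pandas as pd

# Hypothetical toy frame: '1' = illicit, '2' = licit, '3' = unlabeled (string codes assumed).
df = pd.DataFrame({"class": ["1", "2", "2", "3", "3"]})

labeled = df[df["class"].isin(["1", "2"])].copy()
unlabeled = df[df["class"] == "3"].copy()

# Stand-in pseudo-labels in the 0/1 encoding (made-up values, no model involved).
unlabeled["class"] = [0, 1]

# Mixing encodings: original 1/2 labels plus 0/1 pseudo-labels.
mixed = pd.concat([labeled, unlabeled])["class"].astype(int)
n_mixed = mixed.nunique()

# Remapping the original labels to 0/1 before combining keeps two classes.
labeled["class"] = labeled["class"].map({"1": 1, "2": 0})
consistent = pd.concat([labeled, unlabeled])["class"].astype(int)
n_consistent = consistent.nunique()

print("classes with mixed encodings:", n_mixed)
print("classes after remapping:", n_consistent)
```

If the real data behaves like this toy frame, `model_svm.fit(X, y)` would be solving a multi-class problem over 0, 1 and 2, which adds to the training cost and makes the accuracy hard to compare with the supervised run.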

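Additional information: classes 1 and 2 are the labeled transactions, and class 3 marks the unlabeled (unclassified) transactions. The original post showed a screenshot of the first 5 rows of the dataset here.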

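Is there something wrong with my semi-supervised implementation, or is something missing? Any help with the code would be much appreciated.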
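A possible explanation for the runtime, offered as a guess rather than a diagnosis: `svm.SVC` training time grows at least quadratically with the number of samples, so refitting on the labeled rows plus every pseudo-labeled row can take far longer than the supervised fit. The sketch below shows one-shot pseudo-labeling with `LinearSVC`, which is designed to scale better for linear kernels. It runs on synthetic data from `make_classification`; the sample sizes, `C`, `max_iter`, and the `make_model` helper are illustrative assumptions, not settings tuned for the Elliptic dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real features; sizes are arbitrary.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

# Hold out a labeled test set first so it is never used for training.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Pretend most of the remaining rows are unlabeled (their labels are discarded).
X_lab, X_unlab, y_lab, _ = train_test_split(
    X_rest, y_rest, test_size=0.8, random_state=0, stratify=y_rest
)


def make_model():
    # Hypothetical helper: scaling usually helps linear SVMs converge.
    return make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))


# Step 1: fit on the labeled rows only.
base = make_model().fit(X_lab, y_lab)

# Step 2: pseudo-label the unlabeled rows (same 0/1 encoding as y_lab).
pseudo = base.predict(X_unlab)

# Step 3: refit on labeled plus pseudo-labeled rows.
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])
final = make_model().fit(X_all, y_all)

acc_base = accuracy_score(y_test, base.predict(X_test))
acc_final = accuracy_score(y_test, final.predict(X_test))
print(f"baseline={acc_base:.3f} pseudo-labeled={acc_final:.3f}")
```

Note that the test rows here are split off before any training or pseudo-labeling. In the question's code, the final model is fit on all labeled rows, which include the rows later used for evaluation, so the reported accuracy may be optimistic. scikit-learn also ships `sklearn.semi_supervised.SelfTrainingClassifier`, which wraps this loop and only keeps confident pseudo-labels; it needs a base estimator that exposes `predict_proba`.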

python machine-learning jupyter-notebook svm semisupervised-learning