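I am trying to write a kernel function by hand and use it with a support vector machine for classification. My dataset is X and the labels are y. I defined a simple kernel function and used it to fit the training set, but the fit never finishes.
Can you give me any pointers?
I also have the following problem: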
I tried the following code:
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
def my_kernel(x, z):
    return sqrt(exp(exp(z)))
clf = SVC(kernel=my_kernel)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
from sklearn import metrics
# Model Accuracy: how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
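This simple code just runs forever without producing any result. Also, if I choose X_train so that its number of rows differs from its number of columns, I get the following error: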
ValueError Traceback (most recent call last)
Cell In[4], line 26
23 return sqrt(exp(exp(z)))
25 clf = SVC(kernel=my_kernel)
---> 26 clf.fit(X_train, y_train)
27 y_pred = clf.predict(X_test)
28 from sklearn import metrics
File ~\anaconda3\lib\site-packages\sklearn\svm\_base.py:252, in BaseLibSVM.fit(self, X, y, sample_weight)
249 print("[LibSVM]", end="")
251 seed = rnd.randint(np.iinfo("i").max)
--> 252 fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
253 # see comment on the other call to np.iinfo in this file
255 self.shape_fit_ = X.shape if hasattr(X, "shape") else (n_samples,)
File ~\anaconda3\lib\site-packages\sklearn\svm\_base.py:315, in BaseLibSVM._dense_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
312 X = self._compute_kernel(X)
314 if X.shape[0] != X.shape[1]:
--> 315 raise ValueError("X.shape[0] should be equal to X.shape[1]")
317 libsvm.set_verbosity_wrap(self.verbose)
319 # we don't pass **self.get_params() to allow subclasses to
320 # add other parameters to __init__
ValueError: X.shape[0] should be equal to X.shape[1]
Any help would be greatly appreciated.
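The kernel function is given two data matrices, one of shape n_samples_1 x n_features and one of shape n_samples_2 x n_features (during training both are the training matrix). Your function should return a matrix of shape n_samples_1 x n_samples_2. In other words, your kernel function needs to take each sample in matrix 1 and compute the kernel between it and every sample in matrix 2, then move on to the next sample in matrix 1 and do the same. The result is an n_samples_1 x n_samples_2 matrix in which each entry is the kernel value for one pair of samples.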
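To make the pairwise description concrete, here is a minimal sketch that fills the kernel matrix with an explicit double loop. The helper names `scalar_kernel` and `kernel_matrix_loop` and the random inputs are hypothetical, and the per-pair function assumes the dot-product form of the asker's kernel. It is meant to illustrate the expected shape, not to be fast.

```python
import numpy as np

def scalar_kernel(x, y):
    # Kernel value for ONE pair of samples (1-D feature vectors).
    # Assumption: the asker's formula applied to the dot product.
    return np.sqrt(np.exp(np.exp(np.dot(x, y))))

def kernel_matrix_loop(A, B):
    # A has shape (n_samples_1, n_features),
    # B has shape (n_samples_2, n_features).
    K = np.empty((A.shape[0], B.shape[0]))
    for i, a in enumerate(A):        # each sample of matrix 1
        for j, b in enumerate(B):    # against each sample of matrix 2
            K[i, j] = scalar_kernel(a, b)
    return K

# Small made-up inputs: 5 samples vs. 4 samples, 3 features each
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3)) / 3
B = rng.normal(size=(4, 3)) / 3

K = kernel_matrix_loop(A, B)
print(K.shape)  # (5, 4), i.e. n_samples_1 x n_samples_2
```

The loop is only there to mirror the wording above; a matrix product computes all the dot products at once and gives the same values.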
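Your current kernel function ignores its first argument and returns a matrix of shape n_samples_2 x n_features, which is why scikit-learn raises the ValueError unless that matrix happens to be square. If you instead apply the kernel to the dot product of the features, K_ij = sqrt(exp(exp(x_i[0]*y_j[0] + x_i[1]*y_j[1] + ...))), you get a result matrix with the correct shape, n_samples_1 x n_samples_2: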
def my_kernel(X, Y):
    return np.sqrt(np.exp(np.exp(X @ Y.T)))
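As a quick sanity check, the sketch below compares the shapes returned by the kernel from the question and by the dot-product version. The name `broken_kernel` and the random arrays are made up for illustration. The last call assumes that at prediction time the kernel is evaluated between the new samples and the training samples.

```python
import numpy as np

def broken_kernel(x, z):
    # The kernel from the question: x is never used and z is
    # transformed element-wise, so the output has the shape of z.
    return np.sqrt(np.exp(np.exp(z)))

def my_kernel(X, Y):
    # Dot products between every row of X and every row of Y.
    return np.sqrt(np.exp(np.exp(X @ Y.T)))

# Hypothetical data: 80 training and 20 test samples, 2 features
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(80, 2))
X_test = rng.uniform(-1, 1, size=(20, 2))

print(broken_kernel(X_train, X_train).shape)  # (80, 2): not square
print(my_kernel(X_train, X_train).shape)      # (80, 80): what fit expects
print(my_kernel(X_test, X_train).shape)       # (20, 80): test vs. train
```

The first shape is only square when the number of rows equals the number of columns, which matches the condition under which the ValueError in the question does not appear.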
The following complete example fits a classifier with this kernel and plots the decision boundaries.
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
#Synthetic data
n_pts = 200
y = np.hstack([np.ones(n_pts // 2), np.zeros(n_pts // 2)])
X = np.hstack([np.sin(np.linspace(0, 2 * np.pi, n_pts)).reshape(-1, 1),
               np.cos(np.linspace(0, 2 * np.pi, n_pts)).reshape(-1, 1)]) +\
    np.random.randn(n_pts, 2) / 10
#Define a kernel and fit classifier
def my_kernel(X, Y):
    return np.sqrt(np.exp(np.exp(X @ Y.T)))
clf = SVC(kernel=my_kernel)
clf.fit(X, y)
print('Classifier train score is:', clf.score(X, y))
#
#Plots
#Show the original data, and the decision boundaries
#
f, ax = plt.subplots(figsize=(5, 5))
ax.scatter(X[:, 0], X[:, 1], c=y, zorder=2, cmap='cool', marker='s', s=40)
ax.set_xlabel('column 0')
ax.set_ylabel('column 1')
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min(), X[:, 0].max()),
    np.linspace(X[:, 1].min(), X[:, 1].max())
)
feat_space = np.stack([xx.ravel(), yy.ravel()], axis=1)
predictions = clf.predict(feat_space).reshape(xx.shape)
decision_vals = clf.decision_function(feat_space).reshape(xx.shape)
#Floodfill with predicted class
ax.contourf(xx, yy, predictions, alpha=0.4, cmap='bwr')
#Contours highlighting the decision terrain
cont = ax.contour(xx, yy, decision_vals, levels=30, cmap='cool', alpha=0.6)
#Display the figure when run as a script
plt.show()
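To tie this back to the workflow in the question, here is a sketch of the same train/test split and accuracy check using the corrected kernel. The synthetic X and y are a stand-in for the asker's dataset, which is not shown, so the resulting accuracy says nothing about the real data.

```python
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def my_kernel(X, Y):
    # Returns an (n_samples_1, n_samples_2) matrix of kernel values.
    return np.sqrt(np.exp(np.exp(X @ Y.T)))

# Hypothetical stand-in for the asker's X and y: a noisy circle
# whose two halves carry different labels.
rng = np.random.default_rng(0)
n_pts = 200
angles = np.linspace(0, 2 * np.pi, n_pts)
X = np.column_stack([np.sin(angles), np.cos(angles)])
X = X + rng.normal(size=(n_pts, 2)) / 10
y = np.hstack([np.ones(n_pts // 2), np.zeros(n_pts // 2)])

# Split the dataset into training and test sets, as in the question
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = SVC(kernel=my_kernel)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Model accuracy: how often is the classifier correct?
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

One caveat when moving to real data: np.exp overflows float64 once its argument exceeds roughly 709, so exp(exp(t)) becomes infinite when the dot product t is larger than about 6.5. Scaling the features before fitting may be needed to keep the kernel values finite.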