Naive Bayes classifier and discriminant analysis: accuracy falls short

Answer

As others have correctly pointed out, there is at least one problem in lines like this:

class1 = classify(cluster1, training_data, target_class, 'diaglinear'); ...

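You are training the classifier on all of training_data but evaluating it only on sub-clusters. For clustering the data to have any effect, you need to train a different classifier within each sub-cluster.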

Answer

Here is a very simple example that shows exactly how it should work and what is going wrong:

    %% Generate data and labels for each class
    x11 = bsxfun(@plus, randn(100,2), [2 2]);
    x10 = bsxfun(@plus, randn(100,2), [0 2]);
    x21 = bsxfun(@plus, randn(100,2), [-2 -2]);
    x20 = bsxfun(@plus, randn(100,2), [0 -2]);

    % If you have the PRT (shameless plug), this looks nice:
    % http://www.mathworks.com/matlabcentral/linkexchange/links/2947-pattern-recognition-toolbox
    % ds = prtDataSetClass(cat(1,x11,x21,x10,x20), prtUtilY(200,200));

    x = cat(1, x11, x21, x10, x20);
    y = cat(1, ones(200,1), zeros(200,1));

    clusterIdx = kmeans(x, 2);  % make 2 clusters

    xCluster1 = x(clusterIdx == 1, :);
    yCluster1 = y(clusterIdx == 1);
    xCluster2 = x(clusterIdx == 2, :);
    yCluster2 = y(clusterIdx == 2);

    % Performance is terrible (trained on all data, evaluated per cluster):
    yOut1 = classify(xCluster1, x, y, 'diaglinear');
    yOut2 = classify(xCluster2, x, y, 'diaglinear');
    pcCluster = mean(cat(1,yOut1,yOut2) == cat(1,yCluster1,yCluster2))

    % Performance is good (trained and evaluated within each cluster):
    yOutCluster1 = classify(xCluster1, xCluster1, yCluster1, 'diaglinear');
    yOutCluster2 = classify(xCluster2, xCluster2, yCluster2, 'diaglinear');
    pcWithinCluster = mean(cat(1,yOutCluster1,yOutCluster2) == cat(1,yCluster1,yCluster2))

    % Performance is bad (using all data):
    yOutFull = classify(x, x, y, 'diaglinear');
    pcFull = mean(yOutFull == y)

Answer

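Looking at the cmat1 data from your first example (accuracy 81.49%), the main reason for the high accuracy is that your classifier gets a large number of class 1 and class 4 samples right. Almost every other class performs poorly, with zero correct predictions. This is consistent with your last example (k-means first), where for cluster7 your acc7 is 56.9698.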
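
To make the point about a few classes dominating the accuracy concrete, here is a minimal sketch of reading per-class accuracy off a confusion matrix. The labels yTrue and yPred are made up for illustration and are not the asker's data. The sketch assumes confusionmat from the Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical labels, for illustration only (not the asker's data)
yTrue = [1 1 1 1 1 1 2 2 3 3]';   % true classes
yPred = [1 1 1 1 1 1 1 1 1 3]';   % predictions dominated by class 1

% Rows of cmat are true classes, columns are predicted classes
cmat = confusionmat(yTrue, yPred);

% Fraction of all samples classified correctly
overallAcc = sum(diag(cmat)) / sum(cmat(:));

% Fraction of each true class classified correctly (per-class recall)
perClassAcc = diag(cmat) ./ sum(cmat, 2);
```

Here overallAcc is 0.7 while perClassAcc is [1; 0; 0.5]: the majority class carries the overall figure even though class 2 is never predicted correctly.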

Answer

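After clustering the data, are you training a separate classifier for each cluster? If not, that may be your problem.

Try this. First, cluster the data and keep the centroids. Then, using the training data, train one classifier per cluster. At classification time, find the centroid nearest to the object you want to classify and use the corresponding classifier.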
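
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the asker's code: the synthetic data and every variable name (xTrain, xTest, centroids, pcRouted and so on) are hypothetical. Because classify fits and predicts in a single call, "one classifier per cluster" here means passing only that cluster's training samples as the training set. The sketch assumes kmeans, pdist2 and classify from the Statistics and Machine Learning Toolbox.

```matlab
rng(0);  % fix the random seed so the sketch is repeatable

% Hypothetical training set: class 1 near (2,2) and (-2,-2),
% class 0 near (0,2) and (0,-2)
xTrain = [bsxfun(@plus, randn(100,2), [ 2  2]);
          bsxfun(@plus, randn(100,2), [-2 -2]);
          bsxfun(@plus, randn(100,2), [ 0  2]);
          bsxfun(@plus, randn(100,2), [ 0 -2])];
yTrain = [ones(200,1); zeros(200,1)];

% Hypothetical held-out samples drawn the same way
xTest = [bsxfun(@plus, randn(25,2), [ 2  2]);
         bsxfun(@plus, randn(25,2), [-2 -2]);
         bsxfun(@plus, randn(25,2), [ 0  2]);
         bsxfun(@plus, randn(25,2), [ 0 -2])];
yTest = [ones(50,1); zeros(50,1)];

% 1) Cluster the training data and keep the centroids
k = 2;
[trainIdx, centroids] = kmeans(xTrain, k, 'Replicates', 5);

% 2) Route each test sample to its nearest centroid
[~, testIdx] = min(pdist2(xTest, centroids), [], 2);

% 3) Per cluster: train on that cluster's training samples only,
%    then classify the test samples routed to it
yOut = nan(size(yTest));
for c = 1:k
    inTrain = (trainIdx == c);
    inTest  = (testIdx == c);
    if any(inTest)
        yOut(inTest) = classify(xTest(inTest,:), xTrain(inTrain,:), ...
                                yTrain(inTrain), 'diaglinear');
    end
end

% Fraction of held-out samples classified correctly
pcRouted = mean(yOut == yTest)
```

Routing test samples by nearest centroid, rather than re-running kmeans on them, keeps the cluster assignment consistent with the one used during training.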

Answer

Consider this function call:

classify(cluster1, training_data, target_class, 'diaglinear');

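training_data is a sample of the entire feature space. What does that mean? The classification model you train will try to maximize classification accuracy over the whole feature space. So if you present test samples that behave like the training data, you will get good classification results. A single cluster such as cluster1 covers only part of that space, so a model fitted to all of training_data is not tuned for it.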

Votes: 8 · Answers: 5