Why does SVM work when using the comma-separated form but not the formula form? (R)

Question

I have a dataset with nrow = 218, and I am working through [this][1] example ([git here][2]). I split my data into a training set (nrow = 163; ~75%) and a test set (nrow = 55; ~25%).

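When I get to the "pred <- predict(model_svm, test)" step and convert pred to a data frame, it has 163 rows instead of 55 (when using the formula form of the svm call). Is that normal because the model was trained on 163 rows? Or should there be only 55 rows, since I am predicting on the test set?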

When I use the "formula" form of svm, the prediction function returns the wrong number of rows:

model_svm <- svm(trainlabel ~ as.matrix(train) )

But when I use the "traditional" form, predicting on the test data works as expected:

model_svm <- svm(as.matrix(train), trainlabel)

Any idea why this happens?

Some fake data:

featuredata_all <- matrix(rexp(218 * 23, rate=.1), ncol=23) #218 rows x 23 columns

Some code:


library(data.table)
library(e1071) #provides svm()

pt1 <- scale(featuredata_all[,1:22],center=T)
pt2 <- as.character(featuredata_all[,23]) #since the label is a string I kept it separate 

ft<-cbind.data.frame(pt1,pt2) #to preserve the label in text
colnames(ft)[23]<- "Cluster"

## 75% of the sample size
smp_size <- floor(0.75 * nrow(ft))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ft)), size = smp_size)

train <- ft[train_ind,1:22] #163 reads
test  <- ft[-train_ind,1:22] #55 reads

trainlabel<- ft[train_ind,23] #163 labels
testlabel <- ft[-train_ind,23] #55 labels

#Support Vector Machine for classification
model_svm <- svm(trainlabel ~ as.matrix(train) )
summary(model_svm)

#Use the predictions on the data
pred <- predict(model_svm, test) 


 [1]: https://iamnagdev.com/2018/01/02/sound-analytics-in-r-for-animal-sound-classification-using-vector-machine/
 [2]: https://github.com/nagdevAmruthnath

Answer 1

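You are right: the formula form is giving you as many results as there are training rows, whereas pred should have as many results as there are test rows. I think the problem is that you used as.matrix() when writing the formula. If you look at your predictions, you will see that there are actually a bunch of NAs.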
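To make the mismatch concrete, here is a minimal sketch on made-up data (the random features, the two-class label, and the names x, y, model_formula and model_matrix are all assumptions for illustration, not the asker's data). My reading, which I have not confirmed against the e1071 source, is that the term as.matrix(train) is looked up again at prediction time, and since test has no column called train, R falls back to the train object in the workspace and predicts on the training rows.

```r
library(e1071)  # assumed to be installed; provides svm()

set.seed(123)
# Hypothetical stand-in data: 218 rows, 22 numeric features, 2 classes
x <- as.data.frame(matrix(rnorm(218 * 22), ncol = 22))
y <- factor(sample(c("a", "b"), 218, replace = TRUE))

train_ind  <- sample(seq_len(nrow(x)), size = floor(0.75 * nrow(x)))
train      <- x[train_ind, ]
test       <- x[-train_ind, ]
trainlabel <- y[train_ind]

# Formula form with as.matrix(): the term refers to the object `train`,
# not to columns that predict() can find in newdata
model_formula <- svm(trainlabel ~ as.matrix(train))
pred_formula  <- predict(model_formula, test)

# Matrix ("traditional") form: newdata is used directly
model_matrix <- svm(as.matrix(train), trainlabel)
pred_matrix  <- predict(model_matrix, as.matrix(test))

length(pred_formula)  # the question reports the number of training rows here
length(pred_matrix)   # one prediction per test row
```

If this reading is right, the NAs would be the names of the prediction vector: only as many row names as there are test rows get attached to a vector that is as long as the training set.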

Here is the correct way to use the formula form:

library(caret) #provides createDataPartition()
library(e1071) #provides svm()

#Create training and testing sets

set.seed(123)
intrain<-createDataPartition(y=beaver2$activ,p=0.8,list=FALSE)
train<-beaver2[intrain,] #80 rows, 4 variables
test<-beaver2[-intrain,] #20 rows, 4 variables

svm_beaver2 <- svm(activ ~ ., data=train)

pred <- predict(svm_beaver2, test) #20 responses, the same as the length of test set

Your outcome just needs to be a factor. So even if it is a string, you can convert it to a factor with train$outcome <- as.factor(train$outcome), and then use the formula form shown above.
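Applied to a setup like the one in the question, this could look roughly as follows. It is only a sketch: the data are random stand-ins, and the three label values "a", "b" and "c" are hypothetical. The point is that the label stays in the same data frame as the features, is converted to a factor, and the formula names columns rather than an outside matrix.

```r
library(e1071)  # assumed to be installed; provides svm()

set.seed(123)
# Hypothetical stand-in for the question's data: 22 features plus a string label
ft <- as.data.frame(matrix(rnorm(218 * 22), ncol = 22))
ft$Cluster <- sample(c("a", "b", "c"), nrow(ft), replace = TRUE)

# Convert the string label to a factor so that svm() performs classification
ft$Cluster <- as.factor(ft$Cluster)

# 75% / 25% split, keeping features and label together in one data frame
train_ind <- sample(seq_len(nrow(ft)), size = floor(0.75 * nrow(ft)))
train <- ft[train_ind, ]
test  <- ft[-train_ind, ]

# Formula form that refers to columns of `data`
model_svm <- svm(Cluster ~ ., data = train)

# predict() can now find the same columns in the test set
pred <- predict(model_svm, test)
length(pred)                                    # one prediction per test row
table(predicted = pred, actual = test$Cluster)  # confusion table
```

Because the labels here are random, the confusion table will show poor accuracy; only the row count is meaningful.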
