So I have a dataset with nrow = 218, and I'm working through [this example][1] ([git here][2]). I split my data into training (nrow = 163; ~75%) and test (nrow = 55; ~25%) sets.

When I get to the `pred <- predict(model_svm, test)` step and convert `pred` to a data frame, it has 163 rows instead of 55. Is this normal because the model was trained on 163 rows? Or should there be only 55 rows, since I'm predicting on the test set?
Some fake data:

```r
# 218 rows x 23 columns (the original call generated only 218 values in total)
featuredata_all <- matrix(rexp(218 * 23, rate = .1), ncol = 23)
```

Some code:

```r
library(data.table)
library(e1071)  # svm()
library(caret)  # confusionMatrix()
library(pROC)   # multiclass.roc(), plot.roc(), lines.roc()

pt1 <- scale(featuredata_all[, 1:22], center = T)
pt2 <- as.character(featuredata_all[, 23]) # since the label is a string I kept it separate

ft <- cbind.data.frame(pt1, pt2) # to preserve the label in text
colnames(ft)[23] <- "Cluster"

## 75% of the sample size
smp_size <- floor(0.75 * nrow(ft))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ft)), size = smp_size)

train <- ft[train_ind, 1:22]  # 163 reads
test <- ft[-train_ind, 1:22]  # 55 reads

trainlabel <- ft[train_ind, 23]  # 163 labels
testlabel <- ft[-train_ind, 23]  # 55 labels

#ftID <- cbind(ft, seq.int(nrow(ft))
#colnames(ftID)[24]<- "RowID"
#ftIDtestrows <- ftID[-train_ind,24]

# Support Vector Machine for classification
model_svm <- svm(trainlabel ~ as.matrix(train))
summary(model_svm)

# Use the predictions on the data
# ---------------- This is where the question is ---------------- #
pred <- predict(model_svm, test)
# ----------------------------------------------------------------#

print(confusionMatrix(pred[1:nrow(test)], testlabel))

# ROC and AUC curves and their plots
# ----------------- also -------------> was trying to get this to work, as pred
# doesn't naturally end up with the expected 55 rows from the test set
roc.multi <- multiclass.roc(testlabel, as.numeric(pred[1:55]))
rs <- roc.multi[['rocs']]
plot.roc(rs[[1]])
sapply(2:length(rs), function(i) lines.roc(rs[[i]], col = i))
```
[1]: https://iamnagdev.com/2018/01/02/sound-analytics-in-r-for-animal-sound-classification-using-vector-machine/
[2]: https://github.com/nagdevAmruthnath
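Actually, I was able to get 55 rows with the code below. The changes I made:

- for `pt2`, I used `as.factor` instead of `as.character`;
- I changed `pred <- predict(model_svm, test)` to `pred <- predict(model_svm, as.matrix(test))`.

The code below also fits the model through `svm(x = ..., y = ...)` rather than a formula.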
```r
# load libraries
library(data.table)
library(e1071)

# create dataset with random values
featuredata_all <- matrix(rnorm(23*218), ncol=23)

# scale features
pt1 <- scale(featuredata_all[,1:22],center=T)

# make the label column a factor (binary label derived from column 23)
pt2 <- as.factor(ifelse(featuredata_all[,23]>0, 0,1))

# join data (optional)
ft<-cbind.data.frame(pt1,pt2)
colnames(ft)[23]<- "Cluster"

## 75% of the sample size
smp_size <- floor(0.75 * nrow(ft))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ft)), size = smp_size)

# split the features into train and test
train <- ft[train_ind,1:22] #163 reads
test <- ft[-train_ind,1:22] #55 reads

dim(train)
# [1] 163  22
dim(test)
# [1] 55 22

# split the labels into train and test
trainlabel<- ft[train_ind,23] #163 labels
testlabel <- ft[-train_ind,23] #55 labels

length(trainlabel)
# [1] 163
length(testlabel)
# [1] 55

#Support Vector Machine for classification
model_svm <- svm(x= as.matrix(train), y = trainlabel, probability = T)
summary(model_svm)
# Call:
# svm.default(x = as.matrix(train), y = trainlabel, probability = T)
#
#
# Parameters:
#    SVM-Type:  C-classification
#  SVM-Kernel:  radial
#        cost:  1
#
# Number of Support Vectors:  159
#
#  ( 78 81 )
#
#
# Number of Classes:  2
#
# Levels:
#  0 1

#Use the predictions on the data
# ---------------- This is where the question is ---------------- #
pred <- predict(model_svm, as.matrix(test))
length(pred)
# [1] 55
# ----------------------------------------------------------------#

print(table(pred[1:nrow(test)],testlabel))
#    testlabel
#      0  1
#   0 14 14
#   1 11 16
```
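If you would rather keep the formula interface, here is a minimal sketch of an alternative, under my own assumptions rather than anything taken from the linked example. A likely reason the original call returned 163 rows is that the formula `trainlabel ~ as.matrix(train)` names the object `train` itself, so `predict()` cannot find a matching column in the new data and falls back to the training object. Referring to columns by name through `data =` avoids that. The data and the names `features`, `train_df`, `test_df`, `model_formula` and `pred_formula` are hypothetical stand-ins.

```r
library(e1071)

set.seed(123)

# Hypothetical stand-in data: 218 rows, 22 scaled numeric features, 2 classes
features <- as.data.frame(scale(matrix(rnorm(218 * 22), ncol = 22)))
features$Cluster <- factor(sample(c("a", "b"), 218, replace = TRUE))

# Same 75% / 25% split as in the question
train_ind <- sample(seq_len(nrow(features)), size = floor(0.75 * nrow(features)))
train_df <- features[train_ind, ]
test_df <- features[-train_ind, ]

# The formula refers to column names, so predict() can look them up in newdata
model_formula <- svm(Cluster ~ ., data = train_df)
pred_formula <- predict(model_formula, newdata = test_df)

# One prediction per test row
length(pred_formula) == nrow(test_df)
```

Either interface should work, as long as the object passed to `predict()` has the same structure as the one used for fitting.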
Hope this helps.
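OK, I realized I was training the model on the training set and then testing it on the test set. I needed to test it first by re-predicting the training set, and only then feed it the test set.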
```r
summary(model_svm)
# Use the predictions on the data
pred <- predict(model_svm, train)

# testlabel (not trainlabel) is the label vector whose length matches test
model_svm <- svm(testlabel ~ as.matrix(test))
summary(model_svm)
# Use the predictions on the data
pred <- predict(model_svm, test)
```
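For comparison, here is a rough sketch of checking both the training set and the test set with a single fitted model, so the test rows are never used for fitting. This is my own illustration on made-up data, not the code from the linked example. The names `x`, `y`, `model`, `pred_train`, `pred_test`, `acc_train` and `acc_test` are hypothetical.

```r
library(e1071)

set.seed(123)

# Hypothetical stand-in data: 218 rows, 22 scaled features, binary factor label
x <- scale(matrix(rnorm(218 * 22), ncol = 22))
y <- factor(sample(c(0, 1), 218, replace = TRUE))

train_ind <- sample(seq_len(nrow(x)), size = floor(0.75 * nrow(x)))

# Fit one model on the training rows only, using the x/y interface
model <- svm(x = x[train_ind, ], y = y[train_ind])

# Re-predict the training rows, then predict the held-out rows
pred_train <- predict(model, x[train_ind, ])
pred_test <- predict(model, x[-train_ind, ])

# Share of correct predictions on each subset
acc_train <- mean(as.character(pred_train) == as.character(y[train_ind]))
acc_test <- mean(as.character(pred_test) == as.character(y[-train_ind]))
c(train = acc_train, test = acc_test)
```

With random labels like these, the training accuracy will typically look much better than the test accuracy, which is the gap this kind of check is meant to expose.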