Question

So I have a data set with nrow = 218, and I'm working through [this][1] example ([git here][2]). I've split my data into train (nrow = 163; ~75%) and test (nrow = 55; ~25%).

When I get to the part with `pred <- predict(model_svm, test)` and convert `pred` to a data frame, it has 163 rows instead of 55. Is this normal because 163 rows were used for training? Or should it have only 55 rows, since I'm predicting on the test set?

Some fake data (218 rows by 23 columns):

featuredata_all <- matrix(rexp(218 * 23, rate = .1), ncol = 23)

Part of the code:


```r
library(data.table)
library(e1071)  # svm()
library(caret)  # confusionMatrix()
library(pROC)   # multiclass.roc(), plot.roc(), lines.roc()

pt1 <- scale(featuredata_all[,1:22],center=T)
pt2 <- as.character(featuredata_all[,23]) #since the label is a string I kept it separate 

ft<-cbind.data.frame(pt1,pt2) #to preserve the label in text
colnames(ft)[23]<- "Cluster"

## 75% of the sample size
smp_size <- floor(0.75 * nrow(ft))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ft)), size = smp_size)

train <- ft[train_ind,1:22] #163 reads
test  <- ft[-train_ind,1:22] #55 reads

trainlabel<- ft[train_ind,23] #163 labels
testlabel <- ft[-train_ind,23] #55 labels

#ftID <- cbind(ft, seq.int(nrow(ft)))
#colnames(ftID)[24]<- "RowID"
#ftIDtestrows <- ftID[-train_ind,24]

#Support Vector Machine for classification
model_svm <- svm(trainlabel ~ as.matrix(train) )
summary(model_svm)

#Use the predictions on the data
# ---------------- This is where the question is ---------------- #
pred <- predict(model_svm, test)
# ----------------------------------------------------------------#

print(confusionMatrix(pred[1:nrow(test)],testlabel))

#ROC and AUC curves and their plots
# Also: pred[1:55] is a workaround, since pred doesn't end up with the
# expected 55 rows from the test set
roc.multi<-multiclass.roc(testlabel, as.numeric(pred[1:55])) 
rs <- roc.multi[['rocs']]
plot.roc(rs[[1]])
sapply(2:length(rs),function(i) lines.roc(rs[[i]],col=i))
```


 [1]: https://iamnagdev.com/2018/01/02/sound-analytics-in-r-for-animal-sound-classification-using-vector-machine/
 [2]: https://github.com/nagdevAmruthnath

Not_Dave · Answer 1 · May 1, 2020

I was able to get 55 rows of output using the code below. The changes I made: `pt2` is built with `as.factor` instead of `as.character`, and `pred <- predict(model_svm, test)` became `pred <- predict(model_svm, as.matrix(test))`. Note that the model is also fitted through the `x`/`y` interface, `svm(x = as.matrix(train), y = trainlabel, ...)`, rather than through a formula.

```r
# load libraries
library(data.table)
library(e1071)

# create dataset with random values
featuredata_all <- matrix(rnorm(23*218), ncol=23)

# scale features
pt1 <- scale(featuredata_all[,1:22],center=T)

# make the label column a factor (binary label derived from column 23)
pt2 <- as.factor(ifelse(featuredata_all[,23]>0, 0,1))

# join data (optional)
ft<-cbind.data.frame(pt1,pt2) #to preserve the label in text
colnames(ft)[23]<- "Cluster"

## 75% of the sample size
smp_size <- floor(0.75 * nrow(ft))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ft)), size = smp_size)

# split features into train and test
train <- ft[train_ind,1:22] #163 reads
test  <- ft[-train_ind,1:22] #55 reads
dim(train)
# [1] 163  22

dim(test)
# [1] 55  22

# split labels the same way
trainlabel<- ft[train_ind,23] #163 labels
testlabel <- ft[-train_ind,23] #55 labels
length(trainlabel)
# [1] 163

length(testlabel)
# [1] 55

#Support Vector Machine for classification
model_svm <- svm(x= as.matrix(train), y = trainlabel, probability = T)
summary(model_svm)

# Call:
#   svm.default(x = as.matrix(train), y = trainlabel, probability = T)
# 
# 
# Parameters:
#   SVM-Type:  C-classification 
# SVM-Kernel:  radial 
# cost:  1 
# 
# Number of Support Vectors:  159
# 
# ( 78 81 )
# 
# 
# Number of Classes:  2 
# 
# Levels: 
#   0 1

#Use the predictions on the data
# ---------------- This is where the question is ---------------- #
pred <- predict(model_svm, as.matrix(test))
length(pred)
# [1] 55
# ----------------------------------------------------------------#

print(table(pred[1:nrow(test)],testlabel))
#    testlabel
#    0  1
# 0 14 14
# 1 11 16
```

Hope this helps.
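A likely reason for the 163 rows in the question, offered as a hedged reading rather than a definitive diagnosis: the formula `trainlabel ~ as.matrix(train)` names the object `train` itself instead of column names, so `predict()` finds nothing in `newdata` to substitute and appears to fall back to the training matrix. The sketch below contrasts the two forms with e1071. The synthetic data and the names `features`, `train_df`, `test_df`, `train_x`, `train_y`, `model_ok`, and `model_bad` are illustrative assumptions, not part of the original post.

```r
library(e1071)

set.seed(123)

# Hypothetical stand-in data: 218 rows, 22 numeric features, a two-level label
features <- as.data.frame(matrix(rnorm(218 * 22), ncol = 22))
features$Cluster <- factor(ifelse(rnorm(218) > 0, "a", "b"))

train_ind <- sample(seq_len(nrow(features)), size = floor(0.75 * nrow(features)))
train_df <- features[train_ind, ]   # 163 rows, label column included
test_df  <- features[-train_ind, ]  # 55 rows

# The formula refers to column names, so predict() can look them up in newdata
model_ok <- svm(Cluster ~ ., data = train_df)
pred_ok  <- predict(model_ok, newdata = test_df)
length(pred_ok)   # one prediction per test row

# The formula refers to the object `train_x`; newdata has no column by that name
train_x <- as.matrix(train_df[, 1:22])
train_y <- train_df$Cluster
model_bad <- svm(train_y ~ train_x)
pred_bad  <- predict(model_bad, newdata = test_df[, 1:22])
length(pred_bad)  # the question reports the training row count (163) in this situation
```

With the `data =` form there is no need to trim the result with `pred[1:nrow(test)]`, which would otherwise hide the mismatch by silently comparing training-row predictions against test labels.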

SqueakyBeak · Answer 2 · May 2, 2020

OK, I figured out that I was training the model on my training set and then testing it on my test set. I needed to first check it by re-predicting the training set, and only then feed it the test set.

```r
summary(model_svm)
#Use the predictions on the data
pred <- predict(model_svm, train)

model_svm <- svm(trainlabel ~ as.matrix(test) )
summary(model_svm)
#Use the predictions on the data
pred <- predict(model_svm, test)
```
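For comparison, here is a minimal sketch of re-predicting the training set and then predicting the test set with a single fitted model, with no refit in between. It assumes the `x`/`y` interface of `svm()` from e1071. The synthetic data and the names `x`, `y`, `model`, `pred_train`, `pred_test`, `train_acc`, and `test_acc` are hypothetical and chosen only for illustration.

```r
library(e1071)

set.seed(123)

# Hypothetical data: 218 rows, 22 features, label loosely tied to the first feature
x <- matrix(rnorm(218 * 22), ncol = 22)
y <- factor(ifelse(x[, 1] + rnorm(218, sd = 0.5) > 0, "a", "b"))

train_ind <- sample(seq_len(nrow(x)), size = floor(0.75 * nrow(x)))

# Fit once, on the training rows only
model <- svm(x = x[train_ind, ], y = y[train_ind])

# Re-predict the training rows, then predict the held-out rows with the same model
pred_train <- predict(model, x[train_ind, ])
pred_test  <- predict(model, x[-train_ind, ])

# Each prediction vector has one entry per row of the matrix it was given
length(pred_train)
length(pred_test)

# Share of correct predictions on each split
train_acc <- mean(pred_train == y[train_ind])
test_acc  <- mean(pred_test == y[-train_ind])
c(train = train_acc, test = test_acc)
```

Keeping one model for both calls means the test-set figure reflects data the model has not seen, while the training-set figure serves only as a sanity check.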

Is the number of predicted values from the test set correct for SVM?
