SVM Algorithm Performance

svm

#1

Hi,

I trained an SVM model for a binary classification use case. The dataset had 25 variables and 100 observations.

Accuracy, precision and recall were not satisfactory at 0.60, 0.38 and 0.55 respectively, which is a tad too low.
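For reference, here is a minimal sketch of how these three metrics come out of the confusion counts. The labels below are hypothetical and chosen only for illustration; they are not the data from this use case.

```python
# Hypothetical labels for illustration only (not the data from this thread)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)   # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that are found
print(accuracy, precision, recall)  # 0.5 0.4 0.5
```

Precision lower than recall means there are more false positives than false negatives, i.e. the model predicts the positive class more often than it actually occurs.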

I have been reading articles on SVM and understand that it performs much better on high-dimensional datasets with fewer observations, and that it works best for non-linear problems.

Is that the reason it is not working optimally for this use case?

I chose a linear kernel with C = 1 and gamma = 10.
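One thing worth checking about this configuration: in scikit-learn's `SVC`, `gamma` is only used by the `rbf`, `poly` and `sigmoid` kernels, so with `kernel="linear"` it should have no effect on the model. A minimal sketch on synthetic data, where the `make_classification` dataset is an assumed stand-in with the same shape as the one described, not the real data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in: 100 observations, 25 variables (assumed, not the original data)
X, y = make_classification(n_samples=100, n_features=25, random_state=0)

# Same linear kernel and C, two very different gamma values
low_gamma = SVC(kernel="linear", C=1, gamma=0.01).fit(X, y)
high_gamma = SVC(kernel="linear", C=1, gamma=10).fit(X, y)

# gamma does not enter the linear kernel, so the decision values should match
same = np.allclose(low_gamma.decision_function(X),
                   high_gamma.decision_function(X))
print("Identical decision values:", same)
```

If the two models match, changing gamma alone cannot improve a linear-kernel SVM; only C (or a different kernel) changes the fit.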

Any suggestions on this will be highly appreciated.


#2

Hi @vashish1,

Make sure your dataset has no outliers, as SVM is not robust to them. You can also try tuning the hyperparameters, which can help improve the performance of the model. You can follow the article given below, which explains the working and implementation of SVM in detail:


#3

Thanks PulkitS for sharing the link.

You mean SVM does not perform well if the data has outliers?

I tried tuning the cost and gamma parameters (increasing their values), but there is still no improvement in accuracy, precision and recall.
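Rather than increasing the values by hand, the search can be done systematically. Below is a rough sketch using scikit-learn's `GridSearchCV`; the synthetic dataset, the grid values and the choice of F1 as the scoring metric are all assumptions made for illustration, not recommendations specific to this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 100 x 25 dataset (assumed, not the original data)
X, y = make_classification(n_samples=100, n_features=25,
                           n_informative=5, random_state=0)

# Scale inside the pipeline so each CV fold is scaled using its own training part
pipe = make_pipeline(StandardScaler(), SVC())

# Illustrative grid: gamma is only searched for the RBF kernel,
# and both smaller and larger values of C are tried
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.01, 0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10],
     "svc__gamma": ["scale", 0.001, 0.01, 0.1]},
]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1")
search.fit(X, y)

print(search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))
```

Note that the grid goes in both directions: with few observations, smaller values of C and gamma (more regularization) are often the ones worth trying.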

I calculated the score (model.score) on the training set, which comes to 0.80, but it drops to 0.60 on the test set. That looks like an overfitting issue.
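With only 100 observations, a single train/test split leaves very few test points, so the test score is noisy. A sketch of comparing train and validation scores with cross-validation for several values of C; the synthetic data and the C values are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 100 x 25 dataset (assumed, not the original data)
X, y = make_classification(n_samples=100, n_features=25,
                           n_informative=5, random_state=0)

scores = {}
for C in [0.01, 0.1, 1, 10]:
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    res = cross_validate(model, X, y, cv=5, return_train_score=True)
    # Mean accuracy on the training folds vs. the held-out folds
    scores[C] = (res["train_score"].mean(), res["test_score"].mean())

for C, (train_acc, val_acc) in scores.items():
    print(f"C={C}: train={train_acc:.2f}  validation={val_acc:.2f}")
```

A large gap between the train and validation columns points to overfitting; if the gap shrinks as C decreases, stronger regularization may help.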

Understanding the factors behind SVM’s low performance would help a lot. Any pointers?


#4

Hi @vashish1,

Yes. When the data has outliers, SVM does not perform well. So, if your train or test data has outliers, it is recommended to use some other technique.

You can also normalize all the variables, i.e. bring them to the same scale, which can improve the model’s performance.
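As a rough illustration of the scaling point, the sketch below compares an RBF-kernel SVM with and without `StandardScaler` on synthetic data whose columns have been deliberately put on very different scales. The data and the scale factors are assumptions, and the size of the difference will vary from dataset to dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 100 x 25 dataset (assumed, not the original data)
X, y = make_classification(n_samples=100, n_features=25,
                           n_informative=5, random_state=0)

# Multiply each column by a different factor to mimic unscaled real-world variables
rng = np.random.default_rng(0)
X_mixed = X * rng.uniform(1, 1000, size=X.shape[1])

raw = SVC(kernel="rbf", C=1)
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1))

# 5-fold cross-validated accuracy for both variants
raw_acc = cross_val_score(raw, X_mixed, y, cv=5).mean()
scaled_acc = cross_val_score(scaled, X_mixed, y, cv=5).mean()
print(f"unscaled: {raw_acc:.2f}  scaled: {scaled_acc:.2f}")
```

Putting the scaler inside the pipeline means it is fitted on the training folds only, so the validation folds do not leak into the scaling.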


#5

Agreed that outliers could be the culprit. I see that the data across variables is skewed, so I have to normalize it.

Actually, I wanted to give SVM a try (though I am sure other techniques will perform better) to understand the intricacies of the algorithm on a real use case.

It performed exceptionally well on the predefined datasets in sklearn.

Thanks a lot for your quick replies. I will continue my efforts to improve the SVM output.


#6

Hi @vashish1,

The performance of a model depends largely on the type of dataset you are dealing with. A technique can perform exceptionally well on some datasets and be the worst performer on others. So, you just have to try different techniques in order to find the best model.

Sure, and please share the insights you get from your experiments; that would help me enhance my understanding of SVM as well.