Should I train multiple classification models on the same data?

ensemble_methods

#1

Hello,

I am trying to push my accuracy beyond 96% on the Digit Recogniser problem on Kaggle. So far I have used only Random Forest, and I am now trying KNN and SVM as well.
I guess I will have to use ensemble methods for this, so I would like to clarify a few things first:
1. Do I divide the training data into multiple sets, apply the same algorithm to each set, and then combine the results? OR
2. Do I train multiple algorithms on the training data and then combine their predictions?
Also, in a classification problem like this, if I divide the dataset into multiple subsets, some digits might be missing from a particular subset entirely. Will that not weaken the model trained on that subset?
These are the things I would like to settle before proceeding with ensembles, so could somebody please help me with them?


#2

Hi @pagal_guy,

If you want to go ahead with an ensemble technique, use the second approach: train multiple models on the full training dataset and then combine their predictions.
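
Here is a minimal sketch of that second approach, assuming scikit-learn and the three models you mentioned (Random Forest, KNN, SVM). I am using sklearn's small built-in digits dataset as a stand-in for the Kaggle data, so swap in your own arrays:

```python
# Sketch of approach 2: train several models on the full training set
# and combine their predictions by voting. The built-in digits dataset
# here is only a stand-in for the Kaggle data (an assumption of mine).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svm", SVC(probability=True, random_state=0)),  # probability=True is needed for soft voting
    ],
    voting="soft",  # average class probabilities; "hard" takes a majority vote instead
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```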

Just a tip for the Digit Recogniser:
You can stick with Random Forest alone, but try to increase the training data. How? Well, you can tilt the images by 10/20/30 degrees both clockwise and anticlockwise, which gives you 6 times more data to train on. If computation time and resources are not an issue, you can also try shifting the images 5 pixels up/down/left/right, which enlarges the dataset even further before you train the random forest. These techniques give a significant boost in accuracy, especially in image recognition.
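
In case it is useful, here is a rough sketch of that augmentation using scipy.ndimage. I am assuming the images have been reshaped to 28x28 NumPy arrays (the Kaggle CSV gives flat 784-pixel rows), and the array names are just placeholders:

```python
# Rough sketch of the rotate/shift augmentation described above.
# Assumes images is an (n, 28, 28) array and labels is an (n,) array;
# both names are placeholders, not from the original post.
import numpy as np
from scipy.ndimage import rotate, shift

def augment(images, labels):
    new_images, new_labels = [images], [labels]
    # Tilt 10/20/30 degrees clockwise and anticlockwise: 6 extra copies.
    for angle in (-30, -20, -10, 10, 20, 30):
        tilted = np.array([rotate(img, angle, reshape=False) for img in images])
        new_images.append(tilted)
        new_labels.append(labels)
    # Shift 5 pixels up/down/left/right: 4 more copies.
    for offset in ((-5, 0), (5, 0), (0, -5), (0, 5)):
        shifted = np.array([shift(img, offset) for img in images])
        new_images.append(shifted)
        new_labels.append(labels)
    return np.concatenate(new_images), np.concatenate(new_labels)

# Stand-in data just to show the shapes; use the real training set instead.
images = np.random.rand(100, 28, 28)
labels = np.random.randint(0, 10, size=100)
X_aug, y_aug = augment(images, labels)
print(X_aug.shape)  # (1100, 28, 28): original + 6 rotations + 4 shifts
```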

Hope this helps.

Regards,
Aayush