Machine Learning Algorithm Selection

Dear All,

I have an ML problem with 130 features and 3000 records. The number of records seems small relative to the number of features. Which ML algorithm would work well for this, and why?

This is a regression problem.

Shankar R


Apply PCA, then try Random Forest. @shankarthebest
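
A rough sketch of what this suggestion could look like with scikit-learn. The data here is synthetic (generated with make_regression to match the 3000 x 130 shape in the question), and the 95% variance cutoff, tree count, and max_features setting are illustrative assumptions, not tuned values:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 3000 records, 130 features.
X, y = make_regression(
    n_samples=3000, n_features=130, n_informative=20, noise=10.0, random_state=0
)

# Scale first, because PCA is sensitive to feature scale.
# A float n_components keeps enough components to explain that share of variance.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    RandomForestRegressor(
        n_estimators=50, max_features="sqrt", random_state=0, n_jobs=-1
    ),
)

# Cross-validation gives a more honest estimate than a single train/test split.
scores = cross_val_score(model, X, y, cv=3, scoring="r2")
mean_r2 = scores.mean()
print(f"Mean cross-validated R^2: {mean_r2:.3f}")
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, so no information from the validation fold leaks into the preprocessing.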

Apply Support Vector Regression, which can handle a large number of features.
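
A minimal sketch of this with scikit-learn's SVR, again on synthetic data shaped like the question's dataset. The RBF kernel and the small grid of C values are assumptions chosen for illustration only:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real data: 3000 records, 130 features.
X, y = make_regression(
    n_samples=3000, n_features=130, n_informative=20, noise=10.0, random_state=0
)

# SVR is distance-based, so features should be standardized before fitting.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svr", SVR(kernel="rbf")),
])

# Illustrative grid over the regularization strength C.
param_grid = {"svr__C": [1.0, 10.0, 100.0]}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="r2")
search.fit(X, y)

best_c = search.best_params_["svr__C"]
best_score = search.best_score_
print(f"Best C: {best_c}, cross-validated R^2: {best_score:.3f}")
```

SVR is sensitive to its hyperparameters and to the scale of the target, so on real data it is worth searching over epsilon and gamma as well.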

First check whether the variables are independent of each other (linear regression assumes the predictors are not strongly correlated with one another).
For your numerical variables, check for correlated features and remove those that are highly correlated. There are also many statistical techniques for choosing the right variables.
For categorical variables: check the distribution of each category, both within the variable as a whole and against the response variable. If any category looks highly skewed, remove it.
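
The correlation check for numerical variables could be sketched like this with pandas. The helper name drop_highly_correlated, the 0.9 threshold, and the toy columns are all made up for illustration:

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.9):
    """Drop the later column of each pair whose absolute Pearson
    correlation exceeds the threshold. Returns (reduced_df, dropped)."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Toy data: "a_copy" is almost identical to "a"; "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.01, size=500),
    "b": rng.normal(size=500),
})

reduced, dropped = drop_highly_correlated(demo, threshold=0.9)
print(dropped)  # ['a_copy']
```

This keeps the first column of each correlated pair, which is an arbitrary choice. You may prefer to keep whichever feature is more strongly related to the response.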


The best method for a dataset with many features and few records is Support Vector Regression.

However, first check for correlations between features and remove those that are highly correlated. After that, apply PCA.

© Copyright 2013-2019 Analytics Vidhya