Which supervised learning system would you use?



You have a data sample consisting of 200 data points and 100 features, most of which are independent. Which supervised learning system would you use? Why?



It really depends on what your objective is. A reasonable first step is to run PCA and explore whether dimensionality reduction is possible.
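As a rough illustration of that first step, here is a minimal sketch that runs PCA through NumPy's SVD and counts how many components are needed to keep 95% of the variance. The data is a synthetic stand-in (200 samples, 100 independent Gaussian features), and the 95% threshold is an arbitrary assumption, not something from the question.

```python
import numpy as np

# Synthetic stand-in for the real data: 200 samples, 100 independent features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))


def explained_variance_ratio(X):
    """Share of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)                      # PCA works on centered data
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    var = s ** 2 / (X.shape[0] - 1)              # variance along each component
    return var / var.sum()


ratio = explained_variance_ratio(X)

# Smallest number of components whose cumulative share reaches 95%.
k95 = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)
print(f"components needed for 95% of the variance: {k95} of {X.shape[1]}")
```

If the features really are mostly independent, expect that count to stay close to the number of original features, in which case PCA will not compress the data much.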


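You could use ridge or lasso regression, or naive Bayes. These usually work well on wide data sets, where the number of features is large relative to the number of samples. You could also try dimensionality reduction, for example with PCA, by dropping highly correlated features, or by selecting features by variable importance.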
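To make the ridge suggestion concrete, here is a minimal sketch of closed-form ridge regression in NumPy. Everything about the data is an assumption for illustration: a synthetic 200 x 100 matrix, a target driven by only five features, and a hand-picked set of penalty values `alpha`. In practice a library implementation with cross-validated `alpha` would be the more usual choice.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge: w = (Xc^T Xc + alpha * I)^-1 Xc^T yc on centered data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w                      # intercept is not penalized
    return w, b


# Synthetic stand-in: only the first 5 of 100 features affect the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
true_w = np.zeros(100)
true_w[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_w + rng.normal(scale=1.0, size=200)

# Simple hold-out split: 150 rows for fitting, 50 for evaluation.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

weights, test_mse = {}, {}
for alpha in (1e-6, 1.0, 10.0, 100.0):           # 1e-6 is close to plain least squares
    w, b = ridge_fit(X_train, y_train, alpha)
    weights[alpha] = w
    test_mse[alpha] = float(np.mean((X_test @ w + b - y_test) ** 2))
    print(f"alpha={alpha:g}  ||w||={np.linalg.norm(w):.3f}  test MSE={test_mse[alpha]:.3f}")
```

A larger `alpha` shrinks the coefficient vector, which is what keeps the fit stable when there are nearly as many features as samples.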


Before answering, I would first ask a few questions:

  1. What type is the target variable (continuous or categorical)?
  2. What is the data about (which domain)?
  3. Have you tried EDA?
  4. What business problem are you trying to solve?

Once you have answered these questions, you can try different supervised algorithms, depending on whether you are doing classification or regression. A few examples:

  1. Linear Regression
  2. Logistic Regression
  3. Random Forest
  4. SVM
  5. Neural Networks
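With only 200 samples, one common way to compare candidates like these is k-fold cross-validation. The sketch below is a minimal NumPy version for a regression target. The helper names (`k_fold_indices`, `cross_val_mse`), the synthetic data, and the two candidates (a mean baseline and plain linear regression) are all assumptions for illustration; the other models in the list would come from a library and plug in through the same `fit`/`predict` pair.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_val_mse(fit, predict, X, y, k=5):
    """Average held-out mean squared error over k folds."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        pred = predict(model, X[test_idx])
        scores.append(float(np.mean((pred - y[test_idx]) ** 2)))
    return float(np.mean(scores))


# Candidate 1: always predict the training mean (a sanity-check baseline).
def fit_mean(X, y):
    return y.mean()


def predict_mean(model, X):
    return np.full(X.shape[0], model)


# Candidate 2: ordinary least squares with an intercept column.
def fit_linear(X, y):
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


# Synthetic stand-in: 200 samples, 100 features, two of them informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 100))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

scores = {
    "mean baseline": cross_val_mse(fit_mean, predict_mean, X, y),
    "linear regression": cross_val_mse(fit_linear, predict_linear, X, y),
}
for name, mse in scores.items():
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```

For a categorical target the same loop applies with a classification metric such as accuracy in place of the squared error.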