How do dimensionality reduction techniques help in classification?

classification
dimensionreduction

#1

I have studied dimensionality reduction techniques like PCA and factor analysis, and I have also studied classification techniques like decision trees and random forests. I want to know how dimensionality reduction helps in predicting the output of a classifier.


#2

The advantages are:

  1. It reduces the time and storage space required.
  2. Removing multicollinearity can improve the performance of the machine learning model.
  3. It becomes easier to visualize the data when reduced to very low dimensions such as 2D or 3D.
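A minimal sketch of points 1 and 2, assuming NumPy is available: PCA (done here by hand via SVD; in practice scikit-learn's `PCA` does the same job) compresses ten strongly collinear synthetic features into two components whose scores are uncorrelated. The data and the helper `pca_fit_transform` are hypothetical, for illustration only.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Hypothetical helper: project X onto its top principal components (PCA via SVD)."""
    # PCA is defined on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each kept component.
    explained = (s**2 / np.sum(s**2))[:n_components]
    return X_centered @ components.T, explained

rng = np.random.default_rng(0)
n = 500
# Made-up data: two hidden factors drive ten observed, highly collinear features.
latent = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(n, 10))

Z, explained = pca_fit_transform(X, n_components=2)

print("original shape:", X.shape)   # (500, 10)
print("reduced shape: ", Z.shape)   # (500, 2)
print("variance kept:  %.3f" % explained.sum())
print("max |corr| among original features: %.2f"
      % np.abs(np.corrcoef(X, rowvar=False) - np.eye(10)).max())
# PCA scores are uncorrelated by construction, so this is ~0.
print("corr between PC1 and PC2: %.2e" % np.corrcoef(Z, rowvar=False)[0, 1])
```

Storing `Z` instead of `X` needs a fifth of the space, and a downstream model no longer sees collinear inputs.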

To understand why the results are expected to be better, you need to understand the term “curse of dimensionality”. Dimensionality reduction is done to tackle the curse of dimensionality. The “curse of dimensionality” is not a problem of high-dimensional data alone, but a joint problem of the data and the algorithm being applied. It arises when the algorithm does not scale well to high-dimensional data, typically because it needs an amount of time or memory that is exponential in the number of dimensions of the data.
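To make the exponential part concrete, here is a toy illustration (assuming NumPy; `grid_occupancy` is a hypothetical helper): if an algorithm partitions each axis into 10 bins, the number of cells it must store or visit is 10^d, and a fixed sample of 10,000 points covers a vanishing fraction of those cells as d grows.

```python
import numpy as np

def grid_occupancy(n_samples, dims, bins_per_axis=10, seed=0):
    """Hypothetical helper: return (total grid cells, fraction of cells holding >= 1 sample)."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, dims))                 # uniform points in [0, 1)^dims
    cells = np.floor(X * bins_per_axis).astype(int)   # integer cell index along each axis
    occupied = len({tuple(row) for row in cells})     # number of distinct non-empty cells
    total = bins_per_axis ** dims                     # grows exponentially with dims
    return total, occupied / total

for d in (1, 2, 3, 5, 8):
    total, frac = grid_occupancy(n_samples=10_000, dims=d)
    print(f"d={d}: {total:>12,} cells, {frac:.6%} occupied")
```

With the same 10,000 points, almost every cell is populated in 2 or 3 dimensions, while at 8 dimensions at most 0.01% of the cells can contain any data at all.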

For more information about the curse of dimensionality: https://en.wikipedia.org/wiki/Curse_of_dimensionality

Hope this helps!


#3

@hinduja1234,

Dimensionality reduction techniques can be used for both regression and classification problems. We apply them to reduce the number of dimensions of the data set, either by identifying the most significant existing dimensions or by generating new dimensions from the existing ones. We can then build a classification or regression model on the output of the dimensionality reduction method.
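A rough sketch of building a classification model on the output of dimensionality reduction, assuming NumPy and made-up data: PCA is fitted on the training split only, 50 features are reduced to 3, and a simple nearest-centroid classifier (standing in for a decision tree or random forest) is trained on the reduced features. With scikit-learn you would normally chain `PCA` and a classifier in a `Pipeline`.

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_class, n_latent, n_features = 200, 3, 50

# Hypothetical toy data: the class signal lives in 3 latent directions
# that are spread across 50 observed, noisy features.
mixing = rng.normal(size=(n_latent, n_features))
latent = np.vstack([rng.normal(loc=-1.0, size=(n_per_class, n_latent)),
                    rng.normal(loc=+1.0, size=(n_per_class, n_latent))])
X = latent @ mixing + rng.normal(scale=0.5, size=(2 * n_per_class, n_features))
y = np.repeat([0, 1], n_per_class)

# Random train/test split (300 train, 100 test).
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

# 1) Fit PCA on the training data only, via SVD of the centered matrix.
mean = X[train].mean(axis=0)
_, _, Vt = np.linalg.svd(X[train] - mean, full_matrices=False)
components = Vt[:n_latent]  # keep 3 of the 50 dimensions

def transform(A):
    """Project rows of A onto the principal directions learned from the training set."""
    return (A - mean) @ components.T

Z_train, Z_test = transform(X[train]), transform(X[test])

# 2) Fit a simple classifier (nearest class centroid) on the reduced features.
centroids = np.array([Z_train[y[train] == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)

accuracy = (y_pred == y[test]).mean()
print(f"reduced {n_features} -> {Z_train.shape[1]} features, test accuracy = {accuracy:.3f}")
```

Fitting PCA on the training split only keeps information from the test set from leaking into the model.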

Decision trees and random forests also act as dimensionality reduction methods in a sense: they build trees by splitting on the most significant dimensions only, and afterwards they can return a variable importance plot.
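A simplified sketch of how a tree picks the "most significant dimension": for each feature it searches for the threshold split that most reduces Gini impurity, and the feature with the largest gain wins. Random forest variable importance (mean decrease in impurity) adds up such gains over all splits in all trees; this single-split version and the helper names are hypothetical, assuming NumPy and synthetic data where only features 3 and 7 carry signal.

```python
import numpy as np

def gini(y):
    """Gini impurity of a 0/1 label vector: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2 * p * (1 - p)

def best_split_gain(x, y):
    """Largest impurity decrease from a single threshold split on feature x
    (what a decision tree evaluates when choosing its split variable)."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y)
    parent = gini(y)
    best = 0.0
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold separates equal values
        left, right = y_sorted[:i], y_sorted[i:]
        child = (i * gini(left) + (n - i) * gini(right)) / n
        best = max(best, parent - child)
    return best

rng = np.random.default_rng(1)
n, n_features = 300, 10
X = rng.normal(size=(n, n_features))
# Hypothetical setup: only features 3 and 7 determine the label.
y = ((X[:, 3] + X[:, 7]) > 0).astype(int)

gains = np.array([best_split_gain(X[:, j], y) for j in range(n_features)])
ranking = np.argsort(gains)[::-1]
print("features ranked by split gain:", ranking)
print("gains:", np.round(gains[ranking], 3))
```

In scikit-learn, a fitted `RandomForestClassifier` exposes the aggregated version of these gains as its `feature_importances_` attribute.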

Hope this helps!


#4

Hi Hinduja,
Just to add to what Harshita and Steve have already said: you mentioned random forest in your question, which really does not need dimension reduction. You can certainly remove zero-variance variables, but beyond that the number of variables you can use is limited only by your laptop's memory.
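Removing zero-variance variables is a one-liner; a minimal sketch assuming NumPy, with made-up column names (scikit-learn's `VarianceThreshold`, whose default threshold is 0, does the same thing):

```python
import numpy as np

def drop_zero_variance(X, names):
    """Hypothetical helper: remove columns whose values are all identical (variance == 0)."""
    keep = X.var(axis=0) > 0
    return X[:, keep], [name for name, k in zip(names, keep) if k]

X = np.array([[1.0, 5.0, 0.0, 2.0],
              [2.0, 5.0, 0.0, 3.0],
              [3.0, 5.0, 0.0, 1.0]])
names = ["age", "constant_a", "constant_b", "score"]  # made-up column names

X_reduced, kept = drop_zero_variance(X, names)
print(kept)             # ['age', 'score']
print(X_reduced.shape)  # (3, 2)
```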

Hope this does not increase your confusion :smile:
Tavish