Feature Engineering - EDA best practices for a project with a large number of attributes


If a machine learning project has a large number of attributes/features, say 50+, then:

  1. Analyzing the features one by one is cumbersome and takes a lot of time. Understanding bivariate relationships is even more tedious and complex, since the number of feature pairs grows quickly (50 features already give 1,225 pairs).

  2. One might get lost in the whole process due to the confusion and complexity created by so many attributes.

So how do we manage the process of analyzing relationships when there is a large number of features, so as to keep a good grip on and understanding of the whole analysis?

Are there any best practices you can share for managing this?

Thanks & Regards,


Using dimensionality reduction you can reduce the features to a much smaller number, for example 5.

Hey @mohitlearns
Dimensionality reduction is one way. Use either PCA (very well known) or any other dimensionality reduction algorithm.
A second approach is to treat it as a big data problem. If you want to analyze the whole dataset first and do the reduction later, try MapReduce and/or PySpark (not sure about the latter, because I have not tried it myself).
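
As a rough illustration of the PCA suggestion, here is a minimal sketch that reduces 50 features to 5 components. It computes PCA directly with NumPy's SVD rather than through a dedicated PCA library class. The data is synthetic, and the function name `pca_reduce` and its parameters are made up for this example, so treat it as a starting point rather than a recipe.

```python
import numpy as np

def pca_reduce(X, n_components=5):
    """Project the rows of X onto the top principal components (PCA via SVD).

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that features on large scales do not dominate.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns as zeros instead of dividing by 0
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    components = Vt[:n_components]
    scores = Z @ components.T        # the reduced representation
    return scores, components, explained[:n_components]

# Synthetic stand-in for a real dataset (assumption): 200 rows, 50 features
# that are noisy mixtures of 5 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(200, 50))

scores, components, explained = pca_reduce(X, n_components=5)
print(scores.shape)        # rows x reduced features
print(explained.round(3))  # variance share captured by each component
```

Each row of `components` holds the loadings of the original 50 features on one component, so inspecting the largest absolute loadings is one way to see which attributes move together. On real data the variance shares will usually be less concentrated than in this synthetic case, so the number of components to keep is a judgment call.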

© Copyright 2013-2019 Analytics Vidhya