Importance of EDA in Machine Learning


How does EDA help in improving model performance?


EDA helps you:

  • Gain familiarity with the dataset
  • Identify the distribution of each feature
  • Identify features with null or erroneous values
  • Identify features that are not informative, such as those that have the same value for all observations
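The checks in the list above can be sketched with pandas roughly as follows. This is only a minimal illustration: the toy DataFrame, its column names, and the rule that a negative income counts as erroneous are all assumptions made up for the example, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the columns and values are invented for illustration.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],             # contains one missing value
    "income": [40000, 52000, 61000, -1, 48000],  # -1 is assumed to be a bad entry
    "country": ["IN", "IN", "IN", "IN", "IN"],   # same value in every row
})

# Familiarity with the dataset: size and column types.
shape = df.shape
dtypes = df.dtypes

# Feature distributions: count, mean, std, min, quartiles, max of numeric columns.
summary = df.describe()

# Features with null values: number of missing entries per column.
null_counts = df.isnull().sum()

# Features with erroneous values: here we assume income can never be negative.
bad_income_rows = df[df["income"] < 0]

# Uninformative features: columns with a single distinct value across all rows.
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

print(shape)
print(dtypes)
print(summary)
print(null_counts)
print(bad_income_rows)
print(constant_cols)
```

On a real dataset you would typically follow this with plots (histograms, box plots) and then decide how to handle each issue, for example by imputing missing values or dropping constant columns.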

You can also refer to the link below.


If you have ever gone through the kernels of a Kaggle competition, you will have noticed that an EDA kernel is usually among the most upvoted. Why is that?

It is generally because EDA gives us a better understanding of the data, and it is through that understanding that we uncover trends and relationships among variables. This ultimately leads to generating and selecting useful features, which directly affects model performance.
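As a rough illustration of how looking at relationships among variables can feed into feature selection, here is a small sketch. The data, the column names, and the 0.5 correlation cutoff are all hypothetical choices made for this example, and a linear correlation is only one of many ways to examine relationships.

```python
import pandas as pd

# Hypothetical toy data; "price" plays the role of the target variable.
df = pd.DataFrame({
    "area_sqft": [500, 750, 1000, 1250, 1500],
    "rooms": [1, 2, 3, 3, 4],
    "noise": [3, 1, 4, 1, 5],
    "price": [50, 76, 99, 127, 149],
})

# Pearson correlation of every feature with the target, strongest first.
corr_with_target = (
    df.corr()["price"]
    .drop("price")
    .abs()
    .sort_values(ascending=False)
)

# Keep features whose absolute correlation exceeds an arbitrary cutoff (assumed 0.5).
threshold = 0.5
selected_features = corr_with_target[corr_with_target > threshold].index.tolist()

print(corr_with_target)
print(selected_features)
```

A correlation filter like this only captures linear relationships, so in practice it is usually combined with plots and domain knowledge before features are dropped.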

To learn EDA, you can refer to this article.

Hope this clears your doubt.