Why are univariate and bivariate analyses required for machine learning?



It is a beautifully designed course, but why did they use all the variables? Shouldn't we first select only the important variables and then build the model?


Hi @rishabgupta98,

Univariate and bivariate analyses are done to gain insight into the data. When you explore the variables individually using univariate analysis, you learn how each one is distributed. Using bivariate analysis, you learn how pairs of variables relate to each other.
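To make the two kinds of analysis concrete, here is a minimal sketch using only the Python standard library. The dataset and its column names (`income`, `loan_amount`, `education`) are hypothetical toy values, not the course data. Univariate analysis summarizes one column at a time (mean, median, spread, category counts); bivariate analysis looks at two columns together, here via Pearson correlation for two numeric variables and per-group means for a categorical vs. numeric pair.

```python
from collections import Counter, defaultdict
from statistics import mean, median, pstdev

# Hypothetical toy data: one dict per loan applicant (column names are illustrative).
rows = [
    {"income": 3000, "loan_amount": 100, "education": "Graduate"},
    {"income": 4500, "loan_amount": 130, "education": "Graduate"},
    {"income": 2500, "loan_amount": 90, "education": "Not Graduate"},
    {"income": 6000, "loan_amount": 200, "education": "Graduate"},
    {"income": 3500, "loan_amount": 120, "education": "Not Graduate"},
]

def column(rows, name):
    """Extract one variable (column) from the list of rows."""
    return [r[name] for r in rows]

# --- Univariate: one variable at a time ---
def describe_numeric(values):
    """Central tendency and spread of a single numeric variable."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }

def describe_categorical(values):
    """Frequency of each category in a single categorical variable."""
    return Counter(values)

# --- Bivariate: two variables together ---
def pearson(x, y):
    """Pearson correlation: strength of the linear relationship between two numeric variables."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def group_means(categories, values):
    """Mean of a numeric variable within each category (categorical vs. numeric)."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    return {c: mean(vs) for c, vs in groups.items()}

income = column(rows, "income")
loan = column(rows, "loan_amount")
education = column(rows, "education")

print(describe_numeric(income))          # distribution of income
print(describe_categorical(education))   # how many applicants per education level
print(round(pearson(income, loan), 3))   # do income and loan amount move together?
print(group_means(education, loan))      # does loan amount differ by education?
```

If you work with pandas instead, `DataFrame.describe()`, `Series.value_counts()`, `DataFrame.corr()` and `groupby(...).mean()` cover the same ground with less code.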

So, these analyses help you learn what the data is all about. Before univariate and bivariate analysis, and even before looking at the dataset, we generate hypotheses, i.e. we list the possible factors that could affect our target variable. You can then combine these hypotheses with the insights you have gained from the data to create new features, or to drop existing features that you think are not useful for the model. This is called feature engineering.
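The feature-engineering step described above can be sketched as follows. This is an illustrative example under stated assumptions, not the method used in the course: the columns, the hypotheses ("total household income matters more than either income alone", "loan size relative to income matters") and the hypothetical helpers `add_features` and `select_features` are made up for this sketch. Dropping features whose absolute correlation with the target falls below a threshold is just one simple heuristic; it only detects linear relationships, and on four rows it is purely illustrative.

```python
from statistics import mean

# Hypothetical toy data; "approved" is the target (1 = approved, 0 = rejected).
rows = [
    {"applicant_income": 5000, "coapplicant_income": 1000, "loan_amount": 120, "dependents": 1, "approved": 1},
    {"applicant_income": 2000, "coapplicant_income": 0, "loan_amount": 150, "dependents": 2, "approved": 0},
    {"applicant_income": 4000, "coapplicant_income": 2000, "loan_amount": 100, "dependents": 2, "approved": 1},
    {"applicant_income": 1500, "coapplicant_income": 500, "loan_amount": 160, "dependents": 1, "approved": 0},
]

def add_features(row):
    """Return a copy of `row` with hypothesis-driven features added (hypothetical helper)."""
    new = dict(row)
    # Hypothesis: combined household income matters more than either income alone.
    new["total_income"] = row["applicant_income"] + row["coapplicant_income"]
    # Hypothesis: a loan that is large relative to income is riskier.
    new["loan_to_income"] = row["loan_amount"] / new["total_income"]
    return new

def pearson(x, y):
    """Pearson correlation between two numeric variables."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def select_features(rows, target, threshold):
    """Keep features whose |Pearson r| with the target is at least `threshold` (hypothetical helper)."""
    y = [r[target] for r in rows]
    keep = []
    for name in rows[0]:
        if name == target:
            continue
        x = [r[name] for r in rows]
        if abs(pearson(x, y)) >= threshold:
            keep.append(name)
    return keep

engineered = [add_features(r) for r in rows]
kept = select_features(engineered, "approved", threshold=0.3)
print(kept)  # "dependents" has zero correlation with the target here, so it is dropped
```

In real projects the decision to drop a feature should also weigh domain knowledge and the original hypotheses, since a variable with weak linear correlation can still matter non-linearly or in combination with others.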

In the course, we created new features after doing this exploration. Hence, univariate and bivariate analyses are good first steps for understanding what your data is all about.