What is the point of univariate and bi-variate analysis?



Hello People,

I’m participating in the Pump it Up competition on DrivenData.
I was looking at each and every feature and trying to form conclusions. I realised that all the conclusions I formed seem pointless, since any tree or ensemble model will catch those patterns and model them accordingly. So my question is: what do you generally obtain, or what should you look for, while doing univariate/multivariate analysis?
Also, if you look at the data, a lot of the features are factors. What feature engineering can be done on such features?



Univariate analysis helps you understand the distribution of a single variable, while bivariate analysis is about understanding the relationship between two variables. For univariate analysis you can try a histogram, a boxplot, etc.
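To make the distinction concrete, here is a minimal sketch in plain Python. The rows, the column meanings (`pump_age_years`, `water_quality`, `status`) and the helper names are all made up for illustration; they are not taken from the competition data. The numeric summary gives roughly what you would read off a boxplot, and the per-level rate is one simple way to look at a feature against the target.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy rows: (pump_age_years, water_quality, status).
# Invented for illustration only, not competition data.
rows = [
    (2, "soft", "functional"),
    (3, "soft", "functional"),
    (5, "soft", "functional"),
    (8, "salty", "non functional"),
    (12, "soft", "non functional"),
    (15, "salty", "non functional"),
    (20, "salty", "non functional"),
    (4, "salty", "functional"),
]


def univariate_numeric(values):
    """Summarise one numeric feature: centre and range."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "median": median(ordered),
        "mean": mean(ordered),
        "max": ordered[-1],
    }


def univariate_categorical(values):
    """Count how often each level of one categorical feature occurs."""
    return Counter(values)


def bivariate_rate(features, targets, positive):
    """Share of rows whose target equals `positive`, per feature level."""
    totals = Counter(features)
    hits = Counter(f for f, t in zip(features, targets) if t == positive)
    return {level: hits[level] / totals[level] for level in totals}


ages = [r[0] for r in rows]
quality = [r[1] for r in rows]
status = [r[2] for r in rows]

age_summary = univariate_numeric(ages)            # one variable at a time
quality_counts = univariate_categorical(quality)  # one variable at a time
functional_rate = bivariate_rate(quality, status, "functional")  # feature vs target

print(age_summary)
print(quality_counts)
print(functional_rate)
```

With pandas, the same checks would typically be done with `describe()`, `value_counts()` and a `groupby` on the feature.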


Hi @B.Rabbit,

In addition to what @ParindDhillon said, I’d like to add a few points:

In the business world, explaining the conclusions of your analysis to the concerned party is as important as building an accurate predictive model. If the features your model uses do not have business value, your model cannot be relied upon. So it is recommended to do a proper analysis in every project you take on.

In competitions, features are basically spoon-fed to you. Still, it is best practice to understand your data before getting on with the problem. The purpose of the analysis here is to find out which features are more informative, to try to extract useful information from the less informative ones, and to make sure your model gets this information. That is why the top contenders for the prize always do extensive analysis and feature engineering.

Also, you have to understand that most machine learning models are dumb. They rely on the features given to them to produce their outputs. Therefore, the better the features you give, the better your model performs.

For categorical feature engineering, take a look at this webpage for ideas.
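As a rough illustration of what can be done with factor features, here is a small sketch of two common ideas: grouping rare levels into a single bucket, and replacing each level with its relative frequency. These are general techniques, not necessarily what the linked page covers. The `installer` values, the `min_count` threshold and the function names are assumptions made up for the example.

```python
from collections import Counter


def group_rare_levels(values, min_count, other="other"):
    """Replace levels seen fewer than `min_count` times with one bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def frequency_encode(values):
    """Replace each level with the fraction of rows that carry it."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


# Hypothetical factor column with a few frequent and a few rare levels.
installer = ["gov", "gov", "gov", "ngo", "ngo", "acme", "local co", "gov"]

grouped = group_rare_levels(installer, min_count=2)
encoded = frequency_encode(grouped)

print(grouped)
print(encoded)
```

Grouping rare levels first keeps a high-cardinality factor from exploding into many near-empty categories, and the frequency encoding turns the result into a single numeric column.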

PS: Here’s a general article on feature engineering.


Thank you @ParindDhillon and @jalFaizy. I’m truly enlightened :)


From my limited experience with a few competitions, I can definitely say that creating intuitive features helps improve model accuracy, and that univariate and bivariate analysis can give you insight into which features to create.