What broad inferences can we make from plots when exploring a dataset?

dataexploration
r
feature_engineering

#1

Hello,
I am taking the Exploratory Data Analysis course on Udacity®. I am applying the methods learnt in the course to various datasets (such as the bike sharing and crime classification datasets on Kaggle). Being a beginner, I am more or less running blind, making combinations of plots without really inferring anything from them. What are the basic things we look for when we make plots? As far as I understand right now, I look for:

  1. Outliers (what to do with outliers after finding them?)
  2. Settling on a bin size (to reduce noise)

What else should I keep in mind when studying plots?
Please provide me with examples where you were able to create new features by looking at plots.

Regards


#2

Hi @B.Rabbit,

Well, visualising your data not only gives you an idea of the kind of data you are dealing with, but also reveals trends that you can use to come up with different hypotheses during feature engineering.
For example,
In the bike sharing dataset, we can create a feature by grouping the hours of the day into demand categories, using the boxplot of hour vs count of users.
See this
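
A minimal sketch of that idea, written in plain Python (the same steps translate directly to R). The records, the thresholds `low` and `high`, and the names `demand_bin` and `hour_bin` are all made up for illustration; in practice you would read the cut-offs off your own boxplot.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy records of (hour of day, number of rentals); not real Kaggle data.
rentals = [
    (3, 5), (3, 8), (4, 4), (4, 6),
    (8, 320), (8, 410), (17, 450), (17, 520),
    (12, 180), (12, 210), (14, 190), (14, 230),
    (22, 90), (22, 70),
]

# Group the counts by hour: this is what a boxplot of hour vs count summarises.
counts_by_hour = defaultdict(list)
for hour, count in rentals:
    counts_by_hour[hour].append(count)

median_by_hour = {hour: median(counts) for hour, counts in counts_by_hour.items()}


def demand_bin(hour, low=100, high=300):
    """Map an hour to a demand category based on its median count.

    The thresholds are assumptions, chosen by eye from the plot.
    """
    m = median_by_hour[hour]
    if m < low:
        return "low"
    if m < high:
        return "medium"
    return "high"


# The new categorical feature, one value per record.
hour_bin = [demand_bin(hour) for hour, _ in rentals]
print(sorted((hour, demand_bin(hour)) for hour in median_by_hour))
```

With this toy data the night hours fall into "low", midday into "medium" and the commuting hours into "high", which is the kind of grouping the boxplot suggests visually.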
Also, many algorithms come with assumptions about the data, for example that it is approximately normally distributed. Visualisations such as histograms are a very useful way to check the distributions.
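
As a rough companion to eyeballing a histogram, you can also put a number on how lopsided a distribution is. The sketch below is plain Python with invented data; `skewness` and `bin_counts` are hypothetical helper names, and the log transform at the end is just one common option to try, not something specific to any dataset mentioned here.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: close to 0 for a symmetric distribution."""
    mu, sigma = mean(values), pstdev(values)
    return mean(((v - mu) / sigma) ** 3 for v in values)


def bin_counts(values, bins=5):
    """Count values in equal-width bins, i.e. the bar heights of a histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid a zero width when all values are equal
    result = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put the maximum in the last bin
        result[idx] += 1
    return result


# Hypothetical right-skewed counts; not real data.
hourly_counts = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60, 150, 400]
logged = [math.log1p(v) for v in hourly_counts]

print("bar heights:", bin_counts(hourly_counts))
print("skewness, raw:", round(skewness(hourly_counts), 2))
print("skewness, log1p:", round(skewness(logged), 2))
```

A large positive skewness together with almost everything piling into the first bin is a hint that the variable is far from normal; here the transformed values come out noticeably less skewed than the raw ones.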

If you can provide a more specific query, I will be able to give a better answer.

Regards,
Shashwat