Let's say I have a small dataset describing the demographics of a place, with fields/attributes like:
- population (per district in the state),
- district_location (near sea or inland),
- total_households (per district in the state)
I plotted histograms for every attribute in the dataset and saw long-tailed distributions. The data is NOT normal (Gaussian).
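To make the observation concrete, here is a minimal sketch of the kind of check I mean, beyond just looking at histograms. It uses synthetic log-normal values as a stand-in for the `population` column (the numbers are made up for illustration, not my real data) and reports the sample skewness and a Shapiro-Wilk test from `scipy.stats`:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a long-tailed column such as population.
# These values are made up for illustration only.
rng = np.random.default_rng(0)
population = rng.lognormal(mean=10.0, sigma=1.0, size=500)

# Skewness near 0 suggests symmetry; a large positive value means a long right tail.
skewness = stats.skew(population)

# Shapiro-Wilk tests the null hypothesis that the sample comes from a normal distribution.
shapiro_stat, shapiro_p = stats.shapiro(population)

print(f"skewness = {skewness:.2f}")
print(f"Shapiro-Wilk p-value = {shapiro_p:.3g}")
```

A large positive skewness together with a very small p-value is what I mean by "not normal" here.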
Now my questions are:
When checking for normality (a Gaussian curve), should we check all the attributes or only a few particular ones? Which ones would those be in my case above?
When transforming data to make it normal, do we transform all the available attributes or only a few important ones?
How should I handle such a transformation to make the data normal: through the scipy.stats module using Box-Cox, or by some other technique? Kindly explain.
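For reference, this is roughly the Box-Cox usage I have in mind, as a minimal sketch rather than a recommendation. It assumes a strictly positive numeric column and uses synthetic values in place of `total_households` (again made up for illustration); `scipy.stats.boxcox` with no `lmbda` argument returns the transformed values and the fitted lambda:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a long-tailed, strictly positive column
# such as total_households. These values are made up for illustration only.
rng = np.random.default_rng(0)
total_households = rng.lognormal(mean=8.0, sigma=0.8, size=500)

# Box-Cox is only defined for strictly positive values.
assert (total_households > 0).all()

# With lmbda left unset, boxcox estimates lambda by maximum likelihood
# and returns (transformed_values, fitted_lambda).
transformed, fitted_lambda = stats.boxcox(total_households)

skew_before = stats.skew(total_households)
skew_after = stats.skew(transformed)

print(f"fitted lambda = {fitted_lambda:.3f}")
print(f"skewness before = {skew_before:.2f}, after = {skew_after:.2f}")
```

Is comparing the skewness before and after like this a reasonable way to judge whether the transformation worked?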
PS: Can data scaling/standardization make data normal?
Thanks & Regards,