# Non-Gaussian Distribution

I am a beginner in machine learning, and basically every blog says to check whether the data is normally distributed and, if it is not, to apply a log transformation or Box-Cox. I wanted to know:

1. Does "Gaussian-like" mean that the plot looks bell-shaped but the tests used to confirm normality fail?
2. For which regression and classification methods should the data be Gaussian or Gaussian-like?
3. Which methods (again in classification and regression) do not need a Gaussian distribution?

According to this link, we do not have to worry about data distribution unless it is LDA or QDA.

So, what do you guys suggest?
Thanks


Many statistical properties hold only for Gaussians; other distributions may just need different treatment.
E.g. for a Laplace distribution the scale parameter (which also fixes the standard deviation) directly sets the exponential decay in the absolute distance from the mean, and for a Bernoulli the standard deviation follows from its single parameter p. Gaussians, by contrast, are L2 distributions: their exponent depends on the variance and on the squared distance between the mean and the data.
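To make the L1 versus L2 contrast concrete, here are the two standard densities side by side. The symbols are my own choice and not from the posts above: $\mu$ is the location, $\sigma$ the Gaussian standard deviation, and $b$ the Laplace scale.

$$
f_{\text{Gauss}}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
f_{\text{Laplace}}(x) = \frac{1}{2b} \exp\!\left(-\frac{|x-\mu|}{b}\right)
$$

The Gaussian exponent uses the squared distance $(x-\mu)^2$, the Laplace exponent the absolute distance $|x-\mu|$. The Laplace variance is $2b^2$, so its standard deviation $b\sqrt{2}$ is just a rescaling of the decay parameter.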

Some distributions (the Cauchy, for example) do not even have a defined mean. From my perspective, such distributions describe a specific behaviour of the data values rather than just the deviation from an estimated mean.

A Poisson distribution, for example, is concentrated around its expectation value and has no separate variance parameter: its variance equals its expectation. So when the expectation is small, most of the mass sits near x = 0.
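A minimal sketch to check the Poisson point numerically, using only the standard library. The helper names `poisson_pmf` and `poisson_moments` and the cutoff `k_max` are my own for illustration; the sums are truncated, so this is an approximation that is only good while `lam` is small compared to `k_max`.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with expectation lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_moments(lam, k_max=100):
    """Approximate mean and variance by summing the pmf over k = 0..k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, var


for lam in (0.5, 4.0):
    mean, var = poisson_moments(lam)
    # The mean and variance come out (numerically) equal to lam.
    print(f"lam={lam}: mean={mean:.4f}, var={var:.4f}, P(X=0)={poisson_pmf(0, lam):.4f}")
```

For the small expectation (0.5) more than half of the probability sits at exactly 0, which is why such data never looks bell-shaped no matter how many samples you collect.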

LDA and QDA seem to be the linear and quadratic cases of estimating an underlying distribution that is assumed to be Gaussian within each class, similar to how Gaussians serve as the default assumption in other regimes, like Bayesian statistics.

There are many ways to fit data to a specific model by transforming it. The important part is that those transformations are invertible where necessary and lose as little information as possible, and that the model still captures the problem well enough.
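Since Box-Cox came up in the question, here is a rough sketch of what "invertible" means for it. The functions `boxcox` and `inv_boxcox` below are hand-written for illustration, and the `lam` values are picked by hand, which is an assumption on my side; in practice the parameter is usually fitted to the data (for example, `scipy.stats.boxcox` estimates it when you do not pass one).

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of a single positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for positive values")
    if lam == 0:
        return math.log(x)  # lam = 0 is the plain log transform
    return (x**lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse of boxcox: maps a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


data = [0.5, 1.0, 2.0, 10.0, 250.0]  # made-up, right-skewed positive values

for lam in (0.0, 0.5):
    transformed = [boxcox(x, lam) for x in data]
    recovered = [inv_boxcox(y, lam) for y in transformed]
    # The round trip gives back the original values up to floating-point error.
    assert all(math.isclose(a, b) for a, b in zip(data, recovered))
    print(lam, [round(t, 3) for t in transformed])
```

Because the round trip recovers the data, you can fit a model on the transformed scale and still map its predictions back to the original units.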
