Choosing between scaling and standardizing is confusing

machine_learning

#1

Hi,

Could you please explain how to choose between scaling and standardizing?

Are scaling and standardization different?

Regards,

Tony


#2

Scaling means transforming your data to a fixed range, say between 0 and 1, or 1 and 10, so that the numbers are on a convenient scale. For example, converting your data from cm to meters just because it is more convenient.

It can be formulated as
x <- (x - min(x)) / (max(x) - min(x))

Standardizing, on the other hand, means transforming your data so that it has zero mean and a standard deviation of 1. Note that this only centers and rescales the data; it does not change the shape of the distribution, so it will not make non-normal data normal.
x <- (x - mean(x)) / sd(x)
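
To make the difference concrete, here is a minimal R sketch (the vector x is just made-up example data):

# Made-up example data
x <- c(2, 4, 6, 8, 10)

# Min-max scaling: maps the values onto [0, 1]
(x - min(x)) / (max(x) - min(x))
# 0.00 0.25 0.50 0.75 1.00

# Standardization: zero mean, unit standard deviation
x_std <- (x - mean(x)) / sd(x)
mean(x_std)  # effectively 0 (up to floating-point error)
sd(x_std)    # 1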

Min-max scaling is generally avoided when the data set has outliers, because a single extreme value stretches the denominator (max - min) and squeezes the remaining values into a very small interval. In such cases we prefer standardization.
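
To see the outlier problem concretely, here is a small sketch (again with made-up numbers, one of which is an extreme outlier):

# Four ordinary values plus one extreme outlier
x <- c(1, 2, 3, 4, 1000)

# Min-max scaling: the outlier maps to 1 and everything else
# is squeezed into a tiny interval near 0
round((x - min(x)) / (max(x) - min(x)), 3)
# 0.000 0.001 0.002 0.003 1.000

# Standardization: the outlier still inflates the mean and sd,
# but the result is not forced into a fixed [0, 1] range
round((x - mean(x)) / sd(x), 2)
# -0.45 -0.45 -0.45 -0.44 1.79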

Hope this helps.


#3

Thanks a ton, shubham.


#4

Scaling:

Min-max scaling is typically done via the following equation:

X_norm = (X − X_min) / (X_max − X_min)

Standardization (also called z-score normalization):

z = (x − μ) / σ

Hence the two are different. In most cases we prefer standardization; for example, in k-means clustering we standardize the data before running the clustering, so that no single feature dominates the distance computation.
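
As a quick sketch of that workflow (using R's built-in iris data and the base kmeans function; the data set and k = 3 are chosen just for illustration):

# Standardize each numeric column to zero mean and unit sd
X <- scale(iris[, 1:4])

# Run k-means on the standardized features
set.seed(42)                  # for a reproducible clustering
km <- kmeans(X, centers = 3)

# Compare the clusters with the known species labels
table(km$cluster, iris$Species)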

When the data contains outliers, min-max scaling should be avoided and standardization is preferred.

Please let me know if you have any more questions on this.

Regards,
Arihant