When is normalization of data required?

scaling
normalization

#1

Hi All,

Is it always advisable to pass normalized data to models?

Can anybody explain properly when to do normalization or scaling of features in different scenarios such as regression, SVM, and neural networks?

Thanks in advance!

Rahul


#2

Hello @rahul29,

Normalization is mainly necessary for algorithms that use distance measures, such as clustering, or recommender systems that use cosine similarity. It is done so that a variable on a higher scale does not dominate the outcome just because it is on a higher scale. For example, consider a credit card dataset with two variables, #creditcards and income, where you intend to cluster records to find similar applicants based on these attributes. As you can well imagine, these two will be on very different scales, and income, being on a much higher scale, will influence the distance measures far more than #creditcards. Normalization is done to avoid this.
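To make that concrete, here is a minimal sketch using scikit-learn's MinMaxScaler; the applicant values are made up purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Three applicants: [number of credit cards, annual income]
applicants = np.array([[2, 50_000],
                       [9, 52_000],
                       [3, 90_000]], dtype=float)

# Without scaling, income dominates the Euclidean distance: applicant 0
# looks far "closer" to applicant 1 than to applicant 2 just because
# their incomes are similar, even though 2 vs 9 cards is a big gap.
print(np.linalg.norm(applicants[0] - applicants[1]))  # ~2000
print(np.linalg.norm(applicants[0] - applicants[2]))  # ~40000

# Min-max normalization rescales both variables to [0, 1], so each
# contributes comparably to the distance.
scaled = MinMaxScaler().fit_transform(applicants)
print(np.linalg.norm(scaled[0] - scaled[1]))  # the card gap now matters
print(np.linalg.norm(scaled[0] - scaled[2]))
```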
Also, in problems like regression we sometimes deal with variables that are already on the same scale. For example, if you are studying the effectiveness of some brand, there may be several columns on a Likert scale (ratings data: very likely, likely, etc.). In these cases normalization is performed before doing the regression. When this is not the case, however, normalization takes away the interpretability of the model, so it will ultimately depend on the business need.
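As a quick sketch of that interpretability trade-off, here is a comparison of regression coefficients on raw versus standardized features (the data is synthetic, just for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
income = rng.uniform(30_000, 120_000, size=200)      # dollars
n_cards = rng.integers(1, 10, size=200).astype(float)
spend = 0.02 * income + 150 * n_cards + rng.normal(0, 500, size=200)

X = np.column_stack([income, n_cards])

# Raw features: coefficients keep their units ("spend per dollar of
# income", "spend per extra card") -- directly interpretable.
raw = LinearRegression().fit(X, spend)
print(raw.coef_)   # roughly [0.02, 150]

# Standardized features (mean 0, SD 1): coefficients become "spend per
# one-standard-deviation change" -- comparable across features, but no
# longer in the original units.
std = LinearRegression().fit(StandardScaler().fit_transform(X), spend)
print(std.coef_)
```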

Hope this helps!!


#3

Thanks @shuvayan


#4

@shuvayan, thanks for the clear reply.
I agree it depends on the business need.

But to extend the question:

a) Is there a rule of thumb? In general, given a dataset, are there any checklist steps to understand the distribution of the data before applying ML algorithms (e.g. whether to apply normalization, standardization, etc.)?

b) When is standardization needed (i.e. a mean of 0 and an SD of 1)?

Many thanks!