# Methods to deal with zero values when log-transforming a variable

#1

Hi,

I am working on a data science project in Python, and during data exploration I found a feature with a skewed distribution. I want to apply a log transformation to reduce the skewness of the feature, but it produces an error because the feature contains zero values. Should I drop the observations with zero values, or are there methods to deal with this situation?

Regards,
Steve

0 Likes

#2

There are various methods to deal with it. I am listing some of them:

• Add a constant value (c) to each value of the variable, then take the log transformation.
• Impute the zero values with the mean.
• Take the square root instead of the log for the transformation.
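
To compare two of the options above (adding a constant before the log, and the square root), here is a minimal sketch. The data is synthetic, and the constant `c = 1.0` is only an assumption for illustration; pick a value that suits the scale of your own feature.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical right-skewed feature that contains zeros (synthetic data).
rng = np.random.default_rng(0)
feature = np.concatenate([
    np.zeros(50),
    rng.lognormal(mean=2.0, sigma=1.0, size=950),
])

c = 1.0  # assumed constant, not a recommended value
log_shifted = np.log(feature + c)     # option 1: add a constant, then log
sqrt_transformed = np.sqrt(feature)   # option 3: square root, defined at zero

# Compare the skewness before and after each transformation.
skew_raw = skew(feature)
skew_log = skew(log_shifted)
skew_sqrt = skew(sqrt_transformed)

print(f"raw skewness:      {skew_raw:.2f}")
print(f"log(x + c):        {skew_log:.2f}")
print(f"sqrt(x):           {skew_sqrt:.2f}")
```

Checking the skewness on your own data is a quick way to see which transformation is strong enough.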

Hope this helps!

Regards,
Imran

3 Likes

#3

Hi Steve,

Use the log(x+1) transformation. It is a common way to avoid the error that a plain log transformation raises on zeros, and it is widely used among data scientists.
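
In Python, log(x+1) can be computed with NumPy's `np.log1p`, and `np.expm1` reverses it. A small sketch with made-up values:

```python
import numpy as np

# Hypothetical feature values, including a zero.
x = np.array([0.0, 1.0, 9.0, 99.0])

transformed = np.log1p(x)         # computes log(x + 1); zero stays at zero
restored = np.expm1(transformed)  # inverse: exp(y) - 1

print(transformed)
print(restored)
```

Note that log(0 + 1) = 0, so zeros remain zeros after this transformation.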

Hope this helps.

Regards,
Aayush

1 Like

#4

Just look for the smallest non-zero entry in your data, call it x, then add x/2 to every value and compute the log.
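
A minimal sketch of this half-minimum offset, assuming the offset is added to every value and that the data has no negative entries. The function name `log_with_half_min_offset` is hypothetical.

```python
import numpy as np

def log_with_half_min_offset(values):
    """Add half of the smallest non-zero value to every entry, then take the log."""
    values = np.asarray(values, dtype=float)
    if (values < 0).any():
        raise ValueError("this sketch assumes non-negative data")
    positive = values[values > 0]
    if positive.size == 0:
        raise ValueError("need at least one non-zero value")
    offset = positive.min() / 2.0
    return np.log(values + offset), offset

# Example with made-up values: the smallest non-zero entry is 0.2.
out, offset = log_with_half_min_offset([0, 0.2, 4, 10])
print(offset)
print(out)
```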

Hope this helps!

Regards,
Rohit

1 Like

#5

Consider a Box-Cox transformation. Note that the standard Box-Cox transformation is defined for strictly positive values, so data containing zeros needs a shift first (the two-parameter form adds one). The square root works fine with zeros (although not with negative values). However, the square root is often not a strong enough transformation to deal with the high levels of skewness seen in real data (we generally use the sqrt transformation for right-skewed distributions). If you are using R, you can use the function boxcox.fit() in the package named geoR.
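
Since the question is about Python, here is a hedged sketch using `scipy.stats.boxcox`. SciPy's implementation requires strictly positive input, so the zeros are shifted first; the shift of `1.0` and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed feature with zeros (synthetic data).
rng = np.random.default_rng(42)
feature = np.concatenate([np.zeros(20), rng.exponential(scale=5.0, size=480)])

shift = 1.0  # assumed shift so that every value is strictly positive

# With lmbda left unset, boxcox returns the transformed data and the fitted lambda.
transformed, fitted_lambda = stats.boxcox(feature + shift)

print(f"fitted lambda:        {fitted_lambda:.3f}")
print(f"skewness before:      {stats.skew(feature):.2f}")
print(f"skewness after:       {stats.skew(transformed):.2f}")
```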

Hope this helps.

Regards,
Dharm

0 Likes

#6

Thank you, sir. I have a little confusion about this: do we need to apply the log(x+1) transformation to the whole data set, or only to the values that are negative or zero? I would be thankful for a reply.

0 Likes

#7

The whole data set. You need to apply the same transformation to every input to be consistent.

0 Likes

#8

I am not a statistician or an expert, but I have doubts about adding a constant to a feature with many zeros. When we add a constant (like +1 or more) to all the zeros along with the other values, the previously ineffective values (the zeros) become effective.
Doesn't this affect the prediction negatively?

0 Likes