I am working on a data science project in Python, and during data exploration I found a feature with a skewed distribution. I want to apply a log transformation to reduce the skewness, but it produces invalid values because the feature contains zeros. Should I drop the observations with zero values, or are there methods for dealing with this situation?
A square-root transformation works fine with zeros (although not with negative values). However, the square root is often not a strong enough transformation to deal with the high levels of skewness seen in real data (we generally apply the sqrt transformation to right-skewed distributions). In that case, use a Box-Cox transformation. Note that the standard Box-Cox transformation requires strictly positive values, so data containing zeros must first be shifted by a constant. If you are using R, you can use the function boxcox.fit() in the geoR package.
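Since the question is about Python, here is a minimal sketch of the same ideas using NumPy and SciPy instead of R. The sample values and the shift of 1 are assumptions made up for illustration, not a recommendation for any particular data set. SciPy's Box-Cox requires strictly positive input, which is why the shift is applied before calling it.

```python
import numpy as np
from scipy import stats

# Made-up right-skewed feature that contains zeros (illustrative only)
x = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 20.0, 50.0, 200.0])

sqrt_x = np.sqrt(x)      # defined at zero, relatively mild correction
log1p_x = np.log1p(x)    # log(x + 1), defined at zero, stronger correction

# Box-Cox needs strictly positive values, so shift by an assumed constant
shift = 1.0
boxcox_x, fitted_lambda = stats.boxcox(x + shift)

# Compare the sample skewness before and after each transformation
for name, values in [("raw", x), ("sqrt", sqrt_x),
                     ("log1p", log1p_x), ("boxcox", boxcox_x)]:
    print(f"{name:>7}: skew = {stats.skew(values):.3f}")
print(f"fitted Box-Cox lambda = {fitted_lambda:.3f}")
```

Comparing the printed skewness values on your own feature is a quick way to decide which transformation is strong enough.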
Thank you, sir. I have a little confusion about this: do we need to apply the log(x+1) transformation to the whole data set, or only to the values that are negative or zero? I shall be thankful.
I am not a statistician or an expert, but I have doubts about adding a constant to a feature with many zeros. When we add a constant (such as +1 or more) to the zeros along with all the other values, the zeros, which previously had no effect, become non-zero and start to contribute.
Doesn't this affect the prediction negatively?
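One way to examine this concern for the +1 case specifically: since log(1) = 0, zeros still map to zero under log(x+1), and because the transformation is applied to every value, the ordering of the observations is unchanged. The sketch below uses made-up values and only checks these two properties; it does not show how any particular model's predictions are affected.

```python
import numpy as np

# Made-up feature values with several zeros (illustrative only)
x = np.array([0.0, 0.0, 3.0, 10.0, 250.0])

# log(x + 1) applied to every value, not only to the zeros
y = np.log1p(x)

# Zeros remain exactly zero after the transformation
zeros_stay_zero = bool(np.all(y[x == 0] == 0.0))

# The transformation is monotonic, so the ranking of values is preserved
order_preserved = bool(
    np.all(np.argsort(x, kind="stable") == np.argsort(y, kind="stable"))
)

print(y.round(3), zeros_stay_zero, order_preserved)
```

With a constant other than 1, or with a shifted Box-Cox, the zeros would generally not map back to zero, so the check is worth repeating for whichever transformation is actually used.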