Log Transformation for skewed data

machine_learning
log_transformation

#1

As we know, a log transform is commonly used to make right-skewed data more symmetric. I tried this in R (code below) on some sample data to see whether the shape changes from right-skewed to a normal distribution or not.

y <- c(2, 2, 2, 2.5, 3, 2, 100, 120)
hist(y)        # histogram of the raw data
y <- log(y)    # natural log
y
# [1] 0.6931472 0.6931472 0.6931472 0.9162907 1.0986123 0.6931472 4.6051702 4.7874917
hist(y)        # histogram of the log-transformed data

As you can see, the values 100 and 120 make the data set right-skewed in the histogram of the untransformed data.

I took the log of the data to make it normally distributed, but after the transformation the histogram still does not look normal to me; it looks odd. Is this due to the small amount of data, or am I missing something?


Thanks in advance

KS


#2

Try a Box-Cox transformation; a log transform does not always convert your distribution to a normal distribution. Please refer to the section ‘Transformations for Skewed Distribution’ in this article.
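
For reference, here is a minimal sketch of how a Box-Cox transformation might be tried on the sample data from the question. It assumes the MASS package (which normally ships with R) and fits an intercept-only model; the helper box_cox and the lambda grid are illustrative choices, not something from the linked article. With only eight points that sit in two separate groups, no monotone transformation (log or Box-Cox) is likely to produce a bell shape, so treat the result as a diagnostic rather than a fix.

```r
library(MASS)  # provides boxcox(); assumed to be installed with R

y <- c(2, 2, 2, 2.5, 3, 2, 100, 120)

# Profile log-likelihood over a grid of lambda values,
# using an intercept-only model (y ~ 1). The grid is an arbitrary choice.
bc <- boxcox(y ~ 1, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Lambda with the highest log-likelihood on the grid
lambda_hat <- bc$x[which.max(bc$y)]

# Hypothetical helper: Box-Cox transform, where lambda = 0 reduces to log()
box_cox <- function(v, lambda) {
  if (abs(lambda) < 1e-8) log(v) else (v^lambda - 1) / lambda
}

y_bc <- box_cox(y, lambda_hat)
print(lambda_hat)
hist(y_bc)  # compare with hist(y) and hist(log(y))
```

Box-Cox only accepts strictly positive values, which holds for this sample. If the histogram of y_bc still shows two separate bumps, that points to two groups in the data rather than plain skewness.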


#3

If the data is this well separated, just create two clusters and do your analysis for each cluster. It may not be skewed at all; it may just be a pattern of two distinct groups.
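
A minimal sketch of the two-cluster idea, using the sample data from the question. Hierarchical clustering with hclust and cutree is just one possible way to split the values (chosen here because it is deterministic); k-means or a simple threshold would be equally reasonable assumptions.

```r
y <- c(2, 2, 2, 2.5, 3, 2, 100, 120)

# Hierarchical clustering on the pairwise distances, cut into two groups
groups <- cutree(hclust(dist(y)), k = 2)

# Split the values by cluster and summarise each group separately
clusters <- split(y, groups)
print(clusters)
print(lapply(clusters, summary))
```

Here the small values (2 to 3) end up in one cluster and 100 and 120 in the other, so each group can be analysed on its own instead of forcing the whole sample toward a normal shape.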


#4

Thanks Ankit, this makes sense.