Log of target variable before training random forest regressor

Hello Folks,

I am working on a predictive model where I have to predict the median sales price for a retailer based on several different variables.
My target variable is median sales.
During EDA I found that my target variable is positively skewed (right-skewed). I applied a log transformation to it and trained a random forest regressor on the transformed target.
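The setup described above can be sketched with scikit-learn's `TransformedTargetRegressor`, which fits the forest on the transformed target and inverts the transform inside `predict()`. This is a minimal sketch, not the poster's code: the synthetic data, the feature count, and the hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 3 numeric features and a right-skewed,
# strictly positive target standing in for median sales.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.exp(1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=1000))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The forest is fit on log1p(y); predict() applies expm1 automatically,
# so predictions come back on the original sales scale.
model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(n_estimators=100, random_state=0),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(pred[:5])
```

`log1p`/`expm1` is used here so that zero sales would not break the transform; if every target value is strictly positive, `np.log`/`np.exp` works the same way.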

I just wanted to know your thoughts, and how the results should be interpreted.

  1. You can try more advanced transformation methods, such as Tukey and Mosteller’s bulging rule.
    The next two points are different in nature and may not be suitable for a production-level deployment:
  2. Right skewness often points toward a Weibull distribution. Assuming your dependent variable is a continuous random variable, you can explore whether a Weibull distribution fits your data.
  3. Using scatter plots of your data and the regression diagnostics, you can check whether the linear regression assumptions hold. If they do not, try a non-linear regression model and compare it with your tree-based regression model.
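The bulging rule chooses a power transformation from the shape of the X-Y scatter. A simpler, related heuristic works on the target alone: walk Tukey's ladder of powers and keep the rung whose transformed values have the smallest absolute skewness. This is a minimal sketch assuming a strictly positive target; the synthetic data and the helper names (`skewness`, `ladder_transform`, `best_rung`) are hypothetical.

```python
import numpy as np

def skewness(v):
    # Sample skewness: third standardized moment (biased estimator).
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def ladder_transform(y, p):
    # Tukey's ladder of powers: p = 0 means log; negative powers are
    # negated so the transform stays order-preserving.
    if p == 0:
        return np.log(y)
    if p < 0:
        return -(y ** p)
    return y ** p

def best_rung(y, powers=(-2, -1, -0.5, 0, 0.5, 1, 2)):
    # Score each rung by |skewness| and return the least skewed one.
    scores = {p: abs(skewness(ladder_transform(y, p))) for p in powers}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
y = rng.lognormal(mean=3.0, sigma=0.8, size=5000)  # hypothetical right-skewed target
p, scores = best_rung(y)
for power, s in scores.items():
    print(f"power {power:>5}: |skew| = {s:.3f}")
print("selected power:", p)
```

For log-normal data the log rung (p = 0) wins, which matches the observation that taking the log already gives a roughly normal target. `scipy.stats.boxcox`, which estimates the power by maximum likelihood, is a related alternative.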

Thanks Apan,

–> Yes, the target variable is a continuous random variable, and taking the log makes its distribution approximately normal.
Do I still need to try other transformations?

–> The linear regression assumptions don't hold.
–> Yes, I am trying a non-linear model (random forest regressor) with cross-validation, and it is giving better results.
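A fair comparison between a linear model and the random forest scores both on the same cross-validation folds and on the same (log) scale. This is a minimal sketch; the synthetic data with non-linear effects, the fold count, and the forest size are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical data whose log target depends non-linearly on the features.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
log_y = 1.0 + np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=1000)
y = np.exp(log_y)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
rmse = {}
for name, est in models.items():
    # Same folds for both models; errors are measured on the log scale.
    mse = -cross_val_score(est, X, np.log(y), cv=cv, scoring="neg_mean_squared_error")
    rmse[name] = np.sqrt(mse).mean()
    print(f"{name}: CV RMSE (log scale) = {rmse[name]:.3f}")
```

An RMSE on the log scale behaves like a relative error: a value around 0.1 corresponds to a typical multiplicative error of roughly 10% on the original sales scale.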

I am just trying to understand how to interpret my model's results: how should the log-scale values be interpreted in a random forest model?
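When a random forest is trained on log(y), each tree predicts the average log target of the training samples in a leaf, and the forest averages those tree predictions. So a prediction is on the log scale: exponentiating it returns sales units, and a difference between two log predictions becomes a ratio. Because exp of an average of logs is a geometric mean, the back-transformed value sits at or below the corresponding arithmetic mean; for a roughly log-normal target it tracks the median rather than the mean. A minimal sketch on hypothetical synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical sales data: log(sales) rises by 0.7 per unit of feature 0.
rng = np.random.default_rng(2)
X = rng.normal(size=(800, 2))
y = np.exp(2.0 + 0.7 * X[:, 0] + rng.normal(scale=0.4, size=800))

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
rf.fit(X, np.log(y))

# Predictions are on the log scale; exp() brings them back to sales units.
x_low = np.array([[0.0, 0.0]])
x_high = np.array([[1.0, 0.0]])
log_low, log_high = rf.predict(x_low)[0], rf.predict(x_high)[0]
print("predicted sales:", np.exp(log_low), np.exp(log_high))

# A difference on the log scale is a ratio on the original scale.
ratio = np.exp(log_high - log_low)
print("high / low ratio:", ratio)

# exp(mean(log y)) is a geometric mean, never above the arithmetic mean,
# so naively back-transformed predictions underestimate mean sales.
sample = y[:50]
geo, arith = np.exp(np.log(sample).mean()), sample.mean()
print(f"geometric mean {geo:.2f} <= arithmetic mean {arith:.2f}")
```

If mean sales rather than typical (median-like) sales are needed, a retransformation correction such as Duan's smearing estimator can be applied. Feature importances from this model describe effects on log sales, i.e. on relative changes.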

© Copyright 2013-2019 Analytics Vidhya