Which regression algorithm to use when there is weak correlation between target and numerical variables?



I have a regression problem at hand. The dataset has 20 predictors and 1 target. The target is continuous, and the predictors are a mix of categorical and continuous variables. I ran a correlation test between the continuous predictors and the target and found very weak to negligible linear relationships.

Here Absenteeism.in.hours is my target. My questions are:

  1. How should I approach this kind of data?
  2. What would be an appropriate machine learning technique for regression in this case?


Hi @rohit.haritash,

Since you have 20 predictors, you can also examine the relationship between the categorical predictors and the target variable. You may find that some categorical variables strongly influence the target.
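One way to put a number on that is the correlation ratio (eta squared), the share of the target's variance explained by the category groups. Below is a minimal sketch using only the standard library. The function name `correlation_ratio` and the toy `reason` and `hours` values are made up for illustration and are not taken from your dataset.

```python
from collections import defaultdict
from statistics import fmean


def correlation_ratio(categories, values):
    """Return eta squared: the fraction of variance in `values`
    that is explained by membership in `categories` (0 to 1)."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    grand_mean = fmean(values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Total sum of squares of the target.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return ss_between / ss_total if ss_total > 0 else 0.0


# Hypothetical toy data: a categorical predictor and a continuous target.
reason = ["A", "A", "A", "B", "B", "B"]
hours = [2, 3, 4, 10, 11, 12]
eta_sq = correlation_ratio(reason, hours)
print(f"eta squared = {eta_sq:.2f}")
```

A value near 1 means the categories separate the target well, and a value near 0 means they explain almost nothing. A one-way ANOVA tests the same relationship if you also want a p-value.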

You can also do feature engineering, i.e. create new features from the existing predictors that correlate more strongly with the target variable.

Once you have done that, you can also remove highly correlated predictors. From the correlation plot above, Body.mass.index appears to be highly correlated with Service.time, so you can remove one of these two predictors.
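Dropping one predictor from each highly correlated pair could look roughly like the sketch below, assuming pandas and NumPy are available. The helper `drop_correlated`, the 0.9 threshold, and the columns `a`, `b`, `c` are assumptions for illustration. The data is synthetic, not the absenteeism dataset.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Drop the later column of every numeric pair whose absolute
    Pearson correlation exceeds `threshold`."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Synthetic example: `b` is almost a copy of `a`, `c` is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.05, size=200),
    "c": rng.normal(size=200),
})

reduced, dropped = drop_correlated(df, threshold=0.9)
print("dropped:", dropped)
print("remaining:", list(reduced.columns))
```

Which column of a pair to keep is a judgment call. This sketch simply keeps the one that appears first. Redundant predictors mainly hurt the stability and interpretability of linear models, while tree-based models tend to be less sensitive to them.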

You can start with a simple Linear Regression model and then move on to more complex models such as Decision Tree and Random Forest. You can also use Random Forest feature importances to gauge the impact of each feature on the target variable.
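A weak linear correlation does not rule out a nonlinear relationship, which is one reason to compare a linear baseline against a tree ensemble. Here is a minimal sketch with scikit-learn on synthetic data. The data-generating rule (the target depends on the square of the first feature) is an assumption chosen to show the effect, so expect different numbers on your data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: the target depends on feature 0 through a squared term,
# so its linear correlation with feature 0 is close to zero.
# Features 1 and 2 are pure noise.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1, 1, size=(n, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.05, size=n)

# Baseline: ordinary least squares, scored by 5-fold cross-validated R^2.
linear_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()

# Random forest, scored the same way.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest_r2 = cross_val_score(forest, X, y, cv=5, scoring="r2").mean()

# Fit on all data to read the impurity-based feature importances.
forest.fit(X, y)
importances = forest.feature_importances_

print(f"linear R^2: {linear_r2:.3f}")
print(f"forest R^2: {forest_r2:.3f}")
print("importances:", np.round(importances, 3))
```

Categorical predictors need to be encoded (for example one-hot) before either model can use them. Impurity-based importances can favor high-cardinality features, so permutation importance is worth checking as a second opinion.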