How to fine-tune a regression model on high-dimensional multivariate data

machine_learning
regression

#1

I am working on a non-linear regression problem. An overview of the problem is given below:

  1. It has 6 variables in total. 5 of them are features, and 4 of those features are categorical.
  2. I am using label encoding, and I have tried other encoding techniques as well.
  3. The pairwise correlations among the variables are weak, as they are all essentially independent of one another. Correlation matrix attached.

I have tried polynomial regression (up to 3rd degree), Lasso, and Ridge regression. The RMSE is almost the same for all of them, between 1.48 and 1.50.
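
For reference, a comparison like the one described above could be set up roughly as follows. This is only a sketch, not the original code: the synthetic data, the integer-coded categorical columns, and the `alpha` values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption): 4 label-encoded categorical
# columns with codes 0..3, plus 1 numeric column.
rng = np.random.default_rng(0)
n = 1000
X = np.hstack([rng.integers(0, 4, size=(n, 4)), rng.normal(size=(n, 1))]).astype(float)
y = np.sin(X[:, 4]) + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Test RMSE for each (model, polynomial degree) combination.
results = {}
for degree in (1, 2, 3):
    estimators = (
        ("ridge", Ridge(alpha=1.0)),
        ("lasso", Lasso(alpha=0.01, max_iter=10000)),
    )
    for name, estimator in estimators:
        model = make_pipeline(
            PolynomialFeatures(degree=degree, include_bias=False),
            StandardScaler(),
            estimator,
        )
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[(name, degree)] = float(np.sqrt(mean_squared_error(y_test, pred)))

for key, value in sorted(results.items()):
    print(key, round(value, 4))
```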

Can anyone from the community help me improve the model's performance? Should I use a neural network, or tune the hyperparameters of the algorithms I have already used?

Any guidance would be greatly appreciated.


#2

Hi, you didn't mention whether the predictions are on the train set or the test/dev set. There is no one-size-fits-all approach, but you can begin with the following:

  1. After splitting the dataset and training on the train set, check the error (e.g. RMSE) on both the train set and the test set.
  2. Determine whether the model is underfitting or overfitting (high bias, high variance, or both).
  3. Take it forward from there.
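
As a rough illustration of steps 1 and 2, here is a minimal sketch. The synthetic data and the Ridge model are assumptions standing in for the real dataset and model; the mean-prediction baseline is added only to give the RMSE values a point of comparison.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Synthetic stand-in data (assumption): 5 features, non-linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.sin(3 * X[:, 0]) * X[:, 1] + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = Ridge(alpha=1.0).fit(X_train, y_train)
train_rmse = rmse(y_train, model.predict(X_train))
test_rmse = rmse(y_test, model.predict(X_test))

# Baseline: always predict the mean of the training target.
baseline_rmse = rmse(y_test, np.full_like(y_test, y_train.mean()))

print(f"train RMSE:    {train_rmse:.4f}")
print(f"test RMSE:     {test_rmse:.4f}")
print(f"baseline RMSE: {baseline_rmse:.4f}")

# Rough reading of the numbers:
# - train and test RMSE similar and both close to the baseline:
#   likely underfitting (high bias)
# - train RMSE much lower than test RMSE: likely overfitting (high variance)
```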

#3

Hi Supratim,

Thank you so much for the response.

I did split the dataset into training and testing sets.

Train Root Mean Squared Error: 1.5045384078370934
Test Root Mean Squared Error: 1.4954539893112404

The model is heavily biased and underfitting, and I am not sure which hyperparameters I can tune.
Would you mind suggesting a good technique or algorithm for this context?


#4

Try the random forest algorithm.
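
A minimal sketch of what that could look like with scikit-learn, compared against a linear model. The synthetic data (4 integer-coded categorical columns plus 1 numeric column, with an interaction in the target) is an assumption for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(0)
n = 2000
X_cat = rng.integers(0, 4, size=(n, 4))  # 4 label-encoded categorical columns
x_num = rng.normal(size=(n, 1))          # 1 numeric column
X = np.hstack([X_cat, x_num])

# Target with a categorical interaction and a non-linear numeric effect,
# which a plain linear model cannot represent.
y = (
    (X_cat[:, 0] == X_cat[:, 1]) * 3.0
    + np.sin(2 * x_num[:, 0])
    + rng.normal(scale=0.2, size=n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

ridge_rmse = float(np.sqrt(mean_squared_error(y_test, ridge.predict(X_test))))
rf_rmse = float(np.sqrt(mean_squared_error(y_test, forest.predict(X_test))))

print(f"Ridge test RMSE:         {ridge_rmse:.4f}")
print(f"Random forest test RMSE: {rf_rmse:.4f}")
```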


#5

Hello,

  1. Use the random forest algorithm to find the feature_importance values.
  2. Then use the few features that have the largest feature_importance values.
    I hope this works. If it doesn't, try a boosting algorithm such as GradientBoostingRegressor.
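
The two steps and the boosting fallback can be sketched as below. In scikit-learn the fitted attribute is named `feature_importances_`. The synthetic data and the choice of `top_k = 2` are assumptions for illustration. Note that impurity-based importances can be biased toward features with many distinct values, so `sklearn.inspection.permutation_importance` may be worth checking as well.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): only features 0 and 1 drive the target.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
y = X[:, 0] ** 2 + 2 * np.sin(X[:, 1]) + rng.normal(scale=0.2, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: fit a random forest and read the impurity-based importances.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first

# Step 2: keep only the top-k features (k is a hypothetical choice).
top_k = 2
selected = ranking[:top_k]

# Fallback: a boosting model trained on the selected features.
gbr = GradientBoostingRegressor(random_state=0)
gbr.fit(X_train[:, selected], y_train)
gbr_pred = gbr.predict(X_test[:, selected])
gbr_rmse = float(np.sqrt(mean_squared_error(y_test, gbr_pred)))

print("importances:", np.round(importances, 3))
print("selected feature indices:", selected.tolist())
print(f"GradientBoostingRegressor test RMSE: {gbr_rmse:.4f}")
```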

#6

@nitesh57 Thank you.

I have tried both Random Forest and XGB.

Random forest gave better metrics than the other algorithms across the different hyperparameters I tried.
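
For anyone wanting to try different hyperparameters in a systematic way, a cross-validated grid search is one option. This is only a sketch: the synthetic data and the small parameter grid are assumptions, not the settings actually used here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
y = X[:, 0] ** 2 + np.sin(2 * X[:, 1]) + rng.normal(scale=0.2, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative grid (hypothetical values).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X_train, y_train)

best_cv_rmse = -search.best_score_
print("best parameters:", search.best_params_)
print(f"best cross-validated RMSE: {best_cv_rmse:.4f}")
```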


#7

@ashishdutt, random forest is giving good metrics.
Thank you.