GridSearchCV and Cross Validation in Python


#1

Hi All,

I am aware that:
K-Fold Cross Validation is the technique where we hold out a part of the dataset, do not train on it, and use that part for testing or validation.
Grid Search is used to tune our hyperparameters and find the best set of parameters.

The question is how Grid Search and Cross Validation are interlinked. Is it necessary to use both of them together, or can I use them independently?

Please explain how the line below is evaluated:
grid = GridSearchCV(estimator=lasso_clf, param_grid=model_grid, cv=LeaveOneOut(train.shape[0]), scoring='mean_squared_error')

Thanks in advance,
Tarun Singh


#2

Hey @TarunSingh,

  1. You are right about the role of K-Fold and Grid Search.

  2. No, you can use them independently too. You can cross validate your model using k-fold on its own, or, if you validate some other way, use grid search alone to find the optimum parameters for your model. The reason the two are usually linked is that you typically want to cross validate your model and get the best parameters at the same time (e.g. in machine learning competitions); see the second sketch further down this reply.

  3. The simplified syntax for grid search in sklearn is

      clf = GridSearchCV(estimator, param_grid, scoring, cv)
    

    Where estimator is your model and param_grid is a dictionary mapping hyperparameter names to the lists of values you want to search over; the remaining arguments configure the search.

    Coming to your code,

     grid = GridSearchCV(estimator=lasso_clf, param_grid=model_grid,
    

    You are creating a grid search for the model lasso_clf, and model_grid holds the parameter grid for the search.

      cv=LeaveOneOut(train.shape[0]), scoring='mean_squared_error')
    

    Here, cv tells the grid search how to split your dataset for validation; with LeaveOneOut, every single sample takes a turn as the one-point test set. scoring is the metric used to evaluate your model; since this is a regression model, you have used mean_squared_error as the metric.
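    Putting it together, here is a minimal runnable sketch of your line. The data, the Lasso estimator, and the alpha values are made up for illustration; also note that in current versions of sklearn, LeaveOneOut() takes no argument and the MSE scorer is spelled 'neg_mean_squared_error', because scorers follow a "higher is better" convention:

      from sklearn.datasets import make_regression
      from sklearn.linear_model import Lasso
      from sklearn.model_selection import GridSearchCV, LeaveOneOut

      # Toy regression data standing in for your `train` set
      X, y = make_regression(n_samples=30, n_features=5, noise=0.5, random_state=0)

      lasso_clf = Lasso()
      model_grid = {'alpha': [0.01, 0.1, 1.0]}  # candidate hyperparameters

      grid = GridSearchCV(estimator=lasso_clf, param_grid=model_grid,
                          cv=LeaveOneOut(), scoring='neg_mean_squared_error')
      grid.fit(X, y)  # fits 30 leave-one-out splits for each of the 3 alphas

    And, to point 2 above, either piece works on its own. A sketch of plain k-fold cross validation with no search, reusing the same made-up data:

      from sklearn.model_selection import KFold, cross_val_score

      # Score one fixed model on 5 folds; no hyperparameter search involved
      scores = cross_val_score(Lasso(alpha=0.1), X, y,
                               cv=KFold(n_splits=5, shuffle=True, random_state=0),
                               scoring='neg_mean_squared_error')
      print(scores.mean())  # average validation score across the 5 folds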

For more information on cross validation: read here

Hope this helps,
Sanad :)


#3

Thanks a lot!!

Just one more question: the cross validation score is nothing but the score on the metric we chose, in my case 'mean_squared_error'?

And on the basis of this cross validation score we choose the best parameters for our model, right?

Thanks
Tarun Singh


#4

Yes, this score tells you how your model is performing based on the metric you selected, and GridSearchCV picks the parameter combination with the best such cross validation score.
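
For instance, continuing the hypothetical lasso sketch from my earlier reply, the fitted grid exposes exactly this:

    print(grid.best_params_)  # the candidate whose mean CV score was best
    print(grid.best_score_)   # that best mean cross-validated score
    print(grid.cv_results_['mean_test_score'])  # mean CV score for every candidate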


#5

Thanks a lot!! This was really helpful.