Function for building a GBM model

I am referring to the above link, where you worked through tuning the parameters of a gradient boosting machine. In your ‘modelfit’ function, where you have:

def modelfit(alg, dtrain, predictors, performCV=True, printFeatureImportance=True, cv_folds=5):

How did you get ‘dtrain’ and ‘alg’?

Hi @louis,

These are simply the parameter names used while defining the function. alg refers to the algorithm (the model object) to be used, and dtrain is your training data.

In the next code block we call the function:

modelfit(gbm0, train, predictors)

In this case, alg is automatically assigned gbm0 and dtrain becomes train, because the arguments are matched to the parameters by position.
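To make the matching concrete, here is a minimal sketch. The function body is replaced with a stub that only reports what each parameter received, and the values of gbm0, train and predictors are placeholders I made up for illustration, not the objects from the article.

```python
def modelfit(alg, dtrain, predictors, performCV=True,
             printFeatureImportance=True, cv_folds=5):
    # Stub body (assumption): just report which object each parameter got.
    return {
        "alg": alg,
        "dtrain": dtrain,
        "predictors": predictors,
        "performCV": performCV,
        "cv_folds": cv_folds,
    }

# Placeholder values (hypothetical); in practice these would be a model
# object, a DataFrame and a list of column names.
gbm0 = "model object goes here"
train = "training DataFrame goes here"
predictors = ["feature_a", "feature_b"]

# Positional call: 1st argument -> alg, 2nd -> dtrain, 3rd -> predictors.
bound = modelfit(gbm0, train, predictors)
print(bound["alg"] is gbm0)      # the same object, under a new name
print(bound["dtrain"] is train)
print(bound["cv_folds"])         # not passed, so the default is used
```

Calling modelfit(alg=gbm0, dtrain=train, predictors=predictors) with keyword arguments would bind the same way.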

Thanks for your response. What about ‘predictors’?

I noticed that you fit the model on the raw dataset (dtrain) rather than on a separate training portion. It is good practice to always split the data into train and test sets and then fit on the train set only, so as to avoid overfitting.

Hi @louis,

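The predictors are the features (the input columns) from the training data. Also, here we have used cross-validation instead of creating a train-test split: each fold is held out in turn, so the model is always scored on data it was not fitted on, which guards against overfitting.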
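As a rough sketch of both points, the snippet below builds a predictors list and runs 5-fold cross-validation with scikit-learn. The synthetic data, the column names (f0 to f4, target) and the small n_estimators value are assumptions chosen to keep the example self-contained; they are not the dataset or settings from the article.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training data (assumption, for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
train = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
train["target"] = y

# predictors: every column except the target column.
target = "target"
predictors = [col for col in train.columns if col != target]

# 5-fold cross-validation: each fold is held out once for scoring
# while the model is fitted on the remaining four folds.
gbm0 = GradientBoostingClassifier(n_estimators=20, random_state=0)
cv_scores = cross_val_score(gbm0, train[predictors], train[target],
                            cv=5, scoring="roc_auc")

print(predictors)
print("Mean CV AUC: %.3f (std %.3f)" % (cv_scores.mean(), cv_scores.std()))
```

A separate held-out test set is still useful for a final check once tuning is finished, since the cross-validation scores are themselves used to choose the parameters.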

© Copyright 2013-2019 Analytics Vidhya