Extracting the best fitted DecisionTreeClassifier after Grid Search

gridsearch
decision_trees
python

#1

I ran a grid search to find the best decision tree for my training data, using the following code:

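from sklearn import tree
from sklearn.model_selection import GridSearchCV  # formerly sklearn.grid_search

parameters = {'min_samples_split': range(10, 500, 20), 'max_depth': range(1, 20, 2)}
clf_tree = tree.DecisionTreeClassifier()
clf = GridSearchCV(clf_tree, parameters)
clf.fit(X, Y)  # X, Y: training features and labels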

After the grid search, the best parameters were:
{'max_depth': 17, 'min_samples_split': 30}

Now I want to print the tree that was finally fitted to the training data set, using this function:

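def printTree(clf_tree):
    import os
    from io import StringIO  # sklearn.externals.six is gone from recent scikit-learn releases
    import pydot
    from IPython.display import Image
    from sklearn import tree
    # Write the fitted tree in DOT format, both to a file and to an in-memory buffer.
    tree.export_graphviz(clf_tree, out_file='tree.dot')
    dot_data = StringIO()
    tree.export_graphviz(clf_tree, out_file=dot_data)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    if isinstance(graph, list):  # newer pydot versions return a list of graphs
        graph = graph[0]
    # Render the graph to PNG and return it for inline display in a notebook.
    graph.write_png("tree.png")
    return Image(filename=os.path.join(os.getcwd(), 'tree.png'))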

The function takes a decision tree object as input. Is there any way to extract the best fitted DecisionTreeClassifier from the grid search object? I know I could create a new decision tree object with the best parameters, so please suggest a way other than that.

Thanks in advance


#2

Use clf.predict(X_test), where X_test holds the test features (not the labels).

By default (refit=True), GridSearchCV automatically refits a final model on the full training data using the best parameters it found. So when you call the predict method, it uses the model that was built with those best parameters.
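
A minimal sketch of this behaviour, assuming the iris dataset and a smaller parameter grid as stand-ins (neither comes from the question). It compares the predictions of the search object with those of its refitted best model:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): iris, split into train and test parts.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Smaller illustrative grid (assumption), same two parameters as in the question.
parameters = {'min_samples_split': range(2, 20, 4), 'max_depth': range(1, 10, 2)}
clf = GridSearchCV(DecisionTreeClassifier(random_state=0), parameters)
clf.fit(X_train, y_train)

# predict() on the search object is forwarded to the refitted best model,
# so both calls should return identical predictions.
same = np.array_equal(clf.predict(X_test), clf.best_estimator_.predict(X_test))
print(same)
```

Note that this only holds with refit left at its default of True; with refit=False the search object has no final model to predict with.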


#3

GridSearchCV has an attribute called best_estimator_, which holds the best model object, refitted on the whole training set (it is available when refit=True, the default). In your case, since you are running the grid search on a decision tree, it will be the best DecisionTreeClassifier that the search selected. Once you extract this object, you can apply the sklearn decision tree functions to it.

So your code will be:

dt_model = clf.best_estimator_
printTree(dt_model)
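
If you want to check this without the Graphviz/pydot setup, here is a rough self-contained sketch. It assumes the iris dataset and a smaller grid as stand-ins for your X, Y and parameters, and it uses sklearn.tree.export_text (available in scikit-learn 0.21 and later) in place of your printTree function:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data and grid (assumptions, not from the question).
X, Y = load_iris(return_X_y=True)
parameters = {'min_samples_split': range(2, 20, 4), 'max_depth': range(1, 10, 2)}

clf = GridSearchCV(DecisionTreeClassifier(random_state=0), parameters)
clf.fit(X, Y)

# The refitted best model is an ordinary DecisionTreeClassifier.
dt_model = clf.best_estimator_
print(type(dt_model).__name__)
print(clf.best_params_)

# Any sklearn tree utility accepts it; export_text gives a plain-text view of the tree.
tree_text = export_text(dt_model)
print(tree_text)
```

The same dt_model object can be passed to export_graphviz, so it should work as the argument to printTree as well.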