How to find the contributing features of each tree in Random Forest Classifier in Python

tree
supervised_learning
featureselection
random_forest
python

#1

I have built a Random Forest Classifier with 50 trees and cross-validated it with the K-fold technique, using 5 folds. Out of the 60,630 features in my dataset, I have used 70% of the features.

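from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
kfold = model_selection.KFold(n_splits=5, shuffle=True, random_state=42)  # random_state only takes effect with shuffle=True
model = RandomForestClassifier(n_estimators=70, n_jobs=-1, oob_score=True, max_features=0.7, min_samples_leaf=50)
results = model_selection.cross_val_score(model, features, labels, cv=kfold, scoring='f1')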

How can I find the features that went into building each tree? I want to find them for all 50 trees.

Expected Result:

Tree 1: feature 1, feature 5, feature 7…
Tree 2: feature 10, feature 17, feature 30…
.
.
.
.
Tree 50: feature 40000, feature 60000, feature 60000


#2

Hi @jayashrees

To do this you need access to the tree structure of the random forest. Since you are working with a classifier, if you find the “gain” associated with each variable along the path to a leaf, you can calculate the contribution for each leaf. I cannot help more with random forests (I am more versed in boosting).
Hope this helps a little. Be careful: without tooling this could take hours …
Best regards
Alain
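
As a minimal sketch of what "access to the tree structure" can look like in scikit-learn: a fitted forest exposes its trees through `estimators_`, and each tree stores the feature index used at every split in `tree_.feature`. The synthetic data, the small forest size and the helper name `features_used_per_tree` below are assumptions for illustration only. Note that `cross_val_score` does not hand back fitted models, so the forest has to be fitted directly (or obtained with `cross_validate(..., return_estimator=True)`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): replace with your own features/labels.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Small forest for illustration; the settings mirror the question loosely.
forest = RandomForestClassifier(n_estimators=5, max_features=0.7,
                                min_samples_leaf=50, random_state=0)
forest.fit(X, y)


def features_used_per_tree(fitted_forest):
    """Map tree number (1-based) to the sorted feature indices it splits on.

    Hypothetical helper, not part of scikit-learn.
    """
    used = {}
    for i, est in enumerate(fitted_forest.estimators_, start=1):
        split_features = est.tree_.feature  # negative values mark leaf nodes
        used[i] = np.unique(split_features[split_features >= 0]).tolist()
    return used


tree_features = features_used_per_tree(forest)
for tree_id, feats in tree_features.items():
    print(f"Tree {tree_id}: " + ", ".join(f"feature {f}" for f in feats))
```

The indices refer to column positions in the training matrix. If a gain-style ranking per tree is wanted rather than a plain list, each tree in `estimators_` also has a `feature_importances_` array (impurity-based), which is zero for every feature the tree never split on.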


#3

Here:

http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/

You can find a way to get that.

Kind Regards,
Carlos.
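
For readers who cannot open the link: as far as I recall, the post is about decomposing a prediction into a bias term plus per-feature contributions collected along the decision path. The following is a rough sketch of that idea using only scikit-learn, not code taken from the post. The synthetic data and the helper name `path_contributions` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): replace with your own features/labels.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
forest = RandomForestClassifier(n_estimators=5, min_samples_leaf=50,
                                random_state=0).fit(X, y)


def path_contributions(tree_estimator, x_row):
    """Split one tree's class probabilities for one sample into
    bias (root node distribution) + per-feature contributions.

    Hypothetical helper, not part of scikit-learn.
    """
    tree = tree_estimator.tree_
    # Class distribution at every node, normalised so each row sums to 1.
    values = tree.value[:, 0, :]
    probs = values / values.sum(axis=1, keepdims=True)

    x = np.asarray(x_row).reshape(1, -1)
    # Node ids visited by this sample, root first, leaf last.
    path = tree_estimator.decision_path(x).indices

    contributions = np.zeros((x.shape[1], probs.shape[1]))
    for parent, child in zip(path[:-1], path[1:]):
        # Credit the change in class probabilities to the split feature.
        contributions[tree.feature[parent]] += probs[child] - probs[parent]
    return probs[path[0]], contributions


first_tree = forest.estimators_[0]
bias, contributions = path_contributions(first_tree, X[0])
top_feature = int(np.abs(contributions).sum(axis=1).argmax())
print("Feature with the largest contribution in tree 1:", top_feature)
```

By construction, `bias + contributions.sum(axis=0)` equals the tree's `predict_proba` output for that sample. Averaging the bias and contributions over all trees in `estimators_` gives the forest-level decomposition.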