Scaling a Random Forest Classifier from 3 classes to 300

random_forest

#1

I am working on a classification project using a Random Forest classifier.
I have trained a model that reaches 90% accuracy with 3 classes.
Because I had to transpose the data, I have close to 1,500 columns.
Now I have to scale this to 10 million rows and close to 300 classes.
I expect the number of columns to be held in memory to be a few million.
The data is private, so I cannot put it on the cloud. What is the best way to handle this with one machine or a few?

Thanks


#2

You should use a GPU machine, which performs better than a CPU machine for this task.


#3

Thanks, Malathi. That seems to be the advice I am getting from others as well.


#4

Hi @earnest,

You could try converting the integers (if you have many) to float32 and see if that works for you. I face a similar trade-off when working with large datasets. Another thing that works is parsing the data and rows in segments. This should definitely help in your case!
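
To make the two suggestions concrete, here is a minimal sketch using NumPy and pandas. It assumes the data can be read as CSV and that the integer features are small enough to survive the conversion (float32 represents integers exactly only up to 2^24 = 16,777,216). The column names, the tiny generated dataset, and the chunk size of 250 are all hypothetical stand-ins, not details from this thread.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: 1,000 rows, 5 integer features, 1 label.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.integers(0, 100, size=(1000, 5)),
    columns=[f"f{i}" for i in range(5)],
)
demo["label"] = rng.integers(0, 3, size=1000)
csv_text = demo.to_csv(index=False)

feature_cols = [c for c in demo.columns if c != "label"]

# 1) Downcasting: int64 stores 8 bytes per value, float32 stores 4.
as_int64 = demo[feature_cols].astype(np.int64)
as_float32 = as_int64.astype(np.float32)
bytes_int64 = as_int64.to_numpy().nbytes
bytes_float32 = as_float32.to_numpy().nbytes

# 2) Reading in segments: with chunksize set, read_csv yields one DataFrame
#    per block of rows, so only one block is in memory at a time.
dtypes = {c: np.float32 for c in feature_cols}
rows_seen = 0
chunk_sizes = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250, dtype=dtypes):
    # Per-segment work (cleaning, aggregating, writing out) would go here.
    rows_seen += len(chunk)
    chunk_sizes.append(len(chunk))

print(bytes_int64, bytes_float32, rows_seen, chunk_sizes)
```

For a sense of scale, if the column count stayed at 1,500 (the question suggests it will grow), 10 million rows would take about 120 GB at 8 bytes per value and about 60 GB at 4 bytes. Reading in segments keeps preprocessing within memory, but whether the forest itself can be trained on segments depends on the library being used, which this thread does not specify.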