Predicting a continuous output in a dataset with categories



Here's the situation:
I'm working on a machine learning project with a dataset of shape (4250, 13), and its rows are already grouped into 7 categories.
Note that these categories can't be used as a predictor.
Here is how my data is distributed across the categories:

A       32.852598 % of the dataset
B       19.151644 % of the dataset
C       19.003181 % of the dataset
D       16.076352 % of the dataset
E       5.132556 % of the dataset
F       4.814422 % of the dataset
G       2.969247 % of the dataset
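For reference, percentages like these can be computed with pandas. A minimal sketch on a toy frame, assuming the group label lives in a column called `category` (a hypothetical name):

```python
import pandas as pd

# Hypothetical toy data; "category" is an assumed column name for the group label.
df = pd.DataFrame({"category": ["A", "A", "A", "B", "B", "C"]})

# Share of rows in each category, in percent, sorted from most to least frequent.
shares = df["category"].value_counts(normalize=True) * 100
print(shares.round(2))
```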

I have a continuous output that I want to predict, so the task is regression.
My goal is to predict it for each category; the final decision will be the category where the predicted output is largest.

My approach to this problem is to split my dataset into 7 sub-datasets and train one model on each of them. For a new input, I predict the output in each category, and the final category is the one where the predicted output is maximal.
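A rough sketch of that per-category approach in scikit-learn. The data is synthetic (12 hypothetical feature columns `x0`..`x11`, a `category` column and a target `y`), and `RandomForestRegressor` is just one possible choice of regressor:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
categories = list("ABCDEFG")
feature_cols = [f"x{i}" for i in range(12)]

# Synthetic stand-in for the real dataset (column names are hypothetical).
df = pd.DataFrame(rng.normal(size=(700, 12)), columns=feature_cols)
df["category"] = rng.choice(categories, size=700)
# Make the target depend on the category so the per-category models differ.
offsets = {c: i for i, c in enumerate(categories)}
df["y"] = df["x0"] * df["category"].map(offsets) + rng.normal(scale=0.1, size=700)

# Split into one sub-dataset per category and fit a separate regressor on each.
models = {}
for cat, group in df.groupby("category"):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(group[feature_cols], group["y"])
    models[cat] = model

# For a new input, predict the output with every category's model...
x_new = df[feature_cols].iloc[[0]]
predictions = {cat: float(m.predict(x_new)[0]) for cat, m in models.items()}

# ...and keep the category whose predicted output is the largest.
best_category = max(predictions, key=predictions.get)
print(best_category, predictions)
```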

Now I want to know: is there any way to do this on the whole dataset and automatically predict the category where my output is maximal, with a single model instead of 7?
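One possible single-model variant, sketched under an assumption that needs checking against the note above: the category is known for every training row and may be fed to the model as a one-hot input. At prediction time the same new input is scored once per category, and the argmax is taken. Column names and data are hypothetical, and this is a different technique from the 7-model approach, not a drop-in equivalent:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
categories = list("ABCDEFG")
feature_cols = [f"x{i}" for i in range(12)]

# Synthetic data (hypothetical columns), same shape of problem as the question.
df = pd.DataFrame(rng.normal(size=(700, 12)), columns=feature_cols)
df["category"] = rng.choice(categories, size=700)
offsets = {c: i for i, c in enumerate(categories)}
df["y"] = df["x0"] * df["category"].map(offsets) + rng.normal(scale=0.1, size=700)

def encode(frame):
    """Numeric features plus one-hot category columns in a fixed order."""
    cat = pd.Series(pd.Categorical(frame["category"], categories=categories), index=frame.index)
    onehot = pd.get_dummies(cat, prefix="cat", dtype=float)
    return pd.concat([frame[feature_cols], onehot], axis=1)

# ASSUMPTION: the category is allowed as an input feature during training.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(encode(df), df["y"])

def best_category(x_new):
    """Score x_new once per category and return the category with the max prediction."""
    candidates = pd.concat([x_new] * len(categories), ignore_index=True)
    candidates["category"] = categories
    preds = model.predict(encode(candidates))
    return categories[int(np.argmax(preds))], dict(zip(categories, preds))

best, preds = best_category(df[feature_cols].iloc[[0]])
print(best, preds)
```

The trade-off is that one model shares information across categories (helpful for the small groups like F and G), but it only works if using the category as an input is legitimate for your problem.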

PS: I'm using Python and scikit-learn.

It sounds like random forest, but I'm not sure that's it.
Can someone help? Any help will be appreciated.


If you are using the same regressor for every group (e.g. a random forest), you can define a function that takes the complete dataset as input, iterates over each group (sub-dataset), and stores the predicted results in whatever form you like, e.g. a list or a dict. Then you can pick the category (this will be your output) where the predicted output is maximal.
I hope this helps; feel free to ask if I didn't understand your problem.


OK, thanks for the answer, let me check. But I don't want to use only random forest! What if I use another model, like an SVM?
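The per-group function described in the answer doesn't depend on random forest: any scikit-learn regressor can be passed in and copied per category with `sklearn.base.clone`. A sketch under that assumption, with hypothetical helper names `fit_per_category` and `predict_best_category`, comparing a random forest with an `SVR` (wrapped with `StandardScaler`, since SVMs are sensitive to feature scale):

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def fit_per_category(df, feature_cols, target, group_col, estimator):
    """Fit an independent, unfitted copy of `estimator` on each category's rows."""
    return {cat: clone(estimator).fit(g[feature_cols], g[target])
            for cat, g in df.groupby(group_col)}

def predict_best_category(models, x_new):
    """Predict x_new with every category's model; return (argmax category, all predictions)."""
    preds = {cat: float(m.predict(x_new)[0]) for cat, m in models.items()}
    return max(preds, key=preds.get), preds

# Synthetic data with hypothetical column names.
rng = np.random.default_rng(0)
categories = list("ABCDEFG")
feature_cols = [f"x{i}" for i in range(12)]
df = pd.DataFrame(rng.normal(size=(700, 12)), columns=feature_cols)
df["category"] = rng.choice(categories, size=700)
df["y"] = df["x0"] * df["category"].map({c: i for i, c in enumerate(categories)})

x_new = df[feature_cols].iloc[[0]]
results = {}
for name, estimator in [("random_forest", RandomForestRegressor(n_estimators=50, random_state=0)),
                        ("svr", make_pipeline(StandardScaler(), SVR(C=1.0)))]:
    models = fit_per_category(df, feature_cols, "y", "category", estimator)
    results[name] = predict_best_category(models, x_new)
    print(name, results[name][0])
```

Swapping in another model only changes the `estimator` argument; the split, predict and argmax logic stays the same.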