Minimum sample size for multinomial classifier


I need to build a multinomial classification model based on responses collected from an email survey we are planning to send out.
I need to understand the minimum volume of responders I need to build a robust model.
There are 4 classes that need to be predicted.

Can anyone give a simple explanation and formula for calculating the minimum sample size required to build a robust multiclass classification model?


Hi @Matthew_Sharp,

The problem is not the number of classes but the number of variables, their distributions, and the type of model you plan to use. Without knowing the number of variables and their distributions in your sample, it is nearly impossible to predict how accurate your model will be.

Hope this helps to frame your question.



Is there a rough approximation I can use to be sure that I have enough training data?
In the past I have heard that the ratio of samples to features should be somewhere between 6:1 and 10:1.
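
To make that rule of thumb concrete, here is a small sketch of the arithmetic. It assumes the ratio is meant as 6 to 10 samples per feature, and it uses 4000 responders purely as an example value; the function name `max_features` is made up for illustration. It is only a heuristic bound, not a guarantee of model quality.

```python
def max_features(n_samples: int, samples_per_feature: int) -> int:
    """Largest feature count allowed by a 'samples per feature' rule of thumb."""
    return n_samples // samples_per_feature

n_responders = 4000  # example value, replace with the real responder count

# Check both ends of the 6:1 to 10:1 range.
for ratio in (6, 10):
    limit = max_features(n_responders, ratio)
    print(f"{ratio} samples per feature -> at most {limit} features")
```

Under this reading, a stricter ratio (10:1) leaves room for fewer features than a looser one (6:1) at the same sample size.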

The approximate number of responders I expect is 4k and I plan to build a decision tree in the first instance, before progressing to a Random Forest.
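
Since there does not seem to be a single formula, one empirical check is a learning curve: train on growing subsets and see whether the cross-validated score has levelled off by the full sample size. Below is a minimal sketch, assuming scikit-learn is available. The data is synthetic (`make_classification` with 4 classes and 4000 rows) and only stands in for the survey responses; the feature counts, `max_depth=5`, and the accuracy metric are all arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data: 4000 rows, 4 classes.
# The feature counts are placeholders, not known properties of the survey.
X, y = make_classification(
    n_samples=4000,
    n_features=20,
    n_informative=8,
    n_classes=4,
    random_state=0,
)

# A shallow tree as the first model; max_depth is an arbitrary choice.
model = DecisionTreeClassifier(max_depth=5, random_state=0)

# Fit on 10% .. 100% of each training fold, scored with 5-fold cross-validation.
sizes, train_scores, test_scores = learning_curve(
    model,
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
)

# Average over folds and report the curve.
test_mean = test_scores.mean(axis=1)
for n, score in zip(sizes, test_mean):
    print(f"{n:5d} training samples -> mean CV accuracy {score:.3f}")
```

If the validation score is still climbing at the largest training size, more responders would probably help; if it has flattened, extra data is unlikely to change much for that model. The same check can be repeated with `RandomForestClassifier` swapped in for the tree.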