Given a real likelihood, a probabilistic method such as probabilistic graphical models or Bayesian inference would be the best strategy, provided you have enough data points (regardless of whether they are labeled or not - ideally you want roughly the same number of labels for each class).

For example, 10,000 to about 1,000,000 points is a small but reasonably representative dataset.

For a single variable, 10,000 points already puts you in the region of very high (e.g. 99.999%) confidence in the frequentist approach. (Note that parameters are often counted as variables here too, as are latents.)
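As a rough illustration of why the sample size matters (a sketch, not an exact calculation from the answer above): the standard error of a single-variable estimate shrinks as 1/sqrt(n), so the width of even a very strict confidence interval collapses quickly.

```python
import math

# Rough illustration: the standard error of a sample mean shrinks as 1/sqrt(n),
# so a single-variable estimate tightens quickly with more points.
sigma = 1.0  # assumed population standard deviation (hypothetical)
half_widths = {}
for n in (100, 10_000, 1_000_000):
    se = sigma / math.sqrt(n)
    # a two-sided 99.999% normal interval uses z ~ 4.417
    half_widths[n] = 4.417 * se
    print(f"n={n:>9}: 99.999% CI half-width ~ {half_widths[n]:.4f}")
```

With 10,000 points the half-width is already under 0.05 standard deviations, which is what makes frequentist estimates of a single quantity so tight at that scale.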

For example, you can reduce your data with a GMM (Gaussian mixture model) or PMM (Poisson mixture model) by clustering it into as many clusters as you have classes, and then create a dummy categorical variable to represent your classes, e.g. via a delta distribution.
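A minimal sketch of that idea with scikit-learn (the toy blobs, cluster count, and majority-vote mapping are my assumptions, not something specified above): fit a GMM with one component per class, then map each cluster to a label using the few labels you have - a delta distribution that puts all mass on one label per cluster.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_classes = 3
# Toy data: three well-separated Gaussian blobs standing in for the real dataset.
X = np.vstack([rng.normal(loc=4 * k, scale=1.0, size=(200, 2)) for k in range(n_classes)])
y = np.repeat(np.arange(n_classes), 200)

# Cluster into as many components as there are classes.
gmm = GaussianMixture(n_components=n_classes, random_state=0).fit(X)
clusters = gmm.predict(X)

# Build the "dummy categorical": majority label per cluster
# (a delta distribution puts all probability mass on that one label).
mapping = {c: int(np.bincount(y[clusters == c]).argmax()) for c in range(n_classes)}
y_pred = np.array([mapping[c] for c in clusters])
accuracy = (y_pred == y).mean()
print(f"cluster-to-label accuracy: {accuracy:.2f}")
```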

Even though the Gaussian model is the more complex model with far more parameters, the delta distribution is also quite demanding to learn when labels are scarce, since you always have to define, given a cluster, the probability that it belongs to a specific label.

If there is a lack of data, K-Means or K-Nearest Neighbors could be a good choice.
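For instance (a toy sketch with made-up data): K-Nearest Neighbors needs no distributional assumptions at all, which is exactly why it holds up with very few points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Only 30 labeled points, two classes, well separated in feature space.
X_small = rng.normal(size=(30, 2)) + np.repeat([[0, 0], [5, 5]], 15, axis=0)
y_small = np.repeat([0, 1], 15)

# With so little data, a non-parametric classifier is often the safer bet.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_small, y_small)
score = knn.score(X_small, y_small)
print(f"training accuracy on the toy set: {score:.2f}")
```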

You could also consider representation-learning methods in the neural-network family, like PCA, non-negative matrix factorisation, or sparse coding, but I am not sure how well you can map back to the original data points from there. It is a transformation into another latent space which, especially in a greedy approach, still has to be invertible.
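PCA is the one case where the inverse map is explicit, so you can at least measure what the round trip loses. A sketch with synthetic correlated data (the dimensions and noise level are my assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
# Make the last five columns nearly copies of the first five, so the data
# effectively lives in a 5-dimensional subspace plus small noise.
X[:, 5:] = X[:, :5] + 0.1 * rng.normal(size=(500, 5))

pca = PCA(n_components=5).fit(X)
Z = pca.transform(X)               # project into the latent space
X_back = pca.inverse_transform(Z)  # map back to the original space
recon_error = np.mean((X - X_back) ** 2)
print(f"mean squared reconstruction error: {recon_error:.4f}")
```

For NMF and sparse coding there is no exact inverse; the best you can do is the approximate reconstruction `codes @ dictionary`, which is the invertibility concern raised above.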

Nevertheless, using feature-importance methods with an ensemble model, or a covariance analysis, is a good start and shows that you can actually work in an econometric context.
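As a quick sketch of the ensemble-importance route (the data-generating setup is invented for illustration): a random forest's impurity-based importances flag which inputs actually drive the classes before you commit to a heavier model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 4))
# Only the first feature actually determines the class in this toy setup.
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_
print("feature importances:", np.round(importances, 3))
```

If the importance mass concentrates on a handful of features, that is a good argument for restricting the later mixture or Bayesian model to those.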

For the rest, it is entirely up to you and the properties of your data.

Keep in mind that specific features can influence multiple classes without being significant in a mixture model, yet play a much more prominent role in neural networks or sparse coding. Note, however, that sparse coding does not build one-hot latent variables / clusters.
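To see the non-one-hot point concretely (a sketch on random data; the component count and penalty are arbitrary choices of mine): unlike a mixture model's single cluster assignment, a sparse code can activate several latent atoms per sample at once.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))

# Learn a small dictionary and encode each sample with an L1-penalised fit.
dl = DictionaryLearning(n_components=6, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, random_state=0)
codes = dl.fit_transform(X)

# Count how many atoms are active per sample; a one-hot code would give 1.
active_per_sample = (np.abs(codes) > 1e-8).sum(axis=1)
print("mean active atoms per sample:", active_per_sample.mean())
```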