Alternative to checking predictions on datahack



If we want an alternative way to check our predictions, without having to submit to this site's DataHack, what would that alternative be?


Hi @Eddie84,

You can create a validation set out of your training data and then check the model's performance on this validation set. Since you do not have the actual values for the test data, you cannot check your predictions or accuracy at your end.

But creating validation data could be a good approach.
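The split described above can be sketched in a few lines. This is a minimal illustration using only the standard library, assuming your labelled data is a list of (features, label) rows; in practice you might use a library helper such as scikit-learn's `train_test_split` instead.

```python
import random

def train_validation_split(rows, val_fraction=0.2, seed=42):
    """Shuffle the rows and hold out a fraction as a validation set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]  # (train, validation)

# Example: 100 labelled rows -> 80 for training, 20 held out for validation
data = [(i, i % 2) for i in range(100)]  # placeholder (features, label) pairs
train, val = train_validation_split(data)
print(len(train), len(val))  # → 80 20
```

The key point is that the validation rows are never used for training, so scoring the model on them approximates its performance on unseen data.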


Hello Aishwarya,

When you say “since you would not have the actual values of the test data”, are you referring to the fact that the test data does not include the response/outcome variable?

Also, when we submit to DataHack, is that the same thing as using a validation set? Is that the validation set?


Hi @Eddie84,

Exactly, you don’t have the actual values for the test data (but the DataHack team does! They created the problem statement :wink: ). When you submit to DataHack, your predictions are checked against those actual values; since you don’t have them, you can’t do that check at your end.

The idea of creating a validation set is that you can test your model’s performance on unseen data yourself, to make sure you are not overfitting on the training data.



But the validation set does have the real values, doesn’t it? So aren’t we testing against the real values when we use a validation set?


@Eddie84, yes. Because you create the validation set from your train data, you do have the real values for it. You can go through this article to understand how a validation set is created, and if you have any queries, you can post them here.
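Since the validation set keeps its real labels, you can score your predictions against them directly. A minimal sketch, assuming a classification task with discrete labels (the `accuracy` helper and the sample labels below are illustrative, not part of the thread):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the held-out real values."""
    assert len(y_true) == len(y_pred), "need one prediction per real value"
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Real labels from the validation set vs. a model's predictions
y_val = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_val, y_pred))  # → 0.8
```

This mirrors what the DataHack scoring does with the hidden test labels, except that here the “real values” are the ones you held out yourself.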