It’s my first participation in the Last Man Standing hackathon. I’m currently submitting my submission file to the public leaderboard, but I don’t know how to submit it to the private leaderboard. Also, I couldn’t understand the statement “The public leaderboard is based on 30% of test data, while final rank would be decided on remaining 70% of test data (which is private leaderboard)”. When will the 70% test data be released?
@Raghuvaran_Raghu - The private leaderboard is where your algorithm is evaluated on unseen data to see how well it generalizes. It is not open to participants: only the question setter can see it, and you cannot submit a solution to it separately.
Hope this helps!
What is a Public-private split?
Adding to the points mentioned by @hinduja1234.
Last Man Standing has a test set with 59310 data points. You make only ONE submission, covering all of them. There are no separate private and public submissions.
What the 30-70 split means is that the current ranking you are seeing (the Public Leaderboard) is based on just 30% of these observations (30% of 59310 = 17793). The remaining 70% (41517 observations) are not scored yet.
When the competition ends, the FINAL RANKING will be based on the REMAINING 70% of the data, and the 30% used for the public leaderboard will NOT be considered.
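To make the split concrete, here is a minimal Python sketch of how one submission file could be scored twice. The random row assignment, the accuracy metric, the made-up labels, and all function names are assumptions for illustration only; the platform's actual split and metric are not visible to participants.

```python
import random

N_TEST = 59310          # number of test rows, from the post above
PUBLIC_FRACTION = 0.30  # share of rows behind the public leaderboard


def split_public_private(n_rows, public_fraction, seed=0):
    """Randomly assign row indices to a public and a private subset (assumed scheme)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_public = round(n_rows * public_fraction)
    return indices[:n_public], indices[n_public:]


def accuracy(truth, predicted, rows):
    """Fraction of the given rows where the prediction matches the true label."""
    return sum(truth[i] == predicted[i] for i in rows) / len(rows)


public_rows, private_rows = split_public_private(N_TEST, PUBLIC_FRACTION)

# Made-up labels and a made-up submission that is right about 80% of the time.
rng = random.Random(1)
truth = [rng.randint(0, 1) for _ in range(N_TEST)]
submission = [t if rng.random() < 0.8 else 1 - t for t in truth]

# The same file is scored on both subsets; only the public score is shown during the contest.
print("rows (public, private):", len(public_rows), len(private_rows))
print("public score :", accuracy(truth, submission, public_rows))
print("private score:", accuracy(truth, submission, private_rows))
```

The point of the sketch is that there is a single file with a prediction for every row, and the two leaderboards simply look at different rows of it.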
You might be wondering why. It is generally possible to overfit your solution to the public leaderboard. People can view the outcome of their submissions and try to make algorithms specific to that outcome, which may not generalize well. Also, the competition has no limit on the number of submissions, so this becomes easy. To prevent users from following such practices, their rank is calculated on the Private (70%) part of the data and not the Public (30%) part.
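A small simulation can show why unlimited submissions make this kind of overfitting easy. In this hedged sketch every "submission" is a pure random guess, so none of them has any real skill; we simply keep the one with the best public score. The sizes, the seed, and the function names are all hypothetical.

```python
import random


def accuracy(truth, predicted, rows):
    """Fraction of the given rows where the prediction matches the true label."""
    return sum(truth[i] == predicted[i] for i in rows) / len(rows)


def best_by_public_score(n_rows=2000, n_submissions=300, public_fraction=0.30, seed=42):
    """Pick the random submission with the highest public score; return both of its scores."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n_rows)]
    n_public = round(n_rows * public_fraction)
    public_rows = range(n_public)            # first 30% of rows (assumed split)
    private_rows = range(n_public, n_rows)   # remaining 70% of rows

    best_public, best_private = -1.0, None
    for _ in range(n_submissions):
        guess = [rng.randint(0, 1) for _ in range(n_rows)]  # no real model at all
        public_score = accuracy(truth, guess, public_rows)
        if public_score > best_public:
            best_public = public_score
            best_private = accuracy(truth, guess, private_rows)
    return best_public, best_private


public_score, private_score = best_by_public_score()
print("best public score      :", public_score)
print("its private score      :", private_score)
```

Because the winner was selected on the public rows, its public score looks better than chance, while its score on the private rows has no reason to be. That gap is what ranking on the private 70% guards against.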
I hope this makes it a bit clear. Please let me know if you have further questions.
One piece of advice - always trust your Cross-Validation score more than the public leaderboard score. It is very much possible that the guys with >95% accuracy on public leaderboard actually end up at a lower position in final rankings because they have overfit their solution too much on the public leaderboard. I’m not saying it will, I’m just saying it is possible.
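For anyone unsure what a cross-validation score involves, here is a minimal sketch in plain Python. The threshold "model", the synthetic data, and the helper names are hypothetical stand-ins; in practice you would plug in your own training and prediction code.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, fit, predict, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once for validation."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


# Toy model (hypothetical): predict 1 when the feature exceeds the training mean.
def fit_threshold(train_features, train_labels):
    return sum(train_features) / len(train_features)


def predict_threshold(threshold, feature):
    return 1 if feature > threshold else 0


# Synthetic training data with a little label noise.
rng = random.Random(7)
features = [rng.random() for _ in range(500)]
labels = [1 if x + rng.gauss(0, 0.1) > 0.5 else 0 for x in features]

cv_score = cross_val_accuracy(features, labels, fit_threshold, predict_threshold)
print("cross-validation accuracy:", cv_score)
```

Since every training row is held out exactly once, this score is computed on data the model did not see while fitting, and it does not depend on which rows happen to be in the public 30%.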
All the best!
Thanks for the good and neat explanation, but I still have a doubt: will my script be run on their machine, or is the submission file enough?
The submission file is enough.
The script is only there to check the algorithm and approach you used. You need not worry about intermediate submissions, but you should have a script with your final submission. It’s generally a good idea to comment the script.
Hey Aarshay, just a few doubts. So can’t we validate on the remaining 70% of the test set ourselves?
Nope. That’s the whole point. Think about it in terms of real-life applications: you won’t have the answers for the test data, yet you have to make predictions. If you could check against the entire test data, the problem would become too easy (which is not the case in practical applications).
The idea behind giving results on a small proportion of the test data through the public leaderboard is just to give people some validation that they are going in the right direction. You should always trust your validation score more, as private results are sometimes very different from public results. If you participated, you would have seen the same in our last hackathon - Date Your Data.
Agreed. So when people submit files to the public leaderboard during hackathons, do they predict for only 30% of the test file and submit that, or do they predict for the whole test file and the system calculates the leaderboard score on only 30% of the test set?
The submission contains predictions for 100% of the test file, but the public leaderboard shows the result for only 30% of that data. The final rankings are based on the score on the remaining 70%.