Benchmark Solution for "The Data Identity Hackathon" [Student DataFest 2018]



Hi all,

Here is the benchmark solution (Python) for The Data Identity hackathon to get you all started with the problem:

Happy Learning!!


Hi, PulkitS,

This is for learning purposes only.

Could you share the web link to download the two .csv files (train_HK6lq50.csv, test_2nAIblo.csv) so that I can run your Python code end to end?

Thanks in advance.


Hi @jimmyau,

You can download the train and test set from this link:


Should I submit the entire test.csv file, or only the sample_submission.csv file?


Hi @deepam ,

Using the model that you created on the train set, you have to make predictions on the test set. These values are to be submitted for evaluation. The sample submission is for you to understand the format of the submission file.


Hi @deepam,

You only have to submit the sample_submission.csv file. The id values in test.csv and sample_submission.csv are the same, so make predictions for test.csv and save them in sample_submission.csv. Once you have saved the predictions, the file will have only two columns, id and is_pass. You can then submit sample_submission.csv to the solution checker on the hackathon page.
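
As a rough illustration of that format, here is a minimal sketch using only the Python standard library (pandas would work just as well). The column names id and is_pass come from the description above; the example rows, the "feature" field, and the always_pass placeholder model are hypothetical and only stand in for your real test data and trained model.

```python
import csv
import io


def build_submission(test_rows, predict):
    """Return CSV text with exactly two columns: id and is_pass."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "is_pass"])
    for row in test_rows:
        # One output line per test row, keeping the test set's id as-is.
        writer.writerow([row["id"], predict(row)])
    return buffer.getvalue()


# Hypothetical stand-in for rows read from test.csv with csv.DictReader.
example_rows = [{"id": "a1", "feature": "3"}, {"id": "b2", "feature": "7"}]


# Hypothetical placeholder model: predicts 1 (pass) for every row.
def always_pass(row):
    return 1


submission_text = build_submission(example_rows, always_pass)
print(submission_text)
```

To produce the actual file, you could read test.csv with csv.DictReader, pass your own model's prediction function instead of always_pass, and write the returned text to sample_submission.csv.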


Thanks PulkitS!