Open Datasets used in DataHack Premier League



Please use this thread to share any datasets you found useful for building models for DataHack Premier League.

Some food for thought: match data for various venues, weather data for each venue, etc.

Looking forward to seeing the datasets.


Just a question: how do we obtain this data? Is there a source we can download it from, or do we have to prepare it manually in a spreadsheet from scorecards on websites like Cricinfo/Cricbuzz?


The data is available on the site as Train data, Test data and sample data. You can download it from there.


Can we prepare a dataset from information available on the internet and use it to make predictions?


You can use outside datasets for modelling, but they must be included in the submission.


Is there a Slack channel for DPL as well?


Is the data shared with us, or do we have to get it from the internet?


Yes, just search for #datahackpremierleague on the AV Slack channel.


@getvishalsingh - you can get the base data from

You are free to use more data. But if you do, make sure to include it with your code files so that we can verify your code.



The following datasets will help:

  1. Playing XI of historical IPL games
  2. Weather data - historical and forecast
  3. Player classification - Batsman, Bowler, All-rounder, Wicket Keeper, Captain, Vice-captain

Does anyone know of a site where we can download this data?