# Share your approach - DataHack Premier League

#1

Hi all,

I hope you have seen the awesome contest we launched recently. Now marry your data science skills with your cricket mastery to build predictive models that predict the performance of players on the stage that matters most in cricket.

Check out more details about the contest:

Please use this thread to share your thoughts, suggestions and approaches.

Regards,
Kunal

3 Likes
Benchmark for DataHack Premier League 2018
pinned globally #2
#3

Can we have an example of the RMSE calculation for player prediction table?

#4

Hi,
We have 187 players featuring in IPL 2018.
For simplicity, let us assume that there is only one match with 2 players in each team (MI and CSK), for whom you have predicted 10, 15, 18 and 20 runs respectively.

On the basis of the actual toss result, the relevant predictions for each player would be picked from your submission, and RMSE would be calculated across all players and matches.
For example, if each player actually scored 14 runs, the RMSE would be:

square root of (((10 - 14)^2 + (15 - 14)^2 + (18 - 14)^2 + (20 - 14)^2) / 4) = square root of ((16 + 1 + 16 + 36) / 4) = square root of 17.25 ≈ 4.15

The wicket RMSE and Extras RMSE would be calculated in a similar manner.
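
For anyone who wants to check the arithmetic, here is a minimal sketch in Python using the predictions (10, 15, 18, 20) and the actual score of 14 from the example. The function and variable names are my own and are not part of the contest kit.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Numbers from the example: four players, each of whom actually scored 14.
predicted_runs = [10, 15, 18, 20]
actual_runs = [14, 14, 14, 14]

runs_rmse = rmse(predicted_runs, actual_runs)
print(round(runs_rmse, 2))  # 4.15
```

The same function could be applied to the wickets and extras columns.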

#5

Thanks, Ankit, for sharing the perspective. We need to predict one more binary column, i.e. playing_xi_flag, from the pool of players sold at auction for a particular team. For example, 11 players need to be selected out of 25 for CSK; if I actually predict 8 of them correctly, how would the RMSE be calculated (will the sample size be 8 or 11)? Needless to say, the values for incorrect selections will be negated. Am I missing anything?

#6

I think I have got the answer. The values (runs, wickets) for incorrectly predicted players will be 0 and will cost you the squared error of the actual value, unless the actual value is 0. The sample size will remain 11.

#7

First, let's have a forecast table for each match filled with average values, whether for wickets or runs… then we will adjust or modify it…
Use a derived metric: consistency.
For each team, bowler, stadium and year, pick the consistency, e.g. 4 wickets has a 10% chance or 2 wickets has 30%, for each match…
Store it under teamA_bowler name; he might have played for different teams in the past… then aggregate the consistency over all the IPL matches. How to do that: create six columns, col a for 1 wicket, col b for 2 wickets, and so on, and fill each column with 0 or 1… when you aggregate, you will know the overall consistency…
Similarly, for batsmen we can have aggression: for Glenn Maxwell, consistency (2) * aggression (4) = 8, whereas for Gautam Gambhir, consistency (3) * aggression (3) = 9, which is better than 8…
And AB de Villiers has really not performed well enough in past IPLs…
We need to see whether a batsman is more consistent in a particular stadium, e.g. regularly scoring 25 runs, so his consistency bracket is 25 runs… like Ajinkya Rahane… but if he scores, say, two consecutive 50s and then gets out before 15 in the third match, this is a dipping factor. Is it going to have an impact? How many players actually show a dipping factor and how many don't? So, in the forecast table we will replace the value with the consistent value… Mumbai Indians are more consistent and aggressive in the later half of the series… you need to bring this out through plots, if we can generate this info through plots or charts…
Next, to prepare the forecast table we can use average * probability… so plot the average, average * probability, and the actual values…
If consistency is more than 60%, replace the value. Prepare the table for each match per stadium…
If unpredictability is more important, then reduce the average value… e.g. the last 5 batsmen, who are actually bowlers, are more unpredictable… but can we predict the unpredictable ones? Which algorithm can help us? We can try neural networks…
For example, RCB has this problem: if they fail to score at a run rate above 8 in the last 7 overs, they tend to lose the match, even if they score 180+… probably the low number of sixes in the last 7 overs hurts them…
All this will help in generating rules, for both winning and predicting…
Regards,
VG…
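
One possible reading of the consistency idea above, sketched in Python. This is only an illustration under my own assumptions: the records are made up, "consistency" is taken to be the share of matches in which a bowler took a given number of wickets, and the 60% cut-off from the post decides whether the most frequent outcome replaces the plain average.

```python
from collections import Counter, defaultdict

# Hypothetical per-match records: (bowler, wickets taken in that match).
match_records = [
    ("Bowler A", 2), ("Bowler A", 0), ("Bowler A", 2), ("Bowler A", 4),
    ("Bowler B", 1), ("Bowler B", 1), ("Bowler B", 0), ("Bowler B", 1),
]

def wicket_consistency(records):
    """Return {bowler: {wickets: share of matches with that many wickets}}."""
    counts = defaultdict(Counter)
    for bowler, wickets in records:
        counts[bowler][wickets] += 1  # equivalent to summing a 0/1 indicator column
    return {
        bowler: {w: n / sum(c.values()) for w, n in sorted(c.items())}
        for bowler, c in counts.items()
    }

def forecast(records, threshold=0.6):
    """Use the most frequent outcome if its share exceeds threshold, else the average."""
    by_bowler = defaultdict(list)
    for bowler, wickets in records:
        by_bowler[bowler].append(wickets)
    shares = wicket_consistency(records)
    result = {}
    for bowler, values in by_bowler.items():
        mode, share = max(shares[bowler].items(), key=lambda kv: kv[1])
        result[bowler] = mode if share > threshold else sum(values) / len(values)
    return result

consistency = wicket_consistency(match_records)
predictions = forecast(match_records)
print(consistency["Bowler A"])  # {0: 0.25, 2: 0.5, 4: 0.25}
print(predictions)              # {'Bowler A': 2.0, 'Bowler B': 1}
```

The same grouping could be keyed on (bowler, stadium, year) instead of bowler alone to get the per-stadium tables mentioned in the post.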

2 Likes
#8

Hi… quite a good approach to quantify terms like consistency and aggression. Although I am naive as far as modelling is concerned, speaking purely from a common-sense perspective, shouldn't we get external data on the recent performance of players in domestic, international or other T20 leagues? For example, Lasith Malinga has been outstanding in previous IPLs, but if he had played this IPL, his performance would have been far less lethal.
Also, how do you analyze runs/wickets for players who are playing the IPL for the first time?

1 Like
#9

Let's take the top four batsmen of RCB… we will take their averages over the past 3 IPL series…
Quinton de Kock… 21 runs (21 balls)…
Virat Kohli… 30 runs (25 balls)…
AB de Villiers… 30 runs (25 balls)…
Sarfaraz… 22 runs (20 balls)… so in total we have 103 runs (91 balls)… they scored at a run rate of just about 6.8 per over… In the first 10 overs, 65/2 if we take an average target… Virat and Quinton did the right job… but after that, the next 4 batsmen have to score at a very high strike rate of more than 150…
If you see this link, the only player from RCB with a strike rate above 150 and an average of 22 is Kedar Jadhav… and now they have included Brendon McCullum, with an average of 29 and a strike rate of 145… but I think RCB needs one more player from the top 25… or one bowler with a somewhat better average and a strike rate above 140, like Hardik Pandya, Axar Patel or Jadeja, who can make a difference in the last 3-4 overs… once you do that, your chance of crossing 175 looks good… I say 7/10 times…

Regards,
VG…
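
The run-rate arithmetic in the post can be reproduced with a short Python sketch. The (runs, balls) pairs are the averages quoted in the post; the helper names are my own.

```python
# Average (runs, balls) per innings quoted in the post for RCB's top four.
top_four = {
    "Quinton de Kock": (21, 21),
    "Virat Kohli": (30, 25),
    "AB de Villiers": (30, 25),
    "Sarfaraz": (22, 20),
}

def strike_rate(runs, balls):
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls

def run_rate(runs, balls):
    """Runs scored per over (6 balls)."""
    return 6 * runs / balls

total_runs = sum(runs for runs, _ in top_four.values())
total_balls = sum(balls for _, balls in top_four.values())
combined_run_rate = run_rate(total_runs, total_balls)

print(total_runs, total_balls)      # 103 91
print(round(combined_run_rate, 2))  # 6.79
for name, (runs, balls) in top_four.items():
    print(name, round(strike_rate(runs, balls), 1))
```

On these numbers, none of the four has a strike rate above 120, which is why the post puts the burden on the batsmen who follow.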

#10

Would the RMSE evaluation include only the predicted playing XI?

#11

Hi,
The RMSE calculation won't be affected by the selection of the playing XI. For a valid submission, you must assign 0 runs and wickets to players not in the playing XI. RMSE would then be calculated using the actual wickets and runs scored in IPL 2018. The actuals for players not in the playing XI would be zero for both wickets and runs.

Ankit
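
A rough Python sketch of how the rule above plays out, under my own assumptions: the player names and numbers are made up, and the zeroing step is just one way a participant might enforce the "0 for players outside the playing XI" requirement before scoring. It is not the organisers' evaluation script.

```python
import math

# Hypothetical submission rows: player -> (playing_xi_flag, predicted_runs).
submission = {"P1": (1, 30), "P2": (1, 20), "P3": (0, 25), "P4": (0, 0)}
# Hypothetical actuals: players who did not play have 0 runs.
actual_runs = {"P1": 40, "P2": 0, "P3": 0, "P4": 12}

def effective_prediction(flag, runs):
    """Force 0 for players the submission leaves out of the playing XI."""
    return runs if flag == 1 else 0

def runs_rmse(submission, actuals):
    """RMSE over every player in the submission, not only the predicted XI."""
    squared_errors = [
        (effective_prediction(flag, runs) - actuals[player]) ** 2
        for player, (flag, runs) in submission.items()
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

score = runs_rmse(submission, actual_runs)
print(round(score, 2))  # 12.69
```

Here P2 (picked but did not play) and P4 (left out but scored 12) both add to the error, while P3 (left out and did not play) adds nothing.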

1 Like
#12

Presenting a simple dashboard showing the performances of the top batsmen:
https://public.tableau.com/profile/shailendrapatil92#!/vizhome/IPL_DATA/IPLBatsmanDashboard

Please feel free to play around and give your valuable comments.

9 Likes
#13

Ankit, I have a few doubts. If you are online, please respond.

#14

@prathaps Please ask your queries on this thread.

#15

Well, I am new to ML, as I am at the beginning of my learning path, but I want to try using R. I don't know how to build a model, or what needs to be considered and what does not. I have visualized the data in Tableau and got insights from it. Can you point me to any source where I can learn to build the model, as well as clean out the data of old players who are not in IPL 2018? Sorry, I am interested in working on this but I don't know where to start. I found that you are knowledgeable, so it would be helpful if you could guide me.

#16

Any perspective on how to select/predict the playing XI in every match? Do we need to predict the playing XI, or select it based on experience/assumptions and recent developments in other forms of cricket around the world? E.g. Santner has been ruled out of the IPL for CSK, and Md. Shami is unlikely to play the IPL for DD due to the recent controversy.

#17

You are allowed to use any information openly available to aid your predictions.

#18

Can you tell me which factors should be considered? It would help me in building a good model. Waiting for your response.

#19

Hi,

Can anyone please share thoughts on how the runs and wickets for new players in the test data player_predictions should be predicted? The new players don't have any historical data, so what strategy should be considered for this scenario?

Regards,
Sandeep.

#20

Hi,
You could use the openly available performance data for such players from other leagues and international cricket and integrate these with your submissions.
Regards
Ankit
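
One simple way to wire in such external data is a fallback lookup, sketched below in Python. Everything here is hypothetical: the player names, the averages, and the default value for players with no data anywhere are placeholders, not contest data.

```python
# Hypothetical average runs per match; every number here is made up.
ipl_average = {"Player A": 28.5}
other_league_average = {"Player A": 35.0, "Player B": 19.0}

def baseline_runs(player, default=0.0):
    """Prefer IPL history, then other leagues, then a caller-chosen default."""
    if player in ipl_average:
        return ipl_average[player]
    if player in other_league_average:
        return other_league_average[player]
    return default

players = ["Player A", "Player B", "Player C"]
baselines = {p: baseline_runs(p, default=10.0) for p in players}
print(baselines)  # {'Player A': 28.5, 'Player B': 19.0, 'Player C': 10.0}
```

How much weight to give other leagues relative to IPL history is a modelling choice that this sketch does not address.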

2 Likes