Hackathon Problem: Do you know who's a Megastar?


#1
**Predict the category of a working professional**

Welcome to the final stage of this contest. If you have made it this far, we assume you love either Analytics or Amazon :wink:. Either way, go ahead and show what you’ve got. Just remember: either you will win or you will learn.

This hackathon has been designed to help you practice and evaluate your analytics learning, and to gift yourself an Amazon voucher worth Rs. 5000 (~$100). We have made sure that this dataset is easier than a typical Kaggle dataset, yet still challenging. Because you love challenges, right?

The Dataset

Objective: Predict the correct category of working professionals in India.

Description: The dataset covers 16,611 working professionals in India. These professionals belong to various categories (Rookie, Champion, Rising Star, Star, Rock Star and Mega Star). The dataset has been divided into two parts: Training (9,887 records) and Test (6,724 records). Categories are given in the training dataset, and you have to predict the categories of the professionals in the test dataset. Below is the data dictionary to help you understand the variables.

**Important Details**
  1. Participants must be registered on the Analytics Vidhya Discussion Portal before 11:00 a.m. IST on 07-Jun-15.

  2. Submissions should reach contest@analyticsvidhya.com no later than 5:30 p.m. IST on 07-Jun-15.

  3. Send your model output to contest@analyticsvidhya.com in CSV format with only two variables (Var1 and Category).

  4. The winner will be decided on the basis of correct predictions. The model with the highest percentage of correct predictions will win.

  5. The decision will be fair and will be declared by 14-Jun-15.

  6. If you don’t win the Amazon voucher, you still don’t lose. Analytics Vidhya will also acknowledge participants' enthusiasm and intent to participate with a special prize. You can show your intent and enthusiasm in multiple ways:

  • Wherever you get stuck while working on the dataset, simply post your doubt/question on the forum. Our team will answer it, so you will no longer be working alone.
  • The quality of the questions and answers you post will also be a factor in choosing the best participant.
  7. The dataset provided is a dummy version of real data collected over the past few years by Analytics Vidhya.
  8. You can also refer to the practice guide for this competition.
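
For anyone unsure about points 3 and 4, here is a minimal Python sketch (standard library only) of a two-column submission file and the "percentage of correct predictions" score. Only the column names Var1 and Category come from the rules above; the IDs, the labels, and the helper names `write_submission` and `accuracy` are made up for illustration, and the organisers' actual scoring script may differ.

```python
import csv
import io


def write_submission(rows, fileobj):
    """Write (Var1, Category) pairs as a two-column CSV with a header row."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Var1", "Category"])
    writer.writerows(rows)


def accuracy(predicted, actual):
    """Return the percentage of predictions that match the true category."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Made-up IDs and labels, purely for illustration.
predictions = [("ID001", "Rookie"), ("ID002", "Star"), ("ID003", "Mega Star")]
buffer = io.StringIO()
write_submission(predictions, buffer)
submission_text = buffer.getvalue()

# Three of the four made-up predictions match the made-up truth.
score = accuracy(
    ["Rookie", "Star", "Mega Star", "Star"],
    ["Rookie", "Star", "Champion", "Star"],
)
```

To write a real file, pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.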

Download the dataset from here:
train.csv (2.8 MB)
test.csv (1.9 MB)

May the best model win!


Data Hackathon - Online (Date: 7th June 2015)
#2

@Manish, @saimadhup, @kesavkulkarni, @ankurbh07, @qingkai_kong, @imranuddin13, @Jegan_Venkatasamy, @nayan1247, @nramachandran86, @Rudra_Saha, @Prateek_Saxena, @Cross_Bow, @Ankur_Khandelwal, @Sravanth_Kumar, @yakkali_sreenu, @Uday_Bhan_Singh, @Tushar_Kakkar, @Manoj_Hans, @Amartya_Hatua, @kmohan10, @Sunny_Malhotra, @Jayanti_Bhanushali, @Siddharth_Paturu, @gauravkantgoel, @ParmodKumar2, @Sakshi_Dhama, @ankitbhargava06, @al3hIshek, @Pragith_Prakash, @Harshil_Gandhi, @kadusrahul, @Ajoy, @Bruce_N_Sheri_Lyn, @Rizwan_Hudda, @mrajugoud0, @Vivek_Agarwal, @mdebasree9, @joshij, @karthiv, @dkanand86, @Miruna_Popa, @RAM_SINGH, @Arpit_Sidana, @vijayakumar_jawaharl, @gauravkumar37, @Alok_Kumar_Singh, @Keshav_Shrikant, @abhinavunnam, @shubhamgoel27, @mail1, @vikash, @explorer, @Gaurav_Bansal, @guptashubham389, @sagar1785, @jayesh1109, @sumit1, @deepishadwani, @ParindDhillon, @Saranya, @rithwik, @naveen26246, @harshals1, @Hardik, @Inglorious_Ankit, @varunmmm, @Aditya_Joshi, @ashimkapoor, @Ishan, @vijendrapsg, @ranjanpossible, @Debasis_Swain, @Ashish, @devanshrising, @dada_kishore, @Prem_Sangeeth, @Basu, @prakashjhawar, @Atul_Sharma, @Aksgupta123_1, @poornaramakrishnan, @Aman_Srivastava, @Sajan_Kedia, @emohit, @savy2020, @chaitumart, @shuvayan, @kamol1002, @aayushmnit, @BALAJI_SR, @Sunil, @tavish_srivastava, @karthe1, @anon, @kunal, @ajay_ohri, @Aditya_Sharma, @manneabhinay5, @Karthik_Ramasubraman, @koustuv10, @Sambit_Mishra, @vasum, @amarsharma999, @sarthak93, @Malavika, @vishwas_an, @gauravchawla03, @nitish_dydx, @vishesh16, @litankumar, @krishnakumar85, @Nalin, @dagreatemly, @John_Step, @savioseb, @Tushar_Sircar, @Harshita_Dudhe, @Sayantan_Sanyal, @aatishk, @Aditi_Sen, @krypton, @thimmasanidineshredd, @r_marouf, @win_nerds, @Abhishek_Nagarjuna, @abhijit7000, @Swapnil_Sharma, @sbairishal, @palashgoyal1

Here is the dataset


#4

Kunal … what is the deadline for submission?


#5

Latest by 5:30 p.m. today


#6

Hello sir,
Are there any restrictions on the tools we can use?


#7

I’m very interested in this event, but I have no experience in data analysis, although I have always been eager to learn. Could I join this event as well? How should I get started?


#8

None at all…use what you like!


#9

Hi Kunal,

Will there be a leaderboard? If yes, when and where will it be visible?

-Malavika


#10

Guys, any good tips for handling the Skills column? How do I split the comma-separated values and choose the best skills to include in my model?
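
One possible starting point, sketched in Python with the standard library only: split each Skills value on commas, count how many records mention each skill, and keep the most frequent skills as 0/1 indicator features. The sample values and the helper names `split_skills` and `top_skill_indicators` are invented for illustration; the real column may need more cleaning, and frequency is only one way to pick "the best" skills.

```python
from collections import Counter


def split_skills(cell):
    """Split a comma-separated cell into cleaned, lower-cased skill names."""
    return [s.strip().lower() for s in cell.split(",") if s.strip()]


def top_skill_indicators(cells, top_n):
    """Return the top_n most common skills and one 0/1 row per record."""
    parsed = [split_skills(c) for c in cells]
    # Count each skill at most once per record.
    counts = Counter(skill for skills in parsed for skill in set(skills))
    top = [skill for skill, _ in counts.most_common(top_n)]
    rows = [[int(skill in skills) for skill in top] for skills in parsed]
    return top, rows


# Made-up Skills values, including messy spacing, mixed case and an empty cell.
sample = ["R, SAS, Excel", "Python, R", "excel ,R", ""]
top, rows = top_skill_indicators(sample, top_n=2)
```

Each inner list in `rows` lines up with `top`, so the columns can be appended to the other predictors.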


#11

@Malavika

That is a work in progress; it will be available in the next Hackathon. We will be releasing something later today that will help you judge your models to some extent.

Regards,
Kunal


#12

@Karthik_Ramasubraman

Just post a new question. It will be far more helpful during the day.

Kunal


#13

@Nattoise_Lab,

You can start by exploring the data in Excel and finding the variables that explain the category of a professional better than the rest.

Regards,
Sunil


#14

Little finding: There is a clear distinction between the categories in average months of experience. Rookie has the lowest average and Megastar the highest, which suggests that the Vintage variable explains the category well.
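
A rough way to reproduce this kind of check in Python (standard library only) is sketched below. It assumes Vintage has already been converted to a number of months; the records are made-up numbers, not values from the contest data, and `mean_by_category` is a hypothetical helper name.

```python
from collections import defaultdict


def mean_by_category(records):
    """Average the numeric value of (category, value) pairs per category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, months in records:
        totals[category] += months
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


# Made-up (category, months of experience) pairs for illustration only.
records = [
    ("Rookie", 6), ("Rookie", 10),
    ("Star", 40),
    ("Mega Star", 90), ("Mega Star", 110),
]
means = mean_by_category(records)

# Categories ordered from lowest to highest average experience.
ranked = sorted(means, key=means.get)
```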


#15

Is this a multinomial logistic regression problem?
Prakash Jhawar


#16

Hi,

Can we do the data cleaning in Excel, or does it have to be done through code using R/SAS or other tools?
Thanks!


#17

Any tool is fine.

The winner will need to share the solution, so don’t do anything that is hard to replicate :smile:


#18

@prakashjhawar

This is a classification problem. You can use any technique that can do classification for you.


#19

haha :smile:
Sure! I just manually converted Vintage to months only (numeric); I was too lazy to code it in R :stuck_out_tongue:


#20

Hi Vishesh (@vishesh16),
How did you convert Vintage into months?
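
If you would rather do it in code, here is a minimal Python sketch (standard library only). It ASSUMES that Vintage is stored as text such as "2 years 3 months"; the thread does not show the real format, so adjust the pattern to whatever the column actually contains. The function name `vintage_to_months` is hypothetical.

```python
import re

# ASSUMPTION: values look like "2 years 3 months", "1 Year" or "18 months".
_UNIT_MONTHS = {"year": 12, "yr": 12, "month": 1}
_PATTERN = re.compile(r"(\d+)\s*(year|yr|month)", re.IGNORECASE)


def vintage_to_months(text):
    """Convert a duration string to total months, or None if nothing matches."""
    matches = _PATTERN.findall(text)
    if not matches:
        return None
    return sum(int(number) * _UNIT_MONTHS[unit.lower()] for number, unit in matches)


# Made-up examples of the assumed format.
examples = ["2 years 3 months", "1 Year", "18 months", "unknown"]
converted = [vintage_to_months(value) for value in examples]
```

Values that come back as None are worth inspecting by hand before modelling.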


#21

@prakashjhawar - I guess we could try decision trees. Most of the variables are categorical, and trees tend to work well with those.