Hackathon 3.x - Predict customer worth for Happy Customer Bank


  • By popular demand, the competition will stay live until Friday, 11 September 2015, 11:59 p.m. India time
  • The final evaluation will be on 75% of the unseen test data, that is, on the private leaderboard only
  • There will be 2 prizes along with the glory: one for the weekend leader (INR 10k) and one for the leader after the week (INR 10k)

About Company

Happy Customer Bank is a mid-sized private bank that deals in all kinds of loans. It has a presence across all major cities in India and focuses on lending products. It has a digital arm that sources customers from the internet.


Digital arms of banks today face challenges with lead conversion. They source leads through channels such as search, display, email campaigns, and affiliate partners. Happy Customer Bank faces the same challenge of a low conversion ratio. The bank has asked us to identify the customer segments with a higher conversion ratio for a specific loan product so that it can target these customers specifically. For this, it has provided a partial data set covering salaried customers only, from the last 3 months. The bank also captures basic details about customers such as gender, DOB, existing EMI, employer name, loan amount required, monthly income, city, interaction data, and many others. Let’s look at the process at Happy Customer Bank.

In the above process, customer applications can drop out mainly at two stages: at login and at approval/rejection by the bank. We need to identify the segment of customers with a higher disbursal rate in the next 30 days.

Data Set

The data is split into a train set and a test set. The train set has both input and output variable(s); the task is to predict the probability of disbursal for the test set.


The competition is now over.

Input variables:

ID - Unique ID (cannot be used for predictions)
Gender - Sex
City - Current City
Monthly_Income - Monthly Income in rupees
DOB - Date of Birth
Lead_Creation_Date - Lead Created on date
Loan_Amount_Applied - Loan Amount Requested (INR)
Loan_Tenure_Applied - Loan Tenure Requested (in years)
Existing_EMI - EMI of Existing Loans (INR)
Employer_Name - Employer Name
Salary_Account - Salary account with Bank
Mobile_Verified - Mobile Verified (Y/N)
Var5 - Continuous classified variable
Var1 - Categorical variable with multiple levels
Loan_Amount_Submitted - Loan Amount Revised and Selected after seeing Eligibility
Loan_Tenure_Submitted - Loan Tenure Revised and Selected after seeing Eligibility (Years)
Interest_Rate - Interest Rate of Submitted Loan Amount
Processing_Fee - Processing Fee of Submitted Loan Amount (INR)
EMI_Loan_Submitted - EMI of Submitted Loan Amount (INR)
Filled_Form - Filled Application form post quote
Device_Type - Device from which application was made (Browser/ Mobile)
Var2 - Categorical variable with multiple levels
Source - Categorical variable with multiple levels
Var4 - Categorical variable with multiple levels

Output variables:

LoggedIn - Application Logged (Variable for understanding the problem – cannot be used in prediction)
Disbursed - Loan Disbursed (Target Variable)
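
As a rough illustration of how the variable list above might be turned into model inputs, the sketch below drops the columns that cannot be used as predictors (ID, LoggedIn, and the target Disbursed) and replaces the two date fields with a derived age. The Age feature, the helper names, and the `%d-%m-%Y` date format are assumptions made for this example; the problem statement does not specify a date format or prescribe any feature engineering.

```python
from datetime import datetime

# Columns the problem statement says must not be used as predictors,
# plus the target itself.
NON_FEATURES = {"ID", "LoggedIn", "Disbursed"}


def age_in_years(dob, lead_date, fmt="%d-%m-%Y"):
    """Whole years between DOB and Lead_Creation_Date (fmt is an assumed format)."""
    born = datetime.strptime(dob, fmt)
    lead = datetime.strptime(lead_date, fmt)
    # Subtract one year if the birthday has not yet occurred in the lead year.
    before_birthday = (lead.month, lead.day) < (born.month, born.day)
    return lead.year - born.year - int(before_birthday)


def to_features(record):
    """Drop non-predictor columns and replace the two dates with a derived Age."""
    features = {k: v for k, v in record.items() if k not in NON_FEATURES}
    dob = features.pop("DOB")
    lead_date = features.pop("Lead_Creation_Date")
    features["Age"] = age_in_years(dob, lead_date)
    return features


# Hypothetical record, for illustration only (not a row from the data set).
row = {
    "ID": "ID000001",
    "Gender": "Female",
    "DOB": "23-05-1978",
    "Lead_Creation_Date": "15-05-2015",
    "LoggedIn": 0,
    "Disbursed": 0,
}
print(to_features(row))
```

If the dates in the actual files use a different layout, only the `fmt` argument needs to change.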

Evaluation Criteria:

The evaluation metric for this challenge is ROC_AUC. To read more about ROC_AUC, refer to the article “Model Evaluation Metrics”.
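
To make the metric concrete: ROC AUC is the probability that a randomly chosen disbursed case receives a higher predicted score than a randomly chosen non-disbursed case, with ties counted as half. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` is made up for this example, and this is not the platform's scoring code; for data of the competition's size a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
def roc_auc(y_true, y_score):
    """ROC AUC by pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities, for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because only the ordering of the scores matters, the submitted probabilities do not need to be calibrated to score well on this metric.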


The winner, as judged by the evaluation metric and Analytics Vidhya, stands to win INR 10,000 (~$200)

Mode of competition:

  • The problem will be made live on the discussion portal and the new hackathon platform

  • You can check the accuracy of your new solution at http://datahack.club:8000

  • You will get an invite to the new platform shortly. You can download the datasets from either place: the discussion portal or the hackathon platform

  • The Slack channel will be used for high fives and banter during the contest

  • You can post technical discussions on the discussion portal

  • The solution evaluator and leaderboard will be released on 5 September 2015, 11:59 a.m.



The zip-files seem to be corrupt.
Never mind, they are working now.

There is some misalignment in the data in many records. Is anyone facing the same issue?

@parakramjain In what sense are they misaligned?

Can anyone throw light on the described input variables Var1, Var2, Var4, Var5 and Source?

When I open the file in Excel, some values also show up in columns AA, AB and AC. When I open it as a CSV in R and run summary, it gives three more variables: X, X.1, X.2.

Total variables in train data set are 26 only.

Which tool did you use to open the train file? There are only 26 variables, but some data moved into the next columns due to some issue… maybe I am doing something wrong.

I am using Python.

Nope, sorry - I’m not able to replicate those. While Excel (Office 2013 - Win7) threw some warnings at me about file formats and such, the data itself stops at column Z. And nothing looks out of place in R, as far as I have seen. Try downloading the files again.

Thanks, I downloaded the files again and they are working properly.
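
For anyone who sees the same symptom (values spilling into extra columns such as AA to AC in Excel, or X, X.1, X.2 in R), one quick check is to count the fields in each row and compare against the header. The sketch below does this with Python's standard csv module; the function name and the sample text are made up for the example and are not taken from the competition files.

```python
import csv
import io


def rows_with_wrong_width(handle):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(handle)
    header = next(reader)
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far,
            # i.e. the line on which this row ended.
            bad.append((reader.line_num, len(row)))
    return bad


# Made-up sample: the second data row has one field too many.
sample = "ID,City,Disbursed\nID1,Delhi,0\nID2,Mumbai,Thane,0\nID3,Pune,1\n"
print(rows_with_wrong_width(io.StringIO(sample)))  # [(3, 4)]
```

To run it on a downloaded file, pass a handle opened with `newline=""`, for example `open(path, newline="")` where `path` points at your local copy of the train file. An empty result means every row has the same number of fields as the header.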

When will the submission portal go online?

The solution checker is now live! You can check it after logging in to the DataHack platform.

You can find the solution checker in the section called problem - all the best everyone!

Need clarity on the LoggedIn variable: does ‘1’ for this field indicate that the customer logged the loan application? I’m asking because I see instances where the LoggedIn variable is ‘0’ but the loan has been disbursed (the ‘Disbursed’ field is 1).


Good point. I was just checking that. Even I was wondering what that could mean.

LoggedIn   Disbursed=0   Disbursed=1
0                84435            31
1                 1312          1242
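
A table like the one above can be reproduced by counting (LoggedIn, Disbursed) pairs. The sketch below uses only the standard library and a handful of toy records; the `crosstab` helper is a made-up name for this example, and the toy records are not rows from the competition data. With the train file loaded in pandas, `pandas.crosstab` on the two columns gives the same kind of table.

```python
from collections import Counter


def crosstab(records, row_key="LoggedIn", col_key="Disbursed"):
    """Count records for every (row_key, col_key) value pair."""
    return Counter((rec[row_key], rec[col_key]) for rec in records)


# Toy records for illustration only; these are not rows from the data set.
toy = [
    {"LoggedIn": 0, "Disbursed": 0},
    {"LoggedIn": 0, "Disbursed": 0},
    {"LoggedIn": 0, "Disbursed": 1},
    {"LoggedIn": 1, "Disbursed": 0},
    {"LoggedIn": 1, "Disbursed": 1},
]
counts = crosstab(toy)

print("LoggedIn  Disbursed=0  Disbursed=1")
for logged in (0, 1):
    print(f"{logged:<8}  {counts[(logged, 0)]:>11}  {counts[(logged, 1)]:>11}")
```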


Same here.

@Kunal I am unable to log in to DataHack. I am using my Analytics Vidhya username and password to log in, but it is not accepting them. Could you please help me out with this?
Thanks in advance!

@ankurv857 - can you just use http://datahack.club:8000 for now?

Thanks a lot Kunal!
