Hackathon: Cross-sell: target the right customer - issue with submission data



I wanted to submit my results for the hackathon, but there seems to be some issue.
We are expected to estimate the probability that a customer opts for a personal loan (i.e. that responder is Y).
I have done so and, as expected, the values lie between 0 and 1.
However, when I upload the data, I get a very low score.
I tried to submit the results sorted by probability, but the score was still low.

What am I missing? Can anybody help me out?
It would be great if somebody could share what is expected as the final answer.
Is it possible to share the result of the first 5 records as an example?


@j.akhil.j Make sure you are submitting the probabilities of Y. The low scores may be because you are submitting the probabilities of N.


Yes, when computing class probabilities, the result contains two probabilities, one for N and one for Y.
I saw that and submitted the probabilities of Y.

However, I will check again and see if there is any difference in score between N and Y.
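
As a minimal sketch of picking the right column, assume the model returns one row per customer with one probability per class, plus a list of class labels in the same column order (scikit-learn, for example, exposes these as `predict_proba` and `classes_`). The names `class_labels` and `proba_rows` below are hypothetical, and the numbers are made up for illustration:

```python
# Hypothetical model output: one row per customer, one probability per class.
# The column order follows class_labels, so never assume "Y" is column 0.
class_labels = ["N", "Y"]
proba_rows = [
    [0.90, 0.10],
    [0.35, 0.65],
    [0.72, 0.28],
]

# Look up the column that holds the probability of "Y".
y_index = class_labels.index("Y")
prob_y = [row[y_index] for row in proba_rows]

print(prob_y)  # probabilities of "Y" only
```

Looking the column up by label, rather than hard-coding an index, guards against accidentally submitting the N column.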


I uploaded the “Y” values again. This time it worked and I got a decent score.

For some reason, the site isn’t accepting the N values at all. I am getting the following error:
“Some error occured. Please try after some time.”


You don’t have to submit “N values”. You have to submit the probability values for “Y”.



I am getting the same problem of “Some error occurred. Please try after sometime”.
My submission file has a customer_id column and a responders column, which holds the probability of Y. I put this file in a folder, zipped the folder, and am trying to submit it.

Can you please suggest what to do?



Hi @kpksr
Don’t put the file in a folder. Just right-click on the CSV file and zip it.
I got the error only when trying to submit the N values. I am able to submit the Y values without any error.
Please try without using a folder and let me know if you are able to submit your results.
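
For anyone zipping from a script instead of right-clicking, here is a rough sketch using only the Python standard library. The file names, header names, and rows are assumptions for illustration; copy the real header from the sample submission file. It writes to a temporary directory so it can be run safely as-is:

```python
import csv
import os
import tempfile
import zipfile

# Hypothetical rows: (customer id, probability of "Y"). Values are made up.
rows = [("XXXXX1", 0.12), ("XXXXX2", 0.87)]

work_dir = tempfile.mkdtemp()
csv_path = os.path.join(work_dir, "solution.csv")
zip_path = os.path.join(work_dir, "solution.zip")

# Write the CSV with a header row; the column names are assumptions.
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "responders"])
    writer.writerows(rows)

# arcname stores the file at the top level of the archive, with no folder.
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(csv_path, arcname="solution.csv")

# List the archive contents to confirm there is no enclosing folder.
with zipfile.ZipFile(zip_path) as zf:
    archive_names = zf.namelist()
print(archive_names)
```

If the listed name contains a slash, the CSV is still inside a folder within the archive.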


Hi @j.akhil.j

Thanks for the help. I did try that: I right-clicked the file, made it a .zip file, and uploaded it, but got the same error. Below is what my solution.csv file looks like:

XXXXX1 prob_for_being_Y
XXXXX2 prob_for_being_Y
XXXXX3 prob_for_being_Y
XXXXX4 prob_for_being_Y

Likewise, it has all the customers of the test data with their corresponding probabilities of being a responder.



Hi @kpksr
The file seems to be right. It should have uploaded without any error.
Try this:
Take the sample submission file, fill in the customer IDs from the test data, and fill in random probability values under the Responders column. Then zip it and try uploading that sheet. This is just a check to see whether the submission process is working properly.
(Random probability values can be generated in Excel using the RAND or RANDBETWEEN function.)

Or try to upload your results after some time.
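
The same sanity check can be sketched in Python instead of Excel. The customer IDs and header names below are placeholders; the real ones should come from the test data and the sample submission file:

```python
import csv
import io
import random

# Hypothetical customer IDs; in practice these would come from the test data.
customer_ids = ["XXXXX1", "XXXXX2", "XXXXX3", "XXXXX4"]

random.seed(0)  # fixed seed only so the check is repeatable

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["customer_id", "responders"])  # assumed header names
for cid in customer_ids:
    # random.random() returns a float in [0.0, 1.0)
    writer.writerow([cid, round(random.random(), 4)])

csv_text = buffer.getvalue()
print(csv_text)
```

A file like this will score poorly, of course. It only tests whether the upload itself goes through.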

@jalFaizy…please help out kpksr


Hi @j.akhil.j

Thank you very much for your help and quick responses.

I took the sample submission file and pasted values from my solution.csv and submitted without any error this time.
Really appreciate your help on this.



no problemo!


@j.akhil.j , @jalFaizy , @kpksr
Does that mean that the value in the ‘N’ case will be NaN or 0?


If you consider it logically, the probability value of ‘N’ is nothing but (1 - probability value of ‘Y’).


Yes, I understand that, but I am unable to understand what you meant when you said earlier:

“You don’t have to submit “N values”. You have to submit the probability values for “Y””


In other words, you have to submit floating point values containing the probability of ‘Y’, i.e. the probability that the customer will respond.


But for that I will first have to find out whether the customer will respond or not.
How am I supposed to find the value of the probability if it isn’t available in the training data itself?



When we build a model for a classification problem, the model actually outputs the probability that an observation belongs to each class.
In this case, the model gives the probability that a customer will respond. Since this is a binary classification, the probability of N is (1 - probability of Y). When submitting, we need to submit the probability of the responder being Y.
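
To make this concrete, here is a small illustrative sketch, not the method anyone in this thread used: a one-feature logistic regression fitted by plain gradient descent in pure Python. The data is invented. Note that the training labels are only 0 and 1, yet the fitted model returns values strictly between 0 and 1. Hard 0/1 labels appear only after thresholding, which is why the probability-returning method of a library (for example `predict_proba` in scikit-learn) is the one to use here, not the label-returning one.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up training data: one feature per customer, label 1 for "Y", 0 for "N".
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

# Fit weight w and bias b by plain gradient descent on the log loss.
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

new_customers = [0.8, 2.2, 3.8]
prob_y = [sigmoid(w * x + b) for x in new_customers]  # probabilities of "Y"
prob_n = [1.0 - p for p in prob_y]                    # probabilities of "N"
labels = ["Y" if p >= 0.5 else "N" for p in prob_y]   # thresholded labels

print(prob_y, labels)
```

The `prob_y` values are what the submission expects. The `labels` list is what you get if you ask the model for a class prediction instead.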


@kpksr @jalFaizy @j.akhil.j
Since the competition has ended and I received a very bad result, I need some help from you in understanding a few topics.

a.) The data given to us had the RESPONDERS column, and it contained only ‘Y’ and ‘N’ values. In that case, as you mentioned, I needed to find the probability of taking the loan. The main problem I faced is how to find a probability in the range 0 to 1 when my training data has the output only as 0 (‘N’) and 1 (‘Y’), not as values within that range.

b.) The approach I used was to first predict whether a particular person will take a loan or not, and then find the probability using Logistic Regression, but that too gave the output as 0 and 1 rather than as values between 0 and 1.

This is quite a long question. I hope you understand my desperation here.
I would be glad to hear your suggestions and what I did wrong.
Feel free to criticize or pinpoint any wrong method I used. :stuck_out_tongue: