Hi, I am trying to work through the Python tutorial and can’t find the data. The link takes me to this hackathon but it doesn’t give the data.


Go to the list of hackathons at http://datahack.analyticsvidhya.com/contest/all and click Participate. That should do it!


This is the link for the python data tutorial.

Here is the text from the page, “You can download the dataset from here. Here is the description of variables:”

But there is no dataset to download.


Hi, how can I download the data set for the Loan Prediction problem? Thanks very much!


@kunal, I am trying to do feature engineering on the variables, so I want to understand them better.

  1. Is Applicant Income monthly or annual?
  2. Also, does more Credit History mean the person is more credible?


I was not able to find the data either. Is it a hidden link or a zip file that we can download? I would really appreciate it if you could provide the data used in the tutorial.

Thank you


Hi Vajravi,

  1. Is Applicant Income monthly or annual? Monthly.
  2. Also, does more Credit History mean the person is more credible? Yes.

Suggestion: as data analysts or data scientists, we should be able to understand these values.

@zahir Zahir,

Data files

In the Note section there are symbols for the Train File, Test File, and Sample Submissions. Click on them and the files will be downloaded.



I am very new to Analytics Vidhya. To practice, I am looking for the loan prediction data, but I don’t know where to find it. Thank you.


Query regarding the Loan Prediction 3 data set

I just picked up this problem and was trying to understand the data set. I had a couple of questions for the
In the train set, what does Loan_Status indicate?
Does it mean that a person with particular demographic data applied and the bank approved the loan?
If yes, then what is the bank looking for in the test set? Is it asking the data analysts to build a model that predicts which other loans it approved or will approve in the future?

What use will this information be? Instead, I think the dependent variable should be whether a person (loan_id)
DEFAULTED (Y/N) on the loan or not. That would help the bank separate the bad applicants from the good ones.

Comments, please.

For the sake of the problem, I can assume that Loan_Status has to be predicted and that this will somehow help the bank decide which applications it should approve.


I am unable to submit my solution. Can anyone please help me with this? It shows an error and says to try later.


Hi, yes, here you are predicting Loan_Status. You should build a model that predicts whether the loan should be approved or not on the test data, by training your model on the train data.
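For anyone new to this, here is a minimal sketch of that train-then-predict workflow. It is only an illustration, not the tutorial’s method: the rows are made-up stand-ins for the Train File and Test File, and the "model" is a deliberately simple baseline that learns the most common Loan_Status for each Credit_History value. The helper names `fit` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for the Train File (values are made up).
train = [
    {"Loan_ID": "T1", "Credit_History": 1, "Loan_Status": "Y"},
    {"Loan_ID": "T2", "Credit_History": 1, "Loan_Status": "Y"},
    {"Loan_ID": "T3", "Credit_History": 1, "Loan_Status": "N"},
    {"Loan_ID": "T4", "Credit_History": 1, "Loan_Status": "Y"},
    {"Loan_ID": "T5", "Credit_History": 0, "Loan_Status": "N"},
    {"Loan_ID": "T6", "Credit_History": 0, "Loan_Status": "N"},
    {"Loan_ID": "T7", "Credit_History": 0, "Loan_Status": "Y"},
]

# Hypothetical rows standing in for the Test File: no Loan_Status column.
test = [
    {"Loan_ID": "S1", "Credit_History": 1},
    {"Loan_ID": "S2", "Credit_History": 0},
    {"Loan_ID": "S3", "Credit_History": None},  # missing value
]


def fit(rows):
    """Learn the majority Loan_Status for each Credit_History value."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        by_value[row["Credit_History"]][row["Loan_Status"]] += 1
        overall[row["Loan_Status"]] += 1
    rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    default = overall.most_common(1)[0][0]  # fallback for unseen or missing values
    return rule, default


def predict(model, rows):
    """Apply the learned rule to rows that have no Loan_Status."""
    rule, default = model
    return {row["Loan_ID"]: rule.get(row["Credit_History"], default) for row in rows}


model = fit(train)              # train only on the train rows
predictions = predict(model, test)  # then predict for the test rows
print(predictions)
```

A real submission would swap the baseline for a proper classifier, but the shape stays the same: fit on the train data, predict Loan_Status for each Loan_ID in the test data.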


I got a score of 0.7778. Can someone tell me how to improve my score? I used a linear regression model.


I am not able to download the data set


@kunal I have a general doubt: should we do missing value imputation first and then check the relationship of the predictive variable with the other variables, or vice versa? The solution provided by Analytics Vidhya does it the first way.


I am not getting the dataset for the loan prediction problem. Can you provide me with the correct link?



I recommend checking the relationship between the predictor and the other variables both before and after imputing the missing values, so that you can see the impact of the missing values.
Technically, your newly imputed values should not alter the relationship between the predictor and the other variables, right?

Finally, since you are building the model on imputed data, you have to go with the relationship that you obtain after imputing the missing values.
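A rough sketch of that before-and-after check, under some assumptions: the rows are made up, the missing Credit_History values are filled with the mode (just one possible imputation choice), and the "relationship" is measured as the approval rate per Credit_History value. The helper names are hypothetical.

```python
from collections import Counter

# Hypothetical rows; None marks a missing Credit_History.
rows = [
    {"Credit_History": 1, "Loan_Status": "Y"},
    {"Credit_History": 1, "Loan_Status": "Y"},
    {"Credit_History": 1, "Loan_Status": "N"},
    {"Credit_History": 1, "Loan_Status": "Y"},
    {"Credit_History": 0, "Loan_Status": "N"},
    {"Credit_History": 0, "Loan_Status": "N"},
    {"Credit_History": None, "Loan_Status": "N"},
    {"Credit_History": None, "Loan_Status": "Y"},
]


def approval_rate_by_history(data):
    """Share of approved loans for each known Credit_History value."""
    totals, approved = Counter(), Counter()
    for row in data:
        value = row["Credit_History"]
        if value is None:
            continue  # rows with a missing value are left out
        totals[value] += 1
        approved[value] += row["Loan_Status"] == "Y"
    return {value: approved[value] / totals[value] for value in totals}


def impute_mode(data):
    """Return copies of the rows with missing Credit_History set to the mode."""
    known = [r["Credit_History"] for r in data if r["Credit_History"] is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [
        dict(r, Credit_History=mode) if r["Credit_History"] is None else dict(r)
        for r in data
    ]


before = approval_rate_by_history(rows)
after = approval_rate_by_history(impute_mode(rows))

# A large shift means the imputation itself changed the relationship.
shift = {value: after[value] - before[value] for value in before}
print(before, after, shift)
```

If the shift is large for some value, that is a hint the imputation strategy is distorting the data and may be worth revisiting.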

I hope it helps you…
Cheers… Happy Analytics…


Hi Ayub, I’m very new to data analysis, and this is the first data set I’m working on. Can you please help me? How did you proceed in solving this data set?


If you have registered and still can’t open the data download page, it may be because of your browser version. You can try Google Chrome; it does work for me.


You have to sign up on Analytics Vidhya.