How to calculate Accuracy


#1

I am new to data science and am working on a loan prediction dataset to predict whether a loan will be approved or not.
I have applied logistic regression for classification and obtained the coefficients, statistics and a model for prediction.
I have applied the model to the test data as well.

Now I want to build a confusion matrix and accuracy statistics, and for that I need both the predicted class and the actual class. I have the predicted class for my test data, but how do I get the actual class to measure accuracy?

If there is any other method, please let me know.


#2

@rahul.agar

In real life, you won’t have the actual values for the test set, right? So the best thing to do is to make your own test set, i.e. divide your training set into a training set and a validation set. You can then use the validation set to evaluate your model.
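A minimal sketch of the hold-out split described above, using only the Python standard library. The function name `train_validation_split`, the 30% validation fraction and the stand-in rows are assumptions for illustration; they are not from the practice problem itself.

```python
import random

def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle rows and hold out a fraction of them for validation."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation

# Stand-in for the labelled rows of a training file (614 rows, purely illustrative)
rows = list(range(614))
training, validation = train_validation_split(rows)
print(len(training), len(validation))
```

You fit the model on `training` only and measure accuracy on `validation`, where the actual class is known. Libraries such as scikit-learn offer an equivalent helper (`train_test_split`) if you prefer not to write this yourself.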


#3

@jalFaizy I agree, but in this case I have been provided with two datasets:
train (614 observations) and test (300+). So are you saying I should ignore
the test dataset and split the train dataset in two to calculate accuracy?


#4

I have split the train dataset 70:30 and now I am getting 78.9% accuracy.


#5

@rahul.agar 78.9% on the training set or the validation set?

Also, I forgot to mention one point. If you are referring to the loan prediction practice problem, you can submit your predictions and check the accuracy. A sample submission file is provided to show you how to submit the prediction file.


#6

It was on the training set. I have split the training set.


#7

I have submitted the sample file but it doesn’t show the prediction. Does it take time to show the accuracy?


#8

So what is the score on the validation set?

Your sample submission file should contain the predictions of your model. Also, it should have all the entries present in the test set.
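As a rough sanity check before uploading, you could compare the IDs in your submission with the IDs in the test set. This is only a sketch: the helper `check_submission`, the IDs and the "Y"/"N" labels below are made up for illustration and are not the actual file layout of the practice problem.

```python
def check_submission(test_ids, submission_rows):
    """Return (missing, extra) IDs of a submission relative to the test set."""
    submitted_ids = [row_id for row_id, _ in submission_rows]
    missing = sorted(set(test_ids) - set(submitted_ids))  # in test, not submitted
    extra = sorted(set(submitted_ids) - set(test_ids))    # submitted, not in test
    return missing, extra

# Hypothetical IDs and predictions, for illustration only
test_ids = ["T001", "T002", "T003"]
submission_rows = [("T001", "Y"), ("T003", "N")]

missing, extra = check_submission(test_ids, submission_rows)
print(missing, extra)
```

Both lists should be empty before you submit; here `missing` flags the one test entry that has no prediction.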

I suggest you follow this article to clear up the basic concepts.


#9

I then ran it on the validation dataset and submitted it, and I got an accuracy of 77.77%.


#10

Hi Rahul,

To calculate the accuracy, you need to have a confusion matrix.

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
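A small sketch of that formula in plain Python, assuming you have the actual and predicted classes side by side (for example on a validation set). The function names, the "Y"/"N" labels and the toy data are assumptions for illustration only.

```python
def confusion_counts(actual, predicted, positive="Y"):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Toy labels, not real loan data
actual    = ["Y", "Y", "N", "Y", "N", "N", "Y", "N"]
predicted = ["Y", "N", "N", "Y", "Y", "N", "Y", "N"]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)            # 3 3 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```

If you use scikit-learn, `confusion_matrix` and `accuracy_score` in `sklearn.metrics` compute the same quantities.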

Regards
Vikas Gupta


#11

Yes, I know that. The only issue was that when I run the model on test.csv, I don’t have the actual loan status, only the predicted loan status. I am then submitting it on the website and getting the accuracy on my test.csv, which now reads 79.16% after applying some feature engineering and a different technique.