How to test accuracy on test data when test data does not have the output?

machine_learning

#1

Hi everyone,

I have followed this great tutorial and don’t understand how the author arrives at the prediction accuracies on the test data, since the test data doesn't have the outcome?

I do understand the average accuracy calculated on the cross-validated folds within the training data, since this has the outcome, but not the next step. Can anyone explain this to me?

Many thanks in advance!


#2

Hi @sarahaliciaboe,

I'm not sure I fully understand your question, but here is my answer as I understand it:

Generally, accuracy is checked on the training data as well as on the test data. The test data is unseen data: your model is trained only on the training data, so it knows only the patterns in the training data.

After building the model, its performance is checked on the test data: how many events it classified correctly and incorrectly (for a classification model) compared with the actual outcomes in the original data set. Your model's performance is calculated by comparing the original and predicted values. Try searching for the confusion matrix for classification models; it will give you some more information.
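To make the comparison of original and predicted values concrete, here is a minimal sketch in plain Python. The labels are made up for illustration, and the helper names `confusion_counts` and `accuracy` are hypothetical, not taken from the tutorial or from any library.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary classifier."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known outcome."""
    correct = sum(a == p for a, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up example: known outcomes of the test rows vs. model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_counts(y_true, y_pred)
print(cm["tp"], cm["fp"], cm["fn"], cm["tn"])  # 3 1 1 3
print(accuracy(y_true, y_pred))                # 0.75
```

Note that both functions need `y_true`, so accuracy can only be computed on rows where the real outcome is known.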


#3

Hi @saisaranv,

Thanks for your clarification. I thought that we did not know the true outcome in the test data, so I was confused about how you could estimate the prediction accuracy on that set. But if I understand you correctly, the true outcome for the test data is known; it is just not in the file you download beforehand, and it is used for scoring when you submit your file to this platform, right?

Many thanks and best,
Sarah


#4

The test data set is not a separate data set. In ML you divide the total data (old or transactional data) into training data and test data, which means the test data set is not another separate file.
Example:
Say a well-known bank wants to integrate with Flipkart to offer an EMI facility on purchased products.
Flipkart designs the model using 2 years of data. The data is divided into training data and test data in a 0.80/0.20 ratio. A model is built on the training data and tested on the test data by comparing the original values against the predicted values. That comparison is used to tune the model to be more accurate.
Then the upcoming data is fed into the finished model to check the status of current customers. That way Flipkart can estimate customer purchases / new purchases and take action based on the situation.
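
A rough sketch of that workflow in plain Python, under some loud assumptions: the data is invented, the "model" is a toy threshold rule rather than anything Flipkart or the tutorial uses, and the names `train_test_split`, `fit_threshold`, and `predict` are hypothetical helpers defined here, not library calls.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle labelled rows and hold out a fraction as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

def fit_threshold(train_rows):
    """Toy model: midpoint between the mean feature value of each class."""
    pos = [x for x, y in train_rows if y == 1]
    neg = [x for x, y in train_rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Predict class 1 when the feature is at or above the threshold."""
    return 1 if x >= threshold else 0

# Invented historical data: (feature, known outcome) for 100 customers
history = [(spend, 1 if spend >= 50 else 0) for spend in range(100)]

# 0.80 / 0.20 split of the SAME labelled data set
train_rows, test_rows = train_test_split(history, test_ratio=0.2)
threshold = fit_threshold(train_rows)

# The held-out rows still have their outcome, so accuracy can be measured
correct = sum(predict(threshold, x) == y for x, y in test_rows)
test_accuracy = correct / len(test_rows)
print(test_accuracy)

# Upcoming customers have no outcome yet: the model can only predict
new_customers = [12, 47, 88]
predictions = [predict(threshold, x) for x in new_customers]
print(predictions)
```

The point of the sketch is the difference between `test_rows`, where predictions are compared with known outcomes, and `new_customers`, where no accuracy can be computed until the real outcomes arrive.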

I hope that makes the point clear. If not, please post in the comments where you are stuck.