Difference between train and test file



Hi @kunal, since I am a beginner in machine learning, can you explain what the test file and the train file are?


Hi @ashu65

Train dataset: This file is used to train the machine learning model. It contains both the features (independent variables) and the target (dependent variable). In the loan prediction dataset, for example, the features include Gender, Age, Income, etc., and the target is the loan status.

Test dataset: This file has only the features; the target variable is what you have to predict. The model built on the train dataset is used to predict the target for the test set. In the loan prediction problem, you use the model to predict the loan status for each data point in the test set.
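To make the split concrete, here is a minimal sketch in plain Python. The rows, the column names (`age`, `income`, `loan_status`) and the nearest-centroid classifier are all made up for illustration; they are not the real loan prediction data or the method used in any particular tutorial. The point is only that the model is fitted on rows that contain the target and then applied to rows that do not.

```python
from collections import defaultdict

# Hypothetical feature columns, chosen only for this illustration.
FEATURES = ("age", "income")

# Train file: features AND the target ("loan_status").
train = [
    {"age": 25, "income": 20, "loan_status": "N"},
    {"age": 30, "income": 25, "loan_status": "N"},
    {"age": 45, "income": 80, "loan_status": "Y"},
    {"age": 50, "income": 90, "loan_status": "Y"},
]

# Test file: features only, no "loan_status" column.
test = [
    {"age": 28, "income": 22},
    {"age": 48, "income": 85},
]


def fit(rows, target="loan_status"):
    """Learn one centroid (mean feature vector) per target class."""
    sums = defaultdict(lambda: [0.0] * len(FEATURES))
    counts = defaultdict(int)
    for row in rows:
        label = row[target]
        counts[label] += 1
        for i, name in enumerate(FEATURES):
            sums[label][i] += row[name]
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, row):
    """Return the class whose centroid is closest to the row's features."""
    def sq_dist(centroid):
        return sum((row[name] - c) ** 2 for name, c in zip(FEATURES, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


model = fit(train)                                  # uses the target
predictions = [predict(model, row) for row in test]  # target is unknown here
print(predictions)
```

Because the test rows carry no `loan_status`, there is nothing in the test file to compare `predictions` against; any accuracy you compute yourself has to come from rows whose target you know.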


Hi @AishwaryaSingh

Thanks for the clarification.

I have followed this great tutorial, Loan Prediction with mlr package, and don’t understand how the author arrives at the prediction accuracies on the test data, since the test data doesn’t have the outcome.

I do understand the average accuracy calculated on cross-validated folds within the training data, since this has the outcome, but not the next step. Can you explain this to me?

Many thanks in advance!


Hi @sarahaliciaboe,

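The test file does not include the outcome, so the accuracy on it cannot be computed locally. Instead, you can upload your predictions for the test data on DataHack and check your score there. Here is the link for the competition: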