How to find the dependent variable in the test dataset using the train dataset?



Instead of a single dataset, I have two files: a train dataset and a test dataset. The dependent variable is present in the train dataset but missing from the test dataset. I need the dependent variable in my test dataset. Can you help by explaining how to do that in Python, or what code I can use?


Hi @komalmittal273

The complete dataset is divided into a train set and a test set. The dependent variable is not stored anywhere in the test dataset, so you have to predict it: fit a basic machine learning model on the train dataset and apply it to the test dataset.

Train dataset: This set is used to train the machine learning model. It contains both the features (independent variables) and the target (dependent variable), and the model learns the relationship between them from this dataset.

For example, if your train dataset looks like this:

ID    Age    Gender    LikeChocolate    Purchase (target)
1     10     M         1                1
2     20     M         0                0

When you fit a model on this dataset, it learns patterns from these rows, for example that a person of age 10 who likes chocolate will make a purchase.
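As a rough sketch of what "fitting a model" could look like in Python, here is the two-row example above built with pandas and fitted with scikit-learn. The choice of a decision tree is only an assumption for illustration; any basic classifier could be used, and two rows are far too few for a real model.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The two example rows from the train table above.
train = pd.DataFrame({
    "ID": [1, 2],
    "Age": [10, 20],
    "Gender": ["M", "M"],
    "LikeChocolate": [1, 0],
    "Purchase": [1, 0],
})

# Features: everything except the row ID and the target column.
X_train = train.drop(columns=["ID", "Purchase"])
# Gender is text, so turn it into 0/1 indicator columns.
X_train = pd.get_dummies(X_train, columns=["Gender"], dtype=int)
# Target: the dependent variable the model should learn.
y_train = train["Purchase"]

# Assumed model choice: a decision tree classifier.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
```

After `fit`, the `model` object holds what was learned from the train rows and can be reused for prediction.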

Test dataset: This set has only the features (independent variables); the target is what you have to predict. The model built on the train dataset is used to predict the target variable for each row of the test set. For example, if the test data contains a row with Age = 11 and LikeChocolate = 1, the model will predict the target to be 1.
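Putting both parts together, a minimal sketch of the whole workflow might look like the following. The helper name `predict_target`, the column names, the file names in the comment, and the decision tree model are all assumptions for illustration; adapt them to your own files.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def predict_target(train, test, target="Purchase", id_col="ID"):
    """Fit on train, then return a copy of test with a predicted target column."""
    # One-hot encode text columns such as Gender so the model gets numbers.
    X_train = pd.get_dummies(train.drop(columns=[id_col, target]), dtype=int)
    X_test = pd.get_dummies(test.drop(columns=[id_col]), dtype=int)
    # Give the test features the same columns, in the same order, as train.
    X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

    # Assumed model choice; swap in any other scikit-learn classifier.
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, train[target])

    result = test.copy()
    result[target] = model.predict(X_test)
    return result


# With real files you would load them first, for example (hypothetical names):
#   train = pd.read_csv("train.csv")
#   test = pd.read_csv("test.csv")
train = pd.DataFrame({
    "ID": [1, 2],
    "Age": [10, 20],
    "Gender": ["M", "M"],
    "LikeChocolate": [1, 0],
    "Purchase": [1, 0],
})
test = pd.DataFrame({
    "ID": [3],
    "Age": [11],
    "Gender": ["M"],
    "LikeChocolate": [1],
})

predicted = predict_target(train, test)
print(predicted)
```

The returned DataFrame is the test data with the predicted `Purchase` column added, which you could then save with `predicted.to_csv(...)`. Keep in mind these values are the model's predictions, not the true values of the dependent variable.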