Is anyone of this called supervised learning? and is this a correct way to solve to problem

supervised_learning
data_science

#1

Say there are two data sets. One is training data sets and other is test data set. Here we need to model the data using training data sets and validate the same model using test data sets. I hope I am correct till here.
My question here is (according to me), Do not you analyst think there are two ways to solve this???

Problem definition:
Say we want to predict some variable(say Y) in the test data set. The total variables in training data set are 7(Say), that are A,B,C,D,E,F and Y(both categorical and numerical). And total no of variables in test data sets should also be 7, that are A,B,C,D,E,F and Y(both categorical and numerical). But in test data set, we need to predict Y(which is the objective).

Solution: There are 2 solutions(What i feel)

  1. Making use of both train and test data sets(comparing). That is, since I already said that the variables are common in both data sets. We can keep those variables as independent and calculate Y(variable).
    To make it very specific, lets say, in training data set, for some values of A,B,C,D,E,F we got Y value. Like this we can conclude in test data sets that for same values of A,B,C,D,E,F, we can get same Y value. This is one type of solution.

  2. Not comparing both train and test data sets. I mean, just taking train data sets and constructing a model( say regression etc…) and then validating the model using test data sets. This is second type of solution.

i want to know which is called supervised learning. Solution 1 or Solution 2 or None of them?


#2

Hi Vinay
In practice, if you have a look at real life data sets your solution 1 never or rarely holds. It is extremely hard to find exactly the same combination of A,B,C,D,E,F values BOTH in train and test. What you mention as solution 2 is the right approach, where in we build a model using train data and check its performance on the test data.

Supervised Learning is when you know your outcome, in this case, it is Y. Irrespective of the technique you use for modelling, supervised learning always involves partitioning your data set into two or three partitions.
Regards
Manish


#3

Thanks for the feedback.