How does a machine learning algorithm really learn?


I am new to data science and I am not able to understand how a model learns from a data set. When we perform data munging and related steps, we end up eliminating some columns from the data set.
What I think is that we eliminate all the redundant rows and columns that we believe will not contribute to decision making, and to make our model learn we leave the train data with only those rows and columns that we think will give a good result. Whenever I googled this, I ended up with descriptions of specific machine learning algorithms. I need to know the basic idea behind predictive modeling, irrespective of the chosen algorithm.


Hello @khurshidrpvv,

First, please see this link for a live demonstration of how a linear regression model finds the best-fit line.

As you can see, over many iterations the algorithm finds the line that minimizes the sum of squared residuals. Similarly, for decision trees the decision boundaries (splits) are chosen so that the information gain is as large as possible: the groups on each side of a boundary are purer, in terms of the target, than the data was without the boundary (simply said).
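To make the "many iterations" idea concrete, here is a minimal sketch of gradient descent fitting a line. It is only an illustration under stated assumptions: the toy data, the function names (`sum_squared_residuals`, `fit_line`), the learning rate and the iteration count are all made up for this example, and the linked demonstration may use a different procedure.

```python
# Toy training data (made up for illustration): y is roughly 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def sum_squared_residuals(slope, intercept):
    """Cost the algorithm tries to minimize for a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def fit_line(learning_rate=0.01, iterations=5000):
    """Start from a flat line and nudge it downhill on the cost, step by step."""
    slope, intercept = 0.0, 0.0
    history = [sum_squared_residuals(slope, intercept)]
    for _ in range(iterations):
        # Gradient of the sum of squared residuals w.r.t. each parameter.
        grad_slope = sum(-2 * x * (y - (slope * x + intercept)) for x, y in zip(xs, ys))
        grad_intercept = sum(-2 * (y - (slope * x + intercept)) for x, y in zip(xs, ys))
        # Move both parameters a small step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
        history.append(sum_squared_residuals(slope, intercept))
    return slope, intercept, history


slope, intercept, history = fit_line()
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("cost at start:", round(history[0], 3), "cost at end:", round(history[-1], 3))
```

Note that the data never changes during the loop. Only the two parameters (slope and intercept) are adjusted, and the cost in `history` shrinks as the line gets closer to the best fit.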

To give an example, say you are an algorithm ( :slight_smile: ) and you need to cross a forest. You have no idea where to begin, but you know that you need to minimize the time taken. So you start exploring the forest, make a note of all the paths that can be taken, and ultimately choose the path that takes the least time. This is the final path, or the final equation, that you get, and later explorers (new data points) will use it to cross the forest.

So you can say that the model (whichever it is) uses the data to learn the patterns in it and ultimately generates a final solution.
Hope this helps!!
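
For the decision tree case mentioned in this answer, a rough sketch of how one decision boundary could be chosen by information gain is given below. The data (hours studied versus pass/fail) and the helper names (`entropy`, `information_gain`, `best_threshold`) are hypothetical and only for illustration. Real decision tree libraries do more than this, for example splitting repeatedly and sometimes using other purity measures.

```python
import math
from collections import Counter

# Hypothetical training data: hours studied and whether the student passed.
hours = [1, 2, 3, 4, 5, 6]
passed = ["no", "no", "no", "yes", "yes", "yes"]


def entropy(labels):
    """Impurity of a group of labels: 0 when all labels are the same."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy before the split minus the weighted entropy after it."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a boundary that separates nothing gains nothing
    after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - after


def best_threshold(values, labels):
    """Try every candidate boundary and keep the one with the highest gain."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


threshold = best_threshold(hours, passed)
gain = information_gain(hours, passed, threshold)
print("split at hours <=", threshold, "with information gain", gain)
```

On this toy data the boundary at 3 hours separates the two outcomes completely, so it is the split the sketch picks.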


What I understand from the example is that we keep modifying the train data to find different paths across the forest, and the machine automatically chooses the best one. Correct me if I am wrong.
Thank you very much @shuvayan for the nice and simple example.


Hi @khurshidrpvv

Not exactly. As @shuvayan mentioned, you are in the forest, and the forest is a metaphor for the train data. You go through it, tally your results, and then decide on a strategy (a way) to get out of the forest, knowing that the best way is the one with the lowest cost.

In a few words, the train set does not change, but the way you get out of it can change to a certain extent. You cannot change everything, but you can tweak some parameters. For example, imagine (hard!) that you go through your forest by motorbike: do you go for speed or for petrol consumption? With just these two parameters, you now have three strategies:

  1. Less petrol
  2. Maximum speed
  3. Optimum speed and petrol

Strategies 1, 2 and 3 are what we refer to as the objective function in optimisation and machine learning. In the case of linear regression you have only one cost to minimise, which in the metaphor is coming out of the forest by the shortest path. (I have not mentioned this choice before!!)
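
The motorbike strategies can be sketched as three objective functions applied to the same, unchanged set of routes. Everything here is hypothetical and chosen only to illustrate the point: the route names, the hours and litres, the function names, and the weights in `balanced`.

```python
# Hypothetical routes through the forest: (name, hours taken, litres of petrol used).
# This plays the role of the train data and is never modified.
routes = [
    ("fast track", 1.0, 9.0),
    ("slow trail", 3.0, 3.0),
    ("middle road", 1.5, 5.0),
]


def least_petrol(route):
    """Strategy 1: the cost is the petrol used."""
    return route[2]


def least_time(route):
    """Strategy 2: maximum speed, so the cost is the time taken."""
    return route[1]


def balanced(route, time_weight=2.0, petrol_weight=1.0):
    """Strategy 3: a weighted mix of time and petrol (weights are assumptions)."""
    return time_weight * route[1] + petrol_weight * route[2]


def best_route(objective):
    """Pick the route with the lowest cost under the given objective."""
    return min(routes, key=objective)[0]


for objective in (least_petrol, least_time, balanced):
    print(objective.__name__, "->", best_route(objective))
```

The same routes give a different "best" answer depending on which objective is minimised, which is why the choice of objective function matters as much as the data.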

@shuvayan your metaphor of the forest is really good :slight_smile:

Hope this helps