How are data mining and machine learning different?

machine_learning
data_mining

#1

I have a very basic question. Can the same algorithm be used for both data mining and machine learning? For example, I find Apriori and decision trees listed among both data mining and machine learning algorithms.

At the algorithm level, is there any segregation (mutual exclusivity) between data mining and machine learning?

Say Algorithm X is used for data mining
Say Algorithm Y is used for machine learning

Also, do data mining algorithms use statistics? (Please provide an example.)


#3

May I ask how you define / differentiate data mining and machine learning?

This post by Tavish might help you


#4

That is exactly where the confusion lies. I read one of your blog posts, where you say that teaching someone how to dance is machine learning and finding the best dance center in town is data mining. Still, the underlying concept is dance in both cases, right? So how exactly would you differentiate the two?

Also, I can see that most of the statistics-based regression and data mining algorithms are also listed as machine learning algorithms. So can we loosely say that any algorithm which can be automated as a process with little human intervention is machine learning?

Or can we classify the same algorithm (say, a decision tree) as both a data mining and a machine learning algorithm?

Any help would be appreciated.


#5

I will try to explain my view. Other people might give a better opinion as well.

Firstly, there is no hard distinction between statistical modeling / data mining and machine learning, so the same models appear in both.

The difference lies in how they are used. In statistical modeling / data mining, many assumptions must be satisfied, whereas machine learning is largely unconcerned with those assumptions. In data mining, the goodness of a model is also judged by metrics such as p-values, while in machine learning mainly the performance on a validation sample matters. So the same linear regression is part of both; only the way of using it differs.
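To make this concrete, here is a minimal sketch of one linear regression judged both ways. The data is synthetic and the 150/50 split is an arbitrary choice for illustration; the p-value uses a normal approximation to the t distribution, which is only reasonable because the sample is fairly large.

```python
import math
import random
from statistics import NormalDist

random.seed(0)

# Synthetic data (an assumption for illustration): y = 2x + 1 + noise
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0, 1.0) for x in xs]

# Hold out the last 50 points as a validation sample
x_train, y_train = xs[:150], ys[:150]
x_val, y_val = xs[150:], ys[150:]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxx


slope, intercept, sxx = fit_line(x_train, y_train)

# Statistical-modeling view: is the slope significantly different from zero?
n = len(x_train)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x_train, y_train)]
sigma2 = sum(r ** 2 for r in residuals) / (n - 2)  # residual variance
se_slope = math.sqrt(sigma2 / sxx)                 # standard error of the slope
t_stat = slope / se_slope
# Normal approximation to the t distribution (acceptable for n = 150)
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# Machine-learning view: how well does the model predict unseen data?
val_mse = sum((yi - (intercept + slope * xi)) ** 2
              for xi, yi in zip(x_val, y_val)) / len(x_val)

print(f"slope={slope:.3f}, t={t_stat:.1f}, p~{p_value:.3g}")
print(f"validation MSE={val_mse:.3f}")
```

The fitted line is identical in both views. What changes is the question asked of it: whether the coefficient is statistically significant, or whether the predictions hold up on data the model has not seen.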

Hope this helps!


#6

@Rajaram1986

As mentioned by @srk, the difference is in the approach and not in the algorithms. The post from Tavish, which he mentioned, does an awesome job of explaining this difference.

Any problem can be solved in various ways using data. For example, let us take the case of a bank which issues credit cards. Everything they do would be related to credit cards, but here is how different approaches would tackle the same problem:

  1. MIS, or traditional Management Information Systems, would just report things as a matter of fact, likely through a few Excel files circulated within the bank. In this case, you are only interested in knowing how many new cards have been issued.

  2. Business Intelligence / Data Visualization takes this MIS a step further, represents it in a meaningful manner and starts delivering business insights to the end user. This would still be about credit cards! But now you compare last August with this August and see how you have performed, and whether there are opportunities that can help you improve your performance.

  3. Statistical Modeling would look at past trends of consumers and try to reconstruct their behaviour. A common prediction would be how much a customer is likely to spend. You study the assumptions, validate them, and look at statistical measures like p-values before you finalize the model. This is also about credit cards.

  4. A Machine Learning approach would simply say: give the machine a lot of data, build features over it and let the models learn on their own. As a modeler, either I am not interested in the underlying behaviour, or the behaviour I am trying to model is not simple enough to represent with simple mathematical forms. This approach typically makes few, if any, assumptions about the distribution of the data. But this is still about credit cards.

The point is that all of these terms are different approaches to solving the same problems in different ways. Sadly, they have been used loosely in the industry, and hence there is a lot of confusion around their meaning.

Hope this helps.

Regards,
Kunal


#7

Thanks @SRK and @kunal for the detailed explanations.

Regards,
Rajaram


#8

I would like to add to what @SRK and @kunal mentioned here.

“Data Mining”, as the name suggests, is a process of discovering useful patterns or insights in your datasets. These insights or newly-found information allow you to take meaningful action to improve your business or to make something better.

For example, let's say I run a grocery store. I record my daily sales in an Excel sheet. A typical row contains the product I sold, the name of the customer who bought it, the day on which the sale was made, the quantity sold, the sale price, etc.

Now when I look at my Excel sheet after 3 months, I can find out which of my products sell the most, who my most frequent customers are, and which day is the busiest. Based on these facts, I can devise some business strategies to maximize my profit. This is a very basic form of data mining by factual reporting.
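As a rough illustration of this kind of factual reporting, the sketch below counts units per product, visits per customer and sales per day. The rows, names and column layout are made up for the example and stand in for the Excel sheet.

```python
from collections import Counter

# Hypothetical sales rows: (product, customer, weekday, quantity)
sales = [
    ("milk", "Asha", "Mon", 2),
    ("bread", "Ravi", "Mon", 1),
    ("milk", "Ravi", "Sat", 3),
    ("eggs", "Asha", "Sat", 12),
    ("bread", "Asha", "Sat", 2),
    ("milk", "Meena", "Sun", 1),
]

units_by_product = Counter()
visits_by_customer = Counter()
sales_by_day = Counter()

for product, customer, day, qty in sales:
    units_by_product[product] += qty   # total units sold per product
    visits_by_customer[customer] += 1  # number of sales rows per customer
    sales_by_day[day] += 1             # number of sales rows per weekday

top_product = units_by_product.most_common(1)[0]
top_customer = visits_by_customer.most_common(1)[0]
busiest_day = sales_by_day.most_common(1)[0]

print(top_product, top_customer, busiest_day)
```

Nothing here is learned or predicted. It only summarizes what already happened, which is why it counts as the most basic form.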

Using this past data, I could also try to predict which customers are likely to turn up on particular days, or which of my products get sold in combination with each other. I would use certain statistical methods to derive this information, and it would again help me fine-tune my business. This is an improved form of data mining using statistical modeling, where I may discover new information which could not be found by factual reporting. Every statistical model works on certain assumptions, so I would use the statistical model whose assumptions fit my business.
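The "products sold in combination" question can be sketched with two simple statistics, support and confidence, which are the quantities that association-rule methods such as Apriori are built on. This is only the pair-counting step on made-up baskets, not the full Apriori algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the set of products bought together in one purchase
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"milk", "bread"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order
    pair_counts.update(combinations(sorted(basket), 2))

n_baskets = len(baskets)


def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / n_baskets


def confidence(a, b):
    """Fraction of baskets containing a that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print(support("bread", "butter"))     # 3 of 5 baskets
print(confidence("butter", "bread"))  # every butter basket also has bread
print(confidence("bread", "butter"))  # 3 of the 4 bread baskets have butter
```

A rule like "butter implies bread" with high confidence is the sort of new information that plain reporting would not surface.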

Let's say your customers provide you with feedback on their purchases. To gain actionable insights from this feedback, you write a computer program which takes the feedback as input and gives you back a score of the customers' feelings or sentiments about your product. These sentiments can help you understand your customers' needs, and you can take action to satisfy them. As purchases grow across different customers and new feedback keeps coming in, your program gets better at sensing the sentiments of your customers. This is machine learning. Behind the scenes, machine learning also uses statistical techniques, but typically without making strong assumptions about the data.

I would say that machine learning algorithms can be used to do data mining on large data sets, and the mining results improve over time as these algorithms keep learning from experience.
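Here is a toy sketch of such a program, assuming each piece of feedback comes with a positive/negative label (for example from a star rating). The `FeedbackScorer` class and its word-counting rule are hypothetical and far simpler than a real sentiment model; the point is only that the score changes as more labelled feedback is fed in.

```python
from collections import Counter


class FeedbackScorer:
    """Toy sentiment scorer that learns word counts from labelled feedback."""

    def __init__(self):
        self.pos = Counter()  # word counts seen in positive feedback
        self.neg = Counter()  # word counts seen in negative feedback

    def learn(self, text, is_positive):
        target = self.pos if is_positive else self.neg
        target.update(text.lower().split())

    def score(self, text):
        """Return a score in [-1, 1]; 0 means no evidence either way."""
        words = text.lower().split()
        total = 0.0
        for w in words:
            p, n = self.pos[w], self.neg[w]
            if p + n:
                total += (p - n) / (p + n)
        return total / len(words) if words else 0.0


scorer = FeedbackScorer()
before = scorer.score("fresh vegetables")  # nothing learned yet

scorer.learn("fresh vegetables great service", True)
scorer.learn("stale bread rude service", False)

after = scorer.score("fresh vegetables")
print(before, after, scorer.score("stale bread"))
```

Each call to `learn` updates the counts, so the same sentence can receive a different score later on. That is the "learning from experience" part.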


#9

Dear Rajaram1986…
Here is a possible resource:

opendatascience.com

Hope it helps.