Use of Hadoop in Machine Learning



I have started to learn Hadoop. I want to apply machine learning to large datasets using Hadoop. How should I start?
I have installed Hadoop. How should I proceed from here?
I am getting totally confused by all the names out there, like Spark, Mahout, Hive, Pig, etc.


Hi @gau2112

Good luck! Just joking. With Hadoop and machine learning, the problem is access to the data: if you have a massive training set, you will have to read it from HDFS, and this takes time. So the best way is to use Spark, and if you work in R, to use SparkR. The RDD is held in memory, and even though it is immutable, duplicating it is fast. Check their ML library (MLlib) as well, but for that you should use Python. The Spark documentation is very good too.
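
To make this a bit more concrete, here is a minimal sketch of training a model with Spark's ML library from Python. It assumes PySpark is installed and that your training data is a CSV file with a header row, numeric feature columns and a column named `label`. The helper names `feature_columns` and `train_logistic_regression`, and the choice of logistic regression, are my own assumptions for illustration, so adapt them to your data.

```python
def feature_columns(columns, label_col="label"):
    """Return every column name except the label column, in original order."""
    return [c for c in columns if c != label_col]


def train_logistic_regression(path, label_col="label", max_iter=20):
    """Fit a logistic regression on a CSV file readable by Spark.

    `path` can be a local path or an HDFS URI. The CSV layout (header row,
    numeric features, one label column) is an assumption of this sketch.
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("ml-sketch").getOrCreate()

    # Read the CSV and let Spark infer the column types.
    df = spark.read.csv(path, header=True, inferSchema=True)

    # Spark ML expects all features packed into a single vector column.
    assembler = VectorAssembler(
        inputCols=feature_columns(df.columns, label_col),
        outputCol="features",
    )

    # cache() asks Spark to keep the prepared data around between passes.
    train = assembler.transform(df).select("features", label_col).cache()

    lr = LogisticRegression(
        featuresCol="features", labelCol=label_col, maxIter=max_iter
    )
    return lr.fit(train)
```

You would call it with something like `train_logistic_regression("hdfs:///path/to/train.csv")`, where the path is only a placeholder for your own data. The `cache()` call asks Spark to keep the training set in memory (spilling to disk if it does not fit), so the iterations of the optimizer do not have to reread it from HDFS each time.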

Best regards