Hadoop clusters and node setup in a real office environment



Hi, I recently completed a Big Data certification covering Hadoop and MapReduce.

But I still lack clarity on how a distributed system works in practice.

Mainly the DataNode and NameNode, and the JobTracker and TaskTracker (how do these four roles map onto physical machines and storage?)

For example:
Do the JobTracker and NameNode run on different systems or on the same system?
What is the ideal cluster size, and how many systems are in a cluster?
What size are those systems (hard disk, RAM)?

I would be grateful if anyone could give input on this to help me understand how a distributed system works in practice.


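In a production cluster the JobTracker usually doesn't run on the NameNode machine; it runs on a separate node (small or test clusters sometimes put both on one machine). The JobTracker talks to the NameNode to find the location of the data, then picks the best TaskTracker (TaskTrackers run on all DataNodes) to execute each task according to data locality.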
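To make the data-locality idea concrete, here is a minimal toy sketch in Python. It is not Hadoop's API: the function `pick_tasktracker`, the node names, the block IDs and the slot counts are all made up for illustration. It only shows the basic preference for a node that already stores the block; as far as I know the real scheduler also takes rack locality into account.

```python
# Toy model of locality-aware task placement. Hypothetical names throughout;
# this is NOT Hadoop's actual API, just an illustration of the idea.

def pick_tasktracker(block_hosts, free_slots):
    """Return a node for a task, preferring a node that stores the block."""
    # Data-local choice: a node that holds a replica and has a free slot.
    for host in block_hosts:
        if free_slots.get(host, 0) > 0:
            return host
    # Fallback: any node with a free slot (the data is then read over the network).
    for host, slots in free_slots.items():
        if slots > 0:
            return host
    return None  # no capacity anywhere right now


# What the NameNode would report: block -> DataNodes holding a replica (made-up data).
block_locations = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node2", "node3", "node4"],
}
# Free task slots per TaskTracker (made-up data); node1 is busy.
free_slots = {"node1": 0, "node2": 1, "node3": 1, "node4": 1}

assignments = {}
for block, hosts in block_locations.items():
    node = pick_tasktracker(hosts, free_slots)
    assignments[block] = node
    if node is not None:
        free_slots[node] -= 1  # the chosen slot is now occupied

print(assignments)
```

Here node1 holds a replica of blk_1 but has no free slot, so the task goes to node2, which also holds a replica. The task for blk_2 then lands on node3, because node2's only slot was just taken.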

The number of systems in the cluster depends entirely on your application. You can have 10 slave nodes, for example, but maintenance matters: you need to plan for the failure of a slave node.

These machines are typically large, for example 16 GB of RAM and 6 TB of disk, or even more than that.
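As a rough way to turn this into a node count, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: 100 TB of data, 6 TB of disk per node (reading the 6 TB figure above as disk size), 75% of each disk usable for HDFS, and a replication factor of 3, which is the HDFS default. The function name `datanodes_needed` is made up.

```python
import math

def datanodes_needed(data_tb, replication, disk_per_node_tb, usable_fraction):
    """Estimate how many DataNodes are needed to hold the data in HDFS."""
    raw_tb = data_tb * replication                      # every block is stored `replication` times
    usable_per_node_tb = disk_per_node_tb * usable_fraction  # leave room for OS, logs, temp data
    return math.ceil(raw_tb / usable_per_node_tb)

# Assumed example figures, not a recommendation.
nodes = datanodes_needed(data_tb=100, replication=3,
                         disk_per_node_tb=6, usable_fraction=0.75)
print(nodes)  # 67
```

This covers storage only. CPU, RAM and expected data growth would push the real number up or down, which is why the answer depends so much on the application.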