Apache Hadoop, HBase and Nutch component distribution for a 4-server cluster

Say you have 4 nodes: n1, n2, n3 and n4. You can install Hadoop and HBase in distributed mode. If you are using Hadoop 1.x:

n1 - Hadoop master [NameNode and JobTracker]
n2, n3 and n4 - Hadoop slaves [DataNodes and TaskTrackers]
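
As a rough sketch of the Hadoop 1.x layout above (taking the slaves to be n2, n3 and n4), the role assignment can be written down as a small map and used to produce the contents of Hadoop's conf/slaves file, which lists one worker hostname per line. The names HADOOP1_ROLES and slaves_file are hypothetical helpers for illustration, not part of Hadoop.

```python
# Hypothetical role map for the Hadoop 1.x layout described above.
HADOOP1_ROLES = {
    "n1": ["NameNode", "JobTracker"],
    "n2": ["DataNode", "TaskTracker"],
    "n3": ["DataNode", "TaskTracker"],
    "n4": ["DataNode", "TaskTracker"],
}


def slaves_file(roles):
    """Return text for conf/slaves: one worker hostname per line."""
    workers = sorted(host for host, r in roles.items() if "DataNode" in r)
    return "\n".join(workers) + "\n"


if __name__ == "__main__":
    # Prints the hostnames that would go into conf/slaves on the master.
    print(slaves_file(HADOOP1_ROLES), end="")
```

The hostnames n1 to n4 are the placeholders from the answer; in a real cluster they would be resolvable hostnames.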

For HBase, you can choose n1 or any other node as the Master node. Since Master nodes are usually not CPU- or memory-intensive, all Masters can be deployed on a single node in a test setup. In production, however, it is better to deploy each Master on a separate node.

Let's say n2 is the HBase Master; the remaining 3 nodes can act as RegionServers.

Hive and Nutch can reside on any node. Hope this helps; for a test setup this should be good to go.

Update -

For Hadoop 2.x, since your cluster is small, a NameNode HA deployment can be skipped. NameNode HA would require two nodes, one each for the active and the standby NameNode.

It would also need a ZooKeeper quorum, which should have an odd number of nodes, so a minimum of three nodes would be required.

A JournalNode quorum likewise requires a minimum of 3 nodes.
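
The three-node minimum follows from majority voting, which both ZooKeeper and the JournalNodes rely on. A minimal sketch of the arithmetic (the function names are made up for illustration):

```python
def quorum_size(n):
    """Smallest majority of n voting members."""
    return n // 2 + 1


def tolerated_failures(n):
    """Members that can fail while a majority is still available."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

With 1 or 2 members no failure can be tolerated, 3 members tolerate one failure, and a 4th member adds nothing over 3, which is why odd sizes starting at three are the usual choice.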

But for a cluster this small, HA might not be a major concern. So you can keep:

n1 - NameNode

n2 - ResourceManager (YARN)

The remaining nodes can act as DataNodes. Try not to deploy anything else on the YARN node.
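
A small sketch of this Hadoop 2.x layout, with a check that the YARN node stays dedicated. The map and the helper crowded_hosts are hypothetical; placing a NodeManager next to each DataNode is an assumption that the answer does not state.

```python
# Hypothetical role map for the non-HA Hadoop 2.x layout described above.
HADOOP2_ROLES = {
    "n1": ["NameNode"],
    "n2": ["ResourceManager"],
    # Assumption: each DataNode host also runs a YARN NodeManager.
    "n3": ["DataNode", "NodeManager"],
    "n4": ["DataNode", "NodeManager"],
}


def crowded_hosts(roles, dedicated="ResourceManager"):
    """Return hosts that run the dedicated role alongside other roles."""
    return [
        host
        for host, r in sorted(roles.items())
        if dedicated in r and len(r) > 1
    ]


if __name__ == "__main__":
    # An empty list means nothing else is deployed on the YARN node.
    print(crowded_hosts(HADOOP2_ROLES))
```

Adding another service to n2 in the map would make the check report that host, which is a quick way to spot a layout that puts extra load on the ResourceManager node.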

The rest of the deployment for HBase, Hive and Nutch remains the same.
