Sunday, 11 October 2015

Setting up Apache ZooKeeper Cluster

This tutorial provides step-by-step instructions to configure and start up an Apache ZooKeeper 3.4.6 multi-node cluster.

Pre-requisites for ZooKeeper

The first thing we need in order to install Apache ZooKeeper is a set of machines.

In this tutorial, we will use the following virtual machines to install Apache ZooKeeper:

Parameter Name      Server 1       Server 2       Server 3
Host Name           node1          node2          node3
IP Address          192.168.0.1    192.168.0.2    192.168.0.3
Operating System    Ubuntu         Ubuntu         Ubuntu
No. of CPU Cores    4              4              4

Apart from the machines above, please ensure that the following pre-requisites have been fulfilled so that you can follow this article without any issues:
  1. JDK 6 or higher installed on all the virtual machines
  2. JAVA_HOME variable set to the path where the JDK is installed
  3. Root access on all the virtual machines, as all the steps should ideally be performed by the root user
  4. Updated /etc/hosts file on all the Servers so that the host names above resolve to their IP addresses
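For the last pre-requisite, the entries can be derived from the table of machines above. The following is a minimal sketch, not a definitive listing; HOSTS_FILE is a hypothetical variable introduced only for this example so that it can be tried without touching the real file.

```shell
# Assumption: HOSTS_FILE is a hypothetical variable for this sketch. It defaults
# to a scratch file; set HOSTS_FILE=/etc/hosts (as root) to apply it for real.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# Map each IP address from the table above to its host name.
cat >> "$HOSTS_FILE" <<'EOF'
192.168.0.1 node1
192.168.0.2 node2
192.168.0.3 node3
EOF
```

The same three lines would go on every Server, so that each node can reach the others by name.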

Installing Apache ZooKeeper

The first step to install Apache ZooKeeper is to download its binaries on all the Servers. In this article, we will be installing Apache ZooKeeper 3.4.6, which can be downloaded from the Apache ZooKeeper releases page.

Once the binaries have been downloaded on the Servers, you can extract them to the directory where you would like ZooKeeper to be installed. We will refer to this directory as $ZOOKEEPER_HOME throughout this tutorial.
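The extraction step could be scripted roughly as follows. This is only a sketch: extract_zookeeper is a hypothetical helper name, and the paths in the usage comment are assumptions. It relies on the release tarball containing a single top-level folder (zookeeper-3.4.6/), which is stripped so that the target directory itself becomes $ZOOKEEPER_HOME.

```shell
# extract_zookeeper TARBALL TARGET_DIR
# Unpacks a ZooKeeper release tarball into TARGET_DIR, dropping the
# top-level folder inside the archive.
extract_zookeeper() {
    tarball="$1"
    target="$2"
    mkdir -p "$target" &&
    tar -xzf "$tarball" -C "$target" --strip-components=1
}

# Example usage (paths are assumptions, adjust to your layout):
#   extract_zookeeper /tmp/zookeeper-3.4.6.tar.gz /opt/zookeeper
#   export ZOOKEEPER_HOME=/opt/zookeeper
```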

Configuring Multi-node Cluster
Once Apache ZooKeeper has been extracted on all the Servers, the next step is to configure them.
We don't need to mark any node as the leader during configuration, as the leader is chosen automatically by the ZooKeeper service. So, the configuration for all the nodes will be the same.

The first part of the configuration involves creating or updating a configuration file called zoo.cfg in the $ZOOKEEPER_HOME/conf directory with the following contents:

ZooKeeper Configuration - $ZOOKEEPER_HOME/conf/zoo.cfg

# Where you would like ZooKeeper to save its data
# Where you would like ZooKeeper to log
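As a minimal sketch of a complete file covering the properties discussed in this tutorial, something like the following could be used. The tickTime, initLimit and syncLimit values are taken from the sample configuration shipped with ZooKeeper, and the two directory paths are hypothetical placeholders; none of these values are confirmed by this article.

```shell
# Assumption: fall back to a scratch directory when ZOOKEEPER_HOME is not set,
# so the sketch can be tried without touching a real installation.
ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-$(mktemp -d)}"
mkdir -p "$ZOOKEEPER_HOME/conf"

cat > "$ZOOKEEPER_HOME/conf/zoo.cfg" <<'EOF'
# Basic time unit in milliseconds (value from ZooKeeper's sample config)
tickTime=2000
# Ticks a follower may take to connect and sync to the leader initially
initLimit=10
# Ticks allowed between a request to the leader and its acknowledgement
syncLimit=5
# Where you would like ZooKeeper to save its data (hypothetical path)
dataDir=/var/lib/zookeeper/data
# Where you would like ZooKeeper to log (hypothetical path)
dataLogDir=/var/lib/zookeeper/log
# Port on which clients connect
clientPort=2181
# One entry per node: server.<id>=<hostname>:<port1>:<port2>
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
EOF
```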

The first thing you need to do in the above zoo.cfg file is to replace the values of dataDir and dataLogDir with the directories where you would like ZooKeeper to save its data and its log respectively. Now, let's talk about some of the important parts of the above configuration.

The clientPort property, as the name suggests, is the port on which clients connect to the ZooKeeper service.

Next, let's talk about the entries in server.x=hostname:port1:port2 format, one for each node.

Firstly, there are two port numbers, port1 (2888) and port2 (3888). Followers use the first to connect to the leader, and the second is used for leader election.
Secondly, x in server.x denotes the id of the node.
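To make the format concrete, here is a small sketch that splits one such entry into its parts. describe_server_entry is a hypothetical helper written for this illustration and is not part of ZooKeeper; it assumes the plain hostname:port1:port2 form described above.

```shell
# describe_server_entry LINE
# Splits one "server.x=hostname:port1:port2" line into its parts.
describe_server_entry() {
    entry="$1"
    key="${entry%%=*}"       # text before "=",  e.g. server.1
    value="${entry#*=}"      # text after "=",   e.g. node1:2888:3888
    id="${key#server.}"      # the x in server.x
    host="${value%%:*}"      # host name before the first ":"
    ports="${value#*:}"      # port1:port2
    port1="${ports%%:*}"     # followers connect to the leader on this port
    port2="${ports#*:}"      # used for leader election
    echo "id=$id host=$host follower_port=$port1 election_port=$port2"
}

describe_server_entry "server.1=node1:2888:3888"
# prints: id=1 host=node1 follower_port=2888 election_port=3888
```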

Each server.x row must have a unique id. Each server is assigned its id by creating a file named myid, one for each server, which resides in that server's data directory, as specified by the configuration file parameter dataDir.

The myid file consists of a single line containing only the text of that machine's id. So the myid file of server 1 would contain the text 1 and nothing else.

The id must be unique within the ensemble and should have a value between 1 and 255.
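Creating the myid file can be sketched as below. write_myid is a hypothetical helper name, and the path in the usage comment is an assumption; pass whatever directory you configured as dataDir. The helper only checks the 1 to 255 range mentioned above. Uniqueness across the ensemble still has to be ensured by using a different id on each Server.

```shell
# write_myid ID DATA_DIR
# Writes the node id into DATA_DIR/myid after checking it is in the 1..255 range.
write_myid() {
    id="$1"
    data_dir="$2"
    case "$id" in
        ''|*[!0-9]*) echo "id must be a number" >&2; return 1 ;;
    esac
    if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
        echo "id must be between 1 and 255" >&2
        return 1
    fi
    # The file holds a single line with nothing but the id.
    mkdir -p "$data_dir" && printf '%s\n' "$id" > "$data_dir/myid"
}

# Example usage on node1 (the path is an assumption; use your dataDir):
#   write_myid 1 /var/lib/zookeeper/data
```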

Starting Up Multi-node Cluster

Once you are all set up, the next step is to start the cluster.

On all the Servers, go to the bin directory of Apache ZooKeeper and execute the following command:

$ZOOKEEPER_HOME/bin on all machines
./zkServer.sh start

You can execute the following command to check the status of Apache ZooKeeper:

$ZOOKEEPER_HOME/bin on all machines
./zkServer.sh status
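In a healthy three-node ensemble, one node should report itself as the leader and the other two as followers. The sketch below pulls just that role out of the status output. zk_mode is a hypothetical helper, and it assumes the output contains a line such as "Mode: leader" or "Mode: follower", as the 3.4.x scripts print.

```shell
# zk_mode: reads "zkServer.sh status" output on stdin and prints the node's role.
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Example usage from $ZOOKEEPER_HOME/bin:
#   ./zkServer.sh status 2>&1 | zk_mode
```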

Stopping Multi-node Cluster

In order to stop Apache ZooKeeper, execute the following command on all the Servers:

$ZOOKEEPER_HOME/bin on all machines
./zkServer.sh stop
