Summary
When ZooKeeper is configured as a multi-node ensemble, Clockspring cannot successfully connect and register with ZooKeeper unless the ZooKeeper ensemble has quorum.
In a standard 3-node ZooKeeper ensemble, at least 2 ZooKeeper nodes must be online and participating. If only 1 of the 3 configured ZooKeeper nodes is running, ZooKeeper will not have quorum. In that state, Clockspring may be able to reach the ZooKeeper port, but ZooKeeper is not actually available to service cluster coordination requests.
This commonly appears during Clockspring startup as a leader election failure or a `ConnectionLoss` error.
Why ZooKeeper Requires a Majority
ZooKeeper uses a quorum model to make sure the ensemble agrees on cluster state.
For ZooKeeper to safely serve requests, more than half of the configured voting nodes must be available. This prevents two separated groups of ZooKeeper nodes from each acting as the valid ensemble (a split-brain condition).
In a 3-node ensemble, the majority is 2 nodes:
3 configured ZooKeeper nodes -> 2 required for quorum
If only 1 of the 3 nodes is running or reachable, that node cannot know whether it is truly alone or whether it has been separated from the rest of the ensemble by a network problem. To avoid serving inconsistent cluster state, a ZooKeeper node without quorum refuses to serve client requests.
That is why a single running ZooKeeper node is not enough when ZooKeeper is configured with 3 `server.X` entries.
For Clockspring, the result is simple: without ZooKeeper quorum, Clockspring cannot reliably perform cluster registration or leader election.
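The majority rule above can be written as a small calculation. This is a minimal sketch for illustration only; the function names `quorum_size` and `has_quorum` are made up for this example and are not part of ZooKeeper or Clockspring.

```python
def quorum_size(configured_voters: int) -> int:
    """Smallest number of voting nodes that forms a strict majority."""
    if configured_voters < 1:
        raise ValueError("an ensemble needs at least one voting node")
    return configured_voters // 2 + 1


def has_quorum(configured_voters: int, reachable_voters: int) -> bool:
    """True when the reachable nodes are a strict majority of the configured ones."""
    return reachable_voters >= quorum_size(configured_voters)


if __name__ == "__main__":
    # Walk through the 3-node case described in this article.
    for reachable in (3, 2, 1, 0):
        print(f"3 configured, {reachable} reachable -> quorum: {has_quorum(3, reachable)}")
```

The same arithmetic shows why even-sized ensembles add no fault tolerance: 4 configured nodes need 3 for quorum, so they tolerate only one failure, the same as 3 nodes.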
Symptoms
During Clockspring startup, `application.log` may show warnings similar to:
WARN [main] o.a.n.f.c.l.z.CuratorLeaderElectionManager Unable to determine the Elected Leader for role 'Cluster Coordinator'; assuming no leader has been elected
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator
The ZooKeeper log may show:
WARN [NIOWorkerThread-1:o.a.z.s.NIOServerCnxn@391] - Close of session 0x0
java.io.IOException: ZooKeeperServer not running
Root Cause
This usually means ZooKeeper does not have quorum.
For Clockspring clustered deployments, ZooKeeper is configured as a 3-node ensemble by default, using entries similar to:
server.1=<zookeeper-node-1>:2888:3888
server.2=<zookeeper-node-2>:2888:3888
server.3=<zookeeper-node-3>:2888:3888
When ZooKeeper is configured this way, it expects to participate as part of a multi-node ensemble. A single running ZooKeeper node is not enough.
If only 1 ZooKeeper node is running, ZooKeeper does not have quorum. Without quorum, ZooKeeper is not fully available, and Clockspring cannot use it for cluster coordination, leader election, or node registration.
The important point is this:
A ZooKeeper process being up does not mean the ZooKeeper ensemble is healthy. If the ensemble lacks quorum, Clockspring cannot use it.
Why This Breaks Clockspring
Clockspring relies on ZooKeeper for clustered coordination. During startup, Clockspring attempts to connect to ZooKeeper and participate in cluster leader election.
If ZooKeeper does not have quorum, Clockspring cannot reliably create or read the expected coordination paths, such as:
/nifi/leaders/Cluster Coordinator
As a result, Clockspring logs a `ConnectionLossException` and reports that it cannot determine the elected leader.
This is not usually a Clockspring application issue. It is usually a ZooKeeper ensemble availability issue.
How to Confirm
Check the ZooKeeper configuration and determine how many `server.X` entries are configured.
Example:
server.1=<zookeeper-node-1>:2888:3888
server.2=<zookeeper-node-2>:2888:3888
server.3=<zookeeper-node-3>:2888:3888
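If the configuration is long or has commented-out entries, counting the `server.X` lines can be scripted. The following is a rough sketch, not an official tool: the function name `parse_server_entries` and the hostnames in the sample are assumptions for illustration, and the sketch does not distinguish voting members from observers.

```python
import re

# Matches active lines such as "server.1=zk1.example.com:2888:3888".
# Commented-out lines start with "#" and therefore do not match.
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*(\S+)")


def parse_server_entries(config_text: str) -> dict[int, str]:
    """Return {server id: address spec} for each active server.X line."""
    entries = {}
    for line in config_text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            entries[int(match.group(1))] = match.group(2)
    return entries


if __name__ == "__main__":
    # Hypothetical configuration text; paste the contents of your own file here.
    sample = """\
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
#server.4=old-node.example.com:2888:3888
server.3=zk3.example.com:2888:3888
"""
    servers = parse_server_entries(sample)
    print(f"{len(servers)} ensemble members configured: {sorted(servers)}")
```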
Then confirm how many of those ZooKeeper nodes are actually running and joined to the ensemble.
The clearest sign of this issue is usually in the ZooKeeper logs under:
/opt/zookeeper/logs
When starting one ZooKeeper node while the other configured ensemble members are down or unreachable, the running node may log warnings like:
WARN [QuorumConnectionThread-[myid=1]-1:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 2 at election address cluster2.example.com/10.x.x.x:3888
java.net.ConnectException: Connection refused
And:
WARN [QuorumConnectionThread-[myid=1]-2:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 3 at election address cluster3.example.com/10.x.x.x:3888
java.net.ConnectException: Connection refused
You may also see ZooKeeper repeatedly attempting leader election:
INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@997] - Notification time out: 400 ms
These messages mean the local ZooKeeper node is trying to contact the other configured ensemble members but cannot reach them on the election port, usually `3888`.
In a 3-node ensemble, this is enough to explain the Clockspring startup failure. One ZooKeeper node may be running and listening on port `2181`, but without at least one additional ZooKeeper node available, the ensemble does not have quorum.
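When the log is large, the unreachable peers can be pulled out with a short script. This sketch assumes the warning text has the same wording as the examples in this article; the function name `unreachable_peers` is hypothetical, and the exact message format may differ between ZooKeeper versions.

```python
import re

# Pattern based on the QuorumCnxManager warning quoted in this article.
UNREACHABLE = re.compile(r"Cannot open channel to (\d+) at election address (\S+)")


def unreachable_peers(log_lines):
    """Return {peer id: election address} for peers the local node could not reach."""
    peers = {}
    for line in log_lines:
        match = UNREACHABLE.search(line)
        if match:
            peers[int(match.group(1))] = match.group(2)
    return peers


if __name__ == "__main__":
    # Sample lines copied from the warnings shown in this article.
    sample = [
        "WARN [QuorumConnectionThread-[myid=1]-1:o.a.z.s.q.QuorumCnxManager@401] - "
        "Cannot open channel to 2 at election address cluster2.example.com/10.x.x.x:3888",
        "java.net.ConnectException: Connection refused",
    ]
    for peer_id, address in unreachable_peers(sample).items():
        print(f"peer {peer_id} unreachable at {address}")
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.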
For a 3-node ensemble:
| Configured ZooKeeper Nodes | Running / Reachable ZooKeeper Nodes | Quorum? | Expected Result |
| --- | --- | --- | --- |
| 3 | 3 | Yes | Healthy |
| 3 | 2 | Yes | Healthy enough to operate |
| 3 | 1 | No | Clockspring cannot register reliably |
| 3 | 0 | No | ZooKeeper unavailable |
The key point is that a local ZooKeeper process can be running but still not be usable by Clockspring. If the logs show that ZooKeeper cannot open election channels to the other configured nodes, and fewer than a majority of the ensemble members are reachable, ZooKeeper does not have quorum.
In that state, Clockspring may show `ConnectionLossException` errors because ZooKeeper is not available for leader election or cluster registration.
Resolution
Start enough ZooKeeper nodes to establish quorum and confirm that the ZooKeeper nodes can communicate with each other.
For the default 3-node Clockspring ZooKeeper ensemble, at least 2 of the 3 ZooKeeper nodes must be running and able to reach each other.
This is important: it is not enough for the ZooKeeper service to be running locally. The ZooKeeper nodes must also be able to communicate with the other configured ensemble members.
In a typical ZooKeeper ensemble, the relevant ports are:
2181 - Client connections from Clockspring to ZooKeeper
2888 - ZooKeeper peer communication
3888 - ZooKeeper leader election
If firewall rules, host firewalls, network ACLs, or security groups block traffic between ZooKeeper nodes on the quorum/election ports, or if routing or DNS problems prevent the nodes from finding each other, ZooKeeper may fail to establish quorum even though the local ZooKeeper process is running.
For a 3-node ensemble, confirm the following:
Clockspring nodes can reach ZooKeeper on port 2181.
ZooKeeper nodes can reach each other on port 2888. Only the current leader listens on this port, so a refused connection to a follower is expected.
ZooKeeper nodes can reach each other on port 3888.
At least 2 of the 3 configured ZooKeeper nodes are running and reachable.
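The port checks above can be scripted when `nc` or `telnet` is not available. This is a minimal sketch using a plain TCP connect; the function names `port_open` and `report` are assumptions for this example. A successful connect only shows that something accepted the connection, not that ZooKeeper has quorum, and port 2888 is normally open only on the current leader.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def report(hosts, ports=(2181, 2888, 3888)):
    """Return 'host:port open' or 'host:port unreachable' lines for each pair."""
    lines = []
    for host in hosts:
        for port in ports:
            state = "open" if port_open(host, port) else "unreachable"
            lines.append(f"{host}:{port} {state}")
    return lines
```

Run it from each ZooKeeper node against the other ensemble members, for example `print("\n".join(report(["<zookeeper-node-2>", "<zookeeper-node-3>"])))` with the placeholders replaced by the hosts from your `server.X` entries.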
Once quorum is established, Clockspring should be able to connect to ZooKeeper, participate in leader election, and complete cluster registration.
Additional ZooKeeper Health Check
You can also verify whether a ZooKeeper node is actually serving requests by running the `srvr` four-letter command against the local ZooKeeper client port.
Run this from one of the ZooKeeper nodes:
echo srvr | nc localhost 2181
If ZooKeeper is running but does not have quorum, or is otherwise not ready to serve requests, you may see:
This ZooKeeper instance is not currently serving requests
That means ZooKeeper is not healthy from a client perspective. Even if the ZooKeeper process is running and port `2181` is open, Clockspring will not be able to use that ZooKeeper node for cluster coordination while it is in this state.
If ZooKeeper is healthy and serving requests, the command should return ZooKeeper server details similar to:
Zookeeper version: <version>, built on <date>
Latency min/avg/max: 0/0.0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x40000018a
Mode: follower
Node count: 11
The important indicators are:
Zookeeper version: ...
Mode: follower
or:
Zookeeper version: ...
Mode: leader
If the command returns the ZooKeeper version and a mode of `leader` or `follower`, the ZooKeeper node is serving requests.
If the command returns:
This ZooKeeper instance is not currently serving requests
then ZooKeeper is not ready, and Clockspring should not be expected to register successfully.
This check is more useful than only confirming that the ZooKeeper process is running. A running process does not prove that ZooKeeper has quorum or is able to serve Clockspring requests.
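The same check can be automated, for example in a startup or monitoring script. The sketch below sends the `srvr` command over a TCP socket and looks for the `Mode:` line described above. The function names `four_letter_word` and `serving_mode` are hypothetical, and the sketch assumes `srvr` is permitted by the ZooKeeper four-letter-command whitelist on your nodes.

```python
import socket


def four_letter_word(host: str, port: int, command: str = "srvr", timeout: float = 3.0) -> str:
    """Send a four-letter command to a ZooKeeper client port and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break  # the server closes the connection after replying
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def serving_mode(reply: str):
    """Return the value of the 'Mode:' line, or None if the reply has no such line."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

A call such as `serving_mode(four_letter_word("localhost", 2181))` returns `leader` or `follower` on a serving node and `None` when the reply is the "not currently serving requests" message.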
Important Notes
Do not stop troubleshooting after confirming that Clockspring can reach ZooKeeper on port `2181`.
That only proves that something is listening on the client port. It does not prove that ZooKeeper is healthy or that the ensemble has quorum.
For this issue, also confirm:
At least 2 of the 3 ZooKeeper nodes are running.
The ZooKeeper nodes can reach each other on ports 2888 and 3888.
The ZooKeeper node returns a valid response to `echo srvr | nc localhost 2181`.
If ZooKeeper is configured as a 3-node ensemble and only one node is available, Clockspring should be expected to fail cluster registration. The ZooKeeper process may be running, but the ensemble is not in a usable state.
Bottom Line
If ZooKeeper is configured as a 3-node ensemble, one running ZooKeeper node is not enough.
Clockspring requires ZooKeeper to be operational, and ZooKeeper requires quorum. With 3 configured ZooKeeper nodes, at least 2 must be running and joined to the ensemble before Clockspring can successfully register and participate in clustering.