Summary
Nodes in a Clockspring cluster can go offline due to maintenance, crashes, or network issues. This article explains what happens when a node is lost, how the cluster behaves, how failover works, and what to expect when the node returns. Most importantly: you do not lose FlowFiles during an outage.
1. Types of Node Loss
Clockspring distinguishes between two types of node loss:
A. Graceful Shutdown
The service is intentionally stopped (restart, patching, maintenance).
B. Unexpected Failure
Power loss, OS crash, network drop, hardware issue, or forced kill.
These two scenarios behave very differently.
2. Graceful Shutdown Behavior
When a node is stopped cleanly:
Clockspring automatically offloads all queued FlowFiles from that node
Remaining nodes take over the work
No FlowFiles remain stranded
No data is lost
This is the safest and preferred way to remove a node for maintenance. FlowFiles that are being processed at the time of shutdown continue to be processed.
Primary Node Behavior
If the node being shut down is the Primary Node:
A new primary is automatically elected
Primary-only processors resume on the new primary
No manual intervention is needed.
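The graceful-shutdown sequence described above can be pictured with a small toy model. This is only an illustrative sketch, not Clockspring code: the `Node` class, the `graceful_stop` function, the round-robin distribution, and the "first survivor becomes primary" rule are all assumptions made for the example, since this article does not describe how the product distributes offloaded FlowFiles or elects a primary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node (not a Clockspring type)."""
    name: str
    queue: list = field(default_factory=list)  # FlowFiles queued on this node
    running: bool = True


def graceful_stop(nodes, name, primary):
    """Stop `name` cleanly: drain its queue to the other running nodes.

    Returns the name of the primary node after the stop.
    """
    leaving = next(n for n in nodes if n.name == name)
    survivors = [n for n in nodes if n.running and n.name != name]
    if not survivors:
        raise RuntimeError("no running node left to receive offloaded FlowFiles")

    # Offload: hand each queued FlowFile to a surviving node.
    # Round-robin is an assumption made for this sketch.
    for i, flowfile in enumerate(leaving.queue):
        survivors[i % len(survivors)].queue.append(flowfile)
    leaving.queue = []
    leaving.running = False

    # If the stopped node was primary, a new primary takes over.
    # Picking the first survivor is an assumption made for this sketch.
    return survivors[0].name if primary == name else primary


nodes = [Node("node-1", ["a", "b", "c"]), Node("node-2", ["d"]), Node("node-3")]
primary = graceful_stop(nodes, "node-1", primary="node-1")

for n in nodes:
    print(n.name, "running" if n.running else "stopped", n.queue)
print("primary:", primary)
```

In this model nothing is left on the stopped node, the total number of queued FlowFiles is unchanged, and the primary role moves without any extra step.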
3. Unexpected Node Failure Behavior
If a node goes down without warning:
FlowFiles on that node remain on that node’s disk
They are not redistributed
Other nodes continue processing their own work
The UI still shows the total queue count
A new primary is elected if needed
No data is lost.
All FlowFiles remain intact on disk until the node returns. When the node returns, processing of those FlowFiles resumes where it left off.
What cannot happen
Offload cannot occur (the node isn’t running)
Other nodes cannot access the missing node’s queues
FlowFiles cannot be redistributed automatically
This is expected and safe behavior.
4. Recovery: What Happens When the Node Comes Back
When the failed node comes back online:
It immediately rejoins the cluster
FlowFiles stored on that node become available again
Processing resumes exactly where it left off
Any primary-only processors rejoin the scheduler
There is no manual fix required unless corruption or disk failure occurred.
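The crash-and-recovery behavior from sections 3 and 4 can be sketched the same way. Again, this is a toy model under stated assumptions rather than product code: `Node`, `crash`, `recover`, `total_queued`, and `processable` are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node (not a Clockspring type)."""
    name: str
    queue: list = field(default_factory=list)  # FlowFiles on this node's disk
    running: bool = True


def crash(node):
    """Unexpected failure: the node stops and no offload takes place."""
    node.running = False  # the queue is deliberately left untouched


def recover(node):
    """The node returns; its FlowFiles become available again."""
    node.running = True


def total_queued(nodes):
    """Cluster-wide queue count, including FlowFiles held by a down node."""
    return sum(len(n.queue) for n in nodes)


def processable(nodes):
    """FlowFiles that can be worked on right now (running nodes only)."""
    return sum(len(n.queue) for n in nodes if n.running)


nodes = [Node("node-1", ["a", "b"]), Node("node-2", ["c"])]

crash(nodes[0])
during_outage = (total_queued(nodes), processable(nodes))

recover(nodes[0])
after_return = (total_queued(nodes), processable(nodes))

print("during outage:", during_outage)
print("after return: ", after_return)
```

The total count never changes across the outage; only the number of FlowFiles that can currently be processed drops while the node is down, which is why a high queue count during an outage is not a sign of data loss.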
5. How Failover Works
Primary Node Failover
If the primary node goes down:
A new primary is selected
Processors configured for “Primary Node Only” start running on the new primary
Processor Execution Mode
Execution mode does not move FlowFiles.
If a processor was running on one node:
It stops when that node stops
Its work resumes only when the node returns (unless another node was also running it)
6. When You Should Take Action
You only need to intervene if:
A node will NOT return
You must manually offload its queues (on each affected connection) to avoid leaving stranded FlowFiles.
Otherwise, simply removing the node will orphan that data.
The node is stuck in a bad health state
Try:
restarting Clockspring
rebooting the host
checking local disk space and repo health
Work is unevenly distributed and won’t self-correct
This is usually a design issue, not a cluster issue.
The fix is to add load balancing at the appropriate connection.
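For the "checking local disk space" step listed under a stuck node, a quick check can be scripted with the Python standard library. This is a minimal sketch: the path `"."` is a placeholder, the 10% threshold is an arbitrary assumption, and the actual repository locations depend on your installation and are not specified in this article.

```python
import shutil


def disk_headroom(path, min_free_fraction=0.10):
    """Return (free_fraction, ok) for the filesystem that contains `path`.

    `min_free_fraction` is an illustrative threshold, not a Clockspring setting.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction >= min_free_fraction


# Placeholder list: replace "." with the directories that hold the node's
# repositories on your host (install-specific, assumed here).
paths_to_check = ["."]

for path in paths_to_check:
    fraction, ok = disk_headroom(path)
    status = "OK" if ok else "LOW"
    print(f"{path}: {fraction:.1%} free [{status}]")
```

A low result points at the host rather than the cluster, so it is worth ruling out before restarting the service or rebooting.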
7. What You Should NOT Do
Do not repeatedly offload just because a node is slow
Do not offload to “rebalance” workload — it won’t help
Do not panic when queue counts remain high with a node down
Do not manually delete FlowFile repository files
Do not assume data is lost — it's not
Clockspring’s cluster model is built to survive node loss without losing work.
8. Practical Summary
Graceful shutdown = automatic drain, no manual work
Crash = FlowFiles stay on that node until it returns
No data is lost
Primary failover happens automatically
Manual offload is only required if a node will NOT return
For even distribution, use load balancing, not offload
Related Articles
How a Clockspring Cluster Works
Queue Load Balancing
Offloading Queues
Execution Node: All Nodes vs Primary Node
Designing Cluster-Safe Flows