Summary
A Clockspring cluster is a group of nodes working together to process data. Each node stores its FlowFiles locally, holds its own share of every queue, and contributes to overall throughput. This article explains where FlowFiles actually live, how queues behave, how work moves between nodes, how the primary node works, and what happens when nodes shut down or fail.
1. Node Architecture and Local Repositories
Every node maintains its own internal storage:
FlowFile Repository — FlowFile metadata
Content Repository — the bytes of the FlowFile
Provenance Repository — processing history
These repositories are local to each node.
There is no shared disk.
Why this matters
A FlowFile physically resides on the node where it was created or received.
A node can only process FlowFiles stored in its own repositories.
The UI shows combined queue counts, not a shared queue.
2. How Queues Actually Work
A queue between processors is logical, not physical.
Under the hood:
Each node holds its own slice of that queue
The UI aggregates all node-held items into a single number
Nodes do not automatically exchange FlowFiles
Example:
If the UI shows 500 queued, that might be 300 on Node 1 and 200 on Node 2.
Each node processes only what it owns unless redistribution is intentionally configured.
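The relationship between per-node slices and the number shown in the UI can be sketched in a few lines of Python. This is a toy model only, not Clockspring code: the `Node` class and `ui_queue_count` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one cluster node and its slice of a queue."""
    name: str
    queued: list = field(default_factory=list)


def ui_queue_count(nodes):
    """The UI figure is the sum of every node's local slice."""
    return sum(len(node.queued) for node in nodes)


# 300 FlowFiles held by Node 1 and 200 held by Node 2, as in the example above.
node1 = Node("node-1", [f"ff-{i}" for i in range(300)])
node2 = Node("node-2", [f"ff-{i}" for i in range(300, 500)])

print(ui_queue_count([node1, node2]))  # 500
print(len(node1.queued))               # 300, the only items node-1 can process
```

The model has no shared list anywhere: the total exists only as a computed sum, which is why one node cannot pick up items counted against another.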
3. When FlowFiles Move Between Nodes
FlowFiles only move between nodes under specific circumstances:
A. Queue Load Balancing (when configured on that connection)
If a connection has load balancing enabled:
Any FlowFile entering that queue is distributed according to the configured strategy
It does not matter where the FlowFile originated
Load balancing behavior itself is covered in a separate article.
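As a rough illustration of what distributing a queue means, the sketch below spreads incoming FlowFiles across nodes in round-robin order. Round robin is an assumption chosen for simplicity; this article does not say which strategies are available, and the `round_robin` function is hypothetical rather than part of any Clockspring API.

```python
from itertools import cycle


def round_robin(flowfiles, node_names):
    """Assign each FlowFile to the next node in turn, ignoring its origin."""
    assignment = {name: [] for name in node_names}
    targets = cycle(node_names)
    for flowfile in flowfiles:
        assignment[next(targets)].append(flowfile)
    return assignment


# Five FlowFiles entering a load-balanced connection in a two-node cluster.
result = round_robin(["a", "b", "c", "d", "e"], ["node-1", "node-2"])
print(result)  # {'node-1': ['a', 'c', 'e'], 'node-2': ['b', 'd']}
```

The function never looks at where a FlowFile came from, which mirrors the point above: once a connection is load balanced, origin does not matter.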
B. Graceful Node Shutdown
When a node is intentionally stopped:
Clockspring automatically offloads its queued FlowFiles
These FlowFiles are redistributed to healthy nodes
No work is lost and processing continues normally
This only happens during planned shutdowns.
C. Manual Offload
Users can trigger an offload on any queue to redistribute the FlowFiles already in it.
This is useful for:
draining a node
balancing uneven queues
preparing for maintenance
removing a node from the cluster
4. What Does NOT Move FlowFiles
Unexpected Crashes or Power Loss
If a node fails abruptly:
FlowFiles on that node remain there
They are not automatically redistributed
Other nodes cannot process them
When the node returns, its FlowFiles become available again
Clockspring cannot offload a node that is not running.
5. The Primary Node
The cluster elects a primary node to run operations that must execute on only one node at a time:
polling
listing
generating FlowFiles
CRON-based triggers
If the primary node goes offline:
A new primary is elected
Primary-only processors resume on the new primary
FlowFiles on the previous primary remain there until it returns or is offloaded
The primary node is not a coordinator; it only hosts the processors that must run on a single node.
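The effect of a primary change on scheduling can be sketched as follows. The election rule used here (lowest node name wins) is purely an assumption for illustration, since this article does not describe how the election works. The names `elect_primary`, `should_run`, `ALL_NODES`, and `PRIMARY_ONLY` are likewise hypothetical.

```python
def elect_primary(connected_nodes):
    """Hypothetical election rule: the lowest node name becomes primary."""
    return min(connected_nodes) if connected_nodes else None


def should_run(execution, node, primary):
    """Decide whether a processor is scheduled on `node`."""
    if execution == "ALL_NODES":
        return True
    # Anything else is treated as primary-only in this sketch.
    return node == primary


connected = {"node-1", "node-2", "node-3"}
primary = elect_primary(connected)
print(primary)  # node-1

# The primary goes offline, so a new one is elected from the remaining nodes.
connected.discard(primary)
new_primary = elect_primary(connected)
print(new_primary)                                        # node-2
print(should_run("PRIMARY_ONLY", "node-2", new_primary))  # True
print(should_run("PRIMARY_ONLY", "node-3", new_primary))  # False
```

Only the scheduling decision changes hands in this sketch. Nothing in it moves data, which matches the point that FlowFiles held by the previous primary stay where they are.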
6. Cluster Behavior During Node Loss
Graceful Shutdown
FlowFiles are automatically offloaded
Work is safely redistributed
Processing continues with fewer nodes
Unexpected Crash
FlowFiles stay on the offline node
Queue counts still include those items
Processing resumes when the node rejoins
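The difference between the two cases can be seen in a small simulation. This is a toy model under stated assumptions, not Clockspring's implementation: the `Cluster` class and its methods are hypothetical, and the way offloaded FlowFiles are spread over the surviving nodes (round robin) is an arbitrary choice.

```python
class Cluster:
    """Toy model: each node owns its own list of queued FlowFiles."""

    def __init__(self, queues):
        self.queues = {name: list(items) for name, items in queues.items()}
        self.online = set(queues)

    def graceful_shutdown(self, name):
        """Offload the node's FlowFiles to the other online nodes, then stop it."""
        survivors = sorted(self.online - {name})
        if not survivors:
            raise RuntimeError("no healthy node left to receive FlowFiles")
        for i, flowfile in enumerate(self.queues[name]):
            self.queues[survivors[i % len(survivors)]].append(flowfile)
        self.queues[name] = []
        self.online.discard(name)

    def crash(self, name):
        """The node disappears; its FlowFiles stay on its local disk."""
        self.online.discard(name)

    def rejoin(self, name):
        self.online.add(name)

    def ui_count(self):
        """Queue count as shown in the UI: every node's slice is included."""
        return sum(len(items) for items in self.queues.values())

    def processable(self):
        """FlowFiles that an online node can actually work on right now."""
        return sum(len(self.queues[name]) for name in self.online)


start = {"node-1": ["a", "b", "c"], "node-2": ["d", "e"], "node-3": ["f"]}

planned = Cluster(start)
planned.graceful_shutdown("node-1")
print(planned.ui_count(), planned.processable())  # 6 6

crashed = Cluster(start)
crashed.crash("node-1")
print(crashed.ui_count(), crashed.processable())  # 6 3
crashed.rejoin("node-1")
print(crashed.processable())                      # 6
```

After the crash the count stays at 6 while only 3 FlowFiles can make progress, which is the situation to look for when a queue appears stuck with a node offline.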
7. Designing Flows with Cluster Basics in Mind
Key principles:
Use load balancing only when you want work spread across nodes
Do not assume FlowFiles are redistributed automatically; without load balancing or an offload, they stay on the node that holds them
Primary-only is ideal for polling and initial triggers
You can fan out processing afterward using a load-balanced queue
Offload is necessary for draining or rebalancing nodes
A dedicated best-practices article will cover deeper design patterns.
Related Articles
Execution Node: All Nodes vs Primary Node
Queue Load Balancing
Offloading Queues
Node Loss, Failover, and Recovery
Designing Cluster-Safe Flows