Summary
Queue load balancing distributes FlowFiles across nodes in a cluster. When enabled on a connection, any FlowFile that enters that queue is redistributed according to the selected strategy. This improves parallel processing but must be used intentionally to avoid unnecessary overhead.
1. What Load Balancing Does
Load balancing is configured on a queue, not on a processor.
When a FlowFile enters a load-balanced queue:
It may be transferred to another node
Assignment depends on the chosen strategy
Downstream processors on all nodes can process the work
Load balancing (LB) only affects FlowFiles that enter the queue after it is enabled.
2. Why Use Load Balancing
Use it when you want:
True parallel processing on multiple nodes
Even distribution of heavy workloads
To fan out work from a Primary-only processor
To avoid overloading a single node
3. Load Balancing Strategies
The three strategies behave as follows:
A. Round Robin
Distributes FlowFiles evenly across nodes in sequence.
Good for:
General parallelism
CPU-heavy steps
Simple even distribution
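As a rough illustration of the round-robin idea, the sketch below assigns FlowFiles to nodes in rotating order. It is not Clockspring's internal implementation; the node names, FlowFile IDs, and the `round_robin` helper are hypothetical.

```python
from collections import Counter
from itertools import cycle


def round_robin(flowfiles, nodes):
    """Assign each FlowFile to the next node in a repeating sequence.

    Illustrative only: `flowfiles` and `nodes` are plain Python lists,
    not Clockspring objects.
    """
    node_cycle = cycle(nodes)
    return [(flowfile, next(node_cycle)) for flowfile in flowfiles]


nodes = ["node-1", "node-2", "node-3"]           # hypothetical node names
flowfiles = [f"flowfile-{i}" for i in range(9)]  # hypothetical FlowFile IDs

assignments = round_robin(flowfiles, nodes)
per_node = Counter(node for _, node in assignments)
print(per_node)
```

With nine FlowFiles and three nodes, each node ends up with three, regardless of what the FlowFiles contain.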
B. Partition by Attribute
Groups FlowFiles by the value of an attribute and sends each group to the same node.
Good for:
Ensuring related data stays together
Avoiding cross-node contention
Customer/account/order-level partitioning
Examples: partition by customer_id, account_id, order_id
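The "same value, same node" property can be sketched with a stable hash of the attribute value. The hash-modulo scheme here is an assumption standing in for whatever mapping the product uses internally; the `partition_by_attribute` helper and the node names are hypothetical.

```python
import hashlib


def partition_by_attribute(flowfile, attribute, nodes):
    """Pick a node from a stable hash of one attribute value.

    Assumption: hash-modulo is only a stand-in for the real mapping.
    The point is that equal attribute values always map to the same node.
    """
    value = flowfile[attribute]
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names
flowfiles = [
    {"customer_id": "C-100", "order_id": "A"},
    {"customer_id": "C-200", "order_id": "B"},
    {"customer_id": "C-100", "order_id": "C"},
]

targets = [partition_by_attribute(ff, "customer_id", nodes) for ff in flowfiles]
print(targets)
```

Both FlowFiles for customer `C-100` resolve to the same node. Under a scheme like this, the spread across nodes is only as even as the attribute values themselves, so a heavily skewed key would produce skewed load.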
C. Single Node
All FlowFiles entering the queue go to one node.
Important details:
You cannot choose which node
Clockspring decides internally
Still useful when you want all downstream work local to a single node
Common use cases:
Merging FlowFiles
Ordering-sensitive operations
Deduplication
Anything requiring “everything in one place”
This behaves like an implicit “gather step.”
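To see why "everything in one place" matters, the sketch below deduplicates keys with a separate `seen` set per node. It assumes deduplication state is local to each node, which is a simplification made for illustration; the helper and node names are hypothetical.

```python
def dedupe_per_node(assignments):
    """Drop duplicates using a separate 'seen' set on each node.

    Assumption: deduplication state is node-local, so duplicates that
    land on different nodes are not detected.
    """
    seen = {}
    kept = []
    for key, node in assignments:
        node_seen = seen.setdefault(node, set())
        if key not in node_seen:
            node_seen.add(key)
            kept.append(key)
    return kept


keys = ["a", "a", "b", "b"]

# Spread across two hypothetical nodes in rotation: each duplicate pair is split up.
spread = [(key, f"node-{i % 2 + 1}") for i, key in enumerate(keys)]
# Single Node: every FlowFile goes to the same node.
gathered = [(key, "node-1") for key in keys]

print(dedupe_per_node(spread))    # ['a', 'a', 'b', 'b']
print(dedupe_per_node(gathered))  # ['a', 'b']
```

When the FlowFiles are spread out, no node sees both copies of a key, so nothing is dropped. Gathering them onto one node lets the duplicates meet.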
4. When Load Balancing Applies
Load balancing affects every FlowFile that enters the queue, regardless of how it was created.
Examples:
From GenerateFlowFile
From API responses
From splits/merges
From any processor upstream
From Primary-only or All-Nodes execution
What load balancing does not do:
Does not move FlowFiles that were already in the queue before LB was enabled
Does not rebalance FlowFiles sitting inside processors
5. Avoid Load Balancing on Multiple Consecutive Queues
Enabling LB repeatedly can hurt performance.
Why:
Each redistribution costs network bandwidth
It increases CPU workload from serialization, compression, and deserialization
It adds no further parallelism once FlowFiles are already distributed across nodes
It can cause unnecessary churn and slow down the entire flow
Practical rule:
Load balance once at the point where parallelism is needed.
Do not add LB to every queue.
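A back-of-the-envelope estimate shows how the network cost grows with each extra balanced queue. The sketch assumes that, on every load-balanced queue, a FlowFile is equally likely to be assigned to any node, so roughly (N - 1) / N of the data leaves its current node each time. The function name and the figures are hypothetical.

```python
def estimated_transfer_bytes(total_bytes, node_count, balanced_queues):
    """Rough estimate of bytes sent between nodes.

    Assumption: on each load-balanced queue a FlowFile is equally likely
    to be assigned to any node, so about (N - 1) / N of the data moves
    off its current node each time.
    """
    moved_fraction = (node_count - 1) / node_count
    return total_bytes * moved_fraction * balanced_queues


gib = 1024 ** 3
once = estimated_transfer_bytes(10 * gib, node_count=3, balanced_queues=1)
four_times = estimated_transfer_bytes(10 * gib, node_count=3, balanced_queues=4)
print(f"{once / gib:.1f} GiB vs {four_times / gib:.1f} GiB")
```

Under these assumptions, balancing 10 GiB once on a three-node cluster moves about 6.7 GiB between nodes, while balancing on four consecutive queues moves about 26.7 GiB for the same data.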
6. Execution Mode vs Load Balancing
Execution mode (All Nodes vs Primary Node) does not affect load balancing.
Key points:
A Primary-only processor can send to a load-balanced queue
An All-Nodes processor can send to a non-balanced queue
Execution mode controls where the processor runs
Load balancing controls where FlowFiles go next
These behaviors are independent.
7. Compression
When enabling load balancing, you can choose whether to compress FlowFiles in transit.
Benefits:
Reduces network transfer size
Helpful for large FlowFiles
Tradeoffs:
Adds CPU overhead to compress and decompress
Can slow down overall throughput for small FlowFiles, where the CPU cost outweighs the bytes saved
Not helpful for already-compressed data (ZIP, PDF, images)
Rule of thumb:
Use compression when your bottleneck is network bandwidth, not CPU.
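The difference between compressible and already-compressed content can be checked with a quick experiment. The sketch uses Python's `zlib` as a stand-in for whatever codec the product uses, and random bytes as a stand-in for ZIP or image data; both are assumptions made for illustration.

```python
import os
import zlib

# Repetitive, text-like content compresses well.
text_like = b"customer_id=12345,status=ACTIVE,region=us-east\n" * 2000
# Random bytes stand in for data that is already compressed (ZIP, images).
already_compressed = os.urandom(len(text_like))

for label, payload in [("text-like", text_like),
                       ("already compressed", already_compressed)]:
    compressed = zlib.compress(payload)
    ratio = len(compressed) / len(payload)
    print(f"{label}: {len(payload)} -> {len(compressed)} bytes (ratio {ratio:.2f})")
```

The text-like payload shrinks to a small fraction of its size, while the random payload stays about the same size, so the CPU spent compressing it buys nothing.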
8. When Not to Load Balance
Avoid load balancing when:
You require strict local processing
You need node-specific behavior
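Ordering is critical (Round Robin and Partition by Attribute spread FlowFiles across nodes; Single Node is the exception)
You plan to merge downstream and the FlowFiles to be merged would land on different nodes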
You don’t truly need multi-node parallelism
You're unsure — adding LB by habit is a common mistake
Related Articles
How a Clockspring Cluster Works
Execution Node: All Nodes vs Primary Node
Offloading Queues
Node Loss, Failover, and Recovery
Designing Cluster-Safe Flows