Cluster Execution: All Nodes vs Primary Node

Modified on Thu, 11 Dec, 2025 at 4:16 PM

Summary

Processors in Clockspring can run on every node in a cluster or only on the designated primary node. The Execution Node setting controls this choice. It determines how work is distributed, how external systems are called, and how the flow behaves when multiple nodes are available. This article explains how each option works, when to use it, and which constraints to understand before choosing one.

1. Execution Options

All nodes

The processor runs independently on every cluster node.
Each node pulls and processes only the FlowFiles that physically reside on that node.

Use All nodes when:

You want parallel throughput
Work is stateless
The external system can handle parallel requests
You want to take full advantage of multi-node capacity

Examples:

API calls that support concurrency
Database inserts when the target system allows parallel writers
Transformations, enrichment, routing

Primary node only

Only the primary node is allowed to run the processor.
All other nodes ignore it.

Use Primary node only when:

You want an action to run once per trigger across the cluster, not once on every node
You’re polling or listing (HTTP, DB, directories, S3, etc.)
The external system must not be hit in parallel
Duplicates would cause errors or duplicate data creation

Common use cases:

GenerateFlowFile (starters)
List processors (ListS3, ListFile, ListDatabase)
Scheduled or CRON-based workflows
Actions that modify global state
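
The difference between the two options can be sketched as a small Python model. This is an illustration only, not Clockspring code: the Node class, the nodes_that_run function, and the "ALL" / "PRIMARY" labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_primary: bool = False


def nodes_that_run(nodes, execution):
    """Return the names of the nodes that run a processor on one trigger."""
    if execution == "ALL":
        # Every node runs its own copy of the processor.
        return [n.name for n in nodes]
    if execution == "PRIMARY":
        # Only the node currently marked as primary runs it.
        return [n.name for n in nodes if n.is_primary]
    raise ValueError(f"unknown execution mode: {execution}")


# Hypothetical three-node cluster with node-1 as the primary.
cluster = [Node("node-1", is_primary=True), Node("node-2"), Node("node-3")]

all_runs = nodes_that_run(cluster, "ALL")
primary_runs = nodes_that_run(cluster, "PRIMARY")

print(all_runs)      # ['node-1', 'node-2', 'node-3']
print(primary_runs)  # ['node-1']
```

In this model, a listing or generating processor set to All nodes would fire three times per trigger on a three-node cluster, which is why such processors are usually set to Primary node only.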

2. Critical Constraint: Only First Processors Can Be Primary-Only

You can only set Execution Node = Primary Node Only on a processor that starts a flow path.

If a processor already has an incoming connection:

You cannot switch it to Primary Node Only
Clockspring will reject the change and mark the processor invalid

Reason:
FlowFiles already exist in queues, and queues reside on multiple nodes. A mid-flow processor restricted to the primary node would be unable to process FlowFiles that live on non-primary nodes.

So:

Set the execution mode on the first processor in the flow.
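
The rule above can be expressed as a simple validation check. The sketch below is a hypothetical model written for this article, not Clockspring's actual validation code: the Processor class, the validation_errors function, the "Downstream" processor name, and the error text are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    execution: str = "ALL"  # hypothetical labels: "ALL" or "PRIMARY"
    incoming: list = field(default_factory=list)  # names of upstream processors


def validation_errors(processor):
    """Return a list of problems; an empty list means the processor is valid."""
    errors = []
    if processor.execution == "PRIMARY" and processor.incoming:
        # Illustrative message only, not the product's real wording.
        errors.append(
            f"{processor.name}: Primary Node Only is not allowed on a "
            "processor with incoming connections"
        )
    return errors


# A processor that starts a flow path may be primary-only.
source = Processor("ListS3", execution="PRIMARY")
# A mid-flow processor with an incoming connection may not.
mid_flow = Processor("Downstream", execution="PRIMARY", incoming=["ListS3"])

print(validation_errors(source))    # []
print(validation_errors(mid_flow))  # one error for "Downstream"
```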

3. What Actually Happens to FlowFiles

Understanding where FlowFiles are stored explains why each execution mode behaves the way it does.

Each node has its own FlowFiles in its own local queue
The UI shows a single queue count, but it is an aggregate
FlowFiles do not move between nodes unless you enable load balancing (covered in another article)

So:

In All nodes mode:

Each node processes the FlowFiles that live on that node.

In Primary node only mode:

The primary node processes only the FlowFiles that live on the primary node.
FlowFiles on other nodes remain untouched unless you enable load balancing upstream.

This prevents accidental data movement and aligns with Clockspring’s node-local storage design.
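
The node-local behavior described in this section can be illustrated with a toy model. Everything here is an assumption made for the example: the queue contents, the node names, the choice of node-1 as primary, and the function names. It models only the behavior described above, with no load balancing.

```python
import copy

# Hypothetical snapshot: FlowFile IDs held in one connection's queue, per node.
queues = {
    "node-1": ["ff-1", "ff-2"],          # primary node in this example
    "node-2": ["ff-3", "ff-4", "ff-5"],
    "node-3": ["ff-6"],
}
PRIMARY = "node-1"


def ui_queue_count(queues):
    """The single count shown for the connection: a sum over all nodes."""
    return sum(len(q) for q in queues.values())


def run_once(queues, execution):
    """Process the local queue on each node that runs the processor.

    Returns the processed FlowFile IDs and empties those local queues.
    """
    active = list(queues) if execution == "ALL" else [PRIMARY]
    processed = []
    for node in active:
        processed.extend(queues[node])
        queues[node] = []
    return processed


total_before = ui_queue_count(queues)

all_mode = copy.deepcopy(queues)
processed_all = run_once(all_mode, "ALL")

primary_mode = copy.deepcopy(queues)
processed_primary = run_once(primary_mode, "PRIMARY")

print(total_before)                  # 6
print(len(processed_all))            # 6
print(processed_primary)             # ['ff-1', 'ff-2']
print(ui_queue_count(primary_mode))  # 4
```

In the primary-only run, the four FlowFiles on node-2 and node-3 stay queued, even though the aggregate count alone would not show which node holds them.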

4. When to Use Each Mode

Choose All nodes when:

You want more throughput
You need true parallel execution
Work is stateless or independent
You want to fully use cluster CPU
External systems tolerate concurrency

Choose Primary node only when:

You must avoid duplicates
You’re triggering downstream pipelines
You’re polling a system
You’re dealing with global shared state
You’re generating FlowFiles (to avoid multiplying work)
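
The two checklists can be condensed into a small decision helper. This is a sketch for reasoning about a flow design, not a Clockspring feature. The function name and its parameters are hypothetical, and it assumes that any single run-once criterion is enough to prefer Primary node only.

```python
def choose_execution_mode(
    polls_or_lists=False,
    generates_flowfiles=False,
    must_avoid_duplicates=False,
    triggers_downstream_pipeline=False,
    touches_global_state=False,
):
    """Suggest an execution mode from the criteria listed in this section."""
    run_once_reasons = [
        polls_or_lists,
        generates_flowfiles,
        must_avoid_duplicates,
        triggers_downstream_pipeline,
        touches_global_state,
    ]
    # Any run-once criterion outweighs the throughput benefit of All nodes.
    return "Primary node only" if any(run_once_reasons) else "All nodes"


print(choose_execution_mode(polls_or_lists=True))  # Primary node only
print(choose_execution_mode())                     # All nodes
```

The helper ignores the constraint from section 2: a suggestion of Primary node only applies only to a processor that starts a flow path.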

5. Operational Notes

You must stop the processor before changing execution mode
Set the execution mode before wiring the processor into a flow
Other nodes will still receive and hold FlowFiles unless load balancing is configured
If the primary node goes down, a new primary is elected and will take over execution
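
The failover note can be pictured with a minimal sketch. The election rule used here (pick the first connected node by name) is a placeholder assumption, since this article does not describe how Clockspring actually elects a primary. The function and node names are also hypothetical.

```python
def elect_primary(connected_nodes, current_primary=None):
    """Keep the current primary if it is still connected, else pick a new one.

    ASSUMPTION: choosing the first connected node by name is a stand-in for
    the real election mechanism, which is not covered in this article.
    """
    if current_primary in connected_nodes:
        return current_primary
    if not connected_nodes:
        raise RuntimeError("no connected nodes available")
    return sorted(connected_nodes)[0]


nodes = {"node-1", "node-2", "node-3"}
primary = elect_primary(nodes, current_primary="node-1")

# node-1 goes down; primary-only processors move to the newly elected node.
nodes.discard("node-1")
new_primary = elect_primary(nodes, current_primary=primary)

print(primary)      # node-1
print(new_primary)  # node-2
```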

Processor Scheduling: Timer vs CRON
Processor Concurrency and Run Duration
Processor Properties and Expression Language
Queue Load Balancing
Cluster Fundamentals: How Nodes, Queues, and Failover Work