Clockspring is designed to move data reliably, even when downstream systems slow down, fail, or disappear temporarily.
This article explains the high-level model Clockspring uses to prevent data loss and how the main mechanisms work together.
The Core Guarantee
Clockspring is built around this principle:
Data is not dropped unless you explicitly design a flow to drop it.
If something cannot be processed, it is queued.
If a queue fills up, upstream processing slows down.
If the system restarts, in-flight data resumes.
Queues Are the First Line of Defense
Every connection between processors has a queue.
Queues:
hold FlowFiles that cannot yet be processed
persist across restarts
decouple upstream and downstream systems
If a downstream system is slow or unavailable, data waits in the queue instead of being lost.
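The decoupling that a queue provides can be sketched as a simple buffer between two processors. The names `Connection`, `enqueue`, and `dequeue` here are illustrative only, not Clockspring's actual API:

```python
from collections import deque

class Connection:
    """Conceptual sketch of the queue between two processors.
    Data that cannot yet be processed simply waits here."""
    def __init__(self):
        self._queue = deque()

    def enqueue(self, flowfile):
        # The upstream processor hands off data and moves on.
        self._queue.append(flowfile)

    def dequeue(self):
        # The downstream processor pulls work when it is ready;
        # returns None when there is nothing to process yet.
        return self._queue.popleft() if self._queue else None

# If downstream is unavailable, items accumulate instead of being lost:
conn = Connection()
conn.enqueue({"id": 1})
conn.enqueue({"id": 2})
# When downstream recovers, it drains the queue in order:
assert conn.dequeue() == {"id": 1}
```

The key point is that neither side calls the other directly: the queue is the only contract between them, which is what lets one side stall without the other losing data.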
Backpressure Prevents Overload
Queues have limits. When those limits are reached, backpressure activates.
Backpressure:
stops upstream processors from producing more data
prevents memory exhaustion
prevents uncontrolled disk growth
This is intentional. Slowing down is safer than dropping data.
Backpressure is not an error condition. It is flow control.
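In code terms, backpressure means the queue can refuse new work once a threshold is hit, and the upstream side must wait rather than drop. This sketch uses an object-count threshold; the threshold value and method names are assumptions for illustration, not Clockspring's configuration surface:

```python
from collections import deque

class BackpressuredQueue:
    """Illustrative sketch: a queue with an object-count threshold."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self._items = deque()

    def offer(self, item):
        """Accept the item, or refuse it to signal backpressure."""
        if len(self._items) >= self.threshold:
            return False          # upstream must pause -- not drop
        self._items.append(item)
        return True

    def poll(self):
        return self._items.popleft() if self._items else None

q = BackpressuredQueue(threshold=2)
assert q.offer("a") and q.offer("b")
assert q.offer("c") is False      # backpressure: flow control, not an error
q.poll()                          # downstream drains one item
assert q.offer("c") is True       # upstream resumes automatically
```

Note that `offer` returning False carries no error state: as soon as the downstream side drains an item, the same call succeeds again.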
What Happens When Downstream Systems Fail
If a downstream system goes down:
queues fill
backpressure activates
upstream processors pause
When the downstream system recovers:
queues drain
processing resumes automatically
No manual intervention is required unless storage limits are exceeded.
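The fail-then-recover cycle above can be simulated in a few lines. Each scheduling pass tries to deliver the head of the queue; a failed delivery leaves the item exactly where it was. This is a conceptual sketch, not Clockspring's scheduler:

```python
def run_once(queue, downstream):
    """One scheduling pass: try to deliver the head of the queue.
    If the downstream call fails, the item stays queued for the
    next pass; nothing is discarded."""
    if not queue:
        return
    item = queue[0]
    try:
        downstream(item)
    except ConnectionError:
        return        # item remains at queue[0]; retried next pass
    queue.pop(0)      # delivered: now safe to remove from the queue

queue = ["rec-1", "rec-2"]
delivered = []
healthy = False

def flaky(item):
    # Stand-in for a downstream system that is temporarily down.
    if not healthy:
        raise ConnectionError("downstream down")
    delivered.append(item)

run_once(queue, flaky)              # downstream down: queue unchanged
assert queue == ["rec-1", "rec-2"]
healthy = True                      # downstream recovers
run_once(queue, flaky)
run_once(queue, flaky)
assert queue == [] and delivered == ["rec-1", "rec-2"]
```

Notice the ordering: the item is removed from the queue only *after* delivery succeeds, which is why recovery requires no manual intervention.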
Restart Behavior and Crash Safety
Clockspring persists in-flight state to disk.
If a node:
restarts
crashes
is patched or rebooted
Then:
queued data remains
in-progress work resumes
nothing is silently lost
This is why disk space and repository health matter.
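The essence of crash safety is writing state to disk before acknowledging it, so a restart can recover everything that was in flight. This sketch uses a JSON file for simplicity; Clockspring's on-disk repositories are more sophisticated, and the class and file names here are assumptions:

```python
import json
import os
import tempfile

class DurableQueue:
    """Sketch of restart-safe queuing: every change is persisted
    before the enqueue call returns."""
    def __init__(self, path):
        self.path = path
        self.items = []
        if os.path.exists(path):
            with open(path) as f:
                self.items = json.load(f)   # recover state on restart

    def _sync(self):
        with open(self.path, "w") as f:
            json.dump(self.items, f)

    def enqueue(self, item):
        self.items.append(item)
        self._sync()                        # persist before returning

path = os.path.join(tempfile.mkdtemp(), "queue.json")
q1 = DurableQueue(path)
q1.enqueue({"id": 7})

q2 = DurableQueue(path)                    # simulated restart
assert q2.items == [{"id": 7}]             # nothing silently lost
```

This is also why disk health matters so much: if the persistence layer cannot write, the durability guarantee is what degrades first.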
Offloading Is Not Data Loss
Offloading removes a node from active processing but does not discard data.
When offloading is used correctly:
in-flight data is redistributed
processing continues on remaining nodes
Offloading is about capacity management, not data deletion.
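Conceptually, offloading moves a node's queued work to the rest of the cluster rather than discarding it. The round-robin redistribution and the function names below are illustrative assumptions, not Clockspring's clustering API:

```python
def offload(node_queues, leaving):
    """Sketch: remove `leaving` from active processing and
    redistribute its queued items across the remaining nodes."""
    remaining = [n for n in node_queues if n != leaving]
    for i, item in enumerate(node_queues[leaving]):
        # Round-robin assignment keeps load roughly even.
        node_queues[remaining[i % len(remaining)]].append(item)
    node_queues[leaving] = []     # the node now holds no data
    return node_queues

queues = {"node-a": [1, 2, 3], "node-b": [], "node-c": []}
offload(queues, "node-a")
assert queues["node-a"] == []
assert sorted(queues["node-b"] + queues["node-c"]) == [1, 2, 3]
```

Every item survives the operation; only *where* it is processed changes.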
Retries vs Backpressure
Retries and backpressure solve different problems.
Retries handle transient failures
Backpressure handles sustained pressure
Overusing retries can amplify load on a system that is already struggling. Backpressure keeps the overall flow stable.
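The division of labor reads naturally in code: retry a small, bounded number of times for transient failures, then stop and let the item stay queued so backpressure takes over. This is a sketch, and `max_attempts` is an illustrative parameter:

```python
def deliver_with_retry(send, item, max_attempts=3):
    """Bounded retry for *transient* failures. If the failure
    persists, give up and report it so the item stays queued
    (backpressure), instead of hammering a struggling system."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(item)
            return True           # delivered
        except ConnectionError:
            continue              # transient: try again
    return False                  # sustained failure: leave item queued

attempts = []

def send(item):
    # Stand-in downstream call that fails once, then succeeds.
    attempts.append(item)
    if len(attempts) < 2:
        raise ConnectionError("transient blip")

assert deliver_with_retry(send, "rec-1") is True
assert len(attempts) == 2         # succeeded on the second attempt
```

An unbounded retry loop here would do exactly what the article warns against: convert sustained pressure into an ever-growing hammering of the downstream system.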
What Clockspring Does Not Guarantee
Clockspring does not:
guarantee exactly-once delivery across external systems
prevent duplicates from APIs
magically fix bad upstream data
Those concerns are handled by:
idempotency
deduplication
upsert patterns
The platform guarantees reliable handling, not perfect upstream behavior.
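Of the patterns listed above, a keyed upsert is the simplest way to make duplicate deliveries harmless. This sketch assumes records carry a stable `id` field; the field name and the in-memory store are illustrative:

```python
store = {}

def upsert(record):
    """Duplicate-tolerant handling: insert-or-update keyed on a
    stable identifier, so redelivering the same record is a no-op."""
    store[record["id"]] = record   # safe to repeat any number of times

upsert({"id": 1, "value": "a"})
upsert({"id": 1, "value": "a"})    # duplicate delivery from an upstream API
assert len(store) == 1             # no duplicate rows downstream
```

Because the operation is idempotent, at-least-once delivery from the platform combined with an upsert at the destination behaves, from the consumer's point of view, like exactly-once.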
When Data Can Be Lost
Data loss only occurs when:
queues exceed disk capacity
retention limits are exceeded
flows explicitly route data to termination
operators manually delete data
Each of these is a visible, operator-addressable condition, not a silent failure.
How This Fits With Other Concepts
This model works together with:
pagination and incremental pulls
partial failure handling
batching and deduplication
provenance and replay
Each mechanism reinforces the others.
Summary
Clockspring prevents data loss by design:
queues buffer data
backpressure controls flow
state is persisted to disk
restarts are safe
Once you understand this model, Clockspring behavior becomes predictable instead of mysterious.