How Clockspring Prevents Data Loss

Modified on Fri, 12 Dec, 2025 at 1:12 PM

Clockspring is designed to move data reliably, even when downstream systems slow down, fail, or disappear temporarily.

This article explains the high-level model Clockspring uses to prevent data loss and how the main mechanisms work together.



The Core Guarantee

Clockspring is built around this principle:

Data is not dropped unless you explicitly design a flow to drop it.


If something cannot be processed, it is queued.
If a queue fills up, upstream processing slows down.
If the system restarts, in-flight data is preserved and processing resumes.


Queues Are the First Line of Defense

Every connection between processors has a queue.

Queues:

  • hold FlowFiles that cannot yet be processed

  • persist across restarts

  • decouple upstream and downstream systems

If a downstream system is slow or unavailable, data waits in the queue instead of being lost.
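The buffering role of a connection queue can be sketched in a few lines of Python. This is an illustrative model only, not Clockspring internals; the `maxsize` value and `flowfile-N` names are invented for the example.

```python
import queue

# Illustrative sketch: the connection between two processors is modeled
# as a bounded queue that buffers FlowFiles when downstream is slow or down.
connection = queue.Queue(maxsize=100)

def upstream_emit(flowfile):
    # Upstream never hands data directly to downstream; it enqueues.
    connection.put(flowfile)

# Downstream is unavailable, but upstream keeps producing ...
for i in range(5):
    upstream_emit(f"flowfile-{i}")

# ... and nothing is lost: the data waits in the connection's queue.
print(connection.qsize())  # 5
```

Because the queue sits between the two sides, the upstream processor needs no knowledge of whether downstream is currently reachable.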


Backpressure Prevents Overload

Queues have limits. When those limits are reached, backpressure activates.

Backpressure:

  • stops upstream processors from producing more data

  • prevents memory exhaustion

  • prevents uncontrolled disk growth

This is intentional. Slowing down is safer than dropping data.

Backpressure is not an error condition. It is flow control.
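The flow-control idea can be sketched with a bounded queue: once the limit is hit, new work is deferred rather than discarded. This is a hedged illustration of the principle, not how Clockspring implements backpressure; the limit of 3 is arbitrary.

```python
import queue

# Illustrative sketch of backpressure: a bounded queue at its limit
# makes the producer wait instead of dropping data.
q = queue.Queue(maxsize=3)

produced, deferred = 0, 0
for item in range(5):
    try:
        # A full queue refuses new work immediately ...
        q.put_nowait(item)
        produced += 1
    except queue.Full:
        # ... so the upstream side pauses (here: counts the deferred
        # item) rather than discarding it.
        deferred += 1

print(produced, deferred)  # 3 2
```

Note that nothing here is an error: the two deferred items simply wait for queue space, which is exactly the flow-control behavior described above.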


What Happens When Downstream Systems Fail

If a downstream system goes down:

  • queues fill

  • backpressure activates

  • upstream processors pause

When the downstream system recovers:

  • queues drain

  • processing resumes automatically

No manual intervention is required unless storage limits are exceeded.
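The outage-and-recovery cycle above can be simulated in a short sketch (illustrative only; the `deliver` function and its failure mode are invented): while the downstream side raises errors, items accumulate in the queue, and once it recovers the backlog drains without intervention.

```python
from collections import deque

q = deque()
downstream_up = False

def deliver(item):
    # Hypothetical delivery call: fails while downstream is down.
    if not downstream_up:
        raise ConnectionError("downstream unavailable")
    return item

# Outage: every delivery fails, so items accumulate in the queue.
for i in range(4):
    try:
        deliver(i)
    except ConnectionError:
        q.append(i)

assert len(q) == 4  # data waited; none was lost

# Recovery: the same loop now drains the backlog automatically.
downstream_up = True
delivered = [deliver(q.popleft()) for _ in range(len(q))]
print(delivered)  # [0, 1, 2, 3]
```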


Restart Behavior and Crash Safety

Clockspring persists in-flight state to disk.

If a node:

  • restarts

  • crashes

  • is patched or rebooted

Then:

  • queued data remains

  • in-progress work resumes

  • nothing is silently lost

This is why disk space and repository health matter.
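The persistence idea can be sketched as a write-ahead journal: items are written to disk before they count as queued, so a restart rebuilds the queue from the journal. This is a minimal illustration under assumed file and record formats, not Clockspring's actual repository layout.

```python
import json
import os
import tempfile

# Illustrative write-ahead journal: the filename and JSON-lines format
# are assumptions for the sketch.
journal = os.path.join(tempfile.mkdtemp(), "queue.journal")

def enqueue(item):
    # Persist first; only data safely on disk counts as queued.
    with open(journal, "a") as f:
        f.write(json.dumps(item) + "\n")

def recover():
    # After a crash or reboot, replay the journal to rebuild the queue.
    with open(journal) as f:
        return [json.loads(line) for line in f]

for i in range(3):
    enqueue({"id": i})

# Simulated restart: all queued state is reconstructed from disk.
restored = recover()
print(len(restored))  # 3
```

This is also why repository health matters in practice: if the disk holding the journal fills or fails, the durability guarantee is only as good as the storage underneath it.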


Offloading Is Not Data Loss

Offloading removes a node from active processing but does not discard data.

When offloading is used correctly:

  • in-flight data is redistributed

  • processing continues on remaining nodes

Offloading is about capacity management, not data deletion.
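The redistribution step can be sketched as moving a departing node's queued items to the remaining nodes (illustrative only; node names and the round-robin assignment are assumptions, not the real cluster protocol). The point is that the total amount of queued data is unchanged.

```python
# Hypothetical per-node queues for the sketch.
nodes = {"node-a": [1, 2], "node-b": [3], "node-c": [4, 5, 6]}

def offload(name):
    # Remove the node from active processing and redistribute its
    # queued items to the remaining nodes round-robin.
    items = nodes.pop(name)
    remaining = sorted(nodes)
    for i, item in enumerate(items):
        nodes[remaining[i % len(remaining)]].append(item)

before = sum(len(v) for v in nodes.values())
offload("node-c")
after = sum(len(v) for v in nodes.values())
print(before, after)  # 6 6 -- capacity changes, data does not
```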


Retries vs Backpressure

Retries and backpressure solve different problems.

  • Retries handle transient failures

  • Backpressure handles sustained pressure

Overusing retries can make things worse by adding load to a system that is already struggling. Backpressure keeps the system stable while the pressure lasts.
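The contrast can be made concrete with a bounded retry loop. This is a hedged sketch: the `flaky_send` failure pattern is invented, and the backoff parameters are illustrative. Retries spend a few extra attempts on one item; if they are exhausted, the item should fall back to the queue, where backpressure takes over.

```python
import time

attempts = {"count": 0}

def flaky_send(item):
    # Invented transient failure: the first two attempts time out.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("transient failure")
    return "delivered"

def send_with_retry(item, max_retries=5, base_delay=0.001):
    for attempt in range(max_retries):
        try:
            return flaky_send(item)
        except TimeoutError:
            # Exponential backoff keeps retries from hammering a
            # struggling downstream system.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("retries exhausted; let the item stay queued")

result = send_with_retry("flowfile-1")
print(result, attempts["count"])  # delivered 3
```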


What Clockspring Does Not Guarantee

Clockspring does not:

  • guarantee exactly-once delivery across external systems

  • prevent duplicates from APIs

  • magically fix bad upstream data

Those concerns are handled by:

  • idempotency

  • deduplication

  • upsert patterns

The platform guarantees reliable handling, not perfect upstream behavior.
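A deduplication step is one way a flow can absorb the duplicates the platform does not prevent. This sketch is illustrative; the `id` key field and the in-memory `seen` set are assumptions (a real flow would typically use a persistent store for the seen keys).

```python
seen = set()

def process_once(record):
    # Idempotent handling: a record is processed at most once per key.
    key = record["id"]
    if key in seen:
        return False  # duplicate delivery -- safe to ignore
    seen.add(key)
    return True       # first delivery -- process it

deliveries = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}, {"id": 2}]
processed = [r for r in deliveries if process_once(r)]
print(len(processed))  # 3 unique records from 5 deliveries
```

With a step like this in place, duplicate deliveries from an API become harmless repetitions rather than data-quality problems.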


When Data Can Be Lost

Data loss only occurs when:

  • queues exceed disk capacity

  • retention limits are exceeded

  • flows explicitly route data to termination

  • operators manually delete data

These are deliberate conditions, not silent failures.


How This Fits With Other Concepts

This model works together with:

  • pagination and incremental pulls

  • partial failure handling

  • batching and deduplication

  • provenance and replay

Each mechanism reinforces the others.


Summary

Clockspring prevents data loss by design:

  • queues buffer data

  • backpressure controls flow

  • state is persisted to disk

  • restarts are safe


Once you understand this model, Clockspring behavior becomes predictable instead of mysterious.
