Understanding Provenance Replay in Clockspring

Modified on Thu, 11 Dec, 2025 at 9:24 PM

Provenance replay lets you take a past FlowFile event and re-inject that FlowFile back into your flow. It’s a useful debugging tool, but it is often misunderstood. Replay restores the data, not the external world around it.

This article explains what replay does, what it does not do, and when it is safe to use.


1. What Replay Actually Does

When you select a provenance event and choose Replay, Clockspring:

  1. Loads the FlowFile’s content and attributes as they existed at the selected event

  2. Creates a new FlowFile with that data

  3. Inserts it back into the flow at the processor where that event occurred

The replayed FlowFile then runs forward through the flow from that point onward.
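
As a rough mental model only, and not a description of Clockspring's internals, the three steps above can be sketched in Python. The names `ProvenanceEvent`, `FlowFile`, `replay`, and the per-processor `queues` dictionary are hypothetical and exist only for illustration. Giving the new FlowFile a fresh identifier is an assumption of this sketch.

```python
import uuid
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    """Hypothetical snapshot of a FlowFile at one provenance event."""
    flowfile_id: str
    processor: str
    content: bytes
    attributes: dict


@dataclass
class FlowFile:
    """Hypothetical in-memory stand-in for a FlowFile."""
    flowfile_id: str
    content: bytes
    attributes: dict


def replay(event: ProvenanceEvent, queues: dict) -> FlowFile:
    # Step 1: take the content and attributes as recorded at the event.
    content = event.content
    attributes = dict(event.attributes)  # copy so the snapshot is not mutated
    # Step 2: create a new FlowFile carrying that data.
    flowfile = FlowFile(str(uuid.uuid4()), content, attributes)
    # Step 3: queue it at the processor where the event occurred.
    queues.setdefault(event.processor, deque()).append(flowfile)
    return flowfile


event = ProvenanceEvent(
    flowfile_id="original-1",
    processor="TransformOrder",
    content=b'{"order_id": "A-100"}',
    attributes={"filename": "order.json"},
)
queues = {}
replayed = replay(event, queues)
```

Nothing in this model touches anything outside `queues`, which mirrors the point of the next section: replay only creates and queues a FlowFile.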

Because the replayed FlowFile runs through the flow as it is configured now, replay is well suited to checking behavior after you change a flow.


2. What Replay Does NOT Do

Replay restores a FlowFile inside Clockspring. It does not roll back anything in external systems.

Replay does not:

  • Undo database writes

  • Remove previously created files

  • Reverse API calls

  • Reset state in downstream applications

  • Restore queues, processors, or variables to earlier states

This is the most important point:
Replay is not a recovery or rollback mechanism. It is a debugging tool.

If the original FlowFile wrote to a database, the replayed FlowFile will write again unless your flow has protections.
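
To make the duplicate write concrete, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for the external system. The `orders` table and the `put_record` function are hypothetical. The second call plays the role of the replayed FlowFile reaching the same database-writing step.

```python
import sqlite3


def put_record(conn, order_id, amount):
    # A plain INSERT with no duplicate protection.
    conn.execute(
        "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
        (order_id, amount),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")

put_record(conn, "A-100", 25.0)  # original FlowFile
put_record(conn, "A-100", 25.0)  # replayed FlowFile

row_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE order_id = ?", ("A-100",)
).fetchone()[0]
```

Both rows remain in the table, because nothing in this write path recognizes the second FlowFile as a repeat.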


3. Why This Matters for Production Flows

Check whether your flow does any of the following:

  • Inserts or updates rows

  • Sends messages

  • Calls APIs

  • Writes files

  • Triggers downstream automation

A replay will trigger those actions again unless your flow is designed to handle duplicates or replays safely.

Replay should only be used in production when you are certain the downstream effect is acceptable.
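
One way a flow can be designed to handle duplicates is to check a fingerprint of each payload before performing the external action. The sketch below is a minimal illustration in Python, assuming a content hash is an acceptable duplicate key. `DuplicateGuard`, `deliver`, and the `sent` list are hypothetical names, and the in-memory set stands in for whatever durable store a real flow would need.

```python
import hashlib


class DuplicateGuard:
    """Hypothetical guard that remembers which payloads were already sent."""

    def __init__(self):
        self._seen = set()

    def should_send(self, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return False  # same payload seen before: a duplicate or a replay
        self._seen.add(digest)
        return True


sent = []  # stand-in for an external system


def deliver(content: bytes, guard: DuplicateGuard) -> None:
    # Perform the external action only for payloads not seen before.
    if guard.should_send(content):
        sent.append(content)


guard = DuplicateGuard()
deliver(b'{"order_id": "A-100"}', guard)  # original
deliver(b'{"order_id": "A-100"}', guard)  # replay, skipped
deliver(b'{"order_id": "A-101"}', guard)  # different payload, sent
```

A content hash also suppresses payloads that are legitimately identical, so a business key such as an order ID may be a better choice in some flows.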


4. Common Safe Use Cases

Replay is useful for:

  • Testing how a flow behaves after you adjust routing or mapping

  • Isolating a transformation problem

  • Re-running a FlowFile through a fixed version of a processor

  • Validating attribute or content changes

  • Demonstrating behavior without fabricating sample data

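These are all internal, controlled scenarios. Replay is low risk here as long as the replayed FlowFile does not go on to reach a processor that writes to an external system.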


5. Common Unsafe Use Cases

Replay should be avoided when:

  • Downstream writes are not idempotent

  • You do not control the receiving system

  • The original FlowFile triggered a chain of external operations

  • You are unsure what the replayed FlowFile will hit next

If you need true reprocessing, build idempotency into the downstream writes or use a separate “reingest” pattern that doesn’t rely on replay.
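
As a minimal sketch of an idempotent write, the Python example below keys the table on a business identifier so that a repeated write has no additional effect. It uses an in-memory SQLite database as a stand-in for the receiving system. The `orders` table and `put_record_idempotent` are hypothetical, and other databases use different syntax for the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key makes order_id unique, so a repeat can be detected.
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL)")


def put_record_idempotent(conn, order_id, amount):
    # INSERT OR IGNORE skips the row when order_id already exists.
    conn.execute(
        "INSERT OR IGNORE INTO orders (order_id, amount) VALUES (?, ?)",
        (order_id, amount),
    )


put_record_idempotent(conn, "A-100", 25.0)  # original
put_record_idempotent(conn, "A-100", 25.0)  # replayed or reingested

rows = conn.execute("SELECT order_id, amount FROM orders").fetchall()
```

With this design the table holds one row for `A-100` no matter how many times the same FlowFile arrives.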


6. Cluster Behavior

Replay is always handled by the node where the original event occurred.
If the original node is offline, replay is unavailable for that event.

This preserves lineage accuracy and prevents replaying an event from the wrong execution context.
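
The rule above can be expressed as a small Python sketch. This is an illustration of the described behavior, not Clockspring code. `select_replay_node`, `ReplayUnavailableError`, and the node names are hypothetical.

```python
class ReplayUnavailableError(Exception):
    """Raised when the node that recorded the event is not reachable."""


def select_replay_node(event_node: str, connected_nodes: set) -> str:
    # Replay is handled only by the node where the event occurred.
    if event_node not in connected_nodes:
        raise ReplayUnavailableError(
            f"node {event_node} is offline; replay unavailable"
        )
    return event_node


connected = {"node-1", "node-3"}

# The event's node is connected, so it handles the replay.
chosen = select_replay_node("node-1", connected)

# The event's node is offline, so no other node takes over.
try:
    select_replay_node("node-2", connected)
    offline_result = "replayed"
except ReplayUnavailableError:
    offline_result = "unavailable"
```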


7. Key Takeaways

  • Replay restores FlowFile content and attributes from a specific event

  • It re-injects the FlowFile into the flow at that point

  • It does not undo or rewind external writes

  • It is a debugging tool, not a rollback tool

  • Use replay cautiously in production, especially around side-effecting processors

  • True recovery requires designed idempotency or reprocessing patterns
