Understanding Provenance Replay in Clockspring

Modified on Thu, 11 Dec, 2025 at 9:24 PM

Provenance replay lets you take a past FlowFile event and re-inject that FlowFile back into your flow. It’s a useful debugging tool, but it is often misunderstood. Replay restores the data, not the external world around it.

This article explains what replay does, what it does not do, and when it is safe to use.

1. What Replay Actually Does

When you select a provenance event and choose Replay, Clockspring:

Loads the FlowFile’s content and attributes as they existed at the selected event
Creates a new FlowFile with that data
Inserts it back into the flow at the processor where that event occurred

The replayed FlowFile then runs forward through the flow from that point onward.

This makes replay ideal for inspecting behavior after making changes to a flow.
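As a rough mental model, the steps above can be sketched in Python. This is not Clockspring's implementation: the `FlowFile` and `ProvenanceEvent` classes, the `replay` function, the per-processor `queues`, and the processor name `TransformOrder` are all hypothetical names invented for illustration. The sketch also assumes each replay gets its own identifier. It shows only the idea that replay copies the stored content and attributes into a new FlowFile and places it back at the point of the event.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class FlowFile:
    # Hypothetical stand-in for a FlowFile: content plus attributes.
    content: bytes
    attributes: dict
    # Assumption for this sketch: every new FlowFile gets a fresh identifier.
    flowfile_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass(frozen=True)
class ProvenanceEvent:
    # Hypothetical snapshot of a FlowFile as it existed at one event.
    processor: str
    content: bytes
    attributes: dict


# Hypothetical input queues, one per processor name.
queues = defaultdict(deque)


def replay(event: ProvenanceEvent) -> FlowFile:
    """Build a new FlowFile from the event snapshot and re-queue it."""
    replayed = FlowFile(content=event.content, attributes=dict(event.attributes))
    queues[event.processor].append(replayed)
    return replayed


event = ProvenanceEvent(
    processor="TransformOrder",
    content=b'{"order": 17}',
    attributes={"filename": "order-17.json"},
)
first = replay(event)
second = replay(event)
```

Both calls yield the same content and attributes, yet each produces a separate FlowFile. Nothing in the model touches anything outside the queues, which is why replay by itself cannot undo an earlier external write.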

2. What Replay Does NOT Do

Replay restores a FlowFile inside Clockspring. It does not roll back anything in external systems.

Replay does not:

Undo database writes
Remove previously created files
Reverse API calls
Reset state in downstream applications
Restore queues, processors, or variables to earlier states

This is the most important point:
Replay is not a recovery or rollback mechanism. It is a debugging tool.

If the original FlowFile wrote to a database, the replayed FlowFile will write again unless your flow guards against duplicates.
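One possible protection is to make the write itself idempotent, so that a second delivery of the same record changes nothing. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and the `write_order` helper are hypothetical, and a real target database would have its own syntax for "insert unless the key already exists".

```python
import sqlite3

# In-memory database standing in for the real downstream system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL NOT NULL)"
)


def write_order(order_id: str, amount: float) -> bool:
    """Insert the row once; return False if the key was already present."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO orders (order_id, amount) VALUES (?, ?)",
        (order_id, amount),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the insert was ignored.
    return cur.rowcount == 1


original = write_order("order-17", 42.50)  # first delivery
replayed = write_order("order-17", 42.50)  # same record delivered by a replay
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The primary key on the business identifier is what makes the second write harmless. Without a stable key, the database cannot distinguish a replay from a genuinely new record.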

3. Why This Matters for Production Flows

The risk applies to any flow that performs an external action, for example:

Inserts or updates rows
Sends messages
Calls APIs
Writes files
Triggers downstream automation

A replay will trigger those actions again unless your flow is designed to handle duplicates or replays safely.

Use replay in production only when you are certain the downstream effect is acceptable.
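One way to reason about the downstream effect before replaying is to list every side-effecting step the FlowFile can reach from the replay point. The sketch below is a planning aid, not a Clockspring feature. The `flow` map, the processor names, and the `side_effecting` set are hypothetical and would have to be maintained by hand or derived from your own flow documentation.

```python
from collections import deque

# Hypothetical flow: each processor maps to the processors it feeds.
flow = {
    "ParseOrder": ["TransformOrder"],
    "TransformOrder": ["RouteOrder"],
    "RouteOrder": ["WriteToDatabase", "LogSummary"],
    "WriteToDatabase": ["NotifyPartnerApi"],
    "LogSummary": [],
    "NotifyPartnerApi": [],
}

# Hypothetical set of processors known to touch external systems.
side_effecting = {"WriteToDatabase", "NotifyPartnerApi"}


def external_actions_after(start: str) -> list[str]:
    """Return side-effecting processors reachable from the replay point."""
    seen = {start}
    found = []
    pending = deque([start])
    while pending:
        name = pending.popleft()
        if name in side_effecting:
            found.append(name)
        for downstream in flow.get(name, []):
            if downstream not in seen:
                seen.add(downstream)
                pending.append(downstream)
    return found


risky = external_actions_after("TransformOrder")
safe = external_actions_after("LogSummary")
```

An empty result suggests the replay stays inside the flow. A non-empty result is the list of external actions that will fire again and need to be checked for duplicate safety first.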

4. Common Safe Use Cases

Replay is useful for:

Testing how a flow behaves after you adjust routing or mapping
Isolating a transformation problem
Re-running a FlowFile through a fixed version of a processor
Validating attribute or content changes
Demonstrating behavior without fabricating sample data

These are all internal, controlled scenarios where replay is low risk.

5. Common Unsafe Use Cases

Replay should be avoided when:

Downstream writes are not idempotent
You do not control the receiving system
The original FlowFile triggered a chain of external operations
You are unsure what the replayed FlowFile will hit next

If you need true reprocessing, build idempotency into the flow or use a separate “reingest” pattern that doesn’t rely on replay.
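For side effects that have no database constraint to lean on, such as outbound messages or API calls, idempotency can be approximated with a key derived from the data itself. The sketch below is illustrative only. The `order.id` attribute, the `idempotency_key` function, and the `DeduplicatingSender` class are hypothetical, and the in-memory set stands in for what would need to be a durable, shared store in a real deployment.

```python
import hashlib


def idempotency_key(content: bytes, attributes: dict) -> str:
    """Derive a stable key from the content and a business identifier.

    Volatile attributes (identifiers or timestamps that change on every
    replay) are deliberately left out so a replay produces the same key.
    """
    business_id = attributes.get("order.id", "")
    digest = hashlib.sha256()
    digest.update(business_id.encode("utf-8"))
    digest.update(b"\x00")  # separator between identifier and content
    digest.update(content)
    return digest.hexdigest()


class DeduplicatingSender:
    """Sends each distinct key once; repeats are skipped."""

    def __init__(self) -> None:
        self._seen = set()  # assumption: a durable store in real use
        self.sent = []      # stand-in for the real outbound call

    def send(self, content: bytes, attributes: dict) -> bool:
        key = idempotency_key(content, attributes)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.sent.append(content)
        return True


sender = DeduplicatingSender()
payload = b'{"order": 17}'
first = sender.send(payload, {"order.id": "17", "uuid": "original"})
replayed = sender.send(payload, {"order.id": "17", "uuid": "changed-by-replay"})
```

With a guard like this in front of the external call, a replayed FlowFile is recognized as a repeat and skipped, while a record with different content or a different business identifier still goes through.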

6. Cluster Behavior

Replay is always handled by the node where the original event occurred.
If the original node is offline, replay is unavailable for that event.

This preserves lineage accuracy and prevents replaying an event from the wrong execution context.

7. Key Takeaways

Replay restores FlowFile content and attributes from a specific event
It re-injects the FlowFile into the flow at that point
It does not undo or rewind external writes
It is a debugging tool, not a rollback tool
Use replay cautiously in production, especially around side-effecting processors
True recovery requires designed idempotency or reprocessing patterns