What Data Clockspring Stores on Disk (and for How Long)

Modified on Fri, 12 Dec, 2025 at 1:05 PM

Clockspring processes data in motion, but it also writes data to disk as part of normal operation. Security reviews often ask what is stored, where it lives, and whether it is permanent.

This article explains what Clockspring stores on disk, why it is stored, and how long it typically remains.

High-Level Summary

Clockspring stores:

In-flight data required for reliable processing
Operational metadata needed to run the platform
Audit and diagnostic data used for troubleshooting

Some data is transient and automatically cleaned up.
Some data is retained based on configuration or operational needs.

Content Repository (FlowFile Content)

What it contains

Raw input data
Transformed data
Intermediate processing results

This is the actual payload data flowing through your pipelines.

How long it is stored

Stored only while a FlowFile exists
Automatically removed once the FlowFile completes or is dropped
Storage is transient by design

If a FlowFile is no longer in the flow, its content is not retained.
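A review can check this behavior by measuring how much space the content repository occupies before, during, and after a run. The sketch below is one way to do that. It assumes the repository is an ordinary directory tree on the host. The function name dir_size_bytes is hypothetical, and the repository path depends on your deployment, so it is left as a parameter.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files can disappear between listing and stat while cleanup
            # is running, so treat a missing file as "already removed".
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                continue
    return total
```

If the number falls back toward its baseline after queues drain, that is consistent with transient payload storage.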

FlowFile Repository (State and Metadata)

What it contains

FlowFile metadata
Attributes
Queue state
Processing position

This repository tracks where data is in the flow, not the data itself.

How long it is stored

Exists only while FlowFiles exist
Automatically cleaned up as FlowFiles complete
Not intended for long-term retention

Provenance Repository (Audit History)

What it contains

Processing history
Where data came from
Which processors handled it
When events occurred

This is audit and troubleshooting data, not business data.

How long it is stored

Retention is configurable
Commonly retained for hours or days
Automatically purged based on size or age limits

Security teams often care about provenance because it records how data moved through the platform, even though it does not hold the payloads themselves.
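The size-or-age purge described above can be pictured with a small sketch. This is an illustration of the general policy only, not Clockspring's implementation. The Event type, the select_expired function, and both limits are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    event_id: int
    timestamp: float  # seconds since epoch
    size_bytes: int


def select_expired(events: List[Event], now: float,
                   max_age_seconds: float, max_total_bytes: int) -> List[Event]:
    """Pick events to purge: first by age, then oldest-first by size budget."""
    ordered = sorted(events, key=lambda e: e.timestamp)  # oldest first
    purge = [e for e in ordered if now - e.timestamp > max_age_seconds]
    keep = [e for e in ordered if now - e.timestamp <= max_age_seconds]
    total = sum(e.size_bytes for e in keep)
    # Trim the oldest retained events until the rest fit the size limit.
    while keep and total > max_total_bytes:
        oldest = keep.pop(0)
        total -= oldest.size_bytes
        purge.append(oldest)
    return purge
```

Whichever limit is reached first decides what is removed, so a busy system may keep less history than the age limit alone suggests.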

Logs

What they contain

Startup and shutdown events
Errors and warnings
Operational messages

Logs may reference:

File names
Record counts
Identifiers

Logs should not contain full payloads unless a flow is explicitly configured to log them.

How long they are stored

Controlled by log rotation
Retention depends on OS and logging configuration
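To confirm that rotation is actually limiting retention, a review can list the files in the log directory with their ages. The sketch below assumes logs are plain files in a single directory. The function name log_file_ages is hypothetical, and the directory location depends on your installation.

```python
import os
import time
from typing import List, Optional, Tuple


def log_file_ages(log_dir: str, now: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (file name, age in days) for each file in log_dir, oldest first."""
    now = time.time() if now is None else now
    ages = []
    with os.scandir(log_dir) as entries:
        for entry in entries:
            if entry.is_file():
                # Age is based on last modification time.
                age_days = (now - entry.stat().st_mtime) / 86400
                ages.append((entry.name, age_days))
    return sorted(ages, key=lambda item: item[1], reverse=True)
```

If the oldest file is older than the retention period you expect, the rotation settings are the first place to look.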

Configuration and State Files

What they contain

Flow definitions
Processor configuration
Controller services
Parameter references (not secrets themselves)

These files define how Clockspring runs, not the data it processes.

How long they are stored

Persist for the life of the deployment
Backed up as part of system configuration
Updated when flows or settings change

What Clockspring Does Not Store by Default

Clockspring does not:

Archive processed data long-term
Store historical copies of completed FlowFiles
Retain payloads once flows complete
Keep separate application-level copies of data, encrypted or otherwise

If data persists long-term, it is because your flow explicitly wrote it somewhere (database, object storage, etc.).

How Retention Is Controlled

Retention behavior is controlled by:

Repository sizing limits
Age-based cleanup
Operational settings
Disk capacity

Clockspring is designed to:

Apply backpressure instead of dropping data
Clean up completed work automatically
Fail safely if storage fills up
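The difference between backpressure and dropping data is easiest to see in a toy example. The sketch below is a simplified illustration, not how Clockspring implements its queues. BoundedConnection and its methods are hypothetical names.

```python
import queue


class BoundedConnection:
    """Toy stand-in for a queue between two processing steps."""

    def __init__(self, max_items: int) -> None:
        self._queue = queue.Queue(maxsize=max_items)

    def offer(self, item) -> bool:
        """Try to enqueue; return False when full so the caller can retry."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            # Backpressure: the item is refused, not discarded.
            # The upstream step still holds it and can try again later.
            return False

    def take(self):
        """Remove and return the oldest queued item."""
        return self._queue.get_nowait()
```

A full queue slows the upstream step down. Nothing is lost, which is why data can sit on disk longer when a downstream system is slow or unavailable.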

Security Implications

From a security perspective:

Disk encryption protects stored data
File permissions restrict OS-level access
Retention limits reduce the data exposure window

Clockspring assumes host-level security controls are enforced.
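One host-level control that is simple to verify is file permissions on the repository and log directories. The sketch below assumes a POSIX host. The function name world_accessible is hypothetical, and the list of directories to check depends on your deployment.

```python
import os
import stat
from typing import List


def world_accessible(paths: List[str]) -> List[str]:
    """Return the paths whose mode grants any permission to 'other' users."""
    flagged = []
    for path in paths:
        mode = os.stat(path).st_mode
        # S_IRWXO covers the read, write, and execute bits for 'other'.
        if mode & stat.S_IRWXO:
            flagged.append(path)
    return flagged
```

An empty result means none of the checked directories are open to accounts outside the owning user and group. It does not replace a review of group membership or disk encryption.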

Common Misunderstandings

“Data stays on disk forever” → false
“Clockspring archives payloads” → false
“Deleting a flow deletes stored business data” → only for data still in flight in that flow; data the flow already wrote to external systems is unaffected

Most data stored on disk is temporary and operational.

Summary

Clockspring stores data on disk to reliably process flows, not to archive business data.

Payload data is transient
Metadata and audit data are retained briefly or per configuration
Long-term storage only happens when flows explicitly write data elsewhere

Understanding this helps security teams assess real risk instead of assuming worst-case behavior.