What Data Clockspring Stores on Disk (and for How Long)

Modified on Fri, 12 Dec, 2025 at 1:05 PM

Clockspring processes data in motion, but it also writes data to disk as part of normal operation. Security reviews often ask what is stored, where it lives, and whether it is permanent.


This article explains what Clockspring stores on disk, why it is stored, and how long it typically remains.


High-Level Summary

Clockspring stores:

  • In-flight data required for reliable processing

  • Operational metadata needed to run the platform

  • Audit and diagnostic data used for troubleshooting

Some data is transient and automatically cleaned up.
Some data is retained based on configuration or operational needs.


Content Repository (FlowFile Content)

What it contains

  • Raw input data

  • Transformed data

  • Intermediate processing results

This is the actual payload data flowing through your pipelines.

How long it is stored

  • Stored only while a FlowFile exists

  • Automatically removed once the FlowFile completes or is dropped

  • Retention is transient by design

If a FlowFile is no longer in the flow, its content is not retained.
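
For a security review, one way to check that content is transient on a given host is to measure how much data a repository directory holds and how old its oldest file is. The following is a minimal sketch, not a Clockspring tool: the function name `summarize_directory` is hypothetical, and the repository path depends on your deployment and is not specified in this article.

```python
import time
from pathlib import Path


def summarize_directory(root, now=None):
    """Return file count, total bytes, and age (seconds) of the oldest file under root."""
    now = time.time() if now is None else now
    count = 0
    total_bytes = 0
    oldest_mtime = None
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        count += 1
        total_bytes += info.st_size
        # Track the earliest modification time seen so far.
        if oldest_mtime is None or info.st_mtime < oldest_mtime:
            oldest_mtime = info.st_mtime
    oldest_age = None if oldest_mtime is None else now - oldest_mtime
    return {"files": count, "bytes": total_bytes, "oldest_age_seconds": oldest_age}
```

Calling `summarize_directory` on the content repository directory at intervals shows whether the oldest file age stays bounded as FlowFiles complete.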


FlowFile Repository (State and Metadata)

What it contains

  • FlowFile metadata

  • Attributes

  • Queue state

  • Processing position

This repository tracks where data is in the flow, not the data itself.

How long it is stored

  • Exists only while FlowFiles exist

  • Automatically cleaned up as FlowFiles complete

  • Not intended for long-term retention


Provenance Repository (Audit History)

What it contains

  • Processing history

  • Where data came from

  • Which processors handled it

  • When events occurred

This is audit and troubleshooting data, not business data.

How long it is stored

  • Retention is configurable

  • Commonly retained for hours or days

  • Automatically purged based on size or age limits

Security teams often review provenance because it records how data moved through the platform, even though it is not a copy of the payloads.
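
The idea of purging by age or size can be illustrated with a short sketch. This is not Clockspring's implementation. The `Event` type, the `purge` function, and the limit parameters are hypothetical names chosen for illustration. The sketch assumes events are ordered oldest first, applies the age limit, then drops the oldest remaining events until the total size fits.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since epoch when the event was recorded
    size_bytes: int   # on-disk size of the stored event


def purge(events, now, max_age_seconds, max_total_bytes):
    """Return the events kept after applying an age limit, then a size limit.

    Events are assumed to be ordered oldest first.
    """
    # Age limit: drop anything older than the cutoff.
    kept = [e for e in events if now - e.timestamp <= max_age_seconds]
    # Size limit: drop the oldest remaining events until the total fits.
    total = sum(e.size_bytes for e in kept)
    start = 0
    while start < len(kept) and total > max_total_bytes:
        total -= kept[start].size_bytes
        start += 1
    return kept[start:]
```

Whichever limit is reached first determines what is removed, so a busy system may hold less history than the age limit alone suggests.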


Logs

What they contain

  • Startup and shutdown events

  • Errors and warnings

  • Operational messages

Logs may reference:

  • file names

  • record counts

  • identifiers

Logs should not contain full payloads unless payload content is explicitly logged.

How long they are stored

  • Controlled by log rotation

  • Retention depends on OS and logging configuration
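
To show how rotation bounds what stays on disk, here is a minimal sketch using Python's standard `logging.handlers.RotatingFileHandler`. It is an analogy only and does not reflect how Clockspring's own logging is configured. The function name, file name, and limits are assumptions made for the example.

```python
import logging
import logging.handlers
import os


def write_rotating_logs(directory, messages, max_bytes=200, backup_count=2):
    """Write log messages through a size-based rotating handler and list the resulting files."""
    path = os.path.join(directory, "app.log")
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    logger = logging.getLogger("rotation_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    logger.addHandler(handler)
    try:
        for i in range(messages):
            logger.info("operational message %d", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    # Only the active file plus backup_count older files remain on disk.
    return sorted(os.listdir(directory))
```

However many messages are written, older files beyond the backup count are deleted, so the retained log volume is capped at roughly `max_bytes * (backup_count + 1)`.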


Configuration and State Files

What they contain

  • Flow definitions

  • Processor configuration

  • Controller services

  • Parameter references (not secrets themselves)

These files define how Clockspring runs, not the data it processes.

How long they are stored

  • Persist for the life of the deployment

  • Backed up as part of system configuration

  • Updated when flows or settings change


What Clockspring Does Not Store by Default

Clockspring does not:

  • Archive processed data long-term

  • Store historical copies of completed FlowFiles

  • Retain payloads once flows complete

  • Keep its own encrypted, application-level copies of data

If data persists long-term, it is because your flow explicitly wrote it somewhere (database, object storage, etc.).


How Retention Is Controlled

Retention behavior is controlled by:

  • Repository sizing limits

  • Age-based cleanup

  • Operational settings

  • Disk capacity

Clockspring is designed to:

  • Apply backpressure instead of dropping data

  • Clean up completed work automatically

  • Fail safely if storage fills up
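
Backpressure instead of dropping can be sketched with a bounded queue. This is an illustration of the general technique, not Clockspring's internals. The queue size, the `offer` helper, and `run_demo` are hypothetical. When the queue is full, the upstream side keeps the item and retries later rather than discarding it.

```python
import queue


def offer(q, item):
    """Try to enqueue item; return False when the queue is full so the caller must retry."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


def run_demo():
    q = queue.Queue(maxsize=2)  # hypothetical backpressure threshold
    pending = ["a", "b", "c", "d"]
    delivered = []
    while pending or not q.empty():
        # Upstream: push until the queue refuses more; refused items stay pending.
        while pending and offer(q, pending[0]):
            pending.pop(0)
        # Downstream: consume one item, which frees capacity for upstream.
        delivered.append(q.get_nowait())
    return delivered
```

In this sketch every item is eventually delivered in order. The full queue only slows the upstream side down.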


Security Implications

From a security perspective:

  • Disk encryption protects stored data at rest

  • File permissions restrict OS-level access

  • Retention limits reduce the data exposure window

Clockspring assumes host-level security controls are enforced.
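
One host-level control that is easy to verify is whether repository and log directories are accessible to users outside the service account and its group. The following is a minimal sketch that assumes POSIX permission bits. The function names are hypothetical, and the directory to scan depends on your deployment.

```python
import os
import stat


def world_accessible(path):
    """Return True if 'other' users have any read, write, or execute permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRWXO)


def find_world_accessible(root):
    """Return root and any entries beneath it that 'other' users can access."""
    findings = []
    if world_accessible(root):
        findings.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks; their own mode bits are not meaningful
            if world_accessible(full):
                findings.append(full)
    return findings
```

An empty result means no scanned path grants permissions to "other" users. Any path that is returned is a candidate for tightening with `chmod`.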


Common Misunderstandings

  • “Data stays on disk forever” → false

  • “Clockspring archives payloads” → false

  • “Deleting a flow deletes stored business data” → only for FlowFiles still in flight in that flow; data the flow already wrote to external systems is unaffected

Most data stored on disk is temporary and operational.


Summary

Clockspring stores data on disk to reliably process flows, not to archive business data.

  • Payload data is transient

  • Metadata and audit data are retained briefly or per configuration

  • Long-term storage only happens when flows explicitly write data elsewhere

Understanding this helps security teams assess real risk instead of assuming worst-case behavior.
