FlowFile Content vs Attributes

Modified on Thu, 11 Dec, 2025 at 11:40 AM

Summary

A FlowFile has two parts: content (the data itself) and attributes (metadata about that data). Many processors read or write only attributes and never touch content. Understanding the difference is critical for routing, debugging, and performance in Clockspring.

What a FlowFile actually is

A FlowFile is a lightweight wrapper around two things:

Content
- The actual bytes of the file, message, record, or payload
- Stored in Clockspring’s content repository
- Can be text, JSON, CSV, binary, images, etc.
Attributes
- Key/value metadata stored directly on the FlowFile
- Used for decisions, routing, lookups, and processor configuration
- Always strings, even when a value looks like a number or a date

Attributes travel with the FlowFile through the entire flow unless a processor adds, modifies, or removes them.
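
As a rough mental model, the two parts can be pictured as a small data structure. The sketch below is illustrative only: the FlowFile class and its with_attribute helper are hypothetical names invented here, not Clockspring's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FlowFile:
    """Hypothetical stand-in for a FlowFile; not Clockspring's real class."""
    content: bytes                                             # the payload itself
    attributes: Dict[str, str] = field(default_factory=dict)   # string-only metadata

    def with_attribute(self, key: str, value: object) -> "FlowFile":
        # Values are coerced to str because attributes are strings only.
        updated = {**self.attributes, key: str(value)}
        return FlowFile(self.content, updated)


ff = FlowFile(b'{"id": 42}', {"filename": "order.json"})
ff2 = ff.with_attribute("record.count", 1)

print(ff2.attributes)              # {'filename': 'order.json', 'record.count': '1'}
print(ff2.content is ff.content)   # True: the payload is untouched
```

Note that adding an attribute produces new metadata but leaves the content bytes alone, and that the integer 1 ends up stored as the string '1'.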

Content: what it is and how it behaves

Content is the raw data.

Examples:

- An uploaded CSV
- A JSON API response
- The bytes of a PDF
- A ZIP archive
- The output of a transformation

Important notes:

- Modifying content is expensive (copy-on-write)
- Large content should be parsed with record processors, not copied into attributes
- Many processors never touch content at all
- If a processor modifies content, the provenance event for that step shows a new content claim

You should only modify content when you must.
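
To see why content changes cost more than attribute changes, here is a toy model of copy-on-write. It assumes a "claim" is simply an index into an append-only list; the ContentRepository class is a hypothetical simplification and does not reflect how Clockspring's content repository is implemented.

```python
from typing import Dict, List


class ContentRepository:
    """Toy append-only store; a 'claim' here is just an index into a list."""

    def __init__(self) -> None:
        self._claims: List[bytes] = []

    def write(self, data: bytes) -> int:
        self._claims.append(data)       # existing claims are never overwritten
        return len(self._claims) - 1    # id of the new claim

    def read(self, claim: int) -> bytes:
        return self._claims[claim]


repo = ContentRepository()

# A FlowFile is modelled as a claim id plus an attribute map.
claim = repo.write(b"id,name\n1,alice\n")
attributes: Dict[str, str] = {"filename": "users.csv"}

# Attribute change: no payload bytes are read or written.
attributes["record.count"] = "1"
claim_after_attribute_update = claim

# Content change: the whole payload is read and written again as a new claim.
claim_after_content_update = repo.write(repo.read(claim).upper())

print(claim_after_attribute_update == claim)   # True
print(claim_after_content_update == claim)     # False
```

In this model the original bytes stay intact under the old claim, which is the behavior the "new content claim" note above refers to.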

Attributes: what they are used for

Attributes describe the FlowFile or help drive flow behavior.

Common attribute examples:

- filename
- path
- http.status.code
- record.count
- uuid
- Any values extracted from JSON or CSV
- Routing flags like is_duplicate or error_message

Attributes are used for:

- Routing decisions (RouteOnAttribute)
- Building URLs, SQL queries, filenames
- Passing data into downstream processors
- Looking up values (LookupAttribute, caches, DB lookups)
- Setting parameters for readers/writers
- Logging, debugging, and provenance tracking

Why processors rely more on attributes than content

Most decisions and branching in a flow come from metadata, not the raw payload.

Examples:

- Route files based on record.type
- Skip processing if sha256 matches a known duplicate
- Use ${customer.id} to build an API path
- Identify error responses using ${error_message}

Attributes let processors work without reading or rewriting the entire content, which keeps flows much faster.
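
The idea can be sketched in a few lines of Python. This is a deliberately simplified imitation of ${name} substitution and attribute-based routing, not the real Expression Language; the resolve and route functions, the relationship names, and the example URL are all assumptions made for illustration.

```python
import re
from typing import Dict

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, attributes: Dict[str, str]) -> str:
    """Replace each ${name} with the attribute value (empty string if missing)."""
    return _REFERENCE.sub(lambda m: attributes.get(m.group(1), ""), template)


def route(attributes: Dict[str, str]) -> str:
    """Pick a relationship name from attributes alone; content is never read."""
    if attributes.get("error_message"):
        return "failure"
    if attributes.get("record.type") == "invoice":
        return "invoices"
    return "unmatched"


attrs = {"customer.id": "C-1001", "record.type": "invoice"}
url = resolve("https://api.example.com/customers/${customer.id}/orders", attrs)

print(url)            # https://api.example.com/customers/C-1001/orders
print(route(attrs))   # invoices
```

Neither function receives the payload, so the cost of the decision does not depend on how large the content is.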

How content and attributes interact

Think of attributes as “instructions” and content as “material.”

Example:

- EvaluateJsonPath reads from content
- It extracts values into attributes
- Downstream processors use those attributes to decide routing or update content
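
A minimal sketch of that extraction step, assuming the content is a flat JSON object. The extract_to_attributes function and the field mapping are hypothetical; a real JSON path expression can address nested values, which this sketch does not attempt.

```python
import json
from typing import Dict


def extract_to_attributes(content: bytes, fields: Dict[str, str]) -> Dict[str, str]:
    """Map top-level JSON keys to attribute names; values become strings."""
    document = json.loads(content)
    extracted: Dict[str, str] = {}
    for attribute_name, json_key in fields.items():
        if json_key in document:
            extracted[attribute_name] = str(document[json_key])
    return extracted


content = b'{"customerId": "C-1001", "type": "invoice", "lines": [1, 2, 3]}'
attributes = extract_to_attributes(
    content,
    {"customer.id": "customerId", "record.type": "type"},
)

print(attributes)   # {'customer.id': 'C-1001', 'record.type': 'invoice'}
```

Only the two small values needed for routing are copied out; the "lines" array stays in the content.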

Another example:

- HashContent reads content and writes the hash into an attribute
- DetectDuplicate then uses that attribute (${hash.value}) to decide if the content is new
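
The same hash-then-check pattern, sketched in Python. SHA-256 and the in-memory set are assumptions chosen for the example; the actual processors have their own algorithm and cache settings, and the function names below are hypothetical.

```python
import hashlib
from typing import Dict, Set

seen_hashes: Set[str] = set()   # stand-in for a shared cache


def hash_content(content: bytes, attributes: Dict[str, str]) -> Dict[str, str]:
    """Read the content once and record its digest as an attribute."""
    digest = hashlib.sha256(content).hexdigest()
    return {**attributes, "hash.value": digest}


def detect_duplicate(attributes: Dict[str, str]) -> str:
    """Decide from the attribute alone; the content is not read again."""
    key = attributes["hash.value"]
    if key in seen_hashes:
        return "duplicate"
    seen_hashes.add(key)
    return "non-duplicate"


first = hash_content(b"same payload", {"filename": "a.txt"})
second = hash_content(b"same payload", {"filename": "b.txt"})
results = [detect_duplicate(first), detect_duplicate(second)]

print(results)   # ['non-duplicate', 'duplicate']
```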

You rarely need to store large or complex data in attributes.

Common mistakes to avoid

Storing entire JSON blobs in attributes
- Attributes are not content. Large values cause performance issues.
Forgetting that attributes are strings
- No lists, no objects; every value is a string.
Assuming attributes always remain the same through merges or splits
- Some processors drop or rewrite them, so check the attributes after a merge or split.
Trying to route based on content without extracting attributes first
- Use EvaluateJsonPath, ExtractText, or record processors.
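
The string-only pitfall is easy to demonstrate. The snippet below uses plain Python strings to show what goes wrong when a numeric-looking value is compared as text; it illustrates the general problem and makes no claim about how any particular Clockspring function compares values. The attribute names are examples.

```python
import json

attributes = {"record.count": "10", "tags": json.dumps(["urgent", "eu"])}

# Comparing as text is lexicographic, so "10" sorts before "9".
print(attributes["record.count"] > "9")      # False

# Convert first when a numeric comparison is intended.
print(int(attributes["record.count"]) > 9)   # True

# A "list" held in an attribute is really one string and must be parsed to be used.
tags = json.loads(attributes["tags"])
print(tags)                                  # ['urgent', 'eu']
```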

Quick rule of thumb

Content = actual data
Attributes = everything you use to make decisions about that data

You work with attributes far more often than you work with content.
