Summary
A FlowFile has two parts: content (the data itself) and attributes (metadata about that data). Many processors act only on attributes and never read the content. Understanding the difference is critical for routing, debugging, and performance in Clockspring.
What a FlowFile actually is
A FlowFile is a lightweight wrapper around two things:
Content
The actual bytes of the file, message, record, or payload
Stored in Clockspring’s content repository
Can be text, JSON, CSV, binary, images, etc.
Attributes
Key/value metadata stored directly on the FlowFile
Used for decisions, routing, lookups, and processor configuration
Strings only: every key and every value is a string, so numbers and booleans are stored as text
Attributes travel with the FlowFile through the entire flow unless a processor adds, modifies, or removes them.
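As a rough mental model only (this is not Clockspring's actual class layout), a FlowFile can be pictured as a small object that holds a pointer to its content plus a string-to-string map. The names FlowFile, content_claim, and CONTENT_REPOSITORY below are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the content repository: claim id -> bytes.
CONTENT_REPOSITORY: dict[str, bytes] = {
    "claim-1": b'{"customer": {"id": 42}}',
}


@dataclass
class FlowFile:
    # Pointer to the bytes in the content repository, not the bytes themselves.
    content_claim: str
    # Key/value metadata; both keys and values are strings.
    attributes: dict[str, str] = field(default_factory=dict)


flowfile = FlowFile(
    content_claim="claim-1",
    attributes={"filename": "order.json", "record.count": "1"},
)

# Reading an attribute never touches the content repository.
print(flowfile.attributes["filename"])
# Reading content means following the claim to the stored bytes.
print(CONTENT_REPOSITORY[flowfile.content_claim].decode("utf-8"))
```

The point of the sketch is the asymmetry: attributes are cheap to read because they sit on the FlowFile, while content has to be fetched from wherever it is stored.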
Content: what it is and how it behaves
Content is the raw data.
Examples:
An uploaded CSV
A JSON API response
The bytes of a PDF
A ZIP archive
The output of a transformation
Important notes:
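Modifying content is expensive: stored content is never edited in place, so every change writes a new copy (copy-on-write)
Large content should be processed with record processors rather than extracted into attributes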
Many processors never touch content at all
If a processor modifies content, its provenance will show a new content claim
You should only modify content when you must.
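The copy-on-write behavior mentioned above can be sketched in a few lines. This is a simplified illustration under assumed names (repository, write_content, modify_content are all hypothetical), not how the content repository is implemented.

```python
import itertools

# Hypothetical in-memory content repository: claim id -> immutable bytes.
repository: dict[str, bytes] = {}
_claim_ids = itertools.count(1)


def write_content(data: bytes) -> str:
    """Store bytes under a new claim id and return that id."""
    claim = f"claim-{next(_claim_ids)}"
    repository[claim] = data
    return claim


def modify_content(claim: str, transform) -> str:
    """Copy-on-write: read the old bytes and write the result to a NEW claim."""
    return write_content(transform(repository[claim]))


original = write_content(b"id,name\n1,ada\n")
updated = modify_content(original, bytes.upper)

print(original, updated)     # two different claim ids
print(repository[original])  # the original bytes are unchanged
```

Every modification costs a full read plus a full write of the payload, which is why a content change shows up as a new content claim and why attribute-only processing is so much cheaper.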
Attributes: what they are used for
Attributes describe the FlowFile or help drive flow behavior.
Common attribute examples:
filename
path
http.status.code
record.count
uuid
Any values extracted from JSON or CSV
Routing flags like is_duplicate or error_message
Attributes are used for:
Routing decisions (RouteOnAttribute)
Building URLs, SQL queries, filenames
Passing data into downstream processors
Looking up values (LookupAttribute, caches, DB lookups)
Setting parameters for readers/writers
Logging, debugging, and provenance tracking
Why processors rely more on attributes than content
Most decisions and branching in a flow come from metadata, not the raw payload.
Examples:
Route files based on record.type
Skip processing if sha256 matches a known duplicate
Use ${customer.id} to build an API path
Identify error responses using ${error_message}
Attributes let processors work without reading or rewriting the entire content, which keeps flows much faster.
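The decisions listed above can be sketched as plain functions over an attribute map. This is an illustrative Python approximation, not processor configuration; the function names and the relationship names ("duplicate", "failure", "invoices", "unmatched") are assumptions made for the example.

```python
def route_on_attribute(attributes: dict[str, str], known_hashes: set[str]) -> str:
    """Pick a relationship name using attributes only; content is never read."""
    if attributes.get("sha256") in known_hashes:
        return "duplicate"
    if attributes.get("error_message"):
        return "failure"
    if attributes.get("record.type") == "invoice":
        return "invoices"
    return "unmatched"


def build_api_path(attributes: dict[str, str]) -> str:
    """Rough equivalent of an expression such as /customers/${customer.id}."""
    return f"/customers/{attributes['customer.id']}"


attrs = {"record.type": "invoice", "customer.id": "42", "sha256": "abc123"}
print(route_on_attribute(attrs, known_hashes={"ffff"}))
print(build_api_path(attrs))
```

Neither function takes the payload as an argument, which is the whole point: the routing cost stays the same whether the content is 1 KB or 10 GB.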
How content and attributes interact
Think of attributes as “instructions” and content as “material.”
Example:
EvaluateJsonPath reads from content
It extracts values into attributes
Downstream processors use those attributes to decide routing or update content
Another example:
HashContent reads content and writes the hash into an attribute
DetectDuplicate then uses that attribute (${hash.value}) to decide if the content is new
You rarely need to store large or complex data in attributes.
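Both examples follow the same pattern: read the content once, copy a small value into an attribute, then decide from the attribute. The sketch below imitates that pattern in Python. The function names mirror the processors for readability only and are not their real implementations; the choice of SHA-256 and the JSON shape are assumptions.

```python
import hashlib
import json


def evaluate_json_path(content: bytes, attributes: dict[str, str]) -> dict[str, str]:
    """Read content once and copy one small value into an attribute."""
    document = json.loads(content)
    # Attribute values are strings, so the number is converted explicitly.
    return {**attributes, "customer.id": str(document["customer"]["id"])}


def hash_content(content: bytes, attributes: dict[str, str]) -> dict[str, str]:
    """Write a SHA-256 hex digest of the content into the hash.value attribute."""
    return {**attributes, "hash.value": hashlib.sha256(content).hexdigest()}


def detect_duplicate(attributes: dict[str, str], seen: set[str]) -> str:
    """Decide using only the attribute; the content is not read again."""
    digest = attributes["hash.value"]
    if digest in seen:
        return "duplicate"
    seen.add(digest)
    return "non-duplicate"


content = b'{"customer": {"id": 42}}'
attrs = hash_content(content, evaluate_json_path(content, {"filename": "order.json"}))
seen: set[str] = set()
print(attrs["customer.id"])
print(detect_duplicate(attrs, seen), detect_duplicate(attrs, seen))
```

Only a short identifier and a fixed-length digest end up in attributes; the full JSON document stays in content.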
Common mistakes to avoid
Storing entire JSON blobs in attributes
Attributes are not content. Large values cause performance issues.
Forgetting that attributes are strings
No lists, no objects: everything is string-typed.
Assuming attributes always remain the same through merges or splits
Some processors drop or rewrite them.
Trying to route based on content without extracting attributes first
Use EvaluateJsonPath, ExtractText, or record processors.
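The "attributes are strings" mistake is easy to reproduce. The snippet below is plain Python rather than Expression Language, and the attribute names threshold and tags are made up for the example, but the same pitfall applies to any string-typed value.

```python
import json

attributes = {"record.count": "10", "threshold": "9", "tags": '["a", "b"]'}

# Comparing the raw strings is lexicographic, so "10" sorts before "9".
print(attributes["record.count"] > attributes["threshold"])            # False
# Converting first gives the numeric comparison that was intended.
print(int(attributes["record.count"]) > int(attributes["threshold"]))  # True
# A "list" in an attribute is just text until something parses it.
print(type(attributes["tags"]).__name__, json.loads(attributes["tags"]))
```

Whenever an attribute is meant to be a number, convert it explicitly at the point of comparison instead of relying on how the string happens to sort.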
Quick rule of thumb
Content = actual data
Attributes = everything you use to make decisions about that data
You work with attributes far more often than you work with content.
Related Articles
How Attributes Move Through a Flow
Attribute Evaluation and Expression Language Basics
Best Practices for Attributes
Common Attribute Patterns