FlowFile Content vs Attributes

Modified on Thu, 11 Dec, 2025 at 11:40 AM

Summary

A FlowFile has two parts: content (the data itself) and attributes (metadata about that data). Most processors operate on attributes, not content. Understanding the difference is critical for routing, debugging, and performance in Clockspring.


What a FlowFile actually is

A FlowFile is a lightweight wrapper around two things:

  1. Content

    • The actual bytes of the file, message, record, or payload

    • Stored in Clockspring’s content repository

    • Can be text, JSON, CSV, binary, images, etc.

  2. Attributes

    • Key/value metadata stored directly on the FlowFile

    • Used for decisions, routing, lookups, and processor configuration

    • Strings only: numbers, booleans, and dates are all stored as text

Attributes travel with the FlowFile through the entire flow unless a processor adds, modifies, or removes them.
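
As a rough mental model, the two parts described above can be sketched as a small Python class. This is an illustration only, not Clockspring's implementation: the FlowFile class and the with_attribute helper are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FlowFile:
    """Illustrative model only: a payload plus string-only metadata."""

    content: bytes  # the raw payload (CSV, JSON, PDF bytes, ...)
    attributes: Dict[str, str] = field(default_factory=dict)

    def with_attribute(self, key: str, value: object) -> "FlowFile":
        # Every value is converted to a string, mirroring how attributes behave.
        updated = dict(self.attributes)
        updated[key] = str(value)
        return FlowFile(self.content, updated)


ff = FlowFile(b'{"customer": {"id": 42}}', {"filename": "order.json"})
ff = ff.with_attribute("record.count", 1)  # stored as the string "1"
```

Note that adding the attribute leaves the content bytes untouched; only the metadata changes.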


Content: what it is and how it behaves

Content is the raw data.


Examples:

  • An uploaded CSV

  • A JSON API response

  • The bytes of a PDF

  • A ZIP archive

  • The output of a transformation


Important notes:

  • Modifying content is expensive: content is copy-on-write, so a change writes a new copy instead of editing in place

  • Large content should be processed with record processors rather than copied into attributes

  • Many processors never touch content at all

  • If a processor modifies content, the FlowFile's provenance will show a new content claim


You should only modify content when you must.
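
The cost difference can be pictured with a toy content store. This is a simplified sketch under stated assumptions: ContentRepository and its integer claim IDs are hypothetical stand-ins, and the real content repository is more involved than a dictionary.

```python
import itertools
from typing import Dict


class ContentRepository:
    """Toy stand-in for a content repository: stored content is never edited in place."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> int:
        # Each write produces a new claim; existing claims are left alone.
        claim_id = next(self._ids)
        self._claims[claim_id] = data
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


repo = ContentRepository()
original_claim = repo.write(b"id,name\n1,ada\n")

# Changing an attribute touches only metadata; the content claim is unchanged.
attributes = {"filename": "people.csv"}
attributes["is_duplicate"] = "false"

# Changing content means reading it all and writing a full new copy.
modified_claim = repo.write(repo.read(original_claim).upper())
```

The attribute update is a small in-memory change, while the content update rewrites every byte and leaves the FlowFile pointing at a different claim.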


Attributes: what they are used for

Attributes describe the FlowFile or help drive flow behavior.


Common attribute examples:

  • filename

  • path

  • http.status.code

  • record.count

  • uuid

  • Any values extracted from JSON or CSV

  • Routing flags like is_duplicate or error_message


Attributes are used for:

  • Routing decisions (RouteOnAttribute)

  • Building URLs, SQL queries, filenames

  • Passing data into downstream processors

  • Looking up values (LookupAttribute, caches, DB lookups)

  • Setting parameters for readers/writers

  • Logging, debugging, and provenance tracking


Why processors rely more on attributes than content

Most decisions and branching in a flow come from metadata, not the raw payload.


Examples:

  • Route files based on record.type

  • Skip processing if sha256 matches a known duplicate

  • Use ${customer.id} to build an API path

  • Identify error responses using ${error_message}


Attributes let processors work without reading or rewriting the entire content, which keeps flows much faster.
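
The examples above can be sketched as two small functions that look only at attributes. This is a minimal illustration, not how RouteOnAttribute is implemented; the relationship names and the /customers/ path are made up for the example.

```python
from typing import Dict


def route(attributes: Dict[str, str]) -> str:
    """Pick a relationship name from attributes alone; content is never read."""
    if attributes.get("is_duplicate") == "true":
        return "duplicate"
    if "error_message" in attributes:
        return "failure"
    if attributes.get("record.type") == "invoice":
        return "invoices"
    return "unmatched"


def build_api_path(attributes: Dict[str, str]) -> str:
    # Rough equivalent of a property set to /customers/${customer.id}
    # (the path itself is a hypothetical example).
    return f"/customers/{attributes['customer.id']}"
```

Neither function needs the payload, so the decision costs the same whether the content is 1 KB or 1 GB.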


How content and attributes interact

Think of attributes as “instructions” and content as “material.”


Example:

  • EvaluateJsonPath reads from content

  • It extracts values into attributes

  • Downstream processors use those attributes to decide routing or update content
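
The extraction step can be approximated in a few lines. This sketch assumes a simplified dotted key path instead of real JSONPath, and extract_to_attributes is a hypothetical helper, not part of any processor API.

```python
import json
from typing import Dict


def extract_to_attributes(content: bytes, paths: Dict[str, str]) -> Dict[str, str]:
    """Read content once and copy selected values into string attributes.

    `paths` maps an attribute name to a dotted key path, a simplified
    stand-in for a JSONPath expression.
    """
    document = json.loads(content)
    attributes: Dict[str, str] = {}
    for name, path in paths.items():
        value = document
        for key in path.split("."):
            value = value[key]
        attributes[name] = str(value)  # attributes are always strings
    return attributes


payload = b'{"customer": {"id": 42}, "type": "invoice"}'
attrs = extract_to_attributes(
    payload, {"customer.id": "customer.id", "record.type": "type"}
)
```

After this step, downstream logic can use the two small attribute values without parsing the JSON again.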


Another example:

  • HashContent reads content and writes the hash into an attribute

  • DetectDuplicate then uses that attribute (${hash.value}) to decide if the content is new
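
The hash-then-check pattern looks roughly like this. It is a sketch under assumptions: SHA-256 is chosen here for illustration, an in-memory set stands in for whatever cache the flow actually uses, and both function names are hypothetical.

```python
import hashlib
from typing import Dict, Set


def hash_content(content: bytes, attributes: Dict[str, str]) -> Dict[str, str]:
    # Reads the content once and records the digest as an attribute.
    updated = dict(attributes)
    updated["hash.value"] = hashlib.sha256(content).hexdigest()
    return updated


def detect_duplicate(attributes: Dict[str, str], seen: Set[str]) -> str:
    # Decides using only the attribute; the content is not read again.
    key = attributes["hash.value"]
    if key in seen:
        return "duplicate"
    seen.add(key)
    return "non-duplicate"
```

Only the first step pays the cost of reading the payload; the duplicate check compares short strings.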


You rarely need to store large or complex data in attributes.


Common mistakes to avoid

  • Storing entire JSON blobs in attributes
    Attributes are not content. Large values cause performance issues.

  • Forgetting that attributes are strings
    No lists, no objects: every value is a string, so convert it before doing numeric or structured work.

  • Assuming attributes always remain the same through merges or splits
    Some processors drop or rewrite them.

  • Trying to route based on content without extracting attributes first
    Use EvaluateJsonPath, ExtractText, or record processors.
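
The string-typing mistake is easy to reproduce. The snippet below is a generic Python illustration with made-up attribute values, not Clockspring code; it shows why a numeric-looking attribute needs explicit conversion and how a list has to be encoded as text.

```python
attributes = {"record.count": "10", "http.status.code": "404"}

# Comparing as strings is lexicographic, so "10" sorts before "9".
string_compare = attributes["record.count"] > "9"

# Convert explicitly before doing numeric work.
numeric_compare = int(attributes["record.count"]) > 9

# A list has to be encoded as one string, then decoded where it is used.
attributes["tags"] = ",".join(["urgent", "eu"])
tags = attributes["tags"].split(",")
```

Keeping such encoded values short is in line with the first point above: anything large or deeply structured belongs in content.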


Quick rule of thumb

  • Content = actual data

  • Attributes = everything you use to make decisions about that data


You work with attributes far more often than you work with content.


Related Articles

  • How Attributes Move Through a Flow

  • Attribute Evaluation and Expression Language Basics

  • Best Practices for Attributes

  • Common Attribute Patterns
