RouteOnContent

Modified on Thu, 21 Sep, 2023 at 1:56 PM

Commonly you will want to send flowfiles down different paths based on their content. While the RouteOnAttribute processor evaluates attributes, RouteOnContent looks at the content of the flowfile itself. You can search for literal text, but the most common use is to search the content with a Regular Expression (RegEx).

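This processor has three properties by default. In addition, each expression you want to test is added as a user-defined property: the property name becomes a relationship, and the property value is the expression.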

Match Requirement

This property determines how an expression must match the content before the flowfile is sent down that expression's relationship. The two options are:

Value	Behavior
content must match exactly	The default. The expression must match the entire content of the flowfile; if it does, the flowfile is sent down that expression's relationship.
content must contain match	The flowfile is sent down an expression's relationship if any part of the content matches that expression.

A flowfile that matches more than one expression is sent down each matching relationship, and a flowfile that matches none is routed to the 'unmatched' relationship.
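To make the difference between the two options concrete, here is a small Java sketch of the routing decision. It is not NiFi's source code: it assumes, as a simplification, that "content must match exactly" behaves like Java's Matcher.matches() (the whole content must match) and "content must contain match" like Matcher.find() (any substring may match). The class, the enum, and the relationship names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the RouteOnContent routing decision (not NiFi's implementation).
class RouteOnContentSketch {

    // Stand-ins for the two Match Requirement values.
    enum MatchRequirement { CONTENT_MUST_MATCH_EXACTLY, CONTENT_MUST_CONTAIN_MATCH }

    // rules: user-defined property name (the relationship) -> regular expression.
    // Returns every relationship whose expression matches, or "unmatched" if none do.
    static List<String> route(String content, Map<String, String> rules, MatchRequirement requirement) {
        List<String> destinations = new ArrayList<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher matcher = Pattern.compile(rule.getValue()).matcher(content);
            // matches() must consume the entire content; find() only needs one matching substring.
            boolean hit = requirement == MatchRequirement.CONTENT_MUST_MATCH_EXACTLY
                    ? matcher.matches()
                    : matcher.find();
            if (hit) {
                destinations.add(rule.getKey());
            }
        }
        if (destinations.isEmpty()) {
            destinations.add("unmatched");
        }
        return destinations;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>(); // keeps insertion order for printing
        rules.put("has_salary", "\"salary\"");
        rules.put("is_married", "\"married\".*true");
        String content = "{\"salary\": 56000, \"married\": true}";

        // Both expressions occur somewhere in the content.
        System.out.println(route(content, rules, MatchRequirement.CONTENT_MUST_CONTAIN_MATCH)); // [has_salary, is_married]
        // Neither expression describes the whole content.
        System.out.println(route(content, rules, MatchRequirement.CONTENT_MUST_MATCH_EXACTLY)); // [unmatched]
    }
}
```

The same two expressions land in different places depending only on the Match Requirement, which is why an expression written for one setting rarely works unchanged with the other.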

Character Set

This property sets the character set used to decode the content before the expressions are applied. The default, UTF-8, works in most cases; change it if your content uses another encoding.
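A regular expression runs on text, not bytes, so the content has to be decoded first, and the Character Set decides what text the expression sees. This Java sketch (hypothetical class and method names) decodes the same UTF-8 bytes two ways and searches for a name containing an accented letter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch: decode the raw content with a chosen charset, then search it.
class CharacterSetSketch {

    static boolean containsMatch(byte[] content, Charset charset, String regex) {
        String text = new String(content, charset); // the charset determines the decoded text
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The name "Jose" with an accented e (U+00E9) encodes to the bytes 0xC3 0xA9 in UTF-8.
        byte[] content = "\"name\": \"Jos\u00e9\"".getBytes(StandardCharsets.UTF_8);

        // Decoded with the matching charset, the name is found.
        System.out.println(containsMatch(content, StandardCharsets.UTF_8, "Jos\u00e9"));      // true
        // Decoded as ISO-8859-1, those two bytes become two separate characters
        // (U+00C3, U+00A9), so the search fails.
        System.out.println(containsMatch(content, StandardCharsets.ISO_8859_1, "Jos\u00e9")); // false
    }
}
```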

Content Buffer Size

This property sets how much of the content, starting from the beginning of the file, is read into memory and searched. Because flowfile content can be large, the processor does not load the full content into memory, which would hurt performance; any text beyond this limit is ignored. The default of 1 MB handles most use cases, but you can raise or lower it to search more or less data.
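The effect of the buffer limit can be sketched in Java by decoding and searching only the first bufferSize bytes. The sizes here are deliberately tiny stand-ins for the 1 MB default, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch: only the first bufferSize bytes are decoded and searched,
// so text that starts after that point can never produce a match.
class ContentBufferSketch {

    static boolean containsMatchWithinBuffer(byte[] content, int bufferSize, String regex) {
        byte[] buffered = Arrays.copyOf(content, Math.min(content.length, bufferSize));
        String text = new String(buffered, StandardCharsets.UTF_8);
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // 2,000 bytes of filler followed by the text we want to find (2,015 bytes in total).
        byte[] content = ("x".repeat(2000) + "\"married\": true").getBytes(StandardCharsets.UTF_8);

        System.out.println(containsMatchWithinBuffer(content, 4096, "\"married\".*true")); // true
        System.out.println(containsMatchWithinBuffer(content, 1024, "\"married\".*true")); // false: the key lies past byte 1024
    }
}
```

If the text you route on can appear deep inside large files, the buffer has to be at least as large as the offset where that text ends.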

Examples

All of the examples below assume a flowfile with the following content:

    {  
        "employee": {  
            "name":       "sonoo",   
            "salary":      56000,   
            "married":    true  
        }  
    }

Content Must Match Exactly

In this example, we want to check that the content fully matches our RegEx, so we need an expression that matches the content from start to finish. Here I'm validating that the content is JSON-formatted text with "employee", "name", "salary", and "married" keys, each on its own line:

Property	Value
valid_employee_json	^\{\s*\n\s*\"employee\".*\n\s*\"name\".*\n\s*\"salary\".*\n\s*\"married\".*\n.*\n\s*\}

Because this RegEx matches our JSON completely, the flowfile is routed to the relationship named after our property, 'valid_employee_json'.

In practice this is probably not the most efficient way to validate JSON, but the example shows how the Content Must Match Exactly setting works.
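As a quick check, this Java sketch applies the 'valid_employee_json' expression to the sample flowfile with Matcher.matches(), assuming that is how the exact-match setting behaves. The sample is rebuilt line by line; trailing spaces on its lines would not change the result. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical check of the valid_employee_json expression against the sample flowfile.
class ExactMatchExample {

    // The property value as a Java string literal: every backslash and quote is escaped.
    static final Pattern VALID_EMPLOYEE_JSON = Pattern.compile(
            "^\\{\\s*\\n\\s*\\\"employee\\\".*\\n\\s*\\\"name\\\".*\\n\\s*\\\"salary\\\".*\\n"
            + "\\s*\\\"married\\\".*\\n.*\\n\\s*\\}");

    // The sample flowfile content, one string per line.
    static final String SAMPLE = String.join("\n",
            "{",
            "    \"employee\": {",
            "        \"name\":       \"sonoo\",",
            "        \"salary\":      56000,",
            "        \"married\":    true",
            "    }",
            "}");

    // matches() succeeds only if the expression covers the content from first to last character.
    static boolean isValidEmployee(String content) {
        return VALID_EMPLOYEE_JSON.matcher(content).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmployee(SAMPLE));        // true
        System.out.println(isValidEmployee(SAMPLE + "\n")); // false: a trailing newline is left unmatched
    }
}
```

Because the whole content must be consumed, a file that ends with a newline after the closing brace would not match; ending the expression with \}\s* instead of \} allows for that.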

Content Must Contain Match

If we set the Match Requirement to Content Must Contain Match and run the processor against the same flowfile, it checks whether some part of the content matches the expression (one or more times) and ignores any text that does not match.

In this example, we check whether the "married" key has a value of true:

Property	Value
is_married	\"married\".*true

Now the processor will evaluate this RegEx against the text, and if it matches, will send the flowfile down the "is_married" relationship.
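The same check can be sketched in Java, assuming the contain-match setting behaves like Matcher.find(), which succeeds as soon as any part of the content matches. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical check of the is_married expression using a substring search.
class ContainMatchExample {

    // The is_married property value written as a Java string literal.
    static final Pattern IS_MARRIED = Pattern.compile("\\\"married\\\".*true");

    static boolean isMarried(String content) {
        // find() looks for the pattern anywhere; the rest of the content is ignored.
        return IS_MARRIED.matcher(content).find();
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "{",
                "    \"employee\": {",
                "        \"name\":       \"sonoo\",",
                "        \"salary\":      56000,",
                "        \"married\":    true",
                "    }",
                "}");

        System.out.println(isMarried(sample));                          // true
        System.out.println(isMarried(sample.replace("true", "false"))); // false
    }
}
```

Since . does not match a line break by default, "married" and true must sit on the same line. The .* also means any true later on that line counts, so an expression such as \"married\"\s*:\s*true is stricter.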

Note: Because this JSON is made up of key/value pairs, it is likely more efficient to extract the value into an attribute (using EvaluateJsonPath) and then route on it with a RouteOnAttribute processor.
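A possible configuration for that alternative is sketched below. It assumes an attribute named 'married' (a hypothetical name), EvaluateJsonPath's Destination set to flowfile-attribute so the result is written to an attribute rather than replacing the content, and that the extracted boolean is stored as the text true.

EvaluateJsonPath
Property	Value
Destination	flowfile-attribute
married	$.employee.married

RouteOnAttribute
Property	Value
is_married	${married:equals('true')}

With this setup, RouteOnAttribute sends the flowfile down the 'is_married' relationship when the Expression Language check evaluates to true, without scanning the content at all.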