Improving PutDatabaseRecord Performance by Merging Small FlowFiles


PutDatabaseRecord performs best when it receives larger batches of records instead of a constant stream of tiny FlowFiles. Each FlowFile sent into PutDatabaseRecord results in its own database operation. When those FlowFiles contain only one or a few records, the processor ends up doing hundreds or thousands of small insert operations. That overhead slows everything down.

You can get significantly better throughput by merging incoming records before they reach PutDatabaseRecord.

Why PutDatabaseRecord struggles with small writes

Every FlowFile routed into PutDatabaseRecord triggers:

  • A database connection checkout

  • A transaction open

  • One or more SQL statements

  • A commit

  • A connection return

If you do this for each individual record, the cost adds up.
If you do it once for a large batch, the overhead is paid once.

The database also benefits because it can process sets of records much more efficiently than it can process single-row inserts.
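
The difference is easy to see in a simplified JDBC sketch. This is only an illustration of the general pattern, not how PutDatabaseRecord is implemented internally, and the events table and payload column are made up for the example:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchInsertSketch {

        // One transaction per row: the per-FlowFile overhead described above
        // is paid again for every single record.
        static void insertRowByRow(Connection conn, String[] rows) throws SQLException {
            conn.setAutoCommit(false);
            for (String row : rows) {
                // "events" and "payload" are hypothetical; substitute your own table.
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO events (payload) VALUES (?)")) {
                    ps.setString(1, row);
                    ps.executeUpdate();
                }
                conn.commit(); // one commit per record
            }
        }

        // The same rows sent as a single statement batch with a single commit.
        static void insertAsBatch(Connection conn, String[] rows) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO events (payload) VALUES (?)")) {
                for (String row : rows) {
                    ps.setString(1, row);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit(); // one commit for the whole batch
        }
    }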

How to improve performance

Insert a MergeRecord processor before PutDatabaseRecord.

Configure MergeRecord to:

  • Combine many input records into a single FlowFile

  • Use a Record Reader and Record Writer that match your schema

  • Control output size with the Maximum Number of Records, Maximum Bin Size, and Max Bin Age properties

This produces FlowFiles that contain batches of records instead of single rows.
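
As a starting point, a MergeRecord configuration along these lines works well. The values shown are illustrative, and exact property names can vary slightly between versions:

  • Record Reader and Record Writer: services that match your schema (JSON, Avro, CSV, etc.)

  • Merge Strategy: Bin-Packing Algorithm

  • Minimum Number of Records: 500 (so bins are not merged while still small)

  • Maximum Number of Records: 1000

  • Maximum Bin Size: 1 MB

  • Max Bin Age: 30 sec (so records never wait longer than this before being written)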

PutDatabaseRecord then handles each batch in a single database operation, which typically yields a substantial improvement in throughput.

Typical configuration pattern

  1. Upstream processor produces one FlowFile per record

  2. MergeRecord combines them into batches, for example:

    • 500 records per FlowFile

    • Or up to 1 MB per FlowFile

  3. PutDatabaseRecord writes each batch as a single operation

This reduces the number of database transactions dramatically.

Example:
1,000 FlowFiles with 1 record each vs. 2 FlowFiles with 500 records each.

Two database operations carry far less overhead than one thousand.

Things to keep in mind

  • Very large batches can increase memory usage, so start small (100–500 records) and tune upward.

  • If your table has strict constraints, one bad record in a batch can cause the entire batch to fail. You may want to add a validation step or route errors to a retry flow.

  • Index-heavy tables can still slow down on huge batches. Adjust batch size based on table performance.

  • Your database may enforce maximum packet or batch size limits; MySQL's max_allowed_packet setting is a common example.

Summary

If PutDatabaseRecord feels slow, it’s usually because it’s processing too many tiny FlowFiles. Merging your records into reasonable batch sizes reduces overhead and lets both Clockspring and the database work far more efficiently.
