Building Incremental API Pulls in Clockspring

Modified on Fri, 12 Dec, 2025 at 12:28 PM

Most APIs are too large to pull in full every time. Incremental pulls let you request only what changed since the last run.

Done right, this makes API integrations faster and safer.
Done wrong, it causes missed data, duplicates, or both.

This article shows the correct way to build incremental API pulls in Clockspring.

What “Incremental” Really Means

An incremental API pull means:

Each run requests a time window or cursor
That window moves forward over time
Previously seen data may reappear
The flow must tolerate overlap and duplicates

Incremental does not mean “exactly once.”
It means “eventually complete without gaps.”

The Core Rules

Track state outside the API call
Advance state only after success
Use overlap windows
Assume duplicates will happen

Break any of these and data will fall through the cracks.

Common Incremental Patterns

APIs usually support one of these:

updated_since or modified_after timestamps
Cursor or token values
Incrementing IDs

Time-based pulls are the most common, so this KB focuses on those.

Where to Store “Last Run” State

You need a reliable place to store the last successful value.

Recommended options

Database table (most reliable)

One row per integration
Stores last successful timestamp or cursor
Survives restarts and redeploys

External system

Config service
Secrets store
Parameter service

What to avoid

In-memory attributes only
Hardcoded values
Advancing state mid-flow

If the node restarts, in-memory state is gone.
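The database-table option can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a Clockspring feature. The table name integration_state, its columns, and the helper functions are hypothetical names chosen for the example. It keeps one row per integration and overwrites that row on each save.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per integration. Table and column names are
# illustrative; adapt them to your own database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS integration_state (
    integration TEXT PRIMARY KEY,
    last_value  TEXT NOT NULL,  -- last successful timestamp or cursor
    saved_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

def init_state(conn: sqlite3.Connection) -> None:
    """Create the state table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()

def read_last_value(conn: sqlite3.Connection, integration: str,
                    default: Optional[str] = None) -> Optional[str]:
    """Return the stored value, or `default` on the very first run."""
    row = conn.execute(
        "SELECT last_value FROM integration_state WHERE integration = ?",
        (integration,),
    ).fetchone()
    return row[0] if row else default

def save_last_value(conn: sqlite3.Connection, integration: str, value: str) -> None:
    """Insert or overwrite the single row for this integration."""
    conn.execute(
        """
        INSERT INTO integration_state (integration, last_value)
        VALUES (?, ?)
        ON CONFLICT(integration) DO UPDATE SET
            last_value = excluded.last_value,
            saved_at   = CURRENT_TIMESTAMP
        """,
        (integration, value),
    )
    conn.commit()
```

To survive restarts, point the connection at a file or at your production database rather than an in-memory database. The ON CONFLICT upsert syntax needs SQLite 3.24 or later.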

Using the Last Run Value in the API Call

Typical pattern:

Read last-run value
Subtract overlap time
Pass it into the API request

Example attributes:

last_run = 2025-01-10T12:00:00Z
window_start = ${last_run:toDate("yyyy-MM-dd'T'HH:mm:ss'Z'", "GMT"):toNumber():minus(1800000):format("yyyy-MM-dd'T'HH:mm:ss'Z'", "GMT")}

InvokeHTTP URL example:

${base_url}?updated_since=${window_start}

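The minus(1800000) subtracts 30 minutes (1,800,000 ms) from the last-run value, so each run intentionally re-requests some data the previous run already saw.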
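If you want to check the overlap arithmetic outside the flow, the same steps can be reproduced in Python. This is a sketch that assumes last_run always uses the yyyy-MM-dd'T'HH:mm:ss'Z' UTC format from the example. The function names and the base URL in the usage line are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Same layout as the NiFi pattern yyyy-MM-dd'T'HH:mm:ss'Z' (UTC).
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def window_start(last_run: str, overlap_ms: int = 1_800_000) -> str:
    """Parse last_run as UTC, subtract the overlap, and re-format it."""
    parsed = datetime.strptime(last_run, TS_FORMAT).replace(tzinfo=timezone.utc)
    start = parsed - timedelta(milliseconds=overlap_ms)
    return start.strftime(TS_FORMAT)

def build_url(base_url: str, last_run: str, overlap_ms: int = 1_800_000) -> str:
    """Append updated_since=<window start> as a query parameter."""
    query = urlencode({"updated_since": window_start(last_run, overlap_ms)})
    return f"{base_url}?{query}"

# window_start("2025-01-10T12:00:00Z") gives "2025-01-10T11:30:00Z"
print(build_url("https://api.example.com/orders", "2025-01-10T12:00:00Z"))
```

urlencode percent-encodes the colons as %3A. Standard servers decode this to the same value that the NiFi example sends unencoded.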

Why Overlap Windows Matter

If you:

Run every 60 minutes
Query exactly the last 60 minutes

You will miss:

Late-arriving records
Boundary updates
Backfilled data

Recommended approach

Run every 30 minutes
Query the last 60 minutes (or 45)
Handle duplicates downstream

Overlap is a safety net, not a mistake.
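The late-arriving case can be made concrete with a small simulation. This is a simplified model: each run queries the window [run time minus lookback, run time), as in the schedule example above. A record counts as late-arriving when its updated_at is earlier than the moment the API first returns it, for example because of replication lag. The function and helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

def at(hour: int, minute: int) -> datetime:
    """Timestamp on an arbitrary fixed day, for the simulation only."""
    return datetime(2025, 1, 10, hour, minute)

def is_captured(updated_at: datetime, visible_at: datetime,
                runs: List[datetime], lookback: timedelta) -> bool:
    """True if some run can already see the record AND the record's
    updated_at falls inside that run's window [run - lookback, run)."""
    for run_at in runs:
        start = run_at - lookback
        if visible_at <= run_at and start <= updated_at < run_at:
            return True
    return False

# Record stamped 12:55 that the API only returns from 13:05 onward.
hour = timedelta(minutes=60)
hourly = is_captured(at(12, 55), at(13, 5), [at(13, 0), at(14, 0)], hour)
overlapping = is_captured(at(12, 55), at(13, 5),
                          [at(13, 0), at(13, 30), at(14, 0)], hour)
print(hourly, overlapping)
```

With hourly runs and no overlap, the 13:00 run cannot see the record yet. The 14:00 run's window starts after 12:55, so the record is never picked up. With 30-minute runs and a 60-minute lookback, the 13:30 run catches it.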

When to Advance the Last Run Value

Only advance state after the flow succeeds.

Correct:

API call succeeds
Pagination completes
Data is written successfully
Then update last-run value

Incorrect:

Updating state at flow start
Updating after the first page
Updating even when downstream fails

Advancing state too early causes permanent data loss.
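The "correct" ordering above can be sketched as one function. fetch_page, write_records, and save_state are hypothetical callables that stand in for the API request, the downstream write, and the state-table update. They are not Clockspring APIs. The max comparison assumes every updated_at uses the same fixed-width UTC format, so the strings sort chronologically.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical callable shapes, for illustration only.
FetchPage = Callable[[str, Optional[str]], Tuple[List[Dict], Optional[str]]]
WriteRecords = Callable[[List[Dict]], None]
SaveState = Callable[[str], None]

def run_incremental(window_start: str, last_value: str,
                    fetch_page: FetchPage, write_records: WriteRecords,
                    save_state: SaveState) -> str:
    """Fetch and write every page, then advance state exactly once.

    An exception from fetch_page or write_records propagates before
    save_state is reached. The stored value stays unchanged and the next
    run re-requests the same window.
    """
    new_value = last_value
    page_token: Optional[str] = None
    while True:
        records, page_token = fetch_page(window_start, page_token)
        write_records(records)  # raises if the downstream write fails
        for rec in records:
            # Track the max updated_at across ALL pages, not just the first.
            if rec["updated_at"] > new_value:
                new_value = rec["updated_at"]
        if page_token is None:  # pagination complete
            break
    save_state(new_value)  # reached only when every page succeeded
    return new_value
```

Keeping save_state as the last statement makes the success condition structural. There is no code path that updates state after a partial run.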

What Value Should You Store?

Best options:

Max updated_at value seen across all pages of the run
Cursor returned by the API

Avoid:

“Current time”
Assumed end-of-window timestamps

Always store what the API actually returned.
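Finding the max updated_at safely means comparing parsed times rather than raw strings, and not relying on response order. This is a minimal sketch. It assumes every timestamp is ISO 8601 with either a trailing Z or an explicit UTC offset, and the function name is hypothetical. It returns the winning value exactly as the API sent it, which is what gets stored.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional

def _parse(ts: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit +00:00 offset first.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)

def max_updated_at(records: Iterable[Dict],
                   field: str = "updated_at") -> Optional[str]:
    """Return the latest timestamp string, unchanged, or None if absent."""
    best_raw: Optional[str] = None
    best_dt: Optional[datetime] = None
    for rec in records:  # no assumption about response order
        raw = rec.get(field)
        if raw is None:
            continue
        dt = _parse(raw)
        if best_dt is None or dt > best_dt:
            best_raw, best_dt = raw, dt
    return best_raw

records = [
    {"updated_at": "2025-01-10T12:05:00Z"},
    {"updated_at": "2025-01-10T12:40:00Z"},
    {"updated_at": "2025-01-10T14:30:00+02:00"},  # 12:30 UTC
]
print(max_updated_at(records))
```

A plain string max() over these records would pick the +02:00 value, which is actually earlier than 12:40 UTC. Parsing avoids that.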

How This Interacts with Pagination

Incremental pulls and pagination stack together:

Incremental controls which slice of time
Pagination controls how much data per request

Rules still apply:

Validate pagination against the API
Expect duplicates across runs
Deduplicate or upsert downstream

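They solve different problems, so keep their state separate: a page token only moves you through the current run, and it should never be stored as the last-run value.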
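An upsert downstream is one way to make overlapping windows and repeated pages harmless. This sketch uses SQLite's INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24 or later). The table name records, its columns, and the helper names are hypothetical. The WHERE clause stops a stale replay from overwriting a newer version. It assumes updated_at strings share one fixed-width UTC format, so text comparison is chronological.

```python
import sqlite3
from typing import Dict, Iterable

def init_target(conn: sqlite3.Connection) -> None:
    """Hypothetical target table keyed by the API's record id."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS records (
            id         TEXT PRIMARY KEY,
            payload    TEXT NOT NULL,
            updated_at TEXT NOT NULL
        )
        """
    )

def upsert_records(conn: sqlite3.Connection, rows: Iterable[Dict]) -> None:
    """Insert new ids; update existing ids only when the incoming copy is
    at least as new. Re-delivered duplicates become no-ops, not errors."""
    conn.executemany(
        """
        INSERT INTO records (id, payload, updated_at)
        VALUES (:id, :payload, :updated_at)
        ON CONFLICT(id) DO UPDATE SET
            payload    = excluded.payload,
            updated_at = excluded.updated_at
        WHERE excluded.updated_at >= records.updated_at
        """,
        rows,
    )
    conn.commit()
```

Because the key is the API's own record id, receiving the same record in two runs leaves one row, and the newest version wins.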

Common Mistakes

No overlap window
Advancing state before success
Using system time instead of API timestamps
Assuming APIs return data in order
Treating duplicates as errors

Most “missing data” bugs come from state handling, not pagination.

Summary

A safe incremental API pull in Clockspring looks like this:

Read last successful state
Query with an overlap window
Page through results safely
Write data
Advance state only after success