Most APIs are not clean, perfectly incremental data sources.
They resend records, update old data, and deliver results late.
If you assume “each API call equals new data,” you will eventually get:
duplicate rows
missing records
or both
This article explains how to design API ingestion flows in Clockspring that are safe, repeatable, and complete, even when APIs behave poorly.
The Core Rule
Favor completeness over uniqueness at ingest time.
It is always safer to:
ingest overlapping data
allow duplicates temporarily
remove or overwrite duplicates deterministically
Trying to pull data from an API “exactly once” almost always fails.
Why APIs Commonly Produce Duplicates
Duplicates happen because:
APIs resend data intentionally
Records are updated, not just created
Pagination overlaps across runs
Time-based filters are imprecise
Backfills arrive late
This is normal behavior, not a bug.
Your flow must assume duplicates will happen.
Avoid Missing Records: Use Overlapping Time Windows
This is one of the most important patterns in API ingestion.
The risky pattern
Run the flow every 60 minutes
Query the API for records from the last 60 minutes
Why this fails:
Records that arrive late can fall between runs
Boundary timestamps get skipped
Clock drift and rounding cause gaps
Backfilled updates may appear after your run finishes
This is how data quietly falls through the cracks.
The recommended pattern
Run the flow every 30 minutes
Query the API for the last 60 minutes (or last 45)
Expect duplicates
Handle them safely downstream
You intentionally create overlap so nothing is missed.
Rule of thumb:
Your query window should always be larger than your schedule interval
Choose overlap based on how late data can realistically arrive
This trades duplicates for reliability. That is a good trade.
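To make the overlap concrete, here is a minimal sketch of how a run could compute its query window. The 30-minute schedule and 60-minute window come from the pattern above; everything else (names, values) is illustrative, not a Clockspring API.

```python
from datetime import datetime, timedelta, timezone

SCHEDULE_INTERVAL = timedelta(minutes=30)  # how often the flow runs
QUERY_WINDOW = timedelta(minutes=60)       # how far back each run looks

def window_for_run(run_time: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) query window for one run.

    Because QUERY_WINDOW > SCHEDULE_INTERVAL, consecutive windows
    overlap by 30 minutes, so late or boundary records are re-fetched
    instead of skipped. The duplicates this creates are handled
    downstream (see the upsert and dedupe options below).
    """
    return run_time - QUERY_WINDOW, run_time

# Two consecutive runs:
run1 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
run2 = run1 + SCHEDULE_INTERVAL

print(window_for_run(run1))  # 11:00 -> 12:00
print(window_for_run(run2))  # 11:30 -> 12:30 (overlaps 11:30-12:00)
```

If the schedule and window were both 60 minutes, a record timestamped 11:59 that the API only surfaced at 12:01 would fall between the two windows and never be ingested.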
How to Handle Duplicates Safely
There are two practical strategies. Pick one and be consistent.
Option 1: Upsert Using a Stable Key (Preferred)
This is the simplest and most reliable approach.
Requirements:
The API provides a stable identifier (id, external_id, etc.)
Your database table enforces uniqueness on that key
Behavior:
New records insert
Existing records update
Re-sent records do not create duplicates
This works especially well with overlapping time windows.
Important notes:
Make sure the primary or unique key truly represents the record
Understand whether updates should overwrite existing values
For most teams, this is the correct default.
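Here is a minimal sketch of the upsert behavior, using SQLite’s ON CONFLICT syntax purely for illustration; the exact upsert syntax depends on your database engine, and the events table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, status TEXT, updated_at TEXT)"
)

def upsert(records):
    # The PRIMARY KEY on id enforces uniqueness, so a re-sent record
    # updates the existing row instead of creating a duplicate.
    conn.executemany(
        """
        INSERT INTO events (id, status, updated_at)
        VALUES (:id, :status, :updated_at)
        ON CONFLICT(id) DO UPDATE SET
            status = excluded.status,
            updated_at = excluded.updated_at
        """,
        records,
    )
    conn.commit()

# First run inserts; the overlapping second run re-sends the same record.
upsert([{"id": "a1", "status": "open", "updated_at": "2024-01-01T11:45Z"}])
upsert([{"id": "a1", "status": "closed", "updated_at": "2024-01-01T12:10Z"}])

print(conn.execute("SELECT * FROM events").fetchall())
# [('a1', 'closed', '2024-01-01T12:10Z')] -- one row, latest values
```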
Option 2: Deduplicate Before Writing
This approach filters duplicates before the database write.
Common methods:
Track seen IDs in a cache or table
Drop records that already exist
Pass only new records downstream
This can reduce database load but adds complexity.
Use this when:
You cannot upsert
The database schema is fixed
You need strict insert-only behavior
If you do this, deduplicate before batching, not after.
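A minimal sketch of the seen-IDs method, using an in-memory set as a stand-in for the cache or tracking table (a real flow needs state that persists across runs; the names here are hypothetical):

```python
seen_ids: set[str] = set()  # in production: a cache or tracking table

def dedupe(records):
    """Pass only records whose IDs have not been seen before.

    Runs before batching, so a batch never contains a record that
    would be rejected by the insert-only table downstream.
    """
    fresh = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # already ingested on an earlier (overlapping) run
        seen_ids.add(record["id"])
        fresh.append(record)
    return fresh

batch1 = dedupe([{"id": "a1"}, {"id": "a2"}])
batch2 = dedupe([{"id": "a2"}, {"id": "a3"}])  # a2 re-sent by the overlap
print(batch1)  # [{'id': 'a1'}, {'id': 'a2'}]
print(batch2)  # [{'id': 'a3'}]
```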
Where Deduplication Should Happen in the Flow
Correct order:
Pull data from the API
Normalize records
Deduplicate or upsert logically
Batch records
Write to the database
Do not:
batch first and dedupe later
rely on retries to fix duplicates
assume pagination prevents overlap
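As a sketch of that ordering, with every function a hypothetical stand-in for the corresponding step or processor in your flow:

```python
def fetch_records(start, end):
    # Stand-in for the API call; note the re-sent record from overlap.
    return [{"id": "a1"}, {"id": "a2"}, {"id": "a2"}]

def normalize(record):
    return {"id": str(record["id"])}

def dedupe_by_id(records):
    seen, fresh = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            fresh.append(r)
    return fresh

def batched(records, size):
    return [records[i:i + size] for i in range(0, len(records), size)]

def write_batch(batch):
    print("writing", batch)  # stand-in for the database write

# Correct order: pull -> normalize -> dedupe -> batch -> write
raw = fetch_records("11:00", "12:00")
normalized = [normalize(r) for r in raw]
unique = dedupe_by_id(normalized)
for batch in batched(unique, size=100):
    write_batch(batch)
```

Deduplicating before batching means a duplicate can never poison an entire batch write.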
Common Mistakes
Scheduling the flow at the same interval as the query window, leaving no overlap
Assuming APIs return “only new data”
Treating duplicates as errors
Trying to eliminate duplicates at the API call layer
Ignoring late-arriving updates
Most duplicate problems are design issues, not processor issues.
How This Fits with Pagination and Rate Limits
These patterns work together:
Pagination loops may resend records
Overlapping windows will resend records
Rate limiting may cause retries
Your flow should assume:
records can appear more than once
order is not guaranteed
safety comes from keys and logic, not timing
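A short sketch of that principle: a hypothetical pagination loop that stays correct even when pages overlap, because records are written by key rather than trusted to arrive exactly once and in order:

```python
store = {}  # keyed store standing in for an upsert-capable table

def fetch_page(cursor):
    # Hypothetical paginated API: pages can overlap, and a retry after
    # a rate-limit error may re-send an entire page.
    pages = {
        None: ([{"id": "a1", "v": 1}, {"id": "a2", "v": 1}], "p2"),
        "p2": ([{"id": "a2", "v": 2}, {"id": "a3", "v": 1}], None),
    }
    return pages[cursor]

cursor = None
while True:
    records, cursor = fetch_page(cursor)
    for r in records:
        store[r["id"]] = r  # keyed write: resends overwrite, never duplicate
    if cursor is None:
        break

print(sorted(store))  # ['a1', 'a2', 'a3'] -- no duplicates despite overlap
```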
Summary
To ingest API data safely in Clockspring:
Use overlapping time windows
Expect duplicate records
Prevent duplicates with upserts or deduplication
Batch records for performance
Design for completeness first
If you never miss data, duplicates become a solvable problem.
If you miss data, you may never know it happened.