The Problem with APIs & Data

Jul 31, 2025
For years, APIs have been the default solution for getting data out of SaaS tools and into a customer’s data environment. But at modern data scale, this approach is starting to crack.

What used to work fine for small datasets and simple use cases is now buckling under the pressure of increasingly large datasets and complex customer demands.

⏳ Rate Limits

Most APIs implement rate limits to protect infrastructure and ensure fair use. While this makes sense in general, it becomes a major bottleneck when dealing with data export since data workloads are inherently bursty.

Rate limits are imposed in many ways: calls per minute (or another time unit), concurrent calls, total volume, or per-IP throttling, to name a few. However rate limiting is implemented, it adds significant complexity to pulling large volumes of data and causes data pipelines to fail at unpredictable times.

[Meme about rate limits]

API rate limits throttle progress, causing timeouts, retries, and ultimately a degraded user experience.
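
To make the pain concrete, here's a minimal sketch in Python (using the requests library against a hypothetical /v1/records endpoint) of the backoff logic every consumer ends up writing just to survive HTTP 429 responses:

```python
import time
import requests

BASE_URL = "https://api.example.com/v1/records"  # hypothetical endpoint

def fetch_page(session: requests.Session, params: dict, max_retries: int = 5) -> dict:
    """Fetch one page of records, backing off whenever the API returns 429."""
    for attempt in range(max_retries):
        resp = session.get(BASE_URL, params=params, timeout=30)
        if resp.status_code == 429:
            # Honor Retry-After if the API provides it; otherwise back off exponentially.
            wait_seconds = int(resp.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait_seconds)
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("rate limit never cleared after retries")
```

Every customer pulling data from your API ends up maintaining some version of this, and it still only papers over the problem: the export runs as fast as your rate limit allows, not as fast as the data needs to move.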

📖 Pagination Problems

APIs are designed for transactional workloads — one request, one response. But exporting data is inherently different. It’s a bulk operation, often involving millions of rows. While pagination helps, paginating through 500+ requests is slow and error-prone.

Queries time out. Rate limits are hit. Hopefully customers have good error handling and their pipeline doesn’t fail on page 287 due to a bad record. Even with retry logic and batching, pagination leads to slow data exports.
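
Here's a rough sketch of what that loop looks like in practice (endpoint, field names, and page size are illustrative), and why one malformed record deep in the export can sink the whole run:

```python
import requests

def export_all(base_url: str, api_key: str) -> list[dict]:
    """Walk a cursor-paginated endpoint; millions of rows means hundreds of round trips."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {api_key}"
    rows: list[dict] = []
    cursor = None
    while True:
        params = {"limit": 10_000}
        if cursor:
            params["cursor"] = cursor
        resp = session.get(base_url, params=params, timeout=30)
        resp.raise_for_status()  # a timeout or 429 here aborts the entire export
        data = resp.json()
        for record in data["results"]:
            # One record missing a field on, say, page 287 raises here and fails the pipeline.
            rows.append({"id": record["id"], "updated_at": record["updated_at"]})
        cursor = data.get("next_cursor")
        if cursor is None:
            return rows
```

Wrapping this in retries and checkpointing helps, but it doesn't change the underlying math: hundreds of sequential round trips for what should be a single bulk transfer.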

🐢 Backfill Bottlenecks

Backfilling historical data is a common requirement — whether for analytics, compliance, or migration. Unfortunately, APIs are rarely optimized for this use case. Since they’re designed for real-time interaction, they prioritize low-latency reads of individual records, not the high-throughput extraction needed to move years of data.

Backfilling may involve running dozens or hundreds of jobs over several days just to get complete history, all while managing pagination, rate limits, and retries. And hopefully the backfill doesn’t use up your monthly quota and cause your daily job to fail.
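
As a back-of-the-envelope illustration (dates and window size are made up), even just carving a backfill into rate-limit-friendly windows shows how quickly the job count grows:

```python
from datetime import date, timedelta

def backfill_windows(start: date, end: date, days_per_job: int = 7):
    """Split a multi-year backfill into small date windows so each job stays under API limits."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=days_per_job), end)
        yield cursor, window_end
        cursor = window_end

# Three years of history at one week per job is over 150 separate export jobs,
# each subject to the same rate limits, pagination, and retries as the daily sync,
# and all of them drawing down the same monthly API quota.
jobs = list(backfill_windows(date(2022, 1, 1), date(2025, 1, 1)))
print(len(jobs))  # 157
```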

Meet Pontoon: API-Free Data Export (and open source!)

🚀 Direct Warehouse Integration

Pontoon takes a fundamentally different approach: instead of exporting data via an API, it syncs data directly to your customer's data warehouse. Whether they use Snowflake, BigQuery, or Redshift, Pontoon handles the connection seamlessly.

This eliminates the need for customers to build ETL pipelines altogether and eases the burden on your API infrastructure. Your product controls what gets synced and when, and your customers get the data exactly where they want it, in a format they can immediately use.
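
For the customer, "immediately usable" just means the export shows up as another table in their warehouse. A minimal sketch with the Snowflake Python connector (connection details and table name are illustrative, not Pontoon's API) is all the "pipeline" they need:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Connection details and the events table are illustrative; the point is that
# synced data is queried like any other warehouse table, with no ETL code.
conn = snowflake.connector.connect(
    account="acme-xy12345",
    user="ANALYTICS_USER",
    password="...",
    warehouse="ANALYTICS_WH",
    database="VENDOR_SYNC",
    schema="PUBLIC",
)
cur = conn.cursor()
cur.execute(
    "SELECT id, status, updated_at FROM events "
    "WHERE updated_at >= DATEADD(day, -7, CURRENT_DATE)"
)
for row in cur.fetchall():
    print(row)
```

No rate limits, no pagination, no retry logic; the data is already where the analysis happens.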

✅ Open Source & Self-Hosted

Pontoon is open source (GitHub) and self-hosted, giving you full control over your data and infrastructure. That means no third-party vendors to trust, no data sent to external services, and full observability into every sync.

You can deploy it in your own cloud, integrate it with your existing systems, and customize it as needed. This flexibility is especially important for teams with strict data compliance or security requirements, as it lets you stay in control without sacrificing capability.