
API Rate Limiting: How to Implement It and Why Skipping It Costs You

APIs without rate limiting are open doors for brute force attacks, credential stuffing, expensive scraping, and denial-of-service. It's one of the OWASP API Top 10 for a reason — and one of the easiest controls to implement correctly.


Rate limiting maps to item #4 of the OWASP API Security Top 10: API4:2023 Unrestricted Resource Consumption (listed in 2019 as Lack of Resources & Rate Limiting). It appears on the list because APIs without rate limits are routinely abused in ways that cost money, degrade performance, and enable downstream attacks. Despite being well understood, the control is still missing from the critical endpoints of a significant share of production APIs.

This guide explains what rate limiting is, the strategies for implementing it, what attackers do when it's absent, and the practical implementation for the most common API frameworks. For a broader look at the API security landscape, the complete API security guide covers the full set of controls.

What Rate Limiting Is (And What It Isn't)

Rate limiting controls how many requests a client can make to your API within a defined time window. A rate limit of 100 requests per minute means that after the 100th request in any 60-second window, subsequent requests receive a 429 Too Many Requests response until the window resets.

Rate limiting is different from:

  • Authentication — Rate limiting doesn't verify who the requester is. It limits frequency regardless of identity.
  • Authorization — Rate limiting doesn't control what a requester can access. It controls how often they can access it.
  • Throttling — Throttling typically slows responses rather than rejecting them. Rate limiting returns an error response.

The client identifier for rate limiting is usually IP address (for unauthenticated endpoints) or API key / user ID (for authenticated endpoints). Using both in combination — per-IP and per-user — provides the best coverage.
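As a sketch of that combined approach, a rate-limit key can encode either identity depending on whether the request is authenticated. The helper name and key format below are illustrative, not from any specific library:

```javascript
// Build a rate-limit storage key from endpoint plus client identity.
// Authenticated requests are limited per user; anonymous ones per IP.
function buildRateLimitKey(endpoint, ip, userId) {
  const identity = userId ? `user:${userId}` : `ip:${ip}`;
  return `ratelimit:${endpoint}:${identity}`;
}

console.log(buildRateLimitKey("/api/auth/login", "203.0.113.7", null));
// "ratelimit:/api/auth/login:ip:203.0.113.7"
console.log(buildRateLimitKey("/api/orders", "203.0.113.7", "u_42"));
// "ratelimit:/api/orders:user:u_42"
```

In practice you would maintain two counters (one per-IP, one per-user) and reject the request if either limit is exceeded.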

What Attackers Do Without Rate Limiting

Understanding the threat model makes the control easier to justify and correctly scope.

Credential stuffing on auth endpoints

Credential stuffing is the automated testing of username/password combinations from breached credential lists against your login endpoint. Tools like Snipr can test millions of combinations per hour against unprotected endpoints. Without rate limiting on /api/auth/login or /api/users/login, attackers can trivially attempt a large fraction of your user base's credentials.

Even if passwords are hashed with a strong algorithm, credential stuffing works because many users reuse passwords from other services that have been breached — the attacker submits valid credentials rather than cracking hashes. Rate limiting doesn't need to be aggressive: 5 failed attempts per IP per minute is enough to make automated stuffing impractical.

Enumeration and scraping

Without rate limiting, competitors and data aggregators can systematically scrape your entire product catalog, user directory, or pricing data in minutes. If your API has endpoints like /api/products/{id} or /api/users/{id}, a bot can iterate through IDs and collect every record you have.

Rate limiting adds friction that makes bulk scraping economically unattractive — especially when combined with API key authentication requirements.

Resource exhaustion and cost amplification

Expensive operations (AI inference, PDF generation, email sending, database-heavy queries) have a per-request cost. Without rate limiting, a single malicious actor — or a buggy client in an infinite loop — can generate thousands of expensive requests, driving up your infrastructure and third-party API costs. This is especially relevant for common API security mistakes in AI-integrated applications.

Is rate limiting actually enforced on your live API?

Configuration and reality often diverge. Scantient tests your live endpoints to verify rate limiting is in effect — not just in code. Free scan, no signup.

Scan Your API Free →

Rate Limiting Strategies

There are three main algorithms. Each has tradeoffs around burstiness, accuracy, and implementation complexity.

Fixed Window

The simplest approach: count requests in fixed time buckets (e.g., every 60 seconds). When the count exceeds the limit, reject requests until the window resets.

Pros: Simple to implement, predictable reset time for clients.

Cons: Vulnerable to burst attacks. A client can make 100 requests at the end of one window and 100 more at the start of the next — 200 requests in two seconds against a "100 requests per minute" limit.
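A minimal in-memory sketch of the fixed-window counter (names are illustrative; single-process only, so not suitable for distributed deployments):

```javascript
// Fixed-window limiter: bucket requests by which window they fall in.
function createFixedWindowLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, count }
  return function allow(key, now = Date.now()) {
    // All timestamps in the same bucket share the same windowStart.
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      counters.set(key, { windowStart, count: 1 }); // new window, reset count
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

const allow = createFixedWindowLimiter(3, 60_000); // 3 requests / minute
console.log([allow("ip:1", 0), allow("ip:1", 1), allow("ip:1", 2), allow("ip:1", 3)]);
// → [ true, true, true, false ]
console.log(allow("ip:1", 60_000)); // → true — a new window has started
```

Note how the last call succeeds the instant the window rolls over — that abrupt reset is exactly the boundary-burst weakness described above.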

Sliding Window

Counts requests in a rolling window rather than fixed buckets. Each request is recorded with its timestamp; the rate limiter counts requests in the last N seconds.

Pros: Eliminates the boundary burst problem. More accurate rate enforcement.

Cons: More storage required (must track individual request timestamps). Slightly more complex to implement.

Sliding window is the recommended approach for most APIs. Redis sorted sets make it straightforward to implement at scale.
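The sorted-set idea can be sketched in-process like this (illustrative names; a production version would run ZADD, ZREMRANGEBYSCORE, and ZCARD against a per-client Redis key instead of a Map):

```javascript
// Sliding-window limiter: keep the timestamps of admitted requests and
// count how many fall inside the last windowMs.
function createSlidingWindowLimiter(limit, windowMs) {
  const timestamps = new Map(); // key -> array of admitted request times
  return function allow(key, now = Date.now()) {
    // Prune entries older than the window (Redis: ZREMRANGEBYSCORE).
    const recent = (timestamps.get(key) ?? []).filter((t) => t > now - windowMs);
    if (recent.length >= limit) {
      timestamps.set(key, recent);
      return false; // at capacity within the last windowMs
    }
    recent.push(now); // record this request (Redis: ZADD)
    timestamps.set(key, recent);
    return true;
  };
}

const allow = createSlidingWindowLimiter(2, 1000); // 2 requests / second
console.log(allow("k", 0));    // → true
console.log(allow("k", 100));  // → true
console.log(allow("k", 900));  // → false — 2 requests already in the last 1000 ms
console.log(allow("k", 1200)); // → true — the earlier requests have aged out
```

Unlike the fixed window, there is no reset boundary to exploit: the window slides with every request.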

Token Bucket

Each client has a "bucket" of tokens. Each request consumes one token. Tokens refill at a constant rate. When the bucket is empty, requests are rejected.

Pros: Allows controlled bursting (clients can accumulate tokens during quiet periods and use them in bursts). Good for APIs where bursting is legitimate.

Cons: More complex state to maintain. Bursting may be undesirable for some endpoints.

Token bucket is well-suited for APIs where you want to allow short bursts (e.g., a user rapidly paging through results) while still enforcing sustained limits.
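A minimal sketch of the refill-and-consume logic (illustrative names; timestamps are passed in explicitly to keep the example deterministic):

```javascript
// Token bucket: start full, refill at a constant rate, spend one per request.
function createTokenBucket(capacity, refillPerSec) {
  let tokens = capacity;
  let last = 0; // time of the previous refill, in ms
  return function allow(nowMs) {
    // Refill proportionally to elapsed time, capped at capacity.
    tokens = Math.min(capacity, tokens + ((nowMs - last) / 1000) * refillPerSec);
    last = nowMs;
    if (tokens >= 1) {
      tokens -= 1;
      return true;
    }
    return false;
  };
}

const allow = createTokenBucket(3, 1); // burst of 3, refills 1 token/sec
console.log([allow(0), allow(0), allow(0), allow(0)]); // → [ true, true, true, false ]
console.log(allow(1000)); // → true — one token refilled after 1 s
```

The burst of three back-to-back successes is the defining behavior: a client that has been quiet can spend its accumulated tokens at once, while the sustained rate stays capped at refillPerSec.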

Implementation Examples

Next.js API Routes (with Upstash Redis)

Upstash provides a serverless Redis-compatible store with a rate-limiting SDK designed for edge and serverless environments:

import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
import { NextRequest, NextResponse } from "next/server";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, "60 s"),
  analytics: true,
});

export async function POST(req: NextRequest) {
  // NextRequest.ip was Vercel-specific and removed in Next.js 15;
  // derive the client IP from the forwarded header instead.
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "127.0.0.1";
  const { success, limit, reset, remaining } = await ratelimit.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: "Too many requests" },
      {
        status: 429,
        headers: {
          "X-RateLimit-Limit": limit.toString(),
          "X-RateLimit-Remaining": remaining.toString(),
          "X-RateLimit-Reset": reset.toString(),
          "Retry-After": Math.ceil((reset - Date.now()) / 1000).toString(),
        },
      }
    );
  }

  // Handle the request normally
  return NextResponse.json({ ok: true });
}

Express.js (with express-rate-limit)

const rateLimit = require('express-rate-limit');

const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 20, // 20 attempts per 15 min per IP
  message: { error: 'Too many requests, please try again later.' },
  standardHeaders: true, // Return RateLimit-* headers
  legacyHeaders: false,
  // For distributed systems, use Redis store:
  // store: new RedisStore({ ... })
});

app.use('/api/auth/login', authLimiter);
app.use('/api/auth/reset-password', authLimiter);

Applying Different Limits by Endpoint

Not all endpoints deserve the same rate limit. A blanket 100 requests/minute applied globally will be too strict for some endpoints and too lenient for others. Apply limits based on the risk profile of each endpoint type:

| Endpoint type                | Suggested limit         | Rationale                            |
|------------------------------|-------------------------|--------------------------------------|
| Auth (login, register)       | 5–10 / 15 min per IP    | Prevent brute force / stuffing       |
| Password reset               | 3 / hour per IP         | Prevent email bombing                |
| AI / expensive compute       | 10–20 / min per user    | Control compute cost                 |
| Read endpoints (data)        | 100–500 / min per user  | Allow legitimate use, limit scraping |
| Write endpoints (mutations)  | 30–60 / min per user    | Prevent spam, limit data writes      |
| Webhooks / public            | 60 / min per IP         | Baseline protection                  |
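One way to wire these tiers into code is a small config lookup consulted by your rate-limiting middleware. The paths, names, and matching rules below are illustrative:

```javascript
// Per-endpoint limit profiles mirroring the table above.
const LIMITS = {
  auth:   { max: 10,  windowMs: 15 * 60 * 1000, keyBy: "ip" },
  reset:  { max: 3,   windowMs: 60 * 60 * 1000, keyBy: "ip" },
  ai:     { max: 20,  windowMs: 60 * 1000,      keyBy: "user" },
  read:   { max: 300, windowMs: 60 * 1000,      keyBy: "user" },
  write:  { max: 60,  windowMs: 60 * 1000,      keyBy: "user" },
  public: { max: 60,  windowMs: 60 * 1000,      keyBy: "ip" },
};

// Pick the profile for a request. Most specific prefix wins;
// reads and writes on data endpoints are split by HTTP method.
function limitFor(path, method = "GET") {
  if (path.startsWith("/api/auth/reset")) return LIMITS.reset;
  if (path.startsWith("/api/auth"))       return LIMITS.auth;
  if (path.startsWith("/api/ai"))         return LIMITS.ai;
  if (path.startsWith("/api/"))           return method === "GET" ? LIMITS.read : LIMITS.write;
  return LIMITS.public;
}

console.log(limitFor("/api/auth/login", "POST").max); // → 10
console.log(limitFor("/api/products").max);           // → 300
```

Keeping the numbers in one table-like object also makes the limits easy to review against the policy above during an audit.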

The 429 Response: Getting It Right

When you reject a request due to rate limiting, the response matters — both for legitimate clients that need to handle backoff correctly, and for your debugging experience.

Always include these headers in 429 responses:

  • Retry-After: seconds until the client can retry
  • X-RateLimit-Limit: the limit that was exceeded
  • X-RateLimit-Remaining: requests remaining in current window (0)
  • X-RateLimit-Reset: Unix timestamp when the window resets

Return a clear JSON body: {"error": "rate_limit_exceeded", "retryAfter": 45}. Never return an empty body on a 429 — it makes debugging miserable for API consumers.
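A sketch of a helper that assembles the headers and body above from limiter state (the function name and return shape are illustrative, not tied to any framework):

```javascript
// Build a well-formed 429 response from limiter state.
// resetUnixSeconds is the Unix timestamp (in seconds) when the window resets.
function buildRateLimitResponse(limit, remaining, resetUnixSeconds, nowMs = Date.now()) {
  const retryAfter = Math.max(0, Math.ceil(resetUnixSeconds - nowMs / 1000));
  return {
    status: 429,
    headers: {
      "Retry-After": String(retryAfter),
      "X-RateLimit-Limit": String(limit),
      "X-RateLimit-Remaining": String(remaining),
      "X-RateLimit-Reset": String(resetUnixSeconds),
    },
    body: { error: "rate_limit_exceeded", retryAfter },
  };
}

const res = buildRateLimitResponse(100, 0, 1_700_000_045, 1_700_000_000_000);
console.log(res.headers["Retry-After"]); // → "45"
console.log(res.body);                   // → { error: "rate_limit_exceeded", retryAfter: 45 }
```

Centralizing the response in one helper keeps the headers and JSON body consistent across every rate-limited endpoint.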

Common Rate Limiting Mistakes

Even teams that implement rate limiting often get one of these wrong:

  • Rate limiting by IP only on authenticated endpoints. IP-based limiting is trivially bypassed by rotating IPs. Authenticated endpoints should rate limit by user/API key identity, not IP.
  • Skipping rate limiting on internal or "admin" endpoints. If an endpoint is accessible from the internet, it needs rate limiting — regardless of how few clients are supposed to use it.
  • Setting limits in code but not verifying them in production. Rate limiting middleware can be accidentally disabled by a middleware ordering change, a deployment configuration issue, or a proxy stripping headers. Verify that rate limiting is actually enforced on your live endpoints — not just configured in code. See the OWASP API Top 10 checklist for what to verify at deployment.
  • Using in-memory storage for distributed systems. In-memory rate limiting doesn't work across multiple instances or serverless function invocations. Use a shared store (Redis, Upstash, DynamoDB) for any distributed deployment.

Scan Your API Free — 60 Seconds

Verify that rate limiting is actually enforced on your live endpoints — not just configured in code. External scan covers rate limiting, headers, CORS, and more.