A Guide to Rate Limiting Strategies

No matter how many resources are allocated, systems have a specific capacity beyond which they don’t operate efficiently. Traffic can arrive in bursts, clients retry aggressively, and shared infrastructure makes one team's spike everyone’s outage.

This is where rate limiting helps as a defensive and fairness mechanism. It protects services from overload and abuse, shapes traffic to match real capacity, and ensures that high-value work does not drown in noise.

Rate limiting matters because it enforces a defined policy at the exact moment a request hits the system. The limiter decides whether a request enters the system now, later, or not at all. A good policy aligns with both reliability and user experience. It can protect downstream applications without surprising clients. In other words, rate limiting is not a feature for edge cases; it is part of the core reliability story, as essential as retries, timeouts, and circuit breakers.

In this article, we will focus on the basics of rate limiting and explore five practical strategies used by modern engineering teams.

1. Rate Limiting Basics

Before diving into algorithms, it is important to understand the basic flow of a rate limiter:

1. A Client sends a request to the system.
2. The request hits a Rate Limiter before reaching the API server.
3. The Rate Limiter checks the Rate Limiting Rules and the current Rate Limiting Data (usually stored in a fast in-memory database like Redis).
4. If the request is within the allowed limits, it is given access and forwarded to the API Server.
5. If the limit has been exceeded, the request is dropped, and the client receives a Status Code 429 (Too Many Requests).
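The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RULES` table, the `rate_limited_call` function, and the `demo_api_server` stub are hypothetical names, a plain dictionary stands in for a store such as Redis, and window resets are left out because the algorithms in the following sections handle them.

```python
from typing import Callable, Dict, Tuple

# Hypothetical rule table: client tier -> max requests (values are assumed).
RULES: Dict[str, int] = {"free": 2, "paid": 5}

# Stand-in for the rate limiting data store; a real deployment would
# typically keep these counters in something like Redis.
counters: Dict[str, int] = {}


def rate_limited_call(client_id: str, tier: str,
                      api_server: Callable[[str], str]) -> Tuple[int, str]:
    """Check the rules and current data, then forward or reject."""
    limit = RULES[tier]                    # step 3: look up the rule
    used = counters.get(client_id, 0)      # step 3: look up current usage
    if used >= limit:                      # step 5: limit exceeded
        return 429, "Too Many Requests"
    counters[client_id] = used + 1         # record the accepted request
    return 200, api_server(client_id)      # step 4: forward to the API server


def demo_api_server(client_id: str) -> str:
    """Hypothetical API server that just greets the caller."""
    return f"hello {client_id}"
```

With the assumed limit of 2 for the "free" tier, a client's first two calls are forwarded and the third receives a 429.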

2. Fixed Window Counter

The Fixed Window Counter is one of the simplest rate-limiting algorithms. Time is divided into fixed, non-overlapping intervals (or "windows"), such as 02:00 to 02:10.

How it works: Each window has a maximum capacity (e.g., 3 requests). As requests come in, a counter increments. Once the counter reaches the capacity limit, all subsequent requests in that time window are dropped.
Pros: Easy to implement and memory-efficient.
Cons: It suffers from the "boundary problem." If one burst arrives at the very end of a window and another at the start of the next, both are accepted, so the system can end up processing up to twice the allowed capacity in a very short timespan.
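A minimal sketch of a fixed window counter may make both the mechanism and the boundary problem concrete. The class name, the per-key dictionary, and the injectable `clock` parameter (used so the behavior can be checked without waiting) are assumptions made for illustration, not part of any particular library.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowCounter:
    """Allows at most `limit` requests per key in each fixed window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # key -> (window index, requests counted in that window)
        self.state: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self.state.get(key, (window, 0))
        if stored_window != window:
            count = 0                      # a new window starts from zero
        if count >= self.limit:
            return False                   # window is full: drop the request
        self.state[key] = (window, count + 1)
        return True
```

With a limit of 3 and 10-second windows, three requests at t = 9.9 s and three more at t = 10.0 s are all accepted, which is six requests within a tenth of a second.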

3. Sliding Window Log

To solve the boundary problem of the fixed window, the Sliding Window Log algorithm keeps a detailed log of request timestamps.

How it works: When a request arrives, the system checks the log. It removes all outdated timestamps (those older than the current time minus the window size). If the number of remaining entries is below the limit, the new request's timestamp is added, and the request is accepted. Otherwise, it is rejected.
Pros: Highly accurate. It ensures the rate limit is never exceeded in any rolling time window.
Cons: It is highly memory-intensive because the system must store a timestamp for every request that is still inside the window, for every client.
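The log-based approach can be sketched as follows. This is a simplified in-memory version under assumptions: the class name and the injectable `clock` are illustrative, and only accepted requests are logged (some variants also log rejected ones).

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLog:
    """Allows at most `limit` requests per key in any rolling window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.logs: Dict[str, Deque[float]] = {}  # key -> accepted timestamps

    def allow(self, key: str) -> bool:
        now = self.clock()
        log = self.logs.setdefault(key, deque())
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the rolling window.
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False                   # window already holds `limit` entries
        log.append(now)
        return True
```

Unlike the fixed window, a burst that straddles a window boundary is rejected here, because the earlier timestamps are still in the log when the second burst arrives.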

4. Sliding Window Counter

The Sliding Window Counter is a hybrid approach that combines the low memory footprint of the Fixed Window Counter with most of the accuracy of the Sliding Window Log.

How it works: Instead of logging every request, it uses a formula to estimate the traffic in the rolling window based on the previous window and the current window. For example, if you are 30% into the current minute, the algorithm estimates the request count by taking 70% of the previous minute's count and adding it to the current minute's count. The request is accepted only if that estimate stays within the limit.
Pros: Smooths out traffic spikes at window boundaries while remaining incredibly memory-efficient.
Cons: It relies on an approximation, assuming that traffic in the previous window was evenly distributed.
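The weighted estimate can be written as previous_count * (1 - elapsed_fraction) + current_count. The sketch below applies it for a single client; the class name, the injectable `clock`, and the choice to accept a request only when the estimate plus one stays within the limit are assumptions for illustration.

```python
import time
from typing import Callable


class SlidingWindowCounter:
    """Approximates a rolling window from two fixed-window counts."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now: float) -> None:
        window = int(now // self.window_seconds)
        if window == self.current_window:
            return
        # Only the immediately preceding window contributes to the estimate.
        adjacent = window == self.current_window + 1
        self.previous_count = self.current_count if adjacent else 0
        self.current_window = window
        self.current_count = 0

    def estimate(self, now: float) -> float:
        elapsed_fraction = (now % self.window_seconds) / self.window_seconds
        return self.previous_count * (1 - elapsed_fraction) + self.current_count

    def allow(self) -> bool:
        now = self.clock()
        self._roll(now)
        if self.estimate(now) + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

For instance, with a limit of 10 per minute and 8 requests in the previous minute, a request arriving 30% into the current minute starts from an estimate of about 8 * 0.7 = 5.6, so four more requests fit before the limiter begins rejecting.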

5. Token Bucket

The Token Bucket algorithm is one of the most widely used rate-limiting algorithms, adopted by companies like Amazon and Stripe.

How it works: Imagine a bucket with a maximum capacity of tokens. New tokens are added to the bucket at a uniform, fixed rate until the bucket is full; tokens that would overflow are discarded. When an incoming request arrives, it must take a token from the bucket to be processed. If tokens are available, the request goes through. If the bucket is empty, the request is dropped.
Pros: It allows for sudden bursts of traffic up to the bucket size. As long as there are tokens in the bucket, bursts are handled seamlessly.
Cons: Requires careful tuning of two parameters: the bucket size (burst capacity) and the refill rate.
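A minimal token bucket sketch, showing the two tuning parameters in action, might look like this. It is written under assumptions: the class name and injectable `clock` are illustrative, the bucket starts full, and tokens are refilled lazily on each call based on elapsed time rather than by a background timer.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)      # assumption: the bucket starts full
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens < cost:
            return False                   # bucket is empty: drop the request
        self.tokens -= cost
        return True
```

With a capacity of 3 and a refill rate of 1 token per second, a burst of three requests passes immediately, a fourth is dropped, and two seconds later two more requests can pass. However long the bucket sits idle, it never holds more than three tokens.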

6. Leaky Bucket

While the Token Bucket allows bursts, the Leaky Bucket algorithm enforces a strict, constant processing rate.

How it works: Incoming requests are placed into a First-In-First-Out (FIFO) queue (the "bucket"). The system pulls requests from the queue and processes them at a strict, fixed rate (the "leak"). If the queue is full when a new request arrives, that request is dropped.
Pros: Excellent for traffic shaping. It smooths out bursty incoming traffic into a steady stream of outgoing requests, protecting downstream systems from sudden spikes.
Cons: If a massive burst fills the queue with older requests, new and potentially more urgent requests will be dropped until the queue clears out.
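The queue-and-leak behavior can be sketched as below. The class and method names (`offer`, `poll`) and the injectable `clock` are assumptions for illustration; a real worker would call `poll` in a loop or on a timer, and this version releases at most one request per leak interval.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class LeakyBucket:
    """Bounded FIFO queue drained at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.interval = 1.0 / leak_rate    # seconds between processed requests
        self.clock = clock
        self.queue: Deque[str] = deque()
        self.next_release = clock()

    def offer(self, request: str) -> bool:
        """Enqueue a request, or drop it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def poll(self) -> Optional[str]:
        """Return the next request if the leak interval has elapsed."""
        now = self.clock()
        if not self.queue or now < self.next_release:
            return None
        self.next_release = now + self.interval
        return self.queue.popleft()
```

Because `poll` hands out the oldest queued request first, a full queue causes newly arriving requests to be dropped in `offer`, which is the drawback described above.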

Conclusion

There is no single "best" rate-limiting strategy. If your system needs to handle bursts, Token Bucket is an excellent choice. If you need strict traffic shaping, look to the Leaky Bucket. For APIs that need close-to-exact enforcement without consuming massive amounts of memory, the Sliding Window Counter provides a practical middle ground, while the Sliding Window Log remains the option when exact enforcement justifies the memory cost. By choosing the right strategy, you ensure your architecture remains robust, resilient, and fair to all users.
