# Rate Limiting Algorithms Compared: Fixed Window, Sliding Window, Token Bucket, Leaky Bucket

> Fixed window, sliding window, token bucket, and leaky bucket compared with Redis Lua scripts, measured boundary-burst numbers, and when each algorithm wins.

**TL;DR:** Rate limiting decides which requests get through and which get a 429. The five practical algorithms are fixed window, sliding window log, sliding window counter, token bucket, and leaky bucket. For most public APIs, pick sliding window counter (best fairness-to-cost ratio) or token bucket (when you want to allow short bursts). Avoid fixed window for anything user-facing because it lets a client send 2x the limit across a window boundary.

## Key takeaways

- A fixed-window limiter set to 10 requests per second allowed 20 requests in a 1ms span in my sandbox test, while sliding window stayed at 10 (as it should!).
- Sliding window counter is the same Redis cost as fixed window plus one extra `GET` per request, and it removes the boundary problem.
- Token bucket is the only common algorithm that intentionally allows bursts above the average rate, which is usually what API users want.
- Leaky bucket is the only one that enforces a strict output rate. It's great for limiting traffic to a downstream service rather than for fairness between users.
- The [`@upstash/ratelimit`](https://github.com/upstash/ratelimit-js) library ships fixed window, sliding window, and token bucket as Lua scripts that work out of the box.

## The best rate limiting algorithms - an overview

| Algorithm | Redis ops per request | Memory per identity | Allows bursts? | Boundary burst risk | Best use |
| --- | --- | --- | --- | --- | --- |
| Fixed window | 1 (`INCR` + `EXPIRE` on first) | 1 counter | No (but leaks at boundary) | High | Cheap internal limits |
| Sliding window log | 3 (`ZREMRANGEBYSCORE`, `ZADD`, `ZCARD`) | One sorted-set entry per request | No | None | Strict fairness, low volume |
| Sliding window counter | 2 `GET`s + `INCR` | 2 counters | No | None (weighted approximation) | Public APIs at scale |
| Token bucket | 1 `HMGET` + `HSET` + `PEXPIRE` | 1 hash with 2 fields | Yes, up to `maxTokens` | None | Developer APIs, SaaS quotas |
| Leaky bucket | 1 hash read/write | 1 hash with 2 fields | No (strict rate) | None | Outbound traffic shaping |

## What problem does a rate limiter solve?

A rate limiter answers one question: given an identity (user ID, API key, IP) and a request, should I allow it now? The algorithm encodes three sub-decisions: how time is sliced, how usage is recorded, and what to do when the budget is gone. Those three choices drive every tradeoff in this article.

The reason there are five algorithms instead of one is that "fairness" means different things. A login endpoint wants to reject a credential-stuffing attack at the second attempt. A public REST API wants to let a customer's batch job burst through 100 calls and then slow down. A queue producer wants to feed a downstream worker at exactly its drain rate. No single counter satisfies all three.

For Redis-backed limiters specifically, there is a fourth concern: the algorithm has to be implementable as a single round trip. Two-step "read then conditionally write" logic races between processes. Every production implementation collapses the check into one Lua script, which is why the [Upstash ratelimit Lua scripts](https://github.com/upstash/ratelimit-js/blob/main/src/lua-scripts/single.ts) all return both the decision and the remaining budget in one call.

## How does the fixed window algorithm work, and why does it leak?

Fixed window divides time into equal intervals (say, 1-second buckets). For each identity, you `INCR` a counter keyed by the current bucket. If the counter exceeds the limit, reject. When the next bucket starts, the counter resets.

In Redis, for example, that whole algorithm fits in two commands per request: an `INCR` to bump the counter, and a `PEXPIRE` on the first hit so the key cleans itself up. As a Lua script:

```tsx id="code-bq6o5dxg"
-- KEYS[1] = "rl:fixed:user-123:1700000000"
-- ARGV[1] = limit, ARGV[2] = window_ms
local count = redis.call("INCR", KEYS[1])
if count == 1 then
  redis.call("PEXPIRE", KEYS[1], ARGV[2])
end
if count > tonumber(ARGV[1]) then
  return 0
end
return 1
```

This is the cheapest possible limiter. It is also the most exploitable. A client that knows the window boundary can fire `limit` requests at `T - 1ms` and another `limit` requests at `T`, and both windows accept them. In a sandbox test I ran with a 10 req/sec limit, this technique allowed all 20 requests through in about 1 millisecond:

```tsx id="code-xyaxj4n1"
== Boundary burst: 10 req limit per 1s window ==
Fixed window allowed:           20 / 20
Sliding window counter allowed: 10 / 20
Sliding window log allowed:     10 / 20
```

My numbers above came from a pure-JS reimplementation of the [Upstash sliding-window Lua script](https://github.com/upstash/ratelimit-js/blob/main/src/lua-scripts/single.ts). Use fixed window for internal jobs where the identity is trusted, or as a coarse safety net behind a more accurate primary limiter. Do not use it on a login endpoint.

## How does the sliding window log algorithm fix the boundary problem?

Sliding window log stores a timestamp for every request and counts how many fall inside the last `window_ms`. In Redis, that is a sorted set keyed by the identity, with the timestamp as both score and member:

```tsx id="code-mqeazux3"
-- KEYS[1] = "rl:log:user-123"
-- ARGV[1] = now_ms, ARGV[2] = window_ms, ARGV[3] = limit
redis.call("ZREMRANGEBYSCORE", KEYS[1], 0, ARGV[1] - ARGV[2])
local count = redis.call("ZCARD", KEYS[1])
if count >= tonumber(ARGV[3]) then
  return 0
end
redis.call("ZADD", KEYS[1], ARGV[1], ARGV[1])
redis.call("PEXPIRE", KEYS[1], ARGV[2])
return 1
```

This is the most accurate algorithm. At every moment, you are evaluating the literal last N seconds of traffic. There is no approximation and no boundary effect.

The cost is linear memory per identity per active second. A user firing 1,000 requests per second at a 60-second window holds a 60,000-entry sorted set in Redis. During an abuse spike, memory blows up exactly when you can least afford it. The Stripe engineering team [moved away from sliding log to GCRA](https://stripe.com/blog/rate-limiters) (a token-bucket variant) for this reason. Use sliding window log only when the limit is small (single-digit requests per minute) and fairness has to be exact, like password reset or 2FA endpoints.

## How does the sliding window counter algorithm work?

Sliding window counter is the practical compromise. Instead of storing every timestamp, you keep two counters: one for the current fixed window and one for the previous fixed window. You then estimate the count over the last `window_ms` by weighting the previous window by how much of the sliding window overlaps it.

The formula from the [Upstash sliding-window Lua script](https://github.com/upstash/ratelimit-js/blob/main/src/lua-scripts/single.ts) is:

```tsx id="code-156g9uvv"
pct_in_current     = (now % window_ms) / window_ms
weighted_previous  = floor((1 - pct_in_current) * requests_in_previous_window)
total              = weighted_previous + requests_in_current_window
allow              = total < limit
```

If you are 25% into the current window, you count 75% of the previous window plus 100% of the current. This assumes traffic in the previous window was uniform, so the result is an approximation rather than a strict measure. In exchange, you get O(1) memory and 2-3 Redis commands per request.

In the boundary test above, sliding window counter caught exactly the same bursts that sliding window log did (10 of 20 allowed). The cases where it disagrees with the log are clustered-burst patterns inside the previous window, and the disagreement is bounded by the cluster size. For public APIs at any reasonable scale, this is the default. It is what [`Ratelimit.slidingWindow`](https://upstash.com/docs/redis/sdks/ratelimit-ts/algorithms#sliding-window) ships in `@upstash/ratelimit`.

One operational note: the [sliding window in a multi-region setup generates a large number of Redis commands](https://upstash.com/docs/redis/sdks/ratelimit-ts/algorithms#usage-2). If you replicate across regions, fall back to fixed window for cross-region coordination.

## How does the token bucket algorithm allow bursts without breaking the average?

Token bucket models capacity as tokens that refill over time. Each identity has three numbers: max capacity, refill rate, and current token count. A request consumes one token. If the bucket is empty, the request is rejected. Idle clients accumulate tokens up to the cap, so a customer that has been quiet for 10 seconds can burst through 10 seconds of saved capacity.

The Redis representation is a hash with `refilledAt` and `tokens` fields. The [Upstash token-bucket Lua script](https://github.com/upstash/ratelimit-js/blob/main/src/lua-scripts/single.ts) lazy-refills on read:

```tsx id="code-orm95p9o"
-- bucket = HMGET(key, "refilledAt", "tokens")
if now >= refilledAt + interval then
  local refills = math.floor((now - refilledAt) / interval)
  tokens = math.min(maxTokens, tokens + refills * refillRate)
  refilledAt = refilledAt + refills * interval
end
if tokens == 0 then return reject end
tokens = tokens - 1
-- HSET(key, "refilledAt", refilledAt, "tokens", tokens)
```

In the sandbox, a bucket with `maxTokens=10` and `refillRate=5` per second allowed exactly 10 of an initial 15-request burst, then 5 of the next 10 requests one second later, matching the spec:

```tsx id="code-6xtfxltt"
== Token bucket: max 10, refill 5/sec ==
Initial burst at t=0:  10 / 15 allowed
After 1s refill:       5 / 10 allowed
```

Token bucket is the right default for any API where users have unpredictable batch behavior: webhooks, AI inference, file uploads. The catch is that it is the most expensive algorithm computationally (a hash get plus a refill calculation plus a hash set), and it is [not supported by `MultiRegionRatelimit`](https://upstash.com/docs/redis/sdks/ratelimit-ts/algorithms#usage-3) in `@upstash/ratelimit` as of this writing.

## How does the leaky bucket algorithm differ from token bucket?

Leaky bucket is a queue with a fixed drain rate. Incoming requests enter the bucket, the bucket drains at a constant rate, and if the bucket is full when a request arrives, the request is rejected. There is no burst capacity, because the output rate is hard-capped at the drain rate.

You can implement it as a token bucket where the bucket capacity equals the drain rate, but the conceptual model is reversed. Token bucket protects the limit by counting tokens left. Leaky bucket protects the *downstream* by counting requests in flight:

```tsx id="code-i413w4h5"
-- bucket = HMGET(key, "level", "updatedAt")
local drained = (now - updatedAt) * leak_per_ms
level = math.max(0, level - drained)
if level + 1 > capacity then return reject end
level = level + 1
-- HSET(key, "level", level, "updatedAt", now)
```

In the sandbox, a leaky bucket with capacity 5 draining 1/sec allowed 5 of an initial 10-request burst, then 3 of 10 more after a 3-second pause (because 3 slots had drained):

```tsx id="code-uz1k6jia"
== Leaky bucket: capacity 5, drain 1 req/sec ==
Burst of 10 at t=0:   5 allowed
After 3s of drain:    3 / 10 allowed
```

Use leaky bucket when something downstream cannot tolerate bursts: a payment processor with strict TPS limits, an outbound email provider, a scraper that has to stay polite. Do not use it for per-user API fairness; it will punish a customer for sending two requests in the same millisecond even if their long-term rate is fine.

## What does this look like with @upstash/ratelimit in code?

The library wraps all three production-grade algorithms behind one `limit()` call. The choice happens in the constructor:

```tsx id="code-ftq786au"
import { Redis } from "@upstash/redis";
import { Ratelimit } from "@upstash/ratelimit";

const redis = Redis.fromEnv();

// 10 requests per 10 seconds, weighted sliding window
export const slidingWindow = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "10 s"),
  analytics: true,
  prefix: "rl:sliding",
});

// 5 tokens refilled every 10s, max burst of 10
export const tokenBucket = new Ratelimit({
  redis,
  limiter: Ratelimit.tokenBucket(5, "10 s", 10),
  analytics: true,
  prefix: "rl:tb",
});
```

In a Next.js middleware, the call site is identical for any algorithm:

```tsx id="code-8vb88khy"
import { Redis } from "@upstash/redis";
import { Ratelimit } from "@upstash/ratelimit";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, "1 m"),
  analytics: true,
});

export async function middleware(req: Request) {
  const id = req.headers.get("x-forwarded-for") ?? "anon";
  const { success, limit, remaining, reset } = await ratelimit.limit(id);
  if (!success) {
    return new Response(
      JSON.stringify({ error: "Too many requests", limit, remaining, reset }),
      { status: 429, headers: { "content-type": "application/json" } },
    );
  }
  return new Response(JSON.stringify({ ok: true, remaining }));
}
```

The `reset` field is a Unix timestamp in milliseconds when the limit refreshes. For sliding window it points to the start of the next fixed window rather than an exact rolling reset.

## Which algorithm should I pick?

Pick by the failure mode you most want to avoid.

- **Sliding window counter.** Default for public REST APIs with a long-tail user base. It removes the boundary leak, costs about the same as fixed window, and the approximation error is invisible in practice. Pair it with per-identity keys (user ID or API key).
- **Token bucket.** Default for paid APIs with burst-tolerant customers. AI inference endpoints, file upload endpoints, webhook fan-outs. Set `maxTokens` to 2-5x your steady rate to give honest customers headroom.
- **Sliding window log.** Reach for it only when fairness has to be exact and request volumes are low: password reset, 2FA, OAuth code exchange. The memory cost makes it a liability anywhere else.
- **Leaky bucket.** Use it outbound, not inbound. Protect a downstream provider, smooth a queue producer, throttle a scraper.
- **Fixed window.** Cheap internal kill switch. Behind a CDN, behind another limiter, behind anything you trust to absorb the 2x boundary leak.

The honest version of "which is best" is that you usually want two layers: a coarse limiter at the edge (fixed window per IP) and a fair limiter inside the app (sliding window counter or token bucket per user). Each catches a different abuse pattern.

## Why does Redis dominate rate limiting implementations?

Three reasons. First, every algorithm above fits in a single Lua script of under 30 lines, and `EVAL` is atomic in Redis, which eliminates the read-then-write race. Second, `INCR`, `EXPIRE`, sorted sets, and hashes are exactly the primitives these algorithms need, with no extra indirection. Third, Redis sub-millisecond latency means the limiter adds noise-level overhead to a typical API request.

The serverless variant adds a fourth: services like [Upstash Redis expose Redis over HTTP](https://upstash.com/docs/redis/features/restapi), so AWS Lambda, Vercel Edge, and Cloudflare Workers can rate-limit without holding a TCP connection pool. That is why [`@upstash/ratelimit`](https://github.com/upstash/ratelimit-js) is shipped as a connectionless library.

## FAQ

**Can I combine multiple algorithms?** Yes, and you usually should. A common pattern is fixed window per IP at the edge (cheap, stops floods) plus sliding window counter per user inside the app (fair, stops abuse from a single account). They guard different threats.

**What about GCRA?** Generic Cell Rate Algorithm is a clever single-timestamp variant of leaky/token bucket popularized by [Stripe's rate limiter post](https://stripe.com/blog/rate-limiters). It behaves roughly the same as token bucket, with a smaller memory footprint (one timestamp instead of a hash). If you are writing the limiter from scratch and want minimum state, GCRA is worth a look. If you are using `@upstash/ratelimit`, token bucket gets you the same behavior.

**Does the sliding window counter ever disagree with the log?** Yes, when traffic in the previous window was bursty rather than uniform. The disagreement is bounded by how much of the previous window's burst landed in the still-counted portion. For most APIs the error is under a few percent and it errs on the side of allowing the request, which is the right default for paying customers.

**How do I rate-limit across regions?** Cross-region rate limiting is mostly a consistency problem; the algorithm choice matters less than how state is shared. The Upstash docs are explicit that [sliding window across regions multiplies Redis commands](https://upstash.com/docs/redis/sdks/ratelimit-ts/algorithms#usage-2) and that token bucket is unsupported in multi-region. For most multi-region setups, the right answer is to limit per region with a single shared limit value and accept that a coordinated attacker can briefly exceed the global limit. If you need strict global, route each identity to a single home region.

**What HTTP status code should I return?** `429 Too Many Requests`, with a `Retry-After` header set to the seconds until reset. The `Ratelimit` response includes a `reset` timestamp you can convert.

