# How do I add rate limiting to a Next.js app with Upstash Redis (2026)?

> How to rate limit a Next.js 16 app with Upstash Redis: sliding window vs fixed window, proxy.ts vs Route Handlers, deny lists, costs.

**TL;DR:** We install the Upstash Ratelimit and Redis SDKs, create a sliding-window limiter at module scope, and call it from a `proxy.ts` file (Next.js 16's replacement for `middleware.ts`) or directly inside Route Handlers and Server Actions. We identify callers by user ID when authenticated and IP otherwise, set a 1-second timeout to fail open on Redis hiccups, and flush the SDK's pending analytics writes after the response is sent.

## Key takeaways

- Upstash Ratelimit v2.0.8 (released January 2026) added [global dynamic limits](https://upstash.com/docs/redis/sdks/ratelimit-ts/features#dynamic-limits), so you can change a limit at runtime without redeploying.
- Sliding window costs 4 Redis commands per hit in the steady state vs 2 for fixed window. Pick fixed window when command volume drives your bill or you're running multi-region. ([costs](https://upstash.com/docs/redis/sdks/ratelimit-ts/costs))
- The SDK's ephemeral cache option rejects already-blocked identifiers without calling Redis at all, so the algorithm spends zero Redis commands per rejected request on a warm function.
- Turning on protection mode uses Upstash's [auto IP deny list](https://upstash.com/docs/redis/sdks/ratelimit-ts/traffic-protection#auto-ip-deny-list), which pulls from the [ipsum repo](https://github.com/stamparm/ipsum) aggregating 30+ open-source blocklists, refreshed daily at 2 AM UTC.

## Why Upstash Redis instead of in-memory or Vercel KV for Next.js rate limiting?

Serverless Next.js deployments have no persistent memory between invocations. An in-memory counter works in local dev and breaks when Vercel spins up a second Lambda. [Vercel's Fluid Compute](https://vercel.com/docs/fluid-compute) (default for new projects since 2025) softens this a little because multiple invocations [share one physical instance and its global state](https://vercel.com/docs/fluid-compute#isolation-boundaries-and-global-state), so a module-scope counter does count across the requests landing on that instance. But Vercel still scales out under load by spawning more instances, each starting from zero, so we can't rely on it as the source of truth. We need a shared store with sub-10ms reads.

Upstash Redis fits because we have two ways to talk to it. For long-running servers, we can connect over standard TCP with node-redis, ioredis, or any Redis client. For serverless, we can use the REST-based Upstash client. The Ratelimit SDK is [built on top of the REST client](https://github.com/upstash/ratelimit-js) for that serverless case. It has 2k GitHub stars, 18.8k dependents, and is the same library Vercel ships in its own [Ratelimit with Upstash template](https://vercel.com/templates/next.js/ratelimit-with-upstash-redis).

## Which rate limiting algorithm should I pick?

The SDK ships three algorithms:

| Algorithm | Redis commands per hit (intermediate state) | Boundary-burst issue | Multi-region friendly | Good for |
| --- | --- | --- | --- | --- |
| Fixed Window | 2 (EVAL, INCR) | Yes, a user can burst 2n at the window seam | Yes | High-volume APIs where cost matters |
| Sliding Window | 4 (EVAL, GET, GET, INCR) | No, weighted across two windows | Avoid (high command volume) | Default for almost everything |
| Token Bucket | 4 (EVAL, HMGET, HSET, PEXPIRE) | No, burst-friendly by design | Not supported | Paid-tier APIs that need burst tolerance |

The "boundary burst" problem is what makes fixed window leaky. A fixed window with a limit of 100 per minute resets at the top of each minute, so if a user fires 100 requests at 12:00:59 and another 100 at 12:01:00, they've gotten 200 requests through in two seconds while technically staying within the limit. Sliding window fixes this by weighting the previous window into the count, and token bucket sidesteps it entirely because the bucket refills continuously instead of resetting on a clock.
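To make the seam problem concrete, here is a small in-memory model of both algorithms. It is only a sketch: the class names are hypothetical, the weighting formula is a simplified version of the idea described above, and it is not the Lua the SDK actually runs against Redis.

```ts
// Simplified in-memory models of fixed window and sliding window (illustration only).
const WINDOW_MS = 60_000;
const LIMIT = 100;

function windowIndex(nowMs: number): number {
  return Math.floor(nowMs / WINDOW_MS);
}

class FixedWindowModel {
  private counts = new Map<number, number>();

  hit(nowMs: number): boolean {
    const w = windowIndex(nowMs);
    const used = this.counts.get(w) ?? 0;
    if (used >= LIMIT) return false;
    this.counts.set(w, used + 1);
    return true;
  }
}

class SlidingWindowModel {
  private counts = new Map<number, number>();

  hit(nowMs: number): boolean {
    const w = windowIndex(nowMs);
    const current = this.counts.get(w) ?? 0;
    const previous = this.counts.get(w - 1) ?? 0;
    // The previous window counts less the further we are into the current one.
    const elapsedFraction = (nowMs % WINDOW_MS) / WINDOW_MS;
    const weighted = previous * (1 - elapsedFraction) + current;
    if (weighted >= LIMIT) return false;
    this.counts.set(w, current + 1);
    return true;
  }
}

// Fire n requests at the same instant and count how many get through.
function burst(limiter: { hit(nowMs: number): boolean }, nowMs: number, n: number): number {
  let allowed = 0;
  for (let i = 0; i < n; i++) {
    if (limiter.hit(nowMs)) allowed++;
  }
  return allowed;
}

const seam = 10 * WINDOW_MS; // an arbitrary window boundary

const fixed = new FixedWindowModel();
const fixedAllowed = burst(fixed, seam - 1_000, 100) + burst(fixed, seam, 100);

const sliding = new SlidingWindowModel();
const slidingAllowed = burst(sliding, seam - 1_000, 100) + burst(sliding, seam, 100);

console.log({ fixedAllowed, slidingAllowed }); // { fixedAllowed: 200, slidingAllowed: 100 }
```

The fixed-window model lets both bursts through because the counter resets at the seam. The sliding-window model still sees the previous window at full weight one second later and rejects the second burst.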

I pulled these numbers from the [Upstash costs reference](https://upstash.com/docs/redis/sdks/ratelimit-ts/costs#limit). The recommendation: we default to sliding window for accuracy, swap to fixed window only once we've measured the cost difference and it matters to us, and reach for token bucket on paid tiers where users genuinely need to absorb bursts (a batch upload endpoint, for example).

## Where should the limiter live: proxy.ts, Route Handler, or Server Action?

For me, the default is `proxy.ts`. It's one file, runs before any handler, and is perfect for per-IP limits on every API route. (If you're still on Next.js 15, the same code goes in `middleware.ts`; the file was renamed in 16 but the logic is identical.)

```tsx id="code-i68dliba"
// src/proxy.ts
import { NextRequest, NextResponse, after } from "next/server";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, "10 s"),
  prefix: "rl:proxy",
  analytics: true,
});

export const config = {
  matcher: ["/api/:path*"],
};

export default async function proxy(req: NextRequest) {
  const ip =
    req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ??
    req.headers.get("x-real-ip") ??
    "127.0.0.1";

  const { success, limit, remaining, reset, pending } = await ratelimit.limit(ip);
  after(pending);

  const res = success
    ? NextResponse.next()
    : NextResponse.json({ error: "Too many requests" }, { status: 429 });
  res.headers.set("X-RateLimit-Limit", limit.toString());
  res.headers.set("X-RateLimit-Remaining", remaining.toString());
  res.headers.set("X-RateLimit-Reset", reset.toString());
  return res;
}
```

We move the limiter into a specific Route Handler when the limit depends on the route's body (weighting by token count for an LLM endpoint, for example) or when we need a different identifier per route (like a user ID for an account endpoint or IP for signup). We move it into a Server Action when the action is the only thing being protected and we want the limiter to share scope with the action's auth check.

The `after()` call is the Next.js 15.1+ API for [running work after the response is sent](https://nextjs.org/docs/app/api-reference/functions/after). The SDK returns a pending promise covering analytics writes and deny-list refreshes; handing it to `after()` keeps those off the response's critical path.
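The proxy above sets the `X-RateLimit-*` headers by hand. If we end up doing that in several handlers, a small pure helper keeps it consistent. This is a sketch: `LimitInfo` and `rateLimitHeaders` are hypothetical names, the `Retry-After` header is my addition, and I'm assuming `reset` is a Unix timestamp in milliseconds.

```ts
// The fields we read from the limiter result.
// Assumption: `reset` is a Unix timestamp in milliseconds.
interface LimitInfo {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number;
}

function rateLimitHeaders(info: LimitInfo, nowMs: number = Date.now()): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(info.limit),
    "X-RateLimit-Remaining": String(Math.max(0, info.remaining)),
    "X-RateLimit-Reset": String(info.reset),
  };
  if (!info.success) {
    // Retry-After is expressed in whole seconds, so round up.
    const seconds = Math.max(0, Math.ceil((info.reset - nowMs) / 1000));
    headers["Retry-After"] = String(seconds);
  }
  return headers;
}

const blocked = rateLimitHeaders(
  { success: false, limit: 20, remaining: 0, reset: 10_500 },
  8_000,
);
console.log(blocked["Retry-After"]); // "3"
```

In the proxy, we would loop over `Object.entries(rateLimitHeaders(result))` and call `res.headers.set(name, value)` for each pair.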

## How do I identify the caller behind Vercel's proxy?

The identifier we pass to the limiter is the rate-limit key. This is important because if we get it wrong, we'll either bucket everyone together (one user burns the whole quota) or never enforce anything at all (each request gets a unique key).

I rank identifiers like this, from best to worst:

1. **Authenticated user ID**, from the session cookie or JWT. It's stable across IPs and immune to NAT (where many users on the same network share a single public IP and would otherwise get bucketed together).
2. **API key** for B2B endpoints. We hash it before using it as the key so the raw key never lands in Redis.
3. **IP address** as a fallback for anonymous traffic.

On Vercel, the client IP is in the `x-forwarded-for` header. We fall back to `x-real-ip` and finally to a localhost placeholder so the limiter never gets an undefined identifier.

```tsx id="code-zv09au0d"
const ip =
  req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ??
  req.headers.get("x-real-ip") ??
  "127.0.0.1";
```

We never use the user agent or any other client-controllable value as the sole identifier. Attackers rotate them. We pass them to the deny list instead (see below).
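The ranking above can be sketched as one function. Everything here is illustrative: `CallerInfo` and `resolveIdentifier` are hypothetical names, the `user:`, `key:`, and `ip:` prefixes are my own convention, and the SHA-256 hashing assumes the Node.js runtime (`node:crypto` isn't available on Edge).

```ts
import { createHash } from "node:crypto";

// What we know about the caller once auth has run (all fields optional).
interface CallerInfo {
  userId?: string;
  apiKey?: string;
  forwardedFor?: string | null; // raw x-forwarded-for header value
  realIp?: string | null;       // raw x-real-ip header value
}

function resolveIdentifier(caller: CallerInfo): string {
  // 1. Authenticated user ID: stable across IPs and NAT.
  if (caller.userId) return `user:${caller.userId}`;

  // 2. API key: hash it so the raw key never lands in Redis.
  if (caller.apiKey) {
    const digest = createHash("sha256").update(caller.apiKey).digest("hex");
    return `key:${digest}`;
  }

  // 3. IP address: first entry of x-forwarded-for, then x-real-ip, then a placeholder.
  const ip =
    caller.forwardedFor?.split(",")[0]?.trim() || caller.realIp || "127.0.0.1";
  return `ip:${ip}`;
}

console.log(resolveIdentifier({ forwardedFor: "203.0.113.7, 10.0.0.1" })); // "ip:203.0.113.7"
```

Prefixing the identifier by kind keeps a user ID from ever colliding with an IP string in the same keyspace.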

## How do I run different limits for free vs paid tiers?

We create one limiter instance per tier, each with its own Redis key prefix so the tiers don't collide:

```tsx id="code-x64asiqb"
// src/lib/tiered.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();
const cache = new Map<string, number>();

type Tier = "anon" | "free" | "pro";

const limiters: Record<Tier, Ratelimit> = {
  anon: new Ratelimit({
    redis, prefix: "rl:anon",
    limiter: Ratelimit.slidingWindow(20, "1 m"),
    ephemeralCache: cache, analytics: true,
  }),
  free: new Ratelimit({
    redis, prefix: "rl:free",
    limiter: Ratelimit.slidingWindow(100, "1 m"),
    ephemeralCache: cache, analytics: true,
  }),
  pro: new Ratelimit({
    redis, prefix: "rl:pro",
    // 200 tokens/min steady, burst up to 600
    limiter: Ratelimit.tokenBucket(200, "1 m", 600),
    ephemeralCache: cache, analytics: true,
  }),
};

export async function checkQuota(tier: Tier, id: string, weight = 1) {
  return limiters[tier].limit(id, { rate: weight });
}
```

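The `rate` parameter is the SDK's [custom-rate feature](https://upstash.com/docs/redis/sdks/ratelimit-ts/features#custom-rates). Passing the request's token count as the rate makes one expensive LLM call worth 4,000 tokens consume 4,000 from the bucket instead of 1. For that to work, the limiter's own numbers have to be sized in tokens rather than requests; the 200 and 600 in the `pro` tier above are request-sized and would be too small for a 4,000-weight call. This is the right primitive for LLM gateways and batch APIs.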
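To see what weighted consumption does to a bucket, here is a simplified in-memory token bucket. It is a model, not the SDK's implementation: `TokenBucketModel` is a hypothetical name, refills happen in whole intervals, and the 10,000 and 20,000 figures are made-up limits sized in LLM tokens.

```ts
// Simplified token bucket: `refillRate` tokens are added per elapsed interval, capped at `maxTokens`.
class TokenBucketModel {
  private tokens: number;
  private lastRefillMs: number;
  private readonly refillRate: number;
  private readonly intervalMs: number;
  private readonly maxTokens: number;

  constructor(refillRate: number, intervalMs: number, maxTokens: number, startMs = 0) {
    this.refillRate = refillRate;
    this.intervalMs = intervalMs;
    this.maxTokens = maxTokens;
    this.tokens = maxTokens; // the bucket starts full
    this.lastRefillMs = startMs;
  }

  take(weight: number, nowMs: number): { success: boolean; remaining: number } {
    // Credit every whole interval that has elapsed since the last refill.
    const intervals = Math.floor((nowMs - this.lastRefillMs) / this.intervalMs);
    if (intervals > 0) {
      this.tokens = Math.min(this.maxTokens, this.tokens + intervals * this.refillRate);
      this.lastRefillMs += intervals * this.intervalMs;
    }
    if (weight > this.tokens) return { success: false, remaining: this.tokens };
    this.tokens -= weight;
    return { success: true, remaining: this.tokens };
  }
}

// Hypothetical limits: 10,000 tokens per minute steady, burst up to 20,000.
const bucket = new TokenBucketModel(10_000, 60_000, 20_000);
const firstCall = bucket.take(4_000, 0);
console.log(firstCall); // { success: true, remaining: 16000 }
```

One 4,000-token call drains a fifth of the burst capacity here, where an unweighted call would have drained a single unit.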

## How do I keep the app running when Redis is slow or down?

Two settings together: a network timeout, and an in-memory cache for already-blocked identifiers.

```tsx id="code-7743iers"
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, "10 s"),
  timeout: 1000,        // fail open after 1s
  ephemeralCache: new Map(),
  analytics: true,
});
```

The timeout defaults to 5 seconds, which is too long. A [Stack Overflow report](https://stackoverflow.com/questions/77066455/upstashs-rate-limiter-takes-5s-to-complete-api-call) shows users hitting the full 5-second wait when Redis is unreachable. We drop it to 1000ms. When the timeout fires, the SDK returns success with a timeout reason attached, our endpoint stays up, and we've traded rate-limit accuracy for availability during an outage. We log on that timeout reason so we notice when it fires in production.
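A minimal sketch of how we might surface the fail-open case in our own code. `LimitOutcome` and `decide` are hypothetical; only the `success` and `reason` fields mirror what the limiter returns, and I'm assuming the timeout reason is the string `"timeout"`.

```ts
// Subset of the limiter result this helper cares about.
// Assumption: `reason` is "timeout" when the SDK gave up waiting on Redis.
interface LimitOutcome {
  success: boolean;
  reason?: "timeout" | "cacheBlock" | "denyList";
}

interface Decision {
  allow: boolean;
  degraded: boolean; // true when we let the request through without a real check
}

function decide(
  outcome: LimitOutcome,
  warn: (message: string) => void = console.warn,
): Decision {
  if (outcome.success && outcome.reason === "timeout") {
    // Fail open, but leave a trace so the outage shows up in logs.
    warn("rate limiter timed out; request allowed without a Redis check");
    return { allow: true, degraded: true };
  }
  return { allow: outcome.success, degraded: false };
}

console.log(decide({ success: true, reason: "timeout" }, () => {})); // { allow: true, degraded: true }
```

Passing the logger in as a parameter keeps the helper easy to test and lets us swap `console.warn` for a metrics counter later.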

The ephemeral cache is a plain JavaScript Map that lives on the warm function instance. Once an identifier is blocked, the SDK caches the reset timestamp locally and rejects further requests from that identifier without calling Redis at all. We use zero commands per blocked request and get a response in microseconds. Critical under DoS-like load.

The Map has to be declared at module scope, not inside the handler, or it gets re-created on every invocation.
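The idea behind the ephemeral cache fits in a few lines. This is my own simplified model of the behavior described above, not the SDK's code; `blockedUntil`, `rememberBlock`, and `isBlocked` are hypothetical names.

```ts
// Module scope: survives across invocations on a warm instance.
// Maps identifier -> reset timestamp (ms) of the window that blocked it.
const blockedUntil = new Map<string, number>();

// Called after Redis reports that an identifier is over its limit.
function rememberBlock(identifier: string, resetMs: number): void {
  blockedUntil.set(identifier, resetMs);
}

// Called before talking to Redis; `true` means we can reject locally.
function isBlocked(identifier: string, nowMs: number): boolean {
  const resetMs = blockedUntil.get(identifier);
  if (resetMs === undefined) return false;
  if (resetMs <= nowMs) {
    blockedUntil.delete(identifier); // the window has passed, forget the block
    return false;
  }
  return true;
}

rememberBlock("ip:203.0.113.7", 10_000);
console.log(isBlocked("ip:203.0.113.7", 9_000));  // true
console.log(isBlocked("ip:203.0.113.7", 10_000)); // false
```

If `blockedUntil` were created inside the handler instead, every invocation would start with an empty Map and the local rejection path would never trigger.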

## When should I use MultiRegionRatelimit?

Rarely. The single-region setup is the right default even for global apps because the Upstash Redis REST API resolves close to the edge and the latency of one HTTP round-trip is dominated by the limit logic, not the geography.

We reach for the multi-region limiter when both of these are true: we have measurable user populations on two or more continents, and we've measured p99 latency to a single Upstash region and it exceeds our budget. Otherwise the cost is real. Multi-region replicates state asynchronously between databases via [CRDTs](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type), and the docs warn that [sliding window in multi-region produces a large number of Redis commands](https://upstash.com/docs/redis/sdks/ratelimit-ts/algorithms#sliding-window). Stick to fixed window if we go this route:

```tsx id="code-njip9ns3"
import { MultiRegionRatelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

export const ratelimit = new MultiRegionRatelimit({
  redis: [
    new Redis({ url: process.env.US_URL!, token: process.env.US_TOKEN! }),
    new Redis({ url: process.env.EU_URL!, token: process.env.EU_TOKEN! }),
  ],
  limiter: MultiRegionRatelimit.fixedWindow(100, "10 s"),
  prefix: "rl:multi",
  analytics: true,
});
```

CRDT replication means the configured limit can be exceeded by a small margin during the sync window. That's acceptable for abuse prevention. It's not acceptable for billing: never use rate limits as the source of truth for usage charges.

## How do I block abusive IPs automatically?

We turn on protection in the limiter config and pass the request's IP, user agent, and country alongside the identifier:

```tsx id="code-7s0axzjf"
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, "10 s"),
  enableProtection: true,
  analytics: true,
});

const { success, reason, deniedValue, pending } = await ratelimit.limit(userId, {
  ip: clientIp,
  userAgent: req.headers.get("user-agent") ?? undefined,
  country: req.headers.get("x-vercel-ip-country") ?? undefined,
});
after(pending);

if (!success && reason === "denyList") {
  console.warn("denied by deny list:", deniedValue);
}
```

This turns on two things. First, a manual deny list you manage from the [Upstash Ratelimit dashboard](https://console.upstash.com/ratelimit), where you add IPs, user agents, countries, or arbitrary identifiers that are blocked without consulting the algorithm. Second, the auto IP deny list, which syncs daily at 2 AM UTC from the [ipsum aggregator](https://github.com/stamparm/ipsum) that pulls from 30+ open-source blocklists.

The cost overhead is two extra Redis commands per call for the deny-list lookup, plus nine commands per day for the IP-list refresh. When a value is denied, the SDK caches it for 60 seconds so repeat attempts cost nothing. Pattern matching isn't supported, only exact matches.

## How much will this actually cost on Upstash?

Upstash bills per Redis command. The [free tier](https://upstash.com/pricing) covers 500,000 commands per month, and pay-as-you-go is $0.20 per 100,000 commands after that. Let's say we use sliding window + analytics:

- Allowed request, intermediate state: 4 commands (algorithm) + 1 (analytics) = **5 commands**
- Rate-limited request, cache miss: 3 + 1 = **4 commands**
- Rate-limited request, cache hit: 0 + 1 = **1 command**

So 500K commands a month gets us roughly 100K legitimate requests with sliding window plus analytics, before we start paying. Switching to fixed window without analytics raises that to around 250K, or 2.5 times as many. The ephemeral cache means bot traffic effectively costs nothing once an attacker is blocked.

Past the free tier, the numbers stay tiny. A million allowed requests a month with sliding window plus analytics is 5M commands, which works out to about $9. For most side projects, this is a rounding error compared to the Vercel bill.
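The arithmetic above is easy to turn into a small calculator. This is a sketch using the prices quoted in this section, and it assumes the free 500,000 commands are subtracted before pay-as-you-go billing kicks in, which is how the $9 figure was derived. `monthlyCostUsd` is a hypothetical helper.

```ts
const FREE_COMMANDS_PER_MONTH = 500_000;
const USD_PER_100K_COMMANDS = 0.2;

function monthlyCostUsd(requests: number, commandsPerRequest: number): number {
  const commands = requests * commandsPerRequest;
  const billable = Math.max(0, commands - FREE_COMMANDS_PER_MONTH);
  const usd = (billable / 100_000) * USD_PER_100K_COMMANDS;
  return Math.round(usd * 100) / 100; // round to whole cents
}

// Sliding window + analytics: 5 commands per allowed request.
console.log(monthlyCostUsd(1_000_000, 5)); // 9
console.log(monthlyCostUsd(100_000, 5));   // 0

// Fixed window without analytics: 2 commands per allowed request.
console.log(monthlyCostUsd(250_000, 2));   // 0
```

Plugging in our own traffic and the per-request command counts from the costs table gives a quick sanity check before choosing an algorithm.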

## When should I NOT use Upstash Ratelimit?

I wouldn't recommend Upstash Ratelimit if:

- **Per-second precision is required**, for financial or trading APIs. Sliding window is an approximation. Write a custom Lua script against Redis directly.
- **You're already on a self-hosted Redis Cluster** with sub-millisecond latency from your app servers. In this case it's easier to use node-redis with a token bucket Lua script.
- **You need pure DDoS protection.** Application-layer rate limiting fires after the request has already reached your function. Pair it with Vercel's WAF or Cloudflare in front of your origin.

## FAQ

### Does Upstash Ratelimit work in serverless runtimes?

Yes. The SDK is HTTP-based, so it runs in Edge, Node.js, Cloudflare Workers, Fastly Compute, and any other runtime that can make HTTP requests. In Next.js 16, the proxy file defaults to the Node.js runtime, and the SDK works there too.

### What happens when my Redis fills up with rate-limit keys?

Every key gets a TTL set to the window duration on first hit, so a fixed-window or sliding-window key auto-deletes after the window passes with no further requests. You don't need to clean up manually. If you want to reset a specific identifier's state, the SDK exposes a `resetUsedTokens` method that takes the identifier.

### Can I share one Upstash Redis between rate limiting and caching?

Yes, and you should. Upstash bills per command, not per database. Use the limiter's prefix option (something like `rl:api`) and a separate key prefix for cached data (something like `cache:sessions`) to keep keyspaces separate. The limiter's default prefix is `@upstash/ratelimit`.

### How do I test rate limiting locally without hitting production Redis?

Spin up a free Upstash dev database for local use and point your env at it. There's no in-memory mock that matches the SDK's behavior accurately: the Lua scripts the SDK uploads to Redis are part of the contract. If you must run tests offline, mock the limiter's return shape directly in your test setup. You can also get a free 72-hour Redis database for experiments by making a POST request to [https://upstash.com/start-redis](https://upstash.com/start-redis).
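For the offline case, a test double only needs to return the same shape our handlers read. Here is one possible sketch: `FakeRatelimit` is a hypothetical class with no time window at all, it allows the first `max` calls per identifier and blocks the rest, and the result fields are the ones used in the proxy example above.

```ts
// The subset of the limiter result our handlers read.
interface FakeLimitResult {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number;
  pending: Promise<void>;
}

class FakeRatelimit {
  private readonly max: number;
  private readonly used = new Map<string, number>();

  constructor(max: number) {
    this.max = max;
  }

  // Synchronous core, handy for direct assertions in unit tests.
  check(identifier: string): FakeLimitResult {
    const before = this.used.get(identifier) ?? 0;
    const success = before < this.max;
    const after = success ? before + 1 : before;
    this.used.set(identifier, after);
    return {
      success,
      limit: this.max,
      remaining: Math.max(0, this.max - after),
      reset: 0, // no real window in this fake
      pending: Promise.resolve(),
    };
  }

  // Same call shape as the real limiter, so handlers can accept either one.
  async limit(identifier: string): Promise<FakeLimitResult> {
    return this.check(identifier);
  }
}

const fake = new FakeRatelimit(2);
console.log(fake.check("user:1").success); // true
console.log(fake.check("user:1").success); // true
console.log(fake.check("user:1").success); // false
```

This only checks that our handler reacts correctly to allowed and blocked results. It says nothing about how the real algorithms behave, which is why a dev database remains the better option.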

### How is this different from rate-limiter-flexible or express-rate-limit?

Those libraries assume a persistent process with a connection pool, and they break on serverless cold starts because each Lambda invocation opens a fresh TCP connection. Upstash Ratelimit uses HTTP, so cold starts are no problem. If you're on a long-running Node server with one regional Redis, rate-limiter-flexible is fine, but it takes more setup.

