# How do you store Vercel AI SDK chat history in Redis?

> Persist Vercel AI SDK chat history in Redis with onFinish, saveChat, loadChat, and createIdGenerator. Working AI SDK 6 + Upstash Redis code.

<Summary>
The Vercel AI SDK keeps every message in React state, so refreshing the page clears the chat. To make a chat survive reloads, we save the messages after each turn and load them when the page mounts. Redis fits this well: appending new messages takes 2-3 ms in the benchmark below, and AI SDK messages are already JSON.
</Summary>

## Key takeaways

- The onFinish callback hands us the full conversation when streaming ends, so **persistence is a single 2 ms RPUSH command**
- **Loading a 1000-message history takes about 6.8 ms** on average. A single LRANGE pulls the whole list back, far below the 200-2000 ms a model needs to produce its first token
- Message IDs are generated on the server with createIdGenerator, so the IDs stored in Redis match the ones useChat sees on the client
- No lost messages when the user closes the tab: a single consumeStream call drains the model on the server so the reply still saves

## Why does Vercel AI SDK chat history disappear on refresh?

The useChat hook holds messages in component state. There is no persistence layer in the SDK itself.

The [official persistence guide](https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-message-persistence) demonstrates the pattern by writing each conversation to a JSON file on disk. That is fine for a demo but breaks down once we deploy to serverless: each function instance (Lambda, Vercel fluid compute, etc.) has its own filesystem, so there is no shared directory to write into.

We need an external store, and we need writes that complete inside the request without blocking the streamed response to the user. Redis is great at this: a single network call per turn, no schema, and the AI SDK message shape is already JSON anyway.

## Why pick Redis over Postgres for AI chat messages?

I'd say use Postgres when you need to query across messages (analytics, full-text search, joins to users and orgs). For the normal use case of "save this message, load this chat", Redis is a lot faster and easier to use.

![Our chat history in the Upstash Redis data browser](https://contentport.s3.amazonaws.com/chat/Sw016rSl6asUNtj0EpHf3DiBcD0ot3sQ/2PoDDQmSJRbvOHTWuEz5L.png)

A chat message is a deeply nested JSON object with parts, metadata, tool calls, and provider-specific fields. You read it back in exactly the shape you wrote it. You almost never query inside a message. That is the workload Redis handles best and where SQL adds friction.

Here is the head-to-head:

| Concern | Redis (Upstash) | Postgres |
| --- | --- | --- |
| Write latency per turn | 2 ms over HTTP | 20-100 ms depending on region |
| Schema for the UIMessage shape | None, store as JSON | jsonb column or one row per part |
| Schema migrations when AI SDK adds fields | None | ALTER TABLE or a migration job |
| Append a single message | RPUSH, O(1) | INSERT plus an index update |
| Load full conversation | LRANGE 0 -1 | SELECT ... ORDER BY created_at |

## How fast is the Redis round-trip in practice?

I created a fresh Upstash Redis database and ran a small benchmark from a Node.js process: 30 warm pings, 50 single-message RPUSH calls, 30 two-message RPUSH calls, then 20 LRANGE reads each against history sizes of 10, 50, 200, and 1000 stored UIMessage objects.

Each message had an id, a role, and one text part of roughly 200 bytes (about the size of a short chat sentence). Both the writes and the JSON serialization run through the Upstash Redis SDK.

![Redis latency for PING, RPUSH, and LRANGE commands](https://contentport.s3.amazonaws.com/chat/Sw016rSl6asUNtj0EpHf3DiBcD0ot3sQ/uYeCAdDF_-s1ugr7qAdEn.png)

| Operation | n | avg | p50 | p95 |
| --- | --- | --- | --- | --- |
| PING (REST round-trip) | 30 | 2.29 ms | 2.11 ms | 3.35 ms |
| RPUSH 1 message (per turn) | 50 | 2.21 ms | 2.19 ms | 2.68 ms |
| RPUSH 2 messages (user + reply) | 30 | 2.33 ms | 2.19 ms | 3.95 ms |
| LRANGE full list, history = 10 | 20 | 3.13 ms | 3.18 ms | 4.32 ms |
| LRANGE full list, history = 50 | 20 | 2.76 ms | 2.73 ms | 3.76 ms |
| LRANGE full list, history = 200 | 20 | 3.05 ms | 3.02 ms | 3.60 ms |
| LRANGE full list, history = 1000 | 20 | 6.76 ms | 6.73 ms | 8.56 ms |

A typical chat turn costs about 2 ms to persist and about 3 ms to read back. Even at 1000 messages, a full history load stays under 9 ms at p95, well below the 200-2000 ms a model takes to return its first token. These numbers will shift with the network distance between your function and the Redis region.
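
For readers who want to reproduce a table like this, the summary columns can be computed from raw timings roughly as follows. This is a minimal sketch and not the script behind the numbers above: the `summarize` and `percentile` helpers are hypothetical, and the nearest-rank percentile method is an assumption (other tools interpolate and give slightly different values).

```typescript
// Hypothetical helpers: summarize latency samples given in milliseconds.
// Assumption: percentiles use the nearest-rank method.
interface LatencySummary {
  n: number;
  avg: number;
  p50: number;
  p95: number;
}

function percentile(sorted: number[], p: number): number {
  // nearest-rank: smallest value with at least p% of samples at or below it
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(samples: number[]): LatencySummary {
  if (samples.length === 0) throw new Error("need at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  return {
    n: sorted.length,
    avg: sum / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
  };
}
```

Each sample would be the elapsed time around one Redis call, for example the difference between two `performance.now()` readings.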

## How does the AI SDK 6 onFinish callback work?

The streamText function returns a result object whose toUIMessageStreamResponse method takes an onFinish callback. That callback fires once the stream is fully drained, with a payload like this:

```tsx id="code-ibc35aii"
onFinish: ({ messages }: { messages: UIMessage[] }) => {
  // messages = previous history + new user message + assistant reply
  // same shape useChat uses on the client, ready to JSON.stringify
};
```

We get the full conversation including the new assistant reply, in the exact shape the client already understands. The [persistence docs](https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-message-persistence) phrase it as "the complete messages including the new AI response."

Two things bite you if you copy older code:

1. convertToModelMessages is now async in AI SDK 6 ([migration guide](https://ai-sdk.dev/docs/migration-guides/migration-guide-6-0#async-converttomodelmessages)). You have to await it; otherwise you hand streamText a Promise instead of a message array.
2. useChat no longer manages the input field's state. This changed in AI SDK 5 and still applies in v6, so you hold the input value in your own useState hook.

The route handler that ties everything together looks like this:

```tsx id="code-yi2hyuj9"
// app/api/chat/route.ts
import { openai } from "@ai-sdk/openai";
import {
  convertToModelMessages,
  createIdGenerator,
  streamText,
  type UIMessage,
} from "ai";
import { appendMessages, loadChat } from "@/util/chat-store";

export async function POST(req: Request) {
  const { message, id } = (await req.json()) as {
    message: UIMessage;
    id: string;
  };

  const previous = await loadChat(id);
  const messages = [...previous, message];

  const result = streamText({
    model: openai("gpt-4o"),
    messages: await convertToModelMessages(messages),
  });

  // drain the stream even if the client disconnects 👇
  result.consumeStream();

  return result.toUIMessageStreamResponse({
    originalMessages: messages,
    generateMessageId: createIdGenerator({ prefix: "msg", size: 16 }),
    onFinish: async ({ messages: finalMessages }) => {
      const delta = finalMessages.slice(previous.length);
      await appendMessages(id, delta);
    },
  });
}
```

onFinish runs once, after the model is done, with the full message list. We only persist the delta (the new user message plus the assistant reply) because earlier messages are already in Redis.
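
The delta step is just an array slice. The sketch below isolates it with a minimal stand-in for UIMessage; the `ChatMessage` type and the `unsavedMessages` helper are hypothetical names used for illustration, not part of the AI SDK.

```typescript
// Minimal stand-in for the AI SDK's UIMessage (assumption: only the fields used here).
interface ChatMessage {
  id: string;
  role: "system" | "user" | "assistant";
  parts: { type: "text"; text: string }[];
}

// Hypothetical helper: everything past the stored history is new and needs saving.
function unsavedMessages(
  storedCount: number,
  finalMessages: ChatMessage[],
): ChatMessage[] {
  return finalMessages.slice(storedCount);
}

const text = (t: string) => [{ type: "text" as const, text: t }];

// what Redis already holds for this chat
const stored: ChatMessage[] = [
  { id: "msg-1", role: "user", parts: text("Hi") },
  { id: "msg-2", role: "assistant", parts: text("Hello!") },
];

// what onFinish receives after the next turn
const finalMessages: ChatMessage[] = [
  ...stored,
  { id: "msg-3", role: "user", parts: text("What is Redis?") },
  { id: "msg-4", role: "assistant", parts: text("An in-memory data store.") },
];

const delta = unsavedMessages(stored.length, finalMessages);
console.log(delta.map((m) => m.id)); // the two new messages: msg-3 and msg-4
```

The slice only works because the server builds the message list itself, so the first `stored.length` entries are exactly what Redis already has.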

## What does the Redis chat store look like?

One Redis list per chat, with keys shaped like chat:CHATID:messages where CHATID is the conversation's unique ID. RPUSH appends new turns at the tail, LRANGE reads the whole conversation back in order. No JSON merging, no read-modify-write cycle on long chats.

```tsx id="code-h3jbp62y"
// util/chat-store.ts
import { Redis } from "@upstash/redis";
import { generateId, type UIMessage } from "ai";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

const key = (chatId: string) => `chat:${chatId}:messages`;

export async function createChat(): Promise<string> {
  // no write needed yet; an empty list means "no messages"
  return generateId();
}

export async function loadChat(chatId: string): Promise<UIMessage[]> {
  return redis.lrange<UIMessage>(key(chatId), 0, -1);
}

export async function appendMessages(
  chatId: string,
  messages: UIMessage[],
): Promise<void> {
  if (messages.length === 0) return;
  await redis.rpush(key(chatId), ...messages);
}
```

The Upstash Redis SDK JSON-encodes any non-primitive on write and decodes on read, so we don't need to worry about serialization at any point.

If you also store the chat owner, give yourself a second key like chat:CHATID:meta as a Redis hash with the user ID, the created-at timestamp, and a title you generate from the first user message.
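
One possible shape for that metadata, as a sketch: the `ChatMeta` fields, the `titleFrom` helper, the 40-character limit, and the "New chat" fallback are all assumptions chosen for illustration.

```typescript
// Minimal stand-in for UIMessage (assumption: we only read text parts).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  parts: { type: string; text?: string }[];
}

// Hypothetical shape of the chat:CHATID:meta hash.
interface ChatMeta {
  userId: string;
  createdAt: number; // Unix epoch milliseconds
  title: string;
}

// Derive a title from the first user message, cut to at most maxLength characters.
function titleFrom(messages: ChatMessage[], maxLength = 40): string {
  const firstUser = messages.find((m) => m.role === "user");
  const text = (firstUser?.parts ?? [])
    .filter((p) => p.type === "text")
    .map((p) => p.text ?? "")
    .join(" ")
    .trim();
  if (text === "") return "New chat";
  return text.length <= maxLength ? text : text.slice(0, maxLength - 1) + "…";
}

function buildChatMeta(
  userId: string,
  messages: ChatMessage[],
  now: number = Date.now(),
): ChatMeta {
  return { userId, createdAt: now, title: titleFrom(messages) };
}
```

With the Upstash client, an object like this can be written in one call by passing it to hset under the meta key.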

## How do you send only the last message instead of the whole history?

By default, the useChat hook posts the entire messages array on every request. After 50 turns that is a lot of redundant payload: you are paying upload bandwidth and parse time for data the server is going to discard anyway because it already has it in Redis.

The DefaultChatTransport class accepts a prepareSendMessagesRequest hook that lets you trim the body down to just the new message:

```tsx id="code-tntzu363"
// ui/chat.tsx
"use client";

import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport, type UIMessage } from "ai";
import { useState } from "react";

interface ChatProps {
  id: string;
  initialMessages: UIMessage[];
}

export function Chat({ id, initialMessages }: ChatProps) {
  const [input, setInput] = useState("");

  const { sendMessage, messages } = useChat({
    id,
    messages: initialMessages,
    transport: new DefaultChatTransport({
      api: "/api/chat",
      prepareSendMessagesRequest({ messages, id }) {
        return { body: { message: messages[messages.length - 1], id } };
      },
    }),
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong>{" "}
          {m.parts.map((p, i) =>
            p.type === "text" ? <span key={i}>{p.text}</span> : null,
          )}
        </div>
      ))}

      <form
        onSubmit={(e) => {
          e.preventDefault();
          if (!input.trim()) return;
          sendMessage({ text: input });
          setInput("");
        }}
      >
        <input value={input} onChange={(e) => setInput(e.target.value)} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```

The request body now contains only the latest message instead of the full conversation. The server pulls history from Redis instead of trusting the client for older turns, so a tampered client can't rewrite them.
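
Since the new message is the one thing the server still accepts from the client, it is worth checking its shape before using it. The guard below is a hedged sketch: `parseChatRequestBody` is a hypothetical helper, it checks only the fields this article relies on, and the rule that clients may only submit user-role messages is an assumption about your app.

```typescript
// Hypothetical guard for the trimmed request body: { message, id }.
interface IncomingMessage {
  id: string;
  role: "user";
  parts: unknown[];
}

interface ChatRequestBody {
  id: string;
  message: IncomingMessage;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns the parsed body, or null when the shape is not what we expect.
function parseChatRequestBody(body: unknown): ChatRequestBody | null {
  if (!isRecord(body)) return null;
  const chatId = body.id;
  const message = body.message;
  if (typeof chatId !== "string" || chatId === "") return null;
  if (!isRecord(message)) return null;
  const messageId = message.id;
  const parts = message.parts;
  if (typeof messageId !== "string") return null;
  if (message.role !== "user") return null; // assumption: clients only add user turns
  if (!Array.isArray(parts)) return null;
  return { id: chatId, message: { id: messageId, role: "user", parts } };
}
```

In the route handler, a null result would map to a 400 response before any Redis or model call happens.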

## How do you load the initial chat history on first paint?

Because we trimmed the request to just the latest message, the client starts with whatever array you pass into useChat as initial messages. In Next.js App Router, the cleanest way is to read Redis in a server component and hand the array to the chat component:

```tsx id="code-oo9rves6"
// app/chat/[id]/page.tsx
import { loadChat } from "@/util/chat-store";
import { Chat } from "@/ui/chat";

export default async function Page(props: {
  params: Promise<{ id: string }>;
}) {
  const { id } = await props.params;
  const initialMessages = await loadChat(id);
  return <Chat id={id} initialMessages={initialMessages} />;
}
```

One Redis call, no client-side loading state, and the chat renders with history already on screen. In a plain React or SPA setup, the browser should not talk to Redis directly, so expose loadChat behind a small API endpoint, fetch it from React Query or a useEffect hook, and feed the result into useChat.

## Why do we need server-side message IDs?

By default, useChat generates message IDs on the client. That works for ephemeral chats but breaks the moment you persist: open the same chat in two tabs and you get colliding IDs, or refresh and the assistant message you saved comes back with a different ID than the one in state.

The AI SDK docs are explicit: "for persistence, you need server-side generated IDs to ensure consistency across sessions and prevent ID conflicts when messages are stored and retrieved" ([source](https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-message-persistence#client-side-vs-server-side-id-generation)).

The supported helper is createIdGenerator. Configure it with a short prefix (we use "msg") and a 16-character size, then pass it as the generateMessageId option on toUIMessageStreamResponse. Every assistant message coming out of the stream now has an ID you control, and the IDs in Redis match the IDs useChat sees on the client.
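
To make the resulting ID shape concrete, here is a simplified stand-in for a prefixed generator. It is not the AI SDK's implementation: `makeIdGenerator` is a hypothetical name, and the "-" separator, the alphanumeric alphabet, and Math.random as the randomness source are assumptions of this sketch. In the route handler, keep using createIdGenerator.

```typescript
// Simplified stand-in for a prefixed ID generator (not the AI SDK's code).
// Assumptions: "-" separator, alphanumeric alphabet, Math.random as the source.
const ALPHABET =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

function makeIdGenerator(options: {
  prefix: string;
  size: number;
}): () => string {
  return () => {
    let random = "";
    for (let i = 0; i < options.size; i++) {
      random += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
    }
    return `${options.prefix}-${random}`;
  };
}

const nextMessageId = makeIdGenerator({ prefix: "msg", size: 16 });
console.log(nextMessageId()); // "msg-" followed by 16 random characters
```

The prefix makes IDs easy to recognize when you browse keys or logs, and the random part is long enough that collisions across tabs or sessions are not a practical concern.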

## What happens if the user closes the tab mid-stream?

By default, streamText applies backpressure: if the client stops reading, the model stops generating. That saves tokens, but it also means onFinish never fires and you lose the half-finished reply.

The fix is to call consumeStream on the result, without awaiting it, before you return the response. It drains the stream on the server regardless of what the client is doing, so the model runs to completion and onFinish writes the full assistant reply to Redis even if the client has already closed the connection. From the [persistence guide](https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-message-persistence#handling-client-disconnects): "consumeStream effectively removes the backpressure, meaning that the result is stored even when the client has already disconnected."

This costs you tokens for replies nobody reads. If that matters, gate it: only drain the stream for paid users, or only when the conversation has more than N turns invested.
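
A gate like that can be a small predicate evaluated before calling consumeStream. The sketch below is one way to express it; the `DrainContext` fields, the `shouldDrainStream` name, the default of 10 turns, and combining the two conditions with OR are all assumptions to adapt to your own policy.

```typescript
// Hypothetical policy: is finishing an abandoned reply worth the tokens?
interface DrainContext {
  isPaidUser: boolean;
  storedTurns: number; // messages already saved for this chat
}

function shouldDrainStream(ctx: DrainContext, minTurns = 10): boolean {
  // drain for paying users, or when the conversation is long enough to matter
  return ctx.isPaidUser || ctx.storedTurns > minTurns;
}

// In the route handler this would wrap the existing call, for example:
//   if (shouldDrainStream({ isPaidUser, storedTurns: previous.length })) {
//     result.consumeStream();
//   }
console.log(shouldDrainStream({ isPaidUser: false, storedTurns: 3 })); // false
console.log(shouldDrainStream({ isPaidUser: false, storedTurns: 11 })); // true
```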

For mid-stream resume (where the user comes back and sees the response continue from where it left off), look at [Chatbot Resume Streams](https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams), which uses a separate stream ID stored in Redis.

## What about tools, metadata, and schema drift?

The AI SDK ships a validateUIMessages helper that re-checks stored messages against your current tool and metadata schemas. Messages saved a month ago might reference a tool you renamed, or use a metadata field your Zod schema no longer accepts. If you feed those into the model conversion step unchecked, the model call throws partway in.

Validate on read:

```tsx id="code-u2u67kke"
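import { TypeValidationError, validateUIMessages, type UIMessage } from "ai";

// inside the POST handler: id and message come from the request body;
// tools and metadataSchema are your app's own definitions
const previous = await loadChat(id);

let validated: UIMessage[];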
try {
  validated = await validateUIMessages({
    messages: [...previous, message],
    tools,
    metadataSchema,
  });
} catch (err) {
  if (err instanceof TypeValidationError) {
    // drop or migrate the offending messages, then retry
    validated = [message];
  } else {
    throw err;
  }
}
```

For tool-heavy agents, this is the most important guard between Redis and the model. A bad schema in storage should degrade to a fresh conversation, not a 500.
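
If resetting to a fresh conversation is too blunt, a middle ground is to drop only the messages that no longer pass a check. The sketch below assumes a per-message predicate you write yourself; `partitionMessages`, `StoredMessage`, `knownTools`, and the rule that tool parts have a type starting with "tool-" followed by the tool name are assumptions for illustration.

```typescript
// Hypothetical helper: split stored messages into ones that still pass a check
// and ones that don't, so only the bad ones are dropped.
function partitionMessages<T>(
  messages: T[],
  isValid: (message: T) => boolean,
): { valid: T[]; rejected: T[] } {
  const valid: T[] = [];
  const rejected: T[] = [];
  for (const message of messages) {
    (isValid(message) ? valid : rejected).push(message);
  }
  return { valid, rejected };
}

// Example check (assumption): tool parts must reference a tool that still exists.
interface StoredMessage {
  id: string;
  parts: { type: string }[];
}

const knownTools = new Set(["tool-getWeather"]);

const usesKnownTools = (m: StoredMessage): boolean =>
  m.parts.every((p) => !p.type.startsWith("tool-") || knownTools.has(p.type));

const stored: StoredMessage[] = [
  { id: "a", parts: [{ type: "text" }] },
  { id: "b", parts: [{ type: "tool-oldName" }] }, // tool was renamed since
  { id: "c", parts: [{ type: "tool-getWeather" }] },
];

const { valid, rejected } = partitionMessages(stored, usesKnownTools);
console.log(valid.map((m) => m.id), rejected.map((m) => m.id));
```

Dropping a message from the middle of a conversation can leave later turns without context, so it is worth logging what was rejected and running the remaining messages through validateUIMessages again.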

## When should you not use Redis for chat history?

Pick a SQL database when any of these are true:

- You need to search across all messages from all users (full-text or vector search on the message body itself).
- You report on conversations (token usage by org, average turns per user, retention).
- You have hard compliance requirements that need row-level audit trails.

A common pattern: Redis as the hot store the chat reads and writes on every turn, plus an async job that copies finalized conversations into Postgres or a warehouse for analytics queries.

The other failure mode is enormous individual messages (long tool outputs, base64-encoded images embedded in message parts). With Redis you pay for memory and for the size of every request, so in that case store the heavy payload in object storage and keep only a reference to it in the message.
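
A cheap way to spot those messages before they reach Redis is to measure the serialized size. This is a sketch under stated assumptions: `jsonByteLength` and `isOversized` are hypothetical helpers, and the 64 KiB default threshold is arbitrary.

```typescript
// Hypothetical size check: measure a value as UTF-8 encoded JSON.
function jsonByteLength(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Assumption: 64 KiB is an arbitrary cutoff; pick one that fits your plan's limits.
function isOversized(message: unknown, maxBytes = 64 * 1024): boolean {
  return jsonByteLength(message) > maxBytes;
}
```

Messages that trip the check are the candidates for moving a part's payload to object storage and keeping a URL or key in its place.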

## FAQ

**Does this work outside Next.js?** Yes. The streamText and toUIMessageStreamResponse APIs and the Upstash Redis client are framework-agnostic. The route handler shown here is a plain Web request and response handler, so it drops into Hono, Remix, SvelteKit, or a raw Node server.

**Can I use ioredis or node-redis instead of the @upstash/redis client?** Yes. Upstash exposes the same database over both HTTP and the standard Redis TCP protocol: the @upstash/redis client uses HTTP and works inside Vercel Edge Functions and Cloudflare Workers, where outbound TCP is restricted. In a regular Node.js deployment, ioredis or node-redis connect to the same Upstash database over TCP. Those clients don't JSON-encode values for you, so call JSON.stringify on write and JSON.parse on read. Pick the client your runtime supports.

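**How do I expire old chats?** Call EXPIRE on the key after each RPUSH, which resets the TTL so active chats stay alive and idle ones age out, or run a daily job that deletes keys older than a cutoff. For per-user quotas, keep an index list per user (one Redis list per user, containing chat IDs) and trim it with LTRIM when it grows past your retention limit. Trimming the index does not delete the chats themselves, so also delete the message keys for the IDs you drop.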

**Will this handle 10 million messages?** A single Redis list scales to that, but the read cost is O(N). For very long single conversations, paginate the LRANGE call with start and stop indexes and only hydrate the last 50-100 turns on the client. The model only needs the recent window plus whatever summary you have already produced.
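
The index arithmetic for that pagination is easy to get wrong, so here is a small in-memory model of how LRANGE treats its bounds (both inclusive, negative values counted from the tail). The `lrange` function below only imitates the command for illustration, and `recentWindow` is a hypothetical helper.

```typescript
// In-memory model of LRANGE index semantics, for illustration only:
// both bounds are inclusive and negative indexes count from the tail.
function lrange<T>(list: T[], start: number, end: number): T[] {
  const len = list.length;
  const from = Math.max(start < 0 ? len + start : start, 0);
  const to = Math.min(end < 0 ? len + end : end, len - 1);
  return from > to ? [] : list.slice(from, to + 1);
}

// Hypothetical helper: LRANGE arguments for the most recent `count` messages.
function recentWindow(count: number): [number, number] {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error("count must be a positive integer");
  }
  return [-count, -1];
}

const turns = Array.from({ length: 10 }, (_, i) => `turn-${i}`);
const [from, to] = recentWindow(3);
console.log(lrange(turns, from, to)); // the last three entries: turn-7, turn-8, turn-9
```

Against the real store this corresponds to calling lrange on the chat key with -50 and -1 to fetch the last 50 messages. Asking for more messages than exist simply returns the whole list.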