Yigit Konur

Vercel AI SDK v5 Internals - Part 4 — Decoupling Client & Server: State Management and Message Conversion

This is the fourth post in our series diving into AI SDK 5. We've previously covered the UIMessage structure, the UI Message Streaming Protocol, and V2 Model Interfaces. So, assuming you're up to speed on those, let's dig into how v5 is making client-server interactions cleaner and more scalable through better state management and message conversion.

🖖🏿 A Note on Process & Curation: While I didn't personally write every word, this piece is the product of my dedicated curation. It reflects a newer approach to content creation: I guided powerful AI tools (such as Gemini Pro 2.5 for synthesis, working from a git diff of main vs. the canary v5 branch and extensive research, including OpenAI's Deep Research, with over 10M tokens spent) to explore and articulate complex ideas. Combined with my own fact-checking and refinement, this method aims to deliver depth and accuracy efficiently, and I encourage you to read it as a blend of human oversight and AI capability. I also use these tools for my own LLM chats on Thinkbuddy, where I do some of this clean-up and publish as well.

1. Why Per-Hook State Didn’t Scale: The Case for Centralized Client State

TL;DR: Vercel AI SDK v5 moves beyond v4's isolated useChat state by introducing principles for centralized client-side state management, embodied by the updated useChat hook when using a shared id, to solve synchronization and efficiency issues in complex UIs.

Why this matters?

If you've built chat UIs with Vercel AI SDK v4, you'll be familiar with the useChat hook. It's a fantastic tool, but in v4, each instance of useChat typically owned its own state—its own messages array, its own copy of the user's current input, loading status, and so on. This was straightforward for simple, single-view chat interfaces.

However, as applications grew more complex, this per-hook state model started to show its limitations. Imagine you have a main chat window, a chat preview in a sidebar, and maybe even a pop-up chat component, all needing to display or interact with the same conversation. The challenges became pretty clear:

  • Synchronization Headaches: Keeping all these views in sync was a manual, and often painful, process. You'd find yourself prop-drilling message histories and callbacks deep into your component tree, or reaching for the React Context API, or even bringing in external state management libraries (like Zustand, Redux, Jotai) just to make sure "tab A" knew what "tab B" was doing. This added a lot of boilerplate and complexity.
  • Data Duplication: Each useChat instance could hold its own copy of the messages array. For long conversations, this meant duplicating potentially large amounts of data in client-side memory, which isn't ideal for performance or resource usage.
  • Caching Inefficiencies: There was no built-in mechanism to de-duplicate fetches or cache message data across different useChat instances, even if they were conceptually referring to the same chat session (e.g., using the same chat ID). Each instance might refetch or hold redundant information.

These issues made it harder to build truly sophisticated and seamlessly integrated multi-view chat experiences without significant custom plumbing.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

Vercel AI SDK v5, particularly through the principles embodied in the updated useChat hook (especially when used with a shared id prop), moves decisively towards a centralized client-side state management model. Conceptually, you can think of this as the SDK providing an internal ChatStore (a v5 term we'll explore more in the next section) for each unique chat session.

The core idea is to have a single, authoritative source of truth for a given chat's state on the client. When multiple useChat hooks are initialized with the same chat id, they effectively subscribe to and share this centralized state.

This directly addresses the pain points of v4:

  • Synchronization Solved: If multiple components use useChat with the same id, any update to the chat state (a new user message, an incoming AI response part) made through one hook instance is automatically reflected in all other instances subscribed to that same id. The "tab A is out of sync with tab B" problem for the same conversation effectively disappears, without needing manual intervention.
  • Data De-duplication: There's only one canonical copy of the messages array (and other chat state like input, status) for a given chat id managed internally by the SDK. This reduces client-side memory footprint.
  • Implicit Caching: The SDK's internal management of state per id provides a form of in-memory caching for the duration of the user's session on the page.
V4: Per-Hook State
+-----------------+     +-----------------+     +-----------------+
| Component A     |     | Component B     |     | Component C     |
| useChat (no id) |     | useChat (no id) |     | useChat (no id) |
| [State A copy]  |     | [State B copy]  |     | [State C copy]  |
+-----------------+     +-----------------+     +-----------------+
 (Needs manual sync if related)

v5: Shared State via `id`
+--------------------+     +--------------------+
| Component X        |     | Component Y        |
| useChat({id: "1"}) |     | useChat({id: "1"}) |
+--------------------+     +--------------------+
          |                          |
          +------------+-------------+
                       |
                       v
        +---------------------------+
        | SDK Internal State        |
        | for chat ID "1"           |
        | (Single Source of Truth)  |
        +---------------------------+

+--------------------+
| Component Z        |
| useChat({id: "2"}) | --> SDK Internal State for "2"
+--------------------+

[FIGURE 1: Diagram comparing V4's multiple independent useChat states vs. v5's useChat instances with shared ID pointing to a single internal state/cache for that ID]

So, while you might not be directly instantiating a ChatStore object and passing it around in many common scenarios with useChat, the hook itself has become much smarter about sharing and synchronizing state when told to do so via the id prop. This architectural shift is fundamental to building cleaner, more scalable, and more maintainable complex chat UIs.

Take-aways / Migration Checklist Bullets

  • V4 Limitation: Per-hook state in useChat led to synchronization difficulties, data duplication, and caching inefficiencies in complex UIs.
  • v5 Solution: useChat instances initialized with the same id prop now share a centralized, internally managed client-side state for that chat session. This embodies the principles of a conceptual ChatStore.
  • Benefits: Greatly simplifies state synchronization across multiple UI components, reduces data duplication, and provides implicit in-memory caching for chat sessions.
  • Migration Tip: For existing V4 apps, identify areas where multiple components display the same chat. Refactor these to use useChat with a common, consistent id prop to leverage v5's built-in state sharing.

2. ChatStore Responsibilities (Conceptual Deep Dive)

TL;DR: The conceptual ChatStore (whose functionalities are largely embodied by v5's useChat hook with a shared id) acts as the client-side single source of truth for chat state, managing message history, optimistic updates, write operations, and streaming status, with a vision for cross-framework compatibility.

Why this matters?

As chat applications evolve beyond simple request-response cycles to include features like streaming, tool interactions, file handling, and real-time updates across multiple UI views, the need for a robust, centralized client-side state manager becomes paramount. Managing all the moving parts—user input, AI responses streaming in, optimistic updates, tool execution states, loading indicators, and error handling—coherently and efficiently is a complex task. Without a clear architectural pattern for this, client-side code can quickly become tangled and hard to maintain.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

While the v5 canary versions emphasize using the useChat hook with a shared id as the primary way for React developers to achieve centralized state, it's helpful to understand the underlying principles and responsibilities that a conceptual ChatStore would handle. Think of these as the "what it does for you" capabilities, which are largely now integrated into useChat's internal logic when a consistent id is provided.

Core Responsibilities

  1. Normalized Cache / Single Source of Truth:

    • This is the bedrock. For any given chat id, the mechanism acting as the ChatStore (i.e., the SDK's internal shared state for that useChat id) maintains a single, canonical version of:
      • The messages array (which are UIMessage[] – a v5 term for the rich, part-based message structure we discussed in Post 1).
      • The current input state.
      • The overall streaming status (e.g., 'idle', 'loading', 'error').
      • Any error state object.
    • This ensures that all components subscribed to this chat id see the exact same information, preventing data duplication and inconsistencies.
  2. In-flight Write Queue / Coordination & Optimistic Updates:

    • Coordination: If multiple actions could potentially modify the chat state simultaneously (e.g., user submits a new message, an AI response is streaming in, a client-side tool finishes execution), the store's logic ensures these operations are processed coherently and don't corrupt the state. This might involve an internal queue or a state machine to manage transitions.
    • Optimistic Updates: This is crucial for a responsive UI.
      • When a user submits a message, it's immediately added to the local messages array (in the shared state) before the server even acknowledges the request. The UI updates instantly.
      • As an AI response streams back (as UIMessageStreamParts – the building blocks of the v5 UI Message Streaming Protocol, covered in Post 2), the assistant's UIMessage in the shared state is incrementally updated. Each update (e.g., a new text chunk, a tool call appearing) triggers a reactive UI update.
  3. Cross-Framework Subscription (Conceptual / Future Vision):

    • The architectural vision is that a true ChatStore could be framework-agnostic. This means React, Vue, and Svelte useChat hooks (or their equivalents) could all subscribe to the same underlying JavaScript ChatStore instance.
    • While the current v5 Canary hooks (@ai-sdk/react, @ai-sdk/vue, @ai-sdk/svelte) are framework-specific implementations, the underlying principle of a shared, identifiable store is established. The current v5 canary achieves cross-component sync within the same framework via the shared id prop. True cross-framework sharing of a single store instance would be a further step if a standalone ChatStore API becomes more prominent.
+------------------------+
| React Component        |
| useChat({id: "chat1"}) |----+
+------------------------+    |
                              |
+------------------------+    v
| Vue Component          |   +----------------------------+
| useChat({id: "chat1"}) |-->| Conceptual ChatStore       |
+------------------------+   | (for chat ID "chat1")      |
                             | (Manages UIMessages,       |
+------------------------+   |  input, status, etc.)      |
| Svelte Component       |   +----------------------------+
| useChat({id: "chat1"}) |----+          ^
+------------------------+    |          |
                              +----------+

[FIGURE 2: Diagram illustrating multiple framework hooks (React useChat, Vue useChat, Svelte useChat) all pointing to and synchronizing with a single conceptual ChatStore instance for a given chat ID.]

Other Key Functions (Conceptually handled by useChat's internals for a given id)

  • Managing Streaming State: The store tracks the lifecycle of an AI response, typically through a status property. This might include states like:
    • 'submitted' or 'loading': When a request has been sent and is awaiting a response or the start of a stream.
    • 'streaming': When UIMessageStreamParts are actively being received and processed.
    • 'ready' or 'idle': When the interaction is complete and the system is ready for new input.
    • 'error': If an error occurs during the process.
  • Facilitating ChatTransport Interaction:
    • Conceptually, a ChatStore would be the component that invokes methods on a ChatTransport (the v5 concept for abstracting message delivery).
    • In the current v5 Canary useChat implementation, the internal callChatApi utility fulfills this role for the default HTTP/SSE transport.
  • Exposing State & Imperative API (Conceptual for Direct Use):

    • While most useChat users in v5 Canary interact with the store's principles reactively through the hook's return values, the underlying design of a ChatStore could offer an imperative API for advanced scenarios. This might include methods like:
      • addMessage(message: UIMessage)
      • setMessages(messages: UIMessage[])
      • getState(): { messages: UIMessage[], input: string, status: string, error: Error | null }
      • subscribe(listener: (state) => void): () => void (for manual subscription)
    • Conceptual example of direct usage:

      // Conceptual direct usage
      // This illustrates the *idea* of a standalone store.
      // In v5 Canary, useChat with an 'id' provides this behavior implicitly.
      //
      // import { createChatStore } from '@ai-sdk/core'; // Hypothetical import
      //
      // const store = createChatStore({ id: 'my-chat' });
      // store.addMessage({ // Assuming addMessage is a method on the store
      //   id: 'm1',
      //   role: 'user',
      //   parts: [{type: 'text', text: 'Hi'}]
      //   // createdAt might be added by the store or expected here
      // });
      // const currentState = store.getState();
      // console.log(currentState.messages);
      
    • It's important to reiterate that for most v5 Canary useChat users, this direct interaction is abstracted away. You get the benefits of the store's principles (caching, synchronization, optimistic updates) by simply using useChat with a consistent id.

Take-aways / Migration Checklist Bullets

  • The conceptual ChatStore in v5 centralizes client-side chat state, acting as a single source of truth.
  • Its key responsibilities include maintaining a normalized cache of UIMessage[], coordinating writes, managing optimistic updates, and tracking streaming status.
  • In v5 Canary, these responsibilities are largely fulfilled by the useChat hook's internal logic when a consistent id prop is used across components.
  • This approach inherently provides synchronization and caching for chat sessions within the same framework.
  • While a directly manipulated ChatStore object isn't the primary interaction pattern for typical useChat users in Canary, understanding its principles helps in grasping v5's state management improvements.

3. Hands-On: Creating & Sharing a Store (via useChat with id)

TL;DR: Achieving synchronized chat state across multiple React components in Vercel AI SDK v5 is straightforward: simply pass the same id prop to each useChat instance, making them subscribe to and update a shared underlying state.

Why this matters?

We've talked about the theory of centralized client state with ChatStore principles. Now, let's see it in action. The beauty of v5's approach is that for many common use cases, you don't need to manually manage a store instance or context. The useChat hook, when given a consistent id, handles the heavy lifting of state sharing and synchronization for you. This makes building UIs with multiple views of the same chat (like a main window and a sidebar preview) significantly simpler than it was in V4.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

Let's build a simple React application with two components:

  1. MainChatWindow: A primary chat interface where the user can type and send messages.
  2. ChatSidebarPreview: A smaller component that displays a preview of the same chat conversation (e.g., the last few messages).

Both components will use useChat, and we'll synchronize them using a shared chatId.

Scenario & Core Mechanism: Shared id Prop

The key to making these two components share the same chat state is to pass the exact same string value for the id prop to their respective useChat hooks. This id tells the AI SDK that these hooks are interested in the same underlying conversation data.

We can generate this id once (e.g., when a new chat session starts, or if it's loaded from a URL parameter for an existing chat). The Vercel AI SDK provides a utility for this: generateId (usually imported from ai or @ai-sdk/core).

Example Code (React)

Here’s how you might structure the components:

App.tsx (or your main page component)
This component will define the chatId and pass it to both child components.

// App.tsx (or main page)
import React, { useState } from 'react';
import { MainChatWindow } from './MainChatWindow';
import { ChatSidebarPreview } from './ChatSidebarPreview';
import { generateId } from 'ai'; // v5 SDK utility for generating unique IDs

// Reminder: v5 Canary APIs might change. Double-check imports.

export default function App() {
  // For a new chat, generate an ID.
  // In a real app, this might come from URL params for an existing chat,
  // or be fetched/generated when a user starts a new conversation.
  const [chatId] = useState(() => generateId());

  return (
    <div style={{ display: 'flex', fontFamily: 'sans-serif' }}>
      <MainChatWindow chatId={chatId} />
      <ChatSidebarPreview chatId={chatId} />
    </div>
  );
}

MainChatWindow.tsx
This component provides the full chat interface.

// MainChatWindow.tsx
import React from 'react';
import { useChat, UIMessage }
  from '@ai-sdk/react'; // v5 imports from canary

// Assuming UIMessage and its parts (TextUIPart etc.) are understood from Post 1.
// Define TextUIPart if not directly available for casting or filtering
interface TextUIPart { type: 'text'; text: string; }

export function MainChatWindow({ chatId }: { chatId: string }) {
  const {
    messages,
    input,
    handleInputChange,
    handleSubmit,
    status,
    isLoading // convenience boolean: status === 'loading' || status === 'generating' etc.
  } = useChat({
    id: chatId, // Crucial: use the shared chatId
    api: '/api/v5/chat_endpoint', // Your v5-compliant backend endpoint
    // initialMessages: [], // Load from DB if resuming an existing chat
    // onFinish: (message) => console.log('MainChatWindow: AI message finished', message),
  });

  return (
    <div style={{ flex: 2, padding: '20px', borderRight: '1px solid #ccc' }}>
      <h3>Main Chat (ID: {chatId})</h3>
      <div style={{
        height: '400px',
        overflowY: 'auto',
        border: '1px solid #eee',
        marginBottom: '10px',
        padding: '10px'
      }}>
        {messages.map((m: UIMessage) => ( // UIMessage from @ai-sdk/react
          <div key={m.id} style={{ marginBottom: '8px' }}>
            <strong>{m.role === 'user' ? 'You' : 'AI'}:</strong>
            {m.parts.map((p, idx) =>
              p.type === 'text'
                ? <span key={idx}>{(p as TextUIPart).text}</span>
                : <em key={idx}> [{p.type}]</em>
            )}
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          disabled={isLoading}
          placeholder="Type your message..."
          style={{ width: 'calc(100% - 70px)', padding: '8px', marginRight: '5px' }}
        />
        <button type="submit" disabled={isLoading} style={{ padding: '8px 15px' }}>
          Send
        </button>
      </form>
      {isLoading && <p><em>AI is thinking...</em></p>}
      {status !== 'idle' && status !== 'loading' && <p>Status: {status}</p>}
    </div>
  );
}

ChatSidebarPreview.tsx
This component shows a simplified preview, like the last few messages.

// ChatSidebarPreview.tsx
import React from 'react';
import { useChat, UIMessage }
  from '@ai-sdk/react'; // v5 imports from canary

interface TextUIPart { type: 'text'; text: string; }

export function ChatSidebarPreview({ chatId }: { chatId: string }) {
  // Notice: This instance doesn't need input/handleSubmit if it's just for display.
  // It still needs the `api` endpoint defined, as `useChat` might try to use it
  // for other purposes or expect it for initialization, even if this specific
  // instance doesn't actively call `handleSubmit`.
  const { messages, status, isLoading } = useChat({
    id: chatId, // Crucial: use the SAME shared chatId
    api: '/api/v5/chat_endpoint', // Must match for the SDK to sync correctly
    // initialMessages: [], // Consistency: load if MainChatWindow loads
  });

  // Helper to get text content from UIMessage parts for preview
  const getPreviewText = (message: UIMessage): string => {
    return message.parts
      .filter(p => p.type === 'text')
      .map(p => (p as TextUIPart).text) // Cast to TextUIPart to access .text
      .join(' ')
      .substring(0, 70); // Show a snippet
  };

  return (
    <div style={{ flex: 1, padding: '20px', backgroundColor: '#f9f9f9' }}>
      <h4>Chat Preview (ID: {chatId})</h4>
      <div style={{ fontSize: '0.9em', maxHeight: '420px', overflowY: 'auto' }}>
        {messages.slice(-5).map((m: UIMessage) => ( // Show last 5 messages
          <div key={m.id} style={{ marginBottom: '5px', padding: '3px', borderBottom: '1px dotted #ddd' }}>
            <em>{m.role}:</em> {getPreviewText(m)}...
          </div>
        ))}
        {messages.length === 0 && <p><em>No messages yet.</em></p>}
        {isLoading && <p><em>Syncing...</em></p>}
         {status !== 'idle' && status !== 'loading' && <p><small>Status: {status}</small></p>}
      </div>
    </div>
  );
}

(Remember to create the /api/v5/chat_endpoint on your server, as discussed in the sections on message conversion and persistence, ensuring it handles UIMessage[] and returns a v5 UI Message Stream.)

+----------------------------------+-------------------------------------+
| Main Chat Window (ID: xyz123)    | Chat Sidebar Preview (ID: xyz123)   |
+----------------------------------+-------------------------------------+
|                                  |                                     |
| You: Hello AI!                   | You: Hello AI!...                   |
| AI: Hello User! I am streaming...| AI: Hello User! I am streaming!...  |
| AI: [Tool: searchWeb (pending)]  |                                     |
|                                  |                                     |
| [Type your message...      ][Send]| (Last 5 messages shown here)        |
| AI is thinking...                | Syncing...                          |
+----------------------------------+-------------------------------------+

[FIGURE 3: Screenshot of the UI with MainChatWindow and ChatSidebarPreview side-by-side, showing synchronized messages.]

Explain the Synchronization Magic

Now, let's walk through what happens when the user interacts:

  1. User Input: The user types a message in MainChatWindow's input field and clicks "Send" (or presses Enter).
  2. Optimistic Update (Shared State):
    • The handleSubmit function from MainChatWindow's useChat instance is called.
    • Internally, this useChat instance (which is linked to the shared state for chatId) optimistically adds the user's new UIMessage to its messages array.
    • Crucially: Because both MainChatWindow's useChat and ChatSidebarPreview's useChat are subscribed to the same underlying state for that chatId, both components will re-render almost instantly. The new user message will appear in both the main chat list and the sidebar preview. This is the "single source of truth" in action.
  3. Server Request: MainChatWindow's useChat instance then sends the updated messages array (including the new user message) to your /api/v5/chat_endpoint.
  4. AI Response Streaming:
    • The server processes the request, calls the LLM, and starts streaming back the AI's response using the v5 UI Message Streaming Protocol.
    • MainChatWindow's useChat instance receives these stream parts (e.g., text deltas, tool call info). It uses an internal utility (like processUIMessageStream) to incrementally build or update the assistant's UIMessage in the shared state.
  5. Simultaneous UI Updates:
    • Each time the assistant's UIMessage is updated in the shared state by MainChatWindow's useChat, both MainChatWindow AND ChatSidebarPreview re-render to show the streaming AI response. You'll see the text appearing character by character, or tool UIs updating, in both places at the same time.

This seamless synchronization happens without any manual prop drilling for the chat messages or status between MainChatWindow and ChatSidebarPreview. The shared id prop is the key that unlocks this powerful behavior, making the principles of the conceptual ChatStore a practical reality for everyday UI development.

Take-aways / Migration Checklist Bullets

  • To share chat state across multiple React components, initialize useChat in each component with the exact same id string value.
  • The AI SDK handles the internal state synchronization, caching, and optimistic updates for that id.
  • Ensure all useChat instances sharing an id point to the same api endpoint for consistency, even if only one of them actively submits new messages.
  • This pattern dramatically simplifies building complex UIs with multiple views of the same conversation compared to V4's manual synchronization needs.
  • Use a stable ID generation strategy (e.g., generateId() from ai for new chats, or load IDs for existing chats from your backend/URL).

4. Model Message Conversion on the Server: The convertToModelMessages Bridge

TL;DR: The server-side convertToModelMessages utility in Vercel AI SDK v5 is essential for transforming rich, client-originated UIMessage arrays into leaner ModelMessage arrays suitable for LLM V2 interfaces, selectively including/excluding parts and handling file/tool data.

Why this matters?

We've established that UIMessage (with its parts array, metadata, etc.) is fantastic for building rich client-side UIs and for robust persistence. However, Large Language Models (LLMs) don't typically understand this complex UI-centric structure directly. They expect a more constrained input, usually a sequence of messages with roles and content, and specific formats for things like tool calls or image data.

Furthermore, different LLM providers might have slightly different expectations for how this input should be structured. This is where a server-side conversion step becomes critical. It ensures a clean separation of concerns: your client and database can work with the rich UIMessage format, while your server can reliably prepare the data in a way that any V2-compliant LLM provider adapter in the AI SDK can understand.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

This is where the convertToModelMessages() utility comes into play.

Introducing convertToModelMessages()

  • Purpose: Its primary job is to take an array of UIMessage[] (as received by your server API endpoint from the client) and transform it into an array of ModelMessage[]. The ModelMessage (a v5 term, defined in packages/ai/core/prompt/message.ts) is the standardized format that the Vercel AI SDK's V2 model interfaces (like LanguageModelV2, which we touched on in Post 3) expect. Like UIMessage, ModelMessage.content is also an array of typed parts (e.g., LanguageModelV2TextPart, LanguageModelV2FilePart).
  • Location: This is a server-side utility function, typically imported from ai or @ai-sdk/core.
  • Input: An array of UIMessage<METADATA>[].
  • Output: An object, the most important part of which is modelMessages: ModelMessage[].

4.1 Traversing UI parts → Model parts

The core logic of convertToModelMessages() involves iterating through each UIMessage in the input array and then iterating through its parts array. For each UIMessagePart, it determines how (or if) it should be represented in the corresponding ModelMessage's content array.

  • TextUIPart.text is typically mapped directly to a LanguageModelV2TextPart in the ModelMessage.content.
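
To make this traversal concrete, here is a minimal, self-contained sketch of the kind of per-part mapping convertToModelMessages() performs internally. The part shapes (SimplifiedUIPart, SimplifiedModelPart) and the mapUIPartsToModelParts helper are simplified stand-ins for illustration, not the SDK's actual types; the real utility handles many more cases (files, tool calls, provider options):

// Simplified stand-in types; the SDK's real UIMessagePart and ModelMessage
// content parts are richer than this sketch.
type SimplifiedUIPart =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string } // UI-only: shown to the user, not the model
  | { type: 'step-start' };             // UI-only: visual step marker

type SimplifiedModelPart = { type: 'text'; text: string };

function mapUIPartsToModelParts(parts: SimplifiedUIPart[]): SimplifiedModelPart[] {
  const modelParts: SimplifiedModelPart[] = [];
  for (const part of parts) {
    switch (part.type) {
      case 'text':
        // Text content flows through to the model prompt essentially unchanged.
        modelParts.push({ type: 'text', text: part.text });
        break;
      case 'reasoning':
      case 'step-start':
        // Presentation-only parts are dropped; they never reach the LLM.
        break;
    }
  }
  return modelParts;
}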

4.2 File handling & model.supportedUrls

This is a neat part of v5's more sophisticated model interaction. When convertToModelMessages encounters a FileUIPart in a UIMessage:

  • It aims to create a LanguageModelV2FilePart for the ModelMessage.
  • It needs to handle the FileUIPart.url.
    • If the url is a Data URL (e.g., data:image/png;base64,...), the function will extract the base64 encoded data. The LanguageModelV2FilePart will then contain this binary data.
    • If it's a remote HTTP(S) URL, the behavior is more nuanced. The Vercel AI SDK's LanguageModelV2 interface includes an optional property: supportedUrls?: { mediaType: string; urlRegex: RegExp }[].
      • This supportedUrls property allows a model provider to declare which types of media URLs it can ingest and process natively by fetching the content itself.
      • When convertToModelMessages (or more accurately, the V2 provider adapter that consumes its output) processes a FileUIPart with a remote URL, it can check against the target model's supportedUrls.
        • If the URL matches a supported pattern, the URL itself might be passed in the LanguageModelV2FilePart to the model.
        • If the URL is not directly supported by the model for fetching, but the model can accept inline data for that mediaType, the SDK might need to download the content from the URL and then provide it as base64 data.
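
Here is a hedged sketch of the URL check described above. The supportedUrls shape mirrors this article's description of the canary LanguageModelV2 interface, so treat the exact property names and the canModelFetchUrl helper as assumptions made for illustration:

// Illustrative check of a remote file URL against a model's supportedUrls declaration.
type SupportedUrl = { mediaType: string; urlRegex: RegExp };

function canModelFetchUrl(
  supportedUrls: SupportedUrl[] | undefined,
  url: string,
  mediaType: string,
): boolean {
  if (!supportedUrls) return false;
  return supportedUrls.some(
    entry => entry.mediaType === mediaType && entry.urlRegex.test(url),
  );
}

// If the model can fetch the URL itself, the URL can be passed through in the
// file part; otherwise the server would download the bytes and inline them.
const imageSupport: SupportedUrl[] = [
  { mediaType: 'image/png', urlRegex: /^https:\/\/.+/ },
];
console.log(canModelFetchUrl(imageSupport, 'https://example.com/cat.png', 'image/png')); // true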

4.3 Excluding UI-only parts (reasoning / step markers)

A key aspect of the "UI Messages ≠ Model Messages" philosophy is that not all UIMessageParts are relevant for the LLM's prompt.

  • Generally Excluded: Parts that are purely for UI presentation or structuring the visual flow are typically excluded during the conversion to ModelMessages. This includes:
    • ReasoningUIPart: The AI's thought process is for user insight, not for re-feeding to the AI.
    • StepStartUIPart: Visual step markers are UI-only.
    • SourceUIPart (often): While sources are important, they might be represented textually within a TextUIPart if they need to be part of the prompt, or excluded if they are just for UI display alongside AI-generated text.
  • Stripped Message-Level Fields: Similarly, top-level UIMessage fields like UIMessage.id, UIMessage.metadata, and UIMessage.createdAt are generally stripped because they are not part of the standard LLM prompt structure.

Handling ToolInvocationUIPart

This is critical for enabling tool use with LLMs:

  • Assistant's Tool Call Request: If an assistant's UIMessage contains a ToolInvocationUIPart where toolInvocation.state is 'call', convertToModelMessages transforms this into one or more LanguageModelV2ToolCallPart(s) within the assistant's ModelMessage.content array.
  • Providing Tool Results to the Model: If a UIMessage (representing a tool's outcome) contains a ToolInvocationUIPart where toolInvocation.state is 'result' or 'error', convertToModelMessages converts this into one or more LanguageModelV2ToolResultPart(s). These parts are then typically wrapped in a new ModelMessage that has role: 'tool'. This new ModelMessage is then added to the modelMessages array sent to the LLM.
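
Below is a rough sketch of the tool-part mapping just described. The types are simplified stand-ins rather than the SDK's actual ToolInvocationUIPart, LanguageModelV2ToolCallPart, or LanguageModelV2ToolResultPart definitions, and convertToolInvocation is purely illustrative:

// Simplified stand-ins for the tool-related part shapes.
type ToolInvocationUIPart = {
  type: 'tool-invocation';
  toolInvocation:
    | { state: 'call'; toolCallId: string; toolName: string; args: unknown }
    | { state: 'result'; toolCallId: string; toolName: string; args: unknown; result: unknown };
};

type ToolCallPart = { type: 'tool-call'; toolCallId: string; toolName: string; args: unknown };
type ToolResultPart = { type: 'tool-result'; toolCallId: string; toolName: string; result: unknown };

function convertToolInvocation(part: ToolInvocationUIPart):
  | { target: 'assistant-content'; part: ToolCallPart }
  | { target: 'tool-message'; part: ToolResultPart } {
  const inv = part.toolInvocation;
  if (inv.state === 'call') {
    // The assistant asked for a tool: becomes a tool-call part in the assistant message.
    return {
      target: 'assistant-content',
      part: { type: 'tool-call', toolCallId: inv.toolCallId, toolName: inv.toolName, args: inv.args },
    };
  }
  // A completed tool run: becomes a tool-result part destined for a role:'tool' message.
  return {
    target: 'tool-message',
    part: { type: 'tool-result', toolCallId: inv.toolCallId, result: inv.result },
  };
}

In the real conversion, the call parts stay inside the assistant's ModelMessage.content, while the result parts are collected into a following role: 'tool' ModelMessage, mirroring the two bullets above.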

Output and Provider Adaptation

The ModelMessage[] array that convertToModelMessages() produces is a standardized, intermediate representation. This array is then passed to V2 core functions like streamText({ messages: modelMessages, model, ... }).

The specific V2 provider adapter (e.g., @ai-sdk/openai, @ai-sdk/anthropic) takes this ModelMessage[] array and performs the final transformation into the exact API payload required by that specific LLM provider.

+--------------------+   +-------------------+   +-------------------------+   +-------------------+
| Client sends       |-->| API Endpoint      |-->| convertToModelMessages()|-->| ModelMessage[]    |
| UIMessage[]        |   | receives UIMessage[]|   | (Filters UI parts,      |   | (Standardized     |
+--------------------+   +-------------------+   |  handles files/tools)   |   |  for V2 models)   |
                                                  +-------------------------+   +-------------------+
                                                                                          |
                                                                                          v
+--------------------+   +--------------------------+   +-----------------------------+   +--------------+
| LLM API            |<--| Provider-Specific Payload|<--| Provider Adapter            |<--| streamText() |
| (e.g. OpenAI JSON) |   | (e.g. OpenAI format)     |   | (e.g. @ai-sdk/openai)       |   | (uses V2     |
+--------------------+   +--------------------------+   | (Converts ModelMessage[] to |   |  Model)      |
                                                        |  provider specific format)  |   +--------------+
                                                        +-----------------------------+

[FIGURE 4: Server-side flow diagram focusing on message conversion: Client sends UIMessage[] -> API Endpoint receives UIMessage[] -> Calls convertToModelMessages() -> Outputs ModelMessage[] -> ModelMessage[] passed to streamText() with V2 Model -> Provider Adapter (e.g., @ai-sdk/openai) converts ModelMessage[] to Provider-Specific API Payload -> Call to LLM API.]

Take-aways / Migration Checklist Bullets

  • Your server-side API endpoint must use convertToModelMessages() to transform the incoming UIMessage[] from the client into ModelMessage[] before calling V2 LLM functions like streamText().
  • Understand that UI-specific parts like ReasoningUIPart and StepStartUIPart, as well as UIMessage.id and UIMessage.metadata, are generally filtered out during this conversion as they are not for the LLM prompt.
  • FileUIPart conversion to LanguageModelV2FilePart intelligently handles Data URLs and considers the target model's supportedUrls capability.
  • ToolInvocationUIParts are transformed into the appropriate LanguageModelV2ToolCallPart (for AI requests) or LanguageModelV2ToolResultPart (for tool outcomes, in a role: 'tool' ModelMessage) structures.
  • The output ModelMessage[] is a standardized format that V2 provider adapters then translate into the final API payload for each specific LLM.

5. Persist-Once Architecture: The UIMessage as the Source of Truth

TL;DR: Vercel AI SDK v5 strongly advocates persisting the complete UIMessage[] array, including all its rich parts and metadata, typically in the server-side onFinish callback, as this ensures high-fidelity UI restoration and decouples persisted data from LLM-specific formats.

Why this matters?

Persisting chat conversations is a fundamental requirement for almost any chat application. Users expect to be able to close a chat and come back to it later, finding their history intact. The crucial question is: what exactly should you persist?

In V4, the guidance was evolving, and sometimes developers might have been tempted to store the simpler CoreMessages (the format closer to what the LLM produced) or a mix of client and core messages. This could lead to:

  • Loss of Rich UI State: If you only stored the LLM's raw text output, you'd lose all the rich UI information – the state of tools, how files were displayed, any custom metadata, reasoning steps shown to the user, etc. Restoring the UI to its exact previous state ("pixel-perfect restore") became difficult.
  • Brittleness to LLM/Prompt Changes: If your persisted format was too closely tied to a specific LLM's output structure or your prompt engineering, changing your LLM provider or your prompting strategy could potentially invalidate your old persisted data or require complex data migrations.

v5 provides very clear guidance to solve these issues.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

The Golden Rule: "Always persist UI messages."

This is the clear and strong recommendation from the Vercel AI SDK team for v5.

What does this mean? It means you should store the UIMessage[] array in your database. Each UIMessage object should be stored with its:

  • id (the client-generated unique ID)
  • role
  • createdAt timestamp
  • The full parts: UIMessagePart[] array, preserving the structure and content of each part (text, tool invocations with their arguments and results, file references, source citations, reasoning steps).
  • Any metadata: METADATA associated with the message.
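
To make the storage requirement concrete, here is one possible shape for a persisted chat record. This is a hypothetical schema, not something the SDK prescribes; adapt the field names and storage format to your own database:

import type { UIMessage } from 'ai';

// Hypothetical persisted record shape: one row/document per chat, with the full
// UIMessage[] stored as structured JSON so parts and metadata survive verbatim.
interface PersistedChat {
  chatId: string;        // the same id passed to useChat on the client
  messages: UIMessage[]; // id, role, createdAt, parts, metadata — all preserved
  updatedAt: string;     // ISO timestamp, useful for ordering or conflict checks
}

// For example, in Postgres this could be a table with a JSONB column:
//   CREATE TABLE chats (
//     chat_id    TEXT PRIMARY KEY,
//     messages   JSONB NOT NULL,
//     updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
//   );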

Why Persist UIMessages?

  1. Accurate UI State Restoration (Pixel-Perfect Restore): This is the primary and most significant benefit. When a user reloads a chat, you are rehydrating the exact structure and data that their UI previously rendered.

    • If a message involved a tool, the ToolInvocationUIPart will have the toolName, args, and result (or error state), allowing your UI to render the tool's interaction exactly as it was.
    • If files were displayed, the FileUIPart (with its url, mediaType, filename) enables your UI to show the same file previews or links.
    • If reasoning steps (ReasoningUIPart) or sources (SourceUIPart) were part of the message, they are all there.
    • Any custom metadata you used for UI hints or tracking is also preserved.
  2. Decoupling from LLM/Prompt Changes: Your persisted data format (UIMessage) remains independent of:

    • The specific LLM provider you are using.
    • The prompt templates or specific ModelMessage structures you construct on the server using convertToModelMessages. If you decide to switch LLM providers or significantly change how you engineer your prompts, your database schema for storing chats does not need to change, and your existing persisted chat histories do not become invalid. You simply adjust your server-side convertToModelMessages logic or your V2 provider adapter.
  3. Preservation of All Rich Information: ModelMessages are often stripped-down versions of UIMessages, containing only what's necessary for the LLM. Persisting UIMessages ensures that all the rich contextual information is saved. This data might be valuable for analytics, debugging, or future features.

Where to Persist? The onFinish Callback

The ideal place to implement your persistence logic is in the onFinish callback of the server-side streaming helper functions. In v5, this is typically the onFinish callback provided as an option to result.toUIMessageStreamResponse().

  • This onFinish callback is invoked on the server after the entire response from the LLM for the current turn has been processed and all corresponding UIMessageStreamParts have been written to the client-bound stream.
  • The callback receives the final, complete AI-generated UIMessage(s) for the current turn.
  • Your logic in onFinish should then:
    1. Take these new assistant UIMessage(s).
    2. Combine them with the UIMessage[] history that was received from the client for that turn.
    3. Save this entire updated array of UIMessages to your database, associated with the chatId.

Server Route Example Snippet (Focus on onFinish for Persistence)

Here's how it might look in your Next.js API route:

// app/api/v5/chat_endpoint/route.ts (or similar server endpoint)
import { NextRequest, NextResponse } from 'next/server';
import { UIMessage, convertToModelMessages, streamText } from 'ai';
import { openai } from '@ai-sdk/openai'; // V2 provider

// Assume saveChatToDatabase is your custom function to interact with your DB
async function saveChatToDatabase(
  { id, messages }: { id: string | undefined, messages: UIMessage[] }
) {
  if (!id) {
    console.warn('Chat ID is undefined. Skipping persistence.');
    return;
  }
  console.log(`Persisting ${messages.length} messages for chat ${id} to database.`);
  // Example: await database.collection('chats').doc(id).set({ messages });
  // Make sure your DB schema can store the UIMessage[] structure, especially 'parts' and 'metadata'.
}

export async function POST(req: NextRequest) {
  try {
    const { messages: uiMessagesFromClient, id: chatId }: { messages: UIMessage[]; id?: string } = await req.json();

    const { modelMessages } = convertToModelMessages(uiMessagesFromClient);

    const result = await streamText({
      model: openai('gpt-4o-mini'),
      messages: modelMessages,
      // ... other streamText options
    });

    return result.toUIMessageStreamResponse({
      onFinish: async ({
        responseMessages // These are the NEW assistant UIMessage(s) from this turn
      }: {
        responseMessages: UIMessage[]
      }) => {
        if (chatId && responseMessages && responseMessages.length > 0) {
          const finalConversationStateToPersist: UIMessage[] = [
            ...uiMessagesFromClient, // History from client
            ...responseMessages     // New assistant message(s)
          ];
          await saveChatToDatabase({ id: chatId, messages: finalConversationStateToPersist });
        } else {
          if (!chatId) console.warn('[onFinish] Chat ID missing, cannot persist.');
        }
      },
    });

  } catch (error: unknown) {
    const errorMessage = error instanceof Error ? error.message : 'An unexpected error occurred.';
    return NextResponse.json({ error: errorMessage }, { status: 500 });
  }
}

Contrast with Persisting Model Messages

Briefly, why is storing ModelMessages problematic for persistence?

  • Provider-Specific: ModelMessage structure can subtly differ.
  • Loses UI Detail: ModelMessages are stripped of UI-only parts, UIMessage.id, UIMessage.metadata, etc.
  • Brittle for Rehydration: Changes in LLMs or prompt strategies can break rehydration.

Persisting UIMessage directly avoids all these issues.

Take-aways / Migration Checklist Bullets

  • The Golden Rule: Always persist the complete UIMessage[] array from your chat conversations.
  • Ensure your database schema can store the rich UIMessage structure, including the id, role, createdAt, the full parts array (as JSON/structured data), and any metadata.
  • The ideal place for server-side persistence is the onFinish callback of streaming helper methods like result.toUIMessageStreamResponse().
  • In onFinish, combine the incoming message history from the client with the newly generated assistant UIMessage(s) from that turn, and save the entire updated array.
  • Avoid persisting ModelMessages or other intermediate/LLM-specific formats.
  • Migration Action: If your V4 app persisted a different format, plan to update your database schema and refactor your persistence logic to store UIMessages.

6. Edge-cases: Concurrent Writes, Undo, Re-order (Conceptual / Advanced)

TL;DR: While advanced features like concurrent write resolution, undo/redo, and message reordering are not built into Vercel AI SDK v5 out of the box, its ChatStore principles and structured UIMessage history provide a solid foundation for applications to implement such functionality.

Why this matters?

As chat applications become more collaborative or offer richer editing capabilities, we start running into more complex state management challenges. For instance:

  • What happens if two users (or the same user in two different browser tabs) try to send a message to the same chat session at roughly the same time?
  • How can we implement an "undo" for sending a message, or "redo" an undone action?
  • What if we need to allow moderators to edit or re-order messages in a persisted chat?

While Vercel AI SDK v5 doesn't provide these features as built-in turn-key solutions (they are often highly application-specific), its architectural choices lay a much better groundwork for tackling them compared to v4.

How it’s solved in v5? (Conceptual Approaches based on v5 Architecture)

Let's look at how v5's architecture—particularly the ChatStore principles of a centralized, synchronized client state and the well-defined UIMessage structure—can help.

Concurrent Writes

  • The Challenge: Multiple clients or browser tabs are interacting with the same chat id simultaneously.
  • v5 Foundation:
    • The ChatStore concept aims to synchronize chat write operations.
    • When using useChat with a shared id, the SDK's internal shared state logic for that id provides consistency for writes originating from clients sharing that state within the same browser session.
    • For writes from different browser sessions or users to the same backend chat ID, the primary point of synchronization becomes your server-side persistence logic.
  • Server-Side Strategy (Conceptual):
    • Your backend API would receive the chatId and the client's current view of the messages history.
    • When persisting, you might need a mechanism to handle potential conflicts if the database's version of the chat for chatId has changed. This could involve:
      • Optimistic Locking: Include a version number with your persisted chat (see the sketch after this list).
      • Last-Write-Wins: The latest write to the database overwrites previous state.
      • Conflict Resolution Logic: More complex, merging changes if possible.
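
Here is a minimal sketch of the optimistic-locking option from the list above. Nothing here is SDK API: the ChatDb contract is a hypothetical stand-in for your own data layer, and saveChatOptimistically is plain application code.

import type { UIMessage } from 'ai';

// Hypothetical persisted row: the chat plus a version counter for optimistic locking.
interface ChatRow {
  chatId: string;
  messages: UIMessage[];
  version: number;
}

// Minimal data-layer contract this sketch assumes; implement it with your real DB.
interface ChatDb {
  load(chatId: string): Promise<ChatRow | null>;
  // Conditional write, e.g. `UPDATE chats SET ... WHERE chat_id = $1 AND version = $2`.
  // Resolves true only if the expected version still matched at write time.
  writeIfVersionMatches(row: ChatRow, expectedVersion: number): Promise<boolean>;
}

export async function saveChatOptimistically(
  db: ChatDb,
  chatId: string,
  messages: UIMessage[],
): Promise<void> {
  const current = await db.load(chatId);
  const expectedVersion = current?.version ?? 0;

  const ok = await db.writeIfVersionMatches(
    { chatId, messages, version: expectedVersion + 1 },
    expectedVersion,
  );

  if (!ok) {
    // Another session persisted first: re-load, merge, or retry per your conflict policy.
    throw new Error(`Conflict persisting chat ${chatId}: version changed`);
  }
}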

Undo/Redo Functionality

  • The Challenge: Allowing users to undo sending a message, or undo an edit.
  • v5 Foundation:
    • This is not a built-in SDK feature.
    • However, the structured UIMessage[] history and the concept of a centralized client store make implementing undo/redo more feasible at the application level.
  • Application-Level Strategy (Conceptual):
    1. State Snapshots: Each significant state change could be recorded as a snapshot of the UIMessage[] array.
    2. Undo Stack: Maintain a stack of these previous states. "Undo" would pop from this stack and use setMessages() (from useChat) to revert the client-side state. A minimal version is sketched after this list.
    3. Redo Stack: Maintain a stack for redone actions.
    4. Persistence: Persisting the "undone" state to the server is complex and depends on your application's requirements.
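
Here is a minimal sketch of such an undo stack layered on top of useChat. The useChatWithUndo hook and its snapshot strategy are hypothetical application code, not SDK features; setMessages and the shared id behavior are the only SDK pieces it relies on, and the api path is simply the endpoint used elsewhere in this post.

import { useCallback, useRef } from 'react';
import { useChat, UIMessage } from '@ai-sdk/react';

// Application-level undo built on snapshots of the shared UIMessage[] state.
export function useChatWithUndo(chatId: string) {
  const { messages, setMessages, ...rest } = useChat({
    id: chatId,
    api: '/api/v5/chat_endpoint',
  });
  const undoStack = useRef<UIMessage[][]>([]);

  // Call this before any action you may want to revert (e.g. before handleSubmit).
  const snapshot = useCallback(() => {
    undoStack.current.push(messages);
  }, [messages]);

  // Pop the most recent snapshot back into the shared client state for this chat id.
  const undo = useCallback(() => {
    const previous = undoStack.current.pop();
    if (previous) setMessages(previous);
    // Note: this only reverts client state; reconciling with what the server has
    // already persisted is a separate, application-specific decision.
  }, [setMessages]);

  return { messages, setMessages, snapshot, undo, ...rest };
}

Components would call snapshot() just before submitting or editing, and wire undo() to an "Undo" button.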

Re-ordering Messages (e.g., for Moderation)

  • The Challenge: An application might allow administrators or moderators to re-order messages in a conversation, or edit their content directly.
  • v5 Foundation:
    • Not built-in. This is an advanced application-specific feature.
  • Application-Level Strategy (Conceptual):
    1. Moderation UI: A separate UI would allow a privileged user to fetch the UIMessage[] for a chat.
    2. Direct Manipulation: The moderator could re-order the elements in the UIMessage[] array or modify parts of a UIMessage.
    3. Update Client State: Use setMessages() to reflect these changes locally.
    4. Re-persist: Send the entire modified UIMessage[] array back to the server to overwrite the persisted state for that chat, requiring a separate, authorized API endpoint.

The v5 SDK, by providing a clean, structured UIMessage format and promoting centralized client-side state management, gives developers much better primitives to build these advanced features upon.

Take-aways / Migration Checklist Bullets

  • Vercel AI SDK v5 does not provide out-of-the-box undo/redo, message re-ordering, or complex multi-user concurrent write resolution beyond same-session client consistency.
  • Its architecture provides a strong foundation for applications to implement these advanced features.
  • Concurrent Writes: For multiple clients on different sessions, server-side persistence logic is key.
  • Undo/Redo: Can be built at the application level by managing snapshots of the UIMessage[] state and using setMessages().
  • Re-ordering/Editing: Requires a custom UI and backend endpoint for authorized users to modify and re-persist the UIMessage[] array.

7. Conclusion & Performance Benchmarks (Teaser)

TL;DR: Vercel AI SDK v5's decoupled architecture, featuring ChatStore principles and clean UIMessage/ModelMessage separation, significantly enhances maintainability and scalability for complex chat applications, with performance details to be explored later.

Why this matters?

We've journeyed through some of the most significant architectural shifts in Vercel AI SDK v5 for chat applications. The changes are all geared towards enabling developers to build more sophisticated, robust, and maintainable conversational AI experiences. This isn't just about adding new features; it's about laying a stronger foundation for the future of AI-driven UIs.

How it’s solved in v5? (Summary of Architectural Wins)

Let's quickly recap the key architectural benefits we've explored in this post:

  1. Centralized, Synchronized Client State (via useChat + id embodying ChatStore principles):

    • No more manual synchronization of chat state across multiple UI components viewing the same conversation.
    • Reduces data duplication and provides implicit in-memory caching for chat sessions.
    • Enables smooth, consistent optimistic updates for a responsive user experience.
  2. Clean Separation of UI Concerns from LLM Prompt Engineering (via convertToModelMessages):

    • The rich UIMessage format is ideal for UI rendering and persistence.
    • The leaner ModelMessage format is tailored for V2 LLM interfaces.
    • The convertToModelMessages utility provides a clear, server-side bridge between these two worlds.
  3. Robust, High-Fidelity Chat History (via UIMessage Persistence):

    • Persisting the complete UIMessage[] array ensures that you can restore chat UIs with "pixel-perfect" fidelity.
    • This decouples your persisted data from changes in LLM providers or prompt engineering strategies.

Improved Maintainability & Scalability

Together, these architectural choices lead to:

  • Cleaner Codebases: Responsibilities are more clearly defined.
  • Easier Testing: Decoupled components are generally easier to test.
  • Better Scalability for Complex Applications: This structured and decoupled approach is designed to scale more gracefully.

Performance Benchmarks (Teaser for a Later Post)

While this post has focused heavily on the architectural "why" and "how," performance is always a critical consideration.
We won't dive deep into benchmarks here, but be assured that these are areas the Vercel AI team considers. In a future post, we'll aim to explore performance characteristics.

For now, the focus has been on understanding the foundational architecture that enables these capabilities.

Tease Post 5: The UI Message Stream Engine

With a solid understanding of v5's state management and message conversion, we're ready to look under the hood of its real-time communication.

Our next post will explore the very engine of v5's real-time experience: the SSE-powered UI Message Stream. We'll break down the different UIMessageStreamPart types, see how servers use them to broadcast structured updates, and how clients consume them to build the rich UIMessages we now understand. This protocol is what turbo-charges your chat UX in v5.
