Streaming is critical for a responsive chat interface. It allows the user to see the AI’s “thought process” in real time rather than waiting for the entire generation to complete.

Architecture

The streaming pipeline consists of three main stages:

  1. Provider Stream: The LLMClient initiates a request with stream: true and receives a Stream<Item = Result<Bytes, Error>> from the HTTP client.
  2. Event Parsing: A decoder parses the Server-Sent Events (SSE) protocol, in which each payload arrives on a line prefixed with data:, into structured events (Token, ToolUse, Stop).
  3. UI Updates: These events are sent over a channel to the UI thread, which updates the current message buffer and triggers a re-render.
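Stage 2 can be sketched as a line-oriented decoder. This is a minimal illustration, not a provider's actual schema: the SseEvent type and the [DONE] sentinel (used by OpenAI-style streams) are assumptions here, and a real decoder would also handle multi-line events and event: fields.

```rust
/// Illustrative decoded event; not any specific provider's schema.
#[derive(Debug, PartialEq)]
enum SseEvent {
    Data(String), // payload of a `data: ...` line
    Done,         // the `data: [DONE]` end-of-stream sentinel
}

fn parse_sse_line(line: &str) -> Option<SseEvent> {
    // SSE frames are newline-delimited; lines without a `data:` prefix
    // (comments, blank keep-alives) are ignored here.
    let payload = line.strip_prefix("data:")?.trim();
    if payload == "[DONE]" {
        Some(SseEvent::Done)
    } else {
        Some(SseEvent::Data(payload.to_string()))
    }
}

fn main() {
    assert_eq!(parse_sse_line(": keep-alive"), None);
    assert_eq!(
        parse_sse_line("data: {\"delta\":\"Hi\"}"),
        Some(SseEvent::Data("{\"delta\":\"Hi\"}".to_string()))
    );
    assert_eq!(parse_sse_line("data: [DONE]"), Some(SseEvent::Done));
}
```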

The StreamChunk Type

We normalize chunks from different providers into a common internal type:

pub enum StreamChunk {
    /// A piece of text content
    Content(String),
    
    /// The start of a tool call
    ToolCallStart { id: String, name: String },
    
    /// A fragment of JSON arguments for a tool
    ToolCallDelta { index: usize, partial_json: String },
    
    /// End of the turn
    Stop(StopReason),
}
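On the UI side, a consumer folds these chunks into the current message buffer. The sketch below reduces StopReason to a single variant and ignores tool chunks so it stays self-contained; apply_chunk is an assumed helper name, not part of the actual codebase.

```rust
// Condensed copies of the types above, so the example compiles standalone.
#[derive(Debug)]
enum StopReason { EndTurn }

#[allow(dead_code)]
enum StreamChunk {
    Content(String),
    ToolCallStart { id: String, name: String },
    ToolCallDelta { index: usize, partial_json: String },
    Stop(StopReason),
}

/// Applies one chunk to the message buffer.
/// Returns true while the stream should keep rendering.
fn apply_chunk(buffer: &mut String, chunk: StreamChunk) -> bool {
    match chunk {
        StreamChunk::Content(text) => { buffer.push_str(&text); true }
        // Tool chunks are routed to a separate accumulator in real code.
        StreamChunk::ToolCallStart { .. } | StreamChunk::ToolCallDelta { .. } => true,
        StreamChunk::Stop(_) => false,
    }
}

fn main() {
    let mut buffer = String::new();
    let chunks = vec![
        StreamChunk::Content("Hel".into()),
        StreamChunk::Content("lo".into()),
        StreamChunk::Stop(StopReason::EndTurn),
    ];
    for chunk in chunks {
        if !apply_chunk(&mut buffer, chunk) { break; }
    }
    assert_eq!(buffer, "Hello");
}
```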

Handling Tool Streams

Tool calling adds complexity: providers like Anthropic stream the tool's input JSON incrementally, so the controller must buffer these fragments until the JSON is complete and valid before it can execute the tool.
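The buffering can be sketched as below. In real code the buffer would be handed to a JSON parser (e.g. serde_json) to test for completeness; the stand-in here just tracks brace depth, skipping braces inside string literals, and the ToolArgBuffer type is an illustrative assumption.

```rust
// Accumulates ToolCallDelta fragments until the arguments form complete JSON.
struct ToolArgBuffer {
    buf: String,
}

impl ToolArgBuffer {
    fn new() -> Self { Self { buf: String::new() } }

    /// Appends one fragment; returns true once the buffered JSON
    /// object looks complete and the tool can be executed.
    fn push_delta(&mut self, partial_json: &str) -> bool {
        self.buf.push_str(partial_json);
        Self::looks_complete(&self.buf)
    }

    // Simplified completeness check: balanced braces outside string
    // literals. A real implementation would attempt a full parse.
    fn looks_complete(s: &str) -> bool {
        let (mut depth, mut in_str, mut escaped, mut seen) = (0i32, false, false, false);
        for c in s.chars() {
            if in_str {
                match (escaped, c) {
                    (true, _) => escaped = false,
                    (false, '\\') => escaped = true,
                    (false, '"') => in_str = false,
                    _ => {}
                }
            } else {
                match c {
                    '"' => in_str = true,
                    '{' => { depth += 1; seen = true; }
                    '}' => depth -= 1,
                    _ => {}
                }
            }
        }
        seen && depth == 0 && !in_str
    }
}

fn main() {
    let mut args = ToolArgBuffer::new();
    // Fragments arrive split at arbitrary boundaries, even mid-string.
    assert!(!args.push_delta("{\"path\": \"/tmp"));
    assert!(args.push_delta("/log.txt\"}"));
}
```

Note that the brace inside the string value never confuses the depth counter, which is exactly the trap a naive bracket-matcher falls into with streamed JSON.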