Streaming is critical for a responsive chat interface. It allows the user to see the AI’s “thought process” in real-time rather than waiting for the entire generation to complete.
## Architecture
The streaming pipeline consists of three main stages:
- **Provider Stream:** The `LLMClient` initiates a request with `stream: true` and receives a `Stream<Item = Result<Bytes, Error>>` from the HTTP client.
- **Event Parsing:** A decoder parses Server-Sent Events (SSE) lines (e.g., `data: ...`) into structured events (`Token`, `ToolUse`, `Stop`).
- **UI Updates:** These events are sent over a channel to the UI thread, which updates the current message buffer and triggers a re-render.
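The event-parsing stage can be sketched as a function over individual SSE lines. This is a minimal illustration, not the actual decoder: the `SseEvent` type and `parse_sse_line` helper are hypothetical, and the `[DONE]` terminator shown is OpenAI-style (Anthropic signals the end of a stream with a `message_stop` event instead).

```rust
// Hypothetical sketch of SSE line parsing (stage two of the pipeline).
#[derive(Debug, PartialEq)]
enum SseEvent {
    Token(String),
    Stop,
}

/// Parse a single SSE line of the form `data: ...`.
/// Returns `None` for comments, blank lines, and other SSE fields.
fn parse_sse_line(line: &str) -> Option<SseEvent> {
    let payload = line.strip_prefix("data: ")?;
    if payload == "[DONE]" {
        // OpenAI-style stream terminator.
        Some(SseEvent::Stop)
    } else {
        // A real decoder would parse `payload` as JSON; here we pass the
        // raw text through to keep the sketch self-contained.
        Some(SseEvent::Token(payload.to_string()))
    }
}

fn main() {
    // Token lines carry content; comment lines are ignored.
    assert_eq!(
        parse_sse_line("data: hello"),
        Some(SseEvent::Token("hello".to_string()))
    );
    assert!(parse_sse_line(": keep-alive").is_none());
}
```

In the real pipeline these events would be sent over a channel (e.g. `tokio::sync::mpsc`) to the UI thread rather than returned directly.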
## The `StreamChunk` Type
We normalize chunks from different providers into a common internal type:
```rust
pub enum StreamChunk {
    /// A piece of text content
    Content(String),
    /// The start of a tool call
    ToolCallStart { id: String, name: String },
    /// A fragment of JSON arguments for a tool
    ToolCallDelta { index: usize, partial_json: String },
    /// End of the turn
    Stop(StopReason),
}
```
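A sketch of how a provider event might be normalized into this type follows. The `normalize` helper and the string event names are illustrative (the names mirror Anthropic's streaming events), real code would deserialize the JSON payloads, and `StopReason` is stubbed here since its definition is out of scope.

```rust
// Stubbed versions of the document's types, so the sketch is self-contained.
#[derive(Debug, PartialEq)]
enum StopReason {
    EndTurn,
}

#[derive(Debug, PartialEq)]
enum StreamChunk {
    Content(String),
    Stop(StopReason),
}

/// Hypothetical normalizer: maps a provider (event_type, payload) pair
/// into the common `StreamChunk` type. Unknown events are dropped.
fn normalize(event_type: &str, payload: &str) -> Option<StreamChunk> {
    match event_type {
        "content_block_delta" => Some(StreamChunk::Content(payload.to_string())),
        "message_stop" => Some(StreamChunk::Stop(StopReason::EndTurn)),
        _ => None,
    }
}

fn main() {
    assert_eq!(
        normalize("content_block_delta", "Hi"),
        Some(StreamChunk::Content("Hi".to_string()))
    );
    assert!(normalize("ping", "").is_none());
}
```

Each provider gets its own normalizer like this, so the rest of the pipeline only ever sees `StreamChunk` values.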
## Handling Tool Streams
Tool calling adds complexity. Providers like Anthropic stream the tool input JSON incrementally. The controller must buffer these fragments until the JSON is valid and complete before it can execute the tool.
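The buffering step can be sketched as follows. The `ToolArgBuffer` type is illustrative, not from the source, and instead of a full JSON parser it tracks brace depth and string state, which is enough to detect when a single JSON object has closed; production code would more likely attempt a `serde_json` parse on each fragment.

```rust
// Hypothetical buffer that accumulates ToolCallDelta fragments until the
// tool's argument JSON forms one complete object.
struct ToolArgBuffer {
    buf: String,
}

impl ToolArgBuffer {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Append a fragment; returns the full JSON once the object closes.
    fn push(&mut self, fragment: &str) -> Option<&str> {
        self.buf.push_str(fragment);
        if Self::is_complete(&self.buf) {
            Some(&self.buf)
        } else {
            None
        }
    }

    /// True once braces balance outside of string literals.
    fn is_complete(s: &str) -> bool {
        let (mut depth, mut in_str, mut escaped) = (0i32, false, false);
        let mut seen_open = false;
        for c in s.chars() {
            if escaped {
                escaped = false;
                continue;
            }
            match c {
                '\\' if in_str => escaped = true,
                '"' => in_str = !in_str,
                '{' if !in_str => {
                    depth += 1;
                    seen_open = true;
                }
                '}' if !in_str => depth -= 1,
                _ => {}
            }
        }
        seen_open && depth == 0 && !in_str
    }
}

fn main() {
    let mut buf = ToolArgBuffer::new();
    // First fragment ends mid-string: not complete yet.
    assert!(buf.push("{\"city\": \"Par").is_none());
    // Second fragment closes the string and the object.
    assert_eq!(buf.push("is\"}"), Some("{\"city\": \"Paris\"}"));
}
```

Once `push` returns the complete JSON, the controller can deserialize it and execute the tool.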
