Session Configuration

The LLMSessionConfig struct contains all settings for an LLM session, including provider credentials, model selection, response parameters, and compaction configuration.

Configuration Fields

| Field | Type | Description |
|---|---|---|
| provider | LLMProvider | The LLM provider (Anthropic, OpenAI, Google, etc.) |
| api_key | String | API key for authentication |
| model | String | Model identifier |
| max_tokens | Option&lt;u32&gt; | Maximum response tokens (default: 4096) |
| system_prompt | Option&lt;String&gt; | System prompt for the session |
| temperature | Option&lt;f32&gt; | Response randomness (0.0-2.0) |
| streaming | bool | Enable streaming responses |
| context_limit | i32 | Context window size in tokens |
| compaction | Option&lt;CompactorType&gt; | Compaction configuration |
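
Based on the fields above, the struct can be sketched roughly as follows. This is an illustrative sketch only: the field names and types come from the table, but LLMProvider and CompactorType are stand-ins for the library's actual definitions.

```rust
// Placeholder enums; the real library defines these types itself.
#[derive(Debug, Clone)]
pub enum LLMProvider { Anthropic, OpenAI, Google, OpenAICompatible }

#[derive(Debug, Clone)]
pub enum CompactorType { Threshold }

// Sketch of the configuration struct, mirroring the field table above.
#[derive(Debug, Clone)]
pub struct LLMSessionConfig {
    pub provider: LLMProvider,
    pub api_key: String,
    pub model: String,
    pub max_tokens: Option<u32>,          // treated as 4096 when None
    pub system_prompt: Option<String>,
    pub temperature: Option<f32>,         // 0.0-2.0
    pub streaming: bool,
    pub context_limit: i32,               // tokens, input + output
    pub compaction: Option<CompactorType>,
}
```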

Creating Session Configurations

Using Constructor Methods

Each provider has a constructor that sets appropriate defaults:

use agent_air::controller::LLMSessionConfig;

// Anthropic
let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514");

// OpenAI
let config = LLMSessionConfig::openai("sk-...", "gpt-4-turbo-preview");

// Google
let config = LLMSessionConfig::google("...", "gemini-2.5-flash");

// OpenAI-compatible providers
let config = LLMSessionConfig::openai_compatible(
    "api-key",
    "model-name",
    "https://api.provider.com/v1",
    128_000,  // context limit
);

Using Builder Methods

Customize configurations with builder methods:

let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
    .with_system_prompt("You are a helpful assistant.")
    .with_max_tokens(8192)
    .with_temperature(0.7)
    .with_streaming(true);

Builder Method Reference

with_system_prompt

Sets the system prompt for the session:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_system_prompt("You are a code reviewer. Be thorough and constructive.");

with_max_tokens

Sets the maximum number of tokens in responses:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_max_tokens(8192);

with_temperature

Controls response randomness. Lower values produce more deterministic responses:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_temperature(0.3);  // More focused

with_streaming

Enables or disables streaming responses:

let config = LLMSessionConfig::openai("key", "model")
    .with_streaming(true);

with_context_limit

Sets the context window size for compaction calculations:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_context_limit(100_000);

Context Limits

The context window is the maximum number of tokens the model can process in a single request, including both the input (messages, system prompt) and the output (response).

Default Context Limits by Provider

| Provider | Default Context Limit |
|---|---|
| Anthropic | 200,000 tokens |
| OpenAI | 128,000 tokens |
| Google | 1,000,000 tokens |
| Groq | 131,072 tokens |
| Most others | 128,000 tokens |

Context Limit vs Max Tokens

These two settings serve different purposes:

  • context_limit: Total tokens the model can handle (input + output). Used for compaction decisions. Not sent to the API.
  • max_tokens: Maximum tokens in the response only. Sent to the API to limit response length.

Example: With a 200,000 context limit and 4,096 max tokens, a request with 100,000 input tokens can receive up to 4,096 output tokens.
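
The arithmetic can be sketched as a small helper (illustrative only, not part of the library's API):

```rust
// Returns how many output tokens a request can receive, given the model's
// context window, the tokens already consumed by the input, and the
// configured max_tokens cap. The output budget is whatever room remains
// in the window, capped at max_tokens.
fn available_output_tokens(context_limit: u32, input_tokens: u32, max_tokens: u32) -> u32 {
    context_limit.saturating_sub(input_tokens).min(max_tokens)
}
```

With a 200,000-token window, 100,000 input tokens, and max_tokens of 4,096, this yields 4,096; once the input grows to 198,000 tokens, only 2,000 output tokens remain.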

Model-Specific Limits

Different models have different context window sizes. Set the context limit to match your model:

Claude models:

  • Claude 3 Opus/Sonnet/Haiku: 200,000 tokens

GPT models:

  • GPT-4 Turbo: 128,000 tokens
  • GPT-4: 8,192 tokens (or 32,768 for extended context)
  • GPT-3.5 Turbo: 16,385 tokens

Example for GPT-4 (non-turbo):

let config = LLMSessionConfig::openai("key", "gpt-4")
    .with_context_limit(8_192);
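
If your application targets several models, a small lookup can keep limits consistent. The figures mirror the lists above; the model identifier strings are assumptions, so verify them against your provider's current documentation:

```rust
// Maps a model identifier to its context window, falling back to a
// conservative 128,000-token default for unrecognized models.
fn context_limit_for(model: &str) -> i32 {
    match model {
        m if m.starts_with("claude-3") => 200_000,
        "gpt-4-turbo-preview" => 128_000,
        "gpt-4" => 8_192,
        "gpt-4-32k" => 32_768,
        "gpt-3.5-turbo" => 16_385,
        _ => 128_000,
    }
}
```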

Compaction Configuration

Configure automatic context compaction to prevent exceeding context limits:

use agent_air::controller::{CompactionConfig, ToolCompaction};

// Threshold-based compaction
let config = LLMSessionConfig::anthropic("key", "model")
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.80,
        keep_recent_turns: 3,
        tool_compaction: ToolCompaction::Summarize,
    });

// Disable compaction
let config = LLMSessionConfig::anthropic("key", "model")
    .without_compaction();

Compaction Threshold

The default compaction threshold is 75% of the context limit. With a 200,000 token context limit, compaction triggers when context usage exceeds 150,000 tokens.
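
The trigger point is simply threshold × context limit. A sketch of the check (illustrative, not the library's internal code):

```rust
// Returns true when current usage exceeds the compaction threshold.
// With threshold = 0.75 and a 200,000-token limit, this fires once
// usage passes 150,000 tokens.
fn should_compact(used_tokens: u32, context_limit: u32, threshold: f32) -> bool {
    (used_tokens as f32) > (context_limit as f32) * threshold
}
```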

For agents that need more headroom, configure a lower threshold:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.50,  // Trigger at 50% usage
        keep_recent_turns: 5,
        tool_compaction: ToolCompaction::Summarize,
    });

Provider Defaults

Anthropic

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | 200,000 |
| temperature | Provider default |
| compaction | Threshold-based |

OpenAI

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | false |
| context_limit | 128,000 |
| temperature | Provider default |
| compaction | Threshold-based |

Google

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | 1,000,000 |
| temperature | Provider default |
| compaction | Threshold-based |

OpenAI-Compatible Providers

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | Varies by provider |
| temperature | Provider default |
| compaction | Threshold-based |

Complete Configuration Example

use agent_air::controller::{LLMSessionConfig, CompactionConfig, ToolCompaction};

let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
    .with_system_prompt("You are a code reviewer.")
    .with_max_tokens(8192)
    .with_temperature(0.3)
    .with_streaming(true)
    .with_context_limit(200_000)
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.80,
        keep_recent_turns: 5,
        tool_compaction: ToolCompaction::Summarize,
    });

When to Customize Settings

Increase max_tokens when you expect long responses (code generation, detailed explanations).

Lower temperature (0.0-0.5) for deterministic tasks like code review or structured output.

Raise temperature (0.7-1.0) for creative tasks or brainstorming.

Disable streaming if your application processes complete responses rather than streaming output.

Adjust compaction based on conversation length requirements. See Compaction Strategies for details.