Session Configuration

The LLMSessionConfig struct contains all settings for an LLM session, including provider credentials, model selection, response parameters, and compaction configuration.

Configuration Fields

| Field | Type | Description |
|---|---|---|
| provider | LLMProvider | The LLM provider (Anthropic, OpenAI, Google, etc.) |
| api_key | String | API key for authentication |
| model | String | Model identifier |
| max_tokens | Option&lt;u32&gt; | Maximum response tokens (default: 4096) |
| system_prompt | Option&lt;String&gt; | System prompt for the session |
| temperature | Option&lt;f32&gt; | Response randomness (0.0-2.0) |
| streaming | bool | Enable streaming responses |
| context_limit | i32 | Context window size in tokens |
| compaction | Option&lt;CompactorType&gt; | Compaction configuration |
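
Based on the fields above, the struct can be sketched roughly as follows. This is an illustrative sketch only: the field names and types come from the table, but LLMProvider and CompactorType are stand-ins for the library's actual definitions.

```rust
// Placeholder enums; the real library defines these types itself.
#[derive(Debug, Clone)]
pub enum LLMProvider { Anthropic, OpenAI, Google, OpenAICompatible }

#[derive(Debug, Clone)]
pub enum CompactorType { Threshold }

// Sketch of the configuration struct, mirroring the field table above.
#[derive(Debug, Clone)]
pub struct LLMSessionConfig {
    pub provider: LLMProvider,
    pub api_key: String,
    pub model: String,
    pub max_tokens: Option<u32>,          // treated as 4096 when None
    pub system_prompt: Option<String>,
    pub temperature: Option<f32>,         // 0.0-2.0
    pub streaming: bool,
    pub context_limit: i32,               // tokens, input + output
    pub compaction: Option<CompactorType>,
}
```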

Creating Session Configurations

Using Constructor Methods

Each provider has a constructor that sets appropriate defaults:

use agent_air::controller::LLMSessionConfig;

// Anthropic
let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514");

// OpenAI
let config = LLMSessionConfig::openai("sk-...", "gpt-4-turbo-preview");

// Google
let config = LLMSessionConfig::google("...", "gemini-2.5-flash");

// OpenAI-compatible providers
let config = LLMSessionConfig::openai_compatible(
    "api-key",
    "model-name",
    "https://api.provider.com/v1",
    128_000,  // context limit
);

Using Builder Methods

Customize configurations with builder methods:

let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
    .with_system_prompt("You are a helpful assistant.")
    .with_max_tokens(8192)
    .with_temperature(0.7)
    .with_streaming(true);

Builder Method Reference

with_system_prompt

Sets the system prompt for the session:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_system_prompt("You are a code reviewer. Be thorough and constructive.");

with_max_tokens

Sets the maximum number of tokens in responses:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_max_tokens(8192);

with_temperature

Controls response randomness. Lower values produce more deterministic responses:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_temperature(0.3);  // More focused

with_streaming

Enables or disables streaming responses:

let config = LLMSessionConfig::openai("key", "model")
    .with_streaming(true);

with_context_limit

Sets the context window size for compaction calculations:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_context_limit(100_000);

Context Limits

The context window is the maximum number of tokens the model can process in a single request, including both the input (messages, system prompt) and the output (response).

Default Context Limits by Provider

| Provider | Default Context Limit |
|---|---|
| Anthropic | 200,000 tokens |
| OpenAI | 128,000 tokens |
| Google | 1,000,000 tokens |
| Groq | 131,072 tokens |
| Most others | 128,000 tokens |

Context Limit vs Max Tokens

These two settings serve different purposes:

  • context_limit: Total tokens the model can handle (input + output). Used for compaction decisions. Not sent to the API.
  • max_tokens: Maximum tokens in the response only. Sent to the API to limit response length.

Example: With a 200,000 context limit and 4,096 max tokens, a request with 100,000 input tokens can receive up to 4,096 output tokens.
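
The arithmetic can be sketched as a small helper (illustrative only, not part of the library's API):

```rust
// Returns how many output tokens a request can receive, given the model's
// context window, the tokens already consumed by the input, and the
// configured max_tokens cap. The output budget is whatever room remains
// in the window, capped at max_tokens.
fn available_output_tokens(context_limit: u32, input_tokens: u32, max_tokens: u32) -> u32 {
    context_limit.saturating_sub(input_tokens).min(max_tokens)
}
```

With a 200,000-token window, 100,000 input tokens, and max_tokens of 4,096, this yields 4,096; once the input grows to 198,000 tokens, only 2,000 output tokens remain.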

Model-Specific Limits

Different models have different context window sizes. Set the context limit to match your model:

Claude models:

  • Claude 3 Opus/Sonnet/Haiku: 200,000 tokens

GPT models:

  • GPT-4 Turbo: 128,000 tokens
  • GPT-4: 8,192 tokens (or 32,768 for extended context)
  • GPT-3.5 Turbo: 16,385 tokens

Example for GPT-4 (non-turbo):

let config = LLMSessionConfig::openai("key", "gpt-4")
    .with_context_limit(8_192);
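
If your application targets several models, a small lookup can keep limits consistent. The figures mirror the lists above; the model identifier strings are assumptions, so verify them against your provider's current documentation:

```rust
// Maps a model identifier to its context window, falling back to a
// conservative 128,000-token default for unrecognized models.
fn context_limit_for(model: &str) -> i32 {
    match model {
        m if m.starts_with("claude-3") => 200_000,
        "gpt-4-turbo-preview" => 128_000,
        "gpt-4" => 8_192,
        "gpt-4-32k" => 32_768,
        "gpt-3.5-turbo" => 16_385,
        _ => 128_000,
    }
}
```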

Compaction Configuration

Configure automatic context compaction to prevent exceeding context limits:

use agent_air::controller::{CompactionConfig, ToolCompaction};

// Threshold-based compaction
let config = LLMSessionConfig::anthropic("key", "model")
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.80,
        keep_recent_turns: 3,
        tool_compaction: ToolCompaction::Summarize,
    });

// Disable compaction
let config = LLMSessionConfig::anthropic("key", "model")
    .without_compaction();

Compaction Threshold

The default compaction threshold is 75% of the context limit. With a 200,000 token context limit, compaction triggers when context usage exceeds 150,000 tokens.
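
The trigger point is simply threshold × context limit. A sketch of the check (illustrative, not the library's internal code):

```rust
// Returns true when current usage exceeds the compaction threshold.
// With threshold = 0.75 and a 200,000-token limit, this fires once
// usage passes 150,000 tokens.
fn should_compact(used_tokens: u32, context_limit: u32, threshold: f32) -> bool {
    (used_tokens as f32) > (context_limit as f32) * threshold
}
```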

For agents that need more headroom, configure a lower threshold:

let config = LLMSessionConfig::anthropic("key", "model")
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.50,  // Trigger at 50% usage
        keep_recent_turns: 5,
        tool_compaction: ToolCompaction::Summarize,
    });

Provider Defaults

Anthropic

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | 200,000 |
| temperature | Provider default |
| compaction | Threshold-based |

OpenAI

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | false |
| context_limit | 128,000 |
| temperature | Provider default |
| compaction | Threshold-based |

Google

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | 1,000,000 |
| temperature | Provider default |
| compaction | Threshold-based |

OpenAI-Compatible Providers

| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | Varies by provider |
| temperature | Provider default |
| compaction | Threshold-based |

Complete Configuration Example

use agent_air::controller::{LLMSessionConfig, CompactionConfig, ToolCompaction};

let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
    .with_system_prompt("You are a code reviewer.")
    .with_max_tokens(8192)
    .with_temperature(0.3)
    .with_streaming(true)
    .with_context_limit(200_000)
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.80,
        keep_recent_turns: 5,
        tool_compaction: ToolCompaction::Summarize,
    });

When to Customize Settings

Increase max_tokens when you expect long responses (code generation, detailed explanations).

Lower temperature (0.0-0.5) for deterministic tasks like code review or structured output.

Raise temperature (0.7-1.0) for creative tasks or brainstorming.

Disable streaming if your application processes complete responses rather than streaming output.

Adjust compaction based on conversation length requirements. See Compaction Strategies for details.