Session Configuration
The LLMSessionConfig struct contains all settings for an LLM session, including provider credentials, model selection, response parameters, and compaction configuration.
Configuration Fields
| Field | Type | Description |
|---|---|---|
| provider | LLMProvider | The LLM provider (Anthropic, OpenAI, Google, etc.) |
| api_key | String | API key for authentication |
| model | String | Model identifier |
| max_tokens | Option<u32> | Maximum response tokens (default: 4096) |
| system_prompt | Option<String> | System prompt for the session |
| temperature | Option<f32> | Response randomness (0.0-2.0) |
| streaming | bool | Enable streaming responses |
| context_limit | i32 | Context window size in tokens |
| compaction | Option<CompactorType> | Compaction configuration |
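The table above implies a struct shape roughly like the following sketch. Field names and types come from the table; the stub enums and the exact definition in agent_air may differ:

```rust
// Stubbed for illustration; the real enums live in agent_air.
enum LLMProvider { Anthropic, OpenAI, Google }
enum CompactorType { Threshold }

struct LLMSessionConfig {
    provider: LLMProvider,
    api_key: String,
    model: String,
    max_tokens: Option<u32>,       // default: 4096
    system_prompt: Option<String>,
    temperature: Option<f32>,      // 0.0-2.0
    streaming: bool,
    context_limit: i32,            // tokens; drives compaction decisions
    compaction: Option<CompactorType>,
}
```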
Creating Session Configurations
Using Constructor Methods
Each provider has a constructor that sets appropriate defaults:
use agent_air::controller::LLMSessionConfig;
// Anthropic
let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514");
// OpenAI
let config = LLMSessionConfig::openai("sk-...", "gpt-4-turbo-preview");
// Google
let config = LLMSessionConfig::google("...", "gemini-2.5-flash");
// OpenAI-compatible providers
let config = LLMSessionConfig::openai_compatible(
"api-key",
"model-name",
"https://api.provider.com/v1",
128_000, // context limit
);
Using Builder Methods
Customize configurations with builder methods:
let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
.with_system_prompt("You are a helpful assistant.")
.with_max_tokens(8192)
.with_temperature(0.7)
.with_streaming(true);
Builder Method Reference
with_system_prompt
Sets the system prompt for the session:
let config = LLMSessionConfig::anthropic("key", "model")
.with_system_prompt("You are a code reviewer. Be thorough and constructive.");
with_max_tokens
Sets the maximum number of tokens in responses:
let config = LLMSessionConfig::anthropic("key", "model")
.with_max_tokens(8192);
with_temperature
Controls response randomness. Lower values produce more deterministic responses:
let config = LLMSessionConfig::anthropic("key", "model")
.with_temperature(0.3); // More focused
with_streaming
Enables or disables streaming responses:
let config = LLMSessionConfig::openai("key", "model")
.with_streaming(true);
with_context_limit
Sets the context window size for compaction calculations:
let config = LLMSessionConfig::anthropic("key", "model")
.with_context_limit(100_000);
Context Limits
The context window is the maximum number of tokens the model can process in a single request, including both the input (messages, system prompt) and the output (response).
Default Context Limits by Provider
| Provider | Default Context Limit |
|---|---|
| Anthropic | 200,000 tokens |
| OpenAI | 128,000 tokens |
| Google | 1,000,000 tokens |
| Groq | 131,072 tokens |
| Most others | 128,000 tokens |
Context Limit vs Max Tokens
These two settings serve different purposes:
- context_limit: Total tokens the model can handle (input + output). Used for compaction decisions. Not sent to the API.
- max_tokens: Maximum tokens in the response only. Sent to the API to limit response length.
Example: With a 200,000 context limit and 4,096 max tokens, a request with 100,000 input tokens can receive up to 4,096 output tokens.
Model-Specific Limits
Different models have different context window sizes. Set the context limit to match your model:
Claude models:
- Claude 3 Opus/Sonnet/Haiku: 200,000 tokens
GPT models:
- GPT-4 Turbo: 128,000 tokens
- GPT-4: 8,192 tokens (or 32,768 for extended context)
- GPT-3.5 Turbo: 16,385 tokens
Example for GPT-4 (non-turbo):
let config = LLMSessionConfig::openai("key", "gpt-4")
.with_context_limit(8_192);
Compaction Configuration
Configure automatic context compaction to prevent exceeding context limits:
use agent_air::controller::{CompactionConfig, ToolCompaction};
// Threshold-based compaction
let config = LLMSessionConfig::anthropic("key", "model")
.with_threshold_compaction(CompactionConfig {
threshold: 0.80,
keep_recent_turns: 3,
tool_compaction: ToolCompaction::Summarize,
});
// Disable compaction
let config = LLMSessionConfig::anthropic("key", "model")
.without_compaction();
Compaction Threshold
The default compaction threshold is 75% of the context limit. With a 200,000 token context limit, compaction triggers when context usage exceeds 150,000 tokens.
For agents that need more headroom, configure a lower threshold:
let config = LLMSessionConfig::anthropic("key", "model")
.with_threshold_compaction(CompactionConfig {
threshold: 0.50, // Trigger at 50% usage
keep_recent_turns: 5,
tool_compaction: ToolCompaction::Summarize,
});
Provider Defaults
Anthropic
| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | 200,000 |
| temperature | Provider default |
| compaction | Threshold-based |
OpenAI
| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | false |
| context_limit | 128,000 |
| temperature | Provider default |
| compaction | Threshold-based |
Google
| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | 1,000,000 |
| temperature | Provider default |
| compaction | Threshold-based |
OpenAI-Compatible Providers
| Setting | Default Value |
|---|---|
| max_tokens | 4,096 |
| streaming | true |
| context_limit | Varies by provider |
| temperature | Provider default |
| compaction | Threshold-based |
Complete Configuration Example
use agent_air::controller::{LLMSessionConfig, CompactionConfig, ToolCompaction};
let config = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
.with_system_prompt("You are a code reviewer.")
.with_max_tokens(8192)
.with_temperature(0.3)
.with_streaming(true)
.with_context_limit(200_000)
.with_threshold_compaction(CompactionConfig {
threshold: 0.80,
keep_recent_turns: 5,
tool_compaction: ToolCompaction::Summarize,
});
When to Customize Settings
Increase max_tokens when you expect long responses (code generation, detailed explanations).
Lower temperature (0.0-0.5) for deterministic tasks like code review or structured output.
Raise temperature (0.7-1.0) for creative tasks or brainstorming.
Disable streaming if your application processes complete responses rather than streaming output.
Adjust compaction based on conversation length requirements. See Compaction Strategies for details.
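The guidance above translates into contrasting builder chains. A sketch (keys and model names are placeholders):

```rust
// Deterministic task: code review with room for long responses,
// consumed as a complete (non-streaming) reply.
let review = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
    .with_temperature(0.2)
    .with_max_tokens(8192)
    .with_streaming(false);

// Creative task: brainstorming benefits from higher randomness.
let brainstorm = LLMSessionConfig::anthropic("sk-ant-...", "claude-sonnet-4-20250514")
    .with_temperature(0.9);
```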
