# Context Compaction
Compaction reduces the size of the conversation history when context usage approaches the model’s limit. This allows long-running conversations to continue without hitting context window errors.
## Why Compaction is Needed
Every LLM has a finite context window. As conversations progress, the accumulated messages, tool calls, and results consume more of this window. Without intervention, the context eventually fills and requests fail.
Tool results are particularly problematic. A single file read or search operation can return thousands of tokens. Multiple tool calls in a conversation can quickly consume most of the available context.
## When Compaction Triggers
Compaction is checked automatically before each LLM request. When context utilization exceeds the configured threshold, compaction proceeds.
With default settings, compaction triggers when context usage exceeds 75% of the limit. For a 200,000 token context, this means compaction triggers at 150,000 tokens.
## Compaction Strategies
Agent Air supports two compaction approaches:
### Threshold Compaction

A fast approach that processes tool results without calling the LLM:

- Preserves recent turns unchanged
- Summarizes or redacts old tool results
- Does not require an additional API call
- Predictable and consistent results

```rust
let config = LLMSessionConfig::anthropic("key", "model")
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.75,
        keep_recent_turns: 5,
        tool_compaction: ToolCompaction::Summarize,
    });
```
### LLM Compaction

An intelligent approach that uses the LLM to create summaries:

- Sends older messages to the LLM for summarization
- Replaces multiple messages with a single summary message
- Preserves recent turns unchanged
- Requires an additional API call (default timeout: 60 seconds)

```rust
let config = LLMSessionConfig::anthropic("key", "model")
    .with_llm_compaction(LLMCompactorConfig::new(0.75, 5));
```
## Tool Compaction Modes
For threshold compaction, two modes control how tool results are processed:
### Summarize

Replaces tool result content with a pre-computed summary when available:

```rust
tool_compaction: ToolCompaction::Summarize
```
Tools can provide summaries when returning results. A file read returning 5,000 tokens might have a summary like:
"Read config.yaml (127 lines): Database configuration with PostgreSQL settings."
During compaction, the full result is replaced with this summary.
### Redact

Replaces tool result content with a standard redaction notice:

```rust
tool_compaction: ToolCompaction::Redact
```
The content is replaced with:

```
[Tool result redacted during context compaction]
```

Use `Redact` when tool results are not important for future context or when maximum space savings are needed.
## Configuration

### CompactionConfig

```rust
pub struct CompactionConfig {
    pub threshold: f64,                  // When to trigger (0.0 to 1.0)
    pub keep_recent_turns: usize,        // Turns to preserve unchanged
    pub tool_compaction: ToolCompaction, // Summarize or Redact
}
```
### Threshold Values
| Threshold | Behavior |
|---|---|
| 0.50 | Conservative. Compacts early, leaves substantial headroom |
| 0.75 | Default. Balanced approach suitable for most use cases |
| 0.90 | Aggressive. Maximizes context retention, higher overflow risk |
### `keep_recent_turns`
Controls how many recent turns are preserved unchanged during compaction. A turn consists of:
- A user message
- The assistant’s response (including any tool calls)
- Tool results for those tool calls
Default is 5 turns. Reducing this value frees more context during compaction.
### Example Configurations

Default (balanced):

```rust
CompactionConfig {
    threshold: 0.75,
    keep_recent_turns: 5,
    tool_compaction: ToolCompaction::Summarize,
}
```

Minimal context (compacts early, keeps little):

```rust
CompactionConfig {
    threshold: 0.50,
    keep_recent_turns: 1,
    tool_compaction: ToolCompaction::Summarize,
}
```

Maximum retention:

```rust
CompactionConfig {
    threshold: 0.90,
    keep_recent_turns: 10,
    tool_compaction: ToolCompaction::Summarize,
}
```
## Manual Compaction

### Programmatic Compaction

Trigger compaction regardless of current utilization:

```rust
let result = session.force_compact().await;
if result.compacted {
    println!(
        "Compacted {} turns, {} messages remain",
        result.turns_compacted, result.messages_after
    );
}
```
### Using the /compact Command

Users can trigger compaction in the TUI:

```
/compact
```

This compacts immediately, regardless of the configured threshold.
### CompactResult

Manual compaction returns detailed statistics:

```rust
pub struct CompactResult {
    pub compacted: bool,          // Whether compaction occurred
    pub messages_before: usize,   // Messages before compaction
    pub messages_after: usize,    // Messages after compaction
    pub turns_compacted: usize,   // Turns processed
    pub turns_kept: usize,        // Turns preserved
    pub summary_length: usize,    // LLM summary length (if used)
    pub error: Option<String>,    // Error message if failed
}
```
### When to Use Manual Compaction
- Before large operations: Free up context before a request that needs substantial response space
- After accumulating tool results: Summarize or redact tool results that are no longer needed
- Context pressure: When the model has trouble maintaining coherence
- Testing: Verify compaction configuration works as expected
## What Gets Compacted

### Preserved Content

Recent turns: the `keep_recent_turns` most recent turns are preserved unchanged, including user messages, assistant responses, tool calls, and tool results.

System prompt: never compacted. It remains at the beginning of the conversation throughout the session.

### Modified Content

Threshold compaction: only tool results in older turns are modified (summarized or redacted). User messages and assistant text responses are preserved.

LLM compaction: all messages outside the recent-turns window are replaced with a single summary message.
### Example: Before and After

Before (10 turns, `keep_recent_turns = 2`):

```
Turn 1-8:  Various messages and tool results
Turn 9:    User asks to run tests   <- Recent
Turn 10:   Assistant runs tests     <- Recent
```

After threshold compaction:

```
Turn 1-8:  Messages preserved, tool results summarized
Turn 9:    Preserved unchanged
Turn 10:   Preserved unchanged
```

After LLM compaction:

```
[Summary of turns 1-8]
Turn 9:    Preserved unchanged
Turn 10:   Preserved unchanged
```
## Disabling Compaction

To disable compaction entirely:

```rust
let config = LLMSessionConfig::anthropic("key", "model")
    .without_compaction();
```

Without compaction, the session continues until the provider returns a context-length error.
## Compaction Statistics

The `CompactionResult` reports what was processed:

```rust
pub struct CompactionResult {
    pub tool_results_summarized: usize,
    pub tool_results_redacted: usize,
    pub turns_compacted: usize,
}
```
Use these statistics to monitor compaction frequency and effectiveness.
## Trade-offs

### Threshold Compaction

| Pros | Cons |
|---|---|
| Fast (no LLM call) | Only handles tool results |
| Predictable results | Requires tools to provide summaries |
| No additional cost | Limited intelligence |

### LLM Compaction

| Pros | Cons |
|---|---|
| Creates intelligent summaries | Requires additional API call |
| Handles all message types | Slower (default 60s timeout) |
| Context-aware compression | Additional token cost |
