# Context Compaction
Compaction reduces the size of the conversation history when context usage approaches the model’s limit. This allows long-running conversations to continue without hitting context window errors.
## Why Compaction is Needed
Every LLM has a finite context window. As conversations progress, the accumulated messages, tool calls, and results consume more of this window. Without intervention, the context eventually fills and requests fail.
Tool results are particularly problematic. A single file read or search operation can return thousands of tokens. Multiple tool calls in a conversation can quickly consume most of the available context.
## When Compaction Triggers
Compaction is checked automatically before each LLM request. When context utilization exceeds the configured threshold, compaction proceeds.
With default settings, compaction triggers when context usage exceeds 75% of the limit. For a 200,000 token context, this means compaction triggers at 150,000 tokens.
## Compaction Strategies
Agent Air supports two compaction approaches:
### Threshold Compaction

A fast approach that processes tool results without calling the LLM:

- Preserves recent turns unchanged
- Summarizes or redacts old tool results
- Does not require an additional API call
- Predictable and consistent results

```rust
let config = LLMSessionConfig::anthropic("key", "model")
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.75,
        keep_recent_turns: 5,
        tool_compaction: ToolCompaction::Summarize,
    });
```
### LLM Compaction

An intelligent approach that uses the LLM to create summaries:

- Sends older messages to the LLM for summarization
- Replaces multiple messages with a single summary message
- Preserves recent turns unchanged
- Requires an additional API call (default timeout: 60 seconds)

```rust
let config = LLMSessionConfig::anthropic("key", "model")
    .with_llm_compaction(LLMCompactorConfig::new(0.75, 5));
```
## Tool Compaction Modes
For threshold compaction, two modes control how tool results are processed:
### Summarize

Replaces tool result content with a pre-computed summary when available:

```rust
tool_compaction: ToolCompaction::Summarize
```
Tools can provide summaries when returning results. A file read returning 5,000 tokens might have a summary like:
"Read config.yaml (127 lines): Database configuration with PostgreSQL settings."
During compaction, the full result is replaced with this summary.
### Redact

Replaces tool result content with a standard redaction notice:

```rust
tool_compaction: ToolCompaction::Redact
```
The content is replaced with:

```
[Tool result redacted during context compaction]
```

Use `Redact` when tool results are not important for future context or when maximum space savings are needed.
## Configuration

### CompactionConfig

```rust
pub struct CompactionConfig {
    pub threshold: f64,                  // When to trigger (0.0 to 1.0)
    pub keep_recent_turns: usize,        // Turns to preserve unchanged
    pub tool_compaction: ToolCompaction, // Summarize or Redact
}
```
### Threshold Values
| Threshold | Behavior |
|---|---|
| 0.50 | Conservative. Compacts early, leaves substantial headroom |
| 0.75 | Default. Balanced approach suitable for most use cases |
| 0.90 | Aggressive. Maximizes context retention, higher overflow risk |
### `keep_recent_turns`
Controls how many recent turns are preserved unchanged during compaction. A turn consists of:
- A user message
- The assistant’s response (including any tool calls)
- Tool results for those tool calls
Default is 5 turns. Reducing this value frees more context during compaction.
### Example Configurations

Default (balanced):

```rust
CompactionConfig {
    threshold: 0.75,
    keep_recent_turns: 5,
    tool_compaction: ToolCompaction::Summarize,
}
```

Minimal context (compacts early, keeps little):

```rust
CompactionConfig {
    threshold: 0.50,
    keep_recent_turns: 1,
    tool_compaction: ToolCompaction::Summarize,
}
```

Maximum retention:

```rust
CompactionConfig {
    threshold: 0.90,
    keep_recent_turns: 10,
    tool_compaction: ToolCompaction::Summarize,
}
```
## Manual Compaction

### Programmatic Compaction

Trigger compaction regardless of current utilization:

```rust
let result = session.force_compact().await;
if result.compacted {
    println!(
        "Compacted {} turns, {} messages remain",
        result.turns_compacted, result.messages_after
    );
}
```
### Using the /compact Command

Users can trigger compaction in the TUI:

```
/compact
```

This compacts immediately, regardless of the configured threshold.
### CompactResult

Manual compaction returns detailed statistics:

```rust
pub struct CompactResult {
    pub compacted: bool,          // Whether compaction occurred
    pub messages_before: usize,   // Messages before compaction
    pub messages_after: usize,    // Messages after compaction
    pub turns_compacted: usize,   // Turns processed
    pub turns_kept: usize,        // Turns preserved
    pub summary_length: usize,    // LLM summary length (if used)
    pub error: Option<String>,    // Error message if failed
}
```
### When to Use Manual Compaction
- Before large operations: Free up context before a request that needs substantial response space
- After accumulating tool results: Summarize or redact tool results that are no longer needed
- Context pressure: When the model has trouble maintaining coherence
- Testing: Verify compaction configuration works as expected
## What Gets Compacted

### Preserved Content

Recent turns: the `keep_recent_turns` most recent turns are preserved unchanged, including user messages, assistant responses, tool calls, and tool results.

System prompt: never compacted. It remains at the beginning of the conversation throughout the session.

### Modified Content

Threshold compaction: only tool results in older turns are modified (summarized or redacted). User messages and assistant text responses are preserved.

LLM compaction: all messages outside the recent-turns window are replaced with a single summary message.
### Example: Before and After

Before (10 turns, `keep_recent_turns = 2`):

```
Turn 1-8:  Various messages and tool results
Turn 9:    User asks to run tests   <- Recent
Turn 10:   Assistant runs tests     <- Recent
```

After threshold compaction:

```
Turn 1-8:  Messages preserved, tool results summarized
Turn 9:    Preserved unchanged
Turn 10:   Preserved unchanged
```

After LLM compaction:

```
[Summary of turns 1-8]
Turn 9:    Preserved unchanged
Turn 10:   Preserved unchanged
```
## Disabling Compaction

To disable compaction entirely:

```rust
let config = LLMSessionConfig::anthropic("key", "model")
    .without_compaction();
```

Without compaction, the session continues until the provider returns a context-length error.
## Compaction Statistics

The `CompactionResult` reports what was processed:

```rust
pub struct CompactionResult {
    pub tool_results_summarized: usize,
    pub tool_results_redacted: usize,
    pub turns_compacted: usize,
}
```
Use these statistics to monitor compaction frequency and effectiveness.
## Trade-offs

### Threshold Compaction

| Pros | Cons |
|---|---|
| Fast (no LLM call) | Only handles tool results |
| Predictable results | Requires tools to provide summaries |
| No additional cost | Limited intelligence |

### LLM Compaction

| Pros | Cons |
|---|---|
| Creates intelligent summaries | Requires additional API call |
| Handles all message types | Slower (default 60s timeout) |
| Context-aware compression | Additional token cost |
