Context Compaction

Compaction reduces the size of the conversation history when context usage approaches the model’s limit. This allows long-running conversations to continue without hitting context window errors.

Why Compaction Is Needed

Every LLM has a finite context window. As conversations progress, the accumulated messages, tool calls, and results consume more of this window. Without intervention, the context eventually fills and requests fail.

Tool results are particularly problematic. A single file read or search operation can return thousands of tokens. Multiple tool calls in a conversation can quickly consume most of the available context.

When Compaction Triggers

Compaction is checked automatically before each LLM request. When context utilization exceeds the configured threshold, compaction proceeds.

With default settings, compaction triggers when context usage exceeds 75% of the limit. For a 200,000 token context, this means compaction triggers at 150,000 tokens.
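The trigger check reduces to a single comparison of current usage against a fraction of the context window. A minimal sketch (the function name is illustrative, not part of the Agent Air API):

```rust
/// Returns true when token usage crosses the compaction threshold.
/// `threshold` is a fraction of the context window, e.g. 0.75.
fn should_compact(used_tokens: u64, context_window: u64, threshold: f64) -> bool {
    (used_tokens as f64) >= (context_window as f64) * threshold
}
```

With a 200,000-token window and the default 0.75 threshold, this returns true at exactly 150,000 tokens.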

Compaction Strategies

Agent Air supports two compaction approaches:

Threshold Compaction

A fast approach that processes tool results without calling the LLM:

  • Preserves recent turns unchanged
  • Summarizes or redacts old tool results
  • Does not require an additional API call
  • Predictable and consistent results

let config = LLMSessionConfig::anthropic("key", "model")
    .with_threshold_compaction(CompactionConfig {
        threshold: 0.75,
        keep_recent_turns: 5,
        tool_compaction: ToolCompaction::Summarize,
    });

LLM Compaction

An intelligent approach that uses the LLM to create summaries:

  • Sends older messages to the LLM for summarization
  • Replaces multiple messages with a single summary message
  • Preserves recent turns unchanged
  • Requires an additional API call (default timeout: 60 seconds)

let config = LLMSessionConfig::anthropic("key", "model")
    .with_llm_compaction(LLMCompactorConfig::new(0.75, 5));

Tool Compaction Modes

For threshold compaction, two modes control how tool results are processed:

Summarize

Replaces tool result content with a pre-computed summary when available:

tool_compaction: ToolCompaction::Summarize

Tools can provide summaries when returning results. A file read returning 5,000 tokens might have a summary like:

"Read config.yaml (127 lines): Database configuration with PostgreSQL settings."

During compaction, the full result is replaced with this summary.

Redact

Replaces tool result content with a standard redaction notice:

tool_compaction: ToolCompaction::Redact

The content is replaced with:

[Tool result redacted during context compaction]

Use Redact when tool results are not important for future context or when maximum space savings are needed.
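The two modes differ only in what replaces an old tool result. A standalone sketch of that choice (the enum mirrors the documented `ToolCompaction` type, but the function is illustrative, not the Agent Air implementation):

```rust
/// Mirrors the documented tool compaction modes.
enum ToolCompaction {
    Summarize,
    Redact,
}

/// Returns the replacement text for a tool result in an older turn.
/// `summary` is the pre-computed summary the tool attached, if any.
fn compact_tool_result(mode: &ToolCompaction, full: &str, summary: Option<&str>) -> String {
    match mode {
        // Prefer the tool-provided summary; keep the full result if none exists.
        ToolCompaction::Summarize => summary.unwrap_or(full).to_string(),
        // Always replace with the standard redaction notice.
        ToolCompaction::Redact => {
            "[Tool result redacted during context compaction]".to_string()
        }
    }
}
```

Note the Summarize fallback: a tool result with no attached summary passes through unchanged, which is why Redact gives the stronger space-savings guarantee.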

Configuration

CompactionConfig

pub struct CompactionConfig {
    pub threshold: f64,           // When to trigger (0.0 to 1.0)
    pub keep_recent_turns: usize, // Turns to preserve unchanged
    pub tool_compaction: ToolCompaction, // Summarize or Redact
}

Threshold Values

Threshold   Behavior
0.50        Conservative. Compacts early, leaves substantial headroom
0.75        Default. Balanced approach suitable for most use cases
0.90        Aggressive. Maximizes context retention, higher overflow risk

keep_recent_turns

Controls how many recent turns are preserved unchanged during compaction. A turn consists of:

  • A user message
  • The assistant’s response (including any tool calls)
  • Tool results for those tool calls

Default is 5 turns. Reducing this value frees more context during compaction.
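The boundary arithmetic is a split at `len - keep_recent_turns`. A sketch of that split over an opaque turn type (illustrative, not the Agent Air internals):

```rust
/// Splits a turn list into (older, recent), preserving the last
/// `keep_recent_turns` turns unchanged. If fewer turns exist than the
/// keep count, everything lands in `recent` and nothing is compacted.
fn split_turns<T>(turns: Vec<T>, keep_recent_turns: usize) -> (Vec<T>, Vec<T>) {
    // saturating_sub avoids underflow when turns.len() < keep_recent_turns.
    let cut = turns.len().saturating_sub(keep_recent_turns);
    let mut older = turns;
    let recent = older.split_off(cut);
    (older, recent)
}
```

Only the `older` half is handed to the compactor; the `recent` half is carried forward verbatim.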

Example Configurations

Default (balanced):

CompactionConfig {
    threshold: 0.75,
    keep_recent_turns: 5,
    tool_compaction: ToolCompaction::Summarize,
}

Aggressive compaction (compact early, minimal context retained):

CompactionConfig {
    threshold: 0.05,
    keep_recent_turns: 1,
    tool_compaction: ToolCompaction::Summarize,
}

Maximum retention:

CompactionConfig {
    threshold: 0.90,
    keep_recent_turns: 10,
    tool_compaction: ToolCompaction::Summarize,
}

Manual Compaction

Programmatic Compaction

Trigger compaction regardless of current utilization:

let result = session.force_compact().await;

if result.compacted {
    println!("Compacted {} turns, {} messages remain",
        result.turns_compacted,
        result.messages_after);
}

Using the /compact Command

Users can trigger compaction in the TUI:

/compact

This compacts immediately regardless of the current threshold.

CompactResult

Manual compaction returns detailed statistics:

pub struct CompactResult {
    pub compacted: bool,          // Whether compaction occurred
    pub messages_before: usize,   // Messages before compaction
    pub messages_after: usize,    // Messages after compaction
    pub turns_compacted: usize,   // Turns processed
    pub turns_kept: usize,        // Turns preserved
    pub summary_length: usize,    // LLM summary length (if used)
    pub error: Option<String>,    // Error message if failed
}

When to Use Manual Compaction

  • Before large operations: Free up context before a request that needs substantial response space
  • After accumulating tool results: Summarize or redact tool results that are no longer needed
  • Context pressure: When the model has trouble maintaining coherence
  • Testing: Verify compaction configuration works as expected

What Gets Compacted

Preserved Content

Recent turns: The keep_recent_turns most recent turns are preserved unchanged, including user messages, assistant responses, tool calls, and tool results.

System prompt: Never compacted. Remains at the beginning of the conversation throughout the session.

Modified Content

Threshold compaction: Only tool results in older turns are modified (summarized or redacted). User messages and assistant text responses are preserved.

LLM compaction: All messages outside the recent turns window are replaced with a single summary message.

Example: Before and After

Before (10 turns, keep_recent_turns=2):

Turn 1-8: Various messages and tool results
Turn 9: User asks to run tests    <- Recent
Turn 10: Assistant runs tests     <- Recent

After threshold compaction:

Turn 1-8: Messages preserved, tool results summarized
Turn 9: Preserved unchanged
Turn 10: Preserved unchanged

After LLM compaction:

[Summary of turns 1-8]
Turn 9: Preserved unchanged
Turn 10: Preserved unchanged
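The LLM-compaction rewrite above can be sketched as a list transformation: everything outside the recent window collapses into one summary entry. `Msg` is a stand-in type for illustration, not an Agent Air type:

```rust
/// Stand-in message type: either a summary or an original turn.
#[derive(Clone, Debug, PartialEq)]
enum Msg {
    Summary(String),
    Turn(u32),
}

/// Replaces all history outside the recent-turns window with a single
/// summary message, leaving the recent turns unchanged.
fn apply_llm_compaction(mut history: Vec<Msg>, keep_recent_turns: usize, summary: String) -> Vec<Msg> {
    let cut = history.len().saturating_sub(keep_recent_turns);
    if cut == 0 {
        return history; // nothing old enough to compact
    }
    let recent = history.split_off(cut);
    // One summary message stands in for all the older messages.
    let mut out = vec![Msg::Summary(summary)];
    out.extend(recent);
    out
}
```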

Disabling Compaction

To disable compaction entirely:

let config = LLMSessionConfig::anthropic("key", "model")
    .without_compaction();

Without compaction, the session continues until the provider returns a context length error.

Compaction Statistics

The CompactionResult reports what was processed:

pub struct CompactionResult {
    pub tool_results_summarized: usize,
    pub tool_results_redacted: usize,
    pub turns_compacted: usize,
}

Use these statistics to monitor compaction frequency and effectiveness.
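For example, the fields can be folded into a one-line log message. The struct is redeclared here from the definition above so the sketch compiles standalone; the `report` helper is illustrative, not part of the library:

```rust
/// Redeclared from the docs so this sketch is self-contained.
struct CompactionResult {
    tool_results_summarized: usize,
    tool_results_redacted: usize,
    turns_compacted: usize,
}

/// One-line report suitable for logs or metrics pipelines.
fn report(r: &CompactionResult) -> String {
    format!(
        "compacted {} turn(s): {} summarized, {} redacted tool result(s)",
        r.turns_compacted, r.tool_results_summarized, r.tool_results_redacted
    )
}
```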

Trade-offs

Threshold Compaction

Pros                    Cons
Fast (no LLM call)      Only handles tool results
Predictable results     Requires tools to provide summaries
No additional cost      Limited intelligence

LLM Compaction

Pros                            Cons
Creates intelligent summaries   Requires additional API call
Handles all message types       Slower (default 60s timeout)
Context-aware compression       Additional token cost