Compaction Algorithm

This page documents how context compaction works internally. Compaction reduces the conversation's size as it approaches the model's context limit, enabling long-running conversations without manual intervention.

Compaction Overview

┌─────────────────────────────────────────────────────────────────┐
│                    Before Each LLM Request                       │
│  maybe_compact() called                                          │
└─────────────────────────────────┬───────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Check Threshold                               │
│  context_used / context_limit > threshold?                       │
└─────────────────────────────────┬───────────────────────────────┘
                                  │ Yes
                                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Execute Compaction                            │
│  ThresholdCompactor: Summarize/redact tool results               │
│  LLMCompactor: Generate conversation summary                     │
└─────────────────────────────────────────────────────────────────┘

Compactor Trait

Two traits define the compaction interface:

pub trait Compactor: Send + Sync {
    fn should_compact(&self, context_used: i64, context_limit: i32) -> bool;

    fn compact(
        &self,
        conversation: &mut Vec<Message>,
        compact_summaries: &HashMap<String, String>,
    ) -> CompactionResult;

    fn is_async(&self) -> bool {
        false
    }
}

pub trait AsyncCompactor: Compactor {
    fn compact_async<'a>(
        &'a self,
        conversation: Vec<Message>,
        compact_summaries: &'a HashMap<String, String>,
    ) -> Pin<Box<dyn Future<Output = Result<(Vec<Message>, CompactionResult), CompactionError>> + Send + 'a>>;
}
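
For orientation, here is a minimal no-op implementor of the synchronous trait. This is a sketch only, assuming the Message and CompactionResult types shown on this page are in scope:

use std::collections::HashMap;

struct NoopCompactor;

impl Compactor for NoopCompactor {
    // Never triggers, so compact() is never invoked by maybe_compact().
    fn should_compact(&self, _context_used: i64, _context_limit: i32) -> bool {
        false
    }

    // Leaves the conversation untouched and reports no modifications.
    fn compact(
        &self,
        _conversation: &mut Vec<Message>,
        _compact_summaries: &HashMap<String, String>,
    ) -> CompactionResult {
        CompactionResult::default()
    }
}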

CompactionResult

Compaction returns metrics about what was modified:

pub struct CompactionResult {
    pub tool_results_summarized: usize,
    pub tool_results_redacted: usize,
    pub turns_compacted: usize,
}
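
The two tool_results_* counters are populated by ThresholdCompactor; LLMCompactor reports only turns_compacted, as its reconstruction step below shows.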

ThresholdCompactor

The synchronous compactor that summarizes or redacts tool results.

Configuration

pub struct CompactionConfig {
    pub threshold: f64,           // 0.0-1.0 utilization ratio
    pub keep_recent_turns: usize, // Turns to preserve
    pub tool_compaction: ToolCompaction,
}

pub enum ToolCompaction {
    Summarize,  // Replace with compact_summary
    Redact,     // Replace with redaction notice
}

Default Values

impl Default for CompactionConfig {
    fn default() -> Self {
        Self {
            threshold: 0.75,
            keep_recent_turns: 3,
            tool_compaction: ToolCompaction::Summarize,
        }
    }
}
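
A configuration that keeps the defaults but switches to redaction can use struct-update syntax (illustrative):

let config = CompactionConfig {
    tool_compaction: ToolCompaction::Redact,
    ..Default::default() // threshold 0.75, keep_recent_turns 3
};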

Threshold Check

Compaction triggers when utilization exceeds the threshold:

fn should_compact(&self, context_used: i64, context_limit: i32) -> bool {
    let utilization = context_used as f64 / context_limit as f64;
    utilization > self.config.threshold
}
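
For example, with a 200,000-token context limit and the default threshold of 0.75, compaction triggers once context_used exceeds 150,000 tokens.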

Algorithm Steps

  1. Extract Turn IDs: Collect unique turn IDs in order
fn extract_turn_ids(conversation: &[Message]) -> Vec<TurnId> {
    let mut turn_ids = Vec::new();
    for msg in conversation {
        let turn_id = msg.turn_id().clone();
        if turn_ids.last() != Some(&turn_id) {
            turn_ids.push(turn_id);
        }
    }
    turn_ids
}
  2. Identify Turns to Compact: Skip the most recent turns (a worked example follows these steps)
let turn_count = turn_ids.len();
if turn_count <= self.config.keep_recent_turns {
    return CompactionResult::default(); // Nothing to compact
}

let compact_up_to = turn_count - self.config.keep_recent_turns;
let old_turn_ids: HashSet<_> = turn_ids[..compact_up_to].iter().collect();
  3. Process Messages: Modify tool results in old turns
for message in conversation.iter_mut() {
    if old_turn_ids.contains(message.turn_id()) {
        self.compact_message(message, compact_summaries, &mut result);
    }
}
  4. Compact Tool Results: Replace content based on the configured mode
fn compact_message(
    &self,
    message: &mut Message,
    summaries: &HashMap<String, String>,
    result: &mut CompactionResult,
) {
    if let Message::User(user_msg) = message {
        for block in &mut user_msg.content {
            if let ContentBlock::ToolResult(tool_result) = block {
                match self.config.tool_compaction {
                    ToolCompaction::Summarize => {
                        if let Some(summary) = summaries.get(&tool_result.tool_use_id) {
                            tool_result.content = summary.clone();
                            result.tool_results_summarized += 1;
                        }
                    }
                    ToolCompaction::Redact => {
                        tool_result.content = "[Content redacted for context management]".to_string();
                        result.tool_results_redacted += 1;
                    }
                }
            }
        }
    }
}
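
As a worked example: extract_turn_ids collapses adjacent duplicates, so messages carrying turn IDs [1, 1, 2, 2, 3] yield [1, 2, 3]. A conversation with 10 turns and the default keep_recent_turns of 3 has compact_up_to = 7, so tool results in the first 7 turns are compacted and the last 3 turns are preserved verbatim. Note also that in Summarize mode a tool result with no entry in compact_summaries is left untouched; only Redact mode unconditionally replaces content.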

LLMCompactor

The asynchronous compactor that uses an LLM to summarize conversation history.

Configuration

pub struct LLMCompactorConfig {
    pub threshold: f64,
    pub keep_recent_turns: usize,
    pub summary_system_prompt: Option<String>,
    pub max_summary_tokens: Option<i64>,
    pub summary_timeout: Option<Duration>,
}

Default Values

impl Default for LLMCompactorConfig {
    fn default() -> Self {
        Self {
            threshold: 0.75,
            keep_recent_turns: 3,
            summary_system_prompt: None,  // Uses built-in prompt
            max_summary_tokens: Some(2048),
            summary_timeout: Some(Duration::from_secs(60)),
        }
    }
}

Default Summary Prompt

const DEFAULT_SUMMARY_PROMPT: &str = r#"
You are a conversation summarizer. Summarize the following conversation
concisely while preserving:
- Key decisions made
- Important information discovered
- Current task context
- Any pending actions

Be concise but complete. Focus on information needed to continue the conversation.
"#;

Algorithm Steps

  1. Separate Messages: Divide into old and recent
async fn compact_async(
    &self,
    conversation: Vec<Message>,
    compact_summaries: &HashMap<String, String>,
) -> Result<(Vec<Message>, CompactionResult), CompactionError> {
    let turn_ids = extract_turn_ids(&conversation);
    let turn_count = turn_ids.len();

    if turn_count <= self.config.keep_recent_turns {
        return Ok((conversation, CompactionResult::default()));
    }

    let compact_up_to = turn_count - self.config.keep_recent_turns;
    let old_turn_ids: HashSet<_> = turn_ids[..compact_up_to].iter().collect();

    let (old_messages, recent_messages): (Vec<_>, Vec<_>) = conversation
        .into_iter()
        .partition(|msg| old_turn_ids.contains(msg.turn_id()));
  2. Format for Summarization: Convert messages to text (sample output shown after these steps)
fn format_for_summary(messages: &[Message]) -> String {
    let mut output = String::new();
    for msg in messages {
        match msg {
            Message::User(m) => {
                output.push_str("User: ");
                for block in &m.content {
                    match block {
                        ContentBlock::Text(t) => output.push_str(&t.text),
                        ContentBlock::ToolResult(r) => {
                            output.push_str(&format!("[Tool result: {}]", r.tool_use_id));
                        }
                        _ => {}
                    }
                }
                output.push('\n');
            }
            Message::Assistant(m) => {
                output.push_str("Assistant: ");
                for block in &m.content {
                    match block {
                        ContentBlock::Text(t) => output.push_str(&t.text),
                        ContentBlock::ToolUse(u) => {
                            output.push_str(&format!("[Called tool: {}]", u.name));
                        }
                        _ => {}
                    }
                }
                output.push('\n');
            }
        }
    }
    output
}
  3. Call LLM for Summary: Make the API request with a timeout (the error conversions this relies on are sketched after these steps)
let formatted = format_for_summary(&old_messages);
let prompt = format!("Summarize this conversation:\n\n{}", formatted);

let summary = tokio::time::timeout(
    self.config.summary_timeout.unwrap_or(Duration::from_secs(60)),
    self.client.complete(&prompt, self.config.max_summary_tokens),
).await??;
  4. Build Summary Message: Create a placeholder at turn 0
fn create_summary_message(summary: &str, session_id: i64) -> Message {
    Message::User(UserMessage {
        id: Uuid::new_v4().to_string(),
        session_id,
        turn_id: TurnId { owner: "u".to_string(), number: 0 },
        created_at: Utc::now(),
        content: vec![ContentBlock::Text(TextBlock {
            text: format!("[Previous conversation summary]: {}", summary),
        })],
    })
}
  5. Reconstruct Conversation: Summary followed by recent messages
let mut new_conversation = Vec::with_capacity(1 + recent_messages.len());
new_conversation.push(summary_message);
new_conversation.extend(recent_messages);

Ok((new_conversation, CompactionResult {
    turns_compacted: old_turn_ids.len(),
    ..Default::default()
}))
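
For illustration, format_for_summary (step 2) flattens a short exchange into plain text like the following (content, tool name, and tool ID hypothetical):

User: What does src/main.rs contain?
Assistant: [Called tool: read_file]
User: [Tool result: toolu_01]
Assistant: It defines the CLI entry point.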
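
The double ? in step 3 unwraps both the timeout wrapper and the LLM call, which requires CompactionError to convert from both error types. A sketch of those conversions, assuming tokio's Elapsed type and the CompactionError variants defined under Error Handling below:

impl From<tokio::time::error::Elapsed> for CompactionError {
    fn from(_: tokio::time::error::Elapsed) -> Self {
        // A timed-out summary request maps to the dedicated variant.
        CompactionError::Timeout
    }
}

impl From<LlmError> for CompactionError {
    fn from(e: LlmError) -> Self {
        CompactionError::LlmError(e)
    }
}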

Compaction Trigger

Compaction is checked before each LLM request:

async fn maybe_compact(&self) {
    let context_used = self.current_input_tokens.load(Ordering::SeqCst);
    let context_limit = self.context_limit.load(Ordering::SeqCst);

    // Check synchronous compactor
    if let Some(ref compactor) = self.compactor {
        if compactor.should_compact(context_used, context_limit) {
            let summaries = self.compact_summaries.read().await.clone();
            let mut conversation = self.conversation.write().await;
            let result = compactor.compact(Arc::make_mut(&mut *conversation), &summaries);
            tracing::info!("Compaction result: {:?}", result);
        }
    }

    // Check async compactor
    if let Some(ref llm_compactor) = self.llm_compactor {
        if llm_compactor.should_compact(context_used, context_limit) {
            let summaries = self.compact_summaries.read().await.clone();
            let conversation = self.conversation.read().await.as_ref().clone();

            match llm_compactor.compact_async(conversation, &summaries).await {
                Ok((new_conversation, result)) => {
                    *self.conversation.write().await = Arc::new(new_conversation);
                    tracing::info!("LLM compaction result: {:?}", result);
                }
                Err(e) => {
                    tracing::warn!("LLM compaction failed: {}", e);
                }
            }
        }
    }
}
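
Note that the two compactors are checked independently: if both are configured and both trigger, the synchronous threshold pass runs first, and the LLM pass then operates on the already-compacted conversation.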

Force Compaction

Compaction can be triggered manually:

pub async fn force_compact(&self) -> Result<CompactionResult, CompactionError> {
    self.compact_now().await
}

This ignores the threshold check and compacts immediately.

Compact Summaries

Tool results can include pre-computed summaries:

pub struct ToolResultBlock {
    pub tool_use_id: String,
    pub content: String,
    pub is_error: bool,
    pub compact_summary: Option<String>,  // Pre-computed summary
}
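
A tool that knows its output is verbose can attach the summary when it produces the result (field values illustrative):

let block = ToolResultBlock {
    tool_use_id: "toolu_01".to_string(),
    // Full output, kept until compaction replaces it.
    content: "src/main.rs\nsrc/lib.rs\nsrc/compactor.rs".to_string(),
    is_error: false,
    compact_summary: Some("Listed 3 files under src/".to_string()),
};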

These are stored during tool execution:

async fn store_compact_summaries(&self, summaries: &HashMap<String, String>) {
    let mut guard = self.compact_summaries.write().await;
    for (tool_use_id, summary) in summaries {
        guard.insert(tool_use_id.clone(), summary.clone());
    }
}

CompactorType Enum

Configuration selects the compaction strategy:

pub enum CompactorType {
    Threshold(CompactionConfig),
    LLM(LLMCompactorConfig),
}

impl Default for CompactorType {
    fn default() -> Self {
        CompactorType::Threshold(CompactionConfig::default())
    }
}
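
Selecting the LLM strategy with a higher trigger point might look like this (illustrative):

let compactor = CompactorType::LLM(LLMCompactorConfig {
    threshold: 0.85,
    ..Default::default()
});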

Comparison

Feature        ThresholdCompactor    LLMCompactor
Speed          Fast (sync)           Slow (async API call)
Quality        Mechanical            Intelligent
Cost           Free                  Uses API tokens
Failure mode   Never fails           Can timeout/error
Use case       High-volume           Quality-focused

Error Handling

LLM compaction can fail:

pub enum CompactionError {
    LlmError(LlmError),
    Timeout,
    EmptySummary,
}

Failures are logged but do not block the conversation:

match llm_compactor.compact_async(...).await {
    Ok((new_conversation, result)) => { /* apply */ }
    Err(e) => {
        tracing::warn!("LLM compaction failed: {}", e);
        // Continue without compaction
    }
}

Next Steps