Compaction Algorithm
This page documents how context compaction works internally. Compaction reduces the size of the conversation when it approaches the model's context limit, enabling long-running conversations without manual intervention.
Compaction Overview
┌─────────────────────────────────────────────────────────────────┐
│                     Before Each LLM Request                     │
│                     maybe_compact() called                      │
└─────────────────────────────────┬───────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Check Threshold                         │
│            context_used / context_limit > threshold?            │
└─────────────────────────────────┬───────────────────────────────┘
                                  │ Yes
                                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Execute Compaction                       │
│        ThresholdCompactor: Summarize/redact tool results        │
│           LLMCompactor: Generate conversation summary           │
└─────────────────────────────────────────────────────────────────┘
Compactor Trait
Two traits define the compaction interface:
pub trait Compactor: Send + Sync {
fn should_compact(&self, context_used: i64, context_limit: i32) -> bool;
fn compact(
&self,
conversation: &mut Vec<Message>,
compact_summaries: &HashMap<String, String>,
) -> CompactionResult;
fn is_async(&self) -> bool {
false
}
}
pub trait AsyncCompactor: Compactor {
fn compact_async<'a>(
&'a self,
conversation: Vec<Message>,
compact_summaries: &'a HashMap<String, String>,
) -> Pin<Box<dyn Future<Output = Result<(Vec<Message>, CompactionResult), CompactionError>> + Send + 'a>>;
}
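As a sketch of how the synchronous trait might be implemented, here is a hypothetical no-op compactor (NoopCompactor is illustrative and not part of the codebase):
struct NoopCompactor {
    threshold: f64,
}

impl Compactor for NoopCompactor {
    fn should_compact(&self, context_used: i64, context_limit: i32) -> bool {
        // Same utilization check used by ThresholdCompactor below.
        context_used as f64 / context_limit as f64 > self.threshold
    }

    fn compact(
        &self,
        _conversation: &mut Vec<Message>,
        _compact_summaries: &HashMap<String, String>,
    ) -> CompactionResult {
        // Leaves the conversation untouched and reports no changes.
        CompactionResult::default()
    }

    // is_async() keeps the default implementation and returns false.
}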
CompactionResult
Compaction returns metrics about what was modified:
pub struct CompactionResult {
pub tool_results_summarized: usize,
pub tool_results_redacted: usize,
pub turns_compacted: usize,
}
ThresholdCompactor
The synchronous compactor that summarizes or redacts tool results.
Configuration
pub struct CompactionConfig {
pub threshold: f64, // 0.0-1.0 utilization ratio
pub keep_recent_turns: usize, // Turns to preserve
pub tool_compaction: ToolCompaction,
}
pub enum ToolCompaction {
Summarize, // Replace with compact_summary
Redact, // Replace with redaction notice
}
Default Values
impl Default for CompactionConfig {
fn default() -> Self {
Self {
threshold: 0.75,
keep_recent_turns: 3,
tool_compaction: ToolCompaction::Summarize,
}
}
}
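A configuration that compacts earlier, keeps more recent turns, and redacts rather than summarizes could be constructed like this (values are illustrative):
let config = CompactionConfig {
    threshold: 0.6,                          // compact earlier than the 0.75 default
    keep_recent_turns: 5,                    // preserve the last five turns verbatim
    tool_compaction: ToolCompaction::Redact, // discard old tool output entirely
};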
Threshold Check
Compaction triggers when utilization exceeds the threshold:
fn should_compact(&self, context_used: i64, context_limit: i32) -> bool {
let utilization = context_used as f64 / context_limit as f64;
utilization > self.config.threshold
}
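For example, a ThresholdCompactor with the default threshold of 0.75 and a model with a 200,000-token context window begins compacting once usage exceeds 150,000 tokens (numbers are illustrative, and `compactor` is assumed to use the default config):
// 150_001 / 200_000 ≈ 0.750005 > 0.75 → compact
assert!(compactor.should_compact(150_001, 200_000));
// 120_000 / 200_000 = 0.60 ≤ 0.75 → skip
assert!(!compactor.should_compact(120_000, 200_000));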
Algorithm Steps
- Extract Turn IDs: Collect unique turn IDs in order
fn extract_turn_ids(conversation: &[Message]) -> Vec<TurnId> {
let mut turn_ids = Vec::new();
for msg in conversation {
let turn_id = msg.turn_id().clone();
if turn_ids.last() != Some(&turn_id) {
turn_ids.push(turn_id);
}
}
turn_ids
}
- Identify Turns to Compact: Skip recent turns
let turn_count = turn_ids.len();
if turn_count <= self.config.keep_recent_turns {
return CompactionResult::default(); // Nothing to compact
}
let compact_up_to = turn_count - self.config.keep_recent_turns;
let old_turn_ids: HashSet<_> = turn_ids[..compact_up_to].iter().collect();
- Process Messages: Modify tool results in old turns
for message in conversation.iter_mut() {
if old_turn_ids.contains(message.turn_id()) {
self.compact_message(message, compact_summaries, &mut result);
}
}
- Compact Tool Results: Replace content based on mode
fn compact_message(
&self,
message: &mut Message,
summaries: &HashMap<String, String>,
result: &mut CompactionResult,
) {
if let Message::User(user_msg) = message {
for block in &mut user_msg.content {
if let ContentBlock::ToolResult(tool_result) = block {
match self.config.tool_compaction {
ToolCompaction::Summarize => {
if let Some(summary) = summaries.get(&tool_result.tool_use_id) {
tool_result.content = summary.clone();
result.tool_results_summarized += 1;
}
}
ToolCompaction::Redact => {
tool_result.content = "[Content redacted for context management]".to_string();
result.tool_results_redacted += 1;
}
}
}
}
}
}
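Putting these steps together, the synchronous compact entry point looks roughly like this (a sketch assembled from the fragments above, not a verbatim listing):
fn compact(
    &self,
    conversation: &mut Vec<Message>,
    compact_summaries: &HashMap<String, String>,
) -> CompactionResult {
    let mut result = CompactionResult::default();

    // Step 1: collect turn IDs in order.
    let turn_ids = extract_turn_ids(conversation);

    // Step 2: keep the most recent turns untouched.
    let turn_count = turn_ids.len();
    if turn_count <= self.config.keep_recent_turns {
        return result;
    }
    let compact_up_to = turn_count - self.config.keep_recent_turns;
    let old_turn_ids: HashSet<_> = turn_ids[..compact_up_to].iter().collect();

    // Steps 3–4: rewrite tool results inside the old turns.
    for message in conversation.iter_mut() {
        if old_turn_ids.contains(message.turn_id()) {
            self.compact_message(message, compact_summaries, &mut result);
        }
    }
    result
}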
LLMCompactor
The asynchronous compactor that uses an LLM to summarize conversation history.
Configuration
pub struct LLMCompactorConfig {
pub threshold: f64,
pub keep_recent_turns: usize,
pub summary_system_prompt: Option<String>,
pub max_summary_tokens: Option<i64>,
pub summary_timeout: Option<Duration>,
}
Default Values
impl Default for LLMCompactorConfig {
fn default() -> Self {
Self {
threshold: 0.75,
keep_recent_turns: 3,
summary_system_prompt: None, // Uses built-in prompt
max_summary_tokens: Some(2048),
summary_timeout: Some(Duration::from_secs(60)),
}
}
}
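A configuration with a custom system prompt and a tighter summary budget might look like this (values are illustrative):
let config = LLMCompactorConfig {
    threshold: 0.8,
    keep_recent_turns: 4,
    summary_system_prompt: Some("Summarize decisions and open questions only.".to_string()),
    max_summary_tokens: Some(1024),
    summary_timeout: Some(Duration::from_secs(30)),
};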
Default Summary Prompt
const DEFAULT_SUMMARY_PROMPT: &str = r#"
You are a conversation summarizer. Summarize the following conversation
concisely while preserving:
- Key decisions made
- Important information discovered
- Current task context
- Any pending actions
Be concise but complete. Focus on information needed to continue the conversation.
"#;
Algorithm Steps
- Separate Messages: Divide into old and recent turns (the snippets below use an async fn for readability; the trait method returns a boxed future)
async fn compact_async(
&self,
conversation: Vec<Message>,
compact_summaries: &HashMap<String, String>,
) -> Result<(Vec<Message>, CompactionResult), CompactionError> {
let turn_ids = extract_turn_ids(&conversation);
let turn_count = turn_ids.len();
if turn_count <= self.config.keep_recent_turns {
return Ok((conversation, CompactionResult::default()));
}
let compact_up_to = turn_count - self.config.keep_recent_turns;
let old_turn_ids: HashSet<_> = turn_ids[..compact_up_to].iter().collect();
let (old_messages, recent_messages): (Vec<_>, Vec<_>) = conversation
.into_iter()
.partition(|msg| old_turn_ids.contains(msg.turn_id()));
- Format for Summarization: Convert messages to text
fn format_for_summary(messages: &[Message]) -> String {
let mut output = String::new();
for msg in messages {
match msg {
Message::User(m) => {
output.push_str("User: ");
for block in &m.content {
match block {
ContentBlock::Text(t) => output.push_str(&t.text),
ContentBlock::ToolResult(r) => {
output.push_str(&format!("[Tool result: {}]", r.tool_use_id));
}
_ => {}
}
}
output.push('\n');
}
Message::Assistant(m) => {
output.push_str("Assistant: ");
for block in &m.content {
match block {
ContentBlock::Text(t) => output.push_str(&t.text),
ContentBlock::ToolUse(u) => {
output.push_str(&format!("[Called tool: {}]", u.name));
}
_ => {}
}
}
output.push('\n');
}
}
}
output
}
- Call LLM for Summary: Make API request with timeout
let formatted = format_for_summary(&old_messages);
let prompt = format!("Summarize this conversation:\n\n{}", formatted);
let summary = tokio::time::timeout(
self.config.summary_timeout.unwrap_or(Duration::from_secs(60)),
self.client.complete(&prompt, self.config.max_summary_tokens),
).await??;
- Build Summary Message: Wrap the summary in a user message assigned to turn 0
fn create_summary_message(summary: &str, session_id: i64) -> Message {
Message::User(UserMessage {
id: Uuid::new_v4().to_string(),
session_id,
turn_id: TurnId { owner: "u".to_string(), number: 0 },
created_at: Utc::now(),
content: vec![ContentBlock::Text(TextBlock {
text: format!("[Previous conversation summary]: {}", summary),
})],
})
}
- Reconstruct Conversation: Summary followed by recent messages
let mut new_conversation = Vec::with_capacity(1 + recent_messages.len());
new_conversation.push(summary_message);
new_conversation.extend(recent_messages);
Ok((new_conversation, CompactionResult {
turns_compacted: old_turn_ids.len(),
..Default::default()
}))
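As a concrete illustration (turn counts are hypothetical), a ten-turn conversation with the default keep_recent_turns of 3 collapses its seven oldest turns into a single summary message:
// Before: [turn 1] [turn 2] ... [turn 7] [turn 8] [turn 9] [turn 10]
// After:  [summary of turns 1–7] [turn 8] [turn 9] [turn 10]
//
// Returned metrics: CompactionResult { turns_compacted: 7, ..Default::default() }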
Compaction Trigger
Compaction is checked before each LLM request:
async fn maybe_compact(&self) {
let context_used = self.current_input_tokens.load(Ordering::SeqCst);
let context_limit = self.context_limit.load(Ordering::SeqCst);
// Check synchronous compactor
if let Some(ref compactor) = self.compactor {
if compactor.should_compact(context_used, context_limit) {
let summaries = self.compact_summaries.read().await.clone();
let mut conversation = self.conversation.write().await;
let result = compactor.compact(Arc::make_mut(&mut *conversation), &summaries);
tracing::info!("Compaction result: {:?}", result);
}
}
// Check async compactor
if let Some(ref llm_compactor) = self.llm_compactor {
if llm_compactor.should_compact(context_used, context_limit) {
let summaries = self.compact_summaries.read().await.clone();
let conversation = self.conversation.read().await.as_ref().clone();
match llm_compactor.compact_async(conversation, &summaries).await {
Ok((new_conversation, result)) => {
*self.conversation.write().await = Arc::new(new_conversation);
tracing::info!("LLM compaction result: {:?}", result);
}
Err(e) => {
tracing::warn!("LLM compaction failed: {}", e);
}
}
}
}
}
Force Compaction
Compaction can be triggered manually:
pub async fn force_compact(&self) -> Result<CompactionResult, CompactionError> {
self.compact_now().await
}
This ignores the threshold check and compacts immediately.
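A hypothetical call site, for example to free context before feeding in a known-large input (the `session` variable is illustrative):
// Compact immediately, regardless of current utilization.
let result = session.force_compact().await?;
tracing::info!("Forced compaction: {:?}", result);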
Compact Summaries
Tool results can include pre-computed summaries:
pub struct ToolResultBlock {
pub tool_use_id: String,
pub content: String,
pub is_error: bool,
pub compact_summary: Option<String>, // Pre-computed summary
}
These are stored during tool execution:
async fn store_compact_summaries(&self, summaries: &HashMap<String, String>) {
let mut guard = self.compact_summaries.write().await;
for (tool_use_id, summary) in summaries {
guard.insert(tool_use_id.clone(), summary.clone());
}
}
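A tool that produces large output can attach its own summary, which the Summarize mode later swaps in. A sketch with illustrative field values:
let block = ToolResultBlock {
    tool_use_id: "toolu_123".to_string(),
    content: "src/main.rs\nsrc/lib.rs\n... (410 more lines)".to_string(), // full output, kept until compaction
    is_error: false,
    compact_summary: Some("Listed 412 files under src/; no build artifacts found.".to_string()),
};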
CompactorType Enum
Configuration selects the compaction strategy:
pub enum CompactorType {
Threshold(CompactionConfig),
LLM(LLMCompactorConfig),
}
impl Default for CompactorType {
fn default() -> Self {
CompactorType::Threshold(CompactionConfig::default())
}
}
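Selecting the LLM strategy is then a matter of constructing the corresponding variant (how the value is wired into session configuration is not shown here):
let compactor = CompactorType::LLM(LLMCompactorConfig {
    threshold: 0.8,
    ..Default::default()
});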
Comparison
| Feature | ThresholdCompactor | LLMCompactor |
|---|---|---|
| Speed | Fast (sync) | Slow (async API call) |
| Quality | Mechanical | Intelligent |
| Cost | Free | Uses API tokens |
| Failure mode | Never fails | Can timeout/error |
| Use case | High-volume | Quality-focused |
Error Handling
LLM compaction can fail:
pub enum CompactionError {
LlmError(LlmError),
Timeout,
EmptySummary,
}
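The double ? in the compact_async timeout call relies on conversions into CompactionError; a plausible sketch (the exact impls are assumptions):
impl From<LlmError> for CompactionError {
    fn from(e: LlmError) -> Self {
        CompactionError::LlmError(e)
    }
}

impl From<tokio::time::error::Elapsed> for CompactionError {
    fn from(_: tokio::time::error::Elapsed) -> Self {
        CompactionError::Timeout
    }
}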
Failures are logged but do not block the conversation:
match llm_compactor.compact_async(...).await {
Ok((new_conversation, result)) => { /* apply */ }
Err(e) => {
tracing::warn!("LLM compaction failed: {}", e);
// Continue without compaction
}
}
Next Steps
- Token Tracking - Token counting implementation
- Context Management - Conversation storage
- Session Creation - Session configuration
