Status Tracking

This page documents how session status and token usage are tracked internally. Status tracking provides visibility into session health, resource consumption, and conversation progress.

SessionStatus Structure

The SessionStatus struct provides a snapshot of session state:

#[derive(Debug, Clone)]
pub struct SessionStatus {
    pub session_id: i64,
    pub model: String,
    pub created_at: Instant,
    pub conversation_len: usize,
    pub context_used: i64,
    pub context_limit: i32,
    pub utilization: f64,
    pub total_input: i64,
    pub total_output: i64,
    pub request_count: i64,
}

Field Descriptions

Field               Type      Description
session_id          i64       Unique session identifier
model               String    LLM model name
created_at          Instant   Session creation time
conversation_len    usize     Number of messages in history
context_used        i64       Input tokens consumed
context_limit       i32       Maximum context window
utilization         f64       Ratio of used to available context
total_input         i64       Cumulative input tokens
total_output        i64       Cumulative output tokens
request_count       i64       Number of LLM requests made
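
As a rough illustration of how a snapshot might be consumed, the helper below renders a SessionStatus for logging. The format_status function is hypothetical and not part of the session API; it only uses the fields listed above.

// Hypothetical helper (not part of LLMSession): render a status snapshot for logs.
fn format_status(status: &SessionStatus) -> String {
    format!(
        "session {} [{}]: {} messages, {}/{} context tokens ({:.1}% used), {} requests",
        status.session_id,
        status.model,
        status.conversation_len,
        status.context_used,
        status.context_limit,
        status.utilization * 100.0,
        status.request_count,
    )
}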

Retrieving Session Status

Status is computed on demand from atomic fields:

impl LLMSession {
    pub async fn status(&self) -> SessionStatus {
        let conversation = self.conversation.read().await;
        let context_used = self.current_input_tokens.load(Ordering::SeqCst);
        let context_limit = self.context_limit.load(Ordering::SeqCst);

        SessionStatus {
            session_id: self.id(),
            model: self.config.model.clone(),
            created_at: self.created_at,
            conversation_len: conversation.len(),
            context_used,
            context_limit,
            utilization: context_used as f64 / context_limit as f64,
            total_input: self.current_input_tokens.load(Ordering::SeqCst),
            total_output: self.current_output_tokens.load(Ordering::SeqCst),
            request_count: self.request_count.load(Ordering::SeqCst),
        }
    }
}

TokenUsage Structure

Token usage provides detailed consumption metrics:

#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
    pub total_input_tokens: i64,
    pub total_output_tokens: i64,
    pub request_count: i64,
    pub last_input_tokens: i64,
    pub last_output_tokens: i64,
}

Usage Fields

Field                  Description
total_input_tokens     Cumulative input tokens across all requests
total_output_tokens    Cumulative output tokens across all requests
request_count          Number of LLM API calls
last_input_tokens      Input tokens from most recent request
last_output_tokens     Output tokens from most recent request
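
A sketch of how these fields relate per request: cumulative counters grow, while the last_* fields are overwritten each time. The record_request helper below is an assumption for illustration, not the confirmed update path.

impl TokenUsage {
    // Illustrative helper showing the field semantics described above;
    // the actual update logic may live elsewhere in the session.
    pub fn record_request(&mut self, input_tokens: i64, output_tokens: i64) {
        self.total_input_tokens += input_tokens;
        self.total_output_tokens += output_tokens;
        self.request_count += 1;
        self.last_input_tokens = input_tokens;    // overwritten on every request
        self.last_output_tokens = output_tokens;
    }
}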

Status Update Events

Status changes are communicated through the event system:

pub enum LLMResponseType {
    // ... other variants
    TokenUpdate,
}

Token updates are sent during streaming responses:

StreamEvent::MessageDelta { stop_reason, usage } => {
    if let Some(usage) = usage {
        self.current_input_tokens.store(usage.input_tokens as i64, Ordering::SeqCst);
        self.current_output_tokens.store(usage.output_tokens as i64, Ordering::SeqCst);

        let payload = FromLLMPayload {
            response_type: LLMResponseType::TokenUpdate,
            input_tokens: usage.input_tokens as i64,
            output_tokens: usage.output_tokens as i64,
            session_id: self.id(),
            // ...
        };
        let _ = self.from_llm.send(payload).await;
    }
}

Controller Event Propagation

The controller converts token updates to UI events:

pub enum ControllerEvent {
    // ... other variants
    TokenUpdate {
        session_id: i64,
        input_tokens: i64,
        output_tokens: i64,
    },
}

This enables real-time token display in the UI during streaming.
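
A minimal sketch of that conversion, assuming the controller receives FromLLMPayload values from the session channel; the handle_payload function and the ui_tx sender are illustrative, and only fields shown earlier on this page are used.

// Illustrative only: translating a token update payload into a UI event.
async fn handle_payload(
    payload: FromLLMPayload,
    ui_tx: &tokio::sync::mpsc::Sender<ControllerEvent>,
) {
    if let LLMResponseType::TokenUpdate = &payload.response_type {
        let _ = ui_tx
            .send(ControllerEvent::TokenUpdate {
                session_id: payload.session_id,
                input_tokens: payload.input_tokens,
                output_tokens: payload.output_tokens,
            })
            .await;
    }
}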

Atomic State Fields

Session state uses atomic types for thread-safe access:

pub struct LLMSession {
    // Token tracking
    current_input_tokens: AtomicI64,
    current_output_tokens: AtomicI64,
    request_count: AtomicI64,

    // Context limits
    context_limit: AtomicI32,

    // Session state
    shutdown: AtomicBool,
}

All atomic operations use Ordering::SeqCst for consistency:

// Read current value
let tokens = self.current_input_tokens.load(Ordering::SeqCst);

// Update value
self.current_input_tokens.store(new_value, Ordering::SeqCst);

// Atomic increment
self.request_count.fetch_add(1, Ordering::SeqCst);
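
Because the counters are plain atomics, multiple tasks can update them without taking a lock. The standalone example below is not taken from the session code; it only demonstrates the same lock-free counter pattern.

use std::sync::Arc;
use std::sync::atomic::{AtomicI64, Ordering};

// Standalone illustration of the lock-free counter pattern used above.
fn main() {
    let request_count = Arc::new(AtomicI64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&request_count);
            std::thread::spawn(move || {
                for _ in 0..100 {
                    // fetch_add returns the previous value; only the side effect matters here
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    assert_eq!(request_count.load(Ordering::SeqCst), 400);
}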

Utilization Calculation

Context utilization determines when compaction triggers:

fn calculate_utilization(&self) -> f64 {
    let context_used = self.current_input_tokens.load(Ordering::SeqCst);
    let context_limit = self.context_limit.load(Ordering::SeqCst);
    context_used as f64 / context_limit as f64
}

Utilization thresholds:

  • Below 0.75: Normal operation
  • 0.75 to 1.0: Compaction triggered (if configured)
  • At 1.0: Context window full
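
A sketch of how these thresholds might be applied; the should_compact helper and the compaction_threshold parameter are assumptions for illustration, not the confirmed trigger logic.

// Illustrative threshold check; the actual compaction trigger may differ.
fn should_compact(utilization: f64, compaction_threshold: f64) -> bool {
    // e.g. compaction_threshold = 0.75: compaction is considered once
    // utilization reaches the threshold and before the window is full.
    utilization >= compaction_threshold && utilization < 1.0
}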

Token Usage Tracker

The TokenUsageTracker aggregates usage across sessions:

pub struct TokenUsageTracker {
    tokens_per_session: RwLock<HashMap<i64, TokenMeter>>,
    tokens_per_model: RwLock<HashMap<String, TokenMeter>>,
    total_usage: RwLock<TokenMeter>,
}

Tracker Operations

impl TokenUsageTracker {
    pub async fn increment(
        &self,
        session_id: i64,
        model: &str,
        input_tokens: i64,
        output_tokens: i64,
    ) {
        // Update session-level usage
        let mut sessions = self.tokens_per_session.write().await;
        sessions
            .entry(session_id)
            .or_default()
            .add(input_tokens, output_tokens);

        // Update model-level usage
        let mut models = self.tokens_per_model.write().await;
        models
            .entry(model.to_string())
            .or_default()
            .add(input_tokens, output_tokens);

        // Update total usage
        let mut total = self.total_usage.write().await;
        total.add(input_tokens, output_tokens);
    }
}
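
The query methods used later on this page (get_session_usage and get_model_usage) could be implemented as simple reads of the stored meters. The bodies below are a sketch under that assumption, not the confirmed implementation.

impl TokenUsageTracker {
    // Sketch: return a copy of the per-session meter, or a zeroed meter if unseen.
    pub async fn get_session_usage(&self, session_id: i64) -> TokenMeter {
        self.tokens_per_session
            .read()
            .await
            .get(&session_id)
            .cloned()
            .unwrap_or_default()
    }

    // Sketch: same pattern, keyed by model name.
    pub async fn get_model_usage(&self, model: &str) -> TokenMeter {
        self.tokens_per_model
            .read()
            .await
            .get(model)
            .cloned()
            .unwrap_or_default()
    }
}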

TokenMeter Structure

#[derive(Debug, Clone, Default)]
pub struct TokenMeter {
    pub input_tokens: i64,
    pub output_tokens: i64,
    pub request_count: i64,
}

impl TokenMeter {
    pub fn add(&mut self, input: i64, output: i64) {
        self.input_tokens += input;
        self.output_tokens += output;
        self.request_count += 1;
    }

    pub fn total(&self) -> i64 {
        self.input_tokens + self.output_tokens
    }
}
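
For example, two recorded requests accumulate as follows (the token values are illustrative):

let mut meter = TokenMeter::default();
meter.add(1_200, 300);   // first request
meter.add(800, 150);     // second request

assert_eq!(meter.input_tokens, 2_000);
assert_eq!(meter.output_tokens, 450);
assert_eq!(meter.request_count, 2);
assert_eq!(meter.total(), 2_450);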

Message-Level Token Storage

Assistant messages store token counts for historical tracking:

pub struct AssistantMessage {
    // ... other fields
    pub input_tokens: i64,
    pub output_tokens: i64,
    pub cache_read_tokens: i64,
    pub cache_write_tokens: i64,
}

This enables:

  • Per-message token auditing
  • Historical usage analysis
  • Cache hit ratio calculation
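
A sketch of the cache hit ratio calculation from these fields. It assumes cache_read_tokens counts the cached portion of input-side tokens and that input_tokens excludes cache reads; whether that holds depends on the provider, so the cache_hit_ratio helper is illustrative only.

// Illustrative only: fraction of input-side tokens served from cache for one message.
fn cache_hit_ratio(msg: &AssistantMessage) -> f64 {
    let total_input_side = msg.input_tokens + msg.cache_read_tokens + msg.cache_write_tokens;
    if total_input_side == 0 {
        return 0.0;
    }
    msg.cache_read_tokens as f64 / total_input_side as f64
}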

Status Queries

Common status query patterns:

// Get current utilization
let status = session.status().await;
if status.utilization > 0.9 {
    warn!("Session {} near context limit", status.session_id);
}

// Check total usage
let tracker = controller.usage_tracker();
let session_usage = tracker.get_session_usage(session_id).await;
let model_usage = tracker.get_model_usage("claude-sonnet-4-20250514").await;

// Monitor request rate
let requests = session.request_count();

Thread Safety

Status tracking is designed for concurrent access:

Component         Synchronization     Purpose
Token counters    AtomicI64           Lock-free updates
Request count     AtomicI64           Concurrent increments
Context limit     AtomicI32           Runtime modification
Usage tracker     RwLock<HashMap>     Multi-session aggregation

Next Steps