Status Tracking
This page documents how session status and token usage are tracked internally. Status tracking provides visibility into session health, resource consumption, and conversation progress.
SessionStatus Structure
The SessionStatus struct provides a snapshot of session state:
```rust
#[derive(Debug, Clone)]
pub struct SessionStatus {
    pub session_id: i64,
    pub model: String,
    pub created_at: Instant,
    pub conversation_len: usize,
    pub context_used: i64,
    pub context_limit: i32,
    pub utilization: f64,
    pub total_input: i64,
    pub total_output: i64,
    pub request_count: i64,
}
```
Field Descriptions
| Field | Type | Description |
|---|---|---|
| `session_id` | `i64` | Unique session identifier |
| `model` | `String` | LLM model name |
| `created_at` | `Instant` | Session creation time |
| `conversation_len` | `usize` | Number of messages in history |
| `context_used` | `i64` | Input tokens consumed |
| `context_limit` | `i32` | Maximum context window |
| `utilization` | `f64` | Ratio of used to available context |
| `total_input` | `i64` | Cumulative input tokens |
| `total_output` | `i64` | Cumulative output tokens |
| `request_count` | `i64` | Number of LLM requests made |
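The `utilization` field is derived from `context_used` and `context_limit`. A minimal standalone sketch of that relationship (`format_status` is a hypothetical helper for illustration, not part of the documented API):

```rust
// Derive utilization the same way status() does: used tokens over the
// context window size.
fn utilization(context_used: i64, context_limit: i32) -> f64 {
    context_used as f64 / context_limit as f64
}

// Hypothetical helper: render the fields as a human-readable status line.
fn format_status(context_used: i64, context_limit: i32) -> String {
    format!(
        "{}/{} tokens ({:.0}%)",
        context_used,
        context_limit,
        utilization(context_used, context_limit) * 100.0
    )
}

fn main() {
    // 150_000 of 200_000 tokens used is 75% utilization.
    println!("{}", format_status(150_000, 200_000));
}
```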
Retrieving Session Status
Status is computed on demand from atomic fields:
```rust
impl LLMSession {
    pub async fn status(&self) -> SessionStatus {
        let conversation = self.conversation.read().await;
        let context_used = self.current_input_tokens.load(Ordering::SeqCst);
        let context_limit = self.context_limit.load(Ordering::SeqCst);

        SessionStatus {
            session_id: self.id(),
            model: self.config.model.clone(),
            created_at: self.created_at,
            conversation_len: conversation.len(),
            context_used,
            context_limit,
            utilization: context_used as f64 / context_limit as f64,
            total_input: self.current_input_tokens.load(Ordering::SeqCst),
            total_output: self.current_output_tokens.load(Ordering::SeqCst),
            request_count: self.request_count.load(Ordering::SeqCst),
        }
    }
}
```
TokenUsage Structure
Token usage provides detailed consumption metrics:
```rust
#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
    pub total_input_tokens: i64,
    pub total_output_tokens: i64,
    pub request_count: i64,
    pub last_input_tokens: i64,
    pub last_output_tokens: i64,
}
```
Usage Fields
| Field | Description |
|---|---|
| `total_input_tokens` | Cumulative input tokens across all requests |
| `total_output_tokens` | Cumulative output tokens across all requests |
| `request_count` | Number of LLM API calls |
| `last_input_tokens` | Input tokens from most recent request |
| `last_output_tokens` | Output tokens from most recent request |
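The relationship between the `total_*` and `last_*` fields can be sketched with a hypothetical `record` helper (in practice the struct is updated by the session internals, not by callers):

```rust
#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
    pub total_input_tokens: i64,
    pub total_output_tokens: i64,
    pub request_count: i64,
    pub last_input_tokens: i64,
    pub last_output_tokens: i64,
}

impl TokenUsage {
    // Hypothetical helper showing how the fields relate per request.
    fn record(&mut self, input: i64, output: i64) {
        self.total_input_tokens += input;
        self.total_output_tokens += output;
        self.request_count += 1;
        // The last_* fields always reflect only the most recent request.
        self.last_input_tokens = input;
        self.last_output_tokens = output;
    }
}

fn main() {
    let mut usage = TokenUsage::default();
    usage.record(1_000, 200);
    usage.record(1_500, 300);
    assert_eq!(usage.total_input_tokens, 2_500);
    assert_eq!(usage.last_input_tokens, 1_500);
    assert_eq!(usage.request_count, 2);
}
```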
Status Update Events
Status changes are communicated through the event system:
```rust
pub enum LLMResponseType {
    // ... other variants
    TokenUpdate,
}
```
Token updates are sent during streaming responses:
```rust
StreamEvent::MessageDelta { stop_reason, usage } => {
    if let Some(usage) = usage {
        self.current_input_tokens.store(usage.input_tokens as i64, Ordering::SeqCst);
        self.current_output_tokens.store(usage.output_tokens as i64, Ordering::SeqCst);

        let payload = FromLLMPayload {
            response_type: LLMResponseType::TokenUpdate,
            input_tokens: usage.input_tokens as i64,
            output_tokens: usage.output_tokens as i64,
            session_id: self.id(),
            // ...
        };
        let _ = self.from_llm.send(payload).await;
    }
}
```
Controller Event Propagation
The controller converts token updates to UI events:
```rust
pub enum ControllerEvent {
    // ... other variants
    TokenUpdate {
        session_id: i64,
        input_tokens: i64,
        output_tokens: i64,
    },
}
```
This enables real-time token display in the UI during streaming.
Atomic State Fields
Session state uses atomic types for thread-safe access:
```rust
pub struct LLMSession {
    // Token tracking
    current_input_tokens: AtomicI64,
    current_output_tokens: AtomicI64,
    request_count: AtomicI64,

    // Context limits
    context_limit: AtomicI32,

    // Session state
    shutdown: AtomicBool,
}
```
All atomic operations use Ordering::SeqCst for consistency:
```rust
// Read current value
let tokens = self.current_input_tokens.load(Ordering::SeqCst);

// Update value
self.current_input_tokens.store(new_value, Ordering::SeqCst);

// Atomic increment
self.request_count.fetch_add(1, Ordering::SeqCst);
```
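A standalone demonstration of the `fetch_add` pattern above: ten threads increment a shared `AtomicI64` concurrently, and because the read-modify-write is atomic, no increments are lost.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

// Spawn ten threads that each perform 1_000 atomic increments.
fn concurrent_count() -> i64 {
    let count = Arc::new(AtomicI64::new(0));
    let handles: Vec<_> = (0..10)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    count.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    count.load(Ordering::SeqCst)
}

fn main() {
    // Every increment is observed: 10 threads * 1_000 = 10_000.
    assert_eq!(concurrent_count(), 10_000);
}
```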
Utilization Calculation
Context utilization determines when compaction triggers:
```rust
fn calculate_utilization(&self) -> f64 {
    let context_used = self.current_input_tokens.load(Ordering::SeqCst);
    let context_limit = self.context_limit.load(Ordering::SeqCst);
    context_used as f64 / context_limit as f64
}
```
Utilization thresholds:
- Below 0.75: Normal operation
- 0.75 to 1.0: Compaction triggered (if configured)
- At 1.0: Context window full
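The thresholds above can be sketched as a small classification function. `CompactionAction` and `check_utilization` are hypothetical names for illustration; the actual trigger lives in the compaction logic.

```rust
// Hypothetical classification of a utilization value against the
// documented thresholds.
#[derive(Debug, PartialEq)]
enum CompactionAction {
    None,    // below 0.75: normal operation
    Compact, // 0.75 to 1.0: compaction triggered (if configured)
    Full,    // at 1.0: context window full
}

fn check_utilization(utilization: f64) -> CompactionAction {
    if utilization >= 1.0 {
        CompactionAction::Full
    } else if utilization >= 0.75 {
        CompactionAction::Compact
    } else {
        CompactionAction::None
    }
}

fn main() {
    assert_eq!(check_utilization(0.50), CompactionAction::None);
    assert_eq!(check_utilization(0.80), CompactionAction::Compact);
    assert_eq!(check_utilization(1.00), CompactionAction::Full);
}
```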
Token Usage Tracker
The TokenUsageTracker aggregates usage across sessions:
```rust
pub struct TokenUsageTracker {
    tokens_per_session: RwLock<HashMap<i64, TokenMeter>>,
    tokens_per_model: RwLock<HashMap<String, TokenMeter>>,
    total_usage: RwLock<TokenMeter>,
}
```
Tracker Operations
```rust
impl TokenUsageTracker {
    pub async fn increment(
        &self,
        session_id: i64,
        model: &str,
        input_tokens: i64,
        output_tokens: i64,
    ) {
        // Update session-level usage
        let mut sessions = self.tokens_per_session.write().await;
        sessions
            .entry(session_id)
            .or_default()
            .add(input_tokens, output_tokens);

        // Update model-level usage
        let mut models = self.tokens_per_model.write().await;
        models
            .entry(model.to_string())
            .or_default()
            .add(input_tokens, output_tokens);

        // Update total usage
        let mut total = self.total_usage.write().await;
        total.add(input_tokens, output_tokens);
    }
}
```
TokenMeter Structure
```rust
#[derive(Debug, Clone, Default)]
pub struct TokenMeter {
    pub input_tokens: i64,
    pub output_tokens: i64,
    pub request_count: i64,
}

impl TokenMeter {
    pub fn add(&mut self, input: i64, output: i64) {
        self.input_tokens += input;
        self.output_tokens += output;
        self.request_count += 1;
    }

    pub fn total(&self) -> i64 {
        self.input_tokens + self.output_tokens
    }
}
```
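Putting the two together, a simplified sketch of the aggregation pattern, using the synchronous `std::sync::RwLock` in place of the async tokio lock so it runs standalone:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, Default)]
struct TokenMeter {
    input_tokens: i64,
    output_tokens: i64,
    request_count: i64,
}

impl TokenMeter {
    fn add(&mut self, input: i64, output: i64) {
        self.input_tokens += input;
        self.output_tokens += output;
        self.request_count += 1;
    }
}

#[derive(Default)]
struct TokenUsageTracker {
    tokens_per_session: RwLock<HashMap<i64, TokenMeter>>,
    tokens_per_model: RwLock<HashMap<String, TokenMeter>>,
    total_usage: RwLock<TokenMeter>,
}

impl TokenUsageTracker {
    // Same three-level fan-out as the async version: per-session,
    // per-model, and global totals each get the increment.
    fn increment(&self, session_id: i64, model: &str, input: i64, output: i64) {
        self.tokens_per_session
            .write()
            .unwrap()
            .entry(session_id)
            .or_default()
            .add(input, output);
        self.tokens_per_model
            .write()
            .unwrap()
            .entry(model.to_string())
            .or_default()
            .add(input, output);
        self.total_usage.write().unwrap().add(input, output);
    }
}

fn main() {
    let tracker = TokenUsageTracker::default();
    tracker.increment(1, "model-a", 100, 20);
    tracker.increment(2, "model-a", 50, 10);

    let total = *tracker.total_usage.read().unwrap();
    assert_eq!(total.input_tokens, 150);
    assert_eq!(total.output_tokens, 30);
    assert_eq!(total.request_count, 2);

    // Both sessions used the same model, so the model meter saw both calls.
    let per_model = tracker.tokens_per_model.read().unwrap();
    assert_eq!(per_model["model-a"].request_count, 2);
}
```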
Message-Level Token Storage
Assistant messages store token counts for historical tracking:
```rust
pub struct AssistantMessage {
    // ... other fields
    pub input_tokens: i64,
    pub output_tokens: i64,
    pub cache_read_tokens: i64,
    pub cache_write_tokens: i64,
}
```
This enables:
- Per-message token auditing
- Historical usage analysis
- Cache hit ratio calculation
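As a sketch of the cache hit ratio calculation, assuming `input_tokens` counts uncached input and `cache_read_tokens` counts input served from cache (the exact semantics depend on the provider's usage reporting):

```rust
#[allow(dead_code)]
struct AssistantMessage {
    input_tokens: i64,
    output_tokens: i64,
    cache_read_tokens: i64,
    cache_write_tokens: i64,
}

// Fraction of input-side tokens served from cache across a set of
// messages. The denominator (uncached + cached input) is an assumption
// for illustration.
fn cache_hit_ratio(messages: &[AssistantMessage]) -> f64 {
    let reads: i64 = messages.iter().map(|m| m.cache_read_tokens).sum();
    let total_input: i64 = messages
        .iter()
        .map(|m| m.input_tokens + m.cache_read_tokens)
        .sum();
    if total_input == 0 {
        0.0
    } else {
        reads as f64 / total_input as f64
    }
}

fn main() {
    let messages = vec![
        AssistantMessage {
            input_tokens: 100,
            output_tokens: 40,
            cache_read_tokens: 300,
            cache_write_tokens: 0,
        },
        AssistantMessage {
            input_tokens: 200,
            output_tokens: 60,
            cache_read_tokens: 200,
            cache_write_tokens: 0,
        },
    ];
    // 500 cached of 800 input-side tokens.
    assert!((cache_hit_ratio(&messages) - 0.625).abs() < 1e-9);
}
```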
Status Queries
Common status query patterns:
```rust
// Get current utilization
let status = session.status().await;
if status.utilization > 0.9 {
    warn!("Session {} near context limit", status.session_id);
}

// Check total usage
let tracker = controller.usage_tracker();
let session_usage = tracker.get_session_usage(session_id).await;
let model_usage = tracker.get_model_usage("claude-sonnet-4-20250514").await;

// Monitor request rate
let requests = session.request_count();
```
Thread Safety
Status tracking is designed for concurrent access:
| Component | Synchronization | Purpose |
|---|---|---|
| Token counters | `AtomicI64` | Lock-free updates |
| Request count | `AtomicI64` | Concurrent increments |
| Context limit | `AtomicI32` | Runtime modification |
| Usage tracker | `RwLock<HashMap>` | Multi-session aggregation |
Next Steps
- Token Tracking - Detailed token counting
- Context Management - Conversation storage
- Compaction Algorithm - Automatic context reduction
