While the Rust core is extremely fast, the bottleneck in AI agents is almost always network I/O and LLM token-generation speed. Efficient local processing still matters, though: it keeps the UI buttery smooth while the network does its slow work.

UI Responsiveness

  • Off-Main-Thread Rendering: The TUI rendering loop should never block on network requests. All I/O must happen in background tasks.
  • Debouncing: Input processing (like syntax highlighting in the chat box) should be debounced to avoid needless CPU spikes during rapid typing.
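The debouncing pattern above can be sketched with a plain std channel for clarity; in Agent Air itself this would live in a background Tokio task feeding the render loop. The function name and quiet-period duration are illustrative, not part of the actual codebase:

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::Duration;

// Hypothetical sketch: debounce rapid input events so expensive work
// (e.g. syntax highlighting) runs only after a quiet period, instead of
// on every keystroke.
fn debounce(rx: Receiver<String>, quiet: Duration) -> Vec<String> {
    let mut processed = Vec::new();
    let mut latest: Option<String> = None;
    loop {
        match rx.recv_timeout(quiet) {
            // New keystroke arrived: remember it and restart the quiet-period timer.
            Ok(text) => latest = Some(text),
            // Quiet period elapsed: process only the most recent input.
            Err(RecvTimeoutError::Timeout) => {
                if let Some(text) = latest.take() {
                    processed.push(text);
                }
            }
            // Input source closed: flush any pending input and stop.
            Err(RecvTimeoutError::Disconnected) => {
                if let Some(text) = latest.take() {
                    processed.push(text);
                }
                return processed;
            }
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    // Simulate keystrokes arriving faster than the quiet period.
    for text in ["h", "he", "hel", "hello"] {
        tx.send(text.to_string()).unwrap();
    }
    drop(tx);
    // Only the final state is highlighted, not every intermediate keystroke.
    println!("{:?}", debounce(rx, Duration::from_millis(50)));
}
```

The same shape translates directly to `tokio::select!` with a `sleep` arm once the work moves onto the async runtime.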

Token Optimization

“Performance” also means getting the answer faster and cheaper.

  1. Prompt Caching: Some providers support caching prefixes of the prompt. Agent Air can structure system prompts to maximize cache hits.
  2. Context Compaction: Aggressively pruning old or irrelevant messages shrinks the payload sent to the API, reducing upload time and Time-To-First-Byte (TTFB), since the provider has fewer prompt tokens to process.
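Both techniques can be combined: keep the system prompt as a stable prefix (so provider-side prompt caches keep hitting) and drop the oldest chat messages first. A minimal sketch, assuming a crude characters-per-token heuristic in place of a real tokenizer; the `Message` shape, `compact` function, and budget values are all hypothetical:

```rust
// Hypothetical sketch: compact a conversation to fit a token budget.
// The system prompt always survives and stays first (a stable prefix
// maximizes prompt-cache hits); the oldest chat messages are dropped first.

#[derive(Clone, Debug)]
struct Message {
    role: &'static str, // "system", "user", or "assistant"
    content: String,
}

fn estimate_tokens(m: &Message) -> usize {
    // Crude stand-in for a real tokenizer: ~4 characters per token.
    m.content.len() / 4 + 1
}

fn compact(history: &[Message], budget: usize) -> Vec<Message> {
    let (system, chat): (Vec<_>, Vec<_>) =
        history.iter().cloned().partition(|m| m.role == "system");
    let mut used: usize = system.iter().map(estimate_tokens).sum();
    let mut kept = Vec::new();
    // Walk newest-to-oldest, keeping messages until the budget is spent.
    for msg in chat.into_iter().rev() {
        let cost = estimate_tokens(&msg);
        if used + cost > budget {
            break; // everything older than this is dropped
        }
        used += cost;
        kept.push(msg);
    }
    kept.reverse(); // restore chronological order
    let mut out = system;
    out.extend(kept);
    out
}

fn main() {
    let history = vec![
        Message { role: "system", content: "You are Agent Air.".into() },
        Message { role: "user", content: "First question about the project".into() },
        Message { role: "assistant", content: "An answer".into() },
        Message { role: "user", content: "Latest question".into() },
    ];
    for m in compact(&history, 12) {
        println!("{}: {}", m.role, m.content);
    }
}
```

Note the tension between the two optimizations: compaction that rewrites the front of the conversation invalidates cached prefixes, which is why dropping from the oldest *chat* messages while pinning the system prompt is the safer order.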

Async Runtime

For high-throughput agents (e.g., a server handling many concurrent sessions), tuning the Tokio runtime (worker threads, blocking-thread pool size) in main.rs is essential.

#[tokio::main(worker_threads = 4)] // Tune based on workload; defaults to one worker per CPU core
async fn main() {
    // ... agent setup and event loop ...
}