Retry Logic

This page documents how the HTTP client handles transient failures through automatic retries with exponential backoff. Retry logic ensures reliable communication with LLM APIs despite rate limiting and temporary outages.

Configuration Constants

/// Maximum number of retries for rate limit errors
const MAX_RETRIES: u32 = 5;

/// Base delay for exponential backoff (in milliseconds)
const BASE_DELAY_MS: u64 = 1000;

/// Maximum delay cap (in milliseconds)
const MAX_DELAY_MS: u64 = 60000;
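With these constants, a request that is rate limited on every attempt sleeps through attempts 0-4 (the final attempt returns instead of sleeping), for a worst-case total of 31 seconds before jitter. A quick sketch of that arithmetic (`worst_case_sleep_ms` is an illustrative helper, not part of the client):

```rust
// Constants from the retry configuration above.
const MAX_RETRIES: u32 = 5;
const BASE_DELAY_MS: u64 = 1_000;
const MAX_DELAY_MS: u64 = 60_000;

/// Total sleep (before jitter) if every attempt is rate limited:
/// attempts 0..MAX_RETRIES sleep; the final attempt does not.
fn worst_case_sleep_ms() -> u64 {
    (0..MAX_RETRIES)
        .map(|attempt| (BASE_DELAY_MS << attempt).min(MAX_DELAY_MS))
        .sum()
}

fn main() {
    // 1 + 2 + 4 + 8 + 16 seconds
    assert_eq!(worst_case_sleep_ms(), 31_000);
}
```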

Retryable Status Codes

Only specific status codes trigger retries:

| Status Code | Name                   | Retry |
|-------------|------------------------|-------|
| 429         | Too Many Requests      | Yes   |
| 529         | Overloaded (Anthropic) | Yes   |
| 4xx (other) | Client Error           | No    |
| 5xx (other) | Server Error           | No    |

Other errors (network failures, timeouts) do not trigger automatic retries.
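The table above reduces to a small predicate. A minimal sketch, using raw `u16` codes rather than the client's `StatusCode` type (`is_retryable` is an illustrative name, not part of the client API):

```rust
/// Sketch of the retry decision: only 429 and 529 enter the backoff loop.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 529)
}

fn main() {
    assert!(is_retryable(429));
    assert!(is_retryable(529));
    assert!(!is_retryable(500)); // other 5xx: no retry
    assert!(!is_retryable(400)); // other 4xx: no retry
}
```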

Retry Loop

The POST method includes automatic retry handling:

pub async fn post(
    &self,
    uri: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> Result<String, LlmError> {
    let mut last_error = None;

    for attempt in 0..=MAX_RETRIES {
        // Build and send request
        let request = /* ... */;
        let res = self.client.request(request).await?;
        let status = res.status();
        let response_text = /* read body */;

        // Check for rate limit (429) or overloaded (529)
        if status == StatusCode::TOO_MANY_REQUESTS || status.as_u16() == 529 {
            if attempt < MAX_RETRIES {
                let delay = calculate_backoff_delay(attempt, &response_text);
                tracing::warn!(
                    status = %status,
                    attempt = attempt + 1,
                    max_retries = MAX_RETRIES,
                    delay_ms = delay.as_millis(),
                    "Rate limited, retrying after delay"
                );
                tokio::time::sleep(delay).await;
                last_error = Some(LlmError::new(
                    format!("HTTP_{}", status.as_u16()),
                    response_text,
                ));
                continue;
            }
        }

        // Success or non-retryable error
        return Ok(response_text);
    }

    // All retries exhausted
    Err(last_error.unwrap_or_else(|| {
        LlmError::new("RATE_LIMIT_EXHAUSTED", "Rate limit retries exhausted")
    }))
}

Exponential Backoff

Delay increases exponentially with each attempt:

fn calculate_backoff_delay(attempt: u32, response_text: &str) -> Duration {
    // Try to extract retry-after from error response
    if let Some(seconds) = extract_retry_after(response_text) {
        return Duration::from_secs(seconds);
    }

    // Exponential backoff: base * 2^attempt
    let exponential_delay = BASE_DELAY_MS * (1 << attempt);
    let capped_delay = exponential_delay.min(MAX_DELAY_MS);

    // Add random jitter (0-25% of delay)
    let jitter = (capped_delay as f64 * 0.25 * rand_factor()) as u64;
    Duration::from_millis(capped_delay + jitter)
}

Delay Progression

| Attempt | Base Delay | With 25% Jitter      |
|---------|------------|----------------------|
| 0       | 1,000 ms   | 1,000 - 1,250 ms     |
| 1       | 2,000 ms   | 2,000 - 2,500 ms     |
| 2       | 4,000 ms   | 4,000 - 5,000 ms     |
| 3       | 8,000 ms   | 8,000 - 10,000 ms    |
| 4       | 16,000 ms  | 16,000 - 20,000 ms   |
| 5       | 32,000 ms  | 32,000 - 40,000 ms   |

The exponential delay is capped at 60,000 ms (60 seconds). Note that the cap applies only to the computed backoff; a retry-after value extracted from the response is honored as-is.
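The progression and the cap can be checked with a small sketch of the per-attempt computation, jitter excluded (`base_delay_ms` is a hypothetical helper mirroring the exponential path above):

```rust
const BASE_DELAY_MS: u64 = 1_000;
const MAX_DELAY_MS: u64 = 60_000;

/// Per-attempt base delay (no jitter): base * 2^attempt, capped.
fn base_delay_ms(attempt: u32) -> u64 {
    (BASE_DELAY_MS << attempt).min(MAX_DELAY_MS)
}

fn main() {
    assert_eq!(base_delay_ms(0), 1_000);
    assert_eq!(base_delay_ms(5), 32_000);
    // The 60 s cap only bites from attempt 6 onward (64 s uncapped).
    assert_eq!(base_delay_ms(6), 60_000);
}
```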

Jitter

Random jitter prevents thundering herd problems:

fn rand_factor() -> f64 {
    use std::time::SystemTime;
    // Cheap pseudo-randomness: derive a factor in [0, 1) from the clock's
    // sub-second nanoseconds, avoiding a dependency on an RNG crate.
    let nanos = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .subsec_nanos();
    (nanos % 1000) as f64 / 1000.0
}

Jitter adds 0-25% additional delay, randomized per request.
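Because the factor is always in [0, 1), the jittered delay always lands in [delay, 1.25 × delay]. A quick sketch confirming the bound (reusing the clock-based factor shown above):

```rust
use std::time::SystemTime;

// Same clock-derived factor as above; always in [0, 1).
fn rand_factor() -> f64 {
    let nanos = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .subsec_nanos();
    (nanos % 1000) as f64 / 1000.0
}

fn main() {
    let capped_delay: u64 = 4_000;
    for _ in 0..1_000 {
        let jitter = (capped_delay as f64 * 0.25 * rand_factor()) as u64;
        let total = capped_delay + jitter;
        // The jittered delay never exceeds 1.25x the base delay.
        assert!(total >= capped_delay && total <= capped_delay + capped_delay / 4);
    }
}
```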

Retry-After Extraction

The client respects retry-after hints from the API:

fn extract_retry_after(response_text: &str) -> Option<u64> {
    let lower = response_text.to_lowercase();

    // Pattern: "retry after X seconds"
    if let Some(pos) = lower.find("retry after ") {
        let after_pos = pos + "retry after ".len();
        let remaining = &lower[after_pos..];
        if let Some(space_pos) = remaining.find(' ') {
            if let Ok(seconds) = remaining[..space_pos].trim().parse::<u64>() {
                return Some(seconds);
            }
        }
    }

    // Pattern: "retry_after": X (JSON field)
    if let Some(pos) = lower.find("\"retry_after\":") {
        let after_pos = pos + "\"retry_after\":".len();
        let remaining = &lower[after_pos..];
        let trimmed = remaining.trim_start();
        let num_str: String = trimmed.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(seconds) = num_str.parse::<u64>() {
            return Some(seconds);
        }
    }

    None
}

Supported Formats

Natural language (Anthropic error messages):

Please retry after 30 seconds.

JSON field:

{"error": {"type": "rate_limit", "retry_after": 30}}
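A condensed, self-contained sketch of the extraction logic above, exercised against both supported formats:

```rust
/// Condensed version of extract_retry_after: natural-language hint first,
/// then a JSON retry_after field; None if neither pattern matches.
fn extract_retry_after(text: &str) -> Option<u64> {
    let lower = text.to_lowercase();
    // Pattern: "retry after X seconds"
    if let Some(pos) = lower.find("retry after ") {
        let rest = &lower[pos + "retry after ".len()..];
        if let Some(sp) = rest.find(' ') {
            if let Ok(seconds) = rest[..sp].trim().parse() {
                return Some(seconds);
            }
        }
    }
    // Pattern: "retry_after": X
    if let Some(pos) = lower.find("\"retry_after\":") {
        let rest = lower[pos + "\"retry_after\":".len()..].trim_start();
        let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(seconds) = digits.parse() {
            return Some(seconds);
        }
    }
    None
}

fn main() {
    assert_eq!(extract_retry_after("Please retry after 30 seconds."), Some(30));
    assert_eq!(
        extract_retry_after(r#"{"error": {"type": "rate_limit", "retry_after": 30}}"#),
        Some(30)
    );
    assert_eq!(extract_retry_after("no hint here"), None);
}
```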

Streaming Retry

Streaming requests use the same retry logic:

pub async fn post_stream(
    &self,
    uri: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> Result<Pin<Box<dyn Stream<Item = Result<Bytes, LlmError>> + Send>>, LlmError> {
    // Build the request once; clone it for each attempt
    let request = /* ... */;
    let mut last_error = None;

    for attempt in 0..=MAX_RETRIES {
        let res = self.client.request(request.clone()).await?;
        let status = res.status();

        // Check for rate limit (429) or overloaded (529)
        if status == StatusCode::TOO_MANY_REQUESTS || status.as_u16() == 529 {
            if attempt < MAX_RETRIES {
                // Read body for retry-after hint
                let body_bytes = res.collect().await?.to_bytes();
                let response_text = String::from_utf8_lossy(&body_bytes);
                let delay = calculate_backoff_delay(attempt, &response_text);
                tokio::time::sleep(delay).await;
                last_error = Some(LlmError::new(
                    format!("HTTP_{}", status.as_u16()),
                    response_text.into_owned(),
                ));
                continue;
            }
        }

        // Success - return stream
        if status.is_success() {
            return Ok(/* stream */);
        }

        // Non-retryable status: fail immediately
        return Err(/* HTTP_{status} error */);
    }

    // All retries exhausted
    Err(last_error.unwrap_or_else(|| {
        LlmError::new("RATE_LIMIT_EXHAUSTED", "Rate limit retries exhausted")
    }))
}

Logging

Retry attempts are logged at WARN level:

tracing::warn!(
    status = %status,
    attempt = attempt + 1,
    max_retries = MAX_RETRIES,
    delay_ms = delay.as_millis(),
    "Rate limited, retrying after delay"
);

Example log output:

2024-01-15T10:30:45Z WARN Rate limited, retrying after delay
    status=429 attempt=1 max_retries=5 delay_ms=1234

Error Codes

| Error Code           | Meaning                  |
|----------------------|--------------------------|
| HTTP_429             | Rate limit error (429)   |
| HTTP_529             | Overloaded error (529)   |
| RATE_LIMIT_EXHAUSTED | All retries failed       |
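Callers can branch on these code strings. A hedged sketch (`RetryClass` and `classify` are illustrative names, not part of the client API; the codes themselves are from the table above):

```rust
/// Illustrative classification of the error codes emitted by the retry loop.
#[derive(Debug, PartialEq)]
enum RetryClass {
    RateLimited,
    Overloaded,
    Exhausted,
    Other,
}

fn classify(code: &str) -> RetryClass {
    match code {
        "HTTP_429" => RetryClass::RateLimited,
        "HTTP_529" => RetryClass::Overloaded,
        "RATE_LIMIT_EXHAUSTED" => RetryClass::Exhausted,
        _ => RetryClass::Other,
    }
}

fn main() {
    assert_eq!(classify("HTTP_429"), RetryClass::RateLimited);
    assert_eq!(classify("HTTP_529"), RetryClass::Overloaded);
    assert_eq!(classify("RATE_LIMIT_EXHAUSTED"), RetryClass::Exhausted);
    assert_eq!(classify("HTTP_400"), RetryClass::Other);
}
```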

Best Practices

Application-Level Retry

For additional resilience, applications can add their own retry layer:

async fn send_with_app_retry(
    client: &LLMClient,
    messages: &[Message],
    options: &MessageOptions,
) -> Result<Message, LlmError> {
    let mut attempts = 0;
    loop {
        match client.send_message(messages, options).await {
            Ok(response) => return Ok(response),
            Err(e) if e.error_code == "RATE_LIMIT_EXHAUSTED" && attempts < 3 => {
                attempts += 1;
                tokio::time::sleep(Duration::from_secs(60)).await;
            }
            Err(e) => return Err(e),
        }
    }
}

Monitoring

Track retry metrics for operational visibility:

// Count retries
metrics::counter!("llm_retries", 1, "status" => status.to_string());

// Track retry delay
metrics::histogram!("llm_retry_delay_ms", delay.as_millis() as f64);

Non-Retryable Errors

These errors fail immediately without retry:

  • Network errors (DNS failure, connection refused)
  • TLS errors (certificate validation, handshake)
  • Client errors (400, 401, 403, 404)
  • Server errors (500, 502, 503, 504)
  • Parse errors (invalid JSON response)

Next Steps