Retry Logic

This page documents how the HTTP client handles transient failures through automatic retries with exponential backoff. Retry logic ensures reliable communication with LLM APIs despite rate limiting and temporary outages.

Configuration Constants

/// Maximum number of retries for rate limit errors
const MAX_RETRIES: u32 = 5;

/// Base delay for exponential backoff (in milliseconds)
const BASE_DELAY_MS: u64 = 1000;

/// Maximum delay cap (in milliseconds)
const MAX_DELAY_MS: u64 = 60000;
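With these constants, a request that is rate limited on every attempt sleeps through attempts 0-4 (the final attempt returns instead of sleeping), for a worst-case total of 31 seconds before jitter. A quick sketch of that arithmetic (`worst_case_sleep_ms` is an illustrative helper, not part of the client):

```rust
// Constants from the retry configuration above.
const MAX_RETRIES: u32 = 5;
const BASE_DELAY_MS: u64 = 1_000;
const MAX_DELAY_MS: u64 = 60_000;

/// Total sleep (before jitter) if every attempt is rate limited:
/// attempts 0..MAX_RETRIES sleep; the final attempt does not.
fn worst_case_sleep_ms() -> u64 {
    (0..MAX_RETRIES)
        .map(|attempt| (BASE_DELAY_MS << attempt).min(MAX_DELAY_MS))
        .sum()
}

fn main() {
    // 1 + 2 + 4 + 8 + 16 seconds
    assert_eq!(worst_case_sleep_ms(), 31_000);
}
```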

Retryable Status Codes

Only specific status codes trigger retries:

| Status Code | Name                   | Retry |
|-------------|------------------------|-------|
| 429         | Too Many Requests      | Yes   |
| 529         | Overloaded (Anthropic) | Yes   |
| 4xx (other) | Client Error           | No    |
| 5xx (other) | Server Error           | No    |

Other errors (network failures, timeouts) do not trigger automatic retries.
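The table above reduces to a small predicate. A minimal sketch, using raw `u16` codes rather than the client's `StatusCode` type (`is_retryable` is an illustrative name, not part of the client API):

```rust
/// Sketch of the retry decision: only 429 and 529 enter the backoff loop.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 529)
}

fn main() {
    assert!(is_retryable(429));
    assert!(is_retryable(529));
    assert!(!is_retryable(500)); // other 5xx: no retry
    assert!(!is_retryable(400)); // other 4xx: no retry
}
```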

Retry Loop

The POST method includes automatic retry handling:

pub async fn post(
    &self,
    uri: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> Result<String, LlmError> {
    let mut last_error = None;

    for attempt in 0..=MAX_RETRIES {
        // Build and send request
        let request = /* ... */;
        let res = self.client.request(request).await?;
        let status = res.status();
        let response_text = /* read body */;

        // Check for rate limit (429) or overloaded (529)
        if status == StatusCode::TOO_MANY_REQUESTS || status.as_u16() == 529 {
            if attempt < MAX_RETRIES {
                let delay = calculate_backoff_delay(attempt, &response_text);
                tracing::warn!(
                    status = %status,
                    attempt = attempt + 1,
                    max_retries = MAX_RETRIES,
                    delay_ms = delay.as_millis(),
                    "Rate limited, retrying after delay"
                );
                tokio::time::sleep(delay).await;
                last_error = Some(LlmError::new(
                    format!("HTTP_{}", status.as_u16()),
                    response_text,
                ));
                continue;
            }
        }

        // Success or non-retryable error
        return Ok(response_text);
    }

    // All retries exhausted
    Err(last_error.unwrap_or_else(|| {
        LlmError::new("RATE_LIMIT_EXHAUSTED", "Rate limit retries exhausted")
    }))
}

Exponential Backoff

Delay increases exponentially with each attempt:

fn calculate_backoff_delay(attempt: u32, response_text: &str) -> Duration {
    // Try to extract retry-after from error response
    if let Some(seconds) = extract_retry_after(response_text) {
        return Duration::from_secs(seconds);
    }

    // Exponential backoff: base * 2^attempt
    let exponential_delay = BASE_DELAY_MS * (1 << attempt);
    let capped_delay = exponential_delay.min(MAX_DELAY_MS);

    // Add random jitter (0-25% of delay)
    let jitter = (capped_delay as f64 * 0.25 * rand_factor()) as u64;
    Duration::from_millis(capped_delay + jitter)
}

Delay Progression

| Attempt | Base Delay | With 25% Jitter      |
|---------|------------|----------------------|
| 0       | 1,000 ms   | 1,000 - 1,250 ms     |
| 1       | 2,000 ms   | 2,000 - 2,500 ms     |
| 2       | 4,000 ms   | 4,000 - 5,000 ms     |
| 3       | 8,000 ms   | 8,000 - 10,000 ms    |
| 4       | 16,000 ms  | 16,000 - 20,000 ms   |
| 5       | 32,000 ms  | 32,000 - 40,000 ms   |

The exponential delay is capped at 60,000 ms (60 seconds). Note that the cap applies only to the computed backoff; a retry-after value extracted from the response is honored as-is.
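The progression and the cap can be checked with a small sketch of the per-attempt computation, jitter excluded (`base_delay_ms` is a hypothetical helper mirroring the exponential path above):

```rust
const BASE_DELAY_MS: u64 = 1_000;
const MAX_DELAY_MS: u64 = 60_000;

/// Per-attempt base delay (no jitter): base * 2^attempt, capped.
fn base_delay_ms(attempt: u32) -> u64 {
    (BASE_DELAY_MS << attempt).min(MAX_DELAY_MS)
}

fn main() {
    assert_eq!(base_delay_ms(0), 1_000);
    assert_eq!(base_delay_ms(5), 32_000);
    // The 60 s cap only bites from attempt 6 onward (64 s uncapped).
    assert_eq!(base_delay_ms(6), 60_000);
}
```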

Jitter

Random jitter prevents thundering herd problems:

fn rand_factor() -> f64 {
    use std::time::SystemTime;
    // Cheap pseudo-randomness: derive a factor in [0, 1) from the clock's
    // sub-second nanoseconds, avoiding a dependency on an RNG crate.
    let nanos = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .subsec_nanos();
    (nanos % 1000) as f64 / 1000.0
}

Jitter adds 0-25% additional delay, randomized per request.
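Because the factor is always in [0, 1), the jittered delay always lands in [delay, 1.25 × delay]. A quick sketch confirming the bound (reusing the clock-based factor shown above):

```rust
use std::time::SystemTime;

// Same clock-derived factor as above; always in [0, 1).
fn rand_factor() -> f64 {
    let nanos = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .subsec_nanos();
    (nanos % 1000) as f64 / 1000.0
}

fn main() {
    let capped_delay: u64 = 4_000;
    for _ in 0..1_000 {
        let jitter = (capped_delay as f64 * 0.25 * rand_factor()) as u64;
        let total = capped_delay + jitter;
        // The jittered delay never exceeds 1.25x the base delay.
        assert!(total >= capped_delay && total <= capped_delay + capped_delay / 4);
    }
}
```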

Retry-After Extraction

The client respects retry-after hints from the API:

fn extract_retry_after(response_text: &str) -> Option<u64> {
    let lower = response_text.to_lowercase();

    // Pattern: "retry after X seconds"
    if let Some(pos) = lower.find("retry after ") {
        let after_pos = pos + "retry after ".len();
        let remaining = &lower[after_pos..];
        if let Some(space_pos) = remaining.find(' ') {
            if let Ok(seconds) = remaining[..space_pos].trim().parse::<u64>() {
                return Some(seconds);
            }
        }
    }

    // Pattern: "retry_after": X (JSON field)
    if let Some(pos) = lower.find("\"retry_after\":") {
        let after_pos = pos + "\"retry_after\":".len();
        let remaining = &lower[after_pos..];
        let trimmed = remaining.trim_start();
        let num_str: String = trimmed.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(seconds) = num_str.parse::<u64>() {
            return Some(seconds);
        }
    }

    None
}

Supported Formats

Natural language (Anthropic error messages):

Please retry after 30 seconds.

JSON field:

{"error": {"type": "rate_limit", "retry_after": 30}}
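A condensed, self-contained sketch of the extraction logic above, exercised against both supported formats:

```rust
/// Condensed version of extract_retry_after: natural-language hint first,
/// then a JSON retry_after field; None if neither pattern matches.
fn extract_retry_after(text: &str) -> Option<u64> {
    let lower = text.to_lowercase();
    // Pattern: "retry after X seconds"
    if let Some(pos) = lower.find("retry after ") {
        let rest = &lower[pos + "retry after ".len()..];
        if let Some(sp) = rest.find(' ') {
            if let Ok(seconds) = rest[..sp].trim().parse() {
                return Some(seconds);
            }
        }
    }
    // Pattern: "retry_after": X
    if let Some(pos) = lower.find("\"retry_after\":") {
        let rest = lower[pos + "\"retry_after\":".len()..].trim_start();
        let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(seconds) = digits.parse() {
            return Some(seconds);
        }
    }
    None
}

fn main() {
    assert_eq!(extract_retry_after("Please retry after 30 seconds."), Some(30));
    assert_eq!(
        extract_retry_after(r#"{"error": {"type": "rate_limit", "retry_after": 30}}"#),
        Some(30)
    );
    assert_eq!(extract_retry_after("no hint here"), None);
}
```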

Streaming Retry

Streaming requests use the same retry logic:

pub async fn post_stream(
    &self,
    uri: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> Result<Pin<Box<dyn Stream<Item = Result<Bytes, LlmError>> + Send>>, LlmError> {
    // Build the request once; clone it for each attempt
    let request = /* ... */;
    let mut last_error = None;

    for attempt in 0..=MAX_RETRIES {
        let res = self.client.request(request.clone()).await?;
        let status = res.status();

        // Check for rate limit (429) or overloaded (529)
        if status == StatusCode::TOO_MANY_REQUESTS || status.as_u16() == 529 {
            if attempt < MAX_RETRIES {
                // Read body for retry-after hint
                let body_bytes = res.collect().await?.to_bytes();
                let response_text = String::from_utf8_lossy(&body_bytes);
                let delay = calculate_backoff_delay(attempt, &response_text);
                tokio::time::sleep(delay).await;
                last_error = Some(LlmError::new(
                    format!("HTTP_{}", status.as_u16()),
                    response_text.into_owned(),
                ));
                continue;
            }
        }

        // Success - return stream
        if status.is_success() {
            return Ok(/* stream */);
        }

        // Non-retryable status: fail immediately
        return Err(/* HTTP_{status} error */);
    }

    // All retries exhausted
    Err(last_error.unwrap_or_else(|| {
        LlmError::new("RATE_LIMIT_EXHAUSTED", "Rate limit retries exhausted")
    }))
}

Logging

Retry attempts are logged at WARN level:

tracing::warn!(
    status = %status,
    attempt = attempt + 1,
    max_retries = MAX_RETRIES,
    delay_ms = delay.as_millis(),
    "Rate limited, retrying after delay"
);

Example log output:

2024-01-15T10:30:45Z WARN Rate limited, retrying after delay
    status=429 attempt=1 max_retries=5 delay_ms=1234

Error Codes

| Error Code           | Meaning                  |
|----------------------|--------------------------|
| HTTP_429             | Rate limit error (429)   |
| HTTP_529             | Overloaded error (529)   |
| RATE_LIMIT_EXHAUSTED | All retries failed       |
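Callers can branch on these code strings. A hedged sketch (`RetryClass` and `classify` are illustrative names, not part of the client API; the codes themselves are from the table above):

```rust
/// Illustrative classification of the error codes emitted by the retry loop.
#[derive(Debug, PartialEq)]
enum RetryClass {
    RateLimited,
    Overloaded,
    Exhausted,
    Other,
}

fn classify(code: &str) -> RetryClass {
    match code {
        "HTTP_429" => RetryClass::RateLimited,
        "HTTP_529" => RetryClass::Overloaded,
        "RATE_LIMIT_EXHAUSTED" => RetryClass::Exhausted,
        _ => RetryClass::Other,
    }
}

fn main() {
    assert_eq!(classify("HTTP_429"), RetryClass::RateLimited);
    assert_eq!(classify("HTTP_529"), RetryClass::Overloaded);
    assert_eq!(classify("RATE_LIMIT_EXHAUSTED"), RetryClass::Exhausted);
    assert_eq!(classify("HTTP_400"), RetryClass::Other);
}
```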

Best Practices

Application-Level Retry

For additional resilience, applications can add their own retry layer:

async fn send_with_app_retry(
    client: &LLMClient,
    messages: &[Message],
    options: &MessageOptions,
) -> Result<Message, LlmError> {
    let mut attempts = 0;
    loop {
        match client.send_message(messages, options).await {
            Ok(response) => return Ok(response),
            Err(e) if e.error_code == "RATE_LIMIT_EXHAUSTED" && attempts < 3 => {
                attempts += 1;
                tokio::time::sleep(Duration::from_secs(60)).await;
            }
            Err(e) => return Err(e),
        }
    }
}

Monitoring

Track retry metrics for operational visibility:

// Count retries
metrics::counter!("llm_retries", 1, "status" => status.to_string());

// Track retry delay
metrics::histogram!("llm_retry_delay_ms", delay.as_millis() as f64);

Non-Retryable Errors

These errors fail immediately without retry:

  • Network errors (DNS failure, connection refused)
  • TLS errors (certificate validation, handshake)
  • Client errors (400, 401, 403, 404)
  • Server errors (500, 502, 503, 504)
  • Parse errors (invalid JSON response)

Next Steps