# Retry Logic
This page documents how the HTTP client handles transient failures through automatic retries with exponential backoff. Retry logic keeps communication with LLM APIs reliable when providers rate limit requests or are temporarily overloaded.
## Configuration Constants

```rust
/// Maximum number of retries for rate limit errors
const MAX_RETRIES: u32 = 5;

/// Base delay for exponential backoff (in milliseconds)
const BASE_DELAY_MS: u64 = 1000;

/// Maximum delay cap (in milliseconds)
const MAX_DELAY_MS: u64 = 60000;
```
## Retryable Status Codes
Only specific status codes trigger retries:
| Status Code | Name | Retry |
|---|---|---|
| 429 | Too Many Requests | Yes |
| 529 | Overloaded (Anthropic) | Yes |
| 4xx (other) | Client Error | No |
| 5xx (other) | Server Error | No |
Other errors (network failures, timeouts) do not trigger automatic retries.
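
The table above reduces to a two-code check. A minimal sketch of the predicate (the name `is_retryable_status` is illustrative, not part of the client's API):

```rust
/// Returns true only for the two status codes the client retries:
/// 429 (Too Many Requests) and 529 (Anthropic's Overloaded).
fn is_retryable_status(status: u16) -> bool {
    matches!(status, 429 | 529)
}

fn main() {
    assert!(is_retryable_status(429));
    assert!(is_retryable_status(529));
    // Other 4xx and 5xx codes fail immediately without retry.
    assert!(!is_retryable_status(400));
    assert!(!is_retryable_status(503));
}
```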
## Retry Loop
The POST method includes automatic retry handling:

```rust
pub async fn post(
    &self,
    uri: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> Result<String, LlmError> {
    let mut last_error = None;
    for attempt in 0..=MAX_RETRIES {
        // Build and send request
        let request = /* ... */;
        let res = self.client.request(request).await?;
        let status = res.status();
        let response_text = /* read body */;

        // Check for rate limit (429) or overloaded (529)
        if status == StatusCode::TOO_MANY_REQUESTS || status.as_u16() == 529 {
            last_error = Some(LlmError::new(
                format!("HTTP_{}", status.as_u16()),
                response_text.clone(),
            ));
            if attempt < MAX_RETRIES {
                let delay = calculate_backoff_delay(attempt, &response_text);
                tracing::warn!(
                    status = %status,
                    attempt = attempt + 1,
                    max_retries = MAX_RETRIES,
                    delay_ms = delay.as_millis(),
                    "Rate limited, retrying after delay"
                );
                tokio::time::sleep(delay).await;
                continue;
            }
            // Final attempt was also rate limited; fall through to the error below
            break;
        }

        // Success or non-retryable error
        return Ok(response_text);
    }

    // All retries exhausted
    Err(last_error.unwrap_or_else(|| {
        LlmError::new("RATE_LIMIT_EXHAUSTED", "Rate limit retries exhausted")
    }))
}
```
## Exponential Backoff
Delay increases exponentially with each attempt:

```rust
fn calculate_backoff_delay(attempt: u32, response_text: &str) -> Duration {
    // Try to extract a retry-after hint from the error response
    if let Some(seconds) = extract_retry_after(response_text) {
        return Duration::from_secs(seconds);
    }

    // Exponential backoff: base * 2^attempt
    let exponential_delay = BASE_DELAY_MS * (1u64 << attempt);
    let capped_delay = exponential_delay.min(MAX_DELAY_MS);

    // Add random jitter (0-25% of the delay)
    let jitter = (capped_delay as f64 * 0.25 * rand_factor()) as u64;
    Duration::from_millis(capped_delay + jitter)
}
```
### Delay Progression
| Attempt | Base Delay | With 25% Jitter |
|---|---|---|
| 0 | 1,000 ms | 1,000 - 1,250 ms |
| 1 | 2,000 ms | 2,000 - 2,500 ms |
| 2 | 4,000 ms | 4,000 - 5,000 ms |
| 3 | 8,000 ms | 8,000 - 10,000 ms |
| 4 | 16,000 ms | 16,000 - 20,000 ms |
| 5 | 32,000 ms | 32,000 - 40,000 ms |
The maximum delay is capped at 60,000 ms (60 seconds).
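
The progression above follows directly from the constants. A minimal sketch that reproduces the pre-jitter schedule (`base_delay_ms` is a hypothetical helper; the constants are copied from the configuration section):

```rust
const BASE_DELAY_MS: u64 = 1000;
const MAX_DELAY_MS: u64 = 60000;

/// Pre-jitter delay for a given attempt: base * 2^attempt, capped.
fn base_delay_ms(attempt: u32) -> u64 {
    (BASE_DELAY_MS * (1u64 << attempt)).min(MAX_DELAY_MS)
}

fn main() {
    let schedule: Vec<u64> = (0..6).map(base_delay_ms).collect();
    assert_eq!(schedule, vec![1000, 2000, 4000, 8000, 16000, 32000]);

    // The 60 s cap only binds from attempt 6 onward (2^6 s = 64 s > 60 s),
    // so within MAX_RETRIES the cap matters only for retry-after hints.
    assert_eq!(base_delay_ms(6), 60000);

    // Only attempts 0-4 are followed by a sleep (the final attempt is not),
    // so the worst-case cumulative wait before jitter is 31 s.
    assert_eq!((0..5).map(base_delay_ms).sum::<u64>(), 31_000);
}
```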
### Jitter
Random jitter prevents thundering herd problems:

```rust
fn rand_factor() -> f64 {
    use std::time::SystemTime;

    // Cheap pseudo-random factor in [0, 1) derived from the clock's
    // sub-second nanoseconds; good enough for jitter, with no RNG dependency.
    let nanos = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .subsec_nanos();
    (nanos % 1000) as f64 / 1000.0
}
```
Jitter adds 0-25% additional delay, randomized per request.
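
The 0-25% bound can be checked in isolation. A small sketch of the jitter arithmetic with the random factor passed in explicitly for determinism (the `with_jitter` function is illustrative, not part of the client):

```rust
/// Jittered delay: the capped delay plus 0-25% of it,
/// scaled by a pseudo-random factor in [0, 1).
fn with_jitter(capped_delay_ms: u64, factor: f64) -> u64 {
    let jitter = (capped_delay_ms as f64 * 0.25 * factor) as u64;
    capped_delay_ms + jitter
}

fn main() {
    assert_eq!(with_jitter(1000, 0.0), 1000);   // factor 0: no jitter
    assert_eq!(with_jitter(1000, 0.5), 1125);   // midpoint: +12.5%
    assert_eq!(with_jitter(1000, 0.999), 1249); // just under +25%

    // Property: the result never reaches 1.25x the base delay.
    for i in 0..1000 {
        let f = i as f64 / 1000.0; // same range rand_factor() produces
        assert!(with_jitter(4000, f) < 5000);
    }
}
```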
## Retry-After Extraction
The client respects retry-after hints from the API:

```rust
fn extract_retry_after(response_text: &str) -> Option<u64> {
    let lower = response_text.to_lowercase();

    // Pattern: "retry after X seconds"
    if let Some(pos) = lower.find("retry after ") {
        let after_pos = pos + "retry after ".len();
        let remaining = &lower[after_pos..];
        if let Some(space_pos) = remaining.find(' ') {
            if let Ok(seconds) = remaining[..space_pos].trim().parse::<u64>() {
                return Some(seconds);
            }
        }
    }

    // Pattern: "retry_after": X (JSON field)
    if let Some(pos) = lower.find("\"retry_after\":") {
        let after_pos = pos + "\"retry_after\":".len();
        let remaining = &lower[after_pos..];
        let trimmed = remaining.trim_start();
        let num_str: String = trimmed.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(seconds) = num_str.parse::<u64>() {
            return Some(seconds);
        }
    }

    None
}
```
### Supported Formats
Natural language (Anthropic error messages):

```text
Please retry after 30 seconds.
```

JSON field:

```json
{"error": {"type": "rate_limit", "retry_after": 30}}
```
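
Both formats can be exercised end to end; the sketch below repeats `extract_retry_after` from the section above so it runs standalone:

```rust
fn extract_retry_after(response_text: &str) -> Option<u64> {
    let lower = response_text.to_lowercase();
    // Pattern: "retry after X seconds"
    if let Some(pos) = lower.find("retry after ") {
        let after_pos = pos + "retry after ".len();
        let remaining = &lower[after_pos..];
        if let Some(space_pos) = remaining.find(' ') {
            if let Ok(seconds) = remaining[..space_pos].trim().parse::<u64>() {
                return Some(seconds);
            }
        }
    }
    // Pattern: "retry_after": X (JSON field)
    if let Some(pos) = lower.find("\"retry_after\":") {
        let after_pos = pos + "\"retry_after\":".len();
        let trimmed = lower[after_pos..].trim_start().to_string();
        let num_str: String = trimmed.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(seconds) = num_str.parse::<u64>() {
            return Some(seconds);
        }
    }
    None
}

fn main() {
    // Natural-language form (Anthropic error message)
    assert_eq!(extract_retry_after("Please retry after 30 seconds."), Some(30));
    // JSON field form
    assert_eq!(
        extract_retry_after(r#"{"error": {"type": "rate_limit", "retry_after": 30}}"#),
        Some(30)
    );
    // Unrecognized bodies fall back to exponential backoff
    assert_eq!(extract_retry_after("internal error"), None);
}
```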
## Streaming Retry
Streaming requests use the same retry logic:

```rust
pub async fn post_stream(
    &self,
    uri: &str,
    headers: &[(&str, &str)],
    body: &str,
) -> Result<Pin<Box<dyn Stream<Item = Result<Bytes, LlmError>> + Send>>, LlmError> {
    let mut last_error = None;
    for attempt in 0..=MAX_RETRIES {
        // Rebuild and send the request on each attempt
        let request = /* ... */;
        let res = self.client.request(request).await?;
        let status = res.status();

        // Check for rate limit (429) or overloaded (529)
        if status == StatusCode::TOO_MANY_REQUESTS || status.as_u16() == 529 {
            // Read body for retry-after hint
            let body_bytes = res.collect().await?.to_bytes();
            let response_text = String::from_utf8_lossy(&body_bytes);
            last_error = Some(LlmError::new(
                format!("HTTP_{}", status.as_u16()),
                response_text.to_string(),
            ));
            if attempt < MAX_RETRIES {
                let delay = calculate_backoff_delay(attempt, &response_text);
                tokio::time::sleep(delay).await;
                continue;
            }
            // Final attempt was also rate limited
            break;
        }

        // Success - return stream
        if status.is_success() {
            return Ok(/* stream */);
        }

        // Non-retryable status: fail immediately
        return Err(LlmError::new(
            format!("HTTP_{}", status.as_u16()),
            /* read body */,
        ));
    }

    // All retries exhausted
    Err(last_error.unwrap_or_else(|| {
        LlmError::new("RATE_LIMIT_EXHAUSTED", "Rate limit retries exhausted")
    }))
}
```
## Logging
Retry attempts are logged at WARN level:

```rust
tracing::warn!(
    status = %status,
    attempt = attempt + 1,
    max_retries = MAX_RETRIES,
    delay_ms = delay.as_millis(),
    "Rate limited, retrying after delay"
);
```
Example log output:

```text
2024-01-15T10:30:45Z WARN rate limited, retrying after delay
    status=429 attempt=1 max_retries=5 delay_ms=1234
```
## Error Codes
| Error Code | Meaning |
|---|---|
| `HTTP_429` | Rate limit error (429) |
| `HTTP_529` | Overloaded error (529) |
| `RATE_LIMIT_EXHAUSTED` | All retries failed |
## Best Practices

### Application-Level Retry
For additional resilience, applications can add their own retry layer:

```rust
async fn send_with_app_retry(
    client: &LLMClient,
    messages: &[Message],
    options: &MessageOptions,
) -> Result<Message, LlmError> {
    let mut attempts = 0;
    loop {
        match client.send_message(messages, options).await {
            Ok(response) => return Ok(response),
            Err(e) if e.error_code == "RATE_LIMIT_EXHAUSTED" && attempts < 3 => {
                attempts += 1;
                tokio::time::sleep(Duration::from_secs(60)).await;
            }
            Err(e) => return Err(e),
        }
    }
}
```
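
The pattern can be tested deterministically by swapping the client call for an injected closure. A synchronous sketch (the generic `send_with_app_retry` helper, its closure parameter, and the simplified `LlmError` are illustrative stand-ins; the real version would also sleep between attempts):

```rust
#[derive(Debug, PartialEq)]
struct LlmError {
    error_code: String,
}

/// Retry a fallible call up to `max_attempts` extra times when it fails
/// with RATE_LIMIT_EXHAUSTED; any other error is returned immediately.
fn send_with_app_retry<F>(mut call: F, max_attempts: u32) -> Result<String, LlmError>
where
    F: FnMut() -> Result<String, LlmError>,
{
    let mut attempts = 0;
    loop {
        match call() {
            Ok(response) => return Ok(response),
            Err(e) if e.error_code == "RATE_LIMIT_EXHAUSTED" && attempts < max_attempts => {
                attempts += 1;
                // A real implementation would sleep here (e.g. 60 s).
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // Stand-in client: fails twice with exhaustion, then succeeds.
    let mut calls = 0;
    let result = send_with_app_retry(
        || {
            calls += 1;
            if calls < 3 {
                Err(LlmError { error_code: "RATE_LIMIT_EXHAUSTED".into() })
            } else {
                Ok("done".into())
            }
        },
        3,
    );
    assert_eq!(result, Ok("done".to_string()));
    assert_eq!(calls, 3); // two retries, then success
}
```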
### Monitoring
Track retry metrics for operational visibility:

```rust
// Count retries
metrics::counter!("llm_retries", 1, "status" => status.to_string());

// Track retry delay
metrics::histogram!("llm_retry_delay_ms", delay.as_millis() as f64);
```
## Non-Retryable Errors
These errors fail immediately without retry:
- Network errors (DNS failure, connection refused)
- TLS errors (certificate validation, handshake)
- Client errors (400, 401, 403, 404)
- Server errors (500, 502, 503, 504)
- Parse errors (invalid JSON response)
## Next Steps
- Streaming - SSE stream handling
- HTTP & TLS - HTTP client details
- Client Overview - LLMClient structure
