Cohere Provider

The Cohere provider enables integration with Cohere’s Command models through the Chat API v2. It supports both synchronous and streaming message completion with full tool calling capabilities. Command-R models are designed for enterprise use cases with strong performance on retrieval-augmented generation (RAG) and multi-step tool use.

The provider handles Cohere-specific API formats, including the message structure with tool calls and the streaming event format. Tool results are sent with the special tool role as required by Cohere’s API.


CohereProvider Struct

The provider is defined in src/client/providers/cohere/mod.rs:

pub struct CohereProvider {
    api_key: String,
    model: String,
}

impl CohereProvider {
    pub fn new(api_key: String, model: String) -> Self {
        Self { api_key, model }
    }

    pub fn model(&self) -> &str {
        &self.model
    }
}
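As a standalone sketch, the struct can be constructed and queried like this (the definition is repeated so the snippet compiles on its own; make_provider is illustrative, not part of the crate):

```rust
// Reproduced from the definition above so this sketch is self-contained.
pub struct CohereProvider {
    api_key: String,
    model: String,
}

impl CohereProvider {
    pub fn new(api_key: String, model: String) -> Self {
        Self { api_key, model }
    }

    pub fn model(&self) -> &str {
        &self.model
    }
}

// Construct a provider for command-r-plus; in practice the key comes
// from configuration or the COHERE_API_KEY environment variable.
fn make_provider(api_key: String) -> CohereProvider {
    CohereProvider::new(api_key, "command-r-plus".to_string())
}
```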

API Configuration

The Cohere provider uses the following API settings:

| Setting | Value |
|---|---|
| Endpoint | https://api.cohere.com/v2/chat |
| Content-Type | application/json |

Request headers:

Content-Type: application/json
Authorization: Bearer <api_key>
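Assembling the header set above can be sketched as a small helper (request_headers is illustrative, not a crate API):

```rust
/// Assemble the request headers listed above: a JSON content type
/// plus a bearer token built from the API key.
fn request_headers(api_key: &str) -> Vec<(&'static str, String)> {
    vec![
        ("Content-Type", "application/json".to_string()),
        ("Authorization", format!("Bearer {}", api_key)),
    ]
}
```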

LLMSessionConfig Builder

Create a Cohere session configuration using the cohere() builder method:

use agent_air::controller::LLMSessionConfig;

let config = LLMSessionConfig::cohere("co-...", "command-r-plus");

The cohere() method sets these defaults:

| Option | Default Value |
|---|---|
| max_tokens | 4096 |
| streaming | true |
| context_limit | 128,000 |
| compaction | Threshold (default) |

Builder Methods

Customize the configuration using builder methods:

let config = LLMSessionConfig::cohere("co-...", "command-r-plus")
    .with_max_tokens(4096)
    .with_system_prompt("You are a helpful assistant.")
    .with_temperature(0.3)
    .with_streaming(true)
    .with_context_limit(128_000);

Available methods:

| Method | Description |
|---|---|
| with_max_tokens(u32) | Set maximum response tokens |
| with_system_prompt(impl Into<String>) | Set the system prompt |
| with_temperature(f32) | Set sampling temperature |
| with_streaming(bool) | Enable or disable streaming |
| with_context_limit(i32) | Set context window size |
| with_threshold_compaction(config) | Configure compaction |
| without_compaction() | Disable compaction |

Streaming Support

The Cohere provider fully supports streaming. When streaming is enabled, responses arrive incrementally as StreamEvent values:

pub enum StreamEvent {
    MessageStart { message_id: String, model: String },
    TextDelta(String),
    ToolUse { id: String, name: String, input: Value },
    MessageStop,
    Error(String),
}

Enable streaming by setting stream: true in the request body.
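The events above are typically consumed with a match; this sketch mirrors the enum, simplifying the tool input to a String (instead of serde_json::Value) so it stands alone:

```rust
// Mirrors the StreamEvent enum above; the tool input is simplified to a
// String here so the sketch has no external dependencies.
enum StreamEvent {
    MessageStart { message_id: String, model: String },
    TextDelta(String),
    ToolUse { id: String, name: String, input: String },
    MessageStop,
    Error(String),
}

/// Accumulate streamed text deltas into the final message body,
/// stopping when the message ends.
fn collect_text(events: Vec<StreamEvent>) -> String {
    let mut text = String::new();
    for event in events {
        match event {
            StreamEvent::TextDelta(delta) => text.push_str(&delta),
            StreamEvent::MessageStop => break,
            _ => {} // MessageStart, ToolUse, and Error handled elsewhere
        }
    }
    text
}
```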


Request Format

Cohere uses a straightforward messages array with role-based formatting:

{
  "model": "command-r-plus",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"}
  ],
  "max_tokens": 4096,
  "temperature": 0.3
}

Role Mapping

| Generic Role | Cohere Role |
|---|---|
| User | user |
| Assistant | assistant |
| System | system |
| Tool Result | tool |
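The mapping is a one-to-one rename; a sketch (the generic Role enum here is hypothetical, the crate's actual type may differ):

```rust
// Hypothetical generic role enum; the crate's actual type may differ.
enum Role {
    User,
    Assistant,
    System,
    ToolResult,
}

/// Map a generic role onto the Cohere role string, per the table above.
fn cohere_role(role: &Role) -> &'static str {
    match role {
        Role::User => "user",
        Role::Assistant => "assistant",
        Role::System => "system",
        Role::ToolResult => "tool",
    }
}
```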

Tool Use Format

Tools are sent in the function format:

{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  }]
}

Tool choice options:

| Value | Description |
|---|---|
| (default) | Auto - model decides |
| "required" | Must use at least one tool |
| "none" | Cannot use tools |
| {"type": "function", "function": {"name": "..."}} | Force a specific tool |

Tool Calls in Responses

When the model requests tool calls, the response includes:

{
  "message": {
    "role": "assistant",
    "tool_calls": [{
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\":\"San Francisco\"}"
      }
    }]
  }
}

Tool Results

Tool results are sent with the special tool role:

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "72F and sunny in San Francisco"
}
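For illustration, the message above can be assembled by hand; a real implementation would use serde so the content is properly JSON-escaped:

```rust
/// Render a tool-result message in the shape shown above.
/// Illustrative only: the content is not JSON-escaped here.
fn tool_result_json(tool_call_id: &str, content: &str) -> String {
    format!(
        r#"{{"role":"tool","tool_call_id":"{}","content":"{}"}}"#,
        tool_call_id, content
    )
}
```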

Response Format

Cohere responses use the v2 Chat API format:

{
  "message": {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help?"}]
  },
  "finish_reason": "COMPLETE"
}

The provider parses the message.content array and extracts text and tool call content.


Environment Variables

Configure Cohere via environment variables:

| Variable | Description | Default |
|---|---|---|
| COHERE_API_KEY | API key (required) | None |
| COHERE_MODEL | Model identifier | command-r-plus |
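The model default can be sketched as a pure function over the (possibly absent) environment value; in real code the argument would come from std::env::var("COHERE_MODEL").ok():

```rust
/// Resolve the model identifier: use the COHERE_MODEL value when set,
/// otherwise fall back to the documented default.
fn resolve_model(env_value: Option<String>) -> String {
    env_value.unwrap_or_else(|| "command-r-plus".to_string())
}
```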

Available Models

Common Cohere model identifiers:

| Model | Context | Description |
|---|---|---|
| command-r-plus | 128K | Most capable model |
| command-r | 128K | Balanced performance |
| command-r-plus-08-2024 | 128K | August 2024 snapshot |
| command-r-08-2024 | 128K | August 2024 snapshot |

YAML Configuration

Configure in your agent’s config file:

providers:
  - provider: cohere
    api_key: co-...
    model: command-r-plus
    system_prompt: "You are a helpful coding assistant."

default_provider: cohere

Complete Example

use agent_air::{AgentAir, AgentConfig};

struct MyConfig;

impl AgentConfig for MyConfig {
    fn config_path(&self) -> &str { ".myagent/config.yaml" }
    fn default_system_prompt(&self) -> &str { "You are helpful." }
    fn log_prefix(&self) -> &str { "myagent" }
    fn name(&self) -> &str { "MyAgent" }
}

fn main() -> std::io::Result<()> {
    let mut agent = AgentAir::new(&MyConfig)?;

    // Configuration is loaded automatically from:
    // 1. ~/.myagent/config.yaml (if exists)
    // 2. COHERE_API_KEY environment variable (fallback)

    agent.run()
}

Programmatic Configuration

For direct configuration without files:

use agent_air::controller::{LLMSessionConfig, LLMController};

let config = LLMSessionConfig::cohere(
    std::env::var("COHERE_API_KEY").expect("API key required"),
    "command-r-plus"
)
.with_system_prompt("You are a helpful assistant.")
.with_max_tokens(4096)
.with_temperature(0.3);

// Create session with this config (inside an async function,
// since create_session is awaited)
let controller = LLMController::new(None);
let session_id = controller.create_session(config).await?;

Error Handling

Cohere API errors are converted to LlmError:

pub struct LlmError {
    pub error_code: String,
    pub error_message: String,
}

Common error codes:

| Error Code | Description |
|---|---|
| COHERE_ERROR | General API error |
| PARSE_ERROR | Response parsing failed |
| INVALID_REQUEST | Malformed request |
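One plausible mapping from an HTTP status to these codes is sketched below; the provider's actual classification logic is not shown in this page, so treat this as an assumption:

```rust
/// Hypothetical mapping of an HTTP status onto the error codes above.
/// PARSE_ERROR is raised locally when decoding fails, not from a status.
fn error_code_for_status(status: u16) -> &'static str {
    match status {
        400 => "INVALID_REQUEST",
        _ => "COHERE_ERROR",
    }
}
```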

RAG and Grounding

Cohere’s Command-R models are optimized for retrieval-augmented generation. While the provider supports tool calling for RAG workflows, native Cohere RAG features (connectors, documents) require direct API access.

For RAG workflows, implement a retrieval tool:

use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;

struct RetrieveTool {
    vector_db: Arc<VectorDatabase>,
}

impl Executable for RetrieveTool {
    fn name(&self) -> &str { "search_knowledge_base" }
    fn description(&self) -> &str { "Search the knowledge base for relevant information" }
    fn input_schema(&self) -> &str {
        r#"{"type":"object","properties":{"query":{"type":"string"}},"required":["query"]}"#
    }

    fn execute(
        &self,
        _ctx: ToolContext,
        input: HashMap<String, serde_json::Value>,
    ) -> Pin<Box<dyn Future<Output = Result<String, String>> + Send>> {
        let db = self.vector_db.clone();
        let query = input.get("query")
            .and_then(|v| v.as_str())
            .unwrap_or("")
            .to_string();

        Box::pin(async move {
            // Convert the search error into the tool's String error type.
            let results = db.search(&query, 5).await.map_err(|e| e.to_string())?;
            Ok(format_search_results(&results))
        })
    }
}

Comparison with Other Providers

| Feature | Cohere | Anthropic | OpenAI |
|---|---|---|---|
| Max context | 128K | 200K | 128K |
| Streaming | Supported | Supported | Supported |
| System message | In messages | Dedicated field | In messages |
| Tool format | Function wrapper | Direct tool | Function wrapper |
| RAG optimization | Yes | No | No |