
Agent

The agent module defines AI voice agents that handle incoming phone calls. Each agent has an identity, a system prompt, a realtime provider configuration, and optional knowledge bases.

pub struct Agent {
    pub id: Uuid,
    pub name: String,
    pub prompt: String,
    pub realtime_config: AgentRealtimeConfig,
    pub tool_config: AgentToolConfig,
    pub knowledge_base_ids: Vec<Uuid>,
}

pub struct AgentData {
    pub name: String,
    pub prompt: String,
    pub realtime_config: AgentRealtimeConfig,
    pub tool_config: AgentToolConfig,
    pub knowledge_base_ids: Vec<Uuid>,
}

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/agents | List all agents for the organization |
| GET | /api/agents/:id | Get agent with knowledge base IDs |
| POST | /api/agents | Create agent + link knowledge bases |
| PUT | /api/agents/:id | Update agent (replaces all knowledge base links) |
| DELETE | /api/agents/:id | Delete agent |

All endpoints require an authenticated Session. Agents are scoped to the session’s organization.

AgentRealtimeConfig is a tagged enum — the provider field determines which AI backend handles voice sessions.

{
  "provider": "openai",
  "model": "gpt-realtime-1.5",
  "voice": "marin",
  "speed": 1.0,
  "noise_reduction": "near_field",
  "vad": {
    "type": "semantic_vad",
    "eagerness": "medium",
    "create_response": true,
    "interrupt_response": true
  }
}

| Setting | Options | Default |
| --- | --- | --- |
| model | gpt-realtime-1.5, gpt-realtime, gpt-realtime-mini | gpt-realtime-1.5 |
| voice | marin, cedar, alloy, ash, ballad, coral, echo, sage, shimmer, verse | marin |
| speed | f32 | 1.0 |
| noise_reduction | near_field, far_field, none | near_field |
| vad.type | semantic_vad, server_vad | semantic_vad |
| vad.eagerness | low, medium, high, auto | medium |

Server VAD exposes additional fields: threshold, prefix_padding_ms, silence_duration_ms, idle_timeout_ms.

{
  "provider": "gemini",
  "model": "gemini-live-2.5-flash-preview-native-audio-09-2025",
  "voice": "Aoede",
  "start_of_speech_sensitivity": "START_SENSITIVITY_HIGH",
  "end_of_speech_sensitivity": "END_SENSITIVITY_HIGH",
  "activity_handling": "START_OF_ACTIVITY_INTERRUPTS"
}

| Setting | Options | Default |
| --- | --- | --- |
| voice | Aoede, Charon, Fenrir, Kore, Puck | Aoede |
| start_of_speech_sensitivity | START_SENSITIVITY_HIGH, START_SENSITIVITY_LOW | HIGH |
| end_of_speech_sensitivity | END_SENSITIVITY_HIGH, END_SENSITIVITY_LOW | HIGH |
| silence_duration_ms | Option<u32> | None |
| activity_handling | START_OF_ACTIVITY_INTERRUPTS, NO_INTERRUPTION | INTERRUPTS |
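
The tagged-enum shape described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's actual definition: the name `RealtimeConfigSketch`, its variant names, the subset of fields shown, and the `provider()` helper are all hypothetical, and the real type presumably derives its JSON mapping instead of hand-writing the tag.

```rust
/// Illustrative stand-in for `AgentRealtimeConfig`. Variant and field names
/// are assumptions inferred from the JSON examples, not the crate's types.
#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeConfigSketch {
    OpenAi {
        model: String,
        voice: String,
        speed: f32,
    },
    Gemini {
        model: String,
        voice: String,
        silence_duration_ms: Option<u32>,
    },
}

impl RealtimeConfigSketch {
    /// The value that would appear in the JSON `provider` field.
    pub fn provider(&self) -> &'static str {
        match self {
            Self::OpenAi { .. } => "openai",
            Self::Gemini { .. } => "gemini",
        }
    }

    /// Defaults copied from the OpenAI settings table.
    pub fn openai_default() -> Self {
        Self::OpenAi {
            model: "gpt-realtime-1.5".to_string(),
            voice: "marin".to_string(),
            speed: 1.0,
        }
    }
}
```

Because each variant carries only the fields its provider understands, a Gemini-only setting such as `silence_duration_ms` cannot be set on an OpenAI configuration by mistake.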

Each agent has a tool_config field controlling which tools are available and their per-tool settings:

pub struct AgentToolConfig {
    pub disabled_tools: Vec<String>, // Tools explicitly disabled for this agent
    pub tool_settings: HashMap<String, serde_json::Value>, // Per-tool config
}

An empty config (the default) enables all tools with no custom settings — preserving backward compatibility for existing agents.

| Tool | Enabled When | Description |
| --- | --- | --- |
| end_call | Always (cannot be disabled) | Ends the call programmatically via Twilio REST API |
| transfer_call | Not in disabled_tools | Transfers the active call to a specified phone number |
| query_knowledge | Agent has linked knowledge bases | Queries knowledge bases using LLM-powered search |
| lookup_caller | organization.client_lookup_url is configured | Looks up caller info via external API |
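
The rules in the table above can be read as a small resolution function. The following is a minimal sketch, not the module's implementation: `ToolConfigSketch` and `enabled_tools` are hypothetical names, per-tool settings are simplified to strings to avoid dependencies, and it assumes that `disabled_tools` also overrides `query_knowledge` and `lookup_caller`, which the table does not state explicitly.

```rust
use std::collections::HashMap;

/// Simplified mirror of `AgentToolConfig` (settings values reduced to strings).
#[derive(Default)]
pub struct ToolConfigSketch {
    pub disabled_tools: Vec<String>,
    pub tool_settings: HashMap<String, String>,
}

/// Returns the tools available for one call, following the table's rules.
pub fn enabled_tools(
    config: &ToolConfigSketch,
    has_knowledge_bases: bool,
    has_client_lookup_url: bool,
) -> Vec<&'static str> {
    let disabled = |name: &str| config.disabled_tools.iter().any(|t| t == name);

    // end_call is always present, even if it is listed in disabled_tools.
    let mut tools = vec!["end_call"];
    if !disabled("transfer_call") {
        tools.push("transfer_call");
    }
    // Assumption: disabled_tools also applies to the two conditional tools.
    if has_knowledge_bases && !disabled("query_knowledge") {
        tools.push("query_knowledge");
    }
    if has_client_lookup_url && !disabled("lookup_caller") {
        tools.push("lookup_caller");
    }
    tools
}
```

With `ToolConfigSketch::default()`, every tool whose precondition holds is returned, which matches the backward-compatible behavior of an empty config.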

The transfer_call tool supports optional per-agent transfer numbers. If tool_settings["transfer_call"] contains a TransferCallSettings object, the agent can only transfer to the configured numbers. If no settings are provided, the agent may transfer to any number mentioned in its prompt.

pub struct TransferCallSettings {
    pub numbers: Vec<TransferNumber>,
}

pub struct TransferNumber {
    pub number: String, // E.164 format: "+12125551234"
    pub label: String,  // "Sales", "Support", etc.
}
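
To make the allowlist behavior concrete, here is a hedged sketch of how a transfer target might be checked. The two structs are repeated so the block compiles on its own; `looks_like_e164` and `transfer_allowed` are hypothetical helpers, and the E.164 check is a loose shape test only. When no settings are present, the sketch validates the format and nothing else; checking that the number appears in the prompt is left out.

```rust
pub struct TransferNumber {
    pub number: String, // E.164 format: "+12125551234"
    pub label: String,  // "Sales", "Support", etc.
}

pub struct TransferCallSettings {
    pub numbers: Vec<TransferNumber>,
}

/// Loose E.164 shape check: '+', then 2 to 15 digits, first digit non-zero.
pub fn looks_like_e164(number: &str) -> bool {
    match number.strip_prefix('+') {
        Some(digits) => {
            (2..=15).contains(&digits.len())
                && !digits.starts_with('0')
                && digits.chars().all(|c| c.is_ascii_digit())
        }
        None => false,
    }
}

/// With settings: only configured numbers are allowed.
/// Without settings: any well-formed number passes this sketch.
pub fn transfer_allowed(settings: Option<&TransferCallSettings>, target: &str) -> bool {
    if !looks_like_e164(target) {
        return false;
    }
    match settings {
        Some(s) => s.numbers.iter().any(|n| n.number == target),
        None => true,
    }
}
```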

The UI renders a ToolConfigForm component with checkboxes for each tool and inline inputs for transfer numbers when transfer_call is enabled.

The module defines provider-agnostic traits for realtime communication:

#[async_trait]
pub trait RealtimeSessionSender: Send + Sync {
    async fn send_audio_delta(&mut self, audio_delta: String) -> Result<...>;
    async fn send_tool_response(&mut self, call_id: String, function_name: String, output: String) -> Result<...>;
}

#[async_trait]
pub trait RealtimeSessionReceiver: Send {
    async fn receive_event(&mut self) -> Result<RealtimeInEvent, ...>;
}

Events received from any provider are normalized into RealtimeInEvent:

| Variant | Meaning |
| --- | --- |
| AudioDelta | Audio chunk to relay to the caller |
| SpeechStarted | User interrupted; model cancelled its response |
| ResponseStarted | Model started a new response |
| ToolCall | Model requesting a function call |
| Unknown | Any other event (ignored) |
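
A provider-agnostic event type lets the call loop handle every backend with a single `match`. The sketch below is synchronous and dependency-free for brevity. The variant payloads, the `CallAction` enum, and the choice to clear buffered audio on `SpeechStarted` are assumptions made for illustration; only the variant names come from the table above.

```rust
/// Normalized provider events. Payload shapes here are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeInEvent {
    AudioDelta(String),
    SpeechStarted,
    ResponseStarted,
    ToolCall {
        call_id: String,
        function_name: String,
        arguments: String,
    },
    Unknown,
}

/// Hypothetical description of what the call loop should do next.
#[derive(Debug, PartialEq)]
pub enum CallAction {
    RelayAudio(String),
    ClearBufferedAudio,
    RunTool { call_id: String, function_name: String },
    Nothing,
}

/// Maps one normalized event to the next step of the call loop.
pub fn next_action(event: RealtimeInEvent) -> CallAction {
    match event {
        RealtimeInEvent::AudioDelta(chunk) => CallAction::RelayAudio(chunk),
        // Assumption: on interruption, queued audio for the caller is dropped.
        RealtimeInEvent::SpeechStarted => CallAction::ClearBufferedAudio,
        RealtimeInEvent::ToolCall { call_id, function_name, .. } => {
            CallAction::RunTool { call_id, function_name }
        }
        RealtimeInEvent::ResponseStarted | RealtimeInEvent::Unknown => CallAction::Nothing,
    }
}
```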

The instruction builder uses GPT-4.5 to generate and edit agent system prompts from natural language descriptions.

POST /api/instructions/generate

Creates structured voice agent instructions from a user description. Automatically includes organization context when available.

Request:

pub struct GenerateInstructionsRequest {
    pub user_description: String,
}

Response:

pub struct AiInstructionResponse {
    pub instructions: String,
}

The system appends organization profile data (if configured) to the user description before sending to GPT-4.5.
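
The append step might look roughly like the sketch below. The function name `build_generation_input` and the "Organization context" heading are hypothetical; the source only states that profile data is appended to the user description when it is configured.

```rust
/// Builds the text sent to the model: the user description, followed by the
/// organization context when one is available and non-empty.
pub fn build_generation_input(user_description: &str, org_context: Option<&str>) -> String {
    let description = user_description.trim();
    match org_context.map(str::trim).filter(|c| !c.is_empty()) {
        // The separator and heading are illustrative formatting choices.
        Some(context) => format!("{}\n\n## Organization context\n{}", description, context),
        None => description.to_string(),
    }
}
```

Treating a blank profile the same as a missing one keeps the prompt free of an empty context section.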

POST /api/instructions/edit

Refines existing instructions using a predefined action (improve, make concise, add examples, etc.).

Request:

pub struct EditInstructionsRequest {
    pub action: InstructionAction,
    pub current_instructions: String,
}

pub enum InstructionAction {
    Improve,
    MakeConcise,
    ImproveTone,
    AddExamples,
    Fix,
    StrengthenSafety,
    AddPersonality,
    EnhanceWithOrgData, // Injects org profile context
}

The EnhanceWithOrgData action fetches the organization profile and formats it as markdown context for the LLM.
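
One way to keep the organization lookup limited to the action that needs it is a small predicate on the enum. This is a sketch only: the enum is repeated so the block stands alone, and the `ALL` constant and `requires_org_profile` method are hypothetical additions, not part of the documented API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstructionAction {
    Improve,
    MakeConcise,
    ImproveTone,
    AddExamples,
    Fix,
    StrengthenSafety,
    AddPersonality,
    EnhanceWithOrgData,
}

impl InstructionAction {
    /// Every action, in declaration order (for example, to render one button each).
    pub const ALL: [InstructionAction; 8] = [
        InstructionAction::Improve,
        InstructionAction::MakeConcise,
        InstructionAction::ImproveTone,
        InstructionAction::AddExamples,
        InstructionAction::Fix,
        InstructionAction::StrengthenSafety,
        InstructionAction::AddPersonality,
        InstructionAction::EnhanceWithOrgData,
    ];

    /// True only for the action that injects organization profile context.
    pub fn requires_org_profile(self) -> bool {
        matches!(self, InstructionAction::EnhanceWithOrgData)
    }
}
```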

POST /api/instructions/custom-edit

Applies a free-text editing instruction (e.g., “make this sound more professional”).

Request:

pub struct CustomEditRequest {
    pub edit_instruction: String,
    pub current_instructions: String,
}

InstructionPreview — displays instructions with three view modes:

| Mode | Purpose |
| --- | --- |
| Preview | Rendered markdown with syntax highlighting |
| Edit | Raw textarea for direct editing |
| Diff | Tabs to compare current vs original instructions |

The component supports undo by snapshotting the original content on mount.
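
The snapshot-on-mount approach to undo can be illustrated independently of any UI framework. The `InstructionDraft` type and its methods below are hypothetical and only model the state handling; the real component's structure is not shown in this document.

```rust
/// Holds the content captured at mount time alongside the working copy.
pub struct InstructionDraft {
    original: String,
    current: String,
}

impl InstructionDraft {
    /// Called once when the component mounts: snapshot the initial content.
    pub fn mount(initial: &str) -> Self {
        Self {
            original: initial.to_string(),
            current: initial.to_string(),
        }
    }

    /// Replaces the working copy (manual edit or AI edit result).
    pub fn edit(&mut self, new_text: &str) {
        self.current = new_text.to_string();
    }

    /// True when the working copy differs from the snapshot.
    pub fn is_dirty(&self) -> bool {
        self.current != self.original
    }

    /// Restores the snapshot taken at mount time.
    pub fn undo(&mut self) {
        self.current = self.original.clone();
    }

    pub fn current(&self) -> &str {
        &self.current
    }
}
```

Because only one snapshot is kept, this form of undo always returns to the content present at mount time instead of stepping back through individual edits.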

AiInstructionToolbar — renders quick-action buttons for each InstructionAction. Clicking a button calls the edit API and updates the preview.

ToolConfigForm — per-agent tool toggles (transfer_call, lookup_caller, query_knowledge). When transfer_call is enabled, displays inline inputs for configuring transfer numbers (label + E.164 phone number).

src/mods/agent/
├── api/
│ ├── instruction_builder/ # Generate/edit/custom-edit APIs
│ └── ... # CRUD endpoints
├── components/
│ ├── instruction_builder/ # InstructionPreview, AiInstructionToolbar
│ └── ... # List, details, realtime config form
├── services/
│ ├── build_organization_context_service.rs # Formats org profile for LLM
│ └── ... # Session, tool handling
├── prompts/ # System prompt templates for AI editing actions
├── traits/ # RealtimeSessionSender, RealtimeSessionReceiver
├── types/
│ ├── instruction_builder/ # Request/response types, InstructionAction enum
│ └── ... # Agent, realtime config, tool definitions
└── views/ # Page views

  • Settings: organization profile powers instruction generation
  • Twilio: delivers incoming calls to agents
  • OpenAI: implements sender/receiver for OpenAI Realtime
  • Gemini: implements sender/receiver for Gemini Live
  • Knowledge: knowledge bases queried by the query_knowledge tool