# OpenAI

The `openai` module connects to OpenAI's Realtime API for live voice sessions and uses GPT-4o Transcribe for post-call transcription.
## Realtime Sessions

When an agent uses the OpenAI provider, `create_openai_realtime_session` establishes a WebSocket connection (sketched below):

- Build URL: `wss://api.openai.com/v1/realtime?model=<model_id>`
- Load API key from `core_conf` via `OpenAICoreConf`
- Connect with an `Authorization: Bearer <api_key>` header
- Send `session.update` with the full config (instructions, voice, VAD, tools)
- Send `response.create` to prime the model
- Return `(OpenAIRealtimeSessionSender, OpenAIRealtimeSessionReceiver)`
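A minimal sketch of that handshake, assuming the `tokio-tungstenite`, `futures-util`, `serde_json`, and `anyhow` crates; the function name and wiring here are illustrative, not the module's actual code:

```rust
use futures_util::SinkExt;
use tokio_tungstenite::{
    connect_async,
    tungstenite::{client::IntoClientRequest, Message},
};

async fn open_realtime_socket(model_id: &str, api_key: &str) -> anyhow::Result<()> {
    // Build the URL and attach the bearer token before the WebSocket upgrade.
    let url = format!("wss://api.openai.com/v1/realtime?model={model_id}");
    let mut request = url.into_client_request()?;
    request
        .headers_mut()
        .insert("Authorization", format!("Bearer {api_key}").parse()?);

    let (mut ws, _response) = connect_async(request).await?;

    // Prime the session: full config first, then ask for an initial response.
    let session_update = serde_json::json!({
        "type": "session.update",
        "session": { /* instructions, voice, VAD, tools ... */ }
    });
    ws.send(Message::Text(session_update.to_string().into())).await?;
    let prime = serde_json::json!({ "type": "response.create" });
    ws.send(Message::Text(prime.to_string().into())).await?;

    // The real function splits `ws` into the sender/receiver halves here.
    Ok(())
}
```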
### Sender (`OpenAIRealtimeSessionSender`)

Implements `RealtimeSessionSender`:
| Method | What It Sends |
|---|---|
| `send_audio_delta(audio)` | `input_audio_buffer.append` with base64 audio |
| `send_tool_response(call_id, _, output)` | `conversation.item.create` (`function_call_output`) + `response.create` |
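A sketch of what those two methods plausibly do on the wire, assuming the `base64`, `serde_json`, `futures-util`, and `tokio-tungstenite` crates; `WsSink` stands in for the module's actual write half:

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};
use futures_util::{stream::SplitSink, SinkExt};
use serde_json::json;
use tokio::net::TcpStream;
use tokio_tungstenite::{tungstenite::Message, MaybeTlsStream, WebSocketStream};

// Stand-in for the sender's write half of the WebSocket.
type WsSink = SplitSink<WebSocketStream<MaybeTlsStream<TcpStream>>, Message>;

async fn send_audio_delta(sink: &mut WsSink, audio: &[u8]) -> anyhow::Result<()> {
    // µ-law frames are base64-encoded into the input audio buffer.
    let event = json!({
        "type": "input_audio_buffer.append",
        "audio": STANDARD.encode(audio),
    });
    sink.send(Message::Text(event.to_string().into())).await?;
    Ok(())
}

async fn send_tool_response(sink: &mut WsSink, call_id: &str, output: &str) -> anyhow::Result<()> {
    // Attach the tool result to the conversation...
    let item = json!({
        "type": "conversation.item.create",
        "item": { "type": "function_call_output", "call_id": call_id, "output": output },
    });
    sink.send(Message::Text(item.to_string().into())).await?;
    // ...then ask the model to continue with that result in context.
    let resume = json!({ "type": "response.create" });
    sink.send(Message::Text(resume.to_string().into())).await?;
    Ok(())
}
```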
### Receiver (`OpenAIRealtimeSessionReceiver`)

Implements `RealtimeSessionReceiver`. Maps OpenAI events to `RealtimeInEvent`:
| OpenAI Event | → `RealtimeInEvent` |
|---|---|
| `response.output_audio.delta` | `AudioDelta` |
| `input_audio_buffer.speech_started` | `SpeechStarted` |
| `response.created` | `ResponseStarted` |
| `response.function_call_arguments.done` | `ToolCall` |
| Everything else (20+ event types) | `Unknown` |
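The mapping boils down to a match on the event's `type` field. A self-contained sketch with `serde_json` — the variant payloads and field access are assumptions, the real enum lives in the module's `types/`:

```rust
use serde_json::Value;

// Variant names come from the table above; payloads are illustrative.
enum RealtimeInEvent {
    AudioDelta(String), // base64 audio chunk
    SpeechStarted,
    ResponseStarted,
    ToolCall { call_id: String, name: String, arguments: String },
    Unknown,
}

fn map_event(raw: &Value) -> RealtimeInEvent {
    match raw["type"].as_str() {
        Some("response.output_audio.delta") => {
            // "delta" carries a base64-encoded audio chunk.
            RealtimeInEvent::AudioDelta(raw["delta"].as_str().unwrap_or_default().to_string())
        }
        Some("input_audio_buffer.speech_started") => RealtimeInEvent::SpeechStarted,
        Some("response.created") => RealtimeInEvent::ResponseStarted,
        Some("response.function_call_arguments.done") => RealtimeInEvent::ToolCall {
            call_id: raw["call_id"].as_str().unwrap_or_default().to_string(),
            name: raw["name"].as_str().unwrap_or_default().to_string(),
            arguments: raw["arguments"].as_str().unwrap_or_default().to_string(),
        },
        // The 20+ other server event types fall through here.
        _ => RealtimeInEvent::Unknown,
    }
}
```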
## Session Configuration

```rust
pub struct OpenaiRealtimeSessionConfig {
    pub r#type: String,                 // "realtime"
    pub model: String,                  // "gpt-realtime"
    pub output_modalities: Vec<String>, // ["audio"]
    pub audio: OpenaiRealtimeSessionConfigAudio,
    pub instructions: String,
    pub tools: Option<Vec<OpenaiRealtimeTool>>,
}
```

Audio input and output both use the `g711_ulaw` format (matching Twilio's µ-law stream). Output includes voice and speed settings. Input includes optional `noise_reduction` and `turn_detection`.
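For intuition, the kind of JSON this configuration serializes into, built with `serde_json::json!`. The audio sub-object's exact layout and the sample values are assumptions from the prose above, not the module's confirmed schema:

```rust
use serde_json::json;

// Illustrative session payload; real values come from OpenaiRealtimeSessionConfig.
let session = json!({
    "type": "realtime",
    "model": "gpt-realtime",
    "output_modalities": ["audio"],
    "audio": {
        "input": {
            "format": "g711_ulaw",                       // matches Twilio's µ-law stream
            "noise_reduction": { "type": "near_field" }, // optional
            "turn_detection": { "type": "server_vad" }   // optional VAD settings
        },
        "output": { "format": "g711_ulaw", "voice": "alloy", "speed": 1.0 }
    },
    "instructions": "You are a helpful phone agent."
});
```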
### Tool Definitions

```rust
pub struct OpenaiRealtimeTool {
    pub r#type: String,          // "function"
    pub name: String,            // e.g. "query_knowledge"
    pub description: String,
    pub parameters: serde_json::Value, // JSON Schema
}
```
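For example, a definition for the `query_knowledge` function named above; the description and schema body are illustrative:

```rust
use serde_json::json;

let tool = OpenaiRealtimeTool {
    r#type: "function".to_string(),
    name: "query_knowledge".to_string(),
    // Illustrative description; the real one lives wherever tools are registered.
    description: "Search the knowledge base and return relevant passages.".to_string(),
    parameters: json!({
        "type": "object",
        "properties": {
            "query": { "type": "string", "description": "The search query." }
        },
        "required": ["query"]
    }),
};
```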
## Transcription

Post-call transcription uses the OpenAI Audio API.
### Basic Usage

```rust
let response = transcribe_audio(recording_bytes).await?;
println!("{}", response.text);
```
### With Options

```rust
let request = TranscriptionRequest {
    file_bytes: recording_bytes,
    model: "gpt-4o-transcribe".to_string(),
    language: Some("en".to_string()),
    prompt: None,
    response_format: None,
};
let response = transcribe_audio_with_options(request).await?;
```
### Response

```rust
pub struct TranscriptionResponse {
    pub text: String,
    pub language: Option<String>,
    pub duration: Option<f64>,
    pub segments: Option<Vec<TranscriptionSegment>>,
    pub words: Option<Vec<TranscriptionWord>>,
}
```

Segments include timing, token, and confidence data. Words include start/end timestamps and optional speaker diarization.
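A sketch of consuming the optional segment data; the `start`, `end`, and `text` field names are assumptions based on the description above:

```rust
if let Some(segments) = &response.segments {
    for seg in segments {
        // Hypothetical fields: segment timing plus the transcribed text.
        println!("[{:.1}s - {:.1}s] {}", seg.start, seg.end, seg.text);
    }
}
```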
## Configuration

Loaded from the `core_conf` table:

```rust
pub struct OpenAICoreConf {
    pub openai_api_key: String,
}
```

Used by both realtime session creation and transcription.
## Constants

| Constant | Value |
|---|---|
| `OPENAI_REALTIME_BASE_URL` | `wss://api.openai.com/v1/realtime` |
| `OPENAI_TRANSCRIPTION_URL` | `https://api.openai.com/v1/audio/transcriptions` |
## Module Structure

```
src/mods/openai/
├── constants/   # API URLs (realtime WebSocket, transcription)
├── services/    # create_openai_realtime_session
├── types/       # Realtime events (20+ types), session config, transcription types
└── utils/       # transcribe_audio, transcribe_audio_with_options
```