Twilio

The twilio module handles all telephony through Twilio’s APIs. It receives incoming calls via webhooks, streams audio bidirectionally over WebSocket, processes recordings, manages phone numbers, and supports outbound calling through the Twilio Voice SDK.

Incoming call → Twilio
  → POST /twilio/voice
      → TwiML: start recording (mono) + open WebSocket stream
  → WSS /twilio/stream
      → start event → resolve agent, create call record, open AI session
      → media events → forward audio to AI provider
      → AI audio delta → send back to Twilio stream
      → speech started → clear Twilio audio buffer (interruption)
  → Call ends → stop event → close sessions
  → POST /twilio/recording (async)
      → download recording → transcribe → analyze → report → todos → contacts

| Path | Method | Handler | Auth |
| --- | --- | --- | --- |
| /twilio/voice | POST | twilio_voice | None (Twilio webhook) |
| /twilio/stream | WSS | twilio_stream | None (Twilio WebSocket) |
| /twilio/recording | POST | twilio_recording | None (Twilio webhook) |
| /twilio/voice/outbound | POST | twilio_voice_outbound | None (Twilio webhook) |
| /twilio/call-status | POST | twilio_call_status | None (Twilio webhook) |
| /api/twilio/access-token | GET | generate_access_token_api | Session required |
| /api/twilio/buy-phone-number | POST | buy_phone_number_api | Session required |

POST /twilio/voice receives the initial call from Twilio. It returns TwiML that:

  1. Starts a mono recording with a status callback to /twilio/recording
  2. Opens a WebSocket media stream to /twilio/stream with from and to as custom parameters
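
The two TwiML steps above can be sketched as a small string builder. This is an illustration, not the module's actual code: `inbound_twiml` and `xml_escape` are hypothetical helpers, `host` is assumed to be the public hostname, and the `<Recording>` attributes mirror the outbound TwiML shown later on this page with `channels` switched to mono.

```rust
/// Escape XML special characters so caller-supplied values
/// cannot break out of an attribute value.
fn xml_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Build inbound TwiML: start a mono recording, then connect the call to a
/// media stream that carries `from` and `to` as custom parameters.
/// (Hypothetical helper; the real handler may build this differently.)
fn inbound_twiml(host: &str, from: &str, to: &str) -> String {
    format!(
        concat!(
            "<Response>",
            "<Start><Recording channels=\"mono\" ",
            "recordingStatusCallback=\"https://{host}/twilio/recording\" /></Start>",
            "<Connect><Stream url=\"wss://{host}/twilio/stream\">",
            "<Parameter name=\"from\" value=\"{from}\" />",
            "<Parameter name=\"to\" value=\"{to}\" />",
            "</Stream></Connect>",
            "</Response>"
        ),
        host = xml_escape(host),
        from = xml_escape(from),
        to = xml_escape(to),
    )
}
```

Passing `from` and `to` as `<Parameter>` elements is what lets the stream handler read them back from the start event without a separate lookup.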

Error handling — when parsing the Twilio form data fails, the endpoint returns 400 Bad Request with a clear error message. This ensures BYO webhook errors propagate as 502 Bad Gateway instead of silently failing.

WSS /twilio/stream is the core real-time handler. On upgrade:

  1. Wait for start event — reads messages until a valid Start arrives (discards Connected and others)
  2. Resolve agent — calls handle_twilio_stream_in_event_start:
    • Looks up phone number → finds assigned agent
    • Loads agent’s realtime_config to determine provider
    • Resolves or creates a contact for the caller
    • Creates a call record in the database
  3. Create AI session — opens WebSocket to OpenAI or Gemini based on config
  4. Run two concurrent loops (tokio::select!):
    • Twilio listener — forwards audio from caller to AI via send_audio_delta
    • Realtime listener — dispatches AI events:
      • AudioDelta → send audio to Twilio (skipped if interrupted)
      • SpeechStarted → set interrupted flag, send clear to flush Twilio buffer
      • ResponseStarted → reset interrupted flag
      • ToolCall → execute tool, send result back to AI
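
The interruption handling in the realtime listener amounts to a small state machine around one flag. The sketch below models only that logic, without tokio or sockets. `RealtimeEvent`, `TwilioOut`, and `dispatch` are simplified stand-ins rather than the module's real types, and `ToolCall` is left out for brevity.

```rust
/// Events coming from the AI provider (simplified stand-ins).
#[derive(Debug, Clone, PartialEq)]
enum RealtimeEvent {
    AudioDelta(String), // base64-encoded audio chunk
    SpeechStarted,
    ResponseStarted,
}

/// Messages to send back over the Twilio stream.
#[derive(Debug, Clone, PartialEq)]
enum TwilioOut {
    Media(String),
    Clear,
}

/// Decide what (if anything) to send to Twilio for one AI event,
/// updating the `interrupted` flag along the way.
fn dispatch(event: RealtimeEvent, interrupted: &mut bool) -> Option<TwilioOut> {
    match event {
        // The caller is speaking: drop audio from the stale response.
        RealtimeEvent::AudioDelta(_) if *interrupted => None,
        RealtimeEvent::AudioDelta(payload) => Some(TwilioOut::Media(payload)),
        // Caller started talking: flag it and flush Twilio's playback buffer.
        RealtimeEvent::SpeechStarted => {
            *interrupted = true;
            Some(TwilioOut::Clear)
        }
        // A fresh response begins: audio may flow again.
        RealtimeEvent::ResponseStarted => {
            *interrupted = false;
            None
        }
    }
}
```

Keeping the flag outside the function makes the decision easy to test in isolation; in the real handler it would live alongside the two loops.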

POST /twilio/recording fires asynchronously after the call ends. It spawns a background task (tokio::spawn) that:

  1. Downloads the MP3 recording using the org’s Twilio credentials
  2. Saves the audio file ({call_sid}.mp3)
  3. Transcribes via transcribe_audio (GPT-4o Transcribe)
  4. Saves the transcript ({call_sid}.txt)
  5. Updates the call record with recording_url and transcription
  6. Runs post-call pipeline (all fire-and-forget, errors logged but not propagated):
    • enrich_contact — update contact with call context
    • analyze_call — run all org analyzers
    • extract_todos — AI-powered to-do extraction
    • send_call_report — email report
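
The "errors logged but not propagated" behavior of step 6 can be sketched synchronously as below. This is an assumption-laden illustration: `Step` and `run_post_call_pipeline` are hypothetical names, the closures stand in for `enrich_contact`, `analyze_call`, and the other steps, and the returned `Vec<String>` stands in for the log. The real pipeline runs inside a spawned task.

```rust
/// One named post-call step. The closure is a stand-in for the real work.
struct Step<'a> {
    name: &'a str,
    run: Box<dyn Fn() -> Result<(), String> + 'a>,
}

impl<'a> Step<'a> {
    fn new(name: &'a str, run: impl Fn() -> Result<(), String> + 'a) -> Self {
        Step { name, run: Box::new(run) }
    }
}

/// Run every step in order. A failure is recorded and the loop continues,
/// so one broken step never prevents the others from running.
fn run_post_call_pipeline(steps: &[Step<'_>]) -> Vec<String> {
    let mut log = Vec::new();
    for step in steps {
        if let Err(err) = (step.run)() {
            log.push(format!("{} failed: {}", step.name, err));
        }
    }
    log
}
```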
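
Inbound stream events (Twilio → server):

| Event | Type | Key Fields |
| --- | --- | --- |
| connected | TwilioStreamInEventConnected | protocol, version |
| start | TwilioStreamInEventStart | stream_sid, call_sid, from, to, media_format |
| media | TwilioStreamInEventMedia | payload (base64 µ-law), sequence_number |
| stop | TwilioStreamInEventStop | call_sid, account_sid |

Outbound stream events (server → Twilio):

| Event | Type | Purpose |
| --- | --- | --- |
| media | TwilioStreamOutEventMedia | Audio chunk for playback |
| clear | TwilioStreamOutEventClear | Flush buffered audio (interruption) |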
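
For reference, the two outbound messages are small JSON documents on the wire. The sketch below builds them by hand to show their shape. The function names are hypothetical, and the module itself presumably serializes its `TwilioStreamOutEvent*` types instead of formatting strings.

```rust
/// JSON for a `media` message: one base64 µ-law chunk to play to the caller.
/// Assumes both arguments contain only JSON-safe characters, which holds for
/// Twilio stream SIDs and base64 text.
fn media_message(stream_sid: &str, payload_b64: &str) -> String {
    format!(
        r#"{{"event":"media","streamSid":"{stream_sid}","media":{{"payload":"{payload_b64}"}}}}"#
    )
}

/// JSON for a `clear` message: ask Twilio to drop any audio it has buffered.
fn clear_message(stream_sid: &str) -> String {
    format!(r#"{{"event":"clear","streamSid":"{stream_sid}"}}"#)
}
```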

Audio format: base64-encoded µ-law, 8kHz, mono.
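
Because µ-law uses one byte per sample, the format above makes chunk timing simple arithmetic: at 8 kHz, 160 bytes of decoded payload is 20 ms of audio. The sketch below shows that calculation together with the standard G.711 µ-law expansion to 16-bit PCM, in case a consumer needs linear samples. Both helpers are illustrative and not part of the module.

```rust
const SAMPLE_RATE_HZ: u32 = 8_000;

/// Duration of a µ-law chunk in milliseconds (one byte per sample, mono).
fn chunk_duration_ms(mulaw_bytes: usize) -> u32 {
    (mulaw_bytes as u32 * 1_000) / SAMPLE_RATE_HZ
}

/// Expand one G.711 µ-law byte to a 16-bit linear PCM sample.
fn mulaw_to_pcm16(byte: u8) -> i16 {
    const BIAS: i32 = 0x84;
    let u = !byte; // µ-law bytes are stored bit-inverted
    let exponent = ((u >> 4) & 0x07) as u32;
    let mantissa = (u & 0x0F) as i32;
    let magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
    if u & 0x80 != 0 {
        (-magnitude) as i16
    } else {
        magnitude as i16
    }
}
```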

Three server-side components support outbound calls initiated from the Call module’s dial pad.

Access Token (GET /api/twilio/access-token)

Returns a short-lived JWT for the Twilio Voice SDK running in the browser.

pub struct AccessTokenResponse {
    pub token: String,    // JWT, 1-hour expiry
    pub identity: String, // "user_{member_id}"
}

On first request per organization, the endpoint auto-provisions two Twilio resources:

  1. TwiML Application — created via POST /2010-04-01/Accounts/{sid}/Applications.json with VoiceUrl pointing to /twilio/voice/outbound. The SID is stored in organization_twilio_settings.twiml_app_sid.
  2. API Key — created via POST /2010-04-01/Accounts/{sid}/Keys.json. The SID and secret are stored in organization_twilio_settings.api_key_sid and api_key_secret.

Subsequent requests skip provisioning and reuse the stored credentials.
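
The provision-once behavior can be sketched as a check on the stored fields before calling out to Twilio. Everything here is hypothetical scaffolding: `VoiceCredentials` mirrors three columns of organization_twilio_settings, and the two closures stand in for the HTTP calls that create the TwiML Application and the API Key.

```rust
/// The auto-provisioned fields of organization_twilio_settings.
#[derive(Debug, Default, Clone, PartialEq)]
struct VoiceCredentials {
    twiml_app_sid: Option<String>,
    api_key_sid: Option<String>,
    api_key_secret: Option<String>,
}

/// Create whichever resources are missing and store their identifiers.
/// When everything is already present, neither closure is called.
fn ensure_provisioned(
    creds: &mut VoiceCredentials,
    create_app: impl FnOnce() -> Result<String, String>,
    create_key: impl FnOnce() -> Result<(String, String), String>,
) -> Result<(), String> {
    if creds.twiml_app_sid.is_none() {
        creds.twiml_app_sid = Some(create_app()?);
    }
    if creds.api_key_sid.is_none() || creds.api_key_secret.is_none() {
        let (sid, secret) = create_key()?;
        creds.api_key_sid = Some(sid);
        creds.api_key_secret = Some(secret);
    }
    Ok(())
}
```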

The JWT uses HS256 with content type twilio-fpa;v=1, includes a voice grant scoped to the TwiML App SID, and expires after one hour. The client refreshes the token when the Voice SDK emits its tokenWillExpire event.
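
As an illustration of what goes into the token, the sketch below builds the unsigned header and claims JSON. The claim layout (`jti`, `iss`, `sub`, `exp`, `grants.voice.outgoing.application_sid`) follows Twilio's access token format as commonly documented and should be treated as an assumption, not the module's verified output. The function names are hypothetical, and signing with the API Key secret (HMAC-SHA256) is left out because it needs a crypto crate.

```rust
/// JWT header for a Twilio access token.
fn token_header() -> String {
    r#"{"typ":"JWT","alg":"HS256","cty":"twilio-fpa;v=1"}"#.to_string()
}

/// Unsigned claims for a voice-enabled access token with a one-hour expiry.
/// Assumes all inputs are JSON-safe (SIDs and identifiers normally are).
fn token_claims(
    account_sid: &str,
    api_key_sid: &str,
    twiml_app_sid: &str,
    member_id: &str,
    now_unix: u64,
) -> String {
    let identity = format!("user_{member_id}");
    let exp = now_unix + 3_600; // one hour
    format!(
        r#"{{"jti":"{api_key_sid}-{now_unix}","iss":"{api_key_sid}","sub":"{account_sid}","exp":{exp},"grants":{{"identity":"{identity}","voice":{{"outgoing":{{"application_sid":"{twiml_app_sid}"}}}}}}}}"#
    )
}
```

The final token is the base64url-encoded header and claims joined by a dot, followed by the signature over that string.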

Outbound Voice Webhook (POST /twilio/voice/outbound)

Twilio calls this webhook when the browser SDK initiates an outbound call via Device.connect(). The handler receives form parameters To, CallerId (or From), and ContactId.

It returns TwiML that:

  1. Starts a dual-channel recording with a callback to /twilio/recording
  2. Dials the destination number with the resolved caller ID

<Response>
  <Start>
    <Recording track="both" channels="dual"
               recordingStatusCallback="https://{host}/twilio/recording" />
  </Start>
  <Dial callerId="{caller_id}"
        action="https://{host}/twilio/call-status">
    <Number>{to}</Number>
  </Dial>
</Response>

A background task creates the call record with direction: "outbound" and status: "in-progress". Phone numbers in the TwiML are sanitized to prevent XML injection — only digits, +, -, and spaces pass through.
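
The allow-list described above (digits, +, -, and spaces) fits in a one-line filter. The helper name `sanitize_phone` is hypothetical; the module's own function may differ.

```rust
/// Keep only characters that can appear in a dialable number.
/// Anything else, including XML metacharacters, is dropped.
fn sanitize_phone(input: &str) -> String {
    input
        .chars()
        .filter(|c| c.is_ascii_digit() || matches!(*c, '+' | '-' | ' '))
        .collect()
}
```

An allow-list is the safer choice here: rather than trying to escape every dangerous character, it removes everything that is not known to be harmless before the value reaches the TwiML.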

Call Status Webhook (POST /twilio/call-status)

Fires when the outbound call ends. Receives CallSid, CallStatus, and DialCallDuration from Twilio. Updates the call record’s status to "completed" and sets duration_secs. Runs asynchronously (non-blocking).
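
A minimal sketch of reading those three form fields, assuming the form body has already been decoded into a map. `CallStatusUpdate` and `parse_call_status` are hypothetical names; the real handler presumably uses its web framework's form extractor.

```rust
use std::collections::HashMap;

/// The fields the webhook needs from Twilio's form body.
#[derive(Debug, PartialEq)]
struct CallStatusUpdate {
    call_sid: String,
    status: String,
    duration_secs: Option<u32>,
}

/// Returns None when a required field is missing. A missing or non-numeric
/// DialCallDuration is tolerated and becomes None.
fn parse_call_status(form: &HashMap<String, String>) -> Option<CallStatusUpdate> {
    let call_sid = form.get("CallSid")?.clone();
    let status = form.get("CallStatus")?.clone();
    let duration_secs = form
        .get("DialCallDuration")
        .and_then(|d| d.parse::<u32>().ok());
    Some(CallStatusUpdate { call_sid, status, duration_secs })
}
```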

| Utility | What It Does |
| --- | --- |
| buy_twilio_phone_number | POST to Twilio API to purchase a number by area code |
| create_twilio_subaccount | Create a per-org Twilio subaccount |
| set_phone_number_twiml | Set the voice webhook URL on a phone number |

The buy flow also assigns default analyzers to the new number and configures the webhook URL.

pub struct TwilioCoreConf {
    pub twilio_main_sid: String,
    pub twilio_main_token: String,
}

Per-Organization (organization_twilio_settings table)

Twilio credentials are stored in the organization_twilio_settings table (one-to-one with organization).

pub struct OrganizationTwilioSettings {
    pub id: Uuid,
    pub organization_id: Uuid,
    pub account_sid: String,
    pub auth_token: String,
    pub is_byo: bool,                   // true = BYO Twilio, false = platform-managed
    pub event_sink_sid: Option<String>,
    pub event_subscription_sid: Option<String>,
    pub sms_events_active: bool,
    pub connected_at: Option<DateTime>,
    pub twiml_app_sid: Option<String>,  // Auto-provisioned TwiML Application
    pub api_key_sid: Option<String>,    // Auto-provisioned API Key SID
    pub api_key_secret: Option<String>, // Auto-provisioned API Key secret
}

Use get_twilio_settings(org_id) to retrieve settings for an organization. This helper returns AppError::NotFound if no settings exist.

BYO Twilio — when is_byo: true, the org owns the Twilio account. Voice webhook configuration (configure_phone_voice_api) captures the previous webhook URL before overwriting it and stores it in phone_number.previous_voice_url for rollback.
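
The capture-before-overwrite step can be sketched with an in-memory record. This is a simplification under stated assumptions: `PhoneNumber` here has only two fields, `voice_url` is a hypothetical stand-in for the value the real code reads from Twilio, and both helper functions are illustrative.

```rust
/// Minimal stand-in for a phone number row.
#[derive(Debug, Default)]
struct PhoneNumber {
    voice_url: Option<String>,          // current webhook (hypothetical field)
    previous_voice_url: Option<String>, // saved for rollback
}

/// Remember the existing webhook, then point the number at `new_url`.
fn configure_voice_url(number: &mut PhoneNumber, new_url: &str) {
    number.previous_voice_url = number.voice_url.take();
    number.voice_url = Some(new_url.to_string());
}

/// Restore whatever webhook was in place before configuration.
fn rollback_voice_url(number: &mut PhoneNumber) {
    number.voice_url = number.previous_voice_url.take();
}
```

Saving the old URL matters for BYO accounts because the org may already route that number elsewhere, and disconnecting should leave their setup as it was.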

| Constant | Value |
| --- | --- |
| TWILIO_API_BASE | https://api.twilio.com/2010-04-01 |
| TWILIO_VOICE_PATH | /twilio/voice |
| TWILIO_STREAM_PATH | /twilio/stream |
| TWILIO_RECORDING_PATH | /twilio/recording |
| TWILIO_VOICE_OUTBOUND_PATH | /twilio/voice/outbound |
| TWILIO_CALL_STATUS_PATH | /twilio/call-status |
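
In Rust these would plausibly be plain string constants. The sketch below declares them with the values from the table and adds two hypothetical helpers (`webhook_url`, `account_resource_url`) to show how full URLs might be assembled. The helpers are assumptions and not part of the documented module.

```rust
pub const TWILIO_API_BASE: &str = "https://api.twilio.com/2010-04-01";
pub const TWILIO_VOICE_PATH: &str = "/twilio/voice";
pub const TWILIO_STREAM_PATH: &str = "/twilio/stream";
pub const TWILIO_RECORDING_PATH: &str = "/twilio/recording";
pub const TWILIO_VOICE_OUTBOUND_PATH: &str = "/twilio/voice/outbound";
pub const TWILIO_CALL_STATUS_PATH: &str = "/twilio/call-status";

/// Absolute HTTPS URL for one of the webhook paths on this server.
fn webhook_url(host: &str, path: &str) -> String {
    format!("https://{}{}", host, path)
}

/// URL of an account-scoped Twilio REST resource, e.g. "Applications" or "Keys".
fn account_resource_url(account_sid: &str, resource: &str) -> String {
    format!("{}/Accounts/{}/{}.json", TWILIO_API_BASE, account_sid, resource)
}
```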

src/mods/twilio/
├── api/        # Voice webhooks (inbound + outbound), recording, call status,
│               # access token, buy number
├── conf/       # TwilioCoreConf (from core_conf table)
├── constants/  # API base URL, webhook paths
├── routes/     # WebSocket stream handler (the core loop)
├── services/   # Start event handling, media forwarding
├── types/      # Stream events (in/out), call data, phone numbers, recording,
│               # access token response
└── utils/      # Buy number, create subaccount, process recording, set TwiML,
                # create API key, create TwiML app, generate access token

  • Agent — resolved from phone number, provides realtime config and tools
  • OpenAI — receives audio stream for AI processing
  • Call — call records created during stream start
  • Contact — caller resolution and enrichment
  • Analyzer — post-call analysis triggered after recording