
AI Usage Tracking

Loquent logs every AI call — text generation, transcription, and realtime voice sessions — to the ai_usage_log table. Each record captures the organization, feature, model, and token counts, enabling per-org cost tracking and usage analytics.

After every AI call completes, a fire-and-forget logger spawns a background task to insert a usage record. Logging never blocks the response or propagates errors — failures produce a warning log and nothing more.

```rust
use crate::mods::ai::{spawn_log_ai_usage, AiUsageEntry, AiUsageFeature, AiModels};

let response = builder.build().generate_text().await?;

// Fire-and-forget: logs asynchronously, never fails the caller
spawn_log_ai_usage(AiUsageEntry::from_text_generation(
    organization_id,
    Some(call_id), // None if not call-related
    AiUsageFeature::SummarizeCall,
    AiModels::SUMMARIZE_CALL,
    &response.usage(),
));

let result = response.into_schema()?;
```

For transcriptions, use from_transcription with the usage data from the OpenAI response:

```rust
// Extract token counts from the transcription response
let (input_tokens, output_tokens, input_audio_tokens) = response
    .usage
    .map(|u| {
        // Borrow first so the second and_then can still consume the Option
        let input_audio = u.input_token_details.as_ref().and_then(|d| d.audio_tokens);
        let input_text = u.input_token_details.and_then(|d| d.text_tokens);
        (input_text, u.output_tokens, input_audio)
    })
    .unwrap_or((None, None, None));

spawn_log_ai_usage(AiUsageEntry::from_transcription(
    organization_id,
    Some(call_id),
    AiUsageFeature::Transcription,
    "gpt-4o-transcribe",
    input_tokens,
    output_tokens,
    input_audio_tokens,
    Some(duration_secs),
));
```

During voice calls, each model response turn emits usage data. Loquent captures these per-turn token counts from both OpenAI and Gemini realtime sessions using a provider-agnostic RealtimeUsageTick struct.

src/mods/agent/types/realtime_usage_tick_type.rs

```rust
#[derive(Debug, Clone)]
pub struct RealtimeUsageTick {
    pub provider: &'static str,            // "openai" or "gemini"
    pub model: String,                     // e.g., "gpt-4o-realtime-preview"
    pub input_tokens: Option<i32>,
    pub output_tokens: Option<i32>,
    pub input_audio_tokens: Option<i32>,
    pub output_audio_tokens: Option<i32>,
    pub cached_tokens: Option<i32>,
}
```

OpenAI sends a response.done event after each model response turn. The event includes usage with input_token_details and output_token_details that break down text vs. audio tokens.

Gemini includes usage_metadata on server messages with prompt_token_count and candidates_token_count. Gemini doesn’t provide separate audio token counts — those fields are None.

Gemini model names arrive as full Vertex AI resource paths. Loquent strips the path prefix, converting projects/.../models/gemini-live-2.5-flash-native-audio to gemini-live-2.5-flash-native-audio.
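A minimal sketch of that normalization, assuming the stripping is a plain "take everything after the last slash" (the path below is an illustrative example, not a real resource name):

```rust
/// Strip a Vertex AI resource path down to the bare model name.
/// Illustrative helper mirroring the behavior described above.
fn strip_model_path(model: &str) -> String {
    // rsplit('/').next() yields the segment after the last '/',
    // or the whole string when there is no '/'.
    model.rsplit('/').next().unwrap_or(model).to_string()
}
```

A name that already lacks a path prefix passes through unchanged, so the helper is safe to apply unconditionally.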

In the Twilio stream route’s realtime event loop, UsageTick events trigger a fire-and-forget log:

```rust
RealtimeInEvent::UsageTick(tick) => {
    spawn_log_ai_usage(AiUsageEntry::from_realtime_turn(
        organization_id,
        call_id,
        tick,
    ));
}
```

from_realtime_turn maps the tick to an AiUsageEntry with usage_type: "realtime" and feature: "realtime_turn". Every turn in a voice call produces one row in ai_usage_log.
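A sketch of that mapping, reusing the RealtimeUsageTick fields from above with a trimmed, illustrative AiUsageEntry (the real struct and constructor also carry organization_id, call_id, provider, and other columns):

```rust
#[derive(Debug, Clone)]
pub struct RealtimeUsageTick {
    pub provider: &'static str,
    pub model: String,
    pub input_tokens: Option<i32>,
    pub output_tokens: Option<i32>,
    pub input_audio_tokens: Option<i32>,
    pub output_audio_tokens: Option<i32>,
    pub cached_tokens: Option<i32>,
}

// Trimmed for illustration: the real AiUsageEntry has more fields.
#[derive(Debug)]
pub struct AiUsageEntry {
    pub usage_type: &'static str,
    pub feature: &'static str,
    pub model: String,
    pub input_tokens: Option<i32>,
    pub output_tokens: Option<i32>,
    pub input_audio_tokens: Option<i32>,
    pub output_audio_tokens: Option<i32>,
    pub cached_tokens: Option<i32>,
}

impl AiUsageEntry {
    /// Map one realtime turn's tick to a usage row.
    pub fn from_realtime_turn(tick: RealtimeUsageTick) -> Self {
        Self {
            usage_type: "realtime",
            feature: "realtime_turn",
            model: tick.model,
            input_tokens: tick.input_tokens,
            output_tokens: tick.output_tokens,
            input_audio_tokens: tick.input_audio_tokens,
            output_audio_tokens: tick.output_audio_tokens,
            cached_tokens: tick.cached_tokens,
        }
    }
}
```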

The ai_usage_log table stores one row per AI call:

| Column | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| organization_id | UUID | FK to organization (CASCADE) |
| call_id | UUID? | FK to call (SET NULL) — only for call-related features |
| feature | TEXT | Snake_case feature label from AiUsageFeature |
| provider | TEXT | Extracted from model slug (e.g., "google", "openai", "anthropic", "deepseek") |
| model | TEXT | Model identifier (e.g., "deepseek/deepseek-v3.2") |
| usage_type | TEXT | "text_generation", "transcription", or "realtime" |
| input_tokens | INT? | Input tokens consumed |
| output_tokens | INT? | Output tokens generated |
| cached_tokens | INT? | Cached/prompt-cached tokens |
| reasoning_tokens | INT? | Reasoning tokens (extended thinking models) |
| input_audio_tokens | INT? | Input audio tokens (realtime voice sessions) |
| output_audio_tokens | INT? | Output audio tokens (realtime voice sessions) |
| audio_duration_secs | DOUBLE? | Audio duration in seconds (transcription only) |
| created_at | TIMESTAMPTZ | Auto-set to CURRENT_TIMESTAMP |

Indexes: A composite index on (organization_id, created_at) optimizes per-org date-range queries. A partial index on call_id WHERE call_id IS NOT NULL speeds up call-specific lookups.
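Expressed as DDL, those two indexes might look like the following (index names are illustrative; the actual definitions live in the Rust migration, m20260308_000000_create_ai_usage_log_table.rs):

```sql
-- Composite index for per-org date-range queries
CREATE INDEX idx_ai_usage_log_org_created
    ON ai_usage_log (organization_id, created_at);

-- Partial index: only call-related rows, keeping the index small
CREATE INDEX idx_ai_usage_log_call
    ON ai_usage_log (call_id)
    WHERE call_id IS NOT NULL;
```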

AiUsageFeature in src/mods/ai/types/ai_usage_type.rs identifies which capability generated the usage:

| Variant | Feature | Provider |
| --- | --- | --- |
| EnrichContact | Contact enrichment from transcription | OpenRouter |
| EnrichContactFromMessages | Contact enrichment from SMS history | OpenRouter |
| SummarizeCall | Call summary generation | OpenRouter |
| UpdateContactMemory | System note updates | OpenRouter |
| AnalyzeCall | Call analysis (user-defined analyzers) | OpenRouter |
| IdentifySpeakers | Speaker identification | OpenRouter |
| AutoTagContact | Automatic contact tagging | OpenRouter |
| ExtractTodos | Todo extraction from calls | OpenRouter |
| ExecuteTodo | Todo action execution | OpenRouter |
| QueryKnowledge | Knowledge base RAG queries | OpenRouter |
| GenerateInstructions | AI builder: generate agent instructions | OpenRouter |
| EditInstructions | AI builder: edit agent instructions | OpenRouter |
| CustomEditInstructions | AI builder: custom edit instructions | OpenRouter |
| Transcription | Standard audio transcription | OpenAI |
| DiarizedTranscription | Diarized transcription (speaker labels) | OpenAI |
| TextAgentSuggestions | Text agent response suggestions | OpenRouter |
| RealtimeTurn | Per-turn usage from realtime voice sessions | OpenAI / Gemini |

When you add a new AI call site, follow these steps:

  1. Add an AiUsageFeature variant and its as_str() mapping in ai_usage_type.rs.
  2. Add an AiModels constant in ai_models_type.rs if you’re using a new model.
  3. Ensure organization_id is available — add it as a function parameter if needed.
  4. Call spawn_log_ai_usage immediately after receiving the AI response, before consuming it.
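Step 1 might look like the following sketch (the enum shape is simplified to two variants, and TranslateCall is a hypothetical new feature, not a real one):

```rust
// Simplified sketch of ai_usage_type.rs; TranslateCall is an
// illustrative example of a newly added variant.
pub enum AiUsageFeature {
    SummarizeCall,
    TranslateCall,
}

impl AiUsageFeature {
    /// Snake_case label stored in the `feature` column.
    pub fn as_str(&self) -> &'static str {
        match self {
            Self::SummarizeCall => "summarize_call",
            Self::TranslateCall => "translate_call",
        }
    }
}
```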

Loquent calculates costs server-side using hardcoded per-model pricing in src/mods/ai/types/ai_pricing_type.rs. The get_model_pricing(provider, model) function returns a ModelPricing struct with rates per 1M tokens for input, output, cached, and audio tokens.
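The arithmetic reduces to rate × tokens / 1M per category. A minimal sketch, assuming a trimmed ModelPricing with only input and output rates (the real struct also carries cached and audio rates):

```rust
/// Illustrative subset of ModelPricing: USD per 1M tokens.
pub struct ModelPricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

/// Cost of one logged call, ignoring cached/audio rates for brevity.
pub fn cost_usd(pricing: &ModelPricing, input_tokens: i64, output_tokens: i64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * pricing.input_per_million
        + (output_tokens as f64 / 1_000_000.0) * pricing.output_per_million
}
```

For example, at $1/1M input and $2/1M output, a call with 500k input and 250k output tokens costs $1.00.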

The Admin AI Costs dashboard (GET /api/admin/ai-costs) uses this pricing engine to aggregate costs across models, providers, features, and organizations, with period-over-period comparisons. See the Admin module docs for details.

For raw data, query the database directly:

```sql
SELECT
    feature,
    model,
    SUM(COALESCE(input_tokens, 0)) AS total_input,
    SUM(COALESCE(output_tokens, 0)) AS total_output,
    COUNT(*) AS call_count
FROM ai_usage_log
WHERE organization_id = 'your-org-id'
  AND created_at >= '2026-03-01'
  AND created_at < '2026-04-01'
  AND usage_type = 'text_generation'
GROUP BY feature, model
ORDER BY total_output DESC;
```

| File | Purpose |
| --- | --- |
| src/mods/ai/types/ai_usage_type.rs | AiUsageFeature enum and AiUsageEntry struct |
| src/mods/ai/services/log_ai_usage_service.rs | spawn_log_ai_usage() fire-and-forget logger |
| src/mods/ai/types/ai_pricing_type.rs | ModelPricing struct and get_model_pricing() lookup |
| src/mods/ai/types/ai_models_type.rs | Centralized model registry |
| src/mods/agent/types/realtime_usage_tick_type.rs | RealtimeUsageTick provider-agnostic struct |
| src/mods/openai/types/openai_realtime_in_event_response_done_type.rs | OpenAI response.done usage parsing |
| src/mods/gemini/types/gemini_realtime_in_event_type.rs | Gemini usage_metadata parsing |
| src/mods/twilio/routes/twilio_stream_route.rs | Realtime event loop with usage logging |
| migration/src/m20260308_000000_create_ai_usage_log_table.rs | Table migration |