
AI Usage Tracking

Loquent logs every AI call — text generation, transcription, and realtime voice sessions — to the ai_usage_log table. Each record captures the organization, feature, model, and token counts, enabling per-org cost tracking and usage analytics.

After every AI call completes, a fire-and-forget logger spawns a background task to insert a usage record. Logging never blocks the response or propagates errors — failures produce a warning log and nothing more.

```rust
use crate::mods::ai::{spawn_log_ai_usage, AiUsageEntry, AiUsageFeature, AiModels};

let response = builder.build().generate_text().await?;

// Fire-and-forget: logs asynchronously, never fails the caller
spawn_log_ai_usage(AiUsageEntry::from_text_generation(
    organization_id,
    Some(call_id), // None if not call-related
    AiUsageFeature::SummarizeCall,
    AiModels::SUMMARIZE_CALL,
    &response.usage(),
));

let result = response.into_schema()?;
```

For transcriptions, use from_transcription with the usage data from the OpenAI response:

```rust
// Extract token counts from the transcription response
let (input_tokens, output_tokens, input_audio_tokens) = response
    .usage
    .map(|u| {
        // Borrow first so the second and_then can still consume the Option
        let input_audio = u.input_token_details.as_ref().and_then(|d| d.audio_tokens);
        let input_text = u.input_token_details.and_then(|d| d.text_tokens);
        (input_text, u.output_tokens, input_audio)
    })
    .unwrap_or((None, None, None));

spawn_log_ai_usage(AiUsageEntry::from_transcription(
    organization_id,
    Some(call_id),
    AiUsageFeature::Transcription,
    "gpt-4o-transcribe",
    input_tokens,
    output_tokens,
    input_audio_tokens,
    Some(duration_secs),
));
```

During voice calls, each model response turn emits usage data. Loquent captures these per-turn token counts from both OpenAI and Gemini realtime sessions using a provider-agnostic RealtimeUsageTick struct.

src/mods/agent/types/realtime_usage_tick_type.rs

```rust
#[derive(Debug, Clone)]
pub struct RealtimeUsageTick {
    pub provider: &'static str,            // "openai" or "gemini"
    pub model: String,                     // e.g., "gpt-4o-realtime-preview"
    pub input_tokens: Option<i32>,
    pub output_tokens: Option<i32>,
    pub input_audio_tokens: Option<i32>,
    pub output_audio_tokens: Option<i32>,
    pub cached_tokens: Option<i32>,
}
```

OpenAI sends a response.done event after each model response turn. The event includes usage with input_token_details and output_token_details that break down text vs. audio tokens.

Gemini includes usage_metadata on server messages with prompt_token_count and candidates_token_count. Gemini doesn’t provide separate audio token counts — those fields are None.

Gemini model names arrive as full Vertex AI resource paths. Loquent strips the path prefix, converting projects/.../models/gemini-live-2.5-flash-native-audio to gemini-live-2.5-flash-native-audio.
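A minimal sketch of that normalization, assuming the stripping is a plain "take everything after the last slash" (the path below is an illustrative example, not a real resource name):

```rust
/// Strip a Vertex AI resource path down to the bare model name.
/// Illustrative helper mirroring the behavior described above.
fn strip_model_path(model: &str) -> String {
    // rsplit('/').next() yields the segment after the last '/',
    // or the whole string when there is no '/'.
    model.rsplit('/').next().unwrap_or(model).to_string()
}
```

A name that already lacks a path prefix passes through unchanged, so the helper is safe to apply unconditionally.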

In the Twilio stream route’s realtime event loop, UsageTick events trigger a fire-and-forget log:

```rust
RealtimeInEvent::UsageTick(tick) => {
    spawn_log_ai_usage(AiUsageEntry::from_realtime_turn(
        organization_id,
        call_id,
        tick,
    ));
}
```

from_realtime_turn maps the tick to an AiUsageEntry with usage_type: "realtime" and feature: "realtime_turn". Every turn in a voice call produces one row in ai_usage_log.
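A sketch of that mapping, reusing the RealtimeUsageTick fields from above with a trimmed, illustrative AiUsageEntry (the real struct and constructor also carry organization_id, call_id, provider, and other columns):

```rust
#[derive(Debug, Clone)]
pub struct RealtimeUsageTick {
    pub provider: &'static str,
    pub model: String,
    pub input_tokens: Option<i32>,
    pub output_tokens: Option<i32>,
    pub input_audio_tokens: Option<i32>,
    pub output_audio_tokens: Option<i32>,
    pub cached_tokens: Option<i32>,
}

// Trimmed for illustration: the real AiUsageEntry has more fields.
#[derive(Debug)]
pub struct AiUsageEntry {
    pub usage_type: &'static str,
    pub feature: &'static str,
    pub model: String,
    pub input_tokens: Option<i32>,
    pub output_tokens: Option<i32>,
    pub input_audio_tokens: Option<i32>,
    pub output_audio_tokens: Option<i32>,
    pub cached_tokens: Option<i32>,
}

impl AiUsageEntry {
    /// Map one realtime turn's tick to a usage row.
    pub fn from_realtime_turn(tick: RealtimeUsageTick) -> Self {
        Self {
            usage_type: "realtime",
            feature: "realtime_turn",
            model: tick.model,
            input_tokens: tick.input_tokens,
            output_tokens: tick.output_tokens,
            input_audio_tokens: tick.input_audio_tokens,
            output_audio_tokens: tick.output_audio_tokens,
            cached_tokens: tick.cached_tokens,
        }
    }
}
```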

The ai_usage_log table stores one row per AI call:

| Column | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| organization_id | UUID | FK to organization (CASCADE) |
| call_id | UUID? | FK to call (SET NULL) — only for call-related features |
| feature | TEXT | Snake_case feature label from AiUsageFeature |
| provider | TEXT | Extracted from model slug (e.g., "google", "openai", "anthropic", "deepseek") |
| model | TEXT | Model identifier (e.g., "deepseek/deepseek-v3.2") |
| usage_type | TEXT | "text_generation", "transcription", or "realtime" |
| input_tokens | INT? | Input tokens consumed |
| output_tokens | INT? | Output tokens generated |
| cached_tokens | INT? | Cached/prompt-cached tokens |
| reasoning_tokens | INT? | Reasoning tokens (extended thinking models) |
| input_audio_tokens | INT? | Input audio tokens (realtime voice sessions) |
| output_audio_tokens | INT? | Output audio tokens (realtime voice sessions) |
| audio_duration_secs | DOUBLE? | Audio duration in seconds (transcription only) |
| created_at | TIMESTAMPTZ | Auto-set to CURRENT_TIMESTAMP |

Indexes: A composite index on (organization_id, created_at) optimizes per-org date-range queries. A partial index on call_id WHERE call_id IS NOT NULL speeds up call-specific lookups.
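Expressed as DDL, those two indexes might look like the following (index names are illustrative; the actual definitions live in the Rust migration, m20260308_000000_create_ai_usage_log_table.rs):

```sql
-- Composite index for per-org date-range queries
CREATE INDEX idx_ai_usage_log_org_created
    ON ai_usage_log (organization_id, created_at);

-- Partial index: only call-related rows, keeping the index small
CREATE INDEX idx_ai_usage_log_call
    ON ai_usage_log (call_id)
    WHERE call_id IS NOT NULL;
```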

AiUsageFeature in src/mods/ai/types/ai_usage_type.rs identifies which capability generated the usage:

| Variant | Feature | Provider |
| --- | --- | --- |
| EnrichContact | Contact enrichment from transcription | OpenRouter |
| EnrichContactFromMessages | Contact enrichment from SMS history | OpenRouter |
| SummarizeCall | Call summary generation | OpenRouter |
| UpdateContactMemory | System note updates | OpenRouter |
| AnalyzeCall | Call analysis (user-defined analyzers) | OpenRouter |
| IdentifySpeakers | Speaker identification | OpenRouter |
| AutoTagContact | Automatic contact tagging | OpenRouter |
| ExtractTodos | Todo extraction from calls | OpenRouter |
| ExecuteTodo | Todo action execution | OpenRouter |
| QueryKnowledge | Knowledge base RAG queries | OpenRouter |
| GenerateInstructions | AI builder: generate agent instructions | OpenRouter |
| EditInstructions | AI builder: edit agent instructions | OpenRouter |
| CustomEditInstructions | AI builder: custom edit instructions | OpenRouter |
| Transcription | Standard audio transcription | OpenAI |
| DiarizedTranscription | Diarized transcription (speaker labels) | OpenAI |
| TextAgentSuggestions | Text agent response suggestions | OpenRouter |
| RealtimeTurn | Per-turn usage from realtime voice sessions | OpenAI / Gemini |

When you add a new AI call site, follow these steps:

  1. Add an AiUsageFeature variant and its as_str() mapping in ai_usage_type.rs.
  2. Add an AiModels constant in ai_models_type.rs if you’re using a new model.
  3. Ensure organization_id is available — add it as a function parameter if needed.
  4. Call spawn_log_ai_usage immediately after receiving the AI response, before consuming it.
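Step 1 might look like the following sketch (the enum shape is simplified to two variants, and TranslateCall is a hypothetical new feature, not a real one):

```rust
// Simplified sketch of ai_usage_type.rs; TranslateCall is an
// illustrative example of a newly added variant.
pub enum AiUsageFeature {
    SummarizeCall,
    TranslateCall,
}

impl AiUsageFeature {
    /// Snake_case label stored in the `feature` column.
    pub fn as_str(&self) -> &'static str {
        match self {
            Self::SummarizeCall => "summarize_call",
            Self::TranslateCall => "translate_call",
        }
    }
}
```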

Loquent calculates costs server-side using hardcoded per-model pricing in src/mods/ai/types/ai_pricing_type.rs. The get_model_pricing(provider, model) function returns a ModelPricing struct with rates per 1M tokens for input, output, cached, and audio tokens.
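The arithmetic reduces to rate × tokens / 1M per category. A minimal sketch, assuming a trimmed ModelPricing with only input and output rates (the real struct also carries cached and audio rates):

```rust
/// Illustrative subset of ModelPricing: USD per 1M tokens.
pub struct ModelPricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

/// Cost of one logged call, ignoring cached/audio rates for brevity.
pub fn cost_usd(pricing: &ModelPricing, input_tokens: i64, output_tokens: i64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * pricing.input_per_million
        + (output_tokens as f64 / 1_000_000.0) * pricing.output_per_million
}
```

For example, at $1/1M input and $2/1M output, a call with 500k input and 250k output tokens costs $1.00.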

The Admin AI Costs dashboard (GET /api/admin/ai-costs) uses this pricing engine to aggregate costs across models, providers, features, and organizations, with period-over-period comparisons. See the Admin module docs for details.

For raw data, query the database directly:

```sql
SELECT
    feature,
    model,
    SUM(COALESCE(input_tokens, 0)) AS total_input,
    SUM(COALESCE(output_tokens, 0)) AS total_output,
    COUNT(*) AS call_count
FROM ai_usage_log
WHERE organization_id = 'your-org-id'
  AND created_at >= '2026-03-01'
  AND created_at < '2026-04-01'
  AND usage_type = 'text_generation'
GROUP BY feature, model
ORDER BY total_output DESC;
```

| File | Purpose |
| --- | --- |
| src/mods/ai/types/ai_usage_type.rs | AiUsageFeature enum and AiUsageEntry struct |
| src/mods/ai/services/log_ai_usage_service.rs | spawn_log_ai_usage() fire-and-forget logger |
| src/mods/ai/types/ai_pricing_type.rs | ModelPricing struct and get_model_pricing() lookup |
| src/mods/ai/types/ai_models_type.rs | Centralized model registry |
| src/mods/agent/types/realtime_usage_tick_type.rs | RealtimeUsageTick provider-agnostic struct |
| src/mods/openai/types/openai_realtime_in_event_response_done_type.rs | OpenAI response.done usage parsing |
| src/mods/gemini/types/gemini_realtime_in_event_type.rs | Gemini usage_metadata parsing |
| src/mods/twilio/routes/twilio_stream_route.rs | Realtime event loop with usage logging |
| migration/src/m20260308_000000_create_ai_usage_log_table.rs | Table migration |