Pricing

Provider token cost, published transparently, with a fixed 10% markup.

Persona billing is usage-based. The router selects a provider and model by workload and rigor, and the billed AI price is simply the upstream token rate plus a 10% markup.

Current catalog: published from the ai-router service example configuration on April 9, 2026.

Billing formula: provider rate × 1.10

Persona passes through the routed AI token price and adds a fixed 10% markup.

Billing rule

Upstream token price + 10%

No mystery bundle. No opaque multiplier hidden behind a package name.
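The billing rule above can be sketched in a few lines. This is an illustrative helper, not Persona's API; the rates checked against it are the published general-runtime and deep lanes below.

```python
# Sketch of the published billing rule: billed price = upstream rate × 1.10.
# Rates are USD per 1M tokens. Function and constant names are illustrative.

MARKUP = 1.10

def persona_price(upstream_rate: float, ndigits: int = 2) -> float:
    """Apply the fixed 10% markup, rounded like the published catalog."""
    return round(upstream_rate * MARKUP, ndigits)

# Spot-check against published lane rates:
assert persona_price(0.50) == 0.55    # general lane, input
assert persona_price(14.00) == 15.40  # deep lane, output
```

Embedding rates are published to three decimals, hence the `ndigits` parameter (e.g. `persona_price(0.02, 3)` gives `0.022`).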

Routing rule

Workload + rigor decide the lane

The router publishes different provider/model lanes depending on the kind of runtime workload Persona is handling.

Price unit

USD per 1M tokens

All prices below are shown the same way the router config publishes them, so comparisons stay clean.

Current LLM Lanes

The published router lanes are priced by workload, not by one global model.

The current catalog below comes directly from the router example configuration. It is published per lane because Persona does not treat every call as the same workload.

LLM lane

General runtime work

The low-cost fast lane for everyday runtime workloads where the system does not need the heavier reasoning route.

Workload: Fast general runtime workloads
Rigor: casual, normal, precise
Provider: Google
Model: gemini-3-flash-preview
Upstream input: $0.50 per 1M tokens
Upstream output: $3.00 per 1M tokens
Persona input: $0.55 per 1M tokens
Persona output: $3.30 per 1M tokens
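To make the per-token unit concrete, here is a worked cost for one hypothetical call on this lane, using the Persona rates above ($0.55 in / $3.30 out per 1M tokens). The call sizes are made up for illustration.

```python
# Hypothetical call: 10k input tokens, 2k output tokens on the general lane.
input_tokens, output_tokens = 10_000, 2_000

cost = (input_tokens / 1_000_000) * 0.55 + (output_tokens / 1_000_000) * 3.30
print(f"${cost:.4f}")  # → $0.0121
```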
LLM lane

Deep runtime analysis

The higher-rigor lane used when those same runtime workloads need a deeper reasoning pass.

Workload: High-rigor runtime workloads
Rigor: deep
Provider: OpenAI
Model: gpt-5.2
Upstream input: $1.75 per 1M tokens
Upstream output: $14.00 per 1M tokens
Persona input: $1.93 per 1M tokens
Persona output: $15.40 per 1M tokens
LLM lane

Planning lane

Planning stays on a dedicated lane across all rigors so this workload keeps a predictable price.

Workload: Planning and orchestration workloads
Rigor: casual, normal, precise, deep
Provider: OpenAI
Model: gpt-5.3-codex
Upstream input: $1.75 per 1M tokens
Upstream output: $14.00 per 1M tokens
Persona input: $1.93 per 1M tokens
Persona output: $15.40 per 1M tokens
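The routing rule stated earlier (workload + rigor decide the lane) can be sketched as a lookup over the lanes above. The table mirrors the published catalog; the workload keys and function name are illustrative, not Persona's actual configuration schema.

```python
# Illustrative routing table: (workload, rigor) -> (provider, model).
# Mirrors the three published lanes; key names are assumptions.
LANES = {
    ("general", "casual"):   ("Google", "gemini-3-flash-preview"),
    ("general", "normal"):   ("Google", "gemini-3-flash-preview"),
    ("general", "precise"):  ("Google", "gemini-3-flash-preview"),
    ("general", "deep"):     ("OpenAI", "gpt-5.2"),
    ("planning", "casual"):  ("OpenAI", "gpt-5.3-codex"),
    ("planning", "normal"):  ("OpenAI", "gpt-5.3-codex"),
    ("planning", "precise"): ("OpenAI", "gpt-5.3-codex"),
    ("planning", "deep"):    ("OpenAI", "gpt-5.3-codex"),
}

def route(workload: str, rigor: str) -> tuple[str, str]:
    """Return the (provider, model) lane for a runtime call."""
    return LANES[(workload, rigor)]
```

Note that planning stays on one model across all rigors, while general runtime work only moves off the fast lane at `deep` rigor.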

Embeddings

The memory lane has its own pricing because retrieval is not the same workload as generation.

Embeddings are priced separately from text generation and power retrieval, memory indexing, and semantic recall.

Embedding engine

voyage4_1024 (Voyage · 1024 dims)

Rigor: casual, normal
Model: voyage-4-lite
Upstream input: $0.02 per 1M tokens
Persona input: $0.022 per 1M tokens

Rigor: precise
Model: voyage-4
Upstream input: $0.06 per 1M tokens
Persona input: $0.066 per 1M tokens

Rigor: deep
Model: voyage-4-large
Upstream input: $0.12 per 1M tokens
Persona input: $0.132 per 1M tokens
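The embedding tiers follow the same markup rule as the LLM lanes. A minimal sketch of billed cost per embedding call, using the upstream rates above; the dict and function names are illustrative.

```python
# Illustrative voyage4_1024 tiers: rigor -> (model, upstream USD per 1M tokens).
EMBEDDING_TIERS = {
    "casual":  ("voyage-4-lite",  0.02),
    "normal":  ("voyage-4-lite",  0.02),
    "precise": ("voyage-4",       0.06),
    "deep":    ("voyage-4-large", 0.12),
}

def embedding_cost(rigor: str, tokens: int) -> float:
    """Billed USD for an embedding call: upstream rate plus the 10% markup."""
    _, upstream = EMBEDDING_TIERS[rigor]
    return tokens / 1_000_000 * upstream * 1.10
```

For example, embedding 1M tokens at `deep` rigor bills at the published $0.132.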

Notes

Keep the pricing contract simple.

The published lane can change over time as the router catalog changes. This page is where the current public catalog should live.

The figures shown here are token-based AI pricing only. Any provider-side extras outside token usage should be quoted separately when enabled.

Markup is fixed at 10% over the upstream AI token rate.