Pricing

Provider token cost, published transparently, with a fixed 10% markup.

Persona billing is usage-based. The router selects a provider and model by workload and rigor, and the billed AI price is simply the upstream token rate plus a 10% markup.

Current catalog: published from the ai-router service example configuration on April 9, 2026.

Billing formula: provider rate × 1.10

Persona passes through the routed AI token price and adds a fixed 10% markup.

Billing rule

Upstream token price + 10%

No mystery bundle. No opaque multiplier hidden behind a package name.
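The billing rule above can be sketched in a few lines. This is an illustrative helper, not Persona's API; the rates checked against it are the published general-runtime and deep lanes below.

```python
# Sketch of the published billing rule: billed price = upstream rate × 1.10.
# Rates are USD per 1M tokens. Function and constant names are illustrative.

MARKUP = 1.10

def persona_price(upstream_rate: float, ndigits: int = 2) -> float:
    """Apply the fixed 10% markup, rounded like the published catalog."""
    return round(upstream_rate * MARKUP, ndigits)

# Spot-check against published lane rates:
assert persona_price(0.50) == 0.55    # general lane, input
assert persona_price(14.00) == 15.40  # deep lane, output
```

Embedding rates are published to three decimals, hence the `ndigits` parameter (e.g. `persona_price(0.02, 3)` gives `0.022`).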

Routing rule

Workload + rigor decide the lane

The router publishes different provider/model lanes depending on the kind of runtime workload Persona is handling.

Price unit

USD per 1M tokens

All prices below are shown the same way the router config publishes them, so comparisons stay clean.

Current LLM Lanes

The published router lanes are priced by workload, not by one global model.

The current catalog below comes directly from the router example configuration. It is published per lane because Persona does not treat every call as the same workload.

LLM lane

General runtime work

The low-cost fast lane for everyday runtime workloads where the system does not need the heavier reasoning route.

Workload: Fast general runtime workloads
Rigor: casual, normal, precise
Provider: Google
Model: gemini-3-flash-preview
Upstream input: $0.50 per 1M tokens
Upstream output: $3.00 per 1M tokens
Persona input: $0.55 per 1M tokens
Persona output: $3.30 per 1M tokens
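To make the per-token unit concrete, here is a worked cost for one hypothetical call on this lane, using the Persona rates above ($0.55 in / $3.30 out per 1M tokens). The call sizes are made up for illustration.

```python
# Hypothetical call: 10k input tokens, 2k output tokens on the general lane.
input_tokens, output_tokens = 10_000, 2_000

cost = (input_tokens / 1_000_000) * 0.55 + (output_tokens / 1_000_000) * 3.30
print(f"${cost:.4f}")  # → $0.0121
```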
LLM lane

Deep runtime analysis

The higher-rigor lane used when those same runtime workloads need a deeper reasoning pass.

Workload: High-rigor runtime workloads
Rigor: deep
Provider: OpenAI
Model: gpt-5.2
Upstream input: $1.75 per 1M tokens
Upstream output: $14.00 per 1M tokens
Persona input: $1.93 per 1M tokens
Persona output: $15.40 per 1M tokens
LLM lane

Planning lane

Planning stays on a dedicated lane across all rigors so this workload keeps a predictable price.

Workload: Planning and orchestration workloads
Rigor: casual, normal, precise, deep
Provider: OpenAI
Model: gpt-5.3-codex
Upstream input: $1.75 per 1M tokens
Upstream output: $14.00 per 1M tokens
Persona input: $1.93 per 1M tokens
Persona output: $15.40 per 1M tokens
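The routing rule stated earlier (workload + rigor decide the lane) can be sketched as a lookup over the lanes above. The table mirrors the published catalog; the workload keys and function name are illustrative, not Persona's actual configuration schema.

```python
# Illustrative routing table: (workload, rigor) -> (provider, model).
# Mirrors the three published lanes; key names are assumptions.
LANES = {
    ("general", "casual"):   ("Google", "gemini-3-flash-preview"),
    ("general", "normal"):   ("Google", "gemini-3-flash-preview"),
    ("general", "precise"):  ("Google", "gemini-3-flash-preview"),
    ("general", "deep"):     ("OpenAI", "gpt-5.2"),
    ("planning", "casual"):  ("OpenAI", "gpt-5.3-codex"),
    ("planning", "normal"):  ("OpenAI", "gpt-5.3-codex"),
    ("planning", "precise"): ("OpenAI", "gpt-5.3-codex"),
    ("planning", "deep"):    ("OpenAI", "gpt-5.3-codex"),
}

def route(workload: str, rigor: str) -> tuple[str, str]:
    """Return the (provider, model) lane for a runtime call."""
    return LANES[(workload, rigor)]
```

Note that planning stays on one model across all rigors, while general runtime work only moves off the fast lane at `deep` rigor.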

Embeddings

The memory lane has its own pricing because retrieval is not the same workload as generation.

Embeddings are priced separately from text generation and power retrieval, memory indexing, and semantic recall.

Embedding engine

voyage4_1024 (Voyage · 1024 dims)

Rigor: casual, normal
Model: voyage-4-lite
Upstream input: $0.02 per 1M tokens
Persona input: $0.022 per 1M tokens

Rigor: precise
Model: voyage-4
Upstream input: $0.06 per 1M tokens
Persona input: $0.066 per 1M tokens

Rigor: deep
Model: voyage-4-large
Upstream input: $0.12 per 1M tokens
Persona input: $0.132 per 1M tokens
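The embedding tiers follow the same markup rule as the LLM lanes. A minimal sketch of billed cost per embedding call, using the upstream rates above; the dict and function names are illustrative.

```python
# Illustrative voyage4_1024 tiers: rigor -> (model, upstream USD per 1M tokens).
EMBEDDING_TIERS = {
    "casual":  ("voyage-4-lite",  0.02),
    "normal":  ("voyage-4-lite",  0.02),
    "precise": ("voyage-4",       0.06),
    "deep":    ("voyage-4-large", 0.12),
}

def embedding_cost(rigor: str, tokens: int) -> float:
    """Billed USD for an embedding call: upstream rate plus the 10% markup."""
    _, upstream = EMBEDDING_TIERS[rigor]
    return tokens / 1_000_000 * upstream * 1.10
```

For example, embedding 1M tokens at `deep` rigor bills at the published $0.132.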

Notes

Keep the pricing contract simple.

The published lane can change over time as the router catalog changes. This page is where the current public catalog should live.

The figures shown here are token-based AI pricing only. Any provider-side extras outside token usage should be quoted separately when enabled.

Markup is fixed at 10% over the upstream AI token rate.