OrcaRouter — One AI gateway: adaptive LLM routing & governance

AI gateway for production

Smart routing and automatic failover on every request.

Routing that's measurably more accurate.

Every prompt is embedded and routed by a model that keeps learning online from real traffic. On the public RouterArena leaderboard (Jun 2026) it leads on accuracy — ahead of GPT-5, Azure, Martian and NotDiamond — at 75.5%.

contextual embeddingsonline learning<1ms overheadRouterArena

* Based on RouterArena leaderboard data, June 2026.

A provider goes down. No one notices.

When a provider rate-limits or 5xxs, OrcaRouter retries the request against a healthy model across 200+ options before the response starts — so transient upstream outages don't surface to your users.

200+ modelsauto-failoverno 429

Route on your terms.

orcarouter/auto is a smart default, not a black box. Point each workspace at the objective you want — the cheapest model that clears your quality bar, the highest quality, or a balance of both — or let it learn the trade-off from your own traffic. You're never locked into one behavior.

per-workspaceno markup either way<1ms overhead

See and prove every call — cost, model, latency, and why.

See everything. Prove anything.

See exactly what every request cost, which model served it, how long it took, and why it failed — full structured logs you can filter, replay, and copy as a runnable cURL. A route is never a black box.

Per-request logsgrade · model · costcopy-as-cURL

Zero markup. Zero black boxes.

You pay each provider their exact price — we add $0 per token, ever. Every request shows the grade, chosen model, provider, latency, and price, so cost is glass-box, not an opaque blended rate.

$0 / tokenprovider costglass-box receipt

Versioned prompts and caching — without a redeploy.

Change prompts. Not code.

Version prompts behind named labels with A/B splits and one-click rollback. Move a label and every request picks it up instantly — no redeploy, no code change, no client update.

VersionedA/BInstant rollbackNo deploy

Pay once. Reuse for free.

Repeated and cached prompt tokens bill at the provider's cache rate — often a fraction of the input price — across 5-minute and 1-hour ephemeral windows. Same answers, less spend, with cached_tokens on every receipt.

cache_controlcached_tokens5m / 1h windows

Guardrails, budgets, and an agent firewall that enforces.

Guardrails that stop things.

PII Shield and content policies run before the upstream call is billed. A blocked request returns a clean 400 and is never charged — guardrails enforced inline, not logged after the fact.

PII Shieldenforced pre-billingclean 400

Safe for your team. And your agents.

Budgets and roles for people; a risk-scored firewall for agents. Every tool and MCP call is graded ALLOW, REVIEW, or BLOCK before it runs, and anomaly detection flags rate and cost spikes against learned hour-of-week baselines.

ALLOW · REVIEW · BLOCKMCP gatinganomaly detection

Built for the agent era. Before you needed it.

Full control

Need more than a mode? Write the rule.

When the default isn't enough, express routing as code — version-controlled, reviewable, and live in seconds. No redeploy, no client change.

routing.yaml

version: 1
rules:
  - id: hard_agent_task
    when: task_class == "agent" && difficulty >= 0.6
    use:
      model: "claude-opus-4-7"
      reasoning_effort: high   # spend where it matters
  - id: short_prompts
    when: request.input_tokens < 500
    use: { delegate: cheapest }
default:
  delegate: balanced   # fall back to the chosen mode

YAML + CELversion-controlledlive in seconds

Explore routing docs

Setup

Live in 60 seconds.

One URL change. Your existing SDK, model names, and streaming all work exactly as before.

Step 1

🔗

Point your SDK at us

Set base_url to api.orcarouter.ai/v1 and swap your API key. No other code changes needed.

→

Step 2

⚡

We route, guard & observe

Every call is routed to the best model, checked against your guardrails, and metered — graded in under 1ms, with failover, caching and full logs built in.

→

Step 3

✓

You ship, on one endpoint

Traffic goes direct to each provider's first-party API at their published rate — we add $0 per token. One OpenAI-compatible endpoint for routing, observability and governance.

Every model. One price list.

200+ models with live, side-by-side pricing — what you'd pay the provider directly. We add $0 on top.

View all 200+ models →

Model	Routed to	Input /M	Output /M	Context	Quality
kimi/kimi-k3NEW	Moonshot	$3.00	$15.00	1M	9.0
openai/gpt-5.6-lunaNEW	OpenAI Direct	$1.00	$6.00	1M	7.0
openai/gpt-5.6-terraNEW	OpenAI Direct	$2.50	$15.00	1M	8.0
openai/gpt-5.6-solNEW	OpenAI Direct	$5.00	$30.00	1M	9.0
grok/grok-4.5NEW	—	$2.00	$6.00	500K	9.0
tencent/hy3NEW	—	$0.180	$0.590	262K	8.0
anthropic/claude-sonnet-5NEW	Anthropic Direct	$2.00	$10.00	1M	9.0
kling/kling-3-turbo	—	$0.112 /call	—	—	—
z-ai/glm-5.2	Zhipu AI	$1.40	$4.40	1M	9.0
kimi/kimi-k2.7-code	Moonshot	$0.950	$4.00	262K	8.0
anthropic/claude-fable-5	Anthropic Direct	$10.00	$50.00	1M	10.0
qwen/qwen3.7-plus	Alibaba Cloud	$0.350	$1.42	1M	8.0
minimax/minimax-m3	—	$0.300	$1.20	1M	9.0
anthropic/claude-opus-4.8	Anthropic Direct	$5.00	$25.00	1M	10.0
google/gemini-3.5-flash	Google Direct	$1.50	$9.00	1M	9.0
+ 194 more models · Prices update every 60 seconds

Pricing

Routing is free.
Pay for features.

We never take a cut of your token spend. Our revenue comes from optional team features.

Zero markup guarantee

You pay providers directly at their published rates. We add nothing on top of token costs. Routing is free; the optional Team plan funds the platform.

$0.00routing fee

Hacker

Free

Forever. Zero markup on all tokens.

✓ Route — 200+ models, auto-failover

✓ Observe — basic dashboard

✓ Manage — prompt versioning

✓ 3 API keys · 0% token markup

Start free

Team

$499/mo

Still zero markup. Pay for features.

✓ Everything in Hacker

✓ Up to 10 team seats

✓ Compliance enforcement & reports

✓ Unlimited API keys

✓ Priority support

Get started →

Enterprise

Custom

SLA commitments + private deployment.

✓ Everything in Team

✓ Private / on-prem deployment

✓ 99.99% uptime SLA

✓ Dedicated infrastructure

✓ Dedicated support & custom pricing

From the blog

Fresh from the engine room.

What we're building and why — the five latest posts.

All posts →

Guides & InsightsKimi K3 vs Opus 4.8: Flappy Bird & GTA IV Head-to-HeadJul 17, 2026

Guides & InsightsKimi K3 vs GPT-5.6: Same three.js Prompt, Two BuildsJul 17, 2026

Guides & InsightsKimi K3 vs Fable 5: One Prompt, Two Universe SimsJul 17, 2026

Guides & InsightsKimi K3 vs Fable 5, GPT-5.6 & Opus 4.8: We Tested the HypeJul 17, 2026

Guides & InsightsKimi K3: Open Source's DeepSeek Moment — or Just Hype?Jul 17, 2026

One Gateway. Every Model. Route Smarter. Ship Safer. Spend Less.

Works with the tools you already use