OrcaRouter grades every prompt and routes it — hard reasoning to frontier models, routine work to open-source — so you keep frontier answer quality while cutting spend ~40%. Quality-checked, never a blind downgrade. Zero token markup. One line to switch.
- client = OpenAI(api_key="sk-...")+ client = OpenAI(+ base_url="https://api.orcarouter.ai/v1",+ api_key="sk-orca-..."+ )# Everything else stays the same.response = client.chat.completions.create(model="orcarouter/auto", # router picks the best model per requestmessages=[{"role": "user", "content": "..."}])# → orcarouter/auto grades the prompt → frontier or open-source, zero token markup ✓
Every prompt is graded, then sent straight to the chosen model's first-party provider — no reseller in the middle. The grade, model, and provider are shown on each request, so a route is never a black box.
Each upstream provider's data and usage terms apply directly to your traffic. Pin routing to the models and providers that match your policy.
Every call records the difficulty grade, the model chosen, the provider, and the published price. Reproduce any routing decision later from the dashboard.
One URL change. Your existing SDK, model names, and streaming all work exactly as before.
Set base_url to api.orcarouter.ai/v1 and swap your API key. No other code changes needed.
Each prompt is graded for difficulty in under 1ms, then sent to the model that holds answer quality at the lowest cost — frontier for hard reasoning, open-source for routine work.
Whichever model is chosen, traffic goes direct to its first-party provider at their published rate. We add exactly $0 per token — our fee is on the plan, not your usage.
Quality-graded routing to the right model — frontier or open-source. Live prices refresh every 60s.
| Model | Routed to | Input /M | Output /M | Context | Quality |
|---|---|---|---|---|---|
| claude-opus-4-7 | Anthropic Direct | $5.00 | $25.00 | 1M | 10.0 |
| claude-sonnet-4-6 | Anthropic Direct | $3.00 | $15.00 | 1M | 7.0 |
| gpt-5.5 | OpenAI Direct | $5.00 | $30.00 | 1M | 10.0 |
| gemini-3.1-pro-preview | Google Direct | $4.00 | $18.00 | 1M | 10.0 |
| deepseek-v4-pro | DeepSeek | $0.560 | $1.12 | 1M | 9.0 |
| qwen3.6-plus | Alibaba Cloud | $0.500 | $3.00 | 1M | 8.0 |
| kimi-k2.6 | Moonshot | $0.900 | $3.75 | 256K | 9.0 |
| seedance-2.0 | ByteDance | from $0.07 /sec | — | — | 10.0 |
| + 194 more models · Prices update every 60 seconds | |||||
Everything you need to run AI in production without managing multiple provider integrations.
Every prompt is graded and sent to the model that holds quality at the lowest cost — frontier when it's hard, open-source when it's routine. Automatic.
Provider goes down mid-stream? We switch transparently. Your app sees zero errors.
Issue keys per team or service with spend caps, model allowlists, and rate limits built in.
See what every request cost, the model and provider that handled it, and how much grading saved you.
Change one line. Same SDK, same model names, same streaming format. Zero migration effort.
Hard and soft limits per key, team, or org. Auto-resets monthly. Slack + webhook alerts.
Every request shows the difficulty grade, the model chosen, the provider that served it, and the published price. Verifiable per call, reproducible later.
Each completion is tagged with the difficulty grade, the model picked, and its first-party provider — Anthropic Direct, OpenAI Direct, Bedrock, Vertex — surfaced in your dashboard and headers.
Every token charge equals the provider's public list price. Audit any request against the provider's own pricing page in seconds.
Difficulty grades, model choices, failover events, and health swaps are logged with timestamps. Reproduce any request's routing path.
We never take a cut of your token spend. Our revenue comes from optional team features.
Audit reports available under NDA — request a copy below.
Sign up with GitHub — $5 in tokens free. No credit card required. Swap one line of code and you're live.