Model routing — auto-pick the right model per request

Different requests deserve different models. A "summarize this email" call deserves plugsky-lite, not plugsky-frontier. A "refactor this 2,000-line function" call deserves plugsky-frontier, not plugsky-lite. Model routing is the discipline of picking the right model per request, automatically.

Why model routing

Most production AI systems route every request to a single model. The result is over-paying for simple calls and under-serving hard ones. Model routing fixes this:

Cost savings: 30-70% lower bill by using cheaper models for simple calls
Better quality: hard calls get the strong model they deserve
Latency: simple calls return in <500ms instead of waiting on a reasoning model
Resilience: if one model is down, traffic shifts to a backup automatically

Routing strategies

Plugsky supports four routing strategies you can mix and match per workspace:

⬇ Cost saver

Use the cheapest model that can handle the call. Plugsky classifies each request by complexity (token count, tool calls, domain heuristics) and picks plugsky-lite / micro / plus accordingly.

⚖ Balanced

Default. Use a mid-tier model (plugsky-pro or plugsky-plus) for most calls, fall back to cheap for trivial and to strong for hard. The best balance of cost and quality for most apps.

⬆ Max quality

Always use the strongest model in your tier (plugsky-frontier, plugsky-reasoning). Use when quality matters more than cost — code generation, medical, legal, financial analysis.

🛠 Custom

Define your own routing rules based on prompt content, user tier, time of day, or any other signal. Plugsky ships with a rule editor in the dashboard.

Cost saver / Balanced / Max quality

Just use model="plugsky-fusion" as your model name and Plugsky picks the right model per call. You can change the strategy per workspace, per API key, or per request.

Custom routing rules

For advanced teams, define rules like:

Users on the free plan always go to plugsky-micro
Requests with more than 4,000 input tokens always go to plugsky-frontier
Requests with the word "refactor" in the prompt go to plugsky-pro
Requests during business hours (9-5 GMT) go to plugsky-frontier, off-hours to plugsky-pro

Rules are evaluated in order; first match wins. The routing is logged per-request for audit and debugging.

Frequently asked questions

How does Plugsky know which model to use?

For the preset strategies (Cost saver, Balanced, Max quality), Plugsky classifies each request by token count, tool call presence, and prompt heuristics. For Custom, you define the rules.

Can I override the routing per request?

Yes. Use a specific model name like plugsky-pro to bypass routing for that request.

Does routing add latency?

No — the router runs in the API gateway, adding <5ms to the request.

Can I see routing decisions in logs?

Yes. Every request log includes the chosen model, the strategy, and the rule that fired (for Custom).

Try plugsky-fusion routing

Set model="plugsky-fusion" in your code and Plugsky picks the right model per call.

Start Free → See pricing