Different requests deserve different models. A "summarize this email" call deserves plugsky-lite, not plugsky-frontier. A "refactor this 2,000-line function" call deserves plugsky-frontier, not plugsky-lite. Model routing is the discipline of picking the right model per request, automatically.
Why model routing
Most production AI systems route every request to a single model. The result is over-paying for simple calls and under-serving hard ones. Model routing fixes this:
- Cost savings: 30-70% lower bill by using cheaper models for simple calls
- Better quality: hard calls get the strong model they deserve
- Latency: simple calls return in <500ms instead of waiting on a reasoning model
- Resilience: if one model is down, traffic shifts to a backup automatically
Routing strategies
Plugsky supports four routing strategies you can mix and match per workspace:
⬇ Cost saver
Use the cheapest model that can handle the call. Plugsky classifies each request by complexity (token count, tool calls, domain heuristics) and picks plugsky-lite / micro / plus accordingly.
⚖ Balanced
Default. Use a mid-tier model (plugsky-pro or plugsky-plus) for most calls, fall back to cheap for trivial and to strong for hard. The best balance of cost and quality for most apps.
⬆ Max quality
Always use the strongest model in your tier (plugsky-frontier, plugsky-reasoning). Use when quality matters more than cost — code generation, medical, legal, financial analysis.
🛠 Custom
Define your own routing rules based on prompt content, user tier, time of day, or any other signal. Plugsky ships with a rule editor in the dashboard.
Cost saver / Balanced / Max quality
Just use model="plugsky-fusion" as your model name and Plugsky picks the right model per call. You can change the strategy per workspace, per API key, or per request.
Custom routing rules
For advanced teams, define rules like:
- Users on the free plan always go to plugsky-micro
- Requests with more than 4,000 input tokens always go to plugsky-frontier
- Requests with the word "refactor" in the prompt go to plugsky-pro
- Requests during business hours (9-5 GMT) go to plugsky-frontier, off-hours to plugsky-pro
Rules are evaluated in order; first match wins. The routing is logged per-request for audit and debugging.
Frequently asked questions
How does Plugsky know which model to use?
For the preset strategies (Cost saver, Balanced, Max quality), Plugsky classifies each request by token count, tool call presence, and prompt heuristics. For Custom, you define the rules.
Can I override the routing per request?
Yes. Use a specific model name like plugsky-pro to bypass routing for that request.
Does routing add latency?
No — the router runs in the API gateway, adding <5ms to the request.
Can I see routing decisions in logs?
Yes. Every request log includes the chosen model, the strategy, and the rule that fired (for Custom).
Try plugsky-fusion routing
Set model="plugsky-fusion" in your code and Plugsky picks the right model per call.