An LLM API is an HTTP service that runs large language models and exposes them via a standard interface. Plugsky LLM API uses the OpenAI standard so every tool and SDK that speaks OpenAI works without modification.
The 18+ models on Plugsky
| Model | Ctx | Best for |
|---|---|---|
plugsky-micro | 16K | Free tier, fast reasoning, classification |
plugsky-lite | 16K | Free tier, fast chat, simple Q&A |
plugsky-plus | 32K | General chat, content generation |
plugsky-pro | 65K | Production chat, code, function calling |
plugsky-max | 131K | Long context, document analysis |
plugsky-frontier | 131K | Hard reasoning, code, agents |
plugsky-vision | 32K | Image understanding, OCR, document AI |
plugsky-reasoning | 65K | Long chain-of-thought reasoning |
plugsky-ultra | 131K | Flagship multimodal |
plugsky-kimi | 131K | Long-context reasoning, document Q&A |
plugsky-deepseek-pro | 65K | Cost-optimized reasoning |
plugsky-deepseek-flash | 32K | Fast, cheap, sub-second responses |
plugsky-gpt-oss | 32K | Open-source base model, fine-tunable |
plugsky-qwen-next | 131K | Multilingual, strong Chinese + English |
plugsky-llama | 131K | Open-source base, fine-tunable |
plugsky-phi | 16K | Smallest, lowest cost, on-device capable |
plugsky-mistral | 131K | Open-weights, EU origin |
plugsky-embed-v1 | 1536d | Embeddings (OpenAI ada-compatible) |
plugsky-embed-large | 3072d | High-recall embeddings |
Choosing the right model for the task
The right model depends on three factors: quality required, context length, and cost ceiling.
- Simple classification, intent detection, short replies — plugsky-micro, plugsky-lite (free)
- General chat, content generation, function calling — plugsky-pro, plugsky-plus
- Long-context (book, transcript, codebase analysis) — plugsky-max, plugsky-ultra
- Hard reasoning, math, complex code — plugsky-frontier, plugsky-reasoning
- Multimodal (image + text) — plugsky-vision, plugsky-ultra
- Cost-sensitive high-volume — plugsky-deepseek-flash, plugsky-phi
Routing and model failover
Use model routing to automatically pick the right model per request, or set up a fallback chain: if plugsky-frontier is down or rate-limited, fall back to plugsky-pro automatically.
Pricing and rate limits
All models are unlimited usage on self-serve plans, billed by the month. Rate limits scale with the plan: 60/min (Starter), 300/min (Builder), 1,000/min (Scale). See AI API pricing for the full table.
Frequently asked questions
How do I switch between models?
Change the model parameter in your request. No code changes needed.
Can I use multiple models in one app?
Yes. Different requests can use different models. Many teams use model routing to pick automatically per request.
Do you support fine-tuning?
On Enterprise contracts. We fine-tune open-source base models (plugsky-gpt-oss, plugsky-llama) on your data and deploy the result in your tenant.
What if a model is down?
Plugsky's API returns 503 with a Retry-After header. Set up a fallback chain in your client to try a different model on 503.
Try 18+ models on one endpoint
OpenAI-compatible API, 18+ models, flat monthly pricing. 7-day trial for $5.
Start $5 trial → See pricing