LLM API — 18+ models on one OpenAI-compatible endpoint

An LLM API is an HTTP service that runs large language models and exposes them via a standard interface. Plugsky LLM API uses the OpenAI standard so every tool and SDK that speaks OpenAI works without modification.

The 18+ models on Plugsky

Model	Ctx	Best for
`plugsky-micro`	16K	Free tier, fast reasoning, classification
`plugsky-lite`	16K	Free tier, fast chat, simple Q&A
`plugsky-plus`	32K	General chat, content generation
`plugsky-pro`	65K	Production chat, code, function calling
`plugsky-max`	131K	Long context, document analysis
`plugsky-frontier`	131K	Hard reasoning, code, agents
`plugsky-vision`	32K	Image understanding, OCR, document AI
`plugsky-reasoning`	65K	Long chain-of-thought reasoning
`plugsky-ultra`	131K	Flagship multimodal
`plugsky-kimi`	131K	Long-context reasoning, document Q&A
`plugsky-deepseek-pro`	65K	Cost-optimized reasoning
`plugsky-deepseek-flash`	32K	Fast, cheap, sub-second responses
`plugsky-gpt-oss`	32K	Open-source base model, fine-tunable
`plugsky-qwen-next`	131K	Multilingual, strong Chinese + English
`plugsky-llama`	131K	Open-source base, fine-tunable
`plugsky-phi`	16K	Smallest, lowest cost, on-device capable
`plugsky-mistral`	131K	Open-weights, EU origin
`plugsky-embed-v1`	1536d	Embeddings (OpenAI ada-compatible)
`plugsky-embed-large`	3072d	High-recall embeddings

Choosing the right model for the task

The right model depends on three factors: quality required, context length, and cost ceiling.

Simple classification, intent detection, short replies — plugsky-micro, plugsky-lite (free)
General chat, content generation, function calling — plugsky-pro, plugsky-plus
Long-context (book, transcript, codebase analysis) — plugsky-max, plugsky-ultra
Hard reasoning, math, complex code — plugsky-frontier, plugsky-reasoning
Multimodal (image + text) — plugsky-vision, plugsky-ultra
Cost-sensitive high-volume — plugsky-deepseek-flash, plugsky-phi

Routing and model failover

Use model routing to automatically pick the right model per request, or set up a fallback chain: if plugsky-frontier is down or rate-limited, fall back to plugsky-pro automatically.

Pricing and rate limits

All models are unlimited usage on self-serve plans, billed by the month. Rate limits scale with the plan: 60/min (Starter), 300/min (Builder), 1,000/min (Scale). See AI API pricing for the full table.

Frequently asked questions

How do I switch between models?

Change the model parameter in your request. No code changes needed.

Can I use multiple models in one app?

Yes. Different requests can use different models. Many teams use model routing to pick automatically per request.

Do you support fine-tuning?

On Enterprise contracts. We fine-tune open-source base models (plugsky-gpt-oss, plugsky-llama) on your data and deploy the result in your tenant.

What if a model is down?

Plugsky's API returns 503 with a Retry-After header. Set up a fallback chain in your client to try a different model on 503.

Try 18+ models on one endpoint

OpenAI-compatible API, 18+ models, flat monthly pricing. Start with a 7-day free trial.

Start Free → See pricing

The 18+ models on Plugsky

Choosing the right model for the task

Routing and model failover

Pricing and rate limits

Frequently asked questions

Try 18+ models on one endpoint

Related

OpenAI-compatible API

Model routing

Unlimited AI API