LLM API

LLM API — 18+ models, one endpoint, one bill

Plugsky LLM API is one OpenAI-compatible endpoint serving 18+ large language models — from plugsky-micro (free, fast, reasoning) to plugsky-frontier (frontier-quality reasoning and code). Switch models with one parameter, pay one predictable monthly bill.

An LLM API is an HTTP service that runs large language models and exposes them via a standard interface. Plugsky LLM API uses the OpenAI standard so every tool and SDK that speaks OpenAI works without modification.

The 18+ models on Plugsky

Model Ctx Best for
plugsky-micro16KFree tier, fast reasoning, classification
plugsky-lite16KFree tier, fast chat, simple Q&A
plugsky-plus32KGeneral chat, content generation
plugsky-pro65KProduction chat, code, function calling
plugsky-max131KLong context, document analysis
plugsky-frontier131KHard reasoning, code, agents
plugsky-vision32KImage understanding, OCR, document AI
plugsky-reasoning65KLong chain-of-thought reasoning
plugsky-ultra131KFlagship multimodal
plugsky-kimi131KLong-context reasoning, document Q&A
plugsky-deepseek-pro65KCost-optimized reasoning
plugsky-deepseek-flash32KFast, cheap, sub-second responses
plugsky-gpt-oss32KOpen-source base model, fine-tunable
plugsky-qwen-next131KMultilingual, strong Chinese + English
plugsky-llama131KOpen-source base, fine-tunable
plugsky-phi16KSmallest, lowest cost, on-device capable
plugsky-mistral131KOpen-weights, EU origin
plugsky-embed-v11536dEmbeddings (OpenAI ada-compatible)
plugsky-embed-large3072dHigh-recall embeddings

Choosing the right model for the task

The right model depends on three factors: quality required, context length, and cost ceiling.

  • Simple classification, intent detection, short replies — plugsky-micro, plugsky-lite (free)
  • General chat, content generation, function calling — plugsky-pro, plugsky-plus
  • Long-context (book, transcript, codebase analysis) — plugsky-max, plugsky-ultra
  • Hard reasoning, math, complex code — plugsky-frontier, plugsky-reasoning
  • Multimodal (image + text) — plugsky-vision, plugsky-ultra
  • Cost-sensitive high-volume — plugsky-deepseek-flash, plugsky-phi

Routing and model failover

Use model routing to automatically pick the right model per request, or set up a fallback chain: if plugsky-frontier is down or rate-limited, fall back to plugsky-pro automatically.

Pricing and rate limits

All models are unlimited usage on self-serve plans, billed by the month. Rate limits scale with the plan: 60/min (Starter), 300/min (Builder), 1,000/min (Scale). See AI API pricing for the full table.

Frequently asked questions

How do I switch between models?

Change the model parameter in your request. No code changes needed.

Can I use multiple models in one app?

Yes. Different requests can use different models. Many teams use model routing to pick automatically per request.

Do you support fine-tuning?

On Enterprise contracts. We fine-tune open-source base models (plugsky-gpt-oss, plugsky-llama) on your data and deploy the result in your tenant.

What if a model is down?

Plugsky's API returns 503 with a Retry-After header. Set up a fallback chain in your client to try a different model on 503.

Try 18+ models on one endpoint

OpenAI-compatible API, 18+ models, flat monthly pricing. 7-day trial for $5.

Start $5 trial → See pricing