31 models. One API.
Powered by NVIDIA NIM.
Every Plugsky model runs on NVIDIA NIM — NVIDIA's inference platform hosting 100+ open and proprietary models. Free, paid, vision, reasoning, code, and embeddings. Switch models in a single line.
🆓 Free tier
Available on every account, including free ones. Rate limits leave little headroom, but these models are well suited to evaluation, learning, and small jobs.
model: "plugsky-micro"
model: "plugsky-lite"
💎 Paid tier — General purpose
The everyday workhorses. Balanced quality, latency, and cost. Available on paid plans (trial / starter / builder / scale / enterprise).
model: "plugsky-plus"
model: "plugsky-pro"
model: "plugsky-max"
model: "plugsky-frontier"
🧠 Paid tier — Reasoning & code
Models tuned for specific workloads. Higher quality on their target task; same OpenAI-compatible API.
model: "plugsky-reasoning"
model: "plugsky-coder"
model: "plugsky-coder-fast"
model: "plugsky-gpt-oss"
model: "plugsky-phi"
model: "plugsky-tiny"
🌊 Paid tier — DeepSeek & Qwen
Open-source powerhouses from China. Top-tier reasoning and code at competitive pricing.
model: "plugsky-deepseek-pro"
model: "plugsky-deepseek-flash"
model: "plugsky-qwen-next"
👁️ Paid tier — Multimodal & long context
Vision, video, ultra-long-context, and mixture-of-experts (MoE) models. Use these for images, video, and documents that exceed a typical 32K-token context window.
model: "plugsky-minimax"
model: "plugsky-ultra"
model: "plugsky-vision-fast"
model: "plugsky-llama4"
model: "plugsky-qwen-vl"
model: "plugsky-kimi"
model: "plugsky-longctx"
model: "plugsky-mistral-medium"
model: "plugsky-mistral-small"
model: "plugsky-nano"
🌸 Paid tier — Google Gemma
Compact multimodal models from Google. Lightweight, fast, with image input support.
model: "plugsky-gemma-4"
model: "plugsky-gemma3-nano-2b"
model: "plugsky-gemma3-nano-4b"
🔢 Embeddings
Vector embeddings for semantic search, RAG, clustering, and recommendations. Each model outputs vectors of a different dimension, so check that the dimension matches your existing index before switching.
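One way to act on the dimension warning is to validate vectors before they reach an index. The sketch below is only an illustration: `check_embedding_dim` is a hypothetical helper, and the sample vectors are made-up stand-ins, not real output from any Plugsky embedding model (actual dimensions are not stated on this page).

```python
from typing import Sequence


def check_embedding_dim(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector's length differs from expected_dim."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, index expects {expected_dim}"
            )


# Hypothetical stand-ins for vectors returned by two different embedding models.
old_model_vectors = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
new_model_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Dimension the existing index was built with.
index_dim = len(old_model_vectors[0])

# Vectors from the original model pass silently.
check_embedding_dim(old_model_vectors, index_dim)

# Vectors from a model with a different dimension are rejected.
try:
    check_embedding_dim(new_model_vectors, index_dim)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(mismatch)  # vector 0 has dimension 3, index expects 4
```

In practice a mismatch usually means re-embedding the stored documents with the new model, since vectors from different models are not comparable even when their dimensions happen to agree.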
model: "plugsky-embed"
model: "plugsky-embed-nim"
model: "plugsky-embed-multilingual"
⚡ API usage
OpenAI-compatible. Any model above can be used with a single line change. No code rewrites between models.
# curl
curl https://api.plugsky.com/v1/chat/completions \
  -H "Authorization: Bearer sk-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "plugsky-pro",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 200
  }'
# Python (openai SDK pointed at the Plugsky base URL)
from openai import OpenAI

client = OpenAI(
    api_key="sk-live-...",
    base_url="https://api.plugsky.com/v1",
)

resp = client.chat.completions.create(
    model="plugsky-pro",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
// JavaScript (fetch; top-level await requires an ES module or an async function)
const r = await fetch("https://api.plugsky.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-live-...",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "plugsky-pro",
    messages: [{ role: "user", content: "Hello!" }],
    max_tokens: 200
  })
});
const data = await r.json();
console.log(data.choices[0].message.content);
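To make the single-line model switch concrete, the sketch below builds two chat requests with the Python standard library and confirms that only the `model` field differs. It assumes the endpoint and request fields shown in the examples above; `build_chat_request` and `ask` are hypothetical helper names, and `urllib` stands in for the SDK only to keep the example dependency-free. Nothing is sent over the network unless `ask` is called.

```python
import json
import urllib.request

# Endpoint taken from the examples on this page.
API_URL = "https://api.plugsky.com/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str,
                       max_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the given model."""
    payload = {
        "model": model,  # the only field that changes between models
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(model: str, prompt: str, api_key: str) -> str:
    """Send the request and return the first choice's text (performs a network call)."""
    req = build_chat_request(model, prompt, api_key)
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# Build the same request for two models; "sk-live-..." is a placeholder key.
pro_req = build_chat_request("plugsky-pro", "Hello!", "sk-live-...")
coder_req = build_chat_request("plugsky-coder", "Hello!", "sk-live-...")

pro_body = json.loads(pro_req.data)
coder_body = json.loads(coder_req.data)

# List the payload keys whose values differ between the two requests.
changed = sorted(k for k in pro_body if pro_body[k] != coder_body[k])
print(changed)  # ['model']
```

Keeping the model name as a function argument (or a config value) means an evaluation loop over several models needs no other code changes.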