All models

31 models. One API.
Powered by NVIDIA NIM.

Every Plugsky model runs on NVIDIA NIM — NVIDIA's inference platform hosting 100+ open and proprietary models. Free, paid, vision, reasoning, code, and embeddings. Switch models in a single line.

31 total models · 2 free tier · 27 paid tier · 2 embeddings · 1M max context · 100% NVIDIA NIM

🆓 Free tier

Available on every account, including free ones. Rate limits leave little headroom, but these models are well suited to evaluation, learning, and small jobs.

plugsky-micro
Free
nvidia/nvidia-nemotron-nano-9b-v2
Context: 131K tokens
Chat · Streaming · JSON mode
Fast, cheap — classification, simple chat, intent detection.
model: "plugsky-micro"
plugsky-lite
Free
meta/llama-3.2-3b-instruct
Context: 32K tokens
Chat · Streaming · JSON mode · Tools
Support & chat automation — moderate complexity.
model: "plugsky-lite"
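
Both free models list JSON mode, and plugsky-micro is described as suited to classification and intent detection. A minimal sketch of that use, assuming Plugsky honors the OpenAI-style response_format field (the page says the API is OpenAI-compatible but does not document this parameter). The label set and helper names are hypothetical.

```python
import json

# Hypothetical label set for illustration.
INTENTS = ["billing", "technical", "other"]

def build_intent_request(text: str, model: str = "plugsky-micro") -> dict:
    """Build chat-completion kwargs that ask for a JSON object reply."""
    system = (
        "Classify the user's message. Reply with a JSON object of the form "
        '{"intent": "<label>"} where <label> is one of: ' + ", ".join(INTENTS) + "."
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": text},
        ],
        # Assumption: the OpenAI-style JSON mode switch is accepted.
        "response_format": {"type": "json_object"},
        "max_tokens": 50,
    }

def parse_intent(raw: str) -> str:
    """Parse the model's reply, falling back to 'other' on unusable output."""
    try:
        intent = json.loads(raw).get("intent")
    except (json.JSONDecodeError, AttributeError, TypeError):
        return "other"
    return intent if intent in INTENTS else "other"
```

With the openai client from the API usage section, this would be called as client.chat.completions.create(**build_intent_request("My invoice is wrong")), and the reply text passed to parse_intent.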

💎 Paid tier — General purpose

The everyday workhorses. Balanced quality, latency, and cost. Available on paid plans (trial / starter / builder / scale / enterprise).

plugsky-plus
deepseek-ai/deepseek-v4-flash
Context: 32K tokens
Chat · Streaming · JSON · Tools · Function-call
Balanced general agent — good quality at lower cost.
model: "plugsky-plus"
plugsky-pro
qwen3.6-plus → minimaxai/minimax-m3
Context: 65K tokens
Chat · Streaming · JSON · Function-call · Code
Coding & reasoning (default) — strong general purpose.
model: "plugsky-pro"
plugsky-max
nvidia/llama-3.3-nemotron-super-49b-v1.5
Context: 131K tokens
Chat · Streaming · JSON · Function-call · Long-ctx
Complex multi-step — deep reasoning.
model: "plugsky-max"
plugsky-frontier
mistralai/mistral-large-3-675b-instruct-2512
Context: 131K tokens
Chat · Streaming · JSON · Function-call · Long-ctx · Tools
Frontier-tier — Mistral Large 3 675B (0.3s, 128K context, EU origin).
model: "plugsky-frontier"
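
All four general-purpose models list Streaming. The sketch below shows one way to consume a streamed reply with the openai SDK, assuming Plugsky accepts the same stream=True flag and returns OpenAI-style chunks; this page does not show a streaming example, so treat the chunk shape as an assumption. The helper names are hypothetical.

```python
from typing import Iterable, Iterator

def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming chunks."""
    for chunk in chunks:
        if not chunk.choices:      # some servers send chunks with no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:                  # role-only and final chunks carry None
            yield piece

def stream_reply(client, prompt: str, model: str = "plugsky-pro") -> str:
    """Print a reply as it arrives and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
        stream=True,  # assumption: same flag as the OpenAI API
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)
```

Here client is an OpenAI client configured with the Plugsky base URL, as in the API usage section.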

🧠 Paid tier — Reasoning & code

Models tuned for specific workloads. Higher quality on their target task; same OpenAI-compatible API.

plugsky-reasoning
nvidia/nemotron-3-super-120b-a12b
Context: 65K tokens
Reasoning · Tools · JSON · Long-ctx
Deep reasoning, math, code — NVIDIA Nemotron Super 120B.
model: "plugsky-reasoning"
plugsky-coder
bytedance/seed-oss-36b-instruct
Context: 131K tokens
Code · Tools · JSON · Long-ctx · MoE
Open coding model: ByteDance Seed-OSS 36B Instruct.
model: "plugsky-coder"
plugsky-coder-fast
meta/llama-3.2-3b-instruct
Context: 32K tokens
Code · Tools · JSON
Fast coding — Llama 3.2 3B.
model: "plugsky-coder-fast"
plugsky-gpt-oss
openai/gpt-oss-120b
Context: 32K tokens
Reasoning · Tools · JSON · Long-ctx
Open-source GPT — gpt-oss-120B.
model: "plugsky-gpt-oss"
plugsky-phi
nvidia/nemotron-mini-4b-instruct
Context: 131K tokens
Chat · Streaming · Tools · JSON
NVIDIA Nemotron Mini 4B — ultra-compact, very fast.
model: "plugsky-phi"
plugsky-tiny
nvidia/nvidia-nemotron-nano-9b-v2
Context: 131K tokens
Chat · Streaming · Tools · JSON
NVIDIA Nemotron Nano 9B v2 — small, fast, low cost.
model: "plugsky-tiny"
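
Every model in this group lists Tools. A rough sketch of a tool-calling round trip, assuming Plugsky follows the OpenAI tools schema and returns tool calls on the assistant message in the same shape. The get_weather tool is hypothetical and exists only for illustration.

```python
import json

# Hypothetical local tool; not part of the Plugsky API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Tool description in the OpenAI function-calling schema.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def run_tool_calls(message) -> list[dict]:
    """Execute each tool call on an assistant message; return tool-role replies."""
    replies = []
    for call in message.tool_calls or []:
        fn = LOCAL_FUNCTIONS[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": fn(**args),
        })
    return replies
```

In use, tools=TOOLS would be passed to client.chat.completions.create with a model such as "plugsky-reasoning". The assistant message and the replies from run_tool_calls are then appended to the conversation and the request is sent again so the model can produce its final answer.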

🌊 Paid tier — DeepSeek & Qwen

Open-source powerhouses from China. Top-tier reasoning and code at competitive pricing.

plugsky-deepseek-pro
deepseek-ai/deepseek-v4-pro
Context: 65K tokens
Reasoning · Code · Tools · JSON
Reasoning + code — DeepSeek V4 Pro.
model: "plugsky-deepseek-pro"
plugsky-deepseek-flash
deepseek-ai/deepseek-v4-flash
Context: 32K tokens
Chat · Streaming · Tools · JSON
Fast DeepSeek — V4 Flash.
model: "plugsky-deepseek-flash"
plugsky-qwen-next
qwen/qwen3-next-80b-a3b-instruct
Context: 131K tokens
Tools · JSON · Long-ctx · MoE
Alibaba Qwen3 Next 80B (MoE). The model supports 256K context; the limit listed for this endpoint is 131K.
model: "plugsky-qwen-next"

👁️ Paid tier — Multimodal & long context

Vision, video, ultra-long context, and MoE models. Use these for images, video, and documents that exceed typical 32K context windows.

plugsky-minimax
minimaxai/minimax-m3
Context: 32K tokens
Vision · Video · Multimodal · Tools · JSON
MiniMax M3: strong multimodal + reasoning.
model: "plugsky-minimax"
plugsky-ultra
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning
Context: 1M tokens
Vision · Multimodal · Tools · Long-ctx · MoE · Reasoning
NVIDIA Nemotron 3 Nano Omni 30B — omni-modal + reasoning, 1M context.
model: "plugsky-ultra"
plugsky-vision-fast
meta/llama-3.2-11b-vision-instruct
Context: 32K tokens
Vision · Multimodal · JSON
Multimodal fast — Llama 3.2 Vision 11B.
model: "plugsky-vision-fast"
plugsky-llama4
meta/llama-4-maverick-17b-128e-instruct
Context: 131K tokens
Tools · JSON · Long-ctx · MoE
Meta Llama 4 Maverick 17B (128 experts MoE, 128K ctx).
model: "plugsky-llama4"
plugsky-qwen-vl
qwen/qwen3.5-397b-a17b
Context: 262K tokens
Vision · Multimodal · Tools · JSON · Long-ctx · MoE · Reasoning
Qwen 3.5 397B MoE — multimodal + 256K context + reasoning.
model: "plugsky-qwen-vl"
plugsky-kimi
moonshotai/kimi-k2.6
Context: 131K tokens
Tools · JSON · Long-ctx
Long-context model: MoonshotAI Kimi K2.6. The model supports 256K context; the limit listed for this endpoint is 131K.
model: "plugsky-kimi"
plugsky-longctx
mistralai/mistral-large-3-675b-instruct-2512
Context: 131K tokens
Tools · JSON · Long-ctx
Mistral Large 3 675B — European, 128K context.
model: "plugsky-longctx"
plugsky-mistral-medium
mistralai/mistral-medium-3.5-128b
Context: 131K tokens
Tools · JSON · Long-ctx
Mistral Medium 3.5 128B — fast 128K context.
model: "plugsky-mistral-medium"
plugsky-mistral-small
mistralai/mistral-small-4-119b-2603
Context: 131K tokens
Tools · JSON · Long-ctx
Mistral Small 4 119B — fast + 128K context.
model: "plugsky-mistral-small"
plugsky-nano
nvidia/nemotron-3-nano-30b-a3b
Context: 1M tokens
Tools · JSON · Long-ctx · MoE
NVIDIA Nemotron 3 Nano 30B MoE — 1M context, fast.
model: "plugsky-nano"
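
Context windows in this group range from 32K to 1M tokens, so it can help to pick a model by document size. The helper below is a minimal sketch. The window sizes are the rounded figures from the cards above, the four-characters-per-token estimate is a rough assumption rather than a real tokenizer, and the function names are hypothetical.

```python
# Context windows as listed on this page, rounded down to be conservative.
CONTEXT_TOKENS = {
    "plugsky-minimax": 32_000,
    "plugsky-mistral-small": 131_000,
    "plugsky-qwen-vl": 262_000,
    "plugsky-nano": 1_000_000,
}

def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (assumption, not a tokenizer)."""
    return len(text) // 4 + 1

def pick_model(text: str, reply_tokens: int = 1_000) -> str:
    """Return the listed model with the smallest window that fits prompt + reply."""
    needed = estimate_tokens(text) + reply_tokens
    for name, window in sorted(CONTEXT_TOKENS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return name
    raise ValueError(f"~{needed} tokens exceeds every listed context window")
```

For production use, a tokenizer matched to the chosen model would give a tighter estimate than the character heuristic.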

🌸 Paid tier — Google Gemma

Compact multimodal models from Google. Lightweight, fast, with image input support.

plugsky-gemma-4
google/gemma-3n-e4b-it
Context: 32K tokens
Multimodal · JSON
Google Gemma 3 Nano 4B — fast + multimodal.
model: "plugsky-gemma-4"
plugsky-gemma3-nano-2b
google/gemma-3n-e2b-it
Context: 32K tokens
Multimodal · JSON
Google Gemma 3 Nano 2B — ultra-compact, fast.
model: "plugsky-gemma3-nano-2b"
plugsky-gemma3-nano-4b
google/gemma-3n-e4b-it
Context: 32K tokens
Multimodal · JSON
Google Gemma 3 Nano 4B — small + fast + multimodal.
model: "plugsky-gemma3-nano-4b"
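
The Gemma models accept image input. This page does not show the request shape, so the sketch below assumes the OpenAI-style content-parts format, with the image sent inline as a base64 data URL. The helper name is hypothetical.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with text plus an inline base64 image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: OpenAI-style image_url parts are accepted.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

The returned dict would go in the messages list of a chat completion request with a multimodal model such as "plugsky-gemma3-nano-4b".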

🔢 Embeddings

Vector embeddings for semantic search, RAG, clustering, and recommendations. Output dimensions differ between models (4096 for NV-Embed v1, 1024 for BGE-M3), so check the dimension before switching.

plugsky-embed
Embed
nvidia/nv-embed-v1
Dim: 4096 · Ctx: 8K
Embeddings
NVIDIA NV-Embed v1 — best general embeddings.
model: "plugsky-embed"
plugsky-embed-nim
Embed
nvidia/nv-embed-v1
Dim: 4096 · Ctx: 8K
Embeddings
NVIDIA NV-Embed v1 — best general embeddings (alias).
model: "plugsky-embed-nim"
plugsky-embed-multilingual
Embed
baai/bge-m3
Dim: 1024 · Ctx: 8K
Embeddings · Multilingual
BGE-M3 — multilingual embeddings (100+ languages).
model: "plugsky-embed-multilingual"
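
Because a vector index is built for one dimension, a simple guard can catch a mismatched model before it corrupts stored data. The sketch below uses the dimensions listed on the cards above and assumes an OpenAI-style embeddings endpoint reachable through client.embeddings.create, which this page does not show. The helper names are hypothetical.

```python
# Dimensions as listed on this page.
EMBED_DIMS = {
    "plugsky-embed": 4096,
    "plugsky-embed-nim": 4096,
    "plugsky-embed-multilingual": 1024,
}

def same_dimension(old_model: str, new_model: str) -> bool:
    """True if vectors from both models have the same length."""
    return EMBED_DIMS[old_model] == EMBED_DIMS[new_model]

def embed(client, texts: list[str], model: str = "plugsky-embed") -> list[list[float]]:
    """Embed texts and verify the returned dimension matches the listed one."""
    # Assumption: an OpenAI-style /v1/embeddings endpoint is available.
    resp = client.embeddings.create(model=model, input=texts)
    vectors = [item.embedding for item in resp.data]
    for vec in vectors:
        if len(vec) != EMBED_DIMS[model]:
            raise ValueError(f"{model}: expected {EMBED_DIMS[model]} dims, got {len(vec)}")
    return vectors
```

Matching dimensions are necessary but not sufficient. Vectors produced by different underlying models are not comparable, so switching between them still means re-embedding the stored documents.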

API usage

OpenAI-compatible. To switch between the models above, change only the model value; the rest of the request stays the same.

cURL
curl https://api.plugsky.com/v1/chat/completions \
  -H "Authorization: Bearer sk-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "plugsky-pro",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 200
  }'

Python (openai SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-live-...",
    base_url="https://api.plugsky.com/v1"
)

resp = client.chat.completions.create(
    model="plugsky-pro",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=200,
)
print(resp.choices[0].message.content)

JavaScript / TypeScript
const r = await fetch("https://api.plugsky.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-live-...",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "plugsky-pro",
    messages: [{ role: "user", content: "Hello!" }],
    max_tokens: 200
  })
});
const data = await r.json();
console.log(data.choices[0].message.content);
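
Since only the model value changes between requests, the same prompt can be sent to several models to compare their replies. A minimal sketch, assuming a client built as in the Python example above. The ask and compare helpers are hypothetical names.

```python
def ask(client, prompt: str, model: str) -> str:
    """Send one prompt to one model; only `model` changes between calls."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    return resp.choices[0].message.content

def compare(client, prompt: str, models: list[str]) -> dict[str, str]:
    """Run the same prompt against several models and collect the replies."""
    return {model: ask(client, prompt, model) for model in models}
```

For example, compare(client, "Hello!", ["plugsky-lite", "plugsky-pro"]) would return a dict keyed by model name.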