Plugsky — Full Model Reference (31 NVIDIA NIM Models)

🆓 Free tier

Available on every account, including free ones. Rate limits leave no headroom, but the tier suits evaluation, learning, and small jobs.

plugsky-micro

Free

nvidia/nvidia-nemotron-nano-9b-v2

Context: 131K tokens

Chat · Streaming · JSON mode

Fast, cheap — classification, simple chat, intent detection.

model: "plugsky-micro"

plugsky-lite

Free

meta/llama-3.2-3b-instruct

Context: 32K tokens

Chat · Streaming · JSON mode · Tools

Support & chat automation — moderate complexity.

model: "plugsky-lite"
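
Both free models list JSON mode, which fits the classification and intent-detection use cases named above. The following is a minimal sketch, assuming JSON mode is requested through the OpenAI-style response_format parameter (this reference does not confirm the mechanism). The intent labels and the helpers build_intent_request and parse_intent are hypothetical names introduced for illustration.

```python
import json

# Hypothetical intent labels, used only for illustration.
INTENTS = ["billing", "technical", "other"]


def build_intent_request(text: str, model: str = "plugsky-micro") -> dict:
    """Build keyword arguments for client.chat.completions.create().

    Assumption: JSON mode is enabled with the OpenAI-style
    response_format parameter.
    """
    system = (
        "Classify the user's message. Reply with a JSON object of the form "
        '{"intent": "<label>"} where <label> is one of: ' + ", ".join(INTENTS)
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": text},
        ],
        "response_format": {"type": "json_object"},
        "max_tokens": 50,
    }


def parse_intent(raw: str) -> str:
    """Parse the model's reply, falling back to "other" on malformed output."""
    try:
        intent = json.loads(raw).get("intent")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or valid JSON that is not an object.
        return "other"
    return intent if intent in INTENTS else "other"
```

With the openai SDK, the request would be sent as client.chat.completions.create(**build_intent_request("My invoice is wrong")) and the reply text passed to parse_intent. Validating the label on the client side keeps a small model's occasional off-list answer from propagating.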

💎 Paid tier — General purpose

The everyday workhorses. Balanced quality, latency, and cost. Available on paid plans (trial / starter / builder / scale / enterprise).

plugsky-plus

Paid

deepseek-ai/deepseek-v4-flash

Context: 32K tokens

Chat · Streaming · JSON · Tools · Function-call

Balanced general agent — good quality at lower cost.

model: "plugsky-plus"

plugsky-pro

Paid

qwen3.6-plus → minimaxai/minimax-m3

Context: 65K tokens

Chat · Streaming · JSON · Function-call · Code

Coding & reasoning (default) — strong general purpose.

model: "plugsky-pro"

plugsky-max

Paid

nvidia/llama-3.3-nemotron-super-49b-v1.5

Context: 131K tokens

Chat · Streaming · JSON · Function-call · Long-ctx

Complex multi-step — deep reasoning.

model: "plugsky-max"

plugsky-frontier

Paid

mistralai/mistral-large-3-675b-instruct-2512

Context: 131K tokens

Chat · Streaming · JSON · Function-call · Long-ctx · Tools

Frontier-tier — Mistral Large 3 675B (0.3s, 128K context, EU origin).

model: "plugsky-frontier"
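
Several general-purpose models list Tools and Function-call. A minimal sketch of what such a request might look like, assuming these capabilities follow the OpenAI-style tools parameter (not confirmed here). The get_weather tool and the helper names are hypothetical and exist only to show the shape of the payload and the dispatch step.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    """Stand-in implementation; a real tool would call a weather service."""
    return f"Weather lookup for {city} is not implemented in this sketch."


def build_tool_request(question: str, model: str = "plugsky-plus") -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [WEATHER_TOOL],
    }


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call; arguments arrive as a JSON-encoded string."""
    handlers = {"get_weather": get_weather}
    if name not in handlers:
        raise ValueError(f"unknown tool: {name}")
    return handlers[name](**json.loads(arguments_json))
```

With the openai SDK, each entry in resp.choices[0].message.tool_calls carries function.name and function.arguments, which map directly onto the two parameters of run_tool_call.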

🧠 Paid tier — Reasoning & code

Models tuned for specific workloads. Higher quality on their target task; same OpenAI-compatible API.

plugsky-reasoning

Paid

nvidia/nemotron-3-super-120b-a12b

Context: 65K tokens

Reasoning · Tools · JSON · Long-ctx

Deep reasoning, math, code — NVIDIA Nemotron Super 120B.

model: "plugsky-reasoning"

plugsky-coder

Paid

bytedance/seed-oss-36b-instruct

Context: 131K tokens

Code · Tools · JSON · Long-ctx · MoE

Best open coding model — Qwen3 Coder 480B MoE.

model: "plugsky-coder"

plugsky-coder-fast

Paid

meta/llama-3.2-3b-instruct

Context: 32K tokens

Code · Tools · JSON

Fast coding — Llama 3.2 3B.

model: "plugsky-coder-fast"

plugsky-gpt-oss

Paid

openai/gpt-oss-120b

Context: 32K tokens

Reasoning · Tools · JSON · Long-ctx

Open-source GPT — gpt-oss-120B.

model: "plugsky-gpt-oss"

plugsky-phi

Paid

nvidia/nemotron-mini-4b-instruct

Context: 131K tokens

Chat · Streaming · Tools · JSON

NVIDIA Nemotron Mini 4B — ultra-compact, very fast.

model: "plugsky-phi"

plugsky-tiny

Paid

nvidia/nvidia-nemotron-nano-9b-v2

Context: 131K tokens

Chat · Streaming · Tools · JSON

NVIDIA Nemotron Nano 9B v2 — small, fast, low cost.

model: "plugsky-tiny"
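
Because the workload-specific models share one API, routing by task can reduce to a lookup that changes only the model value. A small sketch of that idea follows; the task labels and the build_request helper are hypothetical, while the aliases are taken from this reference.

```python
# Task labels are hypothetical; the aliases come from this reference.
ROUTES = {
    "math": "plugsky-reasoning",
    "code": "plugsky-coder",
    "quick-code": "plugsky-coder-fast",
}

# plugsky-pro is described above as the default for coding and reasoning.
DEFAULT_MODEL = "plugsky-pro"


def build_request(task: str, prompt: str, max_tokens: int = 200) -> dict:
    """Return chat-completion arguments; only "model" depends on the task."""
    return {
        "model": ROUTES.get(task, DEFAULT_MODEL),
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
```

Keeping the mapping in one table means a model swap is a one-line change that does not touch the calling code.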

🌊 Paid tier — DeepSeek & Qwen

Open-source powerhouses from China. Top-tier reasoning and code at competitive pricing.

plugsky-deepseek-pro

Paid

deepseek-ai/deepseek-v4-pro

Context: 65K tokens

Reasoning · Code · Tools · JSON

Reasoning + code — DeepSeek V4 Pro.

model: "plugsky-deepseek-pro"

plugsky-deepseek-flash

Paid

deepseek-ai/deepseek-v4-flash

Context: 32K tokens

Chat · Streaming · Tools · JSON

Fast DeepSeek — V4 Flash.

model: "plugsky-deepseek-flash"

plugsky-qwen-next

Paid

qwen/qwen3-next-80b-a3b-instruct

Context: 131K tokens

Tools · JSON · Long-ctx · MoE

Alibaba Qwen3 Next 80B (MoE, 256K context).

model: "plugsky-qwen-next"

👁️ Paid tier — Multimodal & long context

Vision, video, ultra-long context, and MoE models. Use these for images, video, and documents that exceed typical 32K context windows.

plugsky-minimax

Paid

minimaxai/minimax-m3

Context: 32K tokens

Vision · Video · Multimodal · Tools · JSON

MiniMax-M3 — strong multimodal + reasoning.

model: "plugsky-minimax"

plugsky-ultra

Paid

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning

Context: 1M tokens

Vision · Multimodal · Tools · Long-ctx · MoE · Reasoning

NVIDIA Nemotron 3 Nano Omni 30B — omni-modal + reasoning, 1M context.

model: "plugsky-ultra"

plugsky-vision-fast

Paid

meta/llama-3.2-11b-vision-instruct

Context: 32K tokens

Vision · Multimodal · JSON

Multimodal fast — Llama 3.2 Vision 11B.

model: "plugsky-vision-fast"

plugsky-llama4

Paid

meta/llama-4-maverick-17b-128e-instruct

Context: 131K tokens

Tools · JSON · Long-ctx · MoE

Meta Llama 4 Maverick 17B (128 experts MoE, 128K ctx).

model: "plugsky-llama4"

plugsky-qwen-vl

Paid

qwen/qwen3.5-397b-a17b

Context: 262K tokens

Vision · Multimodal · Tools · JSON · Long-ctx · MoE · Reasoning

Qwen 3.5 397B MoE — multimodal + 256K context + reasoning.

model: "plugsky-qwen-vl"

plugsky-kimi

Paid

moonshotai/kimi-k2.6

Context: 131K tokens

Tools · JSON · Long-ctx

Long-context (256K) — MoonshotAI Kimi K2.6.

model: "plugsky-kimi"

plugsky-longctx

Paid

mistralai/mistral-large-3-675b-instruct-2512

Context: 131K tokens

Tools · JSON · Long-ctx

Mistral Large 3 675B — European, 128K context.

model: "plugsky-longctx"

plugsky-mistral-medium

Paid

mistralai/mistral-medium-3.5-128b

Context: 131K tokens

Tools · JSON · Long-ctx

Mistral Medium 3.5 128B — fast 128K context.

model: "plugsky-mistral-medium"

plugsky-mistral-small

Paid

mistralai/mistral-small-4-119b-2603

Context: 131K tokens

Tools · JSON · Long-ctx

Mistral Small 4 119B — fast + 128K context.

model: "plugsky-mistral-small"

plugsky-nano

Paid

nvidia/nemotron-3-nano-30b-a3b

Context: 1M tokens

Tools · JSON · Long-ctx · MoE

NVIDIA Nemotron 3 Nano 30B MoE — 1M context, fast.

model: "plugsky-nano"
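
The context windows listed in this section can drive model selection for long documents. The sketch below picks the smallest listed window that fits a prompt plus its expected output. It rests on two assumptions: "32K" and similar figures are treated as round thousands, and token counts are estimated at roughly four characters per token, which is a crude heuristic and not a real tokenizer. The helper names are hypothetical.

```python
# Context windows as listed in this section, treated as round numbers.
CONTEXT_TOKENS = {
    "plugsky-minimax": 32_000,
    "plugsky-kimi": 131_000,
    "plugsky-qwen-vl": 262_000,
    "plugsky-nano": 1_000_000,
}


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (assumption)."""
    return len(text) // 4 + 1


def pick_model(prompt: str, max_output_tokens: int = 1_000) -> str:
    """Return the listed model with the smallest window that still fits."""
    needed = estimate_tokens(prompt) + max_output_tokens
    for model, window in sorted(CONTEXT_TOKENS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return model
    raise ValueError(f"prompt needs about {needed} tokens; no listed model fits")
```

Choosing the smallest sufficient window is only one possible policy; cost, latency, and required capabilities such as vision may matter more in practice.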

🌸 Paid tier — Google Gemma

Compact multimodal models from Google. Lightweight, fast, with image input support.

plugsky-gemma-4

Paid

google/gemma-3n-e4b-it

Context: 32K tokens

Multimodal · JSON

Google Gemma 3 Nano 4B — fast + multimodal.

model: "plugsky-gemma-4"

plugsky-gemma3-nano-2b

Paid

google/gemma-3n-e2b-it

Context: 32K tokens

Multimodal · JSON

Google Gemma 3 Nano 2B — ultra-compact, fast.

model: "plugsky-gemma3-nano-2b"

plugsky-gemma3-nano-4b

Paid

google/gemma-3n-e4b-it

Context: 32K tokens

Multimodal · JSON

Google Gemma 3 Nano 4B — small + fast + multimodal.

model: "plugsky-gemma3-nano-4b"
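
For the image input support mentioned for these models, one plausible request shape is the OpenAI-style multi-part message, where an image travels as a base64 data URL. This is a sketch under the assumption that Plugsky accepts that format; the reference does not document the image payload, and build_image_message is a hypothetical helper.

```python
import base64


def build_image_message(
    image_bytes: bytes, question: str, mime_type: str = "image/png"
) -> dict:
    """Build one user message holding a text part and an inline image part.

    Assumption: the API accepts OpenAI-style "image_url" content parts
    with a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary would go into the messages list of a chat-completion request with, for example, model set to "plugsky-gemma3-nano-4b".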

🔢 Embeddings

Vector embeddings for semantic search, RAG, clustering, and recommendations. Output dimensions differ between models (4096 for NV-Embed v1, 1024 for BGE-M3), so check the dimension before switching.

plugsky-embed

Embed

nvidia/nv-embed-v1

Dim: 4096 · Ctx: 8K

Embeddings

NVIDIA NV-Embed v1 — best general embeddings.

model: "plugsky-embed"

plugsky-embed-nim

Embed

nvidia/nv-embed-v1

Dim: 4096 · Ctx: 8K

Embeddings

NVIDIA NV-Embed v1 — best general embeddings (alias).

model: "plugsky-embed-nim"

plugsky-embed-multilingual

Embed

baai/bge-m3

Dim: 1024 · Ctx: 8K

Embeddings · Multilingual

BGE-M3 — multilingual embeddings (100+ languages).

model: "plugsky-embed-multilingual"
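
Since the embedding models above produce vectors of different sizes, a guard on the dimension can catch an accidental model switch before mismatched vectors reach an index. The sketch below assumes the embedding models are served through an OpenAI-style embeddings endpoint, reachable via client.embeddings.create in the openai SDK; this reference only shows the chat endpoint. The helper names are hypothetical.

```python
import math

# Output dimensions as listed above.
EMBED_DIMS = {
    "plugsky-embed": 4096,
    "plugsky-embed-nim": 4096,
    "plugsky-embed-multilingual": 1024,
}


def embed(client, model: str, texts: list) -> list:
    """Embed texts and verify the returned dimension.

    `client` is an openai.OpenAI instance pointed at the Plugsky base URL.
    Assumption: an OpenAI-style embeddings endpoint is available.
    """
    resp = client.embeddings.create(model=model, input=texts)
    vectors = [item.embedding for item in resp.data]
    for vec in vectors:
        if len(vec) != EMBED_DIMS[model]:
            raise ValueError(f"expected {EMBED_DIMS[model]} dims, got {len(vec)}")
    return vectors


def same_dimension(model_a: str, model_b: str) -> bool:
    """True when two models emit vectors of equal length."""
    return EMBED_DIMS[model_a] == EMBED_DIMS[model_b]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / norm
```

Equal dimensions are necessary but not sufficient for mixing vectors: two different models can share a size and still produce incompatible spaces. Here plugsky-embed and plugsky-embed-nim are listed as aliases of the same underlying model, so that pair is the safe case.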

⚡ API usage

The API is OpenAI-compatible. To switch between the models above, change only the "model" value; the rest of the request stays the same.

cURL

curl https://api.plugsky.com/v1/chat/completions \
  -H "Authorization: Bearer sk-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "plugsky-pro",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 200
  }'

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-live-...",
    base_url="https://api.plugsky.com/v1"
)

resp = client.chat.completions.create(
    model="plugsky-pro",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=200,
)
print(resp.choices[0].message.content)

JavaScript / TypeScript

const r = await fetch("https://api.plugsky.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-live-...",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "plugsky-pro",
    messages: [{ role: "user", content: "Hello!" }],
    max_tokens: 200
  })
});
const data = await r.json();
console.log(data.choices[0].message.content);
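
Many of the chat models above list Streaming. With the openai SDK this is usually requested by passing stream=True and reading incremental deltas. The sketch below assumes Plugsky streams in the same chunk format as the OpenAI API, which this reference does not spell out; stream_reply is a hypothetical helper.

```python
def stream_reply(client, prompt: str, model: str = "plugsky-pro"):
    """Yield pieces of the reply as they arrive.

    `client` is an openai.OpenAI instance configured as in the Python
    example above. Assumption: OpenAI-style streaming chunks.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece
```

A caller might print the pieces as they come in, for example: for piece in stream_reply(client, "Hello!"): print(piece, end="", flush=True).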

31 models. One API.
Powered by NVIDIA NIM.
