Plugsky is a deploy-anywhere AI platform. One OpenAI-compatible API gives you access to 31 top models (100% NVIDIA NIM) with predictable pricing, private endpoints, and elastic GPU — on our cloud, your cloud, or on-prem.

Is Plugsky OpenAI-compatible?

Yes. Just change the base URL to https://api.plugsky.com/v1 and use your Plugsky API key with any OpenAI SDK — Python, Node.js, LangChain, LlamaIndex, etc. work out of the box.

Is my data used to train models?

No. Prompts, completions, and RAG documents are never used to train any model. Enterprise plans add a signed DPA and in-region options.

How does pricing work?

Every paid plan includes unlimited usage with fair-use rate limits. Start with a 7-day free trial, then pick a plan ($20-$120/mo) or contact us for annual enterprise pricing.

◈ Every model. One API. Unlimited Usage.

Build AI faster & cheaper — unlimited usage, fully in your control

Powerful AI models & native agents in one API — fixed monthly pricing, no rate caps. Run on our cloud, your cloud, or on-prem in any country. A native agent like Manus & Perplexity, and the alternative to GPT, OpenRouter & Claude. Free, unlimited usage.

OpenAI-compatibleData residencySSO / RBACAudit logsWhite-label99.9% SLA

plugsky · /v1/chat/completions

$ curl https://api.plugsky.com/v1/chat/completions \
  -H "Authorization: Bearer sk-live-…" \
  -d '{ "model": "plugsky-pro", "stream": true }'
{
  "region": "auto",
  "latency_ms": 38,
  "content": "Hello 👋 The native AI agent like Manus & Perplexity — and the alternative to ChatGPT, Claude, GPT & OpenRouter. One API, free, unlimited usage."
}

35M+

API calls / day

2M+

AI agents powered

50,000+

Developers Addicted

99.9%

Uptime SLA

31+

Powerful AI models

7-day full-access free trial

Figures sourced from the Plugsky data room and pending independent validation before external distribution.

Trusted by the leaders

Backed by a world-class ecosystem — NVIDIA, Microsoft, OpenAI, AWS, Google, PayPal, Meta, Shopify — across enterprise AI, fintech, government and SaaS.

🟢 Nvidia Ⓜ️ Microsoft 🅿️ PayPal ⓕ Facebook 🇬 Google 🅾 OpenAI 🅰️ AWS 🇬 Google Cloud 🛍️ Shopify 🟢 Nvidia Ⓜ️ Microsoft 🅿️ PayPal ⓕ Facebook 🇬 Google 🅾 OpenAI 🅰️ AWS 🇬 Google Cloud 🛍️ Shopify

👨‍💻

Built for developers

Call 31+ models through one OpenAI-compatible API — flat pricing from $5, no per-token bills, switch from lite to frontier in a single line.

🧩

Made for SaaS teams

White-label models with flat, predictable pricing — built-in routing, auto-fallback, and team seats that protect your margins as you scale.

🏛️

Enterprise & regulated

Private, sovereign deployment in your own cloud or region — SSO/SAML, audit logs, GDPR & PDPL alignment, and SLAs up to 99.99%.

The bottleneck

AI is becoming a utility. Access is the constraint.

Global platforms impose rate caps, unpredictable token costs, and zero control over where your data runs. Plugsky removes those limits for any team, anywhere.

⛔

Rate & token limits

Throughput caps and queueing throttle production AI products at the worst time — peak demand.

📈

Unpredictable pricing

Pure token billing makes margins impossible to plan. SaaS teams need fixed, high-volume packages.

🛡️

No control over data

Sending sensitive data to third-party clouds you don't control is a non-starter for regulated teams.

One platform, every product

Every product your AI team needs — already wired

Six core products, one OpenAI-compatible API, one bill, one account. Each stands on its own — but they click together the moment you turn them on.

🧠

Models

Call 31+ models from one OpenAI-compatible API — lite to frontier, switch in a single line. Free, paid and embedding models side by side.

🤖

Agent Cloud

Build and run agents with function-calling, memory, and orchestration. OpenAI Assistants-compatible for one-line migration.

Open Agents →

🧩

Tools & Connectors

Wire agents into your tools, APIs and data sources. Bring-your-own Python functions, webhooks, HTTP actions — no glue code.

Browse tools →

📖

Knowledge (RAG)

Private RAG over your own docs with 4096-dim embeddings. Upload PDFs, code, knowledge bases — chat against them in one call.

Try RAG →

🔀

Model Fusion

Automatically blends cheaper and stronger models per task — fan out to N models in parallel, pick the best, save 60-80% vs always-frontier.

Open Fusion →

🛒

Marketplace

Discover and install ready-made agents, tools and prompt packs in a click. Free with every plan.

Browse marketplace →

Software & downloads

One-line install on every platform

Three first-party apps, all open-source under MIT / Apache-2.0, all pre-configured to talk to the Plugsky API. Pick your platform — install with a single command.

⌨️

Plugsky CLI

AI coding agent for your terminal

Fork of opencode, rebranded 100% as Plugsky. macOS, Linux, Windows. MIT-licensed.

curl -fsSL https://plugsky.com/install | bash

🍎 macOS arm64 🍎 macOS x64

🐧 Linux x64 🐧 Linux arm64

🪟 Windows x64 🪟 Windows arm64

🖥️

Plugsky Desktop

Native chat client (Jan fork)

Built on the open-source Jan project (Apache-2.0). Branded Plugsky with plugsky-pro as the default model.

curl -fsSL https://plugsky.com/install-desktop | bash

🍎 macOS Apple Silicon 🍎 macOS Intel

🐧 Linux x64 🪟 Windows x64

🌐

Plugsky Web

Self-hosted chat UI (Open WebUI fork)

Built on the open-source Open WebUI project. Docker or pip install. plugsky-fusion pre-listed.

curl -fsSL https://plugsky.com/install-web | bash

🐳 Docker 🐍 pip install

🍎 macOS 🐧 Linux

All installers are MIT-licensed. See NOTICE for upstream attribution. Full software reference →

Model ladder

From micro to frontier — 31 models, one API

31 first-party and partner models behind one OpenAI-compatible endpoint. Free, paid, vision, reasoning, code, embedding — pick by capability, switch in a line. Hover the ⓘ icon for full details on any model.

Built for

Built for every team

👩‍💻

Developers

High-volume APIs, simple migration, usage dashboards.

🧩

SaaS platforms

Reserved throughput & white-label AI to protect margins.

🏛️

Government

Private endpoints with full audit controls.

🏦

Banking & fintech

Compliance-ready RAG, SSO/RBAC, audit logs.

🤝

Agencies & SIs

Wholesale API + per-client isolated deployments.

📊

Trading & markets

Low-latency regional endpoints for high-volume support.

Why Plugsky

Compete on infrastructure, not model hype

Frontier labs win on raw intelligence. Plugsky wins on deployment control, economics & freedom from lock-in.

Dimension	Global APIs	Plugsky
Deployment control	Limited	Your cloud, our cloud, or on-prem
Pricing	Per-token, rate-limit surprise bills	Flat-rate packages — unlimited usage within fair-use, no per-token bills
Data residency	Plan-dependent	Any region you choose — UAE, KSA, EU, US, on-prem
White-label	Limited	Core capability — rebrand models, resell under your own brand
GPU capacity	Centrally owned	Hybrid: owned data center + GPU Share Network + partner clouds
Models	1-2 first-party + paid add-ons	31 first-party + partner models behind one OpenAI-compatible API
Agent Cloud	Beta, separate billing	Built-in, OpenAI Assistants-compatible, free on every plan
RAG (Knowledge)	Add-on, separate vendor	Built-in, 4096-dim embeddings, private collections, included free
Model Fusion	Not available	First-class — fan out, vote, merge, save 60-80% on cost
Marketplace	Not available	100+ ready-made agents, tools, prompt packs
Compliance	Plan-dependent	GDPR, PDPL, ISO 27001, SOC 2 Type II, HIPAA-ready, 99.99% SLA
Languages	English-first	50+ languages, Arabic dialects native, RTL UI
Integrations	A few SDKs	58+ — every IDE, CLI, desktop, framework, no-code tool
Onboarding	Read the docs, figure it out	Free trial, change base_url, ship in 5 minutes

Customer stories

Real results, real outcomes

“Ministry cut citizen service response from 48 hours to 4 minutes. 500K+ inquiries/month. Satisfaction jumped 67% → 91%. Zero IT headcount added.”

Head of AI · Government

“FinTech switched from AWS to Azure in 24 hours with zero code changes. 43% cost reduction. Expanded to 3 countries instantly.”

COO · FinTech

“Hospital deployed HIPAA-compliant, air-gapped AI in 5 days. 50K+ records/month. Physicians save 8 hrs/week. Zero breaches.”

Director of IT · Healthcare

“We hit regulatory roadblocks expanding to Saudi on AWS. Plugsky deployed identical LLM infrastructure across AWS (UAE), Azure (Saudi) and on-prem (Egypt) instantly. This level of freedom is unprecedented in enterprise AI.”

Jordan Lee · CFO, Brightwave

Pricing

Simple pricing — self-serve and enterprise packages

Start free, then pick the plan that fits. Launch special: 30% off monthly, 50% off yearly. Need volume, compliance, or sovereign deployment? See enterprise packages →

	Self-serve Free Full access to test	Self-serve New Hobby Generous limits	Self-serve Most popular Starter Unlimited — fair use	Self-serve Builder Unlimited — fair use	Self-serve Scale Unlimited — fair use	Enterprise Starter Enterprise First enterprise pilot	Enterprise Most popular Growth Enterprise Scaling AI internally	Enterprise Enterprise Regulated industries	Enterprise Sovereign AI Cloud Government & banks
Price	Free	$8 $5.6/mo −30% launch	$20 $14/mo −30% launch	$60 $42/mo −30% launch	$120 $84/mo −30% launch	$15K – $25K/year	$50K – $100K/year	$150K – $500K+/year	$500K – $2M+/year
Billed annually (−50% off)	—	$48/yr −50%	$120/yr −50%	$360/yr −50%	$720/yr −50%	—	—	—	—
Usage	Unlimited* — all self-serve plans				By deployment — annual contract
Best for	Full access to test	Generous limits	Unlimited — fair use	Unlimited — fair use	Unlimited — fair use	First enterprise pilot	Teams scaling AI internally	Regulated companies	Government & banks
Models	plugsky-micro only	up to plugsky-plus	up to plugsky-pro	up to plugsky-max	all + plugsky-frontier	up to plugsky-frontier	all models	all + private models	all + custom fine-tuning
Deployments	1	2	5	20	Unlimited	1	Up to 5	Unlimited	Unlimited + sovereign
Seats	1	2	5	20	25	5	25	Unlimited	Unlimited
API keys	2	5	20	Unlimited	Unlimited	Unlimited	Unlimited	Unlimited	Unlimited
Rate limit*	Low req/min	Higher req/min	120 req/min	300 req/min	1,000 req/min	Custom	Custom	Dedicated	Sovereign dedicated
Knowledge (RAG)	Minimal docs	Small docs	1,000 docs	20,000 docs	200,000 docs	5,000 docs	50,000 docs	Unlimited	Unlimited + private
Support	Community	Email	Email	Priority	Priority + onboarding	Email 48h SLA	Slack channel	Named engineer	24/7 + dedicated CSM
SLA	—	—	—	—	99.9%	99.9%	99.95%	99.99%	Sovereign + custom
SSO / SAML	—	—	—	—	—	—	✓	✓	✓
Audit logs	—	—	—	—	✓	Basic	Full	Full + export	Full + SIEM export
On-prem / VPC	—	—	—	—	—	—	Optional	Optional	Air-gapped available
DPA & Compliance	—	—	—	—	—	DPA included	DPA + SOC 2	DPA + SOC 2 + HIPAA	FedRAMP + sovereign

* Unlimited usage subject to fair-use rate limits. Enterprise packages are annual contracts, billed yearly. Custom configurations available — contact enterprise@plugsky.com for a tailored quote.

🛠

Professional Services Add-on

$50K – $500K

Setup, integration, RAG, fine-tuning, and migration support from our engineering team.

Implementation & onboarding Custom workflows RAG over your data Fine-tuning & evaluation

⚙️

Managed AI Ops Add-on

+30% uplift

Plugsky operates the AI stack end-to-end — monitoring, optimization, model/runtime operations, and 24/7 support.

24/7 monitoring + on-call Runtime optimization Model & prompt tuning Cost optimization

Feature	Free $0	Hobby $5.6 / mo −30%	Starter $14 / mo −30%	Builder $42 / mo −30%	Scale $84 / mo −30%
⚡ Core platform
All 31+ AI models (plugsky-micro → plugsky-frontier + Kimi, Qwen, DeepSeek, Llama, Mistral, NVIDIA Nemotron, Gemma)	✓	up to plus	up to pro	up to max	all 31+ + plugsky-frontier
OpenAI-compatible API	✓	✓	✓	✓	✓
Streaming & function calling	✓	✓	✓	✓	✓
Vision & multimodal	✓	✓	✓	✓	✓
JSON mode & structured outputs	✓	✓	✓	✓	✓
Embeddings (4096-dim, 100+ languages)	✓	✓	✓	✓	✓
Flux model (multi-model fan-out & routing)	—	✓	✓	✓	✓
Fair-use rate limit (req/min)	Low	Higher	120	300	1,000
API keys	2	5	20	Unlimited	Unlimited
Usage dashboard & analytics	✓	✓	✓ advanced	✓ custom	✓ custom
🎮 Playground Beta (in-browser chat)
In-browser chat playground with all 31+ models	✓	✓	✓	✓	✓
In-browser file upload & RAG playground	✓	✓	✓	✓	✓
In-browser Flux fan-out playground	—	✓	✓	✓	✓
Local file storage (IndexedDB, 50 MB)	10 MB	✓	✓	✓	✓
Tabbed chat history (in-browser)	—	✓	✓	✓	✓
Semantic search over your local files	—	✓	✓	✓	✓
Share chat tabs as a short URL	✓	✓	✓	✓	✓
🔌 Plugins (function calling, webhooks, custom tools)
Plugin sandbox & built-in plugins (calculator, web search, weather, stock, translator, code-run, SQL, image-gen)	✓	✓	✓	✓	✓
Custom webhooks (HTTP, function)	—	✓	✓	✓	✓
Bring-your-own Python plugin	—	—	✓	✓	✓
Plugin streaming (SSE, JSON events)	✓	✓	✓	✓	✓
Per-plugin auth scopes & rate limits	—	—	✓	✓	✓
🤖 Agents (system prompt + tools bundles)
Pin & unpin agents to chat (per-tab agents)	—	✓	✓	✓	✓
OpenAI Assistants-compatible agents	✓	✓	✓	✓	✓
Function calling & tools	✓	✓	✓	✓	✓
Conversation memory (across turns)	✓	✓	✓	✓	✓
Code interpreter (sandboxed Python execution)	—	✓	✓	✓	✓
File search & vector retrieval	—	✓	✓	✓	✓
Custom tools (BYO Python)	—	—	✓	✓	✓
Agent Studio (create, edit, pin agents)	✓	✓	✓	✓	✓
GPU Share Network (verified operators contribute idle GPUs)	—	—	—	✓	✓
🛒 Marketplace (ready-made agents, tools, prompt packs)
100+ ready-made agents, tools, prompt packs	—	✓	✓	✓	✓
Browse, search, filter the marketplace	✓	✓	✓	✓	✓
One-click deploy to your workspace	—	✓	✓	✓	✓
Submit your own agents/tools to the marketplace	—	—	—	✓	✓
Marketplace earnings (70% revshare to creators)	—	—	—	—	✓
📖 Knowledge (RAG)
Document upload & chunking (PDF, DOCX, MD, code)	✓	✓	✓	✓	✓
Private collections	Minimal	Small	1,000	20,000	200,000
Multilingual embeddings (100+ languages)	✓	✓	✓	✓	✓
Hybrid search (vector + keyword)	—	✓	✓	✓	✓
Citations & source highlighting	—	✓	✓	✓	✓
Re-rank & query rewriting	—	—	—	✓	✓
🔀 Model Fusion (multi-model fan-out)
Sequential chains	—	✓	✓	✓	✓
Parallel fan-out (up to 8 models)	—	—	✓	✓	✓
Cost-saver auto-escalation (micro → pro → max)	—	✓	✓	✓	✓
Majority vote & merge strategies	—	—	✓	✓	✓
Custom routing rules	—	—	—	✓	✓
Flux model preset library	—	✓	✓	✓	✓
🌍 Integrations & SDKs
58+ client integrations (IDEs, CLIs, desktop, frameworks)	✓	✓	✓	✓	✓
plugsky CLI & Desktop (Jan, Open WebUI, Chatbox, Msty)	✓	✓	✓	✓	✓
IDEs (VS Code, JetBrains, Cursor, Cline, Continue)	✓	✓	✓	✓	✓
CLIs (plugsky CLI, Aider, Goose, OpenCode, Crush)	✓	✓	✓	✓	✓
Frameworks (LangChain, LlamaIndex, Haystack, CrewAI, PydanticAI)	✓	✓	✓	✓	✓
No-code (Flowise, Langflow, Dify, Activepieces, Retool)	✓	✓	✓	✓	✓
Bring-your-own models (custom fine-tunes)	—	—	—	✓	✓
👥 Team & security
Team seats	1	2	5	20	25
Role-based access control (RBAC)	—	—	✓	✓	✓
Audit logs (workspace-level)	—	30 days	30 days	unlimited	unlimited
SSO / SAML	—	—	—	—	✓
GDPR / PDPL compliance	✓	✓	✓	✓	✓
SOC 2 Type II / ISO 27001	—	—	—	—	✓
Uptime SLA	—	—	99.9%	99.95%	99.99%
Priority email support	Community	Email	Email	Priority	Priority + onboarding
💳 Billing
Flat monthly fee (no per-token)	✓	✓	✓	✓	✓
Annual billing (save 50% with launch promo)	—	✓	✓	✓	✓
No overage fees	✓	✓	✓	✓	✓
Cancel anytime	✓	✓	✓	✓	✓
Discount coupons & promo codes	✓	✓	✓	✓	✓
Affiliate / referral program	✓	✓	✓	✓	✓

Need an enterprise or sovereign-cloud plan? Contact our team → — every enterprise contract is priced to your deployment and security requirements.

Need sovereign deployment? Private endpoints, dedicated capacity, on-prem/VPC, SSO/SAML, DPA, custom SLA — annual, priced by deployment.

From the blog

Articles & insights

Control

Why enterprises need AI they control

Data residency, compliance and the case for running inference where you choose.

Engineering

Migrating from OpenAI in one line

Switch your base URL, keep your stack — OpenAI-compatible by design.

Product

Introducing Agent Cloud

Build agents with tools, memory and private RAG on infrastructure you control.

Read all articles

FAQ

Everything you need to know

The 20 questions customers ask most before signing up.

Is usage really unlimited?

Yes. There are no per-token charges on any plan. Every plan is flat-rate, and usage is gated only by the fair-use rate limit for your tier (60, 120, 300, or 1,000 req/min). You can call a million times on a $20 plan and still pay $20.

How do I migrate from OpenAI?

Change your base_url to https://api.plugsky.com/v1, set your API key, and use model="plugsky-pro" instead of gpt-4o. Your existing code, SDK, and prompts work unchanged. Most teams migrate in 5 minutes.

What models do you support?

31 first-party models (plugsky-micro through plugsky-frontier, plus Kimi, Qwen, DeepSeek, Llama, Mistral, NVIDIA Nemotron, Gemma) and the same OpenAI-compatible endpoint also supports partner models. Free, paid, vision, reasoning, code, embeddings — all behind one API.

Can I switch clouds after deployment?

Yes. Switch from AWS to Azure to Google Cloud to on-prem without rebuilding your infrastructure. That is the core benefit of Plugsky — no vendor lock-in, ever.

Do you support on-premise deployment?

Yes — Enterprise includes on-premise and air-gapped deployment for maximum data control. Runs in your own data center, on your own hardware, with no outbound network calls.

What is Model Fusion?

Model Fusion lets you call multiple models in parallel and pick the best result. Set model="plugsky-fusion" and Plugsky runs your default chain (e.g. micro → pro → max) — fast and cheap, with auto-escalation on hard prompts. Saves 60-80% on production traffic.

What is Agent Cloud?

Agent Cloud is our platform for building AI agents with function-calling, memory, tools, and orchestration. OpenAI Assistants-compatible for one-line migration. Free on every plan.

What is the Knowledge (RAG) feature?

Private RAG over your own documents. Upload PDFs, code, or knowledge bases; Plugsky chunks them, embeds with 4096-dim vectors, and lets you chat against your data in one API call. No separate vector DB needed.

How does the marketplace work?

100+ ready-made agents, tools, and prompt packs. Browse in the dashboard, click "Deploy", and they're live in your workspace. Free with every plan.

What integrations do you support?

58+ — every major IDE (VS Code, JetBrains, Cursor, Cline, Continue), CLI (plugsky CLI, Aider, Goose, OpenCode, Crush), desktop (Jan, Open WebUI, Chatbox, Jan, Msty, LM Studio), framework (LangChain, LlamaIndex, Haystack, CrewAI, PydanticAI, Semantic Kernel), and no-code (Flowise, Langflow, Dify, Activepieces, Pipedream, Retool).

What compliance certifications do you have?

ISO 27001, SOC 2 Type II, GDPR, PDPL-compliant, HIPAA-ready. FedRAMP in progress. Enterprise plans include a signed DPA, BAA, and full audit log export.

How quickly can I deploy?

Self-serve plans: under 5 minutes. Enterprise private cloud: 1-3 days. On-premise: 3-7 days. White-glove migration with a named engineer: included in Enterprise.

What is your uptime guarantee?

Standard SLA 99.9%, Enhanced 99.95%, Custom up to 99.99% with financial penalties for downtime. The /status page is the public source of truth for live uptime.

Do you offer custom contracts?

Yes. Enterprise customers can negotiate custom terms, payment schedules, SLAs, regional deployment, and dedicated support engineers.

How is data residency handled?

You pick the region — UAE-1, UAE-2, KSA, EU, US, or on-prem. Data never leaves the region you select. Full audit log of every API call is stored in the same region.

Can I use my own models?

Yes. The Enterprise plan supports custom model uploads (fine-tuned or open-weights) served from your own infrastructure, behind the same OpenAI-compatible API. White-label option included.

How does billing work?

Flat monthly or annual fee. No per-token charges, no per-request charges, no overage fees. You can upgrade, downgrade, or cancel anytime. Annual plans save 20%.

What is the free trial?

7 days of full access to all 31 models, Agent Cloud, RAG, Model Fusion, marketplace, and all integrations. At the end, pick a plan or your account goes read-only (we never auto-charge).

How does the GPU Share Network work?

Verified operators contribute idle GPUs and earn 70% of the revenue Plugsky routes to them. Plugsky handles auth, metering, and routing. Non-sensitive workloads only — every payload is classified before routing.

Where can I get support?

Free: documentation, community Discord, GitHub issues. Pro: priority email, 24h response. Enterprise: named engineer, 4h SLA, Slack/Teams/phone. SLA: 99.9% uptime or you get credit.

Build on AI infrastructure you control today

Free to start. OpenAI-compatible. Live in minutes.