A private AI endpoint is a fully self-hosted instance of the Plugsky control plane, deployed inside your VPC or on your own hardware, exposing the same OpenAI-compatible API as the public cloud. You get all the platform features (model routing, RAG, agents, observability) without the data ever leaving your perimeter.
What is a private AI endpoint?
A private AI endpoint is the full Plugsky control plane — API gateway, model router, RAG engine, agent runtime, observability — packaged as a single deployment artifact (Helm chart, Terraform module, or air-gapped OVA) that you run inside your own infrastructure.
It exposes the same https://your-endpoint/v1/ OpenAI surface as the public cloud, so every SDK and tool that speaks OpenAI works against it without modification.
Deployment options
VPC (private cloud)
Helm install on EKS / AKS / GKE / OpenShift. Plugsky supports 6+ regions. Customer-managed keys. VPN or PrivateLink connectivity.
On-prem (air-gapped)
OVA or PXE image for VMware / KVM / bare metal. Offline update channels. Hardware HSM support. No internet egress required.
Hybrid (managed + private)
Public control plane, private inference. You keep models and data in your VPC while Plugsky handles routing, billing, and observability.
What you operate vs. what Plugsky operates
| Component | You | Plugsky |
|---|---|---|
| Compute & GPUs | ● | ○ |
| Networking & firewall | ● | ○ |
| Key custody (BYOK) | ● | ○ |
| Patch & update cadence | ● | advisory |
| Model registry | ○ | ● |
| Security patches & CVEs | ○ | ● |
| Compliance attestations | ○ | ● |
Cost and timeline
Typical private endpoint deployment:
- VPC: 4-6 weeks, $50K-$200K first-year, then $30K-$80K/year
- On-prem air-gapped: 3-6 months, $250K-$1M first-year (hardware + integration)
- Hybrid: 2-3 weeks, $20K-$50K first-year, $10K-$30K/year
Use our private LLM cost estimator to model your scenario.
Frequently asked questions
What hardware do I need for a private endpoint?
A single H100 or A100 node is enough for 18B-parameter models. Larger models (70B+) need 2-4 GPUs. Our GPU capacity calculator gives exact sizing.
Can I run a private endpoint behind my firewall?
Yes. Plugsky supports fully air-gapped deployments with offline update channels and hardware HSM integration.
Who patches the model registry?
Plugsky ships signed model updates on a schedule you choose (weekly / monthly / quarterly). You approve each update before deployment.
Is support included?
Yes — Enterprise contracts include 24/7 support with a 1-hour response SLA for P1.
Scope a private endpoint
Walk through a deployment plan with a Plugsky solutions engineer. NDA available on request.
Scope a pilot → See enterprise plans