Private AI Endpoint

Private AI endpoint — same OpenAI API, fully inside your VPC

Plugsky private AI endpoint runs the same OpenAI-compatible API inside your VPC or on your hardware. Your data never leaves your perimeter. Your team keeps full operational control. Plugsky provides the platform, you keep the keys.

A private AI endpoint is a fully self-hosted instance of the Plugsky control plane, deployed inside your VPC or on your own hardware, exposing the same OpenAI-compatible API as the public cloud. You get all the platform features (model routing, RAG, agents, observability) without the data ever leaving your perimeter.

What is a private AI endpoint?

A private AI endpoint is the full Plugsky control plane — API gateway, model router, RAG engine, agent runtime, observability — packaged as a single deployment artifact (Helm chart, Terraform module, or air-gapped OVA) that you run inside your own infrastructure.

It exposes the same https://your-endpoint/v1/ OpenAI surface as the public cloud, so every SDK and tool that speaks OpenAI works against it without modification.

Deployment options

VPC (private cloud)

Helm install on EKS / AKS / GKE / OpenShift. Plugsky supports 6+ regions. Customer-managed keys. VPN or PrivateLink connectivity.

On-prem (air-gapped)

OVA or PXE image for VMware / KVM / bare metal. Offline update channels. Hardware HSM support. No internet egress required.

Hybrid (managed + private)

Public control plane, private inference. You keep models and data in your VPC while Plugsky handles routing, billing, and observability.

What you operate vs. what Plugsky operates

Component You Plugsky
Compute & GPUs
Networking & firewall
Key custody (BYOK)
Patch & update cadenceadvisory
Model registry
Security patches & CVEs
Compliance attestations

Cost and timeline

Typical private endpoint deployment:

  • VPC: 4-6 weeks, $50K-$200K first-year, then $30K-$80K/year
  • On-prem air-gapped: 3-6 months, $250K-$1M first-year (hardware + integration)
  • Hybrid: 2-3 weeks, $20K-$50K first-year, $10K-$30K/year

Use our private LLM cost estimator to model your scenario.

Frequently asked questions

What hardware do I need for a private endpoint?

A single H100 or A100 node is enough for 18B-parameter models. Larger models (70B+) need 2-4 GPUs. Our GPU capacity calculator gives exact sizing.

Can I run a private endpoint behind my firewall?

Yes. Plugsky supports fully air-gapped deployments with offline update channels and hardware HSM integration.

Who patches the model registry?

Plugsky ships signed model updates on a schedule you choose (weekly / monthly / quarterly). You approve each update before deployment.

Is support included?

Yes — Enterprise contracts include 24/7 support with a 1-hour response SLA for P1.

Scope a private endpoint

Walk through a deployment plan with a Plugsky solutions engineer. NDA available on request.

Scope a pilot → See enterprise plans