Private AI endpoint — deploy LLM in your VPC

A private AI endpoint is a fully self-hosted instance of the Plugsky control plane, deployed inside your VPC or on your own hardware, exposing the same OpenAI-compatible API as the public cloud. You get all the platform features (model routing, RAG, agents, observability) without the data ever leaving your perimeter.

What is a private AI endpoint?

A private AI endpoint is the full Plugsky control plane — API gateway, model router, RAG engine, agent runtime, observability — packaged as a single deployment artifact (Helm chart, Terraform module, or air-gapped OVA) that you run inside your own infrastructure.

It exposes the same https://your-endpoint/v1/ OpenAI surface as the public cloud, so every SDK and tool that speaks OpenAI works against it without modification.

Deployment options

VPC (private cloud)

Helm install on EKS / AKS / GKE / OpenShift. Plugsky supports 6+ regions. Customer-managed keys. VPN or PrivateLink connectivity.

On-prem (air-gapped)

OVA or PXE image for VMware / KVM / bare metal. Offline update channels. Hardware HSM support. No internet egress required.

Hybrid (managed + private)

Public control plane, private inference. You keep models and data in your VPC while Plugsky handles routing, billing, and observability.

What you operate vs. what Plugsky operates

Component	You	Plugsky
Compute & GPUs	●	○
Networking & firewall	●	○
Key custody (BYOK)	●	○
Patch & update cadence	●	advisory
Model registry	○	●
Security patches & CVEs	○	●
Compliance attestations	○	●

Cost and timeline

Typical private endpoint deployment:

VPC: 4-6 weeks, $50K-$200K first-year, then $30K-$80K/year
On-prem air-gapped: 3-6 months, $250K-$1M first-year (hardware + integration)
Hybrid: 2-3 weeks, $20K-$50K first-year, $10K-$30K/year

Use our private LLM cost estimator to model your scenario.

Frequently asked questions

What hardware do I need for a private endpoint?

A single H100 or A100 node is enough for 18B-parameter models. Larger models (70B+) need 2-4 GPUs. Our GPU capacity calculator gives exact sizing.

Can I run a private endpoint behind my firewall?

Yes. Plugsky supports fully air-gapped deployments with offline update channels and hardware HSM integration.

Who patches the model registry?

Plugsky ships signed model updates on a schedule you choose (weekly / monthly / quarterly). You approve each update before deployment.

Is support included?

Yes — Enterprise contracts include 24/7 support with a 1-hour response SLA for P1.

Scope a private endpoint

Walk through a deployment plan with a Plugsky solutions engineer. NDA available on request.

Scope a pilot → See enterprise plans

What is a private AI endpoint?

Deployment options

VPC (private cloud)

On-prem (air-gapped)

Hybrid (managed + private)

What you operate vs. what Plugsky operates

Cost and timeline

Frequently asked questions

Scope a private endpoint

Related

Sovereign AI cloud

Data residency AI

Air-gapped LLM