RAG API that keeps your documents private

Plugsky's RAG API turns PDFs, docs, and structured data into a private knowledge base, with embeddings, retrieval, and citations behind one OpenAI-compatible API. Your data never trains a model, and on Enterprise you can keep it inside your VPC.

Retrieval-Augmented Generation (RAG) is how production AI systems ground a model in your own data. Most RAG stacks are assembled from four to six vendors, one each for the vector DB, embedder, retriever, reranker, LLM, and observability. Plugsky's RAG API collapses that into a single managed service.

What is RAG-as-a-service?

The Plugsky RAG API exposes three endpoints that work together:

  • POST /v1/embeddings — turn text into vectors (Plugsky or OpenAI ada-style)
  • POST /v1/rag/collections — create and manage vector collections
  • POST /v1/rag/query — query a collection, get back ranked chunks with citations
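For callers who are not using an SDK, the query endpoint can be sketched as a plain HTTP request. The sketch below only builds the request and does not send it. The path and the collection_id, query, and top_k fields come from this page; the JSON body layout, the Bearer authorization header, the helper name build_rag_query_request, and the collection id "col_123" are assumptions for illustration, not confirmed Plugsky API details.

```python
import json
import urllib.request

BASE_URL = "https://api.plugsky.com/v1"  # base URL used in the examples on this page


def build_rag_query_request(api_key, collection_id, query, top_k=5):
    """Build (but do not send) a POST request for /v1/rag/query.

    Assumption: the endpoint accepts a JSON body and Bearer auth,
    as OpenAI-compatible APIs usually do.
    """
    body = {"collection_id": collection_id, "query": query, "top_k": top_k}
    return urllib.request.Request(
        url=f"{BASE_URL}/rag/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "col_123" is a hypothetical collection id
req = build_rag_query_request("sk-live-...", "col_123", "What is our vacation policy?")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; the response shape is not specified here, so parsing is left out.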

You can also use the embeddings endpoint standalone for similarity, clustering, or hybrid search in your own pipeline.
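As a rough illustration of standalone similarity, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are toy stand-ins for real embeddings, and the function is a generic implementation rather than anything Plugsky-specific.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}

# Rank document names from most to least similar to the query
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

In practice the vectors would come from the embeddings endpoint, and a numerical library would be faster for large batches.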

Embeddings API

python
from openai import OpenAI
client = OpenAI(base_url="https://api.plugsky.com/v1", api_key="sk-live-...")

emb = client.embeddings.create(
    model="plugsky-embed-v1",
    input="Plugsky is an OpenAI-compatible AI platform."
)
print(len(emb.data[0].embedding), "dimensions")

Plugsky ships plugsky-embed-v1 (1536 dimensions, OpenAI ada-compatible) and plugsky-embed-large (3072 dimensions, higher quality).
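The dimension count drives storage if you keep vectors in your own pipeline. The sketch below estimates raw vector size for each model, assuming 32-bit floats; that storage format is an assumption, and index and metadata overhead are ignored.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats

# Dimensions as stated for each model on this page
MODEL_DIMENSIONS = {
    "plugsky-embed-v1": 1536,
    "plugsky-embed-large": 3072,
}


def raw_vector_bytes(model, num_vectors):
    """Raw size of num_vectors embeddings, excluding index and metadata overhead."""
    return MODEL_DIMENSIONS[model] * BYTES_PER_FLOAT32 * num_vectors


for model in MODEL_DIMENSIONS:
    mib = raw_vector_bytes(model, 100_000) / (1024 ** 2)
    print(f"{model}: {mib:.1f} MiB for 100,000 vectors")
```

Under these assumptions the larger model doubles the raw footprint, which is worth weighing against its quality gain.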

Collections, documents, queries

python
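# NOTE: the stock `openai` client has no `rag` namespace; this snippet assumes
# `client` is a Plugsky-aware client that exposes one.

# 1. Create a collection
collection = client.rag.collections.create(name="acme-handbook")

# 2. Upload a document (the context manager closes the file afterwards)
with open("handbook.pdf", "rb") as f:
    client.rag.documents.create(
        collection_id=collection.id,
        file=f,
        metadata={"department": "engineering", "year": 2026},
    )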

# 3. Query
result = client.rag.query.create(
    collection_id=collection.id,
    query="What is our vacation policy?",
    top_k=5,
)
for chunk in result.chunks:
    print(f"[{chunk.score:.2f}] {chunk.text[:100]}...")
    print(f"   source: {chunk.source}")

Documents are chunked, embedded, and indexed automatically. Retrieval supports keyword, vector, and hybrid search with optional cross-encoder reranking.
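To make hybrid search concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking. This page does not say how Plugsky combines the two, so treat this as an illustration of the general idea, not a description of Plugsky's retriever. The chunk ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["c1", "c2", "c3"]  # hypothetical keyword-search ranking
vector_hits = ["c3", "c1", "c4"]   # hypothetical vector-search ranking

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top, while chunks found by only one method still appear further down. A reranker, if used, would then reorder the fused list.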

Citations and grounding

Every RAG query returns ranked chunks with source attribution. Pass those chunks to any Plugsky chat model to get an answer with inline citations. Plugsky's RAG + chat composition handles prompt assembly, citation formatting, and refusal when the corpus doesn't contain an answer.
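If you assemble the prompt yourself, for example to send chunks to a different model, the idea can be sketched as follows. The Chunk class mirrors the text, source, and score fields seen in the query example; the class itself, the build_grounded_prompt helper, the min_score cutoff, and the prompt wording are all assumptions for illustration, not Plugsky's built-in template.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk."""
    text: str
    source: str
    score: float


def build_grounded_prompt(question, chunks, min_score=0.0):
    """Assemble a prompt that asks for [n]-style citations and allows refusal."""
    # Drop weak matches so the model is not nudged toward irrelevant sources
    kept = [c for c in chunks if c.score >= min_score]
    lines = [
        "Answer using only the numbered sources below.",
        "Cite sources inline as [1], [2], and so on.",
        "If the sources do not contain the answer, say you don't know.",
        "",
    ]
    for i, c in enumerate(kept, start=1):
        lines.append(f"[{i}] ({c.source}) {c.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Made-up sample chunks, for illustration only
chunks = [
    Chunk("Employees accrue vacation days monthly.", "handbook.pdf", 0.91),
    Chunk("The office kitchen is cleaned on Fridays.", "facilities.pdf", 0.12),
]
prompt = build_grounded_prompt("What is our vacation policy?", chunks, min_score=0.5)
print(prompt)
```

Numbering the sources in the prompt lets you map each [n] in the answer back to a source string afterwards.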

Private RAG and on-prem

On self-serve plans, your documents are stored in Plugsky's managed vector store with per-collection encryption at rest. They are never used to train a model.

On Enterprise, you can deploy RAG inside your own VPC with customer-managed keys (BYOK), private endpoints, and audit logs. Use cases include legal contract review, internal knowledge bases, financial filings, and medical records.

Frequently asked questions

How big can a RAG collection be?

Self-serve plans cap documents per collection by tier: 1K on Starter, 20K on Builder, and 200K on Scale. Enterprise has no hard cap.
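As a quick planning aid, the tier limits can be encoded in a small lookup. This is only a sketch: the helper name smallest_plan_for is hypothetical, and it assumes each limit is inclusive (a collection of exactly 1,000 documents fits Starter).

```python
# Documents-per-collection limits as stated above
PLAN_DOCUMENT_LIMITS = {"Starter": 1_000, "Builder": 20_000, "Scale": 200_000}


def smallest_plan_for(num_documents):
    """Return the smallest self-serve plan that fits, or 'Enterprise' if none does."""
    if num_documents < 0:
        raise ValueError("num_documents must be non-negative")
    # Check plans from the smallest limit to the largest
    for plan, limit in sorted(PLAN_DOCUMENT_LIMITS.items(), key=lambda kv: kv[1]):
        if num_documents <= limit:  # assumption: limits are inclusive
            return plan
    return "Enterprise"


for n in (500, 15_000, 250_000):
    print(n, "->", smallest_plan_for(n))
```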

What embedding models are supported?

plugsky-embed-v1 (1536d, OpenAI ada-compatible) and plugsky-embed-large (3072d, higher recall). You can also bring your own embeddings.

Does RAG use my data to train models?

Never. Documents are stored encrypted at rest and used only for retrieval, not for training any model.

Can I use Plugsky RAG on-prem?

Yes. The Enterprise tier supports customer-managed keys, private endpoints, and full VPC deployment.

Try RAG in your stack

Embeddings + collections + queries in a single OpenAI-compatible API. Free trial for 7 days.

Start $5 trial → Try the RAG sandbox