RAG API — chat with your documents, private by default

Retrieval-Augmented Generation (RAG) is how production AI systems ground a model in your own data: relevant passages are retrieved at query time and passed to the model as context. Most RAG stacks are assembled from four to six vendors (vector DB, embedder, retriever, reranker, LLM, observability). Plugsky's RAG API collapses that into a single managed service.

What is RAG-as-a-service?

Plugsky RAG API exposes three endpoints that work together:

POST /v1/embeddings — turn text into vectors (Plugsky or OpenAI ada-style)
POST /v1/rag/collections — create and manage vector collections
POST /v1/rag/query — query a collection, get back ranked chunks with citations

You can also use the embeddings endpoint standalone for similarity, clustering, or hybrid search in your own pipeline.

Embeddings API

python

from openai import OpenAI
client = OpenAI(base_url="https://api.plugsky.com/v1", api_key="sk-live-...")

emb = client.embeddings.create(
    model="plugsky-embed-v1",
    input="Plugsky is an OpenAI-compatible AI platform."
)
print(len(emb.data[0].embedding), "dimensions")

Plugsky ships plugsky-embed-v1 (1536 dimensions, OpenAI ada-compatible) and plugsky-embed-large (3072 dimensions, higher quality).
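
To illustrate the standalone similarity use case, here is a minimal sketch that ranks embedding vectors by cosine similarity. It is not Plugsky's implementation: the helper names cosine_similarity and rank_by_similarity are hypothetical, and toy three-dimensional vectors stand in for real emb.data[0].embedding values so the snippet runs without an API key.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return (label, score) pairs sorted from most to least similar.

    `candidates` maps a label to its embedding vector.
    """
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (made up for illustration) in place of real embeddings
query = [1.0, 0.0, 1.0]
docs = {
    "close": [0.9, 0.1, 1.1],
    "orthogonal": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, -1.0],
}
ranking = rank_by_similarity(query, docs)
print(ranking)
```

With real embeddings, the same ranking function would take the vectors returned by client.embeddings.create in place of the toy lists.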

Collections, documents, queries

python

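# 1. Create a collection
# (assumes a Plugsky client that adds the `rag` namespace; the stock openai package does not expose it)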
collection = client.rag.collections.create(name="acme-handbook")

# 2. Upload documents (the context manager closes the file handle afterwards)
with open("handbook.pdf", "rb") as handbook:
    client.rag.documents.create(
        collection_id=collection.id,
        file=handbook,
        metadata={"department": "engineering", "year": 2026},
    )

# 3. Query
result = client.rag.query.create(
    collection_id=collection.id,
    query="What is our vacation policy?",
    top_k=5,
)
for chunk in result.chunks:
    print(f"[{chunk.score:.2f}] {chunk.text[:100]}...")
    print(f"   source: {chunk.source}")

Documents are chunked, embedded, and indexed automatically. Retrieval supports keyword, vector, and hybrid search with optional cross-encoder reranking.
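
The page does not say how keyword and vector results are merged in hybrid search. One common technique is reciprocal rank fusion (RRF), sketched below as an assumption about how such a merge could work, not as a description of Plugsky's retriever. The function name, the chunk IDs, and the two input rankings are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the constant commonly used for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


keyword_hits = ["c3", "c1", "c7"]  # hypothetical keyword-search ranking
vector_hits = ["c1", "c4", "c3"]   # hypothetical vector-search ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Chunks that appear in both lists rise to the top, which is the behavior hybrid search is after. A cross-encoder reranker would then rescore only this short fused list.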

Citations and grounding

Every RAG query returns ranked chunks with source attribution. Pass those chunks to any Plugsky chat model to get an answer with inline citations. Plugsky's RAG + chat composition handles prompt assembly and citation formatting, and refuses to answer when the corpus doesn't contain one.
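
To make prompt assembly concrete, here is a rough sketch of what doing it by hand might look like. The prompt wording and the build_grounded_prompt helper are assumptions for illustration, not Plugsky's actual template. The text and source keys mirror the chunk fields printed in the query example, and the sample passages are made up.

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that asks for answers cited by passage number.

    `chunks` is a list of dicts with "text" and "source" keys.
    """
    if not chunks:
        context = "(no passages retrieved)"
    else:
        # Number passages from 1 so the model can cite them as [1], [2], ...
        context = "\n\n".join(
            f"[{i}] {chunk['text']}\n(source: {chunk['source']})"
            for i, chunk in enumerate(chunks, start=1)
        )
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passages inline as [1], [2], and so on. "
        "If the passages do not contain the answer, say you don't know.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up passages standing in for real query results
sample_chunks = [
    {"text": "Employees accrue 20 vacation days per year.", "source": "handbook.pdf"},
    {"text": "Unused days roll over up to a cap of 5.", "source": "handbook.pdf"},
]
prompt = build_grounded_prompt("What is our vacation policy?", sample_chunks)
print(prompt)
```

The explicit "say you don't know" instruction is one simple way to approximate refusal when the retrieved passages don't cover the question.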

Private RAG and on-prem

On self-serve plans, your documents are stored in Plugsky's managed vector store with per-collection encryption at rest. They are never used to train a model.

On Enterprise, you can deploy RAG inside your own VPC with customer-managed keys (BYOK), private endpoints, and audit logs. Use cases include legal contract review, internal knowledge bases, financial filings, and medical records.

Frequently asked questions

How big can a RAG collection be?

Self-serve plans cap the number of documents per collection by tier: 1K on Starter, 20K on Builder, and 200K on Scale. Enterprise has no hard cap.

What embedding models are supported?

plugsky-embed-v1 (1536d, OpenAI ada-compatible) and plugsky-embed-large (3072d, higher recall). You can also bring your own embeddings.

Does RAG use my data to train models?

Never. Documents are stored encrypted at rest and used only for retrieval, not for training any model.

Can I run Plugsky RAG on-prem?

Yes. The Enterprise tier supports customer-managed keys, private endpoints, and full VPC deployment.

Try RAG in your stack

Embeddings + collections + queries in a single OpenAI-compatible API. Free trial for 7 days.

