Embeddings API — OpenAI ada-compatible vectors

An embedding is a dense numeric vector that captures the semantic meaning of text. Plugsky's embeddings API returns these vectors through an OpenAI-compatible endpoint, so any tool or SDK that speaks the OpenAI embeddings API works once you point it at the Plugsky base URL and API key.

Why embeddings

Embeddings map text to fixed-size vectors, so that texts with similar meaning sit close together in vector space. That enables:

Semantic search: find documents that match intent, not just keywords
Recommendation: surface items similar to what the user liked
Clustering: group similar items (tickets, reviews, products)
RAG retrieval: feed the most relevant chunks to an LLM
Anomaly detection: find outliers in user behavior or content
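
To make "nearby points" concrete, here is a minimal sketch of similarity ranking, the core of semantic search. The 3-dimensional vectors, document IDs, and helper names are hypothetical stand-ins; real vectors would come from the embeddings endpoint and have 1536 or 3072 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical values).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-keys": [0.0, 0.1, 0.9],
}
# Pretend this is the embedding of "how do I get my money back?"
query = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(query, docs)
print(ranking[0][0])  # refund-policy
```

At production scale a vector database would usually replace the linear scan, but the scoring idea stays the same.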

Two models, two use cases

Model	Dimensions	Best for
`plugsky-embed-v1`	1536	General-purpose, OpenAI ada-compatible, fast
`plugsky-embed-large`	3072	High-recall retrieval, multilingual, longer text

python

from openai import OpenAI

# Point the standard OpenAI SDK at Plugsky's OpenAI-compatible endpoint.
client = OpenAI(base_url="https://api.plugsky.com/v1", api_key="sk-live-...")

# Request a single embedding from the large model.
emb = client.embeddings.create(
    model="plugsky-embed-large",
    input="Plugsky is an OpenAI-compatible AI platform with 18+ models.",
)
print(len(emb.data[0].embedding), "dimensions")
# prints: 3072 dimensions

Use cases

RAG retrieval: combine with the RAG API for end-to-end document Q&A
Semantic search: store vectors in pgvector, Pinecone, Qdrant, or your DB of choice
Duplicate detection: treat pairs with cosine similarity above 0.95 as near-duplicates
Recommendation: power "more like this" feeds in production
Clustering: run k-means on embeddings to group documents into topic clusters without labelled data
Classification: zero-shot labelling by assigning each embedding to the most similar labelled centroid
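
Two of these use cases, duplicate detection and centroid classification, reduce to a few lines of vector arithmetic. The sketch below uses toy 2-dimensional vectors; the function names, labels, and example values are hypothetical, and the 0.95 threshold is the rule of thumb from the list above, which you would likely tune for your data.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def near_duplicates(vectors, threshold=0.95):
    """Return index pairs (i, j) whose cosine similarity exceeds threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(vectors), 2)
        if cosine_similarity(a, b) > threshold
    ]

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def classify(vec, labelled_examples):
    """Return the label whose centroid is most similar to vec."""
    centroids = {label: centroid(vs) for label, vs in labelled_examples.items()}
    return max(centroids, key=lambda label: cosine_similarity(vec, centroids[label]))

# Toy vectors standing in for real embeddings (hypothetical values).
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(near_duplicates(vectors))  # [(0, 1)]

examples = {
    "billing": [[1.0, 0.1], [0.9, 0.0]],
    "bug": [[0.0, 1.0], [0.1, 0.9]],
}
print(classify([0.8, 0.2], examples))  # billing
```

The pairwise comparison is quadratic in the number of vectors, so for large collections an approximate nearest-neighbour index is the more common choice.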

Batch and async

The endpoint accepts arrays of inputs (up to 2,048 strings per request, max 8,191 tokens each). For larger jobs, batch client-side and submit in chunks.

python

# Pass a list to embed several documents in a single request.
embs = client.embeddings.create(
    model="plugsky-embed-v1",
    input=["doc 1 text...", "doc 2 text...", "doc 3 text..."],
)
# Collect one vector per input.
vectors = [e.embedding for e in embs.data]
# store in your vector DB
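
For jobs larger than one request allows, client-side batching can be sketched as below. `chunked` and `embed_all` are hypothetical helper names, and the sketch assumes `client` is an OpenAI SDK client configured as in the earlier example. It respects the 2,048-strings-per-request limit but does not check the 8,191-token limit per input.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_all(client, texts, model="plugsky-embed-v1", batch_size=2048):
    """Embed any number of texts by submitting them in batches.

    Returns one vector per input text, in input order.
    """
    vectors = []
    for batch in chunked(texts, batch_size):
        response = client.embeddings.create(model=model, input=batch)
        # Each result carries the index of its input within the batch;
        # sort on it so vectors line up with the texts.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

With 5,000 texts this would issue three requests (2,048, 2,048, and 904 inputs). Retries and rate-limit handling are left out for brevity.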

Frequently asked questions

Are embeddings cached?

No. Every request computes a fresh embedding. If you re-embed the same content, cache the vectors in your own layer.
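
One way to cache in your own layer is to key vectors by a hash of the model name and text, as in this minimal in-memory sketch. `EmbeddingCache` and the `embed_fn` callable are hypothetical; in practice `embed_fn` would wrap a call to the embeddings endpoint, and the dictionary would likely be replaced by Redis or a database table.

```python
import hashlib

class EmbeddingCache:
    """In-memory cache keyed by a hash of (model, text)."""

    def __init__(self, embed_fn):
        # embed_fn(model, texts) -> list of vectors, one per text (assumed).
        self._embed_fn = embed_fn
        self._store = {}

    @staticmethod
    def _key(model, text):
        payload = f"{model}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_many(self, model, texts):
        """Return vectors for texts, calling embed_fn only for unseen ones."""
        keys = [self._key(model, text) for text in texts]
        # Collect unseen texts once each, preserving first-seen order.
        missing = {}
        for key, text in zip(keys, texts):
            if key not in self._store and key not in missing:
                missing[key] = text
        if missing:
            fresh = self._embed_fn(model, list(missing.values()))
            self._store.update(zip(missing.keys(), fresh))
        return [self._store[key] for key in keys]
```

Including the model name in the key keeps vectors from different models apart, since their dimensions and geometry differ.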

Can I bring my own embedding model?

On Enterprise contracts, yes. We support Cohere, Voyage, BGE, and custom models via the private endpoint.

What languages are supported?

plugsky-embed-v1 supports 50+ languages with strong cross-lingual retrieval. For heavily multilingual workloads and for Arabic text, plugsky-embed-large is the better choice.

What are the pricing implications?

Self-serve plans include unlimited embeddings on every model in your tier, with no per-vector metering.

Try the embeddings API

OpenAI-ada-compatible, 1536d or 3072d vectors, flat monthly pricing.


Related

RAG API

LLM API

Model routing