Embeddings API

Embeddings API — 1536d and 3072d vectors, OpenAI-compatible

The Plugsky embeddings API returns dense vector representations of text with 1,536 or 3,072 dimensions. Use them for similarity, clustering, RAG retrieval, semantic search, and recommendation. The endpoint is OpenAI ada-compatible.

An embedding is a dense numeric vector that captures the semantic meaning of text. Plugsky's embeddings API returns these vectors through an OpenAI-compatible endpoint, so any tool or SDK that speaks the OpenAI embeddings API works unchanged once it points at the Plugsky base URL.

Why embeddings

Embeddings turn text into a fixed-size vector, so that texts with similar meaning map to nearby points. This lets you:

  • Semantic search: find documents that match intent, not just keywords
  • Recommendation: surface items similar to what the user liked
  • Clustering: group similar items (tickets, reviews, products)
  • RAG retrieval: feed the most relevant chunks to an LLM
  • Anomaly detection: find outliers in user behavior or content
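
To make "nearby points" concrete, here is a minimal sketch of semantic search as a top-k ranking by cosine similarity. The helper names (cosine_similarity, top_k), the document IDs, and the toy 3-dimensional vectors are all illustrative assumptions; real vectors would come from the embeddings endpoint and have 1,536 or 3,072 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (hypothetical data).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-an-item": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # pretend: embedding of "how do I get my money back"

results = top_k(query, docs, k=2)
print([doc_id for doc_id, _ in results])
# ['refund-policy', 'return-an-item']
```

A linear scan like this is fine for a few thousand vectors. Beyond that, a vector database with an approximate nearest-neighbour index is usually the better fit.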

Two models, two use cases

Model                 Dim    Best for
plugsky-embed-v1      1536   General-purpose, OpenAI ada-compatible, fast
plugsky-embed-large   3072   High-recall retrieval, multilingual, longer text

python
from openai import OpenAI
client = OpenAI(base_url="https://api.plugsky.com/v1", api_key="sk-live-...")

emb = client.embeddings.create(
    model="plugsky-embed-large",
    input="Plugsky is an OpenAI-compatible AI platform with 18+ models.",
)
print(len(emb.data[0].embedding), "dimensions")
# 3072

Use cases

  • RAG retrieval: combine with the RAG API for end-to-end document Q&A
  • Semantic search: store vectors in pgvector, Pinecone, Qdrant, or your DB of choice
  • Duplicate detection: a cosine similarity above 0.95 usually indicates a near-duplicate (tune the threshold on your own data)
  • Recommendation: "more like this" feeds in production
  • Clustering: k-means on embeddings gives you topic clusters for free
  • Classification: zero-shot, by comparing an embedding to a labelled centroid
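
Two of the items above, duplicate detection and centroid-based classification, reduce to a few lines of vector arithmetic. The following is a sketch under stated assumptions: the function names, the labels, and the toy 2-dimensional vectors are hypothetical, and the 0.95 threshold is the rule of thumb from the list, not a guaranteed cut-off.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def is_near_duplicate(a, b, threshold=0.95):
    """Flag two embeddings as near-duplicates above the similarity threshold."""
    return cosine_similarity(a, b) > threshold


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine_similarity(vec, centroids[label]))


# Toy vectors standing in for real embeddings (hypothetical data).
labelled = {
    "billing": [[0.9, 0.1], [0.8, 0.2]],
    "shipping": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {label: centroid(vecs) for label, vecs in labelled.items()}

print(classify([0.7, 0.3], centroids))                # billing
print(is_near_duplicate([0.9, 0.1], [0.8, 0.2]))      # True
print(is_near_duplicate([0.9, 0.1], [0.1, 0.9]))      # False
```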

Batch and async

The endpoint accepts an array of inputs: up to 2,048 strings per request, each at most 8,191 tokens. For larger jobs, split the inputs client-side and submit them in chunks that stay within those limits.

python
# Reuses the `client` created in the earlier example.
embs = client.embeddings.create(
    model="plugsky-embed-v1",
    input=["doc 1 text...", "doc 2 text...", "doc 3 text..."],
)
vectors = [e.embedding for e in embs.data]
# store in your vector DB
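
For corpora larger than the 2,048-input request limit, client-side batching could look like the sketch below. The names chunked, embed_all, and fake_embed_batch are hypothetical. The embed_batch callable is an assumption that keeps the example runnable offline; in real use it would wrap client.embeddings.create.

```python
MAX_INPUTS_PER_REQUEST = 2048  # per-request input limit stated in this section


def chunked(items, size):
    """Yield consecutive slices of items, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts, embed_batch, batch_size=MAX_INPUTS_PER_REQUEST):
    """Embed every text by calling embed_batch once per chunk, preserving order."""
    vectors = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch):
    """Offline stand-in for the API call: one 1-dimensional vector per input."""
    return [[float(len(text))] for text in batch]


texts = [f"doc {i}" for i in range(5000)]
vectors = embed_all(texts, fake_embed_batch)

print([len(chunk) for chunk in chunked(texts, MAX_INPUTS_PER_REQUEST)])
# [2048, 2048, 904]
print(len(vectors))
# 5000
```

With the real client, embed_batch could be a small function that returns [e.embedding for e in client.embeddings.create(model="plugsky-embed-v1", input=batch).data]. This sketch does not check the per-input token limit or retry failed requests.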

Frequently asked questions

Are embeddings cached?

No — every request computes a fresh embedding. Cache in your own layer if you re-embed the same content.
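
A caching layer of your own can be as small as a dictionary keyed by a hash of the model name and the input text. The sketch below is one possible shape, not a Plugsky feature: EmbeddingCache, its get method, and fake_embed are hypothetical names, and the in-memory dict stands in for whatever store you actually use (Redis, a database table, files on disk).

```python
import hashlib


class EmbeddingCache:
    """In-memory cache that calls embed_fn only for unseen (model, text) pairs."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.misses = 0  # number of times embed_fn was actually called

    @staticmethod
    def _key(model, text):
        # Include the model name: different models give different vectors.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model, text):
        key = self._key(model, text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]


def fake_embed(model, text):
    """Offline stand-in for the API call (hypothetical)."""
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
first = cache.get("plugsky-embed-v1", "hello")
second = cache.get("plugsky-embed-v1", "hello")      # served from the cache
third = cache.get("plugsky-embed-large", "hello")    # different model, new entry

print(cache.misses)
# 2
```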

Can I bring my own embedding model?

On Enterprise contracts, yes. We support Cohere, Voyage, BGE, and custom models via the private endpoint.

What languages are supported?

plugsky-embed-v1 supports 50+ languages with strong cross-lingual retrieval. For heavily multilingual workloads, including Arabic, plugsky-embed-large is the better choice.

What are the pricing implications?

Self-serve plans include unlimited embeddings on every model in your tier — no per-vector meter.

Try the embeddings API

OpenAI-ada-compatible, 1536d or 3072d vectors, flat monthly pricing.
