An embedding is a dense numeric vector that captures the semantic meaning of text. Plugsky's embeddings API returns these vectors via the same OpenAI-compatible endpoint, so any tool or SDK that speaks OpenAI embeddings works unchanged.
Why embeddings
Embeddings turn text into fixed-size vectors in which texts with similar meanings map to nearby points. This enables:
- Semantic search: find documents that match intent, not just keywords
- Recommendation: surface items similar to what the user liked
- Clustering: group similar items (tickets, reviews, products)
- RAG retrieval: feed the most relevant chunks to an LLM
- Anomaly detection: find outliers in user behavior or content
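To make "nearby points" concrete, here is a minimal sketch of cosine similarity, one common way to measure how close two embeddings are. The three-dimensional vectors are made-up toy values standing in for real 1536- or 3072-dimensional embeddings, and `cosine_similarity` and `rank_by_similarity` are illustrative helper names, not part of any Plugsky SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed values, for illustration only).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-an-item": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a refund-related question

ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy data the two refund-related documents rank above the shipping one, which is the behavior semantic search relies on. At production scale a vector database would typically replace the linear scan.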
Two models, two use cases
| Model | Dimensions | Best for |
|---|---|---|
| plugsky-embed-v1 | 1536 | General-purpose, OpenAI ada-compatible, fast |
| plugsky-embed-large | 3072 | High-recall retrieval, multilingual, longer text |
```python
from openai import OpenAI

# Point the standard OpenAI client at Plugsky's OpenAI-compatible endpoint.
client = OpenAI(base_url="https://api.plugsky.com/v1", api_key="sk-live-...")

emb = client.embeddings.create(
    model="plugsky-embed-large",
    input="Plugsky is an OpenAI-compatible AI platform with 18+ models.",
)
print(len(emb.data[0].embedding), "dimensions")
# 3072 dimensions
```
Use cases
- RAG retrieval: combine with the RAG API for end-to-end document Q&A
- Semantic search: store vectors in pgvector, Pinecone, Qdrant, or your DB of choice
- Duplicate detection: a cosine similarity above 0.95 usually signals a near-duplicate
- Recommendation: "more like this" feeds in production
- Clustering: k-means on embeddings gives you topic clusters for free
- Classification: zero-shot labelling by comparing an embedding to each labelled centroid and picking the closest
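Two of these use cases, centroid classification and duplicate detection, can be sketched in a few lines. This is an illustration under stated assumptions: the vectors are made-up toy values rather than real API output, and `centroid`, `classify`, and `is_near_duplicate` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine_similarity(vec, centroids[label]))


def is_near_duplicate(a, b, threshold=0.95):
    """Apply the 0.95 cosine-similarity rule of thumb from the list above."""
    return cosine_similarity(a, b) > threshold


# Toy labelled embeddings (assumed values, for illustration only).
labelled = {
    "billing": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "shipping": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}
centroids = {label: centroid(vecs) for label, vecs in labelled.items()}

new_ticket = [0.7, 0.3, 0.0]
label = classify(new_ticket, centroids)
print(label)
print(is_near_duplicate([0.9, 0.1, 0.0], [0.88, 0.12, 0.01]))
```

The 0.95 threshold is a starting point; the right cutoff likely depends on the model and on how similar your documents are to begin with, so it is worth checking against a labelled sample.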
Batch and async
The endpoint accepts arrays of inputs (up to 2,048 strings per request, max 8,191 tokens each). For larger jobs, batch client-side and submit in chunks.
```python
# Reuses the `client` created in the previous example.
embs = client.embeddings.create(
    model="plugsky-embed-v1",
    input=["doc 1 text...", "doc 2 text...", "doc 3 text..."],
)
vectors = [e.embedding for e in embs.data]
# store in your vector DB
```
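For jobs larger than one request, client-side chunking could look like the sketch below. It assumes the 2,048-inputs-per-request limit stated above; `chunked`, `embed_all`, and `fake_embed_batch` are hypothetical names, and the fake embedder stands in for a real API call so the example runs offline.

```python
from typing import Callable, Iterator, List, Sequence

MAX_INPUTS_PER_REQUEST = 2048  # per-request input limit stated in this section


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = MAX_INPUTS_PER_REQUEST,
) -> List[List[float]]:
    """Embed any number of texts, one embed_batch call per chunk, preserving order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Offline stand-in for a real API call (hypothetical). With the client shown
# earlier, embed_batch could instead be:
#   lambda batch: [e.embedding for e in client.embeddings.create(
#       model="plugsky-embed-v1", input=list(batch)).data]
def fake_embed_batch(batch: Sequence[str]) -> List[List[float]]:
    return [[float(len(text))] for text in batch]


texts = [f"doc {i}" for i in range(5000)]
vectors = embed_all(texts, fake_embed_batch)
print(len(vectors))
```

Passing the embedding call in as a function keeps the chunking logic testable without network access. This sketch only enforces the input-count limit; the 8,191-token limit per string would need a separate check.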
Frequently asked questions
Are embeddings cached?
No. Every request computes a fresh embedding, so cache vectors in your own layer if you re-embed the same content.
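One way to add such a layer is a small in-memory cache keyed by a hash of the model name and the text. This is a sketch, not a Plugsky feature: `EmbeddingCache` and `fake_embed_one` are hypothetical, and a real deployment would probably back the store with Redis or a database table instead of a dict.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache keyed by a hash of (model, text). Hypothetical helper."""

    def __init__(self, embed_one: Callable[[str, str], List[float]]):
        self._embed_one = embed_one
        self._store: Dict[str, List[float]] = {}

    @staticmethod
    def _key(model: str, text: str) -> str:
        # Include the model name: vectors from different models are not interchangeable.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model: str, text: str) -> List[float]:
        """Return the cached vector, computing and storing it on first use."""
        key = self._key(model, text)
        if key not in self._store:
            self._store[key] = self._embed_one(model, text)
        return self._store[key]


calls = []  # records every call that reaches the (fake) embedding backend


def fake_embed_one(model: str, text: str) -> List[float]:
    """Offline stand-in for a real embeddings request."""
    calls.append((model, text))
    return [float(len(text))]


cache = EmbeddingCache(fake_embed_one)
cache.get("plugsky-embed-v1", "hello")
cache.get("plugsky-embed-v1", "hello")     # served from the cache
cache.get("plugsky-embed-large", "hello")  # different model, so a new call
print(len(calls))
```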
Can I bring my own embedding model?
On Enterprise contracts, yes. We support Cohere, Voyage, BGE, and custom models via the private endpoint.
What languages are supported?
plugsky-embed-v1 supports 50+ languages with strong cross-lingual retrieval. plugsky-embed-large is the better choice for multilingual and Arabic content.
What are the pricing implications?
Self-serve plans include unlimited embeddings on every model in your tier, with no per-vector meter.