Qdrant
The high-performance vector database for AI applications.
Qdrant is a vector similarity search engine written in Rust. It stores embeddings (dense vectors from OpenAI, Cohere, Voyage, BGE, etc.) and serves nearest-neighbour queries via HNSW indexes — fast enough for production RAG, semantic search, recommendations, and anomaly detection. Pier deploys the official Docker image with persistent storage and the dashboard UI.
Deploy with Pier
1. Open the Pier dashboard and click Add service.
2. Pick Qdrant from the template list.
3. Choose the version and set a service name; Pier provisions the container, storage, and ports automatically.
4. Attach a domain if you want HTTPS. Traefik auto-provisions the Let's Encrypt certificate.
What is Qdrant?
Qdrant is an open-source vector similarity search engine written in Rust. It exists to answer one question as quickly as possible: “given this query vector, find the k nearest stored vectors.” Where most general-purpose databases struggle with high-dimensional vector workloads, Qdrant is built specifically around HNSW indexes, payload filtering, and sub-50ms response times over millions of vectors.
RAG, semantic search, recommendations, and anomaly detection all need this primitive. Embeddings come from models (OpenAI, Cohere, Voyage, local BGE/MiniLM) and live in Qdrant; queries combine vector similarity with payload filters (“find similar to this, but only in English, posted last week, by user X”).
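To make the primitive concrete, here is a minimal brute-force sketch of "find the k nearest stored vectors, subject to a payload filter". It is only a baseline for intuition: it scans every point, whereas Qdrant uses an HNSW index to avoid the full scan. The function names and the tiny two-dimensional dataset are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(points, query, k, payload_filter=None):
    """Return the k points most similar to `query`, best first.

    `points` is a list of (id, vector, payload) tuples; `payload_filter`
    is an optional dict of payload fields that must match exactly.
    """
    candidates = [
        (cosine_similarity(vec, query), pid)
        for pid, vec, payload in points
        if payload_filter is None
        or all(payload.get(f) == v for f, v in payload_filter.items())
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [(pid, score) for score, pid in candidates[:k]]

# Hypothetical toy data: three points with a "lang" payload field.
points = [
    (1, [1.0, 0.0], {"lang": "en"}),
    (2, [0.9, 0.1], {"lang": "de"}),
    (3, [0.0, 1.0], {"lang": "en"}),
]
top = brute_force_search(points, [1.0, 0.0], k=2, payload_filter={"lang": "en"})
```

Point 2 is very close to the query but is excluded by the filter, which is the behaviour the "only in English" example describes.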
How Pier deploys it
Pier uses the official qdrant/qdrant Docker image. Default ports are
6333 (REST + Dashboard UI) and 6334 (gRPC). The data volume mounts at
/qdrant/storage. Pier auto-generates an API key (set via QDRANT__SERVICE__API_KEY)
and exposes it on the service detail page.
The built-in Dashboard UI is reachable on port 6333 (path /dashboard).
Attach a domain in Pier’s Domains tab for HTTPS via Traefik.
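As a quick connectivity check, the REST API can be called with nothing but the Python standard library. The sketch below builds a request for `GET /collections` and attaches the key in the `api-key` header that Qdrant expects. The `QDRANT_API_KEY` environment variable name and the `qdrant` hostname are assumptions; substitute the key from the service detail page and your own host or domain.

```python
import json
import os
import urllib.request

def qdrant_request(path, base_url="http://qdrant:6333", api_key=None):
    """Build (but do not send) a GET request for the Qdrant REST API."""
    if api_key is None:
        # Hypothetical variable name; store the Pier-generated key wherever suits you.
        api_key = os.environ.get("QDRANT_API_KEY", "")
    headers = {"api-key": api_key} if api_key else {}
    return urllib.request.Request(base_url.rstrip("/") + path, headers=headers)

req = qdrant_request("/collections", api_key="example-key")

# To actually call the server, uncomment:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before pointing the code at a live instance.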
When NOT to use Qdrant
If you already have data in Postgres and your dataset is under ~1M vectors, pgvector is simpler: one fewer database to operate. For pure managed-service simplicity at any scale, Pinecone trades higher cost for zero operations work. Qdrant wins when you want self-hosted control, multi-million-vector scale, rich payload filtering, and hybrid sparse+dense search.
Key features
HNSW index on disk
Memory-mapped HNSW (Hierarchical Navigable Small World) graph — millisecond search over millions of high-dim vectors without holding everything in RAM.
Payload filtering
Attach JSON metadata (tags, dates, user IDs) to each vector. Filter at search time — "find similar docs that I own, modified in the last 30 days" in one query.
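As a sketch of what such a query can look like over REST, the snippet below builds a request body that combines a query vector with a `must` filter: an exact match on an owner field and a numeric range on a modification timestamp. The payload field names `owner_id` and `modified_at` (stored as Unix seconds) are assumptions for illustration, as is the three-element vector.

```python
import json
import time

def owned_and_recent_filter(owner_id, max_age_days, now=None):
    """Filter for points owned by `owner_id` and modified within `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    return {
        "must": [
            {"key": "owner_id", "match": {"value": owner_id}},
            {"key": "modified_at", "range": {"gte": cutoff}},
        ]
    }

# Body for a query against a collection; the vector is a shortened placeholder.
body = {
    "query": [0.1, 0.2, 0.3],
    "filter": owned_and_recent_filter("user-x", 30, now=1_700_000_000),
    "limit": 5,
}
print(json.dumps(body, indent=2))
```

Both conditions sit under `must`, so a point has to satisfy all of them to be considered for similarity ranking.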
REST + gRPC + Python/JS/Rust/Go clients
First-class clients for Python, JS, Rust, Go, .NET, Java. Both REST and gRPC supported; gRPC is faster for high-QPS workloads.
Hybrid search (sparse + dense)
Combine BM25-style sparse vectors with dense embeddings. Hybrid often outperforms either alone for RAG over enterprise documents.
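One common way to merge a sparse and a dense result list is reciprocal rank fusion (RRF). The sketch below shows the general idea in plain Python; the constant `k=60` and 1-based ranks are conventional choices assumed here, not necessarily the exact parameters Qdrant uses internally.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical result IDs from a dense and a sparse retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that rank well in both lists ("a" and "c" here) rise above those found by only one retriever, without needing the two score scales to be comparable.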
Quantization
Scalar, product, and binary quantization reduce index size 4-32× with minimal recall loss. Run billion-vector indexes on commodity VMs.
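The 4× and 32× figures follow from the bits stored per vector component: 32 for float32, 8 for scalar-quantized, 1 for binary. The sketch below works through that arithmetic for a hypothetical collection of one million 1536-dimensional vectors and shows a deliberately simplified min/max scalar quantizer. It illustrates the idea only and is not Qdrant's actual quantization code.

```python
def index_size_bytes(num_vectors, dim, bits_per_component):
    """Raw vector storage in bytes, ignoring graph and payload overhead."""
    return num_vectors * dim * bits_per_component // 8

def scalar_quantize(vector):
    """Map floats linearly onto 0..255 using the vector's own min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    return [round((x - lo) / scale) for x in vector], lo, scale

def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]

full = index_size_bytes(1_000_000, 1536, 32)   # float32: 6,144,000,000 bytes
int8 = index_size_bytes(1_000_000, 1536, 8)    # scalar:  1,536,000,000 bytes
binary = index_size_bytes(1_000_000, 1536, 1)  # binary:    192,000,000 bytes

original = [-1.0, 0.0, 0.5, 1.0]
codes, lo, scale = scalar_quantize(original)
restored = dequantize(codes, lo, scale)
```

Each restored value differs from the original by at most about half a quantization step, which is why recall usually drops only slightly while memory falls by the ratios above.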
Sharding & replication
Horizontal scaling via shards; replicas for redundancy. Pier ships single-node; production-scale clusters need manual orchestration.
Use cases
RAG / chatbot retrieval
The "R" in retrieval-augmented generation. Embed your docs with OpenAI / BGE / Voyage; store in Qdrant; query at chat time for top-k relevant chunks.
Semantic search
Find documents by meaning, not just keywords. "Who has experience with React state management?" returns matches even if they wrote "Redux/Zustand expertise."
Recommendation systems
User → item embeddings → similar items. Cold-start friendly when you have item embeddings but no behavioural data.
Duplicate / near-duplicate detection
Hash-based dedupe misses paraphrases; embedding-based dedupe catches them. Useful for content moderation, plagiarism detection, and support-ticket dedupe.
Image / multimodal search
CLIP embeddings let you search images by text and vice versa. Same Qdrant collection, multimodal queries.
Code examples
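```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Connect over REST. Pier enables an API key, so pass the one shown on the
# service detail page (placeholder below).
client = QdrantClient(url="http://qdrant:6333", api_key="<your-api-key>")

# Create a collection of 1536-dimensional vectors compared by cosine similarity.
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```

```python
from qdrant_client.models import PointStruct

# Insert (or overwrite) a point: an ID, its embedding, and a JSON payload.
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(
            id=42,
            vector=[0.1, 0.2, ...],  # placeholder: supply all 1536 floats
            payload={"title": "Hello", "url": "/posts/hello", "lang": "en"},
        )
    ],
)
```

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue

# Top-5 nearest neighbours, restricted to points whose payload has lang == "en".
# Note: recent qdrant-client releases deprecate search() in favour of query_points().
hits = client.search(
    collection_name="docs",
    query_vector=[0.1, 0.2, ...],  # placeholder: the 1536-dim query embedding
    query_filter=Filter(
        must=[FieldCondition(key="lang", match=MatchValue(value="en"))]
    ),
    limit=5,
)
```

```python
from qdrant_client.models import Fusion, FusionQuery, Prefetch

# Hybrid query. Assumes the collection was created with named vectors "dense"
# and "sparse" (unlike the single-vector collection above). dense_vector is a
# list of floats and sparse_vector a SparseVector, both from your own pipeline.
results = client.query_points(
    collection_name="docs",
    prefetch=[
        Prefetch(query=dense_vector, using="dense", limit=20),
        Prefetch(query=sparse_vector, using="sparse", limit=20),
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # reciprocal rank fusion
    limit=5,
)
```

How it compares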
| Alternative | Comparison |
| --- | --- |
| vs Pinecone (managed) | Pinecone is hosted, polished, expensive. Qdrant is OSS, self-hosted, comparable performance. Pick Qdrant when you want control over data, costs, or need on-prem deployment. |
| vs Weaviate | Weaviate has built-in modules for embedding generation (calls OpenAI/Cohere for you). Qdrant is leaner: you generate embeddings client-side. Both excellent; Qdrant tends to be faster, Weaviate has nicer hybrid features. |
| vs pgvector (PostgreSQL extension) | pgvector is great when you already have data in Postgres and don't want a second database. Qdrant outperforms pgvector at scale (10M+ vectors) and has richer payload filtering. |
| vs Milvus | Milvus is a Chinese-led OSS project with a similar feature set. Heavier to operate (more components). Qdrant is simpler: single binary, single container. |