Qdrant

The high-performance vector database for AI applications.

Service: #vector #search #ai #embeddings #similarity

Qdrant is a vector similarity search engine written in Rust. It stores embeddings (dense vectors from OpenAI, Cohere, Voyage, BGE, etc.) and serves nearest-neighbour queries via HNSW indexes — fast enough for production RAG, semantic search, recommendations, and anomaly detection. Pier deploys the official Docker image with persistent storage and the dashboard UI.

Deploy with Pier

1. Open the Pier dashboard and click Add service.
2. Pick Qdrant from the template list.
3. Choose the version, set a service name, and Pier provisions the container, storage, and ports automatically.
4. Attach a domain if you want HTTPS. Traefik auto-provisions the Let's Encrypt certificate.

What is Qdrant?

Qdrant is an open-source vector similarity search engine written in Rust. It exists to answer one question quickly: “given this query vector, find the k nearest stored vectors.” Where most general-purpose databases struggle with high-dimensional vector workloads, Qdrant is built specifically around HNSW indexes and payload filtering, targeting sub-50 ms response times over millions of vectors.
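To make the "k nearest stored vectors" primitive concrete, here is a minimal brute-force sketch in plain Python. It is not how Qdrant works internally (Qdrant uses an HNSW index rather than an exhaustive scan), and the function names and toy 3-dimensional vectors are hypothetical, chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbours(query, stored, k):
    """Return the ids of the k stored vectors most similar to `query`.

    This is an exhaustive scan: every stored vector is compared on
    every query. An approximate index such as HNSW avoids visiting
    all vectors, trading a little recall for much lower latency.
    """
    ranked = sorted(
        stored,
        key=lambda vid: cosine_similarity(query, stored[vid]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (hypothetical data; real ones have hundreds of dimensions).
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
top2 = nearest_neighbours([1.0, 0.05, 0.0], stored, k=2)
print(top2)
```

The scan costs one similarity computation per stored vector, which is why a dedicated index matters once collections reach millions of points.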

RAG, semantic search, recommendations, and anomaly detection all need this primitive. Embeddings come from models (OpenAI, Cohere, Voyage, local BGE/MiniLM); the embeddings live in Qdrant; queries combine vector similarity with payload filters (“find similar to this, but only in English, posted last week, by user X”).

How Pier deploys it

Pier uses the official qdrant/qdrant Docker image. Default ports are 6333 (REST + Dashboard UI) and 6334 (gRPC). The data volume mounts at /qdrant/storage. Pier auto-generates an API key (set via QDRANT__SERVICE__API_KEY) and exposes it on the service detail page.

The built-in Dashboard UI is reachable on port 6333 (path /dashboard). Attach a domain in Pier’s Domains tab for HTTPS via Traefik.
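As a rough sketch of what talking to the REST port with the generated key looks like, the snippet below builds (but does not send) a request with the `api-key` header using only the standard library. The helper name `qdrant_request` and the key value are placeholders; the host `qdrant` is an assumption taken from the example URL used elsewhere on this page.

```python
import json
import urllib.request

def qdrant_request(base_url, path, api_key, body=None, method="GET"):
    """Build a REST request for Qdrant carrying the api-key header.

    The request is only constructed here, not sent.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    req.add_header("api-key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# Placeholder key: copy the real one from the Pier service detail page.
req = qdrant_request("http://qdrant:6333", "/collections", api_key="YOUR_API_KEY")

# To actually list collections, send it:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Once a domain is attached, swapping the base URL for the HTTPS address should be the only change needed.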

When NOT to use Qdrant

If you already have data in Postgres and your dataset is under ~1M vectors, pgvector is simpler: one fewer database to operate. For pure managed-service simplicity at any scale, Pinecone trades cost for zero-ops. Qdrant wins when you want self-hosted control, multi-million-vector scale, rich payload filtering, and hybrid sparse+dense search.

Key features

HNSW index on disk

Memory-mapped HNSW (Hierarchical Navigable Small World) graph — millisecond search over millions of high-dim vectors without holding everything in RAM.
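A minimal sketch of a collection definition that keeps both the original vectors and the HNSW graph on disk, written as the JSON body for the REST endpoint `PUT /collections/<name>`. The helper name is hypothetical, and whether Pier's template applies any of these settings by default is not stated here, so treat this as something you would configure yourself.

```python
import json

def on_disk_collection_config(dim, distance="Cosine"):
    """Request body for creating a collection with on-disk storage.

    - vectors.on_disk: store the original vectors in memory-mapped files
    - hnsw_config.on_disk: store the HNSW graph on disk as well
    """
    return {
        "vectors": {"size": dim, "distance": distance, "on_disk": True},
        "hnsw_config": {"on_disk": True},
    }

body = json.dumps(on_disk_collection_config(1536), indent=2)
print(body)
```

On-disk storage lowers RAM requirements at the cost of latency that depends on disk speed, so it is worth measuring on your own hardware.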

Payload filtering

Attach JSON metadata (tags, dates, user IDs) to each vector. Filter at search time — "find similar docs that I own, modified in the last 30 days" in one query.
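The "docs that I own, modified in the last 30 days" example could be expressed as a filter like the one sketched below, in the JSON shape Qdrant's REST API accepts. The payload field names `owner_id` and `modified_at` are hypothetical, and the sketch assumes `modified_at` is stored as a Unix timestamp in seconds.

```python
import time

def owned_and_recent_filter(owner_id, days, now=None):
    """Build a Qdrant filter: owned by `owner_id` AND modified within `days`.

    Both conditions sit under "must", so a point has to satisfy all of them.
    """
    now = time.time() if now is None else now
    cutoff = int(now - days * 86400)  # 86400 seconds per day
    return {
        "must": [
            {"key": "owner_id", "match": {"value": owner_id}},
            {"key": "modified_at", "range": {"gte": cutoff}},
        ]
    }

# Fixed "now" so the example is reproducible.
flt = owned_and_recent_filter("user_123", days=30, now=1_700_000_000)
print(flt)
```

The resulting dictionary goes in the `filter` field of a search or query request alongside the query vector.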

REST + gRPC + Python/JS/Rust/Go clients

First-class clients for Python, JS, Rust, Go, .NET, Java. Both REST and gRPC supported; gRPC is faster for high-QPS workloads.

Hybrid search (sparse + dense)

Combine BM25-style sparse vectors with dense embeddings. Hybrid often outperforms either alone for RAG over enterprise documents.
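One common way to merge a sparse and a dense result list is reciprocal rank fusion (RRF). The sketch below is a plain-Python illustration of the idea, not Qdrant's implementation; the constant `k=60` is a widely used default and is an assumption here, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one.

    Each list contributes 1 / (k + rank) for every id it contains
    (rank starts at 1). Ids that rank well in several lists rise
    to the top of the fused ordering.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["a", "b", "c"]   # hypothetical dense-vector results, best first
sparse_hits = ["b", "d", "a"]  # hypothetical sparse (keyword-style) results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF only looks at ranks, never raw scores, which is why it works even though sparse and dense scores live on different scales.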

Quantization

Scalar, product, and binary quantization shrink stored vectors by roughly 4-32×, usually with small recall loss. That makes much larger collections feasible on commodity VMs.
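To show where the 4× figure for scalar quantization comes from, here is a simplified min/max quantizer that maps each float32 component to one byte. It is a sketch of the general technique under stated assumptions, not Qdrant's exact algorithm, and the function names and sample vector are hypothetical.

```python
import struct

def quantize_uint8(vector):
    """Map each float to an integer code in 0..255 using min/max scaling."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]

vector = [0.12, -0.40, 0.88, 0.05]
codes, lo, scale = quantize_uint8(vector)

float32_bytes = len(struct.pack(f"{len(vector)}f", *vector))  # 4 bytes per value
ratio = float32_bytes / len(codes)                            # 1 byte per value
max_error = max(abs(a - b) for a, b in zip(vector, dequantize(codes, lo, scale)))
print(ratio, max_error)
```

Each component loses at most half a quantization step, which is why recall typically drops only slightly; binary quantization pushes the same idea to one bit per component, hence the 32× end of the range.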

Sharding & replication

Horizontal scaling via shards; replicas for redundancy. Pier ships single-node; production-scale clusters need manual orchestration.

Use cases

RAG / chatbot retrieval

The "R" in retrieval-augmented generation. Embed your docs with OpenAI / BGE / Voyage; store in Qdrant; query at chat time for top-k relevant chunks.

Semantic search

Find documents by meaning, not just keywords. "Who has experience with React state management?" returns matches even if they wrote "Redux/Zustand expertise."

Recommendation systems

User → item embeddings → similar items. Cold-start friendly when you have item embeddings but no behavioural data.

Duplicate / near-duplicate detection

Hashing-based dedupe misses paraphrases. Embedding-based dedupe catches them. Vital for content moderation, plagiarism detection, and support-ticket dedupe.

Image / multimodal search

CLIP embeddings let you search images by text and vice versa. Same Qdrant collection, multimodal queries.

Code examples

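Create a collection (Python)

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Pier sets an API key on the service, so pass it to the client.
# "YOUR_API_KEY" is a placeholder; copy the real key from the service detail page.
client = QdrantClient(url="http://qdrant:6333", api_key="YOUR_API_KEY")
client.create_collection(
    collection_name="docs",
    # size must match the output dimension of your embedding model
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)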

Insert vectors with payload (Python)

from qdrant_client.models import PointStruct

client.upsert(
    collection_name="docs",
    points=[
        PointStruct(
            id=42,
            vector=[0.1, 0.2, ...],          # 1536-dim
            payload={"title": "Hello", "url": "/posts/hello", "lang": "en"},
        )
    ],
)

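Search with payload filter (Python)

from qdrant_client.models import Filter, FieldCondition, MatchValue

# query_points replaces the deprecated client.search in recent qdrant-client releases.
response = client.query_points(
    collection_name="docs",
    query=[0.1, 0.2, ...],           # query embedding, same dimension as the collection
    query_filter=Filter(
        must=[FieldCondition(key="lang", match=MatchValue(value="en"))]
    ),
    limit=5,
)
hits = response.points               # scored points, best match first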

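Hybrid search (dense + sparse) (Python)

from qdrant_client.models import Fusion, FusionQuery, Prefetch

# Assumes the collection was created with named vectors "dense" and "sparse".
# dense_vector is a list of floats; sparse_vector is a models.SparseVector.
results = client.query_points(
    collection_name="docs",
    prefetch=[
        Prefetch(query=dense_vector,  using="dense", limit=20),
        Prefetch(query=sparse_vector, using="sparse", limit=20),
    ],
    query=FusionQuery(fusion=Fusion.RRF),   # reciprocal rank fusion
    limit=5,
)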

How it compares

vs Pinecone (managed)	Pinecone is hosted, polished, expensive. Qdrant is OSS, self-hosted, comparable performance. Pick Qdrant when you want control over data, costs, or need on-prem deployment.
vs Weaviate	Weaviate has built-in modules for embedding generation (calls OpenAI/Cohere for you). Qdrant is leaner — you generate embeddings client-side. Both excellent; Qdrant tends to be faster, Weaviate has nicer hybrid features.
vs pgvector (PostgreSQL extension)	pgvector is great when you already have data in Postgres and don't want a second database. Qdrant outperforms pgvector at scale (10M+ vectors) and has richer payload filtering.
vs Milvus	Milvus is an open-source project, originally developed by Zilliz, with a similar feature set. It is heavier to operate (more components). Qdrant is simpler: single binary, single container.

Frequently asked questions

Default port and protocol?

6333/tcp for REST + Dashboard, 6334/tcp for gRPC. Pier exposes both. gRPC is faster for high-QPS; REST is fine for most apps.

Where do embeddings come from?

Qdrant doesn't generate embeddings — you do, client-side. OpenAI `text-embedding-3-small`, Voyage, Cohere, or local models (BGE, all-MiniLM) all work. Just pass the vector to Qdrant.

Persistence?

Yes. Pier mounts `/qdrant/storage` as a persistent volume, so data survives container restarts and upgrades.

Memory and disk requirements?

HNSW indexes are memory-mapped, so the working set in RAM can be smaller than the full dataset. Rule of thumb: 4 GB RAM handles ~1M vectors at dimension 1536. With scalar quantization, the same RAM holds ~4M vectors.
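A quick back-of-the-envelope check of the raw vector sizes behind that rule of thumb. This counts only the vector components; the HNSW graph, payloads, and other overhead are deliberately left out, so real usage will be higher. The helper name is hypothetical.

```python
def raw_vector_gib(num_vectors, dim, bytes_per_component=4):
    """Size of the raw vector data in GiB (float32 = 4 bytes per component)."""
    return num_vectors * dim * bytes_per_component / 2**30

float32_gib = raw_vector_gib(1_000_000, 1536)                        # float32 vectors
int8_gib = raw_vector_gib(1_000_000, 1536, bytes_per_component=1)    # scalar-quantized

print(f"1M x 1536 float32: {float32_gib:.2f} GiB")
print(f"1M x 1536 int8:    {int8_gib:.2f} GiB")
```

One million float32 vectors at dimension 1536 come to about 5.72 GiB of raw data, which is more than 4 GB, so the rule of thumb relies on memory mapping rather than on everything being resident in RAM. The one-byte-per-component version is a quarter of that, which matches the roughly 4× gain quoted for scalar quantization.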

Authentication?

Qdrant supports API-key auth. Pier auto-generates the key, sets it via the `QDRANT__SERVICE__API_KEY` environment variable, and shows it on the service detail page. Send it in the `api-key` header on every request.

Backups?

Use Qdrant's snapshot API to create per-collection snapshots. Store snapshots on the data volume or push to S3.
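As a sketch, creating a snapshot for one collection is a `POST` to `/collections/<name>/snapshots`. The snippet below only builds the request with the standard library; the helper name, host, and key are placeholders, and copying the resulting file to S3 is left to whatever tooling you already use.

```python
import urllib.request
from urllib.parse import quote

def snapshot_request(base_url, collection, api_key):
    """Build a POST request asking Qdrant to snapshot one collection."""
    url = f"{base_url.rstrip('/')}/collections/{quote(collection, safe='')}/snapshots"
    req = urllib.request.Request(url, method="POST")
    req.add_header("api-key", api_key)
    return req

# Placeholder host and key; the request is built but not sent.
req = snapshot_request("http://qdrant:6333", "docs", "YOUR_API_KEY")

# Sending it creates the snapshot on the server:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

Running this on a schedule (cron or similar) gives you periodic per-collection backups.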

Which version does Pier deploy?

The default is `latest`. Qdrant moves fast; pinned versions such as `v1.12.0` are available in the version selector.
