ClickHouse

The column-oriented OLAP database that scans billions of rows per second.

Database #analytics#olap#columnar

ClickHouse is a real-time analytics database that uses columnar storage, aggressive compression, vectorized execution, and parallel processing to deliver sub-second queries over tables with hundreds of billions of rows. Used by Cloudflare, Uber, eBay, Bloomberg, Yandex.

Deploy with Pier

1 Open the Pier dashboard and click Add service.
2 Pick ClickHouse from the template list.
3 Choose the version, set a service name, and Pier provisions the container, storage, and ports automatically.
4 Attach a domain if you want HTTPS. Traefik auto-provisions the Let's Encrypt certificate.

What is ClickHouse?

ClickHouse is a column-oriented OLAP database originally built at Yandex (2016 open-source release) to power Yandex Metrica — a web analytics service processing hundreds of billions of events per day. Today it backs Cloudflare’s Edge analytics, eBay’s data warehouse, Uber’s logs, Bloomberg’s market data, and most of the modern open-source observability stack (SigNoz, PostHog, Tinybird, Plausible).

The architecture choice — columnar storage + vectorized execution + aggressive compression — makes aggregation queries 10–100× faster than row stores like PostgreSQL or MySQL on wide event tables. The trade-off: weaker transactional semantics, no row-level UPDATE/DELETE (mutations are async batches), and a mental model that takes adjusting if you come from OLTP.

How Pier deploys it

Pier uses the official clickhouse/clickhouse-server Docker image, exposing both 8123 (HTTP) and 9000 (native TCP) ports. The data volume mounts at /var/lib/clickhouse. The default version is latest; 25-alpine, 24-alpine, and 23-alpine are pinned LTS-ish variants.

Cluster mode supports 2–5 nodes for read-scale and replication. Sharding via Distributed table engine still requires manual macros + replicas configuration after the cluster is up.

Backups use clickhouse-backup and can be uploaded to any S3-compatible storage.

When NOT to use ClickHouse

For OLTP — Postgres or MySQL. For single-row lookups — KV stores. For full-text search — Elasticsearch. ClickHouse is for analytical workloads at scale; below ~50GB and a few million rows, simpler databases are usually fast enough.

Key features

Columnar storage

Stores each column in a separate file with dictionary, delta, and LZ4/ZSTD compression. Aggregations scan only the columns they need — orders of magnitude less I/O than row-store databases.

Vectorized query execution

SIMD instructions process batches of values per CPU cycle. A single core handles tens of millions of rows per second on aggregations.

MergeTree storage engine

Sort-merge engine optimized for append-heavy time-series and event data. Background merges keep data physically sorted by primary key.

SQL with extensions

ANSI SQL plus ClickHouse-specific extensions — arrays as first-class, lambdas, approximate quantiles, sketches (HyperLogLog, T-Digest), window functions, and ASOF JOINs.

Distributed queries (cluster mode)

Federated query layer over multiple shards and replicas. Pier supports clusters of 2–5 nodes for sharded read-scale.

Real-time ingest

Bulk INSERTs at millions of rows per second per node. HTTP, native TCP, MySQL protocol, PostgreSQL protocol, Kafka engine — pick your ingestion path.

Use cases

Web analytics & product telemetry

Yandex Metrica (the original use case), PostHog, Plausible, Tinybird — every modern web-analytics platform runs on ClickHouse.

Application observability

SigNoz, Highlight, Uptrace store traces, logs, and metrics in ClickHouse. Faster and cheaper than Elasticsearch for high-volume log ingest.

Real-time BI dashboards

Sub-second responses over hundreds of millions of rows — perfect for live operational dashboards driven by Grafana, Metabase, Superset.

Ad-tech & marketing attribution

Click-stream aggregation, cohort analysis, funnel queries — workloads where columnar wins by 100×+ over row stores.

Financial market data

Tick storage, order book replay, backtesting — ASOF JOIN finds nearest-in-time matches without window functions.

Code examples

Create a MergeTree table sql

CREATE TABLE events (
  ts          DateTime64(3) CODEC(Delta, ZSTD),
  user_id     UInt64,
  event_type  LowCardinality(String),
  properties  Map(String, String),
  country     LowCardinality(FixedString(2))
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (event_type, user_id, ts);

Aggregate over a billion rows in seconds sql

SELECT toStartOfHour(ts) AS hour,
       event_type,
       count() AS n,
       uniqExact(user_id) AS unique_users
FROM events
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour, event_type
ORDER BY hour;

Approximate quantiles (HyperLogLog + T-Digest) sql

SELECT
  quantilesTDigest(0.5, 0.9, 0.99)(response_ms) AS p50_p90_p99,
  uniqHLL12(user_id) AS approx_unique_users
FROM events
WHERE ts > now() - INTERVAL 1 HOUR;

ASOF JOIN (financial-style nearest-in-time) sql

SELECT t.ts, t.symbol, t.price, q.bid, q.ask
FROM trades t
ASOF LEFT JOIN quotes q
  ON t.symbol = q.symbol
 AND t.ts >= q.ts;

How it compares

vs PostgreSQL	Postgres wins on transactions, joins, mutability, and SQL standard. ClickHouse wins by 10–100× on analytical aggregations over wide event tables. Many stacks use both.
vs TimescaleDB	TimescaleDB keeps Postgres ergonomics with hypertables and continuous aggregates. ClickHouse is more aggressive — wider OLAP wins, weaker on consistency, joins, and mutability.
vs Elasticsearch	Elastic excels at full-text and dynamic JSON queries. ClickHouse is 10× cheaper at storing & querying structured event data. Many observability stacks migrated logs from Elastic to ClickHouse.
vs Snowflake / BigQuery	Managed columnar OLAP — same architectural family. ClickHouse self-hosted is dramatically cheaper for predictable workloads. Cloud-managed wins on zero-admin and elasticity.

Frequently asked questions

Is ClickHouse SQL-compatible?

Mostly — ANSI SQL with ClickHouse extensions. JOINs work but are not its strength; OLAP aggregations and array/map operations are. Don't expect transactional semantics (INSERT-only, no UPDATE/DELETE at row level historically; mutations exist but are async).

Default ports?

8123/tcp for HTTP, 9000/tcp for native TCP protocol. Pier exposes both. JDBC/ODBC drivers use HTTP; the official `clickhouse-client` uses native 9000.

Does cluster mode require manual config?

Pier supports clusters of 2–5 nodes for read-scale and replication. Sharding and distributed tables still require some manual configuration after deployment — see ClickHouse docs.

Are mutations safe in production?

ALTER TABLE ... DELETE / UPDATE are asynchronous and expensive. Prefer designing schemas so DELETEs are by partition drop (ALTER TABLE ... DROP PARTITION) and updates are append-then-deduplicate via ReplacingMergeTree.

Connection from BI tools?

Grafana, Metabase, Superset, Tableau all have native ClickHouse connectors. JDBC for Java/Kotlin. ODBC for Power BI/Excel.

Backups?

Pier ships scheduled backups via `clickhouse-backup` for full or per-database dumps. Can upload to any S3-compatible store.

When NOT to use it?

Transactional workloads — pick Postgres. Single-row lookups by key — pick Postgres or a KV store. Full-text search — pick Elasticsearch. Sub-100GB datasets — Postgres analytic queries are usually fast enough.

Related services

Deploy on your VPS

Deploy this service →