Apache Cassandra

Linearly-scalable wide-column NoSQL — the database that doesn't go down.

Database Cluster-ready #nosql #wide-column #distributed

Apache Cassandra is a distributed wide-column NoSQL database designed for high write throughput, linear horizontal scalability, and no single point of failure. It has run in production at Netflix, Apple, Discord, Instagram, and eBay, and it suits any system where "the database must never go down" is a hard requirement.

Deploy with Pier

1. Open the Pier dashboard and click Add service.
2. Pick Cassandra from the template list.
3. Choose the version, set a service name, and Pier provisions the container, storage, and ports automatically.
4. Attach a domain if you want HTTPS. Traefik auto-provisions the Let's Encrypt certificate.

What is Apache Cassandra?

Cassandra is a distributed wide-column NoSQL database originally developed at Facebook, open-sourced in 2008, and later donated to the Apache Software Foundation. Its design prioritizes write throughput and availability over strict consistency: every node is equal, there is no failover ceremony, and the cluster keeps accepting writes through node failures and network partitions.

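Netflix, Apple, Instagram, eBay, Uber, and many telecom operators run Cassandra clusters at very large scale, and Discord stored billions of messages in Cassandra before migrating to ScyllaDB. Its data model (partition key + clustering key) forces you to think about query patterns up front: the partition key decides which nodes hold a row, and the clustering key decides the sort order inside that partition. The reward is predictable horizontal scaling and little operational drama as the cluster grows.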
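To make the partition key and clustering key idea concrete, here is a minimal in-memory sketch in Python. It is only a mental model, not how Cassandra stores data on disk, and the `Table` class and its method names are hypothetical. Rows that share a partition key live together and stay sorted by clustering key, so "latest N events for one user" is a cheap slice of a single partition.

```python
from bisect import insort
from collections import defaultdict


class Table:
    """Toy model of a table: partition key -> rows sorted by clustering key."""

    def __init__(self):
        # Each partition holds (clustering_key, row) tuples in ascending order.
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        # insort keeps the partition sorted on every write.
        insort(self._partitions[partition_key], (clustering_key, row))

    def latest(self, partition_key, limit):
        # A read touches exactly one partition, never the others.
        rows = self._partitions.get(partition_key, [])
        newest_first = rows[::-1]
        return [row for _, row in newest_first[:limit]]


events = Table()
events.insert("alice", 1, "signup")
events.insert("alice", 3, "upgrade")
events.insert("alice", 2, "login")
events.insert("bob", 1, "signup")

print(events.latest("alice", 2))  # the two newest rows for alice, newest first
```

A query that does not name a partition key has no single list to slice, which is why such queries are expensive in the real system.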

How Pier deploys it

Pier uses the official cassandra Docker image, mounting /var/lib/cassandra as the data volume. The default version is latest (Cassandra 5.x); 5.0, 4.1 LTS, and 4.0 LTS are also available.

Cluster mode supports 2–5 nodes. For production you typically want 6+ nodes in 2+ datacenters — use Pier’s cluster template as a starting point and expand manually.

Backups archive the output of nodetool snapshot into tarballs, which can be uploaded to any S3-compatible storage.

When NOT to use Cassandra

For OLTP with strong consistency, use Postgres. For ad-hoc analytical queries, ClickHouse. For document storage with rich queries, MongoDB. For sub-millisecond per-node latency, ScyllaDB. Cassandra wins when partition-keyed access patterns, write throughput, availability, and multi-DC replication are the hard requirements.

Key features

Masterless architecture

Every node is equal — no primary, no failover ceremony. Read and write to any node; the cluster keeps converging.
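The reason any node can accept a request is that every node can compute, from the partition key alone, which nodes own the data. The sketch below illustrates that idea with a simplified token ring. It is an approximation under stated assumptions: Cassandra's default partitioner is Murmur3 and real clusters use many virtual nodes per host, while this example substitutes MD5 from the standard library and one token per node. The `Ring` class and node names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    # Stand-in hash for illustration; Cassandra's default partitioner is Murmur3.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor):
        self.rf = replication_factor
        # One token per node for simplicity; real clusters use virtual nodes.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        # Owner is the first node at or after the key's token, wrapping around;
        # the remaining replicas are the next nodes clockwise on the ring.
        start = bisect_left(self.tokens, token(partition_key))
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

Because the mapping is a pure function of the key and the ring, whichever node receives the request acts as coordinator and forwards it to the same replica set.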

Linear horizontal scaling

Double the cluster size, get roughly double the throughput. Few databases scale write throughput this predictably.

Tunable consistency

Per-query consistency from ONE (fastest) to ALL (strongest). Pick the right CAP trade-off per use case.
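A rough way to reason about these levels is to count replicas: a read is guaranteed to see the latest acknowledged write when the read and write replica sets must overlap, that is when R + W > RF. The Python sketch below encodes that arithmetic for a single datacenter. The function names are illustrative and are not part of any Cassandra driver API.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at the given level."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


def tolerated_failures(level: str, replication_factor: int) -> int:
    """Replicas that can be down while the level can still be satisfied."""
    return replication_factor - replicas_required(level, replication_factor)


RF = 3
for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, read_sees_latest_write(read_cl, write_cl, RF))
```

With a replication factor of 3, QUORUM on both sides gives overlapping sets while still tolerating one replica being down, which is why it is a common default for data that must read back consistently.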

CQL — SQL-like query language

Cassandra Query Language reads like SQL — CREATE TABLE, SELECT, INSERT — but enforces partition-aware modelling. Approachable for SQL teams.

Multi-datacenter replication

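Replication across geographic regions works out of the box through NetworkTopologyStrategy. With LOCAL_QUORUM, a request waits only for a quorum of replicas in the local datacenter, which keeps regional latency low while remote datacenters receive the write asynchronously.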
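The difference between the datacenter-aware levels comes down to which replicas a request has to wait for. The following Python sketch counts the acknowledgements required by QUORUM, LOCAL_QUORUM, and EACH_QUORUM for a given per-datacenter replication map. The datacenter names and the `acks_required` helper are hypothetical, and only these three levels are modelled.

```python
def quorum(replicas: int) -> int:
    return replicas // 2 + 1


def acks_required(level: str, replication: dict, local_dc: str) -> dict:
    """Map each scope to the number of replica acknowledgements needed."""
    if level == "LOCAL_QUORUM":
        # Only the coordinator's datacenter is on the critical path.
        return {local_dc: quorum(replication[local_dc])}
    if level == "EACH_QUORUM":
        # A quorum in every datacenter; commonly used for writes.
        return {dc: quorum(rf) for dc, rf in replication.items()}
    if level == "QUORUM":
        # Counted across all datacenters, so remote replicas may be waited on.
        return {"cluster-wide": quorum(sum(replication.values()))}
    raise ValueError(f"unsupported level in this sketch: {level}")


replication = {"eu-west": 3, "us-east": 3}  # hypothetical datacenter names

for level in ("LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
    print(level, acks_required(level, replication, local_dc="eu-west"))
```

With three replicas in each of two datacenters, LOCAL_QUORUM waits for 2 local replicas, while plain QUORUM needs 4 of 6 and therefore always involves a cross-region round trip.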

No SPOF — survives node loss

Cluster keeps serving writes through node failures, network partitions, and rolling upgrades. Battle-tested at Netflix-scale.

Use cases

Time-series and IoT at extreme scale

Billions of writes per day across multi-node clusters. Sensor data, telemetry, audit logs.

Messaging & chat systems

Discord stored billions of messages in Cassandra for years. A wide-row schema, partitioned by channel and ordered by message time, fits chat thread access patterns well.

User activity streams

Per-user timelines, notification feeds — high write throughput, simple read patterns.

Multi-region session state

Globally-replicated user sessions with regional read locality. Multi-DC consistency tuned per session class.

Recommendation systems backing store

Pre-computed recommendations served by partition key with low-latency reads at high concurrency.

Code examples

Create keyspace and table (CQL)

CREATE KEYSPACE app
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
  };

USE app;

CREATE TABLE events (
  user_id  UUID,
  ts       TIMESTAMP,
  kind     TEXT,
  payload  TEXT,
  PRIMARY KEY ((user_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

Write and read by partition (CQL)

-- Use a known user_id so the SELECT below returns this row.
INSERT INTO events (user_id, ts, kind, payload)
VALUES (550e8400-e29b-41d4-a716-446655440000, toTimestamp(now()), 'signup', '{"plan":"pro"}');

SELECT * FROM events
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
LIMIT 20;

Tunable consistency (CQL, cqlsh)

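-- CONSISTENCY is a cqlsh command; drivers set the level per statement instead.
-- QUORUM reads are strongly consistent when writes also use QUORUM:
CONSISTENCY QUORUM;
SELECT * FROM events WHERE user_id = 550e8400-e29b-41d4-a716-446655440000;

-- Eventual / fast:
CONSISTENCY ONE;
SELECT * FROM events WHERE user_id = 550e8400-e29b-41d4-a716-446655440000;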

TTL on rows (CQL)

-- Assumes a sessions table already exists; ? marks bind variables of a prepared statement.
INSERT INTO sessions (id, user_id, payload)
VALUES (?, ?, ?)
USING TTL 3600;  -- expire after 1 hour

How it compares

vs ScyllaDB	ScyllaDB is a C++ rewrite of Cassandra optimized for sub-millisecond latency and lower hardware cost. CQL-compatible. Pick ScyllaDB if latency and throughput per node matter; Cassandra if community + tooling matter.
vs MongoDB	Mongo is document-oriented; Cassandra is wide-column. Mongo wins on ad-hoc queries; Cassandra wins on linear write scaling and multi-DC replication.
vs PostgreSQL	Postgres is single-master with read replicas — strong consistency, joins, transactions. Cassandra is multi-master, no joins, eventual consistency by default. Pick Postgres for OLTP; Cassandra for write-heavy, partition-keyed workloads.
vs DynamoDB	DynamoDB is an AWS-managed key-value and document database with on-demand billing. Cassandra is open source and self-hosted, so costs follow the hardware you run. The APIs differ.

Frequently asked questions

Does Cassandra do joins?

No. The data model is partition-key + clustering-key based; you denormalize and pre-compute joins at write time. Queries should restrict the partition key; without it, a SELECT becomes a full-cluster scan (and filtering on other columns requires ALLOW FILTERING).

Cluster size for production?

Minimum 3 nodes with replication factor 3, so that quorum reads and writes (2 of 3 replicas) survive the loss of one node; 6+ nodes for production. Pier's cluster template supports 2–5 nodes, which is fine for evaluation and small deployments; production typically needs more.

Default ports?

9042/tcp for the CQL native protocol; 7000/tcp for inter-node communication, including gossip. Pier exposes 9042 to apps; 7000 stays on the internal Pier network.

Backups?

Pier triggers `nodetool snapshot` on schedule and ships the snapshot tarballs to S3. Restore involves stopping the node, replacing data files, and starting it back up.
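For orientation, the Python sketch below shows the two pieces such a backup job needs: the `nodetool snapshot` invocation with a tag, and the location of the resulting snapshot directories under the data volume. It is an illustration only and says nothing about how Pier implements its scheduler. The helper names, the keyspace `app`, and the tag are assumptions, and the command is built but not executed here.

```python
from pathlib import Path


def snapshot_command(keyspace: str, tag: str) -> list:
    """Argument list for a tagged snapshot of one keyspace."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def snapshot_dirs(data_root: str, keyspace: str, tag: str) -> list:
    """Find snapshot directories for a tag.

    Assumed layout: <data_root>/data/<keyspace>/<table>-<id>/snapshots/<tag>
    """
    keyspace_dir = Path(data_root, "data", keyspace)
    return sorted(keyspace_dir.glob(f"*/snapshots/{tag}"))


cmd = snapshot_command("app", "nightly-2024-03-01")
print(" ".join(cmd))

# On a node, pass cmd to subprocess.run(cmd, check=True), then archive each
# directory returned by snapshot_dirs("/var/lib/cassandra", "app", tag).
```

Snapshots are hard links to SSTable files, so taking one is cheap, but the tagged directories keep occupying disk until they are cleared after upload.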

Should I pick Cassandra or ScyllaDB?

For best performance per node — ScyllaDB. For largest ecosystem, most tooling, and community Q&A — Cassandra. Both speak CQL; migration between them is straightforward.

How do I model data?

Start with queries, derive tables. "Query-first" modelling is mandatory. Denormalize freely. The Cassandra "Spotify model" or "Time-series patterns" docs are essential reading before production use.
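One common outcome of query-first modelling for time-series data is a bucketed partition key: instead of one ever-growing partition per user, the key combines the user with a time bucket so each partition stays bounded. The Python sketch below shows the key derivation and which buckets a time-range query would have to visit. The one-day bucket size and the helper names are assumptions chosen for illustration, not a recommendation from the Cassandra documentation.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts: datetime) -> str:
    """UTC calendar day used as the bucket component of the partition key."""
    return ts.astimezone(timezone.utc).date().isoformat()


def partition_key(user_id: str, ts: datetime) -> tuple:
    """Composite partition key: one partition per user per day."""
    return (user_id, day_bucket(ts))


def buckets_between(start: datetime, end: datetime) -> list:
    """Day buckets a range query must read, oldest first."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


start = datetime(2024, 2, 28, 22, 0, tzinfo=timezone.utc)
end = datetime(2024, 3, 1, 1, 0, tzinfo=timezone.utc)

print(partition_key("user-1", start))
print(buckets_between(start, end))
```

The trade-off is that a read spanning several days issues one query per bucket, so the bucket size should match the time window the application usually asks for.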

Default authentication?

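Pier-deployed Cassandra uses PasswordAuthenticator, with credentials set via environment variables on first start. A stock Cassandra install defaults to AllowAllAuthenticator, which performs no authentication.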
