Apache Cassandra
Linearly-scalable wide-column NoSQL — the database that doesn't go down.
Apache Cassandra is a distributed wide-column NoSQL database designed for high write throughput, linear horizontal scalability, and no single point of failure. Used at Netflix, Apple, Discord, Instagram, eBay — anywhere "the database must never go down" is a hard requirement.
Deploy with Pier
1. Open the Pier dashboard and click Add service.
2. Pick Cassandra from the template list.
3. Choose the version, set a service name, and Pier provisions the container, storage, and ports automatically.
4. Attach a domain if you want HTTPS. Traefik auto-provisions the Let's Encrypt certificate.
What is Apache Cassandra?
Cassandra is a distributed wide-column NoSQL database originally developed at Facebook, open-sourced in 2008, and later donated to the Apache Software Foundation. Its design prioritizes write throughput and availability over strict consistency: every node is equal, there is no failover ceremony, and the cluster keeps accepting writes through node failures and network partitions.
Netflix, Apple, Discord (4 billion messages/day), Instagram, eBay, Uber, and many telecom operators run Cassandra clusters at extreme scale. Its data model (partition key + clustering key) forces you to think about query patterns up front but rewards that with predictable horizontal scaling and no operational drama at scale.
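To make the partition key + clustering key idea concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code, and the class and method names are hypothetical. Rows are grouped by partition key and kept sorted by clustering key inside each partition, so a query that names the partition is one lookup plus an ordered slice, whereas a query that does not name it would have to visit every partition.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy model of a table with PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self):
        # partition key -> list of (clustering_key, row), sorted by clustering key
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        rows = self._partitions[partition_key]
        keys = [k for k, _ in rows]
        idx = bisect.bisect_left(keys, clustering_key)
        if idx < len(rows) and rows[idx][0] == clustering_key:
            # Same primary key: overwrite, mirroring upsert semantics.
            rows[idx] = (clustering_key, row)
        else:
            rows.insert(idx, (clustering_key, row))

    def latest(self, partition_key, limit):
        """Return up to `limit` rows of one partition, newest clustering key first."""
        rows = self._partitions.get(partition_key, [])
        return [row for _, row in reversed(rows)][:limit]
```

In this model, `latest("u1", 20)` corresponds to a "last 20 events for one user" query: the access pattern is decided when the key is chosen, which is the up-front thinking the data model asks for.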
How Pier deploys it
Pier uses the official cassandra Docker image, mounting /var/lib/cassandra
as the data volume. The default version is latest (Cassandra 5.x); 5.0, 4.1
LTS, and 4.0 LTS are also available.
Cluster mode supports 2–5 nodes. For production you typically want 6+ nodes in 2+ datacenters — use Pier’s cluster template as a starting point and expand manually.
Backups capture nodetool snapshot tarballs and can be uploaded to any
S3-compatible storage.
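As a rough illustration of what a snapshot tarball contains, the sketch below bundles the directories that `nodetool snapshot -t <tag>` creates under the data directory (`<keyspace>/<table>/snapshots/<tag>`). This is an assumption-laden example, not Pier's actual backup code: the function name `archive_snapshot` is hypothetical, and the upload to S3-compatible storage is left out.

```python
import tarfile
from pathlib import Path


def archive_snapshot(data_dir: Path, tag: str, out_file: Path) -> int:
    """Bundle every <keyspace>/<table>/snapshots/<tag> directory into one .tar.gz.

    Returns the number of snapshot directories that were archived.
    """
    snapshot_dirs = sorted(data_dir.glob(f"*/*/snapshots/{tag}"))
    with tarfile.open(out_file, "w:gz") as tar:
        for snap in snapshot_dirs:
            # Store paths relative to the data directory so the
            # keyspace/table layout survives a restore.
            tar.add(snap, arcname=str(snap.relative_to(data_dir)))
    return len(snapshot_dirs)
```

With the image's default volume, `data_dir` would typically be `/var/lib/cassandra/data`; treat that path as an assumption and check it against your deployment.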
When NOT to use Cassandra
For OLTP with strong consistency, use Postgres. For ad-hoc analytical queries, ClickHouse. For document storage with rich queries, MongoDB. For sub-millisecond per-node latency, ScyllaDB. Cassandra wins when partition-keyed access patterns, write throughput, availability, and multi-DC replication are the hard requirements.
Key features
Masterless architecture
Every node is equal — no primary, no failover ceremony. Read and write to any node; the cluster keeps converging.
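A simplified sketch of why no node needs to be special: each partition key hashes to a position on a token ring, and the replicas are the next nodes clockwise from that position. Any node can compute this and act as coordinator. The code below is a toy under stated assumptions: one token per node (no virtual nodes), no rack or datacenter awareness, and MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3). The `Ring` class is hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Stand-in hash for illustration only; not Cassandra's Murmur3 partitioner.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        # Each node owns the range ending at its own token.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """First node at or after the key's token, then the next rf - 1 clockwise."""
        start = bisect.bisect_left(self._tokens, token(partition_key))
        size = len(self._ring)
        return [self._ring[(start + i) % size][1] for i in range(self.rf)]
```

Because placement is a pure function of the key and the ring membership, every node arrives at the same replica list without consulting a primary.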
Linear horizontal scaling
Double the cluster size, get roughly double the throughput. Few databases scale write throughput this predictably.
Tunable consistency
Per-query consistency from ONE (fastest) to ALL (strongest). Pick the right CAP trade-off per use case.
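The trade-off can be reduced to simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the replicas contacted for the read plus the replicas that acknowledged the write exceed the replication factor. The helper names below are hypothetical, and only a few single-datacenter levels are modelled.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    needed = levels[level]
    if needed > replication_factor:
        raise ValueError(f"{level} cannot be met with RF={replication_factor}")
    return needed


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when read and write replica sets must share at least one replica."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf
```

With a replication factor of 3, QUORUM reads paired with QUORUM writes overlap (2 + 2 > 3), while ONE paired with ONE does not (1 + 1 <= 3), which is why the latter is only eventually consistent.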
CQL — SQL-like query language
Cassandra Query Language reads like SQL — CREATE TABLE, SELECT, INSERT — but enforces partition-aware modelling. Approachable for SQL teams.
Multi-datacenter replication
Asynchronous replication across geographic regions out of the box. Per-datacenter quorum levels such as LOCAL_QUORUM keep regional reads low-latency.
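The sketch below shows, under simplifying assumptions, how the datacenter-aware consistency levels differ when a whole region is unreachable. `replication` mirrors the per-datacenter replica counts of a NetworkTopologyStrategy keyspace, `alive` is the number of reachable replicas per datacenter, and the function name `can_satisfy` is hypothetical.

```python
def quorum(n: int) -> int:
    return n // 2 + 1


def can_satisfy(level: str, replication: dict, alive: dict, local_dc: str) -> bool:
    """Check whether enough replicas are reachable for a consistency level."""
    if level == "LOCAL_QUORUM":
        # Only the coordinator's own datacenter has to reach a quorum.
        return alive[local_dc] >= quorum(replication[local_dc])
    if level == "EACH_QUORUM":
        # Every datacenter has to reach its own quorum.
        return all(alive[dc] >= quorum(rf) for dc, rf in replication.items())
    if level == "QUORUM":
        # A majority of all replicas, counted across datacenters.
        return sum(alive.values()) >= quorum(sum(replication.values()))
    raise ValueError(f"unsupported level: {level}")
```

With three replicas in each of two datacenters and one datacenter down, LOCAL_QUORUM in the surviving region still succeeds, while QUORUM (4 of 6) and EACH_QUORUM do not.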
No SPOF — survives node loss
Cluster keeps serving writes through node failures, network partitions, and rolling upgrades. Battle-tested at Netflix-scale.
Use cases
Time-series and IoT at extreme scale
Billions of writes per day across multi-node clusters. Sensor data, telemetry, audit logs.
Messaging & chat systems
Discord ran its message store on Cassandra for years, storing billions of messages. The wide-row schema fits chat thread access patterns well.
User activity streams
Per-user timelines, notification feeds — high write throughput, simple read patterns.
Multi-region session state
Globally-replicated user sessions with regional read locality. Multi-DC consistency tuned per session class.
Recommendation systems backing store
Pre-computed recommendations served by partition key with low-millisecond reads at high concurrency.
Code examples
CREATE KEYSPACE app
WITH replication = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3
};
USE app;
CREATE TABLE events (
user_id UUID,
ts TIMESTAMP,
kind TEXT,
payload TEXT,
PRIMARY KEY ((user_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

INSERT INTO events (user_id, ts, kind, payload)
VALUES (uuid(), toTimestamp(now()), 'signup', '{"plan":"pro"}');
SELECT * FROM events
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
LIMIT 20;

-- Strong consistency (quorum):
CONSISTENCY QUORUM;
SELECT * FROM events WHERE user_id = ?;
-- Eventual / fast:
CONSISTENCY ONE;
SELECT * FROM events WHERE user_id = ?;

INSERT INTO sessions (id, user_id, payload)
VALUES (?, ?, ?)
USING TTL 3600; -- expire after 1 hour
How it compares
| vs ScyllaDB | ScyllaDB is a C++ rewrite of Cassandra optimized for sub-millisecond latency and lower hardware cost. CQL-compatible. Pick ScyllaDB if latency and throughput per node matter; Cassandra if community + tooling matter. |
| vs MongoDB | Mongo is document-oriented; Cassandra is wide-column. Mongo wins on ad-hoc queries; Cassandra wins on linear write scaling and multi-DC replication. |
| vs PostgreSQL | Postgres is single-master with read replicas — strong consistency, joins, transactions. Cassandra is multi-master, no joins, eventual consistency by default. Pick Postgres for OLTP; Cassandra for write-heavy, partition-keyed workloads. |
| vs DynamoDB | DynamoDB is an AWS-managed key-value and document database with on-demand billing. Cassandra is self-hosted with predictable costs. The APIs differ, and Cassandra is open source. |
Frequently asked questions
Does Cassandra do joins?
Cluster size for production?
Default ports?
Backups?
Should I pick Cassandra or ScyllaDB?
How do I model data?
Default authentication?