Database Design & Replication

The database is usually the hardest part of a system to scale and the most consequential to get wrong. Application servers are stateless and replaceable; the database holds your actual data.

This page covers how to choose the right database type, design schemas that don’t become a liability, and scale data storage as systems grow.


The first decision is relational vs. non-relational.

Relational (SQL) databases organize data into tables with rows and columns, with a strict schema enforced by the database engine. Relationships between tables are modeled via foreign keys and queried via JOINs.

Examples: PostgreSQL, MySQL, SQLite, Amazon Aurora, Microsoft SQL Server

Strengths:

  • ACID transactions: Atomic, Consistent, Isolated, Durable — guarantees that a set of operations either all succeed or all fail, with no partial state
  • JOINs: Efficiently query related data across multiple tables
  • Schema enforcement: The database rejects invalid data at write time, so bad rows never make it into storage
  • Mature tooling: Decades of optimization, operational knowledge, and ecosystem support

Weaknesses:

  • Scaling writes horizontally is hard — sharding a relational database breaks JOINs
  • Rigid schema makes evolving data models costly (migrations can be painful on large tables)

Use for: Financial transactions, user accounts, order management, anything with complex relationships and integrity requirements.
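The all-or-nothing guarantee is visible directly from application code. A minimal sketch using Python's built-in sqlite3 (table and column names are illustrative): a transfer that would violate a CHECK constraint is rolled back in full, leaving both rows untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- invariant the DB enforces
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
except sqlite3.IntegrityError:
    pass  # the debit would overdraw account 1, so the whole transfer rolls back

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # both rows unchanged: neither UPDATE was applied
```

Note that the rollback undoes the first UPDATE even though only the constraint on account 1 failed; there is no partial state to clean up.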

NoSQL is an umbrella term for databases that trade ACID guarantees and rigid schemas for flexibility, horizontal scalability, and performance for specific access patterns.

| Type | Model | Examples | Best For |
|---|---|---|---|
| Document | JSON-like documents | MongoDB, Firestore | Flexible schemas, nested data, content |
| Key-Value | Simple key → value | Redis, DynamoDB | Caching, sessions, simple lookups |
| Wide-Column | Rows with dynamic columns | Cassandra, HBase | Time-series, write-heavy, massive scale |
| Graph | Nodes and edges | Neo4j, Amazon Neptune | Social graphs, recommendation engines, fraud detection |
| Time-Series | Timestamped data points | InfluxDB, TimescaleDB | Metrics, IoT, monitoring |

Normalization is the process of structuring a relational database to reduce data redundancy and improve integrity.

| Form | Rule | Problem Prevented |
|---|---|---|
| 1NF | Atomic values only; no repeating groups | Can't store arrays in a column (use a join table) |
| 2NF | Every non-key column depends on the entire primary key | Partial dependency: a column depending on only part of a composite key |
| 3NF | No non-key column depends on another non-key column | Transitive dependency: e.g., storing zip_code and city in the same table |

In practice: Design to 3NF. Denormalize deliberately when you have a measured performance problem — not preemptively.
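To make the 3NF row above concrete, here is a sketch using Python's sqlite3 (table names are illustrative): instead of storing city next to zip_code in customers, the zip_code → city fact lives in its own table and is joined back at read time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: city depends on zip_code, not on the customer key,
    -- so the zip_code -> city mapping gets its own table.
    CREATE TABLE zip_codes (
        zip_code TEXT PRIMARY KEY,
        city     TEXT NOT NULL
    );
    CREATE TABLE customers (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        zip_code TEXT NOT NULL REFERENCES zip_codes(zip_code)
    );
""")
conn.execute("INSERT INTO zip_codes VALUES ('10001', 'New York')")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', '10001')")

# Each fact is stored once; reads JOIN the tables back together.
row = conn.execute("""
    SELECT c.name, z.city
    FROM customers c JOIN zip_codes z ON z.zip_code = c.zip_code
""").fetchone()
print(row)
```

Updating a city name is now a single-row change in zip_codes rather than an update across every customer in that zip code.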

Denormalization is the deliberate duplication of data to optimize read performance. Common in:

  • Reporting / analytics tables (pre-aggregated summaries)
  • Read-heavy NoSQL data models (embed related data to avoid joins)
  • CQRS read models
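The reporting-table case can be sketched as a pre-aggregated summary kept in step with the source table inside one transaction. A sketch with Python's sqlite3 (requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE; all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    -- Denormalized reporting table: duplicates data already derivable from orders
    CREATE TABLE customer_order_stats (
        customer_id    INTEGER PRIMARY KEY,
        order_count    INTEGER NOT NULL,
        lifetime_total REAL NOT NULL
    );
""")

def place_order(order_id, customer_id, total):
    with conn:  # keep source and summary consistent in one transaction
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, customer_id, total))
        conn.execute("""
            INSERT INTO customer_order_stats VALUES (?, 1, ?)
            ON CONFLICT(customer_id) DO UPDATE SET
                order_count    = order_count + 1,
                lifetime_total = lifetime_total + excluded.lifetime_total
        """, (customer_id, total))

place_order(1, 42, 10.0)
place_order(2, 42, 15.0)
stats = conn.execute(
    "SELECT order_count, lifetime_total FROM customer_order_stats WHERE customer_id = 42"
).fetchone()
print(stats)
```

Reads of a customer's lifetime totals are now a single-row lookup instead of an aggregation over all of their orders, at the cost of extra write work.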

An index is a separate data structure (usually a B-tree) that allows the database to find rows matching a condition without scanning the entire table.

-- Without an index on customer_id, this query scans every row: O(n)
SELECT * FROM orders WHERE customer_id = 42;

-- With the index below, the same query becomes a B-tree lookup: O(log n)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
| | Without Index | With Index |
|---|---|---|
| SELECT speed | Slow (full table scan) | Fast (B-tree lookup) |
| INSERT/UPDATE/DELETE speed | Fast | Slower (index must be updated) |
| Storage | No overhead | Extra disk space |

Rules of thumb:

  • Always index foreign keys and columns used in WHERE, JOIN ON, and ORDER BY clauses
  • Don’t over-index — every index slows down writes
  • Use EXPLAIN / EXPLAIN ANALYZE (Postgres) to verify the database is actually using your indexes
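The effect is easy to observe with SQLite's EXPLAIN QUERY PLAN, the equivalent of Postgres's EXPLAIN. A sketch via Python's sqlite3 (exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, float(i)) for i in range(10_000)])

query = "SELECT * FROM orders WHERE customer_id = 42"

# The human-readable plan is the last column of each EXPLAIN QUERY PLAN row.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

print(before)  # a full scan of orders
print(after)   # a search using idx_orders_customer_id
```

Running EXPLAIN before and after creating an index is the quickest way to confirm the planner actually uses it.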

Replication copies data from one database node (the primary) to one or more replica nodes, providing read scalability and fault tolerance.

Writes ──► [Primary] ──── async replication ────► [Replica 1]
                     └─── async replication ────► [Replica 2]
Reads  ──────────────────────────────────────────► [Replica 1 or 2]
  • Synchronous replication: The primary waits for at least one replica to confirm the write before acknowledging success. Zero data loss on primary failure, but higher write latency.
  • Asynchronous replication: The primary acknowledges immediately and replicates in the background. Lower write latency, but if the primary crashes before replicating, the most recent writes are lost, and replicas trail the primary by a small delay (replication lag).

Replication lag means replicas may serve slightly stale data. For critical reads (e.g., “was my transaction successful?”), route to the primary.
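One simple mitigation is to pin a session's reads to the primary for a short window after that session writes. A sketch of the routing logic (the class and pinning policy are hypothetical, not a specific library's API):

```python
import random
import time

class ReadRouter:
    """Route reads to replicas, except for sessions that wrote recently."""

    def __init__(self, primary, replicas, pin_seconds=5.0):
        self.primary = primary
        self.replicas = replicas
        self.pin_seconds = pin_seconds  # assumed upper bound on replication lag
        self._last_write = {}           # session_id -> time of that session's last write

    def node_for_write(self, session_id):
        self._last_write[session_id] = time.monotonic()
        return self.primary             # single write path

    def node_for_read(self, session_id):
        last = self._last_write.get(session_id)
        if last is not None and time.monotonic() - last < self.pin_seconds:
            return self.primary         # read-your-writes: lag could hide the write
        return random.choice(self.replicas)  # stale-tolerant read

router = ReadRouter("primary", ["replica-1", "replica-2"])
router.node_for_write("alice")
print(router.node_for_read("alice"))  # pinned to the primary
print(router.node_for_read("bob"))    # served by any replica
```

The pin window trades read scalability for freshness only for the sessions that need it; everyone else keeps reading from replicas.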

Multi-primary (multi-leader) replication: multiple nodes accept writes simultaneously. This requires conflict resolution when two nodes receive conflicting writes to the same row. It is complex to operate correctly and rarely necessary; most systems can use primary-replica replication with a single write path.


The CAP theorem concerns three properties that every distributed data store must trade off:

  • Consistency (C): Every read receives the most recent write or an error.
  • Availability (A): Every request receives a response (not necessarily the most recent data).
  • Partition Tolerance (P): The system continues operating even when network partitions cause nodes to lose communication with each other.

The theorem: In the presence of a network partition, you must choose between consistency and availability. You cannot guarantee both.

| System Type | Tradeoff | Example Systems |
|---|---|---|
| CP (Consistent + Partition-Tolerant) | May reject requests during a partition | HBase, ZooKeeper, etcd |
| AP (Available + Partition-Tolerant) | May serve stale data during a partition | Cassandra, CouchDB, DynamoDB (eventual) |
| CA (Consistent + Available) | Not realistic in a distributed system; partitions always happen | Single-node RDBMS (no distribution) |

CAP only describes behavior during a partition. PACELC extends it: even when the network is healthy, you must still choose between latency and consistency. Lower latency usually means looser consistency guarantees.
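Dynamo-style stores expose this dial directly: with N replicas, a write waits for W acknowledgements and a read queries R replicas, and choosing W + R > N forces every read set to overlap every write set. A sketch of the arithmetic (parameter names follow the usual N/W/R convention):

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True if every read set must intersect every write set (W + R > N),
    so an acknowledged write is always visible to a subsequent quorum read."""
    return w + r > n

# N=3 replicas: two common configurations
print(quorum_overlaps(3, 2, 2))  # overlapping quorums: consistent, higher latency
print(quorum_overlaps(3, 1, 1))  # fastest, but a read can miss the latest write
```

Raising W or R buys consistency at the cost of waiting on more replicas per operation, which is exactly the latency-vs-consistency tradeoff PACELC describes.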


| Model | Guarantee | Example |
|---|---|---|
| Strong / linearizable | Every read sees the latest write | Single-node DB, etcd |
| Read-your-writes | A client always sees its own writes | Session-pinned reads |
| Monotonic reads | A client never reads older data than it has already seen | Read replica with session affinity |
| Eventual consistency | All replicas will converge to the same value, eventually | Cassandra, DynamoDB default |
| Causal consistency | Causally related operations are seen in order | MongoDB sessions |
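"Converge eventually" implies a deterministic merge rule, commonly last-write-wins. A sketch of an LWW register merge (the integer timestamps are illustrative; real systems order writes with hybrid or vector clocks):

```python
def lww_merge(a, b):
    """Merge two (value, timestamp) register states; the newer write wins."""
    return a if a[1] >= b[1] else b

replica_1 = ("blue", 1)   # older write
replica_2 = ("green", 2)  # later, conflicting write

# Anti-entropy in either direction produces the same state, so replicas converge.
print(lww_merge(replica_1, replica_2))
print(lww_merge(replica_2, replica_1))
```

Because the merge is commutative, it does not matter which replica syncs first or how many times states are exchanged; every replica ends at the same value.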

Sharding splits a single database into multiple independent nodes (shards), each holding a subset of the data.

User IDs         1 – 1,000,000 → Shard 1
User IDs 1,000,001 – 2,000,000 → Shard 2
User IDs 2,000,001 – 3,000,000 → Shard 3
| Strategy | How | Risk |
|---|---|---|
| Range-based | Partition by value range (e.g., user ID ranges) | Hot spots if new IDs concentrate on one shard |
| Hash-based | shard = hash(key) % num_shards | Even distribution, but range queries span all shards |
| Directory-based | Lookup table maps key → shard | Flexible, but the lookup table is a single point of failure |
Sharding comes at a cost:

  • Cross-shard JOINs are not supported by the database — data that needs to be queried together must live on the same shard
  • Resharding is painful — changing the shard count requires moving data across nodes
  • Transactions across shards require distributed transaction protocols (e.g., two-phase commit), which are slow and complex
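The hash-based strategy from the table above is only a few lines; the key point is to use a stable hash function, since Python's built-in hash() is salted per process and cannot be used for routing. A sketch assuming 4 shards and hypothetical "user:N" keys:

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # md5 is stable across processes and machines, unlike the built-in hash()
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Keys spread roughly evenly across shards, but a range query such as
# "users 1-1000" would have to fan out to every shard.
counts = Counter(shard_for(f"user:{i}") for i in range(10_000))
print(counts)
```

Note the `% NUM_SHARDS` step is what makes resharding painful: changing the shard count remaps almost every key, which is why systems that expect to grow often use consistent hashing instead.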

| Requirement | Recommended Approach |
|---|---|
| Complex relationships, transactions, data integrity | PostgreSQL or MySQL (RDBMS) |
| Flexible / evolving schema, nested JSON documents | MongoDB or Firestore |
| High-volume writes, time-ordered data, massive scale | Cassandra |
| Sub-millisecond reads, simple key-value access | Redis |
| Graph traversal, relationship queries | Neo4j or Amazon Neptune |
| Metrics, monitoring, IoT sensor data | InfluxDB or TimescaleDB |
| Serverless / managed, low-ops key-value | DynamoDB |