Database Design & Replication
The database is usually the hardest part of a system to scale and the most consequential to get wrong. Application servers are stateless and replaceable; the database holds your actual data.
This page covers how to choose the right database type, design schemas that don’t become a liability, and scale data storage as systems grow.
SQL vs. NoSQL
The first decision is always relational vs. non-relational.
Relational Databases (SQL)
Data is organized into tables with rows and columns, with strict schemas enforced by the database engine. Relationships between tables are modeled via foreign keys and accessed via JOINs.
Examples: PostgreSQL, MySQL, SQLite, Amazon Aurora, Microsoft SQL Server
Strengths:
- ACID transactions: Atomic, Consistent, Isolated, Durable — guarantees that a set of operations either all succeed or all fail, with no partial state
- JOINs: Efficiently query related data across multiple tables
- Schema enforcement: The database rejects data that violates the schema, so invalid rows never reach storage
- Mature tooling: Decades of optimization, operational knowledge, and ecosystem support
Weaknesses:
- Scaling writes horizontally is hard — sharding a relational database breaks JOINs
- Rigid schema makes evolving data models costly (migrations can be painful on large tables)
Use for: Financial transactions, user accounts, order management, anything with complex relationships and integrity requirements.
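The ACID guarantee above can be sketched with SQLite's built-in transactions (Python's `sqlite3`; the accounts schema and `transfer` helper are illustrative):

```python
import sqlite3

# Hypothetical accounts table: a CHECK constraint forbids negative balances,
# so an overdrawing transfer fails and the whole transaction rolls back.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except sqlite3.IntegrityError:
        return False  # CHECK failed: both UPDATEs are undone, no partial state
    return True

ok = transfer(conn, 1, 2, 30)      # succeeds: balances become 70 and 80
failed = transfer(conn, 1, 2, 999) # would overdraw account 1: rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The key point is atomicity: after the failed transfer, neither the debit nor the credit is visible, exactly the "all succeed or all fail" guarantee described above.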
NoSQL Databases
NoSQL is an umbrella term for databases that trade ACID guarantees and rigid schemas for flexibility, horizontal scalability, and performance at specific access patterns.
| Type | Model | Examples | Best For |
|---|---|---|---|
| Document | JSON-like documents | MongoDB, Firestore | Flexible schemas, nested data, content |
| Key-Value | Simple key → value | Redis, DynamoDB | Caching, sessions, simple lookups |
| Wide-Column | Rows with dynamic columns | Cassandra, HBase | Time-series, write-heavy, massive scale |
| Graph | Nodes and edges | Neo4j, Amazon Neptune | Social graphs, recommendation engines, fraud detection |
| Time-Series | Timestamped data points | InfluxDB, TimescaleDB | Metrics, IoT, monitoring |
Normalization
Normalization is the process of structuring a relational database to reduce data redundancy and improve integrity.
Normal Forms (Quick Reference)
| Form | Rule | Problem Prevented |
|---|---|---|
| 1NF | Atomic values only; no repeating groups | Can’t store arrays in a column (use a join table) |
| 2NF | Every non-key column depends on the entire primary key | Partial dependency — a column only depending on part of a composite key |
| 3NF | No non-key column depends on another non-key column | Transitive dependency — e.g., storing zip_code and city in the same table |
In practice: Design to 3NF. Denormalize deliberately when you have a measured performance problem — not preemptively.
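The zip_code/city example from the 3NF row can be made concrete. In the sketch below (SQLite via Python; table and column names are illustrative), city depends on zip_code rather than on the address key, so it moves into its own lookup table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 3NF: city is determined by zip_code, a non-key column.
    -- CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT,
    --                         zip_code TEXT, city TEXT);

    -- 3NF: the transitive dependency is factored into a lookup table.
    CREATE TABLE zip_codes (zip_code TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        street TEXT NOT NULL,
        zip_code TEXT NOT NULL REFERENCES zip_codes(zip_code)
    );
""")
conn.execute("INSERT INTO zip_codes VALUES ('94103', 'San Francisco')")
conn.execute("INSERT INTO addresses (street, zip_code) VALUES ('123 Main St', '94103')")

# The city is now stored once per zip code and recovered with a JOIN.
city = conn.execute(
    "SELECT z.city FROM addresses a JOIN zip_codes z USING (zip_code)"
).fetchone()[0]
```

If a city is ever renamed, the fix is one row in `zip_codes` instead of an update across every address that shares the zip code.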
Denormalization
Intentionally duplicating data to optimize read performance. Common in:
- Reporting / analytics tables (pre-aggregated summaries)
- Read-heavy NoSQL data models (embed related data to avoid joins)
- CQRS read models
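The first pattern can be sketched as a pre-aggregated summary table kept in sync at write time (SQLite via Python; the schema and SQLite's UPSERT syntax are used here for illustration):

```python
import sqlite3

# daily_revenue deliberately duplicates data derivable from orders, so a
# dashboard reads one row instead of summing the whole orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount INTEGER);
    CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, total INTEGER NOT NULL);
""")

def record_order(conn, day, amount):
    with conn:  # both writes commit together, keeping the summary consistent
        conn.execute("INSERT INTO orders (day, amount) VALUES (?, ?)",
                     (day, amount))
        # UPSERT: insert the first total for the day, or add to an existing one.
        conn.execute(
            "INSERT INTO daily_revenue VALUES (?, ?) "
            "ON CONFLICT(day) DO UPDATE SET total = total + excluded.total",
            (day, amount),
        )

record_order(conn, "2024-01-01", 100)
record_order(conn, "2024-01-01", 250)
total = conn.execute(
    "SELECT total FROM daily_revenue WHERE day = '2024-01-01'"
).fetchone()[0]
```

The cost is the usual one for denormalization: every order write now touches two tables, and the summary is wrong if any write path forgets to update it.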
Indexing
An index is a separate data structure (usually a B-tree) that allows the database to find rows matching a condition without scanning the entire table.
```sql
-- Full table scan: O(n) — reads every row
SELECT * FROM orders WHERE customer_id = 42;

-- With index on customer_id: O(log n) — jumps directly
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
```

Index Tradeoffs
| | Without Index | With Index |
|---|---|---|
| SELECT speed | Slow (full table scan) | Fast (B-tree lookup) |
| INSERT/UPDATE/DELETE speed | Fast | Slower (index must be updated) |
| Storage | No overhead | Extra disk space |
Rules of thumb:
- Always index foreign keys and columns used in `WHERE`, `JOIN ON`, and `ORDER BY` clauses
- Don't over-index — every index slows down writes
- Use `EXPLAIN` / `EXPLAIN ANALYZE` (Postgres) to verify the database is actually using your indexes
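As a sketch of the last rule, SQLite's `EXPLAIN QUERY PLAN` (the analogue of Postgres's `EXPLAIN`) shows whether a query scans the whole table or uses the index from the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER, total INTEGER)")

# Without an index the plan is a scan of the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchone()[3]  # detail column, e.g. 'SCAN orders'

conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")

# With the index the plan becomes an index search.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchone()[3]  # e.g. 'SEARCH orders USING INDEX idx_orders_customer_id ...'
```

The exact plan wording varies by SQLite version, but the scan-versus-search distinction is the thing to check before and after adding any index.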
Replication
Replication copies data from one database node (the primary) to one or more replica nodes, providing read scalability and fault tolerance.
Primary-Replica (Leader-Follower)
```
Writes → [Primary] ──── async replication ────► [Replica 1] ► [Replica 2]
Reads  ───────────────────────────────────────► [Replica 1 or 2]
```

- Synchronous replication: The primary waits for at least one replica to confirm the write before acknowledging success. Zero data loss on primary failure, but higher write latency.
- Asynchronous replication: The primary acknowledges immediately and replicates in the background. Lower write latency, but if the primary crashes before a write reaches a replica, that write is lost. The delay between the primary and its replicas is called replication lag.
Replication lag means replicas may serve slightly stale data. For critical reads (e.g., “was my transaction successful?”), route to the primary.
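One way to implement that routing rule, sketched here with placeholder connection names in place of real connection objects:

```python
import random

class RoutingPool:
    """Route writes to the primary; spread stale-tolerant reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def for_write(self):
        return self.primary  # single write path

    def for_read(self, critical=False):
        # Critical read-after-write queries (e.g. "was my transaction
        # successful?") must not see replication lag, so they hit the primary.
        if critical or not self.replicas:
            return self.primary
        return random.choice(self.replicas)

pool = RoutingPool(primary="primary-db", replicas=["replica-1", "replica-2"])
```

Real applications usually express `critical` implicitly, for example by pinning a user's session to the primary for a few seconds after any write.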
Multi-Primary (Multi-Master)
Multiple nodes accept writes simultaneously. Requires conflict resolution when two nodes receive conflicting writes to the same row. Complex to operate correctly and rarely necessary — most systems can use primary-replica with a single write path.
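To make the conflict-resolution problem concrete, here is a sketch of last-write-wins (LWW), one common but lossy policy: every write carries a timestamp, and on conflict the later timestamp is kept. Replicas are stand-in dicts mapping keys to (timestamp, value) pairs:

```python
def lww_merge(a, b):
    """Merge two replica states; per key, keep the (timestamp, value) pair
    with the higher timestamp. Ties here arbitrarily favor `a`."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Two primaries accepted concurrent writes to the same key:
node_a = {"user:1:name": (100, "Alice")}
node_b = {"user:1:name": (105, "Alicia")}  # later concurrent write

converged = lww_merge(node_a, node_b)  # the write on node_a is silently dropped
```

The silently dropped write is why LWW is considered lossy, and why multi-primary setups that cannot tolerate lost writes need richer schemes (vector clocks, CRDTs, or application-level merge logic).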
The CAP Theorem
Every distributed data store must make a tradeoff between three properties:
- Consistency (C): Every read receives the most recent write or an error.
- Availability (A): Every request receives a response (not necessarily the most recent data).
- Partition Tolerance (P): The system continues operating even when network partitions cause nodes to lose communication with each other.
The theorem: In the presence of a network partition, you must choose between consistency and availability. You cannot guarantee both.
| System Type | Tradeoff | Example Systems |
|---|---|---|
| CP (Consistent + Partition Tolerant) | May reject requests during a partition | HBase, ZooKeeper, etcd |
| AP (Available + Partition Tolerant) | May serve stale data during a partition | Cassandra, CouchDB, DynamoDB (eventual) |
| CA (Consistent + Available) | Not realistic in a distributed system — partitions always happen | Single-node RDBMS (no distribution) |
PACELC Extension
CAP only describes behavior during a partition. PACELC extends it: even when the network is healthy, you must still choose between latency and consistency. Lower latency usually means looser consistency guarantees.
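Quorum-replicated systems make this knob explicit: with N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, the read and write quorums overlap, so every read sees the latest write; shrinking W or R lowers latency but loosens consistency. A sketch of the overlap condition:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when any read quorum must intersect any write quorum,
    i.e. reads are guaranteed to see the most recent acknowledged write."""
    return r + w > n

# N=3 with W=2, R=2: strongly consistent, but each write waits on two nodes.
consistent = quorums_overlap(3, 2, 2)

# N=3 with W=1, R=1: lowest latency, but a read may miss the latest write.
fast_but_loose = quorums_overlap(3, 1, 1)
```

This is roughly the dial exposed by Cassandra and DynamoDB consistency settings: the same cluster can be tuned toward latency or toward consistency per request.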
Consistency Models
| Model | Guarantee | Example |
|---|---|---|
| Strong / Linearizable | Every read sees the latest write | Single-node DB, etcd |
| Read-your-writes | A client always sees its own writes | Session-pinned reads |
| Monotonic reads | A client never reads older data than it has already seen | Read replica with session affinity |
| Eventual consistency | All replicas will converge to the same value, eventually | Cassandra, DynamoDB default |
| Causal consistency | Operations that are causally related are seen in order | MongoDB sessions |
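Monotonic reads can be enforced client-side by tracking the highest version already seen. In this sketch (all names illustrative), the client falls back to a fresher source whenever a lagging replica would move it backwards in time:

```python
class MonotonicReader:
    """Never return data older than what this client has already seen."""

    def __init__(self):
        self.last_seen_version = 0

    def read(self, replica_value, replica_version, primary_value, primary_version):
        # If the replica has fallen behind this client's history, use the
        # primary's (fresher) value instead of going backwards.
        if replica_version < self.last_seen_version:
            value, version = primary_value, primary_version
        else:
            value, version = replica_value, replica_version
        self.last_seen_version = max(self.last_seen_version, version)
        return value

reader = MonotonicReader()
first = reader.read("v2", 2, "v2", 2)      # sees version 2
second = reader.read("v1", 1, "v3", 3)     # replica lagging: falls back, gets "v3"
```

Session affinity (pinning a client to one replica) achieves the same guarantee without version tracking, which is why the table above pairs monotonic reads with session affinity.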
Sharding (Horizontal Partitioning)
Sharding splits a single database into multiple independent nodes (shards), each holding a subset of the data.
```
User IDs 1–1M  → Shard 1
User IDs 1M–2M → Shard 2
User IDs 2M–3M → Shard 3
```

Sharding Strategies
| Strategy | How | Risk |
|---|---|---|
| Range-based | Partition by value range (e.g., user ID ranges) | Hot spots if new IDs concentrate on one shard |
| Hash-based | `shard = hash(key) % num_shards` | Even distribution, but range queries span all shards |
| Directory-based | Lookup table maps key → shard | Flexible, but lookup table is a single point of failure |
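A minimal sketch of hash-based routing (names illustrative). Note the stable hash: Python's built-in `hash()` is randomized per process, so it cannot be used for routing that must agree across machines:

```python
import hashlib

NUM_SHARDS = 3

def shard_for(key: str) -> int:
    """Deterministically map a key to a shard using a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every process routes the same key to the same shard:
same = shard_for("user:42") == shard_for("user:42")

# The tradeoff from the table above: consecutive keys hash apart, so a
# range query over users 1-1000 must fan out to every shard.
shards_hit = {shard_for(f"user:{i}") for i in range(1, 1001)}
```

Changing `NUM_SHARDS` remaps almost every key under this scheme, which is exactly the resharding pain discussed next; consistent hashing reduces (but does not eliminate) that movement.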
Sharding Costs
- Cross-shard JOINs are impossible — data that needs to be queried together must live on the same shard
- Resharding is painful — changing the shard count requires moving data across nodes
- Transactions across shards require distributed transaction protocols (2-phase commit), which are slow and complex
Choosing a Database: Decision Guide
| Requirement | Recommended Approach |
|---|---|
| Complex relationships, transactions, data integrity | PostgreSQL or MySQL (RDBMS) |
| Flexible / evolving schema, nested JSON documents | MongoDB or Firestore |
| High-volume writes, time-ordered data, massive scale | Cassandra |
| Sub-millisecond reads, simple key-value access | Redis |
| Graph traversal, relationship queries | Neo4j or Amazon Neptune |
| Metrics, monitoring, IoT sensor data | InfluxDB or TimescaleDB |
| Serverless / managed, low-ops key-value | DynamoDB |