Database Design & Replication

The database is usually the hardest part of a system to scale and the most consequential to get wrong. Application servers are stateless and replaceable; the database holds your actual data.

This page covers how to choose the right database type, design schemas that don’t become a liability, and scale data storage as systems grow.


The first decision is relational vs. non-relational.

Relational (SQL) databases organize data into tables with rows and columns, with a strict schema enforced by the database engine. Relationships between tables are modeled via foreign keys and queried via JOINs.

Examples: PostgreSQL, MySQL, SQLite, Amazon Aurora, Microsoft SQL Server

Strengths:

  • ACID transactions: Atomic, Consistent, Isolated, Durable — guarantees that a set of operations either all succeed or all fail, with no partial state
  • JOINs: Efficiently query related data across multiple tables
  • Schema enforcement: The database rejects invalid data at write time, so bad rows never make it into storage
  • Mature tooling: Decades of optimization, operational knowledge, and ecosystem support

Weaknesses:

  • Scaling writes horizontally is hard — sharding a relational database breaks JOINs
  • Rigid schema makes evolving data models costly (migrations can be painful on large tables)

Use for: Financial transactions, user accounts, order management, anything with complex relationships and integrity requirements.
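The all-or-nothing guarantee is visible directly from application code. A minimal sketch using Python's built-in sqlite3 (table and column names are illustrative): a transfer that would violate a CHECK constraint is rolled back in full, leaving both rows untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- invariant the DB enforces
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
except sqlite3.IntegrityError:
    pass  # the debit would overdraw account 1, so the whole transfer rolls back

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # both rows unchanged: neither UPDATE was applied
```

Note that the rollback undoes the first UPDATE even though only the constraint on account 1 failed; there is no partial state to clean up.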

NoSQL is an umbrella term for databases that trade ACID guarantees and rigid schemas for flexibility, horizontal scalability, and performance for specific access patterns.

| Type | Model | Examples | Best For |
|---|---|---|---|
| Document | JSON-like documents | MongoDB, Firestore | Flexible schemas, nested data, content |
| Key-Value | Simple key → value | Redis, DynamoDB | Caching, sessions, simple lookups |
| Wide-Column | Rows with dynamic columns | Cassandra, HBase | Time-series, write-heavy, massive scale |
| Graph | Nodes and edges | Neo4j, Amazon Neptune | Social graphs, recommendation engines, fraud detection |
| Time-Series | Timestamped data points | InfluxDB, TimescaleDB | Metrics, IoT, monitoring |

Normalization is the process of structuring a relational database to reduce data redundancy and improve integrity.

| Form | Rule | Problem Prevented |
|---|---|---|
| 1NF | Atomic values only; no repeating groups | Can't store arrays in a column (use a join table) |
| 2NF | Every non-key column depends on the entire primary key | Partial dependency: a column depending on only part of a composite key |
| 3NF | No non-key column depends on another non-key column | Transitive dependency: e.g., storing zip_code and city in the same table |

In practice: Design to 3NF. Denormalize deliberately when you have a measured performance problem — not preemptively.
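To make the 3NF row above concrete, here is a sketch using Python's sqlite3 (table names are illustrative): instead of storing city next to zip_code in customers, the zip_code → city fact lives in its own table and is joined back at read time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: city depends on zip_code, not on the customer key,
    -- so the zip_code -> city mapping gets its own table.
    CREATE TABLE zip_codes (
        zip_code TEXT PRIMARY KEY,
        city     TEXT NOT NULL
    );
    CREATE TABLE customers (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        zip_code TEXT NOT NULL REFERENCES zip_codes(zip_code)
    );
""")
conn.execute("INSERT INTO zip_codes VALUES ('10001', 'New York')")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', '10001')")

# Each fact is stored once; reads JOIN the tables back together.
row = conn.execute("""
    SELECT c.name, z.city
    FROM customers c JOIN zip_codes z ON z.zip_code = c.zip_code
""").fetchone()
print(row)
```

Updating a city name is now a single-row change in zip_codes rather than an update across every customer in that zip code.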

Denormalization is the deliberate duplication of data to optimize read performance. Common in:

  • Reporting / analytics tables (pre-aggregated summaries)
  • Read-heavy NoSQL data models (embed related data to avoid joins)
  • CQRS read models
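The reporting-table case can be sketched as a pre-aggregated summary kept in step with the source table inside one transaction. A sketch with Python's sqlite3 (requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE; all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    -- Denormalized reporting table: duplicates data already derivable from orders
    CREATE TABLE customer_order_stats (
        customer_id    INTEGER PRIMARY KEY,
        order_count    INTEGER NOT NULL,
        lifetime_total REAL NOT NULL
    );
""")

def place_order(order_id, customer_id, total):
    with conn:  # keep source and summary consistent in one transaction
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, customer_id, total))
        conn.execute("""
            INSERT INTO customer_order_stats VALUES (?, 1, ?)
            ON CONFLICT(customer_id) DO UPDATE SET
                order_count    = order_count + 1,
                lifetime_total = lifetime_total + excluded.lifetime_total
        """, (customer_id, total))

place_order(1, 42, 10.0)
place_order(2, 42, 15.0)
stats = conn.execute(
    "SELECT order_count, lifetime_total FROM customer_order_stats WHERE customer_id = 42"
).fetchone()
print(stats)
```

Reads of a customer's lifetime totals are now a single-row lookup instead of an aggregation over all of their orders, at the cost of extra write work.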

An index is a separate data structure (usually a B-tree) that allows the database to find rows matching a condition without scanning the entire table.

-- Without an index on customer_id, this query scans every row: O(n)
SELECT * FROM orders WHERE customer_id = 42;

-- With the index below, the same query becomes a B-tree lookup: O(log n)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
| | Without Index | With Index |
|---|---|---|
| SELECT speed | Slow (full table scan) | Fast (B-tree lookup) |
| INSERT/UPDATE/DELETE speed | Fast | Slower (index must be updated) |
| Storage | No overhead | Extra disk space |

Rules of thumb:

  • Always index foreign keys and columns used in WHERE, JOIN ON, and ORDER BY clauses
  • Don’t over-index — every index slows down writes
  • Use EXPLAIN / EXPLAIN ANALYZE (Postgres) to verify the database is actually using your indexes
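The effect is easy to observe with SQLite's EXPLAIN QUERY PLAN, the equivalent of Postgres's EXPLAIN. A sketch via Python's sqlite3 (exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, float(i)) for i in range(10_000)])

query = "SELECT * FROM orders WHERE customer_id = 42"

# The human-readable plan is the last column of each EXPLAIN QUERY PLAN row.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

print(before)  # a full scan of orders
print(after)   # a search using idx_orders_customer_id
```

Running EXPLAIN before and after creating an index is the quickest way to confirm the planner actually uses it.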

Replication copies data from one database node (the primary) to one or more replica nodes, providing read scalability and fault tolerance.

Writes ──► [Primary] ──── async replication ────► [Replica 1]
                     └─── async replication ────► [Replica 2]
Reads  ──────────────────────────────────────────► [Replica 1 or 2]
  • Synchronous replication: The primary waits for at least one replica to confirm the write before acknowledging success. Zero data loss on primary failure, but higher write latency.
  • Asynchronous replication: The primary acknowledges immediately and replicates in the background. Lower write latency, but if the primary crashes before replicating, the most recent writes are lost, and replicas trail the primary by a small delay (replication lag).

Replication lag means replicas may serve slightly stale data. For critical reads (e.g., “was my transaction successful?”), route to the primary.
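One simple mitigation is to pin a session's reads to the primary for a short window after that session writes. A sketch of the routing logic (the class and pinning policy are hypothetical, not a specific library's API):

```python
import random
import time

class ReadRouter:
    """Route reads to replicas, except for sessions that wrote recently."""

    def __init__(self, primary, replicas, pin_seconds=5.0):
        self.primary = primary
        self.replicas = replicas
        self.pin_seconds = pin_seconds  # assumed upper bound on replication lag
        self._last_write = {}           # session_id -> time of that session's last write

    def node_for_write(self, session_id):
        self._last_write[session_id] = time.monotonic()
        return self.primary             # single write path

    def node_for_read(self, session_id):
        last = self._last_write.get(session_id)
        if last is not None and time.monotonic() - last < self.pin_seconds:
            return self.primary         # read-your-writes: lag could hide the write
        return random.choice(self.replicas)  # stale-tolerant read

router = ReadRouter("primary", ["replica-1", "replica-2"])
router.node_for_write("alice")
print(router.node_for_read("alice"))  # pinned to the primary
print(router.node_for_read("bob"))    # served by any replica
```

The pin window trades read scalability for freshness only for the sessions that need it; everyone else keeps reading from replicas.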

Multi-primary (multi-leader) replication: multiple nodes accept writes simultaneously. This requires conflict resolution when two nodes receive conflicting writes to the same row. It is complex to operate correctly and rarely necessary; most systems can use primary-replica replication with a single write path.


The CAP theorem concerns three properties that every distributed data store must trade off:

  • Consistency (C): Every read receives the most recent write or an error.
  • Availability (A): Every request receives a response (not necessarily the most recent data).
  • Partition Tolerance (P): The system continues operating even when network partitions cause nodes to lose communication with each other.

The theorem: In the presence of a network partition, you must choose between consistency and availability. You cannot guarantee both.

| System Type | Tradeoff | Example Systems |
|---|---|---|
| CP (Consistent + Partition-Tolerant) | May reject requests during a partition | HBase, ZooKeeper, etcd |
| AP (Available + Partition-Tolerant) | May serve stale data during a partition | Cassandra, CouchDB, DynamoDB (eventual) |
| CA (Consistent + Available) | Not realistic in a distributed system; partitions always happen | Single-node RDBMS (no distribution) |

CAP only describes behavior during a partition. PACELC extends it: even when the network is healthy, you must still choose between latency and consistency. Lower latency usually means looser consistency guarantees.
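Dynamo-style stores expose this dial directly: with N replicas, a write waits for W acknowledgements and a read queries R replicas, and choosing W + R > N forces every read set to overlap every write set. A sketch of the arithmetic (parameter names follow the usual N/W/R convention):

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True if every read set must intersect every write set (W + R > N),
    so an acknowledged write is always visible to a subsequent quorum read."""
    return w + r > n

# N=3 replicas: two common configurations
print(quorum_overlaps(3, 2, 2))  # overlapping quorums: consistent, higher latency
print(quorum_overlaps(3, 1, 1))  # fastest, but a read can miss the latest write
```

Raising W or R buys consistency at the cost of waiting on more replicas per operation, which is exactly the latency-vs-consistency tradeoff PACELC describes.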


| Model | Guarantee | Example |
|---|---|---|
| Strong / linearizable | Every read sees the latest write | Single-node DB, etcd |
| Read-your-writes | A client always sees its own writes | Session-pinned reads |
| Monotonic reads | A client never reads older data than it has already seen | Read replica with session affinity |
| Eventual consistency | All replicas will converge to the same value, eventually | Cassandra, DynamoDB default |
| Causal consistency | Causally related operations are seen in order | MongoDB sessions |
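"Converge eventually" implies a deterministic merge rule, commonly last-write-wins. A sketch of an LWW register merge (the integer timestamps are illustrative; real systems order writes with hybrid or vector clocks):

```python
def lww_merge(a, b):
    """Merge two (value, timestamp) register states; the newer write wins."""
    return a if a[1] >= b[1] else b

replica_1 = ("blue", 1)   # older write
replica_2 = ("green", 2)  # later, conflicting write

# Anti-entropy in either direction produces the same state, so replicas converge.
print(lww_merge(replica_1, replica_2))
print(lww_merge(replica_2, replica_1))
```

Because the merge is commutative, it does not matter which replica syncs first or how many times states are exchanged; every replica ends at the same value.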

Sharding splits a single database into multiple independent nodes (shards), each holding a subset of the data.

User IDs         1 – 1,000,000 → Shard 1
User IDs 1,000,001 – 2,000,000 → Shard 2
User IDs 2,000,001 – 3,000,000 → Shard 3
| Strategy | How | Risk |
|---|---|---|
| Range-based | Partition by value range (e.g., user ID ranges) | Hot spots if new IDs concentrate on one shard |
| Hash-based | shard = hash(key) % num_shards | Even distribution, but range queries span all shards |
| Directory-based | Lookup table maps key → shard | Flexible, but the lookup table is a single point of failure |
Sharding comes at a cost:

  • Cross-shard JOINs are not supported by the database — data that needs to be queried together must live on the same shard
  • Resharding is painful — changing the shard count requires moving data across nodes
  • Transactions across shards require distributed transaction protocols (e.g., two-phase commit), which are slow and complex
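The hash-based strategy from the table above is only a few lines; the key point is to use a stable hash function, since Python's built-in hash() is salted per process and cannot be used for routing. A sketch assuming 4 shards and hypothetical "user:N" keys:

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # md5 is stable across processes and machines, unlike the built-in hash()
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Keys spread roughly evenly across shards, but a range query such as
# "users 1-1000" would have to fan out to every shard.
counts = Counter(shard_for(f"user:{i}") for i in range(10_000))
print(counts)
```

Note the `% NUM_SHARDS` step is what makes resharding painful: changing the shard count remaps almost every key, which is why systems that expect to grow often use consistent hashing instead.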

| Requirement | Recommended Approach |
|---|---|
| Complex relationships, transactions, data integrity | PostgreSQL or MySQL (RDBMS) |
| Flexible / evolving schema, nested JSON documents | MongoDB or Firestore |
| High-volume writes, time-ordered data, massive scale | Cassandra |
| Sub-millisecond reads, simple key-value access | Redis |
| Graph traversal, relationship queries | Neo4j or Amazon Neptune |
| Metrics, monitoring, IoT sensor data | InfluxDB or TimescaleDB |
| Serverless / managed, low-ops key-value | DynamoDB |