Understanding Database Replication and WAL Through Real-World Scenarios

#databases #system-design #distributed-systems

Database replication and Write-Ahead Logging (WAL) are fundamental concepts in building scalable, reliable systems. Rather than diving into abstract theory, let’s learn these concepts through real-world scenarios that show why these technologies exist and when to use different approaches.

Table of Contents

  1. The Growing Startup: Why Replication Exists
  2. The Replication Challenge: How Do Servers Stay in Sync?
  3. Synchronous vs Asynchronous: Making the Trade-off
  4. The Crash Problem: Why WAL Exists
  5. Connecting WAL to Replication
  6. Replication Lag in Production
  7. When Things Break: Replica Failures
  8. When Things Break: Primary Failures
  9. WAL Retention Strategies

Scenario 1: The Growing Startup (Why Replication Exists) {#scenario-1-the-growing-startup}

Imagine you’re building a recipe-sharing app called “CookSnap”. You start with a single database server in AWS us-east-1.

Month 1: 1,000 users, everything runs smoothly. Your single database handles all reads and writes without breaking a sweat.

Month 6: 100,000 users. You start noticing problems:

The Problem

Your single database server has become a bottleneck. With 100,000 users browsing recipes, that one server simply can’t keep up with the read load.

The Solution: Replication

The idea is simple: create copies of your database on multiple servers.

Before:                          After:
┌──────────┐                    ┌──────────┐
│ Server A │                    │ Server A │ (Primary)
│          │                    │          │
│ Database │                    │ Database │
└──────────┘                    └──────────┘

                                      ├─────────────┬─────────────┐
                                      ▼             ▼             ▼
                                ┌──────────┐  ┌──────────┐  ┌──────────┐
                                │ Server B │  │ Server C │  │ Server D │
                                │ (Replica)│  │ (Replica)│  │ (Replica)│
                                └──────────┘  └──────────┘  └──────────┘

Now you can distribute the read load across multiple servers:

A load balancer routes the traffic:

Reads  → 25% to A, 25% to B, 25% to C, 25% to D
Writes → 100% to A (only the primary can write)
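
In application code this split is often just a check on the statement type. Here’s a minimal sketch in Python - the server names and the `route` helper are illustrative, not a real driver API:

```python
import itertools

PRIMARY = "server-a"                       # accepts reads and writes
REPLICAS = ["server-b", "server-c", "server-d"]

# Rotate reads across every server; writes always hit the primary.
read_pool = itertools.cycle([PRIMARY] + REPLICAS)

def route(query: str) -> str:
    """Send writes to the primary; spread reads across all servers."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return PRIMARY if is_write else next(read_pool)

print(route("SELECT * FROM recipes"))             # rotates A, B, C, D
print(route("INSERT INTO recipes VALUES (...)"))  # always server-a
```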

Scenario 2: The Replication Challenge (How Do Servers Stay in Sync?) {#scenario-2-the-replication-challenge}

Now you have multiple database servers, but here’s the critical problem:

2:00 PM - User Alice posts "Chocolate Cake" recipe to Server A
2:01 PM - User Bob tries to view all recipes from Server B

Question: Will Bob see Alice’s new recipe?

Answer: Not automatically! Server B doesn’t magically know about changes that happened on Server A.

The Core Challenge

When a write happens on the Primary (Server A), the Replicas (B, C, D) need to be told about the change. This “telling” process is called replication.

The question becomes: HOW and WHEN does Server A notify the replicas?


Scenario 3: Synchronous vs Asynchronous Replication {#scenario-3-sync-vs-async}

You have two fundamental options for how replication works:

Option 1: Synchronous Replication

User posts recipe → Server A receives it
                 → Server A sends to Replicas B, C, D
                 → Server A waits for B, C, D to confirm "got it!"
                 → ONLY THEN does Server A tell user "recipe posted!"

Pros:

- No data loss: a write acknowledged to the user already exists on every replica
- Replicas stay fully consistent with the primary

Cons:

- Slower writes: every write waits on network round trips to the replicas
- One slow or crashed replica can stall all writes

Option 2: Asynchronous Replication

User posts recipe → Server A receives it
                 → Server A immediately tells user "recipe posted!"
                 → Server A sends data to B, C, D in background
                 → B, C, D update whenever they receive it

Pros:

- Fast writes: the user gets a response as soon as the primary commits
- A slow or failed replica never blocks writes

Cons:

- Replication lag: replicas briefly serve stale data
- If the primary crashes before replicating, acknowledged writes can be lost
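
The difference is purely about when the primary acknowledges. A toy model in Python makes it concrete - `replicate_to` here is a hypothetical stand-in for shipping a change over the network:

```python
import threading

def replicate_to(replica: str, entry: str) -> None:
    print(f"{replica} applied: {entry}")    # pretend network send + apply

def write_sync(entry: str, replicas: list[str]) -> str:
    for r in replicas:                      # wait for every replica...
        replicate_to(r, entry)
    return "recipe posted!"                 # ...before acknowledging

def write_async(entry: str, replicas: list[str]) -> str:
    for r in replicas:                      # ship in the background
        threading.Thread(target=replicate_to, args=(r, entry)).start()
    return "recipe posted!"                 # acknowledge immediately
```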

Making the Choice: CookSnap Example

Your CookSnap app has two types of writes:

Recipe Posts:

- Core user content: losing a recipe someone spent time writing is unacceptable
- Worth the extra latency of synchronous replication

Recipe Likes:

- High-volume, low-stakes events: losing a single like is harmless
- Asynchronous replication keeps them fast and cheap

Real-World Examples

Synchronous Replication:

- Banking and payment systems, where an acknowledged transaction must never vanish
- Inventory and order systems that cannot afford to oversell

Asynchronous Replication:

- Social feeds, likes, and view counters
- Analytics and logging pipelines, where brief staleness is acceptable


Scenario 4: The Crash Problem (Why WAL Exists) {#scenario-4-the-crash-problem}

Let’s zoom into what happens inside Server A when it processes a write. This reveals why the Write-Ahead Log (WAL) is necessary.

The Problem

2:00:00 PM - User posts "Chocolate Cake" recipe
2:00:01 PM - Server A starts writing to database file...
2:00:02 PM - 💥 POWER FAILURE - Server A crashes mid-write
2:00:05 PM - Server A reboots

What went wrong?

The database file might be corrupted. Maybe:

- Only half of the recipe’s row made it to disk
- The index was updated but the data page wasn’t (or vice versa)
- A page was partially overwritten, mixing old and new bytes

Half-written data = corrupted database. The server has no idea what state the database is in.

The Solution: Write-Ahead Log (WAL)

The core idea: Record what you’re about to do BEFORE you do it.

WAL is an append-only file where you write your “plan” before modifying the actual database:

Step 1: Write to WAL file (fast, simple append)
        "INSERT recipe_id=999, title='Chocolate Cake', 
         ingredients='flour, sugar, cocoa...'"
        
Step 2: Flush WAL to disk (make it durable)

Step 3: NOW update the actual database

Step 4: Mark in WAL "operation complete"
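
A minimal sketch of that write path, assuming a made-up JSON entry format and file name (real databases use compact binary records):

```python
import json, os

def write_with_wal(record: dict) -> None:
    # Steps 1-2: append the plan to the WAL and force it to disk
    with open("wal.log", "a") as wal:
        wal.write(json.dumps({"op": "insert", "record": record}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())              # durable before touching data

    apply_to_database(record)               # Step 3: the slow, complex part

    # Step 4: record that the operation finished
    with open("wal.log", "a") as wal:
        wal.write(json.dumps({"op": "commit", "id": record["id"]}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())

def apply_to_database(record: dict) -> None:
    ...                                     # update pages, indexes, etc.
```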

Recovery After Crash

When Server A reboots after the crash:

1. Read the WAL file
2. Check for incomplete operations
3. Found: recipe_id=999 started but not marked complete
4. REPLAY the operation: apply it to the database again
5. Everything is recovered!
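
Recovery is just a scan of that same file. Continuing the sketch above (same hypothetical entry format):

```python
import json

def recover(wal_path: str = "wal.log") -> None:
    started, committed = {}, set()
    with open(wal_path) as wal:
        for line in wal:
            entry = json.loads(line)
            if entry["op"] == "insert":
                started[entry["record"]["id"]] = entry["record"]
            elif entry["op"] == "commit":
                committed.add(entry["id"])

    for rec_id, record in started.items():
        if rec_id not in committed:         # started but never finished
            apply_to_database(record)       # replay it

def apply_to_database(record: dict) -> None:
    ...                                     # must be idempotent to replay safely
```

Note the requirement hiding in that last comment: replaying must be safe even if the change was partially applied before the crash, which is why real WALs are built around idempotent (often physical, page-level) records and log sequence numbers.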

Why WAL is Simple and Fast

WAL has specific design properties that make it reliable:

Append-only:

- Entries are only ever added at the end; existing entries are never modified
- No in-place updates that could corrupt earlier records

Sequential writes:

- Appending to the end of a file is the fastest kind of disk I/O
- Far cheaper than the random I/O needed to update database pages in place

Durable:

- Each entry is flushed (fsynced) to disk before the write is acknowledged
- After a crash, everything that was acknowledged is guaranteed to be in the WAL


Scenario 5: Connecting WAL to Replication {#scenario-5-connecting-wal-to-replication}

Here’s where everything comes together beautifully.

The Insight

Remember our replication problem? The Primary needs to tell Replicas about changes.

Question: What does the Primary send to the Replicas?

Answer: The Primary sends its WAL entries!

Why This is Elegant

Primary (Server A) already has:
- Every change recorded in sequential order
- Entries are small and easy to transmit
- Entries describe exact operations to perform

Replicas (Server B, C, D) can:
- Receive the WAL stream
- Write entries to their own WAL
- Replay the same operations on their databases

Complete Write Flow with Replication

User posts "Chocolate Cake" to Primary A:

1. Primary A writes to its WAL: "INSERT recipe_id=999..."
2. Primary A applies change to its database
3. Primary A streams WAL entry to Replicas B, C, D
4. Replicas receive WAL entry and write to their WAL
5. Replicas apply the change to their databases
6. Depending on sync/async:
   - SYNC: Wait for replicas to confirm → then tell user "success!"
   - ASYNC: Tell user "success!" immediately → replicas catch up
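
The replica side of that flow is a simple loop. A sketch, where `wal_stream` is a hypothetical iterator of (sequence number, entry) pairs arriving from the primary:

```python
def replica_loop(wal_stream) -> None:
    last_applied = 0
    for seq, entry in wal_stream:           # entries arrive in order
        append_to_local_wal(seq, entry)     # durability first, as always
        apply_to_database(entry)
        last_applied = seq                  # remembered for catch-up later

def append_to_local_wal(seq: int, entry: dict) -> None: ...
def apply_to_database(entry: dict) -> None: ...
```

That `last_applied` counter is exactly what makes replica crash recovery possible, as Scenario 7 shows.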

Replication is Just WAL Streaming

This is the key insight: replication isn’t a separate mechanism bolted onto the database. The primary simply streams the same WAL it already writes for crash recovery, and replicas replay it. One log provides both durability and replication.


Scenario 6: Replication Lag in Production {#scenario-6-replication-lag}

Your CookSnap app is using asynchronous replication for speed. One day you notice a confusing user experience:

2:00 PM - Alice posts "Chocolate Cake" on Primary A
2:00 PM - Primary A responds "success!" to Alice
2:00 PM - Alice clicks "view my recipes"
2:01 PM - Request goes to Replica B
         → Alice's new recipe isn't there! 😱

What Happened: Replication Lag

Replication lag is the time delay between:

- a write committing on the Primary, and
- that write becoming visible on a Replica

Time    Primary A         Replica B
2:00    Recipe #999 ✓     Recipe #998
2:01    Recipe #999 ✓     Recipe #998 (still catching up)
2:02    Recipe #999 ✓     Recipe #999 ✓ (finally!)

Why It Happens

Several factors cause lag:

Network latency:

- WAL entries must cross the network; cross-region links add tens of milliseconds or more

Replica load:

- A replica busy serving reads applies incoming WAL entries more slowly

Large transactions:

- A bulk update that generates gigabytes of WAL takes time to transmit and replay

Typical lag in production:

- Usually milliseconds to a second or two, but it can spike to minutes under heavy load or network trouble

Solutions for Consistency

Option 1: Read from Primary after write

- User writes to Primary
- For next 30 seconds, route that user's reads to Primary
- After 30 seconds, allow reads from replicas

Option 2: Session consistency

- Primary returns: "Write succeeded at timestamp 2:00:00.123"
- User's next read includes: "Give me data as of 2:00:00.123"
- Replica checks: "Do I have data up to that timestamp?"
- If no: wait or redirect to Primary

Option 3: Accept eventual consistency

- Show user a message: "Your post is being processed"
- Or: "Data may take a few seconds to appear everywhere"
- Best for non-critical features (likes, views, etc.)
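
Option 1 is simple enough to sketch. Here the pinning window and helper names are illustrative:

```python
import time

PIN_SECONDS = 30
last_write_at: dict[str, float] = {}        # user id -> time of last write

def record_write(user: str) -> None:
    last_write_at[user] = time.monotonic()

def pick_server(user: str) -> str:
    pinned = time.monotonic() - last_write_at.get(user, float("-inf")) < PIN_SECONDS
    return "primary" if pinned else "any-replica"

record_write("alice")
print(pick_server("alice"))                 # "primary" for the next 30 seconds
print(pick_server("bob"))                   # "any-replica"
```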

Scenario 7: When Things Break - Replica Failures {#scenario-7-replica-failures}

Production systems fail. Let’s see what happens when a replica crashes.

The Scenario

3:00 PM - Replica B crashes (hardware failure)
3:00 PM - Meanwhile, Primary A keeps processing writes
          (recipe #45,231, #45,232, #45,233... #48,500)
3:05 PM - Replica B reboots

The Problem

Replica B is now 5 minutes behind. It missed thousands of WAL entries.

How Recovery Works

When Replica B comes back online:

1. Replica B knows: "Last WAL entry I applied was #45,230"

2. Replica B connects to Primary A and says:
   "I need all WAL entries starting from #45,231"

3. Primary A checks: "Do I still have entry #45,231?"
   
   If YES:
   - Stream WAL entries #45,231 through #48,500 to B
   - Replica B applies all entries
   - Replica B is caught up!
   
   If NO (Primary deleted old WAL):
   - Replica B must do a FULL DATABASE COPY
   - This takes much longer (hours instead of minutes)
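
A sketch of that handshake, with the primary’s retained WAL modeled as a dict from entry number to entry:

```python
def catch_up(last_applied: int, primary_wal: dict[int, dict]) -> str:
    oldest_retained = min(primary_wal)      # oldest entry still on disk
    if last_applied + 1 < oldest_retained:
        return "full-copy-required"         # the gap can't be replayed
    for seq in range(last_applied + 1, max(primary_wal) + 1):
        apply(primary_wal[seq])             # replay everything missed
    return "caught-up"

def apply(entry: dict) -> None: ...

print(catch_up(45_230, {seq: {} for seq in range(45_231, 48_501)}))  # "caught-up"
```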

WAL Retention is Key

This is why WAL retention policy matters:

Short retention (1 hour):

- Minimal disk usage on the Primary
- A replica down for more than an hour needs a slow full database copy

Longer retention (24-72 hours):

- More disk usage
- Replicas can ride out maintenance windows and multi-hour outages, then catch up from WAL in minutes


Scenario 8: When Things Break - Primary Failures {#scenario-8-primary-failures}

This is the scariest failure scenario: the Primary crashes.

The Disaster

4:00 PM - Primary A crashes 💥
4:00 PM - Users can still READ (from replicas B, C, D)
4:00 PM - Users CANNOT WRITE (no primary!)

Your entire application is now read-only. Every “post recipe”, “add comment”, “like” button fails.

Failover: Promoting a Replica

You need to promote one replica to become the new Primary.

Step 1: Choose which replica to promote

Not all replicas are equally up-to-date:

When Primary crashed, WAL positions were:
- Primary A: Entry #48,500 (then crashed)
- Replica B: Entry #48,498 (2 entries behind)
- Replica C: Entry #48,450 (50 entries behind)
- Replica D: Entry #48,500 (fully caught up!)

Best choice: Promote Replica D (most up-to-date)
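
In code, the choice is a one-liner over each replica’s last applied WAL position:

```python
# Last WAL entry each surviving replica has applied
replica_positions = {"B": 48_498, "C": 48_450, "D": 48_500}

new_primary = max(replica_positions, key=replica_positions.get)
print(new_primary)                          # "D" - the most up-to-date
```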

Step 2: Promote Replica D

1. Stop Replica D from accepting read traffic
2. Configure D to accept writes (become Primary)
3. Point application to D as new Primary
4. Reconfigure B and C to replicate from D (not A)

Step 3: Deal with the old Primary

When Server A eventually comes back online:

- It must NOT resume acting as Primary - two primaries accepting writes is “split-brain”
- It is demoted to a replica of D, discarding any writes that were never replicated
- It then catches up from D’s WAL like any other replica

The Data Loss Problem

Asynchronous replication risk:

What if Primary A had:
- Entries #48,499 and #48,500
- But crashed before replicating them to any replica

And suppose the best surviving replica, B, only had entries up to #48,498 - so we promote B.

Result: Entries #48,499 and #48,500 are lost forever.

This is why critical data uses synchronous replication - it guarantees replicas have the data before acknowledging the write.

Automatic vs Manual Failover

Manual failover:

- A human confirms the Primary is really down, picks the best replica, and promotes it
- Slower (minutes of write downtime) but avoids mistaken promotions

Automatic failover:

- Monitoring detects the failure and promotes a replica within seconds
- Faster recovery, but a false alarm or network partition can trigger split-brain

Real systems often use: automatic detection + manual approval.


Scenario 9: WAL Retention Strategies {#scenario-9-wal-retention-strategies}

How long should you keep old WAL entries? This depends entirely on your use case.

Option 1: Keep WAL Only Until All Replicas Confirm

When to use:

- Very high write volume, where retained WAL would eat storage fast
- Low-criticality data on stable infrastructure

Scenario: High-Volume Logging System

System: Website clickstream analytics
Writes: 100,000 events/second
WAL growth: 5 TB per day
Data criticality: Low (losing recent clicks isn't catastrophic)
Infrastructure: Stable, replicas rarely crash

Decision: Delete WAL immediately after replication
Reason: Can't afford 5 TB/day storage forever

Trade-offs:

- Minimal storage cost
- Any replica that falls behind, even briefly, needs a full database copy to recover

Option 2: Keep WAL for 24-72 Hours

When to use:

- Moderate write volume with data users genuinely care about
- Infrastructure that needs maintenance windows and occasionally crashes

Scenario: CookSnap Recipe App

System: Social recipe sharing platform
Writes: 1,000-5,000/second
WAL growth: 50-100 GB per day
Data criticality: Medium (user content matters)
Operations: Occasional replica maintenance/crashes

Decision: Keep 48 hours of WAL
Reason: Balance between safety and cost

Trade-offs:

- A predictable, bounded storage cost (a few days of WAL)
- Replicas recover from most outages via fast WAL catch-up instead of full copies

Scenario: E-commerce Platform

System: Online store
Writes: 10,000 orders/hour
WAL growth: 200 GB per day
Maintenance: Weekly replica updates (4-6 hours)

Decision: Keep 72 hours of WAL
Reason: Maintenance windows need safe buffer

Option 3: Keep WAL Forever (Archive)

When to use:

- Regulatory or audit requirements to reconstruct any historical state
- Systems that need point-in-time recovery to an arbitrary moment

Scenario: Banking System

System: Core banking transactions
Writes: 50,000 transactions/hour
Compliance: Must audit any historical transaction
Regulation: Point-in-time recovery required

Decision: Archive all WAL forever
Reason: Legal requirement, cost of storage < cost of non-compliance

Trade-offs:

- Storage grows without bound, so old WAL segments are shipped to cheap archive storage
- Restoring to a distant point in time means replaying a long history

Scenario: Healthcare Records (HIPAA)

System: Electronic health records
Compliance: HIPAA requires 7-year retention
Recovery: Must prove exact record state at any time

Decision: Archive WAL for 7 years
Reason: Legal mandate

Decision Tree: Which Option to Choose?

Step 1: Calculate WAL growth

WAL growth/day = writes/second × entry size × 86,400 seconds

Example:
1,000 writes/sec × 1 KB/entry × 86,400 = 86 GB/day
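
The same arithmetic in code, so you can plug in your own numbers:

```python
writes_per_sec = 1_000
entry_kb = 1                                # average WAL entry size
gb_per_day = writes_per_sec * entry_kb * 86_400 / 1e6
print(f"{gb_per_day:.1f} GB/day")           # 86.4 GB/day
```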

Step 2: Assess data criticality

Low (metrics, logs):           → Option 1
Medium (user content):         → Option 2
High (financial, health):      → Option 3

Step 3: Evaluate operational needs

Stable infrastructure, no maintenance:     → Option 1
Regular maintenance, occasional crashes:   → Option 2
Compliance requirements:                   → Option 3

Step 4: Consider budget

Tight budget:                  → Option 1
Normal operations budget:      → Option 2
Compliance/regulatory budget:  → Option 3
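
The whole tree collapses to a tiny function - the category names are just the ones used in the steps above:

```python
def retention_policy(criticality: str, compliance: bool, tight_budget: bool) -> str:
    if compliance or criticality == "high":
        return "archive forever"            # Option 3
    if criticality == "low" or tight_budget:
        return "delete once replicated"     # Option 1
    return "keep 24-72 hours"               # Option 2

print(retention_policy("medium", compliance=False, tight_budget=False))
# -> keep 24-72 hours
```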

Summary: Key Takeaways

Replication

Purpose:

- Scale reads by serving them from multiple servers
- Keep the application available when individual servers fail

Architecture:

- One Primary accepts all writes
- Replicas receive the Primary’s changes and serve reads

Synchronous vs Asynchronous

Synchronous:

- The Primary waits for replicas to confirm before acknowledging a write
- Zero data loss, at the cost of slower writes

Asynchronous:

- The Primary acknowledges immediately and replicates in the background
- Fast writes, at the cost of lag and possible data loss on failover

Write-Ahead Log (WAL)

Purpose:

- Record every change before applying it, so a crash never leaves the database in an unknown state
- Double as the stream the Primary sends to replicas

Properties:

- Append-only, sequential, and flushed to disk before the write is acknowledged

WAL Retention

Short (until replicated):

- Cheapest, but a lagging replica forces a full database copy

Medium (24-72 hours):

- Balanced: replicas recover from outages and maintenance via WAL catch-up

Long (archived forever):

- For compliance, auditing, and point-in-time recovery

Failure Scenarios

Replica failure:

- The replica reconnects and requests every WAL entry after the last one it applied

Primary failure:

- Promote the most up-to-date replica, repoint the others, and demote the old Primary to a replica when it returns


Final Thoughts

Database replication and WAL are not just theoretical concepts - they’re practical solutions to real problems you’ll face when building scalable systems:

- Replication answers: how do I serve more reads and survive server failures?
- WAL answers: how do I survive crashes without corrupting data?
- Together they answer: how do replicas stay in sync and catch up after outages?

The best way to internalize these concepts is to think through scenarios in your own domain. When designing your next system, ask yourself:

  1. What’s my read/write ratio?
  2. How critical is each type of data?
  3. What happens if I lose the last 5 seconds of writes?
  4. How long can I afford for a replica to be down?
  5. What compliance requirements do I have?

These questions will guide you to the right replication strategy for your use case.


Want to dive deeper? Explore PostgreSQL’s replication documentation, MySQL’s binary logs, or MongoDB’s replica sets to see these concepts implemented in real database systems.

About the Author

Aniket Indulkar is an Android Engineer based in London with a Master's in Artificial Intelligence. He writes about AI, ML, Android development, and his continuous learning journey.
