Understanding Database Replication and WAL Through Real-World Scenarios

Learn database replication, synchronous vs asynchronous replication, and Write-Ahead Logging (WAL) through practical system design scenarios

#Databases #System-design #Learning-log


Database replication and Write-Ahead Logging (WAL) are fundamental concepts in building scalable, reliable systems. Rather than diving into abstract theory, let’s learn these concepts through real-world scenarios that show why these technologies exist and when to use different approaches.

Table of Contents

  1. The Growing Startup: Why Replication Exists
  2. The Replication Challenge: How Do Servers Stay in Sync?
  3. Synchronous vs Asynchronous: Making the Trade-off
  4. The Crash Problem: Why WAL Exists
  5. Connecting WAL to Replication
  6. Replication Lag in Production
  7. When Things Break: Replica Failures
  8. When Things Break: Primary Failures
  9. WAL Retention Strategies

Scenario 1: The Growing Startup (Why Replication Exists) {#scenario-1-the-growing-startup}

Imagine you’re building a recipe-sharing app called “CookSnap”. You start with a single database server in AWS us-east-1.

Month 1: 1,000 users, everything runs smoothly. Your single database handles all reads and writes without breaking a sweat.

Month 6: 100,000 users. You start noticing problems:

  • Page loads are getting slower
  • Database CPU is consistently at 90% during peak hours
  • Most requests are reads (people browsing recipes): ~95%
  • Only a small fraction are writes (creating new recipes, comments): ~5%

The Problem

Your single database server has become a bottleneck. With 100,000 concurrent users trying to browse recipes, that one server simply can’t keep up with the read load.

The Solution: Replication

The idea is simple: create copies of your database on multiple servers.

Before:                          After:
┌──────────┐                    ┌──────────┐
│ Server A │                    │ Server A │ (Primary)
│          │                    │          │
│ Database │                    │ Database │
└──────────┘                    └──────────┘

                                      ├─────────────┬─────────────┐
                                      ▼             ▼             ▼
                                ┌──────────┐  ┌──────────┐  ┌──────────┐
                                │ Server B │  │ Server C │  │ Server D │
                                │ (Replica)│  │ (Replica)│  │ (Replica)│
                                └──────────┘  └──────────┘  └──────────┘

Now you can distribute the read load across multiple servers:

  • Server A (Primary): Handles all writes + some reads
  • Servers B, C, D (Replicas): Handle reads only

A load balancer distributes read traffic:

Reads  → 25% to A, 25% to B, 25% to C, 25% to D
Writes → 100% to A (only the primary can write)
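With hypothetical server labels, that routing rule fits in a few lines. This is a sketch of the load balancer's decision, not any particular proxy's implementation:

```python
import itertools

# Hypothetical server pool: one primary, three read replicas.
PRIMARY = "server-a"
REPLICAS = ["server-b", "server-c", "server-d"]

# Round-robin reads over all four servers (primary included).
_read_cycle = itertools.cycle([PRIMARY] + REPLICAS)

def route(query: str) -> str:
    """Send writes to the primary; spread reads across every server."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return PRIMARY if is_write else next(_read_cycle)
```

A real proxy would parse SQL properly and handle transactions, but the core split is the same: writes have exactly one destination, reads have many.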

Scenario 2: The Replication Challenge (How Do Servers Stay in Sync?) {#scenario-2-the-replication-challenge}

Now you have multiple database servers, but here’s the critical problem:

2:00 PM - User Alice posts "Chocolate Cake" recipe to Server A
2:01 PM - User Bob tries to view all recipes from Server B

Question: Will Bob see Alice’s new recipe?

Answer: Not automatically! Server B doesn’t magically know about changes that happened on Server A.

The Core Challenge

When a write happens on the Primary (Server A), the Replicas (B, C, D) need to be told about the change. This “telling” process is called replication.

The question becomes: HOW and WHEN does Server A notify the replicas?


Scenario 3: Synchronous vs Asynchronous Replication {#scenario-3-sync-vs-async}

You have two fundamental options for how replication works:

Option 1: Synchronous Replication

User posts recipe → Server A receives it
                 → Server A sends to Replicas B, C, D
                 → Server A waits for B, C, D to confirm "got it!"
                 → ONLY THEN does Server A tell user "recipe posted!"

Pros:

  • Guaranteed consistency: All servers have the data before user sees success
  • No data loss: If Primary crashes, replicas definitely have the data

Cons:

  • Slower writes: Must wait for network round-trip to all replicas
  • Availability issues: If one replica is slow or down, ALL writes slow down

Option 2: Asynchronous Replication

User posts recipe → Server A receives it
                 → Server A immediately tells user "recipe posted!"
                 → Server A sends data to B, C, D in background
                 → B, C, D update whenever they receive it

Pros:

  • Faster writes: User gets immediate response
  • Better availability: Slow/failed replicas don’t block writes

Cons:

  • Possible data loss: If Primary crashes before replicating, data is lost
  • Eventual consistency: Replicas are slightly behind
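The two options can be sketched side by side. `Replica` and `Primary` are toy classes, and the `network_delay` sleep stands in for the round-trip that synchronous replication must wait out:

```python
import time

class Replica:
    """Toy replica: applying an entry costs one simulated network round-trip."""
    def __init__(self, name, network_delay=0.01):
        self.name = name
        self.network_delay = network_delay
        self.data = []

    def apply(self, entry):
        time.sleep(self.network_delay)  # simulated network + apply time
        self.data.append(entry)
        return "ack"

class Primary:
    def __init__(self, replicas):
        self.replicas = replicas
        self.data = []

    def write_sync(self, entry):
        """Option 1: confirm on every replica before acknowledging the user."""
        self.data.append(entry)
        for replica in self.replicas:
            assert replica.apply(entry) == "ack"  # blocks on each round-trip
        return "recipe posted!"

    def write_async(self, entry, pending):
        """Option 2: acknowledge immediately; a background worker replicates later."""
        self.data.append(entry)
        pending.append(entry)  # entries here are lost if the primary crashes now
        return "recipe posted!"
```

Notice that `write_sync` latency is the sum of the slowest replicas' round-trips, while `write_async` returns before any replica has the data; that gap is exactly the data-loss window described above.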

Making the Choice: CookSnap Example

Your CookSnap app has two types of writes:

Recipe Posts:

  • Users would be very upset if their carefully written recipe disappeared
  • Writes are infrequent (users don’t post recipes constantly)
  • Choice: Synchronous replication
  • Trade-off: Slightly slower post time, but guaranteed safety

Recipe Likes:

  • Losing a few likes is annoying but not catastrophic
  • Writes are very frequent (users clicking like constantly)
  • Choice: Asynchronous replication
  • Trade-off: Super fast, but a few likes might be lost on crash

Real-World Examples

Synchronous Replication:

  • Banking transactions
  • E-commerce orders
  • Medical records
  • Any data where loss is unacceptable

Asynchronous Replication:

  • Social media likes/views
  • Analytics and metrics
  • Cache invalidation
  • Non-critical user actions

Scenario 4: The Crash Problem (Why WAL Exists) {#scenario-4-the-crash-problem}

Let’s zoom into what happens inside Server A when it processes a write. This reveals why Write-Ahead Log (WAL) is necessary.

The Problem

2:00:00 PM - User posts "Chocolate Cake" recipe
2:00:01 PM - Server A starts writing to database file...
2:00:02 PM - 💥 POWER FAILURE - Server A crashes mid-write
2:00:05 PM - Server A reboots

What went wrong?

The database file might be corrupted. Maybe:

  • Recipe title was written: ✓
  • Ingredients were half-written: ⚠️
  • Instructions weren’t written at all: ✗

Half-written data = corrupted database. The server has no idea what state the database is in.

The Solution: Write-Ahead Log (WAL)

The core idea: Record what you’re about to do BEFORE you do it.

WAL is an append-only file where you write your “plan” before modifying the actual database:

Step 1: Write to WAL file (fast, simple append)
        "INSERT recipe_id=999, title='Chocolate Cake', 
         ingredients='flour, sugar, cocoa...'"
        
Step 2: Flush WAL to disk (make it durable)

Step 3: NOW update the actual database

Step 4: Mark in WAL "operation complete"
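Those four steps can be sketched in a few lines of Python. The file name and the one-JSON-record-per-line format with a `status` field are illustrative, not any real engine's WAL format:

```python
import json
import os

WAL_PATH = "cooksnap.wal"  # illustrative file name

def wal_append(path, operation):
    """Steps 1-2: append the record and flush it all the way to disk
    before the actual database file is touched."""
    with open(path, "a") as wal:
        wal.write(json.dumps(operation) + "\n")
        wal.flush()
        os.fsync(wal.fileno())  # durable before we report success

if os.path.exists(WAL_PATH):  # start fresh for this demo
    os.remove(WAL_PATH)

# Steps 1-2: log the plan durably
wal_append(WAL_PATH, {"op": "INSERT", "recipe_id": 999,
                      "title": "Chocolate Cake", "status": "begin"})
# Step 3: ...apply the change to the real database file here...
# Step 4: mark the operation complete
wal_append(WAL_PATH, {"op": "INSERT", "recipe_id": 999, "status": "complete"})
```

The `fsync` call is the whole point: the operating system may buffer writes in memory, and only after `fsync` returns is the record guaranteed to survive a power failure.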

Recovery After Crash

When Server A reboots after the crash:

1. Read the WAL file
2. Check for incomplete operations
3. Found: recipe_id=999 started but not marked complete
4. REPLAY the operation: apply it to the database again
5. Everything is recovered!
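The replay logic above is just a scan for `begin` records without a matching `complete`. Assuming an illustrative WAL format of one JSON object per line with a `status` marker, a sketch looks like this:

```python
import json

def recover(wal_path, database):
    """Replay any operation whose 'begin' record never got a matching
    'complete' record (steps 1-5 above)."""
    begun, completed = {}, set()
    with open(wal_path) as wal:
        for line in wal:
            record = json.loads(line)
            if record["status"] == "begin":
                begun[record["recipe_id"]] = record
            else:
                completed.add(record["recipe_id"])
    for recipe_id, record in begun.items():
        if recipe_id not in completed:
            database[recipe_id] = record  # replay the incomplete operation
    return database
```

Replaying a completed operation again would also be safe here, which is why real engines make WAL replay idempotent: recovering twice must give the same result as recovering once.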

Why WAL is Simple and Fast

WAL has specific design properties that make it reliable:

Append-only:

  • New entries are always added to the end
  • No seeking around on disk (very fast)
  • Old entries never modified (can’t corrupt existing data)

Sequential writes:

  • Sequential disk writes are far faster than random writes (often by orders of magnitude on spinning disks)
  • Even modest hardware can sustain very high append throughput

Durable:

  • WAL is flushed to disk before saying “success”
  • Even if database file corrupts, WAL survives

Scenario 5: Connecting WAL to Replication {#scenario-5-connecting-wal-to-replication}

Here’s where everything comes together beautifully.

The Insight

Remember our replication problem? The Primary needs to tell Replicas about changes.

Question: What does the Primary send to the Replicas?

Answer: The Primary sends its WAL entries!

Why This is Elegant

Primary (Server A) already has:
- Every change recorded in sequential order
- Entries are small and easy to transmit
- Entries describe exact operations to perform

Replicas (Server B, C, D) can:
- Receive the WAL stream
- Write entries to their own WAL
- Replay the same operations on their databases

Complete Write Flow with Replication

User posts "Chocolate Cake" to Primary A:

1. Primary A writes to its WAL: "INSERT recipe_id=999..."
2. Primary A applies change to its database
3. Primary A streams WAL entry to Replicas B, C, D
4. Replicas receive WAL entry and write to their WAL
5. Replicas apply the change to their databases
6. Depending on sync/async:
   - SYNC: Wait for replicas to confirm → then tell user "success!"
   - ASYNC: Tell user "success!" immediately → replicas catch up

Replication is Just WAL Streaming

This is the key insight:

  • WAL was invented for crash recovery
  • But it turned out to be perfect for replication too
  • One mechanism solves both problems

Scenario 6: Replication Lag in Production {#scenario-6-replication-lag}

Your CookSnap app is using asynchronous replication for speed. One day you notice a confusing user experience:

2:00 PM - Alice posts "Chocolate Cake" on Primary A
2:00 PM - Primary A responds "success!" to Alice
2:00 PM - Alice clicks "view my recipes"
2:01 PM - Request goes to Replica B
         → Alice's new recipe isn't there! 😱

What Happened: Replication Lag

Replication lag is the time delay between:

  • Primary recording a change (in its WAL)
  • Replica applying that change (from the WAL stream)

Time    Primary A         Replica B
2:00    Recipe #999 ✓     Recipe #998
2:01    Recipe #999 ✓     Recipe #998 (still catching up)
2:02    Recipe #999 ✓     Recipe #999 ✓ (finally!)

Why It Happens

Several factors cause lag:

Network latency:

  • Streaming WAL over network takes time
  • Cross-region replicas: 100-300ms delay

Replica load:

  • If replica is busy serving reads, it’s slower to apply WAL

Large transactions:

  • A big write (uploading 1000 recipes) takes time to replicate

Typical lag in production:

  • Same datacenter: 10-100 milliseconds
  • Cross-region: 100-500 milliseconds
  • Under heavy load: 1-5 seconds
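A monitoring check can turn WAL positions into an estimated lag and compare it against those ranges. The apply rate and thresholds below are illustrative, not universal constants:

```python
def lag_status(primary_entry, replica_entry, entries_per_second=1_000):
    """Estimate replication lag in seconds from WAL entry positions
    and classify it against typical production ranges."""
    lag_seconds = (primary_entry - replica_entry) / entries_per_second
    if lag_seconds < 0.1:
        return lag_seconds, "normal (same-datacenter range)"
    if lag_seconds < 0.5:
        return lag_seconds, "normal (cross-region range)"
    return lag_seconds, "high: consider routing reads to the primary"
```

Real databases expose lag directly (in bytes, entries, or seconds), but the alerting logic is the same: compare the replica's applied position to the primary's current position.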

Solutions for Consistency

Option 1: Read from Primary after write

- User writes to Primary
- For next 30 seconds, route that user's reads to Primary
- After 30 seconds, allow reads from replicas

Option 2: Session consistency

- Primary returns: "Write succeeded at timestamp 2:00:00.123"
- User's next read includes: "Give me data as of 2:00:00.123"
- Replica checks: "Do I have data up to that timestamp?"
- If no: wait or redirect to Primary

Option 3: Accept eventual consistency

- Show user a message: "Your post is being processed"
- Or: "Data may take a few seconds to appear everywhere"
- Best for non-critical features (likes, views, etc.)
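Option 1 is the simplest to implement. Here is a sketch of a per-user router; the 30-second pin window and the "primary"/"replica" labels are illustrative choices, not fixed values:

```python
import time

class SessionRouter:
    """After a user writes, pin that user's reads to the primary
    for a grace period so they always see their own writes."""

    def __init__(self, grace_seconds=30):
        self.grace = grace_seconds
        self.last_write = {}  # user_id -> timestamp of last write

    def record_write(self, user_id, now=None):
        self.last_write[user_id] = time.time() if now is None else now

    def route_read(self, user_id, now=None):
        now = time.time() if now is None else now
        recent = self.last_write.get(user_id)
        if recent is not None and now - recent < self.grace:
            return "primary"  # read-your-writes guaranteed
        return "replica"      # lag is acceptable for this read
```

The grace period should comfortably exceed your worst observed replication lag; if lag under load reaches 5 seconds, a 30-second pin leaves a wide safety margin.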

Scenario 7: When Things Break - Replica Failures {#scenario-7-replica-failures}

Production systems fail. Let’s see what happens when a replica crashes.

The Scenario

3:00 PM - Replica B crashes (hardware failure)
3:00 PM - Meanwhile, Primary A keeps processing writes
          (recipe #45,231, #45,232, #45,233... #48,500)
3:05 PM - Replica B reboots

The Problem

Replica B is now 5 minutes behind. It missed thousands of WAL entries.

How Recovery Works

When Replica B comes back online:

1. Replica B knows: "Last WAL entry I applied was #45,230"

2. Replica B connects to Primary A and says:
   "I need all WAL entries starting from #45,231"

3. Primary A checks: "Do I still have entry #45,231?"
   
   If YES:
   - Stream WAL entries #45,231 through #48,500 to B
   - Replica B applies all entries
   - Replica B is caught up!
   
   If NO (Primary deleted old WAL):
   - Replica B must do a FULL DATABASE COPY
   - This takes much longer (hours instead of minutes)
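That handshake can be sketched with the entry numbers from the scenario. `PrimaryWAL` is a toy in-memory stand-in, and the retention floor models the "do I still have entry #45,231?" check:

```python
class PrimaryWAL:
    """Toy primary WAL keyed by entry number, with a retention floor."""

    def __init__(self, entries, oldest_retained):
        self.entries = entries        # {entry_no: payload}
        self.oldest = oldest_retained

    def entries_from(self, start):
        if start < self.oldest:
            return None  # WAL already deleted: replica needs a full copy
        last = max(self.entries)
        return [self.entries[n] for n in range(start, last + 1)]

def replica_recover(replica_applied_upto, primary):
    """A rebooted replica asks for everything after its last applied entry."""
    missing = primary.entries_from(replica_applied_upto + 1)
    if missing is None:
        return "FULL DATABASE COPY required"
    return f"caught up by replaying {len(missing)} entries"
```

The asymmetry is stark: replaying a few thousand entries takes seconds, while a full database copy of a large dataset can take hours and saturate the network in the process.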

WAL Retention is Key

This is why WAL retention policy matters:

Short retention (1 hour):

  • Replica must recover within 1 hour
  • Otherwise needs full database copy

Longer retention (24-72 hours):

  • Replica can be down for maintenance
  • Can take time debugging issues
  • More forgiving for recovery

Scenario 8: When Things Break - Primary Failures {#scenario-8-primary-failures}

This is the scariest failure scenario: the Primary crashes.

The Disaster

4:00 PM - Primary A crashes 💥
4:00 PM - Users can still READ (from replicas B, C, D)
4:00 PM - Users CANNOT WRITE (no primary!)

Your entire application is now read-only. Every “post recipe”, “add comment”, “like” button fails.

Failover: Promoting a Replica

You need to promote one replica to become the new Primary.

Step 1: Choose which replica to promote

Not all replicas are equally up-to-date:

When Primary crashed, WAL positions were:
- Primary A: Entry #48,500 (then crashed)
- Replica B: Entry #48,498 (2 entries behind)
- Replica C: Entry #48,450 (50 entries behind)
- Replica D: Entry #48,500 (fully caught up!)

Best choice: Promote Replica D (most up-to-date)
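The selection rule is simply "highest applied WAL position". As a sketch, with the positions from the example (replica names are illustrative):

```python
def choose_new_primary(wal_positions):
    """Promote the replica whose applied WAL position is highest."""
    return max(wal_positions, key=wal_positions.get)

positions = {"replica-b": 48_498, "replica-c": 48_450, "replica-d": 48_500}
print(choose_new_primary(positions))  # → replica-d
```

Real failover tooling adds tie-breakers (network locality, hardware class, replica health), but losing the fewest WAL entries is always the first criterion.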

Step 2: Promote Replica D

1. Stop Replica D from accepting read traffic
2. Configure D to accept writes (become Primary)
3. Point application to D as new Primary
4. Reconfigure B and C to replicate from D (not A)

Step 3: Deal with the old Primary

When Server A eventually comes back online:

  • A cannot be Primary (D is Primary now)
  • A must become a Replica
  • A must catch up from D’s WAL stream

The Data Loss Problem

Asynchronous replication risk:

What if Primary A had:
- Entries #48,499 and #48,500
- But crashed before replicating them

And we promoted Replica B which only had up to #48,498

Result: Entries #48,499 and #48,500 are lost forever.

This is why critical data uses synchronous replication - it guarantees replicas have the data before acknowledging the write.

Automatic vs Manual Failover

Manual failover:

  • DBA/engineer manually promotes replica
  • Downtime: 5-30 minutes
  • Safe: human verifies data consistency

Automatic failover:

  • System detects Primary failure
  • Automatically promotes best replica
  • Downtime: 30 seconds - 2 minutes
  • Risk: automation might make wrong choice

Real systems often use: automatic detection + manual approval.


Scenario 9: WAL Retention Strategies {#scenario-9-wal-retention-strategies}

How long should you keep old WAL entries? This depends entirely on your use case.

Option 1: Keep WAL Only Until All Replicas Confirm

When to use:

Scenario: High-Volume Logging System

System: Website clickstream analytics
Writes: 100,000 events/second
WAL growth: 5 TB per day
Data criticality: Low (losing recent clicks isn't catastrophic)
Infrastructure: Stable, replicas rarely crash

Decision: Delete WAL immediately after replication
Reason: Can't afford 5 TB/day storage forever

Trade-offs:

  • ✓ Minimal disk usage
  • ✓ Lower costs
  • ✗ A replica that misses any deleted WAL entries needs a full database copy
  • ✗ Adding a new replica also requires a full copy

Option 2: Keep WAL for 24-72 Hours

When to use:

Scenario: CookSnap Recipe App

System: Social recipe sharing platform
Writes: 1,000-5,000/second
WAL growth: 50-100 GB per day
Data criticality: Medium (user content matters)
Operations: Occasional replica maintenance/crashes

Decision: Keep 48 hours of WAL
Reason: Balance between safety and cost

Trade-offs:

  • ✓ Replicas can recover from multi-hour downtime
  • ✓ Can add new replicas anytime
  • ✓ Time to debug issues without pressure
  • ✗ ~100-200 GB storage needed
  • ✗ No point-in-time recovery beyond 48 hours

Scenario: E-commerce Platform

System: Online store
Writes: 10,000 orders/hour
WAL growth: 200 GB per day
Maintenance: Weekly replica updates (4-6 hours)

Decision: Keep 72 hours of WAL
Reason: Maintenance windows need safe buffer

Option 3: Keep WAL Forever (Archive)

When to use:

Scenario: Banking System

System: Core banking transactions
Writes: 50,000 transactions/hour
Compliance: Must audit any historical transaction
Regulation: Point-in-time recovery required

Decision: Archive all WAL forever
Reason: Legal requirement, cost of storage < cost of non-compliance

Trade-offs:

  • ✓ Complete audit trail
  • ✓ Point-in-time recovery to any moment
  • ✓ Can answer “what was account balance on June 15, 2023 at 2:37 PM?”
  • ✗ Significant storage costs (TBs to PBs)
  • ✗ Complex archival system needed

Scenario: Healthcare Records (HIPAA)

System: Electronic health records
Compliance: HIPAA requires 7-year retention
Recovery: Must prove exact record state at any time

Decision: Archive WAL for 7 years
Reason: Legal mandate

Decision Tree: Which Option to Choose?

Step 1: Calculate WAL growth

WAL growth/day = writes/second × entry size × 86,400 seconds

Example:
1,000 writes/sec × 1 KB/entry × 86,400 = 86 GB/day
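As a quick sanity check in code (decimal units, with 1 KB treated as 1,000 bytes):

```python
def wal_growth_gb_per_day(writes_per_second, entry_size_bytes):
    """writes/second x entry size x 86,400 seconds, in decimal GB."""
    return writes_per_second * entry_size_bytes * 86_400 / 1e9

# The example above: 1,000 writes/sec at 1 KB per entry
print(wal_growth_gb_per_day(1_000, 1_000))  # → 86.4
```

Multiply by your chosen retention window to get the disk budget: 86.4 GB/day at 48 hours of retention is roughly 173 GB of WAL on hand.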

Step 2: Assess data criticality

Low (metrics, logs):           → Option 1
Medium (user content):         → Option 2
High (financial, health):      → Option 3

Step 3: Evaluate operational needs

Stable infrastructure, no maintenance:     → Option 1
Regular maintenance, occasional crashes:   → Option 2
Compliance requirements:                   → Option 3

Step 4: Consider budget

Tight budget:                  → Option 1
Normal operations budget:      → Option 2
Compliance/regulatory budget:  → Option 3

Summary: Key Takeaways

Replication

Purpose:

  • Distribute read load across multiple servers
  • Provide redundancy for high availability
  • Enable geographic distribution

Architecture:

  • Primary (handles all writes)
  • Replicas (handle reads, receive WAL stream)
  • Load balancer distributes read traffic

Synchronous vs Asynchronous

Synchronous:

  • Wait for replicas to confirm before acknowledging write
  • Guarantees consistency, no data loss
  • Slower writes, availability depends on all replicas
  • Use for: critical data (orders, transactions, important user content)

Asynchronous:

  • Acknowledge write immediately, replicate in background
  • Fast writes, high availability
  • Possible data loss, eventual consistency
  • Use for: non-critical data (likes, views, metrics)

Write-Ahead Log (WAL)

Purpose:

  • Crash recovery: replay incomplete operations
  • Replication: stream changes to replicas
  • Point-in-time recovery: restore to any moment

Properties:

  • Append-only (fast, can’t corrupt)
  • Sequential writes (far faster than random I/O)
  • Durable (flushed to disk before success)

WAL Retention

Short (until replicated):

  • Minimal cost, high risk
  • Use for: high-volume, low-criticality data

Medium (24-72 hours):

  • Balanced approach, most common
  • Use for: typical web applications

Long (archived forever):

  • Maximum safety, high cost
  • Use for: compliance, audit requirements

Failure Scenarios

Replica failure:

  • Replica recovers using Primary’s WAL
  • If WAL expired: full database copy needed

Primary failure:

  • Promote most up-to-date replica
  • Possible data loss with async replication
  • Downtime during failover

Final Thoughts

Database replication and WAL are not just theoretical concepts - they’re practical solutions to real problems you’ll face when building scalable systems:

  • Replication solves the problem of scaling reads and providing redundancy
  • WAL solves the problem of crash recovery and change propagation
  • Sync vs Async is a trade-off you’ll make based on your specific requirements
  • Failure handling is about making smart trade-offs between consistency, availability, and performance

The best way to internalize these concepts is to think through scenarios in your own domain. When designing your next system, ask yourself:

  1. What’s my read/write ratio?
  2. How critical is each type of data?
  3. What happens if I lose the last 5 seconds of writes?
  4. How long can I afford for a replica to be down?
  5. What compliance requirements do I have?

These questions will guide you to the right replication strategy for your use case.


Want to dive deeper? Explore PostgreSQL’s replication documentation, MySQL’s binary logs, or MongoDB’s replica sets to see these concepts implemented in real database systems.
