Understanding Database Replication and WAL Through Real-World Scenarios
Database replication and Write-Ahead Logging (WAL) are fundamental concepts in building scalable, reliable systems. Rather than diving into abstract theory, let’s learn these concepts through real-world scenarios that show why these technologies exist and when to use different approaches.
Table of Contents
- The Growing Startup: Why Replication Exists
- The Replication Challenge: How Do Servers Stay in Sync?
- Synchronous vs Asynchronous: Making the Trade-off
- The Crash Problem: Why WAL Exists
- Connecting WAL to Replication
- Replication Lag in Production
- When Things Break: Replica Failures
- When Things Break: Primary Failures
- WAL Retention Strategies
Scenario 1: The Growing Startup (Why Replication Exists) {#scenario-1-the-growing-startup}
Imagine you’re building a recipe-sharing app called “CookSnap”. You start with a single database server in AWS us-east-1.
Month 1: 1,000 users, everything runs smoothly. Your single database handles all reads and writes without breaking a sweat.
Month 6: 100,000 users. You start noticing problems:
- Page loads are getting slower
- Database CPU is consistently at 90% during peak hours
- Most requests are reads (people browsing recipes): ~95%
- Only a small fraction are writes (creating new recipes, comments): ~5%
The Problem
Your single database server has become a bottleneck. With 100,000 concurrent users trying to browse recipes, that one server simply can’t keep up with the read load.
The Solution: Replication
The idea is simple: create copies of your database on multiple servers.
Before:

┌──────────┐
│ Server A │
│          │
│ Database │
└──────────┘

After:

┌──────────┐
│ Server A │ (Primary)
│          │
│ Database │
└──────────┘
      │
      ├─────────────┬─────────────┐
      ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Server B │  │ Server C │  │ Server D │
│ (Replica)│  │ (Replica)│  │ (Replica)│
└──────────┘  └──────────┘  └──────────┘
Now you can distribute the read load across multiple servers:
- Server A (Primary): Handles all writes + some reads
- Servers B, C, D (Replicas): Handle reads only
A load balancer distributes read traffic:
Reads → 25% to A, 25% to B, 25% to C, 25% to D
Writes → 100% to A (only the primary can write)
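This split can be sketched as a tiny router. The server names and the round-robin policy are illustrative assumptions, not anything CookSnap-specific:

```python
import itertools

# Hypothetical server names; the primary also serves a share of reads.
PRIMARY = "server-a"
REPLICAS = ["server-b", "server-c", "server-d"]

# Round-robin over primary + replicas for reads.
_read_pool = itertools.cycle([PRIMARY] + REPLICAS)

def route(query_type: str) -> str:
    """Send writes to the primary; spread reads across all servers."""
    if query_type == "write":
        return PRIMARY          # only the primary accepts writes
    return next(_read_pool)     # reads rotate through the pool
```

With four servers in the pool, each receives roughly 25% of read traffic, matching the split above. In practice this logic usually lives in a load balancer or connection pooler rather than application code.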
Scenario 2: The Replication Challenge (How Do Servers Stay in Sync?) {#scenario-2-the-replication-challenge}
Now you have multiple database servers, but here’s the critical problem:
2:00 PM - User Alice posts "Chocolate Cake" recipe to Server A
2:01 PM - User Bob tries to view all recipes from Server B
Question: Will Bob see Alice’s new recipe?
Answer: Not automatically! Server B doesn’t magically know about changes that happened on Server A.
The Core Challenge
When a write happens on the Primary (Server A), the Replicas (B, C, D) need to be told about the change. This “telling” process is called replication.
The question becomes: HOW and WHEN does Server A notify the replicas?
Scenario 3: Synchronous vs Asynchronous Replication {#scenario-3-sync-vs-async}
You have two fundamental options for how replication works:
Option 1: Synchronous Replication
User posts recipe → Server A receives it
→ Server A sends to Replicas B, C, D
→ Server A waits for B, C, D to confirm "got it!"
→ ONLY THEN does Server A tell user "recipe posted!"
Pros:
- Guaranteed consistency: All servers have the data before user sees success
- No data loss: If Primary crashes, replicas definitely have the data
Cons:
- Slower writes: Must wait for network round-trip to all replicas
- Availability issues: If one replica is slow or down, ALL writes slow down
Option 2: Asynchronous Replication
User posts recipe → Server A receives it
→ Server A immediately tells user "recipe posted!"
→ Server A sends data to B, C, D in background
→ B, C, D update whenever they receive it
Pros:
- Faster writes: User gets immediate response
- Better availability: Slow/failed replicas don’t block writes
Cons:
- Possible data loss: If Primary crashes before replicating, data is lost
- Eventual consistency: Replicas are slightly behind
Making the Choice: CookSnap Example
Your CookSnap app has two types of writes:
Recipe Posts:
- Users would be very upset if their carefully written recipe disappeared
- Writes are infrequent (users don’t post recipes constantly)
- Choice: Synchronous replication
- Trade-off: Slightly slower post time, but guaranteed safety
Recipe Likes:
- Losing a few likes is annoying but not catastrophic
- Writes are very frequent (users clicking like constantly)
- Choice: Asynchronous replication
- Trade-off: Super fast, but a few likes might be lost on crash
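The per-write-type choice can be expressed as a small policy table. This is a minimal sketch; the table and function names are hypothetical:

```python
# Illustrative policy: which write types replicate synchronously.
REPLICATION_MODE = {
    "recipe_post": "sync",   # user content must not be lost
    "recipe_like": "async",  # losing a few likes is acceptable
}

def ack_write(kind: str, replica_acks: int, total_replicas: int) -> bool:
    """Decide whether the user may be told 'success' yet."""
    if REPLICATION_MODE[kind] == "sync":
        # sync: every replica must have confirmed first
        return replica_acks == total_replicas
    # async: acknowledge immediately; replicas catch up later
    return True
```

Real databases expose similar knobs; PostgreSQL, for example, lets you toggle `synchronous_commit` per transaction, so one application can mix both modes.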
Real-World Examples
Synchronous Replication:
- Banking transactions
- E-commerce orders
- Medical records
- Any data where loss is unacceptable
Asynchronous Replication:
- Social media likes/views
- Analytics and metrics
- Cache invalidation
- Non-critical user actions
Scenario 4: The Crash Problem (Why WAL Exists) {#scenario-4-the-crash-problem}
Let’s zoom into what happens inside Server A when it processes a write. This reveals why Write-Ahead Log (WAL) is necessary.
The Problem
2:00:00 PM - User posts "Chocolate Cake" recipe
2:00:01 PM - Server A starts writing to database file...
2:00:02 PM - 💥 POWER FAILURE - Server A crashes mid-write
2:00:05 PM - Server A reboots
What went wrong?
The database file might be corrupted. Maybe:
- Recipe title was written: ✓
- Ingredients were half-written: ⚠️
- Instructions weren’t written at all: ✗
Half-written data = corrupted database. The server has no idea what state the database is in.
The Solution: Write-Ahead Log (WAL)
The core idea: Record what you’re about to do BEFORE you do it.
WAL is an append-only file where you write your “plan” before modifying the actual database:
Step 1: Write to WAL file (fast, simple append)
"INSERT recipe_id=999, title='Chocolate Cake',
ingredients='flour, sugar, cocoa...'"
Step 2: Flush WAL to disk (make it durable)
Step 3: NOW update the actual database
Step 4: Mark in WAL "operation complete"
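The four steps can be sketched with a plain append-only file. The JSON entry format is an illustrative assumption, not how any particular database encodes its WAL:

```python
import json
import os

def wal_write(wal_path: str, db: dict, record: dict) -> None:
    """Follow the four WAL steps: log the plan, flush, apply, mark done."""
    with open(wal_path, "a") as wal:
        # Step 1: append the intended change to the WAL
        wal.write(json.dumps({"op": "insert", "data": record}) + "\n")
        # Step 2: flush so the entry survives a crash
        wal.flush()
        os.fsync(wal.fileno())
        # Step 3: NOW update the actual database
        db[record["recipe_id"]] = record
        # Step 4: mark the operation complete
        wal.write(json.dumps({"op": "commit",
                              "recipe_id": record["recipe_id"]}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())
```

Note that the crash-safety guarantee comes from the `fsync` in step 2: only after the plan is durable does the database file get touched.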
Recovery After Crash
When Server A reboots after the crash:
1. Read the WAL file
2. Check for incomplete operations
3. Found: recipe_id=999 started but not marked complete
4. REPLAY the operation: apply it to the database again
5. Everything is recovered!
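A recovery scan over such a log can be sketched as follows. The JSON entry format with "insert" and "commit" markers is an illustrative assumption; the rule is simply that an insert with no matching commit marker gets replayed:

```python
import json

def recover(wal_lines: list, db: dict) -> None:
    """Replay WAL entries whose commit marker never made it to disk."""
    pending = {}  # recipe_id -> record, awaiting a commit marker
    for line in wal_lines:
        entry = json.loads(line)
        if entry["op"] == "insert":
            pending[entry["data"]["recipe_id"]] = entry["data"]
        elif entry["op"] == "commit":
            # this one was fully applied before the crash
            pending.pop(entry["recipe_id"], None)
    # Anything still pending was interrupted mid-write: apply it again
    for recipe_id, record in pending.items():
        db[recipe_id] = record
```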
Why WAL is Simple and Fast
WAL has specific design properties that make it reliable:
Append-only:
- New entries are always added to the end
- No seeking around on disk (very fast)
- Old entries never modified (can’t corrupt existing data)
Sequential writes:
- Sequential disk writes are far faster than random writes (often 10-100x on spinning disks)
- Modern SSDs can absorb hundreds of thousands of appended entries per second
Durable:
- WAL is flushed to disk before saying “success”
- Even if the database file is corrupted, the WAL survives
Scenario 5: Connecting WAL to Replication {#scenario-5-connecting-wal-to-replication}
Here’s where everything comes together beautifully.
The Insight
Remember our replication problem? The Primary needs to tell Replicas about changes.
Question: What does the Primary send to the Replicas?
Answer: The Primary sends its WAL entries!
Why This is Elegant
Primary (Server A) already has:
- Every change recorded in sequential order
- Entries are small and easy to transmit
- Entries describe exact operations to perform
Replicas (Server B, C, D) can:
- Receive the WAL stream
- Write entries to their own WAL
- Replay the same operations on their databases
Complete Write Flow with Replication
User posts "Chocolate Cake" to Primary A:
1. Primary A writes to its WAL: "INSERT recipe_id=999..."
2. Primary A applies change to its database
3. Primary A streams WAL entry to Replicas B, C, D
4. Replicas receive WAL entry and write to their WAL
5. Replicas apply the change to their databases
6. Depending on sync/async:
- SYNC: Wait for replicas to confirm → then tell user "success!"
- ASYNC: Tell user "success!" immediately → replicas catch up
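The flow above can be condensed into a toy in-process primary/replica pair. All names are illustrative, and a real system streams entries over the network rather than calling replicas directly:

```python
class Replica:
    def __init__(self):
        self.wal = []  # the replica's own copy of the log
        self.db = {}

    def receive(self, entry: dict) -> None:
        """Steps 4-5: write the entry to the local WAL, then apply it."""
        self.wal.append(entry)
        self.db[entry["recipe_id"]] = entry


class Primary:
    def __init__(self, replicas):
        self.wal = []
        self.db = {}
        self.replicas = replicas

    def write(self, entry: dict) -> None:
        self.wal.append(entry)               # step 1: record in own WAL
        self.db[entry["recipe_id"]] = entry  # step 2: apply locally
        for replica in self.replicas:        # step 3: stream the WAL entry
            replica.receive(entry)
```

After a write, every replica's WAL is an exact copy of the primary's, which is precisely why streaming the WAL is enough to keep databases in sync.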
Replication is Just WAL Streaming
This is the key insight:
- WAL was invented for crash recovery
- But it turned out to be perfect for replication too
- One mechanism solves both problems
Scenario 6: Replication Lag in Production {#scenario-6-replication-lag}
Your CookSnap app is using asynchronous replication for speed. One day you notice a confusing user experience:
2:00 PM - Alice posts "Chocolate Cake" on Primary A
2:00 PM - Primary A responds "success!" to Alice
2:00 PM - Alice clicks "view my recipes"
2:01 PM - Request goes to Replica B
→ Alice's new recipe isn't there! 😱
What Happened: Replication Lag
Replication lag is the time delay between:
- Primary recording a change (in its WAL)
- Replica applying that change (from the WAL stream)
Time   Primary A        Replica B
2:00   Recipe #999 ✓    Recipe #998
2:01   Recipe #999 ✓    Recipe #998 (still catching up)
2:02   Recipe #999 ✓    Recipe #999 ✓ (finally!)
Why It Happens
Several factors cause lag:
Network latency:
- Streaming WAL over network takes time
- Cross-region replicas: 100-300ms delay
Replica load:
- If replica is busy serving reads, it’s slower to apply WAL
Large transactions:
- A big write (uploading 1000 recipes) takes time to replicate
Typical lag in production:
- Same datacenter: 10-100 milliseconds
- Cross-region: 100-500 milliseconds
- Under heavy load: 1-5 seconds
Solutions for Consistency
Option 1: Read from Primary after write
- User writes to Primary
- For next 30 seconds, route that user's reads to Primary
- After 30 seconds, allow reads from replicas
Option 2: Session consistency
- Primary returns: "Write succeeded at timestamp 2:00:00.123"
- User's next read includes: "Give me data as of 2:00:00.123"
- Replica checks: "Do I have data up to that timestamp?"
- If no: wait or redirect to Primary
Option 3: Accept eventual consistency
- Show user a message: "Your post is being processed"
- Or: "Data may take a few seconds to appear everywhere"
- Best for non-critical features (likes, views, etc.)
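The session-consistency check in Option 2 boils down to a timestamp comparison. This is a minimal sketch with hypothetical parameter names:

```python
def choose_read_source(session_ts: float, replica_ts: float) -> str:
    """Serve from a replica only if it has caught up to the
    timestamp of this user's last write; otherwise fall back."""
    if replica_ts >= session_ts:
        return "replica"   # replica has everything the user wrote
    return "primary"       # replica is lagging: read from the primary
```

Real implementations typically use the WAL position (e.g. a log sequence number) rather than a wall-clock timestamp, since clocks across servers can drift.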
Scenario 7: When Things Break - Replica Failures {#scenario-7-replica-failures}
Production systems fail. Let’s see what happens when a replica crashes.
The Scenario
3:00 PM - Replica B crashes (hardware failure)
3:00 PM - Meanwhile, Primary A keeps processing writes
(recipe #45,231, #45,232, #45,233... #48,500)
3:05 PM - Replica B reboots
The Problem
Replica B is now 5 minutes behind. It missed thousands of WAL entries.
How Recovery Works
When Replica B comes back online:
1. Replica B knows: "Last WAL entry I applied was #45,230"
2. Replica B connects to Primary A and says:
"I need all WAL entries starting from #45,231"
3. Primary A checks: "Do I still have entry #45,231?"
If YES:
- Stream WAL entries #45,231 through #48,500 to B
- Replica B applies all entries
- Replica B is caught up!
If NO (Primary deleted old WAL):
- Replica B must do a FULL DATABASE COPY
- This takes much longer (hours instead of minutes)
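The catch-up handshake can be sketched like this, assuming the primary keeps its retained WAL entries keyed by sequence number (an illustrative layout):

```python
def catch_up(replica_last: int, primary_wal: dict, oldest_kept: int):
    """A rebooted replica asks for every entry after the last one it applied."""
    needed_from = replica_last + 1
    if needed_from < oldest_kept:
        # the primary already deleted those entries: no choice but a
        # full database copy
        return "full_copy"
    # stream the missing entries, in order
    return [primary_wal[i] for i in range(needed_from, max(primary_wal) + 1)]
```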
WAL Retention is Key
This is why WAL retention policy matters:
Short retention (1 hour):
- Replica must recover within 1 hour
- Otherwise needs full database copy
Longer retention (24-72 hours):
- Replica can be down for maintenance
- Can take time debugging issues
- More forgiving for recovery
Scenario 8: When Things Break - Primary Failures {#scenario-8-primary-failures}
This is the scariest failure scenario: the Primary crashes.
The Disaster
4:00 PM - Primary A crashes 💥
4:00 PM - Users can still READ (from replicas B, C, D)
4:00 PM - Users CANNOT WRITE (no primary!)
Your entire application is now read-only. Every “post recipe”, “add comment”, “like” button fails.
Failover: Promoting a Replica
You need to promote one replica to become the new Primary.
Step 1: Choose which replica to promote
Not all replicas are equally up-to-date:
When Primary crashed, WAL positions were:
- Primary A: Entry #48,500 (then crashed)
- Replica B: Entry #48,498 (2 entries behind)
- Replica C: Entry #48,450 (50 entries behind)
- Replica D: Entry #48,500 (fully caught up!)
Best choice: Promote Replica D (most up-to-date)
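Choosing the candidate is just a max over WAL positions. This sketch (names hypothetical) also counts the entries that only the crashed primary had, which is exactly the async-replication data-loss risk:

```python
def plan_failover(primary_pos: int, replica_pos: dict) -> dict:
    """Pick the most up-to-date replica and count entries that existed
    only on the crashed primary (lost if replication was async)."""
    best = max(replica_pos, key=replica_pos.get)
    return {"promote": best,
            "lost_entries": primary_pos - replica_pos[best]}
```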
Step 2: Promote Replica D
1. Stop Replica D from accepting read traffic
2. Configure D to accept writes (become Primary)
3. Point application to D as new Primary
4. Reconfigure B and C to replicate from D (not A)
Step 3: Deal with the old Primary
When Server A eventually comes back online:
- A cannot be Primary (D is Primary now)
- A must become a Replica
- A must catch up from D’s WAL stream
The Data Loss Problem
Asynchronous replication risk:
What if Primary A had:
- Entries #48,499 and #48,500
- But crashed before replicating them
And we promoted Replica B which only had up to #48,498
Result: Entries #48,499 and #48,500 are lost forever.
This is why critical data uses synchronous replication - it guarantees replicas have the data before acknowledging the write.
Automatic vs Manual Failover
Manual failover:
- DBA/engineer manually promotes replica
- Downtime: 5-30 minutes
- Safe: human verifies data consistency
Automatic failover:
- System detects Primary failure
- Automatically promotes best replica
- Downtime: 30 seconds - 2 minutes
- Risk: automation might make wrong choice
Real systems often use: automatic detection + manual approval.
Scenario 9: WAL Retention Strategies {#scenario-9-wal-retention-strategies}
How long should you keep old WAL entries? This depends entirely on your use case.
Option 1: Keep WAL Only Until All Replicas Confirm
When to use:
Scenario: High-Volume Logging System
System: Website clickstream analytics
Writes: 100,000 events/second
WAL growth: 5 TB per day
Data criticality: Low (losing recent clicks isn't catastrophic)
Infrastructure: Stable, replicas rarely crash
Decision: Delete WAL immediately after replication
Reason: Can't afford 5 TB/day storage forever
Trade-offs:
- ✓ Minimal disk usage
- ✓ Lower costs
- ✗ A replica that falls behind must rebuild from a full database copy
- ✗ Adding a new replica also requires a full copy
Option 2: Keep WAL for 24-72 Hours
When to use:
Scenario: CookSnap Recipe App
System: Social recipe sharing platform
Writes: 1,000-5,000/second
WAL growth: 50-100 GB per day
Data criticality: Medium (user content matters)
Operations: Occasional replica maintenance/crashes
Decision: Keep 48 hours of WAL
Reason: Balance between safety and cost
Trade-offs:
- ✓ Replicas can recover from multi-hour downtime
- ✓ Can add new replicas anytime
- ✓ Time to debug issues without pressure
- ✗ ~100-200 GB storage needed
- ✗ No point-in-time recovery beyond 48 hours
Scenario: E-commerce Platform
System: Online store
Writes: 10,000 orders/hour
WAL growth: 200 GB per day
Maintenance: Weekly replica updates (4-6 hours)
Decision: Keep 72 hours of WAL
Reason: Maintenance windows need safe buffer
Option 3: Keep WAL Forever (Archive)
When to use:
Scenario: Banking System
System: Core banking transactions
Writes: 50,000 transactions/hour
Compliance: Must audit any historical transaction
Regulation: Point-in-time recovery required
Decision: Archive all WAL forever
Reason: Legal requirement, cost of storage < cost of non-compliance
Trade-offs:
- ✓ Complete audit trail
- ✓ Point-in-time recovery to any moment
- ✓ Can answer “what was account balance on June 15, 2023 at 2:37 PM?”
- ✗ Significant storage costs (TBs to PBs)
- ✗ Complex archival system needed
Scenario: Healthcare Records (HIPAA)
System: Electronic health records
Compliance: HIPAA requires 7-year retention
Recovery: Must prove exact record state at any time
Decision: Archive WAL for 7 years
Reason: Legal mandate
Decision Tree: Which Option to Choose?
Step 1: Calculate WAL growth
WAL growth/day = writes/second × entry size × 86,400 seconds
Example:
1,000 writes/sec × 1 KB/entry × 86,400 s ≈ 86 GB/day
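The same arithmetic as a one-line helper (unit conversion uses decimal KB/GB):

```python
def wal_growth_gb_per_day(writes_per_sec: float, entry_kb: float) -> float:
    """Daily WAL growth: writes/sec x entry size x 86,400 seconds/day."""
    return writes_per_sec * entry_kb * 86_400 / 1_000_000  # KB -> GB
```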
Step 2: Assess data criticality
Low (metrics, logs): → Option 1
Medium (user content): → Option 2
High (financial, health): → Option 3
Step 3: Evaluate operational needs
Stable infrastructure, no maintenance: → Option 1
Regular maintenance, occasional crashes: → Option 2
Compliance requirements: → Option 3
Step 4: Consider budget
Tight budget: → Option 1
Normal operations budget: → Option 2
Compliance/regulatory budget: → Option 3
Summary: Key Takeaways
Replication
Purpose:
- Distribute read load across multiple servers
- Provide redundancy for high availability
- Enable geographic distribution
Architecture:
- Primary (handles all writes)
- Replicas (handle reads, receive WAL stream)
- Load balancer distributes read traffic
Synchronous vs Asynchronous
Synchronous:
- Wait for replicas to confirm before acknowledging write
- Guarantees consistency, no data loss
- Slower writes, availability depends on all replicas
- Use for: critical data (orders, transactions, important user content)
Asynchronous:
- Acknowledge write immediately, replicate in background
- Fast writes, high availability
- Possible data loss, eventual consistency
- Use for: non-critical data (likes, views, metrics)
Write-Ahead Log (WAL)
Purpose:
- Crash recovery: replay incomplete operations
- Replication: stream changes to replicas
- Point-in-time recovery: restore to any moment
Properties:
- Append-only (fast, can’t corrupt)
- Sequential writes (far faster than random writes)
- Durable (flushed to disk before success)
WAL Retention
Short (until replicated):
- Minimal cost, high risk
- Use for: high-volume, low-criticality data
Medium (24-72 hours):
- Balanced approach, most common
- Use for: typical web applications
Long (archived forever):
- Maximum safety, high cost
- Use for: compliance, audit requirements
Failure Scenarios
Replica failure:
- Replica recovers using Primary’s WAL
- If WAL expired: full database copy needed
Primary failure:
- Promote most up-to-date replica
- Possible data loss with async replication
- Downtime during failover
Final Thoughts
Database replication and WAL are not just theoretical concepts - they’re practical solutions to real problems you’ll face when building scalable systems:
- Replication solves the problem of scaling reads and providing redundancy
- WAL solves the problem of crash recovery and change propagation
- Sync vs Async is a trade-off you’ll make based on your specific requirements
- Failure handling is about making smart trade-offs between consistency, availability, and performance
The best way to internalize these concepts is to think through scenarios in your own domain. When designing your next system, ask yourself:
- What’s my read/write ratio?
- How critical is each type of data?
- What happens if I lose the last 5 seconds of writes?
- How long can I afford for a replica to be down?
- What compliance requirements do I have?
These questions will guide you to the right replication strategy for your use case.
Want to dive deeper? Explore PostgreSQL’s replication documentation, MySQL’s binary logs, or MongoDB’s replica sets to see these concepts implemented in real database systems.