29 min read

Understanding Database Isolation Levels and Query Planning - A Beginner's Guide

A comprehensive guide to database concurrency control and query optimization, with hands-on examples and practical advice for developers.

#Databases #System-design #Dev-tools


Understanding Database Isolation Levels and Query Planning: A Beginner’s Guide

When I started learning about database internals, I was overwhelmed by terms like “phantom reads,” “serializable isolation,” and “query planners.” These concepts seemed abstract and disconnected from real-world programming. After building a hands-on project to understand them, I realized they’re not only fundamental to database systems but also critical for writing correct, performant applications.

This guide documents everything I learned about two core database concepts: isolation levels (how databases handle concurrent transactions) and query planning (how databases optimize query execution). Think of this as my learning journal—a reference I can return to whenever I need to refresh my understanding.

Table of Contents

  1. Part 1: Understanding Transactions
  2. Part 2: Isolation Levels
  3. Part 3: Query Planning and Optimization
  4. Part 4: Practical Guidelines
  5. Conclusion

Part 1: Understanding Transactions

Before diving into isolation levels, I needed to understand what a transaction actually is.

What is a Transaction?

A transaction is a unit of work that must either completely succeed or completely fail—there’s no in-between. It’s a group of database operations that are treated as a single atomic operation.

Example: Bank Transfer

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE user = 'Alice';
  UPDATE accounts SET balance = balance + 100 WHERE user = 'Bob';
COMMIT;

This is ONE transaction with TWO updates. Either both updates happen, or neither happens. If something goes wrong, the work is rolled back (explicitly with ROLLBACK, or automatically if the database crashes mid-transaction), and it’s as if nothing happened.
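
For example, if a check after the first update reveals a problem, you can abort explicitly. A minimal sketch against the same accounts table:

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE user = 'Alice';
  -- Alice's balance would go negative? Undo everything:
ROLLBACK;
-- The database is exactly as it was before BEGIN.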

The ACID Properties

Transactions follow ACID principles:

  • Atomicity: All or nothing
  • Consistency: Database stays in a valid state
  • Isolation: Transactions don’t interfere with each other (this is what isolation levels control)
  • Durability: Once committed, changes are permanent

The “I” in ACID—Isolation—is what we’ll explore deeply.

The Core Problem

Imagine you’re building a ticket management system like Jira. Two developers, Alice and Bob, try to assign the same ticket to themselves at the exact same time:

Time  | Alice's Transaction          | Bob's Transaction
------|------------------------------|---------------------------
T1    | READ ticket: assigned = NULL |
T2    |                              | READ ticket: assigned = NULL
T3    | WRITE ticket: assigned=Alice |
T4    |                              | WRITE ticket: assigned=Bob

After T4, who owns the ticket? Bob—his write overwrote Alice’s. Neither of them knows there was a conflict. This is called a lost update, and it’s just one of several problems that arise with concurrent transactions.


Part 2: Isolation Levels

Isolation levels define what data transactions can “see” when multiple transactions run simultaneously. They represent a trade-off between consistency (correctness) and performance (speed).

The Spectrum of Isolation

Think of isolation levels as a dial:

Lower Isolation ←------------------------→ Higher Isolation
(Faster, Less Safe)                    (Slower, More Safe)

Read Uncommitted → Read Committed → Repeatable Read → Serializable

The Four Problems We’re Trying to Prevent

Before understanding isolation levels, I needed to know what problems they solve:

1. Dirty Read

Definition: Reading uncommitted data from another transaction that might be rolled back.

Example: Banking System

T1: Alice's transaction updates her balance to $500 (not committed)
T2: Bob's transaction reads Alice's balance: $500
T1: Alice's transaction ROLLS BACK (balance stays at $100)
T2: Bob made a decision based on $500 that doesn't exist

Why it’s bad: You’re basing decisions on data that might never exist. In financial systems, this could lead to fraud or incorrect calculations.

Real-world analogy: A friend shows you a transfer to your account queued up on their phone, but cancels it before hitting “send.” If you already spent that money, you’re in trouble.

2. Non-Repeatable Read

Definition: Reading the same row twice in your transaction and getting different values.

Example: Report Generation

-- Your transaction starts
SELECT story_points FROM tickets WHERE id = 1;  -- Returns 5
-- You do some calculations...

-- Meanwhile, another transaction commits an update

SELECT story_points FROM tickets WHERE id = 1;  -- Returns 10
-- Your calculations are now inconsistent!

Why it’s bad: If you’re generating a report that reads the same data multiple times, you expect consistency. Getting different values mid-transaction makes your logic unreliable.

Real-world analogy: You check your bank balance ($100), decide to buy something for $90, check again and see $50. Someone transferred money out while you were shopping. Your decision to buy was based on outdated information.

3. Lost Update

Definition: Two transactions read the same value, both modify it, and one overwrites the other’s change.

Example: Story Points Update

Initial state: story_points = 10

Transaction A (Alice):
  READ story_points = 10
  Calculate: 10 + 3 = 13
  WRITE story_points = 13
  COMMIT

Transaction B (Bob):
  READ story_points = 10  (read same initial value)
  Calculate: 10 + 5 = 15
  WRITE story_points = 15
  COMMIT

Final state: story_points = 15
Expected: 18 (10 + 3 + 5)
Lost: 3 points from Alice's update

Why it’s bad: Data gets lost without anyone knowing. In inventory systems, you could oversell products. In accounting, you could lose financial data.

Real-world analogy: Two people editing the same Google Doc without real-time sync. They both start from “Version 1,” make changes, and save. Whoever saves last overwrites the other person’s changes.

4. Phantom Read

Definition: Running the same query twice and getting a different number of rows because another transaction inserted or deleted rows.

Example: Aggregation Query

-- Your transaction starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- Returns 5

-- Another transaction inserts a new TODO ticket and commits

SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- Returns 6
-- Your aggregation is inconsistent!

Why it’s bad: Aggregations, reports, and batch operations need a consistent view. If rows appear or disappear mid-transaction, your results are unreliable.

Real-world analogy: You count the number of people in a room (10 people), go back to writing it down, count again to verify, and now there are 11 people. Someone walked in while you were counting.

The Four Isolation Levels

Now let’s see how each isolation level prevents (or allows) these problems.
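First, a quick note on how you actually pick a level. The demos in this guide use a Python get_connection helper, but the equivalent plain SQL (standard PostgreSQL syntax) looks like this:

-- Set the isolation level for a single transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT story_points FROM tickets WHERE id = 1;
COMMIT;

-- Or change the default for the whole session
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;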

Read Uncommitted

What it guarantees: Almost nothing.

What it prevents: Nothing reliably.

Problems allowed:

  • Dirty reads
  • Non-repeatable reads
  • Lost updates
  • Phantom reads

When to use: Almost never. Even PostgreSQL’s “Read Uncommitted” actually behaves like Read Committed.

Performance: Fastest (but too dangerous)

Read Committed (PostgreSQL Default)

What it guarantees: You only see committed data.

What it prevents:

  • Dirty reads ✓

Problems allowed:

  • Non-repeatable reads ✗
  • Lost updates ✗
  • Phantom reads ✗

Example from my demos:

# Transaction 1
UPDATE tickets SET story_points = 20 WHERE id = 1;
# Not committed yet

# Transaction 2 (Read Committed)
SELECT story_points FROM tickets WHERE id = 1;
# Returns 10 (old value, not the uncommitted 20)

# Transaction 1
COMMIT;

# Transaction 2 (still in same transaction)
SELECT story_points FROM tickets WHERE id = 1;
# Now returns 20 (sees the committed change)

When to use:

  • Web applications with mostly independent operations
  • Read-heavy workloads
  • Dashboards and reports where slight staleness is acceptable
  • Most general-purpose applications (80% of use cases)

Performance: Fast and scalable

Key insight I learned: This is why it’s the default. It prevents the most dangerous problem (dirty reads) while maintaining good performance.

Repeatable Read

What it guarantees: If you read the same row twice, you get the same value.

What it prevents:

  • Dirty reads ✓
  • Non-repeatable reads ✓
  • Phantom reads ✓ (in PostgreSQL)

Problems allowed:

  • Lost updates ✗ (still possible, but can be detected)

Example from my demos:

# Transaction starts (Repeatable Read)
SELECT story_points FROM tickets WHERE id = 1;  # Returns 10

# Another transaction commits: UPDATE story_points = 20

# Same transaction reads again
SELECT story_points FROM tickets WHERE id = 1;  # Still returns 10!
# It sees a consistent snapshot from transaction start

When to use:

  • Reports requiring consistent snapshots
  • Batch processing jobs
  • Data exports
  • Analytics queries
  • Any operation that reads data multiple times and expects consistency

Performance: Medium (some overhead for maintaining snapshots)

Key insight I learned: PostgreSQL’s Repeatable Read actually implements Snapshot Isolation, which is stronger than the SQL standard requires. It prevents phantom reads too.

Serializable

What it guarantees: Transactions appear to run one at a time (serialized), even if they actually run concurrently.

What it prevents:

  • Dirty reads ✓
  • Non-repeatable reads ✓
  • Lost updates ✓
  • Phantom reads ✓
  • All anomalies ✓

Problems allowed: None

Example from my demos:

# Transaction 1 (Serializable)
READ story_points = 10
UPDATE story_points = 13
COMMIT  # Success

# Transaction 2 (Serializable)
READ story_points = 10  (read same initial value)
UPDATE story_points = 15
COMMIT  # ABORTED! Serialization failure detected

When to use:

  • Financial transactions (money transfers, payments)
  • Inventory management (prevent overselling)
  • Booking systems (prevent double-booking)
  • Any scenario where correctness is more important than speed

Performance: Slowest (uses locks or conflict detection)

Key insight I learned: One transaction gets aborted when a conflict is detected. Your application must be prepared to retry failed transactions.

Retry pattern:

import time

def transfer_money_with_retry(from_account, to_account, amount, max_retries=3):
    for attempt in range(max_retries):
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)
        try:
            # Do the transfer, then commit
            conn.commit()
            return True
        except SerializationFailure:
            conn.rollback()  # Release the failed transaction before retrying
            if attempt == max_retries - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
        finally:
            conn.close()
    return False

Summary Table

Isolation Level  | Dirty Read | Non-Repeatable Read | Phantom Read | Lost Update | Performance
-----------------|------------|---------------------|--------------|-------------|------------
Read Uncommitted | Possible   | Possible            | Possible     | Possible    | Fastest
Read Committed   | Prevented  | Possible            | Possible     | Possible    | Fast
Repeatable Read  | Prevented  | Prevented           | Prevented*   | Possible    | Medium
Serializable     | Prevented  | Prevented           | Prevented    | Prevented   | Slowest

*In PostgreSQL. The SQL standard allows phantom reads at this level.

Real-World Scenarios I Tested

Scenario 1: The Lost Update (Most Common Bug)

This was the most eye-opening demo I built. Here’s what happens with Read Committed:

Initial: story_points = 10

Alice and Bob both want to increment story points:
- Alice wants to add 3
- Bob wants to add 5

Expected result: 10 + 3 + 5 = 18

Actual result with Read Committed: 15 (one update lost)
Actual result with Serializable: 13 or 15 (one transaction aborted and should retry)

The lesson: Read Committed doesn’t prevent lost updates. You need either:

  1. Serializable isolation
  2. Explicit locking (SELECT ... FOR UPDATE)
  3. Optimistic locking (version numbers)
  4. Application-level atomic operations
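
A minimal sketch of options 2 and 4 against the demo tickets table (using Alice’s +3 increment from above):

-- Option 2: explicit row lock (works at Read Committed)
BEGIN;
SELECT story_points FROM tickets WHERE id = 1 FOR UPDATE;  -- blocks other writers on this row
UPDATE tickets SET story_points = 13 WHERE id = 1;
COMMIT;

-- Option 4: a single atomic statement (no read-modify-write gap at all)
UPDATE tickets SET story_points = story_points + 3 WHERE id = 1;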

Scenario 2: The Phantom Read Problem

Running reports while data changes:

-- Your report starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- 5 tickets

-- Someone creates a new TODO ticket

SELECT SUM(story_points) FROM tickets WHERE status = 'TODO';  -- Now 6 tickets worth of points
-- Your count and sum are inconsistent!

The lesson: For reports, use Repeatable Read or higher to get a consistent snapshot.

Scenario 3: Non-Repeatable Read

The subtle bug in calculations:

BEGIN;  -- Read Committed

SELECT price FROM products WHERE id = 1;  -- $100
-- Calculate discount: $100 * 0.1 = $10

-- Another transaction updates price to $200 and commits

SELECT price FROM products WHERE id = 1;  -- $200
-- Apply discount: $200 - $10 = $190

-- You subtracted $10 from a $200 item because you calculated based on $100!

The lesson: Any time you read data, perform calculations, and use that data again, you’re vulnerable to non-repeatable reads.

My Decision Framework

After building these demos, here’s how I now choose isolation levels:

1. Start with Read Committed (default)

  • Works for 80% of applications
  • Good performance
  • Prevents the most dangerous issue (dirty reads)

2. Upgrade to Repeatable Read when:

  • Generating reports or exports
  • Performing multi-step calculations on same data
  • Running batch jobs that need consistency
  • Reading related data across multiple queries

3. Use Serializable when:

  • Handling money or critical inventory
  • Preventing resource conflicts (booking systems)
  • Correctness matters more than performance
  • Your application can handle retries

4. Consider application-level solutions:

  • Optimistic locking (version columns)
  • Explicit locks (SELECT ... FOR UPDATE)
  • Atomic operations in a single statement
  • Message queues for serializing operations

Part 3: Query Planning and Optimization

After understanding isolation levels, I learned about the second pillar of database performance: how databases decide how to execute queries.

The Fundamental Question

When you write:

SELECT * FROM employees 
WHERE department = 'Engineering' 
AND salary > 100000;

The database has many ways to execute this:

  1. Read every row and check both conditions
  2. Use an index on department, then filter by salary
  3. Use an index on salary, then filter by department
  4. Use both indexes and combine results

How does it decide? The query planner.

What is the Query Planner?

The query planner (or optimizer) is the part of the database that:

  1. Generates all possible ways to execute your query
  2. Estimates the cost of each approach
  3. Picks the cheapest plan
  4. Executes it

Your SQL → Parse → Generate Plans → Estimate Costs → Pick Best → Execute → Results

Key Concept 1: Indexes

Before understanding query planning, I needed to understand indexes.

The Phone Book Analogy

Without an index (sequential scan):

  • Start at page 1
  • Read every name: “Adams… Baker… Chen…”
  • Keep going until you find “Smith”
  • Could check all 1 million names

With an index (index scan):

  • Know that S is about 3/4 through the book
  • Jump directly there
  • Check a few pages
  • Check maybe 100 names instead of 1 million

That’s a 10,000x improvement!

What Are Indexes Really?

An index is a data structure (usually a B-Tree) that stores:

  • The indexed column values (sorted)
  • Pointers to where the actual rows are stored

Index on salary:
50000 → Row 1543
50000 → Row 8721
50001 → Row 234
...
100000 → Row 456
100001 → Row 789
...

When you query WHERE salary > 100000, the database:

  1. Jumps to the index at the 100000 entry
  2. Reads all entries after that point
  3. Fetches only those row IDs from the table

The Cost: Indexes Aren’t Free

Pros:

  • Make reads MUCH faster (10-100x for selective queries)
  • Essential for large tables
  • Help with sorting (ORDER BY)
  • Help with joins (JOIN … ON)

Cons:

  • Slow down writes (INSERT, UPDATE, DELETE)
  • Take up disk space
  • Need maintenance (rebuilding, updating statistics)

Rule of thumb: Create indexes when reads outnumber writes by 10:1 or more.

Key Concept 2: Selectivity

Selectivity = How many rows a filter keeps vs eliminates

High selectivity (good for indexes):

  • Filters out MOST rows
  • Example: WHERE employee_id = 42 (1 out of 100,000 rows)
  • Keeps < 5% of rows

Low selectivity (indexes may not help):

  • Keeps MOST rows
  • Example: WHERE country = 'USA' (90,000 out of 100,000 rows)
  • Keeps > 20% of rows

Example from My Demo

I have 100,000 employees:

  • Engineering: 50,000 employees (50% - low selectivity)
  • HR: 5,000 employees (5% - high selectivity)
  • Salary > $150k: 10,000 employees (10% - medium selectivity)

Query 1: WHERE department = 'Engineering'

  • Not very selective (50% of rows)
  • Index might not help much
  • Planner might choose sequential scan

Query 2: WHERE department = 'HR'

  • Very selective (5% of rows)
  • Index helps a lot
  • Planner will use index scan

Query 3: WHERE salary > 150000

  • Medium selectivity (10% of rows)
  • Index helps significantly
  • Planner will use index scan

Key insight I learned: The planner chooses indexes based on selectivity, not just because an index exists.
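
A quick way I check selectivity before creating an index is a grouped count. A rough sketch against the demo employees table:

-- What fraction of the table does each department represent?
SELECT department,
       count(*) AS employee_count,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_table
FROM employees
GROUP BY department
ORDER BY pct_of_table DESC;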

Key Concept 3: Statistics

The query planner can’t actually run all possible plans to see which is fastest—that would take forever. Instead, it uses statistics to estimate costs.

What statistics does the database keep?

For each table and column:

  • Total number of rows
  • Number of distinct values
  • Most common values and their frequencies
  • Data distribution (histogram)
  • Null value percentage
  • Average row size

Example statistics:

-- PostgreSQL stores statistics
SELECT * FROM pg_stats WHERE tablename = 'employees';

-- You'll see things like:
n_distinct: 5  (for department column - 5 unique departments)
most_common_vals: {Engineering, Sales, Marketing}
most_common_freqs: {0.50, 0.30, 0.15}  (50%, 30%, 15%)

Why Statistics Matter

SELECT * FROM employees WHERE department = 'Engineering';

The planner thinks:

  • “Table has 100,000 rows”
  • “Statistics say ‘Engineering’ is 50% of rows”
  • “So this query will return ~50,000 rows”
  • “That’s too many for an index to help”
  • “I’ll use sequential scan”

If statistics are wrong:

-- Statistics say HR is 5% of rows (5,000 employees)
-- Actually, HR grew to 50,000 employees (50% of rows)

SELECT * FROM employees WHERE department = 'HR';

-- Planner thinks: "Only 5,000 rows, use index!" 
-- Reality: 50,000 rows, index scan is slow
-- Should have used sequential scan

The lesson: Keep statistics updated!

ANALYZE employees;  -- Update statistics for this table
ANALYZE;            -- Update all tables

PostgreSQL auto-analyzes tables, but for rapidly changing tables, manual ANALYZE helps.
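
To check when a table was last analyzed (manually or by autovacuum), the standard statistics view works:

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'employees';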

Key Concept 4: Cost Estimation

The planner assigns a “cost” to each operation:

Costs include:

  • Sequential page read: 1.0 cost units
  • Random page read: 4.0 cost units (disk seeks are expensive)
  • CPU comparison: 0.01 cost units
  • Row processing: 0.01 cost units
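
These numbers roughly correspond to PostgreSQL’s planner cost settings, which you can inspect (and tune, e.g. lowering random_page_cost on SSDs):

SHOW seq_page_cost;      -- 1.0 by default
SHOW random_page_cost;   -- 4.0 by default
SHOW cpu_tuple_cost;     -- 0.01 by default
SHOW cpu_operator_cost;  -- 0.0025 by default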

Example cost calculation:

-- Sequential scan on 100,000 row table
-- Table is 1,000 pages on disk
Cost = 1,000 pages * 1.0 + 100,000 rows * 0.01 = 2,000

-- Index scan returning 10,000 rows
-- Need to read 10,000 random pages
Cost = 10,000 pages * 4.0 + 10,000 rows * 0.01 = 40,100

-- Wait, index scan is MORE expensive!
-- Planner will choose sequential scan

Key insight I learned: For low selectivity queries (returning > 20% of rows), sequential scans are often faster than index scans because sequential reads are cheaper than random reads.

Seeing Query Plans with EXPLAIN

The best tool for understanding query planning is EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM employees WHERE salary > 100000;

Output:

Seq Scan on employees  (cost=0.00..1834.00 rows=10000 width=40) 
                       (actual time=0.012..15.234 rows=9876 loops=1)
  Filter: (salary > 100000)
  Rows Removed by Filter: 90124
Planning Time: 0.123 ms
Execution Time: 16.789 ms

How to read this:

  • Seq Scan: Type of operation (sequential scan, reading all rows)
  • cost=0.00..1834.00: Estimated startup cost .. total cost
  • rows=10000: Estimated rows returned
  • width=40: Estimated average row size (bytes)
  • actual time=0.012..15.234: Real time (ms) to first row .. all rows
  • rows=9876: Actual rows returned
  • Rows Removed by Filter: 90124: Rows scanned but didn’t match
  • Planning Time: Time to create the plan
  • Execution Time: Time to actually run the query

Key metrics to watch:

  1. Execution Time: Higher = slower query
  2. Rows: Estimated vs Actual: Big difference = bad statistics
  3. Seq Scan on large tables: Often indicates missing index
  4. Rows Removed by Filter: High number = inefficient filtering
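
Two EXPLAIN variants I reach for beyond plain EXPLAIN ANALYZE (standard PostgreSQL options):

-- Adds buffer/IO counts to each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM employees WHERE salary > 100000;

-- Machine-readable output, handy for tooling
EXPLAIN (FORMAT JSON)
SELECT * FROM employees WHERE salary > 100000;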

Real Examples from My Demos

Example 1: Full Scan vs Index Scan

Query: Find Engineering employees making > $100k

SELECT * FROM employees 
WHERE department = 'Engineering' 
AND salary > 100000;

Scenario A: No indexes

Seq Scan on employees  (cost=0.00..2834.00 rows=5000 width=40) 
                       (actual time=0.023..35.456 rows=4987 loops=1)
  Filter: (department = 'Engineering' AND salary > 100000)
  Rows Removed by Filter: 95013
Execution Time: 36.789 ms

Reads: 100,000 rows. Time: 36 ms

Scenario B: With index on salary

Index Scan using idx_salary on employees  (cost=0.42..456.78 rows=5000 width=40)
                                          (actual time=0.045..5.234 rows=4987 loops=1)
  Index Cond: (salary > 100000)
  Filter: (department = 'Engineering')
  Rows Removed by Filter: 5013
Execution Time: 5.789 ms

Reads: 10,000 rows (only high earners). Time: 5 ms

7x faster! The index on salary reduced the rows to scan from 100,000 to 10,000.

Example 2: Which Index to Use?

Query: Same as above

Available indexes:

  • idx_department on department
  • idx_salary on salary

Stats:

  • Engineering: 50,000 employees (50%)
  • Salary > $100k: 10,000 employees (10%)

If planner uses department index:

  • Reads 50,000 rows (all Engineering employees)
  • Filters to 5,000 who make > $100k
  • Cost: ~50,000 rows processed

If planner uses salary index:

  • Reads 10,000 rows (all high earners)
  • Filters to 5,000 who are in Engineering
  • Cost: ~10,000 rows processed

The planner chooses salary index because it’s more selective (10% vs 50%).

The lesson: Given multiple indexes, the planner picks the one with highest selectivity.

Example 3: Bitmap Index Scan

Query: Same as above

Strategy: Use BOTH indexes!

BitmapAnd  (cost=234.56..678.90 rows=5000 width=40)
  ->  Bitmap Index Scan on idx_department
        Index Cond: (department = 'Engineering')
  ->  Bitmap Index Scan on idx_salary
        Index Cond: (salary > 100000)
  ->  Bitmap Heap Scan on employees
Execution Time: 3.456 ms

How it works:

  1. Scan department index → get bitmap of row IDs [50,000 rows]
  2. Scan salary index → get bitmap of row IDs [10,000 rows]
  3. Perform AND operation on bitmaps → [5,000 rows]
  4. Fetch only those 5,000 rows from table

Even faster! Only fetches the exact rows needed (5,000) instead of filtering 10,000.

When this is used: When the intersection is significantly smaller than either individual filter.

Example 4: JOIN Order

Tables:

  • employees: 100,000 rows
  • departments: 5 rows

Query:

SELECT e.name, d.name 
FROM employees e
JOIN departments d ON e.department = d.name
WHERE d.name = 'Engineering';

Bad plan (employees first):

  1. Scan all 100,000 employees
  2. For each employee, look up their department
  3. Filter to Engineering
  4. Cost: 100,000 lookups

Good plan (departments first):

  1. Find Engineering department (1 row)
  2. Find all employees in Engineering (50,000 rows)
  3. Cost: 50,000 lookups

The planner automatically picks the good plan by starting with the smallest table and most selective filter.

The lesson: The planner considers JOIN order. Smaller tables and more selective filters first.

My Query Optimization Process

After building these demos, here’s my process for slow queries:

Step 1: Measure with EXPLAIN ANALYZE

EXPLAIN ANALYZE your_slow_query;

Step 2: Look for red flags:

  • Seq Scan on large tables (> 10,000 rows)
  • Large difference between estimated and actual rows
  • High execution time
  • Many rows removed by filter

Step 3: Check for indexes:

SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'your_table';

Step 4: Consider creating indexes:

-- Single column
CREATE INDEX idx_salary ON employees(salary);

-- Multiple columns (order matters!)
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- Partial index (for specific conditions)
CREATE INDEX idx_high_earners ON employees(salary) 
WHERE salary > 100000;

Step 5: Update statistics:

ANALYZE your_table;

Step 6: Test again:

EXPLAIN ANALYZE your_slow_query;

Index Best Practices I Learned

Create indexes on:

  • Primary keys (automatic)
  • Foreign keys (manual - PostgreSQL doesn’t auto-create these!)
  • Columns in WHERE clauses
  • Columns in JOIN conditions
  • Columns in ORDER BY
  • Columns in GROUP BY

Don’t create indexes on:

  • Small tables (< 1,000 rows)
  • Columns with low cardinality (gender, boolean)
  • Columns rarely queried
  • Frequently updated columns (indexes slow down writes)

Multi-column index order:

-- If you often query:
WHERE department = 'Engineering' AND salary > 100000

-- Create index in this order:
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- NOT this order:
CREATE INDEX idx_salary_dept ON employees(salary, department);

Rule: lead with the column you always filter on with equality (department here); range conditions like salary > 100000 go after it.
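
A related detail worth knowing: a composite B-tree index is only fully useful when the query constrains its leading column. A sketch, assuming the idx_dept_salary index above:

-- idx_dept_salary (department, salary) can serve this query on its own:
SELECT * FROM employees WHERE department = 'HR';

-- But it is of little help here, because salary is not the leading column:
SELECT * FROM employees WHERE salary > 100000;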


Part 4: Practical Guidelines

After building this entire project, here are the practical lessons I’ll apply in real applications.

When to Use Each Isolation Level

Use Read Committed (default) for:

  • Web applications (most requests)
  • REST APIs
  • Microservices
  • Any operation that doesn’t read the same data twice
  • Dashboards (slight staleness is fine)

Use Repeatable Read for:

  • Generating reports
  • Data exports
  • Batch jobs
  • Analytics queries
  • Any operation that reads data multiple times and needs consistency

Use Serializable for:

  • Money transfers
  • Payment processing
  • Inventory updates (prevent overselling)
  • Booking systems (prevent double-booking)
  • Any critical operation where lost updates would be catastrophic

Handling Serialization Failures

When using Serializable, transactions can be aborted. Always implement retries:

import time

def safe_transaction(operation, max_retries=3):
    for attempt in range(max_retries):
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)  # Open outside try so conn is always defined
        try:
            result = operation(conn)
            conn.commit()
            return result
        except SerializationFailure:
            conn.rollback()
            if attempt == max_retries - 1:
                raise
            # Exponential backoff
            time.sleep(0.1 * (2 ** attempt))
        finally:
            conn.close()

Alternative: Optimistic Locking

Instead of Serializable isolation, use version numbers:

-- Add version column
ALTER TABLE tickets ADD COLUMN version INTEGER DEFAULT 0;

-- Update with version check
UPDATE tickets 
SET story_points = story_points + 3,
    version = version + 1
WHERE id = 1 
  AND version = 5;  -- Expected version

-- If rowcount = 0, someone else modified it

This works at Read Committed level and gives you control over conflict handling.

Query Optimization Workflow

For Development:

  1. Don’t optimize prematurely
  2. Build features first
  3. Measure performance with real data
  4. Optimize bottlenecks

For Production:

  1. Enable slow query logging
  2. Monitor pg_stat_statements
  3. Identify top 10 slowest queries
  4. Use EXPLAIN ANALYZE
  5. Add indexes strategically
  6. Run ANALYZE regularly
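
For steps 2–3, a query like the following works, assuming the pg_stat_statements extension is enabled (it must be listed in shared_preload_libraries; the column names shown are for PostgreSQL 13+, older versions use total_time/mean_time):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;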

Red flags in production:

  • Queries taking > 1 second
  • Sequential scans on tables > 10,000 rows
  • Many queries on same table (needs index)
  • High CPU usage from sorting (needs index for ORDER BY)

Index Maintenance

Check index usage:

SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan;

If idx_scan = 0: Index is never used → consider dropping it

Unused indexes waste:

  • Disk space
  • Write performance (every INSERT/UPDATE must update the index)
  • Maintenance overhead
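
To see how much disk a particular index is costing you (standard PostgreSQL size functions, using the idx_salary index from earlier):

SELECT pg_size_pretty(pg_relation_size('idx_salary')) AS index_size;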

Drop unused indexes:

DROP INDEX idx_unused;

Rebuild fragmented indexes:

REINDEX INDEX idx_salary;

Common Patterns I Use Now

Pattern 1: Transaction Wrapper

from contextlib import contextmanager

@contextmanager
def transaction(isolation_level=None):
    conn = get_connection(isolation_level)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

# Usage
with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
    cursor = conn.cursor()
    cursor.execute("UPDATE ...")

Pattern 2: Read-Write Split

# Read operations: Read Committed (fast)
def get_user_profile(user_id):
    with transaction(ISOLATION_LEVEL_READ_COMMITTED) as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        return cursor.fetchone()

# Write operations: Serializable (safe)
def transfer_credits(from_user, to_user, amount):
    with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
        cursor = conn.cursor()
        cursor.execute("UPDATE users SET credits = credits - %s WHERE id = %s", 
                      (amount, from_user))
        cursor.execute("UPDATE users SET credits = credits + %s WHERE id = %s", 
                      (amount, to_user))

Pattern 3: Optimistic Locking

def update_with_optimistic_lock(ticket_id, new_points):
    conn = get_connection()
    cursor = conn.cursor()
    
    # Read with version
    cursor.execute("SELECT story_points, version FROM tickets WHERE id = %s", 
                  (ticket_id,))
    current_points, version = cursor.fetchone()
    
    # Update with version check
    cursor.execute("""
        UPDATE tickets 
        SET story_points = %s, version = version + 1
        WHERE id = %s AND version = %s
    """, (new_points, ticket_id, version))
    
    if cursor.rowcount == 0:
        raise ConcurrentModificationError("Ticket was modified by another user")
    
    conn.commit()
    conn.close()

Pattern 4: Batch Processing with Repeatable Read

def generate_monthly_report():
    # Use Repeatable Read for consistent snapshot
    with transaction(ISOLATION_LEVEL_REPEATABLE_READ) as conn:
        cursor = conn.cursor()
        
        # All queries see the same snapshot
        cursor.execute("SELECT SUM(amount) FROM sales WHERE month = 'October'")
        total_sales = cursor.fetchone()[0]
        
        cursor.execute("SELECT COUNT(*) FROM orders WHERE month = 'October'")
        total_orders = cursor.fetchone()[0]
        
        cursor.execute("SELECT AVG(amount) FROM sales WHERE month = 'October'")
        avg_sale = cursor.fetchone()[0]
        
        return {
            'total_sales': total_sales,
            'total_orders': total_orders,
            'average_sale': avg_sale
        }

Debugging Checklist

When things go wrong, here’s what I check:

For Isolation Issues:

  • Are multiple users/processes accessing the same data?
  • Could two transactions be reading and writing the same rows?
  • Am I seeing inconsistent data within a single operation?
  • Are updates getting lost?
  • Is the isolation level appropriate for the operation?

For Performance Issues:

  • Did I run EXPLAIN ANALYZE on the slow query?
  • Are there sequential scans on large tables?
  • Do indexes exist on WHERE/JOIN columns?
  • Are statistics up to date? (ANALYZE)
  • Is the query returning more data than needed?
  • Are there N+1 query problems? (multiple queries in a loop)

Testing Isolation Levels

The most important lesson: isolation bugs only appear under concurrent load.

from threading import Thread

# Bad: Testing serially
def test_lost_update():
    update_story_points(1, 5)  # Works fine
    update_story_points(1, 3)  # Works fine
    # No bug detected!

# Good: Testing concurrently
def test_lost_update():
    thread1 = Thread(target=lambda: update_story_points(1, 5))
    thread2 = Thread(target=lambda: update_story_points(1, 3))
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
    # Now we can see if updates were lost!

Load testing tools:

  • pgbench (PostgreSQL’s built-in tool)
  • Apache JMeter
  • Locust
  • Custom Python scripts with threading

Conclusion

When I started this learning journey, database internals felt like black magic. Now I understand they’re based on fundamental trade-offs:

Isolation Levels: Safety vs Performance

  • More isolation = More safety, Less speed
  • Less isolation = More speed, Less safety
  • No silver bullet - choose based on your use case

Query Planning: Indexes vs Table Scans

  • Indexes = Fast reads, Slow writes, Uses space
  • No indexes = Slow reads, Fast writes, Saves space
  • The planner makes intelligent choices based on statistics

What I Wish I’d Known Earlier

  1. Read Committed is usually enough - Don’t cargo-cult Serializable everywhere
  2. Indexes aren’t magic - They help selective queries, not bulk operations
  3. Statistics matter - Stale statistics (a missed ANALYZE) can tank performance
  4. Test with concurrency - Serial tests won’t catch isolation bugs
  5. EXPLAIN ANALYZE is your best friend - Use it liberally
  6. Premature optimization is real - Measure first, optimize second

My Mental Models Now

For Isolation:

  • Think about concurrent users editing the same data
  • Ask: “What happens if two people do this simultaneously?”
  • Default to Read Committed, upgrade only when needed
  • Always handle Serializable failures with retries

For Query Planning:

  • Think about data volume and selectivity
  • Ask: “How many rows will this filter keep?”
  • Create indexes for selective filters (< 10% of rows)
  • Update statistics after bulk changes
  • Use EXPLAIN ANALYZE to verify assumptions

Resources That Helped Me

Books:

  • “Designing Data-Intensive Applications” by Martin Kleppmann
  • “PostgreSQL: Up and Running” by Regina Obe and Leo Hsu

Online:

Papers:

  • “A Critique of ANSI SQL Isolation Levels” (explains why standards differ from reality)

The GitHub Project

I built a complete hands-on project demonstrating all these concepts:

  • 7 isolation level demos showing each problem and solution
  • 5 query planning demos with real performance comparisons
  • Docker setup for easy reproduction
  • Colored terminal output for clarity
  • Full documentation

Repository: github.com/yourusername/isolation-and-query-optimization

You can clone it, run the demos, and see these concepts in action. The demos use PostgreSQL and Python with threading to simulate real concurrent scenarios.

Final Thoughts

Understanding these concepts transformed how I think about database design:

  • I’m no longer afraid of concurrent users
  • I understand why some operations are slow
  • I can debug performance issues systematically
  • I make informed decisions about indexes
  • I choose the right isolation level for each operation

The key insight: Databases are sophisticated systems with well-understood trade-offs. Once you understand the trade-offs, you can make intelligent choices for your specific use case.

This blog post is my “note to future self” - a comprehensive reference I can return to whenever I need to refresh these concepts. I hope it helps you too.

If you’re learning about database internals, I highly recommend building your own demos. The act of coding these scenarios and seeing them fail (and then fixing them) cemented my understanding in a way that reading never could.


Quick Reference

Isolation Levels Quick Guide

Level           | Use For                                  | Prevents                                     | Allows
----------------|------------------------------------------|----------------------------------------------|---------------------------------------------
Read Committed  | Web apps, APIs, dashboards               | Dirty reads                                  | Non-repeatable reads, Lost updates, Phantoms
Repeatable Read | Reports, exports, batch jobs             | Dirty reads, Non-repeatable reads, Phantoms* | Lost updates
Serializable    | Financial transactions, critical updates | Everything                                   | Nothing (safest)

*In PostgreSQL

Query Optimization Quick Guide

Slow query? Check these in order:

  1. Run EXPLAIN ANALYZE - See the actual plan
  2. Look for Seq Scan - On tables > 10k rows? Needs index
  3. Check selectivity - Filter keeps > 20% of rows? Index may not help
  4. Verify indexes exist - On WHERE, JOIN, ORDER BY columns?
  5. Update statistics - Run ANALYZE table_name
  6. Re-test - Did it improve?

Create indexes for:

  • Foreign keys (always!)
  • Columns in WHERE clauses (if selective)
  • Columns in JOIN conditions
  • Columns in ORDER BY (if used frequently)

Don’t create indexes for:

  • Small tables (< 1,000 rows)
  • Low cardinality columns (gender, boolean)
  • Rarely queried columns
  • Frequently updated columns

Last updated: October 2025

Project Repository: isolation-and-query-optimization

Questions or corrections? Open an issue on GitHub or reach out on [Twitter/LinkedIn].
