29 min read

Understanding Database Isolation Levels and Query Planning - A Beginner's Guide

A comprehensive guide to database concurrency control and query optimization, with hands-on examples and practical advice for developers.

#Databases #System-design #Dev-tools


Understanding Database Isolation Levels and Query Planning: A Beginner’s Guide

When I started learning about database internals, I was overwhelmed by terms like “phantom reads,” “serializable isolation,” and “query planners.” These concepts seemed abstract and disconnected from real-world programming. After building a hands-on project to understand them, I realized they’re not only fundamental to database systems but also critical for writing correct, performant applications.

This guide documents everything I learned about two core database concepts: isolation levels (how databases handle concurrent transactions) and query planning (how databases optimize query execution). Think of this as my learning journal—a reference I can return to whenever I need to refresh my understanding.

Table of Contents

  1. Part 1: Understanding Transactions
  2. Part 2: Isolation Levels
  3. Part 3: Query Planning and Optimization
  4. Part 4: Practical Guidelines
  5. Conclusion

Part 1: Understanding Transactions

Before diving into isolation levels, I needed to understand what a transaction actually is.

What is a Transaction?

A transaction is a unit of work that must either completely succeed or completely fail—there’s no in-between. It’s a group of database operations that are treated as a single atomic operation.

Example: Bank Transfer

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE user = 'Alice';
  UPDATE accounts SET balance = balance + 100 WHERE user = 'Bob';
COMMIT;

This is ONE transaction with TWO updates. Either both updates happen, or neither happens. If something goes wrong, the work is rolled back (explicitly with ROLLBACK, or automatically if the database crashes mid-transaction), and it’s as if nothing happened.
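
For example, if a check after the first update reveals a problem, you can abort explicitly. A minimal sketch against the same accounts table:

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE user = 'Alice';
  -- Alice's balance would go negative? Undo everything:
ROLLBACK;
-- The database is exactly as it was before BEGIN.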

The ACID Properties

Transactions follow ACID principles:

  • Atomicity: All or nothing
  • Consistency: Database stays in a valid state
  • Isolation: Transactions don’t interfere with each other (this is what isolation levels control)
  • Durability: Once committed, changes are permanent

The “I” in ACID—Isolation—is what we’ll explore deeply.

The Core Problem

Imagine you’re building a ticket management system like Jira. Two developers, Alice and Bob, try to assign the same ticket to themselves at the exact same time:

Time  | Alice's Transaction          | Bob's Transaction
------|------------------------------|---------------------------
T1    | READ ticket: assigned = NULL |
T2    |                              | READ ticket: assigned = NULL
T3    | WRITE ticket: assigned=Alice |
T4    |                              | WRITE ticket: assigned=Bob

After T4, who owns the ticket? Bob—his write overwrote Alice’s. Neither of them knows there was a conflict. This is called a lost update, and it’s just one of several problems that arise with concurrent transactions.


Part 2: Isolation Levels

Isolation levels define what data transactions can “see” when multiple transactions run simultaneously. They represent a trade-off between consistency (correctness) and performance (speed).

The Spectrum of Isolation

Think of isolation levels as a dial:

Lower Isolation ←------------------------→ Higher Isolation
(Faster, Less Safe)                    (Slower, More Safe)

Read Uncommitted → Read Committed → Repeatable Read → Serializable

The Four Problems We’re Trying to Prevent

Before understanding isolation levels, I needed to know what problems they solve:

1. Dirty Read

Definition: Reading uncommitted data from another transaction that might be rolled back.

Example: Banking System

T1: Alice's transaction updates her balance to $500 (not committed)
T2: Bob's transaction reads Alice's balance: $500
T1: Alice's transaction ROLLS BACK (balance stays at $100)
T2: Bob made a decision based on $500 that doesn't exist

Why it’s bad: You’re basing decisions on data that might never exist. In financial systems, this could lead to fraud or incorrect calculations.

Real-world analogy: A friend shows you a transfer to your account queued up on their phone, but cancels it before hitting “send.” If you already spent that money, you’re in trouble.

2. Non-Repeatable Read

Definition: Reading the same row twice in your transaction and getting different values.

Example: Report Generation

-- Your transaction starts
SELECT story_points FROM tickets WHERE id = 1;  -- Returns 5
-- You do some calculations...

-- Meanwhile, another transaction commits an update

SELECT story_points FROM tickets WHERE id = 1;  -- Returns 10
-- Your calculations are now inconsistent!

Why it’s bad: If you’re generating a report that reads the same data multiple times, you expect consistency. Getting different values mid-transaction makes your logic unreliable.

Real-world analogy: You check your bank balance ($100), decide to buy something for $90, check again and see $50. Someone transferred money out while you were shopping. Your decision to buy was based on outdated information.

3. Lost Update

Definition: Two transactions read the same value, both modify it, and one overwrites the other’s change.

Example: Story Points Update

Initial state: story_points = 10

Transaction A (Alice):
  READ story_points = 10
  Calculate: 10 + 3 = 13
  WRITE story_points = 13
  COMMIT

Transaction B (Bob):
  READ story_points = 10  (read same initial value)
  Calculate: 10 + 5 = 15
  WRITE story_points = 15
  COMMIT

Final state: story_points = 15
Expected: 18 (10 + 3 + 5)
Lost: 3 points from Alice's update

Why it’s bad: Data gets lost without anyone knowing. In inventory systems, you could oversell products. In accounting, you could lose financial data.

Real-world analogy: Two people editing the same Google Doc without real-time sync. They both start from “Version 1,” make changes, and save. Whoever saves last overwrites the other person’s changes.

4. Phantom Read

Definition: Running the same query twice and getting a different number of rows because another transaction inserted or deleted rows.

Example: Aggregation Query

-- Your transaction starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- Returns 5

-- Another transaction inserts a new TODO ticket and commits

SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- Returns 6
-- Your aggregation is inconsistent!

Why it’s bad: Aggregations, reports, and batch operations need a consistent view. If rows appear or disappear mid-transaction, your results are unreliable.

Real-world analogy: You count the number of people in a room (10 people), go back to writing it down, count again to verify, and now there are 11 people. Someone walked in while you were counting.

The Four Isolation Levels

Now let’s see how each isolation level prevents (or allows) these problems.
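First, a quick note on how you actually pick a level. The demos in this guide use a Python get_connection helper, but the equivalent plain SQL (standard PostgreSQL syntax) looks like this:

-- Set the isolation level for a single transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT story_points FROM tickets WHERE id = 1;
COMMIT;

-- Or change the default for the whole session
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;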

Read Uncommitted

What it guarantees: Almost nothing.

What it prevents: Nothing reliably.

Problems allowed:

  • Dirty reads
  • Non-repeatable reads
  • Lost updates
  • Phantom reads

When to use: Almost never. Even PostgreSQL’s “Read Uncommitted” actually behaves like Read Committed.

Performance: Fastest (but too dangerous)

Read Committed (PostgreSQL Default)

What it guarantees: You only see committed data.

What it prevents:

  • Dirty reads ✓

Problems allowed:

  • Non-repeatable reads ✗
  • Lost updates ✗
  • Phantom reads ✗

Example from my demos:

# Transaction 1
UPDATE tickets SET story_points = 20 WHERE id = 1;
# Not committed yet

# Transaction 2 (Read Committed)
SELECT story_points FROM tickets WHERE id = 1;
# Returns 10 (old value, not the uncommitted 20)

# Transaction 1
COMMIT;

# Transaction 2 (still in same transaction)
SELECT story_points FROM tickets WHERE id = 1;
# Now returns 20 (sees the committed change)

When to use:

  • Web applications with mostly independent operations
  • Read-heavy workloads
  • Dashboards and reports where slight staleness is acceptable
  • Most general-purpose applications (80% of use cases)

Performance: Fast and scalable

Key insight I learned: This is why it’s the default. It prevents the most dangerous problem (dirty reads) while maintaining good performance.

Repeatable Read

What it guarantees: If you read the same row twice, you get the same value.

What it prevents:

  • Dirty reads ✓
  • Non-repeatable reads ✓
  • Phantom reads ✓ (in PostgreSQL)

Problems allowed:

  • Lost updates ✗ (still possible, but can be detected)

Example from my demos:

# Transaction starts (Repeatable Read)
SELECT story_points FROM tickets WHERE id = 1;  # Returns 10

# Another transaction commits: UPDATE story_points = 20

# Same transaction reads again
SELECT story_points FROM tickets WHERE id = 1;  # Still returns 10!
# It sees a consistent snapshot from transaction start

When to use:

  • Reports requiring consistent snapshots
  • Batch processing jobs
  • Data exports
  • Analytics queries
  • Any operation that reads data multiple times and expects consistency

Performance: Medium (some overhead for maintaining snapshots)

Key insight I learned: PostgreSQL’s Repeatable Read actually implements Snapshot Isolation, which is stronger than the SQL standard requires. It prevents phantom reads too.

Serializable

What it guarantees: Transactions appear to run one at a time (serialized), even if they actually run concurrently.

What it prevents:

  • Dirty reads ✓
  • Non-repeatable reads ✓
  • Lost updates ✓
  • Phantom reads ✓
  • All anomalies ✓

Problems allowed: None

Example from my demos:

# Transaction 1 (Serializable)
READ story_points = 10
UPDATE story_points = 13
COMMIT  # Success

# Transaction 2 (Serializable)
READ story_points = 10  (read same initial value)
UPDATE story_points = 15
COMMIT  # ABORTED! Serialization failure detected

When to use:

  • Financial transactions (money transfers, payments)
  • Inventory management (prevent overselling)
  • Booking systems (prevent double-booking)
  • Any scenario where correctness is more important than speed

Performance: Slowest (uses locks or conflict detection)

Key insight I learned: One transaction gets aborted when a conflict is detected. Your application must be prepared to retry failed transactions.

Retry pattern:

import time

def transfer_money_with_retry(from_account, to_account, amount, max_retries=3):
    for attempt in range(max_retries):
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)
        try:
            # Do the transfer, then commit
            conn.commit()
            return True
        except SerializationFailure:
            conn.rollback()  # Release the failed transaction before retrying
            if attempt == max_retries - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
        finally:
            conn.close()
    return False

Summary Table

Isolation Level  | Dirty Read | Non-Repeatable Read | Phantom Read | Lost Update | Performance
-----------------|------------|---------------------|--------------|-------------|------------
Read Uncommitted | Possible   | Possible            | Possible     | Possible    | Fastest
Read Committed   | Prevented  | Possible            | Possible     | Possible    | Fast
Repeatable Read  | Prevented  | Prevented           | Prevented*   | Possible    | Medium
Serializable     | Prevented  | Prevented           | Prevented    | Prevented   | Slowest

*In PostgreSQL. The SQL standard allows phantom reads at this level.

Real-World Scenarios I Tested

Scenario 1: The Lost Update (Most Common Bug)

This was the most eye-opening demo I built. Here’s what happens with Read Committed:

Initial: story_points = 10

Alice and Bob both want to increment story points:
- Alice wants to add 3
- Bob wants to add 5

Expected result: 10 + 3 + 5 = 18

Actual result with Read Committed: 15 (one update lost)
Actual result with Serializable: 13 or 15 (one transaction aborted and should retry)

The lesson: Read Committed doesn’t prevent lost updates. You need either:

  1. Serializable isolation
  2. Explicit locking (SELECT ... FOR UPDATE)
  3. Optimistic locking (version numbers)
  4. Application-level atomic operations
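
A minimal sketch of options 2 and 4 against the demo tickets table (using Alice’s +3 increment from above):

-- Option 2: explicit row lock (works at Read Committed)
BEGIN;
SELECT story_points FROM tickets WHERE id = 1 FOR UPDATE;  -- blocks other writers on this row
UPDATE tickets SET story_points = 13 WHERE id = 1;
COMMIT;

-- Option 4: a single atomic statement (no read-modify-write gap at all)
UPDATE tickets SET story_points = story_points + 3 WHERE id = 1;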

Scenario 2: The Phantom Read Problem

Running reports while data changes:

-- Your report starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- 5 tickets

-- Someone creates a new TODO ticket

SELECT SUM(story_points) FROM tickets WHERE status = 'TODO';  -- Now 6 tickets worth of points
-- Your count and sum are inconsistent!

The lesson: For reports, use Repeatable Read or higher to get a consistent snapshot.

Scenario 3: Non-Repeatable Read

The subtle bug in calculations:

BEGIN;  -- Read Committed

SELECT price FROM products WHERE id = 1;  -- $100
-- Calculate discount: $100 * 0.1 = $10

-- Another transaction updates price to $200 and commits

SELECT price FROM products WHERE id = 1;  -- $200
-- Apply discount: $200 - $10 = $190

-- You subtracted $10 from a $200 item because you calculated based on $100!

The lesson: Any time you read data, perform calculations, and use that data again, you’re vulnerable to non-repeatable reads.

My Decision Framework

After building these demos, here’s how I now choose isolation levels:

1. Start with Read Committed (default)

  • Works for 80% of applications
  • Good performance
  • Prevents the most dangerous issue (dirty reads)

2. Upgrade to Repeatable Read when:

  • Generating reports or exports
  • Performing multi-step calculations on same data
  • Running batch jobs that need consistency
  • Reading related data across multiple queries

3. Use Serializable when:

  • Handling money or critical inventory
  • Preventing resource conflicts (booking systems)
  • Correctness matters more than performance
  • Your application can handle retries

4. Consider application-level solutions:

  • Optimistic locking (version columns)
  • Explicit locks (SELECT ... FOR UPDATE)
  • Atomic operations in a single statement
  • Message queues for serializing operations

Part 3: Query Planning and Optimization

After understanding isolation levels, I learned about the second pillar of database performance: how databases decide how to execute queries.

The Fundamental Question

When you write:

SELECT * FROM employees 
WHERE department = 'Engineering' 
AND salary > 100000;

The database has many ways to execute this:

  1. Read every row and check both conditions
  2. Use an index on department, then filter by salary
  3. Use an index on salary, then filter by department
  4. Use both indexes and combine results

How does it decide? The query planner.

What is the Query Planner?

The query planner (or optimizer) is the part of the database that:

  1. Generates all possible ways to execute your query
  2. Estimates the cost of each approach
  3. Picks the cheapest plan
  4. Executes it

Your SQL → Parse → Generate Plans → Estimate Costs → Pick Best → Execute → Results

Key Concept 1: Indexes

Before understanding query planning, I needed to understand indexes.

The Phone Book Analogy

Without an index (sequential scan):

  • Start at page 1
  • Read every name: “Adams… Baker… Chen…”
  • Keep going until you find “Smith”
  • Could check all 1 million names

With an index (index scan):

  • Know that S is about 3/4 through the book
  • Jump directly there
  • Check a few pages
  • Check maybe 100 names instead of 1 million

That’s a 10,000x improvement!

What Are Indexes Really?

An index is a data structure (usually a B-Tree) that stores:

  • The indexed column values (sorted)
  • Pointers to where the actual rows are stored

Index on salary:
50000 → Row 1543
50000 → Row 8721
50001 → Row 234
...
100000 → Row 456
100001 → Row 789
...

When you query WHERE salary > 100000, the database:

  1. Jumps to the index at the 100000 entry
  2. Reads all entries after that point
  3. Fetches only those row IDs from the table

The Cost: Indexes Aren’t Free

Pros:

  • Make reads MUCH faster (10-100x for selective queries)
  • Essential for large tables
  • Help with sorting (ORDER BY)
  • Help with joins (JOIN … ON)

Cons:

  • Slow down writes (INSERT, UPDATE, DELETE)
  • Take up disk space
  • Need maintenance (rebuilding, updating statistics)

Rule of thumb: Create indexes when reads outnumber writes by 10:1 or more.

Key Concept 2: Selectivity

Selectivity = How many rows a filter keeps vs eliminates

High selectivity (good for indexes):

  • Filters out MOST rows
  • Example: WHERE employee_id = 42 (1 out of 100,000 rows)
  • Keeps < 5% of rows

Low selectivity (indexes may not help):

  • Keeps MOST rows
  • Example: WHERE country = 'USA' (90,000 out of 100,000 rows)
  • Keeps > 20% of rows

Example from My Demo

I have 100,000 employees:

  • Engineering: 50,000 employees (50% - low selectivity)
  • HR: 5,000 employees (5% - high selectivity)
  • Salary > $150k: 10,000 employees (10% - medium selectivity)

Query 1: WHERE department = 'Engineering'

  • Not very selective (50% of rows)
  • Index might not help much
  • Planner might choose sequential scan

Query 2: WHERE department = 'HR'

  • Very selective (5% of rows)
  • Index helps a lot
  • Planner will use index scan

Query 3: WHERE salary > 150000

  • Medium selectivity (10% of rows)
  • Index helps significantly
  • Planner will use index scan

Key insight I learned: The planner chooses indexes based on selectivity, not just because an index exists.
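
A quick way I check selectivity before creating an index is a grouped count. A rough sketch against the demo employees table:

-- What fraction of the table does each department represent?
SELECT department,
       count(*) AS employee_count,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_table
FROM employees
GROUP BY department
ORDER BY pct_of_table DESC;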

Key Concept 3: Statistics

The query planner can’t actually run all possible plans to see which is fastest—that would take forever. Instead, it uses statistics to estimate costs.

What statistics does the database keep?

For each table and column:

  • Total number of rows
  • Number of distinct values
  • Most common values and their frequencies
  • Data distribution (histogram)
  • Null value percentage
  • Average row size

Example statistics:

-- PostgreSQL stores statistics
SELECT * FROM pg_stats WHERE tablename = 'employees';

-- You'll see things like:
n_distinct: 5  (for department column - 5 unique departments)
most_common_vals: {Engineering, Sales, Marketing}
most_common_freqs: {0.50, 0.30, 0.15}  (50%, 30%, 15%)

Why Statistics Matter

SELECT * FROM employees WHERE department = 'Engineering';

The planner thinks:

  • “Table has 100,000 rows”
  • “Statistics say ‘Engineering’ is 50% of rows”
  • “So this query will return ~50,000 rows”
  • “That’s too many for an index to help”
  • “I’ll use sequential scan”

If statistics are wrong:

-- Statistics say HR is 5% of rows (5,000 employees)
-- Actually, HR grew to 50,000 employees (50% of rows)

SELECT * FROM employees WHERE department = 'HR';

-- Planner thinks: "Only 5,000 rows, use index!" 
-- Reality: 50,000 rows, index scan is slow
-- Should have used sequential scan

The lesson: Keep statistics updated!

ANALYZE employees;  -- Update statistics for this table
ANALYZE;            -- Update all tables

PostgreSQL auto-analyzes tables, but for rapidly changing tables, manual ANALYZE helps.
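
To check when a table was last analyzed (manually or by autovacuum), the standard statistics view works:

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'employees';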

Key Concept 4: Cost Estimation

The planner assigns a “cost” to each operation:

Costs include:

  • Sequential page read: 1.0 cost units
  • Random page read: 4.0 cost units (disk seeks are expensive)
  • CPU comparison: 0.01 cost units
  • Row processing: 0.01 cost units
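
These numbers roughly correspond to PostgreSQL’s planner cost settings, which you can inspect (and tune, e.g. lowering random_page_cost on SSDs):

SHOW seq_page_cost;      -- 1.0 by default
SHOW random_page_cost;   -- 4.0 by default
SHOW cpu_tuple_cost;     -- 0.01 by default
SHOW cpu_operator_cost;  -- 0.0025 by default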

Example cost calculation:

-- Sequential scan on 100,000 row table
-- Table is 1,000 pages on disk
Cost = 1,000 pages * 1.0 + 100,000 rows * 0.01 = 2,000

-- Index scan returning 10,000 rows
-- Need to read 10,000 random pages
Cost = 10,000 pages * 4.0 + 10,000 rows * 0.01 = 40,100

-- Wait, index scan is MORE expensive!
-- Planner will choose sequential scan

Key insight I learned: For low selectivity queries (returning > 20% of rows), sequential scans are often faster than index scans because sequential reads are cheaper than random reads.

Seeing Query Plans with EXPLAIN

The best tool for understanding query planning is EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM employees WHERE salary > 100000;

Output:

Seq Scan on employees  (cost=0.00..1834.00 rows=10000 width=40) 
                       (actual time=0.012..15.234 rows=9876 loops=1)
  Filter: (salary > 100000)
  Rows Removed by Filter: 90124
Planning Time: 0.123 ms
Execution Time: 16.789 ms

How to read this:

  • Seq Scan: Type of operation (sequential scan, reading all rows)
  • cost=0.00..1834.00: Estimated startup cost .. total cost
  • rows=10000: Estimated rows returned
  • width=40: Estimated average row size (bytes)
  • actual time=0.012..15.234: Real time (ms) to first row .. all rows
  • rows=9876: Actual rows returned
  • Rows Removed by Filter: 90124: Rows scanned but didn’t match
  • Planning Time: Time to create the plan
  • Execution Time: Time to actually run the query

Key metrics to watch:

  1. Execution Time: Higher = slower query
  2. Rows: Estimated vs Actual: Big difference = bad statistics
  3. Seq Scan on large tables: Often indicates missing index
  4. Rows Removed by Filter: High number = inefficient filtering
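
Two EXPLAIN variants I reach for beyond plain EXPLAIN ANALYZE (standard PostgreSQL options):

-- Adds buffer/IO counts to each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM employees WHERE salary > 100000;

-- Machine-readable output, handy for tooling
EXPLAIN (FORMAT JSON)
SELECT * FROM employees WHERE salary > 100000;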

Real Examples from My Demos

Example 1: Full Scan vs Index Scan

Query: Find Engineering employees making > $100k

SELECT * FROM employees 
WHERE department = 'Engineering' 
AND salary > 100000;

Scenario A: No indexes

Seq Scan on employees  (cost=0.00..2834.00 rows=5000 width=40) 
                       (actual time=0.023..35.456 rows=4987 loops=1)
  Filter: (department = 'Engineering' AND salary > 100000)
  Rows Removed by Filter: 95013
Execution Time: 36.789 ms

Reads: 100,000 rows. Time: 36 ms

Scenario B: With index on salary

Index Scan using idx_salary on employees  (cost=0.42..456.78 rows=5000 width=40)
                                          (actual time=0.045..5.234 rows=4987 loops=1)
  Index Cond: (salary > 100000)
  Filter: (department = 'Engineering')
  Rows Removed by Filter: 5013
Execution Time: 5.789 ms

Reads: 10,000 rows (only high earners). Time: 5 ms

7x faster! The index on salary reduced the rows to scan from 100,000 to 10,000.

Example 2: Which Index to Use?

Query: Same as above

Available indexes:

  • idx_department on department
  • idx_salary on salary

Stats:

  • Engineering: 50,000 employees (50%)
  • Salary > $100k: 10,000 employees (10%)

If planner uses department index:

  • Reads 50,000 rows (all Engineering employees)
  • Filters to 5,000 who make > $100k
  • Cost: ~50,000 rows processed

If planner uses salary index:

  • Reads 10,000 rows (all high earners)
  • Filters to 5,000 who are in Engineering
  • Cost: ~10,000 rows processed

The planner chooses salary index because it’s more selective (10% vs 50%).

The lesson: Given multiple indexes, the planner picks the one with highest selectivity.

Example 3: Bitmap Index Scan

Query: Same as above

Strategy: Use BOTH indexes!

BitmapAnd  (cost=234.56..678.90 rows=5000 width=40)
  ->  Bitmap Index Scan on idx_department
        Index Cond: (department = 'Engineering')
  ->  Bitmap Index Scan on idx_salary
        Index Cond: (salary > 100000)
  ->  Bitmap Heap Scan on employees
Execution Time: 3.456 ms

How it works:

  1. Scan department index → get bitmap of row IDs [50,000 rows]
  2. Scan salary index → get bitmap of row IDs [10,000 rows]
  3. Perform AND operation on bitmaps → [5,000 rows]
  4. Fetch only those 5,000 rows from table

Even faster! Only fetches the exact rows needed (5,000) instead of filtering 10,000.

When this is used: When the intersection is significantly smaller than either individual filter.

Example 4: JOIN Order

Tables:

  • employees: 100,000 rows
  • departments: 5 rows

Query:

SELECT e.name, d.name 
FROM employees e
JOIN departments d ON e.department = d.name
WHERE d.name = 'Engineering';

Bad plan (employees first):

  1. Scan all 100,000 employees
  2. For each employee, look up their department
  3. Filter to Engineering
  4. Cost: 100,000 lookups

Good plan (departments first):

  1. Find Engineering department (1 row)
  2. Find all employees in Engineering (50,000 rows)
  3. Cost: 50,000 lookups

The planner automatically picks the good plan by starting with the smallest table and most selective filter.

The lesson: The planner considers JOIN order. Smaller tables and more selective filters first.

My Query Optimization Process

After building these demos, here’s my process for slow queries:

Step 1: Measure with EXPLAIN ANALYZE

EXPLAIN ANALYZE your_slow_query;

Step 2: Look for red flags:

  • Seq Scan on large tables (> 10,000 rows)
  • Large difference between estimated and actual rows
  • High execution time
  • Many rows removed by filter

Step 3: Check for indexes:

SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'your_table';

Step 4: Consider creating indexes:

-- Single column
CREATE INDEX idx_salary ON employees(salary);

-- Multiple columns (order matters!)
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- Partial index (for specific conditions)
CREATE INDEX idx_high_earners ON employees(salary) 
WHERE salary > 100000;

Step 5: Update statistics:

ANALYZE your_table;

Step 6: Test again:

EXPLAIN ANALYZE your_slow_query;

Index Best Practices I Learned

Create indexes on:

  • Primary keys (automatic)
  • Foreign keys (manual - PostgreSQL doesn’t auto-create these!)
  • Columns in WHERE clauses
  • Columns in JOIN conditions
  • Columns in ORDER BY
  • Columns in GROUP BY

Don’t create indexes on:

  • Small tables (< 1,000 rows)
  • Columns with low cardinality (gender, boolean)
  • Columns rarely queried
  • Frequently updated columns (indexes slow down writes)

Multi-column index order:

-- If you often query:
WHERE department = 'Engineering' AND salary > 100000

-- Create index in this order:
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- NOT this order:
CREATE INDEX idx_salary_dept ON employees(salary, department);

Rule: lead with the column you always filter on with equality (department here); range conditions like salary > 100000 go after it.
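
A related detail worth knowing: a composite B-tree index is only fully useful when the query constrains its leading column. A sketch, assuming the idx_dept_salary index above:

-- idx_dept_salary (department, salary) can serve this query on its own:
SELECT * FROM employees WHERE department = 'HR';

-- But it is of little help here, because salary is not the leading column:
SELECT * FROM employees WHERE salary > 100000;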


Part 4: Practical Guidelines

After building this entire project, here are the practical lessons I’ll apply in real applications.

When to Use Each Isolation Level

Use Read Committed (default) for:

  • Web applications (most requests)
  • REST APIs
  • Microservices
  • Any operation that doesn’t read the same data twice
  • Dashboards (slight staleness is fine)

Use Repeatable Read for:

  • Generating reports
  • Data exports
  • Batch jobs
  • Analytics queries
  • Any operation that reads data multiple times and needs consistency

Use Serializable for:

  • Money transfers
  • Payment processing
  • Inventory updates (prevent overselling)
  • Booking systems (prevent double-booking)
  • Any critical operation where lost updates would be catastrophic

Handling Serialization Failures

When using Serializable, transactions can be aborted. Always implement retries:

import time

def safe_transaction(operation, max_retries=3):
    for attempt in range(max_retries):
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)  # Open outside try so conn is always defined
        try:
            result = operation(conn)
            conn.commit()
            return result
        except SerializationFailure:
            conn.rollback()
            if attempt == max_retries - 1:
                raise
            # Exponential backoff
            time.sleep(0.1 * (2 ** attempt))
        finally:
            conn.close()

Alternative: Optimistic Locking

Instead of Serializable isolation, use version numbers:

-- Add version column
ALTER TABLE tickets ADD COLUMN version INTEGER DEFAULT 0;

-- Update with version check
UPDATE tickets 
SET story_points = story_points + 3,
    version = version + 1
WHERE id = 1 
  AND version = 5;  -- Expected version

-- If rowcount = 0, someone else modified it

This works at Read Committed level and gives you control over conflict handling.

Query Optimization Workflow

For Development:

  1. Don’t optimize prematurely
  2. Build features first
  3. Measure performance with real data
  4. Optimize bottlenecks

For Production:

  1. Enable slow query logging
  2. Monitor pg_stat_statements
  3. Identify top 10 slowest queries
  4. Use EXPLAIN ANALYZE
  5. Add indexes strategically
  6. Run ANALYZE regularly
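
For steps 2–3, a query like the following works, assuming the pg_stat_statements extension is enabled (it must be listed in shared_preload_libraries; the column names shown are for PostgreSQL 13+, older versions use total_time/mean_time):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;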

Red flags in production:

  • Queries taking > 1 second
  • Sequential scans on tables > 10,000 rows
  • Many queries on same table (needs index)
  • High CPU usage from sorting (needs index for ORDER BY)

Index Maintenance

Check index usage:

SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan;

If idx_scan = 0: Index is never used → consider dropping it

Unused indexes waste:

  • Disk space
  • Write performance (every INSERT/UPDATE must update the index)
  • Maintenance overhead
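
To see how much disk a particular index is costing you (standard PostgreSQL size functions, using the idx_salary index from earlier):

SELECT pg_size_pretty(pg_relation_size('idx_salary')) AS index_size;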

Drop unused indexes:

DROP INDEX idx_unused;

Rebuild fragmented indexes:

REINDEX INDEX idx_salary;

Common Patterns I Use Now

Pattern 1: Transaction Wrapper

from contextlib import contextmanager

@contextmanager
def transaction(isolation_level=None):
    conn = get_connection(isolation_level)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

# Usage
with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
    cursor = conn.cursor()
    cursor.execute("UPDATE ...")

Pattern 2: Read-Write Split

# Read operations: Read Committed (fast)
def get_user_profile(user_id):
    with transaction(ISOLATION_LEVEL_READ_COMMITTED) as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        return cursor.fetchone()

# Write operations: Serializable (safe)
def transfer_credits(from_user, to_user, amount):
    with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
        cursor = conn.cursor()
        cursor.execute("UPDATE users SET credits = credits - %s WHERE id = %s", 
                      (amount, from_user))
        cursor.execute("UPDATE users SET credits = credits + %s WHERE id = %s", 
                      (amount, to_user))

Pattern 3: Optimistic Locking

def update_with_optimistic_lock(ticket_id, new_points):
    conn = get_connection()
    cursor = conn.cursor()
    
    # Read with version
    cursor.execute("SELECT story_points, version FROM tickets WHERE id = %s", 
                  (ticket_id,))
    current_points, version = cursor.fetchone()
    
    # Update with version check
    cursor.execute("""
        UPDATE tickets 
        SET story_points = %s, version = version + 1
        WHERE id = %s AND version = %s
    """, (new_points, ticket_id, version))
    
    if cursor.rowcount == 0:
        raise ConcurrentModificationError("Ticket was modified by another user")
    
    conn.commit()
    conn.close()

Pattern 4: Batch Processing with Repeatable Read

def generate_monthly_report():
    # Use Repeatable Read for consistent snapshot
    with transaction(ISOLATION_LEVEL_REPEATABLE_READ) as conn:
        cursor = conn.cursor()
        
        # All queries see the same snapshot
        cursor.execute("SELECT SUM(amount) FROM sales WHERE month = 'October'")
        total_sales = cursor.fetchone()[0]
        
        cursor.execute("SELECT COUNT(*) FROM orders WHERE month = 'October'")
        total_orders = cursor.fetchone()[0]
        
        cursor.execute("SELECT AVG(amount) FROM sales WHERE month = 'October'")
        avg_sale = cursor.fetchone()[0]
        
        return {
            'total_sales': total_sales,
            'total_orders': total_orders,
            'average_sale': avg_sale
        }

Debugging Checklist

When things go wrong, here’s what I check:

For Isolation Issues:

  • Are multiple users/processes accessing the same data?
  • Could two transactions be reading and writing the same rows?
  • Am I seeing inconsistent data within a single operation?
  • Are updates getting lost?
  • Is the isolation level appropriate for the operation?

For Performance Issues:

  • Did I run EXPLAIN ANALYZE on the slow query?
  • Are there sequential scans on large tables?
  • Do indexes exist on WHERE/JOIN columns?
  • Are statistics up to date? (ANALYZE)
  • Is the query returning more data than needed?
  • Are there N+1 query problems? (multiple queries in a loop)

Testing Isolation Levels

The most important lesson: isolation bugs only appear under concurrent load.

from threading import Thread

# Bad: Testing serially
def test_lost_update():
    update_story_points(1, 5)  # Works fine
    update_story_points(1, 3)  # Works fine
    # No bug detected!

# Good: Testing concurrently
def test_lost_update():
    thread1 = Thread(target=lambda: update_story_points(1, 5))
    thread2 = Thread(target=lambda: update_story_points(1, 3))
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
    # Now we can see if updates were lost!

Load testing tools:

  • pgbench (PostgreSQL’s built-in tool)
  • Apache JMeter
  • Locust
  • Custom Python scripts with threading

Conclusion

When I started this learning journey, database internals felt like black magic. Now I understand they’re based on fundamental trade-offs:

Isolation Levels: Safety vs Performance

  • More isolation = More safety, Less speed
  • Less isolation = More speed, Less safety
  • No silver bullet - choose based on your use case

Query Planning: Indexes vs Table Scans

  • Indexes = Fast reads, Slow writes, Uses space
  • No indexes = Slow reads, Fast writes, Saves space
  • The planner makes intelligent choices based on statistics

What I Wish I’d Known Earlier

  1. Read Committed is usually enough - Don’t cargo-cult Serializable everywhere
  2. Indexes aren’t magic - They help selective queries, not bulk operations
  3. Statistics matter - Stale statistics (a missed ANALYZE) can tank performance
  4. Test with concurrency - Serial tests won’t catch isolation bugs
  5. EXPLAIN ANALYZE is your best friend - Use it liberally
  6. Premature optimization is real - Measure first, optimize second

My Mental Models Now

For Isolation:

  • Think about concurrent users editing the same data
  • Ask: “What happens if two people do this simultaneously?”
  • Default to Read Committed, upgrade only when needed
  • Always handle Serializable failures with retries

For Query Planning:

  • Think about data volume and selectivity
  • Ask: “How many rows will this filter keep?”
  • Create indexes for selective filters (< 10% of rows)
  • Update statistics after bulk changes
  • Use EXPLAIN ANALYZE to verify assumptions

Resources That Helped Me

Books:

  • “Designing Data-Intensive Applications” by Martin Kleppmann
  • “PostgreSQL: Up and Running” by Regina Obe and Leo Hsu

Online:

Papers:

  • “A Critique of ANSI SQL Isolation Levels” (explains why standards differ from reality)

The GitHub Project

I built a complete hands-on project demonstrating all these concepts:

  • 7 isolation level demos showing each problem and solution
  • 5 query planning demos with real performance comparisons
  • Docker setup for easy reproduction
  • Colored terminal output for clarity
  • Full documentation

Repository: github.com/yourusername/isolation-and-query-optimization

You can clone it, run the demos, and see these concepts in action. The demos use PostgreSQL and Python with threading to simulate real concurrent scenarios.

Final Thoughts

Understanding these concepts transformed how I think about database design:

  • I’m no longer afraid of concurrent users
  • I understand why some operations are slow
  • I can debug performance issues systematically
  • I make informed decisions about indexes
  • I choose the right isolation level for each operation

The key insight: Databases are sophisticated systems with well-understood trade-offs. Once you understand the trade-offs, you can make intelligent choices for your specific use case.

This blog post is my “note to future self” - a comprehensive reference I can return to whenever I need to refresh these concepts. I hope it helps you too.

If you’re learning about database internals, I highly recommend building your own demos. The act of coding these scenarios and seeing them fail (and then fixing them) cemented my understanding in a way that reading never could.


Quick Reference

Isolation Levels Quick Guide

Level           | Use For                                  | Prevents                                     | Allows
----------------|------------------------------------------|----------------------------------------------|---------------------------------------------
Read Committed  | Web apps, APIs, dashboards               | Dirty reads                                  | Non-repeatable reads, Lost updates, Phantoms
Repeatable Read | Reports, exports, batch jobs             | Dirty reads, Non-repeatable reads, Phantoms* | Lost updates
Serializable    | Financial transactions, critical updates | Everything                                   | Nothing (safest)

*In PostgreSQL

Query Optimization Quick Guide

Slow query? Check these in order:

  1. Run EXPLAIN ANALYZE - See the actual plan
  2. Look for Seq Scan - On tables > 10k rows? Needs index
  3. Check selectivity - Filter keeps > 20% of rows? Index may not help
  4. Verify indexes exist - On WHERE, JOIN, ORDER BY columns?
  5. Update statistics - Run ANALYZE table_name
  6. Re-test - Did it improve?

Create indexes for:

  • Foreign keys (always!)
  • Columns in WHERE clauses (if selective)
  • Columns in JOIN conditions
  • Columns in ORDER BY (if used frequently)

Don’t create indexes for:

  • Small tables (< 1,000 rows)
  • Low cardinality columns (gender, boolean)
  • Rarely queried columns
  • Frequently updated columns

Last updated: October 2025

Project Repository: isolation-and-query-optimization

Questions or corrections? Open an issue on GitHub or reach out on [Twitter/LinkedIn].
