Understanding Database Isolation Levels and Query Planning - A Beginner's Guide

29 min read
#databases #postgresql #isolation-levels #query-optimization


When I started learning about database internals, I was overwhelmed by terms like “phantom reads,” “serializable isolation,” and “query planners.” These concepts seemed abstract and disconnected from real-world programming. After building a hands-on project to understand them, I realized they’re not only fundamental to database systems but also critical for writing correct, performant applications.

This guide documents everything I learned about two core database concepts: isolation levels (how databases handle concurrent transactions) and query planning (how databases optimize query execution). Think of this as my learning journal—a reference I can return to whenever I need to refresh my understanding.

Table of Contents

  1. Part 1: Understanding Transactions
  2. Part 2: Isolation Levels
  3. Part 3: Query Planning and Optimization
  4. Part 4: Practical Guidelines
  5. Conclusion

Part 1: Understanding Transactions

Before diving into isolation levels, I needed to understand what a transaction actually is.

What is a Transaction?

A transaction is a unit of work that must either completely succeed or completely fail—there’s no in-between. It’s a group of database operations that are treated as a single atomic operation.

Example: Bank Transfer

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE user = 'Alice';
  UPDATE accounts SET balance = balance + 100 WHERE user = 'Bob';
COMMIT;

This is ONE transaction with TWO updates. Either both updates happen, or neither happens. If something goes wrong (like the database crashes), you can ROLLBACK, and it’s as if nothing happened.
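For reference, here is what the failure path looks like in plain SQL (a minimal sketch of the same transfer, aborted on purpose):

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE user = 'Alice';
  -- The application spots a problem (say, insufficient funds) before the second update
ROLLBACK;  -- Alice's balance is exactly what it was before BEGIN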

The ACID Properties

Transactions follow ACID principles:

  - Atomicity: all operations in the transaction succeed or none do
  - Consistency: the database moves from one valid state to another
  - Isolation: concurrent transactions don't interfere with each other
  - Durability: once committed, changes survive crashes

The “I” in ACID—Isolation—is what we’ll explore deeply.

The Core Problem

Imagine you’re building a ticket management system like Jira. Two developers, Alice and Bob, try to assign the same ticket to themselves at the exact same time:

Time  | Alice's Transaction          | Bob's Transaction
------|------------------------------|---------------------------
T1    | READ ticket: assigned = NULL |
T2    |                              | READ ticket: assigned = NULL
T3    | WRITE ticket: assigned=Alice |
T4    |                              | WRITE ticket: assigned=Bob

After T4, who owns the ticket? Bob—his write overwrote Alice’s. Neither of them knows there was a conflict. This is called a lost update, and it’s just one of several problems that arise with concurrent transactions.


Part 2: Isolation Levels

Isolation levels define what data transactions can “see” when multiple transactions run simultaneously. They represent a trade-off between consistency (correctness) and performance (speed).

The Spectrum of Isolation

Think of isolation levels as a dial:

Lower Isolation ←------------------------→ Higher Isolation
(Faster, Less Safe)                    (Slower, More Safe)

Read Uncommitted → Read Committed → Repeatable Read → Serializable

The Four Problems We’re Trying to Prevent

Before understanding isolation levels, I needed to know what problems they solve:

1. Dirty Read

Definition: Reading uncommitted data from another transaction that might be rolled back.

Example: Banking System

T1: Alice's transaction updates her balance to $500 (not committed)
T2: Bob's transaction reads Alice's balance: $500
T1: Alice's transaction ROLLS BACK (balance stays at $100)
T2: Bob made a decision based on $500 that doesn't exist

Why it’s bad: You’re basing decisions on data that might never exist. In financial systems, this could lead to fraud or incorrect calculations.

Real-world analogy: A friend shows you a transfer into your account queued up on their phone, then cancels it before hitting “send.” If you already spent that money, you’re in trouble.

2. Non-Repeatable Read

Definition: Reading the same row twice in your transaction and getting different values.

Example: Report Generation

-- Your transaction starts
SELECT story_points FROM tickets WHERE id = 1;  -- Returns 5
-- You do some calculations...

-- Meanwhile, another transaction commits an update

SELECT story_points FROM tickets WHERE id = 1;  -- Returns 10
-- Your calculations are now inconsistent!

Why it’s bad: If you’re generating a report that reads the same data multiple times, you expect consistency. Getting different values mid-transaction makes your logic unreliable.

Real-world analogy: You check your bank balance ($100), decide to buy something for $90, check again and see $50. Someone transferred money out while you were shopping. Your decision to buy was based on outdated information.

3. Lost Update

Definition: Two transactions read the same value, both modify it, and one overwrites the other’s change.

Example: Story Points Update

Initial state: story_points = 10

Transaction A (Alice):
  READ story_points = 10
  Calculate: 10 + 3 = 13
  WRITE story_points = 13
  COMMIT

Transaction B (Bob):
  READ story_points = 10  (read same initial value)
  Calculate: 10 + 5 = 15
  WRITE story_points = 15
  COMMIT

Final state: story_points = 15
Expected: 18 (10 + 3 + 5)
Lost: 3 points from Alice's update

Why it’s bad: Data gets lost without anyone knowing. In inventory systems, you could oversell products. In accounting, you could lose financial data.

Real-world analogy: Two people editing the same Google Doc without real-time sync. They both start from “Version 1,” make changes, and save. Whoever saves last overwrites the other person’s changes.

4. Phantom Read

Definition: Running the same query twice and getting a different number of rows because another transaction inserted or deleted rows.

Example: Aggregation Query

-- Your transaction starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- Returns 5

-- Another transaction inserts a new TODO ticket and commits

SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- Returns 6
-- Your aggregation is inconsistent!

Why it’s bad: Aggregations, reports, and batch operations need a consistent view. If rows appear or disappear mid-transaction, your results are unreliable.

Real-world analogy: You count the number of people in a room (10 people), go back to writing it down, count again to verify, and now there are 11 people. Someone walked in while you were counting.

The Four Isolation Levels

Now let’s see how each isolation level prevents (or allows) these problems.

Read Uncommitted

What it guarantees: Almost nothing.

What it prevents: Nothing reliably.

Problems allowed:

  - Dirty reads
  - Non-repeatable reads
  - Phantom reads
  - Lost updates

When to use: Almost never. Even PostgreSQL’s “Read Uncommitted” actually behaves like Read Committed.

Performance: Fastest (but too dangerous)

Read Committed (PostgreSQL Default)

What it guarantees: You only see committed data.

What it prevents:

  - Dirty reads (you never see uncommitted data)

Problems allowed:

  - Non-repeatable reads
  - Phantom reads
  - Lost updates

Example from my demos:

# Transaction 1
UPDATE tickets SET story_points = 20 WHERE id = 1;
# Not committed yet

# Transaction 2 (Read Committed)
SELECT story_points FROM tickets WHERE id = 1;
# Returns 10 (old value, not the uncommitted 20)

# Transaction 1
COMMIT;

# Transaction 2 (still in same transaction)
SELECT story_points FROM tickets WHERE id = 1;
# Now returns 20 (sees the committed change)

When to use:

  - Most web applications, APIs, and dashboards
  - General CRUD operations where occasionally seeing newer committed data mid-transaction is acceptable

Performance: Fast and scalable

Key insight I learned: This is why it’s the default. It prevents the most dangerous problem (dirty reads) while maintaining good performance.
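To see which level a session runs at, and to raise it for a single transaction, plain SQL looks like this (a minimal sketch; the demos in this post set the level from Python instead):

SHOW default_transaction_isolation;  -- 'read committed' on a stock PostgreSQL install

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- ... queries that need a consistent snapshot ...
COMMIT;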

Repeatable Read

What it guarantees: If you read the same row twice, you get the same value.

What it prevents:

  - Dirty reads
  - Non-repeatable reads
  - Phantom reads (in PostgreSQL)

Problems allowed:

  - Lost updates

Example from my demos:

# Transaction starts (Repeatable Read)
SELECT story_points FROM tickets WHERE id = 1;  # Returns 10

# Another transaction commits: UPDATE story_points = 20

# Same transaction reads again
SELECT story_points FROM tickets WHERE id = 1;  # Still returns 10!
# It sees a consistent snapshot from transaction start

When to use:

  - Reports and exports that read the same data multiple times
  - Batch jobs that need a consistent snapshot

Performance: Medium (some overhead for maintaining snapshots)

Key insight I learned: PostgreSQL’s Repeatable Read actually implements Snapshot Isolation, which is stronger than the SQL standard requires. It prevents phantom reads too.

Serializable

What it guarantees: Transactions appear to run one at a time (serialized), even if they actually run concurrently.

What it prevents:

  - Dirty reads
  - Non-repeatable reads
  - Phantom reads
  - Lost updates

Problems allowed: None

Example from my demos:

# Transaction 1 (Serializable)
READ story_points = 10
UPDATE story_points = 13
COMMIT  # Success

# Transaction 2 (Serializable)
READ story_points = 10  (read same initial value)
UPDATE story_points = 15
COMMIT  # ABORTED! Serialization failure detected

When to use:

  - Financial transactions and anything involving money
  - Critical updates where a lost update would be unacceptable and the application can retry

Performance: Slowest (uses locks or conflict detection)

Key insight I learned: One transaction gets aborted when a conflict is detected. Your application must be prepared to retry failed transactions.

Retry pattern:

import time

def transfer_money_with_retry(from_account, to_account, amount, max_retries=3):
    # get_connection and ISOLATION_LEVEL_SERIALIZABLE are the demo project's helpers;
    # SerializationFailure corresponds to e.g. psycopg2.errors.SerializationFailure.
    for attempt in range(max_retries):
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)
        try:
            # Do the transfer
            conn.commit()
            return True
        except SerializationFailure:
            conn.rollback()  # discard the aborted transaction before retrying
            if attempt == max_retries - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
        finally:
            conn.close()
    return False

Summary Table

Isolation Level  | Dirty Read | Non-Repeatable Read | Phantom Read | Lost Update | Performance
-----------------|------------|---------------------|--------------|-------------|------------
Read Uncommitted | Possible   | Possible            | Possible     | Possible    | Fastest
Read Committed   | Prevented  | Possible            | Possible     | Possible    | Fast
Repeatable Read  | Prevented  | Prevented           | Prevented*   | Possible    | Medium
Serializable     | Prevented  | Prevented           | Prevented    | Prevented   | Slowest

*In PostgreSQL. The SQL standard allows phantom reads at this level.

Real-World Scenarios I Tested

Scenario 1: The Lost Update (Most Common Bug)

This was the most eye-opening demo I built. Here’s what happens with Read Committed:

Initial: story_points = 10

Alice and Bob both want to increment story points:
- Alice wants to add 3
- Bob wants to add 5

Expected result: 10 + 3 + 5 = 18

Actual result with Read Committed: 15 (one update lost)
Actual result with Serializable: 13 or 15 (one transaction aborted and should retry)

The lesson: Read Committed doesn’t prevent lost updates. You need either:

  1. Serializable isolation
  2. Explicit locking (SELECT ... FOR UPDATE)
  3. Optimistic locking (version numbers)
  4. Application-level atomic operations
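For example, options 2 and 4 look roughly like this in SQL (a sketch against the tickets table from the demos above):

-- Option 2: explicit locking. FOR UPDATE takes a row lock, so a second
-- transaction doing the same thing waits until this one commits.
BEGIN;
SELECT story_points FROM tickets WHERE id = 1 FOR UPDATE;
UPDATE tickets SET story_points = 13 WHERE id = 1;
COMMIT;

-- Option 4: let the database do the arithmetic atomically instead of
-- reading, computing in the application, and writing back.
UPDATE tickets SET story_points = story_points + 3 WHERE id = 1;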

Scenario 2: The Phantom Read Problem

Running reports while data changes:

-- Your report starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO';  -- 5 tickets

-- Someone creates a new TODO ticket

SELECT SUM(story_points) FROM tickets WHERE status = 'TODO';  -- Now 6 tickets worth of points
-- Your count and sum are inconsistent!

The lesson: For reports, use Repeatable Read or higher to get a consistent snapshot.
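In PostgreSQL that just means starting the report's transaction at the higher level (a minimal sketch):

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM tickets WHERE status = 'TODO';
SELECT SUM(story_points) FROM tickets WHERE status = 'TODO';  -- same snapshot as the COUNT above
COMMIT;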

Scenario 3: Non-Repeatable Read

The subtle bug in calculations:

BEGIN;  -- Read Committed

SELECT price FROM products WHERE id = 1;  -- $100
-- Calculate discount: $100 * 0.1 = $10

-- Another transaction updates price to $200 and commits

SELECT price FROM products WHERE id = 1;  -- $200
-- Apply discount: $200 - $10 = $190

-- You subtracted $10 from a $200 item because you calculated based on $100!

The lesson: Any time you read data, perform calculations, and use that data again, you’re vulnerable to non-repeatable reads.
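One simple defence is to push the calculation into a single statement, so the read and the write cannot be separated by another transaction (a sketch, assuming the products table from the example above):

-- The 10% discount is computed and applied atomically inside one UPDATE,
-- so a concurrent price change can't sneak in between the read and the write.
UPDATE products
SET price = price - (price * 0.10)
WHERE id = 1;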

My Decision Framework

After building these demos, here’s how I now choose isolation levels:

1. Start with Read Committed (default)

2. Upgrade to Repeatable Read when:
   - Generating reports or exports that read the same data several times
   - Running batch jobs that need a consistent snapshot

3. Use Serializable when:
   - Money or other critical data is involved
   - A lost update would be a serious bug and the application can retry aborted transactions

4. Consider application-level solutions:
   - SELECT ... FOR UPDATE for explicit locking
   - Optimistic locking with version numbers
   - Atomic updates (e.g. SET story_points = story_points + 3) instead of read-modify-write


Part 3: Query Planning and Optimization

After understanding isolation levels, I learned about the second pillar of database performance: how databases decide how to execute queries.

The Fundamental Question

When you write:

SELECT * FROM employees 
WHERE department = 'Engineering' 
AND salary > 100000;

The database has many ways to execute this:

  1. Read every row and check both conditions
  2. Use an index on department, then filter by salary
  3. Use an index on salary, then filter by department
  4. Use both indexes and combine results

How does it decide? The query planner.

What is the Query Planner?

The query planner (or optimizer) is the part of the database that:

  1. Generates all possible ways to execute your query
  2. Estimates the cost of each approach
  3. Picks the cheapest plan
  4. Executes it

Your SQL → Parse → Generate Plans → Estimate Costs → Pick Best → Execute → Results

Key Concept 1: Indexes

Before understanding query planning, I needed to understand indexes.

The Phone Book Analogy

Without an index (sequential scan): to find one name, you read every entry from the first page onward until you hit it.

With an index (index scan): the book is sorted by last name, so you jump straight to the right section and find the entry after checking only a handful of pages.

That’s a 10,000x improvement!

What Are Indexes Really?

An index is a data structure (usually a B-Tree) that stores:

  - The indexed column's values, kept in sorted order
  - For each value, a pointer (row ID) to where that row lives in the table

Index on salary:
50000 → Row 1543
50000 → Row 8721
50001 → Row 234
...
100000 → Row 456
100001 → Row 789
...

When you query WHERE salary > 100000, the database:

  1. Jumps to the index at the 100000 entry
  2. Reads all entries after that point
  3. Fetches only those row IDs from the table

The Cost: Indexes Aren’t Free

Pros:

  - Dramatically faster lookups for selective queries
  - Help with JOINs, ORDER BY, and uniqueness constraints

Cons:

  - Every INSERT, UPDATE, and DELETE must also maintain the index (slower writes)
  - Extra disk space
  - The planner may ignore the index anyway if the filter isn't selective enough

Rule of thumb: Create indexes when reads outnumber writes by 10:1 or more.

Key Concept 2: Selectivity

Selectivity = How many rows a filter keeps vs eliminates

High selectivity (good for indexes):

  - WHERE email = 'alice@example.com' (matches 1 row out of 100,000)
  - WHERE id = 42 (matches exactly one row)

Low selectivity (indexes may not help):

  - WHERE department = 'Engineering' (matches ~50% of rows in my demo data)
  - A boolean flag that is true for most rows

Example from My Demo

I have 100,000 employees:

Query 1: WHERE department = 'Engineering' → ~50,000 rows (50% of the table). Low selectivity, so a sequential scan is usually cheaper than an index.

Query 2: WHERE department = 'HR' → ~5,000 rows (5%). Selective enough that an index scan wins.

Query 3: WHERE salary > 150000 → only a small fraction of employees. Highly selective, so an index on salary pays off.

Key insight I learned: The planner chooses indexes based on selectivity, not just because an index exists.
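You can watch the planner make that choice by comparing plans for a selective and a non-selective filter (a sketch against the demo's employees table; the exact plans depend on your data and statistics):

EXPLAIN SELECT * FROM employees WHERE department = 'HR';           -- ~5% of rows: an index scan is likely
EXPLAIN SELECT * FROM employees WHERE department = 'Engineering';  -- ~50% of rows: a sequential scan is likely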

Key Concept 3: Statistics

The query planner can’t actually run all possible plans to see which is fastest—that would take forever. Instead, it uses statistics to estimate costs.

What statistics does the database keep?

For each table and column:

  - An approximate row count for the table
  - The number of distinct values (n_distinct)
  - The most common values and their frequencies
  - A histogram of the value distribution

Example statistics:

-- PostgreSQL stores statistics
SELECT * FROM pg_stats WHERE tablename = 'employees';

-- You'll see things like:
n_distinct: 5  (for department column - 5 unique departments)
most_common_vals: {Engineering, Sales, Marketing}
most_common_freqs: {0.50, 0.30, 0.15}  (50%, 30%, 15%)

Why Statistics Matter

SELECT * FROM employees WHERE department = 'Engineering';

The planner thinks:

  - "Statistics say 50% of employees are in Engineering, so this will return about 50,000 rows."
  - "Fetching half the table through an index means lots of expensive random reads, so a sequential scan is cheaper."

If statistics are wrong:

-- Statistics say HR is 5% of rows (5,000 employees)
-- Actually, HR grew to 50,000 employees (50% of rows)

SELECT * FROM employees WHERE department = 'HR';

-- Planner thinks: "Only 5,000 rows, use index!" 
-- Reality: 50,000 rows, index scan is slow
-- Should have used sequential scan

The lesson: Keep statistics updated!

ANALYZE employees;  -- Update statistics for this table
ANALYZE;            -- Update all tables

PostgreSQL auto-analyzes tables, but for rapidly changing tables, manual ANALYZE helps.
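To check whether a table has been analyzed recently, pg_stat_user_tables records the timestamps (a minimal sketch):

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS FIRST;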

Key Concept 4: Cost Estimation

The planner assigns a “cost” to each operation:

Costs include:

  - Sequential page reads (seq_page_cost, default 1.0)
  - Random page reads (random_page_cost, default 4.0)
  - Processing each row (cpu_tuple_cost, default 0.01)

Example cost calculation:

-- Sequential scan on 100,000 row table
-- Table is 1,000 pages on disk
Cost = 1,000 pages * 1.0 + 100,000 rows * 0.01 = 2,000

-- Index scan returning 10,000 rows
-- Need to read 10,000 random pages
Cost = 10,000 pages * 4.0 + 10,000 rows * 0.01 = 40,100

-- Wait, index scan is MORE expensive!
-- Planner will choose sequential scan

Key insight I learned: For low selectivity queries (returning > 20% of rows), sequential scans are often faster than index scans because sequential reads are cheaper than random reads.
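The per-page and per-row numbers in that calculation are configurable planner parameters, and you can inspect them directly (these are PostgreSQL's stock defaults):

SHOW seq_page_cost;      -- 1.0
SHOW random_page_cost;   -- 4.0
SHOW cpu_tuple_cost;     -- 0.01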

Seeing Query Plans with EXPLAIN

The best tool for understanding query planning is EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM employees WHERE salary > 100000;

Output:

Seq Scan on employees  (cost=0.00..1834.00 rows=10000 width=40) 
                       (actual time=0.012..15.234 rows=9876 loops=1)
  Filter: (salary > 100000)
  Rows Removed by Filter: 90124
Planning Time: 0.123 ms
Execution Time: 16.789 ms

How to read this:

  - Seq Scan on employees: the planner read the whole table
  - cost=0.00..1834.00: estimated startup and total cost (arbitrary units)
  - rows=10000 (estimated) vs rows=9876 (actual): close together, so the statistics are good
  - actual time=0.012..15.234: real time spent, in milliseconds
  - Rows Removed by Filter: 90124: rows that were read and then discarded

Key metrics to watch:

  1. Execution Time: Higher = slower query
  2. Rows: Estimated vs Actual: Big difference = bad statistics
  3. Seq Scan on large tables: Often indicates missing index
  4. Rows Removed by Filter: High number = inefficient filtering
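A variant worth knowing: adding BUFFERS shows how many pages were served from cache versus read from disk, which often explains why the same plan is fast one run and slow the next (a sketch):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM employees WHERE salary > 100000;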

Real Examples from My Demos

Example 1: Full Scan vs Index Scan

Query: Find Engineering employees making > $100k

SELECT * FROM employees 
WHERE department = 'Engineering' 
AND salary > 100000;

Scenario A: No indexes

Seq Scan on employees  (cost=0.00..2834.00 rows=5000 width=40) 
                       (actual time=0.023..35.456 rows=4987 loops=1)
  Filter: (department = 'Engineering' AND salary > 100000)
  Rows Removed by Filter: 95013
Execution Time: 36.789 ms

Reads: 100,000 rows. Time: 36 ms

Scenario B: With index on salary

Index Scan using idx_salary on employees  (cost=0.42..456.78 rows=5000 width=40)
                                          (actual time=0.045..5.234 rows=4987 loops=1)
  Index Cond: (salary > 100000)
  Filter: (department = 'Engineering')
  Rows Removed by Filter: 5013
Execution Time: 5.789 ms

Reads: 10,000 rows (only high earners). Time: 5 ms

7x faster! The index on salary reduced the rows to scan from 100,000 to 10,000.

Example 2: Which Index to Use?

Query: Same as above

Available indexes:

  - idx_department on employees(department)
  - idx_salary on employees(salary)

Stats:

  - department = 'Engineering' matches ~50% of rows (50,000)
  - salary > 100000 matches ~10% of rows (10,000)

If planner uses department index:

  - Fetch ~50,000 row pointers from the index, then filter each row by salary

If planner uses salary index:

  - Fetch ~10,000 row pointers from the index, then filter each row by department

The planner chooses salary index because it’s more selective (10% vs 50%).

The lesson: Given multiple indexes, the planner picks the one with highest selectivity.

Example 3: Bitmap Index Scan

Query: Same as above

Strategy: Use BOTH indexes!

Bitmap Heap Scan on employees  (cost=234.56..678.90 rows=5000 width=40)
  Recheck Cond: ((department = 'Engineering') AND (salary > 100000))
  ->  BitmapAnd
        ->  Bitmap Index Scan on idx_department
              Index Cond: (department = 'Engineering')
        ->  Bitmap Index Scan on idx_salary
              Index Cond: (salary > 100000)
Execution Time: 3.456 ms

How it works:

  1. Scan department index → get bitmap of row IDs [50,000 rows]
  2. Scan salary index → get bitmap of row IDs [10,000 rows]
  3. Perform AND operation on bitmaps → [5,000 rows]
  4. Fetch only those 5,000 rows from table

Even faster! Only fetches the exact rows needed (5,000) instead of filtering 10,000.

When this is used: When the intersection is significantly smaller than either individual filter.

Example 4: JOIN Order

Tables:

  - employees: 100,000 rows
  - departments: 5 rows (one per department)

Query:

SELECT e.name, d.name 
FROM employees e
JOIN departments d ON e.department = d.name
WHERE d.name = 'Engineering';

Bad plan (employees first):

  1. Scan all 100,000 employees
  2. For each employee, look up their department
  3. Filter to Engineering
  4. Cost: 100,000 lookups

Good plan (departments first):

  1. Find Engineering department (1 row)
  2. Find all employees in Engineering (50,000 rows)
  3. Cost: 50,000 lookups

The planner automatically picks the good plan by starting with the smallest table and most selective filter.

The lesson: The planner considers JOIN order. Smaller tables and more selective filters first.

My Query Optimization Process

After building these demos, here’s my process for slow queries:

Step 1: Measure with EXPLAIN ANALYZE

EXPLAIN ANALYZE your_slow_query;

Step 2: Look for red flags:

  - Seq Scan on large tables
  - Estimated rows far from actual rows (stale statistics)
  - A large "Rows Removed by Filter" count
  - Execution time dominated by a single node

Step 3: Check for indexes:

SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'your_table';

Step 4: Consider creating indexes:

-- Single column
CREATE INDEX idx_salary ON employees(salary);

-- Multiple columns (order matters!)
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- Partial index (for specific conditions)
CREATE INDEX idx_high_earners ON employees(salary) 
WHERE salary > 100000;

Step 5: Update statistics:

ANALYZE your_table;

Step 6: Test again:

EXPLAIN ANALYZE your_slow_query;

Index Best Practices I Learned

Create indexes on:

  - Columns used in WHERE clauses
  - Columns used in JOINs (especially foreign keys)
  - Columns used in ORDER BY

Don't create indexes on:

  - Small tables (a sequential scan is already fast)
  - Low-selectivity columns (boolean flags, statuses with few values)
  - Columns that are frequently updated but rarely queried

Multi-column index order:

-- If you often query:
WHERE department = 'Engineering' AND salary > 100000

-- Create index in this order:
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- NOT this order:
CREATE INDEX idx_salary_dept ON employees(salary, department);

Rule: Most selective column first, OR column you always filter on first.
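The practical consequence, sketched with the idx_dept_salary index above: filters that include the leading column can use the index, while a filter on salary alone generally cannot use it efficiently:

-- Can use idx_dept_salary (department is the leading column):
EXPLAIN SELECT * FROM employees WHERE department = 'Engineering';
EXPLAIN SELECT * FROM employees WHERE department = 'Engineering' AND salary > 100000;

-- Filters only on the second column, so the planner usually picks another plan:
EXPLAIN SELECT * FROM employees WHERE salary > 100000;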


Part 4: Practical Guidelines

After building this entire project, here are the practical lessons I’ll apply in real applications.

When to Use Each Isolation Level

Use Read Committed (default) for:

  - Web apps, APIs, and dashboards
  - General CRUD operations

Use Repeatable Read for:

  - Reports, exports, and batch jobs that need a consistent snapshot

Use Serializable for:

  - Financial transactions and other critical updates where a lost update is unacceptable

Handling Serialization Failures

When using Serializable, transactions can be aborted. Always implement retries:

def safe_transaction(operation, max_retries=3):
    for attempt in range(max_retries):
        # Acquire the connection outside the try block so the finally clause
        # never touches an unbound variable if get_connection itself fails
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)
        try:
            result = operation(conn)
            conn.commit()
            return result
        except SerializationFailure:
            conn.rollback()
            if attempt == max_retries - 1:
                raise
            # Exponential backoff
            time.sleep(0.1 * (2 ** attempt))
        finally:
            conn.close()

Alternative: Optimistic Locking

Instead of Serializable isolation, use version numbers:

-- Add version column
ALTER TABLE tickets ADD COLUMN version INTEGER DEFAULT 0;

-- Update with version check
UPDATE tickets 
SET story_points = story_points + 3,
    version = version + 1
WHERE id = 1 
  AND version = 5;  -- Expected version

-- If rowcount = 0, someone else modified it

This works at Read Committed level and gives you control over conflict handling.

Query Optimization Workflow

For Development:

  1. Don’t optimize prematurely
  2. Build features first
  3. Measure performance with real data
  4. Optimize bottlenecks

For Production:

  1. Enable slow query logging
  2. Monitor pg_stat_statements
  3. Identify top 10 slowest queries
  4. Use EXPLAIN ANALYZE
  5. Add indexes strategically
  6. Run ANALYZE regularly
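For steps 2 and 3, a typical pg_stat_statements query looks like this (a sketch; the extension has to be enabled via shared_preload_libraries and CREATE EXTENSION pg_stat_statements, and older PostgreSQL versions name the columns mean_time/total_time instead of mean_exec_time/total_exec_time):

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;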

Red flags in production:

  - Queries that switch from index scans to sequential scans as data grows
  - Execution times creeping up over time
  - Indexes with idx_scan = 0 (never used)
  - Tables that haven't been analyzed recently

Index Maintenance

Check index usage:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan;

If idx_scan = 0: Index is never used → consider dropping it

Unused indexes waste:

  - Disk space
  - Write performance (every INSERT/UPDATE/DELETE still maintains them)
  - Maintenance effort (VACUUM and REINDEX time)

Drop unused indexes:

DROP INDEX idx_unused;

Rebuild fragmented indexes:

REINDEX INDEX idx_salary;

Common Patterns I Use Now

Pattern 1: Transaction Wrapper

from contextlib import contextmanager

@contextmanager
def transaction(isolation_level=None):
    conn = get_connection(isolation_level)
    try:
        yield conn
        conn.commit()
    except Exception as e:
        conn.rollback()
        raise
    finally:
        conn.close()

# Usage
with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
    cursor = conn.cursor()
    cursor.execute("UPDATE ...")

Pattern 2: Read-Write Split

# Read operations: Read Committed (fast)
def get_user_profile(user_id):
    with transaction(ISOLATION_LEVEL_READ_COMMITTED) as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        return cursor.fetchone()

# Write operations: Serializable (safe)
def transfer_credits(from_user, to_user, amount):
    with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
        cursor = conn.cursor()
        cursor.execute("UPDATE users SET credits = credits - %s WHERE id = %s", 
                      (amount, from_user))
        cursor.execute("UPDATE users SET credits = credits + %s WHERE id = %s", 
                      (amount, to_user))

Pattern 3: Optimistic Locking

def update_with_optimistic_lock(ticket_id, new_points):
    conn = get_connection()
    cursor = conn.cursor()
    
    # Read with version
    cursor.execute("SELECT story_points, version FROM tickets WHERE id = %s", 
                  (ticket_id,))
    current_points, version = cursor.fetchone()
    
    # Update with version check
    cursor.execute("""
        UPDATE tickets 
        SET story_points = %s, version = version + 1
        WHERE id = %s AND version = %s
    """, (new_points, ticket_id, version))
    
    if cursor.rowcount == 0:
        # ConcurrentModificationError is a custom exception class, not a built-in
        raise ConcurrentModificationError("Ticket was modified by another user")
    
    conn.commit()
    conn.close()

Pattern 4: Batch Processing with Repeatable Read

def generate_monthly_report():
    # Use Repeatable Read for consistent snapshot
    with transaction(ISOLATION_LEVEL_REPEATABLE_READ) as conn:
        cursor = conn.cursor()
        
        # All queries see the same snapshot
        cursor.execute("SELECT SUM(amount) FROM sales WHERE month = 'October'")
        total_sales = cursor.fetchone()[0]
        
        cursor.execute("SELECT COUNT(*) FROM orders WHERE month = 'October'")
        total_orders = cursor.fetchone()[0]
        
        cursor.execute("SELECT AVG(amount) FROM sales WHERE month = 'October'")
        avg_sale = cursor.fetchone()[0]
        
        return {
            'total_sales': total_sales,
            'total_orders': total_orders,
            'average_sale': avg_sale
        }

Debugging Checklist

When things go wrong, here’s what I check:

For Isolation Issues:

For Performance Issues:

Testing Isolation Levels

The most important lesson: isolation bugs only appear under concurrent load.

from threading import Thread

# Bad: Testing serially
def test_lost_update():
    update_story_points(1, 5)  # Works fine
    update_story_points(1, 3)  # Works fine
    # No bug detected!

# Good: Testing concurrently
def test_lost_update():
    thread1 = Thread(target=lambda: update_story_points(1, 5))
    thread2 = Thread(target=lambda: update_story_points(1, 3))
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
    # Now we can see if updates were lost!

Load testing tools:


Conclusion

When I started this learning journey, database internals felt like black magic. Now I understand they’re based on fundamental trade-offs:

Isolation Levels: Safety vs Performance

Query Planning: Indexes vs Table Scans

What I Wish I’d Known Earlier

  1. Read Committed is usually enough - Don’t cargo-cult Serializable everywhere
  2. Indexes aren’t magic - They help selective queries, not bulk operations
  3. Statistics matter - An outdated ANALYZE can tank performance
  4. Test with concurrency - Serial tests won’t catch isolation bugs
  5. EXPLAIN ANALYZE is your best friend - Use it liberally
  6. Premature optimization is real - Measure first, optimize second

My Mental Models Now

For Isolation: a dial from fast-but-anomaly-prone to slow-but-strictly-correct. Start at Read Committed and turn it up only when a specific anomaly would actually hurt.

For Query Planning: the planner is a cost-based guesser. Give it accurate statistics and selective indexes, then verify its choices with EXPLAIN ANALYZE.


The GitHub Project

I built a complete hands-on project demonstrating all these concepts:

Repository: github.com/yourusername/isolation-and-query-optimization

You can clone it, run the demos, and see these concepts in action. The demos use PostgreSQL and Python with threading to simulate real concurrent scenarios.

Final Thoughts

Understanding these concepts transformed how I think about database design:

The key insight: Databases are sophisticated systems with well-understood trade-offs. Once you understand the trade-offs, you can make intelligent choices for your specific use case.

This blog post is my “note to future self” - a comprehensive reference I can return to whenever I need to refresh these concepts. I hope it helps you too.

If you’re learning about database internals, I highly recommend building your own demos. The act of coding these scenarios and seeing them fail (and then fixing them) cemented my understanding in a way that reading never could.


Quick Reference

Isolation Levels Quick Guide

Level           | Use For                                  | Prevents                                     | Allows
----------------|------------------------------------------|----------------------------------------------|---------------------------------------------
Read Committed  | Web apps, APIs, dashboards               | Dirty reads                                  | Non-repeatable reads, Lost updates, Phantoms
Repeatable Read | Reports, exports, batch jobs             | Dirty reads, Non-repeatable reads, Phantoms* | Lost updates
Serializable    | Financial transactions, critical updates | Everything                                   | Nothing (safest)

*In PostgreSQL

Query Optimization Quick Guide

Slow query? Check these in order:

  1. Run EXPLAIN ANALYZE - See the actual plan
  2. Look for Seq Scan - On tables > 10k rows? Needs index
  3. Check selectivity - Filter keeps > 20% of rows? Index may not help
  4. Verify indexes exist - On WHERE, JOIN, ORDER BY columns?
  5. Update statistics - Run ANALYZE table_name
  6. Re-test - Did it improve?

Create indexes for:

  - WHERE, JOIN, and ORDER BY columns
  - Foreign keys
  - Highly selective filters

Don't create indexes for:

  - Small tables
  - Low-selectivity columns
  - Write-heavy tables where the column is rarely queried


Last updated: October 2025

Project Repository: isolation-and-query-optimization

Questions or corrections? Open an issue on GitHub or reach out on [Twitter/LinkedIn].

About the Author

Aniket Indulkar is an Android Engineer based in London with a Master's in Artificial Intelligence. He writes about AI, ML, Android development, and his continuous learning journey.

Connect on LinkedIn →