Understanding Database Isolation Levels and Query Planning: A Beginner’s Guide
When I started learning about database internals, I was overwhelmed by terms like “phantom reads,” “serializable isolation,” and “query planners.” These concepts seemed abstract and disconnected from real-world programming. After building a hands-on project to understand them, I realized they’re not only fundamental to database systems but also critical for writing correct, performant applications.
This guide documents everything I learned about two core database concepts: isolation levels (how databases handle concurrent transactions) and query planning (how databases optimize query execution). Think of this as my learning journal—a reference I can return to whenever I need to refresh my understanding.
Table of Contents
- Part 1: Understanding Transactions
- Part 2: Isolation Levels
- Part 3: Query Planning and Optimization
- Part 4: Practical Guidelines
- Conclusion
Part 1: Understanding Transactions
Before diving into isolation levels, I needed to understand what a transaction actually is.
What is a Transaction?
A transaction is a unit of work that must either completely succeed or completely fail—there’s no in-between. It’s a group of database operations that are treated as a single atomic operation.
Example: Bank Transfer
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE user = 'Alice';
UPDATE accounts SET balance = balance + 100 WHERE user = 'Bob';
COMMIT;
This is ONE transaction with TWO updates. Either both updates happen, or neither happens. If something goes wrong (like the database crashes), you can ROLLBACK, and it’s as if nothing happened.
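Here is a minimal sketch of the failure path on the same hypothetical accounts table (column names are illustrative):
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE username = 'Alice';
-- Something goes wrong before Bob is credited (application error, constraint
-- violation, crash), so we abandon the whole unit of work:
ROLLBACK;
-- Alice's balance is exactly what it was before BEGIN; no partial transfer exists.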
The ACID Properties
Transactions follow ACID principles:
- Atomicity: All or nothing
- Consistency: Database stays in a valid state
- Isolation: Transactions don’t interfere with each other (this is what isolation levels control)
- Durability: Once committed, changes are permanent
The “I” in ACID—Isolation—is what we’ll explore deeply.
The Core Problem
Imagine you’re building a ticket management system like Jira. Two developers, Alice and Bob, try to assign the same ticket to themselves at the exact same time:
| Time | Alice's Transaction | Bob's Transaction |
|---|---|---|
| T1 | READ ticket: assigned = NULL | |
| T2 | | READ ticket: assigned = NULL |
| T3 | WRITE ticket: assigned = Alice | |
| T4 | | WRITE ticket: assigned = Bob |
After T4, who owns the ticket? Bob—his write overwrote Alice’s. Neither of them knows there was a conflict. This is called a lost update, and it’s just one of several problems that arise with concurrent transactions.
Part 2: Isolation Levels
Isolation levels define what data transactions can “see” when multiple transactions run simultaneously. They represent a trade-off between consistency (correctness) and performance (speed).
The Spectrum of Isolation
Think of isolation levels as a dial:
Lower Isolation ←------------------------→ Higher Isolation
(Faster, Less Safe) (Slower, More Safe)
Read Uncommitted → Read Committed → Repeatable Read → Serializable
The Four Problems We’re Trying to Prevent
Before understanding isolation levels, I needed to know what problems they solve:
1. Dirty Read
Definition: Reading uncommitted data from another transaction that might be rolled back.
Example: Banking System
T1: Alice's transaction updates her balance to $500 (not committed)
T2: Bob's transaction reads Alice's balance: $500
T1: Alice's transaction ROLLS BACK (balance stays at $100)
T2: Bob made a decision based on $500 that doesn't exist
Why it’s bad: You’re basing decisions on data that might never exist. In financial systems, this could lead to fraud or incorrect calculations.
Real-world analogy: You see someone put money in your account on their phone, but they cancel the transfer before hitting “send.” If you already spent that money, you’re in trouble.
2. Non-Repeatable Read
Definition: Reading the same row twice in your transaction and getting different values.
Example: Report Generation
-- Your transaction starts
SELECT story_points FROM tickets WHERE id = 1; -- Returns 5
-- You do some calculations...
-- Meanwhile, another transaction commits an update
SELECT story_points FROM tickets WHERE id = 1; -- Returns 10
-- Your calculations are now inconsistent!
Why it’s bad: If you’re generating a report that reads the same data multiple times, you expect consistency. Getting different values mid-transaction makes your logic unreliable.
Real-world analogy: You check your bank balance ($100), decide to buy something for $90, check again and see $50. Someone transferred money out while you were shopping. Your decision to buy was based on outdated information.
3. Lost Update
Definition: Two transactions read the same value, both modify it, and one overwrites the other’s change.
Example: Story Points Update
Initial state: story_points = 10
Transaction A (Alice):
READ story_points = 10
Calculate: 10 + 3 = 13
WRITE story_points = 13
COMMIT
Transaction B (Bob):
READ story_points = 10 (read same initial value)
Calculate: 10 + 5 = 15
WRITE story_points = 15
COMMIT
Final state: story_points = 15
Expected: 18 (10 + 3 + 5)
Lost: 3 points from Alice's update
Why it’s bad: Data gets lost without anyone knowing. In inventory systems, you could oversell products. In accounting, you could lose financial data.
Real-world analogy: Two people editing the same Google Doc without real-time sync. They both start from “Version 1,” make changes, and save. Whoever saves last overwrites the other person’s changes.
4. Phantom Read
Definition: Running the same query twice and getting a different number of rows because another transaction inserted or deleted rows.
Example: Aggregation Query
-- Your transaction starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO'; -- Returns 5
-- Another transaction inserts a new TODO ticket and commits
SELECT COUNT(*) FROM tickets WHERE status = 'TODO'; -- Returns 6
-- Your aggregation is inconsistent!
Why it’s bad: Aggregations, reports, and batch operations need a consistent view. If rows appear or disappear mid-transaction, your results are unreliable.
Real-world analogy: You count the number of people in a room (10 people), go back to writing it down, count again to verify, and now there are 11 people. Someone walked in while you were counting.
The Four Isolation Levels
Now let’s see how each isolation level prevents (or allows) these problems.
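A quick note on mechanics before the tour: in PostgreSQL you choose the isolation level per transaction (other databases use slightly different syntax):
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- ... your queries ...
COMMIT;

-- Or as a single statement:
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... your queries ...
COMMIT;

-- Check what the current transaction is running at:
SHOW transaction_isolation;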
Read Uncommitted
What it guarantees: Almost nothing.
What it prevents: Nothing reliably.
Problems allowed:
- Dirty reads
- Non-repeatable reads
- Lost updates
- Phantom reads
When to use: Almost never. Even PostgreSQL’s “Read Uncommitted” actually behaves like Read Committed.
Performance: Fastest (but too dangerous)
Read Committed (PostgreSQL Default)
What it guarantees: You only see committed data.
What it prevents:
- Dirty reads ✓
Problems still possible:
- Non-repeatable reads
- Lost updates
- Phantom reads
Example from my demos:
# Transaction 1
UPDATE tickets SET story_points = 20 WHERE id = 1;
# Not committed yet
# Transaction 2 (Read Committed)
SELECT story_points FROM tickets WHERE id = 1;
# Returns 10 (old value, not the uncommitted 20)
# Transaction 1
COMMIT;
# Transaction 2 (still in same transaction)
SELECT story_points FROM tickets WHERE id = 1;
# Now returns 20 (sees the committed change)
When to use:
- Web applications with mostly independent operations
- Read-heavy workloads
- Dashboards and reports where slight staleness is acceptable
- Most general-purpose applications (80% of use cases)
Performance: Fast and scalable
Key insight I learned: This is why it’s the default. It prevents the most dangerous problem (dirty reads) while maintaining good performance.
Repeatable Read
What it guarantees: If you read the same row twice, you get the same value.
What it prevents:
- Dirty reads ✓
- Non-repeatable reads ✓
- Phantom reads ✓ (in PostgreSQL)
Problems still possible:
- Lost updates (in PostgreSQL the conflicting write is detected and one transaction is aborted with a serialization error, so the application must retry)
Example from my demos:
# Transaction starts (Repeatable Read)
SELECT story_points FROM tickets WHERE id = 1; # Returns 10
# Another transaction commits: UPDATE story_points = 20
# Same transaction reads again
SELECT story_points FROM tickets WHERE id = 1; # Still returns 10!
# It sees a consistent snapshot from transaction start
When to use:
- Reports requiring consistent snapshots
- Batch processing jobs
- Data exports
- Analytics queries
- Any operation that reads data multiple times and expects consistency
Performance: Medium (some overhead for maintaining snapshots)
Key insight I learned: PostgreSQL’s Repeatable Read actually implements Snapshot Isolation, which is stronger than the SQL standard requires. It prevents phantom reads too.
Serializable
What it guarantees: Transactions appear to run one at a time (serialized), even if they actually run concurrently.
What it prevents:
- Dirty reads ✓
- Non-repeatable reads ✓
- Lost updates ✓
- Phantom reads ✓
- All anomalies ✓
Problems allowed: None
Example from my demos:
# Transaction 1 (Serializable)
READ story_points = 10
UPDATE story_points = 13
COMMIT # Success
# Transaction 2 (Serializable)
READ story_points = 10 (read same initial value)
UPDATE story_points = 15
COMMIT # ABORTED! Serialization failure detected
When to use:
- Financial transactions (money transfers, payments)
- Inventory management (prevent overselling)
- Booking systems (prevent double-booking)
- Any scenario where correctness is more important than speed
Performance: Slowest (uses locks or conflict detection)
Key insight I learned: One transaction gets aborted when a conflict is detected. Your application must be prepared to retry failed transactions.
Retry pattern:
import time

def transfer_money_with_retry(from_account, to_account, amount, max_retries=3):
    for attempt in range(max_retries):
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)
        try:
            # Do the transfer
            conn.commit()
            return True
        except SerializationFailure:
            conn.rollback()
            if attempt == max_retries - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
        finally:
            conn.close()
    return False
Summary Table
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Lost Update | Performance |
|---|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Possible | Fastest |
| Read Committed | Prevented | Possible | Possible | Possible | Fast |
| Repeatable Read | Prevented | Prevented | Prevented* | Possible | Medium |
| Serializable | Prevented | Prevented | Prevented | Prevented | Slowest |
*In PostgreSQL. The SQL standard allows phantom reads at this level.
Real-World Scenarios I Tested
Scenario 1: The Lost Update (Most Common Bug)
This was the most eye-opening demo I built. Here’s what happens with Read Committed:
Initial: story_points = 10
Alice and Bob both want to increment story points:
- Alice wants to add 3
- Bob wants to add 5
Expected result: 10 + 3 + 5 = 18
Actual result with Read Committed: 15 (one update lost)
Actual result with Serializable: 13 or 15 (one transaction aborted and should retry)
The lesson: Read Committed doesn’t prevent lost updates. You need one of:
- Serializable isolation
- Explicit locking (SELECT ... FOR UPDATE; see the example after this list)
- Optimistic locking (version numbers)
- Application-level atomic operations
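For example, explicit locking keeps the read-modify-write sequence safe even at Read Committed. A minimal sketch against the tickets table from my demos:
BEGIN;
-- Lock the row; a concurrent transaction running the same statement
-- will block here until we commit or roll back
SELECT story_points FROM tickets WHERE id = 1 FOR UPDATE;
-- The read-modify-write can no longer interleave with another writer
UPDATE tickets SET story_points = story_points + 3 WHERE id = 1;
COMMIT;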
Scenario 2: The Phantom Read Problem
Running reports while data changes:
-- Your report starts
SELECT COUNT(*) FROM tickets WHERE status = 'TODO'; -- 5 tickets
-- Someone creates a new TODO ticket
SELECT SUM(story_points) FROM tickets WHERE status = 'TODO'; -- Now 6 tickets worth of points
-- Your count and sum are inconsistent!
The lesson: For reports, use Repeatable Read or higher to get a consistent snapshot.
Scenario 3: Non-Repeatable Read
The subtle bug in calculations:
BEGIN; -- Read Committed
SELECT price FROM products WHERE id = 1; -- $100
-- Calculate discount: $100 * 0.1 = $10
-- Another transaction updates price to $200 and commits
SELECT price FROM products WHERE id = 1; -- $200
-- Apply discount: $200 - $10 = $190
-- You subtracted $10 from a $200 item because you calculated based on $100!
The lesson: Any time you read data, perform calculations, and use that data again, you’re vulnerable to non-repeatable reads.
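One simple defense, when the calculation can be pushed into SQL, is to make the read and the write a single atomic statement so there is no window between them. A sketch on the same hypothetical products table:
-- The discount is computed from the price the UPDATE itself sees,
-- not from a value read earlier in the transaction
UPDATE products
SET price = price * 0.9
WHERE id = 1;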
My Decision Framework
After building these demos, here’s how I now choose isolation levels:
1. Start with Read Committed (default)
- Works for 80% of applications
- Good performance
- Prevents the most dangerous issue (dirty reads)
2. Upgrade to Repeatable Read when:
- Generating reports or exports
- Performing multi-step calculations on same data
- Running batch jobs that need consistency
- Reading related data across multiple queries
3. Use Serializable when:
- Handling money or critical inventory
- Preventing resource conflicts (booking systems)
- Correctness matters more than performance
- Your application can handle retries
4. Consider application-level solutions:
- Optimistic locking (version columns)
- Explicit locks (SELECT ... FOR UPDATE)
- Atomic operations in a single statement
- Message queues for serializing operations
Part 3: Query Planning and Optimization
After understanding isolation levels, I learned about the second pillar of database performance: how databases decide how to execute queries.
The Fundamental Question
When you write:
SELECT * FROM employees
WHERE department = 'Engineering'
AND salary > 100000;
The database has many ways to execute this:
- Read every row and check both conditions
- Use an index on department, then filter by salary
- Use an index on salary, then filter by department
- Use both indexes and combine results
How does it decide? The query planner.
What is the Query Planner?
The query planner (or optimizer) is the part of the database that:
- Generates all possible ways to execute your query
- Estimates the cost of each approach
- Picks the cheapest plan
- Executes it
Your SQL → Parse → Generate Plans → Estimate Costs → Pick Best → Execute → Results
Key Concept 1: Indexes
Before understanding query planning, I needed to understand indexes.
The Phone Book Analogy
Without an index (sequential scan):
- Start at page 1
- Read every name: “Adams… Baker… Chen…”
- Keep going until you find “Smith”
- Could check all 1 million names
With an index (index scan):
- Know that S is about 3/4 through the book
- Jump directly there
- Check a few pages
- Check maybe 100 names instead of 1 million
That’s a 10,000x improvement!
What Are Indexes Really?
An index is a data structure (usually a B-Tree) that stores:
- The indexed column values (sorted)
- Pointers to where the actual rows are stored
Index on salary:
50000 → Row 1543
50000 → Row 8721
50001 → Row 234
...
100000 → Row 456
100001 → Row 789
...
When you query WHERE salary > 100000, the database:
- Jumps to the index at the 100000 entry
- Reads all entries after that point
- Fetches only those row IDs from the table
The Cost: Indexes Aren’t Free
Pros:
- Make reads MUCH faster (10-100x for selective queries)
- Essential for large tables
- Help with sorting (ORDER BY)
- Help with joins (JOIN … ON)
Cons:
- Slow down writes (INSERT, UPDATE, DELETE)
- Take up disk space
- Need maintenance (rebuilding, updating statistics)
Rule of thumb: Create indexes when reads outnumber writes by 10:1 or more.
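You can get a rough feel for that ratio from PostgreSQL's statistics views. A sketch (the counters are cumulative since the last statistics reset, so treat this as an approximation):
SELECT relname,
       seq_scan + idx_scan AS table_reads,
       n_tup_ins + n_tup_upd + n_tup_del AS row_writes
FROM pg_stat_user_tables
WHERE relname = 'employees';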
Key Concept 2: Selectivity
Selectivity = How many rows a filter keeps vs eliminates
High selectivity (good for indexes):
- Filters out MOST rows
- Example: WHERE employee_id = 42 (1 out of 100,000 rows)
- Keeps < 5% of rows
Low selectivity (indexes may not help):
- Keeps MOST rows
- Example: WHERE country = 'USA' (90,000 out of 100,000 rows)
- Keeps > 20% of rows
Example from My Demo
I have 100,000 employees:
- Engineering: 50,000 employees (50% - low selectivity)
- HR: 5,000 employees (5% - high selectivity)
- Salary > $150k: 10,000 employees (10% - medium selectivity)
Query 1: WHERE department = 'Engineering'
- Not very selective (50% of rows)
- Index might not help much
- Planner might choose sequential scan
Query 2: WHERE department = 'HR'
- Very selective (5% of rows)
- Index helps a lot
- Planner will use index scan
Query 3: WHERE salary > 150000
- Medium selectivity (10% of rows)
- Index helps significantly
- Planner will use index scan
Key insight I learned: The planner chooses indexes based on selectivity, not just because an index exists.
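You can measure selectivity yourself instead of guessing. A sketch against the demo employees table:
SELECT
  round(100.0 * count(*) FILTER (WHERE department = 'HR') / count(*), 1) AS hr_pct,
  round(100.0 * count(*) FILTER (WHERE salary > 150000) / count(*), 1) AS high_salary_pct
FROM employees;
-- Roughly: under ~5% is a great index candidate, over ~20% often is not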
Key Concept 3: Statistics
The query planner can’t actually run all possible plans to see which is fastest—that would take forever. Instead, it uses statistics to estimate costs.
What statistics does the database keep?
For each table and column:
- Total number of rows
- Number of distinct values
- Most common values and their frequencies
- Data distribution (histogram)
- Null value percentage
- Average row size
Example statistics:
-- PostgreSQL stores statistics
SELECT * FROM pg_stats WHERE tablename = 'employees';
-- You'll see things like:
n_distinct: 5 (for department column - 5 unique departments)
most_common_vals: {Engineering, Sales, Marketing}
most_common_freqs: {0.50, 0.30, 0.15} (50%, 30%, 15%)
Why Statistics Matter
SELECT * FROM employees WHERE department = 'Engineering';
The planner thinks:
- “Table has 100,000 rows”
- “Statistics say ‘Engineering’ is 50% of rows”
- “So this query will return ~50,000 rows”
- “That’s too many for an index to help”
- “I’ll use sequential scan”
If statistics are wrong:
-- Statistics say HR is 5% of rows (5,000 employees)
-- Actually, HR grew to 50,000 employees (50% of rows)
SELECT * FROM employees WHERE department = 'HR';
-- Planner thinks: "Only 5,000 rows, use index!"
-- Reality: 50,000 rows, index scan is slow
-- Should have used sequential scan
The lesson: Keep statistics updated!
ANALYZE employees; -- Update statistics for this table
ANALYZE; -- Update all tables
PostgreSQL auto-analyzes tables, but for rapidly changing tables, manual ANALYZE helps.
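To see whether a table's statistics are fresh, check when it was last analyzed (manually or by autovacuum):
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'employees';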
Key Concept 4: Cost Estimation
The planner assigns a “cost” to each operation:
Costs are measured in arbitrary units relative to one sequential page read. PostgreSQL's defaults:
- Sequential page read (seq_page_cost): 1.0
- Random page read (random_page_cost): 4.0 (disk seeks are expensive)
- Processing one row (cpu_tuple_cost): 0.01
- Evaluating one operator or comparison (cpu_operator_cost): 0.0025
Example cost calculation:
-- Sequential scan on 100,000 row table
-- Table is 1,000 pages on disk
Cost = 1,000 pages * 1.0 + 100,000 rows * 0.01 = 2,000
-- Index scan returning 10,000 rows
-- Need to read 10,000 random pages
Cost = 10,000 pages * 4.0 + 10,000 rows * 0.01 = 40,100
-- Wait, index scan is MORE expensive!
-- Planner will choose sequential scan
Key insight I learned: For low selectivity queries (returning > 20% of rows), sequential scans are often faster than index scans because sequential reads are cheaper than random reads.
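Those unit costs are not hard-coded; they are planner settings you can inspect and, carefully, tune. On SSDs, for instance, many teams lower random_page_cost because random reads are no longer so much more expensive than sequential ones:
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'cpu_operator_cost');

-- Example session-level tweak for fast SSD storage:
-- SET random_page_cost = 1.1;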
Seeing Query Plans with EXPLAIN
The best tool for understanding query planning is EXPLAIN ANALYZE:
EXPLAIN ANALYZE
SELECT * FROM employees WHERE salary > 100000;
Output:
Seq Scan on employees (cost=0.00..1834.00 rows=10000 width=40)
(actual time=0.012..15.234 rows=9876 loops=1)
Filter: (salary > 100000)
Rows Removed by Filter: 90124
Planning Time: 0.123 ms
Execution Time: 16.789 ms
How to read this:
- Seq Scan: type of operation (sequential scan, reading all rows)
- cost=0.00..1834.00: estimated startup cost .. total cost
- rows=10000: estimated rows returned
- width=40: estimated average row size (bytes)
- actual time=0.012..15.234: real time (ms) to first row .. all rows
- rows=9876: actual rows returned
- Rows Removed by Filter: 90124: rows scanned that didn't match
- Planning Time: time to create the plan
- Execution Time: time to actually run the query
Key metrics to watch:
- Execution Time: Higher = slower query
- Rows: Estimated vs Actual: Big difference = bad statistics
- Seq Scan on large tables: Often indicates missing index
- Rows Removed by Filter: High number = inefficient filtering
Real Examples from My Demos
Example 1: Full Scan vs Index Scan
Query: Find Engineering employees making > $100k
SELECT * FROM employees
WHERE department = 'Engineering'
AND salary > 100000;
Scenario A: No indexes
Seq Scan on employees (cost=0.00..2834.00 rows=5000 width=40)
(actual time=0.023..35.456 rows=4987 loops=1)
Filter: (department = 'Engineering' AND salary > 100000)
Rows Removed by Filter: 95013
Execution Time: 36.789 ms
Reads: 100,000 rows Time: 36 ms
Scenario B: With index on salary
Index Scan using idx_salary on employees (cost=0.42..456.78 rows=5000 width=40)
(actual time=0.045..5.234 rows=4987 loops=1)
Index Cond: (salary > 100000)
Filter: (department = 'Engineering')
Rows Removed by Filter: 5013
Execution Time: 5.789 ms
Reads: 10,000 rows (only high earners) Time: 5 ms
7x faster! The index on salary reduced the rows to scan from 100,000 to 10,000.
Example 2: Which Index to Use?
Query: Same as above
Available indexes:
- idx_department on department
- idx_salary on salary
Stats:
- Engineering: 50,000 employees (50%)
- Salary > $100k: 10,000 employees (10%)
If planner uses department index:
- Reads 50,000 rows (all Engineering employees)
- Filters to 5,000 who make > $100k
- Cost: ~50,000 rows processed
If planner uses salary index:
- Reads 10,000 rows (all high earners)
- Filters to 5,000 who are in Engineering
- Cost: ~10,000 rows processed
The planner chooses salary index because it’s more selective (10% vs 50%).
The lesson: Given multiple indexes, the planner picks the one with highest selectivity.
Example 3: Bitmap Index Scan
Query: Same as above
Strategy: Use BOTH indexes!
Bitmap Heap Scan on employees (cost=234.56..678.90 rows=5000 width=40)
  Recheck Cond: ((department = 'Engineering') AND (salary > 100000))
  -> BitmapAnd
       -> Bitmap Index Scan on idx_department
            Index Cond: (department = 'Engineering')
       -> Bitmap Index Scan on idx_salary
            Index Cond: (salary > 100000)
Execution Time: 3.456 ms
How it works:
- Scan department index → get bitmap of row IDs [50,000 rows]
- Scan salary index → get bitmap of row IDs [10,000 rows]
- Perform AND operation on bitmaps → [5,000 rows]
- Fetch only those 5,000 rows from table
Even faster! Only fetches the exact rows needed (5,000) instead of filtering 10,000.
When this is used: When the intersection is significantly smaller than either individual filter.
Example 4: JOIN Order
Tables:
- employees: 100,000 rows
- departments: 5 rows
Query:
SELECT e.name, d.name
FROM employees e
JOIN departments d ON e.department = d.name
WHERE d.name = 'Engineering';
Bad plan (employees first):
- Scan all 100,000 employees
- For each employee, look up their department
- Filter to Engineering
- Cost: 100,000 lookups
Good plan (departments first):
- Find Engineering department (1 row)
- Find all employees in Engineering (50,000 rows)
- Cost: 50,000 lookups
The planner automatically picks the good plan by starting with the smallest table and most selective filter.
The lesson: The planner considers JOIN order. Smaller tables and more selective filters first.
My Query Optimization Process
After building these demos, here’s my process for slow queries:
Step 1: Measure with EXPLAIN ANALYZE
EXPLAIN ANALYZE your_slow_query;
Step 2: Look for red flags:
- Seq Scan on large tables (> 10,000 rows)
- Large difference between estimated and actual rows
- High execution time
- Many rows removed by filter
Step 3: Check for indexes:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'your_table';
Step 4: Consider creating indexes:
-- Single column
CREATE INDEX idx_salary ON employees(salary);
-- Multiple columns (order matters!)
CREATE INDEX idx_dept_salary ON employees(department, salary);
-- Partial index (for specific conditions)
CREATE INDEX idx_high_earners ON employees(salary)
WHERE salary > 100000;
Step 5: Update statistics:
ANALYZE your_table;
Step 6: Test again:
EXPLAIN ANALYZE your_slow_query;
Index Best Practices I Learned
Create indexes on:
- Primary keys (automatic)
- Foreign keys (manual - PostgreSQL doesn’t auto-create these!)
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
- Columns in GROUP BY
Don’t create indexes on:
- Small tables (< 1,000 rows)
- Columns with low cardinality (gender, boolean)
- Columns rarely queried
- Frequently updated columns (indexes slow down writes)
Multi-column index order:
-- If you often query:
WHERE department = 'Engineering' AND salary > 100000
-- Create index in this order:
CREATE INDEX idx_dept_salary ON employees(department, salary);
-- NOT this order:
CREATE INDEX idx_salary_dept ON employees(salary, department);
Rule of thumb: put equality-filtered columns before range-filtered ones, and lead with the column you always filter on; the planner can only use the leading columns of a multi-column index on their own.
Part 4: Practical Guidelines
After building this entire project, here are the practical lessons I’ll apply in real applications.
When to Use Each Isolation Level
Use Read Committed (default) for:
- Web applications (most requests)
- REST APIs
- Microservices
- Any operation that doesn’t read the same data twice
- Dashboards (slight staleness is fine)
Use Repeatable Read for:
- Generating reports
- Data exports
- Batch jobs
- Analytics queries
- Any operation that reads data multiple times and needs consistency
Use Serializable for:
- Money transfers
- Payment processing
- Inventory updates (prevent overselling)
- Booking systems (prevent double-booking)
- Any critical operation where lost updates would be catastrophic
Handling Serialization Failures
When using Serializable, transactions can be aborted. Always implement retries:
import time

def safe_transaction(operation, max_retries=3):
    for attempt in range(max_retries):
        conn = get_connection(ISOLATION_LEVEL_SERIALIZABLE)
        try:
            result = operation(conn)
            conn.commit()
            return result
        except SerializationFailure:
            conn.rollback()
            if attempt == max_retries - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
        finally:
            conn.close()
Alternative: Optimistic Locking
Instead of Serializable isolation, use version numbers:
-- Add version column
ALTER TABLE tickets ADD COLUMN version INTEGER DEFAULT 0;
-- Update with version check
UPDATE tickets
SET story_points = story_points + 3,
version = version + 1
WHERE id = 1
AND version = 5; -- Expected version
-- If rowcount = 0, someone else modified it
This works at Read Committed level and gives you control over conflict handling.
Query Optimization Workflow
For Development:
- Don’t optimize prematurely
- Build features first
- Measure performance with real data
- Optimize bottlenecks
For Production:
- Enable slow query logging
- Monitor pg_stat_statements
- Identify the top 10 slowest queries (see the query after this list)
- Use EXPLAIN ANALYZE
- Add indexes strategically
- Run ANALYZE regularly
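With the pg_stat_statements extension enabled, pulling the top offenders is one query. A sketch for PostgreSQL 13+ (older versions name the columns total_time and mean_time):
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;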
Red flags in production:
- Queries taking > 1 second
- Sequential scans on tables > 10,000 rows
- Many queries on same table (needs index)
- High CPU usage from sorting (needs index for ORDER BY)
Index Maintenance
Check index usage:
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan;
If idx_scan = 0: Index is never used → consider dropping it
Unused indexes waste:
- Disk space
- Write performance (every INSERT/UPDATE must update the index)
- Maintenance overhead
Drop unused indexes:
DROP INDEX idx_unused;
Rebuild fragmented indexes:
REINDEX INDEX idx_salary;
Common Patterns I Use Now
Pattern 1: Transaction Wrapper
from contextlib import contextmanager

@contextmanager
def transaction(isolation_level=None):
    conn = get_connection(isolation_level)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

# Usage
with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
    # Do work
    cursor = conn.cursor()
    cursor.execute("UPDATE ...")
Pattern 2: Read-Write Split
# Read operations: Read Committed (fast)
def get_user_profile(user_id):
    with transaction(ISOLATION_LEVEL_READ_COMMITTED) as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        return cursor.fetchone()

# Write operations: Serializable (safe)
def transfer_credits(from_user, to_user, amount):
    with transaction(ISOLATION_LEVEL_SERIALIZABLE) as conn:
        cursor = conn.cursor()
        cursor.execute("UPDATE users SET credits = credits - %s WHERE id = %s",
                       (amount, from_user))
        cursor.execute("UPDATE users SET credits = credits + %s WHERE id = %s",
                       (amount, to_user))
Pattern 3: Optimistic Locking
def update_with_optimistic_lock(ticket_id, new_points):
    conn = get_connection()
    cursor = conn.cursor()
    # Read with version
    cursor.execute("SELECT story_points, version FROM tickets WHERE id = %s",
                   (ticket_id,))
    current_points, version = cursor.fetchone()
    # Update with version check
    cursor.execute("""
        UPDATE tickets
        SET story_points = %s, version = version + 1
        WHERE id = %s AND version = %s
    """, (new_points, ticket_id, version))
    if cursor.rowcount == 0:
        raise ConcurrentModificationError("Ticket was modified by another user")
    conn.commit()
    conn.close()
Pattern 4: Batch Processing with Repeatable Read
def generate_monthly_report():
    # Use Repeatable Read for consistent snapshot
    with transaction(ISOLATION_LEVEL_REPEATABLE_READ) as conn:
        cursor = conn.cursor()
        # All queries see the same snapshot
        cursor.execute("SELECT SUM(amount) FROM sales WHERE month = 'October'")
        total_sales = cursor.fetchone()[0]
        cursor.execute("SELECT COUNT(*) FROM orders WHERE month = 'October'")
        total_orders = cursor.fetchone()[0]
        cursor.execute("SELECT AVG(amount) FROM sales WHERE month = 'October'")
        avg_sale = cursor.fetchone()[0]
        return {
            'total_sales': total_sales,
            'total_orders': total_orders,
            'average_sale': avg_sale
        }
Debugging Checklist
When things go wrong, here’s what I check:
For Isolation Issues:
- Are multiple users/processes accessing the same data?
- Could two transactions be reading and writing the same rows?
- Am I seeing inconsistent data within a single operation?
- Are updates getting lost?
- Is the isolation level appropriate for the operation?
For Performance Issues:
- Did I run EXPLAIN ANALYZE on the slow query?
- Are there sequential scans on large tables?
- Do indexes exist on WHERE/JOIN columns?
- Are statistics up to date? (ANALYZE)
- Is the query returning more data than needed?
- Are there N+1 query problems? (multiple queries in a loop; see the sketch below)
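The N+1 fix is usually to fold the loop into one query. A sketch against the demo tickets table (the sprint_items table here is hypothetical):
-- N+1 pattern issued from application code:
--   SELECT * FROM tickets WHERE id = 1;
--   SELECT * FROM tickets WHERE id = 2;
--   ... one round trip per id
-- One round trip instead:
SELECT * FROM tickets WHERE id = ANY(ARRAY[1, 2, 3]);
-- Or, if the ids live in another table, join instead of looping:
SELECT t.*
FROM tickets t
JOIN sprint_items s ON s.ticket_id = t.id
WHERE s.sprint_id = 42;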
Testing Isolation Levels
The most important lesson: isolation bugs only appear under concurrent load.
from threading import Thread

# Bad: Testing serially
def test_lost_update_serial():
    update_story_points(1, 5)  # Works fine
    update_story_points(1, 3)  # Works fine
    # No bug detected!

# Good: Testing concurrently
def test_lost_update_concurrent():
    thread1 = Thread(target=lambda: update_story_points(1, 5))
    thread2 = Thread(target=lambda: update_story_points(1, 3))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    # Now we can see if updates were lost!
Load testing tools:
- pgbench (PostgreSQL's built-in tool)
- Apache JMeter
- Locust
- Custom Python scripts with threading
Conclusion
When I started this learning journey, database internals felt like black magic. Now I understand they’re based on fundamental trade-offs:
Isolation Levels: Safety vs Performance
- More isolation = More safety, Less speed
- Less isolation = More speed, Less safety
- No silver bullet - choose based on your use case
Query Planning: Indexes vs Table Scans
- Indexes = Fast reads, Slow writes, Uses space
- No indexes = Slow reads, Fast writes, Saves space
- The planner makes intelligent choices based on statistics
What I Wish I’d Known Earlier
- Read Committed is usually enough - Don’t cargo-cult Serializable everywhere
- Indexes aren’t magic - They help selective queries, not bulk operations
- Statistics matter - An outdated ANALYZE can tank performance
- Test with concurrency - Serial tests won’t catch isolation bugs
- EXPLAIN ANALYZE is your best friend - Use it liberally
- Premature optimization is real - Measure first, optimize second
My Mental Models Now
For Isolation:
- Think about concurrent users editing the same data
- Ask: “What happens if two people do this simultaneously?”
- Default to Read Committed, upgrade only when needed
- Always handle Serializable failures with retries
For Query Planning:
- Think about data volume and selectivity
- Ask: “How many rows will this filter keep?”
- Create indexes for selective filters (< 10% of rows)
- Update statistics after bulk changes
- Use EXPLAIN ANALYZE to verify assumptions
Resources That Helped Me
Books:
- “Designing Data-Intensive Applications” by Martin Kleppmann
- “PostgreSQL: Up and Running” by Regina Obe and Leo Hsu
Papers:
- “A Critique of ANSI SQL Isolation Levels” (explains why standards differ from reality)
The GitHub Project
I built a complete hands-on project demonstrating all these concepts:
- 7 isolation level demos showing each problem and solution
- 5 query planning demos with real performance comparisons
- Docker setup for easy reproduction
- Colored terminal output for clarity
- Full documentation
Repository: github.com/yourusername/isolation-and-query-optimization
You can clone it, run the demos, and see these concepts in action. The demos use PostgreSQL and Python with threading to simulate real concurrent scenarios.
Final Thoughts
Understanding these concepts transformed how I think about database design:
- I’m no longer afraid of concurrent users
- I understand why some operations are slow
- I can debug performance issues systematically
- I make informed decisions about indexes
- I choose the right isolation level for each operation
The key insight: Databases are sophisticated systems with well-understood trade-offs. Once you understand the trade-offs, you can make intelligent choices for your specific use case.
This blog post is my “note to future self” - a comprehensive reference I can return to whenever I need to refresh these concepts. I hope it helps you too.
If you’re learning about database internals, I highly recommend building your own demos. The act of coding these scenarios and seeing them fail (and then fixing them) cemented my understanding in a way that reading never could.
Quick Reference
Isolation Levels Quick Guide
| Level | Use For | Prevents | Allows |
|---|---|---|---|
| Read Committed | Web apps, APIs, dashboards | Dirty reads | Non-repeatable reads, Lost updates, Phantoms |
| Repeatable Read | Reports, exports, batch jobs | Dirty reads, Non-repeatable reads, Phantoms* | Lost updates |
| Serializable | Financial transactions, critical updates | Everything | Nothing (safest) |
*In PostgreSQL
Query Optimization Quick Guide
Slow query? Check these in order:
- Run EXPLAIN ANALYZE - See the actual plan
- Look for Seq Scan - On tables > 10k rows? Needs index
- Check selectivity - Filter keeps > 20% of rows? Index may not help
- Verify indexes exist - On WHERE, JOIN, ORDER BY columns?
- Update statistics - Run ANALYZE table_name
- Re-test - Did it improve?
Create indexes for:
- Foreign keys (always!)
- Columns in WHERE clauses (if selective)
- Columns in JOIN conditions
- Columns in ORDER BY (if used frequently)
Don’t create indexes for:
- Small tables (< 1,000 rows)
- Low cardinality columns (gender, boolean)
- Rarely queried columns
- Frequently updated columns
Last updated: October 2025
Project Repository: isolation-and-query-optimization
Questions or corrections? Open an issue on GitHub or reach out on [Twitter/LinkedIn].