Thomas Wang's blog

From journeyman to master.


DDIA Chapter 7. Transactions

Buy the book https://dataintensive.net/

Intro

The Slippery Concept of a Transaction

The Meaning of ACID

Atomicity

Consistency

Term is terribly overloaded:

Atomicity, isolation, and durability are properties of the database, whereas consistency (in the ACID sense) is a property of the application, by relying on the database’s atomicity and isolation properties in order to achieve.

Isolation

Durability

Single-Object and Multi-Object Operations

Single-object writes

A transaction is usually understood as a mechanism for grouping multiple operations on multiple objects into one unit of execution. But we also need single-object writes:

The need for multi-object transactions

Handling errors and aborts

Weak Isolation Levels

Read Committed

  1. When reading from the database, you will only see data that has been committed (no dirty reads). Figure 7-4. No dirty reads
  2. When writing to the database, you will only overwrite data that has been committed (no dirty writes). Figure 7-5. With dirty writes

Implementing read committed

Snapshot Isolation and Repeatable Read

Implementing snapshot isolation

Visibility rules for observing a consistent snapshot

When a transaction reads from the database, transaction IDs are used to decide which objects it can see and which are invisible.

An object is visible if both of the following conditions are true:

Indexes and snapshot isolation

Repeatable read and naming confusion

Preventing Lost Updates

Atomic write operations

Explicit locking

Automatically detecting lost updates

Compare-and-set

Conflict resolution and replication

Write Skew and Phantoms

Characterizing write skew

More examples of write skew

Phantoms causing write skew

Similar pattern:

  1. SELECT checks some requirement by searching for rows that match
  2. Continue based on result of the query
  3. If continue, application makes a write and commits the transaction. The effect of the write changes the step 1 query result. (This is called a phantom)

Snapshot isolation avoids phantoms in read-only queries.

Materializing conflicts

Serializability

Problems of preventing race conditions with isolation levels:

The recommendation from researchers is to use serializable isolation, the strongest level. The database prevents all possible race conditions.

[[DDIA-9 Consistency and Consensus]] discusses geenralizing serializability to a distributed system.

Actual Serial Execution

History

Encapsulating transactions in stored procedures

Taking out human from critical path to avoid long waiting time, transactions have continued to be executed in an interactive client/server style. Network time is the throughput bottleneck.

Therefore, systems with single-threaded serial transaction processing don’t allow interactive multi-statement transactions. The application must submit the entire transaction code to the database ahead of time, as a stored procedure.

Pros and cons of stored procedures

Modern implementations supports general language, e.g., Redis uses Lua. Combined with in-memory dat, executing all transactions on single-thread is now feasible.

Partitioning

Two-Phase Locking (2PL)

In 2PL, writers don’t just block other writers; they also block readers and vice versa. Snapshot isolation has the mantra readers never block writers, and writers never block readers ([[#Implementing snapshot isolation]])

Implementation of two-phase locking

Performance of two-phase locking

Predicate locks

How to fix [[#Phantoms causing write skew]]? The key idea here is that a predicate lock applies even to objects that do not yet exist in the database, but which might be added in the future (phantoms).

Index-range locks

Serializable Snapshot Isolation (SSI)

Pessimistic versus optimistic concurrency control

Decisions based on an outdated premise

A premise is a fact that was true at the beginning of the transaction. How does the database know if a query result might have changed?

Detecting stale MVCC reads

Detecting writes that affect prior reads

Performance of serializable snapshot isolation

The rate of aborts significantly affects the overall performance of SSI. So SSI requires short read-write transactions, but it's less sensitive to slow transactions than two-phase locking or serial execution.