Thomas Wang

From journeyman to master.

DDIA Chapter 5. Repliation

Buy the book


Leaders and Followers

Synchronous Versus Asynchronous Replication

Setting Up New Followers

Handling Node Outages

Failover Process

  1. Determining that the leader has failed: timeout
  2. Choosing a new leader: election or appointed by previous controller node (more in [[DDIA-9 Consistency and Consensus]])
  3. Reconfiguring the system to use the new leader

Things can go wrong during failover:

Implementation of Replication Logs

Statement-based replication

Write-ahead log (WAL) shipping

Logical (row-based) log replication

Trigger-based replication

Problems with Replication Lag

Reading Your Own Writes

Monotonic Reads

Consistent Prefix Reads

Solutions for Replication Lag

  1. Does the user experience need to avoid replication lag
  2. Ways the app to provide stronger guarantee
  3. Underlying database gaurantee by transactions

Multi-Leader Replication

Use Cases for Multi-Leader Replication

Use on multi-datacenter deployment

Figure 5-6. Multi-leader replication across multiple datacenters

Clients with offline operation

E.g., CouchDB is designed for this mode of operation

Collaborative editing

Handling Write Conflicts

Synchronous versus asynchronous conflict detection

Conflict avoidance

Converging toward a consistent state

Custom conflict resolution logic

Multi-Leader Replication Topologies

Leaderless Replication

Writing to the Database When a Node Is Down

Read repair and anti-entropy

Quorums for reading and writing

Limitations of Quorum Consistency

However, even with w + r > n, there are likely to be edge cases where stale values are returned:


Monitoring staleness

Sloppy Quorums and Hinted Handoff

Multi-datacenter operation

Detecting Concurrent Writes

Explore more details about [[#Handling Write Conflicts]] below:

Last write wins (discarding concurrent writes)

The “happens-before” relationship and concurrency

Capturing the happens-before relationship

Note that the server can determine whether two operations are concurrent by looking at the version numbers—it does not need to interpret the value itself (so the value could be any data structure). The algorithm works as follows:

Merging concurrently written values

Version vectors