
Distributed Consensus Reading List 📚

Since its inception in the 1980s, distributed consensus has been the subject of extensive academic research. Whilst definitions vary, distributed consensus (or equivalently, atomic broadcast) most often refers to the problem of deciding an ordered sequence of values among a set of distributed nodes. This sequence can be used to implement an append-only replicated log, which can be used, either directly or indirectly, to provide services such as primary-backup replication or state machine replication. These abstractions can, in turn, form the building blocks of new abstractions, such as a distributed key-value store. Some consensus algorithms instead decide only a single value or a partially ordered sequence of values. What unifies distributed consensus algorithms is that they are always safe, regardless of delays and crashes (though they are not necessarily Byzantine fault tolerant), and are guaranteed to make progress provided sufficient synchrony and enough non-faulty nodes.
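To make the layering concrete, here is a minimal sketch of a key-value store built on an already-decided log. It is illustrative only: the `KeyValueReplica` class, the tuple-based operation format, and the `decided_log` list are assumptions made for this example, and the consensus step itself (agreeing on the log) is not implemented.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueReplica:
    """Hypothetical replica that applies decided log entries in order."""
    store: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply(self, log):
        # Apply only the entries not yet applied, strictly in log order.
        for op, key, value in log[self.applied:]:
            if op == "put":
                self.store[key] = value
            elif op == "delete":
                self.store.pop(key, None)
            self.applied += 1


# Assumed input: the ordered sequence of values that consensus has decided.
# In a real system a consensus algorithm would produce this sequence.
decided_log = [
    ("put", "x", 1),
    ("put", "y", 2),
    ("delete", "x", None),
    ("put", "y", 3),
]

a, b = KeyValueReplica(), KeyValueReplica()
a.apply(decided_log)      # replica a catches up in one step
b.apply(decided_log[:2])  # replica b lags behind and sees only a prefix...
b.apply(decided_log)      # ...then applies the remaining suffix
```

Because both replicas apply the same deterministic operations in the same order, they end in the same state even though they progressed at different speeds. This is the property that state machine replication relies on.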

This is a long list of papers relating to distributed consensus. Many of the papers listed below fit into more than one section. However, for simplicity, each paper is listed only in the most relevant section. Where possible, open access links for each paper have been provided.

Contributions via pull requests are welcome.

⭐️ Influential papers - If you are looking for a starting point, a subset of the most influential papers on distributed consensus is highlighted with a yellow star. ⭐️

💎 Hidden gems - Papers which I personally love but which are not as highly cited as the influential papers. 💎

Key: acmdl = ACM Digital Library

The sections are as follows:

Distributed Consensus

Theoretical results

This section lists theoretical results relating to distributed consensus.

Surveys

This section lists surveys, tutorials, and systemization of knowledge papers covering distributed consensus algorithms.

Algorithms for consensus

This section lists papers describing algorithms for distributed consensus. These papers tend to be theory papers (venues such as PODC, DISC, OPODIS) whereas the Implementations of consensus section focuses on systems papers.

Consensus for specialist hardware

This section lists papers describing consensus algorithms using specialist hardware such as SDN, IP-multicast, or RDMA.

Consensus for geo-distributed systems

This section covers papers describing consensus algorithms for WANs and/or geo-replicated systems. Many of these algorithms (such as EPaxos) are leaderless and decide a partial-ordering over values instead of the more traditional total-ordering approach.

Consensus in production

This section lists papers describing experiences of deploying distributed consensus in production.

Implementations of consensus

This section lists papers describing implementations of distributed consensus algorithms.

Evaluations of consensus

This section lists papers describing standalone evaluations of consensus algorithms.

State machine replication

This section lists papers about the application of consensus to State Machine Replication (SMR/RSMs) and Linearizability.

Reconfiguration

This section lists papers on reconfiguration & leader election.

Weaker consistency models

This section lists papers that discuss alternative consistency models to linearizability and/or systems that depend upon synchrony for correctness (not just liveness).

Failures

This section lists papers that analyze and/or handle real-world failures of distributed systems.

Clocks

The liveness of distributed consensus depends on some degree of clock synchronization. The following section lists papers on the topic of clock synchronization.

Correctness of consensus algorithms

This section lists papers on proving or testing the correctness of consensus algorithms.

Quorum systems

This section lists papers on quorum systems.

Byzantine fault tolerance

This section lists papers on Byzantine Fault Tolerance (BFT), often used as the basis of permissioned blockchains.

BFT surveys

BFT in theory

BFT in practice

Alternative fault models in distributed consensus

Most of the papers in this list handle either crash faults or Byzantine faults. This section covers fault models that lie between the two.

Misc

Blog posts, books, talks, dissertations, etc…


Future reading list

The following lists contain places to watch for new writings in the field of distributed consensus. They are in no particular order.

Blogroll

Reading lists

Academic conferences & symposiums

Dan Tsafrir maintains a useful list of systems conferences by deadline.

Academic workshops

Academic journals & magazines