Raft: Understandable Distributed Consensus

Tech Geek
4 min read · Nov 27, 2018

So What is Distributed Consensus?

Let’s start with an example…

Let’s say we have a single-node system. For this example, you can think of our node as a database server that stores a single value.

We also have a client that can send a value to the server.

Coming to agreement, or consensus, on that value is easy with one node.

But how do we come to consensus if we have multiple nodes?

That’s the problem of distributed consensus. Raft is a protocol for implementing distributed consensus.

Let’s look at a high-level overview of how it works.

A node can be in one of three states:

  • The Follower state
  • The Candidate state
  • The Leader state

All our nodes start in the follower state.

If a follower doesn’t hear from a leader, it can become a candidate. The candidate then requests votes from the other nodes, and the nodes reply with their votes.

The candidate becomes the leader if it gets votes from a majority of nodes.

This process is called Leader Election.
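
The three states and the transitions between them can be sketched as a small state machine. This is only an illustration under simplifying assumptions, not a full Raft node: the `Node` class, its method names, and the `cluster_size` parameter are hypothetical names chosen for this example.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class Node:
    """Hypothetical node that only tracks its state and vote count."""

    def __init__(self, node_id, cluster_size):
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.state = State.FOLLOWER  # every node starts as a follower
        self.votes = 0

    def on_leader_silence(self):
        """A follower that hears nothing from a leader becomes a candidate."""
        if self.state is State.FOLLOWER:
            self.state = State.CANDIDATE
            self.votes = 1  # the candidate's own vote

    def on_vote_granted(self):
        """Count a vote; a majority of the cluster makes the candidate leader."""
        if self.state is not State.CANDIDATE:
            return
        self.votes += 1
        if self.votes > self.cluster_size // 2:
            self.state = State.LEADER
```

In a three-node cluster, a candidate needs just one vote besides its own to become leader.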

All changes to the system now go through the leader. Each change is added as an entry in the leader’s log.

This log entry is currently uncommitted, so it won’t update the node’s value.

To commit the entry, the leader first replicates it to the follower nodes.

The leader then waits until a majority of nodes have written the entry. The entry is now committed on the leader, and the leader’s state becomes the new value (“5” in this example).

The leader then notifies the followers that the entry is committed.

The cluster has now come to consensus about the system state. This process is called Log Replication.
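
The difference between an uncommitted and a committed entry can be shown with a short sketch. It is a simplification: the `Leader` class, the dict-based entries, and the `acks` counter are hypothetical names for this example, and real Raft also commits entries strictly in log order, which is not modeled here.

```python
class Leader:
    """Hypothetical leader that tracks acknowledgements per log entry."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.value = None  # the committed value of this node
        self.log = []      # entries, committed or not

    def propose(self, new_value):
        """Append a client change as an uncommitted entry; return its index."""
        # The leader's own write counts as the first acknowledgement.
        entry = {"value": new_value, "acks": 1, "committed": False}
        self.log.append(entry)
        return len(self.log) - 1

    def on_follower_ack(self, index):
        """Record that one follower has written the entry at `index`."""
        entry = self.log[index]
        entry["acks"] += 1
        majority_reached = entry["acks"] > self.cluster_size // 2
        if majority_reached and not entry["committed"]:
            entry["committed"] = True
            self.value = entry["value"]  # only now does the value change
```

After `propose(5)` on a three-node cluster the value is still `None`; it becomes `5` once the first follower acknowledges the entry.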

Leader Election

In Raft, two timeout settings control elections: the election timeout and the heartbeat timeout.

First is the election timeout: the amount of time a follower waits before becoming a candidate. The election timeout is randomized to be between 150ms and 300ms.
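
Picking the randomized timeout might look like the following. The function name and the optional `rng` parameter are assumptions made for this sketch; only the 150ms to 300ms range comes from the text above.

```python
import random

ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


def random_election_timeout_ms(rng=random):
    """Return a timeout in milliseconds drawn uniformly from the allowed range."""
    return rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)
```

Because each follower draws its own timeout, it is unlikely that two of them become candidates at exactly the same moment.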

After the election timeout, the follower becomes a candidate, starts a new election term, votes for itself, and sends out Request Vote messages to the other nodes.

If the receiving node hasn’t voted yet in this term, it votes for the candidate and resets its election timeout.
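
The "one vote per term" rule can be sketched as a Request Vote handler. The `Voter` class and its fields are hypothetical, and the sketch leaves out the check that full Raft performs on how up to date the candidate's log is.

```python
class Voter:
    """Hypothetical node that grants at most one vote per term."""

    def __init__(self):
        self.current_term = 0
        self.voted_for = None
        self.timer_resets = 0  # stand-in for resetting the election timeout

    def on_request_vote(self, term, candidate_id):
        """Return True if the vote is granted to `candidate_id`."""
        if term < self.current_term:
            return False  # request from an older term
        if term > self.current_term:
            # A new term begins, so any earlier vote no longer applies.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            self.timer_resets += 1
            return True
        return False  # already voted for someone else in this term
```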

Once a candidate has a majority of votes it becomes leader.

The leader begins sending out Append Entries messages to its followers.

These messages are sent at intervals specified by the heartbeat timeout.

Followers then respond to each Append Entries message. This election term will continue until a follower stops receiving heartbeats and becomes a candidate.
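
How heartbeats keep a follower from starting an election can be shown with a simulated clock. Everything here is a hypothetical sketch: `FollowerTimer`, `simulate`, and the millisecond tick loop are illustration only, and the 200ms timeout is just one value from the 150ms to 300ms range.

```python
class FollowerTimer:
    """Hypothetical election timer driven by an explicit clock value."""

    def __init__(self, election_timeout_ms, now_ms=0):
        self.election_timeout_ms = election_timeout_ms
        self.deadline_ms = now_ms + election_timeout_ms

    def on_heartbeat(self, now_ms):
        """Each heartbeat pushes the election deadline further out."""
        self.deadline_ms = now_ms + self.election_timeout_ms

    def should_start_election(self, now_ms):
        return now_ms >= self.deadline_ms


def simulate(heartbeat_interval_ms, leader_dies_at_ms, election_timeout_ms=200):
    """Return the time (ms) at which the follower becomes a candidate."""
    timer = FollowerTimer(election_timeout_ms)
    now = 0
    while True:
        leader_alive = now < leader_dies_at_ms
        if leader_alive and now % heartbeat_interval_ms == 0:
            timer.on_heartbeat(now)
        if timer.should_start_election(now):
            return now
        now += 1
```

With a 50ms heartbeat and a leader that stops at 1000ms, `simulate(50, 1000)` returns 1150: the last heartbeat arrives at 950ms and the follower waits one full election timeout after it.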

Requiring a majority of votes guarantees that only one leader can be elected per term. If two nodes become candidates at the same time, a split vote can occur: neither candidate reaches a majority, so the nodes wait for a new election timeout and try again.
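
The majority rule and the split-vote case can be checked with a small tally. The helper names `majority` and `elect` are hypothetical, and `votes` is assumed to map each voter to the single candidate it voted for in the term.

```python
from collections import Counter


def majority(cluster_size):
    """Smallest number of votes that is more than half the cluster."""
    return cluster_size // 2 + 1


def elect(votes, cluster_size):
    """Return the winning candidate, or None if no one has a majority.

    `votes` maps voter id -> candidate id (one vote per node per term).
    """
    tally = Counter(votes.values())
    for candidate, count in tally.items():
        if count >= majority(cluster_size):
            return candidate
    return None  # split vote: the election has to be retried
```

In a four-node cluster where candidates A and B each collect two votes, `elect` returns `None`. Since every node votes at most once per term, two candidates can never both reach a majority.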

Log Replication

Once we have a leader elected, we need to replicate all changes to our system to all nodes. This is done by using the same Append Entries message that was used for heartbeats.
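
One way to picture a single message type serving both purposes is a message whose entry list may be empty. This is a trimmed-down sketch: the field names are assumptions, and the real Append Entries message carries more information (such as the previous log index and term, and the leader's commit index) that is left out here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendEntries:
    """Simplified message: no entries means it is a plain heartbeat."""
    term: int
    leader_id: str
    entries: List[str] = field(default_factory=list)

    @property
    def is_heartbeat(self):
        return not self.entries


class Follower:
    """Hypothetical follower that appends whatever the leader sends."""

    def __init__(self):
        self.log = []
        self.messages_seen = 0

    def on_append_entries(self, msg):
        """Treat every message as a sign of life; append any entries it carries."""
        self.messages_seen += 1
        self.log.extend(msg.entries)
        return True  # acknowledgement sent back to the leader
```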

Let’s walk through the process:

First a client sends a change to the leader.

The change is appended to the leader’s log, then sent to the followers on the next heartbeat.

An entry is committed once a majority of followers acknowledge it, and a response is then sent to the client.

Raft can even stay consistent in the face of network partitions.

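When a partition splits the cluster, the side that can no longer reach the old leader elects a new one, so we now have two leaders in different terms. Only a leader that can reach a majority of nodes can commit entries. When the partition heals, the leader with the older term sees the higher term and steps down, and the nodes on its side roll back their uncommitted entries and match the new leader’s log.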
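
In Raft, a leader that cannot reach a majority cannot commit anything, and a node that sees a higher term gives up leadership. A rough sketch of those two rules follows. The `PartitionedNode` class and `can_commit` helper are hypothetical, and copying the newer leader's committed entries wholesale is a shortcut for the entry-by-entry log matching that real Raft does.

```python
def can_commit(reachable_nodes, cluster_size):
    """True if a leader that reaches `reachable_nodes` (itself included) has a majority."""
    return reachable_nodes > cluster_size // 2


class PartitionedNode:
    """Hypothetical node on the stale side of a healed partition."""

    def __init__(self, term, is_leader=False):
        self.term = term
        self.is_leader = is_leader
        self.committed = []
        self.uncommitted = []

    def on_message_from_leader(self, leader_term, leader_committed):
        """React to a leader message after the partition heals."""
        if leader_term > self.term:
            self.term = leader_term
            self.is_leader = False    # an older-term leader steps down
            self.uncommitted.clear()  # roll back entries that never reached a majority
            self.committed = list(leader_committed)  # adopt the newer leader's entries
```

For a five-node cluster split into groups of two and three, `can_commit(2, 5)` is false and `can_commit(3, 5)` is true, so only the larger side makes progress.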

More details: https://raft.github.io/

Tech Geek

I’m a software developer from India, currently working with blockchain.