Alaya Consensus Mechanism - BFT Consensus Protocol

Alaya network applies CBFT (Concurrent Byzantine Fault Tolerance) based on partially synchronous to solve the consensus efficiency issues of blockchain. Based on that, Alaya proposed an whole new consensus protocol called Giskard consensus protocol.

Giskard combines the advantages of the mainstream excellent consensus protocols like PBFT, Tendermint and Hotstuff, enables multiple block productons in one view window by leaverging pipeline to complete the parallel processing of block production and verification. Moreover, it can switch the view windows with O(n2) to improve the consensus efficiency.

According to the distributed system theory, there are three kinds of network models of distributed systems:



1. Synchronous network model : all info sent by nodes will surely arrive at the target nodes in the specific time period;

2. Asynchronous network model : it’s not for sure that all info sent by the nodes will arrive at the target nodes;

3. Partially synchronous network model : all info sent by the nodes will finally arrive at the target nodes though it may be denied.



The synchronous network model works when it is in a very ideal situation, while asynchronous network model is the most close to the real model. However, according to the FLP Impossibility , under asychronous network model, the consensus algorithm can’t satisfy both satefy and liveness. Currently, most BFT consensus algorithms are based on the partially synchronous network model, and next we will talk about the consensus algorithms based on partially synchronous network model.





BFT Consensus Protocol

A distributed sytsem consists of multiple nodes, and the communication between nodes requires network and will reach an agreement and execute on some specific tasks according to the protocol they follow. There might be different types of errors in this process, which are:

1. Harmless errors such as node crash, network failure, packet loss, etc., which are not byzantine dfaults;

2. Vicious errors , for example, validators delay or refuse the message from network, leaders can propose the invalid blocks or nodes can send messages to different peers. Sometimes, the vicious validators will collaborate. All of these are byzantine faults.



Thus, we wish the system can keep two important attributes: safety and liveness all the time.

Safety : when the two above-mentioned errors happen, the consensus system won’t produce wrong results, namely, double spending and sharding.

Liveness : the system can keep delivering, namely, the consensus will keep going without any freezes. If the consensus of blockchain system freezes, then there is no response from new transactions. That is to say, it doesn’t meet liveness.



BFT can ensure the safety and liveness of distributed ecosystem even if there are vicious nodes. According to the Lamport ’s paper, all BFT have a basic assumption: when the sum of nodes is larger than 3f , the sum of vicious nodes is f maximally, and the honest nodes can reach an agreement on the correct result.





PBFT

PBFT (practical byzantine fault tolerance) is one of the first BFTs that can deal with the first and second types of errors in the real world, and solve the low efficiency of BFT algorithms before based on partially synchronous network model. It reduces the computation complexity from the expotential order to be the quadratic level of number of nodes, making BFT algorithms be feasible in the applications of practical system.



So far, the BFT consensus protocols applied in blockchain can be regarded as the variants of PBFT.

1. Normal Process

As shown in the picture below, C is the client, and there are 4 nodes starting from No.0 to No.3. Among them, No.3 is the byzantine node.





The normal process consists of three stages:

1. pre-prepare : The main node( Primary ) broadcasts the preprepare message ( Preprepare ) to all replica nodes ( Replica );

2. Prepare : All nodes tell that they know this message to each other in this stage. Once a node received the message containing n-f prepare messages (We will replace it with QC , short for Quorum Certificate ), it has entered into the prepared status.

  1. Commit : All nodes know this message and also know that so do other nodes. Once a node received the message containing n-f commit messages, it has entered into the committed status.





2. View Change Process

View Change is the most important part of the PBFT design. When Primary is suspended or Replica all think Primary is the problem node, ViewChange event is triggered and the viewchange stage starts. Under such circumstance, all nodes in this system will broadcast the view change request. When one nodes received enough view change requests, it will send the view change confirmation to the new main node. When the new main node received enough view change confirmations, the next view starts. Each view exchange should contain the sequence number of message that the node has entered into the prepared status.

During view change, we need to ensure that the sequence number of the delivered messages are consistent with the ones in the whole view change. To put it simple, when the sequence number of a message is n , and there have been 2f+1 received prepare messages, then likewise, in the next view, the sequence of it will be n, and the normal process starts.





As shown in picture 2, there are three stages of viewchange : view-change , view-change-ack and new-view . Once a node think Primary is suspicious, it will send view-change message to other node, and the the existing euryhaline node with the smallest sequence number will be the new main node. When the new main node received 2f view-change message from other nodes, it indicates that there are enough nodes think the main node is suspicious, and will broadcast it.