【论文翻译】HotStuff:BFT Consensus in the Lens of Blockchain – Introduction

这是《HotStuff:BFT Consensus in the Lens of Blockchain》论文翻译的第一篇,包含了论文的摘要和简介部分。


摘要

在这篇文章中,我们将介绍HotStuff算法。这个算法是一个基于leader节点的拜占庭问题(Byzantine fault-tolerant replication protocal, BFT)的部分同步(partially synchronous)模型。一旦网络进入同步状态,HotStuff算法将允许一个leader节点依照某个频率(如最大网络延迟)发起共识。这个算法的网络通信开销和副本(replica)的数量成线性相关。据我们所知,HotStuff是第一个拥有上述特征的部分同步的BFT算法(partially synchronous BFT replication protocol)。我们的算法被应用到了一个小型网络中,该网络可以同时支持其他的著名协议,如DLS、PBFT、Tendermint、Casper等等……

【副本的概念是由状态复制机引入的。全异步指的是网络传输不存在上限,半同步指的是网络传输存在上限,但大家不知道。全同步指的是网络传输存在已知上限】

We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay—a property called responsiveness—and with communication complexity that is linear in the number of replicas. To our knowledge, HotStuff is the first partially synchronous BFT replication protocol exhibiting these combined properties. HotStuff is built around a novel framework that forms a bridge between classical BFT foundations and blockchains. It allows the expression of other known protocols (DLS, PBFT, Tendermint, Casper), and ours, in a common framework.

在一个副本超过100个的网络中,HotStuff延迟与BFT-SmaRt相当,且在leader节点更替时,拥有线性的网络通信次数(与之相对的,BFTSMaRt算法相同的功能需要使用O(n^3)次网络通信)。

Our deployment of HotStuff over a network with over 100 replicas achieves throughput and latency comparable to that of BFT-SMaRt, while enjoying linear communication footprint during leader failover (vs. cubic with BFTSMaRt).

简介

拜占庭容错(Byzantine fault tolerance, BFT)指的是一个计算网络在其副本节点遭遇任意错误(如拜占庭错误)时,如何保证关键的网络操作能够得以执行。从状态复制机(State Machine Replication, SMR)的角度来看,整个系统提供了一个可复制的服务,该服务可以被镜像部署到网络的N个副本中。一个BFT-SMR协议用来保证没有出错的副本能够以统一的顺序执行一系列由客户端提交的指令,即使存在一些拜占庭节点尝试阻止网络达成共识。在一轮共识中,算法保证由n-f个未出错的副本能够独立的执行客户端指令,并产生该指令下相同且唯一的结果。我们在这里尝试考虑部分同步通行模型(Partially Synchronous Communication Model),在指令发出之后的一定全局稳定时间(Global Stabilazation Time, GST) \Delta后,所有的未出错副本都能够对于指令的顺序以及结果产生共识,进而让整个网络的状态变为确定性的。该模型中,共识达成需要满足条件n \geq 3f+1

【这里的f意思是可以允许的“恶意”拜占庭节点的个数,具体推导细节可以看本文Model章节的Cryptographic primitives部分】

Byzantine fault tolerance (BFT) refers to the ability of a computing system to endure arbitrary (i.e., Byzantine) failures of its components while taking actions critical to the system’s operation. In the context of state machine replication (SMR) [35, 47], the system as a whole provides a replicated service whose state is mirrored across n deterministic replicas. A BFT SMR protocol is used to ensure that non-faulty replicas agree on an order of execution for clientinitiated service commands, despite the efforts of f Byzantine replicas. This, in turn, ensures that the n−f non-faulty replicas will run commands identically and so produce the same response for each command. As is common, we are concerned here with the partially synchronous communication model [25], whereby a known bound ∆ on message transmission holds after some unknown global stabilization time (GST). In this model, n \geq 3f + 1 is required for non-faulty replicas to agree on the same commands in the same order (e.g., [12]) and progress can be ensured deterministically only after GST [27].

在BFT SMR协议最开始构想的时候,研究者往往使用一个经典的研究系统,该系统拥有n=4n=7个副本节点,并被部署在一个本地网络中。然而,近期对于BFT的研究,由于区块链的出现,拥有了新的研究重点,即尝试让网络能够拓展到更大的规模n。与无授权的区块链系统(如比特币)相反,一种称之为授权区块链系统,拥有指定个数的副本,用于维护账本或者客户端指令的顺序,换句话来说,这些副本支持了网络的SMR特性。如果不考虑副本授权的限制,副本的个数可以达到几百个。值得注意的是,最终的全局稳定时间\Delta在一个广域网络中,需要进行针对性调整以适应网络通信的延时。

【区块链往往看重是怎么能够得到一个高吞吐量的账本,现在比特币那种10分钟一次记账频率事实上是完全不达标的】

When BFT SMR protocols were originally conceived, a typical target system size was n = 4 or n = 7, deployed on a local-area network. However, the renewed interest in Byzantine fault-tolerance brought about by its application to blockchains now demands solutions that can scale to much larger n. In contrast to permissionless blockchains such as the one that supports Bitcoin, for example, so-called permissioned blockchains involve a fixed set of replicas that collectively maintain an ordered ledger of commands or, in other words, that support SMR. Despite their permissioned nature, numbers of replicas in the hundreds or even thousands are envisioned (e.g., [42, 30]). Additionally, their deployment to wide-area networks requires setting \Delta to accommodate higher variability in communication delays.

扩展性挑战 算法PBFT的引入,标志着第一个可以实践的半同步BFT解决方案的实现。基于该算法的两步共识算法,又有各种各样的BFT解决方案被创造。从实践的角度来看,一个稳定的leader节点可以通过两轮的消息交换达成共识,第一阶段通过让一个副本获得包含n-f个投票的法定人数证书(Quorum Certificate, QC),使之成为leader,进而保证提议的唯一性。第二阶段则保证leader节点可以说服其他副本去向一个安全的提议投票。

The scaling challenge. Since the introduction of PBFT [20], the first practical BFT replication solution in the partial synchrony model, numerous BFT solutions were built around its core two-phase paradigm. The practical aspect is that a stable leader can drive a consensus decision in just two rounds of message exchanges. The first phase guarantees proposal uniqueness through the formation of a quorum certificate (QC) consisting of (n−f) votes. The second phase guarantees that the next leader can convince replicas to vote for a safe proposal.

为新的leader节点收集信息以及向副本发送提案的过程,称之为视图转换(view change),是整个网络的核心操作。不幸的是,视图转换是基于两步算法的,过于简单且容易出错,即使是在中等大小的网络中,也会存在严重的通信损失。它要求新的领导者从n-f副本中传递信息,每个副本都报告自己知道的QC值最高的副本。即使仅计算认证耗时(数字签名或消息认证码),为了传输新的提案,PBFT算法需要消耗O(n^3)的通信代价。该算法的一个变体,通过阈值数字签名(Threshold Digital Signatures),将多个认证器组合成一个的算法能有效优化耗时,不过即便如此依旧有O(n^2)的时间复杂度。在PBFT中,在单次共识达成之前,有O(n)的视图转换,故总体的传输量为O(n^4),即使加入了阈值签名优化也是O(n^3)复杂度。这种由规模增大带来的传输量剧增挑战着包括PBFT在内的诸多协议,包括Prime、Zyzzyva 、Upright、BFT-SMaRt、 700BFT和SBFT.

The algorithm for a new leader to collect information and propose it to replicas—called a view-change—is the epicenter of replication. Unfortunately, view-change based on the two-phase paradigm is far from simple [38], is bug-prone [4], and incurs a significant communication penalty for even moderate system sizes. It requires the new leader to relay information from (n-f) replicas, each reporting its own highest known QC. Even counting just authenticators (digital signatures or message authentication codes), conveying a new proposal has a communication footprint of O(n^3) authenticators in PBFT, and variants that combine multiple authenticators into one via threshold digital signatures (e.g., [18, 30]) still send O(n^2) authenticators. The total number of authenticators transmitted if O(n) view-changes occur before a single consensus decision is reached is O(n^4) in PBFT, and even with threshold signatures is O(n^3). This scaling challenge plagues not only PBFT, but many other protocols developed since then, e.g., Prime [9], Zyzzyva [34], Upright [22], BFT-SMaRt [13], 700BFT [11], and SBFT [30].

HotStuff算法含有三个核心步骤,第一步允许新leader节点选择周围QC值最高的副本。第二步允许其他副本在投票环节之后改变想法,由于这个改变不需要leader节点进行任何的证明,因此同时降低了时间复杂度以及换leader协议的复杂度。最后,在几乎完成所有阶段的共识后,HotStuff能够很简单的并行化,以保证便捷的leader循环。

HotStuff revolves around a three-phase core, allowing a new leader to simply pick the highest QC it knows of. It introduces a second phase that allows replicas to “change their mind” after voting in the phase, without requiring a leader proof at all. This alleviates the above complexity, and at the same time considerably simplifies the leader replacement protocol. Last, having (almost) canonized all the phases, it is very easy to pipeline HotStuff, and to frequently rotate leaders.

据我们所知,BFT协议只有在区块链领域(如Tendermint和Casper)拥有如此简单的leader选举制度,然而,这些系统都是围绕同步内核建立的,在此情形下,提议是按照一个提前定义好的间隔被提出的。这个间隔保证了在最坏情况下,消息能够在广域网络中完整传播。这样做意味着我们不得不放弃大部分最为常用的BFT-SMR解决方案(包括上面列举的那一些),我们放弃的上述解决方案都被定义为拥有乐观响应性(Optimistic Responsiveness)。非正式的,响应性(Responsiveness)要求非故障leader节点一旦被指定,就可以仅根据实际消息延迟来使协议及时达成共识,而与消息传输延迟的任何已知上限无关。更加适合我们模型的是乐观响应性,他放宽了标准,仅仅要求在“乐观”的情况(比如最普通的情况下)拥有响应性,在这里乐观的情况指的是GST时限到达的时候。不管是否乐观,在算法Tendermint/Casper中,响应能力都被排除在外,这个问题的症结在于,可能存在一个诚实的副本,拥有最高的QC值,但是leader对此毫不知情。我们完全可以构造一个场景,使得刚刚描述的问题能够让共识的进展直接卡死(第4.4节将介绍这个问题的详细细节)。基于现有的一些已部署工程产生的报告,如果没有能够在关键协议步骤中纳入必要的延迟,可能使得整个系统彻底失去活性。

To our knowledge, only BFT protocols in the blockchain arena like Tendermint [15, 16] and Casper [17] follow such a simple leader regime. However, these systems are built around a synchronous core, wherein proposals are made in pre-determined intervals that must accommodate the worst-case time it takes to propagate messages over a wide-area peer-to-peer gossip network. In doing so, they forego a hallmark of most practical BFT SMR solutions (including those listed above), namely optimistic responsiveness [42]. Informally, responsiveness requires that a non-faulty leader, once designated, can drive the protocol to consensus in time depending only on the actual message delays, independent of any known upper bound on message transmission delays [10]. More appropriate for our model is optimistic responsiveness, which requires responsiveness only in beneficial (and hopefully common) circumstances—here, after GST is reached. Optimistic or not, responsiveness is precluded with designs such as Tendermint/Casper. The crux of the difficulty is that there may exist an honest replica that has the highest QC, but the leader does not know about it. One can build scenarios where this prevents progress ad infinitum (see Section 4.4 for a detailed liveless scenario). Indeed, failing to incorporate necessary delays at crucial protocol steps can result in losing liveness outright, as has been reported in several existing deployments, e.g., see [3, 2, 19].

我们的贡献点 据我们所知,HotStuff是第一个拥有下列特性的MFT-SMR协议——

  • 线性视图变换复杂度:在达到GST时限后,任何正常的leader节点,一旦被选定,只需要与O(n)个认证副本通信,就可以提出一次共识决策。包括leader节点被替换的情况。这也意味着,大量领导者崩溃的情况下,到达GST的共识通信成本最坏O(n^2)
  • 乐观响应性:在达到GST时限后,任何正常的leader节点,一旦被选定,就只需要等待最先到达的n-f个响应,就可以确信自己有能力发送一个可以通过的提案。包括leader节点刚刚修改的情况。

Our contributions. To our knowledge, we present the first BFT SMR protocol, called HotStuff, to achieve the following two properties:

  • Linear View Change: After GST, any correct leader, once designated, sends only O(n) authenticators to drive a consensus decision. This includes the case where a leader is replaced. Consequently, communication costs to reach consensus after GST is O(n^2) authenticators in the worst case of cascading leader failures.
  • Optimistic Responsiveness: After GST, any correct leader, once designated, needs to wait just for the first n-f responses to guarantee that it can create a proposal that will make progress. This includes the case where a leader is replaced.

HotStuff的另一个特征是,一个新的leader节点尝试达成共识的协议通信开销不大于当前leader节点。这意味着HotStuff支持频繁的leader节点更替,这样更能够保证链的质量,在区块链领域是非常重要的。

Another feature of HotStuff is that the costs for a new leader to drive the protocol to consensus is no greater than that for the current leader. As such, HotStuff supports frequent succession of leaders, which has been argued is useful in blockchain contexts for ensuring chain quality [28].

HotStuff为了实现上述的特性,对于每个视图,都增加了一个额外的阶段,额外阶段的微小开销换取了简化leader更替的协议复杂度。这个交换只对于真实的网络延迟带来的开销,通常这个额外开销相比于\Delta是非常小的。我们期望额外的延迟会远小于之前协议为了实现线性视图变换而放弃响应性产生的开销。此外,通过第5章介绍的高效的管道操作,我们甚至可以让吞吐量不受影响。

HotStuff achieves these properties by adding another phase to each view, a small price to latency in return for considerably simplifying the leader replacement protocol. This exchange incurs only the actual network delays, which are typically far smaller than \Delta in practice. As such, we expect this added latency to be much smaller than that incurred by previous protocols that forgo responsiveness to achieve linear view-change. Furthermore, throughput is not affected due to the efficient pipeline we introduce in Section 5.

在理论贡献之外,HotStuff同时提供了一个BFT算法的复现。有助于读者全面了解BFT协议:

  • 一个运行在图上的BFT复现框架。 通过投票和规则提交保障的安全性。 活跃性是通过Pacemaker提供的,支持向图添加新的副本。
  • 在该框架下实现已知协议(DLS、PBFT、Tendmint和Casper),以及我们自己的HotStuff协议。

In addition to the theoretical contribution, HotStuff also provides insights in understanding BFT replication in general and instantiating the protocol in practice (see Section 6):

A framework for BFT replication over graphs of nodes. Safety is specified via voting and commit graph rules.

  • Liveness is specified separately via a Pacemaker that extends the graph with new nodes.
  • A casting of several known protocols (DLS, PBFT, Tendermint, and Casper) and one new (ours, HotStuff), in this framework.

HotStuff的另一个优点是非常简单,由于其机制本身就是非常经济性的:只包含两种消息的类别,以及一个简单地规则用于决定副本相互之间如何对待。安全性是通过投票和提交规则来保证的,而活跃性则需要Pacemaker机制,该机制与安全机制是完全独立的。同时,他允许通过少量的变化实现几种已知的协议(DLS, PBDF, Tendermint和Casper)。这种灵活性来源于其对于图上节点的操作,在现代区块链和传统的BFT基础间架设了一个桥梁。

HotStuff has the additional benefit of being remarkably simple, owing in part to its economy of mechanism: There are only two message types and simple rules to determine how a replica treats each. Safety is specified via voting and commit rules over graphs of nodes. The mechanisms needed to achieve liveness are encapsulated within a Pacemaker, cleanly separated from the mechanisms needed for safety. At the same time, it is expressive in that it allows the representation of several known protocols (DLS, PBFT, Tendermint, and Casper) as minor variations. In part this flexibility derives from its operation over a graph of nodes, in a way that forms a bridge between classical BFT foundations and modern blockchains.

我们设计了一个HotStuff的原型并进行了初步评估。HotStuff被部署在一个具有一百多个副本的网络上。HotStuff实现的吞吐量与延迟可以媲美BFT-SMaRt等成熟系统,这些系统的代码复杂度都远大于HotStuff。我们进一步演示了在频繁更改领导者的情况下,HotStuff的通信开销基本不变,而BFT-SMaRt则随着副本的增长呈二次方增长。

We describe a prototype implementation and a preliminary evaluation of HotStuff. Deployed over a network with over a hundred replicas, HotStuff achieves throughput and latency comparable to, and sometimes exceeding, those of mature systems such as BFT-SMaRt, whose code complexity far exceeds that of HotStuff. We further demonstrate that the communication footprint of HotStuff remains constant in face of frequent leader replacements, whereas BFT-SMaRt grows quadratically with the number of replicas.

表1 GST后各协议的表现对比

发表评论

电子邮件地址不会被公开。 必填项已用*标注

此站点使用Akismet来减少垃圾评论。了解我们如何处理您的评论数据