[Paper Translation] HotStuff: BFT Consensus in the Lens of Blockchain – Model

This is the third installment of the translation of "HotStuff: BFT Consensus in the Lens of Blockchain", covering the paper's Model section.


3. Model

We consider a system consisting of a fixed set of n = 3f + 1 replicas, indexed by i ∈ [n] where [n] = {1, . . . , n}. A set F ⊂ [n] of up to f = |F| replicas are Byzantine faulty, and the remaining ones are correct. We will often refer to the Byzantine replicas as being coordinated by an adversary, which learns all internal state held by these replicas (including their cryptographic keys, see below).

Network communication is point-to-point, authenticated and reliable: one correct replica receives a message from another correct replica if and only if the latter sent that message to the former. When we refer to a "broadcast", it involves the broadcaster, if correct, sending the same point-to-point messages to all replicas, including itself. We adopt the partial synchrony model of Dwork et al. [25], where there is a known bound ∆ and an unknown Global Stabilization Time (GST), such that after GST, all transmissions between two correct replicas arrive within time ∆. Our protocol will ensure safety always, and will guarantee progress within a bounded duration after GST. (Guaranteeing progress before GST is impossible [27].) In practice, our protocol will guarantee progress if the system remains stable (i.e., if messages arrive within ∆ time) for sufficiently long after GST, though assuming that it does so forever simplifies discussion.

Cryptographic primitives. HotStuff makes use of threshold signatures [48, 18, 14]. In a (k, n)-threshold signature scheme, there is a single public key held by all replicas, and each of the n replicas holds a distinct private key. The i-th replica can use its private key to contribute a partial signature \rho_i \leftarrow tsign_i(m) on message m. Partial signatures \{\rho_i\}_{i \in I}, where |I| = k and each \rho_i \leftarrow tsign_i(m), can be used to produce a digital signature \sigma \leftarrow tcombine(m, \{\rho_i\}_{i \in I}) on m. Any other replica can verify the signature using the public key and the function tverify. We require that if \rho_i \leftarrow tsign_i(m) for each i \in I, |I| = k, and if \sigma \leftarrow tcombine(m, \{\rho_i\}_{i \in I}), then tverify(m, \sigma) returns true. However, given oracle access to the signing oracles \{tsign_i(\cdot)\}_{i \in [n] \setminus F}, an adversary who queries tsign_i(m) on strictly fewer than k - f of these oracles has only negligible probability of producing a signature \sigma for the message m (i.e., such that tverify(m, \sigma) returns true). Throughout this paper, we use a threshold of k = 2f + 1. Again, we will typically leave invocations of tverify implicit in our protocol descriptions.
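
To make this interface concrete, the following is a minimal Go sketch of what a (k, n)-threshold signature API might look like. It is our own illustration, not code from the paper or from any particular signature library; the type and function names (Scheme, TSign, TCombine, TVerify) are assumptions.

package threshsig

// PartialSig is one replica's share of a signature on a message (ρ_i in the paper).
type PartialSig struct {
	SignerID int    // i ∈ [n]
	Share    []byte // opaque partial-signature bytes
}

// Scheme models a (k, n)-threshold signature scheme. Only the abstract
// operations tsign_i, tcombine and tverify are specified by the paper;
// everything else here is illustrative.
type Scheme interface {
	// TSign produces replica i's partial signature on message m (tsign_i(m)).
	TSign(i int, m []byte) (PartialSig, error)
	// TCombine combines k distinct partial signatures on m into a full
	// signature σ (tcombine); it should fail if fewer than k shares are given.
	TCombine(m []byte, shares []PartialSig) ([]byte, error)
	// TVerify checks a combined signature against the single public key (tverify).
	TVerify(m []byte, sig []byte) bool
}

// Threshold returns the k used throughout HotStuff: k = 2f + 1.
func Threshold(f int) int { return 2*f + 1 }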

We also require a cryptographic hash function h (also called a message digest function), which maps an arbitrary-length input to a fixed-length output. The hash function must be collision resistant [46], which informally requires that the probability of an adversary producing two distinct inputs m and m' such that h(m) = h(m') is negligible. As such, h(m) can serve as an identifier for a unique input m in the protocol.

Complexity measure. The complexity measure we care about is authenticator complexity, which specifically is the sum, over all replicas i \in [n], of the number of authenticators received by replica i in the protocol to reach a consensus decision after GST. (Again, before GST, a consensus decision might not be reached at all in the worst case [27].) Here, an authenticator is either a partial signature or a signature. Authenticator complexity is a useful measure of communication complexity for several reasons. First, like bit complexity and unlike message complexity, it hides unnecessary details about the transmission topology. For example, n messages carrying one authenticator count the same as one message carrying n authenticators. Second, authenticator complexity is better suited than bit complexity for capturing costs in protocols like ours that reach consensus repeatedly, where each consensus decision (or each view proposed on the way to that consensus decision) is identified by a monotonically increasing counter. That is, because such a counter increases indefinitely, the bit complexity of a protocol that sends such a counter cannot be bounded. Third, since in practice, cryptographic operations to produce or verify digital signatures and to produce or combine partial signatures are typically the most computationally intensive operations in protocols that use them, the authenticator complexity provides insight into the computational burden of the protocol, as well.

[Paper Translation] HotStuff: BFT Consensus in the Lens of Blockchain – Related work

This is the second installment of the translation of "HotStuff: BFT Consensus in the Lens of Blockchain", covering the paper's Related Work section.


2. Related work

Reaching consensus in face of Byzantine failures was formulated as the Byzantine Generals Problem by Lamport et al. [37], who also coined the term "Byzantine failures". The first synchronous solution was given by Pease et al. [43], and later improved by Dolev and Strong [24]. The improved protocol has O(n^3) communication complexity, which was shown optimal by Dolev and Reischuk [23]. A leader-based synchronous protocol that uses randomness was given by Katz and Koo [32], showing an expected constant-round solution with (n-1)/2 resilience.

Meanwhile, in the asynchronous setting, Fischer et al. [27] showed that the problem is unsolvable deterministically in face of even a single failure. Furthermore, an (n-1)/3 resilience bound for any asynchronous solution was proven by Ben-Or [12]. Two approaches were devised to circumvent the impossibility. One relies on randomness, initially shown by Ben-Or [12], using independently random coin flips by processes until they happen to converge to consensus. Later works used cryptographic methods to share an unpredictable coin and drive complexities down to constant expected rounds, and O(n^3) communication [18].

The second approach relies on partial synchrony, first shown by Dwork, Lynch, and Stockmeyer (DLS) [25]. This protocol preserves safety during asynchronous periods, and after the system becomes synchronous, DLS guarantees termination. Once synchrony is maintained, DLS incurs O(n^4) total communication and O(n) rounds per decision.

State machine replication relies on consensus at its core to order client requests so that correct replicas execute them in this order. The recurring need for consensus in SMR led Lamport to devise Paxos [36], a protocol that operates an efficient pipeline in which a stable leader drives decisions with linear communication and one roundtrip. A similar emphasis led Castro and Liskov [20, 21] to develop an efficient leader-based Byzantine SMR protocol named PBFT, whose stable leader requires O(n^2) communication and two round-trips per decision, and the leader replacement protocol incurs O(n^3) communication. PBFT has been deployed in several systems, including BFT-SMaRt [13]. Kotla et al. introduced an optimistic linear path into PBFT in a protocol named Zyzzyva [34], which was utilized in several systems, e.g., Upright [22] and Byzcoin [33]. The optimistic path has linear complexity, while the leader replacement protocol remains O(n^3). Abraham et al. [4] later exposed a safety violation in Zyzzyva, and presented fixes [5, 30]. On the other hand, to also reduce the complexity of the protocol itself, Song et al. proposed Bosco [49], a simple one-step protocol with low latency on the optimistic path, requiring 5f + 1 replicas. SBFT [30] introduces an O(n^2) communication view-change protocol that supports a stable leader protocol with optimistically linear, one round-trip decisions. It reduces the communication complexity by harnessing two methods: a collector-based communication paradigm by Reiter [45], and signature combining via threshold cryptography on protocol votes by Cachin et al. [18].

A leader-based Byzantine SMR protocol that employs randomization was presented by Ramasamy et al. [44], and a leaderless variant named HoneyBadgerBFT was developed by Miller et al. [39]. At their core, these randomized Byzantine solutions employ randomized asynchronous Byzantine consensus, whose best known communication complexity was O(n^3) (see above), amortizing the cost via batching. However, most recently, based on the idea in this HotStuff paper, a parallel submission to PODC'19 [8] further improves the communication complexity to O(n^2).

Bitcoin's core is a protocol known as Nakamoto Consensus [40], a synchronous protocol with only probabilistic safety guarantee and no finality (see analysis in [28, 41, 6]). It operates in a permissionless model where participants are unknown, and resilience is kept via Proof-of-Work. As described above, recent blockchain solutions hybridize Proof-of-Work solutions with classical BFT solutions in various ways [26, 33, 7, 17, 29, 31, 42]. The need to address rotating leaders in these hybrid solutions and others provides the motivation behind HotStuff.

[Paper Translation] HotStuff: BFT Consensus in the Lens of Blockchain – Introduction

This is the first installment of the translation of "HotStuff: BFT Consensus in the Lens of Blockchain", covering the paper's Abstract and Introduction.


Abstract

We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay—a property called responsiveness—and with communication complexity that is linear in the number of replicas. To our knowledge, HotStuff is the first partially synchronous BFT replication protocol exhibiting these combined properties. HotStuff is built around a novel framework that forms a bridge between classical BFT foundations and blockchains. It allows the expression of other known protocols (DLS, PBFT, Tendermint, Casper), and ours, in a common framework.

[Translator's note: the notion of a replica comes from state machine replication. Fully asynchronous means message delays have no upper bound; partially synchronous means a delay bound exists but only holds after some unknown point in time (the GST in this paper); fully synchronous means a known delay bound always holds.]

Our deployment of HotStuff over a network with over 100 replicas achieves throughput and latency comparable to that of BFT-SMaRt, while enjoying linear communication footprint during leader failover (vs. cubic with BFT-SMaRt).

1. Introduction

Byzantine fault tolerance (BFT) refers to the ability of a computing system to endure arbitrary (i.e., Byzantine) failures of its components while taking actions critical to the system's operation. In the context of state machine replication (SMR) [35, 47], the system as a whole provides a replicated service whose state is mirrored across n deterministic replicas. A BFT SMR protocol is used to ensure that non-faulty replicas agree on an order of execution for client-initiated service commands, despite the efforts of f Byzantine replicas. This, in turn, ensures that the n−f non-faulty replicas will run commands identically and so produce the same response for each command. As is common, we are concerned here with the partially synchronous communication model [25], whereby a known bound ∆ on message transmission holds after some unknown global stabilization time (GST). In this model, n \geq 3f + 1 is required for non-faulty replicas to agree on the same commands in the same order (e.g., [12]) and progress can be ensured deterministically only after GST [27].

[Translator's note: here f is the number of Byzantine ("malicious") replicas the system can tolerate; see the Cryptographic primitives part of the Model section for how the threshold relates to n ≥ 3f + 1.]

When BFT SMR protocols were originally conceived, a typical target system size was n = 4 or n = 7, deployed on a local-area network. However, the renewed interest in Byzantine fault-tolerance brought about by its application to blockchains now demands solutions that can scale to much larger n. In contrast to permissionless blockchains such as the one that supports Bitcoin, for example, so-called permissioned blockchains involve a fixed set of replicas that collectively maintain an ordered ledger of commands or, in other words, that support SMR. Despite their permissioned nature, numbers of replicas in the hundreds or even thousands are envisioned (e.g., [42, 30]). Additionally, their deployment to wide-area networks requires setting \Delta to accommodate higher variability in communication delays.

[Translator's note: blockchains care about obtaining a high-throughput ledger; Bitcoin's pace of roughly one block every ten minutes falls far short of that goal.]

The scaling challenge. Since the introduction of PBFT [20], the first practical BFT replication solution in the partial synchrony model, numerous BFT solutions were built around its core two-phase paradigm. The practical aspect is that a stable leader can drive a consensus decision in just two rounds of message exchanges. The first phase guarantees proposal uniqueness through the formation of a quorum certificate (QC) consisting of (n−f) votes. The second phase guarantees that the next leader can convince replicas to vote for a safe proposal.
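
As a rough illustration of the first phase, the sketch below shows a leader collecting n - f votes and combining their partial signatures into a quorum certificate. It reuses the illustrative threshold-signature interface sketched in the Model section; the QC and FormQC names are our own assumptions, not the paper's code.

package threshsig

import "fmt"

// QC is a quorum certificate: a proposal (or its hash) together with a
// signature combined from n-f partial signatures.
type QC struct {
	Msg []byte
	Sig []byte
}

// FormQC lets a leader combine n-f votes into a QC, using the illustrative
// Scheme interface from the Model section. With n = 3f+1 the quorum size
// n-f equals the signing threshold k = 2f+1.
func FormQC(s Scheme, n, f int, msg []byte, votes []PartialSig) (*QC, error) {
	quorum := n - f
	if len(votes) < quorum {
		return nil, fmt.Errorf("need %d votes, got %d", quorum, len(votes))
	}
	sig, err := s.TCombine(msg, votes[:quorum])
	if err != nil {
		return nil, err
	}
	return &QC{Msg: msg, Sig: sig}, nil
}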

The algorithm for a new leader to collect information and propose it to replicas—called a view-change—is the epicenter of replication. Unfortunately, view-change based on the two-phase paradigm is far from simple [38], is bug-prone [4], and incurs a significant communication penalty for even moderate system sizes. It requires the new leader to relay information from (n−f) replicas, each reporting its own highest known QC. Even counting just authenticators (digital signatures or message authentication codes), conveying a new proposal has a communication footprint of O(n^3) authenticators in PBFT, and variants that combine multiple authenticators into one via threshold digital signatures (e.g., [18, 30]) still send O(n^2) authenticators. The total number of authenticators transmitted if O(n) view-changes occur before a single consensus decision is reached is O(n^4) in PBFT, and even with threshold signatures is O(n^3). This scaling challenge plagues not only PBFT, but many other protocols developed since then, e.g., Prime [9], Zyzzyva [34], Upright [22], BFT-SMaRt [13], 700BFT [11], and SBFT [30].

HotStuff revolves around a three-phase core, allowing a new leader to simply pick the highest QC it knows of. It introduces a second phase that allows replicas to "change their mind" after voting in that phase, without requiring a leader proof at all. This alleviates the above complexity, and at the same time considerably simplifies the leader replacement protocol. Last, having (almost) canonized all the phases, it is very easy to pipeline HotStuff, and to frequently rotate leaders.

To our knowledge, only BFT protocols in the blockchain arena like Tendermint [15, 16] and Casper [17] follow such a simple leader regime. However, these systems are built around a synchronous core, wherein proposals are made in pre-determined intervals that must accommodate the worst-case time it takes to propagate messages over a wide-area peer-to-peer gossip network. In doing so, they forego a hallmark of most practical BFT SMR solutions (including those listed above), namely optimistic responsiveness [42]. Informally, responsiveness requires that a non-faulty leader, once designated, can drive the protocol to consensus in time depending only on the actual message delays, independent of any known upper bound on message transmission delays [10]. More appropriate for our model is optimistic responsiveness, which requires responsiveness only in beneficial (and hopefully common) circumstances—here, after GST is reached. Optimistic or not, responsiveness is precluded with designs such as Tendermint/Casper. The crux of the difficulty is that there may exist an honest replica that has the highest QC, but the leader does not know about it. One can build scenarios where this prevents progress ad infinitum (see Section 4.4 for a detailed scenario of such a liveness violation). Indeed, failing to incorporate necessary delays at crucial protocol steps can result in losing liveness outright, as has been reported in several existing deployments, e.g., see [3, 2, 19].

Our contributions. To our knowledge, we present the first BFT SMR protocol, called HotStuff, to achieve the following two properties:

  • Linear View Change: After GST, any correct leader, once designated, sends only O(n) authenticators to drive a consensus decision. This includes the case where a leader is replaced. Consequently, communication costs to reach consensus after GST are O(n^2) authenticators in the worst case of cascading leader failures.
  • Optimistic Responsiveness: After GST, any correct leader, once designated, needs to wait just for the first n-f responses to guarantee that it can create a proposal that will make progress. This includes the case where a leader is replaced.

Another feature of HotStuff is that the cost for a new leader to drive the protocol to consensus is no greater than that for the current leader. As such, HotStuff supports frequent succession of leaders, which has been argued is useful in blockchain contexts for ensuring chain quality [28].

HotStuff achieves these properties by adding another phase to each view, a small price in latency in return for considerably simplifying the leader replacement protocol. This exchange incurs only the actual network delays, which are typically far smaller than \Delta in practice. As such, we expect this added latency to be much smaller than that incurred by previous protocols that forgo responsiveness to achieve linear view-change. Furthermore, throughput is not affected due to the efficient pipeline we introduce in Section 5.

In addition to the theoretical contribution, HotStuff also provides insights in understanding BFT replication in general and instantiating the protocol in practice (see Section 6):

  • A framework for BFT replication over graphs of nodes. Safety is specified via voting and commit graph rules. Liveness is specified separately via a Pacemaker that extends the graph with new nodes.
  • A casting of several known protocols (DLS, PBFT, Tendermint, and Casper) and one new (ours, HotStuff), in this framework.

HotStuff has the additional benefit of being remarkably simple, owing in part to its economy of mechanism: There are only two message types and simple rules to determine how a replica treats each. Safety is specified via voting and commit rules over graphs of nodes. The mechanisms needed to achieve liveness are encapsulated within a Pacemaker, cleanly separated from the mechanisms needed for safety. At the same time, it is expressive in that it allows the representation of several known protocols (DLS, PBFT, Tendermint, and Casper) as minor variations. In part this flexibility derives from its operation over a graph of nodes, in a way that forms a bridge between classical BFT foundations and modern blockchains.

We describe a prototype implementation and a preliminary evaluation of HotStuff. Deployed over a network with over a hundred replicas, HotStuff achieves throughput and latency comparable to, and sometimes exceeding, those of mature systems such as BFT-SMaRt, whose code complexity far exceeds that of HotStuff. We further demonstrate that the communication footprint of HotStuff remains constant in face of frequent leader replacements, whereas BFT-SMaRt grows quadratically with the number of replicas.

Table 1: Comparison of the protocols' performance after GST.

A Floating Mascot for the Blog

Finding the artwork

First, find a cute girl illustration on Pixiv, ideally a full-body, chibi-style one; for example I picked pid 56056419.

Use an online Photoshop-style tool to cut out the background and save the result as a PNG, like this:

Grab a link to that image; I simply reused the image link from this post: https://mhy12345.xyz/wp-content/uploads/2020/04/56056419_p0-2.png

Adding the JavaScript

Add the spig.js file to the blog backend, e.g. at [WP_ROOT/wp-includes/js/spig.js]; in that case it is reachable via the static URL https://mhy12345.xyz/wp-includes/js/spig.js. The file defines the mascot's actions and interaction logic; the full JS code is attached at the end of this post.

Include jQuery and spig.js in the page. For WordPress, insert the following into Theme Editor → footer.php:

<!-- Floating mascot -->
<script type="text/javascript">
    <?php /* expose the page title and visitor name as JS globals used by spig.js */ ?>
    <?php if(is_home()) echo 'var isindex=true;var title="";'; else echo 'var isindex=false;var title="', get_the_title(), '";'; ?>
    <?php if((($display_name = wp_get_current_user()->display_name) != null)) echo 'var visitor="', $display_name, '";'; else if(isset($_COOKIE['comment_author_'.COOKIEHASH])) echo 'var visitor="', $_COOKIE['comment_author_'.COOKIEHASH], '";'; else echo 'var visitor="游客";'; echo "\n"; ?>
</script>
<script type="text/javascript" src="https://libs.baidu.com/jquery/1.8.3/jquery.min.js"></script>
<script type="text/javascript" src="https://mhy12345.xyz/wp-includes/js/spig.js"></script>
<style>
.spig{display:block;width:130px;height:170px;position:absolute;bottom:300px;left:160px;z-index:9999;}
#message{color:#191919;border:1px solid #c4c4c4;background:#ddd;-moz-border-radius:5px;-webkit-border-radius:5px;border-radius:5px;min-height:1em;padding:5px;top:-65px;position:absolute;text-align:center;width:auto !important;z-index:10000;-moz-box-shadow:0 0 15px #eeeeee;-webkit-box-shadow:0 0 15px #eeeeee;border-color:#eeeeee;box-shadow:0 0 15px #eeeeee;outline:none;}
.mumu{width:130px;height:170px;cursor:move;background:url(//mhy12345.xyz/wp-content/uploads/2020/04/56056419_p0-2.png) no-repeat;background-size:150px;}
</style>
<div id="spig" class="spig">
    <div id="message">加载中……</div>
    <div id="mumu" class="mumu"></div>
</div>
<!-- end Floating mascot -->

Save, refresh, and the floating mascot should appear.

Developers of the original snippet:
https://ihuan.me/2107.html
https://www.dreamwings.cn/spig/2929.html

 

D-Cube: Dense-Block Detection in Terabyte-Scale Tensors – Reading Notes

Background

Dense blocks

The paper itself skips over what a dense block is, so here is a quick recap [1]:

Suppose you have a ratings dataset from a review site recording that "at some time, some user reviewed some page." How do you find the fake reviews hidden in it?

Research has found that reviews sharing the same reviewers, the same timestamps, and the same pages are likely to be a cluster of coordinated fake reviews.

If you put all the data into a K-dimensional cube, those fake reviews form a "block" whose entries are denser than everywhere else. That is what a dense block means.

The storage hierarchy

Main memory is fast to access but small; disks are the opposite, slow but large. For small data, a program keeps everything in memory for fast access; once the data grows, it has to live on a lower tier of the storage hierarchy (such as disk), where access is roughly a thousand times slower.

Overview

Dense-block detection has proven very effective for fraud detection. However, previous dense-block detection methods all assume the dataset is small enough to fit in main memory; once the data grows beyond that, they incur massive disk I/O and become very inefficient.

This paper proposes D-Cube, a disk-based densest-block detection algorithm designed to minimize disk I/O, which also supports distributed computation on Hadoop's MapReduce framework. D-Cube has the following properties:

  • Memory efficient: on the same datasets, D-Cube needs up to 1,600x less memory than conventional algorithms and handles data up to 1,000x larger.
  • Fast: compared with state-of-the-art methods, D-Cube is up to 5x faster on real-world tensors and up to 12x faster on synthetic tensors.
  • Provably accurate: it provides provable guarantees on the quality of the blocks it finds, and its results closely match those of state-of-the-art algorithms on real-world tensors.
  • Effective: it detects network attacks in the TCP-Dump dataset with high accuracy.

Problem statement

Definitions

The formal definition of the problem is fairly involved; the complete version can be found in the original paper.

Using the figure below as an example: we are given the tuples of a relation R(A_1, \dots, A_N, X), which encodes an N-way tensor with dimension attributes A_1, A_2, \dots, A_N and a non-negative measure attribute X. For instance, in the Wikipedia revision history dataset, R(user, page, date, count) says that user u revised page p on date d a total of c times. The relation R can naturally be drawn as an N-dimensional cube, as in panel (b). By taking a subset of the values of each of R's dimensions we obtain a block B. For both B and R we define the mass as the sum of the measure values t[X] over the tuples they contain.

The figure above shows an example of dense-block detection: (a) a relation R, (b) the tensor representation of R together with one block B(alice, 1), whose mass is M(B(alice, 1)) = 9.

Block density

Researchers have found that defining a density for blocks is very useful for anomaly detection. Two such density measures are given below.

Arithmetic block density

\rho_{ari}(B,R) = \rho_{ari}(M_B, \{|B_n|\}_{n=1}^{N}, M_R, \{|R_n|\}_{n=1}^{N}) = \frac{M_B}{\frac{1}{N}\sum_{n=1}^{N}|B_n|}

Geometric block density

\rho_{geo}(B,R) = \rho_{geo}(M_B, \{|B_n|\}_{n=1}^{N}, M_R, \{|R_n|\}_{n=1}^{N}) = \frac{M_B}{\left(\prod_{n=1}^{N}|B_n|\right)^{\frac{1}{N}}}

Note that these block densities are not simply a weighted average of the values inside the block; they are essentially the block's total mass divided by the (arithmetic or geometric) average of the block's side lengths.

Finally, the suspiciousness of a block is defined as

\begin{align*}\rho_{susp}(B,R) &= \rho_{susp}(M_B, \{|B_n|\}_{n=1}^{N}, M_R, \{|R_n|\}_{n=1}^{N}) \\&= M_B\left(\log\left(\frac{M_B}{M_R}\right)-1\right) + M_R\prod_{n=1}^{N}\frac{|B_n|}{|R_n|} - M_B\log\left(\prod_{n=1}^{N}\frac{|B_n|}{|R_n|}\right)\end{align*}
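
The three measures translate directly into code. Below is a small Go sketch (the function names are ours) that computes them from a block's mass and per-dimension sizes, plus the relation's mass and sizes for the suspiciousness measure:

package dcube

import "math"

// ariDensity is ρ_ari: block mass over the arithmetic mean of the block's side lengths.
func ariDensity(massB float64, sizeB []int) float64 {
	sum := 0.0
	for _, s := range sizeB {
		sum += float64(s)
	}
	return massB / (sum / float64(len(sizeB)))
}

// geoDensity is ρ_geo: block mass over the geometric mean of the block's side lengths.
func geoDensity(massB float64, sizeB []int) float64 {
	prod := 1.0
	for _, s := range sizeB {
		prod *= float64(s)
	}
	return massB / math.Pow(prod, 1.0/float64(len(sizeB)))
}

// suspDensity is ρ_susp; it also needs the whole relation's mass and per-dimension sizes.
func suspDensity(massB float64, sizeB []int, massR float64, sizeR []int) float64 {
	ratio := 1.0 // prod_n |B_n| / |R_n|
	for n := range sizeB {
		ratio *= float64(sizeB[n]) / float64(sizeR[n])
	}
	return massB*(math.Log(massB/massR)-1) + massR*ratio - massB*math.Log(ratio)
}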

Problem definition

Large-scale top-k densest-block detection.
Given: (a) a large-scale relation R that cannot fit in memory, and (b) a block-density measure \rho.
Find: the k highest-density blocks that do not overlap with each other.

Method

First, the overall D-Cube framework. It works iteratively: every round calls find_single_block to obtain one dense block, and then deletes the labels that appear in that block from the dataset:

  • Compute the mass M_R of R, which is passed to the subsequent find_single_block call
  • Call find_single_block to obtain one dense block B
  • Set R ← R − B
  • Move on to the next iteration

In other words, D-Cube is the outer framework that repeatedly invokes the block-selection routine find_single_block; a sketch of this outer loop is given after the caption below.

The D-Cube framework: iterate, finding one block at a time and removing it from R, until k blocks have been selected.
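
Here is a minimal in-memory Go sketch of that outer loop. The real D-Cube keeps R on disk; the Tuple, Relation and Block types and all names below are assumptions made only to illustrate the control flow (findSingleBlock is sketched below, after the block-selection description):

package dcube

// Tuple is one row of R(A_1, ..., A_N, X): N attribute values plus a measure.
type Tuple struct {
	Attrs []string // Attrs[n] is the tuple's label on dimension n
	X     float64  // non-negative measure
}

// Relation is an in-memory stand-in for R; the real D-Cube keeps R on disk.
type Relation struct {
	N      int
	Tuples []Tuple
}

// Mass returns M_R, the sum of the measure attribute over all tuples.
func (r *Relation) Mass() float64 {
	m := 0.0
	for _, t := range r.Tuples {
		m += t.X
	}
	return m
}

// Block records, for each dimension n, the set of labels B_n kept in the block.
type Block struct {
	Labels []map[string]bool
}

// contains reports whether tuple t lies inside block b on every dimension.
func contains(b Block, t Tuple) bool {
	for n, v := range t.Attrs {
		if !b.Labels[n][v] {
			return false
		}
	}
	return true
}

// TopKDenseBlocks sketches the D-Cube outer loop: repeatedly pick a dense block
// with findSingleBlock (sketched below) and remove it from R.
func TopKDenseBlocks(r *Relation, k int) []Block {
	var out []Block
	for i := 0; i < k; i++ {
		b := findSingleBlock(r) // pick one dense block from the current R
		out = append(out, b)
		var kept []Tuple // R <- R - B: drop the tuples covered by B
		for _, t := range r.Tuples {
			if !contains(b, t) {
				kept = append(kept, t)
			}
		}
		r.Tuples = kept
	}
	return out
}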

Now the key part: how to find a block B within R whose density \rho is as large as possible. This is also done iteratively: start with B equal to all of R, then in each round pick a dimension i, compute the mass of every label of that dimension, and delete the labels that do not contribute enough to M_B. Once every dimension has been visited, a heavily shrunk block is returned. Roughly:

  • Pick a dimension i (for example, the one with the largest number of labels)
  • Go over all labels of dimension i and compute the mass of each label
  • Some labels necessarily have below-average mass; they drag the overall density down and can be removed (the actual implementation differs slightly in the details)
  • Continue until every dimension has been compressed down to a small set of high-density labels.
find_single_block: this routine selects one block from R that is as dense as possible (a sketch follows below).
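
A simplified Go sketch of this routine, continuing the illustrative types from the framework sketch above. Note that the actual algorithm keeps peeling labels until the block is empty and returns the densest intermediate snapshot; this sketch follows the simplified one-pass description given above:

// findSingleBlock sketches find_single_block in the simplified one-pass form
// described above: start from the whole relation, then for each dimension
// compute the mass of every label and drop the labels whose mass is below the
// dimension's average. (The real algorithm keeps removing labels until the
// block is empty and returns the densest intermediate snapshot.)
func findSingleBlock(r *Relation) Block {
	// start with B = R: every label of every dimension is in the block
	b := Block{Labels: make([]map[string]bool, r.N)}
	for n := 0; n < r.N; n++ {
		b.Labels[n] = map[string]bool{}
		for _, t := range r.Tuples {
			b.Labels[n][t.Attrs[n]] = true
		}
	}
	for n := 0; n < r.N; n++ { // visit each dimension once
		mass := map[string]float64{}
		total := 0.0
		for _, t := range r.Tuples {
			if contains(b, t) { // only tuples still inside the block count
				mass[t.Attrs[n]] += t.X
				total += t.X
			}
		}
		if len(b.Labels[n]) == 0 {
			continue
		}
		avg := total / float64(len(b.Labels[n]))
		for label := range b.Labels[n] {
			if mass[label] < avg { // below-average labels drag the density down
				delete(b.Labels[n], label)
			}
		}
	}
	return b
}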

Finally, there are several strategies for choosing the next dimension, but they essentially only determine the order of the greedy removal and do not change the structure of the method.

Efficiency optimizations

Note that all the data is stored in files, which means:

  • Every read of the data requires loading it from a file.
  • Every update requires reading the data from its file, writing the modified data to a temporary file, and finally moving the temporary file over the original one.

The paper describes two kinds of efficiency optimizations:

Merging disk-access steps: the total amount of disk I/O can be reduced by merging several passes over the data. The core idea is that when one step scans the data and a following step needs to write the same data, the most expensive part of the second pass, the full disk scan, can be skipped. This optimization reduces the amount of disk access to about 30% of the original.

Caching in memory: the most frequently accessed parts of the tensor are cached in memory, which gives roughly a 3x speedup.

Take lines 5 and 8 of the D-Cube procedure as an example (a sketch of the merged pass follows the list below).

Line 5 computes the sum of all measure values in R; line 8 takes the block B found in R and deletes all of B's labels from R.
  • While executing line 8:
    • Read the measure values from the file [disk_value_current]
    • Following the block B, set the values that belong to B to zero
    • While zeroing, write the values out to the file [disk_value_temp]
  • Piggyback line 5 of the next iteration:
    • While writing the data back, also accumulate the sum of the new values
  • Overwrite [disk_value_current] with [disk_value_temp] to complete the update
  • Additionally, the first num_of_buffer entries of the data are cached directly in memory.
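
A hedged Go sketch of that merged pass. The one-value-per-line file layout, the inBlock membership mask and the file-path parameters are assumptions made only for illustration; the real implementation works on the paper's own on-disk tuple format:

package dcube

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
)

// mergedDeleteAndSum sketches the merged "line 8 + next round's line 5" pass:
// stream the current value file once, zero the entries whose tuples fall inside
// block B (inBlock[i] == true), write the updated values to a temp file while
// summing them, then swap the temp file over the original.
func mergedDeleteAndSum(curPath, tmpPath string, inBlock []bool) (float64, error) {
	in, err := os.Open(curPath)
	if err != nil {
		return 0, err
	}
	defer in.Close()
	out, err := os.Create(tmpPath)
	if err != nil {
		return 0, err
	}
	defer out.Close()

	w := bufio.NewWriter(out)
	newMass := 0.0
	sc := bufio.NewScanner(in)
	for i := 0; sc.Scan(); i++ {
		v, err := strconv.ParseFloat(sc.Text(), 64)
		if err != nil {
			return 0, err
		}
		if i < len(inBlock) && inBlock[i] {
			v = 0 // line 8: this tuple belongs to B, so its value is deleted
		}
		newMass += v // line 5 of the next iteration: mass of the updated R
		fmt.Fprintln(w, v)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	if err := w.Flush(); err != nil {
		return 0, err
	}
	// overwrite [disk_value_current] with [disk_value_temp]
	return newMass, os.Rename(tmpPath, curPath)
}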

Complexity analysis

Skipping the detailed derivation, the worst-case time complexity can be shown to be O(kN^2|R|L), where

  • k is the number of dense blocks to find
  • N is the number of dimensions
  • |R| is the size of the dataset
  • L = \max_{n \in \{1,\dots,N\}}|R_n|

Beyond the worst case, in practice the running time is observed to grow linearly with k, N and |R|, and sublinearly with L.

Accuracy guarantee

It can be proven that the block returned by the algorithm has a density within a factor of N of the optimum, that is:

Let B^* be the block maximizing \rho_{ari}(B,R) and \tilde{B} the block returned by the algorithm; then \rho_{ari}(\tilde{B},R) \geq \frac{1}{N}\rho_{ari}(B^*,R). The proof is omitted here.

MapReduce implementation

The MapReduce framework fits the setting where the data is stored on a distributed file system. In practice the paper essentially plugs in the classic MapReduce pattern directly; the contribution is that it is used at all, rather than how it is used.

For example, to compute the mass of every label of R (a toy sketch follows the list):

  • [MAP] distribute the labels to be processed to all worker (client) nodes
  • [PROCESS] each worker computes the mass of each of its labels
  • [REDUCE] the results are aggregated back at the master node
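
A toy Go sketch of that step for a single dimension, with shards of tuples playing the mappers (simulated here by goroutines) and a final merge playing the reducer. It only illustrates the data flow and is not Hadoop MapReduce code; it reuses the Tuple type from the earlier sketch:

package dcube

import "sync"

// labelMass computes, map/reduce style, the mass of every label on dimension dim.
// Each shard of tuples plays a mapper emitting per-label masses; the final merge
// plays the reducer on the master side.
func labelMass(shards [][]Tuple, dim int) map[string]float64 {
	partial := make([]map[string]float64, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard []Tuple) { // "map": one worker per shard
			defer wg.Done()
			m := map[string]float64{}
			for _, t := range shard {
				m[t.Attrs[dim]] += t.X
			}
			partial[i] = m
		}(i, shard)
	}
	wg.Wait()
	total := map[string]float64{} // "reduce": merge the per-shard masses
	for _, m := range partial {
		for label, v := range m {
			total[label] += v
		}
	}
	return total
}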

References

[1] M-Zoom: Fast dense-block detection in tensors with quality guarantees.
[Original paper] Shin, K., Hooi, B., Kim, J., & Faloutsos, C. (2017, February). D-Cube: Dense-block detection in terabyte-scale tensors. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (pp. 681-689).

grpc-web “Http response at 400 or 500 level”

Cause: Go's gRPC library (other languages are probably similar) communicates over HTTP/2, while the grpc-web framework sends HTTP/1.1 requests. The protocol mismatch produces this error.

How to verify: the documentation of the grpc-web Hello World example project includes a quick walkthrough for setting up a grpc-web service. The resulting demo project has three parts:

  • client: a web server running on port 8081 that sends requests to port 8080
  • server: receives requests on port 9090
  • proxy: routes requests from port 8080 to port 9090

If we modify the project so that the client sends its requests directly to the server's listening port, the frontend shows "Http response at 400 or 500 level" and the backend reports an error like "grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"OPTIONS /protos.Orderer/\"".

Solution 1: follow the grpc-web documentation and use Envoy to translate the traffic.

Solution 2: use an unofficial library and open an additional port on the Go side dedicated to talking to grpc-web.
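
As one possible shape of solution 2, here is a minimal Go sketch assuming the widely used (but unofficial) improbable-eng/grpc-web wrapper; the post does not name a specific library, so treat the import path and API calls below as an assumption:

package main

import (
	"net/http"

	"github.com/improbable-eng/grpc-web/go/grpcweb"
	"google.golang.org/grpc"
)

func main() {
	grpcServer := grpc.NewServer()
	// pb.RegisterOrdererServer(grpcServer, &ordererServer{}) // register your service as usual (hypothetical names)

	// Wrap the gRPC server so it can also answer grpc-web (HTTP/1.1) requests.
	wrapped := grpcweb.WrapServer(grpcServer)

	// Serve grpc-web on an extra port (e.g. 8080) while the plain gRPC server
	// keeps listening on 9090 as before.
	http.ListenAndServe(":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if wrapped.IsGrpcWebRequest(r) || wrapped.IsAcceptableGrpcCorsRequest(r) {
			wrapped.ServeHTTP(w, r)
			return
		}
		http.NotFound(w, r)
	}))
}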