【论文翻译】HotStuff: BFT Consensus in the Lens of Blockchain – Evaluation

8. 模型评估

我们已经用大约4K行C++代码将HotStuff实现为一个库。最值得注意的是,伪代码中指定的核心共识逻辑仅占用了大约200行。在本节中,我们将首先通过与最先进的系统BFT-SMaRt [13]进行比较来检验基线吞吐量和延迟,然后重点考察视图更改的消息成本,以了解HotStuff在这种场景下的优势。

We have implemented HotStuff as a library in roughly 4K lines of C++ code. Most notably, the core consensus logic specified in the pseudocode consumes only around 200 lines. In this section, we will first examine baseline throughput and latency by comparing to a state-of-the-art system, BFT-SMaRt [13]. We then focus on the message cost for view changes to see our advantages in this scenario.

8.1 设置

我们使用c5.4xlarge实例在Amazon EC2上进行了实验。每个实例具有16个vCPU,由Intel Xeon Platinum 8000系列处理器提供支持,所有核心的Turbo睿频时钟频率最高可达3.4GHz。我们将每个副本运行在单独的VM实例上,因此大量使用线程的BFT-SMaRt与其原始评估[13]一样,每个副本可使用16个核心。iperf测得的最大TCP带宽约为每秒1.2 GB。我们没有在任何测试中限制带宽。两台计算机之间的网络延迟小于1毫秒。

在我们的HotStuff原型实现中,投票和仲裁证书(QC)中的所有数字签名均使用secp256k1算法。BFT-SMaRt在正常操作的消息中使用hmac-sha1生成消息验证码(MAC),并在视图更改期间额外使用数字签名。

HotStuff的所有结果均为从客户端进行的端到端测量。对于BFT-SMaRt,我们使用了BFT-SMaRt网站(https://github.com/bft-smart/library)的微基准程序ThroughputLatencyServer和ThroughputLatencyClient。客户端程序测量端到端延迟但不测量吞吐量,服务器端程序同时测量吞吐量和延迟。最终我们采用服务器端的吞吐量结果和客户端的延迟结果。

We conducted our experiments on Amazon EC2 using c5.4xlarge instances. Each instance had 16 vCPUs supported by Intel Xeon Platinum 8000 processors. All cores sustained a Turbo CPU clock speed up to 3.4GHz. We ran each replica on a single VM instance, and so BFT-SMaRt, which makes heavy use of threads, was allowed to utilize 16 cores per replica, as in their original evaluation [13]. The maximum TCP bandwidth measured by iperf was around 1.2 Gigabytes per second. We did not throttle the bandwidth in any run. The network latency between two machines was less than 1 ms.

Our prototype implementation of HotStuff uses secp256k1 for all digital signatures in both votes and quorum certificates. BFT-SMaRt uses hmac-sha1 for MACs (Message Authentication Codes) in the messages during normal operation and uses digital signatures in addition to MACs during a view change.

All results for HotStuff reflect end-to-end measurement from the clients. For BFT-SMaRt, we used the microbenchmark programs ThroughputLatencyServer and ThroughputLatencyClient from the BFT-SMaRt website (https://github.com/bft-smart/library). The client program measures end-to-end latency but not throughput, while the server-side program measures both throughput and latency. We used the throughput results from servers and the latency results from clients.

8.2 基准

我们首先在评估其他BFT复制系统时常用的设置下测量吞吐量和延迟。我们以允许单个故障(即f = 1)的配置运行了4个副本,同时提高操作请求速率,直到系统饱和。该基准测试使用空的(零大小)操作请求和响应,并且未触发任何视图更改。我们将在下文扩展到其他设置。尽管响应式的HotStuff是三阶段的,但由于BFT-SMaRt基线只有两个阶段,我们也将HotStuff的两阶段变体作为额外的基线运行。

图4描绘了两个系统在三种批处理大小(batch-size)下的表现,batch-size分别取100、400和800。由于这两个系统的批处理机制不同,这些数字对每个系统的意义略有差别。BFT-SMaRt为每个操作驱动一个单独的共识决策,并将来自多个共识协议实例的消息进行批处理,因此它具有典型的L形延迟/吞吐量性能曲线。HotStuff则在每个节点中将多个操作打包为一批,从而摊薄了每个决策的数字签名成本。但是,当每批超过400个操作时,批处理引起的延迟会变得高于密码学计算的成本。尽管存在这些差异,三阶段("HS3-")和两阶段("HS2-")HotStuff在所有三种批处理大小下均实现了与BFT-SMaRt("BS-")相当的延迟,同时最大吞吐量显著胜过BFT-SMaRt。

图4 不同batch-size下的吞吐量与延迟对比,4副本,0/0载荷

Figure 4 depicts three batch sizes for both systems, 100, 400, and 800, though because these systems have different batching schemes, these numbers mean slightly different things for each system. BFT-SMaRt drives a separate consensus decision for each operation, and batches the messages from multiple consensus protocols. Therefore, it has a typical L-shaped latency/throughput performance curve. HotStuff batches multiple operations in each node, and in this way mitigates the cost of digital signatures per decision. However, above 400 operations per batch, the latency incurred by batching becomes higher than the cost of the crypto. Despite these differences, both three-phase ("HS3-") and two-phase ("HS2-") HotStuff achieve latency comparable to BFT-SMaRt ("BS-") for all three batch sizes, while their maximum throughput noticeably outperforms BFT-SMaRt.

对于batch-size为100和400的情况,HotStuff最低延迟点所提供的延迟和吞吐量,优于BFT-SMaRt在其最高吞吐量下同时可实现的延迟和吞吐量,但会带来小幅的延迟增加。这种增加部分源于HotStuff采用的批处理策略:完成对一个批次的决策需要三个额外的完整批次(两阶段变体则需要两个)。我们的实验保持了较多的未完成请求,但批处理量越大,填满批处理流水线所需的时间就越长。实际部署中可以进一步优化,使批处理大小自适应于未完成操作的数量。

For batch sizes of 100 and 400, the lowest-latency HotStuff point provides latency and throughput that are better than the latency and throughput simultaneously achievable by BFT-SMaRt at its highest throughput, while incurring a small increase in latency. This increase is partly due to the batching strategy employed by HotStuff: it needs three additional full batches (two in the two-phase variant) to arrive at a decision on a batch. Our experiments kept the number of outstanding requests high, but the higher the batch size, the longer it takes to fill the batching pipeline. Practical deployments could be further optimized to adapt the batch size to the number of outstanding operations.

图5描绘了0/0、128/128和1024/1024三种客户端请求/应答有效载荷大小(以字节为单位),分别表示为"p0"、"p128"和"p1024"。在所有有效载荷大小下,三阶段和两阶段HotStuff的吞吐量均优于BFT-SMaRt,且延迟相近或相当。

图5 不同载荷大小下的吞吐量与延迟对比,4副本,batch-size 400

Figure 5 depicts three client request/reply payload sizes (in bytes) of 0/0, 128/128, and 1024/1024, denoted "p0", "p128", and "p1024" respectively. At all payload sizes, both three-phase and two-phase HotStuff outperformed BFT-SMaRt in throughput, with similar or comparable latency.

注意,BFT-SMaRt使用基于对称密码学的MAC,其速度比HotStuff所用数字签名中的非对称密码学快几个数量级;并且与BFT-SMaRt使用的两阶段PBFT变体相比,三阶段HotStuff有更多的往返。尽管如此,HotStuff仍然能够实现相当的延迟和高得多的吞吐量。下面我们将在更具挑战性的场景中评估这两个系统,在这些场景中HotStuff的性能优势将更加明显。

Notice that BFT-SMaRt uses MACs based on symmetric crypto, which is orders of magnitude faster than the asymmetric crypto in the digital signatures used by HotStuff, and that three-phase HotStuff has more round trips than the two-phase PBFT variant used by BFT-SMaRt. Yet HotStuff is still able to achieve comparable latency and much higher throughput. Below we evaluate both systems in more challenging situations, where the performance advantages of HotStuff become more pronounced.

8.3 扩展性

为了评估HotStuff在各个维度上的可扩展性,我们进行了三个实验。作为基线,我们使用零大小的请求/响应有效载荷,同时改变副本数量。第二个评估以128字节和1024字节的请求/响应有效载荷重复了基线实验。第三个测试以空载荷重复了基线,同时在副本之间引入均匀分布于5ms±0.5ms或10ms±1.0ms的网络延迟(使用NetEm实现,参见https://www.linux.org/docs/man8/tc-netem.html)。对于每个数据点,我们以相同设置重复运行五次,并用误差条表示所有运行的标准差。

To evaluate the scalability of HotStuff in various dimensions, we performed three experiments. For the baseline, we used zero-size request/response payloads while varying the number of replicas. The second evaluation repeated the baseline experiment with 128-byte and 1024-byte request/response payloads. The third test repeated the baseline (with empty payloads) while introducing network delays between replicas that were uniformly distributed in 5ms ± 0.5ms or in 10ms ± 1.0ms, implemented using NetEm (see https://www.linux.org/docs/man8/tc-netem.html). For each data point, we repeated five runs with the same setting and show error bars to indicate the standard deviation for all runs.

第一个设置如图6a(吞吐量)和图6b(延迟)所示。三阶段和两阶段HotStuff均持续表现出比BFT-SMaRt更好的吞吐量,而它们的延迟仍可与BFT-SMaRt相提并论,且随规模增大而平缓退化。当n < 32时,其性能扩展优于BFT-SMaRt。这是因为我们目前仍将secp256k1签名列表用作QC。将来,我们计划通过使用快速门限签名方案来减少HotStuff中的密码学计算开销。

图6 0/0载荷、batch-size 400下的可扩展性

The first setting is depicted in Figure 6a (throughput) and Figure 6b (latency). Both three-phase and two-phase HotStuff show consistently better throughput than BFT-SMaRt, while their latencies are still comparable to BFT-SMaRt, with graceful degradation. The performance scales better than BFT-SMaRt when n < 32. This is because we currently still use a list of secp256k1 signatures for a QC. In the future, we plan to reduce the cryptographic computation overhead in HotStuff by using a fast threshold signature scheme.

有效载荷大小为128或1024字节的第二个设置在图7a(吞吐量)和图7b(延迟)中以"p128"或"p1024"表示。由于其二次的带宽成本,在相当大的有效载荷(1024字节)下,BFT-SMaRt的吞吐量扩展性比HotStuff差。

图7 128/128或1024/1024载荷、batch-size 400下的可扩展性

The second setting with payload size 128 or 1024 bytes is denoted by “p128” or “p1024” in Figure 7a (throughput) and Figure 7b (latency). Due to its quadratic bandwidth cost, the throughput of BFT-SMaRt scales worse than HotStuff for reasonably large (1024-byte) payload size.

第三个设置在图8a(吞吐量)和图8b(延迟)中以"5ms"或"10ms"表示。同样,由于BFT-SMaRt使用了更多的通信,HotStuff在这两种情况下均持续优于BFT-SMaRt。

图8 0/0载荷、副本间传输延时5ms或10ms、batch-size 400下的可扩展性

The third setting is shown in Figure 8a (throughput) and Figure 8b (latency) as “5ms” or “10ms”. Again, due to the larger use of communication in BFT-SMaRt, HotStuff consistently outperformed BFT-SMaRt in both cases.

8.4 视图更改

为了评估领导者更换的通信复杂度,我们统计了在BFT-SMaRt的视图更改协议中执行的MAC或签名验证的数量。我们的评估策略如下:每隔一千个决策,我们就向BFT-SMaRt注入一次视图更改。我们对BFT-SMaRt源代码进行了插桩,以统计在视图更改协议中接收和处理消息时的验证次数。除通信复杂度之外,这一测量还凸显了与传输这些经过认证的值相关的密码学计算负担。

To evaluate the communication complexity of leader replacement, we counted the number of MAC or signature verifications performed within BFT-SMaRt’s view-change protocol. Our evaluation strategy was as follows. We injected a view change into BFT-SMaRt every one thousand decisions. We instrumented the BFT-SMaRt source code to count the number of verifications upon receiving and processing messages within the view-change protocol. Beyond communication complexity, this measurement underscores the cryptographic computation load associated with transferring these authenticated values.

图9a和图9b显示了每次视图更改所处理的额外身份验证器(分别为MAC和签名)的数量,其中"额外"定义为若领导者保持稳定则不会发送的那些身份验证器。请注意,按此定义,HotStuff没有"额外"的身份验证器,因为无论领导者是否更换,身份验证器的数量都保持不变。这两幅图表明,BFT-SMaRt使用的MAC数量为立方级,签名数量为二次级。HotStuff更改视图不需要额外的身份验证器,因此在图中省略。

图9 视图更改时处理的额外身份验证器数量

Figure 9a and Figure 9b show the number of extra authenticators (MACs and signatures, respectively) processed for each view change, where “extra” is defined to be those authenticators that would not be sent if the leader remained stable. Note that HotStuff has no “extra” authenticators by this definition, since the number of authenticators remains the same regardless of whether the leader stays the same or not. The two figures show that BFT-SMaRt uses cubic numbers of MACs and quadratic numbers of signatures. HotStuff does not require extra authenticators for view changes and so is omitted from the graph.

Evaluating the real-time performance of leader replacement is tricky. First, BFT-SMaRt got stuck when triggering frequent view changes; our authenticator-counting benchmark had to average over as many successful view changes as possible before the system got stuck, repeating the experiment many times. Second, the actual elapsed time for leader replacement depends highly on timeout parameters and the leader-election mechanism. It is therefore impossible to provide a meaningful comparison.
