BlockPerf: A Hybrid Blockchain Emulator/Simulator Framework

Blockchain is increasingly used for registering, authenticating, and validating digital assets (financial assets, real estate, etc.) and transactions, governing interactions, recording data, and managing identification among multiple parties in a trusted, decentralized, and secure manner. Today, a wide variety of blockchain technologies is emerging to fulfill technical and non-technical needs and requirements. In this context, determining and, most importantly, evaluating the characteristics/performance of a given blockchain platform is crucial for system designers before deploying it. A number of blockchain simulators have been proposed in the literature over the past few years, as reviewed in this paper, but they are often limited in several respects (lack of extensibility, inability to evaluate all aspects of a blockchain, etc.). This paper extends and improves a state-of-the-art simulator (BlockSim) into a new simulator called ''BlockPerf'' to overcome those limitations. Both simulators are compared based on a real-life (benchmarking) Bitcoin scenario, whose results show that BlockPerf provides more realistic results than BlockSim, improving the outcomes by approximately 50% on average.

Hyperledger Caliper. Such approaches/tools incur high deployment costs and lack scalability (e.g., for carrying out large-scale experiments) and modularity. On the other hand, simulators can help deploy and test blockchain technologies in large-scale infrastructure settings. To date, several blockchain simulators exist (e.g., BlockSim [16], PeerSim [17], Shadow [18], Vibes [19], etc.), but they are often limited in several respects. Recent literature reviews of existing blockchain simulation tools [20], [21] point out that those tools often limit themselves to evaluating part of the blockchain system (i.e., they fail to cover all layers and associated performance metrics (a) to (o) emphasized in FIGURE 1). This is particularly stressed by Paulavicius et al. [20] in their systematic review and empirical survey of blockchain simulators, in which the authors conclude that ''there is no 'one-size-fits-all' Proof-of-Work blockchain simulator that is able to accurately simulate all the layers''. To overcome such limitations, a new hybrid blockchain emulator/simulator called BlockPerf is proposed, which extends the BlockSim simulator proposed by Faria and Correia [16]. One of the main improvements lies in the fact that BlockPerf relies on a real network infrastructure (at the Network layer) while simulating the upper layers, leading to more realistic results than existing simulators.
State-of-the-art simulators, and the extent to which they cover the six-layer model and associated performance metrics, are reviewed and discussed in section II. The architectural design of BlockPerf, and how it extends BlockSim, is presented in section III. In section IV, a performance comparison between BlockPerf and BlockSim is carried out based on a real-life (benchmarking) Bitcoin scenario; the conclusions and limitations of BlockPerf are discussed in section V.

II. BACKGROUND AND RELATED WORK
As previously discussed, evaluating a blockchain technology or platform turns out to be a complex process due to the various inter-dependent layers and associated parameters. To better explain this complexity, a slightly adapted version of the model introduced in [15] is considered in this paper, which corresponds to the six-layer model given in FIGURE 1. Although one may find other blockchain abstraction models in the literature, such as the ones proposed by the ITU and ISO standardization bodies [22], [23], we adopted this six-layer model as it covers most aspects of a blockchain (DLT) platform and is straightforward to understand. Nonetheless, to allow readers to understand to what extent this model covers the ongoing ITU and ISO blockchain standard initiatives, and vice versa, we provide a summary in TABLE 1. Considering this model, sections II-A to II-F provide the necessary background regarding each of these layers, from top (Application) to bottom (Network), by detailing the key metrics, (a) to (o), that a simulator should allow for measuring/tracking throughout a simulation run. Section II-G then discusses the extent to which existing blockchain simulators cover those layers and metrics.

A. APPLICATION LAYER
This layer manages the user interface, APIs (Application Programming Interfaces), and computational resources (e.g., needed for blockchain element storage, wallet creation, etc.). In many blockchains, a node can be configured to run as a full node (storing a local copy of all transactions and blocks) or as a lightweight node (in charge of creating transactions and sending them to full nodes for validation purposes). From a simulation (or emulation) perspective, the following metrics should be measurable at this layer: (a) Execution time: it refers to whether the simulator keeps track of the time needed to run the simulation; (b) Computational resource usage: it refers to whether the simulator keeps track of the resource usage evolution throughout the (simulation) run, which includes the CPU of each node, swap space, storage capacity, etc.
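As a minimal sketch of how metrics (a) and (b) could be collected around a simulation run, the following Python snippet (Python being the language of BlockSim, which BlockPerf extends) times a callable and records its peak memory usage; the function name and the toy workload are illustrative, not part of either simulator:

```python
import time
import tracemalloc

def run_with_metrics(simulation_fn):
    """Run a simulation callable and report metrics (a) and (b):
    wall-clock execution time and peak traced memory usage."""
    tracemalloc.start()
    start = time.perf_counter()
    result = simulation_fn()
    elapsed = time.perf_counter() - start      # metric (a): execution time (s)
    _, peak = tracemalloc.get_traced_memory()  # metric (b): peak memory (bytes)
    tracemalloc.stop()
    return result, elapsed, peak

# Toy run: a stand-in for one simulation step
result, elapsed, peak = run_with_metrics(lambda: sum(range(100_000)))
```

A real harness would sample CPU, swap, and storage per node throughout the run rather than only at the end.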

B. CONTRACT LAYER
Blockchain systems include scripts, also referred to as ''smart contracts'', that run when predetermined conditions are met (e.g., to automate the execution of an agreement so that all participants can be immediately certain of the outcome). At this layer, the following metrics should be measurable: (c) Contract creation time: it refers to whether the simulator keeps track of the time needed to generate the different contracts (which can be of different sizes) over the simulation; (d) Contract validation & execution time: it refers to whether the simulator keeps track of the time needed to validate and execute the different contracts of the simulation scenario.

C. INCENTIVE LAYER
Participation incentives refer to the way honest behavior is encouraged and dishonest behavior discouraged. Incentives can take the form of transaction fees and/or rewards [1]. This design decision affects the implemented consensus algorithm (introduced in the next model layer) and is, in turn, affected by the selected algorithm. At this layer, the following metrics should be measurable: (e) Reward evolution: it refers to whether the simulator keeps track of the amount of cryptocurrency distributed through the consensus process (e.g., leader election or mining) over the simulation; (f) Fee evolution: it refers to whether the simulator keeps track of the fees that client nodes offer to miners to incentivize them to process their transaction(s); (g) Currency evolution: it refers to whether the simulator keeps track of the currency generation rate, which evolves differently according to the implemented blockchain (e.g., in Bitcoin or Ethereum, it evolves along with the hashing difficulty).

D. CONSENSUS LAYER
Consensus protocols are needed to validate the data and to prevent and remove any duplicated entries and/or fraud [24], [25]. The type of blockchain to be implemented (public, private, consortium) profoundly influences the type of consensus protocol to be used, the most well-known being Proof-of-Work (PoW), Proof-of-Stake (PoS), and Practical Byzantine Fault Tolerance (PBFT). At this layer, the following metrics should be measurable: (h) Pending transactions: it refers to whether the simulator keeps track of the number of transactions, over time, that are waiting to be confirmed (such a waiting area is called the ''Mempool'' in Bitcoin, and is sometimes referred to as a transaction queue); (i) Fork resolution: it refers to whether the simulator keeps track of (i1) the number of forks that appear within the chain and (i2) the stale rate (i.e., discarded blocks) throughout the simulation; (j) Consensus computation: it refers to whether the simulator keeps track of the collective (or individual) computation effort required to validate transactions and blocks.

E. NODE/DATA LAYER
The node (or data) layer is responsible for structuring the data before appending it to blocks, whose structure usually includes information such as the previous block hash, Merkle root, time, bits, etc. At this layer, the following metrics should be measurable: (k) Transaction evolution: it refers to whether the simulator keeps track of the number of transactions generated per day (k1) and whether the simulated transactions match the real-life data structure (k2). Unlike k1, k2 is not a quantifiable metric but rather a boolean one that states whether the simulator generates transactions following the real-world blockchain specification; (l) Block evolution: it refers to whether the simulator keeps track of (l1) the number of blocks that are validated, mined, and accepted as part of the longest chain; (l2) the (average) time taken by each block to be validated; (l3) the block sizes, which depend on the size of the transactions they include; and finally (l4) the number of transactions included, on average, within a block; (m) Chain evolution: it refers to whether the simulator keeps track of the length of the chain over time (i.e., the number of blocks that form the longest chain), which is a good indicator of the load a new node would have to process if it joined the network at a given point in time.

F. NETWORK LAYER
Blockchain is a pure P2P network, which is actually an overlay network [41] for distributed object storing, searching, and sharing (e.g., Ethereum relies on the Kademlia P2P protocol [41], [42]). At this layer, the following metrics should be measurable: (n) Network graph evolution: it refers to whether the simulation accurately follows the P2P protocol overlay network specifications (e.g., support for adding and discovering a node at any time) and keeps track of the network (node) evolution using network graph metrics such as the Clustering Coefficient, Mean Geodesic Distance, and Diameter [43], [44]; (o) Throughput: it refers to whether the simulator keeps track of the number of valid transactions per second (Tx/s) that have been incorporated as part of a valid block within the longest chain. The transactions considered here are the ones that reach the majority of nodes within the network, depending on the underlying consensus (i.e., ≥ 50% for PoW, ≥ 66% for PBFT, etc.).
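The throughput metric (o) above reduces to a simple computation once the longest chain is known; a minimal sketch (block representation and field names are illustrative):

```python
def throughput(longest_chain, duration_s):
    """Metric (o): confirmed transactions per second, counting only
    transactions enclosed in blocks of the longest (valid) chain."""
    confirmed = sum(len(block["txs"]) for block in longest_chain)
    return confirmed / duration_s

# 3 confirmed transactions over 2 seconds -> 1.5 Tx/s
chain = [{"txs": ["tx1", "tx2"]}, {"txs": ["tx3"]}]
rate = throughput(chain, duration_s=2.0)
```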

G. RELATED WORK AND DISCUSSION
As of today, several blockchain simulators can be found in the literature, a summary of which is given in TABLE 2. This table highlights the layers and associated metrics those simulators cover. Note that a simulator may, in some cases, cover a given layer without necessarily providing a performance metric as output. This means that either a modification of the source code or post-processing treatments are required in order to compute the desired metric. For example, in BlockSim, the authors claim in their paper [31] that the simulator keeps track of the fee evolution; however, after analyzing it, we realized that it is not possible to retrieve it without modifying part of the code. Another example is HIVE [39], which emulates the behavior of smart contracts within the Ethereum environment; however, we found that it does not allow for analyzing the (h) metric (i.e., pending transactions). To distinguish simulators that make a performance metric available without requiring any code modification or post-processing from those that require such a modification/post-processing, the following two symbols are respectively used: • (does not require any code modification and/or post-processing) and ◦ (does require a code modification and/or post-processing). As a first simulator, let us mention Bitcoin-Simulator [33], which has been designed for educational purposes to help students understand how the block generation rate (l1) and block size (l3) evolve over time. Although it is a well-designed pedagogical tool, it quickly becomes limited when carrying out in-depth simulation analyses. More research-oriented simulators have been proposed, such as Ethereum Hive [39], which was proposed for emulating and evaluating Ethereum's smart contracts from a validation & execution time perspective (d). However, as revealed in [30], simulating a large number of nodes becomes difficult with limited computational resources.
To overcome this limitation, a series of simulators, including VIBES [19], eVIBES [30], eVIBES Plasma [29], and CIDDS [38], were proposed, but they unfortunately fall short of fulfilling this promise, as reported by Lathif et al. [38]. One of the main reasons is that those simulators do not adequately model the transactions at the Node/Data layer (i.e., k2), most of them considering transactions as empty. Such a simplification poses several issues when evaluating the blockchain system as a whole, knowing that transaction and block sizes may have non-negligible impacts on the overall system performance [45]. To overcome this issue, a new range of simulators, including BlockSim [16], [31], SIMBA [34], DLSF, and others (cf. TABLE 2), have considered the transaction/block structure, algorithms for wallet creation, message signing, and so forth, thus leading to more realistic performance evaluation results. Nonetheless, their network modeling is often too simplistic, not reflecting the real behavior of the P2P overlay network (i.e., possible communication delays, network congestion, packet losses, computational resource limitations, etc.), which, in our opinion, can have a significant impact on the overall system performance evaluation process. A few simulators have been designed to consider the throughput and the network graph evolution, such as CIDDS and PeerSim. However, these two simulators neglect many of the upper layers and metrics, as highlighted in TABLE 2.
The drawbacks discussed in this section show that designing a modular simulator that allows for testing/evaluating different blockchains, with different consensus protocols, different contract and transaction/block specifications, different incentive schemes, along with real network infrastructures, turns out to be a challenging task. FIGURE 2 provides an overview of the extent to which the reported blockchain simulators cover the different layers of the six-layer model previously introduced. It can first be observed that they often neglect the Contract and Incentive layers. Second, neglecting key aspects of a blockchain system at the Network and Data/Node layers, namely the real P2P overlay network protocol behavior (n) and the transaction/block structure (k2), leads to performance evaluation results that deviate from reality. A new hybrid emulator/simulator tool called ''BlockPerf'' is proposed in this paper to overcome these limitations. BlockPerf is hybrid in the sense that it emulates the network layer, so as to correctly address it, while simulating the upper layers based on statistical data modeling approaches, as presented in the next section.

III. BLOCKPERF SIMULATOR DESIGN
This section describes how BlockPerf extends BlockSim; the main extensions are summarized in FIGURE 3. These layer extensions and/or adaptations are discussed in sections III-A to III-E. Note that, at this stage, BlockPerf does not address/model the ''Smart Contract'' layer, which is left for future research work.

A. APPLICATION LAYER
At this layer, BlockPerf consists of a configuration module responsible for instantiating the run's parameters, similar to the one used in BlockSim. BlockSim takes as input a list of parameters, including the size of blocks, statistical models for transaction validation, block validation, and node labels. Similarly, BlockPerf takes as input a JSON file that extends BlockSim's parameters to include fork occurrences, the block size parameter, and the IP address of each node, including the ''main'' one (corresponding to the BitcoinNode class, as summarized in TABLE 3) that tracks all the results using the CliStats, CpuTimeSnapshoot and MemorySnapshot classes. Optionally, the input file also specifies the location where all the logs and data for each node are stored. Note that the BitcoinNode is responsible for tracking the execution time (a) of the overall run and collecting the metrics regarding the overall computational resource usage (b) of the nodes.
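The structure of such an input file might look as follows; this is only an illustrative sketch, as the key names (e.g., block_size_max_bytes, fork_rate_model, log_dir) are hypothetical and do not reflect BlockPerf's actual schema:

```json
{
  "block_size_max_bytes": 1000000,
  "fork_rate_model": {"distribution": "weibull", "shape": 1.2, "scale": 0.8},
  "nodes": [
    {"ip": "10.0.0.1", "role": "main", "type": "full"},
    {"ip": "10.0.0.2", "role": "peer", "type": "light"}
  ],
  "log_dir": "/var/log/blockperf"
}
```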

B. INCENTIVE LAYER
This layer is responsible for instantiating the models for rewarding participants, which vary from one blockchain to another. While BlockSim does not simulate the incentive layer, BlockPerf models two types of rewards: one for the generation of valid blocks (known as the block reward) and one for the inclusion of a particular transaction in a block (known as the transaction fee). These reward operations are implemented via the TxChain, BTCNode, and TickEvent classes (cf. TABLE 3). As of today, the block reward is represented as a fixed amount of cryptocurrency that can be configured for the run, while the transaction fee depends on the size of the transaction, as formalized in Eq. 1, where Tx_i refers to the i-th transaction and f(size(Tx_i)) can be configured using different distribution laws (e.g., defined as a fixed fee, or following a Weibull, Log-normal, or Gamma distribution). All node wallets are continuously updated throughout the simulation run.
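A minimal sketch of such a configurable fee model f(size(Tx_i)) in Python, using the standard library's distribution draws; the function name and parameter names are illustrative, not BlockPerf's actual API:

```python
import random

def transaction_fee(tx_size_bytes, model="fixed", **params):
    """Compute the fee f(size(Tx_i)) for one transaction, with the fee law
    chosen among the configurable distributions (fixed, Weibull,
    Log-normal, Gamma), scaled here by the transaction size."""
    if model == "fixed":
        return params.get("fee", 0.0001)
    if model == "weibull":
        return tx_size_bytes * random.weibullvariate(params["scale"], params["shape"])
    if model == "lognormal":
        return tx_size_bytes * random.lognormvariate(params["mu"], params["sigma"])
    if model == "gamma":
        return tx_size_bytes * random.gammavariate(params["alpha"], params["beta"])
    raise ValueError(f"unknown fee model: {model}")

fee = transaction_fee(250, model="fixed", fee=0.0001)  # fixed fee, size-independent
```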
The amount of new cryptocurrency generated over time depends on the consensus layer and the overall computational resources of the nodes in the network. BlockPerf continuously monitors such resources and allocates the newly generated cryptocurrency to the right (mining) nodes.

C. CONSENSUS LAYER
In BlockPerf, the consensus protocol is implemented as a model that represents the consensus process (see TABLE 3), which is in charge of selecting the miner/validator that builds the next block. The method of selecting this node varies from one blockchain to another; in BlockPerf, as a first step, the Proof-of-Work (PoW) algorithm has been implemented. As opposed to BlockSim, which simulates the behavior of the consensus algorithm (i.e., block validation and a random selection of the miner), BlockPerf uses an extended approach where each mining node, similarly to the PoW algorithm, selects a random number (nonce), takes as input all the non-confirmed transactions from its queue within the block size limit (further discussed in Section III-D), as well as the reference of the previous known block, which allows for being closer to a real PoW process. This information is then combined and hashed recursively until a valid result is obtained. The mining node selects the transactions from the queue (sometimes called the mempool), checks whether they meet the balance requirements, and verifies that the sender signatures match and that the sender's wallet holds a sufficient amount of cryptocurrency. In BlockPerf, the BitcoinNode class keeps track of the fork occurrences within the network.
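The PoW-like process described above can be sketched as follows; this is a simplified illustration (SHA-256 over a string payload, a leading-zeros difficulty target, and the block/transaction dictionaries are assumptions for the example, not BlockPerf's actual implementation):

```python
import hashlib
import random

def mine_block(prev_block_hash, pending_txs, max_block_size, difficulty_prefix="0000"):
    """Simplified PoW round: fill a block with pending transactions up to
    the size limit, then combine them with the previous block reference
    and a random nonce, hashing until the difficulty target is met."""
    selected, size = [], 0
    for tx in pending_txs:                       # take transactions from the queue
        if size + tx["size"] > max_block_size:
            break
        selected.append(tx)
        size += tx["size"]
    while True:
        nonce = random.getrandbits(64)
        payload = f"{prev_block_hash}{[t['id'] for t in selected]}{nonce}".encode()
        digest = hashlib.sha256(payload).hexdigest()
        if digest.startswith(difficulty_prefix):  # "difficulty" = leading zeros
            return {"prev": prev_block_hash, "txs": selected,
                    "nonce": nonce, "hash": digest}

# Very low difficulty ("0") so the toy example terminates quickly
block = mine_block("abc123", [{"id": "tx1", "size": 300}], 1000, difficulty_prefix="0")
```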

D. NODE/DATA LAYER
Any node wishing to generate a transaction has to follow a set of operations, namely: (i) select a wallet address (from wallet_list) as the recipient; (ii) select a random value for the transaction; (iii) generate the transaction by signing it with its private wallet key (those operations being handled via the TickEvent, Transaction Queue, and TxChain classes, as reported in TABLE 3). The set of transactions to be created per unit of time t, denoted by X(t) = {Tx_1, . . . , Tx_k}, k ∈ N, can be configured using different distribution laws (e.g., a fixed number of transactions per unit of time, or following a Weibull-like distribution). Every new transaction created by a node calls the propagation function from the network layer to transmit this new transaction to its neighbors, who then append it to their pool. Upon reception of a transaction (Tx_i), several operations are undertaken by the recipient node, as summarized in the flow chart of FIGURE 4 (operations added compared with BlockSim being highlighted in green). Unlike BlockSim, BlockPerf models transactions as they exist in the real system (i.e., each transaction can be traced back and follows a similar process within the system).
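The three-step generation of X(t) can be sketched as follows; the hash-based "signature" stands in for real private-key signing, and all names here are illustrative rather than BlockPerf's actual classes:

```python
import hashlib
import random

def generate_transactions(tick, wallet_list, k_model=lambda: 3):
    """Build the per-tick transaction set X(t) = {Tx_1, ..., Tx_k}.
    k is drawn from a configurable law (fixed here; a Weibull-like draw
    could be, e.g., lambda: round(random.weibullvariate(5, 1.5)))."""
    txs = []
    for i in range(k_model()):
        recipient = random.choice(wallet_list)   # step (i): pick a recipient wallet
        value = random.uniform(0.001, 1.0)       # step (ii): random transaction value
        # step (iii): stand-in "signature" (a real node signs with its private key)
        sig = hashlib.sha256(f"{tick}:{i}:{recipient}:{value}".encode()).hexdigest()
        txs.append({"to": recipient, "value": value, "sig": sig})
    return txs

batch = generate_transactions(tick=0, wallet_list=["w1", "w2"])  # fixed k = 3
```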
As a second stage, the simulator has to handle the inclusion and/or removal of transactions, from its transaction pool, into blocks. Although the consensus layer governs the operations for deciding which block is to be accepted by everyone within the network, the Data layer is concerned with the reception of the block, being in charge of removing all the transactions enclosed in the last received block from its transaction pool, and finally attaching the new block to its last known state of the chain (handled by the Node class). These operations follow the sequence given in the flow chart of FIGURE 5 (new operations compared with BlockSim being highlighted in green, while BlockSim's operations that have been modified are highlighted in orange).
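The block-reception handling just described amounts to two operations, sketched below (block and pool representations are illustrative):

```python
def on_block_received(block, tx_pool, local_chain):
    """Data-layer handling of a received block: drop the transactions the
    block already encloses from the local pool, then attach the block to
    the last known state of the chain."""
    enclosed = {tx["id"] for tx in block["txs"]}
    remaining = [tx for tx in tx_pool if tx["id"] not in enclosed]
    local_chain.append(block)
    return remaining

pool = [{"id": "tx1"}, {"id": "tx2"}]
chain = []
pool = on_block_received({"txs": [{"id": "tx1"}]}, pool, chain)
# pool now holds only tx2; the chain holds the new block
```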
The linked blocks form a chain, but nodes may have a different state of the chain (or ledger) due to network communication delays (such divergences are termed forks). As opposed to many existing simulators, including BlockSim, each node in BlockPerf maintains its own local ledger, which means that if a node discovers that the state of the chain differs from its own, it sends a request to neighboring nodes to get the full state of the chain. This allows being closer to reality, which should result in more realistic simulation performance. This improvement, compared with BlockSim, is highlighted in the flow chart given in FIGURE 6 (cf. green boxes/steps).
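The per-node reconciliation step can be sketched as a longest-chain rule; the fetch_chain_from callable is a hypothetical stand-in for the network request to a neighbor:

```python
def resolve_fork(local_chain, fetch_chain_from):
    """When a neighbor advertises a different chain state, request the
    full chain and keep the longest one (longest-chain rule)."""
    neighbor_chain = fetch_chain_from()
    if len(neighbor_chain) > len(local_chain):
        return list(neighbor_chain)  # adopt the longer chain
    return local_chain               # keep ours; the fork resolves elsewhere

mine = ["b0", "b1"]
resolved = resolve_fork(mine, lambda: ["b0", "b1", "b2"])  # neighbor is ahead
```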

E. NETWORK LAYER
The network layer of BlockPerf differs significantly from that of BlockSim, and of many other simulators reported in TABLE 2, since, in BlockPerf, the blockchain network is emulated over distributed (physical) nodes. The benefit of doing so is that the simulation should lead to results closer to reality, as BlockPerf implements a P2P protocol similar to the ones used by real blockchain systems, using advertisement messages (e.g., GETADDR and ADDR) to discover neighboring nodes. BlockPerf also integrates:
• a wallet management module: in charge of generating wallet addresses to uniquely identify all nodes. This module also integrates the protocol to broadcast wallet-related information in the network (via WALLADD messages) to make all nodes aware of existing addresses;
• a reputation score module: in charge of assigning negative scores to the connections from which malformed messages are received (e.g., failed transaction correctness checks, repeatedly outdated block information). This module is fulfilled by the PublicBitcoinNode class, which is responsible for keeping a log of the neighboring nodes and associated scores. Such logs allow BlockPerf to keep track of the network graph evolution (n) over the simulation run.
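The reputation-score logic can be sketched as follows; the class name, penalty amounts, and ban threshold are illustrative assumptions, not BlockPerf's actual PublicBitcoinNode implementation:

```python
class PeerReputation:
    """Accumulate negative scores for neighbors that send malformed or
    outdated messages, and flag them once a threshold is crossed."""
    def __init__(self, ban_threshold=-10):
        self.scores = {}
        self.ban_threshold = ban_threshold

    def penalize(self, peer, amount=1):
        """Record a negative score, e.g., after a malformed ADDR message."""
        self.scores[peer] = self.scores.get(peer, 0) - amount

    def is_banned(self, peer):
        return self.scores.get(peer, 0) <= self.ban_threshold

rep = PeerReputation(ban_threshold=-2)
rep.penalize("10.0.0.5")  # e.g., a failed transaction correctness check
rep.penalize("10.0.0.5")  # e.g., repeatedly outdated block information
banned = rep.is_banned("10.0.0.5")
```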

IV. VALIDATION AND EVALUATION
This section aims to assess the extent to which BlockPerf provides realistic results, and the extent to which it outperforms BlockSim. FIGURE 7 provides an overview of the experimental process conducted in this respect, which consists of five main stages. First, a real-life blockchain network (bitcoind) consisting of 63 nodes spread over the world has been deployed, which corresponds to the benchmarking Testbed of this study, whose metrics (a) to (o) have been collected/measured whenever possible. Based on those metrics, several parameters are specified as input parameters of BlockPerf and BlockSim, as presented in section IV-A. BlockPerf and BlockSim are then compared against the results obtained for the (benchmarking) Testbed in section IV-B. A discussion about our experiments is finally given in section IV-C.

A. BENCHMARKING TESTBED
The (benchmarking) Testbed is made of a modified version of the reference Bitcoin implementation, namely bitcoind. The evaluation is performed over 11 days, with a blockchain network consisting of 23 full nodes and 40 light nodes spread over three locations, namely Ireland, Luxembourg, and India. During the course of the experiment, transactions are randomly created. Based on this experimental configuration, metrics (a) to (o) are measured, whenever possible, and then reused for two purposes: 1) to serve as benchmarking metrics to evaluate the extent to which BlockPerf and BlockSim deviate from reality (see FIGURE 7); 2) to identify, from the Testbed run, several parameters that need to be configured as inputs of BlockPerf and BlockSim, such as node hash rate values, the block size distribution model, etc. To this end, a process similar to the one defined in [16] has been applied to extrapolate the probability distribution model for each parameter. FIGURE 11 reports the exact number of transactions over the 11 days of the experiment.
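As an illustration of this parameter-extrapolation step, the following sketch fits a log-normal model to observed Testbed samples by matching the mean and standard deviation of the log-values; the function and the sample data are hypothetical, and the actual procedure follows [16]:

```python
import math
import statistics

def fit_lognormal(samples):
    """Estimate log-normal parameters (mu, sigma) for a Testbed metric by
    matching the mean and standard deviation of the log-samples, so the
    model can be replayed as a simulator input."""
    logs = [math.log(x) for x in samples]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs) if len(logs) > 1 else 0.0
    return mu, sigma

# e.g., block sizes (bytes) observed on the Testbed
mu, sigma = fit_lognormal([900, 1100, 1000, 950, 1050])
```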

B. EXPERIMENT WITH BLOCKPERF AND BLOCKSIM & COMPARISON
In this section, the performance of BlockPerf and BlockSim is compared against the benchmarking Testbed results. BlockSim experiments have been conducted on a single machine with a 2.40 GHz Intel Xeon E5 CPU and 4 GB of RAM. In contrast, BlockPerf has been deployed over distributed nodes, located in Ireland, Luxembourg, and India, with the same computational resources (i.e., 2.40 GHz Intel Xeon E5 CPU and 4 GB of RAM). Sections IV-B1 to IV-B5 discuss the comparison analyses regarding the different layers, as emphasized in FIGURE 7. Note that for this comparative analysis, only the metrics that are covered and for which the results are available (i.e., metrics with • in TABLE 2) are considered.

1) APPLICATION LAYER
From an Execution time (a) perspective, BlockSim and BlockPerf allow for simulating the experiment at a faster pace than the Testbed, the former being 528 times faster and the latter 33 times faster. This substantial difference is explained by the fact that BlockSim does not rely on a real-life network layer. In terms of Computational resource usage (b), the average CPU, memory, and storage metrics are reported in TABLE 4, although it cannot be concluded that one simulator is better than the other for those metrics.

2) INCENTIVE LAYER
Unlike BlockSim and the Testbed, BlockPerf stores logs of the Reward evolution (e) as rewards are generated within the run. FIGURE 8 provides insight into the evolution of the reward over the 11 days of simulation. On average, the reward is 0.043991 BTC per day, equally distributed to the miners, with a standard deviation of 0.0002 throughout the run. Although no comparison for this metric could be made with the Testbed and BlockSim, we nonetheless report those results as they could serve as benchmarks for other researchers.
The Currency generation rate (g) can be compared with the Testbed, but not with BlockSim (metric not available). FIGURE 9 shows the evolution of the currency generated over the course of the simulation, 384 BTC being generated on average by BlockPerf, against 720 BTC generated by the Testbed. Despite this difference, it is interesting to note that the evolution in BlockPerf and the Testbed follows a similar trend.

3) CONSENSUS LAYER
During the simulation, a number of Fork resolutions (i1) are handled by the Testbed (2 in total, one at day 4 and one at day 5), by BlockSim (2 in total, one at day 4 and one at day 5), and by BlockPerf (4 in total, two at day 5 and two at day 6). These forks led to the appearance of stale blocks (i2), whose occurrences over the 11 days of the experiment are reported in FIGURE 10. Two observations can be made: (i) stale blocks mostly appear in the same timeframe (between days 4 and 6); (ii) BlockPerf provides more realistic results than BlockSim, as the number of stale blocks in BlockPerf (4 in total) is closer to the Testbed's (6) than BlockSim's (13).
Regarding the Consensus computation effort (j), BlockPerf also provides more realistic results, ranging between 9-42 MH/s, while the Testbed ranges between 10-45 MH/s and BlockSim between 1-2 MH/s. The reason for such a difference is that, even when the min and max hash rate values identified through the benchmarking phase (i.e., 10-45 MH/s, cf. section IV-A) are specified as input parameters, BlockSim does not seem to consider those parameters in the simulation run, always applying 1-2 MH/s as the hash rate range.

4) NODE/DATA LAYER
As the first metric of this layer, let us look at the Transaction evolution (k1) in FIGURE 11, which shows the number of transactions generated daily by the Testbed and the two simulators. It can be observed that BlockPerf closely follows the Testbed's transaction evolution, thus providing more realistic results than BlockSim. Indeed, BlockPerf and BlockSim respectively generate 4504 and 14184 Tx/day on average, while the Testbed generates 7341 Tx/day. For a more accurate view of the extent to which BlockPerf leads to realistic results, the relative absolute errors of BlockPerf and BlockSim with respect to the Testbed results are represented in the form of a boxplot in FIGURE 11. For example, the max value of the BS (BlockSim)-related boxplot (i.e., ≈ 12000) indicates that, over the 11 days of experiments, the maximal difference in terms of the number of transactions generated by the Testbed and BlockSim is 12000, which, looking at the transaction evolution in FIGURE 11, corresponds to day 5. Looking at the error median values, it can be noted that the difference/error is quite significant in BlockSim, being approximately twice as high as in BlockPerf. In addition to k1, let us also note that BlockPerf follows the real-life transaction specification (k2), while BlockSim adopts a more simplistic model, which may explain part of the difference in the transaction evolution results discussed above (k1).
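The per-day error values summarized by these boxplots amount to the absolute difference between each simulator's daily count and the Testbed's, as the example above suggests (12000 at day 5 for BlockSim); a minimal sketch with illustrative sample values:

```python
def absolute_errors(simulated, testbed):
    """Daily absolute error of a simulator against the Testbed reference,
    i.e., the per-day quantity summarized by the boxplots."""
    return [abs(s - r) for s, r in zip(simulated, testbed)]

# Two illustrative days of Tx/day counts (not the paper's data)
errors = absolute_errors([4500, 4600], [7341, 7000])
```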
Let us now discuss the Block evolution (l), which consists of four sub-metrics (l1 to l4). FIGURE 12(a) shows the evolution of the number of valid blocks (l1) over the 11 days of the experiment. It can be observed that BlockPerf produces fewer blocks than the Testbed, which is likely due to the unpredictable nature of the network connections and the occurrence of forks. On the other hand, BlockSim produces a number of blocks (≈400/day) that is twice as high as the Testbed's (≈200/day). Overall, as evidenced by the boxplot in FIGURE 12(a), BlockPerf provides more realistic results than BlockSim in terms of min, quartile, and max values. Looking now at the second sub-metric (l2: block validation times), both BlockPerf and BlockSim provide results similar to the Testbed, as shown in FIGURE 12(b). Indeed, the three curves follow the same trend and the relative absolute error is ≤ 1 min (i.e., one-tenth of the total time required by the Testbed). FIGURE 12(c) then provides insight into the evolution throughout the run of the block sizes (l3), along with the difference, relative absolute error to be precise, between BlockPerf/BlockSim and the Testbed. It is interesting to note that, when looking at the boxplot, BlockSim provides results closer to the Testbed than BlockPerf; however, when looking at the day-by-day values, BlockPerf evolves in a similar way to the Testbed, while BlockSim does not. Finally, regarding l4 (i.e., the average number of transactions mined within blocks per day), the two simulators provide results similar to the Testbed, following a similar trend over the course of the run and having similar differences/errors compared to the Testbed (the min and max errors being the same for BlockPerf and BlockSim).
The last metric of the Node/Data layer refers to the length of the chain (m), whose evolution over the 11 days of the experiment for the Testbed and the two simulators is plotted in FIGURE 13. It can be observed that BlockPerf provides more realistic results than BlockSim, a difference that becomes increasingly significant over time. Indeed, the longer the simulation runs, the higher the difference between BlockSim and the Testbed. This is also confirmed by the boxplot given in FIGURE 13: the minimal error values of the two simulators are approximately the same, while the quartiles, median and max values become increasingly higher for BlockSim.

5) NETWORK LAYER
Two distinct metrics are analyzed at this layer, namely how the network graph evolves over time (n), and what throughput performance is possible with the implemented blockchain and scenario (o).
Regarding the first metric (n), FIGUREs 14, 15 and 16 provide the network graph evolution over two consecutive days (days 1 and 2) for the Testbed, BlockPerf and BlockSim, respectively. Pink nodes correspond to light nodes and yellow ones to full nodes, each node being denoted by the country's initial (L: Luxembourg; I: Ireland; In: India) and the node's number. A twofold observation can be drawn from those graphs. First, unlike BlockPerf and the Testbed, BlockSim assumes that all nodes are interconnected in a fully connected mesh throughout the experimental run, and it does not distinguish between lightweight and full blockchain nodes. Second, the P2P logical network infrastructure evolves quite substantially from one day to the next, whether for the Testbed or BlockPerf, due, among other things, to occasional connection losses, ongoing traffic, etc. The diameter,5 mean geodesic distance,6 and clustering coefficient7 reported in TABLE 4 show that the graphs evolve in a similar manner for BlockPerf and the Testbed, which is not the case for BlockSim.
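The three topology metrics of TABLE 4 can be computed directly from a snapshot of the P2P graph. The sketch below assumes the networkx library; the edge list is a hypothetical day-1 snapshot (nodes named by country initial plus index, as in the figures), not data from the experiment.

```python
# Sketch: computing the TABLE 4 topology metrics (diameter, mean
# geodesic distance, clustering coefficient) from a P2P graph snapshot.
import networkx as nx

# Hypothetical day-1 snapshot: edges are observed peer connections.
g = nx.Graph()
g.add_edges_from([
    ("L1", "I1"), ("L1", "In1"), ("I1", "In1"),
    ("I1", "L2"), ("In1", "L2"),
])

print(nx.diameter(g))                      # longest shortest path
print(nx.average_shortest_path_length(g))  # mean geodesic distance
print(nx.average_clustering(g))            # (1/N) * sum of per-node C_i
```

Recomputing these three values on each day's snapshot is enough to reproduce the day-to-day comparison in TABLE 4; a fully connected mesh (BlockSim's assumption) would keep all three constant, with diameter 1 and clustering coefficient 1.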
Throughput is a key metric that provides a good performance indicator for a given blockchain system. It is nonetheless tricky to formalize analytically, as it depends on multiple processes across the different layers. Indeed, the Application (transaction generation), Node/Data (data access), Consensus (mining), Incentive (rewards) and Network (final throughput) layers, all combined, influence the throughput of a given blockchain system. FIGURE 17 shows the evolution of the average throughput for the considered scenario, along with the boxplot showing the relative absolute errors of BlockSim and BlockPerf when compared to the Testbed. The results show that the throughput ranges between 1 and 2.5 Tx/s for both the Testbed and the two simulators (BlockPerf having slightly closer results to the Testbed than BlockSim).
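Although throughput emerges from the interplay of several layers, its measurement reduces to confirmed transactions over elapsed time. The sketch below illustrates the calculation; the daily totals are made up for the example and are not the measured values of FIGURE 17.

```python
# Illustrative end-to-end throughput (metric o): confirmed transactions
# per day divided by the number of seconds in a day.

SECONDS_PER_DAY = 86_400

def throughput_tx_per_s(confirmed_tx_per_day):
    """Average throughput in Tx/s for each day of the run."""
    return [tx / SECONDS_PER_DAY for tx in confirmed_tx_per_day]

daily_confirmed = [129_600, 172_800, 216_000]   # hypothetical Tx/day
print(throughput_tx_per_s(daily_confirmed))     # [1.5, 2.0, 2.5]
```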

C. DISCUSSION
The previous section has shown that BlockPerf outperforms BlockSim, or, to be more precise, provides more realistic results, for most of the metrics (a)-(o). We firmly believe that this is mainly due to the assumption made by BlockSim (i.e., the non-consideration of network-level changes within the system). Furthermore, our approach presents several advantages. First, it covers all the layers (except the Contract one) and allows a higher number of performance metrics to be obtained. Also, by emulating the Network layer, our approach replicates a more realistic behavior, thus leading to more realistic results than BlockSim. Finally, even if it is not yet the case, our approach aims at making the plugging of any blockchain technology easier, as our code is structured following the six-layer model described in FIGURE 3.

7. Clustering coefficient of the entire graph: $C = \frac{1}{N}\sum_{i=1}^{N} C_i$, where $C_i$ refers to the clustering coefficient of a node $n_i$, calculated as $C_i = \frac{2L_i}{k_i(k_i-1)}$, $k_i$ referring to the degree of node $n_i$, $L_i$ to the number of edges between the $k_i$ neighbors of $n_i$, and $\{n_1, \ldots, n_N\}$ to the set of nodes (full and lightweight) in the network graph.
One may wonder whether the number of nodes used in our experiments is too small. This parameter is not critical in this study, as the objective is not to show how efficient a given blockchain is (e.g., from a scalability standpoint), but rather to show that the proposed simulator provides more realistic evaluation results than an existing one (BlockSim in this case). As a consequence, we designed the experiment to be large enough to capture a realistic scenario, and short enough to remain manageable (in terms of costs).
One may also wonder whether the size of the datasets is large enough to draw relevant conclusions from the comparison analysis. In this regard, let us note that all the graphs given in FIGUREs 8 to 17 only report the average values of all the measurements obtained on a daily basis for the corresponding metrics. To give an indication of the number of measurements collected every day: around 10000 measurements/day were collected for metrics (e) and (k1), 200 measurements/day for (l1)-(l3), and about 50 for l4. Given this number of measurements, we can confidently state that the conclusions drawn from our comparison analysis are relevant (statistically speaking).

V. CONCLUSION, LIMITATIONS & PERSPECTIVES
A. CONCLUSION
Today, a large variety of blockchain technologies is expanding in order to fulfill technical and non-technical needs and requirements. Within this context, determining and, most importantly, evaluating the characteristics/performance of a given blockchain platform is crucial for system designers before deploying it. In this respect, several blockchain simulators have been proposed in the literature over the past few years. Still, they are often limited in several respects (lack of extensibility, failure in covering all aspects/metrics underpinning a blockchain system, etc.). In this paper, the six-layer model introduced by [15] (Application, Contract, Incentive, Consensus, Node/Data, Network) is considered, against which 15 performance metrics have been mapped.
To overcome the limitations of state-of-the-art blockchain simulators, a new one called BlockPerf is proposed in this paper, as an extension of the existing BlockSim simulator. BlockPerf tries to cover the aforementioned layers, along with their metrics, as much as possible. A comparative analysis with BlockSim, built upon a benchmarking Bitcoin scenario, has been carried out, demonstrating that BlockPerf provides more realistic results than BlockSim (e.g., at the Node/Data and Network layers, where the results are respectively improved by 39% and 55% on average). However, several limitations remain to be addressed in the future, as discussed in the next section.

B. LIMITATIONS & PERSPECTIVES
Several limitations of BlockPerf are to be addressed for it to become a more exhaustive simulator. First, deploying nodes within BlockPerf across different geographical locations is neither simple nor straightforward, as it requires careful planning of the node deployment and the connection of the nodes to the main interface within BlockPerf. However, such a (real-network) layer is, in our opinion, crucial to obtain simulation results closer to reality.
Second, BlockPerf, in its current form, only supports Bitcoin-related experiments, and some effort remains to be made to allow for simulating different blockchain platforms. For example, a blockchain-based system such as IOTA, which uses a graph-based transaction structure, would require more extensive tuning of BlockPerf to replicate the effects of its real-world deployment.
Third, the Contract layer still remains to be developed within BlockPerf. The ability to deploy contracts has been a key element and argument of all the blockchain technologies that emerged over the last years, and being able to analyze them within the simulation environment would likely provide further insights into their usage. However, covering/integrating such a layer in a simulator is particularly challenging because each smart contract execution happens within a virtual computing environment (e.g., the EVM in Ethereum), which every node calls to execute the instructions contained in that contract. This environment itself relies on other layers for execution, including the execution of certain transactions or changes to the state of nodes, which makes it difficult to obtain realistic execution effects on the overall system, as discussed in [46]. Given this complexity, we believe that adopting a hybrid approach that uses the virtual computing environment as an emulation layer (in a similar manner as done in BlockPerf with the network layer) is an efficient way to cover/integrate the smart contract layer in a simulator.