An In-Depth Investigation of Performance Characteristics of Hyperledger Fabric

[Figure: impact of the most influential experimental parameters, such as the choice of hardware, transaction privacy, and database, on Hyperledger Fabric performance.]

Bitcoin marks the first form of a blockchain system, developed primarily for the decentralization of financial systems.
In particular, it created a new virtual currency that allows transfers without distinct intermediaries such as banks [33].
Based on the general concept of Bitcoin, Buterin et al. [7] extended the scope of blockchain technology towards a broader field of applications. Ethereum improved the versatility of blockchains, which so far could only provide limited programming logic to users, by introducing a programming language and an associated runtime environment ("Ethereum virtual machine") to execute smart contracts. These were initially conceptualized by Szabo [40] and allow the execution of highly customizable program code in a Peer-to-Peer (P2P) environment without relying on a distinct intermediary.
The advancement of blockchain technology has fostered the development of decentralized applications in business and the public sector, going far beyond the initial use cases within the financial sector [10,15,17,21,27,30,31,38]. However, despite the large potential benefits of distributed ledgers to enterprises, such as consolidating audit and production data in an unimpeachable distributed database, public blockchains suffer from many limitations. For example, they yield high transaction fees, low throughput, high latency and a lack of finality, high energy consumption, as well as a lack of transaction confidentiality [22].
Toward these new paradigms, developers introduced new frameworks to answer various industries' rising demand for enterprise-level blockchain applications. Modified blockchain architectures address the shortcomings of public permissionless blockchains and adapt them to the needs of enterprises [22]. To achieve these goals, frameworks that implement private permissioned blockchains, which restrict participation in the blockchain and consensus to a consortium, were developed [4]. Within these implementations, Hyperledger Fabric (Fabric) has become the leading solution for many applications. In particular, a wide range of use cases make use of Fabric's properties, as the framework provides high security and performance as well as flexible tools for access management, privacy, and implementing business logic [1,22].
Currently, several major projects based on Fabric are transitioning from tests and minimum viable products with limited scope to production-ready systems, which results in a growing number of participating parties and operations in these projects [19,32]. However, the requirements toward private or public transactions, the different complexities of smart contracts, and the need to support and adapt topologies differ heavily between these projects [22]. Fabric offers various configurations to adapt to a wide range of different use case requirements [24]. The choice of various architectural parameters, such as network size, choice of hardware, internet connection speed, and complexity of operations (i.e., smart contract methods), is known to have a large impact on blockchains' performance. Consequently, trade-offs between security, network size, privacy, and performance must be considered when designing a system with high-performance and specific reliability requirements [22].
Our literature review in Section 3 identifies two important gaps in the current body of knowledge that have not been addressed by previous research [39]. First, existing studies focus on particular variables without allowing for a holistic view. This drawback is mainly due to current studies conducting their measurements with different non-standardized tools. Additionally, many do not provide full transparency of how they define their key metrics and arrive at their results. Thus, observations of certain variables lack replicability and generalizability. However, these criteria are essential to allow a holistic view of the performance of Fabric. Second, the development of Fabric progresses fast and frequently introduces changes, offering new configuration options and features that impact the performance and are not covered by the literature. For example, private data collections effectively implement the necessary level of privacy in a cross-enterprise system and are hence essential for many enterprise-level applications [24,45]. The private data transaction process involves increased complexity over the conventional transaction process by introducing additional gossip routines.
These protocol changes make it hard to predict private data transactions' performance compared to conventional transactions.However, the impact of using private data collections on performance has not been studied in academic literature to the best of our knowledge.
In this paper, we address the identified research gap by studying a wide variety of performance characteristics of Fabric. We present an in-depth performance analysis of Fabric from the perspective of both researchers and architects of large-scale, enterprise, and public sector projects. Our measurements significantly extend the range of performance characteristics studied before. They include scenarios that are highly relevant for the use of blockchain deployments in the real world by industry and the public sector, such as a need for confidentiality, cross-data center and inter-continental deployments, and availability and resilience. Following Kannengießer et al. [23], the right balance of these factors is essential for allowing blockchain to create value effectively. Thus, our research objective is to develop a list of relevant variables, measure their specific impact on different Fabric implementations, and demonstrate the potential of Fabric in different scenarios. Our results aim to contribute to a better understanding of enterprise blockchains as an example of a highly complex fault-tolerant distributed system in real-world settings as enterprises would need them. For example, we show that Fabric scales very well with CPU-heavy transactions but struggles with payloads larger than 100 kB. Another important finding is that Fabric is very suitable for intercontinental networks, but private transactions in particular suffer from the resulting high latency. Besides demonstrating our results, we also contribute an extended version of the Distributed Ledger Performance Scan (DLPS) [39], a blockchain benchmarking framework, to investigate many of the identified knowledge gaps regarding performance characteristics. The DLPS provides clear definitions of key performance metrics and offers an end-to-end description of their setup and measurement, allowing for full transparency and repeatability. We provide our extension of the DLPS in the open-source repository [13] as
well as the results of our experiments for researchers to repeat our measurements or easily investigate new configurations.
The remainder of this paper is structured as follows: Section 2 gives an overview of essential background concepts and the architecture of Fabric. Section 3 provides a literature review of existing work on benchmarking Fabric. We then use the findings of the literature review and derive the main shortcomings that we want to address in Section 4. We also provide details on the measurement process with the DLPS. Afterward, Section 5 presents the main findings of this paper by demonstrating our benchmarking results relative to a wide range of variables employed in the benchmark tests. In Section 6, we discuss our findings, provide implications for real-world applications, and give design guidelines. Finally, in Section 7, we conclude this paper and provide suggestions for future research opportunities.

System Architecture
Since version 1.0, Fabric facilitates a paradigm that fundamentally differs from most blockchains to offer improved performance, flexibility, and privacy features [24]. Instead of relying on an order-execute architecture, Fabric uses an execute-order-validate paradigm (see Figure 1). Order-execute means, first, that the consensus mechanism is responsible for ordering and then broadcasting new transactions and, second, that all peers execute these transactions sequentially.
In contrast, execute-order-validate implies that Fabric separates execution and validation from ordering [1].
[Fig. 1. The execute-order-validate paradigm in Fabric compared to the order-execute architecture that most blockchains exhibit.]

The changed replication process requires a new system architecture. A Fabric node can take one of the following three roles [1]:
• Clients are responsible for submitting a transaction proposal to the peers and finally broadcast transactions in the form of a bundled endorsement response for ordering [1].

• Peers receive the transaction proposals from the clients, simulate them, and send the signed result back to the clients. Moreover, they eventually validate transactions. All peers maintain a ledger consisting of an append-only data structure (blockchain) of all previous transactions and a structure that represents the latest world state of the ledger. For storing the ledger state, peers can make use of conventional databases; currently, Fabric v2.0 supports LevelDB and CouchDB. Due to the execute-order-validate paradigm, Fabric does not require all peers to execute all transaction proposals, a design choice that makes Fabric quite special compared to most other blockchains, both public permissionless and private permissioned. By means of an endorsement policy, one can specify the subset of peers required to execute the transaction proposal for each smart contract method individually. These peers are also called endorsers or endorsing peers [1].

• Ordering Service Nodes, also called orderers, jointly form the ordering service. The ordering service is responsible for creating the total order of all transactions. There are different ways to implement the ordering service, ranging from a (by now deprecated) solo orderer to distributed protocols such as RAFT [36] and Kafka [25], addressing different levels of fault tolerance [1]. In the future, the developers want to introduce additional ordering services that also tolerate Byzantine faults [28].
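To illustrate how such an endorsement policy behaves, the following Python sketch models an OutOf(2, 4)-style check with hypothetical names (this is our own simplification, not the actual Fabric API): a transaction is endorsable once at least two of the four named organizations have returned matching simulation results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endorsement:
    org: str               # endorsing organization
    read_write_set: bytes  # serialized simulation result


def out_of(n: int, policy_orgs: list[str], endorsements: list[Endorsement]) -> bool:
    """Check an OutOf(n, ...) policy: at least n of the listed orgs
    must endorse, and all endorsements must carry the identical
    read-write set (i.e., all simulations agreed)."""
    matching = {e.org for e in endorsements if e.org in policy_orgs}
    rw_sets = {e.read_write_set for e in endorsements}
    return len(matching) >= n and len(rw_sets) == 1


# Example: policy OutOf(2, 4) over four organizations
orgs = ["Org1", "Org2", "Org3", "Org4"]
rw = b"rwset-v1"
ok = out_of(2, orgs, [Endorsement("Org1", rw), Endorsement("Org3", rw)])
# ok is True: two distinct orgs endorsed with the same result
```

Note that a single endorsement, or two endorsements with diverging read-write sets, would fail the check; this mirrors the requirement that all endorsers produce the same execution result.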
Clients, peers, and orderers are further grouped into organizations (abbreviated as orgs), typically representing companies or wider groups of participants. Based on their organizational affiliation, these entities have different rights, including the permission to join a blockchain channel, which represents a private subnet of communication between two or more network participants, including a corresponding ledger. A peer can join one or multiple channels. Running more nodes in one organization provides additional redundancy and, to some extent, better performance through the distribution of the simulation workload [41]. This approach allows parallel endorsement of transactions.

Transaction Flow
Fabric's execute-order-validate paradigm separates the transaction flow into three parts: Execution (sometimes also referred to as simulation) of a transaction and checking its correctness by comparing the signed result of redundant execution on different peers, which is also called endorsement; ordering by means of a consensus protocol, regardless of the semantics of a transaction; and transaction validation, ensuring endorsement policy and state consistency [1].
Figure 2 gives an overview of the transaction flow. In detail, Androulaki et al. [1] describe the three phases as follows: (i) Execution Phase: A client sends a cryptographically signed transaction proposal to one or more endorsing peers for execution (simulation). The peers do not yet update their ledger but only generate a read set and a write set (1). The write set consists of all key updates resulting from the simulation, and the read set contains all keys that the peers read during the simulation. The endorsers then create a cryptographically signed endorsement, including the read and write set, and send it back to the client in a proposal response. The client collects endorsements until the requirements set by the endorsement policy are met (2). This also ensures that all endorsers produce the same execution result and, thus, respond with the same read-write set [1].
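The read- and write-set mechanics of the execution phase can be sketched as follows. This is a simplified model with hypothetical names, not Fabric's implementation: the world state maps each key to a (value, version) pair, the chaincode logic receives get/put callbacks, and nothing is committed during simulation.

```python
def simulate_proposal(world_state: dict, tx_logic) -> tuple[dict, dict]:
    """Execute chaincode logic against the current state without
    committing; record which key versions were read and which
    key-value updates were produced."""
    read_set, write_set = {}, {}

    def get(key):
        value, version = world_state.get(key, (None, 0))
        read_set[key] = version  # remember the version we read
        return value

    def put(key, value):
        write_set[key] = value   # buffered, not applied to the state

    tx_logic(get, put)
    return read_set, write_set


# World state maps key -> (value, version)
state = {"a": (10, 3)}
reads, writes = simulate_proposal(state, lambda get, put: put("a", get("a") + 1))
# reads == {"a": 3}, writes == {"a": 11}; state itself is unchanged
```

The (read set, write set) pair returned here corresponds to what the endorser signs and sends back to the client in step (1).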
(ii) Ordering Phase: Once the client has received enough endorsements, it bundles them all, creates a signed transaction, and sends it to the ordering service (3). The ordering service uses consensus to establish a total order of all transactions. In addition, the ordering service batches the transactions in blocks and signs them [1].
(iii) Validation Phase: Blocks can either be delivered directly by the ordering service or from other peers through a gossip protocol (4). If a new block arrives at a peer, it will enter the validation phase (5), which involves the following three sequential steps [1]:

a. The peer checks if every transaction fulfills the endorsement requirements. If a transaction is invalid, the peer will mark it accordingly and ignore its effect.

b. The peer checks each transaction sequentially for read-write set conflicts. Hence, it compares the key, value, and version of the transaction with the current state of the ledger and ensures they are still the same. If a transaction is invalid, the peer will mark it accordingly and ignore its effect.

[Fig. 2. Fabric high-level transaction flow, adapted from Androulaki et al. [1].]
Manuscript submitted to ACM

c. The peer enters the ledger update phase and appends the block to the local ledger store. For each transaction that is not marked as invalid, the peer will also write all key-value pairs of the write set to the local state. Fabric consequently records invalid transactions even though they do not affect the state [1].
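The three validation steps above can be condensed into a short sketch (again a simplified model with hypothetical names, not Fabric's code): an endorsement check, a sequential read-write set version check against the current state, and a commit of valid write sets.

```python
def validate_block(block, world_state, check_endorsement):
    """Validation phase sketch: for each transaction, (a) check the
    endorsement policy, (b) check the read set against the current
    key versions, and only then (c) apply the write set."""
    results = []
    for tx in block:
        # (a) endorsement policy check
        if not check_endorsement(tx):
            results.append("invalid_endorsement")
            continue
        # (b) read-write conflict: every read version must still match
        conflict = any(
            world_state.get(key, (None, 0))[1] != version
            for key, version in tx["read_set"].items()
        )
        if conflict:
            results.append("invalid_mvcc")
            continue
        # (c) commit: apply writes and bump key versions
        for key, value in tx["write_set"].items():
            _, old_version = world_state.get(key, (None, 0))
            world_state[key] = (value, old_version + 1)
        results.append("valid")
    return results


state = {"a": (10, 3)}
block = [
    {"read_set": {"a": 3}, "write_set": {"a": 11}},  # valid
    {"read_set": {"a": 3}, "write_set": {"a": 12}},  # stale: "a" is now version 4
]
flags = validate_block(block, state, check_endorsement=lambda tx: True)
# flags == ["valid", "invalid_mvcc"]; state["a"] == (11, 4)
```

The second transaction is marked invalid because the first one already advanced the version of key "a", which is exactly the sequential conflict check described in step b.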

Private Data Transaction Flow
Since version 1.2, Fabric also supports private data through private data collections. A private data collection represents privacy policies, defining which peers should process and store the related data and which organizations should be able to access them [29]. In particular, private data is a feature primarily made possible by the execute-order-validate paradigm and the endorsement policies that do not require every peer to recompute every transaction to validate it.
This feature allows conducting transactions where only a subset of the organizations that participate in the Fabric blockchain store the actual data, while the remaining organizations only see the transaction hash, without relying on complex cryptographic techniques such as Zero-Knowledge Proofs or Homomorphic Encryption [20].
Private data mainly makes use of the standard protocol described in Section 2.2 but differs at some stages to address confidentiality in the three phases execution, ordering, and validation (see Figure 3).
(i) Execution Phase: The client sends a proposal request, including the confidential data, to the designated endorsers of the authorized organizations. Based on the collection policy, which defines which organizations should be able to access the private data, the endorsing peers distribute the private data to other authorized peers via a gossip protocol. All peers receiving the private data store it in a transient data store. Similar to the general transaction flow, the endorsers generate a read-write set and send it to the client in the form of an endorsement (1). However, the read-write sets do not contain any confidential data but rather a hash of the private data keys and values. As soon as the client has received enough endorsements (2), it sends a transaction to the ordering service (3), which is responsible for the total order of transactions [29].
(ii) Ordering Phase: The ordering phase works similarly to the general transaction flow. On consensus, the orderers include the transactions in a block and distribute them to all peers (4). Thereby, all peers receive the hashes of the private data, allowing for later validation [29].

(iii) Validation Phase: All peers store the transaction in their ledger and update the read-write set with the associated hash values. Additionally, in case a peer is authorized to access the private data related to the transaction, it checks its transient data store for the private data. If the peer has not received the private data in the execution phase, it tries to pull the private data from other authorized peers. Then, the peer uses the hash values of the transaction to validate the private data (5) and eventually saves it to the private state database [29]. In general, the peer makes use of an additional table of the regular state database.
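A minimal sketch of the hash-based validation in step (5) might look as follows. The commitment encoding here is our own simplification (Fabric actually hashes keys and values individually within the read-write set), and the function names are hypothetical.

```python
import hashlib


def commit_hash(private_kv: dict) -> str:
    """Deterministic commitment over the private key-value pairs
    (simplified; Fabric's actual encoding differs)."""
    payload = "|".join(f"{k}={v}" for k, v in sorted(private_kv.items()))
    return hashlib.sha256(payload.encode()).hexdigest()


def validate_private_data(onchain_hash: str, transient_store: dict, tx_id: str) -> bool:
    """Validation step (5): an authorized peer looks up the private
    data it received via gossip and checks it against the hash that
    was ordered on-chain."""
    private_kv = transient_store.get(tx_id)
    if private_kv is None:
        return False  # not received yet: pull from other authorized peers
    return commit_hash(private_kv) == onchain_hash


# The orderers only ever see the hash; authorized peers hold the data.
secret = {"price": "42"}
ok = validate_private_data(commit_hash(secret), {"tx1": secret}, "tx1")
# ok is True; a tampered value would produce a different hash
```

This captures the key property of private data collections: unauthorized peers can still validate that a transaction is well-formed against the on-chain hash without ever seeing the confidential values.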
While private data provides an approach to transfer data confidentially between specific organizations, the needed certificates are still used in plain text for verifying permissions. Even without insight into the content of a transaction, this alone leads to severe confidentiality issues, as it is apparent who is issuing new transactions. IBM therefore introduced an implementation of the identity mixer protocol [5,9] to hide the identity of the issuing client certificate.
This feature is still highly experimental and is currently only supported in the Java implementation of the Fabric client.

RELATED WORK
While the performance of blockchains is often considered a crucial aspect when working towards production-level systems [41], research in the field of systematic benchmarking at the time of writing this article is still scarce. To gain a better understanding of existing research on the performance of Fabric, we conducted a structured literature review. To ensure that we include all relevant papers, we first defined Hyperledger AND Fabric as a search string. We then used the string for queries in the ACM Digital Library, Web of Science, IEEE Xplore, and arXiv. The initial set contained 552 papers. After a first screening of title and abstract, we excluded 533 publications. Based on a full-text analysis, our final set includes fourteen articles that analyze the performance and scalability of Fabric. With the full-text analysis, we removed any paper that performed benchmarking on a heavily modified version of Fabric, as related findings are highly theoretical and not transferable to the publicly available versions of the system. Furthermore, we excluded studies that only test very small network sizes (fewer than six peers), as the generalizability of these results is very limited. Table 1 depicts the final set of papers that we analyzed thoroughly.
The study by Pongnumkul et al. [37] marks the first publication on the performance of Fabric. Comparing the Go implementation of the Ethereum client (Geth) to Fabric, the authors demonstrated the potential performance benefits of using a private permissioned blockchain. Later, Dinh et al. [11] laid another vital foundation for the analysis of the performance of private, permissioned blockchains by introducing the first systematic benchmarking framework, called Blockbench [6].
Blockbench makes heavy use of the Yahoo! Cloud Serving Benchmark (YCSB) and Smallbank, which are both benchmarking frameworks for conventional IT systems focusing on centralized databases. The authors eventually compared Fabric to Geth and Parity. Dinh et al. [11] decided not to opt for the re-architectured v1.0 of Fabric but used v0.6 for the comparison, as they obtained much better performance results with the older release.
Later work almost exclusively makes use of Fabric >v1.0. In comparison to the findings of Dinh et al. [11], who reached around 1,000 transactions per second, Androulaki et al. [1] obtained much higher performance figures with the newly introduced architecture of v1.x. The authors provided an extensive analysis of the preview version of v1.1.
They demonstrated that Fabric is potentially able to cope with over 100 peers and, under the right circumstances, perform more than 3,500 transactions per second. Nevertheless, the performance results of Baliga et al. [3] were again significantly lower than those of Androulaki et al. [1], illustrating that the achievable performance depends on various factors, such as the benchmarking framework, the release version of Fabric, and the employed hardware. Later research, therefore, extended testing on Fabric by introducing even more parameters and newer release versions of

Source Detailed content
Pongnumkul et al. [37]. This article presents a methodology for evaluating the performance of Ethereum and Fabric. The research team eventually derives performance figures for execution time, latency, and throughput, also considering different workloads.
Androulaki et al. [1]. This paper presents the execute-order-validate blockchain architecture of Fabric v1.1.0. The research team examines the throughput and latency under consideration of various parameters, such as block size, number of vCPUs, and number of peers.
Baliga et al. [3]. This research makes use of Caliper to examine the performance of Fabric v1.0. The authors consider various impacting factors, such as the number of nodes, endorsement policy, block size, and transaction size.
Dinh et al. [11]. In this research paper, the authors present the first systematic benchmarking framework for permissioned blockchains, called Blockbench. It builds on the established YCSB and Smallbank frameworks and allows benchmarking of private Ethereum (Geth, Parity), Fabric, and Quorum. Based on the framework, the authors compare the performance of Fabric to Ethereum.
Hao et al. [18]. This article presents a method for evaluating the performance of consensus algorithms in Ethereum and Fabric. The authors eventually derive performance figures for latency and throughput, taking also varying workloads into consideration.
The authors of this article demonstrate the performance of Fabric v1.0 in comparison to v0.6. Besides analyzing execution time, latency, and throughput, their study also varied the number of nodes to examine the scalability of the two implementations.
In this research article, the researchers examine the impact of various factors such as block size, endorsement policy, channels, and state database choice on Fabric v1.1. They eventually identify performance bottlenecks and propose optimizations that were included in later Fabric versions.
Kuzlu et al. [26]. Making use of Caliper, this research examines the performance of Fabric regarding throughput, response time, and simultaneous transactions.
The authors employ a customized version of the Hyperledger Caliper benchmarking framework to examine the effect of sub-second network delays on the performance of Fabric. The test setup used two cloud instances, one in Germany and one in France, to create the Fabric network.
Dreyer et al. [14]. This article evaluates the performance of Fabric. The authors create various network configurations and measure throughput, latency, and error rate, along with the overall scalability of the Fabric platform.
The authors put the results of their research in context with older versions of Fabric.
The authors strongly motivate their research with the application of blockchain in the field of cross-border e-government services. In doing so, they also address the performance of Fabric in a dedicated manner. Among other variables, they consider network delay as an important factor for the performance of Fabric.
Thakkar and Nathan [41]. This paper examines the performance of Fabric v1.4 considering horizontal scaling (e.g., by adding more nodes) and vertical scaling (e.g., by varying the number of CPUs per node). Based on these observations, the authors propose an optimization of the Fabric architecture, including pipelined execution of the validation and commit phases.
Wang and Chu [43]. This article goes into detail about the performance of Fabric and especially shows the performance of different ordering services. For this purpose, a network with 20 machines is used, and the different phases of the transaction flow and endorsement policies are considered.
In this paper, the authors developed a theoretical analysis framework to study the performance of Fabric, considering the execute-order-validate logic in Fabric v1.4. A series of experiments were conducted to compare the results with the simulations, thus verifying the theoretical model.
Table 1. Existing literature on performance investigations of Fabric.
Fabric. Thakkar and Nathan [41] and Kuzlu et al. [26] performed their analyses on v1.4 of Fabric, further revealing the complexity of performance tests of blockchain systems. In particular, Kuzlu et al. [26] conclude that, besides the specific infrastructure the blockchain resides on, the design of the transactions, e.g., their type and number, also profoundly impacts performance. Recently, Dreyer et al. [14] published their results, showing first measurements of the performance of Fabric v2.0. According to the authors, the performance of Fabric v2.0 improved significantly in comparison to older versions of the blockchain framework.

[Table 2. Evaluation of the measurements conducted by the research papers in our literature review. Compared dimensions include vertical scaling, horizontal scaling, multiple workloads, network delays, and crashing nodes; this paper covers Fabric 2.0 (and 1.4).]
While later work introduced additional influencing factors, the results of Androulaki et al. [1] and Thakkar et al. [42] still remain among the most complete presentations. Table 2 demonstrates that later work primarily focuses on specific characteristics, such as a sole analysis of the effect of very high network delays. Nevertheless, due to the particular dependency on a wide range of other factors, including different benchmarking tools and definitions of key metrics [39], related findings give an initial impression but are hard to integrate with the results of other researchers.
In summary, the existing literature offers first important insights into the properties of Fabric but still covers a narrow parameter space. Furthermore, reproducibility is limited due to the often minimal description of the methodology used, which leaves considerable room for improving the generalizability of the results.

EVALUATION FRAMEWORK
To further expand our understanding of private permissioned blockchains and specifically to analyze the various potentially influential variables and the performance of Fabric, we perform standardized benchmarking. The available tools for blockchain benchmarking that apply to Fabric are Blockbench [6], Caliper [8], and the DLPS [39].
Blockbench and Caliper do not clearly define how they determine the key performance metrics, particularly throughput and latency; the algorithm by which these are determined remains unclear. Thus, we chose the open-source framework DLPS. Furthermore, the DLPS allows for sophisticated network deployment using cloud services, which enabled us to test a wide variety of configurations.
Our benchmarking covers all variables that we identified through reviewing the related work (see Table 2). Throughout our testing, we furthermore identified seven additional variables that potentially affect the performance of Fabric. In fact, the DLPS did not yet cover all the particularities of Fabric. First, we upgraded the Fabric version supported by the DLPS to Fabric 2.0 and included multi-channel setups. Second, we added support for private transactions and complex queries. Third, we extended the supported architectural parameters by allowing the CouchDB and ordering node Docker containers to run either on the same node as the peers or on separate nodes. Splitting tasks across multiple machines, or joining them to reduce cross-instance latencies, might help to increase performance. Fourth, we added support both for simulating network delays and for multi-datacenter deployments. Finally, we refined the overall benchmarking process, evaluated single-core CPU usage and traffic stats, and added capabilities to trigger automatic crashes of orderers and peers, which, e.g., required the dynamic identification of the current leader in the RAFT ordering service. Hence, the final framework allows testing all previously mentioned variables that we found in existing research and extends them with new unique features that, according to our literature review, have not been investigated before. Table 3 provides a description of all variables considered for the benchmarking. With the publication of this paper, we make our enhancements to the DLPS, as well as the configurations and results of all the experiments that we conducted for this publication, available in the DLPS GitHub repository [13].

[Table 3. Design choices and network specifics for which a need for detailed investigation was determined.]
We performed the testing in an incremental way to increase the reliability of our results. The left chart in Figure 4 describes a single benchmarking run. We used a series of these runs to create a benchmarking ramping series (see the right chart of Figure 4). We create a configuration file that specifies all particularities of the Fabric network. The DLPS uses this file and automatically sets up a blockchain and client network in Amazon Web Services (AWS) before the benchmarking process starts. A single benchmarking run in the DLPS involves sending requests from clients to the network for a specific duration at a specific rate f_req, the slope of the requests curve (see the orange requests curve). The arrival of confirmations that the transactions have been processed successfully is illustrated by a responses curve (see the green responses curve). In this curve, one can see distinct blocks, as they mark a quasi-simultaneous confirmation of the included transactions, resulting in steps. The slope of the linear regression of the response curve corresponds to the average rate of responses. The average time between sending and receiving confirmation of a specific transaction or, equivalently, as long as the linear regressions remain parallel, the distance between the intersections of the regressions for the request and response curves with the x-axis marks the latency. By starting at a low request rate and repeating at an increased request rate whenever the network can process requests at the given rate (see the x-axis of the right chart in Figure 4), we can localize the maximum throughput, where an increase in the request rate does not further improve or even decreases the response rate (due to queueing or overstress). In Figure 4, this behavior can be seen in the right chart at a request rate of approx. 450 tx/s. In all measurements, we monitored the effectiveness, i.e., the rate of transactions that were finally successfully processed, and controlled resource stats such as CPU usage and network
traffic, to gather additional information that might help to find the bottleneck. More information about the DLPS can be found on GitHub [13] and in the associated paper [39].
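The metric extraction described above can be sketched as follows (our own illustrative reimplementation, not the DLPS code): fit straight lines to the cumulative request and response counts, take the response slope as the throughput, and take the horizontal shift between the two lines' x-intercepts as the latency.

```python
def fit_line(times, counts):
    """Least-squares fit: counts ~ slope * t + intercept."""
    n = len(times)
    t_mean = sum(times) / n
    c_mean = sum(counts) / n
    slope = sum((t - t_mean) * (c - c_mean) for t, c in zip(times, counts)) / \
            sum((t - t_mean) ** 2 for t in times)
    intercept = c_mean - slope * t_mean
    return slope, intercept


def throughput_and_latency(req_times, resp_times):
    """Response-curve slope = throughput; as long as the two regression
    lines stay parallel, the distance between their x-axis intersections
    is the average latency."""
    req_counts = list(range(1, len(req_times) + 1))
    resp_counts = list(range(1, len(resp_times) + 1))
    s_req, i_req = fit_line(req_times, req_counts)
    s_resp, i_resp = fit_line(resp_times, resp_counts)
    latency = (-i_resp / s_resp) - (-i_req / s_req)  # x-intercept shift
    return s_resp, latency


# 100 tx/s sent over 2 s, each transaction confirmed 0.5 s later
reqs = [i / 100 for i in range(200)]
resps = [t + 0.5 for t in reqs]
tput, lat = throughput_and_latency(reqs, resps)
# tput is approx. 100 tx/s, lat is approx. 0.5 s
```

In a real run the response curve is a staircase (blocks confirm many transactions at once), which is exactly why the regression slope, rather than individual confirmations, is used as the throughput estimate.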
By default, the deployments and tests with the DLPS are highly homogeneous and symmetric. We associate each client with one blockchain node, and each blockchain node is associated with the same number of clients. This allows us to send requests completely uniformly: at f transactions per second, a transaction is sent every 1/f seconds, again uniformly distributed among the clients. For example, if we have a 10 node network, 20 clients, and a request rate of 100 tx/s, every client sends requests at 5 tx/s, and we also make sure that there is a uniform offset between the clients. In case a client has multiple cores and we use multiple workers for multi-threading, we make sure that there is a homogeneous offset. At high request rates, the offset is harder to enforce but also much less relevant. A high degree of uniformity is relevant for measuring maximum throughput correctly when it is low, as there are no spikes in the nodes' workload.
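The uniform scheduling described above can be sketched as follows (an illustrative model with a hypothetical `schedule` function, not the DLPS implementation): each client fires at the total rate divided by the number of clients, phase-shifted so that the merged timeline has one transaction every 1/f seconds.

```python
def schedule(total_rate: float, n_clients: int, duration: float):
    """Return per-client send timestamps so that, network-wide, one
    transaction is sent every 1/total_rate seconds and the clients
    fire with a uniform offset relative to each other."""
    interval = 1.0 / total_rate        # global gap between transactions
    per_client = interval * n_clients  # gap between one client's own txs
    plan = []
    for c in range(n_clients):
        offset = c * interval          # uniform phase shift per client
        t = offset
        times = []
        while t < duration:
            times.append(round(t, 9))
            t += per_client
        plan.append(times)
    return plan


# 100 tx/s over 20 clients for 1 s: each client sends at 5 tx/s,
# and the merged timeline has one transaction every 10 ms.
plan = schedule(100, 20, 1.0)
```

This mirrors the example in the text: at 100 tx/s with 20 clients, each client sends 5 tx/s, and the offsets keep the aggregate request stream perfectly uniform.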
We used instances from the m5 series in AWS since they offer a good balance between computation, networking, and disk operations, all of which are necessary for blockchain nodes. Details with respect to the instances are displayed in Figure 5.

{
  "node_type": "m5.large",
  "fabric_version": "2.0.0",
  "fabric_ca_version": "1.4.4",
  "thirdparty_version": "0.4.18",
  "channel_count": 1,
  "database": "CouchDB / LevelDB",
  "external_database": "False",
  "internal_orderer": "False",
  "org_count": 4,
  "peer_count": 2,
  "orderer_type": "RAFT",
  "orderer_count": 4,
  "batch_timeout": 0.5,
  "max_message_count": 1000,
  "absolute_max_bytes": 10,
  "preferred_max_bytes": 4096,
  "tls_enabled": "True",
  "endorsement": "OutOf(2, 4)",
  "private_fors": 2,
  "log_level": "Warning",
  "client_type": "m5.large",
  "client_count": 4
}

Fig. 5. Default settings for the Fabric network architecture.
{
  "duration": 20,
  "localization_runs": 2,
  "repetition_runs": 0,
  "method": "writeData",
  "mode": "public",
  "shape": "smooth",
  "delay": 0,
  "r2_bound": 0.9,
  "frequency_bound": 100,
  "latency_bound": 10000,
  "delta_send": 0.5,
  "delta_receive": 0.5,
  "success_bound": 0.8,
  "retry_limit": 2,
  "ramp_bound": 2,
  "success_base_rate": 0.8,
  "success_step_rate": 0.04,
  "failure_base_rate": 0.8,
  "failure_step_rate": 0.04,
  "delta_max_time": 10
}

Fig. 6. Default settings for the benchmarking logic.

Stability of the default setup and comparison of software versions and databases
We first compared different modifications of the default architecture with respect to parameters that we considered relevant, based on the related work identified in our literature review and on our experience working with the DLPS (see Figure 7). As the error bars indicate, which represent the standard deviation obtained from conducting every experiment three times, the results are very consistent and reproducible.
Like Thakkar et al. [42], we found that write throughput for the default setup with LevelDB is around three times the maximum throughput with CouchDB, and we can even extend this result to private transactions (see Figure 7).
We observe that the following modifications have only an insignificant impact on throughput: doubling the number of clients to distribute the client workload to more workers (0 % impact on performance with public transactions and a 4 % increase with private transactions), doubling the number of channels (a 2 % decrease with public transactions and a 13 % increase with private transactions), and deactivating TLS and switching to a centralized ("solo") orderer (a 13 % increase with public transactions and a 3 % increase with private transactions). This suggests that neither the number of clients and channels nor the ordering service and TLS is the bottleneck for the default architecture in our setup. While the results of [42] support that the ordering service is not a bottleneck in a similar architecture, they find that doubling the number of channels increases CPU utilization and hence also throughput considerably. In our case, however, the peers' CPU utilization is already very close to the maximum on all virtual cores with one channel. This observation indicates that the two-channel configuration will eventually not exhibit a higher throughput. Indeed, the difference between the single-channel and dual-channel setup was marginal, resulting in a variation of only 9 % for private transactions, whereas public transactions did not show any significant impact. Note that these numbers represent the results with CouchDB; with LevelDB, the relative deviations were largely even smaller.
Performance benchmarks with older versions of Fabric, particularly by Pongnumkul et al. [37], Dinh et al. [12], and Nasir et al. [34], generally yield lower throughput (a few hundred tx/s with LevelDB) on considerably better hardware, indicating that the development of Fabric has already led to considerable performance improvements. Surprisingly, we noticed a slight decrease in performance of v2.0 compared to the previous version 1.4.4 for CouchDB, as opposed to the results of [14]: v1.4.4 using CouchDB was about 26 % faster with public transactions and 68 % faster with private transactions than v2.0. With LevelDB, the difference for private transactions dropped to only 11 %, and with public transactions, v2.0 was even 5 % faster than v1.4.4. We argue that the discrepancy between our results and those of Dreyer et al. [14] is due to how they drew their conclusion: the authors compare their results for v2.0 with the results of Nasir et al. [34] for v0.6 and v1.4. However, the studies' testing environments differed, and Dreyer et al. [14] used stronger machines with more computing power, probably resulting in the better performance of their v2.0 measurements. In contrast, our comparison of v1.4.4 and v2.0 was conducted ceteris paribus.

Endorsement policy.
The endorsement policy, which we describe in Section 2, is an important setting, as it drastically changes the level of redundancy: more endorsers mean more overhead but also higher robustness. As we illustrate in Figure 8, increasing the number of endorsers, i.e., the degree of redundancy of simulation, decreases throughput as expected. In absolute and relative numbers, LevelDB suffers from a much higher performance decrease with a higher number of endorsers than CouchDB. For example, maximum throughput for simple public transactions with LevelDB decreases by 24 % and 54 %, respectively, when switching from only one endorser (and thus no cross-checks of correct chaincode execution) to two or four endorsements. For CouchDB, the degradation is 14 % resp. 41 %.
For private transactions, we looked at pairwise private collections, i.e., private transactions between two orgs. Here, a degradation from 2 to 4 endorsers results in a loss of 14 % (CouchDB) and 30 % (LevelDB). These numbers are notably lower than for public transactions (31 % and 42 %). Thus, in general, performance decreases more heavily for LevelDB, both relatively and absolutely, when more endorsements are necessary. Surprisingly, for only one endorser in the case of private transactions, we noticed a strange behavior of Fabric that resulted in throughput in the one-digit range as soon as multiple clients were requesting transactions from different peers. However, so far, we could not determine the underlying reason.

5.2.2 Network architecture.

Initially, we see an increase in maximum throughput when increasing the number of peers per org while keeping the number of endorsers constant (see Figure 9). Likewise, increasing the number of orgs while keeping the number of peers per org and the number of endorsers constant increases maximum throughput. However, we also notice that maximum throughput decreases again for large network sizes, so there generally seems to be an optimum. For the given setup, this optimum is at 8 peers per org and an endorsement policy of 2. Hence, we could improve public transaction performance by up to 32 % by choosing the right number of peers. With private transactions, the numbers are still about 21 % better with 8 peers per org than with 2 peers per org.
Scaling the number of orgs and the endorsement policy equally only slightly reduces throughput for smaller networks, a potential reason being that the endorsement workload for each peer remains constant and the other operations, such as networking and committing, are not yet the bottleneck in this regime. Nevertheless, for larger network sizes, the throughput degrades considerably, and we also see that the difference between having one and two peers per org becomes negligible. This makes sense, as networking becomes the bottleneck in this regime, and splitting the endorsement workload no longer gives a considerable advantage to the networks with more peers per org.
For scaling the number of RAFT orderers, we expected a performance decrease but could not observe one in our chosen scenarios. Seemingly, in the regime below 1,500 tx/s, the ordering service is not a bottleneck for up to 64 orderers. Nevertheless, the ordering service might become a bottleneck for even larger ordering services. However, using a RAFT ordering service with up to 64 nodes should be sufficient in practically any scenario, since this would tolerate a total of 31 crashes and still ensure the network's functionality.

Database location.
By deploying databases, orderers, and peers onto separate systems, one can gain a small performance boost (see Figure 10). In our default scenario, the database runs on the peer node (which is obligatory for LevelDB) and the orderers run on separate nodes. We see that running both the orderer and the peer on the same node decreases performance only slightly; disregarding other important factors, separating an org's Fabric components onto several computers is thus less critical than one might expect. In particular, we observed a decrease of only 15 % in the case of the m5.large machines and 6 % in the case of the m5.2xlarge machines. Running CouchDB on a separate node has a considerable effect on weaker hardware (an increase of 23 % with m5.large). The throughput improvement is similar for private transactions on m5.large and m5.2xlarge hardware, amounting to 22 % resp. 18 %. Still, it becomes smaller (in relative terms) as soon as better equipment comes into play (an increase of 12 % with m5.2xlarge).

Database type.

Very early in our experiments, we already realized that the performance of Fabric is highly susceptible to the choice of database. On average, Fabric was two to three times faster with LevelDB than with CouchDB. The difference was especially noticeable when using private data: with private data and m5.large machines, Fabric was 272 % faster using LevelDB over CouchDB in the default setup.

Hardware.
It is important to determine the correlation between machine strength and performance since systems should scale with better hardware (and better network).
As long as the number of vCPUs is small, an increase in their number improves performance notably (see Figure 11).
For example, the performance increase for private transactions with CouchDB is 97 % when moving from m5.large to m5.xlarge instances and 62 % when moving from m5.xlarge to m5.2xlarge instances. This observation holds similarly for both CouchDB and LevelDB and for both public and private transactions. However, the improvement when moving from m5.2xlarge (8 vCPUs) to m5.4xlarge (16 vCPUs) is already small (less than 25 % for CouchDB and less than 20 % for LevelDB for both public and private transactions), especially when taking into account that this also implies twice the cost for hardware or cloud services. We also measured with m5.8xlarge instances. However, this at best yielded moderate performance improvements (even less than moving from m5.2xlarge to m5.4xlarge). Besides, crashes of peers became quite frequent (particularly for LevelDB), which led to our results being even worse than for m5.4xlarge.
We think that searching for the reason for this behavior might be an interesting starting point for future improvements of Fabric and yield better scaling with hardware.
Like Thakkar and Nathan [41], we also observed that CPU utilization drops for hardware with many cores. Thakkar and Nathan [41] also argue that throughput can be increased by using more peers on multiple channels; however, this basically corresponds to running multiple blockchains instead of one, and currently, only cross-chain read operations between the blockchains (channels) are supported. Our experiments also suggested that for hardware with many cores, the CPUs cannot be fully utilized, and there is also not a single core that reaches more than 90 % CPU utilization.
The computational tasks, hence, seem well parallelized and suggest that in the end, writing to disk is the bottleneck.
However, we wanted to check whether using multiple channels could leverage additional, previously unparallelizable resources. This does indeed seem to be the case, but only to a small extent. Our results (Figure 12) confirm their observations: increasing the number of channels has only a small impact, on average 12 % (regardless of the database type), when going from a single-channel setup to a dual-channel setup. Any additional channel shows no noticeable further improvement in maximum throughput.

Block parameters.
New blocks are generated by the ordering service whenever the maximum blocksize is reached or the time that has passed since the generation of the last block exceeds the blocktime. When varying the blocktime (with a fixed maximum blocksize of 1,000), the maximum throughput stays below 500 tx/s, so the blocktime always triggers a new block: as long as the maximum blocktime is less than 2 s, no more than 1,000 tx will be operated within the maximum blocktime. A small maximum blocktime (less than 0.25 s) implies low throughput since there is considerable overhead involved in creating, sending, and validating a new block. A positive correlation between block size and maximum throughput has already been observed by Thakkar et al. [42]. For larger blocktimes, the workload related to transactions dominates and hence makes performance largely insensitive to blocktime. However, latency grows with blocktime, which makes perfect sense, as it is always the associated timeout that triggers the creation of new blocks. This also makes a block timeout of around 0.5 s a sweet spot, since increasing it does not further improve performance but increases latency, while decreasing the maximum blocktime, though decreasing latency, also heavily decreases throughput. When varying the blocksize, we get the same results, but with a "cutoff". This is the case because we used the default maximum blocktime of 0.5 s, which, considering that maximum throughput is around 500 tx/s when blocks become sufficiently large, becomes the actual trigger as soon as the maximum blocksize is larger than 0.5 s · 500 tx/s = 250 tx. For the low-throughput tests on latency, i.e., at 50 tx/s for public transactions and a blocktime of 500 ms, blocks never get bigger than 25 tx, so in the latency chart, we see no changes in latency beyond 50 ms. See Figure 13 for an overview of the results.
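The interplay of the two block triggers can be captured in a small model. This is our own simplification for illustration: a block closes when either the message count limit is reached or the batch timeout elapses, whichever comes first, assuming a steady request rate.

```python
def block_trigger(request_rate, batch_timeout, max_message_count):
    """Which condition closes a block at a steady request_rate (tx/s)?
    Simplified model of the ordering behavior described above,
    not actual Fabric internals."""
    tx_within_timeout = request_rate * batch_timeout
    if tx_within_timeout >= max_message_count:
        # block fills up before the timeout fires
        return "size", max_message_count / request_rate   # seconds until block is full
    return "timeout", batch_timeout

# Example from the text: at 450 tx/s with max_message_count = 1000 and a 2 s
# timeout, fewer than 1000 tx arrive in time, so the timeout always triggers.
trigger, t = block_trigger(450, 2.0, 1000)   # -> ("timeout", 2.0)
# With the default 0.5 s timeout and approx. 500 tx/s maximum throughput,
# any maximum blocksize above 0.5 * 500 = 250 tx can never be reached.
```

This reproduces the "cutoff" described above: once the blocksize limit exceeds 250 tx, the 0.5 s timeout becomes the effective trigger.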

Business Logic
5.4.1 I/O-heavy workload. First, we checked for the impact of maintaining larger data sets in terms of the state databases' keyspace size. We did not observe any relevant dependence on the keyspace's size for fewer than 10^5 keys (see Figure 14). Performance implications of very large keyspace sizes for LevelDB are given, e.g., in [3,12]; due to space restrictions, we consider this a property of the databases themselves rather than of the Fabric network.
Manuscript submitted to ACM

Furthermore, we checked for the sensitivity to the size of data written in a single transaction, both when the data is communicated via the client (data sent to the peer) and when it is already present on the peer (i.e., data created on the peer as a result of executing a smart contract). One critical observation is that, for adequate bandwidth as within a cloud data center, it is not crucial whether large amounts of data to be processed are already available on the peer or are sent via the client. Transactions with 10 bytes have around the same throughput as the simple (public/private) transactions benchmarked before. Moving from 10 bytes to 1 kB only yields degradations of less than 10 % for CouchDB and less than 20 % for LevelDB. However, moving from 10 bytes to 100 kB degrades throughput by more than 85 % for public transactions (even 95 % in case the data is generated on the client, leading us to the conclusion that networking is particularly resource-intensive) and by 75 % to 95 % for private transactions. Notably, the degradation of CouchDB and LevelDB is similar, except for private transactions with CouchDB, where throughput is already rather low for 10 kB. Consequently, while there is no significant difference between the creation of the data on the client (networking-intensive) and on the peer (no additional networking) for 10 bytes, the difference is 3x for 100 kB for LevelDB public, LevelDB private, and CouchDB public. Only for CouchDB private is the difference only 30-50 %. For 1 MB, throughput is less than 10 tx/s for LevelDB and less than 3 tx/s for CouchDB. The maximum throughput in terms of data is around 14 MB/s for the run with 100 kB packages.

Reading data.

We first checked that the keyspace size has no impact for fewer than 10^5 keys. Reading speed is only reasonable to state on a per-peer basis, since no other node is involved in a reading operation (except for cross-checks in case the client does not trust its peer). For simple key-based queries on m5.large instances, we obtained around 400 reads per second on CouchDB (150 reads per second with complex queries) and around 750 reads per second on LevelDB. We used non-invoked queries, which do not lead to the Fabric transaction flow. Again, we used the standard configuration, consisting of 4 clients and 2 peers; consequently, clients distribute requests equally between the peers.
Complex queries are only feasible on CouchDB. Here, we could observe a massive difference between no indexing (which performs approximately as well as querying the total database and searching the value space afterward, resulting in a low one-digit number of successful queries per peer and second) and indexing (which still allows approximately 150 reads per second and peer). Note that networks with high performance requirements on reading processes should either go for multiple peers for scaling benefits or consider fetching the peers' data and maintaining another database.

CPU-heavy workload.
To test Fabric's performance on CPU-heavy operations, we conducted matrix multiplications, implemented through simple nested loops, with different matrix sizes, because this allows for quantitative control of the complexity. Please refer to Figure 15 for an overview of our findings. Multiplying two n×n matrices requires O(n^3) simple operations (additions and multiplications) in our nested-loop implementation, so for large n, we expect the throughput to scale as 1/n^3. Indeed, we see that the total number of operations approaches a saturation curve for large n, since for small n, the Fabric-related overhead also matters. For n=300, the performance of the network is still around 30 tx/s resp. 15 tx/s for two resp. four endorsements. When comparing this to a matrix multiplication on a standalone Ethereum node, we found that the Ethereum Virtual Machine could not deal with a multiplication of a 90×90 matrix, and already multiplying a 30×30 matrix took almost one second. This emphasizes the significant performance advantage of Fabric when executing CPU-intensive tasks.
We also checked that, as one would expect, there is no difference between public and private transactions, since there are no database operations. Moreover, for a stricter endorsement policy (4 out of 8), performance is approximately half the performance compared to a weaker endorsement policy (2 out of 8). This is because, in total, there are twice as many computations for a single transaction. While the ratio of maximum throughput between the four-endorsement and two-endorsement case is 40 % when multiplying a 1×1 matrix, demonstrating the Fabric-related overhead, it is already 46 % for a 100×100 matrix and 50 %, and hence the expected asymptotic value, for 300×300. We also checked that for larger n, the performance of the CPU-heavy task in Fabric approximately matches the performance of the same computation in a standalone Node.js script.
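The CPU-heavy workload can be sketched as follows. The actual benchmark chaincode was written in Node.js; this is an equivalent Python sketch of the naive nested-loop multiplication used as the workload.

```python
def matmul(a, b):
    """Naive O(n^3) nested-loop multiplication of two n x n matrices,
    mirroring the CPU-heavy chaincode workload described above
    (the actual benchmark used a Node.js smart contract)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

# With a fixed Fabric-related overhead o per transaction and a compute cost
# proportional to n^3, expected throughput scales as 1 / (o + c * n^3),
# approaching the 1/n^3 regime for large n.
result = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# result -> [[19, 22], [43, 50]]
```

The simple throughput model in the comment also explains why the four-to-two-endorsement ratio approaches 50 % only for large n: for small matrices, the constant overhead o dominates.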

Delays.
To investigate the impact of network delays in a real-world but still general scenario, we defined groups within our default architecture, where each group represents an enterprise and consists of two peers, one orderer, and four clients. Within a group, we assume minimal network delays. This hypothesis is certainly optimistic for global enterprises, but in a large network, one might in fact choose the nearby peers within an organization for endorsement if speed matters. In a first attempt, we used the standard traffic-control (tc) tool under Ubuntu to set an artificial delay for any communication between the members of different groups. However, we noticed that the results obtained by imposing artificial delays became highly unreliable at high throughput, suggesting that when CPU usage or network traffic is high, tc does not operate correctly. Therefore, we switched to deployments over multiple data centers and set up a cross-European and a global network. In particular, we set up groups located in Germany, Ireland, Italy, and Sweden for the European case with moderate network delays, and US East, Germany, Brazil, and Singapore for the intercontinental case with high network delays. Latency increases by 30-50 % from a single data center to cross-European (approx. 30 ms one-way delay) and by more than 3x from a single data center to an intercontinental distributed system (up to 330 ms delay). In the intercontinental case, already at low throughput, transactions will on average take 1.2 seconds (public) to more than 1.7 seconds (private). Once the throughput approaches the maximum sustainable throughput, the latencies become even higher. A detailed topology of the network, including the network delays that we measured between each data center pair, is displayed in Figure 16.
While our initial simulation with artificial network delays imposed using tc suggests a decrease in performance by approx. 50 % for CouchDB and 70 % for LevelDB (with significant standard deviations) for delays of 50 ms, using the actual cross-data center deployments with real-world delays, we find that for both LevelDB and CouchDB and for both public and private transactions, performance does not degrade that significantly in the intercontinental case (see Figure 17). This refines and confirms a statement in [1] according to which a cross-data center deployment of 100 nodes across five different data centers (with unknown network delays), using LevelDB and public transactions, still offers high performance. For public transactions with CouchDB, for example, we find that maximum throughput drops from 426 tx/s in the single data center case to 376 tx/s in the cross-European case and 358 tx/s in the intercontinental case.
This corresponds to a drop by 12 % and 16 %, respectively.We can also readily observe that the performance decrease is less significant for LevelDB in the cross-European case, whereas for both CouchDB and LevelDB, the performance decrease of private transactions in the intercontinental case is considerable: For CouchDB, we observe a decrease in maximum throughput of 39 % compared to the single datacenter case, and for LevelDB, we see a drop by 26 %.
For a systematic investigation of the relationship between performance metrics and network delays, particularly for an analysis of latency, which we found to be (intuitively and empirically) much more sensitive to network delays than throughput, we had to adapt our benchmarking procedure. While the real-world deployments make it hard to vary network delays continuously, we found that the latencies in the real-world deployment are similar to the latencies that we observed using tc when staying well below maximum throughput in our measurements. In corresponding experiments with artificial network delays imposed through the tc tool, we found that transaction latency seems to grow approximately linearly with the network delay; interestingly, the average slope in Figure 18 is approximately 15, which suggests that there are around 15 communications between different nodes contributing to the observed network latency. It is apparent that avoiding communication paths that exhibit notable network delays is important for Fabric networks that operate under high performance requirements. For example, this can be achieved by weakening endorsement policies and preferring endorsers with low latency, or by avoiding particularly large distances between ordering nodes. Close proximity between nodes, on the other hand, comes at the cost of availability ("liveness"), because stronger geographic localization increases the threat of correlated crash failures, e.g., caused by blackouts.
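The slope of approximately 15 corresponds to an ordinary least-squares fit of transaction latency against one-way network delay. A minimal sketch with synthetic data (the numbers below are illustrative, not our measurements):

```python
def fit_slope(delays, latencies):
    """Ordinary least-squares slope of latency vs. network delay."""
    n = len(delays)
    mx = sum(delays) / n
    my = sum(latencies) / n
    num = sum((x - mx) * (y - my) for x, y in zip(delays, latencies))
    den = sum((x - mx) ** 2 for x in delays)
    return num / den

# Synthetic example: a base latency of 0.4 s plus 15 node-to-node
# communications, each adding one network delay d.
delays = [0.0, 0.01, 0.03, 0.05]          # one-way delays in seconds
lats = [0.4 + 15 * d for d in delays]     # modeled transaction latencies
# fit_slope(delays, lats) -> approximately 15
```

Under this interpretation, the fitted slope estimates the number of sequential cross-node round trips on a transaction's critical path.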
5.5.2 Bandwidth. We investigated the bandwidth that the different roles in the Fabric network, i.e., orderers, peers, and clients, require. We first noted that within the roles of peers and clients, inbound traffic is distributed very uniformly.
Moreover, the maximum requirement on download speed for peers, orderers, and clients is very homogeneous within each of these roles. The maximum values that we observed were at most as large as the respective maxima on outbound traffic. Since, additionally, upload speed is more likely a bottleneck than download speed, we only discuss the requirements on upload speed in detail here. Figure 19 illustrates the dependence of outbound traffic for all roles in the network and different architectures. As expected, there is a general linear correlation between throughput and outbound traffic for all roles. Thakkar et al. [42] already measured the upload rate of a peer to be approximately 2.5 MB/s (and the download rate 0.5 MB/s) in their Fabric network. Regarding the upload rate of peers, we arrive at a similar order of magnitude for equally high throughput.
By contrast, we found the upload rate of orderers to be more heterogeneous, and it can become very large. More precisely, the RAFT leader requires a very high upload speed when the ordering service has many nodes. For n=64 orderers, for example, we observed an upload rate of more than 350 MB/s (recall that the maximum performance was independent of the number of orderers for up to 64 orderers, so upload is still not the bottleneck, at least for deployment within a single data center with high networking capabilities). This is plausible, as the crash fault-tolerant consensus mechanism RAFT that Fabric uses for the ordering service has a two-phase commit. Thus, the complexity of network traffic, i.e., the number of sent messages, is on the order of n(n-1), and the leader needs to be involved in all of these messages. For the other orderers, the outbound traffic is one order of magnitude smaller. The charts in the second row of Figure 19 illustrate that the upload speed of non-leading ordering nodes mainly depends on the number of peers in the network, as well as on the number of endorsers per transaction, both of which make sense because the orderers need to distribute new blocks to the peers, and transactions are larger when more endorsements (signatures) need to be collected.
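The growth of the leader's outbound traffic with the ordering service size can be illustrated with a back-of-the-envelope model. This is our own simplification, not Fabric internals: in a two-phase replication, the leader sends each block to every one of the n-1 followers in each phase.

```python
def leader_upload(n_orderers, block_bytes, blocks_per_second):
    """Rough estimate of the RAFT leader's outbound traffic in bytes/s:
    the leader replicates each block to all n-1 followers over two phases.
    Simplified model for illustration, not actual Fabric behavior."""
    followers = n_orderers - 1
    return 2 * followers * block_bytes * blocks_per_second

# The leader's upload grows linearly with the number of orderers, so large
# ordering services concentrate bandwidth demand on the current leader,
# while each follower only forwards blocks to its associated peers.
```

Whatever the exact constants, the model captures the qualitative observation above: leader upload grows with n, while followers see an order of magnitude less outbound traffic.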
The third and fourth rows of Figure 19 suggest that this observation also holds for the upload requirements of peers and clients. Moreover, for the clients, the linear interrelation between outbound traffic and maximum throughput is clear. The non-leading orderers' upload speed requirements are often about twice those of the peers, which makes sense because, in our default scenario, there were twice as many peers as orderers. Moreover, the clients have only a very small requirement on outbound network speed. Please refer to Figure 19 for an overview of the results.

5.6 Robustness

5.6.1 Temporal distribution of requests. We tested different temporal distributions of the requests (i.e., jitter). As illustrated in Section 4, the DLPS sends transaction requests highly uniformly by default. We modified this to a step-shaped distribution to check for the queuing system's sensitivity and efficiency. Here, clients send transactions at the beginning of each second at a fluctuating rate with notably more or fewer transactions per second (Δf ≤ f/2). In this scenario, we did not notice a considerable deterioration of maximum throughput and latency. This suggests that as long as queues do not become too large, the queuing process of Fabric is efficient.

5.6.2 Node crashes. As soon as a system transitions from testing to productive usage, its resistance and resilience against failures become extremely relevant. By operating multiple peers within one organization on physically separated nodes and by using a blockchain per se, the negative impact of crashes and attacks in terms of data loss is already notably mitigated. Naturally, however, the impact of single nodes' failures on overall performance is also very important, since it might take some time until a failed node is reset and re-synchronized. Owing to the different roles in the system, we expect different consequences of failures. We only look into crashes here, because malicious attacks need sophisticated and specialized
implementations and, since we are in a private permissioned network, can also be traced back to the responsible parties and therefore disincentivized. We therefore recommend this topic for future work. Moreover, since we always used enough clients to saturate the system and clients can easily be replaced on short notice (no need for synchronizing), we do not look further into clients' crashes. What remains, therefore, is looking at crashes of orderers and peers. Since the recommended ordering service is currently RAFT, which is crash fault-tolerant, we expected that crashing a single orderer does not significantly impact performance. Figure 20 illustrates the results. To check this, we set the network under stress at 400 tx/s, which is close to maximum throughput (and hence maximum CPU utilization). After 30 seconds of sending transactions, we crashed a single orderer and continued sending requests at the same rate for an additional 30 seconds. We see that, overall, the impact of crashing an orderer is indeed limited. However, it makes a considerable difference whether the crash affects the current RAFT leader or a non-leading orderer: in the case of crashing the leader, the ordering service stops distributing new blocks for around 5 seconds and resumes with the previous speed thereafter with a newly elected leader (Figure 20, chart on the left). If a non-leading orderer crashes, the impact on performance is negligible (Figure 20, chart in the center). If a single peer crashes, the performance drops by the rate of transactions that needed the respective peer as an endorser. However, this is only the case because we restricted clients to requesting endorsements from a fixed set of peers, which contains exactly as many peers as the endorsement policy requires. In this case, we used our default configuration with 4 orgs, each associated with 2 peers, and an endorsement policy requiring 2 endorsements for every transaction. Consequently, every peer
participates in 1/4 of all transactions, which explains the drop in throughput by 25 % after t=30 seconds. In a production-grade Fabric network, one would likely provide at least a few more peers to each client to compensate for crashes, so that transactions would not fail. However, the shift of the endorsement workload to another peer might decrease maximum throughput accordingly, namely to that of a Fabric network without the crashed peer.
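The 25 % drop follows from simple counting, assuming clients pick endorsers uniformly from a fixed minimal set as in the crash experiment above:

```python
def crash_throughput_loss(n_orgs, peers_per_org, endorsers_required):
    """Fraction of transactions that lose an endorser when one peer crashes,
    assuming every transaction needs endorsers_required endorsers drawn
    uniformly from all peers (simplifying assumption matching the
    fixed-endorser-set experiment described above)."""
    total_peers = n_orgs * peers_per_org
    return endorsers_required / total_peers

# Default setup: 4 orgs x 2 peers, endorsement policy OutOf(2, 4):
# each peer participates in 2/8 = 1/4 of all transactions, so crashing
# a single peer removes 25 % of the throughput.
loss = crash_throughput_loss(4, 2, 2)   # -> 0.25
```

With spare endorsers available to clients, this loss turns into a redistribution of workload instead of failed transactions, as noted above.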

DISCUSSION
Fabric is a highly customizable permissioned blockchain framework, allowing enterprises to adjust the network architecture to the requirements of their use case. While this flexibility allows for many optimizations, it also leads to complexity and requires in-depth knowledge of Fabric's design options and parameters. Together with existing research, this paper should help to better understand which metrics are particularly important when setting up a Fabric-based application. In general, we were able to reproduce many results from existing work. We extended the understanding of Fabric by using a benchmarking framework that is built on precise definitions of key metrics and by testing many yet unexplored additional settings in a structured way. For example, we build upon the findings of Androulaki et al. [1] regarding the effect of network delays on the throughput of Fabric, but extend their result by comparing three different setups (no delay, continental, and intercontinental). Consequently, we showed how the DLPS can be used to test a wide range of variables, evaluating a blockchain-based system's potential performance prior to implementation. Table 5 summarizes our findings by each impacting factor.

Architecture
Number of organizations, peers, and orderers The number of orderers does not influence overall performance in the regime of 1,000 tx/s and below. Adding peers to small networks while keeping the endorsement policy constant improves performance. The number of organizations has only a limited impact on small networks (≤ 32 orgs). However, its effect increases with bigger networks due to how gossip dissemination works.
Endorsement policy A stricter endorsement policy (a higher number of endorsers per org), ceteris paribus, reduces total throughput. It is possible to balance a stricter endorsement policy by introducing additional peers to keep throughput stable.

Number of channels
The number of channels has only a minimal effect on the performance of the system.

Database location
The database location has a very limited effect on the performance of the system.

Setup
Hardware Performance scales well with better hardware for up to 8 vCPUs. However, the impact of additional hardware diminishes for significantly larger numbers of vCPUs.

Database type
The database type has a high impact on the performance of the system. Depending on the actual setup, LevelDB is up to three times faster than CouchDB.
Block parameters A block time of around 0.5 s yields a sweet spot. Increasing block time or block size, respectively, brings only limited performance benefits but increases block latency. Below 0.5 s, throughput decreases considerably.

Business Logic
Private data Public transactions are faster than private ones by a factor of around three for CouchDB and around two for LevelDB.

I/O-heavy workload
Once the transaction payload is beyond 1 kB, the performance decreases rapidly.
CPU-heavy workload CPU-heavy Node.js smart contracts work as fast as native implementations.
The endorsement policy, however, significantly influences performance, as the redundancy of computation rises with the number of required endorsers.
Reading vs. writing Reading scales linearly with the number of peers, provided the client trusts the peer and no endorsement is needed. The performance of reading and writing commands does not depend on the index size.

Network Delays
The impact of network delays is very low, even for an intercontinental network. The influence on private data is marginally stronger than on public data.

Bandwidth
The bandwidth requirements rise proportionally to the number of nodes. In the RAFT setup, the leader node demands comparatively high upload bandwidth as the number of orderers increases.

Robustness
Node crashes
Fabric is very robust with regard to crashes. A crashing peer does not influence the overall network, despite the loss of its endorsement power. If a RAFT leader crashes, it takes about 5 s for the system to elect a new leader and continue normal operations.

Temporal distribution of requests
Small deviations in the distribution do not impact the performance of the system. Peaks beyond the maximum sustainable throughput can lead to undesirable congestion effects.
Table 5. Results of the benchmarking efforts by impacting factor.
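Several rows of Table 5 (endorsement policy, CPU-heavy workloads, peer crashes) share one underlying effect: every transaction is executed once per required endorser, so endorsement work is redundant. The following back-of-the-envelope model, a sketch of our own with purely illustrative numbers, shows why a stricter policy can be balanced by adding peers in proportion:

```python
def endorsements_per_peer(tx_rate: float, endorsers_per_tx: int,
                          total_peers: int) -> float:
    """Each transaction is executed once per required endorser, and that
    redundant work is spread uniformly over all available peers."""
    return tx_rate * endorsers_per_tx / total_peers

base = endorsements_per_peer(400, 2, 8)       # default policy, 8 peers
strict = endorsements_per_peer(400, 4, 8)     # stricter policy, same peers
balanced = endorsements_per_peer(400, 4, 16)  # stricter policy, 2x peers
print(base, strict, balanced)  # 100.0 200.0 100.0
```

Doubling the required endorsers doubles the per-peer load, and doubling the peer count restores it, which is consistent with the observation that throughput can be kept stable by adding peers under a stricter policy.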
Figure 21 summarizes our measurement results. We see that the maximum throughput heavily depends on the type of transactions (reading operations, CPU-heavy transactions, I/O-heavy transactions, and simple write transactions) and the type of hardware. For homogeneous hardware (m5.large, on which we conducted most experiments), there is a clear correlation between maximum throughput and CPU utilization across all highly heterogeneous deployments. We see that both depend heavily on the kind of database used (LevelDB achieves higher throughput), the type of transactions (private transactions achieve lower throughput), and the network size (large Fabric networks have lower throughput). Therefore, these parameters should be given particular attention whenever conceptualizing the network architecture for a use case with higher performance requirements. Kannengießer et al. [22] describe various trade-offs that developers have to face when employing blockchain systems.
Our article investigates some of the described trade-offs and provides additional metrics to quantify them in the case of Fabric. In particular, our measurements of different smart contract methods, e.g., varying the complexity of matrix multiplications and the size of transactions, quantify the trade-off between transaction validation speed and complexity of operations. Similarly, by investigating private and public transactions in Fabric, we also quantify the trade-off between confidentiality and performance derived in their paper. Finally, our various performance measurements on different network sizes and topologies and with varying endorsement policies quantify the dependency of performance on the degree of decentralization and, thus, on security and availability. Since our experiments demonstrate that the ordering service is not the bottleneck in the investigated architecture, the trade-off between performance and security was hardly present.
The solo orderer, which lacks any crash or Byzantine fault tolerance, provided about the same overall performance as the crash fault-tolerant RAFT ordering service. It will be interesting to see whether using a Byzantine fault-tolerant ordering service, which will be provided in the future, will have any impact.
We focused on a subset of interesting factorial combinations, as the tremendously high degree of freedom of a Fabric network makes it infeasible to test all possibilities. We settled on a standard configuration and then changed single parameters to identify their influence on performance. This approach comes with the restriction that when moving too far away from our testing scenario, the results might differ from ours. For example, Thakkar and Nathan [41] suggest that some characteristics might change for very strong servers. Therefore, this work is to be understood as an orientation for the potential of Fabric, but not as a strict reference for all possible cases. Hence, we suggest conducting specific evaluations.

Fig. 7. Different architectures in comparison. The configuration for the default setup is described at the end of Section 4.

Fig. 10. The effect of separating the ordering nodes and the database for CouchDB.

Fig. 13. Comparison of different block times and block sizes.

Fig. 16. Network topologies and corresponding network delays (one-way) used for determining the impact of network topology on maximum throughput.

Fig. 20. Impact of crashing leading or non-leading orderer nodes and peers (at t=30 s) on performance.

Fig. 21. Summary of all measurements and the overall most important design parameters.

The fully automatic setup with the DLPS takes at least about 10 minutes, and a reasonable test takes about an hour; reducing the duration of a test increases the variance of the results and tended to overestimate the true performance in our tests. Modifying network parameters generally requires restarting the blockchain completely. Consequently, we decided to use a small network (4 orgs, each with 2 peers, one orderer, and 4 clients) with AWS m5.large instances as the default, and to vary individual or small subsets of parameters starting from this default to keep costs and time bounded. Figure 5 illustrates the most important remaining default parameters for the Fabric architecture, and Figure 6 gives an overview of the benchmarking settings. Therefore, our default configuration comprises 8 peers and 4 orderers, as well as 16 clients, in a one-channel network with RAFT consensus. At the start of our experiments, the latest Fabric version was v2.0, so we conducted all experiments with this version. However, we also made some spot checks when v2.2 was released, noticing no significant performance changes. The remaining parameters are described in detail in the dedicated DLPS repository [13]. In total, our experiments involve approximately 2,000 hours of testing, setting up approximately 1,500 Fabric networks with a total of around 20,000 nodes and 40,000 clients, and sending more than 200 million transactions. In this process, we also collected 100 GB of log files, including sending and response times of each transaction and resource stats such as CPU, memory, disk usage, ping, and traffic for each node and client.
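The one-factor-at-a-time methodology described above can be sketched as follows. This is a hypothetical illustration, not the actual DLPS API: each experiment series starts from the default configuration and varies exactly one parameter.

```python
# Default configuration as described in the text (names are our own).
DEFAULT = {
    "orgs": 4, "peers_per_org": 2, "orderers": 4, "clients": 16,
    "database": "LevelDB", "block_time_s": 0.5, "instance": "m5.large",
}

def sweep(parameter, values):
    """Yield one full network configuration per value, keeping every
    other parameter at its default."""
    for value in values:
        config = dict(DEFAULT)
        config[parameter] = value
        yield config

# One experiment series: vary only the state database.
runs = list(sweep("database", ["LevelDB", "CouchDB"]))
print(len(runs), runs[1]["database"], runs[1]["orgs"])  # 2 CouchDB 4
```

The drawback noted in the discussion follows directly from this design: interactions between two parameters that are both far from their defaults are never observed.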

Table 4. Used instance types in the AWS m5 series. They are all based on the Intel Xeon Platinum 8175M processor (up to 3.1 GHz), and we added 16 GB of SSD storage. As the operating system, we used Ubuntu 18.04 LTS. Source: [2]
Fig. 11. Different instance types in comparison for simple public and private transactions with CouchDB and LevelDB.
Fig. 12.Using multiple channels with varying hardware for simple public with CouchDB and LevelDB.
Fig. 17. Maximum throughput for single-datacenter, cross-European, and intercontinental Fabric networks.