Taxonomy of Centralization in Public Blockchain Systems: A Systematic Literature Review

Bitcoin introduced delegation of control over a monetary system from a select few to all who participate in that system. This delegation is known as the decentralization of controlling power and is a powerful security mechanism for the ecosystem. After the introduction of Bitcoin, the field of cryptocurrency has seen widespread attention from industry and academia, so much so that the original novel contribution of Bitcoin, i.e., decentralization, may be overlooked because decentralization is assumed to be fundamental to the functioning of such cryptoassets. However, recent studies have observed a trend of increased centralization in cryptocurrencies such as Bitcoin and Ethereum. As this increased centralization has an impact on the security of the blockchain, it is crucial that it is measured so that it can be adequately controlled. This research derives an initial taxonomy of centralization present in decentralized blockchains through rigorous synthesis using a systematic literature review. This is followed by iterative refinement through expert interviews. We systematically analyzed 89 research papers published between 2009 and 2019. Our study contributes to the existing body of knowledge by highlighting the multiple definitions and measurements of centralization in the literature. We identify different aspects of centralization and propose an encompassing taxonomy of centralization concerns. This taxonomy is based on empirically observable and measurable characteristics. It consists of 13 aspects of centralization classified over six architectural layers: Governance, Network, Consensus, Incentive, Operational, and Application. We also discuss how the implications of centralization can vary depending on the aspects studied. We believe that this review and taxonomy provide a comprehensive overview of centralization in decentralized blockchains involving various conceptualizations and measures.


Introduction
Since the introduction of Bitcoin in 2009, blockchain technology has seen a proliferation of scholarly articles investigating the potential and limitations of the technology [1,2,3,4,5,6,7,8,9,10]. Control over the system is a focal point in a significant proportion of these studies, as this either enhances or restricts the usability of blockchain [11,12,13,1,14,15,16,17,18,3,6,19,20,21,22]. Indeed, removing central control from the monetary system, and more generally removing trusted entities from a distributed system, makes a public blockchain attractive to numerous potential users in academia and industry [3]. Public blockchain-based cryptocurrencies have a market capitalization of over $200 million [27], making the platform a lucrative target for malicious actors. The majority of these blockchains use decentralization as a security mechanism. In a decentralized system, a malicious actor would need to compromise half of the consensus power before causing significant harm to the system [28]. Because of this interplay between decentralization and security, a high degree of decentralization is highly desirable in public blockchains. The security of public blockchains has been thoroughly investigated in research [23,29,30,31]. For example, Bitcoin has been reported as secure, subject to its adherence to the honest majority assumption, with notable exceptions such as selfish mining attacks [32], where the attacker only needs to control over 26% of the network.
Even though the initial implementation of Bitcoin was able to circumvent the need for centralization in the system, new avenues of centralization are surfacing [12]. Numerous studies have reported various forms of centralization in Bitcoin and other decentralized cryptocurrency systems [16,15,14,12]. These reports of a trend towards centralization have raised security concerns, as the security guarantee of a public blockchain is inherently dependent on the honest majority assumption [13]. Trusting the probabilistic security guarantees of a public blockchain has often been identified as a barrier to entry in the ecosystem [33]. Understanding the security implications of centralization more fully will aid the process of public blockchain adoption.
The threats of centralization range well beyond security into adoption, and even crypto-economics [34]. The decentralized nature of Bitcoin permits the uncensored execution of transactions in the payment system irrespective of political or geographical associations. Centralization may threaten the uncensored nature of the decentralized blockchain. Thus, it is crucial for the security and, consequently, the utility of public blockchain systems that they remain adequately decentralized.
Given the significance of decentralization, several studies have analyzed technical aspects [15,14,12] as well as social constructs of decentralization [16]. By far the most commonly measured aspect of centralization is the concentration of consensus power [16,15,14,12,17]. In a Proof-of-Work based blockchain solution, an individual participant's consensus power is defined by their computational power in proportion to the total computational power of the network. However, this measurement is only useful in determining the present distribution of computational power across the network. It fails to capture the multitude of factors that may constitute the overall centralization of the system, such as system governance [7], wealth concentration [35], and geographic distribution of participants [14].
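The proportional definition of consensus power given above can be illustrated with a minimal sketch. The participant names and hash rates are hypothetical, and the "smallest group holding a majority" count is our own illustrative summary of concentration, not a metric prescribed by the surveyed studies.

```python
from typing import Dict


def consensus_shares(hash_rates: Dict[str, float]) -> Dict[str, float]:
    """Return each participant's share of the total computational power."""
    total = sum(hash_rates.values())
    if total <= 0:
        raise ValueError("total hash rate must be positive")
    return {name: rate / total for name, rate in hash_rates.items()}


def min_entities_for_majority(shares: Dict[str, float]) -> int:
    """Smallest number of participants whose combined share exceeds 50%."""
    running = 0.0
    ordered = sorted(shares.values(), reverse=True)
    for count, share in enumerate(ordered, start=1):
        running += share
        if running > 0.5:
            return count
    return len(shares)


# Hypothetical hash rates in arbitrary units (assumption, not measured data).
example_rates = {"pool_a": 40.0, "pool_b": 25.0, "pool_c": 20.0, "solo": 15.0}
example_shares = consensus_shares(example_rates)
example_majority_size = min_entities_for_majority(example_shares)
```

Such a snapshot only describes the current distribution of computational power, which is exactly the limitation noted above.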
To better understand the semantics of decentralization in blockchain, we intend to measure it on all building blocks of the public blockchain. As reported by Wang et al. (2017) [36], the governance structure of the blockchain can have a profound impact on the operations of a public blockchain but is often overlooked as a potential source of centralization. The issues caused by centralization of governance include the long-discussed issue of block size in Bitcoin [37] and specific instances of unilateral decision making regarding forks in Ethereum [38]. Consequently, we reason that we need a vocabulary to discuss and measure centralization in a more holistic manner.
To allow for such modular measurement of centralization, we review the generic architecture [18] of blockchain and use it to identify potential avenues of centralization, via a literature review of the field. Focusing on the generic architecture enables us to capture centralization-causing factors that are not implementation-specific, i.e., the same model may be used for both Bitcoin and Ethereum. We also use the generic architecture to partition different centralization concerns into architectural categories such as consensus, network, and application. This abstraction allows us to organize and observe centralization holistically. Thus, in this work, we present the first in-depth analysis of centralization in blockchains to assess the following questions:
RQ1: What are the different aspects of centralization in public blockchains?
RQ2: How can centralization be adequately measured in a decentralized blockchain instance?
To study decentralization in blockchain, we coded and analyzed the content of relevant blockchain literature. We covered the ten years following the publication of the original Bitcoin white paper [39]. The survey process was primarily driven by the guidelines provided by Kitchenham et al. (2004) [40]. In adherence to the guidelines, we conducted a five-step systematic literature review consisting of Search, Selection, Quality Assessment, Data Extraction, and Analysis. This systematic literature review produced a final pool of 89 articles. These final articles, partitioned by architectural components, form the basis of the taxonomy proposed in this review.
Following the development of the taxonomy, we interviewed industrial and academic experts in the blockchain domain to establish the completeness of the taxonomy and to assess any redundant or less relevant components of the taxonomy. This consisted of ten expert interviews: four academic researchers and six industry experts. It resulted in an iterative refinement of the taxonomy.
The paper makes the following contributions:
• We systematically review the existing literature to document the different aspects of centralization in public blockchains (Section 3).
• We outline the different techniques employed in the literature to measure centralization (Section 4).
• We manifest the findings of our review in a conceptual taxonomy that encompasses both categorization and measurement of different aspects of centralization in public blockchains (Section 4).
• We illustrate the relevance and utility of this taxonomy by presenting the centralization state of the two most prominent blockchain instances, Bitcoin and Ethereum, based on this taxonomy (Section 5). We also discuss how the adverse impact of centralization varies depending on aspects (Section 6).
• We identify research gaps, specifically with regard to the lack of non-Bitcoin-specific centralization investigations. We also report on the lack of objective metrics for some centralization-causing factors.

Background
The term blockchain is often used as a generic descriptor for the broader field of Distributed Ledger Technologies [41]. Distributed ledger technology refers to the distributed computing networks that record, share, and synchronize data across many participants. More specifically, a blockchain is a type of data structure used to record data on these distributed computing networks. It is a chronologically linked list of blocks, each containing the data packets received by the participants within a predefined time period. These blocks are connected in chronological order to form a chain of blocks. The link between these blocks is secured by a puzzle based on a computationally hard cryptographic hash function [39].
As the chain of blocks grows, the difficulty involved in recalculating the puzzles also grows to make any alteration to past data expensive. This growth in difficulty leads to a deterministic guarantee of data immutability.
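The hash-linked structure described above can be sketched in a few lines. This is a minimal illustration under our own assumptions (JSON-encoded blocks, SHA-256, hypothetical field names such as `prev_hash`) and omits the puzzle itself; it is not the block format of any particular blockchain.

```python
import hashlib
import json
from typing import Any, Dict, List

Block = Dict[str, Any]


def block_hash(block: Block) -> str:
    """SHA-256 digest over a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: List[Block], data: List[str]) -> None:
    """Append a block that stores the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "data": data})


def is_consistent(chain: List[Block]) -> bool:
    """Check that every block references the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Hypothetical payloads standing in for batches of transactions.
chain: List[Block] = []
for payload in (["tx1"], ["tx2", "tx3"], ["tx4"]):
    append_block(chain, payload)
```

Altering the data of an early block changes its hash, so every later link would have to be recomputed, which is what makes tampering with past data expensive once a puzzle is attached to each link.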
The participants of the blockchain-based network have to reach consensus on a single state of this append-only structure. Blockchain-based systems utilize a peer-to-peer distributed system with a clever incentive mechanism [20] to accomplish this consistency of data in an unconstrained distributed environment. Proof-of-Work (PoW) and Proof-of-Stake (PoS) are two prominent examples of consensus mechanisms used in blockchain-based systems. In PoW, the participants are expected to perform computationally expensive operations to solve a puzzle. The first participant to solve the puzzle and propagate the solution to a majority of the network is rewarded. PoW is often criticized for its extensive use of electricity [42]. This issue of electricity usage is addressed in PoS, where the reward distribution is based on the monetary assets of the participants [43]. Other notable consensus algorithms include Proof-of-Authority, Proof of Elapsed Time, and Delegated Proof-of-Stake; we refer the reader to [44] for an in-depth review of consensus algorithms.
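To make the PoW puzzle concrete, the following toy sketch searches for a nonce whose hash falls below a target. The string-based header, the single SHA-256 pass, and the `difficulty_bits` parameter are simplifying assumptions for illustration and do not reproduce the encoding of any real protocol.

```python
import hashlib


def _digest_value(header: str, nonce: int) -> int:
    """Interpret SHA-256(header + nonce) as a 256-bit integer."""
    digest = hashlib.sha256(f"{header}{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def mine(header: str, difficulty_bits: int) -> int:
    """Brute-force a nonce whose digest has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _digest_value(header, nonce) >= target:
        nonce += 1
    return nonce


def verify(header: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution needs one hash, while finding it needs many."""
    return _digest_value(header, nonce) < (1 << (256 - difficulty_bits))


# Low difficulty so the example finishes quickly (about 2**8 attempts on average).
example_nonce = mine("example-header", 8)
```

Each additional difficulty bit doubles the expected search effort, while verification stays a single hash evaluation; this asymmetry is what ties consensus power to computational power in PoW.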
As discussed earlier, based on the type of consensus mechanism deployed and the constraints imposed, we can segment blockchain-based systems into three broad categories: Public, Private, and Consortium. In private and consortium-based blockchain systems, participation in consensus is limited to users approved by a trusted authority. However, in public blockchain systems, participation in consensus is open to any individual with appropriate computing and networking capabilities. This unconstrained access to controlling power for all participants in the network is referred to as decentralization. Bitcoin and other public blockchains establish consensus on the blockchain through a decentralized, pseudonymous protocol. This protocol can be considered a core innovation and possibly the most crucial ingredient to the success of public blockchains [23].

Decentralization and Public Blockchain
Decentralization is an essential property of public blockchain systems, where participants can read and write data and contribute to consensus without authorization [9]. In this subsection, we review the existing discussion around decentralization in the blockchain.
Consensus on the state of data in a public blockchain is attained by the acceptance of a valid block by the network in a predefined time interval. To deter malicious participants from accepting fraudulent blocks, the majority of the control must be decentralized. This decentralization of control ensures that the blockchain is secure from malicious participants as long as the majority of the network remains honest. This interplay of security and decentralization makes it fundamental that the system remains decentralized.
A survey paper by He et al. (2017) [10] identifies decentralization, among other features, as a prominent reason to adopt blockchain technology for business applications. This view is supported by numerous studies which demonstrate the application of decentralized blockchains to the liberalization of financial asset management [45], the Internet of Things [46,47], healthcare [48], and smart cities [49]. The extent of literature surveyed by these review articles demonstrates the significance of decentralization in blockchain applications.
As decentralization is core to the secure functioning of public blockchains, it may be taken as a fundamental given. This assumed association between decentralization and public blockchains may be a vulnerability that malicious actors attack. Security research on the blockchain has focused on the assumption of an honest majority. A survey paper by Li et al. (2017) [50] identifies the centralization of consensus power as a significant security threat to that network. Centralization of consensus power is intrinsic to attacks on the public blockchain, such as the 51% attack [51] and Selfish Mining [32].
In the 51% attack, the attacker is assumed to have gained control of more than half of the consensus power, which can then be used to enter fraudulent transactions in the blockchain. Unlike the 51% attack, in selfish mining the attacker only needs to control 26% of the consensus power to cause harm to the network [13]. More detail on the security of blockchains is provided in Zhang et al. (2019) [18].
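Why control of more than half of the consensus power is decisive can be illustrated with the random-walk argument from the Bitcoin white paper [39]: an attacker holding a fraction q of the consensus power, who is z blocks behind the honest chain, eventually catches up with probability (q/p)^z when q < p = 1 - q, and with certainty otherwise. The sketch below encodes only this simplified model; the function name is ours, and the model does not cover strategies such as selfish mining.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Probability that an attacker ever catches up from `blocks_behind` blocks.

    Simplified random-walk model: q = attacker share, p = 1 - q (honest share).
    """
    if not 0.0 <= attacker_share <= 1.0:
        raise ValueError("attacker_share must lie in [0, 1]")
    if blocks_behind < 0:
        raise ValueError("blocks_behind must be non-negative")
    honest_share = 1.0 - attacker_share
    if attacker_share >= honest_share:
        return 1.0  # with half or more of the power, success is certain
    return (attacker_share / honest_share) ** blocks_behind
```

Under this model the success probability decays exponentially in the number of blocks for any minority attacker, but the decay disappears entirely once the attacker's share reaches one half.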
Studying blockchain from a solely technical perspective may be misleading due to the inherent socio-technical nature of the blockchain [52]. As the study of centralization in public blockchains is still fragmented, current conceptual models, such as security and privacy models, do not provide adequate insights. To overcome this limitation, we devise a novel centralization taxonomy focusing on the different architectural layers of blockchain to categorize centralization concerns. We employ a two-step research approach, first conducting a systematic literature review to construct a taxonomy of centralization, and then refining it further through expert interviews.

Architecture of Public Blockchains
The first public blockchain, Bitcoin, incorporated the blockchain data structure and consensus mechanism in depth, but omitted any formalization of the networking structure [39]. Since the introduction of Bitcoin, numerous attempts have been made to describe the structure of public blockchains more formally.
Some of these attempts have been aspect-specific, with a microscopic focus on one or a few components of the blockchain. For example, Garay et al. (2015) [53] describe the architecture of blockchain in terms of consensus mechanisms and participants of the network. Another notable description of blockchain architecture is given by Gervais et al. (2016) [54], who focus on security and scalability by describing consensus and a peer-to-peer network.
Since the aim of our review is to analyze public blockchains more holistically to capture the factors causing centralization, we adhere to a more generic description of blockchain used by Zhu et al. (2019) [47] and Zhang et al. (2019) [18]. In this generic description, the authors propose a layered architecture of blockchain. As a blockchain is a peer-to-peer distributed network, it is intuitive that blockchain systems will share many similarities with a generic, distributed computing architecture, such as the traditional OSI layered model of a network [55]. This layered architecture, illustrated in Figure 2, describes how the data is stored (Data Layer) and shared (Network Layer) between different participants of the network. Once the data is shared with peers in the network, the network is tasked with agreeing on a single view of the data (Consensus Layer). Public blockchains attain consensus in the network by incentivizing non-malicious participants using an incentive mechanism (Incentive Layer). Incentive and consensus operations are performed by the execution of computational scripts (Contract Layer). The computational capabilities of a blockchain are not limited to these two operations; many different applications can be built on top of the blockchain, such as cryptocurrencies and decentralized applications (DApps) (Application Layer) [56].
In the following subsections, we describe these layers in depth.

Data Layer
The data layer contains the definition of the data structure used by the system, including how transactions are stored, thus encompassing the transactions component proposed by Bonneau et al. (2015) [23]. Other data layer components include the cryptographic primitives employed on the blockchain. The network participants must adhere to the data layer specifications to participate in the network, i.e., use the same protocol to communicate. Application layer blockchain clients implement these specifications for the end-user.

Network Layer
The network layer specifies the behavior of the nodes (network participants) in a distributed network. This behavior includes the network connection establishment and intercommunication mechanism. The network layer is responsible for the discovery of other nodes on the network and for efficient communication among nodes. The network layer serves as the information dissemination mechanism of the system. This network layer is identical to the network subsystem in the structure proposed by Judmayer et al. (2017) [19].

Consensus Layer
Once the participating nodes are connected in a predefined topology, the next step is to generate blocks to contribute to the growing ledger. As all the participating nodes are tasked with the creation of the next block, it is crucial that the network can agree on a single state of the ledger. The aim of the blockchain network is to deterministically agree on a single state of the data. The consensus layer assures that the network reaches a consensus with a certain degree of assurance.

Incentive Layer
This deterministic assurance is based on the assumption of an honest majority, i.e., more than 50% of the network's participants are non-malicious. Blockchain systems use incentive engineering to ensure that the majority of the network is honest [13]. This incentive is often in the form of a block reward, which is assigned to the node that successfully adds a new block to the blockchain. The incentive layer describes the mechanism used for the issuance and distribution of rewards. This layer acts as an interface between the user-facing layers and the technical implementation layers.

Contract Layer
To process transactions in the network, Bitcoin uses a scripting language called Script [57]. This scripting language is significantly limited in terms of functionality, as it lacks Turing completeness [58]. One example of this is the lack of loops in Script. Despite the lack of such functionality, the scripting language serves as the building block of the Bitcoin cryptocurrency, enabling complex financial transaction processing.
The limitations on the scripting language of Bitcoin served as a motivation for Ethereum's developers [59]. Ethereum implements a Turing-complete computing engine on top of a distributed blockchain. Applications on top of the blockchain exploit this programmable nature of blockchain.

Application Layer
Public blockchains provide a mechanism that can be used to interact with and run user-defined code on the computing engine provided by the contract layer. The JSON-RPC API, typically served over HTTP, is an example of one such public API provided by Ethereum [60]. These public APIs serve as an interface between the blockchain and different broker-dealer services such as wallets and exchanges. These services are primarily used by end-users to interact with the blockchain [61].

Methodology
In this section, we describe the research methodology employed for our systematic literature review (SLR) of blockchain, through which we sought to provide a more cohesive overview of centralization in public blockchains. We follow the SLR guidelines proposed by Kitchenham et al. (2004) [40] to identify the factors associated with centralization. We then use a classification scheme based on the generic architecture presented in Section 2.1 to map the identified factors and associated measurement techniques. This mapping is loosely based on the approach proposed by Petersen et al. (2008) [62]. The mapping of obtained data to the generic architecture produces an initial taxonomy, which we then refined by conducting ten expert interviews. This process is graphically illustrated in Figure 3.

Systematic Literature Review
The systematic literature review guidelines suggested by Kitchenham et al. (2004) [40] span four phases:
• In the first phase, we define the two primary research questions for the review and produce relevant keywords for the subsequent search.
• In phase two, we systematically extract relevant articles from leading research repositories. We filter the resultant articles through a manual review of titles and abstracts.
• In phase three, the shortlisted articles are used for data extraction, which is driven by an extraction protocol.
• In phase four, we map the data extracted in phase three to the generic architecture presented in Section 2.1, leading towards an initial taxonomy of centralization in public blockchains.
Figure 4 illustrates the literature review employed in the study in more detail. The primary aim of our review is to provide richer insight into the different types of centralization present in public blockchains. We also identify techniques used to measure these aspects of centralization quantifiably. This will inform the development of our initial centralization taxonomy of public blockchains. We define the research questions of our study as follows:
• RQ1: What are the different aspects of centralization in public blockchains?
• RQ2: What techniques are employed to measure these centralization aspects?
Regarding RQ1, if a paper presented a novel centralization-causing factor, it is mapped to the architecture. If our generic architecture cannot accommodate the identified factor, we modify the architecture. This process is repeated for every novel factor identified. If a paper identified a factor already present in our taxonomy, we retain the reference to the article, using the number of articles as a proxy for the significance of that particular factor.
For every identified factor, we also recorded any measurement technique used to quantify the factor. If multiple papers employ different measurement techniques for a single factor, we retained all measurement techniques.
These research questions form the basis of article identification and selection, as they define the relevance of a particular article to our review. As we aim to capture factors from different socio-technical aspects of the blockchain, we conducted an exhaustive search on the following leading digital repositories: Google Scholar, ACM Digital Library, IEEE Digital Library, ISI Web of Science, Science Direct, Scopus, and Springer Link. These repositories provided us with access to a wealth of articles, including gray literature.
Having identified the search repositories, we formed the search query. We adopted a systematic approach to keyword generation to form the search query:
1. Initial set of keywords: We formulated an initial set of keywords for the search consisting of "Blockchain" and "Centralization" with the following synonyms and alternate words:
Blockchain: bitcoin, ethereum, blockchain, cryptocurrencies, cryptocurrency, distributed ledger, DLT, Merkel tree, smart contract platform, tokenized asset.
Centralization: centralisation, centralism, consolidation, decentralisation, decentralization, devolution, dominating, domination, managed, monopolisation, monopolization, monopoly, singular, unipolar.
2. Text Corpus Creation: Complementary to the initial set of keywords, we also reviewed existing studies on centralization to extract more relevant keywords. We selected the two most cited relevant studies from Google Scholar [14,12]. We performed forward and backward snowballing on these two articles and generated a list of the most used keywords from this set. We selected the top 5 keywords from this set. This led to the inclusion of "digital currency" and "oligopoly" in our initial set of keywords.
The queries resulting from the query formation step are presented in the Appendix.
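As a rough illustration of how two keyword sets can be combined into a single search string, the sketch below joins the synonyms of each concept with OR and the two concepts with AND. The Boolean syntax and the helper names are assumptions, only a subset of the keywords is shown, and the queries actually submitted to each repository are those given in the Appendix.

```python
from typing import Iterable, List


def or_group(terms: Iterable[str]) -> str:
    """Quote each term and join the synonyms of one concept with OR."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups: Iterable[str]) -> str:
    """Require one match from every concept by joining the groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


# Illustrative subset of the keyword sets listed above.
blockchain_terms: List[str] = ["blockchain", "bitcoin", "ethereum", "distributed ledger"]
centralization_terms: List[str] = ["centralization", "decentralization", "monopoly", "oligopoly"]
query = build_query(blockchain_terms, centralization_terms)
```

Individual repositories differ in their query syntax, so a string of this form would typically need adapting per source.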

Phase 2: Article Search and Selection
Given that decentralization is fundamental to a public blockchain, we expect that the search will return a high number of articles. We implement a filtering process to limit the search to relevant articles. We restrict our search to articles published in English after the introduction of Bitcoin in 2009. We refrain from using citation counts as a proxy for quality when filtering articles, as this practice has been questioned in the past [63].
After the execution of a search query, Google Scholar returned the highest number of articles with 4,380 results. However, due to the restrictions imposed by Google Scholar, we can only retrieve the first 1,000 most relevant articles [64]. After applying the language and publication date constraints, we retrieved 982 articles from Google Scholar. We also retrieved an additional 2,737 articles from all other sources, resulting in a total of 3,728 articles. All of these articles were cross-checked to identify duplicate entries. After the removal of duplicate articles, the final set contained 3,572 articles.¹
Due to the high number of articles, we first analyzed the title and abstract to establish relevance. This was based on explicit inclusion criteria. The shortlisted, relevant articles were then scanned further to assign a quality score. These shortlisted articles were assessed for quality with regard to our research questions. To ensure that the assessment process is reliable, we followed the inclusion criteria for title, abstract, and full-text screening. This process obeyed the following inclusion criteria:
1. The paper's title mentions centralization, or any of the synonyms mentioned above, or is potentially relevant to the study of centralization.
2. The abstract is relevant to the identification or measurement of centralization-causing factors.
During the review of the title, we tried to avoid eliminating articles that might have some relevance to the topic of centralization. This relevance was evaluated by the review of the abstract. We excluded articles that did not pass both criteria.
Table 1: Quality Assignment Matrix
Attribute                                   No    Yes
1. Centralization Factor Identified         0.0   1.0
2. Factor Measurement Technique Proposed    0.0   1.0

The first author conducted this analysis. To test for reliability, we performed cross-validation by following Fleiss et al. (1973) [65]. We specifically use the guidelines proposed by Sim et al. (2005) [66] for the calculation of sample size. We select 89 articles with a confidence level of 95% and a margin of error of 10%. This sample contained an equal number of articles accepted and rejected by the first author to eliminate the possibility of sampling only accepted or only rejected articles. The second author was then tasked with the evaluation of these 89 articles based on the guidelines provided above. Results from the cross-validation suggest that both reviewers were in almost perfect agreement over the acceptance and rejection of the articles, with Cohen's Kappa² exceeding 0.8 [67].
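For readers unfamiliar with the agreement statistic used in this cross-validation, the following minimal sketch computes Cohen's kappa for two reviewers from their accept/reject decisions. The function name and label strings are illustrative assumptions, and the sketch is not the authors' actual analysis script.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why the statistic is preferred over the raw percentage of matching decisions.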
Using this process, we retrieved 212 relevant articles for our study. Subsequently, we performed a quality assessment of these articles by conducting a full-text review. We assigned a quality score between 0 and 2 based on the relevance of the article to our research questions. Table 1 outlines the assignment matrix employed for quality assessment.
We reviewed each article on two attributes: 1) factor identification and 2) measurement techniques used. If an article identifies a novel centralization-causing factor, we assign a score of 1.0 for Attribute 1. Articles that do not identify a novel centralization-causing factor, or that refer only to already identified factors, are assigned a score of 0.0 for Attribute 1.³
We follow a similar quality assignment scheme for Attribute 2, where we assign a score of 1.0 for the identification of a novel measurement technique. Articles not proposing or using any existing measurement techniques are assigned a score of 0.0 for Attribute 2.
To ensure that the quality assignment process is reliable, we again perform a similar reliability test but with a smaller data set of 9 articles. We observe that both reviewers (first and fourth authors) agree on eight score assignments, with one score difference for the ninth article. This disagreement is resolved when the article is reviewed by the third author. This filtering process resulted in a set of 89 articles. These articles are used in the third phase of our study: Data Extraction.

Phase 3: Data Extraction
Having identified relevant studies, the next step is to extract relevant data from them. For this purpose, we design a protocol to analyze the articles towards the development of an initial taxonomy of centralization. In this context, we focused on the factors identified and the measurement techniques proposed or used. We reviewed all of the shortlisted articles to create a list of factors and associated measurement techniques. The extracted data from this step serves as a building block for our taxonomy.

Phase 4: Development of Initial Taxonomy
As we aim to structure the findings of the review in an initial taxonomy, we use the data extracted in Phase 3 and map it to appropriate layers in the generic blockchain architecture. We repeat this process for all identified factors; if a factor cannot reasonably be mapped to the existing layers, we typically refine the architecture by including an additional layer. This iterative refinement results in a blockchain architecture specific to the study of centralization. Results from this mapping analysis are illustrated in Figure 5. Out of all shortlisted articles, 63 considered the consensus layer as prone to centralization, the highest reported count for any layer in our survey. This is represented in Figure 5 by the size of the bubble; we discuss these results in more depth in Section 4.
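The mapping step can be pictured as a small bookkeeping routine: each extracted (article, factor) pair is assigned to a layer, and the number of distinct articles per layer gives the bubble sizes of the kind shown in Figure 5. The article identifiers and the factor-to-layer assignments below are hypothetical placeholders; the authoritative mapping is the one reported in Table 2.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def articles_per_layer(
    extractions: Iterable[Tuple[str, str]],
    factor_to_layer: Dict[str, str],
) -> Dict[str, int]:
    """Count distinct articles reporting at least one factor in each layer."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for article_id, factor in extractions:
        layer = factor_to_layer.get(factor)
        if layer is None:
            # Signals that the architecture needs refining, e.g. by a new layer.
            raise KeyError(f"factor {factor!r} is not mapped to any layer")
        seen[layer].add(article_id)
    return {layer: len(ids) for layer, ids in seen.items()}


# Hypothetical mapping and extraction records (assumptions for illustration).
example_mapping = {
    "Consensus Power Concentration": "Consensus",
    "Routing Centralization": "Network",
}
example_extractions = [
    ("A1", "Consensus Power Concentration"),
    ("A1", "Routing Centralization"),
    ("A2", "Consensus Power Concentration"),
    ("A2", "Consensus Power Concentration"),  # duplicate mention, counted once
]
example_counts = articles_per_layer(example_extractions, example_mapping)
```

Counting distinct articles rather than raw mentions keeps a single paper that discusses a factor repeatedly from inflating the apparent significance of a layer.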
To further validate the initial taxonomy and refined architecture, we conducted interviews with industry and academic experts.

Interviews with Experts
The initial taxonomy, as referred to in Section 3.1, is based on the review of existing literature. To raise confidence that the initial taxonomy proposed by the study provides relevant coverage and is accurate, we further refine and validate it by interviewing experts. To identify experts in the blockchain field, we relied on the epicenters of the bibliographic map generated by [68]. We approached 112 active researchers based on their prominence, determined by their location on the bibliographic map. Out of the 112 researchers approached for the study, we received a response from 10 and subsequently interviewed them. We interviewed four academic experts (I1 to I4) and six experts from industry (I5 to I10). Interviews were typically one hour in duration and involved open-ended questions.⁴ These open-ended questions were designed to:
1. Extract the view of the expert on centralization and its significance in their respective field, i.e., security, economics, information systems, and industrial application.
2. If needed, refine the taxonomy and/or the architecture.
3. Validate the generic architecture of the blockchain used in this study (Section 2).
4. Assess the accuracy of the initial centralization taxonomy.
The transcripts of these interviews are available in anonymized form 5 .These transcripts are color-coded based on the relevance of the conversation to factor identification and measurement 6 .

Taxonomy of Centralization of Public Blockchain
In this Section, we map the results of the systematic review, and the interviews with experts, to the initial taxonomy of centralization outlined in Table 2.
As discussed in Section 3.1, this generic architecture is refined to reflect the centralization-related aspects of the blockchain better.To this end, we refined the generic architecture by removing the Data and Contract layers as none of the surveyed articles suggested any centralization aspects for either of these layers.As can be seen from Table 2, on average two centralization factors were identified for each resultant layer.As is also presented in the table, there are some factors for which there are no proposed measurement techniques (for example 'Wallet Concentration').We also note that the existing generic architecture was unable to capture governance-related aspects of the blockchain system.For example, as blockchain systems evolve, it is crucial to have a mechanism to handle improvements such as security patches of the system.We account for the governance-related aspects of centralization by including a Governance Layer.
Another set of centralization-causing issues that the generic architecture does not capture is associated with the operation of a node on the network. These issues include the computational requirements for participation, such as proprietary hardware and storage. In accordance with the recommendation of interviewee I 10 , we include an Operational layer to represent the centralization associated with operating as a node on the blockchain.  2 from the perspective of 'prevalence-of-occurrence' in the literature and the interviews, where prevalence is considered as a proxy for whether the factor is "established" or not. The literature references in the table identify that particular factor as a potential source of centralization 7 . The interviewee identifiers are used to indicate explicit recognition of the factor as a contributor to centralization in the associated interview. Interestingly, based on the data presented in this table, most of the factors can be considered well established, with the possible exception of Bandwidth Concentration and Routing Centralization. Even though Node Discovery Protocol Control was only referred to by one academic article, the majority of interviewees perceived it as a relevant factor.
Based on our taxonomy, we define centralization of public blockchains as the process by which one or more architectural dimensions (aspects) of the blockchain are restrictive to the majority of participants by direct or indirect economic, social, or technical constraints. We report a total of 13 aspects spread over six architectural layers. The governance layer aims to capture the social constructs of building and maintaining a public blockchain, specifically reporting on the incentives to build (Owner Control) and maintain a public blockchain (Improvement Protocol). The governance layer feeds into the economic aspects of the blockchain in the form of incentives. This is captured by the Incentive layer, where we review the wealth inequality (Wealth Concentration). This inequality is in part caused by the technical constraints of participation, ranging from networking aspects such as bandwidth and routing requirements to operational requirements such as storage and specialized pieces of equipment for participation. These higher storage and specialized equipment requirements restrict participation in the consensus, which is observable in the consensus layer. We also report on the centralization of end-user applications such as wallets and exchanges. The following subsections discuss the taxonomy in detail.

Governance
Blockchain, like any other information system, is subject to evolutionary changes that are governed by a governance structure. These evolutionary changes may include security patches, scalability provisions, and improvement proposals. Wang et al. (2017) [36] theorize the relationship between the value proposition of blockchain and the governance structure in place. They reason that the core value proposition of blockchain is rooted in decentralization. This property of decentralization is considered valuable by investors.
Decentralized governance was also indicated as a vital component of public blockchains by our interview participants. 80% mentioned governance as a significant centralization threat (I 1 ,I 2 ,I 3 ,I 4 ,I 5 ,I 7 ,I 8 ,I 9 ). This is best illustrated by a quote from I 1 , with respect to the implication of a centralized governance structure: "if you are talking about the centralization of governance, that for me is the prime example of a private permissioned Blockchain". Despite the significance of decentralization for blockchain, Wang et al. (2017) [36] argue that a high level of decentralization may slow down the strategic decision-making process. Contrary to the proposition in favor of some centralization by Wang et al. (2017) [36], Gervais et al. (2014) [12] argue against the concentration of decision-making power by pointing out instances of unilateral decision making by core developers in the short history of Bitcoin; for example, when the core developers unilaterally decided to lower the minimum transaction fee. This criticism of governance centralization is shared by Roubini (2018) [69], who criticizes the centrality of control over governance as it may concentrate the decision power in a few entities involved in governance of the blockchain. Atzori (2015) [70] expands the analysis of blockchain governance issues towards the emergence of a blockchain governance oligarchy. Azouvi et al. (2018) [16] conduct an empirical analysis of two of the most prominent blockchain projects, Bitcoin and Ethereum, by comparing the state of governance to other major open-source projects. They conclude that control over governance is usually concentrated in a handful of people in Bitcoin and Ethereum, which is a major centralization factor.
As reported by Wang et al. (2017) [36], the centralization on the governance layer may not be detrimental due to the advantages of rapid strategic decision-making.We expand on the argument in favor of some centralization [36] in Section 6, where we discuss how the adverse impact of centralization varies across the different layers of the taxonomy.
Based on the literature review and subsequent interviews, we further divide the issue of governance into owner control and improvement protocol.These results are presented in Table 4.

Owner Control
As described by Wang et al. (2017) [36], the developers of the blockchain often retain some control over the implementation on the governance level. This can be in the form of, for example, the native cryptocurrency owned by the developers. Wang et al. (2017) [36] describe this as Owner Control.
Measurement Technique: This type of owner control can be measured by examining the total cryptocurrency accumulated by the owners in the early adoption period [71]. This early adoption period also includes the pre-mined 8 cryptocurrency [36]. We report studies such as [71,35] that have implemented a proportional measure to quantify owner control. If the supply is capped, owner control can be measured as the fraction of the total allowed cryptocurrency, as given by Equation 1, where $C_{OwnerControl}$ represents the fraction of total cryptocurrency that the owner controls.
If the supply is uncapped, owner control is measured as the fraction of total currency in circulation, as illustrated in Equation 2.
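The two fractional measures described above can be written compactly as follows. This is a sketch based only on the prose definitions: the symbols $S_{owner}$ (cryptocurrency held by the owners), $S_{max}$ (maximum allowed supply) and $S_{circ}$ (supply currently in circulation) are names assumed here for illustration and may differ from the notation of the original equations.

```latex
% Equation 1 (capped supply): fraction of the maximum allowed supply
C_{OwnerControl} = \frac{S_{owner}}{S_{max}}

% Equation 2 (uncapped supply): fraction of the supply in circulation
C_{OwnerControl} = \frac{S_{owner}}{S_{circ}}
```

In both cases the value lies between 0 and 1, with higher values indicating a larger share of the currency under owner control.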
Most interview participants indicated that the use of fractional measurement for owner control was appropriate. However, I 9 suggested a refinement: "The fractional calculation of the owner control varies with the supply; a simpler approach might be to use a metric such as how much power over the network can be achieved with the money in the owner control. How much hardware can you afford, and what hash power can you get with it. Relating the cryptocurrency to the hashing power would be more informative".
Implication of high owner control: Depending on the consensus mechanism used, owner control can have a severe impact on the network. This adverse impact is particularly worrying in the case of Proof-of-Stake based cryptocurrencies, where the consensus power is determined by the quantity of native cryptocurrency owned by the participant. Having a large amount of cryptocurrency that was pre-mined or accumulated during the early adoption period gives the owner a significant advantage over others, resulting in a more centralized network. This high consensus power poses a security threat, as an owner with over 50% of the consensus power can conduct a double spending attack. Ethereum is a prime example of such wealth concentration due to pre-mined cryptocurrency.
The Ethereum platform was crowdfunded by investors who were rewarded in the form of ETH 9 during the creation of the first block in Ethereum. An estimated 60 million ETH were distributed among the early investors; another 12 million were distributed among the developers of Ethereum [72]. We calculate the value of $C_{OwnerControl}$ by considering the 12 million pre-mined ETH that developers control and the total current supply of ETH (obtained from [73]):

$$C_{OwnerControl} = \frac{12{,}000{,}000}{106{,}514{,}407.78} \approx 0.11$$

It should be noted that the value of $C_{OwnerControl}$ feeds into the issue of Wealth Concentration, which is a significant cause of economic centralization. A high wealth concentration in a cryptocurrency runs against the founding principle and premise of cryptocurrency, namely providing a more even monetary system. This can consequently disincentivize adoption.

Improvement Protocol
As discussed earlier, evolutionary changes require blockchains to have a robust governance structure in place. As decentralized blockchains do not have any authorized entities moderating the changes, the process of moderation is delegated to the participants. Bitcoin improvement protocol (BIP) is a prime example of such an improvement system [74]. A formal protocol, such as that in BIP, is used to establish consensus over proposed changes, often through voting. 60% of interview participants (I 1 ,I 2 ,I 3 ,I 4 ,I 5 ,I 7 ) mentioned that the improvement protocol performs an essential function in the network, with I 7 suggesting: "Whoever controls the improvements will inevitably shape the future of the network".
The literature review points out the similarities between the Python Enhancement Proposals and BIPs, both of which heavily draw from the "canonical" approach to consensus [75].In the "canonical" based BIP, all the suggested changes have to be made available to the public for open discussion.However, the final decision as to how proposed changes will be implemented is taken by the core developers [12].
Measurement Technique: The centralization in a formal voting protocol is measured by analyzing the moderation control. If specific developers or owners moderate the voting, the moderation may jeopardize the changes that developers or owners disagree with. Thus the determination of the control level is done by examining the voting protocol in place and the controls imposed on it. As public blockchains often have an open platform for proposing improvements, such as BIP for Bitcoin and EIP for Ethereum, Azouvi et al. (2018) [16] suggest reviewing the number of improvement proposals made by each author and the respective states of those proposals (i.e., approved, rejected or under review).
The authors also suggest reviewing the comments on each proposal to examine the discussions. Based on the number of proposals per author, complemented by the number of comments per author on the proposals, Azouvi et al. (2018) [16] suggest calculating centrality metrics for the centralization measurement.
These centrality metrics include the mean, median, interquartile range (IQR), and interquartile mean (IQMean). IQR is a measure of variability that assists in locating where the majority of values lie in the data sample. It is calculated as the difference between the 75th and 25th percentiles of the data. However, the IQR describes only the spread of the data, while the plain mean is sensitive to noisy outliers, which can impact the overall result. This can be overcome by using the IQMean, which eliminates the outliers from the data set by calculating the mean of the values that lie within the interquartile range.
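The metrics named above can be illustrated with a minimal Python sketch. The function name `centrality_summary` and the sample counts are hypothetical, and the IQMean here is a simplified variant (plain mean of the values between the first and third quartile) rather than a definitive reproduction of the cited study's computation.

```python
from statistics import mean, median, quantiles


def centrality_summary(counts):
    """Summarise per-author counts (e.g. proposals or comments per author).

    Requires at least two data points, as statistics.quantiles does.
    """
    values = sorted(counts)
    # Quartile cut points using the default 'exclusive' method.
    q1, _, q3 = quantiles(values, n=4)
    # Simplified interquartile mean: average of values inside [q1, q3].
    inside = [v for v in values if q1 <= v <= q3]
    return {
        "mean": mean(values),
        "median": median(values),
        "iqr": q3 - q1,
        "iqmean": mean(inside),
    }


# Hypothetical data: one author with far more proposals than the rest.
proposals_per_author = [1, 1, 2, 2, 3, 3, 4, 50]
summary = centrality_summary(proposals_per_author)
print(summary)
```

In this toy data set the single prolific author pulls the mean well above the median, while the IQMean stays close to the typical author's count; a large gap between the two is one simple signal of concentration.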
Implication of control over improvement protocol: If a subset of all participants moderates the improvement protocol, it will result in control over improvements or modifications to the network. The debate over block size in Bitcoin is an example of an issue arising due to this type of control over the network [76,75]. Other significant implications of control over the improvement protocol include unilateral decision making in both Bitcoin and Ethereum, where the governance structure implemented a change not widely supported by the community. This includes the notable transaction fee reduction in Bitcoin [12] and the Ethereum hard fork following the DAO attack, which led to the subsequent creation of Ethereum Classic [38]. Further incidents of unilateral decision making include the changes to the Ethereum consensus algorithm in 2018, where developers decided to modify the algorithm to disable newer mining hardware [77]. These incidents not only represent the lack of a systematic governance model in terms of improvement but also present a challenge in terms of newer participation and updates. This type of centralization impacts the presumed open nature of the blockchain, which is one of the core contributions of blockchain to the field of financial technologies.

Network
The network layer acts as the information dissemination mechanism for the blockchain instance. As the decentralized network cannot have centralized nodes that act as relay points to transmit messages between the participants, the network is largely a peer-to-peer system. This peer-to-peer network serves as an essential security and usability measure as pointed out by I 8 : "In this peer to peer network, there is no single point of failure and participants can join and leave the network without risking interruption or degradation of the network".
Network connectivity of a node is an important aspect of performing the mining operation [32].Higher network connectivity results in a higher likelihood of adding the next block on the longest chain as the miner can propagate the block to a large number of nodes in the network.This interplay between the reward from adding a block to the blockchain and network connectivity has resulted in networking phenomena such as strategizing networking resource concentration in the form of bandwidth [14] and strategizing geographic distribution of nodes in the network [78,79].
Based on the literature review, we identify another source of centralization on the network layer as the topology formation of the network.This formation includes the node discovery protocol for finding peers in the network [80] and the routing structure of the network [81].The relevant studies identified by our review are presented in Table 5.We describe each of the outlined factors in detail in the following Subsections.

Node Discovery Protocol Control
In a peer-to-peer topology, participating nodes directly communicate with other participants to transmit data packets. A node discovery protocol is used to discover nodes in the network with which to communicate [82]. The node discovery protocol often relies on a set of seed DNS nodes that distribute the addresses of other active nodes on the network. These predefined DNS nodes may be a source of security threats, for example if they become undiscoverable. As new nodes in the network discover others by querying these predefined seed DNS nodes, the literature identifies seed nodes as a contributor to centralization on the network layer [80].
Measurement Technique: After the review of all relevant articles in our study, we conclude that no measurement technique focuses on the node discovery protocol. Studies such as [84,83] investigate the issue of seed DNS nodes from a security perspective, specifically focusing on the single point of failure issue. We reason that further investigation into centralization at the node discovery level is warranted due to the significant security threats that it poses.
Implication of control over DNS: Centralized DNS services are linked to security threats in the network [84]. They also allow the DNS owners to observe the participants of the network. These centralized DNS services can also act as a single point of failure, which is of particular concern in the case of a Denial of Service attack [85]. As core developers select these DNS nodes, the issue of node discovery protocol also feeds into that of trust in the core developers [83]. A malicious developer can also change the DNS seed nodes to conduct an eclipse attack. Several Monte Carlo simulations have shown the effectiveness of such eclipse attacks on Bitcoin and Ethereum [86].

Geographic distribution
Bitcoin and similar cryptocurrencies have been able to gain significant attention from governments around the world due to their decentralized, uncensored nature. This has prompted many to argue that a significant concentration of the nodes in any geographic area may be a threat to the network [79]. This type of geographic concentration may lead to centralization on the network layer as the nodes become prone to geopolitical manipulation. 70% of interview participants indicated that geographic concentration is harmful to the network. I 6 suggested that geographic centralization may be disadvantageous for miners who are not centrally located: "I fear that in a geographically-focused network, people within the same geographic location will have an edge over others, they will receive and send transactions first".
The nodes are distributed over the participating countries in the network.In an ideal case, the distribution of nodes should be equal in all participating countries so as to be able to withstand a geopolitical blockade.Findings from our review suggest there is a trend towards geographic concentration of nodes in both Bitcoin and Ethereum [78,79,87,14].
Measurement Technique: Our review suggests that the geographic location measurement in blockchain can be done by measuring latency in the peer-to-peer network [78,14]. This approach draws heavily from Saroiu et al. (2001) [88], where the authors proposed using latency as a measurement tool in Gnutella. Gencer et al. (2018) [14] first proposed measuring the distance between their geographically distributed nodes and other peers in the network by sending a data packet and measuring the round-trip time. Based on the round-trip time, Gencer et al. (2018) [14] calculated upper and lower bounds between two remote peers in the network. If two nodes take a similar time to respond to the data packet sent by their nodes, it is reasoned that these two nodes are likely geographically close. This approach is further refined by Kim et al. (2018) [78], who consider the average of bounds for final latency estimation.
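The bound-and-average idea described above can be sketched in a few lines of Python. This sketch assumes that the bounds follow from the triangle inequality applied to round-trip times measured from a single observer node; the cited studies may derive their bounds differently, and the function names are hypothetical.

```python
def latency_bounds(rtt_to_a_ms, rtt_to_b_ms):
    """Bound the latency between peers A and B from one observer's RTTs.

    Assumption: latencies obey the triangle inequality, so
    |rtt_a - rtt_b| <= latency(A, B) <= rtt_a + rtt_b.
    """
    lower = abs(rtt_to_a_ms - rtt_to_b_ms)
    upper = rtt_to_a_ms + rtt_to_b_ms
    return lower, upper


def latency_estimate(rtt_to_a_ms, rtt_to_b_ms):
    """Point estimate taken as the average of the two bounds."""
    lower, upper = latency_bounds(rtt_to_a_ms, rtt_to_b_ms)
    return (lower + upper) / 2


# Hypothetical measurements from a single observer, in milliseconds.
print(latency_bounds(80.0, 30.0))
print(latency_estimate(80.0, 30.0))
```

With several observers in different locations, the tightest lower and upper bounds across observers could be kept before averaging, which narrows the estimate.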

Implications of geographical centralization:
The most prominent issue with geographic centralization is the potential for geopolitical manipulation of the network [79]. Other issues with geographic clustering include the possibility of faster transmission of packets to nearby nodes, promoting faster network propagation. This can lead to more clustering, since a participant must propagate the solution to the majority of the network in order to get rewarded in Proof-of-Work based blockchains. If the majority is located in a geographical cluster away from the participant, that may translate to a loss of revenue. As suggested by Gencer et al. (2018) [14], a low number of geographic clusters are considered good for the decentralization of the network. This is due to the association of potentially high block rewards with faster network propagation. As shown in Sapirshtein et al. (2016) [32], network connectivity is directly related to the ability to successfully conduct selfish mining attacks, which can support a double spending attack.

Bandwidth Concentration
In a public blockchain's peer-to-peer network, the network bandwidth often acts as a crucial factor in the successful propagation of data packets.In Proof-of-Work based blockchain, every consensus cycle acts as a race to first calculate the solution to the cryptographic puzzle followed by dissemination of the solution to a majority of the network.Dissemination requires a large number of network connections with peers in the network, thus increasing the bandwidth requirements.This arms race to attain higher bandwidth may lead to the centralization of mining equipment to services like a centralized data center with high bandwidth [14].
Measurement Technique: Gencer et al. (2018) [14] proposed measuring the bandwidth of each peer by requesting a large amount of data and estimating the speed by observing the time taken for the transmission. Once they estimate the speed of each accessible peer, they calculate and cluster the provisioned bandwidth in groups.
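As a rough illustration of this two-step procedure, the following Python sketch converts a timed transfer into a bandwidth figure and assigns it to a group. The function names and the bucket edges (10, 100 and 1000 Mbps) are assumptions made for this example and are not taken from the cited study.

```python
def estimate_bandwidth_mbps(bytes_received, seconds_elapsed):
    """Estimate bandwidth in megabits per second from a timed transfer."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return bytes_received * 8 / seconds_elapsed / 1_000_000


def bandwidth_bucket(mbps, edges=(10, 100, 1000)):
    """Assign a bandwidth estimate to a group (edges are illustrative)."""
    for edge in edges:
        if mbps < edge:
            return f"<{edge} Mbps"
    return f">={edges[-1]} Mbps"


# Hypothetical peer: 12.5 MB received in 10 seconds.
speed = estimate_bandwidth_mbps(12_500_000, 10)
print(speed, bandwidth_bucket(speed))
```

Counting how many peers fall into each group then gives a coarse picture of how provisioned bandwidth is distributed across the network.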
Implication of bandwidth concentration: A high bandwidth requirement may limit the participation to only the participants with significant bandwidth [89].It may also result in a high concentration of networking devices in centralized spaces such as data centers [14].This potential increment in bandwidth requirement may limit the participation to only those entities with high network capabilities making the consensus participation not viable in a domestic setting.The inability to participate in the network violates the open nature of the public blockchain preventing a widespread adoption of the technology.

Routing Centralization
As public blockchain networks run over the existing networking stack, they rely on the networking structure used by IP (Internet Protocol).Centralization present in the networking structure of IP transfers to the blockchain as well.
Our review reports that this centralization has been studied in blockchain from the privacy [90] and security [81] perspectives. Gencer et al. (2018) [14] report concentration at the AS-Level 10 as a source of centralization for a public blockchain. Interestingly, none of the industrial participants mentioned this concern unprompted, suggesting that it might be more of an academic concern than a real-world one. However, when the concern was mentioned, one industry participant agreed.
Measurement Technique: Our review suggests that there is a common network traversing strategy used to determine the network structure from the AS-Level perspective [90,81,14]. To measure the number of ASes in a peer-to-peer network, the observer node traverses the network by recursively collecting IP addresses of each peer and querying every reachable address. This process is repeated until no new reachable nodes are available in the IP list. For the determination of the AS of each IP, Feld et al. (2014) [90] recommend using Maxmind's free Geo API 11 .
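The traversal strategy described above amounts to a breadth-first crawl followed by a per-AS count. The Python sketch below illustrates it on a toy in-memory network; `get_peers` is a hypothetical callback standing in for the actual peer-address request, and the IP-to-AS table is a plain dictionary standing in for a real lookup service. The AS numbers used are from the range reserved for documentation.

```python
from collections import Counter, deque


def crawl(seed_ips, get_peers):
    """Collect every reachable IP by repeatedly asking peers for their peers."""
    seen = set(seed_ips)
    queue = deque(seed_ips)
    while queue:  # stops when no new reachable address is found
        ip = queue.popleft()
        for peer in get_peers(ip):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def nodes_per_as(ips, ip_to_asn):
    """Count how many crawled nodes fall under each autonomous system."""
    return Counter(ip_to_asn.get(ip, "unknown") for ip in ips)


# Toy network and AS table, both hypothetical.
toy_network = {
    "10.0.0.1": ["10.0.0.2", "10.0.1.1"],
    "10.0.0.2": ["10.0.0.1"],
    "10.0.1.1": ["10.0.0.2", "10.0.2.1"],
    "10.0.2.1": [],
}
toy_asn = {
    "10.0.0.1": "AS64500",
    "10.0.0.2": "AS64500",
    "10.0.1.1": "AS64501",
    "10.0.2.1": "AS64502",
}

reachable = crawl(["10.0.0.1"], lambda ip: toy_network.get(ip, []))
as_counts = nodes_per_as(reachable, toy_asn)
print(as_counts.most_common())
```

The share of nodes under the largest few ASes in `as_counts` is then a simple indicator of routing-level concentration.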
Implication of control over ASes: Centralization on AS-Level is reported to have privacy implications for blockchain users as it allows more traceability on a network level [90].This concentration of IP addresses under a few ASes is directly linked with potential network security issues in Bitcoin [81] and Ethereum [14].However, these privacy and security threats remain largely academic with no real world incident reports in our sample set of articles.This is further evident through our interviews, where no academic or industrial experts pointed to control over ASes as a centralization threat unprompted.

Consensus
The consensus layer establishes an agreement on a single state of the data in the public blockchain.As described in Section 2.2, in the case of Proof of Work, it is attained by inducing a race to solve a mathematical problem.The first person to solve and propagate receives a monetary reward as an incentive.The likelihood of finding the solution to the mathematical problem depends on the computational power devoted to the solution.Thus a high concentration of computational power is a direct signifier of centralization in the blockchain.As identified by articles in Table 6, the consensus power distribution is a key contributor to the centralization of the Proof-of-Work based blockchain.Eight interviewees mentioned this aspect unprompted, suggesting that this is a prevalent concern.In this subsection, we review how the literature defines and measures the consensus power centralization.

Consensus Power Distribution
In the case of a Proof-of-Work based blockchain, the Consensus power is also known as the hash power of the miner (participating node).The centralization of hash power can pose a significant security threat to blockchain solutions such as Bitcoin and Ethereum.One key contributing factor to centralization is commercial mining pools.The income from mining operations depends on the probability of finding and propagating the solution of the puzzle before everyone else.The probability of successfully calculating the solution depends on the hash power of the computing device used for the calculation.Lower probability leads to a lack of stable income and may prompt users to mine as a group and share the profit.This group mining is also known as pooled mining [91].Based on the analysis of the shortlisted literature, we report that the concept of pooled mining in itself is not considered a threat to the decentralization of the network; however, the literature is in agreement over the harms of a centrally run commercialized mining pool.In these centrally run mining pools, the pool manager decides which transactions to include in a block and subsequently distributes the workload among participants of the pool.This type of structure requires trusting the manager of the pool thus limiting the decentralization in the blockchain [92].
Measurement Technique: Studies including [13,14,15,93] have deployed an experimental setup to measure consensus centralization. Judmayer et al. (2017) [93] refer to this approach as a "block attribution scheme". In this experimental setup, a participating node connected to the blockchain actively sniffs the network to extract mined blocks and coinbase addresses 12 . The coinbase address is then used to query public blockchain explorers to determine if it belongs to a known mining pool. Based on the results, a list of the mining pools and the proportion of the blocks mined by each respective public mining pool is constructed. Using this approach, we can calculate the proportion of total computational power that each mining pool controls.
This proportion can be represented as a percentage value as suggested by referred articles in Table 6 or by using the Gini values, based on the Lorenz Curve [94,95].
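The attribution step can be sketched as follows, assuming the coinbase addresses of observed blocks have already been collected. The function name `pool_shares`, the address strings and the pool labels are hypothetical, and the address-to-pool table is a plain dictionary standing in for a query to a public blockchain explorer.

```python
from collections import Counter


def pool_shares(coinbase_addresses, known_pools):
    """Return each pool's share of observed blocks.

    coinbase_addresses: one entry per observed block.
    known_pools: mapping from coinbase address to pool name (assumed given).
    """
    labels = [known_pools.get(addr, "unknown") for addr in coinbase_addresses]
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pool: n / total for pool, n in counts.items()}


# Hypothetical observation window of four blocks.
observed = ["addr_a", "addr_a", "addr_b", "addr_c"]
pools = {"addr_a": "PoolA", "addr_b": "PoolB"}
shares = pool_shares(observed, pools)
print(shares)
```

Because blocks are found in proportion to hash power, the share of blocks attributed to a pool over a sufficiently long window serves as an estimate of its share of consensus power.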
The Lorenz curve is a graphical representation of the distribution of wealth. The curve illustrates the proportion of the income earned by any given percentage of the population. This curve has proven to be of significant importance in economic disparity measurement. To numerically describe this distribution, we can use the Gini Coefficient, which is based on the difference between the Lorenz curve and the line of equality 13 . We can calculate the Gini Coefficient as follows:

$$Gini = \frac{A}{B}$$

where A is the area between the line of equality and the Lorenz curve, and B is the area under the line of equality. The value of Gini can range from 0 to 1, where 0 represents complete equality, and 1 represents complete inequality.

Implications of consensus power centralization: The impact of centralization in consensus power has been widely studied in security literature [98,18,54,31,13,32]. A concentration of 26% in a Proof-of-Work based blockchain can lead to successful selfish mining attacks, whereas a consensus power concentration of over 50% can result in a 51% attack.
Smaller cryptocurrencies tend to be more prone to 51% attacks, as evidenced by successful attacks on Aurum Coin, Bitcoin Gold, Ethereum Classic, Flo Blockchain, Monacoin, Verge, Vertcoin and ZenCash [99]. These 51% attacks have, on average, resulted in a loss of $2.5 million per cryptocurrency [99]. The significance of these attacks is evident from the agreement of all our interviewees on the centralization implications of a 51% attack caused by consensus power concentration.

Incentive Layer
Bitcoin and similar decentralized cryptocurrencies are inherently dependent on the economics associated with rewards [13]. Sai et al. (2019) [13] report that the exchange rate of Bitcoin is related to the overall consensus power of the network. If the exchange rate falls below a given threshold of profitability, the participants of the network may withdraw from active mining, which may result in a fall in the overall hashing power of the network. A low hashing power makes it easier for attackers to attain a higher proportion of consensus power; thus it may increase the threat of selfish mining and 51% attacks. This interplay between the monetary aspect of public cryptocurrencies and security makes it essential to inspect centralization on the economy-driven incentive aspect of the network. A high concentration of wealth among a select few may be an aspect of centralization that can prove to be harmful to the network. Attacks such as the Whale Transaction Attack [100] have exploited wealth concentration. In a whale transaction attack, the attacker attempts to induce disagreement 14 between the participants by providing a high transaction fee in an already published block.
The issue of wealth concentration was raised by 60% of our interview participants unprompted. I 7 , for example, noted how they focused on wealth concentration: "In general, I follow the money. If the trail of funds leads to one natural person or group of natural persons (regardless of number of addresses), then the process is relatively centralized along the spectrum of centralized-decentralized blockchain".
Table 7 outlines the result of our review, identifying relevant articles and shortlisted techniques for measurement.In this subsection, we review the centralization based on Wealth Concentration in depth.This type of centralization may be of significance for a blockchain solution that employs a wealth-oriented consensus mechanism such as Proof-of-Stake [101].

Wealth Concentration
High accumulation of native cryptocurrency may give a unique advantage to an adversary.The high wealth concentration can also be used to increase the overall cost of transactions [100], as demonstrated in the iFish attack on the Ethereum network [102].In the iFish attack, the attacker induced a large number of transactions with a high transaction fee in a short period.This influx of high transaction fees resulted in a considerable increase in the transaction fee.Another form of network abuse arising from high wealth concentration involves transaction fee manipulation by artificially increasing the overall fee required for a successful transaction.
Based on the results from our review, we point out that this wealth concentration also has economic impacts on the network. As reported by Kondor et al. (2014) [103], already wealthy nodes in Bitcoin's transaction graph tend to increase their wealth at a higher speed than smaller nodes. They call this phenomenon the "rich get richer" scheme.
Measurement Technique: Wealth concentration measurement is at the center of disparity studies in economics [104]. One of the most commonly used measures is the Gini Coefficient calculated from the Lorenz Curve. The wealth concentration is measured in the form of inequality, based on the population and what proportion of the population controls how much wealth. Translating this directly to the blockchain could mean calculating Gini over a cryptocurrency and all existing addresses on the blockchain. But we argue that this may not be the most efficient way, as techniques such as Hierarchical Deterministic Wallets [105] promote the generation of new addresses for every transaction. To overcome this limitation, Srinivasan et al. (2017) [106] propose establishing a lower bound value on the cryptocurrency contained in the address for inclusion in the measurement, i.e., a wallet with 0 cryptocurrencies may be excluded from the study, as it most likely resembles an inactive address. Another reported measurement technique is to use a percentage measure. However, a simple percentage measure fails to capture the distribution.
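A Gini calculation over address balances with a lower-bound filter can be sketched in Python as follows. This uses the standard discrete formula over sorted values rather than an explicit Lorenz-curve area; the function name `gini`, the `min_balance` parameter and the sample balances are assumptions made for illustration, and the choice of threshold is left open here.

```python
def gini(balances, min_balance=0.0):
    """Gini coefficient of address balances at or above min_balance.

    Returns 0.0 for an empty selection or when all balances are zero.
    """
    values = sorted(b for b in balances if b >= min_balance)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Discrete Gini: 2 * sum(rank * value) / (n * total) - (n + 1) / n
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical balances: two empty (likely inactive) addresses, two active.
balances = [0, 0, 5, 5]
print(gini(balances))                 # empty addresses included
print(gini(balances, min_balance=1))  # empty addresses filtered out
```

The example shows why the lower bound matters: counting empty addresses as part of the population inflates the measured inequality, while filtering them out leaves two equal holders.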

Implications of Wealth Concentration:
Wealth concentration is linked with a number of potential attacks, such as the possibility of a 51% attack by a wealthy attacker during a fall in the exchange rate [13]. The whale attack, as discussed above, is another example of a wealth-oriented security threat to the network. However, neither of these potential attacks has been reported in a real-world incident.
One example of wealth concentration in a real-world attack is the transaction fee price manipulation caused by the iFish attack [102]. During the iFish attack, the attacker was able to artificially inflate the transaction fee of Ethereum by 35%. Another example of a wealth-oriented attack is the bZx hack, where a smart contract designed for lending Ether was exploited by sending high-value transactions and manipulating the platform [107].
A public blockchain with high wealth concentration contradicts the foundational notion of a more even and open monetary system. This has a direct implication on the adoption of the technology.

Operational Layer
The uncertainty of reward imposes a constraint on participation for rational investors. This reasoning is primarily based on the cost of mining [108]. A miner can earn rewards in the form of mining incentives and accumulated transaction fees from the mined block, but to mine profitably on a Proof-of-Work blockchain, the difference between rewards earned and the expenses of the mining operation must be positive. These are the 'operations' to which the 'operational' layer refers. The expenses of mining operations include capital costs, such as the acquisition of adequate hardware, and recurrent costs, such as the cost of electricity.
After conducting the systematic review, we report two types of centralization associated with operational aspects of the public blockchain. The first is the move from commercially available mining equipment to proprietary application-specific integrated circuit (ASIC) machines. The resulting increase in capital and operational cost has proven to be a significant barrier to entry for new miners in Bitcoin [109]. We categorize this type of specialized hardware centralization as Specialized Equipment Concentration.
Another factor that contributes to the cost of mining is the storage requirement for operating on the network. As all full nodes in the network are required to store and process all the transactions, the amount of data stored keeps increasing [110]. This imposes a significant barrier, as traditional computing devices may not be able to participate in the network given high storage requirements. Participation in consensus may thus be limited to those who can afford greater computational resources, resulting in a more centralized network converged on participants with high computational capabilities. This high storage requirement has been discussed as a centralization-causing factor [111,112,113].
In this layer of centralization, interviewee I 10 had an interesting perspective, suggesting a restructuring of the contract layer to widen our definition of the layer to include other operational concerns.
In this subsection, we report the centralization caused by the operational cost involved in participating in the consensus of the blockchain. The results of our systematic literature review are summarized in Table 8.

Size of the Blockchain
Traditional computing devices are often limited in memory capacity and can hold only a constrained amount of data. Attaining a higher storage capacity may prove costly if the growth rate of the storage requirement is significantly high [111]. This growth in requirement may deter non-organizational users, as the investment required may be significant [114], thus prompting centralization of mining effort.
The issue of storage requirement was articulated by 20% of our interview participants. I 10 said: "Nothing really stops blockchains from becoming so large that we will run out of capacity. Personally, I have just experienced the first challenge because my Linux partition ran out of capacity; however, if I bought additional hard-disks, I will still be able to run a full node, but it is getting more expensive to run full nodes".
Measurement Technique: To capture storage-oriented centralization, Raman et al. (2017) [114] suggest using the growth rate as a metric. This growth rate is determined from historical data about the total size of the blockchain. The growth rate can be calculated periodically, ideally after every difficulty recalibration.
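As a minimal sketch of the growth-rate metric described above, the following computes the average growth in gigabytes per day between consecutive size measurements. The function name `growth_rates` and the sample values are hypothetical; the 14-day spacing merely approximates Bitcoin's difficulty recalibration interval of 2016 blocks and is an assumption of this example, not a prescription from [114].

```python
from typing import List, Tuple


def growth_rates(samples: List[Tuple[float, float]]) -> List[float]:
    """Average growth (GB/day) between consecutive (day, size_gb) samples.

    `samples` must be ordered by strictly increasing day; each entry is a
    measurement of the total blockchain size taken on that day.
    """
    rates = []
    for (day0, size0), (day1, size1) in zip(samples, samples[1:]):
        if day1 <= day0:
            raise ValueError("samples must be ordered by strictly increasing day")
        rates.append((size1 - size0) / (day1 - day0))
    return rates


# Hypothetical measurements taken roughly once per recalibration period.
samples = [(0, 200.0), (14, 207.0), (28, 221.0)]
print(growth_rates(samples))
```

Comparing successive rates, rather than a single long-run average, makes it possible to see whether the storage requirement is accelerating.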
I 10 stated their expectations for storage growth rate: "considering that Moore's Law applies to hard drives, it will be interesting to measure the growth rate in comparison with Moore's law".
Implication of high storage requirements: Every blockchain instance may have different storage requirements, based on its implementation. For example, Bitcoin does not pose significant storage issues, as the overall requirement is still low. In contrast, Ethereum has a substantial storage requirement, where the growth rate may limit participation. A growing storage requirement for Ethereum may result in fewer people being able to participate in the network, as the participating nodes on Ethereum are expected to store the code of smart contracts. A low number of participating nodes increases the likelihood of a successful DDoS attack, as an attacker has fewer nodes to overwhelm.

Specialized Equipment Concentration
Proof-of-work based blockchains have seen a surge in the overall computational power of the network [13]. This surge has made it harder to gain a higher proportion of control over consensus and, consequently, over the rewards associated with the incentive. This higher computational requirement has induced an arms race among miners to acquire more efficient and specialized hardware [115]. This type of specialized hardware is often not open source and gives its developers an advantage over others [115].
Sixty percent of our interview participants acknowledged specialized equipment concentration as an issue for a public blockchain. I 7 suggested that this concentration may undermine the whole proposition of public blockchains: ".. but blockchain doesn't live in a vacuum, so really it was/is the externalities (ASICs and other special hardware for example) that threw the biggest spanner in the experiment".
Measurement Technique: Despite the significance of specialized equipment in Proof-of-work based mining operations, there is no existing metric to measure the centralization of hardware. Based on our literature review, we reason that this may be due to the non-public nature of this specialized hardware. As discussed earlier, most of these hardware implementations are not open source and often not available for public use.

Implication of Specialized Equipment Concentration:
As reported by several studies listed in Table 8, specialized equipment concentration may have given commercial entities an advantage over normal users. If this results in those commercial entities becoming dominant, they may use the efficient computing equipment to attain higher consensus power and release it to the public only when it becomes less profitable to operate. This approach to hoarding efficient computing equipment is illustrated as the superhashing power dilemma by Bruschi et al. (2019) [94]. As a result of our review, we suggest that further investigation is warranted into the measurement of specialized equipment and its impact on centralization.

Table 9: Categories of centralization in the Application Layer
Apart from the above reported DDoS attack due to the low number of nodes, the specialized equipment requirement severely constrains participation. This higher barrier to entry, together with the lack of profitability of old hardware, makes it impractical to contribute to the network without significant investment. This lack of involvement has been shown to increase the likelihood of successful selfish mining and double-spending attacks [116].

Application Layer
Users often rely on third-party applications to facilitate their interaction with the blockchain [27]. These third-party applications include reference implementations, wallets, and exchanges [12]. As a result of our review, we report on centralization in these three application layer entities. We also suggest that a monopoly in the user-facing applications for a blockchain instance contributes to the centralization of the blockchain. This issue of centralization in third-party applications was also pointed out by I 8: "If you remember the catastrophe that centralized implementations such as Mt. Gox, Bitfinex have brought to the blockchain world, you can clearly see the desperate need for decentralization in user-facing applications".
Results from our literature review are outlined in Table 9.
This subsection presents the identified centralization-prone application layer entities.

Reference Client Development Concentration
As described in Section 2.2, the data layer definition is implemented by a reference client, which acts as the gateway to the blockchain system. As any client that implements the protocol can become a part of the network, it is desirable from the decentralization point of view to have as many developers as possible working on the reference implementation. Each client is expected to fulfill the specification set out by the core protocol. The development of the core protocol is decentralized by developing an open-source reference implementation. If a select few developers primarily drive the development of the core client, it contributes to centralization [12,16]. The decentralized protocol development factor captures this type of centralization. We note that this centralization is different from the improvement protocol centralization, as the focus here is on the development of a reference client and not on improvements to the protocol.
Despite the reported adverse impact of this type of centralization on the blockchain, in Section 6, we present an argument in favor of some centralization in client development as the developer concentration may be a result of highly skilled developers making useful contributions.
Measurement Technique: [12] suggests examining the number of unique developers contributing to the open-source project together with the number of commits on the main core client codebase. This approach is then extended by [16], who propose using the Satoshi index, which represents the minimum percentage of all contributors required to reach 51% of data contribution.
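The index described above can be sketched directly from its definition: rank contributors by contribution and count how many are needed to reach 51% of the total. The function name `min_contributor_percentage`, the use of commit counts as the unit of contribution, and the sample data are assumptions made for illustration; [16] may define and count contributions differently.

```python
from typing import Dict


def min_contributor_percentage(contributions: Dict[str, int],
                               threshold: float = 0.51) -> float:
    """Smallest percentage of contributors whose combined contribution
    reaches `threshold` (a fraction) of the total contribution."""
    counts = sorted(contributions.values(), reverse=True)
    total = sum(counts)
    if total <= 0:
        raise ValueError("total contribution must be positive")
    running = 0
    for k, count in enumerate(counts, start=1):
        running += count
        if running >= threshold * total:
            return 100.0 * k / len(counts)
    # Unreachable for thresholds <= 1: the full set always reaches the total.
    return 100.0


# Hypothetical commit counts per developer.
commits = {"dev_a": 60, "dev_b": 20, "dev_c": 10, "dev_d": 10}
print(min_contributor_percentage(commits))
```

A lower value indicates higher concentration, since fewer contributors account for the majority of the work.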

Implication of reference client development concentration:
If only a select few developers work on the reference implementation, they may gain unfair influence over the network. This concentration of power in the hands of a select few feeds into the governance issues discussed earlier. As discussed by [12,16], this type of concentration is harmful to the decentralization of the network, as a few developers may influence the implementation of changes to the codebase. One of the major implications of influential actors in the public blockchain ecosystem is that they undermine the assurance of an open and equal monetary system provided by the blockchain. As this open and equal system is one of the primary contributions of the public blockchain, the existence of influential entities severely limits the system's ability to perform in an open and equal manner.

Exchange
Incentives for honest behavior are at the core of the decentralized, trustless transaction ledger. These incentives are often offered in the native cryptocurrency, such as BTC or Ether. The real-world value of these cryptocurrencies has been debated [13], with the recommendation that it be determined by the exchange rate to traditional fiat currencies. The exchange of cryptocurrency for traditional fiat currency is aided by application layer entities known as exchanges. These exchanges act as the means of consensus formation around the exchange value. This process is also known as Price Discovery. Due to the vital importance of the exchanges, the exchange applications must not be monopolized.
Measurement Technique: To measure the state of centralization in exchanges, Marvin et al. (2017) propose measuring the centrality of exchanges by examining the flow of cryptocurrencies between addresses on the blockchain [117].
Addresses with high centrality in transactions may point to exchanges. This is observed by graphing the transaction flow and identifying nodes with a high degree of centrality. This is followed by the calculation of a Gini Coefficient that reports on the trend of centralization due to exchanges.
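The graph-based step described above can be illustrated with a small sketch that builds an undirected transaction graph and computes normalized degree centrality for each address. The choice of undirected degree centrality, the function name `degree_centrality`, and the sample transfers are assumptions of this example; the text does not specify which centrality measure or graph construction [117] uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def degree_centrality(transfers: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality of each address in a transfer graph.

    Each transfer is a (sender, receiver) pair. Direction and repeated
    transfers between the same pair are ignored in this simplified sketch.
    """
    neighbours = defaultdict(set)
    for sender, receiver in transfers:
        if sender == receiver:
            continue  # self-transfers carry no information about connectivity
        neighbours[sender].add(receiver)
        neighbours[receiver].add(sender)
    n = len(neighbours)
    if n < 2:
        return {address: 0.0 for address in neighbours}
    # Divide by the maximum possible number of distinct counterparties.
    return {address: len(peers) / (n - 1) for address, peers in neighbours.items()}


# Hypothetical transfers in which "exchange" interacts with every user.
transfers = [("user1", "exchange"), ("user2", "exchange"),
             ("exchange", "user3"), ("user1", "user2")]
centrality = degree_centrality(transfers)
print(max(centrality, key=centrality.get))
```

Addresses whose centrality is far above the rest are candidates for exchange-operated addresses; an inequality measure over the resulting centrality values can then summarize the trend.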
Other studies, such as Hileman et al. (2017) [118], have employed a percentage-based value measure, where they measure the proportion of all bitcoin transactions processed by exchanges.

Implication of centralized exchanges:
A large number of successful attacks on Bitcoin and Ethereum have focused on exploiting vulnerabilities in exchanges [119]. These centralized systems act as a single point of failure when they also serve as a central repository of keys. A prominent example of this is the closure of Mt. Gox due to numerous security flaws, leading to the loss of Bitcoins owned by its users [120].
Attacks on centralized exchanges impact not only the users of the exchange but also the broader cryptocurrency community, as they can instill doubts over the security of the ecosystem. These security attacks act as a barrier to trust and adoption by the wider community.

Wallet Concentration
Wallet applications are another form of centralized service on the application layer, as these applications are often developed and maintained by centralized organizations [27].
Measurement Technique: Based on the review of the relevant literature, we report that there are no suggestions regarding the measurement of wallet concentration. We reason that this may be because wallets operate in a closed commercial environment. However, as most of these wallets use an exchange service such as Coinbase [118] to transmit funds, it may be reasoned that exchange centralization provides a rough proxy for wallet-based centralization as well.
Implication of centralized wallet: Applications such as wallets have been identified as a single point of failure and are considered a security threat [27]. A high concentration of wealth in centrally managed wallets may give the host an advantage, feeding into the issue of wealth concentration. This concentration may also result in a dependence on a centralized organization, consequently reducing decentralization.
Similar to exchanges, a centralized wallet poses a potential barrier to entry in the ecosystem. Due to the technical ability required to host their own wallets, most end users tend to prefer hosted wallets, which concentrates funds in a small number of targets. This can aid attackers in conducting more targeted yet profitable attacks on the centralized wallet hosting service.

State of centralization in Bitcoin and Ethereum
The following subsection provides an overview of empirical evidence specific to the two most prominently used blockchain-based cryptocurrencies: Bitcoin and Ethereum. We present the view of the literature on the centralization of these two cryptocurrencies. To structure this investigation, we use the initial taxonomy. The results from this investigation are summarized in Table 10.
than Ethereum. The average peer-to-peer network latency of Ethereum is 26.7% higher than that of Bitcoin, suggesting that Ethereum nodes are located at a greater geographic distance from one another. They reason that this is due to the data center focused approach to mining for Bitcoin, whereas Ethereum can be mined using consumer hardware. This association between geographical distribution and operational centralization neatly illustrates the interdependency between different aspects of centralization, even those based in different layers.
Bandwidth Concentration: Gencer et al. (2018) [14] state that nodes in Bitcoin tend to have about 1.9 to 2.7 times more network bandwidth than Ethereum nodes. They also report that, based on the bandwidth, it can be assumed that Bitcoin nodes are located in data center clusters, whereas Ethereum exhibits a more spread out distribution of bandwidth.
Routing Centralization: Feld et al. (2014) [90] report that 30% of the Bitcoin network was contained in only 10 ASes, which presents a security threat. This work was expanded by Apostolaki et al. (2017) [81], who report that 13 ASes covered about 30% of the network but consisted of only 36 IP prefixes. These 36 IP prefixes cover about 50% of mining power. The only investigation that has reported on AS-level centralization in Ethereum, Gencer et al. (2018) [14], reports that 28% of Ethereum nodes belonged to a single AS.

Consensus Layer
Centralization of the consensus power of Bitcoin has been studied thoroughly in the literature [15,54,12,13,30]. Beikverdi et al. (2015) [15] use a percentage-based centralization value to derive a new metric called the Centralization Factor. They report that at the beginning of 2011, 30% of all hashing power was controlled by eight mining pools. This concentration had increased significantly by 2014 when, according to Gervais et al. (2014) [12], the top mining pool alone controlled close to 40% of all hashing power of the network.
Gencer et al. (2018) [14] expand these analyses by also examining Ethereum's network. They report that, during the observation period, Bitcoin had a less centralized consensus mechanism than Ethereum. On average, the top four mining pools in Bitcoin controlled 53% of the hashing power, whereas in Ethereum the top three mining pools controlled 61% of the hashing power.
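Percentage-based figures such as those above can be reproduced in principle from per-pool block counts. The sketch below assumes that a pool's share of mined blocks over an observation window approximates its share of hashing power; the function name `top_k_share` and the block counts are hypothetical and are not taken from the cited studies.

```python
from typing import Dict


def top_k_share(blocks_by_pool: Dict[str, int], k: int) -> float:
    """Percentage of all blocks mined by the k largest pools.

    Block share is used here as a proxy for hashing-power share,
    which is an assumption of this sketch.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = sorted(blocks_by_pool.values(), reverse=True)
    total = sum(counts)
    if total <= 0:
        raise ValueError("no blocks observed")
    return 100.0 * sum(counts[:k]) / total


# Hypothetical block counts over one observation window.
blocks = {"pool_a": 30, "pool_b": 25, "pool_c": 20, "pool_d": 15, "pool_e": 10}
print(top_k_share(blocks, 3))
```

Because the result depends on the choice of k, comparisons between blockchains are only meaningful when the same k and a comparable observation window are used.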

Incentive Layer
According to Malik et al. (2016) [125], as of 2016, 11,000 unique Bitcoin addresses, out of a total of 12 million, contained 75.2% of all Bitcoin in circulation. This disparity shows a significant concentration of wealth among a select few. Chohan (2019) [35] also supports the claim of significant inequality in the Bitcoin network. The author claims that the level of inequality reflects that of traditional economies and voids the proposed purpose of Bitcoin: decentralization. Gupta et al. (2017) [126] conducted an in-depth investigation of the inequality of Bitcoin. They report that Bitcoin had a Gini value of 0.995 in the year 2013. This result was then refined by Srinivasan et al. (2017) [106], who set a lower bound on the balance of a Bitcoin account to allow for Hierarchical Deterministic wallets, as described in Section 3. They report that in 2018, Bitcoin had a Gini value of 0.65 with the minimum threshold set to 185 BTC per account. This Gini value suggests that wealth in Bitcoin is highly centralized when compared to real economies where, according to the World Bank [127], the highest reported Gini value is 0.63.
According to Srinivasan et al. (2017) [106], Ethereum demonstrates a similar trend of significant centralization, with a Gini value of 0.76 at a minimum threshold of 2477 ETH per account. This trend is in line with the report by [128], which claims Ethereum to be more centralized in terms of wealth distribution.

Operational Layer
Pustivsek et al. (2019) [129] report that a Bitcoin full node requires 204 GB of storage space. This storage requirement is lower than the 385 GB required by Ethereum for a full node [130]. Pustivsek et al. (2019) [129] also report that the storage growth rate is about 0.1-0.5 GB per day. Our review was unable to identify any longitudinal studies that observe the growth in storage requirements over a long period.
As reported in Section 4, numerous studies identify specialized equipment concentration as a cause of centralization. Despite the significant attention to this issue, our review suggests that there are no proposed measurement techniques.

Application Layer
Reference Client Concentration: According to Azouvi et al. (2018) [16], a single author wrote about 30% of all files in the Bitcoin reference implementation. This is significantly higher in Ethereum, where an individual author wrote 55% of all files. They also analyze the comments on the GitHub pages of the Bitcoin and Ethereum reference clients. For Bitcoin, they report that only eight people, representing 0.3% of all commenters, contributed half of all comments. This concentration in comments is also observable in Ethereum, where 0.6% of commenters contributed 50% of comments.
Exchange Concentration: Intermediary services such as exchanges that also act as central key stores for Bitcoin have been suggested as a centralization-causing factor by Bohme et al. (2015) [131]. A prominent example of the harm caused by exchange concentration is the collapse of Mt. Gox in 2014 [120]. In 2014, Mt. Gox was the leading exchange for Bitcoin, and its closure resulted in a total loss of $450 million. Bohme et al. (2015) [131] report that the concentration of exchanges was still high in 2015, when the seven largest exchanges served more than 95% of all Bitcoin trades.
An empirical analysis conducted by Bohme et al. (2015) [131] reported that out of 40 Bitcoin exchanges examined, 18 had closed, wiping out customers' account balances, as they stored the private keys of customers. They argue that these exchanges operate as the de facto centralized authorities in the Bitcoin network.
As for Ethereum, we report that there are no studies that explicitly report on the behavior of exchanges for Ethereum. However, as suggested by Kim et al. (2018) [132], most of the Bitcoin exchanges also trade multiple other cryptocurrencies, including Ether.
As discussed earlier, based on our systematic review, we conclude that there is no suggestion regarding a measurement technique to capture wallet based centralization.
In summary, for Bitcoin, the main centralization threats are at the Network, Consensus, and Application layers. Specifically, the centralization aspects of the Network layer (geographic distribution, bandwidth, and routing) are vulnerabilities for Bitcoin in that they allow the specific threats of geopolitical manipulation of the network, high resource requirements for participation, and the possibility of network attacks. These threats for Bitcoin are augmented by the high concentration of consensus power in centralized mining pools and by application layer operations such as exchanges and wallets.
Ethereum also shares the issues of centralization on the application layer, as they lead to reliance on centralized entities such as exchanges and wallets for participation in the network. Other significant centralization threats for Ethereum lie in the Governance, Consensus, and Incentive layers. In particular, the centralization aspects of the Governance and Incentive layers may induce vulnerabilities for Ethereum in that they allow unilateral decision making on the governance layer and high wealth concentration on the incentive layer.

Discussion
In this first in-depth investigation of the centralization of public blockchain solutions, we conducted a systematic review of existing literature to produce an initial taxonomy of centralization. We then refined this initial taxonomy through expert interviews. We provide an overview of centralization in different aspects of the blockchain. We examine different means of measuring centralization, also pointing out the absence of measurement techniques in these research studies. This initial taxonomy provides a framework for a more systematic discussion around the centralization of major blockchain systems. The following section discusses the findings of our survey.

Non-Binary Nature of Centralization
We observe that decentralization in the public Blockchain literature is a loosely-defined term that can take many shapes and forms. We also observe that most of the non-decentralization-specific articles reviewed treat decentralization as a binary construct. That is, a blockchain instance is either centralized or decentralized. However, based on our taxonomy, we define centralization of public Blockchains as the process by which one or more architectural dimensions (aspects) of the Blockchain become restrictive to the majority of participants through direct or indirect economic, social, or technical constraints, and so argue that centralization is not suited to binary classification.
This latter observation aligns with expert interviews, where 60% of participants preferred a spectrum of values for centralization rather than the conventional binary notion. However, the interviewees also acknowledged that the complexity of a more granular definition might dilute the meaning for non-experts in the blockchain domain. For example, I 5 said: "I am an engineer, so I prefer precision and a multidimensional model, but I know when you are presenting to business people, a single score might be what they are looking for".
This survey presents a novel, initial taxonomy to address this dilution concern and allow for structured discussion on centralization. The following text discusses the key findings of the taxonomy.
Consensus power concentration was the form of blockchain centralization most widely recognized by both the literature and the experts interviewed. We reason that this wide recognition is due to the dependence of significant security threats, such as the Double Spending [28] and Selfish mining [32] attacks, on consensus power concentration. The practical implication of this centralization is the heavy impact of mining pools on the ability to run a profitable mining operation. The dominance of mining pools is observable in both Ethereum and Bitcoin. In Bitcoin the top 4 mining pools control over 53% of the hashing power, whereas in Ethereum the top 3 mining pools control over 61% of the hashing power (see Table 10).
A high concentration of consensus power can induce an arms race to attain the most efficient hardware [13]. Our survey reports that this race often results in specialized proprietary hardware. The practical implication of this type of hardware concentration is an indirect limitation on participation, as only efficient and often proprietary hardware can result in a profitable operation. To remedy this situation, studies such as Cho et al. (2018) [133] have proposed using a consensus algorithm that is memory heavy, for which specialized hardware design is inefficient.
Surprisingly, although it is a similar operational constraint, the storage growth rate was less widely recognized as contributing to centralization. However, I 10 raised an interesting issue regarding the ever-growing, append-only nature of Blockchain, which may result in consistent growth in storage requirements. As reported in Table 10, the current growth rate for Bitcoin is around 0.1 to 0.5 GB per day. The practical implication of this increased storage requirement is the inability of conventional computing devices to serve as nodes in the blockchain [111]. Guo et al. [111] propose a storage optimization scheme based on the redundant residual number system that can reduce the storage requirement. We suggest that further investigation into storage optimization in public Blockchains is warranted.
Another unexpected finding of our survey was that 50% of the interviewees accepted node discovery protocol control as a threat to decentralization, despite only one research article reporting on the issue. We reason that this may be due to the practical implications of setting up a new node, such as the potential delay in network connection for new nodes due to high traffic through DNS nodes. This type of delay is often not accounted for in network simulation tools such as NS3, employed by studies such as [54,13]. In contrast, routing and bandwidth centralization in the network was not widely recognized by the interviewees. One potential explanation could be the experimental nature of the measurements associated with routing and bandwidth centralization. Although they are recognized as issues in the literature, bandwidth and routing do not cause operational issues for most participants at present.
Another network-oriented centralization concern widely recognized by both the literature and interviewees is the geographic distribution of the nodes. Our findings suggest that the Ethereum network is more geographically spread out than Bitcoin. We reason that this is due to the possibility of using conventional hardware such as GPUs to participate in Ethereum. Despite this recognition, our literature review did not identify potential strategies to address this centralization. We suggest that strategies to limit geographic concentration should be investigated.
The lack of mitigation techniques also persists in the application layer aspects. Wallet and exchange centralization have been reported on by the literature and also recognized as centralization issues in expert interviews. As reasoned earlier, the centralized store of cryptocurrencies may give an advantage to the exchange or wallet operator. This advantage is often in the form of wealth concentration and can be observed in the centralization of Bitcoin exchange platforms, where only seven exchanges were reported to serve more than 95% of all trades.
Interviewees and the literature also agree on the implication of wealth concentration for decentralization. Surprisingly, despite the apparent issue of a "Rich getting Richer" effect in Proof-of-Stake cryptocurrencies [134], most of the reported literature focused on wealth concentration in Proof-of-Work. We suggest that the issue of wealth concentration be investigated in the context of Proof-of-Stake cryptocurrencies.
Another factor that may result in a "Rich getting Richer" effect is the distribution of wealth at the very start of the Blockchain, captured by owner control in our taxonomy. The issue of owner control is also associated with how the Blockchain is governed. Governance centralization in Blockchain is widely recognized by both the literature and interviewees. Interestingly, Wang et al. (2017) [36] argue for some centralization in governance to facilitate quick response to security threats. We expand on this line of reasoning in the following subsection.

Aspect-based Measurement of Implications of Centralization
As pointed out earlier, not all aspects of our taxonomy contribute equally to the overall centralization of the blockchain. This was also substantiated by six interviewees, who agreed that a combined value of centralization for the overall blockchain would not be meaningful. For example, storage constraint oriented centralization may be an issue in Ethereum due to the requirement to store smart contracts. In contrast, this may not be a significant issue for Bitcoin, as only transactions drive the storage requirements. We extend this category-based reasoning by arguing that not all centralization is necessarily equally bad for the network. The governance layer based centralization argument presented by Gervais et al. (2014) [12] assumes that concentrating decision making power in a select few is bad for blockchain. However, we question this argument, as true decentralization is an impossibility in real world scenarios [135,17]. The concentration in decision making has also proven to be useful in instances of network attacks, when a prompt response was required [36]. Delegation of controlling power in cases of security bugs or attacks may have proven detrimental to the network. Some concentration in governance, despite reducing decentralization, may therefore be to the overall benefit of the network. We present this as a potential future research avenue to explore the most suitable governance structure for decentralized systems.
We also argue that the results obtained by Azouvi et al. ( 2018) [16] regarding the centralization in source code development for core client implementation may not necessarily be bad.It may just be the case that only a handful of developers have an in-depth understanding of the source code to make useful contributions to the system.This reasoning of limited expertise feeds into the argument against the decentralization of the improvement protocol.As pointed out by Azouvi et al. (2018) [16], the vast majority of the Ethereum Improvement Protocol recommendations originated from a single developer, Vitalik Buterin.We reason that this may be due to the quality of suggestions proposed by Vitalik.
These arguments in favor of some centralization are an example of the complex nature of decentralization in distributed systems. We propose that the significance of each aspect of centralization be determined based on the empirical evidence specific to each blockchain instance.

Conclusion
In this paper, we conduct a systematic literature review to provide a summary of the research done on the centralization aspect of blockchain. We structure our findings in a novel initial taxonomy of centralization. This taxonomy is then refined and validated through expert interviews.

Contribution
Decentralized blockchain solutions provide a means of monetary asset transfer without a trusted third party; this is attained by delegating the validation power to all participants of the system rather than to an administrator. This delegation of control is often referred to as the original contribution of blockchain systems [23]. Based on previous studies [13,14,11,12], we reason that the preconceived notion that blockchains are inherently decentralized may not hold in the present situation, which raises the potential for severe issues in blockchain instances. Without an objective measure of centralization, it is impractical to discuss improvement in terms of centralization.
Centralization is a challenging variable to research, in part because of the multiple definitions and measures of centralization applicable to blockchain, the so far implicit nature of several of those aspects, and the lack of an encompassing framework. We report on this myriad of definitions, conceptualizations, and dimensions used to describe the concept by segmenting them according to the generic architecture proposed by Zhang et al. (2019) [18].
Our study contributes to the existing body of knowledge by systematically surveying and synthesizing the blockchain literature, reporting on the adverse impacts of centralization, such as security threats, and identifying research gaps, such as the lack of Ethereum-specific research on centralization.
With this systematic review, we provide the reader with an overview of the various forms of centralization in blockchain, resulting in an initial taxonomy. This taxonomy also contains numerous existing techniques used to measure centralization. It may help researchers evaluate the centralization of a blockchain instance, and it will also allow them to add further aspects of centralization as they become known, providing a vocabulary of centralization with which to address the issues that arise.
We have also reported on the platform-specific findings for the two most prominently used blockchain-based cryptocurrencies: Bitcoin and Ethereum. We report that both Bitcoin and Ethereum have similar centralization issues with regard to reference client implementation, decentralized protocol development, and exchanges. However, in terms of wealth concentration, Ethereum is more centralized than Bitcoin, primarily due to high owner control. This trend continues with consensus power concentration, where Ethereum is reported to be more centralized than Bitcoin. Ethereum nodes, however, are geographically more spread out than Bitcoin nodes, resulting in a lower geographic concentration than in Bitcoin.
We also argue that centralization is not necessarily adverse for the blockchain in every aspect, expanding the argument in favor of some centralization by Wang et al. (2017) [36]. We suggest that the adverse impact of centralization be measured for each aspect based on empirical evidence. This aspect-specific investigation may assist the move from the binary notion of decentralization to a multidimensional scale encompassing adequate measurement and control where necessary.

Threats to validity
As decentralization is fundamental to a public blockchain, the term is frequently used in the title and abstract of articles relating to public blockchains. To avoid omitting any relevant articles, we kept the search queries generic by including any article that contains the terms "Blockchain" and "Decentralization", along with the alternate words suggested in Section 3. We acknowledge that, despite the broad terms used, we may have missed relevant articles that are not present in, or are phrased differently on, these leading search repositories. These missed articles may include "grey literature", which is of significant importance in the blockchain research domain [136]. To overcome this limitation, we included Google Scholar in our search process. However, as reported earlier, the Google Scholar search was limited to the top 1,000 entries, even though the relevant articles dropped off significantly after the top four hundred returned articles.
The literature review may also be limited by the strict inclusion and exclusion criteria for the title and abstract filtering. We reason that these strict criteria are warranted due to the large number of articles retrieved by the search queries (3,574 non-duplicate entries). To overcome this limitation, we employed a two-step filtration by reviewing both the title and abstract. We also cross-validated the filtration process through an independent review of the articles by two authors. This cross-validation process resulted in a Cohen's kappa value of 0.84, which is considered almost perfect agreement. We repeated a similar cross-validation process for the full-text filtration.
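As an illustration of how such an agreement statistic is computed, the following minimal sketch implements Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), for two reviewers' include/exclude decisions. The function name `cohens_kappa` and the ten example decisions are hypothetical; they are not the data from this review and are not intended to reproduce the reported value.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)

    # p_o: observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # p_e: agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        # Returning 1.0 here is a choice made for this sketch only.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical title/abstract decisions for ten articles (illustrative only).
reviewer_a = ["include"] * 4 + ["exclude"] * 6
reviewer_b = ["include"] * 3 + ["exclude"] * 7

print(round(cohens_kappa(reviewer_a, reviewer_b), 2))
```

In this made-up example the reviewers agree on nine of ten articles, chance agreement is 0.54, and kappa is therefore about 0.78. On the commonly used Landis and Koch scale, values above 0.80 are labelled "almost perfect".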
The review process aimed to extract factors from all shortlisted articles regardless of their core focus. As the study of centralization in public blockchains is still at an early stage, we included articles whose core focus was not centralization. This inclusion may have limited the quality of the shortlisted articles, as observed by the exclusion of 148 articles after full-text filtration. To overcome this limitation, we performed a quality review of all 212 shortlisted articles and shortlisted a final set of 89 articles.
To further evaluate the literature-review findings, we interviewed ten experts. The recruitment process was based on the prominence of authors in the bibliographic map generated by Ramona et al. (2019) [68]. As with any other qualitative research method, interviews have several limitations, as pointed out by Opdenakker et al. (2006) [137]. In addressing them, we adhere to the validity dimensions put forth by Maxwell (1992) [138] for qualitative studies. The first validity threat is the descriptive validity of the data obtained through interviews. To limit this, we transcribed the audio-recorded interviews verbatim. However, in the interviews that relied on contemporaneous notes, the interviewer may have missed some observations. The second threat to validity is the interpretive validity of the interviews. To address this, we used open-ended questions and restricted the questions strictly to the research questions presented in Section 3.2. We also coded the interviews based on the terms used by the interviewees rather than on an interpretation. The transcripts and notes were individually checked by researchers from the author list. The interviewees were also given the interview transcripts and notes for validation.

Future Work
Having provided a comprehensive overview of centralization in public blockchains, we believe that a case study focused on individual cryptocurrencies and blockchain implementations would complement our study. This case study could include an in-depth centralization review of, for example, Bitcoin, Ethereum, and Libra [139].
The taxonomy developed by our study can also be expanded to provide an objective measure of centralization for blockchain instances as a whole, to facilitate comparison. This objective measure may prove useful for evaluating centralization from a novice-user or governance perspective. Four of our ten interviewees stated that they would prefer a single score to measure centralization objectively, and thought it would assist end-users and nonspecialist researchers.
We also hope to develop different flavors of this initial taxonomy that are specific to implementation details. For instance, the presented taxonomy is generic and does not consider consensus-specific issues such as stake bleeding [140]. It also omits the consideration of source code dependencies in smart contracts. In the future, we intend to statistically examine the source code of smart contracts to observe whether a handful of libraries dominate the smart contracts in Ethereum.
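One possible starting point for such an examination is sketched below, under loudly stated assumptions: contract sources are available as plain Solidity text, dependencies are approximated by the paths in `import` statements, and a library's share is the fraction of contracts that import it. The names `library_shares` and `top_libraries` and the sample sources are hypothetical, and the sketch does not normalize different paths that refer to the same library.

```python
import re
from collections import Counter

# Matches `import "path";`, `import {X} from "path";` and `import * as X from "path";`.
# This is a rough approximation, not a Solidity parser.
IMPORT_RE = re.compile(r"""\bimport\s+(?:[^"';]*?\bfrom\s+)?["']([^"']+)["']""")


def library_shares(sources):
    """Map each imported path to the fraction of contracts importing it."""
    if not sources:
        return {}
    counts = Counter()
    for text in sources:
        # Count each path at most once per contract source.
        counts.update(set(IMPORT_RE.findall(text)))
    return {path: n / len(sources) for path, n in counts.items()}


def top_libraries(sources, k):
    """Return the k most widely imported paths with their shares."""
    shares = library_shares(sources)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical contract sources, for illustration only.
sources = [
    'import "./SafeMath.sol";\ncontract A {}',
    'import {SafeMath} from "./SafeMath.sol";\nimport "./Ownable.sol";\ncontract B {}',
    "contract C {}",
    'import * as M from "./SafeMath.sol";\ncontract D {}',
]

print(top_libraries(sources, 2))
```

A real study would also need to handle copied (inlined) library code and versioned package paths, which this sketch ignores.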
The work presented here only examines the already identified factors that may lead to centralization and does not analyze the existence of other novel forms of centralization. As part of future work, we will consider a thorough review of one of the reference blockchain implementations to identify factors that may also contribute to centralization, directly or indirectly.
We also aim to review existing literature to identify potential solutions to the centralization avenues suggested by our review. These solutions may facilitate integrating centralization considerations during the development of public blockchains.

Figure 4: Overview of Systematic Literature Review

Figure 5: Article Titling and Abstraction Process

Table 2: Taxonomy of Centralization in Public Blockchains

Table 3
considers the factors identified in Table

Table 4: Categories of centralization in Governance Layer

Table 5: Categories of centralization in Network Layer

Table 7: Categories of centralization in Incentive Layer

Table 8: Categories of centralization in Operational Layer