Unlocking Blockchain UTXO Transactional Patterns and Their Effect on Storage and Throughput Trade-Offs

Blockchain technology ensures secure record-keeping by redundantly storing and verifying transactions on a distributed network of nodes. Permissionless blockchains have driven the development of decentralized applications (DApps) characterized by distributed business logic, resilience to centralized failures, and data immutability. However, scaling storage without sacrificing throughput remains one of the open challenges in permissionless blockchains. Enhancing throughput often compromises storage, as seen in projects such as Elastico, OmniLedger, and RapidChain. Conversely, solutions that seek to save storage, such as CUB, Jidar, SASLedger, and SE-Chain, reduce transactional throughput. To our knowledge, no analysis has been performed that relates storage growth to transactional throughput. In this article, we examine the execution of the Bitcoin and Ethereum transactional models, unlocking patterns that represent any transaction on the blockchain, and we reveal the trade-off between transactional throughput and storage. To achieve this, we introduce the spent-by relation, a new abstraction of the UTXO model that uses a directed acyclic graph (DAG) to expose these patterns at a granular level. We then analyze the transactional patterns to identify the most storage-intensive ones and those that offer greater flexibility in the throughput/storage trade-off. Finally, we present an analytical study showing that the UTXO model is more storage-intensive than the account model but scales better in transactional throughput.


Introduction
Blockchain technology is an innovative digital ledger system that provides secure record-keeping by storing and redundantly verifying transactions on a distributed network of nodes [1]. This technology falls into two primary classes: public (or permissionless) and private (or permissioned) blockchains. Permissionless blockchains are open access and allow the participation of any individual or entity [2], while permissioned blockchains require credential validation or an economic incentive to allow collaboration in the network [3]. Permissionless blockchains have pushed the development of DApps, which exhibit features such as distributed business logic, distributed data, resilience to failures at central points, and a guarantee of data immutability [4].
However, permissionless blockchains face challenges that limit the optimal operation of DApps. One of the most relevant is storage scalability: because every node keeps a full copy of the chain, the total storage of the blockchain grows in proportion to the number of nodes. To understand this problem, imagine a library that constantly receives new books (blockchain transactions) at a constant rate of ten books per day, known as the growth rate, c. For security and redundancy, the library stores a copy of every received book in each of its sections, where the number of sections equals the number of nodes, n. In this scenario, the total storage added to the library each day, s, is s = c × n. The challenge arises when the librarian cannot control the number of sections (nodes) where the copies are stored: one day there are five sections, and the next day there are seven. This fluctuation affects both the storage capacity of the library and the management of its books. A real-world example of this challenge is Bitcoin, where the aggregate storage of the blockchain across all nodes has reached approximately 3.28 petabytes [5]. This figure results from the size of the chain replicated on each node, approximately 488 GB, multiplied by the number of nodes redundantly storing it, presently around 7065 [5]. Ethereum [6] is a notable case where storage growth may follow an exponential trend, as depicted in Figure 1. These issues arise from the redundancy inherent in the design of permissionless blockchains. This redundancy creates a delicate balance: improvements in transactional throughput (measured in transactions per second) inevitably lead to increased storage requirements, while attempts to reduce storage can compromise throughput due to decreased availability and increased latency.
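The library analogy reduces to a product: aggregate storage equals the amount each node holds times the number of nodes. The minimal Python sketch below applies it to the analogy and to the approximate Bitcoin figures quoted above (488 GB per node, 7065 nodes). The function name is illustrative, and reading "GB" as binary units is an assumption; under that reading the product is about 3.29 PiB, close to the 3.28 figure cited.

```python
def aggregate_storage(per_node: float, num_nodes: int) -> float:
    """Total redundant storage s = c * n when every node keeps a full copy."""
    return per_node * num_nodes


# Library analogy: 10 books per day copied into 5 sections, then into 7 sections.
print(aggregate_storage(10, 5))  # 50
print(aggregate_storage(10, 7))  # 70

# Approximate Bitcoin figures from the text: 488 GB per node, 7065 nodes.
total_gb = aggregate_storage(488, 7065)
print(f"{total_gb:,.0f} GB")                     # 3,447,720 GB
print(f"{total_gb / 1000**2:.2f} PB (decimal)")  # 3.45 PB (decimal)
print(f"{total_gb / 1024**2:.2f} PiB (binary)")  # 3.29 PiB (binary)
```

Because n is controlled by nobody, s changes every time a node joins or leaves, even if c stays fixed.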
There are three primary approaches to increasing transactional throughput: block size management, off-chain mechanisms, and sharding. Block size management increases the block size to allow more transactions per block, temporarily relieving transaction congestion [7,8]. Off-chain mechanisms process transactions outside the main blockchain through payment channels or sidechains, reducing the load on the main blockchain [9][10][11][12]. Sharding increases throughput by splitting the blockchain into smaller, parallel-processing parts called shards [13][14][15]. However, the impact of these approaches on storage growth needs careful consideration.
Storage efficiency enhancement approaches are divided into centralized and decentralized data. Centralized approaches store data in a single location or through a central entity [16][17][18], while decentralized strategies distribute data across multiple nodes in the blockchain network, enhancing robustness and immutability [19,20]. The common goal is to increase storage efficiency, but these strategies affect transactional throughput.
In summary, advances in blockchain technology aim to enhance transactional throughput and reduce node storage requirements. However, these goals are not independent, as improvements in one often affect the other. We identified a noticeable gap in the literature: a lack of analyses that relate storage growth to transactional throughput and vice versa. In this article, we unlock transactional patterns of the UTXO model to reveal the relation between storage and transactional throughput, providing the first analysis of the relation between these parameters. To achieve this, we apply the following methodology:
1. Analysis and abstraction of transactional models.
2. Formal comparison of the models to highlight their storage costs.
3. Experiments with data from the Bitcoin and Ethereum blockchains.
The analysis resulting from this methodology shows that the UTXO model is more storage-intensive but offers flexibility in transactional throughput, indicating a trade-off between the two parameters. The transactional behavior of the models, resulting from the abstraction step, led us to introduce a novel DAG-based abstraction of the Bitcoin transactional model: the spent-by relation. This new relation unlocks the transactional patterns that represent any transaction on the blockchain and shows the relationship between throughput and storage. Finally, the experiments on more than 800 million transactions show the most storage-intensive transactional patterns.
The remainder of the paper is structured as follows: Section 2 presents an overview of the fundamental concepts of transactional models. Section 3 presents an overview of related work, with particular emphasis on strategies that impact storage/throughput within blockchain systems. Section 4 presents an analysis of the execution of transactional models and their impact on blockchain storage. In Section 5, the spent-by relation is introduced as a novel abstraction of the UTXO model. In Section 5.3, we unlock the transactional patterns within the UTXO model. Finally, in Section 6, we introduce an experimental comparison of storage costs in UTXO transactional patterns.

Fundamental Background of Transactional Models
Blockchain technology, at its most basic, provides a mechanism for the secure and verifiable storage of records through redundancy. This redundancy results from the verification and distributed storage of transactions in a network of nodes operating in a peer-to-peer (P2P) system. Transaction records on the blockchain network are grouped into blocks, thus creating a chain of blocks, hence the term "blockchain". Each block contains a series of transactions, all of which are validated and confirmed by the network. Each block is linked to the previous one through a unique identifier called a hash. This hash is produced by a cryptographic function that takes the data of the current block and the hash of the previous block and outputs a fixed-length string. Consequently, any change to a block alters its hash and breaks the link with every subsequent block, which reveals the manipulation.
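The hash linking described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: each block is hashed as a JSON payload with a single SHA-256, whereas Bitcoin actually double-hashes an 80-byte header containing a Merkle root. The helper names are hypothetical.

```python
import hashlib
import json


def block_hash(transactions: list[str], prev_hash: str) -> str:
    """SHA-256 over the block's transactions plus the previous block's hash."""
    payload = json.dumps({"txs": transactions, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(blocks: list[list[str]]) -> list[str]:
    """Return one hash per block; each hash depends on all earlier blocks."""
    hashes, prev = [], "0" * 64  # placeholder predecessor for the first block
    for txs in blocks:
        prev = block_hash(txs, prev)
        hashes.append(prev)
    return hashes


def verify_chain(blocks: list[list[str]], hashes: list[str]) -> bool:
    """Recompute every link and compare with the stored hashes."""
    return build_chain(blocks) == hashes


blocks = [["alice->bob:1"], ["bob->carol:0.5"], ["carol->dave:0.2"]]
hashes = build_chain(blocks)
print(verify_chain(blocks, hashes))  # True
blocks[0][0] = "alice->bob:100"      # tamper with the first block
print(verify_chain(blocks, hashes))  # False
```

Tampering with the first block changes its hash, and because that hash is an input to the next block, every later link stops matching as well.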
This fundamental understanding of blockchain technology sets the stage for a deeper exploration of its complexity and functionality, especially in the context of the transactional models of Bitcoin and Ethereum. In this section, the main transactional models are discussed, specifically the unspent transaction output (UTXO) [21] model and the account model [22].

UTXO Model
In the UTXO model, the state of transactions is represented as a collection of unspent transaction outputs. This is illustrated in the DAG shown in Figure 2, where vertices symbolize transactions, and edges represent pointers that consume a previous transaction to generate a new one. There are several definitions of the UTXO model, for example as a directed acyclic graph. In this article, the definition provided by Jeyakumar et al. [23] is highlighted for its ability to encompass the transactional models of Bitcoin and Ethereum.
Definition 1. A directed graph $G(V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ represents the set of nodes and $E \subseteq V \times V$ represents the set of edges. For each vertex $v_i$, an edge $e_i$ is of the form $v_i \rightarrow v_j$.
Bitcoin transactions use one or more unspent outputs from previous transactions to create new outputs.These new outputs become unspent outputs that are available for future transactions.
According to Narula and Dryja [24], a digital signature, a public key, and a timestamp must be provided to consume an unspent output. In addition, the following properties must be met:
1. All outputs are distinct from one another.
2. When spending, an input refers to one specific unspent output.
3. An output can only be spent once.
These properties are based on the Bitcoin protocol and replicated in other applications.
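The properties listed above can be sketched as a toy UTXO set in Python. This is an illustrative assumption, not Bitcoin's implementation: outputs are identified by a hypothetical `OutPoint` (creating transaction ID plus output index), values are integers in satoshis, and the signature, public key, and timestamp checks are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutPoint:
    """Identifies one output: the creating transaction and its position."""
    txid: str
    index: int


class UTXOSet:
    """Toy UTXO set; signatures, public keys, and timestamps are NOT checked."""

    def __init__(self) -> None:
        self.unspent: dict[OutPoint, int] = {}  # outpoint -> value in satoshis

    def apply(self, txid: str, inputs: list[OutPoint], outputs: list[int]) -> None:
        # Each input must name one specific output, listed only once.
        if len(set(inputs)) != len(inputs):
            raise ValueError("transaction lists the same output twice")
        for op in inputs:
            if op not in self.unspent:
                raise ValueError(f"{op} is unknown or already spent")
        if sum(self.unspent[op] for op in inputs) < sum(outputs):
            raise ValueError("outputs exceed the value of the inputs")
        # Consumed outputs leave the set, so they can never be spent again.
        for op in inputs:
            del self.unspent[op]
        # New outputs receive distinct (txid, index) identifiers.
        for i, value in enumerate(outputs):
            self.unspent[OutPoint(txid, i)] = value


utxo = UTXOSet()
utxo.unspent[OutPoint("funding", 0)] = 20_000_000  # seed 0.2 BTC for the example
utxo.apply("tx1", [OutPoint("funding", 0)], [10_000_000, 10_000_000])
print(len(utxo.unspent))  # 2
try:
    utxo.apply("tx2", [OutPoint("funding", 0)], [5_000_000])
except ValueError as err:
    print("rejected:", err)
```

Deleting spent entries is what enforces single spending here: a second attempt to consume the same outpoint finds nothing to spend.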

Account Model
In contrast to the UTXO model, the account model represents the state of blockchain transactions as a set of accounts or addresses, which are managed by entities or smart contracts [25]. These entities can be individuals, organizations, or automated systems. An example of automated systems is smart contracts, which are simple programs housed within Ethereum's virtual machine (EVM), facilitating the execution of complex operations and agreements autonomously, while providing high reliability.
In Ethereum's implementation of the account model, transactions are abstractly represented as state transitions. Figure 3 illustrates the flow of transactions that update account states as they are executed. In addition to the traditional transactions in Ethereum, there are types of transactions specifically related to smart contracts. These transactions are typically classified in the literature as contract deployment and contract invocation:

• The process of contract deployment essentially involves the creation of a smart contract. This can be equated to an executable program that is assigned a unique address within the blockchain. The smart contract contains a set of predefined functions or instructions that are written in a programming language compatible with the Ethereum blockchain, such as Solidity [26].
• Contract invocation, on the other hand, refers to the process of executing or "calling" the functions embedded within the smart contract. These functions can be invoked by other addresses within the blockchain network, allowing them to interact with the smart contract and initiate specific operations. These operations can range from simple value transfers to more complex interactions involving multiple smart contracts [27].
Finally, there are other transactional models, such as the EUTXO model [28] and the account abstraction ledger [29], but these are based on the models discussed above.

Related Work
As previously discussed in the introduction, strategies to save storage or increase transactional throughput have been addressed in a disjointed manner.
This section analyzes storage improvement within two strategies: centralized and decentralized. At the same time, we examine approaches designed to improve throughput based on sharding, off-chain, and block size. Figure 4 categorizes each method based on improvements in throughput and storage parameters. In particular, strategies that reduce storage tend to reduce transactional throughput, as depicted in the top left. Conversely, methods that increase transactional throughput tend to be storage-intensive, as shown in the lower right. After classifying each method, we found a noticeable gap in the literature: a lack of studies investigating the trade-off between throughput and storage based on blockchain transactional models.

Approaches to Enhance Throughput
Strategies to enhance transactional throughput are primarily focused on increasing the number of processed transactions within a given time frame. The approaches are categorized in Table 1, organized in ascending order based on the level of transactional throughput they achieve. We describe the advantages and disadvantages associated with each method as follows.

Block Size
The first approach to enhance transactional throughput involves increasing the block size.This strategy has been used in cryptocurrencies, such as Bitcoin and Ethereum, where the block size has been increased to allow for the inclusion of more transactions, thereby temporarily alleviating transaction congestion [7,8,30].However, this design decision has disadvantages since larger blocks require more storage resources and take longer to process and propagate over the network.

Off-Chain
The off-chain transaction approach mainly uses two methods: (a) payment channels and (b) sidechains, which effectively enhance transactional throughput while simultaneously impacting storage requirements on the main blockchain.

(a) Payment channels increase transactional throughput by creating private paths between entities. For example, the Lightning Network [9,10] is capable of handling up to one million TPS off the main blockchain, recording transactions on the blockchain only when channels are opened and closed. This method shifts the storage overhead from the main chain to external systems to increase transaction throughput, altering the balance between these two parameters. In addition, managing the states of the channels off the main chain requires additional resources to ensure the integrity of the transactions.
(b) Sidechains [11,12] operate as independent blockchains with their own storage and consensus mechanisms, linked to the main chain by two-way pegs. This setup allows them to process transactions that do not burden the main chain, enhancing overall system performance. However, the need for additional infrastructure to maintain the security and operability of sidechains increases off-chain storage and management overhead.
Finally, regardless of the method used, the off-chain transaction approach does not offer transparency at the same level as on-chain transactions.This is because the channels are not visible to all participants, and the sidechains do not maintain the same security.
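Why payment channels relieve main-chain storage can be seen with a deliberately simplified count, sketched in Python. The model is an assumption for illustration only: a channel costs exactly one funding and one closing transaction on the main chain, every intermediate payment stays off-chain, and disputes, fees, and routing are ignored. The function name is hypothetical.

```python
def on_chain_transactions(payments: int, use_channel: bool) -> int:
    """On-chain footprint of `payments` transfers between two parties.

    Simplified model (assumption): a channel needs one funding and one
    closing transaction; all intermediate updates stay off the main chain.
    """
    if payments <= 0:
        return 0
    return 2 if use_channel else payments


for n in (1, 10, 1_000_000):
    # columns: payments, on-chain txs without a channel, on-chain txs with one
    print(n, on_chain_transactions(n, False), on_chain_transactions(n, True))
```

The on-chain cost becomes constant in the number of payments, but the channel participants must now keep the latest channel state themselves, which is the off-chain storage shift described above. For a single payment the channel is more expensive than a direct transaction.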

Sharding
Sharding, introduced in research such as Elastico [13], OmniLedger [14], and RapidChain [15], is recognized as a strategy for parallelizing transaction processing and sharing storage. This enhances the transaction processing rate and mitigates short-term storage pressures in traditional blockchains. However, storage growth remains unsustainable in the long term. For example, in its experiments with 4000 nodes distributed across 16 shards, RapidChain processed 7380 transactions per second (TPS). Assuming an average transaction size of 256 bytes, each shard stores around 9.93 GB per day, for a total daily storage of 159.6 GB across all shards. After 60 days, the storage required per shard escalates to 600 GB, for a total of 9600 GB across all shards. This steady, unbounded growth in storage highlights that these approaches do not balance transactional throughput against storage.
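The RapidChain estimate can be reproduced with a short Python calculation. It assumes, as the text does, a sustained 7380 TPS and 256-byte transactions, and additionally assumes decimal gigabytes (10^9 bytes) and an even split across the 16 shards. Under these assumptions the result is about 10.2 GB per shard per day and about 612 GB per shard after 60 days, in line with the rounded figures above; small differences come from unit conventions and rounding.

```python
SECONDS_PER_DAY = 86_400


def sharded_storage_gb(tps: float, tx_size_bytes: int, shards: int, days: int):
    """Return (GB per shard, GB total) after `days`, using decimal GB (1e9 bytes).

    Assumes a constant rate and that transactions are spread evenly across shards.
    """
    total_gb = tps * tx_size_bytes * SECONDS_PER_DAY * days / 1e9
    return total_gb / shards, total_gb


per_shard, total = sharded_storage_gb(7380, 256, 16, 1)
print(f"1 day:   {per_shard:.2f} GB/shard, {total:.1f} GB total")
per_shard, total = sharded_storage_gb(7380, 256, 16, 60)
print(f"60 days: {per_shard:.0f} GB/shard, {total:.0f} GB total")
```

Storage grows linearly with the number of days at a fixed throughput, so any increase in TPS translates directly into a steeper storage slope.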

Approaches to Reduce Storage
Strategies in this category are classified into two distinct approaches: centralized and decentralized, as illustrated in Table 1. In the case of centralized strategies, data are stored in a single location or managed by a central entity, resulting in significant storage savings compared to decentralized storage. Conversely, decentralized strategies distribute data across multiple nodes within the blockchain network, enhancing security and availability by eliminating dependence on a single node. We delineate the advantages and disadvantages of each method and highlight the percentage of storage savings as follows.
Centralized Data
CUB (consensus unit-based) is a centralized proposal to solve the storage problem in industrial blockchains. Zihuan Yu et al. [16] organize nodes into subsets called consensus units that work in parallel, based on the assumption that all nodes in the same unit trust each other. Each CUB node stores only part of the blockchain data, while the unit as a whole stores a full copy, reducing node storage by 90%. However, the assumption of inherent trust among nodes is rather idealistic, especially when services such as immutability and availability must be guaranteed. In permissionless blockchain environments, such proposals are not applicable due to the specific requirements that decentralized applications impose in these environments.
Xiaohai Dai et al. [17] proposed Jidar as an improvement over CUB. Each Jidar node stores only the transactions it considers relevant for processing and represents the remaining transactions only through a Merkle root, reducing storage. To synchronize new nodes, Jidar adds a mechanism that joins the fragments stored across different nodes, similar to assembling the pieces of a puzzle. Jidar's results show a 98% storage reduction compared to CUB. However, Jidar requires additional processing to generate a proof for each transaction. Availability can also be very low, because a node may be offline for a long time, which affects the synchronization of new nodes. Moreover, implementing this solution on a high-speed multi-chain is infeasible due to the high latency incurred when creating new transactions.
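A Merkle root lets a node keep a single 32-byte commitment in place of many transactions. The Python sketch below shows a generic construction; it is not Jidar's exact scheme, which is not described here. It follows Bitcoin's convention of duplicating the last hash on odd-sized levels but, for brevity, uses a single SHA-256 where Bitcoin uses double SHA-256. The transaction payloads are made up.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(tx_ids: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until one root remains.

    Duplicates the last hash on odd-sized levels (Bitcoin's convention),
    but uses single SHA-256 rather than Bitcoin's double SHA-256.
    """
    if not tx_ids:
        raise ValueError("need at least one transaction")
    level = [sha256(tx) for tx in tx_ids]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pair the odd hash with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txs = [b"tx-a", b"tx-b", b"tx-c"]
root = merkle_root(txs)
print(root.hex())  # 32-byte commitment to all three transactions
print(merkle_root([b"tx-a", b"tx-b", b"tx-X"]) == root)  # False
```

Proving that one transaction belongs to the root requires only the sibling hashes along its path (logarithmic in the number of transactions), which is the kind of per-transaction proof whose generation cost is noted above.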
Haolin Sun et al. [18] propose SASLedger, a centralized off-chain proposal that relieves the storage burden on the nodes that replicate the blockchain by using a centralized server to store the blockchain. Similarly to CUB, the nodes are divided into subsets, and each subset has a centralized server outside the system, achieving a 93% reduction in storage. The nodes that interact within the system guarantee the integrity of the database by keeping the hashes of the blocks. However, the solution conflicts with the goals of decentralized applications, as it introduces a central point of failure that affects data availability.

Decentralized Data
SE-Chain is a protocol proposed by Da-Yu Jia et al. [19] in which the degree of redundant storage depends on the consistency of the system. Each node in SE-Chain works as a Bitcoin node and redundantly stores the blockchain; however, blocks that have become consistent are stored in fewer replicas, i.e., the greater the depth of a block in the chain, the fewer the nodes that store it. This strategy is inspired by decentralized file systems, such as IPFS [31] or Swarm [32]. However, reducing blockchain replicas drastically reduces availability, and a DApp, being an application that does not rely on third parties, risks losing data essential for guaranteeing traceability. In addition, queries for transactions at a high depth would incur longer delays.
Lightweight blockchain, proposed by Chunlin Li et al. [20], is an optimization scheme based on the Reed-Solomon (RS) erasure code [33] that reduces storage overhead while ensuring the availability and reachability of the blockchain. The storage scheme targets resource-constrained devices, making it more accessible for IoT scenarios. Moreover, RS erasure coding reduces storage without data loss in the blockchain. However, the proposal does not specify how transactional throughput varies with the specific IoT scenario. Erasure coding is a complex scheme that could potentially affect throughput. The effectiveness of the proposal in reducing storage costs still needs to be evaluated in permissionless blockchains to verify its benefits.
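The storage advantage of erasure coding over full replication can be quantified with a simple ratio, sketched in Python. With a Reed-Solomon code of k data fragments and m parity fragments, any k of the k + m fragments rebuild the data, so the overhead is (k + m)/k while up to m fragments may be lost. The parameters below (3 replicas, k = 10, m = 4) are illustrative assumptions, not values from [20], and the encoding and decoding computation, which is the throughput concern raised above, is not modeled.

```python
def replication_overhead(copies: int) -> float:
    """Bytes stored per byte of data when `copies` full replicas are kept."""
    return float(copies)


def reed_solomon_overhead(k: int, m: int) -> float:
    """Bytes stored per byte of data for an RS code with k data and m parity fragments.

    Any k of the k + m fragments suffice to rebuild the data,
    so up to m fragments can be lost.
    """
    return (k + m) / k


print(replication_overhead(3))       # 3.0, tolerates losing 2 replicas
print(reed_solomon_overhead(10, 4))  # 1.4, tolerates losing any 4 fragments
```

In this illustration the erasure code survives more losses than triple replication while storing less than half as many bytes.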

Summary
Finally, this section identifies a gap in the current research landscape: there is a lack of studies on the relation between throughput and storage in permissionless blockchains. This gap is evident in Table 1, where existing research focuses on either transactional throughput or storage efficiency, but not both. We have identified that the relation between transactional throughput and storage is complex, and that understanding the variables in this relation must be approached from the perspective of transactional models. Therefore, this paper aims to fill this gap by characterizing the relationship between storage and transactional throughput through transactional patterns in the UTXO model.

Understanding the Execution of Transaction Models and Their Relation to Blockchain Storage
This section analyzes the most relevant transactional models in the literature, such as the UTXO and the account model. The goal is to understand their transactional behavior and the relationship with storage. This was done by abstracting the transactional models of Bitcoin and Ethereum into transactional cases: three cases for the UTXO model and one for the account model. Using these abstractions, we performed a formal and experimental comparison and identified which of the two models incurs higher storage costs.

UTXO Model Storage Growth Analysis
In the UTXO transactional model, each transaction consumes one or more unspent outputs and generates one or more new outputs. When a new transaction is created, it is possible to choose which unspent outputs it consumes. This selection is arbitrary as long as the sum of the inputs is greater than or equal to the total value of the outputs. This arbitrariness allows independent transactions to be executed simultaneously while ensuring that each new transaction is directly linked to previous transactions on the blockchain. To better understand transaction execution, consider the following example.
Example: Suppose that Alice purchases a coffee from Bob using Bitcoin. Alice has BTC 0.2 in unspent outputs in her wallet, and the coffee costs BTC 0.1. Depending on how the unspent outputs are selected, three cases can arise: (a) a single output, (b) multiple outputs with values less than the payment amount, or (c) multiple outputs, one of which equals the payment amount.

(a) In the first case, as shown in Figure 5a, Alice has a single output in her wallet with a value of BTC 0.2. To pay for the coffee, she creates a transaction that splits the BTC 0.2 unspent output into two new outputs: one of BTC 0.1 sent to Bob and another of BTC 0.1 returned to herself as change.
(b) In the second case, as shown in Figure 5b, Alice has multiple outputs in her wallet, each with a value less than the payment amount. To pay for the coffee, she merges lower-value unspent outputs until they add up to BTC 0.1 and creates a transaction that pays Bob.
(c) In the third case, as shown in Figure 5c, Alice has multiple outputs in her wallet, one of which equals the payment amount. To pay for the coffee, she creates a new transaction that transfers that unspent output to Bob.
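The three cases of the Alice and Bob example can be reproduced by a small coin-selection routine in Python. The selection order below (exact match first, then the smallest single larger output, then merging largest-first) is a hypothetical heuristic chosen to reproduce cases (a) to (c); it is not the algorithm of any particular wallet. Values are in satoshis (0.1 BTC = 10,000,000) and fees are ignored.

```python
def pay(wallet: list[int], amount: int) -> tuple[str, list[int], list[int]]:
    """Return (case, consumed outputs, created outputs) for one payment.

    Hypothetical heuristic: exact match, then a single larger output,
    then merge smaller outputs. Values in satoshis; fees ignored.
    """
    if amount in wallet:  # case (c): transfer an output of exactly the right value
        return "transfer", [amount], [amount]
    larger = [v for v in wallet if v > amount]
    if larger:  # case (a): split one output into payment + change
        v = min(larger)
        return "split", [v], [amount, v - amount]
    picked, total = [], 0
    for v in sorted(wallet, reverse=True):  # case (b): merge smaller outputs
        picked.append(v)
        total += v
        if total >= amount:
            change = [total - amount] if total > amount else []
            return "merge", picked, [amount] + change
    raise ValueError("insufficient funds")


print(pay([20_000_000], 10_000_000))                       # case (a)
print(pay([4_000_000, 3_000_000, 3_000_000], 10_000_000))  # case (b)
print(pay([10_000_000, 10_000_000], 10_000_000))           # case (c)
```

Case (a) creates more outputs than it consumes, case (b) fewer, and case (c) the same number, which is the difference in storage behavior the example is meant to expose.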
The example above shows that the execution of the UTXO model has two features: the arbitrary selection of unspent outputs and the concurrent execution of transactions. Arbitrary selection gives granular control over the inputs consumed by each transaction and provides flexibility, since a transaction can consume many different combinations of unspent outputs. This flexibility allows independent unspent outputs to be spent simultaneously. It also allows a single transaction to perform multiple operations from a single unspent output, increasing transactional throughput. However, we have observed that this simultaneous execution in the UTXO model incurs a high storage cost, which escalates as the number of new unspent outputs increases. This additional storage demand reduces the storage efficiency of the nodes that replicate the blockchain. A detailed analysis of the storage costs associated with the UTXO model is provided in Section 5.3.

Account Model Storage Growth Analysis
In the account model, each user has a unique address that serves as an identifier and is associated with a balance resulting from the user's transaction history. Figure 6 shows how an address's balance, as a state, is updated by transactions, which subtract the transferred value from the sender's account and add it to the recipient's account. Traditional banking systems are an example of the account model: a user has a unique account number associated with their balance. When a user initiates a transaction, the funds are debited from their account and credited to the recipient's account. The account balance represents the current state of the user's funds, and all transactions are recorded in a ledger. Ethereum's programmability allows for two additional types of transactions within its account model: those that deploy contracts on the Ethereum virtual machine (EVM) and those that invoke functions of these smart contracts. Each contract within the EVM operates under its own set of rules and is executed when invoked by external transactions. However, maintaining account states in Ethereum, as illustrated in Figure 6, requires transaction serialization. This constraint limits transaction throughput, but it is offset by transactions that require less storage capacity.
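The serialized balance update of the account model can be sketched in Python. The per-account `nonce` is loosely modeled on Ethereum's sender nonce, which forces one sender's transactions to apply strictly in order; gas, signatures, and contract execution are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int
    nonce: int = 0  # number of transactions already sent from this account


def apply_transfer(state: dict[str, Account], sender: str, recipient: str,
                   value: int, nonce: int) -> None:
    """Debit the sender and credit the recipient; nonces force sequential order."""
    acct = state[sender]
    if nonce != acct.nonce:
        raise ValueError(f"expected nonce {acct.nonce}, got {nonce}")
    if acct.balance < value:
        raise ValueError("insufficient balance")
    acct.balance -= value
    acct.nonce += 1
    state.setdefault(recipient, Account(0)).balance += value


state = {"alice": Account(20)}
apply_transfer(state, "alice", "bob", 10, nonce=0)
apply_transfer(state, "alice", "carol", 5, nonce=1)
print(state["alice"], state["bob"].balance)  # Account(balance=5, nonce=2) 10
```

The state holds one entry per account regardless of how many transfers were applied, which keeps it compact, but the second transfer cannot be validated until the first has updated the nonce, illustrating the serialization that limits throughput.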
In the transactional models described above, we find significant differences in transaction execution, which directly impact storage requirements. For instance, the Bitcoin model can split one output to create new ones and merge multiple outputs into a smaller set, as shown in the Alice and Bob example. Consequently, the storage size of each transaction varies with the number of outputs it manipulates. The account model, on the other hand, manages state serially, which is less storage-intensive but offers lower throughput. Each transaction updates account states directly, leading to a more predictable and often smaller storage footprint per transaction than in the Bitcoin model.
To validate this analysis, we conduct an analytical study comparing the two models by representing them as graphs (the details of the formal comparison of the two transactional models are available in Appendix A), and we evaluate a particular case in the following subsection.

Transaction Sizes in Bitcoin and Ethereum
In this section, we analyze the Bitcoin and Ethereum blockchains. Based on the previous section, our hypothesis is that the UTXO model requires more storage than the account model. For our comparison, we used a random sample of 10% of the transactions processed on each blockchain up to 4 July 2023, resulting in the analysis of 84,474,947 transactions in Bitcoin and 348,506,740 in Ethereum.
For data extraction, we used a set of specific tools and libraries: BlockSci version 0.7 [34], Geth version 1.12.0 [35], and Python 3 with the Pandas, NumPy, Multiprocessing, and Matplotlib libraries. The repository for reproducing the experiments is available at https://github.com/jdom1824/Unlocking-UTXO-transactional-patterns (accessed on 3 June 2024). The results are visualized as histograms, shown in Figures 7 and 8, to facilitate comparison. The X-axis represents the size of the transactions, and the Y-axis represents the number of transactions. Comparing the histograms shows that the distribution of transaction sizes in Bitcoin extends up to 1 MB. This is a significant size that reflects the UTXO model's ability to accommodate large transactions. In contrast, only a small number of Ethereum transactions reach sizes of up to 0.3 MB, less than a third of the maximum observed in Bitcoin, indicating more compact transactions in Ethereum's model.
A closer look at the data reveals that most Ethereum transactions are concentrated at sizes up to about 0.13 MB. This is a narrower range than in Bitcoin, where a wider distribution is observed, reaching up to 0.2 MB. This difference in distribution patterns between the two cryptocurrencies provides valuable insights into their respective transactional models.
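The histograms rest on counting transactions per fixed-width size bin. The authors' repository uses Pandas, NumPy, and Matplotlib (NumPy offers `numpy.histogram` for this); the dependency-free Python sketch below only illustrates the binning step. The sizes and bin width are made-up values, not data from the study.

```python
from collections import Counter


def size_histogram(sizes_bytes: list[int], bin_bytes: int) -> dict[int, int]:
    """Count transactions per fixed-width size bin (bin start in bytes -> count)."""
    counts = Counter((s // bin_bytes) * bin_bytes for s in sizes_bytes)
    return dict(sorted(counts.items()))


# Hypothetical transaction sizes in bytes, NOT data from the study.
sample = [250, 300, 420, 1_200, 130_000, 1_000_000]
print(size_histogram(sample, 100_000))  # {0: 4, 100000: 1, 1000000: 1}
```

Only non-empty bins appear in the result, so a long-tailed distribution such as Bitcoin's shows up as a few sparse large-size bins far from the dense low-size bins.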
Taken together, these observations suggest that the transactional model of Bitcoin implies a higher storage cost. This cost is not static; it is expected to escalate with the fragmentation of unspent outputs, as depicted in example (a) of Figure 5. This trend suggests that as Bitcoin usage grows, its storage requirements will grow as well.
On the other hand, Ethereum presents a different scenario. It has a lower storage cost that is expected to remain constant within the same storage ranges. This stability is related to transaction serialization, indicating a more stable model for Ethereum in terms of storage. This has significant implications for the development of DApps on Ethereum.

Summary
This section analyzes the execution of the transactional models of both Bitcoin and Ethereum. We identified that the transactional model of Bitcoin is more flexible when selecting the available outputs to consume, while the Ethereum model presents simpler transactions that are easily programmable. The flexibility of the UTXO model makes it efficient when transferring value to users, while the account model is limited by the serialization required to update the state of the EVM.
We established the hypothesis that the UTXO transactional model incurs higher storage costs due to the splitting and consolidation of unspent outputs. We confirmed our hypothesis with an analytical study in Appendix A as well as the histograms shown in Figures 7 and 8. Although the UTXO model is storage-intensive, it also allows for significant transaction throughput. This is achieved by allowing multiple operations within a single transaction, which provides flexibility in balancing transaction throughput against storage and signals a trade-off between the two parameters.
In the following section, we focus on the UTXO transactional model, specifically on the model's flexibility to perform multiple operations.We delve deeper into transactional patterns to define the trade-off between storage parameters and transactional throughput.

Unlocking Transactional Patterns Based on Spent-By Relation
This section unlocks transactional patterns of the UTXO model to reveal the trade-off between transactional throughput and storage. To do this, we used abstractions from the previous analyses and defined the spent-by relation. We then compared the cardinalities of the sets related by the spent-by relation (less than, greater than, or equal) to observe three transactional patterns within the UTXO model: splitting, merging, and transferring. For clarity in this analysis, we assume that the number of nodes (η) within a permissionless blockchain system grows linearly.

Defining the UTXO Model as a DAG
The UTXO model is defined as a DAG. Formally, it is represented as a tuple $G = (V, R)$, where $V$ is a finite set of vertices and $R$ is a set of edges, such that:

• The set of vertices represents the outputs of the UTXO model and is divided into two subsets, V = Ξ ∪ Θ. Here, Ξ is the set of spent outputs, and Θ is the set of unspent outputs.
• The set of edges R is determined by the spent-by relation, which specifies how the elements of Ξ and Θ are related.
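The following is a minimal sketch of this definition, not the authors' implementation. The DAG can be stored as a vertex set plus a set of spent-by edges, and Ξ and Θ follow directly from them: an output is spent exactly when it is the source of some edge. Several choices here are assumptions for illustration:

- the class name `UtxoGraph`;
- the use of plain strings as output identifiers;
- the standard UTXO rule, enforced below, that a transaction may only consume currently unspent outputs and must create new ones. This rule is what keeps the graph acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class UtxoGraph:
    """Sketch of G = (V, R): vertices are outputs, edges follow the spent-by relation.

    An edge (x, y) records that output x was spent by the transaction that
    created output y, i.e., x <- y.
    """

    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def spent(self) -> set:
        # Xi: every output that is the source of at least one spent-by edge.
        return {x for x, _ in self.edges}

    def unspent(self) -> set:
        # Theta: outputs that no later transaction has consumed.
        return self.vertices - self.spent()

    def add_transaction(self, inputs, outputs) -> None:
        """Consume `inputs` (currently unspent outputs) and create `outputs`.

        A transaction with no inputs plays the role of a Coinbase transaction.
        """
        if not outputs:
            raise ValueError("a transaction must create at least one output")
        available = self.unspent()
        for x in inputs:
            if x not in available:
                raise ValueError(f"{x!r} is not an unspent output")
        for y in outputs:
            if y in self.vertices:
                raise ValueError(f"{y!r} already exists")
        # New edges always point to brand-new vertices, so no cycle can form.
        self.vertices.update(outputs)
        self.edges.update((x, y) for x in inputs for y in outputs)


g = UtxoGraph()
g.add_transaction([], ["cb"])          # Coinbase output
g.add_transaction(["cb"], ["a", "b"])  # splitting
g.add_transaction(["a", "b"], ["c"])   # merging
print(sorted(g.spent()), sorted(g.unspent()))  # ['a', 'b', 'cb'] ['c']
```

In this example, a Coinbase output is split into two outputs, which are then merged. This leaves three outputs in Ξ and one in Θ.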

Spent-By Relation "←"
To define the spent-by relation, we begin by extracting a subgraph H = (V′, R′) from the graph G, as illustrated in Figure 9, where $\Xi_s \subset \Xi$, $\Theta_s \subset \Theta$, and $V' = \Xi_s \cup \Theta_s$. The spent-by relation defines the set of relations that exist between subsets of spent outputs and unspent outputs. Formally, we define the spent-by relation as a subset R′ of the Cartesian product $\Xi_s \times \Theta_s$:

$$x \leftarrow y, \quad \text{where } x \in \Xi_s \text{ and } y \in \Theta_s.$$

Based on the cardinality relation between $\Xi_s$ and $\Theta_s$, different transactional behaviors are observed: splitting, merging, and transferring. The splitting pattern occurs when a set of spent outputs is divided into a larger set of unspent outputs (i.e., $|\Theta_s| > |\Xi_s|$). The merging pattern occurs when multiple spent outputs are combined into a smaller number of unspent outputs (i.e., $|\Theta_s| < |\Xi_s|$). Lastly, the transferring pattern arises when each element of $\Xi_s$ is linked to exactly one element of $\Theta_s$ (i.e., $|\Theta_s| = |\Xi_s|$), representing a one-to-one relation between spent and unspent outputs.
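The cardinality test above can be stated directly as code. This sketch assumes, as our own reading for illustration, that when the test is applied to a single transaction, Ξ_s is its set of inputs and Θ_s is its set of outputs. The function name `classify_pattern` and the output identifiers are hypothetical.

```python
def classify_pattern(spent_outputs, unspent_outputs) -> str:
    """Label a spent-by subgraph H by comparing |Xi_s| with |Theta_s|."""
    n_spent = len(set(spent_outputs))
    n_unspent = len(set(unspent_outputs))
    if n_spent == 0:
        raise ValueError("a spent-by subgraph needs at least one spent output")
    if n_unspent > n_spent:
        return "splitting"     # |Theta_s| > |Xi_s|
    if n_unspent < n_spent:
        return "merging"       # |Theta_s| < |Xi_s|
    return "transferring"      # |Theta_s| = |Xi_s|


# The three scenarios of Figure 5, using hypothetical output identifiers.
print(classify_pattern(["alice_0.2"], ["bob_0.1", "alice_change_0.1"]))  # splitting
print(classify_pattern(["a1", "a2", "a3"], ["bob_0.1"]))                 # merging
print(classify_pattern(["alice_0.1"], ["bob_0.1"]))                      # transferring
```

Applied to Figure 5, scenario (a) is classified as splitting, because one BTC 0.2 output is spent into two BTC 0.1 outputs. Scenario (b) is classified as merging and scenario (c) as transferring.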

Unlocking Transactional Patterns
This section focuses on transactional patterns and introduces the relationship between throughput and storage parameters.

Splitting Pattern
To illustrate the splitting pattern, let us revisit the example of Alice and Bob, specifically the scenario presented in Figure 5a. This pattern divides one or several unspent outputs into smaller parts, as illustrated in Figure 10. Note that we have generalized the splitting pattern to all scenarios in which the number of unspent outputs exceeds the number of spent outputs. In the splitting pattern, the number of operations depends on a factor defined by the application. For instance, a single Bitcoin in an unspent output can be divided into up to $10^8$ new outputs [36], since one Bitcoin consists of $10^8$ satoshis. Therefore, to calculate the number of outputs per splitting pattern and its associated storage, we present the following definitions:

Definition 2. (Outputs per splitting pattern)
The number of outputs produced by a splitting pattern within a given time interval is quantified using two parameters: the splitting factor ($\kappa_s$) and the time interval ($t$), where $\kappa_s = |\Theta_s|$ and $|\Theta_s| > |\Xi_s|$. Consequently, the output rate per time interval can be expressed as follows:

$$\sigma_s = \frac{\kappa_s}{t}$$

Definition 3. (Storage per output splitting pattern)
The storage generated by the splitting pattern is related to the average output size ($\tau$), the number of outputs generated per time interval ($\sigma_s$), and the number of nodes in the system ($\eta$). This is represented as follows:

$$S_s = \tau \cdot \sigma_s \cdot \eta$$

Note that the value of $\kappa_s$ in Definition 2 is determined by each application, which constrains the number of new outputs. We assume that $\kappa_s$ is a very large number; therefore, $\sigma_s$, and with it transactional throughput, can be high. However, as Definition 3 indicates, transactional throughput and storage are strongly related. This relation is only observable at the level of transaction models. Our observations reveal that as the number of outputs processed in a transaction increases, so does the storage cost on the nodes. Consequently, storage grows in proportion to transactional throughput.
To evaluate the maximum growth of storage, we use Big O notation: following the splitting pattern, storage grows as $O(\kappa_s \eta)$.
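The sketch below makes Definitions 2 and 3 concrete. It reads them as σ_s = κ_s / t and S_s = τ · σ_s · η, treats τ as a constant average output size, and measures t in seconds. All numbers in the example are hypothetical. They are chosen only to show that storage scales linearly in both κ_s and η, as O(κ_s η) states.

```python
def output_rate(kappa: float, t: float) -> float:
    """sigma = kappa / t: outputs produced per unit of time."""
    if t <= 0:
        raise ValueError("the time interval t must be positive")
    return kappa / t


def pattern_storage(tau: float, sigma: float, eta: int) -> float:
    """S = tau * sigma * eta: bytes added across all eta nodes per unit of time."""
    return tau * sigma * eta


# Hypothetical values: 3000 outputs every 600 s, 50-byte outputs, 10,000 nodes.
sigma_s = output_rate(3000, 600)
print(sigma_s)                               # 5.0
print(pattern_storage(50, sigma_s, 10_000))  # 2500000.0
```

Doubling κ_s (6000 outputs per 600 s) doubles the storage, and doubling η has the same effect. This is the proportionality between throughput and storage described above.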

Merging Pattern
The merging pattern emerges from the consolidation of multiple outputs into a reduced set of unspent outputs, as in the example of Alice and Bob presented in the previous section (Figure 5b). Its primary characteristic is that it reduces the number of new outputs relative to the number of inputs, counterbalancing the splitting pattern. The abstraction of this pattern is illustrated in Figure 11. To calculate the number of outputs per merging pattern and the amount of storage used per output, we present the following definitions.
Definitions 4 and 5 illustrate how the merging pattern improves the efficiency of future transactions. Consolidating multiple outputs into a reduced set of unspent outputs reduces the number of outputs that subsequent transactions must process. As a result, less time and fewer computational resources are required to process and validate transactions, improving overall system efficiency. Note, however, that the storage expression for the merging pattern has the same form as that of the splitting pattern. We recognize that the average output size, τ, can vary significantly depending on the pattern or transaction type; we explore this variation further in Section 6. Following the merging pattern, storage grows as $O(\kappa_m \eta)$.
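The efficiency argument rests on the size of the unspent set. Each transaction changes |Θ| by |Θ_s| − |Ξ_s|, so:

- a splitting step enlarges the unspent set;
- a merging step shrinks it;
- a transferring step leaves it unchanged.

The counter below is an illustrative sketch. The function name and the `(n_spent, n_unspent)` encoding are our assumptions. It tracks only the number of unspent outputs and ignores values and sizes.

```python
def utxo_count_after(initial: int, transactions) -> list:
    """Track |Theta| as a sequence of (n_spent, n_unspent) transactions is applied.

    Each transaction removes n_spent outputs from the unspent set and adds
    n_unspent new ones, so the net change is n_unspent - n_spent.
    """
    count = initial
    history = [count]
    for n_spent, n_unspent in transactions:
        if n_spent > count:
            raise ValueError("cannot spend more outputs than are currently unspent")
        count += n_unspent - n_spent
        history.append(count)
    return history


# One Coinbase output is split into 4, merged back into 1, then transferred.
print(utxo_count_after(1, [(1, 4), (4, 1), (1, 1)]))  # [1, 4, 1, 1]
```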

Transferring Pattern
The transferring pattern represents the exchange of ownership between parties without any computational processing to split or merge unspent outputs. This pattern can be seen in a scenario where an unspent output changes ownership by being included as an input in a new transaction that generates a new output, as illustrated in Figure 5c.
The transferring pattern is a fundamental component of both the Bitcoin UTXO model and the Ethereum account model. In the UTXO model, it is characterized by the serialized tracing of unspent outputs, while in the account model, it updates the state of individual accounts or Ethereum addresses. Both models share the transferring pattern for managing transactions, as depicted in the abstraction shown in Figure 12. An interesting feature of the transferring pattern is that only one one-to-one operation is carried out in each time interval. This structure has notable implications for both storage requirements and transactional throughput.

Definition 6. (Storage per output transferring pattern)
The storage generated by the transferring pattern is related to the average output size ($\tau_a$), the number of outputs generated per time interval ($\sigma_t$), and the number of nodes in the system ($\eta$). This is represented as follows:

$$S_t = \tau_a \cdot \sigma_t \cdot \eta$$

Since the operations in a transferring-pattern transaction cannot run concurrently, storage grows at a constant rate. In terms of computational complexity, the storage requirements for this pattern therefore increase linearly with the number of nodes, $O(\eta)$. This follows from the fact that the transferring pattern is sufficient to represent the serialization process within both the account model and the UTXO model.
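This sketch reads Definition 6 in the same form as the other patterns, S_t = τ_a · σ_t · η. It also takes the text's premise that exactly one one-to-one operation occurs per time interval, so σ_t = 1, which is our simplification. Under these assumptions, each interval adds the same fixed amount of storage, and once τ_a is fixed that amount depends only on η. The function name and values are hypothetical.

```python
def cumulative_transfer_storage(tau_a: float, eta: int, intervals: int) -> list:
    """Cumulative storage when exactly one output is created per interval.

    With serialized, one-to-one transfers, sigma_t = 1 output per interval, so
    every interval adds the same amount tau_a * sigma_t * eta.
    """
    sigma_t = 1
    per_interval = tau_a * sigma_t * eta
    return [per_interval * k for k in range(1, intervals + 1)]


# Hypothetical values: 100-byte outputs replicated on 10 nodes, 3 intervals.
print(cumulative_transfer_storage(100, 10, 3))  # [1000, 2000, 3000]
```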

Relationship between Throughput and Storage
Transactional throughput refers to the system's capacity to process transactions over a time interval, and each transaction in environments such as Bitcoin can generate multiple outputs.
We consider the following parameters before defining the transactional throughput and its relationship with storage:

• Outputs across transactional patterns: This parameter, denoted as κ, represents the total number of outputs generated by all transactional patterns (splitting, merging, and transferring). It is the sum of the outputs from each pattern, expressed as follows, where $\kappa_t$ denotes the outputs of the transferring pattern:

$$\kappa = \kappa_s + \kappa_m + \kappa_t$$

• Number of outputs of all transactional patterns in a time interval: This parameter, denoted as σ, represents the total number of outputs generated by all transactional patterns per time interval. It is calculated by dividing the total number of outputs κ by the time interval t, expressed as follows:

$$\sigma = \frac{\kappa}{t}$$

• Average number of outputs per transaction: This parameter, denoted as λ, represents the average number of outputs generated per transaction. It is calculated by dividing the total number of outputs κ by the total number of transactions Tx, expressed as follows:

$$\lambda = \frac{\kappa}{\mathrm{Tx}}$$

Definition 7. (Transactional Throughput)
We define transactional throughput (tps) as the number of transactions processed per second. If σ is the total number of outputs generated in a time interval t (in seconds), and λ is the average number of outputs per transaction, then transactional throughput is calculated as follows:

$$\mathrm{tps} = \frac{\sigma}{\lambda}$$

Definition 8. (Throughput-Storage Relationship)
The storage generated by each transactional pattern is related to the average output size (τ), the number of outputs generated per time interval (σ), and the number of nodes in the system (η). Therefore, the relation between transactional throughput and storage is given by the following:

$$S = \tau \cdot \sigma \cdot \eta = \tau \cdot \lambda \cdot \mathrm{tps} \cdot \eta$$

Increasing the transactional throughput (tps) also increases the number of outputs per time interval (σ) and, therefore, the required storage.
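The chain of definitions can be checked numerically. The sketch below computes κ, σ, λ, and tps in turn. It writes κ_t for the number of outputs produced by the transferring pattern. It then confirms that τ · λ · tps · η equals τ · σ · η, so any increase in tps at a fixed λ raises storage proportionally. The function name and all input values are hypothetical.

```python
def throughput_and_storage(kappa_s, kappa_m, kappa_t, n_tx, t, tau, eta):
    """Return (tps, storage) from per-pattern output counts over an interval t (seconds)."""
    kappa = kappa_s + kappa_m + kappa_t  # total outputs across all patterns
    sigma = kappa / t                    # outputs per second
    lam = kappa / n_tx                   # average outputs per transaction
    tps = sigma / lam                    # transactions per second (= n_tx / t)
    storage = tau * lam * tps * eta      # equals tau * sigma * eta
    return tps, storage


# Hypothetical interval: 100 transactions in 10 s producing 500 outputs in total.
tps, storage = throughput_and_storage(300, 50, 150, n_tx=100, t=10, tau=100, eta=1_000)
print(tps, storage)  # 10.0 5000000.0
```

Note that tps reduces to the number of transactions divided by t. λ therefore acts as the conversion factor between transactional throughput and output growth, and hence storage growth.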

Summary
In this section, we unlock the transactional patterns inherent in the UTXO model. We formalize the UTXO model as a DAG and define the spent-by relation. Based on the definitions of each pattern, we reveal the trade-off between transactional throughput and storage, highlighting that storage growth is related to the number of new outputs generated. We analyze each pattern's contribution to storage size using Big O notation, under the premise that the number of nodes in the permissionless blockchain network increases linearly. However, although the analysis indicates that the splitting and merging patterns consume more storage, these results are not directly comparable because we assumed a constant output size τ. In the following section, we examine this variable in depth to determine which pattern is most costly in storage and which provides more flexibility in throughput.

Experimental Comparison of Storage Costs in UTXO Transactional Patterns
This section analyzes the storage cost of each pattern to identify which pattern is most costly and which provides greater flexibility in the storage/throughput trade-off.
In our theoretical analysis, we used a constant output size ($\tau_a$) for the transferring, merging, and splitting patterns. For this experimental study, we used the entire Bitcoin blockchain as our dataset, examining a total of 791,800 blocks to determine the storage of each pattern. Figure 13 shows the experimental framework for our analysis using Bitcoin Core version 0.22 [37]. We synchronized a full Bitcoin node up to 4 July 2023 and extracted data for further analysis using BlockSci version 0.7.0. After extracting the data, we filtered the dataset by transactional pattern and converted it into graphical representations to make the results discussed in this section clearer and easier to interpret. From this work, we created a database of 800 million transactions, available in [38], which can be used to replicate the experiments.
As mentioned before, the first step with the dataset was filtering and classifying Bitcoin transactions. This classification yields the distribution of transactional patterns within Bitcoin, shown as a pie chart in Figure 14. We observed that the splitting pattern is the most frequent in Bitcoin, accounting for 64.6% of transactions (545,585,796 in total). This trend emerges because Bitcoins are generated through the Coinbase transaction, which includes a UTXO holding a significant amount of Bitcoin. Due to the high dollar value of each Bitcoin, its use likely begins with a division.
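The percentages in Figure 14 can be reproduced from the three counts reported in this section. The helper name below is ours. The counts sum to 844,575,056 classified transactions.

```python
def pattern_shares(counts: dict) -> dict:
    """Percentage of transactions per pattern, rounded to one decimal place."""
    total = sum(counts.values())
    return {pattern: round(100 * n / total, 1) for pattern, n in counts.items()}


counts = {
    "splitting": 545_585_796,
    "transferring": 186_881_657,
    "merging": 112_107_603,
}
print(sum(counts.values()))    # 844575056
print(pattern_shares(counts))  # {'splitting': 64.6, 'transferring': 22.1, 'merging': 13.3}
```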
The transferring pattern accounts for 22.1% of classified transactions (186,881,657 in total), making it the second most used pattern in Bitcoin. The reason is that Bitcoin fees are levied according to the storage a transaction consumes; since this pattern is the least storage-intensive, it is the second most common.
The merging pattern accounts for 13.3% of transactions (112,107,603 in total). From these data, we infer that consolidating unspent outputs is a more storage-intensive process. We assume that an output available for spending must encompass the causal history of previous transactions.
The classification depicted in Figure 14 reflects the most used patterns in Bitcoin and suggests that the merging pattern is the most storage-intensive. We then analyze each transactional pattern individually, plotting the number of outputs against storage size. This clarifies the storage difference between the splitting and merging patterns and confirms that transferring is the least storage-intensive pattern.

Storage Cost in Splitting Pattern
Figure 15 provides a graphical illustration of transactions classified under the splitting pattern. The X-axis represents transaction size in megabytes (MB), whereas the Y-axis represents the number of outputs in each transaction. The density and distribution of the data in the chart confirm our initial observation that the splitting pattern is dominant within Bitcoin.
Regarding the relation between the number of outputs and storage costs, we identified splitting transactions with up to 15,000 outputs in a single transaction, requiring up to 0.5 MB of storage. Nevertheless, most transactions have up to 4000 outputs, with a storage requirement close to 0.2 MB.
We highlight the significance of the splitting pattern in Bitcoin. Although it is the most common transactional pattern and some of its transactions demand substantial storage, the overall trend remains moderate. Because one Bitcoin can be split into up to 100 million parts, a thorough analysis of this pattern is crucial to guide future research on the Bitcoin network.

Storage Cost in Transferring Pattern
Figure 16 provides a graphical illustration of transactions classified under the transferring pattern. Based on the spent-by relation, this pattern comprises transactions with a one-to-one relation between spent and unspent outputs. In the graph, the X-axis represents transaction size in megabytes (MB), while the Y-axis indicates the number of outputs.
Some transactions reach up to 2000 outputs, with a storage cost of 0.15 MB. However, the overall trend centers on transactions with approximately 500 outputs and a storage requirement of about 0.05 MB.
In addition, Figure 16 reveals two distinct point distributions, each representing a specific transaction type. Notably, certain transactions with a larger number of outputs have a lower storage cost, especially in the 0.05 to 0.06 MB range. This variability in storage arises from the diversity of transaction types in Bitcoin. These include standard transactions, Multisig transactions [39], Pay-to-Script-Hash (P2SH) transactions [40], SegWit transactions [41], CoinJoin transactions [42], and time-locked transactions [43]. Each type has its own storage characteristics and requirements, which is reflected in the variety of transactions observed in the graph.

Storage Costs in the Merging Pattern
Figure 17 classifies transactions under the merging pattern. In this chart, the X-axis represents transaction size in megabytes (MB), while the Y-axis represents the number of outputs involved. Notably, several transactions reach up to 1 MB, the maximum capacity of a Bitcoin block before the SegWit implementation, with between 6000 and 7500 outputs. However, most transactions in this pattern have approximately 2500 outputs and consume close to 0.2 MB of storage.
As in the previous figures, some transactions have a storage distribution that deviates from the main trend of the merging pattern. This reflects the variety of transaction types that Bitcoin offers. This diversity merits future study and could support classifying patterns by Bitcoin transaction type [44].

Analysis of Transaction Pattern in Storage/Throughput Flexibility
In our detailed review of the three patterns, we observed that the transferring pattern has the lowest storage requirement and is the second most common pattern in Bitcoin. The splitting pattern is the most common and offers the best trade-off between transactional throughput and storage. The merging pattern supports operations that consolidate thousands of outputs but requires more storage. However, the storage cost of each pattern varies with the transaction structure. For example, a structure with more spent outputs than unspent outputs is more costly in terms of storage. This is because proving ownership of the coins requires unlocking the transaction script with a digital signature, as shown in Table 2. Comparing the structures of spent and unspent outputs in Tables 2 and 3 further illustrates their different storage requirements.

Field | Description | Size
PkScript | Script that sets the conditions to unlock funds | Variable
Value | Satoshi amount | 8 bytes
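Serializing an output shows why Value has a fixed size of 8 bytes while PkScript has a variable size. In Bitcoin's standard serialization, PkScript is preceded by a CompactSize length prefix, which the table above does not list; the sketch below assumes that standard layout. The function names are ours. The 20 zero bytes are a placeholder for a real public-key hash in a Pay-to-Public-Key-Hash (P2PKH) script.

```python
import struct


def compact_size(n: int) -> bytes:
    """Bitcoin CompactSize (variable-length) encoding of a non-negative integer."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def serialize_txout(value_sats: int, pk_script: bytes) -> bytes:
    """Value (8-byte little-endian integer) + script length + PkScript."""
    return struct.pack("<q", value_sats) + compact_size(len(pk_script)) + pk_script


# P2PKH script: OP_DUP OP_HASH160 <20-byte hash> OP_EQUALVERIFY OP_CHECKSIG.
p2pkh = bytes([0x76, 0xA9, 0x14]) + bytes(20) + bytes([0x88, 0xAC])
txout = serialize_txout(10_000_000, p2pkh)  # BTC 0.1 = 10,000,000 satoshis
print(len(txout))  # 34
```

Under these assumptions, a standard P2PKH output occupies 34 bytes (8 + 1 + 25). Spending that output, by contrast, requires an input that carries unlocking data such as a digital signature (Table 2). This is consistent with structures that have many spent outputs costing more storage.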

Summary
In this section, we found that the UTXO model has no fixed storage size for spent and unspent outputs; the size varies with the transactional pattern and the transaction type. We observed that the splitting pattern offers the best trade-off between throughput and storage. It allows a very large number of operations in a single transaction (up to 15,000 outputs in our data) while keeping storage low. This benefit is balanced by the merging pattern, which consolidates these outputs in transactions that are more storage-intensive but reduce the number of outputs and keep the set of unspent outputs manageable for processing. Finally, we conclude that the key to achieving storage scalability in a permissionless blockchain system lies in strategies that optimally balance throughput and storage at the level of transactional patterns.

Discussion
To our knowledge, this research is the first to highlight the importance of the relationship between throughput and storage efficiency, setting the stage for future research on achieving high transactional throughput without sacrificing storage efficiency.
In the current state of the art, approaches tend to focus on throughput at the expense of storage, or vice versa. For example, while techniques such as sharding and off-chain processing improve throughput, they also introduce storage challenges. Similarly, storage reduction methods reduce transactional throughput. Our approach shows that it is possible to balance the two parameters. For example, Section 6.1 reveals that the splitting pattern in the UTXO model sustains a high number of operations while keeping storage consumption low. Thus, techniques that favor this pattern over the others should be beneficial in terms of storage requirements. This insight paves the way for new blockchain designs that manage this trade-off, leading to more scalable blockchains.

Practical Implications of Transactional Patterns
Unlocking transactional patterns to abstract transactions in a granular manner is applicable across several areas of blockchain research. For instance, the direct relation between inputs and outputs that our model describes enhances traceability analyses. In high-frequency trading environments that use private blockchains, storage grows constantly. There, the splitting pattern could increase throughput by allowing transactions to be executed in parallel. Lastly, new types of transactions could be proposed based on the identified transactional patterns, which could enhance privacy and security in blockchain environments.

Discussion of Experimental Results
The experimental comparison, based on classifying the transactions of 791,800 blocks, shows how each pattern's storage requirements grow with the number of outputs. For example, in the splitting pattern, which represents 64.6% of the transactions, storage grows by an average of 32 bytes per output. This ability to increase the number of operations at a relatively low storage cost makes the pattern storage-efficient.
The transferring pattern, which comprises 22.1% of the transactions, requires around 0.05 MB for approximately 500 outputs, or about 100 bytes per output. This places it in an intermediate position in terms of storage efficiency.
On the other hand, the merging pattern, which represents 13.3% of the transactions, involves the consolidation of multiple inputs into fewer outputs, which is inherently more storage-intensive. This consolidation pattern has an average output size of 128 bytes. Although it is crucial for managing and reducing the number of UTXOs in the system, it also introduces higher storage costs, with transactions that can reach up to 1 MB.
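The per-output figures above follow from simple division, assuming 1 MB = 10^6 bytes; this is the convention implied by 0.05 MB / 500 outputs ≈ 100 bytes. The sketch below first reproduces that arithmetic. It then projects the size of a hypothetical transaction with 1000 outputs from the reported per-output averages. That projection rests on an explicitly linear assumption that is ours, not the data's. The helper names are hypothetical.

```python
def bytes_per_output(size_bytes: int, n_outputs: int) -> float:
    """Average storage attributable to each output of a transaction."""
    return size_bytes / n_outputs


def projected_size(n_outputs: int, avg_bytes_per_output: float) -> float:
    """Linear projection (an assumption): each extra output adds the same cost."""
    return n_outputs * avg_bytes_per_output


# Transferring pattern: about 0.05 MB (50,000 bytes) for about 500 outputs.
print(bytes_per_output(50_000, 500))  # 100.0

# Per-output averages reported in this section, in bytes.
reported = {"splitting": 32, "transferring": 100, "merging": 128}
for pattern, avg in reported.items():
    print(pattern, projected_size(1_000, avg))
```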

Future Research
Future research will explore models that delineate the relationships among transactional throughput, storage, latency, availability, and reachability. Future studies will also investigate different transaction types in Bitcoin to develop methods that optimize storage efficiency.
One strategy for future work is to maintain the balance between the sets of outputs in a transaction by identifying transactional patterns. For example, transactions in the mempool could be grouped according to the splitting and merging patterns into a single transaction, similar to a CoinJoin transaction. This would reduce storage requirements and allow more transactions per block, increasing throughput.
We anticipate that any method seeking to increase transactional throughput will also need to consider storage requirements. Future work could explore fragmenting the blockchain by transactional pattern to manage space while carefully increasing throughput. We invite other researchers to use the databases [38] and tools shared in this study to analyze blockchains based on the UTXO model, such as Litecoin, Dogecoin, and Cardano. Future work with these tools will aim to identify the transactional patterns of these blockchains and compare them with this study to improve the storage scalability of the system.

Conclusions
This research focuses on permissionless public blockchains and reveals the trade-off between the storage and transactional throughput parameters. We unlocked the transactional patterns of the Bitcoin and Ethereum transactional models and found a direct relation between transactional throughput and storage. We defined the spent-by relation, which reveals transactional patterns within the UTXO model and facilitates the categorization of Bitcoin and Ethereum transactions. After a detailed analysis of the storage growth of each pattern, we found that the UTXO model incurs more storage overhead than the account model. We obtained this result by abstracting the transactional patterns and evaluating the storage growth of each pattern using Big O notation, assuming that the set of nodes in the permissionless blockchain network grows linearly. In this way, we have encapsulated the transactional behavior of both the Bitcoin and Ethereum networks. Our results highlight the need to consider the relationship between throughput and storage to achieve storage scalability in blockchains.
In the UTXO model, as we have previously seen, a single transaction t can theoretically divide a UTXO into n new UTXOs. If a state change contains m transactions, the maximum number of new operations is therefore $O(m \cdot n)$.
In contrast, the account model allows each transaction t to generate at most a single operation.In a state change with m transactions, the maximum number of operations is m, and the complexity is O(m), considering that each transaction can perform one operation or modify a smart contract, as shown in previous examples.
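The contrast between the two models can be made concrete by counting operations in one state change under the assumptions stated above. Each UTXO transaction spends one UTXO and creates n outputs. Each account-model transaction applies a single balance update in sequence, as in Figure 6. The function names are hypothetical, and balances are integer units to avoid floating-point rounding. As in the text, practical limits such as block size are ignored.

```python
def utxo_state_transition(m: int, n: int) -> int:
    """New outputs when each of m transactions splits one UTXO into n outputs."""
    new_outputs = 0
    for _ in range(m):
        new_outputs += n  # one transaction: 1 spent UTXO -> n new UTXOs
    return new_outputs


def account_state_transition(balances: dict, transfers: list) -> int:
    """Apply transfers one at a time; each transfer is a single operation."""
    operations = 0
    for sender, receiver, amount in transfers:
        if balances.get(sender, 0) < amount:
            raise ValueError(f"{sender} has insufficient balance")
        balances[sender] -= amount
        balances[receiver] = balances.get(receiver, 0) + amount
        operations += 1
    return operations


print(utxo_state_transition(m=3, n=4))  # 12
balances = {"alice": 5, "bob": 0}
print(account_state_transition(balances, [("alice", "bob", 1), ("bob", "carol", 1)]))  # 2
print(balances)  # {'alice': 4, 'bob': 0, 'carol': 1}
```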
It is crucial to note that the abstraction of these models is used to compare them in terms of operations per state transition and, therefore, does not capture the complexity of more advanced transactions in Bitcoin or Ethereum.
From the perspective of Big O notation, the account model has lower complexity in the number of operations it can process in a state transition, whereas the UTXO model has higher complexity. These calculations are theoretical and do not consider practical limitations, such as block size or the maximum number of divisions of a UTXO in Bitcoin. Nevertheless, this abstraction allows us to conclude that the UTXO model is more efficient for generating a large number of operations in a state transition, although this performance comes at a higher storage cost. On the other hand, Ethereum is less costly in terms of storage, but the number of operations per state transition is limited by its serialization.

Figure 1. Growth trend of Ethereum storage capacity. The bar chart illustrates the exponential growth in Ethereum's storage demand over time, peaking at 12,483 nodes and requiring nearly 6000 terabytes of storage.

Figure 2. Graphical representation of a DAG showing the flow of transactions in the UTXO model from a Coinbase output (1) to a single input (8), highlighting the divergence and convergence of paths.

Figure 3. Serialized graph illustrating the transaction sequence in the account model from the origin node (1) to the end node (4).

Figure 4. Scatter plot showing the dilemma faced by blockchain environments between transactional throughput and storage efficiency. The dots indicate proposals to improve one of the two parameters, including decentralization, centralization, block size, off-chain strategies, and sharding.

Figure 5. Transaction scenarios in the UTXO model: (a) Alice splits a single output of BTC 0.2 to pay Bob BTC 0.1 and returns BTC 0.1 to herself; (b) Alice consolidates several smaller outputs, summing up to BTC 0.1, for Bob's payment; and (c) Alice directly transfers an output of BTC 0.1 to pay Bob the exact amount due for the coffee, illustrating the flexibility in transaction structuring within the UTXO model.

Figure 6. Illustration of the account model: the transition from State N to State N + 1 via a transaction in which Alice sends 0.1 Ether to Bob, updating both their wallet balances.

Figure 7. Histogram showing the distribution of Bitcoin transaction sizes on a logarithmic scale, compiled from a dataset of 84,474,947 transactions, highlighting the frequency of transaction sizes in megabytes.

Figure 8. Histogram illustrating the size distribution of Ethereum transactions on a logarithmic scale, showing the variation in transaction sizes up to 0.3 megabytes.

Figure 9. Visualization of a subset of the UTXO model represented as a DAG, where the highlighted subgraph H delineates the relation between spent and unspent outputs within the system.

Let us define the set of edges, R′, which satisfies the following properties:
• $R' = \{(x, y) \mid x \in \Xi_s,\ y \in \Theta_s\}$. This represents all pairs (x, y), where x and y are elements of the sets $\Xi_s$ and $\Theta_s$, respectively.
• $|R'| = |V'| - 1$. This means that the number of edges in R′ is one less than the number of vertices in V′.

Figure 10. Splitting pattern, where a single input from A is divided into multiple outputs B, C, ..., representing n possible outputs.

Definition 4. (Outputs per merging pattern)
In this definition, we use $\kappa_m$ to represent the number of outputs generated by the merging pattern, where $\kappa_m = |\Theta_s|$ and $|\Theta_s| < |\Xi_s|$. Therefore, the number of outputs generated by the merging pattern equals the number of unspent outputs, which by definition is smaller than the number of spent outputs.

Definition 5. (Storage per output merging pattern)
The average output size ($\tau$), the number of outputs generated per time interval ($\sigma_m$), and the number of nodes in the system ($\eta$) determine the storage generated by the merging pattern per time interval as follows:

$$S_m = \tau \cdot \sigma_m \cdot \eta$$

Figure 11. Merging pattern, where multiple outputs from nodes B, C, ..., converge into a single output at node A.

Figure 12. Transferring pattern, showing a direct relation from X to receiver Y.

Figure 13. Flowchart of the experimental framework used for analyzing transactional patterns in the UTXO model, from data extraction with Bitcoin Core 0.22, through processing with BlockSci 0.7.0 and Python 3/C++, to the final stage of converting data into figures for result interpretation and feedback iteration.

Figure 14. Pie chart showing the relative distribution of splitting, merging, and transferring patterns within Bitcoin, with numerical and percentage breakdowns for each category.

Figure 15. Scatter plot relating transaction size in megabytes (MB) to the number of outputs for transactions that follow the splitting pattern, where each point represents a single transaction.

Figure 16. Scatter plot showing the relation between transaction size (MB) and the corresponding number of outputs for the transferring pattern, which maintains a one-to-one spent-by relation; each data point represents a single transaction with an equal number of inputs and outputs.

Figure 17. Scatter plot showing the relation between transaction size in megabytes (MB) and the number of outputs for transactions characterized by the merging pattern, illustrating the consolidation of multiple inputs into fewer outputs.

Table 2. Spent output in a regular Bitcoin transaction.

Table 3. Unspent output in a regular Bitcoin transaction.