Context matters: Methods for Bitcoin tracking

Bitcoin and other cryptocurrencies are well known for privacy properties that allow the “anonymous” exchange of money. Bitcoin tracking with taint analysis remains challenging because it accounts neither for changes in Bitcoins' ownership nor for the use of Privacy-Enhancing Technologies (PETs) to obscure Bitcoins' movement.


Introduction
Research into Bitcoin tracking remains relevant because of the need to identify and trace Bitcoins related to illegal activities such as ransomware, sales of illicit goods, tax evasion, and cryptocurrency theft. For example, the Singaporean cryptocurrency exchange KuCoin was hacked in September 2020 and lost around 1000 Bitcoins (66 million USD) (Hui and Zhao, 2021). Such illegal activities erode trust in cryptocurrencies. While Bitcoin no longer offers the most effective tracking resistance compared to newer cryptocurrencies with additional privacy protocols (e.g., Zcoin (Miers et al., 2013) and Monero (Noether, 2015)), it remains the most prominent and valuable cryptocurrency in use today due to its wide acceptance (Hileman and Rauchs, 2017) and its pseudonymity system that protects users' identities (Nakamoto, 2009; Conti et al., 2018; Brown, 2016). This makes Bitcoin attractive to individuals looking for a currency that is less traceable than traditional currencies.
Earlier studies (Möser et al., 2014; Anderson et al., 2018), which we present in Section 2, proposed Bitcoin tracking methodologies known as taint analysis. These studies represent the current state of the art in taint analysis, as they have been adopted by numerous more recent works on Bitcoin tracking (Wu et al., 2021a; Óskarsdóttir et al., 2020; Bergman and Rajput, 2021) and analysis (Wu et al., 2021a; de Balthasar and Hernandez-Castro, 2017). However, tracking Bitcoins remains challenging for two reasons: the current tracking methodology keeps following Bitcoins from one address to another even long after they have been exchanged with other, unrelated users, and new Privacy-Enhancing Technologies (PETs), such as the CoinJoin method and mixer services, allow individuals to evade Bitcoin tracking (Meiklejohn and Orlandi, 2015). There has been no significant improvement in the precision with which the movement of individual users' Bitcoins can be traced. Our work therefore contributes to the development of cryptocurrency tracking, which can assist cybersecurity efforts against cryptocurrency crime.
In this paper, we propose a methodology to improve the precision of Bitcoin tracking by making the tracking process adapt to the context of address ownership and tracking evasion. The tracking process stops tracking Bitcoins that are considered to be no longer in the hands of the targeted users (e.g., the illegal Bitcoin users who stole the targeted Bitcoins), since continuing to track them would not provide meaningful information (we refer to this as unessential tracking).
We conduct an experiment to illustrate the application of our methodology using historic Bitcoin theft incidents as sample cases. We summarise our contributions as follows:
• We propose a new approach to Bitcoin tracking that can significantly reduce the number of unrelated transactions in the tracking results by tailoring taint analysis strategies to track tainted Bitcoins until they reach potential exit points (e.g., service and mixer entities). The approach improves on state-of-the-art taint analysis by producing significantly fewer false positive results. The tracking and PET profiling in this work also include recently developed PETs that previous tracking studies did not consider, such as decentralised mixer services and transaction protocols like the Lightning Network.
• We develop two new context-based taint analysis strategies that use the background of the targeted Bitcoins to track their movement, instead of the arbitrary distribution rule-sets of previously proposed strategies.
• We design a set of metrics to evaluate tracking accuracy based on transaction and address indicators that reflect specific behaviours (e.g., the distribution of the stolen Bitcoins and PET usage). These metrics show potential for detecting shifts in transaction behaviour that may signify a change in Bitcoin ownership not identified by the address profile data.
The remainder of the paper proceeds as follows. In Section 2, we review related work on Bitcoin tracking and deanonymisation methods. We detail our methodology in Section 3. We present the sample cases we investigate and the criteria we use to build control groups for our experimentation in Section 4. We detail the results we obtained in Section 5. We discuss the overall results and limitations in Section 6. Lastly, we conclude in Section 7.
The Python source code implementation of the methodology is available online. 1 We used BlockSci (Kalodner et al., 2020) to extract Bitcoin blockchain data. The Bitcoin blockchain data we use in this experiment spans from the genesis block (2009-01-03) to block 673,473 (2021-03-06).

Related work
In this section, we review the Bitcoin tracking and deanonymisation literature (Section 2.1) that proposes the cryptocurrency forensic analysis methods we implement in this work, followed by the Privacy-Enhancing Technologies (PETs) that facilitate tracking evasion (Section 2.2), which we profile for Bitcoin tracking modelling and tracking evaluation.

Cryptocurrency tracking methods
Bitcoin and other cryptocurrencies that use a privacy protocol similar to Bitcoin's can be traced with a method called taint analysis and deanonymised with address clustering.

Taint analysis
Taint analysis or taint checking is a well-known data analysis concept that assigns a "tainted" value to a specific data source of interest and analyses the flow of information by following possible paths the tainting can reach. The concept is commonly used for security exploit detection (Piskachev et al., 2021), programming code analysis (Zhang et al., 2021;Galea and Kroening, 2020), and information flow analysis (Mandal et al., 2020).
Taint analysis is utilised as a tracking method in cryptocurrency that tracks targeted cryptocurrency coins using transaction information in the blockchain. The primary purpose of cryptocurrency taint analysis is to determine the association between the addresses in a transaction (Möser et al., 2013), which can be used to classify the targeted cryptocurrency coins (e.g., stolen Bitcoins resulting from a known theft transaction) as tainted (or "dirty"), and any address that uses or transfers them will be considered a tainted address. Meanwhile, coins that are unrelated to tainted coins are considered clean coins. Each taint analysis strategy applies a specific rule-set to estimate how the targeted cryptocurrency coins are distributed in the subsequent transactions.
The taint analysis method can be employed for tracking cryptocurrencies that utilise transparent blockchain systems similar to Bitcoin. For example, several studies have implemented taint analysis to track the movement of Ethereum coins and smart contract tokens (Cheng et al., 2019a; Gao et al., 2019). However, privacy-oriented cryptocurrencies that obscure transaction and address information, like Zcash, Monero, 2 and Zcoin (Firo), 3 are typically immune to taint analysis tracking because they are explicitly designed to strengthen privacy against blockchain-based tracking.
We identify three taint analysis strategies proposed in the literature, which have been implemented for Bitcoin tracking in various studies (Spagnuolo et al., 2014;de Balthasar and Hernandez-Castro, 2017;van Wegberg et al., 2018;Ahmed et al., 2018;Cheng et al., 2019b).
Fig. 1. Distribution of tainted Bitcoins under the (a) Poison, (b) Haircut, (c) FIFO and (d) LIFO strategies. White rectangles represent clean inputs or outputs, dark grey rectangles represent fully tainted ones, and light grey rectangles represent partly tainted ones. Transaction fees are not accounted for in these examples.
2.1.1.1. Poison and Haircut. The Poison strategy classifies every output of a transaction that spends tainted Bitcoins as fully tainted, regardless of the amount of tainted Bitcoins involved (Möser et al., 2014), as shown in Fig. 1a. As a result, the amount of tainted Bitcoins can increase exponentially when tainted and clean Bitcoins are used together in the same transaction, a drawback that prevents the strategy from providing precise tracking results.
The Haircut strategy shares the same tainting methodology as the Poison strategy but adds a rule: instead of being classified as entirely tainted, each output in the transaction receives a share of the tainted inputs proportional to its value (Möser et al., 2014), as shown in Fig. 1b.
While the Poison and Haircut strategies are the most common tracking strategies used in the previous Bitcoin tracking research, we argue that both strategies typically produce a very large number of tainted transactions due to their tainting methodology, especially when tainted Bitcoins get combined with other clean Bitcoins, which makes them impractical for both tracking and analysis purposes.
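To make the difference between the two rules concrete, the sketch below applies them to a single transaction. It is an illustrative simplification, not the paper's implementation: it assumes each input is given as a hypothetical `(value, tainted_amount)` pair, outputs are plain values, and fees are ignored as in Fig. 1.

```python
def poison(inputs, output_values):
    """Poison: if any input carries taint, every output becomes fully tainted.

    inputs: list of (value, tainted_amount); output_values: list of values.
    Returns the tainted amount assigned to each output.
    """
    any_taint = any(tainted > 0 for _, tainted in inputs)
    return [value if any_taint else 0 for value in output_values]


def haircut(inputs, output_values):
    """Haircut: each output receives taint in proportion to its value."""
    total_in = sum(value for value, _ in inputs)
    tainted_in = sum(tainted for _, tainted in inputs)
    ratio = tainted_in / total_in if total_in else 0.0
    return [value * ratio for value in output_values]


# One 3-unit tainted input and one 1-unit clean input, paying two 2-unit outputs.
print(poison([(3, 3), (1, 0)], [2, 2]))   # [2, 2]
print(haircut([(3, 3), (1, 0)], [2, 2]))  # [1.5, 1.5]
```

In the example, Poison reports 4 units of taint although only 3 tainted units entered the transaction, which is the inflation described above; Haircut conserves the 3 tainted units but spreads them over every output.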
2.1.1.2. FIFO (first-in, first-out). FIFO is a concept from asset inventory management that determines the order in which items are distributed: the first item that goes in is also the first one that goes out, as shown in Fig. 1c.
The FIFO strategy has been adopted for taint analysis on the grounds that it is already established in law enforcement for tracking stolen traditional currency, and that it can provide more precise results than the Poison and Haircut strategies because it does not consider every resulting output as tainted. This would allow governments or relevant organisations to implement more practical law enforcement and blacklisting systems that confine illegal Bitcoins to a smaller number of transaction outputs and addresses.
However, as the FIFO strategy distributes Bitcoins in a uniform, predetermined order (from top to bottom), it can produce inaccurate tracking results by assigning the tainted Bitcoins to transaction outputs that are not their intended destination (e.g., to unrelated users in a PET transaction). Using the example in Fig. 1c, if the 410 BTC output is the intended destination for the tainted Bitcoins, the FIFO strategy will produce inaccurate tracking results from that point onwards. Therefore, the FIFO strategy is impractical on its own for tracking purposes and should instead be implemented in combination with other taint analysis strategies.
2.1.1.3. LIFO (last-in, first-out). A strategy that is a natural alternative to the FIFO strategy is the LIFO strategy which operates in the opposite ordering of the FIFO strategy. The LIFO strategy assumes that the last item that goes in is always the first to go out, as shown in Fig. 1d. It should be mentioned that the LIFO strategy also shares the same weakness as mentioned for the FIFO strategy.
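The ordering rules of FIFO and LIFO can be sketched as a single "fill the outputs top to bottom" routine that differs only in the order in which inputs are consumed. This is a simplified illustration under stated assumptions: each input is a hypothetical `(value_in_satoshis, is_tainted)` pair that is either fully tainted or clean, and LIFO is modelled as consuming the inputs in reverse order.

```python
def _distribute(ordered_inputs, output_values):
    """Fill outputs top to bottom from inputs taken in the given order.

    ordered_inputs: list of (value, is_tainted) pairs, values in satoshis.
    Returns the tainted amount received by each output.
    """
    remaining = [[value, tainted] for value, tainted in ordered_inputs]
    taint = [0] * len(output_values)
    j = 0  # index of the input currently being consumed
    for i, out_value in enumerate(output_values):
        need = out_value
        while need > 0 and j < len(remaining):
            take = min(need, remaining[j][0])
            if remaining[j][1]:
                taint[i] += take
            remaining[j][0] -= take
            need -= take
            if remaining[j][0] == 0:
                j += 1  # input exhausted; move on to the next one
    return taint


def fifo(inputs, output_values):
    # First input in is the first to be assigned to the outputs.
    return _distribute(inputs, output_values)


def lifo(inputs, output_values):
    # Last input in is the first to be assigned to the outputs.
    return _distribute(list(reversed(inputs)), output_values)


# A 3-unit tainted input followed by a 1-unit clean input, paying outputs of 1 and 3.
print(fifo([(3, True), (1, False)], [1, 3]))  # [1, 2]
print(lifo([(3, True), (1, False)], [1, 3]))  # [0, 3]
```

Both strategies conserve the 3 tainted units, but they attribute them to different outputs purely because of ordering, which is the weakness discussed above: neither rule knows which output is the intended destination.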
Although taint analysis provides a foundation for Bitcoin tracking, simply applying taint analysis strategies to track individual users' Bitcoins from one address to another is inefficient, because taint analysis on its own does not take transaction purpose and address ownership into account and therefore often produces unessential tracking results regardless of the strategy employed. For example, once Bitcoins reach addresses belonging to a cryptocurrency service, the tainted Bitcoins have already been exchanged with the service and are no longer in the possession of the targeted users. Consequently, it is futile to continue tainting the Bitcoins once they reach a service, as the tracking would only reveal the transaction activity of the service and other Bitcoin users from then on.

Address clustering and deanonymisation
Both address clustering and address deanonymisation can assist Bitcoin forensic analysis by supplying address ownership information to the taint analysis tracking process and by helping the analysis determine how Bitcoins are actually spent and moved.
Bitcoin address clustering is the process of linking and classifying Bitcoin addresses that likely belong to the same user/entity into a group or cluster based on transaction information and predefined clustering heuristics. The Bitcoin address clustering process starts with every address classified in a cluster of one address. Bitcoin address clustering then processes all transactions of the Bitcoin blockchain to merge clusters that intersect according to the chosen heuristic. After the clustering process is completed, each address cluster can be labelled with an identification of the potential owners. As Bitcoin address clustering heuristics are typically created based on a specific assumption of transaction pattern due to the lack of verifiable ground-truth information, the proposed heuristics are not guaranteed to be capable of providing completely accurate results.
T. Tironsakkul, M. Maarek, A. Eross et al. Forensic Science International: Digital Investigation 42-43 (2022) 301475
The most commonly used Bitcoin address clustering heuristic is the multi-input heuristic, or input-sharing clustering, which classifies all the input addresses of the same transaction as belonging to the same entity (Kuzuno and Karam, 2017; Chang and Svetinovic, 2018; Zheng et al., 2020). The multi-input heuristic stems from the fact that, to perform a multi-input transaction, the owner(s) of all the input addresses must authorise the transaction by signing with the addresses' private keys, which often implies that the input addresses of the same transaction belong to the same person, who possesses all of those private keys. Address clustering is part of the address deanonymisation process, and Bitcoin addresses can be profiled using external, off-blockchain information. For example, two studies by Jourdan et al. (2018, 2019) apply the multi-input address clustering heuristic to create an address cluster data set and use external address profile data scraped from the Wallet Explorer website 4 to identify the entities behind cryptocurrency service addresses. The research identified 30,331,700 addresses belonging to 272 unique entities and discovered common transaction patterns among entities of the same service type that can be used to classify other unidentified addresses.
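The clustering procedure described above (start with one cluster per address, then merge clusters that share a transaction's inputs) maps naturally onto a union-find (disjoint-set) structure. The sketch below is a minimal, self-contained illustration of the multi-input heuristic, assuming transactions are given simply as lists of input addresses; it is not BlockSci's clustering API.

```python
class UnionFind:
    """Disjoint-set forest over addresses, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen address starts as its own cluster
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def multi_input_clusters(transactions):
    """Apply the multi-input heuristic: all inputs of a transaction share an owner.

    transactions: iterable of lists of input addresses.
    Returns {cluster_root: set_of_addresses}.
    """
    uf = UnionFind()
    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        uf.find(first)  # register single-input transactions too
        for address in input_addresses[1:]:
            uf.union(first, address)
    clusters = {}
    for address in uf.parent:
        clusters.setdefault(uf.find(address), set()).add(address)
    return clusters


# "a" and "b" co-spend, then "b" and "c" co-spend, so a, b, c form one cluster.
print(multi_input_clusters([["a", "b"], ["b", "c"], ["d"]]))
```

Because merging is transitive, a single CoinJoin transaction that mixes inputs from unrelated users would wrongly fuse their clusters, which is why CoinJoin undermines this heuristic.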
Although the multi-input address clustering heuristic is one of the most widely used, the CoinJoin method we discuss in Section 2.2 can significantly reduce its effectiveness, in addition to reducing the effectiveness of tracking.

Evading tracking with Privacy-Enhancing Technologies (PET)
There are three prominent PETs explicitly designed to facilitate Bitcoin tracking evasion, which Bitcoin users can utilise to obscure their transactions as presented below.

CoinJoin
CoinJoin is a PET where multiple Bitcoin users share the same transaction (Maxwell, 2013). The transaction created with the CoinJoin method, whether manually or via an external service, would often be a transaction with a very large number of input and output addresses. The primary purpose of the CoinJoin method is to specifically reduce the precision of address clustering by combining unrelated transactions from multiple users together in a single transaction, thereby reducing the accuracy of the multi-input address clustering heuristic (Maurer et al., 2017;Meiklejohn and Orlandi, 2015). It should be noted that while the CoinJoin method on its own would not be able to completely invalidate the taint analysis tracking, it can reduce the precision of taint analysis due to the increasing number of unrelated transaction outputs in a transaction.

Bitcoin mixing
Bitcoin mixing is a PET provided by a cryptocurrency mixer service (also often referred to as a laundering or tumbling service), which performs the mixing process for its users (Barber et al., 2012). A mixer service operates by having users deposit their Bitcoins to one of its deposited addresses. The service then sends unrelated Bitcoins to the user's destination address(es) in one or multiple transactions (Herrera-Joancomartí, 2015; Kethineni et al., 2018). The most successful mixing process produces what is commonly called "zero-taint" Bitcoins by completely removing any transaction connection between the original and the resulting mixed Bitcoins, thereby rendering taint analysis tracking completely ineffective.

Off-chain transactions
The off-chain transaction is an external mechanism that allows Bitcoin users to exchange Bitcoins outside of the blockchain. One example is the Lightning Network protocol. As exchanges of Bitcoins in the off-chain transaction system are not recorded inside the blockchain, illegal Bitcoin users can evade blockchain transaction tracking by spending their illegal Bitcoins via the off-chain transaction system.
The Lightning Network protocol allows two or more Bitcoin users to exchange Bitcoins without requiring any confirmation on the Bitcoin blockchain. Any Bitcoin user can create a Lightning Network channel, which appears in the Bitcoin blockchain as a P2WSH output to a multi-signature (bech32) address. Users first set up a Bitcoin Lightning Network node and send Bitcoin funds to the Lightning Network multi-signature address to create a network channel (Tikhomirov et al., 2020). This transaction is typically referred to as a funding or opening transaction. Next, users can connect to other Lightning Network nodes, which allows them to exchange Bitcoins with other users in the network. Each Lightning Network channel has a maximum capacity. The first version of the Lightning Network protocol has a maximum channel capacity of 0.042 BTC (Russell, 2017), while the maximum capacity was increased to 0.167 BTC in the version 0.10 update in 2020 (Russell et al., 2020).
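The on-chain footprint described above suggests a simple first-pass filter for candidate funding outputs: a P2WSH output whose value does not exceed the channel capacity limit. The sketch below is a hypothetical simplification, not the detection method of the cited studies. The `TxOutput` type and its `"p2wsh"` script-type label are assumptions for illustration, and the default limit simply reuses the 0.167 BTC figure quoted in the text. Because a P2WSH output only commits to a script hash, the 2-of-2 multi-signature script is revealed only when the output is spent, so this filter can only flag candidates.

```python
from typing import List, NamedTuple


class TxOutput(NamedTuple):
    script_type: str  # hypothetical labels, e.g. "p2wsh", "p2wpkh", "p2pkh"
    value_sat: int    # output value in satoshis


# ~0.167 BTC, the capacity figure quoted in the text (assumption for this sketch)
MAX_CHANNEL_SAT = 16_700_000


def candidate_funding_outputs(outputs: List[TxOutput],
                              max_capacity_sat: int = MAX_CHANNEL_SAT) -> List[int]:
    """Return indices of outputs that could be Lightning channel funding outputs."""
    return [
        index
        for index, out in enumerate(outputs)
        if out.script_type == "p2wsh" and 0 < out.value_sat <= max_capacity_sat
    ]


outputs = [
    TxOutput("p2wpkh", 50_000),      # ordinary payment
    TxOutput("p2wsh", 1_000_000),    # 0.01 BTC P2WSH: candidate
    TxOutput("p2wsh", 50_000_000),   # 0.5 BTC P2WSH: above the quoted limit
]
print(candidate_funding_outputs(outputs))  # [1]
```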
Bitcoin exchanges between users inside the Lightning Network channel can be performed without any limitation until the channel is closed. Upon closing, the channel's address will distribute the Bitcoins back to users' addresses according to the closing balance in a transaction called settlement or closing transaction that will appear in the Bitcoin blockchain (Poon and Dryja, 2015).

Context-based bitcoin tracking methodology
We discuss the data gathering process for address profiling in Section 3.1 and for transaction profiling in Section 3.2. We then introduce the context-based taint analysis strategies in Section 3.3 and propose the address and transaction metrics for evaluation in Section 3.4.
Taint analysis operates by tracking Bitcoins with a specific tracking strategy (e.g., Haircut or FIFO). The tracking process typically produces transaction trails unrelated to the targeted users' activities, as it does not differentiate the ownership of the addresses that receive the tainted Bitcoins. The effectiveness of Bitcoin tracking can be improved by integrating context information about the targeted Bitcoins and the transactions and addresses involved. By context we mean information external to the blockchain that informs on the nature of some transactions and addresses (e.g., transactions known to be illegal acts or addresses identified as cryptocurrency services), as well as knowledge of practices inside the Bitcoin ecosystem that can be recognised (patterns of PETs). Therefore, the key principle of our methodology is that the tracking process should take into consideration the background of the targeted Bitcoins, the purposes of transactions, and the ownership of the addresses being tracked, and adapt its tracking operation accordingly. We formulate the methodology with three main aspects as follows.
1. The modelling of Bitcoin tracking using the context of address profiling based on identified service and mixer addresses (Section 3.1), and of identified PET transactions such as CoinJoin and mixer service transactions, using identified transaction patterns and our hypothesised properties of potential PET transactions (Section 3.2). The purpose of the address profile is to determine the tracking scope and influence the taint analysis methodology, while the PET transaction profile is used to evaluate the tracking results.
2. The introduction of two context-based taint analysis strategies that we compare as part of our evaluation (Section 3.3).
3. The evaluation of the tracking outcomes with a set of address and transaction metrics. The evaluation metrics are potential characteristics based on the background of the targeted Bitcoins (Section 3.4).
This tracking methodology is specifically designed for cryptocurrencies with an open blockchain, such as Bitcoin and similar cryptocurrencies, and is less applicable to cryptocurrencies with an obscured blockchain, such as Zcoin, which uses the Zerocoin protocol (Miers et al., 2013), and Monero (Noether, 2015), because the tracking process relies on transaction and address information from within the blockchain.
The context-based tracking methodology is summarised in Fig. 2. First, we gather address profile data on identified service and mixer addresses and incorporate it into the tracking algorithm to tailor the taint analysis process. Second, we build transaction profiles of PET service transactions from identified transaction patterns, which we use only for evaluation. Third, we collect transaction data on known Bitcoin theft cases from publicly available sources to serve as sample cases and select control groups with similar transaction characteristics. Fourth, we perform taint analysis with two established taint analysis strategies, namely FIFO and LIFO, and two context-based taint analysis strategies, namely the Dirty-First and Taint-In, Highest-Out (TIHO) strategies. We use several taint analysis strategies rather than a single one because each strategy has its own strengths and weaknesses that can affect the accuracy of the tracking results. We do not implement the Poison and Haircut strategies because they produce an enormous number of tainted transactions, as mentioned in Section 2.1.1. Lastly, we compare and evaluate the taint analysis results of the sample cases and control groups with evaluation metrics based on address and transaction behaviour.

Address profiling
The tracking process can be improved with the implementation of address profiling into the tracking algorithm to prevent the process from following unrelated transactions. In order to achieve this, we require information that can indicate the entities behind pseudonymous addresses and the type of such entities (service or PETs). As such information typically does not exist in the blockchain, we obtain this information from external sources.
The gathering process for the address profile data consists of three stages. First, we retrieve address profile data from previous studies on Bitcoin address classification and mixer service analysis. Second, we run a web scraping process on a public Bitcoin address tagging website to obtain more cryptocurrency service address profile data. Third, we apply the multi-input address clustering heuristic to the scraped service address data and to the mixer deposited address data obtained from previous mixer analysis studies. The resulting address profile data therefore consists of addresses belonging to various types of cryptocurrency services and mixer services.
There are three limitations and risks in the gathering process described above. First, the publicly available address profile data likely covers only a small proportion of all addresses belonging to cryptocurrency services, which means that the tracking process may continue to follow tainted Bitcoins that pass through unidentified service addresses. Second, the address profile data we retrieve may be inaccurate and contain false positive profiles. We address this second risk by adding a data verification step to the second gathering stage (web scraping) to ensure that the scraped address data does not contain false positives (see Section 3.1.1). Third, the accuracy of the multi-input address clustering heuristic that we employ in the third gathering stage can be significantly reduced by the CoinJoin method. We argue that the heuristic can still be reliably used to cluster addresses belonging to cryptocurrency services because such services have less need for additional privacy protection. Furthermore, studies analysing the mixing mechanisms of the profiled mixer services found no evidence of the CoinJoin method being used in their mixing operations.

Identified service addresses
One aim of our tracking methodology is to identify addresses that mark a change of ownership, or exit points, of the targeted Bitcoins. We assume that the ultimate purpose of illegal Bitcoins is to be exchanged for other currencies, goods, and services in either virtual or physical form. This assumption is informed by public research and organisations' reports (Robinson, 2021; Huang et al., 2018; Wang et al., 2021; Europol, 2022). Therefore, we consider service addresses to be the end goal, or exit point, of the targeted Bitcoins and stop tracking the specific Bitcoin outputs that reach them. We classify as service addresses any addresses belonging to cryptocurrency services, such as cryptocurrency exchanges, online gambling services, e-commerce businesses, marketplaces (including dark-net markets), or payment services, where users can exchange their Bitcoins for other currencies or goods.
We retrieve address profile data from Jourdan et al. (2018, 2019), who publish Bitcoin service address profile data labelled with six service categories: exchange, service, gambling, mining, darknet market, and historic (services that are no longer operational). We classify cryptocurrency faucets, e-commerce businesses, service donation addresses, and other kinds of cryptocurrency services as other services. We also reassign the historic services to their appropriate service types, as shown in Table 1. The total numbers of service addresses and entities we obtained from these two studies are shown in the same table.
We use an address profile data gathering methodology similar to that of Jourdan et al. (2018, 2019) to obtain more address profile data, but with an additional verification process. We first run a web scraping script on the CheckBitcoinAddress website 5 to obtain addresses reported as belonging to a cryptocurrency service. We then verify that the scraped addresses belong to a cryptocurrency service by manually searching for them on public websites and removing addresses for which we cannot find public evidence of ownership by the associated service entity. We subsequently apply the multi-input address clustering heuristic to the scraped addresses to expand the address data. The total number of service addresses obtained with this method is shown in the "% Addresses Added" column of Table 1.
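The expansion step (propagating a verified entity label to every address co-spent with it) can be sketched as a breadth-first search over a co-spending graph. This is an illustration under assumptions, not the paper's code: `expand_labels`, the `{address: entity}` seed format, and the choice to report conflicting addresses (clusters reached from seeds of two different entities) for manual review are all hypothetical design choices.

```python
from collections import defaultdict, deque


def expand_labels(seed_labels, transactions):
    """Propagate verified entity labels to addresses linked by the multi-input heuristic.

    seed_labels: {address: entity} for manually verified addresses.
    transactions: iterable of lists of input addresses.
    Returns (expanded_labels, conflicting_addresses).
    """
    # Co-spending graph: link every input address to the transaction's first input.
    graph = defaultdict(set)
    for inputs in transactions:
        for address in inputs[1:]:
            graph[inputs[0]].add(address)
            graph[address].add(inputs[0])

    expanded, conflicts = {}, set()
    for seed, entity in seed_labels.items():
        queue, seen = deque([seed]), {seed}
        while queue:
            address = queue.popleft()
            if address in expanded and expanded[address] != entity:
                conflicts.add(address)  # reached from two different entities
            else:
                expanded[address] = entity
            for neighbour in graph[address] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return expanded, conflicts


labels, conflicts = expand_labels({"x": "ExchangeA"}, [["x", "y"], ["y", "z"], ["p", "q"]])
print(labels)     # x, y and z are labelled ExchangeA; p and q stay unlabelled
print(conflicts)  # set()
```

Flagging conflicts rather than silently overwriting labels reflects the risk discussed earlier: a single CoinJoin-like transaction can link two unrelated entities, and such merges are better reviewed than trusted.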

Identified PET addresses
The presence of a mixer service can indicate the points where taint analysis tracking is no longer effective because of the creation of zero-taint Bitcoins, as mentioned in Section 2.2.2. Therefore, we consider targeted Bitcoins that reach identified mixer service addresses to be no longer traceable with taint analysis strategies and stop tracking those specific Bitcoin outputs.
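The two stopping rules (service addresses as exit points, mixer addresses as points of lost traceability) can be combined with any distribution strategy in a single pass over transactions in blockchain order. The sketch below is a hypothetical illustration, not the paper's implementation: the transaction dictionary layout, `track`, and the `exit_addrs` mapping are assumptions, and Haircut is used only because it is the shortest rule to write. The paper's own strategies (FIFO, LIFO, Dirty-First, TIHO) would be plugged in through the same `strategy` parameter.

```python
def haircut(inputs, output_values):
    """Proportional distribution; inputs are (value, tainted_amount) pairs."""
    total = sum(value for value, _ in inputs)
    tainted = sum(t for _, t in inputs)
    return [value * tainted / total if total else 0 for value in output_values]


def track(tainted_outputs, txs, strategy, exit_addrs):
    """Propagate taint forward, stopping at profiled service and mixer addresses.

    tainted_outputs: {(txid, index): tainted_amount} for the initial outputs.
    txs: transactions in blockchain order, each
         {"txid": str, "inputs": [(prev_txid, prev_index, value), ...],
          "outputs": [(address, value), ...]}.
    strategy: function(inputs, output_values) -> tainted amount per output.
    exit_addrs: {address: "service" or "mixer"} from the address profile.
    Returns (still_tracked, exits).
    """
    tracked = dict(tainted_outputs)
    exits = {}
    for tx in txs:
        if not any((p, i) in tracked for p, i, _ in tx["inputs"]):
            continue  # no tracked taint flows into this transaction
        # Spent tainted outputs are removed from the tracked set.
        inputs = [(v, tracked.pop((p, i), 0)) for p, i, v in tx["inputs"]]
        out_taint = strategy(inputs, [v for _, v in tx["outputs"]])
        for index, ((address, _), amount) in enumerate(zip(tx["outputs"], out_taint)):
            if amount <= 0:
                continue
            key = (tx["txid"], index)
            if address in exit_addrs:
                # Exit point: ownership changed (service) or traceability was
                # lost (mixer), so this output is not followed any further.
                exits[key] = (exit_addrs[address], amount)
            else:
                tracked[key] = amount
    return tracked, exits


txs = [
    {"txid": "tx1", "inputs": [("t0", 0, 4)], "outputs": [("addrA", 3), ("exchange1", 1)]},
    {"txid": "tx2", "inputs": [("tx1", 0, 3), ("c", 0, 1)], "outputs": [("mixdep", 4)]},
]
print(track({("t0", 0): 4}, txs, haircut, {"exchange1": "service", "mixdep": "mixer"}))
```

In the toy example, one unit of taint exits at the exchange in `tx1`, and the remaining three units exit at the mixer deposit address in `tx2`, after which nothing is left to track.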
The results of reverse-engineering experiments in previous studies (Möser et al., 2013; de Balthasar and Hernandez-Castro, 2017; van Wegberg et al., 2018; Tironsakkul et al., 2020) show that the majority of mixer services use deposited addresses to receive Bitcoins from users before transferring the deposited Bitcoins to their central address(es) for further mixing.
There are two types of information sources for the identified mixer address data that we use for address profiling, as follows:
1. The address data from Jourdan et al. (2018, 2019).
The total number of mixer service addresses we obtained from the previous studies is shown in Table 1.
The studies of the second source type analyse the mixing mechanisms of the mixer services by using the services themselves, and their findings indicate no evidence of the CoinJoin method in their mixing operations. We applied the multi-input address clustering heuristic to the deposited addresses we retrieved to obtain other deposited addresses belonging to the mixer services. The number of mixer service addresses added through this expansion is shown in the "% Addresses Added" column of Table 1.

Transaction profiling
To accurately analyse the movement of stolen Bitcoins, we need a way to identify the purpose of transactions, especially those that involve PETs used for tracking evasion.
To gather PET transaction data, we first derive transaction classification methods that identify PET transactions based on their unique transaction patterns. We then apply each PET's classification method to every transaction in the blockchain and label any transaction that matches the corresponding pattern as a PET transaction.
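The labelling pass can be sketched as a registry of per-PET predicates applied to every transaction. This is a minimal illustration; `label_pet_transactions`, the `(txid, tx)` input format, and the toy predicates in the usage example are hypothetical, standing in for the ChipMixer, CoinJoin, and Lightning Network classifiers described in the following subsections.

```python
def label_pet_transactions(transactions, classifiers):
    """Label every transaction matched by at least one PET classifier.

    transactions: iterable of (txid, tx) pairs, where tx is any object the
                  classifiers understand (here, a dict with "inputs"/"outputs").
    classifiers: {pet_name: predicate(tx) -> bool}.
    Returns {txid: [pet_name, ...]}.
    """
    labels = {}
    for txid, tx in transactions:
        matched = [name for name, is_pet in classifiers.items() if is_pet(tx)]
        if matched:
            labels[txid] = matched
    return labels


# Toy predicates, for illustration only.
classifiers = {
    "five_by_five": lambda tx: len(tx["inputs"]) == 5 and len(tx["outputs"]) == 5,
    "many_outputs": lambda tx: len(tx["outputs"]) >= 5,
}
txs = [
    ("a", {"inputs": [1] * 5, "outputs": [1] * 5}),
    ("b", {"inputs": [1], "outputs": [1, 2]}),
]
print(label_pet_transactions(txs, classifiers))
```

Keeping every matching label, rather than the first, lets the evaluation see transactions that more than one classifier flags, which is useful given the lack of ground truth noted below.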
The transaction profiling process has one crucial limitation: the lack of ground-truth data to verify PET transactions and to ensure that the classification methods do not produce false positive or false negative results. Therefore, we do not stop the taint analysis for tainted Bitcoins that move through either identified or potential PET transactions, and we use the transaction profiling primarily for the evaluation metrics in this experiment. Table 2 shows the total number of transactions for each identified PET.
Table 1 notes: The "% Addresses Added" column indicates the proportion of addresses that we add to the existing datasets from our data gathering. The other service category consists of cryptocurrency faucets, e-commerce businesses, service donation addresses, and other types of cryptocurrency services.
5 https://checkbitcoinaddress.com is a reporting and labelling website that allows users to report Bitcoin addresses with an identity profile.

ChipMixer transactions
ChipMixer 6 is a well-known mixer service that differs from the mixer services mentioned in Section 3.1.2. The reverse-engineering experiment on the ChipMixer service indicates that the service's mixing transactions have a unique, static transaction characteristic 7 that distinguishes them from common transactions (Wu et al., 2021b).
According to the ChipMixer analysis of Wu et al. (2021b), ChipMixer's mixing protocol always distributes Bitcoins to transaction outputs (referred to as "chips") of exactly the same value, a round value with three decimal places (e.g., 0.005 BTC but not 0.0055 BTC). A chip output also cannot be worth less than 0.001 BTC or more than 4.096 BTC. However, each mixing transaction can have one transaction output that is an exception to this rule: an output that receives the mixing fee or a donation from users to the service.
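The chip rule above translates into a simple output-value check, here written in satoshis (1 BTC = 100,000,000 sat) to avoid floating-point rounding. This is a sketch of the rule as stated in the text, not the classifier used by Wu et al. (2021b); the `min_chips=2` threshold is an added assumption so that trivial one-output transactions are not flagged.

```python
from collections import Counter

CHIP_STEP_SAT = 100_000     # 0.001 BTC: chip values have three decimal places
MIN_CHIP_SAT = 100_000      # 0.001 BTC
MAX_CHIP_SAT = 409_600_000  # 4.096 BTC


def is_chip_value(value_sat):
    """True if the value is a whole multiple of 0.001 BTC within the chip range."""
    return value_sat % CHIP_STEP_SAT == 0 and MIN_CHIP_SAT <= value_sat <= MAX_CHIP_SAT


def looks_like_chipmixer(output_values_sat, min_chips=2):
    """All outputs but at most one (fee or donation) are equal-valued chips.

    min_chips is an assumption of this sketch, not part of the described rule.
    """
    chip_values = [v for v in output_values_sat if is_chip_value(v)]
    if not chip_values:
        return False
    _, count = Counter(chip_values).most_common(1)[0]
    others = len(output_values_sat) - count  # outputs not equal to the chip value
    return count >= min_chips and others <= 1


print(looks_like_chipmixer([500_000, 500_000, 500_000]))  # True: three 0.005 BTC chips
print(looks_like_chipmixer([500_000, 500_000, 123_456]))  # True: one fee output allowed
print(looks_like_chipmixer([500_000, 550_000, 123_456]))  # False: 0.0055 BTC is not a chip
```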

CoinJoin transactions
We use the CoinJoin transaction classification provided by the BlockSci library to detect CoinJoin transactions performed through JoinMarket, one of the most prominent mixing services that allow users to perform CoinJoin mixing together. The classification is based on the JoinMarket CoinJoin transaction detection algorithm presented by Goldfeder et al. (2018).
We also classify CoinJoin transactions performed by two other well-known CoinJoin services, namely Wasabi Wallet and Samourai Wallet. For Wasabi CoinJoin detection, we use the service's static coordinator address as the classification criterion (Wasabi Wallet Developers, 2021). However, as of February 2020, Wasabi Wallet no longer uses a static coordinator address and instead uses a fresh coordinator address for every CoinJoin transaction. Hence, we obtained additional Wasabi CoinJoin transaction data from Wu et al. (2021b), who retrieved these transactions directly from the service's public API.
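The two-part Wasabi labelling can be sketched as follows. This is a hypothetical helper, not Wasabi's or BlockSci's API; it assumes that a coordinator fee output pays the static coordinator address, and that the externally obtained transaction list is available as a set of txids. All addresses and txids in the usage are placeholders.

```python
def is_wasabi_coinjoin(txid: str,
                       output_addresses: set[str],
                       coordinator_addresses: set[str],
                       external_txids: set[str]) -> bool:
    """Hypothetical Wasabi CoinJoin label.

    Rule 1 (before February 2020): an output pays a known static coordinator
    address (assumption: the coordinator fee appears as an output).
    Rule 2 (afterwards): the txid appears in an externally obtained list.
    """
    if output_addresses & coordinator_addresses:
        return True
    return txid in external_txids
```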
For Samourai CoinJoin detection, we use the transaction characteristics of its Whirlpool mechanism. The Whirlpool protocol always performs CoinJoin mixing from five input addresses to five output addresses. A Samourai CoinJoin transaction must therefore have five transaction outputs of exactly the same value, either 0.01, 0.05 or 0.5 BTC, and five transaction inputs, each with a value no less than the output value (Samourai Wallet Developers, 2021).
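A minimal sketch of this Whirlpool pattern check is shown below. It assumes integer satoshi values and reads the input condition as "every input is at least the pool denomination"; the function and constant names are hypothetical.

```python
# Whirlpool pool denominations named above, in satoshis: 0.01, 0.05, 0.5 BTC.
POOL_DENOMINATIONS_SAT = {1_000_000, 5_000_000, 50_000_000}


def is_whirlpool_coinjoin(input_values: list[int], output_values: list[int]) -> bool:
    """Hypothetical check of the 5-input/5-output Whirlpool pattern."""
    if len(input_values) != 5 or len(output_values) != 5:
        return False
    # All five outputs must carry exactly the same value.
    if len(set(output_values)) != 1:
        return False
    denomination = output_values[0]
    if denomination not in POOL_DENOMINATIONS_SAT:
        return False
    # Assumption: each input must cover at least the pool denomination.
    return all(value >= denomination for value in input_values)
```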
The total number of identified CoinJoin transactions we obtained in this experiment is shown in Table 2.

Lightning Network transactions
The Lightning Network transactions consist of a funding transaction and a closing transaction that appear in the blockchain data, as mentioned in Section 2.2.3. These two transactions typically have unique transaction characteristics that can be used to differentiate Lightning Network transactions from other transactions (Guo et al., 2019;Nowostawski and Tøn, 2019). Therefore, it is possible to identify Lightning Network transactions using only blockchain data.
A Lightning Network funding transaction can potentially be identified by the presence of at least one multi-signature address output, while the closing transaction always has exactly one transaction input, spending the funding transaction output, and two transaction outputs. The transaction output value must also be within the channel capacity limit of 0.042 BTC for transactions that occurred before the capacity update on 2020-04-29 and 0.167 BTC for transactions that occurred after it. We therefore classify potential Lightning Network transactions based on this transaction pattern.
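The funding and closing patterns above can be sketched as two predicates. This is an illustrative sketch with several assumptions of our own: values are integer satoshis, the capacity limit is applied to the multi-signature output of a funding transaction and to the total output of a closing transaction, and transactions dated on 2020-04-29 itself use the new limit. The function names are hypothetical.

```python
from datetime import date

CAPACITY_UPDATE = date(2020, 4, 29)
OLD_LIMIT_SAT = 4_200_000    # 0.042 BTC
NEW_LIMIT_SAT = 16_700_000   # 0.167 BTC


def capacity_limit(tx_date: date) -> int:
    # Assumption: transactions on the update day already use the new limit.
    return NEW_LIMIT_SAT if tx_date >= CAPACITY_UPDATE else OLD_LIMIT_SAT


def is_potential_funding(outputs: list[tuple[int, bool]], tx_date: date) -> bool:
    """outputs: (value_sat, is_multisig) pairs; needs a multisig output within the limit."""
    limit = capacity_limit(tx_date)
    return any(is_multisig and value <= limit for value, is_multisig in outputs)


def is_potential_closing(spent_outpoints: list[tuple[str, int]],
                         output_values: list[int],
                         funding_outpoints: set[tuple[str, int]],
                         tx_date: date) -> bool:
    """One input spending a known funding output, two outputs, total within the limit."""
    if len(spent_outpoints) != 1 or len(output_values) != 2:
        return False
    return (spent_outpoints[0] in funding_outpoints
            and sum(output_values) <= capacity_limit(tx_date))
```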

Potential PET transactions
As our identified PET profile data is likely to contain only a fraction of all PET addresses and transactions in the Bitcoin blockchain, we design a classification method to identify transactions that may involve PETs based on the pattern recognition of transaction characteristics. Typically, the most distinguishing characteristic of the transactions that involve PETs is the inclusion of other unrelated or clean Bitcoins in the transaction inputs.
We propose a classification method for transactions that potentially involve PETs as follows: if a transaction with tainted Bitcoins also contains completely clean Bitcoins (i.e., Bitcoins unrelated to the tainted Bitcoins) as input then it is labelled as a potential PET transaction, which can be an indication that Bitcoins belonging to other users are being mixed with tainted Bitcoins by either mixer services or the CoinJoin method. The transaction must also have no transaction inputs from service addresses.
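The potential-PET rule above can be expressed as a single predicate. In this sketch, which is an assumption about data layout rather than the paper's implementation, each input is an `(address, taint_fraction)` pair, "completely clean" means a taint fraction of exactly zero, and the service address set comes from the address profile data.

```python
def is_potential_pet(inputs: list[tuple[str, float]], service_addresses: set[str]) -> bool:
    """Hypothetical potential-PET label.

    inputs: (address, taint_fraction) pairs, with taint_fraction in [0, 1].
    A transaction qualifies if it has at least one tainted input, at least one
    completely clean input, and no input from an identified service address.
    """
    has_tainted = any(taint > 0 for _, taint in inputs)
    has_clean = any(taint == 0 for _, taint in inputs)
    has_service_input = any(addr in service_addresses for addr, _ in inputs)
    return has_tainted and has_clean and not has_service_input
```

Note that a partially tainted second input does not count as clean Bitcoins under this reading.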

Context-based taint analysis strategy
To make taint analysis more efficient, we incorporate transaction characteristics that are relevant to the targeted Bitcoins into the taint analysis. To this end, we propose two additional strategies in this experiment, namely Dirty-First and TIHO.

Dirty-First strategy
As mentioned in Section 3.2.4, when tainted Bitcoins are used in a transaction with clean Bitcoins, this event may indicate that the tainted Bitcoins were obscured by PETs, especially in the case of illegal activities, where illegal Bitcoin users are less likely to combine stolen Bitcoins with their other clean Bitcoins since this would expose their other Bitcoin activities. Meanwhile, if no clean Bitcoins are involved, there is high certainty that the stolen Bitcoins still belong to the illegal Bitcoin users. This assumption is based on the findings of previous studies (Samsudeen et al., 2019;Chainalysis Team, 2022;de Balthasar and Hernandez-Castro, 2017;Wu et al., 2021b), which showed that illegal Bitcoin users are more likely to use PETs and that the majority of cryptocurrency PETs, including CoinJoin and centralised mixer services, operate by combining multiple unrelated Bitcoins to obscure their movement.
To illustrate and analyse the tracking results of fully tainted Bitcoins, we propose a taint analysis strategy named Dirty-First, which follows only fully tainted Bitcoins and stops tracking at any transaction where tainted Bitcoins are combined with clean Bitcoins, as illustrated in Fig. 3. Note that the Dirty-First strategy produces tracking results that are a subset of the other taint analysis strategies' results (i.e., for the same tracking case, the results of the Poison, Haircut, FIFO, LIFO, and TIHO strategies contain all of the fully tainted transactions found by the Dirty-First strategy).
The advantage of the Dirty-First strategy is that it can build a network of transactions that are most likely to have been performed by the targeted illegal Bitcoin users, since no clean Bitcoins are mixed in. The Dirty-First tracking results should therefore illustrate the transaction behaviour of illegal Bitcoin users with the fewest false positives. Its disadvantage is that illegal Bitcoin users may mix their tainted Bitcoins with their own clean Bitcoins of any amount; the Dirty-First strategy then stops tracking at those transactions, producing false negatives even if the subsequent transactions are still performed by the illegal Bitcoin users.
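A Dirty-First-style propagation can be sketched as a breadth-first walk that only passes through transactions whose inputs are all tainted. This is a simplified illustration, not the authors' implementation: the dictionary-based transaction layout and the function name are hypothetical, and the 15-day tracking window and address profiling are omitted.

```python
from collections import deque


def dirty_first(txs: dict[str, dict[str, list[str]]],
                spent_by: dict[str, str],
                seed_outpoints: list[str]) -> list[str]:
    """Return the txids followed by a Dirty-First-style propagation.

    txs: txid -> {"inputs": [outpoint, ...], "outputs": [outpoint, ...]}
    spent_by: outpoint -> txid of the transaction that spends it
    The fee is not modelled as an output, so it is never tainted.
    """
    tainted = set(seed_outpoints)
    followed: list[str] = []
    followed_set: set[str] = set()
    queue = deque(seed_outpoints)
    while queue:
        txid = spent_by.get(queue.popleft())
        if txid is None or txid in followed_set:
            continue
        tx = txs[txid]
        # Any clean input stops propagation through this transaction (for now).
        if all(op in tainted for op in tx["inputs"]):
            followed_set.add(txid)
            followed.append(txid)
            for op in tx["outputs"]:
                if op not in tainted:
                    tainted.add(op)
                    queue.append(op)
    return followed
```

Because every newly tainted outpoint re-triggers a check of its spending transaction, a transaction whose tainted inputs arrive along different paths is still followed once all of them are tainted.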

TIHO strategy
We introduce a taint analysis strategy named TIHO (Taint-In, Highest-Out), which prioritises the distribution of the tainted inputs to the highest value outputs, as shown in Fig. 4.
White rectangles represent clean inputs or outputs, and dark rectangles represent fully tainted ones.
The TIHO strategy has the advantage of performing targeted tracking of tainted Bitcoins based on specific transaction patterns rather than purely on arbitrary transaction order, as the FIFO and LIFO strategies do.
The TIHO strategy is based on the fact that the primary purpose of PETs like the CoinJoin method is to make it difficult to identify and prove the receiving addresses of obscured Bitcoins. Illegal Bitcoin users who use non-zero-taint PETs, rather than PETs that can produce zero-taint Bitcoins (completely immune to taint analysis tracking), are likely to trust these PETs enough to exchange their stolen Bitcoins into outputs of other values without needing to split the stolen Bitcoins into smaller amounts.
Additionally, previous research (Samsudeen et al., 2019) shows that most of the well-known Bitcoin theft and ransomware incidents involve significantly large amounts of Bitcoins. We therefore assume that when stolen Bitcoins are obscured by the CoinJoin method, the highest value outputs are the most likely to hold the stolen Bitcoins, because the amount of stolen Bitcoins is typically larger than the amounts held by average users.
The TIHO strategy should be beneficial for tracking Bitcoins that pass through PETs like the CoinJoin method, because it distributes tainted Bitcoins to the transaction outputs most likely to belong to the illegal Bitcoin users. However, like the other taint analysis strategies, the TIHO strategy cannot track zero-taint Bitcoins produced by mixer services.
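One way to read "prioritise the distribution of the tainted inputs to the highest value outputs" is a greedy fill: the tainted amount is assigned to the largest output first, and any remainder spills into the next largest. The sketch below follows that reading, which is our assumption, and removes the fee from every input proportionally, as the paper describes for all strategies in its Results section. Values are integer satoshis and the function name is hypothetical.

```python
def tiho_allocate(inputs: list[tuple[int, int]], output_values: list[int]) -> list[int]:
    """Hypothetical TIHO allocation for a single transaction.

    inputs: (value_sat, tainted_sat) pairs.
    Returns the tainted amount assigned to each output, in original order.
    """
    total_in = sum(value for value, _ in inputs)
    total_out = sum(output_values)
    tainted_in = sum(tainted for _, tainted in inputs)
    # Fee removed proportionally from every input (Haircut-style); floor division.
    remaining = tainted_in * total_out // total_in
    taint = [0] * len(output_values)
    # Fill outputs from the highest value downwards (ties keep original order).
    for idx in sorted(range(len(output_values)), key=lambda i: output_values[i], reverse=True):
        if remaining == 0:
            break
        share = min(output_values[idx], remaining)
        taint[idx] = share
        remaining -= share
    return taint
```

For instance, 6,000 tainted satoshis entering a transaction with a 500-satoshi fee and outputs of 1,000, 7,000 and 1,500 satoshis yield 5,700 tainted satoshis, all assigned to the 7,000-satoshi output.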

Tracking evaluation metrics
We hypothesise that the characteristics of transactions and addresses in Bitcoin theft cases are distinguishable from those in non-illegal Bitcoin activities because of the attempts to evade tracking and law enforcement. Previous studies found that Bitcoin users are typically not privacy-conscious in their Bitcoin activities (Harrigan and Fretter, 2016;Gaihre et al., 2018), while illegal Bitcoin users are privacy-conscious and use PETs to obscure their Bitcoins (Samsudeen et al., 2019). Therefore, it may be possible to build evaluation metrics that measure the performance of tracking results based on the potential characteristics of Bitcoin theft cases. We propose evaluation metrics, each with a corresponding hypothesis based on Bitcoin privacy practices and the behaviour of PETs. We define six evaluation metrics in total, with the following hypotheses.
H1. (Transaction Frequency). The number of tainted transactions per day in Bitcoin theft cases is high.
H2. (PETs Detection). The number of transactions involving PETs in Bitcoin theft cases is high.
H3. (Reused Address). The number of reused addresses in Bitcoin theft cases is minimal.
H4. (Fresh Address). The majority of tainted addresses in Bitcoin theft cases are fresh addresses.
H5. (Number of Addresses per Transaction). The majority of transactions in Bitcoin theft cases have a large number of addresses per transaction.
H6. (Transaction Fee). The transaction fee of the majority of transactions in Bitcoin theft cases is high.

Transaction frequency (H1)
We expect the number of tainted transactions per day to be high for Bitcoin theft cases because of the common transaction obscuring or privacy technique that involves distributing Bitcoins in multiple transactions and addresses to increase the difficulty of tracking. It is unlikely for non-illegal Bitcoin cases to employ this technique because transaction senders typically have to pay a transaction fee for each transaction (see Section 3.4.6), which can incur a significant loss of Bitcoins due to a large number of transactions.
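The transaction frequency metric can be computed as the average number of tainted transactions per day within the tracking window. This sketch assumes, as our own simplification, that the count is divided by the full window length (15 days by default) rather than by the number of active days, and that transaction timestamps are available as `datetime` values.

```python
from datetime import datetime, timedelta


def average_tx_per_day(tx_times: list[datetime], start: datetime, days: int = 15) -> float:
    """Average tainted transactions per day over a fixed tracking window.

    Assumption: transactions outside [start, start + days) are ignored and the
    count is divided by the full window length.
    """
    end = start + timedelta(days=days)
    in_window = sum(1 for t in tx_times if start <= t < end)
    return in_window / days
```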

PETs detection (H2)
We include the identified PETs' profile data (both addresses and transactions) and potential PETs classifications as an evaluation metric to identify PETs' usage and strategies that obscure stolen Bitcoins. We anticipate that the number of transactions involving PETs such as the CoinJoin method or a mixer service is different depending on the privacy requirement. Hence, there should be a significantly large number of transactions involving PETs for Bitcoin theft cases as illegal Bitcoin users would likely utilise PETs several times to obscure the transaction trails.

Reused address (H3)
While privacy protection is often considered one of the most important aspects of Bitcoin among its user base, many Bitcoin users do not appear to be privacy-conscious, as evidenced by the large number of reused addresses found in previous research (Ron and Shamir, 2013;Harrigan and Fretter, 2016;Gaihre et al., 2018). These findings give us reason to expect that transactions involving stolen Bitcoins, which benefit the most from privacy measures, involve a minimal number of reused addresses compared with non-illegal Bitcoin cases. Hence, we use reused addresses, i.e., addresses that have been used in more than one transaction, as one of the evaluation metrics.

Fresh address (H4)
Following the reused address metric, we assume that illegal Bitcoin users would create new addresses every time they distribute stolen Bitcoins to avoid reusing previous addresses. Thus, we expect the significant majority of tainted addresses in Bitcoin theft cases to be fresh addresses, i.e., addresses without any transaction activity before receiving stolen Bitcoins. Note that neither the reused nor the fresh address metric includes identified service and mixer addresses.
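The reused and fresh address metrics can be sketched together, since both are shares of the same set of tainted addresses with identified service and mixer addresses removed. The data layout below is hypothetical: each address maps to the block heights of the distinct transactions that use it, and "fresh" is approximated at block granularity, so the ordering of transactions within the same block is ignored.

```python
def address_metrics(history: dict[str, list[int]],
                    first_tainted_height: dict[str, int],
                    excluded: set[str]) -> tuple[float, float]:
    """Return (reused share, fresh share) over tainted addresses.

    history: address -> block heights of every distinct transaction using it.
    first_tainted_height: address -> height where it first received tainted coins.
    excluded: identified service and mixer addresses, left out of both metrics.
    """
    addrs = [a for a in first_tainted_height if a not in excluded]
    if not addrs:
        return 0.0, 0.0
    # Reused: the address appears in more than one transaction.
    reused = sum(1 for a in addrs if len(history[a]) > 1)
    # Fresh: no activity before the first tainted receipt (block granularity).
    fresh = sum(1 for a in addrs if min(history[a]) >= first_tainted_height[a])
    return reused / len(addrs), fresh / len(addrs)
```

An address can be both fresh and reused under these definitions, for example when a new address receives stolen Bitcoins and later forwards them.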

Number of addresses per transaction (H5)
Based on the privacy technique mentioned in the transaction frequency metric (Section 3.4.1), we anticipate that the majority of transactions involving stolen Bitcoins are distribution transactions used for obscuring. We expect these distribution transactions in Bitcoin theft cases to have a large number of addresses per transaction, in order to spread stolen Bitcoins across multiple addresses and make tracking more difficult. Note that the number of addresses per transaction includes both the input and output addresses of the transaction. For example, a 1-to-2 transaction (one input address and two output addresses) has three addresses per transaction.
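The counting rule above, including the 1-to-2 example, can be sketched as follows. The paper does not say how an address appearing on both sides of a transaction is counted; this sketch assumes it counts once per side, and the function names are hypothetical.

```python
def addresses_per_tx(input_addresses: list[str], output_addresses: list[str]) -> int:
    """Distinct input addresses plus distinct output addresses.

    Assumption: an address on both sides is counted once per side.
    """
    return len(set(input_addresses)) + len(set(output_addresses))


def mean_addresses_per_tx(txs: list[tuple[list[str], list[str]]]) -> float:
    """Average addresses per transaction over (inputs, outputs) pairs."""
    return sum(addresses_per_tx(ins, outs) for ins, outs in txs) / len(txs)
```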

Transaction fee (H6)
A transaction fee is an incentive provided by transaction initiator(s) to miners to prioritise confirming the transaction into the blockchain. A transaction fee is calculated from the difference between the total number of Bitcoins in transaction inputs and transaction outputs in a transaction (Nakamoto, 2009) (e.g., a transaction with 2 BTC input and 1 BTC output has a transaction fee value of 1 BTC). Typically, the recommended transaction fee rate that Bitcoin miners charge is calculated from the data size of the transaction and the number of transactions that are currently waiting for confirmation at the time.
We use the transaction fee as one of the evaluation metrics based on the assumption that the privacy practices used in Bitcoin theft cases can influence the transaction fee value. For example, illegal Bitcoin users may try to obscure their transaction trail by rapidly moving the stolen Bitcoins, for which they need to pay a sufficient transaction fee. The transaction fee variable we use is the ratio of the transaction fee value to the transaction data size.
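The fee definition and the fee-to-size ratio can be written as a short helper. This sketch assumes integer satoshi values and a transaction size in bytes; whether raw bytes or virtual bytes are used is not stated here, so that choice is left to the caller. The function name is hypothetical.

```python
def fee_and_rate(input_values: list[int], output_values: list[int], size_bytes: int) -> tuple[int, float]:
    """Return (fee in satoshis, fee per byte of transaction data).

    fee = total input value - total output value
    """
    fee = sum(input_values) - sum(output_values)
    return fee, fee / size_bytes
```

With the example above, a transaction with a 2 BTC input (200,000,000 sat) and a 1 BTC output has a fee of 100,000,000 sat; at a hypothetical size of 250 bytes its fee ratio is 400,000 sat per byte.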

Sample and control groups collection
We evaluate the methodology presented in Section 3 by applying it to known cases of transactions involving illegally-acquired Bitcoins. We explain the selection process of the sample cases for the experiment in Section 4.1. Section 4.2 provides details on the control group criteria and selection.

Theft case sample selection
We selected a total of 26 historical Bitcoin theft cases from 2012 to 2021. These cryptocurrency service thefts and ransomware attacks were reported either on Bitcoin news websites or on Bitcoin forums, together with details of the theft transactions or the suspects' Bitcoin addresses. Such details are public information.
Note that we exclude the affected service's addresses from the address profiling and evaluation of the related theft case (i.e., if a sample case involved illegal Bitcoin users stealing Bitcoins from service A, we excluded all identified addresses of service A from the address profile data when tracking and evaluating that sample case). This exclusion avoids potential service misclassification: in some sample cases, the illegal Bitcoin users hacked into the service's computer system and gained control of the service's addresses, so they may have spent service addresses alongside their own addresses as inputs to the same transactions. In that scenario, the multi-input address clustering heuristic behind the service address profile data can misclassify the illegal Bitcoin users' addresses as service addresses.

Control groups criteria and selection
For each sample case, we build a control group of non-illegal Bitcoin transactions that are sufficiently similar to allow comparison. However, there is no reliable information to guarantee that the control transactions are unrelated to illegal activities. To mitigate this risk, we select multiple control transactions per sample case using the following steps. We first identify potential control transactions from all transactions in the blockchain that have characteristics similar to the sample case, based on a set of criteria (see Sections 4.2.1, 4.2.2 and 4.2.3). We discard matching control transactions that belong to the same transaction chain (i.e., we keep only the first transaction and discard the subsequent ones) to prevent control groups from sharing identical results. From the remaining transactions, we then select the ten whose transaction value is closest to that of the sample case. We limit each case to ten control transactions to reduce the computational cost while retaining a sufficient size for analysis and evaluation, and to ensure that the control groups do not disproportionately represent the sample cases with significantly more matching control transactions. Finally, we discard a transaction from the control group if, after applying the four taint analysis strategies, its results reach a transaction that is already included in the tainting results of the theft cases. We repeat the process until we find unrelated transactions, to avoid the risk of control groups being related to the sample cases.
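The selection steps above can be sketched as follows, starting from candidates that already satisfy the matching criteria. The tuple layout, the `chain_root` identifier for transaction chains, and the precomputed `reached` mapping (the txids reached by the four taint analysis strategies from each candidate) are hypothetical inputs; computing them is outside this sketch.

```python
def select_controls(candidates: list[tuple[str, int, int, str]],
                    sample_value: int,
                    theft_txids: set[str],
                    reached: dict[str, set[str]],
                    k: int = 10) -> list[str]:
    """Hypothetical control selection for one sample case.

    candidates: (txid, value_sat, block_height, chain_root) tuples.
    reached: candidate txid -> txids reached by the taint analysis strategies.
    """
    # Keep only the earliest candidate of each transaction chain.
    first_in_chain: dict[str, tuple[str, int]] = {}
    for txid, value, _height, root in sorted(candidates, key=lambda c: c[2]):
        first_in_chain.setdefault(root, (txid, value))
    # Rank the survivors by closeness of their value to the sample case.
    ranked = sorted(first_in_chain.values(), key=lambda c: abs(c[1] - sample_value))
    selected: list[str] = []
    for txid, _value in ranked:
        # Skip candidates whose tracking results touch the theft cases' results.
        if reached.get(txid, set()) & theft_txids:
            continue
        selected.append(txid)
        if len(selected) == k:
            break
    return selected
```

Walking down the ranked list and skipping overlapping candidates plays the role of "repeating the process" until enough unrelated transactions are found.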
There are three transaction characteristic criteria that we use to identify potential control transactions for each sample case, as we define below.

Time
To avoid selecting control transactions that are directly or indirectly related to the sample case, we set the time criterion to within 60 days before the day on which the sample case's first distribution transaction occurred. We chose a 60-day period so that a sufficient number of control transactions can be obtained while they still share, as closely as possible, the cryptocurrency market conditions, PETs, and privacy practices of the respective sample case, since such factors can significantly influence transaction behaviour. For example, the average transaction fee rate at a given time affects transaction fee payment, which in turn can increase or decrease users' willingness to send transactions (transaction frequency).

Transaction value
We set the transaction value criterion to within ±10% of the sample value; e.g., if the sample case's distribution transaction involves 5000 BTC, the control transactions for that sample case must involve between 4500 and 5500 BTC. If the sample case involves multiple distribution transactions, we select the one with the highest number of Bitcoins. If the criteria based on the highest-value transaction yield no matching control transactions, we instead use the transaction with the next highest value.

Transaction type
The transaction type refers to the number of addresses in the transaction inputs and outputs. For example, if the sample case's distribution transaction is a 1-to-2 transaction (one input address to two output addresses), the control transactions we select must also be 1-to-2 transactions. As with the transaction value criterion, we use the distribution transaction with the highest value of Bitcoins.
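The three criteria can be combined into a single matching predicate. In this sketch, which is an illustration under our own assumptions, dates are compared at day granularity, the window covers the 60 days strictly before the sample day, values are integer satoshis, and a transaction type is a hypothetical `(input address count, output address count)` pair. Integer arithmetic keeps the ±10% boundary exact.

```python
from datetime import date, timedelta


def matches_criteria(cand_day: date, cand_value: int, cand_type: tuple[int, int],
                     sample_day: date, sample_value: int, sample_type: tuple[int, int],
                     window_days: int = 60, tolerance_pct: int = 10) -> bool:
    """Hypothetical time, value and type criteria for one control candidate."""
    # Time: within the window_days days before the sample's first distribution day.
    in_window = sample_day - timedelta(days=window_days) <= cand_day < sample_day
    # Value: within +/- tolerance_pct percent of the sample value (exact integers).
    in_range = abs(cand_value - sample_value) * 100 <= tolerance_pct * sample_value
    # Type: same numbers of input and output addresses.
    return in_window and in_range and cand_type == sample_type
```

With the 5000 BTC example, candidates of exactly 4500 and 5500 BTC match, while 4499 BTC does not.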

Results
In this section, we present and interpret the results of our tracking methodology for the sample cases and control groups. We discuss the overall results in Section 6.
We tracked each sample case and control case for 15 days with the FIFO, LIFO, Dirty-First, and TIHO strategies. For simplicity, we refer to the results of each sample theft and ransomware case as 'TC' (Theft Case). We present the results of the control groups of all sample cases together and refer to them as 'CG' (Control Groups). We denote tracking results that include the address profiling described in Section 3.1 with 'AP' (Address Profiling) for sample cases (TC AP) and control groups (CG AP), and results without address profiling with 'Full' (full results) for sample cases (TC Full) and control groups (CG Full). We denote each taint analysis strategy's results with address profiling as 'Dirty-First AP', 'FIFO AP', 'LIFO AP' and 'TIHO AP', and the full results as 'Dirty-First Full', 'FIFO Full', 'LIFO Full' and 'TIHO Full'. Note also that, for all taint analysis strategies, we subtract the transaction fee from every transaction input proportionally (similar to the Haircut strategy) and do not taint transaction fee outputs.
The control group results for each taint analysis strategy shown in this section are weighted averages over all control groups, except for the transaction frequency (H1) metric. We use the number of transactions as the weight for the transaction-related metrics, namely PET transactions (H2), number of addresses per transaction (H5) and transaction fee (H6), and the number of addresses as the weight for the address-related metrics, namely reused addresses (H3) and fresh addresses (H4).
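The weighting described above amounts to an ordinary weighted mean of the per-control-group values. In this sketch the weights are the transaction counts (for H2, H5 and H6) or address counts (for H3 and H4) of each control group; the numbers in the usage are hypothetical.

```python
def weighted_average(values: list[float], weights: list[int]) -> float:
    """Weighted mean of per-control-group metric values."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total
```

For example, two hypothetical control groups with 20% and 50% PET transactions over 100 and 300 transactions, respectively, give a weighted average of 42.5%.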

Address profiling results
The bar in each case represents the results for each taint analysis strategy in the order as follows: Dirty-First, FIFO, LIFO, and TIHO. TX stands for transaction and ADR stands for address. Group U (unidentified spending) is not shown in the figure because the percentage is negligible.
The bar in each case represents the results for each taint analysis strategy in the order as follows: Dirty-First, FIFO, LIFO, and TIHO.
The percentage shown in Fig. 5 is the proportion of the exchanged or obscured stolen Bitcoins reaching addresses and transactions identified as belonging to a cryptocurrency service or PET, relative to the total number of stolen Bitcoins at the start of tracking. The sample cases can be categorised into four groups: sample cases that spend stolen Bitcoins with services (Group S), sample cases that obscure stolen Bitcoins with PETs identified in address profiling (Group X), sample cases that obscure stolen Bitcoins with PETs identified in transaction profiling (Group P), and sample cases with only a minimal proportion (less than 1%) of stolen Bitcoins reaching identified addresses and transactions (Group U). Because the percentage of tainted Bitcoins is negligible in Group U, we did not include it in Fig. 5. Although both Group X and Group P involve PETs, we distinguish them because they rely on different classification methods, namely address profiling and transaction profiling, respectively. For the remaining results, we present the sample cases according to this spending group classification.
T. Tironsakkul, M. Maarek, A. Eross et al. Forensic Science International: Digital Investigation 42-43 (2022) 301475
Unexpectedly, the majority of the sample cases show a significant percentage of the stolen Bitcoins reaching cryptocurrency services without passing through PETs (Group S), as can be seen in the Dirty-First results in Fig. 6a. For example, the Dirty-First results of cases TC2, TC7, TC9, TC15, and TC24 show around 20% of the stolen Bitcoins reaching service addresses, while case TC18's Dirty-First results show 100% of the stolen Bitcoins reaching service addresses within the first 15 days. The significantly high Bitcoin spending in the Dirty-First results of the Group S cases indicates that the majority of the sample cases we observe may not rely on PETs to obscure the stolen Bitcoins.
Most sample cases in Group S share a similar pattern in which the FIFO, LIFO, and TIHO strategies' results show only marginally different percentages of stolen Bitcoins reaching service addresses. However, two sample cases, TC3 and TC15, show significantly different results across the three strategies. Furthermore, the results of some sample cases in Group S show a significant increase in the number of stolen Bitcoins reaching service addresses for the FIFO, LIFO, and TIHO strategies compared to the Dirty-First strategy. As the results for the sample cases in Group S contain no visible PET transactions, it is plausible that the differences between the taint analysis strategies' results arise because the stolen Bitcoins passed through unidentified service addresses, which combined them with clean Bitcoins and afterwards sent them to identified service addresses. The majority of stolen Bitcoins from the sample cases in Group S reach either payment or exchange services, as shown in Fig. 6a. Only four sample cases, namely TC2, TC3, TC5, and TC9, show a substantial number of stolen Bitcoins reaching darknet markets in the Dirty-First results. The results suggest that most of the illegal Bitcoin users in these sample cases may prefer to exchange stolen Bitcoins with reliable services rather than through illegal channels, despite the risk of the Bitcoins being seized by the receiving services or law enforcement.
The results of four sample cases in Group X, namely cases TC10, TC12, TC14, and TC17, show a significant number of stolen Bitcoins reaching identified mixer addresses, as shown in Fig. 5b. In contrast, the results of the three sample cases in Group P show the stolen Bitcoins reaching either CoinJoin transactions (TC21) or ChipMixer transactions (TC19 and TC25), as shown in Fig. 5c. The results of these two groups indicate that the illegal Bitcoin users' obscuring and spending strategies differ from those of the sample cases in Group S. Intriguingly, the Dirty-First results of two sample cases in Group X (TC12 and TC17) and two in Group P (TC19 and TC21) show a small number of stolen Bitcoins directly reaching both service and mixer addresses, as shown in Fig. 6. These results may suggest that illegal Bitcoin users intend to spend the stolen Bitcoins in several ways. For example, illegal Bitcoin users may obscure part of the stolen Bitcoins before exchanging large amounts with exchange services that require personal information, and directly spend the rest on darknet market trades or small exchanges that do not require obscuring measures. The results of some sample cases, namely TC1, TC13, TC20, TC22, and TC26 (Group U), show a very small number of stolen Bitcoins reaching identified cryptocurrency services or PETs. No sample case in this experiment transfers the stolen Bitcoins through Lightning Network channels.
The results of the control groups (CGs) are most similar to those of the sample cases in Group S, as both show most tainted Bitcoins reaching service addresses. However, the service types (Fig. 5d) reveal differences in the Bitcoin spending methods of the sample cases and the control groups. Unsurprisingly, most services in the control groups' results are exchange services, followed by payment services. Interestingly, the control groups' results also show a noticeable number of Bitcoins reaching gambling and darknet market services. This is not unexpected, as both types of services are widely reported as a significant part of the Bitcoin ecosystem (Tasca et al., 2018;Crystal Analytics Team, 2020;Chainalysis Team, 2020), and it does not necessarily indicate that the control groups are related to illegal activities, since the Bitcoins may have been exchanged with other users via unidentified services before reaching gambling and darknet market services.
Including address profiling considerably reduces the number of tainted transactions for the majority of the TC AP results, especially for the FIFO, LIFO and TIHO strategies, as can be seen in Fig. 7. The Dirty-First strategy has the fewest sample cases with a significant change in the number of transactions, which suggests that cryptocurrency and mixer services typically combine the stolen Bitcoins they receive with other Bitcoins shortly after the exchanges.
The FIFO AP, LIFO AP, and TIHO AP results show markedly varied patterns, with reductions in the number of tainted transactions ranging from below 10% to as high as 90% across sample cases. These results suggest that the illegal Bitcoin users have different spending strategies: some try to spend the stolen Bitcoins quickly to lessen the risk of the Bitcoins being blacklisted by cryptocurrency services, while others are more cautious and likely to wait for interest in tracking the stolen Bitcoins to decline before spending them.
Intriguingly, the TIHO AP results show an overall lower reduction in tainted transaction numbers, compared to the FIFO AP and LIFO AP results for most sample cases. One explanation for this pattern is that the lower value outputs are utilised more as spending outputs, compared to the higher value ones that the TIHO strategy prioritises.
The total transaction counts for the sample cases in Group U show that three of the five sample cases have very few transactions, which explains the lack of Bitcoin spending. However, cases TC20 and TC26 show a large number of transactions, which indicates that our address and transaction profile data are unable to identify the spending and obscuring strategies of these two sample cases.
The address profiling results demonstrate that taint analysis tracking can benefit from address profiling, as shown by the significant reduction in unnecessary tracking results for multiple sample cases.

Transaction frequency (H1) results
Sample cases with a red colour number are cases with low transaction activity (an average of less than one transaction per day) for any taint analysis strategy.
The results of the transaction frequency metric (defined in Section 3.4.1) are shown in Fig. 8 as the average number of tainted transactions per day. The sample cases show a considerably diverse pattern, ranging from fewer than one to more than 1000 transactions per day. Four sample cases, TC1, TC12, TC13, and TC22, average less than one transaction per day in both the TC AP and TC Full results for all four taint analysis strategies. This suggests that the illegal Bitcoin users in these cases did not use the distribution technique described in Section 3.4.1, possibly to maximise their profit from the theft. Meanwhile, the results with high transaction activity, such as cases TC9, TC16, and TC24, indicate that illegal Bitcoin users rapidly distributed their stolen Bitcoins to increase the number of transactions that needed to be tracked and analysed. In this paper, we mainly focus on the sample cases with high transaction activity, as the sample cases with very low transaction activity (less than an average of one transaction per day for all results) do not provide meaningful transaction behaviour information for analysis and comparison.
Similar to the total number of transactions (shown in Fig. 7), including address profiling significantly reduces the transaction frequency in the TC AP results for the FIFO, LIFO, and TIHO strategies, as can be seen in Fig. 8. However, a few sample cases show much less difference, such as case TC24, whose FIFO AP results show an average of 2592.9 transactions per day compared with 2794 transactions per day in the FIFO Full results. This pattern mirrors the service address results presented in Section 5.1, where sample cases with a higher number of stolen Bitcoins reaching identified service or mixer addresses (such as case TC18) also show a larger reduction in the number of transactions.
The TC AP and TC Full results show an overall lower number of transactions, compared to the control groups' results (CG AP and CG Full , respectively), especially for the Dirty-First strategy. There are a few exceptions that show remarkably high transaction frequency for all four taint analysis strategies' results, such as cases TC9, TC16, and TC24, as can be seen in Fig. 8. This pattern indicates that for most theft cases, the illegal Bitcoin users do not distribute the stolen Bitcoins rapidly in a large number of transactions as we expected, possibly to avoid unnecessarily losing their profits because of the transaction fee. Hence, the majority of the sample cases' results do not support our H1 hypothesis that the Bitcoin theft cases would have a high transaction frequency.
Nevertheless, the transaction frequency metric shows potential for further analysis that can assist in investigating illegal Bitcoin users' strategies. Additionally, the lack of transaction activity in some sample cases may be due to our tracking period of 15 days from the first distribution transaction. This issue can be alleviated by extending the tracking timeframe to reveal transaction activity that we have not yet captured for these sample cases.

PETs detection (H2) results
As discussed in Section 5.1, we observed seven sample cases in Groups X and P that utilise identified PETs to a considerable extent. The identified PET transaction proportion results, shown in Fig. 9, reveal further insights into the obscuring strategies employed by illegal Bitcoin users. The results suggest that illegal Bitcoin users typically employ only one type of PET to obscure the stolen Bitcoins. Except for the sample cases in Groups X and P, the proportion of transactions involving identified PETs in the sample cases' results is not substantially different from that of the control groups. Interestingly, case TC14's Dirty-First AP, FIFO AP, LIFO AP, and TIHO AP results show 100% of transactions involving identified PETs, but fewer than 10% for the FIFO Full, LIFO Full, and TIHO Full results. These results indicate that the illegal Bitcoin users in this sample case sent the stolen Bitcoins to a PET service in every transaction, starting from the first. Additionally, case TC15, which shows an insignificant number of Bitcoins reaching PETs in Fig. 5a, has almost 10% of transactions involving identified PETs in all four taint analysis strategies' results. These results may indicate that the illegal Bitcoin users changed their strategy for spending the stolen Bitcoins.
The results in Fig. 10 are shown as the proportion of tainted transactions that are classified as a potential PET transaction by the classification method described in Section 3.2.4. It is also worth noting that the potential PET transaction results do not include the identified PET transactions or transactions with an identified service or mixer address.
The Dirty-First strategy shows a large proportion of potential PET transactions, from 10% to 80% of transactions, while the FIFO, LIFO, and TIHO strategies show fewer than 10% potential PET transactions for most sample cases in both the TC Full and TC AP results, including the cases in Groups X and P that employ identified PETs; these proportions are not much different from the CG Full and CG AP results. Additionally, the TC Full results generally show an equal or smaller proportion of potential PET transactions than the TC AP results for the majority of the sample cases. Since the potential PET transaction classification method relies on the presence of completely clean Bitcoins in transaction inputs, these results imply that clean Bitcoin mixing occurs mainly when stolen Bitcoins reach either services or PETs, but not in subsequent transactions.
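The classification rule summarised above (tainted inputs mixed with completely clean inputs, excluding transactions with identified services or PETs) can be sketched as follows. This is a simplified reading of the method in Section 3.2.4, not its exact implementation: the per-input taint fractions and the `involves_identified_entity` flag are hypothetical inputs, and using all tainted transactions as the denominator is an assumption.

```python
def is_potential_pet(input_taints, involves_identified_entity):
    """Flag a transaction that mixes tainted inputs with completely clean ones.

    input_taints: tainted fraction of each input, in [0, 1].
    involves_identified_entity: True if the transaction touches an identified
        service, mixer, or PET; such transactions are excluded from this count.
    """
    if involves_identified_entity:
        return False
    has_tainted = any(t > 0 for t in input_taints)
    has_clean = any(t == 0 for t in input_taints)
    return has_tainted and has_clean


def potential_pet_proportion(transactions):
    """Share of tainted transactions flagged as potential PET transactions.

    transactions: iterable of (input_taints, involves_identified_entity).
    """
    transactions = list(transactions)
    if not transactions:
        return 0.0
    flagged = sum(is_potential_pet(t, e) for t, e in transactions)
    return flagged / len(transactions)


sample = [([1.0, 0.0], False),       # tainted + clean input: flagged
          ([1.0], False),            # tainted only: not flagged
          ([0.5, 0.0], True),        # identified service involved: excluded
          ([0.3, 0.0, 0.0], False)]  # flagged
print(potential_pet_proportion(sample))  # 0.5
```

Because the rule only looks for clean inputs, any ordinary service transaction that merges customer funds with tainted coins is also flagged, which is why the text treats these as *potential* PET transactions.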
Because the sample cases in Group S show the stolen Bitcoins reaching service addresses directly in the Dirty-First results, the potential PET transactions in these sample cases are more likely to be transactions involving unidentified cryptocurrency services than PETs. Meanwhile, the potential PET transactions identified in the results of the Group X and P cases are likely to be transactions involving unidentified PETs. Intriguingly, the results of the Group U cases show a considerably high proportion of potential PET transactions for all four taint analysis strategies despite a remarkably small number of identified addresses and transactions, including sample cases with a small total number of transactions, such as TC1 and TC13. These results suggest that the lack of Bitcoin spending in Group U may be due not to the short tracking timeframe, but rather to the incompleteness of our address and transaction profile data.
Therefore, our H2 hypothesis that there would be a significant number of PET transactions in Bitcoin theft cases is not supported by the results. Nevertheless, the identified PET profiling and the potential PET classification method reveal insights into the obscuring strategy, or lack thereof, employed by illegal Bitcoin users. PET address and transaction profiling can be expanded further to help the taint analysis algorithm detect PET transactions and adapt its tracking process accordingly. It would also be possible to expand the PET profile data by identifying common patterns, such as transaction shape, that indicate when tainted Bitcoins reach transactions similar to identified PET transactions.
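One example of a transaction-shape pattern is the equal-output CoinJoin, where several participants each receive an output of the same value. The sketch below flags transactions with that shape. It is a common heuristic rather than the profiling method used in this work, and the threshold `min_equal_outputs` is a hypothetical parameter that would need calibration against identified PET transactions to limit false positives.

```python
from collections import Counter


def looks_like_coinjoin(input_count, output_values, min_equal_outputs=3):
    """Flag transactions whose shape resembles an equal-output CoinJoin.

    input_count: number of inputs in the transaction.
    output_values: output amounts in Satoshis.
    min_equal_outputs: hypothetical threshold on identical-value outputs.
    """
    if not output_values:
        return False
    _, equal_count = Counter(output_values).most_common(1)[0]
    # Each participant normally contributes at least one input,
    # so there should be at least as many inputs as equal outputs.
    return equal_count >= min_equal_outputs and input_count >= equal_count


print(looks_like_coinjoin(5, [10_000_000] * 5 + [123_456, 78_900]))  # True
print(looks_like_coinjoin(1, [50_000, 30_000]))                       # False
```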

Reused address (H3) and fresh address (H4) results
The results of reused address (defined in Section 3.4.3) and fresh address (defined in Section 3.4.4) metrics in Fig. 11 are shown as the proportion of addresses that are either old and reused, fresh but reused later, or fresh and never reused. It should be noted that we exclude addresses identified as belonging to either a service or PET in reused and fresh address results.
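The three address categories can be illustrated with a small sketch that compares the block height of the tainted transaction with every other height at which the address appears. This is an illustrative reading of the definitions in Sections 3.4.3 and 3.4.4, not their exact formulation; the input format is hypothetical, and service or PET addresses are assumed to have been removed beforehand, as in the results above.

```python
from collections import Counter

CATEGORIES = ("old_reused", "fresh_reused_later", "fresh_never_reused")


def classify_address(appearance_heights, tainted_height):
    """Classify an address relative to the tainted transaction it appears in.

    appearance_heights: every block height at which the address appears.
    tainted_height: block height of the tainted transaction.
    """
    if any(h < tainted_height for h in appearance_heights):
        return "old_reused"          # used before receiving tainted Bitcoins
    if any(h > tainted_height for h in appearance_heights):
        return "fresh_reused_later"  # first seen here, used again afterwards
    return "fresh_never_reused"


def address_proportions(addresses):
    """addresses: iterable of (appearance_heights, tainted_height)."""
    counts = Counter(classify_address(a, t) for a, t in addresses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: counts[c] / total for c in CATEGORIES}


sample = [([100, 200], 200), ([200, 300], 200), ([200], 200), ([50, 200, 250], 200)]
print(address_proportions(sample))
# {'old_reused': 0.5, 'fresh_reused_later': 0.25, 'fresh_never_reused': 0.25}
```

Appearances in the same block as the tainted transaction are treated as neither earlier nor later, a simplification that a real implementation would resolve by transaction order.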
The reused and fresh address metrics reveal a consistent pattern for most sample cases. The Dirty-First strategy generally shows a higher proportion of fresh and never-reused addresses than the FIFO, LIFO, and TIHO strategies in the TC AP results. This prevalence of fresh addresses in the Dirty-First strategy's results may indicate that illegal Bitcoin users tend to avoid reusing addresses, though not always, since the Dirty-First AP results still show a substantial proportion of reused addresses. Meanwhile, the FIFO, LIFO, and TIHO strategies generally show more reused addresses for most of the TC AP results. Considering that the address profile data is likely to contain only a fraction of the service and mixer addresses in existence, the additional reused addresses for these three strategies in the TC AP results may belong to unidentified services, PETs, or other Bitcoin users that receive the stolen Bitcoins.
The TC Full results generally show more reused addresses and fewer fresh addresses than the TC AP results for most sample cases. The increase in reused addresses in the TC Full results supports our hypothesis in Section 3.1.1 that cryptocurrency services have lower privacy requirements and therefore apply privacy techniques, such as avoiding address reuse, less often.
Notably, the Dirty-First AP results of cases TC5, TC10, and TC20 show a larger proportion of reused addresses than the control groups. These results suggest that even illegal Bitcoin users may not completely avoid reusing their addresses, although avoiding address reuse is one of the most common privacy techniques and any Bitcoin user can apply it at no cost without requiring a PET. It would be possible to analyse the illegal Bitcoin users' transaction activity outside of the tainted transactions involving these previously used addresses, which could ultimately help unveil their personal information. Overall, the sample cases generally show fewer reused addresses and more fresh addresses than the control groups. Therefore, the reused and fresh address results support our H3 and H4 hypotheses.

Number of addresses per transaction (H5) results
The results of the number of addresses per transaction metric as defined in Section 3.4.5 are shown as the weighted average from all tainted transactions, as shown in Fig. 12.
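A weighted average lets a few transactions carrying most of the tainted value dominate the result. The sketch below assumes that each transaction is weighted by the amount of tainted Bitcoin it carries and that its address count is the number of distinct input and output addresses; both are assumptions for illustration, since the exact weighting in Section 3.4.5 is not reproduced here.

```python
def address_count(input_addresses, output_addresses):
    """Distinct addresses appearing in a transaction's inputs and outputs."""
    return len(set(input_addresses) | set(output_addresses))


def weighted_avg_addresses(transactions):
    """Weighted average number of addresses per tainted transaction.

    transactions: iterable of (n_addresses, tainted_satoshis); the tainted
        amount is used as the weight (assumption).
    """
    transactions = list(transactions)
    total_weight = sum(w for _, w in transactions)
    if total_weight == 0:
        return 0.0
    return sum(n * w for n, w in transactions) / total_weight


print(address_count(["a", "b"], ["b", "c", "d"]))     # 4
print(weighted_avg_addresses([(2, 100), (10, 300)]))  # 8.0
```

With this weighting, a single large transaction holding most of the tainted value, such as a CoinJoin, can raise the average far above the typical transaction size, which is one way a case like TC21 could reach a high average.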
The number of addresses per transaction reveals an unexpected pattern: the majority of the TC AP results show a small average number of addresses per transaction (fewer than 20) for the Dirty-First strategy. A few exceptions, such as cases TC21 and TC24, show a much higher average (around 140 and 40 addresses per transaction, respectively). The results suggest that the illegal Bitcoin users in most sample cases prefer to send the stolen Bitcoins in small transactions, possibly to avoid making their transactions distinct from other transactions. The small number of addresses per transaction in the sample cases' Dirty-First AP results suggests that the transactions likely performed by the illegal Bitcoin users do not typically have a large number of addresses, because of the Bitcoin privacy technique mentioned in Section 3.4.5. There is also no clear difference among the groups that indicates either a common or a unique pattern. The majority of the sample cases show an increase in the average number of addresses per transaction in the TC Full results. This pattern indicates that the transactions that occurred after the stolen Bitcoins reached service or mixer addresses generally have more addresses than the transactions in the TC AP results. As such, the results illustrate that, for all four taint analysis strategies, the transactions that occur after the stolen Bitcoins reach a service or PET address are substantially different from those in the TC AP results.
The TC AP results show a higher average number of addresses per transaction for the FIFO, LIFO, and TIHO strategies than for the Dirty-First strategy. Based on our hypothesis that the presence of clean Bitcoins indicates possible PET usage, the increase in the number of addresses per transaction in the TC AP results for these three strategies may be due to transactions that involve unidentified cryptocurrency services or PETs. However, the CG AP results show an overall higher number of addresses per transaction than most sample cases in the TC AP results for all four taint analysis strategies.
The results of the number of addresses per transaction metric do not support our H5 hypothesis that the majority of transactions in Bitcoin theft cases would be large transactions. Nevertheless, the metric shows potential for revealing a change in transaction behaviour between the Dirty-First strategy and the FIFO, LIFO, and TIHO strategies, which can indicate changes in the stolen Bitcoins' ownership.

Transaction fee (H6) results
The results of the transaction fee metric, as defined in Section 3.4.6, are shown in Fig. 13 as the weighted average of the difference between the transaction fee size ratio (in Satoshis per byte; a Satoshi is the smallest unit of Bitcoin, and 1 BTC is equal to 100,000,000 Satoshis) of tainted transactions and that of all transactions on the same day.
The transaction fee size ratio in the TC AP results shows a considerably diverse pattern for the Dirty-First strategy, where sample cases show a ratio lower than, equal to, or higher than the daily average. Four sample cases, TC1, TC2, TC4, and TC21, have an average transaction fee size ratio 100 Sat per byte lower than the daily average. Meanwhile, five sample cases, namely TC9, TC16, TC17, TC18, and TC25, show a transaction fee size ratio 100 Sat per byte higher than the daily average. The varied Dirty-First AP results indicate that there is no standard practice that illegal Bitcoin users employ for transaction fee payment, and each user typically pays according to their own preference.
Meanwhile, the FIFO, LIFO, and TIHO strategies in the TC AP results show a substantial change in the transaction fee size ratio compared to the Dirty-First strategy. The FIFO AP results generally show an increase in the transaction fee size ratio, exceeding the daily average for many sample cases. Intriguingly, the LIFO AP and TIHO AP results differ significantly from the FIFO AP results, as they exhibit a transaction fee size ratio remarkably close to the daily average and to the CG AP results for most sample cases. These results may indicate that the results of these two strategies contain a large number of transactions performed by a similar type of entity, which we assume to be either unidentified services or PETs. The reasoning for this assumption is that services and mixers (as mentioned in Section 3.1.2) tend to combine their Bitcoins into transaction outputs holding a large number of Bitcoins and transfer them to their users in a "peeling chain" (a sequence of transactions in which a large output is repeatedly split into a small payment and a larger change output that is spent in the next transaction). Hence, the TIHO strategy, which prioritises distributing tainted Bitcoins to the output with the highest value, would keep following the change outputs that belong to the services. Change outputs are also often the last outputs in a transaction, as many wallet clients place change outputs after spending outputs by default (Atlas, 2015).
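The fee metric can be sketched as a fee rate per transaction (fee divided by size), compared with the average fee rate of all transactions on the same day and then averaged with weights. Two choices here are assumptions rather than details taken from Section 3.4.6: the daily average is the unweighted mean of per-transaction fee rates, and each tainted transaction is weighted by a caller-supplied weight such as its tainted amount. The tuple formats are hypothetical.

```python
from collections import defaultdict


def fee_rate(fee_sat, size_bytes):
    """Transaction fee size ratio in Satoshis per byte."""
    return fee_sat / size_bytes


def daily_average_fee_rates(all_txs):
    """all_txs: iterable of (day, fee_sat, size_bytes) for every transaction.

    Returns day -> unweighted mean fee rate (assumption).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for day, fee, size in all_txs:
        sums[day] += fee_rate(fee, size)
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sums}


def weighted_fee_rate_difference(tainted_txs, daily_avg):
    """tainted_txs: iterable of (day, fee_sat, size_bytes, weight).

    Positive result: tainted transactions pay more than the daily average.
    """
    num = den = 0.0
    for day, fee, size, weight in tainted_txs:
        num += weight * (fee_rate(fee, size) - daily_avg[day])
        den += weight
    return num / den if den else 0.0


daily = daily_average_fee_rates([("d1", 1000, 250), ("d1", 3000, 250)])
print(daily)  # {'d1': 8.0}
print(weighted_fee_rate_difference([("d1", 5000, 250, 1.0)], daily))  # 12.0
```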
The TC Full results differ considerably from the TC AP results, especially for the Dirty-First and FIFO strategies, where the transaction fee size ratio is closer to the daily average for most sample cases. This pattern supports the assumption in the previous paragraph that services and PETs typically pay transaction fees close to the daily average. However, similar to the results of the number of addresses per transaction (H5), there seems to be no obvious pattern within each group that differentiates its sample cases from those of the other groups.
In this experiment, the transaction fee metric results do not support our H6 hypothesis that transactions in Bitcoin theft cases would have high transaction fees. Nevertheless, the results illustrate a clear change in transaction fee behaviour, especially between the Dirty-First strategy and the FIFO, LIFO, and TIHO strategies. The changes in transaction fee behaviour after clean Bitcoin mixing likely indicate that the transactions with clean Bitcoins are performed by different entities, which supports our hypothesis of clean Bitcoin mixing.

Results summary and discussion
In this section, we summarise the results for each hypothesis (Section 6.1) and evaluation metrics (Section 6.2). We then discuss the evaluation of the practical application of the methodology (Section 6.3). Lastly, we detail the limitations of this work's experiment (Section 6.4).
The scale used in Fig. 14 is based on the difference between the TC AP results and the CG AP results of the same taint analysis strategy, where 0 is the CG AP value, -1 (inner) is the TC AP value that deviates furthest from the CG AP value in the direction that contradicts the hypothesis, and 1 (outer) is the TC AP value that deviates furthest from the CG AP value in the direction that supports the hypothesis.
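A minimal sketch of this scaling, assuming that the supporting and contradicting sides are normalised separately by their own largest deviation from the control-group value. The description above fixes the anchor points (0, -1, 1) but not the normalisation between them, so the linear scaling and the `hypothesis_direction` parameter are assumptions.

```python
def hypothesis_scale(tc_values, cg_value, hypothesis_direction):
    """Map TC AP values onto [-1, 1] relative to the CG AP value.

    hypothesis_direction: +1 if the hypothesis predicts TC > CG,
        -1 if it predicts TC < CG (e.g. fewer reused addresses for H3).
    0 is the CG value; 1 is the most supportive TC value; -1 the most contradicting.
    """
    deltas = [hypothesis_direction * (v - cg_value) for v in tc_values]
    max_support = max((d for d in deltas if d > 0), default=0.0)
    max_contra = max((-d for d in deltas if d < 0), default=0.0)
    scaled = []
    for d in deltas:
        if d > 0:
            scaled.append(d / max_support)
        elif d < 0:
            scaled.append(d / max_contra)  # negative result for contradicting values
        else:
            scaled.append(0.0)
    return scaled


# Hypothesis predicting lower TC values than the control value of 3.0.
print(hypothesis_scale([5.0, 2.0, 1.0, 3.0], cg_value=3.0, hypothesis_direction=-1))
# [-1.0, 0.5, 1.0, 0.0]
```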

Summary of hypothesis results
As shown in Fig. 14, two of the six evaluation metrics' hypotheses are supported by the experiment's results, namely those of the reused address (H3) and fresh address (H4) metrics. The four other metrics' results do not support their hypotheses but illustrate a significant change in transaction behaviour that may also indicate a change in the ownership of the stolen Bitcoins. We summarise the key findings for each hypothesis as follows.

Theft cases do not have higher transaction frequency (contradicting H1)
The results of the theft cases do not support the hypothesis of H1, but the metric reveals differences in illegal Bitcoin users' spending strategies: most spent their stolen Bitcoins in a small number of transactions, while others rapidly distributed their stolen Bitcoins across a large number of transactions (as many as 1000 transactions per day).

Theft cases do not have a higher number of PET transactions (contradicting H2)
The results of the theft cases do not support the hypothesis of H2 but suggest that most of the illegal Bitcoin users we observed do not utilise PETs to obscure their stolen Bitcoins before spending them, and those that do utilise PETs tend to employ only one type of PET.

Theft cases have a lower number of reused addresses (supporting H3)
The results of the theft cases support the hypothesis of H3 and illustrate shifts in the behaviour when the tainted Bitcoins were exchanged with identified services.

Theft cases have a higher number of fresh addresses (supporting H4)
Similar to H3, the results of the theft cases support the hypothesis of H4 and indicate that most illegal Bitcoin users tend to use fresh addresses that are typically not reused afterwards.

Theft cases have transactions with a lower number of addresses per transaction (contradicting H5)
The results of the theft cases do not support the hypothesis of H5, as they illustrate that illegal Bitcoin users mostly perform small transactions. However, the number of addresses per transaction increases significantly after the stolen Bitcoins reach identified service addresses.

Theft cases have transactions with diverse transaction fees (contradicting H6)
The results of the theft cases do not support the hypothesis of H6; instead, they reveal diverse transaction fee spending behaviours across the sample cases. However, the results show a general shift toward the daily average when transactions after reaching identified service addresses are included.

Summary of metric results
While some of the evaluation metrics' hypotheses are not supported by the results, the majority of the evaluation metrics show distinct results between the sample cases and the control groups, which suggests that the evaluation metrics might be useful for further contextualising Bitcoin tracking solutions. We summarise the key findings that the evaluation metrics illustrate as follows.

Theft cases have distinctly different transaction and address behaviours
The results of the evaluation metrics illustrate that the transactions and addresses in both TC AP and TC Full are significantly different from those in CG AP and CG Full, respectively, including the metrics that contradict our hypotheses. The only notable exception is the PET usage metric, where most theft cases show minimal PET transactions, similar to the control groups. Nonetheless, this is likely because of the incompleteness of the current address profile data, which leaves the tracking process unable to identify transactions that involve a PET service.

Theft cases' transaction and address behaviours show significant change after reaching identified service addresses
The results of the evaluation metrics (as shown in Figs. 8, 10, 11, 12 and 13) show that the transactions in TC AP and TC Full typically have clearly different transaction behaviours, except for the Dirty-First strategy. Grouping the sample cases by their Bitcoin spending does not seem to reveal a pattern common to the cases within a group, or a difference from the other groups, outside of the Bitcoin spending results. However, this may be due to the limitation of using unrelated theft and ransomware attack cases.

Methodology evaluation and discussion
The integration of address profiling (Section 3.1) into the taint analysis process demonstrates that it can remove a substantial number of unessential transactions, which would otherwise also affect the overall transaction behaviour in the analysis results. The integration can become more effective by expanding the address profile data and classifying other types of entities. Additionally, the tracking process can use transaction profiling to recognise PET transactions and adapt its tracking operation, once future work has thoroughly verified that the transaction classifications do not produce false positives.
The introduction of context-based taint analysis strategies (Section 3.3) and the compilation of results from multiple taint analysis strategies reveal insights into transaction behaviour patterns that would be difficult to discern from the tracking results of any individual taint analysis strategy. The Dirty-First strategy reveals multiple occasions where fully tainted stolen Bitcoins directly reached addresses that are likely to belong to a cryptocurrency service without relying on PETs. When compared with the FIFO, LIFO, and TIHO results, the strategy also reveals various behavioural changes that may indicate a change in the stolen Bitcoins' ownership.
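To make the contrast between strategies concrete, the sketch below distributes the tainted value of a single transaction to its outputs under three rules. FIFO pours input value into outputs in their original order; TIHO sends tainted value to the highest-value outputs first, as described earlier in this paper. The Dirty-First function encodes an assumed interpretation (all tainted value is spent before any clean value, in output order), since the full definition from Section 3.3 is not reproduced here. LIFO is not sketched, and the segment and output formats are hypothetical.

```python
def _fill(segments, outputs, order):
    """Pour input segments, in spending order, into outputs visited in `order`.

    segments: list of (amount_sat, is_tainted) in the order they are spent.
    outputs: output values in Satoshis (inputs minus outputs is the fee).
    order: output indices in the order they are filled.
    Returns the tainted amount assigned to each output.
    """
    tainted = [0] * len(outputs)
    pending = iter(segments)
    remaining, is_tainted = 0, False
    for idx in order:
        need = outputs[idx]
        while need > 0:
            if remaining == 0:
                nxt = next(pending, None)  # move to the next input segment
                if nxt is None:
                    return tainted         # inputs exhausted
                remaining, is_tainted = nxt
                continue
            take = min(need, remaining)
            if is_tainted:
                tainted[idx] += take
            need -= take
            remaining -= take
    return tainted


def fifo(segments, outputs):
    # Inputs in original order fill outputs in original order.
    return _fill(segments, outputs, range(len(outputs)))


def dirty_first(segments, outputs):
    # Assumed interpretation: every tainted amount is spent before any clean amount.
    ordered = [s for s in segments if s[1]] + [s for s in segments if not s[1]]
    return _fill(ordered, outputs, range(len(outputs)))


def tiho(segments, outputs):
    # Tainted value goes to the highest-value outputs first.
    total_tainted = sum(a for a, t in segments if t)
    total_clean = sum(a for a, t in segments if not t)
    order = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    return _fill([(total_tainted, True), (total_clean, False)], outputs, order)


inputs = [(30, False), (50, True)]  # one clean input, then one tainted input
outs = [35, 40]                     # 5 Satoshis are left over as the fee
print(fifo(inputs, outs), dirty_first(inputs, outs), tiho(inputs, outs))
# [5, 40] [35, 15] [10, 40]
```

In this toy transaction, the three strategies send the same tainted value to different outputs. That is why combining several strategies, as done in this work, can show behaviour that a single strategy would hide.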
Meanwhile, the TIHO strategy's results typically show the lowest number of stolen Bitcoins reaching service and PET addresses, compared to the FIFO and LIFO strategies. Additionally, the TIHO results for case TC21, which shows usage of the CoinJoin method, do not exhibit more Bitcoins reaching service addresses or any other substantial difference compared to the other two strategies. Therefore, the TIHO strategy does not show a clear benefit in tracking accuracy over the FIFO and LIFO strategies in this experiment.
Instead of continuing to track tainted Bitcoins after they reach a service address, the tracking process should adapt its operation to track the targeted users' activity outside of the Bitcoin system. For example, when targeted users exchange tainted Bitcoins for other cryptocurrency coins via an exchange service, the tracking process should attempt to identify the coins that the targeted users receive, using information obtained from the exchange service involved. The tracking process can then continue using the blockchain data of the exchanged cryptocurrency coins (Yousaf et al., 2019). Similarly, the tracking process should adapt its algorithm to track obscured Bitcoins, such as zero-taint Bitcoins, with specialised tracking strategies. To our knowledge, previous studies have proposed two strategies to track zero-taint Bitcoins. The first strategy matches every transaction in the blockchain that occurs during the mixing period against a set of criteria and filters the potential transaction outputs that may contain the targeted mixed Bitcoins (Hong et al., 2018). The second strategy uses a method called Address taint analysis, a variant of taint analysis designed to identify the mixer service's address network and produce a transaction network that may be involved in the mixing operation. The outputs of the targeted Bitcoins can then be pinpointed using a set of criteria similar to those of the first strategy (Tironsakkul et al., 2020).
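The criteria-matching idea behind the first strategy can be sketched as a filter over outputs created during the mixing period. The specific criteria here (a time window, and an output value equal to the mixed amount minus a mixing fee within a hypothetical 1% to 5% band) and the `Output` record are illustrative assumptions, not the criteria used by Hong et al. (2018). Real mixers often split the returned value across several outputs, which this single-output sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Output:
    txid: str
    value_sat: int
    timestamp: int  # Unix time of the block containing the output


def candidate_outputs(outputs, mixed_value_sat, mix_start, mix_end,
                      min_fee_rate=0.01, max_fee_rate=0.05):
    """Keep outputs that could hold the mixed coins returned to the user.

    The time window and the fee band [min_fee_rate, max_fee_rate] are
    hypothetical criteria chosen for illustration.
    """
    low = mixed_value_sat * (1 - max_fee_rate)
    high = mixed_value_sat * (1 - min_fee_rate)
    return [o for o in outputs
            if mix_start <= o.timestamp <= mix_end and low <= o.value_sat <= high]


outputs = [Output("a", 97_000_000, 150),  # in window, plausible fee: kept
           Output("b", 99_500_000, 150),  # fee below 1%: dropped
           Output("c", 96_000_000, 400),  # outside the mixing window: dropped
           Output("d", 94_000_000, 200)]  # fee above 5%: dropped
print([o.txid for o in candidate_outputs(outputs, 100_000_000, 100, 300)])  # ['a']
```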

Limitations
Although context-based tracking demonstrates potential benefit in reducing a relatively large number of unessential transactions, there are limitations to our approach and experiment that we discuss below.

Potential incorrect profiling
As context-based tracking is designed with the assumption that service addresses are exit points for targeted Bitcoins, the approach is limited by possible false positives in the service address profiling data. These can stem from users setting up fake service addresses to trick the address profiling, or from disreputable services sharing their transactions with other addresses via the CoinJoin method. It is possible to mitigate this limitation with more thorough address profile data gathering and verification methods, which would also help ensure that context-based tracking is as effective and accurate as possible.

Limited to publicly available data only
In this study, we chose to use profile data only from publicly available sources to ensure the replicability of the methodology. The address and transaction profiling data we utilise in this work are likely incomplete, and many addresses that belong to cryptocurrency services and PETs remain unidentified (false negatives). This limitation can be mitigated with more address and transaction profile data. For example, blockchain analysis companies typically possess more extensive profile databases than the public sources we utilise in this work and could help strengthen context-based tracking. Expanded address profile data would also allow us to analyse the Bitcoin spending of each sample theft case more accurately.

Unrelated theft cases
The hypotheses for the evaluation metrics in this experiment are based on the assumption that the illegal Bitcoin users in the sample cases would generally follow privacy practices to increase the tracking difficulty. However, the results reveal that the sample cases do not share common behaviours as we hypothesised. This is likely due to the use of unrelated theft cases as samples and the limited number of sample cases we studied. A larger number of sample cases may help reveal more common behaviours among theft cases. Nevertheless, the evaluation metrics still serve their intended purpose of distinguishing the behaviour in the sample theft cases from that of the control groups.

No ground truth on thieves' intentions
Additionally, the current work lacks ground-truth data on the sample theft cases that could verify the actual movement of the stolen Bitcoins. As such, we could not thoroughly analyse the theft cases in this work. While it is likely that some of the theft cases we examined have already been solved by law enforcement, such information is typically not publicly available because of its sensitive nature. This information would allow us to compare and evaluate each taint analysis strategy and the profiling data.

Conclusion
When attempting to precisely track Bitcoins and similar cryptocurrency coins, tracing the targeted Bitcoins to the end of the blockchain would only show which pseudonymous addresses are the last holders of the targeted Bitcoins, as chosen by the tracking process. The methodology we present in this paper makes the tracking process adaptive to changes in Bitcoin ownership through address profiling. The results of our experiment, which analysed 26 historical Bitcoin theft cases against a set of control groups, show benefits in incorporating address profiling into the taint analysis process and confirm the relevance of the set of metrics we defined. One of the context-based strategies we introduced, Dirty-First, allows us to observe the spending and obscuring strategies that illegal Bitcoin users apply to stolen Bitcoins.
However, the TIHO strategy does not show outcomes distinct from those of existing taint analysis strategies.
Just as privacy in Bitcoin and other cryptocurrencies continues to evolve to protect users from tracking attempts, so too must tracking methodology. Our context-based tracking methodology presents necessary improvements for cryptocurrency tracking efforts and provides a next step for future cyber forensics research to assist in understanding practices within cryptocurrencies and combating cybercrimes.