Crypto Mining Makes Noise

Abstract: A new cybersecurity attack (cryptojacking) is emerging, both in the literature and in the wild, in which an adversary illicitly runs Crypto-client software on the devices of unaware users. This attack has proved to be very effective given the simplicity of running a Crypto-client on a target device, e.g., by means of web-based JavaScript. In this scenario, we propose Crypto-Aegis, a solution to detect and identify Crypto-client network traffic, even when it is VPN-ed. In detail, our contributions are the following: (i) We identify and model a new type of attack, i.e., the sponge-attack, a generalization of cryptojacking; (ii) We provide a detailed analysis of real network traffic generated by 3 major cryptocurrencies; (iii) We investigate how VPN tunneling shapes the network traffic generated by Crypto-clients by considering two major VPN brands; (iv) We propose Crypto-Aegis, a Machine Learning (ML) based framework that builds on the previous steps to detect crypto-mining activities; and, finally, (v) We compare our results against competing solutions in the literature. Evidence from our experimental campaign shows the exceptional quality and viability of our solution: Crypto-Aegis achieves an F1-score of 0.96 and an AUC of 0.99. Given the extent and novelty of the addressed threat, we believe that our approach and our results, other than being interesting on their own, also pave the way for further research in this area.


I. INTRODUCTION
Blockchain tries to solve the old problem of distributed consensus by exploiting solutions matured over decades of research [1]. The solution coming from Bitcoin's blockchain is particularly interesting: entities participating in the "voting" process should prove to have solved a moderately hard puzzle, the so-called Proof-of-Work (PoW) [2]. Indeed, for the vast majority of cryptocurrencies, participants are requested to compute a PoW in order to verify a transaction and have it added to the distributed ledger. Computationally solving the PoW is referred to as mining. Over time, the complexity of puzzle solving (typically based on hashing, as per Bitcoin and several others, such as Altcoin) has increased, leading to a rush to deploy more and more powerful systems that nowadays compute more than 40 · 10^18 hashes per second (worldwide hash rate for Bitcoin at the time of writing this paper [3]). ASIC architectures today guarantee the best trade-off among power consumption, hash rate, size, cost, and lifetime. The recent adoption of ASIC architectures brings back the major issue of centralization [4]. Indeed, the huge gap between CPU/GPU and ASIC mining makes the latter the only viable way to participate in the network as a miner. Eventually, this causes centralization, since only ASIC-based crypto-miners can participate in the consensus process. In order to mitigate the above trend, other digital currencies have been created that exploit different PoW strategies and are therefore ASIC-resistant. For instance, Monero and Bytecoin are just two examples of cryptocurrencies specifically designed to be mined with CPU-based architectures. Indeed, both cryptocurrencies adopt the CryptoNight PoW algorithm, for which specialized architectures such as GPU, FPGA, or ASIC do not provide a gain significant enough to justify the adoption of such hardware. Therefore, mining is usually performed via CPU-based architectures. The CryptoNight algorithm works by filling a segment of cache with random data corresponding to memory addresses, then subsequently hashing the resulting block after reading and writing to those addresses [5].
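To make the notion of a "moderately hard puzzle" concrete, the following Python sketch implements a toy hash-based PoW: it searches for a nonce such that the SHA-256 digest of the block data and the nonce starts with a given number of zero bits. This is an illustrative assumption only; it is neither Bitcoin's exact scheme nor CryptoNight, and the names `mine`, `verify`, and `difficulty_bits` are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a non-zero byte, the zero bits before its highest set bit.
        bits += 8 - byte.bit_length()
        break
    return bits


def pow_digest(block_data: bytes, nonce: int) -> bytes:
    """Hash the block data together with an 8-byte big-endian nonce."""
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force the smallest nonce whose digest meets the difficulty."""
    nonce = 0
    while leading_zero_bits(pow_digest(block_data, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs a single hash, whatever the mining effort was."""
    return leading_zero_bits(pow_digest(block_data, nonce)) >= difficulty_bits


# Toy difficulty: about 2**12 hash evaluations are expected on average.
nonce = mine(b"example block", difficulty_bits=12)
print(nonce, verify(b"example block", nonce, 12))
```

Each additional difficulty bit doubles the expected number of hash evaluations, while verification remains a single hash; this asymmetry is what makes the puzzle usable as a proof of work.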
PoW is becoming a significant source of revenue for the entities participating in the consensus process. This phenomenon will grow even more with the increasing number of users joining the digital currency markets. However, while PoW computational requirements are fueling methodologies and techniques to achieve more and more computational power with less energy consumption, new malicious practices involve offloading the PoW to unaware users. Indeed, a very recent cybersecurity attack, i.e., cryptojacking [6], involves the illicit use of the resources of unaware users to carry out the PoW. This attack mainly consists of the unauthorized mining of cryptocurrencies, allowing malicious parties to steal resources in terms of CPU, GPU, and memory from a target machine with the aim of effortlessly collecting crypto-wealth. This behaviour is gaining momentum for two main reasons: the ease of deployment of crypto-clients and the difficulty of detecting them. While being a general threat, cryptojacking is becoming particularly critical in Corporate ICT, where the vast majority of laptops, desktops, and smartphones are distributed among the employees under limited (if any) supervision. Indeed, several unauthorized mining activities have already been discovered: Russian nuclear scientists were arrested for a "Bitcoin mining plot" [7]; the US government banned a professor for secretly mining with National Science Foundation supercomputers [8]; a former Federal Reserve employee was sentenced to 12 months' probation and a $5,000 fine after pleading guilty to installing unauthorized software that connected to an online Bitcoin network in order to earn units of the digital currency [9]; a Harvard student used a 14,000-core supercomputer to mine Dogecoin [10]; the factory lines [...]
• A new type of attack: we define a novel type of attack that subsumes the cryptojacking attack, i.e., the sponge-attack, where an adversary (either internal or external) secures a personal profit by illicitly exploiting third-party computing resources.
• The Crypto-Aegis Framework: an ML-based framework to detect and identify clients suffering from a sponge-attack. Crypto-Aegis enjoys the following features: (i) Infrastructure independence. The analysis is performed at the exit points (edge) of the Corporate network, independently of network size and network layout, and even when multiple layers of encryption are in place, e.g., when a VPN is in use; (ii) Device independence. We do not require any modification to the existing devices adopted by the Corporate employees; (iii) Multi-adversarial profiles support. Our solution detects the presence of illicit behaviours via network traffic analysis, independently of the adversarial profile, i.e., be it an insider or an outsider; (iv) No clean state required. Our solution detects the presence of a miner independently of the time the miner started its activities; and, (v) Effectiveness. Our solution achieves an F1-score of 0.96, and the AUC of the ROC is greater than 0.99.
Roadmap. The paper is organized as follows. Section II summarizes the most important contributions in the area. Section III introduces the scenario and the adversary model. The details of the measurement setup are described in Section IV, while a thorough analysis of the collected network traffic traces is presented in Section V. Section VI presents a baseline example analysis, i.e., Bitcoin vs. standard software, introducing all the statistics that will be considered in the subsequent analysis. Sections VII and VIII introduce the methodologies used by our Crypto-Aegis framework for the detection and identification of full nodes and miners, respectively. Finally, Section IX tackles the general problem of detecting a Crypto-node in a Corporate network, while a detailed discussion of our results and a comparison with other solutions from the literature is presented in Section X. Section XI draws some concluding remarks.

II. RELATED WORK

A. Cryptojacking and Network Traffic Classification
Cryptojacking (also known as Drive-by Mining). The computational power required to validate and add blocks to the Bitcoin blockchain has greatly limited the odds that individuals without specialized hardware can contribute to this process. Dedicating small devices (e.g., smartphones, laptops, desktops), or more powerful ones (e.g., workstations, servers), to the mining process would not be worth the cost of the electricity. This discourages users and leaves only a few players worldwide with the opportunity to contribute and earn the resulting rewards. With the advent of other, CPU-based cryptocurrencies, this scenario has undergone many changes. History repeats itself. In other ages, upon seeing a mine populated by mechanical diggers, the gold digger with only a pick-axe on his shoulders would be forced to find new promising shores. This return to the "gold rush" led to the rediscovery of numerous attacks that had lost their meaning with Bitcoin. Attacks of this type are identified by the term cryptojacking. Hackers, as well as dishonest employees who would like to round off their earnings, "borrow" resources belonging to others to run the mining process. Hand in hand with the threats, some solutions have already been proposed with the aim of implementing countermeasures to mitigate their effects.
In [12], the state of the art of crypto-mining attacks has been investigated. By analyzing the malware code, as well as its behavior upon execution, the authors examine two common attacks: web browser-based crypto-mining and installable binary crypto-mining. Browser-based crypto-mining attacks exploit the JavaScript technology of webpages, taking advantage of two web technology advancements, asm.js and WebAssembly [13], while installable binary crypto-mining is possible by using modified versions of the XMRig software [14]. The paper analyzes the techniques adopted by cybercriminals to establish a persistence mechanism and avoid detection, and it introduces both static and dynamic analysis, useful to uncover the techniques employed by the malware to exploit potential victims. In [15], the authors present an in-depth study of cryptojacking. After identifying a set of inherent characteristics of cryptojacking scripts, such as the repeated hash-based computations and the regular call stack, a behavior-based detector called CMTracker has been introduced. The analysis of 853,936 popular web pages led to the identification of 2770 unique cryptojacking samples, 868 of which belong to Alexa's top 100k ranking websites. A similar solution has been proposed by [16]. The authors propose an approach aiming to identify mining scripts, conducting a large-scale study on the prevalence of cryptojacking in Alexa's Top 1 million websites. According to the analysis, on average 1 out of 500 websites hosts a mining script. Numerous works have followed the same direction. In [6], the authors conduct measurements to establish the relevance and profitability of cryptojacking, wondering whether it should be classified as an attack or as a business opportunity. In [17], 138 million domains have been explored, of which 137 million are com/net/org domains and 1 million come from Alexa's Top 1 million list. The analysis shows that the prevalence of browser mining is currently 0.08% of the analyzed set, a worrying number that should not be underestimated. Alexa's Top 1 million websites have also been taken into account by [13], in which the authors studied the websites affected by drive-by mining to understand the techniques being used to evade detection. As a result, 20 active crypto-mining campaigns have been identified.
A first step towards the application of Machine Learning techniques has been made by [18]. The authors present an experimental study in which dynamic opcode analysis successfully allows the detection of browser-based crypto-mining. The proposed model can distinguish among crypto-mining sites, weaponized benign sites (e.g., benign sites into which the crypto-mining code has been injected), de-weaponized crypto-mining sites (e.g., crypto-mining sites from which the start() call has been removed), and real-world benign sites. Although the paper proposes an innovative approach to detect crypto-mining activity, network traces are not analyzed and only web-based crypto-mining is taken into account. In [19], the authors introduce a method to detect malicious mining behavior in the browser. Heap snapshot and stack features are asynchronously extracted and automatically classified using Recurrent Neural Networks (RNNs). With 1159 malicious samples analyzed, the experimental results show that the proposed prototype recognizes the original mining samples with 98% accuracy when they are not encrypted, and 93% otherwise.

III. SCENARIO, ASSUMPTIONS, AND ADVERSARY MODEL
In the following, we describe our reference scenario, the assumptions we make regarding the network infrastructure and network traffic classification and, finally, the adversary model.

A. Scenario
Figure 1 shows the details of our reference scenario. We consider a corporate network constituted by several interconnected devices, including one that is controlled by a malicious entity willing to mine cryptocurrencies without being detected. Our solution is deployed at the network edge, and it involves only an Ethernet connection from the main Corporate Network switch to a server running our Machine Learning algorithm. We observe that our solution requires interventions neither on the employee devices nor on the existing network infrastructure. Moreover, our solution can be easily deployed even when there are multiple exit connections between the Corporate Network and the Internet, by deploying multiple Ethernet links to collect the data from the Corporate Network exit points. We observe that the above configuration is very conservative with respect to standard commercial solutions. Indeed, in the vast majority of cases, corporate solutions involve hardware for deep packet inspection deployed before the exit point, or even at multiple locations of the network. On the one hand, this makes the association between traffic and device much easier; on the other hand, it requires a significant cost in terms of hardware equipment and deployment. In our solution, we consider the traffic already aggregated, i.e., affected by IP masquerading/NAT, or even tunneled and re-encrypted by a Virtual Private Network (VPN).

B. Adversary model
We consider two adversary models with respect to the corporate network: (i) insider; and, (ii) outsider. We assume the insider has direct access to the hardware resources of the company and, therefore, has the opportunity to install new software on them. A typical example is the employee willing to accumulate crypto-wealth by exploiting corporate resources such as CPU, GPU, and network bandwidth. Moreover, we envisage an external adversary (outsider) able to infect one or more corporate devices with malicious software for performing unauthorized crypto-mining. Typical examples are the increasing number of malware samples delivering crypto-mining software to unaware users, and websites running crypto-mining JavaScript code without the user's consent. Our adversary model (as depicted in Fig. 1) takes into account a corporate device illicitly running crypto-mining-related activities. We stress that this malicious device might be controlled either by a dishonest employee or by a remote hacker who has taken control of the device itself.
There are several strategies to mine without the company's consent, an activity that today is really difficult to detect and prevent. This malicious behaviour might be implemented by either a full node or a miner.
• Full node. It is a full-featured client of the cryptocurrency infrastructure. It locally stores the whole blockchain and participates in the consensus algorithm, being able to validate all the transactions. The mining activity performed by a full node is called solo mining because the process is carried out independently of other nodes.
• Miner. It is lightweight software that implements a simple worker that receives jobs (i.e., hash computations useful for the PoW) from a third party (i.e., a mining pool). When the mining pool successfully mines a block, both the reward and the fees are divided among all the participants, proportionally to the computational power offered. This software does not participate in the cryptocurrency protocol and does not need to store a blockchain to work.
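The proportional division of reward and fees among pool participants can be sketched in a few lines of Python. This is a minimal illustration under the assumption of a purely proportional split; real pools may deduct their own fee or use other payout schemes, and the worker names and amounts below are hypothetical.

```python
def split_reward(block_reward, fees, hash_rates):
    """Divide reward plus fees proportionally to each worker's hash rate.

    hash_rates maps a worker name to the computational power it offered
    (any consistent unit, e.g., hashes per second).
    """
    total_rate = sum(hash_rates.values())
    if total_rate <= 0:
        raise ValueError("the pool needs a positive total hash rate")
    payout = block_reward + fees
    return {worker: payout * rate / total_rate
            for worker, rate in hash_rates.items()}


# Hypothetical pool: worker "a" contributes three times the power of "b".
shares = split_reward(block_reward=6.0, fees=0.5,
                      hash_rates={"a": 300.0, "b": 100.0})
print(shares)
```

Under this scheme a low-power worker still earns a small but steady share of every block mined by the pool, which is why joining a pool is attractive for an adversary with limited stolen resources.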
In our scenario, we assume the client (being either a full node or a miner) is already provided with the ledger, if needed, and does not require any warm-up operations. Indeed, Crypto-Aegis does not rely on application-specific transients, and it does not need to be deployed before the malicious device starts its illicit activity.
Definition. We define the sponge-attack as the malicious behavior of exploiting third-party hardware and software resources to obtain a personal profit without the consent of the infrastructure's owner. The sponge-attack illicitly absorbs resources from the corporate infrastructure and makes money out of them in favor of the attacker. This definition is more general than that of cryptojacking, which only refers to mining activities. The sponge-attack, instead, also includes any other activity performed with someone else's resources without authorization. An example of a sponge-attack could be a malicious full node installed on a corporate server to perform a DDoS attack against a cryptocurrency network by using the company's network resources.
The sponge-attack can be implemented by deploying either a Full Node or a Miner.
• Miner. The use of mining pool software makes it possible to carry out mining activities without installing the heavier full node software. An adversary can use this software to perform a faster and stealthier attack, since the targeted device does not need to store the ledger, which is usually very large. Furthermore, by joining a mining pool, profits are increased even if the available resources are limited.
• Full node. Deploying a full node in a network without the administrator's consent has significant advantages for the adversary. Firstly, the full node gives the adversary the capability to perform solo mining, if the victim's resources are sufficiently powerful. Moreover, the full node could be used to attack the cryptocurrency's network by performing double spending attacks, DDoS attacks, Sybil attacks, Eclipse attacks, and possibly others.

C. Terminology
In the following, we refer to different actors and actions by using the following terminology:
• Crypto-client: Software illicitly installed on a device belonging to the Corporate Network with the aim of performing the sponge-attack.
• Standard software: Software legitimately installed on a device of the Corporate Network.
• Reference device: A laptop used for running both the Standard software and the Crypto-clients used in this paper.

IV. MEASUREMENT SETUP AND PRELIMINARY CONSIDERATIONS
In this section we provide a description of our measurement setup and a preliminary statistical analysis of the collected traces.
Measurement setup. Our measurement setup is summarized in Fig. 2. We consider two scenarios: Scenario 1, where a VPN tunnel adds an encryption layer to the communication, and Scenario 2, where the client is directly connected to the Internet. In Scenario 1, the malicious device is connected to the Internet through an encrypted VPN tunnel. For our measurements, we used two different well-known VPN brands, i.e., Nord VPN (v.1.2.0) and Express VPN (v.1.5.0). At the time of writing this paper, Express VPN features more than 2000 servers in 148 countries, while Nord VPN features 5064 servers in 62 countries. We arbitrarily set the VPN exit node to France for all our measurements. Conversely, in Scenario 2, the malicious device is directly connected to the Internet without resorting to any additional encryption layer. The malicious device, which also acts as our reference device when not mining, is a Dell XPS15 laptop running Ubuntu 18.04 (64 bit). All the extracted features are publicly available at [20].
Definition. We define the ingoing flow as all the network traffic from the Internet to our reference device. Moreover, we refer to the network traffic generated by the reference device and sent to the Internet as the outgoing flow.
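The two flow definitions can be expressed as a simple filter over captured packets. The Python sketch below assumes each packet record carries a timestamp, source and destination addresses, and a size; the `Packet` type, its field names, and the IP addresses are hypothetical and do not reflect the format of the published dataset.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    timestamp: float  # absolute arrival time in seconds
    src: str          # source IP address
    dst: str          # destination IP address
    size: int         # packet size in bytes


def split_flows(packets: List[Packet],
                device_ip: str) -> Tuple[List[Packet], List[Packet]]:
    """Return (ingoing, outgoing) flows relative to the reference device."""
    ingoing = [p for p in packets if p.dst == device_ip]
    outgoing = [p for p in packets if p.src == device_ip]
    return ingoing, outgoing


# Hypothetical capture: 10.0.0.5 plays the role of the reference device.
capture = [
    Packet(0.00, "10.0.0.5", "203.0.113.7", 120),
    Packet(0.02, "203.0.113.7", "10.0.0.5", 1400),
    Packet(0.05, "10.0.0.5", "203.0.113.7", 90),
]
ingoing, outgoing = split_flows(capture, "10.0.0.5")
print(len(ingoing), len(outgoing))
```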
We collected network traffic from three different cryptocurrencies (Bitcoin, Bytecoin, and Monero) and three different applications (Skype, YouTube, and standard office applications mixed together) as follows:
• Skype. We ran an audio Skype call and collected all the network traffic from/to the reference device.
• YouTube. We collected the network traffic generated by a random YouTube video from/to the reference device.
• Office network traffic. We logged the network traffic generated by the reference device while using it for standard office tasks, e.g., e-mail, web browsing, download and upload of files, Microsoft Office365, etc.
The above applications have been selected as a reference excerpt of three traffic patterns coming from three different application scenarios, namely audio calls, video streaming, and standard office network traffic. We observe that such network traffic categories cover more than 87% of the 2018 global consumer internet traffic [21]. Since our idea is to infer the presence of a Crypto-client from the network traffic, we consider the network traffic from the above standard applications as the background "noise" hiding the traffic of the Crypto-client. Our goal is to discriminate the flows involving the Crypto-client from the other flows in the network.

Fig. 2. Measurement setup: We consider 2 different scenarios. The malicious device is connected to the network through a VPN tunnel (Scenario 1), or the malicious device is directly connected to the Internet (Scenario 2). We adopted one laptop for the mining activities (malicious device), another laptop for collecting all the in-transit packets and, finally, a switch featuring a monitoring port.
It is known that Machine Learning for network traffic classification is biased by several parameters, e.g., features, type of traffic, trace length, and network state. One major concern is the consistency of the extracted features given the limited trace length. We therefore paid particular attention to capturing packets from clients at steady state, after the initial sync period had completed. Indeed, Crypto-clients require a warm-up period to download the blockchain and validate it. This guarantees that our log excerpts represent a consistent snapshot of a steady-state client either syncing or mining for the blockchain network.

V. NETWORK TRAFFIC ANALYSIS AND PATTERNS
In this section, we start the analysis of the collected network traffic by considering the two network flows, ingoing and outgoing, as defined in the previous section. In order to guarantee a fair comparison between the various scenarios, we extracted the same number of consecutive samples from each network trace, i.e., 4576 samples. Table II shows the network traces we have collected for the different application scenarios, i.e., Office, Skype, YouTube, Bytecoin, Monero, and Bitcoin. For each scenario, we report the trace duration corresponding to the extracted samples, the quantile 0.5 computed on the interarrival times and, finally, the quantile 0.5 computed on the packet sizes. In order to ease the discussion, we refer to each trace by using a sequence of keywords as follows: [Application][Flow direction][VPN Type], where Application can be Office, Skype, YouTube, Bytecoin, Monero, or Bitcoin; Flow direction is either Ingoing or Outgoing; and VPN Type is empty (no VPN), Express, or Nord.
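As an illustration of the naming convention and of the fixed-length extraction, the following Python sketch parses a trace label into its three keywords and cuts a window of consecutive samples. It assumes the keywords are separated by spaces (as in "YouTube Ingoing Express"); the function names `parse_label` and `extract_window` are hypothetical.

```python
APPLICATIONS = ("Office", "Skype", "YouTube", "Bytecoin", "Monero", "Bitcoin")
DIRECTIONS = ("Ingoing", "Outgoing")
VPN_TYPES = ("", "Express", "Nord")  # the empty string means no VPN
N_SAMPLES = 4576  # consecutive samples extracted from every trace


def parse_label(label):
    """Split '[Application] [Flow direction] [VPN Type]' into its parts."""
    parts = label.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected trace label: {label!r}")
    application, direction = parts[0], parts[1]
    vpn = parts[2] if len(parts) == 3 else ""
    if (application not in APPLICATIONS or direction not in DIRECTIONS
            or vpn not in VPN_TYPES):
        raise ValueError(f"unknown keyword in trace label: {label!r}")
    return application, direction, vpn


def extract_window(samples, start=0, n=N_SAMPLES):
    """Take n consecutive samples so that all traces have equal length."""
    window = samples[start:start + n]
    if len(window) < n:
        raise ValueError("trace too short for the requested window")
    return window


print(parse_label("YouTube Ingoing Express"))
print(parse_label("Bytecoin Ingoing"))
```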
Firstly, we observe that the same number of samples corresponds to very different collection times depending on the application scenario, e.g., about 38.37 seconds for YouTube Ingoing Express versus about 4598 seconds for Bytecoin Ingoing. In the following we provide some insight from Table II.
Discussion. VPN tunnelling tends to squeeze the packets together and to increase the packet size. Bitcoin is special: VPN tunnelling affects the original traffic pattern much less, although there are some significant variations in the packet size. It is worth noting the differences among the cryptocurrencies when the traffic is collected without VPN tunnelling. Interarrival times and packet sizes differ considerably among the currencies, as well as between the ingoing and outgoing flows.
Given the above considerations, we carry out a more in-depth analysis of the flows in order to subsequently identify the features to be used for the Machine Learning process. Figures 3 and 4 show the quantiles 0.05, 0.5, and 0.95 and the minimum and maximum values associated with each trace collected during our measurements. Outgoing flows present packet sizes that are very different from each other, in the range between 100 and 1000 bytes. Only a few exceptions fall outside that range, while it is worth noting how the quantile 0.5, i.e., the median, changes for each network trace. Moreover, we observe that raw traffic from Bytecoin, Monero, and Bitcoin present almost the same values [...] are randomly distributed and we do not observe any significant pattern in the VPN tunneling of cryptocurrency clients. We performed the same analysis for the interarrival times, obtained by differencing the absolute arrival times logged by Wireshark. The ingoing flows (Fig. 5) of cryptocurrencies are characterized by very similar values, i.e., almost the same quantiles 0.05 and 0.95, although we observe that the median values span between 10^-2 and 10^-4 seconds. Similar observations can be drawn by looking at the outgoing flows, as depicted in Fig. 6.
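The per-trace statistics discussed above (minimum, quantiles 0.05, 0.5, and 0.95, and maximum, with interarrival times obtained by differencing arrival times) can be computed as in the following Python sketch. The linear-interpolation quantile is an assumption, since the quantile estimator is not specified here, and the timestamps in the example are made up.

```python
def interarrival_times(timestamps):
    """Difference consecutive absolute arrival times."""
    return [later - earlier
            for earlier, later in zip(timestamps, timestamps[1:])]


def quantile(sorted_values, q):
    """Quantile by linear interpolation between the closest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return (sorted_values[lower] * (1 - fraction)
            + sorted_values[upper] * fraction)


def candlestick(values):
    """Five-number summary used to describe a trace."""
    if not values:
        raise ValueError("cannot summarize an empty trace")
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "q05": quantile(ordered, 0.05),
        "median": quantile(ordered, 0.5),
        "q95": quantile(ordered, 0.95),
        "max": ordered[-1],
    }


# Made-up arrival times (seconds); the same summary applies to packet sizes.
summary = candlestick(interarrival_times([0.0, 1.0, 3.0, 6.0, 10.0]))
print(summary)
```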

VI. TRAFFIC CLASSIFICATION: A BASELINE EXAMPLE
We implemented all the traffic-classification tasks in MATLAB (R2018a), adopting the Statistics and Machine Learning Toolbox. Our Crypto-client detection algorithm involves the following steps:
• Features extraction. Feature identification and extraction are paramount activities to maximize the performance of the classifier. In this work, we consider several features, starting from the very standard ones, i.e., interarrival time and packet size. We also consider other derived features with the aim of validating how they affect the final classifier performance.
• k-Fold Cross Validation. Cross validation is a common practice to average the results of Machine Learning algorithms. It is usually performed by defining a random partition of the n observations into k disjoint subsamples (or folds), chosen randomly but with roughly equal size. The default value of k is 10.
• Random Forest (RF). We adopted the TreeBagger MATLAB class to implement the RF algorithm. TreeBagger combines the results of many decision trees, which reduces the effects of overfitting and improves generalization. TreeBagger grows the decision trees in the ensemble using bootstrap samples of the data.
• Statistics. This task involves the generation of statistics from the classifier results. Our statistics include (among others) True Negative (TN), False Positive (FP), False Negative (FN), True Positive (TP), and the confusion matrix.

Fig. 6. Interarrival times for outgoing flows: Candle-sticks represent the minimum, quantile 0.05, quantile 0.95, and the maximum interarrival times for the outgoing flows, while the circles represent quantile 0.5.
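The k-fold partitioning step listed above can be sketched in plain Python as follows. This is not the MATLAB code used for the experiments; it only illustrates a random partition of n observations into k disjoint folds of roughly equal size, with hypothetical function names and a fixed seed for repeatability.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition the indices 0..n-1 into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives fold sizes that differ by at most 1.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Two traces of 4576 samples each give 9152 observations in total.
folds = k_fold_indices(9152, k=10)
print([len(fold) for fold in folds])
```

A classifier is then trained k times, each time on the training indices of one split, and evaluated on the held-out fold; the per-fold results are aggregated into the final statistics.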
In this section, we introduce a simplified version of our methodology considering a binary decision problem. Our goal is to analyze the traffic of a network to determine whether a malicious mining activity is taking place. We recall that the traffic collected from the Bitcoin client is related only to the syncing process (being a Full Node), while the traffic related to the mining process will be considered later on. Moreover, as for the "noise" traffic, we adopted a laptop featuring Windows 10 PRO and performing standard office tasks, as discussed in the previous section. We now consider only two network traces from Table II: Office Outgoing and Bitcoin Outgoing. Moreover, we assume the hypothesis H_0: the current network event has been generated by the Bitcoin client. We ran 10-fold cross validation using the RF algorithm (with a default value of 20 trees) and only two features: interarrival time and packet size. Table III shows the confusion matrix associated with the classifier results, i.e., 4314 times Bitcoin is correctly recognized (True Positive, TP) while 4321 times the class Office is correctly recognized (True Negative, TN). The other values refer to False Positive (FP), i.e., 254 observations are wrongly classified as Bitcoin, and False Negative (FN), i.e., [...] FPR and TPR can be used to highlight the classifier performance at different threshold values, when the system can accept different levels of false positives. Figure 7 shows the Receiver Operating Characteristic (ROC) curve, consisting of the True Positive Rate (TPR) as a function of the False Positive Rate (FPR). Another important metric directly connected to the ROC curve is the so-called Area Under the Curve (AUC), i.e., the area under the ROC curve, a value spanning between 0 (worst case) and 1 (best case). As for the ROC curve in Fig. 7, the AUC is about 0.971 for both classes, Bitcoin and Office.
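For reference, the metrics used throughout the evaluation follow the standard definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN), and the F1-score is the harmonic mean of precision and TPR. The Python sketch below computes them from confusion-matrix counts and estimates the AUC from classifier scores as the probability that a randomly chosen positive is scored above a randomly chosen negative. The counts and scores in the example are hypothetical and are not taken from Table III.

```python
def rates(tp, fp, fn, tn):
    """Standard detection metrics from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)        # true positive rate (recall)
    fpr = fp / (fp + tn)        # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f1": f1}


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This equals the area under the
    ROC curve obtained by sweeping the decision threshold over the scores.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in positives for n in negatives)
    return correct / (len(positives) * len(negatives))


# Hypothetical counts and scores, for illustration only.
metrics = rates(tp=90, fp=5, fn=10, tn=95)
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(metrics, auc)
```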
We now add more features to the current scenario and we analyze the performance of the classifier. Let us define the already introduced (basic) features and the new ones, as follows:
• Interarrival time (δ): the time elapsed between two consecutive packets.
• Packet size (γ): the size associated with each packet.
• Moving mean of δ (µ_δ(w)): each mean value is calculated over a sliding window of length w across neighboring elements of δ.
• Moving standard deviation of δ (σ_δ(w)): each standard deviation is calculated over a sliding window of length w across neighboring elements of δ.
• Moving mean of γ (µ_γ(w)): each mean value is calculated over a sliding window of length w across neighboring elements of γ.
• Moving standard deviation of γ (σ_γ(w)): each standard deviation is calculated over a sliding window of length w across neighboring elements of γ.
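The six features listed above can be derived from the raw interarrival times and packet sizes as in the following Python sketch. The window alignment is an assumption: here the window is centered on the current element and truncated at the trace boundaries, and the standard deviation is the sample (N-1) estimate. The function names are hypothetical.

```python
from statistics import mean, stdev


def window_at(values, w, i):
    """Elements in a window of length w centered on index i (edge-truncated)."""
    before = w // 2
    after = w - before - 1
    return values[max(0, i - before):min(len(values), i + after + 1)]


def moving_mean(values, w):
    return [mean(window_at(values, w, i)) for i in range(len(values))]


def moving_std(values, w):
    result = []
    for i in range(len(values)):
        window = window_at(values, w, i)
        # A single-element window has no spread.
        result.append(stdev(window) if len(window) > 1 else 0.0)
    return result


def build_features(interarrival, sizes, w=5):
    """One row per packet: (δ, γ, µ_δ(w), σ_δ(w), µ_γ(w), σ_γ(w))."""
    return list(zip(interarrival, sizes,
                    moving_mean(interarrival, w), moving_std(interarrival, w),
                    moving_mean(sizes, w), moving_std(sizes, w)))


print(moving_mean([1, 2, 3, 4, 5], 3))
print(moving_std([1, 2, 3, 4, 5], 3))
```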
In order to evaluate the impact of the features on the classification algorithm, we used, for each feature, the Mean Square Error (MSE) averaged over all the trees in the ensemble and divided by the standard deviation taken over the trees. The larger this value, the more important the feature is in the classification process. Figure 8 shows the MSE as a function of the moving window size (w) and the different types of features. Firstly, we observe that, for this scenario (Bitcoin vs. Office), the most important feature is γ, i.e., the packet size, represented by the red bar. The other features have about the same weight, while it turns out that w = 5 is a good trade-off for the window size of the moving mean and the moving standard deviation.
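The importance score described above, a per-feature error measure averaged over the trees and divided by its standard deviation across the trees, can be written as a short Python function. The per-tree values below are hypothetical placeholders, and how each per-tree error is obtained (for instance, by permuting the feature on out-of-bag observations) is an assumption outside the scope of this sketch.

```python
from statistics import mean, stdev


def feature_importance(per_tree_errors):
    """Map each feature to mean(per-tree error) / std(per-tree error).

    per_tree_errors maps a feature name to the list of error values
    measured on each tree of the ensemble for that feature.
    """
    scores = {}
    for name, errors in per_tree_errors.items():
        spread = stdev(errors)
        if spread == 0:
            raise ValueError(f"no variation across trees for {name!r}")
        scores[name] = mean(errors) / spread
    return scores


# Hypothetical per-tree values for two features and three trees.
scores = feature_importance({
    "gamma": [0.4, 0.6, 0.5],
    "delta": [0.1, 0.3, 0.2],
})
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```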

VII. CRYPTO-AEGIS: DETECTION AND IDENTIFICATION OF FULL NODES
In this section, we consider the traces of Table II while splitting them into ingoing and outgoing flows. As previously discussed, we consider three main metrics: True Positive Rate (TPR), False Positive Rate (FPR), and the Area Under the Curve (AUC). Our RF classifier has been configured with 20 trees (the default value), the 6 features already introduced in the previous section, and a moving window of 5 observations. Moreover, we consider only Full Node clients; therefore, the observed network traffic will be related to syncing and consensus operations.
Ingoing flows. Figure 9 shows TPR and FPR for all the cryptocurrencies we have considered in this work. Firstly, we observe that the overall results are quite satisfactory, i.e., the means computed on the TPR and FPR values are about 0.86 and 0.0088, respectively. The best detection performance is achieved for Bytecoin Express (TPR=0.92, FPR=0.0047) and Bitcoin Express (TPR=0.92, FPR=0.008). Conversely, the worst-case performance is achieved for:
• Bytecoin (TPR=0.81, FPR=0.012). Misclassifications are mainly due to Monero (332 cases, 7%), Bitcoin (106 cases, 2%), and Bytecoin Nord (97 cases, 2%).
• Monero Express (TPR=0.80, FPR=0.014). False positives are mainly due to Office Express (381 cases, 8%), YouTube Express (148 cases, 3%), and Bytecoin Express (63 cases, 1%).
• Bitcoin Nord (TPR=0.84, FPR=0.009). Classification errors mainly come from Monero Nord (251 cases, 5%), Bitcoin Express (217 cases, 4%), and Bytecoin Nord (70 cases, 1%).
Our results show that ingoing flows (from the Internet to the device) can be used to effectively identify malicious miners inside local networks. In particular, traffic generated by cryptocurrency clients (without VPN) can be detected with high TPR values, i.e., 0.84, 0.87, and 0.90 for Bytecoin, Monero, and Bitcoin, respectively. The adoption of a VPN tunnel does not improve the privacy of the Crypto-client: TPR increases for Bytecoin when tunneled through a VPN, while Monero and Bitcoin have diverging performance as a function of the adopted VPN brand. Moreover, we observe that the worst-case performance is due to Crypto-clients misclassified as other Crypto-clients; indeed, there is only one exception: 148 cases of YouTube Express classified as Monero Express. We consider this phenomenon a by-product of VPN tunnelling; indeed, it is reasonable to assume that the same VPN is (slightly) re-shaping different traffic patterns in the same way.
Outgoing flows. Figure 10 shows TPR and FPR for all the cryptocurrencies we have considered in this work. As for the ingoing flows, we observe that the overall results are, again, quite satisfactory, i.e., the mean values computed on the TPR and the FPR values are 0.85 and 0.008, respectively. The best detection performance is achieved for Bitcoin Express (TPR=0.93, FPR=0.006) and Bitcoin (TPR=0.90, FPR=0.005). Conversely, the worst-case performance is achieved for:
• Monero Express (TPR=0.79, FPR=0.011). Misclassifications are mainly due to Office Express (563 cases, 12%) and YouTube Express (295 cases, 6%).
• Bytecoin Nord (TPR=0.77, FPR=0.012). False positives are mainly due to Monero Nord (281 cases, 6%) and Bitcoin Nord (202 cases, 4%).
• Bytecoin (TPR=0.78, FPR=0.009). Classification errors mainly come from Monero (345 cases, 7%) and Bitcoin (147 cases, 3%).
The above considerations show that outgoing flows (from the device to the Internet) can be used to effectively identify malicious miners inside local networks. Crypto-clients (without VPN) can be detected with a TPR of about 0.80 (0.78), 0.84 (0.87), and 0.88 (0.90), for Bytecoin Ingoing (Outgoing), Monero Ingoing (Outgoing), and Bitcoin Ingoing (Outgoing), respectively. As for the ingoing flow analysis, a VPN does not significantly increase the privacy of the device: TPR values fluctuate depending on the adopted VPN brand, but they still remain high, and even increase in some cases. One more interesting aspect is that misclassifications happen among cryptocurrencies adopting the same VPN, i.e., a Crypto-client is misclassified as belonging to another cryptocurrency but still using the same VPN. Although this is a marginal effect (the most frequent case is less than 12%), we can reasonably suspect that VPN tunnelling is shaping the traffic patterns, which explains (at least partially) why the Machine Learning algorithm experiences this type of misprediction.
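The per-class TPR and FPR values discussed above follow from a multi-class confusion matrix with a one-vs-rest reading. The sketch below shows the computation. The function name `per_class_rates` and the 3-class toy matrix are invented for illustration and do not reproduce the paper's data.

```python
def per_class_rates(confusion, labels):
    """One-vs-rest TPR and FPR for each class of a square confusion matrix.

    confusion[i][j] is the number of observations of true class i
    that the classifier assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # class k predicted as something else
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as class k
        tn = total - tp - fn - fp
        rates[name] = (tp / (tp + fn), fp / (fp + tn))
    return rates


# Toy matrix, invented for illustration (100 observations per class).
toy_labels = ["Bytecoin", "Monero", "Office"]
toy_confusion = [
    [80, 15, 5],
    [10, 85, 5],
    [2, 3, 95],
]
toy_rates = per_class_rates(toy_confusion, toy_labels)
```

Reading the off-diagonal entries of a row shows which classes absorb the misclassifications of a given client, which is the kind of breakdown reported in the bullet lists above.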
Area Under the Curve (AUC). As previously introduced, the Area Under the Curve (AUC) is the area under the ROC curve. Its value spans between 0 and 1, and the larger it is, the better the classifier's performance. Figure 11 shows the AUC values for the different traces/scenarios in Table II. For both the ingoing and outgoing flows, the worst performance is experienced by Bytecoin Express, Bitcoin Express, and Bytecoin. Of course, this does not imply that VPN tunnelling is protecting the privacy of the clients, as previously discussed. Indeed, we do not observe any significant difference with or without the presence of a VPN tunnel: for all three cryptocurrencies, the AUC values in the presence of a VPN tunnel are still very high, i.e., larger than 0.955.

VIII. CRYPTO-AEGIS: DETECTION AND IDENTIFICATION OF MINERS
In the following, we consider cryptocurrency clients acting as pure Miners, as already introduced in Section III. Therefore, we consider bfgminer for Bitcoin and XMRig for both Bytecoin and Monero as illicitly installed clients in a device belonging to a Corporate Network. As for the Full Node case, we consider two different VPNs, i.e., Express VPN and Nord VPN, and two flows, i.e., ingoing and outgoing.
In the vast majority of cases, the mining task is performed by mining pools [22], [23]. In practice, Miners collaborate in pools to lower the variance of their revenue by sharing rewards with a group of other Miners. We consider the same measurement setup of Fig. 2, but we use a Miner as the peer for the blockchain network. As for the analysis of the Full Nodes (Section VII), we split all the collected traces into two subsets: ingoing and outgoing flows.
Table IV summarizes the network traces we have collected when considering a Miner as a client: for each trace, we report the duration and the medians of both the interarrival times and the packet sizes. Firstly, we reduce all the traces to the same number of packets to guarantee a fair classification process for the RF algorithm: we choose 832 packets. We observe that the number of packets from the Miners is less than the one from the Full Nodes: this is mainly due to the fact that Miners generate less traffic.
Ingoing flows. Figure 12 shows the TPR and the FPR for different Miner clients when their ingoing flow is compared with the ingoing flows of Office, Skype, and YouTube (same traces from the Standard software of Table II). Firstly, we observe that raw traffic (not tunneled through a VPN) is better identified: Bytecoin, Monero, and Bitcoin turn out to have TPR values equal to 0.993, 0.983, and 0.985, respectively, while FPR values are equal to 0.0018, 0.0004, and 0.0011, respectively.
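Reducing all the traces to the same number of packets, as described above, can be sketched as follows. How the packets are selected is not stated in the text, so keeping the first n packets of each trace is an assumption made here; the function name `equalize_traces` and the toy traces are hypothetical.

```python
def equalize_traces(traces, n=None):
    """Truncate every trace to the same number of packets.

    traces: dict mapping a trace name to its list of packets
    n: target length; defaults to the length of the shortest trace
    Assumption: the first n packets are kept (the selection rule is not
    specified in the text; random sampling would be another option).
    """
    shortest = min(len(packets) for packets in traces.values())
    if n is None:
        n = shortest
    if n > shortest:
        raise ValueError("n exceeds the length of the shortest trace")
    return {name: packets[:n] for name, packets in traces.items()}


# Toy traces, invented for illustration; 832 is the length chosen in the text.
toy_traces = {
    "Monero Ingoing": list(range(832)),
    "Office Ingoing": list(range(5000)),
}
balanced = equalize_traces(toy_traces, n=832)
```

Equal-length traces give the classifier balanced classes, so that a class is not favored merely because it contributes more observations.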
We highlight that, for both the Outgoing and Ingoing cases, misclassifications are almost exclusively due to cryptocurrencies, i.e., the vast majority of the False Positives are due to other cryptocurrencies and not to Standard software traffic. Finally, it is interesting to observe that VPNs are more effective at protecting the privacy of the Miners with respect to the Full Node scenario. This is mainly due to the fact that the overall traffic sent/received by each Miner is significantly less than that of Full Nodes, and it is buried in the VPN-encrypted traffic.

IX. CRYPTO-AEGIS: SPONGE-ATTACK DETECTION
In this section, we consider the more general binary decision problem of detecting the presence of a Crypto-client in the scenario of Fig. 1. Therefore, we assume the hypothesis H0: the current network event has been generated by a Crypto-client. We consider the traces from the Standard software previously introduced, i.e., Office, Skype, and YouTube, and we test them against all the traces involving a Crypto-client. As a preliminary step, we pre-process all the traces and equalize their lengths to obtain two classes: the first one constituted by Office, Skype, and YouTube traces (evenly distributed); and the second one constituted by all the traces from the Crypto-clients (Bytecoin, Monero, and Bitcoin) with and without the VPN tunnels. Table V (VI) shows the confusion matrix associated with the previously introduced binary classifier assuming the ingoing (outgoing) network traffic. Firstly, we observe that True Positives sum up to 290728, i.e., the number of times Crypto-clients are correctly identified. Moreover, the classifier correctly identifies 288452 events as network traffic from Standard software. Nevertheless, there are 10255 and 7979 false positive and false negative events, respectively. The results on the outgoing traffic are characterized by similar outstanding performance, i.e., TP=276187, TN=275969, FP=9370, and FN=9152.
Figure 14 shows the Receiver Operating Characteristic (ROC) curve, i.e., the True Positive Rate (TPR) as a function of the False Positive Rate (FPR), assuming only the ingoing flows. We also report the Area Under the Curve (AUC), which is about 0.99428 for the ingoing traffic, and the F1-score, given by the harmonic mean of the precision (TP / (TP + FP)) and the recall (TP / (TP + FN)). The F1-score is a measure of accuracy: it is equal to 1 when both precision and recall are perfect, while it approaches 0 as they worsen.
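The precision, recall, and F1-score definitions above can be applied directly to the confusion-matrix counts reported in this section for Tables V and VI. The sketch below does so; the function name `precision_recall_f1` is an assumption introduced here, while the counts are the ones quoted in the text.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1-score)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts quoted in the text for the ingoing (Table V) and outgoing (Table VI) traffic.
ingoing = precision_recall_f1(tp=290728, fp=10255, fn=7979)
outgoing = precision_recall_f1(tp=276187, fp=9370, fn=9152)
```

Since the harmonic mean is dominated by the smaller of its two arguments, a high F1-score requires both few false positives and few false negatives.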
Figure 15 shows the Receiver Operating Characteristic (ROC) curve, i.e., the True Positive Rate (TPR) as a function of the False Positive Rate (FPR), assuming only the outgoing flows. As for the previous case, we report both the AUC and the F1-score.
Time to detect the Crypto-client. Recalling Fig. 8, we observe that the detection time strictly depends on the number of packets required by the feature generation algorithm, in particular by the moving mean and the moving standard deviation. Full Nodes are characterized by very short interarrival times, i.e., the worst case is equal to 0.022749 seconds (median value for YouTube Outgoing Express); therefore, waiting for 5 subsequent packets (the moving window size used throughout this paper) involves a really short period of time, i.e., less than 120 ms. Even the longest window size depicted in Fig. 8, i.e., 12 packets, requires less than 300 ms. Conversely, Miners require more time to be identified due to their stretched interarrival times. We observe that there are specific Crypto-client configurations from Table IV that are characterized by large interarrival times, i.e., Bytecoin Ingoing and Monero Ingoing. For such cases, assuming a moving window size w equal to 5 consecutive packets, 12.05 and 69.85 seconds are required, respectively.
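The figures above are consistent with approximating the detection delay as the window size multiplied by the median interarrival time. The sketch below makes that arithmetic explicit. The function name `detection_delay` is hypothetical, and the use of the median as a per-packet waiting time is an approximation; the 13.97 s value is the Monero Ingoing (no VPN) median reported with Table IV.

```python
def detection_delay(median_interarrival_s, w=5):
    """Approximate time (seconds) needed to observe w packets.

    Assumption: every packet arrives after one median interarrival time.
    """
    return w * median_interarrival_s


full_node_worst_case = detection_delay(0.022749, w=5)   # stays below 120 ms
longest_window = detection_delay(0.022749, w=12)        # stays below 300 ms
monero_ingoing = detection_delay(13.97, w=5)            # about 69.85 s
```

The same relation shows why Miners are slower to detect: the window size is unchanged, but each packet takes orders of magnitude longer to arrive.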

SOLUTIONS
In this section, we discuss the results shown in this paper, while also comparing our Crypto-Aegis framework against competing solutions existing in the literature.
Full Node. Our solution turned out to be very effective for the detection and identification of Full Nodes in local networks. The RF algorithm is able to independently detect and identify Full Nodes by leveraging either ingoing or outgoing flows. TPR and FPR metrics have similar values, although we observe a few exceptions related to the worst cases: Bytecoin Ingoing and Monero Express Ingoing (TPR > 0.8), and Bytecoin Outgoing and Bytecoin Nord Outgoing (TPR > 0.79). Conversely, FPR values are always less than 0.015, guaranteeing an extremely low number of potential false alarms.
Miner. Miners' detection performance is similar to the previous case, although we observe that TPR values for the outgoing flows are (in general) greater than the ones for the ingoing flows. This is mainly due to the worse anonymization performance achieved by Express VPN for all the ingoing flows. More generally, we highlight the outstanding performance of the classifier when the Crypto-clients' network traffic is not tunneled through a VPN. Finally, we stress that, for both the above scenarios (Full Nodes and Miners), false positives are mainly due to cryptocurrencies predicted as other cryptocurrencies. This significantly mitigates the more general problem of the number of false negatives when considering the detection of a Crypto-client at large: since FPs are still due to other cryptocurrencies, the detection algorithm could still be considered successful (although not able to fully identify the Crypto-client).

A. Comparison with other solutions
To the best of our knowledge, our contribution is the first one to leverage Machine Learning techniques to detect Crypto-clients by analyzing the network traffic, though ML techniques have been extensively used to effectively detect anomalies in network traffic, e.g., for malware and intrusion detection. In this paper, we have shown that network traffic generated by Crypto-clients is characterized by pattern anomalies with respect to standard applications such as Skype, YouTube, and standard software used during typical office tasks.
Naive solutions [24], [25], [26] dealing with pure web-based mining (cryptojacking) involve blacklisting, i.e., maintaining a set of malicious URLs reported as suspicious by different sources, e.g., Twitter and blogs. The idea mainly relies on installing a web-browser plugin that monitors the connections and blocks those belonging to the blacklist. Blacklisting is neither effective nor efficient, since it suffers from many false negatives due to URL randomization and requires the installation of third-party software on all the corporate devices. Other solutions involve monitoring the CPU throttle and asking the web-browser for extra permissions when a process requires high CPU usage [27]. While monitoring the CPU is a promising solution, such a technique potentially suffers from high false positive rates due to the impossibility of discriminating between the miners and other CPU-demanding processes, such as videogames. Another interesting technique has been recently proposed in [13]: the authors combined multiple techniques to identify cryptographic operations and infer the execution of a miner.
We observe that, while the above solutions can be adopted for the detection of Miners under certain conditions, they do not take into account Full Node detection. For this reason, the solo mining activity, as well as other types of attacks described in Section III-B, are not recognized by existing solutions. Moreover, none of these solutions can be used to identify the cryptocurrency used by the attacker. Finally, we highlight that all the host-based solutions can only be adopted for the detection of outsider adversaries delivering the cryptojacking attacks to the users browsing the web. Indeed, none of the above solutions can be used for detecting insider adversaries, i.e., malicious corporate employees willing to run the attack from inside the Corporate Network while having full rights and control of their devices, e.g., laptops, desktops, and servers. Table VII wraps up the comparison between our solution and the existing ones from the literature.

B. Limitations.
In the following we list the major limitations of the proposed solution, as well as the general limitations of the adopted methodology.
Our Crypto-Aegis framework, being rooted in ML techniques, is significantly affected by feature identification and selection. Although we provided a detailed analysis of the impact of the features on the classification algorithm performance, other features might improve the figures reported in this paper. The applications that could generate false positives (e.g., Skype, web browsers, email clients) considered in this study and described in Section IV are limited in number, but we believe they are quite representative of a typical scenario (e.g., a corporate network) and, as such, they represent a good starting point to lay the foundations of crypto-mining activity detection by only considering the encrypted network traffic.
The length of the training dataset is another significant parameter, and adopting longer traces is part of future work, though our results are already good (an AUC of 0.99). In our analysis, we provided an estimation of the time required for detecting and identifying both the Full Node and the Miner, i.e., about 12 seconds and 70 seconds, respectively, but we are aware that those estimations are affected by several parameters, i.e., the network configuration, the Crypto-clients, the features, and possibly others. Moreover, the geographical distribution of the nodes participating in the consensus mechanism might represent another significant factor for the detection and identification of the Crypto-client. Finally, all the ML-based techniques suffer from the evasion attack, where a Crypto-client might reshape its (outgoing) network traffic to decrease the detection and identification performance of the classifier. Note that all the above issues are clear directions for further research.

XI. CONCLUSION
In this paper, in the context of unauthorized crypto-mining activities, we have first proposed a novel attacker model (sponge-attack) that subsumes the attacker model present in the literature (cryptojacking). Then, we have introduced Crypto-Aegis, an ML-based framework that is able to detect and identify crypto-mining activities related to the sponge-attack. Crypto-Aegis enjoys several features; in particular: (i) it is infrastructure independent; (ii) it is device independent; (iii) it supports multi-adversarial profiles; (iv) it does not require a clean state to operate; and (v) it is highly effective (e.g., an F1-score of 0.96 and an AUC for the ROC greater than 0.99). Moreover, we have also shown that the Crypto-Aegis framework is resilient to the adoption (by the adversary) of VPN tunnelling. The quality and viability of the achieved results, superior to competing solutions in the literature, combined with the novelty of the introduced attacker model, pave the way for further research in this domain; a few open issues are discussed in Section X-B.

Fig. 1. Network scenario: A Corporate device mines cryptocurrencies controlled by a malicious entity. Crypto-Aegis is constituted by a Machine Learning algorithm classifying the traffic coming from an Ethernet switch at the edge of the Corporate Network.
• Bytecoin. Interarrival times are significantly affected by the use of a VPN, i.e., they are reduced by a factor of 5 to 10. Packet sizes are affected as well, i.e., they increase by a factor of 2 to 3. It is worth noting that the reduction of the interarrival time together with the increase of the packet size involves a reduction of the trace length to guarantee the delivery of the same amount of data.
• Monero. VPN tunnelling affects the interarrival times of Monero depending on the flow. While ingoing flows experience a reduction of the interarrival time, outgoing flows slightly increase their values. Packet size is affected by the same phenomenon. While the packet size of ingoing flows ramps up from 66 Bytes (No VPN) to 1433 Bytes (Nord), outgoing flows work in the opposite way, decreasing from 1242 Bytes (No VPN) to 131 Bytes (Nord).
• Bitcoin. Interarrival times are more homogeneous for Bitcoin. Indeed, values span between 180 µs and 300 µs. Nevertheless, we observe that VPN tunnelling affects packet size: for both Nord VPN and Express VPN, the packet size becomes significantly larger.

Fig. 3. Packet size for outgoing flows: Candle-sticks represent the minimum, quantile 0.05, quantile 0.95, and the maximum packet size for the outgoing flows, while the circles represent quantile 0.5.
of quantile 0.05 and 0.95. Interestingly, such values get closer (being characterized by less variation) when their traffic is tunnelled through a VPN. Ingoing flows behave differently from outgoing ones. Packet size spans between closer ranges, i.e., quantile 0.05 and 0.95 are closer with respect to the outgoing flows. Median values

Fig. 8. Mean Square Error (MSE) of the classification results as a function of the features and the moving window size (w).

Fig. 9. True Positive Rate and False Positive Rate for ingoing network traffic of a Full Node considering different cryptocurrencies.

Fig. 10. True Positive Rate and False Positive Rate for outgoing network traffic of a Full Node considering different cryptocurrencies.

Fig. 12. True Positive Rate and False Positive Rate for ingoing network traffic of Miners of different cryptocurrencies.

TABLE II
COLLECTED TRACES: DURATION, MEDIAN OF INTERARRIVAL TIMES, AND MEDIAN OF PACKET SIZES.

TABLE III
BASELINE EXAMPLE: BITCOIN VS OFFICE SCENARIO.
Fig. 7. Receiver operating characteristic (ROC) curve: True Positive Rate as a function of the False Positive Rate. The Area Under the Curve (AUC) is about 0.971 for both the application scenarios.
261 observations are classified as not-Bitcoin (Office) while they actually are Bitcoin (False Negative, FN). Other interesting metrics, which will be used in the remainder of the paper, are the True Positive Rate (TPR) = 0.941, i.e., the number of True Positives normalized to the number of actual Bitcoin observations (TP / (TP + FN)), and the False Positive Rate (FPR) = 0.059, i.e., the number of False Positives normalized to the number of actual Office observations (FP / (FP + TN)).

TABLE IV
COLLECTED TRACES FROM MINERS: DURATION, MEDIAN OF INTERARRIVAL TIMES, AND MEDIAN OF PACKET SIZES.
Comparing the interarrival times from Table IV with the ones from Table II shows that Miners generate sparser traffic than Full Nodes. Moreover, it is worth noting the case of Monero: the interarrival times decrease from 13.97 (Ingoing) and 6.11 (Outgoing) seconds to less than 3 seconds (Express), or 1 second (Nord), when VPN tunneling is activated. Conversely, VPN does not significantly affect packet size, with one exception: Monero Ingoing Nord experiences a packet size of 428 Bytes, while the packet size of Monero Outgoing with no VPN is 66 Bytes. As for the previous case, we consider True Positive Rate (TPR) and False Positive Rate (FPR) as our reference metrics.

TABLE VII
COMPARISON BETWEEN OUR SOLUTION AND THOSE ALREADY PROPOSED IN THE LITERATURE.