Modelling Volatility of Cryptocurrencies Using Markov-Switching GARCH Models

This paper aims to select the best model or set of models for modelling volatility of the four most popular cryptocurrencies, i.e. Bitcoin, Ethereum, Ripple and Litecoin. More than 1,000 GARCH models are fitted to the log returns of the exchange rates of each of these cryptocurrencies to estimate a one-step ahead prediction of Value-at-Risk (VaR) and Expected Shortfall (ES) on a rolling window basis. The best model or superior set of models is then chosen by backtesting VaR and ES as well as using a Model Confidence Set (MCS) procedure for their loss functions. The results imply that using standard GARCH models may yield incorrect VaR and ES predictions, and hence result in ineffective risk-management, portfolio optimisation, pricing of derivative securities etc. These could be improved by using instead the model specifications allowing for asymmetries and regime switching suggested by our analysis, from which both investors and regulators can benefit.


Introduction
Modelling volatility is crucial for risk management. Following the global financial crisis of 2008, the Basel III international regulatory framework for banks has imposed more stringent capital requirements, and enhanced risk management systems have been developed. Since then the international financial system has had to face a new challenge, namely the introduction of decentralised cryptocurrencies, the first being Bitcoin, which was created in 2009 (Nakamoto, 2009). Unlike traditional currencies, cryptocurrencies are based on cryptographic proof, which provides many advantages over traditional payment methods (such as credit cards) including high liquidity, lower transaction costs, and anonymity (these features are discussed by Fantazzini et al., 2016).
Interest in Bitcoin and other cryptocurrencies has risen considerably in recent years.
Their market capitalisation increased from approximately 18 billion US dollars at the beginning of 2017 to nearly 600 billion at the end of that year, and high returns have attracted new investors. In addition, two big exchanges, i.e. the Chicago Mercantile Exchange (CME) and the Chicago Board Options Exchange (CBOE), started to trade futures on Bitcoin. As a result of these developments, central banks have been facing the question of whether or not cryptocurrencies should be regulated, given the numerous technical and legal issues involved.
Further, cryptocurrencies are highly volatile, and consequently it is important to estimate appropriate risk metrics, which can be used for calculating capital requirements, margins, hedging and pricing derivatives etc. It is well known that standard GARCH models can produce biased results if the series display structural breaks (Bauwens et al., 2010, 2014); these are likely to occur in the case of cryptocurrencies, and therefore a suitable modelling approach should be used. Ardia et al. (2017) suggest estimating Markov-Switching GARCH models in such cases, whose parameters are allowed to change over time according to a discrete latent variable.
The aim of this paper is to find the best model or set of models for modelling volatility of the four most popular cryptocurrencies, i.e. Bitcoin, Ethereum, Ripple and Litecoin. More than 1,000 GARCH models are fitted to the log returns of the exchange rates of each of these cryptocurrencies to estimate a one-step ahead prediction of Value-at-Risk (VaR) and Expected Shortfall (ES) on a rolling window basis. The best model or superior set of models is then chosen by backtesting VaR and ES as well as using a Model Confidence Set (MCS) procedure for their loss functions.
The paper is organised as follows. Section 2 briefly discusses the relevant literature; Section 3 provides a description of the data; Section 4 outlines the methodology; Section 5 presents the empirical results; finally, Section 6 offers some concluding remarks.

Literature Review
Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models are the most commonly used in the literature for modelling volatility and estimating Value-at-Risk (VaR) and Expected Shortfall (ES). The original Autoregressive Conditional Heteroskedasticity (ARCH) specification was introduced by Engle (1982) and then extended by Bollerslev (1986), who put forward the GARCH framework. Additional specifications were then developed: the exponential GARCH (EGARCH) model of Nelson (1991), the threshold GARCH (TGARCH) model of Zakoian (1994), the Student's t-GARCH model of Bollerslev (1987), the GJRGARCH model of Glosten et al. (1993) and many others (see Bollerslev et al. (1992), Bollerslev et al. (1994) and Engle (2004)).
Recent studies have shown that structural breaks result in biased estimates of GARCH models and poor volatility forecasts (Bauwens et al., 2010, 2014). To overcome this problem Markov-switching GARCH (MSGARCH) models have been proposed, whose parameters can change over time according to a discrete latent (i.e., unobservable) variable (Ardia et al., 2017); to make computations easier a normal (or mixture) distribution is typically assumed. This framework has been used in recent papers to analyse various types of assets, such as commodity prices (Alizadeh et al., 2008) and stock returns (Henry, 2009). Two estimation techniques, namely Maximum Likelihood (ML) and Markov chain Monte Carlo (MCMC) procedures, have been compared in this literature, with MSGARCH found to outperform single-regime models in the case of stock prices, but not of stock indices and currencies.
GARCH models have also been used for modelling the volatility of cryptocurrencies. Glaser et al. (2014) estimated a standard GARCH(1,1). Gronwald (2014) reported that an autoregressive jump-intensity GARCH model fits the Bitcoin data better than a standard GARCH. Dyhrberg (2016) estimated an asymmetric GARCH for Bitcoin, arguing that it can be used for hedging. Bouoiyour and Selmi (2016) compared different specifications including EGARCH, Asymmetric Power ARCH (APARCH), weighted GARCH and component GARCH with multiple thresholds by using in-sample criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the Hannan-Quinn information criterion (HQC); they concluded that, despite a noticeable decrease in its volatility, Bitcoin cannot yet be considered a mature currency. Katsiampa (2017) found that the AR-CGARCH model gives the best fit for Bitcoin, which means that accounting for both the short- and long-term components of the conditional variance is important. Chu et al. (2017) carried out a similar comparison of GARCH specifications for several cryptocurrencies.

Data Description
The series analysed are the daily closing prices of Bitcoin, Ethereum, Ripple and Litecoin.
The Bitcoin data were taken from the Coindesk Price Index and cover the period from 18 July 2010 to 30 April 2018; for the other three series the data source is CoinMarketCap (https://coinmarketcap.com/). The end date is the same for all series, whilst the start date differs: it is 7 August 2015 for Ethereum, 4 August 2013 for Ripple, and 28 April 2013 for Litecoin. Prices were transformed into log returns by taking first differences of their logarithm (see Figure 1).

Figure 1. Log returns
Summary statistics for the log returns are shown in Table 1. As can be seen, they are negatively skewed in the case of Bitcoin and positively skewed in all other cases. All four cryptocurrencies exhibit leptokurtosis. The histograms of the log returns series are shown in Figure 2.

GARCH models
Let y_t denote the percentage log-returns of the financial asset (exchange rate) of interest at time t. The general Markov-Switching GARCH specification has the following form:

y_t | (s_t = k, I_{t-1}) ~ D(0, h_{k,t}, ξ_k),

where D(0, h_{k,t}, ξ_k) is a continuous distribution with zero mean, time-varying variance h_{k,t}, and additional shape parameters contained in the vector ξ_k. Following Ardia et al. (2017), the integer-valued stochastic variable s_t, defined on the discrete space {1, …, K}, is assumed to evolve according to an unobserved first-order ergodic homogeneous Markov chain with transition probability matrix P = {p_{i,j}}, with p_{i,j} = P[s_t = j | s_{t-1} = i], where I_{t-1} is the information set available at time t−1.
As in Haas et al. (2004), the conditional variance of y_t in each regime is assumed to follow a GARCH process:

h_{k,t} = h(y_{t-1}, h_{k,t-1}, θ_k),

where h(·) is an I_{t-1}-measurable function which defines the filter for the conditional variance and also ensures its positiveness. This is not restricted to be the standard GARCH model; a variety of GARCH specifications are considered, in particular SGARCH (Bollerslev, 1986), EGARCH (Nelson, 1991), GJRGARCH (Glosten et al., 1993) and TGARCH (Zakoian, 1994). For instance, in the SGARCH case

h_{k,t} = ω_k + α_k y_{t-1}² + β_k h_{k,t-1}.
As for distribution mixture models, suppose that:

y_t ~ DM(0, h_{1,t}, …, h_{K,t}, ξ),

where DM is a mixture of densities with the following form:

f_DM(y_t) = Σ_{k=1}^{K} π_k f_k(y_t; 0, h_{k,t}, ξ_k),

where π = [π_1, …, π_K] is the positive mixing law (with Σ_{k=1}^{K} π_k = 1) and f_k denotes the k-th component density function.
As an example of a distribution mixture model, consider the normal mixture models: according to Alexander and Lazar (2006), these can be seen as Markov-switching GARCH models in a restricted form, where the transition probabilities are independent of the past state. It is assumed that the K component variances follow GARCH processes. For example, in the normal mixture standard GARCH(1,1) each component variance is defined as follows:

h_{k,t} = ω_k + α_k y_{t-1}² + β_k h_{k,t-1},  k = 1, …, K.

According to Cifter and Ozun (2007), with zero-mean components the overall conditional variance will then be:

h_t = Σ_{k=1}^{K} π_k h_{k,t}.
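As a minimal numerical sketch of the normal mixture GARCH(1,1) just described, the following Python fragment filters the K component variances and aggregates them with the mixing weights. The parameter values are purely illustrative, not estimates from the paper:

```python
import numpy as np

def mixture_garch_variance(y, omega, alpha, beta, pi):
    """Filter K component GARCH(1,1) variances h_{k,t} and return the
    overall conditional variance h_t = sum_k pi_k * h_{k,t}
    (zero-mean components, as in the text)."""
    omega, alpha, beta, pi = map(np.asarray, (omega, alpha, beta, pi))
    K, T = len(pi), len(y)
    h = np.empty((K, T))
    h[:, 0] = omega / (1.0 - alpha - beta)   # start at the unconditional variances
    for t in range(1, T):
        h[:, t] = omega + alpha * y[t - 1] ** 2 + beta * h[:, t - 1]
    return pi @ h                            # overall conditional variance h_t

# Illustrative two-regime example (hypothetical parameters)
rng = np.random.default_rng(0)
y = rng.standard_normal(500)
h = mixture_garch_variance(y, omega=[0.05, 0.2], alpha=[0.05, 0.15],
                           beta=[0.9, 0.8], pi=[0.7, 0.3])
```

The second regime is parameterised as the high-volatility component (larger ω and α), which is the usual interpretation of the two regimes in this setting.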

Value-at-Risk Backtesting
Value-at-Risk at level α is defined as the maximum loss one could expect to incur with probability α over a specific period. Mathematically:

P[y_{t+1} ≤ VaR_{t+1}^α | I_t] = α.

Basically, VaR is the α-quantile of the underlying conditional return distribution.
For carrying out VaR forecast tests, the first step is to define the "hit sequence" of VaR violations:

I_{t+1} = 1 if y_{t+1} < VaR_{t+1}^α, and 0 otherwise,

where VaR_{t+1}^α is the VaR prediction at time t+1 for risk level α. Under the null hypothesis of correct specification the hit sequence should be an independent Bernoulli(α) distributed variable.
The unconditional coverage (UC) test of Kupiec (1995) uses the fraction of observed violations for a particular risk model and compares it with α. For this purpose the Bernoulli likelihood function is needed:

L(π) = (1 − π)^{T_0} π^{T_1},

where T_0, T_1 are the number of 0s and 1s in the sample (T = T_0 + T_1). The maximum likelihood estimator is π̂ = T_1/T. The hypothesis of interest, H_0: π = α, can be tested by means of the following likelihood ratio statistic:

LR_UC = −2 ln[L(α)/L(π̂)].

Under the null hypothesis that the model is correct, LR_UC is asymptotically χ²(1).
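A sketch of the Kupiec test follows; the χ²(1) p-value is computed in closed form as erfc(√(x/2)) to keep the example dependency-free, and the violation count is hypothetical:

```python
import numpy as np
from math import erfc, log, sqrt

def kupiec_uc(hits, alpha):
    """Kupiec (1995) unconditional coverage LR test.
    hits: 0/1 array of VaR violations (assumed to contain both 0s and 1s);
    alpha: nominal violation rate."""
    hits = np.asarray(hits)
    T, T1 = len(hits), int(hits.sum())
    T0 = T - T1
    loglik = lambda p: T0 * log(1.0 - p) + T1 * log(p)  # Bernoulli log-likelihood
    lr = -2.0 * (loglik(alpha) - loglik(T1 / T))        # LR_UC statistic
    return lr, erfc(sqrt(lr / 2.0))                     # chi2(1) p-value

# Example: 1000 days of 5% VaR forecasts with 52 violations (hypothetical)
hits = np.zeros(1000, dtype=int); hits[:52] = 1
lr_uc, p_uc = kupiec_uc(hits, 0.05)   # close to nominal -> not rejected
```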
However, this test focuses only on the number of exceptions. A situation can arise in which the model passes the unconditional coverage test but all violations are concentrated. In order to reject a VaR with clustered violations a test of independence of the hit sequence is needed. Suppose that this exhibits time dependence and follows a first-order Markov sequence with the following transition probability matrix:

Π_1 = [ 1−π_01  π_01 ; 1−π_11  π_11 ],

where π_01 = P(I_{t+1} = 1 | I_t = 0) is the probability of getting a violation tomorrow given no violation today.
Then the corresponding likelihood function is the following:

L(Π_1) = (1−π_01)^{T_00} (π_01)^{T_01} (1−π_11)^{T_10} (π_11)^{T_11},

where T_ij is the number of observations with a j following an i.
If the hit sequence is independent over time, then π_01 = π_11 = π, and the transition probability matrix has the following form:

Π = [ 1−π  π ; 1−π  π ].

Independence can then be tested using a likelihood ratio test statistic defined as follows:

LR_ind = −2 ln[L(Π̂)/L(Π̂_1)],

which is asymptotically χ²(1). It is important for VaR users to be able to test simultaneously whether the hit sequence is independent and the average number of violations is correct; the conditional coverage (CC) test developed by Christoffersen (1998) does precisely this:

LR_CC = −2 ln[L(α)/L(Π̂_1)].

This is equivalent to testing the null π_01 = π_11 = α. Note also that LR_CC = LR_UC + LR_ind, so that LR_CC is asymptotically χ²(2).
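The independence and conditional coverage tests can be sketched in the same dependency-free way (the χ²(1) and χ²(2) survival functions are erfc(√(x/2)) and exp(−x/2) respectively); the clustered hit sequence below is synthetic:

```python
from math import erfc, exp, log, sqrt

def xlogy(x, y):
    """x*log(y) with the convention 0*log(0) = 0."""
    return x * log(y) if x > 0 else 0.0

def christoffersen(hits, alpha):
    """Christoffersen (1998) independence and conditional coverage LR tests."""
    pairs = list(zip(hits[:-1], hits[1:]))
    T00, T01 = pairs.count((0, 0)), pairs.count((0, 1))
    T10, T11 = pairs.count((1, 0)), pairs.count((1, 1))
    pi01, pi11 = T01 / (T00 + T01), T11 / (T10 + T11)
    pi = (T01 + T11) / len(pairs)
    # log-likelihoods: Markov alternative, independence null, and pi = alpha
    ll1 = (xlogy(T00, 1 - pi01) + xlogy(T01, pi01) +
           xlogy(T10, 1 - pi11) + xlogy(T11, pi11))
    ll0 = xlogy(T00 + T10, 1 - pi) + xlogy(T01 + T11, pi)
    lla = xlogy(T00 + T10, 1 - alpha) + xlogy(T01 + T11, alpha)
    lr_ind, lr_cc = -2.0 * (ll0 - ll1), -2.0 * (lla - ll1)
    return (lr_ind, erfc(sqrt(lr_ind / 2.0)),   # chi2(1) p-value
            lr_cc, exp(-lr_cc / 2.0))           # chi2(2) p-value

# Synthetic hit sequence with clustered violations: independence is rejected
hits = [0] * 500
hits[100:110] = [1] * 10
lr_ind, p_ind, lr_cc, p_cc = christoffersen(hits, 0.05)
```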
An additional test is the dynamic quantile (DQ) one introduced by Engle and Manganelli (2004). It is based on a linear regression of the hit variable on a set of explanatory variables including a constant, the lagged values of the hit variable and any useful function of past information. Let us denote Hit_t(α) = I_t(α) − α. Under the correct model specification, the demeaned hit variable has zero mean and is uncorrelated with any function of past information: E[Hit_t(α) g(X_{t−1})] = 0, where g(·) is a function of past information. The linear regression is the following:

Hit_t(α) = X_{t−1}′β + ε_t,

where X_{t−1} collects the explanatory variables. The null hypothesis of conditional efficiency is equivalent to testing whether the coefficients are jointly equal to zero, H_0: β = 0; the resulting Wald statistic is asymptotically χ² with as many degrees of freedom as there are regressors. However, these tests do not provide any insights into the magnitude of the exceedances and therefore do not enable the researcher to make model comparisons.
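A one-lag version of the DQ regression can be sketched as follows (the χ²(2) p-value for the two regressors is exp(−x/2); the clustered hit sequence is synthetic):

```python
import numpy as np
from math import exp

def dq_test(hits, alpha):
    """One-lag dynamic quantile test of Engle and Manganelli (2004):
    regress Hit_t = I_t - alpha on a constant and Hit_{t-1}, then test
    that both coefficients are jointly zero (Wald statistic, chi2(2))."""
    h = np.asarray(hits, dtype=float) - alpha
    y, X = h[1:], np.column_stack([np.ones(len(h) - 1), h[:-1]])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    dq = float(beta @ (X.T @ X) @ beta) / (alpha * (1.0 - alpha))
    return dq, exp(-dq / 2.0)   # chi2(2) p-value

# Synthetic hit sequence with clustered violations
hits = np.zeros(500, dtype=int)
hits[100:110] = 1
dq, p_dq = dq_test(hits, 0.05)   # strong autocorrelation -> tiny p-value
```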
A final test is due to González-Rivera et al. (2004) and McAleer and Da Veiga (2008).
It uses the asymmetric linear losses incurred by VaR forecasts. The quantile loss (QL) is given by the following function:

QL_{t+1}^α = (α − I_{t+1})(y_{t+1} − VaR_{t+1}^α),

which is non-negative and penalises violations with weight 1 − α.
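The quantile (tick) loss can be implemented in a couple of lines; the sketch below uses the convention that the VaR forecast is the α-quantile of the return distribution (a negative number in the left tail):

```python
import numpy as np

def quantile_loss(y, var_forecast, alpha):
    """Asymmetric tick loss for VaR forecasts: (alpha - 1{y < q})(y - q)."""
    y = np.asarray(y, dtype=float)
    q = np.asarray(var_forecast, dtype=float)
    return (alpha - (y < q)) * (y - q)

# A violation (first entry) is penalised much more heavily than a near miss
ql = quantile_loss([-3.0, 1.0], [-2.0, -2.0], 0.05)   # -> [0.95, 0.15]
```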

Expected Shortfall Backtesting
VaR is an elicitable risk measure, where a statistic ψ(Y) of a random variable Y is said to be elicitable if it minimises the expected value of a scoring function S, i.e. ψ = arg min_x E[S(x, Y)] (Acerbi and Szekely, 2014). It has been shown that VaR has several shortcomings, and in particular that it is not able to capture tail risks beyond the α-quantile (Danielsson et al., 2001; Basel Committee, 2013). For these reasons, ES was introduced (Artzner et al., 1997). To put it simply, ES is the expected loss given that the loss exceeds VaR. Mathematically:

ES^α = E[L | L > VaR^α],

i.e. the average of the losses in the tail beyond the VaR quantile. The transition from VaR to ES as the main market risk metric (Basel Committee, 2016) makes it necessary to have a reliable backtesting procedure for ES.
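As a minimal illustration of the definition (stated in return space, where both quantities are negative numbers in the left tail), an empirical VaR/ES pair can be computed by historical simulation:

```python
import numpy as np

def empirical_var_es(y, alpha):
    """Empirical alpha-quantile (VaR) and Expected Shortfall of returns y,
    using the left-tail convention ES = E[y | y <= VaR]."""
    y = np.asarray(y, dtype=float)
    var = np.quantile(y, alpha)
    es = y[y <= var].mean()
    return var, es

# For N(0,1) returns: VaR(5%) is about -1.645 and ES(5%) about -2.06
rng = np.random.default_rng(1)
var, es = empirical_var_es(rng.standard_normal(100_000), 0.05)
```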
An early ES backtest was proposed by McNeil and Frey (2000), based on the standardised exceedance residuals. The most recent test is the regression-based one of Bayer and Dimitriadis (2018), known as the ESR test. These authors use a joint regression framework for the quantile and the ES, and present two-sided and one-sided versions of the test. Suppose that ÊS_t^α is the ES forecast and y_t is the log return. Then one can regress the returns on the ES forecast as follows:

y_t = γ_1 + γ_2 ÊS_t^α + u_t^e,

where E(u_t^e | ℱ_{t−1}) = 0. One can then test the joint hypothesis that γ_1 is equal to 0 and γ_2 is equal to 1, H_0: (γ_1, γ_2) = (0, 1), against the alternative H_1: (γ_1, γ_2) ≠ (0, 1).
As the functional ES is not elicitable on its own, these authors estimate the regression parameters semi-parametrically within a joint system for the quantile and the ES. A Wald statistic is then computed from the parameter estimates (γ̂_1, γ̂_2):

T_ESR = (γ̂ − γ_0)′ Σ̂_ES^{−1} (γ̂ − γ_0),  with γ_0 = (0, 1)′,

where Σ̂_ES is an estimator for the (asymptotic) covariance matrix of the M-estimator of the parameters (γ_1, γ_2). Hence, the test statistic asymptotically follows a χ²(2) distribution.
The test described above has been named the bivariate ESR test by Bayer and Dimitriadis (2018). Since in their opinion the possibility of underestimating risk is the main issue for regulators, they also suggest another, one-sided regression-based backtesting procedure for the ES, in which the ES forecast errors are regressed on a constant only:

y_t − ÊS_t^α = γ + u_t,

where E(u_t | ℱ_{t−1}) = 0 and the null hypothesis is that γ is zero. These are t-tests based on the asymptotic covariance.

Model Confidence Set
The backtesting procedures described above cannot help to select the best GARCH specification (even QL is not sufficiently informative). For this purpose we shall use instead the Model Confidence Set (MCS) procedure of Hansen et al. (2011). Let M_0 denote the initial set of models and let d_{ij,t} = L_{i,t} − L_{j,t} denote the loss differential between models i and j at time t, where L_{i,t} is the loss of model i. Finally, assume that μ_{ij} = E[d_{ij,t}] is finite and does not depend on t, for all i, j ∈ M_0.
In order to eliminate inferior elements of the set M_0 the following equal predictive ability (EPA) hypotheses are tested:

H_0: μ_{ij} = 0 for all i, j ∈ M,  and  H_0: μ_{i·} = 0 for all i ∈ M,

where μ_{i·} = E[d_{i·,t}] and d_{i·,t} is the loss of model i relative to the average loss across the models in M. From these loss differentials the following t-statistics can be constructed:

t_{ij} = d̄_{ij} / √(var̂(d̄_{ij}))  and  t_{i·} = d̄_{i·} / √(var̂(d̄_{i·})),

where d̄_{ij} and d̄_{i·} are the sample averages of d_{ij,t} and d_{i·,t} respectively.
The two EPA hypotheses map into the two test statistics

T_R = max_{i,j∈M} |t_{ij}|  and  T_max = max_{i∈M} t_{i·}.

As mentioned before, the MCS procedure is a sequential testing procedure that removes the worst model at each step, until the hypothesis of EPA is not rejected for all models in the SSM. The choice of the worst model to be eliminated is made using an elimination rule that is coherent with the test statistics defined above, namely e_M = arg max_{i∈M} t_{i·}. To summarise, the MCS procedure consists of the following steps: 1. Set M = M_0. 2. Test the EPA hypothesis; if EPA is not rejected, terminate the algorithm and set M* = M; otherwise use the elimination rule to find the worst model. 3. Remove the worst model and go to step 2.
The MCS procedure requires a loss function. For the VaR the QL will be used. However, the choice of a loss function for the ES is not straightforward, and various papers have attempted to develop a consistent scoring function. Here we shall use the same loss function for ES backtesting as in Bayer and Dimitriadis (2018), which belongs to a class of functions originally introduced by Fissler and Ziegel (2016) in the context of forecast evaluation; they derived a consistent scoring function for the pair (VaR, ES) in the following way.
Let α ∈ (0, 1). Also let ℱ be a class of distribution functions on ℝ with finite first moments and unique α-quantiles, and let A_0 = {x = (x_1, x_2) ∈ ℝ²: x_1 ≥ x_2}. A scoring function S: A_0 × ℝ → ℝ is then defined of the form

S(x_1, x_2, y) = (1{y ≤ x_1} − α) G_1(x_1) − 1{y ≤ x_1} G_1(y) + G_2(x_2) (x_2 − x_1 + (1/α) 1{y ≤ x_1}(x_1 − y)) − 𝒢_2(x_2),

with 𝒢_2′ = G_2; S is ℱ-consistent for (VaR_α, ES_α) if G_1 is increasing and 𝒢_2 is increasing and convex.
If 𝒢_2 is strictly increasing and strictly convex, then S is strictly ℱ-consistent for (VaR_α, ES_α).
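One concrete member of this class, the so-called "FZ0" parameterisation (a particular choice of G_1 and G_2, shown here for illustration rather than as the paper's exact specification), is valid whenever the ES forecast e is negative:

```python
import numpy as np

def fz0_loss(y, v, e, alpha):
    """FZ0 joint (VaR, ES) loss, a member of the Fissler-Ziegel class;
    v is the alpha-quantile forecast and e < 0 the ES forecast of return y."""
    y, v, e = (np.asarray(a, dtype=float) for a in (y, v, e))
    hit = (y <= v).astype(float)
    return -hit * (v - y) / (alpha * e) + v / e + np.log(-e) - 1.0

# Consistency sanity check on N(0,1) data: the true (VaR, ES) pair
# should have a lower average loss than a mis-specified pair
rng = np.random.default_rng(2)
y = rng.standard_normal(50_000)
good = fz0_loss(y, -1.645, -2.063, 0.05).mean()
bad = fz0_loss(y, -1.0, -1.2, 0.05).mean()
```

This ranking property (lower expected loss at the true forecasts) is exactly what the MCS procedure exploits when comparing models under the joint loss.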

Empirical Results
As mentioned above, we estimate the following GARCH specifications: SGARCH, EGARCH, GJRGARCH and TGARCH. Normal (norm), skewed normal (snorm), Student's t (std), skewed Student's t (sstd), generalized error (GED) and skewed generalized error (sged) distributions are used. Finally, a mixture indicator is included: when it is set to TRUE the model becomes a distribution mixture GARCH, whereas when it is set to FALSE it yields the MSGARCH specification. In total, 1176 GARCH models were estimated for each cryptocurrency (24 single-regime models plus 24 × 24 × 2 = 1,152 two-regime specifications).
The results were obtained from one-step-ahead VaR and ES predictions. A moving window with refitting at every step was used for the estimates and the predictions; the window size is 70% of the total number of observations. The models that did not fail the backtesting procedures were then used in the MCS procedure in order to obtain the best model or set of models, with the significance level in the MCS procedure set equal to 30%. Table 2 reports p-values for the backtesting procedures and the MCS procedure for the superior set of models (SSM) for Bitcoin. As can be seen, 24 models out of 1176 satisfy the VaR and ES backtesting procedures and were selected by MCS as models with equal predictive power with respect to QL. Note that none of the single-regime models was selected for the SSM.
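The rolling estimation scheme itself can be illustrated independently of the GARCH machinery; in the sketch below a simple historical-simulation forecast stands in for the refitted (MS)GARCH models, with the "model" re-estimated at every step on a window covering 70% of the sample:

```python
import numpy as np

def rolling_one_step_var_es(y, alpha, window):
    """One-step-ahead VaR/ES forecasts on a rolling window, re-estimated at
    every step; historical simulation stands in for the refitted GARCH fits."""
    y = np.asarray(y, dtype=float)
    var, es = [], []
    for t in range(window, len(y)):
        w = y[t - window: t]                 # estimation window ending at t-1
        q = np.quantile(w, alpha)            # one-step-ahead VaR for date t
        var.append(q)
        es.append(w[w <= q].mean())          # matching ES forecast
    return np.array(var), np.array(es)

rng = np.random.default_rng(3)
y = rng.standard_normal(1200)                # synthetic return series
window = int(0.7 * len(y))                   # 70% of the sample, as in the text
var, es = rolling_one_step_var_es(y, 0.05, window)
hits = (y[window:] < var).astype(int)        # hit sequence for backtesting
```

The resulting `hits`, `var` and `es` arrays are exactly the inputs consumed by the VaR/ES backtests and loss functions described in Section 4.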

Value-at-Risk Results
Standard GARCH models prevail in the first and second regimes. Interestingly, the normal distribution prevails in the first regime, whilst the Student's t distribution is appropriate for 70% of the models in the second regime. Mixture models represent approximately 60% of those selected for the SSM. It is also noteworthy that in the first regime a variety of specifications appear to be appropriate, whilst in the second standard GARCH and GJRGARCH account for more than 90% of the specifications. The same pattern emerges for the distribution functions: in the second regime the Student's t and normal distributions are chosen in 87.5% of the cases. Interestingly, no models with a skewed GED distribution are selected for the SSM, and none with skewed distributions in the second regime. Specifications that account for leverage effects in at least one of the regimes represent more than 75% of those selected for the SSM.
Table 4 reports the results for Ripple. Only eight models satisfied all backtesting procedures and were selected by the MCS procedure. For the first regime, models with a standard GARCH or TGARCH specification were chosen by this procedure. For the second regime, 50% of the models selected for the SSM have a TGARCH specification and 25% of them have either a standard GARCH or GJRGARCH specification. In the first regime normal and skewed normal distributions prevail, while in the second the Student's t and skewed Student's t distributions are found to be appropriate for all models. The same percentage of mixture models and Markov-switching models are selected for the SSM. Specifications accounting for leverage effects in at least one of the regimes represent more than 75% of those in the SSM.
Table 5 shows that MCS selects the highest percentage of models satisfying the backtesting procedures in the case of Litecoin; in total, 33 models out of 1176 were chosen.
In the first regime TGARCH models represent 40% of those selected, standard GARCH ones 30%, and EGARCH and GJRGARCH the rest; the most common distribution is the normal, followed by the skewed normal and Student's t. In the second regime standard GARCH models exceed 50%, and the Student's t and skewed Student's t distributions are also selected in more than 50% of the cases for the SSM. Mixture models represent more than 80% of those in the SSM. Specifications allowing for leverage in at least one regime account for approximately 76% of those in the SSM.
Table 6 reports p-values for all backtesting procedures and the MCS procedure used to choose the best set of models for Bitcoin under the joint loss function. As can be seen, 25 models out of 1176 satisfy the VaR and ES backtesting procedures and were selected by MCS as models with equal predictive power with respect to the joint loss function. Note that none of the single-regime models was selected for the SSM.

Expected Shortfall Results
The results from the MCS procedure with a joint loss function are mostly the same as those for QL, since the SSM is constructed from the same set of models that did not fail the VaR and ES backtesting procedures. Standard GARCH models prevail in the first and second regimes. Interestingly, the normal distribution prevails in the first regime, but in the second the Student's t distribution is selected in 70% of the cases. Mixture models represent 60% of those in the SSM. In the first regime there is more variety of specifications, whilst in the second standard GARCH and GJRGARCH represent 90% of the chosen models. Similarly, the Student's t and normal distributions prevail in the second regime, being selected in 88% of the cases. No models with a skewed GED distribution are included in the SSM, and none with a skewed distribution in the second regime. Specifications accounting for leverage effects in at least one of the regimes represent 80% of those in the SSM.
Table 7 shows the results for Ethereum, which are essentially the same as for QL.
Table 8 reports the results for Ripple. Only five models did not fail the backtesting procedures and were selected by the MCS procedure. In the first regime standard GARCH or TGARCH models were chosen by the MCS procedure; in the second a TGARCH specification is selected for the SSM in 60% of the cases, and GARCH or GJRGARCH ones in 20% of them.
Normal and skewed normal distributions prevail in the first regime, whilst Student's t and skewed Student's t distributions are chosen in all cases in the second. Markov-switching and mixture models represent respectively 60% and 40% of those included in the SSM.
Specifications with leverage effects in at least one of the regimes represent 80% of the total in the SSM.
Table 9 shows the results of the MCS procedure for Litecoin with a joint loss function.
In total 47 models were chosen, the highest percentage of models satisfying the VaR and ES backtesting procedures. Standard GARCH is chosen in one third of the cases in the first regime, and in approximately half of them in the second. The Student's t distribution and its skewed version prevail in both regimes, whilst the percentages for the normal distribution and its skewed version are 40% and 4% respectively in the two regimes. The corresponding percentages for the GED distribution are 44% and 18% respectively. Mixture models are selected in 80% of the cases for the SSM. Specifications with leverage effects in at least one regime are also chosen in 80% of the cases.

Conclusions
This paper has used VaR and ES backtesting as well as the MCS procedure to select the best model or superior set of GARCH volatility models for four of the main cryptocurrencies, namely Bitcoin, Ethereum, Ripple and Litecoin. Two-regime GARCH models are found to produce better VaR and ES predictions than single-regime models. In particular, models that allow for asymmetry prevail in the superior set of models on the basis of both VaR and joint loss functions. Mixture GARCH models prevail for almost all cryptocurrencies with both a quantile loss function and a joint loss function.
On the whole, our findings are consistent with those reported in the existing literature showing that cryptocurrencies exhibit extreme volatility and leverage effects. They also indicate that using standard GARCH models may yield incorrect VaR and ES predictions, and hence result in ineffective risk-management, portfolio optimisation, pricing of derivative securities etc. These could be improved by using instead the model specifications suggested by our analysis, from which both investors and regulators (such as the US Securities and Exchange Commission (SEC), which is planning to regulate the cryptocurrency exchanges) can benefit.
Future work will use intraday data to address the issue of the large number of observations required by some of the tests carried out in the present paper, and also estimate multivariate GARCH models, e.g. to examine the linkages between altcoins and Bitcoin.