Hierarchical Gaussian graphical models: Beyond reversible jump

The Gaussian Graphical Model (GGM) is a popular tool for incorporating sparsity into joint multivariate distributions. The G-Wishart distribution, a conjugate prior for precision matrices satisfying general GGM constraints, has now been in existence for over a decade. However, due to the lack of a direct sampler, its use has been limited in hierarchical Bayesian contexts, relegating mixing over the class of GGMs mostly to situations involving standard Gaussian likelihoods. Recent work has developed methods that couple model and parameter moves, first through reversible jump methods and later by direct evaluation of conditional Bayes factors and subsequent resampling. Further, methods for avoiding prior normalizing constant calculations-a serious bottleneck and source of numerical instability-have been proposed. We review and clarify these developments and then propose a new methodology for GGM comparison that blends many recent themes. Theoretical developments and computational timing experiments reveal an algorithm that has limited computational demands and dramatically improves on computing times of existing methods. We conclude by developing a parsimonious multivariate stochastic volatility model that embeds GGM uncertainty in a larger hierarchical framework. The method is shown to be capable of adapting to swings in market volatility, offering improved calibration of predictive distributions.


Introduction
The Gaussian graphical model (GGM) has received widespread consideration (see Jones et al., 2005) and estimators obeying graphical constraints in standard Gaussian sampling were proposed as early as Dempster (1972). Initial incorporation of GGMs in Bayesian estimation has largely focused on decomposable graphs (Dawid and Lauritzen, 1993), since prior distributions factorize into products of Wishart distributions. Roverato (2002) proposes a generalized extension of the Hyper-Inverse Wishart distribution for covariance matrices Σ over arbitrary graphs. Atay- Kayis and Massam (2005) turn this into a prior specified for precision matrices K and outline a Monte Carlo (MC) method that enables pairwise model comparisons. Following Letac and Massam (2007) and Rajaratnam et al. (2008), Lenkoski and Dobra (2011) term this distribution the G-Wishart, and propose computational improvements to direct model comparison and model search.
A number of samplers for precision matrices under a G-Wishart distribution have been proposed. These involve either block Gibbs sampling (Piccioni, 2000), Metropolis-Hastings (MH) moves (Mitsakakis et al., 2011;, or rejection sampling (Wang and Carvalho, 2010).  show that the rejection sampler of Wang and Carvalho (2010) suffers from extremely low acceptance probabilities in even moderate dimensions. Wang and Li (2012) conclusively show that block Gibbs sampling is both computationally more efficient and exhibits considerably less autocorrelation than the MH methods.
The block Gibbs sampler provides a Markov chain Monte Carlo (MCMC) sample. When the likelihood assumes standard Gaussian sampling, determining posterior expectations of K can technically be performed as in Lenkoski and Dobra (2011), whereby model probabilities are first directly assessed via stochastic search, and model averaged samples are then collected using block Gibbs over each model. However, when the GGM is specified over latent data in a hierarchical Bayesian framework, such an approach is no longer valid. This is due to the use of the matrix K in updating other hyperparameters as well as its involvement in updating the latent Gaussian factors.  propose a reversible jump MCMC (which for brevity we refer to as RJ) method (Green, 1995) that simultaneously updates the GGM and its associated K, and embed the GGM in a semiparametric Gaussian copula.  expand the RJ algorithm and show how GGMs may be used to model dependent random effects in a generalized linear model context, focusing on lattice data. Wang and Li (2012) utilize conditional properties of G-Wishart variates that enables model moves through calculation of a conditional Bayes factor (CBF) (Dickey and Gunel, 1978) and subsequently update K through direct Gibbs sampling. Wang and Li (2012) also explore the use of a double MH algorithm (Liang, 2010) to avoid the computationally expensive and numerically unstable MC approximation of normalizing constants proposed by Atay- Kayis and Massam (2005).
We continue this departure from RJ and investigate an alternative method for simultaneously updating the GGM and associated K in hierarchical Bayesian settings. Our method is built on the framework outlined in Wang and Li (2012), but uses an alternative representation of the CBF with considerably less computational cost.
Simulation experiments compare our new algorithm to the algorithm of Wang and Li (2012) (which we refer to as WL). Both methods perform equally well at determining posterior distributions. However, we show that while the WL approach is theoretically appealing, it suffers significant computational overhead on account of many matrix inversions. By contrast, our new approach exhibits a dramatic improvement in computation time. Further links made to recent work in Gaussian Markov random fields (Rue, 2001) allow for additional improvement.
We conclude with an example of how GGMs may be embedded in hierarchical Bayesian models. GGMs have been shown to yield parsimonious joint distributions useful in financial applications (Carvalho and West, 2007;Rodriguez et al., 2011;Wang et al., 2011) and we develop a multivariate analogue of a common stochastic volatility model (Jacquier et al., 1994) that embeds GGM uncertainty. Our proposal differs from the dynamic linear model context in that we model an exponential term that represents market volatility. This latent Gaussian factor requires a strategy for hierarchically estimating the GGM and associated precision term, which are used to subsequently update the volatility process. We show that our method is able to obtain sharper posterior distributions than a commonly employed alternative and remains robust throughout the financial crash associated with the collapse of Lehman Brothers.
The article is organized as follows. In Section 2 we review the G-Wishart distribution, establish results necessary for CBF calculations and describe the block Gibbs sampler. Section 3 conducts two simulation studies showing the computational advantage gained by our new algorithm. In Section 4 we describe our multivariate graphical stochastic volatility model and give results over the data mentioned above. We conclude in Section 5.
The normalizing constant I G is in general not known to have an explicit form, and Atay- Kayis and Massam (2005) propose an MC approximation for this factor. Furthermore, the G-Wishart is conjugate and thus pr(K|D, G) = W G (δ+ n, D * ) where D * = D + U . Let Φ be the upper triangular matrix such that Φ ′ Φ = K, the Cholesky decomposition. Rue (2001) notes that we may associate with G another graph F , called the fill-in graph, such that G ⊂ F , Φ ij = 0 when (i, j) / ∈ F and when i < j and (i, j) ∈ F \ G. Rue (2001) outlines a straightforward method for constructing a graph F from G, which is referred to as a symbolic Cholesky factorization. As noted by Rue (2001), use of node reordering software, which aims to reduce F \ G, can lead to additional reduction in computing time. In what follows, we use the C library AMD (Amestoy et al., 2004) to perform symbolic Cholesky factorizations and node reorderings. Roverato (2002) shows that if K ∼ W G (δ, D) then where ν G i is the number of nodes in {i + 1, . . . , p} that are connected to node i in G.  and Wang and Li (2012) contain detailed reviews of the many methods that have been proposed for sampling W G (δ, D) variates. We briefly outline the block Gibbs sampler, originally discussed by Piccioni (2000). Let C denote the cliques of G. In the following, we consider a clique to be a maximally complete subgraph, though Wang and Li (2012) note that this can be relaxed to any complete subgraph. Piccioni (2000) shows that for any C ∈ C, where W denotes a standard Wishart variate. The expression (3) thereby gives the full conditionals of W G (δ, D). The block Gibbs sampler thus cycles through C, updating each component using (3). Wang and Li (2012) convincingly show that for posterior inference of W G (δ + n, D * ) the block Gibbs sampler outperforms all other proposed methods, both in terms of computing time and mixing. The authors also provide a useful review of the algorithm and indicate its broad flexibility. Throughout, we use the block Gibbs sampler for updating the matrix K.

Conditional Bayes factors
Prior to Wang and Li (2012), model moves between two graphs G and G ′ focused on approximating the ratio first through MC (Atay- Kayis and Massam, 2005;Jones et al., 2005), then a combination of MC and Laplace approximation  and ultimately through RJ . Suppose that G ⊂ G ′ which differ only by the edge e = (i, j) ∈ G ′ and that K ∈ P G . Let K −e = K \ {K ij , K ji , K jj }. In lieu of (4), Wang and Li (2012) consider ratios of the form which are related to the conditional Bayes factors (CBFs) of Dickey and Gunel (1978). Using properties related to the form (3) Wang and Li (2012) show that where, in general

Y. Cheng and A. Lenkoski
. By using the CBF in (6), Wang and Li (2012) propose model moves that do not rely on RJ methods, and after assessing which graph to move to, the parameter K jj , as well as K ij if e is in the accepted graph, are resampled according to their conditional distributions given K −e . This method is appealing, as it offers an automatic manner of moving between graphs and does not rely on the tuning parameters used in the RJ methods of  and .
While the result has significant theoretical appeal we show that computation of the factor H(δ + n, e, K −e , D * ) is extremely costly, even in low dimensions. This is due to the formation of the matrices K 0 and K 1 , which require the solution of systems involving large matrices, in particular, K V \j and K V \e .
Suppose now that G and G ′ differ only by the edge f = (p − 1, p) again with G ⊂ G ′ . We consider the CBF , with, in general This result originally appeared in an early version of Wang and Li (2012). In order to update a general edge e, we propose determining a permutation ς of V such that the nodes of V \ e are reordered to reduce the fill-in of the graph G V \e and finally, the edge e is placed in the (p − 1, p) position. Equation (7) is then calculated after permuting K and D * according to ς.
The benefit of this method is the reduced computational overhead required to compute (7). The method requires merely relabeling the matrices K and D * and determining the Cholesky decomposition of the permuted version of K.

Avoiding normalizing constant calculation
Both (6) and (7) require determination of the prior normalizing constants I G and I G ′ . While the MC method of Atay-Kayis and Massam (2005) enables these factors to be approximated, the routine can be subject to numerical instability Wang and Li, 2012) and involves significant computational effort. Wang and Li (2012) propose a method for avoiding the use of the MC approximation for prior normalizing constants. Their method employs the double MH algorithm of Liang (2010), which is an extension of the exchange algorithm developed by Murray et al. (2006).
We briefly review the implementation of the double MH algorithm in Wang and Li (2012). Suppose that (K, G) is the current state of the MCMC chain and we propose to move to G ′ by adding the edge e to G. The double MH algorithm then forms a copyK of K, resamplesK ij andK jj according to G ′ . It then updatesK via block Gibbs according to W G ′ (δ, D). Equation (6) We see that the expression (8) has replaced the prior normalizing constants with an evaluation of H in the prior, evaluated atK (see Murray et al., 2006;Liang, 2010, for theoretical justifications of this procedure). This is clearly beneficial, as it avoids the need for involved MC approximation. We follow this procedure with our revised version of the CBF calculation. Again suppose that (K, G) is the current state and we propose to move to G ′ by adding the edge f = (p − 1, p). We first determine Φ from K. We also run an iteration of the block Gibbs sampler relative to W G ′ (δ, D) starting from K, with the cliques of G ordered in such a manner that the K f subblock is updated first. We then extract the matrixΦ from this update. Equation (7) is The expression (9) requires less computation than (8) as only Cholesky decompositions are used.

Algorithms for full posterior determination
In this section we outline the two algorithms we will consider for full posterior determination. Both algorithms create a sequence 1. For each edge e, do: if e ∈ G the ratio is flipped. If G is not to be updated, skip to step c.
b. If attempting to update G to G ′ , sampleK as discussed in Section 2.3 and calculate and with probability α set G = G ′ , otherwise leave it unchanged.
c. Resample K ij , K jj according to G. We see that in one iteration of the WL algorithm, each edge is potentially updated in the graph. Our new algorithm (which we call CL) also follows this idea, and proceeds as follows 1. For each edge e, do: a. Randomly selection a permutation ς of V p , which places the edge e in the (p − 1, p) = f position, and likewise permute K, G, D and D * . Let G ς denote the permuted version of G and Φ be the Cholesky decomposition of the permuted version of K.
if f ∈ G ς the ratio is flipped. If G ς is not to be updated then, skip to step c.
b. If attempting to update G ς to G ′ , run the block Gibbs sampler over the permuted K relative to W G ′ (δ, D) and formΦ −f . Then calculate and with probability α set G ς = G ′ , otherwise leave it unchanged.
c. Resample Φ p−1,p , Φ pp according to G ς . Then reform K and G by unpermuting the system.
After attempting to update each edge, set G [s+1] = G. 2. Resample K [s+1] using the block Gibbs sampler relative to G [s+1] and the current state of K.
As we can see, there is somewhat more bookkeeping involved in the implementation of the CL algorithm, as the system is constantly being permuted. However, the reduction in computation time by the use of expression (9) and requiring only the calculation of the factors as we show below.

Simulation study
In this section we conduct two simulation studies. The first shows, in a relatively small example, that both the WL and CL algorithms yield correct answers, however the CL algorithm takes less computing time. The second example shows how these approaches scale with dimension, both for sparse and dense true underlying graphs.

First simulation study
We conduct a simulation study that compares the method we have developed to the WL algorithm. Our example comes directly from Wang and Li (2012). We consider a situation in which p = 6 and let U = Y Y ′ = nA −1 where n = 18 and A ii = 1 for i = 1, . . . , 6; A i,i+1 = A i+1,i = .5 for i = 1, . . . , 5 and A 16 = A 61 = .4. We finally assume the prior K ∼ W G (3, I 6 ). Using exhaustive MC approximation of the entire graph space, Wang and Li (2012) show that the posterior probability of each edge is We use this example and compare the CL algorithm to the WL algorithm. We run two different versions of the CL algorithm: CL simple is a straightforward implementation that does not reorder nodes when computing Cholesky decompositions, beyond moving the edge to be updated to the end of the graph, nor does this compute the fill-in graph F , and therefore relies only on the full Cholesky decomposition. CL Fill-in, does compute node reorderings and the fillin graph F and uses this as guidance when computing Cholesky factorizations.  (2012) we run both the WL and the two CL algorithms as described in Section 2.4 for 60, 000 iterations and discard the first 10, 000 iterations as burn-in. All algorithms were implemented in R, though C++ was used for block-Gibbs updates. The C library AMD of Amestoy et al. (2004) is called (from R) to perform reorderings.
We record the total computing time and looked at the mean squared errors of the posterior inclusion probabilities from these runs compared with the true values given above. We repeated the entire process 100 times, each time starting both WL and CL from the same random starting point. Table 1 shows the average computing time in seconds (on a 2.8 GHz desktop computer with 4GB of RAM running Linux), average MSE and standard deviations across the 100 runs. The first column shows the expected result: even in six dimensions the WL algorithm takes more than 4 times as long to perform the same number of iterations as the CL algorithms. This shows the improved efficiency of the proposed method. Furthermore, we actually see that in this low dimensional example, the additional work put into forming sparse Cholesky decompositions is largely unhelpful. Indeed, computing times are slightly higher, due to the overhead involved in using these additional routines.
We found the results in the third column surprising, but do not draw broad conclusions from it. It appears that in this example, using 60, 000 iterations, the CL algorithm approaches the true posterior edge expectation more quickly than the WL algorithm. Since both algorithms are correct theoretically, we choose not to emphasize this result. Furthermore, we have determined that by doubling the number of iterations, both approaches yield essentially the exact posterior distribution, though again the WL algorithm takes almost 4 times as long to run.
This example was chosen as it appears in Wang and Li (2012) and has an exact answer. The fact that the CL algorithms are faster than the WL approach even in 6-dimensions indicates the broader appeal for searching truly high dimensional spaces.

Second simulation study
This study is conducted to determine how the computing time for the CL and WL algorithms scales as dimension increases. To do this, we consider the following framework. For a fixed dimension p, we first sample a graph G where the probability of an edge being in the graph is a given θ. We then form a matrix Φ ii = 1/i for i ∈ {1, . . . , p} and Φ ij = −0.5 for those (i, j) ∈ G, i < j. The matrix Φ is then completed so that K = Φ ′ Φ ∈ P G . We finally sample . . , Z (3p) } independently from a N p (0, K −1 ). The WL and the two CL algorithms are then run for 50, 000 iterations using these data. We consider three scenarios, when θ = .05 (Sparse), θ = .5 (Medium) and θ = .9 (Dense) and consider p in 5 unit increments from 10 to 150. For each setting of p and θ, the process is repeated 25 times. Computation was performed in parallel on a 400 core server with 3.2 gHz Xeon chips and 96 GB of RAM, running Linux. The entire experiment took two weeks to conduct. Figure 1 shows the average computing time (in seconds) per iteration of the WL and the two CL algorithms under these settings.
The key feature of these results is that the WL appears to be consistently 4 times as slow as the CL alternatives. It seems reasonable to have a linear improvement, as all algorithms should technically be O(p 3 ) in computational complexity. The difference is therefore largely the additional work involved in computing the CBF in the WL setting.
Further, for dense graphs, there is no discernable difference between the different CL algorithms. However, in the Medium and Sparse cases, we begin to see the more involved CL algorithm contributing some additional improvement over the CL full algorithm. Indeed, in 150 dimensions, in the sparse graphs case, the CL full method is twice as fast as the CL simple.
The main purpose of this section is to highlight that the CBF calculation involved in the CL algorithm yields the majority of computational improvement over the WL algorithm and that this improvement holds as dimension increases and regardless of graph density. Further, when the graphs under consideration are very sparse, use of the fill-in graph F and node reordering software offer additional increase in performance, while the computational overhead is negligible when the graph is dense.
While the overall trends are the same in the three density scenarios, we note that that the per-iteration computing time is longest in the Medium density case. This is because of the block Gibbs sampler which operates on the cliques C of the graph. Medium sized graphs tend to have more cliques than dense or sparse graphs. Since this feature is shared by both the WL and CL algorithms, it does not affect their relative performance.

A multivariate graphical stochastic volatility model
Modeling the joint distribution of returns for a large number of assets is an important component of portfolio allocation and risk management. Carvalho and West (2007), Rodriguez et al. (2011) and Wang et al. (2011) all show that the use of GGMs can substantially improve modeling of joint asset returns. In each of these studies heterogeneity in asset returns was addressed either through the use of dynamic linear models with variance discounting (Carvalho and West, 2007;Wang et al., 2011) or infinite hidden Markov models .
We consider an alternative approach, that models market volatility in a manner analogous to the stochastic volatility models discussed in Jacquier et al. (1994) which jointly determines the general graphical model along with all other parameters.

The stochastic volatility model
Let Y t be the log-returns of p correlated assets. We specify the following hierarchical model for these returns: In the likelihood (10) we see that asset returns are assumed to be mean-zero. Such an assumption is common in the absence of a detailed forecasting model. The X t terms then dictate an overall level of market volatility, while a constant precision parameter K dictates the degree to which asset returns are correlated.
While this model is parsimonious, it serves as a useful first departure from previous studies as it explicitly incorporates notions of stochastic volatility and can be seen as a multivariate extension of the models outlined in Jacquier et al. (1994) and elsewhere. For purposes of model identification, we set X 1 = 0 as well as assume that the variance of the X t is equal to 1. In the conclusions section we discuss further possible generalizations to this framework.
After collecting a time-series of returns Y (1:T ) , we then aim to determine the posterior distribution pr(K, G, α, φ, X|Y (1:T ) ) where X = (X 1 , . . . , X T ). Furthermore, we may be interested in the posterior predictive distribution pr(Y (T +1) |Y (1:T ) ). The full detail of our MCMC algorithm is outlined in Appendix B, but largely follows the guidelines discussed in Rue and Held (2005). The posterior predictive distribution of Y (T +1) is easily formed from these parameters. However, we note in particular that From (11) we see why the developments in Section 2 prove useful. We may update K and G jointly using the CL algorithm discussed in Section 2.4 simply by setting D * = D + T t=1 exp(X t )Y (t) Y (t) ′ . This allows us to easily embed a sparse precision matrix K and mix over the class of GGMs in any hierarchical Bayesian model that involves a standard Wishart distribution.
In what follows we will compare our methodology to an alternative, discussed in Carvalho and West (2007) and extended using our methodology. The variance discounting approach assumes that for the log-returns Y (t+1) the following evolution model holds where, for 0 << ϑ < 1 with δ 0 = 3 and D 0 = I p . We use this framework and mix over the graph space, the details of which are provided in Appendix C.
The variance discounting model falls into the category of dynamic linear models (West and Harrison, 1997) and works by rapidly discounting previous observations, thereby adapting quickly to swings in market behavior.  Figure 2 shows the mean of the squared returns for the these 20 securities over the entire dataset. The extreme volatility present in the markets after the collapse of Lehman brothers in September 2008 is readily evident, showing that a homoskedasticity assumption is untenable for these data.

Model validation
In order to compare our stochastic volatility approach to that of variance discounting, we take the perspective verifying distributional forecasts (Gneiting and Raftery, 2007). This, in part, has to do with the challenging dynamics of the period that we consider. Previously, Carvalho and West (2007), Rodriguez et al. (2011) and Wang et al. (2011) were able to consider returns from optimal portfolios constructed according to variance minimization arguments in order to verify their methods' superior performance. However, portfolio optimization typically requires an assumption on the (non-zero) mean return level of the assets under consideration.
We feel that typical assumptions (for instance Carvalho and West (2007) use the previous day's returns while Rodriguez et al. (2011) use the mean return level for the previous 50 days) that have been used in the absence of detailed professional return forecasts are unsuitable for the data collected over the time period we consider due to the extreme swings in asset value. Furthermore, our goal is to characterize the entire distribution of the next day's returns, instead of forming an investment rule, since such distributional forecasts could be used to subsequently inform investment decisions.
Let s t = 20 i=1 Y ti be the sum of returns of the 20 assets at time point t and let F (t) be a given forecast distribution of s t . We evaluate the competing methods using the continuous ranked probability score (CRPS) (see Gneiting and Raftery, 2007, for additional details of the properties of the CRPS) which is calculated as where U and U ′ are independent random variates sampled according to F (t) . It has frequently been shown in the literature on distributional forecasting of weather, that using only a small amount of previous data can be beneficial Gneiting et al., 2005;Thorarinsdottir and Gneiting, 2010). The variance discounting method naturally embeds these considerations (this feature is shown quite clearly in our results section). For the stochastic volatility model we consider manually dropping data that come a given number of days before the forecast time t + 1.
Similar to what is done in Raftery et al. (2005) and Thorarinsdottir and Gneiting (2010) in the weather context, we isolate the first 700 days as pure training data, with the sole intention of determining the appropriate length window. We then consider the last 500 days in this set for which we make predictions. For each day in this period, we fit our stochastic volatility model using windows between 30 and 200 days in length, in increments of 10 days. For each window-length, we form a predictive distribution for s t+1 and compare this to the observed level using the CRPS. We then calculate the mean CRPS over these 500 days for each window length considered. Figure 3 shows the mean CRPS for each window length under consideration over this period. Such U shaped results are common when training lead time, and we see that using somewhere between 80 to 140 days of data offers the best predictive performance. For both the stochastic volatility and variance discounting models, we therefore present two results. One set of results uses all data available before timepoint t + 1 for forming the predictive distribution of s t+1 , while the other uses just returns between time t − 120 and t.
After conducting our training exercise, we use the last 1800 days to validate the performance of our proposed methodologies. We compare the stochastic volatility model to variance discounting when ϑ = .97 and ϑ = .99. Table 2 shows the mean CRPS of the six methodologies. We see that the stochastic  volatility model outperforms the variance discounting approaches in both situations. However, there is a considerably larger improvement if the forecast distribution for s t+1 is formed using the stochastic volatility model over only the previous 120 days. The variance discounting approaches, by contrast, are relatively agnostic to the use of a reduced sample to form predictions, since these models naturally down-weight previous observations. Our general intuition is that the stochastic volatility model is able to incorporate additional uncertainty over the variance discounting approach, and the CRPS rewards this. In addition to the results regarding posterior predictive performance, we also provide some indication of the posterior inference provided by our stochastic volatility model, shown in Figure 4. The left panel of the Figure 4 shows the estimate of the X t when using the entire dataset, thereby giving an indication the daily estimated latent volatility. Since these factors enter into the precision, lower levels indicate higher market volatility. We see the obvious increase in market volatility associated with the market crash of 2009. However, we also see a smaller, but non-negligible increase in volatility near the beginning of our dataset. We have determined that this period corresponds exactly to the American invasion of Iraq in 2003.
The right panel of Figure 4 shows the posterior distribution of the autocorrelation parameter φ for each day, when the model was fit using only the previous the parameters, predictive distributions were able to focus more quickly on the changing dynamics throughout the period under consideration.

Conclusions
We have synthesized a number of recent results related to the G-Wishart distribution. This has allowed for an algorithm that does not rely on RJ methods, obviates the need for expensive and numerically unstable MC approximation of prior normalizing constants and does so with minimal computational effort. The improvement in computation time is sufficient that at each stage of the algorithm, all edges may be evaluated for inclusion or exclusion in the graphical model. This algorithm allows the GGM to be embedded in more sophisticated hierarchical Bayesian models and opens the possibility of replacing standard Wishart distributions with G-Wishart variates, leveraging the improvement in predictive performance offered by sparse precision matrices.
The applied example shows the usefulness of this combination. We are able to sparsely model the interactions in financial assets while simultaneously addressing the issues of stochastic volatility prevalent in markets undergoing turbulence, successfully characterizing the distribution of asset returns during periods of rapidly fluctuating volatility.
The stochastic volatility model we develop remains parsimonious and several adjustments could be made. The first such development would be to replace the univariate term X t with a multivariate factor that allows the variance of each asset to follow its own path, while potentially tying the evolution of these factors together with a separate GGM. Furthermore, employing some form of the iHMM framework of Rodriguez et al. (2011) could allow for the matrix K to change throughout the period as well. Such developments will be considered in future work.
Furthermore, while the evaluation of GGMs outside of the RJ framework has proven to be a useful development, much work remains computationally. In particular, the growing prevalence of options of parallel computing, through multi-chip processors and graphics processing units has yet to be harnessed. Unfortunately, it is not yet clear how parallel computation can best be used in the confines of MCMC and further research is necessary to determine how these resources can best be leveraged in the hierarchical modeling context.