Loss-based prior for the degrees of freedom of the Wishart distribution



Introduction
Macroeconomic and health forecasting has always been of pivotal importance for policymakers and researchers. In the twenty-first century, the availability of large data sets and novel modeling techniques has improved the forecasting of macroeconomic and financial variables (see Bańbura et al., 2010; De Mol et al., 2008; Huber and Feldkircher, 2019; Koop and Korobilis, 2013; Stock and Watson, 2002). It is beneficial to extract relevant information about the processes of interest from the data, and this paper demonstrates how to leverage information intrinsic to the variables of interest to provide an adaptable representation of prior uncertainty.
A core model used by researchers and policymakers is the Vector Autoregressive (VAR) model, introduced in Sims (1980). The VAR model is ideal for representing the relationships between quantities of interest, given its flexibility to accommodate multiple time series.
Within the Bayesian framework, substantial research has been carried out to avoid over-parametrization and overfitting (e.g. Doan et al., 1984; Litterman, 1986; Sims and Zha). The proposed prior is evaluated in different simulation scenarios and compared with alternative approaches, such as the flat prior proposed by Uhlig (2005).
We study the merits of our approach by considering the Federal Reserve Economic Data (FRED) dataset of McCracken and Ng (2016) and McCracken and Ng (2020), and by computing both point and density forecasting measures for different macroeconomic variables.
The proposed prior has two important advantages compared to fixing the degrees of freedom at a given value. Firstly, in the absence of any prior information about the true variance-covariance matrix, it is possible to integrate the process with suitable information contained in the data, making the approach more robust. Secondly, there is the possibility of evaluating the degrees of freedom across time via a rolling window estimation procedure. Indeed, we find strong evidence of changes in the degrees of freedom after the year 2000 and a fall around the 2009 financial crisis, which is strongly related to the Lehman Brothers failure.
As a second case study, we analyse the Google Dengue Trends (GDT) dataset for ten different countries, which is a query-based reporting system for infectious disease (Carneiro and Mylonakis, 2009; Strauss et al., 2017). Dengue is a viral infection transmitted by mosquitoes and is present in South American and Asian countries. The results from the second case study demonstrate the flexibility of the proposed approach.
The paper is structured as follows. Section 2 describes the VAR model and derives the proposed loss-based prior for the degrees of freedom of the Wishart distribution. In Section 3, using simulation, we compare the proposed hyperprior to the model that assumes fixed degrees of freedom. Section 4 deals with the forecasting of macroeconomic variables, and Section 5 shows additional results on the forecasting of the Google Dengue Trends data. Section 6 concludes the paper.

VAR model for forecasting and the novel loss-based prior
In this section, we define the VAR model, present the prior assumptions, derive the novel loss-based hyperprior for the degrees of freedom of the Wishart distribution, and establish the properness of the resulting posterior.
Let y_t be the m-dimensional vector of observations, for t = 1, . . ., T. We define a VAR model with p lags as

y_t = A_1 y_{t-1} + . . . + A_p y_{t-p} + ε_t,   (1)

where A_j is an (m × m) matrix of coefficients, and ε_t is an m-dimensional vector of independent and identically normally distributed error terms centered on 0 and with covariance matrix Σ. In a more compact form, Eq. (1) can be written as

Y = XA + E,

where Y is a (T × m) matrix constructed as Y = (y_1, y_2, . . ., y_T)′, X = (x_1, x_2, . . ., x_T)′ is a (T × k) matrix containing the lagged response variables, A = (A_1, . . ., A_p)′ is the (k × m) matrix of stacked coefficients, and E is the (T × m) matrix of errors. In vectorized form, we can define the VAR model of order p as

y = (I_m ⊗ X)α + ε,  with α = vec(A) and ε ∼ N(0, Σ ⊗ I_T).

In this paper, we adopt a global-local shrinkage-Wishart prior for the parameters of the model, where we assume a Horseshoe prior distribution (Carvalho et al., 2010) for the matrix of coefficients and a Wishart distribution for the precision matrix, Σ^{-1} ∼ W(ν, S^{-1}). We set the hyperparameters for the Wishart prior equal to ν = m + 1 and S = I_m. As a robustness check, we ran a Normal-Wishart prior with hyperparameters for the Normal prior equal to α = 0 and V = 10 · I_{mk} (the results are available upon request).
In particular, the general Horseshoe prior for each element of the vectorized matrix of coefficients α takes the form

α_j | λ_{α_j}, τ_α ∼ N(0, λ²_{α_j} τ²_α),  λ_{α_j} ∼ C⁺(0, 1),  τ_α ∼ C⁺(0, 1),

where C⁺(·, ·) denotes the half-Cauchy distribution, λ_{α_j} is the local shrinkage parameter, τ_α is the global shrinkage parameter, and j = 1, . . ., k · m. For the posterior distribution of α, and of the global and local shrinkage parameters λ_{α_j} and τ_α, refer to Cross et al. (2020) and the algorithm proposed by Makalic and Schmidt (2016).
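The sampler of Makalic and Schmidt (2016) exploits the fact that a half-Cauchy scale admits an inverse-gamma auxiliary-variable representation, so every conditional update is a standard draw. The sketch below is a minimal pure-Python illustration of one such sweep; the function names are ours and, as a simplification not taken from the paper, the regression error variance is fixed at one.

```python
import random

def inv_gamma(shape, scale):
    # If G ~ Gamma(shape, 1), then scale / G ~ Inverse-Gamma(shape, scale).
    return scale / random.gammavariate(shape, 1.0)

def horseshoe_gibbs_update(alpha, nu_aux, tau2, xi):
    """One sweep of the inverse-gamma auxiliary-variable updates for the
    Horseshoe scales, in the spirit of Makalic and Schmidt (2016).
    alpha: current coefficients; nu_aux: local auxiliaries;
    tau2, xi: global scale and its auxiliary."""
    p = len(alpha)
    # local shrinkage scales and their auxiliaries
    lam2 = [inv_gamma(1.0, 1.0 / nu_aux[j] + alpha[j] ** 2 / (2.0 * tau2))
            for j in range(p)]
    nu_aux = [inv_gamma(1.0, 1.0 + 1.0 / lam2[j]) for j in range(p)]
    # global shrinkage scale and its auxiliary
    tau2 = inv_gamma((p + 1) / 2.0,
                     1.0 / xi + sum(alpha[j] ** 2 / (2.0 * lam2[j])
                                    for j in range(p)))
    xi = inv_gamma(1.0, 1.0 + 1.0 / tau2)
    return lam2, nu_aux, tau2, xi
```

The appeal of this scheme is that no rejection step is needed: the half-Cauchy priors are recovered exactly after marginalising the auxiliary variables.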

Loss-based hyperprior
In the literature, the usual assumption on the degrees of freedom of the Wishart distribution is to set the parameter to ν = m + 1. In this section, we derive the loss-based prior distribution for ν. Thus, the parameter is assumed discrete, and, to construct the prior, we employ the objective method introduced in Villa and Walker (2015).
Let us consider a Bayesian model with sampling distribution f(x|θ), characterized by the discrete parameter θ ∈ Θ, and prior π(θ). A mass is assigned to each value of the parameter that is proportional to the Kullback-Leibler divergence between the model defined by θ and the nearest one. In other words, if f(x|θ) is the true model, and it is not chosen, then the loss in information that one would incur is represented by the Kullback-Leibler divergence between f(x|θ) and f(x|θ′), where the latter is the nearest model to the true one. Therefore, the prior on θ is

π(θ) ∝ exp{ min_{θ′≠θ} KL(f(x|θ) ∥ f(x|θ′)) } − 1,   (2)

where KL(f(x|θ) ∥ f(x|θ′)) represents the Kullback-Leibler divergence between the two models. A more detailed derivation of the prior in Eq. (2) is illustrated in Appendix A.1.
To derive the prior in Eq. (2) for ν, we require the Kullback-Leibler divergence between two Wishart distributions that share the same scale matrix V and differ in the number of degrees of freedom, say W_ν and W_{ν+c}. The probability density function of a Wishart distribution with parameters V and ν is given by

f(X | V, ν) = |X|^{(ν−m−1)/2} exp{ −Tr(V^{-1}X)/2 } / ( 2^{νm/2} |V|^{ν/2} Γ_m(ν/2) ),

where Γ_m(·) is the multivariate Gamma function, and Tr(·) is the trace function. Thus, the Kullback-Leibler divergence between the two Wishart distributions is given by

KL(W_ν ∥ W_{ν+c}) = −(c/2) ψ_m(ν/2) + log Γ_m((ν+c)/2) − log Γ_m(ν/2),   (3)

where ψ_m(·) is the multivariate digamma function, defined as ψ_m(x) = Σ_{i=1}^m ψ(x + (1−i)/2), ψ(·) is the digamma function, and c ∈ Z. As KL(W_ν ∥ W_{ν+c}) is a convex function of c with its global minimum at c = 0, the nearest Wishart distribution to W_ν will be W_{ν+c} for either c = −1 or c = 1. Theorem 1 shows that the Kullback-Leibler divergence between W_ν and W_{ν+c} is minimized for c = 1.
Theorem 1. Consider two Wishart distributions, W_ν and W_{ν+c}, with the same scale matrix and ν and ν + c degrees of freedom, respectively, where c ≠ 0 is an integer. Then, the Kullback-Leibler divergence between W_ν and W_{ν+c} is minimum for c = 1.
Proof. For c = 1, the Kullback-Leibler divergence is

KL(W_ν ∥ W_{ν+1}) = −(1/2) ψ_m(ν/2) + log Γ_m((ν+1)/2) − log Γ_m(ν/2),

while for c = −1, we obtain

KL(W_ν ∥ W_{ν−1}) = (1/2) ψ_m(ν/2) + log Γ_m((ν−1)/2) − log Γ_m(ν/2).

By taking the difference of the two divergences, we obtain

KL(W_ν ∥ W_{ν+1}) − KL(W_ν ∥ W_{ν−1}) = −ψ_m(ν/2) + log Γ_m((ν+1)/2) − log Γ_m((ν−1)/2).   (4)

We will prove that Eq. (4) is always negative for any ν, m such that ν > m ≥ 2. As ν > m, we can write ν = m + k, for k = 1, 2, . . ., with m ≥ 2; showing that the minimum Kullback-Leibler divergence is attained at c = 1 then amounts to proving

log Γ_m((m+k+1)/2) − log Γ_m((m+k−1)/2) < ψ_m((m+k)/2).   (5)

To prove the inequality in Eq. (5), we rely on

ψ_m(x) = Σ_{i=1}^m ψ(x + (1−i)/2),   (6)

and on

log(x + 1/2) − 1/x < ψ(x) < log(x + e^{−γ}) − 1/x,   (7)

where Eq. (6) comes from the definition of the multivariate digamma function, and inequality (7) holds for x > 1/2 (see Elezovic et al., 2000), with γ ≈ 0.57721 the Euler-Mascheroni constant.
The proof of inequality (5) proceeds by induction on m. Initially, we assume that it holds for a particular m, and we then prove it for m + 1: the inductive step adds the term corresponding to the extra dimension, and in the last inequality we use log((m+k)/2) < ψ((m+1+k)/2), which is a consequence of the result in inequality (7). Thus, if inequality (5) holds for m, then it holds for m + 1, for any k. For the smallest possible value, m = 2, inequality (5) reduces to log(k/2) + log((k+1)/2) < ψ((k+1)/2) + ψ((k+2)/2), which holds since log(k/2) < ψ((k+1)/2) and log((k+1)/2) < ψ((k+2)/2), again due to inequality (7). Thus, inequality (5) holds for m = 2 and, subsequently, it holds for any m.
We can then define the objective prior distribution for ν as

π(ν) ∝ exp{ KL(W_ν ∥ W_{ν+1}) } − 1.

Figure 1 shows the loss-based prior for ν for three different values of m ∈ {3, 7, 15}, which correspond to the dimensionalities of the macroeconomic datasets that we analyse in Section 4.1.
Each line follows a similar pattern and shows that the loss-based prior probability decreases as the degrees of freedom ν increase.
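As a concrete check on Eq. (3), Theorem 1, and the shape of the prior, the self-contained sketch below evaluates KL(W_ν ∥ W_{ν+c}) through the log multivariate gamma and multivariate digamma functions, and normalises the loss-based masses over a truncated support. All function names are ours; the digamma is approximated with a standard recurrence-plus-asymptotic-series scheme rather than any particular library, and the truncation of the support is for illustration only.

```python
import math

def digamma(x):
    # shift x above 6 by the recurrence psi(x) = psi(x+1) - 1/x,
    # then apply a standard asymptotic series
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def log_mvgamma(a, m):
    # log of the multivariate Gamma function Gamma_m(a)
    return (m * (m - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - i) / 2.0) for i in range(1, m + 1))

def mvdigamma(a, m):
    # multivariate digamma psi_m(a) = sum_i psi(a + (1 - i)/2)
    return sum(digamma(a + (1 - i) / 2.0) for i in range(1, m + 1))

def kl_wishart(nu, c, m):
    # KL(W_nu || W_{nu+c}) for two Wisharts sharing the same scale matrix, Eq. (3)
    return (-c / 2.0) * mvdigamma(nu / 2.0, m) \
        + log_mvgamma((nu + c) / 2.0, m) - log_mvgamma(nu / 2.0, m)

def loss_based_prior(m, max_nu):
    # masses proportional to exp{KL(W_nu || W_{nu+1})} - 1 on a truncated
    # support nu = m+1, ..., max_nu (truncation is ours, for illustration)
    support = range(m + 1, max_nu + 1)
    w = {nu: math.exp(kl_wishart(nu, 1, m)) - 1.0 for nu in support}
    total = sum(w.values())
    return {nu: v / total for nu, v in w.items()}
```

Running `loss_based_prior(5, 30)` reproduces the qualitative behaviour of Figure 1: the masses decrease monotonically in ν, and `kl_wishart(nu, 1, m) < kl_wishart(nu, -1, m)` confirms Theorem 1 numerically.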

Properness of the posterior for ν
Let us assume that we observe one random matrix Σ^{-1} from the Wishart W(ν, S_0^{-1}); then, the likelihood function is given by

L(ν; Σ^{-1}) = |Σ^{-1}|^{(ν−m−1)/2} exp{ −Tr(S_0 Σ^{-1})/2 } / ( 2^{νm/2} |S_0^{-1}|^{ν/2} Γ_m(ν/2) ).

Using the loss-based prior for ν in Eq. (2), we obtain the posterior distribution for the number of degrees of freedom as

π(ν | Σ^{-1}) ∝ π(ν) L(ν; Σ^{-1}).   (10)

Theorem 2 shows that the marginal posterior distribution for ν is proper. This is a necessary step when deriving objective priors.
Theorem 2. The posterior distribution for the number of degrees of freedom ν in Eq. (10) is proper.

Gibbs sampling algorithm
Based on the proposed loss-based prior distribution, we provide a Gibbs sampler. In particular, the Markov chain Monte Carlo (MCMC) algorithm follows the usual steps seen in the multivariate time series literature (see Follett and Yu, 2019; Koop and Korobilis, 2010), where the new step relates to the full conditional posterior distribution of ν. We implement a Metropolis-Hastings step since the posterior distribution of ν is not available in closed form.
To summarize, the Gibbs sampler is based on the following steps: (i) Update the vectorized matrix of coefficients α given the data y and Σ^{-1} by using the corrected triangular algorithm of Carriero et al. (2022).
(ii) Update the precision matrix Σ −1 given α, y and ν from a Wishart distribution.
(iii) Update the local and global shrinkage parameters λ α j and τ α given the vectorized matrix of coefficients, α, as in Makalic and Schmidt (2016).
(iv) Update the degrees of freedom ν given Σ^{-1} by using a Metropolis-Hastings algorithm with a symmetric random walk proposal.
For the model with fixed ν equal to 0 or m + 1, the Gibbs sampler is based only on Steps (i)-(iii).
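Step (iv) only needs the posterior of ν up to a normalising constant. A minimal sketch of the discrete random-walk Metropolis-Hastings update is given below; `log_post` is a placeholder for a user-supplied function returning log{prior(ν) × Wishart likelihood}, whose exact form is not reproduced here.

```python
import math
import random

def mh_update_nu(nu, log_post, m):
    """One Metropolis-Hastings move for the discrete degrees of freedom nu
    (step (iv)), using a symmetric random-walk proposal nu' = nu +/- 1.
    `log_post` returns the log unnormalised posterior of nu."""
    proposal = nu + random.choice((-1, 1))
    if proposal <= m:                    # reject moves outside the support nu > m
        return nu
    if math.log(random.random()) < log_post(proposal) - log_post(nu):
        return proposal
    return nu
```

With a symmetric ±1 proposal, the Hastings ratio reduces to the posterior ratio, and rejecting proposals at or below m keeps the chain on the support of the prior.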

Simulation study
We compare the performance of our loss-based hyperprior in different simulation studies to the case of fixed ν equal to 0 or m + 1. We generate the data from a VAR model with one lag with different matrix dimensions and time lengths, where the elements of the matrix of coefficients of the VAR model are drawn from a U(−0.95, 0.95) distribution and stationarity conditions are then checked. We consider small, medium, and large numbers of response variables, with m equal to 5, 10, and 20, respectively, and time length T equal to 30 or 100. Moreover, we have considered different combinations for the choice of the degrees of freedom when generating the data: for each dimension m, we have chosen ν equal to {5, 10, 15} for m = 5, {10, 15, 20} for m = 10, and {20, 24, 26} for m = 20. In Appendix A.2, we generate data as in the macroeconomic application, with a time dimension equal to 240 and the number of response variables equal to 3, 7, and 15. The choice of ν is {3, 5, 7} for m = 3, {7, 10, 13} for m = 7, and {15, 20, 25} for m = 15.
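A data-generating sketch along these lines, in pure Python, is given below. The stationarity check via powers of the coefficient matrix is a simple heuristic stand-in for the usual eigenvalue condition on the companion matrix (for one lag, the companion matrix is A itself), and all function names are ours.

```python
import random

def mat_mul(A, B):
    n, k, p = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(p)]
            for i in range(n)]

def is_stationary(A, power=60, tol=1e-3):
    # Heuristic (ours): if the spectral radius of A is below one, the
    # entries of A^power shrink towards zero; a full check would use the
    # eigenvalues of the companion matrix.
    P = A
    for _ in range(power - 1):
        P = mat_mul(P, A)
        if max(abs(x) for row in P for x in row) > 1e12:  # clearly explosive
            return False
    return max(abs(x) for row in P for x in row) < tol

def draw_var1(m, T, max_tries=10000):
    # redraw the coefficient matrix from U(-0.95, 0.95) entries until the
    # stationarity check passes, then simulate y_t = A y_{t-1} + eps_t
    # with standard normal errors
    for _ in range(max_tries):
        A = [[random.uniform(-0.95, 0.95) for _ in range(m)] for _ in range(m)]
        if is_stationary(A):
            break
    else:
        raise RuntimeError("no stationary coefficient matrix found")
    y = [[0.0] * m]
    for _ in range(T):
        prev = y[-1]
        y.append([sum(A[i][j] * prev[j] for j in range(m)) + random.gauss(0.0, 1.0)
                  for i in range(m)])
    return A, y[1:]
```

The rejection loop mirrors the "draw, then check stationarity" recipe in the text; for large m, most draws are explosive, so the loop is capped.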
For the comparative approaches, we use a flat prior as in Uhlig (2005), or we treat the degrees of freedom as fixed at ν = m + 1. We also assume an identity matrix for the prior scale matrix of the Wishart distribution. We assume a Horseshoe prior for the matrix of coefficients, since the interest of our simulation experiment lies in the evaluation of the covariance matrix.
For each dataset, and each posterior sample, we estimate the posterior means of the vectorized matrix of coefficients and of the covariance matrix Σ, and compute the Root Mean Absolute Deviation (RMAD) between the posterior means and the true parameter values as

RMAD = ( (1/N) Σ_{i=1}^{N} | θ̂_i − θ_i | )^{1/2},

where N is the number of parameters estimated (equal to m² for the covariance matrix, and depending on the number of lags for the matrix of coefficients), θ denotes the matrix of coefficients or the covariance matrix, and θ̂_i is the posterior mean of its i-th element.
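A direct implementation of this measure, under our reading of the definition as the square root of the mean absolute deviation, is:

```python
import math

def rmad(posterior_means, true_values):
    """Root Mean Absolute Deviation: square root of the average absolute
    difference between posterior means and true parameter values
    (our reading of the definition in the text)."""
    n = len(posterior_means)
    return math.sqrt(sum(abs(e - t) for e, t in zip(posterior_means, true_values)) / n)
```

For example, with estimates (1, 2) and true values (1, 3), the mean absolute deviation is 0.5 and the RMAD is its square root.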
This process is repeated 250 times, and the Gibbs sampler is run for 6000 iterations with a burn-in of 1000 iterations. For each batch of RMAD values, we create boxplots for the cases with fixed ν equal to 0 or m + 1 and for the case with the loss-based prior on the degrees of freedom. From Figure 2, we observe that the differences in RMAD between the alternative priors increase as ν increases. The left panel shows no difference between the three cases, except for the outliers, which are smaller for our proposed hyperprior. Increasing ν to 10 and 15 leads to smaller RMAD when our hyperprior is used.
As a second measure of evaluation of the proposed prior, in Table 1 we provide the RMAD for the orthogonal (Choleski) impulse response function evaluated at four different horizons (h = 1, 3, 5 and 7). In particular, the column called Fixed ν = 0 refers to the Uhlig (2005) flat prior and is considered as the benchmark, and the other two columns (called Fixed ν = m + 1 and Hyperprior) provide the ratio with respect to the benchmark. If a value is greater than 1, the flat prior outperforms the other priors, while if the value is lower than 1, the flat prior shows poorer performance. In Table 1, increasing ν from 5 to 15 leads to strong improvements of around 12% at horizon 3 and 27% at horizon 5 for the proposed loss-based prior against the other priors. When ν = 5, the differences between the hyperprior and the flat prior are small except for horizons 5 and 7, while for ν = 10, the improvement ranges from 7% at horizon 3 to 31% at horizon 7.

Table 1: Simulation - RMAD of the IRF for four horizons h = 1, 3, 5 and 7 by simulating 250 VAR(1) with dimension m = 5 and sample size T = 30. Column Fixed ν = 0 provides the RMAD of the IRF, while Columns Fixed ν = m + 1 and Hyperprior provide the ratio between the referred priors and the flat prior.
These results are also confirmed in high-dimensional cases, as shown in Figures 3 and 4 for the ten-dimensional and twenty-dimensional cases, respectively. In Figure 3, we compare our loss-based hyperprior with fixed ν equal to 0 and m + 1 for data generated from a Wishart with degrees of freedom equal to 10 (left panel), 15 (center), and 20 (right). In this scenario, the results demonstrate that our loss-based prior is an improvement over fixed ν for the cases of 15 and 20 degrees of freedom, while for ν equal to 10, there are small differences between the three prior representations. Table 2 provides the RMAD of the IRFs for different horizons for the scenario with m = 10.
These results confirm the conclusions obtained from Figure 3, where our hyperprior outperforms the alternatives as the degrees of freedom increase. In detail, for ν equal to 10, the loss-based prior shows strong differences as the horizon increases. For the medium and large cases, the loss-based prior outperforms the other two priors with fixed ν by around 5-7% at horizon 3 and 25-30% at horizon 7.
For the twenty-dimensional case, the results are reported in Figure 4.

Data description
The FRED data are a set of key US macroeconomic quantities sampled at a quarterly frequency from the second quarter of 1959 to the third quarter of 2019. All variables are transformed to be stationary by following the approach of McCracken and Ng (2020), and we use three different sets of variables to estimate a small, medium, and large-scale VAR model.
The small-scale VAR model considers three variables, which represent inflation, the Gross Domestic Product (GDP), and the target interest rate. In particular, the GDP is measured in billions of dollars, while inflation is measured by the GDP deflator, which computes the changes in prices for all goods and services produced in the economy and differs from the Consumer Price Index (CPI) because it is not based on a fixed basket of goods. The final variable is the effective Federal Funds Rate (FEDFUNDS), which is the target interest rate set by the Federal Open Market Committee at which commercial banks borrow and lend their excess reserves to each other overnight.
For the medium-scale model, we additionally consider consumption, investment, and production variables. The real personal consumption expenditure (PCECC96) is a measure of consumer or household spending for a period, and it is used to construct the PCE Price Index, which measures the price changes in consumer goods and services in the US economy. Real gross private domestic investment (GPDIC1) is a component of the GDP and measures the quantity of money invested by private businesses in the domestic economy. The average weekly hours of production and nonsupervisory employees for the manufacturing sector (AWHMAN) relate to the average hours per worker for which pay was received, and differ from the standard and scheduled hours.
The large-scale model additionally uses the macroeconomic variables related to GDP, inflation, production, consumption, and investment jointly with private investment in the residential sector (PRFIx), consumption (PCECTPI), and a common stock index based on the S&P 500 index (SP500). The Industrial Production Index (INDPRO) measures the level of production and capacity in the manufacturing, mining, electric, and gas industries relative to 2012. Capacity Utilization (CUMFNS) captures the manufacturing and production capabilities that are being used by the economy at any given time, relating the output produced with the given resources to the potential output that can be produced if capacity is fully used. Lastly, the CPI for All Urban Consumers (CPIAUCSL) measures the average change over time in the prices paid by consumers for a market basket of consumer goods and services.

Forecasting results
In this section, we evaluate the performance of our loss-based hyperprior with respect to the fixed ν prior by forecasting one quarter ahead (h = 1). We compare the predictive ability of the three different priors by using point and density forecasting measures. To evaluate the forecasting capability, we compute the root mean square error (RMSE), given by

RMSE_i = ( (1/R) Σ_t ( y_{i,t+1} − ŷ_{i,t+1} )² )^{1/2},

where R is the length of the rolling window, y_{i,t+1} is the observation for the i-th variable, and ŷ_{i,t+1} is the one-step-ahead prediction for the i-th variable.
In addition, we evaluate the density forecasting using the continuous ranked probability score (CRPS) introduced by Gneiting and Raftery (2007) and Gneiting and Ranjan (2011).
The use of the CRPS has some advantages with respect to the log score, since it rewards values of the predictive density that are close to the outcome and is less sensitive to outlier outcomes. The CRPS is defined such that a lower value indicates better performance, and is given by

CRPS_t = ∫ ( F(z) − 1(y_{t+1} ≤ z) )² dz = E|Y_{t+1} − y_{t+1}| − (1/2) E|Y_{t+1} − Y′_{t+1}|,

where F(·) is the cumulative distribution function associated with the posterior predictive density f, 1(y_{t+1} ≤ z) is an indicator function taking the value 1 if y_{t+1} ≤ z and 0 otherwise, and Y_{t+1}, Y′_{t+1} are independent random draws from the posterior predictive density. In addition, we apply Diebold-Mariano t-tests (Diebold and Mariano, 1995) for equality of the average loss (with loss defined as the RMSE or CRPS) to compare the predictions of alternative models with the benchmark. Differences in accuracy that are statistically different from zero are denoted with one, two, or three asterisks, corresponding to significance levels of 10%, 5%, and 1%, respectively.
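Both measures are straightforward to compute from the rolling-window forecasts and posterior predictive draws. Below is a self-contained sketch (function names are ours); the CRPS uses the standard sample-based estimator of E|Y − y| − ½E|Y − Y′|.

```python
import math

def rmse(actuals, forecasts):
    # root mean square one-step-ahead error over a rolling window of length R
    r = len(actuals)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / r)

def crps(draws, y):
    """Sample-based CRPS: E|Y - y| - 0.5 E|Y - Y'|, with both expectations
    approximated from posterior predictive draws (lower is better)."""
    s = len(draws)
    term1 = sum(abs(d - y) for d in draws) / s
    term2 = sum(abs(a - b) for a in draws for b in draws) / (s * s)
    return term1 - 0.5 * term2
```

For a degenerate predictive distribution (all draws equal), the CRPS reduces to the absolute error, which is a convenient sanity check.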
As stated in Section 4.1, we use three sets of variables for small, medium, and large-scale VAR models. The small-scale VAR includes only three variables, the medium-scale VAR seven variables, and the large-scale VAR 15 variables. Given the quarterly frequency of our data, we include p = 5 lags for all of the models considered, and we fit the models using the MCMC algorithm with 6000 iterations after discarding the first 1000 iterations as burn-in. For forecasting, we use a rolling window of 60 quarters and run one-step-ahead forecasts.
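The rolling-window exercise itself can be organised as a simple loop. In the sketch below, `fit` and `predict` are placeholders for the user's estimator (for instance, the Gibbs sampler of Section 2.3) and are not functions from the paper.

```python
def rolling_one_step_forecasts(series, window, fit, predict):
    """Rolling-window scheme: re-fit on each window of `window` observations
    and store the one-step-ahead forecast together with the realised value.
    `fit(train)` returns a fitted model; `predict(model, train)` returns
    the one-step-ahead forecast."""
    forecasts, actuals = [], []
    for start in range(len(series) - window):
        train = series[start:start + window]
        model = fit(train)
        forecasts.append(predict(model, train))
        actuals.append(series[start + window])
    return forecasts, actuals
```

Passing a naive "last value plus trend" predictor is a quick way to verify the windowing logic before plugging in the full MCMC estimator.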
Before evaluating forecasting performance, we briefly show the results of the inferred degrees of freedom using a rolling window estimation approach. We show, for the three different datasets, the posterior mean of the degrees of freedom estimated by using our loss-based hyperprior. Figure 5 shows the estimated degrees of freedom jointly with the 95% highest posterior density (HPD) interval and the degrees of freedom used in the Horseshoe-Wishart scenario with fixed ν = m + 1 (in red).
From Figure 5, we observe strong evidence of changes in the estimated degrees of freedom across time. The left panel indicates an increase in the degrees of freedom after 2000 and a fall around 2009, strongly linked to the Lehman Brothers failure. These results are confirmed in the medium-scale VAR model (center panel), with an increase also observed in the last period. The changes are less evident in the large-scale VAR model (right panel) but still provide clear support for not fixing ν. These findings of changes in the degrees of freedom across time can be linked to the uncertainty literature (see Bloom, 2014), which documents countercyclical fluctuations during recessionary periods.
Table 4 shows the results for the small-scale VAR for the RMSE (left panel) and the CRPS (right panel). The column "Fixed ν = 0", which is the benchmark, reports the RMSE and the average CRPS, while the columns "Fixed ν = m + 1" and "Hyperprior" provide the ratio between each prior and the flat prior. When the ratio is less than 1, the model with the loss-based hyperprior or with fixed ν = m + 1 outperforms the benchmark model with fixed ν = 0. In the point forecasting measure, the benchmark model outperforms our hyperprior for the FEDFUNDS by a small amount. On the other hand, for the GDP, our hyperprior leads to an improvement of around 9% with respect to the benchmark. This result is confirmed for the average CRPS, where our loss-based hyperprior outperforms the benchmark model by 15% for real GDP growth and the GDP deflator, and by 3% for the FEDFUNDS. By using the Diebold-Mariano test, we show evidence of statistical significance for all variables when density forecasting is considered, while for point forecasting, the results are statistically significant for GDP and the GDP deflator. In addition, our loss-based prior shows improved density forecasting performance relative to the alternative fixed ν = m + 1 prior.

Table 4: RMSE (Columns 2-4) and average CRPS (Columns 5-7) for the small-scale VAR for each prior. Column Fixed ν = 0 provides the RMSE and the average CRPS; Columns Fixed ν = m + 1 and Hyperprior provide the ratio between the referred prior and the flat prior. ***, ** and * indicate ratios significantly different from 1 at the 1%, 5% and 10% significance level according to the Diebold-Mariano test.
For the 7-variable model, Table 5 presents the point and density measures, and similar results can be observed for both forecasting performance measures. The main difference relates to the GDPCTPI, where the hyperprior approach outperforms the benchmark by about 11% in point forecasting, whilst for the FEDFUNDS there is little difference. Moreover, we observe that the hyperprior model outperforms the other models for the PCECC96 by about 11%, while for the GDP both the hyperprior and the model with fixed ν = m + 1 (with m = 7) demonstrate around 7% and 9% improvement, respectively. As in the small-scale VAR, the average CRPS shows better results with respect to the benchmark model across the 7 variables. In particular, for the GDP, the GDP deflator, and the FEDFUNDS, the hyperprior model outperforms the benchmark by between 4% and 16%. This is also confirmed for the other variables analysed and by the Diebold-Mariano test, in particular for the density forecasting measures.

Table 5: RMSE (Columns 2-4) and average CRPS (Columns 5-7) for the medium-scale VAR for each prior. Column Fixed ν = 0 provides the RMSE and the average CRPS; Columns Fixed ν = m + 1 and Hyperprior provide the ratio between the referred prior and the flat prior. ***, ** and * indicate ratios significantly different from 1 at the 1%, 5% and 10% significance level according to the Diebold-Mariano test.
The results for the large-scale VAR with 15 variables are presented in Table 6. For point forecasting, the three main variables of interest show similar improvements to those in the medium-scale VAR. In fact, the loss-based hyperprior improves with respect to the benchmark by about 12% for the GDP and 28% for its deflator, while for the FEDFUNDS the situation is similar. The average CRPS demonstrates that the improvement is stronger for every variable, particularly for the FEDFUNDS.
Case study 2: Dengue data

The second case study analyses the Google Dengue Trends (GDT) dataset, which tracks Dengue incidence based on internet search patterns by clustering weekly queries for key terms related to the disease. We use GDT data from January 2011 to December 2014 for Argentina, Bolivia, Brazil, India, Indonesia, Mexico, the Philippines, Singapore, Thailand, and Venezuela, thus having 10 response variables. Following Davis et al. (2016), we examine a VAR model with two lags, and we run a forecasting exercise with a rolling window of 104 weekly observations.
In Figure 6 we present the posterior mean of the degrees of freedom from the model using our hyperprior (solid black line), the 95% HPD (dotted line), and the fixed value ν = m + 1 = 11, evaluated using rolling window estimation. The results indicate that the estimated degrees of freedom are often considerably larger than the fixed value of 11. We observe some changes in values at the beginning of the sample and then a decrease, before the estimates remain relatively stationary.
Table 7 shows the forecasting results for each country. For the RMSE, the proposed loss-based hyperprior leads to small improvements with respect to the benchmark prior and the other fixed prior for Bolivia, India, and Mexico, while for Brazil and the Philippines, the model with fixed ν = m + 1 performs better than the benchmark and slightly better than the proposed loss-based prior. When we look at the average CRPS, we notice strong improvements against the benchmark model and the fixed ν = m + 1 prior in all countries. In particular, the proposed loss-based prior outperforms the benchmark by 5% for India, the Philippines, and Brazil, and by 3% for Argentina, Indonesia, Thailand, and Venezuela. These results provide evidence of strong significance from density forecasting for all countries, as shown by the Diebold-Mariano test, while from a point forecast measure, statistical significance of the proposed prior is shown for a few countries (such as Bolivia, India, and Brazil).

Discussion
We have presented a novel method to perform forecasting in VAR models, where a hyperprior is set on the number of degrees of freedom of the covariance matrix for the global-local shrinkage-Wishart prior. The derivation of the hyperprior follows decision-theoretic principles (Bernardo and Smith, 1994). To link the worth of each parameter value to the prior probability, we use the self-information loss function. This type of loss function assigns a loss to a probability statement: given a prior π(θ), its form is − log π(θ). More information about the self-information loss function can be found, for example, in Merhav and Feder (1998).
To formally derive the prior for θ, we can proceed in terms of utilities instead of losses; this approach allows for a clearer exposition and does not impact the logic behind the prior derivation. Let us then write the utility u_1(θ) = log π(θ), and let the minimum divergence from θ be represented by the utility u_2(θ). We naturally want u_1(θ) and u_2(θ) to be matching utility functions; however, as they stand, −∞ < u_1 ≤ 0 and 0 ≤ u_2 < ∞, and we want u_1 = −∞ when u_2 = 0. The scales are matched by taking exponential transformations, so that exp(u_1) and exp(u_2) − 1 are on the same scale, and we obtain

π(θ) = e^{u_1(θ)} ∝ e^{u_2(θ)} − 1,

yielding the loss-based prior for θ in Eq. (2).

A.2.1 Case T = 100
In this section, we provide further simulation results. We report the root mean absolute deviation (RMAD) for the case with T = 100. In particular, Figure A.1 shows the RMAD for the covariance matrix for the five-dimensional case, when the data are generated from a Wishart distribution with degrees of freedom equal to 5 (left), 10 (center), and 15 (right).
Table A.1 provides the RMAD for the impulse response function at four different horizons h = 1, 3, 5 and 7. As in the main text, the results show improvements from the use of our loss-based prior with respect to a fixed ν prior when the data are generated with degrees of freedom higher than the dimension.

A.2.2 Case T = 240
As a third simulated experiment, we report the RMAD for the case with T = 240. Figure A.4 shows the RMAD for the covariance matrix for the three-dimensional case when the data are generated from a Wishart distribution with degrees of freedom equal to 3 (left), 5 (center), and 7 (right). Table A.4 shows the RMAD for the impulse response function at four different horizons h = 1, 3, 5, and 7.

Figure 2: RMAD for the case with m = 5 and T = 30, where the left panel shows the results when the data are generated with ν = 5, the center with ν = 10, and the right with ν = 15.
Figure 4 reports the results for data generated from a Wishart with 20 (left panel), 24 (center), and 26 (right) degrees of freedom with T equal to 30. In this case, we observe that our loss-based prior and the fixed ν = m + 1 prior behave similarly in the left and center panels, where both outperform the flat prior. When data are generated from a Wishart distribution with 26 degrees of freedom (right panel), our loss-based prior outperforms the fixed ν = m + 1 prior. Table 3 provides the results for the RMAD of the IRFs for dimensionality m = 20 and confirms the findings from Figure 4.
Table 3: Simulation - RMAD of the IRF for four horizons h = 1, 3, 5 and 7 by simulating 250 VAR(1) with dimension m = 20 and sample size T = 30. Column Fixed ν = 0 provides the RMAD of the IRF, while Columns Fixed ν = m + 1 and Hyperprior provide the ratio between the referred priors and the flat prior. These results are confirmed when T is equal to 100 and 240, as shown in Appendix A.2.

Case study 1: Forecasting macroeconomic data

In Section 4.1 we first summarise the macroeconomic dataset (see McCracken and Ng, 2016, 2020), while Section 4.2 presents the forecasting measures and the main findings of the forecasting exercise, comparing the results of our loss-based hyperprior with those of the fixed ν prior.

Figure 5: Estimated degrees of freedom (solid line) for the loss-based hyperprior using a rolling window of 60 quarters, with the 95% highest posterior density (dotted lines) and the case with ν = m + 1, where m ∈ {3, 7, 15} (red dashed line), for the macroeconomic data. The left panel is for the small-scale model, the center for the medium-scale, and the right for the large-scale.

Figure 6: Estimated degrees of freedom (solid line) for the loss-based hyperprior using a rolling window of 104 weekly observations, with the 95% highest posterior density (dotted lines) and the case with ν = m + 1, where m = 10 (red dashed line), for the Dengue data.

Figure A.1: Monte Carlo Simulation - RMAD of the covariance matrices. These distributions are obtained by simulating 250 VAR(1) with dimension m = 5 and sample size T = 100. Results are reported for data generated from a Wishart distribution with ν = 5 (left), ν = 10 (center), and ν = 15 (right).
Table A.3: Monte Carlo Simulation - RMAD of the IRF for four horizons h = 1, 3, 5 and 7 by simulating 250 VAR(1) with dimension m = 20 and sample size T = 100. Column Fixed ν = 0 provides the RMAD of the IRF, while Columns Fixed ν = m + 1 and Hyperprior provide the ratio between the referred priors and the flat prior.

Figure A.4: Monte Carlo Simulation - RMAD of the covariance matrices. These distributions are obtained by simulating 250 VAR(1) with dimension m = 3 and sample size T = 240. Results are reported for data generated from a Wishart distribution with ν = 3 (left), ν = 5 (center), and ν = 7 (right).
Figure A.7: Posterior chain of the estimated degrees of freedom when T = 30 and m is equal to 5 (left), 10 (center), and 20 (right).

Figure A.8: Gelman-Rubin plot for the estimated degrees of freedom when T = 30 and m is equal to 5 (left), 10 (center), and 20 (right), when no burn-in iterations are discarded.

Table 2: Monte Carlo Simulation - RMAD of the IRF for four horizons h = 1, 3, 5 and 7 by simulating 250 VAR(1) with dimension m = 10 and sample size T = 30. Column Fixed ν = 0 provides the RMAD of the IRF, while Columns Fixed ν = m + 1 and Hyperprior provide the ratio between the referred priors and the flat prior.

Table 6: RMSE (Columns 2-4) and average CRPS (Columns 5-7) for the large-scale VAR for each prior.

If we look at all 15 variables analysed, the loss-based hyperprior model always outperforms the benchmark in a density forecasting scenario, which is also highlighted by the Diebold-Mariano test. These results are confirmed when looking at the prior with fixed ν equal to m + 1, where our loss-based prior outperforms the other prior in density forecasting.

Table 7: RMSE (Columns 2-4) and average CRPS (Columns 5-7) for the Dengue data for each prior. Column Fixed ν = 0 provides the RMSE and the average CRPS; Columns Fixed ν = m + 1 and Hyperprior provide the ratio between the referred prior and the flat prior. ***, ** and * indicate ratios significantly different from 1 at the 1%, 5% and 10% significance level according to the Diebold-Mariano test.
The proposed loss-based prior takes into consideration only the intrinsic properties of the model, that is, the sampling distribution and the priors. The method associates each value of θ with its nearest alternative, i.e. the model for which the Kullback-Leibler divergence (Kullback and Leibler, 1951) KL(f(·|θ) ∥ f(·|θ′)) is minimised. That is, if the true model is removed, the estimation process will asymptotically indicate as the correct model the nearest one in terms of the Kullback-Leibler divergence, i.e. the model which is the most similar to the true one.