GARCH density and functional forecasts

This paper derives the analytic form of the h-step ahead prediction density of a GARCH(1,1) process under Gaussian innovations, with a possibly asymmetric news impact curve. The contributions of the paper consist both in the derivation of the analytic form of the density and in its application to a number of econometric problems. A first application of the explicit formulae is to characterize the degree of non-Gaussianity of the prediction distribution; for some parameter values encountered in applications, deviations of the prediction distribution from the Gaussian are found to be small, supporting the use of the Gaussian density as an approximation of the true prediction density, while for other values the deviations are substantial. A second application of the formulae is to compute exact tail probabilities and functionals, such as the Value at Risk and the Expected Shortfall, that measure risk when the underlying asset return is generated by a Gaussian GARCH(1,1). This improves on existing methods based on Monte Carlo simulations and (non-parametric) estimation techniques, because the present exact formulae are free of Monte Carlo estimation uncertainty. A third application is the definition of uncertainty regions for functionals of the prediction distribution that reflect in-sample estimation uncertainty. These applications are illustrated on selected empirical examples.


Introduction
Since their introduction in Engle (1982) and Bollerslev (1986), Generalised AutoRegressive Conditional Heteroskedasticity (GARCH) processes have been widely employed in financial econometrics, see e.g. Bollerslev et al. (2010). In the original GARCH formulation, the conditional distribution of the innovations was typically assumed to be Gaussian; even with Gaussian innovations, GARCH processes were shown to generate volatility clustering and, when stationary, an unconditional distribution with fatter tails than the Gaussian, see e.g. Bollerslev et al. (1992).
GARCH processes can include several lags q of the past squared shocks and several lags p of the past volatility; in practice, however, the GARCH(1,1) model with p = q = 1 is often found to offer a good fit for asset returns, and it is usually preferred to GARCH models with more parameters, see Tsay (2010) section 3.5, or Andersen et al. (2006), section 3.6. Moreover, many multivariate GARCH models are built on the univariate GARCH(1,1), see e.g. Engle et al. (2019) and references therein. In this sense the GARCH(1,1) is both the prototype and the workhorse of GARCH processes in practice.
GARCH processes map news into the conditional volatility; the function obtained by replacing past conditional volatilities with unconditional ones was called by Engle and Ng (1993) the news impact curve. For GARCH(1,1) processes, this curve yields the same value of volatility for positive and negative shocks, i.e. it is symmetric. Glosten et al. (1993) (henceforth GJR) extended the GARCH setup to allow for asymmetric news impact curve responses to negative shocks.
Many measures of risk are functions of the prediction distribution of asset returns. These measures include the Value at Risk, which is a quantile of the prediction distribution of the asset return, see Jorion (2006), as well as the Expected Shortfall, see Patton et al. (2019) and Arvanitis et al. (2019). The latter is the expected value of the prediction distribution of the asset return in the left tail of the prediction density below the Value at Risk; this measure has been recently re-emphasised by the Third Basel Accords. Both measures are functionals of the prediction distribution of asset returns.
The prediction distribution of a GARCH(1,1) hence plays an important role for the computation of risk measures in financial applications. This distribution is not known in analytic form beyond the distribution of innovations for the 1-step ahead case, which is given by assumption when building the process, see e.g. Andersen et al. (2006), page 811 and Baillie and Bollerslev (1992).
The present paper derives the analytical form of the $h$-step-ahead prediction density of a Gaussian GARCH(1,1), $x_t = \sigma_t \varepsilon_t$ with $\sigma_t^2 = \omega + \alpha x_{t-1}^2 + \beta \sigma_{t-1}^2$, also allowing for the GJR-GARCH(1,1) with an asymmetric news impact curve. The first contribution of the paper is theoretical, and consists in the form of the p.d.f. and c.d.f. of the prediction distribution. The results are obtained by marginalizing the joint density of the prediction observations, using integration and special functions, for any prediction horizon $h = 1, 2, \dots$ The formulae are valid for stationary as well as non-stationary GARCH(1,1) processes.
The 2-step-ahead prediction distribution is obtained without imposing constraints on the values of the $\alpha$ and $\beta$ coefficients. For the $h$-step-ahead prediction distribution with $h \geq 3$, a condition on $\beta$ is required to guarantee integrability, which depends on how the coefficients are related to the last observed value of the volatility $\sigma_1^2$. Unless the last observed value of volatility $\sigma_1^2$ is relatively low, a sufficient condition for integrability is $\beta$ larger than 0.5. Another sufficient condition, independent of the last observed value of volatility $\sigma_1^2$, is $\beta$ larger than 0.61803. As suggested by one referee, one could wonder how frequently the conditions $\beta \geq 0.5$ and $\beta \geq 0.61803$ are satisfied by empirical estimates in practice. While this is ultimately an empirical question that depends on the data at hand, one could consider the estimates in Bampinas et al. (2018), who fit GARCH(1,1) models to all stock returns in the S&P 1500 universe, from January 2008 to December 2011, with daily observations. Their Table 1 reports a mean value of $\beta$ of 0.855 (median 0.887) with standard deviation of 0.154. If the empirical distribution of these estimates were Gaussian, the predicted frequency of $\beta$ estimates below 0.61803 (respectively 0.5) would be 6.2% (respectively 1.1%). Typical values for $\beta$ estimates reported in textbooks and in the empirical finance literature are also above 0.8. On this basis, one would expect $\beta \geq 0.5$ or $\beta \geq 0.61803$ not to be binding restrictions in most practical applications.

The assumption of Gaussianity of the innovations is central to the derivations in the paper, which employ analytical integration and special functions that are specific to this distribution. The approach in the derivation of the prediction distribution is expected to be amenable to extensions to non-Gaussian symmetric distributions of innovations. These extensions are not trivial and are not considered in the present paper.
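The 6.2% and 1.1% frequencies quoted above follow from a one-line Gaussian tail computation. As a minimal check (a sketch, not part of the paper's derivations), using only the mean and standard deviation reported by Bampinas et al. (2018):

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Empirical moments of the beta estimates from Bampinas et al. (2018), Table 1.
mean_beta, sd_beta = 0.855, 0.154

# Predicted frequency of estimates below each threshold, if the
# empirical distribution of the estimates were Gaussian.
p_golden = norm_cdf((0.61803 - mean_beta) / sd_beta)  # approx. 6.2%
p_half = norm_cdf((0.5 - mean_beta) / sd_beta)        # approx. 1.1%
print(f"P(beta < 0.61803) = {p_golden:.3f}, P(beta < 0.5) = {p_half:.3f}")
```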
A first application of the present results is the characterization of the degree of non-Gaussianity of the prediction distribution. This problem is relevant in practice, because the Gaussian density is often used as an approximation of the true prediction density, see e.g. Lau and McSharry (2010), page 1322. The exact formulae in this paper demonstrate that the prediction density can be very far from normal. However, for some parameter values encountered in applications, the discrepancy of the prediction density from the Gaussian density can be small, and this fact would support the Gaussian approximation. Note that it is the analytic form of the prediction density derived in this paper that allows one to measure this discrepancy.
A different question in this first area of application is the association of the coefficients of the GARCH with both the shape of the prediction distribution and the shape of the stationary distribution, when this exists. Under stationarity, the predictive distribution converges to the stationary distribution as the number of steps increases. The tail behavior of the stationary distribution of the Gaussian GARCH(1,1) has been studied extensively, see Mikosch and Starica (2000) and Davis and Mikosch (2009). The tails of the stationary distribution of both the volatility and of the GARCH process $x_t$ are of Pareto type, $\Pr(x_t > u) \approx c u^{-2\kappa}$ say. These properties are based on results for random difference equations and renewal theory obtained in Kesten (1973) and Goldie (1991).

[Footnote 4] Table 1 in Bampinas et al. (2018) reports a skewness of $-2.670$ and kurtosis of 12.75, which indicate that the distribution is far from normal; unfortunately, no min or max values are reported in the table. Typical $\beta$ values in textbooks and in the empirical finance literature can be found in Linton (2019).
The prediction density is found to resemble a Gaussian density (with appropriate variance) for high values of $\beta/\alpha$, and to be far from it for low values of this ratio. Similarly, large values of $\beta/\alpha$ are found to be associated with higher values of $\kappa$, i.e. a Pareto stationary distribution with more moments (the Gaussian has all moments).
A second application of the explicit formulae in this paper is to compute exact tail probabilities and functionals, such as Expected Shortfall, that measure risk when the underlying asset return is generated by a Gaussian GARCH(1,1). This improves on existing methods based on approximations or Monte Carlo (MC) simulations combined with (non-parametric) estimation techniques.
The so-far unknown analytic form of the prediction density of a GARCH has led econometricians to look for alternative approximate solutions. Alexander et al. (2013) have resorted to approximations based on the first 4 moments of the prediction distribution; Baillie and Bollerslev (1992) use a Cornish-Fisher expansion and a Johnson SU distribution using the first 4 moments of the GARCH(1,1) to fit the distributions.
Alternative methods for estimating the prediction density and risk measures such as the Value at Risk and the Expected Shortfall rely on MC simulations of the underlying GARCH processes, see e.g. Delaigle et al. (2016). All these MC methods implicitly assume that the prediction density exists and that the unknown functionals are finite. In this domain, the present exact formulae are key to prove that the density and the unknown functionals are finite, which is a pre-requisite for MC methods to work. It must be noted, however, that MC methods have an additional layer of uncertainty, associated with MC estimation, that the exact methods proposed in this paper bypass entirely. Specifically, MC estimation of prediction functionals results in confidence intervals with positive length and coverage probability $1 - \eta < 1$; their counterparts based on the exact results of this paper can be represented by confidence intervals with length equal to 0 and coverage probability 1. Hence the exact methods in this paper are qualitatively superior to those based on MC methods, in addition to being much more parsimonious in terms of required calculations.
A third application of the exact formulae in this paper is to provide uncertainty intervals for (functionals of) the prediction distribution that reflect in-sample estimation uncertainty. This allows one to map estimation uncertainty onto forecast uncertainty for the risk measures and to construct the associated forecast intervals with a pre-specified (asymptotic) coverage level. For instance, one could predict the Expected Shortfall to lie in an interval $(ES_l, ES_u)$ with 95% confidence level, where the uncertainty reflects in-sample estimation uncertainty on the GARCH parameters. As already discussed for the second area of application, the exact methods in this paper lead to results that are structurally different from the ones based on MC methods. In fact, the latter involve an additional layer of uncertainty associated with MC estimation, which is avoided by the exact methods in the present paper.
The rest of the paper is organised as follows. Section 2 describes the general approach for the derivation of the prediction distribution. Section 3 states the main theoretical results. Section 4 discusses the degree of non-Gaussianity of the prediction distribution and compares the prediction distribution with the tails of the stationary distribution when this exists; this is the first area of application discussed above. Section 5 discusses the second area of application, i.e. how to apply the present exact results to the calculation of the Value at Risk and of the Expected Shortfall, and compares the obtained results with alternative estimators based on MC methods. Section 6 discusses the third area of application of the present analytical formulae to the construction of forecast intervals for risk measures that reflect in-sample estimation uncertainty. Section 7 concludes. Proofs are collected in the Appendix.

The prediction density
This section summarises the construction used to derive the prediction density as an integral involving a product of densities of the innovations. Consider the asymmetric GJR-GARCH(1,1)
$$x_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \omega + (\alpha + \lambda 1_{x_{t-1} < 0})\, x_{t-1}^2 + \beta \sigma_{t-1}^2, \qquad (2.1)$$
where $\omega, \alpha, \beta > 0$, $\lambda \geq 0$, $1_{x_t < 0} = \tfrac{1}{2}(1 - \varsigma_t)$ is the indicator function of the event $x_t < 0$, and $\varsigma_t := \operatorname{sgn}(\varepsilon_t) = \operatorname{sgn}(x_t)$ is the sign of $\varepsilon_t$ or $x_t$; these signs are the same because $\sigma_t > 0$. The sequence $\{\varepsilon_t\}$ is assumed to be i.i.d., centered around zero and with Gaussian p.d.f. $f_\varepsilon(\epsilon) := g(\epsilon^2) := (2\pi)^{-1/2} \exp(-\epsilon^2/2)$. Time $t = 0$ is taken to be the starting time of the prediction, and it is assumed that one wishes to predict $x_h$ for some $h = 1, 2, 3, \dots$, conditional on the information set at time $t = 0$, which contains $x_0$ and $\sigma_0$; the conditioning on $x_0$ and $\sigma_0$ is not explicitly included in the notation of the prediction distribution for simplicity. Throughout the paper the values taken by the random variables $x_t$ and $z_t := x_t^2$ are denoted $u_t$ and $w_t$ respectively, or sometimes $u$ and $w$ for simplicity, when this may not cause ambiguity.
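The recursion just described is straightforward to simulate; the following minimal sketch of a GJR-GARCH(1,1) simulator under Gaussian innovations (function and variable names are illustrative, not from the paper) makes the timing conventions explicit:

```python
import math
import random

def simulate_gjr_garch(h, omega, alpha, beta, lam, x0, sigma2_0, seed=0):
    """One simulated path x_1, ..., x_h from the GJR-GARCH(1,1) recursion
    sigma_t^2 = omega + (alpha + lam * 1{x_{t-1} < 0}) x_{t-1}^2 + beta * sigma_{t-1}^2,
    x_t = sigma_t * eps_t, with i.i.d. standard Gaussian eps_t,
    conditional on the time-0 information (x_0, sigma_0^2)."""
    rng = random.Random(seed)
    x, sigma2 = x0, sigma2_0
    path = []
    for _ in range(h):
        sigma2 = omega + (alpha + lam * (x < 0)) * x * x + beta * sigma2
        x = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# One draw of (x_1, ..., x_4) conditional on x_0 = 1 and sigma_0^2 = 1:
print(simulate_gjr_garch(h=4, omega=0.1, alpha=0.1, beta=0.85, lam=0.05,
                         x0=1.0, sigma2_0=1.0))
```

Setting `lam=0` recovers the symmetric GARCH(1,1) news impact curve.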
Lemma 2.1 (Densities). For symmetric $f_\varepsilon(\epsilon) = g(\epsilon^2)$, the p.d.f. $f_{x_t}(\cdot)$ is symmetric, i.e. $f_{x_t}(u) = f_{x_t}(-u)$, $u \in \mathbb{R}$, and it is related to the p.d.f. of $z_t$ in the following way:
$$f_{x_t}(u) = f_{z_t}(u^2)\, |u|. \qquad (2.2)$$

[Footnote 6] The information set at $t = 0$ containing $x_0$ and $\sigma_0$ is consistent with observing $x_t$ from minus infinity to time 0 under stationarity. Note also that, because $x_0$ and $\sigma_0^2$ are observed, $\sigma_1^2$ is also observed.
Moreover, $\Pr(\varsigma_t = \pm 1) = \tfrac{1}{2}$, and one has the conditional density in (2.3), where $\sigma_t^2$ depends on $w_{t-j}$ (the value of $z_{t-j} = x_{t-j}^2$) and $s_{t-j}$ (the sign of $x_{t-j}$) for $j = 1, \dots, t-1$ via (2.1).
Next denote by $S$ the set of all possible $(h-1)$-dimensional sign vectors $\varsigma$, with $\#S = 2^{h-1}$. Densities are first computed conditionally on $\varsigma$ and later they are marginalized with respect to it. Here, conditioning on $\varsigma$ is relevant only for the GJR case $\lambda \neq 0$.
The basic building block is given by the expression in (2.3). This density can be marginalised with respect to $z$ as in (2.4). Finally, $f_{z_h|\varsigma}(w_h|s)$ can be marginalised with respect to the signs $\varsigma$, using the mutual independence of the signs $\varsigma_{t-j}$ and the fact that $\Pr(\varsigma_t = \pm 1) = \tfrac{1}{2}$ for all $t$, due to the symmetry of $g$. One hence finds (2.5), where the sum is over $s_j \in \{-1, 1\}$, for $j = 1, \dots, h-1$. The prediction density $f_{z_h}(w_h)$ is found by combining (2.5), (2.4), (2.3) and (2.2). The next Lemma reports a recursion for the volatility process that turns out to be useful when solving the integral in (2.4). In the Lemma, the following notation is used: for $t = 1, \dots, h-1$, let $y_t := \alpha_t z_t/(\beta\sigma_t^2) = \alpha_t x_t^2/(\beta\sigma_t^2) = \alpha_t \varepsilon_t^2/\beta$ and $y := (y_1, \dots, y_{h-1})'$, where $v := (v_1, \dots, v_{h-1})'$ denotes a value of $y$.
Lemma 2.2 (Volatility and transformations). The volatility process can also be written in an equivalent form; for $h \geq 2$, $\sigma_h^2$ has a recursive expression in terms of the $y$'s, starting from $\sigma_1^2 = \omega + \beta\sigma_0^2 + \alpha_0 x_0^2$, which is measurable with respect to the information set at time 0.
In Theorem 3.2 below, $\Psi$ is the confluent hypergeometric function of the second kind, also known as the Tricomi function, see Abadir (1999) and Gradshteyn and Ryzhik (2007), section 9.21, whose integral representation is
$$\Psi(a; c; z) = \frac{1}{\Gamma(a)} \int_0^\infty e^{-zt}\, t^{a-1} (1+t)^{c-a-1}\, dt.$$
The $\Psi$ function is used to define the quantities $c_j := 2^{-h+1} \sum_{s \in S} c_{j,s}$, where $c_{j,s}$ is defined in terms of $\gamma$ and the $\Psi$ function. Observe that the expression for $c_{j,s}$ does not depend on the points of evaluation $w_h$ or $u_h$, and hence the $c_{j,s}$ coefficients can be computed only once for the whole density. One can prove, see Lemma Appendix A.1 in the Appendix, that $\Psi(a; c; z) \to 0$ as $c \to -\infty$. This implies that the terms $\Psi(\tfrac{1}{2}; r - k_t + \tfrac{3}{2}; z)$ converge to 0 for large $k_t$ in the product in (3.2). Note also that for $h = 2$, equation (3.4) holds for any value of $\beta$, while for $h = 3$ it holds if and only if $\beta \geq \underline{\beta}$. For $h > 3$, the validity of (3.4) is guaranteed by the sufficient condition $\beta \geq \max(\tfrac{1}{2}, \underline{\beta})$, which is, however, not necessary. The line of proof of Theorem 3.2 is the following: for $h = 2$ the integral is solved by substitution and by using equation (3.1). For $h \geq 3$, subsequent (negative) binomial expansions of expression (2.7) for $\sigma_t^2$ are required, whose validity is ensured by an inequality that is satisfied under Assumption 3.1, see Lemma Appendix A.2 in the Appendix. Immediate consequences of Theorem 3.2 are collected in the following corollary.
with zero odd moments for $x_h$ and even moments given in (3.6), where $\gamma_h$ and $A_h(m)$ depend on $s$; see their definitions in Lemma 2.2 and in (3.2).
Note that the $A_h(m)$ in the moment calculations are made of finite sums extending to $m$, involving the Tricomi functions, which do not fall in the logarithmic case as in Theorem 3.2; see Abadir (1999) for the logarithmic case. In fact, $m - k \in \{0, 1, \dots, m\}$ implies that the relevant $\Psi$ function reduces to a finite sum, see Abadir (1999), which is proportional to a generalized Laguerre polynomial, see Abramowitz and Stegun (1964), Chapter 22. For the moments of a GARCH(1,1), one can compare (3.6) with equations (34) and (35) in Baillie and Bollerslev (1992).
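The integral representation of the Tricomi function given above can be cross-checked numerically. The sketch below compares a direct quadrature of the integral with SciPy's built-in `hyperu`; this is an independent numerical check, not the paper's implementation:

```python
import math
from scipy import integrate, special

def psi_integral(a: float, c: float, z: float) -> float:
    """Tricomi Psi(a; c; z) via its integral representation
    Psi(a; c; z) = (1/Gamma(a)) * int_0^inf e^{-z t} t^{a-1} (1+t)^{c-a-1} dt,
    valid for a > 0 and z > 0."""
    integrand = lambda t: math.exp(-z * t) * t ** (a - 1.0) * (1.0 + t) ** (c - a - 1.0)
    # Split at t = 1 to help the quadrature with the t^{a-1} endpoint behaviour.
    head, _ = integrate.quad(integrand, 0.0, 1.0)
    tail, _ = integrate.quad(integrand, 1.0, math.inf)
    return (head + tail) / math.gamma(a)

# a = 1/2 and a negative second argument, as in the densities of Theorem 3.2:
a, c, z = 0.5, -3.0, 2.0
print(psi_integral(a, c, z), special.hyperu(a, c, z))  # the two values should agree
```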
Some standardized densities of $x_h$ and the corresponding right tails are plotted in Figure 2 for $h = 1, 2, 3, 4$. The curve $h = 1$ is the standard Gaussian. Computations for Figures 2, 3 and 4 were performed in Mathematica. Figure 3 shows the standardized prediction densities for $h = 2$ and values of $\beta/\alpha$ that range from 8.5 ($\alpha = 0.1$, $\beta = 0.85$) to $1/8.5$ ($\alpha = 0.85$, $\beta = 0.1$). This figure shows that the deviations of the prediction density from the Gaussian case can be substantial; the prediction densities are more similar to a Gaussian when $\beta/\alpha$ is large. Figure 4 shows the tails for the GJR-GARCH(1,1) case.

[Figure 3: Prediction density $f_{x_2}(u_2)$ for standardized $x_2$, $\omega = 0.1$, $\sigma_0^2 = 1$, $x_0^2 = 1$, for varying values of $(\alpha, \beta)$.]

The formulae in Theorem 3.2 and Corollary 3.3 are alternating in sign. While (absolutely) convergent, the associated series was found in practice to be ill-behaved numerically when $\rho_u := u^2/(\omega + \beta\sigma_1^2)$ is very large, causing the oscillations in the terms of the series to become large before decreasing in amplitude toward zero, where 'large' refers to the greatest floating point number handled by the computer. Note that this can be linked to large $u$ and/or small $\omega + \beta\sigma_1^2$. This extreme behaviour implies accumulation of numerical errors, which can lead to inaccurate calculations of the prediction density.
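The mechanism behind this ill-behaviour is ordinary floating-point cancellation in alternating series. The toy example below (unrelated to the paper's formulae, but exhibiting the same pathology) sums the alternating Taylor series of $e^{-x}$ for large $x$ in double precision:

```python
import math

def exp_neg_naive(x: float, n_terms: int = 200) -> float:
    """Compute e^{-x} by naively summing its alternating Taylor series.
    For large x the terms grow to roughly x^x / x! in magnitude before
    shrinking, and the cancellation between huge positive and negative
    terms destroys all significant digits in double precision."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= -x / (k + 1)
    return total

x = 40.0
naive, exact = exp_neg_naive(x), math.exp(-x)
print(naive, exact)  # the naive sum is wildly wrong; the true value is ~4.25e-18
```

Rewriting the series so that all terms are positive, as Theorem 3.5 does via the factor $e^{-\rho_u}$, removes exactly this failure mode.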
Example 3.4 (Numerical accuracy). One such case can be obtained for the density of $x_2$ in formula (3.4), $h = 2$, in the following way: select $u = 4$ for $\omega = 0.0000114$, $\alpha = 0.85$, $\beta = 0.14$, and choose $\sigma_0^2 = x_0^2 = \sigma^2 := \omega/(1 - \alpha - \beta) = 0.00114$. This results in $\rho_u = 53.3$ for the standardized p.d.f. of $x_2/s$ with $s^2 := \omega + (\alpha + \beta)\sigma_1^2$. In this case, the oscillations of the terms in the series increase up to $\pm 4 \cdot 10^{20}$ around the 50th term of the series, before the oscillations decrease toward zero; the resulting series truncated after its first 100 terms gave the negative number $-2.9628 \cdot 10^{12}$. Calculations were performed in MATLAB 2018a on an Intel i7 Windows 10 computer. As a comparison, the same script applied to $u = 2$ gave $f_{x_2}(2) = 0.03688432$.

[Footnote 7] When $x$ has mean 0 and standard deviation $s$, the standardized variate is $z = x/s$, with density $f_z(a) = s f_x(sa)$.

In order to address these numerical accuracy problems when $\rho_u = u^2/(\omega + \beta\sigma_1^2)$ is large, the following theorem presents a different set of formulae for the prediction density. This alternative set has the advantage of allowing computations in the far tails of the density, at the price of a slightly higher implementation cost.
The improved numerical performance of formula (3.8) is linked to the presence of the term $e^{-\rho_u}$, with $\rho_u := u^2/(2(\omega + \beta\sigma_1^2))$, when $\rho_u$ is large. In fact, for $u^2 \to \infty$ the term $e^{-\rho_u} \to 0$, so that $e^{-\rho_u}$ compensates the large terms of the type $\rho_u^j$ that appear in the sum for large $u^2$. For $u^2 \to 0$, the term $e^{-\rho_u} \to 1$, so that $e^{-\rho_u}$ does not influence the sum for small values of $u^2$. Note, moreover, that all the terms in the series (3.7) and (3.8) are positive, so that there are no oscillations associated with different signs of the terms in the series.
Example 3.6 (Numerical accuracy, continued). In the same setup as Example 3.4, formula (3.8) is numerically accurate. In fact, all the terms in the series were found to be bounded by $2 \cdot 10^{-4}$, with the value of the density equal to 0.002953901, again using the first 100 terms of the series. Calculations were performed in the same environment as in Example 3.4. As a comparison, the same script applied to $u = 2$ gave $f_{x_2}(2) = 0.03688291$, which agrees with formula (3.4) in Example 3.4 up to the 5th digit (discrepancy equal to $1.4017 \cdot 10^{-6}$).
The slightly higher implementation cost of formula (3.8) is associated with the presence of the generalised Laguerre polynomials in $p_j(\rho)$ for $h \geq 3$. These are finite sums and add a moderate cost in terms of computations. Derivations similar to Corollary 3.3 can be performed on (3.7) and (3.8) to obtain the corresponding c.d.f.s.

Stationary distribution
The limit representation of the random variable $x_h$ in the stationary case can be found in Francq and Zakoian (2010), Theorem 2.1, page 24. The tail behaviour of the limit distribution is reviewed in Mikosch and Starica (2000) and Davis and Mikosch (2009). The tails of the stationary distribution of both the volatility and of the GARCH process $x_t$ are of Pareto type, $\Pr(x_t > u) \approx c u^{-2\kappa}$ say, where $\kappa > 0$ is a tail index. These properties are based on results for random difference equations and renewal theory obtained in Kesten (1973) and Goldie (1991).
The tail index of the stationary distribution depends on the coefficients $\alpha$ and $\beta$ of the GARCH(1,1) process $x_t$ as well as on the one-step-ahead distribution. Examples of the tail index are given in Davis and Mikosch (2009); for Gaussian innovations, $\kappa = 14.1$ for $\alpha = \beta = 0.1$, while $\kappa = 1$ for $\alpha = 1 - \beta$.
The index $\kappa$ is the unique positive solution of $E((\alpha\varepsilon_t^2 + \beta)^\kappa) = 1$. When $\kappa$ is an integer, the expression simplifies, see Davis and Mikosch (2009), eq. (10). Substituting the moments $E(\varepsilon_t^{2n})$ from the $\chi^2$ distribution, and assigning values to $\alpha/\beta$ over a grid of pre-specified values, one can solve (4.1) for $\beta$, and hence for $\alpha = (\alpha/\beta)\beta$. This allows one to compute (values of) the surface $\kappa(\alpha, \beta)$. Figure 5 reports the level curves of $\kappa(\alpha, \beta)$ as a function of $\alpha$ and $\beta$ obtained in this way. The figure also reports the lines where $\beta/\alpha$ is constant. It is seen that, for large values of $\beta/\alpha$, $\kappa$ and $\beta/\alpha$ increase roughly together. This association is not present for small values of $\beta/\alpha$. The relation between $\beta/\alpha$ and the fat-tailedness of the prediction density for finite horizon $h$ can be illustrated using the case $h = 2$. From Theorem 3.2, the quantity $\bar{\sigma}_2^2 := \omega + \beta\sigma_1^2$ can be interpreted as the minimum value that $\sigma_2^2 = \omega + (1 + y_1)\beta\sigma_1^2$ can take, in the ideal case when $\alpha = 0$ (thus $y_1 = 0$) and $\sigma_1^2$ is given, i.e. $x_2 \sim N(0, \bar{\sigma}_2^2)$.
Hence when $\beta/\alpha \to \infty$ one has $z \to \infty$ with $\sqrt{z}\,\Psi(\tfrac{1}{2}; 1 - j; z) = 1 + O(|z|^{-1})$, see Abramowitz and Stegun (1964), eq. 13.1.8, so that all the Tricomi-function terms $\sqrt{z}\,\Psi(\tfrac{1}{2}; 1 - j; z)$, for varying $j$, tend to one. As a result, when $\beta/\alpha \to \infty$ the prediction distribution converges to a $N(0, \bar{\sigma}_2^2)$. One concludes that, both for the prediction density for $h = 2$ and for the stationary distribution, the fat-tailedness of the distribution is small for large values of $\beta/\alpha$, unless $\alpha$ is very close to 0.

Comparing exact formulae with simulation-based methods
This section describes the application of the formulae in the previous section to the calculation of the Value at Risk and of the Expected Shortfall, comparing them with alternatives based on Monte Carlo. This comparison is made under Gaussianity and Assumption 3.1, so that the formulae in the paper can be applied. The analysis in this and the remaining sections is for generic forecast horizon h = 2, 3, . . . , while illustrations are made for h = 2 and λ = 0 for simplicity and without loss of generality.
Let $p$ be some tail probability, such as 5%, and let $Q_{h,p}$ be the Value at Risk, defined as the negative of the $p$-quantile of the prediction distribution, i.e. $p = \Pr(x_h < -Q_{h,p})$. Let also $ES_{h,p}$ indicate the corresponding Expected Shortfall, i.e. $ES_{h,p} := -E(x_h \mid x_h < -Q_{h,p})$, following standard notation, see e.g. Francq and Zakoian (2015).
Observe that $ES_{h,p}$ may fail to exist when the underlying density has Cauchy-type tails. One implication of the exact results in Theorem 3.2 of Section 3 is that for finite $h$ the prediction distribution of $x_h$ has thinner-than-Cauchy tails, and hence $ES_{h,p}$ exists; this appears to be a central issue for the application of $ES_{h,p}$ as a measure of risk.
The following subsections show first how the exact formulae can be applied in this context, and next their relative advantage over MC-based methods. The same advantages discussed for the quantification of the Value at Risk and the Expected Shortfall apply more generally to other functionals of the prediction distribution, as well as to the nonparametric estimation of the prediction distribution itself. For brevity, these latter cases are not discussed in detail in this paper.
The rest of the section refers to the standardized prediction distribution of $x_2$ when $\omega = 1.14 \cdot 10^{-5}$, $\alpha = 0.131007$, $\beta = 0.845708$, $\lambda = 0$; these values are the ML estimates of an AR(2)-GARCH(1,1) model for the weekly S&P 500 stock index return over 1950-2018 reported in Table 11.4 in Linton (2019). These values of $\alpha$ and $\beta$ are very similar to the median estimates in Table 1 of Bampinas et al. (2018) for the set of individual S&P 1500 daily returns. In the calculations $\sigma_1^2$ was set equal to $\omega/(1 - \alpha - \beta)$. For these parameters, standard double precision was found to be sufficient for $h = 2$ for a range of $|x| < 6$ in standardized units.

Exact calculations
Both $Q_{h,p}$ and $ES_{h,p}$ can be calculated using the exact formulae in this paper. This subsection combines the use of the results in Section 3 with numerical techniques, applying the iterations in (5.1).

[Table 1: Values of $Q_{2,p}$ for $\omega = 1.14 \cdot 10^{-5}$, $\alpha = 0.131007$, $\beta = 0.845708$, $\lambda = 0$, using iterations (5.1).]
Here $F_{x_h}(u)$ is given in (3.5) and $f_{x_h}(u)$ is given in (3.4); this typically requires a handful of function evaluations. Consider next $ES_{h,p}$; one can write it as in (5.2), where the second equality follows by integration by parts, and the third because $F_{x_h}(-Q_{h,p}) = p$ by definition. This integral can be evaluated numerically using $f_{x_h}(u)$ in (3.4), or $F_{x_h}(u)$ in (3.5), employing quadrature methods (trapezoid), see Press et al. (2007), Chapters 4 and 13. Table 1 reports values of $Q_{2,p}$ using the reference values from Table 11.4 in Linton (2019). The chosen algorithm in (5.1) was implemented in MATLAB, using a tolerance value of $10^{-7}$ and avoiding division by $f$ when this is smaller than $10^{-14}$. Initial values of the iterations were chosen equal to the corresponding Gaussian quantiles. Values of $F_{x_h}(u)$ were computed as in (3.5) and $f_{x_h}(u)$ as in (3.4), truncating sums at 100 terms. Table 1 reports terminal values of the iterations, along with the number of iterations and a comparison with the standard Gaussian distribution. Unsurprisingly, the Values at Risk are found to be close to the Gaussian quantiles; however, they can be either smaller or larger than the Gaussian ones, depending on the value of $p$. The number of iterations needed was smaller than 5. Table 2 reports values of $ES_{2,p}$ using the standardized prediction distribution with the same parameter values as in Table 1. Numerical integration as in the last expression in (5.2) was performed with values of $F_{x_h}(u)$ computed as in (3.5) and $f_{x_h}(u)$ as in (3.4), truncating sums at 100 terms. Table 2 shows that the Expected Shortfall values are close to the Gaussian case, but systematically lower. In practice, the call to the integral function was quicker than the computation of $Q_{2,p}$ in Table 1.

[Footnote 11] Note that, whenever $E(x)$ exists, one has $\lim_{x \to -\infty}(-x F(x)) = 0$; in fact, for $-\infty < u < x < 0$ one has $-x F(x) = \int_{-\infty}^{x} (-x)\, dF(u) \le \int_{-\infty}^{x} (-u)\, dF(u) \to 0$ as $x \to -\infty$.
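To make the two numerical steps concrete, the sketch below implements the same scheme, Newton-type iterations for the quantile followed by trapezoid integration for the Expected Shortfall, but with the standard Gaussian density as a stand-in for the paper's $f_{x_h}$ and $F_{x_h}$ in (3.4)-(3.5); it is illustrative, not the paper's MATLAB implementation:

```python
import math

def phi(u):  # standard normal p.d.f.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):  # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def value_at_risk(p, f=phi, F=Phi, tol=1e-7, max_iter=50):
    """Newton iterations for the p-quantile q solving F(q) = p;
    the Value at Risk is Q = -q. The denominator is floored to
    avoid dividing by a vanishing density, as in the text."""
    q = -2.0  # rough Gaussian-motivated starting point
    for _ in range(max_iter):
        step = (F(q) - p) / max(f(q), 1e-14)
        q -= step
        if abs(step) < tol:
            break
    return -q

def expected_shortfall(p, f=phi, n=20000, lo=-12.0):
    """ES = -(1/p) * int_{-inf}^{-Q} u f(u) du by the trapezoid rule,
    truncating the lower limit at `lo`."""
    Q = value_at_risk(p, f)
    h = (-Q - lo) / n
    grid = [lo + i * h for i in range(n + 1)]
    vals = [u * f(u) for u in grid]
    integral = h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])
    return -integral / p

p = 0.05
print(value_at_risk(p), expected_shortfall(p))  # approx. 1.6449 and 2.0627
```

Replacing `phi`/`Phi` by implementations of (3.4)/(3.5) would reproduce the exact calculations of Tables 1 and 2.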

Alternatives based on Monte Carlo
Alternative methods to compute $Q_{h,p}$ and $ES_{h,p}$ rely on MC simulations. Simple MC solutions are reviewed here for comparison with the exact methods above. In order to estimate $Q_{h,p}$ and $ES_{h,p}$, for replication $j = 1, \dots, n$, one can generate pseudo-random numbers $(\varepsilon^*_{t,j})_{t=1}^{h}$ and construct the corresponding values $(x^*_{t,j})_{t=1}^{h}$ using recursion (2.1). Let $x^*_{h,j}$ be the $j$-th MC realization of $x_h$ constructed in this way, and observe that the $x^*_{h,j}$ are independent realisations across replications $j$ from the prediction distribution.
Repeating this for $j = 1, \dots, n$, the sample $(x^*_{h,1}, \dots, x^*_{h,n})$ can be formed; let $(q_1, \dots, q_n)$ indicate the ordered values of $(x^*_{h,1}, \dots, x^*_{h,n})$, with $q_1 \le \dots \le q_n$. The MC quantile $q_{\lfloor np \rfloor + 1}$ can be used to estimate $-Q_{h,p}$, where $\lfloor x \rfloor$ (respectively $\lceil x \rceil$) denotes the round-down (respectively round-up) of $x$, i.e. the largest (respectively smallest) integer less than or equal (respectively greater than or equal) to $x$. Observe here that the sample is a (pseudo) i.i.d. sample from the prediction distribution, and hence all results for i.i.d. samples apply to it. Standard results for quantiles, based on the application of the central limit theorem to the MC empirical c.d.f., see e.g. Dudevicz and Mishra (1988), Theorem 7.4.21, imply that
where $\xrightarrow{w}$ indicates weak convergence as $n \to \infty$. Hence an MC large-$n$ confidence interval for $Q_{h,p}$ based on $q_{\lfloor np \rfloor + 1}$ at level $\eta$ is given by $q_{\lfloor np \rfloor + 1} \pm z_{1-\eta/2}\sqrt{p(1-p)}/(\sqrt{n}\, f_{x_h}(q_{\lfloor np \rfloor + 1}))$, where $z_b$ is the $b$-quantile of the standard normal distribution. The length of the confidence interval for $Q_{h,p}$ is hence $\ell_Q = 2 z_{1-\eta/2}\sqrt{p(1-p)}/(\sqrt{n}\, f_{x_h}(Q_{h,p}))$, which is linked to the precision of the MC estimate. Setting $\ell_Q \le 10^{-a}$ for some integer $a$, this equation can be solved for $n$, giving (5.3). Similarly, consider the MC estimation of $ES_{h,p}$ for given $Q_{h,p}$; assuming $Q_{h,p}$ known here simplifies the derivations without altering the main discussion of MC uncertainty, see Patton et al. (2019) for the joint estimation of $Q_{h,p}$ and $ES_{h,p}$. The Expected Shortfall could be estimated by the MC sample analogue $m_{h,p}$; observe that this MC estimator is consistent when $ES_{h,p}$ exists, which is the case thanks to the results in Theorem 3.2.
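A minimal version of this MC scheme can be sketched as follows; it is an illustration under the paper's Gaussian assumption, and the degenerate parameter choice in the final line ($\alpha = \beta = \lambda = 0$, so that $x_h \sim N(0, \omega)$) is used only to validate the estimates against known Gaussian values:

```python
import math
import random

def mc_var_es(n, p, h, omega, alpha, beta, lam, x0, sigma2_0, seed=0):
    """Monte Carlo estimates of Q_{h,p} and ES_{h,p}: simulate n independent
    h-step paths of the (GJR-)GARCH(1,1) recursion, take the floor(n*p)+1
    order statistic as an estimate of -Q, and average the draws below it
    as an estimate of -ES."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x, s2 = x0, sigma2_0
        for _ in range(h):
            s2 = omega + (alpha + lam * (x < 0)) * x * x + beta * s2
            x = math.sqrt(s2) * rng.gauss(0.0, 1.0)
        draws.append(x)
    draws.sort()
    k = math.floor(n * p) + 1
    q = draws[k - 1]                 # estimates -Q_{h,p}
    es = -sum(draws[:k]) / k         # estimates ES_{h,p}
    return -q, es

# Degenerate check: x_h ~ N(0, 1), so for p = 5% the estimates should be
# near the Gaussian values 1.6449 (VaR) and 2.0627 (ES), up to MC noise.
print(mc_var_es(200_000, 0.05, 2, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0))
```

Even with 200,000 replications, the estimates carry visible MC noise, which is precisely the extra layer of uncertainty the exact formulae avoid.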
Let further $v_{h,r}$ and $V_h^2$ be defined as in (5.4). Observe that these expectations exist thanks to the results in Theorem 3.2. Further, note that $v_{h,r}/p$ has expectation $ES_{h,p}$ and variance $V_h^2/p^2$, and hence $E(v_{h,r}) = p\, ES_{h,p}$. Application of the central limit theorem, see e.g. Dudevicz and Mishra (1988), shows that an MC large-$n$ confidence interval for $ES_{h,p}$ based on $m_{h,p}$ at level $\eta$ is given by $m_{h,p} \pm z_{1-\eta/2} V_h/(p\sqrt{n})$. The length (precision) of the confidence interval for $ES_{h,p}$ is hence $\ell_{ES} = 2 z_{1-\eta/2} V_h/(p\sqrt{n})$. Setting $\ell_{ES} \le 10^{-a}$ for some integer $a$, this equation can be solved for $n$, giving (5.5). Values of $n$ from (5.3) are reported in Table 3 for the selected precision level $a = 5$ and $h = 2$, using the values of $\alpha$ and $\beta$ from Table 1 and with reference to the standardized variate. In Table 3, $f_{x_h}(Q_{h,p})$ in (5.3) is computed using the exact formula (3.4).
From Table 3 one deduces that a large number of replications $n$ is required to compute a confidence interval at level $1 - \eta$ for $Q_{h,p}$ for given $a$. Note that the values of $n$ are large also because of the factors $f_{x_h}^2(Q_{h,p})$ and $p^2$ in the denominators of (5.3) and (5.5), respectively. Values of $n$ from (5.5) are reported in Table 4 for the selected precision level $a = 5$ and $h = 2$, using the values of $\alpha$ and $\beta$ from Table 1, and with reference to the standardized variate. In Table 4, $V_h^2$ in (5.5) is evaluated using numerical integration in (5.4) for $h = 2$, with $f_{x_h}(\cdot)$ computed as in Table 3. Also from Table 4 one deduces that a large number of replications $n$ is required to compute a confidence interval at level $1 - \eta$ for $ES_{h,p}$.

Table 4: Number of replications $n$ for an MC confidence interval on $ES_{2,p}$, see eq. (5.5), at given MC coverage level $1 - \eta$.

             p = 0.05      0.025         0.01          0.005
  η = 0.05   5.0484E+12    1.2813E+13    4.0575E+13    9.3989E+13
  η = 0.01   8.7196E+12    2.2131E+13    7.0081E+13    1.6234E+14
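The magnitudes in Tables 3 and 4 can be reproduced from the two closed-form expressions for $n$. The sketch below evaluates them with stand-in standard-Gaussian values for $f_{x_h}(Q_{h,p})$ and $V_h$ (so the outputs match the paper's tables only in order of magnitude):

```python
import math
from statistics import NormalDist

def n_quantile(p, eta, a, f_at_q):
    """Replications for an MC interval on Q_{h,p} of length <= 10^{-a}:
    length = 2 z_{1-eta/2} sqrt(p(1-p)) / (sqrt(n) f), solved for n."""
    z = NormalDist().inv_cdf(1.0 - eta / 2.0)
    return math.ceil((2.0 * z * math.sqrt(p * (1.0 - p)) / (10.0 ** (-a) * f_at_q)) ** 2)

def n_shortfall(p, eta, a, v):
    """Replications for an MC interval on ES_{h,p} of length <= 10^{-a}:
    length = 2 z_{1-eta/2} V / (p sqrt(n)), solved for n."""
    z = NormalDist().inv_cdf(1.0 - eta / 2.0)
    return math.ceil((2.0 * z * v / (p * 10.0 ** (-a))) ** 2)

# Stand-in Gaussian values at p = 5%: f(Q) = phi(1.6449) ~ 0.10314,
# and V^2 = E(x^2 1{x<-Q}) - (E(x 1{x<-Q}))^2 ~ 0.2090.
print(n_quantile(0.05, 0.05, 5, 0.10314))          # order 10^11 replications
print(n_shortfall(0.05, 0.05, 5, 0.2090 ** 0.5))   # order 10^13 replications
```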
More importantly, because of the nature of confidence intervals, there is probability $\eta$ that each of $Q_{h,p}$ or $ES_{h,p}$ does not fall within its MC confidence interval. Decreasing $\eta$ does not offer a solution to this problem, because the quantile $z_{1-\eta/2}$ of the standard normal distribution would diverge, requiring an even larger $n$.
One hence concludes that the MC estimation of $Q_{h,p}$ or $ES_{h,p}$ is costly in terms of the number of replications $n$, and it does not guarantee a given level of numerical precision $a$, because of the probability $\eta$ that $Q_{h,p}$ or $ES_{h,p}$ falls outside its MC confidence interval. This is in contrast with the ease and precision of the exact formulae (3.4) and (3.5) provided in this paper.
Similar considerations apply to the direct nonparametric estimation of the prediction density.

Uncertainty regions for prediction functionals
This section discusses how uncertainty regions can be constructed for prediction functionals to reflect estimation uncertainty, making use of the explicit formulae in the paper.
Let $\theta = (\omega, \alpha, \beta, \lambda)$ indicate the parameters of the GARCH(1,1) in eq. (2.1), and assume that the model has been estimated on a sample of data $\{x_t\}_{t=-T+1}^{0}$ by Quasi Maximum Likelihood (QML). Note that the estimation sample includes $T > 0$ observations, indexed by $t \le 0$. Let $\hat\theta$ be the corresponding QML estimator and $\theta_0$ the (pseudo-)true value.
Under appropriate regularity conditions, see Lee and Hansen (1994), Jensen and Rahbek (2004) and Arvanitis and Louka (2017) and references therein, one has results of the type $\sqrt{T}(\hat\theta - \theta_0) \to_d N(0, \Omega)$, where $\to_d$ indicates convergence in distribution as $T \to \infty$. Let $R$ indicate a full-rank restriction matrix of rank $r$, and let $\Omega_R = R\Omega R'$. This allows one to construct asymptotic confidence regions of the type

$A_\eta = \big\{ R\theta : T\,(R(\hat\theta - \theta))'\, \Omega_R^{-1}\, R(\hat\theta - \theta) \le c_\eta \big\}$,   (6.1)

where $\Pr(w \le c_\eta) = 1 - \eta$ and $w \sim \chi^2(r)$. This region has the property that $\Pr(R\theta_0 \in A_\eta) \to 1 - \eta$. Note that in (6.1) $\Omega_R$ can be replaced by a consistent estimator. A special case is obtained when $R$ is chosen equal to the identity $I$; in this case (6.1) gives the confidence ellipsoid for the unrestricted vector $\theta$; this is the default case in the following.
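Membership in a region of type (6.1) is a single quadratic-form check. The sketch below illustrates it for $r = 2$, with $R$ selecting $(\alpha, \beta)$, using the fact that the $\chi^2(2)$ quantile has the closed form $c_\eta = -2\ln\eta$. The diagonal $\Omega_R$ built from the reported standard errors is an illustrative assumption (it ignores the correlation between $\hat\alpha$ and $\hat\beta$); the paper uses the full estimated QML covariance.

```python
from math import log

T = 2000
theta_hat = (0.078824, 0.88389)          # (alpha_hat, beta_hat) from the text
se = (0.0075383, 0.0092256)              # reported standard errors
# Assumed diagonal of Omega_R, the asymptotic covariance of sqrt(T) R(theta_hat - theta):
Omega_R = [T * se[0] ** 2, T * se[1] ** 2]

def in_region(theta, eta=0.05):
    """Check T (R theta_hat - R theta)' Omega_R^{-1} (...) <= c_eta, eq. (6.1),
    with c_eta the chi-square(2) quantile -2 ln(eta)."""
    c_eta = -2.0 * log(eta)
    w = sum(T * (h - t) ** 2 / o for h, t, o in zip(theta_hat, theta, Omega_R))
    return w <= c_eta

print(in_region((0.078824, 0.88389)))    # center of the ellipsoid
print(in_region((0.12, 0.95)))           # a point far outside
```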
Let $\phi$ be a (possibly multivariate) functional of interest, such as $Q_{h,p}$ or $ES_{h,p}$, or both, which depends on $\theta$, $\phi = \phi(\theta)$. Define also the set $B_\eta$ of values taken by the $\phi$ map for values of $\theta$ in $A_\eta$, i.e.

$B_\eta = \{\phi(\theta) : R\theta \in A_\eta\}$.
Then the following proposition shows that $B_\eta$ is an uncertainty region for $\phi$ with at least asymptotic coverage $1 - \eta$.
Two cases were considered for illustration. The first case, labelled 'Microsoft stock returns', corresponds to a GARCH(1,1) estimated on daily log returns of the Microsoft stock price over the period 2010-12-08 to 2018-11-15, for a total of 2000 observations. The GARCH(1,1) ML estimates were $\hat\omega = 0.048977$ $(0.0049464)$, $\hat\alpha = 0.078824$ $(0.0075383)$, $\hat\beta = 0.88389$ $(0.0092256)$, with estimated standard errors in parentheses. The estimated asymptotic variance-covariance matrix was saved and used to compute the estimation uncertainty region for $Q_{2,p}$ and $ES_{2,p}$. Table 5 reports the results.
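Given these estimates, the two-step-ahead conditional variance follows from the standard GARCH(1,1) recursion, $E_t[\sigma^2_{t+h}] = \omega \sum_{j=0}^{h-2}(\alpha+\beta)^j + (\alpha+\beta)^{h-1}\sigma^2_{t+1}$. The sketch below computes it for the 'Microsoft stock returns' estimates together with the Gaussian *approximation* to the 2-step quantile; the exact prediction density (3.4) is non-Gaussian, so this only illustrates the benchmark the exact formulae improve upon, and the one-step variance value is a hypothetical input.

```python
from math import sqrt
from statistics import NormalDist

omega, alpha, beta = 0.048977, 0.078824, 0.88389   # QML estimates from the text

def variance_forecast(h, sigma2_next):
    """E_t[sigma^2_{t+h}] = omega * sum_{j=0}^{h-2} (a+b)^j + (a+b)^(h-1) * sigma^2_{t+1}."""
    ab = alpha + beta
    return omega * sum(ab ** j for j in range(h - 1)) + ab ** (h - 1) * sigma2_next

sigma2_next = 2.0                      # hypothetical one-step-ahead variance
s2 = variance_forecast(2, sigma2_next)
q_gauss = NormalDist().inv_cdf(0.05) * sqrt(s2)   # Gaussian-approximation quantile
print(round(s2, 4), round(q_gauss, 4))
```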
For both cases in Table 5, 200 points were used in the grid, half of which were selected as images of points $\theta$ for which the inequality in (6.1) holds as an equality, i.e. points on the surface of the confidence ellipsoid. The last column in Table 5 reports how many of the extremes in each row were found to correspond to $\theta$ values on the surface. It can be seen that many of these extremes come from points on the surface, but not all. Increasing the number of points in the grid to 2000 gave marginal improvements for the extremes. More details on the computations behind Table 5 are reported in Appendix B. One could ask whether analogues of this procedure exist which use MC in place of the exact formulae, with each evaluation of the map $\phi(\theta)$ replaced by MC simulation plus MC estimation of $\phi(\cdot)$. The MC approach implies a large computational burden, because of the added MC simulation and estimation effort associated with each evaluation of the $\phi(\cdot)$ map. Moreover, the inherent limitations of MC confidence intervals discussed in Section 5 would apply here, adding extra uncertainty to the estimation of $\phi(\cdot)$. This additional layer of MC uncertainty is completely avoided by the present exact methods.
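The surface part of the grid procedure can be sketched as follows: sample points on the surface of the ellipsoid (6.1), map each through $\phi$, and record the extremes of the image. Here $\phi$ is a hypothetical stand-in (the Gaussian-approximation 2-step quantile with $\omega$ held fixed), and the covariance of the estimates is assumed diagonal for illustration, so surface points are simply $\hat\theta + \sqrt{c_\eta}\,(\mathrm{se} \circ u)$ with $u$ on the unit circle; the paper works with the exact functional and the full estimated covariance.

```python
from math import cos, sin, log, pi, sqrt
from statistics import NormalDist

T, eta = 2000, 0.05
omega = 0.048977                          # held fixed in this sketch
theta_hat = (0.078824, 0.88389)           # (alpha_hat, beta_hat)
se = (0.0075383, 0.0092256)
c_eta = -2.0 * log(eta)                   # chi-square(2) quantile, closed form

def phi(alpha, beta, p=0.05, sigma2_next=2.0):
    # hypothetical functional: Gaussian-approximation 2-step quantile
    s2 = omega + (alpha + beta) * sigma2_next
    return NormalDist().inv_cdf(p) * sqrt(s2)

# Grid of points on the ellipsoid surface: theta_hat + sqrt(c_eta) * se * u.
values = []
for k in range(200):
    u = (cos(2 * pi * k / 200), sin(2 * pi * k / 200))
    a = theta_hat[0] + sqrt(c_eta) * se[0] * u[0]
    b = theta_hat[1] + sqrt(c_eta) * se[1] * u[1]
    values.append(phi(a, b))
print(min(values), max(values))           # extremes of the image of the surface
```

For a monotone $\phi$ such as this stand-in, the extremes do come from the surface; as noted above, with the exact functionals not all extremes of $B_\eta$ need to, which is why interior grid points are also used.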
In other words, the uncertainty regions produced via the present exact methods only reflect in-sample estimation uncertainty associated with the GARCH parameters, not the MC simulation and estimation uncertainty of the $\phi(\cdot)$ map.

Conclusions
This paper presents the analytical form of the prediction density of a GARCH(1,1) process. This can be used to evaluate the probability of tail events and quantities of interest for Value at Risk calculations. The exact formulae improve on approximation methods based on moments, or on Monte Carlo simulation and estimation.
The exact formulae show that, while the prediction density can in principle be very far from normal, for parameter values commonly encountered in applications the discrepancy of the prediction density from the Gaussian distribution can be small. These results could not be obtained without the explicit form of the prediction density.
The present exact results are shown to imply easy-to-compute uncertainty regions for risk functionals, so as to reflect estimation uncertainty. These tools are not available for alternatives based on approximations or MC simulations and estimation of functionals.
The techniques in this paper can be extended to the case of a symmetric innovation density $g(\cdot)$ different from the N(0,1) one. Different densities imply different subsequent (negative) binomial expansions of expression (2.7) for $\sigma_t^2$, and different auxiliary convergence conditions on the GARCH coefficients, similar to Assumption 3.1. These extensions are left to future research.
Because of the asymptotic nature of the confidence ellipsoid, some of the obtained points $R\theta = B^{-1}(x+u)$ contained negative values of $\omega$ or $\alpha$, i.e. were not inside the parameter space. This never happened in the case of the Microsoft stock returns, but it happened some 20% of the time in the case of the simulation run; these points $R\theta$ were discarded.