Cross-temporal Probabilistic Forecast Reconciliation

Forecast reconciliation is a post-forecasting process that involves transforming a set of incoherent forecasts into coherent forecasts which satisfy a given set of linear constraints for a multivariate time series. In this paper we extend the current state-of-the-art cross-sectional probabilistic forecast reconciliation approach to encompass a cross-temporal framework, where temporal constraints are also applied. Our proposed methodology employs both parametric Gaussian and non-parametric bootstrap approaches to draw samples from an incoherent cross-temporal distribution. To improve the estimation of the forecast error covariance matrix, we propose using multi-step residuals, especially in the time dimension where the usual one-step residuals fail. To address high-dimensionality issues, we present four alternatives for the covariance matrix, where we exploit the two-fold nature (cross-sectional and temporal) of the cross-temporal structure, and introduce the idea of overlapping residuals. We evaluate the proposed methods through a detailed simulation study that investigates their theoretical and empirical properties. We further assess the effectiveness of the proposed cross-temporal reconciliation approach by applying it to two empirical forecasting experiments, using the Australian GDP and the Australian Tourism Demand datasets. For both applications, we show that the optimal cross-temporal reconciliation approaches significantly outperform the incoherent base forecasts in terms of the Continuous Ranked Probability Score and the Energy Score. Overall, our study expands and unifies the notation for cross-sectional, temporal and cross-temporal reconciliation, thus extending and deepening the probabilistic cross-temporal framework. The results highlight the potential of the proposed cross-temporal forecast reconciliation methods in improving the accuracy of probabilistic forecasting models.


Introduction
Forecast reconciliation is a post-forecasting process intended to improve the quality of forecasts for a system of linearly constrained multiple time series. There are many fields where forecast reconciliation is useful, such as when forecasting GDP and its components, electricity demand and power generation, demand in supply chains with product categories, tourist flows across geographic regions and travel purpose, and more. Moreover, effective decision-making depends on the support of accurate and coherent forecasts.
Classical reconciliation methods addressed the issue of incoherent forecasts in a cross-sectional hierarchy by forecasting only one level and using these forecasts to generate forecasts for the remaining series. The bottom-up approach (Dunn et al. 1976) starts by generating forecasts at the most disaggregate level and summing these to arrive at the desired forecasts for aggregate levels. On the other hand, the top-down approach (Gross & Sohl 1990) forecasts the most aggregated level and then disaggregates it to lower levels (Fliedner 2001, Athanasopoulos et al. 2009). The middle-out method (Athanasopoulos et al. 2009) forecasts an intermediate level, applying bottom-up aggregation above it and top-down disaggregation below it. All of these approaches ignore useful information available at other levels (Pennings & van Dalen 2017). Consequently, in the last decade, hierarchical forecasting has significantly evolved to include modern least squares-based reconciliation techniques in the cross-sectional framework (Wickramasuriya et al. 2019), later extended to temporal hierarchies (Nystrup et al. 2020). Obtaining coherent forecasts across both the cross-sectional and temporal dimensions (known as cross-temporal coherence) has been limited to sequential approaches that address each dimension separately (Kourentzes & Athanasopoulos 2019, Yagli et al. 2019, Punia et al. 2020, Spiliotis et al. 2020). Recently, Di Fonzo & Girolimetto (2023a) suggested a unified reconciliation step that takes into account both the cross-sectional and temporal dimensions, instead of dealing with them separately, utilizing the entire cross-temporal hierarchy. However, these cross-temporal works focus on point forecasting, and do not consider distributional or probabilistic forecasts (Gneiting & Katzfuss 2014), which have recently been studied in the cross-sectional reconciliation setting by, among others, Corani et al. (2022) and Wickramasuriya (2023). Panagiotelis et al. (2023) made a significant contribution by formalizing cross-sectional probabilistic reconciliation using the geometric framework for point forecast reconciliation of Panagiotelis et al. (2021). They show how a reconciled forecast can be constructed from an arbitrary base forecast, both when its density is available and when only a sample can be drawn. They also show that in the case of elliptical distributions, the correct predictive distribution can be recovered via linear reconciliation, regardless of the base forecast location and scale parameters, and derive conditions for this to hold in the special case of reconciliation via projection.
In this paper, we extend cross-sectional probabilistic reconciliation to the cross-temporal case, working on issues related to the two-fold nature of this framework. First, we revise and develop the notation proposed by Di Fonzo & Girolimetto (2023a) to generalize the work of Panagiotelis et al. (2023). This allows us to move from cross-temporal point reconciliation to a probabilistic setting through the generalization of definitions and theorems well-established in the cross-sectional framework. Second, we propose effective and practical solutions to draw a sample from the base forecast distribution according to either a parametric approach that assumes Gaussianity or a non-parametric approach that bootstraps the base model residuals.
Third, we propose some solutions to specific problems that arise when combining the crosssectional and temporal dimensions. We propose using multi-step residuals to estimate the relationships between different forecast horizons when we deal with temporal levels, since one-step residuals are not suitable for this purpose. To solve high-dimensionality issues we introduce the idea of overlapping residuals and consider alternative forms for constructing the covariance matrix. Fourth, we propose new shrinkage procedures for reconciliation that aim to identify a feasible cross-temporal structure. The methodological contributions described in this paper are implemented in the FoReco package (Girolimetto & Di Fonzo 2023) for R (R Core Team 2022). Furthermore, the Appendix contains complementary materials on methodological and practical issues, and supplementary tables and graphs related to the empirical applications.
The remainder of the paper is structured as follows. In Section 2, we provide a unified notation for cross-sectional, temporal and cross-temporal point reconciliation. We generalize the cross-sectional definitions and theorems developed by Panagiotelis et al. (2023) in Section 3, and propose both a parametric Gaussian and a non-parametric bootstrap approach to draw a sample from the base forecast distribution. In Section 4, we analyze the structure of the cross-temporal covariance matrix, proposing four alternative forms, and propose shrinkage approaches for reconciliation. In addition, we explore cross-temporal residuals (overlapping and multi-step), looking at their advantages and limitations. Two empirical applications using the Australian GDP and the Australian Tourism Demand datasets are considered in Sections 5 and 6, respectively. Finally, Section 7 presents conclusions and a future research agenda on this and other related topics.

Notation and definitions
Let $\mathbf{y}_t = [y_{1,t}, \ldots, y_{i,t}, \ldots, y_{n,t}]'$ be an $n$-variate linearly constrained time series observed at the most temporally disaggregated level, with a seasonality of period $m$ (e.g., $m = 12$ for monthly data, $m = 4$ for quarterly data, $m = 24$ for hourly data). Suppose that the constraints are expressed by linear equations such that

$$C_{cs}\,\mathbf{y}_t = \mathbf{0}_{(n_a \times 1)}, \qquad t = 1, \ldots, T, \qquad (1)$$

where $C_{cs}$ is the $(n_a \times n)$ zero-constraints cross-sectional matrix, which can be seen as the coefficient matrix of a linear system with $n_a$ equations and $n$ variables.

Figure 1: (a) A two-level cross-sectional hierarchy with $y_{T,t} = y_{X,t} + y_{Y,t}$; (b) the temporal hierarchy of a quarterly series, with semi-annual aggregates $x^{[2]}_{i,2\tau-1}, x^{[2]}_{i,2\tau}$ and annual aggregate $x^{[4]}_{i,\tau}$ built from the quarterly observations $y_{i,4\tau-3}, \ldots, y_{i,4\tau}$.
An example is a hierarchical time series, where series at upper levels can be expressed by appropriately summing part or all of the series at the bottom level. Figure 1(a) shows the two-level hierarchical structure for three linearly constrained time series such that $y_{T,t} = y_{X,t} + y_{Y,t}$, for $t = 1, \ldots, T$. Now let $\mathbf{y}_t = [\mathbf{u}_t' \;\; \mathbf{b}_t']'$, where $\mathbf{u}_t = [y_{1,t}, \ldots, y_{n_a,t}]'$ is the $n_a$-vector of upper-level time series and $\mathbf{b}_t = [y_{(n_a+1),t}, \ldots, y_{n,t}]'$ is the $n_b$-vector of bottom-level time series, with $n = n_a + n_b$. The upper and bottom level time series are connected by the cross-sectional aggregation matrix $A_{cs}$ such that $\mathbf{u}_t = A_{cs}\mathbf{b}_t$. Following Di Fonzo & Girolimetto (2023a), we can always construct a zero-constraints cross-sectional matrix from the aggregation matrix, $C_{cs} = [I_{n_a} \;\; -A_{cs}]$. Finally, the cross-sectional structural matrix is given by $S_{cs} = [A_{cs}' \;\; I_{n_b}]'$, providing the structural representation $\mathbf{y}_t = S_{cs}\mathbf{b}_t$. Considering the hierarchical example in Figure 1(a), we have $A_{cs} = [1 \;\; 1]$, $C_{cs} = [1 \;\; -1 \;\; -1]$ and $S_{cs} = [A_{cs}' \;\; I_2]'$. In general, there is no reason for $\mathbf{u}_t$ to be restricted to simple sums of $\mathbf{b}_t$; therefore $A_{cs} \in \mathbb{R}^{n_a \times n_b}$ may contain any real values, not only 0s and 1s.
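To fix ideas, the following R sketch (toy values; the FoReco package provides general implementations) constructs $A_{cs}$, $C_{cs}$ and $S_{cs}$ for the hierarchy of Figure 1(a) and verifies that a structurally generated vector satisfies the zero constraints.

```r
# Two-level hierarchy of Figure 1(a): Total = X + Y, so n = 3, n_a = 1, n_b = 2
A_cs <- matrix(c(1, 1), nrow = 1)        # cross-sectional aggregation matrix
C_cs <- cbind(diag(1), -A_cs)            # zero-constraints matrix [I_{n_a} -A_cs]
S_cs <- rbind(A_cs, diag(2))             # structural matrix [A_cs' I_{n_b}']'

b_t <- c(3, 5)                           # toy bottom-level observations (X, Y)
y_t <- S_cs %*% b_t                      # coherent vector (Total, X, Y)'
all(abs(C_cs %*% y_t) < 1e-12)           # TRUE: constraints are satisfied
```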
Considering now the temporal framework, we denote by $\mathcal{K} = \{k_p, k_{p-1}, \ldots, k_2, k_1\}$ the set of $p$ factors of $m$, in descending order, where $k_1 = 1$ and $k_p = m$. Given a factor $k$ of $m$, and assuming that $T = Nm$ (where $N$ is the length of the most temporally aggregated version of the series), we can construct a temporally aggregated version of the time series of a single variable $\{y_{i,t}\}_{t=1,\ldots,T}$ through the non-overlapping sums of its $k$ successive values, which has a seasonal period equal to $M_k = m/k$:

$$x^{[k]}_{i,j} = \sum_{t=(j-1)k+1}^{jk} y_{i,t}, \qquad j = 1, \ldots, N_k, \quad i = 1, \ldots, n,$$

where $N_k = T/k$ and $x^{[1]}_{i,t} = y_{i,t}$. Define $\tau$ as the observation index of the most aggregated level $k_p$. For a fixed temporal aggregation order $k \in \mathcal{K}$, we stack the observations in the column vector $\mathbf{x}^{[k]}_{i,\tau} = [x^{[k]}_{i,M_k(\tau-1)+1},\; x^{[k]}_{i,M_k(\tau-1)+2},\; \ldots,\; x^{[k]}_{i,M_k\tau}]'$, and obtain the vector for all the temporal aggregation orders $\mathbf{x}_{i,\tau} = [\mathbf{x}^{[k_p]\prime}_{i,\tau}, \ldots, \mathbf{x}^{[1]\prime}_{i,\tau}]'$, $\tau = 1, \ldots, N$. The structural representation of the temporal hierarchy is then $\mathbf{x}_{i,\tau} = S_{te}\,\mathbf{x}^{[1]}_{i,\tau}$, where $S_{te} = [A_{te}' \;\; I_m]'$ is the $[(m + k^*) \times m]$ temporal structural matrix, $A_{te} = \left[\mathbf{1}_{k_p}' \;\; \left(I_{m/k_{p-1}} \otimes \mathbf{1}_{k_{p-1}}'\right)' \;\; \ldots \;\; \left(I_{m/k_2} \otimes \mathbf{1}_{k_2}'\right)'\right]'$ is the $(k^* \times m)$ temporal aggregation matrix with $k^* = \sum_{k \in \mathcal{K}\setminus\{k_1\}} M_k$, and $\otimes$ is the Kronecker product. For each series $\mathbf{x}_{i,\tau}$, $i = 1, \ldots, n$, we also have the zero-constrained representation $C_{te}\,\mathbf{x}_{i,\tau} = \mathbf{0}_{(k^* \times 1)}$, $\tau = 1, \ldots, N$, where $C_{te} = [I_{k^*} \;\; -A_{te}]$ is the $[k^* \times (m + k^*)]$ zero-constraints temporal matrix. Figure 1(b) shows the hierarchical representation of a quarterly time series, for which $m = 4$ and $\mathcal{K} = \{4, 2, 1\}$. When we temporally aggregate each series, the cross-sectional constraints for the most temporally disaggregated series (1) hold for all the temporal aggregation orders, such that $C_{cs}\,\mathbf{x}^{[k]}_j = \mathbf{0}_{(n_a \times 1)}$ for $k \in \mathcal{K}$ and $j = 1, \ldots, N_k$, where $\mathbf{u}^{[k]}_j = [x^{[k]}_{1,j}, \ldots, x^{[k]}_{n_a,j}]'$ is the $n_a$-vector of upper time series and $\mathbf{b}^{[k]}_j = [x^{[k]}_{(n_a+1),j}, \ldots, x^{[k]}_{n,j}]'$ is the $n_b$-vector of bottom time series in the temporal hierarchy.
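The temporal matrices can be built analogously; a minimal R sketch for the quarterly example of Figure 1(b) follows (toy values).

```r
# Quarterly series: m = 4, K = {4, 2, 1}, so k* = M_2 + M_4 = 2 + 1 = 3
m <- 4
A_te <- rbind(matrix(1, 1, m),                       # k = 4: 1'_4
              kronecker(diag(m / 2), t(rep(1, 2))))  # k = 2: I_2 (x) 1'_2
S_te <- rbind(A_te, diag(m))                         # [(m + k*) x m] = 7 x 4
C_te <- cbind(diag(nrow(A_te)), -A_te)               # [k* x (m + k*)] = 3 x 7

x1 <- c(10, 20, 30, 40)                              # one year of quarterly data
x  <- S_te %*% x1                                    # annual, semi-annual, quarters
all(abs(C_te %*% x) < 1e-12)                         # TRUE
```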
To include both cross-sectional and temporal constraints at the same time in a unified framework, we stack the series into an $[n \times (m + k^*)]$ matrix $X_\tau$, whose rows and columns represent, respectively, the cross-sectional and the temporal dimensions:

$$X_\tau = \begin{bmatrix} \mathbf{x}_{1,\tau}' \\ \vdots \\ \mathbf{x}_{n,\tau}' \end{bmatrix} = \begin{bmatrix} U^{[k_p]}_\tau & \cdots & U^{[k_2]}_\tau & U^{[1]}_\tau \\ B^{[k_p]}_\tau & \cdots & B^{[k_2]}_\tau & B^{[1]}_\tau \end{bmatrix},$$

where, for any fixed $k$, $U^{[k]}_\tau$ is the $(n_a \times M_k)$ matrix grouping the upper time series and $B^{[k]}_\tau$ is the $(n_b \times M_k)$ matrix grouping the bottom time series. Further, $C_{cs}\,X_\tau = \mathbf{0}_{[n_a \times (m + k^*)]}$ and $C_{te}\,X_\tau' = \mathbf{0}_{(k^* \times n)}$. We can consider the cross-temporal framework as a generalization of the cross-sectional and temporal frameworks that simultaneously takes into account both types of constraints. The cross-sectional reconciliation framework is obtained by assuming $m = 1$, while the temporal one is obtained when $n = 1$ (with $n_a = 0$ and $n_b = 1$).
Di Fonzo & Girolimetto (2023a) show that the cross-temporal constraints, working on the complete set of observations corresponding to time period $\tau$, can be expressed in a zero-constrained representation through the full rank $[(n_a m + n k^*) \times n(m + k^*)]$ zero-constraints cross-temporal matrix $C_{ct}$, such that

$$C_{ct}\,\mathbf{x}_\tau = \mathbf{0}_{[(n_a m + n k^*) \times 1]}, \qquad C_{ct} = \begin{bmatrix} C^* \\ I_n \otimes C_{te} \end{bmatrix},$$

where $\mathbf{x}_\tau = \mathrm{vec}(X_\tau') = [\mathbf{x}_{1,\tau}', \ldots, \mathbf{x}_{n,\tau}']'$, $C^* = [\mathbf{0}_{(n_a m \times n k^*)} \;\; I_m \otimes C_{cs}]\,P$, and $P$ is the commutation matrix (Magnus & Neudecker 2019, p. 54) such that $P\,\mathrm{vec}(Y_\tau) = \mathrm{vec}(Y_\tau')$. A structural representation can be considered as well: $\mathbf{x}_\tau = S_{ct}\,\mathbf{b}^{[1]}_\tau$, where $S_{ct} = S_{cs} \otimes S_{te}$ is the $[n(k^* + m) \times n_b m]$ cross-temporal summation matrix, $s : \mathbb{R}^{n_b m} \to \mathbb{R}^{n(m + k^*)}$ is the operator describing the pre-multiplication by $S_{ct}$, and $\mathbf{b}^{[1]}_\tau = \mathrm{vec}(B^{[1]\prime}_\tau)$. In agreement with Panagiotelis et al. (2021), $\mathbf{x}_\tau$ lies in an $(n_b m)$-dimensional subspace $s_{ct}$ of $\mathbb{R}^{n(k^* + m)}$, which we refer to as the cross-temporal coherent subspace, spanned by the columns of $S_{ct}$.
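Continuing the toy example, the cross-temporal summation matrix is simply the Kronecker product of the two structural matrices:

```r
# Cross-temporal summation matrix for the toy example: n_b = 2, m = 4, k* = 3
S_ct <- kronecker(S_cs, S_te)            # [n(m + k*) x n_b m] = 21 x 8
b1   <- c(3, 5, 2, 4,                    # four quarters of bottom series X
          1, 6, 7, 2)                    # four quarters of bottom series Y (series-major order)
x_tau <- S_ct %*% b1                     # all 21 coherent cross-temporal values
```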

Optimal point forecast reconciliation
Let $\widehat{\mathbf{x}}_h = \mathrm{vec}(\widehat{X}_h')$, $h = 1, \ldots, H$, be the $h$-step-ahead base forecasts (however obtained), with error covariance matrix $W_h = \mathrm{Var}(\widehat{\mathbf{x}}_h - \mathbf{x}_h)$, where $H$ is the forecast horizon for the most temporally aggregated time series. Denote by $\widehat{U}^{[k]}_h$ the $(n_a \times M_k)$ matrix grouping the upper time series and by $\widehat{B}^{[k]}_h$ the $(n_b \times M_k)$ matrix grouping the bottom time series for a given temporal aggregation order $k$. The matrix $\widehat{X}_h$ contains incoherent forecasts, that is, $C_{ct}\,\widehat{\mathbf{x}}_h \neq \mathbf{0}_{[(n_a m + n k^*) \times 1]}$ for $h = 1, \ldots, H$. In this framework, the definition of forecast reconciliation in the cross-sectional framework given by Panagiotelis et al. (2021) can be generalized as follows.
Definition 2.1. Forecast reconciliation aims to adjust the base forecast $\widehat{\mathbf{x}}_h$ by finding a mapping $\psi : \mathbb{R}^{n(m+k^*)} \to s_{ct}$ such that the reconciled forecast is $\widetilde{\mathbf{x}}_h = \psi(\widehat{\mathbf{x}}_h)$.

For a given forecast horizon $h = 1, \ldots, H$, the mapping $\psi$ may be defined as a projection onto $s_{ct}$ given by

$$\widetilde{\mathbf{x}}_h = M\,\widehat{\mathbf{x}}_h, \qquad M = I_{n(m+k^*)} - \Omega_{ct}\,C_{ct}'\left(C_{ct}\,\Omega_{ct}\,C_{ct}'\right)^{-1} C_{ct}, \qquad (5)$$

for a positive definite matrix $\Omega_{ct}$. Wickramasuriya et al. (2019) showed that the minimum variance linear unbiased reconciled forecasts, satisfying the unbiasedness condition $\mathbb{E}(\widetilde{\mathbf{x}}_h - \mathbf{x}_h) = \mathbf{0}$, are given by (5) when $\Omega_{ct} = W_h$. Alternatively, the cross-temporal reconciled forecasts may be found according to the structural approach proposed by Hyndman et al. (2011) for the cross-sectional framework, yielding $\widetilde{\mathbf{x}}_h = S_{ct}\,G\,\widehat{\mathbf{x}}_h$ for some matrix $G$. Wickramasuriya et al. (2019) showed that this leads to a solution equivalent to the cross-temporally reconciled forecasts in (5), given by

$$\widetilde{\mathbf{x}}_h = S_{ct}\,G\,\widehat{\mathbf{x}}_h, \qquad G = \left(S_{ct}'\,\Omega_{ct}^{-1}\,S_{ct}\right)^{-1} S_{ct}'\,\Omega_{ct}^{-1}, \qquad (6)$$

with $M = S_{ct}G$. In this case, $\psi$ is the composition of two transformations, say $s \circ g$, where $g : \mathbb{R}^{n(m+k^*)} \to \mathbb{R}^{n_b m}$ is a continuous function. In Appendix A we report some cross-sectional, temporal and cross-temporal approximations for the covariance matrix to be used in (5) and (6).
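For small systems, (5) and (6) can be implemented directly; the following R sketch continues the toy example above (with $\Omega_{ct} = I$, i.e., the oct(ols) choice, and randomly perturbed base forecasts).

```r
# Optimal cross-temporal point reconciliation via the projection in eq. (5)
reconcile <- function(x_hat, C_ct, Omega_ct) {
  # M = I - Omega C'(C Omega C')^{-1} C; solve() avoids an explicit inverse
  CO <- C_ct %*% Omega_ct
  M  <- diag(nrow(Omega_ct)) - t(CO) %*% solve(CO %*% t(C_ct), C_ct)
  M %*% x_hat
}

# Equivalent structural form, eq. (6), with Omega_ct = I:
set.seed(1)
x_hat   <- x_tau + rnorm(length(x_tau), sd = 0.1)   # incoherent base "forecasts"
G       <- solve(t(S_ct) %*% S_ct, t(S_ct))
x_tilde <- S_ct %*% (G %*% x_hat)                   # coherent by construction
```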

Cross-temporal bottom-up forecast reconciliation
The classic bottom-up approach (Dunn et al. 1976, Dangerfield & Morris 1992) simply consists in summing up the base forecasts of the most disaggregated level in the hierarchy to obtain forecasts of the upper-level series. To reduce the computational cost involved in optimal cross-temporal reconciliation, we may be interested in applying a reconciliation along only one dimension (cross-sectional or temporal) and reconstructing the cross-temporal structure using a partly bottom-up approach (e.g., Sanguri et al. 2022). On the left (Figure 2a), we first compute cross-sectionally reconciled forecasts at the highest frequency (k = 1), and then apply temporal bottom-up to obtain coherent cross-temporal forecasts. On the right (Figure 2b), we first compute temporally reconciled forecasts for the most disaggregated cross-sectional level, and then apply the cross-sectional bottom-up. We denote these two-step reconciliation approaches, respectively, as ct(rec_cs, bu_te) and ct(rec_te, bu_cs), where 'rec_te' and 'rec_cs' denote a forecast reconciliation approach in the temporal and cross-sectional dimensions, and 'bu_cs' and 'bu_te' denote using bottom-up in the cross-sectional and temporal dimensions, respectively. It is worth noting that the simple cross-temporal bottom-up approach corresponds to ct(bu_cs, bu_te) = ct(bu_te, bu_cs) = ct(bu).

Probabilistic forecast reconciliation
To introduce the idea of coherence and probabilistic forecast reconciliation, we adapt the notations and the formal definitions introduced in Wickramasuriya (2023) and Panagiotelis et al. (2023) for the cross-sectional probabilistic case. These definitions can also be generalized to the cross-temporal framework by following the approach developed by Corani et al. (2022) for count data. However, in this paper we only focus on the continuous case.
Our aim is to extend these definitions to cross-temporal coherent probabilistic forecasts and cross-temporal probabilistic forecast reconciliation. Let $(\mathbb{R}^{n_b m}, \mathscr{F}_{\mathbb{R}^{n_b m}}, \nu)$ be a probability space for the bottom time series $\mathbf{b}^{[1]}_\tau$, where $\mathscr{F}_{\mathbb{R}^{n_b m}}$ is the Borel $\sigma$-algebra on $\mathbb{R}^{n_b m}$. Then a $\sigma$-algebra $\mathscr{F}_s$ can be constructed from the collection of sets $s(\mathcal{B})$ for all $\mathcal{B} \in \mathscr{F}_{\mathbb{R}^{n_b m}}$.

Definition 3.1 (Cross-temporal coherent probabilistic forecasts). Given the probability space $(\mathbb{R}^{n_b m}, \mathscr{F}_{\mathbb{R}^{n_b m}}, \nu)$, we define the coherent probability space as the triple $(s, \mathscr{F}_s, \check{\nu})$ satisfying the following property: $\check{\nu}(s(\mathcal{B})) = \nu(\mathcal{B})$, $\forall \mathcal{B} \in \mathscr{F}_{\mathbb{R}^{n_b m}}$.
Let $(\mathbb{R}^{n(m+k^*)}, \mathscr{F}_{\mathbb{R}^{n(m+k^*)}}, \widehat{\nu})$ be a probability space referring to the incoherent probabilistic forecast $\widehat{\nu}$ for all the $n$ series in the system at any temporal aggregation order $k \in \mathcal{K}$.

Definition 3.2 (Cross-temporal probabilistic forecast reconciliation). The reconciled probabilistic forecast $\widetilde{\nu}$ is defined by $\widetilde{\nu}(\mathcal{A}) = \widehat{\nu}(\psi^{-1}(\mathcal{A}))$ for all $\mathcal{A} \in \mathscr{F}_s$, where $\psi : \mathbb{R}^{n(m+k^*)} \to s$ is a mapping onto the coherent subspace. The map $\psi$ may be obtained as the composition $s \circ g$, as for the cross-temporal point reconciliation (6).

Theorem 3.1. If $(\widehat{\mathbf{x}}^{(1)}, \ldots, \widehat{\mathbf{x}}^{(L)})$ is a sample drawn from the incoherent probability measure $\widehat{\nu}$, then $(\psi(\widehat{\mathbf{x}}^{(1)}), \ldots, \psi(\widehat{\mathbf{x}}^{(L)}))$ is a sample drawn from the reconciled measure $\widetilde{\nu}$.
Theorem 3.1 is the cross-temporal extension of Theorem 4.5 in Panagiotelis et al. (2023). It means that a sample from the reconciled distribution can be obtained by reconciling each member of a sample from the incoherent distribution. With this result, we can separate the mechanism used to generate the base forecast samples from the reconciliation phase.

Parametric framework: Gaussian reconciliation
It is possible to obtain a reconciled probabilistic forecast analytically for some parametric distributions, such as the multivariate normal (Corani et al. 2021, Eckert et al. 2021, Panagiotelis et al. 2023, Wickramasuriya 2023). In the cross-sectional framework, Panagiotelis et al. (2023) show that, starting from an elliptical distribution for the base forecasts, the reconciled forecast distribution is also elliptical. Using the results shown in Section 2, we extend this result to the cross-temporal case (we assume $H = 1$ and simplify the notation by removing the $h$ suffix, without loss of generality). To obtain a reconciled forecast using the multivariate normal distribution, we start with base forecasts distributed as $N(\widehat{\mathbf{x}}, \Sigma)$, where $\widehat{\mathbf{x}}$ is the mean vector and $\Sigma$ is the covariance matrix of the base forecasts. The reconciled forecast distribution is then given by $N(\widetilde{\mathbf{x}}, \widetilde{\Omega})$, where

$$\widetilde{\mathbf{x}} = M\,\widehat{\mathbf{x}}, \qquad \widetilde{\Omega} = M\,\Sigma\,M', \qquad (8)$$

and $M$ is the projection matrix defined in (5). Note that if we assume $\Sigma = \Omega_{ct}$, then the covariance matrix in (8) simplifies to $\widetilde{\Omega} = M\,\Omega_{ct}$. In the cross-temporal case, sensibly estimating the covariance matrix $\Sigma$ can be difficult because we need to simultaneously consider both the temporal and cross-sectional structures. This requires many parameters to be estimated, which can be challenging in practice. Additionally, naively using one-step residuals to estimate the cross-temporal correlation structure can lead to an inappropriate estimate of the covariance matrix, in which some temporal covariances are fixed to zero by construction (see Appendix C for more details). These challenges are explored in more depth in the following sections.
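Combining (8) with Theorem 3.1, a reconciled Gaussian sample can be obtained by drawing from the base distribution and projecting each draw. A sketch reusing the toy objects above ($\Sigma$ is an arbitrary illustrative matrix):

```r
library(MASS)  # mvrnorm()

set.seed(123)
Sigma <- diag(0.1, length(x_hat))            # illustrative base forecast covariance
draws <- mvrnorm(n = 1000, mu = as.vector(x_hat), Sigma = Sigma)

# Reconciling every draw (Theorem 3.1) yields a sample from N(M x_hat, M Sigma M')
M <- S_ct %*% G                              # projection matrix, eq. (6)
reconciled_draws <- draws %*% t(M)           # each row is a coherent sample path
```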
Focusing on the computational aspect, we can take several steps to reduce the time required to obtain simulations from the reconciled forecast distribution (to sample from the base forecast Gaussian distribution we use, among others, the R package MASS, Venables & Ripley 2002).

Non-parametric framework: bootstrap reconciliation
Analytical expressions for the base and reconciled forecast distributions are sometimes challenging to obtain. Furthermore, parametric assumptions can be restrictive and unrealistic.
We propose a procedure called cross-temporal joint (block) bootstrap (ctjb) to generate samples from the base forecast distributions that preserve cross-temporal relationships. This approach involves drawing samples of all series simultaneously from the most temporally aggregated level, and using the most temporally aggregated level to determine the corresponding time indices for the other levels.
Let $E^{[k]}$ be the $(n \times N_k)$ matrix of the residuals for $k \in \mathcal{K}$. Figure 4 (on the left) provides a visualization of these matrices and how they are related to each other for the example in Figure 1. It is assumed that the residuals cover four years ($N = 4$): the green color corresponds to the first year, the blue to the second year, and so on. Further, let $\mathcal{M}_i$ be the model used to calculate the base forecasts and residuals for the $i$th series. Assuming $H = 1$, let $\tau^{(\ell)}$ be a random draw with replacement from $\{1, \ldots, N\}$; the $\ell$th bootstrap incoherent sample is $\widehat{\mathbf{x}}^{[k](\ell)}_i = f_i(\mathcal{M}_i, \widehat{\mathbf{e}}^{[k](\ell)}_i)$, where $f_i(\cdot)$ depends on the fitted model $\mathcal{M}_i$. That is, $\widehat{\mathbf{x}}^{[k](\ell)}_i$ is a sample path simulated for the $i$th series, with error approximated by the corresponding block-bootstrapped sample residuals $\widehat{\mathbf{e}}^{[k](\ell)}_i$, the $i$th row of $E^{[k](\ell)}$. Figure 4 (on the right) shows $E^{[k](\ell)}$ for the quarterly cross-temporal hierarchy in Figure 1.

Figure 4: On the left, the residual matrices with 4 years of data (N = 4): the green, blue, red and black colors correspond, respectively, to years 1, 2, 3 and 4. On the right, the bootstrapped residuals.

One of the main advantages of the cross-temporal joint bootstrap is that it accurately accounts for the dependence between the different levels of temporal aggregation, and not only the cross-sectional dependencies. By sampling residuals from the most temporally aggregated level and using it to determine the indices for the other levels, we ensure that the bootstrap sample reflects the underlying data distribution. Additionally, the cross-temporal joint bootstrap is easy to implement for many forecasting models, making it a practical and efficient tool. Furthermore, this approach is easily parallelized across individual series, which can be especially useful when dealing with large datasets or when trying to speed up the analysis.
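A schematic R implementation of one bootstrap replicate may look as follows; `sim_path` and the nested `models` list are illustrative placeholders, not a package API, since the simulation step depends on the model class.

```r
# One replicate of the cross-temporal joint block bootstrap (H = 1): a sketch.
# resid_list[[as.character(k)]] is the n x N_k residual matrix E^[k];
# models[[i]][[as.character(k)]] is the fitted model for series i at order k;
# sim_path(model, innov) is a placeholder for model-specific path simulation.
ctjb_sample <- function(models, resid_list, K, N) {
  m   <- max(K)
  tau <- sample.int(N, 1)                     # jointly drawn block: one "year"
  lapply(K, function(k) {
    Mk   <- m / k                             # columns per year at order k
    cols <- (Mk * (tau - 1) + 1):(Mk * tau)   # matching time indices at order k
    E_k  <- resid_list[[as.character(k)]]
    t(sapply(seq_len(nrow(E_k)), function(i)  # simulate every series jointly
      sim_path(models[[i]][[as.character(k)]], innov = E_k[i, cols])))
  })
}
```

Because the block index `tau` is drawn once and reused at every aggregation order, both cross-sectional and temporal dependence in the residuals are preserved.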

Cross-temporal covariance matrix estimation
As the covariance matrix $\Omega$ is unknown in practice, a natural estimate is the empirical sample covariance matrix of the base forecast residuals, $\widehat{\Omega}$. In this section, our focus is exclusively on the cross-temporal framework; this means that we have to estimate $r = n(k^* + m)[n(k^* + m) - 1]/2$ distinct parameters. A possible solution when we have fewer observations than $r$ is to construct a shrinkage estimator (Efron 1975, Efron & Morris 1975, 1977), using a convex combination of $\widehat{\Omega}$ and a diagonal target matrix $\widehat{\Omega}_D = \widehat{\Omega} \odot I_{n(k^*+m)}$, such that $\widehat{\Omega}_G = \lambda\,\widehat{\Omega}_D + (1 - \lambda)\,\widehat{\Omega}$, where $\lambda \in [0, 1]$ is the shrinkage intensity parameter, which shrinks the off-diagonal elements towards zero. $\widehat{\Omega}_G$ corresponds to the matrix used by the reconciliation approach oct(shr) shown in Appendix A. However, shrinking all off-diagonal elements to zero, when we know that the covariance matrix has a cross-sectional and/or temporal structure, results in information loss.
Therefore, we propose to estimate a smaller matrix, and to use the cross-sectional and/or temporal structure to obtain a better estimator for the covariance matrix of the entire system.
Given that $S_{ct} = S_{cs} \otimes S_{te}$, it is possible to express the actual covariance matrix in terms of three smaller matrices, such that

$$\Omega = S_{ct}\,\Omega_{hf\text{-}bts}\,S_{ct}' = (I_n \otimes S_{te})\,\Omega_{hf}\,(I_n \otimes S_{te})' = (S_{cs} \otimes I_{m+k^*})\,\Omega_{bts}\,(S_{cs} \otimes I_{m+k^*})', \qquad (9)$$

where $\Omega_{hf\text{-}bts}$ is the $(n_b m \times n_b m)$ covariance matrix of the bottom time series at temporal aggregation order $k = 1$ (highest-frequency bottom time series), $\Omega_{hf}$ is the $(nm \times nm)$ covariance matrix of all the high-frequency time series, and $\Omega_{bts}$ is the $[n_b(k^* + m) \times n_b(k^* + m)]$ covariance matrix of the bottom time series at any temporal aggregation order.
Therefore, we can apply the idea of "Stein-type shrinkage" (Efron & Morris 1977) to $\widehat{\Omega}_{hf\text{-}bts}$, $\widehat{\Omega}_{hf}$ and $\widehat{\Omega}_{bts}$, estimated using the corresponding empirical base forecast residuals. We obtain the following expressions (see Appendix B for details), illustrated in the sketch after this list:

• High frequency Bottom time series shrinkage matrix (HB): $\widehat{\Omega}_{HB} = \lambda\,S_{ct}\,\widehat{\Omega}_{hf\text{-}bts,D}\,S_{ct}' + (1 - \lambda)\,S_{ct}\,\widehat{\Omega}_{hf\text{-}bts}\,S_{ct}'$;

• High frequency shrinkage matrix (H): $\widehat{\Omega}_{H} = \lambda\,(I_n \otimes S_{te})\,\widehat{\Omega}_{hf,D}\,(I_n \otimes S_{te})' + (1 - \lambda)\,(I_n \otimes S_{te})\,\widehat{\Omega}_{hf}\,(I_n \otimes S_{te})'$;

• Bottom time series shrinkage matrix (B): $\widehat{\Omega}_{B} = \lambda\,(S_{cs} \otimes I_{m+k^*})\,\widehat{\Omega}_{bts,D}\,(S_{cs} \otimes I_{m+k^*})' + (1 - \lambda)\,(S_{cs} \otimes I_{m+k^*})\,\widehat{\Omega}_{bts}\,(S_{cs} \otimes I_{m+k^*})'$;

where $\widehat{\Omega}_{l,D} = \widehat{\Omega}_l \odot I$, $l \in \{hf\text{-}bts,\; hf,\; bts\}$ (with identity of conformable dimension), and $\lambda$ is the shrinkage parameter.
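A compact R sketch of these constructions follows, continuing the toy example; `Om_hf_bts`, `Om_hf` and `Om_bts` are assumed to be the empirical residual-based estimates described above.

```r
# Shrink a covariance estimate toward its diagonal target
shrink_diag <- function(Om, lambda) lambda * diag(diag(Om)) + (1 - lambda) * Om

# Expand the shrunk small matrices to the full cross-temporal covariance.
# Assumed inputs (toy dimensions): Om_hf_bts (8 x 8), Om_hf (12 x 12), Om_bts (14 x 14)
n <- nrow(S_cs); mk <- nrow(S_te)                   # n = 3, m + k* = 7
Om_HB <- S_ct %*% shrink_diag(Om_hf_bts, 0.5) %*% t(S_ct)
Om_H  <- kronecker(diag(n), S_te) %*% shrink_diag(Om_hf, 0.5) %*%
         t(kronecker(diag(n), S_te))
Om_B  <- kronecker(S_cs, diag(mk)) %*% shrink_diag(Om_bts, 0.5) %*%
         t(kronecker(S_cs, diag(mk)))
```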
Another important aspect is the number of parameters to be estimated through the residuals of the base forecasts. In Table 1 we report the number of distinct parameters for the two forecasting experiments: Australian GDP (see Section 5) and Australian Tourism Demand (see Section 6). In addition, we also report the percentage reductions in the number of parameters compared to the global approach. As we can see, G involves a considerably larger number of parameters than the other estimators. HB leads to the largest decrease, around 85%, whereas approaches H and B lie somewhere between G and HB. In general, as m and n increase (see Appendix B), using H requires the estimation of fewer parameters than B.
In the forecasting experiments that follow and in the simulation in Appendix C, we closely analyze these different constructions with a dual purpose. In particular, we use the full covariance matrix (λ = 0) of the base forecasts to obtain base forecast samples of the linearly constrained time series under Gaussianity. We also use the shrinkage versions as approximations of the covariance matrix to be used for reconciliation. This will allow us to better understand the properties and abilities of each parameterization.

Multi-step residuals
Model residuals may be used to estimate the covariance matrix in cross-temporal forecast reconciliation. In time series analysis, it is common to use residuals corresponding to one-step-ahead forecasts. However, due to the temporal dimension in our setting, residuals corresponding to different forecast horizons are required. Thus, we define the multi-step residuals

$$e^{[k]}_{i,j+h|j} = x^{[k]}_{i,j+h} - \widehat{x}^{[k]}_{i,j+h|j},$$

where $i = 1, \ldots, n$, $j = 1, \ldots, N_k$, and $\widehat{x}^{[k]}_{i,j+h|j}$ is the $h$-step fitted value, calculated as the $h$-step-ahead forecast using data up to time $j$. In general, these residuals will be autocorrelated except when $h = 1$.
Following Di Fonzo & Girolimetto (2023a), we use a matrix organization of the residuals similar to the one for the base forecasts in Section 2.1. Specifically, let $N$ be the total number of observations for the most temporally aggregated time series; the multi-step residuals for temporal aggregation order $k$ and series $i$ are then collected in $N_k$-vectors $\mathbf{e}^{[k]}_{i,h}$, organized by forecast horizon (a computational sketch is given at the end of this subsection).

To better understand the properties of the proposed alternatives, a simulation study was performed; the results are shown in Appendix C. We studied the effect of combining cross-sectional and temporal aggregations using a simple hierarchy, where the small size and nature of the data generating process make it possible to exactly calculate the true cross-temporal covariance structure, thus providing insights into the nature of the time series data involved in the forecast reconciliation process. We find that simulating base forecasts from multi-step residuals allows for a more accurate estimation of the covariance matrix, and that reconciliation further improves the forecast accuracy.
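For models fitted with the forecast package, h-step fitted values (and hence multi-step residuals) can be obtained without refitting; a small sketch:

```r
library(forecast)

# Multi-step residuals from an already-fitted model: e_{t+h|t} = y_{t+h} - yhat_{t+h|t}.
# fitted(fit, h = h) returns h-step-ahead fitted values for forecast-package models.
fit <- auto.arima(AirPassengers)
multi_step_resid <- sapply(1:4, function(h) AirPassengers - fitted(fit, h = h))
# Column h holds the h-step residuals; for h > 1 they are autocorrelated by construction
```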

Overlapping residuals
Another issue that arises in the case of cross-temporal reconciliation is the low number of available residuals, especially for the higher orders of temporal aggregation. A possible solution is to use residuals calculated on overlapping series, obtained by allowing the year to have a varying starting time. To explain how overlapping residuals are calculated, assume we have a single series $\mathbf{y} = [y_1\; y_2\; y_3\; \ldots\; y_{T-1}\; y_T]'$. We can construct $k$ non-overlapping temporally aggregated series $\mathbf{x}^{[k],s}$, $s = 0, \ldots, k - 1$, depending on which observation is taken as the start of the aggregation period. For example, suppose we have a semi-annual series with $k = 2$ and $T = 6$; then we can construct two annual time series, depending on which time is deemed the start of the year: $\mathbf{x}^{[2],0} = [y_1 + y_2 \;\; y_3 + y_4 \;\; y_5 + y_6]'$ and $\mathbf{x}^{[2],1} = [y_2 + y_3 \;\; y_4 + y_5]'$. To calculate overlapping residuals, we propose the following steps:

1. Fit a model to $\mathbf{x}^{[k],0}$ (i.e., select an appropriate model and estimate the model parameters using the available data) and calculate the residuals.
2. Apply the same model in step 1 to x [k],s for s = 1, . . . , k − 1, without re-estimating the parameters, and calculate the residuals.
The resulting residuals can be used to estimate the covariance matrix in cross-temporal forecast reconciliation. This increases the number of available residuals, particularly when working with higher-frequency observations such as monthly or daily data. It is important to note that this approach assumes that the model used in step 1 is appropriate for all the shifted series $\mathbf{x}^{[k],s}$. Some seasonal models will not be appropriate, as the seasonal pattern will be shifted for different values of $s$. However, this does not affect seasonal ARIMA models, since their seasonality is defined in terms of lags, which are unaffected by the value of $s$.
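A sketch of the overlapping aggregation on a toy series (the model-fitting steps 1–2 are omitted, as they depend on the model class):

```r
# Overlapping temporal aggregates: shift the start of the "year" by s = 0, ..., k-1
overlapping_agg <- function(y, k) {
  lapply(0:(k - 1), function(s) {
    ys <- y[(1 + s):length(y)]                 # shifted series
    n  <- (length(ys) %/% k) * k               # drop the incomplete final block
    colSums(matrix(ys[1:n], nrow = k))         # non-overlapping sums of k values
  })
}

y <- c(1, 2, 3, 4, 5, 6)                       # semi-annual series, T = 6, k = 2
overlapping_agg(y, 2)
# [[1]] 3 7 11   (y1+y2, y3+y4, y5+y6)   <- x^[2],0
# [[2]] 5 9      (y2+y3, y4+y5)          <- x^[2],1
```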

Forecasting Australian GDP
The Australian Quarterly National Accounts (QNA) dataset has been widely studied in the literature on forecast reconciliation (Athanasopoulos et al. 2020). Building on these results (see Appendix D.1), we now consider cross-temporally reconciled probabilistic forecasts.
We use univariate ARIMA models, selected with the auto.arima function from the R package forecast (Hyndman & Khandakar 2008), to obtain quarterly base forecasts for the n = 95 QNA time series, spanning the period 1984:Q4 – 2018:Q1 and defining GDP from both the Income and Expenditure sides. We perform a rolling forecast experiment with an expanding window: the first training sample spans the period 1984:Q4 to 1994:Q3, and the last ends in 2017:Q1, for a total of 91 forecast origins. For the temporal aggregation dimension, we aggregate the quarterly data to both semi-annual and annual frequencies. We obtain 4-step, 2-step and 1-step ahead base forecasts, respectively, from the quarterly, semi-annual and annual frequencies, i.e., K = {4, 2, 1}.
The base forecast samples in the Gaussian case are obtained using the sample covariance matrices with the Global (G) and High frequency (H) parameterizations (Section 4), since it is not possible to identify a unique representation for the other cases: when the Income and Expenditure side hierarchies are considered simultaneously, the result is a general linearly constrained time series, in which bottom and upper time series are not uniquely defined, making the cross-sectional bottom-up reconciliation approach unfeasible (Di Fonzo & Girolimetto 2022c). Table 2 summarizes the reconciliation approaches:
ct( · , bu te ) Partly bottom-up (Section 2.2) starting from cross-sectional reconciled forecasts using the shr and wls approaches (see Appendix A).
ct(wlsv te , bu cs ) Partly bottom-up (Section 2.2) starting from temporally reconciled forecasts using the wlsv approach (see Appendix A).

oct( · ) Optimal cross-temporal reconciliation for the ols, struc, wlsv and bdshr approaches (see Appendix A); one-step residuals are used with wlsv and bdshr.
oct_o( · ) Optimal cross-temporal reconciliation with multi-step residuals (see Section 4.1).

oct_oh( · ) Optimal cross-temporal reconciliation with overlapping and multi-step residuals (see Sections 4.1 and 4.2), for the approaches presented in Section 4: shr stands for Global shrinkage, hshr for High frequency shrinkage.

We compare the results obtained using multi-step residuals with and without overlapping, in order to measure the benefit of overlapping residuals. In the non-parametric case, we use the cross-temporal joint bootstrap (ctjb) presented in Section 3.2. Finally, to reconcile the resulting (1000) base forecast sample paths, we apply the following techniques (see Table 2; Appendix D.2 reports the results with shrunk covariance matrices, as well as those obtained using one-step residuals in the reconciliation): ct(shr_cs, bu_te), ct(wls_cs, bu_te), oct_o(wlsv), oct_o(bdshr), oct_oh(shr) and oct_oh(hshr).

The accuracy of the probabilistic forecasts is evaluated using the Continuous Ranked Probability Score (CRPS, Gneiting & Katzfuss 2014), an index that considers each single series and thus provides a marginal evaluation of the approaches. In addition, we employ the Energy Score (ES, Gneiting & Katzfuss 2014), the extension of the CRPS to the multivariate case, to evaluate the forecast accuracy for the whole system (Panagiotelis et al. 2023, Wickramasuriya 2023). In particular, we consider the geometric mean of the relative CRPS (Fleming & Wallace 1986), and the relative ES:

$$\mathrm{RelCRPS}^{[k]}_{j,s} = \left(\prod_{i=1}^{n} \frac{\mathrm{CRPS}^{[k]}_{i,j,s}}{\mathrm{CRPS}^{[k]}_{i,0,0}}\right)^{1/n} \quad (10) \qquad\qquad \mathrm{ES\;ratio}^{[k]}_{j,s} = \frac{\mathrm{ES}^{[k]}_{j,s}}{\mathrm{ES}^{[k]}_{0,0}} \quad (11)$$

where $j$ denotes the reconciliation approach and $s$ indicates the approach used to simulate the base forecasts. As the reference approach ($s = 0$ and $j = 0$), we consider the base forecasts produced by the bootstrap approach. When all the temporal aggregation orders are considered (i.e., $\forall k \in \mathcal{K}$), the overall accuracy indices are obtained analogously by pooling the scores across $k \in \mathcal{K}$.
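Computationally, the geometric-mean aggregation in (10) is a one-liner once the per-series scores are available; `crps_j` and `crps_0` below are hypothetical vectors of CRPS values for a candidate approach and the benchmark.

```r
# Geometric mean of per-series relative CRPS, as in eq. (10)
rel_crps <- function(crps_j, crps_0) exp(mean(log(crps_j / crps_0)))
# Values below 1 indicate an improvement over the benchmark
```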

Forecasting accuracy indices based on CRPS and ES are presented in Tables 3 and 4, respectively. As a benchmark, we use the base forecasts calculated using the bootstrap method. For the base forecasts, we find that the parametric approach based on the normal distribution performs better than the non-parametric bootstrap approach. This is likely due to the limited number of residuals available for bootstrapping, which does not allow for sufficient exploration of the data. Directly specifying diagonal covariance matrices seems to be more effective than shrinking towards a target covariance matrix. Among all the procedures, ct(wls_cs, bu_te) and oct_o(wlsv) show the greatest relative gains, whereas oct_oh(shr) and oct_oh(hshr) do not show much improvement. Furthermore, the greatest improvements are observed at the higher temporal aggregation levels.

Table 3: RelCRPS indices (10) for the Australian QNA dataset. Approaches performing worse than the benchmark (bootstrap base forecasts, ctjb) are highlighted in red, the best for each column is marked in bold, and the overall lowest value is highlighted in blue. The reconciliation approaches are described in Table 2.

Table 4: ES ratio indices (11) for the Australian QNA dataset. [Same formatting conventions as Table 3.]
We utilize the non-parametric Friedman test and the post hoc "Multiple Comparison with the Best" (MCB) Nemenyi test (Koning et al. 2005, Kourentzes & Athanasopoulos 2019, Makridakis et al. 2022) to determine whether the forecasting performances of the different techniques are significantly different from one another. Figure 5 presents the MCB results using the CRPS. The probabilistic forecasts from ct(wls_cs, bu_te) and oct_o(wlsv) are significantly better than the base forecasts at every level of aggregation.
Overall, we find that using overlapping residuals almost always leads to a greater improvement in terms of both ES and CRPS. Forecasts at the most aggregated level (year) seem to benefit the most from reconciliation, and using one-step overlapping residuals appears to be sufficient to improve forecasts if the generation of the base forecasts sample paths takes into account the multi-step structure.

Forecasting Australian Tourism Demand
The Australian Tourism Demand dataset (Wickramasuriya et al. 2019) comprises monthly visitor nights in Australia, geographically disaggregated into 7 states, 27 zones and 76 regions. Six of these zones (see Table E.14 in Appendix E) are each formed by a single region, resulting in a total of 105 unique nodes in the hierarchy. The hierarchical structure follows Table 7 of Wickramasuriya et al. (2019). The data can be temporally aggregated into 2, 3, 4, 6, or 12 months (K = {12, 6, 4, 3, 2, 1}).
We perform a rolling forecast experiment with an expanding window. The process begins by using the first 10 years, from January 1998 to December 2008, to generate forecasts for the entire following year (2009). Then, the training set is increased by one month. This process is repeated until the last training set is used (January 1998 to December 2015), for a total of 85 different test sets. For the temporal aggregation dimension, we aggregate the monthly data up to annual data. We obtain 12-step, 6-step, 4-step, 3-step, 2-step and 1-step ahead base forecasts, respectively, from the monthly data and the aggregations over 2, 3, 4, 6, and 12 months. ETS models selected by minimizing the AICc criterion are fitted to the log-transformed data, with the resulting base forecasts being back-transformed to produce non-negative forecasts (Wickramasuriya et al. 2020).
The (1000) base forecast samples are obtained using the Gaussian approach with sample covariance matrices estimated with multi-step residuals (Section 4), and using the bootstrap approach (Section 3.2). For reconciliation, 11 different approaches have been adopted (see Table 2), imposing non-negativity constraints when starting from an incoherent sample simulated from a Gaussian distribution. Finally, to evaluate the performance, we employ the Continuous Ranked Probability Score (CRPS), the Energy Score (ES), and the "Multiple Comparison with the Best" (MCB) Nemenyi test, introduced in Sections 5 and 5.1.

Results
The CRPS and ES indices are shown, respectively, in Tables 6 and 7 for monthly, quarterly and annual forecasts. These tables are divided by temporal aggregation level, and each column uses a different approach to calculate the base forecasts, referred to as "base". The bootstrap method is used as a benchmark to calculate the accuracy indices.

Table 6: RelCRPS indices (10) for the Australian Tourism Demand dataset. Approaches performing worse than the benchmark (bootstrap base forecasts, ctjb) are highlighted in red, the best for each column is marked in bold, and the overall lowest value is highlighted in blue. The reconciliation approaches are described in Table 2.

Table 7: ES ratio indices (11) for the Australian Tourism Demand dataset. [Same formatting conventions as Table 6.]

Among the different covariance matrix estimates for Gaussian base forecasts, there are no big differences.
For this reason, using only the high frequency bottom time series can be useful to estimate fewer parameters and reduce the initial high dimensionality.
In the Gaussian case, bottom-up ct(bu) and partly bottom-up techniques like ct(shr cs , bu te ) and ct(wlsv te , bu cs ) lead to better results than the benchmark (bootstrap base forecasts).
However, it is not always guaranteed that the improvement is greater than that of the starting base forecasts (comparing the values within each column). This is particularly true for higher levels of temporal aggregation (see Appendix E.2 for details). Overall, oct(bdshr) is almost always the best in terms of CRPS. The shrinkage approach oct_h(hshr) performs well in the bootstrap case: it is competitive with oct(bdshr) at lower temporal frequencies (k ∈ {2, 1}) and is able to improve on it for k ≥ 3. In terms of ES, oct(bdshr) is still competitive, although it does not always show the best relative performance. In this case, approaches that attempt to explicitly take into account temporal and cross-sectional relationships, such as oct_h(hbshr) and oct_h(bshr), perform better. It is also worth noting that techniques that do not make use of residuals, like oct(ols) and oct(struc), prove to be competitive by consistently improving on the base forecasts in terms of both CRPS and ES.

Figure 6: MCB Nemenyi test for the Australian Tourism Demand dataset using CRPS at different temporal aggregation levels for the Gaussian (multi-step residuals, HB) and the non-parametric bootstrap approaches. In each panel, the Friedman test p-value is reported in the lower right corner. The mean rank of each approach is shown to the right of its name. Statistical differences in performance are indicated if the intervals of two forecast reconciliation procedures do not overlap; thus, approaches that do not overlap with the blue interval are considered significantly worse than the best, and vice versa.

Figure 6 shows the MCB results using the CRPS for the Gaussian approach with multi-step residuals (HB) and for the non-parametric bootstrap approach. In general, the partly bottom-up procedures improve on the base forecasts at the monthly level, but optimal cross-temporal procedures are always better. In the bootstrap framework, we can identify a group of three procedures, oct(bdshr), oct(hshr) and oct(struc), that are almost always in the group of the best approaches (denoted by the blue dot). In the Gaussian framework, oct(wlsv), oct(struc), and oct(bdshr) are always significantly better than the base forecasts and equivalent in terms of results for temporal aggregation orders greater than 2. For monthly series, oct(bdshr) is always significantly better than all other approaches.

Conclusion
In this paper, we extend the probabilistic reconciliation setting developed by Panagiotelis et al. (2023) for the cross-sectional case to the cross-temporal framework. Through appropriate notation, we show how theorems and definitions valid for the cross-sectional case can be reinterpreted and extended. The general notation proposed can help investigate extensions following different probabilistic approaches, such as those in Jeon et al. (2019), Ben Taieb et al. (2021) and Corani et al. (2022). We propose a Gaussian and a bootstrap approach to simulate base forecasts that take into account both cross-sectional and temporal relationships simultaneously, opening the way for further research into cross-temporal probabilistic forecasting.
Moreover, we analyze the use of residuals, showing that one-step residuals fail to capture the temporal structure, and we propose multi-step residuals that can fully capture the cross-temporal relationships. In Appendix C, we present a simple simulation to investigate the use of the different types of residuals. To deal with the high dimensionality of covariance matrices in the cross-temporal setting, we propose four alternative forms that reduce the number of parameters to be estimated, and we show that overlapping residuals may reduce the high-dimensionality burden by increasing the number of available residuals. These ideas warrant further investigation in future work.
Finally, we perform empirical applications on two datasets commonly used in forecast reconciliation research: Australian GDP from the Income and Expenditure sides, and Australian Tourism Demand. We find that in both cases optimal cross-temporal reconciliation approaches significantly improve on base forecasts. We also compare these with partly bottom-up techniques that use uni-dimensional reconciliations (either cross-sectional or temporal) and confirm that simultaneously exploiting both dimensions in reconciliation produces better results, especially at higher levels of temporal aggregation. In conclusion, reconciliation approaches can play an important role in improving the accuracy of forecasts in a probabilistic framework, while achieving the important attribute of producing coherent forecasts.

A Approximations of the covariance matrix

oct(ols) – identity: $\Omega_{ct} = I_{n(k^*+m)}$.
oct(wlsv) – series variance scaling: $\Omega_{ct} = \widehat{\Omega}_{ct,wlsv}$, a straightforward extension of the series variance scaling matrix presented by Athanasopoulos et al. (2017).
oct(bdshr) – block-diagonal shrunk cross-covariance scaling: $\Omega_{ct} = P'\,W^{BD}_{ct,shr}\,P$, with $W^{BD}_{ct,shr}$ a block-diagonal matrix whose $k$-block ($k = m, k_{p-1}, \ldots, 1$) is $W^{[k]}_{shr}$, the shrunk estimate of the cross-sectional covariance matrix proposed by Wickramasuriya et al. (2019), and $P$ the commutation matrix such that $P\,\mathrm{vec}(Y_\tau) = \mathrm{vec}(Y_\tau')$.

oct(shr) – MinT-shr: $\Omega_{ct} = \widehat{\lambda}\,\widehat{\Omega}_{ct,D} + (1 - \widehat{\lambda})\,\widehat{\Omega}_{ct}$, where $\widehat{\lambda}$ is an estimated shrinkage coefficient, $\widehat{\Omega}_{ct,D} = I_{n(k^*+m)} \odot \widehat{\Omega}_{ct}$, with $\odot$ denoting the Hadamard product, and $\widehat{\Omega}_{ct}$ is the covariance matrix of the cross-temporal one-step-ahead in-sample forecast errors.

B Alternative forms of the cross-temporal covariance matrix
In this appendix, we report the derivations of the solutions proposed in Section 4 to obtain an estimator of the cross-temporal covariance matrix. Starting from the definition of the cross-temporal covariance matrix, $\Omega = S_{ct}\,\Omega_{hf\text{-}bts}\,S_{ct}'$, we obtain the first equivalence in (9). Applying the shrinkage estimator to $\widehat{\Omega}_{hf\text{-}bts}$,

$$\widehat{\Omega}^{shr}_{hf\text{-}bts} = \lambda\,\widehat{\Omega}_{hf\text{-}bts,D} + (1 - \lambda)\,\widehat{\Omega}_{hf\text{-}bts},$$

we obtain

$$\widehat{\Omega}_{HB} = S_{ct}\left[\lambda\,\widehat{\Omega}_{hf\text{-}bts,D} + (1 - \lambda)\,\widehat{\Omega}_{hf\text{-}bts}\right]S_{ct}' = \lambda\,S_{ct}\,\widehat{\Omega}_{hf\text{-}bts,D}\,S_{ct}' + (1 - \lambda)\,S_{ct}\,\widehat{\Omega}_{hf\text{-}bts}\,S_{ct}'.$$
The high-frequency time series representation (the second equivalence) can be derived in the following manner:

$$\Omega = S_{ct}\,\Omega_{hf\text{-}bts}\,S_{ct}' = (I_n \otimes S_{te})(S_{cs} \otimes I_m)\,\Omega_{hf\text{-}bts}\,(S_{cs} \otimes I_m)'(I_n \otimes S_{te})' = (I_n \otimes S_{te})\,\Omega_{hf}\,(I_n \otimes S_{te})',$$

where $\Omega_{hf} = (S_{cs} \otimes I_m)\,\Omega_{hf\text{-}bts}\,(S_{cs} \otimes I_m)'$ and $S_{ct} = S_{cs} \otimes S_{te} = (I_n \otimes S_{te})(S_{cs} \otimes I_m)$.
The shrinkage estimator is then applied to the inner matrix, as before. The bottom time series representation (the third equivalence) follows from

$$\Omega = S_{ct}\,\Omega_{hf\text{-}bts}\,S_{ct}' = (S_{cs} \otimes S_{te})\,\Omega_{hf\text{-}bts}\,(S_{cs} \otimes S_{te})' = (S_{cs} \otimes I_{m+k^*})(I_{n_b} \otimes S_{te})\,\Omega_{hf\text{-}bts}\,(I_{n_b} \otimes S_{te})'(S_{cs} \otimes I_{m+k^*})' = (S_{cs} \otimes I_{m+k^*})\,\Omega_{bts}\,(S_{cs} \otimes I_{m+k^*})',$$

where $\Omega_{bts} = (I_{n_b} \otimes S_{te})\,\Omega_{hf\text{-}bts}\,(I_{n_b} \otimes S_{te})'$ and $S_{ct} = S_{cs} \otimes S_{te} = (S_{cs} \otimes I_{m+k^*})(I_{n_b} \otimes S_{te})$. In general, the covariance matrix of the reconciled forecasts is equal to $M\,\widehat{\Omega}\,M'$, where $M = S_{ct}G$ is the projection matrix. When using the HB approach, the covariance matrices of the reconciled and the base forecasts are identical: indeed, since $M\,S_{ct} = S_{ct}$ (the projection leaves coherent vectors unchanged), it follows that $M\,\widehat{\Omega}_{HB}\,M' = S_{ct}\left[\lambda\,\widehat{\Omega}_{hf\text{-}bts,D} + (1 - \lambda)\,\widehat{\Omega}_{hf\text{-}bts}\right]S_{ct}' = \widehat{\Omega}_{HB}$.

C Monte Carlo simulation
We study the effect of combining cross-sectional and temporal aggregations, using a simple hierarchy that allows us to effectively visualize the quantities involved, such as the covariance matrix. Additionally, the small size and nature of the data generating process make it possible to exactly calculate the true cross-temporal covariance structure, thus providing insights into the nature of the time series data involved in the forecast reconciliation process.
Consider a two-level hierarchical structure with three time series (one upper series, A, and two bottom series, B and C), such that the cross-sectional aggregation matrix is $A_{cs} = [1 \;\; 1]$ (A = B + C). The temporal structure we consider is equivalent to using semi-annual data, with $\mathcal{K} = \{2, 1\}$ and $m = 2$. The assumed data generating processes (DGP) for the semi-annual bottom-level series are two AR(2) models:

$$y_{B,t} = \phi_{B,1}\,y_{B,t-1} + \phi_{B,2}\,y_{B,t-2} + \varepsilon_{B,t}, \qquad y_{C,t} = \phi_{C,1}\,y_{C,t-1} + \phi_{C,2}\,y_{C,t-2} + \varepsilon_{C,t}.$$

The error $\boldsymbol{\varepsilon}_t = [\varepsilon_{B,t} \;\; \varepsilon_{C,t}]'$ driving the process is drawn from a multivariate normal distribution, with standard deviations simulated from a uniform distribution between 0.5 and 2 and a fixed correlation of −0.8. The cross-sectional error covariance matrix is thus given by

$$\Omega_{cs} = \begin{bmatrix} \sigma_B^2 & \rho\,\sigma_B\,\sigma_C \\ \rho\,\sigma_B\,\sigma_C & \sigma_C^2 \end{bmatrix}, \qquad \rho = -0.8.$$

To obtain the remaining series, the bottom series are then cross-temporally aggregated.
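The DGP is straightforward to replicate; a sketch with illustrative AR coefficients (the paper estimates $\phi_B$ and $\phi_C$ from the pelt series):

```r
library(MASS)

set.seed(1)
Tn    <- 1000                                   # 500 years of semi-annual data
sd_BC <- runif(2, 0.5, 2)                       # simulated standard deviations
Omega_cs <- diag(sd_BC) %*% matrix(c(1, -0.8, -0.8, 1), 2) %*% diag(sd_BC)
eps   <- mvrnorm(Tn, mu = c(0, 0), Sigma = Omega_cs)

phi_B <- c(0.7, -0.2); phi_C <- c(0.5, 0.3)     # illustrative stationary AR(2) coefficients
y     <- matrix(0, Tn, 2)                       # columns: series B and C
for (t in 3:Tn) {
  y[t, 1] <- phi_B[1] * y[t - 1, 1] + phi_B[2] * y[t - 2, 1] + eps[t, 1]
  y[t, 2] <- phi_C[1] * y[t - 1, 2] + phi_C[2] * y[t - 2, 2] + eps[t, 2]
}
A <- rowSums(y)                                 # upper series A = B + C
```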
For the forecast experiment, the base forecasts are computed using AR models, where the order is automatically determined by the algorithm proposed by Hyndman & Khandakar (2008) and implemented in the R package forecast, thus allowing for possible mis-specification in the models. The training window length is 500 years, consisting of 1000 high-frequency observations. The experiment is replicated 500 times, with a forecast horizon of 1 year.
Since the AR(2) models used as DGP for the bottom series B and C at the most disaggregated temporal level are known (the $\phi_B$ and $\phi_C$ parameters are estimated from the "Lynx" and "Hare" time series contained in the pelt dataset of the tsibbledata R package), we may compute the true covariance matrix for one-step-ahead forecasts at the annual level, $\Omega_{ct} = S_{ct}\,\Omega_{hf\text{-}bts}\,S_{ct}'$; the detailed calculations can be found in Section C.2. Figure C.3 shows both the covariance matrix and the correlation matrix for fixed parameters.
To construct cross-temporal samples of the base forecasts, we use the Gaussian and bootstrap approaches discussed in Sections 3.1 and 3.2, respectively. For the parametric approach, we use multi-step residuals with the different covariance matrix structures analyzed in Section 4, while for the non-parametric approach we use regular one-step residuals. We do not use overlapping residuals in this analysis, as we have the advantage of being able to generate a large number of observations.

C.1 Covariance matrix comparison and forecast accuracy scores
To compare the true covariance matrix $\Omega_{ct}$ with an estimated covariance matrix $\widehat{\Omega}$, we use the Frobenius norm to quantify the difference between the two matrices:

$$\|D\|_F = \sqrt{\mathrm{tr}(D'D)}, \qquad \text{where } D = \widehat{\Omega} - \Omega_{ct}.$$

The true covariance matrix, shown in Figure C.3, was compared to the estimated covariance matrices obtained using various reconciliation approaches and techniques for generating sample paths of the base forecasts. Thus, we can determine which reconciliation approach and simulation technique produce an accurate estimate of the covariance matrix. Other types of matrix norms were also considered, with similar results.

From Table C.2, it appears that the reconciled covariance matrices are always closer to the true matrix than the base forecast matrix, when using both the Gaussian and the bootstrap approach. Overall, there are no major differences in the findings when using either one-step or multi-step residuals in cross-temporal forecast reconciliation. In fact, using approaches like oct(bdshr), we obtain results that are consistent with approaches such as oct_h(shr), where no temporal and/or cross-sectional correlation assumptions are imposed. It is worth noting that the HB covariance matrix, when used to calculate the base forecast samples, is not changed by the reconciliation step (see Appendix B). In conclusion, our results suggest that using multi-step residuals or bootstrap techniques may help find a "good" estimate of the covariance matrix, which can be further improved by the reconciliation.
A limitation of this simulation setting is that we use a large number of residuals, which may result in undervaluing the benefit of the parameterized forms of the covariance matrix such as HB, H, or B (see Section 4). Additionally, shrinkage techniques often yield results very similar to those obtained using the corresponding matrix with λ = 0 (full covariance matrix).

In Tables C.3 and C.4 we report the RelCRPS and ES ratio indices introduced in Section 5, where low values indicate better quality of the forecasts. The good performance of the ct(bu) approach can be explained by the good quality of the base forecasts at the bottom level for k = 1, which makes it difficult for the other approaches to correctly adjust them using the somewhat less good forecasts of the higher temporal and cross-sectional levels. This also explains the good performance of ct(shr_cs, bu_te), which by definition only takes into account the information provided by the most temporally disaggregated base forecasts. Looking at the optimal cross-temporal reconciliation approaches, there does not seem to be any advantage in using multi-step residuals to calculate the covariance matrix in the reconciliation step.

Table C.3: Simulation experiment. RelCRPS defined in Section 5. Approaches performing worse than the benchmark (bootstrap base forecasts, ctjb) are highlighted in red, the best for each column is marked in bold, and the overall lowest value is highlighted in blue. The reconciliation approaches are described in Table 2.

In conclusion, we found that simulating base forecasts from multi-step residuals allows us to estimate a covariance matrix close to the true one. Additionally, we observed that reconciliation can be used to further improve the accuracy of these estimates: accurate base forecasts for k = 1 support the good performance of bottom-up and optimal cross-temporal reconciliation approaches, such as oct(wlsv) and oct(bdshr), which perform well in terms of both CRPS and ES.

C.2 Cross-temporal covariance matrix
We assume two AR(2) processes with correlated errors, $\boldsymbol{\varepsilon}_t \sim N(\mathbf{0}, \Omega_{cs})$, $i \in \{B, C\}$. Let $y_{i,T+h}$ be the true observation for the $i$th series and $\widehat{y}_{i,T+h}$ the corresponding forecast, such that

$$y_{i,T+1} = \phi_{i,1}\,y_{i,T} + \phi_{i,2}\,y_{i,T-1} + \varepsilon_{i,T+1}, \qquad y_{i,T+2} = \phi_{i,1}\,y_{i,T+1} + \phi_{i,2}\,y_{i,T} + \varepsilon_{i,T+2},$$

and

$$\widehat{y}_{i,T+1} = \phi_{i,1}\,y_{i,T} + \phi_{i,2}\,y_{i,T-1}, \qquad \widehat{y}_{i,T+2} = \phi_{i,1}\,\widehat{y}_{i,T+1} + \phi_{i,2}\,y_{i,T}.$$

Then $y_{i,T+1} - \widehat{y}_{i,T+1} = \varepsilon_{i,T+1}$ and, substituting the one-step error into the two-step expression, $y_{i,T+2} - \widehat{y}_{i,T+2} = \phi_{i,1}\,\varepsilon_{i,T+1} + \varepsilon_{i,T+2}$.

Table C.4: Simulation experiment. ES ratio indices defined in Section 5. Approaches performing worse than the benchmark (bootstrap base forecasts, ctjb) are highlighted in red, the best for each column is marked in bold, and the overall lowest value is highlighted in blue. The reconciliation approaches are described in Table 2.
In conclusion, collecting these one- and two-step-ahead forecast error variances and covariances yields $\Omega_{hf\text{-}bts}$, and $\Omega_{ct} = S_{ct}\,\Omega_{hf\text{-}bts}\,S_{ct}'$.

C.3 One-step residuals and shrinkage covariance matrix
In Section 4.1, we discussed the use of one-step residuals in estimating the covariance matrix. In particular, we pointed out that one-step residuals lead to a biased estimate of the covariance matrix, in which some correlations are zero by definition (see Figure C.4). In addition, Tables C.5, C.6 and C.7 show the Frobenius norm, CRPS, and ES skill scores, as explained in the paper, to investigate the effectiveness of one-step residuals. Moreover, in Tables C.8 and C.9 we use a shrinkage matrix rather than the sample covariance matrix to assess the performance of our approach.

Table C.5: Frobenius norm between the true and the estimated covariance matrix for different reconciliation approaches and different techniques for simulating the base forecasts. Entries in bold represent the lowest value for each column, while the blue entry represents the global minimum. The reconciliation approaches are described in Table 2.

Table C.6: Simulation experiment. RelCRPS defined in Section 5. Approaches performing worse than the benchmark (bootstrap base forecasts, ctjb) are highlighted in red, the best for each column is marked in bold, and the overall lowest value is highlighted in blue. The reconciliation approaches are described in Table 2.

Table C.7: Simulation experiment. ES ratio indices defined in Section 5. [Same formatting conventions as Table C.6.]

Table C.8: Simulation experiment. RelCRPS defined in Section 5. [Same formatting conventions as Table C.6.]

Table C.9: Simulation experiment. ES ratio indices defined in Section 5. [Same formatting conventions as Table C.6.]

D Forecast reconciliation of the Australian GDP dataset

D.1 The dataset

Athanasopoulos et al. (2020) proposed using state-of-the-art forecast reconciliation methods to improve the accuracy of macroeconomic forecasts and facilitate aligned decision-making. In their empirical analysis, they applied cross-sectional forecast reconciliation to 95 Australian QNA time series that represent the Gross Domestic Product (GDP) calculated using both the income and expenditure approaches. These two approaches correspond to two distinct hierarchical structures, with GDP at the top and 15 lower-level aggregates in the income approach, and GDP as the top-level aggregate in a hierarchy of 79 time series in the expenditure approach (for more information, see Athanasopoulos et al. 2020, pp. 702–705).

D.2 Additional results

Table D.10: RelCRPS indices defined in Section 5 for the Australian QNA dataset. Approaches performing worse than the benchmark (bootstrap base forecasts, ctjb) are highlighted in red, the best for each column is marked in bold, and the overall lowest value is highlighted in blue. The reconciliation approaches are described in Table 2.

Table D.11: ES ratio indices defined in Section 5 for the Australian QNA dataset. [Same formatting conventions as Table D.10.]

Table D.12: RelCRPS indices defined in Section 5 for the Australian QNA dataset. [Same formatting conventions as Table D.10.]

Table D.13: ES ratio indices defined in Section 5 for the Australian QNA dataset. [Same formatting conventions as Table D.10.]

E Forecast reconciliation of the Australian Tourism Demand dataset

Table E.15: RelCRPS defined in Section 5 for the Australian Tourism Demand dataset. [Same formatting conventions as Table D.10.]

Table E.16: ES ratio indices defined in Section 5 for the Australian Tourism Demand dataset. [Same formatting conventions as Table D.10.]

Table E.18: ES ratio indices defined in Section 5 for the Australian Tourism Demand dataset. [Same formatting conventions as Table D.10.]