Flexible global forecast combinations

Forecast combination -- the aggregation of individual forecasts from multiple experts or models -- is a proven approach to economic forecasting. To date, research on economic forecasting has concentrated on local combination methods, which handle separate but related forecasting tasks in isolation. Yet, it has been known for over two decades in the machine learning community that global methods, which exploit task-relatedness, can improve on local methods that ignore it. Motivated by the possibility for improvement, this paper introduces a framework for globally combining forecasts while being flexible to the level of task-relatedness. Through our framework, we develop global versions of several existing forecast combinations. To evaluate the efficacy of these new global forecast combinations, we conduct extensive comparisons using synthetic and real data. Our real data comparisons, which involve forecasts of core economic indicators in the Eurozone, provide empirical evidence that the accuracy of global combinations of economic forecasts can surpass local combinations.


Introduction
Forecast combinations, aggregations of multiple individual forecasts, are one of the most persistently reported empirical successes in forecasting. As a key economic institution, the European Central Bank elicits economic forecasts every quarter for the Eurozone from more than one hundred forecasters, an exercise known as the Survey of Professional Forecasters (SPF). Each forecaster has unique expertise, and some possess private information, so combining is a means to a more accurate and robust projection of the economy than any one forecaster could produce alone. For this reason, the Federal Reserve Bank of Philadelphia runs a similar survey by the same name for the United States. Exactly how to combine forecasts from these surveys is a long-standing problem.
[...] gradient boosted trees. The trees were grown on thousands of time series, enabling weights to be learned across tasks. Though similar, their problem is distinct from the economic forecast combination problem that is the main focus of this paper. Whereas Montero-Manso et al. (2020) combined a small number of forecasts for a large number of tasks drawn independently from a large pool, we combine a large number of forecasts for a small number of related tasks. Elaborate approaches involving boosted trees are not feasible in our setting.
In light of the preceding discussion, this paper proposes a new framework for globally combining forecasts.
Our framework minimises a global loss function comprising the individual forecasting tasks. The framework is flexible to the level of relatedness among the different tasks. Specifically, using a task-coupling penalty, we interpolate between fully local combination, where all tasks are heterogeneous, and fully global combination, where all tasks are homogeneous. The best interpolation is determined in a data-driven fashion. Via this framework, we 'globalise' the weighting schemes of Bates and Granger (1969), Conflitti et al. (2015), and Matsypura et al. (2018). We then evaluate the new global combinations in both simulation and an application to expert forecasts from the European Central Bank SPF (see footnote 1). The results indicate that neither fully local nor fully global combination uniformly performs best. Instead, combinations that lie somewhere between these extremes typically lead to the best out-of-sample performance. We also show the benefits of our framework on model-based forecasts of economic and financial time series from the M4 Competition in Appendix D.
The paper is organised into six sections. Section 2 introduces the proposed framework for globally combining forecasts. Section 3 addresses computation of the new combinations. Section 4 presents numerical experiments that gauge the benefits of globalisation. Section 5 describes empirical comparisons of the new methods in application. Section 6 closes the paper. Proofs are in Appendix A, additional synthetic data experiments in Appendix B, and additional empirical results in Appendix C.

Single-task forecast combination
To set the scene for our framework, we first describe the traditional single-task forecast combination problem. Let $y \in \mathbb{R}$ be the forecast target and $f = (f_1, \dots, f_p)^\top \in \mathbb{R}^p$ be forecasts of $y$. Denote by $e = y\mathbf{1} - f$ the forecast errors. It is customary to assume the errors satisfy $\mathrm{E}(e) = 0$ and $\mathrm{Var}(e) = \Sigma$, where $\Sigma$ is a $p \times p$ positive-definite matrix. Consider the linear combination forecast $\hat{f} = f^\top w$, where $w = (w_1, \dots, w_p)^\top \in \mathbb{R}^p$ are unit sum weights controlling the contribution of individual forecasts to the combination forecast.
Footnote 1: The literature on the European Central Bank and Federal Reserve Bank of Philadelphia SPFs typically refers to individual forecasts as 'expert forecasts'; see footnotes 6 and 7 in Magnus and Vasnev (2023) for papers that use those surveys. Expert forecasts often include judgement that is now recognised as an important element in forecasting and can be used to adjust individual model output (Lawrence et al., 2006) or model selection/combination (Petropoulos et al., 2018). In many areas, 'judgemental forecasts' is a more common term; see Lawrence et al. (2006). Our methodology is also applicable to those areas.
Since the forecasts are unbiased and the weights sum to one, the mean square error minimising forecast combination is that which minimises the combination forecast error variance $\mathrm{Var}(e^\top w) = w^\top \Sigma w$. This minimisation is performed with respect to a constraint set $\mathcal{W}$:
$$\min_{w \in \mathcal{W}} \; w^\top \Sigma w.$$
The simplest configuration of the constraint set is $\mathcal{W}^{\mathrm{eql}} = \{\mathbf{1}/p\}$, yielding equal weights. Using $\mathcal{W}^{\mathrm{opt}} = \{w \in \mathbb{R}^p : \mathbf{1}^\top w = 1\}$ leads to optimal weights as proposed by Bates and Granger (1969). The constraint set $\mathcal{W}^{\mathrm{optcvx}} = \{w \in \mathbb{R}^p : \mathbf{1}^\top w = 1, w \geq 0\}$, as studied by Conflitti et al. (2015), adds a nonnegativity condition to guarantee a convex combination. The resulting weights are referred to hereafter as optimal convex weights.
Finally, the constraint set $\mathcal{W}^{\mathrm{opteql}} = \{w \in \mathbb{R}^p : w = z/(\mathbf{1}^\top z),\ z \in \{0, 1\}^p,\ \mathbf{1}^\top z \geq 1\}$ yields equal weights restricted to an optimal subset of forecasts. These weights were investigated by Matsypura et al. (2018) and are referred to hereafter as optimal equal weights. Here, $z$ is a vector of $p$ binary variables $z_j$ ($j = 1, \dots, p$), where $z_j$ assumes the value one if forecast $j$ is selected for inclusion in the combination and zero otherwise. The constraint $w = z/(\mathbf{1}^\top z)$ guarantees the selected forecasts are equally weighted. Other weighting schemes can also be cast in this setup by appropriately choosing $\mathcal{W}$.
When the covariance matrix $\Sigma$ is large-dimensional and estimated from data, it can be helpful to include a shrinkage penalty in the objective function (e.g., Roccazzella et al., 2022):
$$\min_{w \in \mathcal{W}} \; w^\top \Sigma w + \lambda \lVert w \rVert_q^q, \tag{1}$$
where $\lambda \geq 0$. Setting $q = 2$ yields a ridge penalty (Hoerl and Kennard, 1970), while $q = 1$ yields a lasso penalty (Tibshirani, 1996). When $q = 2$, the objective can be rearranged as $w^\top (\Sigma + \lambda I) w$, so the ridge penalty has the effect of shrinking the covariance matrix towards the identity matrix $I$, thereby stabilising the objective. The lasso penalty has a similar stabilising effect. Though there exist numerous covariance estimators that explicitly perform shrinkage (Ledoit and Wolf, 2004; Schäfer and Strimmer, 2005; Touloumis, 2015), these do not accommodate missing data. Missing data is an important empirical consideration, discussed further in Section 5. On the other hand, it is straightforward to mimic the effect of shrinkage by plugging a standard missing-data covariance estimator into (1). Under all the aforementioned configurations of $\mathcal{W}$, the limiting shrinkage case ($\lambda \to \infty$) leads to equal weights as the optimal solution when $q = 2$.
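For the unit-sum constraint set and the ridge penalty ($q = 2$), the shrunken problem has the familiar closed form $w = (\Sigma + \lambda I)^{-1}\mathbf{1} / \{\mathbf{1}^\top (\Sigma + \lambda I)^{-1}\mathbf{1}\}$. The sketch below illustrates this special case and the equal-weights limit; the function name and the covariance values are hypothetical and chosen only for illustration.

```python
import numpy as np

def optimal_weights(sigma, lam=0.0):
    """Minimise w' (sigma + lam * I) w subject to the weights summing to one."""
    p = sigma.shape[0]
    ones = np.ones(p)
    # Solve (sigma + lam * I) x = 1 rather than forming an explicit inverse.
    x = np.linalg.solve(sigma + lam * np.eye(p), ones)
    return x / (ones @ x)

# Hypothetical 3-forecaster error covariance matrix (illustrative values only).
sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 2.0, 0.5],
                  [0.3, 0.5, 4.0]])

w_unshrunk = optimal_weights(sigma)          # lam = 0: no shrinkage
w_shrunk = optimal_weights(sigma, lam=1e6)   # heavy shrinkage: close to 1/p each
```

With a very large shrinkage parameter the returned weights are numerically indistinguishable from equal weights, in line with the limiting case described above.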

Multi-task forecast combination
The problem described above concerns one forecasting task $y$. Suppose now we have multiple tasks $y = (y^{(1)}, \dots, y^{(m)})^\top \in \mathbb{R}^m$. The $m$ tasks may comprise, e.g., different variables or different forecast horizons. We index all quantities relating to the $k$th component by superscript $(k)$. Hence, the combination forecast for task $k$ is $\hat{f}^{(k)} = f^{(k)\top} w^{(k)}$, where the forecasts $f^{(k)}$ have errors $e^{(k)} = y^{(k)}\mathbf{1} - f^{(k)}$ with $\mathrm{Var}(e^{(k)}) = \Sigma^{(k)}$. Though the multi-task setup is typical of economics, research to date has treated the tasks in isolation, using weights fit on a per-task basis:
$$\min_{w^{(1)}, \dots, w^{(m)} \in \mathcal{W}} \; \sum_{k=1}^m \left( w^{(k)\top} \Sigma^{(k)} w^{(k)} + \lambda \lVert w^{(k)} \rVert_q^q \right). \tag{2}$$
This combination is local because the individual tasks are in no way linked, i.e., solving optimisation problem (1) for each task individually leads to the same weights as solving optimisation problem (2). Information from one task that might be relevant to other tasks is neglected. Instead, one can consider a single vector of weights that is a minimiser of the total loss across all tasks:
$$\min_{w \in \mathcal{W}} \; \sum_{k=1}^m \left( w^\top \Sigma^{(k)} w + \lambda \lVert w \rVert_q^q \right). \tag{3}$$
This combination is global insofar as the resulting weights take into account information contained in all tasks. Since the loss term in the objective can be expressed equivalently as $w^\top (\sum_{k=1}^m \Sigma^{(k)}) w$, this approach can be interpreted as averaging over the task-specific covariance matrices. When the covariance matrices are estimated by the sample covariance matrix, averaging is the same as estimating a single covariance matrix after aggregating data from different tasks. Unfortunately, this approach rests on the implicit assumption that the tasks are completely homogeneous. This assumption might be unreasonably strong in practice and could harm forecast performance.
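Rather than committing to a fully local or fully global approach, one can consider bridging the two approaches using per-task weights that are globally regularised:
$$\min_{w^{(1)}, \dots, w^{(m)} \in \mathcal{W}} \; \sum_{k=1}^m \left( w^{(k)\top} \Sigma^{(k)} w^{(k)} + \lambda \lVert w^{(k)} \rVert_q^q \right) + \Omega_{\gamma,q}(w^{(1)}, \dots, w^{(m)}), \tag{4}$$
where
$$\Omega_{\gamma,q}(w^{(1)}, \dots, w^{(m)}) := \min_{\bar{w} \in \mathbb{R}^p} \; \gamma \sum_{k=1}^m \lVert \bar{w} - w^{(k)} \rVert_q^q. \tag{5}$$
Here, the penalty $\gamma \sum_{k=1}^m \lVert \bar{w} - w^{(k)} \rVert_q^q$ with $\gamma \geq 0$ is a device to incorporate global information into the per-task weights. It achieves this goal by penalising departures from an auxiliary weight vector $\bar{w}$ common to all tasks, where the departures are measured as squared deviations ($q = 2$) or absolute deviations ($q = 1$).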
Regardless of $q$, taking $\gamma \to \infty$ yields global combination (3), while taking $\gamma \to 0$ yields local combination (2). Hereafter, we refer to the limiting case $\gamma \to \infty$ as 'hard' global combination, and the case with finite nonzero values of $\gamma$ as 'soft' global combination. These different cases are depicted in Figure 1. The value of $\gamma$ should reflect the level of relatedness among tasks: larger values encourage homogeneity, while smaller values promote heterogeneity. The best value in terms of out-of-sample forecast performance is usually unknown in application but is estimable from data.
[Figure 1: Local, soft global, and hard global combination.]
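The interpolation between local and hard global combination can be checked numerically in the simplest setting. The sketch below assumes unit-sum weights ($\mathcal{W}^{\mathrm{opt}}$) and squared deviations ($q = 2$), substitutes the average of the per-task weights for the auxiliary vector, and solves the resulting equality-constrained quadratic programme through its KKT linear system. It is an illustration under those assumptions rather than the solver-based implementation used in the paper; the function name and covariance values are hypothetical.

```python
import numpy as np

def soft_global_optimal_weights(sigmas, lam, gamma):
    """Per-task unit-sum weights with a squared-deviation coupling penalty.

    sigmas: list of m (p x p) error covariance matrices, one per task.
    Returns an (m x p) array whose kth row holds the weights for task k.
    """
    m, p = len(sigmas), sigmas[0].shape[0]
    # Quadratic form: block-diagonal loss plus ridge shrinkage ...
    Q = np.zeros((m * p, m * p))
    for k, sigma in enumerate(sigmas):
        Q[k * p:(k + 1) * p, k * p:(k + 1) * p] = sigma + lam * np.eye(p)
    # ... plus gamma * sum_k ||w_k - mean(w)||^2, written with a centring matrix.
    centring = np.eye(m) - np.ones((m, m)) / m
    Q += gamma * np.kron(centring, np.eye(p))
    # One sum-to-one constraint per task.
    A = np.kron(np.eye(m), np.ones((1, p)))
    # KKT system of the equality-constrained quadratic programme.
    kkt = np.block([[2.0 * Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(m * p), np.ones(m)])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:m * p].reshape(m, p)

# Hypothetical covariance matrices for two tasks and two forecasters.
sigmas_demo = [np.array([[1.0, 0.5], [0.5, 2.0]]),
               np.array([[3.0, 0.2], [0.2, 1.0]])]
local = soft_global_optimal_weights(sigmas_demo, lam=0.1, gamma=0.0)
nearly_hard = soft_global_optimal_weights(sigmas_demo, lam=0.1, gamma=1e6)
```

Setting `gamma=0.0` reproduces the per-task solutions, while a very large `gamma` drives every row towards the single weight vector that minimises the total loss across tasks.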
When the departures are measured as squared deviations (i.e., $q = 2$), it is not difficult to obtain an analytical solution:
$$\Omega_{\gamma,2}(w^{(1)}, \dots, w^{(m)}) = \gamma \sum_{k=1}^m \lVert \bar{w}^\star - w^{(k)} \rVert_2^2, \qquad \bar{w}^\star = \frac{1}{m} \sum_{k=1}^m w^{(k)}.$$
That is, the optimal value of the common parameter vector $\bar{w}$ is the average of the individual parameter vectors $w^{(1)}, \dots, w^{(m)}$. One can thus interpret our approach as finding per-task weights within a certain distance of the average weight vector. Some additional algebra gives an alternative expression for $\Omega_{\gamma,2}$:
$$\Omega_{\gamma,2}(w^{(1)}, \dots, w^{(m)}) = \frac{\gamma}{m} \sum_{k < l} \lVert w^{(k)} - w^{(l)} \rVert_2^2.$$
This expression highlights that our approach explicitly penalises mutual distances between local weight vectors. Our experience is that formulating soft global combination using either of the above closed-form solutions yields computational performance similar to that of (4), provided the number of tasks $m$ is not large. When $m$ is large, these solutions involve many more quadratic terms in the objective, which can impede computation. For instance, under the simulation design of Section 4 when $m = 10$ and $p = 50$, it takes roughly six times longer to solve for optimal weights when using the second of the above closed-form solutions.
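The algebra behind the two closed forms is short enough to spell out. The derivation below is a sketch that assumes the auxiliary vector is unconstrained in $\mathbb{R}^p$; the symbol $\bar{w}^\star$ for the minimiser is a notational choice.

```latex
\begin{align*}
% Step 1: first-order condition in the auxiliary vector.
\frac{\partial}{\partial \bar{w}} \, \gamma \sum_{k=1}^m \lVert \bar{w} - w^{(k)} \rVert_2^2
  &= 2\gamma \sum_{k=1}^m \bigl( \bar{w} - w^{(k)} \bigr) = 0
  \quad \Longrightarrow \quad
  \bar{w}^\star = \frac{1}{m} \sum_{k=1}^m w^{(k)}. \\
% Step 2: pairwise distances, with a^{(k)} := w^{(k)} - \bar{w}^\star so that \sum_k a^{(k)} = 0.
\sum_{k=1}^m \sum_{l=1}^m \lVert w^{(k)} - w^{(l)} \rVert_2^2
  &= \sum_{k=1}^m \sum_{l=1}^m \lVert a^{(k)} - a^{(l)} \rVert_2^2
   = 2m \sum_{k=1}^m \lVert a^{(k)} \rVert_2^2 - 2 \Bigl\lVert \sum_{k=1}^m a^{(k)} \Bigr\rVert_2^2
   = 2m \sum_{k=1}^m \lVert w^{(k)} - \bar{w}^\star \rVert_2^2. \\
% Step 3: combine, counting each unordered pair once.
\gamma \sum_{k=1}^m \lVert \bar{w}^\star - w^{(k)} \rVert_2^2
  &= \frac{\gamma}{2m} \sum_{k=1}^m \sum_{l=1}^m \lVert w^{(k)} - w^{(l)} \rVert_2^2
   = \frac{\gamma}{m} \sum_{k < l} \lVert w^{(k)} - w^{(l)} \rVert_2^2.
\end{align*}
```

The pairwise form contains $m(m-1)/2$ quadratic terms, which is consistent with the computational slowdown reported for larger $m$.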
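Proposition 1. When $q = 2$, the optimisation problem (4) can also be expressed as
$$\min_{w^{(1)}, \dots, w^{(m)} \in \mathcal{W}} \; \sum_{k=1}^m w^{(k)\top} \bigl( \Sigma^{(k)} + (\lambda + \gamma) I \bigr) w^{(k)} - \gamma m \, \bar{w}^{\star\top} \bar{w}^\star, \tag{6}$$
where $\bar{w}^\star = m^{-1} \sum_{k=1}^m w^{(k)}$ is the optimal value of the common parameter vector.
Form (6) reveals that $\gamma$ plays a dual role. It shrinks towards equal weights when it appears in front of $I$, similar to $\lambda$, but it also pushes towards a corner solution via the last term. While the full explicit solution cannot be derived for $\gamma \neq 0$, it is possible to prove the following proposition.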
Proposition 2. The optimal solution of problem (6) when $\mathcal{W} = \mathcal{W}^{\mathrm{opt}}$ satisfies [...] where [...]
Proof. See Appendix A.2.
For $\gamma = 0$, we get an explicit solution $w^{(k)} = A^{(k)}\mathbf{1} / (\mathbf{1}^\top A^{(k)} \mathbf{1})$, which is the optimal weights of Bates and Granger (1969) shrunk towards equal weights by $\lambda$. When $\gamma \neq 0$, it helps $\lambda$ with shrinkage, as expected, but also enters in a highly nonlinear way via $B$ and $D$, so the total effect of $\gamma$ is difficult to discern.

Task grouping
Sometimes it can be useful to limit the flow of information between certain tasks, e.g., when one or more tasks are unrelated. For this purpose, denote by $\mathcal{G} := \{G_1, \dots, G_g\}$ a collection of $g$ groups of tasks, where $G_l \subseteq \{1, \dots, m\}$ for $l = 1, \dots, g$. Using this notation, one can modify $\Omega_{\gamma,q}$ to impose the restriction that only tasks within the same group share information:
$$\Omega_{\gamma,q}(w^{(1)}, \dots, w^{(m)}) = \min_{\bar{w}^{(1)}, \dots, \bar{w}^{(g)} \in \mathbb{R}^p} \; \gamma \sum_{l=1}^g \sum_{k \in G_l} \lVert \bar{w}^{(l)} - w^{(k)} \rVert_q^q,$$
where $\bar{w}^{(l)}$ is an auxiliary weight vector for the $l$th group. When $\mathcal{G}$ consists of just one group, this grouped version of the penalty reduces to (5). Conversely, when $\mathcal{G}$ consists of $m$ groups, the grouped penalty has no globalisation effect, i.e., it leads to local combination. The grouped version is helpful in our application to the SPF data in Section 5, where we study different groups of variables and forecast horizons.
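The two limiting cases of the grouped penalty are easy to verify numerically. The sketch below assumes squared deviations ($q = 2$), for which the optimal auxiliary vector of each group is the group average; the function name, the group layout, and the random weights are hypothetical.

```python
import numpy as np

def grouped_penalty(weights, groups, gamma):
    """Squared-deviation (q = 2) task-coupling penalty with task groups.

    weights: (m x p) array, row k holding the weights of task k.
    groups: list of lists of task indices; tasks share information only
            within a group.
    """
    total = 0.0
    for group in groups:
        block = weights[group]
        # For q = 2 the optimal auxiliary vector of a group is its mean.
        centre = block.mean(axis=0)
        total += np.sum((block - centre) ** 2)
    return gamma * total

rng = np.random.default_rng(1)
W = rng.dirichlet(np.ones(3), size=4)   # four tasks, three forecasters (toy weights)

one_group = grouped_penalty(W, [[0, 1, 2, 3]], gamma=2.0)      # ungrouped penalty
two_groups = grouped_penalty(W, [[0, 1], [2, 3]], gamma=2.0)
singletons = grouped_penalty(W, [[0], [1], [2], [3]], gamma=2.0)  # no coupling
```

Splitting the tasks into finer groups can only lower the penalty, and with one task per group it vanishes, which corresponds to local combination.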

Task scaling
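If the tasks under consideration vary in difficulty, one or more tasks might dominate the loss component of the objective function. To prevent this behaviour, we consider a scaled version of global combination:
$$\min_{w^{(1)}, \dots, w^{(m)} \in \mathcal{W}} \; \sum_{k=1}^m \frac{1}{\tau_q^{(k)}} \left( w^{(k)\top} \Sigma^{(k)} w^{(k)} + \lambda \lVert w^{(k)} \rVert_q^q \right) + \Omega_{\gamma,q}(w^{(1)}, \dots, w^{(m)}),$$
where $\tau_q^{(1)}, \dots, \tau_q^{(m)} > 0$ are fixed scaling parameters. If the tasks are to be evenly balanced, a suitable value of $\tau_q^{(k)}$ is the optimal objective value from local combination:
$$\tau_q^{(k)} = \min_{w \in \mathcal{W}} \; w^\top \Sigma^{(k)} w + \lambda \lVert w \rVert_q^q.$$
This configuration of $\tau_q^{(k)}$ places all tasks on equal footing, and we use it in all subsequent experiments.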

Optimal (convex) weights
Computation of forecast combinations in our framework varies in complexity according to the weighting scheme, i.e., the specific configuration of $\mathcal{W}$. We begin by describing methods for computing the optimal weights of Bates and Granger (1969) and the optimal convex weights of Conflitti et al. (2015), both natural candidates for our framework. The constraint sets $\mathcal{W}^{\mathrm{opt}} = \{w \in \mathbb{R}^p : \mathbf{1}^\top w = 1\}$ and $\mathcal{W}^{\mathrm{optcvx}} = \{w \in \mathbb{R}^p : \mathbf{1}^\top w = 1, w \geq 0\}$ defining these combinations are convex. All the objective functions described in Section 2 are convex. The resulting convex optimisation problems are efficiently solvable using most mathematical programming solvers; we use Gurobi (Gurobi Optimization, LLC, 2023).
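For readers without access to a commercial solver, the single-task convex-weights problem with a ridge penalty can be approximated with a few lines of projected gradient descent. This is a swapped-in technique for illustration only, not the Gurobi-based approach used in the paper; the function names and iteration count are assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def optimal_convex_weights(sigma, lam=0.0, n_iter=5000):
    """Projected gradient descent for min w'(sigma + lam I)w over the simplex."""
    p = sigma.shape[0]
    a = sigma + lam * np.eye(p)
    # Step size 1/L, where L = 2 * (largest eigenvalue) bounds the gradient's
    # Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(a)[-1])
    w = np.full(p, 1.0 / p)                   # start from equal weights
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2.0 * (a @ w))
    return w

# Hypothetical example in which the unconstrained optimum has a negative weight,
# so the nonnegativity constraint binds.
w_cvx = optimal_convex_weights(np.array([[1.0, 1.5], [1.5, 4.0]]))
```

In this example the unconstrained unit-sum solution would short the second forecaster, whereas the convex solution places all weight on the first.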

Optimal equal weights
Optimal equal weights of Matsypura et al. (2018) are another natural candidate for our framework.
The constraint set defining these weights is less tractable than that for optimal weights or optimal convex weights. Recall the set is defined by a mix of continuous and discrete variables:
$$\mathcal{W}^{\mathrm{opteql}} = \{w \in \mathbb{R}^p : w = z/(\mathbf{1}^\top z),\ z \in \{0, 1\}^p,\ \mathbf{1}^\top z \geq 1\}. \tag{7}$$
The integrality constraint $z \in \{0, 1\}^p$ is nonconvex but is amenable to a mixed-integer programming solver such as Gurobi. The constraint $w = z/(\mathbf{1}^\top z)$ is also nonconvex but cannot be handled directly by Gurobi.

Matsypura et al. (2018) used the decomposition
$$\mathcal{W}^{\mathrm{opteql}} = \bigcup_{s=1}^p \mathcal{W}_s^{\mathrm{opteql}}, \qquad \text{where} \quad \mathcal{W}_s^{\mathrm{opteql}} = \{w \in \mathbb{R}^p : w = z/s,\ \mathbf{1}^\top z = s,\ z \in \{0, 1\}^p\}$$
is the set of all vectors that equally weight $s$ forecasts. Since $s$ is fixed for $\mathcal{W}_s^{\mathrm{opteql}}$, the constraint $w = z/s$ is linear. The authors sequentially optimise over $\mathcal{W}_1^{\mathrm{opteql}}, \dots, \mathcal{W}_p^{\mathrm{opteql}}$ and retain a solution with minimal objective value. This decomposition approach is, however, infeasible in our framework, because different tasks need not combine the same number of forecasts. We therefore use a new one-step approach which directly optimises over $\mathcal{W}^{\mathrm{opteql}}$. Though this new approach is proposed for the purpose of globally combining forecasts, it may be of independent interest for local forecast combination. We have found it to be uniformly faster than the approach in Matsypura et al. (2018) in the single-task setting, sometimes by an order of magnitude.
First, we rewrite the constraint $w = z/(\mathbf{1}^\top z)$ as the pair of constraints $ws = z$ and $s = \mathbf{1}^\top z$, where $s \in \{1, \dots, p\}$. The new constraint $ws = z$ is bilinear in $w$ and $s$, meaning it is linear for fixed $w$ or fixed $s$. Though this bilinear constraint remains nonconvex, it is amenable to spatial branch-and-bound techniques (Liberti, 2008), which are similar to the classic branch-and-bound techniques used for handling integrality constraints. As of version 9, released in 2020, Gurobi can solve optimisation problems with bilinear constraints to global optimality. We now rewrite the constraint set (7) using the new bilinear constraint representation:
$$\mathcal{W}^{\mathrm{opteql}} = \{w \in \mathbb{R}^p : \mathbf{1}^\top w = 1,\ ws = z,\ s = \mathbf{1}^\top z,\ z \in \{0, 1\}^p,\ s \in \{1, \dots, p\}\}.$$
The constraint $s = \mathbf{1}^\top z$ is, in fact, redundant in the above characterisation of $\mathcal{W}^{\mathrm{opteql}}$ since it is implied by the remaining constraints: summing $ws = z$ over its components and using $\mathbf{1}^\top w = 1$ gives $s = \mathbf{1}^\top z$. Our experience is that Gurobi benefits from excluding it.
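For very small $p$, the constraint set can be explored by brute force, which makes the structure of the problem concrete. The sketch below enumerates every non-empty subset of forecasters for a single task and checks the bilinear identity $ws = z$ along the way. It is exponential in $p$ and is not the mixed-integer approach described above; the function name and example matrix are hypothetical.

```python
from itertools import combinations

import numpy as np

def optimal_equal_weights_bruteforce(sigma, lam=0.0):
    """Enumerate all non-empty subsets and equally weight the best one."""
    p = sigma.shape[0]
    best_value, best_w = np.inf, None
    for s in range(1, p + 1):                      # subset size
        for subset in combinations(range(p), s):
            z = np.zeros(p)
            z[list(subset)] = 1.0                  # binary selection vector
            w = z / s                              # equal weights on the subset
            # Bilinear representation: w * s reproduces the selector z.
            assert np.allclose(w * s, z)
            value = w @ sigma @ w + lam * (w @ w)  # ridge-penalised variance
            if value < best_value:
                best_value, best_w = value, w
    return best_w, best_value

# Hypothetical example: two accurate forecasters and one very noisy forecaster.
w_eql, value_eql = optimal_equal_weights_bruteforce(np.diag([1.0, 1.0, 100.0]))
```

Here the best subset drops the noisy third forecaster and averages the remaining two.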

Simulation design
We evaluate the possible gains from global forecast combination in simulation. We work directly with the forecast errors, which are sampled from a $p$-dimensional Gaussian $e_t^{(k)} \sim N(0, \Sigma^{(k)})$ for $t = 1, \dots, T$ and $k = 1, \dots, m$. We fix $p = T = 50$, so the number of forecasters is of the same order as the number of samples. Different sample sizes are considered in Appendix B.2, though the main findings are robust to sample size. The number of tasks is $m \in \{2, 5, 10\}$. The covariance matrices $\Sigma^{(1)}, \dots, \Sigma^{(m)}$ are constructed element-wise as [...] The correlation parameter is set to $\rho = 0.75$ to induce high correlations between forecasters, typical of forecaster surveys. For forecaster $j = 1, \dots, p$, the standard deviations $\sigma_j^{(k)}$ are generated by drawing random variables uniformly distributed on $[a, b]$ and correlating them with correlation coefficient $\alpha \in \{0, 1/3, 2/3, 1\}$. The parameter $\alpha$ dictates the level of task-relatedness. As $\alpha$ approaches one, a forecaster's performance on one task is strongly indicative of their performance on other tasks. The converse is true as $\alpha$ approaches zero: a forecaster's performance on one task is weakly indicative of their performance on other tasks. The bounds are $a = 1$ and $b = 3$, so the accuracy of the worst forecaster is up to three times poorer than that of the best forecaster. A visualisation of data from this simulation design is given in Appendix B.1.
As a measure of out-of-sample accuracy, we report the mean square forecast error on an infinitely large testing set relative to that from an oracle:
$$\text{Relative forecast error} = \frac{\hat{w}^{(1)\top} \Sigma^{(1)} \hat{w}^{(1)}}{w^{(1)\top} \Sigma^{(1)} w^{(1)}},$$
where $\hat{w}^{(1)}$ denotes estimated weights for task one fit using an estimate $\hat{\Sigma}^{(1)}$ of the true covariance matrix $\Sigma^{(1)}$, and $w^{(1)}$ denotes oracle weights fit using $\Sigma^{(1)}$. We restrict our attention to the relative forecast error of the first task only, to measure the marginal effect of adding tasks. The covariance matrices are estimated using the sample covariances $\hat{\Sigma}_{ij}^{(k)} = T^{-1} \sum_{t=1}^T e_{it}^{(k)} e_{jt}^{(k)}$ for all $(i, j) \in \{1, \dots, p\}^2$. The shrinkage parameter $\lambda$ is swept over a grid of ten values evenly spaced on a logarithmic scale between 0.001 and 1000. For every value of $\lambda$, the globalisation parameter $\gamma$ of soft global combination is swept over the same grid. The best values of $\lambda$ and $\gamma$ are chosen on a validation set constructed independently and identically to the training set, which we remark approximates the precision of leave-one-out cross-validation.
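The evaluation loop for a single task can be sketched as follows. Several details are assumptions made only so the example runs: an equicorrelated covariance structure (the exact element-wise formula is not reproduced here), a smaller $p$ than in the paper, unit-sum weights with a ridge penalty, no task coupling, and an oracle fit without shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 10, 50   # ASSUMPTION: smaller p than the paper's p = T = 50, for illustration

# ASSUMPTION: equicorrelated errors with rho = 0.75 and standard deviations
# drawn uniformly on [1, 3]; task-relatedness (alpha) is not modelled here.
rho = 0.75
sd = rng.uniform(1.0, 3.0, size=p)
corr = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
sigma = np.outer(sd, sd) * corr

def sample_cov(errors):
    """Sample covariance of mean-zero errors (rows are time periods)."""
    return errors.T @ errors / errors.shape[0]

def unit_sum_weights(cov, lam):
    """Ridge-shrunken optimal weights under the sum-to-one constraint."""
    x = np.linalg.solve(cov + lam * np.eye(cov.shape[0]), np.ones(cov.shape[0]))
    return x / x.sum()

train = rng.multivariate_normal(np.zeros(p), sigma, size=T)
valid = rng.multivariate_normal(np.zeros(p), sigma, size=T)

# Ten shrinkage values evenly spaced on a log scale between 0.001 and 1000.
grid = np.logspace(-3, 3, 10)
candidates = [unit_sum_weights(sample_cov(train), lam) for lam in grid]
# Pick the candidate with the smallest mean square error on the validation set.
valid_mse = [np.mean((valid @ w) ** 2) for w in candidates]
w_hat = candidates[int(np.argmin(valid_mse))]

# Error variance under the true covariance, relative to the oracle's.
w_oracle = unit_sum_weights(sigma, 0.0)
relative_error = (w_hat @ sigma @ w_hat) / (w_oracle @ sigma @ w_oracle)
```

Because the oracle minimises the true error variance over all unit-sum weights, the ratio cannot fall below one; how far above one it lands depends on the draw.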
The simulations are run in parallel in R (R Core Team, 2023) with Gurobi given a single core of an AMD Ryzen Threadripper 3970X and a 300-second time limit for each value of $\gamma$ and $\lambda$.

Forecast performance
Figure 2 reports the relative forecast errors from 30 simulations. The first row of plots is where the estimate $\hat{w}$ and oracle $w$ are fit under the sum to one constraint that defines optimal weights. The second and third rows correspond to the cases where $\hat{w}$ and $w$ are fit under the constraints that define optimal convex weights and optimal equal weights, respectively. The relative forecast error reported is not comparable across these three weighting schemes since the oracle is different in each case. Our goal is not to compare weighting schemes but rather to measure the benefits of globalisation. The interested reader is referred to Appendix B.4 for forecast errors reported relative to equal weights; all key findings below remain the same.
Since local combination ignores information in additional tasks, its performance stays fixed as both the number of tasks and task-relatedness increase. In contrast, the relative forecast error of hard global combination decreases roughly linearly with task-relatedness, providing for substantial improvements when task-relatedness is high. Yet, when task-relatedness is low, hard global combination can underperform relative to local combination. This poor performance is made worse by adding tasks.
Soft global combination ameliorates the poor performance of hard global combination when the tasks are unrelated and performs nearly as well as hard global combination when the tasks are identical. There is, of course, a statistical cost to estimating the best level of globalisation. Between the extremes, soft global combination successfully adapts to the level of task-relatedness to improve over both local and hard global combination. The greater the number of tasks, the greater the possibility for improvement.
Among the three weighting schemes, optimal weights benefit most from globalisation. The constraint set that defines optimal weights is unbounded, and thus its relative forecast error can be arbitrarily bad. Optimal convex weights and optimal equal weights are defined by bounded constraint sets, so there exist finite upper bounds on their relative forecast errors. Thus, the opportunity to improve these weights is somewhat less than for optimal weights, yet often still substantial.
In Appendix B.2, we provide additional results and extended discussion for $T \in \{25, 100, 150\}$. For shorter series ($T = 25$), soft global combination performs well even when the tasks are unrelated and significantly improves when they are related. Even though the justification is different for longer series, the conclusion is the same: soft global combination is preferred. It gets the best of both worlds regardless of whether the tasks are related.
The soft global combination results in this section correspond to the globalisation penalty configured with squared deviations ($q = 2$). Further comparisons in Appendix B.3 indicate no material improvement by the absolute deviation penalty ($q = 1$), so we restrict our attention to squared deviations hereafter.

Recommendations
The findings from these experiments suggest several recommendations for practitioners. First, consider globalising any forecast combinations when tackling multiple forecasting tasks. The potential gains from globalisation can be significant, even for moderate levels of task-relatedness. Second, unless domain knowledge indicates the tasks are unrelated or strongly related, use soft global combination with cross-validation. Soft global combination with $\gamma$ cross-validated is reasonably robust to task-relatedness, while the downside of applying hard global (local) combination to unrelated (strongly related) tasks is large. Last, when using optimal weights, employ global combination whenever possible since that weighting scheme benefits most from globalisation. The benefits persist even when globalising in tandem with shrinkage.

Data and methodology
The European Central Bank SPF is an ongoing survey eliciting predictions for rates of growth, inflation, and unemployment from forecasters for the Eurozone. The survey has been conducted quarterly since 1999 Q1. In each round, the survey participants are asked to provide predictions of the three variables at several time horizons. We focus on the two rolling horizons in this paper, which are one and two years ahead of the latest available observation of the respective variable. For instance, in the 1999 Q1 survey, one-year forecasts corresponded to 1999 Q3 for growth, December 1999 for inflation, and November 1999 for unemployment (see footnote 2). The total number of forecasting tasks is $m = 6$.
The SPF data is publicly available at the European Central Bank Statistical Data Warehouse (SDW). Actual values of inflation and unemployment are also available at the SDW. Actual values of growth are available from Eurostat. We access data at the SDW using the R package ecb (Persson, 2022), and data from Eurostat using the R package eurostat (Lahti et al., 2017). The data used in this paper was retrieved on 17 April 2022. After merging the forecasts and actual values, between $T = 85$ and $T = 90$ observations are available. The first observations are 1999 Q3 (one-year growth), 1999 Q4 (one-year inflation and unemployment), 2000 Q3 (two-year growth), and 2000 Q4 (two-year inflation and unemployment). The last observation is 2021 Q4.
A notable feature of the SPF is that forecasters enter and exit the survey at different times. This aspect of the survey, coupled with periodic nonresponse, gives rise to a sizeable portion of missing data. To deal with this issue, we follow previous works (Matsypura et al., 2018; Radchenko et al., 2023) and filter the data to only include forecasters who respond for a reasonable number of periods. Specifically, the forecasters who [...] To handle missing values that remain after filtering, the covariance matrices of forecast errors are estimated using all complete pairs of observations:
$$\hat{\Sigma}_{ij}^{(k)} = \frac{1}{\lvert \mathcal{T}_i^{(k)} \cap \mathcal{T}_j^{(k)} \rvert} \sum_{t \in \mathcal{T}_i^{(k)} \cap \mathcal{T}_j^{(k)}} e_{it}^{(k)} e_{jt}^{(k)}$$
for all $(i, j) \in \{1, \dots, p\}^2$. Here, $\mathcal{T}_i^{(k)}$ denotes the periods in the training set where forecaster $i$ provided a forecast for task $k$. Covariance matrices constructed in this manner are not guaranteed positive-definite. For this reason, we take the positive-definite matrix nearest to $\hat{\Sigma}^{(k)}$ using nearPD from the R package Matrix (Bates et al., 2022). The forecast errors are standardised by the standard deviation of the forecast targets as estimated on the training set prior to estimating the covariance matrices.
Footnote 2: To simplify exposition, forecasts of inflation and unemployment are referred to by the quarter they belong to, e.g., December 1999 inflation and November 1999 unemployment are called forecasts of 1999 Q4.
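The pairwise-complete estimator and the positive-definite repair can be sketched in a few lines. The example below assumes mean-zero errors stored with `np.nan` for missing responses, and it repairs the matrix by clipping eigenvalues, which is a simpler stand-in for, not a reimplementation of, the algorithm behind `nearPD`. Both function names and the `floor` parameter are hypothetical.

```python
import numpy as np

def pairwise_cov(errors):
    """Covariance of mean-zero errors using all complete pairs of observations.

    errors: (T x p) array with np.nan where a forecaster did not respond.
    """
    observed = ~np.isnan(errors)
    filled = np.where(observed, errors, 0.0)
    # counts[i, j] = number of periods in which both i and j responded.
    counts = observed.T.astype(float) @ observed.astype(float)
    sums = filled.T @ filled
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts   # nan wherever a pair never overlaps

def clip_to_positive_definite(cov, floor=1e-8):
    """Symmetrise, then raise eigenvalues below `floor` up to `floor`."""
    sym = (cov + cov.T) / 2.0
    values, vectors = np.linalg.eigh(sym)
    return (vectors * np.maximum(values, floor)) @ vectors.T

# Hypothetical errors for three periods and two forecasters, with gaps.
errors = np.array([[1.0, 2.0],
                   [np.nan, 2.0],
                   [3.0, np.nan]])
cov_hat = pairwise_cov(errors)
```

Pairs of forecasters who never respond in the same period yield `nan` entries, so in practice the filtering step described above needs to guarantee some overlap before the repair is applied.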

Globalisation path
The first set of experiments studies the evolution of out-of-sample forecast performance as the globalisation parameter $\gamma$ is swept over its support (the 'globalisation path'). Here, we take 30 values of $\gamma$ logarithmically spaced between 0.001 and 1000. As a measure of out-of-sample accuracy, we report the mean square forecast error on a testing set relative to that from local combination:
$$\frac{\sum_{t=\underline{T}}^{\overline{T}} \bigl( y_{t+h}^{(k)} - \hat{f}_{t+h|t}^{(k)(\gamma)} \bigr)^2}{\sum_{t=\underline{T}}^{\overline{T}} \bigl( y_{t+h}^{(k)} - \hat{f}_{t+h|t}^{(k)(0)} \bigr)^2},$$
where, for a given weighting scheme, $\hat{f}_{t+h|t}^{(k)(\gamma)}$ is a global combination forecast of task $k$ at time $t + h$ produced using a training set up to time $t$ with $\gamma \in [0, \infty)$, and $\underline{T}$ and $\overline{T}$ are the first and last periods in the testing set. The denominator is the mean square forecast error from setting $\gamma = 0$, so this metric reflects the percentage improvement due to globalisation. We pick $\underline{T}$ and $\overline{T}$ so the testing set is the last five years to 2019 Q4. The period after 2019 Q4, covering the COVID-19 recession and 2021-2022 inflation surge, is considered in separate experiments in Section 5.3.
Figures 4, 5, and 6 report the globalisation paths of optimal weights, optimal convex weights, and optimal equal weights for fixed shrinkage parameter $\lambda = 0.1$. The globalisation paths of optimal weights are smooth because the fitted weights are a smooth function of $\gamma$, as Proposition 2 implies, while those of optimal convex weights and optimal equal weights are nonsmooth. In the case of optimal convex weights, the convexity constraint makes the fitted weights nonsmooth in $\gamma$ when it is binding. The path for optimal equal weights is a step function in $\gamma$ due to the weights being discrete. Three ways of grouping the tasks are considered: grouping variable tasks (group 1: one-year growth, inflation, and unemployment; group 2: two-year growth, inflation, and unemployment); grouping forecast horizon tasks (group 1: one- and two-year growth; group 2: one- and two-year inflation; group 3: one- and two-year unemployment); and grouping all tasks (group 1: one- and two-year growth, inflation, and unemployment). The reader is reminded that information flows only between tasks belonging to the same group. Across all weighting schemes and tasks, there is always a globalisation path that attains its minimum at some positive amount of globalisation. The limiting case $\gamma \to \infty$, hard global combination, is sometimes helpful and sometimes harmful. For instance, growth and inflation realise roughly 15% improvement from hard global combination (optimal weights, grouped variables) at the two-year horizon, while unemployment deteriorates by about 40% at the same horizon. This behaviour might be attributable to growth and inflation being difficult tasks at the two-year horizon (e.g., expert forecasts of those tasks are not responsive to the COVID effects in 2020 and 2021, as Figure 3 shows), thus providing a noisy signal to unemployment. However, even in the cases where hard global combination on its own is not useful (such as one- and two-year unemployment forecasts), the optimal choice of $\gamma$ is still positive, and soft global combination can extract benefits.
The results lead us to the following practical suggestions regarding the groupings. For a one-year growth forecast, using all available information (i.e., the 'grouped all' version) is beneficial as it is the best or close to the best performer across the different weights. For the same reason, we also recommend this grouping for two-year inflation and two-year unemployment forecasts. For one-year unemployment, one should group variables as this grouping is the best or close to the best across the different weights. For one-year inflation, grouped horizons deliver stable improvement across different weights (even though grouping variables works best for optimal weights). Finally, for a two-year growth forecast, we recommend grouping horizons but avoiding optimal weights. The convexity of the weights seems to be critical to avoid instabilities of negative weights, which were recently documented by Radchenko et al. (2023).
If one is working with only a single forecasting horizon, the selection of grouping becomes redundant. Also, in other applications, grouping all tasks seems a sensible default provided $\gamma$ is chosen judiciously on a task-by-task basis. This default option can be improved by using additional cross-validation to help determine which grouping performs best.

Tuned globalisation
The second set of experiments comprises broader comparisons that acknowledge the level of globalisation requires tuning in practice. For this purpose, we use leave-one-out cross-validation, a valid procedure provided the combination forecast errors are uncorrelated (Bergmeir et al., 2018). The value of $\gamma$ is tuned over ten values logarithmically spaced between 0.001 and 1000 on a per-task basis, so different tasks need not use the same value. To allow for comparisons of forecast accuracy across weighting schemes, we report the mean square forecast error relative to that from equal weights, a common benchmark in practice:
$$\frac{\sum_{t=\underline{T}}^{\overline{T}} \bigl( y_{t+h}^{(k)} - \hat{f}_{t+h|t}^{(k)} \bigr)^2}{\sum_{t=\underline{T}}^{\overline{T}} \bigl( y_{t+h}^{(k)} - \bar{f}_{t+h|t}^{(k)} \bigr)^2},$$
where $\hat{f}_{t+h|t}^{(k)}$ is an arbitrary combination forecast and $\bar{f}_{t+h|t}^{(k)}$ is the equally-weighted combination forecast. Values of this metric less than one indicate superior performance to equal weights.
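The performance metric is a plain ratio of mean square forecast errors and can be computed as below. The function name, the toy data, and the illustrative weights are hypothetical; the same function would apply to the globalisation-path metric by passing the local combination forecasts as the benchmark.

```python
import numpy as np

def relative_msfe(actual, combination, benchmark):
    """Mean square forecast error of a combination relative to a benchmark."""
    actual = np.asarray(actual, dtype=float)
    mse_combination = np.mean((actual - np.asarray(combination)) ** 2)
    mse_benchmark = np.mean((actual - np.asarray(benchmark)) ** 2)
    return mse_combination / mse_benchmark

# Hypothetical toy data: four test periods, three forecasters.
actual = np.array([1.0, 2.0, 1.5, 0.5])
forecasts = np.array([[1.1, 0.8, 1.4],
                      [2.2, 1.7, 2.1],
                      [1.4, 1.2, 1.9],
                      [0.6, 0.1, 0.8]])
equal = forecasts.mean(axis=1)                     # equally weighted benchmark
weighted = forecasts @ np.array([0.6, 0.1, 0.3])   # some other unit-sum weights
ratio = relative_msfe(actual, weighted, equal)     # below one favours `weighted`
```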
Table 1 reports the average value of the performance metric across the six tasks, with the minimal and maximal values among the tasks in brackets. The shrinkage parameter is $\lambda = 0.1$. We study tuned $\lambda$ next. The last five years of the data are again studied, but we now include the period 2020 Q1 to 2021 Q4 to evaluate recent performance during the COVID-19 recession and 2021-2022 inflation surge. Figure 3 highlights how the quarters on and after 2020 Q1 contain several outliers. To prevent these outliers dominating the performance metric, the testing set is split before and after 2020 Q1. Likewise, to avoid the outliers contaminating the estimated covariance matrices and thus the estimated weights, the training set is stopped at 2019 Q4.
[Table 1 note: Averages over all tasks are next to minimums and maximums over all tasks in brackets. All values are relative to equal weights.]
With few exceptions, soft global combination improves on local combination. The improvements are generally greatest pre-2020. The more minor improvements post-2020 are possibly a consequence of the recent period of deteriorated economic conditions during which task-relatedness could be less stable. In some instances, hard global combination outperforms both soft global combination and local combination. However, as in the previous section, it also sometimes underperforms. On the other hand, the data-driven determination of the globalisation level for soft global combination produces good combinations that consistently forecast well.
Optimal weights realise the most significant gains from globalisation among the three weighting schemes: soft global combination (grouped all) places first in terms of average performance across tasks (pre-2020), whereas local combination places last. Moreover, globalisation leads to smaller maximal loss for optimal weights. Though not always beating optimal weights according to average performance, optimal convex weights and optimal equal weights have more consistent performance across tasks, especially pre-2020. With a suitable amount of globalisation, each weighting scheme can beat the notoriously difficult benchmark of equal weights for one or more task groupings.

Tuned shrinkage
The results of Table 1 are from tuning the globalisation parameter γ while holding the shrinkage parameter λ fixed. It is insightful to evaluate whether there are further benefits from tuning λ in addition to γ. To this end, we cross-validate both parameters here. We focus on optimal (convex) weights to keep computation time reasonably low. Table 2 reports the results. Optimal weights witness an improvement across the board relative to the results of Table 1 (for local, hard global, and soft global combinations). Though it is known (see Roccazzella et al., 2022) that optimal weights benefit from (carefully tuned) shrinkage, our result is the first documentation of similar behaviour for global combination. The results for optimal convex weights, whose nonnegativity constraint already imparts a form of shrinkage, are similar to Table 1. Our core finding remains the same in both cases: globalisation via soft global combination is typically beneficial.

Forecast combination puzzle
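In more than 50 years of forecast combination literature spanning a myriad of weighting schemes, 'forecasters still have little guidance on how to solve the forecast combination puzzle' (Wang et al., 2023). Our empirical results show that before the COVID-19 recession (2017-2019), soft global combination not only improves upon local and hard global combinations but also upon equal weights. Our synthetic experiments also demonstrate improvement in stable conditions; see Appendix B.4. However, results post-2019 are mixed, with soft global combination performing better than equal weights in around half of the cases. One possible explanation is a structural break produced by the COVID-19 recession. Rossi (2021) showed that the 2007-2008 global financial crisis (GFC) affected forecasting performance significantly. Currently, no similar research is available for the COVID period. However, Figure 3 leaves no doubt that for inflation and growth, the effects of COVID far exceed those observed during the GFC. Simple tests for a structural break in Table 3 support this claim. The variance of the average forecast error is significantly different post-COVID at the 1% level (except for the unemployment forecasts). As there is little data available post-COVID, re-estimated weights are likely to have large variability that negates the benefits of soft global combination. If weights from the pre-COVID period are used, they will not necessarily be optimal and may not provide the benefits observed under stable conditions. Wang et al. (2023) recommend equal weights in such cases. Until more data is available, equal weights are probably suitable for the post-COVID period. With more post-COVID data, soft global combination should quickly catch up as a strong competitor and a potential solution to the forecast combination puzzle; see also Frazier et al. (2023).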
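The structural break tests referred to in this section (a t test for a difference in means and an F test for a difference in variances of the average forecast error, pre- versus post-COVID) can be sketched as follows. This is an illustrative sketch under assumptions: the function name `structural_break_tests` is hypothetical, the Welch (unequal variance) form of the t test and the two-sided form of the F test are choices made here because the text does not specify them, and the data are simulated.

```python
import numpy as np
from scipy import stats

def structural_break_tests(errors_pre, errors_post):
    """Return p-values for a difference in means and in variances."""
    pre = np.asarray(errors_pre, dtype=float)
    post = np.asarray(errors_post, dtype=float)

    # t test for a difference in means (Welch form: an assumption).
    p_mean = stats.ttest_ind(pre, post, equal_var=False).pvalue

    # Two-sided F test for a difference in variances.
    f_stat = np.var(post, ddof=1) / np.var(pre, ddof=1)
    upper_tail = stats.f.sf(f_stat, len(post) - 1, len(pre) - 1)
    p_var = 2.0 * min(upper_tail, 1.0 - upper_tail)

    return {"p_mean": float(p_mean), "p_var": float(p_var)}

# Simulated average forecast errors: a calm period followed by a volatile one.
rng = np.random.default_rng(1)
pre_errors = rng.normal(loc=0.0, scale=1.0, size=40)
post_errors = rng.normal(loc=0.0, scale=10.0, size=8)
result = structural_break_tests(pre_errors, post_errors)
```

The F test is sensitive to departures from normality, so with only a handful of post-break quarters its p-value is best read as indicative rather than definitive.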
The benefits of equal weights centre around the substantial reduction of the variance at the cost of introducing a small bias; see Claeskens et al. (2016). A more recent approach by Blanc and Setzer (2020) requires an explicit solution to analyse the bias-variance trade-off. In the absence of an explicit solution in our case, a practitioner needs to empirically validate whether soft global combination beats the equally-weighted combination in their setting. Our findings suggest the likelihood of improving over equal weights is high.

Concluding remarks
To date, the problem of combining economic forecasts has been handled on a per-task basis, with the combination for each variable and forecast horizon learned independently of other variables and horizons. When the forecasting tasks are related, as economic theory and evidence suggest, this approach of learning the combinations using only local information is potentially suboptimal. This paper investigates the value of a global approach, where task-relatedness is directly exploited to improve the quality of combinations. At the heart of our approach is a principled framework that accounts for the level of homogeneity across tasks by flexibly interpolating between fully local and fully global combinations. In addition to unifying local and global approaches under one umbrella, the new framework accommodates many existing weighting schemes.
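To make the idea of interpolating between local and global combinations concrete, the sketch below solves one possible soft global problem in closed form. It is an assumption for illustration, not necessarily the objective used in this paper: each task k has sum-to-one weights w_k that minimise the sum over tasks of w_k' Σ_k w_k plus γ times the squared distance of w_k from the across-task average weight vector. The function name `soft_global_weights`, the squared Euclidean penalty, and the omission of the shrinkage parameter λ are all choices made here; the covariance matrices are simulated.

```python
import numpy as np

def soft_global_weights(sigmas, gamma):
    """Solve the assumed soft global problem via its KKT linear system.

    sigmas: list of m (p x p) forecast error covariance matrices, one per task.
    Returns an (m x p) array whose k-th row holds the weights for task k.
    """
    m, p = len(sigmas), sigmas[0].shape[0]

    # Local term: block-diagonal matrix with one covariance block per task.
    Q = np.zeros((m * p, m * p))
    for k, sigma in enumerate(sigmas):
        Q[k * p:(k + 1) * p, k * p:(k + 1) * p] = sigma

    # Global term: gamma * sum_k ||w_k - w_bar||^2 = gamma * w' (C kron I) w,
    # where C is the m x m centering matrix.
    C = np.eye(m) - np.ones((m, m)) / m
    Q += gamma * np.kron(C, np.eye(p))

    # One sum-to-one constraint per task: A w = 1.
    A = np.kron(np.eye(m), np.ones((1, p)))

    # Stationarity (2 Q w + A' nu = 0) and feasibility (A w = 1).
    kkt = np.block([[2.0 * Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(m * p), np.ones(m)])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:m * p].reshape(m, p)

# Simulated example: 3 tasks, 4 forecasters.
rng = np.random.default_rng(0)
sigmas = []
for _ in range(3):
    B = rng.normal(size=(4, 8))
    sigmas.append(B @ B.T / 8)

W_local = soft_global_weights(sigmas, gamma=0.0)    # each task on its own
W_soft = soft_global_weights(sigmas, gamma=1.0)     # partial sharing
W_global = soft_global_weights(sigmas, gamma=1e6)   # nearly identical rows
```

Under this assumed objective, γ = 0 recovers the local optimal weights for each task, and very large γ forces all tasks towards a single weight vector computed from the pooled covariance matrices, which mirrors the local and hard global extremes described above.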
Empirical evidence from the European Central Bank SPF suggests combinations of expert forecasts for rates of growth, inflation, and unemployment in the Eurozone benefit from some degree of globalisation, as do combinations of these same variables across one- and two-year horizons. Further empirical evidence on economic and financial data from the M4 Competition in Appendix D indicates similar benefits for combinations of model-based forecasts.
Our approach is not limited to point forecasts and can be extended to probabilistic forecasts. Consider, e.g., the optimal weights of Hall and Mitchell (2007) and Geweke and Amisano (2011) that, in the case of only one task, maximise the log-score when combining p individual predictive densities p_j(·; θ_j) using the historical observations y_1, ..., y_T. With multiple tasks, the density parameter θ_j^(k) can be different across tasks. This problem can be further enhanced using a shrinkage penalty or additional constraints, e.g., the high moment constraints of Pauwels et al. (2023).
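The single-task log-score weights of Hall and Mitchell (2007) and Geweke and Amisano (2011) are commonly written as below. This is a sketch of the standard formulation from that literature rather than a reproduction of this paper's own display; the symbol $\Delta^{p-1}$ for the unit simplex is notation introduced here for convenience.

```latex
\hat{w} = \operatorname*{arg\,max}_{w \in \Delta^{p-1}}
  \sum_{t=1}^{T} \log\Bigl( \sum_{j=1}^{p} w_j \, p_j(y_t; \theta_j) \Bigr),
\qquad
\Delta^{p-1} = \Bigl\{ w \in \mathbb{R}^p : w_j \ge 0, \ \sum_{j=1}^{p} w_j = 1 \Bigr\}.
```

The objective is concave in w because it is a sum of logarithms of linear functions, so the weights can be found with standard convex optimisation routines.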
Furthermore, our approach is based on the intuitive idea that a forecaster's competence in predicting one variable might contain some signal about their competence in predicting another. One can examine the connection between the accuracy of the combined forecast and individual forecaster characteristics. We leave this direction for future research.
An R implementation of the global forecast combinations in this paper is publicly available at https://github.com/ryan-thompson/global-combinations.
[...] proposed soft global approach is not harmed by it and is similar to hard global combination. These results suggest using soft global combination for shorter series as it performs well even when the tasks are not related and significantly improves when they are related.
When a longer history is available (T = 100 or 150), Figures B.9 and B.10 show that hard global combination is harmful when task-relatedness is low or moderate and even when it is strong (α = 2/3) for T = 150. The reason for this result is the large amount of irrelevant information introduced by long time series. The irrelevant information does not harm our proposed soft global combination as γ can be tuned to remove its effect. Soft global combination performs similarly to local combination when the task-relatedness is low, moderate, or even strong (α = 2/3). Of course, when the tasks are perfectly related (α = 1), the situation flips. Now, hard global combination is highly beneficial as long series bring a lot of relevant information. Again, soft global combination can extract the same benefits as hard global combination.
These results confirm the previous suggestion of using soft global combination, but now for longer series, as it can perform well whether tasks are related or not. In practice, one does not usually know the relatedness of the tasks. In our empirical study, e.g., inflation and unemployment are related by the Phillips curve. This relationship changes over time with periods where the variables are strongly related and periods where they are weakly related. The advantage of soft global combination is that one does not need to know the strength of the relation to witness benefits.
Figure B.11 shows the behaviour of the cross-validated globalisation parameter γ and cross-validated shrinkage parameter λ. As the sample size T increases, the globalisation parameter γ decreases, giving more weight to local information. By tapering down γ, soft global combination can deal with the additional noise that is harmful to hard global combination even when the task-relatedness is moderate and especially when it is low. When the tasks are perfectly related, γ remains at its maximum. The shrinkage parameter λ also decreases with sample size as the covariance matrix estimator becomes increasingly reliable.

Optimal weights
[...] and other times further away. This variation is likely due to estimation error from cross-validation, which is potentially considerable given the relatively small training sample. Structural breaks may also play a role and could be addressed via cross-validation with a rolling window. Variability in the cross-validated value of γ is typically largest for the two-year horizon tasks, likely because these are more difficult than the one-year tasks and hence noisier. Though cross-validation is a practical tool for tuning γ, there is certainly value in future work investigating alternatives (e.g., a formulaic characterisation of the optimal γ via asymptotic analysis).
For further insight into the globalisation parameters chosen by cross-validation, we present

Figure 1: Global and local forecast combination frameworks. The notation ⟨x, y⟩ = x⊤y represents the dot product of two vectors x ∈ R^p and y ∈ R^p. Local combination learns different weight vectors for each task independently of other tasks. Hard global combination learns one weight vector for all tasks. Soft global combination learns different weight vectors for each task while sharing information between tasks.

Figure 2: Mean square forecast error as a function of task-relatedness parameter α for 30 synthetic datasets with p = 50 forecasters and T = 50 samples. Vertical bars represent averages and error bars denote one standard error. All values are relative to oracle weights.

Figure 3: Data from the Survey of Professional Forecasters. Points represent forecasts and lines denote actual values of the forecast target. Point sizes reflect the number of equivalent forecasts when rounded to one decimal place.

Figure 4: Mean square forecast error of optimal weights as a function of globalisation parameter γ for the Survey of Professional Forecasters. Testing period is 2015 Q1 to 2019 Q4. Minimum of each curve is marked by a circle. All values are relative to local combination (γ → 0). The shrinkage parameter λ = 0.1. The x-axis is in log scale.

Figure 5: Mean square forecast error of optimal convex weights as a function of globalisation parameter γ for the Survey of Professional Forecasters. Testing period is 2015 Q1 to 2019 Q4. Minimum of each curve is marked by a circle. All values are relative to local combination (γ → 0). The shrinkage parameter λ = 0.1. The x-axis is in log scale.

Figure 6: Mean square forecast error of optimal equal weights as a function of globalisation parameter γ for the Survey of Professional Forecasters. Testing period is 2015 Q1 to 2019 Q4. Minimum of each curve is marked by a circle. All values are relative to local combination (γ → 0). The shrinkage parameter λ = 0.1. The x-axis is in log scale.

Figure B.8: Mean square forecast error as a function of task-relatedness parameter α for 30 synthetic datasets with p = 50 forecasters and T = 25 samples. Vertical bars represent averages and error bars denote one standard error. All values are relative to oracle weights.

Figure B.12: Mean square forecast error as a function of task-relatedness parameter α for 30 synthetic datasets with p = 50 forecasters and T = 100 samples. Vertical bars represent averages and error bars denote one standard error. All values are relative to oracle weights.
Figure D.15: Mean square forecast error for economic and financial data from the M4 Competition. Vertical bars represent averages and error bars denote one standard error. All values are relative to equal weights.

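Table 1: Mean square forecast errors for the Survey of Professional Forecasters with cross-validated globalisation parameter. Averages over all tasks are next to minimums and maximums over all tasks in brackets. All values are relative to equal weights.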

Table 2: Mean square forecast errors for the Survey of Professional Forecasters with cross-validated globalisation and shrinkage parameters. Averages over all tasks are next to minimums and maximums over all tasks in brackets. All values are relative to equal weights.

Table 3: Tests for a structural break in the average forecast error between pre- and post-COVID periods (up to 2019 Q4 and after 2019 Q4). p-values are reported from a t test for a difference in means and an F test for a difference in variances.

Table C.4: Comparisons of cross-validation tuning with full-information tuning for the Survey of Professional Forecasters over the period 2017 Q1 to 2019 Q4. The globalisation parameter γ is reported in logarithmic scale. The shrinkage parameter λ = 0.1. All tasks are grouped. Standard errors are reported in parentheses.