When to choose the simple average in forecast combination

https://doi.org/10.1016/j.jbusres.2016.05.013

Highlights

  • The asymptotic out-of-sample error variance of forecast combination is derived.

  • Error variances of simple average and optimal weight combinations are analyzed.

  • Multi-criteria decision boundaries determine when to choose the simple average.

  • Boundaries consider sample size and robustness against structural breaks.

Abstract

Numerous forecast combination techniques have been proposed. However, these do not systematically outperform a simple average (SA) of forecasts in empirical studies. Although it is known that this is due to the instability of learned weights, managers still have little guidance on how to solve this “forecast combination puzzle”, i.e., on which combination method to choose in a specific setting. We introduce a model of the previously unknown asymptotic out-of-sample error variance of the two basic combination techniques: SA, for which no weights are learned, and the so-called optimal weights, which minimize the in-sample error variance. Using the model, we derive multi-criteria boundaries (considering training sample size and changes in the parameters estimated for optimal weights) for deciding when to choose SA. We present an empirical evaluation that illustrates how the decision rules can be applied in practice. We find that using the decision rules is superior to all other combination strategies considered.

Introduction

The combination of forecasts has been a subject of research in economics since the pioneering work of Reid (1968) and Bates and Granger (1969). Numerous studies show that combining forecasts often increases accuracy in comparison to any of the individual forecasts alone (Makridakis et al., 1982; Clemen, 1989; Makridakis & Hibon, 2000; Fildes & Petropoulos, 2015). Various techniques have been proposed that aim to derive a weighting of the individual forecasts which minimizes out-of-sample errors.

Bates and Granger (1969) introduced the so-called optimal weights (OW). The weights are determined in a least squares estimation using available past forecast error data. They are referred to as optimal as they minimize the in-sample error variance; by design, OW outperforms any other linear weighting approach in-sample. However, the out-of-sample performance is not necessarily superior since the estimated weights are strongly fitted to the training data and are consequently subject to sampling-based variance.
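
As an illustration of this estimation step, the following minimal sketch (not code from the paper) computes the weight on forecast A from past forecast errors via sample moments; for unbiased forecasts this is equivalent to the no-intercept least-squares slope of (actual minus forecast B) on (forecast A minus forecast B):

```python
import numpy as np

def optimal_weight(e_a, e_b):
    """Bates-Granger weight on forecast A, estimated from past errors
    e_a = y - yhat_A and e_b = y - yhat_B (1-D arrays of equal length)."""
    c = np.cov(e_a, e_b)                       # 2x2 sample covariance matrix
    var_a, var_b, cov_ab = c[0, 0], c[1, 1], c[0, 1]
    return (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

# Illustrative synthetic errors (values assumed, not taken from the paper)
rng = np.random.default_rng(1)
e_a = rng.normal(0.0, 1.0, size=50)
e_b = 0.4 * e_a + rng.normal(0.0, 1.1, size=50)
w_hat = optimal_weight(e_a, e_b)
# Combined forecast: yhat_C = w_hat * yhat_A + (1 - w_hat) * yhat_B
```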

As a consequence, alternative weight estimation approaches have been proposed. Clemen (1989), Diebold and Lopez (1996), and Timmermann (2006) provided thorough literature reviews of the various approaches to forecast combination. These include variants of optimal weights constrained to the interval [0,1], shrinkage towards the average, Bayesian outperformance probabilities, and several others. Each of the alternative approaches outperforms OW as well as other approaches out-of-sample in some evaluations but is outperformed in others. As no model exists for deciding which approach to choose and empirical results are ambiguous, there is no clear consensus on which forecast combination method can be expected to perform best in a particular situation.

A surprising observation of the reviews was, however, that among the approaches under study, the simple average (SA) was not systematically outperformed by any other approach in out-of-sample evaluations. Stock and Watson (2004) coined the term “forecast combination puzzle” for this phenomenon. Beyond model-based forecasting, SA is also competitive when combining expert predictions. For instance, Genre, Kenny, Meyler, and Timmermann (2013) found that for forecasts of the unemployment rate and GDP growth, only a few combination methods outperform SA, and their results caution against any assumption that the identified improvements would persist in the future.

The forecast combination puzzle is in line with the more general phenomenon that simpler forecasting procedures usually outperform more complex techniques. Green and Armstrong (2015) reviewed 97 studies comparing simple and complex methods, concluding that “none of the papers provide a balance of evidence that complexity improves the accuracy of forecasts out-of-sample”. Simplicity in forecasting procedures corresponds to using models that rely on few cues and/or require few parameters to be estimated. Likewise, in forecast combination, where weights of forecasts rather than cues are chosen, SA is the simplest model because, in contrast to more complex models such as OW, no parameters are estimated at all.

Brighton and Gigerenzer (2015) argued that the benefits of simplicity are often overlooked because of a “bias bias”, where the importance of the bias component of the error is inflated. In contrast, the variance component, resulting from oversensitivity to different samples from the same population, is often ignored. Simpler approaches are typically more robust against different samples as the variance component is directly related to model complexity.
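
This argument rests on the standard bias-variance decomposition of expected squared prediction error, restated here for context (a textbook identity, not taken from the paper): for an outcome y = f + ε with mean-zero noise ε and a forecast ŷ fitted on a random training sample,

E[(y − ŷ)²] = (E[ŷ] − f)² + Var(ŷ) + Var(ε),

where the first term is the squared bias, the second the variance arising from sensitivity to the training sample, and the third the irreducible noise. A method with slightly higher bias can therefore still achieve lower expected error if its estimates vary less across training samples.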

Simple averaging strategies have also been shown to be highly competitive in applications beyond forecast combination. For instance, for venture capital decisions, Woike, Hoffrage, and Petty (2015) found that decisions based on equally weighted binary cues reach a quality comparable to more complex strategies while being more robust. Graefe (2015) argued that estimating coefficients (weights) of predictors in multivariate models is only reasonable for large and reliable datasets with few predictors. For small and noisy datasets with a large number of predictors, he argued that including all relevant variables is more important than how they are weighted.

In forecast combination, the robustness of SA has been an important research topic, and a considerable body of literature examines the forecast combination puzzle theoretically and empirically. As will be discussed in Section 2, results indicate that the robustness of SA stems from unstable weight estimates obtained from small training samples or from forecast error characteristics that diverge between the training and the evaluation samples. In a broader sense, these findings support the “Golden Rule of Forecasting”, which states that forecasts should be conservative (Armstrong, Green, & Graefe, 2015): increasingly asymmetric weights make the combination more sensitive to one individual forecast, which is then less counterbalanced by the others.

Although these qualitative relations are known, managers still have little guidance on which method to choose in a particular setting. More specifically, we are not aware of any comprehensive quantitative decision guidance on when to choose OW or SA.

In this paper, we introduce a model for the expected out-of-sample error variance of a forecast combination, in particular when using SA and OW. Using the model, we derive multi-criteria decision boundaries determining whether OW or SA will lead to lower asymptotic error variance in a specific setting. Practitioners can furthermore use the thresholds to assess the robustness of a decision. We show that existing empirical guidelines can largely be explained by the model. Furthermore, in an empirical study with data from the M3 competition, we demonstrate that the recommendations and the thresholds can be used to implement successful combination decision strategies in practical settings.

Section snippets

Related work

A substantial amount of research has been conducted on the performance and robustness of SA in comparison to other forecast combination methods. A basic and intuitive finding is that the performance of SA depends on the ratio of the error variances of the forecasts as well as on their correlation. SA can be expected to perform well in the case of similar error variances and low or medium error correlations (Bunn, 1985; Gupta & Wilton, 1987), since the weights which are optimal in the evaluation
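
A standard calculation makes this concrete (a general result, not quoted from the truncated passage above). Writing σ_A², σ_B² for the two error variances and ρ for the error correlation, the combined error of the simple average has variance

Var(e_SA) = (σ_A² + σ_B² + 2ρ σ_A σ_B) / 4,

and whenever σ_A = σ_B the population-optimal weight equals exactly 1/2 irrespective of ρ, so with similar error variances there is little to gain, but estimation noise to lose, from learning weights.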

Forecast combination

Given two forecasts ŷ_A and ŷ_B for an event y, a combined forecast can be calculated by weighting both forecasts. The most common approach is a linear combination of the forecasts using a weight w to derive a novel forecast ŷ_C = w ŷ_A + (1 − w) ŷ_B. Assuming unbiased individual forecasts with errors e_A = y − ŷ_A ~ N(0, σ_A²), e_B = y − ŷ_B ~ N(0, σ_B²) and a correlation ρ between e_A and e_B, Bates and Granger (1969) proposed optimal weights (OW) minimizing the error variance of ŷ_C in-sample. The original definition as well as
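
For completeness, the well-known closed form referred to above is, in the notation of this section,

w* = (σ_B² − ρ σ_A σ_B) / (σ_A² + σ_B² − 2ρ σ_A σ_B),

with resulting combined error variance

Var(e_C) = σ_A² σ_B² (1 − ρ²) / (σ_A² + σ_B² − 2ρ σ_A σ_B).

In practice the population quantities σ_A, σ_B and ρ are replaced by their training-sample estimates, which is precisely where sampling variance enters.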

Error variance of forecast combination and decision boundaries

The training sample (T) and the evaluation sample (E) are two independent bivariate samples of forecast errors. T has size n, a ratio of error standard deviations ϕ_T and an error correlation ρ_T. Optimal weights ŵ are estimated from T and are then applied to E (with a potentially different ratio of error standard deviations ϕ_E and error correlation ρ_E). The error of ŷ_C in E is e_C^E = ŵ e_A^E + (1 − ŵ) e_B^E with E[e_C^E] = 0. For our theoretical analyses, we assume σ_A = 1 and focus on ϕ, as this reduces one
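
To illustrate the mechanism that the decision boundaries formalize, the following Monte Carlo sketch (with illustrative parameter values; it is not the paper's analytical model) estimates the expected out-of-sample error variance of OW as a function of the training-sample size n and compares it with SA, assuming stable error characteristics (ϕ_E = ϕ_T, ρ_E = ρ_T):

```python
import numpy as np

rng = np.random.default_rng(0)

def comb_variance(w, s_a, s_b, rho):
    """Error variance of the combination w*e_A + (1 - w)*e_B."""
    return w**2 * s_a**2 + (1 - w)**2 * s_b**2 + 2 * w * (1 - w) * rho * s_a * s_b

s_a, s_b, rho = 1.0, 1.2, 0.5                  # assumed population parameters
cov = [[s_a**2, rho * s_a * s_b], [rho * s_a * s_b, s_b**2]]

sa_var = comb_variance(0.5, s_a, s_b, rho)     # SA requires no estimation
for n in (10, 25, 50, 200, 1000):
    ow_vars = []
    for _ in range(2000):                      # repeated training samples of size n
        errors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        c = np.cov(errors, rowvar=False)
        w_hat = (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2 * c[0, 1])
        ow_vars.append(comb_variance(w_hat, s_a, s_b, rho))
    print(f"n={n:4d}  E[Var OW]={np.mean(ow_vars):.4f}  Var SA={sa_var:.4f}")
```

With these values, the estimated weights are noisy enough at small n that the expected out-of-sample variance of OW can exceed that of SA even though the population-optimal weight differs from 1/2; as n grows, OW approaches its asymptotic advantage.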

Application of the decision boundaries to real-world data

In this section, we assess the applicability of the proposed decision rules to empirical data. We use the out-of-sample error variances of SA and OW estimated using the proposed model and the derived thresholds to implement different strategies for deciding between SA and OW.

As the empirical data set, we use the time series data of the M3 Competition (Makridakis & Hibon, 2000). We limit our analysis to monthly time series (1426 of the 3003 time series) to ensure a sufficient length of the time
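
A hypothetical skeleton of such a decision strategy is sketched below; it is schematic rather than the paper's implementation, and the expected OW variance is approximated by a small Monte Carlo stand-in instead of the paper's closed-form model:

```python
import numpy as np

def sa_variance(s_a, s_b, rho):
    """Out-of-sample error variance of the simple average."""
    return 0.25 * (s_a**2 + s_b**2 + 2 * rho * s_a * s_b)

def ow_variance_mc(n, s_a, s_b, rho, reps=2000, seed=0):
    """Monte Carlo stand-in for the expected out-of-sample error variance of
    optimal weights estimated from n training errors (assumes stable phi, rho)."""
    rng = np.random.default_rng(seed)
    cov = [[s_a**2, rho * s_a * s_b], [rho * s_a * s_b, s_b**2]]
    variances = []
    for _ in range(reps):
        c = np.cov(rng.multivariate_normal([0.0, 0.0], cov, size=n), rowvar=False)
        w = (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2 * c[0, 1])
        variances.append(w**2 * s_a**2 + (1 - w)**2 * s_b**2
                         + 2 * w * (1 - w) * rho * s_a * s_b)
    return float(np.mean(variances))

def choose_combination(e_a, e_b):
    """Decide between SA and OW from the in-sample errors of two forecasts."""
    c = np.cov(e_a, e_b)
    s_a, s_b = np.sqrt(c[0, 0]), np.sqrt(c[1, 1])
    rho = c[0, 1] / (s_a * s_b)
    if ow_variance_mc(len(e_a), s_a, s_b, rho) < sa_variance(s_a, s_b, rho):
        w = (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2 * c[0, 1])
        return "OW", w
    return "SA", 0.5
```

Applied to each series, one would compute the in-sample errors of the two base forecasts over a rolling origin, call choose_combination, and then combine the hold-out forecasts with the returned weight.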

Conclusion and implications

The “forecast combination puzzle” refers to the recurring empirical finding that more sophisticated weight learning models typically do not outperform a simple average (SA) in forecast combination. It is known that estimates of the error variances of individual forecasts and their covariances, the parameters used for weighting the forecasts, are often too unstable because of small training samples or changes in the underlying time series and the corresponding error characteristics. However,

References (31)

  • K.C. Green et al. (2015). Simple versus complex forecasting: The evidence. Journal of Business Research.
  • S. Makridakis et al. (2000). The M3-competition: Results, conclusions and implications. International Journal of Forecasting.
  • C. Miller et al. (1992). The effect of nonstationarity on combined forecasts. International Journal of Forecasting.
  • A. Timmermann (2006). Forecast combinations. Handbook of Economic Forecasting.
  • J.K. Woike et al. (2015). Picking profitable investments: The success of equal weighting in simulated venture capitalist decision making. Journal of Business Research.