When to choose the simple average in forecast combination
Introduction
The combination of forecasts has been a subject of research in economics since the pioneering work of Reid (1968) and Bates and Granger (1969). Numerous studies show that combining forecasts often increases accuracy in comparison to any of the individual forecasts alone (Makridakis et al., 1982, Clemen, 1989, Makridakis and Hibon, 2000, Fildes and Petropoulos, 2015). Various techniques have been proposed that aim to derive a weighting of the individual forecasts which minimizes errors out-of-sample.
Bates and Granger (1969) introduced the so-called optimal weights (OW). The weights are determined in a least squares estimation using available past forecast error data. They are referred to as optimal as they minimize the in-sample error variance; by design, OW outperforms any other linear weighting approach in-sample. However, the out-of-sample performance is not necessarily superior since the estimated weights are strongly fitted to the training data and are consequently subject to sampling-based variance.
As a consequence, alternative weight estimation approaches have been proposed. Clemen (1989), Diebold and Lopez (1996), and Timmermann (2006) provided thorough literature reviews of the various approaches to forecast combination. These include variants of optimal weights constrained to the interval [0,1], shrinkage towards the average, Bayesian outperformance probabilities, and several others. Each of the alternatives outperformed OW as well as the other approaches out-of-sample in some evaluations but was outperformed in others. As no model exists for deciding which approach to choose and empirical results are ambiguous, there is no clear consensus on which forecast combination method can be expected to perform best in a particular situation.
A surprising observation of these reviews was, however, that among the approaches under study, the simple average (SA) was not systematically outperformed by any other approach in out-of-sample evaluations. Stock and Watson (2004) coined the term “forecast combination puzzle” for this phenomenon. Beyond model-based forecasting, SA is also competitive when combining expert predictions. For instance, Genre, Kenny, Meyler, and Timmermann (2013) found that for forecasts of the unemployment rate and GDP growth, only a few combination methods outperform SA, and their results caution against assuming that the identified improvements would persist in the future.
The forecast combination puzzle is in line with the more general phenomenon that simpler forecasting procedures usually outperform more complex techniques. Green and Armstrong (2015) reviewed 97 studies comparing simple and complex methods, concluding that “none of the papers provide a balance of evidence that complexity improves the accuracy of forecasts out-of-sample”. Simplicity in forecasting procedures corresponds to using models where few different cues are used and/or few parameters have to be estimated. Likewise, in forecast combination, where weights of forecasts instead of cues are chosen, SA is the simplest model as – in contrast to more complex models such as OW – no parameters are estimated at all.
Brighton and Gigerenzer (2015) argued that the benefits of simplicity are often overlooked because of a “bias bias”, where the importance of the bias component of the error is inflated. In contrast, the variance component, resulting from oversensitivity to different samples from the same population, is often ignored. Simpler approaches are typically more robust against different samples as the variance component is directly related to model complexity.
Simple averaging strategies have also been shown to be highly competitive in applications beyond forecast combination. For instance, for venture capital decisions, Woike, Hoffrage, and Petty (2015) found that the decision quality achieved with equally weighted binary cues is comparable to that of more complex strategies while being more robust. Graefe (2015) argued that estimating coefficients (weights) of predictors in multivariate models is only reasonable for large and reliable datasets with few predictors; for small and noisy datasets with many predictors, he argued that including all relevant variables is more important than their weighting.
In forecast combination, the robustness of SA has been an important research topic, and a considerable body of literature examines the forecast combination puzzle theoretically and empirically. As will be discussed in Section 2, results indicate that the robustness of SA stems from unstable weight estimates obtained from small training samples, or from forecast error characteristics that diverge between the training and evaluation samples. In a broader sense, these findings support the “Golden Rule of Forecasting”, which states that forecasts should be conservative (Armstrong, Green, & Graefe, 2015): increasingly asymmetric weights make the combination more sensitive to one individual forecast, which is then less counterbalanced by the others.
Although these qualitative relations are known, managers still have little guidance on which method to choose in a particular setting. More specifically, we are not aware of any comprehensive quantitative decision guidance on when to choose OW or SA.
In this paper, we introduce a model for the expected out-of-sample error variance of a forecast combination, in particular when using SA and OW. Using the model, we derive multi-criteria decision boundaries determining whether OW or SA will lead to lower asymptotic error variance in a specific setting. Practitioners can furthermore use the thresholds to assess the robustness of a decision. We show that existing empirical guidelines can largely be explained by the model. Furthermore, in an empirical study with data from the M3 competition, we demonstrate that the recommendations and the thresholds can be used to implement successful combination decision strategies in practical settings.
Section snippets
Related work
A substantial amount of research has been conducted on the performance and robustness of SA in comparison to other forecast combination methods. A basic and intuitive finding is that the performance of SA depends on the ratio of the error variances of the forecasts as well as on their correlation. SA can be expected to perform well in case of similar error variances and low or medium error correlations (Bunn, 1985, Gupta and Wilton, 1987), since the weights which are optimal in the evaluation
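The intuition in this snippet can be illustrated numerically. The sketch below (our own illustration, using the standard error-variance formula for a linear combination of two correlated, unbiased forecasts) shows that with similar error variances and moderate correlation, SA (w = 0.5) comes close to the population-optimal weight:

```python
import numpy as np

def combined_error_variance(w, sigma_a, sigma_b, rho):
    """Error variance of the combination w*f_A + (1-w)*f_B for
    unbiased forecasts with error std devs sigma_a, sigma_b and
    error correlation rho."""
    return (w**2 * sigma_a**2
            + (1 - w)**2 * sigma_b**2
            + 2 * w * (1 - w) * rho * sigma_a * sigma_b)

# Similar error variances (1.0 vs 1.1) and moderate correlation:
var_sa = combined_error_variance(0.5, 1.0, 1.1, 0.3)

# Locate the population-optimal weight on a fine grid.
w_grid = np.linspace(0.0, 1.0, 1001)
variances = [combined_error_variance(w, 1.0, 1.1, 0.3) for w in w_grid]
w_opt = w_grid[int(np.argmin(variances))]
var_opt = min(variances)
```

Here `w_opt` is about 0.57 and the SA variance exceeds the minimum by only about 1%, so very little is lost by not estimating the weight at all.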
Forecast combination
Given two forecasts ŷA and ŷB for an event y, a combined forecast can be calculated by weighting both forecasts. The most common approach is a linear combination of the forecasts using weight w to derive a novel forecast ŷC = w·ŷA + (1 − w)·ŷB. Assuming unbiased individual forecasts with errors eA and eB and a correlation ρ between eA and eB, Bates and Granger (1969) proposed optimal weights (OW) minimizing the error variance of ŷC in-sample. The original definition as well as
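The Bates-Granger optimal weight on forecast A has the closed form w* = (σB² − ρσAσB) / (σA² + σB² − 2ρσAσB). A minimal sketch of estimating it from paired past forecast errors (function names are ours, not from the paper):

```python
import numpy as np

def optimal_weight(e_a, e_b):
    """Bates-Granger optimal weight on forecast A, estimated from
    paired past errors of two unbiased forecasts:
    w* = (var_B - cov_AB) / (var_A + var_B - 2 * cov_AB)."""
    e_a, e_b = np.asarray(e_a, float), np.asarray(e_b, float)
    var_a, var_b = e_a.var(), e_b.var()
    cov_ab = np.cov(e_a, e_b, bias=True)[0, 1]
    return (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

def combine(f_a, f_b, w):
    """Linear combination w*f_A + (1-w)*f_B of two forecasts."""
    return w * f_a + (1 - w) * f_b
```

With uncorrelated errors the weight reduces to var_B / (var_A + var_B), so the less noisy forecast receives the larger weight; SA corresponds to fixing w = 0.5 instead of estimating it.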
Error variance of forecast combination and decision boundaries
The training sample (T) and the evaluation sample (E) are two independent bivariate samples of forecast errors. T has size n, a ratio of error standard deviations ϕT and error correlation ρT. Optimal weights are estimated from T and are then applied to E (with a potentially different ratio of error standard deviations ϕE and error correlation ρE). The error of ŷC in E is eCE = w·eAE + (1 − w)·eBE with E[eCE] = 0. For our theoretical analyses, we assume σA = 1 and focus on ϕ, as this reduces one
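The model itself is not shown in this snippet, but its central quantity — the expected out-of-sample error variance of OW estimated on n training observations versus that of SA — can be approximated by simulation. The sketch below is our own illustration, assuming bivariate normal errors, σA = 1, σB = ϕ, and identical error characteristics in T and E (ϕT = ϕE, ρT = ρE):

```python
import numpy as np

def oos_error_variances(n, phi, rho, n_eval=200, reps=2000, seed=0):
    """Monte Carlo estimate of the expected out-of-sample (E) error
    variance of SA and of OW whose weight is estimated on a training
    sample (T) of size n. Assumes sigma_A = 1, sigma_B = phi and
    bivariate normal errors with the same phi and rho in T and E."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho * phi], [rho * phi, phi ** 2]]
    sa, ow = [], []
    for _ in range(reps):
        eT = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        eE = rng.multivariate_normal([0.0, 0.0], cov, size=n_eval)
        var_a, var_b = eT[:, 0].var(), eT[:, 1].var()
        c = np.cov(eT[:, 0], eT[:, 1], bias=True)[0, 1]
        w = (var_b - c) / (var_a + var_b - 2 * c)  # OW estimated on T
        ow.append(np.mean((w * eE[:, 0] + (1 - w) * eE[:, 1]) ** 2))
        sa.append(np.mean((0.5 * eE[:, 0] + 0.5 * eE[:, 1]) ** 2))
    return float(np.mean(sa)), float(np.mean(ow))
```

With equal error variances (ϕ = 1) and a short training sample, the estimated OW is dominated by sampling noise and SA wins; with clearly unequal variances and a long training sample, OW wins. The decision boundaries separate exactly these regimes.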
Application of the decision boundaries to real-world data
In this section, we assess the applicability of the proposed decision rules to empirical data. We use the out-of-sample error variances of SA and OW estimated using the proposed model and the derived thresholds to implement different strategies for deciding between SA and OW.
As the empirical data set, we use the time series data of the M3 Competition (Makridakis & Hibon, 2000). We limit our analysis to monthly time series (1426 of the 3003 time series) to ensure a sufficient length of the time
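The paper's actual strategies use thresholds derived from its variance model; as a stand-in, the sketch below implements a deliberately simplified per-series decision rule in the same spirit. It is our own illustration, and the sample-size cutoff `n_min` and weight band `band` are hypothetical parameters, not the paper's values:

```python
import numpy as np

def choose_method(e_a, e_b, n_min=25, band=0.2):
    """Toy rule for deciding between SA and OW on one series, given
    paired past forecast errors. Prefer SA when the training sample
    is short or the estimated optimal weight is close to 0.5, where
    OW's asymptotic edge is small relative to its estimation
    variance. n_min and band are hypothetical, not the paper's."""
    e_a, e_b = np.asarray(e_a, float), np.asarray(e_b, float)
    n = len(e_a)
    var_a, var_b = e_a.var(), e_b.var()
    c = np.cov(e_a, e_b, bias=True)[0, 1]
    w = (var_b - c) / (var_a + var_b - 2 * c)
    if n < n_min or abs(w - 0.5) < band:
        return "SA"
    return "OW"
```

A rule of this shape commits to OW only when both the training history and the estimated weight asymmetry justify the extra estimation risk, and otherwise falls back to the conservative default SA.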
Conclusion and implications
The “forecast combination puzzle” refers to the recurring empirical finding that more sophisticated weight learning models typically do not outperform a simple average (SA) in forecast combination. It is known that estimates of the error variances of individual forecasts and their covariances, the parameters used for weighting the forecasts, are often too unstable because of small training samples or changes in the underlying time series and the corresponding error characteristics. However,
References (31)
- Armstrong, J. S., Green, K. C., & Graefe, A. (2015). Golden rule of forecasting: Be conservative. Journal of Business Research.
- Brighton, H., & Gigerenzer, G. (2015). The bias bias. Journal of Business Research.
- Bunn, D. W. (1985). Statistical efficiency in the linear combination of forecasts. International Journal of Forecasting.
- Claeskens, G., Magnus, J. R., Vasnev, A. L., & Wang, W. (2016). The forecast combination puzzle: A simple theoretical explanation. International Journal of Forecasting.
- Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting.
- de Menezes, L. M., Bunn, D. W., & Taylor, J. W. (2000). Review of guidelines for the use of combined forecasts. European Journal of Operational Research.
- Diebold, F. X., & Lopez, J. A. (1996). Forecast evaluation and combination.
- Fildes, R., & Petropoulos, F. (2015). Simple versus complex selection rules for forecasting many time series. Journal of Business Research.
- Genre, V., Kenny, G., Meyler, A., & Timmermann, A. (2013). Combining expert forecasts: Can anything beat the simple average? International Journal of Forecasting.
- Graefe, A. (2015). Improving forecasts using equally weighted predictors. Journal of Business Research.
- Green, K. C., & Armstrong, J. S. (2015). Simple versus complex forecasting: The evidence. Journal of Business Research.
- Makridakis, S., & Hibon, M. (2000). The M3-competition: Results, conclusions and implications. International Journal of Forecasting.
- The effect of nonstationarity on combined forecasts. International Journal of Forecasting.
- Timmermann, A. (2006). Forecast combinations.
- Woike, J. K., Hoffrage, U., & Petty, J. S. (2015). Picking profitable investments: The success of equal weighting in simulated venture capitalist decision making. Journal of Business Research.