Measuring Forecasting Accuracy: Problems and Recommendations (by the Example of SKU-Level Judgmental Adjustments)

Intelligent Fashion Forecasting Systems: Models and Applications

Abstract

Forecast adjustment commonly occurs when organizational forecasters adjust a statistical forecast of demand to take into account factors which are excluded from the statistical calculation. This paper addresses the question of how to measure the accuracy of such adjustments. We show that many existing error measures are generally not suited to the task, due to specific features of the demand data. Alongside the well-known weaknesses of existing measures, a number of additional effects are demonstrated that complicate the interpretation of measurement results and can even lead to false conclusions being drawn. In order to ensure an interpretable and unambiguous evaluation, we recommend the use of a metric based on aggregating performance ratios across time series using the weighted geometric mean. We illustrate that this measure has the advantage of treating over- and under-forecasting even-handedly, has a more symmetric distribution, and is robust.

Empirical analysis using the recommended metric showed that, on average, adjustments yielded improvements under symmetric linear loss, while harming accuracy in terms of some traditional measures. This further supports the critical importance of selecting appropriate error measures when evaluating forecasting accuracy. The general accuracy evaluation scheme recommended in the paper is applicable in a wide range of settings, including forecasting for the fashion industry.

This paper is an extended version of Davydenko and Fildes [8], which appeared in the International Journal of Forecasting.

Notes

  1. The formula corresponds to the software implementation described by Hyndman and Khandakar [19].

References

  1. Alkhazaleh AMH, Razali AM (2010) New technique to estimate the asymmetric trimming mean. J Probab Stat 2010 http://www.hindawi.com/journals/jps/2010/739154/cta/

  2. Andrews DF, Bickel PJ, Hampel FR, Huber PJ, Rogers WH, Tukey JW (1972) Robust estimates of location. Princeton University Press, Princeton

  3. Armstrong S (1985) Long-range forecasting: from crystal ball to computer. Wiley, New York

  4. Armstrong JS, Collopy F (1992) Error measures for generalizing about forecasting methods: empirical comparisons. Int J Forecast 8:69–80

  5. Armstrong JS, Fildes R (1995) Correspondence on the selection of error measures for comparisons among forecasting methods. J Forecast 14(1):67–71

  6. Chatfield C (2001) Time-series forecasting. Chapman & Hall, Boca Raton

  7. Davydenko A, Fildes R (2008) Models for product demand forecasting with the use of judgmental adjustments to statistical forecasts. Paper presented at the 28th international symposium on forecasting (ISF2008), Nice. Retrieved on 20 Sep 2013 from http://www.forecasters.org/submissions08/DAVYDENKOANDREYISF2008.pdf

  8. Davydenko A, Fildes R (2013) Measuring forecasting accuracy: the case of judgmental adjustments to SKU-level demand forecasts. Int J Forecast 29(3):510–522

  9. Diebold FX (1993) On the limitations of comparing mean square forecast errors: comment. J Forecast 12:641–642

  10. Fildes R (1992) The evaluation of extrapolative forecasting methods. Int J Forecast 8(1):81–98

  11. Fildes R, Goodwin P (2007) Against your better judgment? How organizations can improve their use of management judgment in forecasting. Interfaces 37:570–576

  12. Fildes R, Goodwin P, Lawrence M, Nikolopoulos K (2009) Effective forecasting and judgmental adjustments: an empirical evaluation and strategies for improvement in supply-chain planning. Int J Forecast 25(1):3–23

  13. Fleming PJ, Wallace JJ (1986) How not to lie with statistics: the correct way to summarize benchmark results. Commun ACM 29(3):218–221

  14. Franses PH, Legerstee R (2010) Do experts’ adjustments on model-based SKU-level forecasts improve forecast quality? J Forecast 29:331–340

  15. Goodwin P, Lawton R (1999) On the asymmetry of the symmetric MAPE. Int J Forecast 4:405–408

  16. Hill M, Dixon WJ (1982) Robustness in real life: a study of clinical laboratory data. Biometrics 38:377–396

  17. Hoover J (2006) Measuring forecast accuracy: omissions in today’s forecasting engines and demand-planning software. Foresight Int J Appl Forecast 4:32–35

  18. Hyndman RJ (2006) Another look at forecast-accuracy metrics for intermittent demand. Foresight Int J Appl Forecast 4(4):43–46

  19. Hyndman RJ, Khandakar Y (2008) Automatic time series forecasting: the forecast package for R. J Stat Softw 27(3)

  20. Hyndman R, Koehler A (2006) Another look at measures of forecast accuracy. Int J Forecast 22(4):679–688

  21. Kolassa S, Schutz W (2007) Advantages of the MAD/MEAN ratio over the MAPE. Foresight Int J Appl Forecast 6:40–43

  22. Makridakis S (1993) Accuracy measures: theoretical and practical concerns. Int J Forecast 9:527–529

  23. Marques CR, Neves PD, Sarmento LM (2000) Evaluating core inflation indicators. Working paper 3-00, Economics Research Department, Banco de Portugal

  24. Mathews B, Diamantopoulos A (1987) Alternative indicators of forecast revision and improvement. Mark Intell 5(2):20–23

  25. McCarthy TM, Davis DF, Golicic SL, Mentzer JT (2006) The evolution of sales forecasting management: a 20-year longitudinal study of forecasting practice. J Forecast 25:303–324

  26. Mudholkar GS (1983) Fisher’s z-transformation. Encyclopedia Stat Sci 3:130–135

  27. Sanders N, Ritzman L (2004) Integrating judgmental and quantitative forecasts: methodologies for pooling marketing and operations information. Int J Oper Prod Manage 24:514–529

  28. Spizman L, Weinstein M (2008) A note on utilizing the geometric mean: when, why and how the forensic economist should employ the geometric mean. J Legal Econ 15(1):43–55

  29. Syntetos AA, Boylan JE (2005) The accuracy of intermittent demand estimates. Int J Forecast 21(2):303–314

  30. Trapero JR, Fildes RA, Davydenko A (2011) Non-linear identification of judgmental forecasts at SKU-level. J Forecast 30(5):490–508

  31. Trapero JR, Pedregal DJ, Fildes R, Weller M (2011) Analysis of judgmental adjustments in presence of promotions. Paper presented at the 31st international symposium on forecasting (ISF2011), Prague

  32. Tversky A, Kahneman D (1974) Judgment under uncertainty: heuristics and biases. Science 185:1124–1130

  33. Wilcox RR (1996) Statistics for the social sciences. Academic, San Diego

  34. Wilcox RR (2005) Trimmed means. Encyclopedia Stat Behav Sci 4:2066–2067

  35. Zellner A (1986) A tale of forecasting 1001 series: the Bayesian knight strikes again. Int J Forecast 2:491–494

Author information

Corresponding author

Correspondence to Andrey Davydenko.

Appendix 1 Alternative Representation of MASE

According to Hyndman and Koehler [20], for the scenario when forecasts are made from varying origins but with a constant horizon (here taken as one), the scaled error is defined as (see Note 1)

$$ q_{i,t} = \frac{e_{i,t}}{\mathrm{MAE}_i^{\mathrm{b}}}, \quad \mathrm{MAE}_i^{\mathrm{b}} = \frac{1}{l_i - 1} \sum_{j=2}^{l_i} \left| Y_{i,j} - Y_{i,j-1} \right|, $$

where $\mathrm{MAE}_i^{\mathrm{b}}$ is the MAE from the benchmark (naïve) method for series $i$, $e_{i,t}$ is the error of a forecast being evaluated against the benchmark for series $i$ and period $t$, $l_i$ is the number of elements in series $i$, and $Y_{i,j}$ is the actual value observed at time $j$ for series $i$.
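As an illustration of the definition above, the following minimal Python sketch computes the benchmark MAE and a scaled error for a single series. The function names `naive_mae` and `scaled_error` and the demand values are hypothetical and chosen only for this example; the handling of a constant series (where the benchmark MAE is zero) is an assumption, since the definition does not cover that case.

```python
from typing import Sequence


def naive_mae(actuals: Sequence[float]) -> float:
    """In-sample MAE of the one-step naive forecast:
    (1 / (l - 1)) * sum over j = 2..l of |Y_j - Y_{j-1}|."""
    if len(actuals) < 2:
        raise ValueError("need at least two observations")
    diffs = [abs(actuals[j] - actuals[j - 1]) for j in range(1, len(actuals))]
    return sum(diffs) / (len(actuals) - 1)


def scaled_error(error: float, actuals: Sequence[float]) -> float:
    """Scaled error q = e / MAE^b for one forecast error of one series."""
    scale = naive_mae(actuals)
    if scale == 0:
        # Assumption: treat a constant series as undefined rather than guess.
        raise ZeroDivisionError("naive MAE is zero for a constant series")
    return error / scale


# Hypothetical demand history for one SKU.
series = [10.0, 12.0, 9.0, 11.0]
print(naive_mae(series))          # (2 + 3 + 2) / 3
print(scaled_error(3.5, series))  # 3.5 divided by the value above
```

An absolute scaled error below one means the forecast error is smaller than the average in-sample error of the naïve method for that series.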

Let the mean absolute scaled error (MASE) be calculated by averaging the absolute scaled errors across time periods and time series:

$$ \mathrm{MASE} = \frac{1}{\sum_{i=1}^{m} n_i} \sum_{i=1}^{m} \sum_{t \in T_i} \frac{\left| e_{i,t} \right|}{\mathrm{MAE}_i^{\mathrm{b}}}, $$

where $n_i$ is the number of available values of $e_{i,t}$ for series $i$, $m$ is the total number of series, and $T_i$ is a set containing time periods for which the errors $e_{i,t}$ are available for series $i$.

Then,

$$
\begin{aligned}
\mathrm{MASE} &= \frac{1}{\sum_{i=1}^{m} n_i} \sum_{i=1}^{m} \sum_{t \in T_i} \frac{\left| e_{i,t} \right|}{\mathrm{MAE}_i^{\mathrm{b}}} \\
&= \frac{1}{\sum_{i=1}^{m} n_i} \sum_{i=1}^{m} \frac{\sum_{t \in T_i} \left| e_{i,t} \right|}{\mathrm{MAE}_i^{\mathrm{b}}} \\
&= \frac{1}{\sum_{i=1}^{m} n_i} \sum_{i=1}^{m} n_i \frac{\frac{1}{n_i} \sum_{t \in T_i} \left| e_{i,t} \right|}{\mathrm{MAE}_i^{\mathrm{b}}} \\
&= \frac{1}{\sum_{i=1}^{m} n_i} \sum_{i=1}^{m} n_i r_i, \quad r_i = \frac{\mathrm{MAE}_i}{\mathrm{MAE}_i^{\mathrm{b}}},
\end{aligned}
$$

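where $\mathrm{MAE}_i = \frac{1}{n_i} \sum_{t \in T_i} \left| e_{i,t} \right|$ is the MAE for series $i$ for the forecast being evaluated against the benchmark. MASE is therefore a weighted arithmetic mean of the relative MAEs $r_i$, with weights $n_i$.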
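The identity derived in this appendix can be checked numerically. The sketch below, in Python, computes MASE both by pooling absolute scaled errors and as the weighted arithmetic mean of the ratios r_i with weights n_i. For contrast, it also computes the weighted geometric mean of the same ratios, which is the kind of aggregation the abstract recommends; it is shown only as an illustration and is not claimed to reproduce the authors' exact metric. All function names and data values are hypothetical.

```python
import math
from typing import List, Sequence


def mase_pooled(errors: Sequence[Sequence[float]], scales: Sequence[float]) -> float:
    """Mean of |e_{i,t}| / MAE_i^b pooled over all series and periods."""
    total = sum(abs(e) / scales[i] for i, errs in enumerate(errors) for e in errs)
    count = sum(len(errs) for errs in errors)
    return total / count


def relative_maes(errors: Sequence[Sequence[float]], scales: Sequence[float]) -> List[float]:
    """r_i = MAE_i / MAE_i^b for each series i."""
    return [(sum(abs(e) for e in errs) / len(errs)) / s
            for errs, s in zip(errors, scales)]


def weighted_arithmetic_mean(r: Sequence[float], n: Sequence[int]) -> float:
    return sum(ni * ri for ni, ri in zip(n, r)) / sum(n)


def weighted_geometric_mean(r: Sequence[float], n: Sequence[int]) -> float:
    """exp(sum(n_i * ln r_i) / sum(n_i)); requires every r_i > 0."""
    return math.exp(sum(ni * math.log(ri) for ni, ri in zip(n, r)) / sum(n))


# Hypothetical forecast errors for two series and their benchmark MAEs.
errors = [[1.0, -2.0, 3.0], [0.5, -0.5]]
scales = [2.0, 0.25]

n = [len(errs) for errs in errors]
r = relative_maes(errors, scales)

pooled = mase_pooled(errors, scales)
weighted = weighted_arithmetic_mean(r, n)
assert math.isclose(pooled, weighted)  # the two representations agree

print(pooled, weighted, weighted_geometric_mean(r, n))
```

With these values the arithmetic aggregate is pulled further toward the series with the larger ratio than the geometric aggregate is, which illustrates why the choice of averaging scheme matters when ratios vary widely across series.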

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

Cite this chapter

Davydenko, A., Fildes, R. (2014). Measuring Forecasting Accuracy: Problems and Recommendations (by the Example of SKU-Level Judgmental Adjustments). In: Choi, TM., Hui, CL., Yu, Y. (eds) Intelligent Fashion Forecasting Systems: Models and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39869-8_4
