Measuring Forecasting Accuracy: Problems and Recommendations (by the Example of SKU-Level Judgmental Adjustments)

Davydenko, Andrey; Fildes, Robert

doi:10.1007/978-3-642-39869-8_4

Abstract

Forecast adjustment commonly occurs when organizational forecasters adjust a statistical forecast of demand to take into account factors which are excluded from the statistical calculation. This paper addresses the question of how to measure the accuracy of such adjustments. We show that many existing error measures are generally not suited to the task, due to specific features of the demand data. Alongside the well-known weaknesses of existing measures, a number of additional effects are demonstrated that complicate the interpretation of measurement results and can even lead to false conclusions being drawn. In order to ensure an interpretable and unambiguous evaluation, we recommend the use of a metric based on aggregating performance ratios across time series using the weighted geometric mean. We illustrate that this measure has the advantage of treating over- and under-forecasting even-handedly, has a more symmetric distribution, and is robust.

Empirical analysis using the recommended metric showed that, on average, adjustments yielded improvements under symmetric linear loss, while harming accuracy in terms of some traditional measures. This provides further support for the critical importance of selecting appropriate error measures when evaluating forecasting accuracy. The general accuracy evaluation scheme recommended in the paper is applicable in a wide range of settings, including forecasting for the fashion industry.

This paper is an extended version of Davydenko and Fildes [8], which appeared in the International Journal of Forecasting.
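
As a rough illustration of the aggregation idea named in the abstract, the sketch below computes a weighted geometric mean of per-series performance ratios and compares it with a weighted arithmetic mean. It assumes (the text above does not state this) that each ratio is the MAE of the adjusted forecast divided by the MAE of the statistical forecast for one series, and that the weights are the numbers of available errors per series. The function names and the numbers are hypothetical.

```python
import math


def weighted_geometric_mean(ratios, weights):
    """Return exp(sum(w_i * ln r_i) / sum(w_i)) for strictly positive ratios."""
    if len(ratios) != len(weights):
        raise ValueError("ratios and weights must have the same length")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be strictly positive")
    total_weight = sum(weights)
    log_mean = sum(w * math.log(r) for r, w in zip(ratios, weights)) / total_weight
    return math.exp(log_mean)


def weighted_arithmetic_mean(ratios, weights):
    """Return sum(w_i * r_i) / sum(w_i)."""
    if len(ratios) != len(weights):
        raise ValueError("ratios and weights must have the same length")
    return sum(w * r for r, w in zip(ratios, weights)) / sum(weights)


# Hypothetical data: in one series the error halves (ratio 0.5),
# in another it doubles (ratio 2.0); both series have 10 errors.
ratios = [0.5, 2.0]
weights = [10, 10]

geo = weighted_geometric_mean(ratios, weights)
arith = weighted_arithmetic_mean(ratios, weights)
print(f"weighted geometric mean:  {geo:.3f}")
print(f"weighted arithmetic mean: {arith:.3f}")
```

With these illustrative inputs the geometric mean is 1 (a halving and a doubling cancel out), whereas the arithmetic mean is 1.25, which would suggest an overall deterioration.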

Notes

1. The formula corresponds to the software implementation described by Hyndman and Khandakar [19].

References

1. Alkhazaleh AMH, Razali AM (2010) New technique to estimate the asymmetric trimming mean. J Probab Stat 2010. http://www.hindawi.com/journals/jps/2010/739154/cta/
2. Andrews DF, Bickel PJ, Hampel FR, Huber PJ, Rogers WH, Tukey JW (1972) Robust estimates of location. Princeton University Press, Princeton
3. Armstrong S (1985) Long-range forecasting: from crystal ball to computer. Wiley, New York
4. Armstrong JS, Collopy F (1992) Error measures for generalizing about forecasting methods: empirical comparisons. Int J Forecast 8:69–80
5. Armstrong JS, Fildes R (1995) Correspondence on the selection of error measures for comparisons among forecasting methods. J Forecast 14(1):67–71
6. Chatfield C (2001) Time-series forecasting. Chapman & Hall, Boca Raton
7. Davydenko A, Fildes R (2008) Models for product demand forecasting with the use of judgmental adjustments to statistical forecasts. Paper presented at the 28th international symposium on forecasting (ISF2008), Nice. Retrieved on 20 Sep 2013 from http://www.forecasters.org/submissions08/DAVYDENKOANDREYISF2008.pdf
8. Davydenko A, Fildes R (2013) Measuring forecasting accuracy: the case of judgmental adjustments to SKU-level demand forecasts. Int J Forecast 29(3):510–522
9. Diebold FX (1993) On the limitations of comparing mean square forecast errors: comment. J Forecast 12:641–642
10. Fildes R (1992) The evaluation of extrapolative forecasting methods. Int J Forecast 8(1):81–98
11. Fildes R, Goodwin P (2007) Against your better judgment? How organizations can improve their use of management judgment in forecasting. Interfaces 37:570–576
12. Fildes R, Goodwin P, Lawrence M, Nikolopoulos K (2009) Effective forecasting and judgmental adjustments: an empirical evaluation and strategies for improvement in supply-chain planning. Int J Forecast 25(1):3–23
13. Fleming PJ, Wallace JJ (1986) How not to lie with statistics: the correct way to summarize benchmark results. Commun ACM 29(3):218–221
14. Franses PH, Legerstee R (2010) Do experts’ adjustments on model-based SKU-level forecasts improve forecast quality? J Forecast 29:331–340
15. Goodwin P, Lawton R (1999) On the asymmetry of the symmetric MAPE. Int J Forecast 4:405–408
16. Hill M, Dixon WJ (1982) Robustness in real life: a study of clinical laboratory data. Biometrics 38:377–396
17. Hoover J (2006) Measuring forecast accuracy: omissions in today’s forecasting engines and demand-planning software. Foresight Int J Appl Forecast 4:32–35
18. Hyndman RJ (2006) Another look at forecast-accuracy metrics for intermittent demand. Foresight Int J Appl Forecast 4(4):43–46
19. Hyndman RJ, Khandakar Y (2008) Automatic time series forecasting: the forecast package for R. J Stat Softw 27(3)
20. Hyndman R, Koehler A (2006) Another look at measures of forecast accuracy. Int J Forecast 22(4):679–688
21. Kolassa S, Schutz W (2007) Advantages of the MAD/MEAN ratio over the MAPE. Foresight Int J Appl Forecast 6:40–43
22. Makridakis S (1993) Accuracy measures: theoretical and practical concerns. Int J Forecast 9:527–529
23. Marques CR, Neves PD, Sarmento LM (2000) Evaluating core inflation indicators. Working paper 3-00, Economics Research Department, Banco de Portugal
24. Mathews B, Diamantopoulos A (1987) Alternative indicators of forecast revision and improvement. Mark Intell 5(2):20–23
25. McCarthy TM, Davis DF, Golicic SL, Mentzer JT (2006) The evolution of sales forecasting management: a 20-year longitudinal study of forecasting practice. J Forecast 25:303–324
26. Mudholkar GS (1983) Fisher’s z-transformation. Encyclopedia Stat Sci 3:130–135
27. Sanders N, Ritzman L (2004) Integrating judgmental and quantitative forecasts: methodologies for pooling marketing and operations information. Int J Oper Prod Manage 24:514–529
28. Spizman L, Weinstein M (2008) A note on utilizing the geometric mean: when, why and how the forensic economist should employ the geometric mean. J Legal Econ 15(1):43–55
29. Syntetos AA, Boylan JE (2005) The accuracy of intermittent demand estimates. Int J Forecast 21(2):303–314
30. Trapero JR, Fildes RA, Davydenko A (2011) Non-linear identification of judgmental forecasts at SKU-level. J Forecast 30(5):490–508
31. Trapero JR, Pedregal DJ, Fildes R, Weller M (2011) Analysis of judgmental adjustments in presence of promotions. Paper presented at the 31st international symposium on forecasting (ISF2011), Prague
32. Tversky A, Kahneman D (1974) Judgment under uncertainty: heuristics and biases. Science 185:1124–1130
33. Wilcox RR (1996) Statistics for the social sciences. Academic, San Diego
34. Wilcox RR (2005) Trimmed means. Encyclopedia Stat Behav Sci 4:2066–2067
35. Zellner A (1986) A tale of forecasting 1001 series: the Bayesian knight strikes again. Int J Forecast 2:491–494

Author information

Authors and Affiliations

Department of Management Science, Lancaster University, Lancaster, LA1 4YX, UK
Andrey Davydenko & Robert Fildes

Corresponding author

Correspondence to Andrey Davydenko.

Editor information

Editors and Affiliations

Business Division, Institute of Textiles and Clothing, The Hong Kong Polytechnic University, Hong Kong, People's Republic of China
Tsan-Ming Choi, Chi-Leung Hui & Yong Yu

Appendix 1 Alternative Representation of MASE

According to Hyndman and Koehler [20], for the scenario in which forecasts are made from varying origins but with a constant horizon (here taken as one), the scaled error is defined as (see Note 1)

$$ q_{i,t}=\frac{e_{i,t}}{\mathrm{MAE}_i^{\mathrm{b}}},\qquad \mathrm{MAE}_i^{\mathrm{b}}=\frac{1}{l_i-1}\sum_{j=2}^{l_i}\left|Y_{i,j}-Y_{i,j-1}\right|, $$

where $\mathrm{MAE}_i^{\mathrm{b}}$ is the MAE from the benchmark (naïve) method for series $i$, $e_{i,t}$ is the error of the forecast being evaluated against the benchmark for series $i$ and period $t$, $l_i$ is the number of elements in series $i$, and $Y_{i,j}$ is the actual value observed at time $j$ for series $i$.
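
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`naive_in_sample_mae`, `scaled_errors`) and the sample numbers are hypothetical, and the code is not the software implementation referred to in Note 1.

```python
def naive_in_sample_mae(actuals):
    """MAE of the one-step naive forecast over one series:
    (1 / (l - 1)) * sum over j = 2..l of |Y_j - Y_{j-1}|."""
    if len(actuals) < 2:
        raise ValueError("at least two observations are needed")
    abs_diffs = [abs(actuals[j] - actuals[j - 1]) for j in range(1, len(actuals))]
    return sum(abs_diffs) / (len(actuals) - 1)


def scaled_errors(errors, actuals):
    """Divide each forecast error by the benchmark (naive) MAE of the series."""
    mae_b = naive_in_sample_mae(actuals)
    if mae_b == 0:
        raise ValueError("benchmark MAE is zero; scaled errors are undefined")
    return [e / mae_b for e in errors]


# Hypothetical series: absolute successive differences are 2, 3 and 4,
# so the benchmark MAE is 3.
actuals = [10.0, 12.0, 9.0, 13.0]
errors = [1.5, -3.0]  # hypothetical errors of the forecast under evaluation

q = scaled_errors(errors, actuals)
print(q)
```

The guard against a zero benchmark MAE matters in practice: a series with no period-to-period change has no defined scaled error.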

Let the mean absolute scaled error (MASE) be calculated by averaging the absolute scaled errors across time periods and time series:

$$ \mathrm{MASE}=\frac{1}{\sum_{i=1}^{m}n_i}\sum_{i=1}^{m}\sum_{t\in T_i}\frac{\left|e_{i,t}\right|}{\mathrm{MAE}_i^{\mathrm{b}}}, $$

where $n_i$ is the number of available values of $e_{i,t}$ for series $i$, $m$ is the total number of series, and $T_i$ is the set of time periods for which the errors $e_{i,t}$ are available for series $i$.

Then,

$$ \begin{aligned} \mathrm{MASE}&=\frac{1}{\sum_{i=1}^{m}n_i}\sum_{i=1}^{m}\sum_{t\in T_i}\frac{\left|e_{i,t}\right|}{\mathrm{MAE}_i^{\mathrm{b}}} \\ &=\frac{1}{\sum_{i=1}^{m}n_i}\sum_{i=1}^{m}\frac{\sum_{t\in T_i}\left|e_{i,t}\right|}{\mathrm{MAE}_i^{\mathrm{b}}} \\ &=\frac{1}{\sum_{i=1}^{m}n_i}\sum_{i=1}^{m}n_i\,\frac{\frac{1}{n_i}\sum_{t\in T_i}\left|e_{i,t}\right|}{\mathrm{MAE}_i^{\mathrm{b}}} \\ &=\frac{1}{\sum_{i=1}^{m}n_i}\sum_{i=1}^{m}n_i r_i,\qquad r_i=\frac{\mathrm{MAE}_i}{\mathrm{MAE}_i^{\mathrm{b}}}, \end{aligned} $$

where $\mathrm{MAE}_i$ is the MAE for series $i$ of the forecast being evaluated against the benchmark. In other words, MASE is a weighted arithmetic mean of the series-level ratios $r_i$, with weights $n_i$.
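
The equivalence derived above can be checked numerically. The sketch below computes MASE once by pooling all absolute scaled errors and once as the weighted arithmetic mean of the ratios r_i with weights n_i. The function names and the data are hypothetical and serve only to illustrate the identity.

```python
def mase_pooled(errors_by_series, mae_b_by_series):
    """MASE as the mean of |e_{i,t}| / MAE_i^b over all series and periods."""
    total_n = sum(len(errs) for errs in errors_by_series)
    total = sum(
        abs(e) / mae_b
        for errs, mae_b in zip(errors_by_series, mae_b_by_series)
        for e in errs
    )
    return total / total_n


def mase_weighted(errors_by_series, mae_b_by_series):
    """MASE as sum(n_i * r_i) / sum(n_i), where r_i = MAE_i / MAE_i^b."""
    n = [len(errs) for errs in errors_by_series]
    r = [
        (sum(abs(e) for e in errs) / len(errs)) / mae_b
        for errs, mae_b in zip(errors_by_series, mae_b_by_series)
    ]
    return sum(n_i * r_i for n_i, r_i in zip(n, r)) / sum(n)


# Hypothetical data: two series with 3 and 2 available errors,
# and benchmark (naive) MAEs of 2 and 8 respectively.
errors_by_series = [[1.0, -2.0, 3.0], [4.0, -4.0]]
mae_b_by_series = [2.0, 8.0]

pooled = mase_pooled(errors_by_series, mae_b_by_series)
weighted = mase_weighted(errors_by_series, mae_b_by_series)
print(pooled, weighted)
```

For these inputs r_1 = 1.0 and r_2 = 0.5, and both routes give (3 × 1.0 + 2 × 0.5) / 5 = 0.8.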

About this chapter

Cite this chapter

Davydenko, A., Fildes, R. (2014). Measuring Forecasting Accuracy: Problems and Recommendations (by the Example of SKU-Level Judgmental Adjustments). In: Choi, TM., Hui, CL., Yu, Y. (eds) Intelligent Fashion Forecasting Systems: Models and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39869-8_4

DOI: https://doi.org/10.1007/978-3-642-39869-8_4
Published: 26 October 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39868-1
Online ISBN: 978-3-642-39869-8
eBook Packages: Business and Economics, Business and Management (R0)
