Comparison of selection and combination strategies for demand forecasting methods

Paper aims: In this study, effective strategies to combine and select forecasting methods are proposed. In the selection strategy, the best performing forecasting method from a pool of methods is selected based on its accuracy, whereas the combination strategies are based on the mean of the methods’ outputs and on the methods’ accuracy. Originality: Despite the large amount of work in this area, the current literature lacks selection and combination strategies of forecasting methods for dealing with intermittent time series. Research


Introduction
The introduction of digital technologies in industry to integrate physical and digital systems has emerged in the form of Industry 4.0 (Frank et al., 2019). These technologies can provide useful data to manufacturing systems. In particular, in smart manufacturing systems, planning activities can rely on information provided by intelligent computer systems, which use historical data to generate valuable information (e.g., future product demands).
Demand forecasting refers to predicting or estimating the need for a product or a component in a future time period (Armstrong, 2001). The information about the forecasted demand can be used to support managerial decisions and planning activities in operations. For example, in Guo et al. (2017), the forecasts are employed to support ordering decisions of airplane spare parts. Yu et al. (2011) propose forecasting models to estimate demands of fashion products. Another example of application is presented by Syntetos et al. (2005), where the main objective is to forecast spare part demands in the automotive industry.
The accuracy of demand forecasts is important for companies, since forecasts are usually employed as input to inventory systems (Wang & Petropoulos, 2016; Rego & Mesquita, 2015; Babai et al., 2019). Several demand forecasting methods have been proposed in the literature, such as Simple Exponential Smoothing (SES) (Hyndman & Athanasopoulos, 2018) and Croston (CR) (Croston, 1972), among others. Many studies have been devoted to the selection of the appropriate forecasting method, which can depend on the time series characteristics, performance or professional expertise (Syntetos et al., 2005; Petropoulos et al., 2018; Moon et al., 2013).
In the literature, different selection criteria for forecasting models have been used. The selection can be based on the time series characteristics (Syntetos et al., 2005; Petropoulos et al., 2018; Heinecke et al., 2013), on the forecasting model performance (Wang & Petropoulos, 2016; Fildes & Petropoulos, 2015), on information criteria (Qi & Zhang, 2001), or on judgmental expert selection (Petropoulos et al., 2018). For example, Adya et al. (2001) proposed an automated framework to identify six different time series characteristics in a rule-based forecast system.
Rule-based forecasting is an expert system proposed by Collopy & Armstrong (1992), which relies on 28 characteristics of time series to weight four forecasting methods. In Adya et al. (2001), another strategy is proposed, in which the presence of outliers, level shifts, changes in trend, unstable recent trend, functional form, and unusual (last) demands are considered as time series characteristics. Another example is the selection based on information criteria proposed by Qi & Zhang (2001). Using financial time series from the S&P 500 Index, the strategy selects among many Artificial Neural Network and Autoregressive (AR) models using the Akaike information criterion and the Bayesian information criterion.
Furthermore, combinations of forecasting methods can significantly improve the accuracy of forecasts and reduce the variance of prediction errors, which are desirable characteristics for inventory purposes (Wang & Petropoulos, 2016). Combination of forecasts is the process of using different forecasting models to produce a final forecast. The application of combination schemes avoids implicit assumptions about the underlying data generation process. Kourentzes et al. (2019) proposed a heuristic, based on quartiles of the forecasting errors, to build a pool of forecasting models for combination and selection. The approach was evaluated using the M3-Competition data; however, the evaluation metrics employed are different from those used to evaluate the M3-Competition results, which makes the approach difficult to evaluate and compare. Barrow & Kourentzes (2016) analyze the impact of a forecasting combination in terms of forecast error distribution and safety stock using demand data of a consumer goods manufacturer. The authors revealed that forecasts from a combination of Naive Forecast (NF), SES, AR, Autoregressive Integrated Moving Average (ARIMA), Theta and Multiple Aggregation Prediction Algorithm (MAPA) models can improve inventory decisions (Barrow & Kourentzes, 2016). However, the work does not compare different combination strategies.
In addition, Guo et al. (2017) evaluated a double-level combination of forecasting methods to predict spare part data of an aircraft fleet. The work employed different data types (e.g., flight time, number of takeoffs, number of landings, among others) that influence spare part consumption to design the forecasting methods. The methods used are variations of Exponential Smoothing, Genetic Neural Networks and the Grey model. The proposed combination strategy consists of assigning weights to each forecasting method by solving a quadratic programming problem ("low-level combination") and using a genetic algorithm for a "top-level combination". The proposed approach outperformed other forecasting models. However, high computational time is required to produce predictions due to the use of a neural network that requires several computations to determine optimal weights. Also, Wang & Petropoulos (2016) employed a simple 50-50% combination of two forecasting sources, and the results showed improvements in forecast accuracy. The proposed combination is a simpler scheme, in which equal weights are assigned to two forecasting sources: a forecasting model from commercial software, and forecasts judgmentally produced by experts. The strategy was compared to other approaches, such as selection based on the variance of the forecast error, selection based on the Mean Absolute Error (MAE), and a single forecasting method. The proposed combination strategies minimized the total inventory cost while maximizing the forecast accuracy; however, the strategies were evaluated only on stationary data from the pharmaceutical industry.
On the other hand, Franses & Legerstee (2011) propose the use of a combination of forecasting methods with expert forecasts. The proposed combination strategy also considers the specific characteristics of experts to assign optimal weights. The results of the evaluation revealed that the 50-50% combination, using a statistical model and an expert model, can be a useful strategy. However, the strategy can hide the contribution of each model to the final forecast, since it can assign zero weight to one of them, although the literature suggests that the use of both is usually better. Overall, several works have demonstrated the predictive accuracy of combinations in forecasting problems.
Production, 30, e20200009, 2020 | DOI: 10.1590/0103-6513.20200009
Nonetheless, most studies do not deal with intermittent data series (i.e., series with a large number of zero values), which are common in the spare part industry. Intermittent demand time series can lead to high holding costs due to the high risk of obsolescence (Babai et al., 2019). This type of time series is especially difficult to forecast, because it contains few non-zero demand observations and has high variability of values. In stock control systems, inappropriate stock levels can lead to stock-outs or excessive quantities for intermittent items. Specific methods have been proposed (Croston, 1972; Syntetos et al., 2005; Babai et al., 2019). A popular method for this type of time series was proposed by Croston (1972), and it achieves empirically good performance in many studies. However, some works have pointed out the existence of positive bias in this method. For example, Syntetos et al. (2005) evaluated the performance of intermittent and traditional forecasting methods using spare part data from the automotive industry. In this case, an adjusted version of Croston's method, called the Syntetos and Boylan Approximation (SBA), achieved better performance than Croston's method.
Moreover, Babai et al. (2019) propose and evaluate an approach for intermittent demands, called the modified SBA method, using data from the military sector and the automotive industry. The approach is efficient for forecasting intermittent time series, since traditional forecasting approaches do not make appropriate adjustments when no demand occurs. However, the study lacks comparisons between the proposed approach and other approaches for handling intermittent data.
To address the listed problems, this paper proposes and compares one forecasting method selection strategy and two forecasting method combination strategies for demand forecasting problems with different characteristics (e.g., intermittency, trend, stationarity and nonstationarity). In the forecasting method selection strategy, the best forecasting method from a pool of forecasting methods is selected based on its accuracy on a validation interval. On the other hand, the forecasting method combination strategies are based on the mean of the methods' outputs and on the methods' accuracy. Forecasting method combinations are employed in this work because research has demonstrated that they improve the generalization capability and overall performance of the systems (Soares et al., 2012), and thus they have an advantage over a single individual forecasting model in terms of forecasting accuracy (Choi & Lee, 2018).
The strategies are developed using SES, Holt's linear trend method (HOLT) (a variant of SES), CR, AR and NF as forecasting models. The main contribution of this work is to propose a set of forecasting models with heterogeneous capabilities, so that the proposed strategies can achieve good results on time series with different data characteristics (such as intermittency, increasing or decreasing patterns, stationarity and nonstationarity). Experiments using a spare part data set (with intermittent demand) from the elevator industry and a data set from the M3-Competition (Makridakis & Hibon, 2000) are reported to demonstrate the performance of the proposed strategies. In summary, this paper proposes and compares a number of selection and combination strategies for intermittent and non-intermittent time series.
The rest of this paper is organized as follows. Section 2 describes the forecasting methods, the evaluation metrics, and the proposed combination and selection strategies. Section 3 presents and discusses the main results of this paper. In Section 4, the concluding remarks are summarized.

Proposed forecasting approaches
This section describes the proposed approaches for demand forecasting. It starts by describing the main concepts of forecasting methods and the forecasting methods employed in this paper (Subsection 2.1). Then, Subsection 2.2 details the main evaluation metrics for forecasting methods. Subsection 2.3 introduces combination strategies for forecasting methods, and presents the combination strategies proposed in this paper. Finally, Subsection 2.4 presents the main strategies for forecasting method selection, and describes the forecasting method selection proposed in this work. Table 1 describes the main nomenclature employed in this paper.

Forecasting methods
In this paper, the demand forecasting methods were selected to cover a wide range of time series characteristics (such as trend, intermittency and autocorrelation, among others). Moreover, state-of-the-art methods in the literature and methods employed in commercial software were selected. As described previously, the selected methods are SES, HOLT, CR, AR and NF. Table 2 presents the main notations of this work.
NF is a well-known forecasting method, widely used in the literature as a benchmark for comparisons among other forecasting methods (Franses & Legerstee, 2011; Fildes & Petropoulos, 2015; Wang & Petropoulos, 2016). This method assumes that the last observation in the time series is the most important one. In this case, an estimate obtained by the NF method is equal to the last observed demand, that is:

$$\hat{y}_{t+h} = y_t \qquad (1)$$

In Equation 1, $y_t$ is the observed (real) demand at time $t$ and $\hat{y}_{t+h}$ is a forecast (prediction) for time $t+h$.
The SES method is also a popular forecasting method and can be found in several commercial software packages, such as SAP, Oracle RDF and ForecastPro. SES is usually employed when there is no clear pattern of trend or seasonality in a time series (Hyndman & Athanasopoulos, 2018). A forecast made by the SES method is determined as:

$$\hat{y}_{t+1} = \alpha y_t + (1-\alpha)\,\hat{y}_t \qquad (2)$$

In Equation 2, $0 \le \alpha \le 1$ is a smoothing parameter. The SES method works by weighting past observations, where the weights decrease exponentially as the observations get older. To deal with time series with trends, a variant of the SES method, called HOLT, was included in this work. Holt's linear trend method is a modification of SES in which the forecast value is decomposed into level and trend components, with the trend component scaled by the horizon $h$ (Holt, 2004). A prediction using the HOLT method can be performed as in Equation 3:

$$\hat{y}_{t+h} = l_t + h\,b_t \qquad (3)$$

$$l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1}) \qquad (4)$$

$$b_t = \beta\,(l_t - l_{t-1}) + (1-\beta)\,b_{t-1} \qquad (5)$$

In Equation 4 and Equation 5, $0 \le \alpha \le 1$ and $0 \le \beta \le 1$ are the smoothing parameters, $l_t$ is a forecast of the level of the series at time $t$, and $b_t$ is a forecast of the trend (slope) of the series at time $t$.
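The recursions for NF, SES and HOLT can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names are hypothetical, and the initial values (first observation for the level, first difference for the trend) are simplifying assumptions, whereas the paper initializes them with a grid search.

```python
def naive_forecast(y, h):
    """Equation 1: repeat the last observed demand for every horizon step."""
    return [y[-1]] * h


def ses_forecast(y, alpha, h):
    """Equation 2: exponentially weighted level, flat over the horizon."""
    level = y[0]  # ASSUMPTION: initial fitted value set to the first observation
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h


def holt_forecast(y, alpha, beta, h):
    """Equations 3-5: final level plus k times the final trend, k = 1..h."""
    level, trend = y[0], y[1] - y[0]  # ASSUMPTION: simple initialisation
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + k * trend for k in range(1, h + 1)]
```

With alpha set to 1, `ses_forecast` returns the last observation, so it coincides with `naive_forecast`.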
The literature proposes specific methods to deal with intermittent demand time series (Croston, 1972; Syntetos et al., 2005), which are characterized by multiple periods of zero demand (Kourentzes, 2013). The best known, Croston's method (Croston, 1972), is present in commercial software, for example, SAP and Oracle RDF. In the CR method, the estimates are obtained as follows:

$$\hat{z}_t = \alpha y_t + (1-\alpha)\,\hat{z}_{t-1} \qquad (6)$$

$$\hat{p}_t = \alpha p_t + (1-\alpha)\,\hat{p}_{t-1} \qquad (7)$$

$$\hat{y}_{t+h} = \frac{\hat{z}_t}{\hat{p}_t} \qquad (8)$$

The method consists of forecasting separately the demand size, smoothed from the non-zero demand $y_t$ in Equation 6, and the time interval between demands, $p_t$ in Equation 7, assuming that both variables are independent (Croston, 1972). Both estimates are updated only in periods with non-zero demand. Finally, Equation 8 provides the expected demand rate (forecast demand) per period.
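As an illustration of the CR method, the following Python sketch smooths the non-zero demand sizes and the intervals between them separately and returns their ratio. It is a sketch under stated assumptions, not the paper's implementation: the function name is hypothetical, the interval is initialized with the number of periods up to the first demand, and the default smoothing parameter of 0.15 is the value reported later in the setup.

```python
def croston_forecast(y, alpha=0.15, h=1):
    """Croston-style forecast: smoothed demand size divided by smoothed interval."""
    nonzero = [(t, v) for t, v in enumerate(y) if v > 0]
    if not nonzero:
        return [0.0] * h  # no demand observed at all
    first_t, first_v = nonzero[0]
    size = float(first_v)          # level: first non-zero demand
    interval = float(first_t + 1)  # ASSUMPTION: periods elapsed up to first demand
    gap = 1                        # periods since the last non-zero demand
    for v in y[first_t + 1:]:
        if v > 0:
            # Estimates are updated only when a demand occurs.
            size = alpha * v + (1 - alpha) * size
            interval = alpha * gap + (1 - alpha) * interval
            gap = 1
        else:
            gap += 1
    return [size / interval] * h
```

The forecast is flat over the horizon, since it represents an expected demand per period rather than the timing of the next demand.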
The AR models are a flexible class of models and can handle a wide range of time series patterns; however, in general, they are applied to stationary time series (Hyndman & Athanasopoulos, 2018). Unlike traditional regression models, the variable of interest is predicted from its own past values (hence the term autoregression). The AR method of order $p$, also referred to as AR($p$), was selected in this work and it can be written as follows:

$$y_t = c + \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + \dots + \varphi_p y_{t-p} + \varepsilon_t \qquad (9)$$

In Equation 9, $c$ is a constant, $\varepsilon_t$ is the white noise of the time series, and $\varphi_1, \dots, \varphi_p$ are weight parameters. Note that the estimates (forecasts) are produced by a linear combination of lagged values of $y$.
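To make the "linear combination of lagged values" concrete, the sketch below produces multi-step forecasts from an AR(p) model whose constant and coefficients are already known. The function name and the example coefficients are hypothetical; parameter estimation (done by maximum likelihood in the paper) is outside the scope of this sketch, and the noise term is replaced by its expected value of zero.

```python
def ar_forecast(y, c, phi, h):
    """Iterate Equation 9 forward h steps, feeding forecasts back as lags."""
    history = list(y)  # assumes len(y) >= len(phi)
    p = len(phi)
    forecasts = []
    for _ in range(h):
        lags = history[-p:][::-1]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
        y_hat = c + sum(coef * lag for coef, lag in zip(phi, lags))
        forecasts.append(y_hat)
        history.append(y_hat)
    return forecasts


# Hypothetical AR(1) with c = 1 and phi_1 = 0.5
print(ar_forecast([1, 2], c=1, phi=[0.5], h=2))
```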

Evaluation metrics for forecasting methods
To measure the accuracy of forecasting methods, several evaluation metrics can be found in the literature. Mean Absolute Percentage Error (MAPE) is a popular evaluation metric, suitable for evaluating different time series, because it is independent of the data scale and is easy to interpret. However, it can produce infinite or undefined errors if zero values (or values close to zero) occur in the data, due to the division by the real value of demand. Since zero values are common in intermittent demand time series, MAPE is not suitable for this type of series, as pointed out by Teunter & Duncan (2009), Hyndman & Koehler (2006) and Makridakis et al. (2018).
Root Mean Square Error (RMSE) is a typical error metric widely employed in forecasting methods and machine learning methods. It does not suffer from the problem mentioned above. However, RMSE is more sensitive to outlier values. RMSE can be obtained as:

$$RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N} e_t^2} \qquad (10)$$

In Equation 10, $N$ is the number of observations, and $e_t$ is the forecast error at time $t$, obtained as $y_t - \hat{y}_t$.
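A survey of evaluation metrics for forecasting methods is presented by Hyndman & Koehler (2006). Mean Absolute Scaled Error (MASE) can overcome the drawbacks of other evaluation metrics, because it is independent of the data scale, less sensitive to outlier values and only produces undefined errors when all historical observations are equal, so that all the NF forecast errors in the denominator are zero. MASE uses a scaled error, $q_t$, as in Equation 11:

$$q_t = \frac{e_t}{\dfrac{1}{n-1}\sum_{i=2}^{n} \lvert y_i - y_{i-1} \rvert} \qquad (11)$$

Following Hyndman & Koehler (2006), the denominator is the in-sample mean absolute error of the one-step NF method over $n$ in-sample observations, and MASE is the mean of $\lvert q_t \rvert$.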
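Both metrics are short enough to sketch in plain Python. The function names are hypothetical, and the MASE scaling follows the definition of Hyndman & Koehler (2006), in which the denominator is the in-sample mean absolute error of the one-step naive forecast; whether the paper computes the scale on exactly the same interval is an assumption.

```python
import math


def rmse(actual, forecast):
    """Root mean square of the forecast errors e_t = y_t - y_hat_t."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mase(actual, forecast, insample):
    """Mean absolute error scaled by the in-sample naive one-step MAE."""
    diffs = [abs(insample[i] - insample[i - 1]) for i in range(1, len(insample))]
    scale = sum(diffs) / len(diffs)
    if scale == 0:
        # All in-sample observations are equal: MASE is undefined.
        raise ValueError("MASE is undefined for a constant in-sample series")
    return sum(abs(a - f) / scale for a, f in zip(actual, forecast)) / len(actual)
```

A MASE below 1 means the forecasts have a smaller average absolute error than the in-sample naive forecast.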

Combination of forecasting methods
The combination of forecasting methods has become an important strategy in many forecasting works and has been used as a benchmark in many applications (Makridakis et al., 2018). For example, Wang & Petropoulos (2016) propose and evaluate a strategy that combines two models, namely judgmental adjustment and statistical output. The strategy is compared to a number of other demand forecast approaches. In most cases, creating a set of forecasting models and taking a simple average of the methods' outputs obtains better accuracy than using a single forecasting method.
This paper proposes the use of two combination strategies for aggregating forecasting methods: simple mean and weighted mean. In the first strategy, the final forecast is obtained by averaging the methods' outputs; in the second strategy, the final forecast is calculated by taking a weighted sum of the methods' outputs, where the weight of each method is determined using the method's error on a validation interval. This work uses the terms CF_m and CF_w for the combination of forecasting methods with the simple mean strategy and with the weighted mean strategy, respectively.
Assuming $m$ as the number of forecasting methods, $o_j$ as the output (forecast) of the model $j$ for any time instant, and $w_j$ as the weight of the model $j$, the combination output is given by Equation 12:

$$CF = \sum_{j=1}^{m} w_j\,o_j \qquad (12)$$

For the weighted mean strategy (CF_w), the RMSE and MASE metrics on a validation interval are used to compute the methods' weights. Consider $error_j$ as the error value (i.e., RMSE value or MASE value) of a method $j$ on a validation interval. Its weight $w_j$ is computed by Equation 13 (Soares et al., 2012) from the adjusted error of the method (Equation 14), which in turn depends on the average error of the $m$ methods (Equation 15). Therefore, the main idea of the CF_w strategy is to assign a weight to each forecasting method according to its performance on a validation interval. For the CF_m strategy, the methods have the same contribution in the system, so that their weights are equal and set to $w_j = 1/m$ (for $j = 1, \dots, m$). In this paper, the forecasting methods for designing the combination systems (i.e., CF_m and CF_w) are SES, HOLT, NF, AR and CR. Therefore, the number of forecasting models is 5, so that $m = 5$ for Equation 12, Equation 13 and Equation 15.
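The two combination strategies can be sketched as follows. The weighted sum follows Equation 12 directly. The weighting function, however, is only a stand-in: it assumes weights proportional to the inverse of each method's validation error, normalized to sum to one, which captures the stated idea (better methods get larger weights) but is not claimed to be the exact form of Equations 13 to 15 from Soares et al. (2012). All function names are hypothetical.

```python
def combine(outputs, weights):
    """Equation 12: weighted sum of the m method outputs for one time instant."""
    return sum(w * o for w, o in zip(weights, outputs))


def equal_weights(m):
    """CF_m: every method contributes w_j = 1/m."""
    return [1.0 / m] * m


def inverse_error_weights(errors):
    """CF_w stand-in (ASSUMPTION): w_j proportional to 1 / error_j.

    Requires strictly positive validation errors (RMSE or MASE).
    """
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


# Two hypothetical methods with validation errors 1.0 and 3.0
weights = inverse_error_weights([1.0, 3.0])
print(combine([10.0, 20.0], weights))
```

In this example the method with the lower validation error receives three quarters of the weight, so the combined forecast sits closer to its output.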

Selection strategies for forecasting methods
A selection strategy aims to choose the best forecasting method (from a set of forecasting methods) using some criterion (for example, the accuracy on a validation interval). Different criteria can be found in the literature. For example, based on the time series characteristics, Syntetos et al. (2005) proposed a selection scheme using the average inter-demand interval and the squared coefficient of variation of demand size. This scheme uses cut-off values to select between CR and an approach called the Syntetos and Boylan approximation. This framework was proposed for intermittent demand time series.
On the other hand, Moon et al. (2013) propose another type of selection strategy. In this case, a squared coefficient of variation, a correlation and an equipment group (an external variable) were employed as inputs to a logistic regression model, where the main purpose is to predict which forecasting method has superior performance. The proposed model was evaluated using a real data set containing spare part demands from the Korean Navy. Another approach is to select a forecasting method based on its performance. This approach consists of selecting the best forecasting method on the previous periods (data samples), assuming that the selected forecasting method will be more suitable for the next periods (Wang & Petropoulos, 2016). In most cases, the time series is divided into three time intervals, which are employed as training (used for building the forecasting method), validation (used for selecting the best forecasting method) and testing (used for evaluating the best forecasting method on a future time interval), respectively.
The performance of the selection strategies can be evaluated in terms of the accuracy of the selected forecasting method. For example, Wang & Petropoulos (2016) compared the performance of five different forecasting strategies: a forecasting method using a statistical model, a forecasting method adjusted by an expert, a combination of two forecasting methods, and selections of a forecasting method based on the accuracy or on the variance. The performance of the strategies is evaluated in terms of inventory system metrics and forecasting method accuracy.
On the other hand, Fildes & Petropoulos (2015) propose four rules to select a forecasting model, which are then compared to simple combination and aggregate selection (selection of the best forecasting method for all the time series). The strategies were analyzed with a subset of the M3-Competition data set, a popular data set for time series forecasts. The rules incorporate different selection criteria: "[...] best in-sample fit, best validation performance for one-step-ahead forecast, best validation performance on a pre-defined forecast horizon h and best validation performance for all forecast horizons" (Fildes & Petropoulos, 2015, p. 1694). Other aspects are considered in that work, such as the size of the pool of forecasting methods and the accuracy of the individual selection. According to the authors, aggregate selection can be preferred if the data contain more similar sub-populations, but individual selection can be necessary when the methods considered in the pool have low correlation.
The effect of the selection of forecasting methods by experts was analyzed by Petropoulos et al. (2018) using data from the M3-Competition. The experiment differs from other works due to the inclusion of human judgment in the selection of forecasts. Also, a combination of forecasting methods is employed for performance comparisons. The results suggest that the inclusion of human judgment in forecasting systems can be a useful practice (Petropoulos et al., 2018). Hyndman & Khandakar (2008) proposed two automatic selection procedures for forecasting methods. The algorithms can select a forecasting method based on Akaike's Information Criterion for Exponential Smoothing models or based on a step-wise procedure for an ARIMA model.
This paper proposes and analyzes the use of a selection strategy for choosing the best forecasting model, from a pool of forecasting models, based on the performance on a validation interval. That is, in the first step, a pool of seven forecasting methods (namely, SES, HOLT, NF, AR, CR, CF_m and CF_w) is designed using a training interval. Next, the forecasting methods are evaluated using a validation interval. Then, the model with the lowest error (MASE or RMSE) is selected as the final forecasting model. Finally, the selected model is refitted using the training and the validation intervals, and is evaluated on the testing interval to analyze its performance on future samples. In the next sections, this proposed approach, with automatic selection of a forecasting method, is termed "AUTO". Figure 1 presents an overview of the described approach.
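The AUTO procedure described above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the paper's code: the function name `auto_forecast` and its signature are hypothetical, each candidate is represented as a function that fits on a history and returns h forecasts, and the pool contains only two toy candidates instead of the seven methods.

```python
import math


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def auto_forecast(series, t1, t2, methods, metric=rmse):
    """Select the method with the lowest validation error, then refit and forecast."""
    train, valid = series[:t1], series[t1:t2]
    # Step 1-2: fit every candidate on the training interval, score on validation.
    errors = {name: metric(valid, fit_predict(train, len(valid)))
              for name, fit_predict in methods.items()}
    # Step 3: keep the candidate with the lowest error.
    best = min(errors, key=errors.get)
    # Step 4: refit on training + validation and forecast the testing interval.
    horizon = len(series) - t2
    return best, methods[best](series[:t2], horizon)


# Toy pool (ASSUMPTION): a naive forecast and a historical-mean forecast
methods = {
    "NF": lambda y, h: [y[-1]] * h,
    "MEAN": lambda y, h: [sum(y) / len(y)] * h,
}
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
best, forecast = auto_forecast(series, t1=6, t2=8, methods=methods)
print(best, forecast)
```

Passing a MASE function as `metric` would give the MASE-based variant of the selection.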

Experimental design and results
In this section, the proposed forecasting methods (CF_m, CF_w and AUTO) are evaluated using two real-world data sets: a spare part data set from the elevator industry and the M3-Competition data set. The proposed approaches are compared to SES, HOLT, NF, AR and CR. The experiments were performed using the Python programming language, running on a PC equipped with an Intel® Core™ i5-7200U 2.50GHz processor with 2 cores and 8GB of RAM. The implementations of the forecasting methods (except the CR method) can be found in a Python library (Statsmodels, 2020) called Econometric and Statistical Modeling with Python, published by Seabold & Perktold (2010). The implementation of the CR method, in turn, was developed using tools of the mentioned library and other Python libraries.

Data set description
The M3-Competition data set is a popular and public data set (M3-Competition, 2020) containing a large number of time series (Makridakis & Hibon, 2000). It consists of 3,003 time series, of which 1,428 time series have monthly demand data from various types of applications, including industry, demographics and finance, among others. This data set was selected due to the large number of observations (samples) in the time series, providing enough information to build forecasting methods.
The other data set is a private data set provided by an elevator industry organization in Brazil. It contains time series with monthly demands of a spare part throughout the year 2019 in 54 geographic locations (cities) in Brazil. Therefore, the total number of time series is 54, one for each geographic location. A particular characteristic of this data set is the high presence of demands with zero values. Therefore, most time series are intermittent.
A test to verify the intermittent characteristics of both data sets was performed, using the framework proposed by Syntetos et al. (2005). The test consists of computing the values for the average inter-demand interval and the coefficient of variation. Then, it employs these values to classify a time series as intermittent, using cut-off values proposed by the authors. Using this test, it was verified that the M3-Competition data set does not have intermittent characteristics, whereas the spare part data set has intermittent characteristics. Table 3 describes the main specifications of both data sets.
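A minimal sketch of such a test is given below. Several details are assumptions and are not stated in the text: the average inter-demand interval (ADI) is approximated as the number of periods divided by the number of non-zero demands, the squared coefficient of variation is computed over non-zero demand sizes, and the cut-off values 1.32 and 0.49 are the ones commonly cited from Syntetos et al. (2005). The function names are hypothetical.

```python
from statistics import mean, pstdev

ADI_CUTOFF = 1.32  # ASSUMPTION: commonly cited cut-off for the inter-demand interval
CV2_CUTOFF = 0.49  # ASSUMPTION: commonly cited cut-off for the squared CV


def demand_profile(y):
    """Return (ADI, CV^2); assumes at least one non-zero demand in y."""
    nonzero = [v for v in y if v > 0]
    adi = len(y) / len(nonzero)
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2
    return adi, cv2


def is_intermittent(y):
    """Flag a series whose demands are, on average, far apart in time."""
    adi, _ = demand_profile(y)
    return adi > ADI_CUTOFF


print(demand_profile([0, 0, 5, 0, 0, 5]), is_intermittent([0, 0, 5, 0, 0, 5]))
```

The squared coefficient of variation is returned as well, since the cut-off on it separates series with stable demand sizes from series with highly variable sizes.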

Approach description and setup
To train and evaluate the forecasting models, the following time series division was adopted. Each time series (of size $T$) was divided into three time intervals: training interval (60%), validation interval (20%) and testing interval (20%). This division allows evaluating a time series according to its size, creating a more realistic scenario.
The first interval, with the training data, contains data from time 1 to $T_1$, and it is used to fit and set up the forecasting models. The second interval, with the validation data, has data from time $T_1+1$ to $T_2$, and it is used to evaluate the performance of a forecasting method using the predictions for this interval. The third interval, with the testing data, contains data from time $T_2+1$ to $T$ and is used with a twofold purpose: to evaluate the performance of the selection and combination strategies, and to evaluate the forecasting methods in terms of RMSE and MASE.
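The 60/20/20 division can be sketched as a small helper. The function name is hypothetical, and the rounding rule (integer floor of 60% and 80% of the series length) is an assumption, since the text does not state how fractional boundaries are handled.

```python
def split_series(series):
    """Split a series into training (60%), validation (20%) and testing (20%)."""
    T = len(series)
    t1 = (T * 60) // 100  # ASSUMPTION: boundaries rounded down
    t2 = (T * 80) // 100
    return series[:t1], series[t1:t2], series[t2:]


train, valid, test = split_series(list(range(1, 31)))  # T = 30
print(len(train), len(valid), len(test))
```

For T = 30 this gives 18 training, 6 validation and 6 testing observations, so the forecasting horizon grows with the length of the series.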
As described previously, to select the best forecasting method, the RMSE and MASE metrics on the validation data are used. Some authors name this approach "past forecast performance" (Wang & Petropoulos, 2016; Fildes & Petropoulos, 2015). It consists of producing an h-step-ahead forecast with $h$ varying from 1 to $T_2 - T_1$, computing the accuracy, and selecting the method with the best (lowest) value of RMSE or MASE, according to the configuration of the experiment.
The parameters and setup of the forecasting methods are the following:
• Naive Forecast method (NF). Implemented considering Equation 1. The NF method does not require any parameter setup, and it can also be implemented using the SES method (described below) by setting α=1;
• Simple Exponential Smoothing method (SES). The selected smoothing parameter α is the one that maximizes the log-likelihood. The first "in-sample" fitted value (i.e., $\hat{y}_1$) is initialized using a grid search method;
• Holt's linear trend method (HOLT). The smoothing parameter for level α and the smoothing parameter for trend β are chosen by maximizing the log-likelihood. As in the SES method, a grid search method is used for initializing the level and the trend values;
• Autoregressive method (AR). The constant term $c$, the model order $p$ and the coefficients $\varphi_1, \dots, \varphi_p$ are estimated using an unconditional maximum likelihood approach;
• Croston method (CR). The smoothing parameter α was set to 0.15, as suggested by Teunter & Duncan (2009). The initial value for the interval $p$ is the first interval between demands, and the initial value for the level $z$ is the first non-zero demand value.

Evaluation methodology
To evaluate the performance of the forecasting methods, data from time 1 to $T_2$ (training and validation intervals) are employed to train the methods, and data from time $T_2+1$ to $T$ are used to evaluate the methods (testing interval). The h-step-ahead forecasts for the testing interval are produced by varying $h$ from 1 to $T - T_2$. The RMSE and MASE errors of the forecasting methods on the testing interval are computed. This configuration allows comparing the performance of the selection procedure to the other forecasting methods.
The results of the forecasting methods on the testing interval, averaged over all the time series, are reported below.

Results and discussion
In this subsection, the results of the forecasting methods are reported. Table 4 presents the results for the M3-Competition data set using time series with 30, 60 and all the observations (i.e., T = 30, T = 60 and Full), using the RMSE and MASE metrics to compute the accuracy. By varying the time series size, the effect of the horizon $h$ and of the number of observations used to produce the forecasts can be analyzed. Table 5 shows the results for the spare part data set. Each error value, for RMSE and MASE, is calculated by averaging the error values of all the time series. In all tables, the best performing method is highlighted in bold.
Considering the time series with 30 observations (Table 4), the results reveal that the CF_w method has the lowest error for MASE and RMSE. Therefore, the CF_w method performs well compared to the other forecasting methods for small time series sizes. Moreover, the combination strategy using equal weights (CF_m) has the second lowest error (considering RMSE). This shows that a combination of forecasting methods can outperform other approaches.
Considering the results shown in Table 4 (T=60), the CF_w method has good performance and is followed by the AUTO selection strategy, considering both error metrics. This result indicates that CF_w keeps good accuracy as the size of the time series increases. The CF_m method remains among the top three performing forecasting methods.
Table 4, Full column, shows the results using all the observations of the time series. In this case, the results present some differences. The accuracy of the AUTO method improved as the size of the time series increased, performing better than all the other forecasting methods when considering the RMSE metric. This suggests that the AUTO strategy is more sensitive to the size of the time series. The CF_w method is also among the best performing methods, achieving the second best accuracy in both metrics.
Moreover, AUTO and CF_m have similar performance when considering the RMSE metric, but AUTO has worse performance when using the MASE metric. In general, the RMSE and MASE values of all the methods increase as the time series size (T) increases, since the forecasting horizon (h), that is, the number of testing observations, also increases. For example, the RMSE values of the CF_w method are 776.355, 773.046 and 868.119 for T=30, T=60 and Full, respectively. This occurs because when h is larger, more uncertainty is associated with the time horizon h [9]. In particular, the AR method may be less sensitive to this effect, since its RMSE values are 883.593 and 871.413 for T=60 and the full time series data, respectively.
The performance results for the spare part data set (Table 5) are similar to those for the M3-Competition data set. The CR method, which is designed for intermittent series, performs better than the other forecasting methods, achieving 95.940 and 0.844 for RMSE and MASE, respectively. Considering RMSE, the second best performing method is CF w, which performs well on this data set but does not outperform CR. Comparing AUTO and CF m, AUTO achieves 101.109 for RMSE and 0.891 for MASE, whereas CF m achieves 101.047 and 0.882, respectively. Thus, the AUTO strategy performs worse than CF m on the spare part data set.
Additionally, SES achieved a notable performance on both data sets, performing better than AUTO and CF m in some cases. This is consistent with the popularity of SES in the literature and among providers of commercial software.
In general, the selection and combination strategies generalize well across both data sets, which suggests that they can be applied effectively to other data sets.
It should be pointed out that AUTO performs similarly to CF m, so an additional test was performed to compare their accuracy. Table 6 shows the percentage of time series in which AUTO has better accuracy (lower error) than CF m in the testing interval. For example, considering the M3-Competition data set (with "Full" time series size) and the RMSE metric, AUTO outperforms CF m in 50.14% of the time series (from a total of 1,428 time series). The results confirm that both forecasting methods have similar performance on both the spare part and M3-Competition data sets.
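The percentage reported in Table 6 can be computed with a per-series comparison such as the sketch below. The function name `pct_auto_better` is hypothetical, and counting ties as non-wins for AUTO is an assumption, since the text does not state how ties are handled.

```python
def pct_auto_better(auto_errors, cfm_errors):
    """Percentage of time series in which AUTO has a strictly lower error
    than CF m on the testing interval.

    auto_errors, cfm_errors: per-series error values (RMSE or MASE),
    aligned so that position k refers to the same time series.
    Assumption: a tie does not count as a win for AUTO.
    """
    wins = sum(1 for a, c in zip(auto_errors, cfm_errors) if a < c)
    return 100.0 * wins / len(auto_errors)
```

A value close to 50% means that neither strategy dominates the other at the level of individual series, even when their averaged errors differ slightly.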

Conclusion
This work proposes a forecasting application strategy considering two procedures: the combination of state-of-the-art forecasting methods and the selection of forecasting methods based on the models' accuracy. Two combination strategies are proposed: simple mean and weighted mean based on the methods' accuracy. The performance of the models is evaluated using the MASE and RMSE metrics in order to measure the accuracy of the forecasting strategies under different scenarios, avoiding problems reported by previous works (Hyndman & Koehler, 2006).
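The two combination strategies can be illustrated with a short sketch. The simple mean follows directly from the description; for the weighted mean, the weighting rule used here (weights proportional to the inverse of each method's error, normalized to sum to one) is only an assumption chosen for illustration, and the function names are hypothetical.

```python
def combine_mean(forecasts):
    """Simple mean combination (CF m): average the methods' outputs.

    forecasts: dict mapping method name -> list of forecasts over the horizon.
    """
    methods = list(forecasts)
    horizon = len(forecasts[methods[0]])
    return [sum(forecasts[m][t] for m in methods) / len(methods)
            for t in range(horizon)]

def inverse_error_weights(errors):
    """Assumed weighting rule: weight proportional to 1 / error.

    errors: dict mapping method name -> error on a held-out interval.
    Errors must be strictly positive; a zero error is not handled here.
    """
    inverse = {m: 1.0 / e for m, e in errors.items()}
    total = sum(inverse.values())
    return {m: v / total for m, v in inverse.items()}

def combine_weighted(forecasts, errors):
    """Weighted mean combination (CF w) under the assumed weighting rule."""
    weights = inverse_error_weights(errors)
    horizon = len(next(iter(forecasts.values())))
    return [sum(weights[m] * forecasts[m][t] for m in forecasts)
            for t in range(horizon)]
```

With two methods forecasting 10 and 20 and errors of 1 and 3, the simple mean gives 15, while the weighted mean gives 12.5 because the more accurate method receives a weight of 0.75.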
To simulate different and more realistic scenarios, this work used two data sets with different characteristics: a public data set from the M3-Competition and a private data set of spare part demand from the elevator industry. The latter presents a particular time series characteristic called intermittency. The tested data sets allow assessing how well the proposed strategies generalize to other data sets.
The combination of forecasting methods proves to be valuable when a weighting scheme based on performance is employed. Nevertheless, the combination using the simple mean also outperforms other forecasting methods (such as SES). The combination strategy is easy to understand and implement, and can be used in future work on forecasting methods. Moreover, the experimental results indicate that automatic selection based on the performance on a validation interval (AUTO) may not be a good criterion for selecting forecasting models, since CF w outperforms AUTO.
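The AUTO selection rule amounts to picking, for each time series, the method with the lowest error on the validation interval. A minimal sketch follows; the function names are hypothetical, and breaking ties in favor of the first method listed is an assumption.

```python
def select_auto(validation_errors):
    """Return the name of the method with the lowest validation error.

    validation_errors: dict mapping method name -> error (e.g. RMSE)
    measured on the validation interval of one time series.
    Assumption: ties are broken in favor of the first method in the dict.
    """
    return min(validation_errors, key=validation_errors.get)

def auto_forecast(forecasts, validation_errors):
    """Return the selected method and its forecasts for the testing interval.

    forecasts: dict mapping method name -> list of forecasts.
    """
    best = select_auto(validation_errors)
    return best, forecasts[best]
```

Because the choice rests on a single validation interval, a method that happens to fit that interval well can be selected even if it generalizes poorly, which is one plausible reading of why a weighted combination can be more robust.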
In general, the results suggest that combination strategies have potential application in demand forecasting problems, outperform other state-of-the-art models in trend and stationary series, and have comparable accuracy to other models in intermittent series. Therefore, they can be used to improve production planning activities in different applications and scenarios. Future work should be devoted to testing other selection criteria, for example a selection strategy based on inventory performance, as proposed by Wang & Petropoulos (2016). Moreover, future work can also consider a rolling (dynamic window) forecast design (Fildes & Petropoulos, 2015; Wang & Petropoulos, 2016). The inclusion of other forecasting methods (such as multivariate methods and machine learning methods) in the pool of selection and combination models can also be considered as future work.
Production, 30, e20200009, 2020 | DOI: 10.1590/0103-6513.20200009
MASE is calculated as mean(|q_t|), where N is the time series size on the training interval, and y_i and y_{i−1} are the real demand values at times i and i−1, respectively.

Figure 1. An overview of the time intervals in this paper.

Figure 2. A time series of size 30 (T=30) from the M3-Competition data set.

Figure 3. A time series of size 60 (T=60) from the M3-Competition data set.

Figures 2, 3 and 4 show selected time series from the M3-Competition data set: Figure 2 shows a time series of size 30 (T=30), Figure 3 presents a time series of size 60 (T=60), and Figure 4 displays a time series of size 126 (T=126). Time series from the spare part data set are omitted to preserve the privacy of this data set.

Figure 4. A time series of size 126 (T=126) from the M3-Competition data set.
w: Vector of weights in the combination of forecasts.
o: The forecast output vector of methods.
N: Number of observations.

Table 3. Specification of the Data Sets Used in the Experiments.

Table 4. Results of the M3-Competition Data Set with T = 30, T = 60 and Full (all the observations).

Table 5. Results of the Spare Part Data Set.

Table 6. Percentage of Time Series in which AUTO Has Better Accuracy than CF m.