Models for optimising the theta method and their relationship to state space models

Accurate and robust forecasting methods for univariate time series are very important when the objective is to produce estimates for large numbers of time series. In this context, the Theta method’s performance in the M3-Competition caught researchers’ attention. The Theta method, as implemented in the monthly subset of the M3-Competition, decomposes the seasonally adjusted data into two “theta lines”. The first theta line removes the curvature of the data in order to estimate the long-term trend component. The second theta line doubles the local curvatures of the series so as to approximate the short-term behaviour. We provide generalisations of the Theta method. The proposed Dynamic Optimised Theta Model is a state space model that selects the best short-term theta line optimally and revises the long-term theta line dynamically. The superior performance of this model is demonstrated through an empirical application. We relate special cases of this model to state space models for simple exponential smoothing with a drift. © 2016 The Author(s). Published by Elsevier B.V. on behalf of International Institute of Forecasters.


Introduction
The development of accurate, robust and reliable forecasting methods for univariate time series is very important when large numbers of time series are involved in the modelling and forecasting process. In industrial settings, it is very common to work with large lines of products; thus, efficient sales and operational planning (S&OP) depend heavily on accurate forecasting methods.
Despite the advantages of automatic model selection algorithms (Hyndman & Khandakar, 2008; Hyndman, Koehler, Snyder, & Grose, 2002; Poler & Mula, 2011), there is still a need for accurate extrapolation methods. Forecasting competitions have played an important role in moving toward the forecasting of large numbers of time series, with the objective of identifying high-performing methods. The Theta method attracted the attention of researchers through its simplicity and surprisingly good performance (Koning, Franses, Hibon, & Stekler, 2005; Makridakis & Hibon, 2000), and has been one of the benchmarks in more recent forecasting competitions (Athanasopoulos, Hyndman, Song, & Wu, 2011).
The Theta method (Assimakopoulos & Nikolopoulos, 2000, hereafter A&N) is applied to non-seasonal or deseasonalised time series, where the deseasonalisation is usually performed via the multiplicative classical decomposition. The method decomposes the original time series into two new lines through the so-called theta coefficients, denoted by θ_1 and θ_2 with θ_1, θ_2 ∈ ℝ, which are applied to the second differences of the data. The second differences are reduced when θ < 1, resulting in a better approximation of the long-term behaviour of the series (Assimakopoulos, 1995). If θ is equal to zero, the new line is a straight line. When θ > 1, the local curvatures are increased, magnifying the short-term movements of the time series (A&N). The new lines produced are called theta lines, denoted here by Z(θ_1) and Z(θ_2). These lines have the same mean value and slope as the original data, but the local curvatures are either filtered out or enhanced, depending on the value of the θ coefficient.
In other words, the decomposition process has the advantage of exploiting information in the data that usually cannot be captured and modelled completely through the extrapolation of the original time series. The theta lines can be regarded as new time series and are extrapolated separately using an appropriate forecasting method. Once the extrapolation of each theta line has been completed, recomposition takes place through a combination scheme in order to calculate the point forecasts of the original time series. Combining has long been considered a useful practice in the forecasting literature (for example, Clemen, 1989; Makridakis & Winkler, 1983; Petropoulos, Makridakis, Assimakopoulos, & Nikolopoulos, 2014), and therefore its application to the Theta method is expected to result in more accurate and robust forecasts.
The Theta method is quite versatile in terms of choosing the number of theta lines, the theta coefficients and the extrapolation methods, and combining these to obtain robust forecasts. However, A&N proposed a simplified version involving the use of only two theta lines with prefixed θ coefficients that are extrapolated over time using a linear regression (LR) model for the theta line with θ_1 = 0 and simple exponential smoothing (SES) for the theta line with θ_2 = 2. The final forecasts are produced by combining the forecasts of the two theta lines with equal weights. In the M3-Competition, this simplified version of the Theta method was applied only to the monthly time series (Nikolopoulos, Assimakopoulos, Bougioukos, Litsa, & Petropoulos, 2011).
The performance of the Theta method has also been confirmed by other empirical studies (for example, Nikolopoulos, Thomakos, Petropoulos, Litsa, & Assimakopoulos, 2012; Petropoulos & Nikolopoulos, 2013). Moreover, Hyndman and Billah (2003), hereafter H&B, showed that the simple exponential smoothing with drift model (SES-d) is a statistical model for the simplified version of the Theta method. More recently, Thomakos and Nikolopoulos (2014) provided additional theoretical insights, while Thomakos and Nikolopoulos (2015) derived new theoretical formulations for the application of the method to multivariate time series, and investigated the conditions under which the bivariate Theta method is expected to forecast better than the univariate one. Despite these advances, we believe that the Theta method deserves more attention from the forecasting community, given its simplicity and superior forecasting performance.
One key aspect of the Theta method is that, by definition, it is dynamic. One can choose different theta lines and combine the produced forecasts using either equal or unequal weights. However, A&N limit this important property by fixing the theta coefficients to have predefined values. Thus, the Theta method, as implemented in the M3-Competition, is limited in the sense that it focuses only on specific information in the data. On the other hand, if the selection of the appropriate theta lines had been carried out through optimisation, the method could focus on the information that is actually important.
The contributions of this work are fourfold. First, we extend the A&N method by the optimal selection of the theta line that describes the short-term movements of the series best, maintaining the long-term component. The forecasts derived from the two theta lines are combined using appropriate weights, which ensures the recomposition of the original time series. Second, we provide theoretical and practical links between the newly proposed model, the original Theta method and the SES-d model. Third, we also perform a further extension of the model that allows the regression line (the long-term component) to be revised at every time period. An empirical evaluation using the M3-Competition database is undertaken in order to obtain insights into the performances of the proposed models. The results reveal improvements in the forecasting accuracy when using the model with both extensions. This model outperforms several benchmarks as well as the A&N simplified version of the Theta method. Fourth, we reproduce the results for the Theta method, as applied to the monthly data in the M3-Competition, very closely.
The paper is organised as follows. Section 2 reviews the original Theta method of A&N and its relationship with the SES-d model. Section 3 presents different models for optimising the Theta method. Section 4 presents the forecasting performances of the proposed models, compared to a list of widely used benchmarks. The evaluation includes more than 3000 time series. Lastly, Section 5 presents our final comments and directions for future research.

The original Theta method
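Originally, A&N proposed the theta line as the solution of the equation
$$\nabla^2 Z_t(\theta) = \theta\, \nabla^2 Y_t, \qquad t = 3, \ldots, n,$$
where Y_1, ..., Y_n is the original time series (non-seasonal or deseasonalised) and ∇ is the difference operator (i.e., ∇X_t = X_t − X_{t−1}). The initial values of Z_1 and Z_2 are obtained by minimising
$$\sum_{t=1}^{n} \left[ Y_t - Z_t(\theta) \right]^2 .$$
However, an analytical solution to compute the Z(θ) was obtained by H&B, which is given by
$$Z_t(\theta) = \theta Y_t + (1 - \theta)(A_n + B_n t), \qquad t = 1, \ldots, n,$$
where A_n and B_n are the least squares coefficients of a simple linear regression of Y_1, ..., Y_n against 1, ..., n, given by
$$A_n = \frac{1}{n} \sum_{t=1}^{n} Y_t - \frac{n+1}{2} B_n, \qquad B_n = \frac{6}{n^2 - 1} \left( \frac{2}{n} \sum_{t=1}^{n} t\, Y_t - \frac{n+1}{n} \sum_{t=1}^{n} Y_t \right).$$
From this point of view, the theta lines can be interpreted as functions of the linear regression model applied to the data directly. However, note that A_n and B_n are only functions of the original data, not parameters of the Theta method.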
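The closed form attributed to H&B above can be checked numerically. The following is a minimal sketch, assuming the theta line is Z_t(θ) = θY_t + (1 − θ)(A_n + B_n t) with A_n and B_n the ordinary least squares intercept and slope on t = 1, ..., n; the function names and the short series are made up for illustration.

```python
def linear_trend(y):
    """Least-squares intercept A_n and slope B_n of y regressed on t = 1..n."""
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    slope = 6.0 / (n * n - 1) * (2.0 * sum_ty / n - (n + 1) * sum_y / n)
    intercept = sum_y / n - (n + 1) * slope / 2.0
    return intercept, slope


def theta_line(y, theta):
    """Closed-form theta line Z_t(theta) = theta*Y_t + (1 - theta)*(A_n + B_n*t)."""
    a_n, b_n = linear_trend(y)
    return [theta * v + (1.0 - theta) * (a_n + b_n * t)
            for t, v in enumerate(y, start=1)]


def second_diff(x):
    """Second differences x_t - 2*x_{t-1} + x_{t-2} for t = 3..n."""
    return [x[i] - 2.0 * x[i - 1] + x[i - 2] for i in range(2, len(x))]


y = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 24.0]  # made-up data
z0 = theta_line(y, 0.0)  # theta = 0: the regression line itself
z2 = theta_line(y, 2.0)  # theta = 2: local curvature doubled
```

Under these assumptions the second differences of `z2` are twice those of `y`, those of `z0` vanish, and every theta line keeps the mean of the original series.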
Finally, the forecasts produced by the Theta method for h steps ahead of n are an ad-hoc combination (50%-50%) of the extrapolations of Z(0) and Z(2) by the linear regression model and the simple exponential smoothing model, respectively. We will refer to the above setup as the standard Theta method (STheta).
The steps for building the STheta method of A&N are as follows:
1. Deseasonalisation: The time series is tested for statistically significant seasonal behaviour. A time series is seasonal if
$$|r_m| > q_{1-a/2} \sqrt{\frac{1 + 2 \sum_{k=1}^{m-1} r_k^2}{n}},$$
where r_k denotes the lag k autocorrelation function, m is the number of periods within a seasonal cycle (for example, 12 for monthly data), n is the sample size, q is the quantile function of the standard normal distribution, and (1 − a)% is the confidence level. A&N opted for a 90% confidence level. If the time series is identified as seasonal, then it is deseasonalised via the classical decomposition method, assuming the seasonal component to have a multiplicative relationship (see footnote 1).
2. Decomposition: The seasonally adjusted time series is decomposed into two theta lines, the linear regression line Z(0) and the theta line Z(2).
3. Extrapolation: Z(0) is extrapolated as a normal linear regression line, while Z(2) is extrapolated using SES.
4. Combination: The final forecast is a combination of the forecasts of the two theta lines using equal weights.
5. Reseasonalisation: If the series was identified as seasonal in step 1, then the final forecasts are multiplied by the respective seasonal indices.
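Steps 2 to 4 can be sketched for a series that is already non-seasonal or seasonally adjusted. This is an illustrative sketch only: the smoothing parameter is passed in rather than estimated, the default initial level (the first value of Z(2)) is an assumption, and `stheta_forecast` is a hypothetical name.

```python
def linear_trend(y):
    """Least-squares intercept A_n and slope B_n of y regressed on t = 1..n."""
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    slope = 6.0 / (n * n - 1) * (2.0 * sum_ty / n - (n + 1) * sum_y / n)
    intercept = sum_y / n - (n + 1) * slope / 2.0
    return intercept, slope


def stheta_forecast(y, horizon, alpha=0.5, level0=None):
    """Point forecasts for h = 1..horizon from the two-line Theta setup."""
    n = len(y)
    a_n, b_n = linear_trend(y)
    # Step 2: Z(0) is the regression line; Z(2) doubles the local curvature.
    z2 = [2.0 * v - (a_n + b_n * t) for t, v in enumerate(y, start=1)]
    # Step 3: SES on Z(2); its forecast is flat at the last smoothed level.
    level = z2[0] if level0 is None else level0  # assumed initial level
    for v in z2:
        level = alpha * v + (1.0 - alpha) * level
    # Step 4: equal-weight combination of the two extrapolations.
    return [0.5 * (a_n + b_n * (n + h)) + 0.5 * level
            for h in range(1, horizon + 1)]


flat = stheta_forecast([5.0] * 8, horizon=3)
trending = stheta_forecast([3.0 + 2.0 * t for t in range(1, 11)], horizon=4)
```

Because the SES forecast is flat, consecutive forecasts differ by half the regression slope, which is the drift that links the method to SES-d.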
This approach, based on two theta lines with ad-hoc values for the θ coefficients and equal weights for the recomposition of the final forecasts, resulted in the best performance in the largest forecasting competition to date, the M3-Competition (Makridakis & Hibon, 2000).
1 Arguably, this seasonality test does not work well if a series has one or more unit roots with a slow rate of decay of the autocorrelation function.
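The seasonality check in step 1 can be sketched as follows, assuming the usual form of the criterion: the lag-m autocorrelation is compared with a normal quantile multiplied by a Bartlett-type standard error built from lags 1 to m − 1. The guard for very short series and the function names are assumptions made for this sketch.

```python
from math import sqrt
from statistics import NormalDist, fmean


def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    mean = fmean(y)
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, len(y)))
    return num / denom


def is_seasonal(y, period, confidence=0.90):
    """True if |r_m| exceeds q * sqrt((1 + 2 * sum_{k<m} r_k^2) / n)."""
    n = len(y)
    if period < 2 or n <= period:  # assumed guard: too short to test
        return False
    q = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    spread = sum(autocorr(y, k) ** 2 for k in range(1, period))
    band = q * sqrt((1.0 + 2.0 * spread) / n)
    return abs(autocorr(y, period)) > band


quarterly_like = [10.0, 20.0, 30.0, 40.0] * 10  # made-up, repeats every 4
period_three = [1.0, 2.0, 3.0] * 12             # made-up, repeats every 3
```

With a 90% confidence level the quantile is about 1.645, which rounds to the 1.64 mentioned later in the paper.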

SES with drift
Hyndman and Billah (2003) demonstrated that there is a relationship between the STheta method and the Simple Exponential Smoothing with drift model (SES-d) given by
$$Y_t = \ell^{**}_{t-1} + b + \varepsilon_t, \qquad \ell^{**}_t = \ell^{**}_{t-1} + b + \alpha \varepsilon_t,$$
for t = 1, ..., n, where {ε_t} is white noise and (α, b, ℓ**_0) are the smoothing, growth (drift) and initial level parameters, respectively.
For a non-seasonal time series, the forecasts produced by STheta and SES-d coincide if
$$b = \tfrac{1}{2} B_n \qquad \text{and} \qquad \ell^{**}_0 = \tfrac{1}{2} \ell^{*}_0 + \tfrac{1}{2} A_n, \tag{6}$$
where ℓ*_0 is the initial level parameter of the SES model applied to Z(2). The second part of Eq. (6) is more general than in H&B's derivation, since they used a simple initialisation of ℓ*_0 for the SES model. We can deal with seasonal time series by considering the same prior seasonal test, prior seasonal adjustment and posterior reseasonalisation steps of STheta.
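The correspondence can be checked numerically. The sketch below assumes the SES-d recursion ℓ_t = ℓ_{t−1} + b + αε_t with forecast ℓ_n + hb, and the mapping b = B_n/2, ℓ**_0 = (ℓ*_0 + A_n)/2; the data, parameter values and function names are made up.

```python
def linear_trend(y):
    """Least-squares intercept A_n and slope B_n of y regressed on t = 1..n."""
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    slope = 6.0 / (n * n - 1) * (2.0 * sum_ty / n - (n + 1) * sum_y / n)
    intercept = sum_y / n - (n + 1) * slope / 2.0
    return intercept, slope


def stheta_point(y, h, alpha, level0_star):
    """STheta forecast: half regression line, half SES applied to Z(2)."""
    n = len(y)
    a_n, b_n = linear_trend(y)
    level = level0_star
    for t, v in enumerate(y, start=1):
        z2 = 2.0 * v - (a_n + b_n * t)
        level = alpha * z2 + (1.0 - alpha) * level
    return 0.5 * (a_n + b_n * (n + h)) + 0.5 * level


def sesd_point(y, h, alpha, drift, level0):
    """SES with drift: the level moves by the drift plus alpha times the error."""
    level = level0
    for v in y:
        error = v - (level + drift)
        level = level + drift + alpha * error
    return level + h * drift


y = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 24.0, 23.0]  # made-up data
alpha, level0_star = 0.3, 7.0                          # arbitrary values
a_n, b_n = linear_trend(y)
pairs = [(stheta_point(y, h, alpha, level0_star),
          sesd_point(y, h, alpha, b_n / 2.0, (level0_star + a_n) / 2.0))
         for h in range(1, 6)]
```

Each pair in `pairs` agrees up to floating-point rounding, for any α and any initial level, because the two recursions are algebraically identical under this mapping.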

Other generalisations of Theta method
Very few generalisations of the univariate STheta method have been proposed in the literature. For example, Nikolopoulos and Assimakopoulos (2005) and Petropoulos and Nikolopoulos (2013) argue for the use of more theta lines, θ ∈ {−1, 0, 1, 2, 3}, in order to extract even more information from the data. The empirical evidence suggests that the consideration of more/different theta lines can result in improvements relative to the original Theta method, but a formal procedure for the selection of appropriate theta lines is yet to be proposed.
Moreover, Constantinidou et al. (2012) and Petropoulos and Nikolopoulos (2013) suggested the use of unequal weights in the recomposition procedure for the final forecasts. This approach is appealing intuitively, as asymmetric weights, which are linked directly with the forecast horizon, are likely to offer better approximations of the short- and long-term components. However, by definition, the decomposition of the original series into Z_t(0) and Z_t(2) suggests the use of equal weights, if the aim is to reconstruct the original signal:
$$Y_t = \tfrac{1}{2} Z_t(0) + \tfrac{1}{2} Z_t(2).$$
In other words, the use of weights that are derived directly from the decomposition procedure (the corresponding θ coefficients) may provide a more valid model.

Models for optimising the Theta method
Assume that either the time series Y_1, ..., Y_n is non-seasonal or it has been seasonally adjusted using the multiplicative classical decomposition approach. Let X_t be the linear combination of two theta lines,
$$X_t = \omega Z_t(\theta_1) + (1 - \omega) Z_t(\theta_2), \tag{7}$$
where ω ∈ [0, 1] is the weight parameter. Assuming that θ_1 < 1 and θ_2 ≥ 1, the weight ω can be derived as
$$\omega = \frac{\theta_2 - 1}{\theta_2 - \theta_1}. \tag{8}$$
It is straightforward to see from Eqs. (7) and (8) that X_t = Y_t, t = 1, ..., n, i.e., the weights are calculated properly in such a way that Eq. (7) reproduces the original series. In Theorem 1 of Appendix A, we prove that the solution is unique and that the error from not choosing the optimal weights (ω and 1 − ω) is proportional to the error of a linear regression model. As a consequence, the STheta method is given simply by setting θ_1 = 0 and θ_2 = 2, while from Eq. (8) we get ω = 0.5. Thus, Eqs. (7) and (8) allow us to construct a generalisation of the Theta model that maintains the recomposition property of the original time series for any theta lines Z_t(θ_1) and Z_t(θ_2).
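The recomposition property can be illustrated with a short sketch. It assumes the closed-form theta line Z_t(θ) = θY_t + (1 − θ)(A_n + B_n t) and the weight ω = (θ_2 − 1)/(θ_2 − θ_1), which is the value that makes the coefficient on Y_t equal to one; the function names and data are made up.

```python
def linear_trend(y):
    """Least-squares intercept A_n and slope B_n of y regressed on t = 1..n."""
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    slope = 6.0 / (n * n - 1) * (2.0 * sum_ty / n - (n + 1) * sum_y / n)
    intercept = sum_y / n - (n + 1) * slope / 2.0
    return intercept, slope


def theta_line(y, theta):
    """Closed-form theta line Z_t(theta)."""
    a_n, b_n = linear_trend(y)
    return [theta * v + (1.0 - theta) * (a_n + b_n * t)
            for t, v in enumerate(y, start=1)]


def recomposition_weight(theta1, theta2):
    """Weight on Z(theta1) so that the combination reproduces Y_t."""
    return (theta2 - 1.0) / (theta2 - theta1)


def recompose(y, theta1, theta2):
    """Weighted combination of the two theta lines."""
    w = recomposition_weight(theta1, theta2)
    z1, z2 = theta_line(y, theta1), theta_line(y, theta2)
    return [w * u + (1.0 - w) * v for u, v in zip(z1, z2)]


y = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 24.0]  # made-up data
rebuilt_a = recompose(y, -1.0, 3.0)
rebuilt_b = recompose(y, 0.5, 3.0)
```

For θ_1 = 0 and θ_2 = 2 the weight is 0.5, the equal-weight combination of STheta; other admissible pairs give unequal weights but still return the original series.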
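In order to maintain the modelling of the long-term component and retain a fair comparison with the STheta method, in this work we fix θ_1 = 0 and focus on the optimisation of the short-term component, θ_2 = θ with θ ≥ 1. Thus, θ is the only parameter that requires estimation so far. The theta decomposition is now given by
$$Y_t = \left(1 - \frac{1}{\theta}\right)(A_n + B_n t) + \frac{1}{\theta} Z_t(\theta), \qquad t = 1, \ldots, n.$$
The h-step-ahead forecasts calculated at origin n are given by
$$\hat{Y}_{n+h|n} = \left(1 - \frac{1}{\theta}\right)\left[A_n + B_n (n + h)\right] + \frac{1}{\theta} \tilde{Z}_{n+h|n}(\theta), \tag{9}$$
where
$$\tilde{Z}_{n+h|n}(\theta) = \tilde{Z}_{n+1|n}(\theta) = \alpha \sum_{i=0}^{n-1} (1 - \alpha)^i Z_{n-i}(\theta) + (1 - \alpha)^n \ell^{*}_0$$
is the extrapolation of Z_t(θ) by an SES model with ℓ*_0 ∈ ℝ as the initial level parameter and α ∈ (0, 1) as the smoothing parameter. Note that for θ = 2, Eq. (9) corresponds to Step 4 of the STheta algorithm. After some algebra, we can write
$$\tilde{Z}_{n+1|n}(\theta) = \theta \ell_n + (1 - \theta)\left\{ A_n \left[1 - (1 - \alpha)^n\right] + B_n \left[ n + \left(1 - \frac{1}{\alpha}\right)\left(1 - (1 - \alpha)^n\right) \right] \right\}, \tag{10}$$
where ℓ_t = αY_t + (1 − α)ℓ_{t−1} for t = 1, ..., n and ℓ_0 = ℓ*_0/θ.
In the light of Eqs. (9) and (10), we suggest four stochastic approaches. These approaches differ due to the parameter θ, which may be either fixed at two or optimised, and the coefficients A_n and B_n, which can be either fixed or dynamic functions. To formulate the state space models, it is helpful to adopt μ_t as the one-step-ahead forecast at origin t − 1 and ε_t as the respective additive error, i.e., ε_t = Y_t − μ_t with μ_t = Ŷ_{t|t−1}. We assume {ε_t} to be a Gaussian white noise process with mean zero and variance σ².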

Optimised and standard Theta models
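Let A_n and B_n be fixed coefficients for all t = 1, ..., n, so that Eqs. (9) and (10) configure the state space model given by
$$Y_t = \mu_t + \varepsilon_t, \tag{11}$$
$$\mu_t = \ell_{t-1} + \left(1 - \frac{1}{\theta}\right)\left\{ (1 - \alpha)^{t-1} A_n + \left[ \frac{1 - (1 - \alpha)^t}{\alpha} \right] B_n \right\}, \tag{12}$$
$$\ell_t = \alpha Y_t + (1 - \alpha) \ell_{t-1}, \tag{13}$$
with parameters ℓ_0 ∈ ℝ, α ∈ (0, 1) and θ ∈ [1, ∞). The parameter θ is to be estimated along with α and ℓ_0. We call this the optimised Theta model (OTM).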
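The h-step-ahead forecast at origin n is given by
$$\hat{Y}_{n+h|n} = E[Y_{n+h} \mid Y_1, \ldots, Y_n] = \ell_n + \left(1 - \frac{1}{\theta}\right)\left\{ (1 - \alpha)^n A_n + \left[ (h - 1) + \frac{1 - (1 - \alpha)^{n+1}}{\alpha} \right] B_n \right\},$$
which is equivalent to Eq. (9). The conditional variance is
$$\mathrm{Var}[Y_{n+h} \mid Y_1, \ldots, Y_n] = \left[1 + (h - 1)\alpha^2\right]\sigma^2 .$$
For θ = 2, OTM reproduces the forecasts of the STheta method; hereafter, we will refer to this particular case as the standard Theta model (STM). In Theorem 2 of Appendix A, we show that OTM is mathematically equivalent to the SES-d model. As a corollary of Theorem 2, STM is mathematically equivalent to SES-d with b = B_n/2.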
Therefore, for θ = 2, the corollary also re-confirms the H&B result on the relationship between STheta and the SES-d model.
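The state space form can be compared with the direct theta decomposition in a few lines. This sketch assumes the OTM forecast ℓ_n + (1 − 1/θ){(1 − α)^n A_n + [(h − 1) + (1 − (1 − α)^{n+1})/α] B_n} and the relation ℓ*_0 = θℓ_0 between the two initial levels; parameter values and names are illustrative, not estimates.

```python
def linear_trend(y):
    """Least-squares intercept A_n and slope B_n of y regressed on t = 1..n."""
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    slope = 6.0 / (n * n - 1) * (2.0 * sum_ty / n - (n + 1) * sum_y / n)
    intercept = sum_y / n - (n + 1) * slope / 2.0
    return intercept, slope


def otm_forecast(y, horizon, alpha, theta, level0):
    """State space route: SES level of Y plus a closed-form trend correction."""
    n = len(y)
    a_n, b_n = linear_trend(y)
    level = level0
    for obs in y:
        level = alpha * obs + (1.0 - alpha) * level
    decay = (1.0 - alpha) ** n
    w = 1.0 - 1.0 / theta
    return [level + w * (decay * a_n
                         + ((h - 1) + (1.0 - decay * (1.0 - alpha)) / alpha) * b_n)
            for h in range(1, horizon + 1)]


def decomposition_forecast(y, horizon, alpha, theta, level0):
    """Direct route: SES on Z(theta), then weights (1 - 1/theta) and 1/theta."""
    n = len(y)
    a_n, b_n = linear_trend(y)
    level_star = theta * level0  # initial level on the Z(theta) scale
    for t, obs in enumerate(y, start=1):
        z = theta * obs + (1.0 - theta) * (a_n + b_n * t)
        level_star = alpha * z + (1.0 - alpha) * level_star
    w = 1.0 - 1.0 / theta
    return [w * (a_n + b_n * (n + h)) + level_star / theta
            for h in range(1, horizon + 1)]


y = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 24.0, 23.0]  # made-up data
stm_pair = (otm_forecast(y, 4, 0.4, 2.0, y[0] / 2.0),
            decomposition_forecast(y, 4, 0.4, 2.0, y[0] / 2.0))
otm_pair = (otm_forecast(y, 4, 0.4, 3.5, y[0] / 2.0),
            decomposition_forecast(y, 4, 0.4, 3.5, y[0] / 2.0))
```

The two routes coincide for θ = 2 (the STM case) and for any other θ ≥ 1, so the state space form only requires smoothing the observed series once.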

Dynamic optimised and dynamic standard Theta models
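So far, we have set A_n and B_n as fixed coefficients for all t. We will now consider these coefficients as dynamic functions; i.e., for updating the state t to t + 1 we will only consider the prior information Y_1, ..., Y_t when computing A_t and B_t. Hence, we replace A_n and B_n in Eqs. (9) and (10) with A_t and B_t. Then, after applying the new Eq. (10) to the new Eq. (9) and rewriting the result at time t with h = 1, we have
$$\hat{Y}_{t+1|t} = \ell_t + \left(1 - \frac{1}{\theta}\right)\left\{ (1 - \alpha)^t A_t + \left[ \frac{1 - (1 - \alpha)^{t+1}}{\alpha} \right] B_t \right\}. \tag{14}$$
Then, assuming additive one-step-ahead errors and rewriting Eqs. (3) and (14), we obtain
$$Y_t = \mu_t + \varepsilon_t, \tag{15}$$
$$\mu_t = \ell_{t-1} + \left(1 - \frac{1}{\theta}\right)\left\{ (1 - \alpha)^{t-1} A_{t-1} + \left[ \frac{1 - (1 - \alpha)^t}{\alpha} \right] B_{t-1} \right\}, \tag{16}$$
$$\ell_t = \alpha Y_t + (1 - \alpha) \ell_{t-1}, \tag{17}$$
$$A_t = \bar{Y}_t - \frac{t + 1}{2} B_t, \tag{18}$$
$$B_t = \frac{1}{t + 1}\left[ (t - 2) B_{t-1} + \frac{6}{t}\left(Y_t - \bar{Y}_{t-1}\right) \right], \tag{19}$$
$$\bar{Y}_t = \frac{1}{t}\left[ (t - 1) \bar{Y}_{t-1} + Y_t \right], \tag{20}$$
for t = 1, ..., n. Eqs. (15)-(20) configure a state space model with parameters ℓ_0 ∈ ℝ, α ∈ (0, 1) and θ ∈ [1, ∞).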
The initialisation of the states is performed assuming […]. From here on, we will refer to this model as the dynamic optimised Theta model (DOTM).
An important property of the DOTM is that when θ = 1, which implies that Z_t(1) = Y_t, the forecasting vector given by Eq. (9) will be equal to Ŷ_{t+h|t} = Z̃_{t+h|t}(1). Thus, when θ = 1, the DOTM is the SES method. When θ > 1, DOTM (as SES-d) acts as an extension of SES, by adding a long-term component. Also, for θ = 2, we have a stochastic approach of STheta, which is referred to hereafter as the dynamic standard Theta model (DSTM).
The out-of-sample one-step-ahead forecasts produced by DOTM at origin n are given by
$$\hat{Y}_{n+1|n} = \ell_n + \left(1 - \frac{1}{\theta}\right)\left\{ (1 - \alpha)^n A_n + \left[ \frac{1 - (1 - \alpha)^{n+1}}{\alpha} \right] B_n \right\}.$$
The variance and prediction intervals for Y_{n+h} can be estimated using the bootstrapping technique, where a (usually large) sample of possible values of Y_{n+h} is simulated from the estimated model. Note that, in contrast to STheta, STM and OTM, the forecasts produced by DSTM and DOTM are not necessarily linear. This is also a fundamental difference between DSTM/DOTM and SES-d: while the long-term trend (b) in SES-d is constant, this is not the case for DSTM/DOTM, for either the in-sample fit or the out-of-sample predictions.
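The dynamic recursion can be sketched as a single pass over the data. Several choices here are assumptions made for illustration rather than the paper's specification: the states start from A_0 = B_0 = 0 with B_1 = 0, and forecasts beyond one step are obtained by feeding each forecast back in as if it were observed. The name `dotm` and the data are hypothetical.

```python
def linear_trend(y):
    """Batch least-squares intercept and slope, used only as a cross-check."""
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    slope = 6.0 / (n * n - 1) * (2.0 * sum_ty / n - (n + 1) * sum_y / n)
    return sum_y / n - (n + 1) * slope / 2.0, slope


def dotm(y, horizon, alpha, theta, level0):
    """Return in-sample one-step forecasts, out-of-sample forecasts, (A_n, B_n)."""
    n = len(y)
    w = 1.0 - 1.0 / theta
    level, a_t, b_t, mean_t = level0, 0.0, 0.0, 0.0  # assumed initial states
    one_step, trend_at_n = [], None
    for t in range(1, n + horizon + 1):
        # One-step forecast from the states available at time t - 1.
        mu = level + w * ((1.0 - alpha) ** (t - 1) * a_t
                          + (1.0 - (1.0 - alpha) ** t) / alpha * b_t)
        one_step.append(mu)
        obs = y[t - 1] if t <= n else mu  # assumed: plug in forecasts past n
        level = alpha * obs + (1.0 - alpha) * level
        prev_mean = mean_t
        mean_t = ((t - 1) * prev_mean + obs) / t
        # Recursive slope update; a slope needs two points, so B_1 is set to 0.
        b_t = 0.0 if t == 1 else ((t - 2) * b_t + 6.0 * (obs - prev_mean) / t) / (t + 1)
        a_t = mean_t - (t + 1) * b_t / 2.0
        if t == n:
            trend_at_n = (a_t, b_t)
    return one_step[:n], one_step[n:], trend_at_n


y = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 24.0, 23.0]  # made-up data
fitted, forecasts, (a_rec, b_rec) = dotm(y, 3, alpha=0.4, theta=2.5, level0=y[0] / 2.0)
_, ses_like, _ = dotm(y, 3, alpha=0.4, theta=1.0, level0=y[0] / 2.0)
```

The recursively updated (A_n, B_n) match a batch regression on the full sample, and with θ = 1 the trend term vanishes so the forecasts reduce to the flat SES level.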

Parameter estimation
The parameters are estimated by minimising the sum of squared errors (SSE),
$$\mathrm{SSE} = \sum_{t=1}^{n} (Y_t - \mu_t)^2 .$$
Of course, the SSE does not necessarily need to start at t = 1. We suggest starting at t = 3 for DSTM/DOTM, since A_t and B_t are linear regression coefficients and need at least two points in order to be well defined.
The SSE estimator is equivalent to a maximum likelihood estimator. This result follows from the supposition of Gaussian distributed errors, since, after replacing σ² with its estimator σ̂² = SSE/n, the log-likelihood is given by
$$-\frac{n}{2} \log\left(\frac{\mathrm{SSE}}{n}\right) - \frac{n}{2}\left(1 + \log 2\pi\right).$$
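The estimation step can be sketched for the OTM. The sketch swaps in a coarse grid search over α and θ as a stand-in for a numerical optimiser such as Nelder-Mead, keeps the initial level fixed at an arbitrary value, and assumes the concentrated Gaussian log-likelihood −(n/2) log(SSE/n) − (n/2)(1 + log 2π); all names are hypothetical.

```python
import math


def linear_trend(y):
    """Least-squares intercept A_n and slope B_n of y regressed on t = 1..n."""
    n = len(y)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in enumerate(y, start=1))
    slope = 6.0 / (n * n - 1) * (2.0 * sum_ty / n - (n + 1) * sum_y / n)
    return sum_y / n - (n + 1) * slope / 2.0, slope


def otm_sse(y, alpha, theta, level0):
    """Sum of squared one-step-ahead errors of the OTM for given parameters."""
    a_n, b_n = linear_trend(y)
    w = 1.0 - 1.0 / theta
    level, sse = level0, 0.0
    for t, obs in enumerate(y, start=1):
        mu = level + w * ((1.0 - alpha) ** (t - 1) * a_n
                          + (1.0 - (1.0 - alpha) ** t) / alpha * b_n)
        sse += (obs - mu) ** 2
        level = alpha * obs + (1.0 - alpha) * level
    return sse


def concentrated_loglik(sse, n):
    """Gaussian log-likelihood after replacing sigma^2 by SSE / n."""
    return -0.5 * n * math.log(sse / n) - 0.5 * n * (1.0 + math.log(2.0 * math.pi))


def grid_search(y, level0):
    """Coarse grid over alpha in (0, 1) and theta in [1, 5]; returns (sse, alpha, theta)."""
    alphas = [i / 20.0 for i in range(1, 20)]
    thetas = [1.0 + j / 4.0 for j in range(0, 17)]
    return min((otm_sse(y, a, th, level0), a, th) for a in alphas for th in thetas)


y = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 24.0, 23.0, 27.0, 26.0]  # made-up data
level0 = y[0] / 2.0
best_sse, best_alpha, best_theta = grid_search(y, level0)
best_loglik = concentrated_loglik(best_sse, len(y))
```

Since the log-likelihood is a decreasing function of the SSE, the parameter values that minimise one maximise the other, which is the equivalence stated above.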
In this section, we have proposed the STM, OTM, DSTM, and DOTM, four very simple and easy to implement models. The latter three models expand the robust Theta method of A&N, and all four build on the state space approach (Hyndman et al., 2002). These models use just two theta lines, with OTM and DOTM optimising the amplification of the local curvature. The forecasts derived from these theta lines are combined optimally so as to retain the recomposition of the original signal. In the next section, we evaluate the performances of the proposed models on the M3-Competition data set.

Design
In order to obtain insights into the performances of the proposed models, STM, OTM, DSTM and DOTM, we present their accuracies compared to each other and to the STheta and SES-d approaches. A full list of the methods and models considered is presented in Table 1, along with the starting values for optimising the various parameters. Note that, to mimic what might be done in practice, the starting values are the natural ones for the model being used, and are not mapped onto one another across the mathematically equivalent models/methods.
We consider two variants of the SES-d model. The first considers a fixed value for b (equal to B_n/2). Assuming perfect optimisers, we should expect this version to produce the same forecasts as STheta and STM. The second version optimises the value of b and is mathematically equivalent to OTM. Perfect optimisers should also produce the same forecasts for these two models; that is, choices such as the starting values for the parameters should not matter. However, we know that different starting values may affect the optimal value of a parameter even for the same model. The parameter estimation is based on minimising the sum of squared errors (SSE) using the Nelder-Mead algorithm, as implemented in the optim() function of the R statistical software. Moreover, we consider five benchmarks that have been used widely in the forecasting literature. A full list and details of the benchmark methods considered are presented in Table 2, including automatic algorithms implemented in the forecast package by Hyndman and Khandakar (2008).
The various Theta methods and models listed in Table 1 are applied to the seasonally adjusted data, with the final forecasts being reseasonalised following the procedure described in Section 2 (steps 1 and 5). The five benchmark methods (Naive, SES, Damped, ETS and ARIMA) are applied to both the original data and the seasonally adjusted data, where the same deseasonalisation/reseasonalisation procedure has been followed. In all cases, the seasonally adjusted data and the seasonal indices to be used for the reseasonalisation are calculated by considering only the in-sample data points (training set) and setting the confidence level to 90%. Adjusting the data for seasonality prior to forecasting has also been the practice in other forecasting studies, such as the M3-Competition (Makridakis & Hibon, 2000). However, we believe that a higher confidence level (95%) has been used for identifying series as seasonal.
Table 1 (surviving entries): STM, Section 3.1, Eqs. (11)-(13) with θ = 2, starting values ℓ_0 = y_1/2 and α = 0.5; OTM, Section 3.1, Eqs. (11)-(13); DSTM, Section 3.2, Eqs. (15)-(20) with θ = 2, starting values ℓ_0 = y_1/2 and α = 0.5; DOTM, Section 3.2, Eqs. (15)-(20).
Table 2: The benchmark methods used in the current study.

Method
The evaluation is performed by considering real data from the M3-Competition (Makridakis & Hibon, 2000), giving a total of 3003 time series of different frequencies. Table 3 presents the distribution of the series across the different frequencies. The forecast horizon used in this study matched that of the original M3-Competition. The empirical evaluation was implemented using the open-source statistical software provided by R Core Team (2015) (version 3.2.1), and the packages forecast 6.1 and Mcomp 0.10-34. The computer used for this task was equipped with an Intel i5-4200U processor with 8 GB of RAM, and was running Windows 10.
We measured the out-of-sample performances of the different methods using two widely-used accuracy metrics, namely the symmetric mean absolute percentage error (sMAPE) and the mean absolute scaled error (MASE). The sMAPE is selected in order to enable us to make comparisons with the original M3-Competition results, even though it penalises positive forecast errors more heavily than negative ones, with the discrepancy increasing at an increasing rate (see Figure 1 of Goodwin & Lawton, 1999). The calculation of the sMAPE is based on the symmetric absolute percentage error (sAPE), defined as
$$\mathrm{sAPE} = \frac{2\,|Y_t - \hat{Y}_t|}{|Y_t| + |\hat{Y}_t|} \times 100\% .$$
The MASE metric was proposed by Hyndman and Koehler (2006), and is calculated based on the absolute scaled error (ASE), which is the absolute error divided by the mean of the absolute value of the first seasonal difference of the time series, i.e.,
$$\mathrm{ASE} = \frac{|Y_t - \hat{Y}_t|}{\frac{1}{n - m} \sum_{i=m+1}^{n} |Y_i - Y_{i-m}|},$$
where m is the number of periods in a year (one for yearly, four for quarterly, 12 for monthly and one for other data).
In both sMAPE and MASE, the mean is calculated across both horizons and time series at once. This means that we assign equal weights to the point forecast errors of yearly and monthly series.
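The two error measures and the pooling across horizons and series can be sketched as follows. The sAPE is written in the common M3 form 200|Y − Ŷ|/(|Y| + |Ŷ|), which is an assumption about the exact scaling; the function names and numbers are illustrative.

```python
def sape(actual, forecast):
    """Symmetric absolute percentage error, in percent."""
    return 200.0 * abs(actual - forecast) / (abs(actual) + abs(forecast))


def ase(actual, forecast, insample, m=1):
    """Absolute error scaled by the in-sample mean absolute seasonal difference."""
    diffs = [abs(insample[i] - insample[i - m]) for i in range(m, len(insample))]
    return abs(actual - forecast) / (sum(diffs) / len(diffs))


def pooled_mean(errors_by_series):
    """Mean over all horizons and all series at once (every error weighs the same)."""
    pooled = [e for errors in errors_by_series for e in errors]
    return sum(pooled) / len(pooled)


under = sape(100.0, 50.0)    # forecast too low: positive error Y - F
over = sape(100.0, 150.0)    # forecast too high by the same amount
scaled = ase(12.0, 9.0, [1.0, 3.0, 6.0, 10.0])
overall = pooled_mean([[1.0, 2.0, 3.0], [10.0]])
```

The pair `under` and `over` shows the asymmetry mentioned above: an equally sized error is penalised more heavily when the forecast falls below the actual value.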

Results
The results regarding the forecasting performances of the various methods are presented in Table 4. The best result in each frequency (column) is marked in bold. We highlight the results for the four models proposed in this study in grey shading.
Focusing on the non-shaded panels, the STheta method had the best performance across all benchmarks, as expected, according to the sMAPE measure. Any numerical differences from the results published in the M3-Competition (Makridakis & Hibon, 2000) can be attributed to the use of different pre-fixed theta coefficients and extrapolation methods for each frequency of the data (see Nikolopoulos et al., 2011, for more details). The small differences in the monthly data are due to the use of different software and estimation procedures for the smoothing parameters and the initial level when extrapolating the Z(2).
This is a close reproduction of the Theta method as it was applied to the monthly data in the M3-Competition. The sMAPE of STheta for the monthly time series in this study is 13.83%, versus the value of 13.85% published by Makridakis and Hibon (2000). Moreover, we managed to obtain the exact same populations of seasonal quarterly and monthly series (555 and 780, respectively) as those reported by Nikolopoulos and Assimakopoulos (2005), by rounding the critical value of the t-statistic for identifying a series as seasonal to two decimal places (1.64). Thus, this paper also contributes to the replicability/reproducibility agenda (Boylan, Goodwin, Mohammadipour, & Syntetos, 2015). This is also the first study to demonstrate the practical equivalence of the STheta method and the SES-d model, when the value of b is fixed to B_n/2. The very minor numerical differences are attributed to the use of non-perfect optimisers, resulting in the selection of different optimal values for the initial level, ℓ* and ℓ**, and the smoothing parameter, α.
Focusing on the sMAPE measure for the benchmark methods (non-shaded panel), the superior performances of the STheta method and the SES-d (b = B_n/2) model are followed by that of Damped (on the seasonally adjusted data) and ETS. Examining the results across the various frequencies, STheta and SES-d (b = B_n/2) perform especially well for monthly and yearly data, while underperforming relative to Damped and ETS for the other data.
Considering the MASE metric, STheta and SES-d (b = B_n/2) once again perform very similarly. However, the best performer of the benchmark methods (non-shaded panel) is the SES-d (b optimised) model, which performed rather poorly according to the sMAPE. This discrepancy between the two measures is due to the properties of the sMAPE, and the heavy penalty that it places on large positive errors. To demonstrate this, Appendix B (Table 7) also reports the median value of the symmetric absolute percentage errors (sMdAPE) across horizons and time series. We observe that, apart from SES-d (b optimised), the relative rankings of various other methods are also improved (namely ETS, ARIMA and Damped applied to the seasonally adjusted data). Thus, we conclude that the results reported by the sMAPE are influenced heavily by the presence of positive symmetric percentage errors that are outliers.
As expected, the STM generates results that are equivalent to those of both the STheta method and the SES-d (b = B_n/2) model. Similarly, the performance of OTM is similar to that of SES-d (b optimised) for the MASE metric, apart from the other data. At the same time, the OTM is less susceptible to producing outlying positive errors than SES-d (b optimised), and thus performs better according to the sMAPE measure. This is apparent from the median and trimmed values of sMAPE reported in Appendix B (Table 7) and Table 5, respectively. The trimmed values of sMAPE are calculated as the arithmetic mean of the point forecast errors, after excluding the 10% of the series that produced the highest sMAPE values and the 10% that produced the lowest.
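The trimming described above can be sketched on a list of per-series sMAPE values. How fractional counts are rounded is not stated in the text, so rounding down is an assumption here, and the values are made up.

```python
def trimmed_mean(values, trim=0.10):
    """Mean after dropping the lowest and highest `trim` share of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # assumed: round the cut-off count down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


per_series_smape = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
plain = sum(per_series_smape) / len(per_series_smape)
trimmed = trimmed_mean(per_series_smape)
```

A single extreme series dominates the plain mean but is removed by the trimming, which is why the trimmed figures are used to separate typical accuracy from outlying errors.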
There are several reasons for the divergent results between OTM and SES-d (b optimised). First, the parameter space of b in SES-d does not correspond exactly to the θ parameter space in OTM. Second, the starting values in the optimiser (see Table 1) are natural ones for each model, and do not correspond mathematically (see Theorem 2 of Appendix A). These different starting values will contribute to both the differences in the comparison metrics that are caused by suboptimal solutions, and the differences in computational times (see Table 4). Third, the same increments for the parameters in the optimiser do not correspond mathematically in the two models.
Focusing on the two dynamic models (DSTM and DOTM), the DOTM produces the most accurate forecasts, outperforming the other methods and models in this study overall. This is true for all three error measures considered (sMAPE, sMdAPE and MASE). The DOTM outperforms all other Theta variants significantly for the yearly and other frequencies. This is a very interesting result, as the series classified as other was the single data category for which STheta did not perform well compared to the other benchmarks. However, DOTM is still outperformed in this category by ETS, Damped, and ARIMA. The only data frequency for which the DOTM does not improve on STheta is the quarterly time series, but the performances of the two are very similar.
We also considered the multiple comparisons with the best (MCB) test for all frequencies in order to compare DSTM, OTM and DOTM with STheta and SES-d (b optimised) statistically. In this test, a rank interval is constructed for each method (see Koning et al., 2005, for more details) using the mean absolute error (which will give ranks equivalent to those of the MASE). When the rank intervals considering pairs of methods do not overlap, the null hypothesis of the same performances is rejected in favour of the alternative hypothesis of significantly different performances. The average ranks and rank intervals of each method are presented in Fig. 1, which also presents a comparison of the average rank of each method with the best average rank, adopting a 10% significance level.
In line with our insights when examining the summarised results of the sMAPE and MASE measures, the DOTM provides better performances than the other four approaches, being ranked significantly higher. The models SES-d (b optimised) and OTM are slightly better (not significantly different) than either STheta or DSTM.

Discussion
Previous studies have shown that the Theta method is particularly efficient for trended data (Thomakos & Nikolopoulos, 2014). In order to obtain a better understanding of where the improvements are derived from, we split the data into non-trended and trended series, then calculate the percentage decrease in the value of the MASE when comparing DSTM and DOTM for each type of data. The categorisation of a series as trended or not is based directly on the model form chosen by the ETS algorithm applied to the original data. Table 6 presents the percentage drops in the value of MASE (increase in accuracy) of DOTM over DSTM. We observe that the performance improvements are driven mainly by the trended series. In such cases, DOTM outperforms DSTM by 10.21% and 19.85% for yearly and other data, respectively. Also, larger than average improvements are recorded for the monthly and quarterly trended series. Similar insights were obtained when contrasting the performances of the two methods using the sMAPE measure.
However, the question remains as to why the DOTM performs better than the DSTM. The optimisation of the θ value for the second theta line is linked directly to the amplification of the local curvatures of the series (A&N).
The quite arbitrary selection of θ = 2 suggests that the long-term deterministic trend is just as important as the short-term behaviour of the series, which might not be the case for all time series. Therefore, DOTM selects the degree of amplification of the short-term behaviour of the series optimally. An analysis of the optimally-selected θ values shows that when θ ≤ 2 (58% of the series), the average performance improvement is −0.12% (percentage difference between the DOTM and DSTM performances in terms of MASE). However, the improvement increases to 7.57% and 9.94% for the series where θ > 2 (42%) and θ > 3 (31%), respectively. Similar insights are obtained for the sMAPE metric. Thus, we observe that DOTM works particularly well when there is a need to consider higher theta values that capture and model the short-term behaviour of the series effectively.
Of course, one could argue that the OTM also considers optimal values of the θ parameter. However, OTM falls short of DOTM with regard to the stochasticity aspect of the linear regression part of the model, and its effect on the selection of an optimal θ value. To test this, we calculate the absolute percentage differences of the optimal θ values for OTM and DOTM, and divide the series into two equally sized groups, corresponding to small and large differences. The performance improvement of DOTM over OTM, measured via the MASE, increases from 0.39% for small differences in the optimised θ values to 3.44% for large differences. As a result, the dynamic updating of A_t and B_t has a positive impact on the optimisation of the θ values, especially when a significantly different value is selected.
In terms of the computational times achieved, DOTM and DSTM are more computationally intensive than the original Theta model, as expected. It could be argued that the ×1.5 additional computational cost of DSTM over STheta is not worthwhile, given the marginal gains in forecasting performance. However, the robust performance of DOTM clearly pays off. In any case, the calculation times of both models are significantly lower than those of the two automatic model selection algorithms (ets() and auto.arima()) implemented using the forecast package (Hyndman & Khandakar, 2008).

Concluding remarks
In this paper, we have proposed a generalisation of the Theta method, namely the dynamic optimised Theta model. The DOTM optimally selects the theta line to be used for the extrapolation of the short-term component of the series, and also revises A_t and B_t in the long-term component at each time period t. In addition, the proposed model is provided under a state space approach, which allows already consolidated statistical tools to be used for parameter estimation. The newly proposed model was contrasted with the original Theta method and other variants such as the SES-d model, both theoretically and empirically.
In terms of empirical forecasting performances, DOTM demonstrated improvements over the Theta method for all frequencies and error measures considered. At the same time, it was also the top performing approach across all benchmarks overall. Moreover, DOTM produced the best average ranking, which is statistically different from that of the original Theta method. Apart from the very promising empirical results for the DOTM, this study replicates the Theta method for the monthly time series in the M3-Competition. We also proved the mathematical equivalence of special cases of the DOTM, and compared their empirical forecasts in order to examine how much the optimiser affects the forecasts.
We believe that this study has significant managerial implications. We show that the new, optimised version of the Theta method improves on the forecasting performance of the original approach. Keeping in mind that the original Theta model was already a very good and robust estimator for fast-moving demand time series, the new DOTM achieves even higher levels of forecasting accuracy, which can be translated directly into profits. Thus, the DOTM is able to provide better statistical estimates, which can then be combined with judgemental overrides (Franses & Legerstee, 2011) in order to produce the final (operational) forecasts.
DOTM could be extended further by considering the appropriate selection of extrapolation methods for the theta lines, rather than using pre-fixed estimators, such as a linear regression line for Z(0) and SES for Z(θ). Another path for future research should include the application of the proposed DOTM to a data set that is dominated by stationary data (Thomakos & Nikolopoulos, 2014). In addition, the current seasonality test should be revisited to allow it to distinguish between additive and multiplicative seasonality, and to work well if the series has one or more unit roots. Another interesting study would have the goal of obtaining an understanding of the differences between DOTM and the state space model for exponential smoothing with a stochastic trend (ETS(A,A,N)). Since DOTM outperforms the damped trend model ETS(A,Ad,N), it would be expected to outperform the ETS(A,A,N) model. Discovering the reasons why could be enlightening.

Table 1
The different Theta methods and models considered in the empirical evaluation.

Table 4
Empirical results for all methods using the sMAPE and the MASE.

Table 5
Empirical results for SES-d (b optimised) and OTM for the trimmed sMAPE.

Table 6
Percentage improvements of DOTM over DSTM in terms of MASE (the numbers in brackets refer to sample sizes).