Abstract

To reduce the dimensionality of the parameter space and enhance out-of-sample forecasting performance, this research compares regularization techniques with Autometrics in time-series modeling. We focus mainly on comparing the weighted lag adaptive LASSO (WLAdaLASSO) with Autometrics, but as benchmarks we also estimate other popular regularization methods: LASSO, AdaLASSO, SCAD, and MCP. For an analytical comparison, we implement a Monte Carlo simulation and assess the performance of these techniques in terms of out-of-sample Root Mean Square Error (RMSE), gauge, and potency, under varying autocorrelation coefficients and sample sizes. The simulation experiment indicates that, compared to Autometrics and the other regularization approaches, WLAdaLASSO is superior in covariate selection and forecasting, especially when there is strong linear dependency between predictors. The computational efficiency of Autometrics, in contrast, decreases with strong linear dependency between predictors; however, under a large sample and weak linear dependency between predictors, the Autometrics potency ⟶ 1 and gauge ⟶ α. LASSO, AdaLASSO, SCAD, and MCP select more covariates and possess higher RMSE than Autometrics and WLAdaLASSO. To compare the considered techniques on real data, we constructed a General Unrestricted Model (GUM) for covariate selection and out-of-sample forecasting of the trade balance of Pakistan. We train the model on the 1985–2015 observations and use the 2016–2020 observations as test data for the out-of-sample forecast.

1. Introduction

Modeling and forecasting have been central to time-series analysis since its inception, yet the accuracy of a time-series model is never known with certainty; ‘‘essentially, all models are wrong, but some are useful’’ [1]. The massive availability of data in the current era leads us to a new phase of time-series analysis for model selection and forecasting. Including many financial and economic covariates in a time-series model may yield considerable predictive benefits; however, parsimonious models remain superior in forecasting. Failure to reduce dimensionality may lead to poor performance due to cumulative estimation losses from redundant or insignificant variables.

On the other hand, traditional time-series modeling for covariate and lag selection in Autoregressive Distributed Lag (ARDL) models uses the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) [2, 3]. This approach is limited to cases where the number of covariates and their lags does not exceed the number of observations. The traditional Ordinary Least Squares (OLS) method fails to estimate models with a large number of regressors and limited observations due to inadequate degrees of freedom. Several statistical techniques exist in the literature for model selection and forecasting when the covariates and their lags outnumber the observations. In particular, the classical approach (Autometrics, general-to-specific) and regularization techniques (machine learning) are frequently used in time-series modeling when the covariates exceed the number of observations. Besides these techniques, complex network theory provides an efficient and reliable way of handling time-series problems and has been used extensively for socioeconomic phenomena in recent years [4–6]. However, since this study aims to identify the true covariates and evaluate forecasting performance, we concentrate only on regularization techniques and the classical approach.

The use of sparse modeling has grown widely in time-series analysis as it can efficiently handle big macroeconomic data sets and substitute for factor models [7–13]. Medeiros and Mendes [14] find that the adaptive LASSO (AdaLASSO) consistently chooses the relevant covariates as the number of observations grows (model selection consistency), even when the errors are non-Gaussian and conditionally heteroscedastic. Audrino and Camponovo [15] illustrate the theoretical and empirical efficiency of AdaLASSO, which asymptotically selects the correct covariates in finite-sample time-series regression models. Covariate and lag selection is challenging in time-series modeling, mainly in the presence of serial correlation [16]. To address this gap, Konzen and Ziegelmann [17] introduce the weighted lag adaptive LASSO (WLAdaLASSO), which applies a distinct weight to each coefficient and penalizes the coefficients of higher-lagged variables more heavily. WLAdaLASSO outperforms LASSO and AdaLASSO in forecasting and covariate selection, even under strong linear dependency between predictors with many candidate lags. Uematsu and Tanaka [18] use folded concave penalties for ultra-high-dimensional time-series forecasting and covariate selection, verifying the oracle inequalities of the folded concave penalties (SCAD and MCP) for macroeconomic time series under appropriate conditions, with both theoretical and empirical contributions.

Meanwhile, very few studies utilize the classical technique (Autometrics) in the context of macroeconomic forecasting [19–22]. In cross-sectional modeling, Epprecht et al. [23] compare the LASSO and AdaLASSO estimates with the classical technique (Autometrics) in forecasting and covariate selection; their results indicate that LASSO and AdaLASSO outperform Autometrics in prediction. Conversely, for time-series modeling with a dynamic structure, WLAdaLASSO outperforms LASSO and AdaLASSO in forecasting and covariate selection. However, we have not come across a study that compares the computational efficiency of regularization techniques, particularly WLAdaLASSO, SCAD, and MCP, with the classical technique (Autometrics) in dynamic time-series modeling. For this purpose, we implement updated regularization techniques for dynamic time-series modeling and assess their performance against the classical approach (Autometrics) for covariate selection and forecasting, both theoretically and empirically. Furthermore, we assess the efficiency of these techniques in simulation experiments where the true data generating process (DGP) has a dynamic structure. In summary, our main contribution is a comparison of WLAdaLASSO and Autometrics for covariate selection and forecasting under different scenarios, with varying autocorrelation coefficients of the regressors (0.1, 0.5, and 0.8) and sample sizes T (50, 100, and 500), as well as an application to macroeconomic data to provide a conclusive assessment of predictability. The computational efficiency of these techniques is assessed in terms of gauge, potency, and out-of-sample Root Mean Square Error (RMSE). For the real data analysis, we constructed a General Unrestricted Model (GUM) containing all plausible macroeconomic determinants of the trade balance. The techniques are not restricted to the balance of trade but are valid for any time series.

The rest of the paper is organized as follows: Section 2 briefly describes the model selection techniques. Section 3 presents the simulation experiments and results. Section 4 discusses the real data analysis. Finally, Section 5 concludes with remarks on the efficacy of the considered techniques.

2. Model Selection Techniques

Technically, two broad families of model selection exist in the literature for the case where the number of regressors P exceeds the number of observations N: regularization techniques and the classical approach (Autometrics, general-to-specific). The classical approach starts with a fully saturated model and applies backward elimination with a multipath search process, where the selection of the model depends mainly on a predefined significance level. Regularization techniques instead impose sparsity on the p-dimensional parameter vector, forcing many of its components to be zero, which combats the issues posed by high dimensionality. We describe each of these techniques in more detail below, considering only orthogonal regularization techniques.

2.1. Autometrics Algorithms for Covariate and Lag Selection

Autometrics is a third-generation algorithm built on concepts similar to those of PcGets. Hoover et al. [24] proposed the general-to-specific model selection technique that aggregates many elements of the “Hendry” and “London School of Economics (LSE)” methodologies. Doornik [25] proposed PcGets as a second-generation method, extended by Krolzig and Hendry [26], prolonging and refining Hoover and Perez’s algorithm [26, 27]. The concept of general-to-specific (gets) modeling is the cornerstone of the Autometrics approach:

(i) Initially, the GUM includes all candidate covariates and is estimated by OLS, with statistically irrelevant covariates removed; the reliability of the reduced model is confirmed at each stage with diagnostic tests to establish congruence.

(ii) Autometrics uses a tree-path search with multistep simplifications along numerous paths. Terminal models are obtained via the tree-path search and confirmed with diagnostic tests; if the coefficient estimates are statistically insignificant, the model is discarded. When several terminal models are identified, Autometrics retests their union: a new GUM is created from the “surviving” terminal models, allowing one more tree-path search. The entire exploration process is repeated, with the terminal models and their combinations examined again. If many models pass the encompassing tests, the final choice is based on a predetermined information criterion.

Diagnostic tests are used to double-check the simplified models, while encompassing tests resolve multiple terminal models. For diagnostics, Autometrics uses the Jarque–Bera residual normality test [28], tests for second-order residual autocorrelation [29, 30], a test for autoregressive conditional heteroscedasticity (ARCH) up to second order [31], and an in-sample stability test [32]. In some respects, Autometrics is partially a black box [23]. However, it allows the user to choose between a “nominal significance level” and a “1-cut and tight significance level” when establishing the modeling approach. The multipath approach avoids path dependency by using a tree structure together with stepwise backward elimination, a built-in function of the gets package in the R environment [33].
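To make the workflow concrete, the following is a minimal sketch of a gets-style reduction using the gets package in R; the data and variable names are illustrative and are not taken from the paper's application.

```r
# Minimal sketch: estimate a GUM and run the multipath
# general-to-specific search with the gets package.
library(gets)

set.seed(1)
x <- matrix(rnorm(100 * 5), ncol = 5,
            dimnames = list(NULL, paste0("x", 1:5)))
y <- 0.6 * x[, 1] - 0.4 * x[, 3] + rnorm(100)

gum   <- arx(y, mc = TRUE, mxreg = x)   # GUM: intercept plus all candidate covariates
final <- getsm(gum, t.pval = 0.05)      # backward multipath search at the 5% level
print(final)                            # terminal model after diagnostic checking
```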

2.2. Regularization Techniques

Regularization techniques handle saturated models with irrelevant regressors, even when the regressors outnumber the observations, by shrinking the irrelevant coefficients to exactly zero at the cost of some bias. Several regularization techniques exist in the literature; we opt only for orthogonal regularization techniques for dynamic covariate selection and forecasting: LASSO, AdaLASSO, WLAdaLASSO, SCAD, and MCP.

2.2.1. LASSO and AdaLASSO Estimate

Due to its low computational cost, the Least Absolute Shrinkage and Selection Operator (LASSO), introduced by Tibshirani (1996), is a popular estimation method in the linear regression framework. The LASSO is similar to ridge regression; however, it sets some coefficients exactly equal to zero, at the cost of some bias. The resulting model is easy to interpret and possesses a small forecast error. Consider a linear regression model where $y = (y_1, \ldots, y_T)'$ is the continuous response, $x_{it}$ collects the covariates and their lags, and $\beta = (\beta_1, \ldots, \beta_p)'$ are the coefficients to be estimated. The LASSO estimator can be defined as

$$\hat{\beta}^{\text{LASSO}} = \arg\min_{\beta} \sum_{t=1}^{T} \Big( y_t - \sum_{i=1}^{p} \beta_i x_{it} \Big)^2 + \lambda \sum_{i=1}^{p} |\beta_i|,$$

where $\lambda \sum_{i=1}^{p} |\beta_i|$ is the penalty function and $\lambda$ is the hyperparameter.

The second term in the above equation is the “L1 penalty”; it leads to a sparse solution by shrinking a specific set of coefficients exactly to zero, with a certain amount of bias. The amount of shrinkage depends upon the selection of $\lambda$, whose range is $0 < \lambda < \infty$.
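As an illustration, a LASSO path over a grid of $\lambda$ values can be fitted in R with the glmnet package; this is a sketch with simulated data, and all names are illustrative.

```r
# Sketch: fit a LASSO path and inspect the sparse solution at one lambda.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), ncol = 20)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(100)

fit <- glmnet(x, y, alpha = 1)   # alpha = 1 gives the L1 (LASSO) penalty
coef(fit, s = 0.1)               # coefficients at lambda = 0.1; many are exactly zero
```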

Zou [34] demonstrated that the LASSO estimator lacks the oracle property and introduced the adaptive LASSO (AdaLASSO) as a simple and effective remedy. In LASSO, all coefficients are penalized equally by the L1 penalty; in AdaLASSO, each coefficient receives a distinct weight. Zou [34] showed that if the weights are data-dependent and carefully chosen, AdaLASSO possesses the oracle property. The AdaLASSO estimator can be written as

$$\hat{\beta}^{\text{AdaLASSO}} = \arg\min_{\beta} \sum_{t=1}^{T} \Big( y_t - \sum_{i=1}^{p} \beta_i x_{it} \Big)^2 + \lambda \sum_{i=1}^{p} w_i |\beta_i|.$$

Here, $w_i = |\hat{\beta}_i^{\text{init}}|^{-\tau}$, $\tau > 0$, and $\hat{\beta}_i^{\text{init}}$ is an initial parameter estimate. The weights for zero coefficients diverge (to infinity) as the sample size grows, while those for nonzero coefficients converge to a finite constant. To obtain $\hat{\beta}_i^{\text{init}}$, Zou [34] recommended the OLS method. However, when the number of candidate variables exceeds the number of observations, the OLS method does not work, and a ridge estimate can be employed as the initial estimator instead.
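The following sketch shows one way to implement AdaLASSO in R with glmnet, using a ridge fit as the initial estimator (as suggested above for the case of more variables than observations); the small offset added to the weights is a numerical safeguard of ours, not part of the original method.

```r
# Sketch: AdaLASSO via glmnet, with ridge-based data-dependent weights.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), ncol = 20)
y <- 2 * x[, 1] - 1.5 * x[, 2] + x[, 3] + rnorm(100)

# Step 1: ridge regression as the initial estimator (works even when p > n)
ridge <- cv.glmnet(x, y, alpha = 0)
b0    <- as.numeric(coef(ridge, s = "lambda.min"))[-1]  # drop the intercept

# Step 2: adaptive weights w_i = 1 / |b0_i|^tau, with tau = 1
tau <- 1
w   <- 1 / (abs(b0)^tau + 1e-8)   # offset guards against division by zero

# Step 3: weighted L1 fit = AdaLASSO
ada <- cv.glmnet(x, y, alpha = 1, penalty.factor = w)
coef(ada, s = "lambda.min")
```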

2.2.2. Weighted Lag Adaptive LASSO (WLAdaLASSO)

The weighted lag adaptive LASSO (WLAdaLASSO) was introduced by Konzen and Ziegelmann [17], building on the work of Park and Sakaori [35]. It is a variant of the LASSO designed specifically for time-series modeling with a lag structure. The idea is similar to AdaLASSO but built for the time-series ARDL framework: since more distant lags typically have a more negligible effect in predicting the dependent variable, larger penalties are imposed on them.

Here, $w_i = \big( |\hat{\beta}_i^{\text{init}}|\, e^{-\alpha l_i} \big)^{-\tau}$, where $l_i$ is the lag length of the i-th regressor, $\tau > 0$ and $\alpha \geq 0$ are tuning parameters, and $\hat{\beta}_i^{\text{init}}$ is an initial parameter estimate; $\tau = 1$ as in AdaLASSO. To pick $\alpha$, Konzen and Ziegelmann [17] suggest estimating the model for a given $\lambda$ over a grid (0, 0.5, 1, …, 10) and choosing the value with the lowest BIC; the $\lambda$ parameter is selected by the same lowest-BIC criterion.
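A sketch of this weighting in R follows; the exponential lag scaling reflects the weight form reconstructed above, and the mapping of columns to lag orders (lag_of) is an illustrative assumption about how the lagged design matrix is organized.

```r
# Sketch: WLAdaLASSO-style weights that penalize distant lags more heavily,
# w_i = (|b0_i| * exp(-alpha * lag_i))^(-tau), passed to glmnet via penalty.factor.
library(glmnet)

set.seed(1)
T <- 100; p <- 4; L <- 5
x <- matrix(rnorm(T * p * L), ncol = p * L)   # stand-in for a lagged design matrix
lag_of <- rep(1:L, times = p)                 # assumed lag order of each column
y <- 0.8 * x[, 1] + rnorm(T)

ridge <- cv.glmnet(x, y, alpha = 0)           # ridge initial estimate
b0    <- as.numeric(coef(ridge, s = "lambda.min"))[-1]

tau <- 1; alpha <- 2                          # alpha would be chosen on a grid by BIC
w <- (abs(b0) * exp(-alpha * lag_of) + 1e-8)^(-tau)

wla <- glmnet(x, y, alpha = 1, penalty.factor = w)
```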

2.2.3. SCAD and MCP Estimate

The smoothly clipped absolute deviation (SCAD) penalty, proposed by Fan and Li [36], yields estimates that are nearly unbiased and sparse (small estimated coefficients are automatically set to zero) and satisfies the condition of continuity. The SCAD estimator for covariate and lag selection can be written as

$$\hat{\beta}^{\text{SCAD}} = \arg\min_{\beta} \| y - X\beta \|_2^2 + \sum_{i=1}^{p} P(\beta_i \mid \lambda, \alpha),$$

where X is the matrix of covariates and their lags, and the second term is a penalty designed to meet all three requirements (unbiasedness, sparsity, and continuity). SCAD has proven effective in many statistical settings, such as cross-sectional regression and time-series modeling [18]. $P(\cdot \mid \lambda, \alpha)$ is a folded concave penalty; unlike the LASSO, it depends on two tuning parameters and depends on $\lambda$ in a nonmultiplicative way, so that $P(\beta \mid \lambda, \alpha) \neq \lambda P(\beta \mid \alpha)$. The tuning parameter $\alpha$ controls the concavity of the penalty. The optimization of the objective function depends on $\alpha$ and $\lambda$, where $\alpha$ is set to 3.7 and $\lambda$ is selected via cross-validation [36].

Zhang [37] introduced the minimax concave penalty (MCP), a nonconvex penalization strategy that applies shrinkage only up to a particular threshold, beyond which coefficients are left unpenalized, resulting in nearly unbiased estimates.

MCP computes a regularization path based on a family of nonconvex penalty functions with two tuning parameters, α and λ, where α is fixed and λ is selected via cross-validation. The tuning parameter λ controls the amount of shrinkage and α the concavity of the penalty. By minimizing the maximum concavity, MCP preserves sparse convexity to the greatest extent possible [37]; a larger α yields a more convex but more biased fit [37]. The SCAD and MCP estimates belong to the family of folded concave penalties, as the penalty function P(·) is neither convex nor concave.
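Both penalties are available in the ncvreg package in R; a minimal sketch follows (note that ncvreg calls the concavity parameter gamma rather than α, and the simulated data are illustrative).

```r
# Sketch: SCAD and MCP fits with lambda chosen by cross-validation.
library(ncvreg)

set.seed(1)
x <- matrix(rnorm(100 * 20), ncol = 20)
y <- 2 * x[, 1] - x[, 2] + rnorm(100)

scad <- cv.ncvreg(x, y, penalty = "SCAD", gamma = 3.7)  # gamma = 3.7 as in Fan and Li
mcp  <- cv.ncvreg(x, y, penalty = "MCP")                # package default concavity
coef(scad)   # coefficients at the lambda minimizing cross-validation error
```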

2.3. Selection of Tuning Parameters for Regularization Techniques

The selection of the tuning parameter λ is crucial, as it governs the complexity of the selected model: the optimal tuning parameter yields a parsimonious model with precise prediction performance. In practice, the tuning parameter is frequently selected by cross-validation to achieve prediction optimality. Such prediction optimality is, however, frequently at odds with covariate selection; when the objective is to recover the underlying set of sparse variables, a larger penalty parameter is often required than the one that is optimal for prediction [38]. The BIC criterion is superior to cross-validation for covariate selection, although this superiority lacks a complete theoretical explanation. Moreover, WLAdaLASSO with a BIC-based tuning parameter outperforms the alternatives in covariate selection and out-of-sample forecasting [17]. Hence, we use BIC-based tuning parameters for covariate selection and out-of-sample forecasting in both the simulation exercises and the real data analysis.
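One common way to implement BIC-based selection over a glmnet path is sketched below, assuming Gaussian errors and taking the number of nonzero coefficients as the degrees of freedom; the helper name bic_lambda is ours.

```r
# Sketch: choose lambda on a glmnet path by minimizing
# BIC(lambda) = n * log(RSS/n) + df * log(n).
library(glmnet)

bic_lambda <- function(x, y) {
  fit  <- glmnet(x, y, alpha = 1)
  yhat <- predict(fit, newx = x)      # n x nlambda matrix of fitted values
  n    <- length(y)
  rss  <- colSums((y - yhat)^2)       # residual sum of squares per lambda
  bic  <- n * log(rss / n) + fit$df * log(n)
  fit$lambda[which.min(bic)]          # lambda with the lowest BIC
}
```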

2.4. Theoretical Comparison

To compare these techniques, we use gauge, potency, and out-of-sample RMSE. Gauge is the empirical null retention frequency, that is, the rate at which irrelevant covariates are retained, whereas potency is the rate of correct covariate identification. In the comparison of the regularization techniques and Autometrics, correct retention of relevant covariates is interpreted as potency, and incorrect retention of irrelevant covariates as gauge [39]. We use out-of-sample RMSE to evaluate the forecasting performance of the concerned techniques in the simulation study and the real data analysis. If a technique correctly identifies the true model, we expect the following:

(1) The gauge approaches the nominal significance level α or the tight significance level (0.01 or 0.001).

(2) The potency approaches 1.
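For concreteness, the sketch below computes gauge and potency for a single replication under these definitions; the function and variable names are ours.

```r
# Sketch: potency = share of relevant covariates retained,
#         gauge   = share of irrelevant covariates retained.
gauge_potency <- function(selected, relevant, candidates) {
  irrelevant <- setdiff(candidates, relevant)
  c(potency = mean(relevant %in% selected),
    gauge   = mean(irrelevant %in% selected))
}

# Hypothetical example: 10 candidates, x1 and x3 relevant, x1 and x7 selected
gauge_potency(selected   = c("x1", "x7"),
              relevant   = c("x1", "x3"),
              candidates = paste0("x", 1:10))
# potency = 0.5, gauge = 0.125
```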

3. Simulation Experiments and Results

The simulation study was performed in the free statistical software R: for Autometrics we used the gets package, which is freely available; for the regularization techniques, we used the glmnet package for LASSO, AdaLASSO, and WLAdaLASSO and the ncvreg package for SCAD and MCP. The performance of Autometrics for covariate selection and forecasting is assessed at two significance levels, 0.05 and 0.01.

3.1. Data Generating Process (DGP)

We use the DGP of Konzen and Ziegelmann [17] for the statistical comparison. Regarding covariate and lag selection performance, our goal is to compare the gauge (“size”) and potency (“power”) of the selected models; we also emphasize the out-of-sample forecasting performance of the considered techniques. The DGP comprises 10 independent time-series covariates that follow an AR(1) process, xi,t = ϕxi,t-1 + μi,t, where μi,t ∼ N(0, 1) and i = 1, 2, …, 10. We assess the performance of the considered techniques under different scenarios based on the same linear model, with the AR(1) autocorrelation coefficient ϕ equal to 0.1, 0.5, and 0.8 and the number of observations T equal to 50, 100, and 500.

The considered DGP is the sparse linear model of Konzen and Ziegelmann [17], in which the response depends on a small subset of the covariates and their lags.

We employ WLAdaLASSO, Autometrics, and the other regularization techniques to estimate the model. The lag length of the dependent and independent regressors is set to 5 throughout the simulation study, with varying numbers of observations T and varying ϕ for the independent regressors. We hold out the last ten observations of each simulated series to compute the out-of-sample RMSE. The out-of-sample RMSE is reported in the figures below, and the simulation is repeated 1000 times.
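The covariate-generating step can be sketched in R as follows; since the exact response equation of the DGP is not reproduced in the text, the linear model for y below is a labeled placeholder rather than the specification of [17].

```r
# Sketch: ten independent AR(1) covariates, x_{i,t} = phi * x_{i,t-1} + mu_{i,t},
# with standard normal innovations, as described in Section 3.1.
set.seed(123)
T <- 100; phi <- 0.5; p <- 10

x <- sapply(1:p, function(i) as.numeric(arima.sim(list(ar = phi), n = T)))

# Placeholder response (assumption): a sparse combination of one current
# covariate and one lagged covariate, plus Gaussian noise.
x2_lag1 <- c(0, x[-T, 2])                 # lag-1 of x2, first value padded with 0
y <- 0.5 * x[, 1] + 0.3 * x2_lag1 + rnorm(T)
```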

3.2. Simulation Results

Tables 1–3 report the simulation findings for the considered techniques in terms of average gauge and potency; the simulated out-of-sample RMSE is presented in the figures. The results indicate that, among all the concerned techniques, WLAdaLASSO attains the highest potency (63.6%) at T = 50, whereas Autometrics at the 0.01 significance level retains the lowest average potency, 16.1%. As the sample size increases, the performance of the considered techniques improves: average potency increases and average gauge decreases. Table 1 indicates that Autometrics keeps its average gauge near its nominal significance level (0.05 or 0.01) at the cost of the lowest average potency. However, as the sample grows to T = 500, Autometrics at the 0.05 level performs close to WLAdaLASSO in both potency and gauge. Among the regularization techniques, LASSO, AdaLASSO, SCAD, and MCP perform worse than WLAdaLASSO in gauge and potency.

Tables 2 and 3 report the simulated results with the autocorrelation coefficient ϕ of the regressors equal to 0.5 and 0.8. The WLAdaLASSO estimate again outperforms the others, with an average potency of 64.5% for ϕ = 0.5 and T = 50. As the sample size increases, the average gauge of WLAdaLASSO approaches the nominal significance level and its average potency approaches 1. The simulation results indicate that the WLAdaLASSO estimate is not sensitive to the autocorrelation coefficient: at T = 50, its average potency is 63.6% for ϕ = 0.1 and 64.5% for ϕ = 0.5. Autometrics, by contrast, performs worse as the autocorrelation coefficient increases from 0.1 to 0.5.

Meanwhile, Autometrics with ϕ = 0.8 and T = 50 has a gauge of 11.5%, higher than the 5% significance level. The performance of Autometrics does not improve (gauge ⟶ α and potency ⟶ 1) as the sample size increases when ϕ = 0.8. WLAdaLASSO, however, remains superior to all other techniques in average potency and gauge. The simulation experiment indicates that WLAdaLASSO remains robust even under strong linear dependence between predictors. With an increasing sample, the performance of Autometrics, LASSO, AdaLASSO, SCAD, and MCP does not improve as much as that of WLAdaLASSO. WLAdaLASSO outperforms the other considered regularization techniques, and Autometrics as well, in average gauge and potency, under both strong and weak linear dependency between predictors and even with small sample sizes.

Figures 1–3 report the RMSE of the considered techniques for T equal to 50, 100, and 500 and ϕ equal to 0.1, 0.5, and 0.8. The results show that WLAdaLASSO outperforms the other techniques in out-of-sample forecasting. The WLAdaLASSO estimate is insensitive to the autocorrelation coefficient: its forecast performance and average potency do not deteriorate at ϕ = 0.8, even in small samples. With ϕ = 0.8, all other techniques perform poorly in out-of-sample forecasting, whereas WLAdaLASSO attains the lowest RMSE. Autometrics with ϕ = 0.1 and T = 50 performs poorly in RMSE compared to WLAdaLASSO, but as the sample size grows, its RMSE decreases because its average potency increases. Autometrics with ϕ = 0.8 and T = 50, however, performs worst among all the techniques. The overall simulation results indicate that WLAdaLASSO outperforms Autometrics and the other regularization techniques in potency and out-of-sample forecasting.

4. Real Data Analysis

For the real data analysis, we probe the determinants of the trade balance for Pakistan, implement the considered techniques, and assess their performance. Trade has played an important role in developing countries as an engine of growth in various eras, and trade imbalances are described as deficits or surpluses. Since independence, Pakistan has been in a trade deficit except for three years: 1947–1948, 1950–1951, and 1972–1973 [40]. According to the economic literature, a variety of factors are thought to be responsible for long-term trade deficits in various economies, including ineffective public policies, shocks in major trading partners, oil price hikes when the economy is heavily reliant on oil imports, residents’ socioeconomic conditions, and increased urbanization [41, 42]. Existing studies on Pakistan consider only a few macroeconomic variables, such as GDP, the exchange rate, broad money supply, inflation, and foreign direct investment [40, 43–47]. This study instead uses a General Unrestricted Model (GUM) that includes every plausible determinant of the trade balance, with 11 regressors, namely, Domestic Investment (log), Domestic Consumption (log), FDI (log), GDP (log), Inflation (log), Budget Deficit (log), Remittances (log), Exchange Rate (log), Population (log), Urban Population (log), and Government Expenditure (log).

We use annual data from 1980 to 2020, compiled from the World Development Indicators. The model contains the 11 regressors (in first differences) and includes 5 lags of each covariate as well as lags of the dependent variable, giving a GUM with 71 covariates; after differencing the data and including 5 lags, 35 observations remain. We train the model on the 30 observations spanning 1985–2015, reserving the last 5 observations (2016–2020) as test data. Throughout the simulation experiments and the real data analysis, we use BIC-based tuning parameters for the regularization techniques, while for Autometrics we select the model at the 0.01 and 0.05 significance levels.
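A sketch of how such a lagged, differenced design matrix can be assembled in R is given below; the helper name make_gum and the padding conventions are ours, not the authors' code.

```r
# Sketch: first-difference each series, append 5 lags of every column
# (covariates and the dependent variable alike), and drop rows lost to lagging.
make_gum <- function(df, L = 5) {
  d   <- as.data.frame(lapply(df, diff))   # first differences
  out <- d
  for (l in 1:L) {
    lagged <- as.data.frame(lapply(d, function(z) c(rep(NA, l), head(z, -l))))
    names(lagged) <- paste0(names(d), "(-", l, ")")
    out <- cbind(out, lagged)
  }
  na.omit(out)                             # keep complete cases only
}

# Usage: gum <- make_gum(annual_data); train on all but the last 5 rows.
```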

The real data analysis, illustrated in Figure 4, verifies our simulation findings: WLAdaLASSO outperforms all other techniques with the lowest out-of-sample RMSE (0.0069), followed by Autometrics (0.01) with an RMSE of 0.018. Autometrics at the 0.05 level possesses a higher RMSE (0.111) than Autometrics at the 0.01 level. This finding is aligned with the simulation experiment, where Autometrics at the 0.05 significance level has a slightly higher average gauge and a higher RMSE than at the 0.01 level. SCAD, MCP, LASSO, and AdaLASSO yield higher RMSEs because they select more irrelevant covariates and lags than WLAdaLASSO and Autometrics. WLAdaLASSO selects three covariates, namely, the difference of urban population (dupop), the difference of log GDP at lag 1 (dlnGDP(−1)), and the difference of log population at lag 4 (dpop(−4)). Autometrics selects five covariates and lags at the 0.05 significance level and three at the 0.01 level; dlnGDP(−1) is common to WLAdaLASSO and to Autometrics at both significance levels. SCAD, MCP, and LASSO select too many covariates and lags, which is why they possess higher RMSEs than WLAdaLASSO. AdaLASSO selects three covariates and their lags but with an RMSE of 0.16, higher than that of WLAdaLASSO and Autometrics.

5. Conclusion

Regularization techniques have become extremely popular in time-series modeling in recent years due to the availability of massive data. This study compares the performance of WLAdaLASSO with that of Autometrics for covariate selection and forecasting. The simulation study illustrates that WLAdaLASSO outperforms Autometrics and the other regularization techniques under strong linear dependency between predictors. For Autometrics with ϕ = 0.1, the gauge ⟶ α (at the 0.05 or 0.01 significance level), the potency ⟶ 1, and the average RMSE also decreases as the sample size grows. This improvement, however, is limited to ϕ = 0.1: with ϕ = 0.8, increasing the sample size does not significantly enhance the performance of Autometrics relative to WLAdaLASSO. Autometrics at the 0.05 significance level includes irrelevant covariates that increase the RMSE compared to the 0.01 level, a finding aligned with the real data analysis. Apart from WLAdaLASSO, all the considered regularization techniques perform poorly in covariate selection and forecasting even with ϕ = 0.1 and T = 50; their performance improves with sample size, but WLAdaLASSO still outperforms them across all simulation experiments. The simulation experiments and real data analysis provide evidence that WLAdaLASSO is more robust than all the other considered regularization techniques, and than Autometrics as well, in out-of-sample forecasting and covariate selection, even under strong linear dependence between predictors and with small sample sizes.

5.1. Limitations

A limitation of this study is that it considers only linear models and annual data. The simulation experiments are also limited to Gaussian-distributed errors.

Data Availability

Coding of the simulation study, coding of the real data analysis, and data used for analysis can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.