Prediction of composite indicators using locally weighted quantile regression

Abstract. The main goal of this paper is to improve the existing methods and tools used for solving penalized quantile regression problems. We modified the quantile regression method by implementing the extreme learning machine (ELM) algorithm and features of locally weighted regression; we also used different penalty functions. The modified method was applied to the one-step-ahead prediction of the composite indicator (CI) of the Lithuanian economy. Our analysis showed that the prediction error of the modified locally weighted quantile regression is smaller than that of the ordinary quantile regression.


Introduction
Linear regression methods are well known as classical methods; the ordinary least squares (OLS) method allows us to assess the linear relationship of variables. This method finds the conditional mean of the response variable. On the other hand, the OLS method is very sensitive to outliers; hence, it provides unstable estimates. Also applicable is LADR (least absolute deviation regression), which evaluates the median of the response variable. The median is more robust to outliers; hence, this method is more robust to outliers as well. LADR was generalized in [4] into the quantile regression. In this way, the conditional quantile of any order was obtained; in practice, this characterizes the whole conditional distribution of a response variable. Over the past thirty-five years, quantile regression has become a very important and widely used technique to analyze the conditional distribution of a response variable. Such methods are more robust to outliers than linear regression.
The first step of any regression modeling is the selection of variables. In practical problems, there are situations where quite a few variables are important. Usually, most of them are included in the model, and only then are the most important ones chosen. Insignificant variables should not be included in the model since they complicate the interpretation of results and reduce the accuracy of predictions. In order to automate this process, regularization of the method is often applied. Many versions of regularization have been proposed. The L1 regularization is used in the LASSO (least absolute shrinkage and selection operator) method [11]. Later, the LASSO method was improved into the adaptive LASSO method [13]. The nonconcave penalized least squares regression was introduced in [2]. This method selects the significant variables and evaluates the estimates of coefficients simultaneously. An example of such a penalty function is the SCAD (smoothly clipped absolute deviation) regularization. Also, logarithmic and exponential regularizations were introduced. Later, the MCP (minimax concave penalty) function was developed [12]. In most cases, these above-mentioned penalty functions were used in solving linear regression problems, yet almost analogous methods can be used in quantile regression methods.
This work is an extension of previous work, where the ELM algorithm was combined with the locally weighted regression for the one-step-ahead prediction of the composite indicator (CI) of the Lithuanian economy [8]. An analysis of results showed that the combined method gives a smaller prediction error in comparison to the Levenberg-Marquardt and ELM methods or the AR(p) process. Also, the analysis of the results based on various accuracy measures suggested that the proposed method may be used for data of rather small sample size and during periods when the dynamics of time series may have unexpected changes, like during the economic crisis and later periods (2008-2010). In spite of the acceptable results, as mentioned at the beginning of the paper, it is known that the OLS method is sensitive to outliers. Hence, the quantile regression was chosen as an alternative (more robust to outliers) for the one-step-ahead prediction of the CI.
In practical examples, we will use the CI of the Lithuanian economy that was developed under the methodology described in [7]. The methodology for constructing the CI is based on factor analysis. In this paper, the practical problems of quantile regression are solved using the R package rqPen [9]. In this package, different regularization functions are implemented: LASSO, SCAD, and MCP. This tool enables us to solve quantile regression problems using these above-mentioned regularization functions.
The general objective of this research is to modify the existing quantile regression: (i) to develop the locally weighted quantile regression using different regularization functions and the ELM method for the modification of input data; (ii) to check the impact of different regularization functions on the predicted results of CI's.
The structure of this paper is as follows: In Section 2, methodological notes are introduced. The practical implementation of the methods is described in Section 3. Section 4 describes the results, and Section 5 gives concluding remarks.

Methodology
In this section, we define our main terms and introduce the methods used in this paper: the quantile regression with regularization, regularization functions, the locally weighted quantile regression, and the locally weighted quantile regression with the ELM modification.

Quantile regression with regularization
Suppose we have random variables (r.v.) Y and X, and let F_{Y|X}(y | x) be the distribution function of Y under the condition X = x. Then the quantile of Y of order τ ∈ [0, 1] with condition X = x is defined as the number

Q_{Y|X}(τ) = inf{y : F_{Y|X}(y | x) ≥ τ}.

We define r = y − f_τ(x | β); then the "check" function is

ρ_τ(r) = r(τ − I(r < 0)),    (1)

where I(·) is the indicator function. Such a "check" function (1) can be used for the evaluation of the quantiles of an r.v. Suppose that Y is an r.v. with distribution function F_Y(y) = P(Y ≤ y). Then the quantile of order τ is equal to

Q_Y(τ) = inf{y : F_Y(y) ≥ τ}.

Having a realization y_1, …, y_N of the r.v. Y, we evaluate the estimate of the τ order quantile of Y:

Q̂_Y(τ) = argmin_u Σ_{i=1}^{N} ρ_τ(y_i − u).

Moving to the quantile regression, suppose the argument u depends on the r.v. X, i.e., u = f_τ(X). Then we find the conditional quantile Q_{Y|X}(τ) of Y of order τ.
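The "check" function and the sample-quantile estimate can be illustrated with a minimal Python sketch (a toy example with made-up data, not the paper's implementation); for a finite sample, a minimizer of the total check loss lies among the data points themselves, so a direct search suffices:

```python
def check_loss(r, tau):
    """The "check" function rho_tau(r) = r * (tau - I(r < 0)) from (1)."""
    return r * (tau - (1 if r < 0 else 0))

def sample_quantile(ys, tau):
    """Estimate the tau-order quantile of a sample by minimizing the total
    check loss over candidate values taken from the data points."""
    return min(ys, key=lambda u: sum(check_loss(y - u, tau) for y in ys))

ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # the outlier does not move the median
print(sample_quantile(ys, 0.5))   # → 3.0
print(sample_quantile(ys, 0.95))
```

This also illustrates the robustness argument from the introduction: the outlier at 100 leaves the estimated median at 3.0.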
Analogically, if we have realizations {(x_i, y_i), i = 1, …, N}, where x_i = (x_{i1}, …, x_{ip}), we can solve the regression problem between Y and X using the "check" function (1). In this case, we get the estimate Q̂_{Y|X}(τ) of the τ order conditional quantile of the response variable. In other words, we have the minimization problem

min_β Σ_{i=1}^{N} ρ_τ(y_i − f_τ(x_i | β)).

When similar minimization problems are solved, a regularization method is often used. Most often, the L1 and L2 regularization functions are applied; then we obtain the LASSO and ridge regression.
Hence, in order to get better results for the quantile regression problem, we add the penalty function J(β):

min_β Σ_{i=1}^{N} ρ_τ(y_i − f_τ(x_i | β)) + λJ(β),    (2)

here β = (β_1, …, β_p), and the regularization parameter λ is carefully chosen.
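A sketch of evaluating this penalized objective for a candidate coefficient vector, assuming a linear f_τ and an L1 (LASSO) penalty; the data, τ, and λ below are made up for illustration:

```python
def check_loss(r, tau):
    """rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1 if r < 0 else 0))

def penalized_objective(beta0, beta, xs, ys, tau, lam):
    """Sum of check losses of the residuals plus the L1 (LASSO) penalty
    lam * sum(|beta_j|) on the slope coefficients."""
    fit = sum(
        check_loss(y - beta0 - sum(b * xj for b, xj in zip(beta, x)), tau)
        for x, y in zip(xs, ys)
    )
    return fit + lam * sum(abs(b) for b in beta)

xs = [(1.0,), (2.0,), (3.0,)]
ys = [2.0, 4.0, 6.0]
# A perfectly fitting line (beta0 = 0, beta1 = 2) pays only the penalty term:
print(penalized_objective(0.0, (2.0,), xs, ys, 0.5, 0.1))  # → 0.2
```

An actual solver would minimize this objective over (β_0, β); the rqPen package used in the paper does exactly that for the supported penalties.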

Regularization functions
The regularization technique permits us to choose only significant variables for the regression; it also helps us avoid overfitting. In this paper, we analyzed three different techniques: LASSO, SCAD, and MCP.
The LASSO method was defined in order to improve the OLS method [11]. Formerly, two main approaches were employed: ridge regression and the exclusion of insignificant variables from the model. Ridge regression gives stabler estimates of coefficients; the second approach rejects irrelevant covariates and makes the results easier to interpret. LASSO combines the features of both methods, though it has its own drawbacks. The LASSO regularization corresponds to the constraint

Σ_{j=1}^{p} |β_j| ≤ t,

where t is a parameter that controls the shrinkage of coefficients; equivalently, the penalty function is J(β) = Σ_j |β_j|. The SCAD function was defined in [2]. This regularization function is symmetric and nonconvex on the interval (0, ∞):

p_λ(β) = λ|β|,                                   if |β| ≤ λ,
p_λ(β) = (2aλ|β| − β² − λ²) / (2(a − 1)),        if λ < |β| ≤ aλ,
p_λ(β) = λ²(a + 1)/2,                            if |β| > aλ.

Hence, the SCAD regularization function is differentiable on the interval (−∞, 0) ∪ (0, +∞), but its derivative outside of the interval [−aλ, aλ] is equal to zero. SCAD should give nearly unbiased estimates of large coefficients and exclude insignificant variables.
In [2], there is a recommendation to use a = 3.7, while λ is often chosen by cross-validation or other methods.
The MCP function can be used in solving linear regression problems. The estimates obtained using MCP are accurate and almost unbiased. This algorithm can be used in solving large-dimensional multiple regression problems [12]. The MCP regularization is defined:

p_{λ,γ}(β) = λ|β| − β²/(2γ),    if |β| ≤ γλ,
p_{λ,γ}(β) = γλ²/2,             if |β| > γλ,

here γ > 0 and λ > 0 are regularization parameters.
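The three penalty functions can be written down directly. The sketch below uses the recommended a = 3.7 for SCAD; γ = 3 for MCP is an arbitrary illustrative choice. Note how LASSO keeps growing linearly, while SCAD and MCP flatten out, which is what makes their large-coefficient estimates nearly unbiased:

```python
def lasso(b, lam):
    """L1 penalty: grows linearly everywhere."""
    return lam * abs(b)

def scad(b, lam, a=3.7):
    """SCAD penalty: linear near 0, quadratic transition on (lam, a*lam],
    constant beyond a*lam, so large coefficients are not shrunk."""
    ab = abs(b)
    if ab <= lam:
        return lam * ab
    if ab <= a * lam:
        return (2 * a * lam * ab - ab ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2

def mcp(b, lam, gamma=3.0):
    """MCP penalty: linear near 0, flattening to the constant gamma*lam^2/2."""
    ab = abs(b)
    if ab <= gamma * lam:
        return lam * ab - ab ** 2 / (2 * gamma)
    return gamma * lam ** 2 / 2

lam = 1.0
for b in (0.5, 2.0, 10.0):
    print(lasso(b, lam), scad(b, lam), mcp(b, lam))
```

Both SCAD branches meet continuously at |β| = λ and |β| = aλ, and MCP's branches meet at |β| = γλ, which is easy to verify numerically.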

Locally weighted quantile regression
The goal of the regression is to evaluate the function m in E(Y | X) = m(X) having the response variable Y and independent variable X.
In the case of a linear dependency between Y and X, and having realizations of these random variables {(x_i, y_i), i = 1, …, N}, the linear regression is defined by the minimization problem

β̂ = argmin_β Σ_{i=1}^{N} (y_i − f(x_i | β))².    (4)

In general, the results of such a minimization may be improved. The regression results obtained are usually applied to evaluate the prediction regarding the latest values of the covariates. Hence, in this paper, the locally weighted regression is used [5]. It modifies expression (4) by giving weights ω_i to the errors:

β̂ = argmin_β Σ_{i=1}^{N} ω_i (y_i − f(x_i | β))².    (5)

For the locally weighted regression, the weights ω_i are determined depending on how "close" x_i is to the new query point x_q (further, x_q = x_{N+1}). Often, the Gaussian kernel function K is used:

ω_i = K(x_i, x_q) = exp(−‖x_i − x_q‖² / (2h²)),

where h is a bandwidth parameter; hereβ̂ = (β̂_1, …, β̂_p)ᵀ. The solution to this problem is not very different from that of problem (4): expression (5) is transformed by rescaling the data, ỹ_i = √ω_i · y_i, x̃_i = √ω_i · x_i, after which the ordinary minimization (4) is applied to the rescaled data. Analogically, we solve the quantile regression problem. Suppose we have a quantile regression without regularization:

min_β Σ_{i=1}^{N} ρ_τ(y_i − f_τ(x_i | β)).    (6)

After certain transformations (the check function is positively homogeneous, ρ_τ(ωr) = ωρ_τ(r) for ω > 0), we note that, in the case ω_i = K(x_i, x_q), the locally weighted solution of (6) is equivalent to the problem

min_β Σ_{i=1}^{N} ω_i ρ_τ(y_i − f_τ(x_i | β)).    (7)

Hence, the locally weighted regression method is reduced to problem (7). This modification gives a more accurate estimate of β near the query point; in this way, we get a more accurate prediction of the response variable.
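A sketch of the Gaussian kernel weighting, plus a numerical check of the positive homogeneity of the check function, ρ_τ(ωr) = ωρ_τ(r) for ω > 0, which is the property that lets the weighted quantile problem be solved on rescaled residuals; the bandwidth h and the sample numbers below are made up:

```python
import math

def gaussian_weight(x_i, x_q, h=1.0):
    """Gaussian kernel weight: near 1 when x_i is close to the query point
    x_q, decaying toward 0 with distance (bandwidth h is a tuning choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x_i, x_q))
    return math.exp(-d2 / (2.0 * h ** 2))

def check_loss(r, tau):
    """rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1 if r < 0 else 0))

# Positive homogeneity: rho_tau(w * r) == w * rho_tau(r) for w > 0.
w, r, tau = 0.37, -1.4, 0.25
print(check_loss(w * r, tau), w * check_loss(r, tau))
```

The weights thus downweight observations far from the forecasting point x_q, which is exactly the "local" aspect of the method.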

Locally weighted quantile regression with ELM modification
ELM is a widely used method based on the idea of a single hidden layer of feed-forward neural networks (SLFNs) [3,10].
In our case, ELM is used for the modification of the independent variables. The data x_i are changed by a linear transformation with random weights [3], and an activation function (a sigmoid function) is chosen. Let us define the linear quantile regression problem:

min_β Σ_{i=1}^{N} ρ_τ(y_i − β_0 − ⟨x_i, β⟩) + λJ(β).    (3)

Hence, once we have x_i, the new pseudovariables z_i are created:

z_{ij} = ϕ(⟨ω_j, x_i⟩), j = 1, …, m,    (8)

here ω_j = (ω_{1j}, …, ω_{pj})ᵀ are weights randomly generated by ELM, and ϕ is a sigmoid function, ϕ(u) = ϕ_1(u) or ϕ(u) = ϕ_2(u), where ϕ_1 and ϕ_2 are defined:

ϕ_1(u) = 1 / (1 + e^{−u}),    ϕ_2(u) = (1 − e^{−u}) / (1 + e^{−u}).

We obtain the modified data z_i = (z_{i1}, …, z_{im}), i = 1, …, N. Now we have m new covariates composed of the previous p covariates; the number m is arbitrary. Exactly these new data are used in the further analysis.
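A sketch of the ELM covariate modification: each observation x_i (of length p) is mapped to m pseudovariables through randomly generated weights and a sigmoid activation. The logistic sigmoid and the uniform [−1, 1] weight distribution are illustrative assumptions, not details fixed by the paper:

```python
import math
import random

def sigmoid(u):
    """Logistic sigmoid, one common ELM activation choice."""
    return 1.0 / (1.0 + math.exp(-u))

def elm_features(xs, m, seed=0):
    """Map each observation x_i to m pseudovariables z_ij = phi(<w_j, x_i>)
    with randomly generated weights w_j, in the spirit of (8)."""
    rng = random.Random(seed)
    p = len(xs[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(m)]
    return [
        [sigmoid(sum(w_k * x_k for w_k, x_k in zip(w_j, x))) for w_j in weights]
        for x in xs
    ]

xs = [(0.2, 0.5), (0.9, 0.1)]
zs = elm_features(xs, m=4)
print(len(zs), len(zs[0]))  # 2 observations, each with 4 pseudovariables
```

The random weights are generated once and kept fixed; only the regression coefficients on the pseudovariables are then estimated.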
The following formula defines the modified quantile regression:

min_β Σ_{i=1}^{N} ρ_τ(y_i − β_0 − ⟨z_i, β⟩) + λJ(β),    (9)

here, the response variable is unchanged:

y_i, i = 1, …, N,    (10)

while the independent variables have a new expression:

z_i = (z_{i1}, …, z_{im}),    (11)

here z_i is defined in (8). In this way (by a data transformation only), the local quantile regression with ELM-modified covariates is obtained. The assessment of this method is the same as for the initial quantile regression (2).

Practical implementation
In this section, the previously described methods will be applied for data analysis. The regression will be constructed in two ways: the general quantile regression and the locally weighted quantile regression with ELM-modified covariates (further: the modified method). Practical modeling concentrates on three different values of the order τ (τ = 0.05, 0.5, 0.95). When τ = 0.5, we deal with the conditional median of the response variable. As we know, the median and mean coincide for an r.v. with a symmetric density function, so the quantile of order τ = 0.5 characterizes the central tendency of the response variable. The quantiles of orders τ = 0.05 and τ = 0.95 together give a confidence interval of the response variable (in this case, a 90 per cent one).
Recently, new indicators (indexes) have been constructed that reflect the changes in the economy more precisely than the gross domestic product [8]. Usually, this type of indicator is constructed as a combination of different indicators from various fields. Statistical data are selected with regard to economic theory and additional methods (correlation, causality analysis, etc.). Weights are chosen using mathematical methods or by including additional sources of information. In this way, so-called CIs are constructed. The advantage of a CI mainly depends on the chosen methodology and on the selected statistical data that reflect the general tendencies of a specific field. In summary, the CI is defined as a mathematical function

CI = f(X, ω),

here, X is the set of variables that compose the CI, and ω are the weights assigned to every variable.
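As a toy illustration of this definition, a CI value at one time point is a weighted combination of its normalized component indicators. The indicator values and weights below are made up; in the paper, the weights come from the rotated factor loading matrix:

```python
def composite_indicator(x, w):
    """CI value as a weighted sum of normalized component indicators.
    The weights w are hypothetical here; the paper derives them via
    factor analysis (Nicoletti method)."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Three normalized indicators and made-up weights summing to 1:
print(composite_indicator([0.4, 0.6, 0.8], [0.5, 0.3, 0.2]))
```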
In this paper, we will use the methodology for constructing the CI presented in [8]. For the analysis and prediction, we extended the time period (1998-2014) and reduced the number of variables to k = 12. Economic indicators of monthly and quarterly periodicity were used: statistical data of industry, construction, domestic trade, foreign trade, services, and the producer price index. We will briefly describe the steps followed in constructing the CI. First of all, a preliminary analysis was applied (outliers were detected, missing values were imputed), and all data were seasonally adjusted. Data of quarterly periodicity were transformed into monthly periodicity. The range of all indicators was transformed into the range [0, 1]. Using the factor analysis method, we kept only the most important indicators for further analysis. The weights of individual indicators were obtained by using the Nicoletti method [6] from the rotated factor loading matrix.
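The [0, 1] range transformation mentioned above is a simple min-max rescaling; a minimal sketch with made-up values:

```python
def to_unit_range(series):
    """Linearly rescale an indicator's values into [0, 1]
    (the min-max transformation used in the CI preprocessing)."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

print(to_unit_range([10.0, 15.0, 20.0]))  # → [0.0, 0.5, 1.0]
```

This puts indicators measured on different scales onto a common footing before the factor analysis.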
Hence, x_t = (x_{t1}, …, x_{t12}) is the t-th value of the covariate vector, and y_t is the t-th value of the CI. Here we changed the index i to t in order to highlight the importance of time, as the time series impose an order on the data.
In modeling economic indicators, lags of covariates are usually also included in the models. We suppose that

y_t = f(x_t, x_{t−1}, …, x_{t−d+1});    (12)

in this case, we say that y depends on the last d values of x. In modeling the CI, we deal with four different cases: a linear dependency is used with different lags (d = 1, 2, 3, 4). We solve problems (12) and obtain the general quantile regression method

min_β Σ_t ρ_τ(y_t − β_0 − Σ_{j=1}^{d} β_j x_{t−j+1}) + λJ(β)    (13)

and the modified quantile regression method

min_β Σ_t ρ_τ(y_t − β_0 − Σ_{j=1}^{d} β_j z_{t−j+1}) + λJ(β),    (14)

where y_t, x_t, and the components z_t are defined in (8), (10), and (11). Here β = (β_0, …, β_d)ᵀ, and J(β) is one of the regularization functions (LASSO, SCAD, or MCP). In methods (13) and (14) with d = 1, 2, 3, 4, the CI is evaluated without considering the recent values of the covariates. Then the modified quantile regression is constructed "around" the recent value of the covariate x_t, which is used in forecasting. Twelve different models (one for each combination of a regularization function and a number of lags) are constructed using the general (3) and modified (9) quantile regression. In all formulas, T stands for the length of the time series. In practice, we use the measure M2 to find out how often the actual value falls within the confidence interval; e.g., for the 90 per cent confidence interval, the measure M2(y^(0.05), y^(0.95), y) is calculated, here y^(τ) stands for the quantile of order τ.
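The coverage measure M2 described above (the share of actual values falling inside the predicted interval) can be sketched as follows; the toy interval bounds and actual values are made up:

```python
def m2_coverage(lo, up, y):
    """M2-style coverage: share of actual values y_t that fall inside the
    predicted interval [lo_t, up_t]."""
    hits = sum(1 for l, u, yt in zip(lo, up, y) if l <= yt <= u)
    return hits / len(y)

lo = [0.0, 1.0, 2.0, 3.0]   # hypothetical 0.05-quantile predictions
up = [1.0, 2.0, 3.0, 4.0]   # hypothetical 0.95-quantile predictions
y  = [0.5, 2.5, 2.5, 3.5]   # actual values; the second one falls outside
print(m2_coverage(lo, up, y))  # → 0.75
```

For a well-calibrated 90 per cent interval, this share should be close to 0.9.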

Results
Finally, after we have performed the modeling with different d and all regularizations, a set of vectors is obtained: the predictions ŷ^(0.05), ŷ^(0.5), and ŷ^(0.95) of the general method and the corresponding predictions of the modified method. In order to analyze the accuracy of our predictions, the original data (y_104, …, y_204) are compared to the estimates of the quantiles. The prediction value should be around the mean or the median; hence, the prediction of the median shows the general view. In the 90 per cent case, the realization of the random variable (CI) should fall within the confidence interval between the 0.05 and 0.95 quantiles. Hence, we analyze how well the model was able to assess the confidence interval of the response variable.
The example (d = 4 and MCP regularization) of modeling results is presented in Fig. 1 and in Tables 1, 2. In the figure, the black line denotes the median of original data; the red line denotes the median obtained by the general quantile method; the green line -by the modified method; the dashed red and green lines denote the 0.05 and 0.95 quantiles by the general and modified methods, respectively.
If we observe only the graphical results, it is quite difficult to determine which method is the better one. In different periods, one of the methods gives more accurate predictions, or the predictions are quite similar.
In the tables, REZ_NEW stands for the estimate of the median obtained by the modified method, REZ_OLD is the estimate of the median obtained by the general method, and REZ_OLD_lo, REZ_OLD_up, REZ_NEW_lo, and REZ_NEW_up stand for the 0.05 and 0.95 quantiles of the general and modified methods, respectively. The notation "1:3" means that input data with the first, second, and third lags were used in a particular model.
Better results were obtained by the modified method using input data with three or four lags. MAE, MAPE, RMSE, and other statistics confirmed this fact. We noticed that the widths of the confidence intervals obtained by both methods are similar, but in the case of the modified method with SCAD and MCP regularizations, more than 90 per cent of observed data fall within the constructed confidence interval. In the case of a general method, only 46 per cent of data realizations fall within the interval. More results can be found in [1].
The analysis showed that the measures of accuracy depend on the regularization and the number of lags. The smallest errors of the general method were obtained using LASSO regularization, while in the case of the modified quantile regression, the best result was obtained using four lags and MCP regularization. We see that the different methods perform with different accuracy depending on the number of lags. However, the modified method has this advantage: its confidence interval and estimate of the median are more accurate.
Also, a comparative analysis of the obtained results and the results of previous research [8] was performed. In this paper and in [8], the CI is slightly different: here the time period is extended to 1998-2014 (in the previous research, 1998-2010), and the number of variables was reduced to k = 12 (previously, k = 28). The purpose and expected results of the combined method of extreme learning machine and locally weighted regression [8] and of the locally weighted quantile regression differ as well. Regardless of the differences, we compared some accuracy measures (RMSE and MAPE) of the one-step-ahead predictions. The analysis showed that the RMSE and MAPE of the best locally weighted quantile regression models are slightly smaller.

Conclusions
In this paper, the quantile regression with a penalty function was extended by including local weights and the ELM method for the modification of covariates. The modified method was applied to the one-step-ahead prediction of the CI of the Lithuanian economy. Our analysis indicated that the locally weighted quantile regression with regularization obtains, on average, better results than the known quantile regression with regularization: the new method enabled us to obtain more accurate conditional medians and predictions of confidence intervals. The best results were obtained by the methods with SCAD and MCP regularizations.