Group Identification and Variable Selection in Quantile Regression

Using the Pairwise Absolute Clustering and Sparsity (PACS) penalty, we propose a regularized quantile regression (QR) method, QR-PACS. The PACS penalty eliminates insignificant predictors and combines predictors with indistinguishable coefficients (IC), the two issues that arise in the search for the true model. QR-PACS extends PACS from mean regression settings to QR settings. The paper shows that QR-PACS can yield promising predictive precision as well as identify related groups in both simulation and real data.


Introduction
The regression model is one of the most important statistical models. Ordinary least squares (OLS) regression estimates the conditional mean function of the response. Least absolute deviation regression (LADR) estimates the conditional median function and is resistant to outliers. QR was introduced by Koenker and Bassett [1] as a generalization of LADR that estimates the conditional quantile function of the response. Consequently, QR gives much more information about the conditional distribution of the response. QR has attracted a vast amount of interest in the literature and is applied in many areas such as economics, finance, survival analysis, and growth charts.
Variable selection (VS) is very important in the process of model building. In many applications, the number of variables is huge; however, keeping irrelevant variables in the model is undesirable because it makes the model difficult to interpret and may negatively affect its predictive ability. Many penalties have been suggested to achieve VS, for example, Lasso [2], SCAD [3], fused Lasso [4], elastic-net [5], group Lasso [6], adaptive Lasso [7], adaptive elastic-net [8], and MCP [9].
Under the QR framework, Koenker [10] combined the Lasso with the mixed-effect QR model to encourage shrinkage in estimating the random effects. Wang, Li, and Jiang [11] combined LADR with the adaptive Lasso penalty. Li and Zhu [12] proposed L1-norm penalized QR (PQR) by combining QR with the Lasso penalty. Wu and Liu [13] introduced PQR with the SCAD and adaptive Lasso penalties. Slawski [14] proposed the structured elastic-net regularizer for QR.
In the setting p > n, where p denotes the number of predictors and n the sample size, Belloni and Chernozhukov [15] studied the theory of PQR with the Lasso penalty, considering QR in high-dimensional sparse models. Wang, Wu, and Li [16] investigated the methodology of PQR in ultrahigh dimension for nonconvex penalties such as SCAD and MCP. Peng and Wang [17] proposed and studied a new iterative coordinate descent algorithm for solving nonconvex PQR in high dimension.
The search for the true model focuses on two issues: deleting irrelevant predictors and merging predictors with indistinguishable coefficients (IC) [18]. Although the above penalties can address the first issue, they fail on the second. Both issues can be addressed through the Pairwise Absolute Clustering and Sparsity (PACS) penalty [18]. Moreover, PACS is an oracle method for simultaneous group identification and VS.
The limitations of the existing variable selection methods motivated this paper. The aim of the current research is to find an effective procedure for simultaneous group identification and VS under the QR framework.
In this paper, we propose QR-PACS to gain advantages over the existing PQR methods. QR-PACS benefits from the ability of PACS to address the aforementioned issues in discovering the true model, an ability that is unavailable in Lasso, adaptive Lasso, SCAD, MCP, elastic-net, and structured elastic-net.
The rest of the paper is organized as follows. In Section 2, penalized linear QR is reviewed briefly. QR-PACS is introduced in Section 3. The numerical results of simulations and real data are presented in Sections 4 and 5, respectively. The conclusions are reported in Section 6.

Penalized Linear QR
QR is a widespread technique used to describe the conditional distribution of an outcome variable y_i given a set of predictors x_i. Let x_i be a p × 1 vector of predictors for the i-th observation and Q_τ(y_i | x_i) be the τ-th conditional quantile (the inverse cumulative distribution function) of y_i given x_i. Then

Q_τ(y_i | x_i) = x_i^T β(τ),

where β(τ) is a p × 1 vector of unknown parameters and τ ∈ (0, 1) is the quantile level.
Koenker and Bassett [1] suggested estimating β(τ) as

β̂(τ) = argmin_β Σ_{i=1}^n ρ_τ(y_i − x_i^T β),   (1)

where ρ_τ(·) is the check loss function defined as

ρ_τ(u) = u(τ − I(u < 0)).   (2)

Under the regularization framework, Li and Zhu [12], Wu and Liu [13], Slawski [14], and Wang, Wu, and Li [16], among others, proposed penalized versions of (1) by adding different penalties:

argmin_β Σ_{i=1}^n ρ_τ(y_i − x_i^T β) + λ Σ_{j=1}^p p(β_j),   (3)

where λ > 0 is the penalization parameter and p(·) is the penalty function.
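For concreteness, the check loss and the unpenalized estimator in (1) can be sketched in Python. The simulated data, the use of a generic derivative-free optimizer, and the function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def fit_qr(X, y, tau=0.5):
    """Minimize the empirical check loss over beta (no penalty)."""
    p = X.shape[1]
    obj = lambda b: check_loss(y - X @ b, tau).sum()
    return minimize(obj, np.zeros(p), method="Nelder-Mead").x

# Toy data: y = x1 - 2*x2 + standard normal noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=200)
beta_hat = fit_qr(X, y, tau=0.5)  # close to (1, -2)
```

In practice the check-loss minimization is usually solved by linear programming or specialized interior-point/coordinate-descent routines rather than a generic optimizer; the sketch above only illustrates the objective.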
For the rest of this paper, the dependence on τ is omitted for notational convenience.

Penalized Linear QR through PACS (QR-PACS)
In this section, we incorporate PACS into the optimization of (1) to propose QR-PACS. Under the QR setup, the predictors x_ij, i = 1, 2, . . . , n, j = 1, 2, . . . , p, are standardized and the response is centered. QR-PACS is proposed for simultaneous group identification and VS in QR; it encourages correlated variables to have equal coefficient values. The equality of coefficients is attained by adding a group identification penalty to the pairwise differences and sums of coefficients. The QR-PACS estimates are proposed as minimizers of

Σ_{i=1}^n ρ_τ(y_i − x_i^T β) + λ ( Σ_{j=1}^p w_j |β_j| + Σ_{1≤j<k≤p} w_{jk(−)} |β_k − β_j| + Σ_{1≤j<k≤p} w_{jk(+)} |β_k + β_j| ),   (4)

where the w's are nonnegative weights. The PACS penalty in (4) consists of Σ_{j=1}^p w_j |β_j|, which encourages sparseness, and Σ_{1≤j<k≤p} w_{jk(−)} |β_k − β_j| and Σ_{1≤j<k≤p} w_{jk(+)} |β_k + β_j|, which are employed for group identification and encourage equality of coefficients. The second term of the penalty encourages coefficients with the same sign to be set equal, while the third term encourages coefficients with opposite signs to be set equal in magnitude.
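The QR-PACS objective in (4) can be sketched in Python as follows; the weights are supplied by the caller, and the data structures (a weight vector plus dictionaries keyed by predictor pairs for w_jk(−) and w_jk(+)) are illustrative choices, not the authors' implementation:

```python
import numpy as np
from itertools import combinations

def pacs_objective(beta, X, y, tau, lam, w, w_minus, w_plus):
    """Check loss plus the three PACS penalty terms of (4):
    sparsity, pairwise differences, and pairwise sums.
    w is a length-p array; w_minus / w_plus are dicts keyed by (j, k), j < k."""
    u = y - X @ beta
    loss = np.sum(u * (tau - (u < 0)))          # check loss
    sparsity = np.sum(w * np.abs(beta))          # encourages zeros
    diffs = sum(w_minus[j, k] * abs(beta[k] - beta[j])   # same-sign grouping
                for j, k in combinations(range(len(beta)), 2))
    sums = sum(w_plus[j, k] * abs(beta[k] + beta[j])     # opposite-sign grouping
               for j, k in combinations(range(len(beta)), 2))
    return loss + lam * (sparsity + diffs + sums)
```

Minimizing this nonsmooth objective requires a suitable solver (e.g. linear-programming reformulation or coordinate descent); the function above only evaluates the criterion.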
Choosing appropriate adaptive weights is very important for PACS. In QR-PACS, we employed adaptive weights that incorporate the correlations, as suggested by Sharma et al. [18], with a small modification: β̃ is a √n-consistent estimator of β, such as the PACS estimates [18] or other shrinkage QR estimates, and r_jk is the biweight midcorrelation of the (j, k)-th pair of predictors. We propose to employ the biweight midcorrelation [19, 20] instead of the Pearson correlation used in the adaptive weights of [18], in order to obtain a robust correlation and hence robust weights.
In this paper, ridge QR estimates were employed as initial estimates for β̃ in order to obtain weights that perform well in studies with collinear predictors.

Simulation Study
In this section, five examples were carried out to assess the QR-PACS method by comparing it with existing selection approaches under the QR setting, in terms of both prediction precision and model discovery. Data were generated from the linear regression model y_i = x_i^T β + ε_i as follows.
In all examples, the predictors and the error term were standard normal. We compared QR-PACS with ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net. Prediction accuracy was measured by the model error (ME) criterion, defined as ME = (β̂ − β)^T Σ_X (β̂ − β), where Σ_X is the population covariance matrix of X, and model discovery was measured by the resulting model complexity. The median and standard error (SE) of ME were reported. Also, the selection accuracy (SA, % of true models identified), grouping accuracy (GA, % of true groups identified), and % of both selection and grouping accuracy (SGA) were computed and reported. Note that none of ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net performs grouping. The sample size was 100 and each simulated model was replicated 100 times. Some typical examples are reported as follows.
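The ME criterion is straightforward to compute once the population covariance matrix is known; a minimal sketch (function name illustrative):

```python
import numpy as np

def model_error(beta_hat, beta_true, sigma):
    """Model error (beta_hat - beta)' Sigma (beta_hat - beta),
    where sigma is the population covariance matrix of X."""
    d = beta_hat - beta_true
    return float(d @ sigma @ d)

# With Sigma = I the ME reduces to the squared Euclidean error of beta_hat.
me = model_error(np.array([1.1, 1.9]), np.array([1.0, 2.0]), np.eye(2))
```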
Example 1. In this example, we assumed the true parameters for the model of study as β = (2, 2, 2, 0, 0, 0, 0, 0)^T, β ∈ R^8. The first three predictors were highly correlated with correlation equal to 0.7 and their coefficients were equal in magnitude, while the rest were uncorrelated.
Example 2. In this example, the true coefficients were assumed as β = (0.5, 1, 2, 0, 0, 0, 0, 0)^T, β ∈ R^8. The first three predictors were highly correlated with correlation equal to 0.7 and their coefficients were different in magnitude, while the rest were uncorrelated.
Example 3. In this example, the true parameters were β = (1, 1, 1, 0.5, 1, 2, 0, 0, 0, 0)^T, β ∈ R^10. The first three predictors were highly correlated with correlation equal to 0.7 and their coefficients were equal in magnitude, while the second three predictors had correlation equal to 0.3 and coefficients different in magnitude. The remaining predictors were uncorrelated.
Example 4. In this example, the true parameters were β = (1, 1, 1, 0.5, 1, 2, 0, 0, 0, 0)^T, β ∈ R^10. The first three predictors were correlated with correlation equal to 0.3 and their coefficients were equal in magnitude, while the second three predictors had correlation equal to 0.7 and coefficients different in magnitude. The remaining predictors were uncorrelated.
Example 5. In this example, the true parameters were assumed as β = (2, 2, 2, 1, 1, 0, 0, 0, 0, 0)^T, β ∈ R^10. The first three and the next two predictors were highly correlated with correlation equal to 0.7 within each group, and the coefficients differed between groups, while the rest were uncorrelated.
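The design of Example 1 can be sketched as follows; the function name, seed, and use of a multivariate normal draw are illustrative assumptions:

```python
import numpy as np

def example1_data(n=100, rho=0.7, seed=0):
    """Example 1 design: 8 standard-normal predictors, the first three
    equicorrelated with correlation rho and the rest independent;
    beta = (2, 2, 2, 0, 0, 0, 0, 0) and standard normal errors."""
    rng = np.random.default_rng(seed)
    sigma = np.eye(8)
    sigma[:3, :3] = rho                 # equicorrelated block for X1..X3
    np.fill_diagonal(sigma, 1.0)        # unit variances
    X = rng.multivariate_normal(np.zeros(8), sigma, size=n)
    beta = np.array([2.0, 2, 2, 0, 0, 0, 0, 0])
    y = X @ beta + rng.normal(size=n)
    return X, y, beta, sigma
```

The other examples only change the coefficient vector and the correlation blocks of sigma.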
Across all considered settings, Table 1 shows that the QR-PACS method has the lowest ME. Although QR-elastic-net and QR-adaptive Lasso have the highest SA, none of the considered methods performs grouping except QR-PACS. The QR-PACS method successfully identifies the groups of predictors, as seen in the GA and SGA rows.
In Table 2, the percentage of no-grouping (NG, no groups found) and the percentage of selection and no-grouping (SNG) were reported instead of GA and SGA, respectively. In terms of prediction and selection, the QR-PACS method does not perform well, while QR-elastic-net, QR-adaptive Lasso, and QR-SCAD perform best, in that order. All the methods under consideration perform well in terms of correctly not identifying a group. Thus, QR-PACS is not a recommended method when there is high correlation but the significant variables do not form a group.
Table 3 demonstrates that QR-elastic-net and QR-adaptive Lasso have the best SA, in that order; however, QR-PACS performs better in terms of ME. QR-PACS clearly identifies the important group, with high GA and SGA.
Table 4 shows that QR-elastic-net, QR-adaptive Lasso, and QR-SCAD have the best SA. QR-PACS has the best results among the methods in terms of ME, and in terms of GA and SGA it performs well.
From Table 5, QR-elastic-net has the best SA, while QR-PACS has excellent GA and, as seen in the GA and SGA rows, successfully identifies the groups of predictors.

NCAA Sports Data
In this section, the behavior of QR-PACS relative to ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net was illustrated through an analysis of the NCAA sports data [21]. We standardized the predictors and centered the response before the analysis.
In each repetition, we randomly split the data into a training set and a testing set (20% of the data for testing) and fit the models on the training set. The NCAA sports data were randomly split 100 times to allow for more stable comparisons. We reported the average and SE of the ratio of test error (RTE) relative to QR for all methods, and the effective model size (MZ) after accounting for equality of absolute coefficient estimates.
The NCAA data was taken from a study of the effects of sociodemographic indicators and the sports programs on graduation rates.
From Table 6, the results indicate that QR-PACS performs significantly better than ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net in test error. In fact, ridge QR, QR-Lasso, QR-SCAD, QR-adaptive Lasso, and QR-elastic-net perform worse than plain QR in test error. The effective model size for QR-PACS is 5, even though it includes all variables in the model.

Conclusions
In this paper, QR-PACS for group identification and VS under QR settings was developed, combining the strength of QR with the ability of PACS to perform consistent group identification and VS. QR-PACS achieves the two goals simultaneously and extends PACS from mean regression settings to QR settings. We demonstrate computationally that it can be carried out with an effective algorithm. The paper shows that QR-PACS can yield promising predictive precision as well as identify related groups in both simulation and real data. A future direction is QR-PACS under a Bayesian framework; a robust version of QR-PACS is another possible extension.

Data Availability
The data studied in this paper are the NCAA sports data from Mangold et al. [21]. They are public and available at http://www4.stat.ncsu.edu/~boos/var.select/ncaa.html [21].