A Pass to Variable Selection

Many regularized procedures produce sparse solutions and are therefore sometimes used for variable selection in linear regression. It has been shown that regularized procedures are more stable than subset selection. Such procedures include the LASSO, SCAD, and the adaptive LASSO, to name just a few. However, their performance depends crucially on the selection of the tuning parameter. For the purpose of prediction, popular methods for tuning parameter selection include Cp, cross-validation, and generalized cross-validation. For the purpose of variable selection, the most popular method for tuning parameter selection is BIC. The selection consistency of BIC has been shown for some regularized procedures. However, using BIC requires knowing the degrees of freedom, and for many regularized procedures, such as those for graphical models and clustering algorithms, no formula for the degrees of freedom exists.


Introduction
Recently, stability selection has become another popular approach to variable selection [1,2]. However, most stability-based methods depend explicitly on some hyper-tuning parameter. For example, the method in [1] depends on a threshold (pre-set as 0.8 in [1]), and the method in [2] likewise depends on a threshold (pre-set as 0.9 in [2]). It is therefore desirable to develop a method that avoids such hyper-tuning parameters in stability selection. One suggestion is to combine the strengths of both stability selection and cross-validation. Since cross-validation is a variable selection method based on prediction, the new method is referred to as prediction and stability selection (PASS).
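To make the threshold dependence concrete, the following is a minimal sketch of generic stability selection: a base variable selector is run on many random half-subsamples, and variables whose selection frequency reaches a threshold (0.8, mirroring [1]) are kept. The base selector here is a simple marginal-correlation screener chosen only for brevity; it is an illustrative assumption, not the regularized procedures used in [1,2].

```python
import numpy as np

def stability_select(X, y, base_select, threshold=0.8, B=100, seed=0):
    """Generic stability selection: run `base_select` on B random
    half-subsamples and keep variables whose selection frequency
    reaches `threshold`. `base_select(X, y)` returns a boolean support."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        freq += base_select(X[idx], y[idx])  # boolean adds as 0/1
    freq /= B
    return freq >= threshold, freq

def top_k_corr(X, y, k=3):
    """Illustrative base selector (not from the paper): keep the k
    variables with the largest absolute marginal correlation with y."""
    score = np.abs(X.T @ (y - y.mean()))
    sel = np.zeros(X.shape[1], dtype=bool)
    sel[np.argsort(score)[-k:]] = True
    return sel
```

The point of the sketch is that the final answer changes with the pre-set threshold (0.8 here), which is exactly the hyper-tuning parameter PASS is designed to avoid.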
is used to estimate , most regularized procedures have been shown to be selection consistent with appropriate λ = λ n , emphasizing its dependence on data. In general, as shown in [3], there are five cases: Case 4: If n n r λ  , then the sign pattern of  n λ β is consistent with that of β on  with probability tending to one, while for all sign patterns consistent with that of β on , the probability of obtaining this pattern is tending to a limit in (0,1).
Case 5: If λ n  r n , then  A good criterion should intend to select λ n from case 3; selecting λ n from cases 1 or 2 might lead to under-fitting while from cases 4 or 5 might lead to over-fitting. If the two degenerate cases (1 and 5) are pre-excluded, the criterion, referred to PASS, incorporates crossvalidation, which avoids under-fitting, and Kappa selection proposed in [2], which avoids over-fitting. To describe this criterion, consider any regularized procedure with λ and randomly partition the dataset {(y 1 ,x 1 {(y 1 , x 1 ),…,(y n ,x n )} into two halves, Step 2: Based on * Step 3: Step 4: Repeat Steps 1-3 for B times and obtain the following ratio, Step 5: Compute PASS(λ) on a grid of λ and select  = arg ( ) max PASS λ λ λ .

Discussion
The new criterion has several advantages. First, it does not depend on any hyper-tuning parameter. Second, its implementation is straightforward. Third, it can be applied to variable selection in a wide range of models, such as the linear model, generalized linear models, and Cox's proportional hazards model. Fourth, it can be applied to variable selection in both supervised and unsupervised learning.