Abstract
Recently, the matrix norm \(l_{2,1}\) has been widely applied to feature selection in many areas, such as computer vision, pattern recognition, and computational biology. As an extension of the \(l_1\) norm, the \(l_{2,1}\) matrix norm is often used to find jointly sparse solutions. Moreover, computational studies have shown that the solution of \(l_p\)-minimization (\(0<p<1\)) is sparser than that of \(l_1\)-minimization, so the generalized \(l_{2,p}\)-minimization (\(p\in (0,1]\)) is naturally expected to yield better sparsity than \(l_{2,1}\)-minimization. This paper presents a class of models based on the \(l_{2,p}\ (p\in (0, 1])\) matrix norm, which leads to a non-convex and non-Lipschitz-continuous optimization problem when \(p\) is fractional (\(0<p<1\)). For all \(p\in (0, 1]\), a unified algorithm is proposed to solve the \(l_{2,p}\)-minimization problem, and its convergence is demonstrated uniformly. In the practical implementation of the algorithm, a gradient projection technique is utilized to reduce the computational cost. Finally, different \(l_{2,p}\ (p\in (0,1])\) norms are applied to feature selection in computational biology.
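As a small illustrative sketch (not taken from the paper), the mixed \(l_{2,p}\) matrix norm \(\Vert W\Vert _{2,p} = (\sum _i \Vert w_i\Vert _2^p)^{1/p}\), where \(w_i\) are the rows of \(W\), can be computed directly in NumPy. The helper name `l2p_norm` and the two example matrices are our own; the comparison shows why fractional \(p\) favors row-sparse solutions: for matrices with equal Frobenius energy, the \(p=0.5\) penalty charges a matrix with two nonzero rows much more than one whose energy is concentrated in a single row.

```python
import numpy as np

def l2p_norm(W, p):
    """Mixed l_{2,p} matrix (quasi-)norm: (sum_i ||w_i||_2^p)^(1/p),
    where w_i are the rows of W."""
    row_norms = np.linalg.norm(W, axis=1)   # Euclidean norm of each row
    return np.sum(row_norms ** p) ** (1.0 / p)

# Two matrices with the same Frobenius norm (= 5):
W_sparse = np.array([[3.0, 4.0], [0.0, 0.0]])   # one nonzero row
W_dense  = np.array([[3.0, 0.0], [0.0, 4.0]])   # two nonzero rows

for p in (1.0, 0.5):
    print(p, l2p_norm(W_sparse, p), l2p_norm(W_dense, p))
```

Here \(\Vert W_{\text{sparse}}\Vert _{2,p} = 5\) for both values of \(p\), while \(\Vert W_{\text{dense}}\Vert _{2,1} = 7\) and \(\Vert W_{\text{dense}}\Vert _{2,0.5} = (\sqrt{3}+2)^2 \approx 13.9\): the fractional-\(p\) penalty increasingly prefers the jointly sparse matrix.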
Notes
\(\Vert \cdot \Vert _{2,p}\) (\(0<p<1\)) is not a valid matrix norm because it does not satisfy the triangle inequality. Here we call it a matrix norm for convenience.
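A concrete counterexample (our own, for illustration) makes this failure explicit. Take \(p=1/2\) and two matrices whose single nonzero rows do not overlap; the helper `l2p` below is a hypothetical name for \((\sum _i \Vert w_i\Vert _2^p)^{1/p}\) over the rows \(w_i\).

```python
import numpy as np

def l2p(W, p):
    # (sum_i ||w_i||_2^p)^(1/p) over rows w_i; only a quasi-norm for p < 1
    return np.sum(np.linalg.norm(W, axis=1) ** p) ** (1.0 / p)

p = 0.5
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [0.0, 1.0]])

lhs = l2p(A + B, p)          # (1^{1/2} + 1^{1/2})^2 = 4
rhs = l2p(A, p) + l2p(B, p)  # 1 + 1 = 2
print(lhs > rhs)             # triangle inequality fails
```

Since \(\Vert A+B\Vert _{2,1/2} = 4 > 2 = \Vert A\Vert _{2,1/2} + \Vert B\Vert _{2,1/2}\), the triangle inequality is violated for \(p<1\).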
Acknowledgments
The first author thanks Dr. Zhang Hongchao for his helpful suggestions on this paper.
Additional information
This work is supported by NSFC grants 11001128, 61035003, 61170151, and 11071117, and by the Fundamental Research Funds for the Central Universities (No. NZ2013306 and NZ2013211).
Cite this article
Wang, L., Chen, S. & Wang, Y. A unified algorithm for mixed \(l_{2,p}\)-minimizations and its application in feature selection. Comput Optim Appl 58, 409–421 (2014). https://doi.org/10.1007/s10589-014-9648-x