Abstract
Feature screening plays an important role in ultrahigh dimensional data analysis. This paper is concerned with conditional feature screening, where the goal is to detect the association between the response and ultrahigh dimensional predictors (e.g., genetic markers) given a low-dimensional exposure variable (such as clinical or environmental variables). To this end, we first propose a new index to measure conditional independence, and then develop a conditional screening procedure based on this index. We systematically study the theoretical properties of the proposed procedure and establish its sure screening and ranking consistency properties under very mild conditions. The proposed screening procedure enjoys several appealing properties: (a) it is model-free, in that its implementation does not require specifying the model structure; (b) it is robust to heavy-tailed distributions and to outliers in both the response and the predictors; and (c) it handles both feature screening and conditional screening in a unified way. We study the finite sample performance of the proposed procedure through Monte Carlo simulations and further illustrate the method with two real data examples.
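The abstract does not spell out the proposed index, so the sketch below only illustrates the general conditional-screening workflow and is not the authors' method. As an assumed stand-in, it scores each predictor with an indicator-based (Blum-Kiefer-Rosenblatt-type) dependence statistic, conditions on the exposure by slicing it at its quantiles, and keeps the `d` predictors with the largest scores. The function names `cdf_dependence`, `conditional_score` and `conditional_screen`, the slicing scheme, the default `n_slices=5` and the simulated data are all hypothetical.

```python
import numpy as np


def cdf_dependence(x, y):
    """Indicator-based dependence between two 1-D samples (stand-in index).

    Averages (F_xy(x_i, y_j) - F_x(x_i) * F_y(y_j))**2 over all sample pairs
    (i, j), a Blum-Kiefer-Rosenblatt-type statistic. It depends on the data
    only through orderings, so monotone transforms leave it unchanged.
    """
    n = len(x)
    ix = (x[:, None] <= x[None, :]).astype(float)  # ix[a, i] = 1{x_a <= x_i}
    iy = (y[:, None] <= y[None, :]).astype(float)  # iy[a, j] = 1{y_a <= y_j}
    joint = ix.T @ iy / n          # empirical joint CDF at (x_i, y_j)
    fx = ix.mean(axis=0)           # empirical marginal CDF of x at x_i
    fy = iy.mean(axis=0)           # empirical marginal CDF of y at y_j
    return float(np.mean((joint - np.outer(fx, fy)) ** 2))


def conditional_score(x, y, z, n_slices=5):
    """Hypothetical conditional index: slice the exposure z at its quantiles
    and average the within-slice dependence, weighted by slice proportion."""
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_slices + 1)[1:-1])
    labels = np.searchsorted(edges, z, side="right")  # slice label 0..n_slices-1
    score = 0.0
    for s in range(n_slices):
        mask = labels == s
        if mask.sum() < 2:         # skip (near-)empty slices
            continue
        score += mask.mean() * cdf_dependence(x[mask], y[mask])
    return score


def conditional_screen(X, y, z, d, n_slices=5):
    """Score every column of X given z and keep the indices of the top d."""
    scores = np.array(
        [conditional_score(X[:, k], y, z, n_slices) for k in range(X.shape[1])]
    )
    keep = np.argsort(scores)[::-1][:d]  # largest scores first
    return keep, scores


# Usage on simulated data (an illustrative setup, not the paper's design):
rng = np.random.default_rng(0)
n, p = 300, 40
z = rng.normal(size=n)                 # low-dimensional exposure
X = rng.normal(size=(n, p))            # candidate predictors
noise = rng.standard_t(df=1, size=n)   # Cauchy errors: very heavy tails
y = z + 2.0 * X[:, 3] + 2.0 * X[:, 7] + noise
keep, scores = conditional_screen(X, y, z, d=10)
print("retained predictors:", sorted(keep.tolist()))
```

Because the stand-in score uses the data only through the indicators 1{x_a <= x_i} and 1{y_a <= y_j}, it needs no model fit and is unaffected by monotone transforms or extreme values, which is one simple way to get the kind of model-free, robust behaviour the abstract describes. The number of slices trades how finely the exposure is conditioned on against the sample size available within each slice.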
Acknowledgements
This work was supported by National Science Foundation of USA (Grant No. P50 DA039838), the Program of China Scholarships Council (Grant No. 201506040130), National Natural Science Foundation of China (Grant No. 11401497), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, the National Key Basic Research Development Program of China (Grant No. 2010CB950703), the Fundamental Research Funds for the Central Universities, National Institute on Drug Abuse, National Institutes of Health (Grant Nos. P50 DA036107 and P50 DA039838) and National Science Foundation of USA (Grant No. DMS 1512422). The content is solely the responsibility of the authors and does not necessarily represent the official views of NIDA, NIH, NSF, NKBRDPC, FRFCU, CSC or NNSFC.
About this article
Cite this article
Wang, L., Liu, J., Li, Y. et al. Model-free conditional independence feature screening for ultrahigh dimensional data. Sci. China Math. 60, 551–568 (2017). https://doi.org/10.1007/s11425-016-0186-8