Abstract
Modern biological experiments often involve high-dimensional data with thousands or more variables. A challenging problem is to identify the key variables that are related to a specific disease. Confounding this task is the vast number of statistical methods available for variable selection. For this reason, we set out to develop a framework to investigate the variable selection capability of statistical methods that are commonly applied to analyze high-dimensional biological datasets. Specifically, we designed six simulated cancers (based on benchmark colon and prostate cancer data) where we know precisely which genes cause a dataset to be classified as cancerous or normal – we call these causative genes. We found that not one statistical method tested could identify all the causative genes for all of the simulated cancers, even though increasing the sample size does improve the variable selection capabilities in most cases. Furthermore, certain statistical tools can classify our simulated data with a low error rate, yet the variables being used for classification are not necessarily the causative genes.
Acknowledgments
The authors are grateful to the reviewers, whose comments led to substantial improvements in the final manuscript. The authors also thank Welling Howell of Wheatstone Analytics for his extensive work on R penalized-SVM and Random Forest and for his critical review of the manuscript, and owe special thanks to Charlene Wang of Health First Incorporated for help with SAS computation. We also appreciate the comments and help from the following colleagues and friends: Chaur-Chin Chen of National Tsing-Hua University, Leonardo Auslender of Cisco Systems, Inc., and Sudhir Nayak of The College of New Jersey. Finally, we would like to thank the following students at The College of New Jersey for cross-checking the computer experiments in this paper: Edward Lee, Roger Shan, Alana Huszar, Sahnaz Saleem, Cassidy Wilson, and Joseph Ruffo.
Appendix
The histograms and scatterplots of simulated genes X1, X2, X3 are presented below:
The means and standard deviations of these variables are slightly different:
Variable | N | Mean | Std dev |
---|---|---|---|
X1 | 102 | 4.6881029 | 3.1055643 |
X2 | 102 | 8.3062417 | 3.7449357 |
X3 | 102 | 13.6261146 | 5.0801841 |
The 3 variables are generated by the following formulas in a do-loop:
X1, X2, X3 are 10*Uniform(0,1)
Z2=2X2
Z3=3X3
X2(new)=X1+0.35Z2
X3(new)=X2(new)+0.35Z3
Consequently, X2 and X3 have higher probability in the middle of their ranges. The choice of these distributions was partly motivated by the prostate cancer data (Efron, 2008). See, e.g., the histograms of Gene-1 to Gene-4 below:
After examining the 6033 histograms of the prostate cancer data, along with other disease datasets, we believe the above distributions are reasonable compromises in the simulation study to test statistical methods in gene search.
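The do-loop formulas for X1, X2, X3 given earlier in this appendix can be sketched in Python as follows. This is only an illustrative sketch, not the code used for the paper: it assumes the three 10*Uniform(0,1) draws are mutually independent, and the function names, the use of NumPy, and the seed are hypothetical choices made here. The second function works out the means and standard deviations implied by the formulas under that independence assumption.

```python
import numpy as np


def simulate_genes(n=102, seed=0):
    """Draw n samples of (X1, X2, X3) following the appendix formulas.

    Assumption (not stated in the text): the three uniform draws are
    independent of each other. The seed is arbitrary.
    """
    rng = np.random.default_rng(seed)
    # X1, X2, X3 are each 10*Uniform(0,1); one row per variable
    u1, u2, u3 = 10.0 * rng.uniform(0.0, 1.0, size=(3, n))
    z2 = 2.0 * u2            # Z2 = 2*X2
    z3 = 3.0 * u3            # Z3 = 3*X3
    x1 = u1
    x2 = x1 + 0.35 * z2      # X2(new) = X1 + 0.35*Z2
    x3 = x2 + 0.35 * z3      # X3(new) = X2(new) + 0.35*Z3
    return x1, x2, x3


def theoretical_moments():
    """Means and standard deviations implied by the formulas,
    assuming independent 10*Uniform(0,1) inputs."""
    mu = 5.0                 # mean of 10*Uniform(0,1)
    var = 100.0 / 12.0       # variance of 10*Uniform(0,1)
    c2 = 0.35 * 2.0          # total weight on the second uniform draw
    c3 = 0.35 * 3.0          # total weight on the third uniform draw
    means = (mu, mu + c2 * mu, mu + c2 * mu + c3 * mu)
    variances = (var, var + c2**2 * var, var + (c2**2 + c3**2) * var)
    sds = tuple(v ** 0.5 for v in variances)
    return means, sds


if __name__ == "__main__":
    x1, x2, x3 = simulate_genes()
    for name, x in (("X1", x1), ("X2", x2), ("X3", x3)):
        print(name, len(x), x.mean(), x.std(ddof=1))
    print(theoretical_moments())
```

Under the independence assumption, the implied means are 5, 8.5, and 13.75, and the implied standard deviations are roughly 2.89, 3.52, and 4.65. These can be compared with the sample values in the table above, which come from a single draw of N = 102. Because X2 and X3 are sums of independent uniform variables, their densities peak in the middle of their ranges.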
References
Alon, U., N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack and A. J. Levine (1999): “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proc. Nat. Acad. Sci., 96, 6745–6750. doi:10.1073/pnas.96.12.6745
Anonymous (2006): “Making the most of microarrays,” Nat. Biotechnol., 24, 1039. doi:10.1038/nbt0906-1039
Anonymous (2010): “MAQC-II: Analyze that!,” Nat. Biotechnol., 28, 761. doi:10.1038/nbt0810-761b
Anonymous (2014): “A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing quality control consortium,” Nat. Biotechnol., 32, 903–914. doi:10.1038/nbt.2957
Assimes, T. L., J. W. Knowles, A. Basu, C. Iribarren, A. Southwick, H. Tang, D. Absher, J. Li, J. M. Fair, G. D. Rubin, S. Sidney, S. P. Fortmann, A. S. Go, M. A. Hlatky, R. M. Myers, N. Risch and T. Quertermous (2008): “Susceptibility locus for clinical and subclinical coronary artery disease at chromosome 9p21 in the multi-ethnic advance study,” Hum. Mol. Genet., 17, 2320–2328. doi:10.1093/hmg/ddn132
Bar, H., J. Booth, E. Schifano and M. T. Wells (2009): “Laplace approximated EM microarray analysis: an empirical bayes approach for comparative microarray experiments,” Statist. Sci., 25, 388–407. doi:10.1214/10-STS339
Becker, N., W. Werft, G. Toedt, P. Lichter and A. Benner (2009): “PenalizedSVM: a R-package for feature selection SVM classification,” Bioinformatics, 25, 1711–1712. doi:10.1093/bioinformatics/btp286
Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. R. Stat. Soc. Series B Stat. Methodol., 57, 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x
Bootkrajang, J. and A. Kabán (2013): “Classification of mislabelled microarrays using robust sparse logistic regression,” Bioinformatics, 29, 870–877. doi:10.1093/bioinformatics/btt078
Cordell, H. J. (2009): “Detecting gene-gene interactions that underlie human diseases,” Nat. Rev. Genet., 10, 392–404. doi:10.1038/nrg2579
Dean, N. and A. E. Raftery (2010): “Latent class analysis variable selection,” Ann. Inst. Stat. Math., 62, 11–35. doi:10.1007/s10463-009-0258-9
Do, K. A., P. Müller and F. Tang (2005): “A Bayesian mixture model for differential gene expression,” J. R. Stat. Soc. Ser. C Appl. Stat., 54, 627–644. doi:10.1111/j.1467-9876.2005.05593.x
Dudoit, S., J. P. Shaffer and J. C. Boldrick (2003): “Multiple hypothesis testing in microarray experiments,” Statist. Sci., 18, 71–103. doi:10.1214/ss/1056397487
Efron, B. (2008): “Microarrays, empirical Bayes and the two-groups model,” Statist. Sci., 23, 1–22.
Efron, B. (2010): “The future of indirect evidence,” Statist. Sci., 25, 145–157. doi:10.1214/09-STS308
Efron, B. and N. Zhang (2011): “False discovery rates and copy number variation,” Biometrika, 98, 251–271. doi:10.1093/biomet/asr018
Efron, B., T. Hastie, I. Johnstone and R. Tibshirani (2004): “Least angle regression,” Ann. Stat., 32, 407–499. doi:10.1214/009053604000000067
Fan, J. and R. Li (2001): “Variable selection via nonconcave penalized likelihood and its oracle properties,” J. Am. Stat. Assoc., 96, 1348–1360. doi:10.1198/016214501753382273
Ferreira, J. A. and A. H. Zwinderman (2006): “On the Benjamini-Hochberg method,” Ann. Statist., 34, 1827–1849. doi:10.1214/009053606000000425
Freund, Y. (1995): “Boosting a weak learning algorithm by majority,” Inf. Comput., 121, 256–285. doi:10.1016/B978-1-55860-146-8.50019-9
Freund, Y. and R. E. Schapire (1996): “Experiments with a new boosting algorithm,” Machine Learning: Proc. 13th International Conference, 148–156.
Friedman, J. (2001): “Greedy function approximation: a gradient boosting machine,” Ann. Statist., 29, 1189–1232. doi:10.1214/aos/1013203451
Friedman, J. (2006): “Recent advances in predictive (machine) learning,” J. Classif., 23, 175–197. doi:10.2172/826695
Friedman, J., T. Hastie and R. Tibshirani (2000): “Additive logistic regression: a statistical view of boosting (with discussion),” Ann. Statist., 28, 337–407. doi:10.1214/aos/1016218223
Funke, B., A. K. Malhotra, C. T. Finn, A. M. Plocik, S. L. Lake, T. Lencz, P. DeRosse, J. M. Kane and R. Kucherlapati (2005): “COMT genetic variation confers risk for psychotic and affective disorders: a case control study,” Behav. Brain Funct., 1, 19. doi:10.1186/1744-9081-1-19
Guyon, I. and A. Elisseeff (2003): “An introduction to variable and feature selection,” J. Mach. Learn. Res., 3, 1157–1182.
Guyon, I., J. Weston, S. Barnhill and V. Vapnik (2002): “Gene selection for cancer classification using support vector machines,” Mach. Learn., 46, 389–422. doi:10.1023/A:1012487302797
Hand, D. J. (2006): “Classifier technology and the illusion of progress,” Statist. Sci., 21, 1–14. doi:10.1214/088342306000000060
Hand, D. J. (2008): “Breast cancer diagnosis from proteomic mass spectrometry data: a comparative evaluation,” Stat. Appl. Genet. Mol. Biol., 7, 15. doi:10.2202/1544-6115.1435
Hand, D. J. (2012): “Assessing the Performance of Classification Methods,” Int. Stat. Rev., 80, 400–414. doi:10.1111/j.1751-5823.2012.00183.x
Hastie, T., J. Friedman and R. Tibshirani (2009): “The Elements of Statistical Learning,” Springer-Verlag, New York, USA. doi:10.1007/978-0-387-84858-7
Hazai, E., I. Hazai, I. Ragueneau-Majlessi, S. P. Chung, Z. Bikadi and Q. C. Mao (2013): “Predicting substrates of the human breast cancer resistance protein using a support vector machine method,” BMC Bioinformatics, 14, 130. doi:10.1186/1471-2105-14-130
Hu, Q., W. Pan, S. An, P. Ma and J. Wei (2010): “An efficient gene selection technique for cancer recognition based on neighborhood mutual information,” Int. J. Mach. Learn. Cyber., 1, 63–74. doi:10.1007/s13042-010-0008-6
Huang, J., P. Breheny and S. Ma (2012): “A selective review of group selection in high dimensional models,” Statist. Sci., 27, 481–499. doi:10.1214/12-STS392
ICGC-TCGA DREAM Genomic Mutation Calling Challenge (https://www.synapse.org/#!Synapse:syn312572/wiki/), accessed 4/22/16.
Jamain, A. and D. J. Hand (2008): “Mining Supervised Classification Performance Studies: A Meta-Analytic Investigation,” J. Classif., 25, 87–112. doi:10.1007/s00357-008-9003-y
Jeanmougin, M., A. de Reynies, L. Marisa, C. Paccard, G. Nuel and M. Guedj (2010): “Should we abandon the t-test in the analysis of gene expression microarray data: a comparison of variance modeling strategies,” PLoS One, 5, e12336. doi:10.1371/journal.pone.0012336
Lee, Y. J., C. C. Chang and C. H. Chao (2008): “Incremental forward feature selection with application to microarray gene expression data,” J. Biopharm. Stat., 18, 827–840. doi:10.1080/10543400802277868
Leek, J. T. and J. D. Storey (2011): “The joint null criterion for multiple hypothesis tests,” Stat. Appl. Genet. Mol. Biol., 10, 28. doi:10.2202/1544-6115.1673
Lettre, G., C. D. Palmer, T. Young, K. G. Ejebe, H. Allayee, E. J. Benjamin, F. Bennett, D. W. Bowden, A. Chakravarti, A. Dreisbach, D. N. Farlow, A. R. Folsom, M. Fornage, T. Forrester, E. Fox, C. A. Haiman, J. Hartiala, T. B. Harris, S. L. Hazen, S. R. Heckbert, B. E. Henderson, J. N. Hirschhorn, B. J. Keating, S. B. Kritchevsky, E. Larkin, M. Li, M. E. Rudock, C. A. McKenzie, J. B. Meigs, Y. A. Meng, T. H. Mosley, A. B. Newman, C. H. Newton-Cheh, D. N. Paltoo, G. J. Papanicolaou, N. Patterson, W. S. Post, B. M. Psaty, A. N. Qasim, L. Qu, D. J. Rader, S. Redline, M. P. Reilly, A. P. Reiner, S. S. Rich, J. I. Rotter, Y. Liu, P. Shrader, D. S. Siscovick, W. H. Tang, H. A. Taylor, R. P. Tracy, R. S. Vasan, K. M. Waters, R. Wilks, J. G. Wilson, R. R. Fabsitz, S. B. Gabriel, S. Kathiresan and E. Boerwinkle (2011): “Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: the NHLBI CARe Project,” PLoS Genet., 7, e1001300. doi:10.1371/journal.pgen.1001300
Li, C. and M. Li (2008): “GWAsimulator: a rapid whole-genome simulation program,” Bioinformatics, 24, 140–142. doi:10.1093/bioinformatics/btm549
Ma, S., X. Song and J. Huang (2007): “Supervised group Lasso with applications to microarray data analysis,” BMC Bioinformatics, 8, 60. doi:10.1186/1471-2105-8-60
MAQC Consortium (2010): “The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models,” Nat. Biotechnol., 28, 827–838. doi:10.1038/nbt.1665
Michailidis, G. (2012): “Statistical challenges in biological networks,” J. Comput. Graph. Stat., 21, 840–855. doi:10.1080/10618600.2012.738614
Mongan, M. A., R. T. Dunn, S. Vonderfecht, N. Everds, G. Chen, S. Cheng, M. Higgins-Garn, Y. Chen, C. A. Afshari, T. L. Williamson, L. Carlock, C. DiPalma, S. Moss and H. K. Hamadeh (2010): “A novel statistical algorithm for gene expression analysis helps differentiate pregnane X receptor-dependent and independent mechanisms of toxicity,” PLoS One, 5, e15595. doi:10.1371/journal.pone.0015595
Monti, S., P. Tamayo, J. Mesirov and T. Golub (2003): “Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data,” Kluwer Academic Publishers, The Netherlands.
Park, M. Y. and T. Hastie (2008): “Penalized logistic regression for detecting gene interactions,” Biostatistics, 9, 30–50. doi:10.1093/biostatistics/kxm010
Pool, J. E., I. Hellmann, J. D. Jensen and R. Nielsen (2010): “Population genetic inference from genomic sequence variation,” Genome Res., 20, 291–300. doi:10.1101/gr.079509.108
Ripke, S., B. M. Neale, A. Corvin, J. T. Walters, K. H. Farh, P. A. Holmans, P. Lee, B. Bulik-Sullivan, D. A. Collier, H. Huang, T. H. Pers, I. Agartz, E. Agerbo, M. Albus, M. Alexander, F. Amin, S. A. Bacanu, M. Begemann, R. A. Belliveau Jr, J. Bene, S. E. Bergen, E. Bevilacqua, T. B. Bigdeli, D. W. Black, R. Bruggeman, N. G. Buccola, R. L. Buckner, W. Byerley, W. Cahn, G. Cai, D. Campion, R. M. Cantor, V. J. Carr, N. Carrera, S. V. Catts, K. D. Chambert, R. C. Chan, R. Y. Chen, E. Y. Chen, W. Cheng, E. F. Cheung, S. A. Chong, C. R. Cloninger, D. Cohen, N. Cohen, P. Cormican, N. Craddock, J. J. Crowley, D. Curtis, M. Davidson, K. L. Davis, F. Degenhardt, J. Del Favero, D. Demontis, D. Dikeos, T. Dinan, S. Djurovic, G. Donohoe, E. Drapeau, J. Duan, F. Dudbridge, N. Durmishi, P. Eichhammer, J. Eriksson, V. Escott-Price, L. Essioux, A. H. Fanous, M. S. Farrell, J. Frank, L. Franke, R. Freedman, N. B. Freimer, M. Friedl, J. I. Friedman, M. Fromer, G. Genovese, L. Georgieva, I. Giegling, P. Giusti-Rodríguez, S. Godard, J. I. Goldstein, V. Golimbet, S. Gopal, J. Gratten, L. de Haan, C. Hammer, M. L. Hamshere, M. Hansen, T. Hansen, V. Haroutunian, A. M. Hartmann, F. A. Henskens, S. Herms, J. N. Hirschhorn, P. Hoffmann, A. Hofman, M. V. Hollegaard, D. M. Hougaard, M. Ikeda, I. Joa, A. Julià, R. S. Kahn, L. Kalaydjieva, S. Karachanak-Yankova, J. Karjalainen, D. Kavanagh, M. C. Keller, J. L. Kennedy, A. Khrunin, Y. Kim, J. Klovins, J. A. Knowles, B. Konte, V. Kucinskas, Z. Ausrele Kucinskiene, H. Kuzelova-Ptackova, A. K. Kähler, C. Laurent, J. L. Keong, S. H. Lee, S. E. Legge, B. Lerer, M. Li, T. Li, K. Y. Liang, J. Lieberman, S. Limborska, C. M. Loughland, J. Lubinski, J. Lönnqvist, M. Macek Jr, P. K. Magnusson, B. S. Maher, W. Maier, J. Mallet, S. Marsal, M. Mattheisen, M. Mattingsdal, R. W. McCarley, C. McDonald, A. M. McIntosh, S. Meier, C. J. Meijer, B. Melegh, I. Melle, R. I. Mesholam-Gately, A. Metspalu, P. T. Michie, L. Milani, V. Milanova, Y. Mokrab, D. W. 
Morris, O. Mors, K. C. Murphy, R. M. Murray, I. Myin-Germeys, B. Müller-Myhsok, M. Nelis, I. Nenadic, D. A. Nertney, G. Nestadt, K. K. Nicodemus, L. Nikitina-Zake, L. Nisenbaum, A. Nordin, E. O’Callaghan, C. O’Dushlaine, F. A. O’Neill, S. Y. Oh, A. Olincy, L. Olsen, J. Van Os, C. Pantelis, G. N. Papadimitriou, S. Papiol, E. Parkhomenko, M. T. Pato, T. Paunio, M. Pejovic-Milovancevic, D. O. Perkins, O. Pietiläinen, J. Pimm, A. J. Pocklington, J. Powell, A. Price, A. E. Pulver, S. M. Purcell, D. Quested, H. B. Rasmussen, A. Reichenberg, M. A. Reimers, A. L. Richards, J. L. Roffman, P. Roussos, D. M. Ruderfer, V. Salomaa, A. R. Sanders, U. Schall, C. R. Schubert, T. G. Schulze, S. G. Schwab, E. M. Scolnick, R. J. Scott, L. J. Seidman, J. Shi, E. Sigurdsson, T. Silagadze, J. M. Silverman, K. Sim, P. Slominsky, J. W. Smoller, H. C. So, C. A. Spencer, E. A. Stahl, H. Stefansson, S. Steinberg, E. Stogmann, R. E. Straub, E. Strengman, J. Strohmaier, T. S. Stroup, M. Subramaniam, J. Suvisaari, D. M. Svrakic, J. P. Szatkiewicz, E. Söderman, S. Thirumalai, D. Toncheva, S. Tosato, J. Veijola, J. Waddington, D. Walsh, D. Wang, Q. Wang, B. T. Webb, M. Weiser, D. B. Wildenauer, N. M. Williams, S. Williams, S. H. Witt, A. R. Wolen, E. H. Wong, B. K. Wormley, H. S. Xi, C. C. Zai, X. Zheng, F. Zimprich, N. R. Wray, K. Stefansson, P. M. Visscher, R. Adolfsson, O. A. Andreassen, D. H. Blackwood, E. Bramon, J. D. Buxbaum, A. D. Børglum, S. Cichon, A. Darvasi, E. Domenici, H. Ehrenreich, T. Esko, P. V. Gejman, M. Gill, H. Gurling, C. M. Hultman, N. Iwata, A. V. Jablensky, E. G. Jönsson, K. S. Kendler, G. Kirov, J. Knight, T. Lencz, D. F. Levinson, Q. S. Li, J. Liu, A. K. Malhotra, S. A. McCarroll, A. McQuillin, J. L. Moran, P. B. Mortensen, B. J. Mowry, M. M. Nöthen, R. A. Ophoff, M. J. Owen, A. Palotie, C. N. Pato, T. L. Petryshen, D. Posthuma, M. Rietschel, B. P. Riley, D. Rujescu, P. C. Sham, P. Sklar, D. St Clair, D. R. Weinberger, J. R. Wendland, T. Werge, M. J. Daly, P. F. 
Sullivan and M. C. O’Donovan (2014): “Biological insights from 108 schizophrenia-associated genetic loci,” Nature, 511, 421–427. doi:10.1038/nature13595
Schapire, R. E. (1990): “The Strength of Weak Learnability,” Mach. Learn., 5, 197–227. doi:10.1109/SFCS.1989.63451
Sierra, A. and A. Echeverria (2003): “Skipping Fisher’s criterion,” Pattern Recognition and Image Analysis, Vol. 2652 of series Lecture Notes in Computer Science, 962–969.
Singh, D., P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D’Amico, J. P. Richie, E. S. Landers, M. Loda, P. W. Kantoff, T. R. Golub and W. R. Sellers (2002): “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, 1, 203–209. doi:10.1016/S1535-6108(02)00030-2
Stigler, S. M. (2010): “The changing history of robustness,” Am. Stat., 64, 277–281. doi:10.1198/tast.2010.10159
Stokes, M. E. and S. Visweswaran (2012): “Application of a spatially-weighted Relief algorithm for ranking genetic predictors of disease,” BioData Min., 5, 20. doi:10.1186/1756-0381-5-20
Storey, J. D. (2002): “A direct approach to false discovery rates,” J. R. Stat. Soc. Series B Stat. Methodol., 64, 479–498. doi:10.1111/1467-9868.00346
Storey, J. D., J. E. Taylor and D. Siegmund (2004): “Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: a unified approach,” J. R. Stat. Soc. Series B Stat. Methodol., 66, 187–205. doi:10.1111/j.1467-9868.2004.00439.x
Su, Y., T. M. Murali, V. Pavlovic, M. Schaffer and S. Kasif (2003): “RankGene: identification of diagnostic genes based on expression data,” Bioinformatics, 19, 1578–1579. doi:10.1093/bioinformatics/btg179
Thomas, R., L. de la Torre, X. Chang and S. Mehrotra (2010): “Validation and characterization of DNA microarray gene expression data distribution and associated moments,” BMC Bioinformatics, 11, 576. doi:10.1186/1471-2105-11-576
Tibshirani, R. (1996): “Regression shrinkage and selection via the lasso: a retrospective,” J. R. Stat. Soc. Series B Stat. Methodol., 73, 273–282. doi:10.1111/j.1467-9868.2011.00771.x
Van Steen, K. (2012): “Travelling the world of gene-gene interactions,” Brief. Bioinform., 13, 1–19. doi:10.1093/bib/bbr012
Wang, C. and B. Liu (2008): “Data mining and hotspot detection in an urban development project,” J. Data. Sci., 6, 389–414.
Wang, C. and M. Zhuravlev (2009): “An analysis of profit and customer satisfaction in consumer finance,” Case Studies Bus. Ind. Gov. Stat., 2, 147–156.
Wang, C., W. Howell and C. Wang (2015): “Gene search and the related risk estimates: a statistical analysis of prostate cancer data,” In: Practical predictive analytics and decision systems for medicine, Academic Press, London, 896–920. doi:10.1016/B978-0-12-411643-6.00041-7
Wang, X. S. and R. Simon (2011): “Microarray-based cancer prediction using single genes,” BMC Bioinformatics, 12, 391. doi:10.1186/1471-2105-12-391
Weston, J., A. Elisseeff, B. Schölkopf and M. Tipping (2003): “Use of the zero-norm with linear models and kernel methods,” J. Mach. Learn. Res., 3, 1439–1461.
Weston, J., S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio and V. Vapnik (2001): “Feature selection for SVMs,” Adv. Neural. Inf. Process. Syst., 13, 668–674.
Yang, Z. R. (2010): Machine learning approaches to bioinformatics (science, engineering, and biology informatics), vol. 4, World Scientific Publishing, New Jersey, USA. doi:10.1142/7454
Yuan, M. and Y. Lin (2007): “On the non-negative garrotte estimator,” J. R. Stat. Soc. Series B Stat. Methodol., 69, 143–161. doi:10.1111/j.1467-9868.2007.00581.x
Zhao, P. and B. Yu (2006): “On model selection consistency of Lasso,” J. Mach. Learn. Res., 7, 2541–2563.
Zou, H. (2006): “The Adaptive Lasso and Its Oracle Properties,” J. Am. Stat. Assoc., 101, 1418–1429. doi:10.1198/016214506000000735
Zuber, V. and K. Strimmer (2011): “High-dimensional regression and variable selection using CAR scores,” Stat. Appl. Genet. Mol. Biol., 10, 34. doi:10.2202/1544-6115.1730
©2016 by De Gruyter