Finding causative genes from high-dimensional data: an appraisal of statistical and machine learning approaches

Chamont Wang; Jana L. Gevertz

doi:10.1515/sagmb-2015-0072

Published by De Gruyter May 25, 2016

From the journal Statistical Applications in Genetics and Molecular Biology

Abstract

Modern biological experiments often involve high-dimensional data with thousands or more variables. A challenging problem is to identify the key variables that are related to a specific disease. Confounding this task is the vast number of statistical methods available for variable selection. For this reason, we set out to develop a framework to investigate the variable selection capability of statistical methods that are commonly applied to analyze high-dimensional biological datasets. Specifically, we designed six simulated cancers (based on benchmark colon and prostate cancer data) where we know precisely which genes cause a dataset to be classified as cancerous or normal – we call these causative genes. We found that not one statistical method tested could identify all the causative genes for all of the simulated cancers, even though increasing the sample size does improve the variable selection capabilities in most cases. Furthermore, certain statistical tools can classify our simulated data with a low error rate, yet the variables being used for classification are not necessarily the causative genes.

Keywords: classification; false discovery rate; gene identification; shrinkage and regularization techniques; variable selection

Acknowledgments:

The authors are grateful to the reviewers, whose comments led to substantial improvements in the final manuscript. The authors are also very appreciative of Welling Howell from Wheatstone Analytics for his extensive work on R penalized-SVM and Random Forest, and for his critical review of the manuscript. We owe special thanks to Charlene Wang of Health First Incorporated for help with the SAS computation. We also appreciate the comments and help from the following colleagues and friends: Chaur-Chin Chen of National Tsing-Hua University, Leonardo Auslender of Cisco Systems, Inc., and Sudhir Nayak of The College of New Jersey. Finally, we would like to thank the following students at The College of New Jersey for cross-checking the computer experiments in this paper: Edward Lee, Roger Shan, Alana Huszar, Sahnaz Saleem, Cassidy Wilson, and Joseph Ruffo.

Appendix

The histograms and scatterplots of simulated genes X1, X2, X3 are presented below:

The means and standard deviations of these variables are slightly different:

Variable	N	Mean	Std dev
X1	102	4.6881029	3.1055643
X2	102	8.3062417	3.7449357
X3	102	13.6261146	5.0801841

The 3 variables are generated by the following formulas in a do-loop:

X₁, X₂, X₃ are 10*Uniform(0,1)
Z₂=2X₂
Z₃=3X₃
X₂(new)=X₁+0.35Z₂
X₃(new)=X₂(new)+0.35Z₃
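
As a rough illustration, the formulas above can be sketched in Python. This is not the authors' code: the function name simulate_genes, the seed, and the assumption that the three uniform draws are independent are ours, and the default sample size of 102 is taken from the N column of the table above.

```python
import random
import statistics


def simulate_genes(n=102, seed=0):
    """Return three lists (x1, x2_new, x3_new), each of length n.

    Assumptions (not stated in the source): the three uniform draws are
    independent, and the seed is an arbitrary choice for reproducibility.
    """
    rng = random.Random(seed)
    x1, x2_new, x3_new = [], [], []
    for _ in range(n):  # one pass of the "do-loop" per simulated sample
        # X1, X2, X3 are each 10 * Uniform(0, 1)
        a = 10 * rng.random()
        b = 10 * rng.random()
        c = 10 * rng.random()
        z2 = 2 * b                 # Z2 = 2 * X2
        z3 = 3 * c                 # Z3 = 3 * X3
        b_new = a + 0.35 * z2      # X2(new) = X1 + 0.35 * Z2
        c_new = b_new + 0.35 * z3  # X3(new) = X2(new) + 0.35 * Z3
        x1.append(a)
        x2_new.append(b_new)
        x3_new.append(c_new)
    return x1, x2_new, x3_new


if __name__ == "__main__":
    # Print N, sample mean and sample standard deviation per variable.
    for name, values in zip(("X1", "X2", "X3"), simulate_genes()):
        print(name, len(values), statistics.mean(values), statistics.stdev(values))
```

Under the independence assumption, the theoretical means are 5, 8.5 and 13.75, which is roughly in line with the sample means reported in the table.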

Consequently, X₂ and X₃, being sums of uniform variables, have higher probability in the middle of their ranges. The choice of these distributions was partly motivated by the prostate cancer data (Efron, 2008); see, e.g., the histograms of Gene-1 to Gene-4 below:

After examining the 6033 histograms of the prostate cancer data, along with other disease datasets, we believe the above distributions are a reasonable compromise for a simulation study that tests statistical methods for gene search.

References

Alon, U., N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack and A. J. Levine (1999): “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proc. Nat. Acad. Sci., 96, 6745–6750. doi:10.1073/pnas.96.12.6745

Anonymous (2006): “Making the most of microarrays,” Nat. Biotechnol., 24, 1039. doi:10.1038/nbt0906-1039

Anonymous (2010): “MAQC-II: Analyze that!,” Nat. Biotechnol., 28, 761. doi:10.1038/nbt0810-761b

Anonymous (2014): “A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing quality control consortium,” Nat. Biotechnol., 32, 903–914. doi:10.1038/nbt.2957

Assimes, T. L., J. W. Knowles, A. Basu, C. Iribarren, A. Southwick, H. Tang, D. Absher, J. Li, J. M. Fair, G. D. Rubin, S. Sidney, S. P. Fortmann, A. S. Go, M. A. Hlatky, R. M. Myers, N. Risch and T. Quertermous (2008): “Susceptibility locus for clinical and subclinical coronary artery disease at chromosome 9p21 in the multi-ethnic advance study,” Hum. Mol. Genet., 17, 2320–2328. doi:10.1093/hmg/ddn132

Bar, H., J. Booth, E. Schifano and M. T. Wells (2009): “Laplace approximated EM microarray analysis: an empirical bayes approach for comparative microarray experiments,” Statist. Sci., 25, 388–407. doi:10.1214/10-STS339

Becker, N., W. Werft, G. Toedt, P. Lichter and A. Benner (2009): “PenalizedSVM: a R-package for feature selection SVM classification,” Bioinformatics, 25, 1711–1712. doi:10.1093/bioinformatics/btp286

Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. R. Stat. Soc. Series B Stat. Methodol., 57, 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x

Bootkrajang, J. and A. Kabán (2013): “Classification of mislabelled microarrays using robust sparse logistic regression,” Bioinformatics, 29, 870–877. doi:10.1093/bioinformatics/btt078

Cordell, H. J. (2009): “Detecting gene-gene interactions that underlie human diseases,” Nat. Rev. Genet., 10, 392–404. doi:10.1038/nrg2579

Dean, N. and A. E. Raftery (2010): “Latent class analysis variable selection,” Ann. Inst. Stat. Math., 62, 11–35. doi:10.1007/s10463-009-0258-9

Do, K. A., P. Müller and F. Tang (2005): “A Bayesian mixture model for differential gene expression,” J. R. Stat. Soc. Ser. C Appl. Stat., 54, 627–644. doi:10.1111/j.1467-9876.2005.05593.x

Dudoit, S., J. P. Shaffer and J. C. Boldrick (2003): “Multiple hypothesis testing in microarray experiments,” Statist. Sci., 18, 71–103. doi:10.1214/ss/1056397487

Efron, B. (2008): “Microarrays, empirical Bayes and the two-groups model,” Statist. Sci., 23, 1–22.

Efron, B. (2010): “The future of indirect evidence,” Statist. Sci., 25, 145–157. doi:10.1214/09-STS308

Efron, B. and N. Zhang (2011): “False discovery rates and copy number variation,” Biometrika, 98, 251–271. doi:10.1093/biomet/asr018

Efron, B., T. Hastie, I. Johnstone and R. Tibshirani (2004): “Least angle regression,” Ann. Stat., 32, 407–499. doi:10.1214/009053604000000067

Fan, J. and R. Li (2001): “Variable selection via nonconcave penalized likelihood and its oracle properties,” J. Am. Stat. Assoc., 96, 1348–1360. doi:10.1198/016214501753382273

Ferreira, J. A. and A. H. Zwinderman (2006): “On the Benjamini-Hochberg method,” Ann. Statist., 34, 1827–1849. doi:10.1214/009053606000000425

Freund, Y. (1995): “Boosting a weak learning algorithm by majority,” Inf. Comput., 121, 256–285. doi:10.1016/B978-1-55860-146-8.50019-9

Freund, Y. and R. E. Schapire (1996): “Experiments with a new boosting algorithm,” Machine Learning: Proc. 13th International Conference, 148–156.

Friedman, J. (2001): “Greedy function approximation: a gradient boosting machine,” Ann. Statist., 29, 1189–1232. doi:10.1214/aos/1013203451

Friedman, J. (2006): “Recent advances in predictive (machine) learning,” J. Classif., 23, 175–197. doi:10.2172/826695

Friedman, J., T. Hastie and R. Tibshirani (2000): “Additive logistic regression: a statistical view of boosting (with discussion),” Ann. Statist., 28, 337–407. doi:10.1214/aos/1016218223

Funke, B., A. K. Malhotra, C. T. Finn, A. M. Plocik, S. L. Lake, T. Lencz, P. DeRosse, J. M. Kane and R. Kucherlapati (2005): “COMT genetic variation confers risk for psychotic and affective disorders: a case control study,” Behav. Brain Funct., 1, 19. doi:10.1186/1744-9081-1-19

Guyon, I. and A. Elisseeff (2003): “An introduction to variable and feature selection,” J. Mach. Learn. Res., 3, 1157–1182.

Guyon, I., J. Weston, S. Barnhill and V. Vapnik (2002): “Gene selection for cancer classification using support vector machines,” Mach. Learn., 46, 389–422. doi:10.1023/A:1012487302797

Hand, D. J. (2006): “Classifier technology and the illusion of progress,” Statist. Sci., 21, 1–14. doi:10.1214/088342306000000060

Hand, D. J. (2008): “Breast cancer diagnosis from proteomic mass spectrometry data: a comparative evaluation,” Stat. Appl. Genet. Mol. Biol., 7, 15. doi:10.2202/1544-6115.1435

Hand, D. J. (2012): “Assessing the Performance of Classification Methods,” Int. Stat. Rev., 80, 400–414. doi:10.1111/j.1751-5823.2012.00183.x

Hastie, T., J. Friedman and R. Tibshirani (2009): “The Elements of Statistical Learning,” Springer-Verlag, New York, USA. doi:10.1007/978-0-387-84858-7

Hazai, E., I. Hazai, I. Ragueneau-Majlessi, S. P. Chung, Z. Bikadi and Q. C. Mao (2013): “Predicting substrates of the human breast cancer resistance protein using a support vector machine method,” BMC Bioinformatics, 14, 130. doi:10.1186/1471-2105-14-130

Hu, Q., W. Pan, S. An, P. Ma and J. Wei (2010): “An efficient gene selection technique for cancer recognition based on neighborhood mutual information,” Int. J. Mach. Learn. Cyber., 1, 63–74. doi:10.1007/s13042-010-0008-6

Huang, J., P. Breheny and S. Ma (2012): “A selective review of group selection in high dimensional models,” Statist. Sci., 27, 481–499. doi:10.1214/12-STS392

ICGC-TCGA DREAM Genomic Mutation Calling Challenge (https://www.synapse.org/#!Synapse:syn312572/wiki/), accessed 4/22/16.

Jamain, A. and D. J. Hand (2008): “Mining Supervised Classification Performance Studies: A Meta-Analytic Investigation,” J. Classif., 25, 87–112. doi:10.1007/s00357-008-9003-y

Jeanmougin, M., A. de Reynies, L. Marisa, C. Paccard, G. Nuel and M. Guedj (2010): “Should we abandon the t-test in the analysis of gene expression microarray data: a comparison of variance modeling strategies,” PLoS One, 5, e12336. doi:10.1371/journal.pone.0012336

Lee, Y. J., C. C. Chang and C. H. Chao (2008): “Incremental forward feature selection with application to microarray gene expression data,” J. Biopharm. Stat., 18, 827–840. doi:10.1080/10543400802277868

Leek, J. T. and J. D. Storey (2011): “The joint null criterion for multiple hypothesis tests,” Stat. Appl. Genet. Mol. Biol., 10, 28. doi:10.2202/1544-6115.1673

Lettre, G., C. D. Palmer, T. Young, K. G. Ejebe, H. Allayee, E. J. Benjamin, F. Bennett, D. W. Bowden, A. Chakravarti, A. Dreisbach, D. N. Farlow, A. R. Folsom, M. Fornage, T. Forrester, E. Fox, C. A. Haiman, J. Hartiala, T. B. Harris, S. L. Hazen, S. R. Heckbert, B. E. Henderson, J. N. Hirschhorn, B. J. Keating, S. B. Kritchevsky, E. Larkin, M. Li, M. E. Rudock, C. A. McKenzie, J. B. Meigs, Y. A. Meng, T. H. Mosley, A. B. Newman, C. H. Newton-Cheh, D. N. Paltoo, G. J. Papanicolaou, N. Patterson, W. S. Post, B. M. Psaty, A. N. Qasim, L. Qu, D. J. Rader, S. Redline, M. P. Reilly, A. P. Reiner, S. S. Rich, J. I. Rotter, Y. Liu, P. Shrader, D. S. Siscovick, W. H. Tang, H. A. Taylor, R. P. Tracy, R. S. Vasan, K. M. Waters, R. Wilks, J. G. Wilson, R. R. Fabsitz, S. B. Gabriel, S. Kathiresan and E. Boerwinkle (2011): “Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: the NHLBI CARe Project,” PLoS Genet., 7, e1001300. doi:10.1371/journal.pgen.1001300

Li, C. and M. Li (2008): “GWAsimulator: a rapid whole-genome simulation program,” Bioinformatics, 24, 140–142. doi:10.1093/bioinformatics/btm549

Ma, S., X. Song and J. Huang (2007): “Supervised group Lasso with applications to microarray data analysis,” BMC Bioinformatics, 8, 60. doi:10.1186/1471-2105-8-60

MAQC Consortium (2010): “The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models,” Nat. Biotechnol., 28, 827–838. doi:10.1038/nbt.1665

Michailidis, G. (2012): “Statistical challenges in biological networks,” J. Comput. Graph. Stat., 21, 840–855. doi:10.1080/10618600.2012.738614

Mongan, M. A., R. T. Dunn, S. Vonderfecht, N. Everds, G. Chen, S. Cheng, M. Higgins-Garn, Y. Chen, C. A. Afshari, T. L. Williamson, L. Carlock, C. DiPalma, S. Moss and H. K. Hamadeh (2010): “A novel statistical algorithm for gene expression analysis helps differentiate pregnane X receptor-dependent and independent mechanisms of toxicity,” PLoS One, 5, e15595. doi:10.1371/journal.pone.0015595

Monti, S., P. Tamayo, J. Mesirov and T. Golub (2003): “Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data,” Kluwer Academic Publishers, The Netherlands.

Park, M. Y. and T. Hastie (2008): “Penalized logistic regression for detecting gene interactions,” Biostatistics, 9, 30–50. doi:10.1093/biostatistics/kxm010

Pool, J. E., I. Hellmann, J. D. Jensen and R. Nielsen (2010): “Population genetic inference from genomic sequence variation,” Genome Res., 20, 291–300. doi:10.1101/gr.079509.108

Ripke, S., B. M. Neale, A. Corvin, J. T. Walters, K. H. Farh, P. A. Holmans, P. Lee, B. Bulik-Sullivan, D. A. Collier, H. Huang, T. H. Pers, I. Agartz, E. Agerbo, M. Albus, M. Alexander, F. Amin, S. A. Bacanu, M. Begemann, R. A. Belliveau Jr, J. Bene, S. E. Bergen, E. Bevilacqua, T. B. Bigdeli, D. W. Black, R. Bruggeman, N. G. Buccola, R. L. Buckner, W. Byerley, W. Cahn, G. Cai, D. Campion, R. M. Cantor, V. J. Carr, N. Carrera, S. V. Catts, K. D. Chambert, R. C. Chan, R. Y. Chen, E. Y. Chen, W. Cheng, E. F. Cheung, S. A. Chong, C. R. Cloninger, D. Cohen, N. Cohen, P. Cormican, N. Craddock, J. J. Crowley, D. Curtis, M. Davidson, K. L. Davis, F. Degenhardt, J. Del Favero, D. Demontis, D. Dikeos, T. Dinan, S. Djurovic, G. Donohoe, E. Drapeau, J. Duan, F. Dudbridge, N. Durmishi, P. Eichhammer, J. Eriksson, V. Escott-Price, L. Essioux, A. H. Fanous, M. S. Farrell, J. Frank, L. Franke, R. Freedman, N. B. Freimer, M. Friedl, J. I. Friedman, M. Fromer, G. Genovese, L. Georgieva, I. Giegling, P. Giusti-Rodríguez, S. Godard, J. I. Goldstein, V. Golimbet, S. Gopal, J. Gratten, L. de Haan, C. Hammer, M. L. Hamshere, M. Hansen, T. Hansen, V. Haroutunian, A. M. Hartmann, F. A. Henskens, S. Herms, J. N. Hirschhorn, P. Hoffmann, A. Hofman, M. V. Hollegaard, D. M. Hougaard, M. Ikeda, I. Joa, A. Julià, R. S. Kahn, L. Kalaydjieva, S. Karachanak-Yankova, J. Karjalainen, D. Kavanagh, M. C. Keller, J. L. Kennedy, A. Khrunin, Y. Kim, J. Klovins, J. A. Knowles, B. Konte, V. Kucinskas, Z. Ausrele Kucinskiene, H. Kuzelova-Ptackova, A. K. Kähler, C. Laurent, J. L. Keong, S. H. Lee, S. E. Legge, B. Lerer, M. Li, T. Li, K. Y. Liang, J. Lieberman, S. Limborska, C. M. Loughland, J. Lubinski, J. Lönnqvist, M. Macek Jr, P. K. Magnusson, B. S. Maher, W. Maier, J. Mallet, S. Marsal, M. Mattheisen, M. Mattingsdal, R. W. McCarley, C. McDonald, A. M. McIntosh, S. Meier, C. J. Meijer, B. Melegh, I. Melle, R. I. Mesholam-Gately, A. Metspalu, P. T. Michie, L. Milani, V. Milanova, Y. Mokrab, D. W. 
Morris, O. Mors, K. C. Murphy, R. M. Murray, I. Myin-Germeys, B. Müller-Myhsok, M. Nelis, I. Nenadic, D. A. Nertney, G. Nestadt, K. K. Nicodemus, L. Nikitina-Zake, L. Nisenbaum, A. Nordin, E. O’Callaghan, C. O’Dushlaine, F. A. O’Neill, S. Y. Oh, A. Olincy, L. Olsen, J. Van Os, C. Pantelis, G. N. Papadimitriou, S. Papiol, E. Parkhomenko, M. T. Pato, T. Paunio, M. Pejovic-Milovancevic, D. O. Perkins, O. Pietiläinen, J. Pimm, A. J. Pocklington, J. Powell, A. Price, A. E. Pulver, S. M. Purcell, D. Quested, H. B. Rasmussen, A. Reichenberg, M. A. Reimers, A. L. Richards, J. L. Roffman, P. Roussos, D. M. Ruderfer, V. Salomaa, A. R. Sanders, U. Schall, C. R. Schubert, T. G. Schulze, S. G. Schwab, E. M. Scolnick, R. J. Scott, L. J. Seidman, J. Shi, E. Sigurdsson, T. Silagadze, J. M. Silverman, K. Sim, P. Slominsky, J. W. Smoller, H. C. So, C. A. Spencer, E. A. Stahl, H. Stefansson, S. Steinberg, E. Stogmann, R. E. Straub, E. Strengman, J. Strohmaier, T. S. Stroup, M. Subramaniam, J. Suvisaari, D. M. Svrakic, J. P. Szatkiewicz, E. Söderman, S. Thirumalai, D. Toncheva, S. Tosato, J. Veijola, J. Waddington, D. Walsh, D. Wang, Q. Wang, B. T. Webb, M. Weiser, D. B. Wildenauer, N. M. Williams, S. Williams, S. H. Witt, A. R. Wolen, E. H. Wong, B. K. Wormley, H. S. Xi, C. C. Zai, X. Zheng, F. Zimprich, N. R. Wray, K. Stefansson, P. M. Visscher, R. Adolfsson, O. A. Andreassen, D. H. Blackwood, E. Bramon, J. D. Buxbaum, A. D. Børglum, S. Cichon, A. Darvasi, E. Domenici, H. Ehrenreich, T. Esko, P. V. Gejman, M. Gill, H. Gurling, C. M. Hultman, N. Iwata, A. V. Jablensky, E. G. Jönsson, K. S. Kendler, G. Kirov, J. Knight, T. Lencz, D. F. Levinson, Q. S. Li, J. Liu, A. K. Malhotra, S. A. McCarroll, A. McQuillin, J. L. Moran, P. B. Mortensen, B. J. Mowry, M. M. Nöthen, R. A. Ophoff, M. J. Owen, A. Palotie, C. N. Pato, T. L. Petryshen, D. Posthuma, M. Rietschel, B. P. Riley, D. Rujescu, P. C. Sham, P. Sklar, D. St Clair, D. R. Weinberger, J. R. Wendland, T. Werge, M. J. Daly, P. F. 
Sullivan and M. C. O’Donovan (2014): “Biological insights from 108 schizophrenia-associated genetic loci,” Nature, 511, 421–427. doi:10.1038/nature13595

Schapire, R. E. (1990): “The Strength of Weak Learnability,” Mach. Learn., 5, 197–227. doi:10.1109/SFCS.1989.63451

Sierra, A. and A. Echeverria (2003): “Skipping Fisher’s criterion,” Pattern Recognition and Image Analysis, Vol. 2652 of series Lecture Notes in Computer Science, 962–969.

Singh, D., P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D’Amico, J. P. Richie, E. S. Landers, M. Loda, P. W. Kantoff, T. R. Golub and W. R. Sellers (2002): “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, 1, 203–209. doi:10.1016/S1535-6108(02)00030-2

Stigler, S. M. (2010): “The changing history of robustness,” Am. Stat., 64, 277–281. doi:10.1198/tast.2010.10159

Stokes, M. E. and S. Visweswaran (2012): “Application of a spatially-weighted Relief algorithm for ranking genetic predictors of disease,” BioData Min., 5, 20. doi:10.1186/1756-0381-5-20

Storey, J. D. (2002): “A direct approach to false discovery rates,” J. R. Stat. Soc. Series B Stat. Methodol., 64, 479–498. doi:10.1111/1467-9868.00346

Storey, J. D., J. E. Taylor and D. Siegmund (2004): “Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: a unified approach,” J. R. Stat. Soc. Series B Stat. Methodol., 66, 187–205. doi:10.1111/j.1467-9868.2004.00439.x

Su, Y., T. M. Murali, V. Pavlovic, M. Schaffer and S. Kasif (2003): “RankGene: identification of diagnostic genes based on expression data,” Bioinformatics, 19, 1578–1579. doi:10.1093/bioinformatics/btg179

Thomas, R., L. de la Torre, X. Chang and S. Mehrotra (2010): “Validation and characterization of DNA microarray gene expression data distribution and associated moments,” BMC Bioinformatics, 11, 576. doi:10.1186/1471-2105-11-576

Tibshirani, R. (1996): “Regression shrinkage and selection via the lasso: a retrospective,” J. R. Stat. Soc. Series B Stat. Methodol., 73, 273–282. doi:10.1111/j.1467-9868.2011.00771.x

Van Steen, K. (2012): “Travelling the world of gene-gene interactions,” Brief. Bioinform., 13, 1–19. doi:10.1093/bib/bbr012

Wang, C. and B. Liu (2008): “Data mining and hotspot detection in an urban development project,” J. Data. Sci., 6, 389–414.

Wang, C. and M. Zhuravlev (2009): “An analysis of profit and customer satisfaction in consumer finance,” Case Studies Bus. Ind. Gov. Stat., 2, 147–156.

Wang, C., W. Howell and C. Wang (2015): “Gene search and the related risk estimates: a statistical analysis of prostate cancer data,” In: Practical predictive analytics and decision systems for medicine, Academic Press, London, 896–920. doi:10.1016/B978-0-12-411643-6.00041-7

Wang, X. S. and R. Simon (2011): “Microarray-based cancer prediction using single genes,” BMC Bioinformatics, 12, 391. doi:10.1186/1471-2105-12-391

Weston, J., A. Elisseeff, B. Schölkopf and M. Tipping (2003): “Use of the zero-norm with linear models and kernel methods,” J. Mach. Learn. Res., 3, 1439–1461.

Weston, J., S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio and V. Vapnik (2001): “Feature selection for SVMs,” Adv. Neural. Inf. Process. Syst., 13, 668–674.

Yang, Z. R. (2010): Machine learning approaches to bioinformatics (science, engineering, and biology informatics), vol. 4, World Scientific Publishing, New Jersey, USA. doi:10.1142/7454

Yuan, M. and Y. Lin (2007): “On the non-negative garrotte estimator,” J. R. Stat. Soc. Series B Stat. Methodol., 69, 143–161. doi:10.1111/j.1467-9868.2007.00581.x

Zhao, P. and B. Yu (2006): “On model selection consistency of Lasso,” J. Mach. Learn. Res., 7, 2541–2563.

Zou, H. (2006): “The Adaptive Lasso and Its Oracle Properties,” J. Am. Stat. Assoc., 101, 1418–1429. doi:10.1198/016214506000000735

Zuber, V. and K. Strimmer (2011): “High-dimensional regression and variable selection using CAR scores,” Stat. Appl. Genet. Mol. Biol., 10, 34. doi:10.2202/1544-6115.1730

Published Online: 2016-5-25

Published in Print: 2016-8-1
