Abstract
Longitudinal genomics data and survival outcomes are common in biomedical studies, and the genomics data are often high-dimensional. It is of great interest to select informative longitudinal biomarkers (e.g., genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, for selecting informative biomarkers related to the survival outcome using longitudinal genomics data. LCox is powerful in detecting different forms of dependence between the longitudinal biomarkers and the survival outcome. Through extensive simulation studies, we show that LCox has improved performance over existing methods. In addition, by applying LCox to a dataset of patients with idiopathic pulmonary fibrosis, we identify biologically meaningful genes, whereas all other methods fail to make any discovery. An R package implementing LCox is freely available at https://CRAN.R-project.org/package=LCox.
Funding statement: Jiehuan Sun and Hongyu Zhao were supported in part by the National Institutes of Health (Funder Id: 10.13039/100000002), grants R01 GM59507 and P01 CA154295. Jose D. Herazo-Maya was supported by the Harold Amos Faculty development program of the Robert Wood Johnson Foundation and the Pulmonary Fibrosis Foundation. Naftali Kaminski was supported in part by the National Institutes of Health grants U01 HL112707, R01 HL127349, U01 HL108642, and UH3 HL123886. The research of Jane-Ling Wang was supported in part by the NSF grant DMS-15-12975.
Supplementary Material
The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2017-0060).
©2019 Walter de Gruyter GmbH, Berlin/Boston