Abstract
Metabolomics is the comprehensive study of small-molecule metabolites in biological systems. By assaying and analyzing thousands of metabolites in biological samples, it provides a comprehensive picture of the metabolic status and biochemical events within an organism, and it has become an increasingly powerful tool in disease research. Metabolomics studies commonly deal with large amounts of data generated by nuclear magnetic resonance (NMR) and/or mass spectrometry (MS). Moreover, depending on the goals and design of a study, it may be necessary to use a variety of data analysis methods, or a combination of them, to obtain accurate and comprehensive results. In this review, we provide an overview of the computational and statistical methods commonly applied to metabolomics data. The review is divided into five sections. The first two sections introduce the background and the databases and resources available for metabolomics research. The third section briefly describes the principles of the two main experimental methods that produce metabolomics data, MS and NMR, and the fourth section describes the preprocessing of data from these two approaches. In the fifth and most important section, we review four main types of analysis that can be performed on metabolomics data, with examples from metabolomics studies: unsupervised learning methods, supervised learning methods, pathway analysis methods, and analysis of time course metabolomics data. We conclude with a table summarizing the principles and tools discussed in this review.
Acknowledgments
We would like to express our great appreciation to Dr. Lilliam Ambroggio and Dr. Lindsey Romick-Rosendale for their valuable and constructive suggestions on our review. Their willingness to give their time so generously is very much appreciated. This study was funded by NIH Grant R01 HL116226 to RDS and LJL.
Ethics declarations
Conflict of interest
Sheng Ren, Anna A. Hinzman, Emily L. Kang, Rhonda D. Szczesniak and L. Jason Lu declare that they have no conflict of interest and have included separately signed conflict of interest forms with this manuscript.
Cite this article
Ren, S., Hinzman, A.A., Kang, E.L. et al. Computational and statistical analysis of metabolomics data. Metabolomics 11, 1492–1513 (2015). https://doi.org/10.1007/s11306-015-0823-6
DOI: https://doi.org/10.1007/s11306-015-0823-6