Computational and statistical analysis of metabolomics data

Metabolomics

Abstract

Metabolomics is the comprehensive study of small-molecule metabolites in biological systems. By assaying and analyzing thousands of metabolites in biological samples, it provides a global picture of the metabolic status of an organism and the biochemical events occurring within it. It has become an increasingly powerful tool in disease research. Metabolomics studies commonly generate large amounts of data by nuclear magnetic resonance (NMR) spectroscopy and/or mass spectrometry (MS). Moreover, depending on the goals and design of a study, a variety of data analysis methods, or a combination of them, may be needed to obtain accurate and comprehensive results. In this review, we provide an overview of the computational and statistical methods commonly applied to metabolomics data. The review is divided into five sections. The first two sections introduce the background and the databases and resources available for metabolomics research. The third section briefly describes the principles of the two main experimental methods that produce metabolomics data (MS and NMR), and the fourth section describes the preprocessing of data from these two approaches. In the fifth and most important section, we review four main types of analysis that can be performed on metabolomics data, each illustrated with examples from metabolomics studies: unsupervised learning methods, supervised learning methods, pathway analysis methods, and analysis of time-course metabolomics data. We conclude with a table summarizing the principles and tools discussed in this review.
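Preprocessing of NMR and MS data typically includes a normalization step that removes differences in overall sample concentration (for example, urine dilution). As one hedged illustration, the sketch below implements probabilistic quotient normalization (PQN), a commonly used method for this purpose. It is a minimal sketch under stated assumptions, not the procedure described in this review: it assumes a samples-by-variables table of non-negative intensities (e.g. binned spectra) given as nested Python lists, applies total-area normalization first, and uses the per-variable median across all samples as the reference spectrum. The function names `total_area_normalize` and `pqn_normalize` are hypothetical.

```python
from statistics import median


def total_area_normalize(spectra, target=100.0):
    """Scale each sample (row) so its intensities sum to `target`."""
    scaled = []
    for row in spectra:
        total = sum(row)
        if total <= 0:
            raise ValueError("each sample needs a positive total intensity")
        scaled.append([target * x / total for x in row])
    return scaled


def pqn_normalize(spectra):
    """Probabilistic quotient normalization (illustrative sketch).

    spectra: samples x variables, non-negative intensities (e.g. binned spectra).
    Returns (normalized_spectra, dilution_factors).
    """
    # Assumption: total-area normalization is applied before the quotient step.
    scaled = total_area_normalize(spectra)
    n_vars = len(scaled[0])
    # Assumed reference spectrum: median intensity of each variable across samples.
    reference = [median(row[j] for row in scaled) for j in range(n_vars)]

    normalized, factors = [], []
    for row in scaled:
        # Quotient of each variable to the reference; variables whose reference
        # intensity is zero are skipped to avoid dividing by zero.
        quotients = [x / r for x, r in zip(row, reference) if r > 0]
        # The median quotient is the estimated dilution factor; using the median
        # keeps a few genuinely changed metabolites from dominating the estimate.
        factor = median(quotients)
        if factor <= 0:
            raise ValueError("could not estimate a positive dilution factor")
        factors.append(factor)
        normalized.append([x / factor for x in row])
    return normalized, factors


if __name__ == "__main__":
    # Made-up toy data: the third sample has one strongly elevated variable.
    spectra = [
        [10.0, 10.0, 10.0, 10.0],
        [10.0, 10.0, 10.0, 10.0],
        [10.0, 10.0, 10.0, 70.0],
    ]
    normalized, factors = pqn_normalize(spectra)
    print(factors)
    print(normalized[2])
```

With the toy data, the first two samples get a dilution factor of 1 and the third gets 0.4. Total-area normalization alone shrinks the third sample's three unchanged variables from 25 to 10 because of its single large peak. Dividing by the median quotient restores them to 25 while the elevated variable stays high.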

References

  • Abdi, H. (2010). Partial least squares regression and projection on latent structure regression (PLS regression). Wiley Interdisciplinary Reviews: Computational Statistics, 2, 97–106.

    Article  Google Scholar 

  • Agresti, A. (2014). Categorical data analysis. New York: Wiley.

    Google Scholar 

  • Anderson, P. E., Reo, N. V., DelRaso, N. J., Doom, T. E., & Raymer, M. L. (2008). Gaussian binning: A new kernel-based method for processing NMR spectroscopic data for metabolomics. Metabolomics, 4, 261–272.

    Article  CAS  Google Scholar 

  • Armitage, E. G., & Barbas, C. (2014). Metabolomics in cancer biomarker discovery: Current trends and future perspectives. Journal of Pharmaceutical and Biomedical Analysis, 87, 1–11.

    Article  CAS  PubMed  Google Scholar 

  • Assfalg, M., et al. (2008). Evidence of different metabolic phenotypes in humans. Proceedings of the National Academy of Sciences, 105, 1420–1424.

    Article  CAS  Google Scholar 

  • Becker, S. A., Feist, A. M., Mo, M. L., Hannum, G., Palsson, B. Ø., & Herrgard, M. J. (2007). Quantitative prediction of cellular metabolism with constraint-based models: The COBRA Toolbox. Nature Protocols, 2, 727–738.

    Article  CAS  PubMed  Google Scholar 

  • Beckonert, O., Monnerjahn, J., Bonk, U., & Leibfritz, D. (2003). Visualizing metabolic changes in breast-cancer tissue using 1H-NMR spectroscopy and self-organizing maps. NMR in Biomedicine, 16, 1–11.

    Article  CAS  PubMed  Google Scholar 

  • Berk, M., Ebbels, T., & Montana, G. (2011). A statistical framework for biomarker discovery in metabolomic time course data. Bioinformatics, 27, 1979–1985. doi:10.1093/bioinformatics/btr289.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Bezdek, J. C., Coray, C., Gunderson, R., & Watson, J. (1981). Detection and characterization of cluster substructure i. linear structure: Fuzzy c-lines. SIAM Journal on Applied Mathematics, 40, 339–357.

    Article  Google Scholar 

  • Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.

    Google Scholar 

  • Blekherman, G., et al. (2011). Bioinformatics tools for cancer metabolomics. Metabolomics, 7, 329–343. doi:10.1007/s11306-010-0270-3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Boulesteix, A.-L. (2004). PLS dimension reduction for classification with microarray data. Statistical Applications in Genetics and Molecular Biology, 3, 1–30.

    Article  Google Scholar 

  • Box, G. E., Hunter, W. G., & Hunter, J. S. (1978). Statistics for experimenters. New York: Wiley.

    Google Scholar 

  • Brereton, R. G., & Lloyd, G. R. (2010). Support vector machines for classification and regression. Analyst, 135, 230–267.

    Article  CAS  PubMed  Google Scholar 

  • Broadhurst, D. I., & Kell, D. B. (2006). Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics, 2, 171–196.

    Article  CAS  Google Scholar 

  • Brockwell, P. J., & Davis, R. A. (2002). Introduction to time series and forecasting (Vol. 1). Boca Raton: Taylor & Francis.

    Book  Google Scholar 

  • Bu, H.-L., Li, G.-Z., Zeng, X.-Q., Yang, J. Y., & Yang, M. Q. (2007). Feature selection and partial least squares based dimension reduction for tumor classification. In Proceedings of the 7th IEEE international conference on bioinformatics and bioengineering, 2007 (BIBE 2007) (pp. 967–973). New York: IEEE.

  • Bylesjö, M., Rantalainen, M., Cloarec, O., Nicholson, J. K., Holmes, E., & Trygg, J. (2006). OPLS discriminant analysis: Combining the strengths of PLS-DA and SIMCA classification. Journal of Chemometrics, 20, 341–351.

    Article  CAS  Google Scholar 

  • Cao, H., Dong, J., Cai, C., & Chen, Z. (2008). Investigations on the effects of NMR experimental conditions in human urine and serum metabolic profiles. In The 2nd international conference on bioinformatics and biomedical engineering, 2008 (ICBBE 2008) (pp. 2236–2239). New York: IEEE.

  • Chun, H., & Keleş, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72, 3–25.

    Article  Google Scholar 

  • Chung, D., & Keles, S. (2010). Sparse partial least squares classification for high dimensional data. Statistical Applications in Genetics and Molecular Biology. doi:10.2202/1544-6115.1492.

    PubMed  PubMed Central  Google Scholar 

  • Coombes, K. R., Tsavachidis, S., Morris, J. S., Baggerly, K. A., Hung, M. C., & Kuerer, H. M. (2005). Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics, 5, 4107–4117.

    Article  CAS  PubMed  Google Scholar 

  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.

    Google Scholar 

  • Craig, A., Cloarec, O., Holmes, E., Nicholson, J. K., & Lindon, J. C. (2006). Scaling and normalization effects in NMR spectroscopic metabonomic data sets. Analytical Chemistry, 78, 2262–2267.

    Article  CAS  PubMed  Google Scholar 

  • Cui, Q., et al. (2008). Metabolite identification via the Madison metabolomics consortium database. Nature Biotechnology, 26, 162–164.

    Article  CAS  PubMed  Google Scholar 

  • Davis, R. A., Charlton, A. J., Godward, J., Jones, S. A., Harrison, M., & Wilson, J. C. (2007). Adaptive binning: An improved binning method for metabolomics data using the undecimated wavelet transform. Chemometrics and Intelligent Laboratory Systems, 85, 144–154.

    Article  CAS  Google Scholar 

  • De Soete, G., & Carroll, J. D. (1994). K-means clustering in a low-dimensional Euclidean space. In E. Diday, et al. (Eds.), New approaches in classification and data analysis (pp. 212–219). Heidelberg: Springer.

    Chapter  Google Scholar 

  • Dettmer, K., Aronov, P. A., & Hammock, B. D. (2007). Mass spectrometry-based metabolomics. Mass Spectrometry Reviews, 26, 51–78. doi:10.1002/mas.20108.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Dieterle, F., Ross, A., Schlotterbeck, G., & Senn, H. (2006). Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Analytical Chemistry, 78, 4281–4290.

    Article  CAS  PubMed  Google Scholar 

  • Draisma, H. H., Reijmers, T. H., Meulman, J. J., van der Greef, J., Hankemeier, T., & Boomsma, D. I. (2013). Hierarchical clustering analysis of blood plasma lipidomics profiles from mono-and dizygotic twin families. European Journal of Human Genetics, 21, 95–101.

    Article  CAS  PubMed  Google Scholar 

  • Dunn, J. C. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3, 32–57.

    Article  Google Scholar 

  • Dunn, W. B., Bailey, N. J., & Johnson, H. E. (2005). Measuring the metabolome: Current analytical technologies. Analyst, 130, 606–625.

    Article  CAS  PubMed  Google Scholar 

  • Dunn, W. B., Wilson, I. D., Nicholls, A. W., & Broadhurst, D. (2012). The importance of experimental design and QC samples in large-scale and MS-driven untargeted metabolomic studies of humans. Bioanalysis, 4, 2249–2264.

    Article  CAS  PubMed  Google Scholar 

  • Dunn, W. B., Broadhurst, D., Begley, P., Zelena, E., Francis-McIntyre, S., Anderson, N., et al. (2011). Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nature Protocols, 6, 1060–1083.

    Article  CAS  PubMed  Google Scholar 

  • Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.

    Article  Google Scholar 

  • Emwas, A.-H., Luchinat, C., Turano, P., Tenori, L., Roy, R., Salek, R. M., et al. (2014). Standardizing the experimental conditions for using urine in NMR-based metabolomic studies with a particular focus on diagnostic studies: A review. Metabolomics, 11(4), 872–894.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • Enea, C., et al. (2010). 1H NMR-based metabolomics approach for exploring urinary metabolome modifications after acute and chronic physical exercise. Analytical and Bioanalytical Chemistry, 396, 1167–1176.

    Article  CAS  PubMed  Google Scholar 

  • Ertöz, L., Steinbach, M., & Kumar, V. (2003). Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: SDM 2003, SIAM (pp. 47–58).

  • Fahy, E., Sud, M., Cotter, D., & Subramaniam, S. (2007). LIPID MAPS online tools for lipid research. Nucleic Acids Research, 35, W606–W612.

    Article  PubMed  PubMed Central  Google Scholar 

  • Förster, J., Gombert, A. K., & Nielsen, J. (2002). A functional genomics approach using metabolomics and in silico pathway analysis. Biotechnology and Bioengineering, 79, 703–712.

    Article  PubMed  CAS  Google Scholar 

  • Gentleman, R. C., et al. (2004). Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology, 5, R80.

    Article  PubMed  PubMed Central  Google Scholar 

  • Gika, H. G., Theodoridis, G. A., Plumb, R. S., & Wilson, I. D. (2014). Current practice of liquid chromatography–mass spectrometry in metabolomics and metabonomics. Journal of Pharmaceutical and Biomedical Analysis, 87, 12–25.

    Article  CAS  PubMed  Google Scholar 

  • Griffin, J. L., Atherton, H., Shockcor, J., & Atzori, L. (2011). Metabolomics as a tool for cardiac research. Nature, 8, 630–643.

    CAS  Google Scholar 

  • Griffin, J. L., & Shockcor, J. P. (2004). Metabolic profiles of cancer cells. Nature Reviews Cancer, 4, 551–561. doi:10.1038/nrc1390.

    Article  CAS  PubMed  Google Scholar 

  • Griffiths, W. J., Koal, T., Wang, Y., Kohl, M., Enot, D. P., & Deigner, H. P. (2010). Targeted metabolomics for biomarker discovery. Angewandte Chemie, 49, 5426–5445. doi:10.1002/anie.200905579.

    Article  CAS  PubMed  Google Scholar 

  • Guan, W., Zhou, M., Hampton, C. Y., Benigno, B. B., Walker, L. D., Gray, A., et al. (2009). Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines. BMC Bioinformatics, 10, 259. doi:10.1186/1471-2105-10-259.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • Gunderson, R. W. (1982). Choosing the r-dimension for the FCV family of clustering algorithms. BIT Numerical Mathematics, 22, 140–149.

    Article  Google Scholar 

  • Gunderson, R. W. (1983). An adaptive FCV clustering algorithm. International Journal of Man-Machine Studies, 19, 97–104.

    Article  Google Scholar 

  • Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3, 1157–1182.

    Google Scholar 

  • Haddad, I., Hiller, K., Frimmersdorf, E., Benkert, B., Schomburg, D., & Jahn, D. (2009). An emergent self-organizing map based analysis pipeline for comparative metabolome studies. In Silico Biology, 9, 163–178.

    CAS  PubMed  Google Scholar 

  • Hamerly, G., & Elkan, C. (2003). Learning the k in k-means. Advances in Neural Information Processing Systems, 16, 281–288.

  • Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28, 100–108.

    Google Scholar 

  • Hastie, T., Tibshirani, R., Friedman, J., Hastie, T., Friedman, J., & Tibshirani, R. (2009). The elements of statistical learning (Vol. 2). Berlin: Springer.

    Book  Google Scholar 

  • Heather, L. C., Wang, X., West, J. A., & Griffin, J. L. (2013). A practical guide to metabolomic profiling as a discovery tool for human heart disease. Journal of Molecular and Cellular Cardiology, 55, 2–11.

    Article  CAS  PubMed  Google Scholar 

  • Heinzmann, S. S., Brown, I. J., Chan, Q., Bictash, M., Dumas, M. E., Kochhar, S., et al. (2010). Metabolic profiling strategy for discovery of nutritional biomarkers: Proline betaine as a marker of citrus consumption. The American Journal of Clinical Nutrition, 92, 436–443.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Henneges, C., Bullinger, D., Fux, R., Friese, N., Seeger, H., Neubauer, H., et al. (2009). Prediction of breast cancer by profiling of urinary RNA metabolites using Support Vector Machine-based feature selection. BMC Cancer, 9, 104.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • Holmans, P. (2010). Statistical methods for pathway analysis of genome-wide data for association with complex genetic traits. Advances in Genetics, 72, 141–179. doi:10.1016/B978-0-12-380862-2.00007-2.

    PubMed  Google Scholar 

  • Horai, H., Arita, M., Kanaya, S., Nihei, Y., Ikeda, T., Suwa, K., et al. (2010). MassBank: A public repository for sharing mass spectral data for life sciences. Journal of Mass Spectrometry, 45, 703–714.

    Article  CAS  PubMed  Google Scholar 

  • Hou, Y., et al. (2012). Microbial strain prioritization using metabolomics tools for the discovery of natural products. Analytical Chemistry, 84, 4277–4283. doi:10.1021/ac202623g.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Huang, J. Z., Ng, M. K., Rong, H., & Li, Z. (2005). Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 657–668.

    Article  PubMed  Google Scholar 

  • Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666.

    Article  Google Scholar 

  • Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys (CSUR), 31, 264–323.

    Article  Google Scholar 

  • Jansen, J. J., Hoefsloot, H. C., Boelens, H. F., van der Greef, J., & Smilde, A. K. (2004). Analysis of longitudinal metabolomics data. Bioinformatics, 20, 2438–2446. doi:10.1093/bioinformatics/bth268.

    Article  CAS  PubMed  Google Scholar 

  • Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241–254.

    Article  CAS  PubMed  Google Scholar 

  • Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

    Google Scholar 

  • Jolliffe, I. (2005). Principal component analysis. New YorK: Wiley Online Library.

    Book  Google Scholar 

  • Kaddurah-Daouk, R., & Krishnan, K. R. (2009). Metabolomics: A global biochemical approach to the study of central nervous system diseases. Neuropsychopharmacology, 34, 173–186. doi:10.1038/npp.2008.174.

    Article  CAS  PubMed  Google Scholar 

  • Kanehisa, M. (2002). The KEGG database. Novartis Foundation Symposium, 247, 91–101 ; discussion 101–3, 119–28, 244–52.

    Article  CAS  PubMed  Google Scholar 

  • Kang, S. M., Park, J. C., Shin, M. J., Lee, H., Oh, J., Hwang, G. S., et al. (2011). (1)H nuclear magnetic resonance based metabolic urinary profiling of patients with ischemic heart failure. Clinical Biochemistry, 44, 293–299. doi:10.1016/j.clinbiochem.2010.11.010.

    Article  PubMed  CAS  Google Scholar 

  • Kell, D. B., Brown, M., Davey, H. M., Dunn, W. B., Spasic, I., & Oliver, S. G. (2005). Metabolic footprinting and systems biology: The medium is the message. Nature Reviews Microbiology, 3, 557–565. doi:10.1038/nrmicro1177.

    Article  CAS  PubMed  Google Scholar 

  • Khatri, P., Sirota, M., & Butte, A. J. (2012). Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Computational Biology, 8, e1002375. doi:10.1371/journal.pcbi.1002375.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Kilkenny, C., Parsons, N., Kadyszewski, E., Festing, M. F., Cuthill, I. C., Fry, D., et al. (2009). Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS ONE, 4, e7824.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78, 1464–1480.

    Article  Google Scholar 

  • Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21, 1–6.

    Article  Google Scholar 

  • Kutner, M. H. (2005). Applied linear statistical models (5th ed.). McGraw-Hill/Irwin: Boston.

    Google Scholar 

  • Lauridsen, M., Hansen, S. H., Jaroszewski, J. W., & Cornett, C. (2007). Human urine as test material in 1H NMR-based metabonomics: Recommendations for sample preparation and storage. Analytical Chemistry, 79, 1181–1186.

    Article  CAS  PubMed  Google Scholar 

  • Li, H., Liang, Y., & Xu, Q. (2009a). Support vector machines and its applications in chemistry. Chemometrics and Intelligent Laboratory Systems, 95, 188–198.

    Article  CAS  Google Scholar 

  • Li, X., Lu, X., Tian, J., Gao, P., Kong, H., & Xu, G. (2009b). Application of fuzzy c-means clustering in data analysis of metabolomics. Analytical Chemistry, 81, 4468–4475.

    Article  CAS  PubMed  Google Scholar 

  • Li, F., Wang, J., Nie, L., & Zhang, W. (2012). Computational methods to interpret and integrate metabolomic data. New York: INTECH Open Access Publisher.

    Book  Google Scholar 

  • Luo, W., & Brouwer, C. (2013). Pathview: An R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 29, 1830–1831.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Mahadevan, S., Shah, S. L., Marrie, T. J., & Slupsky, C. M. (2008). Analysis of metabolomic data using support vector machines. Analytical Chemistry, 80, 7562–7570.

    Article  CAS  PubMed  Google Scholar 

  • Martens, H. (1992). Multivariate calibration. New York: Wiley.

    Google Scholar 

  • Marzetti, E., Landi, F., Marini, F., Cesari, M., Buford, T. W., Manini, T. M., et al. (2014). Patterns of circulating inflammatory biomarkers in older persons with varying levels of physical performance: A partial least squares-discriminant analysis approach. Frontiers in Medicine, 1, 27. doi:10.3389/fmed.2014.00027.

  • Matthiesen, R., & SpringerLink (Online Service). (2010). Bioinformatics methods in clinical research. In S. Krawetz & S. Misener (Eds.), Methods in molecular biology, methods and protocols. Totowa: Humana Press.

  • Milliken, G. A., & Johnson, D. E. (2009). Analysis of messy data (2nd ed.). Boca Raton: CRC Press.

    Book  Google Scholar 

  • Milone, D. H., Stegmayer, G., López, M., Kamenetzky, L., & Carrari, F. (2014). Improving clustering with metabolic pathway data. BMC Bioinformatics, 15, 101.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • Montgomery, D. C. (2008). Design and analysis of experiments. New York: Wiley.

    Google Scholar 

  • Nguyen, D. V., & Rocke, D. M. (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 18, 39–50.

    Article  CAS  PubMed  Google Scholar 

  • Nicholson, J. K., Lindon, J. C., & Holmes, E. (1999). ‘Metabonomics’: Understanding the metabolic responses of living systems to pathphysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica, 29, 1181–1189.

    Article  CAS  PubMed  Google Scholar 

  • Nin, N., Izquierdo-García, J., & Lorente, J. (2012). The metabolomic approach to the diagnosis of critical illness. In Annual update in intensive care and emergency medicine (pp. 43–52). Berlin: Springer.

  • Nueda, M. J., Conesa, A., Westerhuis, J. A., Hoefsloot, H. C., Smilde, A. K., Talón, M., et al. (2007). Discovering gene expression patterns in time course microarray experiments by ANOVA-SCA. Bioinformatics, 23, 1792–1800. doi:10.1093/bioinformatics/btm251.

    Article  CAS  PubMed  Google Scholar 

  • Oliver, S. G. (2002). Functional genomics: Lessons from yeast. Philosophical Transactions of the Royal Society of London. Series B, Biological sciences, 357, 17–23. doi:10.1098/rstb.2001.1049.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Oliver, S. G., Winson, M. K., Kell, D. B., & Baganz, F. (1998). Systematic functional analysis of the yeast genome. Trends in Biotechnology, 16, 373–378

    Article  CAS  PubMed  Google Scholar 

  • O’Sullivan, A., Gibney, M. J., & Brennan, L. (2011). Dietary intake patterns are reflected in metabolomic profiles: Potential role in dietary assessment studies. The American Journal of Clinical Nutrition, 93, 314–321.

    Article  PubMed  CAS  Google Scholar 

  • Papin, J. A., Stelling, J., Price, N. D., Klamt, S., Schuster, S., & Palsson, B. O. (2004). Comparison of network-based pathway analysis methods. Trends in Biotechnology, 22, 400–405. doi:10.1016/j.tibtech.2004.06.010.

    Article  CAS  PubMed  Google Scholar 

  • Patel, K. N., Patel, J. K., Patel, M. P., Rajput, G. C., & Patel, H. A. (2010). Introduction to hyphenated techniques and their applications in pharmacy. Pharmaceutical Methods, 1, 2–13.

    Article  PubMed  PubMed Central  Google Scholar 

  • Pauling, L., Robinson, A. B., Teranishi, R., & Cary, P. (1971). Quantitative analysis of urine vapor and breath by gas-liquid partition chromatography. Proceedings of the National Academy of Sciences of the United States of America, 68, 2374–2376.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Poroyko, V., Morowitz, M., Bell, T., Ulanov, A., Wang, M., Donovan, S., et al. (2011). Diet creates metabolic niches in the “immature gut” that shape microbial communities. Nutricion Hospitalaria, 26, 1283–1295. doi:10.1590/S0212-16112011000600015.

    CAS  PubMed  Google Scholar 

  • Putri, S. P., Nakayama, Y., Matsuda, F., Uchikata, T., Kobayashi, S., Matsubara, A., et al. (2013). Current metabolomics: Practical applications. Journal of Bioscience and Bioengineering, 115, 579–589. doi:10.1016/j.jbiosc.2012.12.007.

    Article  CAS  PubMed  Google Scholar 

  • Ramadan, Z., Jacobs, D., Grigorov, M., & Kochhar, S. (2006). Metabolic profiling using principal component analysis, discriminant partial least squares, and genetic algorithms. Talanta, 68, 1683–1691.

    Article  CAS  PubMed  Google Scholar 

  • Raman, K., & Chandra, N. (2009). Flux balance analysis of biological systems: Applications and challenges. Briefings in Bioinformatics, 10, 435–449. doi:10.1093/bib/bbp011.

    Article  CAS  PubMed  Google Scholar 

  • Riter, L. S., Vitek, O., Gooding, K. M., Hodge, B. D., & Julian, R. K. (2005). Statistical design of experiments as a tool in mass spectrometry. Journal of Mass Spectrometry, 40, 565–579.

    Article  CAS  PubMed  Google Scholar 

  • Rocke, D. M. (2004). Design and analysis of experiments with high throughput biological assay data. Seminars in Cell & Developmental Biology, 15, 703–713.

    Article  CAS  Google Scholar 

  • Savorani, F., Tomasi, G., & Engelsen, S. B. (2010). icoshift: A versatile tool for the rapid alignment of 1D NMR spectra. Journal of Magnetic Resonance, 202, 190–202.

    Article  CAS  PubMed  Google Scholar 

  • Scalbert, A., Brennan, L., Fiehn, O., Hankemeier, T., Kristal, B. S., van Ommen, B., et al. (2009). Mass-spectrometry-based metabolomics: Limitations and recommendations for future progress with particular focus on nutrition research. Metabolomics, 5, 435–458.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Schilling, C. H., Schuster, S., Palsson, B. O., & Heinrich, R. (1999). Metabolic pathway analysis: Basic concepts and scientific applications in the post-genomic era. Biotechnology Progress, 15, 296–303.

    Article  CAS  PubMed  Google Scholar 

  • Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.

    Article  Google Scholar 

  • Slupsky, C. M., Rankin, K. N., Wagner, J., Fu, H., Chang, D., Weljie, A. M., et al. (2007). Investigations of the effects of gender, diurnal variation, and age in human urinary metabolomic profiles. Analytical Chemistry, 79, 6995–7004.

    Article  CAS  PubMed  Google Scholar 

  • Smilde, A. K., Jansen, J. J., Hoefsloot, H. C., Lamers, R. J., van der Greef, J., & Timmerman, M. E. (2005). ANOVA-simultaneous component analysis (ASCA): A new tool for analyzing designed metabolomics data. Bioinformatics, 21, 3043–3048. doi:10.1093/bioinformatics/bti476.

    Article  CAS  PubMed  Google Scholar 

  • Smilde, A. K., Westerhuis, J. A., Hoefsloot, H. C. J., Bijlsma, S., Rubingh, C. M., Vis, D. J., et al. (2010). Dynamic metabolomic data analysis: A tutorial review. Metabolomics, 6, 3–17. doi:10.1007/s11306-009-0191-1.

    Article  CAS  PubMed  Google Scholar 

  • Smith, C. A., O’Maille, G., Want, E. J., Qin, C., Trauger, S. A., Brandon, T. R., et al. (2005). METLIN: A metabolite mass spectral database. Therapeutic Drug Monitoring, 27, 747–751.

    Article  CAS  PubMed  Google Scholar 

  • Smolinska, A., Blanchet, L., Buydens, L. M., & Wijmenga, S. S. (2012). NMR and pattern recognition methods in metabolomics: From data acquisition to biomarker discovery: A review. Analytica Chimica Acta, 750, 82–97. doi:10.1016/j.aca.2012.05.049.

    Article  CAS  PubMed  Google Scholar 

  • Steinley, D., & Brusco, M. J. (2008). Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika, 73, 125–144.

    Article  Google Scholar 

  • Steuer, R. (2007). Computational approaches to the topology, stability and dynamics of metabolic networks. Phytochemistry, 68, 2139–2151. doi:10.1016/j.phytochem.2007.04.041.

    Article  CAS  PubMed  Google Scholar 

  • Stretch, C., Eastman, T., Mandal, R., Eisner, R., Wishart, D. S., Mourtzakis, M., et al. (2012). Prediction of skeletal muscle and fat mass in patients with advanced cancer using a metabolomic approach. The Journal of Nutrition, 142, 14–21.

    Article  CAS  PubMed  Google Scholar 

  • Szczesniak, R. D., McPhail, G. L., Duan, L. L., Macaluso, M., Amin, R. S., & Clancy, J. P. (2013). A semiparametric approach to estimate rapid lung function decline in cystic fibrosis. Annals of Epidemiology, 23, 771–777.

    Article  PubMed  Google Scholar 

  • Szymanska, E., Saccenti, E., Smilde, A. K., & Westerhuis, J. A. (2012). Double-check: Validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics, 8, 3–16. doi:10.1007/s11306-011-0330-3.

    Article  CAS  PubMed  Google Scholar 

  • Theodoridis, G. A., Gika, H. G., Want, E. J., & Wilson, I. D. (2012). Liquid chromatography–mass spectrometry based global metabolite profiling: A review. Analytica Chimica Acta, 711, 7–16.

    Article  CAS  PubMed  Google Scholar 

  • Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 411–423.

    Article  Google Scholar 

  • Timmerman, M. E., Ceulemans, E., De Roover, K., & Van Leeuwen, K. (2013). Subspace K-means clustering. Behavior Research Methods, 45, 1011–1023.

    Article  PubMed  Google Scholar 

  • Timmerman, M. E., Ceulemans, E., Kiers, H. A., & Vichi, M. (2010). Factorial and reduced K-means reconsidered. Computational Statistics & Data Analysis, 54, 1858–1871.

    Article  Google Scholar 

  • Timmerman, M. E., Hoefsloot, H. C., Smilde, A. K., & Ceulemans, E. (2015). Scaling in ANOVA-simultaneous component analysis. Metabolomics,. doi:10.1007/s11306-015-0785-8.

    PubMed  PubMed Central  Google Scholar 

  • Tomar, N., & De, R. K. (2013). Comparing methods for metabolic network analysis and an application to Metabolic Engineering. Gene, 521, 1–14.

    Article  CAS  PubMed  Google Scholar 

  • Tomasi, G., van den Berg, F., & Andersson, C. (2004). Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data. Journal of Chemometrics, 18, 231–241.

    Article  CAS  Google Scholar 

  • Trygg, J., & Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics, 16, 119–128.

    Article  CAS  Google Scholar 

  • Ultsch, A. (2003). U*-matrix: A tool to visualize clusters in high dimensional data. Marburg: Fachbereich Mathematik und Informatik.

    Google Scholar 

  • van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling, and transformations: Improving the biological information content of metabolomics data. BMC Genomics, 7, 142.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • VanDyke, R., Ren, Y., Sucharew, H. J., Miodovnik, M., Rosenn, B., & Khoury, J. C. (2012). Characterizing maternal glycemic control: A more informative approach using semiparametric regression. Journal of Maternal-Fetal and Neonatal Medicine, 25, 15–19.

    Article  PubMed  Google Scholar 

  • Velagapudi, V. R., et al. (2010). The gut microbiota modulates host energy and lipid metabolism in mice. Journal of Lipid Research, 51, 1101–1112.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Vettukattil, R. (2015). Preprocessing of raw metabonomic data. Metabonomics: Methods and Protocols, 1, 123–136.

    Google Scholar 

  • Vichi, M., & Kiers, H. A. (2001). Factorial k-means analysis for two-way data. Computational Statistics & Data Analysis, 37, 49–64.

    Article  Google Scholar 

  • Wang-Sattler, R., Yu, Z., Herder, C., Messias, A. C., Floegel, A., He, Y., et al. (2012). Novel biomarkers for pre-diabetes identified by metabolomics. Molecular Systems Biology,. doi:10.1038/msb.2012.43.

    PubMed  PubMed Central  Google Scholar 

  • Wetmore, D. R., Joseloff, E., Pilewski, J., Lee, D. P., Lawton, K. A., Mitchell, M. W., et al. (2010). Metabolomic profiling reveals biochemical pathways and biomarkers associated with pathogenesis in cystic fibrosis cells. Journal of Biological Chemistry, 285, 30516–30522. doi:10.1074/jbc.M110.140806.

  • Wiechert, W. (2002). Modeling and simulation: Tools for metabolic engineering. Journal of Biotechnology, 94, 37–63.

  • Wishart, D. S. (2007). Current progress in computational metabolomics. Briefings in Bioinformatics, 8, 279–293.

  • Wishart, D. S., Jewison, T., Guo, A. C., Wilson, M., Knox, C., Liu, Y., et al. (2013). HMDB 3.0—The human metabolome database in 2013. Nucleic Acids Research, 41, D801–D807. doi:10.1093/nar/gks1065.

  • Wold, H. (1966). Estimation of principal components and related models by iterative least squares. Multivariate Analysis, 1, 391–420.

  • Wold, S., Ruhe, A., Wold, H., & Dunn, W. J. (1984). The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM Journal on Scientific and Statistical Computing, 5, 735–743.

  • Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: A basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58, 109–130.

  • Xi, Y., & Rocke, D. M. (2008). Baseline correction for NMR spectroscopic metabolomics data analysis. BMC Bioinformatics, 9, 324.

  • Xia, J., Broadhurst, D. I., Wilson, M., & Wishart, D. S. (2012a). Translational biomarker discovery in clinical metabolomics: An introductory tutorial. Metabolomics, 9, 280–299. doi:10.1007/s11306-012-0482-9.

  • Xia, J., Mandal, R., Sinelnikov, I. V., Broadhurst, D., & Wishart, D. S. (2012b). MetaboAnalyst 2.0—A comprehensive server for metabolomic data analysis. Nucleic Acids Research, 40, W127–W133.

  • Xia, J., Psychogios, N., Young, N., & Wishart, D. S. (2009). MetaboAnalyst: A web server for metabolomic data analysis and interpretation. Nucleic Acids Research, 37, W652–W660.

  • Xing, E. P., Jordan, M. I., Russell, S., & Ng, A. Y. (2002). Distance metric learning with application to clustering with side-information. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (pp. 505–512). Cambridge, MA: MIT Press.

  • Yan, M., & Ye, K. (2007). Determining the number of clusters using the weighted gap statistic. Biometrics, 63, 1031–1037.

  • Yang, C., He, Z., & Yu, W. (2009). Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis. BMC Bioinformatics, 10, 4.

  • Zhang, S., Gowda, G. N., Asiago, V., Shanaiah, N., Barbas, C., & Raftery, D. (2008). Correlative and quantitative ¹H NMR-based metabolomics reveals specific metabolic pathway disturbances in diabetic rats. Analytical Biochemistry, 383, 76–84.

  • Zhang, J. D., & Wiemann, S. (2009). KEGGgraph: A graph approach to KEGG PATHWAY in R and Bioconductor. Bioinformatics, 25, 1470–1471.

Acknowledgments

We would like to express our great appreciation to Dr. Lilliam Ambroggio and Dr. Lindsey Romick-Rosendale for their valuable and constructive suggestions on our review. Their willingness to give their time so generously has been very much appreciated. This study was funded by NIH Grant R01 HL116226 to RDS and LJL.

Author information

Corresponding author

Correspondence to Long Jason Lu.

Ethics declarations

Conflict of interest

Sheng Ren, Anna A. Hinzman, Emily L. Kang, Rhonda D. Szczesniak and L. Jason Lu declare that they have no conflict of interest and have included separately signed conflict of interest forms with this manuscript.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 1395 kb)

About this article

Cite this article

Ren, S., Hinzman, A.A., Kang, E.L. et al. Computational and statistical analysis of metabolomics data. Metabolomics 11, 1492–1513 (2015). https://doi.org/10.1007/s11306-015-0823-6

  • DOI: https://doi.org/10.1007/s11306-015-0823-6