Machine Learning in Untargeted Metabolomics Experiments

Protocol

Part of the book Microbial Metabolomics, in the book series Methods in Molecular Biology (MIMB, volume 1859)

Abstract

Machine learning is a form of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed, adapting as they are exposed to new data. Here we examine the use of machine learning with untargeted metabolomics data, when it is appropriate to use, and which questions it can answer. We provide an example workflow for training and testing a simple binary classifier, a multiclass classifier, and a support vector machine using the Waikato Environment for Knowledge Analysis (Weka), a toolkit for machine learning. This workflow should provide a framework for greater integration of machine learning with metabolomics studies.
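The classifiers in this workflow are built in Weka. As a rough illustration of what "training and testing" a binary classifier involves, the following is a minimal pure-Python sketch; it is not Weka code and not the chapter's protocol. It substitutes a linear support vector machine trained by Pegasos-style stochastic subgradient descent on the regularized hinge loss (Weka's built-in SVM classifier, SMO, is trained with sequential minimal optimization instead) and estimates accuracy with k-fold cross-validation. The function names (`train_linear_svm`, `predict`, `k_fold_accuracy`), the regularization parameter `lam`, and the toy data are all hypothetical.

```python
"""Sketch: linear SVM (Pegasos-style SGD on hinge loss) plus k-fold CV.

Illustration only; this is NOT Weka's SMO implementation.
"""
import random
from typing import List, Sequence

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def add_bias(x: Sequence[float]) -> Vector:
    # Append a constant 1.0 so the bias is learned as the last weight.
    return list(x) + [1.0]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Train on rows of X with labels in {+1, -1}; returns weights (bias last)."""
    rng = random.Random(seed)
    rows = [add_bias(x) for x in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # Pegasos step size
            violates = y[i] * dot(w, rows[i]) < 1.0   # inside the margin?
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if violates:                              # hinge-loss subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def predict(w: Vector, x: Sequence[float]) -> int:
    return 1 if dot(w, add_bias(x)) >= 0.0 else -1


def k_fold_accuracy(X: List[Vector], y: List[int], k: int = 5,
                    seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold; pool the accuracy."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    correct = 0
    for held_out in folds:
        test = set(held_out)
        train = [i for i in order if i not in test]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in held_out)
    return correct / len(X)


if __name__ == "__main__":
    # Hypothetical two-feature "intensity" profiles, NOT real metabolomics
    # data: the two classes are placed far apart so they are separable.
    X = [[2 + 0.1 * i, 3 - 0.1 * i] for i in range(10)] + \
        [[-2 - 0.1 * i, -3 + 0.1 * i] for i in range(10)]
    y = [1] * 10 + [-1] * 10
    print(k_fold_accuracy(X, y, k=5))
```

Cross-validation lets every sample serve once as test data, which matters when replicates are scarce. For a multiclass problem, one common strategy is one-vs-rest: train one binary model per class and predict the class whose model returns the largest score `dot(w, add_bias(x))`.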

Acknowledgments

The authors acknowledge that this work was part of the DOE Joint BioEnergy Institute (http://www.jbei.org), supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy.

Author information

Correspondence to Joshua Heinemann.

Electronic Supplementary Material

Supplementary File 1

Example data files containing mass spectrometry-based intensity (relative abundance) values for metabolites, in both .csv and .arff formats (ZIP, 524 KB)

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Heinemann, J. (2019). Machine Learning in Untargeted Metabolomics Experiments. In: Baidoo, E. (eds) Microbial Metabolomics. Methods in Molecular Biology, vol 1859. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8757-3_17

  • DOI: https://doi.org/10.1007/978-1-4939-8757-3_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8756-6

  • Online ISBN: 978-1-4939-8757-3

  • eBook Packages: Springer Protocols