Machine Learning in Untargeted Metabolomics Experiments

Heinemann, Joshua

doi:10.1007/978-1-4939-8757-3_17

Part of the book series: Methods in Molecular Biology (MIMB, volume 1859)

Abstract

Machine learning is a form of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. Machine learning refers to the ability of computer programs to adapt when exposed to new data. Here we examine the application of machine learning to untargeted metabolomics data, when its use is appropriate, and the questions it can answer. We provide an example workflow for training and testing a simple binary classifier, a multiclass classifier, and a support vector machine using the Waikato Environment for Knowledge Analysis (Weka), a toolkit for machine learning. This workflow should provide a framework for greater integration of machine learning into metabolomics studies.
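
To make the idea of training and testing a binary classifier concrete, the following is a minimal sketch in plain Python. It is not the chapter's Weka workflow: it swaps in a nearest-centroid classifier evaluated by k-fold cross-validation, and the feature table is synthetic, generated only for illustration. All function names and the two class labels ("control", "treated") are assumptions of this sketch.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(samples, labels):
    """Fit a nearest-centroid model: one mean intensity vector per class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_class.items()}


def predict(model, x):
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda y: squared_distance(model[y], x))


def cross_validate(samples, labels, k=5, seed=0):
    """Mean accuracy over k folds; every sample is held out exactly once."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return mean(accuracies)


# Synthetic stand-in for a feature table (NOT real metabolomics data):
# three hypothetical metabolite intensities per sample, two classes.
rng = random.Random(42)
control = [[rng.gauss(10, 1), rng.gauss(5, 1), rng.gauss(7, 1)] for _ in range(10)]
treated = [[rng.gauss(14, 1), rng.gauss(5, 1), rng.gauss(3, 1)] for _ in range(10)]
samples = control + treated
labels = ["control"] * 10 + ["treated"] * 10

accuracy = cross_validate(samples, labels, k=5)
print(f"mean 5-fold accuracy: {accuracy:.2f}")
```

Holding out each fold before fitting keeps the test samples out of the training step, which is the point of cross-validation as an accuracy estimate. A support vector machine or a multiclass model would slot into the same train/predict/evaluate loop.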

Acknowledgments

This work was part of the DOE Joint BioEnergy Institute (http://www.jbei.org), supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy.

Author information

Authors and Affiliations

Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Joshua Heinemann
Joint BioEnergy Institute, Emeryville, CA, USA
Joshua Heinemann

Corresponding author

Correspondence to Joshua Heinemann.

Editor information

Editors and Affiliations

Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA
Edward E.K. Baidoo

Electronic Supplementary Material

Supplementary File 1

Example data files containing mass spectrometry-based intensity (relative abundance) information for metabolites, in both .csv and .arff format (ZIP, 524 kb)
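
For readers unfamiliar with how the two formats relate, the sketch below converts a small CSV feature table into Weka's ARFF text format. The layout of the actual supplementary files is not described here, so the column names, the class labels, and the `csv_to_arff` helper are all hypothetical. The sketch also assumes that every feature column is numeric and that column names contain no spaces (ARFF requires quoting otherwise).

```python
import csv
import io


def csv_to_arff(csv_text, relation="metabolomics", class_column="class"):
    """Convert CSV text (header row + samples) to ARFF text.

    Assumes all columns except `class_column` hold numeric intensities
    and that `class_column` holds a nominal sample label.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no data rows")

    feature_names = [name for name in rows[0] if name != class_column]
    class_list = ",".join(sorted({row[class_column] for row in rows}))

    # Header: relation name, one numeric attribute per feature,
    # then the nominal class attribute listing every observed label.
    lines = ["@relation " + relation, ""]
    lines += ["@attribute " + name + " numeric" for name in feature_names]
    lines.append("@attribute " + class_column + " {" + class_list + "}")

    # Data section: one comma-separated line per sample, class value last.
    lines += ["", "@data"]
    for row in rows:
        values = [row[name] for name in feature_names] + [row[class_column]]
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# Hypothetical two-sample table, invented for illustration only.
example_csv = (
    "feature_1,feature_2,class\n"
    "1200.5,88.0,control\n"
    "950.0,130.2,treated\n"
)
arff = csv_to_arff(example_csv)
print(arff)
```

The class attribute is written last because Weka's tools commonly treat the final attribute as the class by default, though this can be changed when loading the data.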

About this protocol

Cite this protocol

Heinemann, J. (2019). Machine Learning in Untargeted Metabolomics Experiments. In: Baidoo, E. (ed) Microbial Metabolomics. Methods in Molecular Biology, vol 1859. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8757-3_17

DOI: https://doi.org/10.1007/978-1-4939-8757-3_17
Published: 13 November 2018
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8756-6
Online ISBN: 978-1-4939-8757-3
eBook Packages: Springer Protocols
