Elsevier

TrAC Trends in Analytical Chemistry

Volume 82, September 2016, Pages 425-442
Data analysis strategies for targeted and untargeted LC-MS metabolomic studies: Overview and workflow

https://doi.org/10.1016/j.trac.2016.07.004

Highlights

  • Data analysis is the “bottleneck” of metabolomic LC-MS studies.

  • Huge amounts of data are produced by LC-MS metabolomics.

  • LC-MS data analysis strategies vary according to the type of metabolomic study: targeted or untargeted.

  • Numerous data analysis methodologies exist with different capabilities.

  • This review shows distinct data analysis strategies for LC-MS metabolomic studies.

Abstract

Data analysis is a very challenging task in LC-MS metabolomic studies. The use of powerful analytical techniques (e.g., high-resolution mass spectrometry) provides high-dimensional data, often with noisy and collinear structures. Such a large amount of information-rich mass spectrometry data requires extensive processing in order to handle metabolomic data sets appropriately and to further assess sample classification/discrimination and biomarker discovery.

This review shows the steps involved in the data analysis workflow for both targeted and untargeted metabolomic studies. Special attention is paid to the distinct methodologies that have been developed in the last decade for the untargeted case. Furthermore, some powerful and recent alternatives based on the use of chemometric tools are also discussed. In general terms, this review helps researchers to critically explore the distinct alternatives for LC-MS metabolomic data analysis and to choose the most appropriate one for their case study.

Introduction

Metabolomics [1], [2], [3] is one of the categorical platforms that constitute omics [4] (see Fig. 1). Omics is a field that aims at the study of the abundance and/or structural characterization of a broad range of molecules in organisms under distinct scenarios. In the clinical field, high-throughput omic technologies are used for the characterization of diseases to better predict the clinical course of organisms and to evaluate the efficacy of existing or under-development therapies [5]. In food science, omics plays a significant role in improving human nutrition [6]. In the environmental field, omic studies aim at the evaluation of the alterations that organisms might undergo after exposure to environmental stressors [7], [8].

In all cases, the expressed molecules are involved in most crucial biological processes, and principally comprise deoxyribonucleic acid (DNA) (genomics [9], epigenomics [10]), ribonucleic acid (RNA) (transcriptomics [11]), proteins (proteomics [12]), and other small molecules (metabolomics [1], [2], [3]). In more recent years, another categorical omic platform named fluxomics [13], [14], which aims at the study of the fluxome, or the total set of fluxes in the metabolic network of the biological specimen, has gained relevance. Apart from these categorical omic platforms, a variety of omic subdisciplines have also emerged (e.g., lipidomics [15], glycomics [16], foodomics [6], [17], interactomics [18], and metallomics [19]), showing that omics is a constantly evolving discipline. Among all these omic platforms, metabolomics is becoming increasingly popular and is used to detect the perturbations that disease, drugs or toxins might cause in the concentrations and fluxes of metabolites involved in key biochemical pathways [20]. Due to its importance and relevance, the current study concentrates on metabolomic data.

Several analytical techniques have been developed for each of the omic platforms (see Fig. 1), including DNA microarray-based and RNA-sequencing techniques [21], nuclear magnetic resonance (NMR) spectroscopy [22], [23] and mass spectrometry (MS) methods [24], [25]. In the field of metabolomics, NMR and MS techniques are the most popular. High-resolution proton NMR spectroscopy (1H-NMR) has proved to be one of the most powerful technologies for examining biofluids and studying intact tissues, producing a comprehensive profile of metabolite signals without separation, derivatization, and preselected measurement parameters [26], [27]. On the other hand, MS methods, either by direct injection [28] or coupled to chromatographic techniques [29], have also evolved into a powerful technology for metabolomics due to their ability to analyse low molecular weight compounds in biological systems. These two approaches (i.e., NMR and MS) are complementary, and the integration of both technologies to provide more comprehensive information is now pursued in the metabolomics field. Nevertheless, this study concentrates on MS-based metabolomic data.

Concerning MS instrumentation, high-resolution mass spectrometers are the most powerful analysers due to their ability to improve accurate mass determination. In fact, spectrometers such as time-of-flight (TOF) [30], quadrupole time-of-flight (Q-TOF) [31], and Fourier transform ion cyclotron resonance (FT-ICR) [32] spectrometers and orbital ion traps [33] have in many cases replaced the conventional low-resolution quadrupoles and linear ion traps (IT), due to their ability to resolve isomeric and isobaric species and elucidate elemental composition [34]. Regarding chromatographic techniques, early metabolomic studies were commonly based on gas chromatography (GC), since it is a highly efficient, sensitive and reproducible technique [35]. However, GC has the drawback that only volatile compounds or compounds that are made volatile after derivatisation can be analysed, and extensive sample preparation is often required. In contrast, high-performance liquid chromatography (HPLC) and ultra-high-performance liquid chromatography (UHPLC) are considered to be more comprehensive than GC since they allow the analysis of a wider range of metabolites without the requirement of derivatisation [36], [37], [38], [39]. Hence, liquid chromatography coupled to mass spectrometry (LC-MS) has lately gained popularity in the metabolomics field at the expense of gas chromatography coupled to mass spectrometry (GC-MS), which is why this study focuses on the former technique.

The improvement of analytical techniques has gradually caused metabolomic data sets to become larger, with more intricate inner structures [40]. Mass spectrometry-based techniques generate highly complex data, due to the vast number of measurements (i.e., an MS spectrum at each retention time) relative to the number of observations (i.e., samples). In the case of LC-MS analysis (see Fig. 1), data generated from each chromatogram are arranged in data sets containing mass-to-charge (m/z) values, retention times and intensities. Hence, massive amounts of information-rich MS data are generated in the analysis of every sample, thus requiring specific standard approaches for their study and interpretation [41].
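As a minimal sketch of the arrangement described above, the following Python snippet bins the (retention time, m/z, intensity) values of one chromatogram into a matrix with one row per scan and one column per m/z bin. The function name `bin_scans`, the example scans and the bin width of 1 m/z unit are illustrative assumptions and do not correspond to any specific software covered by this review.

```python
from typing import List, Tuple

# One scan = (retention time, list of (m/z, intensity) pairs).
Scan = Tuple[float, List[Tuple[float, float]]]


def bin_scans(scans: List[Scan], mz_min: float, mz_max: float,
              bin_width: float) -> Tuple[List[float], List[List[float]]]:
    """Arrange scans into a (retention time x m/z bin) intensity matrix."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    retention_times: List[float] = []
    matrix: List[List[float]] = []
    for rt, peaks in scans:
        row = [0.0] * n_bins
        for mz, intensity in peaks:
            # Signals outside the chosen m/z window are ignored.
            if mz_min <= mz < mz_max:
                idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
                row[idx] += intensity  # sum intensities falling in one bin
        retention_times.append(rt)
        matrix.append(row)
    return retention_times, matrix


# Hypothetical two-scan chromatogram (values invented for illustration).
example_scans: List[Scan] = [
    (0.01, [(100.2, 10.0), (100.7, 5.0), (103.1, 20.0)]),
    (0.02, [(103.4, 30.0), (104.9, 7.0), (110.0, 99.0)]),
]
rts, data = bin_scans(example_scans, mz_min=100.0, mz_max=105.0, bin_width=1.0)
```

Even in this toy case the matrix has far more columns per sample than a typical study has samples once realistic m/z ranges and narrow bins are used, which illustrates why the data are described as high-dimensional.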

In general terms, data analysis strategies are classified into two groups: data analysis strategies for targeted (Fig. 2) and untargeted (Fig. 3) metabolomic studies. This distinction arises from the different types of data generated in these two approaches, which must be handled accordingly. Targeted studies [42] focus the research on a set of known metabolites whereas untargeted studies [43] allow a more comprehensive evaluation of metabolomic profiles. Most of the methodologies used in early targeted studies allowed the identification of only a small number of metabolites [44]. Nevertheless, recent targeted methodologies enable large-scale metabolic profiling, including hundreds of compounds [45], [46], [47]. However, the number of compounds analysed in untargeted studies is even larger. This is because one must process entire data sets including thousands of metabolite signals, and among these, few are finally identified as candidate biomarkers [48]. Therefore, data analysis strategies for untargeted studies require highly extensive processing of LC-MS chromatograms. A large number of data analysis strategies are found in the literature but none of them can be singled out as the optimal choice in all cases, which makes data analysis an open task in bioinformatics research. In fact, the field of MS-based metabolomics is rather young, and new methods, software and platforms are being regularly published or updated [49], [50].
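The difference between the two approaches can be sketched in a few lines of Python. In this hedged illustration, a targeted analysis keeps only the signals whose m/z matches a predefined list of known metabolites within a mass tolerance expressed in parts per million (ppm), whereas an untargeted analysis would carry every detected signal forward. The feature list, the metabolite names, the m/z values and the 10 ppm tolerance are all hypothetical, and ppm matching is only one common way of assigning signals to known compounds.

```python
from typing import Dict, List


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def targeted_subset(features: List[dict], targets: Dict[str, float],
                    tol_ppm: float = 10.0) -> Dict[str, List[dict]]:
    """Keep only features matching a known target m/z within tol_ppm."""
    matched: Dict[str, List[dict]] = {}
    for name, target_mz in targets.items():
        matched[name] = [
            f for f in features
            if abs(ppm_error(f["mz"], target_mz)) <= tol_ppm
        ]
    return matched


# Hypothetical detected signals (m/z, retention time in min, intensity).
features = [
    {"mz": 200.0010, "rt": 1.2, "intensity": 5.0e4},
    {"mz": 200.0100, "rt": 1.3, "intensity": 2.0e4},
    {"mz": 350.2000, "rt": 4.8, "intensity": 9.0e5},
]
# Hypothetical list of known metabolites for the targeted case.
targets = {"metabolite_A": 200.0000, "metabolite_B": 350.2000}

targeted = targeted_subset(features, targets)  # only matched signals
untargeted = features                          # every signal is retained
```

Here the signal at m/z 200.0100 lies about 50 ppm from the nearest target, so it is discarded in the targeted case but remains available for the untargeted analysis.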

A recent review by Yi et al. [51] summarizes recent and potential advances in chemometric methods in relation to data processing in untargeted metabolomic studies. Various aspects, including raw data pre-processing, metabolite identification, and variable selection and modeling, are accurately discussed and presented there. The present review complements the previous one with some data analysis steps not covered or only partially covered by the former (e.g., data acquisition, data storage and conversion, data import, data compression and feature detection or peak resolution), presents novel and little-known chemometric tools for data analysis and includes an overview of the data analysis strategies for targeted studies. Moreover, it is intended to contribute to the state of the art by providing comprehensive information on bioanalytical and data processing tools rather than describing the principles of the chemometric methods that can be used in LC-MS metabolomic data analysis.

Section snippets

General overview of the data analysis approaches

LC-MS metabolomic data analysis strategies are primarily designed for targeted and untargeted studies. However, future advances in LC-MS metabolomics may lead to a merging of targeted and untargeted analyses, with the targeted approach providing more sensitive and accurate detection of predetermined metabolites, and the untargeted approach being able to detect and identify unknown metabolites [52]. Indeed, first steps in this direction were made by Savolainen et al. [53], who collected for the

The data analysis workflow for targeted and untargeted metabolomic studies

This section provides details of the steps involved in data analysis workflows for targeted and untargeted studies (highlighting common aspects), and finishes with a common explanation of the biochemical interpretation for both approaches.

LC-MS metabolomic data analysis: an active area in bioinformatics research

The development of tools for data analysis is an active area of bioinformatics research. Recent years have witnessed the development of many software tools for data analysis, but there is still a need for further improvement of the data analysis pipeline. Such improvement should concentrate on two aspects: combination of data analysis strategies and fusion of distinct omic fields.

The combination of various data analysis strategies is necessary to allow a more comprehensive detection of chemical

Concluding remarks

From a general point of view, we can conclude that the complexity of LC-MS metabolomic data and the diversity of strategies that are used for their processing make data analysis an open field in bioinformatics research. In global terms, targeted strategies allow highly sensitive and accurate detection of predetermined metabolites whereas untargeted strategies are valuable for the detection of unknown metabolites and biochemical pathways. However, both approaches are complementary and can

Acknowledgements

The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007–2013) / ERC Grant Agreement n. 320737. The first author acknowledges the Spanish Government (Ministerio de Educación, Cultura y Deporte) for a predoctoral FPU scholarship (FPU13/04384).

References (201)

  • A. Garanto et al.

    Specific sphingolipid content decrease in Cerkl knockdown mouse retinas

    Exp. Eye Res

    (2013)
  • J.C.L. Erve et al.

    Spectral accuracy of molecular ions in an LTQ/Orbitrap mass spectrometer and implications for elemental composition determination

    J. Am. Soc. Mass Spectrom

    (2009)
  • E. Dudley et al.

    Targeted metabolomics and mass spectrometry

    Adv. Protein Chem. Struct. Biol

    (2010)
  • M. Gergov et al.

    Simultaneous screening for 238 drugs in blood by liquid chromatography–ionspray tandem mass spectrometry with multiple-reaction monitoring

    J. Chromatogr. B

    (2003)
  • S.U. Bajad et al.

    Separation and quantitation of water soluble cellular metabolites by hydrophilic interaction chromatography-tandem mass spectrometry

    J. Chromatogr. A

    (2006)
  • M. Katajamaa et al.

    Data processing for mass spectrometry-based metabolomics

    J. Chromatogr. A

    (2007)
  • L. Wu et al.

    Quantitative analysis of the microbial metabolome by isotope dilution mass spectrometry using uniformly 13C-labeled cell extracts as internal standards

    Anal. Biochem

    (2005)
  • W. Lu et al.

    A high-performance liquid chromatography-tandem mass spectrometry method for quantitation of nitrogen-containing intracellular metabolites

    J. Am. Soc. Mass Spectrom

    (2006)
  • W. Lu et al.

    Isotope ratio-based profiling of microbial folates

    J. Am. Soc. Mass Spectrom

    (2007)
  • A. Bajoub et al.

    Comparing two metabolic profiling approaches (liquid chromatography and gas chromatography coupled to mass spectrometry) for extra-virgin olive oil phenolic compounds analysis: a botanical classification perspective

    J. Chromatogr. A

    (2016)
  • O. Fiehn et al.

    Metabolite profiling for plant functional genomics

    Nat. Biotechnol

    (2000)
  • O. Fiehn

    Metabolomics – the link between genotypes and phenotypes

    Plant Mol. Biol

    (2002)
  • G.J. Patti et al.

    Innovation: metabolomics: the apogee of the omics trilogy

    Nat. Rev. Mol. Cell Biol

    (2012)
  • M. Chadeau-Hyam et al.

    Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers

    Environ. Mol. Mutagen

    (2013)
  • L.M. McShane et al.

    Criteria for the use of omics-based predictors in clinical trials: explanation and elaboration

    BMC Med

    (2013)
  • F. Capozzi et al.

    Foodomics: a new comprehensive approach to food and nutrition

    Genes Nutr

    (2013)
  • J.G. Bundy et al.

    Environmental metabolomics: a critical review and future perspectives

    Metabolomics

    (2008)
  • M.R. Viant et al.

    Mass spectrometry based environmental metabolomics: a primer and review

    Metabolomics

    (2012)
  • M. Adams et al.

    Complementary DNA sequencing: expressed sequence tags and human genome project

    Science

    (1991)
  • M.J. Fazzari et al.

    Epigenomics: beyond CpG islands

    Nat. Rev. Genet

    (2004)
  • A. Abbott

    Proteomics, transcriptomics: what's in a name?

    Nature

    (1999)
  • N.L. Anderson et al.

    Proteome and proteomics: new technologies, new concepts, and new words

    Electrophoresis

    (1998)
  • G. Winter et al.

    Fluxomics – connecting ‘omics analysis and phenotypes

    Environ. Microbiol

    (2013)
  • M. Cascante et al.

    Metabolomics and fluxomics approaches

    Essays Biochem

    (2008)
  • J.E. Turnbull et al.

    Emerging glycomics technologies

    Nat. Chem. Biol

    (2007)
  • M. Herrero et al.

    Foodomics: MS-based strategies in modern food science and nutrition

    Mass Spectrom. Rev

    (2012)
  • W. Zhang et al.

    Integrating multiple “omics” analysis for microbial biology: application and methodologies

    Microbiology

    (2010)
  • A.K. Shanker et al.

    Chromium interactions in plants: current status and future strategies

    Metallomics

    (2009)
  • J.K. Nicholson et al.

    Metabonomics: a platform for studying drug toxicity and gene function

    Nat. Rev. Drug Discov

    (2002)
  • B. Campos et al.

    Identification of metabolic pathways in Daphnia magna explaining hormetic effects of selective serotonin reuptake inhibitors and 4-nonylphenol using transcriptomic and phenotypic responses

    Environ. Sci. Technol

    (2013)
  • F. Puig-Castellví et al.

    A quantitative 1H NMR approach for evaluating the metabolic response of Saccharomyces cerevisiae to mild heat stress

    Metabolomics

    (2015)
  • J.M. Halket

    Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS

    J. Exp. Bot

    (2004)
  • K. Dettmer et al.

    Mass spectrometry-based metabolomics

    Mass Spectrom. Rev

    (2007)
  • J.C. Lindon et al.

    Peer reviewed: so what's the deal with metabonomics?

    Anal. Chem

    (2003)
  • R.J.M. Weber et al.

    Characterization of isotopic abundance measurements in high resolution FT-ICR and Orbitrap mass spectra for improved confidence of metabolite identification

    Anal. Chem

    (2011)
  • I.D. Wilson et al.

    High resolution “ultra performance” liquid chromatography coupled to oa-TOF mass spectrometry as a tool for differential metabolic pathway profiling in functional genomic studies

    J. Proteome Res

    (2005)
  • P.J. Weaver et al.

    Investigation of the advanced functionalities of a hybrid quadrupole orthogonal acceleration time-of-flight mass spectrometer

    Rapid Commun. Mass Spectrom

    (2007)
  • S.C. Brown et al.

    Metabolomics applications of FT-ICR mass spectrometry

    Mass Spectrom. Rev

    (2005)
  • A. Koulman et al.

    High-resolution extracted ion chromatography, a new tool for metabolomics and lipidomics using a second-generation orbitrap mass spectrometer

    Rapid Commun. Mass Spectrom

    (2009)
  • E. Rathahao-Paris et al.

    High resolution mass spectrometry for structural identification of metabolites in metabolomics

    Metabolomics

    (2015)