Data analysis strategies for targeted and untargeted LC-MS metabolomic studies: Overview and workflow
Introduction
Metabolomics [1], [2], [3] is one of the categorical platforms that constitute omics [4] (see Fig. 1). Omics is a field devoted to studying the abundance and/or structural characterization of a broad range of molecules in organisms under distinct scenarios. In the clinical field, high-throughput omic technologies are used to characterize diseases, to better predict their clinical course, and to evaluate the efficacy of existing or under-development therapies [5]. In food science, omics plays a significant role in improving human nutrition [6]. In the environmental field, omic studies aim to evaluate the alterations that organisms may undergo after exposure to environmental stressors [7], [8].
In all cases, the expressed molecules are involved in the most crucial biological processes and mainly comprise deoxyribonucleic acid (DNA) (genomics [9], epigenomics [10]), ribonucleic acid (RNA) (transcriptomics [11]), proteins (proteomics [12]), and other small molecules (metabolomics [1], [2], [3]). In more recent years, another categorical omic platform has gained relevance: fluxomics [13], [14], which studies the fluxome, that is, the total set of fluxes in the metabolic network of a biological specimen. Apart from these categorical omic platforms, a variety of omic subdisciplines have also emerged (e.g., lipidomics [15], glycomics [16], foodomics [6], [17], interactomics [18], and metallomics [19]), showing that omics is a constantly evolving discipline. Among all these platforms, metabolomics is becoming increasingly popular and is used to detect the perturbations that disease, drugs or toxins may cause in the concentrations and fluxes of metabolites involved in key biochemical pathways [20]. Given this relevance, the current study concentrates on metabolomic data.
Several analytical techniques have been developed for each of the omic platforms (see Fig. 1), including DNA microarray-based and RNA-sequencing techniques [21], nuclear magnetic resonance (NMR) spectroscopy [22], [23] and mass spectrometry (MS) methods [24], [25]. In metabolomics, NMR and MS are the most popular techniques. High-resolution proton NMR spectroscopy (1H-NMR) has proved to be one of the most powerful technologies for examining biofluids and studying intact tissues, producing a comprehensive profile of metabolite signals without the need for separation, derivatization, or preselected measurement parameters [26], [27]. MS methods, either by direct injection [28] or coupled to chromatographic techniques [29], have also evolved into a powerful technology for metabolomics because of their ability to analyse low molecular weight compounds in biological systems. The two approaches (i.e., NMR and MS) are complementary, and integrating them to obtain more comprehensive information is now pursued in the metabolomics field. Nevertheless, this study concentrates on MS-based metabolomic data.
Concerning MS instrumentation, high-resolution mass spectrometers are the most powerful analysers because they provide accurate mass determination. In fact, spectrometers such as time-of-flight (TOF) [30], quadrupole time-of-flight (Q-TOF) [31], Fourier transform ion cyclotron resonance (FT-ICR) [32] and orbital ion trap [33] instruments have in many cases replaced conventional low-resolution quadrupoles and linear ion traps (IT), owing to their ability to resolve isomeric and isobaric species and to elucidate elemental composition [34]. Regarding chromatographic techniques, early metabolomic studies were commonly based on gas chromatography (GC), since it is a highly efficient, sensitive and reproducible technique [35]. However, GC has the drawback that only volatile compounds, or compounds made volatile by derivatisation, can be analysed, and extensive sample preparation is often required. In contrast, high-performance liquid chromatography (HPLC) and ultra high-performance liquid chromatography (UHPLC) are considered more comprehensive than GC, since they allow a wider range of metabolites to be analysed without derivatisation [36], [37], [38], [39]. Hence, liquid chromatography coupled to mass spectrometry (LC-MS) has lately gained popularity in the metabolomics field at the expense of gas chromatography coupled to mass spectrometry (GC-MS), which is why this study focuses on the former technique.
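To make the link between accurate mass, elemental composition and isobaric species more concrete, the following minimal sketch computes a monoisotopic mass from a formula, the mass error in parts per million (ppm), and the approximate resolving power needed to separate two isobars. It is an illustration under stated assumptions, not a method taken from the cited works: the function names, the "measured" value, and the CO/N2 pair (a textbook example of nominally isobaric species) are all hypothetical choices, and resolving power is taken simply as m/Δm.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C6H12O6'.

    Assumption: no brackets, charges or isotope labels in the formula.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def required_resolving_power(mass_a: float, mass_b: float) -> float:
    """Approximate resolving power (m / delta m) needed to separate two masses."""
    return ((mass_a + mass_b) / 2) / abs(mass_a - mass_b)


glucose = monoisotopic_mass("C6H12O6")
error = ppm_error(180.0640, glucose)  # 180.0640 is a made-up measured value
r_needed = required_resolving_power(
    monoisotopic_mass("CO"), monoisotopic_mass("N2")
)

print(f"C6H12O6 monoisotopic mass: {glucose:.5f} u")
print(f"Mass error of hypothetical measurement: {error:.2f} ppm")
print(f"Resolving power needed for CO vs N2: {r_needed:.0f}")
```

In this sketch, a small ppm error narrows the list of candidate elemental compositions, while the resolving power estimate indicates why nominally isobaric species can overlap on low-resolution instruments.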
The improvement of analytical techniques has gradually caused metabolomic data sets to become larger and to acquire more intricate inner structures [40]. Mass spectrometry-based techniques generate highly complex data because of the vast number of measurements (i.e., an MS spectrum at each retention time) relative to the number of observations (i.e., samples). In the case of LC-MS analysis (see Fig. 1), the data generated from each chromatogram are arranged in data sets containing mass-to-charge (m/z) values, retention times and intensities. Hence, massive amounts of information-rich MS data are generated in the analysis of every sample, which requires specific standard approaches for their study and interpretation [41].
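The arrangement of LC-MS data into m/z values, retention times and intensities can be sketched as follows. This is a minimal illustration, assuming each scan is already available as a retention time plus a list of (m/z, intensity) pairs; the `Scan` class, the `bin_scans` function, the bin width and the toy values are hypothetical and do not correspond to any specific software discussed in this review.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scan:
    rt: float                         # retention time (e.g., in seconds)
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def bin_scans(scans: List[Scan], mz_min: float, mz_max: float,
              bin_width: float) -> List[List[float]]:
    """Arrange scans in a matrix: one row per retention time, one column per m/z bin."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    matrix = []
    for scan in scans:
        row = [0.0] * n_bins
        for mz, intensity in scan.peaks:
            if mz_min <= mz < mz_max:
                # Clamp the index to guard against floating-point edge cases.
                idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
                row[idx] += intensity
        matrix.append(row)
    return matrix


# Toy data: two scans with a handful of made-up peaks.
scans = [
    Scan(rt=10.0, peaks=[(100.2, 50.0), (101.7, 20.0)]),
    Scan(rt=11.0, peaks=[(100.4, 80.0), (100.6, 5.0), (102.1, 10.0)]),
]
matrix = bin_scans(scans, mz_min=100.0, mz_max=103.0, bin_width=1.0)

# Total ion chromatogram: summed intensity at each retention time.
tic = [sum(row) for row in matrix]
print(matrix)
print(tic)
```

With real data, the number of rows (scans) and columns (m/z bins) grows quickly with chromatographic run time and mass resolution, which illustrates why each sample yields such a large data set.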
In general terms, data analysis strategies are classified into two groups: strategies for targeted (Fig. 2) and for untargeted (Fig. 3) metabolomic studies. This distinction stems from the different types of data generated by the two approaches, which must be handled accordingly. Targeted studies [42] focus on a set of known metabolites, whereas untargeted studies [43] allow a more comprehensive evaluation of metabolomic profiles. Most of the methodologies used in early targeted studies allowed the identification of only a small number of metabolites [44]. Recent targeted methodologies, in contrast, enable large-scale metabolic profiling that includes hundreds of compounds [45], [46], [47]. The number of compounds analysed in untargeted studies is even larger, because entire data sets containing thousands of metabolite signals must be processed, of which only a few are finally identified as candidate biomarkers [48]. Therefore, data analysis strategies for untargeted studies require extensive processing of LC-MS chromatograms. A large number of data analysis strategies can be found in the literature, but none of them can be singled out as the optimal choice in all cases, which makes data analysis an open task in bioinformatics research. In fact, the field of MS-based metabolomics is rather young, and new methods, software and platforms are regularly published or updated [49], [50].
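The practical difference between the two approaches can be hinted at with a simplified sketch of a targeted-style step: extracting the signal of one known m/z value from every scan within a ppm tolerance, whereas an untargeted workflow would have to process all signals. This is only an illustration under assumptions; the function name, the scan representation as (retention time, peak list) tuples, the target m/z and the tolerance are hypothetical, and actual targeted studies often rely on dedicated acquisition modes rather than on this kind of post hoc extraction.

```python
from typing import List, Tuple

# Each scan: (retention time, list of (m/z, intensity) pairs).
ScanData = Tuple[float, List[Tuple[float, float]]]


def extract_ion_chromatogram(scans: List[ScanData], target_mz: float,
                             tol_ppm: float) -> List[Tuple[float, float]]:
    """Sum, for each scan, the intensities whose m/z lies within tol_ppm of target_mz."""
    half_width = target_mz * tol_ppm / 1e6
    eic = []
    for rt, peaks in scans:
        intensity = sum(
            (i for mz, i in peaks if abs(mz - target_mz) <= half_width), 0.0
        )
        eic.append((rt, intensity))
    return eic


# Toy scans with made-up peaks; 180.0634 is a hypothetical target m/z.
scans = [
    (10.0, [(180.0630, 100.0), (180.0900, 40.0)]),
    (11.0, [(180.0640, 250.0)]),
    (12.0, [(181.0000, 30.0)]),
]
eic = extract_ion_chromatogram(scans, target_mz=180.0634, tol_ppm=10.0)
print(eic)
```

Only the signals close to the predetermined m/z contribute to the chromatogram; everything else in the scans is ignored, which is precisely the information an untargeted strategy would need to retain and process.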
A recent review by Yi et al. [51] summarizes recent and potential advances in chemometric methods for data processing in untargeted metabolomic studies. Various aspects, including raw data pre-processing, metabolite identification, and variable selection and modeling, are thoroughly discussed there. The present review complements it by covering data analysis steps that were not covered, or only partially covered, in that work (e.g., data acquisition, data storage and conversion, data import, data compression and feature detection or peak resolution), by presenting novel and little-known chemometric tools for data analysis, and by including an overview of data analysis strategies for targeted studies. Moreover, it is intended to contribute to the state of the art by providing comprehensive information on bioanalytical and data processing tools rather than describing the principles of the chemometric methods that can be used in LC-MS metabolomic data analysis.
General overview of the data analysis approaches
LC-MS metabolomic data analysis strategies are primarily designed for either targeted or untargeted studies. However, future advances in LC-MS metabolomics may lead to a merging of targeted and untargeted analyses, with the targeted approach providing more sensitive and accurate detection of predetermined metabolites and the untargeted approach enabling the detection and identification of unknown metabolites [52]. Indeed, first steps in this direction were made by Savolainen et al. [53], who collected for the
The data analysis workflow for targeted and untargeted metabolomic studies
This section provides details of the steps involved in data analysis workflows for targeted and untargeted studies (highlighting common aspects), and finishes with a common explanation of the biochemical interpretation for both approaches.
LC-MS metabolomic data analysis: an active area in bioinformatics research
The development of tools for data analysis is an active area of bioinformatics research. Recent years have witnessed the development of many software tools for data analysis, but there is still a need to improve the data analysis pipeline. Such improvement should concentrate on two aspects: the combination of data analysis strategies and the fusion of distinct omic fields.
The combination of various data analysis strategies is necessary to allow a more comprehensive detection of chemical
Concluding remarks
From a general point of view, we can conclude that the complexity of LC-MS metabolomic data and the diversity of strategies used for their processing make data analysis an open field in bioinformatics research. In global terms, targeted strategies allow highly sensitive and accurate detection of predetermined metabolites whereas untargeted strategies are valuable for the detection of unknown metabolites and biochemical pathways. However, both approaches are complementary and can
Acknowledgements
The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007–2013) / ERC Grant Agreement n. 320737. The first author acknowledges the Spanish Government (Ministerio de Educación, Cultura y Deporte) for a predoctoral FPU scholarship (FPU13/04384).
References (201)
- et al.
Global analyses of cellular lipidomes directly from crude extracts of biological samples by ESI mass spectrometry: a bridge to lipidomics
J. Lipid Res
(2003)
- et al.
NMR-based plant metabolomics: where do we stand, where do we go?
Trends Biotechnol
(2011)
- et al.
High resolution proton magnetic resonance spectroscopy of biological fluids
Prog. Nucl. Magn. Reson. Spectrosc
(1989)
- et al.
HPLC-MS-based methods for the study of metabonomics
J. Chromatogr. B. Analyt Technol Biomed Life Sci
(2005)
- et al.
Analytical strategies for LC-MS-based targeted metabolomics
J. Chromatogr. B. Analyt Technol Biomed Life Sci
(2008)
- et al.
Potential of fermentation profiling via rapid measurement of amino acid metabolism by liquid chromatography–tandem mass spectrometry
J. Chromatogr. A
(2004)
- et al.
Spatio-temporal distribution and natural variation of metabolites in citrus fruits
Food Chem
(2016)
- et al.
Algorithms and tools for the preprocessing of LC–MS metabolomics data
Chemometr. Intell. Lab
(2011)
- et al.
Chemometric methods in data processing of mass spectrometry-based metabolomics: a review
Anal. Chim. Acta
(2016)
- et al.
MRMer, an interactive open source and cross-platform system for data extraction and visualization of multiple reaction monitoring experiments
Mol. Cell. Proteomics
(2008)
Specific sphingolipid content decrease in Cerkl knockdown mouse retinas
Exp. Eye Res
Spectral accuracy of molecular ions in an LTQ/Orbitrap mass spectrometer and implications for elemental composition determination
J. Am. Soc. Mass Spectrom
Targeted metabolomics and mass spectrometry
Adv. Protein Chem. Struct. Biol
Simultaneous screening for 238 drugs in blood by liquid chromatography–ionspray tandem mass spectrometry with multiple-reaction monitoring
J. Chromatogr. B
Separation and quantitation of water soluble cellular metabolites by hydrophilic interaction chromatography-tandem mass spectrometry
J. Chromatogr. A
Data processing for mass spectrometry-based metabolomics
J. Chromatogr. A
Quantitative analysis of the microbial metabolome by isotope dilution mass spectrometry using uniformly 13C-labeled cell extracts as internal standards
Anal. Biochem
A high-performance liquid chromatography-tandem mass spectrometry method for quantitation of nitrogen-containing intracellular metabolites
J. Am. Soc. Mass Spectrom
Isotope ratio-based profiling of microbial folates
J. Am. Soc. Mass Spectrom
Comparing two metabolic profiling approaches (liquid chromatography and gas chromatography coupled to mass spectrometry) for extra-virgin olive oil phenolic compounds analysis: a botanical classification perspective
J. Chromatogr. A
Metabolite profiling for plant functional genomics
Nat. Biotechnol
Metabolomics – the link between genotypes and phenotypes
Plant Mol. Biol
Innovation: metabolomics: the apogee of the omics trilogy
Nat. Rev. Mol. Cell Biol
Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers
Environ. Mol. Mutagen
Criteria for the use of omics-based predictors in clinical trials: explanation and elaboration
BMC Med
Foodomics: a new comprehensive approach to food and nutrition
Genes Nutr
Environmental metabolomics: a critical review and future perspectives
Metabolomics
Mass spectrometry based environmental metabolomics: a primer and review
Metabolomics
Complementary DNA sequencing: expressed sequence tags and human genome project
Science
Epigenomics: beyond CpG islands
Nat. Rev. Genet
Proteomics, transcriptomics: what's in a name?
Nature
Proteome and proteomics: new technologies, new concepts, and new words
Electrophoresis
Fluxomics – connecting ‘omics analysis and phenotypes
Environ. Microbiol
Metabolomics and fluxomics approaches
Essays Biochem
Emerging glycomics technologies
Nat. Chem. Biol
Foodomics: MS-based strategies in modern food science and nutrition
Mass Spectrom. Rev
Integrating multiple “omics” analysis for microbial biology: application and methodologies
Microbiology
Chromium interactions in plants: current status and future strategies
Metallomics
Metabonomics: a platform for studying drug toxicity and gene function
Nat. Rev. Drug Discov
Identification of metabolic pathways in Daphnia magna explaining hormetic effects of selective serotonin reuptake inhibitors and 4-nonylphenol using transcriptomic and phenotypic responses
Environ. Sci. Technol
A quantitative 1H NMR approach for evaluating the metabolic response of Saccharomyces cerevisiae to mild heat stress
Metabolomics
Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS
J. Exp. Bot
Mass spectrometry-based metabolomics
Mass Spectrom. Rev
Peer reviewed: so what's the deal with metabonomics?
Anal. Chem
Characterization of isotopic abundance measurements in high resolution FT-ICR and Orbitrap mass spectra for improved confidence of metabolite identification
Anal. Chem
High resolution “ultra performance” liquid chromatography coupled to oa-TOF mass spectrometry as a tool for differential metabolic pathway profiling in functional genomic studies
J. Proteome Res
Investigation of the advanced functionalities of a hybrid quadrupole orthogonal acceleration time-of-flight mass spectrometer
Rapid Commun. Mass Spectrom
Metabolomics applications of FT-ICR mass spectrometry
Mass Spectrom. Rev
High-resolution extracted ion chromatography, a new tool for metabolomics and lipidomics using a second-generation orbitrap mass spectrometer
Rapid Commun. Mass Spectrom
High resolution mass spectrometry for structural identification of metabolites in metabolomics
Metabolomics