Supporting metabolomics with adaptable software: design architectures for the end-user

doi:10.1016/j.copbio.2016.11.001

Current Opinion in Biotechnology

Volume 43, February 2017, Pages 110-117


Highlights

• Metabolomics needs sustained activity in software development for data processing.
• Tools are available that allow rapid development of new software by the end-user.
• Large amounts of content are available for reuse and repurposing.
• Developers are encouraged to design code for reuse by the metabolomics community.

Large and disparate sets of LC–MS data are generated by modern metabolomics profiling initiatives, and while useful software tools are available to annotate and quantify compounds, the field requires continued software development in order to sustain methodological innovation. Advances in software development practices allow for a new paradigm in tool development for metabolomics, where increasingly the end-user can develop or redeploy utilities ranging from simple algorithms to complex workflows. Resources that provide an organized framework for development are described and illustrated with LC–MS processing packages that have leveraged their design tools. Full access to these resources depends in part on coding experience, but the emergence of workflow builders and pluggable frameworks strongly reduces the skill level required. Developers in the metabolomics community are encouraged to use these resources and design content for uptake and reuse.

Introduction

Metabolomics has an established role in the molecular-level phenotyping of complex biological samples, as arguably the ‘omics’ with the greatest sensitivity to underlying biological change. As part of wider multiomics initiatives, metabolomics contributes to a better understanding of how biological processes are integrated. Long-term benefits of this understanding will include new routes to therapy for complex disease states such as cancer. Ultimately, metabolomics will serve an important clinical role through the early detection of disease and the monitoring of treatment progression [1, 2].

Mass spectrometry is the main technology used for profiling metabolomes, and the most comprehensive cataloguing efforts rely on LC–MS systems as key data providers. These are supported by informatics packages designed to annotate and quantify metabolites from the MS data, assisted by external libraries and data sources. Systems development in metabolomics is unfinished business [3]. The structural diversity of metabolites is enormous, and current LC–MS technologies struggle to provide data rich enough to generate unambiguous identifications [4]. In addition, metabolite concentrations span an extremely wide dynamic range, and chemical noise can be difficult to distinguish from signal. The bioanalytical community is aggressively pursuing new concepts and methods to address these challenges, which places continuous pressure on bioinformatics packages to keep up.

Excellent and extensive reviews already provide an up-to-date evaluation of established resources for LC–MS data processing and annotation in metabolomics [5, 6], so we do not re-evaluate these packages here. Instead, this article focuses on informatics resources designed for developing new LC–MS software tools and for organizing innovative informatics content spread over multiple disciplines. One could argue that an explosion of tools is unfortunate for the end-user simply looking for the ‘best’ package, but the sheer variety of applications demands sustained software innovation. Current development resources and practices now make it possible to equip the end-user for high-quality bespoke software production.

Section snippets

Winds of change: standardizing LC–MS software development

The open data concept provides community access to information that is expensive to generate. In metabolomics, public data can be mined to create better compound libraries and to support more extensive biomarker discovery initiatives. Important strides have been made in developing standards for data and metadata (MSI and ISA-Tab) [7, 8], and in establishing repositories such as MetaboLights to promote curation [9]. Producing standards during periods of active innovation …

R/Bioconductor — XCMS

XCMS is the current standard for processing LC–MS metabolomics data, particularly untargeted data [21]. It performs feature detection, chromatographic alignment and a limited degree of statistical analysis on the peak and group tables that the package generates. XCMS is written in the R statistical programming language and is command-line driven. It can be extended by capitalizing on the natural attributes of R and on the resources available through Bioconductor [22]. Bioconductor is …
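The peak and group tables referred to above can be illustrated with a small sketch: peaks detected in individual samples are grouped across samples by m/z and retention-time tolerance, and each group becomes one row of a feature table holding one intensity per sample. This is a deliberately simplified greedy grouping written in Python, not the density-based grouping XCMS itself uses, and it omits retention-time alignment entirely; `Peak`, `group_peaks`, `feature_table` and the tolerance defaults are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """One chromatographic peak detected in one sample (hypothetical record type)."""
    sample: str        # sample (data file) the peak was detected in
    mz: float          # m/z at the peak apex
    rt: float          # retention time at the apex, in seconds
    intensity: float   # integrated or apex intensity


def group_peaks(peaks, mz_tol=0.01, rt_tol=10.0):
    """Greedily group peaks from all samples into cross-sample features.

    Each peak (visited in m/z order) joins the first existing group whose
    mean m/z and mean retention time are both within tolerance; otherwise
    it starts a new group. Simplified stand-in, not XCMS's density method.
    """
    groups = []
    for peak in sorted(peaks, key=lambda p: (p.mz, p.rt)):
        for group in groups:
            mean_mz = sum(p.mz for p in group) / len(group)
            mean_rt = sum(p.rt for p in group) / len(group)
            if abs(peak.mz - mean_mz) <= mz_tol and abs(peak.rt - mean_rt) <= rt_tol:
                group.append(peak)
                break
        else:  # no group matched: this peak starts a new feature
            groups.append([peak])
    return groups


def feature_table(groups, samples):
    """Build a groups table: one row per feature, one intensity column per sample.

    Samples with no peak in a group get 0.0; if a sample contributes several
    peaks to one group, the largest intensity is kept.
    """
    table = []
    for group in groups:
        mean_mz = sum(p.mz for p in group) / len(group)
        mean_rt = sum(p.rt for p in group) / len(group)
        per_sample = {s: 0.0 for s in samples}
        for p in group:
            per_sample[p.sample] = max(per_sample[p.sample], p.intensity)
        table.append((mean_mz, mean_rt, [per_sample[s] for s in samples]))
    return table


if __name__ == "__main__":
    peaks = [
        Peak("A", 180.0634, 312.0, 1.2e5),
        Peak("B", 180.0641, 315.5, 9.8e4),
        Peak("A", 180.0638, 610.0, 4.0e4),  # same m/z, different RT: separate feature
        Peak("B", 255.2330, 845.0, 2.1e5),
    ]
    for mz, rt, intensities in feature_table(group_peaks(peaks), ["A", "B"]):
        print(f"{mz:.4f}  {rt:7.1f}  {intensities}")
```

In this example the two peaks near m/z 180.064 and 313 s collapse into one feature, while the peak at the same m/z but 610 s stays separate; a real package would also correct retention-time drift before grouping.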

Workflow builders

R-based tools require programming skill to build complex, multi-step processing routines. At the other end of the development spectrum, one strategy for recycling existing tools applies data-mining resources created specifically for workflow generation. These workflow builders have found use in many disciplines and industries interested in big data and analytics. Workflow builders are carefully designed for ease of configuration and sharing, with little coding skill …
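Workflow builders turn the processing routine itself into data: reusable nodes are configured and chained, and the resulting definition can be saved and shared. A minimal sketch of that idea follows, assuming a linear chain of nodes described in JSON. Real builders such as KNIME use graphical editors, branching graphs and their own workflow formats; the node names here (`filter_intensity`, `sort_by`, `top_n`) and the JSON layout are hypothetical.

```python
import json

# Registry mapping node-type names to the functions that implement them.
NODE_TYPES = {}


def node(name):
    """Decorator that registers a function as a named, reusable workflow node."""
    def register(func):
        NODE_TYPES[name] = func
        return func
    return register


@node("filter_intensity")
def filter_intensity(rows, min_intensity):
    # Drop features below an intensity threshold.
    return [r for r in rows if r["intensity"] >= min_intensity]


@node("sort_by")
def sort_by(rows, key, descending=False):
    # Order features by any column, e.g. intensity or retention time.
    return sorted(rows, key=lambda r: r[key], reverse=descending)


@node("top_n")
def top_n(rows, n):
    # Keep the first n features.
    return rows[:n]


def run_workflow(definition, rows):
    """Run the steps of a JSON workflow definition in order, piping data through."""
    spec = json.loads(definition)
    for step in spec["steps"]:
        if step["node"] not in NODE_TYPES:
            raise ValueError(f"unknown node type: {step['node']!r}")
        rows = NODE_TYPES[step["node"]](rows, **step.get("params", {}))
    return rows


# The workflow itself is plain data: it can be edited, saved and shared
# without touching the node implementations above.
WORKFLOW = json.dumps({
    "name": "most intense features",
    "steps": [
        {"node": "filter_intensity", "params": {"min_intensity": 1e4}},
        {"node": "sort_by", "params": {"key": "intensity", "descending": True}},
        {"node": "top_n", "params": {"n": 2}},
    ],
})
```

Applied to a list of feature dicts with `mz` and `intensity` keys, `run_workflow(WORKFLOW, rows)` returns the two most intense features above an intensity of 10^4. Sharing the routine then amounts to sharing the JSON string, and a user can change a threshold without writing any code.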

Developing in the general-purpose compiled languages

Interpreted languages like R are powerful within their specialty but lack the design flexibility and performance options of compiled languages and their associated frameworks. The latter give access to a wide variety of tools for creating new software, and go one step further than workflow builders such as KNIME by offering customizable content. Recent developments offer powerful resources both for the skilled developer and for the user with minimal coding experience.
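The customizable content that pluggable frameworks offer usually rests on an extension point: the host application defines an interface, and independently written plugins implement it and register themselves. The sketch below shows that pattern in Python for brevity; in a compiled-language framework the interface would typically be an interface or abstract class in Java or a similar language. `PeakDetector`, `register`, `create_detector` and the local-maximum plugin are hypothetical names for illustration, not the API of MZmine 2, Maltcms or any other package mentioned here.

```python
from abc import ABC, abstractmethod

# Registry that the host application consults at run time.
_DETECTORS = {}


def register(cls):
    """Class decorator: make a plugin discoverable by its `name` attribute."""
    _DETECTORS[cls.name] = cls
    return cls


class PeakDetector(ABC):
    """Extension point: every peak-detection plugin implements this interface."""

    name = "abstract"

    @abstractmethod
    def detect(self, intensities):
        """Return the indices of peak apexes in an intensity trace."""


@register
class LocalMaximumDetector(PeakDetector):
    """Example plugin: strict local maxima above a minimum height."""

    name = "local_max"

    def __init__(self, min_height=0.0):
        self.min_height = min_height

    def detect(self, intensities):
        # An apex is higher than both neighbours and at least min_height.
        return [
            i
            for i in range(1, len(intensities) - 1)
            if intensities[i - 1] < intensities[i] > intensities[i + 1]
            and intensities[i] >= self.min_height
        ]


def create_detector(name, **options):
    """Host side: instantiate a registered plugin by name, with user options."""
    if name not in _DETECTORS:
        raise ValueError(f"no peak detector registered as {name!r}")
    return _DETECTORS[name](**options)
```

Because the host depends only on the abstract `PeakDetector` interface, a new detection algorithm can be dropped in by writing one more registered class, without modifying the host code; this separation is what lets end-users contribute content to such a framework.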

Others

A review of this size cannot give full technical attention to the other extensible software in the mass spectrometry community. Ultimately, any code is extensible. We have selected the foregoing to survey the range of approaches to the wider problem of flexible software development by the end-user. Several other initiatives that could benefit the metabolomics community deserve brief mention, however. Maltcms [47] is in the same class as MZmine 2 for Java …

Conclusions and perspective

Our survey illustrates that the MS-focused bioinformatics community no longer needs to begin software development ‘from scratch’. It is not our intention to debate the best technologies for integrating the enormous algorithmic knowledgebase available to mass spectrometry. It is more important to recognize that tool integration in a standardized and robust environment is both possible and necessary, and multiple integration strategies are useful at this stage. The sharing of tools between …

Acknowledgements

This work was supported by the University of Calgary and by a Discovery grant (# 298351-2010) to DCS from the Natural Sciences and Engineering Research Council of Canada (NSERC). DCS acknowledges the additional support of the Canada Research Chair program, Alberta Ingenuity — Health Solutions and the Canada Foundation for Innovation.

References (51)

X. Xu et al. Metabolomic profile for the early detection of coronary artery disease by using UPLC–QTOF/MS. J Pharm Biomed Anal (2016).
N.G. Mahieu et al. A roadmap for the XCMS family of software solutions in metabolomics. Curr Opin Chem Biol (2016).
L.C. Crosswell et al. ELIXIR: a distributed infrastructure for European biological data. Trends Biotechnol (2012).
R.C. Gentleman et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol (2004).
R. Pallares-Mendez et al. Metabolomics in diabetes, a review. Ann Med (2016).
I. Kohler et al. Analytical pitfalls and challenges in clinical metabolomics. Bioanalysis (2016).
T. Huan et al. MyCompoundID MS/MS search: metabolite identification using a library of predicted fragment-ion-spectra of 383,830 possible human metabolites. Anal Chem (2015).
B.B. Misra et al. Updates in metabolomics tools and resources: 2014–2015. Electrophoresis (2016).
S.A. Sansone et al. The metabolomics standards initiative. Nat Biotechnol (2007).
S.A. Sansone et al. The first RSBI (ISA-TAB) workshop: “can a simple format work for complex studies?”. OMICS (2008).
N.S. Kale et al. MetaboLights: an open-access database repository for metabolomics data. Curr Protoc Bioinform (2016).
P. Rocca-Serra et al. Data standards can boost metabolomics research, and if there is a will, there is a way. Metabolomics (2016).
C.A. Ball. Are we stuck in the standards? Nat Biotechnol (2006).
V.J. Henry et al. OMICtools: an informative directory for multi-omic data analysis. Database (Oxford) (2014).
J. Ison et al. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics (2013).
J. Malone et al. The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation. J Biomed Semant (2014).
G. Wilson. Software carpentry: lessons learned. F1000Res (2014).
L. Martens et al. mzML — a community standard for mass spectrometry data. Mol Cell Proteomics (2011).
J.D. Holman et al. Employing ProteoWizard to convert raw mass spectrometry data. Curr Protoc Bioinform (2014).
D. Kessner et al. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics (2008).
Y. Perez-Riverol et al. Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective. Biochim Biophys Acta (2014).
C.A. Smith et al. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem (2006).
J. Xia et al. MetaboAnalyst 3.0 — making metabolomics more meaningful. Nucleic Acids Res (2015).
W.M. Edmands et al. MetMSLine: an automated and fully integrated pipeline for rapid processing of high-resolution LC–MS metabolomic datasets. Bioinformatics (2015).
J.T. Prince et al. Chromatographic alignment of ESI-LC–MS proteomics data sets by ordered bijective interpolated warping. Anal Chem (2006).
