Elsevier

Current Opinion in Biotechnology

Volume 43, February 2017, Pages 110-117

Supporting metabolomics with adaptable software: design architectures for the end-user

https://doi.org/10.1016/j.copbio.2016.11.001

Highlights

  • Metabolomics needs sustained activity in software development for data processing.

  • Tools are available that allow rapid development of new software by the end-user.

  • Large amounts of content are available for reuse and repurposing.

  • Developers are encouraged to design code for reuse by the metabolomics community.

Large and disparate sets of LC–MS data are generated by modern metabolomics profiling initiatives, and while useful software tools are available to annotate and quantify compounds, the field requires continued software development in order to sustain methodological innovation. Advances in software development practices allow for a new paradigm in tool development for metabolomics, where increasingly the end-user can develop or redeploy utilities ranging from simple algorithms to complex workflows. Resources that provide an organized framework for development are described and illustrated with LC–MS processing packages that have leveraged their design tools. Full access to these resources depends in part on coding experience, but the emergence of workflow builders and pluggable frameworks strongly reduces the skill level required. Developers in the metabolomics community are encouraged to use these resources and design content for uptake and reuse.

Introduction

Metabolomics has an established role in the molecular-level phenotyping of complex biological samples, as arguably the ‘omics’ with the greatest sensitivity to underlying biological change. As part of wider multiomics initiatives, metabolomics contributes a better understanding of how biological processes integrate. Long-term benefits of this understanding will include new routes to therapy for complex disease states like cancer. Metabolomics will ultimately serve an important clinical role through the early detection of disease and the monitoring of treatment progression [1, 2].

Mass spectrometry is the main technology used for profiling metabolomes. The most comprehensive cataloguing efforts involve LC–MS systems as key data providers. These are supported by informatics packages designed to annotate and quantify metabolites from the MS data, assisted by external libraries and data sources. Systems development in metabolomics remains unfinished business [3]. The structural diversity of metabolites is enormous, and current LC–MS technologies struggle to offer data sufficiently rich to generate unambiguous identifications [4]. In addition, the dynamic range of metabolite concentration is extremely high and chemical noise can be difficult to distinguish from signal. The bioanalytical community is aggressively pursuing new concepts and methods to address these challenges, which places continuous pressure on bioinformatics packages to keep up.

There are excellent and extensive reviews that provide an up-to-date evaluation of established resources for LC–MS data processing and annotation in metabolomics [5, 6]. There is no further need to evaluate these packages. In this article, we will focus instead on informatics resources designed to develop new LC–MS software tools and organize innovative informatics content spread over multiple disciplines. One could argue that an explosion of tools is unfortunate for the end-user simply looking for the ‘best’ package, but the sheer variety of applications demands sustained software innovation. Providing the end-user with the necessary resources for high-quality bespoke software production is now possible with available development resources and practices.

Section snippets

Winds of change: standardizing LC–MS software development

The open data concept provides community access to information that is expensive to generate. In the context of metabolomics, public data can be mined for the creation of better compound libraries, and used to support more extensive biomarker discovery initiatives. Important strides have been made in developing standards for data and metadata (MSI and ISA-Tab) [7, 8], and establishing repositories like Metabolights to promote curation [9]. Producing standards during periods of active innovation

R/Bioconductor – XCMS

XCMS is the current standard for processing LC–MS metabolomics data, particularly of the untargeted variety [21]. It applies feature detection algorithms, chromatographic alignment and a degree of statistical analysis on the peaks and groups tables generated by the package. XCMS is written in the R statistical programming language and is command-line driven. It can be extended by capitalizing on the natural attributes of R, and the resources available through Bioconductor [22]. Bioconductor is

Workflow builders

R-based tools require programming skill in order to build complex, multi-step processing routines. On the other end of the development spectrum, a strategy for recycling existing tools involves the application of data-mining resources specifically created for workflow generation. These workflow builders have found use in many disciplines and industries interested in big data and analytics. Workflow builders are carefully designed for ease of configuration and sharing, with little coding skill

Developing in the general purpose compiled languages

Interpreted languages like R are powerful for their targeted specialty but lack the design flexibility and performance options of compiled languages and their associated frameworks. The latter can access a wide variety of tools for creating new software, and go one step further than workflow builders like KNIME by offering customizable content. Recent developments offer powerful resources for both the skilled developer and the user with minimal code development experience.
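
The idea of a pluggable framework can be illustrated with a minimal sketch. It is written in Python purely for brevity, although the paragraph above concerns compiled languages; the pattern itself is language-independent. All names here (Peak, register, run_workflow and the two example steps) are hypothetical and do not correspond to the API of any package discussed in this article. The sketch assumes that each processing step takes a list of peaks and returns a new list, so that steps contributed by different developers can be chained by name without changes to the core.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Peak:
    """Hypothetical minimal LC-MS feature: m/z, retention time (s), intensity."""
    mz: float
    rt: float
    intensity: float


# A processing step maps a list of peaks to a new list of peaks.
Step = Callable[[List[Peak]], List[Peak]]

# Central registry: the framework core only knows steps by name.
_REGISTRY: Dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Decorator that adds a step to the registry under a unique name."""
    def decorator(func: Step) -> Step:
        if name in _REGISTRY:
            raise ValueError(f"step already registered: {name}")
        _REGISTRY[name] = func
        return func
    return decorator


@register("intensity_filter")
def intensity_filter(peaks: List[Peak]) -> List[Peak]:
    # Illustrative fixed cutoff; a real tool would make this configurable.
    return [p for p in peaks if p.intensity >= 1000.0]


@register("sort_by_rt")
def sort_by_rt(peaks: List[Peak]) -> List[Peak]:
    # Returns a new list ordered by retention time; the input is not modified.
    return sorted(peaks, key=lambda p: p.rt)


def run_workflow(step_names: List[str], peaks: List[Peak]) -> List[Peak]:
    """Apply the named steps in order; unknown names raise KeyError."""
    for name in step_names:
        if name not in _REGISTRY:
            raise KeyError(f"unknown step: {name}")
        peaks = _REGISTRY[name](peaks)
    return peaks
```

Under these assumptions, a user with little coding experience would only assemble a list of step names, for example run_workflow(["intensity_filter", "sort_by_rt"], peaks), while a developer extends the system by registering a new function. The frameworks discussed here typically add typed interfaces, configuration and a user interface on top of this basic pattern.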

Others

A review of this size makes it difficult to give full attention, at the technical level, to the other extensible software in the mass spectrometry community. Ultimately, any code is extensible. We have selected the foregoing to survey the styles of approach to the wider problem of flexible software development by the end-user. Some other initiatives that could benefit the metabolomics community are noteworthy in passing, however. Maltcms [47] is in the same class as MZmine 2 for Java

Conclusions and perspective

Our survey illustrates that the MS-focused bioinformatics community no longer needs to begin software development ‘from scratch’. It is not our intention to debate the best technologies for integrating the enormous algorithmic knowledgebase available to mass spectrometry. It is more important to recognize that tool integration in a standardized and robust environment is both possible and necessary, and multiple integration strategies are useful at this stage. The sharing of tools between

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

This work was supported by the University of Calgary and by a Discovery grant to DCS from the Natural Sciences and Engineering Research Council (NSERC) of Canada (grant # 298351-2010). DCS acknowledges the additional support of the Canada Research Chair program, Alberta Ingenuity – Health Solutions and the Canada Foundation for Innovation.

References (51)

  • N.S. Kale et al.

    MetaboLights: an open-access database repository for metabolomics data

    Curr Protoc Bioinform

    (2016)
  • P. Rocca-Serra et al.

    Data standards can boost metabolomics research, and if there is a will, there is a way

    Metabolomics

    (2016)
  • C.A. Ball

    Are we stuck in the standards?

    Nat Biotechnol

    (2006)
  • V.J. Henry et al.

    OMICtools: an informative directory for multi-omic data analysis

    Database (Oxford)

    (2014)
  • J. Ison et al.

    EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats

    Bioinformatics

    (2013)
  • J. Malone et al.

    The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation

    J Biomed Semant

    (2014)
  • G. Wilson

    Software carpentry: lessons learned

    F1000Res

    (2014)
  • L. Martens et al.

    mzML – a community standard for mass spectrometry data

    Mol Cell Proteomics

    (2011)
  • J.D. Holman et al.

    Employing ProteoWizard to convert raw mass spectrometry data

    Curr Protoc Bioinform

    (2014)
  • D. Kessner et al.

    ProteoWizard: open source software for rapid proteomics tools development

    Bioinformatics

    (2008)
  • Y. Perez-Riverol et al.

    Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective

    Biochim Biophys Acta

    (2014)
  • C.A. Smith et al.

    XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification

    Anal Chem

    (2006)
  • J. Xia et al.

    MetaboAnalyst 3.0 – making metabolomics more meaningful

    Nucleic Acids Res

    (2015)
  • W.M. Edmands et al.

    MetMSLine: an automated and fully integrated pipeline for rapid processing of high-resolution LC–MS metabolomic datasets

    Bioinformatics

    (2015)
  • J.T. Prince et al.

    Chromatographic alignment of ESI-LC–MS proteomics data sets by ordered bijective interpolated warping

    Anal Chem

    (2006)