Elsevier

Journal of Proteomics

Volume 73, Issue 3, 3 January 2010, Pages 562-570

A computational platform for MALDI-TOF mass spectrometry data: Application to serum and plasma samples

https://doi.org/10.1016/j.jprot.2009.11.004

Abstract

Background

Mass spectrometry (MS) is becoming the gold standard for biomarker discovery. Several MS-based bioinformatics methods have been proposed for this application, but the divergence of the findings reported by different research groups on the same MS data suggests that a reliable method has not yet been defined. In this work, we propose an integrated software platform, MASCAP, intended for comparative biomarker detection from MALDI-TOF MS data.

Results

MASCAP integrates denoising and feature extraction algorithms that have already been shown to provide consistent peaks across mass spectra; furthermore, it relies on statistical analysis and graphical tools to compare the results between groups. Its effectiveness in mass spectrum processing is demonstrated using MALDI-TOF data as well as SELDI-TOF data. Its usefulness in detecting potential protein biomarkers is shown by comparing MALDI-TOF mass spectra collected from serum and plasma samples belonging to the same clinical population.

Conclusions

The analysis approach implemented in MASCAP may simplify biomarker detection, by assisting the recognition of proteomic expression signatures of the disease. A MATLAB implementation of the software and the data used for its validation are available at http://www.unich.it/proteomica/bioinf.

Introduction

Mass spectrometry (MS) has provided proteomics with promising new tools for the diagnosis of several clinically defined states [1], [2], [3], [4], [5], [6]. MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) and SELDI-TOF (surface-enhanced laser desorption/ionization time-of-flight) mass spectrometers are high-throughput, high-sensitivity instruments that analyze a biological sample by determining the relative abundance of many polypeptides associated with their mass/charge ratio (m/z) over a specific range [1], [6], [7]. These enabling technologies can be used to analyze body fluids, such as serum, plasma, and cerebrospinal fluid (CSF) [8], in order to detect changes in protein profiles. Indeed, one of the main topics of clinical proteomics is the discovery of biomarkers that can be used for the qualitative and quantitative assessment of a disease state [2], [7]. In this scenario, a molecular biomarker is a signal whose presence (or absence) reliably designates a particular biological state, for example an early disease condition. Typically, two collections of samples are compared, with the challenge of detecting the differences between them. Biomarker discovery is the process of identifying features that distinguish these two groups, starting from the acquired signals. Techniques and instrumentation for mass spectrometry are currently growing considerably, in anticipation of biomarker discovery and clinical proteomics. For example, the Lucid Proteomics System is the result of a collaborative effort between Bio-Rad Laboratories and Bruker Daltonics, and it combines the top-down approach of the SELDI system with the bottom-up approach of MS/MS instruments (www.lucidproteomics.com). Unfortunately, mass spectrometry still suffers from problems of sensitivity, under-sampling and, above all, reproducibility [7], [9]. In particular, the signal intensity recorded by the acquisition system is not exactly proportional to the concentration of the examined protein, owing to ionization and ion competition processes [1]. This limitation is a basic problem for large-scale proteomic studies, particularly for comparative MS biomarker identification. Consequently, several bioinformatics processing methods have been developed for calibration [10], [11], noise reduction [12] and protein peak detection [13], [14], [15], [16], [17], with the purpose of reducing the uncertainty associated with the acquired mass spectra. The use of comparative methods across multiple samples can support the recognition of intra-group similarities and inter-group differences, reducing the effect of the typical MS limitations. For this reason, many computational methods and statistical tests able to determine significant differences between variables have been used [18], [19], [20]. Several strategies for comparative biomarker discovery have been proposed in the literature [7], [18], [19], [21]. Nevertheless, the discrepancy between the results obtained by research groups using different bioinformatics tools, even on the same data sets [2], suggests that a stable and reliable method has not been defined so far.
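The idea of testing whether a peak intensity differs between two groups of spectra can be illustrated with a minimal sketch. It is not taken from MASCAP (which is a MATLAB package) and does not reproduce the tests used in the cited works: the exact permutation test on the difference of group means is chosen here only because it needs no distributional assumption and no external library. The function name `exact_permutation_pvalue` and the intensity values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means.

    Every way of relabelling the pooled values into groups of the original
    sizes is enumerated; the p-value is the fraction of relabellings whose
    absolute mean difference is at least as large as the observed one.
    Only practical for small groups, since the count grows combinatorially.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    n_relabellings = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so rounding error does not exclude the observed split.
        if diff >= observed - 1e-12:
            extreme += 1
        n_relabellings += 1
    return extreme / n_relabellings


# Illustrative (made-up) intensities of one peak in two groups of spectra.
serum = [10.2, 11.1, 9.8, 10.5, 10.9]
plasma = [14.9, 15.3, 16.1, 15.0, 15.8]
print(exact_permutation_pvalue(serum, plasma))
```

With two fully separated groups of five values each, only 2 of the 252 possible relabellings are as extreme as the observed one, so the p-value is 2/252, roughly 0.008. In a real comparison the test would be repeated for every aligned peak, which calls for a multiple-testing correction that this sketch omits.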

This paper describes the Mass Spectrometry Comparative Analysis Package (MASCAP), an integrated platform of processing algorithms, statistical methods and graphical tools developed to discover robust and efficient biomarker candidates using MS data. In contrast to many existing solutions, MASCAP can effectively process both MALDI- and SELDI-TOF mass spectra within the same analysis framework, with the possibility of tuning the analysis parameters according to the signal characteristics. In this work, MALDI-TOF data have been used to show all the processing and analysis steps necessary to discriminate significant differences in protein signal profiles between two different groups. The graphical tools represent the most important feature of the proposed platform, and are expected to provide a user-friendly interface for data analysis, as well as straightforward general operability. Moreover, the implementation in an open environment architecture might allow new software packages to be added, and the developed tools to be upgraded whenever necessary.

Materials and methods

MASCAP, designed and implemented in MATLAB 7.0 (The Mathworks, Natick, MA, USA), can be used for systematic global comparison and classification of complex biological samples, assisting and speeding up the discovery of significant proteomic biomarker candidates. The MASCAP software code is provided in the supplementary material, as well as at http://www.unich.it/proteomica/bioinf.

A schematic overview of the MASCAP processing and analysis steps can be found in the supplementary Fig. 1. In brief,

Results

The preprocessing tools for baseline removal and smoothing of the mass spectra improve signal quality for both MALDI- and SELDI-TOF data (supplementary Fig. 2), with the aim of increasing the effectiveness of peak detection. Fig. 1 shows examples of the peaks detected in a MALDI-TOF mass spectrum, together with the non-uniform noise level that is used for noise-peak removal; unreliable peaks, which have been classified by means of the Kolmogorov–Smirnov test and
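As a rough illustration of the preprocessing chain named in this snippet (baseline removal, smoothing, and peak detection against a non-uniform noise level), the following Python sketch may help. It is an assumption-laden stand-in, not MASCAP's MATLAB code: the rolling-minimum baseline, the moving-average filter and the threshold based on the local median absolute deviation (MAD) are generic substitutes for whatever algorithms the package actually implements, and all function names, window sizes and the toy spectrum are hypothetical.

```python
from statistics import median


def _window(values, i, half_window):
    """Slice of `values` centred on index i, clipped at the sequence edges."""
    return values[max(0, i - half_window): i + half_window + 1]


def remove_baseline(intensities, half_window=20):
    """Subtract a rolling-minimum baseline estimate (assumed method)."""
    return [y - min(_window(intensities, i, half_window))
            for i, y in enumerate(intensities)]


def smooth(intensities, half_window=2):
    """Moving-average smoothing (assumed method)."""
    out = []
    for i in range(len(intensities)):
        w = _window(intensities, i, half_window)
        out.append(sum(w) / len(w))
    return out


def detect_peaks(intensities, noise_half_window=25, snr=5.0):
    """Indices of strict local maxima that exceed a local noise threshold.

    The noise level is non-uniform: at each candidate it is the MAD of the
    intensities in the surrounding window, and the candidate is kept when
    its height above the local median exceeds `snr` times that value.
    """
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if not (y > intensities[i - 1] and y > intensities[i + 1]):
            continue
        w = _window(intensities, i, noise_half_window)
        level = median(w)
        noise = median([abs(v - level) for v in w])
        if y - level > snr * noise:
            peaks.append(i)
    return peaks


def toy_spectrum(n=200):
    """Synthetic spectrum: decaying baseline, small ripple, two peaks."""
    return [100 - i // 4                        # slowly decaying baseline
            + (i % 3 - 1)                       # deterministic ripple as "noise"
            + max(0, 200 - 40 * abs(i - 60))    # strong peak at index 60
            + max(0, 100 - 20 * abs(i - 140))   # weaker peak at index 140
            for i in range(n)]


processed = smooth(remove_baseline(toy_spectrum()))
print(detect_peaks(processed))
```

The rolling minimum systematically underestimates the baseline, which is why the peak criterion measures height above the local median rather than above zero. The snippet also mentions a Kolmogorov–Smirnov test for classifying unreliable peaks; that step is not sketched here because the excerpt does not say how it is applied.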

Discussion

Mass spectrometry is becoming the gold standard technique for biomarker discovery. It can support early disease diagnosis through the collection of biological samples, which are analyzed by mass spectrometers in order to extract protein information [1], [2], [5], [6]. The identification and characterization of discriminatory protein peaks is a fundamental part of biomarker discovery, and can produce an accurate and effective diagnostic evaluation [2], [7]. However, it is

Acknowledgements

We are particularly grateful to Enzo Ballone and Gian Luca Romani for their continuous support and scientific discussion. Dante Mantini was supported by the Research Foundation Flanders (FWO mandate A 4/5 SDS). This work has been supported by the “Rete Nazionale di Proteomica”, Project FIRB RBRN07BMCT.

References (33)

  • S. Hu et al. Human body fluid proteome analysis. Proteomics (2006)
  • X. Wang et al. Feature extraction in the analysis of proteomic mass spectra. Proteomics (2006)
  • Y. Yasui et al. An automated peak identification/calibration procedure for high-dimensional protein measures from mass spectrometers. J Biomed Biotechnol (2003)
  • N. Jeffries. Algorithms for alignment of mass spectrometry proteomic data. Bioinformatics (2005)
  • G.A. Satten et al. Standardization and denoising algorithms for mass spectra to classify whole-organism bacterial specimens. Bioinformatics (2004)
  • R. Gras et al. Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection. Electrophoresis (1999)

Cited by (28)

    • The application of fuzzy statistics and linear discriminant analysis as criteria for optimizing the preparation of plasma for matrix-assisted laser desorption/ionization mass spectrometry peptide profiling

      2015, Clinica Chimica Acta
      Citation Excerpt :

      In our study, we employed the MATLAB software package with the Bioinformatics Toolbox for manually processing the spectra, as well as for the selection and classification of the data sets. The software is highly suitable for developing a new computational platform for MALDI TOF MS data, as proven by Mantini et al. [35]. For statistical classification, we used discriminant analysis alongside the fuzzy theory statistic.

    • Molecular mechanisms behind the antimicrobial activity of hop iso-α-acids in Lactobacillus brevis

      2015, Food Microbiology
      Citation Excerpt :

      Dynamic exclusion was enabled and dynamic exclusion duration was set to 10 s. Spectra were subsequently exported with the Bruker Daltonik DataAnalysis tool version 4.0 as mascot generic format files (mgf). All measured fragment spectra of each precursor mass were merged into a consensus spectrum using a self-tailored version of “elab.m” (settings: minimum peak detection rate PDR = 0.4, mass tolerance was set according to the resolution of the mass spectrometer to ±0.1 m/z) from the MASCAP platform (Mantini et al., 2010). Fragment consensus spectra were imported into the MS Interpreter application of the NIST MS search tool for evaluation of acquired MS data.

    • Detection of acid and hop shock induced responses in beer spoiling Lactobacillus brevis by MALDI-TOF MS

      2015, Food Microbiology
      Citation Excerpt :

      Data processing for the identification of stress induced peaks was carried out according to Kern et al. as summarised in the following (Kern et al., 2014). An open sharedroot computer cluster (ATIX; http://opensharedroot.org), running a self-tailored MASCAP (Mantini et al., 2010) software application, implemented in Octave (Eaton and Rawlings, 2003) was used to analyse spectra exported using FlexAnalysis 3.3 (Bruker Daltonics, Germany). Job control was conducted via a message passing interface (MPI) and BASH (http://www.gnu.org/software/bash) scripts were used to create software pipelines.

    • MALDI-TOF MS as evolving cancer diagnostic tool: A review

      2014, Journal of Pharmaceutical and Biomedical Analysis
      Citation Excerpt :

      Biological variability and heterogeneity in samples further complicate the MALDI-TOF MS-based biomarker discovery. In addition, robust computational methods are needed to minimize the impact of biological variability caused by unknown intrinsic biological differences [85–87]. In nearly all types of ionization processes including MALDI-TOF and MALDI-TOF imaging, a phenomenon often referred to as “ion suppression” can occur.

    • Differentiation of Lactobacillus brevis strains using Matrix-Assisted-Laser-Desorption-Ionization-Time-of-Flight Mass Spectrometry with respect to their beer spoilage potential

      2014, Food Microbiology
      Citation Excerpt :

      A message passing interface (MPI) was applied for job control (Gabriel et al., 2004) and software pipelines were constructed by the use of BASH (http://www.gnu.org/software/bash) scripts. Peak processing and detection was carried out according to Mantini et al. (2010). The distance tolerance limit for peak alignment and clustering was set to 600 ppm.
