Mass spectrometry of the M. smegmatis proteome: Protein expression levels correlate with function, operons, and codon bias

  1. Rong Wang,
  2. John T. Prince, and
  3. Edward M. Marcotte¹
  1. Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, and Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, Texas 78712, USA

Abstract

The fast-growing bacterium Mycobacterium smegmatis is a model mycobacterial system: a nonpathogenic soil bacterium that nonetheless shares many features with the pathogenic Mycobacterium tuberculosis, the causative agent of tuberculosis. The study of M. smegmatis is expected to shed light on mechanisms of mycobacterial growth and complex lipid metabolism, and it provides a tractable system for antimycobacterial drug development. Although the M. smegmatis genome sequence is not yet complete, we used multidimensional chromatography and tandem mass spectrometry, in combination with the partially completed genome sequence, to detect and identify a total of 901 distinct M. smegmatis proteins across 25 growth conditions, providing experimental annotation for many predicted genes with an ∼5% false-positive identification rate. We observed numerous proteins involved in energy production (9.8% of expressed proteins), protein translation (8.7%), and lipid biosynthesis (5.4%); 33% of the 901 proteins are of unknown function. Protein expression levels were estimated from the number of observations of each protein, allowing measurement of differential expression of complete operons and comparison of the stationary- and exponential-phase proteomes. Expression levels are correlated with proteins' codon biases and mRNA expression levels, as measured by comparison with codon adaptation indices, principal component analysis of codon frequencies, and DNA microarray data. This observation is consistent with the notion that either (1) prokaryotic protein expression levels are largely preset by codon choice, or (2) codon choice is optimized for consistency with average expression levels regardless of the mechanism of regulating expression.
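The comparison between codon bias and observed protein abundance can be illustrated with a small sketch. It assumes Sharp and Li's codon adaptation index (CAI), computed as the geometric mean of per-codon relative adaptiveness weights derived from a reference set of highly expressed genes. It also assumes a 0.5 pseudocount for codons absent from that set and uses raw per-protein observation counts as the expression proxy. The function names, toy sequences, and counts are hypothetical; this is not the authors' pipeline or data. Spearman rank correlation is used here because observation counts tend to be heavily skewed, so a rank-based measure seems a safer default than Pearson correlation.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code (same amino acid assignments as bacterial table 11),
# built from codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def codons(seq):
    """Split a coding sequence into in-frame triplets, dropping a partial tail."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight w = count / max count within each synonymous codon family.

    Met, Trp (single-codon families) and stop codons are excluded, as in the
    usual CAI definition. The pseudocount value is an assumption.
    """
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    families = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        if aa not in ("*", "M", "W"):
            families[aa].append(codon)
    weights = {}
    for members in families.values():
        observed = {c: counts[c] + pseudocount for c in members}
        best = max(observed.values())
        for c in members:
            weights[c] = observed[c] / best
    return weights


def cai(gene, weights):
    """Geometric mean of weights over codons that have a defined weight."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no codons with defined weights")
    return math.exp(sum(logs) / len(logs))


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical reference set that prefers CTG (Leu), GCC (Ala), GGC (Gly), ACC (Thr).
    weights = relative_adaptiveness(["CTGGCCGGCACC" * 5])
    genes = {"geneA": "CTGGCCGGCACC", "geneB": "CTGGCCGGTACA", "geneC": "TTAGCTGGTACA"}
    observations = {"geneA": 40, "geneB": 12, "geneC": 3}  # hypothetical counts
    names = sorted(genes)
    cais = [cai(genes[n], weights) for n in names]
    for name, value in zip(names, cais):
        print(f"{name}: CAI = {value:.3f}, observations = {observations[name]}")
    print(f"Spearman rho = {spearman(cais, [observations[n] for n in names]):.2f}")
```

With these toy inputs, CAI falls from 1.000 (geneA, all preferred codons) to 0.091 (geneC, no preferred codons), and the ranks agree perfectly with the made-up observation counts. Real proteomic data would show a weaker, noisier relationship.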

Footnotes

  • [Supplemental material is available online at www.genome.org. The mass spectrometry raw data from this study have been deposited in the Open Proteomics Database http://bioinformatics.icmb.utexas.edu/OPD, under accession nos. opd00007_MYCSM–opd00031_MYCSM. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: D. Graham.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3994105.

  • 1 Corresponding author. E-mail marcotte{at}icmb.utexas.edu; fax (512) 232-3432.

    • Accepted May 4, 2005.
    • Received August 24, 2004.