Mass Spectrometry-based Workflow for Accurate Quantification of Escherichia coli Enzymes: How Proteomics Can Play a Key Role in Metabolic Engineering*

Metabolic engineering aims to design high-performance microbial strains producing compounds of interest. This requires systems-level understanding; genome-scale models have therefore been developed to predict metabolic fluxes. However, multi-omics data including genomics, transcriptomics, fluxomics, and proteomics may be required to model the metabolism of potential cell factories. Recent technological advances in quantitative proteomics have made mass spectrometry-based quantitative assays an interesting alternative to more traditional immuno-affinity based approaches, offering improved specificity and multiplexing capabilities. In this study, we developed a quantification workflow to analyze enzymes involved in central metabolism in Escherichia coli (E. coli). This workflow combined full-length isotopically labeled standards with selected reaction monitoring analysis. First, full-length 15N-labeled standards were produced and calibrated to ensure accurate measurements. Liquid chromatography conditions were then optimized for reproducibility and multiplexing capabilities over a single 30-min liquid chromatography-MS analysis. This workflow was used to accurately quantify 22 enzymes involved in E. coli central metabolism in a wild-type reference strain and two derived strains, optimized for higher NADPH production. In combination with measurements of metabolic fluxes, proteomics data can be used to assess different levels of regulation, in particular enzyme abundance and catalytic rate. This provides information that can be used to design specific strains used in biotechnology. In addition, accurate measurement of absolute enzyme concentrations is key to the development of predictive kinetic models in the context of metabolic engineering.

able resources through fermentation, building upon advances in metabolic engineering methods (4).
Metabolic engineering consists in optimizing the genetic and regulatory processes of the cell to produce large amounts of a given chemical. Metabolic engineering requires detailed omics analyses to identify targets that can be manipulated through targeted genetic modifications to improve and/or adapt cells (5).
Various factors can affect the yield of a genetically modified cell factory, including consumption of reaction intermediates by native metabolic pathways, control of mRNA levels, regulation by small RNAs, and final enzyme abundances and activities. To finely tune metabolic pathways, the different levels of regulation within the cell must be identified and their contribution to overall flux control understood. Systems biology approaches aim to provide this type of information. For instance, metabolic networks can be converted to constraint-based models (CBMs), which allow metabolic fluxes to be predicted on a genome-wide scale (6, 7). As described by Feist and Palsson (8), the construction of such genome-scale models can be divided into four key steps: (1) generation of high-throughput datasets (omics data), (2) genome-wide network reconstruction, (3) conversion of the network reconstruction into a CBM, and (4) application of computational models for metabolic engineering.
For quantitative data on protein levels, enzyme-linked immuno-sorbent assay (ELISA) (9, 10) has long been considered the gold standard because of its high sensitivity and throughput, although its limitations include labor intensiveness and the high cost of assay development (11). As detailed by Bantscheff et al. (12), proteomic approaches and instrument advances in recent years have made quantitative liquid chromatography-mass spectrometry (LC-MS) techniques a viable alternative for generating data on protein levels. Indeed, using appropriately labeled standards, MS analyses can provide highly specific and accurate data. MS is also adaptable for highly multiplexed analyses, in contrast with conventional immunoassays. Selected Reaction Monitoring (SRM), by conferring two levels of selection, filtering predefined peptide ions and specific fragment ions (13), makes it possible to detect and quantify several proteins with high sensitivity and specificity in a single assay. Combined with SRM analysis, stable isotope dilution (SID) makes absolute protein quantification feasible. SID in proteomics consists in spiking a known amount of isotopically labeled peptide or protein standards into the sample. Standards and endogenous peptide species can be distinguished in MS spectra because of their slight and predictable mass difference. In contrast, the physical and chemical properties of the different species are identical, resulting in co-elution during the LC run. Accurate information on the amount of spiked standard allows protein concentration to be deduced from the ratio of signal intensities between the heavy standard and the corresponding endogenous peptide (14). Different standards have been described, such as labeled peptide standards (Absolute Quantification standards, AQUA™), which are chemically synthesized isotope-labeled peptides. These are spiked into samples just before MS analysis (15).
Alternatively, quantitative concatemer standards (QconCAT) are artificial concatemers of labeled peptides that can be spiked into samples before trypsin digestion (16). With both these approaches, the efficiency of trypsin digestion is critical, and because prefractionation steps are often necessary, the accuracy of protein concentration measurements might be biased. Protein Standard Absolute Quantification (PSAQ™) limits these biases, because these full-length isotopically labeled standards are expected to behave in the same way as the endogenous target proteins throughout sample preparation (17). Studies by Brun et al. (18, 19) demonstrated new possibilities for quantitative MS, specifically in the context of clinical biomarker detection, when targeted mass spectrometry is combined with PSAQ standards to enhance the accuracy of quantification. Although more accurate, the PSAQ strategy requires the synthesis of full-length recombinant proteins in a labeled form and may present specific difficulties.
As outlined above, generating genome-scale models requires some prior biological knowledge. This knowledge is typically retrieved from omics experiments. We show here that an MS-based pipeline can generate accurate, high-throughput quantitative proteomics data, which can be used in combination with genome-scale models to guide choices in the design of cell factories in metabolic engineering. The study focused on Escherichia coli (E. coli), which is one of the most widely used cell factory platforms, and has been extensively studied in systems biology and used to develop very detailed mathematical models (8). We particularly targeted central carbon metabolism (CCM) because it is involved in transport and oxidation of the main carbon sources, and all the natural metabolites produced by E. coli are derived from 12 precursor metabolites, which are either CCM intermediates or co-factors like ATP, NADH, and NADPH (20). This makes CCM an essential crossroad for metabolic engineering (21).
We chose to use a PSAQ-like strategy to measure the concentrations of enzymes involved in E. coli CCM with high accuracy in a multiplexed assay. To do this, we optimized a system producing full-length 15N-labeled standards. To be fully compatible with high-throughput screening, no prefractionation steps were carried out on the E. coli proteome, and SRM analyses were performed using scheduled mode to allow multiplexed analyses (22). Analytical performance was tested using titration curves. The resulting scheduled SRM assay, monitoring 720 transitions during a single LC-SRM run, was used to accurately quantify 22 key enzymes involved in CCM in E. coli (Fig. 1 and Table I). Using this workflow (Fig. 2), we investigated a wild-type reference E. coli strain and two genetically modified strains with an increased NADPH/NADP+ ratio (23). Enzyme concentration measurements were combined with previously obtained metabolic flux measurements to calculate effective or apparent catalytic rates of the enzymes (24, 25). The results show that our approach is applicable for metabolic engineering purposes, and is particularly useful for analyzing whether a change in flux is due to a change in the concentration or in the catalytic rate of an enzyme. We expect these data and this methodology to be very useful in developing kinetic models of metabolism.
FIG. 1. Overview of the central carbon metabolism network in Escherichia coli. The main pathways of central carbon metabolism studied in this work are shown, including glycolysis, the tricarboxylic acid pathway (TCA), the glyoxylate shunt, and the pentose phosphate pathway (PPP). Black squares indicate major metabolic intermediates. Gray boxes highlight the twenty-two enzymes targeted in this work.
Construction of Expression Strains-Genes coding for the proteins of interest were PCR-amplified from MG1655 gDNA using forward and reverse primers designed with both Vector NTI® and Geneious® software. The list of primers is provided in supplemental Data File S1. PCR products were purified using the Nucleospin® Extract II kit and digested with the restriction enzymes indicated in supplemental Data File S1. Fragments were then ligated into pPAL7, a fusion vector carrying an ampicillin-resistance gene and a Profinity eXact® tag (for details of the protocol, see the Bio-Rad supplier notice).
DH5α E. coli strains were transformed and plasmids were purified using the Nucleospin® plasmid kit. Presence of the fragment was checked by double digestion and analysis after electrophoresis on a 0.8% agarose gel. Fragments were then sequenced (GATC Biotech, Konstanz, Germany). E. coli expression strains were stored at −80°C in LB broth containing 20% glycerol (v/v).
Study Strains and Culture Conditions-Genetically modified and wild-type E. coli strains were obtained from Auriol et al. (23); their genotypes are described in Table II. Cultures were performed in minimal media containing 5 g/L glucose (see below). Cells were harvested during the exponential growth phase, based on OD at 600 nm.
Production and Purification of Full-length 15  For pre-culture, minimal media containing 10% LB was inoculated with the over-expressing strains of interest. Culture in minimal media was inoculated with pre-cultures to obtain an initial OD600 of ~0.2. Protein expression was induced with 1 mM IPTG, and cells were grown to OD600 ~4.
Cultures were centrifuged for 5 min at 8000 × g, the supernatant was discarded, and cells were washed in the same volume of fresh PBS. Cells were resuspended in 100 mM KPO4, pH 7.6 buffer for sonication. Lysates were centrifuged for 10 min at 12,000 × g and the supernatant was filtered through 0.22 µm membranes.
Protein Purification Conditions and Verification of 15N Isotope Incorporation-Proteins were purified using 5 ml Profinity eXact cartridges (Bio-Rad®) according to the supplier protocol. The duration of incubation in sodium fluoride (100 mM, pH 7.6) was adapted for each protein to optimize tag cleavage. For the proteins expressed here, between 2 h at room temperature and 12 h at 4°C were necessary.
Purified proteins were stored either lyophilized or in 25 mM NH4HCO3, 10% glycerol (v/v) at −80°C. 15N incorporation into recombinant proteins was determined on peptides after digestion of 15N-labeled proteins, as described below. LC-MS/MS analysis was performed in enhanced resolution data acquisition mode on a 4000 QTRAP instrument (ABSciex, Foster City, CA), as described below. Simulated isotopic distributions with varying 15N incorporation were generated for selected 15N tryptic peptides using Isopro® software (https://sites.google.com/site/isoproms/home). The experimental isotope incorporation rate was determined by comparing simulated and experimental isotopic distributions using a least-squares regression method (supplemental Data File S2).
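The least-squares comparison of simulated and experimental isotopic distributions can be illustrated with a minimal Python sketch. It is not the Isopro® algorithm: it assumes, for simplicity, that only nitrogen contributes to the isotopic envelope (natural 13C and other isotopes are ignored), models the number of 15N atoms per peptide as a binomial distribution, and scans candidate incorporation rates on a grid. The function names and the grid bounds are hypothetical.

```python
from math import comb


def nitrogen_isotopologues(n_nitrogen, incorporation):
    """Binomial probabilities of carrying k = 0..n 15N atoms (nitrogen-only model)."""
    p = incorporation
    return [comb(n_nitrogen, k) * p**k * (1 - p) ** (n_nitrogen - k)
            for k in range(n_nitrogen + 1)]


def sum_squared_error(observed, simulated):
    """Sum of squared differences between two normalized distributions."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def fit_incorporation(observed, n_nitrogen, lo=0.90, hi=1.00, steps=1000):
    """Grid search for the incorporation rate that minimizes the squared error.

    `observed` holds intensities for 0..n 15N atoms; it is normalized to sum to 1.
    """
    total = sum(observed)
    normalized = [x / total for x in observed]
    best_rate, best_err = lo, float("inf")
    for i in range(steps + 1):
        rate = lo + (hi - lo) * i / steps
        err = sum_squared_error(normalized, nitrogen_isotopologues(n_nitrogen, rate))
        if err < best_err:
            best_rate, best_err = rate, err
    return best_rate


# Usage with a synthetic envelope for a hypothetical peptide with 12 nitrogens.
synthetic = nitrogen_isotopologues(12, 0.985)
estimated = fit_incorporation(synthetic, 12)
```

A real envelope would also need the carbon, hydrogen, oxygen, and sulfur isotope contributions, which is what dedicated simulation software provides.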
Amino Acid Analysis (AAA) for Standard Titration-Standards were calibrated using the AAA-MS (Amino Acid Analysis coupled to Mass Spectrometry) approach described by Louwagie et al. (26). Briefly, the protein standard (1 µg, previously titrated by UV absorbance at 280 nm) was mixed with 1 µg of NIST bovine serum albumin standard. Microwave-assisted acidic hydrolysis was then performed using the Discover protein hydrolysis apparatus (CEM, Matthews, NC).
Samples were solubilized in 20 µl of 50% acetonitrile, 0.1% formic acid. The autosampler from an HPLC-Ultimate 3000 (Dionex, Voisins Le Bretonneux, France) was used to inject 10 µl of each sample (in a 20 µl injection loop) at a flow rate of 6 µl/min (mobile phase: 50% acetonitrile, 0.1% formic acid) directly into the nanosource of an LTQ-Orbitrap XL (Thermo Fisher Scientific). MS signal intensity was averaged for each amino acid over 2.5 min. The intensities of each amino acid in the standard and the analyte were used to calculate quantities. For more details, see (26).
Trypsin and LysC Digestion-Proteins or E. coli lysates were solubilized in 8 M urea, 50 mM NH4HCO3 buffer, reduced for 30 min at
For MS/MS experiments, peptides were analyzed on a 4000 QTRAP hybrid triple quadrupole mass spectrometer (ABSciex, Les Ulis, France) equipped with a Turbo V source in electrospray ionization (ESI) mode and operated in "Information Dependent Acquisition" (IDA) mode under Analyst version 1.5.1 software (AB SCIEX). A precursor ion scan between m/z 400 and 1400 was performed as a survey scan for the IDA method. Enhanced product ion (EPI) spectra were acquired with a scan speed of 4000 amu/s using a dynamic fill time for optimal MS/MS and rolling collision energy settings. Precursor and product ion accuracies were 1.2 and 0.6 Da, respectively, which is in line with the specifications of the 4000 QTRAP.
Data were acquired in positive mode with the ion spray voltage at 5500 V, curtain gas 15 (arbitrary units), interface heater temperature 350°C. Collision exit, entrance and declustering potential were set to 27, 12, and 50 volts, respectively.
For SRM assays, collision energy (CE) was calculated using linear equations based on the manufacturer's recommendations:
CE (volts) = 0.44 × m/z + 4 for doubly charged precursors
CE (volts) = 0.5 × m/z + 5 for triply charged precursors
During SRM method development, the mass spectrometer was operated in SRM mode, with a dwell time of 20 ms and a maximum of 100 transitions. After method validation, the instrument was operated using scheduled SRM mode (retention time windows: 60 s; target scan time: 2.5 s). A 30 s base width was estimated, and twelve points were acquired per chromatographic peak.
FIG. 2. Accurate quantification workflow to analyze enzymes involved in Escherichia coli central metabolism. Stable isotope dilution combined with an MS-based analytical strategy was developed using full-length 15N-labeled standards to ensure accuracy. To ensure a straightforward workflow, no prefractionation steps were carried out. Selected Reaction Monitoring (SRM) was used for optimal selectivity and sensitivity.
Design of SRM Assays and MS/MS Identifications-A set of twenty-two proteins was selected from among proteins involved in the central metabolic pathways in E. coli (Fig. 1). Each purified protein standard, in its labeled and unlabeled forms, was reduced, alkylated and digested, according to the protocol described above. LC-MS/MS analysis was performed on a 4000 QTRAP mass spectrometer. Peak lists (mgf files) were generated using Analyst version 1.5.1 software (AB SCIEX). MS/MS spectra were assigned to peptides using a sequence database search strategy. Mascot (version 2.4) was used as the search engine. We generated an in-house Ecoli_K12 protein sequence database (12548 sequences) from UniProtKB (http://www.uniprot.org, release July 28, 2011) to retrieve proteins with taxonomy identifier 83333 (Escherichia coli strain K12) (http://ebi3.uniprot.org/uniprot/?query=taxonomy%3a83333&format=*). Trypsin was set as the enzyme and 1 missed cleavage was allowed; cysteine carbamidomethylation was set as a fixed modification whereas mono- and dioxidation of methionine were set as variable modifications. Mass tolerances for precursor and fragment ions were 1.2 and 0.6 Da, respectively. Peptide charge was set at 1, 2 and 3 and the instrument type was ESI-TRAP. The identification results file (.dat) may be downloaded from the Peptide Atlas repository (http://www.peptideatlas.org/PASS/PASS00273). MS/MS data were used to retrieve the best peptides for each of the 22 purified proteins in order to build SRM transition lists. Consequently, no specific cut-off score was set. Mascot results files were used to build Skyline® libraries. Lists of SRM transitions for both light and heavy versions of selected peptides were generated using Skyline® software (27). Proteotypic peptides were selected, and each sequence was confirmed to be unique using the BLAST program. In line with Jaquinod et al. (28), five cysteine-containing peptides were retained in the transition list. In contrast, methionine-containing peptides were not included. The retention times for the selected peptides were examined, and peptides were selected to provide a homogeneous distribution across the gradient. This resulted in selection of at least three detectable proteotypic peptides for each protein targeted in this study. For each selected peptide, three transitions were selected. The higher fragments in the acquired MS/MS spectra were preferentially chosen for setting up transitions, unless a similar transition existed for another protein. Finally, SRM transitions were validated by verifying that the retention time was identical for both heavy and light versions. Transition lists may be downloaded from the Peptide Atlas repository (http://www.peptideatlas.org/PASS/PASS00272).
Quantitative Analysis-Quantitative SRM analyses were performed using MultiQuant software (version 1.2, AbSciex). The MultiQuant noise level setting was 40%, and the baseline subtraction window was 2 min. All data were manually inspected to ensure correct peak detection and accurate integration. Data may be downloaded from the Peptide Atlas repository (http://www.peptideatlas.org/PASS/PASS00272).
The heavy area/light area (H/L) ratio was determined for each SRM transition, and peptide ratios were calculated by averaging ratios for all the transitions of a given peptide. The ratios obtained for all the different proteotypic peptides from a given protein were averaged to determine the protein ratio. Standard deviation values and CVs were calculated at the peptide and protein levels considering the technical and the biological replicates (see supplemental Data File S8). Absolute quantification was obtained from the heavy/light protein ratio multiplied by the amount of standard spiked into the sample at the beginning of the pipeline. The resulting enzyme quantifications were converted to concentrations (mmol enzyme · ml cyto⁻¹) and number of protein copies per cell (copies/cell) based on literature data (29, 30), as described in supplemental Data File S3.
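The two-level averaging described above (transitions to peptide, peptides to protein) followed by scaling with the spiked amount can be sketched in Python as follows. This is an illustration rather than the MultiQuant procedure: the function names, peptide labels, and numbers are hypothetical, and the sketch assumes that each ratio is oriented as endogenous signal over standard signal, so that multiplying by the spiked standard amount returns the endogenous amount in the same unit.

```python
from statistics import mean


def peptide_ratio(transition_ratios):
    """Average the area ratios of all transitions monitored for one peptide."""
    return mean(transition_ratios)


def protein_ratio(peptides):
    """Average the peptide-level ratios of all proteotypic peptides of one protein.

    `peptides` maps a peptide label to its list of transition-level ratios.
    """
    return mean(peptide_ratio(ratios) for ratios in peptides.values())


def endogenous_amount(ratio_endogenous_to_standard, spiked_standard_amount):
    """Scale the protein ratio by the amount of standard spiked into the sample."""
    return ratio_endogenous_to_standard * spiked_standard_amount


# Usage with hypothetical ratios: two peptides, three transitions each.
example = {
    "peptide_1": [2.0, 2.2, 1.8],
    "peptide_2": [1.0, 1.0, 1.0],
}
ratio = protein_ratio(example)
amount = endogenous_amount(ratio, spiked_standard_amount=10.0)  # e.g. in pmol
```

Averaging at the peptide level first gives each peptide the same weight in the protein ratio, regardless of how many transitions were monitored for it.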
Calculation of (Differences in) Effective Catalytic Rates-In steady-state growth conditions, the flux J (mmol metabolite · ml cyto⁻¹ · h⁻¹) through a metabolic reaction can be written as J = E · k_eff, where E is the enzyme concentration (mmol enzyme · ml cyto⁻¹) and k_eff is the effective or apparent catalytic rate (mmol metabolite · mmol enzyme⁻¹ · h⁻¹), defined similarly as in two recent papers (24, 25). k_eff was calculated from measurements of E and J for each enzyme in the three strains studied, as explained in supplemental Data File S3. Note that fluxes in the NA23 and NA176 strains relative to the MG1655 reference strain, J/J_MG, can be split into the product of the relative enzyme concentrations E/E_MG and the relative effective catalytic rates k_eff/k_eff,MG:
J/J_MG = (E/E_MG) · (k_eff/k_eff,MG)
Flux and enzyme concentration ratios were determined from the data for both the NA23 and NA176 strains, and relative effective catalytic rates were calculated by dividing these ratios.
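As a minimal Python sketch of this decomposition, the relative effective catalytic rate is the flux ratio divided by the enzyme concentration ratio. The function names and the numerical values below are hypothetical and serve only to show the arithmetic; they are not measurements from this study.

```python
def effective_catalytic_rate(flux, enzyme_concentration):
    """k_eff = J / E, from the relation J = E * k_eff."""
    return flux / enzyme_concentration


def relative_effective_rate(flux_ratio, enzyme_ratio):
    """k_eff / k_eff,MG = (J / J_MG) / (E / E_MG)."""
    return flux_ratio / enzyme_ratio


# Hypothetical case: flux rises 1.2-fold while the enzyme level falls to 0.6-fold
# of the reference strain, so the effective catalytic rate must have doubled.
rel_keff = relative_effective_rate(flux_ratio=1.2, enzyme_ratio=0.6)
```

A relative rate close to 1 would indicate that the flux change is explained by enzyme abundance alone, whereas a value far from 1 points to a change in the effective catalytic rate.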

RESULTS
In Cell Production of 15N-Labeled Protein Standards-The aim of this study was to use a quantitative proteomics pipeline to ensure accurate data for analyzing the contributions of enzyme concentrations and effective catalytic rates to observed flux differences between strains and for building predictive kinetic models. To begin, central metabolism pathways in E. coli were chosen, including glycolysis, gluconeogenesis, the glyoxylate shunt, the pentose phosphate pathway (PPP) and the tricarboxylic acid pathway (TCA) (Fig. 1). We chose to target key enzymes in each pathway, selecting glyceraldehyde-3-phosphate dehydrogenase (GapA), involved in glycolysis; glucose-6-phosphate dehydrogenase (Zwf), part of the PPP; and citrate synthase (GltA) from the TCA pathway, among others (Table I).
Twenty-two full-length 15N-labeled standards were produced in transformed E. coli and purified. To ensure that labeled standards were produced in sufficient quantities at minimal substrate cost, growth conditions were optimized to minimize (NH4)2SO4 concentration. The purification system was chosen to ensure fast and efficient target purification. With the Bio-Scale™ Mini Profinity eXact™ cartridge, purification and tag cleavage are performed in a single step, which reduces multi-step losses (31). SDS-PAGE analysis confirmed the absence of major contaminants or degradation (supplemental Data File S4). For each 15N standard, it was also necessary to check the isotope incorporation rate. This is critical for generating accurate data. Incorporation rates were evaluated using Isopro® software and found to be 98.5% for all peptides (supplemental Data File S2). This agrees well with the purity of the (15NH4)2SO4 used (98.5%). Light versions of standards were also produced to establish titration curves. Standard titration was also a critical step because it determines assay accuracy. To limit the amount of standards and time required, AAA-MS was performed, as described by Louwagie et al. (26). In contrast with [13C6, 15N2]-L-lysine and [13C6, 15N4]-L-arginine standards, full-length 15N-labeled standards can be titrated using all twenty amino acids; amino acids that are known to undergo chemical modifications were nonetheless excluded. For instance, asparagine and glutamine are subject to deamidation, producing aspartate or glutamate, whereas cysteine, methionine, and tryptophan may be modified through uncontrolled oxidation processes. We therefore chose five amino acids to determine the concentration of the standards produced: alanine, isoleucine/leucine, phenylalanine, proline, and valine. Results are expressed as the mean concentration for these five amino acids. Each standard was assayed four times in technical replicates. The coefficients of variation (CVs) calculated for these assays were between 4 and 12% (supplemental Data File S5).
For each standard, between 2 and 5 milligrams of protein were obtained per 100 ml culture. We therefore set up a pipeline to produce large amounts of labeled protein standards while limiting substrate costs. Standards were stored at −80°C prior to further analyses.
Designing a Scheduled SRM Assay for Key Enzymes in E. coli Central Carbon Metabolism-To set up the SRM assay for key enzymes involved in E. coli CCM, a list of peptides and fragment ions to be monitored must first be carefully defined to ensure the specificity of analyses. Skyline® software (27) was recently developed to automate this step. This software requires two inputs: protein sequences and spectral libraries for the target proteins. LC-MS/MS analyses were performed after LysC and trypsin digestion of light and heavy proteins, and MS/MS data were used to create the Skyline® library. SRM methods were automatically generated, with three proteotypic peptides per protein, except Eda, which is a small protein producing only one detectable peptide. Three transitions were monitored per peptide. All the peptides selected were proteotypic peptides, i.e. of unique sequence in the E. coli database and with good ionization properties (32). The final list for the SRM assay consisted of 720 SRM transitions (supplemental Data File S6).
To monitor these 720 SRM transitions, scheduled SRM analyses (22) were performed. Parameters were carefully tuned to optimize the limit of detection for all transitions. It was necessary to find a trade-off between the number of transitions to be monitored, peak resolution and sensitivity of the analysis. We chose to analyze samples on a micro-LC system with a flow rate of 50 µl/min. This configuration was chosen because it results in highly reproducible retention times (~10 s shift), allowing a retention time window of 60 s to be used with confidence. Target scan time was set to 2.5 s, allowing ~12 points per chromatographic peak, which is the peak definition recommended for quantification (33).
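The scheduling arithmetic behind these settings can be checked with a short Python sketch. It uses the values stated in this work (30 s peak base width, 2.5 s target scan time, 60 s retention time window, ~10 s retention time shift); the function names are hypothetical, and the window check assumes a symmetric window centered on the expected retention time, which is an assumption of this example.

```python
def points_per_peak(peak_base_width_s, target_scan_time_s):
    """Number of data points sampled across one chromatographic peak."""
    return peak_base_width_s / target_scan_time_s


def peak_fits_window(window_s, peak_base_width_s, rt_shift_s):
    """True if a peak shifted by rt_shift_s still lies inside a centered window."""
    return peak_base_width_s / 2 + rt_shift_s <= window_s / 2


# Values from the text: 30 s base width, 2.5 s scan time, 60 s window, ~10 s shift.
n_points = points_per_peak(30, 2.5)
fits = peak_fits_window(60, 30, 10)
```

With these values the peak is sampled 12 times, and a 10 s shift leaves a 5 s margin on either side of the window; a 20 s shift would push part of the peak outside it.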
Chromatographic conditions were thus optimized to ensure high multiplexing capability, making it possible to monitor 720 transitions during a single 30-min effective gradient. As a result, 22 enzymes could be effectively and accurately quantified in a single analysis.
Assay Validation in Terms of Accuracy, Precision, and Linearity Range-The analytical performance of the quantification method was checked using titration curves. This involved adding increasing amounts of light standard proteins to an E. coli lysate while adding a constant amount of heavy standard to each sample. The light version of proteins was added in amounts ranging from the lowest endogenous amount, to assess signal response linearity from basal to the highest potential levels, as advocated by the Food and Drug Administration (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070107.pdf).
A preparation workflow was applied, including LysC digestion in denaturing conditions (2 M urea), followed by trypsin digestion. No prefractionation steps were applied. SRM analyses were performed to determine the heavy/light ratio and thus estimate the amount of light protein present in samples (supplemental Data File S7). Estimated light protein amounts were plotted against the quantities added. All titration points were performed as full-process triplicates. This experiment thus allows us to determine three key parameters of the assay: (1) the accuracy of the method, which is indicated by the slope of the curve; (2) the precision of the method, which is indicated by the error bars corresponding to the CVs for technical replicates; and (3) the range of the linear response (R²). The results indicate that our method is accurate and precise, and that the signal response is linear over the range tested (Fig. 3 and Table III). The slope of titration curves was between 0.82 (Eda) and 0.98 (PpsA), and in all cases the linear regression coefficient was higher than 0.998. The median CV for technical replicates was 2.9%, with 90% of replicates having a CV of less than 4.1%. It can be noted that, for a given protein, inter-peptide CVs were less than 20% (supplemental Data File S7; note that the CV could not be determined for Eda because only one peptide was monitored).
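The slope and R² of a titration curve can be obtained with an ordinary least-squares fit, sketched here in plain Python. The unweighted fit, the function name, and the data points are assumptions made for illustration; the regression settings actually used for Table III are not specified in this section.

```python
def linear_fit(added, estimated):
    """Unweighted least-squares line through (added, estimated) points.

    Returns (slope, intercept, r_squared). A slope close to 1 indicates that the
    estimated amounts track the amounts added.
    """
    n = len(added)
    mean_x = sum(added) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(added, estimated))
    ss_tot = sum((y - mean_y) ** 2 for y in estimated)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical titration: estimated amounts recover about 90% of what was added.
added = [1.0, 2.0, 4.0, 8.0, 16.0]
estimated = [0.9 * x for x in added]
slope, intercept, r2 = linear_fit(added, estimated)
```

In practice the endogenous protein already present in the lysate shifts the intercept, so the slope rather than the raw recovery is the quantity to read as accuracy.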
The lower limit of quantification (LLOQ) is an important measure of performance as it defines the lowest analyte concentration that can be accurately measured; it thus represents a genuine measure of how sensitive an assay is. A signal-to-noise ratio of 10 is frequently used to define the LLOQ (34). However, with scheduled SRM mode, background noise is extremely low, and there is no easy and/or practical way to accurately measure co-eluting noise. We therefore chose, in line with Kuzyk et al. (35, 36), to determine the LLOQ of the assay empirically. Thus, the lowest analyte concentration that can be measured with a CV below 20% and an accuracy between 80 and 120% in the linear response range is defined as the LLOQ. LLOQs were defined for each assay using data from technical triplicates for each concentration point in titration curves. The lowest point of each titration curve was the endogenous level present in the lysate (from the MG1655, NA0023 or NA0176 strain), corresponding to the lowest abundance of target protein. At endogenous levels, CVs above 20% were only reached for Eda, Glk and TktB (34%, 33%, and 34%, respectively). The LLOQ considered for these three proteins was therefore the amount of light protein spiked into point 1 (P1) for titration curves (supplemental Data File S7). The LLOQs determined for each of the twenty-two protein assays are indicated in Table III.
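The empirical LLOQ criterion (CV below 20% and accuracy between 80 and 120%) can be expressed as a short Python sketch. The data layout, function names, and replicate values are hypothetical; accuracy is taken here as the mean measured amount divided by the expected amount, which is an assumption of this example.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation of replicate measurements, in percent."""
    return 100 * stdev(replicates) / mean(replicates)


def accuracy_percent(replicates, expected):
    """Mean measured amount relative to the expected amount, in percent."""
    return 100 * mean(replicates) / expected


def empirical_lloq(points, max_cv=20.0, min_acc=80.0, max_acc=120.0):
    """Return the lowest expected amount meeting both criteria, or None.

    `points` is a list of (expected_amount, [replicate measurements]).
    """
    for expected, replicates in sorted(points, key=lambda p: p[0]):
        if (cv_percent(replicates) < max_cv
                and min_acc <= accuracy_percent(replicates, expected) <= max_acc):
            return expected
    return None


# Hypothetical titration points measured in triplicate.
titration = [
    (1.0, [0.5, 1.0, 1.6]),   # too variable
    (2.0, [1.9, 2.0, 2.1]),   # passes both criteria
    (4.0, [3.9, 4.0, 4.1]),
]
lloq = empirical_lloq(titration)
```

A stricter variant could additionally require that every point above the candidate also passes, which guards against an isolated low point passing by chance.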
The titration results demonstrate that our analytical pipeline is accurate, precise and linear; we therefore went on to use the labeled standards described and the associated SRM assay to measure enzyme concentrations in selected E. coli strains.

Absolute Quantification of Twenty-two Proteins in MG1655 E. coli Lysates and in Genetically Modified Strains: Assessing Workflow Efficiency-To verify the benefits of using labeled protein standards to determine E. coli enzyme concentrations, we first applied our SRM assay to the MG1655 wild-type reference E. coli strain (Fig. 2). Twenty-two standards were spiked into E. coli lysate. The rapid sample preparation workflow, consisting of LysC followed by trypsin digestion, was then applied, without sample decomplexification. Scheduled SRM analyses were performed as described above. Heavy/light ratios were determined, and the endogenous concentrations of each protein were calculated. Three biological samples were prepared for each strain, and each biological sample was analyzed twice in full-process technical replicates.
Absolute quantification was obtained for the 22 enzymes. The least abundant protein, PpsA, was quantified at 491 copies per cell, whereas the most abundant, GapA, was measured at 21,119 copies per cell. The biological CVs for these concentrations were 1.8% and 4.7%, respectively (Table IV), whereas technical CVs never exceeded 7%. For absolute quantification of the wild-type E. coli (MG1655) strain, we scanned 720 transitions during a single 60 min LC-SRM analysis (30 min effective gradient) (supplemental Data File S8). The results obtained for the wild-type strain confirmed that this workflow is robust, efficient and has a high multiplexing potential, while generating accurate, high-throughput data.
In light of these results, the pipeline was used to measure protein concentrations in two strains of biotechnological interest, both genetically engineered to produce high levels of NADPH (NA23 and NA176). NADPH is involved in many biosynthesis pathways, making it a potentially limiting cofactor when synthetic pathways are engineered. As a consequence, strains producing high levels of NADPH are interesting for metabolic engineering purposes (37). The gene pgi, coding for glucose-6-phosphate isomerase, is deleted in the NA23 and NA176 strains. Hence, in these strains we determined the abundance of 21 of the 22 enzymes studied in the wild-type strain. The absolute quantities obtained from biological triplicates were expressed as a number of copies of each protein per cell (Table IV and Fig. 4). The results presented in Fig. 4 distinguish between the most abundant enzymes in each of the three strains, and make it possible to compare variations in abundance between strains. The dynamic range for these measurements was around 10³, because the least abundant protein, PpsA, was estimated to be present at 285 copies per cell (CV 0.7%) in strain NA176, whereas the most abundant protein, AceA, was estimated to be present at 58,843 copies per cell (CV 9.2%). The coefficients of variation for technical replicates never exceeded 15% for these assays and 90% of technical CVs were below 2.7% (supplemental Data File S8). For instance, for strain NA23 (biological replicate 1), the copy numbers per cell measured for Eda and AceA were 314 ± 21 and 52,650 ± 684, with technical CVs of 6.8% and 1.3%, respectively. This confirms the precision of the methodology right across the dynamic range.
We conclude from these results, using both technical and biological triplicates, that the workflow developed is reproducible, and can be used to generate accurate and multiplexed data at high-throughput.
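As a quick illustration of how a technical CV relates to the reported copy numbers, the following minimal sketch recomputes a CV from a mean and a deviation. It assumes that the deviations quoted above for Eda and AceA are standard deviations across technical replicates; the function name is hypothetical. Because the published means and deviations are rounded, the recomputed values may differ slightly from the CVs given in the text.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return the CV in percent (standard deviation relative to the mean)."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return 100.0 * sd / mean

# Copy numbers per cell for strain NA23, biological replicate 1 (from the text).
# Assumption: the quoted deviation is the standard deviation of technical replicates.
cv_eda = coefficient_of_variation(314, 21)
cv_acea = coefficient_of_variation(52650, 684)
print(f"Eda: {cv_eda:.1f}%  AceA: {cv_acea:.1f}%")
```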
FIG. 3. Analysis of PpsA, using full-length 15N labeled standard and scheduled SRM analysis. A, Total extracted ion chromatogram for PpsA. Six peptides were monitored for this enzyme, with at least three transitions per peptide. Zoom shows peptide IEDVPQEQR (retention time 12.6 min); m/z of light peptide is 557, and m/z of heavy peptide is 564. Three transitions were monitored for each version of the peptide. B, Titration curve for protein PpsA, monitoring 6 peptides and at least 3 transitions per peptide. Increasing amounts of light protein were added to samples along with a constant amount of labeled standard. The preparation workflow was applied, combining LysC and trypsin digestion, without prefractionation steps. SRM analyses were performed to determine the heavy/light ratio and thus estimate the light protein concentration. Estimated amounts of light protein were plotted against the quantities added. All titration points were performed in full-process triplicates. Linear regression was applied, and coefficients of variation are a measure of reproducibility between full-process technical triplicates.
Determining Effective Catalytic Rates Using Accurate Quantitative Proteomics Data and Metabolic Flux Measurements-One of the main objectives of metabolic engineering is to change metabolic flux distributions in microorganisms so as to optimize the production of compounds of biotechnological interest (38). Metabolic fluxes are controlled at different levels, including enzyme concentrations, kinetic parameters defining enzyme properties (K_m and k_cat in irreversible Michaelis-Menten kinetics), and the concentrations of substrates, products, and co-factors (39). We used a simple phenomenological model of metabolic fluxes, separating the effects of the abundance and the catalytic rate of the enzymes. The model makes the common assumption that metabolic fluxes are proportional to enzyme concentrations (40). Thus, J = E · k_eff, where J denotes metabolic flux, E enzyme concentration, and k_eff the so-called effective catalytic rate. The effective catalytic rate describes how the maximum catalytic rate k_cat is modulated by concentrations of substrates and products as well as co-factors regulating the catalytic action of the enzyme. As a consequence, k_eff is not a constant enzyme property, like k_cat, but depends on both enzyme properties and condition-varying metabolite concentrations. By definition, the effective catalytic rate cannot exceed the maximum catalytic rate, i.e., k_eff ≤ k_cat (supplemental Data File S3). The model J = E · k_eff can be used even when exact, in vivo values for the kinetic parameters are missing (which is often the case).
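To make the bound on the effective catalytic rate concrete, consider the simplest case mentioned above, irreversible Michaelis-Menten kinetics with a single substrate. The substrate concentration S is a symbol introduced here for illustration only; the text refers to supplemental Data File S3 for the general argument.

```latex
J = E \, k_{\mathrm{cat}} \, \frac{S}{K_m + S}
\quad\Longrightarrow\quad
k_{\mathrm{eff}} = \frac{J}{E} = k_{\mathrm{cat}} \, \frac{S}{K_m + S} \;\le\; k_{\mathrm{cat}}
```

Since S/(K_m + S) is below 1 for any finite S, the effective rate only approaches the maximum rate when the enzyme is saturated (S much larger than K_m).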
Absolute values were available for two of the three variables in the flux equation for the network of Fig. 1, for all three strains: the metabolic flux distribution, as described by Auriol et al. (23), and the enzyme concentrations from the SID-scheduled SRM (scSRM) analysis presented in this study. The effective catalytic rate could thus be computed from k_eff = J/E, after converting the experimental data to comparable units (supplemental Data File S3). This relation can be used to gain a deeper insight into how flux differences across strains are related to relative concentrations and effective catalytic rates of enzymes, by comparing the wild-type reference strain MG1655 with strains NA23 and NA176. The metabolic flux in NA23 or NA176 relative to MG1655 (J/J_MG) equals the product of the relative enzyme concentration (E/E_MG) and the relative effective catalytic rate (k_eff/k_eff-MG). For example, a twofold increase in metabolic flux in strain NA23 (J_NA23/J_MG = 2) could be due to a fourfold increase in enzyme concentration (E_NA23/E_MG = 4) combined with a twofold decrease in its effective catalytic rate (k_eff-NA23/k_eff-MG = 0.5).
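The computation described above can be sketched as follows. This is a minimal illustration, not the conversion procedure of supplemental Data File S3: it assumes that fluxes are expressed in mmol per gram dry weight per hour, that enzyme abundances are in copies per cell, and that the number of cells per gram dry weight is known. All function and parameter names are hypothetical, and the inputs of the last call are made-up numbers.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def k_eff_per_second(flux_mmol_gdw_h: float,
                     copies_per_cell: float,
                     cells_per_gdw: float) -> float:
    """Effective catalytic rate k_eff = J / E, in s^-1.

    flux_mmol_gdw_h : flux J in mmol per gram dry weight per hour (assumed unit)
    copies_per_cell : enzyme abundance E in molecules per cell
    cells_per_gdw   : number of cells per gram dry weight (assumed to be known)
    """
    # J converted to substrate molecules per gram dry weight per second
    molecules_gdw_s = flux_mmol_gdw_h * 1e-3 * AVOGADRO / 3600.0
    # E converted to enzyme molecules per gram dry weight
    enzymes_gdw = copies_per_cell * cells_per_gdw
    return molecules_gdw_s / enzymes_gdw

def relative_flux(rel_enzyme: float, rel_k_eff: float) -> float:
    """J/J_MG as the product of E/E_MG and k_eff/k_eff-MG."""
    return rel_enzyme * rel_k_eff

# Worked example from the text: fourfold more enzyme, k_eff halved
print(relative_flux(4.0, 0.5))  # 2.0

# Purely hypothetical inputs, only to illustrate the unit conversion
k_example = k_eff_per_second(flux_mmol_gdw_h=1.0,
                             copies_per_cell=1000,
                             cells_per_gdw=1e12)
print(f"k_eff = {k_example:.1f} s^-1")
```

Keeping the unit conversion separate from the ratio calculation reflects the fact that the relative decomposition needs no conversion at all, since the constants cancel when two strains are compared.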
We measured the relative metabolic fluxes and enzyme concentrations, and calculated effective catalytic rates for 19 out of the 21 enzymes in all three strains growing on minimal medium with glucose (PpsA and Glk were not considered, as there was no flux data available for the corresponding reactions). The results show that, for some reactions, the metabolic flux differences are mainly due to variations in enzyme concentrations, whereas for others they are the result of changes in the catalytic activity of the enzymes (Table V and supplemental Data File S9). For instance, Auriol et al. (23) compared the NA23 strain with the wild-type strain and found a 53-fold decrease in flux through the reaction catalyzed by SucC, which generates succinate from succinyl-CoA as part of the TCA cycle (Fig. 1). We show that this change in flux results from a 3.3-fold increase in enzyme concentration and a 180-fold decrease in effective catalytic rate (Table V). This indicates that the decrease in effective catalytic rate, rather than a change in enzyme abundance, accounts for the change in metabolic flux. It would not have been possible to reach this conclusion based on flux data alone. Similar observations can be made for the reactions catalyzed by DhsA, GltA, and GpmA, for which the fluxes decreased 14-, 14-, and 16-fold, respectively, in strain NA23 relative to the wild-type MG1655 strain. Although the enzyme concentrations increased 6.9-, 5.9-, and 1.8-fold for these reactions, the effective catalytic rates decreased 100-, 83-, and 29-fold, respectively (Table V). For AceA, Auriol et al. observed a fivefold increase in flux. Our results indicate that this is the result of an 8.1-fold increase in enzyme concentration, and a 1.6-fold decrease in effective catalytic rate. The results observed for AceA in this study match those reported by Hua et al. (41), who demonstrated that deleting pgi led to increased activation of the glyoxylate shunt. Consistent with this, IcdA abundance increased 1.6-fold, whereas its catalytic rate decreased 81-fold.
This leads to a decreased flux through the TCA cycle, with substrates redirected through the glyoxylate shunt (see below).
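The fold changes quoted above can be cross-checked against the model J = E · k_eff with a few lines of code. The sketch below uses only the rounded values given in the text for NA23 relative to MG1655; the dictionary and function names are hypothetical. Because the inputs are rounded, the implied flux changes agree only approximately with the reported ones (for SucC, for example, the rounded inputs imply a decrease slightly different from the reported 53-fold).

```python
# (fold increase in enzyme concentration, fold decrease in k_eff),
# NA23 relative to MG1655, as quoted in the text (Table V)
reported = {
    "SucC": (3.3, 180.0),
    "DhsA": (6.9, 100.0),
    "GltA": (5.9, 83.0),
    "GpmA": (1.8, 29.0),
    "AceA": (8.1, 1.6),
}

def implied_flux_ratio(enzyme_fold_up: float, k_eff_fold_down: float) -> float:
    """J/J_MG implied by J = E * k_eff."""
    return enzyme_fold_up / k_eff_fold_down

for name, (e_up, k_down) in reported.items():
    ratio = implied_flux_ratio(e_up, k_down)
    if ratio < 1:
        print(f"{name}: flux decreased about {1 / ratio:.0f}-fold")
    else:
        print(f"{name}: flux increased about {ratio:.1f}-fold")
```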
In addition to quantifying the relative contributions of enzyme concentrations and activities to observed flux changes, the computation of effective catalytic rates may help in pinpointing flux bottlenecks in (reengineered) metabolic networks. If the effective catalytic rate is close to k_cat, the enzyme is operating not far from its maximum capacity. As a consequence, a further increase of flux through the reaction will require an increase of the enzyme concentration. In the NA176 strain, for instance, the flux through the glyoxylate shunt is much increased in comparison with the wild-type MG1655 strain and the effective catalytic rate of AceB is 34.7 s⁻¹. When comparing this number with the k_cat constant, using the compendium of Bar-Even et al. (42), the values of k_eff and k_cat are found to be close (k_cat = 48.1 s⁻¹). This indicates a limiting capacity constraint in the network that could be resolved by overexpressing the enzyme. Note that this diagnostic depends critically on being able to compare k_eff with k_cat, and thus on the ability to measure absolute enzyme concentrations with high precision and accuracy, as in this study. Using the workflow and the labeled protein standards developed in this work, absolute enzyme concentrations can be obtained for any E. coli strain cultivated in any experimental conditions, and effective catalytic rates can be calculated from flux data, if available. As demonstrated by the comparison of the wild-type and mutant strains, information on the effective catalytic rate of an enzyme can be used to distinguish between two levels of regulation of metabolic fluxes: enzyme concentrations and the rates at which enzymes catalyze reactions. The insights thus obtained can be used to target metabolic engineering efforts, for instance to decide whether it is worth increasing the expression of a gene coding for a specific enzyme.
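The bottleneck diagnostic described above amounts to comparing k_eff with k_cat. A minimal sketch follows, using the two values quoted for AceB in strain NA176. The function name and the 0.7 cut-off are illustrative assumptions; the text does not define a numerical threshold for "close".

```python
def saturation(k_eff: float, k_cat: float) -> float:
    """Fraction of the maximum catalytic rate at which the enzyme operates."""
    if k_cat <= 0:
        raise ValueError("k_cat must be positive")
    return k_eff / k_cat

# Values quoted in the text for AceB in strain NA176 (both in s^-1)
aceb = saturation(k_eff=34.7, k_cat=48.1)

# Assumption: arbitrary illustrative threshold, not taken from the source
NEAR_CAPACITY = 0.7
print(f"AceB operates at {aceb:.0%} of k_cat; near capacity: {aceb >= NEAR_CAPACITY}")
```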
The method is based on stable isotope dilution combined with MS-based analysis using Selected Reaction Monitoring (SRM). Previous studies have demonstrated that several proteins could be detected and accurately quantified with high sensitivity and specificity using peptide (43) or concatemer (44) standards coupled to SRM analysis. In particular, Carroll et al. (44) described the absolute quantification of 27 enzymes (and isoenzymes) of the yeast glycolysis pathway using a single 88 kDa concatemer. In the present work, we chose an absolute quantification methodology that allows the best possible accuracy for CCM enzyme concentration measurements. To obtain highly accurate measurements, choices had to be made concerning the type of standards, the MS analysis mode, and quality control of the overall workflow. In the context of large-scale studies, AQUA and QconCAT standards are commonly used for multiplexed absolute quantification experiments. For example, a QconCAT strategy has been developed with the objective of providing absolute quantification for at least 4000 yeast proteins (45). However, as described by Brownridge et al., "Quantification is impaired if either the QconCAT or the analyte proteins are incompletely digested, such that the yield of either peptide is incomplete - indeed, this is not a problem unique to our workflow, but any quantitative approach using proteolytic digestion to generate peptides as analytes." Thus, to ensure accurate measurement of protein concentrations, full-length stable isotope labeled protein standards were chosen as reference. Added at the very beginning of sample preparation, these standards go through the same processing steps as the endogenous proteins, which compensates for biases introduced during sample preparation and digestion. Considering our study model (E. coli), it appeared relevant to produce the full-length isotopically labeled standards in E. coli cells, to allow high production rates. The first step was to develop a minimal growth medium limiting substrate consumption and thus production costs. Then, a purification system had to be found to allow easy and efficient production of standards. We selected a system where purification and tag cleavage were performed during a single step (31). We then verified the isotope incorporation rate by comparing precursor ion mass spectra with a simulated isotopic distribution. Finally, standards were calibrated using an AAA-MS approach, which requires only tiny amounts of starting material and less time, with accuracy similar to that of standard AAA (26). This workflow can be used for straightforward and efficient production of large amounts of multiple full-length isotopically labeled standards, and is particularly suitable for soluble proteins. This system is not limited to E. coli proteins, and can be extended to many applications where post-translational modifications are not needed. For instance, Wang et al. (31) developed a similar 15N labeled standard production system to quantify human apolipoprotein E, whereas another study (46) produced 15N-labeled analogues to accurately quantify DNA glycosylase.
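The principle of stable isotope dilution with a calibrated labeled standard can be sketched as follows. This is a generic illustration, not the data processing used in this study: the choice of the median to summarize peptide-level ratios, the function names, and every numerical input are assumptions made for the example.

```python
from statistics import median

AVOGADRO = 6.02214076e23  # molecules per mole

def light_amount_fmol(light_to_heavy_ratios: list[float],
                      heavy_spiked_fmol: float) -> float:
    """Estimate the endogenous (light) protein amount from peptide-level L/H ratios."""
    # Assumption: peptide-level ratios are summarized by their median
    return median(light_to_heavy_ratios) * heavy_spiked_fmol

def copies_per_cell(amount_fmol: float, n_cells: float) -> float:
    """Convert an amount in fmol, measured from n_cells cells, to copies per cell."""
    return amount_fmol * 1e-15 * AVOGADRO / n_cells

# Hypothetical inputs: three peptide ratios, 100 fmol of standard, 1e8 cells
amount = light_amount_fmol([0.48, 0.50, 0.52], heavy_spiked_fmol=100.0)
copies = copies_per_cell(amount, n_cells=1e8)
print(f"{amount:.1f} fmol, about {copies:.0f} copies per cell")
```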
A recent study has demonstrated that an SRM approach can quantify proteins over a dynamic range between 50 and 10^6 copies per cell (43). According to Ishihama et al. (47), the protein dynamic range in E. coli strains is ~4 orders of magnitude. The selectivity, sensitivity, and dynamic range capabilities conferred by SRM make this approach the most appropriate for our study. One of the main challenges was to
We applied the quantification workflow to strains with increased production of NADPH, an essential intermediate involved in many biosynthesis pathways. We focused on how this imbalance affected the abundance of CCM enzymes (21). To do this, we studied the modified microorganisms developed by Auriol et al. (23). In strain NA23 (MG1655 Δpgi Δedd ΔudhA Δqor), NADPH overproduction was the result of deletion of the pgi and edd genes. Auriol et al. showed that, to compensate for the stress caused by the NADPH/NADP+ imbalance, two adaptive mutations in two different genes occurred to produce strain NA176 (23). The first of these, nuoF*, occurred in the gene encoding a subunit of the water-soluble NADH:ubiquinone oxidoreductase, leading to novel NADPH reoxidizing pathways. The second mutation, rpoA*, occurred in the gene coding for the alpha subunit of RNA polymerase, and causes a global transcriptional rearrangement, involving the repression of genes contributing to the TCA pathway and the glyoxylate shunt (23). Results presented by Auriol et al. showed that, in the case of NA23 and NA176, fluxes were rerouted through the PPP, whereas the lower glycolysis pathway was activated to produce pyruvate. TCA enzymes were regulated through hyperactivation of the glyoxylate shunt, thus limiting NADPH production by isocitrate dehydrogenase (Fig. 1). This resulted in an accumulation of oxaloacetate, which explains the need to activate anaplerotic pathways to maintain the balance of TCA intermediates (21).
How can the accurate quantitative proteomics data obtained with the approach presented in this study be exploited for the analysis of the network? The absolute numbers in Table IV are interesting in themselves, as they show to which CCM enzymes the cell directs its protein synthesis investments. However, the data are even more valuable when cross-referenced with other types of omics resources.
We have combined the proteomics data with flux distributions available from experiments carried out under exactly the same conditions as in this study. This makes it possible to analyze the relative contributions of the concentration and the catalytic rate of enzymes to observed flux variations between strains. For NA23, the increased flux through the AceA reaction in the glyoxylate shunt was found to be mainly due to a higher enzyme concentration with a relatively constant effective catalytic rate. A similar result was observed for AceB. In contrast, flux reduction through IcdA appeared to be mainly due to a diminished catalytic rate of the enzyme. These results concur with studies describing regulation of the glyoxylate bypass operon, aceBAK. This operon codes for the glyoxylate shunt enzymes AceA and AceB, as well as isocitrate dehydrogenase kinase/phosphatase (AceK). AceK phosphorylates IcdA, which reduces its activity (49, 50). Thus, our results, showing that AceA and AceB were present in increased amounts whereas the effective catalytic rate of IcdA decreased, are consistent with the known co-expression of aceA, aceB, and aceK as well as the post-translational regulation of IcdA by AceK. In the NA176 strain, which results from adaptive evolution of NA23, the IcdA flux was higher than in NA23. This appears to be due to an increased effective catalytic rate, as the enzyme concentration remains similar in both strains. The above results support the consistency of our strategy for calculating in vivo effective catalytic rates in prokaryote models from fluxes and enzyme concentrations. Recently, effective catalytic rates were also used for understanding the complex relation between the protein synthesis rate and the growth rate of the cell (24, 25).
Quantitative proteomics, through the computation of effective catalytic rates of enzymes, thus makes it possible to dissect network functioning in a novel way. In addition, this information may also be of great value in the context of metabolic engineering, as it indicates which enzymes operate close to their maximum capacity and may thus be bottlenecks in the network. For example, in the NA176 strain, where the flux is rerouted through the glyoxylate shunt, the k_eff value of AceB was found to be close to its maximal value, given by k_cat, thus suggesting a possible target for protein overexpression. Although this strategy is interesting as a heuristic guideline, one should be aware of its limits. Assays for determining k_cat values are performed with purified enzymes in vitro, and may therefore give a distorted view of the actual in vivo situation (51). Even more importantly, control of metabolic fluxes is a global network property and the relaxation of local capacity constraints for individual reactions is not guaranteed to increase the flux through those reactions (40). Therefore, it will be interesting to integrate accurate quantitative proteomics data into predictive models, both constraint-based models (CBMs) of the flux distribution in metabolic networks and mechanistic models including details of enzyme kinetics. For instance, enriching CBMs with quantitative proteomics and metabolomics data, as part of IOMA (52), would increase the accuracy of the metabolic fluxes predicted. Ultimately, quantitative information on protein abundance may be most useful when combined with both flux and metabolite data in the construction of predictive kinetic models, which explicitly take into account the regulatory mechanisms of the cell.
A recent example of the latter strategy is the study by Smallbone et al. (53), who combined proteomics and metabolomics data on yeast glycolysis with enzyme kinetic assays to obtain in vivo values for k_cat and K_m. When integrated into a dynamic kinetic model, these values made it possible to refine the picture of the control of yeast glycolysis, which appears to be distributed more widely over the whole system than originally thought. Another example is the data set of Ishii et al., who measured in parallel enzyme and metabolite concentrations as well as fluxes and mRNA levels in CCM of E. coli (54). These data have been used to estimate the parameter values in a simplified model of the metabolic network, revealing among other things a number of critical issues for model identification in genome-scale kinetic models, such as theoretical and practical limits on the identifiability of parameter values (55).
A limitation of the present work is related to the fact that isoenzymes were not individually considered when analyzing effective catalytic rates. As a consequence, the effective rates of some enzymes might possibly be incorrectly calculated as fluxes might result from the action of several enzymes. However, in prokaryote cells, like E. coli, isoenzymes are rare because a given activity is more commonly controlled by a single protein. Among the 22 chosen enzymes, only the transketolase activity involved two isoenzymes, TktA and TktB, of which only TktB was measured. In order to obtain a reliable estimate of the overall in vivo contribution of enzyme abundance and effective catalytic rate to a change in flux, the abundances of both isoenzymes would have to be quantified. In order to separate the activity of each isoenzyme, in vitro assays can be performed as described by Smallbone et al. (53).
In conclusion, in this work we describe the development of a proteomics workflow allowing multiplex absolute quantification of enzymes involved in E. coli CCM. We analyzed the data thus obtained in an innovative way, by calculating effective catalytic rates of the enzymes for wild-type and mutant strains optimized for higher NADPH production. This provides clues about the relative contributions of enzyme concentrations and their catalytic rates to flux changes observed across different strains as well as about potential flux bottlenecks in the network. Although important in their own right, the results thus obtained also lead the way toward kinetic models of metabolism that can be used as predictive tools in systems biology and many applications in biotechnology.