Interlaboratory Evaluation of Automated, Multiplexed Peptide Immunoaffinity Enrichment Coupled to Multiple Reaction Monitoring Mass Spectrometry for Quantifying Proteins in Plasma

The inability to quantify large numbers of proteins in tissues and biofluids with high precision, sensitivity, and throughput is a major bottleneck in biomarker studies. We previously demonstrated that coupling immunoaffinity enrichment using anti-peptide antibodies (SISCAPA) to multiple reaction monitoring mass spectrometry (MRM-MS) produces immunoprecipitation MRM-MS (immuno-MRM-MS) assays that can be multiplexed to quantify proteins in plasma with high sensitivity, specificity, and precision. Here we report the first systematic evaluation of the interlaboratory performance of multiplexed (8-plex) immuno-MRM-MS in three independent laboratories. A staged study was carried out in which the effect of each processing and analysis step on assay coefficient of variance, limit of detection, limit of quantification, and recovery was evaluated. Limits of detection were at or below 1 ng/ml for the assayed proteins in 30 μl of plasma. Assay reproducibility was acceptable for verification studies, with median intra- and interlaboratory coefficients of variance above the limit of quantification of 11% and <14%, respectively, for the entire immuno-MRM-MS assay process, including enzymatic digestion of plasma. Trypsin digestion and its requisite sample handling contributed the most to assay variability and reduced the recovery of target peptides from digested proteins. Using a stable isotope-labeled protein as an internal standard, instead of stable isotope-labeled peptides, to account for losses in the digestion process nearly doubled assay accuracy for this protein while improving assay precision by 5%. Our results demonstrate that multiplexed immuno-MRM-MS can be made reproducible across independent laboratories and has the potential to be adopted widely for assaying proteins in matrices as complex as plasma.

Hundreds of proteins with claimed potential as diagnostic or prognostic biomarkers of disease presence or progression have emerged from discovery proteomics studies, but few have progressed to use in the clinic. The failure to deliver on the promise of plasma protein biomarkers is in part due to the lack of sufficiently precise methods to measure changes in abundance of numerous candidate protein biomarkers in hundreds of patient samples. We call this critical step "verification," and it bridges discovery and clinical validation of biomarkers (1-5). Verification of candidate protein biomarkers currently relies primarily on standard immunoassays. Analytically validated immunoassays measure target analytes with high specificity, sensitivity, and throughput, although they are not without problems caused by autoantibodies and off-target interferences. In addition, antibody reagents suitable for constructing useful sandwich immunoassays exist for only a small fraction of the proteome. Developing reliable immunoassays for every potential candidate biomarker is not practical at present because of the long development times and high cost (3), which have led to a dearth of new protein biomarkers (6). Alternative methods that have similar performance attributes to immunoassays are clearly needed for the biomarker verification step.
The performance characteristics of multiplexed SID-MRM-MS¹ assays for direct measurement of proteins in plasma have been systematically and rigorously evaluated, including the intra- and interlaboratory reproducibility of the technology, using well established methods for analytical validation of clinical assays (7-9). These studies demonstrated that the technology is highly specific, sensitive, and precise for direct quantification of proteins in plasma. Limits of quantification (LOQ) by direct analysis of digested plasma by SID-MRM-MS were in the range of 1-5 μg/ml of protein (9), with intralaboratory reproducibility between 6.6 and 54.9% and interlaboratory imprecision of 18.5% as measured at the LOQ. Importantly, these excellent performance levels were achieved regardless of the specific LC or MS instrument employed, provided that proper care was exercised in sample preparation and data analysis.
Although many proteins of clinical utility are present in blood at or above 1 μg/ml (e.g. C-reactive protein), most disease-specific protein biomarkers are present at ng/ml and lower levels (e.g. cardiac troponins in heart failure, prostate specific antigen in prostate cancer, and thyroglobulin in thyroid cancer) (6). The complexity and 10^11 dynamic range of protein abundances in blood severely limit the sensitivity of protein quantification in plasma by SID-MRM-MS. A number of approaches have been taken to address this problem. Keshishian et al. (8) developed a method involving abundant protein depletion using commercial immunoaffinity columns followed by limited peptide fractionation (6-8 fractions) prior to on-line LC-SID-MRM-MS. Using this approach, termed fractionation MRM (fMRM), robust, multiplexed quantification of proteins at the bottom of the ng/ml range in plasma was achieved with an overall technical reproducibility between 2.6 and 37% with a median CV of 9.8% at 2.5 ng/ml. This technology has been used to develop multiplexed assays for known and novel markers of cardiovascular injury that are sufficiently sensitive and reproducible to assess levels of these proteins in patient samples (10). A limitation is that the throughput of fMRM is lower than that of direct MRM-MS analysis, primarily because of the increased amount of instrument time required for analysis of each sample fraction. Zhang and co-workers (11,12) developed a highly selective approach that specifically enriches N-glycosylated peptides and proteins and then quantifies the formerly glycosylated peptides by MRM-MS. This approach permitted precise quantification of N-linked glycoproteins in the low ng/ml range in plasma. Unlike the fMRM approach, the choice of which peptide to use for quantification is strictly limited to peptides that had been N-glycosylated.
If the formerly N-glycosylated peptide does not have suitable HPLC retention or ionization properties for LC-MS/MS, it will not be usable for quantification. In addition, a change in the level of one of these peptides could be due to either a change in the level of the protein or a change in the extent of glycosylation at this particular site, the latter occurring without a change in protein level.
Stable isotope standards and capture by anti-peptide antibodies (SISCAPA) employs immunoaffinity enrichment of targeted peptides from digested tissue or biofluids prior to analysis by MRM-MS (immuno-MRM) (13). A number of reports from our laboratories (14-17), as well as from others (18-21), have demonstrated the ability of immuno-MRM assays to achieve LODs of ~1 ng/ml for proteins using as little as 10 μl of plasma with process reproducibility similar to that of MRM-MS and fMRM-MS. Immuno-MRM assays have been developed and applied to measure a range of known and potentially novel protein biomarkers in cancer (14,15,22) and cardiovascular disease (16). An automated magnetic bead-based platform capable of high throughput processing for immuno-MRM has been developed and tested to measure proteins with high precision (15,17). Multiplexing of immuno-MRM assays can be achieved at plex levels of at least 10 by combining anti-peptide antibodies produced against distinct peptides (17). Mixtures of these antibodies are used to enrich the target peptides from plasma by immunoaffinity in multiplex. Subsequent washing and elution of the beads releases all captured peptides for analysis by LC-MRM-MS. The pharmaceutical and biotechnology industries are now beginning to develop immuno-MRM assays internally or in partnership with academia and contract research organizations to measure proteins in preclinical animal models for pharmacodynamic assessment of drug candidates (19-21). These studies have clearly demonstrated the advantages and potential of immuno-MRM for constructing multiplexed assays to measure large numbers of proteins in biofluids and for overcoming limitations of traditional immunoassays.
Here we report a systematic evaluation of the intra- and interlaboratory performance of multiplexed immuno-MRM assays, a prerequisite for more widespread acceptance and adoption of the technology in academia and industry. Protocols were developed to assess the intra- and interlaboratory reproducibility of the technique performed at multiple sites and to systematically examine the sources of imprecision in each phase of the assay process. The results for the 8-plex assay developed define the achievable performance of immuno-MRM using automated bead capture and washing and provide additional support that immuno-MRM-MS is a valuable technology for bridging the gap between discovery and clinical validation of candidate protein biomarkers.

¹ The abbreviations used are: SID-MRM-MS, stable isotope dilution multiple reaction monitoring mass spectrometry; CV, coefficient of variance; MRM, multiple reaction monitoring; fMRM, (chromatographic) fractionation MRM-MS; immuno-MRM-MS, immunoprecipitation-MRM-mass spectrometry; LOD, limit of detection; LOQ, limit of quantification; PAR, peak area ratio; SISCAPA, stable isotope standards and capture with anti-peptide antibodies; RE, relative error; AA, amino acid(s).

EXPERIMENTAL PROCEDURES
Peptide Standards, Protein Standards, Rabbit Anti-peptide Polyclonal Antibodies, and Plasma-Eight peptides specific to six proteins were selected using methods and criteria previously described (8,9) (Table I). Briefly, all possible tryptic peptides for each protein were generated in silico using Agilent Spectrum Mill Peptide Selector (http://proteomics.broadinstitute.org/millhtml/mssluice.htm; Agilent, Santa Clara, CA) and chosen on the basis of uniqueness, size, and lack of modifiable residues. They were synthesized with a single stable isotope-labeled amino acid using Fmoc (N-(9-fluorenyl)methoxycarbonyl) chemistry and purified by reversed phase chromatography to greater than 90% (MIT Biopolymer Lab, Cambridge, MA). In general, the proteins were selected for assay development because of an expected low concentration (less than 1 ng/ml) in plasma so that standard addition experiments could be generated with standard peptides in the absence of endogenous analyte. Granulocyte colony-stimulating factor protein standard (CSF3) was purchased from US Biological (Swampscott, MA; catalogue number G8951-26P). Unlabeled and fully 15N-labeled forms of recombinant S100B were obtained from Argonne National Laboratory (Argonne, IL). Proteins S100A7, S100A8, S100A12, and IL1RN were purchased from Abnova (Taipei, Taiwan) and were all expressed with GST tags. Protein concentrations were determined by colorimetric methods or amino acid analysis and used without further formulation. Protein purity was determined to be greater than 90% by SDS-PAGE. Although the GST-tagged protein standards each yielded the expected tryptic peptides when digested in buffer, none of the targeted peptides were observed by MRM when the proteins were spiked into plasma prior to digestion.
The reason for a lack of digestion of the GST-tagged proteins in plasma is unclear, but could be due to facile homodimerization of the GST tag that could decrease the accessibility of the tagged proteins, thereby limiting digestion by trypsin or, alternatively, that the dimerized GST tag could decrease the solubility of the tagged protein, again resulting in low or no yield of the targeted peptides. No further work was done using the GST-tagged proteins.
Rabbit polyclonal antibodies were generated in New Zealand White rabbits following a standard 98-day protocol (Epitomics, Burlingame, CA) as previously described (17). In brief, peptides were synthesized to 80% purity with an additional cysteine on the C terminus and conjugated to keyhole limpet hemocyanin for immunization. Peptides were combined into groups of five, and each rabbit was immunized with a different mixture of five distinct peptides. Antisera titers were measured by peptide ELISA (23) to determine the peptides that elicited the highest immune response. Anti-peptide antibodies, typically specific for two or three peptides per protein, were purified separately by affinity chromatography using the immunizing peptide covalently attached to a Sulfolink column (Thermo Fisher Scientific, San Jose, CA). Affinity-purified polyclonal antibodies were shipped in 1× PBS, 0.02% sodium azide to one site where they were formulated to 25% glycerol and aliquotted for long term storage at −20°C. Antibodies were sent to each site on wet ice and stored at 4°C until use. Plasma pooled from deidentified human specimens was purchased from Bioreclamation Inc. (Westbury, NY) and treated as not human subjects research for samples prepared in this study.

Table I legend: Detailed scheduled MRM methods are listed in Supplemental Methods. Transition ions in bold are used for all calculations including reporting relative abundance, calculating LOD, LOQ and CV. Stable isotope labeled amino acids are in blue (one AA) or red (all AA) and carbamidomethylated cysteine is in green. Antibody grades were assigned in the initial screening assays (17) based primarily on the limits of detection determined from a mini-curve prepared as 10-fold step dilutions from 500 fmol to 0.5 fmol.
Sample Preparation for All Study Phases-All of the samples were prepared at one site to remove initial sample preparation as a source of variability. Curves containing eight light peptides unique to six proteins ranging from 3 amol/μl to 500 fmol/μl were prepared in triplicate. The same concentrations of analyte were used in each phase of the study to directly compare the performance across all phases (supplemental Table 1). In phase 1, peptides were spiked into digested plasma diluted 1:500 in 5% acetic acid, 0.03% CHAPS to simulate nonspecifically bound plasma remaining after magnetic antibody bead enrichment (120 μg/ml). In phase 2, the peptides were spiked into digested plasma diluted 1:5 in 1× PBS, 0.03% CHAPS to simulate a resuspended plasma digest prepared for magnetic antibody bead enrichment. In phases 3 and 4, two proteins, S100B (unlabeled (light) and stable isotope-labeled (heavy) forms) and CSF3 (light form only) were added by standard addition into plasma at molar concentrations equivalent to those used in the peptide curves to simulate biomarkers in a clinical sample (supplemental Table 1). The highest concentration, 500 fmol/μl, was not prepared for phases 3 or 4 because of limiting amounts of reagents. The concentration of heavy peptide standard was 5 fmol/μl for phases 1 and 2 and 10 fmol/μl for phases 3 and 4. All of the samples were frozen and then distributed to each site.
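The background matrix concentrations quoted for phases 1 and 2 follow from simple dilution arithmetic. The sketch below is illustrative only: it assumes an undigested plasma protein concentration of 60 mg/ml (the value given in the Fig. 1 legend) and reads "1:500" and "1:5" as 500-fold and 5-fold dilutions. The function and constant names are hypothetical.

```python
# Assumed total protein concentration of undigested plasma (mg/ml),
# taken from the 60 mg/ml value quoted in the Fig. 1 legend.
PLASMA_PROTEIN_MG_PER_ML = 60.0


def diluted_matrix_mg_per_ml(dilution_factor, stock_mg_per_ml=PLASMA_PROTEIN_MG_PER_ML):
    """Protein concentration (mg/ml) after a 1:dilution_factor dilution.

    Assumes "1:N" means an N-fold dilution of the stock.
    """
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be positive")
    return stock_mg_per_ml / dilution_factor


phase1_mg_per_ml = diluted_matrix_mg_per_ml(500)  # phase 1 background
phase2_mg_per_ml = diluted_matrix_mg_per_ml(5)    # phase 2 background
phase1_ug_per_ml = phase1_mg_per_ml * 1000.0      # mg/ml to ug/ml
```

Under these assumptions the phase 1 background works out to 120 μg/ml and the phase 2 background to 12 mg/ml, matching the values given in the text.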
Plasma Enzymatic Digestion-Digestion was performed using urea as the denaturant as described previously (8). Conditions based on digestion of 100 μl of plasma were scaled accordingly to digest 10 μl to 100 ml of plasma as required. Bulk plasma digests, used as a background matrix for assay evaluation, were digested with bovine trypsin (Sigma). Mock biomarker samples, containing protein standards, were digested with porcine trypsin (Promega, Madison, WI). Briefly, 100 μl of plasma were reduced and denatured in 7.5 M urea, 0.2 M Tris buffer, pH 8, and 40 mM tris(2-carboxyethyl)phosphine at 37°C for 30 min, alkylated in the dark at ambient temperature with 120 mM iodoacetamide for 30 min, diluted 1:6 with 0.2 M Tris buffer, pH 8, and incubated for 16 h with trypsin at a 1:50 enzyme to substrate ratio on a Thermomixer (Eppendorf AG, Hamburg, Germany) at 37°C and 750 rpm. After 16 h, a second volume of trypsin was added at the same ratio, and incubation continued for an additional 4 h. Formic acid was added to 2% final concentration to quench the reaction.
Samples were diluted 1:2 with 0.1% formic acid and loaded onto Oasis® HLB 3cc (60 mg) extraction cartridges (Waters, Milford, MA) that were prewashed and equilibrated with 2 ml of 90% acetonitrile, 0.1% formic acid followed by 2 ml of 0.1% formic acid. Stable isotope-labeled (heavy) peptides (100 fmol) were added to phase 3 and 4 samples prior to loading. After loading, each sample was washed with 5 ml of 0.1% formic acid and eluted with 2 volumes of 400 μl of 90% acetonitrile, 0.1% formic acid. Samples were vacuum-centrifuged or lyophilized to dryness and stored at −80°C until analysis.
Automated Peptide Immunoaffinity Enrichment-Fifty microliters of each sample were added to a 96-well microtiter plate in triplicate. A bulk mixture of all eight anti-peptide antibodies was prepared to a final concentration of 20 μg/ml in sufficient quantity to add 1 μg of each antibody to each sample. Sixteen microliters of 1-μm protein G magnetic beads (Invitrogen) were added to each sample and mixed briefly. The plates were sealed with aluminum adhesive seal mats and gently inverted on a Labquake® (Thermo Fisher Scientific) rotator overnight at 4°C. After overnight incubation, the plate was transferred onto a KingFisher magnetic bead processor (Thermo Fisher Scientific) equipped with a PCR magnet head. The plates were processed as described previously (15) with modifications (supplemental methods). Briefly, the beads were transferred from the incubation plate into a 250-μl plate containing 1× PBS, 0.03% CHAPS and mixed for 1.5 min. The beads were subsequently transferred and mixed twice more, with the second wash plate containing one-tenth the salt concentration. In the final step, the beads were transferred into a 100-μl PCR plate containing 25 μl of 5% acetic acid to elute the bound material. Elution plates were sealed with an adhesive aluminum seal mat and frozen at −80°C until analysis.
Full loop injections (10 μl) of each sample were loaded onto the trap column with CH1A:CH1B (50:50) at 10 μl/min for 1.5 min. After loading, CH2 was switched into line with the trap and analytical columns, and samples were eluted with the following gradient at 300 nl/min over 18 min: 2.3% B isocratically for 3 min, 2.3% B to 40% B in 10 min, 40% B to 90% B in 2 min, and hold at 90% B for 3 min. After 18 min, the system was returned to 2.3% B for 19 min, increasing the flow rate to 400 nl/min for the last 10 min and switching the trap column in-line with CH1 for the last 2 min at 10 μl/min. Heavy peptide mixtures (50 fmol) in digested plasma diluted 1:1000 in 5% acetic acid were injected prior to analysis to condition the system and scheduled with the MRM method monitoring three transitions per peptide per isotopic version (light, heavy, heavy 15N). Five abundant plasma peptides that nonspecifically bound to the beads were also monitored for a total of 68 transitions (supplemental methods).
Data Analysis and Calculations-Extracted ion chromatograms of all transition ions were integrated using MultiQuant™ (AB Sciex, Foster City, CA) version 1.2. Integrated peak areas were manually inspected to make sure the automated settings in the software integrated the same peak and retention time for each light and heavy pair of peptides.
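The 18-min elution program can be written as a small breakpoint table, which makes it easy to confirm that the segments sum to the stated run time. This is a minimal sketch, not instrument method code: the breakpoints are transcribed from the gradient description above, each ramp is assumed to be linear, and the names are hypothetical.

```python
# (time in min, % B) breakpoints transcribed from the gradient description:
# 2.3% B for 3 min, 2.3-40% B in 10 min, 40-90% B in 2 min, hold 90% B for 3 min.
GRADIENT = [(0.0, 2.3), (3.0, 2.3), (13.0, 40.0), (15.0, 90.0), (18.0, 90.0)]


def percent_b(t_min, program=GRADIENT):
    """Return % B at time t_min, assuming linear ramps between breakpoints."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


total_minutes = GRADIENT[-1][0] - GRADIENT[0][0]
midpoint_of_main_ramp = percent_b(8.0)  # halfway through the 2.3-40% B ramp
```

The re-equilibration period (19 min at 2.3% B) is left out of the table because it follows the elution program rather than forming part of it.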
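Calculation of Protein Concentration Using Spiked (Single AA) Labeled Heavy Peptide-Light peptide to heavy (single AA) peptide peak area ratios (PAR) were entered into the following equation to calculate the observed concentration of protein in the original sample, reported in ng/ml:

\[
C_{\text{protein}}\ (\text{ng/ml}) = \frac{\text{PAR} \times C_{\text{heavy peptide}}\ (\text{fmol}/\mu\text{l}) \times V_{\text{analysis}}\ (\mu\text{l}) \times \text{MW}_{\text{protein}}\ (\text{fg/fmol})}{V_{\text{process}}\ (\mu\text{l}) \times 1000}
\]

where C_heavy peptide is the concentration of the 13C-labeled standard peptide, V_analysis is the analysis volume, V_process is the original process volume, and the factor of 1000 converts fg/μl to ng/ml.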
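As a worked illustration of the peptide-based calculation, the following sketch implements the equation as described. The function name and the example inputs are hypothetical and are not taken from the study data. A molecular weight in Da is numerically equal to fg/fmol.

```python
def protein_ng_per_ml(par, heavy_peptide_fmol_per_ul, analysis_volume_ul,
                      protein_mw_fg_per_fmol, process_volume_ul):
    """Observed protein concentration (ng/ml) from a light/heavy peak area ratio.

    par                       light/heavy (single AA) peak area ratio
    heavy_peptide_fmol_per_ul concentration of heavy standard peptide
    analysis_volume_ul        volume analyzed (ul)
    protein_mw_fg_per_fmol    protein molecular weight (Da == fg/fmol)
    process_volume_ul         original plasma volume processed (ul)
    """
    if process_volume_ul <= 0:
        raise ValueError("process_volume_ul must be positive")
    fg_per_ul = (par * heavy_peptide_fmol_per_ul * analysis_volume_ul
                 * protein_mw_fg_per_fmol / process_volume_ul)
    return fg_per_ul / 1000.0  # fg/ul -> ng/ml


# Illustrative inputs only (not study values).
example = protein_ng_per_ml(par=0.5, heavy_peptide_fmol_per_ul=10.0,
                            analysis_volume_ul=25.0,
                            protein_mw_fg_per_fmol=18700.0,
                            process_volume_ul=10.0)
```

A quick unit check: with PAR = 1, 1 fmol/μl of standard, equal analysis and process volumes, and a 1000-Da protein, the function returns 1 ng/ml, which agrees with 1 nM of a 1-kDa analyte.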
Calculation of Protein Concentration Using Spiked Uniformly Labeled (All AA) Peptide Derived from Heavy Protein-Light peptide to uniformly heavy (all AA) peptide PARs were entered into the following equation to calculate the observed concentration of protein in the original sample, reported in ng/ml:

\[
C_{\text{protein}}\ (\text{ng/ml}) = \frac{\text{PAR} \times C_{\text{heavy protein}}\ (\text{ng/ml})}{\text{normalization factor}}
\]

where the normalization factor is the average peak area ratio of the light peptide to the corresponding uniformly 15N-labeled peptide. The normalization factor accounts for the fact that the light protein and the uniformly 15N-labeled protein are spiked in at identical levels.
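The heavy-protein calculation can be sketched in the same way. The names and example numbers below are hypothetical; the normalization factor is taken, as described above, to be the mean light/15N peak area ratio measured when both protein forms are present at identical levels.

```python
from statistics import mean


def normalization_factor(equimolar_pars):
    """Average light/15N peak area ratio from equimolar light and heavy protein."""
    if not equimolar_pars:
        raise ValueError("at least one peak area ratio is required")
    return mean(equimolar_pars)


def protein_ng_per_ml_heavy_protein(par, heavy_protein_ng_per_ml, norm_factor):
    """Observed protein concentration (ng/ml) using a heavy protein standard."""
    return par * heavy_protein_ng_per_ml / norm_factor


# Illustrative inputs only (not study values).
nf = normalization_factor([0.9, 1.0, 1.1])
example = protein_ng_per_ml_heavy_protein(par=0.5,
                                          heavy_protein_ng_per_ml=100.0,
                                          norm_factor=nf)
```

Because the heavy protein passes through digestion alongside the analyte, losses in that step affect both forms and largely cancel in the ratio, which is the rationale given for this internal standard.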
Calculation of Intralaboratory and Interlaboratory % CV, % RE, LOD, and LOQ-Imprecision (% CV) was calculated using the mean of the process replicates (i) within each site (intralab) and (ii) across all sites (interlab). Precision was calculated as 1 minus % CV. The percentage of relative error (% RE) was calculated as (observed − expected)/expected × 100. Accuracy and percent recovery were calculated as 1 plus % RE. Limit of detection (LOD) and quantification (LOQ) were calculated according to a method we described previously (10), based on a variation of the methods described by Linnet (24) and Currie (25). Linear regression was used to fit the calibration curve and determine the slope and intercept, along with respective standard error estimates, for data in the detectable range. Standard errors were reported to quantify variation and can be used to calculate appropriate confidence intervals. Weighted robust linear regression using Tukey's biweight estimator (26,27) minimized the influence of outliers while appropriately accounting for the heteroscedasticity of the data. All of the calculations were performed using custom programs written in R (28).
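The imprecision and accuracy metrics defined above can be illustrated with a short sketch. It is written in Python for illustration and is not the authors' R code; it assumes the sample standard deviation (n − 1 denominator), and the site names and replicate values are invented.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variance (%) using the sample standard deviation (assumed)."""
    return 100.0 * stdev(values) / mean(values)


def percent_re(observed, expected):
    """Relative error (%): (observed - expected) / expected x 100."""
    return 100.0 * (observed - expected) / expected


def percent_recovery(observed, expected):
    """Recovery (%), i.e. 100% plus the relative error."""
    return 100.0 + percent_re(observed, expected)


# Invented process replicates (n = 3 per site) at one concentration level.
replicates = {
    "site_a": [9.0, 10.0, 11.0],
    "site_b": [11.0, 12.0, 13.0],
    "site_c": [7.0, 8.0, 9.0],
}

intralab_cv = {site: percent_cv(vals) for site, vals in replicates.items()}
pooled = [v for vals in replicates.values() for v in vals]  # n = 9 across sites
interlab_cv = percent_cv(pooled)
```

In this invented example the interlaboratory CV exceeds each intralaboratory CV because the site means differ, which mirrors the explanation given in the Results for the higher interlaboratory imprecision.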

Design of Interlaboratory Study-
The primary goal of this study was to establish an automated, multiplexed anti-peptide antibody enrichment coupled to SID-MRM-MS analysis for quantification of proteins in plasma with well defined and suitable performance characteristics to enable distribution of the technology and the assays across laboratories. The performance of the immuno-MRM assay workflow across three independent laboratories was evaluated in a series of experiments that were implemented in four distinct phases as detailed below and in Fig. 1. Four phases were designed to evaluate intra- and interlaboratory reproducibility by determining the linear response, percent recovery, and LODs and LOQs for each unique peptide derived from each target protein. % CV was calculated for each peptide at each site, as well as across all sites, to estimate the process imprecision and interlaboratory reproducibility.
Phase 1 was used to establish the baseline LC-SID-MRM-MS response for the peptides at each laboratory in the absence of immunoaffinity enrichment, but in a background simulating the complexity of peptides nonspecifically bound to 1-μm protein G magnetic beads after SISCAPA immunoaffinity enrichment. The curves were prepared in triplicate for each site by spiking light peptide standards at 12 different concentrations into digested plasma diluted 1:500 (120 μg/ml). The background level of digested plasma used closely approximated the abundance and distribution of peptides we have detected nonspecifically bound to these magnetic beads when incubated 1:10 (v:v) with digested plasma in the absence of antibody (based on data-dependent LC-MS/MS analysis on an Orbitrap; data not shown). Phase 2 introduced immunoaffinity enrichment of the peptides (SISCAPA) and was designed to evaluate the efficiency and reproducibility of heavy and light peptide capture by anti-peptide antibodies. The samples were prepared by spiking peptides at the same molar concentrations as in Phase 1 into 10 μl of digested plasma diluted 1:5 (12 mg/ml), enriched using antibodies and magnetic beads (SISCAPA; KingFisher), and analyzed by LC-MRM-MS. Phases 3 and 4 were designed to determine the variability introduced by trypsin digestion in the context of the assay workflow. In phase 3, the samples were spiked with the target proteins, digested, spiked with heavy peptide, desalted at one site, and then distributed to each site for antibody enrichment and LC-MRM-MS analysis. Phase 4 evaluated the entire workflow by simulating the analysis of a real set of biomarker assays for proteins in plasma. In Phase 4, undigested plasma spiked with the target proteins spanning a range of concentrations was processed entirely at each site.

FIG. 1. Design of interlaboratory study. The study consisted of four phases. In each phase, samples were prepared centrally, and the same molar concentrations of peptide or protein were used to generate response curves (see supplemental Table 1). Phase 1 consisted of pooled, digested diluted plasma (120 μg/ml) spiked with standard peptides at 12 concentrations plus a blank (no light peptide added) analyzed by LC-MRM (lightest shading). Phase 2 consisted of pooled, digested plasma (10 mg/ml) spiked with standard peptides at 12 concentrations, enriched by SISCAPA and analyzed by LC-MRM. Phase 3 consisted of undigested plasma (60 mg/ml) spiked with standard proteins, digested centrally, distributed, enriched by SISCAPA, and analyzed by LC-MRM. Phase 4 consisted of undigested plasma (60 mg/ml) spiked with standard proteins, distributed, digested locally, enriched by SISCAPA, and analyzed by LC-MRM (darkest shading). The data were centrally analyzed and statistically evaluated.
Linearity of Response and Peptide Recovery-Overlaid plots for each analyte of the experimentally determined average concentrations for all phases of the study help visualize the linearity of response and percent recovery of each peptide across the sites throughout the study. Representative curves for the peptide IQGDGAALQEK from CSF3, a growth factor and cytokine, are shown in Fig. 2. Results for the other peptides are shown in supplemental Fig. 2, A-G. Each curve represents the interlaboratory performance plotting the average measurement at each phase (color-coded lines). Because the measurements spanned a large dynamic range, linearity of response was assessed by the variability of the slope and intercept (21) calculated using a robust linear model for all data collected in the detectable range between 4.3 ng/ml (0.23 fmol/μl) and 3.1 μg/ml (167 fmol/μl) for each phase in the study (Table II). Standard error of slope was less than 0.02 for all peptides between 0.23 and 167 fmol/μl in this study, indicating a robust linear response between assays and sites. Therefore, immuno-MRM-MS can be used to determine the relative levels of protein between samples containing a large range of concentrations without first running an assay to estimate the concentration, which is often a prerequisite of other protein measurement systems. Below these concentrations, the peptide response was no longer linear due to low ion counts, background noise, and, in some cases, the presence of trace peptide carryover from prior analysis (see detailed discussion of LOD and LOQ below). The intercept (supplemental Table 4), an estimate of the endogenous level of peptide in plasma, has a 95% confidence interval (intercept ± 1.96 S.E.) that encompasses 0 ng/ml for all peptides, indicating the absence of any endogenous peptide, with the exception of S100A8.ALN in phase 2 (b = 4.12 ± 0.01; supplemental Table 4), which was included in this study as an example of an endogenous protein detected by immuno-MRM-MS in plasma.
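To make the regression step concrete, the sketch below fits a straight line by iteratively reweighted least squares with Tukey's biweight and then applies the intercept ± 1.96 S.E. check described above. It is a simplified Python illustration under stated assumptions, not a port of the authors' R programs: the tuning constant c = 4.685 and the MAD-based scale are conventional defaults rather than values given in the text, prior weights for heteroscedasticity are omitted, and the ± 0.01 quoted for S100A8.ALN is assumed to be a standard error.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def biweight_line(x, y, c=4.685, max_iter=50, tol=1e-10):
    """Robust line fit: IRLS with Tukey's biweight (assumed defaults)."""
    slope, intercept = weighted_line(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = median([abs(r) for r in resid]) / 0.6745  # MAD-based scale
        if scale == 0.0:
            break  # at least half the points are fitted exactly
        cutoff = c * scale
        w = [(1.0 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
             for r in resid]
        new_slope, new_intercept = weighted_line(x, y, w)
        done = (abs(new_slope - slope) < tol
                and abs(new_intercept - intercept) < tol)
        slope, intercept = new_slope, new_intercept
        if done:
            break
    return slope, intercept


def ci95(estimate, std_err):
    """Approximate 95% confidence interval: estimate +/- 1.96 * S.E."""
    half = 1.96 * std_err
    return estimate - half, estimate + half


# Synthetic calibration data: y = 2 + x with one gross outlier at x = 4.
xs = [float(i) for i in range(1, 9)]
ys = [2.0 + xi for xi in xs]
ys[3] += 20.0

ols_slope, ols_intercept = weighted_line(xs, ys, [1.0] * len(xs))
rob_slope, rob_intercept = biweight_line(xs, ys)

# Intercept quoted for S100A8.ALN in phase 2, treating 0.01 as the S.E.
s100a8_low, s100a8_high = ci95(4.12, 0.01)
```

On the synthetic data the ordinary least-squares slope is pulled away from 1 by the outlier, whereas the biweight fit recovers the underlying line. The interval for the quoted S100A8.ALN intercept excludes zero, consistent with the endogenous signal reported for that peptide.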
The percentage of recovery following sample processing and immuno-MRM-MS analysis was determined from the relative error of each peptide as measured by MRM-MS at two concentrations, one at or near the LOD of the assay (0.23 fmol/μl) and one in the lower half of the linear range of the assay (2.1 fmol/μl) (Table II, Fig. 2, and supplemental Fig. 2, A-G). In Phase 1, the percentage of recovery in the linear range (2.1 fmol/μl) ranged from 60.3% (IL1RN.IDV) to 133.6% (S100B.ELI) for all peptides (Table II, column 3). These values reflect the true unadjusted amount of peptide added to prepare the curves and represent the experimentally derived maximum response for evaluating recovery in all subsequent phases in the study. In phase 2, where the same peptide mixtures were spiked into a more complex digest of plasma and enriched by immunoaffinity capture prior to MRM-MS, the recovery was similar to phase 1, ranging from 69.8% (IL1RN.IDV) to 119.8% (S100B.AMV) for all peptides at 2.1 fmol/μl (Table II, column 3). The percentages of recovery were higher at or near the limit of detection (0.23 fmol/μl) because of the influence of background signal. Apparent recovery of S100A8.ALN was much higher than expected at both concentration levels (298.4% at 2.1 fmol/μl and 1894.3% at 0.23 fmol/μl), indicating that endogenous S100A8 protein was present at ~70 ng/ml in the background plasma digest. Slope was also calculated from the robust linear regression as an alternative value of recovery using all the concentration levels (n = 7) in the linear range of the assay. In general, slope values (Table II) were consistent with the percentages of recovery determined at 2.1 fmol/μl (Table II, column 3), suggesting that the slope is a suitable alternative for determining assay recovery. In phases 3 and 4, where proteins were spiked into plasma and digested prior to immuno-MRM-MS, the response was still linear from 0.23 to 167 fmol/μl, but the determined recoveries were distinctly lower than in phase 1, ranging from 19.5 to 54.8% for the peptides measured. The percentages of recovery at 2.1 fmol/μl of the CSF3 peptide in phases 3 and 4, as shown in Fig. 2, were 40.4 and 42.2%, respectively.

Intra-and Interlaboratory Reproducibility-
The reproducibility of the quantitative measurements for each peptide within and across laboratories in each study phase were evaluated ("Experimental Procedures" and Fig. 1). Intralaboratory CVs for phase 1 constitute a measure of the technical variation caused by instrument and data acquisition because samples were prepared centrally. As summarized at the bottom of Table III, the median of intralaboratory CVs determined in the linear range in phase 1 was 3.4% for all peptides in the study (n ϭ 8). Introducing SISCAPA capture into the process in phase 2 did not significantly affect assay imprecision relative to MRM-MS alone (median CV ϭ 3.1%), because 13 C-labeled peptides were present from the beginning of the analysis, and the antibody captures both forms without discrimination. Median intralaboratory CV approximately Slope, LOD, and LOQ are median values from three sites determined using the same transition ion in the linear range of the assay as described under "Experimental Procedures." The percentage of recovery was determined at two concentrations, one at or near the LOD (0.23 fmol/l) and one in the lower half of the linear range of the assay (2.1 fmol/l). LOD and LOQ are reported in fmol/l (peptide/protein) and ng/ml (protein). The slopes were determined for curve concentrations in the linear range using weighted robust linear regression as described under "Experimental Procedures." The standard errors were reported to quantify variation and can be used to calculate appropriate confidence intervals. Higher recoveries were observed at the limit of detection because of the influence of background signal. The recovery reported in phase 2 for S100A8.ALN is artifactual because endogenous protein was detected in the background plasma digest. NA, not applicable, protein not spiked into plasma; ND, not determined.
Slope was determined using all curve concentrations in the linear range of the assay. b Determined at the LOD of the assay. c Determined at a concentration in the lower half of the linear range of the assay. d Recovery for S100A8.ALN is artifactual as endogenous protein was detected in the background plasma digest.
doubled to ϳ7.0% when variability arising from the tryptic digestions performed at a single site in process replicate was introduced in phase 3. Performing each part of the analysis process entirely at each site as in phase 4 only marginally increased the intralaboratory CV relative to phase 3. However, primarily because of the differences in the mean determined between sites, the median interlaboratory CV of all peptides was higher (13.9%) than the median intralaboratory CV (8.0%) (Table III, column 1, bottom, phase 4). In phase 2, which evaluated all eight peptide antibodies by immuno-MRM, some assays, such as S100A7.GTN and S100A7.ENF, consistently performed well and had low intralaboratory CVs (Ͻ10%), whereas other assays, e.g.
IL1RN.IDV, were less reproducible. These results correlate with the antibody grade assignment (Table I) determined from the initial screening assays (17), which were based primarily on the limits of detection determined from a mini-curve prepared as 10-fold step dilutions from 500 to 0.5 fmol. Peptides S100A7.GTN and S100A7.ENF were detected by immuno-MRM at the lowest level, 0.5 fmol, and so the corresponding antibodies were given a grade of A. In contrast, the lowest level at which peptide IL1RN.IDV could be detected using antibody enrichment in plasma was 50 fmol, and this antibody was given a grade of D. Assay imprecision increased noticeably at the highest concentration in the phase 1 and 2 studies, 500 fmol/μl, because of variability in the ion counts between sites as LC column and MS detector saturation start to occur toward the end of the curves. Saturation of binding to the antibodies does not contribute significantly to this observed effect, because it was also observed in phase 1 (LC-MRM-MS only). As expected, assay imprecision increased dramatically below the LOQ because of the low ion counts from the analytes and the presence of a continuum of low intensity chemical noise (supplemental Fig. 3, bracketed area).

Intra- and interlaboratory imprecision of multiplex immuno-MRM measurements
The table shows CV (%) calculated from the means of process replicates, i.e. single process-single injection within sites (intralaboratory, n = 3) and across sites (interlaboratory, n = 9) and reported as the median for all values in the linear range, at a concentration near the LOD of the assay (0.23 fmol/μl) and at a concentration in the lower half of the linear range of the assay (2.1 fmol/μl). Unusually high CVs were observed for peptide IL1RN.IDV in phase 2 at 0.23 fmol/μl because this concentration is below the LOQ.
a The median CV was calculated from the concentrations (0.23-167 fmol/μl) in the linear range of the assay (n = 7 independent concentrations). b The CV was calculated at or near the LOD of the assay (0.23 fmol/μl). c The CV was calculated at a concentration in the lower half of the linear range of the assay (2.1 fmol/μl). d For S100A8.ALN, the CV was calculated at 70 ng/ml (6.7 fmol/μl). e Median of all peptides (n = 8) in the study.
The interlaboratory CVs provide an estimate of the analytical variability of the measurement for a clinical specimen analyzed at multiple centers. The results for the detectable range of measurements for all phases are illustrated by box and whisker plots in Fig. 3. CVs were plotted for each peptide (by color) for each phase (by shading). Median CV (calculated using the mean of measurements made at all sites throughout the linear range; see "Experimental Procedures") is represented by a horizontal black line in each box. The interlaboratory reproducibility across all labs and all four phases of the study was, in general, very good (Fig. 3 and Table III), but overall imprecision was greater than the corresponding intralaboratory imprecision (Table III and supplemental Table 2). The median interlaboratory CV for all peptides in the linear range of the assays (i.e. for protein concentrations of ~1 ng/ml to 3 μg/ml) was less than 20%. Trypsin digestion of proteins spiked into plasma prior to peptide capture and analysis (study phases 3 and 4) significantly increased the median interlaboratory CVs relative to SISCAPA capture of spiked peptides (study phase 2) from ~8% to ~14% (Table III, bottom row). Increased variability because of enzymatic digestion was also observed in an interlaboratory study of SID-MRM-MS (9). As expected, imprecision of the measurements at peptide concentrations at or near the determined LOD (see Table II and "LOD and LOQ") was higher at every phase of the study for all peptides studied (Table III).
Most notable are the higher CVs observed for peptide IL1RN.IDV in phase 2 (Table III), where 0.23 fmol/μl, or 5 ng/ml protein, is at or below the detectable limit for the peptide.
Box and whisker plots of interlaboratory CV determined from the mean of three replicates across three sites (n = 9) for concentrations ranging from 0.08 to 500 fmol/μl. CVs were plotted for each peptide (by color) for each phase (by shading: phase 1, lightest shading; phase 4, darkest shading). Median of all peptides for a study is shown by black horizontal line. The whiskers extend up to 1.5 times the interquartile range. Markers outside the whiskers are considered outliers (circled). The highest concentration, 500 fmol/μl, was not prepared for phases 3 or 4 because of limiting amounts of reagents.
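The distinction between intra- and interlaboratory CV drawn above can be illustrated with a minimal sketch. It assumes one plausible reading of the calculation (sample standard deviation divided by the mean, within each site for n = 3 and pooled across sites for n = 9); the measured values and site names are hypothetical and are not data from the study.

```python
from statistics import mean, median, stdev

def cv_percent(values):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical measured concentrations (fmol/ul) for one peptide at one
# spiked level: three process replicates at each of three sites.
measurements = {
    "site_1": [2.00, 2.10, 2.20],
    "site_2": [2.30, 2.40, 2.50],
    "site_3": [1.80, 1.90, 2.00],
}

# Intralaboratory CV: replicate scatter within each site (n = 3 per site).
intra_cv = {site: cv_percent(vals) for site, vals in measurements.items()}

# Interlaboratory CV: all replicates pooled across sites (n = 9), so
# differences between the site means add to the replicate scatter.
pooled = [v for vals in measurements.values() for v in vals]
inter_cv = cv_percent(pooled)

median_intra_cv = median(intra_cv.values())
print(f"median intralaboratory CV: {median_intra_cv:.1f}%")
print(f"interlaboratory CV: {inter_cv:.1f}%")
```

In this toy data set every site has the same replicate scatter, yet the pooled CV is roughly twice the within-site CV because the site means differ, which mirrors the pattern described for phases 3 and 4.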
LOD and LOQ-The LOD and LOQ of the SISCAPA assays were determined using a modification of methods proposed by Linnet (24) and Currie (25) that account for the variability of the blank sample and a sample with the analyte spiked in at a low concentration (10). LOD and LOQ for all peptides for all sites (supplemental Table 3) and median interlaboratory LOD and LOQ were calculated (Table II) to facilitate comparison of assay performance across sites. The interlaboratory LOD and LOQ obtained in study phase 2, where light and heavy peptide standards were spiked into digested plasma background and distributed for measurement at the sites and then immunoenriched, ranged from 0.03 to 0.20 fmol/μl (LOD) and 0.08 to 0.59 fmol/μl (LOQ), which corresponds to protein concentrations of 0.3-2.1 ng/ml (LOD) and 0.9-6.3 ng/ml (LOQ). These values represent the maximum sensitivities expected to be achievable in these assays using peptide immunoenrichment alone. They do not take into account recovery of the peptides from protein digestion, which occurs upstream of immunoenrichment in the assay workflow (Fig. 1, panels 3 and 4). For the three peptides for which proteins were available, tryptic digestion at a single site (phase 3) generally resulted in only small increases in the LOD of the assays to a maximum of 1.6 ng/ml (Table II). Proteolytic digestion at each site (phase 4) did not further increase the LOD of the assays.
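The correspondence between molar and mass concentrations quoted above follows from a simple unit conversion, sketched here. The sketch assumes peptide concentrations are expressed in fmol/μl, which is the reading consistent with the paired values in the text (for example, 6.7 fmol/μl and 70 ng/ml for S100A8), and uses a molecular weight of 10.5 kDa that is an illustrative assumption back-calculated from those pairs, not a value stated in the paper.

```python
def fmol_per_ul_to_ng_per_ml(conc_fmol_per_ul, mw_da):
    """Convert a molar concentration to a mass concentration.

    1 fmol/ul equals 1 nmol/l, so ng/ml = (fmol/ul) * MW (g/mol) / 1000.
    """
    return conc_fmol_per_ul * mw_da / 1000.0

def ng_per_ml_to_fmol_per_ul(conc_ng_per_ml, mw_da):
    """Inverse conversion: mass concentration back to molar concentration."""
    return conc_ng_per_ml * 1000.0 / mw_da

# Assumed molecular weight (Da), chosen for illustration only.
ASSUMED_MW_DA = 10_500.0

lod_ng_per_ml = fmol_per_ul_to_ng_per_ml(0.20, ASSUMED_MW_DA)
loq_ng_per_ml = fmol_per_ul_to_ng_per_ml(0.59, ASSUMED_MW_DA)
print(f"LOD: {lod_ng_per_ml:.1f} ng/ml, LOQ: {loq_ng_per_ml:.1f} ng/ml")
```

Because the conversion scales with molecular weight, the same molar LOD maps to different ng/ml values for different proteins.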
Improving Assay Accuracy and Precision Using Stable Isotope-labeled Protein Standards-We investigated whether the accuracy and precision of the assay could be improved through the use of fully 15N-labeled proteins as internal standards as suggested previously (29,30). Standard addition experiments were prepared by adding equimolar amounts of unlabeled and 15N-labeled S100B protein at each concentration level into plasma. Response curves for SISCAPA-captured peptides S100B.AMV and S100B.ELI from the digested proteins are shown in Fig. 4. As in previous experiments, heavy peptide standards were spiked prior to desalting the plasma digest. Transition ions were monitored for the specific light peptide and two isotopically labeled versions of the heavy peptide: (i) the synthetic heavy peptide containing one 13C-labeled amino acid and (ii) heavy peptide containing uniformly 15N-labeled amino acids. The amount of light protein present was determined using two different methods (Table IV and Fig. 4; and described under "Experimental Procedures"). First, the concentration of light S100B protein was calculated using the peak area ratio of light peptide (from digested light protein) to spiked heavy synthetic peptide (Fig. 4, red lines) as was performed in previous phases. In the second approach, the concentration of the light S100B protein was determined using the peak area ratio of light peptide to uniformly 15N-labeled heavy peptide produced by tryptic digestion of the respective heavy form of the protein. This latter calculation results in direct determination of protein concentration in nanograms (without conversion from moles) because protein concentration of the standard was known, and responses could be normalized in a site-specific manner because the ratio of spiked light to heavy protein was constant throughout the curve. Precision and accuracy were calculated to compare these two methods of determining protein concentration (Table IV).
Interlaboratory assay precision improved 6%, from 83 to 89%, which was very similar to the precision observed in phases 1 and 2, where digestion of proteins was not performed. Interlaboratory accuracy improved substantially, from 44-54% when synthetic heavy peptide was used to 95-101% when the heavy protein was used as the internal standard. Because the heavy standard protein undergoes the same processing as the light target protein, the peptide recovery is the same for light and heavy peptide. Use of heavy proteins in this manner did not improve the LOD of the method because neither light nor heavy peptide was detected below 2.3 fmol/μl.
DISCUSSION
The results of the first multi-site evaluation of SISCAPA coupled to MRM-MS presented here demonstrate that the technology, including the semi-automated method we developed, is transferable between laboratories and has sufficient precision for use in biomarker verification studies performed across multiple sites to quantify proteins in plasma with multiplexed assays. Under conditions simulating a real interlaboratory biomarker verification study, the overall interlaboratory % CV (including protein digestion, desalting, peptide antibody enrichment, and scheduled LC-MRM-MS analysis) was below 25% at or near the LOQ and below 20% at or near the midpoint of the linear range for each analyte peptide (Table III). Limits of detection were at or below 1 ng/ml for the assayed proteins using just 10-30 μl of plasma (Table II). The multi-staged design of the study enabled us to obtain useful insights into the sources of variation. Importantly, we have shown that the addition of the semi-automated, anti-peptide immunoaffinity capture (SISCAPA) to the LC-MRM-MS analysis process does not significantly increase either intra- or interlaboratory variability relative to MRM-MS (9).
Our results demonstrate that multiplexed immuno-MRM-MS (where both immunoprecipitation and MRM-MS of target peptides are done as mixtures) is a sensitive and precise technology that can be widely implemented for assaying proteins in matrices as complex as plasma.
A number of prior studies have evaluated the intra- and/or interlaboratory performance of MRM, fMRM, and immuno-MRM assays with respect to precision, linearity, and sensitivity (9,31). In the first published assessment of the interlaboratory performance of SID-MRM-MS for measuring proteins in digested plasma, Addona et al. (9) demonstrated detection limits of ~1 μg/ml without prior enrichment of proteins or peptides and showed that the assays were reproducible across laboratories with CVs of ~25%. Keshishian et al. (10) showed that combining immunoaffinity depletion of proteins and limited chromatographic fractionation of peptides improves the LOD of SID-MRM-MS assays ~1000-fold (~1 ng/ml) with intralaboratory CVs of 10-25% when starting with 100 μl of plasma. Here we have shown that the interlaboratory performance of multiplexed immuno-MRM-MS assays is consistent with that observed in prior studies conducted in single laboratories (15,16,18), as well as by MRM-MS and fMRM-MS. Importantly, by enriching for target peptides using anti-peptide antibodies prior to SID-MRM-MS, LOD improved ~1000-fold without increasing the interlaboratory imprecision relative to these other approaches. These findings support the value of generating antibodies for large scale biomarker verification studies to verify the increasing number of analytes in larger cohorts of patient samples.
Precision is reported as 1 - median % CV calculated from the mean of process replicates, i.e. single process-single injection within sites (intralaboratory, n = 3) and across sites (interlaboratory, n = 9). Accuracy was reported as 1 + median % RE calculated from the means of the process replicates in the linear range as was performed on CV to calculate precision. Both values were calculated in the linear assay range using all the measurements determined above the LOD (0.23-167 fmol/μl). The peptide standard refers to the 13C-labeled synthetic peptide; protein standard refers to the uniformly 15N-labeled protein.
Many factors contribute to the LOD and LOQ of immuno-MRM assays, including antibody capture efficiency, peptide ionization efficiency, and peptide recovery from protein digestion. In addition, more subtle effects can increase the LOD. These antibodies most likely recognize a linear epitope because tryptic peptides were used as immunogens. However, other peptides with sequences different from the analyte may contain part of the linear or conformational epitope (32) that could potentially be recognized by the antibody and simultaneously enriched. Furthermore, if the recognized epitope is contained in a peptide from an abundant protein, the antibody may become saturated, thereby decreasing the amount of analyte peptide captured. It is important to note that in this case only assay sensitivity is affected, not assay selectivity, because any captured peptide that is not the true analyte will be filtered out in the triple quadrupole mass spectrometer at either the precursor or product mass levels and will likely be separated by LC retention time as well. In contrast, autoantibodies and heterophilic anti-reagent antibodies are known to produce false positive and false negative results in sandwich immunoassays (33-35).
These types of interferences are not observed in immuno-MRM-MS because the auto- and heterophilic antibodies are inactivated by reduction, alkylation, and enzymatic digestion during sample preparation for SISCAPA (36-38).
The present study focuses on interlaboratory evaluation of assay performance because large scale biomarker verification studies will need to be performed across laboratories in hundreds or even thousands of patient samples to accommodate the workload. In addition, it is imperative to understand the limits of analytical variation of the assay before assessing the variability of the patient population. The percentages of change in the determined peptide/protein concentrations were consistent and linear from site to site. For example, the fold change between each concentration in the linear range was 3.01 ± 0.16 for CSF3 (phase 2) for all sites where the curves were prepared using a 3-fold serial dilution. Our results indicate that the magnitude of differential regulation of proteins observed in patient samples assayed at one site by immuno-MRM-MS would be highly similar, if not identical, to the values that other sites would determine analyzing the same samples. Thus, immuno-MRM-MS has sensitivity comparable with fMRM-MS and, like MRM-MS, is transferable across laboratories.
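The fold-change check described above can be sketched as follows: for a curve prepared by 3-fold serial dilution, the ratio between each pair of adjacent measured concentrations should stay close to 3. The measured values below are hypothetical and serve only to show the arithmetic; the function name is likewise an assumption for illustration.

```python
from statistics import mean, stdev

def fold_changes(concentrations):
    """Ratios between adjacent points of a dilution series, lowest first."""
    ordered = sorted(concentrations)
    return [high / low for low, high in zip(ordered, ordered[1:])]

# Hypothetical measured concentrations (fmol/ul) along a nominal
# 3-fold serial dilution curve at a single site.
measured = [0.24, 0.70, 2.15, 6.30, 19.5, 57.0]

folds = fold_changes(measured)
print(f"fold change: {mean(folds):.2f} +/- {stdev(folds):.2f}")
```

A mean near 3 with a small standard deviation at every site indicates that relative changes are preserved across laboratories even when absolute values differ slightly.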
In general, the interlaboratory variation of immuno-MRM, including all sample processing steps (phases 3 and 4), was roughly twice the corresponding intralaboratory CVs across the linear range of the assays (Table III, column 1, bottom), averaging 14% compared with 7%. Interlaboratory variation in the linear range of the assays was also observed to be higher than intralaboratory variation in a previous MRM-MS study (9). With very tightly controlled intralaboratory CVs, the apparently larger interlaboratory CVs are attributable to differences in the mean concentrations determined at each site. We hypothesize that, despite using identical samples, this small but consistent difference in measurement is due to unforeseen differences in the MS or LC, as well as other small variations within the analysis path. We suspect the latter, because (i) the changes are, for the most part, rather small and (ii) the variation is not consistent across sites, peptides, and phases of the study. Such differences between sites could arise through the accumulated effect of small contributions from factors such as variations in chromatographic quality (i.e. variation in peak tailing, peak width, etc.), instrument calibration affecting precursor and product ion isolation window locations and widths, as well as variations in how software (or analyst) integrates the data. Such variations are not unique to MS-based assays. Immunoassays can have large interlaboratory variations that are typically compensated for by using an appropriate common calibration standard (39,40). Similar approaches would be applicable to improve the accuracy of MRM-MS and SISCAPA assays. Here, in the 15N-labeled protein standard experiment (Table IV and see "Experimental Procedures"), we used the observed peak area ratio at each site to normalize the response for that site.
Once normalization was applied, the accuracy of each site improved, as did the interlaboratory accuracy, and this led to slight increases in overall precision as well. This technique could also be applied to assays where an internal heavy protein standard is not available. If response curves similar to phase 2 samples are simultaneously prepared and run with true patient samples (mimicked by phase 4 samples), the response observed at each site, i.e. the slope (as shown in supplemental Table 4), could be used to calculate a site-specific calibration factor for that particular day and set of runs in a peptide-specific manner.
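One way such a site-specific calibration factor might be computed is sketched below. This is an assumption-laden illustration, not the procedure used in the study: the slopes are hypothetical, and taking the mean slope across sites as the common reference is a choice made here for simplicity (a consensus standard or the theoretical slope could serve equally well).

```python
from statistics import mean

# Hypothetical response-curve slopes for one peptide, one per site,
# obtained from curves run alongside the patient samples on a given day.
site_slopes = {"site_1": 0.90, "site_2": 1.00, "site_3": 1.10}

# Assumed common reference: the mean slope across sites.
reference_slope = mean(site_slopes.values())

# Site- and peptide-specific calibration factors for this set of runs.
calibration = {site: reference_slope / slope
               for site, slope in site_slopes.items()}

def normalize(measured_concentration, site):
    """Rescale a site's measurement onto the common reference response."""
    return measured_concentration * calibration[site]

# The same sample (nominally 10 fmol/ul) read with each site's response.
raw = {site: 10.0 * slope for site, slope in site_slopes.items()}
normalized = {site: normalize(value, site) for site, value in raw.items()}
print(normalized)
```

After rescaling, the three sites report the same value for the same sample, which is the effect on interlaboratory accuracy that the normalization is intended to achieve.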
As was demonstrated in the MRM-MS interlaboratory study (9), trypsin digestion and its requisite sample handling contributed the most to SISCAPA assay variability and reduced the recovery of target peptides from digested proteins. Here we have shown that using a stable isotope-labeled (heavy) recombinant protein as an internal standard instead of stable isotope-labeled (heavy) peptides approximately doubled the accuracy of the assay for the protein studied, from 50 to 98% on average for two peptides for S100B. This improvement is due to similar (if not identical) efficiency of proteolytic release and recovery of the target peptides from the heavy-labeled protein and the native analyte protein. These results are consistent with previous findings from a number of laboratories (29,30) demonstrating that intact albumin could be quantified by LC-MS in patient urine using a corresponding 15N-labeled protein standard. We also found that interlaboratory assay precision improved slightly from 83 to 89%, similar to the levels observed in the absence of digestion. Provided that the protein standard accurately recapitulates the behavior of its endogenous counterpart (which may not always be true because of differences in post-translational modifications or unanticipated effects of the tag used, for example), the presence of the heavy protein at time of digestion takes into account small differences in digestion efficiency and peptide recovery in the process replicates, thereby improving the precision of the measurements in addition to the accuracy. Further improvement in the precision and accuracy of the method is expected by using liquid handling robotics for automated reduction, alkylation, and proteolytic digestion.
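The reasoning for why a heavy protein standard corrects for digestion losses while a heavy peptide standard does not can be made concrete with a simplified model. The sketch assumes that the light and heavy proteins release and recover the target peptide with the same efficiency, and that the synthetic heavy peptide, added after digestion, suffers no digestion loss; the 50% recovery and the spike amounts are illustrative assumptions, and the function names are hypothetical.

```python
def estimate_with_peptide_standard(light_protein, recovery, heavy_peptide_spike):
    """Heavy synthetic peptide is added after digestion, so only the
    light peptide is reduced by incomplete release and recovery."""
    light_peptide = light_protein * recovery
    ratio = light_peptide / heavy_peptide_spike
    return ratio * heavy_peptide_spike

def estimate_with_protein_standard(light_protein, recovery, heavy_protein_spike):
    """Heavy protein is added before digestion, so light and heavy
    peptides are reduced by the same factor and the loss cancels."""
    light_peptide = light_protein * recovery
    heavy_peptide = heavy_protein_spike * recovery
    ratio = light_peptide / heavy_peptide
    return ratio * heavy_protein_spike

true_amount = 100.0  # arbitrary units of light protein in the sample
recovery = 0.5       # assumed fraction of peptide recovered after digestion

peptide_estimate = estimate_with_peptide_standard(true_amount, recovery, 50.0)
protein_estimate = estimate_with_protein_standard(true_amount, recovery, 50.0)
print(f"peptide standard: {100 * peptide_estimate / true_amount:.0f}% accuracy")
print(f"protein standard: {100 * protein_estimate / true_amount:.0f}% accuracy")
```

Under these assumptions the peptide standard reports half the true amount and the protein standard reports all of it, which is qualitatively the same shift as the reported improvement from about 50% to about 98% accuracy.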
Although the immuno-MRM-MS and fMRM approaches have similar LOD, LOQ, and quantitative precision, these technologies differ in important practical ways relative to the ability to conduct large scale patient studies requiring the analysis of hundreds of patient samples. The practical upper multiplex level of MRM and fMRM is in excess of 100 and is only limited by the scan speed/sensitivity and scheduling capabilities of the specific triple quadrupole LC-MS/MS platform employed. In contrast, multiplex levels of higher than 50 have yet to be demonstrated for immuno-MRM-MS. The principal drawback of SISCAPA is the considerable added time (~6 months) and upfront reagent costs of approximately $4000/anti-peptide antibody to produce the anti-peptide antibody reagents. Cost and time are significant factors for the fMRM-MS method as well. The requisite first step of fMRM is immunoaffinity depletion using commercial columns. These columns cost in the vicinity of $10,000 and have limited lifespans. Their use adds approximately $100/sample. Most importantly, throughput of fMRM-MS is 6-8-fold lower than for immuno-MRM-MS analyses because of the need to analyze each SCX or high pH RP peptide fraction by LC-MRM-MS, and even at the highest plex levels, throughput is still 3-4-fold lower than for immuno-MRM. Another factor in favor of the immuno-MRM-MS approach is that it requires only 10-30% of the starting input plasma required for fMRM-MS to achieve comparable ng/ml sensitivity (10-30 μl for immuno-MRM versus 100-200 μl for fMRM). The time and total cost (instrument use and reagents) for a hypothetical verification study assaying levels of 100 distinct analyte peptides (using 100 heavy peptide analogs) in 100 patient samples is roughly the same for immuno-MRM and fMRM. Although fMRM takes considerably more MS instrument time for sample analysis, this is balanced by the up-front time (~6 months) and cost of anti-peptide antibody reagents (supplemental Table 5).
However, with more than ~100 patient samples, the added cost of instrument and personnel time for fMRM-MS analysis of a 100-plex analyte assay outweighs the initial cost and delivery time of antibody reagents. For these reasons, we consider fMRM-MS to be most useful as the method for defining which assays to move into immuno-MRM assay development for large clinical sample studies and in situations where the number of samples is relatively small, such as cell-based biology studies.
Based on the results presented here, we are optimistic that immuno-MRM-MS reagents and the overall assay technology can be widely deployed for assaying proteins in plasma and tissue as MRM-MS technology has already been shown to be. We believe that immuno-MRM technology has sufficient intra- and interlaboratory assay precision and sensitivity for use today in large scale verification studies of biomarker candidate proteins in patient biofluid samples in any disease of interest, as well as in quantitative biology studies in tissues and cells.