Analysis of Monoclonal Antibody Sequence and Post-translational Modifications by Time-controlled Proteolysis and Tandem Mass Spectrometry

Methodology for sequence analysis of ∼150 kDa monoclonal antibodies (mAb), including location of post-translational modifications and disulfide bonds, is described. Limited digestion of fully denatured (reduced and alkylated) antibody was accomplished in seconds by flowing a sample in 8 M urea at a controlled flow rate through a micro column reactor containing immobilized aspergillopepsin I. The resulting product mixture containing 3–9 kDa peptides was then fractionated by capillary column liquid chromatography and analyzed on-line by both electron-transfer dissociation and collisionally activated dissociation mass spectrometry (MS). This approach enabled identification of peptides that cover the complete sequence of a murine mAb. With customized tandem MS and ProSightPC Biomarker search, we verified 95% of the amino acid residues of this mAb and identified numerous post-translational modifications (oxidized methionine, pyroglutamylation, deamidation of Asn, and several forms of N-linked glycosylation). For disulfide bond location, native mAb is subjected to the same procedure but with longer digestion times controlled by sample flow rate through the micro column reactor. Release of disulfide-containing peptides from accessible regions of the folded antibody occurs with short digestion times. Release of those in the interior of the molecule requires longer digestion times. The identity of two peptides connected by a disulfide bond is determined using a combination of electron-transfer dissociation and ion–ion proton transfer chemistry to read the two N-terminal and two C-terminal sequences of the connected peptides.

Monoclonal antibodies (mAbs) and related biological molecules constitute one of the most rapidly growing classes of human therapeutics. These large proteins (Fig. 1) have molecular weights near 150 kDa and are composed of two identical ~50 kDa heavy chains (HC) and two identical ~25 kDa light chains (LC) (1). They also contain at least 16 disulfide bonds that maintain three-dimensional structure and biological activity (2). Although sharing similar secondary protein structures, different mAbs differ greatly in the sequence of variable regions, especially in the complementarity determining regions (CDRs) which are responsible for the diversity and specificity of antibody-antigen binding. Changes to the mAb structure introduced during the manufacturing process or storage may influence the therapeutic efficacy, bio-availability and -clearance, and immunogenic properties and thus alter drug safety (3)(4)(5). Comprehensive characterization of mAb primary structure, post-translational modifications (PTMs), and disulfide linkages is critical to the evaluation of drug efficacy and safety, as well as understanding the structure/function relationships (4,6). Presented in this work is a novel protein analytical platform that consists of innovative methods for mass spectrometry (MS) characterization of mAbs. The methodology reported here will have a dramatic impact on the whole field of antibody characterization.
Typical MS characterization of proteins uses a "Bottom-Up" approach. This method involves tryptic digestion of the protein(s) into small peptides (mostly below 2500 Da) followed by high-performance liquid chromatography-tandem mass spectrometry (HPLC-MS/MS) analyses of the resulting peptides (7). Although sensitive for MS analysis, small tryptic peptides often have issues such as weak retention in liquid chromatography, difficulties in assigning peptides to specific gene products, and loss of combinatorial PTM information (8). Recent years have seen developments in direct MS analysis of intact proteins (often called "Top-Down" MS). Despite increasing success in characterization of small to medium-sized proteins, MS analysis of intact proteins larger than 50 kDa, including mAbs, is still unsatisfactory because of inefficient gas-phase protein fragmentation and complex fragment ions that restrict efficient data interpretation (9,10).
A "compromise" between Bottom-Up and Top-Down approaches is the "Middle-Down" (or "Middle-Up") method. Middle-Down analysis typically involves proteolysis using proteases (e.g. Lys-C) or chemicals that hydrolyze proteins at a single type of amino acid residue. This approach aims to generate 3-15 kDa peptides which are compatible with high resolution MS/MS analysis on a chromatographic time scale. The Middle-Down approach inherits some of the advantages of Top-Down analysis, yet has less demanding instrumental requirements compared with intact protein MS in achieving sufficient signal-to-noise ratio (S/N) of fragment ions for sequence mapping (11)(12)(13)(14)(15).
However, limitations of currently available tools for Middle-Down protein analysis are also obvious. First, none of the twenty amino acids is evenly distributed along a polypeptide. Protein digestion at single-type amino acid residues can still produce very small (<1000 Da) or ultra large (>15 kDa) peptides, which deviates from the original intention of the Middle-Down approach (16). Second, the enzymatic digestion efficiency is often low for proteins with highly folded structure or low solubility. Although high concentrations of chaotropic agents such as 8 M urea are often used for protein denaturation, this harsh condition quickly deactivates many commonly used proteases. Third, traditional data-dependent ETD or electron-capture dissociation MS/MS analyses adopt a single reaction parameter for gas-phase dissociation and select only several abundant ions regardless of their charge states. As these methods were previously optimized for tryptic peptide ions that typically carry +2 or +3 charges, they are incompatible with the analysis of large, highly charged peptides that require optimized ETD to achieve high sequence coverage and PTM mapping (12).
Herein we report a "time-controlled" proteolysis method for tailored Middle-Down MS analysis of mAb. To hydrolyze the 150 kDa mAb into large peptides for HPLC-MS analysis, we fabricated a capillary enzyme reactor column that contains a specified length of immobilized protease (supplemental Fig.  S1 and S2A). Precise control of the sample flow rate leads to defined digestion time of the substrate protein in the reactor. A short digestion time results in a small number of "cuts" along the protein chain and consequently the formation of large peptides (supplemental Fig. S2B). The Bruening group previously demonstrated a similar concept using a nylon membrane electrostatically adsorbed with pepsin or trypsin. Pushing a protein solution (protein dissolved in 5% formic acid solution) through the membrane-based enzyme reactor in less than 1 s breaks the protein into large peptides that facilitate sequence mapping of horse apomyoglobin (17 kDa) and bovine serum albumin (66 kDa) by infusion electrospray ionization MS/MS (17). The advantages of their enzyme disc include simple preparation procedures, as well as the low back pressure in the thin disc that allows for rapid sample flow rate. In our present work, we designed a more robust enzyme reactor that digests alkylated or native mAb into 3-12 kDa peptides in a buffer containing 8 M urea (a condition incompatible with most widely used proteases), and characterized their amino acid sequences, PTMs, as well as the disulfide linkages using HPLC-MS/MS.
We chose a rarely used protease, aspergillopepsin I, for the enzyme reactor. Aspergillopepsin I, also known as Aspergillus saitoi acid proteinase, generally catalyzes the hydrolysis of substrate proteins at P1 and P1′ of hydrophobic residues, but also accepts Lys at P1 (18). There are several innovative aspects of employing this enzyme: (1) Aspergillopepsin I is active in 8 M urea at pH 3-4 for at least 1 h. This extreme chaotropic condition may disrupt the higher-order structure of proteins to a great extent and allows for easy access of the protease to most regions of the substrate protein once the disulfide bonds are reduced. (2) Compared with proteases with dual- or single-type amino acid specificity, aspergillopepsin I provides more cleavage sites along an unfolded substrate protein. Allowing limited time for the substrate protein to interact with immobilized aspergillopepsin I should generate large peptides with a relatively narrow size distribution because of similar numbers of missed cleavages on these peptides. (3) The enzyme reactor automatically "quenches" proteolysis as the sample flows out of the column. This is in great contrast to in-tube digestion using solubilized proteases that are active in acidic conditions. In the latter case, digestion is difficult to quench or control because of the sustained enzymatic activity in an acidic condition. (4) Compared with electrostatic or hydrophobic interactions for enzyme immobilization, covalent conjugation of the protease onto porous beads should prevent the replacement of enzymes by upcoming substrate proteins. (5) The enzyme beads can be stored at 4°C for at least half a year once water is removed, allowing the production of hundreds of disposable enzyme reactors from one batch of beads. In addition, we introduced a new cysteine (Cys) alkylation reagent, N-(2-aminoethyl)maleimide (NAEM) for protein MS analysis. This reagent improves ETD (19) of peptides containing Cys residues by adding a basic, readily protonated side chain to thiol groups.
The above features of our new strategy led to the generation of large, highly charged peptides that cover the entire murine mAb. Analyzing ETD and collisionally activated dissociation (CAD) fragments from the most abundant large peptides by ProSightPC revealed near complete sequence coverage of the mAb and multiple PTMs. Furthermore, we digested the native mAb into large fragments of disulfide-bonded peptides using time-controlled digestion. The ETD/ion-ion proton transfer (IIPT) technique (20) Table S1 lists specific column lengths for different samples.
Sample Preparation and Protein Digestion-Supplemental information section 1.3 describes the detailed procedures for protein digestion. Briefly, apomyoglobin from equine skeletal muscle was dissolved in the digestion buffer (pH 3.9 containing 8 M urea) at 0.2 g/l and pressure-loaded through the enzyme reactor at different flow rates to achieve different digestion times.
To digest the murine IgG1 mAb (Waters, Milford, MA) with the enzyme reactor for peptide mapping, the disulfide bonds of mAb were reduced with tris(2-carboxyethyl)phosphine (TCEP) and alkylated with NAEM at pH 6.8 in 8 M urea. The NAEM-alkylated mAb was acidified and diluted to 0.2 g/l in a buffer containing 8 M urea (final pH 3.9), followed by on-column digestion with different digestion times. The same mAb was also reduced and alkylated with iodoacetamide (IAM) as a control sample for on-column digestion with aspergillopepsin I and in-tube digestion with Lys-C (Roche Diagnostics, Indianapolis, IN). To digest the native mAb for disulfide localization, the intact mAb was dissolved in the digestion buffer for time-controlled on-column digestion with aspergillopepsin I. An aliquot of each native mAb digest was reduced with TCEP to separate the disulfide-linked peptides.
Chromatography and Mass Spectrometry-An Agilent Technologies (Palo Alto, CA) 1100 Series binary HPLC system was interfaced with a Thermo Scientific™ Orbitrap Velos Pro™ Hybrid Ion Trap-Orbitrap Mass Spectrometer (San Jose, CA) for online separation of protein digests. One pmol of the protein digest was pressure-loaded onto a 360 µm o.d. × 150 µm i.d. fused silica capillary precolumn packed with 11 cm long POROSHELL 300SB-C18 (5 µm diameter, Agilent). After desalting, the precolumn was connected to an analytical column (360 µm o.d. × 50 µm i.d. capillary packed with 16 cm of the same material) that was integrated with a laser-pulled nanoelectrospray emitter tip (21). Peptides were eluted at a flow rate of 100 nL/min using the following gradient: 0-25% B for 5 min, 25-60% B for 105 min, 60-100% B for 4 min (A = 0.3% formic acid in water; B = 0.3% formic acid, 72% acetonitrile (Mallinckrodt, Inc., Paris, KY), 18% isopropanol and 9.7% water).
Mass spectrometric analyses of apomyoglobin and mAb sequences consisted of an HPLC-MS experiment with full MS Orbitrap scans for sample evaluation (Experiment I), followed by a multisegment HPLC-MS/MS experiment with ETD MS/MS Orbitrap scans (Experiment II) targeted on the most abundant large peptides (3-9 kDa) selected from Experiment I. For each selected large peptide, the ion with the highest charge state (but with sufficient intensity) was selected as the precursor ion for ETD MS/MS analysis. ETD reaction time was set based on the following formula: $t_{\mathrm{ETD}} = 50\ \mathrm{ms} \times (3/\text{charge state})^{2}$. To obtain maximum mAb sequence coverage, an HPLC-CAD MS/MS experiment was performed as Experiment III in a similar way as in Experiment II to generate complementary peptide sequence information. To characterize the location of mAb disulfide bonds, ETD/IIPT and ion multi-fill techniques (22) were performed on the on-column generated disulfide-containing mAb fragments to identify the N- and C-terminal sequences of the two disulfide-bonded peptides. See supplemental information section 1.4 for detailed MS methods.
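As an illustration of the reaction-time formula above, the following minimal sketch evaluates it for several precursor charge states. The function name and default arguments are assumptions introduced here for illustration; only the 50 ms base time and the reference charge of 3 come from the text.

```python
def etd_reaction_time_ms(charge_state: int,
                         base_ms: float = 50.0,
                         ref_charge: int = 3) -> float:
    """ETD reaction time in ms: base_ms * (ref_charge / charge_state)^2.

    The function name and keyword defaults are illustrative; the 50 ms
    and the factor of 3 are the values quoted in the text.
    """
    if charge_state < 1:
        raise ValueError("charge state must be a positive integer")
    return base_ms * (ref_charge / charge_state) ** 2


if __name__ == "__main__":
    # Higher charge states react faster, so they get shorter reaction times.
    for z in (3, 6, 9, 12):
        print(f"z = +{z}: {etd_reaction_time_ms(z):.3f} ms")
```

The inverse-square scaling means that a +12 precursor is given only one sixteenth of the reaction time used for a +3 precursor.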
In addition to the on-column time-controlled digestions, a Lys-C digest of the murine mAb was generated in a tube and then analyzed using data-dependent HPLC MS/MS. See supplemental information section 3 for details.
Data Analysis-Targeted ETD MS/MS spectra for each of the preselected peptides were merged (if there was more than one scan per peptide) and extracted from the raw file of the HPLC-MS/MS Experiment II using Xcalibur™ 2.1 (Thermo Scientific). The same procedure was performed for CAD spectra obtained from Experiment III of the mAb analysis. Each extracted ETD and CAD spectrum was then searched against the single protein database, horse myoglobin (accession number P68082, see supplemental Da precursor mass error were directly accepted. Identified peptides with +1.0 Da mass difference from the theoretical MW were considered for possible deamidation on Asn, and the deamidation sites were further identified by rematching the overall MS/MS fragment ions after manually adding +0.9840 Da on potential Asn residue(s) suggested by the Δm function. Unidentified peptides were further searched by increasing the precursor tolerance to 20 Da to allow for Met oxidation or N-terminal pyroGlu from Gln (pyroglutamic acid formation of N-terminal Gln), or to 2000 Da to allow for glycosylation. Newly identified peptides were further verified by rematching the fragment ions in the program after manually adding either +15.9949 Da (oxidation) on potential Met residue(s) suggested by the Δm function or +1444.5338 Da (fucosylated biantennary (-2 galactose) oligosaccharide, i.e. G0F) on potential Asn residue(s) suggested by the Δm function. The identified PTM sites of all modified peptides were manually verified. To calculate sequence coverage, the c-, z-, b- and y-type fragment ions assigned by ProSightPC 3.0 with mass error within ±15 ppm were accepted. The mass error for the remaining fragment ions was recalculated by allowing a ±1.00 or ±2.00 Da mass shift (because of incorrect monoisotopic peak selection by the software, or electron transfer without dissociation (23)), and the ions with their new mass error within ±15 ppm were accepted after manual verification.
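The two mass-matching steps described above (assigning a precursor mass difference to a candidate PTM, and re-evaluating fragment mass errors after a ±1 or ±2 Da shift) can be sketched as follows. This is not the ProSightPC implementation; the function names and the 0.02 Da precursor tolerance are assumptions for illustration. The deamidation, oxidation, and G0F mass shifts are those given in the text; the pyroGlu shift (loss of NH3 from N-terminal Gln, -17.0265 Da) is a standard value not stated in the text.

```python
# Monoisotopic mass shifts in Da. The first three are quoted in the text;
# the pyroGlu value is the standard loss of NH3 and is added here.
PTM_SHIFTS = {
    "deamidation (Asn)": 0.9840,
    "oxidation (Met)": 15.9949,
    "G0F glycan (Asn)": 1444.5338,
    "pyroGlu (N-terminal Gln)": -17.0265,
}


def classify_delta_mass(delta_da, tol_da=0.02):
    """Return PTM names whose mass shift matches delta_da within tol_da.

    tol_da is a hypothetical tolerance chosen for this sketch.
    """
    return [name for name, shift in PTM_SHIFTS.items()
            if abs(delta_da - shift) <= tol_da]


def match_fragment(observed, theoretical, ppm_tol=15.0,
                   shifts=(0.0, 1.0, -1.0, 2.0, -2.0)):
    """Return (shift, ppm_error) for the first shift that brings the
    fragment within ppm_tol of the theoretical mass, or None.

    A nonzero shift models a wrong monoisotopic peak pick or electron
    transfer without dissociation, as described in the text.
    """
    for shift in shifts:
        ppm = (observed - shift - theoretical) / theoretical * 1e6
        if abs(ppm) <= ppm_tol:
            return shift, ppm
    return None


if __name__ == "__main__":
    print(classify_delta_mass(0.984))
    print(match_fragment(1001.005, 1000.0))
```

Any fragment accepted with a nonzero shift would still need the manual verification described in the text.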
As a comparison, MS data from the mAb Lys-C digest obtained from data-dependent HPLC MS/MS were searched against the murine mAb reference sequence using Open Mass Spectrometry Search Algorithm (OMSSA, version 2.1.8) (24). See supplemental information section 3 for details.

Time-Controlled Proteolysis with Aspergillopepsin I-To generate 3-12 kDa fragments from protein samples, we constructed a micro-column enzyme reactor using fused silica (150 µm i.d.) packed with the protease, aspergillopepsin I, covalently linked to 20 µm particles. This protease is active under acidic conditions and exhibits broad specificity with preference for hydrolysis of amide bonds at the N terminus of hydrophobic residues such as Val, Ile, and Leu, and at the C terminus of Lys (18). It also has the unusual property of being active in 8 M urea for at least 1 h. Under these conditions most protein substrates, once their disulfide bonds (if any) are reduced, will be completely denatured and easily digested. Operation of the enzyme reactor involves flowing a solution of the protein sample in pH 3.9 buffer containing 8 M urea under constant back pressure through the packed fused silica column (supplemental Fig. S2A). This produces a constant flow rate and allows the digestion (residence) time in the enzyme reactor to be calculated as a function of the flow rate and the length of the packed enzyme reactor (see supplemental information section 1.2 for details).
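The dependence of residence time on flow rate and bed length can be sketched with a simple plug-flow estimate, in which the time equals the liquid volume inside the packed bed divided by the volumetric flow rate. The exact calculation used by the authors is in the supplemental information and is not reproduced here; the function name, the void-fraction parameter, and the example values below are assumptions for illustration only.

```python
import math


def residence_time_s(inner_diameter_um, bed_length_cm,
                     flow_nl_per_min, void_fraction=1.0):
    """Plug-flow residence time (s) in a packed capillary reactor.

    void_fraction is the fraction of the column volume accessible to
    liquid; it is a hypothetical parameter, not a value from the text.
    """
    if flow_nl_per_min <= 0:
        raise ValueError("flow rate must be positive")
    radius_cm = inner_diameter_um * 1e-4 / 2.0         # 1 um = 1e-4 cm
    column_volume_nl = math.pi * radius_cm ** 2 * bed_length_cm * 1e6  # 1 cm^3 = 1e6 nL
    liquid_volume_nl = void_fraction * column_volume_nl
    return liquid_volume_nl / flow_nl_per_min * 60.0   # min -> s


if __name__ == "__main__":
    # Illustrative numbers only: 150 um i.d. (from the text), with a
    # hypothetical 1 cm bed and a hypothetical void fraction of 0.6.
    for flow in (500.0, 2000.0, 8000.0):
        t = residence_time_s(150.0, 1.0, flow, void_fraction=0.6)
        print(f"{flow:7.0f} nL/min -> {t:6.2f} s")
```

Under this model, doubling the flow rate or halving the bed length halves the digestion time, which is the control knob the method relies on.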
On-column Digestion and Sequence Analysis of Horse Apomyoglobin-Shown in supplemental Fig. S5 are total ion current (TIC) chromatograms from micro-capillary HPLC-MS spectra recorded on peptides produced from the standard, 17 kDa protein, horse apomyoglobin, with three different digestion times (2.  Fig. 2A is the base peak chromatogram for the peptides generated with 0.77 s digestion. Major large peptides are labeled with apomyoglobin amino acid sequence numbers deduced from ETD spectra recorded on each parent ion population. Fig. 2B displays the ETD spectrum recorded on the +7 ion at m/z 619.89 that corresponds to the last 40 residues (114-153) in the protein (all multiply charged ions have been converted to +1 ions by Xcalibur Xtract). Observed fragment ions of type c and z• are labeled below the spectrum. Total sequence coverage for apomyoglobin observed with two peptide fragments 1-69 and 70-153 was 86%, and with 5 peptides, 1-31, 32-69, 70-113, 114-153, and 105-153 was 97% (supplemental Fig. S6).
On-column Digestion and Sequence Analysis of a Murine mAb-For on-column digestion of reduced and alkylated mAb in 8 M urea, several reaction times were evaluated (Fig. 3) as described above for apomyoglobin. From these data, a digestion time of 5.7 s was selected because HPLC-MS (Experiment I) of the resulting product mixture showed that more than three dozen peptides with masses in the 3-10 kDa range eluted within the chromatographic time window from 30-105 min. Intact protein was not detected.
To acquire sequence information for peptides in the 3-10 kDa range, we divided the TIC chromatogram into 8 time segments (supplemental Fig. S3) and selected ions representing the most abundant large peptides at specific m/z values in each segment for targeted analysis by ETD during a second HPLC-MS/MS experiment (Experiment II). Generation of a list of the ions to be targeted involved averaging of the mass spectra (full MS) acquired in a particular segment of Experiment I, deconvolution of the clusters of multiply charged ions into single peaks corresponding to the monoisotopic molecular weights of each component, and selection of an m/z window that would only contain highly charged ions from one of the peptides to be targeted. An example of ETD sequencing of HC 37-77 is presented in supplemental Fig. S4.
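The relationship between an observed multiply charged ion and the neutral molecular weight used in the deconvolution step can be sketched as below. This is only the basic charge-state arithmetic, not the deconvolution algorithm of the vendor software; the function names are assumptions for illustration. The example uses the +7 apomyoglobin ion at m/z 619.89 mentioned earlier in the text.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]z+ ion observed at m/z = mz."""
    return charge * (mz - PROTON_MASS)


def mz_for_charge(mass, charge):
    """m/z at which a peptide of neutral mass `mass` appears with `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    m = neutral_mass(619.89, 7)
    print(f"neutral mass: {m:.2f} Da")
    # Charge-state ladder for the same peptide, useful when picking an
    # m/z window that contains only its highly charged ions.
    for z in range(4, 9):
        print(f"+{z}: m/z {mz_for_charge(m, z):.2f}")
```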
Using the above approach, 39 large peptides were selected based on their abundance from Experiment I. These peptides were targeted for ETD during 8 time segments of a second HPLC MS/MS experiment (Experiment II), as well as for CAD the same way in a third HPLC MS/MS experiment (Experiment III). These peptides and their PTMs (if any) were successfully identified by ProSightPC software using Biomarker mode search. Among these peptides, four peptides (LC1-52, 53-110, 111-148 and 149-219) cover the entire LC, and 9 peptides (HC1-36, 37-83, 84-148, 149-210, 211-260, 261-276, 277-319, 320-371 and 372-441) cover the entire HC. MS/MS spectra acquired on 6 LC peptides and 14 HC peptides provided 98 and 94% sequence coverage of these two mAb components, respectively (supplemental Fig. S7). In addition, spectra recorded on 8 of the 39 peptides detected one or more PTMs. Shown in Fig. 4A is an average of the peptide signals (displayed as monoisotopic molecular masses) that were detected in mass spectra recorded during Segment I-3 (supplemental Fig. S3) of the TIC chromatogram. Peptides targeted for sequence analysis by ETD MS/MS and identified by ProSightPC and by manual interpretation are labeled with the appropriate amino acid sequence numbers. Note that there are three groups of signals in the spectrum that contain peaks that differ in mass by 162 Da. This mass difference corresponds to that of hexose and suggests that all nine of the signals are likely to come from glycopeptides. Shown in Fig. 4B is the ETD mass spectrum recorded on m/z 840.1 (+8 charge state) ions from the peptide of MW 6708 Da. For display purposes, all of the multiply charged ions in the ETD spectrum have been converted to singly-charged ions by the Xcalibur Xtract program. ProSightPC and manual interpretation both assigned the peptide sequence as residues 277-319 of the HC with an extra 1444 Da attached to Asn292. This is the expected mass of the N-linked, displayed in Fig. 4A. Addition of one and two galactose residues to this structure would generate the expected G1F and G2F structures and explain the signals observed at molecular masses at 6870 Da and 7032 Da.
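The glycoform series can be checked with simple arithmetic: each added galactose contributes one hexose residue (C6H10O5, 162.0528 Da monoisotopic). The sketch below starts from the 6708 Da G0F glycopeptide reported in the text; the function name is an assumption for illustration.

```python
HEXOSE_RESIDUE = 162.0528  # Da, monoisotopic mass of one hexose residue (C6H10O5)


def glycoform_series(g0f_peptide_mass, max_galactose=2):
    """Expected glycopeptide masses after adding 0..max_galactose hexoses."""
    return [g0f_peptide_mass + n * HEXOSE_RESIDUE
            for n in range(max_galactose + 1)]


if __name__ == "__main__":
    for name, mass in zip(("G0F", "G1F", "G2F"), glycoform_series(6708.0)):
        print(f"{name}: {mass:.0f} Da")
```

The rounded values reproduce the 6708, 6870, and 7032 Da signals discussed above.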
In addition to N-glycosylation, we also found significant (~40%) deamidation of an Asn residue to Asp and isoAsp on the HC. Because of this modification, peptide HC 80-148 eluted as three adjacent peaks as shown in supplemental Fig. S8A, with their monoisotopic masses differing by 1 Da (NH → O) (supplemental Fig. S8B) (25). ProSightPC and manual interpretation of CAD spectra of the three peaks confirmed the deamidation site to be Asn138 on HC (supplemental Fig. S8C). We noticed that the deamidation level of Asn138 in the HC of this commercial murine mAb is consistent in different digests from the same mAb sample. However, the level increased greatly in digests from later mAb sample batches. The deamidation of HC Asn138 should not be affected to this extent by sample preparation and MS analysis, as they were performed in acidic conditions except for the alkylation procedure carried out at pH 7 for only 10 min. It is known that acidic conditions can minimize mAb deamidation (26). Other PTMs defined by targeted analysis of the 3-9 kDa peptides in the digest include oxidation of multiple Met residues in LC and HC to the corresponding sulfoxide, as well as conversion of Gln to pyroglutamate at the N terminus of HC. Table I lists all the identified PTMs. Supplemental Table S2 lists the relative quantification of each identified PTM.
Charge Enhancement on Cys Improves MAb Sequence Coverage by ETD-In this work, we introduced a novel reagent, NAEM, for Cys alkylation prior to protein digestion (supplemental Fig. S9). NAEM alkylates Cys residues completely in only 10 min. Compared with Cys alkylation with the widely used iodoacetamide (IAM), reacting sulfhydryl groups with NAEM increases the charge state of the protein. Placing a positively charged amino group on Cys side chains facilitates fragmentation and sequence analysis of nearby residues in the LC and HC by ETD MS. For example, alkylation of LC1-52 (5767 Da) Cys with NAEM adds one more charge to this peptide compared with the IAM-alkylated form (Fig. 5A and 5B). Alkylation of Cys with NAEM yielded a twofold improvement in the sequence coverage of LC1-52 using ETD (Fig. 5C). Many of the newly appearing c and z• fragments come from CDR1 close to the Cys residue. In another example, replacing IAM with NAEM on LC111-148 enhanced charge state by +1, and increased the sequence coverage from 10.5% to 89.5%. Similarly, alkylating the 5 Cys of HC211-260 with NAEM instead of IAM enhanced the charge state distribution by +3 and improved peptide sequence coverage upon ETD from 53 to 73%. Supplemental Table S3 lists the comparison of NAEM- and IAM-alkylated Cys-containing peptides in sequence coverage upon ETD.
Comparison with In-tube Lys-C Digestion, Data-dependent MS/MS and Traditional Enzyme-based Database Search-To compare our new Middle-Down methodology with a conventional peptide mapping method, we performed in-tube Lys-C digestion of the murine mAb followed by data-dependent HPLC MS/MS analyses. We then searched the ETD and CAD data against the in silico digested reference mAb sequence by OMSSA to identify mAb peptides and their PTMs. Details can be found in supplemental information section 3.
We chose Lys-C for comparison because, as opposed to aspergillopepsin I, Lys-C digests protein at a single type of amino acid residue (Lys) and is widely used for producing peptides larger than tryptic peptides. Displayed in supplemental Fig. S10 is the TIC chromatogram of the Lys-C peptides. The first 1/8 portion (40-50 min) of the chromatogram is dominated by peptides smaller than 3 kDa, which are too small to provide Middle-Down benefits. On the other hand, the last 1/3 portion of the chromatogram is occupied by peptides over 11 kDa with poor separation resolution. Compared with 3-9 kDa peptides, these ultra-large peptides are less effective in producing sequence information in data-dependent MS/MS, as can be seen from the frequent false  Although the OMSSA search identified Lys-C peptides that cover the entire mAb, the sequence coverage for many Lys-C peptides obtained from data-dependent MS/MS is low and PTMs can be difficult to localize. For example, the Lys-C peptide HC254-315 (8.7 kDa) eluted as a base peak in HPLC-MS (supplemental Fig. S10). However, ETD and CAD scans from the data-dependent MS/MS contain fragment ions that allow verification of only 56% of the peptide sequence and do not fully cover the site (Asn292) modified with glycan G0F (supplemental Fig. S11). In contrast, our new Middle-Down approach covered 90% of the sequence of a 6.7 kDa peptide (HC277-319) using only ETD, with confident assignment of glycan on Asn292 (Fig. 4). In another example, ETD and CAD scans from the data-dependent MS/MS yielded 41% sequence coverage of Lys-C peptide LC1-50, including 31% of CDR1 (LC24-39) (supplemental Fig. S12). In contrast, our new Middle-Down approach mapped 90% of the sequence of peptide LC1-52, including 94% of CDR1.
Location of Disulfide Bonds-Sequence analysis of the reduced and alkylated mAb revealed 5 and 12 Cys residues in the LC and HC, respectively. Because the intact antibody has two HCs and two LCs, the total number of Cys residues in the molecule is 34 and the likely number of disulfide bonds is 17. To specify the location of disulfide linkages, we evaluated multiple reaction times for on-column digestion of the intact mAb. We searched mass spectra (after deconvolution to the monoisotopic molecular weights) recorded on the digestion products for large peptides that were not detected in the data set generated from the post-TCEP reduction sample of the same digest (see supplemental Fig. S13 for an example). We then targeted these unique peptides (nonreduced form) for sequence identification. A combination of ETD and gas phase IIPT reactions on these target ions allowed us to read amino acid sequences from the two N termini and two C termini of the disulfide-bonded peptides (22).
As might be expected, reaction times required to digest the tightly folded, and disulfide bond constrained, intact mAb in 8 M urea were substantially longer than that required for the fully reduced, alkylated, denatured molecule. Digestion times were also different for the different domains of the antibody structure. Most accessible were the N-terminal variable domain of LC (VL) and C-terminal constant domain of HC (CH3), both of which released disulfide-bond containing peptides after 12 s on the reactor column. Those in the other domains, i.e. VH, CL plus CH1, and CH2 plus the hinge region required 93, 260 and 740 s, respectively, on the enzyme reactor column before being released (supplemental Fig. S14). All of the 17 disulfide bonds of the murine IgG were successfully characterized.
Shown in supplemental Fig. S15A is the ETD mass spectrum recorded on the +12 charge state ion ([M+12H]12+, m/z 985.5) of the disulfide-connected peptides (MW 11,814) released from the intact mAb after 12 s on the enzyme reactor. To acquire this spectrum, +12 ions were allowed to accept an electron from fluoranthene radical anions and then dissociate at the various amide bonds into a collection of fragment ions of type c and z• from both of the attached peptides. These fragment ions can carry up to 11 protons, and many of them (such as the c+10 and z•+10 ions) are fragments from one peptide but with the Cys linked to the entire chain of the second peptide (see supplemental Fig. S15C and S15D). To simplify this mixture, highly charged fragments were allowed to react with a second gas phase anion, SF6−•, that functions as a base and removes protons and thus charge from the c and z• ions. The result is a mass spectrum that contains ions carrying only +1 or +2 charges which are distributed across the 4000 m/z range of the Orbitrap mass spectrometer (Fig. 6). To enhance the S/N of fragment ions in this spectrum, we used C-trap multiple fill technology to store products of 15 ETD/IIPT reactions before the ions were scanned in the Orbitrap (22). Observed ions were more than sufficient to identify the two disulfide peptides as LC 1-52 and LC 53-108. Because each of the two peptides contains only one Cys residue, the disulfide bond was assigned as LC Cys23-Cys93. Note that capture of an electron into the disulfide bond produces fragments that correspond to protonated forms of the two peptides that were previously disulfide-linked. Doubly charged ions at m/z 2880 and 3022 in Fig. 6 are the result of this phenomenon and confirm the identity of the connected peptides. The same protocol was employed to identify the disulfide bond connections presented in Table I and  to 3-9 kDa. 
Peptides in this MW range have higher sequence coverage compared with small tryptic peptides, and are spread across a wide range in the chromatogram generated from porous shell-type C18 silica particles. On the other hand, these large peptides are more compatible with online high resolution MS/MS analysis than intact proteins. Although aspergillopepsin I favors protein hydrolysis at hydrophobic and Lys residues, the probabilities of hydrolysis at these sites are not completely equal. Otherwise the time-controlled digestion would generate a much higher number of 3-9 kDa mAb peptides with highly overlapped sequences and equivalent quantities. Based on the most abundant large peptides produced from apomyoglobin and mAb in this work, the most frequent hydrolysis occurs at the N terminus of Val, Leu, and Ile, and the C terminus of Lys. The following calculations can also reflect the partially controlled protease specificity. The average MW of the 39 most abundant large peptides from murine mAb targeted for MS/MS analysis is 5485 Da. With this MW as a "standard" Middle-Down peptide length, covering the whole sequence of LC and HC (combined MW 76.5 kDa) would require at least 14 peptides of this average MW with no overlapping sequences. The 14 hypothesized peptides should contain 11 missed cleavages on average if considering only Val/Leu/Ile/Lys as potential hydrolysis sites. If the chance of hydrolysis is equal for each of the four amino acid residues, the actual number of resulting peptides with this size would be ~160. This number can greatly increase if additional types of amino acid residues (e.g. Gly, Trp, Phe) are considered as potential hydrolysis sites. However, in our work the actual number of major large peptides selected for HPLC-MS/MS analysis is only 39. These peptides cover the entire mAb primary structure and led to 95% sequence coverage by MS/MS, suggesting controllable sample complexity for mAb sequencing. Moreover, these peptides provide some overlapping sequences which are beneficial for protein sequencing.
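The first step of the estimate above, the minimum number of non-overlapping peptides of average size needed to tile the LC and HC, is a simple ceiling division, sketched below. The function name is an assumption for illustration; the ~160 figure depends on the actual positions of Val/Leu/Ile/Lys in the sequence and is not reproduced here.

```python
import math


def min_nonoverlapping_peptides(total_mass_da, mean_peptide_mass_da):
    """Smallest number of non-overlapping peptides of the given mean mass
    needed to cover a chain (or set of chains) of the given total mass."""
    return math.ceil(total_mass_da / mean_peptide_mass_da)


if __name__ == "__main__":
    # 76.5 kDa combined LC + HC mass and 5485 Da mean peptide mass, from the text.
    print(min_nonoverlapping_peptides(76_500, 5485))
```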
Taking advantage of the highly reproducible retention time on the POROSHELL C18 column (within 0.3 min for each peptide in the HPLC MS experiments I, II and III), we analyzed the 39 preselected large mAb peptides using a customized multi-segment MS/MS method. Selecting the highest charge state of each peptide for ETD with tailored ETD reaction time maximizes the fragmentation efficiency and avoids wasting MS/MS scans on the same peptide with other charge states.
We improved ETD of many Cys-containing peptides by derivatizing Cys residues with an amino group prior to digestion. As over a dozen disulfide bonds are evenly distributed within different domains of an IgG molecule, this strategy improves the overall sequence coverage of the mAb. It is worth noting that some studies correlating IgG secondary structure to the primary structure information found that CDR-L1 and -L3 always begin immediately after a Cys residue, and CDR-H1 and -H3 always begin only a few amino acid residues after a Cys (27). Considering the close proximity of Cys to CDRs, placing a positively charged amino group on Cys residue(s) may facilitate mapping nearby CDR sequences by ETD.
In our new methodology, we verified peptide sequence from the MS/MS data using a ProSightPC Biomarker search (no enzyme search). This search mode first identifies a candidate peptide (any portion of a protein in the database) matching an observed precursor mass, and then compares the theoretical fragment ion masses of the candidate peptide to the observed fragment ion masses. By including the Δm function at the N terminus and C terminus of candidate peptides in the Biomarker search, we observed the pattern of fragments with and without Δm (e.g. +0.9840 Da for deamidation), and successfully identified one or more PTMs on multiple large peptides. The Δm function avoids incorporating multiple variable modifications into the search algorithm. The latter is widely adopted in traditional database search programs based on in silico digestion of candidate protein(s) (e.g. SEQUEST, Mascot, and OMSSA). This traditional search mode has been proved to be successful for identifying small tryptic peptides. However, the computational complexity greatly increases with the growth of peptide length when adding multiple types of PTMs in the variable modification list, and may result in frequent false positive identifications. This can be seen from the OMSSA search result of Lys-C peptides in our comparison study, which will be discussed below.
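The first stage of a no-enzyme search of this kind, finding every contiguous stretch of a protein whose mass matches the observed precursor mass, can be sketched as follows. This is an illustrative reimplementation of the general idea, not ProSightPC code; the function names, the toy protein sequence, and the tolerance are assumptions. Residue masses are standard monoisotopic values for unmodified amino acids.

```python
from itertools import accumulate

# Standard monoisotopic residue masses (Da) of the 20 unmodified amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da, added once per peptide for the free termini


def peptide_mass(seq):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def biomarker_candidates(protein, precursor_mass, tol_da=0.5):
    """Return (start, end, sequence) for every subsequence of `protein`
    whose mass is within tol_da of precursor_mass (1-based, inclusive)."""
    prefix = [0.0] + list(accumulate(MONO[aa] for aa in protein))
    hits = []
    for start in range(len(protein)):
        for end in range(start + 1, len(protein) + 1):
            mass = prefix[end] - prefix[start] + WATER
            if mass - precursor_mass > tol_da:
                break  # extending further only makes the peptide heavier
            if abs(mass - precursor_mass) <= tol_da:
                hits.append((start + 1, end, protein[start:end]))
    return hits


if __name__ == "__main__":
    toy_protein = "GAPEPTIDEKR"  # hypothetical sequence for illustration
    target = peptide_mass("PEPTIDE")
    print(biomarker_candidates(toy_protein, target, tol_da=0.01))
```

Widening `tol_da` (for example to 20 Da or 2000 Da, as described for the PTM searches in the Data Analysis section) returns more candidates, which would then be discriminated by fragment-ion matching.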