Facilitating Protein Disulfide Mapping by a Combination of Pepsin Digestion, Electron Transfer Higher Energy Dissociation (EThcD), and a Dedicated Search Algorithm SlinkS

Disulfide bond identification is important for a detailed understanding of protein structures, which directly affect their biological functions. Here we describe an integrated workflow for the fast and accurate identification of authentic protein disulfide bridges. This novel workflow incorporates acidic proteolytic digestion using pepsin to eliminate undesirable disulfide reshuffling during sample preparation and a novel search engine, SlinkS, to directly identify disulfide-bridged peptides isolated via electron transfer higher energy dissociation (EThcD). In EThcD fragmentation of disulfide-bridged peptides, electron transfer dissociation preferentially leads to the cleavage of the S–S bonds, generating two intense disulfide-cleaved peptides as primary fragment ions. Subsequently, higher energy collision dissociation primarily targets unreacted and charge-reduced precursor ions, inducing peptide backbone fragmentation. SlinkS is able to provide the accurate monoisotopic precursor masses of the two disulfide-cleaved peptides and the sequence of each linked peptide by matching the remaining EThcD product ions against a linear peptide database. The workflow was validated using a protein mixture containing six proteins rich in natural disulfide bridges. Using this pepsin-based workflow, we were able to efficiently and confidently identify a total of 31 unique Cys–Cys bonds (out of 43 disulfide bridges present), with no disulfide reshuffling products detected. Pepsin digestion not only outperformed trypsin digestion in terms of the number of detected authentic Cys–Cys bonds, but, more important, prevented the formation of artificially reshuffled disulfide bridges due to protein digestion under neutral pH. Our new workflow therefore provides a precise and generic approach for disulfide bridge mapping, which can be used to study protein folding, structure, and stability.

Disulfide bridges are one of the most common post-translational modifications in proteins (1). The formation of disulfide bonds between cysteine residues is a crucial component in the process of protein folding and plays an important role in stabilizing the tertiary and quaternary structures of proteins (2,3). Therefore, detecting and characterizing the exact locations of disulfide bonds is an important aspect of proteomics, especially in the context of gaining a comprehensive understanding of protein folding and three-dimensional structures. Moreover, in the use of protein therapeutics (e.g. antibodies), it is also of interest to monitor the reshuffling of disulfide bonds during formulation, storage, and usage, which reflects the antibody structure, stability, and biological function (4).
Most knowledge about protein disulfide bridges comes from detailed molecular structures obtained via x-ray crystallography and NMR spectroscopy (5,6), although regrettably such data are mostly obtained from overexpressed recombinant proteins. Mass spectrometry is gaining importance in the identification and characterization of protein disulfide bridges (7,8). Some advantages of MS-based approaches include relatively easy sample preparation, short analysis time, and the capability to deal with more complex protein mixtures from endogenous sources. However, the detection of disulfide bridges remains challenging for a few reasons.
Firstly, the presence of free sulfhydryl groups can induce undesired sulfhydryl-disulfide reshuffling, especially under neutral and alkaline pH conditions. As most standard proteomic strategies use enzymatic digestion in a pH range of 7.5-8.5, undesirable disulfide reshuffling can occur during sample handling (8). Secondly, most of the widely applied database searching programs, such as SEQUEST and Mascot, were not developed for, and thus are not suitable for, analyzing fragmentation spectra originating from disulfide-bridged peptides (9).
Efforts have been directed at tackling these obstacles and facilitating the identification of authentic disulfide bridges. With respect to sample handling, it has been demonstrated by several groups that disulfide reshuffling can be reduced by (i) blocking free cysteines using alkylating reagents before denaturing the protein, (ii) lowering the pH to 6.0 to 7.0 during tryptic digestion (8,10-13), and (iii) using the enzyme pepsin under acidic conditions for proteolytic digestion (13-17). Unfortunately, trypsin becomes less efficient and less specific at more acidic pH, and pepsin, which has an optimal pH range of 1-3, tremendously increases the complexity of both protein digests and data analysis (8). Regarding data analysis, one of the current approaches used for the identification of disulfide bridges involves chromatographic comparison between reduced and non-reduced protein digests, with disulfide-bridged peptides appearing only in non-reduced samples (8,12). Alternatively, disulfide bonds can be identified directly from non-reduced protein digests using an electron transfer dissociation (ETD) MS2 and collision-induced dissociation (CID)/higher energy collision dissociation (HCD) MS3 fragmentation scheme (termed the ETD-MS2 CID/HCD-MS3 approach) (13,18,19). Thereby, ETD aids in the preferential cleavage of S-S linkages, generating two disulfide-cleaved peptides, which can be subsequently isolated and further fragmented via CID/HCD for sequence information. In addition, substantial efforts have been made to develop novel strategies specifically for interpreting spectra from disulfide-bridged peptides, including de novo sequencing approaches (20,21) and database search engines such as MassMatrix and Dbond (9,22).
A combined dual fragmentation scheme, referred to as electron-transfer and higher-energy collision dissociation (EThcD), was introduced by our group recently as implemented on an Orbitrap Elite (23-25) and will become available for the Orbitrap Fusion. In this approach, an initial ETD step is applied to fragment the isolated MS precursor, and subsequently all resulting ions are subjected to HCD fragmentation, generating a mixture of b/y and c/z ions. Here we explored the use of EThcD for disulfide bridge analysis. We reasoned that the previously reported ETD-MS2 CID/HCD-MS3 method could be integrated into a single EThcD experiment, with ETD applied first to preferentially break the disulfide bond and HCD employed next to enhance the number of peptide backbone fragments. Because all the ions resulting from the ETD process are subjected to HCD simultaneously and thus no MS3 isolation is necessary, the sensitivity and duty cycle of the EThcD workflow should potentially be improved relative to the previous MS3 strategy.
In this work, we describe a fast and accurate framework for both intrapeptide and interpeptide disulfide bridge identification, including the acidic digestion procedure using pepsin, the usage of the dual-fragmentation scheme EThcD, and the development of a novel search engine, SlinkS. The workflow described herein diminishes issues induced by disulfide reshuffling during sample preparation and provides direct and efficient identification of intrapeptide and interpeptide disulfide bonds from LC/MS2 experiments. We evaluated the integrated workflow using a mixture of six standard proteins and confirmed that this approach enables reliable and robust identification of authentic disulfide bridges from protein mixtures. Furthermore, we assessed the capability of the workflow to quantitatively monitor the changes of disulfide bridges in stress-induced therapeutic antibodies.
Proteolytic Digestion of Standard Protein Mixture-Five standard proteins, several of which are rich in disulfide bridges (namely, cytochrome C, bovine serum albumin, lysozyme C, ribonuclease B, and β-lactoglobulin) were reconstituted in 10 mM PBS (pH 7.4) at a concentration of 1 mg/ml and mixed in an equal ratio (w/w). The protein mixture was diluted 10 times with 0.04 N HCl (pH 1.5) for pepsin digestion, 10 mM PBS (pH 7.8) for trypsin digestion at normal pH, and 10 mM PBS (pH 6.8) for trypsin digestion at low pH. Wild-type IgG1 (1 mg/ml in 20 mM Tris-HCl, pH 7.4) was added to each of the samples to reach a final concentration of 0.1 mg/ml. For pepsin digestion, 2 M urea was added to denature the protein, and pepsin was added in a 1:50 (w/w) ratio. The digestion was performed for 2 h at room temperature. For both normal and low-pH trypsin digestion, iodoacetamide (normal-pH digestion) or N-ethylmaleimide (low-pH digestion) was used to alkylate free Cys residues for 30 min at room temperature in the dark, and then 8 M urea was added to denature the protein.
Lys-C was added in a 1:75 (w/w) ratio, and samples were incubated for 4 h at 37°C. The sample was then diluted four times with 10 mM PBS (the same pH as the sample). Trypsin was added in a 1:100 (w/w) ratio, and incubation was carried out overnight at 37°C.
Proteolytic Digestion of Heat-stressed Therapeutic Antibody-20 µl of wild-type IgG1 (1 mg/ml in 20 mM Tris-HCl, pH 7.4) was incubated at 37°C, 60°C, and 70°C for 2 h. Then the non-heated and heated samples were diluted 10 times with 0.04 N HCl (pH 1.5) prior to pepsin digestion. The digestion procedure was applied as described above.
LC/MS Analysis-Protein digests were analyzed by an ultra-HPLC Proxeon EASY-nLC 1000 (Thermo Fisher Scientific, Odense, Denmark) coupled on-line to an ETD-enabled LTQ Orbitrap Elite mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). Reversed-phase separation was accomplished using a 100 µm inner diameter × 2 cm trap column (in-house packed with ReproSil-Pur C18-AQ, 3 µm) (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) coupled to a 50 µm inner diameter × 50 cm analytical column (in-house packed with Poroshell 120 EC-C18, 2.7 µm) (Agilent Technologies, Amstelveen, The Netherlands). Mobile-phase solvent A consisted of 0.1% formic acid in water, and mobile-phase solvent B consisted of 0.1% formic acid in acetonitrile. The flow rate was set to 100 nL/min. A 50-min gradient was used (7% to 30% solvent B within 31 min, 30% to 100% solvent B within 3 min, 100% solvent B for 5 min, 100% to 7% solvent B within 1 min, and 7% solvent B for 10 min). The instrument firmware was modified to allow all-ion HCD fragmentation after the ETD step, as described earlier (23), allowing an EThcD fragmentation scheme. The five most abundant precursors were selected for EThcD data-dependent fragmentation. All data were acquired in the Orbitrap mass analyzer. For MS scans, the scan range was from 350 to 1500 m/z at a resolution of 60,000, and the automatic gain control target was set at 1 × 10^6. For MS2 scans, the resolution was 15,000, the automatic gain control target was set at 2 × 10^5, the precursor isolation width was 2 Da, and the maximum injection time was 500 ms. The ETD reaction time was set at 50 ms, the ETD automatic gain control target was 2 × 10^5, and the HCD normalized collision energy was 32% (calculation based on precursor m/z and charge).
Data Analysis-The raw data files were converted to .mgf files using Thermo Scientific Proteome Discoverer 1.4 software (Thermo Fisher Scientific) with the add-on node MS2-Spectrum Processor. The non-fragment filter node was used with the following parameters: remove precursors, 4-Da window offset; remove charge-reduced precursors, 2-Da window offset; and remove neutral losses from charge-reduced precursors, 2-Da window offset. The MS2-Spectrum Processor was used to deconvolute and deisotope the original m/z values to singly charged masses. The in-house-developed algorithm SlinkS was used for the main search. SlinkS was developed in R (version 3.0.1). The source code and user manual are available via Sourceforge with the package name SlinkS. The singly charged precursors and product ions in the .mgf files were converted to neutral masses. The following settings were used: precursor ion mass tolerance, 10 ppm; product ion mass tolerance, 20 ppm; and variable modification, Met oxidation. For pepsin digestion, the setting "nonspecific enzymatic digestion" was used. For trypsin digestion, the setting "fully tryptic enzymatic specificity" and up to three miscleavages were used, and additional variable modification of carbamidomethyl (+57.021 Da) or N-ethylmaleimide (+125.0476 Da) was allowed to accommodate modifications induced by the alkylating reagents. The target database contained the sequences of six standard proteins, and the decoy database was generated by randomizing each sequence in the target database. SlinkS performed all the searches against a combined target and decoy database (12 protein entries as presented in supplemental Table S5). Of note, with the use of nonspecific in silico enzymatic digestion, this small protein database results in a peptide database containing 114,961 entries, enabling proper false discovery rate (FDR) calculations.
SlinkS searches both intrapeptide disulfide bridges (where the disulfide bridge is within one peptide) and interpeptide disulfide bridges (where the disulfide bridge is between two peptides). Intrapeptide disulfide bridges were searched against the peptide database containing at least two cysteines per peptide, whereas interpeptide disulfide bridges were searched against the peptide database containing at least one cysteine per peptide. All the identifications were filtered according to an estimated FDR of 1%, with a corresponding natural-log-based n-score cutoff of −21.5 for EThcD spectra and −20.5 for ETD spectra. All the fully annotated spectra are available in the supplemental material.
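The mass conversion and ppm tolerance settings described above can be illustrated with a minimal Python sketch. This is not the SlinkS source (which is written in R); the function names and the rounded proton mass constant are assumptions introduced here for illustration only.

```python
# Illustrative sketch only; names and the rounded constant are assumptions.
PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def singly_charged_to_neutral(mh_plus):
    """Convert a deconvoluted singly charged mass ([M+H]+) to a neutral mass."""
    return mh_plus - PROTON_MASS


def within_ppm(observed, theoretical, tol_ppm):
    """Return True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


if __name__ == "__main__":
    neutral = singly_charged_to_neutral(1001.007276)
    print(round(neutral, 4))
    # A 15-ppm deviation fails a 10-ppm precursor tolerance
    # but passes a 20-ppm product ion tolerance.
    print(within_ppm(1000.015, 1000.0, 10), within_ppm(1000.015, 1000.0, 20))
```

The two tolerances used in the example (10 ppm and 20 ppm) mirror the precursor and product ion settings quoted in the text.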

RESULTS AND DISCUSSION
Characterizing the Fragmentation Pattern of Disulfide-bridged Peptides under Different Activation Methods: HCD, ETD, and EThcD-To evaluate in detail the fragmentation patterns of interpeptide disulfide linkages under HCD/ETD/EThcD conditions, we selected five disulfide-containing peptides from a trypsin-digested standard protein mixture (see supplemental Table S1 for the list of selected peptides). Each peptide was fragmented by means of the three aforementioned fragmentation schemes, and all 15 MS2 spectra were manually annotated with the consideration of four possible fragment categories, which are summarized in Fig. 1. The fragment categories were chosen to contain (i) standard backbone fragments (annotated as b-, y-, c-, and z-ion series), (ii) disulfide-bond-containing backbone fragments (annotated as b-, y-, c-, and z- + A/B ion series), (iii) disulfide-bond-specific ions (cleaved at S-S or S-C bonds, annotated as S-S or S-C cleavage), and (iv) double-cleavage ions (referring to the specific situation in which one cleavage is at the peptide backbone and the other one is at the S-S/S-C bond; annotated as b-, y-, c-, or z- + S-S/S-C cleavage). A representative set of annotated HCD, ETD, and EThcD MS2 spectra is shown in Fig. 2 and supplemental Fig. S1.
The purpose of our SlinkS search algorithm is to determine the precursor masses of the two disulfide-cleaved peptides (e.g. those annotated as peptides A and B in Figs. 1 and 2) and search each of the linked peptides against a linear peptide database. The first critical challenge is to efficiently discriminate the two disulfide-cleaved peptides in the background of all the other MS2 fragments. To further evaluate the previous finding that the S-S bond is prone to cleavage under ETD-based conditions (13), we carefully examined the intensity ranking of each type of fragment ion presented in Fig. 1. As illustrated in Fig. 3A, in each of the five representative spectrum sets, the peptides derived from S-S bond cleavage were among the top three and top six most intense fragments in ETD and EThcD spectra, respectively, whereas such ions were nearly absent in HCD spectra. This observation suggests that both ETD and EThcD generate predominantly disulfide-cleaved ions, although the relative intensities of these ions are slightly lower under EThcD conditions. Frese et al. showed that during EThcD fragmentation, HCD energy primarily targets the unreacted and charge-reduced precursors without inducing secondary fragmentation of c- or z-ions originating from the ETD reaction (23). In the case of disulfide-bridged peptide fragmentation, we need to preserve not only the c- and z-ions, but also, and more importantly, the two disulfide-cleaved peptides generated by the initial ETD reaction. Our finding indicates that, similarly to ETD, EThcD is also capable of generating highly abundant disulfide-cleaved peptides, although a small percentage of secondary fragmentation may also occur as a result of the subsequent HCD fragmentation step.
The second interesting question is how much each type of fragment ion contributes to the total ion abundance during different fragmentation schemes. As shown in Fig. 3B, HCD and ETD fragmentation primarily generate, respectively, b-, y- and c-, z-ion series, whereas EThcD gives rise to all four types of fragment ions, with a slightly greater abundance of b-, y-ions over c-, z-ions. In all three types of fragmentation schemes, including EThcD, double cleavages (b-, y-, c-, z-, and S-S/S-C cleavage ions) were observed, although they were significantly less abundant than disulfide-bond-containing backbone fragments (b-, y-, c-, z-, and A/B ions). Based on the aforementioned observations of fragmentation patterns, the search engine SlinkS was developed to automatically identify disulfide bridges from a protein database, and this is detailed in the following section.
Software Design-SlinkS is a disulfide bridge search engine that is able to analyze LC/MS2 data derived from ETD and EThcD fragmentation schemes. Here ETD-based fragmentation is essential in order to generate highly abundant disulfide-cleaved peptides. SlinkS uses deconvoluted peak list files such as .mgf files as input. For this, Thermo Raw files need to be deconvoluted, deisotoped, and exported as .mgf files by existing software packages such as Proteome Discoverer and ProSightPC (Thermo Fisher Scientific) prior to the SlinkS search.
SlinkS searches both intrapeptide and interpeptide disulfide bridges. For intrapeptide disulfide identification, the software performs in a manner similar to that of linear peptide search engines, where a mass modification of −2H is applied during searching against an in silico digested peptide database. Peptide matching is accomplished by comparing the in silico generated theoretical peptide backbone fragments (c-, z-ion series for ETD and b-, y-, c-, z-ion series for EThcD) to all recorded MS2 fragments in the spectrum. Additionally, intrapeptide-specific internal ions (where peptide fragments are generated from double backbone cleavages between the two linked cysteines) are also considered in the algorithm. Notably, traditional internal fragment ions, such as water or ammonia loss, are not taken into account during the search. A set of annotated example intrapeptide disulfide bond spectra is shown in supplemental Fig. S2.
For interpeptide disulfide bond identification, as illustrated in Fig. 4, SlinkS first performs precursor mass selection to obtain the monoisotopic mass of each disulfide-cleaved peptide. This part is composed of two steps: (i) choosing the top n most abundant peaks from all the MS2 ions (n is a user-defined parameter in the search engine), and (ii) considering the sum of the masses from all possible peptide pairs and selecting the ones that lie within the user-defined error tolerance around the MS precursor ion mass. Previous studies showed that the dissociation of S-S bonds yields similar relative intensities of thiyl (-S•) and thiol (-SH) fragment ions (27); therefore, each of the disulfide-cleaved peptide masses (m_p) obtained from precursor mass selection is extended to three masses (m_p, m_p + m_H, and m_p − m_H, where m_H is the mass of hydrogen) in order to include all the potential candidates of linked peptides.
The peptide database is created by in silico digestion of the protein database, where only peptides containing at least one cysteine are retained. SlinkS searches each recorded MS2 spectrum based on the determined precursor masses of disulfide-cleaved peptides and the remaining MS2 ions as backbone fragment ions for sequence information. In this study, only single cleavage ions (c-, z- and c-, z- + A/B ion series for ETD; b-, y-, c-, z-, and b-, y-, c-, z- + A/B ion series for EThcD) were considered for product ion matching. Because double cleavage ions (c-, z- + S-S/S-C cleavage ions for ETD; b-, y-, c-, z- + S-S/S-C cleavage ions for EThcD) are less frequent and contribute only marginally to the total ion current of the MS2 spectra, they are labeled in the spectra but not considered in SlinkS scoring.
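As a hedged illustration of the −2H mass modification used for intrapeptide disulfide candidates, the following Python sketch computes the monoisotopic mass of a linear peptide and of the same peptide with one internal disulfide bond. The function names are hypothetical, the residue table covers only the 20 standard amino acids with standard monoisotopic values, and the sketch is not taken from the SlinkS code.

```python
# Illustrative sketch; function names are hypothetical, not SlinkS code.
RESIDUE_MASS = {  # standard monoisotopic residue masses, Da
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER_MASS = 18.010565     # Da
HYDROGEN_MASS = 1.007825   # Da


def linear_peptide_mass(sequence):
    """Monoisotopic neutral mass of an unmodified, fully reduced peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def intrapeptide_disulfide_mass(sequence):
    """Mass of a peptide carrying one internal disulfide bond (reduced mass - 2H)."""
    if sequence.count("C") < 2:
        raise ValueError("an intrapeptide disulfide needs at least two cysteines")
    return linear_peptide_mass(sequence) - 2 * HYDROGEN_MASS


if __name__ == "__main__":
    print(round(linear_peptide_mass("CGC"), 4))
    print(round(intrapeptide_disulfide_mass("CGC"), 4))
```

The cysteine-count check mirrors the restriction that intrapeptide candidates come from peptides containing at least two cysteines.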
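The two-step precursor mass selection and the subsequent mass extension can be sketched in Python as follows. This is a simplified illustration under stated assumptions: peaks are given as (neutral mass, intensity) tuples, the tolerance is expressed in Da, and the names `candidate_pairs`, `expand_cleaved_mass`, `top_n`, and `tol_da` are hypothetical rather than SlinkS parameters.

```python
# Illustrative sketch; names and the Da-based tolerance are assumptions.
from itertools import combinations

HYDROGEN_MASS = 1.007825  # Da


def candidate_pairs(peaks, precursor_mass, top_n, tol_da):
    """Step (i): keep the top_n most intense peaks.
    Step (ii): return mass pairs whose sum matches the precursor mass."""
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    pairs = []
    for (m1, _), (m2, _) in combinations(top, 2):
        if abs(m1 + m2 - precursor_mass) <= tol_da:
            pairs.append((m1, m2))
    return pairs


def expand_cleaved_mass(m_p):
    """Extend one disulfide-cleaved peptide mass to m_p - m_H, m_p, m_p + m_H."""
    return (m_p - HYDROGEN_MASS, m_p, m_p + HYDROGEN_MASS)


if __name__ == "__main__":
    peaks = [(1000.5, 100.0), (800.4, 90.0), (300.1, 50.0), (500.2, 10.0)]
    for m1, m2 in candidate_pairs(peaks, 1800.9, top_n=3, tol_da=0.02):
        print(expand_cleaved_mass(m1), expand_cleaved_mass(m2))
```

Each expanded mass would then serve as a precursor mass for a search against the linear cysteine-containing peptide database.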
SlinkS uses a probability score (termed the n-score) to calculate the confidence of each candidate sequence. The n-score has been introduced (28) and is used in ProSightPC (Thermo Fisher Scientific) for top-down/middle-down experiments. We adapted the principle of n-score calculation and modified it to match our disulfide bridge identification. Because interpeptide disulfide bridges are composed of two linked linear peptides, they are assigned when the n-scores from both of the peptides are less than the threshold. The use of an individual peptide score for each linked peptide has been described previously (29). It effectively avoids the frequently occurring scenario in which most of the matched fragments arise from only one of the linked peptides.
SlinkS uses a target-decoy strategy for peptide-spectrum match validation in which the FDR is calculated as the number of false positive hits divided by the total number of identifications (true and false positive hits). In contrast to the linear peptide FDR evaluation, the false positive hits of disulfide-bridged peptides refer to either or both of the linked peptides being matched in the decoy database (target-decoy + decoy-decoy), and the true positive hits refer to both of the linked peptides being matched in the target database (target-target).
Fig. 4. Raw files are deconvoluted, deisotoped, converted to peak list files, and subjected to SlinkS analysis. SlinkS first performs precursor mass selection to obtain the monoisotopic mass of each disulfide-cleaved peptide, and spectra that do not contain any candidate peptide pairs are discarded. Next, all MS2 ions are matched against in silico fragments of each candidate peptide, and spectra that do not have sufficient matched fragment ions are omitted. Finally, the n-score and FDR are calculated, and identification results are filtered based on the user-defined n-score and FDR cutoff value.

Evaluating the Performance of SlinkS in the Identification of Disulfide Bridges under ETD and EThcD Conditions-To [...] under acidic conditions using pepsin, and the resulting peptides were subjected to ETD and EThcD LC/MS2 experiments. Firstly we evaluated the n-score distributions of true and false positive hits by plotting the n-scores against the number of identifications in each category. For both ETD and EThcD data, the true positive matches (target-target) were clearly separated from the false positive hits (target-decoy and decoy-decoy) in the low n-score region (Fig. 5A). We also noticed that the hybrid false positive hits (target-decoy) contributed significantly more than the double false positive hits (decoy-decoy). This finding suggests that hybrid false positive hits, which are sometimes inappropriately estimated by other algorithms, have to be taken into account.
As expected, compared with ETD data, the average n-score under EThcD is lower for true positive hits, which directly reflects the higher spectral quality due to the generation of all b-, y-, c-, z-ion series. However, the average n-score for false positive hits is also lower. One of the possible explanations is that more random matches are generated under EThcD because, in theory, EThcD spectra contain twice as many ions as ETD spectra. In addition, we plotted the estimated FDRs against different n-score thresholds used in the search (Fig. 5B). Under ETD and EThcD, respectively, a natural-log-based n-score cutoff of −20.5 or −21.5 gave rise to an estimated FDR of ~1%. This indicates that in order to reach the same desired FDR, a slightly lower n-score cutoff should be applied in EThcD than in ETD.
Next, we assessed the number of identified disulfide bridges in ETD and EThcD experiments. Based on the results from n-score and FDR evaluation, we chose a peptide natural log n-score cutoff of −21.5 for EThcD and −20.5 for ETD data, both of which provided an estimated FDR of 1%. As a result, 13 unique Cys-Cys bridges were confidently identified via ETD, and an additional 11 (a total of 24) unique Cys-Cys bonds were identified via EThcD (supplemental Table S2). Together, these results suggest that SlinkS performs well in unambiguously and sensitively identifying disulfide-bridged peptides from both fragmentation schemes, but EThcD outperforms ETD substantially in terms of the number of identifications and the average n-score of all true positive matches.
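The target-decoy FDR estimate for linked peptide pairs, together with the rule that both peptides must score below the n-score cutoff, can be sketched in Python as below. The data layout (one tuple per peptide pair) and the function name `estimate_pair_fdr` are assumptions made for illustration; the example scores are invented and are not results from this study.

```python
# Illustrative sketch; the tuple layout and function name are assumptions.
def estimate_pair_fdr(matches, cutoff):
    """Estimate the FDR among peptide pairs passing an n-score cutoff.

    matches: iterable of (score_a, score_b, a_is_decoy, b_is_decoy).
    A pair passes only if BOTH peptide n-scores are below the cutoff.
    False positives are target-decoy plus decoy-decoy pairs.
    """
    passing = [m for m in matches if max(m[0], m[1]) < cutoff]
    if not passing:
        return 0.0
    false_hits = sum(1 for m in passing if m[2] or m[3])
    return false_hits / len(passing)


if __name__ == "__main__":
    # Invented example values, for demonstration only.
    example = [
        (-30.0, -25.0, False, False),  # target-target, passes
        (-28.0, -22.0, False, False),  # target-target, passes
        (-23.0, -22.0, False, True),   # target-decoy, passes
        (-10.0, -30.0, False, False),  # one peptide scores too high, rejected
        (-25.0, -24.0, True, True),    # decoy-decoy, passes
    ]
    print(estimate_pair_fdr(example, cutoff=-21.5))
```

Using the higher (worse) of the two peptide scores as the deciding value prevents a pair from passing when most matched fragments come from only one of the linked peptides.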
Evaluation of Pepsin-and Trypsin-based Sample Preparation Strategies-In addition to the difficulties in spectra interpretation and database searching, sample preparation for disulfide bond identification is also quite challenging. The two major obstacles are (i) disulfide reshuffling caused by disulfide exchange or the formation of new artificial disulfide bridges, and (ii) low protein sequence coverage due to the presence of intertwined disulfide bridges.
To tackle these issues, pepsin was used for proteolytic digestion in this study. We chose pepsin over other frequently used enzymes because of the following advantages: (i) it is compatible with highly acidic conditions (pH 1-3), thus eliminating disulfide reshuffling, which typically occurs under neutral and alkaline pH, and (ii) it is a nonspecific enzyme and therefore works very well in generating small peptides and separating most cysteines into different peptides to facilitate spectra interpretation. However, pepsin is not widely applied to disulfide bond studies with the use of mass spectrometry For each of the disulfide-bridged peptides, the lower n-score value of the two linked peptides was used for plotting. B, a bar diagram of different n-score thresholds applied in the search against the corresponding FDRs. The actual dataset (shown as gray bars) is fitted to a power function (shown as blue lines). The red triangle indicates n-score cutoffs of Ϫ20.5 and Ϫ21.5 for ETD and EThcD spectra, respectively, to allow for an FDR of ϳ1%. These n-score cutoff values are calculated based on the fitted curve. (8), likely because of the complexity of the peptide mixture derived from nonspecific enzymatic digestion and the difficulties of spectra interpretation. During the database searching of disulfide bridges, because the precursor mass obtained from MS acquisition is the sum of the two linked peptides, all possible peptide pairs (the summed mass of which matches the precursor mass) in the peptide library have to be considered, which increases the size of the database quadratically. For example, an average-sized protein (50 kDa) generates ϳ80 cysteine-containing tryptic peptides but ϳ4000 peptic peptides. If we include all the possible peptide-pair combinations, the actual search space for a single pepsin-digested protein is 8 ϫ 10 6 , which is twice the size of a tryptic human proteome database. 
In our approach, because we were able to obtain the precursor mass of each linked peptide, we directly used the linear peptide database during the search and thus overcame the computational obstacle. In Fig. 6 we plot the number of theoretical peptides to be considered using different search strategies. It clearly shows the quadratic expansion of the search space when the combination of peptide pairs is taken into consideration. Therefore, the main feature of SlinkS, which is the ability to search each disulfidelinked peptide, is essential for reducing the search space.
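The quadratic growth of the pairwise search space can be checked with a short Python calculation. The sketch assumes that the search space is counted as unordered pairs of distinct peptides, which reproduces the figure of roughly 8 × 10^6 quoted in the text for ~4000 peptic peptides; the function name is hypothetical.

```python
# Illustrative arithmetic; assumes unordered pairs of distinct peptides.
from math import comb


def pair_search_space(n_peptides):
    """Number of unordered peptide pairs that a pairwise search must consider."""
    return comb(n_peptides, 2)


if __name__ == "__main__":
    # ~80 tryptic versus ~4000 peptic cysteine-containing peptides (from the text).
    for label, n in (("tryptic", 80), ("peptic", 4000)):
        print(label, "linear:", n, "pairs:", pair_search_space(n))
```

Searching each disulfide-cleaved peptide separately keeps the search at the linear size (4000 entries in this example) instead of the pairwise size.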
To further characterize the performance of pepsin digestion relative to that of the frequently used low-pH (pH 6 -7) and normal-pH (pH 7-8) Lys-C plus trypsin digestion procedures, we used a protein mixture containing six well-characterized proteins with known disulfide bridges. In this case, we were able to evaluate both the efficacy of identifying authentic disulfide bonds and the level of reshuffling. We compared three different sample preparation conditions: pepsin digestion at pH 1.5 and Lys-C plus trypsin digestion at pH 6.8 and 7.8, with the two trypsin-digested samples alkylated by iodoacetamide or N-ethylmaleimide, respectively, prior to protein denaturation. Subsequently, peptide mixtures were subjected to EThcD LC/MS 2 analysis. The data analysis was accomplished by SlinkS with the use of a natural log-based n-score cutoff value of Ϫ21.5 and an estimated FDR of ϳ1%. The results are summarized in Table I, and all identifications are listed in supplemental Table S3. Based on crystal structures, out of 43 well-described disulfide bridges from all six proteins, 31,16, and 22 unique authentic disulfide bonds were identified in pepsin, low-pH trypsin, and normal-pH trypsin digestion workflows. Interestingly, in the low-pH trypsin-digested sample, we observed one abnormal disulfide bond, an intrapeptide linkage between the two cysteines in Cytochrome C. Because these two cysteines have been well characterized as free cysteines, our observation was very likely due to undesirable disulfide bond formation during sample preparation. This observation is probably promoted by the fact that these two cysteines are only two amino acid residues apart. Additionally, and more problematically, we identified eight intraprotein and six interprotein disulfide bridges in the pH 7.8 trypsin-digested sample. 
Because our starting material was a mixture of individually purified proteins, these interprotein disulfides were clearly an artifact from disulfide reshuffling during  proteolytic digestion. Our results suggest that lowering the digestion pH to 6.8 can effectively decrease the undesirable disulfide reshuffling. However, because the entire sample preparation procedure is still performed under oxidative conditions (for example, the existence of oxygen in the air), the possibility of disulfide bond formation and exchange has to be taken into consideration. Convincingly, among all three tested digestion procedures, pepsin digestion resulted in the greatest number of identifications (outperforming trypsin digestion) and the most reliable results (providing no evidence of disulfide bond reshuffling). Moreover, in our search results, many disulfide-bridged peptides containing two disulfide bonds were identified (supplemental Table S3), and an example EThcD spectrum of these double disulfide-bridged peptides is shown in Fig. 7. This observation is clearly an additional strength of SlinkS. As discussed in the previous section, during the precursor selection step in a SlinkS search, a Ϯ1 Da window was included to allow the possibly formed thiyl (-S ⅐ ) and thiol (-SH) ions after S-S bond cleavage. Beneficially, a disulfide-cleaved peptide containing an internal disulfide bridge, which is 2 Da smaller than its reduced form, is also included in the disulfide-cleaved peptide candidate list. The number of disulfide bonds in each disulfide-bridged peptide can be verified by calculating the mass difference between the sum of the two peptides and the observed precursor mass in the MS acquisition. In the example spectrum shown in Fig. 7, a 4-Da mass difference between the calculated and measured disulfide-bridged peptide precursor further proves the existence of two disulfide bonds in this identification.
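The verification of the number of disulfide bonds from the mass difference can be sketched in Python as follows. The sketch assumes that the two peptide masses are those of the fully reduced forms, so that each disulfide bond lowers the precursor mass by two hydrogen masses (about 2.016 Da); the function name and the `tol_da` parameter are hypothetical, and the example masses are invented.

```python
# Illustrative sketch; the function name and tol_da are assumptions.
HYDROGEN_MASS = 1.007825  # Da


def disulfide_bond_count(reduced_mass_a, reduced_mass_b, precursor_mass, tol_da=0.05):
    """Infer the number of disulfide bonds from the mass deficit of the precursor.

    Each disulfide bond removes two hydrogens relative to the reduced peptides.
    """
    deficit = reduced_mass_a + reduced_mass_b - precursor_mass
    n_bonds = round(deficit / (2 * HYDROGEN_MASS))
    if n_bonds < 0 or abs(deficit - n_bonds * 2 * HYDROGEN_MASS) > tol_da:
        raise ValueError("mass deficit is not a multiple of 2H within tolerance")
    return n_bonds


if __name__ == "__main__":
    # Invented masses: a deficit of about 4.03 Da corresponds to two bonds.
    print(disulfide_bond_count(1200.6, 900.4, 2101.0 - 4 * HYDROGEN_MASS))
```

A deficit of about 4 Da, as in the spectrum of Fig. 7, therefore maps to two disulfide bonds, whereas a single interpeptide bridge gives a deficit of about 2 Da.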
Applying Pepsin Digestion, EThcD, and SlinkS to Identify Disulfide Bridges in Therapeutic Antibodies-After evaluating our workflow with a well-characterized standard protein mixture, we applied it to an unknown system, namely, stressed therapeutic antibodies. As for many other therapeutic proteins, the correct formation of disulfide bridges is critical during antibody production, and undesirable disulfide reshuffling may cause antibody instability, unfolding, aggregation, and hence malfunction. It has been reported that therapeutic antibodies may become less effective under stress conditions, such as pH changes, temperature variations, and agitation (30-32). Heat stress may induce both intramolecular and intermolecular disulfide bridge reshuffling, which can be one of the causes of antibody unfolding and aggregation. We argue that assessing the changes in the disulfide connections of therapeutic antibodies under stress conditions that mimic formulation, storage, and usage provides useful information on their stability and long-term quality.
Here, we evaluated the disulfide bridge reshuffling of non-heated and heat-stressed IgG1 antibodies. The non-heated and heated (at 37°C, 60°C, or 70°C) antibody samples were digested with pepsin and analyzed by EThcD LC/MS², and the data were searched by SlinkS. The results were filtered at a 1% FDR, and all unambiguously identified disulfide-bridged peptides are summarized in Fig. 8 and listed in supplemental Table S4. Out of 16 authentic disulfide bridges (12 intrachain and 4 interchain), all 12 intrachain disulfide bonds (6 unique ones in the monomer) were identified in both the non-heated and the heated samples. We did not detect any interchain disulfide bridges, which might have been due to the presence of intertwined disulfide bridges at the hinge region (Fig. 8A). In addition, we identified one and six unique reshuffled Cys-Cys bonds in the 60°C and 70°C heated samples, respectively. The relative abundances of these stress-induced, newly formed disulfide bridges were also quantified via spectral counting (Fig. 8B). With increasing temperature, both the number and the abundance of the reshuffled disulfide-bridged peptides increase. We also noticed that in heat-stressed antibody samples, disulfide reshuffling frequently happened at specific cysteine residues (i.e. it was not evenly spread among all cysteine residues), which strengthens our hypothesis that disulfide reshuffling is not a purely random process and correlates with the protein structure. Our results clearly show that heating, as one type of environmental stress, can alter the structure and stability of therapeutic antibodies, and that these changes can be conveniently and efficiently detected via our pepsin-based disulfide bridge mapping approach.

FIG. 7. An example EThcD spectrum of a disulfide-bridged peptide containing two disulfide bonds, illustrating that such spectra can also be identified by SlinkS. The color and label schemes are the same as in Fig. 2.
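As a rough illustration of quantification by spectral counting, the sketch below counts the identified spectra per disulfide bridge and normalizes by the total number of disulfide-bridged spectra in a sample. The normalization choice, the function name, and the bridge labels are assumptions made for this example; the text does not specify how the counts in Fig. 8B were normalized.

```python
from collections import Counter


def relative_abundance(psm_bridges):
    """Return the fraction of identified spectra assigned to each bridge.

    psm_bridges: one bridge label per identified spectrum (hypothetical labels).
    Normalizing by the total spectrum count is an assumption of this sketch.
    """
    counts = Counter(psm_bridges)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bridge: n / total for bridge, n in counts.items()}


# Hypothetical identifications from one heated sample.
psms = ["native_A", "native_A", "native_B", "reshuffled_X"]
print(relative_abundance(psms))
# {'native_A': 0.5, 'native_B': 0.25, 'reshuffled_X': 0.25}
```

Comparing these fractions across the non-heated, 37°C, 60°C, and 70°C samples would show how the share of reshuffled bridges changes with temperature.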
CONCLUSIONS

We developed a novel integrated workflow to efficiently and reliably identify disulfide-bridged peptides from complex mixtures. The three integrated components of our workflow are the use of pepsin under highly acidic conditions for proteolytic digestion, EThcD for the generation of fragment ion spectra, and a novel search engine, SlinkS, for the automated interpretation of the data. SlinkS possesses a unique feature: for each MS² spectrum, it first determines the precursor masses of the two disulfide-cleaved peptides and subsequently sequences each of them from a linear peptide database. This approach not only tremendously reduces the algorithm search space, but also provides a simple and efficient scoring scheme that employs individual peptide n-scores to evaluate the confidence of disulfide bond identification. Moreover, SlinkS allows one to define the FDR through a target-decoy strategy. In this study, we compared the performance of ETD and EThcD fragmentation schemes for disulfide bridge identification, and our data strongly suggest that EThcD outperforms ETD in terms of both the number of identifications and the average n-score of each linked peptide. We also assessed three different digestion strategies: pepsin digestion at pH 1.5, low-pH trypsin digestion at pH 6.8, and normal-pH trypsin digestion at pH 7.8. Our data highlight the benefits of using pepsin over trypsin digestion on the basis of much higher identification numbers and more reliable results (no artificial disulfide-bridged peptides observed). We conclude that our three-pronged integrated workflow is a powerful tool for the analysis of protein folding and three-dimensional structure in the native state, as well as conformational changes under stress-induced conditions.
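The target-decoy idea mentioned above can be illustrated with a minimal Python sketch. It is a generic, simplified estimate (decoy hits divided by target hits at a given score cutoff), not the SlinkS procedure; the function names and the example scores are hypothetical, and the sketch assumes that a lower (more negative) n-score means a more confident match, in line with the negative cutoff of −21.5 used in this study.

```python
def fdr_at_cutoff(scores, is_decoy, cutoff):
    """Estimate the FDR among matches scoring at or below the cutoff.

    Assumption: lower (more negative) n-scores are better.
    """
    targets = sum(1 for s, d in zip(scores, is_decoy) if s <= cutoff and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s <= cutoff and d)
    if targets == 0:
        return float("inf") if decoys else 0.0
    return decoys / targets


def loosest_cutoff(scores, is_decoy, max_fdr=0.01):
    """Return the least stringent cutoff whose estimated FDR is <= max_fdr."""
    best = None
    for cutoff in sorted(set(scores)):  # most stringent cutoff first
        if fdr_at_cutoff(scores, is_decoy, cutoff) <= max_fdr:
            best = cutoff
    return best


# Hypothetical n-scores; the sixth match comes from the decoy database.
scores = [-40.0, -35.0, -30.0, -25.0, -22.0, -20.0, -18.0]
is_decoy = [False, False, False, False, False, True, False]
print(fdr_at_cutoff(scores, is_decoy, -21.5))  # 0.0
print(loosest_cutoff(scores, is_decoy))        # -22.0
```

In practice, the FDR of a linked peptide pair depends on both peptides, so a real implementation would need to decide how target-decoy and decoy-decoy pairs are counted; this sketch treats each match as a single scored entry.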