Nepenthesin from monkey cups for Hydrogen/Deuterium Exchange Mass Spectrometry

at in retain label. Pepsin is used but it provides relatively low efficiency under the constraints of the experiment, and a selectivity profile that renders poor coverage of intrinsically-disordered regions. In this study we present nepenthesin-containing secretions of the pitcher plant Nepenthes, commonly called monkey cups, for use in HDX-MS. We show that nepenthesin is at least 1400-fold more efficient than pepsin under HDX-competent conditions, with a selectivity profile that mimics pepsin in part, but also includes efficient cleavage C-terminal to “forbidden” residues K, R, H and P. High efficiency permits a solution-based analysis with no detectable autolysis, avoiding the complication of immobilized enzyme reactors. Relaxed selectivity promotes high coverage of disordered regions and the ability to “tune” the mass map for regions of interest. Nepenthesin-enriched secretions were applied to an analysis of protein complexes in the non-homologous end-joining (NHEJ) DNA repair pathway. The analysis of XRCC4 binding to the BRCT domains of Ligase IV points to secondary interactions between the disordered C-terminal tail of XRCC4 and remote regions of the BRCT domains, which could only be identified with a nepenthesin-based workflow. HDX data suggest that stalk-binding to XRCC4 primes a BRCT conformation in these remote regions to support tail interaction, an event which may be phosphoregulated. We conclude that nepenthesin is an effective alternative to pepsin for all HDX-MS applications, and especially for the analysis of structural transitions among intrinsically-disordered proteins and their binding partners.


Introduction
Mass spectrometry has served the biochemical and biological communities by providing the capacity for protein identification and characterization, but in the last several years it has also become a powerful tool for interrogating protein structure and dynamics (1,2). Solution-phase hydrogen/deuterium exchange (HDX), when coupled with mass spectrometry (MS), provides rich sets of data that can be mined to extract structural and dynamic parameters from proteins (3)(4)(5)(6)(7). In many cases, the technique is used in experimental situations where X-ray diffraction analysis or other biophysical techniques (e.g. NMR, cryoEM) are difficult to apply, which is particularly true in the structure-function analysis of protein interactions (8). Most applications of the method involve deuteration of a protein in two or more states, and differential labeling data is extracted at the highest structural resolution possible. Although there have been some impressive developments in top-down protein analysis using newer ion fragmentation methods for label localization (9,10), most studies continue to employ a bottom-up strategy, in which the protein is digested with an enzyme, and the label is quantified by mass analysis of the resulting peptides.
The reasons for the prominence of the bottom-up approach to HDX-MS are shared with the corresponding proteomics method. Peptides may be detected more sensitively than proteins, and samples of considerably higher complexity can be interrogated (11). Analytically, this shifts the focus towards optimizing protein digestion, in order to cover 100% of the sequence and generate a high degree of overlapping fragments to increase opportunities for localizing the deuterium label at high resolution (12)(13)(14). The unique requirements of the HDX-MS workflow unfortunately place restrictions upon the digestion enzymes that may be used. To avoid label loss through back-exchange to nondeuterated solvent, the kinetics of exchange must be dramatically slowed by pH and temperature reduction, and even under these conditions the digestion must be done rapidly. Conventional methods employ a pH of ~2.5 and temperatures of 4-10°C. The aspartic protease pepsin can function under such conditions, which has led to its prominence for HDX-MS applications. This does not mean the enzyme is ideal. Several labs have sought to identify other enzymes or develop analytical solutions that address its shortcomings, which include extensive autolysis, modest efficiency, and non-ideal substrate specificity. Currently, most methods involve either enzyme microreactors presenting high concentrations of pepsin in a flow-through system (15), or solution-phase digestions using pepsin and protease XIII in separate experiments (16,17). The latter enzyme is also an aspartic protease with a sequence specificity that partially overlaps that of pepsin, thus the two maps together tend to improve sequence coverage somewhat (18).
A method that fuses the two strategies has recently been described, involving tandem pepsin and protease XIII microreactors (13).
Neither strategy is likely to be ideal when applied to protein complexes of far greater complexity, in order to support structure-building activities or dynamics analysis of multiprotein "machines". The microreactor approach complicates the front-end fluidic system, and can lead to sample loss and carryover (19). The solution-phase method requires large amounts of enzyme, resulting in contamination due to enzyme autolysis (18). More importantly, neither fully overcomes the low efficiency of these enzymes. In many cases, changing the presentation of substrate can change the sequence map considerably. For example, the sequence map of a protein can be distorted when bound to a second protein (20). A map dependent on the protein load complicates the comparison of deuteration levels for the protein in different states.
This alteration suggests a level of substrate inhibition (21), and probably reflects a wide range of specificity constants (kcat/Km) across the many individual cleavage sites presented in a protein substrate (22). Similar issues have been noted with trypsin (23). Overall, the sequence maps generated using these methods are serviceable for samples of lower complexity, but low enzymatic efficiency of the available proteases remains an important limitation and a key driver in the search for novel proteases (24).
In this study, we characterize the proteolytic activity of secretions from the Nepenthes genus (25), arising from the aspartic protease nepenthesin (26), and evaluate the enzyme for use within HDX applications. Nepenthesin displays remarkably high cleavage efficiency for a broad range of substrates at low pH and temperature, which promotes high sequence coverage for a collection of proteins selected from ongoing HDX projects in our laboratories. Globally, we demonstrate that it outperforms pepsin in sequence coverage and can be used in a simple workflow for broad sequence coverage, or targeted towards a desired area of protein sequence.
This new tool was applied to an HDX-MS characterization of a protein complex involved in the non-homologous end-joining (NHEJ) pathway of DNA damage repair, and the results support a model of the complex proposed from SAXS data (27).

Plants and fluid processing
Transplants of several Nepenthes varieties were acclimated in a small terrarium. Upon pitcher maturity, the plants were fed with one or two Drosophila per pitcher and the pitcher fluid harvested one week later. Pitchers and their secretions were left to recover for one week prior to a second round of feeding and extraction. All pitcher fluid was combined and clarified through a 0.22 µm filter, then concentrated 80 to 100-fold using an Amicon Ultra 10 kDa molecular weight cut-off centrifugal filter (both from Millipore). Prior to use in digestions, the concentrate was acid-activated with 100 mM glycine-HCl (pH 2.5) for 3 hours, then washed 3X with 100 mM glycine-HCl (pH 2.5) in the filtration device, using 10X fluid volume for each wash. The final isolate was then rediluted to an 11X concentration based on the original sampling of pitcher fluid. Additional details on plant horticulture, feeding and fluid extraction can be found in Supplementary Methods.

Digest mapping by mass spectrometry
Digestions were carried out in solution using an HTX-PAL autosampler and dispensing system designed for HDX applications (Leap Technologies), and data were collected using an AB Sciex Triple-TOF 5600 QqTOF mass spectrometer. Peptides were identified with Mascot (v2.3) from MS/MS data, from .mgf files created in Analyst TF v1.51. Data were mapped to sequence using the following search terms: a mass tolerance of 10 ppm on precursor ions and 0.6 Da on fragment ions, no modifications, and no enzyme specificity. A standard probability cut-off of
Aliquots were then digested in two ways. In the first digestion strategy, protein deuteration was quenched by adding the sample to chilled 100 mM glycine-HCl (pH 2.5), and the quenched protein solution was injected into a pepsin microreactor. This microreactor was installed in the HTX-PAL system between the injector valve and the C18 column. Protein digest was captured on the monolithic C18 capillary column and eluted into the mass spectrometer. All fluidic elements, including the microreactor, were chilled at 4°C to minimize deuterium back-exchange during the analysis time (<15 min). In the second digestion strategy, an equivalent amount of deuterated protein was simultaneously quenched and digested with 3 or 5 µL of 11X nepenthes fluid for 3 or 5 min, respectively, at 10°C. The samples were then injected into the chilled LC system connected to the mass spectrometer.
Replicate mass shift measurements were made (4 or more) and referenced to control protein states: free XRCC4(1-200), free XRCC4(full length) and free LigIV-BRCT. The average deuterium level for each peptide was determined using Mass Spec Studio (manuscript in preparation), which is a rebuild of Hydra v1.5 (46). Perturbed mass shifts were considered significant if they (a) passed a two-tailed t-test (p<0.05) using pooled standard deviations from the analyses of each state, (b) passed a distribution analysis to guard against spectral overlap and (c) exceeded a threshold shift value (±2 s.d.) based on a measurement of the shift noise and assuming its normal distribution (47).
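Criteria (a) and (c) above can be illustrated with a minimal Python sketch. It is not the Mass Spec Studio implementation: the function names, the critical-value lookup table and the example deuteration values are assumptions made for illustration, and criterion (b), the distribution analysis, is not covered.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical t values at p = 0.05, keyed by degrees of freedom.
# Standard table values; a lookup is used here only to stay within the stdlib.
T_CRIT_05 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def pooled_t(state_a, state_b):
    """Return (t, df) for a two-sample t-test with a pooled variance."""
    na, nb = len(state_a), len(state_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(state_a) + (nb - 1) * variance(state_b)) / df
    t = (mean(state_b) - mean(state_a)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def shift_is_significant(free, bound, shift_noise_sd):
    """Apply criterion (a), the t-test, and criterion (c), the 2 s.d. threshold.

    free, bound: replicate deuterium levels (Da) for one peptide in each state.
    shift_noise_sd: hypothetical standard deviation of the shift noise (Da).
    """
    t, df = pooled_t(free, bound)
    shift = mean(bound) - mean(free)
    passes_t_test = abs(t) > T_CRIT_05[df]
    passes_threshold = abs(shift) > 2 * shift_noise_sd
    return passes_t_test and passes_threshold


# Hypothetical replicate values for one peptide, four replicates per state.
free_state = [3.10, 3.05, 3.12, 3.08]
bound_state = [2.60, 2.55, 2.62, 2.58]
significant = shift_is_significant(free_state, bound_state, shift_noise_sd=0.1)
```

With four replicates per state the test has six degrees of freedom, so the lookup table only needs to span small replicate counts; a statistics library would be the natural choice for exact p-values.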

Chemicals
Water and acetonitrile, HPLC grade from Burdick and Jackson, were purchased from VWR.
Formic acid, Tris and glycine were purchased from Sigma Aldrich.

Pitcher fluid extract
The fluidic secretions of the pitcher plant were filtered, concentrated and the nepenthesin activated by pH reduction (pH 2.5), approximately the same pH as found in the wild. In our experiments, the enzyme-to-substrate ratio was 1:85 based on the above assumption that all the measured protein in the enriched fluid is nepenthesin. The nepenthesin data represents an assessment of 1612 residues and although not as extensive as the corresponding pepsin data (13,766 residues), the sequence diversity is sufficiently high in the protein set to warrant a comparison at the level of P1 and P1′ positions at least. The greatest specificity for pepsin is clearly in the P1 position. It presents high-efficiency cleavage for the hydrophobic residues F, L and M but cleavage after P, H, K and R is essentially forbidden. Nepenthesin cleaves after most residues with the exception of G, S, T, V, I and W. It supports a high rate of cleavage after the expected pepsin P1 residues but also at the residues forbidden in pepsin digestion, notably K, R and P. In the P1′ position, pepsin shows a preference for hydrophobic residues in general, including any residue with aromaticity. Conversely, nepenthesin demonstrates little in the way of selectivity at the P1′ position, except perhaps against G, P and H. The significantly-relaxed specificity relative to pepsin is remarkable for an aspartic protease, and the selectivity data provides an early indication of very high enzymatic efficiency.
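The % cleavage metric used for this comparison (observed cleavages at a residue, relative to the total number of that residue in the set, as defined in the caption of the cleavage-preference figure) can be sketched as follows. This is an illustrative Python sketch rather than the analysis code used in the study: the function name, the 0-based peptide coordinates, the choice to exclude the protein C-terminal residue from the totals, and the toy sequence are all assumptions.

```python
from collections import Counter


def p1_cleavage_percent(sequence, peptides):
    """Percent cleavage after each residue type (the P1 position).

    sequence: protein sequence as a string of one-letter codes.
    peptides: list of (start, end) 0-based inclusive indices of identified peptides.
    """
    cleaved_after = set()
    for start, end in peptides:
        if start > 0:
            # The bond preceding the peptide N-terminus was cleaved.
            cleaved_after.add(start - 1)
        if end < len(sequence) - 1:
            # The bond following the peptide C-terminus was cleaved.
            cleaved_after.add(end)
    # The last residue of the protein has no following bond, so it is
    # left out of the totals (an assumption of this sketch).
    totals = Counter(sequence[:-1])
    hits = Counter(sequence[i] for i in cleaved_after)
    return {aa: 100.0 * hits[aa] / totals[aa] for aa in totals}


# Toy example: both K residues are followed by an observed cleavage.
toy_sequence = "AKLFGKPLR"
toy_peptides = [(0, 1), (2, 5), (6, 8)]
percent = p1_cleavage_percent(toy_sequence, toy_peptides)
```

The P1′ profile would follow the same pattern, counting the residue at index `i + 1` for each cleaved bond instead of the residue at index `i`.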
To determine if this relaxed specificity translates into improved sequence mapping for HDX-MS applications, we profiled full-length XRCC4, a protein that contains a globular domain, an extended helical stalk, and a long disordered C-terminal tail (27,32). Such multi-domain proteins are challenging to encompass in a single digestion protocol, and in particular, intrinsically-disordered regions tend to digest poorly with pepsin as they are relatively depleted in hydrophobic residues and enriched in proline and charged residues (33). The pepsin and nepenthesin maps for this protein are displayed in Figure 2. In this comparison, an exhaustive mapping was pursued for both enzymes, using a range of different protease:substrate ratios, and recursive MS/MS experiments. Nepenthesin provides superior coverage of the full-length protein: 357 peptides for nepenthesin compared to 187 for pepsin, but with the same average peptide length (11 residues). Both enzymes represent the globular and stalk regions with a large number of overlapping peptides but the complementarity provided by nepenthesin is evident. The high cleavage efficiency C-terminal to basic residues prompted us to explore whether there is any bias in peptide detection. This could be tested in several ways, but we chose average search score as the metric (Figure 3). The approach emphasizes confidence in sequence identification as the principal means by which sequence maps are defined. There is only one outlier, R. The higher scores for peptides terminating in R likely reflect a combination of higher average peptide intensity and better fragmentation, which is consistent with what we know from trypsin-based bottom-up proteomics (34).
We then examined enzyme efficiency in greater detail and the degree to which the peptide mass map could be varied, or tuned, simply by altering the enzyme-to-substrate ratio (Figure 4). Nepenthesin load was varied over a 50-fold range for in-solution digestions. For the pepsin experiment, immobilized pepsin in a slurry format was used rather than free pepsin so that we avoided extensive pepsin autolysis. The enzyme load was varied over an 8-fold range; lower amounts led to poor peptide intensities and higher amounts had no effect on the map. We note that nepenthesin generated a very low autolysis profile even at the higher loads (see Figure S1).
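The map metrics discussed here (sequence coverage, average peptide length, and average search score grouped by C-terminal residue) are simple to compute from a list of identified peptides. The Python sketch below is illustrative only: the function names, the 0-based coordinates and the example peptides and scores are hypothetical and do not come from the study's data.

```python
from collections import defaultdict


def coverage_stats(protein_length, peptides):
    """Return (percent sequence coverage, average peptide length).

    peptides: list of (start, end) 0-based inclusive indices.
    """
    covered = [False] * protein_length
    for start, end in peptides:
        for i in range(start, end + 1):
            covered[i] = True
    coverage = 100.0 * sum(covered) / protein_length
    average_length = sum(end - start + 1 for start, end in peptides) / len(peptides)
    return coverage, average_length


def mean_score_by_c_terminus(scored_peptides):
    """Average search score grouped by the C-terminal residue of each peptide.

    scored_peptides: list of (peptide sequence, search score) pairs.
    """
    grouped = defaultdict(list)
    for peptide, score in scored_peptides:
        grouped[peptide[-1]].append(score)
    return {aa: sum(scores) / len(scores) for aa, scores in grouped.items()}


# Hypothetical inputs: two overlapping peptides on a 20-residue protein,
# and three scored peptides ending in R or K.
coverage, average_length = coverage_stats(20, [(0, 9), (5, 14)])
mean_scores = mean_score_by_c_terminus([("AKLR", 40.0), ("GFR", 60.0), ("PLK", 30.0)])
```

Counting each residue once regardless of how many peptides overlap it keeps coverage separate from redundancy, which is the distinction the text draws between the number of peptides and the regions they represent.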
We used an aggregate peptide ion chromatogram (PIC) as a measure of effective digestion, reasoning that a distribution peaking towards low elution times presents a digest with a bias towards smaller peptides, which are more useful for HDX-MS applications. Comparing the relatively similar distributions found at 0.38:1 (nepenthesin:substrate) with 520:1 (pepsin:substrate) represents a remarkable 1400-fold improvement in efficiency for nepenthesin over pepsin in HDX-like applications. Similarly high efficiencies have been realized in the digestions of other proteins (not shown), thus the effect is not limited to XRCC4.
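The fold-improvement figure follows directly from the two enzyme-to-substrate ratios that gave similar distributions, and the load ranges quoted for each enzyme follow from the ratio limits given in the Figure 2 caption. A short Python check of the arithmetic (variable names are illustrative):

```python
# Enzyme:substrate ratios giving similar peptide ion chromatograms.
pepsin_ratio = 520.0        # pepsin:substrate of 520:1
nepenthesin_ratio = 0.38    # nepenthesin:substrate of 0.38:1

# Relative amount of pepsin needed for a comparable digest.
fold_improvement = pepsin_ratio / nepenthesin_ratio   # about 1368, reported as ~1400

# Ranges of enzyme load explored, from the Figure 2 caption limits.
nepenthesin_range = 0.38 / 0.0075   # about 50.7, the "50-fold range"
pepsin_range = 520.0 / 65.0         # 8.0, the "8-fold range"
```

The estimate inherits the assumption stated in the text that all measured protein in the enriched fluid is nepenthesin; if other proteins contribute to the measured mass, the true ratio for nepenthesin would be lower and the fold improvement correspondingly higher.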
The nepenthesin digest could be more readily tuned from large fragments to small by varying the enzyme load, which generated variable representation of XRCC4. This is demonstrated in Figure 4a by the transition in the PIC from long retention times at low load, to short retention times at high load. This transition correlated with a decrease in the average peptide length for the most abundant peptides, from >12 at low enzyme load to 10 at high enzyme load. Conversely, varying pepsin load did not significantly alter the PIC or average peptide length (Figure 4b). A forced-flow pepsin microreactor may improve tuning but would likely not generate smaller fragments.
To explore the utility of nepenthesin in HDX-MS, we applied it to an analysis of Ligase IV interactions with XRCC4, a scaffolding protein in a DNA damage repair complex of the non-homologous end-joining (NHEJ) pathway. The tandem BRCT domains of Ligase IV were complexed with both full-length XRCC4 and a truncated form of XRCC4, in which the intrinsically-disordered region was removed (Figure S2). The analysis of full-length XRCC4 used a conventional pepsin-based approach, and as expected the regional changes in deuteration for each protein correlate with the stabilization of XRCC4, and the binding interface in general (Figure 5 middle, and Figure S3). The coiled-coil "stalk" domain of the XRCC4 dimer is strongly stabilized upon complexation, as is the clamp-shaped helix-loop-helix in the linker between the tandem BRCT domains. The stalk domain is slightly stabilized beyond the binding site proper, which is consistent with earlier studies (35). There is no structure of the interaction involving the full-length XRCC4, so we compared deuteration results with a structure using a truncated form (36). We expected to observe reduced deuteration in other regions of BRCT beyond the clamp region, as the binding interface extends beyond the clamp and is not contiguous. We see reductions in the 3: That is, BRCT binding to the truncated XRCC4 destabilizes these regions, but they become stabilized when the XRCC4 tails are present (see also Figure S3).
The pepsin map did not permit full analysis of the disordered tail (as shown in Figure 2), so we used nepenthesin and repeated the HDX-MS analysis. The controllable nature of nepenthesin digestion permitted a regiospecific optimization of the map, which we tuned to elements of the tail region (Figure 6). Upon confirming that the nepenthesin digestion strategy generated comparable deuteration results for the binding interface (Figure S4), we compared the full-length XRCC4-BRCT to the corresponding free proteins. A segment of the tail shows significant reductions in deuteration upon complexation (residues 213-251), but the remaining portions of the tail remained highly deuterated (Figure S4). These findings suggest that the tails fold back upon the tandem BRCT domains, in the regions we observed to undergo tail-induced reductions in deuteration (modeled in Figure 5, right). An interaction of this nature is consistent with structural models built upon SAXS data (27). The asymmetrical orientations of the tandem BRCT domains on the stalk suggest that the tails would interact differently. Reduced deuteration in residues 213-223 is consistent with one tail binding to one BRCT domain, and reduced deuteration at residues 242-251 with the other tail binding to the other BRCT domain.

Discussion
The only endopeptidases that are currently used for bottom-up HDX-MS experiments are drawn from the acid protease class. These enzymes are chosen primarily for their digestion efficiency in the 2-3 pH range, where HDX-MS must be performed, to suppress loss of peptide deuteration into non-deuterated solvent during the sample workup stage. Unfortunately, the acid protease class has not offered up many alternatives to pepsin, and these are needed. Using a single enzyme leads to biased sequence maps. Protein regions may simply not process well, because the selectivity of pepsin may not match the amino acid composition of the region. This is especially true for intrinsically-disordered proteins, an area of great interest in protein biochemistry. Such regions are depleted in pepsin cleavage sites and enriched in forbidden sites (33). Equally important is the general inefficiency of pepsin relative to the demands of the HDX-MS workflow. Most methods employ very large amounts of enzyme to compensate, and this will likely limit HDX-MS methods to protein complexes and systems of low to moderate complexity.
Pepsin is a member of the aspartic acid protease sub-class, and several members of the class have been assessed for utility within the HDX-MS workflow. Aside from protease XIII, no other enzymes have entered common use. In our assessment of various candidates, we considered nepenthesin (EC 3.4.23.12) based upon its phylogenetic remoteness from pepsin (37).
This enzyme is produced in carnivorous plants of the Nepenthes genus. Plants in this genus have long fascinated botanists interested in mechanisms of insect trapping and nutrient uptake (25,38), and more recently, they have captured the imagination of materials researchers developing new lubrication technology (39). The proteome of the unfractionated fluid is remarkably simple, containing 3-9 major proteins in the unstimulated secretions (28,29). Two major components are nepenthesin I and II, where I is present at approximately 10-fold higher levels than II (40). Initial characterizations of the enzyme suggested a selectivity profile similar to pepsin, but including cleavage on either side of D and at K and R (26,30). Early evidence also indicated an optimum digestion pH of 2.6 (41), and that the enzyme may be more efficient than pepsin (30).
For these reasons, we evaluated it for HDX-MS applications as an alternative to pepsin.
Proteome analysis confirmed the presence of nepenthesin I/II and the overall simplicity of the fluid at the protein level. Activating the fluid by pH reduction proved an effective and reproducible means of "scrubbing" the fluid. It seemed to promote digestion of contaminating protein (from bacterial and Drosophila sources) prior to protein concentration, as seen in the very low background in the fluid. The low background also serves to highlight very low levels of autolysis.
Nepenthesin enriched as described is significantly more efficient than pepsin in processing protein substrates for HDX-MS. Enzyme levels need to be 1400-fold higher for pepsin-based experiments to achieve comparable peptide maps, assuming that nepenthesin is the only protein in the enriched pitcher fluid proteome. Our experiments using pepsin were performed at 4°C, and 10°C for nepenthesin, but we have not found the digestion efficiency of pepsin to increase significantly at 10°C. It would be of some benefit to decrease digestion temperatures further for nepenthesin digestions, but we note a modest drop in efficiency at 4°C.
We believe this is due either to extensive enzyme glycosylation, the presence of complex carbohydrate in the enriched fluid fraction increasing viscosity, or both. This will be explored in the future.
The current study improves our understanding of the substrate specificity for nepenthesin.
It is significantly broader than pepsin, and much less specific than first reported, which leads to a sequence map that can be highly tuned as a result. This degree of relaxed specificity is unique among the acid proteases tested to date for HDX-MS. It raises the potential for "over-digestion" to individual amino acids but over-digestion is not very common for single endoproteases. Even with high levels of enzyme we still observe a large number of sequenceable peptides, which probably highlights that secondary interaction sites remote from the actual catalytic site are needed for processing, as has been noted with pepsin (22,42). There is no structure available for nepenthesin I aside from a homology model derived from pepsin (40). If accurate, the model provides support for an extended binding site, and further suggests that the impressive pH and temperature stability of the enzyme derives in part from extensive disulfide bridges. We propose that the broader specificity results not from an altered catalytic site, but from the stabilization of a more active enzyme conformer. For pepsin, kcat values vary over a much wider range than their corresponding Km values, suggesting that an enzyme conformational change is needed to promote hydrolysis of certain substrates (the "induced fit" model) (22). It is possible that nepenthesin I/II is less dependent upon such a change, where the mechanism of hydrolysis is closer to "lock and key".
The high efficiency and resistance of nepenthesin to autolysis improves the HDX-MS workflow. Automated analyses don't require online enzyme microreactors, but simple solution dispensing and incubation. At least with the monolithic columns used in our LC-MS analyses, there was no evidence the enzyme preparation clogged columns, even after hundreds of runs.
"Tuning" the sequence map to optimize the analysis can therefore be readily controlled by dispensing different volume ratios of the enzyme extract. In our hands, pepsin is not amenable to such tuning, in either free or immobilized form.
We also show that nepenthesin-based HDX-MS analyses are broadly equivalent to the conventional pepsin-based method, and that nepenthesin provides unique opportunities to analyze disordered regions. HDX-MS is one of few methods with the potential to study structural transitions in "disordered" proteins, but the limitation of pepsin in these regions is problematic. The high coverage we observe in the tail of XRCC4 results from cleavage at proline and charged residues, which are sites with no to low cleavage probability using pepsin.
Functionally, the tail has been proposed to sequester XRCC4 until association with the BRCT domains of DNA Ligase IV is needed (27), an enzyme that is responsible for DNA end ligation in the NHEJ repair mechanism. This sequestration is thought to involve interaction of the tails with or near the XRCC4 head domain, so we might expect the tails to increase in deuteration when the BRCT domains bind. We do not observe this, but rather reduced deuteration in segments of the tail. These reductions correlate with a similar reduction in two distal regions of the tandem BRCT domains. Intriguingly, when these tandem BRCT domains bind to the stalk region of XRCC4 without the tails, the distal regions appear to reorganize first, suggested by their higher deuteration profiles. We propose that stalk-binding creates a permissive binding groove in each BRCT domain, which then engages the C-terminal tails of XRCC4 for a coordinated, multisite interaction between the two proteins. This type of tail interaction is supported by a modeling exercise based upon SAXS data (27). Interestingly, the proposed binding regions in the tail span one known CK2 phosphorylation site (T233) (43,44) and one predicted site (S232), which suggests a mechanism for regulating an interaction that may have important functional significance to the DNA end-joining mechanism.
Overall, nepenthesin is an important addition to HDX technology and has replaced pepsin in our laboratory as the enzyme of choice. The high digestion efficiency coupled with a more broadly-distributed amino acid specificity should render most protein substrates amenable to processing for HDX-MS. We note that membrane proteins in addition to disordered proteins may be better suited to nepenthesin as well. Our initial efforts with integral membrane proteins are suggestive of this, perhaps due in part to detectable lipase activity in the pitcher fluid (30).
Future studies will explore this in greater depth. Sufficient enzyme for a large number of projects is easily accessible to any lab willing to grow and feed a modest number of plants.

Figure 1
Nepenthesin cleavage preferences at (A) the P1 or N-terminal side of the cleavage site and at (B) the P1′ or C-terminal side of the cleavage site. Data is grouped according to amino acid type and compared to a similar rendering of pepsin data from Hamuro et al. (31). Black bars indicate nepenthesin digestion and grey bars pepsin digestion. The % cleavage represents the number of observed cleavages at the given residue, relative to the total number of the given residues in the set. Nepenthesin data were obtained from 2 min digests of six denatured proteins, as described in the text.

Figure 2
XRCC4 composite peptide sequence map, arranged according to domain type. Peptides were obtained using pepsin digestion at four different enzyme-to-substrate ratios (65:1 to 520:1, blue bars), and using nepenthesin digestion at four different enzyme-to-substrate ratios (0.0075:1 to 0.38:1, red bars).

Figure 3
Average MASCOT score of peptides obtained after nepenthesin digestion, grouped by C-terminal amino acid. The number of peptides used for each calculation is associated with the terminal amino acid, above the bar. Peptides were obtained from the digests of six denatured proteins, as described in the text.