Quantitative cross-linking via engineered cysteines to study inter-domain interactions in bacterial collagenases

SUMMARY
Inter-domain movements act as important activity modulators in multi-domain proteins. Here, we present a protocol for inter-domain cross-linking via engineered cysteines. Using collagenase G (ColG) from Hathewaya histolytica as a model, we describe steps for the design, expression, purification, and cross-linking of the target protein. We detail a system to monitor the progress of the cross-linking reaction and to confirm the structural integrity of the purified cross-linked proteins. We anticipate this protocol to be readily adaptable to other multi-domain enzymes. For complete details on the use and execution of this protocol, please refer to Serwanja et al. 1


BEFORE YOU BEGIN
Inter-domain movements act as important activity modulators in multi-domain proteins, whether by creating binding surfaces for ligands [2][3][4] or by enabling catalysis or receptor activation. 3,[5][6][7] To unravel their functional significance, it is crucial to be able to lock the target domains in distinct conformational states. This can be done by inter-domain cross-linking via engineered cysteine residues. [8][9][10] This method has the advantage that the target protein can be reversibly maintained in a fixed conformational state: upon addition of a reducing agent, the restraining linkage is released. Yet, this method entails pitfalls that need to be carefully handled, as one must avoid the presence of i) mislinked higher oligomeric species, and ii) a mixture of cross-linked and non-cross-linked proteins. Moreover, it is vital that the engineered cysteine pair and the subsequently formed disulfide bond do not compromise the fold of the protein or eliminate critical functional residues or motifs.
Here we describe the cross-linking of the collagenase unit (CU) of the collagenase ColG from Hathewaya histolytica. The CU is composed of the activator domain (AD) and peptidase domain (PD) and adopts an open conformation in the crystal structure. A two-state model of collagen degradation was proposed, in which the CU switches between an open and closed conformation. 11 This protocol was established to generate cross-linked CU variants in different states of closure. Care was taken to address the aforementioned pitfalls, in order to produce pure, functional, homogeneously cross-linked, monomeric protein.
This protocol can be easily adapted for other multi-domain proteins, but it is important to consider the following points beforehand:
1. If the target protein contains conserved, functionally relevant cysteines (e.g., cysteine proteases, disulfide bonds), this protocol will require adaptations or may not be applicable at all, as mutation of these residues and/or the introduction of additional cysteines will interfere with proper folding and protein activity.
2. When designing the cross-linked protein variants, please follow these guidelines:
a. Perform a multiple sequence alignment of the target protein with known homologues to identify conserved residues.
b. Do not target conserved amino acids for mutation, as sequence conservation is indicative of residues of structural and/or functional significance.
c. Remove naturally occurring non-conserved cysteine residues by mutation; otherwise the yield of properly cross-linked protein will be severely reduced.
d. Visualize the conformational state that you want to stabilize using software programs such as PyMOL 12 or UCSF Chimera 13:
i. Use available crystal structures of the target protein, homology models or AlphaFold 14 models.
ii. Model the targeted conformational state of the two involved domains.
iii. Search for surface-accessible sidechains facing each other at the interface of the two domains (Cβ-Cβ distance <5 Å). Upon mutation into cysteines, this will translate into a sulfur-sulfur distance of approx. 2 Å, ideal for disulfide-bond formation, if the hypothetical conformation is populated. 15 If two residues for exchange cannot be identified, more distant pairs may be used together with bridging cross-linkers.
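The distance search in guideline 2.d.iii can also be scripted on the coordinates of the modeled conformation. The following is a minimal sketch, not part of the published protocol: it assumes a PDB-format text of the model, a single chain, and user-supplied residue-number ranges for the two domains. The function names, the `conserved` exclusion set, and the default chain identifier are hypothetical. Glycines carry no Cβ atom and are therefore skipped, and surface accessibility and sidechain orientation still need to be checked visually.

```python
import math

def parse_cb_atoms(pdb_text, chain="A"):
    """Return {residue_number: (residue_name, (x, y, z))} for the C-beta atoms of one chain."""
    cb_atoms = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, chain in column 22,
        # residue number in columns 23-26, coordinates in columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CB" and line[21] == chain:
            number = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            # setdefault keeps the first alternate location if several are present.
            cb_atoms.setdefault(number, (line[17:20].strip(), xyz))
    return cb_atoms

def close_cb_pairs(cb_atoms, domain_a, domain_b, cutoff=5.0, conserved=frozenset()):
    """List inter-domain residue pairs with a C-beta/C-beta distance below cutoff (in Å).

    domain_a and domain_b are collections of residue numbers (e.g., range objects);
    residues listed in `conserved` are excluded, following guideline 2.b.
    """
    pairs = []
    for num_a, (name_a, xyz_a) in cb_atoms.items():
        if num_a not in domain_a or num_a in conserved:
            continue
        for num_b, (name_b, xyz_b) in cb_atoms.items():
            if num_b not in domain_b or num_b in conserved:
                continue
            distance = math.dist(xyz_a, xyz_b)
            if distance < cutoff:
                pairs.append((num_a, name_a, num_b, name_b, round(distance, 2)))
    # Shortest distances first, as these are the most promising candidates.
    return sorted(pairs, key=lambda pair: pair[4])
```

A typical call would read the model file into a string, run `parse_cb_atoms` on it, and pass the residue ranges of the two domains to `close_cb_pairs`; the resulting candidates can then be inspected in PyMOL or UCSF Chimera.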

OPEN ACCESS
CRITICAL: Adjust to pH 7.5 at RT.

STEP-BY-STEP METHOD DETAILS
Production of non-cross-linked protein

Timing: 3 weeks
This part of the protocol details the design, cloning, overexpression, and pre-purification of the engineered CU variants (CL1, CL2, CL3, CL4) in a non-cross-linked, i.e., reduced, state. The expression and purification steps are based on Hoppe et al. 17
Note: The individual cloning steps are not presented in detail here, as different cloning methods can be used to generate the necessary construct plasmids. Therefore, we only list the necessary cloning products.
1. Design.
a. Perform a multiple-sequence alignment of ColG and homologous proteins to identify naturally occurring cysteines (C218 and C262) and determine whether these cysteine residues are conserved or not (both non-conserved).
b. Using the crystal structure of wild-type ColG-CU (PDB: 4are), model the closed conformation of the CU with PyMOL (Figure 1A).
c. Based on this model, select non-conserved amino acid pairs in the AD and the PD in close proximity to each other (Figure 1A).
Note: This expression plasmid is based on the pET expression system, encodes an N-terminal hexahistidine tag (pET15b) to facilitate downstream purification, and carries an ampicillin resistance marker. The expression of the target protein can be induced by the addition of IPTG.
b. Generate a variant of this expression plasmid with a cysteine-free CDS for ColG-CU (C218S/C262S) (CF).
c. Confirm the removal of the native cysteines by sequencing.
d. Using the cysteine-free expression plasmid as template, generate the expression plasmids for the four different double mutants CL1-CL4.
e. Confirm the introduction of the engineered cysteines by sequencing.

3. Expression.
a. Transform the mutant plasmids into an E. coli expression host (NiCo21(DE3)).
b. Plate the transformed cells on LB-agar plates supplemented with 100 µg/mL ampicillin and incubate overnight at 37 °C.
c. Once single colonies have grown, seal the plates with parafilm and store the plates at 4 °C prior to large-scale expression (maximum storage: 6 weeks).
d. For the large-scale expression, prepare 3 L of 20 g/L LB (lysogeny broth) media for each construct.
e. Transfer 500 mL of the media into 2.5 L baffled shake flasks, autoclave and keep the media in a sterile environment until needed.
f. Set up for each construct 50 mL of preculture using autoclaved LB media supplemented with 100 µg/mL ampicillin.
g. Inoculate each preculture with a single colony from the stored LB-agar plates.
h. Incubate the preculture at 37 °C with shaking at 230 rpm overnight.
i. In the morning, supplement all autoclaved flasks containing 500 mL LB media with ampicillin to a final concentration of 100 µg/mL.
o. Store the cell pellets at −20 °C.
Note: The use of NiCo21(DE3) (NEB) as an expression host is encouraged, since its genotype lowers the contamination of the IMAC-purified protein by endogenous E. coli metal-binding proteins.
Note: The 3 L culture volumes were tailored for a final yield of cross-linked collagenase of approx. 10 mg.
4. Pre-purification.
a. Suspend cell pellets with expressed protein in precooled loading buffer (2 mL/g cell wet weight).
b. Lyse the cells on ice and purify the protein batches via IMAC using pre-equilibrated Ni-Sepharose columns, washing bound protein with 50 mL each of precooled wash buffer 1, wash buffer 2, and wash buffer 3.
c. Elute Ni-bound protein with 250 mM imidazole using 30 mL of the precooled elution buffer.
d. Concentrate the eluted protein using Amicon Ultra-15 devices, 10,000 MWCO (4,000 × g, 4 °C, 310 min).
e. Perform size-exclusion chromatography (SEC1) at 4 °C using a pre-equilibrated size-exclusion chromatography column (Superdex 200 10/300 GL) with filtered and degassed size-exclusion buffer containing 10 mM βME. Clarify the protein solution prior to loading by centrifugation (17,000 × g for 30 min at 4 °C).
f. Analyze the peak fractions via SDS-PAGE (12% polyacrylamide gel).
g. Concentrate the pure monomeric SEC1 fractions using Amicon Ultra-15 devices, 10,000 MWCO (4,000 × g, 4 °C, 310 min) to the desired final concentration.
h. Aliquot concentrated protein in PCR tubes.
Pause point: Protein samples can be flash-frozen in liquid nitrogen and stored at −80 °C for several months.
Optional: If it is anticipated that the engineered cysteine sidechains might be located too far from each other to favor spontaneous disulfide-bond formation, add a 100-fold excess of DTME dissolved in dimethyl sulfoxide (DMSO) or a similar thiol-based cross-linker to facilitate inter-domain cross-linking.

Cross-linking reaction
f. Incubate the reaction for 10 days at 4 °C or for 3 days at RT.

Note: The cross-linking reaction should be maintained at a low protein concentration (1.0 µM or lower) to prevent oligomerization by intermolecular disulfide bonding, which is disfavored at low protein concentrations.
Note: The cross-linking reaction was optimized by determining, via the CPM assay, how long it takes for 1 mM βME to evaporate from 500 mL oxidation buffer at 4 °C and at RT (Figure 2). With protein in solution, no free thiols were detected after 10 days at 4 °C or after 3 days at RT.

Timing: 2 days
This section details the purification of monomeric cross-linked protein using a two-step protocol starting with Activated Thiol Sepharose (ATS) chromatography followed by gel filtration.
Note: The binding capacity of 10 mL Activated Thiol Sepharose is approx. 30 mg of protein.
Note: Because the likelihood of spontaneous disulfide formation depends on the location of the engineered cysteines relative to each other, the efficiency of the cross-linking reaction is highly construct-dependent (Figure 3). The yield of cross-linked species can be increased by the use of thiol-based cross-linkers in the cross-linking reaction.

Figure 2. Time course of βME evaporation at RT and 4 °C in oxidation buffer
The presence of the thiol-containing βME was determined using the CPM assay (Step 9). No free thiols were detected in the oxidation buffer after 3 days of incubation at RT or after 10 days at 4 °C.

h. Concentrate the pure monomeric SEC2 fractions using Amicon Ultra-15 devices, 10,000 MWCO (4,000 × g, 4 °C, 310 min) to the desired final concentration (Figure 4).
i. Aliquot the protein in PCR tubes and flash-freeze it in liquid nitrogen prior to long-term storage at −80 °C.
CRITICAL: Centrifugation of the protein sample prior to loading on the Sepharose column is paramount to avoid clogging the column with precipitated protein.

Quality control
Timing: 2-3 days
The structural and functional integrity of the purified cross-linked ColG-CU variants needs to be verified prior to any downstream applications. These quality checks examine whether the final protein is fully cross-linked, properly folded, and catalytically active.
9. Verification of cross-linking by CPM assay.
CRITICAL: The use of gloves, sterile equipment and materials, and precise pipetting is critical for an accurate measurement. Since 7-diethylamino-3-(4-maleinimidophenyl)-4-methyl coumarin (CPM) is light sensitive, light exposure of the reagent and of the plate filled with the reagent must be minimized.
Note: CPM is only fluorescent upon conjugation with a free thiol and thus allows the detection of free thiols in protein samples. The protocol for the assay was modified from Alexandrov et al. 18

The cross-linking reaction for CL3 was performed for 10 days at 4 °C. In the subsequent ATS purification, the oxidized protein sample was loaded onto the Sepharose column (column load (CL)), and cross-linked protein was collected in the flowthrough (FT). In the final SEC purification, referred to as SEC2, highly pure, monomeric protein was collected in the indicated peak fractions.

iii. Mix well with a multichannel pipette.
iv. Remove bubbles using a gentle stream of ethanol vapor from a laboratory squirt bottle.
v. Cover the wells tightly with a sealing foil and incubate at 60 °C for 3 min to mildly denature the protein samples.
e. Measure the fluorescence (Ex: 387 nm, Em: 463 nm) of all wells at RT.
f. Analysis:
i. Subtract the blank fluorescence from all standard and sample measurements.
ii. Generate a standard curve plotting fluorescence signal vs. number of thiols (µM). 1 µM ColG-CU WT corresponds to 2 µM thiols.
iii. Interpolating from the standard curve, determine the free thiol concentration in the test samples (Table 1).
Note: To avoid volumetric pipetting errors that would lead to inconsistent measurements, we prepare a rather large volume of 2.0 mL working stock of each protein at a final concentration of 1.0 µM.
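The analysis in step 9f can be sketched in a few lines of code. This is an illustrative sketch only, assuming a linear standard curve and blank subtraction as described above. The function names are hypothetical, and the conversion of free thiols into a percentage of cross-linked protein (two engineered cysteines per molecule, values clamped to 0%-100%) is an assumption added here for illustration rather than a formula taken from the protocol.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def free_thiols_uM(sample_fluorescence, blank, standard_thiols_uM, standard_fluorescence):
    """Interpolate the free thiol concentration (µM) of a sample from the standard curve."""
    # Step 9f.i: subtract the blank from standards and sample.
    corrected = [f - blank for f in standard_fluorescence]
    # Step 9f.ii: standard curve of blank-corrected fluorescence vs. thiol concentration.
    slope, intercept = fit_line(standard_thiols_uM, corrected)
    # Step 9f.iii: read the sample off the standard curve.
    return (sample_fluorescence - blank - intercept) / slope

def percent_cross_linked(thiols_uM, protein_uM, cysteines_per_protein=2):
    """Assumed conversion: share of protein without free thiols, clamped to 0-100%."""
    fraction_free = thiols_uM / (protein_uM * cysteines_per_protein)
    return 100.0 * (1.0 - min(max(fraction_free, 0.0), 1.0))
```

Because 1 µM ColG-CU WT corresponds to 2 µM thiols, a WT dilution series of 0, 0.25, 0.5 and 1.0 µM protein would enter `standard_thiols_uM` as 0, 0.5, 1.0 and 2.0. The clamping means that a slightly negative interpolated thiol concentration is reported as 100% cross-linking.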
10. Structural quality control.
a. Collect far UV CD spectra to confirm secondary structure content:
i. Thaw purified stocks of ColG-CU WT, CF and CL1-CL4 and store them on ice.
ii. Set up the CD spectrometer and flush the instrument with liquid nitrogen.
iii. Set the wavelength range for the spectral scan from 200 to 260 nm.
iv. Set spectral bandwidth and scan time-per-point to 1 nm and 1 s, respectively.
v. Set the temperature of measurement to RT.
vi. If necessary, re-buffer the protein samples to remove any chloride ions.
vii. Prepare in triplicate 200 µL at a final concentration of 5.0 µM for each protein using the CD buffer. Using 200 µL for a 0.5 mm quartz cuvette is recommended.
viii. Collect CD spectra in triplicate.
ix. Convert the recorded spectra data to molar ellipticity vs. wavelength for a direct comparison between the various protein variants (Figure 6).

Figure 6. CD spectra of ColG-CU constructs
The spectra show two main minima at 208 and 222 nm, indicative of a structure dominated by alpha-helices, which is in good agreement with the crystal structure of ColG-CU (PDB: 4are). The highly similar, almost identical spectra of WT and the mutants demonstrate that the cross-linking did not compromise the overall fold of the CU in the cross-linked variants.
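The conversion in step 10a.ix can be sketched as follows. The sketch assumes that the instrument reports ellipticity in millidegrees and uses the standard relation [θ] = θ(mdeg) / (10 · c · l), with c in mol/L and l in cm, which yields molar ellipticity in deg cm² dmol⁻¹. The function names are hypothetical, and any buffer baseline subtraction would have to be done beforehand.

```python
def mean_spectrum(replicates):
    """Average replicate scans point by point (all scans must share one wavelength grid)."""
    return [sum(values) / len(values) for values in zip(*replicates)]

def molar_ellipticity(theta_mdeg, conc_uM, path_mm):
    """Convert ellipticity values (mdeg) to molar ellipticity (deg cm^2 dmol^-1)."""
    conc_molar = conc_uM * 1e-6   # µM -> mol/L
    path_cm = path_mm / 10.0      # mm -> cm
    return [theta / (10.0 * conc_molar * path_cm) for theta in theta_mdeg]
```

With the sample conditions given above (5.0 µM protein, 0.5 mm cuvette), a reading of −10 mdeg corresponds to a molar ellipticity of −4 × 10⁶ deg cm² dmol⁻¹. Dividing by the number of peptide bonds would give the mean residue ellipticity instead, which is not what the protocol text describes.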

f. Monitor the cleavage of the substrate for 2 min at 25 °C (excitation: 328 nm, emission: 392 nm).
g. Calculate the initial velocity (v0) from each reaction using linear regression of the progress curves (stay below 10% substrate conversion).
h. Normalize the calculated enzymatic activities derived from the initial velocities to the activity of the WT (Figure 8).
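Steps g and h can be illustrated with a short script. This is a sketch under stated assumptions: the fluorescence readings have already been converted to product concentration (the calibration is not shown here), the substrate concentration is known, and the function and variable names are hypothetical.

```python
def initial_velocity(times_s, product_uM, substrate_uM, max_conversion=0.10):
    """Slope (µM/s) of a least-squares line through the early part of a progress curve.

    Only points at or below max_conversion (default 10%) of the substrate are used.
    """
    limit = max_conversion * substrate_uM
    points = [(t, p) for t, p in zip(times_s, product_uM) if p <= limit]
    if len(points) < 2:
        raise ValueError("need at least two points below the conversion limit")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_p = sum(p for _, p in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    stp = sum((t - mean_t) * (p - mean_p) for t, p in points)
    return stp / stt

def relative_activity(v0_by_variant, reference="WT"):
    """Express each initial velocity as a percentage of the reference variant."""
    v0_ref = v0_by_variant[reference]
    return {name: 100.0 * v0 / v0_ref for name, v0 in v0_by_variant.items()}
```

Restricting the fit to the first 10% of substrate conversion keeps the regression within the approximately linear phase of the progress curve, so the slope approximates v0.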

EXPECTED OUTCOMES
The oxidation and isolation of cross-linked monomeric protein can be confirmed using the CPM assay. The CPM assay will reveal the presence or absence of non-cross-linked proteins in the final samples of CL1-CL4 ( Table 2).
The CD spectra and the thermal unfolding monitored by DSF are used to assess the secondary structure and stability of the protein variants. CD spectra and melting temperatures similar to those of the non-cross-linked WT indicate that the cross-linking procedure did not compromise the overall fold and secondary structure of the cross-linked protein variants.
Finally, the peptidolytic activities of the cross-linked proteins are compared to that of the non-cross-linked WT. In case of successful 'non-invasive' cross-linking, the catalytic activities of the WT and the cross-linked variants should not differ significantly from each other.

LIMITATIONS
This protocol puts particular emphasis on the production of highly pure, natively folded cross-linked protein. Therefore, multiple quality controls are implemented. We are aware that, in the current form, the presented approach might not be suitable for the cross-linking of proteins that contain structurally and/or functionally relevant cysteines such as cysteine proteases or disulfide-bonded proteins. For example, for proteins containing disulfide bonds the reducing buffers would have to be complemented with an oxidized disulfide reservoir, e.g., 10 mM βME, 1 mM cystine. Such a redox system should prevent formation of artificial cysteine modifications while preserving conformationally stabilized disulfide bonds, but would certainly require optimization.

Potential solution
Maintain reducing conditions (10 mM βME) in the affinity chromatography purification and first size-exclusion chromatography (SEC1) runs (Step 4). Use a lower protein concentration during the cross-linking reaction (0.1-1.0 µM) (Step 5). Start the cross-linking reaction with 1 mM βME in the buffer to ensure that all cysteines are completely reduced at the start of the oxidation reaction (Step 5). Incubate the reaction for 10 days or longer (if the protein is stable at 4 °C) (Step 5).

Problem 2
Low final yield of cross-linked protein (Step 9).

Potential solution
Try to identify the reaction step where you lose most protein and optimize this protocol step. Scale up the expression volume (more than 3 L) and use larger amounts of protein for the oxidation reaction (Step 3). Low yields may indicate that the proposed conformational intermediate (Step 1b) is hardly populated in solution, questioning the initial assumptions.

Problem 3
No evidence of cross-linking from the CPM measurement (Step 9).
Results are given as mean ± standard deviation of independently purified batches.

Potential solution
Test the addition of a 100-fold excess of DTME or of a similar thiol-based cross-linker in the crosslinking reaction to facilitate inter-domain cross-linking (Step 5). Select a different pair of residues to be replaced by the engineered cysteines (Step 1c).
No yield may indicate that the proposed conformational intermediate (Step 1b) is not populated in solution, questioning the initial assumptions.

Problem 4
Significant deviations in the free thiol quantification are observed (Step 9).

Potential solution
Standard curve preparation using molar concentrations of commercial cysteine can produce significant deviations in free thiol measurement. It may be more accurate to obtain the standard curve using a thiol-containing molecule with comparable protein background such as ColG-CU WT (Step 9). Free thiol measurement of cross-linked proteins using this protocol may give a negative result with the test sample. This should be interpreted as 100% cross-linking.

RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Esther Schoenauer (esther.schoenauer@plus.ac.at).

Materials availability
Plasmids, primers, and E. coli strains are available from lead contact upon request.

Data and code availability
This study did not generate any unique datasets or code.

DECLARATION OF INTERESTS
The authors declare no competing interests.