A New Heterobifunctional Cross-linker Based on an “Introverted” Acid: Mass Spectrometric and Bioinformatics Studies, Analysis of Intermolecular Crosslinking of Proteins

A new NHS-aryl azido heterobifunctional cross-linker based on an “introverted” carboxylic acid has been used to bring about successful intermolecular cross-linking. As a ‘proof of concept’, Lysozyme was incubated with the crosslinker, photolysed (366 nm, 6 W UV lamp) and subjected to SDS-PAGE; the ‘dimer’ band was then excised, digested with trypsin and analyzed by ESI-MS and StavroX 3.6.0.1. In previous studies on crosslinking of Lysozyme (SI-I and SIII) using homobifunctional cross-linkers, either no cross-linking was observed or, in the case of BS3, a smaller cross-linker, only two crosslinks were detected. The heterobifunctional cross-linker described here leads to many more crosslinks, which have been identified using mass spectrometry (ESI-MS) and StavroX 3.6.0.1, a bioinformatics software package especially suited to identifying intermolecular crosslinking.


Introduction
A wide variety of both homo- and hetero-bifunctional crosslinkers are now commercially available, yet many laboratories the world over continue to employ conventional crosslinkers like formaldehyde and glutaraldehyde, which lead to indiscriminate and extensive crosslinking. This reflects a lack of awareness that the combined technique of chemical crosslinking, mass spectrometry and bioinformatics is a very useful and emerging tool for studying large-scale protein-protein interactions and 3-D structures of protein complexes, especially dynamic and transient interactions in living cells. Large crosslinkers are more useful for identifying interacting partners, while relatively smaller crosslinkers capture interfaces rather than identities. Homobifunctional crosslinkers have identical displaceable groups on the two ends, e.g., the amine-displaceable N-hydroxysuccinimide (NHS) group. The most popular reagent of this type is BS2G (bis[sulfosuccinimidyl] glutarate), which has been used extensively but is large in size and is restricted to binding mostly with lysine residues. This limitation is overcome by using heterobifunctional crosslinkers, where two different groups are present on the two ends of the crosslinker, one being thermally reactive (e.g., the NHS group) and the other photoreactive (e.g., the azide group). The two-step protocol used with heterobifunctional crosslinkers is not restricted to binding only to lysine residues, which provides greater flexibility in probing interactions, even transient ones. On the other hand, X-ray diffraction and NMR continue to be the gold standards, but both these techniques have their limitations. The former requires a single crystal, while the latter requires solubility in specific solvents and also demands larger amounts of sample. Thus, both these techniques, unlike chemical crosslinking, are not yet suitable for dynamic studies in living cells [1][2][3][4][5][6][7][8][9][10].
Based on the work of Banks et al. on perfluorophenyl azides [11], Hagan Bayley [12] had correctly predicted that these could serve as precursors of efficient photo-affinity labelling agents. Such reagents would involve formation of 'long-lived' transients, leading to an increase in the singlet-triplet nitrene energy gap ("Fluorine effect": presence of ortho-flanking fluorine on either side of the aryl nitrene intermediate required) [13,14]. Involvement of even a "slippery potential energy surface" in such reactions has also been demonstrated [15]. That such reagents would lead to enhanced intermolecular reaction rates was successfully demonstrated by Platz et al., who crosslinked Chymotrypsin [16] using tetrafluoroaryl azide based crosslinkers. Keana et al. [17] and later Liu et al. [18] confirmed this and demonstrated innumerable applications for such reagents, not only in biochemistry, biotechnology and biology but also in materials science and nanotechnology. Recently, fluorinated aryl azide based crosslinkers have been used in crosslinking studies on "engineering aryl azide ligase for site specific mapping of protein-protein interaction through photocrosslinking" [19]. Some firms have lately discontinued supply of a few of the fluorinated crosslinkers, and increasingly more biotinylated reagents containing fluorinated aryl azides are being promoted. It may be noted that fluorination of aromatics is tedious, hazardous and polluting. Therefore, we questioned whether the presence of two ortho-flanking fluorine atoms on either side of an aryl nitrene intermediate is mandatory and whether it is the only way to achieve such enhanced intermolecular crosslinking efficiency.
Over three decades we have prepared many highly substituted aryl azides, which do not require any fluorination, yet parallel the special phenomenon shown by the perfluorophenyl azides described above. Thermolysis of our azides initially yields singlet nitrenes, which do not flip to the triplet state, and both nitrene- and carbene-based products have been isolated from the same reaction. This establishes the involvement of nitrene-carbene conversion ("Crow-Wentrup pathway") [20] and 'long-lived' transients in our reactions, which has also been substantiated by computational studies. Such 'long-lived' nitrenes were called "true nitrenes" [21], which turned out to be a misnomer, as these undergo nitrene-carbene conversion rather than flipping to the triplet state. Using Laser Flash Photolysis (LFP), the life span of the transient in one of our cases has been measured to be 700 picoseconds. The present work is thus part of our continued investigations [22][23][24][25][26][27] on 'Dimethyl-azido-m-hemipinate' and 'azido-m-meconine'. Related to these aryl azides is "Azido-m-hemipinic acid" (4,5-dimethoxybenzene-1,2-dicarboxylic acid; 4,5-dimethoxyphthalic acid) (I), an example of a rare "introverted" [28] acid, used in the current investigation. ["Carboxylic acids are especially difficult to place in a sterically demanding environment; their oxygens are exposed", and therefore the term "introverted" carboxylic acid is used to describe less reactive and sterically hindered carboxylic acids. This requires that "convoluted molecular architectures must be created to bring intramolecular elements near them in space".]
The molecule "azido-m-hemipinic acid" has the necessary "sterically demanding environment" built into its molecular structure, so there is no need to create a "convoluted molecular architecture"; it functions as an "introverted" carboxylic acid, as we have shown recently in its reaction with dicyclohexyl carbodiimide and N-hydroxysuccinimide [29].

Materials and Methods
All reagents and solvents used were from standard sources; only deionized water was used. Chemical cross-linking was carried out overnight in PBS buffer using freshly prepared 1 M lysozyme (14.4 kDa) (SI-I and SI-II) and 200 M crosslinker solution. Two microlitres of Lysozyme solution were mixed with 8 microlitres of the cross-linker in a molar ratio of 1:200, and the volume was made up to 40 microlitres using PBS buffer. These samples were incubated overnight in a dark chamber and then photolyzed under a 6 W TLC visualization UV lamp (366 nm wavelength) for 45 minutes. The mixture thus obtained was boiled for 5 minutes before loading into the wells of the SDS-PAGE gel. All steps before photolysis were performed in completely dark conditions, so as to prevent any unnecessary exposure of the light-sensitive cross-linker.
When the above-mentioned conditions were followed, a 'dimeric' band was observed at around 28 kDa (Figures 1-5), confirming intermolecular cross-linking.

Trypsin digestion
The protocol for trypsin digestion is described in the Supplementary Information SI-SIII.

Desalting of the sample using Zip-Tip (C-18)
Reagents used: sample treatment solution (2.5% formic acid).

MALDI-MS investigations
We initially undertook MALDI-MS and MS/MS investigations, but the data thus obtained could not be used. At that stage the new crosslinker had not yet been characterized and its exact mass had not been determined, which would have hampered further bioinformatics analysis. Midway through our study, we also learnt that the StavroX 3.6.0.1 software requires data only from an ESI-MS instrument. For both these reasons, we had to switch from MALDI-MS studies to ESI-MS investigations.

Sample preparation for ESI-MS
The sample was adjusted to 0.1% formic acid using 2.5% formic acid and the sample volume was made up to 100 µL. The C-18 tip was wetted by aspirating 100 µL of 50% acetonitrile in water and then discarding the solvent. This step was repeated twice, and the tip was then equilibrated by aspirating 100 µL of 0.1% formic acid and discarding the solvent. Up to 100 µL of sample was aspirated into the C-18 tip; for maximum efficiency, the sample was dispensed and aspirated for 3-10 cycles. The tip was rinsed by aspirating 100 µL of 0.1% formic acid-5% acetonitrile and the solvent was discarded. This step was repeated four times, and the sample was then eluted by slowly aspirating 100 µL of 0.1% formic acid-70% acetonitrile and dispensing into a new microcentrifuge tube. The samples were vacuum dried and used for ESI-MS studies.
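The first step above (adjusting the sample to 0.1% formic acid with the 2.5% stock) is a simple C1·V1 = C2·V2 dilution. The following is a minimal sketch of that arithmetic, assuming the sample contains no formic acid before adjustment and that volumes are additive; the function name is hypothetical and not part of the published protocol.

```python
def stock_volume_needed(stock_percent, target_percent, final_volume_ul):
    """Volume of stock (in µL) required so that final_volume_ul has target_percent.

    Uses C1 * V1 = C2 * V2 and assumes the sample itself contributes no acid.
    """
    if not 0 < target_percent <= stock_percent:
        raise ValueError("target must be positive and not exceed the stock concentration")
    return target_percent * final_volume_ul / stock_percent


# 2.5% formic acid stock, 0.1% target, 100 µL final volume (values from the text).
acid_ul = stock_volume_needed(stock_percent=2.5, target_percent=0.1, final_volume_ul=100.0)
print(f"2.5% formic acid to add: {acid_ul:.1f} µL")
```

Under these assumptions, the remainder of the 100 µL is made up of sample and diluent.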

Mass spectrometry
Mass analysis of the trypsin-digested 'dimeric' band was performed on an AB SCIEX Triple TOF 5600 instrument at room temperature. The pulser frequency was adjusted to 14.980 kHz and the Pulse 1 duration was 3 µs for this method. The file was acquired with TDCx8. The acquisition method was 25 min in total, with gradient 800 µl and 3 µl dam; the acquisition duration was 38 minutes, in the Synchronization Mode. The software version used on the ESI-MS instrument was Analyst TF 1.7. The data obtained from this instrument was loaded into the Protein Pilot software and a Mascot search was performed (Figure 5). The database used for the MS/MS ion search was Swissprot.

StavroX 3.6.0.1 software
StavroX 3.6.0.1 was used to identify the intermolecularly crosslinked peptides [30]. This is the latest version of the software and it enables quick and efficient identification of intermolecularly cross-linked peptides. The software calculates the theoretical cross-links and matches them against the precursors in the ESI-MS data stored in the form of an .mgf file. This leads to the identification of hits, which are scored accordingly. For analysis, the original FASTA sequence of our protein was uploaded along with the ESI-MS data. The software provides options to select the desired cross-linker, along with the scope to add new cross-linkers. The cross-linker used by us in this experiment has the chemical composition C35H44N8O14, with a molecular mass of 800.2977 (chemical characterization of the crosslinker is being published separately). No changes were made in the amino acid sequence section. An unspecific digest option was selected, with a minimum peptide length of 1 and a maximum peptide length of 10. The precursor precision was set to 250.0 ppm and the fragment ion precision to 1.0 Da, with a lower mass limit of 200.0 Da and an upper mass limit of 6000.0 Da. The S/N ratio was set to 2. Only 'b' and 'y' ions were selected, with a score cut-off of -1 and a pre-score intensity greater than 10%.
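As a cross-check on the cross-linker entry described above, the monoisotopic mass can be recomputed from the stated composition C35H44N8O14. The sketch below is illustrative only: the function names are hypothetical, the element masses are standard monoisotopic values, and the parser handles only simple formulas without brackets. It also shows how wide a 250 ppm precursor window is at this mass.

```python
import re

# Standard monoisotopic masses (Da) of the elements present in the cross-linker.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C35H44N8O14' (no brackets)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_window(mass, tolerance_ppm):
    """Half-width (Da) of a +/- tolerance_ppm window around the given mass."""
    return mass * tolerance_ppm * 1e-6


crosslinker_mass = monoisotopic_mass("C35H44N8O14")
print(f"monoisotopic mass: {crosslinker_mass:.4f} Da")
print(f"250 ppm window:    +/- {ppm_window(crosslinker_mass, 250.0):.4f} Da")
```

The computed value agrees with the molecular mass of 800.2977 quoted in the text to four decimal places.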

PyMol software
PyMol is a molecular visualization program. With its help, 3D figures of the protein molecules could be generated to visualize the cross-linked sequences identified by the StavroX 3.6.0.1 software.

Results and Discussion
This new crosslinker functions via 'long-lived' transients, ensuring enhanced and more efficient intermolecular crosslinking. Its molecular ion (M+) was observed at m/z 800.2507 and the base peak at m/z 533.1589 (SI-IVA); the latter possibly arises from fragmentation of the molecule (II), which, being a tertiary ester, could undergo ready cleavage. The MS/MS spectrum of the m/z 800 peak showed peaks at m/z 726.416 (loss of 74 amu) and 508.231 (loss of 292 amu), which further loses a mass of 292 amu yielding a fragment with m/z 298.128, which confirms the molecular ion to be 800.2507 (SI-IVB).
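The agreement between the observed molecular ion (m/z 800.2507) and the molecular mass entered in StavroX (800.2977) can be expressed as a ppm error. The sketch below is only a rough check: it compares the two quoted values directly and ignores the electron mass and any charge carrier, and the function name is hypothetical. The 250 ppm figure is the precursor precision used in the StavroX settings and is shown here purely for scale.

```python
def ppm_error(observed, theoretical):
    """Signed mass error of an observed value relative to a theoretical one, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


observed_mz = 800.2507       # molecular ion reported in the text
theoretical_mass = 800.2977  # mass quoted for C35H44N8O14 in the text
precursor_precision_ppm = 250.0

error = ppm_error(observed_mz, theoretical_mass)
print(f"mass error: {error:.1f} ppm")
print("within 250 ppm:", abs(error) <= precursor_precision_ppm)
```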
The new heterobifunctional crosslinker was then incubated with Lysozyme, followed by photolysis at 366 nm, which is mild enough not to damage the protein. SDS-PAGE was then carried out. Figure 1 shows Lysozyme after incubation with the new crosslinker (Lanes 2 and 4) and after incubation-photolysis (Lanes 8 and 9), along with the ladder (Lane 6). The 'dimeric' bands can be observed at around 28 kDa in Lanes 8 and 9. This indicates that photolysis brings about 'dimerization'.
The 'dimeric' bands were excised and trypsin digested to prepare the sample for ESI-MS analysis. ESI-MS was performed as described above and the data, obtained as a .zhr file, was converted to an .mgf file. The ESI-MS chromatogram is shown in Figure 2 (Mascot analysis of the trypsin-digested 'dimeric' band is shown in SI-V).
This data was uploaded into the StavroX 3.6.0.1 software along with the FASTA sequence of Lysozyme, appropriate settings were selected and the process was run. As a result, 126 of 127 spectra were compared to 6836649 theoretical candidates, out of which 7338 possible cross-links were identified within 1 minute and 7 seconds of the run.
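The large number of theoretical candidates is a natural consequence of the unspecific digest setting (peptide length 1 to 10), under which every contiguous stretch of the sequence is a possible peptide. The sketch below only illustrates this combinatorial growth; it is not StavroX's algorithm and does not attempt to reproduce the 6836649 figure, which also depends on reactive sites, modifications and other settings. The function names and the 129-residue chain length are assumptions made for illustration.

```python
def unspecific_peptides(sequence, min_len=1, max_len=10):
    """All contiguous sub-sequences (start index, peptide) within the length limits."""
    n = len(sequence)
    return [
        (start, sequence[start:start + length])
        for length in range(min_len, max_len + 1)
        for start in range(n - length + 1)
    ]


def count_unspecific_peptides(n, min_len=1, max_len=10):
    """Number of such peptides for a chain of n residues, without enumerating them."""
    return sum(max(n - length + 1, 0) for length in range(min_len, max_len + 1))


def count_peptide_pairs(n_peptides):
    """Unordered peptide pairs, including a peptide paired with its own copy."""
    return n_peptides * (n_peptides + 1) // 2


chain_length = 129  # hypothetical chain length, used only for illustration
n_peptides = count_unspecific_peptides(chain_length)
print("peptides:", n_peptides)
print("peptide pairs:", count_peptide_pairs(n_peptides))
```

Each pair can additionally be linked through different residues, which multiplies the candidate count further.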
The ten most significant intermolecularly cross-linked candidate fragments identified by the StavroX 3.6.0.1 software are included in Table 1.
After the analysis was over, a window opened showing a bar chart in which the number of candidates identified in a given score range (number of hits) was plotted against the score. This decoy analysis figure helped to estimate the quality of the scores in our experiment. The blue bars represent the number of candidates from our data set and the red bars represent the number of false positive candidates from a decoy data set, which is obtained from the inverted sequence of the FASTA file. A greater enrichment of real data set candidates in a given score region points towards better crosslinking. The decoy analysis data for the m/z 1753.831 fragment is shown in Figure 3.
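The idea behind the decoy analysis can be sketched in a few lines: a decoy sequence is made by inverting the protein sequence, both data sets are scored, and the ratio of decoy to real hits above a score threshold gives a rough false-positive estimate. This is a generic target-decoy sketch and not StavroX's actual implementation; the function names and all scores below are hypothetical.

```python
def reversed_decoy(sequence):
    """Decoy protein sequence obtained by inverting the original sequence."""
    return sequence[::-1]


def decoy_fdr(target_scores, decoy_scores, threshold):
    """Rough false-positive estimate: decoy hits / real hits at or above the threshold.

    Returns None when no real-data candidate reaches the threshold.
    """
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        return None
    return decoys / targets


# Hypothetical scores, for illustration only.
real_scores = [10, 50, 80, 120]
decoy_scores = [5, 20, 60]

print(reversed_decoy("KVFGR"))
print(decoy_fdr(real_scores, decoy_scores, threshold=50))
```

A score region in which real candidates clearly outnumber decoy candidates corresponds to a low value of this ratio.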
Of the top nine high-scoring candidates, the detailed spectrum for the peak at m/z 1753.831, together with the peptide fragments involved in the intermolecular cross-link, is shown in SI-VI. The spectrum panel shows the MS2 spectrum for the identified peaks. This one example, with its annotation, is shown here as a representative. In the deviation diagram (printed below the spectrum panel, where the deviation of the identified signals is plotted against the m/z values), less deviation in the annotation points towards better results in the crosslinking experiment. (Similar detailed spectra of the other intermolecularly crosslinked fragments have also been obtained via StavroX 3.6.0.1 from our experimental data, but these are not shown here.) These ten most significant fragment ions identified by StavroX 3.6.0.1 thus provide further evidence for successful intermolecular crosslinking by our new heterobifunctional crosslinker. The modified fragment ions identified by StavroX 3.6.0.1 are included in Table 2.
The 'b' and 'y' ions for m/z 1753.831 fragment ion as determined by StavroX 3.6.0.1 are shown in Table 3.
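For readers unfamiliar with the ion nomenclature, the sketch below shows how singly charged 'b' and 'y' ion m/z values are calculated for a plain linear peptide from standard monoisotopic residue masses. It is a generic illustration with a hypothetical peptide and function name; it does not reproduce Table 3, since fragments of a cross-linked species also carry the mass of the linker and of the partner peptide.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_y_ions(peptide):
    """Singly charged b and y ion m/z values (b1..b(n-1), y1..y(n-1)) of a linear peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal fragments
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal fragments
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = b_y_ions("GVFK")  # hypothetical peptide, for illustration only
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```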
To visualize the positioning of the cross-linking sites in 3D, we used the software PyMol. Figure 5 shows the 3D representation of the intermolecular cross-link that has occurred between two Lysozymes using our new heterobifunctional cross-linker.
Crosslinking studies on Lysozyme using homobifunctional crosslinkers were discussed in a seminal and highly cited paper by A. Sinz's group [3]. Cross-linking reactions with sulfo-DST and sulfo-EGS yielded no cross-linking products, while the cross-linking reaction with BS3 gave two cross-linking products. The details of the nature of this cross-linking (mostly intramolecular) are included in Table 4, reproduced from this earlier work. It is clear that cross-linking in their case was indeed very limited.
Our current experiments also led to the identification of intramolecular crosslinks (Table 5). In addition, our experiments identified many intermolecular cross-links not detected by the earlier workers. In comparison, our experiments have led to enhanced intermolecular cross-linking (Table 6).
Our results thus justify the hypothesis originally put forward by Hagan Bayley (loc. cit.) based on perfluorophenyl azides and extended by us to our aryl azides, which do not require ortho-flanking fluorines, as perfluorophenyl azides do. Aryl azides that lead to 'long-lived' transients bring about more efficient intermolecular cross-linking, which is the case with our new heterobifunctional cross-linker. As stated earlier, this observation can have an impact on diverse areas of science.

Conclusions
The use of a new arylazido NHS-heterobifunctional cross-linker based on an "introverted" acid is described in this paper. Even today, many laboratories continue to use conventional crosslinkers, like formaldehyde and glutaraldehyde, which lead to indiscriminate and widespread crosslinking. This is due to a lack of information about the combined technique of chemical crosslinking, mass spectrometry and bioinformatics, which is an emerging technique of much value. A previous literature report using homobifunctional crosslinkers detected either no crosslinking with Lysozyme or just two intramolecular crosslinks. We, on the other hand, report many crosslinks, both intra- and intermolecular, in this paper. Crosslinking of Lysozyme has been done by us as a 'proof of concept'. Our study confirms that the new crosslinker successfully brings about intermolecular crosslinking more efficiently. Use of ESI-MS/MS/MS along with StavroX 3.6.0.1, the bioinformatics software, greatly facilitates the analysis of intermolecular crosslinking of two protein molecules.
Crosslinkers based on perfluorinated aryl azides, with ortho-fluorine atoms on either side of the aryl nitrene intermediate, remain the only other class that leads to enhanced intermolecular crosslinking. We demonstrate here that the presence of ortho-flanking fluorines is not a mandatory requirement for increased intermolecular reaction rates. In our case, the presence of a methoxy and a methoxycarbonyl group on either side of the aryl nitrene intermediate, with their respective electron-donating and electron-withdrawing abilities and their steric effects, placed in a sterically demanding environment, suffices to stabilise the singlet nitrene and bring about similar enhanced intermolecular crosslinking. We have demonstrated that this is achieved without the need for any fluorination, which is very demanding.
The technique of 'chemical crosslinking-mass spectrometry-bioinformatics' has implications in many areas, e.g., for studies on protein-protein interactions, for proteomics/lipidomics and in systems and structural biology. It is expected to help in preparing monoclonal antibody-drug conjugates, which specifically target tumor cells, representing "the pinnacle of such targeting efforts" [31]. Recently, it has been shown that combining cryo-electron microscopy (cryo-EM) with chemical crosslinking will pave the way for highly efficient in vivo studies [32]. The technique also contributes to many areas of materials science. As it evolves, this technique will be increasingly amenable to high-throughput screening (HTS) of patient samples in a routine, rapid and reliable manner [33].