Anatomy of a crosslinker

Crosslinking mass spectrometry has become a core technology in structural biology and is expanding its reach towards systems biology. Its appeal lies in a rapid workflow, high sensitivity and the ability to provide data on proteins in complex systems, even in whole cells. The technology depends heavily on crosslinking reagents. The anatomy of crosslinkers can be modular, sometimes comprising combinations of functional groups. These groups embody concepts such as reaction selectivity to increase information density, enrichability to improve detection, cleavability to enhance identification and isotope-labelling for quantification. Here, we argue that both concepts and functional groups need more thorough experimental evaluation, so that we can show exactly how and where they are useful when applied to crosslinkers. Crosslinker design should be driven by data, not only by concepts. We focus on two crosslinker concepts with large consequences for the technology, namely the reaction kinetics of reactive groups and enrichment groups.

Addresses
1 Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355, Berlin, Germany
2 Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, UK

Corresponding author: Rappsilber, Juri (Juri.Rappsilber@tu-berlin.de)

Current Opinion in Chemical Biology 2021, 60:39–46

This review comes from a themed issue on Omics
Edited by Nichollas Scott and Laura Edgington-Mitchell
For a complete overview see the Issue and the Editorial

https://doi.org/10.1016/j.cbpa.2020.07.008

1367-5931/© 2020 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Crosslinking mass spectrometry (MS) technology and its applications have been reviewed extensively [1–6], including subdisciplines that provide high-density crosslinking data [7,8] and quantitative crosslinking data [9]. Here, we would like to turn the spotlight back onto the crosslinking reagent, focussing on the multiple aspects of chemical structure and reactivity that these applications depend upon.
Protein crosslinking predates, by 30 years, the electrospray-ionisation MS now routinely used to detect crosslinks. The first identification of a crosslink (via paper chromatography) was reported in 1958 by Zahn et al., following treatment of insulin with 1,5-difluoro-2,4-dinitrobenzene (Figure 1), a derivative of Sanger's reagent [10,11]. Since then, the number of available protein crosslinking reagents has exploded, and the chemicals used specifically for crosslinking-MS have undergone something of an evolution, if not a revolution [12].
The range of available reagents, and of the chemistry and functionality that can be incorporated within them, is so great that reviewing them case by case is impossible (to date, Sigma-Aldrich lists 268 crosslinkers and Thermo Fisher Scientific lists 104; crosslinkers are also not part of standardisation efforts in crosslinking-MS [https://arxiv.org/abs/2007.00383]). Instead, a crosslinking reagent can be dissected, understood and discussed in terms of its constituent parts. Note that 'zero-length' crosslinkers [13,14], reactive oxygen species [15], metal-induced redox reactions [16] and photoactivatable noncanonical amino acids [17] can also lead to crosslinks. However, the largest individual group of crosslinking reagents, and the one most used for crosslinking-MS today, comprises molecules carrying at least two reactive groups capable of conjugation, separated by a group that functions as a spacer or distance restraint (Figure 2). The simplest examples of this type are disuccinimidyl suberate (DSS) and bis(sulfosuccinimidyl) suberate (BS3) (an NHS and a sulfo-NHS ester, respectively), in which the spacer is a simple alkyl chain (Figure 1).
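Because the spacer is what remains attached after both esters react, it fixes the mass shift that must be searched for in the MS data. A minimal sketch of that arithmetic for DSS/BS3, assuming neutral monoisotopic peptide masses as input; the function names and peptide masses are hypothetical, while the added compositions (C8H10O2 for a crosslink, C8H12O3 for a hydrolysed dead end) follow from the (sulfo-)NHS leaving groups departing, which is also why DSS and BS3 give products of identical mass.

```python
# Monoisotopic mass added when both esters of DSS or BS3 react: the
# (sulfo-)NHS leaving groups depart, leaving a C8H10O2 bridge.
DSS_BS3_CROSSLINK = 138.06808  # Da, C8H10O2
# Dead-end (monolink) product: one ester reacted, the other hydrolysed (C8H12O3).
DSS_BS3_MONOLINK = 156.07864  # Da
PROTON = 1.007276  # Da


def crosslinked_mass(peptide_a: float, peptide_b: float) -> float:
    """Neutral monoisotopic mass of two peptides bridged by DSS/BS3."""
    return peptide_a + peptide_b + DSS_BS3_CROSSLINK


def monolinked_mass(peptide: float) -> float:
    """Neutral monoisotopic mass of a peptide carrying a hydrolysed DSS/BS3 dead end."""
    return peptide + DSS_BS3_MONOLINK


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the protonated [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical peptide masses, for illustration only.
    xl = crosslinked_mass(1000.5, 1500.75)
    print(f"crosslink: {xl:.5f} Da, m/z (3+) {mz(xl, 3):.5f}")
    print(f"monolink:  {monolinked_mass(1000.5):.5f} Da")
```

The same pattern extends to any reagent by swapping in the composition left behind by its spacer.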
Despite the large selection of available crosslinkers, DSS [18] and BS3 [19], which predate crosslinking-MS by nearly two decades, have become the go-to reagents [6]. This was also evident in a recent interlaboratory study, in which most participating labs chose DSS/BS3 (35 of 57 data sets) [20]. The choice is understandable considering the huge range of applications and successes using DSS/BS3, including integrative structural biology in purified complexes [21] [9,28].
The simplicity of DSS/BS3, their reaction specificity, ease of use, reaction-product stability and lack of reaction byproducts all contribute to the tremendous success of these reagents. However, overreliance on this specific reactive-group chemistry significantly limits the information density of crosslinking data and may mislead structural analysis, as discussed below. Furthermore, there is significantly more to crosslinker design than protein-reactive groups alone.
In this review, we give a general overview of the array of functional groups found in crosslinking reagents. We propose that crosslinker design has been led by concepts, including reactive-group specificity, cleavability to enhance identification and enrichability to improve detection, and we argue that enthusiasm for concepts currently drives the design of new reagents. However appealing concepts may be, they must be underpinned by data. Ultimately, it is the data that reveal where further improvements are required; after all, data are what crosslinking-MS feeds into structural and systems biology.

Reactive groups
The NHS ester (and its derivatives) is found ubiquitously in crosslinking reagents. Its reaction specificity in proteins is well characterised: under physiological conditions it reacts preferentially with the amines of lysine side chains and protein N-termini, but also with the hydroxyl groups of serine, threonine and tyrosine side chains [29,30]. This confined specificity, predominantly towards pairs of proximal lysine residues, limits the number of distance restraints that can be introduced into proteins; hence data are sparse and of low resolution. Data density can be increased by expanding reactivity to target other amino acid residues.

Figure 2. Crosslinking reagent anatomy. Requisite features are shown in red; optional features are highlighted in grey. Many different reactive groups and spacer scaffolds, and multiple different cleavable, releasable and enrichable groups, are in use. Even more can be imagined or will be developed in the future. The resulting combinatorial complexity has already produced a large number of crosslinkers and allows for an even larger number of future crosslinkers. Reagents are typically only demonstrated in proof-of-principle studies, although some are also used in biological applications. However, the actual benefit of most of these reagents for crosslinking-MS has not been assessed rigorously. MS, mass spectrometry.
Expanding reactivity even further, to all amino acids, has produced the greatest leap in data density so far (Figure 3), via crosslinkers containing photoactivatable reactive groups including diazirine (sulfo-SDA/SDA) [35] and benzophenone (sulfo-SBP/SBP) [36]. Upon activation by ultraviolet irradiation, the resulting reactive radical intermediates react promiscuously and readily with any proximal N–H and C–H bonds. These all-amino-acid-targeting photoactivatable groups are combined with highly amino acid-specific NHS-ester groups into heterobifunctional, semispecific crosslinkers. Protein crosslinking thus becomes a two-stage reaction: first, the NHS-ester groups decorate proteins specifically on nucleophiles (as described above); second, photoactivation completes crosslinking to any part of a protein within crosslinkable distance. As with the reaction products of the homobifunctional NHS-ester crosslinkers BS3/DSS, the beauty of the reaction products of the diazirine in sulfo-SDA lies in their simplicity. The spacer introduced is a short, 4-carbon alkyl chain (3.9 Å spacer length, compared with 11.4 Å for BS3/DSS). The crosslinked products of photoactivatable benzophenone, on the other hand, retain the aromatic groups. Interestingly, there is evidence of complementarity between crosslinked products obtained with diazirine and benzophenone [36], despite both involving the same anchoring residues. The influence of crosslinker spacer composition and conformational preference is being actively studied [37,38].
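A back-of-the-envelope count shows why widening reactivity raises the ceiling on data density. The sketch below, with hypothetical function names and a hypothetical protein of 500 residues containing 50 lysines, counts distinct residue pairs a lysine-to-lysine reagent could link versus a lysine-to-any-residue reagent. These are upper bounds only: distance limits, N-termini and S/T/Y side reactions are deliberately ignored.

```python
from math import comb


def lys_lys_pairs(n_lys: int) -> int:
    """Distinct residue pairs a lysine-to-lysine (homobifunctional NHS-ester) reagent could link."""
    return comb(n_lys, 2)


def lys_any_pairs(n_lys: int, length: int) -> int:
    """Distinct residue pairs with at least one lysine anchor, as for an
    NHS-ester/photoactivatable reagent whose second end can hit any residue."""
    lys_lys = comb(n_lys, 2)
    lys_other = n_lys * (length - n_lys)
    return lys_lys + lys_other


if __name__ == "__main__":
    # Hypothetical protein: 500 residues, 50 of them lysines.
    k, n = 50, 500
    print(lys_lys_pairs(k), lys_any_pairs(k, n))
```

Under these assumptions the semispecific reagent has roughly twenty times as many candidate pairs to sample, before any kinetic or structural filtering.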
Semispecific, heterobifunctional crosslinkers can increase data density (not crosslinks per protein molecule) by orders of magnitude compared with amino acid-specific, homobifunctional crosslinkers. For example, a cross-laboratory study reported 44 crosslinks for monomeric BSA, summed from three liquid chromatography (LC)-MS runs and averaged over the participating labs, whereas sulfo-SDA yielded 311 in a single LC-MS run [20]. Work from our laboratory demonstrated the implications of this for one of the original application fields of crosslinking-MS: resolving detailed tertiary protein structure [8,35]. Indeed, high-density data from semispecific crosslinkers can improve protein-structure predictions [35], and routine implementation is being intensely investigated [7,8,39–42]. This work inspired a number of other studies pursuing the same ambition [41–44]. Further studies continue to demonstrate the benefits of semispecific crosslinkers in protein complexes [45]. The cost of using semispecific crosslinkers is an increased need for high-quality MS and MS/MS data, together with robust database search and false-discovery rate estimation [46–49] pipelines [28,50]. The same is true wherever the scope of crosslinked products is expanded, including proteome-wide analyses [51–53].
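Normalising the quoted BSA numbers to a single LC-MS run makes the comparison like-for-like. The short worked example below uses only the figures given above [20]; since it compares a cross-laboratory average with one result, it is an illustration rather than a controlled benchmark, and the helper name is hypothetical.

```python
def per_run(crosslinks: float, runs: int) -> float:
    """Average number of crosslinks identified per LC-MS run."""
    return crosslinks / runs


# Numbers quoted in the text for monomeric BSA [20].
homobifunctional = per_run(44, 3)  # DSS/BS3, cross-lab average, summed over 3 runs
sulfo_sda = per_run(311, 1)        # sulfo-SDA, single run

fold_raw = 311 / 44
fold_per_run = sulfo_sda / homobifunctional
print(f"{fold_raw:.1f}x raw, {fold_per_run:.1f}x per LC-MS run")
```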
An oft-mentioned concern in crosslinking-MS is the occurrence of artefacts, especially for photocrosslinkers [2,54]. Large numbers of crosslinks per protein molecule supposedly lead to catching and stabilising rare or artificial protein conformations and interactions by 'zippering' (Figure 3). However, even at very high reagent concentrations, homobifunctional NHS-ester crosslinking reagents perturb protein structure only locally, leaving global folding relatively unscathed [55,56]. Moreover, crosslinking with semispecific photoactivatable reagents is typically carried out at lower reagent concentrations than standard homobifunctional NHS-ester crosslinking, with high-density information resulting from the aggregated data of many individual protein molecules [28,35]. Zippering instead appears to be linked to reaction kinetics rather than to the number of crosslinks per molecule.
The reaction kinetics of crosslinker reactive groups is a concept that has recently garnered more attention with regard to protein crosslinking reactions [55,57]. In short, the two key properties of a reactive group are its reaction rate and its half-life. The NHS-ester group is highly reactive with a long half-life and consequently waits for a protein to come close enough for capture. This makes homobifunctional NHS-ester crosslinkers susceptible to zippering. The photoactivated diazirine, however, produces an extraordinarily reactive carbene (reacting on the order of nanoseconds) with an almost nonexistent half-life [58]. This fast, indiscriminate reaction therefore accurately captures protein states and interactions [54,59]. Taking reactivity in the other direction, the clickable sulfur(VI) fluoride exchange (SuFEx) group [60] is much less reactive than the NHS ester, thus requiring longer-lived contacts and therefore not catching short-lived protein states [61]. Considering reaction kinetics, and not only the reaction specificity of reactive groups, suggests a strong future for reactive groups complementary to the NHS ester and underlines the importance of data-driven crosslinker design.
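The contrast between a long-lived and a short-lived reactive group can be sketched with simple first-order decay. The half-lives below are assumed, order-of-magnitude placeholders (one hour for a long-lived ester, one nanosecond for a carbene), not measured constants, and real NHS-ester consumption also depends on hydrolysis, pH and available nucleophiles. The hypothetical `fluctuation_period` stands in for the interval between transient conformational excursions; the point is how many such excursions a reactive group can sample before it is spent.

```python
import math


def fraction_remaining(t: float, half_life: float) -> float:
    """Fraction of reactive groups still unconsumed after time t, assuming first-order decay."""
    return 0.5 ** (t / half_life)


def capture_window(half_life: float, residual: float = 0.01) -> float:
    """Time until only `residual` of the reactive groups remain active."""
    return half_life * math.log2(1.0 / residual)


if __name__ == "__main__":
    # Illustrative, assumed values in seconds, not measured constants.
    half_lives = {"long-lived ester": 3600.0, "carbene": 1e-9}
    fluctuation_period = 1e-3  # hypothetical time between conformational excursions
    for name, t_half in half_lives.items():
        window = capture_window(t_half)
        excursions = window / fluctuation_period
        print(f"{name}: window {window:.3g} s, ~{excursions:.3g} excursions sampled")
```

Under these assumptions the long-lived group integrates over millions of excursions (room for zippering), whereas the carbene records what is present at the moment of activation.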

Spacer functionality
The spacer can act as a scaffold for functionalities that address the low abundance and complexity of crosslinked analytes, two characteristics that challenge analysis. These functionalities include MS-cleavable groups, isotope coding, enrichment handles and related capture and release groups. Modular design approaches to crosslinker synthesis can be used to create reagents combining multiple functional groups [62]. This is also where the range of applicable chemistry greatly expands, as demonstrated by the PIR crosslinkers [63] (Figure 1), azide/alkyne-A-DSBSO [64], CBDPS [65] and Leiker [66], which have been applied successfully, including in proteome-wide analyses [64,67,68].
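Modular design multiplies options quickly. The sketch below enumerates naive combinations from short, deliberately incomplete and hypothetical module lists built from names mentioned in this review: an unordered pair of reactive ends (identical ends giving homobifunctional, different ends heterobifunctional reagents) times the presence or absence of each optional spacer feature. Spacer scaffolds, chemical feasibility and redundant combinations (for example a release group without an enrichment handle) are ignored.

```python
from itertools import combinations_with_replacement, product

# Hypothetical, deliberately incomplete module lists using names from the text.
REACTIVE_GROUPS = ["NHS ester", "sulfo-NHS ester", "diazirine", "benzophenone", "SuFEx"]
OPTIONAL_FEATURES = ["MS-cleavable group", "enrichment handle", "isotope code", "release group"]


def enumerate_designs():
    """Yield (reactive-group pair, optional features) for every naive combination."""
    # Unordered pair of ends: identical = homobifunctional, different = heterobifunctional.
    for ends in combinations_with_replacement(REACTIVE_GROUPS, 2):
        # Each optional feature is either present or absent in the spacer.
        for flags in product((False, True), repeat=len(OPTIONAL_FEATURES)):
            features = tuple(f for f, on in zip(OPTIONAL_FEATURES, flags) if on)
            yield ends, features


if __name__ == "__main__":
    designs = list(enumerate_designs())
    homo = sum(1 for ends, _ in designs if ends[0] == ends[1])
    print(len(designs), "designs,", homo, "homobifunctional")
```

Even these five ends and four features give 15 × 16 = 240 candidates, each of which would, by the argument of this review, need its own data-driven evaluation.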
MS-cleavable groups include multiple classes of bonds that cleave upon collisional activation in the mass spectrometer; they have been incorporated into numerous crosslinkers and reviewed extensively [2,3,69]. MS-cleavability has been proposed as important for proteome-wide analyses [52,70–72]. Other comparable proteome-wide studies, however, have relied simply on DSS/BS3 [22,23,28]. This raises the question of where MS-cleavable groups deliver real gains rather than conceptual benefits. Adding to this, it is still unclear to what extent available reagents realise the concept. MS-cleavable reagents will not cleave with the same efficiency in all crosslinked peptides, an aspect that the original proof-of-concept studies did not investigate and that urgently needs to be quantified. Designing acquisition or search strategies around perfect cleavage behaviour risks excluding a potentially large number of crosslinked peptides that cleave less efficiently than the selected examples. Furthermore, data obtained using noncleavable reagents can be computationally dissected into the contributions of the individual peptides [73]. This computational 'cleavage' blurs the distinction between MS-cleavable and other crosslinkers and increases the need for more rigorous experimental validation.
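The cost of assuming uniform cleavage can be illustrated with a toy filter. The sketch assumes a hypothetical hard threshold on per-peptide cleavage efficiency, standing in for an acquisition or search strategy that only pursues precursors showing strong signature fragments; the efficiencies are made up and do not come from any reagent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrosslinkedPeptide:
    label: str
    cleavage_efficiency: float  # fraction of precursor ions yielding the signature fragments


def passes_filter(peptides, min_efficiency):
    """Keep only crosslinked peptides whose cleavage meets a hard threshold."""
    return [p for p in peptides if p.cleavage_efficiency >= min_efficiency]


if __name__ == "__main__":
    # Made-up efficiencies spanning good and poor cleavers.
    pool = [CrosslinkedPeptide(f"XL{i}", e)
            for i, e in enumerate([0.9, 0.8, 0.6, 0.45, 0.3, 0.15], start=1)]
    for threshold in (0.5, 0.2):
        kept = passes_filter(pool, threshold)
        lost = 1 - len(kept) / len(pool)
        print(f"threshold {threshold}: kept {len(kept)}/{len(pool)}, lost {lost:.0%}")
```

How much is lost depends entirely on the real distribution of cleavage efficiencies, which is exactly the quantity that remains to be measured.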
Enrichable groups have also been used in crosslinkers (sometimes in combination with MS-cleavable groups) for proteome-wide analyses [64,67,68]. The predominant enrichment group used in the spacer of crosslinkers is biotin [62–66,74–80], which exploits the remarkably high affinity of the avidin-biotin complex. Biotinylated crosslinked peptides can be isolated on a solid-supported avidin derivative (including monomeric avidin, streptavidin and neutravidin), the streptavidin-biotin interaction being the strongest noncovalent biological interaction known (Kd ≈ 10⁻¹⁵ M). This extremely high affinity, however, presents a problem when release from the solid support is required [81]. The extent of recovery from streptavidin by organic-solvent elution alone is unclear [74,77,79]. Solid-supported monomeric avidin, with a far lower affinity for biotin (Kd ≈ 10⁻⁷ M), represents a compromise [63,65,75,76,78]. Other attempts to overcome the release problem have included release groups between the biotin and the rest of the crosslinker backbone [62,64,66,80]. These include polyethylene glycol (PEG, which is hydroxylamine-cleavable) [62], pinacol esters (acid-cleavable) [64], azobenzene (cleaved by sodium dithionite reduction) [66], disulphide (reductive cleavage) [80] and photocleavable groups [66]. Click chemistry groups have been proposed both as capture groups for coupling a biotin enrichment group after crosslinking [64,78] and as enrichment groups in their own right [64,82,83], with reversibility achieved via an acid-cleavable acetal [64,83] or a disulphide [82].

Figure 3. Consequences of reactive group chemistry in crosslinkers. (a) Semispecific photocrosslinkers can increase crosslink data density (the protein sequence is represented by a grey circular line; crosslinks are represented as internal lines) by expanding reactivity to all amino acid residues. Photocrosslinkers such as SDA contain lysine (and S/T/Y)-reactive NHS esters on one side (blue circle). The photoactivatable reactive group (red circle) on the other side can react with any amino acid. The reaction specificity of standard homobifunctional NHS-ester-based …

Recently, a different enrichment group, a negatively charged phosphonic acid, was incorporated into a crosslinker reagent [84]. This simple, small and short crosslinker is a refreshing exception to the biotin/click chemistry conundrum and to the trend towards ever more functionality in crosslinkers. Paradoxically, though, its negative charge prevents it from crossing cell membranes for in-cell crosslinking, a major application goal that is in particular need of enrichment.
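To put the two dissociation constants quoted above side by side, they can be converted into standard binding free energies (ΔG° = RT ln Kd, 1 M standard state) and into an equilibrium fraction bound. The sketch assumes 25 °C and a hypothetical 1 µM of free binding sites held in excess; it ignores binding kinetics (off-rates), which in practice govern how quickly peptides can be eluted.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
T = 298.15       # temperature, K (assumed 25 °C)


def binding_free_energy_kj(kd_molar: float) -> float:
    """Standard binding free energy, dG = RT ln(Kd), in kJ/mol (1 M standard state)."""
    return R * T * math.log(kd_molar) / 1000.0


def fraction_bound(kd_molar: float, free_sites_molar: float) -> float:
    """Equilibrium fraction of a biotinylated peptide held on the support,
    assuming free binding sites are in excess and effectively constant."""
    return free_sites_molar / (free_sites_molar + kd_molar)


if __name__ == "__main__":
    sites = 1e-6  # hypothetical free-site concentration, 1 µM
    for name, kd in (("streptavidin", 1e-15), ("monomeric avidin", 1e-7)):
        print(f"{name}: dG = {binding_free_energy_kj(kd):.1f} kJ/mol, "
              f"bound = {fraction_bound(kd, sites):.6f}")
```

The roughly 46 kJ/mol difference between the two interactions is why release from streptavidin needs harsh conditions or a cleavable linker, whereas monomeric avidin lets captured peptides go far more readily.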
The impressive array of chemistry applied to crosslinking reagents in the quest for enrichment shows that synthesis of even elaborate functionality is possible. What is not yet clear is whether increasing functionality translates into improved identification of crosslinked analytes, that is, into greater information content. This could mean more crosslinks (quantity) at maintained confidence (quality), but also different types of crosslinks (kinetics, selectivity). A concern is that increasing and combining functional chemistry into larger crosslinking reagents could end up burdening analysis, owing to increased hydrophobicity, unexpected side reactions, inefficient reactions, complicated fragmentation spectra and other causes of analyte or information loss.

Conclusion
The power and potential of crosslinking-MS have been thoroughly established. Extensive claims of conceptual gains through crosslinker chemistry and the resulting functional groups have been made repeatedly. Crosslinker anatomical concepts, including MS-cleavability and enrichment, have potential, but their benefits must be demonstrated clearly in order to develop better-performing crosslinkers and optimal combinations of functionality.
Functional groups applied in crosslinkers need to be thoroughly tested, characterised and proven specifically in the context of crosslinking-MS. A greater focus on data is urgently needed to enable data-driven crosslinker design. An important element of this is including statistical data in proof-of-concept studies to complement cherry-picked examples, together with open and stable access to the raw data. One way to do this is through synthetic peptide libraries [50].
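Synthetic peptide libraries are useful here because the set of crosslinks that can form is known by design, so identification error can be measured directly rather than only estimated. A minimal sketch, assuming crosslinks are represented as hypothetical pairs of peptide identifiers and that any identified pair the library cannot form counts as false; the function names are illustrative, not those of any published pipeline.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def _normalise(pairs: Iterable[Pair]) -> Set[frozenset]:
    # A crosslink between A and B is the same as one between B and A.
    return {frozenset(p) for p in pairs}


def library_fdr(identified: Iterable[Pair], possible: Iterable[Pair]) -> float:
    """Fraction of identified crosslinks that the library design cannot form."""
    ids = _normalise(identified)
    if not ids:
        return 0.0
    return len(ids - _normalise(possible)) / len(ids)


def library_recall(identified: Iterable[Pair], possible: Iterable[Pair]) -> float:
    """Fraction of formable crosslinks that were actually identified."""
    truth = _normalise(possible)
    return len(_normalise(identified) & truth) / len(truth) if truth else 0.0


if __name__ == "__main__":
    # Hypothetical peptide IDs: only pairs within one library group can form.
    possible = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P5")]
    identified = [("P2", "P1"), ("P3", "P2"), ("P1", "P4")]
    print(library_fdr(identified, possible), library_recall(identified, possible))
```

Reporting both numbers for each new reagent, alongside selected spectra, is one concrete form the statistical data called for above could take.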
The increased appeal of crosslinking-MS means that many scientists now want to apply it as a tool to answer their own structural biology questions. However, they must first navigate concepts when deciding which crosslinker to choose for their application. An ideal crosslinker may already exist, even as crosslinker developers continue to churn out new variants. But without proving concepts through data, developers cannot make concrete recommendations or convince crosslinking-MS users to apply their reagents with confidence. As a consequence, researchers will err on the side of caution, opting for the simplicity of homobifunctional NHS-ester-based crosslinkers such as BS3 and DSS.
Crosslinking reagents are of course essential, but they are only one of the critical pieces of the crosslinking-MS machinery, which also relies on MS, software engineering, data processing and analysis, data visualisation and integration with structural biology. Ultimately, the true test of crosslinker chemistry will be the applications it enables and the discoveries made with it.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.