MaXLinker: proteome-wide cross-link identifications with high specificity and sensitivity

Protein-protein interactions play a vital role in nearly all cellular functions. Hence, understanding their interaction patterns and three-dimensional structural conformations can provide crucial insights into various biological processes and the molecular mechanisms underlying many disease phenotypes. Cross-linking mass spectrometry has the unique capability to detect protein-protein interactions at a large scale along with spatial constraints between interaction partners. However, current cross-link search algorithms follow an “MS2-centric” approach and, as a result, suffer from a high rate of mis-identified cross-links (~15%). We address this problem by designing a novel “MS3-centric” approach for cross-link identification and implementing it as a search engine called MaXLinker. MaXLinker significantly outperforms the current state-of-the-art search engine, with an up to 18-fold lower false positive rate. Additionally, MaXLinker yields up to 31% more cross-links, demonstrating its superior sensitivity and specificity. Moreover, we performed proteome-wide cross-linking mass spectrometry using K562 cells. Employing MaXLinker, we unveiled the most comprehensive set of 9,319 unique cross-links at 1% false discovery rate, comprising 8,051 intraprotein and 1,268 interprotein cross-links. Finally, we experimentally validated the quality of a large number of novel interactions identified in our study, providing conclusive evidence for MaXLinker’s robust performance.


INTRODUCTION
In the post-genomic era, one of the main goals of systems biology is to determine the functions of all the proteins of various organisms. In the cell, most proteins function through interacting with other proteins. Therefore, generating interactome network models with high quality and coverage is a necessary step in the process of developing predictive models for protein functions at the scale of the whole cell 1 . Furthermore, structural information for protein-protein interactions can serve as a crucial prerequisite for understanding the mechanism of protein function 2 .
Rapid advancements in the respective fields of cross-linking and mass spectrometry led to the development of a powerful technique known as cross-linking mass spectrometry (XL-MS) 3,4 .
XL-MS has been demonstrated to be an efficient technology to capture distance constraints, thereby providing crucial information to decipher the interaction partners and dynamics of protein-protein interactions 5 . Development of efficient MS-cleavable chemical cross-linkers such as disuccinimidyl sulfoxide (DSSO) 6  Moreover, utilizing sequence information only from MS3 spectra (a subset of CID-MS2-MS3) for cross-link identification ('MS3-Only') resulted in the fewest cross-links among all the approaches. Hence the study concluded CID-MS2-MS3-ETD-MS2 and MS3-Only to be the most and least informative approaches, respectively. However, the study did not assess the quality of the different approaches at the given FDR cut-off using a rigorous comparative analysis.
In this study, we perform a systematic and rigorous quality assessment across different XL-MS acquisition strategies, inspired by approaches widely used in machine learning 1,10 . Based on these analyses, we noted that XlinkX results in a high number of mis-identifications. Therefore, we developed and validated a novel search algorithm named MaXLinker, which is based on an innovative "MS3-centric" approach designed to efficiently eliminate incorrect cross-link candidates. At 1% FDR, MaXLinker has an 18-fold lower rate of mis-identifications than XlinkX. With MaXLinker in hand, we performed a large-scale proteome-wide XL-MS study on K562 cell lysate, yielding the largest XL-MS data set to date. We further validated the cross-links using available three-dimensional structures and through a systematic experimental validation of novel interactions identified in our study.

Current MS2-centric cross-link search algorithms are limited in their sensitivity and specificity
When compared to traditional PSM searches, the identification of CSMs from a proteome-wide study is markedly more complex. This fact motivated us to thoroughly examine the MS2-centric algorithms 9, 11 for processing proteome-wide XL-MS data sets. XlinkX is currently the most widely used MS2-centric software (it is available as a node within Proteome Discoverer 9 ). Thus, we performed a systematic quality comparison of cross-links generated by XlinkX using data from multiple XL-MS acquisition strategies described in Liu et al. 9 . First, we obtained the corresponding raw files for the three fragmentation schemes CID-MS2-ETD-MS2, CID-MS2-MS3 and CID-MS2-MS3-ETD-MS2 (through email request to Dr. Fan Liu). Then we performed a cross-link search using the XlinkX software (implemented in Proteome Discoverer 2.2) at 1% FDR with a concatenated database containing sequences from the E. coli proteome (true search space) and S. cerevisiae (false search space). It is important to note that XlinkX, by default, generates a reversed version of the input database and uses it as a decoy database to estimate FDR. As a next step, we compared the three fragmentation approaches in terms of the number of incorrect unique CSMs (CSMs with at least one peptide from the S. cerevisiae search space, i.e., mis-identifications). The aim of this search is to re-assess the quality of cross-links at 1% FDR, with the expected fraction of incorrect CSMs involving unambiguous peptides from S. cerevisiae being less than 1%.
This analysis clearly indicates that the methodology implemented in XlinkX does not adequately evaluate the quality of the identified CSMs. Therefore, utilizing only the number of identifications for comparative evaluations 9 might not yield accurate conclusions about the capability of different acquisition strategies. As the FDR filtering is typically performed at the redundant CSM level by the conventional cross-link search algorithms (i.e., before the processing step that results in a unique list of CSMs), we repeated the analysis at redundant CSM level and observed results consistent with what was found at the unique CSM level (Supplementary Figure 1).
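The mis-identification measure used in this section can be illustrated with a minimal Python sketch. It assumes each unique CSM has already been reduced to the source organisms of its two peptides; the function name and the input layout are hypothetical and do not come from XlinkX or Proteome Discoverer.

```python
from typing import Iterable, Tuple


def misidentified_fraction(
    csms: Iterable[Tuple[str, str]],
    false_organism: str = "S. cerevisiae",
) -> float:
    """Fraction of CSMs with at least one peptide from the false search space.

    Each CSM is given as (organism_of_peptide_a, organism_of_peptide_b).
    """
    csms = list(csms)
    if not csms:
        return 0.0
    wrong = sum(1 for org_a, org_b in csms if false_organism in (org_a, org_b))
    return wrong / len(csms)


# Toy input (made-up labels, not data from the study).
example = [
    ("E. coli", "E. coli"),
    ("E. coli", "S. cerevisiae"),
    ("E. coli", "E. coli"),
    ("S. cerevisiae", "S. cerevisiae"),
]
print(misidentified_fraction(example))  # 0.5
```

At a nominal 1% FDR, this fraction would be expected to stay below 0.01 if the FDR estimate were well calibrated.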

The most reliable sequence information for cross-linked peptides comes from the MS3-level
We also evaluated the quality of identifications from the CID-MS2-MS3 approach, with sequence information obtained exclusively from MS3 spectra ('MS3-Only') (Fig. 1a). Strikingly, we observed a drastically lower fraction of incorrect CSMs for 'MS3-Only' (3.3%), which is a subset of CID-MS2-MS3 (with 14.8% mis-identifications). This result clearly demonstrates that MS3, the most advanced MS level, provides higher quality sequence information in comparison to the MS2 level. To improve the quality of XlinkX-identified CSMs, XlinkX allows the use of the 'XlinkX score' to further filter the CSMs. As a next step, we filtered the CSMs using five different 'XlinkX score' cutoffs and re-assessed their quality across different approaches. We observed that, overall, increasing the stringency based on 'XlinkX score' significantly reduced the number of incorrect CSMs for all three acquisition approaches (Supplementary Figure 2). However, even after filtering by 'XlinkX score', the trend across the different methods was similar to what was observed before the filtering (Fig. 1a and Supplementary Figure 2), with data from the MS3 level yielding the highest fraction of reliable CSMs.

Precision is a reliable metric for comparative quality assessment for proteome-wide XL-MS data sets
To perform a more comprehensive and rigorous quality evaluation, we next utilized precision to compare the quality across the three acquisition approaches. Precision has been shown to be an effective quality measure in machine learning-based studies for the identification of proteins 12,13 and their interactors 14 . Precision has also been utilized for evaluating the quality of large-scale interaction screens, where it is derived using known interactions (as training set) 1 . However, none of the reported XL-MS studies have adopted it as a quality estimate for their data sets. Here we present precision as a measure to assess the quality of cross-link data sets from proteome-wide XL-MS studies. Precision for XL-MS essentially represents the fraction of identified interprotein cross-links that correspond to known protein-protein interactions (METHODS). Remarkably, precision complemented the result obtained in the above analysis using the additional S. cerevisiae search space (Fig. 1b, Supplementary Figure 2).
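As a rough illustration of the precision measure described above, the following Python sketch computes the fraction of interprotein cross-links whose protein pair appears in a set of known interactions. The function name, the input layout and the protein names are hypothetical, and the exact definition in METHODS may include details not reproduced here.

```python
from typing import Iterable, Tuple


def crosslink_precision(
    interprotein_crosslinks: Iterable[Tuple[str, str]],
    known_interactions: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of interprotein cross-links that map to a known interaction.

    Both inputs are (protein_a, protein_b) pairs; order within a pair is ignored.
    """
    known = {frozenset(pair) for pair in known_interactions}
    crosslinks = list(interprotein_crosslinks)
    if not crosslinks:
        return 0.0
    supported = sum(1 for pair in crosslinks if frozenset(pair) in known)
    return supported / len(crosslinks)


# Toy data with placeholder protein names (not from the study).
known = [("P1", "P2"), ("P3", "P4")]
crosslinks = [("P2", "P1"), ("P1", "P2"), ("P3", "P4"), ("P1", "P9")]
print(crosslink_precision(crosslinks, known))  # 0.75
```

Because many true interactions are not yet catalogued, a value computed this way is better read as a relative score between data sets than as an absolute accuracy.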

Most of the reliable cross-link identifications are contributed by CID-MS2-MS3 methodology
It is important to note that CID-MS2-MS3-ETD-MS2 (a combination of the CID-MS2-MS3 and CID-MS2-ETD-MS2 methodologies) resulted in a higher fraction of mis-identifications when compared to the CID-MS2-MS3 approach (Fig. 1a). Upon closer examination of the quality of CSMs identified by the inherent CID-MS2-MS3 and CID-MS2-ETD-MS2 methodologies, we observed that at 1% FDR, CSMs identified exclusively by CID-MS2-ETD-MS2 contain an almost two-fold higher fraction of mis-identifications in comparison to exclusive identifications by CID-MS2-MS3 (Supplementary Figure 3). We repeated the analysis after filtering the CSMs at different 'XlinkX score' cut-offs. It is interesting to note that, as the cut-off score increases, the number of identifications contributed exclusively by CID-MS2-ETD-MS2 decreases consistently, to as low as 6% when compared to the exclusive identifications by CID-MS2-MS3 (at 'XlinkX score' ≥ 50) (Supplementary Figure 3). These results reveal that, for CID-MS2-MS3-ETD-MS2, at higher quality cut-offs, CID-MS2-ETD-MS2 fails to yield cross-links beyond those already captured by CID-MS2-MS3.
Our observations provide compelling evidence that, among the three widely used approaches, CID-MS2-MS3 results in cross-links with significantly better quality, most of which rely on MS3 spectra for sequence information. However, the high number of incorrect identifications for the CID-MS2-MS3 approach at 1% FDR by XlinkX (the current state-of-the-art search engine) strongly demonstrates the need for an improved search algorithm that can efficiently eliminate false positives while keeping false negatives to a minimum.

MaXLinker: a novel "MS3-centric" approach for cross-link identification
To address the limitations faced by conventional "MS2-centric" algorithms such as XlinkX for reliable cross-link identification from MS2-MS3 fragmentation, we designed a novel "MS3-centric" approach (Fig. 2). XlinkX starts the search at the MS2 level and attempts to identify CSMs exclusively from the MS2 spectrum for cases with no available sequence information from the MS3 level. However, our analyses revealed that such an "MS2-centric" approach could lead to up to 14.8% false identifications (Fig. 1a). In contrast, our approach starts the search from the MS3 level, which our analyses confirmed to be the most informative level for the sequences of cross-linked peptides (Fig. 1). Additionally, our approach fully utilizes the MS2 level to rescue candidate CSMs ('MS2 Rescue node') if one of the two cross-linked peptides could be reliably identified from the MS3 spectra (Fig. 2 Node C). Finally, we require all cross-links to match the precursor mass in MS1 (Fig. 2 Node D) and perform correction for mis-assigned monoisotopic MS1 precursor masses (Fig. 2 Node B). This novel design, where we start with MS3-level information but fully integrate information from both MS2 and MS1 levels, fundamentally enables MaXLinker's rigorous cross-link identification and validation workflow.
The general experimental methodology for the MS2-MS3 strategy involves precursor selection at multiple stages of mass spectrometry. First, ions above a certain threshold charge state (typically ≥ +3 or +4) are selected for fragmentation at the MS2 stage to yield signature ions with a predefined mass difference (Δm = 31.97 Da for DSSO). Further, an iterative search known as 'targeted inclusion' is performed by the mass spectrometer on-the-fly to select ion pairs with the signature Δm, following certain prioritization criteria, to perform fragmentation at the MS3 level, yielding two MS3 spectra per peptide in an ideal scenario. MaXLinker accepts '.mgf' files consisting of different levels of MS spectra exported using Proteome Discoverer (PD), along with PSM annotations from PD, as input (METHODS). MaXLinker initiates the search from the MS3 level by performing the mandatory precursor-based mass validation (Fig. 2 Node 'A'). Initiating the search from MS3, the most informative level in terms of peptide sequence information, provides a key advantage to MaXLinker in eliminating potential false positives. If a set of MS3 spectra representing a potential cross-link passes the precursor-based mass validation step (Fig. 2 Node 'A') (Case 1 in Fig. 2), it is verified through multiple validation filters (Fig. 2 Node 'D'). It is important to note that the typically larger size of cross-linked peptides can often result in mis-assignment of the monoisotopic MS1 precursor mass 15 ; thus, for cases that fail to pass the precursor mass-based filter (Fig. 2 Node 'A'), MaXLinker inspects the corresponding MS1 spectrum to check for mis-assignment of the monoisotopic MS1 precursor mass (Fig. 2 Node 'B'). Such cases are systematically examined and passed on to the next filter if they satisfy the mass validation step with the adjusted precursor mass.
The remaining failed candidates are sent to the 'MS2 Rescue Module' (Fig. 2 Node 'C').
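The signature-pair selection and the precursor mass check with monoisotopic correction (Fig. 2 Nodes 'A' and 'B') might look roughly like the following Python sketch. It is not MaXLinker's implementation: the function names, the 0.02 Da and 10 ppm tolerances, the limit of two isotope shifts and the 1.00335 Da isotope spacing are all assumptions made for illustration, and the caller is assumed to supply the expected precursor mass already computed from the MS3-level masses.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

DELTA_M = 31.97      # signature mass difference for DSSO quoted in the text (Da)
C13_SHIFT = 1.00335  # assumed spacing between isotopic peaks (Da)


def find_signature_pairs(
    masses: Sequence[float], delta: float = DELTA_M, tol: float = 0.02
) -> List[Tuple[float, float]]:
    """Return (lighter, heavier) neutral-mass pairs separated by delta +/- tol."""
    ordered = sorted(masses)
    return [
        (lo, hi)
        for lo, hi in combinations(ordered, 2)
        if abs((hi - lo) - delta) <= tol
    ]


def validate_precursor(
    precursor_mass: float,
    expected_mass: float,
    ppm_tol: float = 10.0,
    max_shift: int = 2,
) -> Optional[int]:
    """Return the isotope shift k for which the precursor matches, else None.

    k = 0 means the reported monoisotopic mass already matches; k > 0 means
    the match is only obtained after subtracting k isotope spacings.
    """
    for k in range(max_shift + 1):
        corrected = precursor_mass - k * C13_SHIFT
        if abs(corrected - expected_mass) / expected_mass * 1e6 <= ppm_tol:
            return k
    return None


# Toy numbers chosen only to exercise the two functions.
pairs = find_signature_pairs([1000.50, 1032.47, 1500.00, 1531.97, 1600.00])
print(pairs)
print(validate_precursor(3001.00335, 3000.0))
```

A candidate for which `validate_precursor` returns `None` would, in the workflow described above, be handed to the MS2 Rescue Module.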
MS2 Rescue Module is another important and unique feature of MaXLinker. As mentioned earlier, this module is triggered if the candidate spectra failed to pass the precursor-based mass validation step ( Fig. 2 Node 'A') and could not be validated through precursor mass re-assignment.
We found that failure to pass these filters often coincided with poor or "uninformative" MS3 spectral data for one of the cross-linked peptides (Case 2 in Fig. 2). In this case, considering a scenario where the mass spectrometer picked an incorrect Δm pair from the MS2 level that carries the signature just by chance, MaXLinker attempts to obtain sequence information for the peptide by utilizing fragment ions from the corresponding MS2 spectrum (Fig. 2 Node C). First, precursor masses for the peptide with poor MS3 spectra are derived using the MS2 precursor mass and the MS3 precursor masses of the "informative" MS3 spectra (accounting for the linker long and short arm modifications) (Supplementary Figure 5). An additional validation search is performed on the ions of the corresponding MS2 spectrum to confirm the presence of the derived MS3 precursor masses.
Subsequently, a PSM search is performed on the deconvoluted MS2 spectrum with the derived masses (both long and short) as the precursor mass. If the search returns at least one reliable PSM, the cross-link candidate (along with sequence information for the 'rescued' peptide) is directed to the general validation pipeline for further evaluation (Fig. 2 Node D). Additionally, the MS2 Rescue Module also accounts for cases where the mass spectrometer selects two pairs with the signature Δm for MS3, but both pairs represent different charge states of one of the two cross-linked peptides (Supplementary Figure 6). Upon completion of the search, a unique list of cross-links is obtained by merging the redundant CSM entries, and a confidence score is assigned to each identification (equation 2 in METHODS). Finally, a target-decoy strategy is employed to establish the FDR.
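The final target-decoy step can be sketched generically in Python as below. This assumes the common estimate FDR = decoys / targets among hits at or above a score cutoff; the exact formula and score used by MaXLinker are defined in METHODS and may differ, and the function name and input layout are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def score_cutoff_at_fdr(
    hits: Iterable[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    hits: (score, is_decoy) pairs. FDR is estimated as decoys / targets
    among all hits scoring at or above the cutoff (an assumed convention).
    Returns None if no cutoff satisfies the threshold.
    """
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy scores (not from the study): one decoy ranked third.
toy_hits = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(score_cutoff_at_fdr(toy_hits, max_fdr=0.25))  # 9.0
```

Identifications scoring at or above the returned cutoff would be reported at the chosen FDR.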

MaXLinker significantly outperforms XlinkX in both specificity and sensitivity
We evaluated the performance of MaXLinker utilizing MS2-MS3 XL-MS raw files for six E. coli fractions from Liu et al. 9 . First, we utilized the strategy employed in Fig. 1a and performed the search using MaXLinker at 1% FDR. We noted that the fraction of mis-identifications was less than 1% (Supplementary Table 1), and for the majority of the identifications (~82%), the peptide sequence information was derived from MS3 spectra (Supplementary Table 2), which agrees with MaXLinker's fundamental algorithmic design. Next, we compared the results with CSMs identified using XlinkX at 1% FDR on the same set of raw files (Fig. 3a). Our analysis showed that MaXLinker clearly outperforms XlinkX, as indicated by the 18-fold lower fraction of mis-identifications (i.e., non-E. coli CSMs). We then examined the overlap between identifications from the two search engines (Fig. 3b). The overlapping fraction from XlinkX has only 0.6% mis-identifications, whereas the non-overlapping CSMs, which were identified exclusively by XlinkX, contained a large fraction (33.1%) of mis-identifications. Further, using precision as a complementary quality metric, we observed similar results (Fig. 3d, 3e). When we repeated the quality analyses by filtering the identifications from XlinkX at different 'XlinkX score' cutoffs, we observed that MaXLinker consistently finds 13-31% more cross-links than XlinkX at comparable quality (Supplementary Figure 4).
Importantly, the CSMs identified exclusively by MaXLinker are of three-fold higher quality than the exclusive identifications by XlinkX, even at the highly stringent cutoff 'XlinkX score' ≥ 50 (Fig. 3c, 3f). All these results demonstrate that MaXLinker outperforms XlinkX for CSM identifications in both specificity and sensitivity. Next, we cross-linked commercially available Bovine Glutamate Dehydrogenase 1 (GLUD1) using DSSO and performed a CID-MS2-HCD-MS3 experiment in our own lab (METHODS). We employed MaXLinker to perform two individual CSM searches: search1, using the Bovine GLUD1 sequence as the search database, yielding 43 unique CSMs, and search2, with a concatenated database containing Bovine GLUD1 and the full proteome of Saccharomyces cerevisiae, yielding 36 unique CSMs. We then examined the overlap between CSMs from search1 and search2 to inspect MaXLinker's ability to find true CSMs from a single protein within a large false search space. Strikingly, we observed that 33 of 36 (92%) CSMs from search2 overlapped with those from search1 (Fig. 4a). Out of the remaining 3 CSMs, 2 were mis-identifications, having one of the peptides in the pair from the S. cerevisiae proteome (false search space). Of note, 10 CSMs were identified exclusively in search1. Upon close examination, we noted that MaXLinker rejected those 10 CSM candidates due to either (i) its stringent validation filters or (ii) lower confidence in their PSM assignments, attributable to the drastic increase in the number of competing candidate peptides for individual spectra. On the other hand, when we performed a similar analysis using XlinkX, search1 and search2 yielded 35 and 140 unique CSMs, respectively. Out of the 140 CSMs from search2, 30 overlapped with search1 and the remaining 110 had at least one of the peptides from the S. cerevisiae proteome (mis-identifications) (Fig. 4b).
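The overlap bookkeeping behind the search1/search2 comparison amounts to set arithmetic, as in the short Python sketch below. The integer identifiers are placeholders standing in for unique CSMs; only the set sizes (43, 36 and an overlap of 33) are taken from the MaXLinker numbers quoted above.

```python
# Placeholder identifiers for unique CSMs (not real peptide pairs).
search1 = set(range(43))                     # 43 unique CSMs, GLUD1-only database
search2 = set(range(33)) | {100, 101, 102}   # 36 unique CSMs, GLUD1 + yeast database

overlap = search1 & search2          # found in both searches
only_search1 = search1 - search2     # lost when the false search space is added
only_search2 = search2 - search1     # new in the concatenated search

print(len(overlap), len(only_search1), len(only_search2))  # 33 10 3
print(round(100 * len(overlap) / len(search2)))            # 92
```

In practice each CSM would need a canonical key (for example, an order-independent peptide pair) before the two result lists can be compared this way.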
We examined the overlap between search2 identifications from MaXLinker and XlinkX and observed that most of the mis-identifications from XlinkX (109 of 110) were not found by MaXLinker (Fig. 4c). Further, we filtered CSMs from XlinkX using 'XlinkX score' ≥ 50 and re-inspected the overlap with MaXLinker's identifications. This filtering step resulted in a drastic elimination of false positives (Fig. 4d). However, all the non-overlapping CSMs from XlinkX were observed to be mis-identifications. On the other hand, MaXLinker identified 12 CSMs (containing 11 true CSMs) that were missed by XlinkX. For further validation of MaXLinker's identifications, we mapped CSMs from search1 onto a three-dimensional structure (Fig. 4e) of Bovine GLUD1. We observed that 15 of the 18 mapped CSMs were within the theoretical distance constraint (30 Å), and the remaining three CSMs were within 38 Å, validating the reliable quality of our identifications. This analysis serves as a revealing case study for MaXLinker's ability to identify cross-links with high sensitivity and specificity.
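The structural check described above reduces to measuring the distance between the two linked residues and comparing it with the 30 Å constraint. The Python sketch below assumes the coordinates of the relevant atoms have already been extracted from a structure file; which atoms are measured and how they are read from the structure are not specified here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def fraction_within(
    residue_pairs: Sequence[Tuple[Point, Point]], max_distance: float = 30.0
) -> float:
    """Fraction of cross-linked residue pairs whose distance is <= max_distance (Å)."""
    if not residue_pairs:
        return 0.0
    within = sum(1 for a, b in residue_pairs if math.dist(a, b) <= max_distance)
    return within / len(residue_pairs)


# Toy coordinates in Å (not taken from any structure).
toy_pairs = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0)),   # 10 Å apart
    ((0.0, 0.0, 0.0), (30.0, 0.0, 0.0)),   # exactly 30 Å apart
    ((0.0, 0.0, 0.0), (0.0, 40.0, 0.0)),   # 40 Å apart
]
print(fraction_within(toy_pairs))
```

With the counts reported above, 15 of 18 mapped CSMs corresponds to a fraction of about 0.83 at the 30 Å cutoff.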

Our proteome-wide K562 XL-MS study unveils the largest single set of cross-links
Having established the MaXLinker software and optimized the experimental pipeline in our lab, we carried out a comprehensive proteome-wide XL-MS study on human K562 cell lysates, using the CID-MS2-HCD-MS3 strategy. Previous proteome-wide XL-MS studies used strong cation exchange chromatography (SCX) for pre-fractionation of cross-linked proteome samples. Here, to capture a more comprehensive set of cross-links, we employed both SCX and hydrophilic interaction chromatography (HILIC) for our proteome-wide XL-MS study. We then employed MaXLinker for cross-link identification. Our study yielded 9,319 unique cross-links (8,051 intraprotein and 1,268 interprotein with 74.2% precision) at 1% FDR (Supplementary Table 3), ~3-fold more cross-links than the latest human proteome XL-MS study 9 . To validate the identified cross-links utilizing available three-dimensional structures, we mapped cross-links from the 26S proteasome, which is a large biological complex, onto its three-dimensional structure (Fig. 5a, 5b). Out of the 100 cross-links mapped to the structure, 90 were within the theoretical constraint, i.e., 30 Å. Additionally, we could validate one cross-link that exceeded 30 Å by utilizing a different structure (Fig. 5c), suggesting potential conformational changes in the corresponding subunits. Six out of the remaining nine cross-links were within 35 Å, and all the others were within 50 Å, demonstrating the high quality of our identifications. Additionally, interprotein cross-links identified at 1% FDR in our study represent 160 unambiguous novel interactions (Fig. 5d and Supplementary Table 4).

Systematic experimental validation of novel interactions from our proteome-wide XL-MS study
Furthermore, in order to validate those novel interactions using an orthogonal experimental methodology, a representative subset of them (49 randomly chosen interactions) was tested individually using a Protein Complementation Assay (PCA). The fraction of PCA-positive interactions among the novel interactions identified in our XL-MS study is statistically indistinguishable (P = 0.325) from that of the positive reference set containing well-established interactions in the literature, but significantly higher (P = 1.8 × 10⁻⁵) than that of a negative reference set containing random protein pairs (Fig. 5e) 16 . This large set of experimental results demonstrates the high quality of the novel cross-links and corresponding interactions identified in our proteome-wide XL-MS study, and further confirms the reliability and accuracy of MaXLinker.

DISCUSSION
Machine learning approaches have been an integral part of conventional mass spectrometry-based methods 13 . Here, we extended their applications for comparative quality assessment among multiple proteome-wide XL-MS data sets. In addition to using a false search space from an unrelated organism, we demonstrated precision as an effective additional metric for comparative quality assessments. It should be noted that, because a large fraction of true protein interactions is yet to be discovered, precision should not be used as an absolute measure for data quality.
Nevertheless, it is an orthogonal and reliable quality metric for comparative assessments of proteome-wide XL-MS studies.
Our systematic analyses revealed, for the first time, the limitations of current quality assessment strategies and the drawbacks of the conventional "MS2-centric" cross-link identification approach, which results in high false positive rates (~15%). Our analyses also revealed that, for the MS2-MS3 strategy, the MS3 level provides sequence information of significantly higher quality than the MS2 level, and that identification of cross-links exclusively from the MS2 level could result in an alarmingly high false positive rate. To address these issues, we designed and implemented a novel "MS3-centric" approach (MaXLinker) (Fig. 2). Conventional "MS2-centric" methods such as XlinkX start the search from the MS2 level and attempt cross-link identifications without any information from the MS3 level, resulting in a high fraction of false positives. In contrast, MaXLinker starts the search from the MS3 level and discards any cross-link candidate without reliable sequence information from the MS3 level for at least one of the two cross-linked peptides. Furthermore, the "MS2-Rescue" module, along with other novel features such as the correction for mis-assigned MS1 monoisotopic mass (Fig. 2), plays a crucial role in MaXLinker's superior sensitivity over the conventional approach, without compromising on specificity. Overall, MaXLinker significantly outperformed XlinkX, with an 18-fold lower false positive rate and up to 31% more identifications.
With MaXLinker in hand, we reported the largest single data set from proteome-wide XL-MS, consisting of 9,319 cross-links at 1% FDR and representing 160 unambiguous novel interactions. Moreover, to our knowledge, this is the first study to perform a large-scale orthogonal experimental validation of novel interactions identified from a proteome-wide XL-MS study.
With the constant technical advancements in XL-MS methodologies, reliable search algorithms such as MaXLinker will play a highly significant role in the success of future crosslinking studies. Moreover, the expanding size of cross-link datasets would allow researchers to investigate interaction networks in many disease phenotypes more thoroughly, thereby enabling us to better understand the underlying molecular mechanisms.

Cell culture and whole cell lysate preparation
The K562 cells (ATCC ® CCL-243™) were purchased from American Type Culture Collection

Processing of DSSO-cross-linked samples for analysis
The DSSO-treated protein samples were processed as previously described 20,21 . Briefly, the cross-linked GDH was denatured in 1% SDS, reduced by DTT, and alkylated with iodoacetamide, followed by precipitation in cold acetone-ethanol solution (acetone:ethanol:acetic acid = 50:49.9:0.1, v/v/v). The precipitates were dissolved in 50 mM Tris-Cl, 150 mM NaCl, 2 M urea, pH 8.0 and digested by Trypsin Gold (Promega) at 37 °C overnight. After digestion, the sample was acidified by 2% trifluoroacetic acid-formic acid solution, desalted through a Sep-Pak C18 cartridge (Waters), and dried using a SpeedVac™ Concentrator (Thermo Fisher Scientific). The sample was then reconstituted in 0.1% trifluoroacetic acid and stored at -80 °C before mass spectrometry analysis. The DSSO-cross-linked human proteome was processed identically, except that TPCK-treated trypsin was used for digestion and the sample was stored after drying.

Fractionation by Strong Cation Exchange (SCX)
The SCX fractionation was performed on a Dionex UltiMate 3000 Series instrument (Thermo
Each fraction was dried and stored at -80 °C for further analysis.

LC-MSn analysis
The HILIC fractions were reconstituted in 0.1% trifluoroacetic acid. The samples were analyzed using an EASY-nLC 1200 system (Thermo Fisher Scientific) equipped with a 125-µm × 25-cm capillary column packed in-house with 3-µm C18 resin (Michrom BioResources) and coupled online to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific). The

Data processing
The raw data files were converted, and the spectra were exported as '.mgf' (MS1 spectra as '.dta') False"). For the "MS3-Only" category, results from "CID-MS2-MS3" were reprocessed with the option "Reprocess: Last Consensus Step" with "Ignore reporter scan: True" in the "Xlinkx Crosslink Grouping" node. This set contained a list of all CSMs (including multiple identifications representing a cross-linked peptide pair). This set of data was used for the comparisons shown in Supplementary Fig. 1. Next, those CSMs were further processed to obtain a list of unique CSMs (in case of multiple CSMs with different cross-link positions, only one of them was retained to avoid potential biases due to over-representation of certain peptide pairs). The resulting set of CSMs was used for the comparisons shown in Fig. 1, Fig. 3a-f, Supplementary Figure 2, Supplementary Figure 3 and Supplementary Figure 4. The same procedure was followed to obtain the unique CSMs for the GLUD1 analysis shown in Fig. 4b, 4c, and 4d.
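The reduction from redundant to unique CSMs can be pictured with the Python sketch below, which keeps one entry per unordered peptide pair. The text does not say which of the redundant CSMs is retained; keeping the highest-scoring one is an assumption made here, and the field names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def unique_csms(csms: Iterable[dict]) -> List[dict]:
    """Keep one CSM per unordered peptide pair.

    Each CSM is a dict with keys 'pep_a', 'pep_b' and 'score'. Link positions
    are ignored, so CSMs differing only in cross-link position collapse into
    one entry. The highest-scoring CSM is kept (an assumed tie-breaking rule).
    """
    best: Dict[Tuple[str, str], dict] = {}
    for csm in csms:
        key = tuple(sorted((csm["pep_a"], csm["pep_b"])))
        if key not in best or csm["score"] > best[key]["score"]:
            best[key] = csm
    return list(best.values())


# Toy CSMs with placeholder sequences and scores.
redundant = [
    {"pep_a": "AKD", "pep_b": "LKE", "score": 10},
    {"pep_a": "LKE", "pep_b": "AKD", "score": 25},
    {"pep_a": "AKD", "pep_b": "GKT", "score": 5},
]
deduplicated = unique_csms(redundant)
print(len(deduplicated))  # 2
```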

Description of MaXLinker
MaXLinker runs in two steps: (i) pre-processing generates a '.MS2_rescue.mgf' file, which is The identified cross-links were annotated as 'interprotein' if the two linked peptides were not derived from a common protein (with the exception of cases where both linked peptides, from a common protein, were identical or one of them was a complete subset of the other and the peptide occurs only once in the protein sequence). Cross-links that did not satisfy the aforementioned criteria were annotated as 'intraprotein'.
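One possible reading of the annotation rule above is sketched in Python below. The interpretation is an assumption: peptides sharing no protein are called interprotein, and peptides from a common protein are also called interprotein when they are identical or nested and the shorter peptide occurs exactly once in every shared protein sequence (so the link must join two copies of that protein). The function name and arguments are hypothetical.

```python
from typing import Dict, Iterable


def annotate_crosslink(
    pep_a: str,
    pep_b: str,
    proteins_a: Iterable[str],
    proteins_b: Iterable[str],
    sequences: Dict[str, str],
) -> str:
    """Return 'interprotein' or 'intraprotein' for a cross-linked peptide pair."""
    shared = set(proteins_a) & set(proteins_b)
    if not shared:
        return "interprotein"
    # Exception: identical or nested peptides that are unique in the protein.
    nested = pep_a in pep_b or pep_b in pep_a
    if nested:
        shorter = pep_a if len(pep_a) <= len(pep_b) else pep_b
        if all(sequences[p].count(shorter) == 1 for p in shared):
            return "interprotein"
    return "intraprotein"


# Toy sequences (placeholders, not real proteins).
seqs = {"P1": "MKTAYIAKQRLLK", "P2": "MSDKLVEK"}
print(annotate_crosslink("TAYIAK", "DKLVEK", ["P1"], ["P2"], seqs))  # interprotein
print(annotate_crosslink("TAYIAK", "QRLLK", ["P1"], ["P1"], seqs))   # intraprotein
print(annotate_crosslink("TAYIAK", "TAYIAK", ["P1"], ["P1"], seqs))  # interprotein
```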

Statistics
Statistical analyses were performed using a two-sided Z test or a one-sided Welch Two Sample t-test, as indicated in the figure captions. Exact P values are provided for all compared groups.
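For readers unfamiliar with the two-sided Z test on proportions, a minimal Python sketch follows. It uses the pooled-variance form of the two-proportion test, which is an assumption, since the exact variant used for the figures is not stated here. The counts in the example are invented and are not the PCA results.

```python
import math
from typing import Tuple


def two_proportion_z_test(pos1: int, n1: int, pos2: int, n2: int) -> Tuple[float, float]:
    """Two-sided Z test for the difference between two proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    p1, p2 = pos1 / n1, pos2 / n2
    pooled = (pos1 + pos2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 30 of 50 positives versus 10 of 50 positives.
z, p = two_proportion_z_test(30, 50, 10, 50)
print(round(z, 2), p)
```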

Data availability
All cross-links are reported in the Supplementary information. Additional data that support the findings of this study are available from the corresponding author upon request.