Quantitative cross-linking/mass spectrometry reveals subtle protein conformational changes

Quantitative cross-linking/mass spectrometry (QCLMS) probes protein structural dynamics in solution by quantitatively comparing the yields of cross-links between different conformational states. We have used QCLMS to understand the final maturation step of the proteasome lid and also to elucidate the structure of complement C3(H2O). Here we benchmark our workflow using a structurally well-described reference system, the human complement protein C3 and its activated cleavage product C3b. We found that small local conformational changes affect the yields of cross-links between residues that are near in space, while larger conformational changes affect the detectability of cross-links. Distinguishing between minor and major changes required robust analysis based on replica analysis and a label-swapping procedure. By providing a workflow, a code of practice and a framework for semi-automated data processing, we lay the foundation for QCLMS as a tool to monitor the domain choreography that drives binary switching in many protein-protein interaction networks.


Introduction
Domain rearrangements of individual proteins act as molecular switches, govern the assembly of complexes and regulate the activity of networks. A typical example of a predominantly conformational change-driven finely tuned protein-protein interaction network is the mammalian complement system of ~40 plasma and cell-surface proteins. This is responsible for clearance of immune complexes from body fluids along with other hazards to health including bacteria and viruses. The study of such networks is hampered by the lack of tools to follow dynamic aspects of protein structures.
Cross-linking combined with mass spectrometry and database searching is a tool that can reveal the topology and other structural details of proteins and their complexes [1][2][3]. It is currently unclear whether dynamic information can also be obtained from such straightforward applications of cross-linking/mass spectrometry (CLMS) analysis. However, dynamic information could come from a comparison of the cross-links obtained from different protein states. We demonstrate and test in detail a strategy to perform such a comparison in a rigorous manner by quantitative CLMS (QCLMS) 4 using isotope-labeled cross-linkers 5-8. In our strategy, distinct isotopomers of a cross-linker are used to cross-link different stable conformers of the same, or very similar, protein molecules. The mass spectrometric signals of cross-linked peptides derived from different conformations can then be distinguished by their masses and thus quantified and related to conformational differences (Figure 1A). Our approach has been successfully applied to study the conformational changes involved in the maturation of the proteasome lid complex 9 and to interrogate the structure of complement protein C3(H2O) 10.
Despite being straightforward in principle, quantifying cross-linked peptides is technically challenging. First, experimental challenges derive from cross-linking itself. Cross-linking is renowned for low reproducibility. Even if this is overcome, the signal intensity of an individual cross-linked peptide does not depend on cross-linking efficiency alone. For example, additional variable chemical modifications of the peptides, such as methionine oxidation and reaction with the hydrolyzed cross-linker, may also depend on protein conformation. Furthermore, the network of cross-links that form within the molecule may influence the efficiency of protease digestion 11. Finally, peak interference or stochastic processes during data acquisition may prevent accurate quantitation. This is a general problem for quantification but it may especially affect cross-linking, because cross-links are generally sub-stoichiometric and thus of low signal intensity in mass spectra. Replicated analysis and label-swaps are important standard procedures of quantitative proteomics. As we show here, they can also be essential for improving the accuracy of quantifying cross-link data. Unfortunately, these experimental procedures were neglected in the earlier QCLMS works 4,12,13; they have, however, been carried out in later QCLMS analyses by us and other groups 9-11,14.
Another major technical challenge for QCLMS is that most available quantitative proteomics software is linked to protein identification workflows, meaning that cross-link analysis is not routinely possible. MaxQuant, which was recently adopted for quantifying cross-links 15, is a first exception. In addition, Walzthoeni et al. developed the software xTract to be used in conjunction with their identification pipeline for automated quantitation of cross-linking data 14. Earlier proof-of-principle work relied on an elementary computational tool, XiQ 4. A later application 13 relied on manual data analysis, and a subsequent protocol conspicuously left out the computational aspects 12. A workflow using the software mMass has been demonstrated using calmodulin (17 kDa) in the presence and absence of Ca2+ 11. It remains to be seen whether that approach scales to larger protein systems. Now that QCLMS data are becoming readily available, the question arises of what the data actually mean in detail, i.e. what detailed structural information can be deduced from quantitative cross-link data.
To explore solutions to the challenges of QCLMS, we developed a workflow that includes replicated analysis and label-swapping and that offers a more generic solution for automating the quantitation process, utilizing the quantitative proteomics software tool Pinpoint (Thermo Scientific). This workflow has been described previously in applications to understand the final maturation step of the proteasome lid 9 and to elucidate the structure of complement C3(H2O) 10.
Here we present the workflow and framework with technical details, together with benchmarking, which has not yet been reported. We applied this workflow to investigate key proteins in the complement system, a central player in human innate immune defenses. As the pivotal activation step of the complement system, C3 convertases excise the small anaphylatoxin domain (ANA) from the complement component C3 (184 kDa), leaving its activated form, C3b (175 kDa) (Figure 1). Both C3 and C3b are stable, and comparison of their crystal structures (C3 16 and C3b 17) (Figure 1) revealed details of the structural rearrangements during this conversion. Using this comparison of C3 and C3b as a model system, we demonstrate and test in technical detail the reliability of our workflow and the usefulness of QCLMS. It proved possible to infer, from our QCLMS data, the conformational changes that accompany cleavage of C3 to form C3b and then to compare these to the differences between the two crystal structures. This both allowed cross-validation of the existing structures and revealed details of the relationship between cross-linking yields and conformational changes. Based on our experiences, we suggest a code of practice for the use of QCLMS in the study of protein dynamics.
Figure 1 (partial caption). The crystal structures of C3 (PDB|2A73) and C3b (PDB|2I07) with the α-chain (in C3)/α'-chain (in C3b) (blue) and the β-chains (grey) highlighted. (C) Chemical structure of cross-linkers BS3 and BS3-d4. (D) SDS-PAGE shows that BS3 (light cross-linker) and BS3-d4 (heavy cross-linker) cross-link C3 and C3b with roughly equivalent overall efficiencies, and that broadly similar sets of cross-linked protein products were obtained. (E) High-resolution fragmentation spectrum of BS3-cross-linked peptides ISLPESLK(cl)R-K(cl)VLLDGVQNLR that reveals a cross-link between Lys 267 and Lys 283. The mass spectrum of the precursor ion is shown (blue) in the inset; also present (red) is the signal of the precursor ion corresponding to the equivalent BS3-d4 cross-linked peptides.

Results
Quantitative cross-linking/mass spectrometry (QCLMS) of C3 and C3b in solution
To assess the ability of QCLMS to reflect conformational changes, we studied the structurally well-characterized differences between C3 (PDB|2A73) and its activated cleavage product C3b (PDB|2I07). We cross-linked both purified proteins in solution using bis[sulfosuccinimidyl] suberate (BS3) and its deuterated analogue BS3-d4 5-7 (Figure 1C) in four distinct protein-cross-linker combinations, generating C3+BS3, C3+BS3-d4, C3b+BS3 and C3b+BS3-d4. BS3 is a homo-bifunctional cross-linker containing an amine-reactive N-hydroxysulfosuccinimide (NHS) ester at each end. It reacts primarily with the ε-amino groups of lysine residues and the amino-terminus of a protein; however, it can also react with hydroxyl groups of serine, tyrosine and threonine residues 18. Native C3 and C3b are each composed of two polypeptide chains (β and α for C3, β and α' for C3b) connected by a disulfide bond. In all four cross-linking reactions, the two polypeptide chains of C3/C3b were efficiently cross-linked by BS3 or BS3-d4, and the resultant two-chain products of cross-linking could be isolated by SDS-PAGE (Figure 1D) and subjected to in-gel trypsin digestion 19. A 1:1 mixture of C3+BS3 and C3b+BS3-d4 digests along with a "label-swapped" replica (C3+BS3-d4 and C3b+BS3 digests) were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) 20. Cross-linked peptides were identified by database searching and then quantified based on their MS signals (Figure 1E).
Wherever C3 and C3b have regions of identical structure, cross-linked peptides should be seen in mass spectra as 1:1 doublet signals (separated by 4 Da). In contrast, where C3 and C3b structures differ, unique cross-links may occur and these will lead to singlets in mass spectra (Figure 1A). Signal interferences, extended isotope envelopes or experimental variations may affect quantification. Experimental variations are addressed by replica analysis. Swapping the use of labels in the replica addresses problems arising from extended isotope envelopes and signal interference. Importantly, the label-swap enabled quantitation of cross-linked peptides with singlet signals by generating signature signal patterns (Figure 1A); it also prevented singlet signals of BS3 cross-linked (light) peptides from being mistakenly quantified as doublets when the heavy and the light signals of cross-linked peptides overlap (Figure 3A). Swapping the use of labels also assists in the identification of cross-links. Singlet signals move by 4 Da, while doublet signals remain as doublets. This behavior of signals corresponding to cross-linked peptides is also clearly distinct from that of peptides that contain no cross-linked amino acid residues (Figure 3A).
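The signature patterns described above can be written down as a small decision rule. The following is a minimal sketch, not the procedure implemented in our workflow: the function name, the `noise` threshold and the category labels are illustrative assumptions. It relies only on the labeling scheme stated in the text (forward: C3 light, C3b heavy; reverse: C3 heavy, C3b light).

```python
def classify_signal(fwd_light, fwd_heavy, rev_light, rev_heavy, noise=0.0):
    """Classify a cross-linked peptide from its label-swap signal pattern.

    Forward replica: C3 + BS3 (light), C3b + BS3-d4 (heavy).
    Reverse replica: C3 + BS3-d4 (heavy), C3b + BS3 (light).
    `noise` is a hypothetical intensity threshold below which a signal
    is treated as absent.
    """
    def present(intensity):
        return intensity > noise

    c3_fwd, c3b_fwd = present(fwd_light), present(fwd_heavy)
    c3_rev, c3b_rev = present(rev_heavy), present(rev_light)

    if c3_fwd and c3_rev and not c3b_fwd and not c3b_rev:
        return "C3-unique"      # singlet that moves from light to heavy
    if c3b_fwd and c3b_rev and not c3_fwd and not c3_rev:
        return "C3b-unique"     # singlet that moves from heavy to light
    if c3_fwd and c3b_fwd and c3_rev and c3b_rev:
        return "doublet"        # present in both proteins, quantify as ratio
    return "inconsistent"       # e.g. a singlet that does not follow the label
```

A singlet that stays at the light mass in both replicas does not follow the swap and is reported as "inconsistent" in this sketch, which mirrors the point that the label-swap helps to separate cross-link signals from other signals.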
In experiment I, we conducted strong cation exchange (SCX)-based fractionation to increase the number of identifiable cross-linked peptides (Figure 4A). Interestingly, only 20% of all unique cross-linked peptide pairs (25/127) eluted in more than a single fraction. Twenty-four of these 25 cross-linked peptides eluted in two fractions in both forward-labeled and reverse-labeled samples (one was only quantified in the reverse-labeled sample), showing good reproducibility of SCX fractionation. For 49 quantified events of these 25 cross-linked peptides that were identified and quantified in two sequential SCX fractions, we observed that the C3b/C3 signal ratios were highly reproducible (100% for C3/C3b unique cross-linked peptides and R2 = 0.95 for cross-linked peptides quantified with ratios, Figure 4B). We conclude that SCX appears to be compatible with QCLMS and is a worthwhile step in the procedure, and that cross-linked peptides can be quantified with high technical reproducibility.
In an ideal case, the ratio of the component signals of a doublet would simply be a function of the relative likelihood of linking a particular residue pair in each of the two different protein conformations. As shown above, good agreement on the signal ratio of identical cross-linked peptides quantified in sequential SCX fractions was observed in both forward-labeled and reverse-labeled samples in experiment I, suggesting high technical reproducibility of our analysis. However, we observed that different cross-linked peptide pairs that contained the same pair of cross-linked residues did not share the same component signal ratios (Figure 3B). For example, within a single experiment, for the six different cross-linked peptides that covered the residue pair Lys1 and Lys646, the C3b/C3 signal ratio varied between 0.3 and 1.9 (Supplemental Table S1). This by far exceeds the variation seen in technical replicates. The different peptide pairs resulted from methionine oxidation and missed cleavages, so the variation may be linked to sample preparation or to the impact of conformation on proteolytic cleavage efficiency. To translate data on multiple peptide pairs containing the same cross-linked residues into a single data point for that residue pair, we took the median of the log2(C3b/C3) ratios of all supporting cross-linked peptide pairs for each residue pair.
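The summarization step can be sketched as follows. This is an illustration under stated assumptions rather than our actual processing (which was carried out in a spreadsheet): the function name and the input layout, a list of (residue pair, C3b/C3 ratio) tuples, are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def residue_pair_log2_ratios(peptide_ratios):
    """Collapse peptide-level C3b/C3 ratios to one value per residue pair.

    peptide_ratios: iterable of (residue_pair, c3b_over_c3_ratio) tuples,
    one tuple per supporting cross-linked peptide pair (hypothetical layout).
    Returns {residue_pair: median of log2(C3b/C3)}.
    """
    grouped = defaultdict(list)
    for pair, ratio in peptide_ratios:
        grouped[pair].append(math.log2(ratio))
    return {pair: median(values) for pair, values in grouped.items()}
```

Taking the median in log space treats a two-fold increase and a two-fold decrease symmetrically and limits the influence of a single aberrant peptide pair.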
Semi-automated quantitation for cross-linked peptides using Pinpoint
A practical challenge for quantifying cross-link data is how to capture efficiently the signal intensities of cross-linked peptides from a large quantity of mass spectrometric raw data. Available quantitative proteomics software does not accommodate cross-linked peptides 4. A previous protocol for QCLMS resorted to manual quantification 12. Manual quantitation relies more on user expertise and subjective criteria; compared with automated analysis, it is less comparable between labs. When dataset size increases, manual quantitation becomes increasingly time consuming and impractical. Furthermore, any repetitive task done by a human is error-prone. To alleviate this, we developed a semi-automated workflow using the quantitative proteomics software package Pinpoint (Thermo Scientific). The resultant workflow utilizes the well-established functionality of Pinpoint to retrieve intensities of both the light and the heavy version of every cross-linked peptide in an automatic manner. Furthermore, the user interface of Pinpoint provides, when necessary, a platform for visualizing and validating the quantitation results of any chosen cross-linked peptides. Hence, improvements to the accuracy of quantitation by introducing knowledge-based expertise can be achieved easily and rapidly.
Pinpoint is normally restricted to work with single peptides. Therefore it was necessary to convert our cross-linked peptide pairs into a "linearized" single-peptide format, following a previous approach developed for database searches 19 (Figure 2A). This allowed generation of a tailored input library of cross-linked peptides ( Figure 2B) based on "General Spectral Library Format for Pinpoint" (Supplemental Material 1). In general, Pinpoint requires a very basic set of information to quantify a cross-linked peptide: amino acid sequence, precursor charge state, retention time, and chemical modifications. A consequence of this minimal requirement of information is that quantification can be performed in Pinpoint independently of the method used for identifying the cross-linked peptides in the first place.
Pinpoint calculates the theoretical m/z of each converted ("linearized") cross-linked peptide and identifies its signal (within 6 ppm error tolerance in our case) in the raw MS data. Although isotope-related shifts in retention time often occur, in most cases the retention times of the BS3 and BS3-d4 cross-linked versions of a cross-linked peptide in LC-MS/MS overlap to some extent. In order to accurately define singlet and doublet signals of cross-linked peptides, we programmed Pinpoint to retrieve intensity information of both light and heavy signals for each cross-linked peptide by including both a BS3 cross-linked and a BS3-d4 cross-linked version of each peptide in the input library.
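The arithmetic behind this matching step is simple and can be illustrated independently of Pinpoint. The sketch below is not Pinpoint's code; the function names are hypothetical and the neutral mass used in the usage note is an invented example. It uses the proton mass and the mass difference between deuterium and hydrogen to compute a theoretical m/z, the 6 ppm acceptance window, and the expected position of the BS3-d4 (heavy) partner.

```python
PROTON_MASS = 1.007276466                       # Da
D_MINUS_H = 2.014101778 - 1.007825032           # Da, deuterium minus hydrogen


def theoretical_mz(neutral_mass, charge):
    """m/z of a peptide of given monoisotopic neutral mass and charge."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_window(target_mz, tol_ppm=6.0):
    """Lower and upper m/z bounds for a symmetric ppm tolerance."""
    delta = target_mz * tol_ppm * 1e-6
    return target_mz - delta, target_mz + delta


def heavy_partner_mz(light_mz, charge, n_deuterium=4):
    """m/z of the BS3-d4 version, given the m/z of the BS3 version."""
    return light_mz + n_deuterium * D_MINUS_H / charge
```

For a hypothetical cross-linked peptide of neutral mass 3000.0 Da observed at charge 3+, `theoretical_mz(3000.0, 3)` is about 1001.0073, the 6 ppm window is roughly ±0.006 m/z wide on each side, and the heavy partner lies about 1.342 m/z higher. The nominal 4 Da spacing therefore shrinks with charge state, which is why overlap between light and heavy isotope envelopes becomes more likely for highly charged cross-linked peptides.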
As discussed above, accurate quantitation of a cross-linked peptide relies on consistent read-out from both replicates of the label-swap analysis. This is especially important for cross-linked peptides that are unique to one of the two conformations. However, cross-linked peptides were not always fragmented and identified in each replica. Such "under-sampling" of signals is common in shotgun proteomics and is likely exacerbated by the generally low abundance of cross-linked peptides. The Pinpoint interface also provides a platform to conduct "Match between runs" 21. The high mass accuracy achieved in the analysis of the high-resolution Orbitrap data and the reproducible LC retention times facilitated this transfer of peptide identities. Thus, all identified cross-linked peptides were quantified in both replica experiments even though they were not necessarily identified in both replicas.
The aggregated intensity of each cross-linked peptide was calculated from the summed area of the three most intense peaks among the first four peaks in the isotope envelope. In cases where a cross-linked peptide was identified with more than one charge state, intensities derived from the different charge states were automatically combined by Pinpoint. Pinpoint did not select correctly the start or end of an elution peak in every instance. Manual curation was therefore still necessary, albeit largely supported by the Pinpoint interface. In addition, cross-linked peptides were discarded for quantitation if they did not have proper signals for the first three peaks of their isotope envelopes.
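The intensity rule stated above can be expressed compactly. This is a minimal sketch of the rule as described, not code from Pinpoint; the function name is hypothetical, and treating a "proper signal" as a peak area greater than zero is our simplifying assumption.

```python
def aggregated_intensity(isotope_peak_areas):
    """Aggregate the isotope envelope of one cross-linked peptide.

    isotope_peak_areas: peak areas ordered from the monoisotopic peak upward.
    Returns the summed area of the three most intense peaks among the first
    four, or None if any of the first three peaks is missing (area <= 0),
    in which case the peptide would be discarded from quantitation.
    """
    first_four = list(isotope_peak_areas[:4])
    if len(first_four) < 3 or any(area <= 0 for area in first_four[:3]):
        return None
    return sum(sorted(first_four, reverse=True)[:3])
```

For example, areas of 100, 150, 120 and 60 give an aggregated intensity of 370, whereas an envelope whose second peak is missing is rejected.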
Consequently, for each cross-linked peptide, Pinpoint provided intensities for both light (BS3) and heavy (BS3-d4) signals. These signal intensities were exported in .csv format. The subsequent processes for generating the final quantitation results (Supplemental Table S1 and Supplemental Table S2) were carried out using Microsoft Excel (2013). This included calculating the C3b/C3 signal ratio for each quantified event of cross-linked peptides; normalizing C3b/C3 signal ratios within each cross-link replica; summarizing C3b/C3 signal ratios for each cross-linked residue pair from the C3b/C3 signal ratios of its supporting cross-linked peptides; and then combining the outcomes of quantitation for the label-swapped replicas and for the two experiments (experiment I and experiment II).
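The first two of these steps can be sketched in a few lines. The sketch is an illustration only: the processing reported here was done in a spreadsheet, the function names are hypothetical, and median centering is an assumed normalization method, since the text does not specify how ratios were normalized within a replica. The label orientation, in contrast, follows directly from the experimental design.

```python
import math
from statistics import median


def log2_c3b_over_c3(light, heavy, labeling):
    """log2(C3b/C3) for a doublet, respecting the label orientation.

    forward: C3 is light (BS3), C3b is heavy (BS3-d4)
    reverse: C3 is heavy (BS3-d4), C3b is light (BS3)
    Both intensities must be positive (doublet signals only).
    """
    if labeling == "forward":
        return math.log2(heavy / light)
    if labeling == "reverse":
        return math.log2(light / heavy)
    raise ValueError("labeling must be 'forward' or 'reverse'")


def median_center(log2_ratios):
    """Assumed normalization: shift a replica so its median log2 ratio is 0."""
    center = median(log2_ratios)
    return [value - center for value in log2_ratios]
```

Median centering rests on the expectation that most cross-links are unaffected by the conformational change, so that a non-zero median mainly reflects unequal mixing of the two digests.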
Cross-linking confirms the C3 to C3b conformational transition in solution
In total, we quantified 104 unique cross-linked residue pairs (cross-links) that could be divided into three groups based on their MS1 signal types: 31 C3-unique cross-links, 24 C3b-unique cross-links and 49 cross-links observed in both proteins. When the "Significance A" test from the standard proteomics data analysis tool Perseus (version 1.4.1.2) 22 was applied to the 49 cross-links observed in both C3 and C3b, based on their log2(C3b/C3) values, three subgroups appeared: 37 showed no significant change (named here mutual), four linked residue pairs were significantly enriched in C3b (p<0.05) and eight linked residue pairs were significantly enriched in C3 (p<0.05) (Figure 5). The conformational differences and similarities between C3 and C3b were revealed by the locations of these 104 quantified cross-links (Figure 5A). The 31 C3-unique cross-links and 24 C3b-unique cross-links highlighted where the structures of C3 and C3b differ; the 37 cross-links that showed no significant change suggest structural features that are unaffected by cleavage of C3 to C3b; the eight C3-enriched and four C3b-enriched cross-links potentially reflected minor conformational changes. These quantified groups and subgroups were not randomly distributed, either in the primary sequences or in the 3D structures of C3 (PDB|2A73) 16 and C3b (PDB|2I07) 17 (Figure 5B, C, D, E). The structural differences and similarities between C3 and C3b as deduced from our QCLMS data are in agreement with the crystal structures of the two proteins.
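For readers unfamiliar with this kind of outlier test, the sketch below illustrates a percentile-based test in the spirit of "Significance A" as described in the MaxQuant literature: robust left and right standard deviations are estimated from the 15.87th, 50th and 84.13th percentiles of the log ratios, and a one-sided p-value is obtained from the complementary error function. This is our reading of the published description, not the Perseus implementation; the function names and the percentile interpolation scheme are assumptions, and values computed this way need not match Perseus output exactly.

```python
import math


def percentile(sorted_values, q):
    """Percentile q (0-100) by linear interpolation between order statistics."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def significance_a(log_ratios):
    """One-sided outlier p-values for each log ratio (illustrative sketch)."""
    ordered = sorted(log_ratios)
    r_minus1, r_0, r_1 = (percentile(ordered, q) for q in (15.87, 50.0, 84.13))
    if r_1 == r_0 or r_0 == r_minus1:
        raise ValueError("distribution too narrow to estimate its width")
    p_values = []
    for r in log_ratios:
        if r >= r_0:
            z = (r - r_0) / (r_1 - r_0)        # distance in right-sided SDs
        else:
            z = (r_0 - r) / (r_0 - r_minus1)   # distance in left-sided SDs
        p_values.append(0.5 * math.erfc(z / math.sqrt(2)))
    return p_values
```

With such a test, a cross-link is called enriched when its p-value falls below the chosen threshold (0.05 in this study), and the sign of its log2(C3b/C3) value indicates in which protein it is enriched.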
Conversion from C3 to C3b is triggered by proteolytic cleavage of the 7-kDa anaphylatoxin (ANA) domain from the N-terminus of the α-chain. It is therefore not surprising that 20 out of the 31 residue pairs that were found to be cross-linked only in C3 involved ANA. Six of these 20 residue pairs included ANA residues as one partner and residues of MG3, MG8 and TED as the other. Thus these C3-exclusive ANA-specific cross-links clearly define a spatial location for ANA in the C3 molecule that is wholly consistent with the crystal structure (Figure 5C). The remaining 84 cross-links (11 unique to C3, 24 unique to C3b, and 49 observed in both proteins) (Figure 5B, C, D, E) report on the extent and nature of rearrangements of the 12 non-ANA domains, namely C345C, TED, CUB, LNK and eight MG domains.

The data confirmed that the spatial arrangement of the domains of the β-chain is conserved following cleavage of the α-chain: only two cross-links unique to either C3 or C3b involved residues in this chain, compared to 33 C3-unique or C3b-unique cross-links for α-chain domains. The cross-links in the β-chain that are conserved between C3 and C3b occur within (13 cross-links) or between (10 cross-links) its seven intact domains (six MG domains and the LNK domain). In contrast, the remaining domains of the α-chain appear to rearrange extensively following ANA excision: a total of 28 of the 33 C3-unique or C3b-unique cross-links in the α-chain are between domains and only five are within domains; by comparison, nine out of the 12 preserved (not unique to C3 or C3b) α-chain cross-links lie within domains. In essence, all but one of the domains remain largely unaltered in structure following activation of C3, despite their arrangement changing. The exception is a set of four C3b-unique cross-links identified within MG8. This suggests that conformational changes occur within MG8, to a greater extent than in the other MG domains, following C3 cleavage.
Inspection of pairs of amino acid residues that were cross-linked exclusively either in C3 or in C3b illuminates three major rearrangements in the α-chain that accompany the conversion of C3 to C3b. These observations, made in solution, strongly agree with the conformational changes inferred from comparing the crystal structures of C3 (PDB|2A73) and C3b (PDB|2I07) (Figure 5C, D; Supplemental Table S2). First, residues within the α'-N-terminal (NT) segment (the region of the C3b α-chain that becomes the new terminus after ANA cleavage) are involved in four C3-unique cross-links and six C3b-unique cross-links. These cross-links captured, in C3, the proximity of the α-NT segment to ANA and MG8. In C3b, the cross-links confirmed re-location of α'-NT from the MG8/MG3 side to the opposite, C345C/MG7, side of the structure 17. Second, migration of the CUB and TED domains is reflected by two subsets of cross-links. Five cross-links from TED to MG8, MG7, ANA and MG2 support the location of TED at "shoulder-height" as observed in the traditional view of the crystal structure of C3 (Figure 5C) 16. In C3b, these cross-links were no longer detectable and had been effectively replaced by five TED-MG1 cross-links and a CUB-MG1 cross-link, locating the TED at the "bottom" of the β-chain key ring structure 17 (Figure 5D), which is again consistent with the crystal structure. Third, distinct sets of cross-links for C3 versus C3b amongst the MG7, MG8, C345C and anchor domains suggest rearrangements of domains in this region. Such a domain rearrangement is entirely consistent with a comparison of the crystal structures.
Interestingly, two pairs of residues, with one partner in TED and the other in MG1, were cross-linked in C3b in solution, yet were far apart in the crystal structure (38.3 and 40.4 Å, compared to a theoretical cross-linking limit of 27.4 Å for the cross-linker used here) (Figure 6A). A change in the juxtaposition of the TED and MG1 domains could explain this but, on the other hand, three further TED-MG1 cross-links support the arrangement seen in the C3b crystal structure. This apparent conflict can be resolved by assuming that the TED domain is mobile with respect to MG1 in solution (Figure 6B). This is consistent with several other indications that TED is mobile in C3b 23-25.

QCLMS may reveal subtle protein conformational changes
Based on a comparison of QCLMS data and crystal structures, here we report some observations that may be more generally relevant to the challenging task of translating differences in yields of crosslinked products into inferred changes in protein conformation. Due to the absence of some residues in the crystal structures of C3 and C3b, not all cross-links can be evaluated against crystallographic evidence. Of the 104 cross-links observed, 71 bridged residues that are present in the crystal structures of C3 and C3b.
We first investigated the structural details of 25 cross-linked residue pairs that are unique to one of the two structures. As expected, six C3-unique and eight C3b-unique cross-links agree only with the structure of the protein they were observed in but not the other when considering the residue pair distance in the crystal structures, offering an explanation for their absence based on distance solely. The remaining 11 C3/C3b-unique residue pairs, however, would be possible in both C3 and C3b when just considering the Euclidean residue pair distance in the crystal structures. To account for their absence from one or other structure, we must invoke steric effects that would prevent formation of a bridge ( Figure 7B, and panels A-E in Supplemental Material 2), changes in accessibility of cross-linkable residues ( Figure 7C, and panels F, G in Supplemental Material 2), or change of side chain orientation leading to increased distance of the reactive groups in solution (panel I in Supplemental Material 2). For one case, a nearby sequence stretch absent from the crystal structure may have interfered with cross-linking (panel H in Supplemental Material 2).
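The distance-based part of this evaluation amounts to comparing the residue pair distance in each crystal structure with the theoretical cross-linking limit. The sketch below is illustrative only: the function names and category labels are hypothetical, and which atoms are used to measure the distance is left to the user. The 27.4 Å limit is the value quoted in this paper for the cross-linker used.

```python
import math

CROSS_LINK_LIMIT = 27.4  # Å, theoretical limit quoted in the text for BS3


def euclidean_distance(coord_a, coord_b):
    """Straight-line distance between two 3D coordinates (in Å)."""
    return math.dist(coord_a, coord_b)


def distance_compatibility(dist_c3, dist_c3b, limit=CROSS_LINK_LIMIT):
    """Report in which crystal structure(s) a residue pair is bridgeable."""
    in_c3, in_c3b = dist_c3 <= limit, dist_c3b <= limit
    if in_c3 and in_c3b:
        return "both"      # distance alone cannot explain a unique cross-link
    if in_c3:
        return "C3 only"
    if in_c3b:
        return "C3b only"
    return "neither"
```

For the C3-unique cross-link K578-K588, whose residue distances are reported in this paper as 12.9 Å in C3 and 12.7 Å in C3b, `distance_compatibility(12.9, 12.7)` returns "both", so a factor other than distance must account for its absence in C3b. A pair at 38.3 Å, as seen for one TED-MG1 pair in the C3b crystal structure, exceeds the limit.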
None of the cross-links that were quantified with ratios (n=46) involved dramatic proximity changes, and the Euclidean distances between cross-linked residues in both crystal structures are within the limits of our cross-linker (Figure 7A). However, C3-enriched and C3b-enriched cross-links co-locate with C3-unique and C3b-unique cross-links by falling into the part of the molecule that experiences rearrangement during the transition from C3 to C3b (Figure 5E). For seven of these ten cross-links, the crystal structures provided clues to explain the significant decrease in the yield of cross-linking in one structure versus the other: an increase in residue distance (Figure 7D), residue side chain orientation becoming less favorable for cross-linking (Supplemental Material 3, panel A), a change in residue flexibility (Supplemental Material 3, panel B), as well as the influence of newly appearing or more abundant co-located cross-links (Figure 7D, Supplemental Material 3, panels C, D). The remaining three cross-links experience differences that cannot be rationalized by the crystal representations of C3 and C3b. However, cross-linking samples proteins in solution, and the two proteins may differ in their in-solution conformational ensembles. In favor of this explanation, these three cross-links were found in the domains that experience rearrangements between C3 and C3b (Supplemental Material 3, panel E).

Discussion
We have established a workflow for QCLMS that we believe is in accordance with good practice in quantitative proteomics. This includes replication and label-swapping. In addition, we place the cross-linked pairs of amino acid residues at the focus of the analysis by gathering and summarizing all the relevant peptide quantitation data. We have further lowered the barrier to entry for researchers wishing to apply this technique by establishing a semi-automated approach; this should also facilitate application of QCLMS to more complex systems involving, for example, multiple proteins. To achieve this, we enabled the standard proteomic quantitation software Pinpoint to work with data for cross-linked peptides by "linearizing" their sequences (Figure 2A). Importantly, this mode of quantitation is independent of the specific algorithm used for identifying cross-linked peptides. Linearizing cross-linked peptides also makes it possible to quantify cross-link data using other software packages such as Skyline 26 in a similar manner to Pinpoint. However, up to now, Skyline (version 3.5) does not allow for grouping cross-linked peptides based on unique cross-linked residue pairs. As a consequence, post-quantitation data processing becomes more elaborate. As is becoming standard for publications with proteomics datasets, we propose that good practice in QCLMS would include open access to both the raw data and the lists of cross-links with associated quantitation. Our data are available via ProteomeXchange 27 with identifier PXD001675 and in the supplemental materials.
Figure 7 (partial caption). [...] However these cross-links were only observed in C3 because, in C3b, the steric access of cross-linkers to both residues is blocked. (C) A dramatic decrease in the surface accessibility of K578 coincides with the absence of a C3-unique cross-link, K578-K588 (blue line), in C3b, even though the proximities between K578 and K588 are nearly identical in the crystal structures of C3 (12.9 Å) and C3b (12.7 Å). (D) Decreased distance between residues K857 (MG7) and K1513 (C345C) may explain why their cross-link was significantly enriched in C3b. As an effect, other links involving K857 (MG7), namely K857 (MG7)-S1501 (C345C) and K857 (MG7)-K1504 (C345C), are seen less in C3b. Residue K1501 is not present in the C3b crystal structure (PDB|2I07); residue 1500 is used instead for display purposes.
Our study demonstrated that QCLMS is able to explore, in solution, the differences and similarities between the arrangements of domains in C3 and C3b. Cross-links that were unique or significantly enriched in one conformation over the other were observed in the parts of the molecules that experience major rearrangements between C3 and C3b. In contrast, cross-links that show no major change in yield between C3 and C3b were detected in the parts of the structure that are preserved from C3 to C3b. The excellent agreement of QCLMS-derived data with the structural transitions suggested by the crystal structures of C3 and C3b provides strong support for our approach. It also suggests some rules that determine how changes in protein conformation influence the yields of cross-linked peptides. Clearly, residue proximity can in many cases act as a simple binary switch for cross-link formation. But even in instances where a bridgeable distance between two cross-linkable residues is largely preserved during a conformational change (here with distance changes as small as 0.1 Å), factors other than distance may impact cross-link formation; these include changes in surface accessibilities or the positions of the two partners relative to other structural features. While these factors could completely prohibit formation of a cross-link between two residues that are within range, complete negation seems to be rare. More commonly, these non-distance factors cause variation in yields of cross-linked products. This is manifested in a linkage that is enriched in one conformer relative to the other. In general, complete loss of a cross-link tends to result from large conformational changes. In contrast, depletion or enrichment of a cross-link correlates with more local conformational changes that do not involve a major distance change between the cross-linked residues. It is therefore essential to distinguish experimentally between the two scenarios. This is possible through the robust quantitation, using replicated analysis with isotopic label-swapping, outlined herein.
In conclusion, QCLMS is emerging as a tool for studying dynamic protein architectures. We have presented a carefully designed and thoroughly evaluated workflow for QCLMS analysis using isotope-labeled cross-linkers. A limitation of the approach is that each conformer to be analyzed must be stable on a time-scale during which it can be resolved from other conformers and then cross-linked. On the other hand, the automation in quantitation achieved in our protocols means that it is now feasible to extend a QCLMS study to the conformational changes that occur in multiple-protein assemblies 9. Thus it should be possible, for example, to follow the conformational preferences of a protein subunit in a series of assemblies as proteins are sequentially incorporated. The quantitation module in our workflow can also be adapted for SILAC-based quantitation or label-free quantitation 28 of cross-linked peptides. We envision that QCLMS will greatly facilitate the investigation of conformational dynamics in solution and help to animate the current, largely crystallography-derived "snapshots" of biological processes.
Quantitative cross-linking samples for mass spectrometric analysis
Native C3 and C3b molecules each contain two disulfide-linked polypeptide chains. The bands corresponding to monomeric (two-polypeptide chain) products of cross-linked C3 and C3b were excised from the SDS-PAGE gel. In-gel reduction and alkylation, and trypsin digestion were conducted following a standard protocol 19. For quantitation, equimolar quantities of the tryptic products from the four cross-linked protein samples were mixed pair-wise, yielding two combinations: C3+BS3 and C3b+BS3-d4 (named here forward-labeled); C3+BS3-d4 and C3b+BS3 (named here reverse-labeled). The two quantitative cross-linking samples used for mass spectrometric analysis in experiment I and experiment II were prepared in two separate batches for both protein cross-linking and sample preparation procedures.

Materials and reagents
Mass spectrometric analysis
Experiment I.
Two quantitative cross-linking samples were analyzed using a previously established workflow for QCLMS analysis 20 . From each quantitative cross-linking sample, a 20-µg (40 µL) aliquot was taken and fractionated using SCX-Stage-Tips 29,30 . In brief, peptide mixtures were loaded on an SCX-Stage-Tip in loading buffer (0.5% v/v acetic acid, 20% v/v acetonitrile, 50 mM ammonium acetate). The bound peptides were eluted into two fractions, with buffers containing 100 mM ammonium acetate and 500 mM ammonium acetate, respectively. These peptide fractions were subsequently desalted using C18-StageTips 31 for mass spectrometric analysis.
The SCX-Stage-Tip fractions were analyzed using a hybrid linear ion trap-Orbitrap mass spectrometer (LTQ-Orbitrap Velos, Thermo Fisher Scientific, Bremen, Germany) that was coupled with reversed-phase chromatography. The analytical column was packed with C18 material (ReproSil-Pur C18-AQ 3 µm; Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) in a spray emitter (75-µm inner diameter, 8-µm opening, 250-mm length; New Objective, Woburn, MA, USA) 33 . Mobile phase A consisted of water and 0.5% v/v acetic acid. Mobile phase B consisted of acetonitrile and 0.5% v/v acetic acid. Peptides were loaded at a flow rate of 0.7 µl/min and eluted at 0.3 µl/min using a linear gradient from 3% to 35% mobile phase B over 130 minutes, followed by a linear increase from 35% to 80% mobile phase B in 5 minutes. The eluted peptides were directly introduced into the mass spectrometer via an electrospray interface. MS data were acquired in the data-dependent mode applying a "high-high" strategy. For each acquisition cycle, the mass spectrum was recorded in the Orbitrap with a resolution of 100,000. The eight most intense ions with a precursor charge state of 3+ or greater were fragmented in the linear ion trap by collision-induced dissociation (CID). The "minimum signal required" parameter was set to 2e4. The fragmentation spectra were then recorded in the Orbitrap at a resolution of 7,500. Dynamic exclusion was enabled with a single repeat count and a 60-second exclusion duration.

Experiment II.
Using a more sensitive mass spectrometer, we carried out LC-MS/MS analysis without enrichment for cross-linked peptides. A 4-µg (8 µL) aliquot of each quantitative sample was desalted using C18-Stage-Tips prior to mass spectrometric analysis. Peptide mixtures were separated on a reversed-phase analytical column of the same type that was described in experiment I. Mobile phase A consisted of water and 0.1% v/v formic acid. Mobile phase B consisted of 80% v/v acetonitrile and 0.1% v/v formic acid.
Peptides were loaded at a flow rate of 0.5 µl/min and eluted at 0.2 µl/min. The separation gradient consisted of a linear increase from 2% mobile phase B to 40% mobile phase B in 169 minutes and a subsequent linear increase to 95% B over 11 minutes. Eluted peptides were directly sprayed into a hybrid quadrupole-Orbitrap mass spectrometer (Q Exactive, Thermo Fisher Scientific, Bremen, Germany). MS data were acquired in the data-dependent mode.
For each acquisition cycle, the MS spectrum was recorded in the Orbitrap at 70,000 resolution. The 10 most intense ions in the MS spectrum, with a precursor charge state of 3+ or greater, were fragmented by Higher Energy Collision Induced Dissociation (HCD). The "dd-MS2 intensity threshold" was set to 4.2e4. The fragmentation spectra were recorded in the Orbitrap at 35,000 resolution. Dynamic exclusion was enabled, with a single repeat count and a 60-second exclusion duration.

Identification of cross-linked peptides
The MS2 peak lists were generated from the raw mass spectrometric data files using MaxQuant version 1.2.2.5 22 with default parameters, except that "Top MS/MS Peaks per 100 Da" was set to 20. The peak lists were searched against C3 and decoy C3 sequences using Xi software (ERI, Edinburgh) for identification of cross-linked peptides. The following parameters were applied for the search: MS accuracy, 6 ppm; MS2 accuracy, 20 ppm; enzyme, trypsin (with full tryptic specificity); allowed number of missed cleavages, four; cross-linker, BS3/BS3-d4 (the reaction specificity for BS3/BS3-d4 was assumed to be for lysine, serine, threonine, tyrosine and protein N-termini); fixed modifications, carbamidomethylation on cysteine; variable modifications, oxidation on methionine and modifications by BS3/BS3-d4 that are hydrolyzed or amidated on one end.
Identified candidates of cross-linked peptides with an estimated 3% FDR were accepted and further validated manually based on annotated MS2 spectra. Lists of identified cross-linked residue pairs were compiled from these cross-linked peptides. For quantitation, we also included additional cross-linked peptides that were identified and quantified in C3 and/or C3b in a separate QCLMS analysis 10 . The identifications of these cross-linked peptides were transferred into the quantitation runs using "match between runs", based on high m/z accuracy and reproducible chromatographic retention time for MS1 signals. In addition, transferred identifications were further verified in the quantitation runs by their MS signal pattern (either doublet signals, or singlet signals with a 4 Da mass shift between paired label-swapped replicas). Identification information for all quantified cross-linked peptides and the annotated best-matched MS2 spectra for quantified cross-linked peptides are provided in Supplemental Table S1 and Supplemental Material 4.
Quantitation of cross-link data using Pinpoint software
Identified cross-linked peptides were quantified based on their MS signals. The quantitative proteomics software tool Pinpoint (version 1.4.0, Thermo Fisher Scientific, San Jose, CA) was used to retrieve intensities of both light and heavy signals for each cross-linked peptide in an automated manner. An input library for cross-linked peptides was constructed according to the "General Spectral Library Format for Pinpoint Comma Separated Values" (Thermo Fisher Scientific) (Supplemental Material 1). In the input library, the sequence of every cross-linked peptide was converted into a linear version with identical mass (Figure 2A), following an idea from Maiolica et al. 13 . Six modifications that were not listed in the Pinpoint modification list were defined in "customized modifications". The three most abundant of the first four signals in the isotope envelope were used for quantitation. The error tolerance for precursor m/z was set to 6 ppm. Signals were only accepted within a retention time window (defined in the spectral library) of ± 10 minutes.
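The signal acceptance criteria described above can be illustrated with a short sketch. This is not Pinpoint's implementation; the function names are hypothetical, and the light/heavy mass difference is assumed to correspond to four deuterium-for-hydrogen substitutions in BS3-d4.

```python
# Illustrative sketch only; not Pinpoint's implementation.
# Assumption: BS3-d4 differs from BS3 by four 2H-for-1H substitutions.
D4_SHIFT_DA = 4 * (2.014102 - 1.007825)  # about 4.0251 Da


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative m/z deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def heavy_mz(light_mz: float, charge: int) -> float:
    """Expected m/z of the BS3-d4 partner of a BS3 cross-linked peptide."""
    return light_mz + D4_SHIFT_DA / charge


def accept_signal(observed_mz: float, observed_rt_min: float,
                  library_mz: float, library_rt_min: float,
                  tol_ppm: float = 6.0, rt_window_min: float = 10.0) -> bool:
    """Accept a signal only within 6 ppm and +/- 10 min of the library entry."""
    within_mass = abs(ppm_error(observed_mz, library_mz)) <= tol_ppm
    within_rt = abs(observed_rt_min - library_rt_min) <= rt_window_min
    return within_mass and within_rt
```

For a 4+ precursor, the heavy partner is thus expected about 1.006 m/z units above the light signal.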
Manual inspection was carried out to ensure the correct isolation of elution peaks. For each cross-linked peptide, elution peak areas of the light and the heavy signals were measured and reported as log2(C3b/C3). "Match between runs" 21 was carried out manually for all cross-linked peptides in the Pinpoint interface, based on high mass accuracy and reproducible LC retention time. Hence, quantitation was conducted for each identified cross-linked peptide in both forward-labeled and reverse-labeled samples, even when it was identified in only one of them. In "experiment I", if a cross-linked peptide was quantified in two SCX fractions, the average of the fold-changes (log2(C3b/C3)) in both fractions was reported. Within each of the four biological (cross-link) replicas (experiment I forward-labeled, experiment I reverse-labeled, experiment II forward-labeled and experiment II reverse-labeled), signal fold-changes of all quantified cross-linked peptides were normalized against their median, in order to correct for systematic error introduced by minor shifts in mixing ratio during sample preparation. C3b/C3 signal ratios of all quantified peptides in the two experiments (four cross-link replicas) are listed in Supplemental Table S1.
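The ratio calculation and median normalization described above can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function names are hypothetical, normalization is assumed to mean subtracting the replica median in log2 space, and only doublet signals (both peak areas greater than zero) are handled.

```python
import math
from statistics import mean, median


def log2_c3b_over_c3(light_area: float, heavy_area: float,
                     forward_labeled: bool) -> float:
    """log2(C3b/C3) for a doublet signal.

    Forward-labeled: C3 carries BS3 (light), C3b carries BS3-d4 (heavy).
    Reverse-labeled: the labels are swapped.
    """
    if forward_labeled:
        c3, c3b = light_area, heavy_area
    else:
        c3, c3b = heavy_area, light_area
    return math.log2(c3b / c3)


def average_fractions(log2_ratios: list) -> float:
    """Average the fold-changes of one peptide seen in several SCX fractions."""
    return mean(log2_ratios)


def normalize_to_median(log2_ratios: dict) -> dict:
    """Centre all peptide log2 ratios of one replica on their median
    (assumed here to be a subtraction in log2 space)."""
    centre = median(log2_ratios.values())
    return {peptide: value - centre for peptide, value in log2_ratios.items()}
```

Centring each replica on its median assumes that most cross-linked peptides do not change between C3 and C3b, so that the median reflects the mixing ratio rather than biology.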
The quantitation data were subsequently summarized at the level of unique residue pairs (cross-links). A cross-link was defined as unique to either C3 or C3b only if all its supporting cross-linked peptides were observed as the corresponding singlet signals. Otherwise, cross-links were regarded as having been observed in both conformations. For a cross-link shared by C3 and C3b, the signal fold-change was defined as the median of all its supporting cross-linked peptides. Only those cross-links that were consistently quantified in both paired replica analyses (with label-swap) were accepted for subsequent structural analysis. Singlet cross-links were further confirmed by a mass shift of 4 Da, resulting from the label-swap. For cross-links observed as a doublet, the average of the signal fold-changes from the label-swapped replicate analyses was reported ( Figure 3C). When a cross-link was quantified in both "experiment I" and "experiment II", its fold-changes in the two analyses were averaged and reported. All quantified cross-links are listed in Supplemental Table S2. Cross-links that were significantly enriched in C3 or C3b were determined using the "Significance A" test from the standard proteomics data analysis tool Perseus (version 1.4.1.2) 22 based on log2(C3b/C3) values. The following parameters were used in Perseus for the test: "Side": both; "Use for truncation": P value; "Threshold value": 0.05.
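The residue-pair summarization rules above can be sketched as follows. This is an illustrative reading of the rules, not the code used in the study: the names are hypothetical, each supporting peptide is represented either by its log2(C3b/C3) value (doublet) or by the label "C3" or "C3b" (singlet), and the handling of mixed or conflicting observations is an assumption.

```python
from statistics import median


def summarize_replica(observations):
    """Summarize one cross-link within one replica.

    Returns "C3" or "C3b" if every supporting peptide is the same singlet,
    the median log2(C3b/C3) if doublets are present, or None otherwise.
    """
    singlets = {o for o in observations if isinstance(o, str)}
    ratios = [o for o in observations if not isinstance(o, str)]
    if ratios:
        # Assumption: with mixed evidence, use the median of the doublets.
        return median(ratios)
    if len(singlets) == 1:
        return singlets.pop()
    return None  # no observation, or conflicting singlets without a ratio


def summarize_cross_link(forward, reverse):
    """Combine the forward- and reverse-labeled replicas of one cross-link.

    Only cross-links quantified consistently in both replicas are kept;
    doublet fold-changes are averaged across the label swap.
    """
    f, r = summarize_replica(forward), summarize_replica(reverse)
    if f is None or r is None:
        return None
    if isinstance(f, str) or isinstance(r, str):
        return f if f == r else None  # unique only if both replicas agree
    return (f + r) / 2
```

For example, `summarize_cross_link([1.0, 2.0, 4.0], [1.0])` yields 1.5: the forward median of 2.0 averaged with the reverse value of 1.0.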

Comparison with crystal structures
To compare cross-linking data with X-ray crystallographic data, cross-links were displayed using PyMol (version 1.2b5) 32 in the crystal structures of C3 (PDB|2A73) and C3b (PDB|2I07).
Cross-links were represented as strokes between the C-α atoms of linked residues. In the case of a residue missing from the crystal structures, the nearest residue in the sequence was used for display purposes. The distance of a cross-linked residue pair in the crystal structures was measured between the C-α atoms. Measured distances of linked residue pairs in crystal structures were compared to a theoretical cross-linking limit, which was calculated as the side-chain lengths of the cross-linked residues plus the spacer length of the cross-linker. An additional 2 Å was added for each residue as an allowance for residue displacement in crystal structures. The following side-chain lengths were used for the calculation: 6.0 Å for lysine, 2.4 Å for threonine, 2.4 Å for serine and 6.5 Å for tyrosine. For example, for a lysine-lysine cross-link, the theoretical cross-linking limit is 27.4 Å. Solvent accessibility of cross-linked residues in the crystal structures was obtained using the 'Protein interfaces, surfaces and assemblies' service PISA at the European Bioinformatics Institute (http://www.ebi.ac.uk/pdbe/prot_int/pistart.html) 33 .
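The theoretical cross-linking limit described above can be reproduced with a short calculation. The sketch below is illustrative; the names are hypothetical, and the spacer length of 11.4 Å is an assumption inferred from the stated lysine-lysine limit of 27.4 Å (6.0 + 6.0 + 2 × 2.0 + spacer).

```python
import math

# Side-chain lengths in angstroms, as given in the text.
SIDE_CHAIN_LENGTH = {"K": 6.0, "T": 2.4, "S": 2.4, "Y": 6.5}
# Assumed BS3 spacer length, inferred from the 27.4 angstrom K-K limit.
SPACER_LENGTH = 11.4
# Allowance per residue for displacement in crystal structures.
DISPLACEMENT_ALLOWANCE = 2.0


def cross_link_limit(residue_a: str, residue_b: str) -> float:
    """Theoretical C-alpha to C-alpha limit for a cross-linked residue pair."""
    return (SIDE_CHAIN_LENGTH[residue_a] + SIDE_CHAIN_LENGTH[residue_b]
            + SPACER_LENGTH + 2 * DISPLACEMENT_ALLOWANCE)


def ca_distance(coord_a, coord_b) -> float:
    """Euclidean distance between two C-alpha coordinates (x, y, z)."""
    return math.dist(coord_a, coord_b)


def within_limit(residue_a: str, coord_a, residue_b: str, coord_b) -> bool:
    """True if the measured C-alpha distance does not exceed the limit."""
    return ca_distance(coord_a, coord_b) <= cross_link_limit(residue_a, residue_b)
```

Under these assumptions a lysine-serine pair would have a limit of 23.8 Å. Protein N-termini are not covered because no side-chain length is given for them in the text.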

Data availability
The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium 27 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD001675.

Author contributions
JR, PB and ZAC designed experiments. ZAC carried out experiments. ZAC, JR and LF analyzed data. JC, JB, LF and ST developed computational tools for data analysis. JR, PB and ZAC wrote the manuscript.

Competing interests
No competing interests were disclosed.

In this manuscript, Rappsilber and co-workers detail a comprehensive and robust chemical crosslinking MS method to sample the conformational dynamics of the complement protein C3 and one of its activated products, C3b. The use of crosslinking MS is a powerful tool for elucidating the solution state dynamics of proteins in their near-native environments. The study features the use of non-deuterated and deuterated crosslinking reagents which are used as molecular reporters of the accessibility distribution of crosslinked residues specific to the C3 and C3b structures. The data is of high quality and presented in an appealing and clear format. The structural transition of C3 to C3b is accompanied by large conformational changes (as seen with the crystal structures of C3 and C3b), and is a suitable and relatable model system to other researchers in structural biology who may deal with systems that exhibit similar dynamics.

Grant information
The authors describe a limitation with chemical crosslinking MS: "Cross-linking is renowned for low reproducibility", which appears a bit of an unclear statement. While labelled residue pairs reported from experimental replicas may not be 100% identical, the strategy should be representative of the state-specific labelling status of the protein, and not dramatically influenced by downstream or other processing procedures.
Although the geometry and accessibility of crosslinked residue pairs are important to consider when evaluating crosslinks, the authors have highlighted a limitation with projection of crosslinks onto crystal structures. Models generated from X-ray crystallography represent a stable solid-state conformation, which is often a poor indicator of the solution-state dynamics. Side chain orientations of crystallised proteins may be influenced by crystal packing and by model bias introduced during refinement of the structure. A quick check of the C3 and C3b crystal structures (PDB ID 2A73 and 2I07) reveals them to be at 3.3 and 4.0 Å resolution, and as such it may be a stretch to use a theoretical crosslinking limit of 27.4 Å to demonstrate subtle conformational changes with a model derived from low resolution X-ray crystallography. It may be more reliable to use a lower-resolution distance limit such as 30 Å and measure from backbone atoms (which are more reliably fitted from electron density data). If considering the solvent accessibility and flexibility of specific residues, it may be appropriate to complement such data with B-factors deposited with the crystal structures, or RMSF values derived from molecular dynamics simulations.
In the Supplementary Figure S2, the crosslinked residue pairs K1346-K1359 in panel (a) have been described as becoming "less favourable for cross-linking" in the C3b model. Could the authors describe what they mean by "less favourable", and perhaps annotate the figure with the measured Euclidean distances for these examples? Without accompanying flexibility data, it is difficult to support and substantiate the probability of a crosslink being formed between rotamers of side chains found in the crystal structure. In panel (b) of the figure, it is difficult to see how the authors have deduced that the "flexibility of cross-linked residues reduce".
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Minor points of revision
No competing interests were disclosed.

Article content
In this manuscript the authors describe an effective approach for quantitative cross-linking/mass spectrometry (QCLMS). The method was previously used to investigate properties of two distinct conformations of complement C3(H2O) protein which arise due to proteolytic cleavage. Here the protocol and data analysis strategies are described in greater detail than in earlier publications, which is of benefit to researchers considering implementing the quantitative aspect in their own cross-linking/MS investigations. An important part of this workflow is the 'label-swap' aspect, whereby the deuterated cross-linker is used first on one sample, and then on the other. Combining this data then negates any effect of the cross-linker label, since it has been used with both samples. The approach appears to be robust, and it is certainly demonstrated on a very suitable protein.

Conclusions and Data
A slight disadvantage of this approach (which also applies to all other quantitative workflows) is that some cross-links of low abundance will not be identified. The non-crosslinked peptides from the digests of C3 and C3b will have the same m/z value, whilst the cross-linked peptides will have a different m/z because of the isotopic XL-reagent. Therefore, the relative signal intensity for the XL peptides in the mixed sample is 50% compared to non-mixed. Whilst this doesn't affect the study in question which only considers the quantifiable cross-linked peptides, it would be useful to mention this limitation and to point out that if the unmixed samples are initially analysed more cross-links could be identified, which may help in further studies. If a given lab uses the D0-D4 x-linker as standard then this point is moot. But that point should be mentioned here.
It would also be interesting to know how the researchers verified that they are mixing a 1:1 ratio of digested peptides from the different samples. On the gel, there is less monomeric protein in the C3b sample, because more multimers are formed. Is this taken into consideration?
Is it possible to show if the 'deuteration' of the cross-linker affects the extent of cross-linking? If the C3-light and C3-heavy are analysed together it could be seen if more cross-links are found with one than with the other. This would support the need for label swaps.
For the SCX 'enrichment' step… it would be interesting to know whether the peptides are enriched with respect to non-cross-linked, or just separated from each other. A comparison of PSMs of linear peptides and cross-linked peptides would be beneficial.
A technical point is that the figures are not in the correct order. Figure 2 is not mentioned until after Figure 4 in the text.
The authors should mention HDX-MS and other MS-based foot-printing approaches (for example FPOP) as competing/complementary technologies. The use of HDX-MS along with chemical crosslinking could be highly informative here. The time scale effects mentioned in the conclusion can be better explored with the use of robotics and/or photo-induced labelling and this may be a good direction for further study.
In general the article requires significant editing; there are many instances where definite articles are missing, words are not pluralised, and words are not spelt correctly. We also think that the article suffers a little from too much opinion regarding the technology for example: Page 1 second paragraph, Cross-linking is renowned for low reproducibility. What does this statement mean? Which part of the complex methodology is not reproducible?
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.

Rappsilber and co-workers provide a detailed description of their quantitative cross-linking/mass spectrometry (QCLMS) workflow as applied to the complement protein C3 and its truncated form C3b. This article complements earlier reports from the group that focused on the application of the method to more extended biological problems. In the present work, the emphasis is on method design and how QCLMS data can give insight into conformational changes within proteins.
QCLMS is an emerging technique in structural proteomics / structural mass spectrometry, and there have been only few applications so far. Therefore, only limited knowledge exists concerning how data emerging from such experiments should be interpreted. As Chen et al. rightly point out, abundance changes of cross-linked peptides can be caused by different types of changes in the protein structure besides a major conformational rearrangement, and the C3 system actually provides several different examples that point to relatively small changes in the immediate vicinity of cross-linked residues (steric clashes, changes in solvent accessibility etc.). Therefore, the results presented here add valuable insight into what QCLMS can tell us (or not tell us) about protein structures, and these results can be extended to other studies.
In addition, the article provides a more detailed description of the methods employed by the Rappsilber group in a concise format, including experimental design (label-swap experiment) and data analysis strategy. All results are properly documented and the raw mass spectrometry data have been deposited in the PRIDE repository.
I have one comment regarding a somewhat vague statement in the introduction: "Cross-linking is renowned for low reproducibility." This may be an unconscious feeling that some of us working in this area have. Is there any reference that the authors can cite that supports this statement? Indeed, in view of the results presented in this work, the reproducibility of QCLMS, or CLMS in general, does not appear to be such a major concern, but rather seems to be affected by parameters not directly connected to the cross-linking procedure, such as undersampling in the mass spectrometer.
Two minor technical points: Figures 3 and 4 are referred to ahead of Figure 2, so the authors should consider renumbering the figures.
Starting on page 7, the authors refer to individual domains of the C3 protein. It would be helpful to add a respective illustration in one of the figures for reference.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.