Quantitative cross-linking/mass spectrometry using isotope-labelled cross-linkers☆

Dynamic proteins and multi-protein complexes govern most biological processes. Cross-linking/mass spectrometry (CLMS) is increasingly successful in providing residue-resolution data on static proteinaceous structures. Here we investigate the technical feasibility of recording dynamic processes using isotope-labelling for quantitation. We cross-linked human serum albumin (HSA) with the readily available cross-linker BS3-d0/4 in different heavy/light ratios. We found two limitations. First, isotope labelling reduced the number of identified cross-links. This is in line with similar findings when identifying proteins. Second, standard quantitative proteomics software was not suitable for work with cross-linking. To ameliorate this we wrote a basic open source application, XiQ. Using XiQ we could establish that quantitative CLMS was technically feasible. Biological significance Cross-linking/mass spectrometry (CLMS) has become a powerful tool for providing residue-resolution data on static proteinaceous structures. Adding quantitation to CLMS will extend its ability of recording dynamic processes. Here we introduce a cross-linking specific quantitation strategy by using isotope labelled cross-linkers. Using a model system, we demonstrate the principle and feasibility of quantifying cross-linking data and discuss challenges one may encounter while doing so. We then provide a basic open source application, XiQ, to carry out automated quantitation of CLMS data. Our work lays the foundations of studying the molecular details of biological processes at greater ease than this could be done so far. This article is part of a Special Issue entitled: New Horizons and Applications for Proteomics [EuPA 2012].


Introduction
"τά ðντα ìέναι τε πάντα καì μένειν οúδέν"all entities move and nothing remains still [1]. Dynamic aspects of proteins play a pivotal role in many if not all biological processes. Unfortunately, the analysis of protein dynamics remains a technological challenge, as does the analysis of protein structures.
Cross-linking/mass spectrometry (CLMS) is finally emerging after many years of method development as a highly successful tool in the structural analysis of proteins and multi-protein complexes [2,3]. Adding quantitative measurements to CLMS would allow expanding this success to the analysis of protein dynamics, such as conformational changes and protein-protein interaction dynamics.
CLMS currently involves chemical or light-induced crosslinking to covalently fix proximities in proteins or multi-protein complexes [4][5][6]. Following proteolytic cleavage of the proteins, the cross-linked peptides are identified by mass spectrometry and database searches. An early analysis of a multi-protein complex, the Ndc80 complex (180 kDa, four sub-units) [7], guided crystallisation trials to success [8]. The approach was benchmarked and found to be highly accurate by investigating the 0.5 MDa RNA polymerase II complex (12 sub-units) and provided reciprocal footprints of the transcription factor TFIIF and Pol II [9]. Integration of multiple data sources including cross-linking has also been used to describe phage packaging motor incorporation [10] and has recently led to a model of the proteasome [11], where classical structure determination tools alone have failed for more than a decade. Some of these studies relied on isotope-labelled cross-linkers to enhance the identification success of cross-linked peptides [3,11] while others utilised the high resolution of modern mass spectrometers [7,9].
Adding quantitation to structural analyses by CLMS is an attractive next step that should benefit from the wellestablished tools of quantitative proteomics. Proteins and their modifications are quantified using isotope labelling [12][13][14][15] or label-free approaches. Signal intensities of peptides as measured by mass spectrometry are proportional to peptide concentration and are used routinely for quantitation. Indeed, CLMS has been used to qualitatively reflect conformational changes of single proteins [2,16] or multiple binding sites [9] through observation of conformation-specific cross-links and also in a label free approach for the quantitation of protein-protein interactions [17]. Isotope labelling for quantitation has been explored for cross-link analysis of conformation changes [18]. However, it has yet to be implemented properly for cross-linked peptides. Software tools will likely be of importance, given the key role MaxQuant software [19] has played in making SILAC a success in quantitative proteomics. Isotope-labelled cross-linkers were introduced some time ago [20,21], and are used extensively to help identify crosslinked peptides, as any doublet resulting from the simultaneous use of light/heavy labelled cross-linker pair indicates a crosslinker-containing peptide [2,7,[22][23][24]. Identification confidence of cross-linked peptides can be increased e.g.: by pre-filtering MS2 spectra for those precursors that were observed as doublets before the search and thus reducing the noise of database searches [7]; by using the mass shifts observed at MS2 level during the search, when both precursors of a doublet were selected for fragmentation [24]; or by adding confidence after the search by checking if identified cross-links were indeed observed as doublets [2]. Unfortunately, using isotope labelled cross-linkers for identification conflicts with their use for quantitation. Selecting both labelled precursors for fragmentation, a need for reliable identification, becomes less likely as ratios deviate from 1:1, which happens when structural changes are being traced. Here, high-resolution mass measurements of cross-linked peptide masses and their fragments may offer a solution. This label-free approach has proven highly successful in CLMS for the purpose of identifying cross-links thus leaving isotope labels for quantitative purposes [9].
The advantage of quantifying by use of cross-linker over other chemical labels, such as isobaric labels (iTRAQ [25] or TMT [26]), is the added identification confidence in knowing, a priori, that one looks at a cross-linker containing species. Furthermore, the additional step of adding e.g. iTRAQ or TMT to introduce the label is avoided. Labelled cross-linkers have the same advantage over SILAC [15] as all chemical labelling schemes of not using labelled proteins. Labelling the proteins for cross-linking during synthesis [27] excludes certain biological materials from analysis, such as human serum. Isotope-labelled cross-linkers might therefore be the most general and practical way of introducing isotopes to cross-linked peptides for quantitation. Obviously, this only holds true as long as a given cross-linker is available in isotope-labelled form.
We set out to test isotope-labelled cross-linkers for quantitation. As a model system, we cross-linked human serum albumin and quantified cross-links at different mixing ratios of heavy and light cross-linker. Quantitation was done either manually or by exploiting MaxQuant. As we observed limitations with either method, we developed an application, XiQ, to prove the technical feasibility of the entire approach by combining the accuracy of manual quantitation with the speed of automated quantitation. This also created a reference data set that may be used to test other established quantitation software. The XiQ application and the mass spectrometric raw data used here are available from http://xiq.rappsilberlab.org. . For the purpose of quantitation, BS3-d0 and BS3-d4 were mixed with three molar ratios, 1:1, 1:2 and 1:4. The ratio of BS3-d4:HSA was 4:1 (by mass) in all three mixing ratios. Three replicas were prepared for each ratio. The cross-linking reaction was incubated at room temperature (~23°C) for 1 hour, and quenched by addition of ammonium bicarbonate and incubation for 30 minutes at room temperature. Cross-linked protein samples were isolated on SDS-PAGE gel, and in-gel digested using trypsin following a standard protocol [7]. After digestion, peptide solutions were desalted using self-made C18-StageTips [28], following the published protocol [28] for subsequent mass spectrometric analysis.

Mass spectrometry
We used as analytical column a spray emitter (75-μm inner diameter, 8-μm opening, 250-mm length; New Objectives) that was packed with C18 material (ReproSil-Pur C18-AQ 3 μm; Dr Maisch GmbH, Ammerbuch-Entringen, Germany) by help of an an air pressure pump (Proxeon Biosystems) [29]. Mobile phase A consisted of water and 0.1% formic acid. Mobile phase B consisted of acetonitrile and 0.1% formic acid. Peptides were loaded onto the column with 1% B at 700 nl/min flow rate and eluted at 300 nl/min flow rate with a gradient: 1 minute linear increase from 1% B to 9% B; linear increase to 35% B in 169 minutes; 5 minute increase to 85% B. The eluted peptides were directly sprayed into an LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher Scientific). Mass spectrometric analyses were carried out using a "high-high" acquisition strategy [7,9]. The survey scan (MS) spectra were recorded in the Orbitrap at 100,000 resolution. In each acquisition cycle, the eight most intense signals in the survey scan were isolated with an m/z window of 2 Th and fragmented with collision-induced dissociation (CID) in the ion trap. 1+ and 2+ ions were excluded from fragmentation. Fragmentation (MS2) spectra were acquired in the Orbitrap at 7500 resolution. Dynamic exclusion was enabled with 90 seconds exclusion time and repeat count equal to 1.

Identification of cross-links
The raw mass spectrometric data files were processed into peak lists using MaxQuant version 1.2.2.5 [19] with default parameters, except "Top MS/MS Peaks per 100 Da" was set to 200. The peak lists were searched against the sequences of HSA using Xi software (ERI, Edinburgh) for identification of cross-linked peptides. Search parameters were as follows: MS accuracy, 6 ppm; MS/MS accuracy, 20 ppm; enzyme, trypsin; specificity, fully tryptic; allowed number of missed cleavages, four; cross-linker, BS3-d0/d4; fixed modifications, carbamidomethylation on cysteine; variable modifications, oxidation on methionine. The linkage specificity for BS3 was assumed to be at lysine, serine, threonine, tyrosine and protein N-termini. Identified candidates of cross-linked peptides were validated by Xi, and only auto-validated cross-linked peptides were used for subsequent quantitation. Distribution of these cross-links was visualised in the crystal structure of HSA (PDB| 1AO6) [30] using PyMOL [31]. Distances between alpha-carbons (C-α distances) of cross-linked residues were measured and compared to the maximum cross-linker length, which allowed for further validation of these identified cross-links [9].

Manual quantitation of cross-links
Five cross-linked peptide pairs were quantified manually for each mixing ratio. The cross-links were selected as being amongst those with highest identification confidence in all three replicas. For each cross-linked peptide, the summed intensities of the first three isotope peaks in the isotope cluster of heavy signals and light signals were used to calculate the signal ratio of BS3-d0 cross-linked (light) to BS3-d4 cross-linked (heavy) peptides. We excluded the monoisotopic peak because the relatively small mass difference of 4 Da used here can lead to an overlap of isotope clusters for light and heavy peptides. Peak intensity was defined as the peak area for each isotope peak, and was derived from raw data using Peak detection in Xcalibur (version 2.1.0, Thermo Scientific). Visual inspection ensured that any overlap between light and heavy signals was minimal and that no other signals interfered with the quantitation.

MaxQuant quantitation of cross-links
MaxQuant currently does not support quantitation of crosslinked peptides or indeed of any third-party identifications. Therefore we had to rely on first doing a quantitation that considers doublet signals detected by MaxQuant and then matching these to our results, a process that omits the re-quantitation routine of MaxQuant for identified peptides. We analysed our raw-files using MaxQuant (version 1.2.2.5) allowing for 5 missed cleavages, using a single protein fasta file (HSA), no contaminants, disabled "I = L" and "Filter labelled amino acids," and multiplicity was set to two. We selected "Lys4" as the heavy label, which corresponds to the mass shift introduced by the labelled cross-linker used in our study. After MaxQuant finished, we concentrated on the allPeptides.txt file. This contains all found "features" in the raw-files and the assigned ratios. These were then mapped back to the identified cross-linked peptides. The ratio for each cross-link site was taken as the median of all supporting ratios for that site.

XiQ quantitation of cross-links
The Xi quantitation application (XiQ) was implemented in C++, following and automating the manual quantitation procedure. It is used in on-going projects in our lab while we are trying to convince the makers of other freely available and more powerful quantitation packages, e.g. MaxQuant [19] and Skyline [32] to make the necessary amendments to their tools that allow cross-link data to be used. XiQ uses the MSFileReader library to access raw-files (Thermo Fisher Scientific). XiQ starts by reading in a tab-delimited file containing the list of identifications (ID list) and opens all provided raw-files. The ID list specifies, for each peptide spectrum match (PSM), the raw-file name, the ms2 scan number, the precursor m/z, the precursor charge state, the label mass, label count, and quantitation window. For each PSM, XiQ then selects the originating raw-file and in a first step reads out the actual elution time from the scan header of the ms2 event, selected based on the scan number. Additionally, it calculates the expected m/z value of the mono-isotopic, first, second and third isotope-peak of both signals of the doublet. For the labelled partner the originally defined window is extended in one direction, as deuterium labelling often results in a shift in retention time. If the time window was set originally for the non-labelled peptide it is extended towards earlier elution times and if it was set originally for the deuterium-labelled peptide it is extended to later elution times. The extracted ion current (XIC) is then read out for each isotope peak within the predefined/ calculated time window. In each profile it will search for the maximum intensity. From this point XiQ increases the area of the elution peak along the time access until the intensity drops to a value of 10% of the maximum intensity. The peak areas of the first, second and third isotope peak are then summed up for each label-partner and the ratio is defined as heavy area divided by light area. The mono isotopic peak is ignored, as explained above. The calculated ratio is appended to the respective row in the ID list. Site-specific ratios are given by the median of all ratios supporting a given cross-link site. XiQ application is available open source at http://xiq.rappsilberlab.org.

Results and discussion
3.1.

HSA analysis
We chose human serum albumin (HSA) as a model system for this study following a number of considerations. It is a medium sized protein (~70 kDa) and thus promised to yield richer cross-linking data than smaller proteins, which is desired for statistical analysis. At the same time, HSA does not require additional fractionation steps as larger proteins might. Its crystal structure [30] serves as a reference to support or question the accuracy of any identified cross-links. Furthermore, HSA falls in line with a number of other human serum proteins currently under study in our laboratory and that cannot be obtained as SILAC labelled proteins in their endogenous form. Finally, HSA can be purchased at low cost and in large quantities. Triplicate data sets were generated for three different H/L ratios to serve as model data for quantitation. HSA was cross-linked in solution using 1:1, 1:2 and 1:4 mixtures of bis [sulfosuccinimidyl] suberate (BS3-d0) and its 4 Da heavier deuterated isotopologue BS3-d4 (Fig. 1a-d). Following tryptic digestion and analysis of unfractionated peptides by LC-MS, employing a high-high strategy (i.e. recording MS1 and MS2 by high-resolution mass spectrometry), cross-linked peptides were identified using Xi software. All auto-validated matches were carried forward for quantitative analysis and no attempt was made by manual validation to increase the number of matches. Due to the stringency of auto-validation applied, our results constitute a conservative estimate of the amount of identifiable cross-links in HSA. In triplicate analyses and three mixing ratios, i.e. nine LC-MS runs, we identified 597 spectra of cross-linked peptides using auto-validation in Xi, giving confident evidence to 43 unique cross-links in HSA.
Interestingly, the number of unique cross-links identified depended on the mixing ratio. Using a 1:1 mixture of heavy and light cross-linker resulted in the smallest number of identified cross-links, 17 on average for the triplicate analyses (17,22,13). This number increased to 19 cross-links (18,18,21) when using 1:2 mixing ratio and to 32 (26, 34, 35) when using a mixing ratio 1:4. This did not affect our quantitative assessment as evaluation was done within each mixing ratio. the expense of missing as much as 50% of the data on cross-links that would otherwise have been observable. Or, in other words, parting from the identification confidence boost of doublets may double the number of observable cross-links. This is in line with similar observations when identifying and quantifying proteins. Xi, used here to identify crosslinked peptides, does not rely on isotope labelling for the identification process but solely on high-resolution data (Fig. 2a). The observed cross-links distribute over the entire sequence of HSA, connecting all three albumin domains (Fig. 2b). The density of linkages and their distribution suggest the possibility that cross-link data could be used for the assembly of multi-domain protein structures from individual domain structures. In fact, the extensive network links many of the alpha-helices found in HSA (Fig. 2b, c). Modelling software might therefore be assisted in de-novo modelling of protein structures [5]. Importantly, none of the 43 cross-links identified by our automated data analysis conflicted with the crystal structure of HSA (Fig. 2d). Considering side chain length of lysines (6-6.5 Å) and the spacer of the cross-linker (11.4 Å) together with some flexibility of the protein in solution results with 25-30 Å as a reasonable cross-link limit, measured between alpha-carbon atoms of lysines. None of the observed cross-links surpass this limit. Furthermore, the cross-link limit is not fulfilled just because of the small size of HSA. A random measurement of alpha-carbon distances between theoretically linkable residues gives clearly a distinct distribution that is in conflict with the cross-link limit. This fully agrees with our prior manual benchmarking of cross-linking data as highly accurate [9].
We concluded that our HSA data set was of high quality, with respect to confidence of identifications, and contained data on sufficient cross-links to statistically assess different routes of extracting quantitative data for cross-links.

Manual peak integration
To test if quantitation was possible at all using our data, and to establish a reference for any automated quantitation algorithm, we manually integrated a number of mass spectrometric signals of cross-linked peptides and calculated their H/L ratio. Five cross-linked peptide pairs were chosen for being amongst those with highest identification confidence in all three mixing ratios and identified in all replicas. Consequently, we had to integrate 15 MS signals for each mixing ratio. This was achieved using Xcalibur to open raw-files and creating an extracted ion chromatogram around the m/z of the identified cross-link. The peak of the correct elution time, given by the identifying MS2 scan, was then integrated using the peak integration routine of Xcalibur. Peak boundaries were manually determined. Manual quantitation resulted in good agreement between measured and expected ratios (Fig. 3a)  Crystal structure of HSA (PDB|1AO6) displayed by PyMOL [31] with cross-links visualised as red dashed lines and alpha-carbons of linked residues as spheres. d. Length distribution of alpha-carbon distances of observed cross-links as measured in the crystal structure of HSA (green) and for random lysine-lysine pairs (grey).
clustered near the experimental mixing ratio, indicating good quantitation accuracy and precision. The median of log2(H/L) was −0.27, 0.99 and 1.96, respectively, which fell close to the target values of 0, 1 and 2. This demonstrated that neither the small mass difference between BS3-d0 and BS3-d4 nor any elution time shift caused significant interference with the manual quantitation. We noticed, though, that the first isotope of the heavy signal might overlap with the light isotope cluster.
For larger peptides, such as cross-linked peptides, the monoisotopic peak tends to be comparably small and hence the influence of any overlap is potentially significant. We therefore excluded the mono-isotopic peak from our manual quantitation.

Automated quantitation of cross-links
Currently, none of the mainstream quantitation software packages natively support the quantitation of cross-linked peptides. For example, neither Skyline [32] nor MaxQuant [19] accept cross-linked peptides as their input. Nevertheless, we have good experience with using MaxQuant for SILAC based quantitation [33] and also it provides the quantitation of nonidentified chromatographic features. We therefore examined to what extend this could be exploited for the quantitation of cross-links. As is the case for SILAC, our approach is based on signal pairs in MS1 that need to be integrated. MaxQuant performs doublet recognition and peak integration and then outputs the H/L ratios in the "allPeptides.txt" file. There, the ratios are accessible even without using MaxQuant for simultaneous identification, which is not possible in the case of cross-linked peptides. This excludes the re-quantify option of MaxQuant, which is only applicable for peptides identified within the MaxQuant workflow.  Looking at all quantified cross-links, the ratios were significantly different from the experimental mixing ratios (Fig. 3c).
The median of log2(H/L) was −0.74, 0.15 and 1.19, respectively, which was far from the target values of 0, 1 and 2. The ratios spread over a large range and typically to smaller values. In conclusion, MaxQuant disagreed strongly and inconsistently with our manual quantitation and the expected mixing ratios. A possible reason for the variable success of MaxQuant could be in the use of a deuterated cross-linker. Deuterium can introduce a large shift in the elution time between labelled and unlabelled cross-linked peptides (Fig. 3d). Future versions of MaxQuant are likely to improve on this (Jürgen Cox, personal communication) and may also include accessing the re-quantify process with peaks selected outside the MaxQuant pipeline. This would be important to ensure determination of ratios for all identified cross-links. For our on-going work on quantitative CLMS, we decided to explore our own software developments.

XiQ quantitation of cross-links
Given the limited success of MaxQuant in handling a deuterated cross-linker we decided to develop the open source Xi quantitation software (XiQ). XiQ emulates the steps of the manual quantitation, but without the time involvement and subjective components. The algorithm is described in a flow chart (Fig. S1) and the Methods section. In brief, MSFileReader (Thermo Fisher Scientific) is used to access the mass spectrometric raw-files. The peak volume of the associated precursor signal is extracted for every validated peptide-spectrum match returned from the Xi search algorithm. The same is done for the corresponding doublet signal. Importantly, the elution time window that is considered when looking for the corresponding doublet signal is expanded towards earlier times for the deuterated partner and towards later times for the non-labelled partner. This takes the isotope effect of deuterium on elution time into consideration. Finally, the cross-linked peptide ratio is calculated from the heavy over light signals, considering the first, second and third isotope only, as was done for manual integration.
XiQ compared favourably to manual quantitation. XiQ determined ratios for all 45 manual quantitation points and did so  with high agreement to the results of manual quantitation (Fig. 4a). Also, XiQ quantitation achieved fair agreement to the theoretical mixing ratios (Fig. 4b).  (Fig. 4c), with Pearson correlation 0.89, rootmean-square error (RMSE) 0.22 and coefficient of determination R 2 0.77. In summary, XiQ succeeded with delivering reliable and accurate quantitative data for all identified cross-linked peptides. XiQ does not use any cross-linking specific features. It requires a minimal set of input data, specifically the scannumber of the identifying MS/MS spectrum, precursor m/z value, charge state, label state (is the m/z of the heavy or the light precursor), label mass, label count, a time window used to integrate the peaks, and whether a shift in elution time is expected. This makes it possible to use XiQ with any MS1-based labelling scheme that may currently be overlooked by mainstream quantitation software.

Conclusion
Cross-link data carry valuable information about structure and interactions of proteins. Consequently, quantitative analysis of cross-linked peptides may reveal dynamics of protein conformation and protein interactions. We showed that in principle this could be achieved using stable isotope labels. Unfortunately, mainstream quantitation software currently does not yet handle cross-linked peptides. With XiQ we developed a quantitation application to step in for the time being and to create a full reference data set for testing quantitation software. Furthermore, we provide access to the first mass spectrometric raw data from cross-linking, which allows not only for testing quantitation software but also identification software. This work now lays the foundation and opens the opportunity to venture towards analysing protein dynamics at residue-level resolution by mass spectrometry. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jprot.2013.03.005.