Identifying and characterising Thrap3, Bclaf1 and Erh interactions using cross-linking mass spectrometry

Background: Cross-linking mass spectrometry (XL-MS) is a powerful technology capable of yielding structural insights across the complex cellular protein interaction network. However, to date most studies utilising XL-MS to characterise the topology of individual protein complexes have been carried out on over-expressed or recombinant proteins, which might not accurately represent native cellular conditions. Methods: We performed XL-MS using the MS-cleavable crosslinker disuccinimidyl sulfoxide (DSSO) after immunoprecipitation of the endogenous BRG/Brahma-associated factors (BAF) complex and co-purifying proteins. Data are available via ProteomeXchange with identifier PXD027611. Results: Although we did not detect the expected enrichment of crosslinks within the BAF complex, we identified numerous crosslinks between three co-purifying proteins, namely Thrap3, Bclaf1 and Erh. Thrap3 and Bclaf1 are mostly disordered proteins for which no 3D structure is available. The XL data allowed us to map interaction surfaces on these proteins, which overlap with the non-disordered portions of both proteins. The identified XLs are in agreement with homology-modelled structures, suggesting that the interaction surfaces are globular. Conclusions: Our data shows that the MS-cleavable crosslinker DSSO can be used to characterise in detail the topology and interaction surfaces of endogenous protein complexes without the need for overexpression. We demonstrate that Bclaf1, Erh and Thrap3 interact closely with each other, suggesting they might form a novel complex, hereby referred to as the TEB complex. This data can be exploited for modelling protein-protein docking to characterise the three-dimensional structure of the complex. Endogenous XL-MS might be challenging due to crosslinker accessibility, protein complex abundance or isolation efficiency, and may require further optimisation for some complexes, like the BAF complex, to detect a substantial number of crosslinks.


Introduction
Cross-linking mass spectrometry (XL-MS) is a powerful technique that enables the identification of proximal amino acid residues within a single protein as well as residues in close proximity in interacting proteins. Intramolecular crosslinks provide distance constraint parameters that can support homology-based tertiary structure modelling or even guide de novo modelling (Adikaram et al., 2019; Kahraman et al., 2013; Liu et al., 2020; Orbán-Németh et al., 2018; O'Reilly & Rappsilber, 2018). Moreover, intermolecular cross-linking information has been used to determine spatial orientation and elucidate the topology of protein complex subunits (Gaik et al., 2015; Herzog et al., 2012; Leitner et al., 2016; Politis et al., 2014). In the absence of a full tertiary structure model, protein-protein interaction (PPI) surface information derived from XL-MS can potentially serve as a guide to direct mutation or small molecule screens to disrupt specific subunit interactions within a given complex.
Here, we performed XL-MS on the affinity-purified (AP) endogenous BRG/Brahma-associated factors (BAF) complex in native conditions to define the interaction surfaces between complex subunits and with associated protein partners. The strategy was based on immunoprecipitation of Arid1a, considered to be a scaffolding/bridging component of the complex (Han et al., 2020; He et al., 2020; Mashtalir et al., 2018), and, unlike most published XL-MS studies, involved no artificial increase of the target protein complex by overexpression. In addition to BAF purification, we concomitantly achieved a significant co-enrichment of Thrap3, Bclaf1 and Erh proteins that allowed the identification of a substantial number of crosslinks (XLs) between them, implicating this protein cluster as a native protein assembly, which we hereby refer to as the TEB complex. We report the interaction surfaces between Thrap3, Bclaf1 and Erh, determined through chemical crosslinking mass spectrometry.

Bclaf1, Erh and Thrap3 interact directly with each other
To investigate the interaction surfaces of Arid1a and the BAF complex, we carried out five experiments in which immunoprecipitation of endogenous Arid1a from mouse embryonic stem cells (mESCs) was coupled to crosslinking using the MS-cleavable crosslinker DSSO and MS3 mass spectrometry. Crosslinked peptide identification was performed using the full mouse UniProt database, with crosslink assignment using XlinkX (Liu et al., 2015) as a part of Proteome Discoverer at a 1% false discovery rate (FDR).
Whilst we were able to detect several XLs between BAF subunits and interacting proteins, these were mostly single crosslink spectra matches (CSMs), with the exception of two. We also detected a significant number of XLs between Thrap3, Erh and Bclaf1, due to a strong enrichment of these proteins in Arid1a APs. Specifically, 13% of all XLs identified in the five experiments involved these three proteins. For comparison, 9.6% of the total XLs obtained were assigned to BAF, which, in addition to being the bait in the AP, is also a much larger complex with 27 subunits. Given the richness of XL information associated with these proteins, we used this data to gain insight into the structural topology of this protein cluster. Our data suggests that they interact closely and might represent a stable complex. We hereby refer to it as the Thrap3-Erh-Bclaf1 (TEB) complex.
Bclaf1 and Thrap3 are highly homologous proteins (Figure 1). They share the Thrap3-Bclaf1 domain, which defines the THRAP3/BCLAF1 family that contains these two proteins. There is no available crystal or cryo-electron microscopy structure for Thrap3 or Bclaf1. Interestingly, domain analysis with the Pfam database (v33.1) (Mistry et al., 2021) showed that Thrap3 and Bclaf1 are both highly disordered along their whole length (Figure 2A), which limits the application of structure prediction modelling to them without additional information. Hence our XL-MS data could add useful information for structural elucidation of the complex. We summarised the confident crosslinks and the corresponding number of CSMs for each of the five experiments in Table 1. Due to the high sequence homology between Thrap3 and Bclaf1 (Figure 1), the crosslinks were manually checked for unambiguous assignment between the two proteins, and indistinguishable pairs were removed from the analysis. We detected 24 unique XL sites and a total of 121 CSMs across all five experiments. The high density of crosslinks in the regions from 496-537 amino acids (aa) for Thrap3 and 578-635aa for Bclaf1 (Figure 2B) indicates the interaction surface between the two proteins. This region of Thrap3 also contains most of the self-links, suggesting that it is potentially a globular area that is used for PPI. In contrast, no intra-links were retained for Bclaf1 upon high-confidence filtering (Figure 2B). Even though they share ~43% homology, the middle parts of these proteins are quite different between approximately 400aa and 560aa, with fragments missing in one protein or the other (Figure 1). The Bclaf1 and Thrap3 interaction interface also contains their cross-linked sites to Erh, which extend slightly further (up to 481-708aa in Thrap3 and 534-620aa in Bclaf1). Moreover, the region of    Figure 2A), suggesting that the Bclaf1 interaction surface, like that of Thrap3, may also be a globular structure.
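The manual check for unambiguous assignment between the two homologous proteins can be approximated programmatically. The following is a minimal sketch, not the procedure used in this study: it flags a cross-linked peptide as ambiguous when its sequence occurs in more than one protein. The function names and the toy sequences are hypothetical placeholders and are not real Thrap3 or Bclaf1 sequences.

```python
def proteins_containing(peptide, proteome):
    """Return the sorted names of all proteins whose sequence contains the peptide."""
    return sorted(name for name, seq in proteome.items() if peptide in seq)


def is_unambiguous(peptide, proteome):
    """A peptide is unambiguous if it maps to exactly one protein."""
    return len(proteins_containing(peptide, proteome)) == 1


# Toy sequences for illustration only (NOT the real protein sequences).
toy_proteome = {
    "Thrap3": "MSKRSASPEKLLDR",
    "Bclaf1": "MGRSASPEKTTWQR",
}

shared = "SASPEK"   # present in both toy sequences, so ambiguous
unique = "EKLLDR"   # present only in the toy Thrap3 sequence

print(shared, proteins_containing(shared, toy_proteome))
print(unique, proteins_containing(unique, toy_proteome))
```

A crosslink would be kept only if both of its peptides pass such a check; peptides shared between the two proteins correspond to the indistinguishable pairs removed from the analysis.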
Our data demonstrates that the interaction between Bclaf1, Erh and Thrap3 is direct and suggests that they might form a ternary complex, which we have named TEB (Thrap3-Erh-Bclaf1).
Previous studies identified that both the N-terminus (1-190aa) and the C-terminus (359-951aa) of Thrap3 are important for its function in DNA repair following double-strand breaks and stalled replication forks, whilst residues 190-359aa appear to be dispensable for this function (Vohhodina et al., 2017). The C-terminal fragment of Thrap3 has been shown to partially rescue the Thrap3 knockout phenotype in terms of the ability to respond to ionising radiation (IR)-induced DNA damage (Vohhodina et al., 2017). Since this Thrap3 C-terminal fragment covers the XL-rich region, the rescue may have been mediated by restoration of interactions with Erh and Bclaf1. Moreover, there are a number of frequently reported phosphorylation sites (>100 reports per site) in PhosphoSitePlus (v6.6.0.1) (Hornbeck et al., 2015) within the Thrap3 C-terminus in mice (S379, S572, S679, S924).

Mapping TEB crosslinks on available structures
Erh is the only component of the TEB complex with an available crystal structure (Arai et al., 2005; Hazra et al., 2020; Jin et al., 2007; Kwon et al., 2020; Li et al., 2005; Wan et al., 2005; Xie et al., 2019). We detected an XL for Erh that involves two connected peptides with overlapping sequences (Table 1; XL-pair ID 19). Although XL-MS is generally not able to distinguish between intra- and inter-links in the case of homomultimeric proteins, the overlap in the crosslinked sequences conclusively identifies an inter-molecular interaction, that is, a crosslink involving two molecules of the same protein (or alternatively a false positive peptide identification). In agreement with this data, Erh is known to form a homodimer (Arai et al., 2005; Hazra et al., 2020; Xie et al., 2019), and we therefore concluded that the self-link at position 90 involves two Erh molecules. We utilised the available Erh homodimer protein structure PDB:1WZ7 (Arai et al., 2005) to model this crosslink within a structural context (Figure 3). The length of the mapped crosslink is within the 37Å maximum distance constraint for DSSO used in other modelling studies (Liu et al., 2020).
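The reasoning behind the inter-molecular assignment can be expressed as a simple rule: a single polypeptide chain contains each residue only once, so two cross-linked peptides that cover the same residue positions cannot both originate from one molecule. The sketch below illustrates that rule only; the function names and the peptide positions are hypothetical and do not reproduce the actual Erh peptides in Table 1.

```python
def ranges_overlap(start_a, end_a, start_b, end_b):
    """True if two inclusive residue ranges share at least one position."""
    return start_a <= end_b and start_b <= end_a


def classify_self_link(peptide_a, peptide_b):
    """Classify a crosslink between two peptides of the same protein.

    Each peptide is a (start, end) tuple of inclusive residue positions.
    Overlapping peptides must come from two copies of the protein
    (or from a false positive identification, which this rule cannot exclude).
    """
    if ranges_overlap(*peptide_a, *peptide_b):
        return "inter-molecular (homomultimer)"
    return "ambiguous (intra- or inter-molecular)"


# Hypothetical positions, chosen only to illustrate both outcomes.
print(classify_self_link((85, 95), (90, 100)))
print(classify_self_link((10, 20), (30, 40)))
```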

TEB crosslinks satisfy the model based on sequence homology
Since there is no available crystal or cryo-EM structure for Thrap3 or Bclaf1, we used the online modelling tool Robetta for de novo modelling of putative structures (Raman et al., 2009; Song et al., 2013), and five predicted structures were rendered for each protein. In all models Thrap3 appeared to be largely unfolded and flexible (Figure 4), with roughly the same area, stretching from 330-725aa, folding into a number of helices that resemble a globular structure. This region overlapped with the crosslink-dense area of Thrap3 (Figure 5). When XLs were mapped onto the proposed models, 5 out of the 5 crosslinks we detected satisfied the more restrictive distance threshold of 32Å (Armony et al., 2021) in models 2, 3 and 5, whilst 4 out of 5 did in models 1 and 4.
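Checking whether mapped crosslinks satisfy a distance threshold amounts to measuring the Cα-Cα distance between the two linked residues in a model and comparing it with the cut-off. The sketch below is an illustration under stated assumptions and is not the Xlink Analyzer implementation: it reads Cα coordinates from standard fixed-column PDB ATOM records and tests each link against the 32Å and 37Å thresholds mentioned in the text. The helper names, residue numbers and coordinates are hypothetical toy values.

```python
import math


def ca_line(chain, resseq, x, y, z):
    """Format a minimal PDB ATOM record for one C-alpha atom (toy data only)."""
    return (f"ATOM  {resseq:5d}  CA  LYS {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def read_ca_coords(pdb_text):
    """Map (chain, residue number) to C-alpha coordinates from ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54]))
    return coords


def satisfied_links(coords, links, max_dist):
    """Return one boolean per link: is the C-alpha distance within max_dist?"""
    return [math.dist(coords[a], coords[b]) <= max_dist for a, b in links]


# Toy model: three residues placed 30 and 35 Angstrom from residue 10.
toy_pdb = "\n".join([
    ca_line("A", 10, 0.0, 0.0, 0.0),
    ca_line("A", 20, 30.0, 0.0, 0.0),
    ca_line("A", 30, 0.0, 35.0, 0.0),
])
links = [(("A", 10), ("A", 20)), (("A", 10), ("A", 30))]

coords = read_ca_coords(toy_pdb)
print(satisfied_links(coords, links, 32.0))  # stricter threshold
print(satisfied_links(coords, links, 37.0))  # more lenient threshold
```

Running the same check over each of the five predicted models would give the per-model counts of satisfied crosslinks.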
Surprisingly, despite the homology between Bclaf1 and Thrap3, the predicted structures for Bclaf1 appeared generally much more folded (Figure 6) than those of Thrap3 and overall looked very different.    by itself being a subject of splicing regulation, with one specific splicing isoform implicated in regulation of tumour growth (Zhou et al., 2014).

Discussion
Crosslinks can be used not only to refine low- or medium-resolution structures, but also to aid the generation of protein models from their amino acid sequences (Liu et al., 2020; Orbán-Németh et al., 2018). The crosslinking data presented here could be incorporated as Cα-Cα distance restraints using I-TASSER (Roy et al., 2010; Yang et al., 2015; Zhang, 2008) to refine the preliminary protein models we generated and improve their reliability. If satisfactory model structures are obtained, XL data can again be exploited through the HADDOCK platform for inter-molecular docking of the TEB complex subunits (van Zundert et al., 2016). However, due to the lack of confident intramolecular XLs for Bclaf1 or of an available resolved structure of its homologues, its modelling may not be accurate enough to perform this task successfully.
In summary, we have performed XL-MS on endogenous protein complexes to derive useful topological information. We have shown that Thrap3, Bclaf1 and Erh interact directly with each other to form a tight protein assembly, and we have identified their interaction interfaces through XL-MS. Abnormal splicing events are often observed in cancer cells and have been implicated in many types of cancer. Hence, characterisation of the interactions between these proteins will be useful for better understanding their role in oncogenesis.

Cell culture
Mouse embryonic stem cells (mESCs) were cultured by StemCell Technologies Inc. (Cambridge, UK).

Immunoprecipitation and crosslinking
Protein G-Dynabeads (#10004D, Invitrogen) were prepared by coupling to antibodies against specific target protein in Table 2

Mass spectrometry
Peptides generated by trypsin digestion were fractionated with the Pierce High pH Reversed-Phase Peptide Fractionation Kit (#84868, ThermoFisher) according to the manufacturer's instructions, and eight fractions were collected and dried. Liquid chromatography mass spectrometry (LC-MS) analysis was performed on the Dionex UltiMate 3000 UHPLC system coupled with the Orbitrap Lumos Mass Spectrometer (Thermo Scientific). Each peptide fraction was reconstituted in 15 μL 0.1% formic acid and loaded onto the Acclaim PepMap 100, 100 μm × 2 cm C18, 5 μm trapping column at a flow rate of 10 μL/min of 0.1% formic acid loading buffer. The sample was then subjected to gradient elution on the EASY-Spray C18 capillary column (75 μm × 50 cm, 2 μm) at 50°C. Mobile phase A was 0.1% formic acid and mobile phase B was 80% acetonitrile, 0.1% formic acid. The gradient separation method at a flow rate of 300 nL/min was as follows: a 90-minute gradient from 5% to 38% B, 10 minutes up to 95% B, 5 minutes isocratic at 95% B, re-equilibration to 5% B in 5 minutes, and 10 minutes isocratic at 5% B. MS scans were acquired at a mass resolution of 120,000 and precursors between 375-1,600 m/z and with charge equal to or higher than +3 were isolated for collision-induced dissociation (CID) fragmentation with quadrupole isolation width 1.6 Th in the top speed mode in cycles of 5 seconds. Collision energy was set at 25%. Fragments with the targeted mass difference of 31.9721 (DSSO crosslinker) were further subjected to CID fragmentation at the MS3 level with collision energy 35%, ion trap detection and MS2 isolation window 2 Th. Two precursor groups were selected with both ions in the  (Cox & Mann, 2008) pLink , XQuest/xProphet (Leitner et al., 2014; Rinner et al., 2008; Walzthoeni et al., 2012), StavroX (Götze et al., 2012), MeroX (Götze et al., 2015), Kojak (Hoopmann et al., 2015), XiSEARCH (Mendes et al., 2019) or MaxLinker (Yugandhar et al., 2020) could be used for XL identification.
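The MS3 trigger described above relies on finding fragment pairs separated by the characteristic DSSO mass difference of 31.9721. The following sketch shows the idea on a list of fragment masses; it is not the instrument's acquisition logic. It assumes the values are neutral (charge-deconvoluted) masses in Da, and the absolute tolerance of 0.01 Da, the function name and the toy masses are all hypothetical.

```python
DSSO_DELTA = 31.9721  # targeted mass difference quoted in the text


def find_doublets(masses, delta=DSSO_DELTA, tol_da=0.01):
    """Return (light, heavy) fragment pairs whose mass difference matches delta.

    tol_da is an assumed absolute tolerance in Da, not a value from the study.
    """
    ordered = sorted(masses)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            diff = heavy - light
            if abs(diff - delta) <= tol_da:
                pairs.append((light, heavy))
            elif diff > delta + tol_da:
                break  # the list is sorted, so later masses differ even more
    return pairs


# Toy fragment masses: two doublets and one unpaired fragment.
toy_masses = [1000.5000, 1032.4721, 1500.0000, 1531.9721, 2000.0000]
print(find_doublets(toy_masses))
```

Each detected pair would correspond to one candidate for further fragmentation at the MS3 level.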
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (Perez-Riverol et al., 2019) partner repository with the dataset identifier PXD027611 (see underlying data).

Crosslinking mapping, protein modelling and visualisation
Annotations of domains and disordered regions of Thrap3, Bclaf1 and Erh were taken as reported in the Pfam database (v33.1) (Mistry et al., 2021). The illustration of the domains was made using DOG 2.0 (Ren et al., 2009). Two-dimensional visualisation of crosslinks was performed using the xiVIEW online tool (Graham et al., 2019). The visualised crosslinks were filtered so that at least one of the proteins from the cross-linked pair is a member of the TEB complex and the site of the cross-linked pair was observed in more than one experiment. Structural models of Bclaf1 and Thrap3 were predicted by the Robetta online tool, with the TrRefineRosetta modelling method for Bclaf1, and comparative modelling and ab initio modelling for Thrap3 (Raman et al., 2009; Song et al., 2013). PDB files of the models and the available structures were visualised using the UCSF Chimera package (Pettersen et al., 2004), and crosslinks were mapped using the Chimera plug-in Xlink Analyzer (Kosinski et al., 2015). Chimera was developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIGMS P41-GM103311).
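The two filtering criteria applied before visualisation (at least one TEB member in the pair, and the site pair observed in more than one experiment) can be sketched as follows. This is an illustrative reconstruction, not the script used in the study; the record layout, the function name and the site positions are hypothetical.

```python
from collections import defaultdict

TEB_MEMBERS = {"Thrap3", "Erh", "Bclaf1"}


def filter_crosslinks(records, members=TEB_MEMBERS, min_experiments=2):
    """Keep site pairs that involve a complex member and recur across experiments.

    records: iterable of (experiment_id, protein_a, site_a, protein_b, site_b).
    A site pair is treated as the same link regardless of the order of its ends.
    """
    experiments_per_link = defaultdict(set)
    for exp, prot_a, site_a, prot_b, site_b in records:
        link = tuple(sorted([(prot_a, site_a), (prot_b, site_b)]))
        experiments_per_link[link].add(exp)
    return sorted(
        link
        for link, exps in experiments_per_link.items()
        if len(exps) >= min_experiments
        and any(protein in members for protein, _ in link)
    )


# Toy records with made-up site positions.
toy_records = [
    (1, "Thrap3", 500, "Bclaf1", 600),
    (2, "Bclaf1", 600, "Thrap3", 500),    # same link, ends reversed
    (1, "Thrap3", 510, "Erh", 90),        # seen in one experiment only
    (1, "Arid1a", 100, "Smarca4", 200),
    (3, "Arid1a", 100, "Smarca4", 200),   # reproducible but no TEB member
]
print(filter_crosslinks(toy_records))
```

Sorting the two ends of each link before counting means that a crosslink reported as A-B in one experiment and B-A in another is counted as a single reproducible site pair.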

Data availability
Underlying data
The underlying data has been deposited in the ProteomeXchange Consortium via the PRIDE partner repository, accession number PXD027611: https://identifiers.org/pride.project:PXD027611.
The study sets out to investigate the interactions of the BAF complex by immunoprecipitating Arid1a, a scaffolding protein of the BAF complex. Unfortunately, the experiment yielded too few cross-links of the BAF complex to enable detailed topological investigation; however, three proteins, Thrap3, Bclaf1, and Erh, were identified with numerous cross-links, suggesting that they may form a previously unidentified complex. In all, five experiments were carried out yielding a total of 121 cross-links. Of these, 24 CSMs (cross-link spectral matches) were attributed to subunits of this putative complex.
The study highlights an important aspect of structural mass spectrometry, namely that chemical cross-linking enables the structural analysis of intrinsically disordered proteins and disordered regions within a protein. Thrap3 and Bclaf1 are predicted to have extensively disordered regions, which makes it very difficult, if not impossible, to study the structures by crystallography or cryo-electron microscopy. While the data here provide limited intra- and inter-cross-links, they are a starting point for more extensive investigation of this putative new complex.
Raw data underlying this work has been deposited in the ProteomeXchange Consortium via the PRIDE repository.
Minor comments: The authors suggest naming the putative Bclaf1a-Erh-Thrap3 complex, the BET complex. This is potentially confusing, as there are already the Bromodomain and Extra-Terminal domain (BET) proteins (BRD2, BRD3, BRD4, and BRDT). Perhaps an alternative nomenclature would be better.
1. Table 1 should additionally show the confidence score attributed to a CSM by the software used for data analysis (XLinkX). The legend also refers to 4 experiments where five are shown.

2. The authors get slightly fixated on the distance restraint of 37Å. This is a lenient value and may not have any particular importance for these disordered proteins.
For the cross-linking data analysis, it may prove useful to include serine and threonine in the reaction specificity for DSSO.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
structures for these intrinsically disordered proteins. It would be useful to compare these structures with those predicted by AlphaFold.
AlphaFold was released after the manuscript had been submitted, but we agree with the reviewer that it would be interesting to compare the modelled structures with the AlphaFold predicted structures. AlphaFold structures for human THRAP3 and BCLAF1 look more disordered than those presented in the paper for the mouse counterparts but retain a few alpha helices in the central region of both THRAP3 and BCLAF1, similarly to the structures generated by Robetta.
Searches were performed against the full UniProt Mouse database. Potentially an alternative approach would have been to characterise the IP obtained using Arid1a and build a more targeted database against which to search the data.
(Cox and Mann, 2008), pLink , XQuest/xProphet (Rinner et al., 2008, Walzthoeni et al., 2012, Leitner et al., 2014), StavroX (Götze et al., 2012), MeroX (Götze et al., 2015), Kojak (Hoopmann et al., 2015), XiSEARCH (Mendes et al., 2019) or MaxLinker (Yugandhar et al., 2020) could be used for XL identification."
The authors get slightly fixated on the distance restraint of 37Å. This is a lenient value and may not have any particular importance for these disordered proteins.

We agree that 37 Å is a lax value. We have now relaxed the use of the 37 Å threshold throughout the manuscript. Further examination of our results has revealed that most of the Thrap3 XLs satisfy a distance constraint of 32 Å in all models, and we have modified the manuscript accordingly.
For the cross-linking data analysis, it may prove useful to include serine and threonine in the reaction specificity for DSSO.

We thank the reviewer for this suggestion. We did not check for serine and threonine sites as we generally see less than 1% reactivity with these residues when using other reagents with similar reaction chemistry to DSSO such as TMT.

Alexander Leitner
Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
In this contribution, Shcherbakova et al. report the characterization of the ternary interaction between mouse Thrap3, Bclaf1, and Erh by chemical cross-linking (XL) and mass spectrometry. The cross-linking data was obtained from an experiment that initially targeted the BAF complex, which did not yield sufficient cross-links for a detailed structural interpretation. Thrap3, Bclaf1, and Erh, however, were found to be connected through a mutually supporting set of cross-links, suggesting that the observed connections specify the binding interfaces in a ternary complex. The manuscript describes data generation from the IP-XL-MS experiment, for which results from multiple, independent experiments were combined. In total, 24 non-redundant peptide pairs involving at least one of the three proteins were identified repeatedly.
Thrap3 and Bclaf1 are predicted to have large unstructured/intrinsically disordered regions, which makes it difficult to study their interactions by established structural biology methods. The XL data presented here serve as a starting point for further experiments, including integrative modeling approaches to define the organization of the complex. The role of Thrap3 and Bclaf1 in some splicing events may make the complex relevant for diseases such as cancer.
The manuscript is clearly written and experiments are described in sufficient detail to enable replication. XL-MS results have been deposited to the PRIDE repository, facilitating reanalysis or reuse of the data.
Minor comments that could be addressed in a revised version:
○ On page 7, the authors discuss the potential ambiguity of XL data when it comes to site pairs on the same protein. While this is true, the case described here (a cross-link that connects partially overlapping sequences) needs to be inter-molecular (or a false positive identification). I would emphasize this more in the text.
○ Several times throughout the manuscript, the authors mention that they used an upper distance of 37 Å to assess "compatibility" of a cross-link with available PDB structures or structure predictions, citing Liu et al., 2020, as a reference. I would like to point out that 37 Å is a quite lenient threshold, and upper distances of 30-32 Å are much more commonly used. Nevertheless, an interaction between largely unstructured proteins may sample a larger conformational space. Somewhat related to this, the authors discuss that intra-molecular links on Thrap3 were compatible with one of the proposed models (page 7). Since more than one model was predicted, how did the links fit to other models?
○ In the methods section (page 10), an "IPP150" buffer is mentioned that needs to be defined.
○ Several times throughout the manuscript, the authors mention that they used an upper distance of 37 Å to assess "compatibility" of a cross-link with available PDB structures or structure predictions, citing Liu et al., 2020, as a reference. I would like to point out that 37 Å is a quite lenient threshold, and upper distances of 30-32 Å are much more commonly used. Nevertheless, an interaction between largely unstructured proteins may sample a larger conformational space. Somewhat related to this, the authors discuss that intra-molecular links on Thrap3 were compatible with one of the proposed models (page 7). Since more than one model was predicted, how did the links fit to other models?
We thank the reviewer for his expert comment on the cross-link distance upper threshold. The fit of the cross-links to the other four models was similar, with models 2, 3 and 5 satisfying all 5/5 crosslinks at a distance of 32 Å and models 1 and 4 satisfying 4/5 crosslinks at 32 Å. We have now modified the text to include this information.
In the methods section (page 10), an "IPP150" buffer is mentioned that needs to be defined. On the same page, "precursor selection ... with a mass resolution of 120,000" should be rephrased. Precursors were detected at a resolution of 120,000 but their selection for fragmentation is only dependent on the isolation width. "miss cleavages" should read "missed cleavages".
We have rephrased the MS mass resolution and precursor selection parameters to: "MS scans were acquired at a mass resolution of 120,000 and precursors between 375-1,600 m/z and charge equal or higher than +3 were isolated for collision-induced dissociation (CID) fragmentation with quadrupole isolation width 1.6 Th in the top speed mode in cycles of 5 seconds." We have also replaced "miss cleavages" with "missed cleavages".
○ The legend in Table 1 mentions four experiments, although the results from five are displayed.
The legend has been corrected.

Competing Interests:
No competing interests were disclosed.