Conformational and oligomeric states of SPOP from small-angle X-ray scattering and molecular dynamics simulations

Speckle-type POZ protein (SPOP) is a substrate adaptor in the ubiquitin proteasome system, and plays important roles in cell-cycle control, development, and cancer pathogenesis. SPOP forms linear higher-order oligomers following an isodesmic self-association model. Oligomerization is essential for SPOP’s multivalent interactions with substrates, which facilitate phase separation and localization to biomolecular condensates. Structural characterization of SPOP in its oligomeric state and in solution is, however, challenging due to the inherent conformational and compositional heterogeneity of the oligomeric species. Here, we develop an approach to simultaneously and self-consistently characterize the conformational ensemble and the distribution of oligomeric states of SPOP by combining small-angle X-ray scattering (SAXS) and molecular dynamics (MD) simulations. We build initial conformational ensembles of SPOP oligomers using coarse-grained molecular dynamics simulations, and use a Bayesian/maximum entropy approach to refine the ensembles, along with the distribution of oligomeric states, against a concentration series of SAXS experiments. Our results suggest that SPOP oligomers behave as rigid, helical structures in solution, and that a flexible linker region allows SPOP’s substrate-binding domains to extend away from the core of the oligomers. Additionally, our results are in good agreement with previous characterization of the isodesmic self-association of SPOP. In the future, the approach presented here can be extended to other systems to simultaneously characterize structural heterogeneity and self-assembly.


Introduction
Protein self-association is fundamental for many processes in biology (Ali and Imperiali, 2005; Marsh and Teichmann, 2015), and it has been estimated that around half of all proteins form dimers or higher-order complexes (Lynch, 2012). One such protein is Speckle-type POZ protein (SPOP), a substrate adaptor in the ubiquitin proteasome system, which recruits substrates for the

The higher-order self-association of SPOP follows the isodesmic model (Marzahn et al., 2016), in which the equilibrium between oligomer i and i+1 is described by a single equilibrium constant independently of oligomer size (Oosawa and Kasai, 1962). In the case of SPOP, the BTB-mediated dimer acts as the protomer of higher-order self-association, and the isodesmic K_D thus describes BACK-BACK self-association. The isodesmic model can be used to calculate the equilibrium concentration of every oligomeric species as a function of the total protomer concentration (Fig. 1a,c).

[...] against the SAXS data using Bayesian/maximum entropy (BME) reweighting (Bottaro et al., 2020). Our results show that SPOP forms rigid, helical oligomers in solution, and that the linker connecting the MATH and BTB domains is likely flexible, allowing for repositioning of the MATH domains during substrate binding. Our results also provide further evidence that SPOP self-association follows the isodesmic model, and we find an isodesmic K_D in good agreement with the previously determined value (Marzahn et al., 2016). Using SAXS experiments of a cancer variant of SPOP, we also show how our approach can be used to determine changes in the level of self-association.
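The calculation of oligomer populations from the isodesmic model can be illustrated with a short sketch. This is not the authors' implementation. It assumes the standard closed-form solution of the isodesmic mass balance, c_tot = c_1 / (1 - K_A c_1)^2 with K_A = 1/K_D, and it treats the volume fraction of an oligomer as its share of the total protomer mass. The function names and the truncation size `max_size` are hypothetical.

```python
import math


def isodesmic_concentrations(c_tot, k_a, max_size):
    """Return [c_1, ..., c_max_size] for isodesmic self-association.

    c_tot: total protomer concentration (e.g. in uM).
    k_a: isodesmic association constant, 1/K_D, in reciprocal units of c_tot.
    max_size: largest oligomer (in protomers) to include (hypothetical cut-off).
    """
    x = k_a * c_tot
    # Free protomer concentration: the root of c_tot = c_1 / (1 - k_a*c_1)**2
    # that satisfies k_a*c_1 < 1 (assumed closed form).
    c1 = (2.0 * x + 1.0 - math.sqrt(4.0 * x + 1.0)) / (2.0 * k_a * x)
    conc = [c1]
    for _ in range(max_size - 1):
        # Each step adds one protomer with the same constant: c_i = K_A * c_(i-1) * c_1
        conc.append(k_a * conc[-1] * c1)
    return conc


def volume_fractions(conc):
    """Fraction of protomer mass in each oligomer (assumed definition)."""
    mass = [(i + 1) * c for i, c in enumerate(conc)]
    total = sum(mass)
    return [m / total for m in mass]


if __name__ == "__main__":
    # Example: 10 uM protomer and K_D = 2.4 uM (values chosen for illustration).
    conc = isodesmic_concentrations(10.0, 1.0 / 2.4, 30)
    for size, phi in enumerate(volume_fractions(conc)[:5], start=1):
        print(size, round(phi, 4))
```

Because the protomer is the BTB-mediated dimer, a truncated list of 30 protomer counts corresponds to oligomers of up to 60 SPOP subunits. Truncating the series slightly underestimates the total mass, so `max_size` should be large enough that the omitted tail is negligible.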

We collected a concentration series of SAXS data on a previously used truncated version of SPOP, SPOP 28-359 (full length is 374 residues), with total protein concentrations ranging from 5 to 40 µM.

In order to build structural models to refine against the SAXS data, we first needed to decide which oligomeric species to include in our modelling. To this aim we used the isodesmic self-association [...]

As scattering intensity is proportional to particle size squared, larger oligomers nevertheless make a considerable contribution to the SAXS signal despite their low concentrations (Fig. 1c). Given [...]

We calculated SAXS intensities from our conformational ensembles and, given the relative population of each oligomer from the isodesmic model with the previously determined K_D = 2.4 µM, we calculated SAXS profiles averaged over all the oligomeric species. We found that the SAXS data calculated in this way, from the ensembles generated by MD simulations combined with the isodesmic model, were in good agreement with the experimental SAXS data, giving a reduced χ² to the concentration series of SAXS data (χ²_{r,global}) of 2.22 (Fig. 3). Despite the overall good agreement, the [...]

[...]mined previously, and resulted in a χ²_{r,global} of 1.24 to the SAXS data (Fig. 3). However, this still did not fully eliminate the systematic deviations from the experimental SAXS profiles.
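As a rough illustration of how a mixture-averaged profile and its agreement with experiment can be computed, the sketch below averages per-oligomer ensemble profiles with volume-fraction weights and evaluates a reduced χ². It assumes that the calculated profiles are already on the same q-grid and intensity scale as the experiment, and that the reduced χ² is normalized by the number of data points. Both are simplifying assumptions, and the function names are hypothetical.

```python
def mixture_profile(phi, ensemble_profiles):
    """Volume-fraction-weighted average of per-oligomer SAXS profiles.

    phi[i]: volume fraction of oligomer i (should sum to 1).
    ensemble_profiles[i]: ensemble-averaged intensities of oligomer i on a common q-grid.
    """
    n_q = len(ensemble_profiles[0])
    return [
        sum(p * prof[k] for p, prof in zip(phi, ensemble_profiles))
        for k in range(n_q)
    ]


def reduced_chi2(i_calc, i_exp, sigma_exp):
    """Mean squared error-normalized residual (assumed normalization: number of points)."""
    terms = [((c - e) / s) ** 2 for c, e, s in zip(i_calc, i_exp, sigma_exp)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy example with two oligomers and two q-values (made-up numbers).
    mix = mixture_profile([0.25, 0.75], [[4.0, 2.0], [8.0, 6.0]])
    print(mix, reduced_chi2(mix, [6.0, 5.0], [0.5, 1.0]))
```

In practice a scale factor and constant offset are often fitted between calculated and experimental SAXS profiles before computing χ²; that step is left out here for brevity.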

To improve the agreement with the experimental SAXS data further, we aimed to refine simulta-

The isodesmic K_D was fitted to 1.3±0.5 µM, and thus also remained in good agreement with the [...]

The MATH and BTB domains are connected through a ~20 residue long linker region (Fig. 1b). We hypothesized that this linker may be flexible, allowing for reconfiguration of the MATH domains [...]

[...] χ²_{r,global} quantifies the agreement with SAXS data in panel b for the three scenarios. b. Agreement between experimental SAXS data and averaged SAXS data calculated from conformational ensembles of SPOP oligomers with populations given by the isodesmic model (as shown in panel a). SAXS profiles are shown for three different scenarios: (1) calculated from the conformational ensembles generated by MD simulations with the isodesmic K_D previously determined with CG-MALS, (2) calculated from the conformational ensembles generated by MD simulations with the isodesmic K_D fitted to the SAXS data, and (3) calculated from conformational ensembles refined against the SAXS data using Bayesian/MaxEnt reweighting, and with the isodesmic K_D self-consistently fitted to the SAXS data. Error-normalized residuals are shown below the SAXS profiles and χ²_r to each SAXS profile is shown on the plot.

[...] after reweighting for all SPOP oligomers. c. The average end-to-end distance calculated from ensembles of SPOP oligomers before and after reweighting (see Fig. S3 for distributions for all oligomers). Solid lines show the fit of a power law: R_E-E = R_0 N^ν, where R_E-E is the average end-to-end distance, R_0 is the subunit segment size, N is the number of subunits in the oligomer, and ν is a scaling exponent. The fit gave R_0 = 3.16 nm, ν = 0.99 before reweighting and R_0 = 3.11 nm, ν = 0.99 after reweighting. d. The fold-change in average end-to-end distance after reweighting for all SPOP oligomers. e. Normalized histogram of distances between the center-of-mass (COM) of the MATH domain and the COM of the BTB/BACK domains in the same subunit before and after reweighting.
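The power-law fit of the end-to-end distance mentioned in the caption above can be sketched as a straight-line fit in log-log space. This is an illustration under assumptions, not the fitting procedure used for the figure: the symbols R_0 (segment size), N (number of subunits) and ν (scaling exponent) follow the usual convention R = R_0 N^ν, the fit is an unweighted least-squares regression on logarithms, and the function name is hypothetical.

```python
import math


def fit_power_law(n_subunits, distances):
    """Fit distances = R0 * N**nu by linear least squares on log-transformed data.

    Returns (R0, nu). Assumes all inputs are strictly positive.
    """
    xs = [math.log(n) for n in n_subunits]
    ys = [math.log(r) for r in distances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    nu = sxy / sxx                          # slope in log-log space
    r0 = math.exp(mean_y - nu * mean_x)     # intercept gives log(R0)
    return r0, nu


if __name__ == "__main__":
    # Synthetic data generated from a known power law (made-up values).
    sizes = list(range(2, 31))
    dists = [3.0 * n ** 0.99 for n in sizes]
    print(fit_power_law(sizes, dists))
```

An exponent close to 1 means that the end-to-end distance grows almost linearly with the number of subunits, which is the behaviour expected of a rigid, rod-like assembly.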
[...] gives rise to a broad distribution of distances between the substrate binding sites in neighbouring MATH domains, which is also slightly increased upon reweighting for all oligomers (Fig. 4g-h). Both the overall rigidity of the oligomers and the flexibility of the MATH domains are also evident from visual inspection of the conformational ensemble of the 60-mer (Fig. 4j).

[...] however, difficult due to conformational and compositional heterogeneity. In one approach, one attempts to decompose SAXS data of mixtures into contributions from individual components, which may then be analysed separately (Herranz-Trillo et al., 2017; Meisburger et al., 2021). Here, we have developed an alternative 'forward modelling' approach to characterize proteins that undergo polydisperse oligomerization by self-consistently and globally fitting the distribution of oligomeric species and reweighting the conformational ensembles of the oligomers against SAXS data. A similar idea has recently been applied to study the self-association of tubulin using static structures as input (Shemesh et al., 2021). We recorded a concentration series of SAXS data on SPOP, which is known to form linear higher-order oligomers, and combined MD simulations with our approach to simultaneously refine conformational ensembles of thirty oligomeric states of SPOP along with their relative populations.

Our results suggest that SPOP oligomers are rigid, helical structures in solution and that the [...] (Pierce et al., 2016; Bouchard et al., 2018). Our results also provide orthogonal evidence that SPOP self-association is described well by the isodesmic model, and that the isodesmic K_D for BACK-BACK mediated self-association is in the low micromolar range, in agreement with previous measurements by CG-MALS (Marzahn et al., 2016). We also collected SAXS data and fitted the isodesmic K_D for the SPOP mutant R221C. Our results suggest that SPOP R221C has a 6-9 fold decreased propensity to self-associate.
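The reweighting part of the approach scores a candidate set of conformer weights by balancing agreement with the data against deviation from the prior ensemble. The sketch below only evaluates that score for given weights and does not perform the optimization. It assumes the commonly used Bayesian/maximum entropy form L = (m/2) χ²_r − θ S_rel, with S_rel = −Σ_j w_j ln(w_j / w⁰_j) and the effective ensemble fraction φ_eff = exp(S_rel). The function and argument names are hypothetical.

```python
import math


def bme_objective(weights, prior, calc, exp, sigma, theta):
    """Evaluate the BME objective for one set of weights.

    weights, prior: new and initial conformer weights (each summing to 1).
    calc[j][k]: observable k calculated from conformer j.
    exp[k], sigma[k]: experimental value and error of observable k.
    theta: scaling parameter balancing data against the prior.
    Returns (L, chi2_r, s_rel, phi_eff).
    """
    m = len(exp)
    # Weighted ensemble average of each observable.
    avg = [sum(w * c[k] for w, c in zip(weights, calc)) for k in range(m)]
    chi2_r = sum(((a - e) / s) ** 2 for a, e, s in zip(avg, exp, sigma)) / m
    # Relative Shannon entropy: zero when weights equal the prior, negative otherwise.
    s_rel = -sum(w * math.log(w / w0) for w, w0 in zip(weights, prior) if w > 0)
    objective = 0.5 * m * chi2_r - theta * s_rel
    return objective, chi2_r, s_rel, math.exp(s_rel)


if __name__ == "__main__":
    # Two conformers, one observable (made-up numbers).
    prior = [0.5, 0.5]
    print(bme_objective([0.75, 0.25], prior, [[1.0], [3.0]], [2.0], [1.0], theta=10.0))
```

A larger θ penalizes departures from the prior more strongly, so the weights stay closer to those of the simulation, while a smaller θ lets the data dominate at the cost of a lower φ_eff.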

The approach presented here to study SPOP can be extended to other polydisperse systems to characterize the distribution of oligomeric states and their conformational properties. However, there are a few limitations to be aware of. SAXS is a low-resolution technique and may not be able to distinguish between all relevant conformations, a problem that is likely exacerbated here, as the contribution of many species to the SAXS signal may average out distinct features in the profile.

One way to mitigate this problem is to construct multiple structural models, and test whether [...]

In the case of SPOP, we described the distribution of oligomers using the isodesmic self-association model, but this can be replaced by any model that describes the populations of the species in solution, with the caveat that there should not be too many free parameters to fit to the SAXS data.

Similarly, the approach to generate prior conformational ensembles is not limited to MD simulations, and can be varied based on the system at hand. This flexibility in the modelling approach will make it useful to study other polydisperse systems in the future.

Protein expression and purification
The SPOP gene encoding residues 28-359 (His-SUMO-SPOP 28-359) was expressed and purified as previously described (Bouchard et al., 2018). Briefly, His-SUMO-SPOP 28-359 was transformed into BL21-RIPL cells and expressed in auto-induction media (Studier, 2005). Cells were harvested, lysed, and cell debris was pelleted by centrifugation. The clarified supernatant was applied to a gravity Ni Sepharose resin equilibrated in resuspension buffer (30 mM imidazole, 1 M NaCl, pH 7.8). After washing with wash buffer (75 mM imidazole, 1 M NaCl, pH 7.8), the protein was eluted with a buffer containing 300 mM imidazole, 1 M NaCl, pH 7.8. One milligram of TEV protease was added to the eluted protein and the reaction was left to dialyze into pH 7.8, 300 mM NaCl and 5 mM DTT at 4°C overnight. The cleaved protein was then further purified using a Superdex S200 size-exclusion chromatography column equilibrated with pH 7.8, 300 mM NaCl and 5 mM DTT.

[...] (Souza et al., 2021) and Gromacs 2020 (Abraham et al., 2015). We built the SPOP [...] along the x-axis. To keep these oligomers from rotating and self-associating across the periodic boundary, we added soft harmonic position restraints of 5 J mol⁻¹ nm⁻² along the y- and z-axis to the backbone beads of the terminal BTB/BACK domains. We solvated the systems using the Insane python script (Wassenaar et al., 2015) and added 150 mM NaCl along with Na+ ions to neutralize the systems. In the 'MATH free' system, we rescaled the ε of the Lennard-Jones potentials between all protein and water beads by a factor 1.06 to favour extension of the MATH domains into solution (Thomasen et al., 2022), while the unmodified Martini 3 beta v.3.0.4.17 was used for the 'MATH restrained' model.

Energy minimization was performed using steepest descent for 10,000 steps with a 30 fs time-step. Simulations were run in the NPT ensemble at 300 K and 1 bar using the Velocity-Rescaling thermostat (Bussi et al., 2007) and Parrinello-Rahman barostat (Parrinello and Rahman, 1981). [...] Non-bonded interactions were treated with the Verlet cut-off scheme. The cut-off for van der Waals interactions and Coulomb interactions was set to 1.1 nm. A dielectric constant of 15 was used. We equilibrated the systems for 10 ns with a 2 fs time-step and ran production simulations for 60 µs with a 20 fs time-step, saving a frame every 1 ns.

After running the simulations, molecule breaks over the periodic boundaries were treated with Gromacs trjconv using the flags -pbc mol -center. Simulations were backmapped to all-atom using a modified version of the Backward algorithm (Wassenaar et al., 2014), in which simulation runs are excluded and energy minimization is shortened to 200 steps (Larsen et al., 2020). Every fourth simulation frame was backmapped for a total of 15,000 conformers in each backmapped ensemble.

Constructing ensembles of larger SPOP oligomers
We constructed conformational ensembles of larger SPOP 28-359 oligomers with up to 60 subunits by joining together conformers from the all-atom backmapped ensembles of the SPOP dodecamer. [...]

Calculating SAXS intensities from conformational ensembles
We calculated SAXS intensities from each of the 15,000 conformers in each of our all-atom ensembles of SPOP oligomers using Pepsi-SAXS (Grudinin et al., 2017). To avoid overfitting to the experimental SAXS data, we used fixed values for the parameters that describe the contrast of the hydration layer, δρ = 3.34 e/nm³, and the volume of displaced solvent, r_0/r_m = 1.025, which have been shown to work well for intrinsically disordered and multidomain proteins (Pesce and Lindorff-Larsen, 2021). The forward scattering, I(0), was set equal to the number of subunits in the oligomer, in order to scale the SAXS intensities proportionally to the particle volume.

[...] (1)

The concentration of any larger oligomer with i subunits can be calculated given c_1 and the con[...] K_A is the isodesmic association constant and c_tot is the total concentration of protomers. Here we assume that the SPOP BTB-BTB dimer is always fully formed (Marzahn et al., 2016), and c_tot in Eq. 1 is thus half of the total protein concentration reported for the SAXS experiments, which refers to the SPOP monomer concentration. Given the concentration c_i of each oligomer i from the isodesmic model, we can calculate the volume fraction φ_i of the oligomer: [...] The average SAXS intensities from the mixture of oligomers, ⟨I⟩_mix, are then given by: [...] where ⟨I⟩_{i,ensemble} is the conformationally averaged SAXS intensity of oligomer i. Note that the [...]

Step 2: Fitting the isodesmic model
[...]

Step 3: Reweighting the conformational ensemble
The following step was repeated for each SAXS experiment in the concentration series. We calculated the oligomer concentrations using the isodesmic model given the new K_A determined in step 2. For each oligomer i, we extracted a SAXS profile for BME reweighting from the experimental profile using the following method: we calculated the average SAXS profile from the ensembles as in Eq. 4, but leaving out oligomer i from the sum, to get ⟨I⟩_{mix,rest}. Next, we determined the contribution of species i to the experimental SAXS intensity as: [...] where I_exp is the experimental SAXS intensity and φ_i is the volume fraction of oligomer i. We then propagated the error on this contribution from both the errors on the experimental SAXS intensities and the errors on the calculated SAXS intensities, which we determined using block error analysis (Flyvbjerg and Petersen, 1989). The propagated errors were given by: [...] where the sum runs over all oligomers that contributed to ⟨I⟩_{mix,rest}, σ_exp is the error on the [...]

[...] where n is the number of ensemble conformations, m is the number of experimental observables (in this case the number of SAXS intensities in the profile), χ²_r quantifies the agreement between the ensemble-averaged SAXS profile and the extracted experimental profile, S_rel is the relative Shannon entropy that quantifies the deviation of the new weights from the initial weights, w⁰, and θ is a scaling parameter that quantifies the confidence in the experimental data versus the prior ensemble. χ²_r is given by: [...] where I_{j,ensemble} is the SAXS intensity calculated from conformer j of the ensemble.

Preventing overfitting
We ran the optimization with a range of φ_eff cut-offs from 0.1 to 1. To prevent overfitting, we aimed [...]

[...] where w_{i,j,k} is the weight of conformer j of oligomer i from reweighting against SAXS experiment k, and the second quantity in the expression is the contribution of oligomer i to SAXS experiment k relative to the contribution of oligomer i to the other SAXS experiments in the concentration series, given by: [...] where c_{i,k} is the concentration of oligomer i in SAXS experiment k given by the isodesmic model. For