Experiment-guided molecular simulations define a heterogeneous structural ensemble for the PTPN11 tandem SH2 domains

SHP2 plays an important role in regulating cellular processes, and its pathogenic mutations cause developmental disorders and are linked to cancer. SHP2 is a multidomain protein, comprising two SH2 domains arranged in tandem, a catalytic PTP domain, and a disordered C-terminal tail. SHP2 is activated upon binding two linked phosphopeptides to its SH2 domains, and the peptide orientation and spacing between binding sites are critical for enzymatic activation. For decades, the tandem SH2 has been extensively studied to identify the relative orientation of the two SH2 domains that most effectively binds effectors. So far, neither crystallography nor experiments in solution have provided conclusive results. Using experiment-guided molecular simulations, we determine the heterogeneous structural ensemble of the tandem SH2 in solution in agreement with experimental data from small-angle X-ray scattering and NMR residual dipolar couplings. In the solution ensemble, N-SH2 adopts different orientations and positions relative to C-SH2. We suggest that the intrinsic structural plasticity of the tandem SH2 allows SHP2 to respond to external stimuli and is essential for its functional activity.

The N-SH2 and C-SH2 domains are depicted as ribbons and colored in cyan and orange, respectively. The structures were aligned on the C-SH2 domain. Figure S5. Overlay of the representative structures of the ten most populated clusters from the MD simulations of the tandem SH2 performed with the AMBER99SBws force field. The N-SH2 and C-SH2 domains are depicted as ribbons and colored in cyan and orange, respectively. For comparison, the conformation adopted by the tandem SH2 in autoinhibited SHP2 is represented as a backbone trace colored in light green. Figure S6. Comparison of the experimental small-angle X-ray scattering (SAXS) curve (red), reported as raw data, with the calculated SAXS curves (blue) of the ensemble of the tandem SH2 structures, obtained from MD simulations performed with the AMBER03ws (A) and the AMBER99SBws force field (B), respectively. Experimental and calculated radii of gyrations (Rg) are reported in each panel. Residuals are plotted as a function of q in the top panels. Figure S7. Comparison of the experimental small-angle X-ray scattering (SAXS) curve (red), reported as a smoothed curve, with the SAXS curves calculated from the structure ensembles of each of the ten most populated clusters of the tandem SH2 (blue), obtained from MD simulations performed either with the AMBER03ws force field (panels A-J) or with the AMBER99SBws force field (panels K-T). Experimental and calculated radii of gyrations (Rg) are reported in each panel.    (dashed lines) ensembles (panels B, D, F, H). The RDC-restrained ensemble was generated using as starting coordinates 24 randomly selected conformations representing either the entire heterogenous ensemble of MD-derived tandem SH2 structures (panels A, B) or the homogeneous ensembles of cluster 1 (panels C, D), cluster 3 (panels E, F), and cluster 9 (panels G, H). The attempt to fit the RDC data with tandem SH2 structures with a defined relative orientation of the two domains (as in cluster 1, 3 and 9) leads to major perturbations in the conformation of either the N-SH2 (E and G) or the C-SH2 (C) domain.

Sample preparation
The DNA sequence encoding SHP2 1-220 (tandem SH2) was cloned into the pETM22 expression vector (European Molecular Biology Laboratory collection), which allows the expression of the recombinant protein as a fusion construct with His6-tagged thioredoxin for improved solubility. Each type of RDC (H-N, N-C', H-C' and C'-Cα) was calculated as difference between the respective isotropic and anisotropic doublet-splittings. All doublet-splittings were extracted from IPAP-type (in-phase/anti-phase) spectra; in this approach, each RDC experiment comprises two sub-spectra ("in-phase" and "anti-phase"), in which the relevant doublet appears as either in-phase (the two doublet peaks have the same sign) or anti-phase (the two doublet peaks have opposite sign). Two new sub-spectra ("upfield" and "downfield") that contain either one or other of the two doublet peaks were generated by taking the sum and difference of the in-phase and anti-phase sub-spectra. The peak-positions for calculation of the doublet-splittings were then measured from the upfield and downfield sub-spectra. H-N splittings were extracted from 2D IPAP-15 N-HSQC spectra, 1 and recorded with Hα/Hβ band-selective decoupling for 15 N chemical-shift evolution. 2 C'-Cα splittings were recorded from 3D IPAP-HNCO[J-CA] spectra. 3 N-C' and H-C' splittings were both extracted from 2D IPAP-E.COSY-15 N-HSQC spectra. In this experiment, the in-phase/anti-phase sub-spectra were generated by either refocusing or evolving 15 N transverse magnetization with respect to the N-C' coupling prior to the indirect evolution period. The doublet components in the resultant 2D spectrum were separated by the N-C' splitting in the indirect dimension and by the H-C' splitting in the acquisition (direct) dimension. All spectra were processed with NMRPipe (v10.1). 4 Peakpositions were determined with CcpNmr Analysis. 5

SAXS experiments
Small-angle X-ray scattering (SAXS) data were collected at the P12 beamline at the Petra III storage ring at DESY (Deutsches Elektronen-Synchrotron) in Hamburg (Germany).
Six different tandem SH2 samples were prepared at six different concentrations in SAXS buffer

MD simulations of the tandem SH2 in solution
Molecular dynamics simulations were performed for the tandem SH2 domains of wildtype SHP2 in apo form (SHP2 1-220 , corresponding to sequence ranges from Met 1 to Arg 220 ), except for the N-terminus that was modified by adding two residues, Gly -1 and Pro 0 , and replacing Thr 2 with alanine. The initial atomic coordinates were derived from the crystallographic structure of the PTPN11 tandem SH2 domains in complex with TXNIP peptides (PDB ID 5df6). 7 Missing or incomplete residues (strands Gly -1 -Ser 3 , Asn 161 -Gly 163 , Glu 176 -Leu 177 ) were modeled by Molecular Operative Environment (MOE). 8 Tandem SH2 domains were put at the center of a dodecahedron box, large enough to contain the protein and at least 2.1 nm of solvent on all sides. The system was solvated with ~39500 explicit TIP4P/2005 water molecules, 9 and one Na + ion was added to neutralize the simulation box. All MD simulations were performed with the GROMACS software package, 10 using the AMBER03ws or AMBER99SBws force fields. 11 Long range electrostatic interactions were calculated with the particle-mesh Ewald (PME) approach. 12 A cutoff of 1.2 nm was applied to the direct-space Coulomb and Lennard-Jones interactions. Bond lengths and angles of water molecules were constrained with the SETTLE algorithm, 13 and all other bonds were constrained with LINCS. 14 The solvent was relaxed by energy minimization followed by 100 ps of MD at 298 K, while restraining protein coordinates with a harmonic potential. Then the system was minimized without restraints. Starting from the last system structure, 12 simulations were spawned, after generating the initial velocities at 50 K from different seeds according to the Maxwell distribution. The temperature was raised to 298 K in 10 ns, in a stepwise manner, while the pressure was set to 1 bar using the weak-coupling barostat. 15 Finally, 12 independent production simulations of 1.05 µs were performed. The pressure was set to 1 bar using the Parrinello-Rahman barostat. 16 The temperature was controlled at 298 K using velocity rescaling with a stochastic term. 17 The first 50 ns of each simulation were discarded, and the analyses were performed on a cumulative trajectory of 12 µs.
First-order water-mediated hydrogen bonds were analyzed by MDAnalysis tools. 18 The cross-correlation between the N-SH2 and C-SH2 domains was calculated by Linear Mutual Information (LMI) method 19 from the positions of the Ca atoms after separating the internal local motions of the single domains from the rigid body motions. 20

SAXS calculations from explicit solvent MD simulations
SAXS curves were computed from the MD simulations using explicit-solvent SAXS calculations. 21 Accordingly, all explicit water molecules and ions within a predefined distance from the protein contributed to the SAXS calculations, as defined by a spatial envelope. 21 A distance of the envelope from the solute atoms (7 Å) was chosen to ensure bulk-like water at the envelope surface. 21 A density correction was applied to fix the bulk water density to the experimental value of 334 e nm -3 . 21 The buffer-subtracted SAXS curve was computed from the scattering of atoms inside the envelope volume, as taken from MD simulation frames of two systems: i) containing the tandem SH2 in solvent and ii) containing pure solvent. 21

Back-calculation of RDC from tandem SH2 structures
RDCs were calculated after an alignment tensor best-fitting with the experimental RDCs using singular value decomposition (SVD). 22 Calculations were performed by calcTensor (for the single-template/single-tensor and the multiple-template/single-tensor approach) and calcETensor (for the multiple-template/multiple-tensor approach) helper programs of Xplor-NIH suite 23 on a maximum of 100 conformations of tandem SH2, extracted from the AMBER99SBws simulations in solution.

RDC-restrained MD simulations
Langevin MD simulations were performed using the AMBER99SBws force field, 11 with the TIP4P/2005 water model, 9 which had previously provided a structure ensemble in agreement with experimental SAXS curve. The system and the simulation setup were the same as the previous simulations of the tandem SH2 in solution, with the exception of the integrator and the temperature bath. RDCs were applied as ensemble averaged structural restraints using the tensor-free Θ method. 24 The ensemble was composed by 24 parallel replicas, whose initial structures and velocities were randomly taken from the restraint-free ensemble. We selected a