Exploring Multi‐Subsite Binding Pockets in Proteins: DEEP‐STD NMR Fingerprinting and Molecular Dynamics Unveil a Cryptic Subsite at the GM1 Binding Pocket of Cholera Toxin B

Abstract
Ligand‐based NMR techniques to study protein–ligand interactions are potent tools in drug design. Saturation transfer difference (STD) NMR spectroscopy stands out as one of the most versatile techniques, allowing screening of fragment libraries and providing structural information on binding modes. Recently, it has been shown that a multi‐frequency STD NMR approach, differential epitope mapping (DEEP)‐STD NMR, can provide additional information on the orientation of small ligands within the binding pocket. Here, the approach is extended to a so‐called DEEP‐STD NMR fingerprinting technique to explore the binding subsites of cholera toxin subunit B (CTB). To that aim, the synthesis of a set of new ligands is presented, which have been subjected to a thorough study of their interactions with CTB by weak affinity chromatography (WAC) and NMR spectroscopy. Remarkably, the combination of DEEP‐STD NMR fingerprinting and Hamiltonian replica exchange molecular dynamics has proved to be an excellent approach to explore the geometry, flexibility, and ligand occupancy of multi‐subsite binding pockets. In the particular case of CTB, it revealed the existence of a hitherto unknown binding subsite adjacent to the GM1 binding pocket, paving the way to the design of novel leads for inhibition of this relevant toxin.


General methods
Optical rotations were measured in a Jasco P-2000 spectropolarimeter in a 1.0 cm or 1.0 dm tube (Na, λ 589 nm). Infrared spectra were recorded with a Jasco FTIR-410 spectrophotometer. 1H and 13C NMR spectra were recorded with a Bruker AMX300 spectrometer for solutions in CDCl3 or CD3OD. Chemical shifts (δ) are given in ppm and coupling constants (J) in Hz. J values are assigned once and not repeated. All the assignments were confirmed by 2D spectra (COSY and HSQC). High resolution mass spectra were recorded on a Q-Exactive spectrometer. TLC
soln. of citric acid was added and the mixture was diluted with CH2Cl2 and washed with water and brine. CH2Cl2 (8 mL) previously cooled to -20 °C was subsequently added and the reaction mixture was stirred at 0 °C for 1 h. The solvent was evaporated and the resulting 11b was used directly in the next reaction without further purification.

1H and 13C NMR spectra of new compounds

Binding epitopes of ligands 1, 2, and 3 [1]
Figure S1. Binding epitopes of ligands 1, 2 and 3 as bound to CTB (from reference [1]). The numbers represent relative saturation values, normalised to the most intense signal (assigned 100%) and obtained from STD initial slopes (proton nomenclature as in Figure 1 of the main text).
Table S7. STD NMR binding epitope mapping of ligands 4-9. Normalised initial slope values for ligands 4-9, obtained from the fitting of the raw data in Tables S1 to S6. For each ligand, the epitope is normalised against the proton with the strongest initial slope, arbitrarily.
Red arrows indicate a decrease of STD signals upon addition of the competitor. a) Galactose STD signals of ligand 3 decrease upon addition of 3NPG (complete spectra with assignments in Figure S4). b) STD signals of ligand 3 decrease upon addition of ligand 2. Only protons H5, H2 and H1 can be monitored, because the non-galactose peaks of the two ligands overlap (see panel a) for assignment and Figure S4 for complete reference spectra). c) STD signals of ligand 2 are not affected upon addition of 3NPG.

Control tr-NOESY spectra
Figure S6. Full spectral width tr-NOESY experiments (mixing time 1.2 s). Top, 2/CTB/3NPG complex (ternary complex, in black) and 3NPG/CTB complex (binary complex, in red), as in Figure 7 of main text. Bottom, 2/CTB complex (binary complex): the spectral area containing the ILOE spectra is empty when the protein is alone, confirming that the ILOE is due to the spatial proximity of the ligands in the ternary complex.
Figure S7. Full spectral width tr-NOESY experiments (mixing time 1.2 s). Top, 2/3NPG/CTB complex (ternary complex, in black) and 3NPG/CTB (binary complex, in red), as in Figure 7 of main text. Bottom, CTB alone: the spectral area containing the ILOE spectra is empty when the protein is alone, confirming that the ILOE is due to the ligands.
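The normalisation used for the binding epitopes in Figure S1 and Table S7 (strongest initial slope set to 100%) can be sketched as follows. This is a minimal illustration and not the authors' processing script; the function name and the example slopes are hypothetical.

```python
def normalise_epitope(initial_slopes):
    """Scale per-proton STD initial slopes so that the strongest equals 100%.

    initial_slopes: dict mapping a proton label to its STD initial slope
    (any consistent unit). Returns a dict with the same keys, in percent.
    """
    if not initial_slopes:
        raise ValueError("no initial slopes supplied")
    strongest = max(initial_slopes.values())
    if strongest <= 0:
        raise ValueError("the strongest initial slope must be positive")
    return {proton: 100.0 * slope / strongest
            for proton, slope in initial_slopes.items()}


# Hypothetical initial slopes (arbitrary units), NOT values from Tables S1-S7.
example_slopes = {"H1": 0.8, "H2": 1.2, "H4": 2.0, "H5": 1.0}
epitope = normalise_epitope(example_slopes)
```

Because every value is divided by the same reference, the ranking of protons is unchanged; only the scale becomes comparable between ligands.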

Rigid molecular docking of ligand 1.
Based on the available experimental data, molecular docking of ligands 1, 2 and 3 was undertaken to provide a 3D model for binding. The coordinates of CTB from the X-ray structure in complex with GM1 (PDB ID: 3CHB) were used to generate the receptor grid, and the docking parameters were optimized by re-docking GM1 to reproduce the crystal structure (Figure S8a).
With this setting, docking of ligand 1 showed the thio-galactoside moiety fitting in the galactose subsite with an orientation similar to that of the galactose ring of GM1 (Figure S8b). The docking model was validated against the available STD NMR build-up curves by predicting STD intensities using CORCEMA-ST [3].
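A validation of this kind is usually scored with an NOE R-factor. The sketch below assumes the unweighted root-normalised form, R = sqrt(Σ(S_exp − S_calc)² / Σ S_exp²), summed over all protons and saturation times; the formula is not spelled out in this document, so treat it as an assumption. The function name and the example numbers are hypothetical.

```python
import math


def noe_r_factor(experimental, calculated):
    """Unweighted NOE R-factor between experimental and predicted STD values.

    Both arguments are equal-length sequences of STD intensities (for example
    in percent), one entry per proton and saturation time.
    """
    if len(experimental) != len(calculated) or len(experimental) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_diff = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    squared_exp = sum(e ** 2 for e in experimental)
    if squared_exp == 0:
        raise ValueError("experimental intensities are all zero")
    return math.sqrt(squared_diff / squared_exp)


# Hypothetical STD intensities (%), NOT data from this work.
std_exp = [10.0, 20.0]
std_calc = [9.0, 22.0]
r_factor = noe_r_factor(std_exp, std_calc)  # sqrt(5 / 500) = 0.1
```

An R-factor of 0 means the prediction reproduces the experiment exactly; the value grows as the predicted intensities drift away from the measured ones.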
Predicted STD intensities are compared to experimental NMR data through the so-called NOE R-factor, with a low value indicating a good match.
[1]. b) CORCEMA-ST predicted STD intensities using the coordinates of the 3D model from the lowest energy docking solution of the complex (see Figure 1 in the main text for proton nomenclature).
Remarkably, the calculated data for the thio-galactose moiety (occupying the galactose sub-site in the docking solution, see Figures S10a and S10b) in (i) and (ii) fit the experimental data well, whereas the calculated data for the rest of the molecule in (iii) and (iv) did not fit the experimental results; this part of the molecule occupies the sialic acid binding sub-site in the docking solution (see Figure S10c and Figure 1 in the main text for proton nomenclature).
RMSD determined by first performing a best fit to CTB backbone heavy atoms, before performing a no-fit calculation on the ligand heavy atoms. First frame used as reference. Clusters 0 and 1 shown as red and green points, respectively. Bottom: Clustering statistics for the simulation, based on an average linkage hierarchical agglomerative algorithm with a cutoff of 2.5 Å fitting the heavy atoms of ligand 3. Table shows fraction of frames in each cluster, average distance from the centroid and standard deviation. Cluster 0 shown in main text (Figure 9).
Figure S17. Top: Root mean squared deviation (RMSD) of ligand 3 over the course of a 100 ns simulation in which the ligand is bound to CTB. RMSD determined by first performing a best fit to CTB backbone heavy atoms, before performing a no-fit calculation on the ligand heavy atoms. First frame used as reference. Clusters 0 to 4 shown as red, green, blue, yellow and magenta points, respectively. All other clusters shown in black. Bottom: Clustering statistics for the simulation, based on an average linkage hierarchical agglomerative algorithm with a cutoff of 2.5 Å fitting the heavy atoms of ligand 3. Clustering yielded 26 clusters; those representing less than 1% population are omitted for clarity. Table shows fraction of frames in each cluster, average distance from the centroid and standard deviation. Cluster 0 shown in main text (Figure 9).
Figure S18. Ligand 3, CORCEMA-ST calculations on MD frames. a) Experimental STD intensities at increasing saturation time (build-up curves) for ligand 3 in complex with CTB [1]. b) CORCEMA-ST calculated STD intensities for 100 averaged frames from MD of the complex (see Figure 1 in the main text for proton nomenclature).
Figure S19. Ligand 2, CORCEMA-ST calculations on MD frames. a) Experimental STD intensities at increasing saturation time (build-up curves) for ligand 2 in complex with CTB [1]. b) CORCEMA-ST calculated STD intensities for 100 averaged frames from MD of the complex (see Figure 1 in the main text for proton nomenclature).
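The clustering statistics quoted in the MD captions above (average linkage, hierarchical agglomerative, 2.5 Å cutoff) can be illustrated with a small pure-Python sketch. It is not the software used for the simulations, which this document does not name. It assumes a precomputed matrix of pairwise ligand heavy-atom RMSDs between frames and assumes that merging stops once the smallest average inter-cluster distance exceeds the cutoff; the function names and the toy data are hypothetical.

```python
def average_linkage_clusters(dist, cutoff):
    """Agglomerative clustering with average linkage and a distance cutoff.

    dist: symmetric list-of-lists matrix of pairwise distances between frames
    (for example ligand heavy-atom RMSD in Angstrom).
    Returns lists of frame indices, largest cluster first.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > cutoff:
            break  # the closest pair is already too far apart to merge
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    clusters.sort(key=len, reverse=True)
    return clusters


def cluster_fractions(clusters):
    """Fraction of all frames that falls into each cluster."""
    total = sum(len(c) for c in clusters)
    return [len(c) / total for c in clusters]


# Toy 1D coordinates standing in for six MD frames (NOT simulation data);
# the absolute difference plays the role of a pairwise RMSD in Angstrom.
positions = [0.0, 0.5, 1.0, 6.0, 6.4, 12.0]
dist = [[abs(a - b) for b in positions] for a in positions]
clusters = average_linkage_clusters(dist, cutoff=2.5)
fractions = cluster_fractions(clusters)
```

With the toy data, the three frames near 0 Å merge, the two frames near 6 Å merge, and the isolated frame stays alone, because the average distance between any two of the resulting groups is above 2.5 Å. The naive pair search is adequate for a handful of frames; a full trajectory would call for an optimised library routine.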