Structure based virtual screening identifies small molecule effectors for the sialoglycan binding protein Hsa

Infective endocarditis (IE) is a cardiovascular disease often caused by bacteria of the viridans group of streptococci, which includes Streptococcus gordonii and Streptococcus sanguinis . Previous research has found that serine-rich repeat (SRR) proteins on the S. gordonii bacterial surface play a critical role in pathogenesis by facilitating bacterial attachment to sialylated glycans displayed on human platelets. Despite their important role in disease progression, there are currently no anti-adhesive drugs available on the market. Here, we performed structure-based virtual screening using an ensemble docking approach followed by consensus scoring to identify novel small molecule effectors against the sialoglycan binding domain of the SRR adhesin protein Hsa from the S. gordonii strain DL1. The screening

diverse small molecules and thus prevent the interaction of the protein with the sialoglycans.
This opens new avenues for discovering potential drugs against infective endocarditis.

Introduction
Infective endocarditis (IE) (or bacterial endocarditis (BE)) is a life-threatening infection of cardiac valves and the interior surface of the heart (endocardium) 1 . Oral streptococci account for ~17-45% of all cases of IE 2, 3 . If untreated, infection destroys the valves and results in heart failure [4][5][6] . Moreover, bacteria may also form clots (emboli) that enter the blood stream and produce strokes. IE affects 10,000-20,000 patients in the US every year and is associated with an in-hospital mortality rate of ~20% and a five year mortality rate of ~40-70 % 1,7,8 . Treatment for endocarditis currently requires prolonged antimicrobial therapy, often combined with surgery.
The rise in antibiotic resistance 9 has limited our pharmacological options 10,11 , and resistant organisms have increased the mortality rate 12 . Although medical therapy alone often resolves infection, 47% or more of the patients eventually require valve replacement due to the damage incurred 5,6,13 . Given the associated morbidity and rising mortality rate, there is an urgent need to develop novel therapies against IE.
Previous studies have reported that the binding of bacteria to host platelets contributes to the colonization of damaged aortic valves [4][5][6] . A cell wall-anchored serine-rich repeat (SRR) protein mediates the adherence of S. gordonii and S. sanguinis to sialoglycans displayed on the human platelet 14 glycoprotein GPIb 15,16 . SRR proteins have been demonstrated to be virulence factors for endocarditis 15,16 , and disrupting the interaction between the SRR protein and sialoglycans on host platelets may therefore reduce virulence. Streptococcus gordonii is one of the well-studied species that cause IE and is a normal component of the human oral microbiota 7, Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200332/892847/bcj-2020-0332.pdf by guest on 19 September 2020 8,17 . Platelet binding by S. gordonii strains M99 and DL1 are facilitated by the homologous SRR proteins GspB and Hsa, respectively 18 . Although these adhesins have high sequence identity, their ligand binding regions (BRs) differ significantly and have different sialoglycan selectivity 15,19,20 . GspB binds with narrow selectivity to sialyl-T antigen (sTa) whereas Hsa binds promiscuously to a range of glycans 15,19,20 . It is unclear whether an inhibitor could impact infection by the time cases are identified. Sub-acute infective endocarditis disease develops over a period of weeks and by the time disease is identified vegetations that are a mixture of host and bacterial material are well established. Anti-adhesive therapies have been explored for the treatment of a wide range of other bacterial infections 21-23 , but have not yet been pursued for IE.
Anti-adhesives can, in principle, complement traditional antibiotics and improve their efficacy, potentially eliminating the need for surgical intervention. Moreover, an inhibitor might also reduce "re-seeding" (bacteremia is a hallmark of IE), or could be used as a prophylactic in some situations. Additionally, anti-adhesive agents are not bactericidal and hence the propagation and spread of resistant strains is much less likely to occur than as a result of exposure to bactericidal agents, such as antibiotics.
The crystal structures of the BRs (Hsa BR (PDB 6EFC) 24 , GspB BR (PDB 6EFA) 24 , 10712 BR (PDB 6EFF) 24 , SK150 BR (PDB 6EFB) 24 , SrpA BR (PDB 5EQ2) 25 , and SK678 BR (PDB 6EFI) 24 ) from a number of S. gordonii and S. sanguinis SRR proteins have been solved [24][25][26] . These all have two domains which are associated with sialoglycan binding: the Siglec (Sialic acid-binding immunoglobulin-like lectin) domain and the Unique domain (for which function is not known completely). Furthermore, recent studies have identified that the three loops (CD, EF and FG) adjacent to the sialoglycan binding site are critical for the affinity and selectivity between ligands 24 . Additionally, it has been reported that a partially conserved "YTRY" motif in the Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200332/892847/bcj-2020-0332.pdf by guest on 19 September 2020 Biochemical Journal. This is an Accepted Manuscript. You are encouraged to use the Version of Record that, when published, will replace this version. The most up-to-date-version is available at https://doi.org/10.1042/BCJ20200332 binding site is necessary for formation of hydrogen bond interactions with the sialic acid of the native ligand 24 and a crystal structure of Hsa BR bound to sTa (PDB 6EFD 24 ) has been resolved showing these interactions. Importantly, there are also human sialoglycan-binding proteins 27 , that contain a sialoglycan binding site but the site differs significantly in both geometry and in the location of hydrogen-bonding donors and acceptors from those found in the streptococcal Sigleclike proteins 28 . Moreover, the mode of interaction with sialoglycans is distinct between human Siglec proteins and bacterial sialoglycan binding adhesin proteins 28 .
The above structural information can thus be leveraged by structure-based approaches for the identification of molecular effectors targeting SRR adhesin proteins. Here, we targeted one of the SRR proteins, Hsa (Hsa BR ), using the in-silico virtual screening of ~105,000 small molecules, with the goal of discovering small molecules that bind to Hsa BR and disrupt its interactions with sialoglycan ligands. Hsa BR is a good starting structure for a chemical biology approach that could later be expanded toward inhibitors of the entire family because it is well-characterized and is known to bind promiscuously to many glycans 19,20 . Moreover, we have determined crystal structures of the Hsa sialoglycan-binding domain both in an unliganded state and in complex with di-tri-and tetrasaccharide sialoglycan ligands 24 .
Here, instead of using only the crystal structure in our computational approaches, we used molecular dynamics (MD) simulations to describe the flexibility of the binding pocket and generate an ensemble of protein conformations, which has been shown to yield large and diverse sets of molecular effectors to control protein functions 29,30 . Following subsequent high throughput ensemble docking, we prioritized the compounds using consensus scoring, which has previously shown to reduce the number of false positives and increase the success rate 31 . To Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200332/892847/bcj-2020-0332.pdf by guest on 19 September 2020 Biochemical Journal. This is an Accepted Manuscript. You are encouraged to use the Version of Record that, when published, will replace this version. The most up-to-date-version is available at https://doi.org/10.1042/BCJ20200332 further improve our predictions, we cross screened the compounds against the BRs from five Hsa homologues and identified compounds which bound to Hsa BR with relatively higher docking scores compared to other BRs. From our virtual screening predictions, we were able to achieve a high success rate of ~20%, finding that 9 out of 50 compounds that were suggested for experimental validation were indeed able to displace the native ligand from the Hsa BR binding pocket. Moreover, we were also able to identify scaffolds distant from the native ligand that bind to Hsa BR . To our knowledge, these are the first small molecules described to inhibit binding of SRR family of adhesin proteins to its native sialoglycan.

System preparation and molecular dynamics simulation
Crystal structures of the sialoglycan binding proteins Hsa BR (PDB 6EFC) 24 , GspB BR (PDB 6EFA) 24 , 10712 BR (PDB 6EFF) 24 , SK150 BR (PDB 6EFB) 24 , SrpA BR (PDB 5EQ2) 25 , and SK678 BR (PDB 6EFI) 24 were used in this study. Molecular dynamics (MD) simulations were performed on all these proteins using the Amber14 ff14SB force-field parameters 32,33 . Each of these proteins were solvated in a TIP3P 34 octahedral box with a 10 Å buffer of water around the protein in each direction. First, the protein structure was held fixed with a force constant of 500 kcal mol -1 Å -2 while the system was minimized with 500 steps of steepest descent followed by 500 steps with the conjugate gradient method. In the second minimization step, the restraints on the protein were removed and 1000 steps of steepest descent minimization were performed followed by 1500 steps of conjugate gradient. The system was heated to 300 K while holding the protein fixed with a force constant of 10 kcal mol -1 Å -2 for 1000 steps. Then, the restraints were removed, and 1000 MD steps were performed. The SHAKE algorithm 35 was used to constrain all bonds involving hydrogen in the simulations. 200 ns production MD was performed at 300 K Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200332/892847/bcj-2020-0332.pdf by guest on 19 September 2020 using the NPT ensemble and a 2 fs time step with nonbonded cutoff of 10 Å. The temperature was fixed with the Langevin dynamics thermostat 36 and the pressure was fixed with a Monte Carlo barostat 37 . Similar MD simulation protocol was used on all the adhesin. This procedure yielded a total of 20,000 snapshots for subsequent analyses. Three independent runs were performed for each simulation.

In silico screening
Ensemble docking is an in-silico structure-based chemical biology method using an 'ensemble' of protein target conformations to discover novel protein effectors 29 . The flowchart of the screening methodology used is shown in Fig. 1. The ensemble was constructed by clustering snapshots from molecular dynamics (MD) simulation trajectories by root mean square deviation (RMSD) of the binding pocket residues and loops (Table S1) surrounding the binding pocket with the hierarchical agglomerate clustering algorithm using Cpptraj module 38 .
The Vanderbilt small molecule collection ("The Discovery Collection") containing ~105,000 compounds was docked to an ensemble of 5 conformations (4 representative structures obtained from clustering from MD and 1 crystal structure) with a cubic box with edges of ~30 Å.
This small molecule library has been used in multiple high-throughput screens resulting in hits that have moved to hit-to-lead stages of early drug discovery programs [39][40][41] . The docking box was centered on the C⍺ atom of conserved residue THR 339 (Hsa numbering). VinaMPI 42 , a parallel version of AutodockVina 43 , was used to perform the in silico screening. A similar docking procedure was performed for all the adhesin proteins. The docked poses were then ranked by the AutodockVina scoring function 44 . The compounds were not only screened for Hsa BR but also cross screened with 5 adhesin proteins (GspB BR , 10712 BR , SK150 BR , SrpA BR , SK678 BR ). The Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200332/892847/bcj-2020-0332.pdf by guest on 19 September 2020 cross screening was performed to remove the promiscuous compounds or "frequent hitters" ( i.e., compounds which are always scored high for all the targets) and thus reducing the number of false positives 45 . However, it must be noted that the goal of the cross screening was not to get selectivity towards Hsa BR .
From this ranked list of compounds, we tested compounds which were within the top 1% (~1050 compounds) for Hsa BR but not within the top 1% of the other 5 BRs (GspB BR , 10712 BR , SK150 BR , SrpA BR , SK678 BR ) and narrowed the list down to 250 compounds.
We note that we only experimentally tested binding to Hsa BR and not the selectivity of the predicted binders. Next, the resulting ~250 compounds were refined and rescored using two MOE scoring functions 46 . The non-polar hydrogens (not included in Vina docking protocol) were added before performing the "induced fit" docking protocol in MOE 46 . The docking poses were ranked using "GBVI-WSA dG" and "Affinity DG" scoring functions 46 . Using consensus scoring, the top 50 compounds were then suggested for experimental validation.

Cheminformatics
All the physicochemical properties and fingerprints of small molecules were calculated using the combination of MOE 46 , ChemBioServer 47 and RDkit 48 . MACCS fingerprints were calculated for each compound and similarity between them were compared with the Tanimoto coefficient, followed by hierarchical clustering to cluster the similarity matrix.

Protein expression and purification
GST-tagged Hsa BR was expressed and purified as described in ref 20 . GST-Hsa BR was expressed under the control of the pGEX-3X vector in E. coli BL21 (DE3) in a Terrific Broth medium with 50 µg/ml kanamycin at 37 °C. When the OD 600 reached 1.0, expression was induced with 1 mM IPTG at 24 °C for 5 hrs. Cells were harvested by centrifugation at 5,000  g for 15 min,

AlphaScreen high-throughput screening assay
We used the AlphaScreen modification of an ELISA as the primary target-based proximity assay to monitor ligand displacement. AlphaPlate (Cat # PE 6005351, Lot # 8220-16081) with 384well was used for the screening. In the experimental setup, biotinylated sialyl T antigen (sTa) was coupled to a streptavidin donor bead and GST-tagged Hsa was coupled to an anti-GST conjugated acceptor bead in PBS (phosphate buffered saline). The reaction was excited at 680 nm to stimulate singlet oxygen-mediated energy transfer to the acceptor bead, which can be detected at 615 nm.
The dose-dependent signal reflects the number of bead-coupled adhesins bound to beadcoupled glycans. To determine the optimal ratio of Hsa-GST to biotinylated sTa for occupying binding sites on the beads and, therefore, maximal signal production, the Hsa-GST concentration was titrated in a 10 point-3 fold dilution starting from 1000 nM and the biotinylated sTa concentration was titrated in a 9 point-3 fold dilution starting from 100 nM and the resulting Alpha signal measured. The maximal signal, representing the "hooking point" where either the donor or acceptor beads are saturated, was found to be 3 nM for Hsa-GST and 3 nM for biotinylated sTa. The final chosen concentrations used in the screen was 1 nM of Hsa-GST and Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200332/892847/bcj-2020-0332.pdf by guest on 19 September 2020 2 nM of biotinylated sTa, slightly below the hook point, to avoid potential excess Hsa that may sequester inhibitors and interfere with signal disruption. DMSO was used as the negative control and unbiotinylated sTa was used as a positive control at a concentration of 30 uM which was determined to provide maximal disruption of the Alpha signal.
We applied this assay to the evaluation of the test compounds that were predicted as binding to Hsa BR using virtual screening. This initial screen was performed with all test compounds in duplicate at a final concentration of 10 M and DMSO was used as the negative control and unbiotinylated sTa was used as a positive control.

Z' factor calculation
The Z' factor 49 is an indicator of high throughput screening assay performance and was calculated as follows: The standard deviations and means of the positive (p) and negative (n) controls are denoted by , and , respectively. DMSO and untagged sTa are the positive and negative control respectively.

Effector identification analyses
The alpha value of each test compound was measured and was filtered using 1-fold, 2-fold or 3fold of either standard deviation (SD) from the mean of the negative control group, or absolute deviation from the median (MAD) of negative control group. We determined which tested points lay outside the mean of the vehicle control (there were 9 replicates of the negative control (DMSO), 4 replicates of the positive control (untagged sTa)). We used a threshold of both 3 SD and 3 MAD from the negative control group. This was followed by taking the union of the two.
Then, it was further filtered by only keeping those molecules which hit twice in the confirmation duplicates. Finally, we calculated the Percentage Control from the control group to identify compounds that disrupted the Hsa-sTa interaction. This serves as an initial hit identifier but should be followed-up to confirm true actives and rule out false positives.
Percentage control (PC) calculated as follows: where, is the average alpha value for negative control ( ), positive control ( ) and compounds tested ( ). PC is a measure of the alpha signal of the 10 M test compound in percentage of the controls.

I. Protein dynamics and conformations
We used MD simulations to capture the internal dynamics of the proteins and find binding site conformations not seen in the crystal structure 50 . We calculated the root mean square fluctuation (RMSF) to identify the flexible regions (Fig 2a). Although the overall structure of the Siglec domain is rigid, we observed that the loops (CD, EF and FG) close to the binding pocket are flexible for all the adhesin proteins (Figs 2a, S1). In the case of Hsa BR , we observed that the CD and EF loops constitute the most flexible region of the protein. Moreover, critical binding pocket residues other than in these loops were identified from the crystal structure of Hsa BR and the native ligand (sTa) ( Table S1).
To capture new conformations of the binding pocket that deviate from the initial crystal structure, the root mean square deviation (RMSD) of the loop residues and other critical residues (previously known to bind to the native ligand) ( of each cluster was used for docking. The "ensemble" of structures obtained from clustering and crystal structure were superimposed to observe the deviation of structures (as shown in Fig 2b).
We observed that all the structures had similar secondary and backbone structures and the RMSD Calpha of the Siglec domain was calculated to be within ~1.5 Å. However, as seen in the superimposed structures (Fig 2b), the loop regions (especially CD and EF loop) have different orientations in the "ensemble" when compared to the crystal structure. Similarly, we observed that the side chains in the binding pocket residues orient differently between the structures, which can be critical for rigid body docking.

Residue number C D E F FG a) b)
Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200332/892847/bcj-2020-0332.pdf by guest on 19 September 2020 discovery programs [39][40][41] , it has not been characterized yet. Therefore, before performing the virtual screening, and although the goal of the present work is only to identify a molecular effector, we calculated some useful physicochemical properties. Firstly, we calculated the molecular weight (MW) of the compounds (Fig 3a), which is known to be critical for safety and tolerability reasons 51 . The Vanderbilt database has compounds with MW less than 500 Da that are considered to improve druglikeness 52, 53 and also has low MW compounds (<300 Da) that are considered better initial precursors because they serve as effective chemical starting points for lead optimization 54 . The polar surface area and the number of rotatable bonds have been found to better discriminate between compounds that are orally active 55 . It has been predicted that compounds with 10 or fewer rotatable bonds and those having a polar surface area of less than 140 Å 2 have a good oral bioavailability 55 . In our database, we observed that most compounds had a mean polar surface area of ~150 Å 2 and less than 10 rotatable bonds (Figs 3b, c). Lipophilicity (SLogP) is another factor which is known to influence drug potency, pharmacokinetics, and toxicity 52,56 . Compounds with SLog P values between −0.4 to +5.6 range are known to be more "druglike" 53,57 . Here, we found that most of the compounds fall within this range (Fig 3d).
Although the above is a set of physicochemical properties that are considered to be important for different aspects of druggability, there have been numerous FDA approved drugs which violate one or more of these rules 58,59 .

III. Docking results and poses
After our virtual screening, we first ranked all the top poses for each compound based on the AutodockVina scoring function 44 . Subsequently, we selected those compounds (~250) that were in the top 1% for Hsa BR but did not rank within the top 1% for any other adhesin protein (GspB BR , 10712 BR , SK150 BR , SrpA BR , SK678 BR ). This was followed by implementing consensus scoring in which the poses (obtained from AutodockVina) were energy-minimized and then rescored using two MOE scoring functions 46 (as mentioned in the Methods section). In the end, compounds that ranked within the top 50 for all the three scoring functions were suggested for experimental validation. Next, we examined the number of electrostatic intermolecular interaction (hydrogen bonds (HBs) and Pi (π-π/ π-H/π-cation)) which are important for protein-ligand binding 60-62 . The importance of hydrogen bonding in drug design is well recognized and hydrogen-bonding capabilities deeply influence the transport and ADME (Adsorption, Distribution, Metabolism and Excretion) properties of a molecule as well as its specific interaction with biological receptors 63 .
Many QSAR studies have been reported in which hydrogen-bonding interactions play a key role in modeling a particular target activity 64, 65 . Therefore, we calculated the number of compounds which form an interaction with the residues known to bind (or in close proximity) sTa to get information about which residues were targeted the most and are easily accessible to interact with a small molecule (Fig S2). Thr 339, Tyr 337 and Lys 335, which form HBs with the native ligand (sTa) in the crystal structure (PDB 6EFD) 24 as shown in Fig. 4a, are also some of the residues which form interactions (HBs or Pi) with majority of the compounds. Moreover, we found that majority of the compounds form 3 or more HBs or Pi interactions with the binding site residues, whereas 10 compounds make more than 5 HBs or Pi interactions (Fig. 4b).
Furthermore, out of these 50 compounds, 25 compounds (50%) were predicted to bind to two of the "ensemble" structures generated from MD simulations with higher score than to the crystal structure. This further illustrates the usefulness of using ensemble docking.

IV. Experimental validation
Alpha assay screening was performed for the top 50 compounds predicted to displace sTa (the highest affinity native ligand) from Hsa BR . The Z' factor value of the DMSO (negative control) versus untagged sTa (positive control) was 0.32, which denotes that there is a separation between the high and low signals of the assay in that 3x the sum of the standard deviations of the high and low signals of the assay divided by the difference of the mean of these two experimental groups is 0.68 (the error is relatively small compared to the separation of the mean of the two groups).
After filtering the small molecules using the experimental data based on the percentage control (PC), nine small molecules were retained. These nine compounds showed a statistically significant decrease in the signal when the two replicates were averaged (Fig. 5). These The IC 50 of the untagged sTa (positive control) was calculated as 8.67 M (Fig. S3a). At 10M concentration, the PC was 39% (Fig. S3b). At the same concentration, the PC of the 9 small molecules ranges from 23% to 70% and, out of these, two compounds have PC values of less than 39% and one has a PC of 41% (Table S2).

V. Computational and binding pose analyses of experimentally validated effectors
The nine small molecules were screened for 25 known toxic and carcinogenic fragments, such as anthracene, quinone, hydroquinone, butenone--Michael acceptor, chloroethane--Michael acceptor 47 . Of the 9 experimentally validated compounds (C1-C9) ( Table 1), only Compound 1 (C1) was identified as potentially toxic, containing a benzo-dioxane and a catechol group.
Moreover, to test the similarity between these effectors and the native ligand, fingerprint-based hierarchical clustering was performed. We found four clusters (as shown in Fig. S4), which showed that the compounds identified from the screen are diverse among themselves and are not Following the above cheminformatics analyses of the experimentally validated effectors, we examined the computational models of the best poses and the interactions of the nine small molecules shown to have inhibition experimentally ( Fig. S5 and Table 1). Interestingly, the inhibitor binding site is adjacent to the sTa binding site (Fig. 6)  and bind in adjacent sites (Fig. 6). However, the residues that form HBs with the effectors, also form HBs with native ligand or are in close proximity of it. Hence, it is likely that these nine compounds are able to displace the native ligand in part because they form HBs with these critical residues.

Conclusions
The SRR protein Hsa has been considered an attractive molecular target for drug discovery due to its role in infective endocarditis (IE Vanderbilt database was used for the small molecules since it covers a wide distribution of different physicochemical properties. The goal of combining these strategies was to improve the hit rate and reduce the number of false positives. Indeed, we were able to achieve a hit rate of ~20% and identified nine compounds that could displace the native ligand in the experimental assay. The binding poses of all the nine compounds identified from docking show that they are in close proximity with residues known to form HBs with the native ligand (sialyl-T antigen). These compounds may be used as a starting point for medicinal chemistry optimization. Further studies need to be conducted to characterize the binding affinities and poses of these identified compounds, and similar analyses for other sialoglycan-binding SRR proteins are ongoing.

Acknowledgements
We would like to thank Dr. Jarrod Smith for providing the small molecule database. We would also like to thank Dr. Paul Sullam for the valuable discussion. This research used resources of the