Scaffold Hopping and Bioisosteric Replacements Based on Binding Site Alignments

Bioisosteric replacements and scaffold hopping play an important role in modern drug discovery and design, as they enable the change of either a core scaffold or substitutes in a drug structure, thereby facilitating optimization of pharmacokinetic properties and patenting, while the drug retains its activity. A new knowledge-based method was developed to obtain bioisosteric or scaffold replacements based on the extensive data existing in the Protein Data Bank. The method uses all-against-all ProBiS-based protein superimposition to identify ligand fragments that overlap in similar binding sites and could therefore be considered as bioisosteric replacements. The method was demonstrated on a specific example of drug candidate – a nanomolar butyrylcholinesterase inhibitor, on which bioisosteric replacements of the three ring fragments were performed. The new molecule containing bioisosteric replacements was evaluated virtually using AutoDock Vina; a similar score for the original and the compound with replacements was obtained, suggesting that the newly designed bioisostere compound might retain the potency of the original inhibitor.


INTRODUCTION
HE concepts and methods of bioisosterism and scaffold hopping are becoming increasingly important in modern drug design. [1]The term bioisostere represents a functional group that can be used to replace another group in a drug molecule, while this new molecule retains the desired biological activity. [2]Bioisosteres are often used to replace a functional group that is important for binding to a target, but a new group in its place would: improve selectivity, lessen side effects, improve pharmacokinetics, increase metabolic stability, simplify synthetic route, or help avoid patent-related issues.Meanwhile, scaffold hopping can be considered a special type of bioisosteric replacement where the core structure of a small drug molecule is replaced. [3]The core structure may be of direct functional importance for interacting with the target protein or it may provide the necessary alternative scaffolding that allows for substitution with functional groups in the appropriate spatial arrangement.
While the concept of bioisosterism is relatively old, only recently have computational systematic and data-intensive methods been used to explore the appropriate replacements.These data mining approaches are becoming widespread, mostly due to the public availability of large structural databases of bioactive molecules and due to new efficient methods for identifying appropriate bioisosteric or scaffold replacements.BIOSTER, for example, is a database of compiled bioisosteric transformations obtained over the past forty years from literature. [4]ChEMBL (www.ebi.ac.uk/chembl/) is a public domain database of small molecules with associated bioactivity data that has been mined from the medicinal chemistry literature which contains over 1.5 million distinct compounds (release version 20). [5]Using ChEMBL, one can identify experimentally observed molecular transforms that exhibit bioisosteric characteristics.An example of such T method is the Matched Molecular Pair (MMP) concept, [6] which is based on the identification of molecules in the ChEMBL that differ only in one functional group.This kind of identification allows for the analysis of potential changes in biological properties that may be affected by this transformation.The MMP concept implemented using the ChEMBL database has been made freely available on-line as the Swiss-Bioisostere database (www.swissbioisostere.ch). [7]sing this system one can quickly identify potential bioisosteric replacements for the query functional group.0][11][12] In this work, we developed a unique knowledgebased method to identify possible bioisosteric or scaffold hopping replacements for small molecules.We find bioisosteric replacements based on local binding site alignments using the ProBiS algorithm, [13][14][15] which enables finding bioisosterically replaceable fragments even in distantly related proteins.ProBiS superimposes binding sites and not entire protein structures (folds) nor protein sequences and seeks for locally similar spatial arrangements of physicochemically similar surface functional groups.It is therefore able to differentiate between binding pocket conformations in proteins with similar overall sequence identity, as well as recognize similar binding pockets in proteins with very different sequences.This, we believe, is one of the main advantages of our method.In addition, to our knowledge, a rigorous method with clearly defined rules on how identified bioisosteric replacements should be connected to the original query structure does not exist yet.
The process of identification of possible bioisosteric replacements was started using the ProBiS algorithm, [13][14][15] with which we superimposed the holo protein structures from the Protein Data Bank (PDB) containing co-crystallized small ligands.We compared and aligned binding sites that contained co-crystallized ligands; we superimposed the binding sites that had similar three-dimensional amino acid residue arrangements.
The co-crystallized ligands were superimposed together with the binding sites, but were not used in the alignment procedure.Overlaps in the superimposed ligands' molecular structures were then sought.The overlapping ligands were fragmented to the basic elements of their structures, such as rings and various functional groups.The elements that had high spatial overlap measured by the Hausdorff distance were considered as bioisosterically replaceable.Thus, a database of about 14.000 possible bioisosteric replacements was created based on the PDB structural data that will benefit early stages of the drug development process.The collection of identified bioisosteric replacement pairs is available at http://insilab.org/files/bober/BoBER.zip.

Binding Sites Data Preparation
For binding sites database preparation we followed the same procedure as described previously. [16]In the following, we briefly repeat the steps taken.
Step 1.For each protein chain (PDB/Chain identifier) in the April 2016 PDB release we constructed the biological assembly that contains this chain, based on the rules in the header of the corresponding PDB file.A biological assembly is the three-dimensional macromolecular assembly composed of the protein chains that are or are believed to be the biological functional form.Next, we extracted the cocrystallized ligands of the protein chain identified by HETATM from each biological assembly file; where we considered only ligands with > 7 heavy atoms, and at least one atom being < 3 Å from the particular protein chain.Non-specific binders and modified residues denoted by the MODRES code were discarded.The modified residues such as selenomethionines and glycosylation sites were discarded due to them not being true ligands; they are chemically modified amino acid residues that are part of the protein's chain.From each extracted ligand we then obtained its binding residues, i.e. protein atoms < 5 Å from any ligand atom, and generated the corresponding binding site surface files (.srf).
Step 2. We constructed an initial non-redundant protein single chain database using a sequence identity cutoff at 40 %.This was performed by clustering ~290.000 protein chains in the PDB by their sequence identity.Each resulting sequence cluster contained > 40 % sequence identical protein chains.In total, 27.148 clusters were obtained.From each of these clusters we then selected one protein chain with the lowest resolution as the representative chain, resulting in 27.148 unique protein chains.We then assigned each binding site obtained in Step 1 to its corresponding sequence cluster.
Step 3. Within each cluster obtained in Step 2, we compared each pair of binding sites (all against all) using ProBiS, and calculated their pairwise z-scores. [14]Then we clustered binding sites within each sequence identity cluster and obtained around 11.000 binding site clusters.Each such binding site cluster contained from a few to many thousands of binding sites.
Step 4. For each binding site cluster we then superimposed all its member sites, their coordinates, together with their ligands onto the representative binding site, which is the largest binding site containing the ligand with most heavy atoms.To further reduce computational complexity, the remaining protein chains were compared only to the top scored (z-score > 0) binding sites from their corresponding cluster representative.Thus we obtained the representative binding site surfaces.
Step 5. Using the ProBiS algorithm, we compared each existing protein structural chain in the PDB with the binding site surfaces obtained in Step 4.There are > 290.000 protein chains in the PDB so to facilitate this task, we took the approximately 44.000 representatives of 90 % sequence identity clusters (obtained by the same procedure as in Step 2) and compared each against the binding site surfaces from Step 4. To further reduce computational complexity, the remaining protein chains were compared only to the top scored (z-score > 0) binding sites from their corresponding cluster representative.This allowed us to get accurate predictions for each protein chain in the PDB at a lower computational cost comped to if we considered all of the protein chains.Finally, only similar binding sites with z-score > 2.5 were considered further.From the obtained alignments for each protein chain, we obtained possible overlapping ligands that could be used as potential bioisosteric replacements.When a ligand is at the interface of two protein chains, it will be considered twice -one time for each chain.

Fragmenting of Overlapped Ligands
After binding site superimposition, all of the ligands were fragmented using an "in-house" developed C++ program to obtain smaller fragment-like structures which should be suitable as bioisosteric replacement groups.The program works by identifying all the rotatable bonds within a molecule.The molecule is then broken along these bonds, and all bond atoms are proclaimed as "join" atoms; all other atoms are "core" atoms.Finally, atom types for every atom in each fragment are determined using the IDATM algorithm as implemented in the UCSF Chimera. [17]ragments that are not interesting as bioisosteric replacements, that is, fragments containing less than four heavy atoms and fragments lacking at least one atom that could act as a hydrogen bond donor or acceptor, are not considered further.
As reconnecting the newly obtained fragments to the original structure is not trivial, rigorous atom typing of join atoms is essential to determine how to reconnect the bioisosteres in a new molecule.Atom typing also decides if the reconnection is possible at all.

Rules for Bioisosteric Replacements
We defined new rules for join atom replacements to expand the search space of possible bioisosteric replacements for each original fragment.These rules ensure that bioisosteres can be found for most fragments making the approach widely applicable: Rule 1. Join atoms in the bioisosteric fragment can be replaced with atoms having similar hybridization properties.For example, see fragments 3 and 5 in Table 1, in which N.pl atom types were replaced with C.sp2 as in the reference fragment (Figure 1, upper left and Figure 5).
Rule 2. If a bioisosteric fragment has more join atoms than the reference fragment, then only those join atoms from the bioisosteric fragment that are bound to core atoms that overlap join-atom-binding core atoms of the reference fragment are considered; others are ignored, for example in fragments 2 and 4 in Table 1 we ignore C.ar atoms; in 5 we ignore the C.sp3 atom (Figure 1, upper right).
Rule 3. If we do not obtain any bioisostere replacements due to the uniqueness of the reference fragment, then the structure of this fragment is reduced by removing some join atoms.A smaller and usually less unique fragment is thus obtained.
Rule 4. If the bioisosteric fragment has less join atoms than the reference fragment, then join atoms on the bioisosteric fragment are created.A join atom is added to the core atom of the bioisosteric fragment that is the closest to the overlapped core atom of the reference fragment bound to a join atom.The new join atom on the bioisosteric fragment is assigned the atom-type of the original join atom on the reference fragment based on which it was created.
Rule 5.One or more of the above rules can be applied in sequence.A special case is the combination of Rule 3 and Rule 4 (Figure 1, lower half).By applying Rule 3, we remove join atoms that are required by Rule 4 in order to add join atoms to the discovered bioisosteric fragments.This problem is solved by reconnecting join atoms to the reduced reference fragment in the same way as they were in the original fragment after the completion of database screening (Figure 1, lower half, step 2).Rule 4 can then be used.Examples of fragments obtained by first applying Rule 3 and then Rule 4 are fragments 6-9 in Table 2.

Fragment Overlap Evaluation via Hausdorff Distance
To assess the appropriateness of a certain fragment as a bioisostere, we determined its spatial overlap with the reference fragment using the Hausdorff distance (HD) as a measure of molecular overlap. [18]Such "molecular" HD is a very fast and effective way to measure the similarity between a pair of candidate fragments.The HD is defined as the greatest of all the distances from a point in one set to the closest point in the other set (Figure 2).We calculated the molecular HD between the candidate bioisostere and reference fragments based on the overlapping points of their van der Walls surfaces.The fragment pairs having HD < 1.5 Å were considered as bioisosterically replaceable.The overall procedure is summarized in Figure 3.

RESULTS AND DISCUSSION
Bioisosteric replacements are commonly used in drug design to alleviate unwanted properties of drugs such as low solubility, low bioavailability, or high toxicity.Using our approach, we obtained 14.407 potential bioisosteric replacements.To assess the structural diversity of these bioisosteric pairs, we measured the Tanimoto coefficient of each individual pair using the OpenBabel's FT4 method. 19he average Tanimoto coefficient was ~0.5 ± 0.3; thus, on average the structures of the bioisosteric replacement pairs differ significantly.
To demonstrate the usefulness of our approach, we applied it to a previously published nanomolar butyrylcholinesterase inhibitor [20] (PDB code: 3F9) serving as a reference compound (Figure 4, upper half).First, the inhibitor structure was fragmented, enabling recognition of its basic structural elements.Then, possible bioisosteric replacements for the existing fragments in the reference molecule were identified.As rings are often the most promising for bioisosteric replacements, the search was focused on the three rings found in this compound, that is the naphthalenyl, the piperidinyl and the dihydroindenyl rings (Figure 4, lower half).5).Rule 2: reference fragment (orange) and the bioisosteric replacement fragment (blue) overlap due to the alignment of their co-crystallized binding sites.The C.sp3 join atom in the bioisosteric fragment (red circle) is removed as there is no corresponding overlapping atom in the reference structure.The bioisostere N.sp2 join atom is overlapped with the reference C.sp2, thus considered further (green circle).Rule 3: to reduce the uniqueness of the reference fragment the N.pl join atom is removed (red circle).Then, database screening is initiated and overlapping bioisosteric structures are found (blue).Then, Rule 4 cannot be used, as we have no join atoms on the reference (orange) structure (green circle).Therefore, we reconnect the original N.pl join atom to the reference fragment (green circle).Rule 4: the N.pl join atom is created on the bioisosteric fragment as a copy of the reference join atom (green circle).For the naphthalenyl moiety we considered the fragments as bioisosterically replaceable if their HD < 1.0 Å.We found five non-trivial cyclic fragments (Table 1) that also contained the appropriately hybridized join atom (C.sp2 or N.pl) -meaning that the fragments can be connected to the main structure in a similar way as in the original reference fragment, and thus retaining an appropriate chemical environment.In fragments 3 and 5 the connecting join atom is N.pl, which would lead to a structure containing an improbable N=O-N group.Therefore, we replaced this join atom according to Rule 1 (see Methods) with C.sp2 atom, to obtain a more probable structure (Figure 5).Further, based on Rule 2, we have retained the N.pl join atom in fragment 5, while ignoring the C.sp2 join atom.We did this because the core atom that binds the C.sp2 join atom in the bioisosteric fragment does not overlap with any join-binding core atom from the reference fragment (Figure 1, upper right).
For the dihydroindenyl moiety as the reference fragment (Figure 4, right) we could not find any possible bioisosteric replacements; however, if we excluded the N.sp3 join atom using Rule 3 (see Methods) and used this reduced fragment without the join atom to scan the database, we were able to obtain potential bioisosteric replacements (Table 2).These exhibited low HDs towards the reference fragment.As fragments 6-9 did not contain any join atoms, we added them based on the spatial overlapping of the reference and bioisosteric core atoms that are bound to the join atoms in the original structure (Rules 4 and 5 and Figure 1, lower half).
Finally, we repeated the same procedure for the piperidinly fragment without the C.sp3 join atom bound to the core carbon, since using the exact same moiety (Figure 4, middle) we did not obtain any overlapping fragments with low HDs.Observing many examples of overlapping fragments, we found empirically that HD of 1.5 Å is most likely the uttermost limit at which the fragments are still suitable for bioisosteric replacement.Here, the lowest scoring fragment when using the original structure was over 2.0 Å.Consequently, we used a simpler structure containing only core atoms of the piperidinly moiety (Rule 3).With this procedure we obtained possible bioisostereic fragments, which are shown in Table 3. Similarly as in the previous example, one of the join atoms in every fragment 11-15 was added based on the spatial overlap of the reference and bioisosteric core atoms that were bound to the join atoms in the original structure (explained in Figure 1, lower half and by Rules 4 and 5).
To further test our bioisostere replacement approach for scaffold hopping, we built a new molecule (Figure 6) in which all of the rings of the 3F9 structure were replaced with their best predicted bioisosteric pair based on HD (fragments 1, 6, and 11).This new molecule was then aligned to the reference compound in the binding site of      the butyrylcholinesterase using ChemAxon MarvinSketch [21] three-dimensional alignment tool.The new molecule was further manually optimized to better the alignment (Figure 7).
To evaluate the binding energy of the original 3F9 structure and of our bioisostere structure, we used the AutoDock Vina [22] score only function.The binding energy obtained for the original structure was -9.7 kcal/mol, while for the bioisosteric structure it was -8.4 kcal mol -1 , which is within the standard error of 2.8 kcal mol -1 for AutoDock Vina. [22]ased on this result, we can therefore predict that the binding energy of our new ligand might be similar to that of the original co-crystallized ligand.

CONCLUSION
We developed a knowledge-based method that uses binding sites superposition to identify possible bioisosteric and scaffold hopping replacements for existing inhibitors.The novel predicted bioisosteric replacements were obtained from existing crystal structures of ligands in the PDB, which makes us confident that the newly obtained compounds will retain the activity.A major difficulty was to decide how the join atoms should be considered.Join atoms that are a part of reference fragments can make some fragments, despite the large size of the PDB database, unique, and in such cases no suitable bioisosteric replacements can be obtained.Thus to expand the possibility of finding good replacements, we defined five rules that can be used for this purpose.Another problem with the structures obtained is the possible synthetic inaccessibility of these compounds.However, methods exist that can check the theoretically built structure for synthetic accessibility; [23] therefore we can still screen out compounds which could not be efficiently synthesized, thus focusing on accessible compounds in further experimental studies.We showed a use case of our new method on the example of butyrylcholinesterase inhibitors.The generated bioisostere compound had the predicted energy of binding almost equal to that of the reference butyrylcholinesterase inhibitor.A definite confirmation of the usefulness of this method will come from experimental studies, which are planned in the future.It is also our plan to implement this method in LiSiCA, [24] our ligand-based virtual screening software, to enable searching large databases for similar ligands not only on the basis of atom type similarity but also based on possible bioisosteric or scaffold hopping replacements.

Figure 1 .
Figure 1.Rules for bioisosteric replacements.Rule 1: N.pl join atom (red circle) is replaced by C.sp2 join atom (green circle), having the same hybridization state, to enable connection of the new bioisosteric fragment (blue) with the rest of the compound structure (shown in Figure5).Rule 2: reference fragment (orange) and the bioisosteric replacement fragment (blue) overlap due to the alignment of their co-crystallized binding sites.The C.sp3 join atom in the bioisosteric fragment (red circle) is removed as there is no corresponding overlapping atom in the reference structure.The bioisostere N.sp2 join atom is overlapped with the reference C.sp2, thus considered further (green circle).Rule 3: to reduce the uniqueness of the reference fragment the N.pl join atom is removed (red circle).Then, database screening is initiated and overlapping bioisosteric structures are found (blue).Then, Rule 4 cannot be used, as we have no join atoms on the reference (orange) structure (green circle).Therefore, we reconnect the original N.pl join atom to the reference fragment (green circle).Rule 4: the N.pl join atom is created on the bioisosteric fragment as a copy of the reference join atom (green circle).

Figure 2 .
Figure 2. Molecular Hausdorff distance calculation.Onesided (oHD) distances are calculated for overlapping molecules A and B. oHD(A, B) is the minimum distance between a surface point of an atom in molecule A that is the furthest from any surface point in molecule B, and a surface point of atoms in molecule B; the oHD(B, A) is calculated using reversed rules.The molecular HD(A,B) is the larger of the two oHDs, i.e., the oHD(A,B) (bold).

Figure 3 .
Figure 3. Overview of the bioisosteric replacement generation approach.(1) Using ProBiS, protein structures (cyan and orange ribbons) are superimposed based on their local binding site (red arrow) similarity; (2) The cocrystallized ligands (blue and orange sticks) are rotated and translated based on the superimposition matrices acquired from the protein binding sites superimposition; (3) Ligands are fragmented and the HD between pairs of overlapped fragments originating from different ligands are calculated; (4) A pair of bioisosterically replaceable fragments.

Figure 4 .
Figure 4.The structure of the butyrylcholinesterase inhibitor (PDB code 3F9) (above) and the structures of the three ring fragments obtained (below).Some fragments have overlapping atoms (join atoms); e.g., the same nitrogen atom (red asterisk) is present in both the piperidinyl and dihydroindenyl fragment.The hybridization states of join atoms are explicitly shown, i.e., C.sp2, C.sp3, and N.sp3.

Figure 5 .
Figure 5.A new bioisosteric derivative of compound 3F9 with the naphthalenyl moiety replaced by benzothiazolaminyl fragment 3 from Table 1; the N.pl join atom suggested by original bioisosteric replacement fragment was replaced according to Rule 1 with an C.sp2 atom for chemical feasibility.

Figure 6 .
Figure 6.A molecule obtained by performing bioisosteric replacements of all rings in the 3F9 ligand.The fragments used for replacements were 1, 6 and 11.

Figure 7 .
Figure 7. Bioisosteric structure (orange) generated with our approach overlapped with the original inhibitor 3F9 (blue) in the butyrylcholinesterase binding site (green).A good overlap between the structures important for binding (hetero atoms and cyclic fragments) is observed.Based on AutoDock Vina scoring function the binding energy of the 3F9 ligand is -9.7, while for the new potential inhibitor it is -8.4 kcal mol -1 .

Table 2 .
Five best examples (compounds 6-10) by HD of possible fragment replacements for the dihydroindenyl fragment.

Table 1 .
Five best examples (compounds 1-5) by HD of possible fragment replacements for the naphthalenyl fragment.

Table 3 .
Five best examples (compounds 11-15) by HD of possible fragment replacements for the piperidinyl fragments.