Scaffold fragmentation and substructure hopping reveal potential, robustness, and limits of computer-aided pattern analysis (C@PA)

Basic scaffold dissection and potential positive hit identification
In our latest report about C@PA, we identified so-called 'multitarget fingerprints' for the prediction of broad-spectrum ABCB1, ABCC1, and ABCG2 inhibitors amongst a manually assembled and curated initial dataset of 1,049 compounds [15]. The model was generated on the basis of (i) the identification of basic scaffolds amongst the most potent known ABCB1, ABCC1, and ABCG2 inhibitors; (ii) the definition of substructures with a positive impact on multitarget ABCB1, ABCC1, and ABCG2 inhibition; and (iii) the definition of substructures with a negative impact on multitarget ABCB1, ABCC1, and ABCG2 inhibition. As a result, compounds 8-9 as well as 11 were discovered by virtual screening as so-called 'class 7 compounds' (IC50 values below 10 µM toward ABCB1, ABCC1, and ABCG2; Fig. 3).
As a first step, we dissected the basic scaffolds ('Scaffold Fragmentation'; Fig. 4 A) as derived by C@PA [15], which resulted in the first extension of the positive pattern fingerprints with potential positive hits: (i) pyrimidine; (ii) pyrrole; (iii) pyridine; and (iv) thiophene. As a second step, we extended the structural variety of the non-aromatic heterocycles ('Heterocyclic Substructure Hopping'; Fig. 4 B) as derived and proposed by C@PA: (i) imidazolidine deduced from piperazine and homo-piperazine; (ii) homopiperidine and pyrrolidine deduced from piperidine; and (iii) homo-morpholine and oxazolidine deduced from morpholine. It must be noted that pyrrolidine was earlier identified by C@PA as a 'clear negative hit' [15]. However, for a detailed investigation of piperidine derivatives and their impact on ABCB1, ABCC1, and ABCG2 function, this clear negative hit needed to be overruled. As a final step, the two novel aromatic substructures found in compounds 7-11, 1,2,4-oxadiazole and 1,3,4-thiadiazole, were extended by five-membered rings that conserved features of the original substructure ('Heteroaromatic Substructure Hopping'). Here again, it must be noted that oxazole was also identified by C@PA as a clear negative hit [15]; however, it was now added for a detailed evaluation of the effect of 1,2,4-oxadiazole derivatives on ABCB1, ABCC1, and ABCG2 function. In total, the 8 'clear positive hits' as derived from C@PA [15] were extended by 5 additional suggested secondary positive hits [15] and 15 deduced potential positive hits. These 20 substructures will in the following be referred to as 'extended positive hits' ('Extended Positive Pattern').
Figure legend fragment: Compounds 12-15 resulted from a combined ligand-based approach using similarity search and pharmacophore modelling as reported by Silbermann et al. in 2019 [67]. The corresponding IC50 values can be found in Table 1. Red mark: suggested secondary positive hits as proposed before [15].
V. Namasivayam et al., Computational and Structural Biotechnology Journal 19 (2021) 3269-3283

Virtual screening and compound selection
The clear limit of C@PA was the prediction of ABCC1 inhibitors, as the discovered multitarget ABCB1, ABCC1, and ABCG2 inhibitors 7-11 were the only ABCC1 inhibitors in the biologically evaluated set of 23 compounds [15]. To counteract this limitation, we selected a virtual screening dataset that favored ABCC1 inhibition as reported by us before [67]. This set comprised 1,510 compounds that resulted from a combined virtual screening approach for the prediction of ABCC1 inhibitors. It is known that approximately 23.5% of these compounds are ABCC1 inhibitors. We favored this virtual screening dataset over other options because the identified ABCC1 inhibitors (12-15; Fig. 2) were at the same time multitarget ABCB1, ABCC1, and ABCG2 inhibitors, which possibly increased the chance of identifying novel lead molecules for broad-spectrum ABCB1, ABCC1, and ABCG2 inhibition.
As a first step, the 1,510 compounds were screened for redundant molecules in the form of stereoisomers to increase the diversity of the virtual screening dataset. In total, 281 molecules were removed, resulting in 1,229 unique compounds. In a second step, these were subjected to the negative pattern search [15]. While 383 compounds were eliminated, 846 remained in the virtual screening dataset. Finally, the 846 compounds were screened for the extended positive hits. At least one of these favored substructures was present in each of these 846 molecules, with the following distribution: (i) 1 time: 29 molecules; (ii) 2 times: 277 molecules; (iii) 3 times: 356 molecules; (iv) 4 times: 149 molecules; (v) 5 times: 34 molecules; (vi) 6 times: 1 molecule. From these 846 potential broad-spectrum ABCB1, ABCC1, and ABCG2 inhibitors, we manually selected and purchased 10 candidates (compounds 16-25; Fig. 5) depending on the manner and number of extended positive hits present and the general molecular composition, as well as commercial availability and affordability at MolPort® (www.molport.com). Fig. 6 shows the virtual screening workflow applied in this study.
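The counts in this filtering cascade can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch built from the numbers quoted above; the variable names are illustrative and the mean number of hits per molecule is a derived value that is not reported in the text.

```python
# Screening funnel; all counts are taken from the paragraph above.
initial = 1510            # compounds in the virtual screening dataset
stereo_removed = 281      # redundant stereoisomers
negative_removed = 383    # eliminated by the negative pattern search

unique = initial - stereo_removed
remaining = unique - negative_removed

# Number of molecules carrying k extended positive hits (k = 1..6).
hit_distribution = {1: 29, 2: 277, 3: 356, 4: 149, 5: 34, 6: 1}

total_in_distribution = sum(hit_distribution.values())

# Derived value (not stated in the text): average number of
# extended positive hits per remaining molecule.
mean_hits = sum(k * n for k, n in hit_distribution.items()) / total_in_distribution

print(f"unique compounds: {unique}")
print(f"after negative pattern search: {remaining}")
print(f"molecules in distribution: {total_in_distribution}")
print(f"mean extended positive hits per molecule: {mean_hits:.2f}")
```

The distribution sums to the 846 molecules that passed the negative pattern search, which is consistent with every remaining molecule carrying at least one extended positive hit.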

Biological evaluation
Compounds 16-25 were screened at 10 µM in calcein AM (ABCB1 and ABCC1) as well as pheophorbide A (ABCG2) fluorescence accumulation assays using either ABCB1-overexpressing A2780/ADR, ABCC1-overexpressing H69AR, or ABCG2-overexpressing MDCK II BCRP cells, respectively, as reported earlier [15,43,45,67,101]. Calcein AM and pheophorbide A are substrates of ABCB1 and ABCC1 as well as ABCG2, respectively, which passively diffuse into the used cells and become extruded by the corresponding ABC transporter. Inhibition of the respective transporter results in the accumulation of these substrates. Calcein AM is subsequently cleaved by intracellular esterases to the fluorescent calcein, while pheophorbide A is already fluorescent. Intracellular fluorescence was determined via microplate reader (calcein AM; ABCB1 and ABCC1) and flow cytometry (pheophorbide A; ABCG2), respectively. Compounds 2 (Fig. 1) and 26-27 (Fig. 5) were used as reference inhibitors against ABCB1, ABCC1, and ABCG2, respectively, as reported before [67,101]. Fig. 7 provides the screening results for ABCB1 (A), ABCC1 (B), and ABCG2 (C).

Pharmacophore modelling
In our previous study, we explored different ligand-based approaches to validate C@PA [15]. A pharmacophore model generated on the basis of the 6 most potent and diverse class 7 compounds (Supplementary Fig. 1) showed a sensitivity of 60.4% and a specificity of 44.5% (C@PA: 62.5% and 90.8%, respectively). Five pharmacophore features were discovered: (i-iv) F1-F4: aromatic/hydrophobic; and (v) F5: acceptor (Fig. 9 A). In the present study, we aimed for an additional investigation of the potential binding properties of compound 23. Hence, we generated conformers of compound 23 and searched them against the recently presented pharmacophore model for triple ABCB1, ABCC1, and ABCG2 inhibitors [15]. As can be seen in Fig. 9 B, compound 23 reflected all five pharmacophore features of the multitarget pharmacophore model [15], which is in agreement with compound 23 being a moderately potent triple ABCB1, ABCC1, and ABCG2 inhibitor. In addition, compound 23 also reflected all five pharmacophore features derived from the previously reported similarity search and pharmacophore modelling approach [(i) F1: aromatic; (ii-iii) F2 and F3: aromatic/hydrophobic; (iv) F4: hydrophobic; and (v) F5: acceptor; Fig. 9 C-D] [67]. This suggests that compound 23 represents a good lead molecule for further improvement via synthesis to gain novel potent multitarget ABCB1, ABCC1, and ABCG2 inhibitors with a focus on ABCC1 inhibition. Furthermore, the findings support the hypothesis of a common multitarget binding site amongst different ABC transporter subfamilies as postulated earlier [15,74].

Statistical framework of C@PA
The aim of the present study was to extend the knowledge regarding multitarget fingerprints independent of the statistical background as reported previously [15]. This measure was necessary as it is almost impossible to change the statistical distribution of substructures amongst multitarget inhibitors (classes 4-7), specifically class 7 molecules, but also non-multitarget compounds (classes 0-3), unless a compound library of significant size (hundreds of compounds) compared to the initial dataset of 1,049 compounds of C@PA [15] is synthesized and biologically evaluated on all three transporters. This is unlikely to happen within the next few years.
Figure legend fragment: (27), respectively, used in the present study [67,101]. The corresponding IC50 values of compounds 16-25 can be found in Table 1.
The clear limit of the present study was the discovery of class 7 compounds, which supports the threshold values initially set as the selection criteria of C@PA [15]. The C@PA-derived clear positive hit and clear negative hit substructures are an important framework to obtain potent multitarget ABCB1, ABCC1, and ABCG2 inhibitors [15]. In particular, the 32 clear negative hits proved to be of major importance compared to the only 8 clear positive hits found. However, the present work revealed that changes in these substructure compositions are tolerated, indicating an acceptable robustness of C@PA. This can also be seen when comparing the initial hit rate for multitarget ABCB1, ABCC1, and ABCG2 inhibition of the virtual screening dataset (23.5%) [67] with the hit rate of 40% found in the present work, which indicates that C@PA_1.2 is an even more powerful methodology for the prediction of broad-spectrum ABCB1, ABCC1, and ABCG2 inhibitors. Strikingly, the present work demonstrated that the combination of C@PA with other computational approaches, in particular similarity search and pharmacophore modelling, led to a predictive synergism. Hence, the refinement of computational chemistry approaches with improved patterns and datasets may provide even higher biological hit rates in further developed pattern analysis models (e.g., C@PA_1.X).
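The hit-rate comparison can be made explicit with a short calculation. This sketch assumes that the 40% refers to the four multitarget inhibitors named in this report (compounds 16 and 22-24) amongst the ten purchased candidates (compounds 16-25); the fold-change is a derived value and is not stated in the text.

```python
# Assumption: the 40% hit rate corresponds to 4 multitarget inhibitors
# (compounds 16, 22, 23, 24) out of the 10 purchased candidates (16-25).
confirmed_multitarget = [16, 22, 23, 24]
tested_candidates = list(range(16, 26))

hit_rate = len(confirmed_multitarget) / len(tested_candidates)

# Initial hit rate of the virtual screening dataset as quoted in the text [67].
baseline_hit_rate = 0.235

# Derived value: how many times higher the new hit rate is.
fold_change = hit_rate / baseline_hit_rate

print(f"hit rate: {hit_rate:.0%}")
print(f"fold change over baseline: {fold_change:.2f}")
```

With only ten tested compounds, the ratio should be read as a rough indication rather than a precise estimate.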

Potential of extended positive hits: under-represented substructures
Several defined extended positive hits were reflected in the discovered multitarget ABCB1, ABCC1, and ABCG2 inhibitors 16 and 22-24, namely (i) pyrimidine (24), (ii) pyridine (22-24), (iii) isoxazole (16), (iv) imidazole (23), (v) pyrazole (22 and 24), and (vi) pyrrolidine (16). This discovery showed that the 8 clear positive hits as derived from C@PA [15] may indeed be supported by secondary positive hits, revealing the high potential of substructure extension in C@PA. A detailed analysis of these substructures according to their statistical distribution amongst the 133 known multitarget ABCB1, ABCC1, and ABCG2 inhibitors [15,43,45,62,67,75, showed that the substructures isoxazole, imidazole, pyrazole, and pyrrolidine occurred only 1, 3, 1, and 2 times [67,96,106,107,117,127], respectively, in these 133 compounds, and were generally only present in 1, 16, 8, and 8 molecules, respectively, of the initial dataset of 1,049 compounds as used in C@PA [15]. Our results indicate that these 'underrepresented substructures' have a high exploratory potential for the improvement of C@PA's prediction capabilities and the discovery of novel pan-ABC transporter inhibitors, as their statistical evaluation as performed in our previous report [15] can easily be changed by a small number of additional compounds.
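To illustrate how sparsely these substructures are represented, the counts quoted above can be converted into percentages. This is a minimal sketch using only the numbers from the paragraph; the dictionary layout and helper name are illustrative.

```python
# Totals quoted in the text.
MULTITARGET_TOTAL = 133   # known multitarget ABCB1, ABCC1, and ABCG2 inhibitors
DATASET_TOTAL = 1049      # initial C@PA dataset

# substructure: (occurrences amongst the 133 inhibitors,
#                molecules in the 1,049-compound dataset)
counts = {
    "isoxazole": (1, 1),
    "imidazole": (3, 16),
    "pyrazole": (1, 8),
    "pyrrolidine": (2, 8),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


for name, (in_multitarget, in_dataset) in counts.items():
    print(
        f"{name:12s} "
        f"{percent(in_multitarget, MULTITARGET_TOTAL):5.2f}% of multitarget inhibitors, "
        f"{percent(in_dataset, DATASET_TOTAL):5.2f}% of initial dataset"
    )
```

Each substructure is present in well under 2% of the initial dataset, which is why a handful of newly tested compounds can noticeably shift its statistics.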

Potential of extended positive hits: rejected putative positive substructures
The omnipresent substructures pyrimidine [15,17] and pyridine [15] must be seen in a different light, as these cannot be regarded on their own as indicators for multitarget ABCB1, ABCC1, and ABCG2 inhibition due to their ubiquity. However, our results indicate that these substructures generally have a positive impact on broad-spectrum ABCB1, ABCC1, and ABCG2 inhibition, depending on the composition of and combination with other substructures. Statistically, pyrimidine and pyridine occurred 56 and 28 times, respectively, in the 133 known multitarget ABCB1, ABCC1, and ABCG2 inhibitors [15,43,45,62,67,75,. In terms of class 7 compounds, 26 and 14 molecules contained pyrimidine and pyridine, respectively [15]. Indeed, pyrimidine and pyridine could not be considered as clear positive hits in our previous study [15] because many compounds of the other classes 0-6 contained these substructures as well (407 and 209 molecules, respectively). However, these 'rejected putative positive substructures', which nevertheless resulted in a significant number of class 7 molecules, must be taken into special consideration for the further improvement of C@PA's prediction capabilities (e.g., C@PA_1.X). Besides pyrimidine and pyridine, we identified 14 more substructures from the initial dataset of 1,049 compounds [15] that should be reconsidered in terms of multitarget ABCB1, ABCC1, and ABCG2 inhibition in particular, and pan-ABC transporter inhibition in general: (i) aniline; (ii) benzoyl; (iii) benzyl; (iv) cyano; (v) 9-deazapurine; (vi) ether; (vii) ethylenediamine; (viii) methoxy; (ix) methoxyphenyl; (x) phenol; (xi) phenyl; (xii) piperazine; (xiii) pyrrole; and (xiv) resorcin. Cyano, methoxy, and piperazine were already proposed in our previous study as secondary positive hits [15]. Nevertheless, it must clearly be noted that the percentage of occurrence of these particular substructures amongst class 7 compounds is rather low. However, they might support other, clearer positive indicators of broad-spectrum ABCB1, ABCC1, and ABCG2 inhibition, enhancing compound potency through their proportionate contribution and combination, which represents a high potential for further developed C@PA-derived models (e.g., C@PA_1.X).
Table 1. The determined IC50 values of compounds that resulted in an inhibition level of ≥20% [+ standard error of the mean (SEM)] in the preliminary screening (Fig. 7 A-C), determined in calcein AM (ABCB1 and ABCC1) and pheophorbide A (ABCG2) assays, respectively, applying ABCB1-overexpressing A2780/ADR, ABCC1-overexpressing H69AR, and ABCG2-overexpressing MDCK II BCRP cells, respectively, as described earlier [15,43,45,67,101]. The reference inhibitors (ABCB1: 2; ABCC1: 26; ABCG2: 27) served as positive controls as already reported earlier [67,101], defining 100% inhibition. Buffer medium served as a negative control (0%). Shown is mean ± SEM of at least three independent experiments. Light rose mark: IC50 values of the triple ABCB1, ABCC1, and ABCG2 inhibitors 7-15 as reported earlier [15,67]; dark rose mark: novel multitarget ABCB1, ABCC1, and ABCG2 inhibitors discovered within this work. a: Compound was reported before [15]. b: Compound was reported before [67]. c: Not determined due to the lack of inhibitory activity in the initial screening (Fig. 7 A-C).
The present study contributed substantially to the understanding of pattern analysis and of the possibilities to extend chemical patterns in order to enhance the prediction rate for biologically active compounds. The statistical distribution of certain substructures that occurred in class 7 or class 4-6 molecules in the initial dataset of 1,049 compounds needs revision and re-evaluation, taking the results of the present study into account.
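As a small worked example of such a statistical re-evaluation, the pyrimidine and pyridine counts reported earlier (26 and 14 class 7 molecules versus 407 and 209 molecules of classes 0-6) can be turned into class 7 shares. The sketch assumes that the class 0-6 counts do not include the class 7 molecules, so that the two numbers add up to all molecules containing the substructure; this reading is an assumption and is not stated explicitly in the text.

```python
# substructure: (class 7 molecules, class 0-6 molecules) as quoted in the text.
# Assumption: the two groups are disjoint, so their sum is the total number
# of molecules in the initial dataset that contain the substructure.
occurrences = {
    "pyrimidine": (26, 407),
    "pyridine": (14, 209),
}


def class7_share(class7: int, other_classes: int) -> float:
    """Percentage of substructure-containing molecules that are class 7."""
    return 100.0 * class7 / (class7 + other_classes)


for name, (class7, other) in occurrences.items():
    total = class7 + other
    print(f"{name:10s} {class7:3d} of {total:3d} molecules are class 7 "
          f"({class7_share(class7, other):.1f}%)")
```

Under this assumption, roughly 6% of the molecules containing either substructure are class 7 compounds, which matches the statement that these substructures alone are not sufficient indicators.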
We propose a ranking methodology to maximize the impact of secondary positive substructures in combination with primary positive hits for the best possible multitarget ABCB1, ABCC1, and ABCG2 inhibition. Deciphering the interconnection between the manner, number, and orientational composition of certain substructures and their maximal possible impact on ABCB1, ABCC1, and ABCG2 will provide potential candidates for biological screening on other ABC transporters, exploring their nature, function, as well as their suitability as therapeutic or diagnostic drug targets. Furthermore, recent advances in structural biology methodologies, such as cryo-EM, have increasingly provided structural information on ABC transporters of different subfamilies. This will allow for the analysis of the 'multitarget binding site' [15,74] with the identified multitarget pan-ABC transporter inhibitors by applying a combination of structure-based computational approaches. Using the knowledge derived from C@PA, C@PA_1.2, and potentially C@PA_1.X, new truly multitarget pan-ABC transporter modulators may be derived that could address less- and under-studied ABC transporters to tackle common and rare human diseases.

Virtual screening dataset
The virtual screening dataset of the 1,510 putative ABCC1 inhibitors was derived by a combined similarity search and pharmacophore modelling approach as described earlier [67]. In short, an initial dataset of 288 known ABCC1 inhibitors with definite IC50 values was collected from ChEMBL [128] and categorized ['active' (IC50 < 1 µM); 'moderate' (IC50 = 1-10 µM); 'inactive' (IC50 > 10 µM)]. A similarity search applying the FTrees algorithm [129,130] from BioSolveIT GmbH (Sankt Augustin, Germany) was conducted with a Tanimoto coefficient (Tc) of 0.8, by which the database was analyzed according to 4 query molecules (Supplementary Fig. 2) [101,104,131,132]. The flexible alignment tool as well as the MMFF94x force field implemented in MOE (version 2016.08; Chemical Computing Group ULC, Montreal, QC, Canada) were applied for pharmacophore modelling, using UNICON [133] to generate the 1000 best (quality level 3) conformers, with a tolerance distance of 1.5 Å and a threshold of 50.0% conservation. Virtual screening was performed with the ZINC12 library [134] consisting of 16,403,865 molecules, from which a set of 1,510 molecules resulted as potential ABCC1 inhibitors.
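The activity categories and the similarity cut-off can be sketched in a few lines. The function names below are hypothetical. The Tanimoto function shows only the generic set-based definition of the coefficient; FTrees compares feature trees rather than plain feature sets, so this is not a reimplementation of that algorithm.

```python
def categorize_ic50(ic50_um: float) -> str:
    """Assign the activity category used for the ChEMBL-derived dataset.

    Thresholds follow the text: active < 1 µM, moderate 1-10 µM,
    inactive > 10 µM.
    """
    if ic50_um < 1.0:
        return "active"
    if ic50_um <= 10.0:
        return "moderate"
    return "inactive"


def tanimoto(features_a: set, features_b: set) -> float:
    """Generic set-based Tanimoto coefficient: |A ∩ B| / |A ∪ B|."""
    union = len(features_a | features_b)
    return len(features_a & features_b) / union if union else 0.0


# Hypothetical feature sets, for illustration only.
query = {"f1", "f2", "f3", "f4"}
candidate = {"f1", "f2", "f3", "f4", "f5"}

similarity = tanimoto(query, candidate)
print(categorize_ic50(0.5), categorize_ic50(5.0), categorize_ic50(25.0))
print(f"Tc = {similarity:.2f}")
```

In this toy case the candidate shares four of five distinct features with the query and therefore just reaches a coefficient of 0.8.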

Chemicals
The reference ABCB1 inhibitor 2 as well as the reference ABCG2 inhibitor 27 were purchased from Tocris Bioscience (Bristol, UK). The standard ABCC1 inhibitor 26 was synthesized as described earlier [101].

Calcein AM assay
The inhibitory activity against ABCB1 and ABCC1 was evaluated in a calcein AM assay as reported earlier [15,67,101]. Compounds 16-25 were added to a 96-well flat-bottom clear plate (Greiner, Frickenhausen, Germany) at a concentration of 100 µM, and 160 µL of cell suspension containing either ABCB1-overexpressing A2780/ADR (30,000 cells/well) or ABCC1-overexpressing H69AR (60,000 cells/well) cells were added. The incubation period at 37 °C under a 5% CO2-humidified atmosphere lasted 30 min before 20 µL of a 3.125 µM calcein AM solution was added to each well, followed by measurement of the fluorescence increase at an excitation wavelength of 485 nm and an emission wavelength of 520 nm in 60 s intervals for 1 h in either POLARstar or FLUOstar Optima microplate readers (BMG Labtech, software versions 2.00R2/2.20 and 4.11-0; Offenburg, Germany). The slope of the linear fluorescence increase gave the effect value, which was normalized to the effect value of 10 µM of either compound 2 (ABCB1) or compound 26 (ABCC1). As candidates 16, 17, 19, and 21-24 as well as 16, 18, and 22-24 resulted in significant inhibition (20% + SEM) of ABCB1 and ABCC1, respectively, full concentration-effect curves were generated and IC50 values were calculated applying GraphPad Prism (version 8.4.0, San Diego, CA, USA) using the statistically preferred model (three- or four-parameter logistic equation).
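The data reduction described above (slope of the linear fluorescence increase, normalization to the reference inhibitor, logistic concentration-effect model) can be sketched as follows. This is an illustrative outline and not the procedure implemented in GraphPad Prism. The function names and all numerical inputs are hypothetical, and the subtraction of a buffer baseline is an assumption based on the statement that buffer medium defined 0% inhibition.

```python
def linear_slope(times, signal):
    """Least-squares slope of fluorescence versus time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    numerator = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


def percent_inhibition(slope, slope_buffer, slope_reference):
    """Scale a slope so that buffer = 0% and reference inhibitor = 100%."""
    return 100.0 * (slope - slope_buffer) / (slope_reference - slope_buffer)


def four_parameter_logistic(conc, bottom, top, ic50, hill):
    """Effect at a given concentration; fixing hill = 1 gives the
    three-parameter variant."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)


# Hypothetical readings: fluorescence sampled every 60 s.
times_s = [0, 60, 120, 180]
fluorescence = [100.0, 160.0, 220.0, 280.0]

slope = linear_slope(times_s, fluorescence)
inhibition = percent_inhibition(0.6, slope_buffer=0.2, slope_reference=1.0)
half_effect = four_parameter_logistic(2.0, bottom=0.0, top=100.0, ic50=2.0, hill=1.0)

print(f"slope: {slope:.2f} fluorescence units per second")
print(f"inhibition: {inhibition:.1f}%")
print(f"effect at IC50: {half_effect:.1f}%")
```

By construction, the logistic model returns the midpoint between bottom and top when the concentration equals the IC50, which is what the fitted IC50 values in Table 1 represent.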

Pheophorbide A assay
The inhibitory activity against ABCG2 was evaluated in a pheophorbide A assay as reported earlier [15,67].

Retrospective pharmacophore analysis
In our earlier study, a pharmacophore model was generated to evaluate the performance of C@PA [15]. This model was generated on the basis of the 6 most potent and diverse ABCB1, ABCC1, and ABCG2 inhibitors (Supplementary Fig. 1) [43,99,101,103,110] by aligning these molecules using the flexible alignment tool as described in 4.1.1, applying MOE (version 2019.01) [15]. The best alignment was selected and the pharmacophore model was generated using the consensus method implemented in the Pharmacophore Query Editor with a threshold value of 50.0% and a tolerance distance of 1.2 Å. The conformers of the most potent multitarget ABCB1, ABCC1, and ABCG2 inhibitor in this work, compound 23, were generated using the conformational search tool with the stochastic search method implemented in MOE 2019.01. The default parameters were applied for the conformational search with a maximum limit of 10,000.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.