Identi ﬁ cation of mitochondrial toxicants by combined in silico and in vitro studies – A structure-based view on the adverse outcome pathway

Drugs that modulate mitochondrial function can cause severe adverse e ﬀ ects. Unfortunately, mitochondrial toxicity is often not detected in animal models, which stresses the need for predictive in silico approaches. In this study we present a model for predicting mitochondrial toxicity focusing on human mitochondrial respiratory complex I (CI) inhibition by combining structure-based methods with machine learning. The structure-based studies are based on CI inhibition by the pesticide rotenone, which is known to induce parkinsonian motor de ﬁ cits, and its analogue deguelin. After predicting a common binding mode for these two compounds using induced-ﬁ t docking, two structure-based pharmacophore models were created and used for virtual screening of DrugBank and the Chemspace library. The hit list was further re ﬁ ned by three di ﬀ erent machine learning models, and the top ranked compounds were selected for experimental testing. Using a tiered approach, the compounds were tested in three distinct in vitro assays, which led to the identi ﬁ cation of three speci ﬁ c CI inhibitors. These results demonstrate that risk assessment and hazard analysis can bene ﬁ t from combining structure-based methods and machine learning.


Introduction
Mitochondria are eukaryotic cell organelles which are known as the power plants of the cells, since they are capable of producing vast amounts of adenosine triphosphate (ATP) via oxidative phosphorylation [1]. Additionally, mitochondria are also involved in several other metabolic processes such as fatty acid oxidation [2], iron-sulfur cluster synthesis [3], or calcium signaling [4]. Due to their central role in metabolism, mitochondrial dysfunction is linked to various diseases like cancer [5], cardiovascular diseases [6], diabetes [7], liver diseases [8], and also to neuronal diseases such as Parkinson's or Alzheimer's disease [9]. Consequently, drugs that impede mitochondrial function can cause severe adverse effects. The antidiabetic drug troglitazone, for example, was withdrawn due to severe liver injury that was later on shown to be linked to toxic effects on mitochondria [10]. Unfortunately, mitochondrial toxicity is often not detected in animal studies as the animals used are young, mostly rodents, and have strong mitochondrial reserves [11].
On a molecular basis, mitochondrial toxicity of a compound is linked to interaction with e.g. the phospholipid bilayer, the mitochondrial DNA (mtDNA), or the calcium uniporter [12]. However, one of the most prominent mechanisms leading to mitochondrial toxicity is modulation of the electron transport chain (ETC), where oxidative phosphorylation takes place. The ETC is located in the mitochondrial inner membrane and comprises four protein complexes (CI -IV), which establish an electrochemical proton gradient that provides the energy for the ATP-synthase to create ATP from ADP (adenosine diphosphate) [13].
In risk assessment, the framework of adverse outcome pathways (AOP) links biological/toxicological data to adverse effects [14,15]. Recently, an AOP has been published which underlines the evidence that CI inhibition by the pesticide rotenone (1) leads to Parkinsonian motor deficits. In this specific AOP, the molecular initiating event is the binding of the inhibitor to CI. Afterwards, several key events, such as inhibition of CI, mitochondrial dysfunction, and neuro-inflammation lead to the adverse outcome Parkinsonian motor deficits [16]. CI consists of 45 subunits (44 unique protein chains) whose complete cryoEM structure was published in 2017 (PDB ID: 5XTD, 3.7Å) [17] (Fig. 1a).
The goal of this study was to target the first key event in the AOP, i.e. the CI inhibition, by structure-based modeling approaches. Combining docking of compounds into the rotenone binding site at CI with pharmacophore-based screening and machine learning led to a workflow which could identify compounds prone to cause mitochondrial toxicity via CI inhibition.
The model was validated via virtual screening of two large compound libraries followed by selection of twelve compounds for experimental testing in three different in vitro assays. Using a tiered approach, we first assessed the perturbation of mitochondrial membrane potential (MMP), a measure of mitochondrial integrity. Subsequently, we assessed the substances' hazard to be neurotoxic and mitotoxic, using the Glc-Gal-NeuriTox test [20]. Finally, we assessed specificity over the other respiratory complexes.

Alignment
The binding site of rotenone in CI is located from mutation studies in Yarrowia lipolytica. Human NDUFS2 and yeast NUCM share the same fold and have a sequence identity of 64%. The residues mutated in the proposed rotenone binding site (NDUFS2 and NDUFS7, D140, S143, M185, F200, F204, V457, D455 and M94 human nomenclature and numbering) are conserved in yeast and human CI [19]. The multiple sequence alignment was performed using the online tool promals3D [21,22] implementing the NDUFS2 and the NDUSF7 subunit from seven different organisms (human, bovine, mouse, Yarrowia lipolytica, Rhodobacter capsulatus, Escherichia coli and Thermus thermophilus). (ESM1, Suppl. Item 5)

Docking and clustering
Protein and ligand preparation, induced fit docking and clustering have been performed in Maestro 2017-4 (SCHRÖDINGER), [23] using the OPLS3 force field. Ligprep and the Protein preparation wizard were used with default settings, with pH 7 ± 2. To generate various conformers of the ligands the conformational search tool has been used. Docking has been performed, using the induced-fit docking protocol with default settings and extended sampling. Induced-fit docking was used to allow protein flexibility within the pocket [24]. The center of the grid was set between the residues 140, 200, 457 in NDUFS2 of 5XTD. Docking was performed without constraints, including only the subunits around the binding-site (chain: B, C, E, P, N, Q, j and s). Clustering of rotenone and deguelin was performed within the conformer clustering module according to their common scaffold.
. Clustering based on Volume Overlap was performed regarding the normalized volume overlap and complete as linkage method. RMSD calculations were performed using the RMSD-calculation node of the Schrödinger package in KNIME 3.6.2.

Pharmacophore modeling and virtual screening
Pharmacophore modeling and virtual screening were performed in LigandScout 4.2.1 [25,26]. The preparation of the screening libraries was performed with the iCon module, using iCon Best (max. 200 conformations per compound). The screening was performed using iscreen module, with default settings.

Machine learning models
The three machine learning models used in this study, a gradient boosting [27], a random forest [28] and a deep learning model ensemble [29], were recently developed by us [30]. Briefly, the models were trained on a dataset for mitochondrial toxicity compiled from the Tox21 dataset [31], the Zhang dataset [32], ChEMBL [33] and by literature search. In total the dataset consists of 5761 compounds (4940 actives, and 827 inactives) with the cutoff for activity set at a pChEMBL value ≥ 5. For training the gradient boosting model the data was split into training (80%) and test (20%) set. Further it was trained using an extensive model selection workflow available on the KNIME 3.6.3 analytics platform on the examples server 04/Analytics/11_Optimization/08Model_Optimisazion_and_Selection using five different type of fingerprints (ECFC6, ECFP6, ECFP4, AtomPair and RDKit) and different model algorithms. The optimal combination of fingerprint, algorithm, and hyperparameters was found via a large grid search. For the deep learning model a nested cross-validation with folds based on a previous clustering was used to determine the best hyperparameters. The deep learning model was trained using keras and tensorflow and the random forest model using the scikit-learn library version 0.19.0. The random forest model was trained with 10 RDKit descriptors and a smaller subset of 1412 randomly chosen molecules. All three models have been validated using a common external test set. For further details see Hemmerich, et al. [30] and also our GitHub repository https://github.com/ PharminfoVienna/Mitochondrial-Toxicity.

Compounds synthesis
Compounds 3-14, were bought from Enamine (https://enamine. net/), where they were synthesized (Catalogue numbers: Z1550449875, Z1417029922, Z33474488, Z28738114, Z2666334399, Z2647724547, Z2222823426, Z1748268470, Z1840858432, Z1728504684). All of the compounds reported in the manuscript have a purity of ≥ 95% (except 13, 91%).  [17]). a The complex is divided into a membrane arm, were proton pumps are located, and into the matrix arm where the ubiquinone binding site is located [18]. The inhibitor binding site is located next to the ubiquinone binding site at the interface of the sub units NDUFS2 and NDUFS7 [19]. b The pesticides and CI inhibitors rotenone (1) and deguelin (2).

HepG2 cell culture
The HepG2 cell line was maintained in complete medium consisting of DMEM (Dulbecco's modified Eagle's medium, Fisher Scientific), 10% FBS (fetal bovine serum, South American Fisher Scientific) and penicillin/streptomycin (respectively 25U/ml and 25 µg/ml, Fisher Scientific). Cells were kept at 37°C and 5% CO 2 .

Mitochondrial membrane potential assay in HepG2
Cells were seeded at a density of 10.000cell/well in a black µclear plate (384-wells, Greiner Bio-One). Upon 48 h the nuclei were loaded with Hoechst-33342 (200 ng/ml, life technologies) and Rhodamine123 (rho123, 1 µM, Sigma Aldrich). After 75 min of incubation the medium was replaced with medium containing rho123 (0.2 µM) and propidium iodide (100 nM). Subsequently cells were exposed to a concentration range of the chemicals. The Hoechst, rho123 and PI signal intensity (respectively 408, 488 and 561 nm laser) was tracked for 24 h every hour using a high throughput imaging platform (Nikon TiE200 with perfect focus system and automated stage; Nikon Amsterdam, The Netherlands).
The obtained images of the 3 dyes were analysed using CellProfiler (version 2.2.0 broad institute, Cambridge USA). The used pipeline was built based on a previous workflow in CellProfiler [34]. The Hoechst intensity was used to identify single nuclei (segmentation module) [35]. Rho123 intensity was assessed in the cytoplasm, which was segmented as a distance of 5 pixels around the nucleus. Finally, cell death was assessed as the fraction of nuclei that were both positive for the Hoechst and PI signal. The obtained intensity values per cell were stored in a HDF5file. Membrane potential values are represented as an average per picture of the integrated pixel intensity of all identified nuclei and related cytoplasms.

Glucose-Galactose-NeuriTox test
LUHMES cells were differentiated for 48 h (d2), detached with 0.05% trypsin/EDTA (Invitrogen) and seeded into 96 well plates at a density of 100,000 cells/cm 2 . One hour after seeding (i.e. after attachment), the cells were treated for 24 h with the test substances. On d3, cells were life-stained with 1 µg/ml Hoechst H-33342 and 1 µM calcein-AM. Automated microscopy and high-content image analysis was performed to identify viable cells (Hoechst and calcein doublepositive), dead cells (Hoechst single-positive), relative viability (V = life cell count/total cell count) and the corresponding neurite area (NA). The NA was calculated as described before (Di et al., 2012;Wink et al., 2017). In brief, from the total calcein-positive area, the somatic area was subtracted, which was identified by expansion of the nuclear region). Concentration response curves were obtained and fitted using a 4-parameter Hill function with the upper and lower asymptote constrained to 0% and 100%, respectively. Substances were classified to be specifically neurotoxic, if their EC 25 ratio (V/NA) was in either or both of the media ≥ 4. Substances were classified to be specific mitochondrial toxicants, if their ratio of neurite outgrowth impairment between Glc and Gal condition was ≥ 2 (EC 25 NA(Glc/Gal)) [20,37].

Seahorse mitochondrial CI and III activity analysis
For the analysis of CI and CIII activity/inhibition, proliferating LUHMES cells were seeded into Seahorse 24 well plates at a density of 205,000 cells/cm 2 one day before the assay in proliferation medium. The assay was started by permeabilization of the cells using MAS-buffer (70 mM sucrose, 220 mM mannitol, 10 mM KH 2 PO 4 , 5 mM MgCl 2 , 2 mM HEPES, 1 mM EGTA, 4 mg/ml fatty acid free BSA; pH = 7.2) supplemented with 1 mM ADP and 25 µg/ml digitonin, for 13 min directly before starting the Seahorse measurements. CI and III activity was assessed by monitoring the oxygen consumption rate (OCR) before and after injection of the test substances and complex-specific substrates and inhibitors in parallel between test substance treated wells and solvent-control treated wells. As CI substrate, 5 mM pyruvate, 2 mM L-glutamine and 2.5 mM malate was used. To fuel CIII, 250 µM duroquinol, 1 µM rotenone (CI inhibitor) and 5 mM malonate (CII inhibitor) were used. To assess the non-mitochondrial respiration, 0.5 µM antimycin A and 0.5 µM rotenone were injected. Mitochondrial activity (i.e. OCR) was defined to be the total OCR subtracted by the non-mitochondrial OCR, always using values relative to solvent control. The relative complex inhibition was calculated as 100% -activity.

Identification of new CI inhibitors by structure-based design
The binding pocket was identified based on residues known from previously published mutations in Yarrowia lipolytica at the interface of NDUFS2 and NDUFS7. In this study, residues were mutated in order to impair the physicochemical properties and shape of the binding site. The study presents mutations which lead to a loss of the inhibitory potency of rotenone (D143A, S146C, M188C, F203E, D458A, V460L and M91C (yeast numbering)) [19]. Thus we hypothesized that these residues were essential for the binding of rotenone (1). For creating binding mode hypotheses for rotenone and its structural analogue deguelin (2), we performed induced-fit docking of both compounds into the human CI (PDB ID 5XTD (17)). The 521 poses (109 for deguelin, 412 for rotenone) retrieved from induced-fit docking were clustered according to their common scaffold [38,39], which led to four clusters containing rotenone as well as deguelin poses (ESM1, Suppl. Item 1). For each of the four clusters, the root mean square deviation (RMSD) between rotenone and deguelin poses was calculated. The poses were ranked according their RMSD and their docking score (XP GScore), followed by visual inspection of the poses. In the final proposed binding mode (Fig. 2), the deguelin pose is the best scored deguelin pose and the rotenone pose is within the top 100 of 412 rotenone poses. The RMSD between the two poses is 0.257 Å and for both, rotenone and deguelin, hydrogen bonding between the methoxy groups and T83 and G85 of NDUFS7 can be observed. Furthermore, both rotenone and deguelin poses were located in the second most populated cluster (74 poses; 20 deguelin, 54 rotenone), which also contains the best scored rotenone and deguelin poses.
The proposed common binding mode of rotenone and deguelin was then used to create two structure-based pharmacophore models, using LigandScout 4.2.1 (Fig. 3). First, a very specific model, containing nine features and a dense excluded volume coat, which is referred to as pharmacophore A, and, second, a less specific model containing only five fixed and one optional feature with a less dense excluded volume coat, which is referred to as pharmacophore B. The optional feature only catches deguelin but not rotenone. However, it was included in pharmacophore B to gain a higher specificity of the model. Subsequently, the two models were used for virtual screening of DrugBank (DB) release 5-1-1 [40] and Chemspace (CS) release Feb. 2018 ("https://chem-space.com/"). DrugBank was selected for screening to cover available and possible drugs. In fact, this library contains approximately 6000 compounds which are either drugs, withdrawn drugs or experimental drugs. Conversely, Chemspace contains approximately 50 million compounds and was chosen to cover a larger chemical space. As expected, pharmacophore A (CS 438 hits, DB 1 hit -rotenone) was much more selective than pharmacophore B (CS 105,090 hits, DB 18 hits) and therefore retrieved less hits.

Refinement of the hit list by machine learning models
To include ligand-based information and to refine the hit list using a methodologically different method, the retrieved hits were subjected to three distinct machine learning models, a gradient boosting, a random forest and a deep learning model, developed recently by us (see materials and methods and (30)). Each compound which was hit by one of the two pharmacophores, was predicted either "toxic", assigned as 1, or "nontoxic", assigned as 0, in each of the models. Further, this information was used for hit list refinement.
To validate the approach of combined structure-based toxicity prediction and machine learning, twelve compounds (3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14) were selected for experimental testing. For compound selection, we considered hits retrieved from DrugBank (pharmacophore B) and from Chemspace (pharmacophore A). Out of the 18 hits retrieved from DrugBank, nine were predicted to be active by at least one of the machine learning models. Unfortunately, only compounds 9 and 14 were commercially available and were thus subject to experimental testing. The hits retrieved from screening Chemspace (pharmacophore A) were ranked regarding their pharmacophore fit score and their prediction in the machine learning models using the Pareto-ranking node in KNIME 3.6.2. The five subsets constituting the deep learning model were taken into account separately to implement the predictions of each subset in the ranking. Subsequently, considering also commercial availability and pricing, ten of the top ranked compounds were selected for experimental testing (Table 1). Applying the Pareto-ranking, all models (pharmacophore fit score and machine learning models) were weighted identically. Hence, e.g. 3 and 11 were selected due to their predicted activity in the machine learning models, 5 on the other hand, due to the high pharmacophore fit score (96.51). The selected compounds (3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14) were subjected to in vitro testing and validation of their predicted mode of action, i.e. CI inhibition.

Experimental testing of hits from virtual screening
Using a tiered approach, we first assessed the MMP, a measure of mitochondrial integrity. Second, since mitochondrial CI-inhibitors may trigger Parkinsonian motor deficits [16], we went further using a neuronal cell line, often used in parkinsonism research and applied the Glc-Gal-NeuriTox test [20]. Additionally, we intended to be more specific on the mechanism and the adverse outcome. Thus, in the third assay, we assessed the impairment of mitochondrial respiratory chain function to specifically test for CI inhibition.

Assessment of mitochondrial membrane potential
Chemical-induced mitochondrial perturbation can be assessed in cellular systems based on effects upon mitochondrial integrity. One way to quantify mitochondrial integrity is to monitor the presence of the   MMP between the intermembrane space and the mitochondrial matrix. This method relies on the accumulation of membrane potential-dependent dyes into the negative charged matrix. Using a tiered approach, first the assessment of a general effect upon mitochondrial integrity can be performed using the HepG2 cell line, which is a cell line suitable for early and quick screening (Fig. 4a, b and c). Evaluation of the effects on MMP in a concentration and time dependent manner provides the first evidence for the identification of mitochondrial toxicants and possible parkinsonian liabilities via CI interference (Fig. 4d). All 12 identified possible CI inhibitors were assessed in a concentration range from 0.0025 µM to 50 µM over a period of 24 h for their effects on the MMP (Fig. 4e, f). Notably, none of the substances was toxic in the used assay settings (PI values not shown). The positive controls deguelin and rotenone affected the MMP already after 2 h exposure at low µM concentrations. Furthermore, the IC values of both demonstrated a shift of 10-fold at 24 h. Of the assessed new set of chemicals only 10 did not affect mitochondrial integrity in the tested concentration range. 3, 4 and 6 perturbed the MMP at high nM ranges and 5, 8 and 11 affected the MMP in the low µM range. None of the chemicals demonstrated the~10-fold difference in IC values between the early and late time points that was observed for deguelin and rotenone exposure (Fig. 4f). Overall, of the selected 12 chemicals, three (3, 4 and 6) perturbed the MMP at concentrations similar to deguelin and rotenone.

Screening for potential mitochondrial toxicants applying the Glc-Gal-NeuriTox test
The Glc-Gal-NeuriTox test was applied to identify specific mitochondrial toxicants (mitotoxicants). This test relies on the metabolic reprogramming of LUHMES cells under different (glucose (Glc) vs galactose (Gal)) medium sugar conditions (Fig. 5a). In the Gal-condition, the neuronal cells show a predominant mitochondrial phenotype, making them especially sensitive for mitotoxicants. By evaluating the neurite outgrowth, an energy-consuming and neuro-specific process, in parallel with general cell viability by automated high content imaging (Fig. 5b), specific neurotoxicants can be identified as well. The classification of the test substance as either neurotoxic and/or mitotoxic, depends on the effective concentration that impairs the assessed measures (Fig. 5c). The tested substances can either be (i) inactive (effects ≤ 25%), (ii) unspecifically cytotoxic (i.e. reduction of neurite area and viability at similar concentrations), (iii) specific neurotoxicants (i.e. impairment of neurite outgrowth at lower concentrations than viability), (iv) specific mitotoxicants (i.e. stronger effects in Gal than in Glc condition), or (v) specific neuro-and mitotoxicants.
Each compound was assessed at ≥ 10 different concentrations in three independent experiments. 4 was identified to be a specifically neurotoxic compound, but likely to be not interfering with mitochondrial metabolism (Fig. 5d). In contrast, 5 was identified to be strongly neuro-and mitotoxic (i.e. different toxicity in Glc and Gal). Also 11 was identified to be mitotoxic (Fig. 5e). 14 was strongly cytotoxic without indicating a specific mitochondrial-or neuronal toxicity. For 3, no exact prediction could be made within the concentration range tested. At the highest tested concentration, the EC 25 for neurite outgrowth impairment was reached in Gal conditions, while in Glc this parameter was not affected. These data suggest that 3 may be mitotoxic, but further data would be required for confirmation.

Assessing the impairment of mitochondrial respiratory chain function
Screening in the Glc-Gal-NeuriTox test suggested that substances 5, 11 (and 3) may indeed inhibit CI. To assess this directly, LUHMES cells were permeabilized to make their mitochondria accessible. By monitoring CI-mediated oxygen consumption at CIV (via feeding CI substrates pyruvate, malate and glutamine), CI activity was assessed (Fig. 6a). Indeed, substances 3, 5 and 11 were the only substances that Fig. 5. Glucose-Galactose-NeuriTox screen to identify potential mitochondrial inhibitors. The Glc-Gal-NeuriTox test is based on the modulation of the neurite outgrowth of LUHMES (human dopaminergic neuronal) cells, and can be used to identify specific neuro-and/or mito-toxicants. a Neuronal differentiation from proliferating precursors is performed for 48 h (two days), before cells are re-plated into 96 well plates and treated with test substances for 24 h. On day 3 of differentiation, the assay readout is performed by high content imaging and automated analysis. By testing two conditions in parallel that favor either glycolytic (glucose (Glc) condition) or mitochondrial metabolism (galactose (Gal) condition), mitochondrial toxicants can be identified by an increased toxicity in the Gal/ mitochondrial condition. b Automated high content imaging is used to record pictures of nuclei (stained with H-33342) and neuronal cytoplasm (stained by calcein-AM) including neurites and cell bodies (somata). An automated algorithm identifies the nuclei, determines the somatic area and subtracts the latter from the total calcein-positive area to obtain the neurite area (NA, red mask). In parallel, viability (V) of the investigated cultures is determined by evaluating the single/double staining for H-33342 and calcein. Only life cells are double positive (encircled green), while dead cells are encircled pink. Example pictures of the DMSO solventcontrol (0.1% (V/V)), the two left images have a width of 330 µm, the two right enlargements have a width of 165 µm. c The prediction model makes use of the effective concentrations (EC) that are necessary to reduce viability (V) or neurite area (NA) by 25% relative to control, and classifies the test substances as specific neurotoxicant (n) and/or specific mitotoxicants (m) if applicable. d Exemplary data of test compound effects on V and NA in the presence of Glc or Gal. 4 exemplifies a neurotoxic (n) compound, while 5 was neurotoxic and mitotoxic (m, n; strong difference of NA-Glc and NA-Gal). The gray areas indicate concentrations above the solubility limit. e Summary data of the Glc-Gal-NeuriTox test for all 12 hit substances (3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14) and the two positive controls rotenone and deguelin (1 and 2), for the latter two data are taken from (20). inhibited CI significantly, i.e. beyond the noise band of 20%. 5 was the most efficient substance, by reducing CI activity by ca. 40% at 7 µM tested (i.e. the EC 25 NA,Gal), 11 was tested at 35 µM (its EC 25 NA,Gal), at which it reduced CI activity by ca. 25%. 3 was tested at 40 µM (its EC 25 NA,Gal) and reduced CI activity as well by ca. 25% (Fig. 6b). In order to exclude that the test substances inhibited a complex downstream of CI that might cause an "apparent CI inhibition", we investigated CIII activity, the complex directly downstream of CI. Neither 3, 5 nor 11 showed a significant (> 20%) inhibition of CIII. Thus we concluded that neither CIII, nor downstream complexes (i.e. CIV or CV) were affected by the test substances (Fig. 6c). In summary, 3 of the 12 hit compounds can be classified as bona fide CI inhibitors. The initial results from the Glc-Gal-NeuriTox screen were completely confirmed by the more detailed mechanistic investigation. Moreover, the two most efficient inhibitors, 5 and 11, were subjected to detailed concentration-response analysis for their inhibitory effect on CI. IC 25 concentrations for these compounds were 7.9 and 56 µM, respectively, confirming the prior observed offset in efficacy at the single concentration tested (Fig. 6d). Although both substances inhibited CI activity by ca. 65%-75%, a full inhibition of CI activity could not be observed within the soluble concentration range.

Docking of the in vitro tested compounds
Compounds 3-14 were docked into human CI (PDB ID 5XTD). To exclude any bias from the rotenone and deguelin pose, induced-fit docking was performed into the apo structure. The binding site, as well as the settings for docking were chosen in the same way as for docking rotenone and deguelin (Experimental Section). Since in this case the compounds do not share a common scaffold, the 269 poses retrieved were clustered according to their volume overlap, which resulted in 14 clusters. Six clusters contain only inactives, three clusters comprise both actives and inactives, and five clusters contain only actives. (ESM1 Suppl. Item 2) The latter were analysed further using XP Gscore, ligand protein interactions, as well as visual inspection in order to propose binding modes for the actives 3, 5 and 11 (Fig. 7).
As expected, the selected binding pose of the most potent inhibitor 5 (IC 25 of 7.9 µM) overlays better to the proposed binding mode of rotenone and deguelin (Fig. 7d), than the pose of 11 (IC 25 of 56 µM) (Fig. 7e). Since 5 is the only compound of the three newly discovered specific CI inhibitors showing also specific neurotoxicity in the Neur-iTox test, we hypothesize that it might be the most likely one triggering the AOP that leads to Parkinsonian motor deficiencies as adverse outcome.

Conclusion
By combining structure-based methods and machine learning a refined hit-list of 12 potential CI inhibitors (3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14) was retrieved. The MMP was decreased by 3, 4 and 6 at high nM concentrations and by 5, 8 and 11 in the low µM range. In vitro screening in the Glc-Gal-NeuriTox test identified that the substances 3, 5 and 11 were most likely mitochondrial toxicants. Direct assessment of CI inhibition in LUHMES neurons showed that these three substances were genuine CI inhibitors. The potency ranking within the newly identified CI inhibitors is as follows: 3 (weak) < 11 (medium) < 5 (strong CI inhibitor). Hence, we were able to identify key event ligands for the AOP inhibition of CI in neurons leading to parkinsonian motor deficits, by Fig. 6. Inhibition of mitochondrial respiratory chain complex activity. a To directly assess the activity of each mitochondrial respiratory chain complex, LUHMES cells were selectively permeabilized with digitonin and fed with specific substrates and inhibitors to monitor CI and CIII activity. The diagram illustrates that if the test for CIII activity showed no effect (≤20% change from baseline), then CIII-V were not affected by the test compound (serial connection of these complexes). Inhibition of CI (by > 20%) was assessed in a separate assay. b CI activity in the presence of test compounds was performed at concentrations corresponding to the EC 25 (NA,Gal). If this was not possible, 50 µM (i.e. 1:1000 of the stock concentration, resulting in 0.1% DMSO) were used. Each point represents an independent experiment. c For the substances that showed a CI inhibition, CIII activity was quantified (via oxygen consumption at CIV, initiated by providing the CIII substrate duroquinol) to ensure specificity. Tested concentrations were the same as in b, each point represents an independent experiment. d The most effective CI inhibitors (5 and 11) were tested at different concentrations for their effect on CI and CIII. Concentrations that inhibited CI-mediated respiration by 25% or 50% relative to control are indicated. Data are means ± SEM; N = 3 (for CI data of 5 and 11, and CIII data of 5), N = 2 for CIII of 11), N: independent experiments (not technical replicates).
using a structure-based approach followed by hit refinement via machine learning models. This allowed to combine the predictive power of machine learning with the interpretability of docking poses. Once the method is developed and validated, it could be implemented also for other AOPs, paving the way for a more structure-based view on the prediction of complex toxicological endpoints.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Fig. 7. Poses of the CI actives (3, 5 and 11) retrieved by induced fit docking. The NDUFS2 unit is colored in turquoise and the NDUFS7 in pink, the ligand is colored in grey sticks. a Predicted binding mode for 3. The selected pose belongs to the cluster with the second most poses of 3 and is the fifth best scored pose. The pose shows hydrogen bonding to T189 and backbone interactions with I456 b predicted binding mode for 5. The pose shown belongs to the cluster containing most of the poses of 5. In addition, it is also the second best scored pose; it shows hydrogen bonding to Y141 and backbone interactions with I456. c All poses for 11 were located in the same cluster. Here the best ranked pose is shown with backbone interactions to H92 and F458. The two most potent inhibitors 5 d and 11 e are shown superimposed to 1 and 2, represented in pink and blue sticks.