Machine learning on drug-specific data to predict small molecule teratogenicity

Pregnant women are an especially vulnerable population, given the sensitivity of a developing fetus to chemical exposures. However, because pregnant populations are excluded from randomized, controlled trials, prescribing behavior for the gravid patient is guided by limited human data and conflicting reports of adverse outcomes. These factors increase risk for adverse drug outcomes and reduce quality of care for pregnant populations. Herein, we propose the application of artificial intelligence to systematically predict the teratogenicity of a prescriptible small molecule from information inherent to the drug. Using unsupervised and supervised machine learning, our model probes all small molecules with known structure and teratogenicity data published in research-amenable formats to identify patterns among structural, meta-structural, and in vitro bioactivity data for each drug and its teratogenicity score. With this workflow, we discovered three chemical functionalities that predispose a drug towards increased teratogenicity and two moieties with potentially protective effects. Our models predict three clinically relevant classes of teratogenicity with AUC = 0.8 and nearly double the predictive accuracy of a blind control for the same task, suggesting successful modeling. We also present extensive barriers to translational research that restrict data-driven studies in pregnancy and therapeutically "orphan" pregnant populations. Collectively, this work represents a first-in-kind platform for the application of computing to study and predict teratogenicity.


Risky prescriptive behavior in pregnancy
Teratogenicity is the most serious manifestation of iatrogenic fetal toxicity: teratogens lead to fetal malformation and are implicated in lifelong physical and/or mental disabilities [1]. Nonetheless, clinical trial results of drug exposure during pregnancy are often conflicting [2-4], and teratogenicity scoring for small molecules is unsystematic and performed outside the clinical environment [5-7]. The consequences of this subjectivity are seen in the high rate of unintended maternal exposure to a teratogenic agent [8], reminiscent of the "thalidomide disaster" of the early 1960s [9,10]. Following this disaster, randomized, controlled trials (RCTs) were modified to exclude pregnant populations, fearing unintended teratogenicity from exposure to unsystematically profiled drugs [10]. This change continues to "orphan" pregnant women, as many diseases in women's health lack safe and effective drug choices for treatment [8,11,12].

In the wake of the "thalidomide disaster," the United States Food and Drug Administration (FDA) developed a five-point scale for ranking the teratogenicity of a compound [7-9,11]. This scale is presented in [...]

A hallmark of the binning within this scale is the absence of definitive human data: at present, teratogenicity scores are established pre-clinically by pharmacologists, who evaluate biomarkers of fetal toxicity in animal models [5,6]. This approach is inherently limited, as common in vivo models are not sufficiently representative of human physiology [13], and human subjects are not included in the teratogenicity scoring process for ethical reasons [11,14,15]. Indeed, the limited human data available for teratology scoring are often derived retrospectively from high-profile cases of fetal malformation resulting from drug exposure [9,16,17].
While new FDA standards for scoring teratogenicity acknowledge these limitations by providing fewer, more holistic toxicity scores, these standards still suffer from the absence of robust human data and are not yet integrated in clinical decision-making tools [18].

Collectively, the factors above create a significant degree of uncertainty at the point of care (POC), as providers are guided by contradictory, incomplete, and non-human-derived information in their choice of prescriptions for pregnant women. This dilemma is of special consequence to expectant mothers with chronic morbidities that predate their pregnancies [11].

1.2. Target rationale for teratogenesis

Fetal exposure to a teratogen in utero strongly associates with cognitive and/or physical disabilities, resulting from dysregulation of key developmental processes such as neurulation, purine and pyrimidine synthesis, and lipid anabolism [2,19].

Broadly, teratogens may be categorized by their mechanism of action (MOA) as either "on-target" or "off-target" [20-22]. "On-target" teratogenicity implies the generation of adverse phenotypes from bioactive agents impacting well-defined protein targets that are critically regulated in development. In contrast, "off-target" teratogenicity implies mutagenicity, resulting from DNA damage such as alkylation and thymine dimerization. "Off-target" teratogenicity involves repeated reactions between a teratogen and newly synthesized nucleic acid residues, often resulting from reactive oxygen species (ROS) generated during drug metabolism [20].

Thus, teratology is known to converge on few principal MOA classes [19,23], which are outlined in [...]

The inherent contradiction between the limited target rationale for teratogenesis and the extent of uncertainty that guides prescribing behavior for gravid populations speaks to the need for more rigorous predictions of small molecule teratogenicity.
Furthermore, computational modeling on healthcare data is the most accurate method of predicting drug safety in pregnant women, given that phase I trials are unethical for expectant populations and animal models are inherently limited for studying human health [12,13,24].

Classification algorithms are optimized to identify patterns between associated data sets (such as binding affinity and phenotype data for a cytotoxic target) [25-28], suggesting that machine learning (ML) classifiers may play a pivotal role in systematically establishing relationships between maternal drug history and adverse fetal outcomes [29-31]. While these models are not intended as a replacement for existing physician knowledge of responsible prescriptive practice [32], ML classifiers offer an attractive opportunity to discover meaningful relationships within existing biomedical data that could result in meaningful POC conclusions. [...] chemical exposure-adverse outcome clusters [17].

In this study, we report on a previously unattempted, unbiased (phenotype-agnostic and target-agnostic) approach to predicting teratogenicity by identifying chemical and biochemical factors that predispose a chemical to increased teratogenic risk. Given significant limitations in established teratogenicity scoring criteria, we propose a novel application of ML to develop a teratogenicity quantitative structure-activity relationship (QSAR) [37]. By leveraging drug structure, meta-structural elements like molecular energetics, and real-world bioactivity data, we attempt to predict the teratogenic risk of drugs potentially prescriptible in pregnancy.

Our teratogenicity QSAR accesses chemical and bioassay data to predict a teratogenicity score for compounds that are prescriptible in pregnancy and to identify patterns within drug-specific information that predispose a drug towards an increased risk of fetal toxicity.

Broadly, we leverage three layers of drug data to accomplish these tasks:

1. [...]

This fingerprinting process is only valid for organic small molecules; therefore, all inorganic agents were automatically excluded from our drug set by the ChemmineR and Rcdk fingerprinting algorithms [38]. Thus, fingerprinting allowed us to access comprehensive, structured information on nearly nine thousand (9,000) small molecules and one-hot encode this information.

As noted above, we obtained FDA-compliant teratogenicity data from SafeFetus, the largest publicly available source of structured FDA teratogenicity scores with an API. Integrating the data sets for teratogenicity and drug structure in R, we obtained N = 611 drugs with information on both structure and teratogenicity.

We then developed multiple label classification strategies for teratogenicity scores, based on the nature of FDA teratology scores and a bibliostatistic search. One set of teratogenicity scores for all 611 drugs was aligned according to native FDA schema. A second set of scores was redefined as a three-pronged scale of bins: "Clinically Acceptable Risk" (scores A/B), "Moderate Risk" (score C), and "Clinically Unacceptable Risk" (scores D/X [...]

First, to discover clustering relationships between teratogenicity and drug structure, the Barnes-[...]

Noting that multiple structure-teratogenicity relationships resulting from our t-SNE analysis were validated in the literature, we considered our unsupervised ML model to be a successful proof-of-concept experiment. This, along with meaningful multiclass ROC analysis for structure-[...]

2.3. Layer 2: Curating meta-structural information for exploratory analysis

After deriving a successful model for predicting teratological risk from drug structure, we sought to increase the predictive accuracy of our GBM by supplementing our features with information on "meta-structure" [63] [...]

Given that teratogenicity has well-identified target rationale, we decided to leverage existing, real-world bioassay information for all targets implicated in teratogenesis (as described [...]

GBM with five-fold CV was re-executed with a three-pronged set of teratogenicity scores and feature-prioritized structural, meta-structural, and biochemical assay information. Hyperparameters were optimized by large grid search within Caret.

We are committed to open-source science. Code that we developed to execute this protocol is available through the following GitHub repository: https://github.com/apchalla/teratogenicity-qsar.

In this manuscript, we present a first-in-kind application of ML to identify structural, meta-structural, and bioassay performance factors that predispose a drug towards increased teratogenic risk. We developed a model to prospectively score a drug's teratogenicity from these drug-specific factors. Because our workflow is anchored in computing, our methods apply algorithmic rigor to studying teratogenicity, a contrast to many non-systematic studies which have historically dominated this space.

We found that drug structure is a good predictor of teratogenicity, as multiclass ROC analysis between 1,024-dimensional Morgan fingerprints and a three-pronged teratogenicity metric gave AUC = 0.78 (Figure 1). This result supports our hypothesis that a "form-fits-function" argument is valid for predicting teratogenicity from homology between drug structure and pharmacophore biochemistry among targets implicated in teratogenesis.
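The three-bin label scheme and the multiclass ROC analysis described above can be illustrated with a short, library-free sketch. The bin mapping (A/B, C, D/X) comes from the text; everything else is an assumption made for illustration. In particular, the source does not state how the multiclass AUC was averaged, so the macro-averaged one-vs-rest scheme below is only one plausible reading, and the function names are hypothetical rather than taken from the authors' R code.

```python
from typing import Dict, List, Sequence

# Three-bin scheme stated in the text: A/B, C, D/X.
FDA_TO_BIN: Dict[str, str] = {
    "A": "Clinically Acceptable Risk",
    "B": "Clinically Acceptable Risk",
    "C": "Moderate Risk",
    "D": "Clinically Unacceptable Risk",
    "X": "Clinically Unacceptable Risk",
}


def bin_fda_score(letter: str) -> str:
    """Map a native FDA category letter to the three-bin scale."""
    return FDA_TO_BIN[letter.strip().upper()]


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str], proba: Sequence[Dict[str, float]]) -> float:
    """Macro-averaged one-vs-rest AUC (ASSUMED averaging scheme).

    `proba[i][c]` is the predicted probability of class `c` for sample `i`.
    """
    classes: List[str] = sorted(set(y_true))
    per_class = []
    for c in classes:
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in proba]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / len(per_class)
```

A classifier that ranks samples at random would score about 0.5 under this metric, which gives a reference point for reading a reported value such as 0.78.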
From t-SNE analysis between drug structure and a binary encoding of teratogenicity (Figure 2), we discovered clusters of teratogenic risk and the absence thereof, which are partially validated within existing clinical literature (Figure 3). Though the t-SNE embedding contains noise across most of the dimension-reduced structure-teratogenicity landscape, the clusters we identified by visual inspection were consistent in teratogenic risk. A reason for the limited tightness of the observed clustering behavior may involve dimensionality mismatch between structure and teratogenicity data sets, given that we plotted 1,024 structural motifs against only two (2) teratogenicity scores. However, since generating ~10^3 independent teratogenicity scores and reducing chemical structure to ~10^1 categories are both infeasible (this would remove the clinical and chemical significance of the respective data sets), we cannot address this probable cause of loose clustering by adjusting the form of the data we seek to associate. Despite these issues, our t-SNE step was a successful proof-of-concept experiment, as we discovered functionalities that are known to be highly fetal toxic and those that are known to be safe through this procedure.

[...] identity, which is non-teratogenic, it is reasonable to assert that the azetidinone functionality and dihydrothiazine ring are non-teratogenic chemomarkers in this case. We recognize that the burden of evidence is significant to claim that these motifs demonstrate protective effects. Instead, we suggest that our results warrant more involved analysis of these potentially protective moieties.

In contrast, similar analysis of "YES" clusters reveals three teratogenic chemomarkers: corticosteroids, fluoroquinolones, and acetylproline derivatives. While fluoroquinolones are documented teratogens [75-78], there is contention on the toxicity of steroid derivatives [79-81], as well as prolinated compounds [82-84].
Our model adds to this discussion by arguing that steroid derivatives are indeed teratogenic.

We reasonably assume that the "YES" functionalities in Figure 3 are the source of teratogenicity within molecules that contain them, given that these moieties are distinctive. This conclusion requires MOA validation; however, as with fluoroquinolones, available phenotypic data appear to support our conclusions on functional group toxicity.

Drawing on these mappings also allows us to evaluate new trends in drug development; namely, we can extrapolate functional group mappings towards drug development targets in the anti-[...] predicts to be the core teratogenic functionality within these drugs. To date, only one small-molecule anti-hypercholesterolemic drug, ezetimibe (Zetia), does not belong to the statin class of drugs [88]. Instead, ezetimibe contains a central azetidinone group and has been associated with reduced teratogenicity across the expectant population, as compared to statins [89]. Given that we identify azetidinone-containing drugs to carry potential protective effects, this result further supports the results from our model and speaks to the applicability of structure-teratogenicity relationship modeling similar to that in this paper to inform downstream, data-driven inquiries into drug safety for expectant populations. We emphasize that expansion of this study and downstream mechanistic studies are required to fully substantiate our observations.

Our GBM predicts three classes of teratogenicity with 64.7% accuracy (SD = 3.0%) when trained on 1,024-dimensional Morgan fingerprints. Thus, our model achieves nearly double the predictive accuracy of a blind, probabilistic control for the same trivariate predictive task; QSAR accuracy enrichment is nearly 32% over these baseline predictions.
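The comparison against the blind control can be checked with a short calculation. This is a minimal sketch that assumes the control is uniform random guessing among the three classes (expected accuracy 1/3). The source does not say how the control was constructed; a control that always predicts the most common class would have a different baseline that depends on the class distribution, which is not reported here.

```python
def uniform_baseline(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing (ASSUMED control)."""
    if n_classes < 1:
        raise ValueError("n_classes must be positive")
    return 1.0 / n_classes


reported_accuracy = 0.647               # GBM accuracy stated in the text
baseline = uniform_baseline(3)          # 1/3 under the uniform assumption

fold_improvement = reported_accuracy / baseline
gain_in_points = (reported_accuracy - baseline) * 100.0

print(f"baseline accuracy: {baseline:.3f}")
print(f"fold improvement:  {fold_improvement:.2f}x")
print(f"absolute gain:     {gain_in_points:.1f} percentage points")
```

Under this assumption the ratio is about 1.94 and the absolute gain is about 31.4 percentage points, which matches the "nearly double" and "nearly 32%" figures in the text if the latter is read as an absolute difference.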
Because there exist no other structure-activity relationships, meta-structure-activity relationships, or structure-assay-activity relationships published in this space, we present our model as a first attempt at applying drug-inherent information towards predicting teratogenicity.

3.2. Ontological limitations and barriers to data-driven studies in pregnancy

While the results above appear promising, the data that we queried in this investigation present significant ontological challenges. These problems drastically reduce the sample size of all drug-specific teratology probes and present significant barriers to translational science, as we explain below.

In this study, we encountered problems with procuring teratogenicity information, given that teratology reference data are infrequently published and updated in the relevant clinical literature. Furthermore, existing clinical decision-making tools like UpToDate and Medscape do not have APIs and contain contradictory teratology information that is not available in structured formats, as is required for systematic, retrospective data analysis and ML modeling. FDA resources containing teratology data are also not published in structured formats amenable to computational research, despite the availability of an API for FDA pharmacopeias like DailyMed.

For this investigation, the consequence of this limitation in available teratogenicity data was a significant reduction in the drug sample size available to t-SNE and GBM. Though we used arguably one of the most powerful chemical computing software programs currently available (i.e., MOE), we encountered sparsity in meta-structural predictions within our limited subset of drugs with available structure and teratogenicity information. This restricted the power of our meta-structural t-SNE and GBM probes, resulting in no test power for a feature-selected meta-structural and structural feature set.
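The way sample size shrinks as more data layers are required at once can be sketched as a set intersection over drug identifiers. This is an illustrative toy only: the layer names and drug identifiers below are hypothetical placeholders, not the authors' data, and the helper is not part of their pipeline.

```python
from typing import Dict, Iterable, Set


def complete_cases(layers: Dict[str, Set[str]], required: Iterable[str]) -> Set[str]:
    """Return the drug identifiers present in every required data layer."""
    names = list(required)
    if not names:
        raise ValueError("at least one layer must be required")
    usable = set(layers[names[0]])
    for name in names[1:]:
        usable &= layers[name]  # keep only drugs that also appear in this layer
    return usable


# Hypothetical toy layers; real layers would hold thousands of identifiers.
toy_layers: Dict[str, Set[str]] = {
    "structure": {"drug_a", "drug_b", "drug_c", "drug_d", "drug_e"},
    "teratogenicity": {"drug_a", "drug_b", "drug_c", "drug_f"},
    "bioassay": {"drug_a", "drug_c", "drug_g"},
}

structure_and_label = complete_cases(toy_layers, ["structure", "teratogenicity"])
all_three = complete_cases(toy_layers, ["structure", "teratogenicity", "bioassay"])

print(len(structure_and_label), "drugs with structure and teratogenicity")
print(len(all_three), "drugs with all three layers")
```

Each additional required layer can only keep or reduce the usable set, which is why sparse coverage in any single source limits every downstream model that depends on it.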
Despite the gravity of the inherent uncertainty within available teratogenicity scoring criteria and limited target rationale for teratogenesis, there exist no teratology-specific HTS platforms. Though large toxicology HTS programs like Tox21 have screened targets that overlap with those in Figure 1, this intersection remains small: only two (2) targets have coverage through Tox21. Therefore, though real-world bioactivity information is inherently powerful, we were able to access data on only two (2) relevant targets, and for only 128 drugs with structure, available teratogenicity data, and assay information. Only sixty-four (64) drugs had information available for both RAR and HDAC, available structure data, and a known teratology score. Hence, a major reason why the addition of Tox21 HTS data did not improve predictive accuracy or t-SNE clustering over a purely structural model was limited sample size. This issue remains intractable, given the inherently limited data resources currently available and little action on the part of data providers to address these quality issues.

[...] driven by a lack of human data. This informs irresponsible prescribing behavior at the POC, reducing the quality of care for pregnant women and their developing fetuses. However, given the rigor of rules-based ML classification algorithms and limited "on-target" rationale for teratogenesis, there is potential to systematically predict a compound's risk for fetal toxicity by leveraging AI on drug-specific information, such as drug structure, meta-structure, and existing real-world bioassay data, as a proxy for binding affinity to teratogenic targets.

In our study, we assert that drug structure is a good predictor of teratogenicity, using ROC analysis, unsupervised ML (t-SNE), and a supervised GBM to discover relationships between chemical functionalities within drugs prescriptible in pregnancy and existing teratogenicity information.
This allowed us to identify moieties that appear to predispose a drug towards an increased chance of teratogenicity, based on existing use cases that are salient in relevant clinical and drug development literature. We also identify significant barriers to translational research in this space as rationale for the limited utility of existing meta-structural and toxicology HTS platforms for teratogenicity prediction tasks. The importance of these ontological considerations cannot be overstated in considering future research to improve the quality of data-driven maternal-fetal medicine.

Our team of investigators has formed a first-in-kind research collaboration of engineers, informaticians, and clinicians dedicated to the development of computational tools to predict [...] more continuous spectrum of phenotype information through which to quantify teratogenicity. In turn, this allows for easier validation of associative outcomes in silico and in vitro, as compared to similar hits from QSAR. Indeed, drugs identified as teratogenic through MedWAS may be referred to our QSAR model for validation, and vice versa. We have begun work on this MedWAS and look forward to further exploring its intersections with our teratogenicity QSAR.

We declare no competing interests relevant to the execution or outcomes of this study.

Acknowledgements

We thank Asher Schachter, MD, Senior Vice President, Clinical, and Head of Pharmaceutical Sciences at CAMP4 Therapeutics, for sharing teratogenicity data that he extracted from SafeFetus.

Research reported in this publication was supported by the National Human Genome Research [...]

[...]: ACE and ATII receptor inhibitors reduce perfusion to developing fetal tissues, which especially affects peripheral structures such as the distal limbs. These agents also decrease the tone of fetal vasculature, leading to cardiovascular morbidity.

hydroxymethylglutaryl-coenzyme A (HMG-CoA) reductase: HMG-CoA reductase inhibitors downregulate the conversion of HMG-CoA to mevalonic acid, an essential step in cholesterol synthesis. In the developing fetus, cholesterol is an essential progenitor of lipid regulators of the SHH gene, which affects fetal patterning and morphogenesis. Therefore, HMG-CoA inhibition is associated with severe fetal malformation and lipid deficiencies.

histone deacetylase (HDAC): HDAC proteins are essential in regulating gene expression by promoting chromatin unwinding. Therefore, HDAC inhibitors lead to a wide spectrum of morbidities (e.g., axial skeletal malformations) and may be fetal lethal.

cyclooxygenase-1 (COX-1): COX-1 inhibition is associated with cardiac, midline, and diaphragm defects, as the release of prostaglandins required for healthy morphogenesis is reduced by interference within the COX-1 signaling pathway.

N-methyl-D-aspartate receptor (NMDAR): NMDAR inhibition is associated with gross structural defects within the brain, resulting from dysregulation of neuronal migration, synapse formation, and synapse elimination in the developing fetus.

5-hydroxytryptamine (5-HT) receptor, 5-HT transporter: 5-HT is a neurotransmitter critical to craniofacial morphogenesis in development. Agents activating or inhibiting 5-HT, or promoting 5-HT reuptake, disrupt a critical 5-HT concentration, resulting in craniofacial malformations and other structural defects in the fetus.

γ-aminobutyric acid (GABA) receptor: GABA is a key inhibitory neurotransmitter that guides healthy testicular, ovarian, pancreatic, enteric, and palatal morphogenesis at a critical concentration. Enhancers of GABA receptor are significantly associated with malformation of these tissues and are therefore implicated in morbidities such as cleft palate and atresia of the gastrointestinal

Figure 3:
We discovered relationships between teratogenic risk ("YES", "NO") and the presence of distinct chemical functionalities from consistent structure-teratogenicity points within each discrete t-SNE cluster.