Deep Learning Predicts Patterns of Cardiotoxicity in a High-Content Screen Using Induced Pluripotent Stem Cell–Derived Cardiomyocytes

Drug-induced cardiotoxicity and hepatotoxicity are major causes of drug attrition. To decrease late-stage drug attrition, pharmaceutical and biotechnology industries need to establish biologically relevant models that use phenotypic screening to predict drug-induced toxicity. In this study, we sought to rapidly detect patterns of cardiotoxicity using high-content image analysis with deep learning and induced pluripotent stem cell–derived cardiomyocytes (iPSC-CMs). We screened a library of 1280 bioactive compounds and identified those predicted to have cardiotoxic liabilities using a single-parameter score based on deep learning. Compounds with major predicted cardiotoxicity included DNA intercalators, ion channel blockers, epidermal growth factor receptor, cyclin-dependent kinase, and multi-kinase inhibitors. We also screened a diverse library of molecules with unknown targets and identified chemical frameworks with predicted cardiotoxic liabilities. By using this screening approach during target discovery and lead optimization, we can de-risk early-stage drug discovery. We show that the broad applicability of combining deep learning with iPSC technology is an effective way to interrogate cellular phenotypes and identify drugs that protect against diseased phenotypes and deleterious mutations. GRAPHICAL ABSTRACT CONTRIBUTION TO THE FIELD In this article, Grafton and colleagues use induced pluripotent stem cell technology and deep learning to train a neural network capable of detecting patterns of cardiotoxicity. To identify bioactive and chemical classes that lead to cardiotoxicity, they combine the neural network with high-content screening of 2560 compounds. The methods described in this study can be used to de-risk early-stage drug development, triage hits, and identify drugs that protect against disease. This screening paradigm will serve as a useful resource for drug discovery and phenotypic interrogation of stem cells and stem cell–derived cell types.

cell-derived cell types.

ProBNP Assay 208
The proBNP/NPPB human sandwich ELISA kit (Invitrogen) was used to determine the 209 level of secreted human proBNP. Cell culture media was collected from wells containing 210 approximately 10 5 iPSC-CMs exposed to cardiotoxic drugs for 4 days. Media was 211 diluted to obtain a proBNP concentration between 0.137 and 100 ng/mL (1:3-1:5). A 212 standard mixture of recombinant human proBNP was diluted 1:3 from 0.137 to 100 213 ng/mL. Then 100 μL of diluted sample and standards were added to the human proBNP 214 using best practice parameters to ensure mapping validity and reproducibility (--seqBias 250 --gcBias --posBias --useVBOpt --rangeFactorizationBins 4 --validateMappings --251 mimicStrictBT2). Next, tximport (Soneson et al., 2015)  were omitted from the analysis. After this step, expression values were re-normalized to 258 TPM. After adding 1 as the pseudo-count, the expression matrix was log2-transformed. 259 For the initial assessment, principal component analysis (PCA) models were generated 260 in the R environment using the prcomp function, and the first two principal components 261 were used for visualization. To quantify the distances between each drug's samples and 262 generate a PCA-based similarity matrix, we used Euclidean distance in the PCA space, 263 as calculated by the pca2euclid function from the tcR package. We calculated the 264 distance using all replicates and used the averaged expression in replicates for each 265 drug. Visualizations were generated in R using the ggplot2 and ComplexHeatmap 266

packages. 267
To compare the differential gene expression analysis of these clusters with the 268 DMSO-treated group, we limited our downstream analysis to the genes in our curated 269 list of genes with high relative expression in cardiac tissue. We also applied the 270 following filters: (1) protein-coding genes (as defined by Ensembl) that had a median 271 log2-transform TPM expression of greater than 1 in at least one cluster and (2)

Seahorse Assay 287
The Agilent Seahorse XFe96 Analyzer was used to measure mitochondrial function in 288 iPSC-CMs. The 96-well Seahorse plates were coated with Matrigel (1/100 dilution) in a 289 phenol-free medium overnight. WTC iPSC-CMs were seeded at 30,000 cells per XFe96 290 well and recovered in TCM media for 1 week. Cardiotoxic compounds were diluted to 3 291 µM, 1 µM, 0.3 µM, and 0.1 µM in 1.0% DMSO in TCM media. Drugs were administered 292 in fresh media for 4 days. Following 4 days of exposure to drugs, the cells were washed 293 and incubated for 1 hour before the assay with Seahorse XF DMEM Basal Medium 294 supplemented with 2 mM glutamine, 2 mM pyruvate, and 10 mM glucose. The 295 Seahorse XFe96 cartridge was prepared according to manufacturer's guidelines. First, 296 basal oxygen consumption rate (OCR) was measured. Next, the Mito Stress Test was 297 performed with inhibitors injected in the following order: oligomycin (2.5 µM), carbonyl 298 cyanide 4-(trifluoromethoxy) phenylhydrazone (FCCP; 1 µM), and rotenone and 299 antimycin A (0.5 µM). OCR values were normalized to the total nuclear count per well 300 as quantified by Hoechst staining. Basal respiration was calculated as: (last rate 301 measurement from the first oligomycin injection) -(minimum rate measurement after 302 rotenone/antimycin A). Maximal respiration was calculated as: (maximum rate 303 measurement after FCCP injection) -(minimum rate measurement after 304 rotenone/antimycin A). Spare respiratory capacity was calculated as: (maximal 305 respiration) -(basal respiration). ATP production was calculated as: (last measurement 306 before oligomycin injection) -(minimum rate after oligomycin injection). 307 308

Proteasome Activity Assay 309
Proteasome activity was measured in iPSC-CMs using the Proteasome-Glo™ 310 Chymotrypsin-Like Assay according to manufacturer's instructions (Promega). Cells 311 were incubated with the inhibitors (ranging from 2 to 5000 nM) for 1 hour before running 312 the assay. IC50s were calculated using PRISM 8 software.

Optimizing the Deep Learning Model 369
Frozen iPSC-CMs were thawed and allowed to recover in 384-well plates to enable 370 high-content screening. These plates were divided into three categories: a training 371 plate, a validation plate, and library plates. The cells were then immunostained using 372 MYBPC3 before capturing images. Images from the training plate (dosed with known 373 cardiotoxins ranging from 10 nM to 10 μM for 4 days) were used to establish the deep 374 learning models. Images from the validation plate were used to test the accuracy of the 375 deep learning models. And images from the library plates were used to test the toxicity 376 level of compounds. iPSC-CMs in the library plates were exposed to 1280 bioactive 377 compounds at three doses (0.3 μM, 1.0 μM, and 3.0 μM for 4 days) ( Figure 3A). We compared the deep learning accuracies across the three models. All models 388 showed more than 95% accuracy in identifying the non-toxic DMSO-treated condition 389 (class 0). For the 2-class model, the validation showed 100% accuracy in distinguishing 390 the toxic from non-toxic conditions. For the 3-and 4-class models, the accuracies were 391 After training, the three deep learning models were independently used to score 395 the validation plate (which the neural network had not seen before). The results from the 396 three models on the validation plate are compared in a 2D plot in Figure 3C. The data 397 suggested that all three models showed a strong correlation upon validation (R 2 > 0.93, 398

Screening Window 406
We hypothesized that there may be an optimal cell density for screening that leads to 407 best model performance and Z-factor. The Z-factor is a statistical measure of assay 408 variability and reproducibility (Zhang et al., 1999). We posited that using a cell density 409 that was too high would tightly pack cells and mask features, whereas using a cell 410 density that was too low would not generate sufficient features for training. Interestingly, 411 as we increased the cell density, we did not observe a bell-shaped effect in the 412 performance of the deep learning models. We found that the higher the cell density, the 413 higher the model accuracy and, most importantly, the higher the Z-factor. At 414 screening window. For example, a minimum number of 1500 iPSC-CMs per well of a 418 384-well plate was required to reach a Z-factor of approximately 0.5, which is an 419 acceptable level for an HCS in cell-based assays (Supplementary Figure 3B). Training 420 and validation accuracies were tracked as a function of epochs, or the number of times 421 the entire training dataset passed through the neural network. Two representative cell 422 seeding densities were analyzed, which showed that a higher cell density supports 423 improved model performance (Supplementary Figure 3C). Higher cell densities yielded 424 higher cellular content for feature training, and they reduced cell edges and background. 425 In addition, more cell-cell contact between iPSC-CMs led to sarcomere alignment that 426 may reduce heterogeneity in training images. Given these factors, higher cell densities 427 achieved a Z-factor greater than 0.5, which is ideal for HCS (Supplementary Figure 3D). 428 429

Deep Learning Enables Detection of Toxic Compounds in an HCS Format 430
We calculated the dynamic range and Z-factors on the screening plates treated with 431 DMSO (non-toxic control) as well as bortezomib and doxorubicin (cardiotoxic controls) 432 ( Figure 4A). With enough technical replicates, we saw significant differences in the 433 nuclear count and sarcomere staining intensities ( Figure 4A). However, when 434 performing an HCS, only one or two technical replicates per test article are used. Thus, 435 using parameters such as nuclear count and staining intensity will not be reliable for 436 identifying hits in high-throughput screening (HTS) (Supplementary Figure 4A Figure 5C. We classified the protein targets of our top 25 predicted 476 cardiotoxic hits with a PRISM repurposing dataset. Next, we built a protein-protein 477 interaction network using the Search Tool for the Retrieval of Interacting Genes/Proteins 478 (STRING). In the resulting network, we identified seven clusters of protein families, 479 including DNA interactors, ion channel blockers, multi-kinase inhibitors, and cyclin-480 dependent kinase (CDK) inhibitors ( Figure 5D). 481 The top predicted cardiotoxic compounds included several FDA-approved drugs 482 that clinical and pre-clinical studies have linked to adverse cardiovascular events 483 (Supplementary Table 3 To determine if BNP is a reliable marker for cardiotoxicity, we treated iPSC-CMs 539 with the seven validated cardiotoxic compounds and evaluated the levels of BNP that 540 these cells secrete into the media using a ProBNP ELISA kit. We found an inconsistent 541 pattern of ProBNP levels and cardiotoxicity signal in the media of treated cells. Only 542 PHA-767491 (at all three doses) elevated ProBNP. In contrast, WZ8040, daunorubicin, 543 nitrendipine, and solifenacin dose-dependently reduced ProBNP ( Figure 6E). This 544 reduction may be due to poor health of iPSC-CMs after treatment with the cardiotoxic 545 compounds. We found that BI-2536 did not alter the levels of ProBNP and that 546 tegaserod only reduced BNP at the highest dose (3 μM) ( Figure 6E). genes with enriched expression in cardiac tissue. We did not see any major difference 564 in the clustering structure between these two methods. For simplicity, we show the 565 averaged PCA-based similarity matrix for each drug cluster in Figure 7B.

Deep Learning Predicts Structural Frameworks that Cause Cardiotoxicity 593
To predict chemical structure frameworks that lead to cardiotoxicity, we used our deep 594 learning approach to assess a library of 1280 diverse compounds with no known 595 targets. We used the same conditions as the bioactive library screen. However, to 596 optimize the accuracy and applicability of the diverse library screen, we established a 597 new set of training images and developed a new deep learning model that incorporated 598 all generated datasets. From the 1280 diverse compounds, our deep learning approach 599 identified 33 hits predicted to cause cardiotoxicity ( Figure 8A,B). Analysis of the 600 structure-activity relationship revealed three structural frameworks with two compounds 601 (also classified as matched pairs) in each framework set ( Figure 8C). For example, 602 Framework 2a (cardiotoxicity score: 1.41) and its matched pair Framework 2b 603 (cardiotoxicity score: 0.69) showed that adding an ethylene spacer increased the 604 cardiomyocyte toxicity score by two-fold ( Figure 8C). Based on this result, high-content 605 phenotypic analysis using deep learning is a powerful approach to identifying chemical 606 frameworks with cardiotoxic liabilities. By using this approach during the lead 607 optimization process, we can de-risk a clinical development program and potentially 608 reduce drug attrition in late-stage clinical trials. 609 cardiomyopathy, worsening heart rhythm, heart failure, and death due to cardiac arrest 615 (Moslehi, 2016). To identify proteasome inhibitors that may reduce cardiotoxic liability, 616 we profiled the potency of ten proteasome inhibitors using the Proteasome-Glo 617 biochemical assay ( Figure 8D). For each inhibitor, we calculated the biochemical IC50 618 and measured the cardiotoxicity score using deep learning. We identified six inhibitors 619 epoxomicin, delanzomib, ixazomib, and oprozomib. Carfilzomib, bortezomib, and 621 epoxomicin had the highest cardiotoxicity score; delanzomib showed the lowest 622 cardiotoxicity liability; and ixazomib and oprozomib showed medium levels of 623 cardiotoxicity ( Figure 8E). This analysis is an example of how this type of phenotypic 624 screening would help to de-risk a small-molecule program at an early stage. By focusing 625 on a chemical series (or pathway) that shows high levels of potency for the intended 626 target, we can identify compounds in that series that limit undesired toxicity in a specific 627 cell type with a major liability (such as cardiomyocytes and hepatocytes). In this study, we combined the power of human iPSC technology with high-content 631 image analysis using deep learning to detect signatures of cardiotoxicity in a high-632 throughput chemical screen. First, we treated iPSC-CMs with drugs known to cause 633 cardiotoxicity in the clinic. We combined the contractility readout, immunostaining, and 634 nuclear count to stratify doses of compounds into defined categories: mildly toxic, toxic, 635 and highly toxic. Next, we trained a neural network with fluorescently labeled images to 636 classify the images by degrees of cardiotoxicity with a data-driven method. We then 637 constructed three deep learning models to understand how the categories would differ 638 based on how the toxicities were classified with imaging. We found that all models 639 strongly agreed in identifying hits predicted to cause cardiotoxic liabilities. 640 Our screening identified classes of compounds that clustered into distinct 641 mechanisms of action and predicted targets. Some of the compounds with the most 642 likely cardiotoxic liabilities included DNA intercalators and ion channel blockers, as well 643 as EGFR, CDK, and multi-kinase inhibitors. We validated these hits using a second 644 round of deep learning and orthogonal assays that measured mitochondrial respiration 645 and evaluated BNP as a marker of stress. 646 Cardiomyocytes in the adult human heart contain approximately 30% 647 mitochondria by cell volume. These mitochondria critically regulate many cellular 648 processes, such as metabolism, oxidative stress, cell survival, and apoptotic death 649 has been the "black box" nature of the method, or not knowing exactly what features the 695 deep learning net is identifying (Moen et al., 2019). We propose that the black box is "a 696 feature and not a bug" of deep learning. We believe the black box allows researchers to 697 interrogate a large set of perturbations in a cell-agnostic, unbiased, and high-throughput 698

manner. 699
In this study, we showcase the power of deep learning in uncovering new 700 biological insights and how the technology can be more accessible to biologists. Deep 701 learning can be used as an additional tool to interrogate subtle and difficult-to-define 702 phenotypes. Our findings support that deep learning is a highly sensitive method to 703 detect cardiotoxicity phenotypes that are undetectable by traditional single-readout 704 assays. While the scope of our study was limited to identifying cardiotoxicity signatures 705 in iPSC-CMs, the method can be applied to identifying toxicity in other cell types. 706 Screening using iPSC-derived cells with deep learning can also be applied to de-risk 707 early-stage drug discovery and ensure that compounds with on-and off-target toxicity 708 are triaged at an early stage in drug development. In particular, deep learning can be 709 used to identify targets with high-throughput perturbation screening (using small We thank Crystal R. Herron for editing the manuscript. We thank members of Tenaya's 731 drug discovery team for technical assistance and helpful comments on the manuscript.