Deciphering COVID-19 host transcriptomic complexity and variations for therapeutic discovery against new variants

Summary The molecular manifestations of host cells responding to SARS-CoV-2 and its evolving variants of infection are vastly different across the studied models and conditions, imposing challenges for host-based antiviral drug discovery. Based on the postulation that antiviral drugs tend to reverse the global host gene expression induced by viral infection, we retrospectively evaluated hundreds of signatures derived from 1,700 published host transcriptomic profiles of SARS/MERS/SARS-CoV-2 infection using an iterative data-driven approach. A few of these signatures could be reversed by known anti-SARS-CoV-2 inhibitors, suggesting the potential of extrapolating the biology for new variant research. We discovered IMD-0354 as a promising candidate to reverse the signatures globally with nanomolar IC50 against SARS-CoV-2 and its five variants. IMD-0354 stimulated type I interferon antiviral response, inhibited viral entry, and down-regulated hijacked proteins. This study demonstrates that the conserved coronavirus signatures and the transcriptomic reversal approach that leverages polypharmacological effects could guide new variant therapeutic discovery.


IMD-0354 reversed these signatures and inhibited new variants with nM IC50s
The antiviral potential of IMD-0354 benefits from its polypharmacological effects

INTRODUCTION
Since early December 2019, the newly emerged SARS-CoV-2 virus has infected more than 600 million people globally and claimed more than 6 million deaths (as of September 2022). In addition, severe COVID-19 has caused irreversible organ injuries in a large number of patients (Abbasi, 2021;Ankit et al., 2020;Nalbandian et al., 2021). The systems of host cells responding to SARS-CoV-2 infection under various models (cell lines, organoids, animals, and patients) and conditions (dosages, times, and patient severity) have been molecularly characterized using diverse omics profiling and perturbation technologies, often resulting in a list of candidate targets in individual studies (Blanco-Melo et al., 2020;Bojkova et al., 2020;Daniloski et al., 2020;Delorey et al., 2021;Demichev et al., 2021;Desai et al., 2020;McClain et al., 2021;Overmyer et al., 2021;Yang et al., 2021). Small molecules modulating these targets were subsequently proposed to treat COVID-19 (Chu et al., 2021;Samelson et al., 2022, p. 2). Together with phenotypic screening hits, more than 200 candidates surfaced in publications. However, few studies fully appreciated the complexity and variation of the virus-induced molecular changes in host cells The COVID-19 Gene and Drug Set Library, 2020) and the prevalence of polypharmacological effect of small molecules (Lin et al., 2019), partially accounting for the moderate efficacy of repurposing candidates. Rapidly emerging variants still impose increasing threats because of their higher transmissibility, highlighting the pressing need of untangling the complexity and variation of transcriptional programs in host cells to aid the discovery of drugs for new variants and future pandemics.
Recent studies revealed that published anti-SARS-CoV-2 drugs share similar transcriptional programs  and 16 viruses tend to present conserved transcriptional regulation modules of immune responses (Zheng et al., 2021), suggesting the prevalence of underlying molecular mechanisms connecting antivirals and host responses. In cancer, reversal of transcriptional expression correlates with drug efficacy (Chen et al., 2017). We thus first sought to answer if published anti-SARS-CoV-2 drugs tend to reverse the global gene expression of infected host cells as observed in cancer. The central idea of the reversal iScience Article approach (or called systems-based approach) is that the antiviral drugs could suppress the over-expressed genes and activate the repressed genes, regardless of drug mechanisms and biological systems involved ( Figure 1A). Next, we asked if the findings could guide the identification of a robust transcriptional program that could drive the discovery of drugs for new SARS-CoV-2 variants.
We first surveyed published COVID-19 host signatures and observed that most of them were not reversed by known anti-SARS-CoV-2 drugs. We then interrogated the biological processes implicated in the 374 raw coronavirus (CoV) signatures derived from 1,700 open host transcriptomic samples of SARS, MERS, and SARS-CoV-2 infection and found that only 6% of them could be reversed by anti-SARS-CoV-2 drugs, suggesting the challenge of direct deployment of these raw signatures without denoizing the signal. We proposed a novel data-driven approach to tease out a set of robust CoV signatures by distilling knowledge from compounds effective against other species in the coronaviridae. Reversal of these signatures identified a potent drug candidate, IKK2 Inhibitor V (IMD-0354), against SARS-CoV-2 and five variants with nanomolar IC 50 in vitro. RNA-seq profiling and systems biology analysis further confirmed it reversed a global host transcriptional program through targeting multiple virus-altered pathways rather than interacting with the primary target IKK2.

Reversal of published COVID-19 signatures correlates poorly with antiviral efficacy
We compiled 18 COVID-19 signatures from the supplementary materials of nine studies, which profiled the host transcriptomic changes in lung cell lines, organoids, mice, and patients with SARS-CoV-2 infection (Table S1). Pathway enrichment analysis suggested that most of the published signatures displayed induced interferon and inflammatory responses, whereas some showed diverse or even contradictory patterns ( Figure 1B). For example, the cholesterol homeostasis genes are up-regulated in an organoid model but down-regulated in a Calu-3 cell line model ( Figure 1B). In addition to model differences, varying biological conditions could drive the diversity, as supported by the separation of the signatures derived from the same mouse model under different days ( Figure 1B).
Because these signatures were developed to answer specific biological questions, to verify if they could be adopted for the systems-based drug discovery ( Figure 1A) we then sought to leverage approximately 200 anti-SARS-CoV-2 repurposing candidates identified through the collective efforts worldwide. Most of the hits showed moderate activity or toxicity (Martinez, 2021;Singh et al., 2020) (Figures 1C and S1A, and Data S1), indicating a need for drugs to arrest viral infection effectively and safely, especially for new variants. Of note, compounds selectively inhibiting viral proteins (e.g., 3CL) or viral entry receptors (e.g., ACE2 and TMPRSS2) might not present transcriptomic reversal of the dysregulated genes in the infected cells. By mapping to a few published screening results (Brimacombe et al., 2020), 38 positive compounds in our collection were found active to inhibit 3CL or Spike-ACE2 binding (efficacy >50% and AC 50 < 50 mM, Data S1). However, their antiviral activities at the cellular level are not always consistent with the target inhibitory activities, suggesting that targets related to host transcriptional programs were involved. Besides, 92 compounds in our list exhibit no obvious inhibition of the three targets, indicating they might act on the host targets. Among all 18 published COVID-19 signatures, only one showed a significant positive correlation between compound antiviral EC 50 values and the scores of CoV signature reversal (sRGES) ( Figure 1D) and only five could enrich positive hits ( Figure S1B). A few even displayed a negative correlation and a negative enrichment ( Figures 1D and S1B), meaning these signatures could lead to a hit rate even lower than random selection. This initial survey of published signatures suggests their considerable variation and a mixed-signal of informing drug discovery, thus a systematic investigation of all published transcriptome profiles is needed.  To select valid disease signatures that capture the pathological biology of CoV infection, we developed a data-driven pipeline utilizing known antiviral compound-induced transcriptomic profiles. Of note, for emerging crises caused by SARS-CoV-2 and its variants, the infected host profiles and active drugs are not often readily available; thus, we aim for a robust model trained from known virus variants data that could be extrapolated to respond to future variants. For example, existing host gene expression profiles of samples infected by SARS-CoV or MERS-CoV might approximate those infected by SARS-CoV-2 based on their high genomic similarity . The high correlation of in vitro drug efficacy between anti-SARS-CoV and anti-MERS-CoV (Spearman correlation coefficient: 0.6, Figure S1C) further confirmed that drugs active against SARS-CoV or MERS-CoV might provide knowledge in SARS-CoV-2 drug screening.
In this pipeline (Figure 2A), we first processed microarray and RNA-seq data from different published studies of SARS-, MERS-and SARS-CoV-2-induced host gene expression change, including both preclinical models and COVID-19 patient samples, which generated various comparisons between infection and control or different infection stages. Based on their reversal pattern to the transcriptomic profiles of known small molecule CoV inhibitors, we defined valid CoV disease signatures, which were subsequently verified in multiple COVID-19 patient cohorts. Then a consensus prediction using valid CoV signatures was performed to propose potent drug candidates against SARS-CoV-2 and its variants of concern (VOC) followed by in vitro validation. Finally, we investigated its antiviral mechanism based on RNA-seq profiling and systems biology.
In total, 1696 host RNA-seq and microarray samples related to SARS/MERS/SARS-CoV-2 infection were collected from 57 published studies (Data S2). SARS-CoV-2 profiles account for more than half of all samples, much more than SARS and MERS samples (14%) because of the improved profiling technology and the greater impact of SARS-CoV-2 ( Figure 2B). We considered COVID-19 patient samples (47.2%) spanning several severity levels (asymptomatic, mild, and severe, Figure S2A) and different tissue of origins (primarily from blood and lung) as well as preclinical (52.8%) samples including human cell lines (human airway epithelial (HAE) cells and other CoV susceptible cells like Calu-3), organoids, and mouse/ferret in vivo models ( Figures 2B and S2B). An integrative analysis of these samples from diverse patient cohorts and experimental settings is deemed to provide a comprehensive picture of CoV-induced host responses. Again, most of the raw signatures implied over-activated interferon and inflammatory responses of the host ( Figure 2C), which are natural defensive reactions for viral infection blockade and should not be suppressed by antiviral drugs. The patterns of other pathway enrichment are diverse across raw CoV signatures, corroborating with the survey of published signatures. For example, NF-kB, JAK/STAT, and cholesterol homeostasis could be both up-and down-regulated in different experiments. Together, reversing one or a few of the raw CoV signatures might not lead to a potent antiviral drug candidate.
To explore the potential of this pipeline in anti-CoV compound discovery, we evaluated its performance when (1) predicting SARS-CoV-2 inhibitors from SARS and MERS data only and (2) predicting external SARS-CoV-2 inhibitors from data of all three viruses.
For the first task, 430 samples from public repositories, representing infections by MERS-CoV or SARS-CoV (and a few other strains for comparison) in different models (e.g., cell line and mouse models) across multiple time points were used to create raw disease signatures ( Figure 2A and Table S2). Their expression profiles were generated using either microarray or RNA-seq. Data processing and signature creation methods were tailored for different profiling platforms (see STAR Methods). We enumerated all possible comparisons ( Figure 2A), including (1) comparisons between infection and mock groups at each time point, and (2) comparisons between different time points within individual infection or mock groups (e.g., time point 1 versus time point 0, time point 2 versus time point 1). Each comparison resulted in a raw disease signature used to characterize the infection status, followed by calculating the sRGES of different drug transcriptomic profiles from the LINCS project . To evaluate the quality and pathologic relevance of each raw disease signature, we used positive drugs identified from in vitro MERS/SARS-CoV testing (38 positive drugs with known LINCS profiles, 29 with EC 50 values; Figure S1C, Table S3 and Data S1    iScience Article ( Figure S3). In contrast, we did not observe significant enrichment of positive control drugs using H1N1 infection signatures. We also confirmed the predictive power of this pipeline using 52 compounds with reported anti-SARS-CoV-2 in vitro efficacy (Data S1). Although derived from SARS and MERS profiles, each valid signature still enriched positive compounds (EC 50 < 10 mM) at the top, whereas the invalid signatures could not ( Figure 3A). Also, the average rank could recall the published SARS-CoV-2 inhibitors with an AU-ROC of 0.78, and it correlated with reported anti-SARS-CoV-2 EC 50 (Spearman R = 0.51, p = 1.21E-04) (Figure 3A). For prospective validation, a drug repurposing library including 1720 bioactive compounds with their LINCS profiles was screened against the valid CoV signatures for COVID-19 drug discovery based on a consensus score of the median rank of each drug across different CoV signatures (Data S5). In this pilot screening, seven out of ten compounds proposed by this pipeline prevented the SARS-CoV-2-induced cytopathic effect (CPE) within 10 mM (Table S4). These results indicated that the selected SARS and MERS signatures capture the essential CoV pathological biology and the strategy of reversing them is applicable for drug discovery against COVID-19 caused by SARS-CoV-2.
Gene Ontology (GO) enrichment analysis suggested common pathways among the 13 valid CoV signatures ( Figure 3B, Data S6). Host regular activities that were inhibited by CoV infection include protein phosphorylation and modification (GO:0001934 and GO:0006464), intracellular signal transduction (GO:1902533), and DNA transcription (GO:0045893). The viruses also developed immune evasion (Kasuga et al., 2021) to suppress cytokine-mediated signaling (GO:0019221) and host defense response (GO:0031349). Gene products located at endosomes and lysosomes (GO:0010008 and GO:0005764) were also down-regulated, probably because the viruses hijacked these organelles for their own manufacturing. Meanwhile, the host protein biosynthesis machinery was highly activated for the viral replication, such as macromolecule biosynthesis (GO:0034645), cotranslational protein targeting to the membrane (GO:0006613), peptide biosynthesis (GO:0043043), large ribosomal subunit (GO:0015934), and mitochondria (GO:0005759) (Gatti et al., 2020;V'kovski et al., 2021). In line with our previous study , the microtubule cytoskeleton system was utilized for viral transportation. Of interest, cell cycle pathways were also up-regulated, including DNA replication and repair (GO:0006260 and GO: 0006281), mitotic cell cycle transition (GO:1901990), and P53 signaling (GO:1901796).
For the second task, 49 RNA-seq datasets of SARS-CoV-2 infection were incorporated to generate valid SARS-CoV-2 signatures, yielding a set of 23 valid SARS/MERS/SARS-CoV-2 infection signatures in total (i.e., robust CoV signatures v2, Figure 3C, Data S7). In an external positive drug set (Data S1), the reversal of CoV signatures v2 could predict known CoV inhibitors (AU-ROC = 0.76) and significantly correlated with drug EC 50 values (Spearman R = 0.47, p = 0.005, Figure 3C). Most of the invalid CoV signatures showed no enrichment of positive drugs (grey curves in Figures 3A and 3C), suggesting their irrelevance with anti-CoV drug discovery. Of interest, several invalid CoV signatures enriched the positive drugs at the right end, which indicates the mimicking of host antiviral response by those host-targeting drugs. Similar patterns were observed in the published CoV signatures as well.
The valid CoV signatures captured essential biology of how CoVs hijack the host cell machinery. Through GO enrichment analysis on the updated CoV signatures (Data S6), we observed that general pathway enrichment patterns remained similar to CoV signatures v1, i.e., inhibiting host activity and antiviral response (Kasuga et al., 2021;Lei et al., 2020;Schroeder et al., 2021;Xia et al., 2020) while boosting protein and RNA biosynthesis ( Figure 3D). More specific than v1, the CoV signatures v2 highlighted down-regulation of cell migration (GO:0030335) and neutrophil-mediated immunity (GO:0002446). Another minor difference is that virus hijacked ATP synthesis on mitochondria inner membrane (GO:0042775) was more significant in v2 signatures, whereas v1 enriched genes were expressed in the mitochondrial matrix. The valid CoV signatures (v2) maintain some diversity in biological pathways and form complementary clusters ( Figure S4A). Compared with the above mentioned published COVID-19 signatures, the selected valid CoV signatures are richer in pathway presentation ( Figure S4). Unlike the published signatures, which mainly iScience Article focus on interferon pathway over-activation, the valid signatures proposed in this work captured the immune evasive nature of these CoVs at the early stage of infection ( Figures S4C and S4D), which is essential in regulating host antiviral processes to fight against different variants.
We also sought to answer the question of why, among hundreds of infection versus control comparisons, only 23 were informative for drug screening. First, CoV infection dysregulated pathways showed diverse and complicated dynamic patterns over different time points and varied across different research models ( Figure S5A), meaning not all comparisons are valid for drug screening. Second, by examining the SARS-CoV-2-induced transcriptomic changes in different tissues, we found that infection profiles of lung and PBMC followed similar patterns with the CoV meta-signature v2 (Spearman Rho: 0.62 and 0.48), whereas no obvious pattern was observed in liver, kidney, and heart samples ( Figure S5B). Because the lung is the primary target organ for COVID-19, and PBMCs circulating in the blood are highly exposed to SARS-CoV-2, the transcriptomic changes in these two organs are considered appropriate to characterize the viral infection. These observations emphasize the power of data-driven approaches in teasing out useful signals for therapeutic discovery.

The CoV signature genes separate COVID-19 patients from controls
To confirm that the selected disease signatures capture essential COVID-19 pathological processes, we first created a summarized CoV meta-signature based on the 13 valid signatures (v1), with 88 genes having a positive effect size and 43 genes having a negative effect size. Furthermore, we collected 14 studies comprising 21 patient cohorts (Data S8) with 560 whole-transcriptomic samples from different organs (e.g., lung, heart, and blood) of COVID-19 patients, healthy donors, and patients who had died of non-infectious diseases. For each cohort, principal component analysis (PCA) was performed using the 131 CoV signature genes or 131 random genes to visualize the separation of COVID-19 patient samples from others ( Figure 4A). As expected, the CoV signature genes significantly outperformed random genes in separating COVID-19 samples from those of healthy donors or other diseases ( Figure 4B, within the 2D space defined by the first two principal components). For example, in a study of lung autopsies of deceased patients, CoV signature genes correctly classified 90% of the COVID-19 patients, whereas the random genes only showed a 70% accuracy ( Figure 4C, dataset SRP261138). Note that when two groups of samples present distinct transcriptomic profiles, even the random genes could separate them although the set of CoV signature genes performs better. The CoV signature genes were also found to associate with symptom severity. Using data from a study of lung autopsy (SRP265869), we observed 95% and 90% of classification accuracy were achieved by the selected genes for degree 2 and 3 lung damaged patients, yet the separation was reduced to 86% in degree 1 ( Figure 4C). Using the data from a whole-blood study (SRP274382), the separation accuracy increased as severity elevated (the accuracies in severity degree of 0, 1, and 2 were 50%, 82%, and 85%, respectively) ( Figure 4C). Similarly, in studies SRP279280 and SRP293106, the CoV signature performed better in intensive care unit (ICU) patients than in non-ICU patients and better in hospitalized patients than in non-hospitalized patients ( Figure 4C). Similarly, we evaluated the CoV meta-signature using data from SARS-CoV-2 infected preclinical models (cell lines and organoids) and found the separation was also significant ( Figure S6). These results suggested that the proposed CoV meta-signature successfully characterized the critical features of SARS-CoV-2 infection.
The CoV signatures led to the discovery of IMD-0354 as a potent candidate against SARS-CoV-2 and its variants Next, we applied the valid CoV signatures v2 to drug prediction for SARS-CoV-2 variants. The candidate ranking list was also updated (Data S9), with IMD-0354 (IKK-2-inhibitor-V) at the top. Its transcriptomic profiles presented a clear reversal pattern compared with CoV meta-signature v2 ( Figure 5A). In line with the prediction, IMD-0354 demonstrated potent inhibitory activity against six variant viruses in VeroE6 cells, with IC 50 values around 50 nM for the lineage A virus (ancestral SARS-CoV-2), Alpha, Beta, Delta, and Kappa variants, and 107 nM for the Gamma variant. With a CC 50 of 14 mM, IMD-0354 showed a wide therapeutic window of 132-folds ( Figure 5B). Kato et al. (Kato et al., 2020, p. 0354) also reported its prevention of the ancestral SARS-CoV-2-induced CPE on VeroE6/TMPRSS2 cells, confirming our results. Strikingly, IMD-0354 is 90-fold more active than remdesivir (IC 50 2-10 mM, Figure 5C) to the six variants tested. In Calu-3 cells, the antiviral IC 50 of IMD-0354 was 0.52 mM with no observable cytotoxicity ( Figure 5D), whereas the IC 50 of remdesivir was 1.27 mM ( Figure 5E). IMD-0354 is structurally similar to niclosamide, an anthelmintic that elicits broad-spectrum antiviral activity (Xu et al., 2020)  iScience Article trial and is under Phase II investigation for chronic obstructive pulmonary disease (COPD) (NCT00883584) and pulmonary fibrosis (IMMD Drug Discovery Business), indicating its high safety and reasonable bioavailability for lung diseases.
We also tested the cellular antiviral activity of four other compounds showing varying predicted priorities, i.e., puromycin (ranked 22), methotrexate (ranked 110), methylene blue (ranked 148), and dasatinib (ranked 391) ( Figure 5A). Consistent with the ranking, puromycin inhibited Kappa viral replication with an IC 50 of 0.88 mM and inhibited other variants with IC 50 values ranging from 3 to 15 mM, suggesting a moderate activity ( Figure 5F). Methotrexate and methylene blue were active only at the highest concentration tested (Figures 5G and 5H). Dasatinib, the least prioritized one, although showed moderate activity (IC 50 $ 20mM), its CC 50 was even lower (4.65 mM), suggesting a nonspecific inhibition to the host cell activities instead of specifically targeting the CoV signatures ( Figure 5I). These in vitro validation results confirmed

Selected Candidates
Ethambutol (1299) Betulinic Acid (1355) Random Compounds  iScience Article the robustness of our CoV signatures and the rationale of drug screening based on the reversal of infected host gene expression. Together, we propose IMD-0354 or its prodrug as a drug candidate for SARS-CoV-2 and its variants.

IMD-0354 inhibits viral infection through type I interferon stimulation and other polypharmacological mechanisms
To gain better insights into the mechanism of action (MoA) of IMD-0354 on viral inhibitory processes, we performed RNA-seq based on three sets of experiments, including control, SARS-CoV-2 infection, and IMD-0354 treatment followed by SARS-CoV-2 infection in Calu-3 cells ( Figure 6A). We used the ribo-minus approach for RNA-seq library preparation to capture the host mRNA as well as viral RNA. After mapping the sequencing reads to the human transcriptome, we obtained 4089 differentially expressed (DE) genes in the infection group compared to the control, and 4828 DE genes in the treatment compared to the infection group ( Figure 6B and Data S10). The gene expression clustering presents a counteracting pattern between the summarized meta-CoV signature and the IMD-0354 treated group ( Figure 6C), confirming our initial hypothesis that IMD-0354 could reverse the COVID-19 infection-induced gene expression pattern. We also mapped the sequencing reads to the SARS-CoV-2 genome and observed minimal/no expression of viral genes in control and treatment samples, yet high expression of viral genes in infected samples (Figure 6D). In detail, the viral M, N, ORF7a, ORF8, ORF9b, and ORF9c showed higher expression (read counts normalized with gene length) than other genes (Wilcox rank-sum test, p-value < 0.05, two-sided; Figure 6E); whereas NSP11, ORF2b, ORF3b, ORF3c, and ORF7b were least expressed compared with other genes. Of interest, in IMD-0354-treated samples, expression of all viral genes was at the same and low level, suggesting complete viral inhibition ( Figure 6F).
We further investigated these RNA-seq samples to delineate the MoA. With a minimal amount of viral RNA in the treatment group, the RNA-seq profiles mainly captured the effect of IMD-0354 pre-treatment on host cells. Compared with the merged CoV signature (v2, mean log 2 fold change), several innate immune response genes inhibited by the virus were significantly up-regulated in the IMD-0354 treated group, including IFNB1, IL1B, IL6, and TNF ( Figure 7A and Data S11). GO enrichment analysis also suggested up-regulation of cytokine-mediated signaling, viral process negative regulation, and NF-kB transcription factor activation ( Figure 7B and Data S12). These results suggest that IMD-0354 blocks viral replication by boosting the antiviral response in lung epithelial cells. Based on the assumption that perturbagens with similar transcriptomic profiles share similar MoA ( Figure 7C), we used the IMD-0354 RNA-seq profile to query gene knock-down and over-expression profiles in LINCS. The transcriptomic changes induced by over-expression of IFNG, TIRAP, or IFNB1 mimicked IMD-0354 with high RGES values of 0.83, 0.63, and 0.58, respectively (Data S13). Similarly, the whole-transcriptomic profiles of interferons on small airway epithelial cells also exhibited similar patterns with IMD-0354 (RGES $0.5); whereas ruxolitinib, a JAK inhibitor, showed no correlation (RGES = 0.02, Figure 7D and Data S10, data source: GSE161664). By incubating Calu-1 cells with IMD-0354 (without subsequent viral infection), we also observed increased expression of IFN-b, CXCL10, and IL-6 with 15, 9, and 36-folds, respectively; and the activation of these genes was counteracted by BX795, a TBK1/IKKε inhibitor that blocks IFN-b production ( Figure 7E and Table S5). These results suggest that IMD-0354 pre-treatment activates interferon-mediated antiviral signaling, which has been proven to inhibit SARS-CoV-2 infection . IMD-0354 did not over-activate ACE2, a receptor of SARS-CoV-2 and also an interferon-stimulated gene (log 2 fold change 0.47, Data S10). IMD-0354 did not stimulate IFN-b or IL-6 in macrophages, meaning this drug might not worsen the cytokine storm caused by COVID-19 ( Figure S7). Although BX795 and ruxolitinib inhibit type-I interferon response, neither of them abrogated the antiviral effect of IMD-0354 ( Figure S8), suggesting other potential pathways regulated by IMD-0354. In addition, the knock-down of CHMP2A, a core component of the endosomal sorting required for transport complex III involved in virus budding (Votteler and Sundquist, 2013), also resembled the IMD-0354 RNA-seq profile (RGES = 0.69, Figure 7F and Data S14). CHMP2A directly interacts with ORF9b of SARS-CoV-2 (Gordon et al., 2020), which was found hyperactively expressed during infection but suppressed by IMD-0354 treatment ( Figures 6E and 6F). Thus, we reason that IMD-0354 might down-regulate  Figure 7F and Data S14    iScience Article might not contribute to its antiviral effect ( Figure 7F). Moreover, IMD-0354 inhibits TMPRSS4 (Kang et al., 2013), one of the crucial enzymes for SARS-CoV-2 entry (Zang et al., 2020), which could partially explain its higher EC 50 in Calu-3 cells than VeroE6 ( Figures 5B and 5D), as TMPRSS4 is elevated in Calu-3. Taken together, we propose that IMD-0354 inhibits SARS-CoV-2 replication mainly through inducing type I interferon-mediated antiviral response, together with multiple polypharmacological mechanisms including TMPRSS4 inhibition mediated viral entry blockade, CHMP2A down-regulation mediated virus budding inhibition, and regulation of host proteins hijacked by the virus (Figure 7G).

DISCUSSION
SARS-CoV-2 induces variable responses in humans, causing highly diverse pathological symptoms. Thus, a therapeutic agent that targets only a single pathway or gene may only lead to weak inhibition. Systembased approaches hold great promise to complement the traditional target-based approaches and have been employed to propose drug repurposing candidates for COVID-19 (Bocci et al., 2020;Le et al., 2021;Zhou et al., 2020). One of the system-based strategies is to propose drugs to reverse disease gene expression (Qu and Rajpal, 2012), which has demonstrated successes in drug repurposing in various diseases (Chen et al., 2021a;Dhindsa et al., 2021;Mun et al., 2020;Qu and Rajpal, 2012). More than 1,000 RNA-seq profiles of SARS-CoV-2 infected patients and preclinical models are publicly available, forming a rich resource of COVID-19 signatures for repurposing drugs. However, the pathological relevance of the CoV infection signature is critical for antiviral drug discovery. For example, those COVID-19 signatures focusing on over-activated host immune response do not present a clear pathological effect of virus hijacking the host cell machinery or immune escaping. We found the raw CoV signatures displayed diverse patterns of enriched pathways, and only 6% of them could recover published anti-SARS-CoV-2 active compounds, which motivated us to develop a computational system-based framework to select disease signatures relevant to therapeutic discovery against SARS-CoV-2 and its current and emerging variants.
Inspired by the high correlation of in vitro drug efficacy data between SARS-CoV and MERS-CoV, we distilled knowledge from published inhibitors of related coronaviruses to evaluate the pathological relevance of the CoV signatures and applied the valid signatures to drug discovery against new variants. Because SARS-CoV, MERS-CoV and SARS-CoV-2 have highly similar genomes , it is likely that they elicit similar dysregulations of cellular machinery that affect host genes and pathways. The positive enrichment of the SARS-CoV-2 inhibitors in the hits predicted from the valid SARS and MERS signatures further supported that drug candidates identified with this method would be active against new variants. Indeed, the promising in vitro results of the top candidate IMD-0354 in the inhibition of SARS-CoV-2 and six VOC supported our hypothesis. Thus, by using legacy profiles from the same virus family and compound efficacy against previous viruses, we can quickly narrow down the candidate list for emerging viruses whose profiles are often not readily available.
Among hundreds of possible signatures, only a small fraction of them are informative in drug prediction. Our data-driven approach elegantly teased out those informative signatures, resulting in a robust metasignature of CoV-hijacking host transcriptomic change. Although this signature was initially derived from SARS and MERS data, it retains the power to reclassify the SARS-CoV-2 infected and control patient samples in multiple independent datasets. Furthermore, the valid SARS-CoV-2 signatures were merged with SARS-CoV and MERS-CoV signatures to create a refined CoV signature. Pathway analysis of these valid signatures revealed the dysregulation of several known viral pathways. For example, the antiviral immune response was down-regulated in our signatures and was in concordance with previous reports (Kasuga et al., 2021;Xia et al., 2020). In addition, replication, mitochondrial ATP synthesis, translation, and peptide synthesis-related pathways required for viral multiplication were up-regulated in these signatures (Chen et al., 2021b;deBreyne et al., 2020;Gatti et al., 2020;V'kovski et al., 2021). A drug candidate exhibiting strong antiviral effects against SARS-CoV-2 is expected to target one or multiple dysregulated pathways.  iScience Article The RNA-seq analysis of our drug-treated samples and SARS-CoV-2 infected samples showed the inverse relationship on different pathways, in agreement with our hypothesis that IMD-0354 targets multiple pathways dysregulated by the virus. In addition, the viral gene expression analysis depicted the hyperactivation of SARS-CoV-2 proteins, including M, N, ORF7a, ORF8, ORF9b, and ORF9c in infected samples. The M protein and ORF7a are known for assembling and budding viral particles and are involved in recruiting structural proteins to ER-Golgi intermediate compartments (Nelson et al., 2005;Voss et al., 2009). The N protein is involved in genome protection, viral RNA replication, virion assembly, and immune evasion (including IFN-I suppression) (Cubuk et al., 2021;Mu et al., 2020, p. 1;. The ORF8 protein is crucial in viral assembly and immune invasion via inhibiting type I interferon signaling (Li et al., 2020, p. 6;Zhang et al., 2021, p. 6). The ORF9b protein dysregulates mitochondrial function (Shi et al., 2014;Wu et al., 2021), and the ORF9c protein interacts with various host proteins including Sigma receptors, implying involvement in lipid remodeling and the ER stress response (Gordon et al., 2020). The absence of these genes in IMD-0354 treated cells suggests a complete viral infection inhibition under 0.5 mM.
The comparison of the merged CoV signature and IMD-0354-induced transcriptomic profiles revealed that IMD-0354 activated interferon pathway-related genes (e.g., IFNB1, IL1B, IL6, and CXCL8). Several studies suggested that type I interferon activation leads to the blockage of SARS-CoV-2 infection (Kim and Shin, 2021;Lei et al., 2020;Pierce Carl et al., 2020;Schroeder et al., 2021;van der Wijst Monique et al., 2021;Xia et al., 2020). Independent validation by qPCR further confirmed that IMD-0354 stimulated the interferon pathway even without any viral challenge. This was also supported by the analysis of viral proteins, where we observed activation of viral N protein, an interferon inhibitor, in infected samples but down-regulation of viral N protein in the IMD-0354 treated samples. In addition, down-regulation of CHMP2A could lead to viral inhibition as it is a member of the endosomal sorting complex required for transport (ESCRT), a cellular machinery hijacked by the virus for its replication and release (Calistri et al., 2021). TMPRSS4 and a similar serine protease TMPRSS2 mediate SARS-CoV-2 entry into cells with an additive effect by inducing cleavage of the S protein and enhancing membrane fusion (Wruck and Adjaye, Zang et al., 2020), and TMPRSS2 blockage resulted in SARS-CoV-2 inhibition (Hoffmann et al., 2020). In addition, the knock-down profile of several host proteins hijacked by the virus such as AP2M1 and NUP88 also showed significant correlations with the IMD-0354-induced profile. Collectively, the inhibition of these proteins by IMD-0354 would effectively inhibit viral replication, as evidenced by the negligible viral gene expression in the RNA-seq results of drug-treated samples, which resembled uninfected control samples ( Figure 6D). Together, these suggest the efficiency of our system-based host-transcriptome-targeting approach in the discovery of novel antiviral drugs with polypharmacological effects. Our robust CoV signatures would be instrumental in the future also to discover novel anti-CoV drugs even with very limited transcriptomic profiles.
In conclusion, by incorporating the knowledge of published CoV inhibitors and aggregating the disease signatures of SARS-CoV, MERS-CoV, and SARS-CoV-2 in a data-driven fashion, we provide a collection of robust CoV signatures to enable drug discovery for new variants, which led us to discover a potent drug candidate, IMD-0354, with anti-SARS-CoV-2 activity, which also showed enhanced potency against several variants. Our MoA analysis suggests that IMD-0354 activates type I interferon antiviral response and targets multiple pathways involved in the viral life cycle, suggesting the power of this system-based approach for drug discovery and the importance of robust pathological relevant CoV signatures. This pipeline might provide more robust CoV signatures as a forward-looking tool to enable drug discovery for new variants and future potential pandemics by iteratively updating CoV infection profiles and anti-CoV drug profiles.

Limitations of the study
Here are some limitations of our work and how they can be overcome. Because we used the transcriptome profiles from different in vivo and in vitro models, which might add the biases and limitations in their disease signatures, to overcome the limitation of technical variations, we created the disease signatures of all datasets with one pipeline and made comparisons within individual studies. In addition, diverse in vivo and in vitro models were included to capture all possible transcriptional dysregulated profiles and only those that could inform drug discovery were selected to create a robust disease signature for SARS-CoV-2. Although our pipeline offers a unique approach to discovering potent drug candidates for future pandemics, it requires prior knowledge of transcriptomics profiles and active compound profiles of any of the family members related to the infection particle. This limitation could be overcome with the advancement and wide availability of sequencing technologies. iScience Article from the LINCS, which are based on cancer cells and only cover 978 landmark genes. Although this database has already been successfully applied for drug predictions in various non-oncology diseases, a drug profile library database specific to viral perturbation is expected to improve performance. Lastly, the IMD-0354 prodrug IMD-1041 is even more promising because it is orally available and has been investigated for COPD, a group of lung diseases that block airflow and make it difficult to breathe. However, because this drug is currently owned by a private company and relevant information, including chemical structure, is not available, it is not feasible to immediately launch animal studies to fully evaluate its candidacy.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
This research is supported by R01GM134307, K01 ES028047, and the MSU Global Impact Initiative. The pathogen resource (NCCP43326, NCCP43381, NCCP43382, NCCP43390, NCCP43388, NCCP43389) for this study was provided by the National Culture Collection for Pathogens of Korea. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2017M3A9G6068245 andNRF-2020M3E9A1041756). The content is solely the responsibility of the authors and does not necessarily represent the official views of sponsors. We thank Nova Chekalina for inspiring the visualization and Thomas Dexheimer for reagent preparation. We appreciate all researchers who produced the datasets that make this work possible.  (2020). A SARS-CoV-2 protein interaction map reveals targets for drug-repurposing. Nature 583, 459-468. https:// doi.org/10.1038/s41586-020-2286-9.

Experimental design
This study aims to discover drug repurposing candidates for COVID-19 and emerging VOC using a computational system-based approach, which ranks drugs based on their predicted potency to reverse the CoV host response signature (i.e., host gene expression dysregulated in COVID-19 samples). We made tremendous efforts to decipher the complexity and variance of host responses through large-scale analysis of published data. When collecting and processing the CoV transcriptomic profiles, we used the same pipeline to process all the raw sequence data and made the comparison within the same study to minimize the batch effect. Sample annotation followed the original study, and no samples were purposely excluded from the analysis. Published anti-CoV compounds (i.e., positive controls) were divided into a calibration set and a testing set to select and validate CoV signatures for drug discovery. We screened the LINCS drug-treated transcriptomic profiles to prioritize repurposed candidates that can reverse the robust CoV infection signatures. The experimental validation of the repurposed drug candidates for inhibiting SARS-CoV-2 and its VOC was performed on two frequently used cell lines, namely Calu-3 and Vero-E6. The MoA of the most potent drug was also explored based on RNA-seq profiling and perturbagen connectivity scoring, followed by independent q-PCR confirmation. Technical and biological replicates in each experiment were used to check the robustness of the results.

MSigDB hallmark gene enrichment analysis
Fisher exact tests were performed to calculate the p value of a gene set enrichment in the up-or downregulated genes from a published COVID-19 signature or a raw CoV signature. The background was defined as a union of all genes annotated in MSigDB Hallmark dataset. For improved visualization, the p values were transformed by -log10 for up-regulation enrichment and by +log10 for down-regulation enrichment, and we took the one with a larger absolute value as the final direction.

Computation of infection signatures
SARS-CoV, MERS-CoV, and SARS-CoV-2 related data were retrieved from ArrayExpress, Gene Expression Omnibus (GEO), and Sequence Read Archive (SRA). The meta information of each sample was manually curated, including virus strain, model, organism, and time point. The expression matrix for each microarray data was downloaded via the GEOquery R package. The matrix was further filtered by removing the probes with expression in only half of the samples. Expression values were normalized using quantile normalization, and log 2 transformation was applied for each matrix. The probe values were merged based on Entrez Gene ID. The Significance Analysis of Microarrays (SAM) method was used to compute differentially expressed (DE) genes with criteria log 2 fold change R1 or % À1 and false discovery rate (FDR) < 0.05. Gene symbols of other organisms were converted to HUGO gene symbols using the biomaRt package. For RNA-seq datasets, raw sequence data were downloaded from SRA and processed with the TOIL pipeline Vivian et al., 2017). EdgeR (Robinson et al., 2010) was used to compute DE genes using the same criteria as used for microarray data, except that log 2 fold change threshold was set to 0.585 (zlog 2 1.5) for SARS- iScience Article all the comparisons across all time points, and corresponding comparisons were performed in the mock group. The DE genes that were uniquely present in the infection group were selected for further analysis.
We also compared DE genes between infection and mock groups at each time point, together with consistently dysregulated genes from the first to last time point.

Computation of drug signatures
Drug gene expression profiles have been widely used in our previous studies. Briefly, a full matrix comprising 476,251 signatures and 22,268 genes including 978 landmark genes (as of September 2013) was downloaded from the LINCS website (https://clue.io). The meta-information of the signatures (for example, cell line, treatment duration, treatment concentration) was retrieved via LINCS Application Program Interfaces. The matrix and metadata are now available via GSE92742 in GEO. The signature derived from the comparison of gene expressions between the perturbagen-or vehicle control-treated samples represents gene expression changes upon treatment. We further downloaded the LINCS drug information from the Drug Repurposing Hub. Only small molecules with high-quality gene expression profiles (is_gold = 1, annotated in the meta information) and listed in the Drug Repurposing Hub were further analyzed.

Reversal correlation
The computation of Reversal of Gene Expression Score (RGES) and the summarization of RGES (to give the summarized RGES, or sRGES) were detailed elsewhere and recently implemented as a standalone R package . In short, we quantified the reversal of disease gene expression as RGES, a measure modified from the connectivity score developed in other studies (Sirota et al., 2011;Subramanian et al., 2017). To compute RGES, we first ranked genes based on their expression values in each drug profile. An enrichment score for each set of up-and down-regulated disease genes was computed separately using a Kolmogorov-Smirnov-like statistic, followed by merging scores from both sets (up/down). The score is based on the extent to which the provided genes (up or down-regulated disease genes) are located at either the top or bottom of the ranked drug expression profile. One compound might have multiple expression profiles because they were tested in various cell lines, drug concentrations, treatment durations, or occasionally different replicates, resulting in multiple RGES for one disease prediction. We set a reference condition (i.e., concentration of 10 mM and treatment duration of 24 h) and used a model to estimate a new RGES if the drug profile under the reference condition was not available. We summarized these scores as sRGES without weighting the cell lines. We considered predictions to be insignificant if the maximum of the absolute sRGES is <0.25.

Selection of valid infection signatures
Drugs with known in vitro activity against any of the three CoVs (i.e., SARS-CoV, MERS-CoV and SARS-CoV-2) served as positive controls to select valid infection signatures (Data S1). When selecting the first version (V1) of valid signatures, we used all positive drugs reported before February 2020. After updating published anti-SARS-CoV-2 hits, the positive drugs were equally distributed to a calibration set and an external test set. Only the calibration set was used to select valid signatures. Qualified signatures should meet the following criteria: (1) derived from CoV infection experiments; (2) the number of DE genes mapped to LINCS was > 50 (not applied to SARS-CoV-2 signature selection because of generally fewer DE genes); (3) the maximum absolute sRGES prediction was > 0.25; (4) the sRGES of positive drugs was enriched at the top (one side Wilcoxon rank-sum test p < 0.05, FDR < 0.25); (5) the sRGES and the average EC 50 value of positive drugs were highly correlated (Spearman Rho >= 0.4, p < 0.05; not applied to SARS-CoV-2 signature selection because of the highly varied experimental settings).
Quantifying the separation of patient and control samples using a subset of transcriptome Transcripts per million (TPM) values were used to quantify the transcription of N selected genes (e.g., CoV signature genes) in every sample. For a cohort with M 1 patient samples and M 2 control samples (from healthy donors or patients diagnosed with other diseases), an N * (M 1 + M 2 ) matrix was transformed using PCA. Then the first and second PCs were used as input features to fit a linear discriminant analysis (LDA) model. The fitted model assigned a predicted label of ''patient'' or ''control'' to each sample. Finally, to quantify the separation, we calculated the balanced accuracy score based on the true label and predicted label of each sample in this cohort.

OPEN ACCESS
iScience 25, 105068, October 21, 2022 RNA samples from control, infection and treatment samples were prepared using the KAPA RNA HyperPrep Kit and paired-end sequencing was performed in Illumina NextSeq500. The ribosomal reduction RNA-seq approach was used to capture the host and viral RNAs. Sequencing reads were mapped on the human Hg38 transcriptome with the ENSEMBL GRCh38.p3 annotation using STAR aligner (Dobin et al., 2013) to capture the number of reads mapped to individual genes. Mapped reads were used to compute the DE genes using the edgeR package (Robinson et al., 2010) implemented in the OCTAD package . The genes with criteria log 2 fold change R1 or % À1 with FDR <0.05 were defined as DE genes. Additionally, to compare the viral genes among each other, the number of reads mapped was normalized with the gene length and total reads. One sample with exceptional high viral gene expression in the treatment group was excluded from the DE analysis.
For viral detection, we started with the unmapped reads left from the human transcriptome mapping to align on the 11406 viral genomes. Since more than 99% of reads were mapped on the SARS-CoV-2 genome, we extracted the reads mapped on the SARS-CoV-2 genome and quantified the reads mapped on each viral gene. Further, raw read counts of each viral gene were normalized with total read counts and gene length. Two sides Wilcox rank sums test was performed to compare viral gene expression in different samples. The p-value cut-off % 0.05 was used to identify the significant difference between the two groups.