Machine learning on large scale perturbation screens for SARS-CoV-2 host factors identifies β-catenin/CBP inhibitor PRI-724 as a potent antiviral

Expanding antiviral treatment options against SARS-CoV-2 remains crucial as the virus evolves under selection pressure which already led to the emergence of several drug resistant strains. Broad spectrum host-directed antivirals (HDA) are promising therapeutic options, however the robust identification of relevant host factors by CRISPR/Cas9 or RNA interference screens remains challenging due to low consistency in the resulting hits. To address this issue, we employed machine learning, based on experimental data from several knockout screens and a drug screen. We trained classifiers using genes essential for virus life cycle obtained from the knockout screens. The machines based their predictions on features describing cellular localization, protein domains, annotated gene sets from Gene Ontology, gene and protein sequences, and experimental data from proteomics, phospho-proteomics, protein interaction and transcriptomic profiles of SARS-CoV-2 infected cells. The models reached a remarkable performance suggesting patterns of intrinsic data consistency. The predicted HDF were enriched in sets of genes particularly encoding development, morphogenesis, and neural processes. Focusing on development and morphogenesis-associated gene sets, we found β-catenin to be central and selected PRI-724, a canonical β-catenin/CBP disruptor, as a potential HDA. PRI-724 limited infection with SARS-CoV-2 variants, SARS-CoV-1, MERS-CoV and IAV in different cell line models. We detected a concentration-dependent reduction in cytopathic effects, viral RNA replication, and infectious virus production in SARS-CoV-2 and SARS-CoV-1-infected cells. Independent of virus infection, PRI-724 treatment caused cell cycle deregulation which substantiates its potential as a broad spectrum antiviral. Our proposed machine learning concept supports focusing and accelerating the discovery of host dependency factors and identification of potential host-directed antivirals.


Introduction
Genetic plasticity of SARS-CoV-2 and herd immunity-derived evolutionary pressure have led to the emergence of variants of concern (VOC; Van Egeren et al., 2021;Flores-Vega et al., 2022). Some of the variants and recombinant lineages have evaded vaccine-elicited antibodies and monoclonal antibody-based treatments (Dejnirattisai et al., 2021;Planas et al., 2021;Wang P. et al., 2021;Iketani et al., 2022). Although antiviral drugs such as Remdesivir, Molnupiravir, or Nirmatrelvir/Ritonavir (Paxlovid) have significantly improved disease outcome in patients (Beigel et al., 2020;Hammond et al., 2022;Jayk Bernal et al., 2022;Wong et al., 2022), these treatments might select for more resistant variants over time (Gandhi et al., 2022;Hogan et al., 2023). Consequently, our current virus-directed countermeasures have limitations and there is an urgent need for developing hostdirected antivirals (HDA) that can target host-dependency factors (HDF) required for the life cycle of the virus. HDA might provide variant-independent treatment opportunities to reduce disease severity and complement virus-directed antiviral therapies (Edinger TO et al., 2014;Mahajan et al., 2021;Wagoner et al., 2022). In addition, drug repurposing can offer an expedited timeline to bring HDA therapies into clinical settings in a cost-effective and timely manner. Still, a common challenge is the robust identification of relevant host factors among varying in vitro conditions, although recent advances in functional genomics have simplified the use of high-throughput technology platforms based on CRISPR/Cas9 and siRNAs (Daniloski et al., 2021;Wang R. et al., 2021;Wei et al., 2021;Zhu et al., 2021). However, when comparing the results of these experimental screens for SARS-CoV-2 host dependency factors, there is only a marginal overlap of hits. This may be due to different viral strains, host cells and/or different multiplicity of infection (MOI) or further different experimental settings such as the selection of knockout library constructs used. Interestingly, when grouping the identified HDF into gene sets of common function or cellular processes, the consistency increases suggesting that these lists of HDF in their specific contexts contain consistent patterns on a more complex level.
Machine learning methods can identify such common patterns in experimental data, as, e.g., the identification of common regulators explaining large scale transcription profiles (Alipanahi et al., 2015;Hörhold et al., 2020) and allow for integration of a broad variety of omics and genomics data. In this study, we followed a machine learning approach to identify HDF of SARS-CoV-2. From the knockout screens (Daniloski et al., 2021;Wang R. et al., 2021;Wei et al., 2021;Zhu et al., 2021), we assembled a gold standard of genes required for virus replication. With this gold standard, we trained and evaluated the classifier. The classifier based its predictions on features of the genes describing their encoded proteins´ cellular localization, protein domains, annotated cellular processes, and gene/protein sequences. Furthermore, we embedded experimental data comprising proteomic, PPI, and transcriptomic profiles of SARS-CoV-2 infected cells. In addition, another classifier was trained based on a gold standard derived from a drug screen (Ellinger et al., 2021), in which more than 6,000 drugs were screened to determine whether they had a protective effect on cells infected with SARS-CoV-2. Both classifiers delivered a list of predicted HDF. For a few, well selected HDF, we interrogated drug databases, and identified β-catenin and its small molecule inhibitor PRI-724, particularly antagonizing the interaction between β-catenin and CBP.
The β-catenin/CBP interaction displays a pivotal part of a switchlike signaling network within the Wnt/β-catenin pathway that controls the well-balanced interplay between cell proliferation and differentiation (Akiyama, 2000;Liu et al., 2022). Activation of Wnt signaling leads to a sequester of the β-catenin destruction complex enabling cytosolic accumulation of β-catenin and, in turn, its nuclear translocation where it interacts with CBP or p300 to drive either proliferation or differentiation, respectively (Bordonaro and Lazarova, 2016). In this study, we identified β-catenin as a SARS-CoV-2 host factor using machine-learning and sought to analyze the efficacy of one of its inhibitors, PRI-724, against different pandemic-related RNA viruses in vitro.

Gene prioritizations
Two sets of machines were set up to predict HDF, one based on data from CRISPR/Cas9 based knockout screens, and one on data from a drug screen. For defining the gold standard for the first set of machines, the experimental results from four genome-wide CRISPR/ Cas9 knockout screens were considered (Daniloski et al., 2021;Wang R. et al., 2021;Wei et al., 2021;Zhu et al., 2021; Table 1). The ranking of the screened genes was taken from the Supplementary material of the respective studies. The rank product was calculated for each gene for which information was available. The n = 500 top ranking genes of this new list were selected and used as the positive class for the To get a combined list of genes, we calculated the rank product for each gene listed in their score tables and used this combined ranking for our study. In addition, a second gold standard was established based on the screening of 5,632 compounds and their capacity to inhibit the cytopathic effect (CPE) of SARS-CoV-2 in a Caco-2 cell line (Ellinger et al., 2021). In line with the authors of the original study, compounds that blocked viral CPE by 75% were considered as hits. This led to 273 compounds inhibiting SARS-CoV-2 infection without inducing toxicity. Then, their respective targets were identified using the information of different drug databases [Drugbank (Wishart et al., 2018), ChEMBL (Mendez et al., 2019), TTD (Wang et al., 2020), PharmGKB (Thorn et al., 2013), and BindingDB (Gilson et al., 2016)]. By this, 178 gene targets were found and labeled as "HDF. " For the remaining drugs with no inhibitory effect, their 1,881 targets were labeled as "non-HDF. " Effective drugs with more than 11 targets were considered to be "promiscuous, " and their targets were not included in the gold standard.

Feature generation
The procedure for feature generation was similar as published earlier for essential gene prediction (Acencio and Lemke, 2009;Aromolaran et al., 2020;Beder et al., 2021). Each gene served as a sample in the machine learning procedure, either labeled as HDF or non-HDF, or was not used for training the classifiers. For each gene, we generated a comprehensive set of 60,593 features comprising the seven categories of (1) gene sequence, (2) protein sequence, (3) protein domains, (4) gene sets from Gene Ontology, (5) conservation, (6) topology in the protein interaction network, and (7) subcellular location of the protein. We distinguished between intrinsic and extrinsic features. Intrinsic features denote features, which can be directly derived from gene and protein sequences, whereas extrinsic features describe network topology, homology, cellular localization of the expressed protein and functional (interaction) domains. Employing BioMart (Smedley et al., 2009), we downloaded the gene and protein sequences from Ensembl (v102). Protein and gene sequence features (categories 1 and 2) were calculated with several software tools [seqinR (Charif et al., 2005), protr (Xiao et al., 2015)], CodonW, 1 and rDNAse 2 spanning a large range of descriptors as explained in the following. Using seqinR, we calculated the number and percentage of each of the 20 amino acids in a protein. In addition, it was used to calculate other amino acid sequence features including the number of residues, the percentage of each of the physico-chemical classes and the theoretical isoelectric point. Using protr, we calculated the autocorrelation, Conjoint Triad Descriptors (CTD), quasi-sequence order and pseudo amino acid composition. CodonW was used calculating general gene features like gene length and GC content, and frequencies of optimal codons (Hershberg and Petrov, 2009) and the number of codons in the coding sequence. rDNAse calculated additional gene features, like auto covariance, pseudo nucleotide composition, and kmer frequencies (n = 2-7). It also calculated the sequence attribute distribution. Amino acids were grouped into three classes: (1) polar, (2) neutral, or (3) hydrophobic. The second digit in the feature name accounted for the class. More fine grained, seven attributes comprising (1) hydrophobicity, (2) normalized van-der-Waals volume, (3) polarity, (4) polarizability, (5) charge, (6) secondary structure, and (7) solvent accessibility were represented by the first digit in the feature name. The last three digits described the location of the attribute in the sequence, i.e., either at the beginning of the sequence (000), around the 25% quantile of residues (025), 50% (050), 75% (075), or at the end of the sequence (100). For example, seq.attribute.distribution 42100 is the sequence attribute of amino acids being polarizable (4), neutral (2) and are located at the end of the sequence (100). The domain features (category 3) were calculated using BioMart yielding protein family domains (pfam domains), the number of coiled coils structures, prediction of membrane helices, posttranslational modifications, β-turns, cofactor binding, acetylation and glycosylation sites, signal peptides and transmembrane helices, and the number and lengths of UTRs. To calculate the gene set features (category 4), gene sets of all terms from Gene Ontology (Biological Process, Molecular Function, Cellular Localization) were obtained from the gene annotation of Ensembl (v102). First, to each gene, its direct neighbors were added using the STRING database (Szklarczyk et al., 2019), and with this list a gene set enrichment test performed employing Fisher's exact tests. The negative log10 value of the p-value P was used as a feature. Highly redundant (overlapping) gene sets were removed by the following method. Overlap of genes among each pair of gene sets V and W was quantified by Jaccard similarity coefficients, Pairs with J(V,W) above the threshold α = 0.3 were included into a model and represented as an undirected graph G(X,E), in which X were the n gene sets and pairs above α the edges E. A Mixed Integer Linear Programming problem was set up, Subject to in which w i was the weight of a gene set and calculated from the significance value (p-value P) of the according Fisher's Exact test, The problem was solved using Gurobi™ (version 7.5.1 3 ). By this, up to one gene set of each pair with high overlap was selected and the gene set with the higher enrichment was privileged. Gene conservation (category 5) was assessed by sequence alignment of the protein sequence of each gene with all proteins listed in RefSeq (Pruitt et al., 2005) using PSI-Blast (Altschul et al., 1997). The number of obtained proteins with E-value cutoffs from 1e − 5 to 1e − 100 served as feature values of the respective features. For category 5, typical topology features like degree, degree distribution, several centrality descriptors and Page rank was obtained using protein interaction data (n = 697,802 interactions) from the BIOGRID database (Oughtred et al., 2019). In addition, we assembled a second network considering also the interactions between SARS-CoV-2 proteins and proteins of the human host cell. For this, in total 22,586 interactions between SARS-CoV-2 and human proteins were obtained from the BioGRID COVID-19 Coronavirus Curation Project (Oughtred et al., 2019). Viral proteins were added to the generic network and an interaction was added to the network between a host protein and a viral protein if the link was listed in the protein interactions between SARS-CoV-2 and human proteins. Using both protein interaction networks, in summary, 24 topological features were generated using the igraph R package for network analysis. 4 Subcellular localization (category 7) was derived using the prediction software DeepLoc (Almagro Armenteros et al., 2017), which assigns probability scores to 11 eukaryotic cellular compartments (cytoplasm, nucleus, mitochondria, ER, Golgi apparatus, lysosome, vacuole and peroxisome, plasma membrane, extracellular, chloroplast).
Furthermore, 69 features from experimental data derived from gene expression, proteomics and phospho-proteomics profiles of Caco-2 cells post infection with SARS-CoV-2 at different time points were used. Proteomics data was taken from a previous publication  and z-score transformed peptide counts for each time point post infection (0, 2, 6, 10, 24 h, MOI = 0.01) vs. time point zero taken. In line, own gene expression data from Caco-2 cells infected with SARS-CoV-2 (FFM1, Wuhan wildtype; 0, 3, 6, 12 and 24 hpi, MOI = 0.01). For both datasets, as features, log 10 differential expression values between each time point post infection and time point zero (no infection) was taken. The phosphoproteomics data was taken from a previous study (SARS-CoV-2 infected Caco-2 cells, 24 hpi, MOI = 1) and features calculated similarly as for the proteomics data .

Machine learning pipeline
Each feature was normalized using z-score transformation. We trained and validated two different sets of machines, one based on the gold standard of the knockout screens (knockout screens based classifier) and one based on the gold standard from the drug screen (drugs screen based classifier). The machine learning procedure was the same for both. For each, a 5-fold cross-validation (CV) was performed in which 4/5 of the data was used to train the model, and the remaining 1/5 of the data (validation set) was used to assess the performance leaving the test set unseen. In addition, we repeated these cross-validations five times and averaged the results over these five independent runs. The first step in the training procedure was feature selection. For this, Least Absolute Shrinkage and Selection Operator (LASSO) regression was performed using the "glmnet" R package (cv. glmnet function with parameters alpha = 1, type.measure = "auc"). In a second feature selection step, highly correlating features with Pearson's correlation coefficients r > 0.70 were removed. On average, 155 features remained. These features were fed into a Random Forest (RF) classifier using the caret package in R (tuneLength = 10, metric = "Kappa, " nthread = 20, ntree = 500). All trained models were used to predict the complete set of 20,007 genes to find novel HDF candidates, and to confirm or reject the HDF annotation of the gold standard. We defined the top 10% of the predictions from a voting scheme as predicted HDF (predHDF) of the classifiers, i.e., of the knockout based classifier and of the drug based classifier, leading to 2,182 and 1,989 predHDF, respectively. To evaluate the performance of the learned models, the area under the receiver operating characteristic (ROC-AUC) and area under the precision-recall curve (PR-AUC) of the results from the validation data was calculated using the PRROC package in R. Average and standard variances of the performance estimates were calculated using the results of the individual cross validation runs. Most discriminative features of the model were obtained using the varImp function from the caret package which ranked the features based on their impact on the decision trees accuracy.

Gene set enrichment analyses
The lists of predHDF were investigated for enriched gene sets using the R package gprofiler2 (Raudvere et al., 2019). The statistical significance threshold for all GO terms was set to a false discovery rate p < 0.05. To focus on non-too-general and non-too-specific GO terms, only terms with more than 10 and less than 200 genes were considered. Redundancy of the terms was removed as described above (section 2.2) using α = 0.4.

Drug selection
Aiming for repurposing known drugs including clinically approved drugs for the identified targets, a manually assembled data repository of drugs was set up based on seven public drug databases: Drugbank, ChEMBL, TTD, PharmGKB, BindingDB, IUPHAR/BPS, and DrugCentral (Thorn et al., 2013;Gilson et al., 2016;Ursu et al., 2017;Wishart et al., 2018;Mendez et al., 2019;Armstrong et al., 2020;Wang et al., 2020). The databases provide information about the mechanisms of actions of the drugs, drugtarget interactions and therapeutic indications. We considered only drugs approved by either of the following regulatory agencies: Food and Drug Administration (FDA), Health Canada, European Medicines Agency (EMA), and Japan Pharmaceutical and Medical Devices Agency (PMDA), and drugs currently tested in clinical trials.
For all viruses used in this study, the corresponding GenBank accession numbers, and references are listed in Table 2. Briefly, coronavirus isolates were obtained from nasopharyngeal swabs of infected individuals. Swab material was suspended in 1.5 mL phosphate-buffered saline (PBS) and propagated on Caco-2 cells. Cellfree aliquots were stored at −80°C and titers were determined by tissue-culture infectious dose (TCID 50 ) assay in A549-AT cells. The H5N1 influenza strain Hongkong/213/03 was received from the World Health Organization (WHO) Influenza Centre (National Institute for Medical Research, London, United Kingdom). Virus stocks were prepared by infecting Vero cells and aliquots were stored at −80°C. Titers were determined as described above.
According to the committee for Biological Safety (ZKBS), the entirety of infectious work presented in this study was conducted under BSL-3 conditions.

Compound treatment and infection assay
Inhibitors were diluted serially on confluent A549-AT or Calu-3 cells in 1% (v/v) serum MEM and were subsequently infected at an MOI of 0.1 with viruses for 48 h. Cells were fixed with 3% (v/v) paraformaldehyde (PFA) for 20 min and stored at 4°C in PBS until analysis. Cytopathic effects were quantified by confluency measurement using a Spark ® Cyto 400 multimode plate reader (Tecan Group Ltd.; Zürich, Switzerland).

Scratch assay
Scratch assay was implemented to visualize growth hindrance in A549-AT cells upon treatment with PRI-724. 6⋅10 5 cells were seeded in a 12-well format 1 day prior to assay. After applying scratches with a pipette tip, cells were washed with 1x PBS and treated with 10 μM PRI-724 and the respective amount of DMSO in 1% FCS MEM. Cells were subjected to live cell imaging 0, 24, 48 and 72 h post treatment using a Spark ® Cyto 400 multimode plate reader (Tecan Group Ltd.; Zürich, Switzerland).

Live cell imaging
In order to monitor the effectiveness of PRI-724 in terms of blocking syncytia formation and cytopathic effects in general, live cell imaging was carried out in A549-AT cells in a 96-well format. 3.5⋅10 4

Cytotoxicity measurement by fluorescence microscopy
Confluent A549-AT cells were treated with 10, 3, 1, 0.3 μM PRI-724 and with the highest corresponding amount of DMSO for Frontiers in Microbiology 07 frontiersin.org 48 h in 1% MEM. Cells were fixed with 3% PFA and stained with 0.5 μg/mL Hoechst 33342. Imaging was performed with Olympus IX 81 scanning a total of 9 fields/well with a 3 × 3 scatter at a total magnification of 10x (pixel size = 0.645 μm). Screening was performed using ScanR Acquisition v2.4.0.13 software. The analysis was performed using KNIME and KNIP (Dietz and Berthold, 2016). The images were filtered using a gaussian convolution (σ = 2 px). A consecutive image thresholding assigned pixels with values greater than the image-mean to the foreground. Cells were detected and labeled by connected component analysis. Cell clumps were afterwards split into single cells using the Wählby algorithm (Wählby et al., 2004). The resulting components were filtered based on their size, components too large or too small were rejected as they most likely do not depict cells. Cell count was measured for the different PRI-724 concentrations (Supplementary Figure S4).

PI staining and flow cytometry
In a 6-well plate, confluent A549-AT cells were treated with 10 μM PRI-724 and the corresponding amount of DMSO for 24 h and were infected with SARS-CoV-2 (B.

Statistics
The curve fittings presented for dose-responses to Remdesivir and PRI-724 presented in Figure 1A and Supplementary Figure S1 were performed by applying robust non-linear regression comprising a sigmoidal 4-parameter model for a total of three biological replicates per concentration:

Y Bottom
Top Bottom The exact IC 50 values calculated from these analyses are listed in Figure 1A and Supplementary Tables S7, S8. For simple group comparisons such as data presented in Figures 1B, 2, 3B, C, 4A, B, 5, one-way ANOVA was generally applied to identify significant differences between treatments. In all tests, equal variances according to the mean were assumed and further analyses were not corrected for multiple testing (Fisher's LSD test). Results were rated significant when p < 0.05. For RNA kinetics depicted in Figure 3A, two-way ANOVA was applied to identify significant differences in vRNA levels according to the timepoint of collection and the sort of treatment. Again, equal variances according to the mean were assumed and further analyses were not corrected for multiple testing (Fisher's LSD test). Results were rated significant when p < 0.05.

Results
3.1. Host dependency factors identified by genome scale screens showed marginal overlap on the gene level, but improved consistency for common cellular processes and functions First, the overlap of the results from published screening studies were investigated, in the following denoted as Daniloski et al. (2021), Wang R. et al. (2021), Wei et al. (2021), and Zhu et al. (2021). For a balanced comparison, we regarded the top 500 genes with the highest scores of every screen ( Figure 6A). Only ACE2 was common among all candidates. The highest overlap (n = 37 genes) was observed between the screens of Daniloski et al., and Zhu et al. Besides ACE2, only 11 genes were common among Wang et al., and Wei et al. All these overlaps were not significantly enriched (using Fisher's Exact Test). Similarly, we found very low correlations while analyzing the ranking of all genes of the screens (independent of the cutoff; Figure 6B). Next, we investigated how grouping of genes into gene sets of common cellular processes and gene functions affect the results. For this, gene set enrichment tests were performed with the same lists of the 500 highest scoring genes ( Figure 6C). Interestingly, the overlaps were considerably higher but still not significant. Daniloski et al. and Zhu et al. shared Table S1). In summary, we observed limited overlap among the investigated screens. The overlap was higher when regarding gene sets of common cellular processes or molecular function. This supported our concept to set up a machine learning procedure to identify more common patterns among these different datasets.
3.2. Classifiers based on data from the knockout screens and a drug based screen performed well in predicting HDF from the gold standards We assembled a list of 500 genes commonly high ranking among the investigated knockout screens (top 500 genes of the rank products, see Methods). This list served as the input for the machine learning classifier. Similarly, we selected negative controls from the lowest ranking genes. Furthermore, we set up a descriptor for gene predictions by compiling a comprehensive list of more than 60,000 features for each gene describing its nucleic acid and protein sequences, potential associated cellular processes, its associated compartments, functional domains, molecular functions, its conservation, and its network topology within a protein interaction network. Here, also other omics data of SARS-CoV-2 infected cells, like gene transcription profiles, proteomics and interactions between viral and host proteins were integrated. Performing a cross-validation, Frontiers in Microbiology 08 frontiersin.org in which training and validation data is strictly separated, we observed good performance results with an area under the curve (AUC) of a Receiver Operator Characteristics of 0.82 (1σ = 0.03; Figure 7A). In addition, the same procedure was performed and a classifier learned based on data of a drug screen (in the following drug screen based classifier; Ellinger et al., 2021). Here, we obtained slightly worse performance compared to the knockout-based classifier receiving an AUC of the Receiver Operator Characteristics of 0.71 (1σ = 0.11). Additional performance parameters are supplemented (Supplementary Table S2). Inhibition of SARS-CoV-2, SARS-CoV-1 and IAV H5N1 by PRI-724. (A) Dose-response curves of diverse SARS-CoV-2 mutational variants and SARS-CoV-1 to PRI-724 and Remdesivir. Cell confluence was analyzed 48 hpi using a Spark® Cyto Imaging System. Dose-response curves were fit to normalized % inhibition. Data represent mean ± SD of three biological replicates. The experiment was repeated twice with similar results. IC 50 values were calculated by applying robust non-linear regression. Mann-Whitney U-test confirmed no significant changes between any dose-responses analyzed. (B) Staining of SARS-CoV-2 (B.1.617.2 strain) and SARS-CoV-1 N protein, as well as IAV H5N1 NP 18 hpi. Data points represent mean ± SD of three biological replicates. The experiment was repeated twice with similar results. Infected cells were determined by Alexa 488/DAPI co-localization using an Operetta® High-Throughput Imaging system. Scale bars = 500 μm.

Predicted HDF are prominently enriched in gene sets related to morphogenesis and development
We selected 2,182 and 1,989 top-scoring genes identified by knockout-and drug screen-based classifiers, respectively. To identify the most involved cellular mechanisms in the life cycle of the virus, we performed gene set enrichment analysis for each list illuminating 1,098 and 688 gene sets of the respective classifiers (Supplementary Tables S3, S4). Interestingly, we observed a high number of common gene sets (n = 313) showing consistency among both classifiers ( Figure 7B). Regarding these common gene sets, Growth kinetics of PRI-724 treated A549-AT cells. A549-AT cells treated with high (3 μM) and low (1 μM) dose of PRI-724 were subsequently infected with SARS-CoV-2 (B.1.617.2 isolate) and SARS-CoV-1 (FFM1 isolate; MOI = 0.1). Cells were incubated at 37°C and 5% CO 2 for 48 h in a Spark® Cyto Imaging system. Top: Graphical presentation of % confluence and surface roughness for both SARS-CoV-2 and SARS-CoV-1 are shown for measurements hourly performed. Bottom: Respective brightfield microscopy images represent critical time points of CPE formation over the course of infection during treatment with PRI-724 and without. Scale bar: 200 μm. See also Supplementary Videos S1−S6. Every condition comprises three biological replicates and graphs represent means ± SD; significant differences (***p < 0.001) are indicated by asterisks obtained by performing multiple t-tests. The experiment was performed twice yielding similar results.
Frontiers in Microbiology 10 frontiersin.org we further found a high fraction of these gene sets (69 out of 313 gene sets) to be related to morphogenesis and development, followed by 60 gene sets being related to neural processes (Supplementary Table S5). Besides these gene sets, further gene sets were related to signaling, gene regulation, and immune and stress response (Supplementary Table S5). During development, a large variety of distinct different cellular identities need to be established and maintained in the embryo. Particularly, during developmental lineage reprogramming, a somatic cell can be reprogrammed into a distinct cell type by forced expression of lineage-determining factors (Vierbuchen and Wernig, 2012). Similarly, coronaviruses reprogram their host cell for their specific needs, including their replication (Spagnolo and Hogue, 2000;Shi and Lai, 2005;Reggiori et al., 2010;Nagy and Pogany, 2011;Pizzato et al., 2022). We followed this interesting parallelism and selected a gene list from the gene sets of development and morphogenesis for further prioritization. Aiming to interfere "reprogramming" with a high impact, we prioritized genes coding for proteins with high connectivity (number of nearest neighbors) and centrality measures in a proteinprotein interaction (PPI) network to enhance the impact of a treatment by small molecule inhibitors. This led to a short list of genes with highest ranks (employing rank products) based on these measures in this PPI network. Table 3 lists the top 10, Supplementary Table S6 lists the top ranking genes and their connectivity and centrality values when protein interactions to SARS-CoV-2 proteins are also taken into account. Out of these, β-catenin (CTNNB1) was selected for further analysis due to its high gene expression in SARS-CoV-2 infected cells. Interrogating publicly available compound and drug databases led to the selection of PRI-724 for experimental follow up described in the next sections. RNA kinetics and quantification of infectious virus upon PRI-724 treatment. A549-AT and Calu-3 cells were treated with 3 μM Remdesivir (Rv, blue) and indicated concentrations of PRI-724 [red; A549-AT: 10 μM (circle), 3 μM (square), 1 μM (rhombus); Calu-3: 20 μM (circle), 15 μM (square), 10 μM (rhombus)], as well as DMSO (gray) and were subsequently infected with SARS-CoV-2 (B.1.617.2 isolate) and SARS-CoV-1 (FFM1 isolate) at MOI = 0.1. (A) Viral sg-N RNA levels in total RNA lysates over time in A549-AT infected with SARS-CoV-2 (left) and SARS-CoV-1 (middle), as well as in Calu-3 infected with SARS-CoV-2 (right). Data points represent mean and SD of three biological replicates. The experiments were repeated twice with similar results. Significant differences between treatments and control determined by two-way ANOVA are indicated through asterisks; *p < 0.05 (Supplementary Table S9). (B) Infectious SARS-CoV-2 and SARS-CoV-1 virus titers 24 hpi in A549-AT determined by TCID 50 assay (left, middle) and in Calu-3 determined by plaque assay (right). Data points represent mean and SD of three biological replicates. The experiments were repeated twice with comparable results. Significant differences between treatments and control are indicated through asterisks; *p < 0.05, ***p < 0.001. Frankfurt-1 were tested. A549-AT cells were infected at an MOI of 0.1 and treated with PRI-724 or Remdesivir for 48 h. CPE-related confluency changes were measured using automated label-free brightfield microscopy ( Figure 1A). For all viruses tested, a mean IC 50 of 1.491 μM (95%CI 1.087-1.896) and IC 50 of 0.2916 μM (95%CI 0.1948-0.3885) was observed for PRI-724 and Remdesivir, respectively. No significant differences were detected among SARS-CoV-2 variants, even though the B.1 and B.1.429 variants showed ~1.5-fold and ~2-fold higher IC 50 , respectively, and BA.1, BA.2, and BA.5 variants had ~2-fold lower IC 50 ( Figure 1A; Supplementary Table S7 Figure S1C). In an attempt to validate our results and to test whether PRI-724 can be used against a broader spectrum of RNA viruses, we infected A549-AT cells with Influenza A virus (IAV), SARS-CoV-2 (B.1.617.2 strain), or SARS-CoV-1 (Frankfurt-1 strain), and performed immunofluorescence staining for IAV nucleoprotein (NP) and the coronavirus N ( Figure 1B). Significant concentration-dependent reduction of NP expression was evident for treatment with 10, 3, and 1 μM PRI-724. Treatment with 10 and 3 μM PRI-724 led to ~10-fold and ~4-fold reduction of H5N1 NP + cells, whereas for these treatments no cells stained positive for SARS-CoV-1 N. Treatment with 3 μM reduced SARS-CoV-2 N + cells ~15-fold and staining was negative with 10 μM PRI-724. PRI-724 shuffles β-catenin/CBP and β-catenin/p300 activity distribution. (A) A549-AT cells were transfected with Super M50 8x TOPFlash and pRL-SV40. After 24 h, cells were treated with 10 mM LiCl, 100 μg/mL rhWnt-3a (light blue), and 10 μM, 3 μM, 1 μM and 0.3 μM PRI-724 (red). Luciferase activity was measured after 24 h. Firefly luciferase (FLuc) was normalized to Renilla luciferase (RLuc) and the non-treated control (gray) for every sample. Data represent mean and SD of three biological replicates. Samples treated with PRI-724 were compared to the non-treated control (gray) while samples treated with PRI-724 + LiCl were compared to the LiCl-treated control (light blue) using ordinary one-way ANOVA. Significant differences are indicated by asterisks; *p < 0.05, ***p < 0.001. The experiments were repeated twice with comparable results. (B) A549-AT cells were treated with indicated concentrations of PRI-724 and DMSO. 24 h post treatment, RNA was isolated and RT-qPCR was implemented to analyze CDC25A (left), BIRC5 (right), and RNase P expression. Data represent mean and SD of three biological replicates. Significant differences are indicated by asterisks; **p < 0.002, ***p < 0.001. Experiments were repeated twice with comparable results. (C) Cells were treated as described in (B). Signals were normalized to β-actin expression. The asterisk designates the predicted band size of β-catenin. Rv, Remdesivir; C, Control.

Frontiers in
Frontiers in Microbiology 12 frontiersin.org

PRI-724 inhibits syncytium formation and CPE onset in A549-AT cells
After confirming PRI-724-mediated inhibition of CPE in A549-AT cells at 48 hpi, we examined the timeline of CPE onset and development. Label-free live-cell brightfield imaging was implemented to address viral growth kinetics upon treatment with PRI-724. Cells were treated with 3 μM (high-dose), 1 μM (low-dose) PRI-724, and the highest corresponding amount of DMSO and were subsequently infected with SARS-CoV-2 (B.1.617.2 isolate) or SARS-CoV-1 (Frankfurt-1 strain; MOI = 0.1). Imaging, confluence measurements and surface roughness measurements were performed over the course of 48 h. A549-AT confluence as well as surface roughness were mostly comparable among all treatment conditions until 24 hpi for both Cell cycle analysis in A549-AT upon treatment with PRI-724. (A) A549-AT cells were treated with 10 μM PRI-724 or corresponding amounts of DMSO, and were subsequently infected with SARS-CoV-2 (B.1.617.2 isolate, MOI 0.01). After 24 h, cells were stained with propidium iodide (PI) following flow cytometric analysis. G1, S and G2/M populations were calculated using the Watson model implemented in FlowJo (v10.8.1). Sub-G1 populations were measured manually. Additionally, cell lysates were analyzed by western blot for N prourvivingrvivin, cyclin D1 and CDK4 expression. Expression levels were normalized to β-actin expression. (B) Confluent A549-AT cells were scratched prior to treatment with 10 μM PRI-724 or corresponding DMSO amounts. Live cell imaging was performed 0, 24, 48, and 72 h post treatment to assess scratch healing properties using a Spark® Cyto imaging system. All data represent mean and SD of three biological replicates. Experiments were repeated twice. Significant results obtained by one-way ANOVA are indicated by asterisks; *p < 0.05, **p < 0.002, ***p < 0.001. Microbiology  13 frontiersin.org  These findings were confirmed by brightfield microscopy at indicated time points (Figure 2, bottom). Here, cell-cell-fusion, i.e., syncytium formation, caused by both viruses, was initially detectable at 16 and 24 hpi, respectively, in DMSO-treated cells and was not present in high-dose PRI-724 treated cells. In low-dose PRI-724 treated cells, SARS-CoV-2 syncytia and CPE were detectable, but much smaller compared to the control (Figure 2, bottom, row 2-3). Although, we found likewise results for SARS-CoV-1, cell lysis initiated at 32 hpi compared to the control (Figure 2, bottom, row 5-6).
3.7. PRI-724 downregulates β-catenin/CBP target genes CDC25A, BIRC5 and cyclin D1 protein levels and upregulates β-catenin/ p300 target c-Jun After illuminating the protective in vitro properties of PRI-724 during coronavirus infection in A549-AT and Calu-3 cells, we sought to investigate the underlying mechanism. In order to assess β-catenin/ CBP activity in A549-AT cells, we implemented a TOPFlash reporter assay ( Figure 4A), The TOPFlash plasmid contains TCF/LEF binding sites coupled to a firefly luciferase reporter. Nuclear translocated β-catenin binds to TCF/LEF transcription factors to recruit the co-factor CBP to the regulatory element (Lee et al., 2010) and thereby activates transcription. Treatment with 10, 3, and 1 μM PRI-724 inhibited TOPFlash reporter activity significantly in comparison to the uninduced control. Moreover, induction of TOPFlash activity by 10 mM LiCl was effectively blocked when using the same concentrations. The control construct with mutant TCF/LEF binding sites, showed almost no luciferase activity and no differences among treatments (Supplementary Figure S3). We further verified the effect of PRI-724 on β-catenin/CBP activity by RT-qPCR and western blot analysis. The expression of the direct target genes CDC25A and BIRC5 were reduced ~2-fold in PRI-724-treated A549-AT cells ( Figure 4B). Also, t protein levels of cyclin D1 were reduced when treated with 10 and 3 μM PRI-724. Reciprocally, β-catenin/p300 target c-Jun was increased in a concentration dependent fashion ( Figure 4C). β-catenin expression remained unchanged upon treatments.

PRI-724 causes cell cycle deregulation in A549-AT cells
It has been reported that PRI-724 causes cell cycle deregulation and block proliferation in several cancer cell lines, such as hepatocellular carcinoma (Gabata et al., 2020), pancreatic (Martinez-Font et al., 2020), and colon cancer (Emami et al., 2004;Kleszcz et al., 2020). We therefore determined whether PRI-724 and/or SARS-CoV-2 infection (MOI 0.01) effects host cell cycle progression. Flow cytometry analysis revealed a significant reduction in G1 (55-60% to 40%) and an increase in S cells (20-22% to 32-35%) upon treatment with PRI-724 compared to DMSO. No significant differences were obtained in G2/M populations. Sub-G1 populations were significantly elevated upon PRI-724 treatment and infected samples showed significantly higher sub-G1 populations than non-infected samples. SARS-CoV-2 infection did not seem to have any significant influence on the cell cycle distribution, although we noticed a slight but significant reduction in the S population of DMSO-treated infected cells compared to non-infected cells. Furthermore, western blot confirmed decreased amounts of Survivin and Cyclin D1 upon treatment with PRI-724. Also, infected cells treated with DMSO but not with PRI-724 showed slightly increased CDK4 expression ( Figure 5A). PRI-724-treated A549-AT cells also showed reduced proliferation ( Figure 5B). To visualize whether PRI-724 induces lateral growth hindrance, a scratch assay was implemented. 10 μM PRI-724 significantly hampered scratch closure at 24 h post treatment. After 24 h, the scratch size remained at ~310 μm over the course of the experiment while the scratch was almost entirely closed up in the control after 72 h ( Figure 5B).

Discussion
In this study we provide a comprehensive and robust machine learning-based host factor identification strategy (Xiao et al., 2015) using HDF screens performed for SARS-CoV-2 (Daniloski et al., 2021;Wang R. et al., 2021;Wei et al., 2021;Zhu et al., 2021) and a drug screen (Ellinger et al., 2021) along with validation of the predicted hit in the wet-lab. HDF screens of human or African green monkey cells infected with SARS-CoV-2 showed limited overlap in their hits. A better consistency was observed when regarding cellular pathways enriched in observed HDF of the screens. We reason this observation by variances among the different screens due to diverse experimental/ technical settings or/and observed biological entities, such as different host cell types or different viral strains. To some extent, such differences may be cleaned out, when regarding a gene set representing a cellular pathway which is essential for the virus. It may contain parts of signaling cascades supporting each other making knocked out genes replaceable in specific conditions. Such interdependencies are highly relevant for discovering drug targets and serve as a promising research venue for future investigations. We took another path here and let machines learn which genes are indispensable, independent of the experimental settings in an automated way. Our machines returned lists of predicted HDF that reconstruct HDF and non-HDF class labels well from unseen data when cross-validated. This proves that the machines can capture the data structure in the gold standard lists well. We followed up on investigating the predictions for their biological content, and particularly their enrichments in known cellular processes and functions. Interestingly, we found a major portion of enriched gene sets being related to development and morphogenesis, followed by gene sets being related to neural processes. This is an intriguing observation in itself, suggesting that cellular processes for development and morphogenesis in particular are hijacked by the virus to facilitate reprogramming of the host cell from cellular proliferation to the proliferation of virus particles.
Focusing on gene sets related to development and morphogenesis, we selected β-catenin due to its top ranking connectivity and centrality scores in a constructed protein-protein interaction network, and it was highly expressed during viral infection in our observed cells. β-catenin has been intensively studied, particularly in oncology. It is known for its highly ambivalent roles. In its classical role, it migrates to the nucleus and acts as a cofactor for p300 or CBP (Teo and Kahn, 2010). Investigating hematopoietic stem cells (Rebel et al., 2002) and an embryonal murine stem cell line (F9; Kawasaki et al., 1998;Ugai et al., 1999), it was shown that the choice of the interaction partner CBP or p300 matters for the critical decision between proliferation/(pluri) potency and initiation of differentiation. When interrogating publicly available small compound databases, we found PRI-724 and ICG-001 as β-catenin inhibitors. More specifically, they inhibit the interaction of β-catenin with CBP. PRI-724 is a second-generation structural derivative of ICG-001. We sought to explore the effect of both of these CBP/βcatenin inhibitors. PRI-724 performed better than ICG-001 (Supplementary Figure S1) and showed an inhibitory effect on pathogenic viruses such as SARS-CoV-2 variants, SARS-CoV-1, MERS-CoV, and IAV, in two different cell culture models. The inhibition doses were cell-type dependent which might be due to varying proliferation rates of the cells and different basal levels of β-catenin and CBP.
Viruses manipulate the cell cycle to create resources and cellular conditions that are advantageous for viral replication and assembly. Since PRI-724 is also known to affect cell cycle distribution, we hypothesized that the PRI-724-dependent reduction in virus replication and viral progeny production may be due to cell cycle deregulation. Sui et al. observed that SARS-CoV-2 induces dose dependent S and G2/M arrest at the early phase of infection, particularly, when higher MOIs were used, to facilitate virus replication (Sui et al., 2023), and, in turn, at a higher MOI (0.1), a significant decline of cells in S and G2/M phase at the late phase of infection (24 and 48 hpi). In our experiments we did not observe Frontiers in Microbiology 16 frontiersin.org any significant difference 24 hpi. This is most likely due to lower MOI (0.01) used in our study and is hence in line with the late phase observations from Sui et al. We detected that PRI-724-dependent G1 population decrease and S population increase is independent of the infection and thus it can effectively reduce virus replication and production for different SARS-CoV-2 variants, SARS-CoV-1, MERS-CoV and IAV. Regarding the drug toxicity, the total number of cells was lower after PRI-724 treatment (Supplementary Figure S4) and flow cytometry analysis showed a slight increase in sub-G1 population which was significantly pronounced following SARS-CoV-2 infection ( Figure 5A). The cell distribution pattern we found differs from the previously published work showing an increase in G0/G1 cell population (Arensman et al., 2014;Gabata et al., 2020) while using ICG-001 and C-82 (active compound of PRI-724) when applied to non-infected cells. This observation seems to be cell line or compound type dependent, because another study using colon carcinoma HCT116 cells also found a decrease in G1 and an increase in S cell population following PRI-724 treatment (Kleszcz et al., 2020). In summary, SARS-CoV-2 can reprogram the cell cycle in host cells. Our investigation showed that blocking the interaction between β-catenin and CBP with PRI-724 significantly reduced cytopathic effects, viral RNA replication, and infectious virus production. The treatment caused cell cycle deregulation in the host cell, which appeared to be unrelated to the infection. This suggests that our treatment effectively disrupts the virus-induced reprogramming of the cell cycle, which is essential for viruses to complete their life cycles. PRI-724 treatment has been also clinically investigated. PRI-724 was shown to reduce hepatitis C virus-induced liver fibrosis in mice . These investigations were followed up in two clinical trials of the same group (phase 1 and IIa, respectively; Kimura et al., 2017Kimura et al., , 2022. The authors concluded that PRI-724 treatment was well tolerated by patients with HBV and HCV induced liver-cirrhosis and showed improvements of the pathology of concern in several patients. Besides this, blockade of β-catenin/CBP reversed pulmonary fibrosis (Henderson et al., 2010), which is a central COVID-19 complication (Zou et al., 2021). A recent study confirmed overall higher β-catenin levels in SARS-CoV-2 infected patients and targeting reduced virus shedding in vitro (Chatterjee et al., 2022). A limitation of our study is that the machine learning procedure bases on screening data of the four initial screens which were at hand when we developed the computational method. New screens have been, and certainly more will be published as means to better understand the mechanisms of this wide spread and evolving virus. As a future outlook, we would like to repeat the computational analysis and include new screens that have been published since the initiation of this project. We thereby might identify additional HDF that are relevant for the SARS-CoV-2 life cycle. We would like to emphasize that further investigations with 3D models of primary cells, e.g., human bronchial epithelial cells (HBEpC) and human nasal epithelial cells (HNEpC) are crucial to validate our observations. 3D models have the ability to replicate cellular behavior in vivo more effectively than immortalized or cancer cell lines. These cell models can also offer a more accurate representation of host-pathogen interactions in vitro compared to 2D systems, resulting in more precise data and a superior prediction of drug efficacy and toxicity.
In conclusion, the machine learning approach brought up interesting connections between viral replication and the reprogramming of host cells in the context of SARS-CoV-2 infection. It may also suit for host factor studies of other viruses and a comparison between different virus entities may lead to intriguing new aspects in virus biology. We observed a strong correlation between the inhibition of CPE, vRNA, and infectious virus production for the tested concentrations in two different cell lines, making PRI-724 a promising HDA against SARS-CoV-2 and SARS-CoV-1 in our models.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding authors.

Author contributions
TT, HE, and RK conceived the study. AV-G, AH, TB, MO, and RK performed data analysis and machine learning. MK and TT designed and performed the experiments. NB and PW performed the experiments on cytotoxicity measurement by fluorescence microscopy. TT, SC, RK, and HE acquired funding. All authors contributed to the article and approved the submitted version.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.