Discovery of host-directed modulators of virus infection by probing the SARS-CoV-2–host protein–protein interaction network

Abstract The ongoing coronavirus disease 2019 (COVID-19) pandemic has highlighted the need to better understand virus–host interactions. We developed a network-based method that expands the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2)–host protein interaction network and identifies host targets that modulate viral infection. To disrupt the SARS-CoV-2 interactome, we systematically probed for potent compounds that selectively target the identified host proteins with high expression in cells relevant to COVID-19. We experimentally tested seven chemical inhibitors of the identified host proteins for modulation of SARS-CoV-2 infection in human cells that express ACE2 and TMPRSS2. Inhibition of the epigenetic regulators bromodomain-containing protein 4 (BRD4) and histone deacetylase 2 (HDAC2), along with ubiquitin-specific peptidase (USP10), enhanced SARS-CoV-2 infection. Such proviral effect was observed upon treatment with compounds JQ1, vorinostat, romidepsin and spautin-1, when measured by cytopathic effect and validated by viral RNA assays, suggesting that the host proteins HDAC2, BRD4 and USP10 have antiviral functions. We observed marked differences in antiviral effects across cell lines, which may have consequences for identification of selective modulators of viral infection or potential antiviral therapeutics. While network-based approaches enable systematic identification of host targets and selective compounds that may modulate the SARS-CoV-2 interactome, further developments are warranted to increase their accuracy and cell-context specificity.


Identification of host targets using network-based prioritization
We used a network-based protein prioritisation approach, the random walk with restart (RWR), to identify host protein targets closest to the VIP set. A random walk is a stochastic process, where the steps on the network take place with a certain probability. The process visits a sequence of adjacent nodes, forming a path, where at each step the next node is chosen at uniformly random among the neighbours of the current node. A RWR implements an additional restart probability, meaning that for every step taken in any direction, there is a probability of going back to the initial node. The advantage of the RWR method is that it provides a good representation of the topological similarity between the seed nodes and other nodes in the network.
In the RWR method, each protein node in the network is ranked in the descending order based on the probability of them being visited by a random walker that starts from any of the seed nodes (here, the VIP nodes). We implemented RWR using the Arete Cytoscape plugin that provides an exact solution of RWR, meaning it does not simulate a random walk, and therefore the method is much faster than the alternative algorithms available for calculating random walk [1,2].
The algorithm starts from a source node s and it is allowed to restart the walk in every time step at node s with a probability of r. Formally, the random walk with restart (RWR) is defined as The transition matrix W is the column-normalised adjacency matrix of the graph and ! is a vector that consists of the probabilities of being at node i at time step t. The initial probability vector $ is constructed such that the seed/source nodes (in this case nodes representing VIPs) are assigned equal probabilities. The candidate protein targets are then ranked according to the values of the steady-state probability vector % .
Instead iteratively running the nonhomogeneous first-order matrix difference equation (1) until convergence, the Arete method determines the closed-form solution of the steady-state vector as follows: The constant matrix R = ( − (1 − ) ) &# is pre calculated and the exact solution is obtained by matrix multiplication as % = $ . This calculation is further simplified as many elements of the vector $ are zero, with only elements representing the VIP proteins having non-zero values. Since the system is solved exactly using the inverse of an invertible matrix and also the graph is connected in our case (we consider the giant component of the network), the linear difference equation (1) is stable and ! converges asymptotically to the steady state % . To guarantee the convergence, we ran the Arete method 15 times on our network and obtained always the same set of RWR nodes.
We compared the network-based RWR algorithm with DIAMOnD, an iterative, local neighbourhood-based node prioritisation method [3]. The DIAMonD algorithm starts by considering the immediate neighbours of the seed nodes (here, VIPs), and then selects a node with the smallest probability of having as many edges to the seed nodes as that of a randomly selected protein according to the hypergeometric distribution. After the next node is selected, it is added to the seed set, and the process is repeated until the desired number of candidate nodes have been selected and ranked in the ascending order based on the iteration step at which they were selected.
The parameter values for both of these algorithms (RWR and DIAMOnD) were set to their default parameters; the restart probability in RWR was set to 0.7, and the number of ranked proteins in DIAMOnD was set to 200, which was the recommended value, since at ~200 iterations, the number of the DIAMOnD-prioritised proteins with direct biological evidence was shown to reach a plateau, suggesting 200 as the maximal number that should be considered [3]. We therefore selected the top-200 ranked proteins also in the RWR algorithm to match the size of the protein set identified by DIAMOnD.
We observed that the proteins identified by the RWR algorithm were more enriched in similar biological pathways with the VIP proteins, when compared to those identified by the DIAMoND algorithm ( Figure S6). Since the RWR-prioritized proteins were more related to the VIP nodes in terms of their biological functions, we considered the RWR identified host proteins for further analysis based on the top-200 nodes. The use of other cut-off values led to the same set of GO biological terms than those identified using the top-200 nodes, although their enrichment values differed slightly.
All the network visualisations were made using the Cytoscape software [4], and the network topological parameters, such as degree connectivity and betweenness centrality, were computed using the igraph R package [5].

Benchmarking network-based predictions using NCATS database
When comparing against the NCATS compound database [6], we converted the ChEMBL IDs into PubChem SIDs. The antiviral compounds were classified based on the curve rank, which is a numeric score, developed internally by NCATS, that combines classes of the doseresponse curves and efficacy, such that more potent and effective compounds with higher quality curves are assigned a higher rank [6]. In our analyses, all compounds with a curve rank >0 were classified as antivirals [7].
We compared our protein identification method with another computational method that integrates three network-based drug repurposing algorithms that rely on artificial intelligence, network diffusion and network proximity [8]. In this work by Gysi et al., they experimentally tested in VeroE6 cell lines 918 compounds among the 6340 identified by their computational method. Out of these 918 compounds, only 8.38% (77/918) showed either weak or strong activity against SARS-CoV-2 (strong activity required efficacy over a broad range of concentrations). The remaining 91.6% (841/918) compounds showed no antiviral activity against SARS-COV-2. We further compared the 918 compounds with the NCATS database, and found that 73 compounds were common, out of which 21.9% (16/73) were validated as antivirals by the NCATS assay. Compared to the NACTS success rate of 7.7% (730/9479), the computational method proposed by Gysi et al. therefore led to 2.85-fold increase in the prediction of antiviral compounds.
We also compared the compounds identified by our network-based method against the experimental validations carried out by Gysi et al. [8]. In total, 70 compounds were common between our list and the 918 compounds verified by the authors in VeroE6 cell lines. Among these, 17.5% (12/70) were classified as antivirals by Gysi et al. in VeroE6 cell lines. We therefore obtained a 1.28-fold improvement in accuracy of identifying antiviral compounds, when compared to 8.38% (77/918) accuracy by their computational method. Even if these comparisons provide some kind of reference for the general accuracy of network-based prioritization methods, the number of overlapping compounds between the methods and screens remain quite limited, making the comparisons potentially biased. Our PPI network only considers interactions verified in 293T cell lines to create a more context specific network, which is expected to decrease the success rate when comparing to experimental validations performed in other cell types.

Statistical analysis
All statistical analyses were performed using R [9] and Python [10]. The two-sample Kolmogorov-Smirnov test was employed to compare the distributions of compounds and targets between the VIP and NIP sets, with the null hypothesis being that the distributions are similar, using the Scipy package [11].

SARS-CoV-2 qRT-PCR assay
RNA was isolated using the SingleShot Cell lysis kit (Bio-Rad). Briefly, at designated time points, cells were washed 1x with PBS followed by addition of a lysis buffer. After a 5-minute incubation at room temperature, liquid was transferred to a PCR plate and further incubated at 37C for 5 min followed by 5 min at 75C. The resulting RNA was used to synthesise complementary DNA (cDNA) using the iScript cDNA Synthesis kit (Bio-Rad). Briefly, 6 µL of RNA was mixed with 2 µL of reaction buffer, 1,5 µL of water, and 0.5 µL of reverse transcriptase. This was incubated for 30 min at 42C followed by 5 min at 85C. 10 µL of water was added to each product to dilute the cDNA. Quantitative RT-PCR was performed on a Roche Lightcycler 480 using SYBR Green (Bio-Rad) with the following primers (all primers listed in the 5' to 3' orientation): human RPS18: TGC GAG TAC TCA ACA CCA ACA and CTT CGG CCC ACA CCC TTA AT SARS-CoV-2 E: ACAGGTACGTTAATAGTTAATAGCGT and ATATTGCAGCAGTACGCACACA For each reaction 2 µL of cDNA was mixed with 5 µL of SYBR, 2 µL of water and 0.5 µL of each primer. The RT-PCR reaction was 40 cycles of 5 seconds at 95C and 1 minute at 62C, followed by melt curve analysis. Melt curve analysis was used to assess whether single reaction products were produced. Expression was calculated relative to the housekeeping gene RPS18.

Drug combination analysis
The two-drug checkerboard assays were carried out as before [12]. The multi-dose combination experiment data were analysed in SynergyFinder v2.0 web-application (https://synergyfinder.fimm.fi), an open-source software for multi-drug combination analysis [13]. We used the Bliss independence model to quantify the combination synergy. This model is based on a stochastic process in which drugs elicit their effects independently, and based on the probability of independent events, the expected combination effect can be calculated [14]. The average Bliss synergy score over the entire dose-response matrix was calculated with SynergyFinder v2.0, along with the Maximum Synergistic Area (MSA), which is a 3 x 3 dose-response area of the matrix, where the drug combination shows the highest synergistic effect. The selective efficacy was calculated by subtracting the toxicity (viability of mockinfected control cells) from the efficacy (inhibition of virus-infected cells). The selective efficacy quantifies the difference between efficacy and toxicity, meaning that a selective efficacy of 100 indicates that the drug combination inhibits the virus 100% and does not affect the mockinfected cells, while a selective efficacy of 0 indicates that the drug inhibits 100% of both the virus and mock-infected cells, indicating a high toxicity. The outliers for MPA at doses 3,6 and 12nM were corrected using DECREASE web-application (https://decrease.fimm.fi/) [15].