Prediction and analysis of human-herpes simplex virus type 1 protein-protein interactions by integrating multiple methods

Background: Herpes simplex virus type 1 (HSV-1) is a ubiquitous infectious pathogen that widely affects human health. To decipher the complicated human-HSV-1 interactions, a comprehensive protein-protein interaction (PPI) network between human and HSV-1 is highly demanded. Methods: To complement the experimental identification of human-HSV-1 PPIs, an integrative strategy to predict proteome-wide PPIs between human and HSV-1 was developed. For each human-HSV-1 protein pair, four popular PPI inference methods, including interolog mapping, the domain-domain interaction-based method, the domain-motif interaction-based method, and the machine learning-based method, were optimally implemented to generate four interaction probability scores, which were further integrated into a final probability score. Results: As a result, a comprehensive high-confidence PPI network between human and HSV-1 was established, covering 10,432 interactions between 4,546 human proteins and 72 HSV-1 proteins. Functional and network analyses of the HSV-1 targeting proteins in the context of human interactome can recapitulate the known knowledge regarding the HSV-1 replication cycle, supporting the overall reliability of the predicted PPI network. Considering that HSV-1 infections are implicated in encephalitis and neurodegenerative diseases, we focused on exploring the biological significance of the brain-specific human-HSV-1 PPIs. In particular, the predicted interactions between HSV-1 proteins and Alzheimer's-disease-related proteins were intensively investigated. Conclusion: The current work can provide testable hypotheses to assist in the mechanistic understanding of the human-HSV-1 relationship and the anti-HSV-1 pharmaceutical target discovery. To make the predicted PPI network and the datasets freely accessible to the scientific community, a user-friendly database browser was released at http://www.zzdlab.com/HintHSV/index.php.


INTRODUCTION
Herpes simplex virus type 1 (HSV-1) is a neurotropic, enveloped, and double-stranded linear DNA virus [1][2][3][4]. The genome of HSV-1 is roughly 152 kb, encoding more than 74 different genes [3]. As a widespread infectious virus, it can be transmitted from person to person through direct contact. Around 3.7 billion people under the age of 50 are estimated by the World Health Organization to be infected with HSV-1 worldwide [5]. Once entering the human body from the skin or mucosa, HSV-1 can enter sensory neurons and be transported through axons to the trigeminal ganglion where a latent infection is established.
When stimulated, the latent virus can be reactivated to cause symptomatic or asymptomatic recurrent infections, leading to common cold sores, blisters, and various serious diseases [2][3][4][6]. HSV-1 can also reach the central nervous system (CNS), occasionally leading to fatal neurological diseases, such as herpes simplex encephalitis (HSE) [7,8]. Moreover, increasing evidence points to a strong association between HSV-1 infection and Alzheimer's disease (AD) [9]. No known antiviral drug can eliminate an HSV-1 infection, because the virus can establish a latent infection and thereby evade drug action. Therefore, more fundamental research efforts are required to decipher the complicated human-HSV-1 interactions and to provide hints for developing novel prophylactic or therapeutic methods against viral infections.
Investigations on protein-protein interactions (PPIs) between the host and the pathogen can reveal key biological processes concerning the interaction, elucidate the underlying mechanisms of infectious diseases, and thereby support the development of novel therapeutic strategies. As an important branch of host-pathogen PPI studies, human-virus PPI research has always been a focus given its close relationship with human diseases. Current research efforts tend to focus on individual viral proteins one at a time, such as glycoproteins involved in the HSV-1 entry into the host cell [10], ICP34.5 (neurovirulence factor) [11], ICP0 (viral E3 ubiquitin ligase) [12], ICP8 (single-stranded DNA-binding protein) [13], ICP4 (major viral transcription factor) [14] and so on. Therefore, it is still essential to decipher the interactome between human and HSV-1 proteins from a global perspective. Additional available data would enable a more robust PPI network to be built between human and HSV-1, which would make our understanding more comprehensive. In general, the experimental identification of PPIs, including the human-virus PPIs, is time-consuming, labor-intensive, and expensive. In this context, cost-effective computational prediction methods play an increasingly important role in supplementing the experimental identification of PPIs.
A plethora of host-pathogen PPI prediction methods, including methods for human-virus PPIs, were previously developed [15][16][17][18], mainly originating from intra-species PPI prediction methods [19][20][21]. In principle, traditional intra-species PPI prediction methods, such as the interolog mapping (IM) [22], the domain-domain interaction (DDI)-based method [22,23], and the domain-motif interaction (DMI)-based method [24], can be readily adapted to the prediction of human-virus PPIs. The IM can be used as a remedy for data scarcity: it transfers knowledge between homologs based on the assumption that interacting protein pairs in one species are likely to be conserved in related species [25]. Interacting domain pairs are considered the building blocks of PPI networks. Itzhaki's research [26] showed that interacting domain pairs potentially mediate human-herpesvirus interactions. The DMI-based method is increasingly recognized as useful, given the extensive mimicry of host protein short linear motifs by viruses [27,28]. With the accumulation of experimentally verified human-virus PPI data, machine learning (ML)-based prediction methods have become increasingly popular in the past decade, which makes them worth applying to the prediction of human-HSV-1 PPIs. Although none of the existing human-virus PPI prediction methods can achieve satisfactory performance on its own, it is widely accepted that more powerful and robust predictive performance can be achieved by integrating multiple prediction methods, as implemented in a series of studies [21,29,30].
In this work, four PPI inference methods (i.e., the IM, DDI, DMI, and ML-based method) were integrated for high-confidence PPI prediction between human and HSV-1 across the entire proteome. Whereas the ML-based method natively outputs prediction scores, the other three traditional PPI prediction methods were refined so that each of them could also yield an interaction probability score for any query protein pair. The four predictive scores for the query protein pair were further integrated into a final score. PPIs with higher final scores (integration score > 0.5) were singled out for further analysis. In addition to the general functional and network topology analyses of HSV-1 targeting human proteins, the biological significance of the predicted human-HSV-1 interactome was further explored with a focus on brain tissue-specific PPIs. In particular, the potential mechanisms of HSE and AD in the context of the human-HSV-1 interactome were investigated.

RESULTS AND DISCUSSION
The landscape of predicted human-HSV-1 PPIs
In this work, an integrative computational framework was applied to predict the interactions between the 74 proteins of HSV-1 strain KOS and 20,412 reviewed human proteins. Four methods (IM, DDI, DMI, and ML) were used in our computational framework to predict whether two proteins interact (Fig. 1). Briefly, IM is based on the experimentally validated interactions of multiple homologous protein pairs (i.e., interologs) of the query human-HSV-1 protein pair; DDI/DMI relies on the detection of the known or possible domain-domain/domain-motif interactions in the query protein pair to infer the interaction probability; ML is trained from the known PPIs between human and HSV-1, the feature encoding schemes of which include the sequence features extracted from protein pairs and the network properties of human proteins in the corresponding human PPI network. Finally, the four interaction probability scores (Pr_IM, Pr_DDI, Pr_DMI, and Pr_ML) were combined into an integration score (Pr) representing the interaction probability of the human-HSV-1 protein pair. It is hard to precisely rank the performance of the four individual methods due to data limitation and bias of known human-HSV-1 PPIs. Thus, each method in the final integration was treated independently and assigned the same weight. More methodological details are available in "Materials and methods".
The number of PPIs predicted by each method was calculated separately. As shown in Fig. 2A, the number of PPIs predicted by the DMI was the largest (41,828), followed by the DDI (13,579), the IM (7,805), and the ML (6,341). In general, the percentages of overlapping PPIs among different methods are low, implying that different methods are distinctive and complementary. Due to similarity in the methodologies, the DDI shared relatively more predictions with the IM and the DMI (the overlap in each case accounted for about 10% of its total). After integrating the results of the four methods, the number of predicted PPIs with Pr > 0 was 65,673. Although a higher Pr should correspond to a higher reliability, it is still necessary to set a reasonable and convincing threshold for high-confidence predictions. The solution was sought from high-throughput human-virus PPI identification studies. Taking the number of experimentally validated human-HIV-1 PPIs as a reference, 100-200 interactions with human proteins were identified for each HIV-1 protein in some high-throughput experimental studies [31]. Supplementary Fig. S1 shows the number of PPIs obtained under different confidence cutoffs. In general, a low threshold will result in too many predictions, which inevitably contain false positives. On the contrary, a high threshold will yield too few predictions, and many potential interactions will be ignored. Thus, the threshold of PPI predictions was empirically set to Pr > 0.5 and 10,432 PPIs were singled out as the most likely interacting protein pairs (Supplementary Fig. S1). On average, each of the 74 viral proteins interacts with 141 human proteins (10,432/74), which falls within the range reported by high-throughput experimental PPI identifications between human and HIV-1. Moreover, we found that 690 of 728 experimentally verified PPIs (collected from the HPIDB database and used in the ML method) overlap with our 10,432 predicted results (Supplementary Data Set S1), 601 of which could be predicted by more than one method. Figure 2B shows that the IM method accounted for the largest proportion among these 10,432 high-confidence PPIs.
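The trade-off behind the cutoff can be illustrated with a small sketch. It assumes the integrated scores are held in a dictionary keyed by (viral protein, human protein); the function name `sweep_cutoffs` and the toy scores are hypothetical and are not part of the original pipeline.

```python
def sweep_cutoffs(scores, cutoffs, n_viral):
    """For each cutoff, count pairs with Pr above it and the mean count per viral protein."""
    rows = []
    for cutoff in cutoffs:
        kept = sum(1 for pr in scores.values() if pr > cutoff)
        rows.append((cutoff, kept, kept / n_viral))
    return rows


# Toy integrated scores (hypothetical values, for illustration only).
toy_scores = {
    ("UL22", "P1"): 0.91, ("UL22", "P2"): 0.55, ("UL22", "P3"): 0.12,
    ("RL2", "P1"): 0.48, ("RL2", "P4"): 0.73,
}

for cutoff, kept, mean_per_viral in sweep_cutoffs(toy_scores, [0.0, 0.5, 0.9], n_viral=2):
    print(f"Pr > {cutoff}: {kept} PPIs, {mean_per_viral:.1f} per viral protein")
```

A lower cutoff keeps more pairs per viral protein and a higher one keeps fewer, which is the quantity compared against the 100-200 partners per HIV-1 protein cited above.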

Functional and network analyses showing the reliability of predicted human-HSV-1 PPIs
The 10,432 high-confidence PPIs were further analyzed. First, the number of human proteins targeted by each HSV-1 protein was counted (Fig. 3). On average, each of the 72 HSV-1 proteins with predicted partners interacted with 145 human proteins (10,432/72), and the top ten HSV-1 proteins contributed 5,963 interactions (approximately 57%) to the predicted human-HSV-1 interactome. The HSV-1 protein UL22 was predicted to have the most interactions with human proteins, and the predicted interaction partners were significantly enriched in the category of membrane-bounded organelle components (hypergeometric test, corrected p-value = 3.37 × 10^-51). Previous studies suggested that UL22, also called the envelope glycoprotein H (gH), forms a complex with glycoprotein L (gL, UL1) and interacts with glycoproteins B (gB, UL27) and D (gD, US6) to form a viral membrane fusion machine, thereby driving the fusion of the virus with the host membranes to allow the entry or spread of the virus between the host cells [32]. It is, therefore, reasonable to predict that this viral protein interacts with multiple human proteins, especially membrane proteins. RL2, the E3 ubiquitin ligase (ICP0), was predicted to interact with several human proteins that belong to the host cellular interferon-related proteins category (hypergeometric test, corrected p-value = 1.32 × 10^-9), which may indicate that RL2 is a weapon of HSV-1 to counteract the intrinsic and interferon-based antiviral responses. Thus, the predicted viral targets play an important role in the viral infection process, supporting the reliability of our human-virus PPI prediction.
Viral proteins tend to target some important host (human) proteins, such as the "hub" (high-degree centrality) and "bottleneck" (high-betweenness centrality) nodes of the human PPI network, to hijack and utilize host cells for viral life cycles [33]. Therefore, the degree and betweenness centrality of target proteins (proteins in the human PPI network that are targeted by HSV-1) and non-target proteins (proteins in the human PPI network that are not targeted by HSV-1) were also calculated from the perspective of network biology. It can be seen from Fig. 4 that, whether in degree or betweenness centrality, the values of target proteins were significantly higher than those of the non-target proteins (Wilcoxon rank-sum test, p-value < 2.2 × 10^-16), which is in accordance with previous observations inferred from human-pathogen PPI network analyses [34].

Functional analysis of brain-specific human-HSV-1 PPIs
Among several diseases caused by HSV-1 infection, sporadic but often fatal HSE in the brain is of great concern. Therefore, additional focus was placed on PPIs in which the human proteins are specifically expressed in the brain tissue. In total, 569 PPIs containing 283 brain-specific human proteins were selected from the 10,432 high-confidence PPIs. According to the Gene Ontology (GO) enrichment analysis (Fig. 5), cell adhesion-related biological process (BP) terms, such as "cell adhesion", "biological adhesion" and "cell-cell adhesion", were found to be significantly enriched (Fig. 5A, corrected p-value = 2.33 × 10^-7, 2.33 × 10^-7 and 2.2 × 10^-14, respectively), which indicated the reliance of HSV-1 on the intricate events of attachment and fusion to enter cells, especially by utilizing its envelope proteins (envelope glycoproteins) to interact with cell adhesion molecules to mediate this process [35]. In our results, 55 cellular adhesion molecules were predicted to interact with HSV-1 proteins. In the cellular component (CC) category, human proteins were found to be significantly enriched in microtubule or microtubule cytoskeleton (Fig. 5B, corrected p-value = 1.14 × 10^-4 and 4.82 × 10^-4, respectively). Microtubules are major components of the cytoskeleton and are known to be involved in transport in all eukaryotic cells. Therefore, the above enriched GO terms are in accordance with previous knowledge about the transportation of viral capsids to and from the nucleus to complete the replication cycle after entering the host cell. This is particularly relevant to the processes associated with the establishment of latent infection and reactivation in neurons, during which the transport of capsids along microtubules in long axons is required.
Besides, one strategy used by HSV-1 is to guide the entry pathway by manipulating various cell signaling cascades [36]. In the GO enrichment analysis results of molecular function (MF) entries (Supplementary Fig. S2), the GO term "calcium ion binding" was found to be significantly enriched. Ca2+ is one of the most prominent and common signal carriers and is known to modulate several steps during virus replication. The entry of HSV-1 is triggered by the interaction of the gH protein with cellular integrin, which eventually triggers Ca2+-mediated signaling pathways within the cell to ensure effective nucleocapsid translocation into the cytoplasm [36]. Although the relationship between chloride channels and viral infections has so far received less attention, previous studies showed that chloride channels play an important role in the HSV-1 entry [37]. Here, the CC enrichment of the chloride channel complex and the MF enrichment of the chloride channel activity were also found to be significant, further supporting the association between the chloride channel and the HSV-1 entry.
Collectively, the GO enrichment results of the HSV-1-interacting human brain-specific proteins were consistent with known functions associated with the HSV-1 replication cycle, suggesting that the PPIs between HSV-1 and human disrupt the normal function of proteins in the brain cells, which may cause inflammation and damage leading to HSE. These data also support the overall reliability of the predicted PPIs. The 569 PPIs are expected to form a vital subnetwork (Supplementary Fig. S3) of the human-HSV-1 interactome, which may enhance the mechanistic understanding of diseases related to HSV-1 infection (e.g., HSE) and provide new hints for the discovery of novel therapeutic targets.

The association of the HSV-1 with the AD in the context of human-HSV-1 PPIs
Increasing evidence points to the association of HSV-1 brain infection with AD. HSV-1 is present in the latent state in a high proportion of elderly brains. Intermittent reactivation from the latent state may cause local damage and inflammation, accumulation of which might eventually lead to AD [7].
To investigate whether the prediction results could provide supportive evidence for the association between AD and HSV-1 infection, 1,947 AD-related human genes were compared with the 4,546 predicted HSV-1 target proteins (human proteins present in the 10,432 predicted PPIs), and 635 were found to be overlapping (Fig. 6A, hypergeometric test, p-value = 1.37 × 10^-12). Meanwhile, the overlap between AD-related genes and target proteins specifically expressed in brain tissue was calculated and found to be still significant (hypergeometric test, p-value = 4.18 × 10^-10). The average network distance of AD-related genes to target proteins and non-target proteins in the human PPI network was also calculated, with results showing that AD-related genes were closer to target proteins (Fig. 6B). The above network analyses may suggest the strong association of many HSV-1 target proteins with AD, and it can be hypothesized that the virus may also indirectly affect these AD-related genes by interacting with other proteins to enhance their ability to influence the AD risk and predisposition.
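The overlap test above can be sketched with an exact upper-tail hypergeometric probability. This is a minimal illustration using only the standard library; the function name `hypergeom_upper_tail` is hypothetical, and the background size of 18,473 (the size of the human PPI network) is an assumption made here, since the background used for this particular test is not stated in the text. The sketch is therefore not expected to reproduce the reported p-values.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K are 'successes'."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return float(Fraction(hits, total))


# Small worked case: 10 proteins, 5 disease-related, 5 targeted, at least 4 shared.
print(hypergeom_upper_tail(4, 10, 5, 5))  # (25 + 1) / 252

# Counts from the text with an ASSUMED background of 18,473 proteins.
p_overlap = hypergeom_upper_tail(635, 18473, 1947, 4546)
print(p_overlap)
```

Exact integer arithmetic is used so that very small tail probabilities are not lost to floating-point underflow in intermediate terms.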
The amyloid precursor protein (APP) is a single-pass transmembrane protein that is widely expressed in tissues, especially at high levels in the brain neurons, and is subsequently metabolized rapidly [38]. Two pathways are known for the proteolysis of the APP (Fig. 6C), one of which includes its cleavage by α-secretase, generating the sAPPα fragment, and the other includes its cleavage by β-secretase (BACE1), producing neurotoxic amyloid β (Aβ) [38]. One of the commonly recognized hallmarks of AD is the accumulation of Aβ. First, HSV-1 uses its capsid proteins to physically interact with the APP, thereby hijacking the APP to transport newly generated virions in infected cells through a rapid anterograde transport mechanism [2]. Although such behavior changes the intracellular distribution of the APP and seems to partially prevent its conversion to Aβ, HSV-1 infection triggers an intra-CNS anti-microbial innate immune response that induces APP phosphorylation and enhances BACE1 activity, which jointly promote the production of Aβ [39]. The Aβ would encapsulate the HSV-1 virions to facilitate their clearance by autophagy [40,41]. HSV-1 also employs virulence factors to counterattack, inhibiting the autophagy-lysosome pathway of Aβ through interaction with Beclin-1 [11]. The imbalance between the production and elimination of Aβ caused by the HSV-1 infection accounts for excessive intracellular neurotoxic Aβ deposition within autophagosomes and endosomes, thus inducing neuronal apoptosis, which in turn can drive the degeneration of CNS tissue and the development of AD. Our predicted PPIs showed that three HSV-1 proteins (UL2, UL21, and UL45) interacted with the APP, two of which were in line with the experimental observation. Besides, RL1 and UL45 were also predicted to play a virulence factor role through interaction with Beclin-1. In summary, the recapitulated interactions between HSV-1, APP, and Aβ further argue for a mechanistic basis for the association between the HSV-1 infection and the risk of AD (Fig. 6C).

Interactive web interface
The predicted 10,432 high-confidence PPIs were stored in a database to which an interactive web interface was provided (http://www.zzdlab.com/HintHSV/index.php) to facilitate user access. We have provided a search box for the 72 HSV-1 proteins participating in these 10,432 PPIs, so any protein can be selected to view the corresponding interactions. For each HSV-1 protein, a table is provided to display all the prediction scores for each human target protein (including four individual prediction scores and one integrative score) and a subnetwork to show the PPIs, which are available for download. Human proteins can also be searched by the users to find possible PPIs with HSV-1. The 569 brain-specific PPIs, 690 known PPIs, and other datasets used in this work are also downloadable in the web interface.

Limitations of our work
The current work is inevitably subject to the following limitations, since the number of experimentally known human-HSV-1 PPIs is not sufficient. Firstly, some parameter settings were empirically selected since sufficient data for strict parameter optimization was not available. Secondly, the integration of different PPI inference methods was also hindered by the lack of data. Given a sufficient amount of known PPI data, more powerful integration methods, such as logistic regression, could be tested. Thirdly, the reliability of the prediction results could not be directly assessed either. Even so, because state-of-the-art PPI inference methods were carefully implemented, the prediction results are believed to be an important data resource that provides useful PPI candidates for further experimental validation. Moreover, new human-HSV-1 PPIs identified by experimental scientists in the future will continue to test the overall reliability of the current predictions.

CONCLUSION
In this work, four popular PPI inference methods were used to predict the PPIs between human and HSV-1. To maximize the reliability of predictions, the interaction probability scores from the four methods were integrated into a final probability score and a stringent threshold (Pr > 0.5) was selected to single out high-confidence PPIs. The subsequent functional and network topology analyses also supported the overall reliability of the prediction strategy. To investigate the associations between the HSV-1 infection and neurological diseases (e.g., HSE and AD), the focus was placed on brain-specific PPIs between human and HSV-1, and a subnetwork containing 569 inter-species PPIs was established. Functional analysis shows that human proteins involved in the entry, intracellular transport pathways, and various regulatory pathways are utilized or hijacked by HSV-1 through complicated inter-species PPIs. Collectively, the established human-HSV-1 PPI network provides a global landscape regarding the human-HSV-1 interactome, as well as new insights into the pathogenesis of the HSV-1 infection.

MATERIALS AND METHODS
HSV-1 and human proteins
In this work, the focus was placed on the PPI prediction between the HSV-1 strain KOS and human. All the proteins of the HSV-1 strain KOS were downloaded from GenBank (https://www.ncbi.nlm.nih.gov/nuccore/952947517/). By merging two pairs of redundant proteins (protein RL2 duplicates protein RL2_1, and protein RS1 duplicates protein RS1_1; the results are presented as RL2/RL2_1 and RS1/RS1_1, respectively), 74 HSV-1 proteins were obtained (Supplementary Data Set S2). The 20,412 reviewed human proteins used for prediction were downloaded from the UniProt database [42] (Supplementary Data Set S3).

Brain-specific human genes
Brain-specific genes showing elevated expression in the cerebral cortex were downloaded from the Human Protein Atlas (www.proteinatlas.org). By UniProt ID mapping, 1,442 brain-specific human proteins were obtained.

AD-related human genes
Gene-disease associations were downloaded from DisGeNET (http://www.disgenet.org/). The resulting 1,947 AD-related human genes were obtained by the UniProt ID mapping tool.

Human PPI network
The human interaction network was collected in our previous work [43], consisting of 345,064 PPIs and 18,473 proteins. It was used for network parameter analyses and network-based encoding in the development of the ML-based predictive model. The R package igraph [44] was used to calculate the network parameters of protein nodes in the network.
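To make the two network parameters used in the hub and bottleneck analysis concrete, the following is a minimal standard-library sketch of degree and unnormalized betweenness centrality (Brandes' algorithm) for an undirected, unweighted network. It is an illustration only and not the igraph-based pipeline used in this work; the function name and the toy edge list are hypothetical.

```python
from collections import deque


def degree_and_betweenness(edges):
    """Return (degree, betweenness) dicts for an undirected, unweighted graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    betweenness = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {node: [] for node in adj}
        n_paths = dict.fromkeys(adj, 0)
        n_paths[source] = 1
        dist = dict.fromkeys(adj, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1.0 + delta[w])
            if w != source:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return degree, {node: b / 2.0 for node, b in betweenness.items()}


# Toy network: H is both a hub and a bottleneck (hypothetical protein names).
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("C", "D")]
toy_degree, toy_betweenness = degree_and_betweenness(toy_edges)
print(toy_degree["H"], toy_betweenness["H"])
```

In this toy network every shortest path between two different neighborhoods passes through H, which is why it scores highest on both measures.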

PPI prediction methods
To ensure that the predictions are robust and reliable, four prediction methods were used to infer the PPIs between 74 HSV-1 proteins and 20,412 human proteins. The four methods gave the probability scores (0-1) of the interaction for 74 × 20,412 protein pairs. Finally, the four scores were combined into one final score (Pr_final) according to the integration method used in the STRING database [30]. It was calculated in a naïve Bayesian manner under the assumption of the independence of various methods. The formulas to infer Pr_final are as follows:

Here p denotes a prior factor, which is set as 0.041 following the setting provided by STRING. Pr_IM, Pr_DDI, Pr_DMI, and Pr_ML stand for the interaction probability score for the IM, DDI, DMI, and ML method, respectively. Each method is briefly described in the following subsections.
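A STRING-style integration of this kind can be sketched as follows. The form used here (remove the prior from each score, combine the corrected scores as independent evidence, then add the prior back once) is an assumption based on the scheme published for STRING and may differ from the exact equations of this work. The function name and the convention that a pair without any evidence keeps a score of 0 are likewise assumptions.

```python
def integrate_scores(scores, prior=0.041):
    """Combine per-method probabilities in a STRING-like naive Bayesian manner (assumed form)."""
    no_interaction = 1.0
    for score in scores:
        # Remove the prior from each score; scores at or below the prior carry no evidence.
        corrected = max((score - prior) / (1.0 - prior), 0.0)
        no_interaction *= 1.0 - corrected
    combined = 1.0 - no_interaction
    if combined == 0.0:
        return 0.0  # assumption: a pair with no supporting method stays at 0
    # Add the prior back once to the combined evidence.
    return combined + prior * (1.0 - combined)


# Example with hypothetical scores in the order (Pr_IM, Pr_DDI, Pr_DMI, Pr_ML).
print(integrate_scores([0.6, 0.0, 0.0, 0.0]))
print(integrate_scores([0.6, 0.6, 0.0, 0.0]))
```

Under this form a pair supported by a single method keeps that method's score, while agreement between methods raises the integrated score above either individual score.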

The IM method
The IM method is a widely used PPI inference method. The core idea of IM is to infer unknown PPIs from known homologous PPIs (termed interologs) in other organisms. Previous IM applications often used the PPI templates from one or several model species to infer unknown PPIs. To maximize the coverage of the IM method, we extended the species source range of template PPIs to cover most of the experimentally identified PPIs, including both intra-species and inter-species PPIs. Here, 571,359 template PPIs with relatively complete information were collected from seven public databases, including BioGRID [45], DIP [46], HPIDB [47], IntAct [48], PATRIC [49], InnateDB [50] and VirHostNet [51]. We employed the strategy of HIPPIE [52] to evaluate the quality of each PPI template. For each PPI template, a quality score (S_temp) ranging from 0 to 1 was assigned by accounting for three conditions (i.e., the experimental methods for the PPI determination, the literature reporting the PPI, and the species included in the PPI). The six parameter values in the formula are as set in HIPPIE. To identify the interologs for a query protein pair between human and HSV-1, BLAST searching was conducted to identify their homologs, and the criteria for two proteins to be considered homologous are as follows: E-value ≤ 10^-5, sequence identity ≥ 30%, and alignment coverage of query protein ≥ 40%. In case n homologous pairs were identified for the query pair, the IM-based interaction probability (Pr_IM) can be defined as:

The DDI-based method
Considering that the interaction between two proteins may be mediated through evolutionarily conserved, interacting domain pairs existing in the proteins, the DDI method was developed for PPI prediction. The list of known DDIs can be downloaded from the 3did database [53]. To construct as large a DDI library as possible, the expectation-maximization (EM)-based algorithm proposed by Liu et al. [54] was also employed to mine domain pairs that are frequently used in known PPIs.
Here, the domain definition was based on the Pfam database [55], and hmmscan [56] was employed to search for protein domains (E-value ≤ $10^{-5}$). Among the known PPIs collected in this study, 918,116 PPIs met the requirement that both protein partners contain Pfam domains. The probability of the DDIs contained in these PPIs was evaluated using the EM algorithm. Because some domains occur frequently in proteins that may not participate in PPIs, such highly frequent domains were excluded from the subsequent implementation of the EM algorithm to avoid introducing potential noise. Finally, a comprehensive DDI library was compiled by combining the known DDIs in 3did and the DDIs inferred through the EM algorithm. On the principle that DDIs collected from 3did should be more reliable, the confidence score ($S_{\mathrm{DDI}}$) for each DDI in the library was assigned by a formula with two terms: $S_{\mathrm{DDI\text{-}known}}$, which takes 1 or 0 to represent whether or not the DDI is known from 3did, and $S_{\mathrm{DDI\text{-}EM}}$, the score of the DDI from the EM algorithm, ranging from 0 to 1. The probability of interaction ($\mathrm{Pr}_{\mathrm{DDI}}$) between one HSV-1 protein and one human protein was inferred from the n domain pairs they contain.

The DMI-based method
DMI is also considered an important mechanism for mediating human-virus PPIs. Like the DDI method, the DMI method can be used to infer PPIs. The DMI library is likewise a combination of known DMIs and DMIs inferred with the assistance of the EM algorithm. Known DMIs were also downloaded from 3did. Domain assignment was the same as in the DDI method. The motifs of each protein were identified only from those motif patterns contained in known DMIs. Moreover, following the filtering strategy used in the DDI method, candidate DMIs containing highly frequent domains or motifs were removed before scoring with the EM algorithm. Finally, the confidence score ($S_{\mathrm{DMI}}$) for each DMI in the library was defined by a formula with two terms: $S_{\mathrm{DMI\text{-}known}}$, which takes 1 or 0 to represent whether or not the DMI is known from 3did, and $S_{\mathrm{DMI\text{-}EM}}$, the score of the DMI from the EM algorithm. The interaction probability ($\mathrm{Pr}_{\mathrm{DMI}}$) of a human-HSV-1 protein pair containing n domain-motif pairs was then inferred from those pairs.

The ML-based method
The development of ML prediction models requires both positive and negative samples. Positive and negative samples for human-virus PPI prediction are known to be highly imbalanced in real applications, and the ratio of positive to negative samples that should be used to train ML-based PPI prediction models remains an open issue. Instead of a balanced or an extremely imbalanced training set, a moderately imbalanced ratio is often adopted. Based on these considerations, the ratio of positive to negative samples was empirically set to 1:10. A training dataset containing 728 positive samples (i.e., known human-HSV-1 PPIs) and 7,280 negative samples (i.e., human-HSV-1 non-PPIs) was therefore compiled to develop an ML-based predictor. The positive samples were collected from HPIDB 3.0 (downloaded in December 2018), in which HSV-1 proteins from different strains (not just the strain KOS) were taken into account, while the negative samples were randomly selected from human-HSV-1 protein pairs with no identified interaction.

Two encoding schemes were employed to transform protein pairs into feature vectors: a sequence-based encoding scheme called CKSAAP and a network property-based encoding scheme called NetTP. CKSAAP calculates the composition of k-spaced amino acid pairs for protein pairs. NetTP is based on the consideration that human proteins targeted by viral proteins have different network properties from those that are not targeted. Six network topology parameters were used to derive the NetTP encoding: degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, PageRank centrality, and eccentricity. More details about these two encoding schemes are available in our previous publication [43]. A predictive model was trained for each encoding scheme with the random forest method, and the two models were then integrated into a stronger predictive model through logistic regression. The performance of the two individual models and of the integrative model was evaluated through 5-fold cross-validation (Supplementary Fig. S4). In general, the integrative model outperformed each individual ML model. For each query protein pair, the final prediction model generated a prediction score ($S_{\mathrm{ML}}$) ranging from 0 to 1.

The F1 value, which is the harmonic mean of the precision and recall of the model, was chosen to comprehensively evaluate model performance. At the threshold where F1 reaches its maximum, the precision and recall of the model achieve an optimal balance. The definitions of precision, recall, and F1 are as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. We calculated the F1 values of the model in the 5-fold cross-validation at different thresholds and took the threshold of 0.363, which corresponds to the maximum F1, as the final criterion for deciding whether a query pair was predicted to interact. Furthermore, the prediction score was converted into the ML-based interaction probability score ($\mathrm{Pr}_{\mathrm{ML}}$):

$$\mathrm{Pr}_{\mathrm{ML}} = \begin{cases} S_{\mathrm{ML}}, & S_{\mathrm{ML}} \ge \text{threshold} \\ 0, & S_{\mathrm{ML}} < \text{threshold} \end{cases} \tag{12}$$
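The threshold selection and the score conversion of Eq. (12) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy labels and scores are hypothetical, and scanning every observed score as a candidate threshold is an assumption, since the text does not state which grid of thresholds was evaluated.

```python
from typing import Sequence, Tuple


def precision_recall_f1(labels: Sequence[int], scores: Sequence[float],
                        threshold: float) -> Tuple[float, float, float]:
    """Precision, recall and F1 when pairs with score >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_f1_threshold(labels: Sequence[int],
                      scores: Sequence[float]) -> Tuple[float, float]:
    """Try each observed score as a threshold (assumed grid); return (threshold, max F1)."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(scores)):
        f1 = precision_recall_f1(labels, scores, t)[2]
        if f1 > best_f1:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


def to_probability(score: float, threshold: float) -> float:
    """Eq. (12): keep the score if it reaches the threshold, otherwise return 0."""
    return score if score >= threshold else 0.0


# Toy cross-validation output (hypothetical): 1 = known PPI, 0 = non-PPI.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
threshold, f1 = best_f1_threshold(labels, scores)
pr_ml = [to_probability(s, threshold) for s in scores]
```

In the study, the same procedure applied to the 5-fold cross-validation scores gave the threshold of 0.363; the toy data here only demonstrate the mechanics.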

ID mapping
The online UniProt ID mapping tool (https://www.uniprot.org/uploadlists/) was used to convert other IDs (e.g., human or viral gene IDs) into UniProt IDs.

GO enrichment analysis
The BiNGO plugin [57] in Cytoscape [58] was used for the GO enrichment analysis. The enrichment analysis of the UL22-targeted human proteins was conducted against a background of 20,412 reviewed human proteins, and the cellular component (CC) category was selected. To explore why HSV-1 targets the 283 human proteins that are specifically expressed in brain tissues, a GO enrichment analysis of the three categories (biological process, BP; cellular component, CC; and molecular function, MF) was conducted using the 1,442 brain-specific human proteins as the background (reference set). Statistical significance was assessed with the hypergeometric test, and enriched terms were selected at a significance level of 0.05 after Benjamini-Hochberg false discovery rate correction.
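The statistics behind this enrichment analysis can be sketched in a few lines of Python. This is an illustrative re-implementation of a one-sided hypergeometric test followed by Benjamini-Hochberg adjustment, not BiNGO's code; the function names, the term names, and the toy counts are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): n targets drawn from N background proteins, K of which carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enriched_terms(counts: Dict[str, Tuple[int, int]], n: int, N: int,
                   alpha: float = 0.05) -> List[str]:
    """counts maps term -> (k targets with the term, K background proteins with the term)."""
    terms = list(counts)
    pvals = [hypergeom_pvalue(counts[t][0], n, counts[t][1], N) for t in terms]
    adj = benjamini_hochberg(pvals)
    return [t for t, q in zip(terms, adj) if q < alpha]


# Toy example (hypothetical counts): 10 targets in a background of 100 proteins.
toy = {"term_A": (8, 10), "term_B": (1, 10)}
significant = enriched_terms(toy, n=10, N=100)
```

For the brain-specific analysis described above, `n` would be 283 and `N` would be 1,442, with per-term counts taken from the GO annotations.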

Figure 1. Workflow for the prediction of human-HSV-1 PPIs. The interaction probability of each human-HSV-1 protein pair was evaluated by the interolog mapping (IM), domain-domain interaction (DDI), domain-motif interaction (DMI), and machine learning (ML)-based methods. The four interaction probability scores ($\mathrm{Pr}_{\mathrm{IM}}$, $\mathrm{Pr}_{\mathrm{DDI}}$, $\mathrm{Pr}_{\mathrm{DMI}}$, and $\mathrm{Pr}_{\mathrm{ML}}$) were subsequently integrated into the final probability score (Pr).

Figure 3. The number of human proteins predicted to interact with HSV-1 proteins.

Figure 5. Enriched GO terms of the brain-specific human proteins predicted to interact with HSV-1 proteins in the biological process (A) and cellular component (B) categories.

Figure 6. Association between human-HSV-1 PPIs and AD. (A) Overlaps between human targets and AD-related genes. (B) Differences in network distance to AD-related genes between target proteins and non-target proteins in the human PPI network. *** denotes statistical significance (Wilcoxon test, p-value < $2.2 \times 10^{-16}$). The mean value is represented by the small diamond. (C) The possible relationship between APP, Aβ, and HSV-1. ① HSV-1 proteins (UL2, UL21, and UL45) interact with APP. ② HSV-1 infection induces APP phosphorylation and increased BACE1 activity, resulting in the conversion of APP to Aβ. ③ Aβ inhibits viral activity by encapsulating viral proteins. ④ HSV-1 proteins (RL1 and UL45) interact with Beclin-1, suppressing the degradation of Aβ by inhibiting the autophagy-lysosome pathway. ⑤ In summary, Aβ deposition results from the imbalance between Aβ production and degradation.