Drug Side-Effect Prediction Via Random Walk on the Signed Heterogeneous Drug Network

Drug side-effects have become a major public health concern as they are the underlying cause of over a million serious injuries and deaths each year. Therefore, it is of critical importance to detect side-effects as early as possible. Existing computational methods mainly utilize the drug chemical profile and the drug biological profile to predict the side-effects of a drug. In the utilized drug biological profile information, they only focus on drug–target interactions and neglect the modes of action of drugs on target proteins. In this paper, we develop a new method for predicting potential side-effects of drugs based on more comprehensive drug information in which the modes of action of drugs on target proteins are integrated. Drug information of multiple types is modeled as a signed heterogeneous information network. We propose a signed heterogeneous information network embedding framework for learning drug embeddings and predicting side-effects of drugs. We use two bias random walk procedures to obtain drug sequences and train a Skip-gram model to learn drug embeddings. We experimentally demonstrate the performance of the proposed method by comparison with state-of-the-art methods. Furthermore, the results of a case study support our hypothesis that modes of action of drugs on target proteins are meaningful in side-effect prediction.


Introduction
Drug side-effects, or adverse drug reactions (ADRs), can be regarded as undesirable effects that are caused by normal use of drugs. Most are natural pharmacological actions of the drugs and are unavoidable. Side-effects of approved drugs are harmful to patients and can even be fatal. Serious side-effects cause 100,000 deaths per year in the United States and have been one of the leading causes of death [1]. The detection of side-effects is challenging in every stage of drug development.
Traditional side-effect detection methods such as vitro safety profiling and clinical drug safety trials are time-consuming and expensive, and many potential side-effects cannot be detected because there are so many side-effect terms. Many computational methods for analyzing or predicting drug side-effects have been proposed recently [2][3][4][5]. Li et al. [6] combined multiple data sources and proposed an inductive matrix completion method for predicting unknown side-effects. Zhang et al. [7] proposed an integrative label propagation algorithm to predict potential side-effects based on high-order similarity. Zheng et al. [8] built a drug similarity integration framework to measure the similarity between drugs from various perspectives and proposed a highly reliable negative sample selection method to improve the prediction accuracy. Zhang et al. [9] adopted ensemble methods and the feature-selection-based multilabel k-nearest neighbor method (FS-MLKNN) for side-effect prediction. Liu et al. [10] built a series of binary classifiers to determine whether a drug has a specified side-effect. With the advancement of graph embedding technologies [11][12][13], many studies have focused on learning integrated drug embeddings for various prediction tasks. Ma et al. [14] proposed a drug embedding method that is based on multi-view deep autoencoders to predict drug side-effects. Hu et al. [15] proposed a heterogeneous network embedding approach by integrating PPI information into drug embeddings.
The methods that are discussed above mostly utilize chemical profiles (e.g., fingerprints) and biological profiles (e.g., target proteins, pathways, and transporters) of drugs to predict potential side-effects. It has been proved that drug-target interactions play an important role in side-effect prediction [16][17][18]. However, the drug-target interactions that are considered in these works focus only on whether a single drug acts on a target protein and neglect the modes of action of drugs on target proteins. There are various and even opposite action modes when drugs act on target proteins, such as activation, inhibition, agonist, antagonist, potentiator, blocker, inducer, suppressor, and so on. The literature [19] found that modes of action of drugs are strongly related to the therapeutic effects and side-effects of drugs through statistical analysis.
Based on the results of [19], in this paper, we propose utilizing the action mode information of drug-target interactions to learn more effective drug embeddings for drug side-effect prediction. The modes of action of drugs on target proteins can be represented as positive or negative edges in a signed graph (as shown in Table 1). When two drugs act coherently on a common target protein, the two drugs are regarded as similar and are connected with a positive edge. In contrast, when two drugs act incoherently on a common target protein, the two drugs are regarded as dissimilar and are connected with a negative edge. This produces a signed drug network. Signed networks have been frequently used in systems biology [20][21][22][23] and can reveal the deeper complex relations between individuals. The signed drug information and other traditional drug information form a signed heterogeneous information network (signed HIN). A signed HIN is a network with multiple types of nodes and links, and the links can be positive or negative [24]. Multiple types of drug information and every kind of information can be formulated as a signed subnetwork or an unsigned subnetwork. The subnetworks form a signed drug HIN. We projected every drug into a low dimensional vector space and predicted the side-effect of the drug. The low dimension vector representations of drugs synthesized different drug information. The problem of learning drug representations can be formulated as a network embedding task on the signed drug HIN. We propose utilizing random walk [25] on the signed HIN to obtain the drug sequences and utilizing the Skip-gram model [26] to learn drug embeddings. We also experimentally demonstrate that the learned drug embeddings that integrating additional signed drug information can substantially improve the side-effect prediction accuracy. A flowchart of our side-effect prediction model via random walk on a signed HIN (RW-SHIN) is shown in Figure 1. First, we collected drug profiles from public databases, especially action modes of drug-target interactions, from the DrugBank database [27]. Then, we constructed a signed heterogeneous information network (signed HIN) based on these profiles. The signed HIN contained one signed drug subnetwork and three unsigned drug subnetworks. RW-SHIN performed different random walk procedures on signed subnetworks versus unsigned subnetworks. Here, two biased random walk procedures [28,29], which can effectively incorporate positive and negative relations into a graph, were used to obtain nodes sequences. Then, we utilized the Skip-gram model to learn drug embeddings based on the node sequences. After that, the drug embeddings were input into a fully connected neural network to predict a potential side-effect.

Drug Chemical Profile
We collected the fingerprints of drugs from the PubChem Compound database [30]. Each drug was encoded into an 881-dimensional feature vector, with entries of 0 or 1 representing the presence or absence, respectively, of a chemical substructure term.

Drug Side-Effect Profile
SIDER [31] is a database that contains the side-effects of drugs on the market that are listed in their package inserts. We downloaded the entire database from http://sideeffects.embl.de/. The information in SIDER is limited because many side-effects may not be detected in clinical trials. OFFSIDES [32] is another side-effect database, which was built by mining the FDA Adverse Event Reporting System (FAERS, http://www.fda.gov/cder/aers/default.htm). We extracted the associations between drugs and side-effects from the two databases. This produced a dataset that consisted of 548 drugs, 1385 side-effect terms, and 41,008 associations between drugs and side-effects, and each drug had 74.8 side-effects on average. An index plot of the number of associated drugs for each side-effect is shown in Figure 2.
Each drug was represented by a 1385-dimensional feature vector, where each element is 1 or 0, which represents the presence or absence, respectively, of a side-effect term for this drug.

Drug Biological Profile
We crawled the biological profiles of drugs in the DrugBank database [27], which is a widely used public drug information database. We extracted the target proteins of the 548 drugs without considering the action modes of the drugs on the target proteins. Each drug is encoded into a 780-dimensional feature vector, with 0 or 1 representing the presence or absence, respectively, of a target protein for this drug.
Meanwhile, we crawled the action modes of 548 drugs on 780 target proteins and associated every mode with a sign according the procedure that is described in [15] (as presented in Table 1). This produced a signed drug-target association matrix A ∈ {−1, 0, 1} m * n , where m is the number of drugs and n is the number of targets.

Construction of the Signed Drug Heterogeneous Information Network
We use the above three profiles to construct a drug heterogeneous information network (HIN) and learn the drug embeddings based on the constructed drug HIN. A heterogeneous information network is a special type of information network that contains several types of nodes or links. In this work, the drug HIN contains three unsigned subnetworks and one signed drug subnetwork.

Unsigned Drug Subnetworks
We constructed three unsigned drug subnetworks based on drug chemical profiles, drug side-effect profiles and unsigned drug-target associations in the drug biological profiles. The nodes in the three subnetworks correspond to all 548 drugs. The edges are defined by the Jaccard similarities of the feature vectors in every type of profile. The edge weight w u ij between drug pair (d i , d j ) in an unsigned subnetwork is defined by Equation (1).
where Pr i is the feature vector of drug i in a type of drug profile.

Signed Drug Subnetwork
We constructed a signed drug subnetwork G s using the signed drug-target association matrix A ∈ {−1, 0, 1} m * n . The node set in G s is the drug set and the edge weight w s ij between drug pair (d i , d j ) is defined by Equation (2).
where a ik is the element of the signed drug-target association matrix A. w s ij ranges from −1 to 1, where w s ij = −1 if drug pair (d i , d j ) share several targets but have opposite action modes on the shared targets, while w s ij = 1 if drug pair (d i , d j ) share several targets and have the same action mode on the shared targets and w s ij = 0 if drug pair (d i , d j ) have no shared targets. The distribution of the edge weights w s ij is plotted in Figure 3.

Methods
In our model, side-effects were predicted using the low dimensional vector representations of drugs. The drug representations are learned from the above-mentioned drug profiles. Learning comprehensive drug representations can be formulated as the task of learning node embeddings of the constructed drug HIN. Therefore, we first performed a random walk procedure on every subnetwork in the above-constructed drug HIN to obtain nodes sequences, and then utilized the Skip-gram model to learn drug embeddings based on the obtained node sequences. After that, potential side-effects were predicted using the learned drug embeddings.

Random Walk on the Signed HIN
The drug HIN that is presented above contains both unsigned subnetworks and signed subnetworks. In the signed subnetworks, the learned node representations must integrate the positive and negative relations. An effective approach is to conduct random walks on the graph to integrate the positive and negative relations into node sequences. In this paper, we use two types of biased random walk procedures [28,29] to obtain many node sequences in each subnetwork. The two biased random walk procedures are conducted on signed graphs versus unsigned graphs.

Biased Random Walk on a Signed Graph
The biased random walk procedure on a signed graph utilizes the direct neighbor relations and common neighbor relations to acquire a node sequence. As illustrated in Figure 5, for a random walker, let t i−1 = s denote the i − 1th node in the path and let node t i = h be the next-hop node. If the edge between node s and node k is negative, the walker will not choose k as the next hop but may choose k s enemy node j as the next hop, which is guided by the common neighbor relation. The common neighbor relation involves two types of sets: friend sets F(w) and enemy sets E(w) (as illustrated in Figure 6). If two nodes' friend sets or enemy sets have a large overlap area, the two nodes share common friends or enemies, respectively, and the relation between them tends to be positive. In contrast, if the enemy set of a node covers a large overlap area of the friend set of another node, the two nodes have very little in common and the relation between them will be weak or even negative.
Here, we define a common neighbor relation similarity ls(s, h), which is used to decide whether the walker jumps to an enemy's enemy node. The transition probability in the biased random walk procedure is defined in Equation (4).
where p and q are adjustable parameters that guide the walker by selecting one of its friends or an enemy of its enemy as the next hop. The biased random walk procedure assumes that an enemy's enemy may be a friend. We investigated the triangle relationships in the signed drug subnetwork to determine whether this assumption holds in our datasets. The results are presented in Table 2.  In the above investigation, we find that an enemy's enemy is more likely to be a friend in a signed drug subnetwork. Therefore, we can use the biased random walk procedure to combine the positive and negative relations and to obtain node sequences.

Biased Random Walk Procedure on an Unsigned Graph
The biased random walk procedure on an unsigned graph combines two sampling strategies: breadth-first sampling (BFS) and depth-first sampling (DFS). For a random walker, let t i−1 = s denote the (i − 1)th node and let node t i−2 = t denote the (i − 2)th node in the path. Node t i = h will be the next-hop node. As illustrated in Figure 7, h0 or h1 may be the next-hop node according to BFS and DFS, respectively.
The transition probability in the biased random walk procedure is defined in Equation (5).
where α and β are adjustable parameters that guide the walker selecting the next node following BFS or DFS. t is the last node in the path.

Learning Drug Embeddings
After obtaining node sequences in the signed HIN, we utilized Skip-gram model [26] to learn node representations. The model of Skip-gram is illustrated in Figure 8 and the optimizer objective function of Skip-gram is presented as Equation (6).
where f is the node mapping function f : v ∈ V → R |V|×d , in which d is the dimension of the embeddings; v i−w is the w − th neighbor node of node v i in every node sequence; and w is the window size.
Embedding of v i Figure 8. Skip-gram model.

Prediction Formulation
Once we have obtained the final drug embeddings, we use a fully connected neural network to predict each side-effect, as illustrated the last step in Figure 1. The predictive problem was modeled as a binary classification of each side-effect. For each side-effect, the drugs that are known to cause the side-effect were labeled as positive samples and the remaining drugs were labeled as negative samples. The predictive model utilized the L2-norm-regularized logistic regression method [33] to predict each side-effect. The input features of the model were drug embeddings. The loss function is presented as Equation (7).
where Z is the feature matrix of the drugs and the row vector z i of Z is the embedding of drug i; f is a nonlinear logistic function with L2 loss; w and b are the weight vector and the bias vector, respectively; and λ is a hyperparameter, which must be learnt from the data.

Results and Discussions
We experimentally compared the performance of the proposed method with several state-of-the-art methods and utilized a case study to support our hypothesis that modes of action of drugs on target proteins are meaningful in side-effect prediction. We also discussed the impact of embedding dimension on the prediction performance to find the best representations of drugs.

Performance Evaluation Metrics
We used the receiver operating characteristic curve (ROC curve) to evaluate the performance of each method. The ROC curve is the true-positive rate (TPR) as a function of the false-positive rate (FPR), which is based on various thresholds, where TPR and FPR are defined in Equations (8) and (9).
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively. We calculated the area under the ROC curve (AUROC) to evaluate the performance of each method in predicting all side-effects.

Baselines
We compared our method (RW-SHIN) with several state-of-the-art network embedding algorithms for side-effect prediction. We implemented the following four baselines for comparison:

•
Laplacian eigenmaps [34]: Laplacian eigenmaps is a typical matrix factorization method that has been widely adopted for data analysis of biomedical networks. It aims at factorizing a data matrix of a graph into lower dimensional matrices while preserving the topological properties of the original graph. For the drug HIN, we concatenated the Laplacian eigenmaps of each unsigned subnetwork to construct feature vectors of drugs for side-effect prediction. • GCN [35]: GCN is a recently proposed network embedding method that is based on the spectral convolutional operation and realizes state-of-the-art performance on important prediction problems in recommender systems. Here, we linearly integrated the similarity matrices of the unsigned subnetworks in the drug HIN and learned the drug embeddings using GCN. • AttSemiGAE [14]: AttSemiGAE utilizes multiview graph autoencoders (GAEs) and adds an attentive mechanism for determining the weights for each view with respect to the corresponding prediction tasks. Here, each unsigned subnetwork in the drug HIN is regarded as a single-view graph in the AttSemiGAE algorithm and the supervised information is the known side-effects of the drugs. • RW-HIN: For further validation of the impacts of action modes on the quality of side-effect predictions, we designed a network embedding algorithm that ignores the signed drug information, namely, RW-HIN. The algorithm is based on random walk on an unsigned graph.
These algorithms cannot learn node embeddings on signed graphs and only utilize the following data sources to learn drug embeddings: drug chemical profiles, drug side-effect profiles and unsigned drug biological profiles. We performed the following 5-fold cross-validation procedure for each algorithm: The known drug-side-effect associations are used as a gold-standard set. The drugs in the gold-standard set are split into five equally sized subsets, and each subset is used in turn as the test set, while the remaining four subsets are used as the training set. The final performance is evaluated according to the results on all 5 folds. For accurate comparison, we use the same experimental conditions, namely, the same training drugs and test drugs are used across all methods in each cross-validation fold. The embedding dimension of each algorithm is 32. All parameters in each method (e.g., the regularization parameters, the adjustable parameters, and the embedding dimension) were optimized using grid search with the AUROC score as an objective function. Figure 9 plots the ROC curves for the five approaches based on the cross-validation experiment. The ROC curve of each method is based on the merged prediction scores of all side-effects. Our method outperforms the other methods, which suggests that the proposed method is more effective than the previous methods. Although RW-SHIN and RW-HIN are based on the same training model, RW-SHIN, which considers the modes of action of drugs on target proteins, outperforms RW-HIN. This suggests that the modes of action of drugs on target proteins are meaningful. In addition, the method with an attentive mechanism outperforms the methods that do not use attentive mechanisms on the same unsigned datasets. We examined the prediction accuracies for individual side-effects. We calculated the AUROC scores for each side-effect. Figure 10 presents a boxplot that represents the distribution of the resulting AUROC scores for 1385 side-effects for each method. Our signed drug-target information-based method produced the best results. Most high-frequency side-effects (e.g., pseudomembranous colitis, gynecomastia and interstitial nephritis) were predicted with higher accuracy, while most low-frequency side-effects (e.g., diabetic neuropathy, nephrogenic diabetes insipidus and narcolepsy) were predicted with lower accuracy. The ROC-AUC scores are positively correlated with the side-effect frequencies. However, this is not absolute: some high-frequency side-effects (e.g., abdominal pain) were predicted with lower accuracy, while some low-frequency side-effects (e.g., chronic active hepatitis) were predicted with higher accuracy. We also investigated the side-effect prediction accuracy for every drug by evaluating the Top − K prediction accuracy of each drug. We counted the number of drugs whose Top − 1, Top − 3, and Top − 5 prediction results contain at least one known side-effect. If at least one known side-effect is in the Top − K prediction results of a drug, the drug will be counted. The results are presented in Table 3. The proposed signed drug-target association-based method outperforms other methods. The number of drugs whose Top − 1 prediction side-effects are known is 276 (50.3%). The numbers of drugs whose Top − 3 and Top − 5 prediction results contain at least one known side-effect are 354 (64.6%) and 465 (84.9%), respectively.

Case Studies
We investigated the side-effect prediction results of drugs that are similar in terms of chemical profiles and target proteins but opposite in terms of modes of action on target proteins. We consider Betaxolol (DB00195) and Dobutamine (DB00841) as examples. Betaxolol is a selective beta-1 adrenergic receptor blocker that is used in the treatment of hypertension and glaucoma. Dobutamine is a sympathomimetic drug that is used in the treatment of heart failure and cardiogenic shock. Its primary mechanism is direct stimulation of beta-1 receptors of the sympathetic nervous system.
The chemical similarity of drugs can be calculated using Jaccard similarity, which is shown in Equation (1). The chemical similarity of the two drugs is 0.82. The two drugs have two common target proteins: Beta-1 adrenergic receptor (P08588) and Beta-2 adrenergic receptor (P07550). The unsigned drug biological similarity of the two drugs is 0.67. However, the action modes of the two drugs are opposite. Dobutamine acts as an agonist on the two target proteins, while Betaxolol acts as an antagonist. The top-10 prediction results of the two drugs are presented in Table 4. The prediction results of the two drugs differed substantially and there was only one common side-effect: nausea. Nausea is a very common side-effect and may be caused by most drugs. This difference is consistent with the known side-effect similarity of the two drugs, which is 0.083. Focusing on the false prediction results of Dobutamine, we found that insomnia is a known side-effect of Pindolol (DB00960), which is an oral beta blocker that is used to treat hypertension. Dobutamine and Pindolol have the same action modes on the common target proteins: the Beta-1 and Beta-2 adrenergic receptors. The chemical similarity of Dobutamine and Pindolol is 0.873. Dobutamine [36] belongs to a class of beta agonists that typically have mild to moderate adverse effects, which include increased heart rate and insomnia.

Performance Comparison among Embedding Dimensions
To examine the impact of the embedding size on the prediction performance, we compared our two models with various dimensions of drug embeddings in terms of AUROC. The results are presented in Figure 11. With the increase of the embedding dimension, the side-effect prediction performance initially increased and subsequently decreased. If the embedding dimension is small, the embeddings may lose too much information. If the embedding dimension is very large, the embeddings may incorporate redundant information and tend to overfit. The two methods performed optimally when the embedding dimension was 32. RW-SHIN outperformed RW-HIN at the same embedding dimensions. This supports our assumption that the modes of action of drugs on target proteins can provide useful information for drug embeddings.

Conclusions and Future Work
In this work, we proposed an improved drug side-effect prediction method by integrating modes of action of drugs on target proteins into drug information. We proposed a side-effect prediction method that learns more comprehensive drug embeddings based on drug chemical profiles, unsigned drug-target protein associations, signed drug-target protein associations, and drug side-effect profiles. To the best of our knowledge, no previous work considers modes of action of drugs on target proteins in the context of drug side-effect prediction. To formulate this feature of drug information, identical or opposite action modes were defined as positive or negative edges in a signed graph and formed a signed heterogeneous information network with other drug information. We used two bias random walk procedures to obtain drug sequences in the signed HIN and utilized the Skip-gram framework to learn drug embeddings based on the obtained drug sequences. In the cross-validation experiments, all results demonstrate that the prediction performance improved substantially using our method. According to the case study on the drugs Betaxolol and Dobutamine, our method can predict both existing and novel drug-side-effect associations. Moreover, the results of the case study support our hypothesis that action modes are meaningful in side-effect prediction.
In the model, chemical structures, target proteins, action modes, and known side-effects were integrated into a unified framework for learning drug embeddings for side-effect prediction. However, sometimes the modes of action of drugs on target proteins are not always available and complete for all drugs. It has been proven that side-effects of drugs are related to the action modes. Consequently, in our future work, we will utilize known side-effects to predict modes of action of drugs and to further enrich the action mode information. We will also extend the datasets of drugs by mining from electronic medical records [37].