A novel drug-drug interaction prediction method based on a graph attention network

Abstract: With the increasing need for public health and drug development, combination therapy has become widely used in clinical settings. However, the risk of unanticipated adverse effects and unknown toxicity caused by drug-drug interactions (DDIs) is a serious public health issue for polypharmacy safety. Traditional experimental methods for detecting DDIs are expensive and time-consuming. Therefore, with the growing availability of data and advances in artificial intelligence, many computational methods have been developed in recent years to predict DDIs. In silico methods have proven effective in predicting DDIs, but detecting potential interactions, especially for newly discovered drugs without an existing DDI network, remains a challenge. In this study, we propose a DDI prediction method named HGA-DDI based on graph attention networks. We consider the differences in mechanisms between DDIs and add learning of semantic-level attention, which can focus on advanced representations of DDIs. By treating interactions as nodes and the presence of a shared drug as edges, and by constructing small subnetworks during training, we effectively mitigate potential bias issues arising from limited data availability. Our experimental results show that our method achieves an F1-score of 0.952, proving that our model is a viable alternative for DDI prediction. The code is available at: https://github.com


Introduction
Drug-drug interactions (DDIs) are changes in the effect of one drug due to the presence of another drug [1]. A DDI can promote efficacy or reduce side effects, but it can also affect drug absorption or produce adverse side effects. With the development of drugs, combination therapy is widely used clinically, and the resulting DDI data naturally form network-structured data. Although graph neural networks can learn from known graph structures, handling unknown graph structures remains challenging. To overcome this limitation, researchers developed graph attention networks [30]. In DDI research, Nyamabo et al. [31] proposed a graph attention network model based on drug substructures, while Feng et al. [32] proposed a graph attention network model based on chemical molecular graph calculations. Both studies demonstrate the potential of graph attention networks in DDI research.
Although methods based on drug features have made progress in DDI research and have been confirmed feasible as in silico methods, some limitations remain. First, deep learning methods are mostly trained on independent samples, and a large amount of data is required to discover the similarity and correlation between samples. Second, network-based models lack the ability to mine advanced representations, and some methods using graph structures cannot make predictions for a drug outside the network. Third, for new drugs, many methods are unable to extract features and predict related interactions.
In this study, we propose a novel DDI prediction model based on heterogeneous graph attention networks, named HGA-DDI. HGA-DDI uses DDIs as nodes and shared drugs as edges. To accommodate predictions for new drugs, we use only the substructure molecular fingerprints from PubChem as features. To strengthen the model's attention to advanced features, we use the node-level and semantic-level attention mechanisms originally used in heterogeneous graph attention networks. Our experimental results show that this new model achieves good performance.
In summary, the major contributions of this work are:
• We develop a better-performing graph attention DDI prediction method. This is the first attempt to apply a heterogeneous graph attention network algorithm to predict DDIs based on drug molecular fingerprints. Moreover, the model performs relatively well, which can help with research on new drugs.
• In terms of algorithmic innovation, this work successfully applies a heterogeneous-network method to a homogeneous network. Although the graph structure of drug interactions is a homogeneous network, multiple different mechanisms exist between drug interactions, making it critical to attend to advanced representations that conventional graph attention network methods cannot capture. The semantic-level attention mechanism of this method provides a solution.
• HGA-DDI utilizes only structural features of drugs, and innovatively treats each interaction as a node and the existence of a shared drug as an edge. This allows a graph structure to be constructed from a large amount of existing known data when predicting interactions involving new drugs, without compromising the performance of the model.

Datasets
Our data is sourced from two databases, Drugbank and Twosides. Drugbank [33] consists of two parts, bioinformatics data and cheminformatics data, and integrates a vast amount of drug biochemical data, target structures and other information for drug research. The Twosides [34][35][36] database collects only DDIs and is a sub-database of adverse DDIs derived from the FAERS (FDA Adverse Event Reporting System) database. We screened 1017 small-molecule drugs and 202,304 DDIs that fit the FDA standards and the feature extraction requirements of this work from Drugbank version 5.1.7. Subsequently, we selected 39,813 interactions also recorded in Twosides as positive samples. To facilitate experimental grouping and address the imbalance of actual drug effects, we randomly generated 60,187 negative samples from drug pairs recorded in neither Drugbank nor Twosides, bringing the dataset total to 100,000.
A molecular fingerprint [37] encodes molecule information into a bit string where each bit represents a molecular feature. In this study, we used molecular fingerprints to represent drugs. Through the PubChem database, we extracted the substructure fingerprints of drugs as the learning features. The substructure fingerprint has 881 bits, covering a wide range of different substructures and functional groups. To build the graph data structure, we used the DDIs as nodes of the graph, with edges representing whether a drug is involved in two DDIs. For each DDI, we integrate the features of the relationship by comparing the bit at each position of the two corresponding drugs. If a position is 1 for both drugs, it is set to 1 in the integrated feature. If a position is 0 for both drugs, it is set to 0 in the integrated feature. Otherwise, it is set to 0.5. Finally, each DDI is encoded as an 881-dimensional vector.
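As a minimal sketch (function and variable names are illustrative, not from the paper's released code), the pairwise integration rule reduces to an elementwise average, since (1+1)/2 = 1, (0+0)/2 = 0 and a mismatch gives 0.5:

```python
import numpy as np

def integrate_fingerprints(fp_a, fp_b):
    """Combine two 881-bit substructure fingerprints into one DDI feature.

    Positions where both drugs have 1 -> 1, both 0 -> 0, mismatch -> 0.5.
    The elementwise mean realizes this rule exactly.
    """
    fp_a = np.asarray(fp_a, dtype=float)
    fp_b = np.asarray(fp_b, dtype=float)
    return (fp_a + fp_b) / 2.0

# Toy 4-bit example (real fingerprints have 881 bits)
drug_a = np.array([1, 0, 1, 0])
drug_b = np.array([1, 0, 0, 1])
ddi_feature = integrate_fingerprints(drug_a, drug_b)
```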
To verify the model on unknown drugs with limited data, we divided the data into 200 random sub-nets for batch training and randomly selected 2% of each sub-net to form common validation and testing datasets. The final data distribution is shown in Figure 1.

Prediction framework
The HGA-DDI includes two parts, as shown in Figure 2. Because DDI networks are complex graph structures, we applied the graph attention network [30], which is suitable for graph problems, as the core algorithm. The graph attention network employs an attention mechanism as its main component, eliminating the need for complex matrix computations such as the graph Laplacian; instead, it updates node features from the representations of neighboring nodes. In the graph attention network, the learned weights from a target node to its neighbors differ. The adjacency matrix defines which nodes are relevant, and the relationship weights are computed from the features of both the node and its neighbor. Specifically, the weight of neighbor j for node i is calculated as follows:

e_ij = a(W h_i, W h_j)

α_ij = softmax_j(e_ij) = exp(LeakyReLU(e_ij)) / Σ_{k∈N_i} exp(LeakyReLU(e_ik))

where e_ij is the attention coefficient, a is the attention mechanism, W is a shared weight matrix, h_i and h_j are the input feature vectors of nodes i and j, N_i is the neighborhood of node i, and α_ij is the importance of node j to node i.
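Under the standard single-head GAT formulation of [30], this computation could be sketched in dense NumPy as follows (a toy illustration for clarity; the names and shapes are assumptions, not the paper's implementation):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, W, a, adj):
    """One single-head GAT layer (sketch).

    h:   (N, F)  node input features
    W:   (F, Fp) shared projection matrix
    a:   (2*Fp,) attention vector
    adj: (N, N)  binary adjacency with self-loops
    Returns (alpha, h_out): attention weights and updated features.
    """
    z = h @ W
    n = z.shape[0]
    e = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # e_ij = LeakyReLU(a^T [z_i || z_j])
            e[i, j] = leaky_relu(np.concatenate([z[i], z[j]]) @ a)
    e = np.where(adj > 0, e, -np.inf)          # attend only to neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha, alpha @ z                    # softmax weights, new features
```

Real implementations vectorize the pairwise loop and use multiple heads; the loop here simply keeps the e_ij formula visible.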
In this work, we added the semantic-level attention that Wang et al. [38] applied to heterogeneous networks. A meta-path, as defined by Wang et al., is a connection mode of nodes, and each meta-path represents one type of semantic-level information about the nodes. Learning attention over the node embeddings of different meta-paths provides their different importance. Taking the features output by the graph attention layer as the input of the semantic-level attention, the importance w_Φ of each meta-path Φ is calculated as follows:

w_Φ = (1/|V|) Σ_{v∈V} q^T · tanh(W · z_v^Φ + b)

where V is the node set, z_v^Φ is the node-level embedding of node v under meta-path Φ, W is the weight matrix, b is the bias vector and q is the semantic-level attention vector.
After normalization with softmax, β_Φ = exp(w_Φ) / Σ_{Φ'} exp(w_Φ'), the final embedding of the model is:

Z = Σ_{i=1}^{P} β_{Φ_i} · Z_{Φ_i}

The overall process of the graph attention network is as follows:

Algorithm 1: The overall process of the graph attention network.
Input: the graph G = (V, E); the node features {h_i, ∀i ∈ V}; the meta-path set {Φ_1, Φ_2, ..., Φ_P}.
Output: the final embedding Z; the node-level attention weights α; the semantic-level attention weights β.
for each meta-path Φ do
    for each node pair (i, j) do
        Calculate the weight coefficient α_ij^Φ;
    end for
    Calculate the semantic-specific node embedding Z_Φ;
end for
Calculate the weight β_Φ of each meta-path;
Fuse the semantic-specific embeddings Z = Σ_{i=1}^{P} β_{Φ_i} · Z_{Φ_i};
Calculate the cross-entropy loss L = − Σ_{l∈Y_L} Y_l ln(C · Z_l);
Back-propagate and update the parameters of the model;
return Z, α, β.
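A small NumPy sketch of this semantic-level fusion step (illustrative names; a HAN-style softmax over per-meta-path importances, not the authors' code):

```python
import numpy as np

def semantic_attention(Z_paths, W, b, q):
    """Fuse meta-path-specific embeddings (HAN-style, sketch).

    Z_paths: list of (N, F) embedding arrays, one per meta-path
    W: (Fp, F) weight matrix, b: (Fp,) bias, q: (Fp,) attention vector
    Returns the fused embedding Z and the meta-path weights beta.
    """
    # w_phi = (1/|V|) * sum_v q^T tanh(W z_v + b), one scalar per meta-path
    w = np.array([np.mean(np.tanh(Z @ W.T + b) @ q) for Z in Z_paths])
    beta = np.exp(w - w.max())                 # stable softmax
    beta = beta / beta.sum()
    Z = sum(bk * Zk for bk, Zk in zip(beta, Z_paths))
    return Z, beta
```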
After computing the embeddings of the DDI nodes, the second part of HGA-DDI is a Multilayer Perceptron (MLP) classifier that takes the embeddings as input. To train and evaluate the MLP classifier, we performed ten-fold cross-validation: in each fold, a different 10% of the data was held out for testing, while the remaining 90% was used for training. The final performance of the model was obtained by averaging the results over the ten folds.
In this study, we propose two meta-paths: the Interaction Independent Feature Meta-path (IIFM) and the Interaction-Drug-Interaction Meta-path (IDIM). IIFM is represented by a diagonal matrix, indicating that each node uses only its own features; mathematically, IIFM is the identity matrix, A_IIFM = I. IDIM refers to the Interaction-Drug-Interaction Matrix: for a given pair of nodes, if the two interactions share a drug, the corresponding element in IDIM is set to 1; otherwise, it is set to 0. Notably, all diagonal elements of the matrix are set to 1.
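The two meta-path matrices can be constructed as in this sketch (hypothetical helper names; interactions are drug pairs, one per DDI node):

```python
import numpy as np

def build_meta_path_matrices(interactions):
    """Build the two meta-path adjacency matrices over DDI nodes.

    interactions: list of (drug_a, drug_b) pairs, one pair per DDI node.
    IIFM: identity matrix (each interaction uses its own features).
    IDIM: 1 where two interactions share a drug; diagonal set to 1.
    """
    n = len(interactions)
    iifm = np.eye(n)
    idim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            if set(interactions[i]) & set(interactions[j]):
                idim[i, j] = idim[j, i] = 1.0
    return iifm, idim
```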

Baseline models
To validate the overall effectiveness of the model, we compared HGA-DDI against two state-of-the-art open-source drug-drug interaction (DDI) models based on graph attention networks. The first model, SSI-DDI [31], has released all of its training code, while the second model, GNN-DDI [32], has shared its model architecture. We successfully reproduced both models and evaluated them on the SMILES representations of the drug molecules extracted from the test set used in this study.
Additionally, to test the ability of the graph attention network used in HGA-DDI to extract embeddings, we compared the embedding features extracted by five graph-algorithm-based baseline models, using the same MLP classifier for the comparison. The five algorithms are introduced as follows:

1) DeepWalk
DeepWalk [39] is a network-based language-modeling algorithm that utilizes local information obtained from truncated random walks to learn latent representations. It treats walks as the equivalent of sentences and consists of a random walk generator and an update procedure.
2) SDNE
Structural Deep Network Embedding (SDNE) [40] is a semi-supervised deep learning algorithm that incorporates two orders of similarity. The first-order similarity mainly reflects the local characteristics of the graph and is used as supervised information in the supervised component. The second-order similarity mainly reflects the global characteristics of the graph and is used by the unsupervised component.
3) LINE
Large-scale Information Network Embedding (LINE) [41] optimizes a carefully designed objective function and proposes an edge-sampling algorithm that improves both the effectiveness and efficiency of stochastic gradient descent.

4) Node2Vec
Node2Vec [42] learns continuous feature representations of networks and maps nodes to low-dimensional feature representations to maximize the likelihood representation of network neighbor nodes. It defines a flexible notion of a node's network neighborhood, designs a biased random walk procedure and learns to explore a variety of neighbor representations.

5) Struc2Vec
Struc2Vec [43] uses a hierarchy to measure node similarity at different scales and constructs a multi-layer graph to encode structural similarities and generate structural context for nodes.
To validate the rationality of the MLP classifier of HGA-DDI, we compared it with several machine learning methods. They are Support Vector Machines (SVM), Random Forests (RF), Gradient Boosting Decision Tree (GBDT) and K-Nearest Neighbor (KNN) Classifier.

Metrics
To evaluate performance, we use precision (PRE), sensitivity (SEN), specificity (SPE), accuracy (ACC), F1-score and the Matthews correlation coefficient (MCC) as metrics. Their formulas are as follows:

PRE = TP / (TP + FP)
SEN = TP / (TP + FN)
SPE = TN / (TN + FP)
ACC = (TP + TN) / (TP + TN + FP + FN)
F1 = 2 · PRE · SEN / (PRE + SEN)
MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.
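These metrics follow directly from the confusion-matrix counts; a small reference implementation (illustrative, not the authors' evaluation code):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the six evaluation metrics from confusion-matrix counts."""
    pre = tp / (tp + fp)                       # precision
    sen = tp / (tp + fn)                       # sensitivity / recall
    spe = tn / (tn + fp)                       # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)      # accuracy
    f1 = 2 * pre * sen / (pre + sen)           # harmonic mean of PRE and SEN
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"PRE": pre, "SEN": sen, "SPE": spe,
            "ACC": acc, "F1": f1, "MCC": mcc}
```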

Analysis of the comparison with other methods
To validate the effectiveness of the model, we compared the performance of HGA-DDI with SSI-DDI and GNN-DDI models on the same test dataset. The comparison results are in Table 1. In comparison to two state-of-the-art graph attention networks, HGA-DDI demonstrated the best performance across all metrics, showcasing the overall computational superiority of the model. However, during the comparison, it was noted that one limitation of HGA-DDI is its inability to predict the types of drug interactions. This will be addressed and improved upon in our future work.

Sensitivity analysis of graph embedding methods
To verify the graph attention network, we evaluated the five baseline models used to calculate drug graph embeddings on the testing dataset of this work. The comparison results in Table 2 demonstrate the superior performance of HGA-DDI in terms of ACC, SEN, F1-score and MCC, indicating the competence of HGA-DDI in DDI prediction. Because data imbalance can bias other graph embedding algorithms toward encoding positive samples, HGA-DDI does not achieve the best performance on the PRE and SPE metrics. However, on comprehensive metrics such as F1 and MCC, it becomes evident that HGA-DDI is better at distinguishing between positive and negative samples. This discriminative ability highlights the advantage of the heterogeneous graph attention network employed by HGA-DDI. Furthermore, the highest ACC value obtained by our model suggests its accuracy in identifying DDI samples and the effectiveness of the graph embeddings extracted by the graph attention network. Figure 3 provides a visual representation of the comparison results.

Sensitivity analysis of classifier algorithms
To validate the rationale and performance of the MLP classifier, we compared it with several machine learning methods. The results are in Table 3. As in the comparison with other graph embedding algorithms, the MLP classifier did not lead on every metric in Table 3; these slight differences may be attributed to algorithmic error and the learning tendencies of each classifier. However, the MLP classifier performed best in terms of ACC, F1 and MCC, which comprehensively measure the model's predictive ability for both positive and negative samples. This aligns with our expectations, so we consider MLP the most suitable classifier for HGA-DDI among the current options. Figure 4 provides a visual representation of the comparison results.

Analysis of meta-paths
In this work, we propose and use two meta-paths, IIFM and IDIM. The importance weights learned by our method for the two meta-paths are 0.036 for IIFM and 0.964 for IDIM. The visualized results are shown in Figure 5. As shown in Figure 5, the meta-path IDIM is given a much higher weight on our training datasets, which means the method regards IDIM as the most critical meta-path for identifying drug interactions. The experimental results also reflect that IDIM provides more effective features than IIFM, which further confirms the validity of semantic-level attention and the difference in effectiveness between the meta-paths.

Case study
To verify the ability of this method to predict real data, we conducted database and literature studies as case studies. In the database study, to demonstrate the advantage of the model in treating interactions that share a drug as connected by edges, we focused on newly developed drugs related to COVID-19. We collected a total of 1734 COVID-19-related drugs from PubChem and distinguished those first included in PubChem between 2021 and 2022, yielding 57 new drugs. We predicted a total of 98,718 potential relationships between the new drugs and the collected drugs. Using HGA-DDI for prediction, a total of 19,055 interactions were predicted as positive, with 8128 interactions classified as high-confidence samples (predicted probability of being positive greater than 95%).
We statistically analyzed these high-confidence samples to identify the top 20 new drugs most sensitive to drug interactions. The results are in Table 4 (as some compounds have not been named, they are represented by their PubChem ID and molecular formula).

In this work, we propose an interaction prediction method based on the graph attention mechanism, in which learning with a semantic-level attention mechanism is used effectively. The prediction performance of this model is better than the five comparison models on our testing datasets. Moreover, the analysis of meta-path selection verifies the importance of the neighbor-node weights for this problem. Finally, several test cases demonstrate the applicability of our method.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.