Drug–target interaction predictions with multi-view similarity network fusion strategy and deep interactive attention mechanism

Abstract Motivation Accurately identifying drug–target interactions (DTIs) is one of the crucial steps in the drug discovery and drug repositioning process. Currently, many computational models have been proposed for DTI prediction and have achieved significant improvements. However, these approaches pay little attention to fusing the multi-view similarity networks related to drugs and targets in an appropriate way. Besides, how to fully incorporate the known interaction relationships to accurately represent drugs and targets has not been well investigated. Therefore, there is still a need to improve the accuracy of DTI prediction models. Results In this study, we propose a novel approach that employs a Multi-view similarity network fusion strategy and a deep Interactive attention mechanism to predict Drug–Target Interactions (MIDTI). First, MIDTI constructs multi-view similarity networks of drugs and targets from their diverse information and integrates these similarity networks effectively in an unsupervised manner. Then, MIDTI obtains the embeddings of drugs and targets from multi-type networks simultaneously. After that, MIDTI adopts the deep interactive attention mechanism to further learn their discriminative embeddings comprehensively with the known DTI relationships. Finally, we feed the learned representations of drugs and targets to a multilayer perceptron model and predict the underlying interactions. Extensive results indicate that MIDTI significantly outperforms other baseline methods on the DTI prediction task. The results of the ablation experiments also confirm the effectiveness of the attention mechanism in the multi-view similarity network fusion strategy and the deep interactive attention mechanism. Availability and implementation https://github.com/XuLew/MIDTI.


Section 2. Similarity network fusion strategy.
Inspired by BIONIC (Forster et al., 2022), MIDTI integrates the different similarity networks of drugs and targets with a similarity network fusion strategy. The integrated network can accurately reflect the topologies of the underlying original networks and capture functional information. Different from BIONIC, MIDTI adds a multi-view attention mechanism that adaptively learns the importance of features from the different similarity networks.
Here we take the drug similarity networks as an example to demonstrate the fusion process, which mainly has four steps. For step one, we feed the drug similarity networks into MIDTI, which adopts GCNs to encode the different networks separately. The embedding of drugs at the (l+1)-th layer can be formulated as:

H_c^(l+1) = σ( D̃_c^(-1/2) Ã_c D̃_c^(-1/2) H_c^(l) W_c^(l) )

where c ∈ {1, . . ., P} is the index of the similarity network and A_c is its adjacency matrix. Besides, σ(·) is the ReLU activation function, Ã_c = A_c + I with I the identity matrix of matching size, D̃_c is the corresponding degree matrix of Ã_c, W_c^(l) is a trainable linear transformation, and H_c^(l) ∈ R^(F_m×M) denotes the representations of the M drugs obtained from the l-th GCN layer. F_m is the dimension of the drug representations.
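The GCN propagation step above can be sketched as follows. This is a minimal numpy illustration, not the MIDTI implementation; for readability it stores node features row-wise as (M, F) rather than the paper's (F_m, M) column layout, and the toy matrices are ours.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step on a single drug similarity network.

    A : (M, M) adjacency (similarity) matrix of one view
    H : (M, F_in) node features from the previous layer
    W : (F_in, F_out) trainable weight matrix
    """
    A_tilde = A + np.eye(A.shape[0])            # add self-loops: A~ = A + I
    d = A_tilde.sum(axis=1)                     # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D~^(-1/2)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)       # ReLU activation

# toy check: 3 drugs, 2-dimensional features
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.eye(3, 2)
W = np.eye(2)
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2)
```

In MIDTI one such encoder is applied per similarity network, so P views yield P separate drug embedding matrices.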
In step two, MIDTI extracts the embeddings of drugs from the P drug similarity networks and applies the multi-view attention mechanism to obtain their discriminative features. Specifically, inspired by Hu et al. (2018), MIDTI first adopts a squeeze-and-excitation block to assign weights to the embeddings of drugs and then utilizes a CNN to fuse the embeddings of drugs from the different similarity networks.
The statistic z_c for the c-th network is calculated by:

z_c = (1 / (F_m · M)) Σ_i Σ_j H_c^(l)(i, j)   (3)

where H_c^(l) ∈ R^(F_m×M) is the c-th feature matrix for drugs. The statistic Z for these P similarity networks is generated by

Z = {z_1, z_2, . . ., z_P}   (4)

MIDTI then calculates the view-wise attention (VA) weight for each drug similarity network, which is formulated as

Z_att = σ( W_2 δ( W_1 Z ) )   (5)

where σ(·) and δ(·) are the Sigmoid and ReLU activation functions, respectively, and W_1 and W_2 are learnable matrices. Finally, the view-wise attention weights can also be represented as:

Z_att = {z_(att,1), z_(att,2), . . ., z_(att,P)} ∈ R^(F_m×P)   (6)

where z_(att,c) denotes the learned view-wise attention weight for the c-th drug similarity network.
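A minimal sketch of the squeeze-and-excitation view attention follows. The extracted shapes in the source are partly ambiguous, so this sketch makes one plausible reading: each view is squeezed to a scalar by global average pooling and the excitation produces one gating weight per view; the bottleneck size r and all matrices are illustrative assumptions, not the MIDTI code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def view_attention(H_views, W1, W2):
    """SE-style view-wise attention over P similarity-network embeddings.

    H_views : list of P feature matrices, each of shape (F_m, M)
    W1      : (r, P) bottleneck weights (assumed shape)
    W2      : (P, r) excitation weights (assumed shape)
    Returns a length-P vector of view weights in (0, 1).
    """
    # squeeze: global average pooling of each view's feature matrix, Eq. (3)
    Z = np.array([H.mean() for H in H_views])        # (P,)
    # excitation: bottleneck + sigmoid gating, Eq. (5)
    return sigmoid(W2 @ np.maximum(W1 @ Z, 0.0))     # (P,)

rng = np.random.default_rng(0)
P, F_m, M, r = 3, 8, 5, 2
H_views = [rng.normal(size=(F_m, M)) for _ in range(P)]
W1 = rng.normal(size=(r, P))
W2 = rng.normal(size=(P, r))
w = view_attention(H_views, W1, W2)
print(w.shape)  # (3,)
```

The sigmoid keeps every view weight strictly between 0 and 1, so no similarity network is fully discarded during fusion.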
Combining the drug features with the view-wise attention weights, the drug representation from the c-th network can be defined as:

H_(att,c) = z_(att,c) ⊙ H_c^(l)   (7)

where ⊙ denotes element-wise multiplication. MIDTI then employs a 1D-CNN (Kiranyaz et al., 2021) model to obtain the integrated drug representation matrix X_d ∈ R^(F_m×M).
For step three, MIDTI reconstructs the similarity matrix S_d, which is formulated as:

S_d = X_d^T X_d   (8)

For step four, MIDTI adopts the sum of mean squared errors between the reconstructed matrix and each original drug matrix {A_1, A_2, . . ., A_P} as the loss function, defined as follows:

L_d = Σ_{c=1}^{P} MSE( S_d, A_c )   (9)

By minimizing this loss function through backpropagation, MIDTI establishes the integrated drug similarity matrix A_homo_d and the learned drug feature representation X_d after the iterations finish. Similarly, MIDTI establishes the integrated target similarity matrix and target feature matrix, represented as A_homo_t and X_t.
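Steps three and four can be sketched as below. The dot-product decoder for S_d is our assumption (following the BIONIC-style reconstruction the section cites); the toy matrices are illustrative only.

```python
import numpy as np

def reconstruction_loss(X_d, sim_networks):
    """Sum of mean squared errors between a reconstructed similarity
    matrix and each original similarity matrix.

    X_d          : (F_m, M) integrated drug representation
    sim_networks : list of P original (M, M) similarity matrices
    """
    # dot-product decoder (assumed): reconstruct an M x M similarity matrix
    S_d = X_d.T @ X_d
    # sum of per-network mean squared errors
    return sum(np.mean((S_d - A_c) ** 2) for A_c in sim_networks)

rng = np.random.default_rng(1)
X_d = rng.normal(size=(4, 6))                     # F_m = 4, M = 6 drugs
nets = [rng.normal(size=(6, 6)) for _ in range(3)]  # P = 3 views
loss = reconstruction_loss(X_d, nets)
print(loss >= 0.0)  # True
```

Minimizing this loss by gradient descent drives the single integrated representation X_d to explain all P original similarity networks at once.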
MIDTI is trained with the Lookahead optimizer using SGD as the inner optimizer (Zhang et al., 2019). We adopt a grid search strategy to tune the parameters of MIDTI. Specifically, the learning rate is set to 0.1, the number of interactive attention heads is 8, and the embedding size of drugs and targets is 512. The numbers of GCN layers, interactive attention layers and MLP layers are all equal to 3. During training, the dropout value of the deep interactive attention module is 0.1, and the default number of epochs is set to 2000. Furthermore, MIDTI adopts early stopping with a patience of 50.
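The early-stopping rule above can be sketched as follows. This is a generic patience-based stopper, not the MIDTI code; the class name and toy loss sequence are ours, and the patience is shrunk from the paper's 50 to 3 for illustration.

```python
class EarlyStopper:
    """Minimal patience-based early stopping on a validation loss."""

    def __init__(self, patience=50):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best:
            self.best = val_loss       # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=3)
losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.9]  # best loss reached at epoch 2
stops = [stopper.step(l) for l in losses]
print(stops)  # [False, False, False, False, True, True]
```

With the paper's setting (patience = 50, up to 2000 epochs), training halts once 50 consecutive epochs pass without a new best validation loss.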
Zheng's dataset (Zheng et al., 2018) also contains other drug and target information, such as chemical structure, drug side effect, drug substitute, and gene ontology. There are 11,819 DTIs, involving 1,094 drugs and 1,556 targets.
Yamanishi's dataset (Yamanishi et al., 2008) contains four sub-datasets, each corresponding to a family of target proteins: G Protein-Coupled Receptors (GPCR), Enzymes (Enzyme), Ion Channels (IC) and Nuclear Receptors (NR). The numbers of known drugs in GPCR, Enzyme, IC and NR are 223, 445, 210 and 54, respectively, and the numbers of targets in these classes are 95, 664, 204 and 26, respectively. The numbers of known drug-target interactions are 635, 2926, 1476 and 90, respectively.
Section 5. The descriptions of the comparison approaches.
• RF (Pedregosa et al., 2011) is an ensemble learning method for classification whose output is the class selected by most trees. We feed the embeddings of drugs and targets into it for DTI prediction.
• SVM (Chang and Lin, 2011) is a traditional supervised learning approach; we feed the embeddings of drugs and targets directly into it to predict DTIs.
• XGBoost (Chen and Guestrin, 2016) is a powerful machine-learning algorithm that combines the strengths of decision trees and gradient boosting to achieve high predictive accuracy and efficiency.
• GCN (Kipf and Welling, 2016) is a semi-supervised learning approach. Here we feed the drug-target association network into it and learn the embeddings of drugs and targets for predicting other potential drug-target pairs.
• GAT (Veličković et al., 2017) is a graph neural network with an attention mechanism. We feed the drug-target association network into GATs and obtain feature representations for the DTI prediction tasks.
• DTI-CNN (Peng et al., 2020) obtains the embeddings of drugs and targets from heterogeneous networks and infers their interactions with features learned from a denoising auto-encoder model.
• GCNMDA (Long et al., 2020) builds a heterogeneous network for drugs and microbes and then employs a GCN-based framework with a conditional random field (CRF) as well as an attention mechanism to discover entity associations. By changing the input to drugs and targets, it can also predict DTIs.
• MVGCN (Fu et al., 2022) integrates data through a multi-view graph convolutional network and aims to predict links in a biomedical bipartite network.
• MMGCN (Tang et al., 2021) employs a multi-view multichannel attention graph convolutional network to predict miRNA-disease associations.

Section 6. Comparison results on different datasets.
In this section, we compare MIDTI with other baselines on three different datasets: Luo's (Luo et al., 2017), Yamanishi's (Yamanishi et al., 2008) and Zheng's (Zheng et al., 2018). The results on Luo's dataset are displayed in Sections 3.2 and 3.3 of the revised manuscript. Yamanishi's dataset consists of four sub-datasets: GPCR, Enzyme, IC and NR. Here, we also conduct comprehensive evaluation experiments with the ACC, AUC and AUPR metrics; the results under the 1:1, 1:5 and 1:10 ratios are presented in Tables S1, S2 and S3, respectively. The results of MIDTI and the comparison approaches on Zheng's dataset are presented in Table S4. Specifically, on Zheng's dataset, the ACC, AUC and AUPR values under the 1:1 ratio are 0.8886, 0.9546 and 0.9497; the results under the 1:5 and 1:10 ratios are also shown in Table S4. On the whole, MIDTI achieves the best performance on this dataset.
The results of this experiment demonstrate that MIDTI has a powerful ability to discover novel interactions between drugs and targets.

Section 7. Parameter analysis experiments.
In this section, we discuss the sensitivity of several parameters of MIDTI. These parameters mainly include the embedding size, the learning rate, the number of interactive attention heads, the number of GCN layers, the number of interactive attention layers and the number of MLP layers. The corresponding experiment results are all evaluated with ACC, AUC, AUPR, F1 and MCC, and are listed below.
• The embedding size
In this experiment, we investigate the effect of the embedding size on MIDTI. Here, we set the embedding size to 32, 64, 128, 256, 512 and 1024, respectively, and the corresponding results are shown in Fig. S1A. As the embedding size increases from 32 to 512, the performance of MIDTI shows an overall increasing trend, with a brief dip at an embedding size of 64, while its performance decreases as the embedding size increases from 512 to 1024. The best performance is achieved when the embedding size is 512, with values of 0.9340, 0.9787, 0.9701, 0.9370 and 0.8726 on ACC, AUC, AUPR, F1 and MCC, respectively. As a result, we adopt an embedding size of 512 in this study.

• The learning rate
The results of the Lookahead optimizer with the inner optimizer SGD at different learning rates are shown in Fig. S1B. It can be seen that the performance of MIDTI keeps improving as the learning rate increases from 0.01 to 0.1, while it gradually decreases when the learning rate becomes larger than 0.1. When the learning rate is 0.1, MIDTI achieves the highest scores, with values of 0.9340, 0.9787, 0.9701, 0.9370 and 0.8726 for ACC, AUC, AUPR, F1 and MCC, respectively. Therefore, the learning rate of the model is set to 0.1 in this paper.

• The number of interactive attention heads
We vary the number of attention heads from 1 to 16 to analyze the performance of MIDTI. According to the results in Fig. S1C, we find that MIDTI with 8 attention heads achieves the best performance. The ACC, AUC and AUPR values are highest when the number of attention heads is 8, at 0.9340, 0.9787 and 0.9701, respectively.

• The number of GCN layers
To analyze the impact of the number of GCN layers, we vary it from 1 to 4. The ACC, AUC, AUPR, F1 and MCC values are lowest when the number of GCN layers is 1, at 0.9046, 0.9605, 0.9320, 0.9061 and 0.8123. From Fig. S2A, we can see that the model performs best when the number of GCN layers increases to 3, with values of 0.9340, 0.9787, 0.9701, 0.9370 and 0.8726 on ACC, AUC, AUPR, F1 and MCC, respectively. However, the performance of MIDTI decreases as the number of layers increases from 3 to 4. Thus, the number of GCN layers is set to 3.

Fig. S1 The performance of MIDTI under different settings of the embedding size, the learning rate and the number of interactive attention heads.
Fig. S2 The performance of MIDTI under different settings of the number of GCN layers, the number of interactive attention layers and the number of MLP layers.
• The number of interactive attention layers
We investigate the impact of the number of deep interactive attention layers on MIDTI. The number of layers is set to 1, 2, 3 and 4, respectively. The ACC, AUC, AUPR, F1 and MCC values increase as the number of layers changes from 1 to 3, and then decrease when the number of layers is increased to 4. The results in Fig. S2B indicate that MIDTI achieves the best performance when the number of interactive attention layers is 3, with values of 0.9340, 0.9787, 0.9701, 0.9370 and 0.8726 on the five metrics. In this study, the number of interactive attention layers is set to 3.
• The number of MLP layers
An MLP is employed as the classifier to predict DTIs, so it is critical to choose a proper number of layers for it. The number of MLP layers is set to 1, 2, 3 and 4. The corresponding results in Fig. S2C indicate that MIDTI achieves the best performance when the number of MLP layers is 3: the performance of MIDTI keeps improving as the number of layers increases from 1 to 3, while it decreases from 3 to 4. Therefore, the number of MLP layers is set to 3.

Section 8. Comparison results on different similarity fusion strategies.
In this study, we evaluate the performance of different similarity network fusion strategies: the proposed strategy in MIDTI and two alternatives, called MIDTI_ave and MIDTI_pro. For the MIDTI_ave strategy, we take the average of the values from the different networks as the integrated similarity values. For the MIDTI_pro strategy, the integrated similarity value is formulated as S = 1 − ∏_{i=1}^{n} (1 − S_i), where S_i denotes the similarity value from the i-th similarity network.
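The two baseline fusion strategies can be sketched as below. The noisy-OR form 1 − ∏(1 − S_i) is the usual probabilistic fusion and is our reading of MIDTI_pro; the toy matrices are illustrative only.

```python
import numpy as np

def fuse_average(sims):
    """MIDTI_ave: element-wise average of the similarity matrices."""
    return np.mean(sims, axis=0)

def fuse_probabilistic(sims):
    """MIDTI_pro: probabilistic (noisy-OR style) fusion,
    S = 1 - prod_i (1 - S_i), computed element-wise."""
    return 1.0 - np.prod(1.0 - np.asarray(sims), axis=0)

# two toy 2x2 similarity views over the same pair of drugs
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[1.0, 0.6], [0.6, 1.0]])
print(fuse_average([S1, S2]))        # off-diagonal becomes 0.4
print(fuse_probabilistic([S1, S2]))  # off-diagonal becomes 0.68
```

Note that the probabilistic fusion never lowers a similarity below its largest per-view value, whereas averaging can dilute a strong signal from a single view.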
The results on Luo's, Zheng's and Yamanishi's datasets are shown in Table S5. From the results, we find that the proposed similarity network fusion strategy achieves the best performance on all three datasets, which demonstrates the effectiveness of MIDTI in finding DTIs.

Section 9. Case studies.
In practice, accurately discovering the interactions for some common drugs and targets is another effective way to verify the effectiveness of DTI prediction models (Tian et al., 2022). In this section, we selected five typical drugs, Quetiapine, Clozapine, Aripiprazole, Ziprasidone and Amitriptyline, and analyzed the DTI prediction results for these drugs.
Following the same process as previous research (Peng et al., 2020), we exclude all interactions between the selected drugs and their related targets from the training and validation sets and place the test drugs and their related targets in the test set. For Amitriptyline, 23 of the 24 known interactions in Luo's dataset are identified. For Clozapine, 19 of the 20 known interactions are identified. Moreover, all known interactions are identified for Quetiapine, Aripiprazole and Ziprasidone. These results initially testify that MIDTI performs well on DTI prediction.
Besides, similar to Xuan's study (Xuan et al., 2022), we list the top 10 targets with predicted scores for the five drugs and verify the predicted results based on the DrugBank (Knox et al., 2010), DrugCentral (Avram et al., 2021) and PubChem (Kim et al., 2023) databases. DrugBank and DrugCentral are web-enabled databases containing comprehensive drug data covering drug function, drug targets and so on. The PubChem database provides up-to-date bioactive drug-like small molecule data and chemical-target interactions, which are abstracted and curated from the primary scientific databases and literature. The corresponding results are presented in Table S6, which shows the top 10 related targets predicted by MIDTI for the five selected drugs. We can observe that most of the novel DTIs predicted by MIDTI are verified by the different databases. For example, Quetiapine is an atypical antipsychotic medication for the treatment of schizophrenia, bipolar disorder, borderline personality disorder, and major depressive disorder (Tandon, 2003). Among the top 10 target candidates for Quetiapine, only one target, with the gene name NT5C2, is labeled as unproved, meaning that there is no evidence to confirm the interaction. Meanwhile, we also predict the top 10 drugs with predicted scores for three targets, and the results are displayed in Table S7.
The above analysis indicates that MIDTI has the powerful ability to discover potential drug-target interactions, which has essential implications for drug screening and drug repositioning.
Section 10. Time and space complexity analysis.

• Time complexity analysis
The time complexity of MIDTI is crucial for its applicability. Here we analyze it briefly.
There are mainly three steps in MIDTI, as presented in Fig. 1 of the manuscript. The main task of step one is to construct the multi-type network. Suppose the numbers of drugs and targets are M and N. The time complexity for establishing one drug similarity network and one target similarity network is O(M^2) and O(N^2), respectively. Since MIDTI employs encoders (GCNs) to learn the embeddings of drugs and targets from each similarity network, the corresponding time complexity is O(d_1 M E_1 F_1) + O(d_2 N E_2 F_2), where E_1 and F_1 are the number of edges and the embedding size of the drug similarity networks, E_2 and F_2 are the number of edges and the embedding size of the target similarity networks, and d_1 and d_2 are the numbers of drug and target similarity networks, respectively. For merging these embeddings of drugs and targets, the time complexity is O(d_1 M^2) + O(d_2 N^2). Besides, the time complexity for constructing the drug-target bipartite network is O(M × N), and that for constructing the heterogeneous drug-target network is O(M^2 + N^2 + MN). As a result, the total time complexity for step one is O(d_1 M^2 + d_2 N^2 + d_1 M E_1 F_1 + d_2 N E_2 F_2 + M^2 + N^2 + MN). The main task of step two is feature learning from the multi-type networks. MIDTI learns the embeddings of drugs and targets from the four types of networks with time complexity O(M E_1 F_1) + O(N E_2 F_2) + O((M + N) E_3 F_3) + O((M + N) E_4 F_4), where E_3 and F_3 are the number of edges and the output embedding size of the bipartite network, and E_4 and F_4 are the number of edges and the output embedding size of the heterogeneous network.
Step three learns the final embeddings of drugs and targets with the deep interactive attention module. Suppose the input embedding size for drugs and targets is 3l and the final output embedding size is F_m. The time complexity of the MHA operation is O((3l)^2 F_m), that of the linear layer is O((3l)(F_m)^2), and the total time complexity of the SA/DTA operations is O(6l(F_m)^2 + 9l^2 F_m). The time complexity of running one interactive attention layer is O(12l(F_m)^2 + 18l^2 F_m). Moreover, MIDTI performs (n + 1) interactive attention layers, with time complexity O(3l(n + 1)^2 F_m^2). As a result, the total time complexity of step three is O((M + N)(12l + 3l(n + 1)^2) F_m^2 + 18l^2 F_m), where M and N are the numbers of drugs and targets, and l is a constant denoting the number of GCN layers in step two.
In summary, since F_1, F_2, F_3, F_4 and F_m are all constants, the total time complexity can be written as O(M E_1) + O(N E_2) + O(M^2 + N^2 + MN) + O(M E_1) + O(N E_2) + O((M + N) E_3) + O((M + N) E_4) + O((M + N)(n + 1)^2). Since E_1 is smaller than M^2, E_2 is smaller than N^2, and E_3 and E_4 are both smaller than (M + N)^2, the total time complexity can be written as O(M^3 + N^3 + (M + N)^3 + (M + N)(n + 1)^2). Since n, the number of interactive attention layers, is also a constant, the final time complexity of MIDTI is O(M^3 + N^3). The running time for executing one iteration of MIDTI is about 0.5 seconds.

• Space complexity analysis
In this study, there are mainly M drugs and N targets. The sizes of each drug and target similarity matrix are M^2 and N^2, and the size of the drug-target interaction matrix is M × N. Assuming each similarity value and each interaction value occupies one Byte of storage, the storage space for these networks is d_1 M^2 Bytes, d_2 N^2 Bytes and MN Bytes, respectively.
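A back-of-the-envelope sketch of this storage estimate follows, assuming one Byte per matrix entry as above; the numbers of drugs, targets and similarity views are arbitrary illustrative values, not from the paper.

```python
def network_storage_bytes(M, N, d1, d2):
    """Rough storage estimate (in Bytes) for the similarity and
    interaction matrices, at one Byte per matrix entry.

    M, N   : numbers of drugs and targets
    d1, d2 : numbers of drug and target similarity networks
    """
    drug_sims = d1 * M * M      # d1 drug similarity matrices, M x M each
    target_sims = d2 * N * N    # d2 target similarity matrices, N x N each
    interactions = M * N        # one M x N drug-target interaction matrix
    return drug_sims + target_sims + interactions

# hypothetical sizes: 100 drugs, 200 targets, 3 drug views, 4 target views
print(network_storage_bytes(100, 200, 3, 4))  # 210000
```

The similarity matrices dominate the footprint, growing quadratically in M and N, while the interaction matrix only grows as M × N.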

Table S1. The performance of MIDTI as well as other baseline approaches for predicting drug-target interactions at a ratio of 1:1 on Yamanishi's datasets.

Table S2. The performance of MIDTI as well as other baseline approaches for predicting drug-target interactions at a ratio of 1:5 on Yamanishi's datasets.

Table S3. The performance of MIDTI as well as other baseline approaches for predicting drug-target interactions at a ratio of 1:10 on Yamanishi's datasets.

Table S4. The performance of MIDTI as well as other baseline approaches for predicting drug-target interactions under different ratios on Zheng's dataset.

Table S5. The evaluation results of MIDTI with different similarity network fusion strategies on Luo's, Zheng's and Yamanishi's datasets.

Table S6. The top 10 candidate target proteins of the five selected drugs.

Table S7. The top 10 candidate drugs of the three selected targets.