A Novel Triple Matrix Factorization Method for Detecting Drug-Side Effect Association Based on Kernel Target Alignment

Drugs usually have side effects, which can endanger the health of patients. Biological and pharmacological experiments can identify potential side effects of drugs, but they are expensive and time-consuming. Computation-based methods have therefore been developed to predict side effects accurately and quickly. To predict potential associations between drugs and side effects, we propose a novel method called the Triple Matrix Factorization- (TMF-) based model. TMF is built from a biprojection matrix and the latent feature matrices of kernels, which are obtained by Low Rank Approximation (LRA). LRA constructs a lower-rank matrix that approximates the original matrix, which not only retains the characteristics of the original matrix but also reduces the storage space and computational complexity. To fuse multivariate information, multiple kernel matrices are constructed and integrated via Kernel Target Alignment-based Multiple Kernel Learning (KTA-MKL) in drug space and side effect space, respectively. Compared with other methods, our model achieves better performance on three benchmark datasets, with Area Under the Precision-Recall curve (AUPR) values of 0.677, 0.685, and 0.680, respectively.


Introduction
Drug treatment of diseases may be accompanied by side effects that endanger the life and health of patients. Therefore, quickly and accurately finding potential drug side effects is an important step in the drug development process. Traditionally, the side effects of drugs are detected through biological and pharmacological experiments, which take a long time and require a huge capital investment. It is therefore necessary to predict the potential side effects of drugs accurately and quickly through computation-based methods [1]. Most computation-based methods for predicting drug side effects use Machine Learning (ML) classification models that predict side effect categories from features extracted from the biochemical characteristics of drugs. ML has been widely used in computational biology, including the prediction of disease-associated microRNAs [2,3] or circRNAs [4], O-GlcNAcylation sites [5], and DNA or RNA methylcytosine sites [6,7], as well as protein function identification [8][9][10][11][12], protein remote homology detection [13], microbiology analysis [14], electron transport proteins [15], drug-target interactions [16], drug-side effect associations [17,18], protein-protein interactions [19,20], and lncRNA-miRNA interactions [21].
Pauwels and Stoven developed a predictive model of drug-side effect associations using Ordinary Canonical Correlation Analysis (OCCA) and Sparse Canonical Correlation Analysis (SCCA) [1,22], whose input features are extracted from the chemical structures of drugs. Cheng and Wang proposed the Phenotypic Network Inference Model (PNIM) [23] to detect new potential drug-side effect associations. Mizutani and Stoven [24] used the co-occurrence of drug profiles and protein interaction profiles to predict side effects. A Support Vector Machine (SVM) was used to build an Adverse Drug Reaction (ADR) predictor based on the chemical structures, biological properties, and phenotypic characteristics of drugs [25]. Zhang et al. [26][27][28] built an ensemble method based on the Integrated Neighborhood-Based Method (INBM) and the Restricted Boltzmann Machine-Based Method (RBMBM). Matrix Factorization- (MF-) based methods have been widely used for link prediction in bipartite networks in systems biology. For example, Neighborhood Regularized Logistic Matrix Factorization (NRLMF) [29], Collaborative Matrix Factorization (CMF) [30], and Graph Regularized Matrix Factorization (GRMF) [31] were developed on the basis of MF theory to predict drug-target interactions.
In this study, we develop a Triple Matrix Factorization- (TMF-) based model to identify associations between drugs and side effects. TMF employs a biprojection matrix and two latent feature matrices (from drug and side effect space) to estimate the strength of new drug-side effect associations. The latent feature matrices are built via Low Rank Approximation (LRA), which constructs a lower-rank matrix that approximates the original matrix. To improve prediction performance, Kernel Target Alignment-based Multiple Kernel Learning (KTA-MKL) is used to integrate multiple kernel matrices in drug and side effect space, respectively. Our method can thus fuse multivariate information (multiple kernels) and obtain new associations through matrix projection. Compared with other existing methods, our model achieves better performance on three benchmark datasets.

Method
2.1. Problem Description. The drug-side effect association dataset can be regarded as a bipartite network with $n$ drugs and $m$ side effects. The relationships between drugs and side effects are represented by an $n \times m$ adjacency matrix $Y \in \mathbb{R}^{n \times m}$. $D = \{d_1, d_2, \dots, d_n\}$ and $S = \{s_1, s_2, \dots, s_m\}$ are the drug and side effect sets, respectively. $Y_{i,j} = 1$ if drug $d_i$ and side effect $s_j$ are associated; otherwise, $Y_{i,j} = 0$. The associations between drugs and side effect terms are shown in Figure 1, where solid lines link the known drug-side effect associations, and hollow circles and filled squares denote drugs and side effects, respectively. Predicting new associations is therefore a recommendation task.
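As a minimal sketch (Python with NumPy, not the authors' code), the adjacency matrix $Y$ can be built from a list of known (drug, side effect) pairs. Row $i$ of $Y$ is then the side effect profile $pr_{d_i}$ and column $j$ is the drug profile $pr_{s_j}$. The function name `build_adjacency` and the toy identifiers are hypothetical.

```python
import numpy as np

def build_adjacency(pairs, drugs, side_effects):
    """Return the n x m matrix Y with Y[i, j] = 1 for each known (drug, side effect) pair."""
    d_idx = {d: i for i, d in enumerate(drugs)}
    s_idx = {s: j for j, s in enumerate(side_effects)}
    Y = np.zeros((len(drugs), len(side_effects)), dtype=int)
    for d, s in pairs:
        Y[d_idx[d], s_idx[s]] = 1
    return Y

# Toy example with hypothetical identifiers.
Y = build_adjacency([("d1", "s2"), ("d2", "s1"), ("d2", "s3")],
                    drugs=["d1", "d2"], side_effects=["s1", "s2", "s3"])
pr_d2 = Y[1]      # side effect profile of drug d2
pr_s2 = Y[:, 1]   # drug profile of side effect s2
```

All unknown pairs stay 0, which is why the task is framed as recommending high-scoring zero entries.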

Drug Kernels and Side Effect Kernels.
To predict the associations between drugs and side effects, we need to describe the relationships among drugs (and among side effects). In this study, we build different kernels (similarity matrices) for this purpose. In drug space, an 881-bit fingerprint of chemical substructures is employed to encode the drug chemical structure, as shown in Figure 2; each bit indicates whether a substructure is present (1) or absent (0). In addition, the known links between drugs and side effect terms (the side effect profile of a specific drug) are used to represent the information of the underlying network, as shown on the right side of Figure 1. In side effect space, the drug profile of a side effect likewise represents the underlying network of side effects.
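For drugs $d_i$ and $d_k$, the GIP kernel is defined as follows:

$$K_{\text{GIP-link},d}(d_i, d_k) = \exp\left(-\gamma \left\| pr_{d_i} - pr_{d_k} \right\|^2\right)$$

where $\gamma$ is the bandwidth of the Gaussian kernel, set to 1 in our study, and $pr_{d_i}$ and $pr_{d_k}$ are the side effect profiles of drugs $d_i$ and $d_k$, respectively. The COS kernel is defined as follows:

$$K_{\text{COS-link},d}(d_i, d_k) = \frac{pr_{d_i}\, pr_{d_k}^T}{\left\| pr_{d_i} \right\| \left\| pr_{d_k} \right\|}$$

[Figure 1: The associations (links) between drugs and side effects, the adjacency matrix of drugs and side effects (matrix $Y_{train}$), the side effect profile for drug $d_1$ ($pr_{d_1}$), and the drug profile for side effect $s_2$ ($pr_{s_2}$).]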
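The two kernels above can be sketched row-wise over any profile matrix, assuming NumPy and treating each row as one profile. This is an illustrative sketch, not the authors' implementation; the function names are hypothetical, and giving all-zero profiles a cosine similarity of 0 is our own assumption, since the text does not cover that case.

```python
import numpy as np

def gip_kernel(P, gamma=1.0):
    """GIP kernel between the rows of P: exp(-gamma * ||p_i - p_k||^2)."""
    sq = np.sum(P ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * P @ P.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values caused by round-off
    return np.exp(-gamma * d2)

def cos_kernel(P):
    """Cosine similarity between the rows of P; all-zero rows get 0 (an assumption)."""
    norms = np.linalg.norm(P, axis=1)
    denom = np.outer(norms, norms)
    safe = np.where(denom > 0, denom, 1.0)  # the numerator is 0 wherever denom is 0
    return (P @ P.T) / safe

P = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0]], dtype=float)
K_gip = gip_kernel(P)
K_cos = cos_kernel(P)
```

With `Y_train` as input, `gip_kernel(Y_train)` gives a drug kernel from side effect profiles and `gip_kernel(Y_train.T)` a side effect kernel from drug profiles; the fingerprint matrix can be passed in the same way.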

BioMed Research International
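The Corr kernel is calculated as follows:

$$K_{\text{Corr-link},d}(d_i, d_k) = \frac{\operatorname{Cov}\left(pr_{d_i}, pr_{d_k}\right)}{\sqrt{\operatorname{Var}\left(pr_{d_i}\right) \operatorname{Var}\left(pr_{d_k}\right)}}$$

To further describe the degree of dependence between two profiles viewed as random variables, we also use Mutual Information (MI) as a similarity measure:

$$K_{\text{MI-link},d}(d_i, d_k) = \sum_{u} \sum_{v} f(u, v) \log \frac{f(u, v)}{f(u)\, f(v)}$$

where $f(u)$ ($f(v)$) denotes the observed relative frequency of value $u$ ($v$) in profile $pr_{d_i}$ ($pr_{d_k}$), and $f(u, v)$ is the observed joint relative frequency of the pair $(u, v)$. Similarly, the kernels of the fingerprint (drug space: $K_{\text{GIP-chem},d}$, $K_{\text{COS-chem},d}$, $K_{\text{Corr-chem},d}$, and $K_{\text{MI-chem},d}$) and of the drug profile of side effects (side effect space: $K_{\text{GIP-link},s}$, $K_{\text{COS-link},s}$, $K_{\text{Corr-link},s}$, and $K_{\text{MI-link},s}$) can be constructed with the above functions. In total, the drug space has 8 kernels and the side effect space has 4 kernels, as listed in Table 1.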
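A matching sketch for the Corr and MI kernels, assuming binary (0/1) profiles stored as the rows of a NumPy array. The natural logarithm, and the value 0 assigned to constant profiles in the correlation kernel, are our assumptions; the function names are hypothetical.

```python
import numpy as np

def corr_kernel(P):
    """Pearson correlation between the rows of P; constant rows get 0 (an assumption)."""
    Pc = P - P.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Pc, axis=1)
    denom = np.outer(norms, norms)
    safe = np.where(denom > 0, denom, 1.0)
    return (Pc @ Pc.T) / safe

def mutual_information(x, y):
    """MI of two binary vectors from observed relative frequencies (natural log)."""
    mi = 0.0
    for u in (0, 1):
        for v in (0, 1):
            f_uv = np.mean((x == u) & (y == v))
            if f_uv > 0:  # terms with f(u, v) = 0 contribute 0
                mi += f_uv * np.log(f_uv / (np.mean(x == u) * np.mean(y == v)))
    return mi

def mi_kernel(P):
    """Pairwise MI between the rows of a binary matrix P (symmetric)."""
    n = P.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for k in range(i, n):
            K[i, k] = K[k, i] = mutual_information(P[i], P[k])
    return K

P = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]])
K_corr = corr_kernel(P)
K_mi = mi_kernel(P)
```

Unlike Corr, MI is insensitive to relabeling 0 and 1, so two exactly complementary profiles are maximally correlated in magnitude (Corr = -1) and also share maximal MI.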

Kernel Target Alignment-Based Multiple Kernel Learning.
In our study, the kernel sets for drug space $\mathcal{K}_d = \{K_{1,d}, K_{2,d}, \dots, K_{k_d,d}\}$ and side effect space $\mathcal{K}_s = \{K_{1,s}, K_{2,s}, \dots, K_{k_s,s}\}$ are each combined via multiple kernel learning, where $k_d$ and $k_s$ are the numbers of kernels in drug and side effect space, respectively. A heuristic approach, Kernel Target Alignment-based Multiple Kernel Learning (KTA-MKL) [38,39], is employed to calculate the weight of each kernel. The optimal kernels $K^*_d$ and $K^*_s$ are obtained as follows:

$$K^*_d = \sum_{i=1}^{k_d} \beta_{i,d} K_{i,d}, \qquad K^*_s = \sum_{i=1}^{k_s} \beta_{i,s} K_{i,s} \tag{5}$$

where $\beta_d = \{\beta_{1,d}, \beta_{2,d}, \dots, \beta_{k_d,d}\}$ and $\beta_s = \{\beta_{1,s}, \beta_{2,s}, \dots, \beta_{k_s,s}\}$ are the kernel weights in drug and side effect space, respectively. KTA-MKL estimates the weight of each kernel from the cosine similarity of matrices (shown here for drug space):

$$\operatorname{Align}(P, Q) = \frac{\langle P, Q \rangle_F}{\|P\|_F \, \|Q\|_F}$$

where $\|P\|_F = \sqrt{\langle P, P \rangle_F}$ denotes the Frobenius norm and $\langle P, Q \rangle_F = \operatorname{Trace}(P^T Q)$ is the Frobenius inner product. The kernel alignment value describes the similarity of two kernels. KTA-MKL computes the alignment between the ideal kernel matrix and each drug (or side effect) kernel and normalizes it into a weight:

$$\beta_{i,d} = \frac{\operatorname{Align}\left(K_{i,d}, K_{\text{ideal},d}\right)}{\sum_{j=1}^{k_d} \operatorname{Align}\left(K_{j,d}, K_{\text{ideal},d}\right)} \tag{7}$$

where $K_{\text{ideal},d} = Y_{train} Y_{train}^T \in \mathbb{R}^{n \times n}$ and $K_{\text{ideal},s} = Y_{train}^T Y_{train} \in \mathbb{R}^{m \times m}$ are the ideal kernels of drugs and side effects, respectively, built from the training labels (known associations); $\beta_{i,s}$ is computed analogously with $K_{\text{ideal},s}$.

Triple Matrix Factorization-Based Model.
Inspired by MF [29][30][31][40], the similarity between drugs (or side effects) can be approximated by the inner product of their latent feature vectors:

$$K^*_d \approx A A^T, \qquad K^*_s \approx B B^T$$
where $A \in \mathbb{R}^{n \times r_d}$ and $B \in \mathbb{R}^{m \times r_s}$ are the Low Rank Approximation matrices, and $r_d$ and $r_s$ are the dimensions of the latent feature space in drug and side effect space, respectively. The objective function of TMF is defined as follows:

$$\min_{\Theta} J = \left\| Y_{train} - A \Theta B^T \right\|_F^2 + \lambda \left\| \Theta \right\|_F^2$$

where $\Theta \in \mathbb{R}^{r_d \times r_s}$ is the biprojection matrix and $\lambda$ is the regularization coefficient of $\Theta$. In our study, $\lambda$ is set to 1. Setting $\partial J / \partial \Theta = 0$ gives

$$-2 A^T Y_{train} B + 2 A^T A \, \Theta \, B^T B + 2 \lambda \Theta = 0,$$

$$A^T A \, \Theta \, B^T B + \lambda \Theta = A^T Y_{train} B, \tag{14}$$

where Equation (14) is a (generalized) Sylvester equation in $\Theta$. The final prediction is constructed by

$$Y^* = A \Theta B^T.$$

The overview of our proposed method is shown in Figure 3 and Algorithm 1.

Algorithm 1: Algorithm of our method.
Require: A training matrix $Y_{train} \in \mathbb{R}^{n \times m}$ (known associations); the fingerprint vector $f^{chem}_{d_i} \in \mathbb{R}^{1 \times 881}$ ($1 \le i \le n$) for each drug; two parameters, $r_d$ and $r_s$, for TMF.
Ensure: The prediction $Y^* \in \mathbb{R}^{n \times m}$.
1: Construct the drug and side effect kernels listed in Table 1;
2: Use Equation (7) (KTA-MKL) to obtain the weights $\beta_d$ and $\beta_s$ for drugs and side effects, respectively;
3: Build $K^*_d$ and $K^*_s$ via Equation (5);
4: Calculate $A \in \mathbb{R}^{n \times r_d}$ and $B \in \mathbb{R}^{m \times r_s}$ by Singular Value Decomposition (SVD);
5: Solve Equation (14) (TMF) to estimate $\Theta$;
6: Calculate $Y^* = A \Theta B^T$.
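Steps 2 to 6 of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the KTA weights are taken as alignments with the ideal kernel normalized to sum to 1; the factor $A$ is taken as $U_r \Sigma_r^{1/2}$ from the SVD of the (assumed positive semidefinite) combined kernel, so that $AA^T$ is its best rank-$r$ approximation; and the Sylvester-type equation is solved exactly by diagonalizing $A^TA$ and $B^TB$, which is one possible solver rather than the one used in the paper. All function names are hypothetical.

```python
import numpy as np

def alignment(P, Q):
    """Kernel alignment <P, Q>_F / (||P||_F ||Q||_F), with <P, Q>_F = Trace(P^T Q)."""
    return np.sum(P * Q) / (np.linalg.norm(P) * np.linalg.norm(Q))

def kta_weights(kernels, K_ideal):
    """Heuristic KTA-MKL weights: alignments normalized to sum to 1 (assumes they are positive)."""
    scores = np.array([alignment(K, K_ideal) for K in kernels])
    return scores / scores.sum()

def low_rank_factor(K, r):
    """Return F (size x r) with K ~= F @ F.T from the top-r singular triplets.
    Assumes K is symmetric positive semidefinite, so left singular vectors are eigenvectors."""
    U, s, _ = np.linalg.svd(K)
    return U[:, :r] * np.sqrt(s[:r])

def solve_theta(A, B, Y, lam=1.0):
    """Solve A^T A Theta B^T B + lam * Theta = A^T Y B exactly.
    In the eigenbases of the two Gram matrices the equation becomes element-wise;
    the denominators are positive because Gram eigenvalues are >= 0 and lam > 0."""
    la, U = np.linalg.eigh(A.T @ A)
    lb, V = np.linalg.eigh(B.T @ B)
    C = U.T @ (A.T @ Y @ B) @ V
    return U @ (C / (np.outer(la, lb) + lam)) @ V.T

def tmf(drug_kernels, se_kernels, Y_train, r_d, r_s, lam=1.0):
    """Sketch of Algorithm 1, steps 2-6, for precomputed kernel lists."""
    beta_d = kta_weights(drug_kernels, Y_train @ Y_train.T)  # ideal drug kernel
    beta_s = kta_weights(se_kernels, Y_train.T @ Y_train)    # ideal side effect kernel
    K_d = sum(w * K for w, K in zip(beta_d, drug_kernels))   # K*_d, Equation (5)
    K_s = sum(w * K for w, K in zip(beta_s, se_kernels))     # K*_s, Equation (5)
    A = low_rank_factor(K_d, r_d)
    B = low_rank_factor(K_s, r_s)
    Theta = solve_theta(A, B, Y_train, lam)
    return A @ Theta @ B.T, beta_d, beta_s
```

Because $\Theta$ is only $r_d \times r_s$, the eigendecompositions act on small Gram matrices; the dominant costs are the SVDs of the $n \times n$ and $m \times m$ combined kernels.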

Result
In this section, we use benchmark datasets to evaluate our approach and compare it with other existing methods.

Datasets.
To test the performance of our model, three benchmark datasets are employed in our study: Pauwels's dataset, Liu's dataset, and Mizutani's dataset, which are collected from DrugBank [41], the SIDe Effect Resource (SIDER) [42], KEGG DRUG [43], and PubChem-Compound [44,45]. Table 2 lists the benchmark datasets used in this study.

Evaluation Measurements.
The training adjacency matrix is obtained by randomly setting known associations to 0. In this study, we use 5-fold Cross-Validation (5-CV) and 5-fold local Cross-Validation (5 local CV) to test our method. 5-CV randomly sets known associations to 0 across the whole matrix. 5 local CV is employed to evaluate prediction for new drugs, which do not have any side effect information: it sets entire rows of the adjacency matrix to 0 and tests the corresponding drugs. The Area Under the Precision-Recall curve (AUPR) and the Area Under the Receiver Operating Characteristic curve (AUC) are used to evaluate prediction performance.
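The two masking protocols can be sketched as fold generators, assuming that 5-CV splits the known associations (the ones in $Y$) into five disjoint folds and that 5 local CV splits the drugs (rows) into five disjoint folds. The function name `cv_folds` is hypothetical, and which entries are then scored for AUPR and AUC (for example, masked positives together with unknown pairs) is left open here.

```python
import numpy as np

def cv_folds(Y, n_folds=5, local=False, seed=0):
    """Yield (Y_train, mask) pairs; mask marks the entries hidden in this fold.

    local=False: known associations are shuffled and split into folds (5-CV).
    local=True:  drugs are shuffled and split into folds; whole rows are hidden (5 local CV).
    """
    rng = np.random.default_rng(seed)
    if local:
        units = rng.permutation(Y.shape[0])            # drug (row) indices
    else:
        units = rng.permutation(np.argwhere(Y == 1))   # (row, col) of known associations
    for fold in np.array_split(units, n_folds):
        mask = np.zeros(Y.shape, dtype=bool)
        if local:
            mask[fold, :] = True
        else:
            mask[fold[:, 0], fold[:, 1]] = True
        yield np.where(mask, 0, Y), mask
```

In the local setting every drug is hidden exactly once, so each fold simulates drugs with no known side effects.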

Selecting Optimal Parameters.
In this section, we use grid search to select the optimal $r_d$ and $r_s$. We test values of $r_d$ and $r_s$ from 100 up to their maximum values with a step of 100. The grid search results on Mizutani's dataset under 5-CV are shown in Figure 4; $r_d = 700$ and $r_s = 800$ give the best AUPR on this dataset. In Figure 4, lower AUPR and AUC values are shown in blue and higher values in yellow. We use the same values of $r_d$ and $r_s$ on the other two datasets.
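The grid search can be sketched generically. Here `evaluate` is a hypothetical callback (for example, mean AUPR over the 5-CV folds for a given $r_d$ and $r_s$), and the upper bounds `max_rd` and `max_rs` are assumptions standing in for the unspecified maximum values.

```python
from itertools import product

def grid_search(evaluate, max_rd, max_rs, step=100):
    """Return (best_score, r_d, r_s) maximizing evaluate(r_d, r_s) over a grid with the given step.
    Ties are broken toward larger r_d, then larger r_s (tuple comparison)."""
    grid = product(range(step, max_rd + 1, step), range(step, max_rs + 1, step))
    return max((evaluate(rd, rs), rd, rs) for rd, rs in grid)

# Toy objective peaking at r_d = 700, r_s = 800, used only to exercise the search.
best = grid_search(lambda rd, rs: -(rd - 700) ** 2 - (rs - 800) ** 2, 1000, 1000)
```

Each grid point requires a full cross-validation run, so the step of 100 keeps the number of evaluations manageable.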

Performance of Different Kernels.
We evaluate the performance of multiple kernels versus single kernels on the three datasets. The prediction results are listed in Table 3 and Figure 5.
Table 3 notes: (a) TMF uses the drug fingerprint and the drug profile for side effects. (b) TMF uses the side effect profile for drugs and the drug profile for side effects. (c) TMF uses the drug fingerprint, the side effect profile for drugs, and the drug profile for side effects.
In Table 4, we list the weight of each kernel on the three datasets. The weights of $K_{\text{MI-link},d}$ and $K_{\text{MI-link},s}$ are higher than those of the other kernels, and these two kernels also achieve the best performance. By assigning low weights to less informative kernels, KTA-MKL reduces their influence on the combined kernel.

Comparison with Existing Methods.
To evaluate the performance of the TMF model, we compare it with other methods. The results are listed in Table 5. The compared methods include Neighborhood Regularized Logistic Matrix Factorization (NRLMF) [29], which is also based on Matrix Factorization (MF). The results of the other MF-based models, including Collaborative Matrix Factorization (CMF) [30] and Graph Regularized Matrix Factorization (GRMF) [31], are competitive. Local and Global Consistency (LGC) [18] is our previous work.
3.6. Local CV and Case Study. In some cases, drugs are new and have no known side effect information. The 5 local CV is employed to test side effect prediction for such new drugs. In this section, we also compare TMF with the other methods; the results are listed in Table 6 and Figure 6.
To predict the side effects of a new drug, our model calculates the strength of association between the new drug and every existing side effect. The predicted strength scores of TMF are ranked in descending order; the higher the score, the more likely the association. In this section, we discuss two cases (the drugs caffeine and captopril on Mizutani's dataset) and list their top 10 predicted associations in Tables 7 and 8. The predictions are checked against the masked associations between caffeine (or captopril) and side effects.
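Ranking and checking a single drug's predictions can be sketched as follows, assuming a score matrix `Y_star` and the original (unmasked) matrix `Y_true` as NumPy arrays. Both function names are hypothetical, and the toy values below are illustrative only.

```python
import numpy as np

def top_k_side_effects(Y_star, drug, k=10):
    """Indices of the k side effects with the highest predicted scores for one drug."""
    return np.argsort(-Y_star[drug], kind="stable")[:k]

def check_against_masked(Y_true, drug, ranked):
    """Pair each ranked side effect index with whether it is a true (masked) association."""
    return [(int(j), bool(Y_true[drug, j])) for j in ranked]

Y_star = np.array([[0.1, 0.9, 0.5, 0.7]])
Y_true = np.array([[0, 1, 0, 0]])
ranked = top_k_side_effects(Y_star, 0, k=2)
hits = check_against_masked(Y_true, 0, ranked)
```

A stable sort keeps ties in column order, so repeated runs list equal-scoring side effects identically.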

Running Time.
We also compare the running times of the predictive models; the results are listed in Table 9. CMF runs faster than our method (TMF), LGC, GRMF, and NRLMF on all three datasets, taking 910 seconds on Pauwels's dataset, 757 seconds on Mizutani's dataset, and 846 seconds on Liu's dataset. TMF takes 977, 873, and 929 seconds on these datasets, respectively, which is less than the ensemble model [26].

Conclusion and Discussion
In this study, we develop a Triple Matrix Factorization-based model to predict the associations between drugs and side effect terms. In drug space, several kernels are constructed from the chemical substructure fingerprint and the known side effect-associated subnetwork. The side effect kernels are built from the known drug-associated subnetwork. The kernel functions include GIP, COS, Corr, and MI. These kernels are combined by KTA-MKL in drug and side effect space, respectively, and the integrated drug and side effect kernel matrices are approximated by Low Rank Approximation in the TMF model. In the future, a graph- or hypergraph-embedded MF-based model will be developed to improve the predictive performance of drug-side effect association.

Data Availability
The datasets, code, and corresponding results are available at https://figshare.com/s/10ee9c07123304a0ef82.