Revealing Drug-Target Interactions with Computational Models and Algorithms

Background: Identifying possible drug-target interactions (DTIs) has become an important task in drug research and development. Although high-throughput screening is becoming available, experimental validation can cover only a narrow part of the candidate space because of its extremely high cost, low success rate, and long duration. Therefore, various computational models have been exploited to infer DTI candidates. Methods: We introduce relevant databases and packages and provide a comprehensive review of computational models for DTI identification, including network-based algorithms and machine learning-based methods. Specifically, the machine learning-based methods mainly include bipartite local models, matrix factorization, regularized least squares, and deep learning. Results: Although computational methods have significantly improved DTI prediction, these models have their limitations. We discuss potential avenues for boosting DTI prediction accuracy as well as further directions.


Introduction
Drug discovery is a complicated, costly, and low-success process. It is estimated that bringing a new drug from the initial abstract concept to the market takes about 10∼15 years and 0.8∼1.5 billion dollars. Although pharmaceutical companies invest enormous costs and time, only about 10% of drugs pass FDA evaluation every year [1,2]. Nobel Laureate James Black argued that the most solid foundation for new drug discovery is to start from old drugs [3]. Drug repurposing, which repositions existing drugs to find new therapeutic uses for them, can shorten drug research

Flowchart
Various DTI inference algorithms have been designed over the past two decades. These methods usually integrate the datasets provided by Yamanishi et al. [9] and other biological information from various public databases into their computational models, train the models, and finally score the interaction probabilities for unknown drug-target pairs. Figure 1 briefly outlines this flowchart.

DTI Relevant Databases
Experimental data of various kinds provide abundant information for DTI identification and have significantly improved the performance of DTI prediction models. It is feasible to merge DTI data from different databases. To resolve conflicts between data values from different repositories during merging, Liu et al. [10], for example, set a priority for each DTI and gave precedence to the more reliable data source. They merged compound-protein interaction data retrieved from Matador, DrugBank, and STITCH. Matador and DrugBank are manually curated databases. STITCH is a comprehensive repository drawing on four different sources: manually curated databases, experimental validation, text mining, and model prediction. In particular, STITCH assigns each DTI a score ranging from 0 to 1000, which indicates the degree of confidence in the DTI given these four types of evidence. In addition, Liu et al. [10] considered DTIs from Matador and DrugBank to be supported by biochemical experiments and the literature and gave these DTIs the highest score of 1000.
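A priority rule of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Liu et al. [10]: the record layout `(drug, target, source, score)`, the function name `merge_dtis`, the numeric priorities, and the sample identifiers and scores are all assumptions made for the example. Only the 0 to 1000 score range and the rule that Matador and DrugBank entries receive 1000 come from the text.

```python
# Hypothetical priority table: a smaller number means a more trusted source
# (the ordering itself is an assumption made for this sketch).
SOURCE_PRIORITY = {"Matador": 0, "DrugBank": 0, "STITCH": 1}
CURATED_SOURCES = {"Matador", "DrugBank"}
MAX_SCORE = 1000  # top of the STITCH confidence range

def merge_dtis(records):
    """Merge (drug, target, source, score) records, keeping one entry per pair."""
    best = {}
    for drug, target, source, score in records:
        if source in CURATED_SOURCES:
            score = MAX_SCORE  # curated DTIs get the highest score
        # Prefer the more trusted source, then the higher score.
        rank = (SOURCE_PRIORITY[source], -score)
        key = (drug, target)
        if key not in best or rank < best[key][0]:
            best[key] = (rank, source, score)
    return {key: (source, score) for key, (_, source, score) in best.items()}

# Made-up records for illustration only.
records = [
    ("D1", "T1", "STITCH", 850),
    ("D1", "T1", "DrugBank", None),
    ("D2", "T2", "STITCH", 700),
    ("D2", "T2", "STITCH", 920),
]
merged = merge_dtis(records)
```

In this sketch the curated DrugBank record wins the conflict on the pair (D1, T1), and the higher-scoring STITCH record is kept for (D2, T2).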
Luo et al. [11] designed a novel network integration pipeline for DTI prediction, DTINet. DTINet merged DTI data from a multiple-view perspective through the following steps:
Step 1: Extracting related data from different databases: (i) drugs, DTIs, and drug-drug interactions from DrugBank; (ii) proteins and protein-protein interactions from HPRD [12]; (iii) diseases, drug-disease associations, and protein-disease associations from the Comparative Toxicogenomics Database [13]; (iv) side-effects and drug-side-effect associations from SIDER [14].
Step 2: Excluding isolated entities (nodes) that have no edges in the network.
Step 3: Integrating the four types of nodes and six types of associations (edges) from Step 1 and constructing a heterogeneous network.
Step 4: Building multiple similarity networks to further increase the network heterogeneity.
Step 5: Removing homologous proteins or similar drugs from the constructed heterogeneous networks to reduce the potential redundancy in the DTIs: (i) removing the DTIs involving homologous proteins with sequence identity scores larger than 40%; (ii) removing the DTIs involving similar drugs with Tanimoto coefficients larger than 60%; (iii) removing the DTIs involving drugs with Jaccard similarity scores of side effects larger than 60%; (iv) removing the DTIs involving proteins or drugs associated with similar diseases (Jaccard similarity scores larger than 60%); (v) removing the DTIs involving either homologous proteins with sequence identity scores larger than 40% or similar drugs with Tanimoto coefficients larger than 60%.
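The 60% similarity thresholds can be made concrete with a small sketch. The Jaccard score of two sets is the size of their intersection divided by the size of their union, and the Tanimoto coefficient on binary fingerprints has the same form. The helper names, the side-effect profiles, and the way pairs are flagged are assumptions for illustration; the exact filtering procedure of DTINet is not reproduced here.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(profiles, threshold=0.6):
    """Return the pairs of entities whose profiles exceed the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(profiles), 2)
        if jaccard(profiles[x], profiles[y]) > threshold
    ]

# Made-up side-effect profiles for three hypothetical drugs.
profiles = {
    "D1": {"nausea", "headache", "rash"},
    "D2": {"nausea", "headache", "rash", "fever"},
    "D3": {"fever"},
}
flagged = similar_pairs(profiles)
```

Here D1 and D2 share three of four side effects (Jaccard score 0.75), so DTIs involving this pair would be candidates for removal under a 60% threshold.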
Some of the DTI repositories are described as follows:

DrugBank
The DrugBank database [15] (https://www.drugbank.ca/) provides 12,701 drug entries, including 2536 FDA-approved small molecules, 1279 FDA-approved biotech drugs, 130 nutraceuticals, and more than 5822 experimental drugs. DrugBank describes drug details including chemical structures and pharmacological and pharmaceutical information. Further, DrugBank provides 5144 non-redundant proteins linked to these drug entries, with protein details including sequences, structures, and pathways.

SuperTarget
The SuperTarget database [16] (http://insilico.charite.de/supertarget/) provides comprehensive data services and links to nine websites: BindingDB, RCSB PDB, PubChem, UniProt, KEGG, DrugBank, SuperCyp, SIDER, and ConsensusPathDB. It consists of six different types of entities: drugs, proteins, side-effects, pathways, ontologies, and a special subgroup of targets (the cytochromes P450). The database contains 332,828 interactions between 6219 proteins and 195,770 compounds. Drugs can be searched by drug name, PubChem ID, ATC code, and side-effects. Targets can be retrieved by target name, EC number, UniProt name, accession number, PDB ID, and KEGG target ID. In addition, SuperTarget provides 282 drug-target-related pathways, 6532 drug-target-related ontologies, and 63 cytochromes.

ZINC
The ZINC database [18] (http://zinc.docking.org/) is a free and curated collection of commercially available compounds for virtual screening. It provides more than 350,000,000 purchasable compounds in ready-to-dock 3D formats. Compounds can be searched by one or more ZINC IDs, a specific target or targets, component ring names, common compound names, CAS/MDL number, and vendor or catalog-specific code.

IUPHAR/BPS Guide to PHARMACOLOGY
IUPHAR/BPS Guide to PHARMACOLOGY [19] (http://www.guidetopharmacology.org/) is an open-access website that provides an expert-driven guide to pharmacological targets. The website provides interaction information between 9459 ligands and 2917 targets. The ligands include FDA-approved drugs, synthetic organics, inorganics, antibodies, labelled ligands, metabolites, natural products, endogenous peptides, and other peptides. The targets include G protein-coupled receptors, ion channels, nuclear hormone receptors, kinases, catalytic receptors, transporters, enzymes, and other protein targets.

BindingDB
BindingDB [21] (http://www.bindingdb.org/bind/index.jsp) is a web-accessible database that focuses mainly on the interactions between proteins regarded as candidate drug targets and small, drug-like molecules. It contains 1,558,402 binding data between 697,594 small molecules and 7233 protein targets. In addition, it provides 2291 protein-ligand crystal structures for proteins with 100% sequence identity and 5816 crystal structures for proteins with 85% sequence identity.

TTD
Therapeutic Target Database (TTD) [22] (https://db.idrblab.org/ttd) is an open-access website from which different types of biological information can be downloaded, including drug structures, therapeutic targets, pathway information, and drug combinations. The database provides 2104 drug resistance mutations targeting 63 diseases, 758 targets from 12,615 patients of 70 diseases, 629 targets across various tissues from 2565 healthy individuals, 2612 target combinations, and 25,333 multi-target drugs.

MATADOR
The MATADOR database [16] (http://matador.embl.de/) is a manually annotated chemical-protein interaction website. The database differs from other resources in that it provides both direct and indirect interactions between chemicals and proteins, assembled by automated text mining and manual curation. Each interaction can be traced back through the PubMed or OMIM database.

ChEMBL
The ChEMBL database [23] (https://www.ebi.ac.uk/chembl/) is an open-access website about bioactive drug-like small molecules. It provides 15,207,914 activities from 2,275,906 compound records and 12,091 targets. The recorded properties of small-molecule drugs include 2-D structures, calculated properties such as logP, molecular weight, and Lipinski parameters, and abstracted bioactivities such as binding constants, pharmacology, and ADMET.

DCDB
The DCDB database [24] (http://www.cls.zju.edu.cn/dcdb/) summarizes the action patterns of coordinated drugs and provides a theoretical basis for modeling and simulating beneficial drug combinations. It contains 1363 drug combinations involving 904 individual drugs and 805 targets. Furthermore, it provides three types of relevant information: combined activity/indications, drug-drug interactions, and possible mechanism for each drug combination; chemical, pharmaceutical, and pharmacological properties and known molecular targets for each drug; and sequence, function, and affiliated pathway for each drug target.

DTI Relevant Software Packages
Drug and target features are important for unknown DTI classification. Researchers have developed various software packages to extract abundant drug and target features.

RDKit
RDKit [25] (http://www.rdkit.org/) is an open-source cheminformatics software package and descriptor generator for machine learning. The software can compute various features including canonical SMILES, 2D depiction, fingerprinting, chemical reactions, molecular serialization, similarity/diversity picking, and 2D and 3D descriptors for drug molecules. The software has been continuously updated since 2012.

OpenBabel
OpenBabel [27] (http://openbabel.org/wiki/Main_Page) is an open chemical toolbox. The software allows anyone to search, store, analyze, or convert data from various areas including molecular modeling and biochemistry. In addition, it can read, write, and convert over 110 chemical file formats. The software has been continuously updated since 2007.

Rchemcpp
Rchemcpp [28] (http://shiny.bioinf.jku.at/Analoging/) is an efficient web server for finding structural analogs in DrugBank, ChEMBL, and Connectivity Map. These structural analogs are molecular compounds similar to a query compound. Molecule kernels are applied to compute structural similarity based on shared substructures between molecules. Rchemcpp supports various important applications in drug development, for example, prioritizing molecular compounds after screening and reducing adverse side effects in late-stage research and development.

PyDPI
PyDPI [29] (https://sourceforge.net/projects/pydpicao/) is a comprehensive platform for separately computing features of proteins and drugs from amino acid sequences and chemical structures. It provides 42 descriptor types composed of 9890 descriptors for proteins and 13 descriptor types composed of 615 descriptors for drugs. In addition, the platform provides seven molecular fingerprint systems for drugs, including atom pair fingerprints, topological fingerprints, topological torsion fingerprints, electro-topological state fingerprints, Morgan/circular fingerprints, MACCS keys, and FP4 keys.

Rcpi
Rcpi [30] (http://bioconductor.org/packages/release/bioc/html/Rcpi.html) is a freely available molecular informatics toolkit for finding compound-protein interactions. The toolkit is applied to represent complex molecules from proteins and drugs and complex interactions including compound-protein and protein-protein interactions. It can also compute abundant physicochemical and structural features of proteins from amino acid sequences, molecular descriptors of small molecular compounds from their structures, compound-protein interaction and protein-protein interaction descriptors.

KeBABS
KeBABS [31] (http://www.bioinf.jku.at/software/kebabs/) is an R package for analyzing biological sequences, including amino acid, DNA, and RNA sequences, with kernel methods. It implements important kernels for sequence analysis. It can efficiently select hyperparameters by cross validation (CV), nested CV, and grouped CV.

PROFEAT
PROFEAT [32] (http://bidd2.nus.edu.sg/cgi-bin/profeat2016/main.cgi) is an open web server that can extract protein features from amino acid sequences and from the network properties of protein-protein interaction networks. It groups physicochemical and commonly used structural features into six categories composed of 10 features: protein, protein structure, protein-protein interaction pair, protein-ligand interaction pair, small molecule, and biological network. The calculated features include 51 descriptors and 1447 descriptor values, such as amino acid composition, dipeptide composition, Geary autocorrelation, Moran autocorrelation, normalized Moreau-Broto autocorrelation, quasi-sequence-order descriptors and composition, sequence-order-coupling number, and the transition and distribution of different structural and physicochemical properties. In particular, the server can also calculate other autocorrelation descriptors with user-defined properties. The server is continuously updated.

Pse-in-One
Pse-in-One [33] (http://bioinformatics.hitsz.edu.cn/Pse-in-One/download/) is a flexible web server to effectively capture key features of a biological sample (such as protein, DNA, and RNA) from its sequence. It can generate nearly all the possible features for protein, DNA, and RNA through 28 different modes. In addition, it can also generate feature vectors based on user-defined properties.

ProtrWeb
ProtrWeb [34] (http://protrweb.scbdd.com/) is an R package providing various numerical representations of proteins and peptides from amino acid sequences. The package provides eight descriptor categories composed of 22 types of descriptors, which include about 22,700 descriptor values. It can also automatically construct customized descriptors with user-defined properties.

On-Line Tools/Web-Service for DTI Prediction
Stimulated by the increasing interest in DTI identification and the availability of various open data repositories, many online tools have been developed to find new DTIs. These tools can be used without knowledge of the underlying mathematical models or their computational complexity and thus significantly lower the collaboration barriers among researchers from multiple disciplines. Several online tools are described as follows [35].

DrugE-Rank
DrugE-Rank [36] (http://datamining-iip.fudan.edu.cn/service/DrugE-Rank) nicely combines the advantages of feature-based and similarity-based methods with ensemble learning. Its performance is thoroughly validated by three types of main experiments on FDA approved drugs from DrugBank: cross-validation on the drugs before March 2014, independent test on the drugs after March 2014, and independent test on FDA experimental drugs.

DINIES
DINIES [37] (http://www.genome.jp/tools/dinies/) provides integrative analyses by combining various types of heterogeneous data, for example, chemical structures and side effects of drugs and amino acid sequences and domains of target proteins. It can accept any precalculated similarity values of drugs and targets. Users can select different parameters in the supervised learning model and specify weights to integrate different heterogeneous biological data.

Drug2Gene
Drug2Gene [38] (http://www.drug2gene.com) integrates DTI data from 19 public databases. It provides 4,372,290 unified DTIs for targets, most of which contain reported bioactivity data. It aims mainly at finding tool compounds interacting with a given target protein or identifying all known target proteins for a drug.

iGPCR-Drug
iGPCR-Drug [39] (http://www.jci-bioinfo.cn/iGPCR-Drug/) is a sequence-based classifier for inferring the associations between drugs and GPCRs in cellular networking. This high-throughput tool represents a drug compound by a 256-dimensional vector and a GPCR by its pseudo amino acid composition, and then predicts possible drug-GPCR associations with a fuzzy k-nearest neighbor method.

SynSysNet
SynSysNet [40] (http://bioinformatics.charite.de/synsysnet) is an online platform for creating a comprehensive four-dimensional network from 1000 synapse-specific proteins and their small molecules. It provides extensive DTI information for 750 FDA-approved drugs and 50,000 compounds. The approximately 200 pathways involved can be used to explore DTIs.

DTome
DTome [42] (http://bioinfo.mc.vanderbilt.edu/DTome) extracts and integrates four types of interaction data including drug interactions, drug-gene associations, DTIs, and target-/gene-protein interactions. It utilizes web-based query method to find drug candidates and build a DTome network based on four types of interaction data. Additionally, it can analyze and interpret a DTome network based on network analysis and visualization procedures.

PharmMapper
PharmMapper [43] (http://lilab.ecust.edu.cn/pharmmapper/) provides a large pharmacophore database built from target information in DrugBank, BindingDB, TargetBank, and the Potential Drug Target Database (PDTD). The pharmacophore database contains more than 7000 receptor-based pharmacophore models. PharmMapper can automatically find the best mapping pose for a query molecule against these models and list the top N best-fitted hits with their target annotations.

SwissTargetPrediction
SwissTargetPrediction [44] (http://www.swisstargetprediction.ch) can accurately find the target proteins of bioactive small molecules by combining 2D and 3D similarities with known ligands. It can predict protein-small molecule interactions in five different organisms including human, mouse, rat, horse and cow.

TargetNet
TargetNet [45] (http://targetnet.scbdd.com) can predict the activity of a query molecule across 623 human target proteins based on multi-target structure-activity relationship analysis. It generates a DTI profile that serves as a feature vector of the drug for inferring drug mode of action, drug-drug interactions, toxicity classification, and target candidates.

DT-Web
DT-Web [46] (http://alpha.dmi.unict.it/dtweb/) computes recommendations for a query drug combined with domain-specific knowledge representing drug and target similarities. It can find drugs acting simultaneously on multiple target proteins in a multi-pathway environment. The platform is periodically synchronized with the DrugBank database and updated accordingly.

Network-Based Methods
Computational methods for DTI prediction can be roughly classified into four categories: ligand-based approaches, docking approaches, network-based approaches, and machine learning-based approaches. Ligand-based approaches assume that similar drugs tend to bind similar targets and predict underlying DTIs based on ligand similarities. However, their predictions may be unreliable when too few ligands are known for a protein. Docking approaches fully utilize the 3D structures of proteins; however, they cannot find new DTIs when the 3D structures of the proteins are unknown. Network-based and machine learning-based approaches aim to address the limitations of these two types of methods. Network-based methods efficiently predict potential DTIs by integrating graph-based techniques and various biological data.

DSSI
Campillos et al. [47] exploited a drug side-effect similarity-based inference method (DSSI). DSSI consists of three steps:
Step 1: Developing a measure to compute the probability that two drugs share a common target based on the drugs' chemical similarity (2D Tanimoto coefficient, $y$).
Step 2: Measuring the probability that two drugs simultaneously interact with a target based on their phenotypic side-effect similarity ($x$).
Step 3: Designing a sigmoid function with fitted parameters to compute the probability of two drugs sharing the same target from both the chemical similarity and the phenotypic side-effect similarity.
DSSI can find possible DTIs; however, it can only infer potential associations for drugs with known side-effect information, which seriously limits its application.

MTOI
Yang et al. [48] exploited a robust computational model to mine new drug targets based on multiple target optimal intervention solutions (MTOI). MTOI consists of two stages: drug target identification and optimal multi-target control solution inference. In stage 1, MTOI first defined the disease state by combining experimental data from patients and from cells in abnormal conditions, together with the desired state, that is, the normal physiological state to be restored; it then selected the activities of potential drug targets and calculated the median deviation (m.d.) of these activities between the normal and disease states to score underlying drug targets. In stage 2, MTOI added drug reactants to the screened drug targets and obtained a multi-target intervention solution by selecting intensities. MTOI identified underlying drug targets and the solution that best restored an inflammation-related network to a normal state. Figure 2 describes the details.

[Figure 2: MTOI workflow. Stage 1: drug target identification, starting from defining the disease and desired states. Stage 2: optimal multi-target control solution inference.]

NRWRH
Chen et al. [49] assumed that similar drugs tend to interact with similar targets and presented Network-based Random Walk with Restart on the Heterogeneous network (NRWRH), which integrates the drug similarity network, the protein similarity network, and the DTI network into a heterogeneous network. NRWRH computed the interaction probabilities for unknown drug-target pairs with an iterative random walk on this heterogeneous network, in which $\lambda$ is the probability of jumping from the target/drug network to the drug/target network and $\gamma$ is the probability of restarting the walk at the seed nodes. Figure 3 describes the details.

[Figure 3: NRWRH workflow. Elements: drug similarity network, target similarity network, DTI network; transition probability; computing the probability of finding the random walker at node $i$ at a given step.]
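A random walk with restart on a heterogeneous drug-target network can be sketched as follows. The sketch assumes the common iteration $p_{t+1} = (1-\gamma) M^T p_t + \gamma p_0$, where $M$ is a row-stochastic transition matrix built from the drug similarity matrix `S_d`, the target similarity matrix `S_t`, and the DTI matrix `Y`. The function names, the seed weight `eta`, the parameter values, and the toy matrices are assumptions for illustration and are not taken from the original paper.

```python
import numpy as np

def row_normalize(M):
    """Divide each row by its sum; rows that sum to zero stay zero."""
    M = np.asarray(M, dtype=float)
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums > 0)

def transition_matrix(S_d, S_t, Y, lam=0.5):
    """Row-stochastic transition matrix; node order is drugs, then targets."""
    Y = np.asarray(Y, dtype=float)
    # A node with no cross-network edge never jumps, so it keeps weight 1.
    stay_d = np.where(Y.sum(axis=1) > 0, 1.0 - lam, 1.0)[:, None]
    stay_t = np.where(Y.sum(axis=0) > 0, 1.0 - lam, 1.0)[:, None]
    return np.block([
        [stay_d * row_normalize(S_d), lam * row_normalize(Y)],
        [lam * row_normalize(Y.T), stay_t * row_normalize(S_t)],
    ])

def random_walk_with_restart(M, p0, gamma=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - gamma) * M^T p + gamma * p0 until convergence."""
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - gamma) * (M.T @ p) + gamma * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy data: 3 drugs, 2 targets (values are made up).
S_d = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]])
S_t = np.array([[1.0, 0.3], [0.3, 1.0]])
Y = np.array([[1, 0], [1, 0], [0, 1]])

M = transition_matrix(S_d, S_t, Y)
eta = 0.5                      # assumed weight of the target-side seeds
p0 = np.zeros(5)
p0[0] = 1.0 - eta              # seed: query drug 0
p0[3] = eta                    # seed: its known target 0 (node index 3)
p = random_walk_with_restart(M, p0)
target_scores = p[3:]          # steady-state scores of the two targets
```

Targets that are not yet linked to the query drug can then be ranked by their steady-state probabilities.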

DBSI, TBSI, and NBI
Cheng et al. [50] viewed a DTI network as a bipartite graph and developed three DTI prediction methods: Drug-based similarity inference (DBSI), Target-based similarity inference (TBSI), and Network-based inference (NBI). DBSI assumed that a query drug $d_i$ that is similar to known drugs interacting with a target $t_j$ may also associate with $t_j$, and defined a linkage score between $d_i$ and $t_j$ accordingly. TBSI assumed that a query target $t_j$ that is similar to known targets interacting with a drug $d_i$ may also associate with $d_i$, and likewise defined a linkage score between $d_i$ and $t_j$. Given a target $t_j$, NBI defined its score associated with $d_i$ in terms of node degrees in the bipartite graph, where $k(d_o) = \sum_{s=1}^{m} y_{os}$ is the number of targets interacting with $d_o$, and $k(t_l) = \sum_{s=1}^{n} y_{sl}$ is the number of drugs associating with $t_l$.
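The three scores can be sketched with NumPy under commonly used forms, which are stated here as assumptions rather than quoted from [50]: DBSI as a similarity-weighted average of the interaction labels of the other drugs, TBSI as the same average over the other targets, and NBI as a two-step resource diffusion (drugs to targets and back to drugs) normalized by the degrees $k(d_o)$ and $k(t_l)$. The function names and toy matrices are hypothetical.

```python
import numpy as np

def dbsi_scores(S_d, Y):
    """score[i, j] = sum_{l != i} S_d[i, l] * Y[l, j] / sum_{l != i} S_d[i, l]."""
    S = np.asarray(S_d, dtype=float).copy()
    np.fill_diagonal(S, 0.0)                      # exclude the drug itself
    denom = S.sum(axis=1, keepdims=True)
    out = np.zeros((S.shape[0], Y.shape[1]))
    return np.divide(S @ Y, denom, out=out, where=denom > 0)

def tbsi_scores(S_t, Y):
    """Same weighted average, taken over targets instead of drugs."""
    return dbsi_scores(S_t, Y.T).T

def nbi_scores(Y):
    """Two-step resource diffusion on the bipartite DTI graph."""
    Y = np.asarray(Y, dtype=float)
    k_d = Y.sum(axis=1)                           # targets per drug, k(d_o)
    k_t = Y.sum(axis=0)                           # drugs per target, k(t_l)
    by_target = np.divide(Y, k_t, out=np.zeros_like(Y), where=k_t > 0)
    by_drug = np.divide(Y, k_d[:, None], out=np.zeros_like(Y),
                        where=k_d[:, None] > 0)
    W = by_target @ by_drug.T                     # drug-to-drug transfer weights
    return W @ Y                                  # final resource per (drug, target)

# Toy data: 3 drugs, 2 targets (values are made up).
Y = np.array([[1, 0], [1, 1], [0, 1]])
S_d = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]])
S_t = np.array([[1.0, 0.3], [0.3, 1.0]])

dbsi = dbsi_scores(S_d, Y)
tbsi = tbsi_scores(S_t, Y)
nbi = nbi_scores(Y)
```

In this toy example the diffusion conserves the total resource per target: each column of `nbi` sums to the number of drugs initially linked to that target.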

DTINet
Luo et al. [11] integrated various information from multiple heterogeneous networks and presented a novel network integration pipeline for DTI prediction, DTINet. DTINet used a compact feature learning method to handle the noisy, high-dimensional, and incomplete nature of large-scale biological data and obtained low-dimensional but informative vector representations of drugs and targets. Figure 4 describes the details.

[Figure 4: DTINet workflow. Constructing the heterogeneous network based on the following information: drug-drug interactions, drug-disease associations, drug-side-effect associations, drug-drug similarities, drug-protein interactions, protein-disease associations, protein-protein interactions, and protein-protein similarities. Performing a network diffusion algorithm on each network and computing the distribution for each drug (or protein) node. DCA learns a low-dimensional vector representation for all nodes such that their connectivity patterns in the heterogeneous network are best interpreted, by minimizing the difference between the diffusion distributions of the individual networks and the corresponding model distributions and computing low-dimensional vector representations for each drug or protein. Obtaining the low-dimensional feature vectors of drugs and proteins. Computing the best projection from drug space onto protein space via a matrix completion method. Deriving the new interacting targets for a drug based on these targets' geometric closeness to the projected feature vector of the drug.]

Machine Learning-Based Methods
In addition to network-based methods, researchers have exploited numerous machine learning models and algorithms to find missing DTIs. These methods can be roughly classified into five groups: Bipartite Local Model (BLM), regularized least squares, matrix factorization, deep learning, and other methods.

KRM
Yamanishi et al. [9] exploited a Kernel Regression Method (KRM). KRM scored the interaction likelihoods for unknown drug-target pairs through three stages: constructing pharmacological space, learning model based on kernel regression to represent the correlation between chemical/genome space and pharmacological feature space, and calculating feature-based similarity scores. Figure 5 describes the details.

[Figure 5: KRM workflow. Stage 1: constructing the pharmacological space. Stage 2: learning a model based on kernel regression to represent the correlation between the chemical/genomic space and the pharmacological feature space, where the weights $w_i$ are computed by optimizing a loss function.]

BLM
Bleakley et al. [51] proposed a supervised learning-based Bipartite Local Model (BLM) to find a novel linkage between drug $d_i$ and target $t_j$ in the following way:
Step 1: Excluding target $t_j$. For drug $d_i$, listing all its other known targets in the bipartite network and labeling them +1; listing the targets not known to be targeted by $d_i$ and labeling them −1.
Step 2: Finding a classification rule to discriminate the +1-labeled data from the −1-labeled data based on genomic sequence information for the targets.
Step 3: Applying this rule to predict the label of $t_j$ and thus inferring whether there is a linkage between $d_i$ and $t_j$.
Step 4: Fixing the same target $t_j$ and excluding drug $d_i$. Listing all other known drugs interacting with $t_j$ in the bipartite network and labeling them +1; listing the drugs not known to interact with $t_j$ and labeling them −1.
Step 5: Finding a classification rule to discriminate the +1-labeled data from the −1-labeled data based on chemical structure information for the drugs.
Step 6: Applying this rule to predict the label of $d_i$ and thus inferring whether there is a linkage between $d_i$ and $t_j$.
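The six steps can be sketched as two local models, one trained over targets and one over drugs. The sketch makes several assumptions that are not from the source: the classification rule is a kernel ridge regression fitted to the ±1 labels, `K_t` and `K_d` stand for precomputed sequence-based and structure-based similarity matrices, and the two resulting scores are returned separately instead of being combined. All names and numbers are hypothetical.

```python
import numpy as np

def local_score(K, labels, train_idx, query_idx, ridge=1.0):
    """Fit kernel ridge regression on train_idx and score the query node."""
    K_train = K[np.ix_(train_idx, train_idx)]
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(train_idx)),
                            labels[train_idx])
    return float(K[query_idx, train_idx] @ alpha)

def blm_scores(Y, K_d, K_t, i, j, ridge=1.0):
    """Return (drug-side score, target-side score) for the pair (d_i, t_j)."""
    n, m = Y.shape
    # Steps 1-3: fix drug i, exclude target j, learn over the other targets.
    targets = np.array([t for t in range(m) if t != j])
    target_labels = np.where(Y[i] > 0, 1.0, -1.0)
    from_drug = local_score(K_t, target_labels, targets, j, ridge)
    # Steps 4-6: fix target j, exclude drug i, learn over the other drugs.
    drugs = np.array([d for d in range(n) if d != i])
    drug_labels = np.where(Y[:, j] > 0, 1.0, -1.0)
    from_target = local_score(K_d, drug_labels, drugs, i, ridge)
    return from_drug, from_target

# Toy data: 3 drugs, 3 targets (values are made up).
Y = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]])
K_d = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.2], [0.2, 0.2, 1.0]])
K_t = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]])

from_drug, from_target = blm_scores(Y, K_d, K_t, i=0, j=1)
```

Both local models give a positive score for the unknown pair (drug 0, target 1), because target 1 resembles a known target of drug 0 and drug 0 resembles a drug known to hit target 1.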

BLM-NII
Mei et al. [52] incorporated a Neighbor-based Interaction-profile Inferring model (NII) into the BLM to find potential DTIs, especially for new drugs and targets (BLM-NII). BLM-NII can be grouped into five steps: computing the NII, computing the drug and target similarity matrices, learning a local model, computing the interaction probability, and obtaining the final results. Figure 6 describes the details.

LapRLS, NetLapRLS
Xia et al. [53] designed Laplacian regularized least squares (LapRLS) and LapRLS incorporating DTI network (NetLapRLS) to identify underlying DTIs based on a data-dependent manifold regularization model. The details are described in Figure 7.
[Figure 7: LapRLS and NetLapRLS workflow. Steps: computing the drug similarity matrix; computing the diagonal matrices; performing the Laplacian operation; defining the continuous classification function; predicting new DTIs with drug similarities; predicting new DTIs with target similarities.]
Here $K_d \in \mathbb{R}^{n \times n}$ and $K_p \in \mathbb{R}^{m \times m}$ represent two undirected graphs of the drug domain and the protein domain, respectively, including both labeled and unlabeled samples.

RLS-GIP
van Laarhoven et al. [54] assumed that a drug that exhibits a similar pattern of interaction or non-interaction with the targets in a known DTI network is likely to exhibit similar interaction behavior with new targets. The same assumption holds for targets. Based on this assumption, van Laarhoven et al. [54] exploited a Regularized Least Squares (RLS) method combined with the Gaussian Interaction Profile kernel (RLS-GIP). RLS-GIP predicted new DTIs in three steps: separately computing the GIP kernels of drugs and targets; obtaining $K_{\text{chemical},d}$ and $K_{\text{genomic},t}$ by adding a small multiple of the identity matrix and integrating each of them with the corresponding GIP kernel; and predicting DTIs with the RLS classifier. Figure 8 describes the details.

[Figure 8: RLS-GIP workflow. Steps: identifying potential DTIs with the RLS classifier based on the drug and target kernels, respectively; taking the average of the two output values as the predictive result (RLS-avg); computing the Kronecker product of the drug and target kernels.]
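The GIP kernel and the RLS-avg prediction can be sketched as follows. The sketch assumes the usual Gaussian form $K(d_i, d_j) = \exp(-\gamma \|y_{d_i} - y_{d_j}\|^2)$, with the bandwidth normalized by the mean squared norm of the interaction profiles, and the closed-form RLS prediction $\hat{Y} = K (K + \sigma I)^{-1} Y$. It uses the GIP kernels alone and omits the combination with chemical and genomic kernels; the function names, parameter values, and toy matrix are hypothetical.

```python
import numpy as np

def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    gamma = bandwidth / sq_norms.mean()   # assumes at least one known DTI
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def rls_predict(K, Y, sigma=1.0):
    """Closed-form regularized least squares: K (K + sigma I)^{-1} Y."""
    return K @ np.linalg.solve(K + sigma * np.eye(K.shape[0]), Y)

def rls_avg(Y, sigma=1.0):
    """Average of the drug-side and target-side RLS predictions."""
    Y = np.asarray(Y, dtype=float)
    from_drugs = rls_predict(gip_kernel(Y), Y, sigma)
    from_targets = rls_predict(gip_kernel(Y.T), Y.T, sigma).T
    return 0.5 * (from_drugs + from_targets)

# Toy data: 3 drugs, 3 targets (values are made up).
Y = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
K_d = gip_kernel(Y)
scores = rls_avg(Y)
```

In this toy example drug 1 shares target 0 with drug 0, so the unknown pair (drug 1, target 1) scores higher than the unknown pair (drug 1, target 2).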

WNN, WNN-GIP
van Laarhoven et al. [55] developed a weighted nearest neighbor (WNN) method to infer association candidates for new drugs/targets. WNN defined an interaction profile score $y_d^{WNN}$ for a new drug $d$ as a weighted sum of the interaction profiles of the known drugs, sorted by decreasing similarity to $d$: $y_d^{WNN} = \sum_i w_i\, y_{d_i}$, where the weight $w_i$ is computed from a given decay value $T \le 1$ as $w_i = T^{i-1}$. The authors [55] then extended GIP [54] with WNN and exploited WNN-GIP to identify possible association information for new drugs (or targets): for a new drug $d$, WNN-GIP adds $y_d^{WNN}$ as a new row to the original DTI matrix $Y$ and applies GIP to obtain the interaction profile of $d$.
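The decay weighting $w_i = T^{i-1}$ can be sketched as follows, assuming that the known drugs are ranked by decreasing similarity to the new drug. The function name, the similarity values, and the decay value are hypothetical.

```python
import numpy as np

def wnn_profile(similarities, Y, decay=0.8):
    """Weighted sum of known interaction profiles with weights decay**(rank)."""
    similarities = np.asarray(similarities, dtype=float)
    order = np.argsort(-similarities, kind="stable")  # most similar drug first
    weights = decay ** np.arange(len(order))          # T^0, T^1, T^2, ...
    return weights @ np.asarray(Y, dtype=float)[order]

# Similarity of a new drug to three known drugs, and their known profiles.
similarities = [0.2, 0.9, 0.5]
Y = np.array([[1, 0], [0, 1], [1, 1]])
profile = wnn_profile(similarities, Y, decay=0.5)
```

With a decay of 0.5, the most similar drug contributes its full profile, the second contributes half, and the third a quarter, which gives the profile (0.75, 1.5) here.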

Kron-RLS
Pahikkala et al. [56] presented a Kronecker Regularized Least Squares-based method (Kron-RLS) to score unknown drug-target pairs. Given a training set $X$ ($x_i \in X$ is a drug-target pair) and the corresponding labels $y_i$ ($y_i = 1$ if the drug interacts with the target in $x_i$; $y_i = 0$ otherwise), Kron-RLS formulated DTI prediction as minimizing a regularized squared-error objective over prediction functions $f$, in which $\|f\|_k^2$ is the norm of $f$. By the representer theorem, the minimizer can be written as a kernel expansion over the training pairs with coefficients $a_i$, which are obtained by solving a linear system in the kernel matrix $K = K_d \otimes K_t$. Here $K$ covers all drug-target pairs, and $K_d$ and $K_t$ are the kernel matrices of the drugs and targets in the training set.
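A standard way to write this problem, given here as an assumption about the notation rather than a quotation of [56], is $J(f) = \sum_i (y_i - f(x_i))^2 + \lambda \|f\|_k^2$, whose minimizer $f(x) = \sum_i a_i k(x, x_i)$ has coefficients solving $(K + \lambda I) a = y$. The sketch below solves this system directly with an explicit Kronecker product, which is only practical for small matrices; the function name and toy data are hypothetical.

```python
import numpy as np

def kron_rls(K_d, K_t, Y, lam=1.0):
    """Solve (K + lam I) a = vec(Y) with K = K_d (x) K_t and return K a."""
    n, m = Y.shape
    # Row-major pair order: pair (drug i, target j) has index i * m + j.
    K = np.kron(K_d, K_t)
    a = np.linalg.solve(K + lam * np.eye(n * m), Y.reshape(-1).astype(float))
    return (K @ a).reshape(n, m)

# Toy data: 2 drugs, 3 targets; identity kernels make the result easy to check.
Y = np.array([[1, 0, 1], [0, 1, 0]])
scores = kron_rls(np.eye(2), np.eye(3), Y, lam=1.0)
```

With identity kernels and $\lambda = 1$, every label is simply shrunk by a factor of two. With informative kernels, the scores of unknown pairs are pulled toward the labels of similar drug-target pairs.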

KMDR
Kuang et al. [57] assumed that two similar entities tend to link to similar nodes and developed a kernel matrix dimension reduction method (KMDR). KMDR defined a general formulation in which $\mathrm{vec}(Y)$ is the drug-target pair vector, $\hat{Y}$ is the predicted drug-target association score matrix, and $K = V \tilde{\Lambda} V^T$ is a kernel matrix. KMDR exploited three independent sub-algorithms, KMDR-KP, KMDR-KS, and KMDR-avg, to score the interaction probabilities for unknown drug-target pairs. KMDR-avg defined two kernels, $K_d = S_d$ and $K_t = S_t$, scored the unknown drug-target pairs based on each of these two kernels separately, and then calculated the final scores from the two results.

Matrix Factorization
As shown in Figure 9, matrix factorization methods can be used to complete the missing values in the DTI matrix. This type of method first factorizes $Y$ into two matrices $A \in \mathbb{R}^{n \times k}$ and $B \in \mathbb{R}^{m \times k}$ satisfying $AB^T \approx Y$, where the rows of $A$ and $B$ represent the latent feature vectors of drugs and targets, respectively, and $k$ is the number of latent features, with $k \ll n, m$.
KBMF2K designed a deterministic variational approximation method based on a fully conjugate probabilistic model and projected drugs and targets into a unified subspace. Figure 10 illustrates the proposed probabilistic model, where $\Lambda$ and $P_d$ represent the priors and projection matrices for a chosen subspace dimensionality, respectively. The drug kernel matrix $K_d$ is applied to project the drug-target pairs to a low-dimensional space, and $G_d$ consists of the low-dimensional feature representations of drugs. Similarly, $G_t$ can be computed. Finally, the predicted interaction matrix $Y_p$ can be calculated based on $G_d$ and $G_t$.
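The basic factorization $AB^T \approx Y$ can be sketched with alternating least squares. The sketch treats every entry of $Y$ as observed and adds a small ridge term `reg` for numerical stability; both choices, the function name, and the toy matrix are assumptions for illustration and do not correspond to a specific published method.

```python
import numpy as np

def factorize(Y, k=2, reg=0.1, n_iter=50, seed=0):
    """Alternating least squares for Y ~ A @ B.T with a ridge penalty."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    A = rng.normal(scale=0.1, size=(n, k))
    B = rng.normal(scale=0.1, size=(m, k))
    ridge = reg * np.eye(k)
    for _ in range(n_iter):
        # With B fixed, the minimizer is A = Y B (B^T B + reg I)^{-1}.
        A = np.linalg.solve(B.T @ B + ridge, (Y @ B).T).T
        # With A fixed, the minimizer is B = Y^T A (A^T A + reg I)^{-1}.
        B = np.linalg.solve(A.T @ A + ridge, (Y.T @ A).T).T
    return A, B

# Toy rank-2 interaction matrix: 3 drugs, 4 targets (values are made up).
Y = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]])
A, B = factorize(Y, k=2, reg=1e-6)
Y_hat = A @ B.T
```

Each row of `A` is the latent vector of a drug and each row of `B` that of a target. Because the toy matrix has rank 2, two latent features are enough to reproduce it almost exactly.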

PMF
Cobanoglu et al. [59] developed a probabilistic matrix factorization method (PMF) based on a collaborative filtering algorithm. Using a probabilistic model with Gaussian noise, PMF defined the conditional probability of the observed interactions in terms of f(y_ij | a_i b_j^T, σ^2), the Gaussian probability density function for y_ij with mean a_i b_j^T and variance σ^2, and I_ij, an indicator function equal to 1 if y_ij is known and 0 otherwise.
PMF placed zero-mean spherical Gaussian priors on A and B, computed the log-likelihood of A and B, and finally obtained the underlying DTI score matrix from the learned factors.
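For reference, the usual form of probabilistic matrix factorization that matches the symbols defined above can be written as follows. The prior variances σ_A^2 and σ_B^2 and the constant C are notational assumptions introduced here, and the exact expressions in [59] may differ in detail.

```latex
\begin{aligned}
p(Y \mid A, B, \sigma^2) &= \prod_{i=1}^{n} \prod_{j=1}^{m}
  \left[ f\!\left(y_{ij} \mid a_i b_j^{T}, \sigma^2\right) \right]^{I_{ij}}, \\
p(A \mid \sigma_A^2) &= \prod_{i=1}^{n} f\!\left(a_i \mid 0, \sigma_A^2 I\right), \qquad
p(B \mid \sigma_B^2) = \prod_{j=1}^{m} f\!\left(b_j \mid 0, \sigma_B^2 I\right), \\
\ln p(A, B \mid Y, \sigma^2, \sigma_A^2, \sigma_B^2) &=
  -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{m} I_{ij} \left(y_{ij} - a_i b_j^{T}\right)^2
  - \frac{1}{2\sigma_A^2} \sum_{i=1}^{n} \lVert a_i \rVert^2
  - \frac{1}{2\sigma_B^2} \sum_{j=1}^{m} \lVert b_j \rVert^2 + C, \\
\hat{Y} &= A B^{T}.
\end{aligned}
```

Maximizing the log-posterior in the third line is equivalent to minimizing a squared error over the known entries with quadratic penalties on A and B.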

MSCMF
Zheng et al. [60] proposed a Multiple Similarities Collaborative Matrix Factorization (MSCMF) method by integrating matrix factorization, collaborative filtering, and relevant biological information, including chemical structures and ATC codes of drugs and genomic sequences, GO annotations, and the protein-protein interaction network of targets. MSCMF found possible DTIs based on the following seven steps:
Step 1: Building an objective function that minimizes the squared error between Y and AB^T.
Step 2: Introducing a weighted low-rank approximation model to distinguish labeled drug-target pairs from unlabeled pairs, where W is a weight matrix with w_ij = 1 if y_ij is labeled (interacting or non-interacting) and w_ij = 0 otherwise.
Step 3: Applying Tikhonov regularization, with regularization coefficient λ, to avoid overfitting of A and B to the training data.
Step 4: Representing the drug similarity S_d as an approximation by the corresponding pairs of drug feature vectors, and representing the target similarity S_t in the same way with target feature vectors.
Step 5: Linearly combining multiple similarity matrices, where w_d = (w_d^1, w_d^2, ..., w_d^n) and w_t = (w_t^1, w_t^2, ..., w_t^m), and w_d^i and w_t^j are the weights of the individual similarity matrices of drugs and targets, respectively.
Step 6: Developing the entire objective function, where λ_d, λ_t, and λ_w are regularization coefficients. The model can be solved with the alternating least squares algorithm.
Step 7: Computing the interaction probabilities for unknown drug-target pairs.
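The seven steps above can be summarized in equations. The block below is a sketch reconstructed from the step descriptions under stated assumptions: ∘ denotes the element-wise product, ||·||_F is the Frobenius norm, the index k runs over the available similarity matrices, and the normalization constraint on the weights in Step 6 is an assumption rather than something stated in the text.

```latex
\begin{aligned}
\text{Step 1:}\quad & \arg\min_{A,B}\ \lVert Y - AB^{T} \rVert_F^2 \\
\text{Step 2:}\quad & \arg\min_{A,B}\ \lVert W \circ (Y - AB^{T}) \rVert_F^2 \\
\text{Step 3:}\quad & \arg\min_{A,B}\ \lVert W \circ (Y - AB^{T}) \rVert_F^2
   + \lambda \left( \lVert A \rVert_F^2 + \lVert B \rVert_F^2 \right) \\
\text{Step 4:}\quad & S_d \approx AA^{T}, \qquad S_t \approx BB^{T} \\
\text{Step 5:}\quad & S_d = \sum_{k} w_d^{k} S_d^{k}, \qquad S_t = \sum_{k} w_t^{k} S_t^{k} \\
\text{Step 6:}\quad & \arg\min_{A,B,w_d,w_t}\ \lVert W \circ (Y - AB^{T}) \rVert_F^2
   + \lambda \left( \lVert A \rVert_F^2 + \lVert B \rVert_F^2 \right)
   + \lambda_d \Bigl\lVert \sum_{k} w_d^{k} S_d^{k} - AA^{T} \Bigr\rVert_F^2 \\
 & \qquad + \lambda_t \Bigl\lVert \sum_{k} w_t^{k} S_t^{k} - BB^{T} \Bigr\rVert_F^2
   + \lambda_w \left( \lVert w_d \rVert^2 + \lVert w_t \rVert^2 \right),
   \quad \text{s.t. } \sum_{k} w_d^{k} = \sum_{k} w_t^{k} = 1 \\
\text{Step 7:}\quad & \hat{Y} = AB^{T}
\end{aligned}
```

With A fixed, the objective is quadratic in B and vice versa, which is why alternating least squares is a natural solver.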

NRLMF
Liu et al. [61] designed a Neighborhood Regularized Logistic Matrix Factorization method (NRLMF) to model the probability of a drug interacting with a target. NRLMF first modeled the interaction probability p_ij between a drug d_i and a target t_j based on logistic matrix factorization. NRLMF then placed zero-mean spherical Gaussian priors on a_i and b_j and minimized the resulting objective function to calculate the interaction probabilities for unknown drug-target pairs, where σ_d^2 and σ_t^2 control the variances of the Gaussian distributions, with λ_d = 1/σ_d^2 and λ_t = 1/σ_t^2. The final objective function combines these terms with the neighborhood regularization.
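The logistic link used in logistic matrix factorization can be sketched in a few lines of plain Python. The snippet assumes the usual form p_ij = exp(a_i b_j^T) / (1 + exp(a_i b_j^T)); the function name and the example vectors are illustrative assumptions, and the priors and neighborhood regularization of NRLMF are not modeled here.

```python
import math

def interaction_probability(a_i, b_j):
    """Logistic matrix factorization link: sigmoid of the inner product a_i . b_j."""
    score = sum(x * y for x, y in zip(a_i, b_j))
    # Numerically stable evaluation of exp(score) / (1 + exp(score)).
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

# Assumed latent vectors for one drug and one target.
p = interaction_probability([0.5, -0.2], [1.0, 0.3])
```

A zero inner product gives a probability of 0.5, and larger inner products push the probability toward 1.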

DNILMF
Hao et al. [62] extended NRLMF and proposed a Dual-Network Integrated Logistic Matrix Factorization method (DNILMF). DNILMF first modeled the interaction probabilities for drug-target pairs with a logistic function of Z = αAB^T + βS_d AB^T + γAB^T S_t, and then computed the final interaction scores by maximizing an objective function defined in terms of Z, in which ∘ denotes the Hadamard product.
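The role of the two similarity networks in Z can be made concrete with a short NumPy sketch. It assumes that the probabilities are obtained by applying the logistic function element-wise to Z; the function name `dnilmf_probabilities` and all numeric values are illustrative assumptions, and the training objective itself is not implemented.

```python
import numpy as np

def dnilmf_probabilities(A, B, S_d, S_t, alpha, beta, gamma):
    """Element-wise logistic of Z = alpha*AB^T + beta*S_d AB^T + gamma*AB^T S_t."""
    AB = A @ B.T
    Z = alpha * AB + beta * (S_d @ AB) + gamma * (AB @ S_t)
    return 1.0 / (1.0 + np.exp(-Z))

# Assumed toy factors: 2 drugs, 3 targets, 2 latent features.
A = np.array([[0.4, -0.1], [0.2, 0.3]])
B = np.array([[0.5, 0.1], [-0.3, 0.2], [0.0, 0.6]])
S_d = np.array([[1.0, 0.4], [0.4, 1.0]])
S_t = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]])
P = dnilmf_probabilities(A, B, S_d, S_t, alpha=0.5, beta=0.25, gamma=0.25)
```

The β term smooths each drug's scores with those of similar drugs, and the γ term does the same across similar targets; with identity similarity matrices and α + β + γ = 1 the expression reduces to plain logistic matrix factorization.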

DeepDTIs
Wen et al. [63] used a Deep Belief Network (DBN) to infer potential DTIs without classifying each target into different classes. DeepDTIs identified novel DTIs through three steps:
Step 1: Choosing the simplest and most common features to describe drugs and targets, that is, representing chemical compounds with extended connectivity fingerprints and targets with protein sequence composition descriptors.
Step 2: Abstracting feature representations based on the DBN. The DBN used by DeepDTIs consisted of five layers: the first layer (the input layer) holds the calculated features, the second, third, and fourth layers are the hidden layers, and the last layer is the output layer. Given a training sample x, DeepDTIs modeled the joint probability distribution between x and the l hidden layers as a product of conditional distributions and a top-level joint distribution, where x = h^0, P(h^{k−1} | h^k) is the visible-hidden conditional probability distribution at level k, and P(h^{l−1}, h^l) is the visible-hidden joint probability distribution at the top level.
Step 3: Building a classification model with the known labeled DTIs.
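The joint distribution described in Step 2 corresponds to the standard deep belief network factorization. Written with the symbols defined above (the product limits follow from those definitions and are stated here as an assumption about the omitted equation):

```latex
P\!\left(x, h^{1}, \ldots, h^{l}\right)
  = \left( \prod_{k=1}^{l-1} P\!\left(h^{k-1} \mid h^{k}\right) \right)
    P\!\left(h^{l-1}, h^{l}\right),
  \qquad h^{0} = x .
```

Each conditional factor links two adjacent layers, while the top two layers are modeled jointly.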

EENN
Gao et al. [64] developed an End-to-End Neural Network (EENN) model to identify DTI candidates directly from raw chemical structures and amino acid sequences. EENN contained four parts: describing drugs and proteins based on related biological information; projecting drugs and proteins into dense vector spaces by integrating a graph-based convolutional neural network and long short-term memory recurrent neural networks; forming the context matrix for drugs and proteins with an attentive pooling network and computing weighted sums of the context matrix; and predicting the interaction probabilities for unknown drug-target pairs based on inference with a Siamese network. The details are shown in Figure 11.

Stacked Autoencoder
Wang et al. [65] designed a novel computational model to find possible DTIs using a stacked autoencoder, a deep learning model. The proposed method can automatically extract hidden information from raw data and select highly representative features through iterations over multiple layers.
The method can be grouped into four parts: describing each DTI (sample) based on 881 chemical structures of drugs and the position-specific scoring matrix of the protein, reconstructing features with the stacked autoencoder, classifying unknown drug-target pairs with a random forest classifier, and predicting labels for test samples. The details are shown in Figure 12.
In step 2, Wang et al. first encoded the training sample X ∈ R^{d_0} into the hidden representation H ∈ R^{d_1} by the mapping f_c, where J_c is the activation function and W_1 ∈ R^{d_0×d_1} and b_1 ∈ R^{d_1} are the weight matrix and bias vector, respectively. The hidden representation H is then mapped into the output layer Z ∈ R^{d_0} by the mapping f_d, where J_d is the activation function and W_2 ∈ R^{d_0×d_1} and b_2 ∈ R^{d_0} are the weight matrix and bias vector, respectively. The parameters can be learned by minimizing a loss function composed of the reconstruction error Θ_r(X, Z) and the weight decay cost τ. The hidden layer learns the features and reduces the dimension of the original data through this mapping. The highest hidden layer of the autoencoder can be used as the features extracted from the raw data by the stacked autoencoder.
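A single autoencoder layer of this kind can be sketched with NumPy. The sketch assumes sigmoid activations for both J_c and J_d, the forms H = J_c(W_1^T X + b_1) and Z = J_d(W_2 H + b_2) (chosen so that the stated shapes of W_1 and W_2 are consistent), a squared reconstruction error, and an L2 weight decay with a hypothetical coefficient `decay`. None of these choices is confirmed by the text of [65].

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def autoencoder_forward(X, W1, b1, W2, b2):
    """Encode X (length d0) into H (length d1), then decode into Z (length d0)."""
    H = sigmoid(W1.T @ X + b1)  # W1 has shape (d0, d1)
    Z = sigmoid(W2 @ H + b2)    # W2 has shape (d0, d1)
    return H, Z

def autoencoder_loss(X, Z, W1, W2, decay=1e-3):
    """Squared reconstruction error plus an L2 weight decay cost (assumed forms)."""
    reconstruction = 0.5 * np.sum((X - Z) ** 2)
    weight_decay = 0.5 * decay * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return reconstruction + weight_decay

# Assumed toy dimensions: d0 = 4 input features, d1 = 2 hidden units.
rng = np.random.default_rng(0)
X = rng.random(4)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(4)
H, Z = autoencoder_forward(X, W1, b1, W2, b2)
loss = autoencoder_loss(X, Z, W1, W2)
```

In a stacked autoencoder, the hidden representation H of one layer becomes the input X of the next, and the top-level H is passed to the downstream classifier.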

Figure 11 (flowchart text). Step 1: describing drugs and targets with related biological information. Step 2: projecting sequential input to dense vector representations with a long short-term memory (LSTM) recurrent neural network (RNN), producing new hidden and cell states. Step 4: inference with a Siamese network, separately feeding the attention-based vector representations into the two networks.
Figure 12 (flowchart text). Describing a DTI as a training sample X based on drugs and targets; encoding X into the hidden representation P; mapping P into the output layer Z; dividing the sample set M into K disjoint subsets.

RBM
Wang et al. [66] learned associated probabilities of unknown drug-target pairs using a two-layer restricted Boltzmann machine (RBM) where visible units encoded types of DTIs and hidden units represented latent features of DTIs. Figure 13 describes the details.

NetCBP
Chen et al. [67] exploited a semi-supervised learning-based prediction model (NetCBP) combined with network consistency. NetCBP assumed that there existed coherent interactions between drugs ranked based on their correlations to a query drug and targets ranked based on their correlations to the hidden targets of the query drug, and then designed a learning model to maximize the rank coherences relevant to known DTIs. The details are described in Figure 14.
Figure 14 (flowchart text). Computing a Pearson correlation coefficient score.

Discussion
Drug repurposing involves various computational methods [1,3]. Of these techniques, DTI inference is one of the most important foundations [68,69]. In this paper, we summarized the data sources and related representations involved in DTI prediction. We mainly introduced two classes of typical computational models: network-based methods and machine learning-based methods. Both types of models can be applied to target proteins without any known 3D structure information and have obtained effective prediction performance [52,70]. More importantly, almost all the methods can further infer novel DTIs for drugs that interact with at least one target protein [4]. Furthermore, some algorithms can effectively identify DTI candidates for new drug molecules that have no known associations with targets by combining drug similarity, target similarity, and DTIs [4,52,71]. However, a few limitations remain to be solved.
Network-based methods are limited in application because DTI data are severely imbalanced in the relevant datasets, and there are many more unknown drug-target pairs than DTIs in a DTI network [4,72,73]. For example, the ion channel dataset provided by Yamanishi et al. [9] contains 210 × 204 = 42,840 possible drug-target pairs, whereas only 1476 interactions are known. More importantly, a DTI network usually contains several isolated subnetworks, and network-based models are unable to find new association information for orphan drugs (or targets) that have no known interaction data in the DTI network [4,70]. Finally, most of the network-based methods are biased toward the drugs (or targets) that tend to interact with more targets (or drugs) [4,73]. Therefore, network-based methods should be further developed to solve these problems in the future.
Machine learning-based methods have brought clear improvements to DTI prediction. Tables 2 and 3 illustrate the performances of some machine learning-based methods from Refs. [52,61]. Table 2 lists the AUC and AUPR values provided by Mei et al. [52] for KRM, BLM, RLS-GIP, and BLM-NII, all of which are BLM-based methods. The results show that BLM-NII obtained better performance than the other BLM-based methods and indicate that the neighbor-based interaction profile helps to predict new DTIs. Table 3 lists the AUC and AUPR values provided by Liu et al. [61] for NetLapRLS, BLM-NII, WNN-GIP, KBMF2K, and NRLMF, where NetLapRLS and WNN-GIP are regularized least squares-based methods, BLM-NII is a BLM-based method, and the remaining are matrix factorization-based methods. The minor difference in the BLM-NII results between Tables 2 and 3 may be caused by different experimental settings. Matrix factorization models obtain better performance for DTI identification [59][60][61][62]74]. However, this type of method has more parameters to set and is sensitive to them [73]. Although RLS-WNN cannot outperform matrix factorization methods, it is much faster and more robust to parameter selection [73,75]. BLMs handle far fewer unknown drug-target pairs at a time, and thus they exhibit much lower complexity than global algorithms. Furthermore, BLMs usually remain fast and memory-efficient on larger datasets [52,73]. Nevertheless, BLMs cannot deal with the situation in which neither the drug nor the target is included in the training dataset unless they are integrated with other methods, as in BLM-NII [74]. Deep learning-based methods obtained further improvement because of their powerful representation learning ability and are powerful models for DTI prediction [63,65,76,77].
In summary, although various machine learning-based methods have already proven to be effective for DTI identification, several challenges remain.
(i) Most of the supervised learning methods are limited by the negative sample selection problem because there are no experimentally validated non-DTI data. Therefore, this type of method can only randomly select negative DTI data from unknown drug-target pairs. However, these selected negative samples may contain positive DTIs, which severely affects the classification performance and generalization ability of the models [4,10,56,71,73,74].
(ii) Machine learning-based prediction models are usually built and evaluated with an excessively simplified experimental setting. Such settings may deviate from real-world conditions and produce overfitted results [4,74]. In particular, most of the machine learning-based models simply regard a DTI as an on-off association and do not consider other key factors such as quantitative affinities and molecule concentrations [56,74]. Pahikkala et al. [56] have illustrated that at least four factors may lead to overly optimistic prediction results when building and evaluating supervised machine learning-based methods: experimental setting, evaluation data set, problem formulation, and evaluation setup. Therefore, DTI identification should be modeled as a ranking or regression problem rather than a binary classification problem [74].
(iii) When predicting possible DTIs based on binary classification, the classification accuracy is biased because the results are the simple average of two different classification models, which are constructed based on drugs and targets, respectively [4].
(iv) Most of the machine learning-based methods have poor interpretability; therefore, it is difficult to understand the potential mechanism of action of a drug from a pharmacological viewpoint [74].
Although semi-supervised learning methods overcome the negative sample selection limitation by making use of the unlabeled data, they still cannot solve the problem of classifier combination [4].

Conclusions and Further Research
In this section, we offer some suggestions for further research on how to improve DTI prediction performance.

Heterogeneous Data Integration
Most models incorporate chemical and genomic information. In addition, previous works have utilized pharmacological or phenotypic information, such as side-effect data, gene expression information, and other associated data. These data reflect different aspects of drugs and targets and can boost prediction accuracy if used concurrently. However, most existing models are limited to homogeneous information and cannot be directly applied to heterogeneous networks.
Heterogeneous data sources give diverse information and help find possible DTIs from a multi-view perspective. For instance, some genes encoding proteins (targets) are tightly associated with certain diseases, and the therapeutic effects of drugs on these diseases reflect their biological activities on these targets. Therefore, integrating various heterogeneous data sources, such as gene-disease association networks, drug-disease association networks, and metabolic networks associated with specific diseases, can potentially improve accuracy and thus provide new insights.
Although several network-based strategies incorporate heterogeneous data sources and derive the associated scores through network diffusion, most existing models have some limitations and fail to give satisfactory integration paradigms. First, the noisy and high-dimensional nature of biological data easily causes prediction bias. Second, some network-specific information may be lost in the process of integrating multiple different networks into a single network, since edges from multiple heterogeneous networks are mixed indiscriminately in this process. Therefore, designing appropriate models to incorporate multiple relevant heterogeneous data sources remains an open problem.

Reliable Negative Sample Selection
Existing DTI datasets contain a limited number of known DTIs (positive samples) and a massive number of unknown drug-target pairs. In addition, there are no experimentally validated non-DTIs (negative samples), so most of the supervised classification algorithms have no choice but to randomly select unlabeled drug-target pairs as negative samples. However, these randomly selected negative samples may well contain positive DTIs, thereby severely degrading the classification accuracy of supervised learning techniques. Therefore, although extracting positive drug-target pairs from unconfirmed data is an urgent task, designing an effective method to screen negative DTIs is more challenging [10]. To the best of our knowledge, positive-unlabeled learning [71,78,79] can learn high-quality positive samples and reliable negative samples from the unlabeled data and may be one effective way to select strong negative DTIs.
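The idea of picking reliable negatives from unlabeled pairs can be illustrated with a simple centroid-distance heuristic, in the spirit of the first stage of two-step positive-unlabeled learning. This is not the method of [71,78,79]; the function name `reliable_negatives`, the feature representation, and the toy values are assumptions made for illustration only.

```python
import numpy as np

def reliable_negatives(features, is_positive, n_negatives):
    """Return indices of the unlabeled samples farthest from the positive centroid."""
    features = np.asarray(features, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    # Centroid of the known positive drug-target pairs in feature space.
    centroid = features[is_positive].mean(axis=0)
    unlabeled_idx = np.flatnonzero(~is_positive)
    dist = np.linalg.norm(features[unlabeled_idx] - centroid, axis=1)
    # Unlabeled pairs least similar to the positives are the most plausible negatives.
    order = np.argsort(-dist)
    return unlabeled_idx[order[:n_negatives]]

# Assumed toy features: two known positives and three unlabeled pairs.
features = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [5.0, 5.0], [6.0, 5.0]]
is_positive = [True, True, False, False, False]
negatives = reliable_negatives(features, is_positive, n_negatives=2)
```

A classifier would then be trained on the known positives against these selected negatives instead of against randomly drawn unlabeled pairs, and the selection can be iterated.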

Noncoding RNAs as Targets
It is worth considering noncoding RNAs as drug targets. Noncoding RNAs (ncRNAs) [80,81] are a new class of targets. ncRNAs can control gene expression and affect disease progression, which makes them targets in drug research and discovery. ncRNAs comprise multiple functionally important RNAs, including transfer RNA (tRNA), microRNA, intronic RNA, ribosomal RNA (rRNA), long noncoding RNA, and repetitive RNA. Each class of RNA has different endogenous functions, which provides many opportunities for drug discovery and design.
ncRNAs have been considered as targets and have received increasing attention. For example, microRNAs have been well reviewed as candidates for therapeutic targeting [82,83]. Both microRNA mimics and inhibitors are being designed against targets and tested in clinical trials. For instance, the drugs BMN 044/PRO044, BMN 045/PRO045, BMN 053/PRO053, and SRP-4053 can be used to treat Duchenne muscular dystrophy (DMD) by targeting dystrophin pre-mRNA [81]. Recently, research on targeting repetitive RNAs, intronic RNAs, and miRNAs has advanced; however, long ncRNAs, which are regarded as a challenging class of possible drug targets, will require further attention.

Environmental Factors and Genetic Factors
Various studies have reported that associations between genetic factors (GFs) and environmental factors (EFs) can greatly influence phenotypes and diseases [84,85]. The computational modeling of GF-EF interaction prediction considerably enriches our knowledge of the mechanisms of GF-EF interactions. For instance, drugs, one class of important EFs, have been revealed to interact with targets (GFs) [84,85]. Qiu et al. [85] suggested that miRNA biomarker signatures of drugs could be applied to evaluate the effects of cancer treatments. Therefore, the analysis and identification of interactions between drugs and genetic factors could help infer novel indications for FDA-approved drugs.

Deep Learning
In the era of big data, the quantity of biological data is increasing dramatically. The availability of these datasets has promoted the development of various modeling approaches [63,76]. Deep learning is a type of representation-learning method that can be applied to complex tasks involving heterogeneous and high-dimensional datasets. The accumulation of massive drug and target data provides a large number of biomedical features and accelerates the application of deep learning to DTI prediction [77,86]. Although several deep learning methods [63][64][65] are used to identify possible DTIs, many challenges remain in applying and interpreting deep learning, such as selecting appropriate deep architectures and model parameters and coping with the small sample sizes and high dimensionality of the datasets. Therefore, building an appropriate deep model may be one of the efficient ways to improve DTI prediction performance.

Sparse Representation
DTI data in DTI networks are sparse and imbalanced: there is a small number of DTIs and an abundance of unknown drug-target pairs. For example, in the datasets provided by Yamanishi et al. [9], the numbers of DTIs are 2926, 1476, 635, and 90 between 445, 210, 223, and 54 drugs and 664, 204, 95, and 26 target proteins for enzymes, ion channels, GPCRs, and nuclear receptors, respectively. The ratios of known DTIs to all drug-target pairs are 0.0099, 0.0345, 0.03, and 0.0641, respectively. The dataset provided by Wen et al. [63] contains only 6262 DTIs among 2,146,240 (1412 × 1520) possible drug-target pairs from 1412 drugs and 1520 targets, and the ratio of known DTIs to all drug-target pairs is 0.0029. More importantly, DTI prediction must cope with small sample sizes and high-dimensional drug and target information. Sparse representation can automatically discriminate between classes and provides a simple and effective way of rejecting invalid test samples that do not belong to any class in the training set, and thus reduces data dimension and computational cost [87]. Therefore, sparse representation-based methods may be further applied to DTI prediction.
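The ratios quoted above follow directly from the dataset sizes, as the short calculation below illustrates. The function name `interaction_density` is an assumption; the counts are the ones given in the text.

```python
def interaction_density(n_interactions, n_drugs, n_targets):
    """Fraction of all possible drug-target pairs that are known interactions."""
    return n_interactions / (n_drugs * n_targets)

# (known DTIs, drugs, targets) as reported in the text.
datasets = {
    "enzymes": (2926, 445, 664),
    "ion channels": (1476, 210, 204),
    "GPCRs": (635, 223, 95),
    "nuclear receptors": (90, 54, 26),
    "Wen et al.": (6262, 1412, 1520),
}

densities = {
    name: round(interaction_density(dtis, drugs, targets), 4)
    for name, (dtis, drugs, targets) in datasets.items()
}
for name, density in densities.items():
    print(f"{name}: {density:.4f}")
```

In every dataset fewer than 7% of the possible pairs are known interactions, which is the imbalance that any classifier trained on these data has to handle.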

Types of DTI
Different types of DTIs help us understand the molecular mechanism of drug action. Although the existing methods have achieved promising performance, the majority of them can only infer the binary interaction between a drug and a target, but cannot detect distinct types of interactions. However, the interactions between drugs and targets generally have different meanings, for example, direct interactions produced by protein-ligand binding and indirect interactions caused by either changed expression levels of a target protein or active metabolites induced by a drug [16,66]. In addition, DTIs can be annotated by different drug modes of action, such as activation and inhibition [17]. Therefore, how to use various biological data to identify different types of DTIs may be a challenging problem.

Personalized Medicine
The ultimate goal of DTI identification is to provide treatment clues for patients, especially for cancer patients. However, it is inappropriate to simply use one or a few drugs for all patients [88]. Therefore, computational methods should be used to identify personalized drugs by integrating cancer-related networks, drug-drug interaction networks, protein-protein interaction networks, metabolic networks, and so on. By fusing this information with novel network-based models, researchers may find valuable drug discovery strategies. In addition, computational models could be applied to predict personalized drug targets, drug effects, and drug resistance for cancer treatment, and to infer personalized cancer risk for healthy individuals [89,90]. Therefore, personalized medicine based on DTI identification may be a topic of further research.