Drug repositioning of COVID-19 based on mixed graph network and ion channel

Research on the relationship between drugs and targets is key to precision medicine, and ion channels are an important class of drug targets. To address the urgent need for corona virus disease 2019 (COVID-19) treatment and drug development, this paper designs a mixed graph network model to predict the affinity between drugs and the ion channel targets of COVID-19. Starting from the simplified molecular input line entry specification (SMILES) code of each drug, atomic features were extracted to construct the node set, and the edge set was constructed from the bonds between atoms. The undirected graph with atomic features was then generated with the RDKit tool, and a graph attention layer was used to extract the drug feature information. Five ion channel target proteins were screened from the whole SARS-CoV-2 genome sequences in the NCBI database, and their protein features were extracted by a convolutional neural network (CNN). Using an attention mechanism and a graph convolutional network (GCN), the extracted drug features and target features were combined. After two fully connected layers, the drug-target affinity was output. The KIBA dataset was used to train the model and determine its parameters. Compared with the DeepDTA, WideDTA, graph attention network (GAT), GCN and graph isomorphism network (GIN) models, the mean square error (MSE) of the proposed model decreased by 0.055, 0.04, 0.001, 0.046 and 0.013, and the concordance index (CI) increased by 0.028, 0.016, 0.003, 0.03 and 0.01, respectively, so the model predicts drug-target affinity more accurately.
According to the predicted drug-target affinities for the SARS-CoV-2 ion channel targets, seven small molecule drugs acting on the five ion channel targets were obtained, namely SCH-47112, dehydroaltenusin, alternariol 5-O-sulfate, LPA1 antagonist 1, alternariol, butin, and AT-9283. These drugs provide a reference for drug repositioning and precise treatment of COVID-19.


Introduction
COVID-19 has caused huge losses and placed great health and economic burdens on the world. Drug discovery for COVID-19, which is caused by SARS-CoV-2, is still ongoing. Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning models have been used for DTI prediction and drug repurposing for COVID-19. Huang et al. presented DeepPurpose, a comprehensive deep learning library for DTI prediction [1]. A number of drugs, such as remdesivir, favipiravir and lopinavir, have shown inhibitory effects against SARS-CoV-2 in vitro as well as in clinical conditions [2]. The main targets of SARS-CoV-2 include the S protein, angiotensin converting enzyme 2 (ACE2), transmembrane protease serine 2 (TMPRSS2), 3CLpro, RNA-dependent RNA polymerase (RdRp), etc. Zhou's team at Westlake University analyzed the electron microscopy structure of the complex between the S protein and ACE2 [3]. The ACE2 receptor recognized by the S protein and TMPRSS2 have also been identified as potential targets, as both are key molecules in the initial stage of viral invasion of the host. Several drugs can already target ACE2, including captopril, propofol and tiliquinol, as well as some polypeptide and antibody drugs, but their anti-SARS-CoV-2 activity is unknown. Egyptian scientists used high-resolution crystal structures of SARS-CoV PLpro as a template to predict the structure of the novel coronavirus PLpro, and used docking to predict how well several anti-SARS-CoV PLpro and anti-HCV NS3 drugs bind to it [4]. Similarly, some studies screened 2525 FDA-listed drugs in the ZINC database against a homology-modeled structure and found 16 molecules with strong affinity [5]. Chinese scientists obtained 7 molecules that could bind PLpro from traditional natural compounds by using ADME filtration and molecular docking, and searched for Chinese medicines containing these ingredients [6].
However, the above methods have shortcomings in drug-target interaction prediction. Accurate prediction of drug-target interactions plays a key role in drug repositioning: it not only deepens the understanding of drug action, but also allows drugs to be repositioned from the perspective of pharmacology.
Most drug targets come from enzymes, G-protein-coupled receptors (GPCRs), ion channels and nuclear receptors, which account for 88% of the total drug targets [7,8]. Ion channels are composed of a special class of proteins, which are assembled and embedded on the cell membrane to form a pathway that allows the rapid transport of inorganic ions across the cell membrane. Ion channels are involved in a variety of important biological functions, such as cell excitation, muscle contraction, gland secretion, nervous system development, and regulation of gene expression. The occurrence and development of many diseases (such as neuropathic pain, arrhythmia, hypertension, etc.) are related to the dysfunction of ion channels. Therefore, ion channels have become one of the classic targets for pharmaceutical companies to study.
In this paper, a DTI prediction model for COVID-19 based on a graph network and ion channels is proposed. Firstly, the atomic features were extracted, and the undirected graph was generated with the RDKit tool. The sequence of each target protein was treated as a character string, and the features of the sequences were extracted by a convolutional neural network. An attention mechanism and a graph convolutional network were used to predict the drug-target affinity and to screen drugs that act on the target proteins of SARS-CoV-2.
The main contributions of this paper are as follows: (1) graph representation learning of drugs based on an attention mechanism was used to achieve drug feature extraction; (2) a mixed graph network model was constructed to predict drug-target affinity; (3) seven drugs acting on SARS-CoV-2 target proteins were found from the prediction results, which provides a reference for drug repositioning.

Graph representation of drugs
SMILES, developed by David Weininger, is a standard language for expressing molecular structures as ASCII strings. Using this language, chemical structures can be converted to text and stored in computers [9]. For each drug, RDKit is used to construct the molecular graph from its SMILES code, and this graph serves as the input of the graph convolutional network. The nodes of the graph carry the features of the drug atoms, and the edges represent the bonds between the atoms. The feature vector of a drug atom is composed of five features: atom type, atom degree (number of adjacent atoms), total number of hydrogens, implicit valence of the atom, and whether the atom is aromatic.

Graph convolutional network
GCN is a graph-based convolutional network model proposed in 2017 [11,12]. The formula is as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$

where $W^{(l)}$ is the weight matrix of layer $l$ and $H^{(l)}$ is the input matrix of layer $l$. $\sigma$ is the activation function, for which the ReLU function is usually used. $\tilde{A} = A + I$, where $A$ is the adjacency matrix of the graph and $I$ is the identity matrix with the same shape as $A$, and $\tilde{D}$ is the degree matrix of $\tilde{A}$. Adding $I$ means that each node's own feature information is taken into account when its new features are calculated.
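As an illustration of the propagation rule, i.e. ReLU applied to the symmetrically normalized adjacency matrix (with self-loops) times the node features times the layer weights, the following minimal sketch implements one GCN layer with plain Python lists. The function names `matmul` and `gcn_layer` are illustrative and are not taken from the paper's implementation, which uses PyTorch.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D~^-1/2 (A + I) D~^-1/2 H W).

    adj    -- n x n adjacency matrix A (0/1 entries, no self-loops)
    feats  -- n x d input feature matrix H
    weight -- d x d_out weight matrix W
    """
    n = len(adj)
    # A~ = A + I, so that each node also sees its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Degree of every node in A~ (row sums).
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization D~^-1/2 A~ D~^-1/2.
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    out = matmul(matmul(a_hat, feats), weight)
    # ReLU activation.
    return [[max(0.0, v) for v in row] for row in out]
```

For a two-atom graph with one bond, every entry of the normalized matrix is 0.5, so each node receives the average of its own feature and its neighbor's feature.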

Graph attention network
GAT is an attention-based architecture that learns hidden representations of the nodes in a graph by applying a self-attention mechanism [13,14]. It computes an attention coefficient between a node and each of its neighbor nodes, and then performs a weighted summation over all neighbor nodes according to these coefficients to obtain the aggregated feature of the node. The attention coefficient is calculated as follows:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W H_i \,\|\, W H_j\right]\right)\right)}{\sum_{m \in N(i)} \exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W H_i \,\|\, W H_m\right]\right)\right)}$$

where the vectors $H_i$ and $H_j$ are transformed by the learnable parameter $W$ and concatenated ($\|$), and then multiplied by the learnable parameter $a^{T}$ to obtain the attention between node $i$ and node $j$. The LeakyReLU function is used as the activation function, and normalizing over the neighbors $N(i)$ of node $i$ gives the final attention coefficient of nodes $i$ and $j$. The node feature formula is as follows:

$$H_i' = \sigma\left(\sum_{j \in N(i)} \alpha_{ij} W H_j\right)$$

After the attention coefficients have been trained, all neighbor node vectors $H_j$ of node $i$ are transformed by $W$ and aggregated with the attention coefficients. In this way, the features of node $i$ and all its neighbor nodes are aggregated and taken as the output features of the node. The formula of multi-head attention is as follows:

$$H_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in N(i)} \alpha_{ij}^{k} W^{k} H_j\right)$$

where $K$ is the number of attention heads. When the attention coefficient of node $i$ and node $j$ is calculated, the relationship between them is considered several times; the number of times is the head parameter, which is set manually. The multi-head attention mechanism improves how well the correlation between nodes is captured.
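The single-head attention step can be sketched as follows: each feature vector is transformed by W, the transformed vectors of node i and neighbor j are concatenated and scored with the vector a, LeakyReLU is applied, and the scores are normalized with a softmax over the neighbors of node i. This is a minimal illustration with plain Python lists; the helper names and the LeakyReLU slope of 0.2 are assumptions, not values stated in the paper, and the final activation is omitted.

```python
import math


def linear(W, vec):
    """Apply the weight matrix W (list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in W]


def leaky_relu(x, slope=0.2):
    """LeakyReLU; the slope value is an assumed default."""
    return x if x > 0 else slope * x


def attention_coefficients(h, neighbors, W, a, i):
    """Return {j: alpha_ij} for every neighbor j of node i."""
    wh_i = linear(W, h[i])
    scores = {}
    for j in neighbors[i]:
        concat = wh_i + linear(W, h[j])          # [W h_i || W h_j]
        scores[j] = leaky_relu(sum(p * q for p, q in zip(a, concat)))
    # Softmax over the neighbors (shifted by the maximum for stability).
    top = max(scores.values())
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def aggregate_neighbors(h, neighbors, W, a, i):
    """Attention-weighted sum of the transformed neighbor features."""
    alpha = attention_coefficients(h, neighbors, W, a, i)
    out = [0.0] * len(W)
    for j, coeff in alpha.items():
        wh_j = linear(W, h[j])
        out = [o + coeff * x for o, x in zip(out, wh_j)]
    return out
```

A multi-head variant would call `aggregate_neighbors` once per head, each with its own W and a, and concatenate the resulting vectors.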

Graph isomorphism network
GIN is an improved GCN model proposed by Xu et al. [15] in 2019. The formula is as follows:

$$h_i^{(k)} = \mathrm{MLP}^{(k)}\left(\left(1 + \varepsilon^{(k)}\right) h_i^{(k-1)} + \sum_{u \in N(i)} h_u^{(k-1)}\right)$$

where MLP is a multi-layer perceptron, $h_i^{(k-1)}$ is the input feature vector of the $k$-th layer perceptron, $h_i^{(k)}$ is the output vector, $h_u^{(k-1)}$ is the feature vector of an adjacent node $u \in N(i)$, and $\varepsilon$ is the offset, which can be set manually.
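The GIN update can be illustrated in two steps: the node's own feature vector is scaled by (1 + ε) and added to the sum of its neighbors' feature vectors, and the result is passed through an MLP. The sketch below uses plain Python lists; the two-layer MLP with ReLU and all function names are assumptions made for illustration.

```python
def dense(x, W, b):
    """Affine layer: W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron with a ReLU in between (an assumed shape)."""
    hidden = [max(0.0, v) for v in dense(x, W1, b1)]
    return dense(hidden, W2, b2)


def gin_aggregate(h, neighbors, i, eps=0.0):
    """(1 + eps) * h_i plus the sum of the neighbor feature vectors."""
    out = [(1.0 + eps) * v for v in h[i]]
    for u in neighbors[i]:
        out = [o + x for o, x in zip(out, h[u])]
    return out


def gin_update(h, neighbors, i, W1, b1, W2, b2, eps=0.0):
    """Full GIN node update: MLP applied to the aggregated vector."""
    return mlp(gin_aggregate(h, neighbors, i, eps), W1, b1, W2, b2)
```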

Drug-target affinity prediction model based on mixed graph network
In this paper, we proposed a mixed graph network model based on GAT and GCN for regression prediction of drug-target affinity. The model framework is shown in Figure 1.
Firstly, the SMILES code was converted to a molecular graph, and a deep learning algorithm was then adopted to learn the graph representation. The advantages of the GCN and GAT models were combined to improve the accuracy of feature extraction. The protein sequence was encoded and embedded, and several CNN layers were then used to learn the sequence representation. Finally, the two representations were concatenated and passed through two fully connected layers to output the drug-target affinity.
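Before a protein sequence can be embedded and passed to CNN layers, each residue character has to be mapped to an integer and the sequence brought to a fixed length. The sketch below shows one common way to do this; the 20-letter alphabet, the shared index for unknown characters, the padding index 0 and the default `max_len` are all assumptions, since the paper does not state its encoding details.

```python
# The 20 standard amino acids; other characters share one "unknown" index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHAR_TO_INT = {ch: idx + 1 for idx, ch in enumerate(AMINO_ACIDS)}
UNKNOWN = len(AMINO_ACIDS) + 1   # index for any non-standard character
PADDING = 0                      # index used to pad short sequences


def encode_sequence(seq, max_len=1000):
    """Map a protein sequence to a fixed-length list of integer codes.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with PADDING. max_len is an assumed hyperparameter.
    """
    codes = [CHAR_TO_INT.get(ch, UNKNOWN) for ch in seq.upper()[:max_len]]
    return codes + [PADDING] * (max_len - len(codes))
```

The resulting integer list is what an embedding layer would consume before the convolutional layers.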

Dataset and feature extraction
To compare with DeepDTA [16] and WideDTA [17], we ran our model on the same dataset used in these works. The KIBA dataset includes 2116 drugs, 229 targets and 118,254 binding affinities, with affinity values ranging from 0.0 to 17.2. Of these, 98,545 binding affinities were used for training and 19,709 for testing the models. The datasets used in this paper are publicly available at http://www.ddccnn.wang/CovidData. The RDKit GetNumAtoms method was used to obtain the total number of atoms in a drug molecule, the GetBonds method was used to traverse the bonds, and the GetBeginAtom and GetEndAtom methods were then used to obtain the indices of the begin atom and the end atom of each bond, from which the node and edge information of the undirected graph was created. To improve computational performance and avoid wasting computational resources, the drug molecular graph, the target sequence and the drug-target affinity were packaged together into a single data unit, so that training only needs to read the packaged data units directly.
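The graph construction and packaging steps can be sketched without RDKit by assuming that the bonds are already available as (begin atom index, end atom index) pairs, which is the information the bond traversal provides. Each bond is stored in both directions so that the graph is undirected. The function names and the dictionary layout of the data unit are hypothetical.

```python
def build_edge_index(num_atoms, bonds):
    """Turn a list of (begin, end) atom-index pairs into an undirected
    edge list by storing every bond in both directions."""
    edges = []
    for begin, end in bonds:
        if not (0 <= begin < num_atoms and 0 <= end < num_atoms):
            raise ValueError(f"bond ({begin}, {end}) refers to a missing atom")
        edges.append((begin, end))
        edges.append((end, begin))
    return edges


def package_sample(atom_features, bonds, target_codes, affinity):
    """Bundle one drug graph, one encoded target and their affinity
    into a single data unit (a plain dict in this sketch)."""
    return {
        "num_atoms": len(atom_features),
        "atom_features": atom_features,
        "edge_index": build_edge_index(len(atom_features), bonds),
        "target": target_codes,
        "affinity": affinity,
    }
```

For ethanol (SMILES `CCO`), the three heavy atoms are joined by the bonds (0, 1) and (1, 2), which yields four directed edges.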

Evaluation index
The same evaluation indexes as WideDTA were used in this paper to judge the prediction ability of the model, namely MSE and CI [18].
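MSE is the mean of the squared differences between the predicted values and the true values. The smaller the result, the more accurate the prediction. It is defined as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(f_i - y_i\right)^2$$

CI measures whether the predicted affinities of two drug-target pairs are in the same order as their true affinities. The closer the result is to 1, the higher the accuracy of the trained model. It is defined as follows:

$$\mathrm{CI} = \frac{1}{Z} \sum_{y_i > y_j} h\left(f_i - f_j\right)$$

where $y$ is the true label of the drug-target affinity, $f$ is the predicted value of the drug-target affinity, $n$ is the number of samples, $Z$ is a normalization constant equal to the number of pairs with $y_i > y_j$, and the step function $h(x)$ is defined as follows:

$$h(x) = \begin{cases} 1, & x > 0 \\ 0.5, & x = 0 \\ 0, & x < 0 \end{cases}$$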
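Both evaluation indexes can be computed directly from their definitions: MSE averages the squared prediction errors, and CI counts, over all pairs whose true affinities differ, how often the predictions are ordered the same way, with ties in the prediction counted as 0.5. The following is a minimal, unoptimized sketch (the pair loop is quadratic in the number of samples); the function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between prediction and truth."""
    return sum((f - y) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of pairs with y_i > y_j whose predictions are ordered
    the same way; tied predictions contribute 0.5."""
    concordant = 0.0
    pairs = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                pairs += 1
                diff = y_pred[i] - y_pred[j]
                if diff > 0:
                    concordant += 1.0
                elif diff == 0:
                    concordant += 0.5
    # If no pair has differing true values, the index is undefined;
    # this sketch returns 0.0 in that case.
    return concordant / pairs if pairs else 0.0
```

A perfectly ordered prediction gives a CI of 1, a fully reversed one gives 0, and a constant prediction gives 0.5.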

Experimental environments
The experimental machine was configured with an Intel(R) Core(TM) i7-6800K CPU at 2.30GHz, 64GB of memory and an NVIDIA Titan XP GPU. PyTorch was used to implement the model. The results are compared in Table 1. Compared with the DeepDTA, WideDTA, GAT, GCN and GIN models, the MSE of the proposed model decreased by 0.055, 0.04, 0.001, 0.046 and 0.013, and the CI increased by 0.028, 0.016, 0.003, 0.03 and 0.01, respectively. The mixed network model therefore performs better than the previous models, which indicates that combining the advantages of the GCN and the GAT during graph feature extraction is effective.
On the KIBA dataset, the GCN model is overall slightly better than the GAT model, indicating that convolution is a good choice for graph problems. If a graph attention network is introduced before the GCN model, the prediction is better than that of a pure convolutional network. Therefore, continued experimentation and innovation are conducive to finding better solutions.

Visualization of prediction results of drug-target interaction network
The drug-target interaction network was drawn with the Cytoscape software to visualize the prediction results of the model. There were 10555 interaction relationships in the KIBA dataset for the five targets of SARS-CoV-2, and the top 87 interaction relationships with a binding affinity > 13.5 were selected to construct the drug-target interaction network. As shown in Figure 2, red circles represent drugs and green diamonds represent targets. The degree of correlation between a drug and a target is represented by the color and size of the drug node: the lighter the color and the smaller the shape, the weaker the correlation; the darker the color and the larger the shape, the stronger the correlation. The diagram makes the interaction relationships between drugs and targets easier to understand intuitively.
The drugs in Tables 2-6 were intersected to obtain 7 drugs that target all 5 ion channel target proteins of SARS-CoV-2: CHEMBL483525, CHEMBL291126, CHEMBL519982, CHEMBL495727, CHEMBL83790, CHEMBL520144 and CHEMBL483526. Their corresponding drug molecules are alternariol 5-O-sulfate, SCH-47112, alternariol, AT-9283, LPA1 antagonist 1, dehydroaltenusin and butin. The affinities of these 7 drugs for the 5 target proteins of SARS-CoV-2 predicted by the model established in this paper, together with their average values, are shown in Table 7, and the Venn diagram of the targets and the drugs is shown in Figure 3.
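The intersection step amounts to keeping the drugs that appear in the prediction list of every target and averaging their predicted affinities across targets. A minimal sketch is given below; the input layout (one dictionary of drug identifier to affinity per target) and the function name are assumptions, and no values from the paper's tables are reproduced.

```python
def shared_drugs(predictions):
    """Return {drug_id: mean affinity} for the drugs that occur in the
    prediction list of every target.

    predictions -- {target_id: {drug_id: predicted affinity}}
    """
    if not predictions:
        return {}
    drug_sets = [set(per_target) for per_target in predictions.values()]
    common = set.intersection(*drug_sets)
    return {
        drug: sum(per_target[drug] for per_target in predictions.values())
        / len(predictions)
        for drug in sorted(common)
    }
```

Sorting the shared identifiers only makes the output order reproducible; ranking by the mean affinity would be a natural next step.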

Drug repositioning of COVID-19 based on ion channel targets
The prediction results for the 5 ion channel targets of SARS-CoV-2 are shown in Tables 2-6, respectively, including the ChEMBL ID, the small molecule drug, the SMILES code, and the drug-target affinity (the names of some small drug molecules were not found).
Table 5. Prediction results of drug-target interaction of helicase (YP_009725308.1).
[...] [19]. According to the prediction results of this model, SCH-47112 is expected to have the best effect on COVID-19. Staurosporine, a potent ATP-competitive kinase inhibitor, can effectively inhibit PKC activity. The biological activity of staurosporine in antifungal and antihypertensive treatment gives it great potential in anti-cancer treatment. The main functions of staurosporine are inducing G2/M phase arrest of cancer cells, regulating G1 phase arrest, and inducing cell apoptosis [20]. It may be worth studying whether staurosporine is effective in the treatment of COVID-19. Dehydroaltenusin, which the results show to be less effective than SCH-47112, is a small molecule selective inhibitor of DNA polymerase α and an antibiotic produced by a fungus. It blocks the cancer cell cycle and triggers apoptosis in the S phase, and it has antitumor activity against human adenocarcinoma tumors in vivo [21]. LPA1 antagonist 1 is a highly selective lysophosphatidic acid (LPA) receptor antagonist; LPA interacts with GPCRs to modulate signaling responses. Alternariol is a mycotoxin produced by Alternaria that inhibits the catalytic activity of topoisomerase I and topoisomerase II. Alternariol has anti-HIV, anti-cancer and antimicrobial biological activities [22]. Butin and AT-9283 were predicted to be less effective in treating COVID-19. Butin is a bioactive flavonoid isolated from the heartwood of sandalwood, which has strong antioxidant, anti-platelet and anti-inflammatory activities. It can significantly reduce myocardial infarction, improve cardiac function and prevent oxidative damage of the heart caused by diabetes [23]. AT-9283 is a multi-target kinase inhibitor that effectively inhibits Aurora A/B, Jak2/3 and Flt3. It can inhibit the growth and survival of multiple solid tumors in vitro and in vivo [24]. Based on the predicted results of the proposed model, these drugs act on the 5 target proteins of SARS-CoV-2 and can be repositioned to investigate their potential to treat COVID-19.

Conclusions
To address the urgent need for COVID-19 treatment, a mixed graph network model based on a graph attention network and a graph convolutional network was proposed for the prediction of drug-target affinity.
Compared with the DeepDTA, WideDTA, GAT, GCN and GIN models, the MSE of the proposed model decreased by 0.055, 0.04, 0.001, 0.046 and 0.013, and the CI increased by 0.028, 0.016, 0.003, 0.03 and 0.01, respectively. According to the predicted drug-target affinities for the SARS-CoV-2 ion channel targets, 7 small molecule drugs acting on the 5 ion channel targets were obtained, namely SCH-47112, dehydroaltenusin, alternariol 5-O-sulfate, LPA1 antagonist 1, alternariol, butin, and AT-9283. However, the results of this paper only provide a reference for drug repositioning and precise treatment of COVID-19, and the actual therapeutic effect still needs to be verified by a large number of clinical trials. In the future, we will further modify our model, for example by extracting protein sequence features with the position-specific scoring matrix (PSSM) and other deep learning methods [25][26][27][28], to achieve higher prediction performance.