Graph Convolutional Network and Convolutional Neural Network Based Method for Predicting lncRNA-Disease Associations

Aberrant expression of long non-coding RNAs (lncRNAs) is often associated with diseases, and identifying disease-related lncRNAs helps to elucidate complex pathogenesis. Recent methods for predicting associations between lncRNAs and diseases integrate pertinent heterogeneous data. However, they fail to deeply integrate the topological information of the heterogeneous network comprising lncRNAs, diseases, and miRNAs. We propose a novel method based on a graph convolutional network and a convolutional neural network, referred to as GCNLDA, to infer disease-related lncRNA candidates. First, a heterogeneous network containing lncRNA, disease, and miRNA nodes was constructed. The embedding matrix of an lncRNA-disease node pair was then constructed according to various biological premises about lncRNAs, diseases, and miRNAs. A new framework based on a graph convolutional network and a convolutional neural network was developed to learn the network and local representations of the lncRNA-disease pair. On the left side of the framework, an autoencoder based on graph convolution deeply integrated the topological information within the heterogeneous lncRNA-disease-miRNA network. Moreover, as different node features contribute differently to the association prediction, an attention mechanism at the node feature level was constructed. The left side learnt the network representation of the lncRNA-disease pair. The convolutional neural networks on the right side of the framework learnt the local representation of the lncRNA-disease pair by focusing on the similarities, associations, and interactions that are related only to the pair. GCNLDA outperformed several state-of-the-art prediction methods. Case studies on stomach cancer, osteosarcoma, and lung cancer confirmed that GCNLDA effectively discovers potential lncRNA-disease associations.


Introduction
Long non-coding RNAs (lncRNAs) are non-coding RNAs more than 200 nucleotides (nt) in length [1]. There is mounting evidence that lncRNAs participate in the development and progression of numerous diseases [2,3]. Mutations and dysregulation of lncRNAs are associated with breast and colon cancer, atherosclerosis, and neurodegenerative diseases [4][5][6][7]. Therefore, identification of disease-related lncRNAs may help elucidate pathogenesis.
Computational biology techniques are essential and often used in many fields of biomedicine, ranging from the discovery of biomarkers to the development of drugs [8]. Machine learning and deep learning are being increasingly used to solve the most challenging problems [9][10][11][12][13][14][15]. In recent years, computational methods have been proposed to predict the associations between diseases and lncRNAs.
A heterogeneous network, named LncDisMirNet, was constructed from lncRNA, miRNA, and disease nodes. LncDisMirNet comprises the lncRNA network (LncNet), the disease network (DisNet), the miRNA network (MirNet), and three types of connecting edges, which respectively represent the interactions between lncRNAs and miRNAs, the associations between lncRNAs and diseases, and the associations between miRNAs and diseases.

Construction of the lncRNA, miRNA, and Disease Networks
Two lncRNAs are usually associated with similar diseases if their functions are similar. Chen et al. calculated the functional similarity among lncRNAs [21]. To construct the lncRNA network, the similarity between two lncRNA nodes was determined by Chen's method, and an edge was added to connect them when their similarity was greater than 0. The weight of the edge was set to the similarity value (Figure 1a). The matrix $L = [L_{ij}] \in \mathbb{R}^{N_l \times N_l}$ denotes LncNet, where $L_{ij}$ is the similarity between $l_i$ and $l_j$ and $N_l$ is the number of lncRNAs.
The same method was applied to determine the similarity between miRNAs and to construct the network MirNet composed of miRNA nodes (Figure 1b). The matrix $M = [M_{ij}] \in \mathbb{R}^{N_m \times N_m}$ represents MirNet with $N_m$ miRNA nodes, where $M_{ij}$ is the similarity between miRNAs $m_i$ and $m_j$.
Wang et al. calculated the similarity between two diseases [40]. Their method represents a disease by a directed acyclic graph (DAG) comprising all annotations related to it. Here, disease similarity was used to construct the DisNet network, which is represented by the matrix $D = [D_{ij}] \in \mathbb{R}^{N_d \times N_d}$. $D_{ij}$ is the similarity between diseases $d_i$ and $d_j$, and $N_d$ is the number of diseases (Figure 1f).
The connexion between the LncNet and DisNet nodes was established using the known lncRNA-disease association data. If an lncRNA node in LncNet is associated with a disease node in DisNet, an edge is added to connect them. The matrix $A = [A_{ij}] \in \mathbb{R}^{N_l \times N_d}$ denotes this set of edges: $A_{ij} = 1$ when lncRNA $l_i$ is associated with disease $d_j$, and $A_{ij} = 0$ when there is no association between them (Figure 1c).

Connexions between LncNet and MirNet and between DisNet and MirNet were established based on the lncRNA-miRNA interaction data and the miRNA-disease association data. If lncRNA $l_i$ in LncNet interacts with miRNA $m_j$ in MirNet, then $B_{ij} = 1$; if not, $B_{ij} = 0$. Likewise, if disease $d_i$ in DisNet is associated with miRNA $m_j$ in MirNet, then $C_{ij} = 1$; if not, $C_{ij} = 0$. The matrices $B = [B_{ij}] \in \mathbb{R}^{N_l \times N_m}$ and $C = [C_{ij}] \in \mathbb{R}^{N_d \times N_m}$ represent the connexions between LncNet and MirNet and between DisNet and MirNet, respectively (Figure 1d,e).
The heterogeneous network LncDisMirNet was constructed by combining LncNet, DisNet, and MirNet. LncDisMirNet is denoted by the matrix $U = [U_{ij}] \in \mathbb{R}^{N \times N}$,

$$U = \begin{bmatrix} L & A & B \\ A^{T} & D & C \\ B^{T} & C^{T} & M \end{bmatrix},$$

where $N = N_l + N_d + N_m$, and $A^{T}$, $B^{T}$, and $C^{T}$ are the transposes of $A$, $B$, and $C$, respectively (Figure 1g).
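The assembly of $U$ from its six blocks can be sketched in a few lines of NumPy. This is a minimal illustration on random toy data, not the authors' implementation; the helper names `build_heterogeneous_network` and `random_similarity` and the toy sizes are assumptions made for the example.

```python
import numpy as np

def build_heterogeneous_network(L, D, M, A, B, C):
    """Stack the six matrices into the square matrix U (node order: lncRNA, disease, miRNA)."""
    return np.block([
        [L,   A,   B],
        [A.T, D,   C],
        [B.T, C.T, M],
    ])

# Toy sizes (hypothetical): 2 lncRNAs, 3 diseases, 2 miRNAs.
rng = np.random.default_rng(0)
n_l, n_d, n_m = 2, 3, 2

def random_similarity(n):
    """Random symmetric similarity matrix with unit diagonal (toy data only)."""
    s = rng.random((n, n))
    s = (s + s.T) / 2
    np.fill_diagonal(s, 1.0)
    return s

L, D, M = (random_similarity(n) for n in (n_l, n_d, n_m))
A = rng.integers(0, 2, size=(n_l, n_d)).astype(float)  # lncRNA-disease associations
B = rng.integers(0, 2, size=(n_l, n_m)).astype(float)  # lncRNA-miRNA interactions
C = rng.integers(0, 2, size=(n_d, n_m)).astype(float)  # disease-miRNA associations

U = build_heterogeneous_network(L, D, M, A, B, C)
print(U.shape)  # (7, 7)
```

Because the off-diagonal blocks appear together with their transposes, $U$ is symmetric whenever $L$, $D$, and $M$ are.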

Attention Mechanism on the Left Side of the Framework
The attention mechanism in deep learning is similar to the visual attention mechanism in humans. Its core goal is to select the information that is most critical to a given task. With the proposed attention mechanism, each feature of a node is assigned a different weight.
As shown in Figure 1g, the $i$th row $u_i = (u_{i1}, u_{i2}, u_{i3}, \ldots, u_{iN})$ of $U$ reflects the topology information between the $i$th node and all others in the network. For example, $u_2$ contains similarity links between lncRNA $l_2$ and $l_1, \ldots, l_5$, association links between $l_2$ and diseases $d_1, \ldots, d_6$, and interaction links between $l_2$ and miRNAs $m_1, \ldots, m_5$. Similarly, $u_9$ contains the links of disease $d_4$ to all lncRNAs, diseases, and miRNAs. Therefore, $u_i$ is the topology feature vector of the $i$th node in LncDisMirNet. The topology feature vector of the $l_2$ node is $u_2$ and that of the $d_4$ node is $u_9$ (Figure 2).
The various features of the lncRNA and disease nodes contribute differently to the association prediction. Thus, an attention mechanism was established at the node feature level to extract the features that are important for the $l_2$-$d_4$ association prediction. The attention scores of each node feature are defined as follows,

$$s_i = H_{att}\, f\left(W_{att}\, u_i + b_{att}\right),$$

where $H_{att} \in \mathbb{R}^{N \times N}$ and $W_{att} \in \mathbb{R}^{N \times N}$ are parametric matrices, $b_{att} \in \mathbb{R}^{N}$ is a bias vector, and $f(t) = \tanh(t) = \frac{e^{t} - e^{-t}}{e^{t} + e^{-t}}$ is the activation function. The vector $s_i = (s_{i,1}, s_{i,2}, \ldots, s_{i,j}, \ldots, s_{i,N})$ is the attention score vector of the features of $u_i$, where $s_{i,j}$ is the attention score of the $j$th feature of $u_i$.
The function $\mathrm{Softmax}(t)_k = \frac{e^{t_k}}{\sum_j e^{t_j}}$ was used to normalize the attention scores for all features of $u_i$,

$$\alpha_{i,k} = \mathrm{Softmax}(s_i)_k = \frac{e^{s_{i,k}}}{\sum_j e^{s_{i,j}}},$$

where $\alpha_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,k}, \ldots, \alpha_{i,N})$ is the feature-level attention weight vector of $u_i$, and $\alpha_{i,k}$ is the weight of the $k$th feature of $u_i$. Therefore, the node enhancement vector based on the feature-level attention mechanism is

$$x_i = \alpha_i \otimes u_i,$$

where $\otimes$ is the element-wise product operator and $x_i$ is the enhancement vector of $u_i$. The enhancement vectors of the lncRNA node $l_2$ and the disease node $d_4$ are $x_2 = \alpha_2 \otimes u_2$ and $x_9 = \alpha_9 \otimes u_9$, respectively.
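The feature-level attention described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the score takes the form $s_i = H_{att} \tanh(W_{att} u_i + b_{att})$, which is inferred from the stated parameter shapes, and the function names and random toy parameters are hypothetical.

```python
import numpy as np

def softmax(t):
    """Normalize a score vector into weights that sum to 1."""
    e = np.exp(t - t.max())  # subtracting the max keeps exp() numerically stable
    return e / e.sum()

def feature_attention(u, H_att, W_att, b_att):
    """Return the enhancement vector x and the attention weights alpha for one node."""
    s = H_att @ np.tanh(W_att @ u + b_att)  # attention scores (assumed form)
    alpha = softmax(s)                      # feature-level attention weights
    return alpha * u, alpha                 # element-wise product alpha (x) u

# Toy example with N = 6 features and small random parameters.
rng = np.random.default_rng(0)
N = 6
u = rng.random(N)
H_att = rng.normal(scale=0.1, size=(N, N))
W_att = rng.normal(scale=0.1, size=(N, N))
b_att = np.zeros(N)

x, alpha = feature_attention(u, H_att, W_att, b_att)
```

In a trained model the parameters would be learned jointly with the rest of the left side of the framework; here they are random only to show the shapes involved.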

Graph Convolutional Network Module on the Left Side of the Framework
The graph convolutional network is a multilayer neural network proposed by Thomas Kipf in 2017 [41]. It takes a graph as input, integrates the neighborhood node features and structure information of the graph nodes, and represents them as vectors. Graph convolutional networks have been successfully applied to the prediction of multidrug side effects, social networks, recommendation systems, and the prediction of drug-target interactions [42][43][44][45]. Here, the graph convolutional network was used to predict lncRNA-disease associations. The heterogeneous network LncDisMirNet has connexions based on lncRNA, disease, and miRNA similarity, lncRNA-disease and miRNA-disease associations, and lncRNA-miRNA interactions. Because these connexions are consistent, the entire heterogeneous network $U$ is used as the input for the graph convolution.
First, $\hat{U} = U + I$ is the adjacency matrix with added self-connections, where $I$ is the identity matrix. Then a symmetric Laplacian normalization was performed on $\hat{U}$ to get $\tilde{U} \in \mathbb{R}^{N \times N}$,

$$\tilde{U} = E^{-\frac{1}{2}}\, \hat{U}\, E^{-\frac{1}{2}},$$

where $E \in \mathbb{R}^{N \times N}$ is a diagonal matrix such that $E_{ii} = \sum_j \hat{U}_{ij}$; that is, $E$ is the degree matrix of $\hat{U}$. The graph convolution autoencoder takes the structure matrix $\tilde{U}$ and the node feature matrix $X$ as inputs. The encoder maps the nodes in LncDisMirNet to the network representations $Z$ of the lncRNA, disease, and miRNA nodes using the weight matrix $W_{enco} \in \mathbb{R}^{N \times n}$, where $n$ is a hyper-parameter. The matrix $\tilde{U}$ is multiplied by $X$, which can be understood as an aggregation of spatial information. If $K = \tilde{U} X$, the $i$th row $K_i \in \mathbb{R}^{N}$ of the matrix $K \in \mathbb{R}^{N \times N}$ can be understood as the feature vector of the $i$th node. $K$ and $W_{enco}$ are multiplied to map the nodes to the low-dimensional vectors $z_i \in \mathbb{R}^{n}$. As shown in Figure 2, the second row $z_2$ and the ninth row $z_9$ of the matrix $Z$ are the network representations of $l_2$ and $d_4$, respectively.
Furthermore, we traced $z_i$ back to its original feature space. $Z$ was subsequently decoded on the basis of the graph convolution,

$$\hat{X} = \hat{f}(Z, \tilde{U}) = \mathrm{Sigmoid}\left(\tilde{U}\, Z\, W_{deco}\right),$$

where $W_{deco} \in \mathbb{R}^{n \times N}$ is a parameter matrix and $\mathrm{Sigmoid}(t) = \frac{1}{1 + e^{-t}}$ is the activation function. To make $\hat{X}$ and $X$ as consistent as possible, the loss function of the graph convolution autoencoder was defined as the mean-square error (MSE) between them.
The network representations $z_i$ of the lncRNA nodes and $z_j$ of the disease nodes obtained by the graph convolutional network were then combined to obtain the network representation $k_{i,j} \in \mathbb{R}^{2n}$ of the node pair $l_i$-$d_j$,

$$k_{i,j} = z_i \oplus z_j,$$

where $\oplus$ denotes concatenation. For the pair $l_2$-$d_4$, $z_2$ and $z_9$ were concatenated to get $k_{2,9}$, which was then projected onto a $C$-class ($C = 2$) association probability distribution using fully connected and softmax layers,

$$p_l = \mathrm{Softmax}\left(W_l\, k_{2,9} + b_l\right),$$

where $W_l \in \mathbb{R}^{2 \times 2n}$ is the parameter matrix of the fully connected layer and $b_l \in \mathbb{R}^{2}$ is the bias term. In this two-class distribution $p_l$, class 0 means that $l_2$ and $d_4$ are not associated whilst class 1 indicates an association between $l_2$ and $d_4$. The probability of class 1 was taken as the predictive score $\mathrm{score}^{l}_{2,4}$ of the association between $l_2$ and $d_4$. $\mathrm{score}^{l}_{2,4}$ measures the likelihood of association between lncRNA $l_2$ and disease $d_4$; the greater its value, the more likely they are to be associated. The score $\mathrm{score}^{l}_{i,j}$ of a possible association between $l_i$ and $d_j$ can be obtained by the same method.
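The normalization and the encode/decode steps can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the encoder activation is not specified in the text and ReLU is assumed here, the feature matrix `X` is a stand-in for the attention-enhanced node features, and all weights are random toy values.

```python
import numpy as np

def normalize_adjacency(U):
    """Add self-connections, then apply symmetric degree normalization."""
    U_hat = U + np.eye(U.shape[0])
    deg = U_hat.sum(axis=1)                  # diagonal of the degree matrix E
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * U_hat * d_inv_sqrt[None, :]

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def autoencoder_forward(U_norm, X, W_enco, W_deco):
    """One encoder layer and one decoder layer of the graph convolution autoencoder."""
    Z = np.maximum(U_norm @ X @ W_enco, 0.0)  # encoder; ReLU is an assumption
    X_rec = sigmoid(U_norm @ Z @ W_deco)      # decoder, as in the text
    mse = float(np.mean((X_rec - X) ** 2))    # reconstruction loss
    return Z, X_rec, mse

# Toy network with N = 7 nodes and n = 3 latent dimensions.
rng = np.random.default_rng(0)
N, n = 7, 3
U = rng.random((N, N))
U = (U + U.T) / 2                             # symmetric toy adjacency
U_norm = normalize_adjacency(U)
X = U.copy()                                  # stand-in node features
W_enco = rng.normal(scale=0.1, size=(N, n))
W_deco = rng.normal(scale=0.1, size=(n, N))

Z, X_rec, mse = autoencoder_forward(U_norm, X, W_enco, W_deco)
k_pair = np.concatenate([Z[1], Z[5]])         # pair representation for two hypothetical node rows
```

The concatenated vector `k_pair` has length $2n$ and would be passed to the fully connected and softmax layers to produce the left-side score.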

Construction of the Embedding Matrix of lncRNA-Disease Node Pairs
The pair $l_2$ and $d_4$ serves to illustrate the process of constructing the embedding matrix, as shown in Figure 3. If $l_2$ and $d_4$ have similarities and associations with common lncRNAs, the likelihood of association between them is high. In the matrices $L$ and $A$, $l_2$ and $d_4$ have a similarity and an association, respectively, with $l_1$. Thus, there may be an association between them. The second row of $L$ records the similarity between $l_2$ and all lncRNAs. The fourth column of $A$ records the associations between $d_4$ and all lncRNAs. These were spliced together as the first part of the embedding matrix $P_{2,4} \in \mathbb{R}^{2 \times N}$. Similarly, if $l_2$ and $d_4$ have connexions with common miRNAs and diseases, they are more likely to be associated. The second row of $A$ and the fourth row of $D$ were combined as the second part of $P_{2,4}$. Finally, the second row of $B$ and the fourth row of $C$ were combined as the third part of $P_{2,4}$. In this way, lncRNA similarity, disease similarity, lncRNA-disease association, lncRNA-miRNA interaction, and disease-miRNA association were integrated to construct the embedding matrix $P_{2,4}$ of the node pair $l_2$-$d_4$. The same method is used to construct the embedding matrix $P_{i,j}$ for the other lncRNA-disease node pairs $l_i$-$d_j$.
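The three-part construction can be written compactly with NumPy slicing. The sketch below uses random toy matrices and zero-based indices; the function name `pair_embedding` is a hypothetical helper, not part of the original work.

```python
import numpy as np

def pair_embedding(i, j, L, D, A, B, C):
    """Build the 2 x N embedding matrix of the pair (lncRNA i, disease j), zero-based."""
    # Row for the lncRNA: its links to all lncRNAs, diseases, and miRNAs.
    lnc_row = np.concatenate([L[i], A[i], B[i]])
    # Row for the disease: its links to all lncRNAs, diseases, and miRNAs.
    dis_row = np.concatenate([A[:, j], D[j], C[j]])
    return np.vstack([lnc_row, dis_row])

# Toy data: 5 lncRNAs, 6 diseases, 5 miRNAs.
rng = np.random.default_rng(0)
n_l, n_d, n_m = 5, 6, 5
L = rng.random((n_l, n_l))
D = rng.random((n_d, n_d))
A = rng.integers(0, 2, size=(n_l, n_d)).astype(float)
B = rng.integers(0, 2, size=(n_l, n_m)).astype(float)
C = rng.integers(0, 2, size=(n_d, n_m)).astype(float)

P = pair_embedding(1, 3, L, D, A, B, C)  # l_2 and d_4 in one-based notation
print(P.shape)  # (2, 16)
```

Each column of the result pairs one link of the lncRNA with the corresponding link of the disease to the same node, which is what lets the convolution filters detect shared neighbors.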
The convolutional neural networks learn the local representation of a node pair from its embedding matrix. For the convolution process, a zero-padding operation was run on $P_{2,4}$; to be precise, zeros were padded around $P_{2,4}$ to obtain $\bar{P}_{2,4} \in \mathbb{R}^{T \times N_1}$, where $T = 2 + 2$ and $N_1 = N + 2$. In the first convolution layer, the filter length and width were set to $n_f$ and $n_d$, respectively. If the number of filters is $n_{conv}$, the convolution filter $W_{conv}$ is applied to $\bar{P}_{i,j}$ to obtain the first feature maps $S^{1}_{i,j} \in \mathbb{R}^{(T - n_f + 1) \times (N_1 - n_d + 1) \times n_{conv}}$. The area and process of convolution are defined as follows,

$$P^{conv}_{m,n} = \bar{P}_{i,j}(m : m + n_f,\; n : n + n_d),$$

$$S^{1}_{i,j}(m, n, k) = g\left(W_{conv}(k) * P^{conv}_{m,n} + b_{conv}(k)\right),$$

where $P^{conv}_{m,n}$ is the region covered by the sliding window when the filter $W_{conv}$ slides to the $m$th row and the $n$th column of $\bar{P}_{i,j}$, $g(t) = \mathrm{ReLU}(t) = \max(0, t)$ is the activation function, and $b_{conv}(k)$ is the bias of the $k$th filter. If the convolution filter $W_{conv}$ is applied to the embedding matrix $P_{2,4}$ of the node pair $l_2$-$d_4$, the first feature map $S^{1}_{2,4}$ is obtained.
Robust features can be extracted from a feature map by applying max-pooling. In the pooling layer, the max-pooling operation was performed on $S^{1}_{i,j}$ to obtain the feature representation

$$Q^{1}_{i,j}(m, n, k) = \max\left(S^{1}_{i,j}(m : m + n_a,\; n : n + n_b,\; k)\right),$$

where $n_a$ and $n_b$ are the length and width of the pooling layer sliding window, respectively, and $S^{1}_{i,j}(m : m + n_a, n : n + n_b, k)$ is the region covered by the sliding window when the pooling window slides to the $m$th row and the $n$th column of $S^{1}_{i,j}$. If max-pooling is performed on the feature maps $S^{1}_{2,4}$ of the node pair $l_2$-$d_4$, the feature representation $Q^{1}_{2,4}$ is obtained.
We continue to use the node pair $l_2$-$d_4$ as an example. $Q^{1}_{2,4}$ was used as the input of the second convolution layer to obtain the feature representation $Q^{2}_{2,4}$ after the convolution and max-pooling operations. Convolution and max-pooling were also run on $Q^{2}_{2,4}$ in the third convolution and pooling layers to obtain the feature representation $Q^{3}_{2,4} \in \mathbb{R}^{n_m \times n_g \times n_{conv}}$, where $n_m$ and $n_g$ are respectively the length and width of the feature representation after three rounds of convolution and pooling. $Q^{3}_{2,4}$ was flattened into the vector $q_{2,4} \in \mathbb{R}^{n_m n_g n_{conv}}$. As on the left side, fully connected and softmax layers served to project $q_{2,4}$ onto the $C$-class ($C = 2$) association probability distribution $p_r$,

$$p_r = \mathrm{Softmax}\left(W_r\, q_{2,4} + b_r\right),$$

where $W_r \in \mathbb{R}^{2 \times (n_m n_g n_{conv})}$ is the parameter matrix of the fully connected layer and $b_r$ is the bias term. The probability of class 1 was taken as the predictive score $\mathrm{score}^{r}_{2,4}$ of the association between $l_2$ and $d_4$. $\mathrm{score}^{r}_{2,4}$ measures the probability of association between lncRNA $l_2$ and disease $d_4$; the higher its value, the more likely the association between them. The score $\mathrm{score}^{r}_{i,j}$ of a possible association between $l_i$ and $d_j$ can be obtained by the same method.
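To make the shapes in the first convolution and pooling layers concrete, the following sketch runs a single filter over a zero-padded toy embedding matrix. It is an illustration only: one filter is used instead of $n_{conv}$, the pooling windows are assumed to be non-overlapping (the stride is not stated in the text), and the function names are hypothetical.

```python
import numpy as np

def conv2d_valid(P_pad, W, b):
    """Slide one filter over the padded matrix and apply ReLU at each position."""
    n_f, n_d = W.shape
    rows = P_pad.shape[0] - n_f + 1
    cols = P_pad.shape[1] - n_d + 1
    S = np.empty((rows, cols))
    for m in range(rows):
        for n in range(cols):
            region = P_pad[m:m + n_f, n:n + n_d]          # window covered by the filter
            S[m, n] = max(0.0, np.sum(W * region) + b)    # ReLU(filter response + bias)
    return S

def max_pool(S, n_a, n_b):
    """Max-pooling over non-overlapping n_a x n_b windows (assumed stride)."""
    rows, cols = S.shape[0] // n_a, S.shape[1] // n_b
    Q = np.empty((rows, cols))
    for m in range(rows):
        for n in range(cols):
            Q[m, n] = S[m * n_a:(m + 1) * n_a, n * n_b:(n + 1) * n_b].max()
    return Q

# Toy embedding matrix with N = 20 columns, padded by one zero on every side.
rng = np.random.default_rng(0)
P = rng.random((2, 20))
P_pad = np.pad(P, 1, mode="constant")   # shape (T, N_1) = (4, 22)

W = rng.normal(size=(3, 11))            # filter of length n_f = 3 and width n_d = 11
S = conv2d_valid(P_pad, W, b=0.0)       # shape (T - n_f + 1, N_1 - n_d + 1) = (2, 12)
Q = max_pool(S, n_a=2, n_b=2)           # shape (1, 6)
```

With $n_{conv}$ filters the same computation is repeated per filter and the results are stacked along a third axis, which gives the three-dimensional feature maps described in the text.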

Combination Strategy
The left and right sides of the model analyze the relationship between lncRNA $l_2$ and disease $d_4$ from different perspectives. To combine their characteristics and improve model performance, a combination strategy was designed for the final prediction. On the left side of the model, the loss is the cross-entropy between the association prediction distribution $p_l$ and the real distribution, computed over the $T$ training samples with $z$ as the sample label. On the right side of the model, the loss is the cross-entropy between $p_r$ and the real distribution, computed in the same way. The final association prediction score $\mathrm{score}_{2,4}$ of $l_2$ and $d_4$ is the weighted sum of $\mathrm{score}^{l}_{2,4}$ and $\mathrm{score}^{r}_{2,4}$, where the weight $\lambda \in (0, 1)$ balances the contributions of the left and right sides of the model.

Reducing Overfitting
Our neural network has many parameters, and the more parameters a model has, the more prone it is to over-fitting. The "dropout" technique consists of setting the output of each hidden neuron to zero with a probability of 0.5. The neurons that are "dropped out" in this way do not participate in the forward pass or in back-propagation [46]. Thus, every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. This technique reduces intricate co-adaptation of neurons, because a neuron cannot depend on the existence of other neurons. Therefore, it is forced to learn robust and beneficial features in conjunction with different random subsets of other neurons. At test time, we multiplied the outputs of all the neurons by 0.5, which reasonably approximates the geometric mean of the predictive distributions produced by the exponentially many dropout networks.
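The two phases of dropout described above can be sketched as follows. This is a minimal NumPy illustration of the scheme (drop with probability 0.5 during training, scale by 0.5 at test time); the function names are hypothetical, and deep learning frameworks typically provide their own dropout layers.

```python
import numpy as np

def dropout_train(h, rng, p_drop=0.5):
    """Zero each hidden activation independently with probability p_drop."""
    mask = rng.random(h.shape) >= p_drop
    return h * mask

def dropout_test(h, p_drop=0.5):
    """At test time keep every neuron and scale its output by the keep probability."""
    return h * (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones(10000)                    # toy hidden-layer activations
h_train = dropout_train(h, rng)       # roughly half of the entries become zero
h_test = dropout_test(h)              # every entry becomes 0.5
```

The test-time scaling keeps the expected activation equal to the one seen during training, which is why the two phases can share the same weights.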

Performance Evaluation Metrics
We used fivefold cross-validation to evaluate and compare the performance of GCNLDA with other state-of-the-art prediction methods. If there is an association between lncRNA $l_i$ and disease $d_j$, the node pair $l_i$-$d_j$ is regarded as a positive example. In contrast, a lack of association indicates that $l_i$-$d_j$ is a negative example. In the whole dataset, there were far fewer positive than negative examples, and this class imbalance affects model training. Therefore, we randomly extracted from the dataset the same number of negative examples as the total number of positive examples and randomly divided them into five equal subsets. All positive examples were also partitioned into five subsets of equal size. Four subsets each of the positive and negative examples were used to train the prediction model. All remaining samples were used for testing. Before each round of cross-validation, we removed the lncRNA-disease associations to be used for testing and then recalculated the similarity of the lncRNAs with the remaining associations.
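The sampling and splitting procedure can be sketched in plain Python. This is an illustration under assumptions rather than the authors' pipeline: the function names and the toy pairs are hypothetical, and the recalculation of lncRNA similarity after removing test associations is not shown.

```python
import random

def balanced_folds(positives, negatives, k=5, seed=0):
    """Sample as many negatives as positives, then split both sets into k folds."""
    rng = random.Random(seed)
    sampled_neg = rng.sample(negatives, len(positives))
    shuffled_pos = positives[:]
    rng.shuffle(shuffled_pos)
    pos_folds = [shuffled_pos[f::k] for f in range(k)]
    neg_folds = [sampled_neg[f::k] for f in range(k)]
    return pos_folds, neg_folds

def cross_validation_splits(positives, negatives, k=5, seed=0):
    """Yield (train_pos, train_neg, test_pos, test_neg) for each of the k rounds."""
    pos_folds, neg_folds = balanced_folds(positives, negatives, k, seed)
    for f in range(k):
        train_pos = [p for g in range(k) if g != f for p in pos_folds[g]]
        train_neg = [q for g in range(k) if g != f for q in neg_folds[g]]
        used = set(train_neg)
        test_pos = pos_folds[f]
        # All remaining samples are tested, including negatives that were never sampled.
        test_neg = [q for q in negatives if q not in used]
        yield train_pos, train_neg, test_pos, test_neg

# Toy data: 10 positive pairs and 50 negative pairs, written as (lncRNA, disease) indices.
positives = [(i, i) for i in range(10)]
negatives = [(i, j) for i in range(10) for j in range(10) if i != j][:50]

splits = list(cross_validation_splits(positives, negatives))
for train_pos, train_neg, test_pos, test_neg in splits:
    print(len(train_pos), len(train_neg), len(test_pos), len(test_neg))
```

Training on a balanced subset while testing on all remaining pairs keeps the test set as imbalanced as the real data, which is why the precision-recall curve is informative alongside the ROC curve.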
We used the trained model to estimate the association prediction scores of the test samples and then ranked them in descending order. When the association prediction score between an lncRNA and a disease was greater than a threshold $\theta$, the example was deemed positive. Otherwise, it was deemed negative. We used TP and TN to represent the numbers of correctly identified positive and negative examples, respectively. FN and FP represent the numbers of misidentified positive and negative examples, respectively. The true positive rate (TPR), false positive rate (FPR), precision, and recall were calculated as follows,

$$TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{TN + FP}, \quad Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}.$$

The TPRs, FPRs, precisions, and recalls were calculated by changing $\theta$. The TPRs and FPRs were used to plot the receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) was used to measure the global performance of the prediction method. To improve the assessment of the model performance in the event of class imbalance, we plotted the precision-recall (PR) curve based on the calculated precisions and recalls. The area under the PR curve (AUPR) also quantified the overall performance of the prediction method. GCNLDA's AUCs and AUPRs during each cross-validation are listed in Supplementary Table S1.
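The four rates at a given threshold can be computed directly from the confusion counts. The sketch below uses made-up scores and labels; the function names are hypothetical, and returning a precision of 1.0 when nothing is predicted positive is a convention chosen for this example.

```python
def confusion_counts(scores, labels, theta):
    """Count TP, FP, TN, FN when scores above theta are predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > theta
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, precision, recall) from the confusion counts."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted positive
    recall = tpr                                     # recall and TPR are the same quantity
    return tpr, fpr, precision, recall

# Made-up prediction scores and true labels for six test pairs.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]

counts = confusion_counts(scores, labels, theta=0.5)
tpr, fpr, precision, recall = rates(*counts)
```

Sweeping `theta` over the sorted scores and collecting the resulting (FPR, TPR) and (recall, precision) points gives the ROC and PR curves, whose areas are the AUC and AUPR.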
To evaluate the ability of our model to predict lncRNA-disease associations, we compared it with other state-of-the-art prediction methods, including Ping's method [25], LDAP [32], MFLDA [33], and SIMCLDA [34]. We tuned the parameters of GCNLDA by cross-validation to optimize its prediction performance. On the left side of the model, network node representations with n = 100 were obtained from the graph convolution encoding operation, and the learning rate of the autoencoder was set to 0.001. On the right side of the model, the three convolution layers used n_conv1 = 20, n_conv2 = 30, and n_conv3 = 40 filters of length n_f = 3 and width n_d = 11, and the learning rate was set to 0.0005. The parameters were updated by the Adam optimization algorithm throughout the training process. ReLU was the activation function for all fully connected layers. The optimal parameters of the other methods were obtained through grid search: for SIMCLDA, α_l = 0.8, α_d = 0.6, and λ = 1; for Ping's method, α = 0.6; for MFLDA, α = 10^5; and for LDAP, gap open = 10 and gap extend = 0.5.
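For reference, the reported settings can be collected in one place. The sketch below only restates the values given above; the class name, field names, and dictionary keys are hypothetical and do not correspond to any released GCNLDA code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GCNLDAConfig:
    # Left side: graph convolutional autoencoder.
    node_repr_n: int = 100            # n = 100 for the node representations
    autoencoder_lr: float = 1e-3
    # Right side: three convolution layers.
    conv_filters: tuple = (20, 30, 40)  # n_conv1, n_conv2, n_conv3
    filter_length: int = 3              # n_f
    filter_width: int = 11              # n_d
    cnn_lr: float = 5e-4
    # Shared training choices.
    optimizer: str = "adam"
    fc_activation: str = "relu"


# Baseline parameters as reported for the compared methods.
BASELINE_PARAMS = {
    "SIMCLDA": {"alpha_l": 0.8, "alpha_d": 0.6, "lambda": 1},
    "Ping": {"alpha": 0.6},
    "MFLDA": {"alpha": 1e5},          # the reported value read as 10^5
    "LDAP": {"gap_open": 10, "gap_extend": 0.5},
}

config = GCNLDAConfig()
```

Freezing the dataclass keeps a run's settings immutable, which makes it harder to compare results obtained under silently different hyperparameters.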
As shown in Figure 4a and Table 1, GCNLDA achieved the best performance over the 405 diseases, with an AUC of 0.959. Its performance was superior to those of SIMCLDA, Ping's method, MFLDA, and LDAP by 21.34%, 8.84%, 33.36%, and 9.64%, respectively. We also list the AUCs of all five methods for 10 well-characterized diseases, each of which has more than 15 known associated lncRNAs. GCNLDA had the best performance on these 10 diseases (Table 1). Ping's method and LDAP fused the similarities of lncRNAs and diseases, which improved the accuracy of their similarity calculations and yielded good performance. Ping's method also exploited the topology information of the bipartite networks, so its performance was slightly superior to that of LDAP. In contrast, SIMCLDA only fused multiple similarities of lncRNAs, and consequently its performance was inferior to those of the aforementioned methods. MFLDA integrates multiple data sources but ignores the similarities of lncRNAs and diseases; as a result, its performance is inferior to those of the other methods. These methods focus mainly on lncRNA and disease similarity and on the integration of multiple data sources, and they make negligible use of network topology information. The advantages of GCNLDA over the other methods include deep learning to extract the local representation of lncRNA-disease node pairs and graph convolution to learn their network representation.
As shown in Figure 4b and Table 2, GCNLDA also achieved the best performance over the 405 diseases in terms of AUPR (AUPR = 0.2233). It was 16.4% better than SIMCLDA, 7.17% better than Ping's method, 18.45% better than MFLDA, and 9.64% better than LDAP. GCNLDA achieved the best performance for nine of the ten well-characterized diseases.
To verify whether the performance of our method was significantly better than those of the other methods, we conducted paired Wilcoxon tests between GCNLDA and each of the other methods. In all cases, p < 0.05 (Table 3). The performance of GCNLDA was therefore significantly better than those of the other methods in terms of both AUC and AUPR.
A higher recall rate among the top k ranked lncRNAs means that more known lncRNA-disease associations are correctly identified. As shown in Figure 5, GCNLDA consistently outperformed the other methods at different k values. The average recall rates of the top 30, 60, 90, and 120 lncRNA candidates for GCNLDA were 91.5%, 97.3%, 98.5%, and 99.7%, respectively. For Ping's method, they were 68.9%, 81.3%, 87.5%, and 92.7%, respectively. For LDAP, they were 68.5%, 81.3%, 88%, and 93.3%, respectively. For SIMCLDA, they were 49.3%, 63%, 74.1%, and 80.3%, respectively. For MFLDA, they were 42%, 53.9%, 61%, and 65.5%, respectively.
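The recall rate among the top k candidates can be computed per disease and then averaged. The following is a minimal sketch with hypothetical function names and toy identifiers; it assumes that each disease's candidates are ranked by prediction score and that recall is measured against that disease's held-out known associations.

```python
def recall_at_k(scores, positives, k):
    """Fraction of a disease's held-out associated lncRNAs ranked in the top k.

    scores: dict mapping lncRNA name -> prediction score for one disease.
    positives: set of lncRNAs with a known (held-out) association.
    """
    if not positives:
        raise ValueError("recall is undefined without positive examples")
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for lnc in ranked[:k] if lnc in positives)
    return hits / len(positives)


def mean_recall_at_k(per_disease, k):
    """Average recall@k over (scores, positives) pairs, skipping diseases
    that have no held-out positive."""
    values = [recall_at_k(s, p, k) for s, p in per_disease if p]
    return sum(values) / len(values)


# Toy data with placeholder names, not real predictions.
toy_scores = {"lnc_a": 0.9, "lnc_b": 0.8, "lnc_c": 0.3, "lnc_d": 0.1}
toy_positives = {"lnc_a", "lnc_c"}
recalls = [recall_at_k(toy_scores, toy_positives, k) for k in (1, 2, 3)]
```

In this toy case the two positives are ranked first and third, so recall only reaches 1.0 once k covers the third position.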

Case Studies on Stomach Cancer, Osteosarcoma, and Lung Cancer
To test the ability of GCNLDA to predict potential lncRNA-disease associations, we conducted a case analysis on stomach cancer, osteosarcoma, and lung cancer. We analyzed in detail the top 15 candidates for each of these diseases (Table 4). The top 15 candidates for all 405 diseases were obtained through GCNLDA and are listed in Supplementary Table S2. All known lncRNA-disease associations were treated as training samples and all lncRNA-disease pairs with unknown associations were used as test samples.
In Table 4, "Lnc2Cancer" means the lncRNA candidate was included in the Lnc2Cancer database. "LncRNADisease" means the candidate was included among the experimentally verified data in LncRNADisease. "LncRNADisease*" means the candidate was included among the predicted data in LncRNADisease. "Literature" means the candidate was supported in published studies.
Lnc2Cancer is a database of 4986 experimentally supported lncRNA-disease associations covering 1614 human lncRNAs and 165 human cancers. The LncRNADisease database contains lncRNA-disease associations verified by experimentation as well as associations predicted by state-of-the-art methods. Twelve of the 15 lncRNA candidates related to stomach cancer were included in the Lnc2Cancer database and 10 of them were included among the experimentally verified data in LncRNADisease. These database entries confirm that the lncRNAs are associated with stomach cancer. A disease-related lncRNA candidate labelled as "Literature" was supported in published studies. As shown in Table 4, candidate MIR17HG (alias mir-17-92) was labelled as "Literature" and has been shown to be dysregulated in stomach cancer [47].
Among the top 15 lncRNA candidates for osteosarcoma listed in Table 4, ten were included in the Lnc2Cancer database and two were found in LncRNADisease with experimental support. These candidates were thus confirmed to be associated with osteosarcoma. A recently published study showed that AFAP1-AS1 enhances cell proliferation and invasion in osteosarcoma by regulating miR-4695-5p/TCF4-β-catenin signaling [48]. Nine of the top 15 lncRNA candidates for lung cancer were in Lnc2Cancer and eight appeared in LncRNADisease. A recent report confirmed that lncRNA MIR155HG promotes lung cancer cell proliferation, migration, and invasion [49].
The remaining eight lncRNA candidates labelled "LncRNADisease*" were included in the predicted lncRNA-disease associations in the LncRNADisease database. These predictions reveal that GCNLDA effectively discovers potential lncRNA-disease associations.
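The case-study procedure, ranking the unknown pairs of one disease and checking the top candidates against external evidence, can be sketched as follows. The function names, the toy scores, and the evidence sets are hypothetical placeholders; no real database content is reproduced here.

```python
def top_candidates(scores, known, n=15):
    """Rank lncRNAs without a known association to the disease by score.

    scores: dict mapping lncRNA name -> predicted association score.
    known: set of lncRNAs already associated with the disease (training data).
    """
    unknown = {lnc: s for lnc, s in scores.items() if lnc not in known}
    return sorted(unknown, key=unknown.get, reverse=True)[:n]


def annotate(candidates, evidence):
    """Label each candidate with every evidence source that contains it.

    evidence: dict mapping a source label (for example "Lnc2Cancer") to the
    set of lncRNAs that source links to the disease.
    """
    return {lnc: [src for src, members in evidence.items() if lnc in members]
            for lnc in candidates}


# Toy inputs with placeholder names.
toy_scores = {"lncA": 0.95, "lncB": 0.90, "lncC": 0.40, "lncD": 0.85}
toy_known = {"lncA"}
toy_evidence = {"Lnc2Cancer": {"lncB"}, "Literature": {"lncB", "lncD"}}

ranked = top_candidates(toy_scores, toy_known, n=2)
labels = annotate(ranked, toy_evidence)
```

Candidates that receive an empty label list are the ones left without supporting evidence and would be the natural targets for follow-up validation.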

Conclusions
GCNLDA predicts potential lncRNA-disease associations using a graph convolutional network and convolutional neural networks. An attention mechanism was constructed at the node feature level to distinguish the contributions of the different node features. The graph convolutional autoencoder with the attention mechanism deeply integrates the topological information of the lncRNA-disease-miRNA heterogeneous network. The convolutional neural network module captures the various connection relationships related to a lncRNA-disease pair from the node pair embedding. Together, the two modules learn the network and local representations of lncRNA-disease node pairs. Cross-validation confirmed that GCNLDA is superior to other state-of-the-art methods in terms of both AUC and AUPR. Case studies on three diseases substantiated the ability of GCNLDA to predict potential disease-associated lncRNAs. GCNLDA may serve as an effective tool for screening reliable candidates for validation of lncRNA-disease associations by wet-lab experiments.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4409/8/9/1012/s1, Table S1: AUC and AUPR of GCNLDA in each cross-validation.
Author Contributions: P.X. and S.P. conceived the prediction method, and they wrote the paper. Y.L. and S.P. developed the computer programs. T.Z. and H.S. analyzed the results and revised the paper.