SPMLMI: predicting lncRNA–miRNA interactions in humans using a structural perturbation method

Long non-coding RNA (lncRNA)–microRNA (miRNA) interactions are quickly emerging as important mechanisms underlying the functions of non-coding RNAs. Accordingly, predicting lncRNA–miRNA interactions provides an important basis for understanding the mechanisms of action of ncRNAs. However, the accuracy of the established prediction methods is still limited. In this study, we used structural consistency to measure the predictability of interactive links based on a bilayer network by integrating information for known lncRNA–miRNA interactions, an lncRNA similarity network, and an miRNA similarity network. In particular, by using the structural perturbation method, we proposed a framework called SPMLMI to predict potential lncRNA–miRNA interactions based on the bilayer network. We found that the structural consistency of the bilayer network was higher than that of any single network, supporting the utility of bilayer network construction for the prediction of lncRNA–miRNA interactions. Applying SPMLMI to three real datasets, we obtained areas under the curves of 0.9512 ± 0.0034, 0.8767 ± 0.0033, and 0.8653 ± 0.0021 based on 5-fold cross-validation, suggesting good model performance. In addition, the generalizability of SPMLMI was better than that of the previously established methods. Case studies of two lncRNAs (i.e., SNHG14 and MALAT1) further demonstrated the feasibility and effectiveness of the method. Therefore, SPMLMI is a feasible approach to identify novel lncRNA–miRNA interactions underlying complex biological processes.


Structural consistency
The concept of structural consistency was proposed by Lü et al. to quantify the link predictability of complex networks (Lü et al. 2015; Zeng et al. 2018). It is defined as the consistency of the structural features of a network before and after the random removal of a small fraction of its links. In this study, we applied this concept to a weighted bilayer network, namely the weighted lncRNA-miRNA bilayer network $A$.
We use the graph $G = (V, E, W)$ to represent the weighted lncRNA-miRNA bilayer network, where $V$ and $E$ are the sets of nodes (including both lncRNA and miRNA nodes) and edges, respectively, and $W$ is the set of edge weights. We select a small fraction of the links to compose a perturbation set $\Delta E$, and the remaining links are denoted $E^R$. $\Delta A$ and $A^R$ denote the corresponding weighted adjacency matrices, so that
$$A = A^R + \Delta A. \tag{1}$$
Because $A^R$ is a real symmetric matrix, it can be diagonalized as
$$A^R = \sum_{k=1}^{N} \lambda_k x_k x_k^T, \tag{2}$$
where $N$ is the number of nodes, $\lambda_k$ are the eigenvalues of $A^R$, and $x_k$ are the corresponding orthonormal eigenvectors.
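The split of the links into a perturbation set and a remainder, followed by the diagonalization of the remaining matrix, can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the helper name `split_links`, the toy weights, and the 20% removal fraction are all assumptions chosen for the example.

```python
import numpy as np

def split_links(A, fraction, rng):
    """Split a symmetric weighted adjacency matrix A into (A_R, dA).

    A fraction of the existing links is moved, with its weights, into the
    perturbation matrix dA; the remaining links form A_R, so A = A_R + dA.
    (Hypothetical helper, written only for this illustration.)
    """
    rows, cols = np.nonzero(np.triu(A, k=1))      # each undirected link once
    n_remove = int(round(fraction * len(rows)))
    picked = rng.choice(len(rows), size=n_remove, replace=False)
    dA = np.zeros_like(A)
    dA[rows[picked], cols[picked]] = A[rows[picked], cols[picked]]
    dA = dA + dA.T                                # keep the matrix symmetric
    return A - dA, dA

rng = np.random.default_rng(0)
# Toy symmetric weighted network with zero diagonal (made-up weights).
W = np.triu(rng.random((6, 6)), k=1)
A = W + W.T

A_R, dA = split_links(A, fraction=0.2, rng=rng)

# A_R is real and symmetric, so eigh returns real eigenvalues and
# orthonormal eigenvectors (stored as the columns of X).
lam, X = np.linalg.eigh(A_R)
reconstructed = (X * lam) @ X.T                   # sum_k lam_k * x_k x_k^T
```

Here `np.linalg.eigh` is used rather than `np.linalg.eig` because it exploits symmetry and guarantees real eigenvalues with orthonormal eigenvectors, which is what the diagonalization above relies on.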
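Using $\Delta E$ as the perturbation set, we obtain a perturbed matrix by first-order approximation, which allows the eigenvalues to change but keeps the eigenvectors constant. Two cases are considered.
First, consider the non-degenerate case, in which no eigenvalue is repeated. After perturbation, the eigenvalue $\lambda_k$ becomes $\lambda_k + \Delta\lambda_k$ and the corresponding eigenvector becomes $x_k + \Delta x_k$. The eigen-equation then reads
$$(A^R + \Delta A)(x_k + \Delta x_k) = (\lambda_k + \Delta\lambda_k)(x_k + \Delta x_k). \tag{3}$$
Left-multiplying Equation (3) by $x_k^T$ and neglecting the second-order terms $x_k^T \Delta A \Delta x_k$ and $\Delta\lambda_k x_k^T \Delta x_k$, we obtain
$$\Delta\lambda_k \approx \frac{x_k^T \Delta A x_k}{x_k^T x_k}. \tag{4}$$
Using the perturbed eigenvalues while keeping the eigenvectors unchanged, the perturbed matrix is
$$A' = \sum_{k=1}^{N} (\lambda_k + \Delta\lambda_k) x_k x_k^T, \tag{5}$$
which can be considered a linear approximation of the given network $A$ when the expansion is based on $A^R$.
Second, consider the case in which the adjacency matrix has repeated eigenvalues. Let $\lambda_{ki}$ denote the eigenvalues, where the index $k$ distinguishes different eigenvalues and the index $i = 1, \ldots, M$ runs over the $M$ eigenvectors $x_{ki}$ associated with the same eigenvalue. Any linear combination of eigenvectors belonging to the same eigenvalue is still an eigenvector. After a perturbation is added to the network, we choose the eigenvectors of each degenerate eigenvalue so that they change continuously into the perturbed non-degenerate eigenvectors. If we define the chosen eigenvector as
$$\tilde{x}_k = \sum_{i=1}^{M} \beta_{ki} x_{ki}, \tag{6}$$
giving us
$$(A^R + \Delta A)(\tilde{x}_k + \Delta\tilde{x}_k) = (\lambda_k + \Delta\tilde{\lambda}_k)(\tilde{x}_k + \Delta\tilde{x}_k), \tag{7}$$
then, for any $n = 1, \ldots, M$, left-multiplying Equation (7) by $x_{kn}^T$ and neglecting the second-order terms yields
$$\sum_{i=1}^{M} \beta_{ki}\, x_{kn}^T \Delta A\, x_{ki} = \Delta\tilde{\lambda}_k\, \beta_{kn},$$
which determines the coefficients $\beta_{ki}$ and the eigenvalue corrections $\Delta\tilde{\lambda}_k$.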
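For the non-degenerate case, the first-order update described above (shift each eigenvalue, keep each eigenvector) can be sketched in a few lines of NumPy. The function name `perturbed_matrix` and the toy matrices are assumptions made for this illustration; the sketch does not treat repeated eigenvalues separately.

```python
import numpy as np

def perturbed_matrix(A_R, dA):
    """First-order perturbed matrix: eigenvalues shift, eigenvectors stay fixed.

    Hypothetical helper for illustration; assumes A_R and dA are real
    symmetric matrices of the same shape.
    """
    lam, X = np.linalg.eigh(A_R)                  # columns of X are orthonormal
    # d_lam[k] = x_k^T dA x_k (the denominator x_k^T x_k equals 1 here)
    d_lam = np.einsum("ik,ij,jk->k", X, dA, X)
    A_pert = (X * (lam + d_lam)) @ X.T            # sum_k (lam_k + d_lam_k) x_k x_k^T
    return A_pert, lam, d_lam

rng = np.random.default_rng(1)
W = np.triu(rng.random((5, 5)), k=1)
A_R = W + W.T                                     # toy "remaining" network
E = np.triu(rng.random((5, 5)), k=1) * 1e-3
dA = E + E.T                                      # small symmetric perturbation

A_pert, lam, d_lam = perturbed_matrix(A_R, dA)

# Compare the first-order eigenvalues with the exact eigenvalues of A_R + dA.
exact = np.linalg.eigvalsh(A_R + dA)
err = np.max(np.abs(np.sort(lam + d_lam) - exact))
```

Two sanity checks follow directly from the formula: a zero perturbation returns `A_R` unchanged, and a perturbation proportional to `A_R` is reproduced exactly because it shares the eigenvectors of `A_R`.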

Structural perturbation method for lncRNA-miRNA interaction prediction
In general, link prediction in a network aims to estimate the likelihood that an unobserved link exists, based on known topological information. The structural perturbation used in calculating structural consistency can also be used to predict missing links (Lü et al. 2015).
For the lncRNA-miRNA bilayer network $A$, taking 5-fold cross-validation as an example, the observed links in the original adjacency matrix (LMnet) are randomly divided into five equal-sized subsets. One subset is selected as the probe set, and the other four, together with the lncRNA and miRNA similarity networks (LSnet and MSnet), form the training set. Next, we randomly remove a fraction of the links from the training set to constitute the perturbation set, and the perturbed matrix $A'$ is calculated as described in Section 1. The final prediction matrix $\hat{A}'$ is obtained by averaging $A'$ over $t$ independent selections of the perturbation set. The elements of $\hat{A}'$ can thus be regarded as scores for pairs of nodes in the bilayer network $A$. The scores in $\hat{A}'$ rank all unobserved lncRNA-miRNA interactions: the higher the score, the more likely the potential interaction is assumed to be.
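The procedure above can be sketched end to end on toy data. This is a hedged illustration rather than the SPMLMI code. The block layout of the bilayer matrix (lncRNA nodes first, then miRNA nodes), the function names, the removal fraction of 0.1, and t = 10 are assumptions. The sketch applies the non-degenerate first-order formula to all eigenvalues.

```python
import numpy as np

def build_bilayer(LMnet, LSnet, MSnet):
    """Assumed layout: [[LSnet, LMnet], [LMnet^T, MSnet]], lncRNA nodes first."""
    return np.block([[LSnet, LMnet], [LMnet.T, MSnet]]).astype(float)

def perturb_once(A, fraction, rng):
    """Remove a random fraction of links, then rebuild A to first order."""
    rows, cols = np.nonzero(np.triu(A, k=1))
    n_remove = int(round(fraction * len(rows)))
    picked = rng.choice(len(rows), size=n_remove, replace=False)
    dA = np.zeros_like(A)
    dA[rows[picked], cols[picked]] = A[rows[picked], cols[picked]]
    dA = dA + dA.T
    lam, X = np.linalg.eigh(A - dA)               # decompose the remaining network
    d_lam = np.einsum("ik,ij,jk->k", X, dA, X)    # x_k^T dA x_k
    return (X * (lam + d_lam)) @ X.T

def spm_scores(LM_train, LSnet, MSnet, fraction=0.1, t=10, seed=0):
    """Average t perturbed matrices and return the lncRNA x miRNA score block."""
    rng = np.random.default_rng(seed)
    A = build_bilayer(LM_train, LSnet, MSnet)
    A_hat = sum(perturb_once(A, fraction, rng) for _ in range(t)) / t
    n_l = LM_train.shape[0]
    return A_hat[:n_l, n_l:]

def toy_similarity(n, rng):
    """Random symmetric similarity matrix with unit diagonal (made-up data)."""
    S = rng.random((n, n))
    S = (S + S.T) / 2
    np.fill_diagonal(S, 1.0)
    return S

rng = np.random.default_rng(42)
n_l, n_m = 8, 6
LMnet = (rng.random((n_l, n_m)) < 0.4).astype(float)   # toy known interactions
LSnet, MSnet = toy_similarity(n_l, rng), toy_similarity(n_m, rng)

# One fold of 5-fold cross-validation: hide one fifth of the known links.
links = np.argwhere(LMnet == 1)
folds = np.array_split(rng.permutation(len(links)), 5)
probe = links[folds[0]]
LM_train = LMnet.copy()
LM_train[probe[:, 0], probe[:, 1]] = 0

scores = spm_scores(LM_train, LSnet, MSnet)
```

In an evaluation, the scores of the probe links would be compared with the scores of the unobserved pairs (entries of `LMnet` equal to 0) to compute the area under the curve for that fold, and the procedure would be repeated for the remaining four folds.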

References
Lü L, Pan L, Zhou T, Zhang YC, and Stanley HE. 2015. Toward link predictability of complex networks.