Computational Model for Predicting the Relationship Between Micro-RNAs and Their Target Messenger RNAs in Breast and Colon Cancers

Motivation: Uncovering the relationship between micro-RNAs (miRNAs) and their target messenger RNAs (mRNAs) can provide critical information regarding the mechanisms underlying certain types of cancers. In this context, we have proposed a computational method, referred to as prediction analysis by optimization method (PAOM), to predict miRNA-mRNA relations using data from normal and cancer tissues, and then applying the relevant algorithms to colon and breast cancers. Specifically, we used 26 miRNAs and 26 mRNAs with 676 (= 26 × 26) relationships to be recovered as unknown parameters. Results: Optimization methods were used to detect 61 relationships in breast cancer and 32 relationships in colon cancer. Using sequence filtering, we detected 18 relationships in breast cancer and 15 relationships in colon cancer. Among the 18 relationships, CD24 is the target gene of let-7a and miR-98, and E2F1 is the target gene of miR-20. In addition, the frequencies of the target genes of miR-223, miR-23a, and miR-20 were significant in breast cancer, and the frequencies of the target genes of miR-17, miR-124, and miR-30a were found to be significant in colon cancer. Availability: The numerical code is available from the authors on request.


Introduction
Micro-RNAs (miRNAs) are a large class of small, non-coding RNAs consisting of 22 nucleotides that are expressed from longer and endogenous hairpin-shaped transcripts generally referred to as pre-miRNAs. 1 The miRNAs regulate proteincoding gene expression post-transcriptionally via the translational repression or transcript degradation of their target messenger RNAs (mRNAs), 1,2 thereby indicating that miR-NAs perform crucial roles in a variety of biological functions. The results of recent studies have demonstrated that miRNAs are deregulated in cancers. 2,3 Several computational methods have been proposed to determine how miRNAs pair with their target mRNAs, 3,4 in an effort to unravel the roles of miRNAs in the deregulated expression of their target mRNAs during cancer development 5,6 and some relationships were identified between miRNAs and their target mRNAs. 5,[7][8][9][10][11] However, the development of a computational method for the identification of such relationships in cancer remains a difficult issue. Thus far, 2 computational methods have been developed: either the identification of miRNAs that are conserved in different species or stem loop prosecutors 11 or the identification of the relationship between miRNAs and their target mRNAs via the use of sequence homologues. Previously, we proposed a numerical optimization method for multi (miRNAs) -to -multi (mRNAs), which was used for the identification of 16 miRNAs-mRNA relations in colon cancer microarray profiles. 7 The proposed method was used to identify 207 relationships successfully, out of 484. Some relationships detected in that study were verified through previous experimental evidences. For example, the relationship between miR-17 and its target E2F1 was identified in a previous colon cancer study. 12 Like previous studies, however, the difference between the expression profiles of normal and cancer tissues, which we believe to be a critical factor in determining the relationship between miRNAs and their target mRNAs, was ignored.
Here, we propose a novel method, referred to as prediction analysis by optimization method (PAOM), which is composed of a mathematical model and computational method designed to predict miRNA-mRNA relations in the context of cancer development. In this model, the predicted relations are filtered using sequence analysis resources. For mathematical modeling, we employed linear system equations to obtain the inhibiting parameters. The role of the computational method is to optimize the relations and to allow for comparisons of the differences in the parameters between normal and cancer genes. We employed 2 optimization methods-the Broyden-Fletcher-Goldfarb-Shannon (BFGS) and the Powell method, 13 both of which are well known for the optimization of a multidimensional matrix problem. For filtering sequence analysis, we used PicTar, based on the scanning multiple alignment of 3′ UTR (untranslated region) sequences and a search set of miRNA sequences, and scored the overlapping position. 11 MiRanda is a miRNA target prediction algorithm that optimizes sequence complementarity using position-specific rules and relies on strict interspecies conservation requirements. 14 In this study, we considered breast and colon cancer data with 676 relationships between 26 miRNAs and 26 mRNAs.

Methods
The mathematical formulation and computational method of the PAOM are described in this section.

Mathematical formulation
It is generally accepted that miRNAs regulate gene expression via either the transcript cleavage or translational repression of their specific target mRNAs, whereas 1 mRNA expression is regulated by multiple miRNAs. For this mechanism, we have a linear equation model, in which 1 mRNA is affected by several miRNAs ( , , , ) x x x m 1 2  as follows: where a ij represents the influence of the jth miRNA on the ith mRNA, x j represents the expression level of the jth miRNA, and y i represents the expression level of the ith mRNA. To observe the relation between m miRNAs and n mRNAs simultaneously, we rewrite the system of linear equations in matrix form.
in which the measurements of the expression of m different miRNAs are denoted by ( , , ) x x m 1  and of the expression of n different mRNAs are denoted by ( , , ) y y n 1  . From K times experiments, We solve the equation as an inverse problem, and then obtain a ij as an unknown parameter, where a ij mainly represents the effect of x i and y j , even if a ij is affected by the levels of other x ' i s(i' ≠ i).

Computational method
In this section, we present a computational scheme for the identification of the relationships between miRNAs and their target mRNAs. The computational method is composed of 3 components-the optimization routine, the objective function, and the direct code. Among them, the optimization routine is the principal component of the computational method, the role of which is to obtain a new set of parameter estimations by solving the inverse problem shown in Equation 3. In particular, we employ the BFGS method for optimization, the so-called quasi-Newton method, which requires second derivatives of the objective function and thereby makes a quadratic convergent to the minimum of error norm, coupled with a drastic reduction of the computational burden. The second method is the Powell method known as direction set method, in which no such second derivatives of the objective function are required. Both methods are useful for multi-dimensional optimization, but they do not work successfully in all cases. After testing both methods with small nodes, we selected an appropriate one for the current cases. Next, the objective function is to provide the criterion for further processing to the next iteration on the basis of iterative error norms. Regarding the error norm, we employed the L 1 and L 2 norms, which are as follows: where y x a ij ( , ) is numerically computed data and y real is the experimentally obtained expression data. The L 1 norm worked successfully for the BFGS method and the L 2 norm worked better for Powell method. Finally, the direct solver is used to generate the computational data by solving the proposed mathematical model. Those computational data are then compared with the experimental data in an objective function. As both cancer and normal data were employed in this study, a note was that the predicted relations using cancer data are called P cancer , and the predicted relations using normal data are called P normal . Then, the relationships between the cancer and normal data sets were computed and compared with each other. Therefore, the numerical scheme is composed of 3 individual algorithms, such as 2 subroutines and 1 main routine, as shown in Figure 1. In the main routines, we compared the inhibitory relations of the normal data with the inhibitory relations of the cancer data, and then calculated the comparison values (CVs) using relative error as follows: As CV becomes bigger, the relation is proportional to the strength of the relation between mRNA and miRNA. With the normal and cancer expression data sets, we calculated the unknown parameters as inhibitory relations in the subroutine. The overall numerical scheme of the proposed algorithm follows.

Data setting
We extracted experimentally known 26 miRNAs and 26 mRNAs from the RNA expression profiles of human cancers reported by Lu et al. 15 In colon cancers, each gene consists of 4 normal and 7 cancer data points. In breast cancer, each gene consists of 3 normal and 6 cancer data points. As the data sets are much smaller than unknown parameters, malpositioning problem frequently occurs in the algorithm. For filtering analysis, we used PicTar (https://pictar.mdc-berlin.de/) and miRanda (http://www.microrna.org). PicTar is a computational method used to identify common targets of miRNAs, based on scanning multiple alignments of 3′ UTR sequences and a search set of miRNA sequences, followed by the scoring of the overlapping positions combining the PicTar scores of orthologous transcripts. 11 miRanda is a miRNA target prediction algorithm that optimizes sequence complementarity using position-specific rules and relies on strict interspecies conservation requirements. 14 Neither sequence filtering has any relation with cancers.    Figure 1 represents the comparisons of the real and reconstructed data based on the parameters derived from the Powell and BFGS methods. We tested 2 optimization methods: Powell and BFGS, with 4 nodes of mRNAs (MAPK14, E2F1, HMGA2, and NOTCH1) and 4 nodes of miRNAs (miR-34a, miR-17a, miR-24, and miR-30) from data points of the normal colon tissues. We also assessed each method with the L 1 and L 2 objective functions. The BFGS method with the L 1 and L 2 norms successfully reconstructed the real data points. The Powell method worked well with the L 2 norm for the reconstruction but not with L 1 , which suggests that more numerical studies will be necessary to evaluate the performance of Powell method. When the BFGS method with the L 1 and L 2 norms was used for large nodes in the normal breast data, an error term of 0.011 was obtained with L 2 norm, thereby indicating that its accuracy was quite poor as compared with that of 5.97E-7 with the L 1 norm. In this study, therefore, we employed the BFGS method with L 1 . Figure 2 shows the frequency of negative relationships with each of the presented miRNAs in breast cancer by cutting the PAOM score 5.

Results
The frequency of miR-20, miR-23a, and miR-223 are relatively high, which suggests that those miRNAs do exert some effect on breast cancer. 16,17 Zhang et al 6 presented miR-20 with the copy number lost in breast cancer. Figure 3 represents the frequency of negative relation with each of the colon cancer miRNAs by cutting PAOM score 5. The frequencies of miR-17, miR-30a, and miR-124 were highest, which implies that those miRNAs exert an effect on colon cancer. Recently, Monzo et al 12 observed that miR-17 was detected in human colon cancer development, and Silber et al 18 observed upregulated miR-124 in colon (HCT-116) cancer cell lines. Tables 1 and 2 show the miRNAs-mRNAs relationships using the proposed method with filtering sequence analysis. Table 1 shows the predicted relationships between miRNAs and mRNAs with sequence analysis in breast cancer. Based on our proposed method, we acquired 61 strong candidates out of a total of 676 miRNA-mRNA relations. With the integration of filtering sequence analysis, we predicted 18   Those filtering resources are not generally associated with breast cancer. However, the current methods predicted some relationships using normal breast and cancer breast expression data. The results of our proposed method indicate that let-7a, miR-223, miR-98a, and miR-34a downregulate CD24 expression in cases of breast cancer. With sequence analysis, we suggest that let-7a (PAOM score 48.2) and miR-98a (PAOM score 12.0) are predicted to be strongly associated with CD24 in breast cancer. Those relationships were experimentally verified. 19,20 Kaipparettu et al 19 found that CD24 expression was downregulated by estrogen in breast cancer stem cells. Verghese et al 20 found that let-7 family was downregulated significantly in breast tumor-initiating cells (BT-1Cs) that were enriched with CD24. Dai et al 21 showed that NF2 was a tumor suppressor gene in human breast cancer. The findings of the current study reveal that miR-24, miR-141, miR-23a, miR-19a, miR-27a, and miR-15a are involved in the downregulation of NF2 expression in human breast cancer cells. With further sequence analysis, we found that miR-141, miR-23a, and miR-27a may be involved in the downregulation of NF2 expression in breast cancer cells. Recent findings have shown that miR-20 regulates E2F1 negatively. 8,22 Recently, Yu et al 17 discovered a novel regulatory mechanism of breast cancer involving miR-20, which we also found here in PAOM 57.3. The following target genes have yet been verified by their miRNAs in breast cancer; however, the supporting evidences suggest that the relations are strongly associated with the breast cancer. CXCL12 expression is downregulated in primary breast carcinomas. 23,24 Inactivation of the product of MAPK14 via PPM1D overexpression was also previously discovered in breast tumor cell lines. 25 Supporting evidence for the role of NOTCH1 in breast cancer cells involves the fact that the rate of NOTCH1 expression in human breast cancer was found to be significantly higher than those of normal breast tissues at the margin of tumor sections. 26 Zang et al 27 showed that Notch signaling is overexpressed and highly activated in breast cancer. HMGA2 has been reported to be expressed in invasive and non-invasive breast cell lines. 28 Yang et al 29 experimentally verified that a core circadian CLOCK gene evidences tumor suppression properties and is downregulated in human breast cancer cells. Overexpression of THBS1 (TSP1) was detected in breast carcinoma and melanoma cells by interferon (IFN)-gamma-differentiated U937 cells in vitro via the release of reactive oxygen species. 30 Finally, the expression of SERP1 was suppressed in papillary thyroid cancer. 31 However, SERP1 and POLR2 have not yet been identified in breast cancer cells. Table 2 shows the prediction of 15 relationships in colon cancer using the proposed method with filtering sequence analysis. HMGA2 (high mobility group [HMGI]) was observed to be abundantly expressed in human colorectal carcinomas. [32][33][34] MAPK14 that is regulated by miR-124 11,35 maintains a high level of ERbeta for E2 anti-proliferative effects in colon cancer cells 36 and in giloblastoma. 37 In addition, MAPK14 is involved in apoptosis in colorectal cancer induced by growth factors. 38 The activation of the Wnt signaling pathway appears to suppress the expression of the THBS1 gene in colon cancer cells. 39 Jung et al 40 observed that SIP1 (ZEB2), an E-cadherin transcriptional repressor, is induced by overexpressing TMPRSS2 in colon cancer cells, and affects the loss of E-cadherinmediated cell-cell adhesion, resulting in an increase in cellular motility. Krugluger et al 41 found that CLOCK is more abundantly expressed in colon cancer tissues than in normal tissue.

Prediction of colon cancer
In addition, Kiriakidou et al 42 experimentally reported that CLOCK is a target gene of miR-141. Recently, Zhang et al 43 demonstrated that the Notch1 signal transduction pathway mediates the effect of COX-2 selective inhibitors on colorectal cancer cells, and also discovered the mechanism of the Notch1  19 which was previously verified by Kaipparettu et al. 19 Further studies will be necessary to verify those findings. In colon cancer cells, we predicted 15 relationships with 12 genes and verified 9 genes-MAPK14, CLOCK, HMGA2, THBS1, CXCL12, SIP1, NOTCH1, E2F3, and DVL2-influence on colon cancer. Overall, the proposed method described in this study was successful in the detection of some potential relationships, and may provide information for experimental studies targeting toward the identification of miRNA-mRNA relationships for specific cancers. However, there is no doubt that many unidentified relations continue to exist. Therefore, a novel approach using both computational methods and experimental validation is yet to be proposed for better outcomes in the prediction of miRNA-mRNA relationships in cancers.