Elsevier

Methods

Volume 83, 15 July 2015, Pages 80-87
PlantMirnaT: miRNA and mRNA integrated analysis fully utilizing characteristics of plant sequencing data

https://doi.org/10.1016/j.ymeth.2015.04.003

Highlights

  • We model the miRNA–mRNA relationship considering the split ratio of miRNA abundance.

  • The model is designed to fully utilize sequencing data.

  • Two miRNAs suppress the glucose pathway in a naturally drought-resistant rice.

Abstract

A miRNA is known to regulate up to several hundred coding genes, so the integrated analysis of miRNA and mRNA expression data is an important problem. Unfortunately, the integrated analysis is challenging because it must consider expression data of two different types, miRNA and mRNA, and because the target relationship between miRNA and mRNA is not clear, especially when microarray data are used. Fortunately, owing to the low cost of sequencing, small RNA and RNA sequencing are now routinely performed, and we may be able to infer regulatory relationships between miRNAs and mRNAs more accurately by using sequencing data. However, no method has been developed specifically for sequencing data. We therefore developed PlantMirnaT, a new miRNA–mRNA integrated analysis system.

To fully leverage the power of sequencing data, three major features were developed and implemented in PlantMirnaT. First, we implemented a plant-specific short read mapping tool based on recent discoveries about miRNA target relationships in plants. Second, we designed and implemented an algorithm that considers miRNA targets in the full intragenic region, not just the 3′ UTR. Last but most important, our algorithm is designed to consider the quantity of miRNA expression and its distribution over target mRNAs. The new algorithm was used to characterize rice under drought conditions using our proprietary data. Our algorithm successfully discovered that two miRNAs, miRNA1425-5p and miRNA398b, are involved in suppression of the glucose pathway in a naturally drought-resistant rice, Vandana.

The system can be downloaded at https://sites.google.com/site/biohealthinformaticslab/resources.

Introduction

MicroRNAs (miRNAs) are small non-coding RNAs, 19–25 nucleotides in length. miRNAs play a significant biological role by regulating genes post-transcriptionally in animals and plants. There are several hundred miRNAs in the human genome, and they target about 60% of protein-coding genes [10]. In other species, including plants, miRNAs play similarly critical roles [6], [17].

miRNA–mRNA regulatory relationships are n to m, since a miRNA can target up to m mRNA transcripts and an mRNA transcript can be regulated by up to n miRNAs. Because n is much smaller than m and miRNAs target about 60% of mRNAs, miRNAs are innately hubs in the gene regulation network, each targeting many genes. Construction of the miRNA–mRNA regulation network is therefore a very important problem. Although miRNAs not only regulate mRNAs but are also associated with transcription factors and siRNAs [6], we concentrate only on the relationship between miRNAs and protein-coding genes.

There are many computational methods for inferring miRNA–mRNA target relationships. Yoon et al. grouped these methods into three categories [45]. Tools in the first category use sequence complementarity between miRNAs and their target mRNA sequences. Many well-known algorithms, such as miRanda [9], PITA [22], RNAhybrid [35], and TargetScan [24], belong to the first category. Some algorithms in the first category, such as TAPIR [1] and PsRobot [40], are designed for plant miRNA target finding. Because these tools do not consider the expression information of miRNAs and genes, they tend to have high false positive rates. However, they are useful for filtering miRNA target pairs.

Tools in the second category use machine learning techniques to identify target pairs, with experimentally verified miRNA target pairs as training data. Some algorithms, such as miREE [36], NBmiRTar [44], and DIANA-microT-ANN [34], combine structural features with machine learning. A recent algorithm [27] was developed for plant miRNA target pair prediction. It uses a PCA–SVM algorithm to classify true and false miRNA–target pairs. By exploiting experimentally verified training data, the methods in the second category can find target pairs more accurately than first-category tools, but it is still hard for them to achieve a low false positive rate and to find condition-specific miRNA–target pairs.

The methods in the third category combine existing tools with expression profiles of miRNA and mRNA to obtain condition-specific miRNA–mRNA target pairs. The method proposed in this paper can also be classified into the third category. Examples of such methods are MMIA [4], [30], [42], GenMiR++ [16], and an algorithm in [29].

MMIA [4], [30], [42] performs the integrated analysis of miRNA and RNA expression data in two steps. The first step is to identify “differentially” expressed miRNAs by clustering analysis. In the second step, only genes that are targeted by differentially expressed miRNAs are considered. The gene set is further reduced by using sequence-based target finding algorithms such as TargetScan, PITA and PicTar, and also by using negative correlation information between miRNA and mRNA expression levels. MMIA divides the miRNA and mRNA expression data into three clusters: a down-regulated group, an up-regulated group and an unchanged group. It predicts a miRNA–mRNA target pair when the miRNA belongs to a down (up) regulated group and the mRNA belongs to an up (down) regulated group. This approach can effectively identify genes that are regulated by differentially expressed miRNAs. However, it can miss many true miRNA–mRNA pairs whose expression is not significantly up- or down-regulated in the cell.
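The opposite-group rule described above can be illustrated with a minimal sketch. It is not MMIA's implementation: the clustering step is replaced here by a fixed log2 fold-change threshold (an assumption), and the names `classify`, `opposite_pairs`, and the toy miRNA and gene identifiers are hypothetical.

```python
def classify(log2_fc, threshold=1.0):
    """Label a log2 fold change as 'up', 'down', or 'unchanged'.

    Assumption: a fixed threshold stands in for the clustering analysis.
    """
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


def opposite_pairs(mirna_fc, mrna_fc, putative_targets, threshold=1.0):
    """Keep putative (miRNA, mRNA) pairs whose groups point in opposite directions."""
    kept = []
    for mirna, mrna in putative_targets:
        groups = (classify(mirna_fc[mirna], threshold),
                  classify(mrna_fc[mrna], threshold))
        if groups in {("up", "down"), ("down", "up")}:
            kept.append((mirna, mrna))
    return kept


# Toy log2 fold changes (hypothetical values).
mirna_fc = {"miR-A": 2.1, "miR-B": 0.4}
mrna_fc = {"gene1": -1.8, "gene2": -0.6, "gene3": -1.2}
# Pairs that a sequence-based tool is assumed to have proposed.
putative = [("miR-A", "gene1"), ("miR-A", "gene2"), ("miR-B", "gene3")]

result = opposite_pairs(mirna_fc, mrna_fc, putative)
print(result)
```

In this toy input the pair (miR-B, gene3) is discarded because miR-B falls in the unchanged group, which is the kind of miss the paragraph above describes.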

GenMiR++ [16] uses a linear model for expected mRNA expression values based on the following equation:

$$E[x_{gt}] = \mu_t - \gamma_t \sum_{k} \lambda_k s_{gk} z_{kt}, \qquad \lambda_k > 0 \tag{1}$$

where $x_{gt}$ is the expression value of mRNA $g$ in sample $t$, $z_{kt}$ is the expression value of miRNA $k$ in sample $t$, $s_{gk}$ is an indicator variable denoting that miRNA $k$ targets mRNA $g$, $\mu_t$ is the background expression of mRNA, $\gamma_t$ is the tissue scaling factor, and $\lambda_k$ is the regulatory weight of miRNA $k$. Using the linear model, a Bayesian network model is proposed. In the Bayesian network model, the target transcript expression level $x$ depends on a tissue scaling parameter, the miRNA expression level, a regulatory weight, and an indicator variable denoting whether miRNA $k$ targets transcript $g$. $P(S|X,Z,C,\Theta)$ is estimated using Bayesian inference and the expectation–maximization technique.
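As a worked illustration of the linear model only (not of the Bayesian inference built on top of it), the following sketch evaluates the expected expression of one mRNA in one sample. The function name `expected_expression` and all numeric values are hypothetical.

```python
def expected_expression(mu_t, gamma_t, lambdas, s_g, z_t):
    """Evaluate E[x_gt] = mu_t - gamma_t * sum_k lambda_k * s_gk * z_kt.

    lambdas: regulatory weights lambda_k (must be positive)
    s_g:     indicators s_gk (1 if miRNA k targets mRNA g, else 0)
    z_t:     expression z_kt of each miRNA k in sample t
    """
    if any(weight <= 0 for weight in lambdas):
        raise ValueError("regulatory weights must be positive")
    repression = sum(w * s * z for w, s, z in zip(lambdas, s_g, z_t))
    return mu_t - gamma_t * repression


# Three miRNAs; the gene is targeted by the first and third (toy values).
lambdas = [0.5, 0.2, 0.1]
s_g = [1, 0, 1]
z_t = [4.0, 10.0, 2.0]

value = expected_expression(mu_t=8.0, gamma_t=1.0, lambdas=lambdas, s_g=s_g, z_t=z_t)
print(round(value, 3))  # 8.0 - (0.5*4.0 + 0.1*2.0) = 5.8
```

The second miRNA is highly expressed but contributes nothing because its indicator is zero, which shows how the indicator variable gates each miRNA's effect.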

Another algorithm [29] is based on the lasso regression technique. Given that sequence-based algorithms in the first category identify K miRNAs that target mRNA j, and that $c_{jk}$ is an indicator variable denoting whether miRNA k putatively targets the j-th mRNA, the following linear model was used:

$$x_j = \sum_{k=1}^{K} \beta_{jk} c_{jk} z_k + x_{j0} + \epsilon_j \tag{2}$$

where $\epsilon_j$ is an error term and $x_{j0}$ is the logarithm of the expression value when no miRNA targets the mRNA. With this model, lasso regression is used to find the $\beta$ values minimizing Eq. (3):

$$\min_{\beta_j,\, x_{j0}} \left( x_j - \sum_{k=1}^{K} \beta_{jk} c_{jk} z_k - x_{j0} \right)^2 + \lambda_j \sum_{k=1}^{K} \left| \beta_{jk} c_{jk} \right| \tag{3}$$

The lasso regression is solved under the constraint that $\beta$ be non-positive, so that miRNAs can only down-regulate their targets, and $\lambda_j \sum_{k=1}^{K} |\beta_{jk} c_{jk}|$ is the penalty term that enforces sparsity of the solution.

Although existing methods have succeeded in revealing important biological mechanisms, they have three major issues, especially for analyzing RNA-seq data in plants.
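A minimal sketch of a lasso fit with non-positive coefficients is given below. It is not the implementation of [29]: it assumes the squared error is summed over several samples (the sample index is not written out in the text), and it uses a plain coordinate descent in which each coefficient is updated in closed form and clipped at zero. The function name `nonpositive_lasso` and the toy data are hypothetical.

```python
def nonpositive_lasso(x, Z, c, lam, n_iter=200):
    """Coordinate descent for one mRNA:

        min over beta <= 0 and x0 of
            sum_t (x[t] - sum_k beta[k]*c[k]*Z[k][t] - x0)^2
            + lam * sum_k |beta[k]*c[k]|

    x:    log-expression of the mRNA in each sample t
    Z[k]: expression of miRNA k in each sample t
    c[k]: 1 if miRNA k putatively targets the mRNA, else 0
    """
    n, K = len(x), len(Z)
    beta = [0.0] * K
    x0 = sum(x) / n
    for _ in range(n_iter):
        for k in range(K):
            if c[k] == 0:
                continue  # miRNAs without a putative site never enter the model
            norm = sum(z * z for z in Z[k])
            if norm == 0:
                continue
            # Correlation of miRNA k with the residual that excludes miRNA k.
            rho = 0.0
            for t in range(n):
                fit = x0 + sum(beta[l] * c[l] * Z[l][t] for l in range(K) if l != k)
                rho += Z[k][t] * (x[t] - fit)
            # Closed-form minimizer on beta <= 0; clip to 0 if it would be positive.
            beta[k] = min(0.0, (rho + lam / 2.0) / norm)
        # Intercept: mean residual given the current coefficients.
        x0 = sum(
            x[t] - sum(beta[k] * c[k] * Z[k][t] for k in range(K)) for t in range(n)
        ) / n
    return beta, x0


# Toy data: the mRNA is repressed by miRNA 0 only (x = 10 - 1.0 * Z[0]).
Z = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.0, 1.0], [4.0, 3.0, 2.0, 1.0]]
x = [9.0, 8.0, 7.0, 6.0]
c = [1, 1, 0]  # miRNA 2 has no putative target site

beta, x0 = nonpositive_lasso(x, Z, c, lam=0.1)
print([round(b, 3) for b in beta], round(x0, 3))
```

The sign constraint keeps every coefficient at or below zero, and the penalty shrinks the true coefficient slightly toward zero while leaving the uninformative miRNA at exactly zero.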

First, existing methods are designed for animals, not for plants. There is a major difference in the miRNA regulatory mechanism between animals and plants. In plants, a miRNA and its target mRNAs match nearly perfectly. In animals, by contrast, a miRNA and its target mRNA match nearly perfectly only in the seed region of the miRNA. A recent study [13] reports the existence of additional motifs beyond seed regions, but these do not match as perfectly as in plants. In addition, animal miRNAs usually do not cleave the mRNA but instead interrupt protein translation. To reflect this major difference, we implemented and used a de novo miRNA mapping algorithm that incorporates plant-specific matching information.
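The difference between the two matching criteria can be sketched as follows. This is an illustration, not the mapping algorithm of PlantMirnaT: the mismatch limit of 2, the use of miRNA positions 2 to 8 as the seed (a common convention), the omission of G:U wobble pairs, and the toy sequence are all assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that pairs perfectly with `rna`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def mismatches(mirna, site):
    """Count site positions that do not pair with the miRNA (same length assumed)."""
    ideal = reverse_complement(mirna)
    return sum(1 for a, b in zip(site, ideal) if a != b)


def plant_like_match(mirna, site, max_mismatches=2):
    """Near-perfect complementarity over the whole miRNA."""
    return len(site) == len(mirna) and mismatches(mirna, site) <= max_mismatches


def animal_like_match(mirna, site):
    """Perfect pairing only in the seed (miRNA positions 2-8)."""
    seed_site = reverse_complement(mirna[1:8])
    # The seed sits at the miRNA 5' end, so it pairs near the 3' end of the site.
    return site[-8:-1] == seed_site


mirna = "AUGCGUACCGUUAGCAUCGGA"  # toy 21-nt sequence, not a real miRNA
perfect = reverse_complement(mirna)

# Corrupt everything outside the last 8 nucleotides of the site.
flip = {"A": "C", "C": "A", "G": "U", "U": "G"}
seed_only = "".join(flip[b] for b in perfect[:-8]) + perfect[-8:]

print(plant_like_match(mirna, perfect), animal_like_match(mirna, perfect))
print(plant_like_match(mirna, seed_only), animal_like_match(mirna, seed_only))
```

A site that pairs only in the seed passes the animal-style test but fails the plant-style test, which is why a mapper tuned for animal targets is a poor fit for plant data.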

Second, in plants it is also known that miRNAs target intragenic regions as well as the 3′ UTR [2], [26]. It is therefore necessary to detect miRNA target sites in the whole transcript region and then model the combinatorial effect of target sites in the whole gene region. To model the combinatorial effect, we designed and implemented a logarithmic model that considers all target sites in a transcript together.
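Scanning the whole transcript rather than only the 3′ UTR can be sketched as a sliding-window search. This is a simplified illustration and not the paper's mapping tool or its logarithmic model: the mismatch limit, the region labels assigned by site start position, and the toy transcript are assumptions, and `find_sites` is a hypothetical name.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def region_of(start, cds_start, cds_end):
    """Label a site by where it starts (a simplification for sites spanning borders)."""
    if start < cds_start:
        return "5'UTR"
    if start < cds_end:
        return "CDS"
    return "3'UTR"


def find_sites(mirna, transcript, cds_start, cds_end, max_mismatches=2):
    """Slide a miRNA-length window over the full transcript and report near-perfect sites."""
    ideal = reverse_complement(mirna)
    width = len(ideal)
    hits = []
    for start in range(len(transcript) - width + 1):
        window = transcript[start:start + width]
        mm = sum(1 for a, b in zip(window, ideal) if a != b)
        if mm <= max_mismatches:
            hits.append((start, mm, region_of(start, cds_start, cds_end)))
    return hits


mirna = "AUGCGUACCGUUAGCAUCGGA"          # toy sequence, not a real miRNA
site = reverse_complement(mirna)          # perfect site
near_site = "C" + site[1:]                # one mismatch at the first position

utr5 = "A" * 20
cds = "G" * 10 + site + "G" * 10          # perfect site placed inside the CDS
utr3 = "A" * 15 + near_site + "A" * 15    # imperfect site placed in the 3' UTR
transcript = utr5 + cds + utr3

hits = find_sites(mirna, transcript, cds_start=len(utr5), cds_end=len(utr5) + len(cds))
for hit in hits:
    print(hit)
```

Restricting the scan to the 3′ UTR would miss the coding-region site in this toy transcript, so the per-site results from the full scan are what a combinatorial model would need as input.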

Third and most importantly, existing methods consider only relative expression changes between miRNAs and their targets. However, the use of abundance, or expression quantity, in the integrated analysis of miRNA and mRNA has become an important issue [46]. When miRNAs are transcribed, the miRNA transcripts are split among their targets and hybridize to each of them to suppress the target mRNAs. Thus, a miRNA with a larger amount of expressed transcripts has a more significant impact on the mRNA transcriptome even if its relative change is low. Until now, abundance information has been difficult to measure from microarray data. Sequencing data can now provide the quantity of miRNA or mRNA expression more precisely. To leverage the power of sequencing data fully, we developed a split-ratio-based model whose terms reflect normalized expression quantities.

We applied our system to rice data under drought conditions (the data are described in the Results section). Drought can threaten plant survival, so complex mechanisms have evolved to monitor the environment accurately and reprogram metabolism dynamically [5]. We sought to reveal these mechanisms by analyzing drought-resistant rice samples.
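The contrast between relative change and abundance can be made concrete with a small arithmetic sketch. It does not reproduce the paper's model: reads-per-million normalization is assumed as one possible normalization, the read counts are invented, and the split ratios passed to `split_abundance` are hypothetical inputs rather than inferred values.

```python
def rpm(count, library_size):
    """Reads per million: normalize a raw read count by library size."""
    return count * 1_000_000 / library_size


def summarize(name, control, treated, lib_control, lib_treated):
    """Report the fold change and the absolute abundance gain of one miRNA."""
    c = rpm(control, lib_control)
    t = rpm(treated, lib_treated)
    return {"miRNA": name, "treated_rpm": t,
            "fold_change": t / c, "abundance_gain_rpm": t - c}


def split_abundance(abundance, split_ratios):
    """Distribute a miRNA's abundance over its targets by given split ratios."""
    total = sum(split_ratios.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    return {target: abundance * ratio for target, ratio in split_ratios.items()}


# Toy read counts in two libraries of 2 million reads each.
low = summarize("miR-low", 10, 40, 2_000_000, 2_000_000)
high = summarize("miR-high", 20_000, 24_000, 2_000_000, 2_000_000)
print(low)
print(high)

# Hypothetical split of the abundant miRNA over three targets.
shares = split_abundance(high["treated_rpm"],
                         {"geneA": 0.5, "geneB": 0.3, "geneC": 0.2})
print(shares)
```

The rare miRNA shows the larger fold change, yet the abundant miRNA gains far more transcripts in absolute terms, and even its smallest share per target exceeds the rare miRNA's total abundance.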

Section snippets

Description of overall learning algorithm

We developed an algorithm for inferring miRNA–mRNA regulatory mechanisms in plants using sequencing data. The inputs to our algorithm are small RNA sequencing data and RNA sequencing data. Our algorithm infers not only the targets of miRNAs but also the split ratio of each miRNA–mRNA target pair. Inference of the split ratio is a major feature distinguishing our algorithm from existing algorithms. By considering expression quantity, our algorithm can enforce the limit on the regulation capacity of a

Data

Two rice varieties (Oryza sativa L. Japonica nipponbare, Oryza sativa L. Vandana) were used to observe differences in rice under drought conditions. Vandana is a rice variety with wild-type drought resistance characteristics. Two different samples of Nipponbare (normal and AP2/EREBP transgenic sample) were used. The AP2/EREBP sample is the same as Nipponbare except that two AP2/EREBP transcription factors are amplified using the overexpression vectors OsCc1:AP37 and OsCc1:AP59,

Conclusion

We presented a novel approach for inferring the regulatory relationship between miRNAs and mRNAs in plants using small RNA sequencing data and RNA sequencing data. Our method is a comprehensive system for miRNA target inference that deploys many methods in a single framework, such as a de novo sequence mapping algorithm for plants and regression-based model optimization. The major feature distinguishing our method from existing methods is that our system is designed for sequencing data and also

Acknowledgement

This research was supported by a grant from KOBIC, the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Ministry of Science, ICT & Future Planning (2012M3A9D1054622), Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. NRF-2012M3C4A7033341) and a grant from the Next-Generation BioGreen 21 Program (No. PJ009037022012),

References (46)

  • K.M. Creasey et al. miRNAs trigger widespread epigenetically activated siRNAs from transposons in Arabidopsis. Nature (2014)
  • A. Döring et al. SeqAn An efficient, generic C++ library for sequence analysis. BMC Bioinf. (2008)
  • B. Engelmann. Plasmalogens: targets for oxidants and major lipophilic antioxidants. Biochem. Soc. Trans. (2004)
  • A.J. Enright et al. MicroRNA targets in Drosophila. Genome Biol. (2003)
  • R.C. Friedman et al. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. (2009)
  • P. Gong et al. Transcriptional profiles of drought-responsive genes in modulating transcription signal transduction, and biochemical pathways in tomato. J. Exp. Bot. (2010)
  • S. Griffiths-Jones et al. miRBase: microRNA sequences, targets and gene nomenclature. Nucl. Acids Res. (2005)
  • D. Huang et al. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protocols (2009)
  • D.W. Huang et al. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucl. Acids Res. (2008)
  • J.C. Huang et al. Using expression profiling data to identify human microRNA targets. Nat. Methods (2007)
  • M.W. Jones-Rhoades et al. MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol. (2006)
  • M. Kanehisa et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucl. Acids Res. (2000)
  • M. Kantar et al. miRNA expression patterns of Triticum dicoccoides in response to shock drought stress. Planta (2011)