Identification of new small non-coding RNAs from tobacco and Arabidopsis
Introduction
Small non-messenger RNAs have typically been defined as non-coding RNAs with sizes ranging from 20 to 500 nt [1]. Using the so-called experimental RNomics approach, small non-protein-coding RNAs of 50–500 nt have been cloned in several sequenced organisms, including the model plant A. thaliana [2], [3]. In addition to new members of known small ncRNA classes, such as small nucleolar RNAs, other RNAs have been identified and located in nuclear intergenic regions and in the mitochondrial or chloroplast genomes. Recently, another class of eukaryotic small ncRNAs has emerged in many organisms: microRNAs (miRNAs), which negatively regulate their complementary mRNAs at the posttranscriptional level. In plants, miRNAs base pair to their target RNAs and induce their degradation (for review, see [4]). These miRNAs are small, non-coding RNAs of 18–25 nt in length with 5′-phosphate and 3′-hydroxyl ends. Their sequences correspond to regions annotated as intergenic and arise from evolutionarily conserved hairpin precursor transcripts of variable length (60–300 nt in plants) that are cleaved by the RNase III-like enzyme Dicer. Another class of small ncRNAs, small interfering RNAs (siRNAs), shares structural properties with miRNAs. These siRNAs are 21 nt in length; however, they are double stranded and arise by another mechanism (for review, see [5]).
Initially, identification of the first miRNAs in A. thaliana [6], [7] involved cloning their cDNAs and confirming their expression by Northern blot analysis. More recently, purely computational approaches have been developed: they apply a series of conditions that a sequence has to fulfill to be eligible as a candidate [8], [9], [10]. Complete genomes were systematically checked against a set of rules derived from previously described, experimentally identified miRNAs. These approaches were successful in identifying large numbers of miRNA sequences. For example, Wang et al. [10] used several filters, such as hairpin structure, G-C content, and identity with the Oryza sativa genome, to select putative miRNAs from Arabidopsis intergenic sequences.
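The rule-based screening described above can be pictured as a chain of predicates applied to each candidate sequence. The following is a minimal sketch only, not the pipeline of Wang et al. [10] or of any other cited work: the function names, the length window, and the G-C thresholds are illustrative assumptions, and the hairpin and rice-conservation filters are left out because they would require a folding program and a second genome.

```python
from typing import Callable, Iterable, List


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def length_ok(seq: str, lo: int = 18, hi: int = 25) -> bool:
    """Keep sequences in a miRNA-like length window (assumed bounds)."""
    return lo <= len(seq) <= hi


def gc_ok(seq: str, lo: float = 0.30, hi: float = 0.70) -> bool:
    """Keep sequences whose G-C fraction lies in an assumed range."""
    return lo <= gc_content(seq) <= hi


def run_filters(candidates: Iterable[str],
                filters: List[Callable[[str], bool]]) -> List[str]:
    """Apply each filter in turn, discarding candidates step by step."""
    kept = list(candidates)
    for keep in filters:
        kept = [seq for seq in kept if keep(seq)]
    return kept


# Toy candidates: one plausible 20-nt sequence, one poly-A run, one too short.
candidates = ["UGACAGAAGAGAGUGAGCAC", "A" * 21, "GCGC"]
survivors = run_filters(candidates, [length_ok, gc_ok])
print(survivors)
```

Ordering the filters from cheapest to most expensive keeps such a screen tractable on a whole genome, since each step shrinks the set passed to the next one.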
Here, we describe a mixed method, which allowed the identification and classification of new ncRNAs. We constructed a cDNA library from small RNAs (sRNAs) ranging from 20 to 30 nt. The clones originated from N. tabacum, for which few genomic sequences are known. Hence, it was likely that most of our sequences would not be assigned to a known sequence in this species. However, we expected that, if miRNAs were cloned, the sequences would be conserved and similarities would be found among other plant sequences. This approach thus differs from purely computational approaches, which can be described as pipelines of tests that start with a whole genome and eliminate sequences step by step [8], [9], [10]. The main task of such a process is to avoid false positives, which are likely to occur because of the size of the initial set and the absence of prior knowledge about the query sequences. Our starting point was very different, as we knew that each sequence we examined was actually expressed. In this context, the task of the bio-computing analysis was to assign a possible identity and/or role to these molecules, and to direct future experiments. Thus, even though we were looking for the same type of features as the purely computational approaches, our constraints could be more relaxed.
The sequences from 73 unique clones were compared to the Arabidopsis thaliana genome and to all known plant sequences, using a pattern-matching approach allowing for a limited number of mismatches (program PatBank). In this way, we selected 15 clones from the library corresponding mostly to A. thaliana or N. tabacum non-coding sequences. By Northern blot analysis, we confirmed the presence of most of the RNA candidates in Arabidopsis and/or in N. sylvestris (maternal ancestor of N. tabacum). In addition, stable folds of the sequences with their flanking regions (upstream and downstream) were computationally predicted with MIRFOLD, a program specifically dedicated to the prediction of miRNA precursor structure. Stable hairpin structures were observed. Ways of characterizing the targets of these putative miRNAs are discussed.
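To make the idea of pattern matching with a limited number of mismatches concrete, here is a small sketch of a sliding-window search on both strands. It is not PatBank, whose algorithm is not described in this excerpt: the function names, the Hamming-distance criterion (substitutions only, no gaps), and the default mismatch limit are assumptions made for illustration.

```python
from typing import Iterator, Tuple

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Upper-case a sequence and write U as T so RNA and DNA compare directly."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return to_dna(seq).translate(_COMPLEMENT)[::-1]


def find_hits(query: str, target: str,
              max_mismatches: int = 2) -> Iterator[Tuple[int, str, int]]:
    """Yield (start, strand, mismatches) for each target window that matches
    the query, or its reverse complement, within max_mismatches substitutions."""
    query, target = to_dna(query), to_dna(target)
    n = len(query)
    if n == 0 or n > len(target):
        return
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        for start in range(len(target) - n + 1):
            window = target[start:start + n]
            mismatches = sum(a != b for a, b in zip(pattern, window))
            if mismatches <= max_mismatches:
                yield start, strand, mismatches


# Toy example: an 8-nt RNA query against a short DNA target.
hits = list(find_hits("ACGUACGA", "TTTACGTACGGATTT", max_mismatches=1))
print(hits)
```

This brute-force scan costs time proportional to query length times target length, which is acceptable for a few dozen short clones but would need indexing to scale to large sequence collections.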
Cloning of sRNAs from tobacco
Total RNA was isolated from Nicotiana tabacum leaves 3–5 h after infiltration of harpin protein from Erwinia amylovora [11]. Small RNAs (about 500 μg) were purified on a denaturing 15% polyacrylamide gel. A gel fragment spanning the size range of 20–30 nt was excised and the RNA was eluted overnight in 0.3 M NaCl at 4 °C. RNA was recovered after ethanol precipitation and ligated sequentially to 5′ and 3′ RNA/DNA chimeric oligonucleotide adapters (5′ adapter: ACGGAATTCCTCACTaaa (lower case denotes RNA) and
Construction and characterization of a small RNA library from N. tabacum
We constructed a cDNA library from small RNAs of N. tabacum leaves as described in Llave et al. [6]. Small RNAs (20–30 nt) were size fractionated on denaturing polyacrylamide gels. After elution, they were ligated to 5′ and 3′ adapters, amplified, cloned and sequenced. We obtained sequences from 127 clones. The size distribution showed a typical bimodal curve [6] with peaks at 21 and 24 nt (not shown). A small proportion of sequences (9%) were considered too small (≤ 14 nt) for
Conclusion
We constructed a cDNA library of small ncRNAs from tobacco using a procedure developed to clone miRNAs. To help identify the cloned sequences, we developed a program, PatBank, which allowed comparison with the fully sequenced genome of Arabidopsis. Indeed, we postulated that regulatory RNAs would be conserved between the two plants. We identified sRNAs localized in A. thaliana or N. tabacum intergenic regions, and in non-coding regions from organelles. Most of the sequences presented here
Acknowledgments
We are very grateful to Louise Chapell (Sainsbury Laboratory) for introducing one of us (M.B.) to small RNA manipulations and cloning. M.B. warmly thanks Vincent Colot and his group (URGV, Evry) for discussions and encouragement. We thank Edouardo Rocha (CNRS, ABI) and Gillian for advice on the manuscript. We thank both referees for helpful suggestions to improve our manuscript. This work has been funded by the ACI Interface physique, chimie, biologie: dynamique et reactivité des assemblages
References (25)
- et al. Experimental Rnomics, identification of 140 candidates for small non-messenger RNA in the plant Arabidopsis thaliana. Curr. Biol. (2002)
- et al. Computational identification of plant microRNAs and their targets including a stress-induced miRNA. Mol. Cell (2004)
- et al. Novel disease resistance specificities result from sequence exchange between tandemly repeated genes at the Cf-4/9 locus of tomato. Cell (1997)
- et al. Single processing centre models for human Dicer and bacterial RNase III. Cell (2004)
- et al. Prediction of plant micro-RNA targets. Cell (2002)
- Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. (2001)
- et al. RNomics in Drosophila melanogaster: identification of 66 candidates for novel non-messenger RNAs. Nucleic Acids Res. (2003)
- RNA silencing in plants. Nature (2004)
- et al. Mechanisms of gene silencing by double-stranded RNA. Nature (2004)
- et al. Endogenous and silencing associated small RNAs in plants. Plant Cell (2002)