Elsevier

Biochimie

Volume 93, Issue 11, November 2011, Pages 2019-2023

Research paper
Protein-coding structured RNAs: A computational survey of conserved RNA secondary structures overlapping coding regions in drosophilids

https://doi.org/10.1016/j.biochi.2011.07.023

Abstract

Functional RNA elements can also be embedded within exonic sequences that code for functional proteins. While this is not uncommon in viruses, only a few examples of this type have been described in some detail for eukaryotic genomes. Here we use RNAz and RNAcode, two comparative genomics methods that measure signatures of stabilizing selection acting on RNA secondary structure and on peptide sequence, respectively, to survey the fruit fly genomes. We estimate that there might be on the order of 1000 loci that are subject to dual selection pressure. The genome-wide screens used here also expose the limitations of the currently available methods.

Highlights

► Structure prediction reveals thousands of loci with evidence of stabilizing selection.
► Predictions of protein-coding regions uncover thousands of unannotated novel exons.
► Coding regions found in dicistronic transcripts and functional read-through products.
► About 9% of the RNAz hits overlap CDS, so there may be ∼1000 loci with dual functions.
► Available comparative genomics methods cannot produce accurate maps of dual loci.

Introduction

The worlds of protein-coding genes and of non-coding RNAs (ncRNAs) are often seen as clearly separated. A small number of examples from both prokaryotes and eukaryotes, however, demonstrate that this is not strictly true; see [1] for a recent review. The best-studied case in animals is the steroid receptor RNA activator gene (SRA). Originally characterized as a non-coding RNA with a distinctive secondary structure [2], it was later found to have isoforms that code for the functional protein SRAP [3]. SRA is probably the most extreme example, as nearly the complete transcript is covered by both conserved RNA structure and protein-coding region. At the other extreme, structured RNA motifs such as selenocysteine insertion sequences (SECIS), internal ribosome entry sites (IRES), and mRNA localization signals are by no means rare in the UTRs of protein-coding transcripts [4], where they do not overlap the coding sequence. In some cases, in particular in viruses, such structured elements are found within coding regions. The software tool RNAdecoder [5], which implements a comparative method for finding and folding RNA secondary structures within protein-coding regions, provided statistical evidence for frequent superpositions of RNA structure and coding sequence [6]. Nevertheless, no systematic genome-wide analysis of this phenomenon has been published to date.

The overwhelming majority of annotated coding regions translate into proteins of more than 30 amino acids. The discovery of the very short, independently encoded Tarsal-less peptides [7], [8], however, suggests that more small ORFs of this type might be hidden in the genomic DNA. Furthermore, examples such as plant ENOD40 [9], whose short ORFs are embedded in a heavily structured RNA, hint at a possible link between functional secondary structure and coding capacity in such atypical transcripts, which are likely to have escaped standard gene annotation procedures.
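
To make the notion of a small ORF concrete, here is a minimal Python sketch that scans the forward strand of a DNA sequence for ATG-to-stop ORFs encoding at most 30 amino acids, the size class that is rare among annotated coding regions. It is a hypothetical illustration (the function name `short_orfs` and its `max_aa` parameter are invented here), not part of the study's pipeline. A purely sequence-based scan like this reports many ORFs that are never translated, which is why comparative signals of selection on the peptide sequence are needed to separate genuine coding regions from chance ones.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def short_orfs(seq, max_aa=30):
    """Return sorted (start, end) pairs of forward-strand ORFs that run from an
    ATG to the first in-frame stop codon and encode at most max_aa amino acids
    (stop codon excluded from the count, included in the interval)."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # extend codon by codon until an in-frame stop codon
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop codon before the end of the sequence
                n_aa = (j - i) // 3  # residues from Met up to the stop
                if n_aa <= max_aa:
                    found.append((i, j + 3))
                i = j + 3  # continue scanning after the stop codon
                continue
            i += 3
    return sorted(found)
```

For instance, `short_orfs("CCATGAAATAGCC")` reports the three-codon ORF `ATGAAATAG` at positions 2 to 11, whereas an ORF of 41 codons is ignored. A real scan would also cover the reverse strand.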

We therefore begin our survey of the drosophilid genomes with independent predictions of evolutionarily conserved RNA secondary structures and of open reading frames showing evidence of stabilizing selection on the peptide sequence. To this end, we employ RNAz [10], [11] and RNAcode [12], respectively.
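
For readers unfamiliar with RNAz, its two main signals can be sketched in a few lines of Python (a simplified illustration, not RNAz code). The structure conservation index (SCI) divides the minimum free energy (MFE) of the consensus structure of an alignment by the average MFE of the individual sequences; a z-score compares an MFE with the mean μ and standard deviation σ expected for random sequences of the same length and composition. RNAz itself estimates μ and σ by regression rather than by explicit shuffling and feeds both measures into an SVM classifier. The energies used below are hypothetical inputs; in practice they come from a folding program.

```python
from statistics import mean


def structure_conservation_index(consensus_mfe, single_mfes):
    """SCI = consensus MFE of the alignment / mean MFE of the single sequences.

    Values near 1 suggest a structure shared by all sequences; values near 0
    suggest that the sequences do not fold into a common structure."""
    avg = mean(single_mfes)
    if avg == 0:
        return 0.0  # no structure predicted for any sequence
    return consensus_mfe / avg


def mfe_z_score(mfe, mu, sigma):
    """z = (E - mu) / sigma; strongly negative values indicate a sequence that
    is more stable than expected for its length and base composition."""
    return (mfe - mu) / sigma


# Hypothetical energies in kcal/mol
sci = structure_conservation_index(-30.0, [-32.0, -30.0, -28.0])
z = mfe_z_score(-30.0, mu=-20.0, sigma=4.0)
print(sci, z)
```

Here the consensus energy equals the average single-sequence energy, so the SCI is 1.0, and the MFE lies 2.5 standard deviations below the hypothetical random expectation.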


Methods summary

This study is based on the 15-way Multiz alignment of insect genomes, which contains the 12 sequenced drosophilids together with mosquito, honeybee, and beetle. All analyses refer to the genome of Drosophila melanogaster. Conserved secondary structures were assessed using RNAz 2.0 [11] after slicing the alignment blocks into overlapping windows and removing alignment slices with too many gaps, too few sequences, or too extreme sequence divergence. Selection pressure at the peptide sequence level was examined by
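
The slicing-and-filtering step described above can be pictured with a small Python sketch. The alignment is assumed to be a list of equal-length gapped rows; the function names and all thresholds (window length, step, maximal gap fraction per row, minimal number of sequences, minimal mean pairwise identity) are hypothetical illustrations, not the parameters used in this study or the defaults of the RNAz tools. For simplicity, "too extreme divergence" is modelled only as a low mean pairwise identity.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def slice_alignment(rows, window=120, step=40, max_gap=0.25,
                    min_seqs=2, min_identity=0.5):
    """Yield (start, kept_rows) for overlapping windows that pass all filters.

    rows: equal-length gapped sequences, '-' marking a gap.
    All default thresholds are illustrative assumptions."""
    length = len(rows[0])
    start = 0
    while start < length:
        end = min(start + window, length)
        # filter 1: drop rows that are mostly gaps within this window
        kept = [r[start:end] for r in rows
                if r[start:end].count("-") / (end - start) <= max_gap]
        # filter 2: too few sequences (identity needs at least two rows)
        if len(kept) >= max(min_seqs, 2):
            ids = [pairwise_identity(a, b) for a, b in combinations(kept, 2)]
            # filter 3: too divergent, i.e. low mean pairwise identity
            if sum(ids) / len(ids) >= min_identity:
                yield start, kept
        if end == length:
            break
        start += step
```

With a 10-column toy alignment, `window=4` and `step=2` produce windows starting at columns 0, 2, 4 and 6; a row that is mostly gaps is dropped from each window, and a pair of completely different sequences yields no window at all.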

RNAz screen

RNAz 2.0 detected 15912 loci (covering about 1.3 Mb of genomic DNA) that show evidence of evolutionarily conserved secondary structure. Of these, 394 correspond to, or can be aligned with, known structured ncRNAs. Taking into account that a sizable fraction of known ncRNAs is not included in the input alignments (notably tRNAs, because the number of tRNA genes varies among the fly species, and microRNAs, because of the abundance of species-specific miRNAs), this amounts to an overall recall rate
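
The estimate in the highlights can be checked with back-of-the-envelope arithmetic using the 15912 RNAz loci reported above and the roughly 9% of RNAz hits that overlap annotated CDS. This sketch gives only the raw count and its order of magnitude; any further correction the authors may have applied (for example for false positives) is not described in this excerpt and is not reproduced here.

```python
import math

rnaz_loci = 15912                # RNAz 2.0 hits reported in the text
fraction_overlapping_cds = 0.09  # "about 9%" of RNAz hits overlap CDS

# raw number of RNAz hits that overlap coding sequence
raw_overlaps = round(rnaz_loci * fraction_overlapping_cds)

# order of magnitude of that count
order = 10 ** math.floor(math.log10(raw_overlaps))

print(raw_overlaps, order)
```

The raw count of about 1400 overlapping loci is indeed of the order of 1000.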

Discussion

In this contribution we have explored to what extent existing computational methods are capable of surveying loci at which coding sequence and secondary structure are simultaneously under stabilizing selection. With the notable exception of RNAdecoder [5], [6], this topic has received very little attention, probably because of the lack of well-studied examples. Hence we have combined a survey of structured RNA elements and a search for conserved coding sequences with RNAz and RNAcode,
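
Combining the two screens ultimately comes down to intersecting genomic intervals on the D. melanogaster reference. The Python sketch below assumes that RNAz hits and coding regions are given as half-open (start, end) coordinates on a single chromosome and strand, and reports the hits that share at least one base with a coding region; the function names and example coordinates are hypothetical. Genome-scale work would normally rely on an established tool such as `bedtools intersect -u`.

```python
from bisect import bisect_right


def merge(intervals):
    """Merge overlapping or touching half-open intervals into disjoint, sorted ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def hits_overlapping(hits, regions):
    """Return the hits (half-open intervals) sharing at least one base with a region."""
    merged = merge(regions)
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    result = []
    for s, e in hits:
        i = bisect_right(ends, s)  # first merged region that ends after s
        if i < len(merged) and starts[i] < e:
            result.append((s, e))
    return result


# Hypothetical coordinates
rnaz_hits = [(10, 50), (120, 180), (400, 450), (600, 640)]
cds = [(100, 150), (140, 300), (630, 700)]
dual = hits_overlapping(rnaz_hits, cds)
print(dual, len(dual) / len(rnaz_hits))
```

Merging the coding intervals first makes them disjoint and sorted by both start and end, so a single binary search per hit suffices; the printed fraction is the analogue of the share of RNAz hits overlapping CDS.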

Acknowledgments

This work was supported in part by grants from the Deutsche Forschungsgemeinschaft (grant No. STA 850/7-1 under the auspices of SPP-1258 “Small Regulatory RNAs in Prokaryotes”).

References (27)

  • D. Ulveling et al.

    When one is better than two: RNA with dual functions

    Biochimie

    (2011)
  • Q.P. Gu et al.

    Conserved features of selenocysteine insertion sequence (SECIS) elements in selenoprotein W cDNAs from five species

    Gene

    (1997)
  • R.B. Lanz et al.

    Distinct RNA motifs are important for coactivation of steroid hormone receptors by steroid receptor RNA activator (SRA)

    Proc. Natl. Acad. Sci. USA

    (2002)
  • E. Leygue

    Steroid receptor RNA activator (SRA1): unusual bifaceted gene products with suspected relevance to breast cancer

    Nucl. Receptor Signaling

    (2007)
  • G. Grillo et al.

    UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs

    Nucleic Acids Res.

    (2010)
  • J.S. Pedersen et al.

    A comparative method for finding and folding RNA secondary structures within protein-coding regions

    Nucleic Acids Res.

    (2004)
  • I.M. Meyer et al.

    Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs

    Nucleic Acids Res.

    (2005)
  • M.I. Galindo et al.

    Peptides encoded by short ORFs control development and define a new eukaryotic gene family

    PLoS Biol.

    (2007)
  • T. Kondo et al.

    Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA

    Nat. Cell Biol.

    (2007)
  • A.P. Gultyaev et al.

    Identification of conserved secondary structures and expansion segments in enod40 RNAs reveals new enod40 homologues in plants

    Nucleic Acids Res.

    (2007)
  • S. Washietl et al.

    Fast and reliable prediction of noncoding RNAs

    Proc. Natl. Acad. Sci. USA

    (2005)
  • A.R. Gruber et al.

    RNAz 2.0: improved noncoding RNA detection

    Pac. Symp. Biocomput.

    (2010)
  • S. Washietl et al.

    RNAcode: robust prediction of protein coding regions in comparative genomics data

    RNA

    (2011)