Genome-wide identification of coding small open reading frames: The unknown transcriptome

Journal of Shanghai Jiaotong University (Science)

Abstract

Identifying the complete repertoire of functional peptides in a cell is essential for a systems-wide understanding of its behavior. Many studies have been designed to this end. However, these studies routinely overlook a potentially significant portion of their data that might encode peptides shorter than 100 amino acids. This is largely for technical reasons: it is difficult to distinguish, with statistical significance, a coding sequence of this length from a non-coding sequence. Recently, a growing number of studies have shown that many peptides encoded by small open reading frames (sORFs) play important roles in a wide range of biological processes. As a result, there is now significant interest in methodologies that can identify this largely neglected portion of the cellular proteome. In this review, we introduce the currently annotated sORFs and describe the new strategies that have been used to identify coding sORFs genome-wide.

Author information

Corresponding author

Correspondence to Ling Bai (白 玲).

Additional information

Foundation item: the National Basic Research Program (973) of China (No. 2010CB529205)

Cite this article

Li, Hm., Hu, Cs. & Bai, L. Genome-wide identification of coding small open reading frames: The unknown transcriptome. J. Shanghai Jiaotong Univ. (Sci.) 19, 663–668 (2014). https://doi.org/10.1007/s12204-014-1563-x

  • DOI: https://doi.org/10.1007/s12204-014-1563-x
