Quantification of translation uncovers the functions of the alternative transcriptome

Abstract

Translation has a fundamental function in defining the fate of the transcribed genome. RNA-sequencing (RNA-seq) data enable the quantification of complex transcript mixtures, often detecting several transcript isoforms of unknown functions for one gene. Here, we describe ORFquant, a method to annotate and quantify translation at the level of single open reading frames (ORFs), using information from Ribo-seq data. By developing an approach for transcript filtering, we quantify translation transcriptome-wide, revealing translated ORFs on multiple isoforms per gene. For most genes, one ORF represents the dominant translation product, but we also detect genes with translated ORFs on multiple transcript isoforms, including targets of RNA surveillance mechanisms. Measuring translation across human cell lines reveals the extent of gene-specific differences in protein production, supported by steady-state protein abundance estimates. Computational analysis of Ribo-seq data with ORFquant (https://github.com/lcalviell/ORFquant) provides insights into the heterogeneous functions of complex transcriptomes.

Fig. 1: The ORFquant strategy to quantify translation on selected transcripts.
Fig. 2: Quantification of translation on cytoplasmic mRNAs.
Fig. 3: De novo annotation of NMD candidates.
Fig. 4: Diversity in gene translation across cell lines.
Fig. 5: Agreement of protein synthesis with steady-state protein abundance estimates.

Data availability

Ribo-seq data for HepG2 and K562 are available at GEO under the accession code GSE129061. Public Ribo-seq data sets for other cell lines were downloaded under the accession codes GSE79664 (HeLa), GSE73136 (HEK293), GSE74279 (Jurkat) and GSE56924 (U2OS). Nuclear and cytoplasmic RNA-seq data were accessed at the European Nucleotide Archive using the accession code PRJEB4197. TriP-seq data were downloaded from GEO using the accession code GSE69352. Bigwig tracks of 5′ ends were accessed using the accession code GSE57433. Proteomics data were downloaded from the PRIDE repository under accession code PXD002395. The list of P-site positions and junction reads and a list of quantified ORFs in the cell types analyzed are available in Supplementary Data 1. The final protein database is available in Supplementary Data 2, together with the parameters used to perform the MaxQuant search and the set of identified peptides and proteins. Source data are provided with this paper.

Code availability

ORFquant is available at https://github.com/lcalviell/ORFquant.

Acknowledgements

The authors acknowledge funding from the German Federal Ministry of Education and Research (BMBF grant 031 A538A RBC) and the German Research Foundation (DFG grant TR175). L.C. thanks Stephen Floor (UCSF) for support and feedback during the preparation of this manuscript.

Author information

Contributions

The initial study was conceived by L.C. and U.O. L.C. ideated and implemented the ORFquant pipeline, with supervision from U.O. All data analysis and visualization were performed by L.C. Ribosome footprinting libraries in K562 and HepG2 were prepared by A.H. The manuscript was written by L.C. and U.O., with additional input from A.H.

Corresponding authors

Correspondence to Lorenzo Calviello or Uwe Ohler.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Anke Sparmann was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Transcript selection and ORF quantification statistics.

In a) the number of genes and their TPM levels are plotted against the number of selected transcripts per gene (x-axis). In b) the number of genes and their TPM values are plotted against the number of detected ORFs. In c) the number of genes (top), their TPM values, the number of unique ORF features and the variability of RSEM quantification estimates (averaged across all the ORFs per gene) are plotted against the contribution (in percentages) of the major ORF. As in b), the maximum width of each violin plot is the same for each panel, and the median value is shown as a black bar. In d) RSEM quantification estimates (x-axis, IsoPct_from_PME_TPMs) are plotted against ORFquant quantification estimates (ORF_pct_P_sites_pN). Both values correspond to length-normalized quantification estimates.
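To make the idea of a length-normalized, per-gene percentage concrete, the following is a minimal sketch and not ORFquant's actual procedure. It assumes that each ORF of a gene already has a P-site count and a length, divides counts by length, and expresses each ORF as a percentage of the gene total. The function name, the input dictionaries and the example numbers are hypothetical.

```python
def orf_percentages(p_sites, lengths):
    """Return each ORF's share (in %) of length-normalized P-site signal.

    p_sites: dict mapping ORF id -> P-site count (hypothetical input)
    lengths: dict mapping ORF id -> ORF length in nucleotides
    """
    # P-sites per nucleotide: a simple length normalization
    density = {orf: p_sites[orf] / lengths[orf] for orf in p_sites}
    total = sum(density.values())
    if total == 0:
        # No signal on any ORF of this gene
        return {orf: 0.0 for orf in density}
    return {orf: 100.0 * d / total for orf, d in density.items()}


if __name__ == "__main__":
    # Two hypothetical ORFs of one gene
    counts = {"ORF_A": 900, "ORF_B": 100}
    sizes = {"ORF_A": 900, "ORF_B": 300}
    for orf, pct in orf_percentages(counts, sizes).items():
        print(orf, round(pct, 1))
```

In this toy case the longer-looking contribution of ORF_A shrinks once length is taken into account, which is the reason for normalizing before comparing ORFs within a gene.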

Extended Data Fig. 2 ORFquant quantifies translation on alternatively spliced isoforms.

Aggregate plots of Ribo-seq coverage (normalized 0–1 for each region) and ORF coverage (ORF_pct_P_sites_pN) over different candidate alternative splice sites. No mixture indicates the presence of a single ORF only, while other lines indicate the presence of additional ORFs, divided by their translation values. Explanatory schemes are depicted at the bottom of each plot, with blue representing the major ORF and red the additional ORF(s).

Extended Data Fig. 3 Polysome profiles of alternative isoforms with different translation output.

Distributions of exonic log2 fold changes between different polysome profiles and cytoplasmic abundance. Lowly translated ORFs are depleted in heavier polysome fractions, while highly translated ORFs show signal in all fractions.

Extended Data Fig. 4 Statistics on ORF detection and quantification in the assayed cell lines.

In a) number of genes and average Ribo-seq signal per gene (y-axes) against the number of detected ORFs (x-axis), for the assayed cell lines. In b) number of genes and average Ribo-seq signal per gene (y-axes) against the translation of the major ORF (x-axis), for the assayed cell lines. As in a), the maximum width of each violin plot is the same for each panel, and the median value is shown as a black bar.

Extended Data Fig. 5 Genes expressing one or multiple major ORFs across different cell lines.

In a) average length-normalized translation of the major ORF and average Ribo-seq gene signal per gene (y-axes), plotted against the number of cell lines where the gene harbored a detected ORF. Colors indicate the gene biotype. Values were plotted dividing genes according to the number of distinct major ORFs detected across cell lines. The maximum width of each violin plot is the same for each panel, and the median value is shown as a black bar. In b) top enriched GO categories for genes translating one (right) or multiple (left) major ORFs across cell lines.

Extended Data Fig. 6 Examples of genes exhibiting translation of multiple major ORFs.

Translation on multiple major ORFs per gene for the a) EP400NL, b) GAS7, and c) IFRD1 genes, representing examples of alternative splicing, transcription start site or small ORF usage. Displayed tracks represent, in descending order: gene annotation and (for each cell line): P-site positions, junction reads, ORF coverage (defined as % of gene translation) and quantified ORFs. In b) and c) intron sizes were scaled to a maximum of 300 and 150 nt, respectively.

Extended Data Fig. 7 Ribo-seq assisted protein detection and quantification across cell lines.

In a) the overlap between tryptic peptides generated in silico from the full GENCODE25 database and from ORFquant-derived protein sequences in the six cell lines analyzed. Up to two missed cleavage events were allowed. In b) iBAQ values (y-axis) are plotted against length-normalized translation quantification estimates, for each cell line. Each plot is divided by number of unique peptides (from proteomics) and unique features with reads (from Ribo-seq).
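As an illustration of how in-silico tryptic peptides with up to two missed cleavages can be generated and compared between two protein databases, here is a minimal sketch. It is not the authors' pipeline: it assumes cleavage after every K or R, offers the "no cleavage before proline" rule only as an optional flag, and uses a hypothetical minimum peptide length. All function names and the toy sequences are illustrative.

```python
def cleavage_fragments(protein, proline_rule=False):
    """Split a protein after each K or R (fully cleaved fragments)."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            nxt = protein[i + 1] if i + 1 < len(protein) else ""
            if proline_rule and nxt == "P":
                continue  # optional assumption: no cleavage before proline
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(protein, missed_cleavages=2, min_length=7,
                     proline_rule=False):
    """Return the set of peptides allowing up to `missed_cleavages`."""
    fragments = cleavage_fragments(protein, proline_rule)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            # Joining j - i + 1 adjacent fragments means j - i missed sites
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


def database_peptides(proteins, **kwargs):
    """Union of tryptic peptides over an iterable of protein sequences."""
    result = set()
    for seq in proteins:
        result |= tryptic_peptides(seq, **kwargs)
    return result


if __name__ == "__main__":
    # Toy databases (hypothetical sequences)
    full_db = ["MKGRCAAK", "MSTKLLR"]
    ribo_db = ["MKGRCAAK"]
    a = database_peptides(full_db, min_length=1)
    b = database_peptides(ribo_db, min_length=1)
    print(len(a & b), "shared peptides out of", len(a | b))
```

Comparing the two peptide sets with set intersection and union gives the kind of overlap summarized in panel a).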

Extended Data Fig. 8 Agreement between transcript and protein abundance.

iBAQ values (y-axis) plotted against (x-axis) gene-level TPM values from RNA-seq. Plots are divided according to the number of unique peptides detected. Number of proteins, together with Pearson and Spearman correlations, is shown for each plot.

Extended Data Fig. 9 Translation quantification and protein abundance for multiple gene products.

Pearson and Spearman correlation coefficients between ORFquant-derived % of gene translation and % of gene protein abundance. In each plot, translation quantification is shown with (right) and without (left) adjusting for the presence of multiple ORFs. Size and color of each data point indicate the number of unique peptides detected.
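For readers less familiar with the two coefficients reported here, the sketch below computes both using only the Python standard library. Pearson measures linear association; Spearman is the Pearson correlation of the ranks, with tied values given their average rank. The example vectors are made up for illustration and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical % of gene translation vs. % of gene protein abundance
    translation = [1.0, 2.0, 3.0, 4.0]
    protein = [1.0, 4.0, 9.0, 16.0]
    print("Pearson:", pearson(translation, protein))
    print("Spearman:", spearman(translation, protein))
```

A monotonic but non-linear relationship, as in the toy vectors, yields a perfect Spearman coefficient and a Pearson coefficient below 1, which is why reporting both is informative.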

Extended Data Fig. 10 Increased agreement between translation and protein abundance using signal from all reads.

Correlation between translation quantification and protein abundance using all reads, computed as in Fig. 5, which uses uniquely mapping reads only. Values are shown for all proteins (left), or as percentages of gene output for protein isoforms (right).

Supplementary information

Reporting Summary

Supplementary Table 1

Summary of Ribo-seq datasets analyzed in this study.

Supplementary Data 1

Archive containing all P-site positions and junction reads (using uniquely mapping reads), together with the set of ORFquant identified ORFs, for each cell line.

Supplementary Data 2

Archive containing the set of identified peptides and proteins, including their Ribo-seq statistics, the parameters used for the MaxQuant run and the custom protein database.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

About this article

Cite this article

Calviello, L., Hirsekorn, A. & Ohler, U. Quantification of translation uncovers the functions of the alternative transcriptome. Nat Struct Mol Biol 27, 717–725 (2020). https://doi.org/10.1038/s41594-020-0450-4

  • DOI: https://doi.org/10.1038/s41594-020-0450-4
