Review
Challenges in defining the role of intron retention in normal biology and disease

https://doi.org/10.1016/j.semcdb.2017.07.030

Abstract

RNA sequencing has revealed a striking diversity in transcriptomic complexity, to which alternative splicing is a major contributor. Intron retention (IR) is a conserved form of alternative splicing that was originally overlooked in normal mammalian physiology and development, due mostly to difficulties in its detection. IR has recently been revealed as an independent mechanism of controlling and enhancing the complexity of gene expression. IR facilitates rapid responses to biological stimuli, is involved in disease pathogenesis, and can generate novel protein isoforms. Many challenges, however, remain in detecting and quantifying retained introns and in determining their effects on cellular phenotype. In this review, we provide an overview of these challenges, and highlight approaches that can be used to address them.

Introduction

Advances in genome and transcriptome sequencing and gene annotation have revealed the widespread diversification of the proteome via alternative splicing across eukaryotic taxa [1], [2]. For example, exon combinatorics enable neuronal synapse specification via neurexin-neuroligin interactions [3]; the maintenance of cell differentiation state through alternative splicing of core pluripotency factors [4]; the immune response during T-cell activation [5], [6]; and a plethora of other disease-related biological processes. Furthermore, introns themselves can contain cis- and trans-acting elements, including regulatory non-coding RNAs such as small nucleolar RNAs [7] and microRNAs [8]. Although these RNAs are physically located in intronic regions, they can be transcribed independently of their host gene [9], [10].

Intron retention (IR) is a form of alternative splicing characterised by the inclusion of intronic sequence in a mature transcript (Fig. 1). IR is widespread in plants, fungi and unicellular eukaryotes [11], [12], [13], but was previously thought to be nearly absent and/or irrelevant in animals [14], [15], [16]. Next generation sequencing has propelled the detection and quantitation of RNA originating from introns to an unprecedented extent. Ameur et al. [17] observed significant numbers of sequencing reads mapping to introns in total RNA libraries from human brain. Nonetheless, they considered these intron-retaining transcripts as evidence for co-transcriptional splicing, as immature RNA transcripts, and not bona fide functional molecules.

Recent work has revealed that IR is not an artefact of sequencing library preparation and is much more widespread in the animal kingdom than originally thought. Three percent of Drosophila introns are nearly completely retained after splicing [18], and IR occurs in 50–75% of multiexonic genes across 11 animal species, ranging from chicken and frog to platypus and human [19]. Our lab has demonstrated that IR affects 80% of coding genes in humans, especially those involved in the cell cycle and differentiation [20]. IR has also been reported in approximately 5–6% of expressed genes in the mouse cortex [21].

The fate of intron-retaining transcripts (IRTs) in animals can be quite varied (Fig. 1). A subset is retained in the nucleus, a process termed intron detention (ID) [22], while others are exported to the cytoplasm. Once in the cytoplasm, IRTs interact with the ribosome and undergo nonsense-mediated decay (NMD) if premature termination codons (PTCs) are detected in them during the pioneer round of translation [23], [24]. Some IRTs escape NMD to generate new protein isoforms [25], while in other cases signals in the retained intron specify the subcellular localisation of the transcript or protein [26]. Other functions proposed for IRTs include the regulation of intron-derived microRNA precursor (mirtron) and snoRNA expression, and the modulation of post-transcriptional gene regulation by acting as competing endogenous RNA [27]. However, these functions have not yet been validated experimentally.

IR is a tissue-specific phenomenon in animals, which further supports its role as a mechanism of gene expression regulation. A higher proportion of introns is retained in neural and immune cell types, whereas IR events are less frequent in embryonic stem and muscle cells [19]. Increased IR in neuronal and immune cells may facilitate rapid responses to external stimuli, within a time frame shorter than that required for de novo transcription and protein synthesis [6], [21]. Specific IR patterns can be characteristic of cell subtypes as well, for example in luminal and myoepithelial breast cells [28]. In a reanalysis of over 2500 mRNA sequencing datasets, over 15000 introns retained in at least one dataset were retained in fewer than 7% of all samples considered, further supporting the tissue-specificity of this process [20]. This specificity can enable tissue-specific sequestration of intron-retaining transcripts, serving to restrict the translation of proteins only to the cells where they are required, while concurrently maintaining transcription from the locus of origin in other tissues. Such a mechanism of action has been demonstrated, for example, in genes encoding critical presynaptic proteins in both neurons and non-neuronal cells: only in neuronal cells is the last intron spliced out, preventing RNA degradation and enabling fully spliced transcripts to be exported from the nucleus [29].

IR is tightly regulated during differentiation and development. IR increases in key myeloid-related genes during granulocyte differentiation, leading to reduced RNA and protein levels, critical for the maturation of granulocytes [30]. Differential IR is also a characteristic of other cells of the hematopoietic lineage, including erythroblasts [31], megakaryocyte progenitors [32], and CD4+ T-cells transitioning from an inactive to active state [6]. IR mediates the down-regulation of genes involved in cell cycle progression and up-regulation of genes with neuron-specific functions during the differentiation of embryonic stem cells into neural progenitors [19], and during reprogramming of mouse embryonic fibroblasts to induced pluripotent stem cells [33]. A subset of intron-containing mature mRNAs are shielded from rapid degradation in embryonic stem cells via their sequestration in the nucleus [22]. IR is also inversely correlated with gene expression levels during the reprogramming of mouse embryonic fibroblasts [33]. Recently, IR in polyadenylated transcripts has been shown to be crucial for modulating mouse cortical mRNA dynamics in response to neuronal activity. Over 200 retained introns were spliced out within 15 min in response to depolarisation [21], and a significant proportion of glutamate receptor transcripts are preferentially affected by IR in the cerebellum [34].

IR is widespread across a range of cancer transcriptomes [35]. It has been described as a mechanism of tumour suppressor inactivation [36], and is the predominant form of alternative splicing in hypoxic tumour cells [37]. In hypoxia, IR leads to a reduction in protein levels of the critical cytotoxic response regulator HDAC6 and DNA double strand break pathway member TP53BP1 [37]. Roles for IR in other diseases are currently being investigated [27].

Intronic regions are rich in repetitive sequences and longer than exonic regions

Introns are rich in repetitive sequences, containing over double the density of these elements as exonic regions (Fig. 2). These include Long and Short Interspersed Nuclear Elements (LINEs and SINEs), DNA transposons, tandem and low complexity repeat sequences. Most of these genomic features are longer than the 75–150 nucleotide read length characteristic of current high-throughput RNA sequencing technologies. This presents a unique challenge since RNA-optimised mapping algorithms such as

Obstacles in phylogenetic IR analyses

We and others have shown that functionally related genes are affected by IR in humans and mice [19], [22], [30]. Phylogenetic IR analyses present the prospect of shedding light on the evolution and functional conservation of IR, but studying the conservation of IR comes with several challenges discussed below.

Detecting IRTs undergoing degradation is challenging

Numerous IRTs may be undetectable using RNA sequencing as they are rapidly degraded by NMD [65]. As reviewed elsewhere, NMD allows elimination of mRNAs with PTCs typically positioned more than 50–55 nucleotides upstream of their last exon–exon junction [66], [67]. As such, the majority of intron-retaining transcripts are computationally predicted as NMD targets [19], [30], [36]. However, some PTC-containing IRTs may escape NMD, and are translated. For example, the IR form of the endoplasmic
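
The 50–55 nucleotide rule described above can be illustrated with a minimal sketch. This is not the prediction method used in the cited studies; it only encodes the positional rule as stated. The function names, the transcript-coordinate convention, and the default threshold of 50 nucleotides (the lower bound of the quoted range) are assumptions made for illustration. A retained intron is modelled by merging it with its flanking exons into a single segment.

```python
from typing import List, Optional


def last_junction_position(segment_lengths: List[int]) -> Optional[int]:
    """Return the transcript coordinate of the last exon-exon junction.

    segment_lengths lists the spliced segments of the mature transcript in
    5'-to-3' order. A retained intron is assumed to be merged with its
    flanking exons into one segment. Returns None if there is no junction.
    """
    if len(segment_lengths) < 2:
        return None
    return sum(segment_lengths[:-1])


def is_predicted_nmd_target(
    segment_lengths: List[int],
    stop_codon_end: int,
    min_distance: int = 50,  # assumed threshold; the text quotes 50-55 nt
) -> bool:
    """Apply the positional rule: a stop codon lying more than min_distance
    nucleotides upstream of the last exon-exon junction predicts NMD.

    stop_codon_end is the 0-based transcript coordinate of the first
    nucleotide after the stop codon (hypothetical convention).
    """
    junction = last_junction_position(segment_lengths)
    if junction is None:
        return False  # no junction remains, so the rule cannot apply
    return junction - stop_codon_end > min_distance


# Exons of 200, 150 and 300 nt with a retained first intron of 400 nt:
# the first segment becomes 200 + 400 + 150 = 750 nt.
retained_first_intron = [750, 300]
print(is_predicted_nmd_target(retained_first_intron, stop_codon_end=260))
```

Under this rule, a stop codon introduced by retention of the last intron lies downstream of every remaining junction, so such a transcript would not be flagged.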

The extent and function of intron-retaining transcripts in the nucleus remain poorly understood

When carrying out RNA sequencing, most studies do not first perform subcellular fractionation. This makes it impossible to assess what proportion of polyadenylated and non-polyadenylated transcripts with intronic reads is sequestered in the nucleus. Introns in these transcripts have been termed “detained” (ID), and these RNAs can be degraded or stored in the nucleus, in contrast with classical “intron-retaining” transcripts exported to the cytoplasm and accessible to the translation and NMD

Lack of protocols for isolation of subcellular compartments precludes identification of IR transcripts targeted to them

The presence of global subcellular compartments such as the cytoplasm and nucleus has been recognised since the earliest days of microscopy. However, there is less clarity on how many other spatial compartments and sub-compartments exist within cells, and what defines the RNA and protein molecules that need to be trafficked there [79].

Several studies have reported how IR can serve as a signal for subcellular targeting. IR was crucial for the correct localization of 33 neuronal mRNAs to

Detecting protein-coding intron-retaining transcripts remains challenging

It has long been hypothesised that the pioneer round of translation is a prerequisite for NMD and therefore most IRTs may be translated at least once [83], [84]. However, proteins originating from intron-retaining genes in 9 tissues had significantly lower protein output than those from non-intron-retaining genes [20]. Strikingly, when examining ribosome binding data for IR events in a human cell line, there were no reads observed from retained introns, even though they were identified in

The mechanisms regulating IR are complex

One of the major questions in understanding the roles of IR in normal biology and disease is why some introns are retained while others are not. Several factors are known to be involved in IR regulation including the expression levels of splicing factors, RNA polymerase II occupancies and epigenetic changes [19], [20], [28], [30], [96], [97]. Our group has previously reported the reduced expression of exon-defining splicing factors in granulocytes that harbour higher levels of IR than their

Conclusions

IR is an important form of alternative splicing that is crucial for normal human, animal and plant biology. However, many challenges remain in identifying IRTs and understanding their functions. These challenges include the complexity of genome structure and organisation in higher organisms, and the plethora of algorithmic and bioinformatic difficulties in identifying and quantitating retained introns. This could be addressed by comparative studies of informatics methodologies, which have a

Funding

DPV is supported by the National Health and Medical Research Council [grant number 1080530] and the Cure the Future Foundation. US is supported by the Sydney Research Excellence Initiative and the Cure the Future Foundation. JJ-LW is supported by the National Health and Medical Research Council [grant numbers 1129901, 1080530, 1128175, 1126306]. JEJR is supported by the National Health and Medical Research Council [grant numbers 1061906, 1129901, 1080530, 1128175]; the Cure the Future Foundation

References (104)

  • O. Muhlemann et al.

    Recognition and elimination of nonsense mRNA

    Biochim. Biophys. Acta

    (2008)
  • D.A. Solomon et al.

    Cyclin D1 splice variants: differential effects on localization, RB phosphorylation, and cellular transformation

    J. Biol. Chem.

    (2003)
  • R.S. Seelan et al.

    Identification of myo-inositol-3-phosphate synthase isoforms: characterization, expression, and putative role of a 16-kDa γc isoform

    J. Biol. Chem.

    (2009)
  • H.H. Nguyen et al.

    Phenotypic, metabolic, and molecular genetic characterization of six patients with congenital adrenal hyperplasia caused by novel mutations in the CYP11B1 gene

    J. Steroid Biochem. Mol. Biol.

    (2016)
  • R. Guo et al.

    BS69/ZMYND11 reads and connects histone H3.3 lysine 36 trimethylation-decorated chromatin to regulated pre-mRNA processing

    Mol. Cell

    (2014)
  • Q. Pan et al.

    Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing

    Nat. Genet.

    (2008)
  • B.-B. Wang et al.

    Genomewide comparative analysis of alternative splicing in plants

    Proc. Natl. Acad. Sci. U. S. A.

    (2006)
  • C. Reissner et al.

    Neurexins

    Genome Biol.

    (2013)
  • A. Kalsotra et al.

    Functional consequences of developmentally regulated alternative splicing

    Nat. Rev. Genet.

    (2011)
  • V. Cho et al.

    The RNA-binding protein hnRNPLL induces a T cell alternative splicing program delineated by differential intron retention in polyadenylated RNA

    Genome Biol.

    (2014)
  • T. Ni et al.

    Global intron retention mediated gene regulation during CD4+ T cell activation

    Nucleic Acids Res.

    (2016)
  • T. Hirose et al.

    Position within the host intron is critical for efficient processing of box C/D snoRNAs in mammalian cells

    Proc. Natl. Acad. Sci. U. S. A.

    (2001)
  • Y.-K. Kim et al.

    Processing of intronic microRNAs

    EMBO J.

    (2007)
  • D. Lutter et al.

    Intronic microRNAs support their host genes by mediating synergistic and antagonistic regulatory effects

    BMC Genomics

    (2010)
  • X. Gao et al.

    Enemy or partner: relationship between intronic microRNAs and their host genes

    IUBMB Life

    (2012)
  • H. Ner-Gaon et al.

    Intron retention is a major phenomenon in alternative splicing in Arabidopsis

    Plant J.

    (2004)
  • Y. Marquez et al.

    Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis

    Genome Res.

    (2012)
  • A. Sebé-Pedrós et al.

    Regulated aggregative multicellularity in a close unicellular relative of metazoa

    Elife

    (2013)
  • P.A. Galante et al.

    Detection and evaluation of intron retention events in the human transcriptome

    RNA

    (2004)
  • N.J. Sakabe et al.

    Sequence features responsible for intron retention in human

    BMC Genomics

    (2007)
  • E.T. Wang et al.

    Alternative isoform regulation in human tissue transcriptomes

    Nature

    (2008)
  • A. Ameur et al.

    Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain

    Nat. Struct. Mol. Biol.

    (2011)
  • Y.L. Khodor et al.

    Nascent-seq indicates widespread cotranscriptional pre-mRNA splicing in Drosophila

    Genes Dev.

    (2011)
  • U. Braunschweig et al.

    Widespread intron retention in mammals functionally tunes transcriptomes

    Genome Res.

    (2014)
  • R. Middleton et al.

    IRFinder: assessing the impact of intron retention on mammalian gene expression

    Genome Biol.

    (2017)
  • P.L. Boutz et al.

    Detained introns are a novel, widespread class of post-transcriptionally spliced introns

    Genes Dev.

    (2015)
  • O. Jaillon et al.

    Translational control of intron splicing in eukaryotes

    Nature

    (2008)
  • A.M. Gontijo et al.

    Intron retention in the Drosophila melanogaster Rieske Iron Sulphur Protein gene generated a new protein

    Nat. Commun.

    (2011)
  • J.J.L. Wong et al.

    Intron retention in mRNA: no longer nonsense: known and putative roles of intron retention in normal and disease biology

    Bioessays

    (2016)
  • P. Gascard et al.

    Epigenetic and transcriptional determinants of the human breast

    Nat. Commun.

    (2015)
  • K. Yap et al.

    Coordinated regulation of neuronal mRNA steady-state levels through developmentally controlled intron retention

    Genes Dev.

    (2012)
  • H. Pimentel et al.

    A dynamic intron retention program enriched in RNA processing genes regulates gene expression during terminal erythropoiesis

    Nucleic Acids Res.

    (2016)
  • S.M.I. Hussein et al.

    Genome-wide characterization of the routes to pluripotency

    Nature

    (2014)
  • S. Martin et al.

    Preferential binding of a stable G3BP ribonucleoprotein complex to intron-retaining transcripts in mouse brain and modulation of their expression in the cerebellum

    J. Neurochem.

    (2016)
  • H. Dvinge et al.

    Widespread intron retention diversifies most cancer transcriptomes

    Genome Med.

    (2015)
  • H. Jung et al.

    Intron retention is a widespread mechanism of tumor-suppressor inactivation

    Nat. Genet.

    (2015)
  • D. Memon et al.

    Hypoxia-driven splicing into noncoding isoforms regulates the DNA damage response

    npj Genomic Med.

    (2016)
  • D. Kim et al.

    TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions

    Genome Biol.

    (2013)
  • A. Dobin et al.

    STAR: ultrafast universal RNA-seq aligner

    Bioinformatics

    (2013)
  • K. Wang et al.

    MapSplice: accurate mapping of RNA-seq reads for splice junction discovery

    Nucleic Acids Res.

    (2010)