Intron Length and Recursive Sites Are Major Determinants of Splicing Efficiency in Flies February 12 , 2017

The dynamics of gene expression may impact regulation, and RNA processing can be rate limiting. To assess rates of pre-mRNA splicing, we used a short, progressive metabolic labeling/RNA sequencing strategy to estimate the intron half-lives of ~30,000 fly introns, revealing strong correlations with several gene features. Splicing rates varied with intron length and were fastest for modal intron lengths of 60-70 nt. We also identified hundreds of novel recursively spliced segments, which were associated with much faster and also more accurate splicing of the long introns in which they occur. Surprisingly, the introns in a gene tend to have similar splicing half-lives and longer first introns are associated with faster splicing of subsequent introns. Our results indicate that genes have different intrinsic rates of splicing, and suggest that these rates are influenced by molecular events at gene 59 ends, likely tuning the dynamics of developmental gene expression.


Surprisingly, the introns in a gene tend to have similar splicing half-lives and longer first
introns are associated with faster splicing of subsequent introns. Our results indicate that genes have different intrinsic rates of splicing, and suggest that these rates are influenced by molecular events at gene 5' ends, likely tuning the dynamics of developmental gene expression.
To globally assess the kinetics of gene expression in Drosophila melanogaster, we applied a short time period metabolic labeling strategy involving 5, 10, or 20 min labeling with 4-thio-uracil (4sU) in S2 cells, followed by RNA sequencing 1 . These data were complemented by steady state RNA-seq data from 24 h 4sU-labeled RNA representing predominantly mature mRNA (Methods). As expected, read density in introns was highest at the 5 min time point and decreased rapidly to low levels at longer times (Fig. 1A). To assess the kinetics of splicing, we measured the proportion of intron-containing transcripts or percent spliced in (PSI or Y) value of each intron -representing the proportion of transcripts of a gene that containing the intron in an unspliced state -using the MISO software 2 . Intron PSI values decreased over time for > 98% of introns, reflecting the progress of splicing ( Supplementary Fig. 1A).
The labeling design results in the isolation of transcripts that initiated transcription during the labeling period, as well as some transcripts that initiated earlier and continued their elongation during this period. Therefore, we inferred the mean time since synthesis for each intron captured under each given labeling regime, taking into account rate of transcription, and used these times in estimation of intron half-lives (Methods). Using these inferred synthesis times, we fit a first-order exponential decay model to PSI values (  Table 1). The median intron half-life (t ½ ) was 4.0 minutes. We obtained similar PSI values when using exclusively junction-spanning reads to assess splicing ( Supplementary Fig. 1E), indicating minimal impact on estimated rates from lariat-derived reads, as expected (intron lariat half-lives < 15 seconds 3,4 ). Therefore, the intron half-lives represent splicing half-times. In the remainder of this study we used MISO-derived splicing half-times because MISO PSI values fit better to exponential decay models than those based on junction reads alone ( Supplementary Fig. 1C).
Our estimated rates of intron removal are consistent with previous reports of 0.5 -10 minutes per intron in a handful of mammalian genes using qRT-PCR or imaging approaches 4-6 , but slightly longer than recently estimated rates of < 2 min in yeast 7 . For typical Drosophila genes of 3-9 kb in length, the estimated time of transcription is about 2-6 minutes 8 . Thus, the intron half-lives we estimate (with median t ½ = 4 min) are consistent with common but not universal co-transcriptional splicing [9][10][11] . Based on simulations, our estimates are reasonably accurate and unbiased for half-lives between 3 and 300 min but have reduced accuracy for introns with splicing half-times of < 3 min or > 300 min (Supplementary Methods, Supplementary Fig. 1F). A half-life of 3 min would typically produce a PSI < 0.10 at the 10 min time point, and 9% of introns had PSI < 0.10 at this time point. This observation suggests that about 1 in 10 fly introns has a splicing half-time of less than 3 min; an alternative approach for half-life estimation was used for this subset to reduce bias (Methods). A similar analysis using data from the long time points suggests that >99.7% of analyzed fly introns have half-lives < 300 min, and our approach using MISO may somewhat overestimate the half-lives of longer introns for which the time required to transcribe the intron approaches or exceeds the duration of labeling (see Methods). For this reason, we used alternative analyses to study the splicing of very long introns by recursive mechanisms below.
The distribution of lengths of Drosophila introns has a sharp peak at 60-70 nt, with more than half of introns between 40-80 nt in length, and the remainder distributed over a broad range extending to tens of kilobases (kb) or more 12 (Supplementary Fig. 2A). This observation and evidence that natural selection favors short intron lengths in Drosophila [13][14][15] led us to hypothesize that introns with lengths close to the modal size of ~65 nt are spliced most efficiently 12 . We observed a strong relationship between intron length and splicing half-times (Spearman rho P < 2.2 ´ 10 -16 , Fig. 1C), with 30.7% of variance in splicing rates explained by intron length (Supplementary Methods). Specifically, introns with lengths in the range 60-70 nt were spliced most rapidly (median t ½ = 3.3 min, Fig. 1D), and median t ½ increased steadily to 20.3 min for introns > 10 kb. Genes containing introns with the shortest half-lives (< 3 min) had expression levels 2-to 5-fold higher than genes with slower-spliced introns of similar lengths, suggesting a relationship between expression and splicing rate.
Notably, for introns shorter than 60 nt, a negative relationship between intron length and splicing half-time was observed, and "ultra-short" introns with lengths between 40-50 nt had a median t ½ of 9.2 min, about 2.5-fold longer than that for the 60-70 nt class (Fig. 1D). Though ultra-short introns also have significantly weaker 5' and 3' splice sites 16 (Supplementary Fig.   2B), differences in splice site strength do not fully explain why introns < 50 nt are spliced so slowly ( Supplementary Fig. 2C), suggesting that small size itself presents a barrier to efficient splicing 17 . Introns longer than 70 nt tend to have stronger splice sites ( Supplementary Fig. 2D) and yet are spliced more slowly (whether controlling for splice site strength or not, Supplementary Fig. 2E), suggesting that bringing together intron ends for splicing may contribute materially to the time required for splicing 18 . However the time required to juxtapose intron ends cannot explain why introns with 40-50 nt lengths are spliced 2.5 times more slowly than those in the "optimal" 60-70 nt range, suggesting that some other factor -perhaps steric interference between snRNPs bound at the 5' splice site and branch site 19,20 -becomes suboptimal for the shortest Drosophila introns.
Metazoan introns are thought to be spliced in one of two modes -either by "intron definition", in which the U1 and U2 snRNPs that recognize the 5' splice site and branch point sequence first pair across the intron; or "exon definition", in which U1 snRNP initially pairs with the upstream U2 snRNP across the exon, followed by reassortment to form interactions with the downstream U2 snRNP across the intron 21 . Both modes occur in Drosophila, depending on exon-intron geometry, with short introns favoring intron definition and long introns/short exons favoring exon definition 21 , so the overall slower splicing of long introns likely results, at least in part, from the extra time required for the additional molecular events involved in exon definition.
In mammals, it has been observed that PolII elongation rates are slower near regulated introns and that alternative introns are spliced more slowly [22][23][24][25] . Comparing to constitutive introns (CI), we observed that introns that are alternatively retained (RI) in other cells/tissues (but fully spliced out in S2 cells) have somewhat longer splicing half-times, independent of intron length, as do introns that flank exons that are alternatively skipped in other cells/tissues (SEflanking) (all Mann-Whitney P < 0.01 for comparisons between CI & RI, except in largest bin, and for all comparisons between CI & SEflanking; Fig. 1E). This observation suggests that the capacity for regulation may impose limits on splicing rate.
The long half-lives for the longest introns suggests that auxiliary splicing mechanisms might be critical for efficient splicing of very long introns in Drosophila. In particular, recursive splicing is a process associated with long introns in which a single intron is removed in multiple splicing events defined by recursive sites, which consist of juxtaposed 3' and 5' splice site motifs around a central AG/GT 26,27 . Some 130 recursively spliced introns have previously been identified in flies based on analysis of over 10 billion RNA-seq reads from the entire Drosophila ModENCODE project 28 , suggesting that this phenomenon is relatively rare. However, the transient nature of recursive splicing intermediates makes it difficult to detect evidence for recursive splicing using standard RNA-seq data.
We hypothesized that our high-coverage nascent RNA data would more readily identify transient recursive events and better characterize the prevalence of recursive splicing. To do so, we used a computational pipeline to confidently identify recursive events using three key features ( Fig. 2A, Supplementary Fig. 3). First, we searched for splice junction reads derived from putative recursive sites (RatchetJunctions), as previously described 28 . Second, we developed a new computational tool, RatchetPair, to identify read pairs that map to sites flanking putative recursive segments in manner where presence of splicing can be inferred from the size distribution of library fragments. Third, we developed the first automated software, RatchetScan, for inference of recursive sites from sawtooth patterns in read density. This type of pattern is an expected product of co-transcriptional recursive splicing and has been observed as a feature associated with many recursive introns 10,28 .
Combining these three approaches, our analysis detected 541 candidate recursive sites in 379 fly introns (Supplementary Table 2). From this set, we curated a set of 243 "high confidence" recursive sites in 157 introns (with an FDR of 5%), and a "medium confidence" set of 298 sites (at an FDR of 20%; Methods). Overall, 98 introns contained multiple recursive sites, with up to seven such high-confidence sites observed in a single intron. For instance, intron 1 of the tenascin major (Ten-m) gene contains five recursive sites, two of which were previously unknown ( Fig. 2A). Of the recursive sites previously reported by Duff et al., 124 occurred in genes expressed in S2 cells. Our approach detected 119 (96%) of these known sites, as well as 126 novel high confidence sites and 296 novel medium confidence sites (Fig. 2B), thus increasing the number of recursive sites defined in this cell type by ~4-fold (Fig. 2B, Supplementary Fig. 3C). Both the high confidence and the medium confidence candidate recursive sites exhibited a strong juxtaposed 3'/5' splice site motif ( Supplementary Fig. 4A). The greater numbers detected by our approach (2-4X more sites in this cell type), using less than 1/20 th as many reads as used by Duff and colleagues, confirms the potential of nascent RNA analysis for analysis of recursive splicing.
Many very long introns (> 40 kb in length) have recursive sites, with 63% of such introns containing at least one high-confidence recursive site, and 70% when considering all identified recursive sites (Fig. 2C). This observation suggests that recursive splicing is the prevalent mechanism by which very large fly introns are excised. We assessed the sensitivity of our detection pipeline by running it on subsamples of reads ranging from 0.1% to 100% of the total (Fig. 2D). The shape of the resulting curve tapered off at higher coverage levels but never plateaued, indicating that new recursive sites were still being detected as read depth increased from 50% to 100% and therefore would likely increase further at higher read depths. A somewhat higher proportion of recursive sites were detected in high-expressed genes (TPM > 20) than low-expressed genes (TPM ≤ 20). However, subsampling of the reads mapping to high-expressed genes to levels comparable to those observed for low-expressed genes resulted in a substantially lower fraction of recursive sites at each depth, suggesting that very long introns in low-expressed genes are more likely to have recursive splicing than those in highexpressed genes (Fig. 2D). Together, these data suggest that the true fraction of very long introns that contain recursive sites may be substantially higher than our observed fraction of 63-70%, i.e. recursive splicing may be nearly universal in very long introns.
Recursive splice sites can be required for the processing of long introns 27 . The surprisingly widespread occurrence of recursive splice sites observed here raises the possibility that this mode of splicing has a substantial impact on processing of the fly transcriptome.
Alternatively, it is possible that most recursive sites are functionally neutral, and that mRNA production is not impacted by their presence. The size of our dataset enabled us to examine four properties of recursive sites that could help to distinguish between these possibilities: sequence conservation; distribution in the fly genome; distribution within introns; and kinetics of splicing. In each case, the patterns observed suggest that most or all recursive sites have functional impact.
Both high and medium confidence recursive sites exhibited twice the level of evolutionary conservation observed in and around control AGGT motifs in long introns ( Supplementary Fig. 4B), implying strong selection to maintain most or all of these sites.
Recursively spliced introns were enriched in genes involved in functions related to development and morphogenesis (Supplementary Table 3). Both of these observations are consistent with results from a previous study based on a smaller sample of recursive introns 28 .
It is still possible that longer introns contain more recursive sites purely by chance, rather than due to functional constraints on intron architecture. Indeed, while the majority of recursively spliced introns had just one recursive site, the number of sites increased roughly linearly with intron length (Supplementary Fig. 4C). However, the positioning of recursive sites within introns was significantly biased away from a random (uniform) distribution (Kolmogorov-Smirnov P = 0.003; Fig. 2E). Instead, recursive sites in introns with only one such site tended to be located closer to the midpoint of the intron than expected by chance. Furthermore, the first recursive site in introns with two or three such sites tended to be located at approximately 33% and 25% of the way from the 5' end of the intron, respectively (Fig. 2E inset). The distribution of recursive sites within introns suggests that they are positioned so as to break the larger intron into "bitesized" segments for the spliceosome (typically ~9-15 kb in length) rather than at random locations which would more often produce much longer and much shorter segments.
Recursively spliced introns were also enriched in first introns relative to subsequent introns in fly genes (hypergeometric P < 0.05).
To ask whether recursive splicing contributes to the efficiency of processing of very long introns, we estimated the splicing half-times of individual recursive segments (Methods).
Recursive segment half-times were the slowest for the first segment in the intron ( Supplementary Fig. 5A), and the preponderance of junction reads spanning the 5' splice site and the recursive site indicate that recursive segments are most often spliced out in a 5' to 3' order ( Supplementary Fig. 5B). Overall, recursive segments (mean length 9.1 kb, Supplementary Fig. 5C) had three-fold shorter half-times than non-recursive introns of the same length ( Fig. 2F), supporting the conclusion that recursive splicing increases the speed of splicing for very long Drosophila introns. The overall rate of splicing of a recursive intron is likely comparable to that of its slowest recursive segment, whose splicing is expected to be rate limiting. Splicing half-times for the slowest segment of recursive introns were also at least twofold shorter than those of non-recursive introns with lengths matching the entire recursive intron, again suggesting that recursive sites enhance efficiency of splicing (Supplementary Fig. 5D; Mann-Whitney P < 2.2 ´ 10 -16 ). A more involved analysis, estimating the mean lifetime of a recursive intron as the maximum of a set of exponentials (corresponding to the waiting time for all recursive segments to be spliced) also yielded significantly shorter half-times for recursive than non-recursive introns of similar size (Mann-Whitney P = 0.0013; Supplementary Fig. 5E).
Splicing accuracy is likely to be at least as important as speed, since splicing to an arbitrary (incorrect) splice junction will most often produce an mRNA that does not encode functional protein. As a simple measure of potential splicing errors, we tallied the fraction of reads that spanned "non-canonical" splice junctions, involving pairs of intron terminal dinucleotides other than the three canonical pairs "GT-AG", "GC-AG" and "AT-AC" that account for ~99.9% of all known fly introns. For the bulk of non-recursive introns (most of which are < 100 nt in length), the frequency of such non-canonical splicing was negligible (Fig. 2G, light grey dotted curve). However, for non-recursive introns with the much longer lengths typical of recursively spliced introns, potential splicing errors were much more frequent (Fig. 2G, gray curve), suggesting that the fly spliceosome loses accuracy as intron length increases. Notably, recursive introns had significantly lower frequencies of non-canonical junctions compared to similarly sized non-recursive introns (Fig. 2G, gold curve, Kolmogorov-Smirnov P = 0.015).
Therefore, presence of recursive splice sites may increase the accuracy as well as the speed of splicing.
Together, the results shown in Figures Figure 9). Overall, these other variables accounted for 29.5% of the variance in splicing rates. As expected, increased strength of both the 5' and 3' splice sites was associated with shorter intron half-life 29 . Length of the upstream exon and A+U content of the intron were also associated with shorter half-life. Longer upstream exons are expected to promote intron definition, which may speed splicing. Introns are U-rich in many metazoans and plants, and many proteins of the heterogeneous nuclear ribonucleoprotein (hnRNP) family bind A+U-rich motifs in introns and participate in splicing 30,31 . Surprisingly, however, two of the three most important features were properties of the gene rather than of the affected intron: gene expression and first intron length, both of which were associated with faster splicing of non-first introns in the gene. The relationship between gene expression and intron splicing rate is consistent with our observation that the fastest splicing introns tend to occur in higher-expressed genes.
The importance assigned to gene expression and the length of the first intron in a transcript, both gene-specific rather than intron-specific properties, raised the question of the relationship between splicing rates of different introns in the same gene. Since all of the introns in a transcript must be removed to produce a mature mRNA that can be exported and translated, natural selection (if present) may act primarily on the total elapsed time to splice all introns rather than on the splicing rates of individual introns. We observed that splicing half-lives of introns drawn from the same gene tend to be more similar to each other than those of randomly sampled introns, e.g., comparing standard deviations (Mann-Whitney comparison of SDs, P < 2.2 ´ 10 -16 , Fig. 3B). This relationship remained when considering the coefficient of variation rather than the standard deviation ( Supplementary Fig. 6B) and when controlling for  Table 5).  Fig. 7D), but this association was not significant in the more complex model of splicing rate including other variables described above, so was not further explored here.
We observed a moderate but significant negative correlation between the length of the first intron in a gene and the median half-lives across downstream introns (Spearman rho P < 4.4 ´ 10 -14 , Fig. 3E, Supplementary Fig. 8A-B). This relationship remained when controlling for variability in the lengths of non-first introns by restricting the analysis to genes whose non-first introns were all between 60-70 nt (Supplementary Fig. 8C). For example, the gene Gai, encoding a G-protein alpha subunit, has a long (~5 kb) first intron that is slowly spliced (t ½ = 9 min), while the remaining introns are all spliced rapidly with half-lives between 1 and 3 min (Fig.   3C). These observations suggest that the architecture of the 5' end of the gene, notably the length of the 5'-most intron, impacts the splicing efficiency of the remaining introns. Longer first introns might allow additional time for the recruitment of factors that promote downstream splicing efficiency, consistent with the observation that transcription elongation in mammalian cells is slow before reaching the second exon 34 .
Selection is likely to act on some genes to reduce the time to produce mature mRNA, particularly early in development when the time to produce proteins between cell cycles is short, and in stress response or signaling contexts where speed of response is important. Our findings suggest that a longer first intron requires more time to splice but may in some way enhance the  Figure 3A. When the rate of transcription initiation is higher, production of each transcript will occur closer to RNAs transcribed just prior. This proximity could lead to faster assembly of spliceosomes on new transcripts as a result of higher local concentration of spliceosome components recruited to and released from previously transcribed RNAs, particularly if splicing is generally co-transcriptional, as appears to be the case 10 . Curiously, increased length of the first intron -while requiring more time to process -was associated with faster splicing of the remaining introns of a gene, and with an increased number of introns per gene. Recursive sites, which appear to increase both the speed and accuracy of splicing, may be conserved in part to compensate for the increased splicing time and propensity for splicing error in long first introns.

Generation of 4sU RNA-seq
Newly transcribed RNA from 3 independent replicates of Drosophila S2 cells were labeled for 5, 10 and 20 minutes using 500 μM 4-thiouridine (Sigma, T4509). Additionally, two independent replicates of Drosophila S2 cells were labeled overnight with 4-thiouridine to approximate a steady-state RNA levels. To normalize samples and assess metabolic labeled RNA capture efficiency, several synthetic RNAs were spiked into the Trizol preparation at specific quantities per 10 6 cells. Quantities were determined as described previously 35  . Unlabeled RNA present in the supernatant was discarded. 4sU-RNA was eluted twice with 75 μl of freshly prepared 100 mM dithiothreitol (DTT). RNA was recovered from eluates by ethanol precipitation as described above.As per library preparation, RNA quality was assessed using a Bioanalyzer Nano ChIP (Agilent). Ribosomal RNA was removed prior to library construction by hybridizing to ribo-depletion beads that contain biotinylated capture probes (Ribo-Zero, Epicentre). RNA was then fragmented and libraries were prepared according to the TruSeq Stranded Total RNA Gold Kit (Illumina) using random hexamer priming. cDNA for the two 'total' RNA samples were prepared using an equal mix of random hexamers and oligo-dT primers. Illumina spike-ins were also used for normalization among samples.
Libraries were sequenced on an Illumina HiSeq machine with paired-end 51 nt reads (100 nt reads for the 'total' RNA samples), generating an average of 126M read pairs per library. Reads for each sample were filtered, removing pairs where the mean quality score of one or both mates fell below 20. One million pairs were then extracted at random, and aligned to the dm3 reference assembly (4sU RNA-seq, due to enhanced coverage of introns) or an index composed of all FlyBase release 5.57 transcripts (total RNA-seq) using bowtie 0.12.8 37

Estimating splicing from 4sU-RNA-seq data
To measure the efficiency at which each intron was spliced out in Drosophila S2 cells, we considered the percent of newly created transcripts that still had intronic reads in each timepoint. To do so, we used MISO 2 to calculate an intron-specific percent spliced in (PSI or Y) value, using reads within the intron body and junction reads spanning the 5' and 3' splice sites bordering the intron. We calculated a Y value for each annotated intron in each replicate library

Estimating intron half-lives
The labeling design used will result in the isolation of transcripts that initiated transcription at any time during the labeling period, of duration t labeling , as well as transcripts that were elongated during this period but initiated prior to the labeling period. For a transcript to be labeled during the labeling period and include portions informative about the splicing of an intron, the polymerase must have either: (i) transcribed the 3'-most base of the intron during the interval of labeling -between time 0 and t labeling or (ii) have transcribed this base before time 0, but not terminated transcription prior to time 0. The time since synthesis of the intron in case (i) above will be distributed uniformly between 0 and t labeling . And the time since synthesis of the intron in case (ii) will be distributed uniformly between t labeling and t labeling + t, where t is the time required to transcribe the portion of the transcript 3' of the intron. t was estimated as the distance from the 3'-most base in the intron to the annotated transcription endpoint, in nt, divided by the typical Drosophila transcription rate of 1500 nt/min 8,42 . We therefore estimated the mean time since synthesis for each intron captured, t inferred , as "#$%&&%' = *+,%*"#-+ 2 where t labeling is defined by the experimental condition, and t is calculated separately for each intron depending on its location in the transcript.
The approach outlined above rests on two main assumptions: 1) that all transcripts synthesized during the labeling period incorporate the 4sU label; and 2) that all transcripts containing label are equally likely to be captured. The first of these assumptions seem very plausible, while the second is undoubtedly an oversimplification. Additionally, our approach neglects any contribution from mRNA decay since mRNA half-lives are typically > 2 h in flies, much longer than typical splicing half-times estimated here 43 .
We used the set of Y values across inferred time points to estimate the rate at which each intron was excised by fitting a first-order exponential decay model:

Estimates for introns with short half-lives and long introns
The

Calculating splice site scores
We calculated the strength of splice sites using a maximum entropy model as implemented in maxEntScan 44 , using 9 bp around the 5' splice site (-3:+6) and 23 bp around the 3' splice site (-20:+3). These models were optimized on mammalian splice site preferences, but seem to be reasonable also for Drosophila and have been used in gene prediction in fly genomes.

Identifying enhancer regions
We used Drosophila transcriptional enhancers defined by the STARR-seq enhancer testing assay in Arnold et al. 33 . Significant STARR-seq peaks were overlapped with annotated Drosophila introns to identify transcriptional enhancers located within introns.

Identifying sites of recursive splicing
We used three features of recursive sites found in our nascent sequencing data to identify recursive sites: (1) splice junction reads derived from putative recursive sites

Splicing rates in recursively spliced introns
We quantified splicing rates for each recursive segment independently by mapping the original alignments onto new gene models with the upstream exon, recursive segment and downstream segment contiguous and then running the pipeline described above on these transformed alignments. Specifically, the new gene models consist of a single intron with the same length as the recursive segment flanked by exons with the same lengths as the original up and downstream exons.
Reads were mapped onto these gene models such that whether any other recursive splicing event has yet been completed was irrelevant. These reads were split into four classes: (1) Reads entirely in the recursive segment or either exon. Reads entirely inside any region were directly mapped onto their corresponding region in the new gene model.

Estimating splicing accuracy
We estimated the accuracy of splicing in Drosophila introns by identifying non-annotated junction reads with non-canonical splice site sequences within annotated introns. To do so, we first re-mapped the raw 4sU-seq reads with the STAR v2.5 software 46 , with the mapping parameter --outSAMattribute NH HI AS nM jM to mark the intron motif category for each junction read in the final mapped file.
The jM attribute adds a jM:B:c SAM attribute to split reads arising from exon-exon junctions.
All junction reads were first isolated and separated based on the value assigned to the jM:B:c tag. Junction reads spanning splice sites in the following categories were considered to be considered to arise from non-canonical non-annotated splice sites. We calculated the frequency of inaccurate splice junctions for each intron as a ratio of the density of reads arising from noncanonical non-annotated splice sites to the density of all junction reads from the intron.

Gene Ontology analyses
All Gene Ontology analyses were performed using the DAVID v6.7 database and biological processes gene ontology categories. For gene ontology analyses of recursively spliced introns, all genes with introns greater than 10,000 kb were used as a background set. For gene ontology analyses of fast or slowly spliced genes, all genes for which we were able to calculate a splicing half-life for at least 3 introns were used as a background set.

Data availability
Sequencing data have been deposited at the Gene Expression Omnibus (GEO) database under accession GSE93763.

Code availability
Code is available upon request and will be made available in a public repository prior to publication.