Identifying and characterizing functional 3' nucleotide addition in the miRNA pathway.

Over the past decade, modifications to microRNAs (miRNAs) via 3' end nucleotide addition have gone from a deep-sequencing curiosity to experimentally confirmed drivers of a range of regulatory activities. Here we review the methods that have been deployed by researchers seeking to untangle these diverse functional roles, including methods for characterizing not only the nucleotidyl transferases catalyzing the additions but also the nucleotides being added and the timing of their addition during the miRNA pathway. These methods and their further development are key to clarifying the diverse and sometimes contradictory functional findings presently attributed to these nucleotide additions.


Competing interests: AMB and YA declare no competing interests.

Introduction
The development and subsequent adoption of high-throughput sequencing as a tool to capture snapshots of the cellular transcriptome have considerably sharpened the collective resolution of cellular RNA landscapes. These advances have expanded our characterizations of the total transcribed coding and non-coding RNA complement [1,2] through discoveries of entirely novel classes of RNA [3,4]. They have also revealed the frequency and dynamics with which certain RNA classes are fragmented [5], and the frequency and diversity of post-transcriptional modifications to different RNA classes [6,7]. While work specific to this latter area of research has recently focused on adapting sequencing technologies to identify specific base modifications, such as methylation events [8], initial sequencing technologies were well-suited to directly detect terminal nucleotide addition [9] as well as internal editing [10][11][12].
Given their ubiquity, well-characterized functional roles, and relative ease of isolation through size exclusion, microRNAs (miRNAs) were an early and frequent subject of these analyses. Mature miRNAs are typically 20-23 nucleotides (nt) long and result from pathways that are typically unified by the processing of precursor miRNA (pre-miRNA) hairpins via the DICER ribonuclease. Pre-miRNAs are generally products of primary miRNAs (pri-miRNAs) that have been processed by the DROSHA nuclease. In animals, mature miRNAs are loaded from the DICER-derived double-stranded miRNA duplexes onto Argonaute (AGO) proteins of the PIWI/AGO superfamily. These proteins then present the miRNA for binding to complementary regions of other cellular transcripts. Binding via the miRNA seed region (positions 2-8 of the mature miRNA) to a complementary region, typically within the 3' untranslated region of a target mRNA, results in post-transcriptional reduction of mRNA expression via mRNA decay and/or translational repression.
Early sequencing experiments identified variations in the repertoire of mature miRNAs transcribed from individual miRNA loci. These distinct sequences, or "isomiRs" [13], frequently emerge through differential cleavage reactions [13,14]. However, further complexities in the isomiR repertoire deriving from a given miRNA locus were identified through deployment of second-generation sequencing technologies. The greater sequencing depth contributed to detection of high fractions of mature miRNAs containing one or more 3' nucleotide additions [9,15]. These 3' nucleotide additions included single and double nucleotide additions of adenosine and/or uridine, as well as mono-guanylation [15,16]. While 3' modifications had been studied for several decades in the context of mRNA poly(A)-tailing and CCA-addition to tRNA, the action of such terminal nucleotide transferase enzymes on other classes of RNA, including miRNA, was an emerging subject of research [17][18][19][20][21][22]. On balance, experimental evidence coupled with phylogenetic analyses suggested the diverse complement of RNA classes across eukaryotes was evolutionarily paralleled by diversification of terminal RNA nucleotidyl transferases (NTases) of the DNA polymerase β (polβ) superfamily [23][24][25]. Indeed, soon after 3' miRNA nucleotide additions were observed in deep-sequencing data, the GLD2/PAPD4 and MUT68 NTase enzymes of the polβ superfamily became the first experimentally validated miRNA 3' nucleotide transferases in animals [26] and plants [27], respectively. Other transferases from the same superfamily acting on miRNA were later identified [16][28][29][30], including those operating on the pre-miRNA [31,32].
The observations that 3' miRNA nucleotide addition events 1) are conserved across animals, with generally comparable rates of addition across species for some evolutionarily conserved miRNA families [15,33,34], 2) selectively occur at certain loci in specific tissues [15,34,35], and 3) are catalyzed by enzymes that had been implicated in regulating other RNA classes [19,23,36], strongly suggest that these 3' modification events are functionally important and not simply byproducts of promiscuous terminal transferase enzymatic activity. Subsequent research has characterized several distinct and sometimes competing or even contradictory functions for these modifications, depending on a range of contextual factors (Figure 1). Contextual factors include the timing of the transfer event during the miRNA maturation pathway. Nucleotide additions are known to occur both before DICER processing, i.e., on the pre-miRNA hairpin, and after DICER processing, i.e., on the mature miRNA derived from either the 3' or 5' arm of the hairpin, with distinct functional ramifications (Figure 1). A further, as-yet uncharacterized distinction could potentially exist between nucleotide addition occurring on the DICER-cleaved miRNA duplex and addition occurring after AGO loading of the mature miRNA. miRNAs derived from non-canonical miRNA maturation pathways, which feature alternative processing steps and include the mirtron and AGO2 nuclease-mediated pathways, are also targets for nucleotide addition. Depending on the timing of the nucleotide addition during these pathways, addition events can similarly have varying functional implications (Figure 1). Determining the timing of any identified addition event is a particularly vexing problem for researchers, as sequencing data alone are typically insufficient to identify the step at which an addition occurs.
Further contextual factors contribute to functional distinctions. Adenylation and uridylation events, paired with the NTases that catalyze them, have distinct functional outcomes, with uridylation being more closely tied to degradation [26][37][38][39][40] and adenylation more tied to stabilization [22][41][42][43][44]. However, the reverse functional outcome has been experimentally demonstrated for each case [30][45][46][47]. Differences have also been observed between single and multiple transfer events. For example, polyuridylation of pre-miRNAs has been convincingly linked to degradation and/or trimming. On the other hand, the potential difference between single-nucleotide addition events and the cumulative effects of multiple single-nucleotide addition events, including combinations of adenosine and uridine addition, awaits further clarification (Figure 1) [15,16]. Finally, the presence of ancillary, tissue-specific RNAi factors [24] can contribute to the recruitment of NTases to miRNA or the functional outcome of the addition events. For example, the Lin28 RNA-binding proteins recruit DNA polβ NTases so as to promote uridylation and subsequent degradation of let-7 pre-miRNA (Figure 1) [22,42], and the roquin RNA-binding protein stabilizes the miR-146a population in T cells, apparently through enhancing 3' end mono-uridylation of the mature miRNA [46]. Notably, identifying similar distinctions in non-animal RNAi systems is an active area of research [24,48,49].
As proposed by Jones and colleagues, the miRNA 3' terminus can be conceptualized as a node where competing mechanisms influence miRNA efficacy [28]. Taken together with the findings described above, this suggests that identification of 3' nucleotide addition at a miRNA locus is often a starting point that requires follow-up experimentation to disentangle potential regulatory roles. This review describes current methods for identification of 3' terminal nucleotide additions through both experimental and computational approaches.

Methods for probing miRNA 3' nucleotide additions
Deep sequencing technologies provide a top-down approach to identifying potential miRNA loci that are significantly influenced by terminal nucleotide addition events. Indeed, hypotheses leading to the discovery of important regulatory mechanisms at specific miRNA loci have previously stemmed, at least in part, from the initial analysis of high-throughput datasets [30,45,50]. In the first two sections below, we discuss the preparation and computational analysis of deep-sequencing libraries from the vantage of analyzing 3' nucleotide addition events. Approaches for the validation and characterization of 3' nucleotide-addition events at individual miRNA loci are still in their infancy; recently described approaches are also discussed below.

Deep-sequencing and pre-miRNA populations
The preparation of libraries for mature miRNA sequencing has been discussed at length elsewhere and is not covered here [51]. We focus on discussing the current methods for isolating and deep-sequencing cellular pre-miRNA populations, given that 3' nucleotide addition to pre-miRNAs has distinct functional implications (Figure 1).
Pre-miRNA sequencing is complicated by size overlap with other classes of RNA whose expression levels are higher than those of pre-miRNAs, often by an order of magnitude or more. However, taking advantage of the sequencing depth and longer read lengths of current technologies, one option for analyzing pre-miRNA landscapes is to simply deep sequence the whole small-RNA transcriptome and map the results to the genome [52]. While the yield is lower than for other classes of small RNA, such an approach can now generate sufficient reads to analyze pre-miRNA variation, particularly at highly expressed loci.
More targeted approaches to capture pre-miRNA diversity have also been devised. The first pre-miRNA landscape snapshot was attained through selective amplification using a library of pre-miRNA-specific 5' primers [43] built with inferred pre-miRNA sequence definitions from miRBase [53]. While this method efficiently captures pre-miRNA 3' end variation, it has several drawbacks. First, it cannot capture pre-miRNA 5' end variation. In addition, it relies on the accuracy of pre-miRNA 5' boundary definition in the utilized database, which can be flawed, particularly at loci where few pre-miRNA reads have been annotated. Finally, the method cannot be used to identify novel miRNA loci or improve 5' boundary definitions at already defined loci. Research projects seeking to analyze addition events over a more limited set of pre-miRNA loci have used this approach, building deep-sequencing libraries through standard approaches with an amplification step relying on a small but specific set of pre-miRNA 5'-end primers [54]. Another early method attempted to capture the pre-miRNA landscape absent a priori sequence information. This method targeted highly expressed small RNAs in the same size range as pre-miRNAs with locked nucleic acids (LNAs) [55]. While this method captures a greater range of natural variation (including 3' end nucleotide addition events), the expenses associated with LNA treatment, and the rounds of sequencing required to identify the best targets for LNA treatment in a given cell type, limit the frequency with which it can be deployed. Building on these themes, the target-enrichment of sRNAs (TeSR) protocol implements steps designed to further increase yield at sites of interest and control costs [56]. While this approach again curtails discovery by requiring advance sequence information, the flexibility inherent to the approach could be applied to capturing the complete range of dynamic variations at a specific pre-miRNA locus or set of loci.
In addition, as the catalogable universe of miRNAs approaches its limits [57], the requirement for supplying advance sequence information is arguably less of a concern going forward.
Methods underlying computational analysis of 3' end nucleotide addition identification and quantification.-Early efforts to analyze 3' end nucleotide additions relied on the development of internal computational pipelines. The steps in these pipelines are now routinely deployed by publicly available software packages. While software packages can offer convenience, and accommodate those not familiar with command-line interfaces, their use sacrifices a level of flexibility and control during the analysis. This section describes the general computational workflow for analyzing 3' nucleotide addition events, covering steps incorporated by publicly available software packages and earlier command-line-based approaches.
Sequences generated from small RNA libraries are first subjected to quality control as in any other sequencing library (e.g., fastx toolkit (http://hannonlab.cshl.edu/fastx_toolkit/index.html), tagdust [65]). Sequences are then mapped to the human genome; extensively used programs for this purpose include bwa [66] and bowtie2 [67]. When using these programs, parameters should be set to allow for multiple mismatches, ensuring the mapping of miRNAs with 3' end additions that do not match the genome sequence. However, given that several miRNAs belong to closely related families, it is important to avoid cross-mapping (i.e., mapping miRNA sequences originating from one locus to another locus), a danger amplified by using these less rigorous mapping parameters. Failure to guard against cross-mapping during the mapping process can lead to identification of false editing and false 3' end modification events, although this tends to be primarily a concern for the former [68]. To combat this, a probabilistic cross-mapping correction can be applied post-mapping or concurrent to mapping [68,69]. To our knowledge, no currently available software packages explicitly address this issue, although the forthcoming miRge 2.0 program described in bioRxiv [70] does appear to include a cross-mapping correction option during its implementation. While the lack of cross-mapping correction does not invalidate 3' end nucleotide addition analysis, it is important to allow for the possibility of cross-mapping effects while interpreting such results. The collection of mapped sequences is then annotated by comparison to known miRNA genome locus boundary definitions from miRBase [53] using genome annotation software suites such as bedtools [71] and samtools [72]. These programs can also be used to visualize the total isomiR repertoire at individual loci, which in turn can be used to isolate the total number of 3' end variants at each locus.
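Once a read has been assigned to a locus, the candidate 3' addition can be separated from the templated portion by comparing the read against the genome. The following is a minimal sketch of that idea, not the procedure of any particular package: the function name, the `max_tail` cutoff, and the toy sequences with invented flanks are all assumptions made for illustration.

```python
def split_nontemplated_tail(read, genome, start, max_tail=3):
    """Split a read into its templated part and a candidate non-templated 3' tail.

    read     -- small RNA read (RNA or DNA alphabet)
    genome   -- genomic sequence around the locus, same strand as the read
    start    -- 0-based index in `genome` where the read's 5' end maps
    max_tail -- longest 3' tail accepted (an assumed cutoff, not a standard)

    Returns (templated, tail), or None when the unmatched 3' portion is longer
    than max_tail (more likely an internal mismatch or a mis-mapped read).
    """
    read = read.upper().replace("U", "T")
    genome = genome.upper().replace("U", "T")
    matched = 0
    # Extend the match from the 5' end until the read and genome disagree.
    while (matched < len(read)
           and start + matched < len(genome)
           and read[matched] == genome[start + matched]):
        matched += 1
    tail = read[matched:]
    if len(tail) > max_tail:
        return None
    return read[:matched], tail


# Toy example: a 22-nt sequence embedded between invented flanking bases.
MATURE = "TGAGGTAGTAGGTTGTATAGTT"
GENOME = "AAAC" + MATURE + "GGGC"

for read in (MATURE, MATURE + "A", MATURE + "UU", MATURE + "G"):
    print(read, split_nontemplated_tail(read, GENOME, start=4))
```

In this toy case the read ending in an extra G is reported with an empty tail, because the genome also carries a G at that position; a prefix-matching approach of this kind cannot distinguish such an addition from differential cleavage.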
At this point, it is crucial to address whether or not an identified nucleotide addition event can be classified as significant. Although several proposals for addressing significance have been discussed, little in the way of consensus has been reached. Generally, significance of an addition event has been conceptualized at two levels: first, significance against an established background error rate for the sequencing machine; second, significance of an observed 3' addition count across either experimental conditions or relative to the total isomiR count at a single locus.
Determining significance relative to sequencing error is of particular importance at loci with low sequence counts. Such analyses can be performed using Fisher's exact test or a comparable statistical test, with a post hoc correction for multiple hypothesis testing (e.g., Bonferroni correction or false discovery rate control). Significance of an observed sequence count for a single isomiR with 3' additions relative to the total sequence count of all isomiRs at a specific locus can be tested similarly. In contrast, the significance of increases or decreases in 3' end addition events across distinct conditions is ascertained with differential expression analysis. As with comparisons of miRNA locus total expression, the DESeq2 [73] or edgeR [74] software packages can be utilized. Normalization warrants care in small RNA datasets [75,76], where the bulk of the total sequence count is expected to be concentrated in a relatively small number of loci, and several normalization strategies have been described [77][78][79][80]. Regardless of the software package or normalization strategy deployed, differential expression analyses require at least two biological replicates, and preferably more, of each condition [81].
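As an illustration of the first kind of test, the sketch below computes a one-sided Fisher's exact p-value directly from the hypergeometric distribution and applies a Bonferroni correction. The layout of the 2x2 table (addition reads versus other reads at a locus, compared with a background of erroneous versus correct 3' terminal calls) is an assumption about how the comparison might be framed, and all counts are invented.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Tests whether the first row is enriched for the first column, i.e. whether
    `a` is larger than expected given the table margins.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    # Sum hypergeometric probabilities for all tables at least as extreme as observed.
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(total, row1)


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical counts per locus: (reads with a 3' A addition, all other reads).
loci = {"locus_1": (40, 960), "locus_2": (3, 997)}
# Hypothetical background: (reads with an erroneous 3' terminal A call, other reads).
background = (5, 9995)

raw = {name: fisher_exact_greater(x, y, *background) for name, (x, y) in loci.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
for name, p in adjusted.items():
    print(name, p)
```

Exact binomial coefficients keep the example dependency-free, but they become slow for very deep libraries; an established statistics library would be the more practical choice there.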
However, the above comparisons to determine significance rely on a perhaps even more fundamental and overlooked problem: the need to obtain accurate 3' addition event counts. At issue is that the site immediately downstream of the primary isomiR sequence may be occupied by a genomically encoded adenosine or thymidine nucleotide, which can mask an addition event by making it appear to be the result of differential nuclease cleavage. Such cases are termed "ambiguous" 3' addition events, in contrast to "unambiguous" events [15]. Approaches to date have tended to either ignore ambiguous addition events or consider the two categories separately; these approaches fail to capture actual rates of addition, and ignoring ambiguous addition events can substantially underestimate the role of addition at some loci. One potential approach to this problem is estimating lower-bound percentages of genuine addition events among ambiguous addition events based on rates of unambiguous addition at other loci in a dataset. However, rates of addition can vary drastically at different loci and across tissue types [15,34,35]. With the proliferation of miRNA-landscape studies sampling broad ranges of tissue and species types [9,57,82], in addition to meta-analyses of publicly available data [35], in the future it may be possible to assign locus-specific percentages of likely addition events on a tissue-by-tissue basis.
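The ambiguous versus unambiguous distinction can be made explicit when tallying reads. The sketch below is one simple way to do so, under the assumption that each read has already been reduced to the nucleotides observed beyond the primary isomiR 3' end; the function name, labels, and example reads are hypothetical.

```python
from collections import Counter


def classify_extension(extension, downstream):
    """Label the nucleotides found beyond the primary isomiR 3' end.

    extension  -- observed 3' extension of the read ("" if none)
    downstream -- genomic sequence immediately 3' of the primary isomiR,
                  at least as long as the longest extension considered

    "ambiguous"   : the whole extension matches the genome, so it could be
                    templated (differential cleavage) or a genuine addition
    "unambiguous" : at least one extended position disagrees with the genome
    "none"        : the read ends at the primary isomiR 3' end
    """
    ext = extension.upper().replace("U", "T")
    down = downstream.upper().replace("U", "T")
    if not ext:
        return "none"
    return "ambiguous" if down.startswith(ext) else "unambiguous"


# Hypothetical extensions observed at one locus whose genome continues "AGC".
observed = ["A", "A", "U", "", "AA", "AG", ""]
tally = Counter(classify_extension(ext, "AGC") for ext in observed)
print(tally)
```

Keeping the three labels as separate counts, rather than discarding the ambiguous class, leaves room to apply a locus- or tissue-specific correction later.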

Methods for validating miRNA 3' end nucleotide additions
Despite over a decade passing since the first global characterizations of miRNA 3' end nucleotide additions, experimental techniques to confirm these addition events and monitor expression of isomiRs harboring these 3' end additions remain limited. Current research approaches typically involve at least two rounds of deep sequencing: the first round identifies a locus or set of loci undergoing a significant 3' end nucleotide addition event, and the second round is then performed to test for the global or locus-specific effects of altering NTase expression. As such, the costs and time involved in characterizing the addition events can be prohibitive, even without accounting for the possibility of failing to identify the responsible NTase on a first attempt.
Ideally, quantitative real-time PCR (qPCR) or comparable techniques would be utilized to consistently and accurately detect and quantitate isomiRs at a fraction of the cost of repeated deep-sequencing experiments. As with the quantification of isomiR-specific expression, or even expression of closely related miRNA genes, researchers face the fundamental issue of resolving expression between small RNA species with a single nucleotide difference [83]. Specifically complicating detection and quantification issues for individual isomiRs with 3' end nucleotide additions is that qPCR approaches currently utilized to quantitate miRNA expression typically employ a polyadenylation step followed by oligo(dT)-directed reverse transcription [84]. Thus, these approaches are particularly poorly suited for quantification of single or double 3' end nucleotide additions [84]. One notable exception is the miQPCR method, which will be described in detail below. For overviews of the latest technologies deployed in detection and quantification of miRNA and addition-independent isomiR expression, the reader is directed to other reviews [83,85]. Alternative or complementary approaches to qPCR, including exponential amplification reaction (EXPAR) [86] and droplet digital PCR (ddPCR)-based approaches [87], may have particular utility in the detection of mature miRNA 3' end nucleotide addition events, although, to our knowledge, these have yet to be leveraged for such purpose.
In the following, we will focus on techniques successfully deployed in studying 3' nucleotide additions. These typically center on the same deep-sequencing approaches described above, introducing upstream steps to identify and reduce the activity of the NTase(s) responsible for addition events. We will also describe a recent, promising locus-specific amplification approach that distinguishes di-uridylation at the 3' end of a specific miRNA isomiR.
Altering nucleotidyltransferase expression.-Absent more direct approaches to measure 3' end nucleotide addition, most research to this point has relied on reducing NTase activity prior to RNA isolation and deep sequencing library preparation. NTase activity has been reduced through several experimental means. Most commonly, siRNAs are transfected to reduce NTase expression level and activity [15,16,29,54]. Alternatively, NTase-deficient cell lines can be generated using a gene-trap technique [28]. RNA extracted in these NTase-reduced conditions has either been prepared for global deep-sequencing profiling (e.g., [15,28]) or has undergone locus-specific amplification using sense primers that match the 5' end of the targeted miRNA locus or set of loci (e.g., [54]). The latter option can be paired with lower-throughput sequencers such as the MiSeq. NTase activity, specifically that of TUT4/ZCCHC11, has also recently been reduced through treatment with small molecule inhibitors [88].
Targeted amplification of distinct isomiRs.-As mentioned above, the ability to rapidly and cost-effectively discriminate among separate isomiRs expressed from the same locus with distinct 3' end nucleotide addition events remains an ongoing challenge. Recent work by Gutiérrez-Vázquez and colleagues, which adjusted an earlier methodology developed by Benes and colleagues termed miQPCR, may point a way forward [41,89]. miQPCR is an RT-PCR-based technique [84] that serially extends the miRNA 3' end, first by adding an adaptor sequence known as the miLINKER and next by adding an RT primer-derived sequence to the 3' end of the adaptor during the RT step [89]. This is followed by amplification using a forward miRNA-specific primer and a reverse universal primer that bridges the miLINKER and the RT primer-derived sequences. This two-step 3' end elongation process is proposed to increase the specificity of the assay, which returned impressive detection rates when tested against the closely related members of the let-7 family [89].
To discern between isomiRs with and without di-uridine 3' end additions, the miQPCR protocol was further modified to introduce custom LNA probes alongside the qPCR primers. The LNA probe was designed to interact with specific isomiRs, including those with di-uridine 3' end additions, by bridging the mature isomiR 3' end, any 3' additions, and the 5' region of the miLINKER adaptor [41]. This enabled robust detection of specific isomiRs. Results from this method clearly demonstrated that di-uridine additions decreased in the activated T cell state as opposed to the naïve T cell state [41], a step forward both for monitoring 3' end modifications and linking these modifications to specific functions. The added step resembles an earlier modified protocol that attempted to isolate miRNA 3' end variants on an array using the NanoString nCounter miRNA assay platform [16,90]. In this technique, 3' ends of the array probes were ligated to a universal oligonucleotide, but reverse-complement bridge oligonucleotides were added together during the ligation step to generate a custom set of probes specific to isomiRs with 3' end additions corresponding to the sequences found in the bridge oligonucleotides. Hybridization rates were then measured and counted as per the standard nCounter assay [90].
While the demonstrated results validate further exploration of the miQPCR technique, several issues remain. Chiefly, the assay is not yet sensitive enough to detect endogenous isomiRs [41], although it effectively detected synthetic di-uridylated miRNAs nucleofected into naïve CD4 T cells. For loci with a great diversity of isomiRs with 3' end variations, the assay could conceivably be altered by increasing the total number of LNA probes to account for each isomiR with a distinct 3' end. Unfortunately, this approach could rapidly increase the overall cost of the assay. Further, the technique requires additional study to determine whether it is capable of discriminating between single nucleotide 3' additions. While an assay capable of distinguishing between di-nucleotide additions is valuable, single nucleotide additions are the predominant isomiR variations present at mature miRNA loci [9,15,16,34].

Future Directions
The first indications that a significant fraction of various mature miRNAs as well as their precursors are subject to the addition of one or more non-templated nucleotides to their 3' ends occurred over a decade ago, further evidencing the probable functional diversity hidden in the complete isomiR repertoire of the animal genome. Early studies on these 3' additions pointed to a range of potential and distinct functional roles. Recent research, often focused on a single miRNA locus or a limited set of loci, is beginning to unravel a complexity in the functions of these addition events that was initially underappreciated (Figure 1). A picture is emerging wherein the timing of addition during miRNA maturation combines with the identity of the added nucleotide, the NTase catalyzing the addition, and the presence or absence of auxiliary cofactor proteins to determine the overriding functional impact of the addition event (Figure 1).
Thus, as proposed previously, the 3' ends of mature miRNA and miRNA precursors appear to have been leveraged by the animal genome as a means of regulating miRNA production and function. Fully characterizing this regulation will require rapid and cost-effective methods that can identify and track 3' end nucleotide addition. While progress has been made in this direction, researchers are still constrained by the time and money inherent to the required multiple rounds of deep-sequencing experimentation.
In addition to the work required to resolve the sometimes-contradictory findings reported to date (Figure 1), several other avenues of research, with their accompanying complexities, await researchers in the field. One such avenue, detailed in Figure 1, is the set of addition events identified in global profiling that remain poorly understood [15,16,55]. A further emerging area of research to consider is the possible roles of 3' end addition in other small RNA classes. Several such classes are known to be processed by the RNAi machinery [24] and function in post-transcriptional gene repression, including tRNA-derived small RNAs [91], snoRNA-derived small RNAs [92], and vault RNA-derived small RNAs [93]. These classes often appear to display 3' end heterogeneity typical of canonical miRNAs, suggesting that NTases function on them in much the same way as on mature miRNAs. Possible NTase activity on other animal small RNA classes with links to the RNAi machinery awaits further exploration; of particular interest is the ancient RNAi-associating class of sense-antisense (s-as) transcription gene products [24] (for example, s-as transcripts observed during DNA damage response and regulation of splicing [94][95][96]). Outside of animals, 3' nucleotide addition has been observed for small RNA classes, corresponding with expansions of the template-independent DNA polβ superfamily of NTases [23][48][97][98][99]. Yet another avenue of research in need of further exploration is understanding the preferred nucleotide substrate, if any, for several families of NTases [24,100]. Such research could lead to discovery of the elusive 3' end guanylyltransferase [15,16]. Finally, the recent characterization of a divergent branch of the DNA polβ superfamily with untemplated RNA transferase activity [101] represents another enzyme to test for possible roles in small RNA 3' nucleotide addition.
We hope this discussion assists in the future study of 3' end nucleotide addition to mature miRNA and miRNA maturation pathway intermediates. Framed against the context of the presently understood functional roles for these modifications, it becomes clear that further methodological innovation will greatly assist in advancing understanding in this burgeoning field.

• 3' nucleotide addition in the miRNA pathway was first characterized over a decade ago
• Methods to distinguish and characterize these additions have been under development
• Diverse and even contradictory functions are attributed to these addition events
• Further methodology development will assist in further clarification of function

Figure 1 (caption fragment): ... already been monouridylated to increase DICER cleavage efficiency (see (B)) [15]. (G) Mirtrons are generally targeted for degradation in flies via Tailor activity [109,110], and small poly(U) tails have been speculated to be involved in miRNA transport to the cytoplasm [55].