Distributed under Creative Commons Cc-by 4.0 Mirmod: a Tool for Identification and Analysis of 5 ′ and 3 ′ Mirna Modifications in next Generation Sequencing Small Rna Data

In the past decade, the microRNAs (miRNAs) have emerged to be important regulators of gene expression across various species. Several studies have confirmed different types of post-transcriptional modifications at terminal ends of miRNAs. The reports indicate that miRNA modifications are conserved and functionally significant as it may affect miRNA stability and ability to bind mRNA targets, hence affecting target gene repression. Next Generation Sequencing (NGS) of the small RNA (sRNA) provides an efficient and reliable method to explore miRNA modifications. The need for dedicated software, especially for users with little knowledge of computers, to determine and analyze miRNA modifications in sRNA NGS data, motivated us to develop miRMOD. miRMOD is a user-friendly, Microsoft Windows and Graphical User Interface (GUI) based tool for identification and analysis of 5 ′ and 3 ′ miRNA modifications (non-templated nucleotide additions and trimming) in sRNA NGS data. In addition to identification of miRNA modifications, the tool also predicts and compares the targets of query and modified miRNAs. In order to compare binding affinities for the same target, miRMOD utilizes minimum free energies of the miRNA:target and modified-miRNA:target interactions. Comparisons of the binding energies may guide experimental exploration of miRNA post-transcriptional modifications. The tool is available as a stand-alone package to overcome large data transfer problems commonly faced in web-based high-throughput (HT) sequencing data analysis tools. miRMOD package is freely available at


INTRODUCTION
miRNAs are a class of small non-coding RNAs (ncRNAs) of size 20-24 bases, acting as post-transcriptional regulators of gene expression (Bartel, 2009). miRNAs and transcription factors are important components of gene regulatory networks (Gu, Zhang & Wang, 2012). Thus, any molecular modification in the regulators of integrated networks produces several downstream effects (Gu & Xuan, 2013;Wang, Gu & Li, 2014). It is established in plants as well as animal cells that different post-transcriptional processes modify miRNAs after biogenesis. The modifications processes include trimming, addition or substitution of miRNA nucleotides (Ebhardt et al., 2009;Kim, Heo & Kim, 2010). Modified miRNA with terminal additions may be of two types, namely templated or non-templated depending on its alignment or non-alignment with the reference genome, respectively. The non-templated additions (NTA) are biologically relevant and physiologically regulated processes mediated by enzymes (Wyman et al., 2011). Although, the functional significance of such post-transcriptional modifications is yet to be extensively explored, few studies have concluded that the modifications play an important role in miRNA stability and affects miRNA-target binding and hence the efficiency of target repression. For example, in humans-the terminal adenylation of miR-122 enhances the miRNA stability while terminal uridylation of miR-26A reduces its ability to inhibit its target (Jones et al., 2009;Katoh et al., 2009). In general, adenylation increases miRNA stability, whereas uridylation promotes miRNA degradation (Baccarini et al., 2011;Ibrahim et al., 2010;Katoh et al., 2009;Lu, Sun & Chiang, 2009). However, in few exceptional cases, as in humans and drosophila, both the 3 ′ adenylation and uridylation promote miRNA degradation (Ameres et al., 2010). It is known that amongst the two miRNA terminal modification types, the 3 ′ end modifications are more common as compared to the 5 ′ end modifications (Ryan, Robles & Harris, 2010). The less prevalent 5 ′ end modifications are of great interest as the modifications can modify the seed sequence of its reference miRNA and may alter target binding or even change target mRNAs (Ameres & Zamore, 2013). The 5 ′ modifications are also reported to be a widely conserved phenomenon, though its functional significance is yet to be explored (Ryan, Robles & Harris, 2010;Wu et al., 2007). Most of the reported miRNA modification studies primarily focus on terminal adenylation and uridylation, however there are fewer reports on modifications with cytosine and guanine (Aravin & Tuschl, 2005;Burroughs et al., 2010;Morin et al., 2008). Previous studies suggest that post-transcriptional modification in miRNAs is a highly regulated phenomenon and few miRNAs are comparatively more frequently subjected to modification (Wyman et al., 2011). Therefore, local and global analysis of miRNA modifications can enhance our understanding of miRNA medicated cellular gene regulatory mechanisms. The availability of affordable high throughput (HT) small RNA (sRNA) sequencing techniques has made it possible to determine and analyze modified sRNA sequences on a global scale. However, analyzing such HT data is not only a computational challenge but also relatively more difficult for biologists with non-computational academic background. Moreover, the literature survey revealed that currently very few computational tools are available to specifically study different types of miRNA modifications in sRNA NGS datasets (see Table 1). For example, isomiRex (Sablok et al., 2013) determines only the miRNA variants that align with the precursor miRNA sequences, whereas tools like miRanalyzer (Hackenberg et al., 2009) and miRGator (Cho et al., 2012) searches only 3 ′ non-templated modifications. SeqBuster is limited to determination of 3 ′ mofications and 5 ′ trimming  (Pantano, Estivill & Marti, 2010). CPSS (Zhang et al., 2012) and Chimira (Vitsios & Enright, 2015) servers processes datasets for limited number of species only. Another tool, isomiRID (De Oliveira, Christoff & Margis, 2013), reports modified miRNA list without distinguishing templated and non-templated modifications. IsomiRage (Muller, Marzi & Nicassio, 2014) is a tool for quantifying user defined miRNA modifications in a small RNA NGS. Similar to the existing tools discussed above, the databases like YM500v2 (Cheng et al., 2013) also performs isomiR searches; however, it is restricted to data related to human cancers. Apart from the above-mentioned limitations, the web-based tools and databases also have technical and computational limitations like restriction of size and type of input files. The output files generated by most of the tools for analysis of miRNA modifications are simple text or html files, which more often requires manual efforts to obtain relevant statistics. Moreover, none of the tools focuses on the effect of modifications on miRNA-target interactions-an obvious and expected translational effect of such modifications. The limitations of the existing tools motivated us to develop miRMOD package, which not only identifies and analyzes miRNA modifications (non-templated additions as well as trimming) but also performs local and global analysis of changes associated with modifications. miRMOD also provides an optional feature to predict the effect of miRNA modifications on miRNA-target affinities, which is consequential for its biological function. Figure 1 is a summary illustration of miRMOD pipeline. miRMOD tool essentially requires three input files-namely, a fasta file of miRNA sequences, a fasta file of sRNA reads and Bowtie generated alignment file (Langmead et al., 2009). Prior to miRMOD run, the sRNA input file for Bowtie and miRMOD is required to be reformatted with the 'prepare input' program, packaged with miRMOD. The 'prepare input' module simply converts input file (fasta/fastq/tab separated format) into a miRMOD accepted preprocessed fasta format (No normalization is performed). The preprocessed file (output from 'prepare input') should be used for Bowtie alignment with its reference genome or pre-miRNAs, using user-defined parameters. miRMOD package is bundled with three sample files, including a fasta file containing mature miRNA sequences in the human genome (2578 sequences; source: miRBase version 20).

Design and algorithm
The miRMOD algorithm uses the following criteria in order to search nucleotide additions/trimming in NGS datasets. The first criterion is that the input miRNA sequences must perfectly align with the corresponding reference genome. The second criterion is that the miRNAs reads with mono or poly-nucleotide additions at the 5 ′ or 3 ′ end of a miRNA sequence must not align with the reference genome.
For each miRNA, a Z-score is calculated to measure its relative tendency to get modified under given miRMOD run parameters. The score is calculated using the following formula: where, s i represents the score for ith miRNA and defined as the product of number of modifications and total number of modified reads observed for a given miRNA. µ(S) represents the mean value of all miRNA scores (S) and σ (S) represents standard deviation of S.
Putative target sequences may also be specified as an optional input in order to compare target-binding affinities of query miRNA sequences with that of modified miRNAs. Target prediction is performed by RNAhybrid (Kruger & Rehmsmeier, 2006) using default parameters, followed by calculation of energy change and alteration in the target sites due to the modifications. miRMOD results are organized in different windows and tabs.

Sample datasets analysis and evaluation
In order to evaluate and check the robustness of miRMOD, we downloaded clean NGS reads of 15 libraries from a GEO entry with accession number GSE21279 (SRA: SRP002272). The reads of size 19-25 nucleotides and abundance equal or greater than 2 were selected for miRMOD analysis. The fasta formatted clean reads were converted into a miRMOD readable format, using miRMOD 'prepare input' . The 'prepare input' processed reads were then mapped to the reference human genome using Bowtie (Langmead et al., 2009) . Only known miRNA sequences (2578 miRNA sequences, miRBASE v20) from each dataset libraries are used as the query sequences in order to identify miRNA modifications. The miRMOD output of miRNA modifications for all the 15 libraries was compared and validated with the modifications previously reported for the test case (Li et al., 2012).

RESULTS
miRMOD facilitates identification and analysis of miRNA modifications in sRNA HT data. miRMOD identifies single and multiple additions/deletions of terminal nucleotides in miRNA reads. miRMOD analysis of already reported datasets illustrates the utility of the tool and helped evaluation of the package. Tabular display of output produced by miRMOD helps in visualization and analysis of miRNA modifications at global as well as local level. For each library, the first tab in the main miRMOD output summarizes the results in 3 sub-sections. The first sub-section displays the gross summary results of modified miRNAs in the dataset with SRA ID: SRX018958 (252 reads in this dataset), with their total number of modifications (1,699 reads) and the miRNA in the input dataset with the highest number of modified reads (Fig. 2). The length distribution graph shows that mono-nucleotide modifications are much higher than polynucleotide modifications. The second section of the tab gives the distribution of modified reads in four different types of modifications (3 ′ addition: 26, 75,808, 5 ′ addition: 8,512, 3 ′ trimming: 19, 81,380 and 5 ′ trimming: 70,237) and a graphical display of the results. The section also displays the percentage of modified reads present in input dataset (in this case it is 40%). The third section of the first tab summarizes the abundance of different type of sRNA reads i.e., miRNAs and the corresponding modified reads present in the input library, facilitating a global analysis. The 'Details' tab ( Fig. 3B) gives comprehensive information about different types of modifications found for each query miRNA along with its modified form. The 'Frequency table' tab provides the percentage occurrence of each type of mono or poly-nucleotide modifications in the input dataset (Fig. 3B). A more detailed description about the utility and unique capabilities of miRMOD are discussed later with appropriate examples and comparisons. miRMOD accepts fasta sequences of putative or already established miRNA targets (e.g., 3 ′ UTR) to predict and compare minimum free energy and target site changes in miRNA:target and modified-miRNA:target interactions. This kind of analysis facilitates the prediction of modifications leading to target alteration, or strength of possible binding (Ryan, Robles & Harris, 2010).

miRMOD performance
We compared the miRMOD analysis of 3 ′ nucleotide additions with the results reported by Li and coworkers (Li et al., 2012), see 'Sample dataset analysis and evaluation' for details of miRMOD run. Notably, using miRMOD we were able to identify all 3 ′ additions (only non-templated) reported by them in each of the 15 libraries. Additionally, miRMOD also determines several other terminal modifications i.e., 3 ′ trimming, 5 ′ additions and 5 ′ trimming, not reported by them. Thus, miRMOD can help in identification and analysis of different types of novel modifications. We calculated the percentage of single (A, T, G and C) and di-nucleotide (A * ,T * ,G * ,C * ), 3 ′ and 5 ′ additions as well as trimming in each of the sample libraries (Tables 2 and 3). The percentage addition of 'A' at the 3 ′ miRNA end is the highest among all the other nucleotide additions in the libraries, followed by the 'AT' and 'T' . Apart from 'A' and 'T' additions, miRMOD assisted analysis also revealed significant percentages of 'C' and 'G' nucleotide additions or miRNA trimmings, in addition to what is already reported in the publication related to the datasets (Li et al., 2012). The frequency of miR122-5p 3 ′ addition with adenylation is comparatively higher than that of any other miRNA in the libraries, in agreement with the previous reports (Li et al., 2012). As expected, the proportion of 'C' and 'G' addition at 3 ′ end is significantly less than 'A' or 'T' additions. However, the 'C' and 'G' nucleotides are trimmed mainly at 3 ′ end except in the sample SRX018961, where 'A' dominates all the other mono-nucleotide trimmings. Additionally, we also analyzed the nucleotide additions and trimming at the 5 ′ ends of miRNA sequences in all the sample libraries. It may be noted that there are comparatively few reports of 5 ′ modifications in the literature. The miRMOD analysis of the 15 libraries analyzed by us reveal that 'C' is the most frequently added nucleotide at the miRNA 5 ′ ends, followed by 'A' and 'T' , in the order of decreasing frequencies. In the case of 5 ′ trimming, mono-nucleotide deletion of 'C' followed by 'T' surpasses all other nucleotide deletions. Thus, we found that 'C' is an important nucleotide involved in the 5 ′ end miRNA modification. The miRMOD analysis reveals that the most common terminal miRNA modification is 3 ′ additions, followed by 3 ′ trimming, 5 ′ trimming and 5 ′ additions, in the descending order of frequencies. As reported in previous studies, 3 ′ additions also include AA, TT and TA in significant proportion, which is also confirmed by miRMOD analysis of the dataset (Burroughs et al., 2010;Li et al., 2012). Apart from mono or di nucleotide modifications, miRMOD also determines polynucleotide additions and deletions at the miRNA terminal ends (Table S1); however, these form a very small fraction of the overall modifications.
Another interesting observation is that miRMOD determined very few miRNAs (around 1-5) with terminal trimming that transforms it to another previously classified miRNA. For example, miR-548k is converted to miR-548av-5p in the sample SRX108971, due to removal of TCGT at the 3 ′ end. Although the percentage of such miRNA conversion is very low in the studied dataset, the studies on miRNA transformation in other sequencing datasets and experiments could be interesting. We also observed a repetitive nature of modifications through the analysis of the 15 datasets. 1012 modifications are common amongst the 15 datasets. Few modified sequences in the datasets are present in higher abundance for example '1056282' for has-miR-122-5p. Such a large number of repetitions of modified sequences rejects the probability that the sequences are mere artifacts and also suggests that such conserved modifications may be functionally significant and call for experimental scrutiny.

Comparison with other existing tools
Compared with the other tools, miRMOD is a user-friendly and easy to install, moreover miRMOD installation requires least software dependencies (see miRMOD manual).
Unlike the currently available webservers and tools for miRNA modification analysis, miRMOD has the advantage that large dataset transfer over the Internet is not required and the analysis pipeline is species independent. The tool determines trimming and non-templated nucleotide additions at both terminal ends of miRNAs in sRNA NGS datasets. It also determines the percentage occurrence of each modification in input dataset, per miRNA modification score along with 'miRNA to miRNA' conversion. The utility of miRMOD is best illustrated by a comparison of miRMOD analysis with other existing tools (Table 1). Comparison with CPSS tool requires detailed comparison, as it determines 5 ′ as well as 3 ′ trimming apart from addition of non-templated nucleotides. Amongst the other existing tools, miRGator requires input files in BAM format, with file size restriction of 20 MB. However, SeqBuster is a standalone tool which has several software dependencies required for its installation. miRMOD is free from any of the mentioned limitations and equipped with several unique features to facilitate and comprehensive analysis of miRNA modification in user defined inputs.
For comparison with CPSS, we used the sample dataset of human fetal ovary, available from CPSS website. Using miRMOD, we found that 355 miRNAs were modified from 668 miRNAs present in the dataset (Fig. 3A). The different types of modifications (5 ′ and 3 ′ non-templated additions and trimming with abundance ≥10, miRMOD default read count threshold) present in the CPSS output file were identified by miRMOD too. In the test library, hsa-let-7a-5p has the highest number of miRNA modifications with 485556 modified reads. Apart from this, the 3 ′ trimming (72%) is the most common modification type for the given sample dataset. The detailed result section further elaborated the summary for local analysis by separating the reads with different type of modifications (5 ′ or 3 ′ ) in a "TreeView" format as shown in Fig. 3B. This enabled us to infer the most prominent modifications for a selected modified miRNA without re-analyzing the predicted modifications as compared to the CPSS result. In the next section of miRMOD analysis, frequency table helped us in comparing the percentage occurrence of different modifications in the test dataset (Fig. 3C). This information can help us in identifying the most common nucleotides involved in modifications in the dataset and is also a unique miRMOD feature, not available in other tools. For example 'GTT' is a preferred nucleotides modification for 3 ′ trimming whereas 'A' is the most preferred modification for 3 ′ addition over the other types of modifications in the sample dataset. Another unique miRMOD feature 'miRNA2miRNA conversion' helps in identification of miRNA(s) that are trimmed and transformed into previously classified miRNA(s) from the list of miRNA(s) submitted by the user. miRMOD performs miRNA scoring and prioritization based on its tendency to modify and produce modified reads (Fig. 3D, see 'Materials and Methods'), a unique feature of miRMOD tool. Thus, the information like which miRNAs has the higher tendency for modification as compared to other miRNAs is swiftly retrievable from the dataset using miRMOD, unlike other prediction tools. Moreover, the optional and novel miRMOD feature of Target Variation Analysis (TVA) allows multi-way capturing of highly modified miRNAs and its modification(s) to explore if selected modification is playing any role in altering the miRNA-target interactions ( Fig. 3E). Thus, miRMOD offers several unique features for sRNA data analysis, which makes it highly useful tool for modification analysis.

miRMOD evaluation with experimentally validated miRNA modifications
We used miRMOD to identify experimentally validated miRNA modifications from a publicly available dataset of human neural dataset (SRA: SRX548700) (Tan et al., 2014). Not surprisingly, miRMOD successfully identified hsa-miR-9-1 miRNA sequence as well as its experimentally verified modified miRNA 'CTTTGGTTATCTAGCTGTATGA' (generated as a result of 5 ′ trimming) in the sRNA dataset. Apart from the above-mentioned modification, two additional modified sequences were also reported by miRMOD for hsa-miR-9-1 viz. 3 ′ trimming and 3 ′ addition of 'A' nucleotide (see Supplemental Information 1). The execution time for complete analysis of a given miRNA using the benchmark dataset was ∼15 min.

CONCLUSIONS
miRMOD is a Windows-based standalone specialized tool to identify and analyze nontemplated nucleotide additions and trimming at both the termini of the miRNA sequences in sRNA NGS data. The tool provides useful statistics about various types of miRNA modifications along with its frequencies, differential miRNA-modified miRNA abundance and frequencies of the modified bases found in each of the modifications. miRMOD also analyzes differences in miRNA target interactions, as a result if miRNA modifications. The miRMOD analysis gives a global as well as micro overview of miRNA modifications in sRNA NGS datasets. miRMOD is available as a GUI based as well as cross-platform command-line package, to generate user-friendly graphical representations for miRNA modifications in a NGS sRNA dataset, and its effect on miRNA-target binding affinity.