Degradome sequencing-based identification of phasiRNAs biogenesis pathways in Oryza sativa

The microRNAs(miRNA)-derived secondary phased small interfering RNAs (phasiRNAs) participate in post-transcriptional gene silencing and play important roles in various bio-processes in plants. In rice, two miRNAs, miR2118 and miR2275, were mainly responsible for triggering of 21-nt and 24-nt phasiRNAs biogenesis, respectively. However, relative fewer phasiRNA biogenesis pathways have been discovered in rice compared to other plant species, which limits the comprehensive understanding of phasiRNA biogenesis and the miRNA-derived regulatory network. In this study, we performed a systematical searching for phasiRNA biogenesis pathways in rice. As a result, five novel 21-nt phasiRNA biogenesis pathways and five novel 24-nt phasiRNA biogenesis pathways were identified. Further investigation of their regulatory function revealed that eleven novel phasiRNAs in 21-nt length recognized forty-one target genes. Most of these genes were involved in the growth and development of rice. In addition, five novel 24-nt phasiRNAs targeted to the promoter of an OsCKI1 gene and thereafter resulted in higher level of methylation in panicle, which implied their regulatory function in transcription of OsCKI1,which acted as a regulator of rice development. These results substantially extended the information of phasiRNA biogenesis pathways and their regulatory function in rice.

OsmiR393a and OsmiR393b regulated rice primary root elongation and adventitious roots number via auxin signaling pathway [5]. The miR398 directly linked to the Arabidopsis stress regulatory networks such as oxidative stress,water deficit, salt stress, etc. [6].
In terms of siRNAs, their biogenesis could be triggered either endogenously by its genetic events or exogenously causes, such as virus infection or transgenic operation [7]. In contrast to miRNA, the precursor of siRNAs are usually long and double-stranded [7]. Recent years, researchers found the biogenesis of some siRNAs are "triggered" by miRNAs-mediated cleavage. Fragments resulted from mRNA cleavage are typically subjected to rapid degradation. However, a small proportion of the fragments will survive and subsequently be processed into double-stranded RNA (dsRNA) by RNA-dependent RNA polymerase 6(RDR6) with the aid of Suppressor of Gene Silencing 3 (SGS3). These double-stranded fragments will further be cleaved by Dicer-like (DCL) proteins in different phased manners to produce a series of 21-or 24-nt siRNAs, termed phased small interfering RNAs (phasiRNAs) [8].
SiRNAs in 21-nt length regulates gene expression by cleaving complementary transcripts the same as miRNA-mediated cleavage in plant. The bestcharacterized phasiRNAs are TAS loci-derived 21-nt trans-acting siRNAs (tasiRNAs) in Arabidopsis. Research discovered that miR173 targeted to TAS1 and TAS2 and resulted in the production of tasiRNAs. Interestingly, some of these tasiRNAs continued to recognize target transcripts to produce tertiary phasiRNAs [9]. The TAS1-and TAS2-derived tasiRNAs were involved in regulation of stress responses, such as improvement of thermotolerance [10], maintaining the normal morphogenesis of flowers in plants under drought stress conditions [11]. The biogenesis of TAS3-derived tasiRNAs were triggered by the miR390 recognition [12]. These tasiRNAs targeted to ARF family members which regulates various biological processes, including embryo development, thermotolerance developmental transitions, leaf morphology, flower and root architecture and stress responses [13,14]. Besides, report showed TAS4-derived tasiRNAs induced by miR828 regulated anthocyanin biogenesis via repression of MYB genes [15]. For siRNAs in 24-nt length, researches revealed that they were key players in triggering of RNA-directed DNA methylation (RdDM) [16], which is the major small RNA-mediated epigenetic pathway that causes transposable element repression and transcriptional gene silencing (TGS) in plants [17]. For example, recent research discovered that the distribution of 24-nt siRNAs differs in rice gametes (sperm and egg), as well as from vegetative tissues, which further suggest a major difference in reprogramming of their genomes prior to fertilization [18]. Different algorithm and software tools have been employed not only in mining the novel miRNA-phasiRNA pathways, but in exploring the miRNAs' extended regulatory networks [19]. Current research discovered two miRNAs, miR2118 and miR2275, were mainly responsible for the triggering of 21-nt and 24-nt phasiRNAs biogenesis, respectively [20]. And subsequent reports were then focused on the investigation of miR2118-phasiRNA and miR2275-phasiRNA biogenesis pathways and their biological functions [21,22]. To our knowledge, about 56 phasiRNA precursors (PHAS loci) have been identified in rice. For PHAS loci in other economic crops, approximately 261 PHAS loci in Zea mays (maize), 916 PHAS loci in Setariaitalica (foxtail millet), 201 PHAS loci in Solanumtuberosum (potato), and 123 in Solanumlycopersicum(tomato) have been discovered, respectively [23]. Besides, in addition to those noncoding regions in genome, protein-coding genes could also be the PHAS loci in plants [23,24], which implies a more complicated mechanism of plant phasiRNA biogenesis.
Due to the biological significance of phaiRNAs, mining of novel miRNA-phasiRNA pathways as well as functional cascade amplification have attracted wide attention. As an important economic crop, investigating novel phasiRNA pathways will not only benefit our understanding in post-transcriptional regulations in this organism, but also could be used as references across economic crops.
Previously, we discovered lots of siRNAs in a threeweek-old seedling sample by using the corresponding sRNA high-throughput screening (HTS) datasets. This inspired us that some of them might be phasiRNAs. Here, we continued to use our previously developed approach [25] for systematically mining of phasiRNA biogenesis pathways with these sRNA HTS datasets. In addition, considering some phasiRNAs expression might be tissue specific or stress dependent, we collected comparable sRNA HTS data sets published elsewhere using tissue-specific rice samples, which cultured under normal (control) or stress condition. The targets of novel phasiRNAs were further predicted and verified in order to provide substantial information of miRNA/sRNA-phasiRNA regulatory network in rice.

Identification of novel phasiRNA biogenesis pathways in Oryza sativa
The sRNA HTS datasets from different rice samples were employed as inputs and rice cDNA sequences as alignment reference for searching PHAS loci capable of producing 21-nt or 24-nt phasiRNAs. As a result, fourteen 21-nt and nineteen 24-nt PHAS loci candidates passed through the filtering procedures as well as the corresponding searching of sRNA triggers for phasiRNA production (Additional file 1: Table S1, Additional file 2:  Table S2). Recent reports discovered that processing of 21-nt phasiRNAs mainly depends on OsDCL4, and OsDCL3 is required for biogenesis of 24-nt phasiRNAs in rice [20]. Therefore, we evaluated the abundance of 21-nt and 24-nt phasiRNAs generated from potential PHAS loci candidates by comparing the wild-type (wt) with osdcl4 knockdown mutant (osdcl4-1) [26] (for 21nt phasiRNAs) and osdcl3 knockdown mutant (osdcl3-1) [20] (for 24-nt phasiRNAs), respectively.
As a result, five novel 21-nt PHAS loci and five novel 24-nt PHA loci along with their corresponding miRNA/ sRNA triggers were identified ( Table 1). As shown in Fig. 1 and Fig. 2, the miRNA/sRNA triggers-mediated cleavages in target PHAS loci were detected by at least one degradome sequencing dataset. Indeed, each cleavage site was close to the flank of phasiRNA production region as indicated by the relative abundances of phasiR-NAs (middle panel), which suggested these sites were primary registers for phasing process. Additionally, the abundance of phasiRNAs generated from these newly found 21-nt and 24-nt PHAS loci in wild type were relatively higher than that in osdcl4-1 mutant and osdcl3-1 mutant, respectively. This indicated that the 21-nt and 24-nt phasiRNA productions are OsDCL4-and OsDCL3 dependent, respectively (Additional file 3: Figure S1 and Additional file 4: Figure S2). Taken together, these results demonstrated that these newly found PHAS loci fit the profiles of canonical phasiRNA precursors [8,20].
Previously, we found lots of sRNAs generated in threeweek-old seedling tissues (see details about the information of GEO number of dataset and culture condition of plants in Additional file 5: Table S3). Here, we tested whether these sRNA are phasiRNAs by using our mining method. As expected, the phasiRNAs generated from two novel 21-nt PHAS loci (LOC_Os01g57968.1and LOC_Os05g43650.1) and four novel 24-nt PHAS loci (LOC_Os02g20200.1, LOC_Os02g55550.1, LOC_ Os04g45834.2 and LOC_Os09g14490.1) were identified in three-week-old seedling tissues (Table 1).
Since the triggering of phasiRNAs are sometimes tissue specific and stress dependent, a serial of sRNA HTS datasets of different rice samples were employed for mining novel PHAS loci in different tissues and stress conditions (see details about the information of GEO number of datasets, culture and treatment conditions of plants in Additional file 5: Table S3). As shown in Fig. 1 Taken together, these protein-coding genes acted as PHAS loci in different tissues and stress conditions suggested these coding sequences were regulated at posttranscriptional level in response to different stages of growth and stress conditions.
In consistent to previous discovery, two known 21-nt PHAS loci (LOC_Os12g42380.1 and LOC_Os12g42390.1) were also uncovered by our screening procedure (Additional file 1: Table S1), which have been identified as two parts of a long non-coding RNAs [27]. LOC_ Os12g42380.1-derived phasiRNAs were detected in both seedling and panicle under normal, drought and salinity stress conditions. Yet they were only detected in shoot under salinity stress. LOC_Os12g42390.1-derived . For each graph, degradome supported cleavage signature on PHAS loci were profiled above, four high throughput degradome sequencing datasets (GSM434596, GSM455938, GSM455939 and GSM476257 which were represented by triangle, diamond, circle and square with different colors, respectively) were employed for scanning the sRNA triggers' cleavage sites, which were marked by black arrows. The x axis represents the position on PHAS loci and the y axis represents the signature abundance. The abundance of 21-nt phasiRNAs which generated from the sense and antisense strand of PHAS loci in different samples were evaluated and profiled in middle images, the x axis represents the position on PHAS loci and the y axis represents the phasiRNA abundance. The phasing score of 21-nt PHAS windows were profiled at bottom, the x axis represents the position on PHAS loci and the y axis represents the phasing score phasiRNAs were detected in shoot under normal condition, and panicle in drought. These results implied there are three alternative phasiRNA production regions within their lncRNA PHAS loci, and therefore the capability of phasiRNA production might vary in different development stages and stress conditions. To note, to our knowledge, for all these newly found PHAS loci, only the biogenesis of LOC_Os04g25740.2derived phasiRNAs were triggered by a known miRNA, miR2118f. The rest of them were first-time discovered, and were recognized by novel sRNAs (Table 1), which suggested these phasiRNA biogenesis pathways are not belong to the miR2118 or miR2257 mediated regulatory networks.

Analysis of the RNA directed DNA methylation (RdDM) regulated promoters of novel 24-nt phasiRNAs
RdDM is an important regulatory event with regards to repressive epigenetic modification which triggers transcriptional gene silencing. In order to analysis the novel 24-nt phasiRNA mediated RdDM in rice, we focused on all the known promoter sequences for scanning the target sites of novel phasiRNAs generated from the newly found five 24-nt PHAS loci. The result indicated a promoter of LOC_Os02g40860.1 gene was targeted by five LOC_Os01g37325.1-derived phasiRNAs (Table 3). Since LOC_Os01g37325.1-derived phasiRNAs were detected in panicle rather than in root tissue (Fig. 2), we used the bisulfite-seq and RNA-seq datasets [41] of rice panicle and root for identification of LOC_Os01g37325.1-derived phasiRNAs-mediated DNA methylation intarget promoter and their role in transcriptional repression of target gene (LOC_Os02g40860.1). It was reported that CG and CHG methylation contexts are maintained by DNA methyltransferases and histone modifications, while CHH methylation was associated with 24-nt siRNA For each graph, degradome supported cleavage signature on PHAS loci were profiled above, four high throughput degradome sequencing datasets (GSM434596, GSM455938, GSM455939 and GSM476257 which were represented by triangle, diamond, circle and square with different colors, respectively) were employed for scanning the sRNA triggers' cleavage sites, which were marked by black arrows. The x axis represents the position on PHAS loci and the y axis represents the signature abundance. The abundance of 24-nt phasiRNAs which generated from the sense and antisense strand of PHAS loci in different samples were evaluated and profiled in middle images, the x axis represents the position on PHAS loci and the y axis represents the phasiRNA abundance. The phasing score of 24-nt PHAS windows were profiled at bottom, the x axis represents the position on PHAS loci and the y axis represents the phasing score   guided RdDM [16]. We discovered the CHH methylation status of promoter was relative higher in panicle than in root (Fig. 5). In addition, the expression level of LOC_Os02g40860.1 was relatively lower in panicle than in root. These results implied a methylation mediated transcriptional silencing of the promoter of LOC_ Os02g40860.1. For LOC_Os02g40860.1, it encodes a Casein kinase I1 (OsCKI1) protein belongs to the CKIs protein family, which are highly conserved in eukaryotes. They are involved in a variety of important biological events since they have a wide substrate specificity in vitro [42]. Taken together, we speculated that the OSsRNA-14-LOC_ Os01g37325.1-phasiRNA pathway might play crucial roles for rice seedling and panicle development.

Discussion
In recent years, researches on Oryza sativa have shown that 21-or 24-nt sRNAs distribute to genomic clusters [43]. To date, dozens of PHAS loci have been discovered in rice [23]. However, two miRNAs, miR2118 and miR2275 are mainly responsible for the triggering of 21nt or 24-nt phasiRNAs biogenesis from these PHAS loci.
Considering there are rich sources for miRNA/sRNA-phasiRNA pathways in other plant species, we conceived that the miRNA-phasiRNA pathways have not been fully discovered in rice either. Therefore, it is worthy of continuing the mining for better understanding the mechanism of phasiRNA biogenesis and the miRNA-derived regulatory network. In our previous work, we found plenty of sRNAs with unknown function and origin from a sRNA HTS data set of three-week-old seedling tissue, and speculated some of them were phasiRNAs with regulatory functions. In this study, we performed a systematically searching of novel PHAS loci from rice cDNA by utilizing the same seedling dataset with our previous established mining approach [25].
The novel 21-nt PHAS loci, LOC_Os05g43650.1, is a miniature inverted-repeat transposable element (MITE) gene [44]. Also, with regards to two 24-nt PHAS loci, LOC_Os01g37325.1 and LOC_Os02g20200.1, they are two retrotransposon genes. These indicated that the transcripts of transponsons and retrotransponsons are capable of producing secondary siRNAs, which is consistent with the same phenomenon reported by Creaseyet al. in Arabidopsis [45,46].
According to the target information of phasiRNAs, the OSsRNA-3-LOC_Os05g43650.1-phasiRNA and OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathways are required for the rice development. Transponsons and retrotransponsons that play important roles in plant gene and genome evolution are ubiquitous in plants [47]. We hypothesized that the transcripts of transponson and retrotransponson might also function as important sources of phasiRNA in plants. Further exploration of such phasiRNA biogenesis pathways could benefits the in-depth investigation of their biogenesis mechanism and the miRNA/sRNA directed regulatory networks in plants.
For those phasiRNAs generated from the transcripts of LOC_Os01g57968.1, LOC_Os02g20200.1, LOC_ Os02g55550.1, LOC_Os04g45834.2 and LOC_Os09g14490.1, none of their targets were identified. However, considering these phasiRNAs were detected only in seedling, it still cannot rule out the possibility that these phasiRNA biogenesis pathways might take place in rice seedling development. LOC_Os04g45834.2 encodes a DUF584 domain containing protein. These protein family has been involved in leaf senescence in plant [48]. LOC_Os09g14490.1 encodes a TIR-NBS type disease resistance protein, which has been identified in resistance to multiple viruses in plant [49][50][51]. LOC_ Os02g55550.1 encodes a F-box/LRR-repeat protein 14, which is involved in plant immune response [52]. These genes have been proved to play important roles in plants, however, their capability of producing secondary phasiRNAs suggest they might be involved in much more complex function than what we expected. Similarly, no targets of LOC_Os01g57968.1-derived phasiRNAs was identified, however, since these phasiRNAs only expressed in panicle tissue under normal condition, it might suggest the OSsRNA-1-LOC_Os01g57968.1-phasiRNA pathway might related to the rice panicle development. Thus, systematically investigation of the temporal and spatial expression specificity of phasiR-NAs generated from the transcripts of protein-coding genes in our future work might gain insight into these phasiRNAs biogenesis requirement mechanism.
In this study, two cDNA sequences, LOC_ Os09g00999.1 and LOC_Os09g01000.1, which were able to produce plenty of Dicer-independent secondary siR-NAs in most of tissues, have attracted our attention. We further employed the searching of phasiRNAs generated from LOC_Os09g00999.1 and LOC_Os09g01000.1 for target prediction and identification. The results indicated plenty of siRNA-target interaction pairs were discovered (data not shown). This might suggest a novel pattern of secondary siRNAs biogenesis pathways. Therefore, further investigation of Dicer-independent secondary siR-NAs biogenesis pathways in plant might provide more strong evidence of this biogenesis pattern, and more meaningful information of the small RNA regulatory mechanism in plant.

Conclusions
Here, we performed degradome-based screening of novel phasiRNA biogenesis pathways in rice. Five novel 21-nt phasiRNA biogenesis pathways and five novel 24-nt pha-siRNA biogenesis pathways were also identified in addition to two known 21-nt phasiRNA biogenesis pathways. Further analysis on the targets of these novel pha-siRNAs in 21-nt and 22-nt length revealed that eleven novel phasiRNAs mediated forty-one siRNA-target interactions during rice growth and development ( Table 2, Additional file 1: Table S1, Additional file 2: Table S2 and Additional file 6: Figure S3). These results demonstrated the effectiveness of degradome-based screening in mining novel phasiRNA biogenesis pathways and substantially extend the information of phasiRNA biogenesis pathways in rice. We believed that, more novel pha-siRNA biogenesis pathways might be identified if extend our approach to other plant species.

Data source
The Oryza sativa sRNA HTS datasets of seedling, root, shoot and panicle samples under normal (control) and stress conditions, the sRNA HTS datasets of wild type, osdcl4 and osdcl3 mutants and the degradome sequencing datasets were retrieved from GEO (Gene Expression Omnibus; http://www.ncbi.nlm.nih.gov/geo/). The bisulfite-seqand RNA-seq datasets of panicle and root were contributed by Zhao et al. [41]. All the HTS datasets employed for our study were listed in Additional file 5: Table S3.
The cDNAs, full-length genomic sequences of Oryza sativa were retrieved from PlantGDB (http://plantgdb. org/XGDB/phplib/). The promoter sequences of Oryza sativa were retrieved from PlantProm DB (http://linux1. softberry.com/). All the high-throughput sequencing data were pre-processed before use, the data of each library was normalized in RPM (reads per million) as described in our previous report [53].

Identification of phasiRNA biogenesis pathways in Oryza sativa
The phasiRNA loci identification criteria were established based on the revised trans-acting siRNA (tasiRNA) biogenesis model as we reported previously [28]. The screening of PHAS loci in rice was followed by four steps: (1) cDNA/genome sequences-derived 21-nt phased duplexes were computational predicted by "phase processing", each of these duplexes has a 2-nt overhang at 3′-end. (2) Each of these duplexes was separated into two increments and used for matching with small RNAs from small RNA high throughput sequencing datasets of Rice. A potential phasiRNA production region shall contain at least 5 tandem "processing" duplexes and each of these duplexes shall contains detectable phasiRNA from sense strand (plus siRNA) and/or antisense strand (minus siRNA). (3) Degradome HTS libraries which contributed by the works of Wu et al. [54], Li et al. [55] and Zhou et al. [56] were employed for systematically scanning the degradome-supported cleavage signatures on the screened possible phasiRNA production regions as we described in our previous work [28], and maintain the PHAS loci candidates with cleavage signatures which located in the phasiRNA production region. (4) The sRNAs bound to the PHAS loci were analyzed by using miRU algorithm [57], and the sRNA cleavage sites on those loci were further verified by using degradome sequencing libraries. The degradome-supported cleavage site of a sRNA trigger shall reside within 10 to 11-nt from the 5′ end of the binding site [58]. (5) The phasing score of phasiRNA production from each PHAS loci candidate should above 1.

Calculation of phasing score
Phasing scores of phasiRNA regions were calculated based on the formula which contributed by Zheng et al. , where N represents the number of phase register occupied by at least one unique 21-nt/24-nt small RNA within a fivephase register window, p represents the total number of reads for all 21-nt/24-nt small RNA falling into a given phase in a given window, U represents the total number of unique reads for all 21-nt/24-nt small RNA falling out of a given phase.

Identitification of phasiRNA-target interaction based on degradome sequencing
The expressed novel phasiRNAs generated from 21-nt PHAS loci were predicted based on previously modified model of tasiRNA biogenesis in plant [28]. The predicted phasiRNAs were recruited for target prediction by using miRU with default parameters [57], and followed by degradome sequencing-based verification, as described previously [53,59].

Gene expression level analysis
The sequences of RNA-seq datasets were mapped to the reference cDNA sequences, and each gene expression level was calculated by the total RPM of mapped sequences.

Identification of 24 nt phasiRNA target
In order to identify the potential 24-nt phasiRNA target sites in promoter sequences, BLAST analysis was performed for finding the location of the complementary sequence of 24-nt phasiRNA with no mismatch [60]. The promoters possessed phasiRNA binding sites were remained as potential target promoters. As each of the downloaded promoter sequence containing partial mRNA sequence, we identified the corresponding potential target genes by mapping the partial mRNA sequence to cDNA sequences. The DNA methylation status of potential target promoters were analyzed by utilizing the bisulfite-seq datasets of panicle and root of rice. The expression specificity of phasiRNA in different tissues should consistent with the occurring of increasing methylation of the target promoter. The DNA methylation analysis of promoters were performed according to the method developed by Zhao et al. [41]. The sequences of bisulfite sequencing libraries were mapped to the potential promoter sequences, and the uniquely mapped sequences were used for further DNA methylation level analysis. The DNA methylation level of each cytosine was obtained by calculation of the total coverage of individual cytosines in RPM.