Genetic basis of sRNA quantitative variation analyzed using an experimental population derived from an elite rice hybrid

We performed a genetic analysis of sRNA abundance in flag leaf from an immortalized F2 (IMF2) population in rice. We identified 53,613,739 unique sRNAs and 165,797 sRNA expression traits (s-traits). A total of 66,649 s-traits mapped 40,049 local-sQTLs and 30,809 distant-sQTLs. By defining 80,362 sRNA clusters, 22,263 sRNA cluster QTLs (scQTLs) were recovered for 20,249 of all the 50,139 sRNA cluster expression traits (sc-traits). The expression levels for most of s-traits from the same genes or the same sRNA clusters were slightly positively correlated. While genetic co-regulation between sRNAs from the same mother genes and between sRNAs and their mother genes was observed for a portion of the sRNAs, most of the sRNAs and their mother genes showed little co-regulation. Some sRNA biogenesis genes were located in distant-sQTL hotspots and showed correspondence with specific length classes of sRNAs suggesting their important roles in the regulation and biogenesis of the sRNAs. DOI: http://dx.doi.org/10.7554/eLife.03913.001

quantitative difference has been assumed (He et al., 2010;Groszmann et al., 2011b). It is not known whether the quantitative variation of sRNA species between genotypes is related to the biological machinery, as quantitative variation of sRNAs has not been assayed at the population level and their genetic control has yet to be elucidated.
The recently developed expression quantitative trait locus (eQTL) analysis has provided an approach for determining the genetic control of the expression level of a gene, including cisand trans-eQTLs, as well as epistatic effects (Becker et al., 2012). This approach can also be applied to the genetic analysis of quantitative variation of sRNAs by regarding the abundance of the sRNAs in the population as quantitative traits. Once the QTLs are identified, subsequent studies can be pursued very much the same way as the analysis of genes and regulatory networks underpinning phenotypic QTLs .
There have been several studies focusing on the genetic regulation of known and validated miRNAs and small nucleolar RNAs in specific tissues/cells from samples of the human population (Borel et al., 2011;Gamazon et al., 2012;Parts et al., 2012;Civelek et al., 2013;Jin and Lee, 2013;Siddle et al., 2014). These studies detected a number of cisand trans-miQTLs that could also influence the expression of the mRNA targets, which may be associated with phenotype difference.
Here, we performed a whole genome QTL analysis of the entire sRNA kingdom consisted of 18-nt to 26-nt sRNAs from flag leaf of rice using an experimental genetic population. The analysis revealed features of the genetic controls of sRNA abundance showing both commonality and distinction with their precursor transcripts. It was also shown that the abundance of sRNAs is probably related to proteins constituting the machinery for sRNA biogenesis and functioning.

Patterns and distributions of sRNAs
The genetic materials consisted of 98 hybrids obtained by paired crosses of 196 recombinant inbred lines (RILs) derived by single seed descent from a cross between Zhenshan 97 and Minghui 63, the parents of Shanyou 63 that was the most widely cultivated rice hybrid in the 1980s and 1990s and still used with reduced area in recent years in China. Because the genetic composition of these crosses resembled individuals in an F 2 population, they were referred to as an immortalized F 2 (IMF2) eLife digest Genes within the DNA of a plant or animal contain instructions to make molecules called RNAs. Some RNA molecules can be decoded to make proteins, whereas others have different roles. A single gene often contains the instructions to make both protein-coding RNAs and noncoding RNAs.
Molecules called small RNAs (or sRNAs) do not code for proteins. Instead, sRNAs can control protein-coding RNA molecules or chemically alter the DNA itself; this allows them to perform many different roles in living organisms. In plants, for example, these molecules affect how the plant grows, the shapes and structures it forms, and how likely it is to survive challenges such as drought and diseases. Often different plants of the same species have different amounts of sRNAs, but the reasons for this remain unclear. Now, Wang, Yao et al. have made use of a technique called 'expression quantitative locus' analysis to look at how sRNAs in rice plants are controlled by additional information encoded within DNA. The analysis identified over 53 million sRNA molecules from a population of rice plants. Many of these sRNAs varied in their abundance between different plants within the population. Wang, Yao et al. also found many thousands of individual instructions within the DNA of the rice that can either increase or reduce the abundance of their associated sRNA.
Some of the abundant sRNAs were influenced by instructions within their own genes; some were influenced by instructions from other genes; and some were influenced by both. Wang, Yao et al. also found that the control of protein-coding RNAs was not necessarily related to the control of sRNAs encoded by the same gene. Further work is now needed to identify which specific DNA sequences regulate the abundance of sRNA molecules in plants and other organisms.
population (Hua et al., 2002(Hua et al., , 2003, and each hybrid in the population was hereafter referred to as an IMF2 for ease of description. An sRNA library was constructed using RNA extracted from flag leaf at the day of full expansion for each IMF2, and two biological replicates were obtained for each of the two parental lines and their F 1 hybrid, producing a total of 104 libraries. The read size of the raw sequencing data obtained using Illumina Hiseq2000 varied from 3 nt to 44 nt (Figure 1-figure supplement 1). Sequences of 18-26 nt in length that appeared to be the more abundant than others were kept for the analyses after filtering out low-quality reads and eliminating ones matching tRNAs, rRNAs, snRNAs, and snoRNAs ( Figure 1-figure supplement 2). The numbers of resulting reads varied from 14.52 million to 27.73 million per library (2.08 billion in total) with non-redundant reads ranging from 3.00 million to 6.86 million per library (0.52 billion in total) (Supplementary file 1 in Dryad [Wang et al., 2015], Table 1, Figure 1-figure supplement 2). The 24 nt sRNAs were the most numerous in both redundant and distinct reads (Figure 1-figure  supplement 2).
We mapped the reads to the SNP-replaced reference genomes of the parents ('Materials and methods'), with unique location allowing no mismatch. The reads could be divided into four categories: (1) approximately 82.28% of the sRNAs had identical sequences between parents, (2) 0.71% of the sRNAs had SNPs between the two parents, (3) 8.35% were only mapped to the Zhenshan 97 genome, and (4) 8.66% were specifically mapped to the Minghui 63 genome (Supplementary file 2 in Dryad [Wang et al., 2015]). A total of 53,613,739 unique sRNA sequences including ones with SNPs between the parents were identified by combining all the reads from 104 libraries ( Table 1). Approximately 84.7% of sRNAs were found in no more than 5 IMF2s, while only 0.13% were present in all 98 IMF2s ( Figure 1A).
The distributions of the sRNAs in different portions of the genome were not random. Approximately 51.1% of sRNAs originated from the 2-kb upstream and the genic regions of nontransposon genes ( Figure 1B). In addition, sRNAs occurred in high frequencies near the transcription start sites compared to other regions of the promoters (Figure 1-figure supplement 3). About 34.4% of sRNAs originated from intergenic regions, which account for no more than 20% of the whole genome length ( Figure 1B). About 43.9% of sRNAs were 24 nt, while 12.1% were 21 nt ( Figure 1C).
Different species of sRNAs differed in their origins of genomic regions. The genic regions of nontransposon genes held the highest number of 21 nt sRNAs, while 24 nt sRNAs were most common in the intergenic regions ( Figure 1D). sRNAs of 21 nt in genic regions were mostly derived from the exon of non-transposon genes, while 24 nt sRNAs in genic regions were mainly from the intron of nontransposon genes ( Figure 1E). The 24-nt sRNAs mainly consist of endogenous heterochromatic siRNAs (Axtell, 2013) and tend to be produced from repeats and transposable elements as well as intergenic regions (Chen, 2009). The 24-nt sRNAs are components of the epigenome that target the homologous genomic regions for de novo DNA methylation through RNA-directed DNA methylation to maintain genome stability by transcriptional gene silencing (Groszmann et al., 2011a(Groszmann et al., , 2013Pooggin, 2013).
Next, the abundance of each sRNA in a library was normalized to number of reads per millions (RPM) ('Materials and methods') to quantify the sRNA levels in each library, which were subject to analyses and comparisons. An sRNA with RPM value ≥0.6 was regarded as expressed, and an sRNA identified as expressed in more than 25 of the 98 IMF2s was regarded as an sRNA expression trait (s-trait). In this way, a total of 165,797 s-traits were recovered (Supplementary file 3 in Dryad    [Wang et al., 2015]). A total of 87.9% of s-traits were 24 nt ( Figure 1F). The s-traits mainly originated from the intergenic regions, 2-kb upstream, and genic regions of non-transposon genes ( Figure 1G).
Although the 2-kb upstream and the genic regions of non-transposon genes contained almost equal number of sRNAs ( Figure 1B), the number of s-traits derived from the 2-kb upstream of nontransposon genes was more than twice of the number of s-traits in genic regions of non-transposon genes ( Figure 1G). The distribution of different species s-traits in different genomic regions was similar to that of sRNAs ( Figure 1D,H). Although sRNAs of various lengths from genic regions were mostly from exons of non-transposon genes ( Figure 1E), the s-traits of different sizes were mainly derived from introns of non-transposon genes ( Figure 1I)

Expression correlations between sRNAs that originated from the same genes
To test whether s-traits originated from the same gene were transcribed together, the correlation coefficients between the expression values of s-traits originated from the same gene were calculated. A total of 3,739,873 correlation coefficients involving 15,078 genes and 96,914 s-traits were obtained, 2,593,503 (69.3%) and 507,528 (13.6%) of which were contributed by 2278 s-traits from genes LOC_Os03g01360 and 1008 s-traits from LOC_Os07g01240, respectively ( Figure 2). The correlation coefficients naturally fell into three classes: strong negative correlations, no correlation, and strong positive correlations with obvious dividing points at −0.3 and 0.3 (Figure 2A), which were far above 0.25, the threshold for statistical significance at p < 0.05 determined by simulation data. While 85.8% of these strong correlations (positive or negative) were contributed by s-traits originated from LOC_Os03g01360 ( Figure 2B), most of s-traits originated from gene LOC_Os07g01240 showed low (or no) correlations with each other ( Figure 2C). Although the expression values of s-traits originated from the same regions of the remaining 15,076 genes in the whole genome were mostly slightly positively correlated, the correlation coefficients were significantly distinct from that of the simulation of random data (Mann-Whitney U test, p-value <2.2e-16, Figure 2D), indicating various degrees of correlations. We found that the correlations were affected by the sizes of s-traits (Figure 2-figure supplement 1). Expression correlations between s-traits of the same size were stronger than that between s-traits of different sizes. In particular, the expression correlations between 21-nt sRNAs and 25-nt sRNAs from the same genes were very weak. The correlations between overlapped s-traits were slightly stronger than that between s-traits not overlapping with each other (  within each part were positively correlated with each other, while s-traits from the first part were mostly negatively correlated with those from the other two parts. The correlations between the expression levels of s-traits from the second parts and s-traits from the third parts were mostly positive. Thus, although very strong positive and negative correlations were found between sRNAs from the same genic regions of specific genes, this was not a common phenomenon. Taken together, this correlation analysis indicated that most of the sRNAs derived from the same genes were correlated with each other to various extents, implying that nearby sRNAs were possibly jointly transcribed. Consecutively overlapped sRNAs were found from introns of specific genes and intergenic regions, indicating that sRNAs were not necessarily jointly transcribed from the genes. Expression correlations between sRNAs that originated from the same sRNA cluster As many sRNAs may be derived from a single primary transcript, we defined sRNA clusters to quantify sRNA expression (Castel and Martienssen, 2013), based on our data of all 104 libraries. An sRNA island was defined as a genomic region composed of consecutive genomic positions matched by at least 30× sRNA read coverage. Nearby sRNA islands resided within 1000 nt from each other were merged and regarded as the same sRNA cluster. As a result, 80,362 sRNA clusters were recovered with an average size of 1927 bp (ranging from 51 bp to 47,021 bp) ( Figure 2-figure supplement 4, Supplementary file 4 in Dryad [Wang et al., 2015]). A total of 46,293 (57.6%) sRNA clusters overlapped with genic regions, 23,143 (28.8%) of which were entirely derived from genic regions. In addition, the distances between another 19,164 (23.8%) sRNA clusters and the genic regions were smaller than 1 kb. The abundance of each sRNA cluster was normalized using DEseq ('Materials and methods') (Anders and Huber, 2010). An sRNA cluster with normalized expression value ≥6 was regarded as expressed, and an sRNA cluster expressed in more than 25 of all the 98 IMF2s was regarded as an sRNA cluster expression trait (sc-trait). A total of 50,139 (62.4%) sRNA clusters were determined as sctraits, 34,762 of which were expressed in all of the 98 IMF2s.
We next calculated the correlation coefficients between the expression values of s-traits originating from the same sRNA clusters to infer whether they were transcribed together. A total of 4,282,587 correlation coefficients involving 21,571 sRNA clusters and 157,994 s-traits were obtained, 72.5% of which were contributed by s-traits from clusters chr03-261504-273325 and chr07-136331-141317 (Figure 2-figure supplement 5). These two sRNA clusters corresponded to the two genes, LOC_Os03g01360 and LOC_Os07g01240, mentioned above. Of all the correlations, 2,776,296 were relatively strong (above 0.3 or below −0.3 based on a threshold of 0.25 determined by simulation data with p <0.05). However, 81.5% of these strong correlations were contributed by the 2280 s-traits from cluster chr03-261504-273325 ( Thus, again very strong positive and negative expression correlations were only observed for sRNAs from certain sRNA clusters, while most sRNAs originating from the same sRNA clusters were only slightly positively correlated with each other. Expression correlations between sRNAs and their mother genes mRNA sequencing was conducted using the same samples as in the sRNA sequencing with one biological replicate for the IMF2 population and two biological replicates for each of the two parental lines and their F 1 hybrid, producing a total of 104 libraries. After removal of low-quality sequencing data, reads were mapped to the Nipponbare reference genome using TopHat ( The abundance of each mRNA in a library was normalized to the number of fragments per kilobase of transcript per million mapped fragments (FPKM) using Cufflinks (Trapnell et al., 2010) to quantify the expression levels of mRNAs in each library. An mRNA with FPKM value ≥1 was regarded as expressed; 35,819 (54.2%) mRNAs were expressed in none of the 98 IMF2s, while 16,849 (25.5%) were expressed in all the 98 IMF2s. An mRNA identified as expressed in more than 25 of the 98 IMF2s was regarded as an e-trait. In this way, a total of 24,987 e-traits were obtained from a total of 66,123 mRNAs in the whole genome.
For ease of description, we hereafter refer to genic sequences including both exons and introns that encoded the s-traits as 'mother genes'. To assess whether sRNAs and their mother genes were transcribed together, the correlation coefficients of the expression values of s-traits and their mother genes were calculated. A total of 77,640 correlation coefficients were obtained, involving 9968 mRNAs and 56,185 s-traits. Among all these correlations, 13,749 (17.7%) were strong correlations (above 0.3 or below −0.3, based on a threshold of 0.23 determined by simulation data with p < 0.05). However, 45.6% of these strong correlations were again contributed by gene LOC_Os03g01360 ( Figure 3A). Correlations involving LOC_Os07g01240 were mostly weak ( Figure 3B). The correlation coefficients involving rest of the genes in the whole genome were centered around zero ( Figure 3C).
In-depth analysis of s-traits from gene LOC_Os03g01360 revealed that the expression values of s-traits from the first part of LOC_Os03g01360 were positively correlated with two transcripts, LOC_Os03g01360.1 and LOC_Os03g01360.4, and negatively correlated with the other LOC_Os03g01360.2 ( Figure 3D). On the other hand, the expression values of most s-traits from the third part of LOC_Os03g01360 were negatively correlated with LOC_Os03g01360.1 and LOC_Os03g01360.4 but positively correlated with LOC_Os03g01360.2 ( Figure 3D). Whereas both negative and positive expression correlations were found between s-traits from the second part of LOC_Os03g01360 and these three transcripts ( Figure 3D).
Thus, although strong expression correlations, either positive or negative, were observed between specific sRNAs and their mother gene, the expression levels of most sRNAs were not correlated with their mother genes, suggesting independent transcription of sRNAs and mRNAs.
QTL analysis of s-traits, sc-traits, and e-traits using the IMF2 population  [Wang et al., 2015], 'Materials and methods') Yu et al., 2011), using composite interval mapping (CIM) in R/qtl with 1000 permutations (Haley and Knott, 1992;Broman and Speed, 2002;Manichaikul et al., 2009). With a false discovery rate (FDR) set at 5%, a total of 70,858 sQTLs were recovered for 66,649 s-traits (Supplementary file 7 in Dryad [Wang et al., 2015]). These sQTLs were classified as local-sQTLs and distant-sQTLs, which were also referred to as cisand trans-QTLs, respectively, in previous studies (Wang et al., , 2014, according to their locations relative to the s-traits. A local-sQTL indicated the existence of local functional polymorphism(s) that could influence the abundance of the s-trait, and a distant-sQTL meant the s-trait expression variation in the population was controlled by regulatory element(s) distant from the s-trait precursor sequence. For defining a local-sQTL, we adopted a 1.5 LOD-drop support interval of the corresponding sQTL or no more than 250 kb from the closest marker in a recombination sparse region. Otherwise, it was regarded as a distant-sQTL (Yvert et al., 2003;Morley et al., 2004;Keurentjes et al., 2007). Such classification resulted in 40,049 local-sQTLs (shown in the diagonal of Figure 4) and 30,809 distant-sQTLs (off-diagonal of Figure 4) (Supplementary file 7 in Dryad [Wang et al., 2015]). Expression variations of s-traits explained by local-sQTLs were substantially higher than that by distant-sQTLs ( Figure 4-figure supplement 3).
The LOD values for 80.4% of the local-sQTLs and 34.5% of distant-sQTLs were larger than 10.
Next, we performed QTL analysis for 50,139 sc-traits based on the same genetic map as in sQTL analysis using CIM in R/qtl with 1000 permutations. A total of 22,263 scQTLs including 11,476 local-scQTLs and 10,787 distant-sQTLs were recovered for 20,049 sc-traits utilizing the same definitions of localand distant-QTLs We also performed QTL analysis for the 24,987 e-traits based on the same genetic map, using the same algorithm and parameters as in the QTL analysis of s-traits and sc-traits. With an FDR set at 5%, a total of 6423 eQTLs were recovered for 6123 e-traits (Supplementary file 7 in Dryad [Wang et al., 2015]). These eQTLs were classified as local-eQTLs and distant-eQTLs according to the same rule applied in the sQTL analysis. As a result, 2964 local-eQTLs (shown in the diagonal of   probably due to the consecutive transcription of sRNAs and sRNA clusters, which was distinct from the transcription of mRNAs. In the extreme case, a total of 1660 s-traits were regulated by a single local-sQTL (Bin358) on chromosome 3 (Supplementary file 8 in Dryad [Wang et al., 2015]). Each of another five local-sQTLs (Bin359, Bin1183, Bin969, Bin286, and Bin166) explained the expression variations of more than 200 s-traits. The local-scQTL (Bin1183) with the largest effect regulated the expression of 93 sc-traits. Each of another 29 local-scQTLs was responsible for the expression variations of more than 30 sc-traits. In comparison, a total of 24 e-traits were regulated by Bin3, which is the local-eQTL explaining the expression variations of the largest number of e-traits. Distant-QTLs regulating many traits across all chromosomes were very common for s-traits and sc-traits but relatively rare for e-traits. A total of 15 bins were identified as distant-sQTLs, each of which regulated more than 200 s-traits; in one case, 514 s-traits were regulated by a single distant-sQTL (Bin1209). We observed three distant-scQTLs (Bin634, Bin635, and Bin636), each of which regulated more than 100 sc-traits, and another 21 distant-scQTLs, each of which explained the expression variations of more than 30 sc-traits. Whereas in case of mRNAs, only 9 bins were each responsible for the expression of more than 30 e-traits.
We further inspected the QTL results of 22,142 s-traits expressed (RPM >0) in only one of the two parents. In all, 11,194 and 10,948 s-traits were expressed only in the genome of Zhenshan 97 and Minghui 63, respectively. A total of 15,037 QTLs, 11,781 of which were local-sQTLs, were obtained for 14,139 of these s-traits. This proportion was significantly higher than the whole genome average level (Fisher's exact test, p value <2.2e-16), implying that the expression variations of these s-traits were mostly controlled by local polymorphisms. Examples of s-traits expressed in only one of the two parents were shown in Figure 4-figure supplement 8.

Identification of QTL hotspots
We next identified bins enriched with traits or QTLs based on the relative densities of the distributions in the genome ('Materials and methods'). 40 bins were identified as hotspots for s-traits (Supplementary file 8 in Dryad [Wang et al., 2015], Figure 5). A total of 132 and 129 bins were identified as local-sQTL and distant-sQTL hotspots, respectively (Supplementary file 8 in Dryad [Wang et al., 2015], Figure 5). 21 bins were both s-trait and local-sQTL hotspots, while 64 bins were both local-sQTL and distant-sQTL hotspots (Supplementary file 8 in Dryad [Wang et al., 2015], Figure 5); 123 were local-scQTL hotspots and 149 distant-scQTL hotspots (Supplementary file 8 in Dryad [Wang et al., 2015]). In all, 76 bins were local-QTL hotspots and 80 bins were distant-QTL hotspots for both s-traits and sc-traits. On the other hand, 116 and 126 bins were identified as localand distant-eQTL hotspots. The identified hotspots of sQTLs and scQTLs were quite similar but distinct from that of eQTLs ( Figure 5). Totally, 33 bins were local-QTL hotspots and 36 bins were distant-QTL hotspots for both s-traits and e-traits. Consecutive distant-QTL hotspots in adjacent genomic regions could explain the expression variations of a large number of s-traits, sc-traits, or e-traits (Supplementary file 8 in Dryad [Wang et al., 2015]). Four consecutive distant-QTL hotspots on chromosome 4, 5, 6, and 9 were identified for both s-traits and sc-traits ( Figure 5A,B). The genomic positions of consecutive distant-eQTLs hotspots were quite distinct from that of sQTL or scQTLs ( Figure 5). A cluster of the consecutive distant-eQTL hotspots (chr04: 1.713 Mb-2.495 Mb) was located close to one end of chromosome 4, while another cluster of consecutive distant-QTL hotspots (chr04: 21.997 Mb-23.342 Mb) for both s-traits and sc-traits was about 10 Mb distant from the other end of chromosome 4. A cluster of consecutive distant-eQTL hotspots (chr07: 28.058 Mb-28.557 Mb) was located close to one end of chromosome 7, while no apparent consecutive distant-QTL hotspots for s-traits and sctraits were on chromosome 7. A total of 12 bins (Bin1199-Bin1210) on chromosome 9 were identified as consecutive distant-QTL hotspots for both s-traits and sc-traits but not for mRNA e-traits. Only the consecutive distant-QTL hotspot on chromosome 6 (chr06: 2.847 Mb-4.023 Mb) was responsible for the expression variations of s-traits, sc-traits, and e-traits.     The s-trait hotspot with the highest s-trait density was Bin969 (chr07: 0.121 Mb-0.155 Mb), holding 1025 s-traits, of which 66.6% obtained at least one sQTL ( Figure 5A, Supplementary file 7 in Dryad [Wang et al., 2015]). Bin969 was also a local-sQTL hotspot with the third highest local-sQTL density. The local-sQTL hotspot with the highest local-sQTL density was Bin358 (chr03: 0.266 Mb-0.368 Mb), which held 2338 s-traits, and an overwhelming majority of these s-traits (2245) had at least one sQTL ( Figure 5A, Supplementary file 8 in Dryad [Wang et al., 2015]). sQTL hotspots and sRNA biogenesis genes sRNAs are usually generated by activities of Dicer-like proteins, Argonautes, and RNA-dependent RNA polymerases (RDRs) (Kapoor et al., 2008;Axtell, 2013). In rice, there are 43 transcripts of 32 genes for sRNA biogenesis including eight encoding Dicer-like proteins (OsDCLs), 19 Argonautes (OsAGOs), and 5 OsRDRs (Supplementary file 9 in Dryad [Wang et al., 2015]) (Kapoor et al., 2008). In this study, 26 transcripts for 20 genes were detected (FPKM ≥1) in at least one of the parents. Six transcripts of five genes (OsDCL2a, OsDCL2b, OsAGO1d, OsAGO13, and OsAGO18) were significantly differentially expressed between Zhenshan 97 and Minghui 63 (Supplementary file 9 in Dryad [Wang et al., 2015]). Comparison of DNA sequences between Zhenshan 97 and Minghui 63 revealed that 23 of the 32 sRNA biogenesis genes had polymorphisms of either SNPs or indels ( Figure 5-figure supplement 1, Supplementary file 9 in Dryad [Wang et al., 2015]).
The expression correlations between all the e-traits in the distant-sQTL hotspots (Bin632-Bin637) on chromosome 4 and all the s-traits whose expression variations were controlled by QTLs in these bins were calculated. Most of these correlations including those between the e-traits of OsRDR2 and all the s-traits were weak ( Figure 5-figure supplement 2A,B). For s-traits regulated by QTLs in the consecutive distant-sQTL hotspots (Bin1199-Bin1210) on chromosome 9 ( Figure 5A), most of the correlations between the s-traits and the e-traits in these bins were weak, while the correlations between s-traits and the e-traits of OsDCL2b were relatively strong (Figure 5-figure supplement 2C). About 79.7%, 17.4%, and 1.2% of the s-traits whose expression levels were regulated by the consecutive distant-sQTL hotspots harboring OsRDR2 on chromosome 4 were 24 nt, 23 nt, and 22 nt, respectively. On the other hand, 51.0% of the s-traits regulated by the consecutive distant-sQTL hotspots harboring OsDCL2b on chromosome 9 were 22 nt while 35.8% were 24 nt. For s-traits regulated by distant-QTLs in Bin448-Bin453 (containing OsDCL2a), 40.8% of them were 21 nt and 24.4% were 22 nt and 24.1% were 24 nt.
Five local-eQTLs and six distant-eQTLs for nine transcripts of these sRNA biogenesis genes were identified (Supplementary file 10 in Dryad [Wang et al., 2015]). We also compared the genomic positions of these eQTLs with the distant-sQTL hotspots. A distant-eQTL explaining the expression variation of OsDCL2a was located in Bin1207, which was in the consecutive distant-sQTL hotspots (Bin1199-Bin1210) on chromosome 9 (Supplementary file 10 in Dryad [Wang et al., 2015]). The expression correlations between s-traits regulated by QTLs in these bins and the e-traits of OsDCL2a were mostly strong ( Figure 5-figure supplement 2D). The functional annotations and DNA polymorphisms between the two parents for all the genes in the distant-sQTL hotspots harboring OsRDR2, OsDCL2b, and OsDCL2a were listed in Supplementary file 11 in Dryad (Wang et al., 2015).
The distant-sQTL hotspots on chromosome 5 (Bin770-Bin777) and 6 (Bin850-Bin860) were also surveyed. Approximately 98.4% and 90.5% of all the s-traits regulated by the distant-sQTL hotspots on chromosome 5 and chromosome 6 were 24 nt. The functional annotations of genes in the distant-sQTL hotspots on chromosome 5 were investigated, and no apparent relationship to the biogenesis of sRNA was found, implicating the existence of unknown mechanisms of sRNA biogenesis and/or regulation (Supplementary file 12 in Dryad [Wang et al., 2015]). A gene annotated as RNA Pol II subunit Rpb7 was found in the distant-sQTL hotspots on chromosome 6 (Supplementary file 12 in Dryad [Wang et al., 2015]). Study in yeast reveals that Rpb7 has a specific role in pre-siRNA transcription (Djupedal et al., 2005).

Genetic co-regulation of s-traits originating from the same mother genes
To test whether the expression of s-traits originating from the same mother gene was regulated by the same genetic mechanism, the genomic positions of QTLs for s-traits originated from the same mother gene were compared with each other. S-traits originated from LOC_Os03g01360 and LOC_Os07g01240 were not included in this calculation. We calculated correlations for the following comparisons: 97,040 pairs of s-traits that were both controlled by local-sQTLs ( Figure 6A), 36,123 pairs of s-traits that were both controlled by distant-sQTLs ( Figure 6B), and 51,081 pairs of s-traits in which one was controlled by local-sQTL and the other one by distant-sQTL ( Figure 6C). For a pair of s-traits originated from the same gene controlled by local-sQTL, the correlation coefficients between them were mostly strong ( Figure 6A). The correlation coefficients between s-traits controlled by the same or nearby distant-sQTLs were also strong ( Figure 6B), while the correlation coefficients between s-traits controlled by distant-sQTLs distant from each other were relatively weak ( Figure 6B), which represented Figure 6. Genetic co-regulation between s-traits originating from the same mother genes. (A) Expression correlations between pairs of s-traits originating from the same mother genes and both regulated by local-sQTLs. (B) Expression correlations between pairs of s-traits originating from the same mother genes and both regulated by distant-sQTLs. Green: distance between the two distant-sQTLs was within two 2 Mb; red: distance between the distant-sQTLs was larger than 2 Mb. (C) Expression correlations between pairs of s-traits from the same mother genes with one regulated by distant-sQTL and another one regulated by local-sQTL. (D) Expression correlations between pairs of s-traits from the whole genome regulated by distant-QTLs located in ≤2 Mb regions. DOI: 10.7554/eLife.03913.031 the majority (75.0%) of the s-traits pairs. Most of the correlations between the s-traits of localand distant-regulation were also quite high ( Figure 6C). For comparison, we also calculated correlations for all pairs of s-traits from the entire genome not necessarily the same genes that were regulated by the same sQTLs; their expression correlations were also mostly strong, either positive or negative ( Figure 6D), indicating that the strong correlations between the s-traits had common genetic basis.

Genetic co-regulation of s-traits and their mother genes
To infer the commonality of the genetic controls of the expression variations for sRNAs and their mother genes, we compared the map positions of the QTLs for sRNAs and their mother genes. Again s-traits and e-traits originating from LOC_Os03g01360 and LOC_Os07g01240 were not included in this calculation. We calculated correlations for the following comparisons: 4018 s-traits and 1084 corresponding mother genes that were both controlled by local-QTLs ( Figure 7A), 1721 s-traits and 695 mother genes both regulated by distant-QTLs ( Figure 7B), 1840 s-traits controlled by local-sQTLs and 544 corresponding mother genes regulated by distant-eQTLs ( Figure 7C), and 1110 s-traits and 582 mother genes controlled by distant-sQTLs and local-eQTLs, respectively ( Figure 7D). The expression correlations between s-traits and their mother genes that were both controlled by local-QTLs or the same and/or nearby distant-QTLs were mostly strong ( Figure 7A,B). On the other hand, the correlations between s-traits and their mother genes that were regulated by different/distant-QTLs were relatively weak ( Figure 7B-D). For all the pairs of e-traits and s-traits regulated by the same QTL position with the s-traits not necessarily from the genomic region of the e-traits, their expression correlations were mostly strong ( Figure 7E), again indicating a common genetic basis.

The genetic effects of the sQTLs and scQTLs
There were three genotypes at each polymorphic locus in the IMF2 population: AA, Aa, and aa, where A represents the allele from one parent and a is the allele from the other parent. This allowed Figure 7. Genetic co-regulation between s-traits and their mother genes. (A) Expression correlations between s-traits and their mother genes that were both controlled by local-QTLs. (B) Expression correlations between s-traits regulated by distant-sQTLs and their mother genes also controlled by distant-eQTLs. Green, the distance between distant-QTLs for s-traits and the mother genes is within 2 Mb. Red, the distance between distant-QTLs is larger than 2 Mb. (C) Expression correlations between s-traits regulated by local-sQTLs and their mother genes controlled by distant-eQTLs. (D) Expression correlations between s-traits regulated by distant-sQTLs and their mother genes controlled by local-eQTLs. (E) Expression correlations between s-traits and e-traits from the whole genome regulated by distant-QTLs located in ≤2 Mb. DOI: 10.7554/eLife.03913.032 estimating of additive and dominant genetic effects for each of the sQTLs. Additive effect for sQTL was half of the sRNA expression difference between the two homozygotes, and dominant effect was the difference between the heterozygote and the average of the two homozygotes.
Of all the s-traits analyzed, s-traits regulated by 35,235 (49.7%) of the sQTLs exhibited higher expression levels in the Zhenshan 97 genotype than Minghui 63 genotype (Supplementary file 13 in Dryad [Wang et al., 2015]). We identified sQTLs with significant dominance effects using h-test (1000 permutations, p < 0.05) (Huang et al., 2006;Zhou et al., 2012). The analysis detected 23,018 sQTLs (32.5%) with significant dominant effects, of which the majority (71.8%) showed negative dominance such that the levels of the sRNAs were lower in heterozygotes than the mid-parent values (Supplementary file 13 in Dryad [Wang et al., 2015]). Moreover, 3842 (5.4%) sQTLs exhibited negative overdominance, in which sRNA levels of the heterozygotes were lower than both homozygotes. Approximately 92.4% of the sQTLs showing significant overdominance effects, either positive or negative, were distant-sQTLs.

A closer look of LOC_Os03g01360 and LOC_Os07g01240
We took a closer look at the two loci that contributed extremely high numbers of sRNAs. Approximately 97.4% (2278) of the s-traits produced by Bin358 were encoded by the 2-kb upstream and genic region of the locus LOC_Os03g01360, where the sRNA sequences showed extensive and consecutive overlapping ( Figure 8A). LOC_Os03g01360 did not have homologs in Arabidopsis but had a homolog in maize with unknown function (http://rice.plantbiology.msu.edu/cgi-bin/ ORF_infopage.cgi?orf=LOC_Os03g01360). Three isoforms of this gene were significantly differentially expressed as measured in FPKM between Zhenshan 97 and Minghui 63 (Figure 8-figure  supplement 1A). The sRNAs of Zhenshan 97 were mainly derived from the second and the third part of LOC_Os03g01360, while sRNAs of Minghui 63 originated from all the three parts with the expression level from the third part lower than that of sRNAs of Zhenshan 97 ( Figure 8A). The expression levels of LOC_Os03g01360.1 and LOC_Os03g01360.4 in the genome of Minghui 63 were higher than Zhenshan 97 (Figure 8-figure supplement 1A). On the contrary, LOC_Os03g01360.2 was higher expressed in Zhenshan 97 than in Minghui 63. The expression of all the three transcripts in the hybrid was down-regulated compared with the mid-parent value (Figure 8-figure supplement  1A). We checked the cytosine methylation levels in this region, and the genomic DNA showed high cytosine methylation, especially from the start to the third exon ( Figure 8A, 'Materials and methods'). This high methylation might be due to the enrichment of sRNAs resulting in RNA-directed DNA methylation. In the differentially expressed region of sRNAs between Zhenshan 97 and Minghui 63, such as the third exon according to the annotated gene model, Minghui 63 had lower methylation and higher mRNA transcript level than Zhenshan 97, suggesting a negative correlation between the transcript level and methylation. The expression variation of three isoforms of LOC_Os03g01360 in the IMF2 population was controlled by local-eQTLs (Supplementary file 7 in Dryad [Wang et al., 2015]). At least one sQTL each was detected for 2200 of the 2278 s-traits located in the gene LOC_Os03g01360 region, of which 2181 (84.5%) were local-sQTLs (Supplementary file 7 in Dryad [Wang et al., 2015]). The expression variation of different transcripts of LOC_Os03g01360 probably corresponded to sRNAs from specific parts of the genic region and is regulated by the same genetic factor. The majority of distant-sQTLs regulating the expression of s-traits originating from LOC_Os03g01360 was for 21 and 22 nt s-traits and was attributed to Bin442-Bin453 on chromosome 3.
The sRNAs in Bin969 were almost entirely derived from the intron of LOC_Os07g01240, of which 617 (61.2%) were 21-nt and 22-nt sRNAs and 280 (27.8%) were 24-nt sRNAs ( Figure 8B). LOC_Os07g01240 had a homolog in Arabidopsis with unknown function and in Maize annotated as uncharacterized GPI-anchored protein (http://rice.plantbiology.msu.edu/cgi-bin/ORF_infopage.cgi? orf=LOC_Os07g01240) and was reported to modulate rice leaf rolling by regulating the formation of bulliform cells (Xiang et al., 2012). In all, 677 s-traits mapped 841 sQTLs, of which 514 were local-sQTLs (Supplementary file 7 in Dryad [Wang et al., 2015]). However, no significant expression difference of the two isoforms of LOC_Os07g01240 was detected between the parents  Figure 8. continued on next page ( Figure 8-figure supplement 1B), and thus no eQTL for the e-traits was detected. The majority of the 327 distant-sQTLs was for 21 and 22 nt s-traits and was again attributed to Bin442-Bin453 on chromosome 3. Some of the distant-sQTLs for 24 nt s-traits were attributed to Bin770-Bin775 on chromosome 5. The expression correlation between e-traits of LOC_Os07g01240 and s-traits originating from LOC_Os07g01240 was weak ( Figure 3B). This result was similar to that of Tong et al. (2013) who showed that intronic mRNAs had a distinct expression pattern from their host genes, suggesting that intronic sRNAs might have genetic regulatory mechanisms independent of their mother genes.

Discussion
The variation of sRNA abundance in a population depends on a range of factors: the levels of transcription of the precursors that is influenced by local elements mostly residing in the promoter regions of the sRNA itself; the mother gene or long non-coding RNAs where the sRNA is produced; polymorphisms in sequence influencing their biogenesis; and variation in distant regulatory factors located distantly from where the sRNA is generated. Our results indicated that almost one third of the detected local-sQTLs exhibited much larger effects than distant-sQTLs, consistent with the studies for eQTL of mRNA expression (Kliebenstein, 2009), suggesting that local elements are a major regulatory factor for sRNA variation. Local-QTLs regulating the expression of large numbers of traits were observed for s-traits but not for e-traits, resulting from the enrichment of s-traits in the region produced by the extensive and consecutive overlapping transcripts from the mother genes. Hotspots were found for both distant-QTLs of mRNA e-traits and sRNA s-traits, both of which could influence the expression variations of a large number of traits.
Our results showed that quantitative variation for a large portion of sRNAs transcribed from genic regions did not share the genetic control of the expression with the corresponding mother genes, although by common sense those sRNAs have to be transcribed along and regulated by the promoters of the mother genes. Similar results were reported in previous studies of miRNAs and their host genes, which were regulated or processed independently from their respective regulatory elements (Siddle et al., 2014). The results of Monteys et al. (2010) and Ozsolak et al. (2008) suggested that the transcription of sRNAs may also be regulated by their own promoters, independent of the mother genes. The widespread occurrence of such independent genetic controls between the sRNAs and the mother genes suggested that independent transcription of the sRNAs might be a common phenomenon, rather than only special circumstances. Moreover, the large number of distant-sQTLs that were not collocated with the distant-eQTLs for the corresponding mother genes supported the independent regulatory basis of the transcription of the sRNAs.
Our results showed that some of the sRNA biogenesis genes such as DCLs, AGOs, and RDRs were probably responsible for the quantitative variation of a large number of sRNAs. OsRDR2 was found in a region of consecutive distant-sQTL hotspots explaining the expression variation of many sRNAs, most of which were 24 nt. This was also a region of consecutive distant-scQTL hotspots with the highest number of distant-scQTLs. OsDCL2b was also found in consecutive distant-sQTL hotspots regulating the expression of sRNAs, most of which were 22 and 24 nt. Although OsDCL2a and OsDCL1c were not in distant-sQTL hotspots due to lack of local recombination in that region, they were also in bins regulating the expression of a large number of s-traits. This is in accordance with the reports that RDRs function on the upstream of DCLs in the process of sRNA biogenesis (Chapman and Carrington, 2007). Studies in sRNA pathways in Arabidopsis showed that DCL2 is responsible for the synthesis of 22 nt or 24 nt siRNAs, while RDR2 functions in the production of endogenous 24 nt siRNAs and the conversion of ssRNA template into dsRNAs that serve as substrates for DCLs (Arikit et al., 2013), which is in good agreement with the sQTLs found in this study. However, it should also be noted that these genes were associated with s-QTLs for only a small portions of the s-traits, while the quantitative variation of the sRNA abundance for majority of the s-traits was independent of the sRNA biogenesis genes.
One of the most interesting finding perhaps concerns the sRNAs from the two loci, LOC_Os03g01360 and LOC_Os07g01240, and their regulation patterns. Although both loci produced thousands of sRNAs, which were the most numerous in the genome, they showed sharp contrast in where the sRNAs were generated and the regulatory mechanisms with which the sRNAs were produced. sRNAs from LOC_Os03g01360 mostly originated from the 2-kb upstream and genic regions and were tightly co-regulated with each other and also with the mother gene. By contrast, sRNAs from LOC_Os07g01240 were mostly produced in the intronic region and loosely co-regulated with each other and not co-regulated with the mother gene; in fact, the s-trait variation and sQTLs were detected even without the expression variation of the mother gene. Since neither of the genes has been previously identified as related to the production and function of sRNAs, their roles in sRNA biogenesis warrant further investigation.
Another noticeable finding is the widespread negative dominance of sRNA levels detected in the IMF2 population such that heterozygotes had lower level of the sRNAs than the means of the two homozygotes (negative partial dominance) or the lower homozygote (negative overdominance). Such negative dominance was also observed in comparative sRNA profiling of hybrids relative to the parents in crosses of rice and Arabidopsis (Groszmann et al., 2011b(Groszmann et al., , 2013He et al., 2013), revealing a predominant-negative regulation of siRNA expression in hybrids (He et al., 2010;Chodavarapu et al., 2012). Our genetic analysis revealed that the sQTLs showing negative dominance of the sRNA levels are mostly distant-sQTLs, indicating regulation by distant elements likely at the transcriptional level. Studies of transcript levels of genes (mRNAs) in the hybrid against the parents in the same rice cross also revealed more negative dominance than positive dominance, also indicating down-regulation in the hybrid relative to the parents (Huang et al., 2006). How the downregulation of these two classes of transcripts was related to each other and how the two classes of down-regulation are related to the hybrid performance present great challenges for future studies.

Plant materials and growth conditions
The plant materials consisted of an IMF2 population of 98 hybrids produced by paired crosses of 196 RILs (Supplementary file 14 in Dryad [Wang et al., 2015]), the two parental lines Zhenshan 97 and Minghui 63 and their hybrid. The plants were grown under normal agricultural conditions at experimental farm of Huazhong Agricultural University in the rice-growing season (May to September) in Wuhan, China. A flag leaf at the day of full expansion from each of three random plants per replicate was harvested between 17:00 and 18:00 for library construction.

Library construction and sequencing
Total RNA was isolated from the leaf tissue using TRIzol reagent (Invitrogen, Waltham, Massachusetts), the process to convert total RNA into template suitable for high throughput DNA sequencing of mRNA-seq and sRNA-seq libraries using Sample Preparation Kit (Illumina, San Diego, California) followed the manufacturer's guide. Bisulfite sequencing (WGBS) libraries of the parents and the hybrid were made from genomic DNA isolated from the same leaf tissues as used for RNA-seq libraries. Sequencing was performed on Illumina HiSeq 2000 at BGI (http://www.genomics.cn/index, Shenzhen, China) (50 bp single end for mRNA-seq and 100 bp pair end for BS-seq).
For sRNA analysis, all filtered sRNA-seq reads from Zhenshan 97 libraries were specifically mapped to the SNPs-replaced Zhenshan 97 genome and reads from Minghui 63 libraries were mapped to SNPs-replaced Minghui 63 genome. The sRNAs from heterozygous materials of F 1 and IMF2 were simultaneously mapped to the Zhenshan 97-and Minghui 63-replaced genomes. Bowtie (Langmead et al., 2009) was used to align short reads to each genome at unique genome location with no mismatch allowed. The reads from heterozygous materials could be divided into three groups: reads specifically mapped to Zhenshan 97, reads only mapped to Minghui 63 genome, and reads mapped to both Zhenshan 97 and Minghui 63 at the same position. In the third class, the sequences of the sRNAs from a monomorphic site were indistinguishable between the parents and thus obtained one count in quantitation of the abundance, whereas the two counts of the sRNAs from a polymorphic site were put together.

Quantification of sRNA or sRNA cluster expression level
The expression level of an sRNA in a specific library was defined as the number of this sRNA divided by the total number (in millions) of genome-mapped sRNAs in this library, which was designated as 'RPM'. The R package DESeq (Anders and Huber, 2010) was used to quantify sRNA cluster expression level. The number of sRNA reads in each sRNA cluster for each sample was calculated and integrated as a count table, with each line representing an sRNA cluster and each column representing a sample. Then, the effective library size for each sample was estimated using the 'estimateSizeFactors' function in the DESeq package. Each column of the count table was divided by the corresponding library size to get the normalized read count, which was regarded as the expression level of the sRNA cluster.

Processing of mRNA sequencing data
The removal of poor-quality reads for mRNA-seq reads was done in the same way as sRNA analysis. The sequences from 104 libraries were mapped to the O. sativa ssp. japonica (cv. Nipponbare) version 7 reference genome using TopHat ) with default parameters. Cufflinks (Trapnell et al., 2010) were utilized to estimate gene expression levels according to the Nipponbare version 7 reference annotation.

The analysis of bisulfite sequencing
For bisulfite sequencing, trimmomatic (Lohse et al., 2012) was used to remove low-quality reads. Bismark (Krueger and Andrews, 2011) was performed to align bisulfite-treated reads to the SNPreplaced genomes, allowing no mismatch in the seed of 40 nucleotides and up to two good alignments. This bisulfate mapping tool aims to find unique alignment through running four alignment processes simultaneously (Krueger and Andrews, 2011). Then, the de-duplication tool provided by Bismark was applied to remove potential PCR duplicates. Methylation calls were extracted for every single cytosine analyzed depending on its context (CpG, CHG, or CHH).

QTL analysis
The ultrahigh-density bin map constructed by genotyping the RILs with population sequencing Yu et al., 2011) was used. The 1568 bin genotypes of each cross in the IMF2 population were deduced from the parental genotypes (Supplementary file 6 in Dryad [Wang et al., 2015], Figure 4-figure supplement 2). CIM in R/qtl (Haley and Knott, 1992;Broman and Speed, 2002;Manichaikul et al., 2009) was employed to map QTLs with 1000 permutations. Additive and dominant effects were decided by 'effectscan' function in R/qtl. Variation explained by the QTL was determined using the linear QTL model as described by Yu et al. (2011). The same genetic map and program parameters were used in the QTL analysis for s-traits, sc-traits, and e-traits.

Definition of QTL and trait hotspots
The density of s-trait and sQTL was defined as the number of s-traits and sQTLs in each bin divided by the bin size (Mb), respectively. The density of sc-traits, scQTLs, e-traits, and eQTLs was calculated in the same way. Bins with s-trait density larger than three times of the whole genome average level were defined as s-trait hotspots, while bins with sQTL density higher than six times of the whole genome average level were designated as sQTL hotspots. The process of identification of e-traits and eQTL hotspots was the same as that of s-traits and sQTLs. The definition of scQTL hotspots was identical to sQTL hotspots, while sc-traits hotspots were defined as bins with sc-traits density higher than the twice of the whole genome average. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Additional information
Author contributions JW, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article; WY, Analysis and interpretation of data, Drafting or revising the article; DZ, Acquisition of data; WX, Analysis and interpretation of data; QZ, Conception and design, Analysis and interpretation of data, Drafting or revising the article

Additional files
Major dataset The following dataset was generated: Author (