Discovering and Constructing ceRNA-miRNA-Target Gene Regulatory Networks during Anther Development in Maize

The “competing endogenous RNA (ceRNA) hypothesis” has recently been proposed for a new type of gene regulatory model in many organisms. Anther development is a crucial biological process in plant reproduction, and its gene regulatory network (GRN) has been gradually revealed during the past two decades. However, it is still unknown whether ceRNAs contribute to anther development and sexual reproduction in plants. We performed RNA and small RNA sequencing of anther tissues sampled at three developmental stages in two maize lines. A total of 28,233 stably transcribed loci, 61 known and 51 potentially novel microRNAs (miRNAs) were identified from the transcriptomes. Predicted ceRNAs and target genes were found to conserve in sequences of recognition sites where their corresponding miRNAs bound. We then reconstructed 79 ceRNA-miRNA-target gene regulatory networks consisting of 51 known miRNAs, 28 potentially novel miRNAs, 619 ceRNA-miRNA pairs, and 869 miRNA-target gene pairs. More than half of the regulation pairs showed significant negative correlations at transcriptional levels. Several well-studied miRNA-target gene pairs associated with plant flower development were located in some networks, including miR156-SPL, miR159-MYB, miR160-ARF, miR164-NAC, miR172-AP2, and miR319-TCP pairs. Six target genes in the networks were found to be orthologs of functionally confirmed genes participating in anther development in plants. Our results provide an insight that the ceRNA-miRNA-target gene regulatory networks likely contribute to anther development in maize. Further functional studies on a number of ceRNAs, miRNAs, and target genes will facilitate our deep understanding on mechanisms of anther development and sexual plants reproduction.


Introduction
Anther development is a crucial biological process for male fertility and sexual reproduction in plants. During the past decades, many genic male sterility (GMS) genes in this process and their underlying molecular mechanisms have been identified and studied in the model plant Arabidopsis as well as important crops such as rice, maize, wheat, and soybean [1][2][3][4]. In general, anther development is a complex process that encompasses many key stages from floral identity to the production of mature pollen grains, during which GMS genes involved in signal transduction, cell division, apoptosis, metabolism, and transport are temporally and spatially activated or repressed by finely-tuned regulatory pathways [1,[5][6][7]. All the GMS genes investigated previously, and their regulatory relationships, together comprise a gene regulatory network (GRN) of anther development that mainly consists of transcription factors (TFs) and their regulated target genes. In addition, miRNAs have also been reported to be involved in plant flowering and the subsequent anther developmental pathways by repressing the transcriptional or translational activities of target genes in GRNs [8][9][10]. Therefore, gene regulatory models in anther development mainly include two types: (1) trans-acting factors that interact with cis-regulatory elements, and (2) miRNA-mediated repression of target gene expression.
The intrinsic content of a GRN is the interactions among proteins, RNAs, and DNA motifs in the cell. Recently, the "ceRNA hypothesis" has been proposed as a novel type of gene regulatory model based on interactions between RNAs [11]. As endogenously transcribed RNAs, ceRNAs could function to absorb matching miRNAs like sponges (thus ceRNAs could be named as miRNA sponges as well) via miRNA response elements (MREs) and ultimately de-repress the transcriptional or/and translational limitations acting on miRNA target genes. A ceRNA MRE interacts with its corresponding mature miRNA sequence via an incompletely complementary manner, different from the interaction at target sites between miRNAs and their target genes. Initially, miRNA sponges were designed as artificial molecules to inhibit the functions of a corresponding miRNA family [12]. Meanwhile, an endogenous "target mimicry", originally annotated as a long noncoding RNA (lncRNA), IPS1, was first discovered and identified in Arabidopsis, and had an important function in regulating phosphate content by sponging miR-399 that controlled the target gene PHO2 [13]. This is a novel mode of post-transcriptional regulation, namely, miRNA (e.g., miR-399) transcription levels can be repressed by other RNA molecules (e.g., IPS1) and the expression of miRNA target gene (e.g., PHO2) is regulated. In subsequent studies, more ceRNAs originating from lncRNAs have been identified and found to be involved in the stress response and developmental processes in animals, and even play important roles in cancer [14][15][16].
Except lncRNAs, endogenous transcripts of pseudogenes, transposable elements (TEs), simple sequence repeats (SSRs), and circular RNAs (circRNAs), which are considered to be transcripts lacking protein-coding potential, have been reported to have ceRNA functions in vivo to control transcriptional levels of miRNAs and contribute to survival and development of diverse organisms [17][18][19][20]. Similarly, transcripts of protein-coding genes (coding genes) have also been found to have ceRNA functions [21]. In addition, a recent study from the evolutionary perspective supports the notion that ceRNA functions may be a common feature of pseudogenes [22]. These findings suggest that ceRNAs are important regulators in the normal developmental and stress-response processes of organisms. More importantly, it has been shown that nearly all types of relatively long transcripts in transcriptomes (e.g., pseudogenes, lncRNAs, TEs, and coding genes), but not small RNAs (rRNAs, tRNAs, snRNAs, and snoRNAs), have the possibility to function as ceRNAs [23][24][25].
Following the initial burst of ceRNA case studies, genome-wide identification and analysis of ceRNAs have been reported in plants and animals [26][27][28], and several databases of ceRNAs have been established based on experimental results and computational analyses [29][30][31]. These case studies and databases provide valuable information for revealing gene regulatory pathways relevant to ceRNA. Notably, a recent study on gastric cancer revealed a network-like ceRNA regulatory pathway including two ceRNAs, two miRNAs, and one target gene [16]. These results indicate that interactions among ceRNAs, their sponging miRNAs, and miRNA target genes in transcriptomes could comprise a ceRNA-miRNA-target gene regulatory network [23]. Several ceRNA-miRNA-target gene regulatory networks have been reconstructed in human cancers by using computational approaches, while bioinformatics-based studies on ceRNA profiles and their regulatory networks are still fewer in plants.
Several reported studies have described miRNA regulatory pathways in plant flower development or sexual reproduction [32][33][34][35]. Nevertheless, few studies have systematically introduced ceRNAs and their functions in the ceRNA-miRNA-target gene regulatory network controlling anther development. Although ceRNAs are novel players in post-transcriptional regulation, their potential roles in anther development are largely unknown. In this study, we identified and analyzed maize anther development-related ceRNAs by using RNA and small RNA transcriptome data in two maize inbred lines (lines L7 and L30) during three anther developmental stages (S8, S9, and S10). We then reconstructed ceRNA-miRNA-target gene regulatory networks in maize anther and found that many target genes and miRNA-target gene pairs in the networks participated in maize anther development based on the previous reports. These findings not only extend our understanding on maize anther GRNs by introducing ceRNA regulators, but also indicate that the ceRNA-miRNA-target gene regulatory networks may function in regulating anther development and sexual reproduction in plants.

The Landscape of Transcribed Loci in Maize Anther Development
To perform genome-wide identification of ceRNAs and investigate their potential functions in maize anther development, transcribed loci should first be identified in transcriptomes. We constructed and sequenced RNA-sequencing (RNA-seq) libraries and their corresponding small RNA-seq libraries by using anther tissues sampled at three developmental stages (S8, meiosis stage; S9, microspore stage; S10, vacuolated microspore stage) ( Figure 1). To identify stably transcribed loci and construct conserved gene regulatory networks in developing maize anther, transcriptomes of two maize lines with Oh43 genetic background, lines L7 (6 samples) and L30 (5 samples), were sequenced respectively and only their shared transcribed loci were used in the analyses (Table S1). When identifying transcribed loci in each line, we chose two thresholds including (1) the transcribed loci should have relatively high transcription levels, and (2) their transcripts should have intact sequences. By mapping the RNA-seq transcriptome data obtained above to the maize B73 reference genome sequence (version AGPv4), we identified 142,328 transcripts corresponding to 50,342 transcribed loci, with an average of 2.83 transcripts per transcribed locus. In the following analysis, the transcribed loci with transcriptional levels >0.5, namely, read counts per kilobase per million mapped reads (RPKM) >0.5, and transcribed lengths >200 nt, were used to avoid transcript noise in the RNA-seq data, which resulted in 33,347 and 31,455 transcribed loci being selected from lines L7 and L30, respectively ( Figure 1a). Based on gene models referring to public databases such as MaizeGDB, NCBI, and Ensembl, and computational annotation information produced in this study, the stably transcribed loci were annotated and divided into five types including coding genes, pseudogenes, lncRNAs, TEs, and other type (Figure 1b). Among these stably transcribed loci, coding genes were the predominant group (77.06-80.17%) when compared with the non-coding loci (pseudogenes, 1.10-1.20%, lncRNAs, 3.05-3.36%, TEs, 11.82-13.86%, and other type, 3.87-4.52%). To further identify the conserved and significant functional interactions among ceRNAs, miRNAs, and target genes, the transcribed loci shared between lines L7 and L30 were selected for the next step analysis. As a result, 8,336 transcribed loci identified in only one of the two maize lines were filtered out, and the retained 28,233 transcribed loci were considered to be stably expressed in the two maize lines during three anther developmental stages (Figure 1a and Table S2).
Among the 28,233 transcribed loci, 24,093 loci were annotated as coding genes and the remaining 4,140 loci were annotated as non-coding genes. For the 24,093 coding-gene loci, nearly 90% were also detected in two RNA-seq datasets of maize anther reported previously [36,37], which were reanalyzed in this study ( Figure 1c). Additionally, 69.83% and 82.64% of the coding-gene loci were repeatedly detected in the maize full-length cDNA (FlcDNA) and expressed sequence tag (EST) databases, respectively ( Figure 1c). The high overlap rates indicate that the coding-gene transcripts identified here are really stably transcribed and have intact sequences.  28,233 loci that overlapped between the two lines (marked in red) were used in this study. (b) Classification of the transcribed loci into five types. "Coding" represents coding genes, and "Pseudo" represents pseudogenes. (c) Confirmation of the coding and non-coding transcribed loci by using the published data and the public databases. Maize anther transcriptome datasets from two previously-published studies were reanalyzed here [36,37]. (d) Transcription expression patterns of transcribed loci used in this study. Anther developmental stages 8, 9, and 10 are represented by S8, S9, and S10, respectively. Transformed transcription levels >8 were designated as 8 to facilitate the comparisons of transcription expression patterns among the five types of transcribed loci. Correlation of the transcription expression levels between maize lines L7 and L30 at the three stages for each transcribed locus was estimated using Pearson's correlation coefficient (r). (e) Anther phenotypes (left) and transverse section analysis (right). We sequenced transcriptomes from maize anther at developmental stages of S8, S9, and S10, respectively. The developmental stages of anther samples were identified by transverse section analysis.
Among the 4,140 non-coding transcribed loci, the relatively low rates, 7.80% and 11.76%, were repeatedly detected in the other two RNA-seq datasets, respectively ( Figure 1c). The reason may be that the RNA-seq libraries for the other two datasets were constructed by using polyA-containing mRNAs which limited enrichment of non-coding transcripts [36,37]. In addition, 30.56% and 33.31% of the non-coding loci were repeatedly detected in the maize FlcDNA and EST databases respectively, while 85.26% of the non-coding loci were identified in the de novo assembled transcripts, which is similar to the overlap rate (97.08%) of the coding-gene loci (Figure 1c). Considering the majority (85.26%) of the non-coding transcribed loci and the moderate subsets (30.56% and 33.31%) confirmed in de novo assembled transcripts and the public databases respectively, the 4,140 non-coding loci can be used for ceRNA detection, and include 264 pseudogenes, 711 lncRNA loci, 2,390 TEs, and 775 unannotated RNA loci (Figure 1d and Table S2).
The transcriptional landscapes are diverse among the different types of transcribed loci in maize anthers (Figure 1d). In general, transcriptional levels of coding genes were higher than those of the non-coding loci. Notably, a larger proportion of pseudogenes were more highly transcribed than the other types of non-coding loci. Moreover, the expression patterns of a majority of transcribed loci were similar between lines L7 and L30 during the three developmental stages. On average, 61.55% of the transcribed loci showed positive correlations at transcriptional levels (Pearson's correlation coefficient, r > 0.3) between the two lines during the three stages, and 40.32% of the transcribed loci were highly associated (r > 0.8) (Figure 1d). This finding further reinforces our conclusion that the majority of the transcribed loci identified here is not only stably expressed, but also show similar expression patterns in the different lines.
Taken together, comparative analyses among the published transcriptomes, the public FlcDNA and EST databases, and the anther transcriptomes sequenced here suggest that the 28,233 transcribed loci identified in this study are creditable and can be used for the subsequent analysis.
2.2. The microRNA (miRNA) Profile in Maize Anther Development miRNAs, as key regulatory nodes in ceRNA-miRNA-target gene regulatory networks, are regulated by their corresponding ceRNAs, and further serve as the regulators of their target genes. In this study, small RNA-seq data were generated from the same samples as sequencing of the long RNAs (Table S1 and Figure S1). Among all identified miRNAs, 321 miRNA loci were annotated in the maize reference genome according to miRBase (release 22) and corresponded to 203 unique miRNA groups after merging the miRNA loci with the same mature sequences. We then selected and used miRNAs with read counts per million mapped reads (CPM) values >1 in at least one sample in each of the two lines. Based on this selection condition and the known miRNA annotation, 70 and 71 miRNA groups were identified in small RNA transcriptomes of lines L7 and L30 respectively, with the proportion of 35% of the known maize miRNA groups (70/203 and 71/203) (Figure 2a). The known miRNAs not identified here may be highly expressed at other anther developmental stages or in other maize tissues. Among these identified known miRNAs, 61 miRNA groups were shared between lines L7 and L30, and 47 were confirmed in both the two published maize anther small RNA-seq datasets [37,38], which were reanalyzed by using the same method used in this study (Figure 2a). This finding indicates that the known miRNA groups identified and selected here are stably transcribed during anther development in maize. Therefore, all the 61 known miRNA groups are creditable, and can be used in the following analysis (Table S3). The maize anther small RNA transcriptome data from two previous studies was reanalyzed here [37,38]. Sixty-one known miRNA groups and 51 potentially novel miRNA groups were used in the following analysis (marked in red). Transcription expression patterns of identified (c) 47 known and (d) 51 potentially novel miRNAs were categorized into four classes based on the reanalyzed results of Zhai et al.'s data with 10 anther developmental stage points [38]. Transcription expression levels of each miRNA group were normalized to Z-scores for data from Zhai et al., line L7 and line L30 separately.
Except for the known miRNA groups, we also identified potentially novel miRNA groups from the small RNA-seq data as determined by four thresholds including (1) a relatively high expression level (CPM > 1), (2) a meaningful secondary structure (p < 0.05), (3) shared between the two lines, and (4) further confirmed in the other two published datasets [37,38]. After being screened, 51 potentially novel miRNA groups were selected, confirmed, and used in the following analysis ( Figure 2b and Table S4). None of the 51 potentially novel miRNA groups were found to have the same mature sequence with plant known miRNAs (miRBase, release 22), indicating that the novel miRNA groups identified here may be maize anther-specific miRNAs.
Based on the small RNA dataset of maize anther published previously [38], the expression patterns of the 47 known miRNAs shared among the four miRNA datasets can be divided into four types including (1) eight up-regulated miRNAs, (2) 22 down-regulated miRNAs, (3) eight miRNAs with first up-regulation and then down-regulation, and (4) nine miRNAs with first down-regulation and then up-regulation ( Figure 2c). Similarly, the 51 potentially novel miRNAs can also be clustered into the same four types (Figure 2d).

Competing Endogenous RNAs (ceRNAs) and Target Genes of miRNAs
We predicted ceRNAs and target genes of the miRNAs selected above using computational approaches. Among the 28,233 transcribed loci, 583 (2.06%) and 857 (3.04%) loci were predicted as ceRNAs and target genes of miRNAs, respectively (Table S5). Thirty-one loci (0.11%) were predicted as both ceRNAs and target genes of miRNAs, which is significantly higher than the expected value (0.06% of 28,233 loci, 17 loci; one-tailed Fisher's exact test, p = 0.0297). The number of predicted ceRNAs (3.92) per known miRNA is significantly smaller than that (12.42) per potentially novel miRNA. Similarly, the average number of predicted target genes (7.18) per the known miRNA is relatively lower than that (10.87) per the potentially novel miRNA (Table S5).
Although we carefully filtered the ceRNAs and target genes by using well-developed and commonly used programs [23,[39][40][41], it is still uncertain whether these predicted pairs of ceRNA-miRNA and miRNA-target gene could actually interact with each other. In general, functional elements are often conserved in genomic sequences in different individuals or species because of their functional constraints. Some miRNAs as well as their target sites are conserved in sequences among closely related species, even across land plants [42,43]. Therefore, we propose that the MRE sequences of ceRNAs should be conserved if the ceRNAs computationally identified here have genuine functions in regulating their downstream miRNAs. To verify this hypothesis, we first identified 0.34 million high-quality single nucleotide polymorphism (SNP) sites by using the transcriptome data of the 11 anther-tissue samples in lines L7 and L30. We then investigated SNP densities both at and around the predicted MRE region where a known mature miRNA sequence and its corresponding ceRNA fragment bind or interact. As a result, for the known miRNAs, we found that the SNP density in the MRE region is obviously lower than that in the MRE flanking regions represented by both the upstream and downstream 60 nt sequences ( Figure 3a). In addition, the 5 -end miRNA mature sequence is found to be more important in recognizing the target gene or ceRNA when compared to the 3 -end miRNA mature sequence in plants [44,45]. Here, we found that MRE regions binding to the 5 -end sequence of known miRNAs (the 2nd to 7th nucleotides of mature miRNA, named as "5 -side" region in this study) has a much lower SNP density than the other investigated regions (Figure 3a), which is consistent with the functional significance of mature miRNA 5 -side region reported previously in plants [44,45]. The sequence conservation of MREs in predicted ceRNAs suggests that these transcripts most likely have ceRNA functions. On the other hand, the conservation of sequences was also observed at the target sites of known miRNAs predicted by the psRobot program. SNP density in the target regions interacting with the known miRNA 5 -side sequences is much lower than those in the flanking regions (Figure 3b), which is reliable evidence for the predicted target genes of known miRNAs. Moreover, the low SNP density at the target sites can also be observed in target genes predicted by the TAPIR program ( Figure 3c). Notably, for the potentially novel miRNAs, SNP densities at interacting sites of ceRNAs and target genes were similar to those in flanking regions (Figure 3d,e). This result is consistent with the previous reports, suggesting that poorly conserved or young miRNAs may be selectively neutral, with limited selection pressure on them [46]. There were four predicted target genes by using the psRobot program for potentially novel miRNAs, but no SNP site existed at the four target sites and their flanking regions. "MRE and target site" indicate binding regions recognized by mature miRNA sequences on predicted ceRNA and target genes respectively, and are marked in blue. "5 -side" represents binding regions recognized by the 2nd to 7th nucleotides from the 5 end of miRNA mature sequences (marked in red).
After investigating the functional characters of coding genes predicted as ceRNAs or target genes, we found that the proportion of coding gene-originated ceRNAs annotated as TF genes is significantly lower than that of coding gene-originated target genes annotated as TF genes (5.99% versus 13.44%; Fisher's exact test, p = 0.0001). This difference was more significant for known miRNAs (6.38% versus 21.24%, p = 5.48 × 10 -5 ), but was not significant for potentially novel miRNAs (5.75% versus 4.90%, p = 0.7051). This suggests that target genes likely tend to become key regulatory nodes in anther GRNs, especially for target genes of known miRNAs. The difference in the relative numbers of TFs between ceRNAs and target genes is also reflected in the biological function enrichment results, in which ceRNAs are mainly involved in metabolic and protein modification processes, while target genes are functionally enriched in flower development and cellular regulation processes ( Figure S2). These results indicate that coding gene-originated ceRNAs and target genes may represent two groups of genes with different functions. In other words, coding gene-originated ceRNAs tend to encode proteins involved in common cellular functions such as catalytic proteins, while coding gene-originated target genes are biased toward regulation of tissue-specific biological processes and encode proteins such as TFs.

Reconstructing ceRNA-miRNA-Target Gene Regulatory Networks in the Developing Maize Anther
After identifying miRNAs and stably transcribed loci in the maize anther, we computationally predicted ceRNA-miRNA and miRNA-target gene pairs and found that the interaction pairs displayed sequence conservation in miRNA binding regions (MREs and target sites). By integrating ceRNA-miRNA and miRNA-target interaction pairs and excluding miRNA-mediated pathways only containing ceRNAs or target genes, we reconstructed 79 ceRNA-miRNA-target gene regulatory networks during maize anther development (Figure 4 and Figure S3). These networks include 576 predicted ceRNAs, 79 miRNAs (51 known and 28 potentially novel miRNAs), and 753 target genes, which together comprise 619 ceRNA-miRNA pairs and 869 miRNA-target gene pairs. In addition, one gene (Zm00001d012381) was predicted as both a ceRNA and a target gene of a potentially novel miRNA (zma-miRN15) in the network. Meanwhile, there are complex links between 20 known miRNA-and 18 potentially novel miRNA-mediated regulatory networks ( Figure 5), and relatively simple links exist among other six known miRNA-and one potentially novel miRNA-mediated regulatory networks ( Figure S4).  28 potentially novel miRNA-mediated ceRNA-miRNA-target gene regulatory networks. The 51 known miRNA-mediated networks were first classified into four categories (i.e., "Up", "Down", "First up, then down", and "First down, then up") based on the expression patterns of known miRNAs shown in Figure 2, and then were numbered from "1" to "51". The 28 potentially novel miRNA-mediated regulatory networks were also classified and numbered from "52" to "79". Please refer to Table S5 for the detailed information of ceRNAs, miRNAs, and target genes located in the 79 regulatory networks. It is well known that ceRNAs can negatively regulate the matched miRNAs at the level of transcription, and that miRNAs can repress the transcription levels of their target genes. Therefore, it can be assumed that transcriptional levels of predicted ceRNAs and target genes should be negatively correlated with those of their interacting miRNAs across the three anther developmental stages (S8 to S10). Through expression correlation analysis on ceRNA-miRNA and miRNA-target pairs, we found that 47.17% of ceRNA-miRNA pairs (292/619) showed negative correlations (r < −0.6) in at least one of the two maize lines. Moreover, 16.32% of ceRNA-miRNA pairs (101/619) were found to be negative correlations in both the two maize lines. For the miRNA-target gene pairs, 41.20% (358/869) and 17.49% (152/869) were observed to be negative correlations (r < -0.6) in at least one line and in both the two lines, respectively. These negatively associated pairs may represent more credible interactions in the reconstructed networks.
Among the 576 predicted ceRNAs, there are 520 coding genes, three pseudogenes, 24 lncRNAs, 21 TE genes, and eight other types. Compared with the initially identified levels in the 28,233 transcribed loci (Figure 1b), the proportions of coding genes (85.34% versus 90.28%) and lncRNAs (2.52% versus 4.17%) predicted as ceRNAs were significantly increased in the networks, while the proportions of the other transcript types predicted as ceRNAs were reduced. There were 682, 3, 36, 21, and 11 target genes annotated as coding genes, pseudogenes, lncRNAs, TE genes, and other types, respectively. Similarly, coding genes (85.34% versus 90.57%) and lncRNAs (2.52% versus 4.78%) were significantly enriched with target gene functions. Notably, the enrichment of lncRNAs predicted as ceRNAs and target genes were 1.65-and 1.90-fold respectively, which are higher than the enrichment of coding genes with both 1.06-fold. These results suggest that coding genes and lncRNAs have higher probabilities involving the ceRNA-miRNA-target gene regulatory networks, especially for lncRNAs.

The ceRNA-miRNA-Target Gene Regulatory Networks during Maize Anther Development
Having reconstructed the ceRNA-miRNA-target gene regulatory networks in the maize anther and revealed its general features, we further investigated whether these networks have significant functions in regulating maize anther development. One approach for verifying their functional significance is to investigate functions of the miRNA-target genes identified in these networks. Specifically, three methods were used in this study: (1) performing gene ontology (GO) analysis for the target genes, (2) determining whether some of the target genes or their orthologs are the reported functional genes regulating anther development, and (3) comparing the overlap between the miRNA-target gene pairs reported previously and identified here. The GO analysis showed that the target genes are functionally enriched in flower development genes ( Figure S2). Permutation analysis using randomly selected transcripts with the same sample size as target genes (10,000 times) indicates that the enrichment result is not attributed to the sample bias that transcribed loci were identified from transcriptomes of anther tissues (p = 0.0164). This finding roughly supports our hypothesis that the reconstructed ceRNA-miRNA-target gene regulatory networks may contribute to maize anther development. Nevertheless, a more detailed analysis should be performed to further verify our hypothesis. A maize GMS gene ZmMs7 reported previously in our laboratory [47] was predicted as the target gene of a potentially novel maize miRNA, zma-miRN15, in this analysis. Additionally, in our recent review, we identified 79 GMS orthologous genes in maize based on the GMS gene information reported in Arabidopsis, rice, and other plant species [4]. The functional roles of these GMS genes have been well studied, and found to be involved in crucial transcription, lipid metabolism, polysaccharide metabolism, and other processes, which play important roles in maize anther development. In this study, 86.08% (68/79) of these GMS orthologous genes were identified as transcribed loci, and six of them were located in the reconstructed regulatory networks as target genes. Five genes (Zm00001d012544, Zm00001d043131, Zm00001d002929, Zm00001d034701, and Zm00001d046537) may be regulated by known miRNAs, and Zm00001d053895 was predicted to be the target gene of a potentially novel miRNA. The relative enrichment was 2-fold for the six genes (6/1177; p = 0.0599) and increased to 3-fold for the five genes targeted by known miRNA (5/525, p = 0.0198) when compared with the background (68/24,093). In addition, many reported miRNA-target gene pairs related to flower or anther development were found to be included in the reconstructed regulatory networks, such as miR156-SPL (SQUAMOSA PROMOTER BINDING PROTEIN-LIKE), miR159-MYB, miR160-ARF (AUXIN RESPONSE FACTOR), miR164-NAC (NAM, ATAF, and CUC), miR172-AP2 (APETALA2), and miR319-TCP (TEOSINTE BRANCHED1, CYCLOIDEA, and PCF) pairs [10]. These results from the three aspect analyses indicate that the reconstructed ceRNA-miRNA-target gene regulatory networks are likely involved in the process of maize anther development.
SPL family genes encode a group of plant-specific transcription factors participating in flower development [48,49]. In addition, SPL genes are the targets of miR156. There are 27 target genes in the reconstructed zma-miRNA156-mediated network, of which 14 belong to the SPL family. The expression levels of 13 SPL genes were negatively correlated (r < -0.3) with that of zma-miR156 in at least one line (Figure 6a). Notably, among four ceRNAs located in the zma-miRNA156-mediated network, Zm00001d045072 (a gene with unknown function) and Zm00001d048877 (a leucine-rich repeat transmembrane protein kinase gene) were found to be negatively associated with zma-miR156 at expression levels. In the zma-miR172-mediated regulatory network, there are 17 ceRNAs and 17 target genes (Figure 6b). Expression levels of 12 ceRNAs and 11 targets were negatively associated with that of zma-miR172. It was reported that miR172 is crucial for plant sexual reproduction [50]. Zma-miR172e, one of the zma-miR172 family members, has essential roles in stamen identity by targeting AP2 and two AP2-like genes (IDS1 and SID1) in maize [9,51]. In this study, the expression levels of AP2 family genes including SID1 and AP2 were negatively correlated with that of miRNA172 (Figure 6b). Interestingly, miR156-target genes can regulate miR172 expression [52], which indicates that the regulatory networks underlying flower or anther development may be complicated since links exist among different miRNA-mediated regulatory networks. The results obtained here indicate that the zma-miR172-mediated regulatory network not only participates in maize floral identity, but also may be involved in the development of anther and tassel.
In the zma-miR319-mediated regulatory network, six of 15 target genes belong to the TCP family (Figure 6c). The miR319-regulated TCP genes are considered to negatively regulate secondary wall-thickening in floral organs, overexpression of TCP24 causes male sterility, and the miR319 loss-of-function mutant exhibits abnormal stamens in Arabidopsis [53,54]. These reports suggest that miR319 and its target TCP genes play important roles in anther development. More importantly, we found that Zm00001d012544, maize ortholog of OsGAMYB, was one of the predicted target genes of zma-miR319 (Figure 6c). OsGAMYB has essential functions in floral organ and pollen development in rice [55]. Additionally, Zm00001d012544 is highly expressed in the maize anther (information from MaizeGDB). Therefore, Zm00001d012544 may be an important regulator of maize anther development.
ARF family genes are the targets of miR160.
We identified five ARF genes in the zma-miR160-mediated network (Figure 6d). Among them, Zm00001d002929, ortholog of AtARF17, was negatively associated with zma-miR160 expression and highly expressed in the maize anther (information from MaizeGDB). Interestingly, AtARF17 was reported to participate in pollen wall formation in Arabidopsis, and its loss-of-function results in male sterility [56]. Therefore, Zm00001d002929 may be an important regulator of maize reproduction. In addition, there are six predicted ceRNAs for zma-miR160, three of which have expressed associations with zma-miR160. These results indicate that miR160, combined with its ceRNAs and ARF targets, may play a crucial role in regulating maize anther development.
In two other networks, zma-miR159 targeted Zm00001d043131 (ortholog of OsGAMYB), and zma-miR164 targeted Zm00001d034701 (ortholog of OsDEX1) and Zm00001d046537 (ortholog of AtABCG26 and OsABCG15) (Figure 6e,f). OsGAMYB, OsDEX1, and OsABCG15 are key functional genes in rice anther development and play important roles in male fertility [55,57,58]. Five NAC family genes were predicted to be target genes of zma-miR164, and expression levels of three of them were negatively correlated with that of zma-miR164 (Figure 6f). Although miRNA paralogs belonging to the same miRNA family were slightly different in terms of numbers and types of their predicted ceRNAs and target genes, their regulatory network were relatively stable, and the majority of functional genes were shared between miRNA paralogs ( Figure S5).
Taken together, we can conclude that the reconstructed ceRNA-miRNA-target gene regulatory networks likely contribute to maize anther development.

Novel miRNAs Integrated in the ceRNA-miRNA-Target Gene Regulatory Networks
Target genes of potentially novel miRNAs were also found to be associated with anther development (Figure 7a, b). For instance, we found that ZmMs7 is targeted by a potentially novel miRNA, zma-miRN15 (Figure 7a). Recently, we reported that ZmMs7 encoding a PHD-finger TF is crucial for anther development and male fertility in maize [47]. ZmMs7 is the ortholog of AtMS1 in Arabidopsis and OsPTC1 in rice [59,60]. There were five miRNA homogenous loci (zma-miRN15a, b, c, d, and e) for zma-miRN15 in maize genome, which could be classified into two types based on two nucleotide sites (T/C and C/T) (Figure 7c). Precursor sequences of the two type zma-miRN15 could be predicted to form canonical stem-loop structures by computational analysis (Figure 7d). Except for four nucleotides at zma-miRN15 3 -end, the other 14 nucleotides were perfectly complemented with the predicted ZmMs7 target sites (Figure 7f).  Table S5). Legend in Figure 7a,b was the same as that of Figure 4, Figure 5, and Figure 6. (c) Genomic sequences of zma-miRN15 precursors at five loci (zma-miRN15a, b, c, d, and e). Genomic position of the first nucleotide at the 5 end of each locus is presented at the right side of the sequence. Region of zma-miRN15 mature sequence is presented by the black line. Nucleotides at two sites ("site 1" and "site 2") are marked in blue. Another potentially novel miRNA, zma-miRN478, was found to target Zm00001d053895, maize ortholog of AtAMS in Arabidopsis, and OsTDR in rice. AtAMS and OsTDR are essential for anther development and male fertility [61,62]. Genomic sequences around zma-miRN478 could be predicted to a stem-loop structure (Figure 7e). Notably, target sites of zma-miRN478 were observed in two of the four annotated maize isoforms of Zm00001d053895 (Figure 7g), indicating that maize-specific or younger miRNAs possibly participate in the ceRNA-miRNA-target gene regulatory network.

The ceRNA Components in Transcriptomes
The number of transcripts from non-coding genes (e.g., pseudogenes, lncRNA loci, and TEs) is much smaller than that from coding genes. Consequently, the majority of predicted ceRNAs belong to coding genes. This finding indicates that the interaction between miRNAs and transcripts of coding genes in the transcriptomes may be more frequent than that was previously thought. It is possible that transcripts of some coding genes function, either fully or partly, in regulating miRNA activities via a ceRNA regulatory mechanism, while the functional roles of their coding protein products may be less important in some developmental stage(s) or tissue(s). This may be one reason to explain why the inconsistencies exist between the levels of transcribed RNAs and translated proteins for some protein-coding genes [63].
In addition, we found that lncRNAs were relatively enriched in ceRNAs when compared with other types of non-coding transcripts, which implies that lncRNAs may be the other major source of ceRNAs. Considering that the first plant ceRNA molecule was discovered in phosphate starvation status, and that in the subsequent reports a large percentage of ceRNAs were mainly identified under abnormal conditions such as stress-induced organisms, cancer, or tumor tissues [13,16,17], it can be inferred that the ceRNA regulatory pathway may be more active under stress conditions and be considered as a supplementary gene regulatory mechanism at the post-transcriptional level. In this study, we focused on the normal developmental process of maize anther, in which the ceRNA transcripts, especially for the non-coding transcripts, may be less active in the reconstructed regulatory networks. The initial amounts of different types of transcripts and the possible stress-activated mechanism of the ceRNA-miRNA-target gene regulatory network may account for the enrichment of coding gene-and lncRNA-originated ceRNAs in this study. Further studies are needed to address the ceRNA components in transcriptomes under both normal and stress-response conditions.

ceRNAs Differ from Target Genes of miRNAs
Although ceRNAs and target genes of known miRNAs were predicted by a computational approach, it has been demonstrated that the complementary sites between ceRNAs and miRNAs, as well as between miRNAs and their target genes, are relatively conserved in sequence when compared with their flanking regions. In general, the sequence conservation in the genomic elements may indicate their functional consistence and significance. Therefore, the predicted ceRNAs and target genes of miRNAs could be used to reconstruct the ceRNA-miRNA-target gene regulatory networks. Before reconstructing the regulatory networks, we investigated their functional features. The ceRNAs represent the up-stream regulators that can repress the activities of their corresponding miRNAs, while the target genes can be considered as the down-stream effectors whose transcriptional or translational activities are repressed by the miRNAs. Although there are an increasing number of studies on ceRNA identification in plants and animals, these reports rarely address whether the general features of ceRNAs are different from other transcripts. It is well known that most target genes of conserved miRNAs are TF-coding genes involved in controlling developmental processes [46]. Here, we also found that the functions of proteins encoded by target genes tend to be TFs in the GRN, with their functions significantly enriched in flower development and cellular regulation. However, the proportion of TF-coding genes annotated as ceRNAs is similar to that of the whole background in the maize genome, and the ceRNAs mainly take part in biological functions such as metabolism and catabolic processes, which significantly differ from the functional roles of the TF-coding genes predicted as target genes. These results indicate that the different positions of ceRNAs and target genes in the ceRNA-miRNA-target gene regulatory network may be tightly associated with their different functional features.

The Functional Significance of the ceRNA-miRNA-Target Gene Regulatory Networks during Maize Anther Development
In maize, the proportion of tissue-specific transcribed genes in anther is the highest (6.0%) when compared with other tissues during various developmental stages ( Figure S6), suggesting that maize anther transcriptome has a special content and that the GRN underlying anther development may be more complicated. Previous studies have identified a number of functional genes involved in anther development, including TF-coding genes that are crucial regulatory nodes in the networks, enzyme-coding genes that participate in lipid and carbohydrate metabolism, and transporter genes [2,[5][6][7]64,65]. Additionally, several miRNAs combined with their target genes were found to have important functions in regulating the floral organ identity and the subsequent development of anther and pollen [8,9]. These protein-coding genes and miRNA regulators comprise a meaningful GRN framework for further studies on mechanisms of anther development in plants. Here, we further introduced ceRNAs into the GRN framework, and found that miRNA-target genes indirectly regulated by ceRNAs are functionally significant in anther development and male fertility. De novo genes and other newly originated genes including miRNAs are crucial genomic elements for functional novelty, and are important components comprising the functional pathway and GRN of organisms [66,67]. Among the maize GMS genes reported [4,47,[68][69][70][71], ZmMs7 was found to likely be the target of a potentially novel miRNA. These results not only extend the GRN of anther development by adding a new type of regulatory relationship (ceRNA-miRNA pair), but also reveal interactions among transcripts during anther development. At present, our understanding on the mechanisms of anther development and the ceRNA-miRNA-target gene regulatory network is far from complete. Here, all the identified ceRNAs, miRNAs, and target genes, along with their interaction relationships, provide a meaningful dataset for future studies. Therefore, more experimental studies and computational analyses should be performed to deeply understand the biological processes involved in anther development and male fertility in plants.

Plant Materials, Anther Samples, and Transcriptome Sequencing
The plants were grown in the field of the University of Sciences and Technology Beijing (USTB) with normal field management. Anther collection was performed as previously described [72]. Anthers with 2.5-4.0 mm length were collected from upper florets with two pairs of sharp forceps directly into microcentrifuge tubes, immediately frozen in liquid nitrogen and stored at -80 • C until use. Fifty to sixty anthers were collected for each biological replicate. At the same time, five fresh anthers were immersed in FAA solution (Coolabor, Beijing, China) overnight for transverse section SEM (scanning electron microscope) analyses. According to the results of SEM analyses, we ensured that the anthers collected were at the correct developmental stages (Figure 1e).
Total RNA was extracted from anther tissues using an RNAprep Pure Plant Kit (Polysaccharides and Polyphenolics-rich) (TIANGEN, Beijing, China). The small RNA-seq libraries were generated using NEBNext ® Multiplex Small RNA Library Prep Set for Illumina ® (NEB, USA) following the manufacturer's recommendations. After 3 and 5 adapter ligation and cDNA synthesis, DNA fragments corresponding to 140-160 bp (the length of small RNA between 18-30 nt plus the 3 and 5 adaptors) were recovered by 8% PAGE. The library preparations were sequenced on an Illumina Hiseq 2500 platform and 50 bp single-end reads were generated. The ribosomal RNA was removed from the extracted RNA by Epicentre Ribo-zero™ rRNA Removal Kit (Epicentre, USA). RNA-seq libraries were constructed using the rRNA-depleted RNA by NEBNext ® Ultra™ Directional RNA Library Prep Kit for Illumina ® (NEB, USA) following manufacturer's recommendations. After adapter ligation and cDNA synthesis, the library preparations were sequenced on an Illumina Hiseq 4000 platform and 150 bp paired-end reads were generated.

Analyses of Transcribed Loci and Single Nucleotide Polymorphism (SNP) in the Maize Transcriptomes
The ribosomal RNA-depleted RNA-seq libraries were constructed and sequenced from 11 anther tissue samples at three developmental stages (stages 8, 9, and 10) in two maize lines (L7 and L30) (Table S1). Raw sequencing reads were processed to remove adapter sequences, low quality reads (quality scores <20), and un-paired reads using NGSQCToolkit (version 2.3.3) [73]. The remaining high-quality clean reads were mapped to the maize B73 reference genome (version AGPv4, downloaded from Ensembl release 37) using TopHat2 with the default parameters [74]. Transcribed loci were identified in each sample using the cufflinks program without setting reference transcript annotations, and the results were then merged using cuffmerge [75]. Expression levels of the transcribed loci were estimated with the edgeR package using RPKM values [76]. The transcribed loci with RPKM values >0.5 and the lengths of their longest transcripts >200 nt in at least one sample of each line were used in the subsequent analyses.
SNP sites in the two lines were identified via the alignments of the RNA-seq data in each sample using the Genome Analysis Toolkit (GATK) pipeline [77]. High-quality SNP sites (parameters used in GATK: QD > 2.0, FS < 60.0, MQ > 40.0, MQRankSum > -12.5, and ReadPosRankSum > -8.0) from these samples were used for further analysis. If at least three samples from one line were polymorphic at a site, this site is defined as an SNP site in the investigated line. Subsequently, SNP sites from the two lines were merged together to estimate the sequence conservation of MREs (miRNA matched regions by ceRNAs) and target sites (target gene matched regions by miRNA) located in ceRNAs and target genes, respectively. The sequence conservation levels were estimated by comparing SNP densities (the number of SNP sites per kilobase) between miRNA binding regions (MREs and target sites) and their flanking regions.
Short reads in the transcriptome data were assembled de novo into longer transcripts using the Trinity program with the default parameters [78,79]. The maize public FlcDNA and EST data were downloaded from The Maize Full-Length cDNA Project (http://www.maizecdna.org/), which contains 27,455 FlcDNA sequences and 364,385 EST sequences. The identified transcripts in this study were aligned to the FlcDNA and EST sequences using BLASTn [80]. The transcripts with similarities >85% and lengths >200 nt in the aligned regions were considered with expression evidences in FlcDNA or/and EST database. The two sets of RNA-seq data from maize anther tissues published previously were reanalyzed to confirm the credibility of the transcribed loci identified in this study [36,37].

Annotation and Classification of Transcribed Loci
After identifying the transcribed loci from the transcriptomes of maize anther, we annotated and classified these transcripts into five types including coding genes, pseudogenes, lncRNA, TEs, and other type. Gene model and annotation information based on the maize reference genome sequence (version AGPv4) were used to clarify the transcribed loci. Coding-gene models from MaizeGDB (Zm00001d) and the National Center for Biotechnology Information (B73_RefGen_v4) were merged together in this analysis. When there were overlapping gene-models present in the two datasets, those from MaizeGDB were shown. LncRNAs identified in four previous studies were merged into a single list representing the maize lncRNA models [81][82][83][84]. Additionally, lncRNAs annotated by Ensembl (release 39) and NCBI (B73_RefGen_v4) were added to the lncRNA models. Pseudogene models were predicted using pseudopipe based on coding-gene models [85]. TE annotation was downloaded from MaizeGDB (TransposableElements.gff3). When a transcribed locus overlapped with other transcript types, it was assigned to a certain transcript type according to the prior order of coding gene, pseudogene, lncRNA, and TE, and transcribed loci not assigned to any of the four transcript types were marked as "other type". Transcribed locus IDs of coding genes were extracted from the MaizeGDB and NCBI databases, while transcribed locus IDs of pseudogenes, lncRNAs, TEs, or other type were designated ZmPGtxxxxxx, ZmLNtxxxxxx, ZmTEtxxxxxx, and ZmOTtxxxxxx, respectively ("xxxxxx" is a unique number specified for each locus, Table S2). TF annotation information of maize genes was obtained from the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/).

Analysis of Small RNA-Seq Data
Small RNA-seq data were initially processed using cutadapt 1.17 to remove adaptor sequences from the reads (https://cutadapt.readthedocs.io/en/stable/#). Reads containing ambiguous bases (marked as N), poly A tracts (length > 11; in miRBase, the longest poly A tract in a known mature miRNA sequence is 11 nt) and reads <18 nt or >25 nt were filtered out. The remaining reads were annotated using the Infernal package (version 1.1.2) based on the Rfam database (release 14.0) [86][87][88]. Reads similar to rRNAs, tRNAs, snRNAs, and snoRNAs were discarded. An investigation of the read length distribution was performed to confirm that the majority of the remaining reads were mature miRNA sequences. The retained high-quality reads were then mapped to the maize reference genome (AGPv4) using Bowtie software [89]. The number of reads mapped to known miRNA sequences were counted by Rsubread based on the maize miRNA models in miRBase (release 22) [90]. The MiRDeep2 package was used to identify potentially novel miRNAs transcribed in the maize anther [91]. The resulting predictions with significant randfold p values <0.05 and miRDeep2 scores >5 were considered as novel miRNA candidates. The potentially novel miRNAs were named miRNxxxx, in which xxxx is a unique number specified for each novel miRNA candidate. The transcription levels of the identified known and potentially novel miRNAs were estimated using edgeR based on the read counts that were normalized into CPM (read counts per million mapped reads) values. Known and potentially novel miRNAs with CPM values >1 in at least one sample were used in the subsequent analysis. Small RNA-seq data published previously in two independent studies were reanalyzed here using the method described above to confirm the credibility of identified miRNAs in this study [37,38]. Secondary structures of zma-miRN15 and zma-miRN478 precursors were re-predicted in the ViennaRNA Web Services (http://rna.tbi.univie.ac.at/).

Prediction of ceRNAs and Target Genes of miRNAs
ceRNAs and miRNA targets were predicted by comparing mature maize miRNA sequences and identified transcripts. Mature sequences of known maize miRNAs were obtained from miRBase, and mature sequences of maize potentially novel miRNAs were extracted from the miRDeep2 analysis results. The sequences of all isoforms of coding genes, pseudogenes, lncRNAs, TEs, and other type transcribed loci were extracted from the reference genome sequence based on their transcript models. The miRNA target genes were predicted using psRobot v1.2 [39] and TAPIR v1.2 [40], respectively. Target genes were predicted by psRobot with penalty scores ≤2.5 and without mismatched base pairs in the "5 -side" regions (the 2nd to 7th nucleotides of mature miRNA, named as "5 -side" region in this study), and targets were predicted by the TAPIR RNAhybrid engine with scores ≤4 and MFE (minimum free energy) ratios ≥0.7 and without mismatched base pairs in the "5 -side" regions. Merged results from the above two programs were used in the analysis. ceRNAs were predicted by using the TAPIR RNAhybrid engine with the target decoy search method. Candidates were considered as predicted ceRNAs if they could meet the following criteria: (1) MFE ratio ≥ 0.7, (2) no more than one mismatched base pair between the miRNA "5 -side" region and its recognized region, and (3) a bulge structure in the candidate (a mismatch loop usually 3 nt in length) behind the 10th complementary base pair in the 5 end of the mature miRNA sequences between the candidate ceRNA and its miRNA.
The parameters for identifying target genes are reasonable based on a previous comparative study of several prediction programs [41], and the rules on structural features used for ceRNA prediction are commonly used in ceRNA identification [23]. Transcriptional isoforms from the same transcribed loci predicted as ceRNA or target genes were merged together to remove redundancy.

Reconstruction of ceRNA-miRNA-Target Gene Regulatory Networks
By using the annotated types of transcripts, the identified known and potentially novel miRNAs, the predicted ceRNA and target relationships, and the expression correlations within ceRNA-miRNAs and miRNA-target pairs, we reconstructed the ceRNA-miRNA-target gene regulatory networks in this study. The miRNA-mediated networks only containing target genes or ceRNAs were filtered out, since no ceRNA and target gene both directly linked to the miRNAs in this analysis. The negative correlations of expression levels (r < -0.3) between miRNAs and their target genes, as well as between ceRNAs and their regulated miRNAs at the three anther developmental stages in at least one maize line, were considered as expression evidences and were marked with broader lines in the networks plotted by Cytoscape [92].

Specificity Analysis of Maize Anther Transcriptomes
Maize transcriptome data sets published previously included 79 tissues and developmental stages, and were used to compare differences of transcriptomes between anther and other tissues in maize [36]. Genes specifically expressed in the anther were determined with tissue specificity indexes (TSIs) > 0.9. TSI was calculated for each gene according to the method described previously [93]. A larger TSI value of a certain gene indicates that the gene is more specifically expressed in one of the investigated tissues.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.