Comparative transcriptome profiling of the fertile and sterile flower buds of a dominant genic male sterile line in sesame (Sesamum indicum L.)

Background Sesame (Sesamum indicum L.) is a globally important oilseed crop with highly-valued oil. Strong hybrid vigor is frequently observed within this crop, which can be exploited by the means of genic male sterility (GMS). We have previously developed a dominant GMS (DGMS) line W1098A that has great potential for the breeding of F1 hybrids. Although it has been genetically and anatomically characterized, the underlying molecular mechanism for male sterility remains unclear and therefore limits the full utilization of such GMS line. In this study, RNA-seq based transcriptome profiling was carried out in two near-isogenic DGMS lines (W1098A and its fertile counterpart, W1098B) to identify differentially expressed genes (DEGs) related to male sterility. Results A total of 1,502 significant DEGs were detected, among which 751 were up-regulated and 751 were down-regulated in sterile flower buds. A number of DEGs were implicated in both ethylene and JA synthesis & signaling pathway; the expression of which were either up- or down-regulated in the sterile buds, respectively. Moreover, the majority of NAC and WRKY transcription factors implicated from the DEGs were up-regulated in sterile buds. By querying the Plant Male Reproduction Database, 49 sesame homologous genes were obtained; several of these encode transcription factors (bHLH089, MYB99, and AMS) that showed reduced expression in sterile buds, thus implying the possible role in specifying or determining tapetal fate and development. The predicted effect of allelic variants on the function of their corresponding DEGs highlighted several Insertions/Deletions (InDels), which might be responsible for the phenotype of sterility/fertility in DGMS lines. Conclusion The present comparative transcriptome study suggested that both hormone signaling pathway and transcription factors control the male sterility of DGMS in sesame. The results also revealed that several InDels located in DEGs prone to cause loss of function, which might contribute to male sterility. These findings provide valuable genomic resources for a deeper insight into the molecular mechanism underlying DGMS. Electronic supplementary material The online version of this article (doi:10.1186/s12870-016-0934-x) contains supplementary material, which is available to authorized users.


Background
Sesame (Sesamum indicum L.) is a globally important and ancient oilseed crop mainly consumed for highquality oil [1,2]. It has the highest oil content among the cultivated oil crops and is rich in natural antioxidants like sesamin and sesamol, which are known by their specific antihypertensive effects and anti-oxidative activity [3][4][5]. Although important, the seed yield of sesame is unstable and relatively low compared with rapeseed, peanut and soybean. Therefore, great efforts should be made to improve the seed yield of sesame.
Heterosis utilization is the most promising approach for yield improvement, since very strong hybrid vigor (>15 %) has been observed within this crop [6]. Heterosis can be effectively exploited either by cytoplasmic male sterility (CMS) or genic male sterility (GMS). So far, only recessive GMS has been successfully applied to the production of sesame F 1 hybrids. However, this method might be constrained by certain drawbacks such as environmental sensitivity, incomplete sterility, and the timely removal of 50 % male-fertile plantlets from two-type lines for hybrid seeds production [7]. Recently, we have developed a novel dominant GMS line (DGMS) by crossing the wild species S. mulayanum L. (2n = 26) plants with the cultivated species S. indicum L. (2n = 26), which has great potential for the breeding of hybrid varieties. Cytological study showed that pollen abortion in the DGMS line (W1098A) began in pollen mother cells (PMC), continued throughout pollen development, and peaked at the late microspore stage. Moreover, the gene locus conditioning male sterile was delimited by two closely linked SSR markers SBM298 and GB50 [8]. However, the underlying molecular mechanism remains elusive.
The small diploid genome (~350 Mb) makes sesame an attractive species for genetic studies [9,10]. Recently, the high-quality genome sequence of sesame was assembled, which contains~27,148 predicted gene models, of which 91.7 % were anchored onto 16 pseudomolecules or linkage groups (LGs) [11]. Using forward and reverse genetic approaches, a growing number of genes have been identified that have vital roles in anther development. Consequently, the Plant Male Reproduction Database (PMRD, http://202.120. 45.92/addb/), a comprehensive resource for genes and mutants related to plant male reproduction, has emerged [12].
Male sterility (MS) is associated with not only the lack of viable pollen, but also the failure of pollen release [13]. The importance of tapetal programmed cell death (PCD) for successful pollen formation has been highlighted by a number of MS mutants that fail to go through normal tapetal breakdown [13][14][15]. Archesporial cell number and tapetal cell fate is controlled by EXCESS MICROSPOROCYTES1 (EMS1), a leucine-rich repeat receptor like kinase, and a small secreted protein ligand, TAPETUM DETERMINANT1 (TPD1) [16]. Tapetal development is initiated by DYSFUNCTIONAL TAPETUM1 (DYT1) [17] and DEFECTIVE IN TAPETAL DEVELOP-MENT AND FUNCTION1 (TDF1) [18], with tapetal maturation, pollen wall formation, and tapetal PCD involving ABORTED MICROSPORES (AMS) [19] and MALE STER-ILITY1 (MS1) [20]. The final stage of dehiscence involves jasmonic acid (JA)-induced gene expression and transcription factors associated with endothecium secondary thickening [13].
To elucidate the mechanism of MS more comprehensively, the transcriptomes of many higher plants have been sequenced, including Arabidopsis [21], buckwheat [22], cotton [23][24][25], watermelon [26], soybean [27], Brassica napus [28][29][30] and Brassica oleracea [31]. In this study, fertile and sterile flower buds from DGMS line with a length of~2.5 mm were sampled for RNA-seq, representing the first study of the sesame DGMS transcriptome. The aim of this study is to identify differentially expressed genes (DEGs) associated with MS, and explore the different bioprocesses involved and their putative functions. These results will be helpful to elucidate the molecular mechanism for DGMS, and assist the breeding of sesame hybrid variety.

Transcriptome profiling of fertile and sterile buds
We have previously demonstrated that male sterility mainly occurred at PMC stage in DGMS line [8]. Therefore, we sampled fertile and sterile buds at this stage, and prepared respective cDNA libraries. After sequencing with Illumina HiSeq 2000 platform, we obtained a total of 53,126,890 and 55,491,408 high quality pair-end reads from fertile and sterile flower buds, respectively, which were then cleaned and mapped to the sesame reference genome sequence containing 27,148 gene models [11]. In total, 83.54 % of the reads from fertile buds and 84.86 % from sterile buds were mapped to the reference genome, and the majority of which were uniquely mapped (Table 1). By sequences alignment, we found that a total of 22,373 and 22,788 genes were hit by the unique reads from fertile and sterile buds, respectively, which accounted for >82 % of the known gene models. The average length of genes in fertile buds was 1305 bp and it was 1297 bp for sterile buds. Most of these genes (74 % in sterile buds and 71 % sterile buds) showed very high level of gene coverage (90-100 %).
To gauge the relative level of gene expression in different tissues, we calculated the RPKM (Reads per Kilobase of exon model per Million mapped reads) value based on the uniquely mapped reads. The RPKM value for those genes detected in fertile buds ranged from 0.012 to 16683.020, with a mean of 40.974. Similarly, the minimum, maximum and average RPKM was 0.008, 33521.52 and 40.302 for genes in sterile buds. Thus, all the above genes were regarded to be expressed in either the fertile buds or the sterile buds, as indicated by a RPKM threshold ≥0.001. Unsurprisingly, most of these expressed genes (>95 %) were common between tissues; however, we also observed a small number of uniquely expressed genes (539 in fertile buds and 954 in sterile buds).

Functional characterization of DEGs
Using the criteria of at least two fold changes and false discovery rate (FDR)<0.001, we obtained 1,502 significant DEGs by comparing the genes expression levels between fertile and sterile buds, of which 751 were up-regulated and 751 down-regulated in sterile buds (Additional file 1: Table S2). Distribution of all DEGs across the sesame genome was then analyzed by anchoring gene sequences to the previously released 16 pseudomolecules (or LGs) that harbored 85.3 % of the sesame genome assembly [11]. By integrating the genome information available in public domain, we could assign the DEGs onto each LG. The results showed that LG4 had the least numbers of DEGs (4.47 %), following by LG11 with 4.76 %. In contrast, LG7 had the largest percentage of DEGs (6.83 %). Moreover, the percentage of up-regulated genes was nearly 2 folds that of down-regulated genes in LG16, LG8 and LG15. Also, LG2, LG10 and LG13 had higher percentage of up-regulated genes than down-regulated genes, while LG3, LG4, LG5, LG9, LG11 and LG12 showed an opposite trend. In addition, there were nearly equal numbers of up-and down-regulated genes in the rest of the four LGs (Fig. 1).
The putative function of each DEG was then characterized with both GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) databases. Due to the large numbers and the complex branch structure of GO categories, only the three most abundant functional groups, namely 'Cellular Component' , 'Molecular Function' and 'Biological Process' were presented, as an example (Fig. 2). In the sub-category of 'Cellular Component' , the largest numbers of genes were found to be associated with 'cell part' , which can be further sub-divided into cascades of 'intracellular' , 'cytoplasmic vesicle' and 'intrinsic to membrane'. In the next main sub-category of 'Molecular Function' , 'ion binding' and 'catalytic' were the most abundant cascades that have a respective of 71 and 19 genes. Moreover, 'hydrolase activity acting on glycosyl bonds' and 'iron ion binding' were the two dominant groups in the cascade of 'catalytic'. Within the last sub-category 'Biological Process' , 'cellular process' and 'metabolic process' were the two most prevalent cascades that can represent the typical activities of biological processes. Specifically, the most intriguing GO terms in 'cellular process' were found to be 'meiosis I' and 'pollen wall assembly' , suggesting their active roles in MS. It was noted that 'DNA recombination' was highlighted in the cascade 'metabolic process'.
In the KEGG analysis, a total of 34 pathways were enriched, of which 13 were inferred from both up-and down-regulated genes, and the rest were inferred from either down-or up-regulated genes alone (Table 2). It was showed that most of the genes are involved in 'Metabolic pathways' and 'Biosynthesis of secondary metabolites'. Interestingly, there were at least 6 genes (SIN_1006103, SIN_1017099, SIN_1014074, SIN_1023392, SIN_1015497 and SIN_1014349) annotated as 'Meiosis-yeast' or 'Oocyte meiosis' in the list of genes down-regulated in sterile buds, consistent with the GO annotation results. In the 'Biosynthesis of secondary metabolites' pathway, the number of up-regulated genes was nearly 3 times that of downregulated genes. Also, many more up-regulated genes were annotated as 'Polycyclic aromatic hydrocarbon degradation' and 'alpha-Linolenic acid metabolism'. By contrast, many genes down-regulated in sterile buds were enriched in ' Ascorbate and aldarate metabolism' and 'Glycerophospholipid metabolism'. There were also 14 upregulated genes involved in the pathway of 'Flavonoid biosynthesis' (Table 2).
These findings were further supported by a more specific comparison of metabolic pathways by using Map-Man [32]. All of the 1,502 DEGs identified between sterile and fertile buds were annotated in the TAIR database (http://www.arabidopsis.org). Consequently, 1,445 DEGs were found to be homologs of 1,240 Arabidopsis genes (Additional file 2: Table S3). To dissect the putative functions of the 1,445 DEGs that are likely to be associated with MS phenotype, we fully visualized the Arabidopsis homologous genes with MapMan and inferred a candidate pathway network (Fig. 3).
In the network, the most significant changes in transcript abundance of genes were shown to be related to 'Protein' , 'Targeting' , 'Hormones' and 'DNA'. Moreover, the expression of genes implicated in 'Ethylene and JA synthesis' were up-regulated in sterile buds, while those genes involved in 'Signaling pathway' were down-regulated in the DGMS sterile buds. In addition, the DEGs involved in 'Lipid (FA synthesis)' , 'Redox (Ascorbate & Glutathion)' and 'Energy (transport p-and v-ATPases)' were all down-regulated, whereas those in 'Second Metabolism (Flavonoids)' , 'Cell Wall (Modification)", and 'Energy (Fermentation)' were up-regulated in sterile buds, if compared to those in fertile buds. Among the differentially expressed transcription factors within the 'RNA TF' group, all of the NAC, trihelix and WRKYs (except one WRKY) were up-regulated, whereas C2C2(Zn) DOF, CCAAT and SET were down-regulated. Furthermore, in the 'Signalling' category, two MAP kinase-coding genes were down-regulated in the sterile buds ( Fig. 3; Additional file 2: Table S3).

Identification of male-sterility/male-reproduction related genes
To gain a deeper insight into the molecular mechanism underlying MS, we queried the sesame DEGs in the PMRD which contains 548 Arabidopsis male-sterility/ male-reproduction related genes. Forty nine homologous genes related to plant male reproduction were retrieved; several of these genes encode transcription factors (e.g. bHLH089, MYB99 and AMS). The transcription factor encoding genes showed reduced expressions in sterile buds, implicating their important roles in specifying/ determining tapetal fate and development (Table 3).

Allelic variants of DEGs
To gain a better understanding of the DEGs, we further predicted the effect of allelic variants on the function of their target genes using SnpEff predictor. A total of 1,057 Insertion/Deletions (InDels) were detected in 982 genes expressed in fertile buds, of which 52 reside within 48 DEGs (some genes have two InDels) (Additional file 3: Table S4). Similarly, 1,432 InDels were detected in 1,354 genes expressed in sterile buds, and 86 InDels were located within 83 DEGs (Additional file 4: Table S5). Together, we identified 138 InDels within 131 genes that were differentially expressed either in fertile or sterile buds. Of the 138 InDels identified, 62 were located in 57 genes that were upregulated in sterile buds, and 76 were located in 68 genes that were down-regulated in sterile buds (Additional files 5: Table S6 and 6: Table S7).
Specifically, in the list of up-regulated genes, a number of transcription factor encoding genes such as SIN_1002610 (Ethylene-responsive transcription factor ERF106), SIN_1024026 (NAC2), SIN_1019334 (WRKY 28) and SIN_1011023 (WRKY 33) were found. Some genes encoding 'Brassinosteroid-regulated protein BRU1' (SIN_1022411), 'COP9 signalosome complex subunit 2' (SIN_1015172) and 'Defensin J1-2' (SIN_1021298) were also highlighted (Additional file 5: Table S6). In the list of down-regulated genes, SIN_1008339 (E3 ubiquitinprotein ligase MARCH1), SIN_1010740 (L-ascorbate oxidase homolog), SIN_1026145 (Pollen-specific protein SF3), SIN_1005014 (Protein disulfide-isomerase 5-3) and SIN_1010051 (Sugar transport protein 8) were of interested in that they were likely to be related with pollen development (Additional file 6: S7). A subset of 21 genes containing InDels that were predicted to cause loss of function (LOF) and/or codon change (CC) was selected for further analysis (Table 4). Of these, InDels likely to cause CC (termed 'CC-type') were detected in 6 genes at sterile alleles, and in other 6 genes at fertile alleles. Moreover, LOF-type InDels were also detected in 6 fertile alleles and 7 sterile alleles, which showed a higher expression level in fertile buds and sterile buds, respectively (marked with asterisk; Table 4). Thus, it seemed that LOF-Type InDel might lead to the increase of transcript abundance in which it resides. This observation was further confirmed by the fact that in the 11 genes up-regulated in fertile buds, the majority (9 out of 11) of InDels were detected in fertile alleles. Similarly, in the other 10 genes up-regulated in sterile buds, the majority (80 %) of the InDels were detected in sterile alleles.
In particular, some genes such as SIN_1025190 (SCP18, Serine carboxypeptidase), SIN_1017245 (F3PH, Flavonoid 3'-monooxygenase) and SIN_1018350 (IPT, Adenylate isopentenyltransferase) with both LOF-type and CC-type InDels in sterile alleles, were up-regulated in sterile buds. Moreover, the gene encoding a kinase (SIN_1004626) with both LOF-and CC-types of InDels in fertile allele was up-regulated in fertile buds (downregulated in sterile buds). Interestingly, in another gene, SIN_1005818 (HMGB9, High mobility group B protein 9), InDel was detected in both alleles, with putative disruptive_inframe_deletion in sterile allele and LOF in fertile allele. The expression of this gene was downregulated in sterile buds but up-regulated in fertile buds ( Table 4, Additional file 7: Table S8). Taken together, a large number of sequence variants were detected in these DEGs, and their effects on transcript abundances were not conclusive.

Real-time quantitative PCR validation
To verify the RNA-Seq results, we chose an alternative strategy for both the up-and down-regulated DEGs. Twenty genes were randomly selected for validation by Real-time quantitative PCR (qRT-PCR) using the same RNA samples that was used for RNA-Seq. Primer sets were designed to span exon-exon junctions (Additional file 8: Table S1). Results showed that  although genes expression fold changes detected by qRT-PCR, in most cases, were higher than those by RNA-Seq, the trends were similar between these two methods, thus confirming the accuracy and reliability of RNA-Seq. As an example, the expression patterns of 12 randomly selected Male-sterility/male-reproduction genes were listed in Table 5, which demonstrated that the expression levels revealed by qRT-PCR and RNA-Seq were highly correlated (r = 0.762, P < 0.01, n = 12).

Discussion
We presented here, to our knowledge, the first study of sesame DGMS at transcriptome level. Transcript abundances from both fertile and sterile buds were acquired  We then mapped the high quality transcriptome reads onto the sesame reference genome and identified more than 22 thousands expressed genes, of which only 1,502 genes (~6.6 %) were differently expressed in either sterile or fertile buds, suggesting that a limited number of key genes are enough to transform the trait observably, although the development of anther is a complicated and polygenic process. We identified 49 anther development related genes in sesame that have homologs in Arabidopsis, some of which encoded transcription factors (bHLH089, MYB99, and AMS) and were possibly associated with the determination of tapetal fate and development (Table 3). Of these, 32 were down-regulated and the rest of 17 were up-regulated. Moreover, homologs of MS genes (cloned) accounted for nearly one half of the genes within each regulated category, and the rest of genes were annotated as MR related (male-reproduction related genes, with GO evidence), thus demonstrating that all these genes might be good candidates responsible for MS (Table 3). This can be explained by the fact that the sesame MS mentioned here initiated from PMC, the second stage of the anther and pollen development pathway [13], thus leading to the failure of anthers development, as observed in the male sterile buds [8].
Specifically, we found that DYT1 and TPD1 were in the list of 32 down-regulated DEGs (Table 3). Previous study has showed that DYT1 might regulate anther development via the expression of AMS and many tapetum-preferential genes, thereby indirectly affects pollen wall formation [17]. TPD1, a small peptide, was mainly expressed in microsporocytes and likely secreted into the interface between the tapetum and male reproductive cells to interact and form a receptor complex with the leucine-rich repeat receptor-like kinases EMS1, thus determining cell fate of the tapetal layer [16,33]. Therefore, it is likely that the down regulation of DYT1 and TPD1 in sesame might affect the pollen release through determining cell fate of the tapetal layer.
Another gene of interest was RBOHE (RESPIRATORY BURST OXIDASE HOMOLOGUE E). Previous study also showed that RBOHE (At1g19230) was an antherpreferential or tapetum-enriched gene, and functional loss of RBOHE resulted in delayed tapetal degeneration, thus the expression of RBOHE was reduced in dyt1 and tdf1 [33]. Consistent with this, we found that the RBOHE homologs in sesame, SIN_1024646 and SIN_1007549, also displayed significantly reduced expression in sterile buds (log 2 S/F = −1.7 and −0.9), if compared to fertile buds (Additional file 1: Table S2). Therefore, RBOHE may have a similar function in sesame DGMS.
Apart from DYT1 mentioned above, QRT2 (QUAR-TET2) was also in the MS genes (cloned) list (Table 3). Three QRT genes including QRT2 are required for the degradation of pollen mother cell wall when microspores are released from their tetrads [12]. Furthermore, QRT2 are required for anther dehiscence. In the process of floral abscission which co-regulated by JA, ethylene and abscisic acid (ABA), QRT2 is regulated by ethylene and ABA [34]. Moreover, anther dehiscence-related polygalacturonase activity is likely to be regulated by JA, ethylene and ABA [13]. In this study, the reduced expression of QRT2 was coupled with the up-regulation of genes involved in ethylene synthesis.
There were 17 up-regulated sesame genes with homologs in Arabidopsis (8 homologous to MS genes and 9 to MR genes, Table 3). Of these, the expression level of SIN_1007695 (spermidine hydroxycinnamoyl transferase, SHT) showed >200 fold increase in sterile buds, which was reminiscent of SHT expressed in the tapetum of Arabidopsis anthers [35]. Moreover, SHT was assigned into 'cluster 81' by the online tool of FlowerNet [36], which includes several genes such as KCS10, GH31 and ATA7; their homologs in sesame (i.e. SIN_1007525, SIN_1025709 and SIN_1002500) were co-up-regulated in sterile buds (Additional file 1: Table S2), implying their possible involvement in MS. This 'cluster 81' also contained TSM1 (tapetum-specific methyltransferase1), which encodes a cation-dependent CCoAOMT-like protein involved in phenylpropanoid polyamine conjugate biosynthesis and has a role in the stamen/pollen development of Arabidopsis [37]; the rest of genes with unknown functions are likely to play roles in pollen exine and lipid biosynthesis, based on their description in AtEnsembl [36]. Therefore, it would be worthy of investigating the rest genes within this cluster to get a clear view of their function. JA is specifically required for anther dehiscence during anther development [38]. Mutations in genes that participate in JA biosynthesis and perception cause a failure or delay in anther dehiscence and pollen inviability which result in male sterility [39]. Examples of such genes include the DEFECTIVE IN ANTHER DEHIS-CENCE 1 (DAD1), which encodes a phospholipase A1 that catalyses the initial step of JA biosynthesis; AOS, a gene that encodes allene oxide synthase; DEHISCENCE 1 (DDE1)/OPR3, which encodes the OPR protein 12oxo-phytodienoic acid reductase in the JA synthesis pathway [40]. Defects in all stages of the JA pathway appear to cause similar phenotypes of reduced filament elongation and a lack of dehiscence. Delayed dehiscence or non-dehiscence phenotypes have been observed in mutants defective in JA biosynthetic enzymes [13]. In this study, SIN_1016850 (homolog of PLA15, Phospholipase A1-Igamma1) was significantly up-regulated in sterile buds, whereas the homologs of allene oxide synthase encoding genes did not show differences (data not shown). However, SIN_1022877 and SIN_1022878, which are homologs of OPR1 (12-oxophytodienoate reductase 1) in Arabidopsis, displayed obvious downregulation in sterile buds (Additional file 1: Table S2). These data strongly indicated that genes involved in JA pathway are also responsible for MS in sesame.
Plant gene expression regulation is a complicated network. Through specific interactions with cis-acting target elements, transcription factors can regulate a series of relevant down-stream targets, which play an important role in plant development and the response to environmental stress. Arabidopsis ANTHER IN-DEHISCENCE FACTOR (AIF), a NAC-like gene, acts as a repressor that controls anther dehiscence by regulating genes in the jasmonate biosynthesis [38]. In fact, for the annotated NACs in Swissprot, all of the 9 sesame homologs were up-regulated in sterile buds, which strengthen the role of NACs in the regulation of MS (Fig. 3, Additional file 1: Table S2). Furthermore, 11 of the 12 WRKYs that were significantly up-regulated in sterile buds, were annotated as the orthologs of WRKY33 (Fig. 3, Additional files 1: Table S2). WRKY33 proteins are evolutionarily conserved with a critical role in broad plant stress responses, and Arabidopsis WRKY33 is a key transcriptional regulator of hormonal and metabolic responses [41]. Moreover, genes involved in redox homeostasis, salicylic acid (SA) signaling, ethylene-JA-mediated cross-communication and camalexin biosynthesis were identified as direct targets of WRKY33 [42]. Furthermore, the down-regulation of JA-associated responses appears to involve direct activation of several jasmonate ZIM-domain genes, encoding repressors of the JA-response pathway, by loss of WRKY33 function and by additional SA-dependent WRKY factors. In the present study, the co-expression behavior of NACs and WRKYs suggested their pivotal roles in regulating the sesame MS (Fig. 3, Additional file 1: Table S2).
To understand the impact of sequence variation on gene expression, the effects of allelic variants on the function of their target genes were predicted using SnpEff. Interestingly, 6 InDels were found in fertile alleles, which were up-regulated in fertile buds (and the wild-type sterile allele had lower level of expression in sterile buds); and 7 InDels were found in sterile alleles, which were up-regulated in sterile buds (Table 4). This observation suggested that the causal effect of sequence variation on transcript abundance was not so straightforward, but rather confound. This can be explained by the way that most of the InDels were detected in coding regions rather than in the promoter regions, in which it can directly affect the transcript abundance. Occasionally, we also identified InDels showing a transcriptionalregulatory function, in which the transcript abundance was decreased by the existing of causative InDels. For example, two genes (SIN_1025700 and SIN_1005818) with InDels in sterile alleles caused a decrease of transcript abundances in sterile buds, and another two genes (SIN_1004703 and SIN_1019529) with InDels in fertile alleles led to the down-regulation of genes in fertile buds, thus demonstrating a cis-acting fashion.
As suggested by Rutley and Twell [43], transcriptome studies of the male gametophyte have not only increased our knowledge and understanding, but also improved the efficacy of experimental strategies by informing experimental design (such as by gene selection for reverse genetics) and through query-based and co-expression analysis. The present investigation provided many DEGs and a number of candidate genes that can be used to elucidate the molecular mechanism underlying sesame DGMS through transgenic verification in future.

Conclusions
This study provided a set of 1,502 genes differentially expressed in the fertile and sterile buds of sesame DGMS lines based on transcriptome profiling. Half of these genes were up-regulated in sterile buds, demonstrating a complex expression pattern. Regarding the genes implicated in ethylene and JA synthesis & signaling, the expression of which were up-and down-regulated in the sterile buds, respectively. Furthermore, the majority of NAC and WRKY transcription factors were up-regulated in sterile buds.
Moreover, 49 sesame genes with homologs in Arabidopsis related with male-sterility/male-reproduction showed reduced expression in sterile flower. Some of these genes encode transcription factors (bHLH089, MYB99, and AMS) that possibly have a role in specifying or determining tapetal fate and development. Furthermore, the predicted effect of allelic variants on the function of target gene highlighted several InDels, which might contribute to fertility determination.

Plant materials and RNA preparation
The sesame plant materials used in this study include the newly developed DGMS line W1098A and its fertile counterpart W1098B, which differed from each other only by pollen fertility [8]. These two lines were both cultivated in the experimental fields of the Oil Crops Research Institute, CAAS (Wuhan, Hubei Province, China). Buds with a length of~2.5 mm were separately stripped from each of five male sterile and fertile plants and bulked for transcriptomic profiling. The fertile bulk and the sterile bulk of buds were immediately snapfrozen in liquid nitrogen and then stored at −80°C freezer until use. Total RNA was isolated from bulks of sterile buds and fertile buds with TRIzol reagent (Gibco-BRL) according to the manufacturer's instruction. Then two cDNA libraries were constructed from sterile and fertile buds, as previously described in sesame [44]. Briefly, approximately 5 mg of mRNA was fragmented, converted to cDNA, and PCR amplified according to the Illumina RNA-Seq protocol (Illumina, Inc. San Diego, CA). Sequence reads were generated using the Illumina Genome AnalyzerII (SanDiego, CA) and Illumina HiSeq 2000 platform (San Diego, CA) at the Beijing Genomics Institute (Shen Zhen, China).

Identification of Differentially Expressed Genes
The clean reads were mapped to the reference genome sequence of S. indicum (http://ocri-genomics.org/Sinbase/) [11] using SOAP aligner/soap2 (an improved ultrafast tool for short read alignment) [45]. RPKM were used to gauge the relative transcript abundance for each gene. Using the DEGseq program, significantly differential gene expression was identified between the fertile and sterile buds libraries [46]. The FDR was used to determine the threshold p-value. In this study, a stringent of FDR ≤ 0.001 and │log2 (Fold change ratio of sterile/fertile)│ ≥ 1.00 was used as the threshold to select a significantly different expressed gene.

Characterization of genetic variations
Characterization of the sequence variants such as InDels was performed using SnpEff version 4.1 [47] by referring to sesame genome annotation downloaded from the Sinbase (http://ocri-genomics.org/Sinbase) according to Wang et al. [48]. Sequence variants (InDels, frame shift, stop gained, stop lost and non synormymous coding) that potentially have high impact on transcript/ protein were predicted according to the method described by Saeed et al. [49].

GO and KEGG Pathway Enrichment Analysis
The DEGs were used for GO and pathway enrichment analysis. A corrected P ≤ 0.05 was selected as the threshold of significance to determine enrichment in the gene sets [50]. Functional classes inferred from DEGs were assigned according to GO mapping provided by the ensemble database. The Blast2GO program (https://www.blast2go.com/) was used to obtain GO annotations for the all DEGs [51]. Then, the results were submitted to WEGO (http://wego. genomics.org.cn) to generate a GO classification graph of all DEGs [52].
KEGG pathway analysis was based on the comparative results between our maped genes and the current KEGG database [53]. MapMan (version 3.5.1 R2) was also used to annotate the DEGs onto metabolic pathways.

Confirmation of candidate DEGs by qRT-PCR
To validate the DEGs detected by RNA-seq, 20 DEGs were randomly selected from 52 common differentially expressed genes in two libraries and then subjected to qRT-PCR analysis, according to Qi et al. [54]. Gene-specific primers were designed with the online tool Primer3 [55] based on the selected unigenes sequences (Additional file 8: Table  S1). Reactions were performed with the SYBR Green Real time PCR Master Mix (TOYOBO, Japan) in a Bio-Rad CFX96 instrument. For each sample, three replicates were run for each gene in a 96-well plate. The relative expression level of each gene was determined using the 2 −ΔΔC T method [56]. All data are expressed as mean ± standard deviation.