Differentially Expressed Genes between Carrot Petaloid Cytoplasmic Male Sterile and Maintainer during Floral Development

Petaloid cytoplasmic male sterility (CMS) is a maternally inherited loss of male fertility due to the complete conversion of stamens into petal-like organs, and CMS lines have been widely utilized in carrot breeding. Petaloid CMS is an ideal model not only for studying the mitochondrial–nuclear interaction but also for discovering genes that are essential for floral organ development. To investigate the comprehensive mechanism of CMS and homeotic organ alternation during carrot flower development, we conducted transcriptome analysis between the petaloid CMS line (P2S) and its maintainer line (P2M) at four flower developmental stages (T1–T4). A total of 2838 genes were found to be differentially expressed, among which 1495 genes were significantly downregulated and 1343 genes were significantly upregulated in the CMS line. Functional analysis showed that most of the differentially expressed genes (DEGs) were involved in protein processing in the endoplasmic reticulum, plant hormone signal transduction, and biosynthesis. A total of 16 MADS-box genes were grouped into class A, B, C, and E, but not class D, genes. Several key genes associated with oxidative phosphorylation showed continuously low expression from stage T2 in P2S, and the expression of DcPI and DcAG-like genes also greatly decreased at stage T2 in P2S. This indicated that energy deficiency might inhibit the expression of B- and C-class MADS-box genes resulting in the conversion of stamens into petals. Stamen petaloidy may act as an intrinsic stress, upregulating the expression of heat shock protein (HSP) genes and MADS-box genes at stages T3 and T4 in P2S, which results in some fertile revertants. This study will provide a better understanding of carrot petaloid CMS and floral development as a basis for further research.


Phenotypic analysis of floral development in the petaloid cytoplasmic male sterility (CMS) line (P2S) and its maintainer line (P2M).
To enable more detailed detection of the differentiation process in floral development, we compared the morphological change of umbels from P2S ( Fig. 1e-h) and P2M (Fig. 1j-m) at four developmental stages by scanning electron microscopy. The size of floral buds was sampled according to the previous phenotype observation 40 . At stage T1, only one shoot apical meristem germinated in the middle of the stem apex (Fig. 1e,j). During stage T2, some young inflorescences had appeared at the periphery of the umbel with multiple germinated inflorescence primordia in the center (Fig. 1f,k). At stage T3 which was similar with the stages s2-s3 40 , the inflorescence gradually became mature and many floret primordia germinated in the middle of the inflorescence (Fig. 1g,l). At stage T4, floret primordia had developed into young florets in each inflorescence and formed whole compound umbels (Fig. 1h,m). There was no obvious morphological difference between P2S and P2M in the four stages. However, when the flowers opened, the stamens of P2M normal fertile flowers were substituted by petals in P2S (Fig. 1n,i). Additionally, the real petals of P2S were a greenish color, different to that of the normal white petals of P2M. It is interesting that about 3.4% (15/440) of revertant plants with normal stamens were found in P2S.

Sequence assembly and functional annotation of differentially expressed genes (DEGs).
The gene regulation of stamen and petals development is initiated earlier than the organs morphological differentiation 31 . In order to explore the regulatory networks related to carrot CMS, we performed RNA-seq analysis of umbels from P2S and P2M at four stages. After Illumina sequencing and data processing, a total of 118.03 Gbp of clean data was obtained, with the data of each sample occupying more than 4.13 Gbp. The Q30 base percentage of all samples was above 88.65% and the average GC content was 44.92%, indicating that the quality of sequencing data was sufficient for the subsequent analysis of DEGs. Then, these clean reads were mapped to the carrot reference genome, and fragments per kilobase of transcript per million fragments mapped (FPKM) was used to evaluate the expression level of each gene. An average of 24,678; 24,438; 24,539; and 24,529 P2S genes were identified as expressed at stages T1, T2, T3, and T4, respectively, and an average of 25,212; 23,942; 25,571; and 24,768 P2M genes were identified as expressed at stages T1, T2, T3, and T4, respectively. Additionally, 1319 novel transcripts were detected in this work, 861 of which were functionally annotated in public databases (Table S1).
A false discovery rate (FDR) of <0.01 and fold change of >2 were used as the thresholds for screening DEGs between P2S and P2M libraries. The analysis identified a total of 2838 genes as being differentially expressed at the four flower developmental stages, of which 1495 were significantly upregulated and 1343 were significantly downregulated in P2M compared to P2S (Fig. 2). Compared to P2S, 217, 470, 1355, and 796 P2M genes were identified as being differentially expressed at stages T1, T2, T3, and T4, respectively (Fig. 2a). Additionally, 31 genes were found to be significantly differentially expressed between P2S and P2M at the latter three stages, 11 of which were heat shock proteins (Table S2). Three genes were differentially expressed in the previous three stages, but no genes for all of the four stages (Fig. 2a). Compared with P2S, 44 genes were upregulated and 173 genes were downregulated at stage T1 in P2M. A total of 209 genes and 261 genes were identified to be respectively upregulated and downregulated at stage T2 in P2M. The largest difference existed between P2M and P2S at stage T3, with 916 upregulated genes and 439 downregulated genes identified in P2M. At stage T4, 326 genes were upregulated and 470 genes were downregulated in P2M (Fig. 2b).
To classify the functions of the DEGs, gene ontology (GO) classification and enrichment analysis was performed. A total of 126, 234, 768, and 470 DEGs were categorized into 52 major functional groups at stages T1, T2, T3, and T4, respectively (Fig. 3). The molecular function (MF) of the DEGs mainly embodied catalytic activity, binding, transporter activity, and electron carrier activity. The DEGs categorized as functioning in a biological process (BP) were highly represented in metabolic process, cellular process, single-organism process and response to stimulus. The DEGs that functioned as a cellular component (CC) had some role mainly in cell parts, cells, and organelles at the four stages. However, the differential expression of some genes occurred at different stages (Fig. 3). Within the BP category, the identified DEGs were embodied in growth, immune system process, reproduction, biological adhesion, and rhythmic process at the latter developmental stages. Similarly, DEGs with a CC role were only differentially expressed in the extracellular region during the latter three stages. DEGs with a MF involving structural molecular activity were only detected at stage T4, and regarding receptor activity, at stages T2 and T3. Additionally, when using GO enrichment analysis, it was found that, at the latter three stages, the terms "response to stress" and "response to abiotic stimulus" were significantly enriched in BP, and "UDP-glucosyltransferase activity" was significantly enriched in MF (Table S3). The terms "organic phosphonate transmembrane-transporting ATPase activity" and "mitochondrial proton-transporting ATP synthase complex, catalytic core F(1)" were significantly enriched at stages T2 and T3 (Table S3).
To further explore the biological function of DEGs, we conducted KEGG pathway analysis of all DEGs. Pathways with P value < 0.05 were defined as significantly enriched pathways of DEGs. A total of 47, 91, 249, and 188 DEGs were respectively assigned to 35, 52, 97, and 83 pathways at stages T1, T2, T3, and T4, respectively ( Fig. 4, Table S4). The pathways with the largest number of DEGs were related to protein processing in the endoplasmic reticulum (ER), followed by pathways related to plant hormone signal transduction and then phenylpropanoid biosynthesis. KEGG enrichment analysis revealed that protein processing in the endoplasmic reticulum pathway was enriched at stages T2 and T4. Moreover, the biosynthesis of unsaturated fatty acids and fatty acid metabolism were also enriched at stage T4. A total of 28 and 31 DEGs were involved in plant hormone signal transduction and phenylpropanoid biosynthesis pathways, respectively. Furthermore, 16 genes encoding pentatricopeptide repeat protein were detected to be differentially expressed between P2S and P2M, and seven of them were targeted to mitochondria (Fig. S1).
DEGs involved in oxidative phosphorylation. As the center for ATP synthesis, mitochondria play a crucial role in CMS in many plant species 2 . In the present study, 12 genes involved in oxidative phosphorylation were identified to be differentially expressed between P2S and P2M. Among them, two genes encoding NADH dehydrogenase (DcNad2 and DcNad5) were significantly downregulated at stage T2 in P2S when compared to P2M (Fig. 5a, Table S5). Another NADH dehydrogenase gene (DcNad9) was distinctly downregulated in P2S at the three latter stages, and had the highest expression at stage T3 in P2M. The qPCR results confirmed that the expression of DcNad9 in P2M were upregulated 3.5-fold and 2.6-fold at stages T3 and T4, respectively (Fig. 5a,b).
Four transcript genes encoding cytochrome c oxidase, DcCox2, DcCox3, LOC108214800/Cox2, and LOC108222083/Cox1, were also detected. Notably, it was found that the nuclear gene LOC108214800 was 99% identical with DcCox2, which was located in mitochondria (Table S5). The two Cox2-encoding genes and DcCox3 exhibited similar expression patterns between P2S and P2M. The expressions of DcCox2 and DcCox3 in P2M were 5.0-fold and 5.1-fold higher than those in P2S, and reached peak expression at stage T3 in P2M, as verified by qPCR analysis (Fig. 5a,b). Another cytochrome c oxidase gene, LOC108222083/Cox1, was downregulated only at stage T3 in P2S compared with P2M.
Moreover, five ATPase genes, DcATP9, DcATPβ, LOC108192759/DcATP6, LOC108227727/ATPe, and NoveGene_602 (the homolog of ATPβ), also showed similarly downregulated expression patterns at stage T3 in www.nature.com/scientificreports www.nature.com/scientificreports/ P2S compared to P2M, but showed little difference at stages T1 and T2 (Fig. 5a). Validation of DcATP6, DcATP9, and DcATPβ by qPCR further confirmed our analysis. The expressions of DcATP6, DcATP9, and DcATPβ in P2M were respectively upregulated 2.2-fold, 2.6-fold, and 1.9-fold at stage T3 ( Fig. 5b). At stage T1, DcATP9 was more highly expressed in P2S than in P2M, as determined by RNA-seq and qPCR analysis. These results indicate that the synthesis and transportation of ATP might be disturbed in the CMS line.
DEGs involved in protein processing in endoplasmic reticulum. Protein processing in the endoplasmic reticulum is one of the most significantly enriched pathways at stages T2 and T4 between P2S and P2M. In this study, 38 DEGs assigned to this pathway were identified as heat shock protein (HSP) genes that were always associated with stress signal pathways, hormone receptors, kinase activity regulation, and intracellular transport [46][47][48] . Distinct from the expression patterns of DEGs involved in oxidative phosphorylation, most of the HSP genes displayed upregulated expression patterns at the latter three stages in P2S compared with P2M (Fig. 6a).
A total of 15 HSP20 genes displayed constantly high expression levels from stages T2 to T4 in P2S. By contrast, the expression of these genes in P2M was initially at low levels in stage T2, greatly increased up to the peak expression level in T3, and then declined in T4. However, the expression of HSP20 genes in P2M was significantly lower than that in P2S throughout the latter three stages. Similarly, three HSP70 genes (LOC108208857, LOC108222768, and LOC108223701), three HSP90 genes (LOC108210519, LOC108211101, and LOC108227515), and one gene encoding hsp70-interacting protein (LOC108210532) were also distinctly upregulated from stages T2 to T4 in P2S compared to P2M. Moreover, qPCR analysis showed that the expressions of LOC108222175/HSP20 and LOC108222768/HSP70 were significantly upregulated at stages T3 and T4 in P2S, which further confirmed the results of our transcriptome analysis (Fig. 6b). The high expression of HSP genes indicates that flower development in the CMS line might be related to the response to some kind of internal stress.
To analyze the function of these MADS-box genes, they were subjected to unrooted phylogenetic analysis with putative equivalents from 23 plant species based on 70 protein sequences (Fig. 7). The results suggest that DcMASD2/DcPI was grouped into the B class, subclustered with AtPI and AmGLO, and separated from AtAP3 and AmDEF. Three DcAP1 genes with 71.9% identities were assigned to the A class, adjacent to AmSQUA. Three DcDEF-like genes belonged to another subtree of the B class and were grouped with AtAP3 and AmDEF. DcAG, three DcAG-like genes, and DcAGL12 were assigned to the C class, clustered with AtAG, and were contiguous with VvAG, an AG gene in grapevine (Vitis vinifera). The identities of three DcAG-like genes were 78.6%. DcSEP1, DcSEP1L, and two DcAGL9 were grouped into the E class. No D-class genes were identified in carrot.
On the basis of the above analysis, we further analyzed the expression patterns of these MADS-box genes. Nine of them were detected to be differentially expressed between P2S and P2M (Fig. 8a). Little significant difference in expression was found for any of these MADS-box genes between P2S and P2M at stage T1, and the majority of them exhibited extremely low expression levels. It was also found that most of the MADS-box genes were significantly downregulated at stage T2 in P2S as compared to P2M, including DcPI, three DcAG-like genes, DcAGL9-1, and DcAGL12. However, conversely, the expression levels of all MADS-box genes except for DcAP1-1, DcAP1-2, and DcDEFL-3 increased at stage T3 in P2S, and exceeded the expression level of the same genes in P2M; however, these upregulated expression patterns were not considered statistically significant. Similarly, four MADS-box genes were significantly upregulated at stage T4 in P2S compared with P2M, including DcDEFL-3,  (Fig. 8b). The results showed that all of the selected MADS-box genes were expressed at a low level at stage T1 in both P2S and P2M, and showed no significant difference. However, the expression of the B-class gene DcPI in P2S was downregulated 3.2-fold at stage T2 compared to P2M. DcAG, DcAGL-1, and DcAGL-3 exhibited downregulated expression patterns at stage T2 in P2S, which coincided with the expression of DcPI. On the contrary, the expression levels of all selected MADS-box genes in P2S greatly increased at stage T3, and were significantly higher than the expression levels in P2M. The expressions of DcSEP1, DcSEP1L, and DcAGL9-1 were downregulated at stage T2, however, this change was not significant, though they were significantly upregulated at stage T3 in P2S. The qPCR results not only verified the RNA-seq results but also showed that the majority of these DEGs displayed similar expression tendencies at the same stages.

Discussion
Petaloid CMS is widely used in carrot breeding, and greatly promotes the development of carrot hybrid varieties 13 . To date, several studies have been performed on the characterization of petaloid CMS using different methods [15][16][17]19 . However, these studies were mainly focused on the differences in some mitochondrial genes between carrot CMS and fertile materials, and few studies have been carried out at the whole-transcriptome level. In this study, using the high-quality carrot genome 45 , we conducted comparative transcriptomic analysis between the petaloid CMS line and its maintainer line at four flower developmental stages in order to investigate the regulatory network related to CMS and flower organ development. To understand the process of flower formation and to perform a comprehensive transcriptional analysis during flower development, we firstly observed and analyzed the differentiation of flower organs in lines P2S and P2M throughout four developmental periods (Fig. 1). Some considerable differences were identified between the four flower developmental stages, however, little difference was observed between the two lines which is conscious with the previous observation that petaloid florets changed till organ primordia in whorl 3 emerged 40 , indicating that variation in the expression of genes crucial to CMS and flower development can be detected before the morphological change of floral organs in P2S and P2M.
In total, 426.86 million pair-end reads were generated, and the expression of more than 20,000 genes could be detected in each sample of P2S and P2M (Table S6). This large-scale transcriptome analysis provided us with a valuable resource not only for analyzing the regulatory genes related to petaloid CMS, but also for illuminating the developmental process of flower organ formation on a global level. The total number of DEGs gradually increased in parallel with the change of developmental stage, indicating that the regulation network of the male reproductive development process became more sophisticated. Moreover, the number of upregulated genes at stage T3 increased suddenly, and outnumbered the downregulated genes by more than two-fold compared to other stages (Fig. 2b), since floret primordia germinated at this stage and more genes were involved in metabolic processes and cellular processes at this developmental stage (Figs 3 and 4). GO functional analysis revealed that organic phosphonate transmembrane-transporting ATPase activity and mitochondrial proton-transporting ATP synthase complex were enriched, which suggested that alternation of the expression of ATPase-related genes might have an influence on energy metabolism between P2S and P2M. Additionally, KEGG analysis showed that many genes were involved in oxidative phosphorylation, protein processing in the endoplasmic reticulum, and others (Fig. 4), which suggests that these pathways might play important roles in carrot petaloid CMS and flower development. Furthermore, the expression pattern of DEGs and qPCR analysis indicated that most genes involved in these pathways, such as DcNad9, DcCox2, DcAG, and DcAGL-1, start to show different expression patterns at stages T2 and T3 (Figs 5-7), indicating that stage T2 might be a key stage during the regulation of petaloid CMS. www.nature.com/scientificreports www.nature.com/scientificreports/ Oxidative phosphorylation, also known as the mitochondrial electron transport chain, is one of the important energy production processes in mitochondria, and is crucial for normal plant development, especially flower organ development 49,50 . NADH dehydrogenase, cytochrome c oxidase, and ATPase are key elements of enzyme complexes in the mitochondrial respiratory chain. The activities of these genes are correlated with ATP production. Energy deficiency caused by components of oxidative phosphorylation is one of the important factors responsible for CMS and male flower organ development 7,8,51,52 . In this study, three genes encoding NADH dehydrogenase, four genes encoding cytochrome c oxidase, and four genes encoding ATPase were detected and analyzed; ATP8 was detected without difference, which is in agreement with the results of Robison and Wolyn 18 (Fig. 5). During the four developmental stages, the expression of most DEGs in P2M significantly increased at stages T2 and T3, and reached their peak expression at T3. By contrast, the expression of DEGs in P2S exhibited continuously low levels from stages T2 to T4. Thus, our results indicate that ATP production might be greatly inhibited in the CMS line due to the decline in transcript levels of mitochondrial respiratory chain components, and would therefore not be able to meet the requirements for normal stamen development. Interestingly, we also identified two genes, DcCox2 and LOC108214800, which shared 99% identity and were respectively located in the mitochondria and nucleus. This finding is consistent with the phenomenon that endosymbiotic genes are transferred from organelles to the nucleus 53,54 , which suggests that there might be an endosymbiotic gene process in the carrot genome.
The endoplasmic reticulum is an important subcellular organelle where proteins are folded with the help of lumenal chaperones 55 . Based on KEGG enrichment analysis, 38 DEGs were found to be enriched in protein processing in the endoplasmic reticulum, and most of them were classified as HSPs. HSPs function as molecular chaperones to regulate the folding, localization, and degradation of proteins in the endoplasmic reticulum 56 . More importantly, HSPs play vital roles in signal transduction during stress response 46,47 . Our results also showed that response to stress and abiotic stimulus were enriched in biological processes at stages T2-T4 (Table S3). Abiotic stresses, such as cold and heat, have great effects on floral organ development, especially male flower organs, and HSPs therefore play a vital role in maintaining normal fertility under abiotic stress conditions 57 . Stamen petaloidy in P2S might be considered as an intrinsic stress situation for plants. Thus, we presume that when P2S detects signals of stamen petaloidy and its own response to this internal stress is consequently triggered, resulting in the upregulated expression of HSP genes.
Petaloid CMS in carrot is characterized by male organ abnormality caused by the homeotic conversion of stamens into greenish petals (Fig. 1i,n). Previous studies in Arabidopsis suggested that the identity of floral organs during the developmental process was controlled by MADS-box genes 58 . According to the floral quartet model, class A, B, and E genes specify petals and class B, C, and E genes specify stamens 32 . In this study, 16 MADS-box genes were found, 9 of which were found to be differentially expressed between P2S and P2M (Table 1). Interestingly, there are A-, B-, C-, and E-class MADS-box genes in carrot, however, D-class genes are absent (Fig. 7). When the inflorescence primordia started to differentiate at stage T2, we observed a drastic decrease in the transcription levels of the B-class gene DcPI and three C-class genes DcAG, DcAGL-1, and DcAGL-3 in P2S (Fig. 8a,b). In homeotic Arabidopsis and Antirrhinum mutants with impaired B-class genes, petals are replaced by sepals and stamens are substituted by carpels 33,34 . The petaloid flowers strongly resemble homeotic Arabidopsis and Antirrhinum mutants with impaired C-class genes based on their petal-like structures in whorl 3 33,34 . In this study, three C-class DcAG-like genes exhibited similar downregulated expression patterns at stage T2 in P2S (Fig. 8a,b). Thus, our results suggest that the downregulated transcript levels of B-and C-class genes might play a key role in the homeotic conversion of stamens into petal-like organs in carrot. However, the expressions of B-, C-, and E-class genes in P2S sharply increased at stages T3 and T4 as compared to stage T2, and were significantly higher than those in P2M. The simultaneously upregulated expression patterns of DcPI, DcDEF, DcAG, DcAGLs, www.nature.com/scientificreports www.nature.com/scientificreports/ and DcSEP1s at stages T3 and T4 in P2S indicates their interdependence as described in four-quarter model in Arabidopsis, and that B-factors interacted with C-and E-factors to regulate stamen and petal development 32 . Additionally, the highly increased expression patterns of these MADS-box genes with high expression in P2S are in agreement with the upregulated expression profiles of HSP genes at stages T3 and T4 in P2S. We hypothesized that the low transcript abundance of B-and C-class genes in P2S at stage T2 might be associated with the petaloidy of stamens, and that T2 might be a crucial stage for stamen development in carrot. When plants receive stamen petaloidy signals at stage T2, they might regard these signals as internal stress, and thus trigger the upregulation of HSP genes and MADS-box genes to resist the development of male organ abnormality. This could explain the observation that there was a small proportion of revertant plants with normal stamens (3.4%) in P2S.
In general, previous studies have reported that mitochondrial genes could affect the expression of nuclear genes that are related to floral organ development 23,26,59 . In this study, our results indicate that several key genes involved in oxidative phosphorylation, such as NADH dehydrogenase, cytochrome c oxidase, and ATPase, show continuously low expression from stage T2, which caused energy deficiency in P2S (Fig. 5), and that the expression of DcPI and DcAG-like genes greatly decreased at stage T2 in P2S (Fig. 8), which might be one factor causing stamens to convert into petals. However, it is confusing that the MADS-box genes were expressed to a significantly higher degree in P2S than P2M at stages T3 and T4, while the expression of genes involved in oxidative phosphorylation was still low in P2S. Stamen petaloidy might be interpreted as an intrinsic stress by plants, and contribute to the upregulated expression of HSP genes and MADS-box genes at stages T3 and T4 in P2S (Figs 6  and 8), resulting in the occurrence of some fertile revertants. The results of this study provide an improved foundation for further research into the molecular mechanisms of carrot petaloid CMS and flower development. www.nature.com/scientificreports www.nature.com/scientificreports/ Future work focused on characterizing the function of these candidate genes will be helpful for understanding CMS and flower development in carrot.

Methods
Plant materials. The petaloid CMS line (P2S) at BC 1 S 4 M 6 and its maintainer line (P2M) at F 2 S 4 M 6 were used in this study, these two lines have very similar genetic background except the different male flower organs. Seeds corresponding to these two accessions were directly sown in the field at the Langfang station of the Chinese Academy of Agricultural Sciences on 28 July 2016. Roots were harvested in late November, stored under cold store, and transplanted in the field in late March 2017. Considering that gene expression usually displays significant differences prior to morphological changes, we chose the period when the two accessions had gone through vernalization as the first sampling stage (T1) 14 , in order to ensure a more comprehensive detection of www.nature.com/scientificreports www.nature.com/scientificreports/ gene expression. For the next three sampling stages, T2, T3, and T4, we chose periods according to the previous phenotype observation of flowers 40 when the transverse length of umbels had respectively reached 2.0, 3.0, and 4.0 mm. The stem apex samples of lines P2S and P2M at stage T1, and umbels grown at lateral branches at the latter three stages, were collected for RNA-seq and qPCR analysis using three biological replicates. All samples were frozen in liquid nitrogen and stored at −80 °C until needed.
Scanning electron microscopy of stem apex and inflorescence. Samples of P2S and P2M at different developmental stages were dissected under an anatomical lens, and then transferred to 2.5% glutaraldehyde fixative and fixed for 4 h at room temperature with intermittent shaking. Then, the samples were washed three times in 0.1 M phosphate buffer (pH 7.2) before the addition of aqueous osmium tetroxide and post-fixed in 1% aqueous osmium tetroxide for 2 h. After that, materials were washed three times with ddH 2 O to remove the remaining osmium tetroxide and were then dehydrated through 20 min exposures to a series of increasing ethanol gradients; the last dehydration with 100% ethanol needed to be repeated three times to ensure that samples were thoroughly dehydrated. After dehydration, samples were submerged in isoamyl acetate for 30 min to replace the ethanol. Subsequently, samples were dried using the CO 2 critical point drying method. Then, samples were mounted on a scanning electron microscopy (SEM) sample platform and coated with palladium-gold using a Hitachi IB-5 ion plating instrument (Hitachi, Tokyo, Japan). The prepared samples were viewed and photographed using a Hitachi SU8000 SEM (Hitachi, Tokyo, Japan).
RNA extraction and library construction for transcriptome sequencing. Total RNA from stem apex samples and young umbels was isolated using the RNAprep Pure Plant Kit (TianGen, Beijing, China). The assessments of RNA quantity and quality were followed by Wang et al. method 60 . Sequencing libraries were constructed using the NEBNext Ultra TM RNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA) and performed as described in Ou et al. 61 . The library construction and Illumina sequencing were performed on an Illumina HiSeq 2500 platform by Biomarker Technologies Co., Ltd. (Beijing, China).

Differential expression analysis and functional annotation. After removal of the adaptor sequences
and low-quality sequence reads, raw sequences were transformed into clean reads. These clean reads were then mapped to the reference genome of carrot 45 using TopHat2 62 . Gene expression levels were estimated in fragments per kilobase of transcript per million fragments mapped (FPKM). Prior to differential gene expression analysis, for each sequenced library, the read counts were adjusted using the edgeR program package 63 . The resulting FDR was adjusted using the posterior probability of being differential expression (PPDE). FDR <0.01 and |log2(fold change)|≥1 were set as the threshold for significant differential expression. Venn diagrams of DEGs were created using VENNY (http://bioinfogp.cnb.csic.es/tools/venny/index.html).
Gene function was annotated based on the following databases: the NCBI non-redundant protein sequences (nr) database, the NCBI nucleotide sequences (nt) database, the protein family (Pfam) database, the KOG/COG database, the Swiss-Prot database, the Gene Ontology database, and the KEGG database. GO enrichment analysis of DEGs was implemented by the GOseq R package-based Wallenius non-central hypergeometric distribution 64 . KEGG pathway enrichment analysis was based on all detected DEGs between male sterile and fertile samples using the KEGG pathway database 65 . KOBAS software was used to test the statistical enrichment of differentially expressed genes in KEGG pathways 66 .
Phylogenetic analysis of mads-box genes. The 70 protein sequences of the MADS-box genes in 24 representative species (Fig. 7) were downloaded from the NCBI database. Multiple sequence alignment was conducted using ClustalW. Evolutionary analyses were performed using MEGA 7.0 67 , using the neighbor-joining method to infer evolutionary history and p-distance method to compute distance. The bootstrap consensus tree inferred from 1000 replicates was taken to represent the evolutionary history of the taxa 68 .
Quantitative real-time PCR (qPCR) analysis. To verify the expression of DEGs identified by RNA-seq, qPCR analyses were performed on samples from the CMS line and its maintainer line at four developmental stages. The RNA samples used for qPCR were identical to those used for RNA-seq. The qPCR reaction procedure and analysis of relative expression level followed that of Wang et al. 60 using a Bio-Rad CFX96 instrument (Bio-Rad, Hercules, CA, USA). Gene-specific primers (Table S7) for qPCR were designed using Primer Premier 5.0 based on the candidate gene sequences. Three biological replicates and three technical replicates were conducted for each sample. The relative expression level of each gene was calculated using the 2 −ΔΔCt method and all data are shown as mean ± standard deviation between three biological replicates 69 .