Genome-wide analysis of R2R3-MYB transcription factors family in the autopolyploid Saccharum spontaneum: an exploration of dominance expression and stress response

Background Sugarcane (Saccharum) is the most critical sugar crop worldwide. As one of the most enriched transcription factor families in plants, MYB genes display a great potential to contribute to sugarcane improvement by trait modification. We have identified the sugarcane MYB gene family at a whole-genome level through systematic evolution analyses and expression profiling. R2R3-MYB is a large subfamily involved in many plant-specific processes. Results A total of 202 R2R3-MYB genes (356 alleles) were identified in the polyploid Saccharum spontaneum genomic sequence and classified into 15 subgroups by phylogenetic analysis. The sugarcane MYB family had more members by a comparative analysis in sorghum and significant advantages among most plants, especially grasses. Collinearity analysis revealed that 70% of the SsR2R3-MYB genes had experienced duplication events, logically suggesting the contributors to the MYB gene family expansion. Functional characterization was performed to identify 56 SsR2R3-MYB genes involved in various plant bioprocesses with expression profiling analysis on 60 RNA-seq databases. We identified 22 MYB genes specifically expressed in the stem, of which RT-qPCR validated MYB43, MYB53, MYB65, MYB78, and MYB99. Allelic expression dominance analysis implied the differential expression of alleles might be responsible for the high expression of MYB in the stem. MYB169, MYB181, MYB192 were identified as candidate C4 photosynthetic regulators by C4 expression pattern and robust circadian oscillations. Furthermore, stress expression analysis showed that MYB36, MYB48, MYB54, MYB61 actively responded to drought treatment; 19 and 10 MYB genes were involved in response to the sugarcane pokkah boeng and mosaic disease, respectively. Conclusions This is the first report on genome-wide analysis of the MYB gene family in sugarcane. SsMYBs probably played an essential role in stem development and the adaptation of various stress conditions. The results will provide detailed insights and rich resources to understand the functional diversity of MYB transcription factors and facilitate the breeding of essential traits in sugarcane. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07689-w.

Conclusions: This is the first report on genome-wide analysis of the MYB gene family in sugarcane. SsMYBs probably played an essential role in stem development and the adaptation of various stress conditions. The results will provide detailed insights and rich resources to understand the functional diversity of MYB transcription factors and facilitate the breeding of essential traits in sugarcane.
Keywords: MYB, Sugarcane, Expression analysis of stress, Allelic diversity Background Modern cultivated sugarcane (Saccharum spp.) is the primary source of sugar for the world. It is the topmost crop concerning total biomass production and is listed among the 10 most valuable crops [1]. Sugarcane, having a complex genetic background resulting from polyploid interspecific hybrids, was first domesticated approximately 10,000 years ago in New Guinea. Saccharum spontaneum contributes to 10-15% chromosomes in modern sugarcane cultivars, endowing the characteristics such as disease resistance and ratooning capacity [2]. The genome of haploid S. spontaneum has been assembled to the chromosome level and used as the reference genome of sugarcane [3]. Because of the development of multiple transcriptome models in recent times, including those for different tissues, developmental stages, and under various stress treatments, massive RNA-seq data are available, which can provide detailed insights and rich resources for studying sugarcane genes functions.
Transcription factors recognize specific DNA motifs in upstream regions of the genes to regulate their expression. MYB genes constitute one of the most prominent families of plant transcription factors and characteristically possess highly conserved Myb DNA-binding domains, forming a helix-turn-helix structure of about 52 amino acids [4]. MYB genes can be divided into four categories, including MYB-related, R2R3-MYB, R1R2R3-MYB, and atypical MYB, depending on the number of adjacent MYB repeats (R). Proteins with a single or a partial MYB repeat, generally located at either ends or middle of the peptide chain, are MYB-related.
MYB-related proteins include critical telomere binding proteins in maintaining the integrity of the chromosome structure [5]. Moreover, they also play an essential role in regulating gene transcription, e.g., the GARP family of plant Myb-related DNA binding motifs is involved in organ polarity in Arabidopsis [6]. Further, CIRCADIAN CLOCK ASSOCIATED1 (CCA1) and LATE ELON-GATED HYPOCOTYL (LHY) genes regulate the plant circadian clock [7]. A small number of members of R1R2R3-MYB genes are found in higher plants. Interestingly, plant R1R2R3-MYB genes share a similar function of regulating the cell cycle control with the animals [8]. R1R2R3-MYB has also been involved in cell differentiation [9] and plant stress tolerance [10].
Atypical MYB proteins contain four or more adjacent MYB repeats (R). These proteins have been found to encode in a few plants, e.g., Arabidopsis thaliana, Oryza sativa, Vitis vinifera, Glycine max, Physcomitrella patens (data sources displayed in Materials and Methods 2.1), as shown in Fig. 1. Only a few reports have been published about atypical MYB proteins by now, and the role of these proteins in the plant bioprocesses is mainly unknown. MYB transcription factors binding specific DNA sequence (CAACG/TG) result from domain structure that is formed by two closely packed amino acid sequence repeats(R) [11]. When the MYB gene contains at least two MYB repeats (R), it has transcription factor characteristics and explicitly recognizes the DNA motifs to regulate the gene transcription. R2R3-MYB proteins are the largest subfamily of MYB transcription factors in plants, as well as in S. spontaneum (Fig. 1). R2R3-MYB is characterized by two MYB repeats and the presence of a single amino acid (Leu) in the first (R2) repeat [12]. R2R3-MYB has two MYB repeats and a single amino acid (Leu) inserted in the first (R2) repeat. The R2R3-MYB family's expansion originated from the R1R2R3-MYB gene ancestor when losing the R1 repeat sequences during evolution [13] and benefiting from gene duplication events [14].
MYB genes are widely involved in plant-specific processes, such as differentiation [15], hormone response [16], secondary metabolism [17], environmental stress tolerance [18], and diseases resistance [19,20]. At least four MYB genes are involved in lignin biosynthesis in Arabidopsis by activating key regulator genes related to secondary cell wall formation [21]. Under environmental stress, MYB genes have been reported to function in response to adverse stress in Arabidopsis. Moreover, AtMYB2 and AtMYB96 function as transcriptional activators in ABA-inducible gene expression under drought stress [22]. AtMYB96 mediates abscisic acid signaling, induces pathogen resistance response by promoting salicylic acid biosynthesis, and provides drought tolerance via controlling the cuticular wax biosynthesis [20,23].
This study focused on the R2R3-MYB gene family in the S. spontaneum published sugarcane genome. We provided a detailed overview of phylogenetic relationship, gene structure, regulatory elements, expression profiles, allelic evolution, and functional characterization based on abundant transcriptome data. Our study systematically explored the evolutionary dynamics and functional diversification of SsR2R3-MYB genes and could facilitate future research on sugarcane MYB transcription factors.
To analyze the plant MYB genes thoroughly, 20 species representing 11 lineages were screened to construct a plant phylogenetic tree with S. spontaneum, including Green algae, Bryophyta, Gramineae, Cruciferous, Leguminous, Rosaceae, Solanaceae, and others. The tree topology reflected the phylogenetic relationship of these species and divergence time ( Fig. 1). Plant phylogeny showed that the higher plants possessed more MYB genes than the lower plants, such as green algae (e.g., Ostreococcus lucimarinus, Volvox carteri, and Chlamydomonas reinhardtii). A significant expansion of MYB genes was observed after the Cambrian (about 5404 80MYA), demonstrating an explosive biological diversification episode near the early period [24]. Most of the phylogenetic nodes of plant species were observed in the Cretaceous, a geological period when a typical global warming climate contributed to the terrestrial species diversity [25]. Compared with the other four kinds of grasses, S. spontaneum had one of the most significant MYB genes as predicted by PlantTFDB. One reason is the tetraploid nature of the autopolyploid S. spontaneum (octoploid). However, when corrected for ploidy level, the number of SsR2R3-MYB genes in S. spontaneum was still higher than most of the species, including Arabidopsis thaliana and other grass species. The number of MYB-related R2R3-MYB 3R-MYB Atypical MYB genes Total  30  5  1  -36  32  11  0  -43  31  7  0  -38  62  50  3  1  116  62  88  4  1  185  34  85  3  -122  122  125  3  2  MYB genes with a phylogenetic tree of the plant species indicated the expansion of MYB genes from alga to land plants, consistent with previous reports [26]. Maximum likelihood phylogenetic tree of R2R3-MYB genes from O. sativa and S. spontaneum showed that the sugarcane genome contained 15 subgroups (G1-G15) OsR2R3-MYB genes (Fig. 2, Table S2) with each distributing rice MYBs. Sugarcane and rice diverged in the Paleogene (67-26MYA) (Fig. 1); the short divergence time indicated relative conservatism of the ortholog genes. As expected, two species of R2R3-MYB genes were evenly distributed in the tree, and most genes in rice clustered with sugarcane, except for G8. However, the number of genes in each clade varied greatly; for instance, the biggest group, G3, contained 38 genes while the group G8 and G13 comprised just two genes. Twenty SsMYB genes from three unique subgroups, G2, and G10, did not contain rice genes, indicating the species' genetic divergence. Besides, the clusters depicted that the sugarcane MYB family exhibited a more  significant number of genes than that in rice, showing a significant expansion of the SsMYB family. In addition, we also constructed an ML phylogenetic tree with Arabidopsis ( Figure S1). Comparative analysis of two trees showed some differences because of different species, but overall high similarity implied the topology of reliability.
Analysis of genomic location, gene structure, and regulatory elements A total of 202 SsR2R3-MYB genes were named in turn according to their physical position on the chromosomes. MYB genes were distributed throughout all 32 chromosomes (Fig. 3c); the autopolyploid S. spontaneum genome comprised of eight homologous groups of four members each [3]. The chromosome distribution map showed that the location of the MYB genes was not evenly distributed. Most of the SsMYB genes were located on Chr3A and Chr7A, encompassing 19 and 16 genes, respectively. About 11 enrichment clusters, tiny fragments on genomic regions containing three MYB genes, were detected, and half of these genes had MYBbinding sites (MBS) depicting potential interaction among each cluster. However, some chromosomes only contained a few MYB genes. For instance, five chromosomes, including Chr2C, Chr2D, Chr6C, Chr8B, and Chr8D, had only one MYB gene. S. bicolor is one of the closest lineages of sugarcane, possessing relatively perfect genomic sequence data [27,28]. Total 125 SbR2R3-MYB genes were identified from the available sorghum genome using a similar method ( Fig. 1, Table S3). The diversity of the gene structure might be a shred of evidence regarding the evolution of gene families. The Neighbor-Joining method performed the phylogenetical gene structure analysis using diverse gene information ( Fig. 3a and Figure S2). The distribution of the tree branches was consistent with the structural features of the genes. Various sorghum genes were clustered with highly similar SsR2R3-MYB genes, e.g., SbMYB92 clustered with SsMYB149 and SsMYB156 while SbMYB27 was clustered with SsMYB30 and SsMYB44. These results sharpened our understanding of the evolution of gene events during sugarcane polyploidization. A total of 19 SsR2R3-MYB genes did not show the presence of intron, including SsMYB154, SsMYB188, SsMYB194, SsMYB170, SsMYB122, SsMYB182, and SsMYB189. Many MYB genes demonstrated a domain with a cross-intron structure.
Cis-elements in promoter regions play an essential role in controlling transcription and expression, which can deepen the understanding of the regulatory function of MYB genes. Total 2000 bp upstream of transcription initiation site (ATG) was regarded as MYB gene promoters and submitted to the PlantCARE for predicting the motifs. Various motifs from 202 SsR2R3-MYB gene promoters were involved in various plant bioprocesses (Fig.  3b). These diversified cis-regulatory elements could be divided into four main categories in terms of function: stress response, hormone response, light response, and plant growth and metabolism. A high percentage of MYB genes in the anaerobic induction (92%) and drought elements (58.9%) indicated that the MYB genes were more likely to function under these stresses. Moreover, a notable gene, MYB88, was found to have 10 LTR motifs, which is a cis-acting element involved in lowtemperature responsiveness. The significantly enriched LTR elements (5′-CCG AAA-3′) suggested that the MYB88 gene might be involved in plant metabolic response to cold stress. Many of the MYB genes regulate the plant hormone response, especially methyl jasmonate (MeJA) and abscisic acid (ABA) responsiveness. A total of 75 gene promoters had enriched regulatory elements TGACG-motif (5′-TGACG-3′) and CGTCA-motif (5′-CGTCA-3′) involved in MeJA-responsiveness, while 38 gene promoters enriched regulatory elements ABRE involved in abscisic acid responsiveness. These MYB genes were predicted to regulate MeJA and ABA signaling in plants and function in plant defense and leaf abscission. Furthermore, more than 30 light response-related elements were predicted; for instance, conservative light element G-box was widely present in the upstream sequence of genes. Several regulatory elements were also associated with other plant growth and development functions and regulation of seed growth and meristem development. Genes involved in seed-specific regulation contained the same RY-element (5′-CATGCATG-3′), and the elements involved in meristem expression demonstrated CAT-box (5′-GCC ACT-3′) and NON-box (5′-AGATCGACG-3′) in promoter regions. Finally, 119 genes were scattered on MYB binding sites, and 49 genes showed more than one binding site, suggesting that these genes probably interacted with other MYB genes. Four MYB binding elements were found in 202 SsR2R3-MYB promoters, including CCAAT-box (5′-CAACGG-3′), MBS (5′-CAACTG-3′), MBSI (5′-aaaAaaC(G/C)GTTA-3′), and MRE (5′-AACCTAA-3′). There was only one base difference between the former two elements, which accounted for 80% of the total MYB binding elements, suggesting the conservative nature of the sequence CAAC G/TG of the MYB binding site. The autoregulation of plant transcription factors is common in one family, which showed sequence-specific interactions of the family [29,30]. Dof1 binds the PEPC1 promoter, but Dof2 blocks the transactivation of Dof1 [31]. Some identified MYB genes were co-expressed on the STRING network (http://stringdb.org/), such as MYB114, MYB 168, and MYB167; MYB109, MYB108, and MYB47. These MYB genes with MYB binding site indicated the potential interaction  effects. Co-expression analysis further supports the hypothesis.

Pervasive gene duplications
Duplication is a striking feature of the plant genome. Gene duplication in the R2R3-MYB gene family occurred during earlier evolution in land plants and contributed to its amplification [32]. We estimated gene duplication events in the S. spontaneum genome by collinearity analysis. A total of 274 collinearity pairs of SsR2R3-MYB genes were identified by Blastp for all protein sequences and evaluated with MCScanX, including 144 allelic pairs and 130 non-allelic pairs (Fig. 4, Table S4). The collinearity relationships revealed that over half of the collinearity genes were concentrated in Chr 3 and Chr 7. The duplication events for MYB genes were predicted. Total 91 (25.84%) genes were tandem repeats, of which onequarter of genes were located on Chr 7. Furthermore, 146 (39.88%) genes were identified to derive from segmental duplication events; 28.1% genes on Chr 2 and 33.5% on Chr 3 evolved from segmental duplication (Fig.  4, Table S5). Segmental duplication played a critical role in the MYB gene evolution of S. spontaneum, similar to other species. About 66.5% of the R2R3-MYB genes derived from gene duplication events drove the MYB gene family expansion.
Temporal and spatial expression of the R2R3-MYB gene family To characterize the expression profile of MYB transcription factors, the temporally and spatially expression profiles of 202 SsR2R3-MYB genes were analyzed using a total of 50 RNA-seq data among three transcriptome models, including tissue and developmental stages, leaf developmental gradient, and circadian rhythm. The expression heatmap showed that most of the MYB genes had low expression levels, but 71% of gene expression values were greater than 1 (FPKM) in at least one RNA-seq sample (Fig. 5a, Table S6). Expression values of 15 MYB groups were presented in Table S6, and group G6 genes seemed to be expressed greater than that in the other groups. Five different expression patterns, i.e., C1-C5, were investigated on the tissue and developmental stages transcriptome by K-means (Fig. 5b). A total of 85 SsR2R3-MYB genes belonging to the C1 and C3 clusters had low expression value, particularly C1 genes with almost no expression. On the contrary, C2 cluster genes displayed a relatively higher expression level in all developmental periods of leaf and stem. Interestingly, 37 genes of the C4 cluster were highly expressed in the stem during the seedling stage, the early stage of the stem formation ( Figure S3A). Moreover, in the C5 cluster, 35 genes were highly expressed in the stem during each period, probably playing a regulatory role in the stem development ( Figure S3B). The clusters indicated that the global gene expression levels in the stem were significantly higher than those in the leaves, suggesting SsR2R3-MYB genes might play an essential role in stem tissue. The relative expression of SsMYB43, SsMYB52, SsMYB65, SsMYB78, and SsMYB99 were quantified by RT-qPCR, verifying the results of RNA-seq data ( Figure  S4A); additionally, SsMYB3, SsMYB15, and SsMYB157 predominant expressed in the early stage of stem formation depicted as prophase of the stem (Pro-stem), which was much higher than those in other stem nodes and leaf tissues ( Figure S4B).
Sugarcane is a typical C 4 plant with high light use efficiency. The developmental gradient model of grass leaves could be used to study C 4 photosynthesis and its regulatory factors [33,34]. The regulatory role of SsMYB genes on C 4 photosynthesis was investigated on the developmental dynamical transcriptome of sugarcane leaf. As suggested by the C 4 photosynthetic development model, leaves are gradually differentiated for active photosynthesis [33]. A total of 27 differentially expressed (See figure on previous page.) Fig. 3 Structure, distribution, and regulatory elements of SsR2R3-MYB genes. a Comparison of gene structure between S. spontaneum and S. bicolor based on the phylogenetic tree. ClustalX performed the sequence alignment of SsR2R3-MYB and SbR2R3-MYB proteins, and the Phylogenetic tree was constructed using MEGA 7.0 with Neighbor-Joining (NJ) method, 1000 bootstrap replicates, Pairwise deletion, and Bootstrap values on the nodes. SsMYB gene names are marked black, and SbMYB gene names are marked red. Gene sequences were modified to start at the transcription initiation site (ATG), and gene structures were displayed using GSDS2.0 (http://gsds.cbi.pku.edu.cn/). The CDS sequence and intron were represented as fine lines and yellow cylinders, and green cylinders highlighted the MYB domain. The first subgroup was presented here when the estimated phylogenetic relationship of S. spontaneum and S. bicolor and others were shown in Figure S2. b Cisregulatory elements of SsR2R3-MYB gene promoters with diversified plant biological functions. The functions of the predicted cis-regulatory elements covered four main categories: stress response, hormone response, light response, plant growth, and metabolism. The x-axis showed divers plant biological functions, and the y-axis indicated the number of a specific category of genes in that main category. The red rectangle represented the genes containing more than six elements involved in regulating a particular plant function. c Distribution of SsR2R3-MYB gene members in S. spontaneum genome. 202 SsR2R3-MYB genes were named according to their physical position on the chromosome and tagged in red font. Yellow font indicated chromosome name, and chromosome was represented as hollow cylinders with length scale (bp) on the left. The green spots displayed a gene enrichment cluster SsR2R3-MYB genes were detected by the leaf developmental gradient. Most of the genes (Class I) showed an expression profile, illustrating high value in the early stage of leaf development ( Figure S5). Only three genes SsMYB169, SsMYB181, and SsMYB192 in Class II (Fig.  5c, Figure S5), were identified as putative C 4 -related transcription factors using the method that associated the co-expression pattern with the photosynthetic activity [35]. The expression increased with the development of C 4 photosynthesis and displayed the highest accumulation at the leaf mature zone. SsMYB181 and SsMYB192 shared one haplotype gene Sspon.07G0015250 with SsMYB169, as the tandem genes SsMYB181 and SsMYB192 derived from a gene duplication event. Circadian rhythm is another module to study photosynthesis, in which previously identified C 4 -related regulators could also be verified. Nine SsR2R3-MYB genes showed a significant association of expression profile with the    Fig. 4 Collinearity relationships of SsR2R3-MYB genes on the S. spontaneum genome. SsR2R3-MYB collinear gene pairs were mapped to their respective locus in the S. spontaneum genome in a circular diagram. Genes located on the same chromosome (e.g., Chr1, Chr2, Chr3, Chr4, Chr5, Chr6, Chr7, Chr8) shared one line, and the inter-chromosomal collinear genes pairs were linked with the former linear color light-dark cycle (Fig. 5d). Their expression patterns were divided into three modules, each module containing three genes. The expression level of SsMYB169, SsMYB159, and SsMYB153 tailed off during the daytime until around 6:00 pm and then gradually recovered till the next cycle. However, SsMYB48, SsMYB57, and SsMYB158 raised their expressions during the day and fell at night. Unexpected but reasonable, the preliminary identification of three C 4 -related regulators, SsMYB169, SsMYB181, and SsMYB192, also showed daylight expression pattern, hinting at their involvement in the regulation of circadian rhythm, indicating that the three candidate MYB transcription factors were associated with C 4 photosynthesis.

MYB genes involved in response to drought and diseaseinduced stress
The expression patterns of SsR2R3-MYB genes were evaluated under environmental stress (biotic and abiotic stress). Six SsR2R3-MYB genes with significantly differential expression were responsive to drought induction (Fig. 6a, Table S7). The transcripts of four genes, SsMYB54, SsMYB36, SsMYB61, and SsMYB48, rapidly accumulated after drought treatment, but their expression reduced normal level after rehydration. On the other hand, SsMYB29 and SsMYB166 showed the opposite trend. Further, the upstream regulatory elements of these six genes contained the MBS element (5′-CAAC TG-3′), which was identified as MYB binding site    involved in drought-inducibility. Half of these genes retained more than one MBS. Pokkah boeng disease of sugarcane (PBD) is one of the most severe and devastating diseases caused by the Fusarium species complex, a fungal pathogen [36,37]. Nineteen different MYBs were associated with sugarcane PBD-infection and response (Fig. 6b, Table S7). According to the gene expression profiles, these genes included 14 genes with increased expression in defense response and five genes with reduced expression.
Sugarcane mosaic disease is a highly transmissible viral disease present in cane-growing regions worldwide. Sugarcane mosaic virus (SCMV), belonging to the positive-sense single-stranded RNA viruses, reduces yields by damaging chloroplast and blocking photosynthesis [38,39]. After SCMV infection, 10 SsR2R3-MYB gene expression increased, and one gene, MYB176, decreased, suggesting that these MYB genes were involved in defense against SCMV infection (Fig. 6c, Table S7). We discovered that these MYB genes were unique to sugarcane diseases, indicating the defense specificity of MYB genes for conferring the resistance of sugarcane pokkah boeng and mosaic disease.

Functional characterization
The potential function of SsR2R3-MYB genes was predicted on the identified genes with significantly specific expression. Fifty-six SsMYB genes were involved in seven plant bioprocesses (Fig. 6d) Contain None Fig. 6 Heatmaps of differentially expressed MYB genes in response to stress conditions and presumed function. The heatmaps of DEGs expression value were shown in (a, b. c) based on RNA-seq data from drought stress, pokkah boeng disease, and sugarcane mosaic disease. DEGs were identified due to their expression having significant variation after suffering stress stimulation (FPKM > 2, fold change > 2, p-value < 0.05). Abbreviation: D1, CK; D2, mild; D3, severe; D4, rehydrate. P1, CK; P2, inchoate; P3, advanced. S1, CK; S2, CK detoxify; S3, post-infection. d 56 SsR2R3-MYB genes with specific expression patterns with putative functionalities. Blue boxes represent this function and yellow with none expressed during seeding stem and were possibly involved in stem differentiation and formation (Fig. 6a). Three MYB genes were identified as candidate C 4 photosynthesis regulators, and nine genes responded in the circadian clock. Under diverse stresses, it was seen that 6, 19, and 10 SsR2R3-MYB genes responded to drought, pokkah boeng disease, and mosaic disease, respectively. Notably, SsMYB51 and SsMYB162 illustrated different expression changes between two sugarcane diseases (pokkah boeng and mosaic disease). SsMYB162 significantly accumulated, actively responding to the infection of two diseases (Table  S7). However, SsMYB51 showed a different expression pattern, negatively responding to pokkah boeng but positively answering SCMV. Moreover, 13 MYB genes had more than one putative function, indicating their role in diverse plant bioprocesses (Fig. 6d).

Allelic expression dominance drove SsMYB to function in stem
The transcriptional levels of R2R3-MYB allelic genes were compared among different tissues and different developmental stages to investigate the transcriptome dynamics of R2R3-MYB genes in the allopolyploid across eight homoeologous chromosome pairs, of which 25% of the R2R3-MYB genes displayed allelic expression dominance in all samples. The number of expression dominant genes in the A, B, C, and D genomes was 84, 93, 82, and 79, respectively. Further, the allelic genes were compared in pairs, including A-B, A-C, A-D, B-C, B-D, and C-D (Fig. 7a). Both the number of dominant genes in a single set of homoeologous chromosomes and the pairwise comparison of alleles showed no significant allelic dominance. Captivatingly, the number of dominant genes in the stem was more than that in the leaf in each allelic pair comparison. For four sets of homoeologous chromosomes, the percentages increase was 46.5, 90.6, 10.2, 143.4%, corresponding to A, B, C, and D genomes, respectively, and the overall average rise was 64.5%. The transcriptional expression of allelic genes in the stem tissues showed significant differences among different alleles than those in the leaf tissues. Allelic expression dominant genes derived predominantly from stem transcriptomes. Selective pressure analysis demonstrated that the Ka/Ks median values of dominant allelic pairs in the stem and leaf were mainly in the range of 0.3-0.5 (Fig. 7b). The number of dominant genes in the stem with positive selection (Ka/Ks > 1) was twice that in the leaf, indicating MYB dominant genes evolved faster. MYB allelic genes with median Ka/Ks in the range from 0.4 to 0.6 were divided into three categories: dominant, subordinate, and neutral alleles, of which neutral genes were the majority (Fig. 7c). The differentially expressed dominant and subordinate genes might contribute to allelic variation that affected gene expression, function, and phenotype.

Discussion
Gene duplication played an essential role in gene expansion and functional diversification in the genetic revolution and phenotypic evolution [40]. A total of 202 SsR2R3-MYB genes were identified, the second-highest number of these genes among the 21 essential plant species (displayed in Fig. 1). The number of R2R3-MYB in sugarcane was far more than the other members of the grass family. Nevertheless, sugarcane with octoploid nature had a higher number of MYB genes compared with the other species. The significant enrichment of SsMYB genes probably was affected by two rounds of wholegenome duplication, including allopolyploidization followed by autopolyploidization [41], or two rounds of autopolyploidization [3]. In grasses, 11 ( [42]. The duplication of genes distribution indicated that the MYB genes family expansion in S. spontaneum could be attributed to these duplication events. The large R2R3-MYB gene family resulted from duplication events, and autopolyploidization demonstrated diverse functions in plant-specific processes. Some genes specially expressed in stem tissues were concentrated in the stem prophase, indicating that these MYB genes might regulate biological processes related to stem development. Stem morphogenesis is tightly associated with the formation and lignification of the secondary wall (the central mechanical tissue in the stems of grass species) [43]. Indeed, some MYB transcription factors are identified to be involved in sugarcane stem development. A previous study revealed that 7 ScMYB genes were correlated with lignin content and biosynthesis [44]. ShMYB78 has been recognized as an activator of suberin biosynthesis and regulates suberin deposition [45]. In Arabidopsis, the asymmetric leaves1 (as1) gene encoding an MYB protein-mediated stem cell function interacted with meristematic genes to regulate the shoot morphogenesis [46]. Furthermore, a group of rice and maize MYB genes (OsMYB46 and ZmMYB46) activated the transcription of secondary cell wall biosynthesis and probably interacted with secondary wall-associated NAC genes [43]. The role of these SsMYB genes in stem development might provide potential genetic resources for sugarcane breeding.
MYB genes also play an important role in leaf development in grasses. In maize, a group of MYB was recognized to be involved in leaf development, as indicated by expression gradients. Myb-ZmRS2, MYB60, and MYB61 influence adaxial/abaxial polarity and stomata patterning [47]. Moreover, some ZmMYBs are highly expressed in the transition zone, affecting secondary cell wall and lignin production [33]. The Class I containing 24 SsR2R3-MYB genes were also inferred to have similar functions. A few MYB genes were identified as C 4 regulators and correlated with C 4 photosynthetic cell type-specific gene expression. An MYB gene encoding GRMZM2G130149, which regulates the transcription of phosphoenolpyruvate carboxykinase (PEPCK) in Z. mays, was categorized as a C 4 transcription factor [34]. Three putative C 4 transcription factors (SsMYB169, SsMYB181, and SsMYB192) identified in this study might play a potential role in forming photosynthetic organs and regulating the C 4 photosynthetic pathway. The LATE ELONGATED HYPOCOTYL (LHY) gene, encoding an MYB transcription factor, regulated circadian rhythms in Arabidopsis, and MYB-LHY was involved in circadian photoperiod [48]. In sugarcane, nine candidate MYB genes with high expression were associated with the circadian cycle and therefore performed similar functions. The genes associated with leaf development showed relatively low expression levels (FPKM< 10) than those linked with the stem tissues, hinting at a stem-related expression dominance for most SsMYB genes. Drought is one of the main factors restricting sugarcane growth and sugar production [49]. Identifying particular and novel candidate genes is a great strategy to improve stress tolerance in sugarcane in this context. Certain MYB transcription factors, for instance, MYB_2 [50], SoMYB18 [51], ScMYB2S1 & 2 [52], and ScMY-BAS1 [53], have been associated with the response to drought-induced stress in sugarcane. Six MYB genes were identified as a differential expression in this study, helping understand the sugarcane drought tolerance mechanism.
Following pathogen invasions, plants turn on a series of plant defense mechanisms. MYB transcription factors play a facilitating role in disease resistance by regulating plant hormone metabolism and mediating systemic resistance [54]. AtMYB30 [55], AtMYB96 [20], and SpMYB [56] have already been reported to be involved in disease resistance. Several defense-related MYB candidate genes were identified against pokkah boeng disease and mosaic disease of sugarcane. Hence, MYB genes are a component of plant defense mechanisms against fungal and viral pathogens.
Polyploids are widely distributed among plants, and about 70% of the angiosperms have experienced one or more polyploidization events during their evolution. As a plant genome evolutionary force, polyploidization plays an essential role in speciation and genomic plasticity. In this study, homologous expression dominant genes Ka/ Ks of autopolyploid sugarcane were higher than those of neutral genes, consistent with the report of allopolyploid of B. juncea [57]. The asymmetric evolution of alleles facilitated differential expression, then affected plant biological processes. No significant differences were detected between the four unichromosomal genomes A, B, C, and D for MYB dominance. However, MYB homologous expression of dominant genes was more remarkable in number in the stem tissues than that in the leaf, and the former were subject to selection pressures with more powerful, implying that MYB stems dominant genes intensified selection in sugarcane. Not surprisingly, the transcription level of MYB genes more significantly enriched in the stem. The transcriptional advantages of these MYB homologous expression dominant genes in stem tissues might provide new insights for facilitating polyploid crop breeding for sugarcane.

Phylogenetic analysis
To generate the phylogenetic trees of MYB transcription factor family genes, multiple protein sequence alignment was performed through the MATTF program (https:// mafft.cbrc.jp/alignment/server/index.html) [75] with the reported 88 rice and 138 Arabidopsis R2R3-MYB proteins [63], respectively. Moreover, different phylogenetic trees were constructed via the maximum likelihood method using software FastTree2 [76]. Neighbor-joining phylogenetic trees of sugarcane and sorghum were performed using MEGA7.0 [77].
Naming R2R3-MYB genes and gene structure Because of the autopolyploid nature of sugarcane (S. spontaneum), the identified SsR2R3-MYB genes partly possessed several alleles. The representative gene models for different alleles were screened by comparing the phylogenetic relationship and protein identity with sorghum homology protein and paralogs. Tandem replication genes and paralogs were regarded as new, which gene IDs were followed by P and T, respectively. The 202 representative SsR2R3-MYB genes were named from SsMYB1 to SsMYB202 according to their physical position on the chromosomes. Subsequently, allele names were supplemented with numbers (e.g., The Sspon.01G0002470-1A gene located at the top of chromosome 1A is MYB1-1, and Sspon.01G0002470-2D is named as MYB1-2). In general, MYB1-1 as a representative gene model was directly regarded as MYB1. The naming method of sorghum MYB genes was also treated like that of S. spontaneum.
SsR2R3-MYB genes and CDS sequences come from the newest version of Sspon.v20190103. The domain location was derived from the previous hmm search results. Gene structures were displayed using the Gene Structure Display Server (GSDS2.0) [78], consisting of the CDS region, intron region, and MYB domain. Each gene structure was arranged according to the phylogenetic location.

Collinearity analysis
Utilizing MCScanX analysis [79], collinearity relationships of SsR2R3-MYB genes and classifier program were used to sort gene duplication types. The identified collinear gene pairs were mapped to their respective locus in the S. spontaneum genome in a circular diagram using Circos 0.69 [80].

Regulatory element of upstream sequences
The 2000 bp upstream sequences were extracted from SsR2R3-MYB genes to the PlantCARE website, plant promoter, and cis-element database [81]. Then, we used them to predict regulatory motifs and estimate potentially related functions.

Abundant RNA-seq data showing gene expression
To analyze SsR2R3-MYB gene expression profiles thoroughly, 60 RNA-seq data were conducted to decipher their expressions from our lab and cooperative labs. Tissue and development transcriptome contained RNAseq data of 16 samples, including leaf, stem, three different development stages viz. seeding (35-day-old), prematurity (9-month-old), and maturity (12-month-old) stages in S. spontaneum [82]. The leaf development transcriptome was derived from the second leaf alone, the ligule on 11-day-old seedlings; 15 cm leaves were selected and cut into 15 pieces with one segment per centimeter [83]. Mature leaves corresponding to ligule in S. spontaneum, over 12-month-old, were selected to supply circadian rhythm transcriptome using 19-time points, i.e., 2 h intervals apart from 6:00 am to the second day 4:00 am, and 4 h apart from 6:00 am to the third day 6:00 am.
RNA-seq were extracted from the droughttreatment sugarcane of FN95-1702, a new sugarcane variety for both sugar and energy, bred by Fujian Agriculture and Forestry University. Sugarcane grown to 4-5 leaves was subjected to the natural drought stress treatment in the greenhouse. The mild drought was characterized by soil relative water content of about 55%~60% after 6 days, and severe drought by 25%~30% after 12 days. After a severe drought, rehydration was done, and relative water content was kept around 75%~85%, and then leave samples were retaken (5 days later). The R2R3-MYB gene expression profiles were obtained by Blast mapping to express data with transcripts of unreferenced genomes. RNAseq for pokkah boeng disease was extracted from hybrid sugarcane ZZ1, which is highly resistant to smut disease but highly susceptible to pokkah boeng disease. According to the severity of the diseased leaves, pokkah boeng disease was divided into five grades from 0 to 5. The mildly diseased leaves (1 or 2 grades) and severely diseased leaves (4 or 5 grades) were selected for analysis, while healthy leaves were used as control (CK). Three samples were extracted for RNA-seq of sugarcane mosaic disease. For the infection experiment, sugarcane grown through virusfree tissue culture was used, and then leaves corresponding to ligule were collected 1 month after the infection, while the control plants were not infected. An expression ratio > 2 (adjusted p-value< 0.05) was considered statistically significant for evaluating differentially expressed genes.
Quantitative RT-PCR S. spontaneum was planted in Multifunctional Specimen Garden, Institute of Agriculture, Guangxi University. The stem-3 at the third internode and mature leaves were collected for comparing the difference of relative expression between stem and leaf. Pro-stem is short for prophase stem, in which the samples were taken from the stem precursor tissue wrapped in the leaf sheath and located on the upper part of the stem with prominent stem nodes. Combining with stem-3, stem-6, stem-9, and mature leaves was used to verify the expression during the prophase of stem formation. Total RNA was carried out using TRIZOL reagent (Takara), employing the corresponding protocol. The qualified RNA was reverse transcribed to produce cDNA using PrimeScript™ RT reagent Kit with gDNA Eraser reagent (Takara, Japan). Primers were designed by qPCR-PrimerQuest Tool, and qPCR primers were shown in Table S8. Glyceraldehyde-3phosphate dehydrogenase gene (GAPDH) was selected as a reference gene [84]. The real-time qPCR with three biological replications was performed with SYBR green on Roche Lightcyler® 480 instrument using 2 × TB Green Mix (Takara). The reaction profile was as follows: 95°C for 30s, followed by 40 cycles of 95°C for 10s, 60°C for 30s, and 95°C for 10s. The relative expression levels were calculated by the 2 -△△ CT method.