Understanding of MYB Transcription Factors Involved in Glucosinolate Biosynthesis in Brassicaceae

Glucosinolates (GSLs) are widely known secondary metabolites that have anticarcinogenic and antioxidative activities in humans and defense roles in plants of the Brassicaceae family. Some R2R3-type MYB (myeloblastosis) transcription factors (TFs) control GSL biosynthesis in Arabidopsis. However, studies on the MYB TFs involved in GSL biosynthesis in Brassica species are limited because of the complexity of the genome, which includes an increased number of paralogous genes as a result of genome duplication. The recent completion of the genome sequencing of the Brassica species permits the identification of MYB TFs involved in GSL biosynthesis by comparative genome analysis with A. thaliana. In this review, we describe various findings on the regulation of GSL biosynthesis in Brassicaceae. Furthermore, we identify 63 orthologous copies in Brassica species corresponding to five of the MYB TFs from Arabidopsis, the exception being MYB76. Fifty-five MYB TFs from the Brassica species possess a conserved amino acid sequence in their R2R3 MYB DNA-binding domain, and share close evolutionary relationships. Our analysis will provide useful information on the 55 MYB TFs involved in the regulation of GSL biosynthesis in Brassica species, which have a polyploid genome.


Introduction
Plants produce various secondary metabolites that are not directly required for primary functions such as development, reproduction, and photosynthesis, but are involved in traits such as taste, color, and scent, and have roles in plant defense against environmental changes or stress [1]. More than 200,000 secondary metabolites are known in plants, and humans have utilized plants for the various benefits (natural medicines, flavors, insecticides, and industrial materials) obtained through the wide chemical diversity of these metabolites [2]. Recent studies have reported that cruciferous (Brassicaceae) vegetables are rich in secondary metabolites, including carotenoids, flavonoids, anthocyanins, and glucosinolates (GSLs) [3]. GSLs are derived from amino acids and sugars and are one of the largest known groups of secondary metabolites in the Brassicaceae family. GSLs and their breakdown products have recently attracted attention owing to their various beneficial roles, such as anticarcinogenic and antioxidative activities in humans, and defense against pests and pathogens in plants [2,4]. Therefore, a better understanding of the GSL biosynthesis pathway could help to increase the nutritional value of Brassicaceae crops and provide agriculturally useful information on their defense mechanisms.
The Brassicaceae family consists of 338 genera and about 3700 species, including Arabidopsis thaliana, which is widely studied as a model plant with a small genome, and many plants of agronomic importance, such as vegetables, fodder, and oil crops [5]. The agricultural and nutritional properties of Brassicaceae plants have resulted in their extensive cultivation. The genus Brassica, which is a member of Brassicaceae, contains the vegetable species, Brassica rapa (A genome, Chinese cabbage, pak choi),

Metabolic Pathway of GSLs
GSLs are a large group of sulfur-containing secondary metabolites, with more than 120 different GSLs identified in Brassicaceae [22]. GSLs are classified into three groups according to their precursor amino acids: (1) the aliphatic group, derived from Met, Ala, Leu, Ile or Val; (2) the indolic group, derived from Trp; and (3) the aromatic group, derived from Phe or Tyr [23]. The GSLs of the three groups are synthesized in three stages (Figure 1). The precursor amino acids of the aliphatic and aromatic groups are chain-elongated in the first stage, and the oxime is then converted into the core GSL structure in the second stage. Members of the indolic group do not undergo the chain-elongation stage and are transformed directly from the oxime into the core GSL structure. Finally, the side chains of the core structures are modified by oxidation, elimination, alkylation or esterification in the aliphatic and indolic groups [14]. Aliphatic GSLs are the most abundant, at about 57-97% of the total GSL content, while aromatic GSLs are minor components in Brassica species [24][25][26]. The GSLs of the aliphatic and aromatic groups have distinct effects on human and livestock health, with roles in oncogenesis, disease, and nutrition [27].
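The classification above can be written down as a small lookup, which makes the group-specific differences in the pathway explicit. The following Python sketch is illustrative only: the names `GSL_GROUP_BY_PRECURSOR`, `STAGES_BY_GROUP`, `classify_precursor`, and `biosynthesis_stages` are hypothetical, and the stage lists follow the wording of the text literally (side-chain modification is listed only for the aliphatic and indolic groups).

```python
# Precursor amino acid -> GSL group, as listed in the text.
GSL_GROUP_BY_PRECURSOR = {
    "Met": "aliphatic", "Ala": "aliphatic", "Leu": "aliphatic",
    "Ile": "aliphatic", "Val": "aliphatic",
    "Trp": "indolic",
    "Phe": "aromatic", "Tyr": "aromatic",
}

# Stages each group passes through, following the description in the text.
# Assumption: the stage names are shorthand chosen for this sketch.
STAGES_BY_GROUP = {
    "aliphatic": ["chain elongation", "core structure formation",
                  "side-chain modification"],
    "aromatic": ["chain elongation", "core structure formation"],
    "indolic": ["core structure formation", "side-chain modification"],
}


def classify_precursor(amino_acid: str) -> str:
    """Return the GSL group for a three-letter amino acid code."""
    key = amino_acid.strip().capitalize()
    if key not in GSL_GROUP_BY_PRECURSOR:
        raise ValueError(f"{amino_acid!r} is not a listed GSL precursor")
    return GSL_GROUP_BY_PRECURSOR[key]


def biosynthesis_stages(group: str) -> list[str]:
    """Return a copy of the ordered stage list for a GSL group."""
    return list(STAGES_BY_GROUP[group])


group = classify_precursor("Met")
print(group, biosynthesis_stages(group))
```

A table of this kind is only a reading aid; it does not capture the enzymes or intermediates of each stage.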
Glucoraphanin, a precursor of sulforaphane, can act as an anticarcinogenic agent in human cells [28]. Sinigrin has been reported in high amounts in black mustard (B. nigra), broccoli (B. oleracea), and Indian mustard (B. juncea), and has been shown to have antioxidant, anticancer, and antifungal effects [29]. Furthermore, phenethyl isothiocyanate (PEITC), which is formed from gluconasturtiin of the aromatic GSL group, has been shown to have a significant chemopreventive effect against human prostate cancer [30]. Conversely, progoitrin causes goiter in mammals, which impedes the use of Brassica crops for cattle feed [31]. The breakdown products of indolic GSLs, such as glucobrassicin, 4-hydroxy glucobrassicin, neoglucobrassicin, and 4-methoxy glucobrassicin, contribute to defense against biotic stresses, including pests and pathogens [32,33]. Tryptophan-derived indole-3-acetaldoxime, an intermediate of the indolic group, is also a precursor of auxin, which is involved in plant growth and development, and of camalexin, which deters bacterial and fungal pathogens [34,35]. Recently, some aliphatic and indolic GSLs have been reported to be involved in primary metabolism in plants, for example as nitrogen and sulfur sources, and in responses to abiotic stresses, such as salinity, light, and elevated CO2 [36][37][38][39][40].

Regulators of the GSL Biosynthesis Pathway in the Model Plant Arabidopsis
Most of the structural genes and transcription factors involved in GSL biosynthesis have been identified from diverse molecular and genetic studies in Arabidopsis [18,19,33]. To date, more than 40 structural genes and eight TFs involved in the GSL biosynthesis pathway have been identified in Arabidopsis. Dof1.1 (DNA binding with one finger) has been reported as a regulator of networks that positively control indolic GSLs in A. thaliana [41]. IQD (IQ-domain) 1.1 encodes a novel nuclear-localized protein that positively regulates GSL accumulation, and controls the expression of several GSL pathway genes [42]. A further six TFs belong to the group of MYB TFs that contain an R2R3 DNA-binding domain. MYB28, MYB29, and MYB76 are positive regulators of aliphatic GSLs in A. thaliana, and high levels of MYB28 transcription result in the production of large amounts of aliphatic GSLs. MYB34, MYB51, and MYB122 have been shown to regulate indolic GSL biosynthesis [18,19,43,44]. Additionally, recent studies have shown that the MYB34, MYB51, and MYB122 TFs are involved in the biosynthesis of jasmonic acid (JA), abscisic acid (ABA), ethylene (ET), and salicylic acid (SA), which are involved in plant defense [21]. These MYB TFs also regulate GSL biosynthesis in cooperation with MYC-bHLH (MYC-like basic helix-loop-helix) TFs, known as signaling components of the jasmonic acid pathway [45]. The R2R3 MYB TFs influence the expression of biosynthetic genes related to GSL biosynthesis in A. thaliana [19,46]. AtMYB28 has been reported to regulate the expression of aliphatic GSL biosynthetic genes, such as AtBCAT-3, AtLeuC1 and MAM1 [19]. In particular, expression of the biosynthetic gene AOP2 (a 2-oxoglutarate-dependent dioxygenase) increased the transcript levels of MYB28 and MYB29 of the GSL pathway, thereby controlling GSL biosynthesis [47].
The TFs MYB34, MYB51, and MYB122 positively regulate the accumulation of indolic GSLs and have the potential to upregulate GSL biosynthetic genes, such as CYP79Bs [48]. Moreover, all six MYB TFs from A. thaliana were shown to control genes associated with primary sulfate metabolism and are closely related to the GSL biosynthesis network [49].

Characterization of MYB TFs in Brassica Species
MYB TFs comprise one of the largest gene families of plant TFs, and play significant roles in the regulation of multiple biological processes, such as developmental and environmental responses, and metabolic pathways. These MYB TFs are classified into three subfamilies depending on the number of DNA-binding domain repeats and are known to regulate various biological processes by modulating the rate of transcription initiation of target genes [50]. The R2R3 MYB subfamily, containing the DNA-binding domain of the helix-turn-helix repeats R2 and R3, is the largest group of the MYB family and regulates plant-specific processes including primary and secondary metabolism [50]. Interestingly, the DNA-binding domain of the R2R3 MYB family contains conserved amino acid sequence motifs that contribute to functional conservation, despite the divergence of the amino acid sequence downstream of the domain [51]. Therefore, an understanding of R2R3 MYB TFs in Brassica species may help to elucidate how polyploid genome evolution has shaped the regulation of secondary metabolism in Brassicaceae.
Six MYB TFs belonging to the R2R3 MYB family have been identified as regulators of GSL biosynthesis in Arabidopsis [43,52]. Orthologs of these MYB TFs were also identified in genome-sequenced Brassica species using the NCBI (National Center for Biotechnology Information) and Brassica databases (http://brassicadb.org/brad/) [9,14,20,53]. In total, we identified 63 orthologous copies corresponding to five of the MYB TFs, the exception being MYB76. To date, MYB76 has not been found in the genomes of Brassica species, although it has been identified as a positive regulator of aliphatic GSLs in A. thaliana. A summary of these 63 orthologous gene sequences of MYB TFs related to GSL biosynthesis in Brassicaceae is shown in Table 1. More than two copies corresponding to each MYB TF were found in the Brassica species. These results indicate that genome duplication events have contributed to the expansion of the R2R3 MYB gene family in the Brassica species. Complete genome sequences have been reported for the Brassica genus, beginning with B. rapa [7] and, most recently, B. juncea [10]. However, only 68% of the estimated genome size of B. nigra (Accession YZ12151) and 85% of that of B. oleracea (var. capitata line 02-12) have been sequenced, and gene annotation in these genomes is incomplete [54]. Because some orthologous Brassica genes are available only as sequence fragments, we could confirm only 52 full-length genes. Of the 11 partial sequences of MYB TFs in Brassica, we were able to identify the R2R3 MYB DNA-binding domain sequence only in BnMYB28.2, BnMYB51.3, and BnMYB51.8. In total, 55 orthologous genes contain a complete R2R3 MYB DNA-binding domain sequence, and these genes showed high conservation at orthologous and paralogous levels (Figure 2). These conserved domains consist of 102 amino acids (AA) and exhibited more than 90% sequence identity with the corresponding A. thaliana MYB TF.
However, only three motifs (N[R/K/H]VA) were conserved in the 52 full-length MYB TFs of Brassica species, although 12 motifs are conserved in the C-terminal regions of the six AtMYB TFs. Consequently, variations in gene length (in base pairs) and nonsynonymous amino acid changes are caused by polymorphisms in the downstream C-terminal region. We have identified and reported the conserved R2R3 MYB DNA-binding domains of 13 MYB TFs related to GSL biosynthesis in B. rapa [16,24]. A phylogenetic tree was constructed using the amino acid sequences of 61 R2R3 MYB DNA-binding domains, including those of A. thaliana, to elucidate evolutionary relationships (Figure 3). Fifty-five MYB TFs were grouped based on the six AtMYB TFs. The MYB TFs related to aliphatic GSLs (MYB28, MYB29, and MYB76) formed subgroups, with MYB76 located between the MYB28 and MYB29 subgroups. MYB34, MYB51, and MYB122, involved in indolic GSLs, clustered in a large group, and the MYB34 group formed a subgroup with MYB51. These results indicated that the six MYB TFs related to GSL biosynthesis were evolutionarily conserved in Brassica species and exhibit functional conservation. AtMYB76 showed a high sequence identity with all MYB29 TFs in the conserved R2R3 MYB DNA-binding domain, but over the full length of the proteins its identity was low: 69.3% with AtMYB29 and 62-66% with the MYB29 TFs of Brassica species (data not shown). Polyploidy can result in chromosomal rearrangements and gene loss, due to unequal rates of sequence evolution of duplicated genes and changes in DNA methylation [55][56][57]. Although the genome assembly of Brassica species is not complete, the loss of the MYB76 TF in Brassica species may have been caused by genome duplication during evolution. Furthermore, the R2R3 MYB TFs of B. juncea and B. napus, which have allopolyploid genomes, are closely related to those of B. rapa, B. oleracea, and B. nigra, which have diploid genomes.
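The motif notation N[R/K/H]VA used above maps directly onto a regular expression, which gives a quick way to scan a protein sequence for candidate occurrences. The sketch below is an illustration, not the procedure used in this review: it assumes the bracket denotes alternative residues (R, K, or H) at the second position, and the function name `find_motif_positions` and the toy sequence are made up for the example.

```python
import re

# N[R/K/H]VA from the text, read as: N, then one of R/K/H, then V, then A.
MOTIF = re.compile(r"N[RKH]VA")


def find_motif_positions(protein: str) -> list[int]:
    """Return the 1-based start position of every motif match."""
    return [m.start() + 1 for m in MOTIF.finditer(protein.upper())]


# Made-up toy sequence, not a real MYB protein.
toy_sequence = "MSTNRVAGGLNKVAPPNHVAQQNAVA"
print(find_motif_positions(toy_sequence))
```

In the toy sequence, NRVA, NKVA, and NHVA match, while NAVA does not because A is not among the allowed residues at the second position.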
The amino acid sequence identity between the R2R3 MYB DNA-binding domains of the allopolyploid and diploid plants reflects the evolutionary origin of the allopolyploid plants (Table 2). Four BjMYB28 TFs from B. juncea (AABB) revealed a high level of sequence similarity to B. rapa (AA) and B. nigra (BB). BjMYB28.1 and BjMYB28.3 were found to possess 100% sequence identity with BniMYB28.1 and BrMYB28.3, respectively, while BjMYB28.2, BrMYB28.2, and BniMYB28.2 were fully conserved among the three species. Although BjMYB28.4 was found to possess a high level of sequence similarity (99.02%) to BrMYB28.1, it is possible that an unidentified MYB28 TF of B. nigra exists with higher sequence similarity. Similarly, BnMYB28.1 and BnMYB28.4 from B. napus (AACC) showed 100% sequence similarity with BolMYB28.3 of B. oleracea (CC), and BnMYB28.2 was found to be 100% conserved in three species. BnMYB29s, BnMYB34s, BnMYB51s, and BnMYB122s of B. napus were also found to possess high sequence identity with either BrMYBs or BolMYBs.
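As a worked illustration of these percentages: over a 102-AA domain, a single amino acid difference gives 101/102, or about 99.02% identity, which matches the value quoted for BjMYB28.4 and BrMYB28.1. The Python sketch below computes percent identity for two pre-aligned, equal-length sequences. It is a simplified assumption about how identity is counted (identical positions divided by alignment length, with no special gap handling), the function name `percent_identity` is hypothetical, and the sequences are placeholders rather than real MYB domains.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Placeholder 102-residue "domains" that differ at exactly one position.
reference = "A" * 102
variant = "A" * 50 + "G" + "A" * 51

print(round(percent_identity(reference, variant), 2))
```

Alignment tools differ in how they treat gaps and terminal overhangs, so values from a real alignment may not match this simple count exactly.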
These results revealed that the BjMYB and BnMYB TFs originated from the genomes of BrMYBs, BniMYBs, or BolMYBs. Therefore, MYB TFs related to GSL biosynthesis are evolutionarily close and conserved in their R2R3 DNA-binding domains despite duplication and hybridization of two diploid Brassica genomes. Furthermore, the number of MYB TFs has increased during evolution, which may have allowed functional diversification and the development of complex networks for the regulation of GSL biosynthesis in polyploid Brassica species.

Functional Description of MYB TFs Related to GSL Biosynthesis in Brassica Species
Most studies on MYB TFs related to GSL biosynthesis have been performed using A. thaliana of the Brassicaceae family as a model plant with a small genome size [21,43,44]. The recent completion of the genome sequencing of Brassica species permits the identification of various gene families in their genomes. B. rapa is a model dicot plant for use in studies of polyploidy-related genome structure and evolution because it has a small genome (529 or 485 Mb) within the Brassica genus [54]. Many putative biosynthetic and regulatory genes related to GSL biosynthesis have been identified in the genome sequence of B. rapa [7]. The 13 BrMYB TFs that possess a complete coding sequence indicate that paralogous genes arising through gene duplication have led to functional diversity and changes in expression patterns, reflected by genotype-specific variation in B. rapa subspecies [24]. The expression of some BrMYB TFs, such as BrMYB28, BrMYB34, and BrMYB51, also increased under abiotic and biotic stress conditions. Furthermore, functional analysis of three BrMYB28 TFs has been performed using Agrobacterium-mediated transformation in B. rapa [16]. The three BrMYB28 TFs are involved in the regulation of aliphatic, indolic, and aromatic GSL biosynthesis, and in the expression of biosynthetic genes, such as BrAOP2 and BrGSL-OH, in transgenic B. rapa. These results suggested that the regulation of GSL biosynthesis involves a GSL pathway that is more complex than that in Arabidopsis due to the complexity of the polyploid genome in B. rapa. Four paralogs of BjMYB28 in B. juncea have been reported as regulators of aliphatic GSL accumulation in transgenic A. thaliana and in gene-silenced lines of B. juncea with low GSL content [20,59]. Studies on MYB TFs in B. oleracea are not yet sufficient to elucidate the mechanism by which BolMYB TFs regulate GSL biosynthesis, although some studies have revealed various expression patterns of biosynthetic genes and BolMYB TFs [51,60]. Genetic studies using association mapping have identified multiple loci controlling GSL biosynthesis in B. napus and B. juncea [61,62]. Recently, the MYB28 TF was identified as the regulator of aliphatic GSL biosynthesis by associative transcriptomics using transcriptome sequencing in B. napus [63]. Additionally, association mapping of B. napus and B. juncea confirmed that MYB28 TFs are associated with GSL content [64,65].
Previous studies have demonstrated the roles of MYB28, MYB29, and MYB76 in the regulation of aliphatic GSL biosynthesis, and of MYB34, MYB51, and MYB122 in the regulation of indolic GSL biosynthesis, in A. thaliana. In the case of Brassica species, only a few recent studies have shown that MYB28 TFs positively regulate the accumulation of GSLs. The complexity of the Brassica genome, due to an increased number of paralogous genes through genome duplication, makes it difficult to use a molecular approach to determine the regulation of GSL biosynthesis. In this review, we discussed the role of MYB TFs as important regulators associated with the GSL biosynthesis pathway in Brassica species, and provided useful information on the 55 MYB TFs for improved understanding of the regulatory mechanism of GSL biosynthesis in Brassica species.

Conclusions
Recently, six R2R3 MYB TFs controlling the accumulation of various GSLs were reported as regulators of different stress responses and hormones, such as ABA, ethylene, and jasmonate, in A. thaliana. Although Brassica crops have commercial and scientific value, the roles of most MYB TFs in Brassica species remain poorly understood, with the exception of a few MYB TFs related to GSL biosynthesis. The 55 R2R3 MYB TFs identified in Brassica species as orthologs of A. thaliana genes share a close evolutionary relationship, with a highly conserved DNA-binding amino acid sequence. This will provide valuable information on how MYB TFs regulate distinctive properties, such as stress responses and the biosynthesis of various metabolites, including GSLs, in Brassica species with polyploid genomes. Further extensive functional studies of the 55 MYB TFs will help to elucidate the functional diversification of genes arising from genome duplication in polyploid plants.