Insights into the diversification of subclade IVa bHLH transcription factors in Fabaceae

Fabaceae plants appear to contain larger numbers of subclade IVa basic-helix-loop-helix (bHLH) transcription factors than other plant families, and some members of this subclade have been identified as saponin biosynthesis regulators. We aimed to systematically elucidate the diversification of this subclade and obtain insights into the evolutionary history of saponin biosynthesis regulation in Fabaceae. In this study, we collected sequences of subclade IVa bHLH proteins from 40 species, including fabids and other plants, and found greater numbers of subclade IVa bHLHs in Fabaceae. We confirmed conservation of the bHLH domain, C-terminal ACT-like domain, and exon-intron organisation among almost all subclade IVa members in model legumes, supporting the results of our classification. Phylogenetic tree-based classification of subclade IVa revealed the presence of three different groups. Interestingly, most Fabaceae subclade IVa bHLHs fell into group 1, which contained all legume saponin biosynthesis regulators identified to date. These observations support the co-occurrence and Fabaceae-specific diversification of saponin biosynthesis regulators. Comparing the expression of orthologous genes in Glycine max, Medicago truncatula, and Lotus japonicus, orthologues of MtTSAR1 (the first identified soyasaponin biosynthesis regulatory transcription factor) were not expressed in the same tissues, suggesting that group 1 members have gained different expression patterns and contributions to saponin biosynthesis during their duplication and divergence. On the other hand, groups 2 and 3 possessed fewer members, and their phylogenetic relationships and expression patterns were highly conserved, indicating that their activities may be conserved across Fabaceae. This study suggests subdivision and diversification of subclade IVa bHLHs in Fabaceae plants. The results will be useful for candidate selection of unidentified saponin biosynthesis regulators. Furthermore, the functions of groups 2 and 3 members are interesting targets for clarifying the evolution of subclade IVa bHLH transcription factors in Fabaceae.

Basic-helix-loop-helix (bHLH) transcription factors are one of the largest families of plant transcription factors, and are classified into approximately 25 subclades based on sequence homology within the bHLH domain and other shared protein domains [18,19]. Land plants have acquired more bHLH genes than animals, chlorophytes, or red algae [19], and some subclades evolved to regulate plant specialised metabolism [5]. Subclade IVa is a good example of such regulation, as it represents conserved transcriptional regulation of methyl jasmonate (MeJA)-mediated metabolic processes in plants [5]. TRITERPENE SAPONIN BIOSYNTHESIS ACTIVATING REGULA-TOR1 (MtTSAR1) upregulates the soyasaponin pathway in M. truncatula [20]. MtTSARs 2 and 3 are factors that activate hemolytic saponin accumulation, with differences in tissue specificity [20,21]. We recently identified GubHLH3 as a positive regulator of soyasaponin biosynthesis in G. uralensis [22], and this protein is closely related to MtTSAR2 but not MtTSAR1. This finding hints at the evolutionary history of Fabaceae subclade IVa bHLHs. Chenopodium quinoa (Amaranthaceae) seeds accumulate saponins with similar structures to the hemolytic saponins of M. truncatula. Mutations in CqTSAR-like1 (CqTSARL1) were identified as a major factor affecting differences in the saponin accumulation pattern between saponin-producing and saponin-lacking ecotypes [23]. In Catharanthus roseus (Apocynaceae), bHLH iridoid synthesis 1 (CrBIS1) and CrBIS2 were found to positively regulate the biosynthesis pathway for the iridoid branch of monoterpenoid indole alkaloids (MIAs) [24,25]. Interestingly, the functions of MtTSARs and CrBIS1 were shown to be interchangeable through heterologous expression of MtTSARs in C. roseus and CrBIS1 in M. truncatula [26]. In addition, production of both saponins and MIAs were commonly regulated by MeJA [5,21,24,27].
Numerous studies have reported genome-wide identification and classification of bHLH factors in plants [18,19,[28][29][30]. Although the genomes of Arabidopsis thaliana and Oryza sativa possess four and six subclade IVa members, respectively [19], more than 30 subclade IVa bHLH genes were found in the genomes of Glycine max and M. truncatula [21,28]. This finding suggests that Fabaceae plants may have acquired a large number of subclade IVa members during the evolution of saponin biosynthesis.
In this study, we extensively explored subclade IVa bHLHs in fabids and showed that Fabaceae plants possess a large number of subclade IVa members, which were classified into three groups based on phylogenetic analysis. Group 1 had the greatest number of members, including MtTSARs and GubHLH3. Groups 2 and 3 contained fewer members, none of which were functionally-identified, but were obviously distinct from group 1 based on the tree and highly conserved among Fabaceae plants. We also performed in silico analysis to elucidate their structures and functions. This study will help to narrow down the candidates of unidentified saponin biosynthesis regulators and clarify the evolution of subclade IVa members in Fabaceae plants.

Large numbers of subclade IVa members in Fabaceae plants
A total of 319 bHLH proteins and 33 subclade IVa members were identified previously in G. max [28]. We obtained 355 sequences of G. max bHLH proteins (Additional file 1: Table S1) using PlantTFDB [31]. Then, we assigned individual names to the novel members and re-selected subclade IVa members based on sequence similarity of the full-length proteins. Although five proteins (GmbHLH60-64) were designated as members of subclade IVa in a previous study [28], they had relatively long amino acid sequences (588-653 aa) and clustered more closely with bHLH proteins in subclade IIIf from A. thaliana on the phylogenetic tree (Additional file 3: Fig.  S1). GmbHLH327, 329, 331, 334, 337, and 345 were newly assigned to subclade IVa based on Basic Local Alignment Search Tool (BLAST) search results. Finally, we identified 34 G. max subclade IVa bHLHs ranging in peptide length from 195 to 390 aa (Additional file 1: Table S1).
We collected sequences of all bHLH proteins from 40 plant species including A. thaliana, C. roseus, C. quinoa and various fabids (Additional file 1: Table S2). These proteins were used as queries for BLAST searches against the 4 and 34 subclade IVa bHLHs identified in A. thaliana and G. max, and we thereby identified the subclade IVa members in each plant species (Additional file 2). Fabaceae plants possessed 61   sequences of Arachis hypogaea and Vigna unguiculata were not used for the prediction in PlantTFDB, their bHLH sequences may not have all been collected. The percentage of subclade IVa genes relative to all bHLH genes was 5.56-18.2% and 1.82-5.76% in Fabaceae and non-Fabaceae fabids, respectively ( Table 1). The genomes of Fabaceae contained significantly more subclade IVa bHLH genes than those of related plant families (Mann-Whitney U test, U = 329, p < 10 − 9 ).

Three groups of subclade IVa bHLHs found in Fabaceae plants
To visualise the diversification of subclade IVa members in Fabaceae and other fabids, we constructed a phylogenetic tree using full-length sequences (Fig. 1, Additional file 3: Fig. S2). Subclade IVa bHLHs were further classified into three groups. Most Fabaceae subclade IVa bHLHs were included in group 1 (Table 1), which contained all MtTSARs and GubHLH3. Groups 2 and 3 had limited numbers of members, but were highly conserved among Fabaceae plants (Additional file 3: Fig.  S2).

Conservation of bHLH and ACT-like domains and exonintron structures
As described in previous studies [16,28] the HLH and ACT-like domains are involved in dimerisation [18,25,32,33]. Using MEME algorithm [34], we searched for these conserved domains (Fig. 2, Additional file 3: Fig. S3) in 82 subclade IVa bHLHs of G. max, M. truncatula, and L. japonicus (Additional file 1: Table  S1). We found five motifs that were well conserved in almost all 82 proteins (Fig. 2a); two upstream motifs of the basic and HLH regions (Fig. 2b), and three motifs at the Cterminus corresponding to the ACT-like domain (Fig. 2c). Some group 1 members, GmbHLH105 and 106 and LjbHLH021, lacked the basic region (Additional file 3: Fig.  S3) and these three proteins clustered together in the phylogenetic tree (Additional file 3: Fig. S2).
We confirmed that exon/intron structures are conserved among subclade IVa bHLH genes with some exceptions (Fig. 3). Most members had four exons and three introns. All 82 subclade IVa bHLH genes contained one intron within the HLH domain, but its length was highly variable (Additional file 1: Table S3). This conserved intron position corresponded to pattern D, as defined in a previous study [28]. MtbHLH138, MtbHLH177, GmbHLH334, and LjbHLH014 lacked intron 3 and exon 4 (Additional file 1: Table S3), resulting in incomplete or absent ACT-like domains (Additional file 3: Fig. S3). As some members of groups 1, 2, and 3 gained or lacked introns (Additional file 1: Table S3), structural diversification may have occurred independently during their evolution.
Based on the highly conserved protein domains and exon-intron organisation across groups, we confirmed that groups 1, 2, and 3 were undoubtedly members of subclade IVa.

Expression patterns of bHLH genes in each group
Using publicly available expression atlases of G. max, M. truncatula, and L. japonicus, we compared the expression patterns of homologous genes in each plant ( Table 2). The orthologous genes in group 1 did not have a completely conserved expression profile across plant species. For instance, although TSAR1 (MtbHLH150) was expressed more in leaves and petioles, the expression levels of its orthologous genes, LjbHLH054 and GmbHLH345, were highest in nodules and flowers, respectively (Additional file 3: Fig. S4). Group 2 members were commonly expressed in nodules, except GmbHLH116 and GmbHLH334, for which expression was not observed. Almost all genes in group 3 were expressed in underground tissues, namely roots and nodules. Three of the four genes in group 3 of G. max were also expressed in the pod shells.

Discussion
One of the most diverse plant transcription factor families, bHLHs regulate many aspects of biological processes, including organ development, specialised metabolism, and the response to environmental stimuli [19]. Subclade IVa bHLH members appear to regulate specialised metabolism and defense responses [5,19]. In this study, we showed that Fabaceae plants possessed a greater number of subclade IVa bHLH genes in their genomes than other fabids (Table 1, Fig. 1). G. max and Glycine soja had approximately double the number of total bHLHs and subclade IVa members compared to other Fabaceae, as they have experienced two wholegenome duplication events, doubling their genome size [35,36]. Although the number of bHLHs in M. truncatula was similar to those of other Fabaceae plants, twice as many subclade IVa bHLHs were found in the Medicago genome (Table 1). Thus, M. truncatula likely duplicated its subclade IVa bHLHs during development of the hemolytic saponin biosynthesis pathway from the soyasaponin pathway (Additional file 3: Fig. S5).
Domain structures and exon-intron organisation were highly conserved among the 82 subclade IVa members derived from G. max, M. truncatula, and L. japonicus (Figs. 2, 3). Fabaceae subclade IVa bHLH proteins were clearly classified into three groups in the phylogenetic tree (Fig. 1). We found a strong bias in the number of Fabaceae bHLHs belonging to group 1, although no such bias was found in other fabids (Table 1). Group 1 may be a clade of transcription factors regulating saponin biosynthesis across a broad range of Fabaceae plants, as all MtTSARs and GubHLH3 were included in this group (Additional file 3: Fig. S2). Furthermore, the expression patterns of orthologous genes in group 1 were not conserved (Table 2), and the soyasaponin biosynthesis regulator, GubHLH3 was not the closest homologue of MtTSAR1 [22]. Thus, although the duplications of group 1 members apparently occurred in ancestral Fabaceae, their expression patterns and contributions to saponin  biosynthesis may have differentiated after speciation. Therefore, we should search for candidate soyasaponin biosynthesis regulators among group 1 members. Fewer members belonged to groups 2 and 3, but were highly conserved (Fig. 1, Table 1) and tended to be expressed in nodules and roots ( Table 2). We confirmed the co-expression of LjCYP93E1 (a soyasaponin biosynthetic gene) and LjbHLH032 (group 2 subclade IVa bHLH) with a Pearson's correlation coefficient of 0.797 (Additional file 3: Fig. S6). Furthermore, Fabaceae triterpene saponins likely play important roles in the rhizosphere, as reported in previous studies; increased saponin accumulation enhanced nodulation [16] and soyasaponins were the major component of root exudates [37]. These observations suggest that members of group 2 affect biological interactions in the rhizosphere through modulation of soyasaponin production. Generally, bHLH proteins form homo-and heterodimers that regulate the expression of target genes [18,25,32,33]. The possibility that subclade IVa members in groups 2 and 3 also regulate saponin biosynthesis in Fabaceae is worthy of further investigation.
Fabaceae possessed more subclade IVa members, although there was no significant difference in the total numbers of bHLH genes between Fabaceae and non-Fabaceae (Mann-Whitney U test, U = 210, p = 0.1639). This suggested that other subclades in Fabaceae might have fewer genes. We roughly estimated how many genes were present in each subclade in selected species based on the phylogenetic relationships of the bHLH domains, and found no specific contraction in any subclade (Additional file 1: Table S4).

Conclusions
In this study, we constructed a phylogenetic tree of fulllength subclade IVa bHLH proteins from 40 plant species, mainly comprised of fabids. The results clearly indicated that subclade IVa bHLHs could be classified into three groups, and that Fabaceae plants contained a large number of group 1 members, including all saponin biosynthesis regulators identified to date. This information will help to uncover unidentified soyasaponin biosynthesis regulatory factors. On the other hand, no genes in groups 2 or 3 have yet been functionally characterised in Fabaceae. These genes are interesting targets for elucidating the evolution and functions of Fabaceae subclade IVa bHLH transcription factors.

Sequence retrieval
Representative protein sequences of G. uralensis were obtained from the G. uralensis genome database [38]. A total of 163 putative bHLH proteins were retrieved based on hidden Markov models (HMMs) of HLH domain (PF00010) downloaded from Pfam 32.0 [39,40], using HMMER v3.3 software [41,42]. The bHLH domain sequences and full-length sequences of bHLH proteins (only the primary isoforms) from other plant species were retrieved from PlantTFDB v5.0 [31,43]. Subclade IVa members of selected species were identified using a BLAST search against all subclade IVa proteins of A. thaliana and G. max with an e-value threshold of <1e-50. The bHLH proteins selected are listed in Additional file 2.

Phylogenetic tree analysis
Protein alignment of full-length bHLHs or bHLH domains was performed using Clustal Omega v1.2.3 [44] with the default settings. A Newick file was generated using FastTree v2.1.10 [45] with the default settings. The phylogenetic tree was visualised from the Newick file using MEGA X [46].

Expression pattern analysis
Expression patterns of bHLH genes were retrieved from Lotus Base [52,53], Soybean eFP browser [54], Medicago eFP browser [55], and The Medicago truncatula Gene Expression Atlas [56,57]. The expression of representative genes belonging to subclade IVa was determined using publicly available databases and summarised. a LjbHLH001 and LjbHLH014 are found in the L. japonicus Miyakojima MG-20 accession, but both correspond to the same gene in the L. japonicus Gifu B-129 accession