Comprehensive identification and characterization of abiotic stress and hormone responsive glycosyl hydrolase family 1 genes in Medicago truncatula CURRENT STATUS:

Background: β-glucosidases (BGLUs) hydrolyze the β-D-glycosidic bond with retention of anomeric configuration. BGLUs were associated with many aspects of plant physiological processes, in particular biotic and abiotic stresses through the activation of phytohormones and defense compounds. However, no comprehensive and systematic investigation on the stress- or hormone-responsive BGLU proteins had ever been reported in plant. Results: In this study, total 51 BGLU genes of the glycoside hydrolase family 1 were identified in one of the model legume plant Medicago truncatula genome, and they were classified into five distinct clusters. Sequence alignments revealed several conserved and characteristic motifs among these MtBGLU proteins. Analyses of their putative signal peptides and N-glycosylation site suggested that the majority of MtBGLU members have dual targeting to the vacuole/chloroplast. Many regulatory elements possibly related with phytohormones and/or abiotic stresses were identified in MtBGLU genes. Moreover, Microarray and qPCR analyses showed that these MtBGLU genes exhibited distinct expression patterns in various tissues, and in response to different abiotic stress and hormonal treatments. Notably, MtBGLU21, MtBGLU22, MtBGLU28, and MtBGLU30 in cluster I were dramatically activated by NaCl, PEG, IAA, ABA, SA and GA3 treatments. Conclusion: Collectively, our genome-wide characterization, evolutionary analysis, and expression pattern analysis of MtBGLU genes suggested that BGLU proteins play crucial roles in response to various abiotic stresses and hormonal cues in M. truncatula. This systematic analysis provided valuable information for the functional characterization and utilization of these MtBGLU genes in improving stress tolerance in M. truncatula and/or other plant species.


Background
Plants are sessile organisms that can not escape from the adverse circumstances as well as herbivore or pathogen attack. As a result, plants have evolved to protect themselves and adapted to various biotic and abiotic stresses by collectively synthesizing a plethora of defense compounds [1]. Many of these compounds are in glucosylated forms, such as cyanogenic-, isoflavone-, and hydroxamic acidglucosides. Glucosylation could increase the solubility and stability of these compounds, so as to 3 make them suitable for storage in the vacuole or organelle, and further protect plants from deleterious effects on its own defense system [2,3]. These glucosylated compounds are activated by a group of enzymes called β-glucosidases (EC 3.2.1. 21), which belong to the glycoside hydrolase (GH) family with 166 subfamilies. Among them, β-glucosidases catalyze the hydrolysis of terminal, nonreducing β-D-glucosyl residues to release β-D-glucose, and they are categorized into glycoside hydrolase families GH1, GH3, GH5 or GH116 [4]. Among these four subfamilies, GH1 is the largest one and most plant β-glucosidases have been characterized from GH1 so far.
Hydrolysis of the β-D-glycosidic bond by β-glucosidases (BGLUs) of GH1 involves two steps (glycosylation and deglycosylation) with retention of anomeric configuration. This process requires participation of two well-conserved active glutamate residues that located in the characteristic and highly conserved peptide motifs TF/LNEP (acid/base catalyst) and I/VTENG (nucleophilic) [5]. These two glutamate residues are considered to be the key amino acids for β-glucosidase [6,7]. Furthermore, it has been shown that most BGLU enzymes are targeted to plastid, vacuole, cytosol or apoplast, and they are accumulated mainly in seedlings and young tissues of plants [8,9].
Several BGLU enzymes were elucidated to be associated with diverse biological functions, including cell wall remodeling, lignification and plant secondary metabolites [10,11]. Os4BGlu12 from Oryza sativa has been shown to hydrolyze cell wall β-glucan-derived oligosaccharides during development or wounding [12]. In Arabidopsis thaliana, two stem-specific β-glucosidases BGLU45 and BGLU46 involved in the hydrolysis of monolignol glucoside coniferin, which is the storage form of monolignols.
Glucoside coniferin in Arabidopsis could be used when lignins need to be synthesized de novo resistance to various stresses [13]. In Glycine max, the release of free isoflavones from their conjugates is catalyzed by GmICHG, a BGLU enzyme exclusively localized in the cell wall of root hairs [2].
More importantly, substantial evidences also indicate that BGLU genes play important roles in the activation of defense chemicals response to stresses, and in the release of active phytohormones [14]. For example, AtBGLU26 has been demonstrated to be important to plant defense against broadspectrum fungi in Arabidopsis [15]. AtBGLU23 from Arabidopsis is an ER body-localized β-glucosidase, 4 which is implicated to be involved in the establishment of symbiosis with the endophytic fungus Piriformospora indica, by preventing it from overgrowing in roots and triggering a defense response [16]. In white spruce, expression of Pgbglu-1 led to the accumulation of acetophenone aglycons, which was proved to be a constitutive defense mechanism against spruce budworm [17]. In Arabidopsis, the production of active ABA from ABA-GE under abiotic stress conditions were catalyzed by β-glucosidases AtBGLU18 and AtBGLU33 [18,19]. In addition, a few BGLUs in rice have also been demonstrated to hydrolyze salicylic acid glucoside [20], and gibberellic acid glucose ester [21].
In addition to their hydrolysis activity, a few BGLUs also showed transglucosidase activities in anthocyanin synthesis in carnation or delphinium [22], which indicates that BGLU also have the function of synthesizing gluco-conjugates in plant. All these studies suggest that BGLUs activated storage forms of glyco-conjugates against various stresses, and the glycosylation and deglycosylation process provides an efficient system to modulate the homeostasis of plant metabolism compounds.
As the genome of many plant species has been sequenced, the GH1 family members have been identified in plants, such as 47 BGLU genes in A. thaliana [23], 40 in O. sativa [24] and 26 in Z. mays [25]. As a fast-emerging legume model plant, M. truncatula showed the advantages of small diploid genome, self-pollination, short growing period and high genetic transformation efficiency. More importantly, the genome of M. truncatula has been sequenced [26] and the gene expression atlas (MtGEA) has also been publicly available, which makes the functional genome studies convenient.
However, members of the BGLU family in M. truncatula have hardly been investigated and their functions remain unclear, except four MJ-inducible BGLUs that were functionally characterized and shown to be active towards (iso)flavonoid glycosides [27]. It is, therefore, important to take an indepth understanding of how M. truncatula regulates individual BGLU members in response to developmental and environmental signals, especially abiotic stresses and hormones.
In the present study, we identified and characterized all 51 MtBGLU genes from GH1 family and divided them into five main clusters. The comprehensive analyses on these MtBGLUs including sequence features, phylogenetic relationships, gene structures, protein motifs, chromosomal localization, synteny analysis, and cis-acting elements. We also investigate the tissue-specific 5 expression and dynamic expression patterns of MtBGLUs in response to various abiotic stresses and hormones. This study provides valuable clues for further investigations aiming at the functional characterization of MtBGLUs gene family members, and their utilization for the genetic improvement of legume plants for resistance against environmental stresses.

Identification of BGLU genes in the M. truncatula genome
A total of 51 candidate MtBGLU genes were obtained by homology search and domain confirmation, and they were designated as MtBGLU1 to MtBGLU51 based on their location on the chromosome, but four of them (MtBGLU48-51) were not assigned to any of the eight chromosomes (Table S2).
To avoid potential confusion, information on four previously published MtBGLUs are provided in Table   S3. Physico-chemical properties of MtBGLU genes, including amino acid numbers, molecular weights, signal peptides, isoelectric points, GraVy, N-gly site and possible subcellular localization were listed in Table 1. The majority of MtBGLU proteins, except 9 MtBGLUs (MtBGLU5, 15, 16, 17, 18, 25, 45, 46 and 48), were predicted to have signal peptides ranging from 17 to 33 amino acids in length, which would target them to the secretary pathway. The lengths of the predicted precursor proteins varied from 315 aa (MtBGLU49) to 1030 aa (MtBGLU33), which correspond to protein molecular weights of 35.8 to 115.7 kDa. The length of mature polypeptides vary from 286 to 617 amino acids, corresponding to MW 32.3 to 70.3 kDa. All MtBGLUs contain one to five N-linked glycosylation sites. Isoelectric points of the predicted proteins ranged from 5.30 to 9.53. The GRAVY ranged from -0.578 to 0, indicating that these MtBGLU proteins are all hydrophilic proteins. The predicted subcellular locations showed that all BGLU proteins from M. truncatula were located in the vacuole or chloroplast, except MtBGLU33 that was located in the extracellular or Golgi.

Multiple sequence alignment, phylogenetic analysis, and classification of MtBGLU genes
The phylogenetic relationship of the MtBGLU proteins was examined by multiple sequence alignment analysis, which showed high conservation among each other (Fig. S1). All MtBGLU protein sequences contain the key amino acids involved in enzymatic catalysis: catalytic acid/base (WI/T/VTF/L/VNEP) and nucleophilic glutamate residues (ITENG), except for MtBGLU49 that either is a non-functional 6 enzyme or has a catalytic mechanism different from all the other MtBGLUs. Sequence-based phylogenetic analysis between Medicago and Arabidopsis showed that these proteins were grouped into 7 distinct clusters . Among these 51 MtBGLU proteins, 25 belong to cluster I, 8 to cluster III and 6 to clusters II, IV and V, respectively (Fig. 1). Clusters I, III, IV and V contain members from Medicago and representatives from Arabidopsis, clusters AtI and AtII with members only from Arabidopsis, and cluster II with only members from Medicago.

Gene structure, conserved domain and motif pattern of MtBGLU genes
In order to better understand the evolution of the GH1 family members in M. truncatula, we conducted the exon/intron structures of all the identified MtBGLU genes. As shown in Fig. 3B, all MtBGLU genes possessed 6 to 17 exons (five with 10 or less exons, twelve with 11 to 12 exons, twenty five with 13 exons, and nine with 14 or more exons). For analysis of the BGLU domain, the shortest conserved amino acid length is 223 aa for MtBGLU49, the longest is 503 aa for MtBGLU47, and the average length of conserved amino acid among MtBGLUs is 457 amino acid (Fig. 2B, Fig. S2).
To identify the conservative structure of MtBGLU proteins, 10 motifs were constructed through the MEME motif analysis, it showed that the majority of the MtBGLU proteins (88.2%) contained all these 10 motifs (Fig. 2C). The lengths of these conserved motifs ranged from 13 to 50 amino acids (Fig. S3).

Chromosomal distribution and synteny analysis of MtBGLU genes
The MtBGLU genes were unevenly distributed on seven chromosomes (chr1-7) except chromosome 8 ( Fig. 3). In addition, one gene (MtBGLU48) localized on scafford0108 and three (MtBGLU48M tBGLU51) on scafford0110 (Fig. 3). Around one third of them were present on chromosome 4, and only two or three genes located on chromosome 1, 5 and 6.
By performing MCscan, we defined the tandem duplication and segmental duplication of MtBGLU genes in M. truncatula genome, which contribute to the formation of gene family during the process of evolution. Five MtBGLU gene pairs (MtBGLU8/9, MtBGLU15/16, MtBGLU33/34, MtBGLU43/44 and MtBGLU49/50/51) were identified as tandem duplication, and they were localized on chromosomes 2, 3, 4, 7 and scaffold 0110. In addition to tandem duplication events, two segmental duplication events with four MtBGLU genes (MtBGLU8/26 and MtBGLU32/35) located on chromosome 2, 4, 5 were also identified. All above results inferred that both tandem duplication and segmental duplication events played an important driving force for MtBGLU gene evolution, and the former played a predominant role.
Furthermore, three comparative syntenic maps of M. truncatula associated with two representative plant species Arabidopsis (dicot) and rice (monocot) were constructed to illustrate the evolution mechanism of MtBGLU gene family (Fig. 4). A total of 4 MtBGLU genes showed a syntenic relationship with those in Arabidopsis and rice, respectively (Additional file 1). Seven and five orthologous pairs were found between Arabidopsis and rice, respectively. MtBGLU1 and MtBGLU33 were found to be associated with two collinear gene pairs in Arabidopsis, which implied that these two genes may play an important role among GH1 gene family during evolution. Besides, another two syntenic pairs (MtBGLU1 and MtBGLU35) were identified between M. truncatula and A. thaliana/O. sativa, which indicated that these collinear pairs may have already existed before the divergence of ancestral genes.
In addition, Ka/Ks analysis of the MtBGLU gene pairs were analyzed in order to better understand the effect of evolutionary stress on the formation of MtBGLU gene family (Additional file 1). All tandem and segmental duplicated MtBGLU gene pairs, and the orthologous MtBGLU gene pairs had Ka/Ks value of less than 1, and these results clearly indicated that MtBGLU genes of the GH1 family might have undergone strong purifying selective pressure during evolution.

Analysis of cis-acting elements in the promoter regions of MtBGLU genes
To further explore potential regulatory mechanism of MtBGLU gene under hormone and stress responses, the 2.0-kbp upstream promoter regions of MtBGLU genes were submitted into PlantCARE to scan for the cis-acting elements. Two types of cis-acting elements, phytohormone responsive and 8 abiotic and biotic stress-responsive elements, were detected and presented in Fig. 5. Firstly, ten hormone-responsive elements were widely presented in their promoter regions, including TGAelement, AuxRR-core (auxin responses), GARE, P-box, TATC-box (gibberellin responses), TGACG-motif, CGTCA-motif (MeJA responses), ABRE (Abscisic acid responses), ERE (ethylene responses) and TCAelement (salicylic acid responses). Seventy-five of these elements were ABRE, ERE, TGACG-motif and CGTCA-motifs, among the phytohormone responsive clusters. Secondly, six abiotic and biotic stressresponsive elements were detected, namely TC-rich repeats, W-box (stress responses), WUN motif (wound responses), MBS (drought inducible), LTR (low-temperature responses) and ARE (anaerobic induction) (Fig. 5). Additionally, many light-responsive regulatory elements and MYB binding sites were found in the promoter regions of majority of the MtBGLU genes (Additional file 2). Various cisacting elements identified in the promoters of MtBGLU genes implied that they might be involved in the response to various stresses and hormone treatments by participating in distinct regulatory processes.

Analysis of the transcription levels of MtBGLU genes from microarray data
BGLUs are known to play an important role in plants' response to environmental stresses [10,11].
However, few BGLUs were documented to be involved in these stresses in M. truncatula [27]. In the present study, seven genechip data sets of M. truncatula were retrieved from MtGEA Web Server, including salinity, drought, limit N, bacteria and fungus as well as YE, MeJA and NAA treatments. A total of 82 probes corresponding to 36 MtBGLU genes (70.6%) were identified. One representative probe for each gene was selected for expression analysis, and the expression level of the representative probes are relatively close to the average value of several probes (Additional file 3).
Our expression analyses indicated that many MtBGLU genes were induced in response to these stresses, especially under NaCl, PEG and NAA treatments (Fig. 6A counted the number of treatments for each MtBGLUs that were up-regulated by more than two fold, which is consistent with the above results for the six genes (Fig. 7).
Because of the close relationship between gene expression level and gene function, the expression profiles of MtBGLUs in eight tissues (root, shoot, leaf, vegetative bud, stem, petiole, 20-day-old seed, flower and pod) from microarray were investigated (Fig. 8A, B and Additional file 3). It showed that MtBGLU genes showed different transcription levels in various tissues. Most MtBGLU genes in cluster I had specifically low transcription level in roots; instead, genes in cluster II were highly expressed in all other tissues except roots. MtBGLUs in other three clusters (III, VI and V) had distinct expression patterns in different tissues, without obvious consistency. In general, the expression profiling of MtBGLU genes varies greatly, suggesting they might be involve in different functions.

Validation of the expression profile of MtBGLU genes by qPCR
Among all 51 MtBGLU genes, ten of them (MtBGLU14, 16,18,19,21,22,26,28,30 and 34) were highly induced by various stresses, and two of them (MtBGLU43 and MtBGLU44) were relatively highly expressed in various tissues (Fig. 7, 8B), based on available microarray dataset. To valid this result, these 12 representative MtBGLU genes were further validated by qPCR analysis.
The expression levels of these 12 representative MtBGLU genes were carried out with six another set of tissues: stems, roots, leaves, flowers, pods and seeds (20-day-old). It was revealed that MtBGLU28,  19,21,22,26,28 and 30 were significantly up-regulated to a certain level, which is consistent with the above microarray data (Fig.   7). MtBGLU18 and MtBGLU44 were obviously down-regulated during the late treatment stages.
However, MtBGLU16, MtBGLU34 and MtBGLU43 seemed to have no significant response under these stresses before 72 h of NaCl treatment (Fig. 9).
In IAA treatment, the expression levels of six MtBGLUs were greatly activated, including MtBGLU14, 19,21,22,28  Collectively, qPCR analyses of 12 representatives MtBGLU genes during various treatments were strongly paralleled with their expression pattern obtained from genechip data (Fig.7, Fig. S4). qPCRs analyses also confirmed that four MtBGLU genes (MtBGLU21, 22, 28 and 30) were frequently and strongly activated by various treatments, and they are most likely key BGLU genes involved in response to stress and hormone stimuli in M. truncatula. Discussion β-glucosidases play diverse and important roles in response to developmental and environmental cues in plants, including roles in recycling of cell-wall oligosaccharides, lignification, activation of phytohormones, defense, secondary metabolism [10,14]. It is worth noting that BGLUs participates in most of the above processes by an effective mechanism that hydrolyze (activate) inactive and stable glucoside compounds to defense plant against various stresses. So far, the genome-wide investigation of GH1 gene families have been carried out predominantly in model plants A. thaliana, O. sativa and Z. mays [23][24][25]. However, no comprehensive characterization of stress-or hormone-responsive BGLU was reported at gene family level. In the current study, therefore, a search for BGLU genes in the M.
truncatula genome brought about the identification of 51 BGLU gene members, which were characterized by a set of sequence analyses (phylogenetic tree, gene structure and motifs composition, gene distribution and synteny relationship), as well as gene expression patterns in response to various abiotic/biotic stresses and hormone treatments.
Multi-sequence alignments revealed that all deduced MtBGLU protein sequences shared high sequence identity with each other (Fig. S1) Results from MEME analysis also elucidated that most BGLU family members in either Medicago or Arabidopsis contain all 10 motifs (Fig. S5). Therefore, all other 50 MtBGLUs (except MtBGLU49) with appropriate active glutamic acids and typical sequence length seemed to possess the potential to produce catalytically active β-glucosidases, and exhibited a feature with highly conserved protein during evolution. And it is meaningful to deepen our knowledge on how plants regulate these individual members by releasing metabolites to modulate particular physiological process.
Further, subcellular localization and the conditions under which BGLUs come into contact with their physiological substrates determine their biological function. Some BGLUs may be targeted to specialized cellular organelles, like endoplasmic reticulum bodies in Brassicaceae [28]. The cellular locations of others are supposed to be standard cell compartments or apoplast, although it may be specific regions of a compartment, such as specific sections of the cell wall or peroxisome [15,29]. In M. truncatula, each BGLU member (except MtBGLU33) has dual targeting to either vacuole or chloroplast,. Besides, each BGLU member contains at least one predicted N-glycosylation site, and the process of glycosylation facilitates the storage and transport of inactive enzymes, which correspond to our prediction that a large proportion of them catalyze the hydrolysis through the secretary pathway (Table 1) including myrosinases in cluster At I and ER body-localized BGLUs in cluster At II. Actually, ortholgous genes of these two clusters also present in other plants of Brassicales order, but not any closely related Medicago counterparts (At I ~ II in Fig. 1), indicating they appear to have diverged before Arabidopsis and Medicago. Similarly, six Medicago members in cluster II have no any closely related Arabidopsis homologs, indicating that these genes have diverged before Arabidopsis and Medicago.
By combining gene expression, phylogenetic and synteny analysis, some valuable knowledge was acquired about the biological function of MtBGLU genes that involved in specific physiological process.
For example, MtBGLU43 exhibited the same expression profile to its orthologs MtBGLU44, which showed relatively high expression levels in various tissues except for roots, indicating that they may be involved in important and specific physiological processes of the aerial part in Medicago.
MtBGLU16 was specially expressed in roots with extremely low expression levels in other detected tissues, which is consistent with its ortholog MtBGLU15 with the highest expression level in roots.
Interestingly, their closest ortholog gene was AtBGLU42 in Arabidopsis, which is expressed Consisted with previous studies in other plant species [14], our current research showed that several MtBGLU genes were differentially expressed following various abiotic stresses and hormone treatments, highlighting the diverse roles of BGLU genes in plants defense. It is noteworthy that four genes (MtBGLU21, MtBGLU22, MtBGLU28 and MtBGLU30) in cluster I are strongly induced not only by abiotic stresses but also by hormone treatments, indicating their important role against stresses in Medicago GH1 family (Fig. S4). Coincidentally, MtBGLU30 (MtG2) has been reported to be involved in turnover of formononetin glucoside during wound signal-induced medicarpin synthesis in M.
In addition, cis-acting elements function as important molecular switches associated with transcriptional regulation of gene expression controlling various biological processes [45]. In the promoter regions of MtBGLU genes, several phytohormone-and stress-responsive regulatory elements were identified, which may be interacted with transcription factors to activate stress tolerance and produce chemical defense compounds in Medicago. In the promoter region of MtBGLU26, nine ABRE elements potentially involved in ABA regulation was identified, which might be associated with activation of MtBGLU26 under ABA treatment. The identified regulatory elements in Medicago will help in understanding their roles in various abiotic and biotic stresses. Overall, this study provides insight into potentially functional roles of MtBGLU genes during developmental and environmental signals, and further experiments will be useful for identification and characterization of key MtBGLU genes, such as protein structure, enzyme activity assay, subcellular localization and gene functions by analyzing loss-of-function mutants and transgenic plants.

Conclusion
In the present study, a comprehensive investigation of GH1 β-glucosidases in M. truncatula were  (Table S2).

Phylogenetic analysis and classification of the MtBGLU genes
The amino acid sequences of 47 BGLUs derived from Arabidopsis and 51 newly identified MtBGLUs were used for phylogenetic analysis. All of these sequences were firstly aligned by using ClustalX with the default parameters. Subsequently, an unrooted neighbor-joining phylogenetic tree was constructed based on the neighbor-joining method (with 1000 bootstrap replicates) using MEGA-X software.

Analysis of cis-acting elements in the promoter region of MtBGLU genes
The cis-acting elements were predicted from the 2 kb upstream promoter sequences of the MtBGLU genes which were uploaded to PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [50].

Analysis of the expression levels of MtBGLU genes from microarray data
We download all the Gene Chip data from M. truncatula Gene Expression Atlas (https://Mtgea.noble.org/v3/), which has been developed as a compendium or "atlas" of gene expression profiles for the M. truncatula genes. The selected genechip data covered all its major organ systems (roots, nodules, stems, petioles, leaves, vegetative buds, flowers, seeds and seed pods), from plants subjected to various abiotic and biotic stresses, and specific cell and tissue types.
Amazing HeatMap software [48] was used to generate the heatmap.

Plant materials and treatments
The M. truncatula (cv. Jemalong A17) used in this study was initially obtained from the Noble

Ethics approval and consent to participate
Not applicable.

Consent for Publication
Not applicable.

Availability of data and material
All data of the current study is available in the manuscript.

Competing interests
The authors declare that they have no competing interests.

Table S1
List of primers used in this research.

Table S2
List of all MtBGLU genes identified in the M. truncatula genome.
28 Table S3 MtBGLU genes and their functions as identified and characterized in previous reports.     The relative expression levels are log2-transformed and visualized for heatmap. The colors vary from blue to red, and circle from small to large representing the scale of relative expression levels. c.
Presence of treatments in which each MtBGLU gene was up-regulated by more than two fold. Different colors represent different treatments. There were only two replicates for NaCl and drought treatments, and each of them is calculated as 0.5.  Expression profiles of 12 representative MtBGLU genes in response to abiotic stress treatments. Data were normalized to actin-related protein 4A gene and vertical bars indicate standard deviation.
Asterisks indicate the corresponding gene significantly up-or down-regulated compared with the untreated control (*P < 0.05, **P < 0.01, Duncan's t-test) Figure 10 Expression profiles of 12 representative MtBGLU genes in response to different hormonal treatments.
Data were normalized to actin-related protein 4A gene and vertical bars indicate standard deviation.
Asterisks indicate the corresponding gene significantly up-or down-regulated compared with the untreated control (*P < 0.05, **P < 0.01, Duncan's t-test)

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.