Genome-wide bioinformatics analysis of Cellulose Synthase gene family in common bean (Phaseolus vulgaris L.) and the expression in the pod development

CesA and Csl gene families, which belong to the cellulose synthase gene superfamily, plays an important role in the biosynthesis of the plant cell wall. Although researchers have investigated this gene superfamily in several model plants, to date, no comprehensive analysis has been conducted in the common bean. In this study, we identified 39 putative cellulose synthase genes from the common bean genome sequence. Then, we performed a bioinformatics analysis of this gene family involving sequence alignment, phylogenetic analysis, gene structure, collinearity analysis and chromosome location. We found all members possess a cellulose_synt domain. Phylogenetic analysis revealed that these cellulose synthase genes may be classified into five subfamilies, and that members in the same subfamily share conserved exon-intron distribution and motif compositions. Abundant and distinct cis-acting elements in the 2 k basepairs upstream regulatory regions indicate that the cellulose synthase gene family may plays a vital role in the growth and development of common bean. Moreover, the 39 cellulose synthase genes are distributed on 10 of the 11 chromosomes. Additionally expression analysis shows that all CesA/Csl genes selected are constitutively expressed in the pod development. This research reveals both the putative biochemical and physiological functions of cellulose synthase genes in common bean and implies the importance of studying non-model plants to understand the breadth and diversity of cellulose synthase genes.


Background
In plants, there exists a cellulose synthase superfamily including CesA (cellulose synthase) and Csl (cellulose synthase-like) gene family, both of which belong to the glycosyltransferase GT2 family and have a similar protein sequence structure. The encoded proteins all have glycosyltransferase activity and are key enzymes essential for cellulose and hemicellulose synthesis [1,2], which are main components of the primary and secondary cell wall.
The cellulose synthase (CesA) gene was first identified from cotton fibers according to its sequence similarity with a bacterial CesA gene [3]. Subsequently, cellulose synthase genes were identified in Arabidopsis [4], rice [5], and maize [6], barley [7]. The CesA family contains a conserved motif (DDDQxxRW) and a zinc-finger domain [8]. In Arabidopsis thaliana, AtCesA1, AtCesA3, and AtCesA6 form a cellulose synthase complex and participate in the synthesis of the primary cell walls. Whereas, AtCesA4, AtCesA7, and AtCesA8 mediate in the synthesis of the secondary cell walls. It is generally accepted that despite the involvement of most CesA in the synthesis of the primary and secondary cell wall [9][10][11], AtCesA2, AtCesA5, AtCesA9 are considered homologous proteins of AtCesA6, and these proteins are functionally redundant with each other.
Not only a major source of protein and essential nutrients, common bean (Phaseolus vulgaris L.) is also an important crop to society and the global economy [27]. However, the CesA/Csl gene families in common bean have not yet been extensively explored. Molecular biology, genomics, and computational biology have transformed the field of biology, gene discovery and functional gene annotation in plant genome-wide data is a rapidly growing research area. Considering the critical role of CesA/Csl in both the integrity and function of plants, we present a comprehensive phylogenetic and functional bioinformatics analysis of the CesA/Csl gene family in common bean. Then, using quantitative real-time polymerase chain reaction (qRT-PCR) analysis on genes identified in our computational pipeline, we validate the main genes central for the development of legume pods. These findings shed new light on the relationship between CesA/Csl function and the development of common bean. Furthermore, this research presents a theoretical framework for gene cloning and expression in the future, with the application of genetically improving the common bean through breeding.

Identification of cellulose synthase genes in common bean
In order to identify the cellulose synthase gene family of common bean, first, the Hidden Markov Model of 40 Arabidopsis cellulose synthase proteins was constructed, then the model was used as queries to search against the common bean protein databases with the BLASTP program at an e-value threshold of 10-10. Then, we searched for the cellulose synthase gene family of common bean using the constructed model and finally a total of 39 sequences can matched to CesA/Csl superfamily. 14 gene members contained a cellulose synthase domain (CS) and zinc finger structure (zf-UDP), 25 gene members only harbored a CS domain. The identified cellulose synthase proteins were named according to the order of their subfamilies and gene IDs.
These putative cellulose synthase genes in this analysis were predicted to range from 467 to 1374 amino acids in length and 53.34 kDa to 155.53 kDa in molecular weight. Furthermore, the protein isoelectric points (pIs) ranged from 5.62 to 9.05, the number of predicted TMHs ranged from 0 to 13. The subcellular localization of the putative cellulose synthase genes was predicated to be located in the membrane bound golgi and plasma membranes PHAVU_005G116500g, which were exist in extracellular (secreted) ( Table 1).

Phylogenetic analysis of cellulose synthase gene in common bean
A phylogenetic analysis was used investigate the evolutionary relationships among cellulose synthase proteins. Constructed with cellulose synthase proteins from Arabidopsis and 39 from common bean, the phylogenetic analysis showed that 15 putative cellulose proteins from common bean belong to the CesA family, while the remaining 25 cellulose synthase proteins are members of the Csl (B, D, E, and G) family (Fig. 1). CslD is close to CesA, while CslG is distantly related to the other families.

Gene structure analysis of cellulose synthase gene in common bean
Exon-intron structures of each CesA/Csl gene were constructed through the sequence alignment of their corresponding genomic DNA. Based on the phylogenetic analysis, putative CesA/Csl genes' exon/intron structures in common bean were organized into five subgroups (Fig. 2). CesA/Csl genes in the same subgroup had conserved exon/intron structures, while genes in different groups exhibited distinct gene structures. And we found that CesA gene members had the most introns, while the CslD gene members had the fewest number of introns.

Conserved motif domains of CesA/Csl gene in common bean
To evaluate the structural diversity of cellulose synthase proteins, we used the online program MEME (http:// meme. sdsc. edu/ meme/ cgi-bin/ meme. cgi) to search for conserved motifs in putative cellulose synthase protein  (Fig. 3). We identified 15 conserved motifs, and found that members in the same subfamily shared similar conserved motifs (Fig. 3). In addition, members in the CesA and CslD groups contained more motifs than members in CslB,, CslE and CslG groups (except PHAVU_005G116500).

Promoter regions analysis of CesA/Csl genes
To identify the cis-elements in the promoters of CesA/Csl genes in common bean, the 2000 bp basepairs upstream of the start codon of each gene were analysed using Plant-CARE online (http:// bioin forma tics. psb. ugent. be/ webto ols/ plant care/ html/). The results showed that abundant cis-elements were present in the promoters of CesA/ Csl genes (except PHAVU_L008300). CAAT-box and TATA-box were the most abundant elements. TATA-box, a core promoter element, which located in about 30 bp upstream of the transcription start site, while CAATbox is a common cis-acting element in promoter and enhancer regions. Also present were MYB transcription factor binding sites (TAA CCA ), light response element Box 4, and stress response elements, including MYC (in response to drought stress) and ARE (related to anaerobic stress). In addition, hormone response elements, ERE and ABRE, were observed, which respond respectively to ethylene and abscisic acid. The cis-acting elements had diverse functions and abundant types, indicating that the cellulose synthase gene family may plays an important role in the growth and development of common bean (

Chromosome location of CesA/Csl genes in common bean
Then, we investigated the chromosome distribution of the 39 cellulose synthase genes using the physical  The comparative synteny relationship map of Phaseolus vulgaris revealed a high degree of similarity with Glycine max (Fig. 6A) and a low degree of similarity with Arabidopsis thaliana (Fig. 6B).

Expression Profiles of CesA/Csl genes in common bean pod development
To investigate the functions of CesA/Csl genes in common bean pod development, we used RT-qPCR with gene-specific primers ( Table 2) to analyze the expression levels of CesA/Csl genes at three distinct pod developmental stages. A total of 21 CesA/Csl genes included in this analysis were selected based on results from the sequence alignments, phylogenetic analysis, and gene structure analysis. All CesA/Csl genes (seven CesA, four CslD, four CslB, two CslG and four CslE) were expressed at all three stages of pod development, suggesting their important roles in the development of the pod in common bean (Fig. 7A, Fig. 7B).
The expression of 7 CesA genes were evaluated and the results showed that these genes in CesA subfamily diaplayed temporal variations in different pod development of common bean. The expression of 5 CesA genes showed a trend of first increasing in stage S2 and then decreasing in stage S3 (PHAVU_005G022100g, PHAVU_009G094200g, PHAVU_002G188600g, PHAVU_004G093300g and PHAVU_009G205200g), while the expression of PHAVU_003G154600g showed an opposite trend (Fig. 7A). Moreover, the expression level of PHAVU_007G190300g decreased with the pod development (Fig. 7A).
In Csl genefamily, we found that all Csl genes selected were expressed at all three stages of pod development. PHAVU_001G211000g, PHAVU_011G020100g in CslD subfamily, PHAVU_005G020100g in CslB subfamily showed similar expression pattern: increasing in the S2 stage and decreasing in the S3 stage, which PHAVU_002G188600g, PHAVU_004G093300g and PHAVU_009G205200g in CesA subfamily (Fig. 7B). PHAVU_002G136300g in CslD subfamily, PHAVU_009G2260000g in CslG subfamily and PHAVU_L008300g in CslE subfamily diaplayed similar expression pattern: decreasing in the S2 stage and slight increasing in the S3 stage. Whereas PHAVU_008G279800g and PHAVU_006G058700g in CslE subfamily showed similar expression trend, but the expression level of PHAVU_008G279800g and PHAVU_006G058700g in S3 stage significantly higher than the expression level in S1 stage. In addition, the expression level of PHAVU_005G1163000g in CslB subfamily decreased with the development of pod, the expression level of PHAVU_011G101500g in CslB subfamily and PHAVU_003G290600g in CslG subfamily increased only in the S3 stage, the expression level of PHAVU_003G023000g in CslD subfamily significantly increased in the S2 and S3 stages (Fig. 7B).

Discussion
Until now, the CesA/Csl gene family has been extensively characterized in many plant species, including Arabidopsis, barley, cotton, rice, sorghum, soybean [11,[28][29][30][31][32]. However, this gene family remain unidentified and uncharacterized in common bean. In this study, we conducted a genome-wide survey and identified 39 putative CesA/Csl genes in common bean genome ( Fig. 1 and Table 1). This results coupled with the sequence alignment, phylogenetic analysis, gene structure construction, chromosome location and expression analysis, could provide important clues in understanding the roles of the CesA/Csl superfamily in in pod development in higher plants.
The CesA/Csl gene family found across plant species may be subcategorized into nine groups: CslA-CslH and CslJ [2,33]. All land plants contain CslA, CslC, and CslD, while CslF and CslH are found only in grasses, and cereals do not usually contain CslB or CslG [34,35]. Using phylogenetic analysis, the 41 CesA/Csl genes in Arabidopsis were categorized into one CesA group and six Csl groups (Csl A-E and G) [11]. In this study, the phylogenetic analysis showed that the 39 putative CesA/ Csl genes in common bean could be classified into 5 subfamilies: CesA, CslD, CslB, CslG, and CslE (Fig. 1), consistent with studies of plants and algae [1,35,36]. From the phylogeny we can found that CslD is close to CesA, which is consistent with the earlier reports suggesting a common origin and conserved domians of this two families [37].
Among them, 15 putative Cellulose Synthase genes clustered into the CesA gene family, which was the most abundant genes among the 40 CesA/Csl genes, while the remaining 24 genes clustered into the other 4 CesA/Csl subfamilies (Fig. 1), suggesting that they have experieced extensive expansion and diversification [33].
Investigation of gene structure and function lends a better understanding of the evolution of a gene family, revealing the divergence, conservation, or expansion of a given gene family [32,[38][39][40]. Similar to other plants, such as soybean [32] and tomato [31], most CesA/Csl genes (CesA and CslD members) share a similar gene structure in each subfamily (Fig. 2), suggesting that they are highly conserved. In contrast, members in CslB, CslG and CslE subfamily exhibit variable gene structures possibly due to chromosome fusion and/ or rearrangement [40]. Therefore, tandem or segmental duplication events in the CesA/Csl gene family have resulted in shared exon/intron structures and similar structural organization in each gene subfamily. Phylogenetic and domain analyses confirm these results. Chromosome mapping in this study further revealed that the tandem duplications also existed in CesA/Csl gene families (Fig. 5).
The cis-elements analysis detected a larger amount of cis-elements in the putative promoter regions of the CesA/ Csl genes in common bean, which suggested that CesA/ Csl genes might have potential roles in many signaling pathways.
CesA/Csl genes have been found to play an important role in plant cell walls in the biosynthesis of cellulose and hemicellulose [7,16,41]. During the pod development of common bean, the expression profiles of 21 CesA/Csl genes were revealed by RT-qPCR. The results showed that all 21 CesA/Csl genes were expressed in all three pod development stages, suggesting that all these genes are necessary for the pod growth. Most CesA genes in this study expressed highly in the young pod (S2 stage), which is in accordance with the results found in soybean [32], suggesting that these CesA genes may be involved in cellulose synthesis during the early pod development stage in common bean. We also found that 3 genes in CslD subfamily showed high expression level in the early pod development stage. And the expression level of genes in CesA and CslD subfamilies is higher than that of other Csl genes, which implies that gene members in CesA and CslD subfamilies are more actively involved in seed development than other Csl genes. Therefore, future investigation should aim to identify each CesA/Csl gene's function in common bean.

Conclusions
Based on the genomic data, 39 cellulose synthase genes were identified from common bean. The genes encoding these proteins were distributed unevenly on 10 chromosomes, and there were 3 tandem duplications. These 39 cellulose synthase proteins could be divided into five subfamilies according to their structure and phylogenetic relationship, members in the same subfamily share conserved exon-intron distribution and motif compositions. Based on the analysis of cis-element in the promoter region, we found abundant and distinct cis-acting elements, which indicate that the cellulose synthase gene family plays a vital role in the growth and development of common bean. Additionally transcriptional analysis showed that 21 CesA/Csl genes selected were constitutively expressed in the pod development, CesA/Csl gene members in different groups showed different expression trend at three stages of pod development. In general, this study revealed a putative biochemical and physiological functions of cellulose synthase genes in common bean, which provides a foundation for further function identification of CesA/Csl gene family.

Identification of CesA/Csl gene family in common bean
The Hidden Markov Model (HMM), which established by 39 CesA/Csl protein sequences of Arabidopsis, was used to search the CesA/Csl gene family in common bean genomes at an e-value cutoff of 1e −10 . To ensure genes identified with HMM model were accurate, further filtering of unique sequences was performed according to typical structural features of plant CesA/Csl proteins. The Phytozome 11.0 (https:// phyto zome. gji. doe. gov/) and ExPASy databases (https:// web. expasy. org/ compu te_ pi/) were used to obtain gene ID / name, chromosome location, peptide length and isoelectric point/ molecular weight, and functinal annotation information [42]. TMHMM v. 2.0 was used to predict the TMDs for each putative peptide (http:// www. cbs. dtu. dk/ servi ces/ TMHMM/).

Sequence and phylogenetic analyses
ClustalW was used to perform alignments of both CesA/ Csl nucleotide and amino acid sequences. Including amino acid sequences of cellulose synthase proteins from Arabidopsis and common bean, the phylogenetic analysis was performed using a neighbor-joining tree method with 1000 bootstrap replicates in the software program MEGA5, which was also used to visualize the phylogenetic analysis. Protein subcellular location were analyzed by WoLF PSORT (http:// psort. nibb. ac. jp).
The chromosome distribution of all cellulose synthase genes of common bean was identified, and the location of CesA/Csl genes was drafted with MapChart v2.0 [44]. Phytozome 11.0 Network database (https:// phyto zome. jgi. doe. gov/) was also used to obtain genomic DNA and complementary DNA (cDNA) sequences of the putative cellulase synthase genes used in this analysis.

Cis-Element analysis of putative promoter regions and synteny analysis
Using the Phytozome 11.0 network database (https:// phyto zome. jgi. doe. gov/), 2 Kbp regulatory regions upstream from the start site of translation of CesA/ Csl genes were retrieved. Then, the PlantCARE online was used to investigate the putative cis-regulatory elements in these promoter region sequences. The location of cis elements was annotated and displayed in a figure by building a physical gene map using a Perl and Scalable Vector Graphics (SVG) script. Finally, syntenic relationships of all CesA/Csl were analyzed using Circoletto [45].

Expression analysis of CesA/Csl gene family in Phaseolus vulgaris
Bean seeds were grown in pots in the open-air soils. Pods were harvested at different stages: 7 (S1), 14 (S2), and 21 (S3) days after flowering. Then, the total RNA of these pods was isolated using the Promega Plant RNA Kit (Promega, Beijing, China) according to the manufacture's instructions. Single-stranded cDNA was synthesized using 2 μg of total RNA and Oligd(T)18 primer with the Takara RT-PCR system in a total volume of 25 μl according to the protocol. Three independent PCR reactions were carried out for the 63 putative genes using SYBR Green Supermix (Takara) according to the manufacturer's protocol in an ABI 7500 Real-time system (ABI, CA, USA). IDE (insulin degrading enzyme) was used as an internal control to normalize the expression of CesA/  To visualize the relative expression levels data, S1 stage was normalized as "1", and data are means ± SD calculated from three biological replicates. * indicate significant differences in comparison with the control at p < 0.05