Genome-wide analysis of the CalS gene family in cotton reveals their potential roles in fiber development and responses to stress

Callose deposition occurs during plant growth and development, as well as when plants are under biotic and abiotic stress. Callose synthase is a key enzyme for the synthesis of callose. In this study, 27, 28, 16, and 15 callose synthase family members were identified in Gossypium hirsutum, Gossypium barbadense, Gossypium raimondii, and Gossypium arboreum using the sequence of Arabidopsis callose synthase. The CalSs were divided into five groups by phylogenetic, gene structure, and conservative motif analysis. The conserved motifs and gene structures of CalSs in each group were highly similar. Based on the analysis of cis-acting elements, it is inferred that GhCalSs were regulated by abiotic stress. WGD/Segmental duplication promoted the amplification of the CalS gene in cotton, and purification selection had an important function in the CalS family. The transcriptome data and qRT-PCR under cold, heat, salt, and PEG treatments showed that GhCalSs were involved in abiotic stress. The expression patterns of GhCalSs were different in various tissues. We predicted that GhCalS4, which was highly expressed in fibers, had an important effect on fiber elongation. Hence, these results help us understand the role of GhCalSs in fiber development and stress response.

Cotton is an important economic crop in China which yield is affected by biotic and abiotic stresses, producing prevalent fibers for textile industry (Wang et al., 2020;Fang et al., 2014). Some researches have reported that cotton fiber elongation was related to callose deposition which may be involved in the closing of plasmodesmata, then promoted the fiber length (Ruan et al., 2004). It is possible that callose affects fiber elongation by controlling CalS. In light of the above, the CalS may play a significant role in cotton in responsing various stresses and promoting fiber elongation. A callose synthase gene, CFL1 was identified (Cui et al., 2001), however, the callose synthase gene family members, phylogenetic relationships and expression patterns in cotton are still unclear.
In the current study, we identified callose synthase genes in two cultivated allotetraploids cotton, G. hirsutum and G. barbadense, and their two putative genome donors, G. raimondii and G. arboreum, then discussed their phylogenetic relationships, conserved domains, gene structures, synteny, and cis-acting elements. We also focused on the expression patterns of GhCalSs in various tissues and their expression under abiotic stress. These findings provide a solid foundation for further study of the roles of CalSs in cotton fiber development and stress responses.

Plant materials and treatments
Upland cotton TM-1, planted in Anyang Institute of Technology, was subjected to salt stress (350 mM NaCl) and drought stress (12% PEG6000) when the seedling reached two weeks. The leaves were collected 0 h, 1 h, 3 h, 6 h, 12 h, and 24 h after treatment. CCRI45 and MBI7747 were planted on farms managed by Cotton Research of Chinese Academy of Agricultural Sciences in Anyang. Cotton fibers were collected at 5, 10, 15, 20, 25 days post-anthesis (DPA). All samples were stored at −80 • C.

Phylogenetic tree construction, gene structure, and motif analysis
The phylogenetic tree among four Gossypium species and Arabidopsis thaliana was constructed by MEGA7 (Kumar, Stecher & Tamura, 2016). It was constructed by the neighbor-joining (NJ) method, with 1,000 bootstrap replicates, then was drawn by using EvolView (He et al., 2016). TBtools was used to extract the location information of CalSs and visualize the gene structure. MEME (https://meme-suite.org/meme/tools/meme) was used to identify the conservative motif with the parameter set to the maximum number of motifs: 20.

Chromosome location and synteny analysis for CalSs
The locations of CalSs on chromosomes were shown by TBtools using four cotton species genomic annotation files (Chen et al., 2020). MCScanX was used to analyze the collinearity of the CalSs, that is, using CalS protein sequences to analyze the orthologous and paralogous gene pairs (Chen et al., 2020). Collinear gene pairs were visualized by using the circos (Chen et al., 2020). To investigate the selection pressure between homologous genes, we calculated the nonsynonymous substitutions rate (Ka) and synonymous substitutions rate (Ks) of homologous genes by KaKs_Calculator (Wang et al., 2010).

Analysis of Cis-acting element in promoters and functional enrichment analysis
The 2,000 bp sequence upstream of the translation initiation codon ATG of CalS gene was selected as promoter. The cis-acting elements contained in the promoter region of the CalSs were predicted using the PlantCare website (Lescot et al., 2002). For functional enrichment analysis, gene ontology (GO) analysis was performed using the OmicShare tool (https://www.omicshare.com/tools).

RNA isolation and qRT-PCR analysis
FastPure Plant Total RNA Isolation Kit (RC401, Vazyme) was used to extract RNA, and then we used 1µg to synthesize cDNA (HiScript III 1st Strand cDNA Synthesis Kit, R312 Vazyme). ChamQ Universal SYBR qRT-PCR Master Mix (Q711, Vazyme) was used for qRT-PCR in ABI 7500 Fast Real-time PCR System (Applied Biosystems, USA). Gene-specific primers for qRT-PCR were designed by using primer-blast in NCBI, with melting temperatures of 55-60 • C, product lengths of 101-221 bp, primer length of 18-25 bp (Table S1). For qRT-PCR, the reaction contains 10 µL 2x ChamQ Universal SYBR qPCR Master Mix, 0.4 µL of each primer, 3 µL template, and ddH 2 O to make up the total 20 µL volume. Then it was carried out in the following condition: one cycles of 95 • C for 30 s, 40 cycles of 95 • C for 10 s and 60 • C for 30 s. Each experiment was repeated three times, and two of the completed data were selected for drawing. Expression of all genes were calculated using a 2 − Ct method (Livak & Schmittgen, 2001).

Identification and characterization of CalSs in Gossypium spp
Through the analysis of the CalS protein sequences in Arabidopsis thaliana, we found that all the protein sequences contain 1, 3-β-glucan synthase (PF02364) and FKS1_DOM1 domain (PF14288), total of 27 members of the CalS gene family in G. hirsutum, 28 in G. barbadense, 15 in G. raimondii, and 16 in G. arboretum were identified, all of which were named according to their chromosomal locations. The properties of CalSs in cotton were further analyzed (Table S2). The protein sequence length of CalSs ranged from 1,494 to 1,979 amino acids, with an average MW of 212.88 kDa, and shared high similarity to the Arabidopsis thaliana CalS proteins (Hong, Delauney & Verma, 2001). The isoelectric point (PI) values of the above genes were all greater than 7, indicating that CalSs in cotton were alkaline, which was the same as the biochemical properties of the CalSs in Chinese cabbage and Brassica (Pu et al., 2019;Liu, Zou & Fernando, 2018). The CalSs were most likely localized in the plasma membrane, as predicted in Arabidopsis thaliana and Chinese cabbage (Pu et al., 2019;Zavaliev et al., 2011).

Classification and phylogenetic analysis of the cotton CalSs
In order to investigate the evolutionary relationships of the CalSs in the four cotton species and its relationship with Arabidopsis thaliana, a phylogenetic tree was constructed using the protein sequences of CalSs (Fig. 1). Based on the phylogenetic tree of this study, the CalSs were divided into five groups. The distribution of CalSs in each group was shown in Table S3. The members of Group A were homologous to AtCalS11/AtCalS12, the members of Group B were homologous to AtCalS9/AtCalS10, the members of Group C were homologous to AtCalS6/AtCalS7, the members of Group D were homologous to AtCalS8, and the members of Group E were homologous to AtCalS1-5. There were Arabidopsis genes homologous to cotton in each group, further indicating that the cotton CalSs and the Arabidopsis thaliana CalSs were close in evolutionary, which was consistent with the evolutionary relationship between Arabidopsis and cotton. It is observed that most of the CalSs derived from At-subgenome of two cultivated allotetraploids cotton stayed close together with the CalS gene of G. arboretum, and the CalSs of Dt-subgenome stayed close together with the CalS gene of G. raimondii, which was consistent with the hypothesis that the allotetraploid cotton species were produced by the recombination of two diploid cotton species (Liu et al., 2015). Phylogenetic tree analysis suggested that the CalS homologous gene in cotton may have similar functions.

Gene structure and amino acid motif analysis of the CalSs
The diversity of gene structure and differences in conserved motifs are the manifestations of the evolution of multigene families (Magwanga et al., 2018). The distribution of exon/intron regions of CalSs was analyzed to understand the diversity of gene structure (Fig. 2). The number of CalSs exons varied from 1-51, and most CalSs had more than 35 exons (57/86, 66.2%). Clearly, these CalSs were divided into an exon-poor group (<7 exons, group A) and other exon-rich groups (>37 exons, group B-E) (Fig. 2, Table S2). The exons of CalSs had high similarity in the same group, and the number of exons in group B, group D, and group E were the same (Fig. 2, Table S2).

Chromosomal location, gene duplication, and syntenic analysis of the CalSs in Gossypium spp
Based on the sequencing and annotated information of the four cotton genomes, the chromosome length and the distribution of genes on chromosome could be analyzed (Fig. 3). The distribution of CalSs in the two heterotetraploid cotton species chromosomes was highly similar. For example, CalSs had the same number and distribution on chromosomes A03, A04, A05, A08, A11. In G. hirsutum, 27 GhCalSs were distributed on 15 chromosomes, including 13 GhCalSs in At-subgenome and 14 GhCalSs in Dt-subgenome. In G. barbadense, 28 GbCalSs were distributed on 16 chromosomes, including 14 GbCalSs in At-subgenome and 14 GbCalSs in Dt-subgenome. In G. arboreum, 16 GaCalSs were Gene duplication is the basis for the functional differentiation of homologous genes, the main reason for the generation of new functional genes (Conant & Wolfe, 2008). In order to explain the gene replication events of CalSs in cotton, we identified 15, 14 paralogous gene pairs in G. hirsutum, G. barbadense respectively, and one pair in G. arboreum. But there was no paralogous gene pair in G. raimondii (Table S4). GhCalS21/22, GbCalS1/2 as well as GbCalS22/23 were tandem duplication. In the four cotton species, the duplication events of the CalSs were WGD/Segmental, Tandem Duplicates, Dispersed, and proximal duplication, and the main expansion mechanism was WGD/Segmental (Table S2). In order to illustrate the collinearity of CalS genes, we analyzed the orthologous and paralogous gene pairs (Fig. 4, Table S4). There were 31 CalS orthologous gene pairs among G. arboretum and two allotetraploid cotton species, including 15 pairs between with At-subgenome of G. hirsutum and 16 pairs between with At-subgenome of G. barbadense. There were 17 CalS orthologous gene pairs among G. raimondii and two allotetraploid cotton species, including nine pairs between with Dt-subgenome of G. hirsutum and eight pairs between with Dt-subgenome of G. barbadense. Meanwile, Ka/Ks of CalS homologous pairs were calculated to further understand the adaptation of the CDS region of CalSs (Fig.  4, Table S5). Most of the homologous gene pairs Ka/Ks < 1, and about 94.6% gene pairs had a Ka/Ks ratio less than 0.5, which meant that almost all gene pairs underwent purification selection. Only Ka/Ks > 1 of GaCalS2/GbCalS1 indicated that this was a positive selection for beneficial mutations.

Analysis of Cis-acting elements in promoter
Transcription factors can be combined with cis-elements in the promoter region to regulate gene transcription. Investigation of upstream regulatory sequence can help us to well understand the regulation mechanism and also supportive to estimate the potential function of the gene (Fig. 5, Table S6). Given the effect of plant hormones in abiotic stress, we focused on plant hormone responsive elements in promoter regions. ABA-(ABRE), auxin-(AuxRR-core, TGA-element), Gibberellin-(GARE-motif, P-box, TATC-box), MeJA-(CGTCA-motif, TGACG-motif), SA-(TCA-element) responsive elements were

Expression patterns of the GhCalSs under abiotic stresses
Previous studies have reported that the CalSs respond to abiotic stresses (Cui & Lee, 2016;Jacobs et al., 2003;Fromm et al., 2013). To understand the response of GhCalSs, we used public RNA-seq data of TM-1 treated with cold, hot, NaCl, and PEG to observe the expression patterns of GhCalSs (Fig. 6). Interestingly, all expressed GhCalSs were induced by different abiotic stresses, and the expression patterns were different. The expression of GhCalS3 was significantly up-regulated under cold, hot, NaCl, and PEG. The expression patterns of GhCalSs in the same group were slightly consistent, such as GhCalS3 and GhCalS6, GhCalS2 and GhCalS16. In order to verify the results obtained by the above transcriptome, cotton seedlings were treated with PEG and NaCl, and then the GhCalS2/3/6/9/16 were selected for qRT-PCR (Fig. 7). The expression of GhCalS3 and GhCalS6 in Group A were up-regulated within 24 h under PEG treatment and reached the peak at 24 h. The expression of GhCalS2, GhCalS9, GhCalS16 were up-regulated at first and then down-regulated and last up-regulated after PEG treatment. After NaCl treatment, there was no consistent trend of gene expression. GhCalS3 was significantly induced by NaCl and significantly up-regulated at 3 h. Both GhCalS2 and GhCalS16 of Group B were down-regulated within 24 h. The expression of GhCalS6 and GhCalS9 reached a peak at 12 h. These findings indicated that the expression patterns of several GhCalSs were changed after treatment, which proved that GhCalSs increased adaptability to abiotic stress (Fig. 6).

GhCalSs expression patterns in various tissues and their role in fiber development
We used transcriptome data from different tissues of GhCalSs to gain insight the tissuespecific expression patterns of cotton. For instance, GhCalSs were expressed in various tissues, and some of them were highly expressed. Some of GhCalSs were expressed in one or more tissues (GhCalS2,3,4,6,8,9,15,16,19,20,21). However, the expression of a few genes (GhCalS1,5,7,10,11,12,13,14,17,18,22,23,24,25,26,27 ) did not show any expression in any tissues. In order to determine the effect of GhCalSs in cotton fiber development, we focused on the expression of GhCalSs in different fiber developmental stages of two samples, MBI7747 and CCRI45, with different lengths and strengths (Lu et al., 2017) (Fig. 9). The expression of GhCalS4 was the highest in TM-1, MBI7747, CCRI45 fiber tissue, so it was speculated that GhCalS4 had an important function in cotton fiber development. In order to further determine its function in fiber development, qRT-PCR was used to analyze the GhCalS4 expression differences in two samples (Fig. 9). The results showed that the expression level of GhCalS4 in the two samples gradually increased from 5 DPA to 25 DPA, which was consistent with the transcriptome data of TM-1 used above. The expression of GhCalS4 in CCRI45 was higher than in MBI7747 at 5DPA, 10DPA, 15DPA but was significantly lower than that of MBI7747 at 25DPA (Lu et al., 2017). Thus, GhCalS4 may be involved in cotton fiber elongation.

DISCUSSION
Callose plays a vital role in plant growth, development, and resistance to various adverse factors (Piršelová & Matušíková, 2012). The gene family of callose synthase has been identified in a variety of plants. In this study, we identified the CalSs in G. hirsutum, G. barbadense, G. raimondii, and G. arboreum, aiming to understand the role of CalS family in the cotton development.
A total of 86 CalSs were identified in four cotton species. They were divided into five groups based on evolutionary relationships. we divided CalSs (AtCalS9-12 homologous) into two groups due to the large difference of the CDS number, and the other groups were the same as those in Arabidopsis thaliana. The CalSs number was 2:2:1:1 in group B/C/D, which was consistent with the evolutionary relationship among cotton species (Table S3). Compared with Arabidopsis thaliana, different percentages existed between subgroups. The percentages of Group A and Group E were significantly different from those of the corresponding CalSs in Arabidopsis thaliana, suggesting that these genes in cotton may have functional differences with homologous genes in Arabidopsis thaliana to a certain extent. These results will help to validate the function of cotton CalS homologous gene with Arabidopsis thaliana.
Tetraploid cotton species are formed by natural crossbreeding between G. raimondii and G. arboretum (Wendel et al., 2009). Thus, the four cotton species are closely related in evolution. In Fig. 4, the orthologous gene pairs of CalSs were all clustered in the same branch or group. Phylogenetic and orthologous genes of CalS further indicated that the results of this study were consistent with the evolutionary view.
A large number of hormone-responsive elements were identified on GhCalSs promoters which may be involved in the regulation of GhCalSs. Salicylic acid (SA) was an endogenous signal molecule in plants (Loake & Grant, 2007). In Arabidopsis thaliana, the expressions of AtCalS1/5/9/10/12 were up-regulated by exogenous SA. Abscisic acid (ABA) played an important part in coping with a variety of adverse factors, closely related to callose synthesis (Liu et al., 2017). During the dormancy of Populus tomentosa buds, short-day induced ABA biosynthesis, promoted the expression of PtCalS1, callose deposited at the plasmodesmata to form blockage, which prevented the growth signal molecules from entering the cell and kept the dormancy state of buds (Tylewicz et al., 2018). Jasmonic acid (JA) was also involved in callose regulation, and Methyl Jasmonate (MeJA) application promoted callose deposition in grape leaves. Inhibition of the expression of Cationic peroxidase 3 (OCP3), a negative regulator of the JA pathway, increased callose deposition (Repka, Fischerová & Šilhárová, 2004). In conclusion, ABA, JA, and SA were involved in the regulation of callose deposition. GhCalSs promoters with ABA, SA and JA response elements were highly likely to be regulated by them in cotton. However, how CalS gene is regulated by these hormones in the face of biotic-abiotic stress or growth and development is not known, which needs to be further studied.
Callose deposition is one of a series of coping strategies in plants to abiotic stress. Low temperature stimulation of maize leaves increased callose content and reduced transport of photosynthate in phloem (Wu et al., 2018). In Arabidopsis thaliana, AtCalS7, AtCalS8 and AtCalS12 were associated with callose synthesis under the condition of wound (Jacobs et al., 2003;Cui & Lee, 2016;Xie et al., 2011). In this study, public transcriptome data were used to analyze the responses of cotton leaves to cold, heat, salt and drought, and qRT-PCR was used to verify the results, which showed that CalSs were involved in abiotic stresses.
Callose deposits regulate material transport and control plant development. In this study, GhCalS4 was highly expressed in fibers and differentially expressed in MBI7747 and CCRI45 fibers at each fiber developmental stage (5/10/15/20/25 DPA). It has been reported that callose deposition may be involved in the closure of plasmodesmata, and the closure of plasmodesmata had an important function in the elongation of cotton fibers (Ruan et al., 2004). In Sea Island cotton, plasmodesmata remain open longer than in Upland cotton, allowing sucrose to be fed into fibroblasts, which eventually increase osmotic potential by hydrolysis to fructose and glucose. The more soluble sugar, K + accumulated, the higher the cell leavening pressure, which promoted the elongation of cotton fiber (Hu et al., 2019). MBI7747 is a chromosome segment substitution line (CSSL) with different genetic background constructed by crosses between the upland cotton CCRI45 as the recurrent parent and the Sea Island cotton Hai 1 with outstanding fiber quality through the combination of high-generation backcrossing and molecular marker-assisted selection. The fiber length and strength of MBI7747 are better than CCRI45. In CCRI45, the expression level of this gene at 5DPA, 10DPA and 15DPA were all higher than those of MBI7747 during fiber elongation, and it was speculated that the degree of callose deposition in MBI7747 was lower than that of CCRI45, which made more sucrose input into fiber cells to increase osmotic potential and promote fiber elongation. Thus, GhCalS4 may be an introgression gene or there was difference in epigenetic regulation.

CONCLUSIONS
In this study, we identified 86 CalSs from G. hirsutum, G. barbadense, G. raimondii, and G. arboreum using conserved domains. Phylogeny, gene structure, motif, chromosome location and homologous genes were analyzed. It indicated that CalSs have been highly conserved during evolution by the analysis of CalSs structure, conversed motifs, and syntenic blocks. WGD/Segmental replication was the main driving force for the amplification of CalS family in cotton, and purification selection played an important role in the evolution of CalSs. In addition, the cis-acting elements of GhCalSs related to hormone regulation and development and their expression patterns in stresses and tissues were also analyzed. CalS gene can be induced by abiotic stress. Furthermore, the expression difference of GhCalS4 in fiber of different length and strength materials was analyzed. It was speculated that GhCalS4 played a major role in fiber elongation. These findings could lay a foundation for further study on the role of CalS gene in stress response and fiber development.