Genome wide identi cation and functional characterization of the cotton C-repeat binding factor (CBF) under cold stress condition


 Background Low temperature is a common biological abiotic stress in major cotton growing areas. Cold stress significantly affects the growth, yield and yield quality of cotton. Therefore, it is important to develop a more robust and cold stress tolerant cotton germplasms. In response to climate change and erratic conditions, plants have evolved various survival mechanisms, one of which induction of various stress responsive transcription factors, such as the C-repeat binding factors (CBFs), which have been found to enhance cold tolerance in various plants. Results In this study detailed evaluation of the cotton CBF has been carried out. A total of 29, 28, 25, 21, 30, 26 and 15 proteins encoded by the CBF genes were identified in Gossypium herbaceum, Gossypium arboreum, Gossypium thurberi, Gossypium raimondii, Gossypium turneri, Gossypium longicalyx and Gossypium australe, respectively. Phylogenetic evaluation revealed that the proteins were grouped into seven clades, with clade 1 and 6 being the largest. Moreover, majority of the proteins encoded by the genes were predicted to be located within the nucleus, while some were distributed in other parts of the cell. Based on the transcriptome and RT-qPCR analysis, Gthu17439 (GthCBF4) was highly upregulated under cold stress, and was further validated through forward genetics. The Gthu17439 (GthCBF4) overexpressed plants showed a significantly tolerance to cold stress, with higher germination rate, higher root growth and high induction levels of stress responsive genes. The over-expressed plants exhibited low level of oxidative damage, due to significant reduction in the H2O2 production. Conclusion The results showed that the Gthu17439 (GthCBF4) could be playing a significant role in enhancing cold stress tolerance in cotton and can be further exploited in developing cotton germplasm with an improved cold-stress tolerance

In order to understand the effect of cold stress on plants, it is important to distinguish between cold (0-15 o C) and freezing/chilling stress (< 0 o C). Cold stress leads to metabolic injury, destruction of the stability of protein complexes, inhibition of the metabolic pathways and various cellular processes in varying degrees, and eventually damage to the photosynthetic processes [12][13][14]. Freezing/chilling temperature promotes the ice formation in intercellular spaces of plant tissues, signi cantly reduces the e ciency of photosynthesis, and increased release of reactive oxygen species (ROS). Increased release of ROS leads to chloroplast membrane oxidation, moreover, increased levels of ROS initiate a stress signal mechanisms in plants which signi cantly alter gene expression at chloroplast and nucleus level thereby promoting lipid homeostasis, thereby increasing plants adaptation to chilling stress [15].Furthermore, when ice crystals are deposited on the cell wall, the extracellular water potential decreases and the cell membrane is destroyed, resulting in serious cell dehydration [16]. The most common phenomenon to resist the adverse effects of cold is acclimatization through the evolution of stress responsive genes, for instance, the C-repeat binding factors (CBFs), being the CBF1 and CBF3 play a signi cant role in cold acclimation-dependent freezing tolerance [17]. Many plants have increased freezing resistance after a period of low non-freezing temperature, a phenomenon known as cold acclimation. Cold acclimation is an effective way to increase plant freezing resistance after a period of low non-freezing temperature [18]. When plants are exposed to chilling conditions, the plants do mobilize the cold-response genes (COR), which then activates.
C-repeat binding factors (CBFs), followed by the accumulation of cryoprotectants, which results in the acquisition of freezing tolerance [11]. When plants are exposed to nonfreezing cold condition, the CBFs are rapidly induced by low temperature, and then the downstream target gene COR become activated [19]. In addition, as the expression of CBF is regulated by light quality, biological clock and photoperiod, it is necessary to understand the daily and seasonal regulation of the CBF genes [20]. The C-repeat binding factor/dehydration-responsive element binding factor 1 (CBF/DREB1) is well-studied as an integral transcription factor in the cold regulatory pathway. It is an adaptive response where plants increase freezing tolerance after exposure to low non-freezing temperatures [21]. The CBF/DREB protein, a DNA binding protein belonging to AP2/ERF superfamily, was rst identi ed in Arabidopsis thaliana as a cold (LT)-induced transcription factor [22]. APETALA2/ethylene-responsive element binding factor (AP2/ERF) family is a large class of plant-speci c transcription factors, including AP2, RAV, ERF and CBF [23,24]. The expression of CBF gene was very low at room temperature, but increased rapidly within 15 minutes of plants exposure to cold stimulation [4,18]. During cold acclimation, mobilization of the CBF activated cold response (COR) gene and increased accumulation levels of the cryoprotectants, do signi cantly enhance the cold tolerance in plants [11]. The CBF / DREB gene is activated by CBF expression (ICE) inducer through speci c binding of cis-elements to MYC (c-Myc, L-Myc, S-Myc, and N-Myc) in the promoter region [18,22,25].
The cold reaction pathway of ICE-CBF-COR has been proved to be the effective defense mechanism to cold stress [4]. Moreover, the known six members of the CBF family members CBF1/DREB1C, CBF2/DREB1B, and CBF3/DREB1A are induced by cold stress, while CBF4/DREB1D, DREB1E/DDF2, and DREB1F/DDF1 are induced by osmotic stresses [26]. Furthermore, the CBF1, CBF2 and CBF3 are arranged in tandem in an 8.7-kb region of the short arm of chromosome 4 in Arabidopsis [22]. All the three CBF Transcription factors perhaps acts as the activator of the expression of downstream CORs genes, but with different functions [27]. The CBF1/2/3 triple mutant Arabidopsis seedlings obtained by CRISPR/Cas9 technology are more sensitive to low temperature than the single mutant CBF2 and CBF3, as well as the double mutant CBF1/3 [17], which is an indication that CBF genes play signi cant role enhancing cold stress tolerance in plants. .
The CBF Transcription factors have been widely identi ed and isolated from rice, tomato, Brassica napus, wheat, barley and maize, which show that the CBF family is large in scale and complex in structure [2,28]. It has been reported that AtCBF gene was overexpressed in Brassica napus transgenic plants, and improved the cold tolerance [2,29]. The combination of AtDREB1A gene with stress-induced RD29A promoter improved the tolerance in transgenic tobacco to drought and cold stress [30,31]. Through phylogenetic analysis of Arabidopsis CBFs and its orthologous genes in other plants, it was found that CBFs were highly conservative in phylogeny [18]. In understanding cold stress response among the cotton genotypes, Cai et al (2019) found that the cotton germplasms of the D genome respond differently to drought stress, in which G. thurberi exhibit higher levels of cold stress tolerance by mobilizing the antioxidant enzymes and induction of stress responsive genes such as the CBF4 and ICE2. As an important oil and ber crop, cotton has been planted in more than 70 countries and plays an important role in the global economy. However, cotton yield is often adversely affected by abiotic stresses. Therefore, studying the molecular adaptation mechanism to stress resistance in cotton is of great signi cance for improving cotton yield. The 21 CBF genes have been cloned from G. hirsutum, which provides useful clues for understanding the cold tolerance mechanism in cotton. However, due to the limited genome sequence, the expression pro le of CBF family and its phylogenetic relationship with other plant the role of the CBF members are still unclear. In order to better understand the function and evolutionary relationship of the CBF gene family in cotton, we analyzed the structural variation and evolution pattern of CBF family based on the genome-wide data of several cotton species, and explored the molecular mechanism of cold adaptation formation in G. thurberi. This study provides fundamental information and reference for further research on the molecular mechanism of the CBF genes and their regulatory role to cold adaptation in cotton.

Identi cation of CBF family genes in the cotton genome
The availability of the whole sequences for the seven cotton species enabled us to identify the CBF proteins harbored in their genome. The Pfam domain PF00847 was used as the query to obtain the CBF proteins, and nally 29 members of the CBF genes were identi ed in G. herbaceum, 28 members in G. arboreum, 25 members in G. thurberi, 21 members in G. raimondii, 30 members in G. turneri, G. longicalyx had 26 members while G. australe had 15 CBF genes. Three representative cotton species from these seven species were then chosen for further detailed analysis: G. herbaceum, G. thurberi and G. australe.
The CBF CDS length in G. herbaceum ranged from 306 bp to 1,230 bp, while for G. australe, CDS ranged from 429 bp to 1,077 bp. In the analysis of the physiochemical properties of the CBF proteins, the results show a great difference, for instance, the CBF proteins obtained from the G. herbaceum, their molecular weights (MW) ranged from 11,241.88 Da to 39,020.73 Da; isoelectric value (pI) ranged from 5.26 to 10.27. While in G. thurberi, the MW of the proteins encoded by the CBF genes ranged from 16.17923kDa to 45.59153kDa, the pI ranged from 5.11 to 10.83. Similarly in G. australe, it ranged from 15.67636kDa to 40.86796kDa. The CBF genes differed substantially by the encoded protein size and its biophysical properties (Table S1).

Phylogenetic analysis of cotton CBF gene family
In order to determine the phylogenetic relationship of the CBF proteins, we constructed a phylogenetic tree by MEGA7.0, using the Neighbor-joining (NJ) method with minimal evolution and maximum parsimony. The CBF proteins were clustered into 7 clades and designated as clade 1 to 7 (Fig. 1). Clade 1 contained 61 CBF protein sequences at most, while clade 3 contains only 7 CBF amino acid sequences. Consistent with previous classi cation, all of the Arabidopsis CBFs were distributed among the clade 6 [2,24,29]. Except G. australe did not appear in clade 3, the other cotton species were distributed in all 7 groups.
Chromosomal mapping, Gene structure and C-terminal conserved motifs analysis All the genes located on various chromosomes in the three cotton genomes, were named according to their position on the chromosome. In G. herbaceum a member of the diploid type of the A genome, the CBF genes were mapped on all the chromosomes, except chromosome Chr01 which harbored no CBF genes. The highest gene loci were observed in chromosome Chr05 and Chr07 with 5 and 6 genes respectively, while the lowest gene loci was observed in chromosome Chr04,06,08 and Chro09 with a single gene locus in each ( Figure 2A). In the diploid of the D genome, G. thurberi, chromosome Gthu_1, Gthu_8 and Gthu_9 harbored no genes, while chromosomes, Gthu_5, Gthu_7, and Gthu_12 had more gene loci, however, the highest gene loci was noted in chromosome Gthu_05, 07 and 12 with 4, 6 and 5 CBF genes respectively, similarly lowest gene loci was noted on chromosome Gthu_3, 6 and 13 with a single gene locus each ( Fig 2B). Finally, in G. australe diploid cotton of the G genome, no CBF genes were found in chromosome G6, G9, G11 and G12, but the highest gene loci was only observed in chromosome G7 with 5 genes ( Figure 2C).
In order to analyze the motifs, we employed MEME to detect conserved motifs in the CBF family. There were 10 conserved motifs distributed in each CBF family ( Figure 3A). Almost all CBF proteins had the same number of motifs; three motifs were prominent among the CBF proteins, motifs 1, 2, and 4. Analyzing the arrangement of exons and introns can provide important insights into the evolution of gene families [13]. To study the exon/intron structure of the CBF gene, the CDS and the genome sequence were compared. The results showed that most of G. herbaceum and G. australe harbored no intron, and indication that the genes were highly conserved, but for the G. thurberi the least number of exons were two but others contained as higher number ( Figure 3B-E). Basically members of the same group phylogenetically harbored similar number of intron-exon ratio. Stress regulates the expression of primary and specialized metabolism genes at the transcriptional level via transcription factors binding to speci c cis-elements, a number of cis-regulatory element were found to be associated with the various cotton CBF genes, for instance MYB, ABA-responsive element (ABRE), long term repeat (LTR) among others ( Figure  3F), the MYB, ABRE are among the top ranked stress responsive cis-regulatory elements [33]. The detection of myriad of cis-regulatory element reveals the signi cant role played by the members of the cotton CBF genes in enhancing stress tolerance, more so cold stress in plants.
In determining the possible cellular sublocalization of the proteins encoded by CBF genes, an online tool wolf sport was employed. Among the three cotton species, the highest proportions of CBF proteins were embedded in the nucleus. Moreover, the nucleus is the central regulator of the cellular activities. However, other signals were observed in other cellular compartments, such as the chloroplast, mitochondria, plasma membrane, vacuole membrane and chloroplast (Table S2) RNA-seq analysis and RT-qPCR validation of the CBF genes under cold stress conditions G. thurberi transcriptome data was used to analyze the expression patterns of the 24 CBF genes under cold stress ( Figure 4A). Among them, only 17 genes were differentially expressed under cold stress. According to G. thurberi transcriptome data (previous work by the research team, yet to be published), 12 differentially expressed CBF genes were selected, and 15 and 13 genes were selected from the two cotton varieties G. herbaceum and G. australe based on the RNA sequence data. RT-qPCR was employed to analyze the expression pattern; most genes were up-regulated in the three cotton species ( Figure 4B). Among the 12 genes in G. thurberi, 10 genes exhibited signi cantly higher upregulation, with only 2 genes being downregulated. The expression trend was consistent with the transcriptome data. In G. herbaceum, 9 genes were up-regulated, and 6 were down-regulated. In G. australe, 8 were up-regulated and 5 were downregulated. The higher level of upregulation among the CBF genes across the various cotton types is an indicator that the proteins encoded by the CBF genes could be playing an integral role in enhancing cold acclimatization in cotton plants. Moreover, by integrating the transcriptome data and RT-qPCR result; we selected one of the highly up regulated CBF genes, GthCBF12.5 (GthCBF4) for further validation.

Experimental validation of subcellular localization of cotton CBF proteins
The results showed that the none pCAMBIA2300-eGFP-Flag-GthCBF4 infused plants showed green uorescence signals on both the nucleus and cell membrane, while the fusion protein pCAMBIA2300-eGFP-Flag-GthCBF4 only had green uorescence signals in the nucleus ( Figure 5), indicating that the protein encoded by the gene was localized in the nucleus. The results were in agreement to the bioinformatics prediction of the possible cell compartmentalization of the CBF proteins.
Phenotype and cell damage identi cation of GthCBF4 overexpressed Arabidopsis under low temperature.
Two highly expression lines OE-1 and OE-3 were selected for phenotypic evaluation ( Figure 6A), phenotypically, the GthCBF4 overexpressed lines (OE-1 and OE-3) showed no signi cant difference with the wild types under normal conditions, but when the plants were exposed to cold stress, the survival of the WT were signi cantly reduced, while the OE-1 and OE-3 showed higher level of survival ( Figure 6B), in which the OE plants survival rate was estimated at 60% compared to the WT with only 2% survival rate ( Figure 6C). in determining the expression levels of the GthCBF4 overexpressed gene, at 0h, the WT showed no expression, but at 1h and 3h of cold stress exposure, the CBF genes were partially inducted but insigni cantly, slightly below one, while the reverse was observed among the OE plants, The expression levels of the GthCBF4 overexpressed gene was signi cantly high, with 0h showing expression levels close to three folds, while at 1h and 3h, the expression levels of the GthCBF4 overexpressed gene was above four folds ( Figure 6D). .Under normal conditions, the stained blue area on the leaves of the transgenic overexpression plants and wild-type plants was very small. While under cold treatment, the blue areas on transgenic leaves were signi cantly smaller than the wild-type, the color depth was also lighter. Furthermore, the DAB staining method was used to re ect the accumulation of H 2 O 2 in Arabidopsis leaves. The accumulation of H 2 O 2 in the transgenic overexpression leaves and the wild-type were very low under normal conditions, and the production of brown matter was hardly seen. But after cold treatment, the brown area on the wild type Arabidopsis leaves was obviously larger than the wild type, and the color depth was also deeper ( Figure 6E). The results showed that the overexpression of the CBF gene in the transgenic lines improved the ability of the plants to oxidize the oxidative agents such as the H 2 O 2 thereby reducing the levels of oxidative damage in the plants.
Evaluation of physio-morphological traits in GthCBF4 Overexpressed and wild type plants under low temperature environment Germination of the OE and WT lines showed no signi cant differences under normal or controlled conditions, however, under cold stress, none of the WT germinated while the OE lines showed some level of germination ( Figure 7A-B), an indication that the OE lines were signi cantly improved and were able to adapt to cold stress condition. In the evaluation of the root lengths, no signi cant differences was observed under controlled conditions, but under cold stress conditions, the OE lines showed higher root lengths compared to the WT ( Figure 7C-D), thus the overexpression of the CBF genes could be playing a role either in the rate of cell division and or cell elongation. We further evaluated known stress responsive genes such as the COR15A RD29A KIN1 and COR47 [17]. The OE lines signi cantly showed higher expression levels to all the stress responsive genes pro led ( Figure 7E). The high induction levels of the stress responsive genes in the OE lines, showed that the overexpressed genes do not suppress the expression of the stress responsive genes but do promote their expression, an indication that the overexpressed gene could be playing a vital role in enhancing cold stress tolerance in plants.

Discussion
Cotton is an important economic crop, which supports the economies of a number of countries globally, moreover, it is the main source of natural ber critical raw material for the textile industries [34], however, in the recent past, its global production has signi cantly shrunken due to climate change. Rainfall has become erratic and scarce, and even the average daily temperature has signi cantly increased with prolonged periods of cold weather, thereby affecting the growth and development of the plant. In order to improve the performance of current elite cultivars, a number of strategies have been proposed, among them is the utilization of the plants transcription factors and other novel genes to improve the adaptability of the current cotton germplasms to ever changing environmental conditions. Cold stress has never been a major concern in cotton production, but the climate dynamics, has increased deleterious effects of cold stress, for instance, in the 1980s, united states of America loss due to chilling effects was estimated at 60 million dollars [35]. Furthermore, in very sensitive crops, cold stress causes considerable agricultural yield loss, particularly in sensitive crops like maize, rice and chickpea [36]. To increase the chance of survival, plants have adopted numerous mechanisms to cope with the ever changing environmental conditions, one of which is evolution of novel genes with tremendous effects in improving plants adaptability to various environmental stress factors, for instance the LEA genes have been found to offer drought stress tolerance in cotton [37].
Although abiotic stress is a major challenge in cotton growth, there is no detailed study on the CBF gene in cotton [38]. In previous studies, CBF family have been identi ed in cotton (Gossypium hirsutum) [26], wheat [39], lettuce [40], Brassica napus [41], Barley [42] and soybean [43], but there is few study in diploid cotton. In this work, genome wide identi cation, characterization and functional analysis of the proteins encoded by the cotton CBF genes was done, in which 29, 25 and 15 CBF proteins encoded by the CBF genes were identi ed in G. herbaceum, G. thurberi and G. australe, respectively. The results obtained showed that the proteins encoded by the CBF genes in cotton were higher compared to other plants such as lettuce with 14, Brassica napus with 10 and soybean with 14 CBF genes, but less than hexaploid wheat with 65 CBF genes and barley with 20 CBF genes. The high number of the proteins encoded by the CBF genes in cotton perhaps could explain the signi cant role they play in plants in relation to stress tolerance. Moreover, studies have found that Arabidopsis CBF family members have an AP2 domain, and each has a conserved amino acid sequence upstream and downstream of AP2. The upstream of AP2 is PKK/PKKPAGR (RAGRxxKFxETRHP) and the downstream is DSAWR [44]. If the PKK/PKKPAGR mutation can inhibit the binding ability of CBF to the COR promoter of its downstream genes, thereby weakening the level of CBF regulation, this sequence is necessary for CBF to perform its transcription factor function. Cotton genome contains a large and complex CBF subfamily, with a conserved AP2/EREBP domains and with CBF-like characteristics, indicating that cotton CBFs have similar function with other CBFs in dicotyledons [27]. Phylogenetic analysis showed that the CBF families are divided into seven groups, among these genes CBF1, CBF2 and CBF3 are all induced by cold stress in Arabidopsis thaliana. Therefore, we speculate that CBF genes may also respond to abiotic stress, especially to cold. Further analysis showed that the isolated CBF gene was highly expressed under cold stress, consistent with previous research results.
The cotton CBF genes phylogenetic analysis revealed that the entire members of the cotton CBF family were grouped into seven clades, this deviates from the previous ndings in which ve clusters were observed among the CBF members obtained from lettuce [40], but were in agreement to ndings obtained for the Tea Plant (Camellia sinensis) in which the CBF and their homologs were classi ed into six groups [45]. In plants, the transcriptional regulation of osmotic stress response mainly depends on two main cisregulatory elements, which are related to stress response genes ABREs and dehydration response elements (DREs). The two cis-regulatory elements were found to be associated with the CBF genes, which is an indicator that the members of the CBF gene family could perhaps be playing a signi cant role in enhancing cold stress tolerance in plants. Moreover, the DREs are mainly involved in ABA-independent pathways [46], and ABRE is responsible for detecting ABA-mediated osmotic stress signals [47]. In this study, the CBF genes detected for the three cotton species were rich in stress response genes (ABRE), dehydration response element (DRE) and cold stress response element (LTR). It is suggested that exogenous environmental stress can induce the expression of CBF gene through its response to cisregulatory elements, and further improve the resistance of plants to environmental stress.
It is universally accepted that transcription factors must be present in the nucleus to perform their functions [48]. In the evaluation of the possible cell compartmentalization of the proteins encoded by the CBF genes, both prediction and experimental evaluation showed that majority of the CBF proteins are localized within the nucleus. Several studies have found that a number of stress responsive proteins are sub-localised within the nucleus, for instance calmodulin-regulated proteins, in plants, the activity of a pea apyrase, PsNTP9, is localized in the nucleus, and have a signi cant role in enhancing resistance to heavy metals and xenobiotic Compounds [49]. Moreover, when plants exposed to environmental stress, there is increased release of oxidizing agents, which destroys the DNA, to counteract the DNA damage, plant cells evolved mechanisms for the DNA repair in both nucleus and mitochondria [50], thus the abundance of the CBF proteins in the nucleus could be signi cant in the DNA repair as a result of oxidative damage..

Overexpression of Arabidopsis CBF gene in other plant species or overexpression of CBF of other species
in Arabidopsis has revealed the potential of the CBF genes in enhancing frost resistance [51]. Moreover, it has been shown that the CBF gene plays an important role in plant cold acclimatization, downregulation of CBF1 and CBF3 results in 25-50% reduction in tolerance to cold stress levels in pre-cold treated mutant plants [52]. In this study, the GthCBF4 was strongly up-regulated in cotton seedlings under cold treatment. The survival rate of GthCBF4 transgenic Arabidopsis thaliana plants was signi cantly improved after freezing treatment. Furthermore, the inducer of CBF Expression (ICE) is a pioneer of cold acclimation, an MYC-type basic helix-loop-helix family transcription factor (TF) [53]. It has been reported that ICE1 plays a signi cant role in C-repeat binding factor 3 (CBF3) cold induction [25]. When plants encounter cold stress, ICE1 could be mobilized from JAZs bound by DELLAs and trigour the expression of CBF3. The CBF3 activates the expression of GA2ox7 to reduce the bioactive gibberalic acid (GA) level, which enhance the accumulation of DELLAs. Therefore, DELLAs can control the cold induction of CBF3 through ICE1 via JAZs. Moreover, from the trypan blue staining and DAB staining revealed that the GthCBF4 GthCBF4 overexpressed plants had a signi cant reduction in cold injury compared to the wild types, a strong indicator of the possible roles of CBF genes in enhancing cold stress tolerance in plants. Moreover, root growth was signi cantly higher in GthCBF4 overexpressed plants.
In order to further elucidate the critical roles played by the proteins encoded by the CBF genes, known stress responsive genes were pro led, and the results showed that overexpression of the CBF gene signi cantly increased the induction levels of the stress responsive genes. In general, the expression level of the four COR genes were signi cantly correlated with the freezing tolerance level. Furthermore, COR15A, which encodes the chloroplast-targeted polypeptide, enhance the cold resistance of the chloroplasts [54]. In addition, the CBF genes have the ability to induce the expression of the COR47, RD29A and KIN1 genes thereby improving the plants tolerance level to cold stress [17]. The up-regulation of the stress responsive genes further con rmed that the GthCBF4 overexpressed plants ability to tolerate the effects of cold was signi cantly higher compared to the wild types. In the past two decades, the transcriptional network of CBF signaling pathway have been extensively studied, and many COR genes have been identi ed in genome-wide expression pro les, and about 10-25% of them are regulated by CBF, which implies that more early cold regulated transcription factors are mainly involved in improving cold tolerance.

Conclusions
In conclusion, based on the genome wide identi cation and functional analysis of the proteins encoded by the CBF genes, a total of 69 CBF genes were identi ed in three diploid cotton species, in which 29, 25 and 15 CBF proteins in G. herbaceum, G. thurberi and G. australe, respectively. The RNA data and the expression pro ling of the CBF genes revealed that higher proportions were upregulated under cold stress conditions. moreover, overexpression of the Gthu17439 (GthCBF4) gene in model plant, Arabidopsis, revealed that the GthCBF4 overexpressed plants ability to tolerate cold stress was higher compared to the wild types, as evident on the germination, root growth and induction of various stress responsive genes.
The results therefore provides fundamental information for future exploration of the CBF genes in the development of a more robust cotton germplasms, which are highly resilient to cold and or chilling stress.

Materials And Methods
Identi cation of CBF family genes in the cotton genome Wild diploid cotton species genome data was retrieved from the CottonGen (https://www.cottongen.org/) to construct a local BLAST database (https://blast.ncbi.nlm.nih. gov/Blast.cgi?PAGE=Proteins) . The Arabidopsis CBF protein sequences were used as probes to compare with the wild diploid cotton. The Evalue threshold for BLASTP was set at 1e −10 to obtain the nal dataset of the CBF proteins. The protein family Pfam (Finn et al., 2014), and a Simple Modular Architecture Research Tool (SMART) (http://smart.embl-heidelberg.de/) databases were employed to con rm each predicted CBF protein sequences [55]. Redundant sequences and incomplete sequences were removed. The sequences of 10 Gossypioides kirkii and 9 Theobroma cacao CBF proteins were obtained from the CottonGen (https://www.cottongen.org/) and national center for biotechnology information (NCBI) (https://blast.ncbi.nlm.nih.gov/Blast.cgi ) databases, respectively. In addition, physicochemical parameters including the molecular weight (MW) and isoelectric point (pI) of each gene product were calculated using compute the pI/Mw tool from ExPASy (http://www. expasy.org/tools/), as previously used by Magwanga et al [37,56] in the evaluation of the physiochemical properties of the Late embryogenesis abundant (LEA) genes in cotton.
Sequence alignment and phylogenetic analysis of the cotton CBF gene family An alignment of multiple CBF protein sequences from A. thaliana, Gossypioides kirkii and Theobroma cacao were generated using the ClustalW program (https://www.genome.jp/tools-bin/clustalw). A neighbor-joining (NJ) analysis of the generated alignment was performed using the unweighted pairgroup method with arithmetic mean algorithm to construct an unrooted phylogenetic tree. Bootstrap value was 1000, and other parameters were used by default value. The tree was visualized with MEGA 7.0 software [57].
Gene structure and C-terminal conserved motifs analysis Structural information for the CBF genes including the chromosomal location and gene length were determined. The exons and introns were predicted by comparing the coding sequences with genomic sequences. The conserved motif analysis of the CBF protein sequences were predicted by MEME online software (https://meme-suite.org/meme/). Moreover the Conserved Domains Database (CDD) (https://www.ncbi.nlm.nih. gov/cdd/) was employed to search for the conserved domain information of the CBF, and used the TBtools mapping tool to draw the conserved domains [58].

Retrieval and analysis of promoter sequences
The 2000 bp sequence upstream of ATG were extracted from the transcription start site of the CBF gene sequence, and submitted the obtained sequence to the PlantCARE website (http://www.dna .affrc.go.jp/PLACE/signalscan.html). Identi cation of possible cis-acting elements in the promoter region is used to identify putative cis-regulatory elements in the promoter sequence [59]. In addition, we carried out the subcellular localization prediction of all the CBF proteins by an online tool WoLFPSORT (http://www.genscript. com/wolf-psort.html) [60].

RNA extraction and qRT-PCR analysis
At three leaf stages, cold stress was imposed by subjected to cold stress by transferring the seedlings and kept at 4°C under normal light condition. The leaves were then harvested for RNA extraction at 0h, 0.5h, 3h, 6h, 12h and 24 h of post stress exposure. The total RNA was extracted using EASYspin plus plant RNA kit (Aidlab, Biotech, Beijing, China), following the manufacturer's instructions. The quality and concentration of each RNA sample was determined by Nano Drop 2000 spectrophotometer and RNA that ful lled standard at 260/280 in a range of 1.80 -2.1 was used for further analysis [38]. The primers used for qRT-PCR were designed using primer premier 5 software for all genes (Table S3). The cotton GhActin gene, forward primer sequence 5'ATCCTCCGTCTTGACCTTG3' and reverse primer sequence 5'TGTCCGTCAGGCAACTCAT3'), was used as a reference gene for the analysis. Real-time PCR reactions were carried out in a nal volume of 25 μl, using a SYBR Green master mix and an ABI7500 thermal cycler (Applied Biosystems, Foster City, CA, U.S.A.), following manufacturer's instructions. The single stranded cDNA was synthesized from reverse transcriptase with TranScript-All-in-One First-Strand cDNA Synthesis SuperMix (TransGen Biotech kit Beijing, China) for RT-qPCR in harmony with manufacturer protocol. NCBI primer blast was used to design 27 LHC gene primers. The primers details are given in (Table S1). The 7500 fast real time system was used for analysis and 10 μL of Time fastStart Universal SYBR Green Master sigma (ROX) solution (Roche Diagnostics Mannheim, Germany) per sample was used for RT-qPCR. Total reaction mixture volume was 20 μL, containing 2 μL cDNA, 6 μL RNA free water, 10 μL green SYBER, 1 μL of each forward and reverse primer. Ghactin7 was used as control [61]. The RT-qPCR completed with three technical repeats. The expression level of genes was calculated using the formula E = 2 -ΔΔCt .

Plant material
The seeds of G. herbaceum, G. thurberi and G. australe were pre-germinated in sand at 25°C for 4 days. Then seedlings were then transferred to the hydroponic facility equipped with Hoagland nutrient solution [62]. The greenhouse conditions were set at 28°C during the day/25°C at night, the photoperiod was 16 hours, and the relative humidity was 60-70%. At three-leaf stage, cotton seedlings were subjected to cold stress by transferring the seedlings and kept at 4°C under normal light condition. The leaves were then harvested for RNA extraction at 0h, 0.5h, 3h, 6h, 12h and 24 h of post stress exposure. Each treatment was repeated three times. The leaf samples were put in liquid nitrogen, frozen and stored at -80°C before RNA extraction.

GthCBF4 subcellular localization analysis
The Agrobacterium tumefaciens (LBA4404) strain GthCBF4 vector was used for transformation studies. The design of GFP fusion constructs for pCAMBIA2300-eGFP (control GFP). The total RNA (1 µg) isolated from leaves of G. thurberi was used for ampli cation of rst strand of cDNA (TransGen Biotech Beijing). Speci c primers were used to amplify, C-repeat binding factors protein (GthCBF4) gene products from G. thurberi. The oligonucleotides used for ampli cation of GthCBF4 were forward primer 5'GAGAACACGGGGGACTCTAGAATGGTTGATTCTGGGTCGGTTTCT3' and reverse primer 5'ACCCATGTTAATTAAGGATCCAATAGAATAACTCCATAAAGG3' (sigma, Henan, China). The underlined sequences (TCTAGA and GGATCC) in the primer pair represent restriction sites for XbaI and BamHI respectively. The reverse primers used for ampli cation of GthCBF4 genes were designed in a fashion so as to eliminate stop codons of these genes. The PCR ampli ed products of GthCBF4 were digested and cloned at the restriction site(s), XbaI and BamHI respectively in the vector (GFP) backbone. The PCR cycling conditions were as follows: Step (1) 95°C-3 min, (2) 95°C-15 sec, (3) 58°C-15 sec, (4) 72°C-1 min, (5) cycle to step 2 to 4 for 35 more times, (6) 72°C-5 min, (7) incubate at 4°C. The PCR was performed using Phanta Max Super-Fidelity DNA polymerase (Vazyme, Nanjing, China). To explore the subcellular localization of the GthCBF4 gene, a pCAMBIA2300-eGFP-Flag-GthCBF4 fusion vector of CBF and GFP was transiently infused in the epidermal cells of tobacco leaves, the construct was driven by the 35s promoter and transformed Agrobacterium LBA4404 competent cells. In addition, Agrobacterium competent cells expressing only the GFP gene were used as a negative control. Four-week-old tobaccos cotyledon at leaves were selected for infusion, and cultivated in dark for 24h-36h after in ltration, and then uorescence observation was performed under a laser confocal microscope.
Cloning of GthCBF4, a CBF novel gene in arabidopsis To explore the function of GthCBF4 gene in plants. Constructed a pBI121-GthCBF4 recombinant vector, transformed into the promoter cells of A. tumefaciens GV3101[63] as previously applied by Magwanga et al [7] by adopting a freeze-thaw method [64] was performed. The wild-type (WT) A. thaliana were transformed by adopting the dipping method [65] by using a 50 mg/mL kanamycin for positive selection up to the third generation (T3) At the second generation, also referred to as the T2 generation, the GthCBF4 overexpressed lines were identi ed, through polymerase chain reaction, two high-overexpressed transgenic lines were obtained, and grown to generate the stable T3 generation.
Phenotypic identi cation, physiological and biochemical parameters of WT and GthCBF4 overexpressed plants under cold stress The transgenic and wild-type A. thaliana were grown in plastic bowls for two weeks, the growth pattern between the transgenic and the wild types were not signi cantly different. The two types, transgenic and the wild type Arabidopsis were placed in an environment of -15°C for 3 hours. After the treatment, the treated plants were moved to a 4°C light incubator to thaw for 4 hours, and nally they were cultivated under normal light conditions at 23°C. After 7 days, photographs were taken to aid the counting of the survival rate of the plants. The germination rate and root length was determined under cold stress; a t-test was used to verify the signi cance of the difference in root length between the mutant and the wild types.
The trypan blue staining method [66] was adopted to re ect the cell damage under cold stress. Moreover, the DAB staining was applied to re ect the accumulation of peroxidase under cold stress. DAB staining was performed using DAB chromogenic kit (Nanjing Jiancheng Bioengineering Institute, Nanjing, China).

Statistical Analysis
The experiments were done in three biological replicates, and the data were statistically analyzed by analysis of variance (ANOVA) procedure [67], using statistical analysis software (SAS) version 9.3. The least signi cant difference test (P ≤ 0.05) was used for mean comparison.

Declarations
Ethics approval and consent to participate Relevant permits/permissions/licences were obtained from relevant institutional, national, and international guidelines and legislation. The permission to carry out this research granted through the Institute of Cotton Research, Chinese academy of Agricultural Sciences (ICR-CAAS), Anyang, China Consent for publication Not applicable.

Competing interests
The authors declare that they have no competing interests.