Genome-wide cloning, identification, classification and functional analysis of cotton heat shock transcription factors in cotton (Gossypium hirsutum)

Background Heat shock transcriptional factors (Hsfs) play important roles in the processes of biotic and abiotic stresses as well as in plant development. Cotton (Gossypium hirsutum, 2n = 4x = (AD)2 = 52) is an important crop for natural fiber production. Due to continuous high temperature and intermittent drought, heat stress is becoming a handicap to improve cotton yield and lint quality. Recently, the related wild diploid species Gossypium raimondii genome (2n = 2x = (D5)2 = 26) has been fully sequenced. In order to analyze the functions of different Hsfs at the genome-wide level, detailed characterization and analysis of the Hsf gene family in G. hirsutum is indispensable. Results EST assembly and genome-wide analyses were applied to clone and identify heat shock transcription factor (Hsf) genes in Upland cotton (GhHsf). Forty GhHsf genes were cloned, identified and classified into three main classes (A, B and C) according to the characteristics of their domains. Analysis of gene duplications showed that GhHsfs have occurred more frequently than reported in plant genomes such as Arabidopsis and Populus. Quantitative real-time PCR (qRT-PCR) showed that all GhHsf transcripts are expressed in most cotton plant tissues including roots, stems, leaves and developing fibers, and abundantly in developing ovules. Three expression patterns were confirmed in GhHsfs when cotton plants were exposed to high temperature for 1 h. GhHsf39 exhibited the most immediate response to heat shock. Comparative analysis of Hsfs expression differences between the wild-type and fiberless mutant suggested that Hsfs are involved in fiber development. Conclusions Comparative genome analysis showed that Upland cotton D-subgenome contains 40 Hsf members, and that the whole genome of Upland cotton contains more than 80 Hsf genes due to genome duplication. The expression patterns in different tissues in response to heat shock showed that GhHsfs are important for heat stress as well as fiber development. These results provide an improved understanding of the roles of the Hsf gene family during stress responses and fiber development. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-961) contains supplementary material, which is available to authorized users.


Background
Plants have developed complex transcriptional systems that are responsive to different environmental stresses [1]. WRKY [2], MYB [3], AP2/ERF [4], NAC [5], bZip [6] and heat shock transcription factors (Hsfs) [7,8] participate in these complex and overlapping processes. These transcription factors are activated and regulate the expression of thousands of genes to sustain plant growth under unfavorable conditions [9,10]. Among these transcription factors, Hsfs have attracted particular interest recently because they are involved in many aspects of protein homeostasis including refolding, assembly and transport of damaged proteins to maintain intracellular protein stability under conditions of stress [7,8,11,12].
The Hsf gene was first cloned from fruitfly larvae and exists generally in higher eukaryotes [13]. In contrast to only one to four Hsf genes in yeast and animals, more than 52 Hsf homologs have been identified in the sequenced Populus genome [14]. The diversity and multiplicity of Hsfs in plants may result from gene duplication and functional diversity during the evolution of the genome [14][15][16][17][18]. Hsfs are a type of transcription factor that is characterized by a DNA-binding domain (DBD) and hydrophobic heptad repeat regions (HR-A/B) [19][20][21]. The DBD domain is a conserved structure, which provides Hsf proteins with the ability to bind heat shock cis-elements [20]. The function of the HR-A/B domain in Hsf proteins allows them to form active homologous trimers [22]. Under a variety of stress conditions, latent Hsfs are assembled into the activated trimeric conformation [23]. The transcription factor complexes then bind to the cis-elements of the promoters of target genes such as Hsp30, 70 and 90 to activate their expression [22,[24][25][26][27]. Based on structural characteristics and phylogenetic comparisons, plant Hsfs are grouped into three main classes: A, B and C [18,19]. All of class A and C have an extended HR-A/B region with the insertion of different amino acid residues between the A and B regions (21 amino acid residues for class A and 7 for class C). In contrast to class A and class C Hsfs, the HR-A/B region in class B Hsfs does not contain any insertions. Besides the DBD and HR-A/B domains, the functional modules in Hsfs also contain putative nucleus location signal (NLS), nucleus export signal (NES) and transcriptional activation (AHA) motifs [14,28,29]. Sequence comparisons and structural analyses indicate that the combination of an AHA motif and a NES represents the signature domain in class A Hsfs [30]. Although class B and C Hsfs lack AHA motifs and they cannot selfactivate, they regulate the expressions of heat shock inducible genes through binding to their cis-elements [14].
It has been shown that Hsfs in plants serve as regulators of tolerance to biotic and abiotic stresses [31][32][33][34]. Overexpressed HsfA1b in Arabidopsis thaliana increases water productivity and harvest index under water-replete and water-limiting conditions [35]. HsfA2 in Arabidopsis controls the responses to salt, osmotic stress, anoxia and submergence [36]. Arabidopsis HsfA1a was shown to sense heat stress and pH changes directly through binding to HSP18.2 and HSP70 promoters [37]. In addition to their roles in stress tolerance, Hsfs also perform key roles in development. HsfB4 in Arabidopsis (also known as SCZ) is specifically expressed in the quiescent center, the ground tissue initials and the endodermis and cortex in the postembryonic root. In both SCZ deletion and SCZ overexpressed plants, asymmetric division required for cellfate separation is affected, demonstrating that SCZ is a regulator of cell-fate separation [38]. Another Arabidopsis Hsf protein, Hsf4, which specifically binds to the ciselement of TBF1 is required for the induction of immune response genes. Functional analysis and genome-wide expression profiling indicate that TBF1 performs a pivotal role in the transition from growth to pathogen defense [39]. Despite these efforts in Arabidopsis and tomato, the functions of most Hsf genes in plants have not been identified and characterized, probably due to functional redundancy and limited information about this gene family.
Cotton has been the major resource of natural fiber in recent decades. Sustainable cotton production is challenged by continuous high temperatures, intermittent drought and insufficient water supply [40]. Therefore, improved stress tolerance in cotton cultivars is required to reduce the impact of stress and then increase cotton productivity. Hsf genes have been proposed to encode the master regulators of biotic and abiotic stresses as well as different developmental processes in plants [14]. Previous studies have suggested the existence of a large gene family within the tetraploid Upland cotton genome, but limited data characterizing these Hsf genes in cotton has been presented [41]. In order to gain a comprehensive image of the molecular and evolutionary characteristics as well as the possible functions of the cotton Hsf family, it is necessary to clone Hsf gene families and identify their expression patterns. Recently, the full genome sequence of diploid cotton (G. raimondii) has been published [41,42]. This provides the genomic information required for of complete cloning and annotation of Hsf genes.
Here, we report the cloning of the D-subgenome Hsf genes in Upland cotton. Analysis of their expression profiles in different organs/tissues and the effects of heat shock conditions were conducted by qRT-PCR. The results of this work provide a foundation for an improved understanding of the functional structures and genomic organization of the Hsf gene family in cotton, and will undoubtedly be useful in detailed characterization of gene function.

Material and methods
Upland cotton (G. hirsutum L.) variety Coker 312 was grown in the field at the Shanghai Jiao Tong University in China. When cotton plants were in full bloom (approximately 90 days after planting), different cotton tissues including roots, stems, leaves and developing ovules at different stages were collected and used for RNA and DNA extraction.
In order to clone all members in the Hsf protein family in Upland cotton, Arabidopsis Hsf protein sequences were used to search the cotton expressed sequence tags (EST) database (http://www.ncbi.nlm.nih.gov) using tBlastN. All putative ESTs encoding Hsf proteins in Upland cotton were assembled to build the putative cotton Hsf sequences. All the putative cotton Hsf proteins were compared with the Arabidopsis Hsf protein in BlastP searches with P-values less than 0.0001 to check whether the putative Hsf gene encoded a full-length Hsf protein.
Primers were then designed to amplify the coding sequences of all the predicted cotton Hsf genes. The amplified fragments were cloned into pGEM-T Easy vector (Takara, Japan) and confirmed by DNA sequencing. In order to avoid generating the chimeric genes during PCR amplification [43], all of cloned GhHsf genes were compared with those from diploid cotton (G. arboum and G. ramondii) on the genomic level [41,44], and those genes from A sub-genome were cloned again and revised. Finally, all the sequences encoding Hsf proteins were assigned to the D-genome chromosome. Similarities at the same locus in chromosome pairs were considered to represent alleles. Cotton Hsf genes were numbered (1, 2, 3 etc.) according to their localization on the chromosomes.

Domain and protein structure analysis
The deduced amino acid sequences of cotton Hsf proteins were aligned with the Arabidopsis Hsf family using DNAMAN and ClustalX 1.83 [45]. Molecular weight, iso-electric point, functional domains, and amino acid signal peptides of cotton Hsfs were calculated using the ExPASy online servers [46] (http://cn.expasy.org/tools). A neighbor-joining (NJ) tree of Hsf proteins was constructed using the MEGA program (version 5.0) [47]. NJ analysis was performed with the Pairwise Deletion option and the Poisson correction. For statistical reliability, bootstrap analysis was conducted with 1,000 replicates to assess the statistical support for each node.
To analyze the signature domains in Hsf proteins, the cotton Hsf proteins were compared with those from Arabidopsis and Populus by amino acid alignment using ClustalW (version 1.83). The presence of DBDs and coiled-coil structures were determined using the SMART and MEME programs [48][49][50]. In order to improve the accuracy of domain analysis, MEME tools were also used to identify putative domain motifs in the full-length amino acid sequences of cotton Hsfs. Visualization of the motifs in the cotton Hsf proteins was performed by using ProSite my domains online (http://prosite.expasy.org/mydomains).

Gene duplication analysis
Cotton Hsf gene duplication during evolution was investigated using MEGA (version 5.0). Evolutionary distances between each GhHsf sequence pair were calculated by ClustalW [51]. Hsf genes duplication was indicated by (1) shared aligned sequence covering >80% of the longer gene and (2) similarity of the aligned regions >80%.

Cotton Hsf protein localization
To investigate subcellular localization of cotton Hsf proteins, one protein from each subclass including subclass B (GhHsf3) and subclass C (GhHsf31) was chosen to analysis. Considering the function diversity of subclass A, three Hsfs GhHsf39 (A2), GhHsf25 (A1c) and GhHsf34 (A4a) were also used as the representatives. The coding regions of five cotton Hsfs (GhHsf3, 25, 31, 34 39) from three classes were cloned into the pBIB-GFP vector to generate pBIB-35S::GhHsfs-GFP::Nos constructs, To test whether the expression level is changed after heat shock, constructed pBIB-GFP vectors containing the promoters and ORFs of GhHsf34 and GhHsf39 were also generated. These plasmids were then transformed into Agrobacterium strain EHA105. Three-week-old tobacco leaves were infiltrated with Agrobacterium according to a reported method [52]. Two to four days later, the subcellular localization of Hsf proteins was analyzed by confocal microscopy (Leica TCS SP5) and the fluorescence intensity was also analyzed after heat shock for one hour. .

Heat shock treatment and qRT-PCR analysis
Cotton seedcoats were removed and sterilized with 0.1% HgCl 2 , and then grown in pasteurized sand in the greenhouse (light/dark cycle: 14 h at 25°C/10 h at 22°C, respectively; 70% relative humidity). At the five-leaf stage, whole plants were subjected to heat shock treatment. The seedlings were initially treated at 45°C for 1 h, before transfer to normal growth conditions for recovery. Subsequently, at 2 h and 4 h, cotton leaves were collected for total RNA extraction.
Quantitative RT-PCR (qRT-PCR) analysis was performed using the SYBR qRT-PCR kit (Takara, Japan) in a DNA Engine Option 3 System (MJ Research, USA) according to the manufacturers' instructions. The qRT-PCR reaction contained 0.5 μg of 1 st cDNA, 1 U ExTaq, 10 pM dNTPs, 5 pM MgCl 2 and 10 pM primers. Genespecific primers (Additional file 1: Table S1) were used to amplify specific regions of different cotton Hsfs. The ubiquitin gene [52] was used as the internal control. Transcriptional expression levels were calculated using the comparative Δ CT method. Each sample was repeated at least four times, and the amplification results were analyzed by Option 3 software.

Cloning and identification Hsf gene families in Upland cotton
To clone Hsf family genes in cotton, the amino acids of Hsf encoded proteins were used in a tBlastN search of the NCBI database for EST homologs. All of ESTs from Upland cotton showing 60% similarity to Arabidopsis Hsf genes were collected for Hsf gene assembly. A total of 43 Hsf contigs, containing open reading frames encoding the proteins similar to Arabidopsis Hsfs, were assembled and identified. All the putative Hsf genes were then analyzed for the presence of HR domains and DBD structures within the encoded proteins [14]. Three contigs without these two structures were discarded and 40 cotton Hsf genes were then used as the reference for gene cloning. According to the assembled sequences of the putative cotton Hsf genes, 40 independent genes were amplified and sequenced using Upland cotton 1 st cDNA from different tissues as the template. All of the cloned genes were confirmed to be from D sub-genome after comparing with their homologues from diploid cotton (G. arboum and G. ramondii, Additional file 2: Table S2 and Additional file 3). Following comparison with the predicted Hsf genes in the D-genome, all 40 Hsf genes were then mapped to the different chromosomes in the D-genome [41], and designated GhHsf1-40 according to the order of their chromosomal localizations ( Table 1). All 40 Hsfs were distributed in 13 chromosomes and one linkage group in the cotton D-genome. Only one tandem cluster containing 4 Hsf genes was found on chromosome 4.

Conserved domains and motifs in cotton Hsfs
The typical Hsf proteins in the plant kingdom contain five conserved domains: DBD, HR-A/B region (also known as the oligomerization domain), NLS and NES motifs and AHA domain. These domains enable Hsf proteins to perform the functions associated with stress tolerance efficiently. All the cotton Hsf proteins were analyzed to detect conserved domain structures online (www.expasy.com) and MEME tools (Figures 1, 2 and 3). The DBD in Hsf proteins is required for specific recognition of the cis-elements of Hsp promoters during the transcriptional response to stress. The DBD is composed of an anti-parallel four-stranded β-sheet against three α-helices, forming a compact globular structure. Multiple alignment showed that the highly conserved DBD is located close to the N-terminal in all the cotton Hsf proteins (Table 2, Figure 1), and comprises 83, 94, 102 or 105 amino acid residues. The smallest DBD occurs in GhHsf24 and GhHsf29, 30, 40 that is composed of 83 amino acids, while the largest is in GhHsf33 comprising 105 amino acids. The DBD in the majority of cotton Hsf proteins (34 GhHsf proteins) has 94 amino acids in length, indicating that this domain is highly conserved in GhHsf proteins. Interestingly, while most GhHsfs have a short N-terminal upstream of the DBD (<30 amino acids), GhHsf1 and GhHsf28 contain long N-terminals rich in Ser (80 and 113 amino acids in GhHsf1 and GhHsf28 respectively) (Table 2, Figure 1). The function of this long N-terminal upstream of the DBD domain remains to be determined.
The HR-A/B domain, composed of several hydrophobic heptad repeats, is responsible for the interactions with Hsfs to generate Hsfs dimers or trimers through a helical coiled-coil structure. Similar to other plant Hsf proteins, cotton class B Hsf proteins are compact without an insertion between HR-A and HR-B (Table 2, Figure 2). Class A Hsf proteins have an insertion of 21 amino acids between the HR-A and HR-B regions, and seven amino acid insertions were found in class C between the HR-A and HR-B regions. GhHsf7(A1a) has the typical HR-A/B structure, consisting of L × (6aa)L × (6aa)L: RQQQ-21aa-QQ: MMSFLAK. In contrast with other reported class C proteins, the structure between HR-A and HR-B in GhHsfs has its own characteristic. The structure in GhHsf27(C1c) is L × (6aa) L × (6aa): MNKRLE(A/T) (A/T)-4aa-QQ: MMAFLY, indicating that cotton class C proteins were probably variable during evolution. Based on the characteristics of HR-A/B, we divided 40 GhHsfs proteins into classes A (n = 22), B (n = 15) and C (n = 3). The oligomerization domain of the HR-A/B region is a conserved domain close to the DBD and separated by a flexible linker. Linkers of 12 to 37 amino acid residues exist in class A, 16 to 77 residues in class B and 10 to 29 residues in class C, with the longest average linker length in class B and the shortest in class C. In class A and class C, the variable length of the linker between the DBD and the HR-A/B region offers additional support for this classification.
Two clusters of basic amino acid residues (K/R motifs) found in Hsf proteins may contribute to their dynamic intracellular distribution between the nucleus and cytoplasm [14,53]. Pfam was searched for potential NLS and NES domains in cotton Hsf proteins. As expected, all Hsfs proteins contain K/R motifs (1 to 5 repeats) ( Table 2). This indicates that GhHsfs proteins are located in the cell nucleus. Hsfs subcellular localizations are also affected by the NES. NES is a leucine-rich motif at the C-terminus required for the NES receptor-mediated nuclear export [29]. Pfam searches showed that 16 class A cotton Hsfs contain the NES signal peptide-LTEQMGLL, while the NES in class B Hsfs is typically L(G/R)LNLM.
The function of class A Hsfs as a transcription activator is mediated by short activator peptide motifs (AHA motifs) located in the C-terminal domains [54]. Previous studies have shown that AHA motifs are characterized by aromatic (W, F, Y), large hydrophobic (L, I, V) or acidic (E, D) amino acid residues. Similar to other class A Hsf proteins, all A-type GhHsfs contain an AHA motif except GhHsf1. The length of AHA motifs in 21 GhHsfs are variable and rich in F, W and D amino acid residues. The C-terminal of GhHsf1 (A9b) does not contain a typical AHA motif but includes a distinct pattern of tryptophan residues, which probably contributes to the activator function. In vitro pull-down assays have shown that AtHsfA8 is inactive in yeast monohybrid assays and it does not recruit any components of the transcription machinery [21]. This indicates that cotton HsfA9b does not regulate gene expression independently at the transcriptional level.

Phylogenetic analysis of the cotton Hsf family and Hsf gene duplication in the D-subgenome
In order to analyze the evolution of Hsfs, 28 Populus trichocarpa Hsfs (PtiHsfs), 21 Arabidopsis thaliana Hsfs (AtHsfs) and 40 Gossypium hirsutum Hsfs (GhHsfs) were used to generate an unrooted phylogenetic tree. Genome sequencing revealed that Populus trichocarpa is evolutionarily closest to Upland cotton; therefore, a phylogenetic tree was constructed based on the cotton Hsf proteins. As shown in Figure 4, Hsf proteins from G.    hirsutum, Arabidopsis thaliana and Populus trichocarpa were clearly grouped into three different classes (A, B and C). Class A is composed of 22 Hsf proteins, which were then grouped into nine distinct sub-clades (A1-A9). The C-type Hsfs from the three plant species also constituted one distinct clade which appeared more closely related to the Hsf A-group. Correspondingly, the B-type Hsfs from the three plant species was grouped into a separate clade. Class B was classified into five subgroups and class C had only three members. As expected, the duplicated cotton Hsfs clustered on the same group.
ClustalW was used to analyze the duplication events that may have occurred during the evolution of the cotton genome. Ten duplicated gene pairs of the 40 cotton Hsf genes were identified between chromosomes, no duplication events within the same chromosome ( Figure 5). Chromosome 7 contained the most duplication events, while chromosome 1, 12, 13 were not involved in any duplication events. GhHsf3 and GhHsf7 participated in two duplication events. GhHsf3 was duplicated with GhHsf35 and GhHsf22, GhHsf7 with GhHsf21 and GhHsf25. Class A proteins contained five duplication events, class B contained four and class C contained one. These results indicate that single gene duplication events are responsible for the expansion of the Hsf gene family in cotton.
Gene structure and mutation analysis of cotton Hsfs compared with G. hirsutum and G. raimondii In plants, most genes are interrupted genes with one or more exons and several introns. The arrangement of intron and exon localization can be used to analyze the evolutionary relationships among different gene members. In order to analyze the gene structures, all the GhHsf genes were compared with the genomic sequence of G. raimondii. The results showed that 38 Hsf genes contain one intron, and only two Hsf genes contain two introns (Table 3). No intronless Hsfs were found in the cotton genome. Most GhHsf genes that were clustered in the same subfamilies shared strikingly similar exon-intron structures. For example, GhHsf genes in class A1 contain one intron and two exons. The intron patterns, which correlate well with the phylogentic clades, strongly support the existence of close evolutionary relationships within the same subfamily. The intron-exon patterns of nine duplicated genes on different chromosomes were also conserved, with the exception of the duplication pair between GhHsf2 and GhHsf29. One intron was inserted in GhHsf2 on the chr2 origin of GhHsf29 (Chr9) during evolution. Previous reports have shown that the conservation of exon-intron structure in subfamilies of paralogous genes exists in the maize WRKY transcription factor family [55]. This characteristic in the same subfamily is important for gene divergence. The different gene structure associated with the different subfamilies may be the result of gene expansion from ancient paralogs or multiple origins of gene ancestry.
The nucleotides in the coding regions of GhHsfs and GrHsfs were compared to analyze the mutation frequency of all the Hsf genes. The mutation frequency of Hsf genes is 0.00996 during the evolution from G. raimondii to G. hirsutum. The synonymous and non-synonymous substitutions are 0.00433 and 0.00564, respectively. This indicates that the rate of nucleotide substitution has increased in allotetraploid genomes relative to the diploids, and that the rate of non-synonymous substitutions is higher than that of synonymous substitutions. This result is consistent with the molecular evolutionary analyses of protein-coding regions at the genome level.

Protein localization analysis of cotton Hsf proteins
In order to investigate the subcellular localization of cotton Hsfs, five genes (GhHsf3, 25, 31, 34 and 39) from three classes were chosen to generate GFP fusion constructs (pBIB-35S::GhHsfs-GFP::NOS). The constructs were introduced into Agrobacterium EHA105 and infiltrated into tobacco leaf cells for protein localization analysis by confocal laser scanning microscopy. Three types of localization were identified among the five Hsfs ( Figure 6). GhHsf3 (B1a) and GhHsf31 (C1a) were strongly expressed only in the nucleolus. GhHsf25 (A1c) and GhHsf39 (A2), which contain NES motifs, were strongly expressed in the plasma membrane and nucleolus. GhHsf34 (A4a) was expressed in the plasma membrane, nucleolus and cytoplasm, and was also observed in the scaffold. The localization of GhHsf proteins is consistent with their protein structure [53,56,57]; that is, GhHsf25, 34 and 39 have NES motifs, while GhHsf3 and GhHsf31 do not.
The effect of heat shock on protein expression was analyzed. At 3 days after infection with the GFP fusion constructs, tobacco plants were treated with heat stress for 1 h and then transferred to normal conditions. The GFP signals of all five Hsf fusions were enhanced significantly after heat treatment (Additional file 4: Figure  S1). This result revealed that protein expression levels were also enhanced after heat shock, and that different cotton Hsf proteins may perform different roles in stress tolerance associated signal transduction during heat stress.

Expression profiles of cotton Hsf genes in different tissues
Analysis of the tissues expression profiles of GhHsfs by qRT-PCR showed that most GhHsfs were expressed in Table 3 Analysis of GhHsfs exon-intron structures Gene Locus in G.ramondii Exon 1 Intron 1 (bp) Exon 2 Intron 2 (bp) Exon 3

GhHsf2
Gorai. all the tested tissues including root, stem, leaf and ovules. All of GhHsf genes were highly expressed in the leaves, and none of the genes exhibited restricted expression in a single tissue (Figure 7). Interestingly, most of GhHsf genes were expressed at very low levels in the root, with the exception of GhHsf31, 32, the expression of which was approximately three times higher in the root than that in other tissues. In addition, analysis of the digital data showed similar expression of duplicated genes located on different chromosomes, such as GhHsf2 and GhHsf29.
Both of these genes exhibited highest expression in the ovules and lowest in other tissues.
The accumulation of reactive oxygen species (ROS) is implicated in cotton fiber development [58]. To investigate the involvement of GhHsfs in the cotton fiber development, a comprehensive analysis of their expression was performed in a WT (Xu-142) and fiberless mutant Xu-142 fl. The results showed that most genes had no difference of expression in a comparison of Xu-142 and the fl mutant, with the exception of GhHsf1, 2, 4, 6, 13, 16, 18, 19, 26, 28, 33 and 39 ( Figure 8). Among these twelve genes, the most significant difference between Xu142 and fl mutant was observed for GhHsf1, with approximately six times greater expression in the WT during fiber initiation (from -3DPA to 3 DPA) compared with that in the fl mutant. In terms of the abundance of gene expression during fiber initiation, the most abundant expression of GhHsf39 was more than 1000-times greater than that of GhHsf1, indicating that GhHsf39 may act as an important role like recovering oxidative stress or development signal etc. during fiber initiation

Expression analysis of cotton Hsf genes under heat shock
The expression patterns of cotton Hsf genes during heat stress were analyzed by qRT-PCR. Three patterns of expression were observed among the cotton Hsf gene families after heat treatment for 1 h, followed by recovery for 2 to 3 h (Figure 9). The gene expression patterns of GhHsf4, 7, 10, 25 and 38 were not changed significantly. However, the gene expression patterns of GhHsf3, 13,18,21,22,24,27,32,35,37 and 40 were inhibited after heat treatment, while those of the remaining genes were strongly up-regulated. The up-regulated genes were assigned to two categories according to the time at which the increase occurred. The expression of GhHsf1, 6,8,9,17,20,26 and 39 increased instantly in response to heat treatment, and decreased quickly during the recovery process. The highest increase (400-fold) of was observed for GhHsf39. The other 16 genes (GhHsf2, 5, 11, 12, 14-16, 19, 23, 28-31, 33, 34 and 36) were inhibited after heat treatment, and their expression levels slowly increased during the recovery process. These results provide an essential clue to the functional diversification of several Hsfs such as GhHsf1, 6 and 8 as the part of heat stress signaling system, while GhHsf2, 5, 11 and 12 proteins play critical roles in protein refolding.

Discussion
Cotton contains the highest number of Hsf family members among the sequenced plant genomes Upland cotton (G. hirsutum) is an important commercial cotton species, accounting for approximately 95% of all cotton production worldwide. Upland cotton originated from A-genome diploids native to Africa and D-genome diploids such as G. raimondii native to Mexico diverged about 5 to 10 million years ago. These two genomes were then reunited approximately 1 to 2 million years ago and generated tetrapolid Upland cotton (2n = 4X = AADD) [41,59]. Due to the high similarities in the gene sequence and genome organization between the A and D genomes, the publication of the D-genome sequence provides a useful tool to analysis gene function in Upland cotton. In this study, we cloned and analyzed 40 Hsf genes from the Upland cotton D-subgenome including 22 class A, 15 class B and three class C members.
Previous studies have indicated that the increase in the number of transcriptional genes is an important event during the evolution of complex plant systems. It is hard to achieve the expansion of transcriptional regulating genes through single gene duplications alone, indicating the importance of genome duplications in the process of gene expansion. It was estimated that more than 90% of the increase in transcriptional regulating genes over the last 150 million years results from genome duplication in the Arabidopsis lineage. Comparison of the Hsfs in the Dsubgenome of Upland cotton with predicted genes in the G. raimondii genome indicates that there is no gene-loss during tetrapolid Upland generation. As tetrapliod cotton with an "A" and "D" subgenome, Upland cotton contains at least 80 Hsf genes that originate from an ancestral polyploidy event about 2 million years ago [41,59].
The D-subgenome contains twice as many members as Arabidopsis and four times as many members in Upland cotton due to the genome polyploidation. In addition, 10 duplications were found between different chromosomes among three different Hsf gene classes, duplications in cotton being of the segment type rather than tandem gene or cluster duplication. Gene duplication among both different chromosomes and subgenomes has contributed to the Hsf family being the largest among the reported plant genomes.
Evolutionary analyses of protein-coding regions demonstrated that the rate of nucleotide substitution has increased in allotetraploid genomes relative to the diploids. The ratio of nonsynonymous substitutions is higher than those of synonymous mutations [60]. Nucleotide substitutions in the Hsf gene family in Upland cotton also follows this rule, indicating that mutations in the Hsf genes are increased under artificial selection. The total mutation frequency of Hsf genes (0.00996) is slightly lower than the frequency of whole genome mutations from G. raimondii to G. hirsutum [41,60], supporting the view that Hsf is a highly conservative gene family.
Previous reports have indicated that there are 406 myb genes in G. raimondii compared with 163 genes in the Arabidopsis MYB superfamily. Furthermore, the subgroup 9 MYB family has six members known only in cotton, comprising a possible 'fiber clade' distinct from the Arabidopsis thaliana GL1-like subgroup 15, which is involved in trichome and root hair initiation and development [41]. Similar to the unique MYB clade, the Figure 9 Expression analysis of GhHsfs in developing leaves during heat stress. Quantitative RT-PCR was performed to analyze the expression level of GhHsfs during heat stresses in young leaves (the third leaf of 20-day-old seedlings). The vertical axis represents the fold-change in expression relative to CK (1-fold). The horizontal axis represents the different times of treatment. CK: untreated young leaf; 1 h: young leaves were subjected to 42°C for 1 h. 2 h, 4 h: young leaves were returned to normal growth conditions and recovered for 2 h and 4 h after heat stress. Results were normalized using ubiquitin gene expression as the internal control (error bars indicate standard deviation).
Upland cotton D-genome contains three Hsf family members including HsfB4e, HsfA1f, HsfA9c, which are unreported in other organisms. HsfA9 is a specialized Hsf of embryogenesis and seed maturation, which represents functional diversification in the Hsf family [61].
HsfA9 is known to be controlled by transcription factor ABI3 (abscisic acid-insensitive 3) in Arabidopsis [62]. Ectopic expression of HsfA9 resulted in up-regulation of heat shock protein (Hsp) expression and Hsp101 accumulated in leaves under unstressed conditions, while over-expression of sunflower HsfA9 in tobacco seeds improved seed longevity through Hsp accumulation. Therefore, it can be speculated that HsfA9 controls seed development and longevity through interactions with other proteins such as DREB2 or ABI3. In cotton, there are three different HsfA9 members with unique HsfA9C (GhHsf11). GhHsf11 and GhHsf1 (assigned to HsfA9c and HsfA9b, respectively) are upregulated in the recovery process after heat shock and are also strongly induced after double fertilization. Upland cotton is vulnerable to heat stress because cotton begins to flower in summer. The impact of heat stress on cotton is to delay crop maturity and reduce overall lint yields and quality. These two genes probably have unique roles of reducing heat stress injury during summer bolling development.
GhHsf39 is an early heat shock response gene Heat stress causes water deficit, leaf senescence and even male infertility in cotton, and it becomes a serious handicap for cotton production [63]. Hsfs are the major regulators of heat shock protein transcription in plants responding to cellular stresses like increased temperature. The functional diversification within the cotton Hsf gene family was investigated by qRT-PCR analysis of the expression patterns under heat stress. Three distinct patterns of expression in response to heat shock were observed among the Hsf gene family investigated. In the rapidresponse type, the Hsf genes expression levels were instantly and markedly upregulated after heat shock and decreased quickly when heat stress was lifted. Eight of the Hsf genes responded according to this type, with expression levels that were increased to more than 10 times those in untreated plants. Among these genes, GhHsf39 (A2) is a typical member, the transcripts of which were enhanced by approximately 400 times in 1 h after heat stress. HsfA2 is considered a key regulator of heat tolerance in tomato and Arabidopsis due to its high activation of Hsp gene transcription and its continued accumulation during heat stress. Arabidopsis HsfA2 is localized in the nucleus and regulated by itself and HsfA1a-e [14,56]. According to the expression changes observed during heat stress, we deduced that GhHsf39 has a similar function to HsfA2 in tomato and Arabidopsis. In contrast to Arabidopsis, class B2 Hsf genes in cotton, including GhHsf 20, 6, 9 and 26 (B2a-B2d), respond rapidly to heat stress. Domain analysis shows that the Hsfs in class B lack the AHA activator domain and it is possible that these proteins serve as transcriptional coactivators with class A Hsfs, although the functional roles of these four class B Hsfs in cotton require further investigation. In the later-response type, the Hsf gene expression levels were not instantly upregulated or inhibited after heat shock, but slowly and continuously increased during plant recovery from cell damage. These results indicated that different types of GhHsfs probably have different roles in protein refolding under abiotic stresses.

Hsf proteins act as ROS regulators during fiber development
Previous studies have shown that hydrogen peroxide (H 2 O 2 ) and other ROS serve as developmental signals for the onset of secondary wall differentiation [64,65]. H 2 O 2 and other ROS at appropriate concentrations are also required for cell elongation probably through cleaving polysaccharides to relax the cell wall [66,67]. Many genes, such as GhAPX1, involved in modulating ROS concentrations are upregulated at both the transcriptional and translational levels in cotton. GhAPX1 is implicated in deto xification of H 2 O 2 produced by quick-fiber elongation, which is supported by the observation that enhanced transcript abundance and enzymatic activity of GhAPX1 during fiber elongation as well as fiber length can be improved by exogenous H 2 O 2 [58]. The observation that the H 2 O 2-scavenger activity associated with the APX2 gene can be regulated directly by Hsf class A has been confirmed in Arabidopsis. In this study, 28 Hsf genes were strongly expressed during ovule and fiber development. Moreover, 12 genes, including GhHsf1, 6, 16, 19, 33 and 39 exhibited significant differential expression during fiber initiation in Xu142 and its corresponding fl mutant. Among these genes, the most significant difference between Xu142 and fl mutant existed in GhHsf1, the expression of which was approximately six times greater in the WT than in the fl mutant during fiber initiation (from -3 DPA to 3 DPA). During fiber initiation, GhHsf39 was expressed most abundantly at more than 1000-times the levels of GhHsf1, indicating a predominant role for GhHsf39 in this process. Transcriptomic and proteomic studies have confirmed high expression of several Hsps at the stages of fiber initiation and elongation. Although the hypothesis that GhHsf regulates these Hsps directly during fiber development needs to be confirmed in detail, our results indicate that Hsf proteins act as ROS regulators during fiber development.

Conclusions
The complexity of the Hsf family has been the subject of many investigations in different plant species. In this