Genome-wide identification, classification, and expression analysis of the HSF gene family in pineapple (Ananas comosus)

Transcription factors (TFs), such as heat shock transcription factors (HSFs), usually play critical regulatory functions in plant development, growth, and response to environmental cues. However, no HSFs have been characterized in pineapple thus far. Here, we identified 22 AcHSF genes from the pineapple genome. Gene structure, motifs, and phylogenetic analysis showed that AcHSF families were distinctly grouped into three subfamilies (12 in Group A, seven in Group B, and four in Group C). The AcHSF promoters contained various cis-elements associated with stress, hormones, and plant development processes, for instance, STRE, WRKY, and ABRE binding sites. The majority of HSFs were expressed in diverse pineapple tissues and developmental stages. The expression of AcHSF-B4b/AcHSF-B4c and AcHSF-A7b/AcHSF-A1c were enriched in the ovules and fruits, respectively. Six genes (AcHSF-A1a , AcHSF-A2, AcHSF-A9a, AcHSF-B1a, AcHSF-B2a, and AcHSF-C1a) were transcriptionally modified by cold, heat, and ABA. Our results provide an overview and lay the foundation for future functional characterization of the pineapple HSF gene family.


INTRODUCTION
The living environment of plants is faced with many challenges, including cold, heat, drought, and salinity stresses (Hu & Xiong, 2014;Pereira, 2016;Zhu, 2016). Due to global warming, heat stress is becoming a serious agricultural threat for agricultural production and planting areas worldwide (Wahid et al., 2007). Typically, plants face heat stress when the temperature rises 10 to 15 degrees above the optimum growth environment. Heat stress affects plant development and growth and eventually leads to a decrease in crop yield. Consequently, as a defense or signaling mechanism in response to environmental stresses, plants regulate the expression of several genes through different transcription factors (TFs). The heat shock transcription factor family of plants is involved in heat stress response and regulates the expression of several stress-responsive proteins, including heat shock proteins (HSPs), ascorbate peroxidase (APX), and catalase (CAT) (Ohama et

Identification and Characterization of AcHSF genes in pineapple
The amino acid sequences of pineapple HSFs were retrieved using the HSF-type DBD domain (Pfam: PF00447) as a query in Phytozome JGI. A total of 30 pineapple HSFs were obtained from JGI. Only 22 AcHSFs left, after filtering out the non-typical HSF-type DBD and repeated sequences or canonical coiled-coil structures by SMART online tool (Letunic, Doerks & Bork, 2012). The information of Chromosome localization, CDS, and AA length for AcHSFs was obtained from JGI Phytozome v12.1. The biophysical properties of coding AcHSFs were calculated using the Expasy ProtParam tool. The subcellular localization of AcHSFs was analyzed using BUSCA (http://busca.biocomp.unibo.it/).

Chromosome Localization phylogenetic relationships
The information of all pineapple HSFs' chromosome localization site was acquired from Phytozome v12.1, including chromosome length, chromosome location, and gene start site. The MapChart v2.0 (https://mapchart.net/) was adopted to map the chromosomal location. The phylogenetic relationship of different HSF proteins was explored, and the phylogenetic tree was created using the AcHSF amino acid sequences and the other three species, Arabidopsis thaliana, A. thaliana, Oryza sativa (O. sativa), and Populus. trichocarpa (P. trichocarpa) by the MEGAX with a bootstrap value of 1000. The HSF gene in pineapples was referred to as AcHSF genes and classified according to HSFs in the phylogenetic tree classes A, B, and C.

Genetic structure and cis-acting elements
The gene structures of AcHSF, including exons, introns, and UTR were displayed by the GSDS online tool (Guo et al., 2007). The promoter sequence of AcHSFs found in Phytozome is located 2kb upstream of the translation initiation site. These sequences were analyzed using a plant cis-acting element database New PLACE (Higo et al., 1999), to identify cis-elements necessary for gene expression, development, and hormone signaling under abiotic stress.

Conserved domains and motifs analysis of AcHSFs
Clustal X 2.0 and DNAMAN software was used to align and edit the DBD domain and HR-A/B regions (OD). Using the cNLS online tool, NLS domains were predicted and the NetNES 1.1 online server identified NES domains in the AcHSFs. MEME server (http://meme-suite.org/) was applied to define the conserved motifs of AcHSFs following the parameters: the number of repetitions = any, the maximum number of motifs = 10, minimum width ≥10, maximum width ≤50, and motifs with an E-value <0.01 were retained.

Expression patterns analysis
The transcriptomic data generated from different organs and developmental stages of pineapple have been described previously (Ming et al., 2015;Wang et al., 2020). Briefly, the different organs include 3 stages of petal tissues, 4 stages of sepal tissues, 6 stages of stamen tissues, 7 stages of ovule tissues, 7 stages of gynoecium tissues, and root, mixed flower, leaf, and 6 stages of fruit were used to generate heatmap using the pheatmap package of R software.

Stress treatments
One-month-old uniform tissue-cultured seedlings in a rooting medium were used for the stress treatment analyses (Priyadarshani et al., 2018). The stress treatments of pineapple seeding as follows: cold (4 • C), heat (45 • C), and ABA (100 mM). Leaves were collected from three independent plants at 12 h, 24 h and 48 h after treatment and immediately stored in liquid nitrogen before RNA extraction.Untreated pineapple seedlings of the same size were used as controls.

RNA extraction and qRT-PCR
Total RNA of pineapple leaf tissues was obtained using the RNeasy Plant Mini Kit (50) (Qiagen). For cDNA synthesis, a total of 1 µg RNA per sample was used with cDNA Synthesis SuperMix (Transgen, Beijing, China). qPCR was conducted by the Bio-Rad CFX manager machine using TransStart R Top Green qPCR SuperMix (Transgen, Beijing, China). Pineapple Actin2 is used as the reference gene for qPCR (Wang et al., 2020). There are total of three biological replicates for each sample, and the results are shown as the mean ± standard deviations.

Genome-wide identification of HSF genes in pineapple
HSF-type DBD domains (Pfam: PF00447) amino acid sequences were submitted into Ananas comosus v3 Phytozome database v12.1. A total of 30 putative pineapple HSF sequences were acquired. And then, checked by the SMART online tool and Pfam database, 1 pineapple HSF sequence was rejected due to the absence of typical HSF-DBD domains, and 7 HSF sequences were abandoned due to the absence of coiled-coil structure. As a result, 22 non-redundant pineapple HSFs were identified ( Table 1). The comprehensive information of these 22 AcHSF genes, including gene name, gene ID, CDS and protein length, isoelectric points, molecular weights, predicted subcellular location, and other features are presented in Table 1. The gene with the longest amino acid length is ACHSF-A5, which contains 601 amino acids and also has the largest molecular weight of 65.81 KDa; and the gene with the shortest amino acid length was AcHSF-A9b, which contains 129 amino acids and also has the smallest molecular weight of 13.77 KDa. Prediction of protein isoelectric points (pI) can aid in the purification and isolation of proteins. The predicted isoelectric points (pI) of AcHSFs ranged from 4.68 (AcHSF -B2b) to 9.63 (AcHSF -B4c). Detailed information on other parameters has been given in Table 1.
According to the detailed gene information, 19 AcHSF genes were mapped to the 11 pineapple chromosomes and 3 AcHSF genes located in the scaffold ( Table 1). The number of pineapple HSF genes for each chromosome varied significantly, and there is no discernible pattern in the location of these genes on chromosomes. For example, three AcHSF genes were located in chromosome 5, whereas only one was present in chromosomes 2, 6, 17, and 18 respectively (Fig. 1).

Phylogenetic analysis of AcHSFs gene family
A phylogenetic analysis of 31 Populus trichocarpa HSFs, 25 rice HSFs, and 21 Arabidopsis HSFs was performed to classify the phylogenetic relationships (Guo et al., 2016), together with those of AcHSFs by generating a neighbor-joining phylogenetic tree. The HSFs were grouped into three clusters, A, B, and C, according to the difference between the amino acid sequences of the DBD domain, the HR-A/B region, and the linker between them (Guo et al., 2016;Scharf et al., 2012). Class A consisted of 10 subclusters, designated A1 to A10. Class B contained B1 to B4, and Class C comprises C1 and C2 sub-clusters. In pineapple (Ananas comosus), according to their phylogenetic relationship, 12 AcHSFs out of 22 proteins belong to class A, followed by seven AcHSFs belonging to class B, and three copies of class C (Fig. 2). As a monocot, the pineapple was more similar to rice, rather than the dicot Arabidopsis and Populus trichocarpa. However, none of the AcHSFs were found in the subclass A8 and B3, which was reported to only exist in the monocots (Li et al., 2014). It is strange that the pineapple and rice subclass A7 HSFs showed higher similarity to A2 rather than the Arabidopsis and Populus trichocarpa subclass A7, and the AtHSF-A6a also shows abnormal clustering (Fig. 2).

Conserved domains and motifs of pineapple HSFs
The modular structure of the HSFs contains 5 typical conserved domains: DBD, OD, NLS, NES, and AHA domains from N to C-terminal (Table 3). The most conserved DBD domain composed of approximately 100 amino acids, containing three α-helices and a four-stranded antiparallel β-sheet (α1-β1-β2-α2-α3-β3-β4) (Fig. 4A). In addition to the DBD domain, the HR-A/B next to the DBD domain is also important and plays a crucial role in HSF-HSF interaction (Scharf et al., 2012). Besides, HR-A/B also presents in all AcHSFs (Table 3, Fig. 4B). According to the previous studies, HSFs were artificially divided into A, B, and C classes by the distinction between the HR-A and HR-B motifs (Cheng et al., 2015;Giesguth et al., 2015;Singh et al., 2012). In general, the variable length of the flexible linker between parts A and B of the HR-A/B motif of classes A and C HSFs is approximately 15 to 80 amino acids, while the HR-A/B region is tightly connected without the embedded sequence in the middle in class B members. But strangely, the insert lengths between the HR-A and HR-B have almost no difference in pineapple HSFs (Fig. 4B). And the length of the total HR-A/B domain is about 42 amino acids almost the same in pineapple classes A, B, and C HSFs, while the length of classes A and C HSFs is about 50 amino acids and 29 amino acids of class B HSFs in Arabidopsis, rice and soybean (Chauhan et al., 2011;Guo et al., 2008;Jin, Gho & Jung, 2013;Li et al., 2014). The nuclear localization signals (NLS) and nuclear export signals (NES) are necessary for proteins to import and export the nucleus. The intracellular distribution of HSFs varies dynamically between the nucleus and the cytoplasm, depending on nuclear import and export balance. (Heerklotz et al., 2001;Scharf et al., 1998). After detecting, almost all the HSFs contained NLS sequences rich in basic amino acid residues (K/R), except for AcHSF-B2a, AcHSF-B2b and AcHSF-B4b. However, a total of 8 AcHSFs did not find the NES motifs. As reported in other plants, the transcription activator AHA motif was only located in class A AcHSFs, but the difference is AcHSF-A3 lacks the AHA motif (Table 3).
In addition to the typical conserved domains of HSF, we also detected the putative motifs by Multiple Em for Motif Elicitation (MEME). A total of 10 different motifs were identified in AcHSFs with lengths ranging from 20 to 50 aa (Fig. 5). The motif composition of the same group members is similar, but there are great differences among different group members. The conserved motifs in HSFs indicated that all AcHSFs contained motif 1, motif 2, except for AcHSF-A9a and AcHSF-B4c lack of motif 1. Motif 3 only exists in class A and C HSFs, not in class B. However, motif 7 only present in class A HSFs, and motif 5 only presents in class B HSFs. Additionally, some motifs were only discovered in a certain subfamily of AcHSFs, for example, motif 9 was present in the B1 subclass (Fig. 5).
The results showed some genes are highly expressed in certain tissues, while others are expressed gradually with the development of tissues (Fig. 6). For example, AcHSF-A1c and AcHSF-A7b have high expression levels in 7 fruit tissues, the expression of AcHSF-A9a gradually increased in petal development and have the highest expression value in the P3 development stage. The AcHSF-B4b and AcHSF-B4c are highly expressed in the 7 ovule development stages, which illustrate their important roles in the pineapple ovule development process. We also found that some genes showed tissue-specific expression patterns, such as the AcHSF-B2a was mainly expressed in the fruit S7 stage, AcHSF-A2 and AcHSF-A6 are highly expressed in leaf and flower tissues. In addition, the expression  Figure 5 The conserved motif analysis of 22 AcHSFs. A total of 10 conserved motifs were identified using Multiple Em for Motif Elicitation (MEME). This is the combined match p-value. The combined match p-value is defined as the probability that a random sequence (with the same length and conforming to the background) would have position p-values such that the product is smaller or equal to the value calculated for the sequence under test. Full-size DOI: 10.7717/peerj.11329/fig-5

Expression profiles of AcHSFs response to various stresses
To extend our understanding of AcHSFs in response to stresses, we performed qRT-PCR to investigate the expression patterns of 6 randomly selected AcHSF genes (AcHSF-A1a, AcHSF-A2a, AcHSF-A9a, AcHSF-B2a, AcHSF-B4a, and AcHSF-C1a) in heat, cold and ABA stresses. The results illustrated that almost all of the selected AcHSFs showed similar expression patterns under the same stress conditions. Cold stress drastically affects plant growth and development, and leads to a significant reduction in crop yield (Cai et al., 2015); therefore, plants must respond quickly to cold stress. As shown in the results, under the cold stress treatment, the expression of all the 6 AcHSFs increased rapidly from 0 h to 24 h and then reduced at 48 h (Fig. 7A). These may indicate that AcHSFs are commonly up-regulated within a short-timer by cold stresses, and then the expression is down-regulated rapidly. Heat shock transcription factors play crucial roles in response to heat shock induction. The result showed that the expression of 6 AcHSFs continues to increase from 0 h to 48 h in heat stress treatment (Fig. 7B). After ABA treatment, the expression of most selected AcHSF genes increased from 0 h to 12 h, and then decreased after 12 h, while the expression of AcHSF-A2a continued to increase (Fig. 7C). This result implies that the expressions of AcHSFs were suppressed under the longtime ABA treatment and might play crucial roles in different stress (cold and heat, etc.) response pathways.

DISCUSSION
During its growth and developmental stages, the pineapple is severely destroyed by various abiotic stresses (cold, heat, drought, etc.) and biotic stresses (especially fungal pathogen infection). HSFs are among the critical regulatory components of various abiotic and biotic stresses in plants. This research identified and characterized, for the first time, a systematic genome-wide review of the AcHSF family. Consequently, from the pineapple genome, a total of 30 AcHSF genes were identified. The widely accepted model of HSFs defines the necessity of HSF-type DBD and OD characterized by a coiled-coil structure. Thus, due to the absence of HSF-type DBD domains and/or coiled-coil structures, 8 of them were discarded. Meanwhile, pineapple HSF has a similar subfamily distribution compared with the monocots plant O. sativa, but is different from dicots plants A. thaliana and P. trichocarpa. Some genes are unique to monocots or dicots. For example, the subclasses AcHSF-A8 and AcHSF-B3 are confined to dicots, while AcHSF-A9 and AcHSF-C2 are characteristic of monocots, suggesting that different evolutionary events of HSF genes occurred in dicots and monocots (Fig. 2, Table 1).
Research on gene expression regulation mediated by introns has made significant progress (Le Hir, Nott & Moore, 2003;Li et al., 2019;Rose, 2008;Shaul, 2017). Therefore the study of gene structure is beneficial to elucidate the gene function. Analysis of AcHSFs gene structures revealed that most of the classes A AcHSFs contain more than one intron, and several AcHSFs have 3 or 4 introns, such as AcHSF-A1a, AcHSF-A1b, and AcHSF-A9a. However, the genes in the class B and C only contain 1 intron, except for AcHSF-C1b (Fig.  3). This particular intron structure may be due to the specific functions of the AcHSF genes. The required DBD domain and unique protein domains (HR-A/B, NLS, NES, RD, and AHA) are found in all 22 AcHSF proteins (Table 3, Fig. 4), which provide the structural basis for their conserved function (Giorno et al., 2012). The HSF DBD domain contains approximately 100 amino acids and is strongly preserved in various plant-to-animal species; we also found the same conserved domain in pineapple (Fig. 4A). As reported previously in other plants, the transcription activator AHA motif was only located in class A AcHSFs, but AcHSF-A3 lacks the AHA motif (Table 3). The HSFs that lack AHA domains might contribute to the activator's function differently or form hetero-oligomers by binding to other HSF-As (Guo et al., 2008). The expression patterns analysis of different AcHSFs showed that AcHSF-B4b and AcHSF-B4c are highly expressed in 7 ovule development stages, indicating the potential functions in pineapple ovule development. The high expression levels of AcHSF-A7b, AcHSF-C2, and AcHSF-A1c in fruit development stages uncovered their important roles in fruit development (Fig. 6). Furthermore, we found that the expression of AcHSF-A9a gradually increased throughout the development stage and reached the highest expression level in the third stage of petal development. AcHSF-A2 and AcHSF-A6 have high expression levels in leave and mixed flower tissues (Fig. 6). These data suggest that AcHSFs may regulate several developmental processes. The stress response is very significant for plant growth and development. Previous studies have shown that the HSF genes are involved in several abiotic stress response, including heat, cold, drought, and salt stress responses in different plants such as Arabidopsis, tomato, apple, Populus euphratica, and Phyllostachys edulis (Fragkostefanakis et al., 2015;Giorno et al., 2012;Ikeda, Mitsuda & Ohme-Takagi, 2011;Xie et al., 2019;Xue et al., 2014;Zhang et al., 2016a). In our study, most of the selected AcHSFs showed similar expression patterns under the same stress conditions. Under the cold stress (4 • C) treatment, the expressions of AcHSFs were induced from 0 h to 12 h and then inhibited after 12 h (Fig. 7A). The same expression pattern was also observed in the 100 mM ABA treatment, but the difference was that the AcHSFs were more sensitive to ABA treatment (Fig. 7C). The continuous increase in the expression pattern of AcHSFs was observed at 45 • C treatment, indicating that heat stress-induced the expression of AcHSFs (Fig. 7C).
Taken together, this study is the first to identify the AcHSF family genes, their properties as well as their expression profiles. This information could be used to utilize them as potential candidates in a breeding program of pineapple. However, gene expression and function analysis are complicated biological mechanisms, and additional studies are necessary to interpret the regulatory process.

CONCLUSIONS
In the present study, 22 AcHSF genes were identified in pineapple (Ananas comosus) and generated detailed information on the gene and protein structures. The expression profiles of various tissues and developmental stages were analyzed by the RNA-seq data, which may help to study their functions in different developmental processes or regulatory pathways. We also showed that some AcHSF genes participate in various biotic and abiotic stresses (heat, cold, and ABA), which may help develop new pineapple varieties with desired agronomic traits stress tolerance.