Genome-wide identification and expression analysis of the BURP domain-containing genes in Gossypium hirsutum

Many BURP domain-containing proteins, which are unique to plants, have been identified. They performed diverse functions in plant development and the stress response. To date, only a few BURP domain-containing genes have been studied, and no comprehensive analysis of the gene family in cotton has been reported. In this study, 18, 17 and 30 putative BURP genes were identified in G. raimondii (D5), G. arboreum (A2) and G. hirsutum (AD1), respectively. These BURP genes were phylogenetically classified into eight subfamilies, which were confirmed by analyses of gene structures, motifs and protein domains. The uneven distribution of BURPs in chromosomes and gene duplication analysis indicated that segmental duplication might be the main driving force of the GhBURP family expansion. Promoter regions of all GhBURPs contained at least one putative stress-related cis-elements. Analysis of transcriptomic data and qRT-PCR showed that GhBURPs showed different expression patterns in different organs, and all of them, especially the members of the RD22-like subfamily, could be induced by different stresses, such as abscisic acid (ABA) and salicylic acid (SA), which indicated that the GhBURPs may performed important functions in cotton’s responses to various abiotic stresses. Our study comprehensively analyzed BURP genes in G. hirsutum, providing insight into the functions of GhBURPs in cotton development and adaptation to stresses.


Background
Upland cotton, the most widely cultivated cotton in the world, is an important worldwide commercial crop for its natural fiber and edible oil. However, the growth of upland cotton was frequently challenged by diverse environmental stresses (such as drought, salinity and heat) during the whole developmental process which ultimately compromised economic yield [1]. BURP domaincontaining proteins are known to play important roles in plant development and stress responses [2][3][4]. The BURP domain-containing proteins were characterized by the conserved C-terminal domain, which was named based on four typical members: BNM2 (a microspore protein in Brassica napus) [5], USP (an unknown seed protein in Vicia faba) [6], RD22 (a dehydration-responsive protein in Arabidopsis thaliana) [2] and PG1β (a non-catalytic βsubunit of the polygalacturonase isozyme 1 in Lycopersicon esculentum) [7]. To date, BURP domain-containing proteins have been found only in plants, indicating that these proteins may have specific functions in plant development.
Generally, BURP domain-containing proteins contained several conserved modules: a putative signal peptide at the N-terminal hydrophobic domain; the BURP domain at the C-terminus, which included several conserved amino acid sites and four repeat cysteine-histidine (CH) motifs, namely, CHX 10 CHX 23-37 CHX 23-26 CHX 8 W (X = any amino acid residue); the variable region in the middle, unique to each BURP protein, which included a short conserved segment or other short segments; and an optional segment consisting of repeated units [8,9]. According to sequence alignment and phylogenetic analysis of the BURP domain-containing proteins identified in many species, such as soybean [10], poplar [11], alfalfa [12], rice [9], maize and sorghum [13], the members of the BURP family could be divided into BNM2-like, USP-like, RD22-like, PG1β-like and other subfamilies. The main difference among members from different BURP subfamilies was the variable region between the N-terminal hydrophobic and the C-terminal BURP domains. The variable region of the BNM2-like proteins contained only a short conserved segment. The USP-like and RD22-like proteins contained a sequence segment (~30 amino acids), while the RD22-like proteins also contained repeat units (~20 amino acids) [14]. The PG1β-like proteins contained a sequence segment (14 amino acids) that was distinguished from members of other subfamilies [7].
Many BURP proteins have been identified and classified based on sequence characteristics. However, the members from different subfamilies showed various expression patterns and functions. According to previous studies, BURP genes potentially perform two main functions: maintaining normal plant development and responding to stresses.
In plant development and metabolism, many BURP genes have been isolated and played significant roles in the development of reproductive organs, such as anther, microspore and seed, and in the metabolism of pectin in the cell wall. For instance, the expression of BNM2 indicated its functions in the development of microspores [5,15,16]. VfUSP, a BURP protein found in field bean, might be involved in regulating early development of zygotic embryogenesis [6,17]. ASG1, a BURP gene found in apomictic Panicum, probably regulated the formation of aposporous initial cells [18]. OsRAF-TIN1, an anther-specific BURP protein in rice, was crucial for pollen development [19]. The expression pattern of RA8, another anther-specific BURP gene in rice indicated its importance for dehiscence of anther [3]. SCB1, a seed coat-specific BURP protein isolated from soybean, might regulate the differentiation of parenchyma cells in seed coat [20]. AtUSP1, a BURP protein in Arabidopsis, was synthesized in cellular compartments, indicating a significant function in seed development [21]. PG1β, a non-catalytic β-subunit of the PG1 complex in tomato, might be essential for pectin metabolism by regulating pectin solubilization and degradation [7,22]. These reports showed the clues of BURPs' functions in various organs development in plants.
In response to stress treatments, some BURP proteins mainly from the RD22-like and BNM2-like subfamilies have been reported to be stress-induced. RD22, a reference gene for drought stress in Arabidopsis, was also induced by salt and abscisic acid (ABA) [2,23]. AtUSP1 was also demonstrated as a suppressor of the ABAmediated moisture stress response. BgBDC1, 2, 3 and 4, four BURP genes homologous to AtRD22 in the mangrove, responded to biotic stresses and abiotic stresses in an ABA-mediated manner [24]. BnBDC1, a BURP gene with sequence similarity to AtRD22 in oilseed rape, was induced by various stresses, including salt, osmotic and UV irradiation, and plant hormones, such as ABA and SA [25]. SBIP-355, a BURP gene homologous to AtRD22 in tobacco, might be involved in plant defense through the SA pathway [26]. Sali5-4a and Sali3-2, two BURP genes found in soybean, were induced by aluminum and Sali3-2 was involved in the salt tolerance [27][28][29]. Moreover, comprehensive analyses of BURP family genes in many plants showed that most of them were responsive to stress treatments [9,10,12,13]. These results implied that BURP genes performed crucial functions not only in plant development but also in response to abiotic stresses and might be involved in phytohormone signal pathways, such as ABA or SA.
At present, only one BURP gene in cotton (GhRDL1) has been cloned and could interact with GhEXPA1, and their co-expression could strikingly increase bolls and fruit production [30]. Meanwhile, GhRDL1 and GhEXPA1, as target genes of two HD-ZIP IV transcription factors, GhHOX3 and GhHD1, could also influence fiber length finally [31]. These findings indicated that BURP genes might perform important functions in cotton. However, most of the BURP genes in cotton are unknown, and no available genome-wide analyses of the cotton BURP gene family have been reported. So, it is considerably interesting to identify and comprehensively analyze BURP family in upland cotton. In this study, putative BURP genes were identified by screening the genome sequence of Gossypium raimondii, Gossypium arboreum and allotetraploid cultivated cotton (Gossypium hirsutum) [32][33][34][35][36]. The comprehensive analyses of BURP genes in three cotton species, including gene structure, phylogenetic tree and expression characteristics in different tissues and under various stress treatments will provide a foundation for further studies on the potential functions of BURPs in cotton growth and stress response.

Results
Identification of BURP family members in G. raimondii, G. arboreum and G. hirsutum To identify BURP family genes, the BURP domain (PF03181) was served as a query to search against the protein databases of G. raimondii, G. arboreum and G. hirsutum. A total of 18, 17 and 30 putative BURP proteins were identified in G. raimondii, G. arboreum and G. hirsutum, respectively, which were confirmed to contain the conserved BURP domain by Pfam and SMART databases. Sixty-four putative BURP proteins were named according to their previously reported homologous groups in other species. The length of putative GrBURP protein sequences ranged from 212 amino acids (aa) (GrBURP2) to 649 aa (GrPG1); GaBURPs ranged from 220 aa (GaBURP3 and GaBURP4) to 649 aa (GaPG1), and GhBURPs ranged from 166 aa (GhBURP5) to 733 aa (GhBNM4). The predicted molecular weight (Mw), isoelectric point (pI), grand average (GRAVY) of hydropathicity and subcellular localization of these proteins were shown in Table 1.

Phylogenetic analysis and classification of the BURP gene family
To investigate the evolutionary relationships of BURP proteins, phylogenetic analysis of 157 predicted BURP proteins from Oryza sativa (17), Zea mays L (10), Arabidopsis thaliana (4 and the remaining one, AtRD22, included in the 4 host BURP proteins), Glycine max (21), Populus trichocarpa (19), Theobroma cacao (17), G. raimondii (18), G. arboreum (17), G. hirsutum (30) and 4 host BURP proteins (AtRD22, VfUSPs, BNM2 and Le-PG1β) were used to construct a phylogenetic tree. The different number of BURP members in Zea mays (15 to 10), Populus trichocarpa (18 to 19) and Glycine max (23 to 21) between our study and previous reports might result from choosing only one transcript of the gene and choosing different genomic versions. These BURP proteins were clustered into eight subfamilies (BNM2-like, USP-like, RD22-like, PG1β-like, BURP V, BURP VI, BURP VII and BURP VIII) ( Fig. 1 and Additional file 1: Table S1). The BURP VI and BURP VII subfamilies contained only the BURP members from two monocots, Oryza sativa and Zea mays. The BNM2-like, BURP V and USP-like subfamilies contained only the BURP members from the investigated dicots. The results indicated that BURP proteins of these subfamilies might evolve in different directions and show diverse functions in different plants. Notably, BURP proteins from Populus trichocarpa, Theobroma cacao, G. raimondii, G. arboreum and G. hirsutum showed similar distributions in four subfamilies (BNM2-like, PG1β-like, RD22-like and BURP V).
Chromosomal location, gene duplication event and syntenic analysis of BURP genes According to the genomic location of 65 BURP genes, the chromosomal distributions of GrBURPs, GaBURPs and GhBURPs were shown in Table 1 and Fig. 2. In G. raimondii, 18 GrBURPs were unevenly anchored on eight chromosomes. D05 and D09 contained four and five GrBURPs, respectively. The other six chromosomes, D01, D02, D04, D11, D12 and D13, contained one to three GrBURPs (Fig. 2c). In G. arboreum, 17 GaBURPs were located on eight chromosomes. A03 and A08 contained the most GaBURPs (4), while the other six chromosomes contained only one to three GaBURPs (Fig.  2b). In G. hirsutum, a total of 24 GhBURPs were unevenly mapped to twelve chromosomes, while the other six GhBURPs were located on unassembled scaffolds. At05 and Dt05 contained five and four GhBUPRs, respectively. The other ten chromosomes contained only one to three GhBURPs (Fig. 2a).
Gene duplication events, including segment duplication and tandem duplication have been studied for their vital role in genomic evolution and formation of gene family [37]. To explore the relationships between BURP genes and gene duplications in cotton, gene duplication analysis was performed and shown in Fig. 2 and Table 2. In G. arboreum, one tandem duplication event (GaBURP3/GaBURP4) was identified. In G. raimondii, two gene duplication events were discovered, including one segment duplication (GrPG2 / GrPG3) and one tandem duplication event (GrRD1 / GrRD2). In G. hirsutum, 13 gene duplication events, including 11 segment duplications and 2 tandem duplications (GhRD1 / GhRD3, GhRD2/ GhRD4), accounted for 86.67% of the GhBURP gene family. It is noteworthy that the segmental duplication pairs identified in G. hirsutum are all orthologs from A and D subgenomes, indicating that the hybridization of A and D genomes and following tetraploidization led to the duplication of these genes.
The selection pressure of BURP gene duplication pairs during their evolution was investigated by analysis of non-synonymous (Ka) substitution / synonymous (Ks) substitution ratio. All of the Ka/Ks ratios were less than 1.0 which implied that these gene pairs have evolved mainly under purifying selection after gene duplication events ( Table 2). With restriction of divergence by purifying selection, gene duplication pairs might exert analogous functions. Moreover, the divergence times between the gene duplication pairs were calculated ( Table 2). In G. raimondii and G. arboreum, the divergence times of two tandem duplication BURP pairs were inferred to be 8 and 19 million years ago (MYA), while the divergence time of the segmentally duplicated BURP gene pair was inferred to be~264 MYA. In G. hirsutum, the divergence times of the duplicated BURP pairs were presumed to be~5.65 to 26.30 MYA, with an average of 15.57 MYA.
According to syntenic analysis, 30, 26 and 18 homologous gene pairs were found between G. hirsutum and G. raimondii, G. hirsutum and G. arboreum, and G. raimondii and G. arboretum, respectively (Fig. 2e). Meanwhile, Table 1 The BURP genes in three cotton species    Table 1 The BURP genes in three cotton species  Table S2), whereas the orthologs of the other five GrBURPs and four GaBURPs in upland cotton were lost. The results revealed that the GhBURPs might experience genic sequence losses during the formation of upland cotton approximately 1-2 MYA [36].

Sequence analysis of BURP proteins
The signal peptides on the N-terminus and BURP domains on the C-terminus of BURP proteins in three cotton species were searched, and the location information was shown in Additional file 3: Table S3. The results showed that all 65 BURP proteins contained the BURP domain, and 37 BURP proteins contained the signal peptides: 8, 12 and 17 BURPs from G. arboreum, G. raimondii and G. hirsutum, respectively (Additional file 4: Figure S1). The multiple sequence alignment analysis of these BURP proteins displayed several highly conserved amino acid sites and four CH motifs indicating that these sites were important for the basic function of the BURP family (Additional file 5: Figure S2). Interestingly, an aspartic acid (D) was found among all 65 BURP proteins between the last two CH motifs revealing that this site was highly conserved and might be significant for maintaining the basic function of the BURP members in cotton. However, GrBURP3 had low similarity to other proteins due to the lack of the second CH and the variable fourth CH. In general, the C-terminus of BURP proteins in cotton could be summarized as CHX 3 YX 6 CHX 23-28 CHXDX 18-23 CHX 8 W with higher conserved sequences between the last two CH motifs.
Gene structure and conserved motifs of BURPs in three cotton species To obtain further insight into the conservation and diversification of the BURPs in three cotton species, analysis of their exon-intron structures and conserved motifs was carried out (Fig. 3 and Additional file 6: Table  S4). Most (57/65) of the BURPs contained less than 3 exons, except GrRD2, GrBURP3, GhBURP5, GaBURP2, GaBNM4, GrBNM4, GaBURP5 and GhBURP8 which contained 4, 5 and 9 exons, respectively. Major members from the PG1β-like and BNM2-like subfamilies contained 2 exons. More than half of the members from the RD22-like and BURP V subfamilies contained 3 and 1 exons, respectively. The results of the gene structure analysis indicated that most of the members from the same subfamilies showed similar gene structures (Fig. 3b).
We further searched 15 conserved motifs using MEME program to obtain more insight into the diversity and similarity of the BURP proteins (Fig. 3c). The BURP domain in the C-terminus contained motif 1, 2, 3, 4, 5, 6 and 7. These seven motifs and their arrangement were similar, especially in the members from the same subfamilies. Closely related BURPs exhibited similar motifs and motif construction. However, motif 13 and motif 8-12 existed only in the BNM2-like and PG1β-like subfamilies, respectively. Motif 14 was present only in the RD22-like and BURP V subfamilies.

Analysis of cis-elements in putative GhBURP promoter regions
Many studies have shown that BURPs participated in various stress responses. To better understand and elucidate the possible regulatory functions of GhBURPs under different stresses, we identified putative stressrelated and plant hormone-related cis-elements among the 1500 bp promoter regions from the start codons  Table S5). Ten kinds of plant hormone-related elements, AuxRR-core (auxin), TGA-element (auxin), P-box (gibberellin), TATC-box (gibberellin), GARE-motif (gibberellin), CGTAC-motif (MeJA), TGACG-motif (MeJA), ERE (ethylene), TCA-element (SA) and ABRE (ABA), and four kinds of stress-responsive regulatory elements, MBS (drought), TC-rich repeats (defense and stress responsiveness), HSE (heat stress) and LTR (cold stress), were identified in the putative promoters of GhBURPs. The results revealed that all promoters of 30 GhBURPs contained at least one putative stress-responsive elements. The promoters of some GhBURPs contained many various stress-responsive elements, such as GhBNM3 (6 HSEs and 4 TC-rich repeats), GhRD2 (5 MBSs, 2 HSEs and 1 LTR), and GhBURP10 (3 HSEs, 2 TC-rich repeats, 1 MBS and 1 LTR). Similar results were found in the promoters of MtBURP [12]. The promoters of 25 GhBURPs contained cis-elements involved in defense and stress responsiveness. Heat stress and drought stress  elements were more common than other stress-related elements, which suggested that many GhBURPs, especially GhBNM3 with 6 HSEs and GhRD2 with 5 MBSs, might be involved in heat and drought stress response in cotton. In addition, promoters of 18 GhBURPs contained P-box (cis-elements related to GA), indicating that these genes might be involved in the GA regulatory pathway.

Tissue specific expression patterns of GhBURPs
To understand the possible functions of GhBURPs during cotton development, we investigated their expression profiles in different cotton organs including root, stem, leaf, petal, stamen, pistil, ovules and fibers using available transcriptomic data [35]. Among the 30 putative GhBURPs, 21 GhBURPs had a fragments per kilobase of transcript per million fragments (FPKM) ≥ 1 in at least one of the 8 tissues (Fig. 5). The other 9 GhBURPs, all from BURP V, showed very low or no expression in the investigated organs, and 7 of them were from gene duplication, which indicated that these genes may be redundancy or pseudogenes, or expressed only in specific tissues at specific developmental stages or in specific environments.
According to the expression features, 21 GhBURPs were mainly clustered into three groups (A-C) based on a hierarchical clustering analysis (Fig. 5). Eight GhBURPs containing all the members of the RD22-like subfamily, belonging to group A, showed general expression in all the organs and higher expression in reproductive organs (pistil, earlier ovule and fiber) than in other organs. Nine GhBURPs belonging to group B showed dominant expression in some organs, such as root, stem, stamen and ovule. The remaining 4 members (GhBURP3, GhBNM5, GhBNM4 and GhBNM6) belonging to group C showed universal low expression in all the organs except a few organs with slightly higher expression. The different expression patterns might be related to the various functions of GhBURPs. GhRDL1 (GhRD4), predominately expressed at the fiber rapid elongation stages (5 and 10 days post anthesis), has been confirmed to promote fiber elongation and increase fruit production by interacting with GhEXPA1 [30]. GhBNM8 with preferential expression in ovules, especially at the fiber initiation stage (− 3 DPA). GhPG5 and GhPG6 showed dominant expression in stamen, indicating their possible roles during stamen development.

Stress-induced expression profiles of GhBURPs
According to the analysis of cis-elements in promoter regions and previous studies on BURPs in other plants, GhBURPs might be involved in stress response. To verify this hypothesis, we analyzed the expression profiles of 15 GhBURPs (FPKM ≥1) under heat, salt and drought treatments using available transcriptomic data [35] (Fig. 6). The results revealed that all 15 GhBURPs could be induced by all three stresses with different degrees.
Among these genes, GhBNM3, a member of the BNM2-like subfamily, was notably induced by heat stress with a continuing increasing trend, which was consistent with the most heat elements in the putative promoter of this gene. Four genes from the RD22-like subfamily, GhRD1, GhRD2, GhRD3 and GhRD4, with many stress-related elements, especially drought elements, showed significant up-regulated expression under salt and polyethylene glycol (PEG) treatment and downregulated expression under heat treatment, which accorded with the name "dehydration-responsive protein". Overall, under heat treatment, most of these genes exhibited early down-regulated and then upregulated expression. In response to salt and PEG treatments, more than half of GhBURPs showed early up-regulated and then down-regulated expression. Similar expression patterns were found among the members of the same subfamilies.
To further explore the potential functions of GhBURPs in response to stress-related plant hormones, we also analyzed the expression characteristics of 26 GhBURPs (except GhBURP3, GhBNM5, GhBNM7 and GhBURP9 with no detected expression in leaf ) under ABA and SA treatment by qRT-PCR ( Fig. 7 and Fig. 8). All of the 26 GhBURPs were induced by ABA or SA at different points in time after treatments. Under ABA treatment, 26 GhBURPs were divided into 3 major patterns (a-c) according to their expression characteristics (Fig. 7).
More than half (15/26) of these genes showed an early (mainly at 3 h and 6 h) down-regulation followed by upregulation (mainly at 12 h and 24 h) (Fig. 7c). Four genes (GhPG3, GhPG4t, GhBURP8 and GhBURP11) showed up-regulated expression at all time points and reached a maximum at 9 h, except GhBURP11 at 24 h (Fig. 7a). The expressions of remaining 7 GhBURPs were downregulated with the minimum levels at 3 h and 6 h (Fig.  7b). Under SA treatment, 6 expression patterns were found, and 21 GhBURPs were contained in 3 predominant expression patterns: up-regulated expression pattern (5 GhBURPs), early down-regulated and then upregulated expression pattern (10 GhBURPs) and early up-regulated, then down-regulated and finally upregulated expression pattern (6 GhBURPs) (Fig. 8).
GhBNM6 showed a down-regulated expression pattern at all time points (Fig. 8f ). Two paralogous gene pairs showed two other expression patterns, which were slightly up-regulated and dramatically down-regulated (GhPG1 / GhPG2) (Fig. 8d) and early down-regulated, then up-regulated and finally down-regulated (GhPG3 / GhPG4) (Fig. 8e). In addition, GhBURP8 and GhBURP11 showed up-regulated expression at all time points under ABA and SA treatments.

Discussion
Characterization of the BURP gene family in cotton BURP genes have been extensively studied in many plants for their functions in plant development and responding to stresses [2, 6-8, 22, 26]. The genomewide analysis of the BURP gene family was performed in abundant plants and found different numbers of BURP members. In our study, 18, 17 and 30 BURP genes were identified in G. raimondii, G. arboreum and G. hirsutum, respectively (Table 1). Recently, two different versions of genome sequences of G. hirsutum were obtained using single-molecule real-time sequencing, famous for long-read, which improved the contiguity and completeness for highly repeats chromosomal regions [38,39]. In the further studies, we will refer to the different genome sequences for more accurate analysis. The analysis of gene duplication events indicated that gene duplication, especially segment duplication, played a significant role in formation of GhBURP gene family ( Fig. 2 and Table 2) [40,41]. Whole-genome duplication (WGD) and polyploidization happened throughout the evolutions of many higher plants [42]. For example, in the ancestral seed plant, one WGD occurred~310 MYA and in all eudicots, the paleohexaploidization event occurred~130-190 MYA [43]. In addition, the studies on the evolution of cotton revealed that two WGD events occurred at approximately 115-146 and 13-20 MYA in Gossypium [32][33][34]. Following these two WGD events, A-genome diploids and D-genome diploid began to diverge~5-10 MYA [32,38]. Then, G. hirsutum was formed following the polyploidization of two diploid ancestors at approximately 1-2 MYA [35]. In G. raimondii, the divergence time of the segmentally duplicated BURPs was inferred to be~264 MYA, indicating that the segmentally duplicated BURPs began to divergence before the ancestral WGD. The divergence time of tandem duplicated BURPs in G. raimondii and G. arboreum were inferred to be 8 and 19 MYA, which were after two WGD events. In G. hirsutum, the inferred divergence times of segmentally duplicated GhBURPs were~5.65 to 26.30 MYA, indicating that the divergence of the segmentally duplicated GhBURPs were accompanied with the divergence of A and D progenitor genomes ( Table 2). As an indicator of selection pressure on a gene or gene region, the analysis of ratios of Ka/Ks indicated that these gene duplication pairs performed similar functions for limitation of purifying selection in function divergence [44]. Based on the analysis of similarity and synteny, some homologs of GrBURPs and GaBURPs were missing in G. hirsutum, Fig. 6 Expression profiles of GhBURPs under three stresses. The expression characteristics of 15 GhBURPs under heat, salt and PEG treatments were investigated using available transcriptomic data. 0 h, 1 h, 3 h, 6 h and 12 h indicate hours after different stress treatments. Gene names and the subfamilies are shown on the right. Blocks with colors represent the relative expression levels of GhBURPs which might be due to gene loss caused by deletion or rearrangement of segment in chromosomes, point mutations and insertion or deletion of genes during the evolutional history of G. hirsutum [43].
The results of phylogenetic analysis showed that the BURP members from nine species could be classified into eight subfamilies ( Fig. 1 and Additional file 1: Table   S1). The RD22-like, PG1β-like and BURP VIII subfamilies were composed of BURP members from monocots and dicots, indicating that these genes might have originated before the divergence of monocots and dicots [10,12]. However, the members of other subfamilies were only from investigated monocots (BURP VI and BURP VII) or dicots (USP-like, BNM2-like and BURP V subfamilies), indicating that these genes might evolve separately and perform different functions between monocots and dicots [9,13,45]. Some results were different from previous studies due to the different methods and species used in the phylogenetic analysis. In our study, we searched BURP proteins in the more species including two monocots, seven dicots and four host BURP proteins than previous studies and 1000 bootstrap replications were used for more reliable classification results in phylogenetic analysis.
According to the structure analysis, all 65 BURP proteins contained conserved BURP-domain, especially the four notable CH motifs (Additional file 4: Figure S1 and Additional file 5: Figure S2). The component motifs and arrangement of motifs of members were conserved in the same subfamilies and were diverse among different subfamilies, which was in accordance with the results of the exon-intron structure of BURPs (Fig. 3). Similar results were found in previous studies and might be related to BURPs' conserved and different functions in plant development, metabolism and stress resistance [9,10,13,21].

Putative functions of GhBURPs
Gene expression patterns were usually known as a significant indication to explore their functions [46]. To explore the potential functions of GhBURPs, their expression patterns in eight tissues were investigated (Fig.  5). The results revealed that the paralogous gene pairs showed similar expression characteristics, indicating that these genes may execute analogical functions, which was confirmed by the expression patterns of BURPs in Glycine max [10]. Notably, GhBNM7 and GhBNM8, homologous to AtUSPL1, which regulated seed development [21], were dominantly expressed in early-stage ovules (cotton seed), hinting their potential roles in ovule and fiber development. GhPG5 and GhPG6, homologous to AtPGLs, were preferentially expressed in stamen [22,47], indicating their functions in pollen development possibly by degrading pectin of the cell wall. In addition, GhRD1, GhRD2 GhRD3, and GhRD4, homologous to AtRD22, were pervasively expressed in the investigated organs, especially in fiber, and had higher expression levels in root than other members of the GhBURP gene family. Meanwhile, previous studies have attested that GhRD4 regulated fruit production and fiber length through controlling cell wall expansion and RD22-like genes could respond to dehydration and other stresses [2,30,31,46,48]. These results indicated that these genes might play potential roles in fiber development and the response to various stresses, which needs to be further verified.
Since the "R" gene (AtRD22) of "BURP" was identified, the stress response of BURP genes gradually became an indisputable fact [2]. The analysis of cis-elements revealed that all putative promoters of the 30 GhBURP contained at least one of the investigated stressresponsive elements, suggesting that these genes might respond to various stresses ( Fig. 4 and Additional file 7: Table S5). Notably, most (25/30) of GhBURPs contained defense and stress response-related cis-elements, indicating that the functions involved in stress responses were maintained during evolution of the GhBURP gene family. Consistently, expression patterns of 15 GhBURPs under three stress treatments (heat, salt and PEG) showed that all of these genes responded to stress treatments at different levels ( Fig. 6). Among them, the expression of GhBNM3 with 10 stress-related cis-elements and four RD22-like genes (GhRD1, GhRD2 GhRD3 and GhRD4) with 4 to 8 stress-related cis-elements were remarkably up-regulated under heat and salt stress treatments, respectively, indicating that these genes might respond to heat and salt stresses via stress-related ciselements.
Previous studies have indicated that the ABA and SA signal pathway were important to stress responses and plant defense [9,49]. Some BURP genes including BnBDC1 [25], AtRD22 [23], two OsBURPs [9] and fifteen BURPs in Glycine max [10] were induced by ABA. In our study, qRT-PCR analyses revealed that 26 GhBURPs were induced by ABA and SA at different points, indicating that this gene family probably participate in ABA or SA signaling pathways (Figs. 7 and 8). The four members from the RD22-like subfamily showed similar expression pattern (initially down-regulated and later up-regulated) in response to ABA and SA. In addition, GhBURP8 and GhBURP11, with neither ABA-nor SA-responsive elements, had a strongly up-regulated expression pattern, indicating that some unidentified stress-related cis-elements might be crucial to response and defense to stresses in cotton.
In general, most GhBURPs (except genes with very low or no expression in investigated tissues) were induced by heat, salt, PEG, ABA and SA, even though the induction of some GhBURPs was slight. Similar to previous reports, the members of the RD22-like subfamily (GhRD1, GhRD2 GhRD3 and GhRD4) were clearly induced by stress treatments, especially by heat and salt treatments, and might be related to the higher expression in root than other members [2]. Meanwhile, the significant roles of GhRD4 in regulating fruit production and fiber length have been proved in previous studies [30,31], implying that this gene executed important functions in plant growth and response to abiotic stresses. The other three genes showed expression profiles similar to GhRD4 in investigated organs and under stress treatments indicating their similar functions. The expression levels of GhBURPs from other subfamilies also changed under stresses, which was similar to the result in Medicago [12]. Taken together, GhBURPs were extensively induced by stresses, revealing that these genes perform important functions in response to different stresses in cotton. These results implied GhBURP gene family's potential importance in improving cotton stress tolerance.

Conclusion
In this study, a systematical analysis was carried out to investigate the BURP domain-containing gene family in three cotton species (G. raimondii, G. arboreum and G. hirsutum). Based on analyses of phylogeny, gene structure and conserved motifs, the BURP genes could be divided into 8 subfamilies. Gene duplication analysis indicated that GhBURP gene family expansion might be due to segmental duplication. Expression analysis revealed that GhBURPs exhibited different expression characteristics at different organs. The analyses of qRT-PCR and transcriptomic data indicated that GhBURPs were induced by various stresses. The results of our study provide a comprehensive view of the GhBURP gene family and might be valuable for improving of cotton stress tolerance.

Identification of BURPs in cotton
The hidden markov model (HMM) profile corresponding to the BURP domain (PF03181) domain was downloaded from Pfam database (http://pfam.xfam.org) [50]. The genomic databases of G. raimondii, G. arboreum and G. hirsutum were obtained from the CottonGen website (https://www.cottongen.org/icgi/home), The genomes of other species were downloaded from phytozome database (https://phytozome.jgi.doe.gov/). HMMER 3.0 was used to search the BURP genes from the three cotton species genomes [51]. The default parameter of e-value threshold was set at 1e− 10. Then, all of the candidate BURP proteins acquired from the HMM search, further confirmed the existence of the BURP domain using SMART database (http://smart.embl-heidelberg.de/) with the normal mode [52]. The BURP genes in other species were identified with the same method.

Chromosomal location, gene duplication and syntenic analysis of BURP genes
The distribution of BURP genes was determined by MapInspect (http://www.plantbreeding.wur.nl/uk/ softwaremapinspect.html) based on physical location from genome databases of three cotton species. First, the predicted BURP proteins were aligned using ClustalW2 at EMBL-EBI (http://www.ebi.ac.uk/Tools/msa/clustalw2/). Then, gene duplication events were determined according to the following criteria: (1) the coverage of alignment sequence was ≥80% of the longer gene; (2) the similarity of the aligned regions was ≥80%; and (3) tightly linked genes on the same chromosome were considered as tandem duplication [37,57]. The syntenic analysis was performed by using MCScanX software with the default parameters [58]. Circos was adopted to plot the diagram of segmental duplication events on chromosomes [59].
The selection pressure of each BURP gene duplication event was calculated by Ka/Ks rates using DnaSp V5.0 software [60]. Generally, Ka/Ks < 1 signified purifying selection; Ka/Ks = 1 signified neutral selection; and Ka/ Ks > 1 signified positive selection. The divergence times of these gene duplication pairs were speculated according to previous reports [38,61].

Sequence analysis
Multiple sequence alignment was performed using identified BURP proteins with ClustalX 2.0. The locations of conserved BURP domains and potential signal peptides of these proteins were determined using SMART database and SignalP 4.0 server (http://www.cbs.dtu.dk/services/SignalP/), respectively [45,62].
The organization of exon-intron structures of BURP genes was ascertained by comparing coding sequences with corresponding full-length genes on the Gene Structure Display Server (GSDS: http://gsds.cbi.pku.edu.cn/) [63]. The conserved motifs from BURP proteins were identified via MEME program. The optimized parameters were set as follows: size distribution, zero or one occurrence per sequence; motif count, 15; and motif width, between 6 and 50 residues [64].

Plant materials and treatments
In this study, G. hirsutum L. acc. Texas Marker-1 (TM-1) was grown in a climate-controlled chamber (light/ dark cycle: 16 h at 28°C/8 h at 22°C). Four-week-old seedlings with exhibiting third true leaves were treated with 100 μM ABA, 100 μM SA and water as a control group. The leaves were collected at 0 h, 3 h, 6 h, 9 h, 12 h and 24 h after treatments. After harvest, all the treated materials were immediately frozen in liquid nitrogen and stored at − 80°C.

RNA extraction and gene expression analysis
Total RNA was isolated from the collected samples using the RNA-prep Pure Plant Kit (TIANGEN, Beijing, China). The DNA-free RNA was reverse transcribed using the PrimeScript RT Reagent kit (Takara, Dalian, China) according to the manufacturer's instructions. qRT-PCR was carried out with an ABI 7500 Real-Time PCR System (Applied Biosystems) using SYBR Premix Ex Taq (Takara). The protocol was performed as follows: step 1: 95°C for 30 s; step 2: 40 cycles of 95°C for 5 s, 60°C for 34 s; and step 3: melting curve analysis. Three biological replicates and the 2 -△△CT method were used to calculate the relative expression levels of GhBURPs [66]. The cotton histone3 gene (GeneBank accession no. AF024716) was used as an internal reference gene to normalize target gene expression levels [57]. The genespecific primers used for qRT-PCR were designed by Primer 5.0 (Additional file 8: Table S6).
The expression levels of GhBURPs in different tissues and under treatments (heat, salt and drought) were investigated using the reported transcriptomic data [35].