Genome-Wide Identification, Characterization and Evolution of BBX Gene Family in Cotton

Background: The B-BOX (BBX) proteins have important functions in the regulation of photomorphogenesis. The BBX gene family has been identified in several plants, such as rice, Arabidopsis and tomato. However, a genome-wide survey of BBX genes in cotton is still lacking. Results: In the present study, 63 GhBBX genes were identified in cotton. Analyses of phylogenetic evolution and gene structure showed that the GhBBX genes fall into five subfamilies and contain conserved B-box domains. qRT-PCR analysis revealed that both GhBBX27 and GhBBX33 have potential roles in proanthocyanidin synthesis in brown cotton fibers. Conclusions: This study provides a genome-wide survey of the BBX gene family in cotton and highlights its role in proanthocyanidin synthesis. These results will help us to further understand the complexity of the BBX gene family and the functional characteristics of its members.


Structural analysis of BBX proteins
The sequence of the first B-box (B1) motif in these genes was C-X2-C-X4-A-X3-C-X2-D-X4-C-X2-C-D (Fig. 1A), and the sequence of the second B-box (B2) motif was C-X7-C-X2-C-D-X3-H (Fig. 1B). The two B-box domains were separated by 5-20 amino acids. The conserved amino acid residues in the B-box motifs have been found to be important for protein-protein interaction and transcriptional regulation. For example, the second B-box motif of AtBBX21 in Arabidopsis has been reported to be key for binding to the HY5 promoter and promoting HY5 transcription. The sequence of the CCT domain is R-X5-R-Y-X-E-K-X3-R-X3-K-X2-R-Y-X2-R-K-X2-A-X2-R-X-R-X-K-G-R-F-X-K (Fig. 1C).
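The consensus motifs above can be searched for in a protein sequence by translating them into regular expressions. The following is a minimal sketch, not the procedure used in this study: the helper names `motif_to_regex` and `find_motif` and the toy sequences are hypothetical, and the sketch assumes that `X` stands for any single residue and `Xn` for exactly n residues.

```python
import re

# Consensus motifs as written in the text.
B1_MOTIF = "C-X2-C-X4-A-X3-C-X2-D-X4-C-X2-C-D"
B2_MOTIF = "C-X7-C-X2-C-D-X3-H"


def motif_to_regex(motif):
    """Translate a dash-separated motif such as 'C-X2-C' into a regex string."""
    parts = []
    for token in motif.split("-"):
        if token.startswith("X"):
            # 'X' alone is one arbitrary residue; 'X4' is four of them.
            count = int(token[1:]) if len(token) > 1 else 1
            parts.append("." if count == 1 else f".{{{count}}}")
        else:
            parts.append(token)  # a fixed, conserved residue
    return "".join(parts)


def find_motif(protein, motif):
    """Return the 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in re.finditer(motif_to_regex(motif), protein)]


# Toy sequences (hypothetical, built only to satisfy the spacing rules).
toy_b1 = "CGGCGGGGAGGGCGGDGGGGCGGCD"
toy_b2 = "MKCAAAAAAACAACDAAAHK"
print(motif_to_regex(B2_MOTIF))      # C.{7}C.{2}CD.{3}H
print(find_motif(toy_b1, B1_MOTIF))  # [0]
print(find_motif(toy_b2, B2_MOTIF))  # [2]
```

A strict consensus match like this will miss family members that carry substitutions at conserved positions, so in practice it would complement rather than replace a profile-based domain search.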
To clarify the relationships among these genes, a phylogenetic tree was constructed from BBX genes of Pyrus bretschneideri, O. sativa, A. thaliana, P. trichocarpa, Z. mays and Gossypium hirsutum. These BBX proteins were divided into five structural groups (Fig. 2). Structural group I contains two B-box domains and one CCT domain. Structural group II also contains two B-box domains and one CCT domain, but the second B-box domain differs between groups I and II. Structural group III consists of one B-box domain and one CCT domain. Structural group IV contains two B-box domains and no CCT domain. Structural group V contains only one B-box domain. In cotton, only 24 GhBBX genes have a CCT domain, accounting for 38% of all GhBBX genes: 11 belong to group I, 7 to group II and 6 to group III. The remaining genes fall into groups IV and V, which contain 31 and 8 GhBBX genes, respectively (Fig. 2). Notably, GhBBX16 falls into structural group II although it contains only one B-box (Fig. 2). These results indicate that some GhBBX proteins may have lost a domain in recent evolutionary events while retaining other common features of their structural groups. Phylogenetic analysis shows that the GhBBX proteins belonging to the same structural group are classified by amino acid similarity and the structural organization of the B-box and CCT domains.
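The domain rules and group sizes described above can be summarized in a short sketch. It is illustrative only: the function name `structural_group` is hypothetical, and a rule based purely on domain counts cannot separate groups I and II (they differ in the sequence of the second B-box) or reproduce phylogeny-based exceptions such as GhBBX16.

```python
def structural_group(n_bbox, has_cct):
    """Map a domain architecture to the structural group(s) described in the text."""
    if n_bbox == 2 and has_cct:
        return "I/II"  # separated further by differences in the second B-box
    if n_bbox == 1 and has_cct:
        return "III"
    if n_bbox == 2:
        return "IV"
    if n_bbox == 1:
        return "V"
    raise ValueError("architecture does not match any BBX structural group")


# Group sizes reported for cotton in the text.
GROUP_SIZES = {"I": 11, "II": 7, "III": 6, "IV": 31, "V": 8}

total = sum(GROUP_SIZES.values())                        # 63 GhBBX genes
with_cct = sum(GROUP_SIZES[g] for g in ("I", "II", "III"))  # 24 genes with a CCT domain
cct_percent = round(100 * with_cct / total)              # 38

print(total, with_cct, cct_percent)
```

The tally confirms that the five group sizes add up to the 63 genes reported and that the 24 CCT-containing genes correspond to the stated 38%.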

Structural analysis of BBX genes in cotton
The gene structures of these BBX genes were analyzed in upland cotton. The results indicated that only four BBX genes had no introns, all of which belong to structural group IV (Fig. 3). The remaining BBX genes contained 1 to 9 introns, with GhBBX40 having the most (Fig. 3). GhBBX4, GhBBX9, GhBBX12, GhBBX13 and GhBBX18 each contained four introns and the same number of exons. In addition, 26 BBX genes had two exons and one intron, and 11 BBX genes had three exons and the same number of introns; these genes belong to structural groups IV and V, which lack the CCT domain (Fig. 3).

Expression pattern of selected BBX family genes in cotton
To understand the expression pattern of the BBX family genes, qRT-PCR was carried out. The expression levels of BBX genes from structural groups IV and V were tested at 6, 12, 18 and 24 days post anthesis (DPA) in brown cotton fibers (Fig. 4). A small number of genes, including GhBBX25, GhBBX59, GhBBX47 and GhBBX34, were expressed at low levels throughout these periods (Fig. 4). GhBBX58, GhBBX63, GhBBX49, GhBBX51, GhBBX41 and GhBBX60 were expressed at lower levels at 6 DPA, 12 DPA and 18 DPA. GhBBX41 and GhBBX54 had low expression at 24 DPA but high expression at the other periods (Fig. 4). In contrast, GhBBX25 had high expression at 18 DPA but lower expression at the other periods. GhBBX37 had the same expression level at all four periods. The expression of GhBBX55 was higher at 18 DPA and 24 DPA than at the other two periods (Fig. 4).
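The text does not state how relative expression was calculated from the qRT-PCR data. As an illustration only, the sketch below uses the widely applied 2^(-ΔΔCt) calculation, which is an assumption and not necessarily the method of this study; the function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene versus a calibrator sample by 2^(-ddCt).

    ct_target / ct_ref: Ct of the target and reference gene in the sample.
    ct_target_cal / ct_ref_cal: the same two Ct values in the calibrator sample.
    """
    delta_sample = ct_target - ct_ref              # normalize to the reference gene
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: an 18 DPA sample compared with a 6 DPA calibrator.
fold = relative_expression(ct_target=24.0, ct_ref=18.0,
                           ct_target_cal=26.0, ct_ref_cal=18.0)
print(fold)  # 4.0
```

The calculation assumes roughly equal amplification efficiency for the target and reference genes; otherwise an efficiency-corrected model would be needed.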

DMACA staining of transgenic Arabidopsis seeds
In Arabidopsis, AtBBX21 belongs to structural group IV of the AtBBX gene family. A previous study demonstrated that mutation of the second B-box of AtBBX21 reduced the expression of CHS, CHI, F3H, F3'H, DFR and LDOX, whereas plants carrying a mutation of the first B-box were not significantly different from the overexpression plants in Arabidopsis. In this study, eukaryotic expression vectors of GhBBX27 and GhBBX33, which like AtBBX21 belong to structural group IV, were constructed and transformed into Arabidopsis. Seeds of the transgenic plants were stained with DMACA reagent to observe the color of the seed coat. The seed coats of transgenic Arabidopsis carrying GhBBX27 or GhBBX33 were dark, whereas those of wild-type Arabidopsis were light (Fig. 5). The color of the Arabidopsis seed coat is caused by the accumulation of proanthocyanidins in the seed coat. The darker seed coats of the GhBBX27 and GhBBX33 transgenic lines therefore suggest that GhBBX27 and GhBBX33 may affect the accumulation of proanthocyanidins.

Morphological observation of transgenic Arabidopsis seedlings
The seeds of transgenic and wild-type Arabidopsis were selected, sterilized and sown on MS medium. When the true leaves had emerged, the morphology of the seedlings was observed. The hypocotyls of the GhBBX27 and GhBBX33 transgenic seedlings were shorter than those of wild-type Arabidopsis (Fig. 6). During seedling photomorphogenesis, HY5 inhibits hypocotyl elongation. It is therefore speculated that GhBBX27 and GhBBX33 inhibit the elongation of seedling hypocotyls by promoting the expression of HY5.

Discussion
In plants, BBX proteins regulate growth and development. The AtBBX protein contains two B-box domains and one CCT domain, and plays a key role in flowering by regulating the photoperiod. AtBBX1, also known as CONSTANS (CO), is the core factor controlling flowering in Arabidopsis. CO promotes flowering under long-day (LD) conditions but has no effect on flowering time under short-day (SD) conditions [16]. CO mutants flower late only under LD conditions, whereas CO-overexpressing plants flower early under both LD and SD conditions. The expression of BBX2 and BBX3 differs from that of CO: changes in their expression have little effect on flowering time, but overexpression of BBX2 can shorten the period of two different circadian rhythms [17]. BBX4 (COL3) is not only a positive regulator of photomorphogenesis but also promotes root growth [7]. BBX6 positively regulates the expression of FLOWERING LOCUS T (FT) and thereby acts as an SD-specific flowering inducer [18]. BBX7 (COL9) acts as an LD-specific flowering repressor by reducing the expression of CO and FT [19]. A number of AtBBXs that have only two B-box domains have also been identified. Studies of Arabidopsis mutants showed that AtBBX18, 21 and 22 act as positive regulators of seedling de-etiolation, whereas AtBBX19, 24 and 25 have the opposite effect [8]. AtBBX32, which has only a single B-box, is also a modulator of light signaling [20]. AtBBX proteins without a CCT domain regulate plant growth and development through other factors. Subsequent studies have revealed that a series of B-box-containing proteins, including AtBBX21 and AtBBX22, are also direct targets of COP1. B-box-containing proteins are also known to share many roles with HY5 in the response to light. Experiments show that AtBBX21 is linked to the shade avoidance response [5]. AtBBX22 can interact with HY5, and AtBBX21 activates HY5 expression by binding to the HY5 promoter [9].
AtBBX1, the first BBX gene identified in Arabidopsis, encodes a protein that controls photoperiodic flowering time. Overexpression of AtBBX1 induces early flowering, and mutation of AtBBX1 delays flowering. In rice, OsCO3/OsBBX2 can inhibit flowering under SD conditions and plays a role in regulating the flowering period [21]. Hd1/OsBBX1 not only promotes flowering under SD but also inhibits flowering under LD [22]. Beyond rice, other BBX proteins containing B-box and CCT domains have also been studied in green plants. The function of TaHd1-1 in wheat is similar to that of rice Hd1/OsBBX1 [23]. StCO, isolated from potato, participates in photoperiodic tuber formation [24]. BvCOL1 from sugar beet, when heterologously overexpressed in the Arabidopsis co-2 mutant, rescues the wild-type flowering phenotype, which indicates that it functions in the same way as AtBBX1 [25]. Unlike BBX members with both B-box and CCT domains, other BBX proteins have multiple functions and are involved in the regulation of photomorphogenesis and abiotic stress responses. For instance, AtBBX24 participates in the UV-B signaling pathway, photomorphogenesis, seedling de-etiolation and salt stress responses [26]. AtBBX21/STH2/LHUS and AtBBX22/STH3 are implicated in seedling photomorphogenesis and shade avoidance responses [5,27], while AtBBX18 negatively regulates heat tolerance in Arabidopsis [28]. Heterologous expression of some BBX genes indicates that transcriptional activity is affected by amino acid substitutions in the B1 or B2 domains. It is hypothesized that some BBX proteins containing the conserved B-box domains have played a central role in the regulation of plant development during evolution.
Although the BBX family has been identified in pear, corn and Arabidopsis, the BBX family in cotton had not yet been identified. In this article, 63 GhBBX genes were screened from the cotton gene bank according to the B-box domain and were classified into five subfamilies according to whether they contain a CCT domain. Among these genes, GhBBX27 and GhBBX33, which are similar to AtBBX21, were selected to study their functions in regulating proanthocyanidin biosynthesis. After DMACA staining, the seed coats of transgenic Arabidopsis were darker than those of the wild type, indicating that the proanthocyanidin content had increased. Therefore, it is speculated that GhBBX27 and GhBBX33 can positively regulate proanthocyanidin synthesis. The hypocotyl phenotype of the transgenic Arabidopsis seedlings suggests that the regulation of proanthocyanidin synthesis may be related to the HY5 promoter. Analysis of the HY5 promoter in cotton (Table 2) showed that it contains the same sequence as the HY5 promoter in Arabidopsis. This sequence is the T/G-box, where AtBBX21 interacts with HY5. When the T/G-box of HY5 was point-mutated, AtBBX21 could no longer interact with the HY5 promoter [10]. Therefore, it is speculated that GhBBX27 and GhBBX33 bind to the G-box of HY5 to promote the expression of CHS, CHI, F3H, F3'H, DFR and LDOX, and thus further promote the synthesis of proanthocyanidin.

Conclusions
In the present study, a systematic analysis of the GhBBX gene family was performed, including conserved domain, gene structure, phylogenetic relationship, gene duplication and expression pattern analysis. The GhBBX genes were divided into five structural groups: I (11 genes), II (7 genes), III (6 genes), IV (31 genes) and V (8 genes), which were supported by the analyses of gene structure and conserved domains. The motifs of the two conserved B-box domains are C-X2-C-X4-A-X3-C-X2-D-X4-C-X2-C-D and C-X7-C-X2-C-D-X3-H. qRT-PCR analysis revealed that the GhBBX genes play an important role at different developmental stages of brown cotton fibers. GhBBX27 and GhBBX33 were successfully transformed into wild-type Arabidopsis using the Agrobacterium-mediated method. After DMACA staining, the seed coats of transgenic Arabidopsis were darker than those of the wild type, and the hypocotyls of transgenic Arabidopsis seedlings were shorter than those of the wild type.
It is speculated that GhBBX27 and GhBBX33 may play roles in promoting the synthesis of proanthocyanidins in brown cotton fibers.

Plant material
Zongcaixuan 1 (P26), a brown cotton line bred by members of our laboratory, was planted in the High-tech Agricultural Park of Anhui Agricultural University. Cotton bolls were collected at 6, 12, 18 and 24 days post anthesis (DPA), frozen in liquid nitrogen, taken to the laboratory and stored in an ultra-low-temperature freezer.

Sequence retrieval
To identify and annotate BBX genes in cotton, the Arabidopsis BBX protein sequences from the Arabidopsis Information Resource (TAIR) database (http://www.arabidopsis.org) were used as queries to search against the cotton genome database with the BLASTP program (e-value < 1e-5
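Applying an e-value cutoff to BLASTP results can be sketched as follows. The sketch assumes the hits were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), in which the subject ID is the second column and the e-value the eleventh; the source does not say which output format was used. The function name `filter_hits` and the sequence IDs in the example are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-5):
    """Return subject IDs whose e-value is below the cutoff (12-column tabular BLAST)."""
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, evalue = row[1], float(row[10])
        if evalue < max_evalue:
            hits.add(subject_id)
    return hits


# Two made-up hits: one strong match and one that fails the cutoff.
example = (
    "AtBBX21\tGh_toy_0001\t55.0\t200\t80\t2\t1\t200\t5\t204\t1e-60\t210\n"
    "AtBBX21\tGh_toy_0002\t30.0\t50\t30\t1\t1\t50\t10\t60\t0.5\t25\n"
)
print(filter_hits(example))  # {'Gh_toy_0001'}
```

Candidates that pass the cutoff would still need to be checked for the presence of a B-box domain before being counted as BBX family members.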

Arabidopsis transformation and DMACA staining
The recombinant plasmids pCambia1301a-GhBBX27 and pCambia1301a-GhBBX33 were each transformed into Agrobacterium tumefaciens EHA105, which was then used to transform Arabidopsis by the floral dip method. The transformed plants were selected on MS medium supplemented with 50 mg/L hygromycin. The positive plants were transferred to soil in the greenhouse at 25 °C under a 16 h light/8 h dark photoperiod, and were further confirmed by PCR. Finally, the seeds of transgenic and wild-type Arabidopsis were stained with 0.1% DMACA, and the color of the seed coats was observed.

Observation of hypocotyl lengths
The seeds of transgenic and wild-type Arabidopsis were selected, sterilized and sown on MS medium. When the true leaves had emerged, the hypocotyl lengths of the Arabidopsis seedlings were observed.

played no role in the design of the experiment and data collection, analysis, or preparation of the manuscript.

Authors' contributions
WPP and GJS designed the experiments. WPP, SN and WY performed most of the experiments and analyzed the data. MNJ and GN assisted in experiments. WPP and SN wrote the manuscript. GJS and LDH revised the manuscript. All authors read and approved the final manuscript.

Relative expression of GhBBXs from the structural groups IV and V at different developmental stages of brown cotton fibers
The DMACA staining of Arabidopsis seeds. A: Transgenic Arabidopsis seeds of GhBBX27; B: Transgenic Arabidopsis seeds of GhBBX33; C: Wild-type Arabidopsis seeds
Observation of hypocotyl lengths of transgenic Arabidopsis seedlings. A: Transgenic Arabidopsis seedlings of GhBBX27; B: Transgenic Arabidopsis seedlings of GhBBX33; C: Wild-type Arabidopsis seedlings