Epigenetics and Codon Usage of the Histone Genes in 12 Drosophila Species

Histones are proteins that bind to DNA and form nucleosomes. There are several types of histones that differ in chromosome distribution and timing of their expression. In Drosophila, each canonical type histone is identical or highly similar in amino acid sequence to its corresponding replacement type histone; however, gene structure and codon usage differ between the two types of histones. Identification of the evolutionary changes responsible for the differences between these two histone types will lead to an understanding of the development of epigenetic regulation. Here, recent findings regarding codon usage for canonical and replacement types of histones are outlined for study of the evolution of these histone genes and their epigenetic regulation in Drosophila.


Introduction
The molecular mechanism by which histone modification and replacement give rise to epigenetic regulation is an important unsolved problem in cell biology and molecular biology. Several types of histones are found in Drosophila; one type is expressed in a replication-dependent manner, another type is expressed independently of replication, and other types are histones specific for the centromere region (CEN-P) or are histone-like proteins [1][2][3][4][5][6][7][8][9][10][11]. Most of the early histone studies investigated the replication-dependent type in sea urchin, Xenopus and Drosophila, and therefore this type is called a 'canonical' histone. The other histone type, which is expressed independently of replication, was called a 'variant' or a 'replacement' because of some amino acid substitutions compared to the corresponding canonical histone [12]. The codon usage for the two histone types is reviewed here; however, many interesting results have also been reported regarding CEN-P and histone-like proteins [3,5,11][13][14][15][16][17]. In addition to the normal 'functional' histone genes, broken genes or pseudogenes have also been found in a genome [18]. The histone genes for canonical histones in Drosophila have been reviewed elsewhere [19]. The genetic differences between the two histone types will be helpful for understanding the evolution of these histone genes and their epigenetic systems in Drosophila.
In Drosophila melanogaster, five histones, H1, H2A, H2B, H3, and H4, are known as canonical type histones [1,40] and four histones, H2AvD, H3.3A, H3.3B, and H4r, are known as replacement type histones (Table 1) [65][66][67][68]. A replacement type for H2B has not yet been found in Drosophila; however, a pseudogene for H2B was found [18]. The structure of histone genes seems to be the same among different Drosophila species [71][72][73][74][75][76][77][78]. Genes for the canonical histone type are clustered by tandem duplication, and a repetitive unit is 5 kb (L) or 4.8 kb (S) in D. melanogaster. The difference between the L and S units is due to the presence or absence of a tRNA-derived element [69]. The histone gene cluster is located at the D-E region of chromosome II and the unit is repeated more than 100 times [79,80]. An ordinary repeating unit codes for 5 histone genes, termed a 'quintet' (H1, H2A, H2B, H3, and H4), and an exceptional unit codes for 4 histone genes, termed a 'quartet' (H2A, H2B, H3, and H4). In Drosophila virilis and Drosophila americana, quintets and quartets coexist in the genome [27,75,81,82]. A similar quartet cluster was also found in Mytilus edulis [83]. The repeating units of the cluster in a species are highly similar in terms of DNA sequence and have evolved in a concerted fashion [70,74,77]. No intron has been found in any gene for the canonical histone type. A hairpin loop structure is present in the downstream region of each histone gene and transcription stops at that point [69,84]. A poly(A) tail is not usually added to the transcripts of canonical histones [85], although some exceptions have been reported [86]. The presence of a signal for polyadenylation has been indicated [69,86].

The GC Content at the 3rd Codon Position of the Histone Genes in Drosophila
The GC content at the 3rd codon position of the genes for the canonical and replacement histone types was analysed and compared for 12 Drosophila species [87][88][89] for which the genome project has already been completed. Analyses for the H1 and H2B genes have not been presented before because of the absence of corresponding replacement histones. Results for these genes are shown together with those for the other histone genes in Figure 3. The GC content at the 3rd codon position is clearly affected by at least two factors; one factor is 'species' and the other factor is 'genes' [76,90,91]. In the species comparison, the GC content for any histone gene in Drosophila willistoni was considerably lower than that for other species (Figure 3). The same tendency for a drop in GC value in D. willistoni has been reported for other genes [92,93]. Table 2 shows the GC content at the 3rd codon position for each histone gene averaged over the 12 Drosophila species. For any comparison of canonical and replacement type histones, the GC content of the replacement type is always higher than that of the canonical type. Comparison of the two types shows that the average GC content is 8.9 percentage points higher in the replacement type (62.8%) than in the canonical type (53.9%) (Figure 3 and Table 2). The highest GC content among the canonical types was observed for H2B (61.7%). The reason for this finding is not known. One possibility is the absence of a corresponding replacement type in spite of the fact that H2B is a core histone. The lowest GC content among the canonical types was observed for H1 (48.0%). This finding is probably related to the fact that the expression level of H1, which functions as a linker, is half that of the core histones. As described below, the GC content at the 3rd codon position is relevant to the usage of codons.
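The quantity compared above can be illustrated with a minimal Python sketch. It is not the procedure used in the cited analyses [87][88][89]; the function name gc3_content and the toy sequence are hypothetical, and the sketch assumes an in-frame coding sequence in which every codon, including a stop codon if present, is counted.

```python
def gc3_content(cds: str) -> float:
    """Return the percentage of codons whose 3rd base is G or C.

    Assumes `cds` is an in-frame coding sequence (DNA or RNA letters).
    Every codon in the string is counted, including a stop codon if present.
    """
    seq = cds.upper().replace("U", "T")
    if not seq:
        raise ValueError("empty coding sequence")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    third_bases = seq[2::3]  # every 3rd base, starting at codon position 3
    gc_count = sum(base in "GC" for base in third_bases)
    return 100.0 * gc_count / len(third_bases)


# Toy sequence (hypothetical, not a real histone gene): ATG AAA CGC TCT TAA
toy_cds = "ATGAAACGCTCTTAA"
print(gc3_content(toy_cds))  # 3rd bases are G, A, C, T, A -> 40.0
```

Averaging such per-gene values over the 12 species would give numbers of the kind listed in Table 2, provided the same convention for stop codons is applied to every gene.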

Codon Usage of the Histone Genes in Drosophila
Codon usage for nine histone genes, five canonical (H1, H2A, H2B, H3, and H4) and four replacement (H2AvD, H3.3A, H3.3B, and H4r) genes, is shown in Figure 4. Groups of two synonymous codons are shown on the left-hand side. Groups of three, four, and six synonymous codons are shown on the right-hand side. For convenience, codons with A or U at the 3rd position are placed at the lower end of each bar, and codons with G or C at the 3rd position are placed at the upper end of each bar. In this way, the GC content at the 3rd codon position can be visualized by summing the G and C blocks at the upper end of the bars. Codon usage cannot be compared for Cys, because Cys codons are either absent or too few. When codon usage of the two histone types is compared, G or C is generally used more frequently at the 3rd codon position for the replacement type than for the canonical type. In a small number of exceptional cases, the usage of synonymous codons was almost the same for the two histone types (for example, in a comparison of codons for Glu between H2A and H2AvD, or of codons for Tyr between H3 and H3.3B) or showed an inverse tendency (in comparisons of codons for Asp or Ile between H3, H3.3A, and H3.3B). Therefore, the difference in GC content at the 3rd codon position between the two types seems to reflect a general tendency rather than a few unusual usages for specific amino acids.
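The per-amino-acid percentages plotted in bars of this kind can be computed as in the following Python sketch. It assumes the standard genetic code and an in-frame coding sequence; the function name codon_usage and the toy sequence are hypothetical and are not taken from the cited studies.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(cds: str) -> dict:
    """Return {amino acid: {codon: percent}} within each synonymous group.

    Codons containing letters other than A, C, G, T/U are skipped, as is
    an incomplete trailing codon.
    """
    seq = cds.upper().replace("U", "T")
    usable_length = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable_length, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Total number of codons observed for each amino acid.
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    usage = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        usage[aa][codon] = 100.0 * n / totals[aa]
    return dict(usage)


# Toy sequence (hypothetical): AAA AAG AAG CGC AGA -> Lys x3, Arg x2
usage = codon_usage("AAAAAGAAGCGCAGA")
print(usage["R"])  # {'CGC': 50.0, 'AGA': 50.0}
```

Percentages are normalized within each amino acid, so a group of two synonymous codons and a group of six can be read on the same 0-100% scale.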

Codon Usage at the Histone Modification Sites
The relationship between codon usage and histone modification (at Lys, Arg, Thr, or Ser) was analysed for four histones in Drosophila [87][88][89]. The modification pattern of Lys was complicated because of many sites of modification; several kinds of modifications such as methylation, acetylation, phosphorylation, and ubiquitylation; and multiple modifications at a single site such as 1-3 methylations, methylation and acetylation [20,45,46,51,52,94]. Although the relationship was not clear-cut, when the codon AAA was frequently used, this Lys site tended to be modified.
Thirteen arginine sites are known to be methylated in a total of four histones. The amino acid at position 73 of H2B in Drosophila is substituted with an amino acid other than arginine, and therefore this site is not methylated. At three arginine positions in canonical type histones, position 76 in H2A, 128 in H3, and 92 in H4, the arginine codon AGA was used the most. All three of these positions were modified (Figure 5). Thus, the codon AGA was used with a frequency of more than 50% only at modified sites. There may thus be a relationship between AGA codon usage and arginine methylation.
Figure 5. Arginine sites in the canonical (C) or replacement (R) histone types, with the codon usage for these arginines indicated. The codon usages (%) are labeled with the same colours as in Figure 4. The protein length of H2A in Drosophila differs from that of human/mouse H2A by six amino acids. Indels in human/mouse and Drosophila may cause gaps in the amino acid numbers defined from the N-terminal end for each species; therefore, caution is needed regarding the amino acid position numbers of the modification sites. The "M" on the X axis refers to a site for methylation.
Five serine sites are phosphorylated in total (position 1 in H2A, 33 in H2B, 10 and 28 in H3, and 1 in H4). Position 14 of H2B is occupied by a non-modified amino acid as a result of an amino acid substitution. For the canonical type, the codons UCU and AGU are used more than the other codons for serine at 7 sites, and 3 of these sites (position 1 in H2A, 28 in H3, and 1 in H4) correspond to sites of serine modification (Figure 6). There is thus possibly a connection between the phosphorylation of serine and the usage of UCU and AGU codons.
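The site-by-site comparisons described for arginine and serine amount to asking how often particular codons occupy one aligned position across the gene copies or species examined. A minimal Python sketch follows; the function name site_codon_frequency and the example codon lists are hypothetical illustrations, not data from Figures 5 or 6.

```python
def site_codon_frequency(codons_at_site, targets):
    """Return the percentage of sequences using any of `targets` at one site.

    `codons_at_site` lists the codon found at a single aligned position,
    one entry per sequence. DNA or RNA letters are accepted.
    """
    normalized = [c.upper().replace("T", "U") for c in codons_at_site]
    if not normalized:
        raise ValueError("no codons supplied for this site")
    wanted = {t.upper().replace("T", "U") for t in targets}
    hits = sum(codon in wanted for codon in normalized)
    return 100.0 * hits / len(normalized)


# Hypothetical arginine site: 9 of 12 sequences use AGA.
arg_site = ["AGA"] * 9 + ["CGC"] * 3
freq = site_codon_frequency(arg_site, {"AGA"})
print(freq, freq > 50.0)  # 75.0 True

# Hypothetical serine site: UCU or AGU in 3 of 4 sequences.
ser_site = ["UCU", "AGU", "UCC", "TCT"]
print(site_codon_frequency(ser_site, {"UCU", "AGU"}))  # 75.0
```

Applying the same function to modified and non-modified sites, with a threshold such as the 50% mentioned for AGA, would reproduce the kind of contrast discussed here, although position numbering must first be reconciled between species.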
Four threonine sites are phosphorylated in total (position 119 in H2A, and 3, 11, and 118 in H3). There was no obvious tendency for specific codon usage for threonine. In Drosophila H4, the amino acid at position 47 is substituted with a non-modified amino acid and therefore this site would not be phosphorylated.
In the future, it is anticipated that new histone modifications and more biological meanings for histone modification will be found. As for the timing of these modifications, it was recently reported that a certain modification (H3K9) occurred during translation rather than after translation [95]. The possible connections between modification and codon use, such as between methylation of Arg and the use of AGA, and between phosphorylation of Ser and the use of UCU or AGU, suggest that histone modification might be associated with a specific tRNA, which could provide one of the mechanisms for modification at a specific site in the protein. It is therefore possible that some amino acids within histones are modified during translation.

Histone Genes and Epigenetics Evolution
Gene structure of the two histone types is very different. Large amounts of the canonical histones need to be produced within a short period during early development in Drosophila. This can be accomplished by multiple gene copies, tandem gene clusters, and no splicing. On the other hand, large amounts of the replacement histones are not required; however, they should be expressed at the proper time. Therefore, for the replacement histones, a single or a few gene copies should be sufficient or better than multiple gene copies.
Exon-intron structure is also remarkably different between the genes of the two histone types; no intron has been found for the canonical type, but 1-4 introns have been found for the replacement type. Therefore, control of histone expression by splicing is only possible for the replacement type. Although the detailed mechanisms of the control of histone expression by splicing remain unknown, several conserved sequences at splicing sites have been found for the replacement type [87][88][89]. Transcriptional control plays an important role in controlling the expression of the canonical type [19,69]. However, a transcriptional control region was not found in the upstream region of the genes for the replacement type histones [87,88] except for the H2AvD gene [89]. Thus, among the replacement type histones, control through a conserved upstream sequence appears possible only for the H2AvD gene [89].
Another difference between canonical and replacement type histones is their codon usage. The replacement type uses G or C at the 3rd codon position more often than the canonical type. Codon use may affect translation efficiency in conjunction with the composition of tRNA pools in the cell. Furthermore, the extraordinarily biased codon usage at the sites of histone modification suggests that codon usage may have a functional role.