The Human Histone H2A.Z Gene SEQUENCE AND REGULATION*

The gene encoding the human basal histone variant H2A.Z has been cloned and sequenced. There is a single functional H2A.Z gene with several pseudogene copies. No other histone genes were found in the 3 kilobases of upstream sequence or in the 0.7 kilobase of downstream sequence.

[...] of Health, Bethesda, Maryland 20892

The human H2A.Z gene is compared with the chicken H2A.F gene (Dalton et al. (1989) Nucleic Acids Res. 17, 1745-1756). Both have four introns with identical exon-intron borders, but three of the introns in the chicken gene are much longer than those in the human gene.
The promoter regions of the two genes have little overall homology; however, two GC boxes and one of the CCAAT boxes are conserved.
DNA in chromatin is packaged into nucleosomes, the protein component of which is composed of four families of core histones, H4, H3, H2B, and H2A (2). Whereas three of the four histone families that participate in nucleosome core formation contain only one or several almost identical isoprotein species, the H2A family contains three subfamilies that differ in length as well as sequence (3, 4). In mammals, the members of one subfamily, H2A.1 and H2A.2, are synthesized primarily in concert with DNA replication during S phase of the cell cycle. The members of the other two subfamilies, H2A.X and H2A.Z, are synthesized at lower, almost constant levels throughout the cell cycle and, in addition, are synthesized in quiescent cells at a rate 5-10% of that observed in proliferating cells. H2A.Z represents ~5% of the total H2A in vertebrates, whereas H2A.X represents ~10% (4-6). Although a separate role has yet to be assigned to the basal H2A isoproteins, H2A.X and H2A.Z, such a role is suggested by the observations that these proteins are more highly evolutionarily conserved than their DNA replication-linked counterparts (7-14) and have also been found to compose a greater proportion of the total H2A in more transcriptionally active chromatin fractions (10, 15-18) and in certain early embryonic stages (8, 9, 19, 20).
The genes encoding the major core and histone H1 proteins typically lack introns and are present in multiple clustered copies on several chromosomes in mammalian cells (21-23). These genes are expressed primarily in concert with DNA replication during the cell cycle, producing messenger RNAs with short 5'- and 3'-untranslated regions and a stem-loop structure instead of a poly(A) tail at the 3'-end. Increased transcription of these genes in combination with increases in mRNA transport, processing, and stability brings about a 10-50-fold increase in the steady-state levels of these mRNAs when cells begin DNA replication. This is reflected in the elevated levels of histone production during S phase (24, 25).
The situation for genes encoding basal histone isoproteins is more complicated. Some of these histones, of which H2A.Z is an example, are encoded by single copy genes that contain introns and produce polyadenylated mRNAs which have longer 5'- and 3'-untranslated regions than those of the DNA replication-linked histone mRNAs. Genes of this type also include those for H3.3 (chicken and human) and those for the H2A.Z homologous proteins of chicken, Drosophila, and Tetrahymena (1, [...], and 14, respectively). A second class of basal histone variant mRNAs has been recently characterized (13, 29, 30). This class includes H2A.X, which is encoded by a gene that has not only the signals for message polyadenylation, but also the conserved sequence motifs that are critical to 3'-end processing of the mRNAs encoded by DNA replication-linked histone genes. Such a gene can produce two sizes of mRNA, one of which is a shorter replication-linked nonpolyadenylated mRNA and the other a longer polyadenylated mRNA.

In this study, we report the sequence of the human H2A.Z gene along with 3 kb of upstream sequence and 0.7 kb of downstream sequence. The amount of the H2A.Z transcript is unlinked to DNA replication; however, both the amount and rate of synthesis of the H2A.Z transcript are decreased as proliferating cell cultures become quiescent. We have cloned sequences from the 5'-side of the

RESULTS
Human Histone H2A.Z Sequence-The sequence and features of the human gene encoding the basal histone protein H2A.Z are shown in Fig. 1. The H2A.Z mRNA is encoded by five exons, which range in size from 78 to 438 bp and which are separated by four introns ranging in size from 276 to 427 bp. The exon-intron boundaries exhibit proper splice junction consensus sequences (46), with each of the introns having a 5'-GT and a 3'-AG. Within the first several hundred base pairs upstream from the transcription start site, there are the consensus sequences of several known cis-acting transcription elements: a TATA box (47, 48), three CCAAT boxes (49, 50), and two GC boxes (51, 52). Several hundred base pairs farther upstream there is another similarly oriented and spaced pair of GC boxes. Just downstream from the poly(A) attachment site is a G/T cluster (GTTGTGTATT), which fits the consensus sequence (YGTGTTYY) observed at the same position in a variety of eukaryotic genes (53, 54).
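The fit between the G/T cluster and the consensus can be checked with a simple ungapped, position-by-position comparison. The sketch below is only an illustration and not the procedure used in this study; the function name and the small IUPAC table are assumptions introduced here. Under this comparison, the best 8-nucleotide window of GTTGTGTATT agrees with YGTGTTYY at seven of eight positions.

```python
# Subset of IUPAC nucleotide codes (Y = pyrimidine, R = purine, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG", "N": "ACGT"}

def best_consensus_match(seq, consensus):
    """Return (offset, mismatches) for the window of seq closest to consensus.

    Hypothetical helper: slides the consensus along seq without gaps and
    counts positions where the base is not allowed by the IUPAC code.
    """
    best = None
    for offset in range(len(seq) - len(consensus) + 1):
        window = seq[offset:offset + len(consensus)]
        mismatches = sum(base not in IUPAC[code]
                         for base, code in zip(window, consensus))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best

# G/T cluster and consensus as given in the text.
offset, mismatches = best_consensus_match("GTTGTGTATT", "YGTGTTYY")
print(offset, mismatches)  # prints "2 1": window TGTGTATT, one mismatch
```

The single mismatch is the A at the sixth position of the window, where the consensus has T.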
There are two regions of Alu repetitive sequences (55-57) situated upstream from the H2A.Z gene (core sequences at positions -2670 to -2630 and -1395 to -1355). There do not appear to be any other histone sequences either in this upstream region or in the 739-bp sequence downstream from the end of the mature mRNA.
H2A.Z Gene Organization-In the human genome, it appears that there is one functional gene for H2A.Z and possibly several pseudogenes. When probes prepared from the 5'-UTR or from within the first or fourth introns of the H2A.Z gene are hybridized to Southern blots of restricted human DNA, a single band is observed for each restriction enzyme (Fig. 2). In contrast, when a probe prepared from almost all of the H2A.Z cDNA sequence (EcoRI-BstXI fragment of the cDNA, therefore inclusive of only exon sequences between positions 98 and 2157 in Fig. 1) is hybridized to the Southern blot, several additional smaller bands are observed in each case and are likely to represent pseudogene forms of the H2A.Z mRNA sequence (Fig. 2, lanes 2, 6, and 10). Confirmatory evidence for the presence of H2A.Z pseudogene copies was obtained by using the polymerase chain reaction (37) to amplify sequences in human genomic DNA between primers complementary to sites within the fourth and fifth exons of the H2A.Z gene (Fig. 3). DNA amplified from the gene would have an expected length of 729 bp, whereas DNA amplified from an intronless H2A.Z pseudogene would have an expected length of 442 bp. Both products were found. These results indicate that there is at least one and probably several H2A.Z pseudogenes in the human genome.
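The two expected product lengths are related by simple arithmetic: with primers in the fourth and fifth exons, an intronless template should give a product shorter than the genomic product by the length of the intervening intron. The sketch below illustrates this under that assumption; the 287-bp intron length is derived here from the two stated product lengths rather than quoted from the text, and the function name is hypothetical.

```python
def intronless_amplicon_length(genomic_length_bp, intron_lengths_bp):
    """Expected product length when the introns between the primers are absent."""
    return genomic_length_bp - sum(intron_lengths_bp)

GENOMIC_PRODUCT_BP = 729     # expected length from the gene (from the text)
PSEUDOGENE_PRODUCT_BP = 442  # expected length from an intronless copy (from the text)

# Derived value, not stated in the text: length of the intron between the primers.
implied_intron_bp = GENOMIC_PRODUCT_BP - PSEUDOGENE_PRODUCT_BP

assert intronless_amplicon_length(GENOMIC_PRODUCT_BP, [implied_intron_bp]) == PSEUDOGENE_PRODUCT_BP
print(implied_intron_bp)  # prints 287
```

The implied value of 287 bp falls within the 276-427-bp range reported for the four introns.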
H2A.Z Gene Transcription-The H2A.Z gene encodes an mRNA that is 869 bases long, not including the poly(A) tail; the cDNA previously sequenced contained 82 A residues, for a total length of ~950 bases. Measurement of the H2A.Z mRNA against RNA standards yields a length of 970 bases (data not shown). Primer extension to the 5'-end of the mRNA (Fig. 4) shows that the major transcription start site is 2 bases upstream from the 5'-end of the cDNA in the clone that we sequenced previously (11).
The amount of the H2A.Z mRNA is not dependent on DNA replication (Fig. 5, lanes a and b), consistent with the independence of H2A.Z protein synthesis from replication (5). This is in contrast to the well-described behavior of H2A.1 mRNA, which greatly decreases when DNA replication is inhibited (24, 25). We had previously shown (6) that although H2A.Z protein synthesis is independent of DNA replication, it decreases 10-20-fold when proliferating cell cultures become quiescent. Fig. 5 also shows that the H2A.Z mRNA level does not seem to be affected when protein synthesis is inhibited (lane c) or when both protein and DNA syntheses are inhibited (lane d). Fig. 6 presents an experiment in which H2A.Z mRNA levels were measured in hamster ovary cells which were synchronized at the beginning of S phase and allowed to progress through S phase in the absence of serum. Under these conditions, cytofluorometric measurements show that the cells progress in synchrony through S phase, with almost all the cells having a tetraploid amount of DNA by 9 h; the cells divide once and enter quiescence, maintaining a diploid amount of DNA (data not shown). Thymidine incorporation studies also show that DNA synthesis is completed by 8-9 h.
H2A.Z mRNA levels are elevated until 18 h, but decrease at 21 h to 10% of the maximal level and to 5% at 24 h (Fig. 6). Nuclear run-on studies in the same system show that transcription rates are similar at 6 and 12 h, but are decreased 3-5-fold at 26 and 57 h (Fig. 7). These results are consistent with a 10-20-fold decrease in the rate of H2A.Z protein synthesis and indicate in particular that the decrease in H2A.Z mRNA level between 18 and 24 h is due in part to a decrease in the rate of H2A.Z gene transcription.
In addition, the results suggest that in these cells under these conditions, the switch into a quiescent condition may take place well after the end of S phase and during a rather narrow time interval.
H2A.Z Gene Promoter-The H2A.Z gene promoter sequences were delimited by CAT assays (Fig. 8). When CAT constructs containing various fragments of the upstream region were transfected into human lung fibroblast (IMR-90) cells, the construct containing 234 base pairs of upstream sequence produced the maximal level of CAT activity. This construct (Fig. 8, F) included the TATA box, the three CCAAT boxes, and the two GC boxes nearest the transcription start site. Construct C, which included only the TATA box, produced 18% relative activity, whereas construct D, which included the TATA box plus the most proximal CCAAT box and GC box, produced 51% relative activity. In construct dS, the three CCAAT boxes and two proximal GC boxes were deleted, and the distal pair of GC boxes was moved closer to the TATA box; nevertheless, this construct exhibited only a low level of promoter activity, similar to that elicited by construct C. The longer constructs I, M, and S, which included all five proximal putative promoter elements plus the more distant pair of GC boxes, exhibited reduced levels of CAT activity for as yet undetermined reasons. We intend to explore the possibility that so-called silencer sequences may exist in this upstream region (58). These results localize the H2A.Z core promoter to the region within 234 bp of the transcription start site, a region which contains known promoter signals: a TATA box, three CCAAT boxes, and two GC boxes.

FIG. 2. DNA was digested with EcoRI (lanes 1-4), PvuII (lanes 5-8), or BstXI (lanes 9-12); electrophoresed on 1% agarose gels; and Southern-blotted to nitrocellulose. Lanes 1, 5, and 9 were probed with a DNA fragment containing the 5'-UTR (positions 1-102 in Fig. 1) of the human H2A.Z cDNA. Lanes 2, 6, and 10 were probed with the coding portion and the 3'-UTR of the human H2A.Z cDNA (EcoRI (position 98) to BstXI (position 2157)).
Lanes 3, 7, and 11 were probed with a DNA fragment from the first intron (positions 154-328 in Fig. 1). Lanes 4, 8, and 12 were probed with a DNA fragment from the fourth intron (positions 1474-1733 in Fig. 1). Note that probes from the 5'-UTR as well as from the first and fourth introns hybridize to DNA fragments of the expected sizes seen in the restriction map of the sequenced gene; additional bands result when the H2A.Z cDNA is used as the probe.

FIG. 6. Total RNA was isolated at the times indicated from cultures of hamster ovary cells synchronized to progress through S phase into quiescence. RNA preparation, Northern analysis, and cell synchronization were as described under "Materials and Methods." Quantitation was by densitometry with a Beckman DU-8B spectrophotometer.

FIG. 7. Nuclei were isolated at the times indicated from cultures of hamster ovary cells synchronized to progress through S phase into quiescence. Nuclear preparation, run-on analysis, and cell synchronization were as described under "Materials and Methods." Quantitation was by densitometry with a Beckman DU-8B spectrophotometer.

[...] Fig. 1) is a G/T cluster observed in the same position in a variety of other eukaryotic genes and thought to be involved in 3'-mRNA end formation and polyadenylation (53, 54). However, the H2A.F gene does not appear to have a similar sequence in this position. Comparison of the 171 base pairs of the reported upstream sequence proximal to the H2A.F coding sequence to the similarly positioned upstream sequence of the H2A.Z gene brings out several significant observations. The overall homology of these regions is very low, but two GC boxes are found in the same orientations and relative positions upstream from the transcription start site (Fig. 9). Interspersed with the GC boxes, which are potential binding sites for the Sp1 trans-acting factor (59-61), are CCAAT boxes which may be the binding sites for any of several CCAAT box-binding transcription factors (62, 63). Human H2A.Z has three such sequences, whereas chicken H2A.F has one (Fig. 9). The human H2A.Z gene promoter has the consensus TATA box sequence situated in the proper position relative to the transcription start site (Fig. 9), whereas Dalton et al. (1) have suggested that the chicken H2A.F gene may utilize a degenerate TATA box sequence that is located 24 base pairs upstream from the transcription start site of this gene.
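A comparison of promoter elements by position and orientation, as described above, can be sketched as a scan of both strands for short motifs. The example below is a minimal illustration under stated assumptions: the motif strings (TATAAA, CCAAT, GGGCGG) are commonly used core consensus sequences chosen here for simplicity, the function names are hypothetical, and the test sequence is an invented toy string, not H2A.Z or H2A.F sequence.

```python
import re

# Assumed core consensus motifs; real elements tolerate variation.
MOTIFS = {
    "TATA box": "TATAAA",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def scan_promoter(seq):
    """Return sorted (position, strand, name) hits.

    position is the 0-based index of the leftmost base on the given (top)
    strand; strand is "+" for the motif itself and "-" for its reverse
    complement. A lookahead is used so overlapping hits are not missed.
    """
    seq = seq.upper()
    hits = []
    for name, motif in MOTIFS.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            for m in re.finditer(f"(?={pattern})", seq):
                hits.append((m.start(), strand, name))
    return sorted(hits)

# Invented toy sequence containing each element in a known place.
toy = "AAGGGCGGTTCCAATCACCGCCCTTATTGGCTATAAAGG"
for hit in scan_promoter(toy):
    print(hit)
```

Reporting the strand alongside the position makes it possible to compare orientation and spacing of elements between two upstream regions even when their overall sequence similarity is low.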
The 3'-untranslated regions of the three H2A.Z mRNAs from cow, rat, and human showed 98% nucleotide positional identity, which is even higher than that found in the coding regions (11). Whereas all three mRNAs have a polyadenylation signal sequence in the same position near the 3'-end of the untranslated region, the sequence of the bovine H2A.Z cDNA extended at least 142 nucleotides farther downstream. Comparison of this sequence to that of the human H2A.Z gene shows that the first 22 nucleotides are identical in sequence except for one substitution (data not shown). Farther downstream, the overall homology decreases, with blocks of mismatched sequence alternating with blocks of identical sequence, the most distant block of identical sequence being an 18-nucleotide stretch starting 100 bp beyond the 3'-end of the H2A.Z mRNA (position 2291 in Fig. 1). The chicken H2A.F mRNA 3'-UTR, by contrast, does not have a high homology with the mammalian sequences. The observation that the sequences of the H2A.Z 3'-UTRs have been selectively maintained in common between mammals but not between more distant vertebrates suggests that these sequences evolved and were conserved for a specific, albeit yet unknown, functional role in these genes or their mRNAs in mammalian cells. Interestingly, these 3'-UTR sequences have not diverged between mammalian and avian species in the histone H3.3 gene. In this case, the first 500 base pairs downstream from the end of the coding sequence in chicken and in human are 85% conserved in nucleotide positional identity (26-28).
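Nucleotide positional identity, as used above, is the percentage of aligned positions at which two sequences carry the same base. The sketch below shows one way to compute it for sequences that have already been aligned; the function name and the gap convention are assumptions made for illustration, and the 22-nucleotide example strings are invented, not the bovine or human sequence. They reproduce only the stated situation of 22 positions with a single substitution.

```python
def positional_identity(a, b):
    """Percent of aligned positions carrying the same base.

    Assumes a and b are pre-aligned to equal length with "-" for gaps.
    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned positions")
    same = sum(x == y and x != "-" for x, y in columns)
    return 100.0 * same / len(columns)

# Invented 22-nt strings differing by a single substitution at index 10.
seq_a = "ACGTACGTACGTACGTACGTAC"
seq_b = seq_a[:10] + "T" + seq_a[11:]
print(round(positional_identity(seq_a, seq_b), 1))  # prints 95.5 (21 of 22)
```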
The H2A family of histones is unique among the four core histone families in containing three closely related subfamilies that have been conserved as separate entities during evolution. This relationship was recently confirmed with the isolation and sequencing of the human H2A.X cDNA. Whereas the sequence of the first 120 residues of H2A.X is almost identical to that of human H2A.1, its carboxyl-terminal sequence is homologous to those of the histones H2A of yeast and several other lower eukaryotes (13), results which suggest that this H2A carboxyl-terminal motif is present in all eukaryotic species.
Differences between the H2A.Z and H2A.1 polypeptide sequences are more extensive than those between the H2A.X and H2A.1 sequences, resulting in an overall amino acid positional identity of ~60%. Harvey et al. (8) reported the nucleotide sequence of a cDNA encoding what is now recognized to be an H2A.Z homologous protein in chicken (H2A.F). Partial or complete sequences for H2A.F/Z homologous proteins have been obtained for human, cow, rat, trout, Drosophila (H2AvD), sea urchin (H2A.F/Z), and Tetrahymena (hv1) (9-12, 64) [...]. In addition, tryptic peptide analysis shows that the acellular slime mold Physarum has an H2A.Z homologue (65). Sequence comparisons show that there are three blocks of polypeptide sequence that have greater than 80% homology between H2A.1 and H2A.F/Z separated by blocks of sequence that have less than 45% homology. Although it is not known how the function of the H2A.X and H2A.Z isoproteins differs from that of H2A.1/2, the former pair have been found to be enriched in transcriptionally active chromatin fractions (10, 15-18) and in certain early embryonic stages (8, 9, 19, 20).
The histones that are synthesized in concert with DNA replication are encoded by mRNAs that have a stem-loop structure at the 3'-end which links their stability to ongoing DNA synthesis. The genes of these histones also contain a sequence just downstream from the mature 3'-end of the mRNA that binds a U7 small nuclear ribonucleoprotein and which is involved in the processing of these mRNAs (66, 67). For basal histones, it is now apparent that two related mechanisms are involved in the independence of basal histone synthesis from DNA replication. Some histone isoproteins, such as H2A.Z- and H3.3-related species, are encoded by more typical poly(A) mRNAs. However, others, such as H2A.X and some representatives of the other histone families (29, 30), are encoded by genes that can generate two types of mRNAs, one terminating with poly(A) and the other with the stem-loop structure, bestowing replication-linked stability on the mRNA. The result of this second mechanism is that the same gene can generate histone in both replication-linked and replication-independent manners, whereas the first mechanism results in only replication-independent synthesis.
The level of H2A.Z and other histone mRNAs is greatly decreased in quiescent cells compared to proliferating cells. For the H2A.Z mRNA, the level decreases 10-20-fold, as expected from the lower level of H2A.Z protein synthesis under these conditions. Our results suggest that both transcriptional and post-transcriptional components are involved in the change in mRNA level since the ~3-fold change in the rate of transcription cannot completely account for the 10-20-fold change in mRNA level. The change seen for H2A.Z gene transcription is comparable to that reported for the replication-linked histone genes between G1 and S phase, even though in this case the change in mRNA level may be 50-fold. Transcription of the DNA replication-linked histone variants is also changed as cells go from a proliferating state to a quiescent state. Stein et al. (68) have reported that the transcription of the DNA replication-linked human H4 genes drops to undetectable levels when HL-60 cells are terminally differentiated by treatment with 12-O-tetradecanoylphorbol-13-acetate. The drop in H4 mRNA level is ascribed to a shutdown of gene transcription that may result at least in part from the loss of binding of a specific trans-acting factor within the proximal promoter (68, 69). However, previous work from this laboratory (70) had shown that quiescent histone synthesis including H4 synthesis was occurring in HL-60 cell cultures that had been grown in the presence of 12-O-tetradecanoylphorbol-13-acetate for 7 days. In the case of H2A.Z, our results indicate that as IMR-90 cells change from a proliferating state to a quiescent state, the H2A.Z gene promoter is down-regulated, but not shut off, and that the observed reduction of H2A.Z mRNA during this transition is due to both transcriptional and post-transcriptional regulatory mechanisms.
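The argument that transcription alone cannot account for the change in mRNA level can be made explicit with a simple calculation. The sketch below assumes, purely for illustration, that the transcriptional and post-transcriptional contributions combine multiplicatively at steady state; this model and the function name are assumptions introduced here, not an analysis performed in this study.

```python
def residual_fold(total_fold, transcriptional_fold):
    """Fold change left unexplained by transcription.

    Assumes a multiplicative model:
    total_fold = transcriptional_fold * residual_fold.
    """
    return total_fold / transcriptional_fold

# Values from the text: 10-20-fold change in mRNA level, ~3-fold in transcription.
for total in (10, 20):
    print(total, round(residual_fold(total, 3), 1))  # prints "10 3.3" then "20 6.7"
```

Under this assumption, a residual factor of roughly 3-7-fold would remain to be explained by post-transcriptional effects such as changes in mRNA processing or stability.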
The promoter regions of the human and chicken genes are not highly conserved. However, within the proximal upstream sequence, the two genes have in common two GC boxes, potential Sp1 transcription factor-binding sites, in similar positions and in the same orientations. There is also a CCAAT box in each gene between a putative TATA box and the most proximal GC box. Farther upstream from the H2A.Z gene there is another similarly oriented and spaced pair of GC boxes. Similar arrangements of the gene-proximal cis-acting transcriptional sequence elements are observed in the promoters for herpes simplex viral and human thymidine kinase as well as for human α-globin (49, 62). Whereas the functionality of any specific promoter sequence motifs must await further studies, the localization of the functional promoter of the H2A.Z gene to this region coupled with the conservation of several putative promoter elements between the human and chicken genes suggests that they are important in the regulation of this gene.