The Human Thymidine Kinase Gene Promoter DELETION ANALYSIS AND SPECIFIC PROTEIN

We report a functional analysis of the human thymidine kinase (tk) gene promoter. We have linked the tk promoter to the chloramphenicol acetyltransferase (CAT) gene to allow direct measurement of promoter strength by assaying chloramphenicol acetyltransferase enzyme activity after transfection into mouse L cells. Putative transcription elements have been identified by deletion and mutation analysis of this promoter. The promoter relies primarily on two "CCAAT" elements and a series of "GC" elements found farther upstream. Two-thirds of promoter activity is maintained by a construct containing 139 base pairs of sequence upstream of the initiation of transcription that contains only one GC and one of the CCAAT elements. In addition, an evolutionary comparison identifies two highly conserved promoter elements: the -40 CCAAT element and a "TATA" element located at -21. We have further characterized both CCAAT elements using a mutational as well as protein binding analysis. From this study we have determined that both the -70 and -40 CCAAT elements bind strongly to the same factor, with a slightly higher affinity for the -40 CCAAT. Competition studies suggest that the CCAAT factor that binds to this promoter is homologous to the protein nuclear factor Y, which binds to the major histocompatibility complex class II Eα gene promoter. In addition, either CCAAT element is capable of supplying almost as much promoter strength as is supplied in the presence of both.

Thymidine kinase is a crucial enzyme in the salvage pathway for thymidine triphosphate formation. This enzyme activity is cell cycle-regulated, increasing during S phase to meet the demands for TTP in DNA replication and declining following S phase (1). Regulation of tk gene expression has been shown to occur at multiple levels. Regulation of expression during terminal differentiation appears to occur, at least in part, by a post-transcriptional mechanism (2). Stewart et al. (3) have shown that the increase in tk mRNA levels following release from serum starvation is due in part to a post-transcriptional mechanism. However, several studies using either the nuclear run-on transcription assay (3) or various recombinant DNA constructs (4) also showed a significant level of transcriptional regulation in serum-starved cells. We wish to define the promoter elements responsible for tk regulation as well as to characterize the constitutive aspects of this promoter. As an initial step we have carried out a deletion analysis to identify regions that are functionally important for transcription of this gene. In addition, protein binding studies were carried out to further characterize a region that is highly conserved and appears to be important for expression and possibly regulation of tk transcription.
A number of specific DNA-binding proteins have been discovered that interact with eukaryotic promoters to help facilitate and/or regulate transcription (5-15). Sp1 and its binding sequence "GGGCGG" have been suggested to be important for constitutive expression of genes (reviewed in Ref. 9). There are a number of different transcription factors that bind to CCAAT sequences (6, 10, 12-15). However, although CCAAT elements have been shown to be integral parts of constitutive promoters, such as the HSV tk promoter (6), some members of a group of proteins that bind to this core sequence (5) have been suggested as functioning in the developmental regulation of transcription (10, 13). To date, at least five distinct CCAAT element binding proteins have been identified (CTF (5), CBP (6), NF-Y (10), NF-Y* (10), and CTF-displacing protein (11)). Other CCAAT factors have been defined as CP1 and CP2 (12), but the exact relationship of these factors to those previously mentioned is not clear. The reason for the existence of several proteins that interact with the same core CCAAT sequence is not known. CTF-displacing protein has been suggested to function as a down-regulator of transcription in the sea urchin histone gene H2B-1 by preventing binding of CTF in nonexpressing tissues (11). Variation in CCAAT binding factor(s) that bind to the human tk promoter has been reported in response to serum stimulation of BALB/c/3T3 cells (13). Previously we have reported the sequence of the human thymidine kinase promoter and identified several putative transcription elements by sequence similarities (16). In addition to a TATA element located at -21, several GGGCGG motifs were found, as well as two CCAAT sequences located at positions -40 and -71. These studies implicate several of these sequences as being functional promoter elements by deletion analysis and present a protein binding analysis of the CCAAT elements.
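Identifying putative elements by sequence similarity, as described above, amounts to scanning both strands for the core motifs. The following is a minimal sketch of such a scan, not the procedure used in Ref. 16. The function names and the toy sequence are illustrative assumptions; the toy sequence is made up and is not the tk promoter.

```python
import re

# Core motifs named in the text. Reverse complements are searched as well,
# so that elements on the opposite strand (inverted elements) are reported.
MOTIFS = {"CCAAT": "CCAAT", "GC": "GGGCGG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motifs(seq, motifs=MOTIFS):
    """Return sorted (start, strand, name) hits, with 0-based start positions."""
    seq = seq.upper()
    hits = []
    for name, core in motifs.items():
        for strand, pattern in (("+", core), ("-", reverse_complement(core))):
            # A lookahead is used so overlapping occurrences are all reported.
            for match in re.finditer(f"(?={pattern})", seq):
                hits.append((match.start(), strand, name))
    return sorted(hits)


# Made-up toy sequence for illustration only (NOT the human tk promoter).
toy = "TTGGGCGGAACCAATGGATTGGCCGCCCTT"
for start, strand, name in find_motifs(toy):
    print(start, strand, name)
```

Positions here are offsets within the input string; converting them to promoter coordinates such as -40 would require knowing where the cap site falls in the sequence.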


MATERIALS AND METHODS
Construction and Assay of Promoter Mutants-We have previously demonstrated that a fragment of the human tk promoter, extending from position -457 upstream of the cap sites to position +35 within the 5'-noncoding region of the first exon (presented in Fig. 1), was capable of efficiently directing transcription of the chloramphenicol acetyltransferase (CAT) gene when transfected into mammalian cells (16). We have excised the tk promoter and CAT gene from the construct used for this analysis (TKCAT (16)) and inserted it into the BamHI site of pUC118 (17) to take advantage of the restriction cleavage sites available in the pUC118 multilinker.
Deletions were created from the 5'-end of the promoter using exonuclease III and S1 nuclease by the method of Henikoff (18) as modified by Hoheisel and Pohl (19). XbaI digestion was used to linearize the DNA in conjunction with SphI digestion of the multilinker to minimize deletions in the reverse direction that might damage the sequencing primer binding site. Escherichia coli MV1193 was transformed with the final ligation mixture and individual colonies were picked. Single-stranded DNA was prepared by infecting 1-ml cultures of the recombinant bacteria with the M13 helper phage M13K07 (17). The resulting packaged DNA was then isolated by the standard M13 miniprep protocol (20). The recombinants were then analyzed by standard dideoxy sequencing protocols (20). Recombinant plasmids from mutants of interest were obtained by large scale double-stranded plasmid preparations and purified through two successive cesium chloride/ethidium bromide equilibrium centrifugations (21).
Oligonucleotide-directed point mutations of the proximal CCAAT box were created in several of the deletion mutants using the method of Kunkel et al. (22). The oligonucleotide 5'-CTCGTGATTTGCCAGCACGC produces a point change of a G to a T (underlined) in the original CCAAT. Mutation frequencies were between 10 and 50% and were assayed directly by dideoxy sequence analysis as described above.
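The effect of the mutagenic oligonucleotide on the CCAAT core can be checked with a short sketch. The mutant sequence is taken from the text with the line-break hyphen removed. The wild-type sequence is an assumption: it is inferred by reverting the one T that restores ATTGG, the reverse complement of CCAAT, and was not taken from the published promoter sequence. All function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def differences(a, b):
    """List (index, base_in_a, base_in_b) for every mismatch between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Mutagenic oligonucleotide as given in the text (5' -> 3').
MUTANT_OLIGO = "CTCGTGATTTGCCAGCACGC"
# ASSUMPTION: wild-type counterpart, inferred by restoring ATTGG (CCAAT on the other strand).
WILD_TYPE_OLIGO = "CTCGTGATTGGCCAGCACGC"

print(differences(WILD_TYPE_OLIGO, MUTANT_OLIGO))          # the single G -> T change
print("CCAAT" in reverse_complement(WILD_TYPE_OLIGO))      # core present in assumed wild type
print("CCAAT" in reverse_complement(MUTANT_OLIGO))         # core lost in the mutant
```

Under this assumption the oligonucleotide is complementary to the strand carrying CCAAT, and the G to T change converts the core to CAAAT on that strand.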
Promoter strengths were characterized by parallel transfections of each of the recombinants into L929 cells (ATCC CCL1) using the DEAE-dextran method (23). Cell lysates were prepared and assayed for chloramphenicol acetyltransferase activity as described (23). In several experiments, to correct for slight variations in the transfection efficiencies from one sample to another, 15 µg of sample DNA was co-transfected with 5 µg of plasmid pXGH (24). The expression of the human growth hormone gene contained in the pXGH was assayed by radioimmunoassay (the Allegro kit, Nichols Institute) and used to normalize the chloramphenicol acetyltransferase activities.
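The normalization described above can be sketched as dividing each chloramphenicol acetyltransferase activity by the growth hormone signal from the same dish and expressing the result relative to a reference construct. This is one plausible reading and not necessarily the exact calculation used here. The function name, construct names, and all numbers below are placeholders, not measured data.

```python
def relative_strength(samples, reference_name):
    """Return promoter strength of each construct as a percent of the reference.

    samples maps construct name -> (cat_activity, hgh_level), both in arbitrary units.
    Each CAT activity is divided by the co-transfected hGH signal to correct for
    transfection efficiency before comparison with the reference construct.
    """
    ratios = {}
    for name, (cat_activity, hgh_level) in samples.items():
        if hgh_level <= 0:
            raise ValueError(f"hGH level for {name} must be positive")
        ratios[name] = cat_activity / hgh_level
    reference = ratios[reference_name]
    if reference <= 0:
        raise ValueError("reference construct has no measurable activity")
    return {name: 100.0 * ratio / reference for name, ratio in ratios.items()}


# PLACEHOLDER values for illustration only; these are not results from this study.
placeholder_samples = {
    "full_length": (100.0, 10.0),
    "deletion_A": (45.0, 9.0),
    "deletion_B": (30.0, 12.0),
}
strengths = relative_strength(placeholder_samples, "full_length")
for name, percent in strengths.items():
    print(f"{name}: {percent:.1f}% of reference")
```

With the correction, a dish that took up less DNA (lower growth hormone signal) is not mistaken for a weaker promoter.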
Labeling and Isolation of DNA Fragments for Binding Assays-Our initial studies utilized the insert from ptk167, a clone containing the HinfI-RsaI fragment (-133 to +34, see Fig. 1) of the human tk promoter, blunt-ligated into the SmaI site of pUC119. The insert fragment (designated tk167) was 3'-end-labeled for DNase I and band shift assays by digestion with either EcoRI or BamHI (which cut in the polylinker on opposite sides of the insert) and repair of the overhang using a Klenow fill-in step (21) with either [α-32P]dATP or [α-32P]dGTP (3000 Ci/mmol). The DNAs were then phenol-extracted, ethanol-precipitated, and redigested with either BamHI or EcoRI, respectively, to excise the insert. These fragments were then purified using a 6% polyacrylamide gel and soak-eluted (21). Later studies utilized double-stranded oligonucleotides, representing either the proximal (tkP) or distal CCAAT (tkD) sequences (see Fig. 1 and Table I). Oligonucleotides were synthesized on a Coder 300 oligonucleotide synthesizer and purified by electrophoresis on a 14% polyacrylamide, 7 M urea gel. Equal amounts of the two strands of the respective oligonucleotides were annealed and 5'-end-labeled using [γ-32P]ATP by a polynucleotide kinase reaction as described in Maniatis et al. (21) or end-repaired using [α-32P]dATP in the presence of the Klenow fragment of DNA polymerase I.
Preparation of Nuclear Extracts-Nuclear protein extracts were prepared from logarithmically growing HeLa S3 (ATCC CCL2.2) monolayers grown in Dulbecco's modified Eagle's medium with 10% fetal calf serum essentially as described by Dignam et al. (25). The cells were rinsed three times with phosphate-buffered saline, detached from the plate with a cell scraper, and centrifuged at 500 × g for 5 min. The pellet was resuspended in Nonidet P-40 lysis buffer (10 mM Tris (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.5% Nonidet P-40) and incubated for 5 min at 4 °C. The nuclei were pelleted at 500 × g for 5 min and washed with Nonidet P-40 lysis buffer. The nuclei were then resuspended in 0.35 M NaCl, 5 mM Na-EDTA, 10 mM 2-mercaptoethanol, and 10 mM Tris (pH 7.5) and incubated at 4 °C for 30 min to dissociate non-histone chromatin proteins. The suspension was centrifuged at 10,000 × g for 10 min, and the supernatant was collected for binding studies.
Band Shift Assay-Band shift assays were carried out with minor modification of the method of Fried and Crothers (26). Binding reactions were performed in 20-µl volumes. All reactions contained 2 µl of nuclear extract (approximately 8 µg of total protein), 2 µg of poly(dI-dC), and 1 × binding buffer (0.05% Nonidet P-40, 4% glycerol, 1 mM Na-EDTA, 10 mM 2-mercaptoethanol, 10 mM Tris (pH 7.5)). The NaCl concentration was adjusted to a final concentration of 70 mM. A 10-min preincubation was conducted, followed by incubation for 20 min with labeled DNA. All reactions were performed at room temperature. This reaction was electrophoresed on a 6% nondenaturing polyacrylamide gel, and the gel was dried and autoradiographed.
To assess relative binding affinities of various DNA sequences for the CCAAT factor, a competition assay was utilized. For these studies a band shift assay was carried out as above but with the appropriate molar excess of cold specific competitor DNA added at the same time as the labeled DNA fragment. Reactions were then electrophoresed in parallel and the extent of competition assessed by relative band intensities in the presence and absence of various amounts of competitor DNAs.
DNase I Protection Assay-For the DNase I protection assay (27) binding reactions were carried out as described above. After the 30-min binding period, 2 µl of DNase I (15 µg/ml, freshly diluted in 30 mM MgCl2, 1 mM dithiothreitol, 20% glycerol) was added, and the DNase reaction was allowed to proceed at 22 °C for 60 s. At this time the samples were diluted and loaded onto a 6% polyacrylamide band shift gel as described above. After electrophoresis was complete the gels were autoradiographed directly without drying. Shifted and unshifted bands were excised and the DNA was electroeluted. The DNA was phenol-extracted and ethanol-precipitated to concentrate the sample. The samples were then denatured and analyzed on a 6% sequencing gel. The marker used in each case was a Maxam-Gilbert G + A sequencing reaction (28).
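For the competition assays, the mass of unlabeled competitor needed for a given molar excess depends on the relative lengths of probe and competitor. The sketch below illustrates that arithmetic. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value from this study), and the probe amount and the 20-base pair competitor length used in the example are placeholders chosen for illustration.

```python
# ASSUMPTION: approximate average molar mass of one base pair of double-stranded DNA.
AVG_BP_MASS = 650.0  # g/mol per base pair


def pmol_of_dsdna(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def competitor_mass_ng(probe_ng, probe_bp, competitor_bp, fold_excess):
    """Mass of competitor (ng) giving the requested molar excess over the labeled probe."""
    probe_pmol = pmol_of_dsdna(probe_ng, probe_bp)
    competitor_pmol = fold_excess * probe_pmol
    return competitor_pmol * competitor_bp * AVG_BP_MASS / 1000.0


# PLACEHOLDER example: 1 ng of a 167-bp probe competed with a 20-bp oligonucleotide.
for fold in (100, 200):
    mass = competitor_mass_ng(probe_ng=1.0, probe_bp=167, competitor_bp=20, fold_excess=fold)
    print(f"{fold}-fold molar excess: {mass:.1f} ng of competitor")
```

Because the per-base-pair mass cancels when both DNAs are double-stranded, the result reduces to fold excess times probe mass times the length ratio; the constant matters only when reporting picomoles.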

RESULTS
Deletion Analysis-To assay for functional elements in the human tk gene promoter, a heterologous minigene was constructed that contains the human tk promoter linked to the CAT gene. Several 5'-promoter deletion mutants were then constructed and assayed to determine their respective activities in L cells (Fig. 1). As upstream sequences are deleted there is a gradual decrease in promoter activity. With one exception, discussed below, each deletion that causes a significant decrease in promoter strength involves deletion of one of the previously predicted GC or CCAAT elements (Ref. 16, Fig. 1B). This suggests that the GC and CCAAT elements are the only promoter elements in this fragment, although we cannot rule out minor contributions from as yet unidentified regions. Looking at the data in Fig. 1 quantitatively, it would appear that the CCAAT elements are responsible for about 30% of the promoter activity and that GC elements may be responsible for the other 70% of the activity.
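The quantitative reading of a 5'-deletion series can be sketched as the activity lost at each successive deletion step, which is then attributed to whatever element that step removed. The construct names and activities below are placeholders chosen only to make the arithmetic visible; they are not the values in Fig. 1, and the function name is an illustrative assumption.

```python
def stepwise_losses(series):
    """Return (longer_construct, shorter_construct, activity_lost) for each deletion step.

    series is a list of (construct_name, activity_percent) ordered from the
    longest promoter fragment to the shortest.
    """
    return [
        (longer_name, shorter_name, longer_activity - shorter_activity)
        for (longer_name, longer_activity), (shorter_name, shorter_activity)
        in zip(series, series[1:])
    ]


# PLACEHOLDER deletion series (percent of full-length activity); not data from Fig. 1.
placeholder_series = [
    ("construct_1", 100.0),
    ("construct_2", 80.0),
    ("construct_3", 65.0),
    ("construct_4", 30.0),
    ("construct_5", 0.0),
]
losses = stepwise_losses(placeholder_series)
for longer, shorter, lost in losses:
    print(f"{longer} -> {shorter}: {lost:.0f}% of full-length activity lost")
```

This additive attribution is only a first approximation, since elements that can substitute for one another will not contribute independently.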
Because there are multiple copies of all of the transcriptional elements it is difficult to assess the importance of individual elements. We have, therefore, created point mutations in the proximal CCAAT sequences of several of these deletion mutants (Fig. 1). These point mutations completely eliminate the in vitro binding of protein to this CCAAT region (data not shown). Although not a perfectly controlled comparison, looking at the promoter strength of the -58 deletion (which contains only the proximal CCAAT sequence) and -88M (which has the distal CCAAT but a mutagenized proximal CCAAT) suggests that either of the two CCAAT sequences can function in the promoter at approximately the same strength. The point mutation in the proximal CCAAT box of the -58M mutant confirms that the point mutation in the CCAAT box completely eliminates promoter activity from this element. Thus it appears that both CCAAT elements are functional but that the presence of two such CCAAT elements (as in -88, -139 and -251) does not result in twice as much activity. It is also interesting to note that two-thirds of the promoter activity can be supplied by the presence of only one GC and one CCAAT element (-139M, Fig. 1). Therefore the additional elements have relatively small effects, at least in this particular assay. The deletion between -58 and -51 is the only region that affects transcription but does not contain a similarity to any previously defined transcription element. It is possible that there is an undefined element between the two predicted CCAAT sequences. However, due to several factors such as the close spacing between the CCAAT elements, the footprint of the factor that interacts with the -40 CCAAT sequence, as well as binding studies with oligonucleotides (see below), we believe that the actual binding sequence of the proximal CCAAT binding factor may be quite broad and be partially affected by this particular deletion (see Fig. 1).

Evolutionary Comparison-In addition to the human tk promoter, sequences of two other cellular tk promoters (hamster (29) and chicken (30)) have been determined, allowing for an evolutionary comparison. All three of these promoters contain core "GC" element identities located in various positions upstream from the TATA element (31), suggesting that each of these promoters may rely on Sp1 binding to help facilitate transcription. The most striking similarity between these promoters, however, is a CCAAT element that lies 15-20 base pairs upstream from the TATA element (Fig. 2, position -40 in the human promoter). In the human and chicken tk promoters, there are other CCAAT sequences that are very similar to the conserved CCAAT sequence (see Fig. 2B for human). However, the presence of only the single CCAAT element in the hamster promoter confirms the need for only one CCAAT sequence and the preference for the -40 position in tk expression.
Protein Binding to the CCAAT Element-In order to more accurately define the human tk promoter elements, we wished to determine protein-binding regions in the tk promoter. Although they contribute less than a third of the overall promoter strength, for several reasons we were most interested in characterizing possible protein interactions with the CCAAT elements. The reasons for this interest are: 1) the proximal CCAAT is well conserved between human, hamster, and chicken, with respect both to sequence and position (Fig. 2); 2) by deletion analysis they were shown to contribute significantly to the overall transcription rate of tk (Fig. 1); 3) CCAAT sequence binding factor(s) for this promoter have been suggested to vary in the cell cycle (13); and 4) our binding studies make it clear that this is the strongest protein-DNA interaction occurring within the tk promoter.
For protein binding studies, a 167-base pair promoter fragment (tk167) spanning -133 to +34 of the tk gene was used initially (see Fig. 1). This fragment contains one GC element, both CCAAT element identities, and the TATA element. Using 8 µg of protein from a HeLa nuclear extract, only one band is seen in a band shift assay, although when higher amounts of nuclear extract are used other less prominent bands begin to appear (data not shown). Analysis of the binding region corresponding to the dominant band in a band shift assay using DNase I protection indicates that the -40 CCAAT element and surrounding sequence is protected from DNase I digestion (Fig. 3). Although the exact boundaries of this interaction cannot be determined, the approximate binding domain is marked in Fig. 1. Although we cannot detect protection at any other position, including the GC element identity, the TATA element, or the -70 CCAAT sequence, this does not rule out some binding to the distal CCAAT element. It is possible that the observed shifted band is the result of binding to either of the CCAAT sequences, resulting in identical shifts. If the proximal CCAAT sequence had a higher affinity, it might represent a higher proportion of the molecules and show a footprint, whereas the lower affinity site would be obscured by the background. Other factors may interact with sequences in this fragment; however, they either are less abundant or interact with a lower affinity than the observed binding.
[Figure legend, partial: ... Table I) and -40 (TK2, included in the tkP oligonucleotide in Table I) CCAAT sequences. Homology to the consensus is indicated by a dash, and the X represents a base missing in TK2 relative to TK1.]
To corroborate this DNase I protection data, double-stranded oligonucleotides were synthesized containing the distal (tkD) and proximal (tkP) CCAAT sites, respectively. These were labeled individually and used in a band shift competition assay to determine whether they bound the same factor and at what affinity (Fig. 4). Both oligonucleotides bound a specific factor effectively, and they both competed with each other. This suggests that they bind the same factor, although tkD showed a somewhat lower affinity than tkP. It is not clear whether the footprint failed to show protection of the distal site because of preferential binding to the proximal site that excludes binding to the distal site or simply because of a lower binding ratio that resulted in a higher background in that region. In a previous experiment a 20-base pair oligonucleotide (tk20, Table I) was used in a similar competition assay and found to have no binding capability. The tk20 oligonucleotide was ligated to form predominantly dimers and found to effectively block binding to labeled tk167 in a competition assay (data not shown). This shows that the tk20 oligonucleotide did not contain a complete binding site, but dimerizing it resulted in effective binding. It also showed that the band shift observed with tk167 could be almost completely eliminated by competition with a CCAAT sequence, making it unlikely that other factors contribute significantly to that observed shift. The difference between tk20 monomer (which does not bind by itself) and tkP (which does bind) is all at the 5'-end and suggests that the binding region is skewed to the 5'-side of the CCAAT sequence. This is consistent with the deletion data that showed partial loss of activity in a deletion from -58 to -51 (Fig. 1).
Competition by Other Promoter Elements-We have shown that the CCAAT elements are specifically recognized by a DNA binding factor. Since CCAAT binding factors appear to be a heterogeneous group of distinct proteins (5, 6, 10, 11), we were interested in characterizing the factor that binds to the proximal CCAAT element with respect to its capacity to bind other promoters. Two promoter fragments were initially chosen for these analyses. The HSV tk promoter was chosen because of general similarities between these promoters (GC elements and inverted CCAAT sequences) as well as its ability to bind both proteins CBP (6) and CTF (5). The SV40 promoter was chosen as a control because it appears to rely heavily upon GC elements and has thus far not been shown to depend on CCAAT sequences for transcriptional activity. As shown in Fig. 5A, a 100- or 200-fold molar excess of the SV40 promoter was incapable of competing for binding to the tk CCAAT element in tk167. As a positive control for this assay, unlabeled tk167 was also used in a 100- and 200-fold molar excess and effectively prevented protein association with the labeled tk167. The result with the HSV tk promoter was somewhat surprising in that it showed only slight competition (Fig. 5B).
To shed further light on the nature of the CCAAT factor involved in this interaction, we carried out a number of other competitions using labeled tkP oligonucleotide and competing it with other known CCAAT sequence oligonucleotides shown in Table I. We find that the rabbit β-globin CCAAT sequence is an ineffective competitor and that the human α-globin promoter CCAAT sequence is only moderately effective (Fig. 6A). However, the MHC class II Eα gene CCAAT sequence is a very effective competitor. In a further competition experiment (Fig. 6B) we find that the MHC oligonucleotide is only 2- or 3-fold less effective than tkP at competing for this binding activity.

DISCUSSION
Our analysis shows that although a very small promoter region of the tk promoter is needed for minimal activity, elements found several hundred bases upstream contribute significantly to promoter strength. It appears that the majority of promoter strength can be contributed by the presence of a single GC element along with a single CCAAT element. The additional GC and CCAAT elements then contribute only moderate increases in promoter strength. These data do not eliminate the possibility of some as yet undefined promoter elements being present, but they suggest that other elements are not likely to contribute strongly to tk promoter strength.
A previous deletion analysis has also been reported for the human tk promoter by Kreidberg and Kelley (32). In their study, a minigene was constructed to contain the human tk promoter linked to the human thymidine kinase cDNA. 5' deletions were made in the promoter region and the resulting deletion mutants were assayed quantitatively for their ability to transform Ltk- cells to the tk+ phenotype. The results from this previous study were quantitatively very different from ours, and we show a much longer effective promoter region and have identified additional transcription elements. We feel that the chloramphenicol acetyltransferase assays presented here more closely reflect actual promoter strength and that an assay involving numbers of tk transformants may be heavily influenced by the need for merely a threshold level of promoter strength.
There appear to be at least three types of transcriptional elements required for tk transcription. The TATA element is still only identified on the basis of its position and similarity with previous TATA elements (33) and evolutionary conservation (Fig. 2). GC elements (and presumably Sp1 protein binding) are indicated as important both because of their close sequence identity with previously described GC elements (33) and because of the decreases in transcriptional activity upon their deletion (Fig. 1). The CCAAT elements at -70 and -40 (the more unusual position for a CCAAT element) appear to be able to play an important part in promoter activity. Evolutionary considerations (Fig. 2) suggest the ancestral importance of the -40 CCAAT sequence. The tendency to duplicate this sequence at other positions of the promoter in human and chicken, however, suggests that the presence of multiple CCAAT elements may prove advantageous for expression of this gene.
[Figure legend, partial: Each of the other lanes represents the same reaction with either 300- or 600-fold excesses of specific competitors used. The competitor oligonucleotides, the human α-globin, rabbit β-globin, and murine MHC II Eα gene CCAAT elements, are defined in more detail in ...]
We have attempted to define the factor that binds to the human tk CCAAT elements by competition studies using other CCAAT sequences. Although these are complicated by the possibility of promoter elements binding the same factor with very different affinities, these studies suggest a close relationship between the tk CCAAT protein and the previously described NF-Y protein which binds to the MHC class II Eα gene CCAAT box (10) and possibly to a CCAAT box in the rat albumin gene promoter (15). Expression of the Eα gene is tissue-specific (10) but not cell cycle-regulated, and NF-Y has a fairly ubiquitous tissue distribution (36), lending no particular support to the idea that this factor may be involved in cell cycle regulation. The thymidine kinase CCAAT factor shows only moderate affinity for sequences from the HSV tk gene or the human α-globin gene, both of which bind to CTF (10). C/EBP (14) is also not a likely binding candidate because of the low binding we observe to the HSV tk gene, as well as poor binding of isolated CBP protein to the human tk sequence (data not shown). A recent paper by Chodosh et al. (12) provides two new terminologies for CCAAT-binding proteins, CP1 and CP2. The exact relationship of these factors to factors such as C/EBP and NF-Y is not clear.
Expression of the human tk gene is highly regulated in a proliferation-dependent manner (1). Although some of this regulation has been shown to be at the level of transcription under certain conditions (3), the promoter has been shown to have a strong constitutive component (3, 34). Since GC elements have been correlated with constitutive, or "housekeeping," gene expression (35), it is likely that the GC elements in the tk gene are an important part of its constitutive expression. On the other hand, CCAAT elements and their various binding proteins have been increasingly implicated in regulated gene expression. The sea urchin histone H2B CCAAT element can bind a CCAAT-displacing protein that blocks gene expression (11), and a tk CCAAT binding factor has been shown to vary in response to serum stimulation (13). Thus with the identification of a number of independent CCAAT binding proteins (5, 6, 10, 11), there is a strong possibility that the tk CCAAT element could be involved in the observed tk transcriptional regulation (3). However, our data showing only one-third of the tk gene promoter strength associated with the CCAAT elements suggest that this element could only result in a moderate regulation of this promoter.