Conserved Cut Repeats in the Human Cut Homeodomain Protein Function as DNA Binding Domains*

Homeodomain-containing proteins are believed to function as sequence-specific DNA binding proteins that regulate gene expression. Specificity of sequence recognition is conferred by the homeodomain acting either alone or in conjunction with other conserved DNA binding domains, as is the case for Pou domain and Paired domain proteins. The recent isolation of cDNAs encoding mammalian homologues of the Drosophila Cut homeodomain protein has revealed that the 72-amino acid Cut Repeats are conserved in evolution. We have investigated the biochemical activity of human Cut Repeats by expressing fusion proteins containing glutathione S-transferase linked to various combinations of Cut Repeats and Cut homeodomain. We show by gel retardation and DNase footprinting assays that Cut Repeats can function as DNA binding domains, either independently or in cooperation with the homeodomain. The binding affinity (KD) to a specific recognition site was estimated to be 8 × 10⁻⁹ M for Cut Repeat 3 and 4 × 10⁻¹⁰ M for Cut Repeat 1. When both Cut Repeat 3 and the Cut homeodomain were present in the fusion protein, the binding affinity was increased to 4 × 10⁻¹¹ M. These results define a novel class of proteins that contain, in addition to the homeodomain, a second conserved protein domain, the Cut Repeats, that also functions as a DNA binding domain.

The molecular structure of the homeodomain is defined by three α-helices with a flexible amino-terminal arm (5-7). The latter two α-helices form a helix-turn-helix structure analogous to that of many prokaryotic transcription factors; however, the physical interactions of the homeodomain with DNA appear to be distinct from those of prokaryotic transcription factors (6-9).
Helix three, often referred to as the recognition helix, makes several base contacts with the DNA and as a consequence is responsible for much of the binding specificity. The amino-terminal arm makes contact in the adjacent minor groove (5-7).
Homeodomains have been shown to bind to sequences containing a TAAT core, although binding sites with divergent sequences have also been identified (8).
Homeodomain proteins constitute a large family divided into several classes on the basis of amino acid sequence comparison within their homeodomain and also on their association with another conserved DNA binding domain. Examples of this are the POU and the Paired classes of transcription factors, which harbor the POU-specific or the Paired domain, respectively. The POU-specific domain is a 75-82-amino acid domain which is separated from the homeodomain by a short variable linker region (10) (reviewed in Ref. 11). Altogether these three protein segments form what is called the POU domain, originally found in the pituitary-specific Pit-1/GHF1, the Oct-1, and the Oct-2 mammalian transcription factors and the Caenorhabditis elegans cell lineage control gene unc-86 (POU: pit, oct, unc) (10, 12, 13). The Pou homeodomain is the main determinant of binding specificity, but DNA binding requires the participation of both the Pou-specific domain and the homeodomain, since neither of these domains alone can bind to DNA with high affinity (14, 15). The Paired domain is a 128-amino acid domain encoded by the paired box originally identified in the Drosophila segmentation genes paired and gooseberry (16, 17). The paired box was subsequently detected in mammalian Pax genes, as well as in genes from other vertebrates (reviewed in Ref. 18). At least three mouse mutant phenotypes have been linked to molecular defects in one of the Pax genes: undulated (Pax-1), splotch (Pax-3), and small eye (Pax-6) (19-21). Molecular analysis of specific DNA binding by Paired proteins indicates that the paired domain can function either autonomously or through molecular interaction with the homeodomain, in which case the binding specificity is modified (17).
The Drosophila Cut protein and its mammalian counterparts, the human CCAAT-displacement protein and the canine Clox (Cut like homeobox) protein, appear to belong to a unique class of homeodomain proteins (22-24). They are the only homeodomain proteins having a histidine at the ninth amino acid of the third helix. This amino acid has been shown in some proteins to determine the specificity of binding to the two bases following the TAAT core, and distinct classes of homeodomain proteins contain different amino acids at this position (2). The biochemical activities as well as the biological function of Cut in fly or in mammals remain to be defined, but the pattern of expression and the phenotype of mutants in Drosophila suggest that it is involved in cell specification in several tissues (22, 25-27). Thus in lethal cut mutants, embryonic cells normally destined to become external sensory organs instead differentiate into internal sensory (chordotonal) organs, while other cut mutants fail to develop Malpighian tubules or present wing malformations ("cut wing") (26, 28-30). Cut proteins contain three conserved Cut Repeats which were originally identified by sequence analysis in Drosophila Cut. Cut Repeats are 72-amino acid motifs which share from 52 to 63% amino acid identity with each other in Drosophila. The mammalian Cut proteins present a similar structural organization, with three Cut Repeats followed by a homeodomain. Interestingly, Cut Repeat 3 of the fly and Cut Repeat 3 of the human protein are more similar to one another, with 70% amino acid identity, than to any other Cut Repeat within the same protein.
The high degree of conservation of Cut Repeats suggested that they may have an important biological function. We have investigated the biochemical activity of Cut Repeats by expressing in bacteria and purifying fusion proteins containing the glutathione S-transferase linked to various combinations of Cut Repeats and Cut homeodomain. We show that Cut Repeats can function as DNA binding domains in vitro, independently or in cooperation with the homeodomain. These results define a novel class of homeodomain proteins that contain another conserved motif, the Cut Repeats, serving as DNA binding domains.
MATERIALS AND METHODS
Plasmid Construction-Plasmids for expression of the glutathione S-transferase (GST)/Cut fusion proteins were prepared by inserting into the bacterial expression vector pGEX-3X (Pharmacia LKB Biotechnology Inc.) various fragments derived from cDNAs for the human Cut protein. The sequence of the human Cut protein has been published as the human CCAAT-displacement protein (CDP), and the cDNA sequence (HSCDP) can be obtained from GenBank, accession no. M74099 (23). The nucleotide and amino acid numbers used hereafter are taken from this cDNA sequence and its deduced amino acid sequence. Cut Repeat 1: EcoRI (nt 1605)-BamHI (nt 2019) fragment treated with Klenow and inserted into the SmaI site of pGEX-3X (the EcoRI site was added at nt 1605 during cDNA cloning). Cut Repeat 2: CauII (nt 2890)-CauII (nt 3413) fragment into the SmaI site of pGEX-3X. Cut Repeat 3: CauII (nt 3413)-RsaI (nt 3737) fragment inserted by blunt ligation into the EcoRI site of pGEX-3X, after treatment of both the vector and the fragment with Klenow. Cut Repeats 2+3: RsaI (nt 2861)-RsaI (nt 3737) fragment into the EcoRI site of pGEX-3X, after treatment of the vector with Klenow. Cut Repeat 3+Homeodomain: EcoRI (nt 3061)-EcoRI (nt 5374) fragment into the SmaI site of pGEX-3X (note that these two EcoRI sites were added to that particular cDNA clone during cloning and that the stop codon is situated at nt 4559). Cut Homeodomain: BstXI (nt 3625)-ApoI (nt 3963) fragment treated with T4 DNA polymerase and inserted into the SmaI site of pGEX-3X.
Expression and Purification of the Fusion Proteins-Plasmid vectors expressing GST/Cut fusion proteins were introduced into the DH5 strain of Escherichia coli. Induction of expression and purification of GST fusion proteins were done as previously described (31). Glutathione-Sepharose was purchased from Pharmacia (catalog no. 17-1756-01).
DNA Filter Binding Assay-Sheared salmon sperm DNA was end-labeled with [γ-32P]ATP using T4 polynucleotide kinase. Two ng of labeled DNA were incubated with the indicated amount of Cut Repeat fusion protein or bovine serum albumin for 20 min at 25 °C, in 20 mM Tris-HCl, pH 7.5, 10% glycerol, 0.1 mM dithiothreitol, 10 mM KCl, and 0.1 mM phenylmethylsulfonyl fluoride, in a final volume of 50 µl. The solution was then drawn by vacuum through a nitrocellulose filter, which was washed with 2 ml of the binding solution. The counts retained on the filter were measured in an LKB liquid scintillation counter.
The abbreviations used are: GST, glutathione S-transferase; nt, nucleotide; PCR, polymerase chain reaction; EMSA, electrophoretic mobility shift assay.
Polymerase Chain Reaction (PCR)-mediated Random Oligonucleotides Site Selection-Radiolabeled, double-stranded oligonucleotides were used in electrophoretic mobility shift assay (EMSA) with the GST/Cut Repeat 3 fusion protein. The oligonucleotides used contained 22 random nucleotides flanked by 15 nucleotides of defined sequence on either side to permit annealing of the primers for PCR amplification and cleavage by restriction enzymes for cloning. Cut Repeat 3 binding sites were selected by isolating the lower mobility protein-DNA complex separated by polyacrylamide gel electrophoresis, followed by PCR amplification of the isolated DNA for subsequent EMSA. After four cycles of amplification and selection, selected oligonucleotides were digested with EcoRI and BamHI and cloned into the corresponding sites of the Bluescript KS vector (Stratagene). Several clones were obtained and 25 were sequenced. No clear consensus binding site could be derived; however, some of these oligonucleotides, in particular C3S, proved to be good binding sites. We are currently repeating the site selection procedure with the hope that a consensus binding site can be derived after more cycles.
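The layout of the selection templates (22 random nucleotides between two 15-nucleotide defined flanks) can be illustrated with a short Python sketch. The flank sequences below are hypothetical placeholders chosen only so that each carries one of the cloning sites named in the text (EcoRI, BamHI); the actual primer sequences are not given here. The sketch shows only how a template is assembled and how the selected 22-nt core is recovered after sequencing.

```python
import random

# Hypothetical 15-nt flanks (NOT the sequences used in the study).
# Each was written to carry one cloning site mentioned in the text.
LEFT_FLANK = "GCTGAATTCGTCAGC"   # contains an EcoRI site (GAATTC)
RIGHT_FLANK = "GCATGGATCCTGACG"  # contains a BamHI site (GGATCC)
RANDOM_LEN = 22                  # length of the randomized core, as stated in the text


def make_template(rng):
    """Build one template: left flank + 22 random nucleotides + right flank."""
    core = "".join(rng.choice("ACGT") for _ in range(RANDOM_LEN))
    return LEFT_FLANK + core + RIGHT_FLANK


def random_core(template):
    """Recover the randomized core from a sequenced template."""
    return template[len(LEFT_FLANK):len(template) - len(RIGHT_FLANK)]


rng = random.Random(0)  # fixed seed so the illustration is reproducible
library = [make_template(rng) for _ in range(5)]

# The C3S site reported in the text has the same length as the randomized core.
C3S = "AAAAGAAGCTTATCGATACCGT"
```

Each template is 52 nt long (15 + 22 + 15), and the reported C3S sequence is 22 nt, as expected for a core recovered from such a template.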
DNA for Binding Assays-The sequence of C3S oligonucleotides is as follows: 5'-AAAAGAAGCTTATCGATACCGT-3'. This sequence was obtained by PCR-mediated random site selection using a GST/Cut Repeat 3 fusion protein.
EMSA-EMSA were performed with either 10 or 100 ng of fusion protein previously purified on glutathione-Sepharose beads. Samples were incubated at room temperature for 5 min in 25 mM NaCl, 10 mM Tris, pH 7.5, 1 mM MgCl2, 5 mM EDTA, pH 8.0, 5% glycerol, and 1 mM dithiothreitol, in a final volume of 20 µl, with the indicated amount of either poly(dI-dC) as nonspecific competitor or the same, cold, oligonucleotides as specific competitor, when specified. Five µg of bovine serum albumin were included in the reaction mixture. End-labeled double-stranded oligonucleotides (20,000 cpm, ~10 pg) were added, and samples were further incubated for 15 min. Samples were loaded on a 5% polyacrylamide gel (30:1) and separated by electrophoresis at 8 V/cm for 2 h in 0.5× Tris borate-EDTA. Gels were dried and visualized by autoradiography.
DNase Footprinting Assay-The C3S sequence was introduced into the Bluescript KS vector (Stratagene). The recombinant plasmid was 32P end-labeled at the BssHII site with T4 polynucleotide kinase and cleaved with XbaI. After electrophoresis through a 4% polyacrylamide gel, the labeled fragment was purified by passive elution in 10 mM Tris-HCl, pH 7.5, 1 mM EDTA. DNase footprinting was carried out as described elsewhere (32, 33).
Calculation of the Binding Affinity-EMSA were performed essentially as described above, with the following modifications: a fixed, low amount of DNA (≤10 pM) was used with a wide range of protein concentrations, and protein and DNA were incubated for 1 h at 4 °C. The binding affinity (KD) was calculated using the method described by Carey (34, 35). The amount of free and bound DNA was quantitated by scanning of the autoradiograms on a Phosphoimager (Fuji). Scintillation counting of the excised bands in one case gave similar results. The data were plotted as the fraction of free DNA versus log of protein concentration. The protein concentrations did not take into account the fraction of inactive protein, which was estimated in independent experiments to be less than 30% in each case.
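As an illustration of how a half-maximal binding point can be read from such a titration, the following Python sketch interpolates, on a log concentration axis, the protein concentration at which half of the DNA remains free. It is a minimal sketch and not the authors' analysis procedure: the function names are hypothetical, and the titration data are synthetic, generated from a simple 1:1 binding model (free fraction = KD / (KD + [protein]), which assumes the DNA concentration is far below KD).

```python
import math


def kd_from_titration(protein_molar, fraction_free):
    """Estimate the protein concentration at which 50% of the DNA is free.

    Interpolates linearly in log10(protein concentration) between the two
    titration points that bracket a free-DNA fraction of 0.5.
    """
    points = sorted(zip(protein_molar, fraction_free))
    for (p_lo, f_lo), (p_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            t = (f_lo - 0.5) / (f_lo - f_hi)
            log_p = math.log10(p_lo) + t * (math.log10(p_hi) - math.log10(p_lo))
            return 10 ** log_p
    raise ValueError("titration does not cross 50% free DNA")


def simulate_fraction_free(protein_molar, kd):
    """Synthetic data only: 1:1 binding with DNA concentration << KD."""
    return kd / (kd + protein_molar)


# Synthetic half-log titration from 10 pM to 100 nM protein (illustrative values).
concs = [10 ** e for e in (-11, -10.5, -10, -9.5, -9, -8.5, -8, -7.5, -7)]
free = [simulate_fraction_free(c, 8e-9) for c in concs]

kd_est = kd_from_titration(concs, free)
print(f"estimated KD = {kd_est:.2e} M")
```

Because the interpolation uses only the two points that bracket the midpoint, the estimate depends on how finely the protein concentration is sampled around KD; a curve fit over all points would be the more robust choice with real, noisy data.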

RESULTS
A cDNA encoding the human homologue of the Drosophila homeodomain Cut protein was obtained from a cDNA expression library prepared from placenta and screened with radiolabeled oligomerized oligonucleotides containing the ME1a1 sequence (GGAAAAAGAAGGGAGGGGAGGGATCC) from the c-myc promoter (D. Dufort and A. Nepveu, manuscript in preparation). The sequence of this cDNA is identical to that of the recently reported cDNA for the CCAAT-displacement protein (23). To investigate the function of the Cut Repeats, we subcloned various segments of the cDNA into the bacterial vector pGEX-3X expressing GST (Pharmacia). Fusion proteins were expressed in bacteria and purified by affinity chromatography over a glutathione-Sepharose column. Proteins used in this study are schematically represented in Fig. 1A and will be named hereafter by the domain(s) of the human Cut protein that they harbor. The purified fusion proteins were visualized by Coomassie Blue staining following SDS-polyacrylamide gel electrophoresis (Fig. 1B). The Cut Repeat 3+homeodomain preparation showed relatively more proteolytic cleavage products, which is to be expected with a protein of this size (104 kDa).
[Fig. 1 legend, abbreviations: Cut Repeat 1 (C.R.1), Cut Repeat 2 (C.R.2), Cut Repeat 3 (C.R.3), Cut Repeats 2+3 (C.R.2+3), Cut homeodomain (HD), Cut Repeat 3+Homeodomain (C.R.3+HD).]
We reasoned that if the conserved Cut Repeats carried a biochemical function, they would probably do so by interacting with either DNA, RNA, or proteins. To establish whether Cut Repeats could interact with DNA, a DNA filter binding assay was performed (Fig. 1C). Fusion proteins containing either Cut Repeat 1, 2, or 3 significantly retained DNA, whereas a protein containing only glutathione S-transferase (data not shown) did not retain more DNA than the bovine serum albumin control (Fig. 1C). To identify a high affinity binding site for the Cut Repeat 3 fusion protein, we performed the procedure of PCR-mediated random site selection. After four cycles of amplification and selection, selected oligonucleotides were cloned into the Bluescript KS vector (Stratagene) and 25 clones were sequenced. Although some of these oligonucleotides, in particular C3S, proved to be good binding sites, no clear consensus binding site could be derived. We are currently repeating the site selection procedure using larger amounts of different nonspecific competitors and performing more selection cycles to verify whether a consensus binding site can eventually be derived.
To establish whether distinct Cut Repeats would bind to the C3S sequence, we performed EMSA using a panel of Cut fusion proteins and double-stranded oligonucleotides encoding the C3S sequence (Fig. 2). A single retarded complex was observed with fusion proteins containing Cut Repeat 1, Cut Repeat 3, Cut Repeats 2+3, the homeodomain, or Cut Repeat 3+homeodomain. In every case, as little as 10 ng of purified protein was sufficient to obtain a strong signal, indicating that each of these proteins binds to this sequence with relatively high affinity. In contrast, no specific retarded complex was obtained using 10 or 100 ng of the GST/Cut Repeat 2 fusion protein or up to 500 ng of GST alone (Fig. 2, lane 3; data not shown). This result indicates that Cut Repeat 2 does not bind to the C3S sequence. It remains to be determined whether Cut Repeat 2 can interact specifically with other, as yet unidentified, DNA sequences.
To assess the specificity of interactions between Cut Repeats and the C3S sequence, binding reactions were performed in the presence of an excess of either specific or nonspecific competitor (Fig. 3). The retarded complex was reduced in each case in the presence of 10 ng of cold C3S double-stranded oligonucleotides and completely eliminated when 100 ng of C3S oligonucleotides were included in the reaction mix. In contrast, 100 ng of poly(dI-dC) did not affect binding by Cut Repeat 1 and Cut Repeats 2+3, whereas a slight reduction was observed with Cut Repeat 3 alone. These results demonstrate that binding of the Cut Repeats to the C3S oligonucleotides is specific. In contrast, the interaction of the homeodomain with the C3S oligonucleotides was not specific, as the retarded complex was competed with similar amounts of specific or nonspecific competitor (Fig. 3).
Binding of the Cut Repeats to the C3S motif was assessed by DNase footprinting (Fig. 4). Cut Repeat 1 and Cut Repeat 3 protected the C3S sequence (Fig. 4, lanes 2 and 4). Some DNase hypersensitive sites are also visible above the area of protection with Cut Repeat 1. The Cut homeodomain only weakly protected the C3S sequence (Fig. 4, lane 6). Results from the EMSA presented in Fig. 2 indicate that the binding affinity of Cut Repeat 3 was not affected by the presence of Cut Repeat 2 in the protein, whereas binding affinity was clearly increased when the Cut homeodomain was included. We therefore asked whether the presence of either Cut Repeat 2 or the homeodomain in addition to Cut Repeat 3 would alter the protection pattern of the C3S sequence observed with Cut Repeat 3 alone. Interestingly, while the protection pattern remained unchanged when Cut Repeat 2 was present with Cut Repeat 3, several DNase hypersensitive sites appeared above and below the protected area when the homeodomain was present with Cut Repeat 3 (Fig. 4, lane 8).
Altogether these results suggest that the homeodomain may cooperate with Cut Repeat 3 to bind to the C3S sequence.
To estimate the DNA binding affinity of the Cut Repeats to the C3S sequence, EMSA were performed using a fixed amount of DNA (10 pM) and a wide range of protein concentrations (Fig. 5A). Since the DNA concentration was negligible compared to that of the protein, the protein concentration required to bind half the DNA can be taken as an approximation of the dissociation constant, KD(app) (35). Visual examination of the results indicated that approximately 0.25 ng of Cut Repeat 1 (0.3 nM) and 5 ng of Cut Repeat 3 (6.25 nM) are sufficient to bind half of the DNA. A more accurate value for KD was obtained by scanning of the autoradiograms with a Phosphoimager (Fuji). The half-maximal binding point was determined by measuring the decrease in free DNA rather than the increase in bound DNA, since the protein/DNA equilibrium may be perturbed by electrophoresis (35). The data were plotted as the fraction of free DNA versus log of protein concentration (Fig. 5B). Because the protein concentrations were determined by Bradford assay and did not take into account the fraction of inactive protein, the actual quantities of active protein may be slightly lower than those indicated. For Cut Repeat 1, the calculated KD is 4 × 10⁻¹⁰ M, whereas for Cut Repeat 3 it is 8 × 10⁻⁹ M. Previous results (Fig. 2) suggested that Cut Repeat 3 and the Cut homeodomain cooperate to bind to the C3S sequence. We would then expect the binding affinity of the Cut Repeat 3+homeodomain protein to be much greater than that of Cut Repeat 3 alone. Indeed, the calculated KD for the Cut Repeat 3+homeodomain is 4 × 10⁻¹¹ M, corresponding to an affinity almost two orders of magnitude higher than that of Cut Repeat 3 alone (Fig. 5). This KD value is similar to that of other sequence-specific DNA binding proteins such as NFI (KD 1.2 × 10⁻¹¹ M) or CAP (KD 2 × 10⁻¹¹ M) (36, 37).
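The mass-to-molarity conversions quoted for the visual estimates can be checked with a few lines of Python. This is a sketch under two assumptions that are not stated explicitly in the text: the reaction volume is taken to be the 20 µl used for EMSA, and the fusion proteins are taken to have a molecular weight of about 40 kDa, a value back-calculated from the quoted pair "5 ng ... 6.25 nM" rather than reported by the authors.

```python
def ng_to_molar(mass_ng, volume_ul, mol_weight_da):
    """Convert a protein mass in a given reaction volume to mol/l."""
    grams = mass_ng * 1e-9
    liters = volume_ul * 1e-6
    return grams / mol_weight_da / liters


# Assumptions (not given in the source): 20 µl reaction volume and a
# ~40 kDa fusion protein, back-calculated from "5 ng = 6.25 nM".
ASSUMED_VOLUME_UL = 20
ASSUMED_MW_DA = 40_000

cr3_visual = ng_to_molar(5, ASSUMED_VOLUME_UL, ASSUMED_MW_DA)     # Cut Repeat 3
cr1_visual = ng_to_molar(0.25, ASSUMED_VOLUME_UL, ASSUMED_MW_DA)  # Cut Repeat 1

print(f"Cut Repeat 3: {cr3_visual * 1e9:.2f} nM")
print(f"Cut Repeat 1: {cr1_visual * 1e9:.2f} nM")
```

Under these assumptions the conversion gives 6.25 nM for 5 ng of Cut Repeat 3 and about 0.31 nM for 0.25 ng of Cut Repeat 1, in line with the rounded values quoted in the text.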

DISCUSSION
Our experiments reveal that the conserved Cut Repeats in the human Cut protein are DNA binding domains. We have identified a DNA sequence, C3S, which is specifically recognized in gel retardation and DNase footprinting assays by either Cut Repeat 1 or Cut Repeat 3. We further estimated that Cut Repeats 1 and 3 bind with high affinity to the C3S sequence, with KD of approximately 4 × 10⁻¹⁰ M and 8 × 10⁻⁹ M, respectively. The demonstration that the conserved Cut Repeats can bind specifically and with high affinity to DNA defines a novel class of homeodomain proteins containing a second DNA binding domain. In addition to the Paired domain and the Pou domain proteins, the Cut proteins represent a third class of homeodomain proteins with a demonstrated bipartite DNA binding domain.
The conserved Cut Repeat 2 did not bind to the C3S sequence; however, it was found to interact with DNA in a DNA filter binding assay. It is possible that Cut Repeat 2 can also bind specifically to DNA but that we have not yet identified a recognition site for it. This is likely, since the C3S sequence was isolated by the method of PCR-mediated random site selection using a fusion protein containing only Cut Repeat 3. We are currently repeating the same procedure with Cut Repeat 2 to assess whether this protein segment can bind with high affinity to specific sites.
Our results also indicate that the Cut homeodomain and Cut Repeat 3 can cooperate to bind to the C3S sequence. Cut Repeat 3 was shown to bind autonomously and with high affinity to the C3S sequence. However, with a fusion protein containing the homeodomain in addition to Cut Repeat 3, the binding affinity for the C3S sequence was increased by nearly two orders of magnitude in comparison with Cut Repeat 3 alone. We conclude that Cut Repeat 3 can bind to DNA either autonomously or in conjunction with the homeodomain. This mode of interaction with DNA is in some ways analogous to that of the Paired domain transcription factors. Paired domains can bind DNA either autonomously or in cooperation with the Paired homeodomain (2, 17). In contrast, specific and high affinity DNA binding by Pou proteins requires the participation of both the Pou-specific domain and the Pou homeodomain (14, 15). If Cut Repeats can function autonomously in the cell, we would expect, as for the Paired domain in PAX-1, to find proteins containing a Cut Repeat without a homeodomain (38). We are currently performing low stringency hybridization experiments to determine whether other loci contain sequences related to the Cut Repeats.
Interestingly, although the binding affinity to the C3S sequence was augmented when the homeodomain was present in addition to Cut Repeat 3, the extent of the protected sequence in the DNase footprinting assay was not increased. This result suggests that the homeodomain may contact DNA in the same region as Cut Repeat 3. Further studies will be necessary to determine the bases that make contact with the homeodomain and/or Cut Repeat 3.
The finding that the human Cut protein contains at least three, and possibly four, DNA binding domains raises interesting questions about how the Cut protein recognizes its target site(s) in the cell. The Cut Repeats and homeodomain may each have slightly different binding specificities such that they would individually bind with highest affinity to different recognition sequences. In addition, as was found for the Cut Repeat 3+homeodomain, Cut DNA binding domains may achieve higher binding affinity to certain sites when acting in conjunction with one another. Future work should determine whether each Cut Repeat can cooperate with either the Cut homeodomain or another Cut Repeat in binding DNA. The possibility that different DNA binding domains within the same protein may bind to DNA either alone or in cooperation with one another would considerably increase the repertoire of sequences to which the Cut protein can bind. In support of this hypothesis, it is striking that so far three cDNAs encoding mammalian homologues of the Drosophila Cut protein have been isolated using three different DNA sequences in the protein purification procedure or in the cDNA plaque screening: the FP sequence (AAGAAAAGGAAACCGA'M'GC) for the human CCAAT displacement protein, the βe2 sub-element (GATCTGTGAGCTGTGGAATGTAAGGGAGATC) for the canine Clox protein, and the ME1a1 sequence (GGAAAAAGAAGGGAGGGGAGGGATCC) for our human Cut cDNA clone (23, 24; D. Dufort and A. Nepveu, manuscript in preparation). In addition, we showed in the present study that the human Cut protein can bind to the C3S sequence (AAAAGAAGCTTATCGATACCGT). These DNA sequences bear little resemblance to each other except for the AAAAGAA motif present in FP, ME1a1, and C3S. This motif is not, however, present in the βe2 sub-element. It therefore appears that the Cut protein can recognize different binding sites, and we assume this ability is conferred by the presence of several DNA binding domains within the protein.
Future work should reveal the binding specificity of each Cut Repeat and Cut homeodomain, acting either alone or in conjunction with one another. Finally, the identification of cellular targets for the Cut protein should help understand how the protein binds to DNA in vivo.