The rabbit uteroglobin gene. Structure and interaction with the progesterone receptor.

The study of the regulation of uteroglobin gene in the rabbit endometrium constitutes a model for analyzing the mechanism of action of progesterone in mammals. The gene has been cloned into lambda phage and sequenced. Comparison of the sequence of the gene with the amino acid sequence of preuteroglobin and the three-dimensional structure of uteroglobin established by crystal x-ray diffraction showed that the 3 exons correspond to different functional domains of the protein and that at least one of the splice junctions does not map at the surface of the protein. S1 mapping allowed us to define the RNA polymerase initiation site. No difference was observed when analyzing premessengers from the endometrium, where the gene is controlled by progesterone and estradiol, and from lung where the gene is constitutively expressed and not controlled by these hormones. In addition, S1 mapping revealed the existence of several minor transcription initiation sites. In the 5' flanking region between positions -33 and -24 there is the sequence AATACAAAAA which may correspond to a Goldberg-Hogness box. Two other A- and T-rich sequences were found further upstream from the gene, one of these preceding by about 30 nucleotides a minor start of transcription. No obvious feature, possibly related to steroid regulation, was observed in the nucleotide sequence. A fragment of the gene containing the "promoter" region (from nucleotide +10 to nucleotide -394) was preferentially retained on nitrocellulose filters after incubation with purified rabbit uterine receptor. A competitive binding assay was used to compare the affinity for the receptor of various DNA fragments. Labeled "promoter" region DNA was incubated with receptor and various concentrations of nonlabeled competing DNA, and the nitrocellulose-bound radioactivity was measured. This method showed the existence of several high affinity binding sites in the 5' part of the gene and in adjacent regions. 
However, no high affinity binding sites were observed in the 3' part of the gene. Also, within the "promoter" region there were at least two high affinity binding sites for the receptor.

* This work was supported by the Institut National de la Santé et de la Recherche Médicale (P.R.C. 135050), the U.E.R. Kremlin-Bicêtre and the Fondation pour la Recherche Médicale Française. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ To whom correspondence should be addressed.
Morphological changes in the rabbit endometrium are the basis of several classical tests used to characterize and assay steroids having progestational activity (1,2). These changes consist mainly of the development of glands and the enhancement of the endometrial secretion. It has been shown that the major protein secreted into the uterine lumen is uteroglobin (3), also called blastokinin (4). Progesterone induction of uteroglobin is mainly due to increased levels of uteroglobin messenger RNA (5, 6). Recently it has been shown that progesterone increases transcription of the gene (7). The cDNA (8) and genomic fragments (9-11) coding for uteroglobin have been cloned. The rabbit progesterone receptor has been purified and polyclonal (12) and monoclonal antibodies obtained. It has thus become possible to use this system to study the interaction of a purified mammalian progesterone receptor with a cloned gene which it controls. A prerequisite to such studies is knowledge of the exact structure of the uteroglobin gene and of the adjacent regions. We report here the sequence of the gene (with the exception of the central part of the largest intron). The 3' and 5' flanking regions, with a special emphasis on the latter, have also been sequenced. The site of initiation of transcription has been determined by S1 mapping (13) with RNAs prepared from endometrium and lung, i.e. two tissues where the uteroglobin gene is expressed but where it is under completely different hormonal control (14). We also show that the rabbit progesterone receptor preferentially recognizes several regions in the 5' part of the gene and in the 5' flanking region. During the completion of these studies the intron-exon composition of the uteroglobin gene was reported by Menne et al. (11).

MATERIALS AND METHODS
Clones-The isolation of plasmids containing uteroglobin cDNA and of phages λUGL1 and λUGLS containing rabbit genomic fragments encompassing the uteroglobin gene has previously been described (8,9). DNA from phage λUGL1 was digested with EcoRI and the 7 fragments subcloned in plasmid pBR325. The BamHI fragment starting at nucleotide -394 and finishing at nucleotide +10 was subcloned in plasmid pBR322. The experiments were performed in compliance with the French National guidelines for containment.
DNA Sequence Determination-The method of Maxam and Gilbert (15) was used. Restriction fragments were end labeled using T4 polynucleotide kinase (P-L Biochemicals) and [γ-32P]ATP, after dephosphorylation with calf intestine alkaline phosphatase (Boehringer Mannheim). Fragments labeled at one end were generated by redigestion with an appropriate restriction enzyme or by separation of strands. In all, 96.3% of the nucleotides were determined from at least two restriction sites. In most cases both strands were analyzed and all restriction sites were sequenced through.
S1 Mapping-The protocol of Weaver and Weissmann (13) was used. Total poly(A+) RNA (nuclear and cytoplasmic) was prepared (5) from endometria and lung of 5-day pregnant rabbits. A DdeI fragment extending from nucleotide -355 to nucleotide +158 was end labeled with T4 polynucleotide kinase. It was cleaved with AvaI and the labeled fragment extending from nucleotide +158 to nucleotide -193 was purified by polyacrylamide gel electrophoresis. This fragment was eluted from the gel and aliquots of 50,000 cpm were hybridized with 25 µg of poly(A+) RNA for 16 h at 55 °C as described (13). Nuclease S1 digestion (100 units) was performed for 30 min at 45 °C and the nuclease-resistant fragments were separated by electrophoresis on 6% polyacrylamide gels (13).
Receptor Purification-Purification was performed as previously described (12) except that the preparation was chromatographed on a 0.7-ml DNA-cellulose column. After washing the column, the receptor was eluted in 0.8 ml of 5 mM sodium phosphate, 0.5 M NaCl, pH 8.3, buffer. Specific activity of the receptor preparation averaged 5000 pmol of bound hormone/mg of protein.
Receptor Binding to DNA-This binding was studied as described for the Escherichia coli cAMP binding protein attachment to specific regions of the lac promoter DNA (16). Incubation was for 90 min at 0 °C. Aliquots (50 µl) were filtered onto nitrocellulose filters (B.A. 85, Schleicher & Schüll, Dassel, Germany). After washing with 100 µl of the same Tris buffer containing no bovine serum albumin, the radioactivity bound to the filters was determined.

RESULTS
Sequence of the Uteroglobin Gene and of the Flanking Regions, Exons and Introns-Comparison with the cDNA sequence (17) shows that the gene contains 3 exons. The first one is 103 nucleotides long (for start of the gene see S1 mapping experiments), containing 48 noncoding nucleotides and 55 nucleotides corresponding to 18 of the 21 amino acids of the signal peptide. The second exon contains 188 nucleotides (coding for the last 3 amino acids of the signal peptide and up to threonine 60). The last exon is 174 nucleotides long, 30 coding for amino acids from glutamic acid 61 to methionine 70 and 144 nucleotides representing the 3' nontranslated end of the messenger RNA. The first intron is about 2270 nucleotides long. Its central part has not been sequenced. The second intron is 332 nucleotides long. Both introns follow the general rule (18) and start with GT and finish with AG. More extensive consensus sequences have been proposed (19). The donor consensus sequence (A/C)AG/GT(A/G)AGT matches in 8 of 9 nucleotides the corresponding sequence in exon 1/intron 1 (CTG/GTGAGT) and in only 6 of 9 nucleotides in exon 2/intron 2 (ACG/GTAACC). The acceptor consensus sequence (T/C)nN(C/T)AG/G is closely followed by the corresponding regions of both intron-exon borders. At the 3' end of the gene is found the sequence AATAAA preceding by 19 nucleotides the site of addition of poly(A). The intron-exon composition of the uteroglobin gene observed in these experiments is identical with that reported by Menne et al. (11).
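The donor-site comparison above can be reproduced with a short position-by-position count. The following is a minimal sketch, assuming that the donor consensus has the form (A/C)AG/GT(A/G)AGT; the names DONOR_CONSENSUS and count_matches are illustrative and do not come from the original work.

```python
# Assumed donor consensus (A/C)AG/GT(A/G)AGT, written as the set of
# bases accepted at each of the 9 positions (3 exonic, 6 intronic).
DONOR_CONSENSUS = ["AC", "A", "G", "G", "T", "AG", "A", "G", "T"]


def count_matches(site, consensus=DONOR_CONSENSUS):
    """Count positions of a donor site that agree with the consensus.

    The site is given as in the text, with '/' marking the
    exon/intron boundary; the slash is ignored for the comparison.
    """
    bases = site.replace("/", "").upper()
    if len(bases) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in allowed for base, allowed in zip(bases, consensus))


# Donor sites quoted in the text.
donor_sites = {
    "exon 1/intron 1": "CTG/GTGAGT",
    "exon 2/intron 2": "ACG/GTAACC",
}

for name, site in donor_sites.items():
    print(f"{name}: {count_matches(site)} of {len(DONOR_CONSENSUS)} positions match")
```

Under this assumed consensus the counts agree with the 8 of 9 and 6 of 9 matches stated for the two donor sites.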
Regions Flanking the Gene-We have sequenced 597 nucleotides upstream from the transcription start since this region may be important in the hormonal control of the gene.

[Sequence listing: nucleotide sequence of the uteroglobin gene and flanking regions.]

The sequence AATACAAAAA, which might be a somewhat unusual form of a Goldberg-Hogness box (20), is found between positions -33 and -24. At positions -85 to -81 there is the sequence CAAGT, which might be the equivalent of a CAT box (20).
Two other sequences resembling the Goldberg-Hogness consensus sequence are found further upstream from the gene, TAAATAAT (-98 to -91) and AATATTTA (-132 to -125). S1 mapping experiments showed that at least the first of these sequences may be used as a minor transcription start site.
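As a consistency check on the promoter elements listed above, the length of each quoted sequence can be compared with the span of its stated coordinates, and its A+T content computed. This is only an illustrative sketch; the names MOTIFS, span and at_fraction are hypothetical, and the span calculation is valid only because all four elements lie upstream of position +1.

```python
# Sequences and coordinates as quoted in the text (all upstream of +1).
MOTIFS = {
    "AATACAAAAA": (-33, -24),   # putative Goldberg-Hogness box
    "CAAGT": (-85, -81),        # putative CAT box
    "TAAATAAT": (-98, -91),     # upstream TATA-like sequence
    "AATATTTA": (-132, -125),   # upstream TATA-like sequence
}


def span(start, end):
    """Number of nucleotides between two positions, inclusive.

    Valid only when both positions lie on the same side of +1.
    """
    return abs(end - start) + 1


def at_fraction(seq):
    """Fraction of A and T residues in a sequence."""
    return sum(base in "AT" for base in seq.upper()) / len(seq)


for seq, (start, end) in MOTIFS.items():
    consistent = span(start, end) == len(seq)
    print(f"{seq:<11} {start:>5} to {end:>5}  length ok: {consistent}  "
          f"A+T: {at_fraction(seq):.2f}")
```

All four coordinate ranges agree with the lengths of the quoted sequences, and the two upstream elements consist entirely of A and T.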
The sequence of the uteroglobin gene has been studied in a recent publication (11). Differences from this sequence are shown in Table I.

S1 Mapping of the Start of Transcription-Defining the nucleotide where transcription starts was especially interesting since steroid receptors may regulate the transcription of genes. It is thus important to localize precisely the sequences recognized by the receptors in relation to the start of transcription. Moreover, the uteroglobin gene is expressed in the endometrium and in the lung. Since the hormonal control of the uteroglobin gene is entirely different in the two tissues (14), it is of interest to determine if transcription of the gene starts at different places or at the same place in both tissues.
The probe which was used was labeled at the DdeI site at nucleotide +158 (inside intron 1) and extended in the 5' direction up to the AvaI site at nucleotide -193. The labeled region could thus be protected only by the premessengers. As shown in Fig. 1, a major and broad band of radioactivity was observed around position -4. A study of 20 genes has shown that transcription usually initiates at an A which is preceded by a pyrimidine, actually a C in most cases (21). It is thus probable that transcription starts at the A labeled +1 in Table I. RNA molecules which are capped at the 5' end can often protect an additional 1-5 nucleotides of the DNA probe from digestion by S1 nuclease (22). This position of the start of transcription differs by a single nucleotide from the beginning of the messenger as determined by cDNA sequencing (17). Some minor larger bands were also observed on the gel. They did not disappear with further S1 digestion (data not shown). It is thus possible that they correspond to minor transcription starts. One of the bands lies at nucleotide -71. An A is found at position -69 which again corresponds to the proposed consensus sequence (21). This start of transcription is preceded by a Goldberg-Hogness-type sequence (TAAATAAT) about 30 nucleotides upstream (positions -98 to -91). A minor smaller band was also observed at position +81; it may correspond to an allelic variation of the messenger sequence. Finally it was observed that the major start of transcription was identical in endometrium and lung.
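The relation between a band on the S1 gel and a position in the gene follows from the probe geometry: the protected fragment runs from the labeled end at +158 back to the 5' end of the RNA. The sketch below computes the expected fragment lengths for the positions mentioned above. It assumes the usual numbering convention in which there is no position 0 (consistent with the 404-base pair fragment spanning +10 to -394); the function name protected_length is illustrative.

```python
LABEL_SITE = 158   # labeled 5' end of the probe (DdeI site at +158)
PROBE_END = -193   # unlabeled end of the probe (AvaI site at -193)


def protected_length(start):
    """Length of labeled probe protected by an RNA whose 5' end is at `start`.

    Assumes gene numbering without a position 0, so that -1 is
    immediately followed by +1.
    """
    if start == 0 or not PROBE_END <= start <= LABEL_SITE:
        raise ValueError("start must lie within the probe and cannot be 0")
    if start > 0:
        return LABEL_SITE - start + 1
    # Upstream starts: |start| nucleotides before +1, plus +1 to +158.
    return LABEL_SITE - start


positions = [
    ("proposed major start", 1),
    ("major band", -4),
    ("A of minor start", -69),
    ("minor band", -71),
    ("minor smaller band", 81),
    ("undigested probe", PROBE_END),
]

for label, start in positions:
    print(f"{label:<22} {start:>5}  ->  {protected_length(start)} nucleotides")
```

The difference of a few nucleotides between the observed band and the proposed start is within the 1-5 nucleotides of additional protection attributed to the cap.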
Progesterone Receptor Interaction with the Uteroglobin Gene-We hypothesized that the receptor could specifically interact within the 5' control region adjacent to the uteroglobin gene. Thus the BamHI fragment extending from position +10 to -394 was isolated, 5' end labeled, and aliquots incubated with various concentrations of receptor. After filtration and washing, the radioactivity bound to nitrocellulose filters was determined. The binding of the receptor to this gene fragment was compared to that of a similar fragment of nonspecific DNA (404 base pairs long HpaII fragment of pBR322). As shown in Fig. 2 there was a preferential retention of the BamHI fragment when compared to the nonspecific control segment of DNA.
The specificity in terms of receptor binding was studied in two types of experiments. To eliminate a preferential binding of the BamHI fragment independent of the nature of the interacting protein, an experiment was performed in which nonspecific proteins (hepatic cytosol) were used in place of receptor. As shown in Fig. 3 there was no preferential binding to the BamHI fragment. Moreover, it is known that receptor binding to DNA is specifically inhibited by pyridoxal phosphate (23). At very low concentrations (0.4 mM) pyridoxal phosphate almost completely prevented the retention on nitrocellulose filters of "promoter" DNA after incubation with the receptor preparation (Fig. 4).
To analyze the distribution of high affinity binding sites for the progesterone receptor in the region of the uteroglobin gene we used a competitive binding assay. Unlabeled gene fragments were added to receptor and 32P-labeled BamHI fragment of 404 base pairs which contains the promoter. The ability of these fragments to displace bound [32P]DNA was determined. The uteroglobin gene was cut into 3 fragments by EcoRI: fragment a of ~5000 base pairs contained about 3200 base pairs 5' from the gene, the first exon, and ~1700 base pairs of the first intron; fragment b contained ~490 base pairs of the first intron; and fragment c (1064 base pairs) contained the 3' part of the gene (the end of the first intron, all of the second intron, and the second and third exons) and 236 base pairs downstream from the 3' terminus of the gene. As may be seen in Fig. 5A only fragment a exhibited a high affinity for the receptor; however, the affinity was somewhat lower than that of the 404-base pair BamHI "promoter" fragment (d). Nonspecific DNA (e = calf thymus) was a very poor inhibitor. The same experiment was performed with the same result after S1 digestion of DNA fragments (not shown). Thus single stranded regions generated by restriction enzyme digestion were not responsible for differences in the affinity for the receptor of various DNA fragments.

Fig. 5 legend (partial): ... is BamHI-digested calf thymus DNA. Experimental conditions are as in Fig. 2. B, a, b, and c are fragments of ~2800 base pairs, 404 base pairs, and ~1800 base pairs, respectively, obtained by BamHI digestion of the 5-kilobase EcoRI fragment.

Fig. 6 legend (partial): Labeled 404-base pair BamHI "promoter" fragment was incubated with progesterone receptor (5 µg/ml) in presence of various concentrations of unlabeled competing DNA. Bound radioactivity was measured. The DNA fragments which were used are shown at the bottom of the figure (B, the first exon). a, fragment extending from nucleotide -394 to nucleotide -251; b, fragment extending from nucleotide -250 to nucleotide -194; c, fragment extending from nucleotide -193 to nucleotide +10; d, fragment extending from nucleotide -394 to nucleotide +10.
To further analyze the distribution of the high affinity sites, the 5000-base pair EcoRI fragment was digested with BamHI yielding 3 DNA fragments. The 404-base pair (+10 to -394) "promoter" fragment had the highest affinity. The ~2800-base pair fragment (upstream from position -394) and the ~1800-base pair segment (downstream from position +10) also exhibited some affinity for the receptor (Fig. 5B).
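One common way to rank competitors in an assay of this kind is by the concentration of unlabeled DNA that reduces filter-bound radioactivity to 50% of the control value. The sketch below estimates that concentration by log-linear interpolation between the two bracketing measurements. It is not the analysis used in this work, and the numbers are invented purely for illustration; they are not data from the figures. The function name ic50_by_interpolation and the variable names are hypothetical.

```python
import math


def ic50_by_interpolation(concentrations, bound_fractions):
    """Competitor concentration giving 50% of control binding.

    Interpolates linearly in log10(concentration) between the two
    consecutive points that bracket 0.5. Returns None when 50%
    displacement is never reached (a poor competitor).
    """
    points = list(zip(concentrations, bound_fractions))
    for (c_lo, b_lo), (c_hi, b_hi) in zip(points, points[1:]):
        if b_lo >= 0.5 >= b_hi and b_lo != b_hi:
            t = (b_lo - 0.5) / (b_lo - b_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Invented illustrative values: competitor concentration (arbitrary
# units) and bound radioactivity as a fraction of the control.
concs = [0.1, 1.0, 10.0, 100.0]
high_affinity = [0.95, 0.70, 0.30, 0.05]   # hypothetical specific fragment
nonspecific = [1.00, 0.98, 0.95, 0.90]     # hypothetical control DNA

print("high-affinity competitor:", ic50_by_interpolation(concs, high_affinity))
print("nonspecific competitor:  ", ic50_by_interpolation(concs, nonspecific))
```

A fragment with a lower 50% displacement concentration competes more effectively for the receptor, which is the sense in which the fragments are ranked by affinity in the text.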
We also determined if the 404-base pair BamHI "promoter" fragment contained one or several regions responsible for high affinity binding of receptor. After digestion with AvaI 3 fragments were obtained: one of 144 base pairs (-394 to -251), one of 57 base pairs (-250 to -194), and one of 203 base pairs (-193 to +10). As shown in Fig. 6 the 144- and 203-base pair fragments both had a high affinity for the receptor (comparable to that of the 404-base pair BamHI fragment), whereas the central 57-base pair fragment had a markedly lower affinity. To confirm this result the 404-base pair BamHI fragment was digested with HinfI yielding two fragments of 181 base pairs and 223 base pairs, respectively. Both were labeled and incubated with various concentrations of receptor. The two fragments bound the receptor with similar affinities (not shown).
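The fragment sizes quoted for the AvaI digest can be checked against their coordinates. The following sketch maps gene coordinates onto a continuous axis, assuming the conventional numbering without a position 0, and verifies that the three fragments are contiguous and add up to the 404-base pair parent fragment. The helper names to_index and fragment_length are illustrative.

```python
def to_index(position):
    """Map a gene coordinate (no position 0) onto a continuous integer axis."""
    if position == 0:
        raise ValueError("gene numbering has no position 0")
    return position if position < 0 else position - 1


def fragment_length(start, end):
    """Length in base pairs of a fragment spanning start..end inclusive."""
    return to_index(end) - to_index(start) + 1


# AvaI fragments of the BamHI "promoter" fragment, as given in the text.
ava_fragments = [(-394, -251), (-250, -194), (-193, 10)]
lengths = [fragment_length(start, end) for start, end in ava_fragments]

# Each fragment should begin one base pair after the previous one ends.
contiguous = all(
    to_index(next_start) == to_index(prev_end) + 1
    for (_, prev_end), (next_start, _) in zip(ava_fragments, ava_fragments[1:])
)

print("fragment lengths:", lengths)
print("sum:", sum(lengths), "parent:", fragment_length(-394, 10))
print("contiguous:", contiguous)
```

The HinfI fragments of 181 and 223 base pairs likewise add up to 404 base pairs.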

DISCUSSION
The uteroglobin gene contains 2 introns. The first one begins at the end of the region coding for uteroglobin signal peptide. Similar situations have been observed for many proteins (24,25). The second intron interrupts the coding sequence at nucleotides corresponding to threonine 60. The three-dimensional structure of uteroglobin has been established by x-ray diffraction of crystals at a resolution of 2.2 Å (26). The protein is made of two identical subunits which together encompass the hydrophobic pocket constituting the steroid binding site and are held together by noncovalent bonds and 2 disulfide bridges. The amino acids preceding threonine 60 constitute the major part of the protein and are sufficient to encompass the steroid binding site, while the amino acids following threonine 60 are mainly involved in the tight binding of both subunits. As in many other proteins, the introns separate functional domains of the protein. However, this intron-exon boundary does not follow the rule proposed by Craik et al. (27) where splice junctions map at protein surfaces. Threonine 60 is one of the few hydrophilic residues which are buried inside the uteroglobin molecule (26).
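The position of each intron relative to the reading frame can be checked from the coding lengths of the exons reported in Results (55, 188, and 30 nucleotides). The sketch below assumes that translation runs without interruption across the three exons and that the stop codon is not counted in the 30 coding nucleotides of the last exon; the names used are illustrative.

```python
SIGNAL_PEPTIDE_CODONS = 21   # signal peptide of preuteroglobin
MATURE_CODONS = 70           # mature uteroglobin, ending at methionine 70

# Coding nucleotides contributed by each exon, as reported in Results.
CODING_NT = {"exon 1": 55, "exon 2": 188, "exon 3": 30}


def intron_positions(coding_nt):
    """For each exon followed by an intron, return (exon name,
    complete codons read so far, nucleotides already read of the next codon)."""
    positions = []
    total = 0
    for name in list(coding_nt)[:-1]:
        total += coding_nt[name]
        codons, phase = divmod(total, 3)
        positions.append((name, codons, phase))
    return positions


for name, codons, phase in intron_positions(CODING_NT):
    mature = codons - SIGNAL_PEPTIDE_CODONS
    print(f"after {name}: {codons} complete codons, "
          f"{phase} nt into the next codon (mature residue count: {mature})")

total_codons = sum(CODING_NT.values()) // 3
print("total codons:", total_codons,
      "expected:", SIGNAL_PEPTIDE_CODONS + MATURE_CODONS)
```

Under these assumptions the first intron falls after the first nucleotide of codon 19 of the signal peptide, and the second falls exactly between the codons for threonine 60 and glutamic acid 61.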
The TATA consensus sequence preceding by about 30 nucleotides the start of transcription is known to be an important feature of the recognition of the eukaryotic genes by RNA polymerase II (20). In the uteroglobin gene the sequence AATACAAAAA is found between positions -33 and -24. This sequence is somewhat unusual since the transversion from A-T to C-G in both eukaryotic and prokaryotic Goldberg-Hogness or Pribnow boxes has been considered as a mutation decreasing the efficiency of promoters (28,29).
Two other A- and T-rich sequences resembling more closely the usual type of TATA box are found further upstream from the gene. However, only one of them may be used in the recognition of a minor transcription initiation site. These observations illustrate the difficulties in defining exactly the DNA sequences recognized by RNA polymerase II. S1 mapping experiments showed that the nucleotides at position +1 are identical in endometrial and pulmonary uteroglobin premessengers. Thus the major site of transcription initiation of a unique gene (9), which can be regulated by completely different hormonal mechanisms in two different tissues, is identical in the two tissues. Studies of DNA modification or chromatin structure might shed some light on the differential nature of uteroglobin gene regulation. However, minor sites of initiation have been reported for other genes (30). Their significance remains unclear.
During the completion of this study a paper was published (11) describing a sequence analysis of the uteroglobin gene. The major structural features which are described in this work are analogous to our observations but there are many minor differences (see Table I).
The method which has been used to study progesterone receptor interaction with the uteroglobin gene was similar to that used for the study of the binding of the E. coli catabolite activator (cAMP binding) protein to the lac operon (16). Similar concentrations of protein were necessary to observe filter retention of DNA, suggesting a similar affinity for "promoter" DNA. The existence of several high affinity binding sites for receptors resembles the results obtained with other systems (egg white proteins (31,32) and mouse mammary tumor virus (33)). On the other hand Govindan et al. (34) observed a single interaction between glucocorticoid receptors and mouse mammary tumor virus DNA.
Our preparation of receptor was about 50% homogeneous and was thus of similar purity to that used in some of the studies on mouse mammary tumor virus (33). However, we cannot completely eliminate the possibility that the DNA binding we observe was due to some contaminating protein.
The inhibitory effect of pyridoxal phosphate (23) is, however, evidence in favor of the DNA binding being due to receptor. Further evidence in favor of this interpretation is the fact that when using receptor preparations of lower specific activity (i.e. less purified) we observed a diminished preferential binding to specific DNA. Since purification yielded steroid-receptor complexes and not free receptor it has been impossible to study the effect of the hormone on receptor binding.
The high affinity DNA binding sites observed in this study may be related to the biological activity of steroid-receptor complexes but further experiments will be necessary to establish this point. However, if this hypothesis is true then the existence of several binding sites for receptor in steroid-regulated genes suggests that receptors act by changing the structural or functional properties of the chromatin on a large portion of the gene.