Characterization of the Gene Encoding a Folate-binding Protein Expressed in Human Placenta
IDENTIFICATION OF PROMOTER ACTIVITY IN A G-RICH SP1 SITE LINKED WITH THE TANDEMLY REPEATED GGAAG MOTIF FOR THE ets ENCODED GA-BINDING PROTEIN

The gene encoding a folate-binding protein (FBP) expressed in human placenta has been cloned by screening a genomic library with the KB cell FBP complementary DNA. This gene, contained in a 10-kilobase EcoRI fragment of the genomic clone, has 5 exons, 4 introns, the AATAA polyadenylation signal in the 3'-untranslated region, and a 5'-flanking sequence containing the promoter elements, all of which span approximately 5 kilobases. Transcription initiation was mapped by RNase protection to a site 73 base pairs downstream from a G-rich sequence linked to a tandemly repeated GGAAG sequence, the motif bound by the ets oncogene-encoded GA-binding protein (GABP) transcription factor. Gel-shift and supershift mobility assays indicate that the G-rich sequence and the ets motif bind specifically to SP1 and GABP, respectively. These cis regulatory elements in tandem drive expression of the chloramphenicol acetyltransferase reporter gene in transiently transfected mouse 3T3 cells. Because this gene lacks an appropriately located TATA box promoter, the location of these elements upstream of transcription initiation indicates that this SP1-GA binding region most probably regulates expression of this placental FBP. The gene encoding this placental FBP has been designated the FBP/PL-1 gene because it is

(m-FBP) from a hydrophilic soluble form of this protein (2-5). This m-FBP is anchored to the external surface of the plasma membrane of cultured human KB (6) and CaCo-2 cells (7) by a glycosylphosphatidylinositol (GPI) tail, from which the protein can be released in soluble form by digestion with phosphatidylinositol-specific phospholipase C.
Subcellular location can also distinguish the m-FBP on the plasma membrane from a nonhydrophobic FBP that has been identified in the cytoplasm of human leukemia cells (8) and normal granulocytes (9). Finally, these FBP(s) can also be distinguished by monospecific antisera raised to the different forms of the protein (10).
Screening of Genomic Library-The λCharon 4A human genomic library was obtained from ATCC, Rockville, MD. This library was constructed by Maniatis et al. (17) from human fetal liver genomic DNA partially digested with AluI/HaeIII and, after EcoRI linker ligation and EcoRI digestion, was cloned into the λCharon 4A vector. The library was screened with an FBP cDNA isolated from a cDNA library prepared from KB cell poly(A)+ RNA (11). Hybridizing clones were identified and plaque-purified using standard procedures (18). For some initial screening of the library, the final high-stringency wash (0.1× SSC, 1% SDS at 65 °C) was omitted. DNA was isolated using both the plate lysis and the liquid culture methods (18).
Plasmid Cloning-The selected clones were digested separately with EcoRI and BamHI, and the fragments (ranging from 3 to 15 kb) were isolated by agarose gel electrophoresis and ligated into EcoRI-digested, phosphatase-treated pUC18 or BamHI-digested, phosphatase-treated pUC18, respectively. To subclone the product of the polymerase chain reaction (see below), the amplified DNA fragment was blunt-ended using the Klenow fragment of DNA polymerase I, phosphorylated with T4 polynucleotide kinase, and isolated by agarose gel electrophoresis; the DNA was purified from the gel using the Geneclean kit from BIO 101, Inc. (San Diego, CA). The fragment was then ligated into SmaI-digested, phosphatase-treated pUC18. Plasmid DNA was prepared by alkaline-SDS lysis followed by precipitation with polyethylene glycol 8000 as described by Sambrook et al. (18).
DNA Sequencing-DNA was sequenced directly from isolated recombinant clones by the dideoxy chain termination method of Sanger et al. (19), using the Sequenase kit from U. S. Biochemical Corp. The universal and reverse primers were used to obtain the sequence of each strand beginning at the linker site, and these sequences were then extended using 18-24-mer oligonucleotides synthesized to the unambiguous sequence determined in the preceding run and to the published sequence of the placental FBP cDNA (13). Both strands of introns 3 and 4 and of the 5'-region of the gene were also sequenced. DNA sequences were analyzed using PC/GENE software supplied by IntelliGenetics, Mountain View, CA.
Polymerase Chain Reaction (PCR) and Reverse Transcription-PCR-PCR was used to determine the size of the introns in the cloned fragments of the gene and to amplify the 5'-region of the gene from normal human genomic DNA. PCR was also used to amplify fragments of putative upstream regulatory elements for insertion into pCAT vector constructs to test for promoter activity.
The 5'-region of the gene was amplified from genomic DNA using the same PCR conditions described in the preceding paragraph, with the same 5'(-246)→3'(-222) sense amplimer but with a 5'(-58)→3'(-35) antisense amplimer located just upstream from the first intron. The resulting PCR fragments were processed and cloned as described under Plasmid Cloning (see above).
Mapping of the Transcription Start Sites by RNase Protection-The antisense RNA probes for this assay encompassed (i) the putative TATA box and a G-rich sequence selected by computer analysis as a GC box [5'(-434)→3'(-118)] and (ii) a 191-nucleotide sequence extending from 5'(-216) to 3'(-25). These fragments were prepared by PCR using 24-mer sense and antisense primers to the termini of these regions and then subcloned into the transcription vector pGEM-3Z, which contains the SP6 and T7 RNA polymerase promoters. After the plasmid containing the TATA box-G-rich fragment was linearized with EcoRI and the plasmid containing the 191-bp fragment with BamHI, the antisense and sense strands were labeled with [32P]UTP by in vitro transcription using the SP6 and T7 polymerases, respectively, with the kit from Promega (Madison, WI). The RPA kit from Ambion (Austin, TX) was used for the RNase protection assay with 58 µg of placental RNA and, as the negative control, 58 µg of tRNA. The actin antisense RNA transcript provided by the manufacturer was used with liver RNA as the positive control. The protected fragments were separated by electrophoresis on a 6% urea-polyacrylamide gel for 90 min, and the gel was dried and exposed to Kodak XAR-5 film.
Identification of Transcriptional Regulatory Regions of the Gene-Three regions of the gene extending upstream from the ATG start codon of the first coding exon were analyzed for promoter activity (see legend of Fig. 4). These regions were prepared either by PCR amplification, using the genomic clone as the template and 24-mer sense and antisense primers to the termini of each sequence, or by digestion of the genomic clone with a restriction enzyme and elution of the specific fragment after agarose gel electrophoresis.
Each fragment was blunt-ended using DNA polymerase I (Klenow fragment), purified by gel electrophoresis, and subcloned into the XbaI linker site upstream of the chloramphenicol acetyltransferase (CAT) reporter gene in the pCAT Basic plasmid (Promega). An aliquot (3 µl) of the ligation mixture was used to transform competent E. coli DH5α (Life Technologies Inc.). Several transformed colonies from each plate were selected by hybridization to the corresponding primer. Plasmid DNA was prepared, and the orientation of each insert was established by DNA sequencing.

For transient expression of CAT activity, the transfection protocol described by Sambrook et al. (18) was used to co-transfect NIH 3T3 fibroblasts with the pCAT constructs and, as an internal control to monitor transfection efficiency, the plasmid containing the β-galactosidase gene. Briefly, 10 µg of each plasmid DNA in 450 µl of H2O was mixed with 50 µl of 2.5 M CaCl2. This preparation was slowly mixed with 500 µl of 2× HeBS solution (280 mM NaCl, 10 mM KCl, 1.5 mM Na2HPO4·2H2O, 12 mM dextrose, and 50 mM HEPES, pH 7.05) and incubated for 30 min at room temperature. The DNA-calcium phosphate precipitate was suspended in 10 ml of Dulbecco's modified Eagle's medium containing 10% fetal calf serum and layered over 3T3 fibroblasts grown to 20-40% confluency in a 100-mm Petri dish. The cells were incubated for 6 h at 37 °C in 5% CO2, after which the medium was removed and the cells were shocked by exposure to 15% glycerol in 1× HeBS for 2 min. After one wash with 1× HeBS, the cells were incubated for 24 h in Dulbecco's modified Eagle's medium containing 10% fetal calf serum. The cell layer was then washed with Hanks' balanced salt solution, and the monolayer was trypsinized (0.01%) for 10 min at 37 °C. The cells were harvested in 10 ml of phosphate-buffered saline, pelleted by centrifugation at 2000 rpm for 10 min at 4 °C, and suspended in 0.25 M Tris-HCl, pH 7.8.
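The calcium phosphate step described above dilutes the CaCl2 stock twice before the precipitate is added to the cells. As a worked check, here is a minimal Python sketch that applies C1V1 = C2V2 at each step; the helper name `dilute` is hypothetical, and the sketch assumes that volumes are simply additive.

```python
def dilute(conc: float, vol_added_ul: float, final_vol_ul: float) -> float:
    """Concentration after bringing vol_added_ul of a solution to final_vol_ul (C1V1 = C2V2)."""
    return conc * vol_added_ul / final_vol_ul

# Step 1: 50 ul of 2.5 M CaCl2 added to the DNA in 450 ul of water (500 ul total).
cacl2_step1_m = dilute(2.5, 50, 50 + 450)

# Step 2: the 500 ul DNA/CaCl2 solution is mixed 1:1 with 500 ul of 2x HeBS.
cacl2_final_m = dilute(cacl2_step1_m, 500, 500 + 500)

# Components of the 2x HeBS stock (mM); the 1:1 mix brings them to 1x.
hebs_2x_mm = {"NaCl": 280.0, "KCl": 10.0, "Na2HPO4": 1.5, "dextrose": 12.0, "HEPES": 50.0}
hebs_final_mm = {name: dilute(c, 500, 1000) for name, c in hebs_2x_mm.items()}

print(f"CaCl2 after step 1: {cacl2_step1_m * 1000:.0f} mM")
print(f"CaCl2 in the final 1 ml mix: {cacl2_final_m * 1000:.0f} mM")
print("HeBS components in the final mix (mM):", hebs_final_mm)
```

With these volumes, the precipitate forms in about 125 mM CaCl2 and 1× HeBS (140 mM NaCl, 0.75 mM phosphate).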
After three cycles of freeze-thawing, the cytoplasmic supernatant obtained by centrifugation was divided into two fractions. The lysate to be assayed for CAT activity was incubated for 10 min at 65 °C to inactivate endogenous deacetylases. The milky suspension was centrifuged at 4 °C for 5 min, and the clear supernatant was assayed for CAT activity using [14C]chloramphenicol as the substrate (20). Fifty µl of this cell extract was mixed with 80 µl of assay mixture (0.647 M Tris-HCl, pH 7.8, 70 µg of acetyl-coenzyme A, and 0.25 µCi of [14C]chloramphenicol) and incubated at 37 °C for 1 h. The reaction was terminated by adding 1 ml of ethyl acetate followed by 10 s of mixing, repeated three times. After centrifugation in a microcentrifuge for 5 min, 900 µl of the organic phase was evaporated to dryness. The dried residue was dissolved in 20 µl of ethyl acetate, spotted on a silica thin-layer chromatography plate (Sigma), and developed with chloroform/methanol (95:5) for 2 h; the plate was then exposed to Kodak XAR-5 film for 16 h. The residual [14C]chloramphenicol and the mono- and diacetylated derivatives were extracted from the silica underlying the respective autoradiographic spots, and their radioactivity was determined in a liquid scintillation counter.
The other fraction of each 3T3 cell lysate was assayed for β-galactosidase activity using the system obtained from Promega (Madison, WI). A 30-µl aliquot of each lysate was diluted to 150 µl with H2O and mixed with 150 µl of 2× assay buffer (120 mM Na2HPO4, 80 mM NaH2PO4, 2 mM MgCl2, 100 mM β-mercaptoethanol, and 1.33 mg/ml o-nitrophenyl-β-D-galactopyranoside). The samples were incubated at 37 °C for 30 min, the reactions were terminated by the addition of 500 µl of 1 M Na2CO3, and the absorbance at 420 nm was determined. The protein concentration of the cytosol fractions was assayed using the BCA protein assay reagent supplied by Pierce Chemical Co., and β-galactosidase activity was computed as absorbance/µg of protein. The radioactivity (dpm/µg protein) comprising the sum of the mono- and diacetylated chloramphenicol (CAT activity) was normalized to 0.1 A420 unit of β-galactosidase activity for that sample, and the ratio of the CAT activity of each construct to that of the pCAT Basic vector control was taken as the fold stimulation.
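The normalization described above can be written out explicitly. The Python sketch below is one reading of the text, assuming that normalizing to 0.1 absorbance unit of β-galactosidase activity means scaling each sample's CAT value (dpm/µg protein) by 0.1 divided by that sample's β-galactosidase activity (A420/µg protein). The `Sample` fields and all numbers are hypothetical placeholders, not data from this study.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """Hypothetical measurements for one transfection (field names are illustrative)."""
    mono_dpm: float         # dpm recovered from the monoacetylated chloramphenicol spots
    di_dpm: float           # dpm recovered from the diacetylated chloramphenicol spot
    cat_protein_ug: float   # protein in the extract used for the CAT assay
    a420: float             # absorbance of the beta-galactosidase reaction
    bgal_protein_ug: float  # protein in the aliquot used for the beta-galactosidase assay

def normalized_cat(s: Sample, bgal_units: float = 0.1) -> float:
    """CAT activity (dpm/ug) scaled to a fixed beta-galactosidase activity (A420/ug)."""
    cat_per_ug = (s.mono_dpm + s.di_dpm) / s.cat_protein_ug
    bgal_per_ug = s.a420 / s.bgal_protein_ug
    return cat_per_ug * bgal_units / bgal_per_ug

def fold_stimulation(construct: Sample, basic: Sample) -> float:
    """Normalized CAT activity of a promoter construct relative to pCAT Basic."""
    return normalized_cat(construct) / normalized_cat(basic)

# Made-up example values: equal transfection efficiency, 8x more acetylation.
construct = Sample(mono_dpm=3000, di_dpm=1000, cat_protein_ug=100, a420=0.5, bgal_protein_ug=100)
basic = Sample(mono_dpm=400, di_dpm=100, cat_protein_ug=100, a420=0.5, bgal_protein_ug=100)
print(f"fold stimulation: {fold_stimulation(construct, basic):.1f}")
```

Because both samples are scaled to the same β-galactosidase activity, differences in transfection efficiency cancel out of the final ratio.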
Analysis of DNA-protein Binding by Gel-shift Mobility Assay-The sense oligonucleotide sequence 5'-(-233)GAAGAGGGTGGGGTCTGGAAGGGAAGGAAGAGAGGAAGGAGAATAGC-3'(-186) and its complementary antisense sequence were prepared for the gel-shift mobility assays. This sequence contains the G-rich region (5'-GGGTGGGG-3'), a computer-selected putative SP1 binding motif, linked to the tandemly duplicated GGAAG sequence, the consensus motif for binding a number of related ets oncogene-encoded nuclear proteins (16). A second set of sense and antisense oligomers was prepared to the sequence 5'(-220)→3'(-186), so that this probe contains only the tandemly repeated GGAAG motif. These oligomers were gel-purified, and the antisense strand was end-labeled with [γ-32P]ATP and T4 polynucleotide kinase and then annealed with the unlabeled sense strand present in 10-fold excess. A third set of oligomers [5'-TCTGCTAGGCTAGCTAGAGAGCTAGGAGAAT-3'] was prepared that contains base substitutions in the region 5'(-220)→3'(-190), mutating four GA dinucleotides to CT dinucleotides in the putative GABP binding motif. This double-stranded oligomer was used in competition studies in the gel-shift mobility assay to determine whether the duplicated GGAAG pentamer is the essential motif for binding the GABP transcription factor(s).
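Because the mutant competitor is defined by substitutions relative to the wild-type sequence, it is straightforward to check which GGAAG pentamers it disrupts. This Python sketch assumes the wild-type sense sequence from -220 to -190 is read from the sense oligonucleotide (TCTGGAAGGGAAGGAAGAGAGGAAGGAGAAT), that the mutant is 5'-TCTGCTAGGCTAGCTAGAGAGCTAGGAGAAT-3', and that bases are numbered consecutively from -220. The function names are illustrative.

```python
# Sense-strand sequences from -220 to -190 (assumed consecutive numbering).
WILD_TYPE = "TCTGGAAGGGAAGGAAGAGAGGAAGGAGAAT"
MUTANT = "TCTGCTAGGCTAGCTAGAGAGCTAGGAGAAT"
START = -220  # coordinate of the first base in both strings

def motif_positions(seq: str, motif: str, start: int = START) -> list[int]:
    """Coordinates of every (possibly overlapping) occurrence of motif in seq."""
    k = len(motif)
    return [start + i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]

def ga_to_ct_sites(wild: str, mutant: str, start: int = START) -> list[int]:
    """Coordinates where a wild-type GA dinucleotide was replaced by CT."""
    if len(wild) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    return [start + i for i in range(len(wild) - 1)
            if wild[i:i + 2] == "GA" and mutant[i:i + 2] == "CT"]

print("GGAAG in wild type:", motif_positions(WILD_TYPE, "GGAAG"))
print("GGAAG in mutant:   ", motif_positions(MUTANT, "GGAAG"))
print("GA->CT sites:      ", ga_to_ct_sites(WILD_TYPE, MUTANT))
```

In this reading, the four (overlapping) GGAAG pentamers begin at -217, -212, -208, and -200. Each GA→CT substitution falls inside one of them, so the mutant retains no intact GGAAG.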
The GABP binding reaction² was carried out in the same buffer with the addition of 10 ng of the GABPα subunit, with or without 10 ng of the GABPβ1 subunit, and 1 ng of the 32P-labeled double-stranded oligonucleotide probe. These reactions were incubated for 30 min at 4 °C.
The specificity of SP1 binding to the probe was determined by competition with 25 ng of an unlabeled SP1 double-stranded oligonucleotide (5'-ATTCGATCGGGGCGGGGCGAC-3'). Aliquots of each reaction were electrophoresed in a 5% polyacrylamide gel in 0.25× TBE buffer for 4 h at 4 °C. The gel was then incubated for 20 min in 5% glycerol, dried at 80 °C for 1 h, and exposed to Cronex film.
Supershift assays with rabbit polyclonal antiserum to human SP1 (Santa Cruz Biotechnology Corp.) and to the GA-binding proteins² were also used to determine whether a mobility shift observed with the nuclear extract was due to binding of the radiolabeled probe to these specific transcription factors. The respective 32P-labeled probes were preincubated with nuclear extract (14 µg of protein) for 60 min at 4 °C; 1.5 µl of the specific antiserum was then added, and the incubation was continued for 16 h. The reaction mixture was then electrophoresed in a 4% polyacrylamide gel and subjected to autoradiography as described above.

RESULTS
Identification and Isolation of the Gene Encoding a Placental FBP-A total of 29 clones in the λCharon 4A genomic library hybridized to the KB cell FBP cDNA. The clones were characterized by separate digestions of their DNA with EcoRI, to determine the size of the insert in each clone, and with BamHI, to identify independent isolates by differences in the fragments generated. Southern blotting with the KB cell FBP cDNA as the probe was used to identify the homologous gene fragments. The EcoRI digest of 4 clones generated seven fragments, of which only one, a 10-kb fragment, hybridized to the KB cell FBP cDNA. This 10-kb fragment was subcloned into pUC18 for sequencing.
Nucleotide Sequence and Organization of this Gene-Fig. 1 shows the nucleotide sequence of the gene, with the coding exons aligned with the placental FBP cDNA (13), and the deduced amino acid sequence. The gene has five exons (129, 174, 189, 136, and 483 bp in length) interrupted by four introns (~1200, ~2200, 117, and 158 bp in length) and spans at least 5 kb. The exons of this gene have greater than 95% homology with the placental FBP cDNA reported by Ratnam et al. (13), including a Kozak consensus sequence flanking the putative translation initiation site (22). There are, however, a number of important differences. First, the 3'-terminal nucleotide of exon 1 (-25, shaded) is a G instead of the C contained in the placental FBP cDNA, and this provides an AG/gc sequence that may be an intron donor splice site. Second, exon 4 also differs from the placental FBP cDNA, containing C instead of A at nucleotide 357 (shaded). Third, 144 nucleotides of the 5'-untranslated region of the gene beginning at nucleotide -118 are completely divergent from the corresponding region of the placental FBP cDNA (shaded region). The 3'-untranslated region of this gene contains the polyadenylation signal sequence, which begins 164 nucleotides after the stop codon and 16 nucleotides upstream from the beginning of the poly(A) sequence reported for the cDNA (13).
Except for this divergence of the 5'-region of the gene from the corresponding untranslated region of the placental FBP cDNA, the evidence that this gene encodes the placental FBP is compelling. Our first consideration was that the divergence might be a cloning artifact arising during preparation of either the cDNA or the genomic library. To resolve this question, we used several PCR and reverse transcriptase-PCR strategies to amplify this region. For the first strategy, we prepared an antisense primer common to the placental FBP gene and the placental FBP cDNA sequences, and a sense primer corresponding to the unique 5'-divergent region of the gene. For the second strategy, we used the same antisense primer but with a sense primer corresponding to the 5'-region of the cDNA that diverges from this gene. The first strategy amplified a 213-bp fragment, which was subcloned into pUC18; its sequence was identical to the divergent region of the cloned gene shown in Fig. 1. With the second strategy, no identifiable fragment was obtained using the sense primer corresponding to the unique sequence of this region of the placental FBP cDNA.
For the third strategy, we used reverse transcriptase-PCR to generate first-strand cDNA from placental RNA with an antisense primer extending from 5'( [...] (Fig. 2) that was predicted from the location of the amplimers, indicating that this "divergent" region is present in the transcribed RNA. No product was obtained in a control reaction from which reverse transcriptase was omitted. Southern blot analysis of the amplified product using an internal 32P-labeled oligomer [5'(-67)→3'(-91)] confirmed the identity of this fragment (data not shown). As an additional control, reverse transcriptase-PCR with placental RNA was also carried out using the primers of the second strategy (i.e. the sense primer specific for the cDNA), and no fragment was amplified. Computer analysis (PC/GENE, IntelliGenetics) of the 5'-untranslated region of this gene was used to identify putative cap sequences that could contain a transcription start site(s), a TATA box, and other promoter elements. These regions are indicated in Fig. 1. One putative cap sequence, GCAATTCT [5'(-347)→3'(-340)] (boxed), appropriately located 33 bp downstream from a selected TATA box with the sequence 5'-ATAAAATCC-3' (overlined), was identified. A G-rich sequence (overlined) within the region that diverges from the reported 5'-untranslated region of the cDNA (13) was identified as containing a putative GC box. However, this sequence, GAGGGTGGGGTCT [5'(-229)→3'(-218)], is unusual for a GC box because it lacks the classical hexanucleotide GGGCGG motif (23).
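The observation that GAGGGTGGGGTCT lacks an exact GC box can be checked with a simple motif scan. The Python sketch below is only an illustration, not the PC/GENE algorithm used in the study. It slides a 6-nt window along the sequence and reports windows within a chosen number of mismatches of GGGCGG on either strand. The function names are illustrative.

```python
CONSENSUS = "GGGCGG"  # classical SP1 binding hexamer (GC box)

def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def gc_box_hits(seq: str, max_mismatches: int = 1) -> list[tuple[int, str, str, int]]:
    """Return (offset, strand, window, mismatches) for near matches to the consensus."""
    k = len(CONSENSUS)
    targets = {"+": CONSENSUS, "-": revcomp(CONSENSUS)}
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, target in targets.items():
            d = hamming(window, target)
            if d <= max_mismatches:
                hits.append((i, strand, window, d))
    return hits

G_RICH = "GAGGGTGGGGTCT"  # putative GC box as given in the text
print("exact matches:", gc_box_hits(G_RICH, 0))
print("one mismatch: ", gc_box_hits(G_RICH, 1))
```

With zero mismatches the scan finds nothing. Allowing one mismatch finds GGGTGG at offset 2, the hexamer in which T replaces the C of GGGCGG.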
Mapping the Transcription Initiation Site-The RNase protection assay using a 32P-labeled 316-nucleotide probe, which extends from 5'(-434)→3'(-118) and spans the putative TATA box and the G-rich sequence, identified a number of fragments from 20 to 57 nucleotides in size, but a 38-nucleotide fragment appeared to predominate in autoradiographic intensity (data not shown). Even after a 96-h exposure, no fragment appeared that would place a putative cap site 31-35 bp downstream from the computer-selected TATA sequence indicated in Fig. 1.
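The arithmetic that converts a protected-fragment length into a start site is short enough to write out. This Python sketch makes three assumptions. First, each antisense probe is described by the sense-strand coordinates given in the text. Second, the protected duplex runs from the transcription start site to the downstream (3') end of the gene-derived part of the probe. Third, off-by-one differences in the numbering convention and any vector-derived probe sequence can be ignored.

```python
def start_site(probe_3prime_end: int, protected_nt: int) -> int:
    """Estimate the transcript 5' end from a protected fragment length.

    probe_3prime_end is the sense-strand coordinate at the downstream end of
    the region covered by the antisense probe.
    """
    return probe_3prime_end - protected_nt

def expected_protection(probe_3prime_end: int, tss: int) -> int:
    """Fragment length a probe should protect for a given start site."""
    return probe_3prime_end - tss

tss_316 = start_site(-118, 38)                      # 316-nt probe, 5'(-434)->3'(-118)
predicted_191 = expected_protection(-25, tss_316)   # 191-nt probe, 5'(-216)->3'(-25)
tss_191 = start_site(-25, 128)                      # ~128-nt fragment from the 191-nt probe
print(tss_316, predicted_191, tss_191)
```

The 38-nt fragment places the start near -156. The 191-nt probe, which extends 93 nt farther downstream, should then protect about 131 nt, and a fragment of about 128 nt gives a start near -153, so the two probes agree within a few nucleotides. A start near -156 also lies 73 nt downstream of the 5' end of the G-rich sequence at -229.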
To confirm that this 38-nucleotide protected fragment correctly locates the transcription start site of this gene, the RNase protection assay was repeated using a shorter probe of 191 nucleotides [5'(-216)→3'(-25)], which extends 93 nucleotides farther at the 3'-end than the probe used above. Fig. 3 shows a protected fragment of approximately 128 nucleotides that is not observed in the lane containing the tRNA control. This places transcription initiation at the same site obtained with the 316-nucleotide probe. This result was confirmed by replicate analyses.

Gel-shift Mobility Assay to Identify DNA-binding Proteins-A mobility gel-shift was obtained (Fig. 5A) with the 32P-labeled oligomer 5'(-233)→3'(-186), containing the G-rich sequence and the tandemly duplicated GGAAG motif, and purified SP1 protein (lane 2). This shift was partially competed by the unlabeled G-rich-GGAAG oligomer (lane 3) and completely competed by the SP1-specific oligomer (lane 4), indicating that this G-rich sequence contains a GC box even though it lacks the GGGCGG hexanucleotide that is the classical SP1 binding motif (23).

Identification of Putative Transcriptional Promoter Elements
When the same radiolabeled probe was incubated with the nuclear extract (Fig. 5B), four distinct retarded bands were observed (indicated by the arrowhead, the open and closed triangles, and the open arrowhead in lanes 4, 6, and 8). Two of these retarded bands (arrowhead and open triangle) were competed out by the SP1-specific oligonucleotide (lanes 5, 7, and 8), indicating that this G-rich motif binds the SP1 protein contained in the nuclear extracts. A supershift assay using an antiserum to the human SP1 transcription factor that cross-reacts with mouse SP1 (Fig. 5C, lane 4) provides additional evidence that the component of the nuclear extract binding the probe is the SP1 protein.
The mobility shift assay was also used to establish whether the tandemly linked, duplicated GGAAG ets motif in the 5'-flanking region binds specifically to the GA-binding protein (GABP) (both the GABPα and GABPβ1 subunits) and whether a nuclear extract prepared from 3T3 cell nuclei contains similar DNA-binding proteins. For this experiment a 35-bp fragment was prepared that included only the GGAAG sequences [5'(-220)→3'(-186)]. The autoradiograph in Fig. 6A shows two retarded bands (arrows 1 and 2, lane 2) after incubation with the GABPα subunit, and these shifted bands were competed by the unlabeled oligonucleotide containing only the GGAAG sequences (lane 3).
When the probe was incubated with both the GABPα and GABPβ1 subunits, the retardation pattern of the probe changed. A more slowly migrating band appeared (Fig. 6A, band 3, lane 4) while band 2 persisted, but band 1, observed in lane 2, was now missing. Competition with the unlabeled GGAAG-containing oligonucleotide markedly reduced the intensity of bands 2 and 3 (lane 5). A similar retardation of two complexes was observed by Thompson et al. (24) in the gel-shift assay using recombinant GABPα and a DNA substrate derived from the enhancer of an immediate early gene of herpes simplex virus, and the GABPβ1 subunit reacted with GABPα in both complexes.
The same probe (Fig. 6A) incubated with the 3T3 nuclear extract shows the mobility shift of a fragment (lanes 7 and 9, arrow 4) that is competed out by the GGAAG-containing oligonucleotide (lanes 8 and 10). Fig. 6B is a longer exposure of the gel and shows a doublet (lanes 7 and 9, arrow 5), seen as lighter bands in Fig. 6A, that is also competed out (lanes 8 and 10) by the GGAAG-containing oligomer. We do not know whether this doublet is due to an additional factor in this crude nuclear extract interacting with the putative primary GABP transcription factor or whether it is an artifact (although it was observed in replicate assays), but the competition studies show that the GGAAG repeat sequences compete out both components of this doublet.
A mobility shift of the longer probe containing both the G-rich and GGAAG motifs [5'(-233)→3'(-186)] with the nuclear extract is seen in Fig. 6C and shows that only the lower band (arrow) is competed out by the unlabeled oligomer containing only the tandemly linked GGAAG pentamers. The upper band (arrowhead), which is not competed out by this oligomer, corresponds to the band in Fig. 5B that represents the mobility shift due to the SP1 protein in the nuclear extract binding to the G-rich sequence. With this longer probe we did not identify the doublet shown in Fig. 6B (band 5), which the competition studies indicate is specific for the GGAAG motif. Fig. 6D shows the super- [...] electrophoresis because the shift of the same probe with the control non-immune serum (lane 2) is more intense. Fig. 6E shows the mobility gel-shift of the shorter 32P-labeled GGAAG oligonucleotide (i.e. lacking the G-rich sequence) obtained with the nuclear extract (lane 2), and this is again competed out by the unlabeled GGAAG oligomer (lane 3). This mobility shift, however, was not competed out by the oligonucleotide in which four GA dinucleotides of the GGAAG pentamers were mutated to CT dinucleotides (lane 4).
It is of interest that the purified GABPα/GABPβ1 complex and the 3T3 nuclear extract differ in the retardation of the GGAAG-containing oligonucleotide. This could result from an intrinsic property of the ets-encoded GABP-related proteins expressed in 3T3 cells that differs from the purified GABPα/GABPβ1 subunits, or it could be secondary to some alteration of the GA-binding protein(s) during preparation of the nuclear extract. However, the competition observed with the unlabeled GGAAG-containing oligonucleotide, together with the failure of the mutant oligomer to compete, establishes the specificity of the interaction of a component of the nuclear extract with this ets binding motif.
The gel-shift mobility assay (Fig. 7) was also used to establish the validity of the computer selection of the putative TATA box as the binding site for the TATA-binding protein (TFIID), even though this sequence was not located in the usual site 25-30 bp upstream from the transcription start site (23). [...] following incubation with the TFIID protein, and this shift has been blocked by the competing unlabeled TATA sequence (lanes 3-5). This TATA box [...]

For this assay a 35-bp 32P-labeled probe, 5'(-220)→3'(-186), was prepared. A: lane 1, probe alone; lane 2, probe plus 10 ng of GABPα; lane 3, same as lane 2 plus 50 ng of unlabeled oligonucleotide; lane 4, same as lane 2 plus 10 ng of GABPβ1 subunit; lane 5, same as lane 4 plus 50 ng of the unlabeled oligonucleotide. For lanes 6-10, the same 32P-labeled oligonucleotide probe was incubated with the 3T3 cell nuclear extract. Lane 6, 32P-labeled probe plus 7 µg of nuclear extract protein; lane 7, same as lane 6 plus 2 µg of poly(dI-dC), and 3.5 µg of nuclear extract protein; lane 8, same as lane 7 plus 50 ng of unlabeled oligonucleotide; lane 9, same as lane 6 plus 2 µg of poly(dI-dC); lane 10, same as lane 9 plus 50 ng of unlabeled oligonucleotide. B, same as A but a longer exposure of the gel. C, mobility shift of the longer probe [5'(-233)→3'(-186)] with the 3T3 cell nuclear extract. Lane 1, the 32P-labeled probe and 3.5 µg of nuclear extract protein; lane 2, same as lane 1 plus 25 ng of the GGAAG-containing oligonucleotide; lane 3, same probe plus 7 µg of nuclear extract protein; lane 4, same as lane 3 plus 25 ng of the GGAAG-containing oligomer. D, supershift of the shorter probe [5'(-220)→3'(-186)] incubated with the nuclear extract and the antisera to the GABP proteins. Lane 1, 32P-labeled probe alone; lane 2, probe plus 0.8 µg of nuclear extract protein; lane 3, same as lane 2 plus 1.5 µl of normal rabbit serum; lanes 4 and 5, same as lane 2 plus 1.5 µl of antiserum to GABPα and GABPβ1, respectively. E, gel-shift mobility assay of the same probe as in B with the competing GGAAG-containing oligomer and the oligomer in which the GA dinucleotides are mutated to CT dinucleotides. Lane 1, probe alone; lane 2, probe plus 14 µg of nuclear extract protein; lane 3, same as lane 2 plus 6.5 ng of GGAAG-containing oligomer; lane 4, same as lane 2 plus 6.5 ng of the GA→CT mutated oligomer.

[...] placenta, and its structural organization, shown in Fig. 8, is identical to the organization of the gene encoding the FBP in KB cells as well as its related pseudogene (15). Exon 2 is followed by a 2.2-kb intron, which is similar to the 2.5-kb intron of the FBP/KB gene. Moreover, the introns interrupt the coding regions of all these genes at the same codon sites. However, introns 3 and 4 of the FBP/PL-1 gene share less than 65% homology with the FBP/KB gene and the FBP/KB pseudogene. Table I summarizes the intron-exon junctions of the FBP/PL-1 gene. Except for the donor splice site of intron 1, the exon/intron junctions are in good agreement with the consensus sequence at both the donor and acceptor splice sites (25). Introns 2 and 3 are type 0, whereas intron 4 is type 1, with the intron splice site after the first G of the glycine codon.
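Intron "type" here corresponds to what is often called intron phase, the position of the intron within the reading frame. Type 0 introns fall between codons, and type 1 introns fall after the first base of a codon. This minimal Python sketch assumes that coding nucleotides upstream of the intron are counted from the A of the ATG. The coding sequence is a hypothetical toy, not FBP sequence, and the function names are illustrative.

```python
def intron_phase(cds_nt_before_intron: int) -> int:
    """0: intron between codons; 1: after the 1st base of a codon; 2: after the 2nd."""
    return cds_nt_before_intron % 3

def split_codon(cds: str, cds_nt_before_intron: int) -> tuple[str, str]:
    """Parts of the interrupted codon on either side of the intron ('' if phase 0)."""
    phase = intron_phase(cds_nt_before_intron)
    if phase == 0:
        return ("", "")  # the intron falls between two intact codons
    codon_start = cds_nt_before_intron - phase
    codon = cds[codon_start:codon_start + 3]
    return (codon[:phase], codon[phase:])

# Toy CDS: ATG GCT GGC AAA TAA. An intron after 7 coding nt splits the Gly codon GGC.
toy_cds = "ATGGCTGGCAAATAA"
print(intron_phase(7), split_codon(toy_cds, 7))
print(intron_phase(6), split_codon(toy_cds, 6))
```

In the toy example, an intron after 7 coding nucleotides is phase 1 and splits the glycine codon as G|GC, the same arrangement described for intron 4. An intron after 6 nucleotides is phase 0.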
The alignment of the amino acid sequences deduced from the FBP/PL-1 gene and corresponding cDNA (13), and from the FBP/KB gene (15) and corresponding cDNA (11, 12), is shown in Fig. 9. The sequences deduced from exons 3 and 4 have the highest identity (77%), compared with those deduced from exon 2 (62%) and exon 5 (73%). A number of clusters of positively charged amino acids are common to the KB cell FBP (11, 12), the placental FBP (13), the human milk FBP (26), and the bovine milk FBP (27). These include [...]; such charged clusters may be the site of ligand binding, as suggested by Svendsen et al. (27).
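Percent identity values such as those quoted above depend on how gapped alignment columns are counted. The Python sketch below uses one common convention, stated here as an assumption: identical residues divided by all alignment columns except those gapped in both sequences. The peptides are hypothetical toy strings, not FBP sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues as a percentage of alignment columns.

    Both strings must come from the same pairwise alignment (equal length).
    Columns gapped in both sequences are skipped; any other gapped column
    counts toward the denominator but never as an identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)

# Hypothetical aligned peptides: 8 identities over 10 columns.
print(round(percent_identity("MAQRMT-TQL", "MAQRLTGTQL"), 1))
```

Other conventions, such as dividing by the length of the shorter sequence or excluding all gapped columns, give different percentages for the same alignment. Identities compared across exons are therefore only meaningful when they are computed the same way.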

DISCUSSION
Two FBPs are expressed in human placenta. One is identical to the FBP in KB cells, and the cDNA encoding this FBP has been cloned from libraries prepared from KB cell (11) and placental RNA (12). Ratnam et al. (13) purified a second FBP from human placenta and cloned the cDNA encoding this protein, which proved to be distinct from the cDNA encoding the GPI-FBP. Page et al. (14) have recently reported the cloning of a similar cDNA and the corresponding gene, but without details of the structural and functional organization of the 5'-flanking promoter region.
The proximal upstream region of this FBP/PL-1 gene is of considerable interest for several reasons. First, an SP1 binding region contained within a 13-nucleotide G-rich sequence, approximately 78 bp upstream from transcription initiation, is not a classical GC box in which the hexamer GGGCGG is the SP1 binding motif (23). However, the gel-shift mobility assay using purified SP1 protein and a competing SP1 binding oligonucleotide, and the supershift assay with anti-SP1 antiserum, have established that this sequence, with T substituting for the C in the hexamer, is a motif that binds the SP1 transcription factor. Second, this G-rich sequence is fused to a tandemly duplicated GGAAG pentamer, the binding motif for a number of related ets oncogene-encoded nuclear binding proteins that are believed to be trans-acting transcription factors (16). The gel-shift mobility assay using purified GABPα and GABPβ1, and the specificity established by competition with the unlabeled oligomer containing this tandemly duplicated GGAAG pentamer, is evidence that this motif is the sequence to which the ets-encoded proteins bind. A similar gel shift observed with the nuclear extract from 3T3 cells, which was competed out by the unlabeled oligonucleotide containing these GGAAG sequences but not by an oligomer in which the GA dinucleotides were mutated to CT dinucleotides, provides additional evidence that these pentameric motifs are the binding site(s) for the ets-related proteins. A similar organization of SP1 binding sequences with downstream tandemly duplicated GGAAG ets motifs has recently been identified by Carter et al. (28) as a basal transcription promoter element in the nuclear-encoded cytochrome oxidase subunit IV (COIV) gene. More recently, Virbasius et al. (29) have purified nuclear respiratory factor 2 from HeLa cells and sequenced it; this factor is involved in the transcriptional regulation of the proximal promoter of the COIV gene.
Nuclear respiratory factor 2 is a multimeric protein composed of 5 subunits with amino acid homology to the GABPα, GABPβ1, and GABPβ2 subunits (24). Virbasius et al. (29) have also proposed that nuclear respiratory factor 2 is the human homolog of mouse GABP and is, therefore, involved in regulating the expression of cellular genes. The similar organization of the SP1-GGAAG tandem repeats in the proximal promoter region of the FBPPL-1 gene, and the activity of this region in driving expression of the CAT reporter gene, suggest that this domain may also regulate expression of this placental FBP.
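The promoter organization discussed above can be made concrete with a short motif-scanning sketch. It looks for an SP1-like hexamer (GGGCGG, or the GGGTGG variant with T for C) followed closely by a tandem GGAAG repeat, and it builds a competitor in which the GA dinucleotide of each pentamer is changed to CT. This is an illustration only: the spacer and gap limits, the function names, and the toy sequence are hypothetical and are not taken from the FBPPL-1 gene or from the oligonucleotides used in these assays.

```python
import re

# SP1 hexamer: classical GC box GGGCGG, or the T-for-C variant GGGTGG.
# The lookahead makes finditer report overlapping occurrences.
SP1_HEXAMER = re.compile(r"(?=(GGG[CT]GG))")


def ets_tandem_pattern(max_spacer):
    """Two GGAAG pentamers separated by at most max_spacer bases (assumed limit)."""
    return re.compile(r"(?=(GGAAG[ACGT]{0,%d}?GGAAG))" % max_spacer)


def find_sp1_ets_modules(seq, max_gap=20, max_spacer=5):
    """Return 0-based (sp1_start, ets_start) pairs in which an SP1-like hexamer
    is followed, within max_gap bases of its 3' end, by a tandem GGAAG repeat."""
    seq = seq.upper()
    sp1_starts = [m.start() for m in SP1_HEXAMER.finditer(seq)]
    ets_starts = [m.start() for m in ets_tandem_pattern(max_spacer).finditer(seq)]
    modules = []
    for s in sp1_starts:
        sp1_end = s + 6  # position just past the hexamer
        for e in ets_starts:
            if sp1_end <= e <= sp1_end + max_gap:
                modules.append((s, e))
    return modules


def mutate_ets_core(seq):
    """Hypothetical competitor design: change the GA dinucleotide inside every
    GGAAG pentamer to CT (GGAAG -> GCTAG) and leave all other bases unchanged."""
    return seq.upper().replace("GGAAG", "GCTAG")


# Synthetic toy sequence (not the FBPPL-1 promoter): GGGTGG, then GGAAGGGAAG.
TOY = "TTACGGGGTGGGGCCGGAAGGGAAGTTGCA"
print(find_sp1_ets_modules(TOY))                   # [(5, 15)]
print(find_sp1_ets_modules(mutate_ets_core(TOY)))  # []
```

The mutant loses the ets module while keeping the G-rich site intact, which mirrors the logic of the GA-to-CT competitor: only the pentamers are disrupted.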
These G-rich and tandemly linked GGAAG cis elements may provide a mechanism for tissue-specific expression of the FBP(s). The GABPα and GABPβ1 subunits of the GA-binding protein complex form a heteromeric α2β2 tetramer, which binds to the GGAAG motif for the most efficient induction of transcription (30). Whereas GABPα binds to the GGAAG sequence in the absence of the GABPβ1 (or GABPβ2) subunit, GABPβ1 (or GABPβ2) does not bind to the DNA directly; rather, formation of the α2β2 tetramer increases the binding affinity of the complex for the GGAAG sequence (24). We have also observed that GABPα shifts the GGAAG-containing probe to two positions, which could be due to additional binding to the GGAAG pentamer farther downstream from the G-rich, tandemly linked GGAAG complex. The addition of GABPβ1 clearly shifts both components of the probe that had been retarded by GABPα alone. Thus, only tissues and cells that express both subunits of this GABP complex would be expected to express the FBP encoded by this gene.
[Figure: amino acid sequence alignment. The numbers in the right margin are the amino acid numbers starting with the first methionine; symbols mark amino acid homology and gaps in the alignment.]
Another interesting feature of the 5'-flanking region of this FBPPL-1 gene is the location of a TATA box between -360 and -380, well upstream from the G-rich SP1 binding sequence. In our initial studies of this gene (E. Sadasivan, M. M. Cedeno, and S. P. Rothenberg, unpublished), primer extension using a primer to the region 5'(-190) → 3'(-213) and RNA from placenta identified a transcription initiation site in a putative cap consensus sequence 33 bp downstream from this TATA box. However, we could not confirm this transcription start site in repeated RNase protection assays, and we therefore concluded that this TATA sequence is not a regulatory element for this FBPPL-1 gene.
Since the multigene family encoding the human FBP(s) has arisen by duplication of an ancestral gene (see below) (15), the evolutionary divergence that follows such duplication could modify regulatory elements that provide high expression (i.e. a TATA box promoter) and/or tissue specificity. The evolutionary modification of the ancestral FBP gene to a TATA-less "housekeeping" gene (31) for low-level constitutive expression of the placental FBP could be an appropriate adaptation to the development of the placenta in mammalian reproduction, since this would ensure a mechanism to transfer folate from the maternal to the fetal circulation. Ragoussis et al. (32) have located the family of FBP genes in a tandem arrangement within a 140-kb region on chromosome 11 (q13.3-q13.5), and this organization could be more permissive for such a regulatory evolutionary adaptation following duplication of the ancestral FBP gene, so that different forms of the mammalian FBP(s) may have specialized functions in folate metabolism.
A puzzling finding initially was the divergence of the nucleotide sequence of the 5'-region upstream of the first exon in the FBPPL-1 gene from the 5'-untranslated region of the placental FBP cDNA reported by Ratnam et al. (13). A number of subsequent studies, however, have clearly established that the 5'-untranslated sequence in the cDNA differed from the corresponding region of the gene as a consequence of a cloning artifact. First, PCR using a sense amplimer to a divergent sequence of the FBPPL-1 gene and an antisense amplimer to a sequence common to the gene and the cDNA amplified a fragment from genomic DNA of precisely the size predicted from the distance between the two amplimers (213 bp), and this fragment had the same nucleotide sequence as the divergent region of the gene. Second, the RNase protection assay used to locate the transcription start site used a probe extending from 5'(-126) → 3'(-25), which includes sequences of the gene both common to and divergent from the cDNA cloned by Ratnam et al. (13), and we located the cap site within the divergent region of the gene at base -153. If there were another species of mRNA encoded by the sequence in the cDNA, there should have been a 92-nucleotide protected fragment (5'(-117) → 3'(-25)) corresponding to the sequence common to the genomic clone and the cDNA, and this was not observed. Finally, reverse transcriptase-PCR generated a fragment of the predicted size from placental RNA using an antisense primer to the common region for first-strand cDNA synthesis and a sense amplimer to the divergent sequence for subsequent amplification, whereas no amplified fragment was obtained using a sense primer to a sequence unique to the cDNA.
A property shared by these FBP(s) is that they have higher affinity for oxidized folate (i.e. pteroylglutamic acid) and lower affinity for the more physiologically active reduced folate cofactors, which suggests that these proteins are likely to have the same ligand-binding domain. Accordingly, we analyzed the KB cell, placenta, human milk, and bovine milk FBPs for common "charged" domains, suggested by Svendsen et al. (27) as the likely region for binding the folate ligand. Of the charged clusters common to these FBPs (see above), the sequence R119-K120-E121-R122 may be the most likely putative folate-binding site because it is flanked on the amino and carboxyl sides by the hydrophobic residues W118 and F123-L124, respectively, which could form a hydrophobic pocket for the pteridine domain of the folate molecule. Moreover, 9 cysteine residues (positions 83, 90, 99, 103, 129, 133, 140, 146, and 160) surround this cluster and can form stabilizing disulfide bridges that are essential for ligand-binding function (33).
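The selection criterion used above, a cluster of charged residues bounded on both sides by hydrophobic residues, can be expressed as a simple scan over a protein sequence. The sketch below is illustrative only: the residue classes (Asp, Glu, Lys, Arg as charged; Ala, Val, Leu, Ile, Met, Phe, Trp, Tyr as hydrophobic), the minimum cluster length, and the function name are assumptions. The toy fragment reproduces only the W118, R119-K120-E121-R122, F123-L124 pattern described in the text; its first and last residues are arbitrary placeholders, not the FBPPL-1 sequence.

```python
# Assumed residue classes; His is treated as uncharged in this sketch.
CHARGED = set("DEKR")
HYDROPHOBIC = set("AVLIMFWY")


def flanked_charged_clusters(protein, first_residue=1, min_len=3):
    """Return (start, end, cluster) for runs of >= min_len charged residues whose
    immediate neighbours on both sides are hydrophobic. Positions follow the
    protein's own numbering, where protein[0] is residue number first_residue."""
    protein = protein.upper()
    n = len(protein)
    hits = []
    i = 0
    while i < n:
        if protein[i] not in CHARGED:
            i += 1
            continue
        # Extend to the end of the charged run protein[i:j].
        j = i
        while j < n and protein[j] in CHARGED:
            j += 1
        flanked = (i > 0 and j < n
                   and protein[i - 1] in HYDROPHOBIC
                   and protein[j] in HYDROPHOBIC)
        if j - i >= min_len and flanked:
            hits.append((i + first_residue, j - 1 + first_residue, protein[i:j]))
        i = j
    return hits


# Toy fragment numbered from residue 117 (S117 and Q125 are placeholders).
print(flanked_charged_clusters("SWRKERFLQ", first_residue=117))  # [(119, 122, 'RKER')]
```

A real analysis would also require the cluster to be conserved across the aligned FBPs, as in the comparison described above; this scan only applies the flanking criterion to a single sequence.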
The GPI-FBP isolated from KB cells is one of a recently described class of proteins that are anchored to the external surface of the cell membrane by a glycosylphosphatidylinositol tail (34). This GPI tail is attached to the carboxyl-terminal residue, and for this to occur a stretch of hydrophobic amino acids at the carboxyl terminus of the nascent protein must first be removed. This region of the KB cell FBP differs from the amino acid sequence encoded by the FBPPL-1 gene; i.e. histidine (228), valine (229), and asparagine (230) replace serine, glycine, and alanine, respectively, in the KB cell FBP, and these amino acids have not been reported to form the carboxyl terminus of GPI-tailed proteins (35). Since there is little overall amino acid sequence homology in this carboxyl-terminal region between the KB cell FBP cDNA and the FBPPL-1 gene, it is possible that this placental FBP is not a GPI-linked protein and is instead anchored to the cell membrane by a transmembrane stretch of hydrophobic amino acids in an α-helical configuration. In fact, Antony and co-workers (36) have previously shown that a placental FBP can be released in a soluble form by an endogenous Mg2+-dependent protease in this tissue. Verma et al. (37) have recently demonstrated that an FBP expressed in cultured chorionic villi is GPI-linked, but the precursor labeling technique used may have labeled only one of the two FBP(s) identified in placenta (13). An FBP lacking a GPI tail would not incorporate precursor substrates into this structure and would, therefore, not be identified by this method.
The high degree of homology between exons 3 and 4, and the type of intron/exon junctions at all the defined introns of the FBPPL-1 and FBPKB genes, indicate that these genes are derived by duplication of an ancestral gene and have remained on chromosome 11 (q13.3-q13.5) during evolution (32). The nucleotide divergence between the FBPPL-1 and FBPKB genes suggests that the duplication occurred about 350 million years ago, based on an estimate of 10^6 years for each 0.17% divergence (38).
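The dating rests on a constant-rate (molecular clock) conversion between nucleotide divergence D and elapsed time t, with t = D / r. Taking the stated rate r of 0.17% per 10^6 years at face value, the 350-million-year estimate corresponds to the divergence shown below; this value is back-calculated here for illustration and is not a figure reported in the text.

$$
D = r\,t = \frac{0.17\%}{10^{6}\ \text{yr}} \times \left(350 \times 10^{6}\ \text{yr}\right) \approx 59.5\%
$$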
Finally, the fact that two different molecular forms of functionally similar proteins are found in human placenta (13) suggests that tissue-specific factors are likely to be involved in the expression of the genes encoding these FBPs. Placenta contains tissue components of both maternal and fetal origin, and it will be of interest to establish the source of the GABPβ1 (or GABPβ2) subunit of the GABP heteromeric complex that may be necessary for constitutive expression of the FBPPL-1 gene.