Characterization of the promoter for the alpha 1 (IV) collagen gene. DNA sequences within the first intron enhance transcription.

Two overlapping clones spanning 19 kilobase pairs (kb) of the 5' end of the alpha 1 (IV) collagen gene were isolated and found to contain a single exon which encoded the 5'-untranslated sequence and 84 base pairs of the signal peptide. The 5' end of this exon was determined to be the 5' end of the transcript by S1 nuclease protection and primer extension. The nucleotide sequence of 1 kb of the 5'-flanking DNA was extremely G + C-rich (greater than 70%) and contained two GC boxes and a putative cAMP regulatory sequence. The transcriptional regulation of the alpha 1 (IV) gene was studied with chimeric gene constructs utilizing 2.5 kb of the 5'-flanking sequence coupled to the gene for chloramphenicol acetyltransferase. Transfection of this construct into differentiating F9 cells resulted in low chloramphenicol acetyltransferase activity compared to beta-actin or Rous sarcoma virus long terminal repeat promoters, although these cells produce large amounts of collagen IV. Inclusion of a 2.7-kb sequence 2.3 kb downstream from the first exon in either orientation increased the transcription of the chloramphenicol acetyltransferase construct approximately 10-fold in F9 cells, but was not active in NIH 3T3 cells, which synthesize little collagen IV. These results indicate the presence of an enhancer within the first intron, which increases the expression of this gene.

components including laminin and collagen IV (12,13). One frequently studied model of this process is the differentiation of F9 teratocarcinoma cells exposed to retinoic acid and dibutyryl cAMP, which show a rapid and coordinate expression of the genes for basement membrane proteins including laminin and collagen IV chains (14). Most models of gene regulation implicate control regions at the 5' end of the gene. For this reason, we have isolated murine genomic clones coding for the amino-terminal portion of the α1(IV) collagen chain. Here we describe the characterization of the clones containing the first exon and the 5'-flanking sequence of the α1(IV) chain gene of collagen IV. We have also begun to identify the sequences necessary for expression of this gene by transfecting plasmids, which contain potential regulatory sequences of this gene coupled to chloramphenicol acetyltransferase, into both undifferentiated and differentiated F9 cells. Our results indicate that noncoding DNA sequences located within 5 kb¹ downstream of the first exon of the α1(IV) gene are required for the expression of the α1(IV) collagen gene.

EXPERIMENTAL PROCEDURES
Identification of First Exon-A 140-bp fragment from the 5' end of a cDNA to the N-terminal portion of the murine α1(IV) chain mRNA was used as probe to obtain two overlapping genomic clones from a murine genomic library (the generous gift of P. Leder, Harvard University). A single exon was identified within these genomic clones by hybridization of the cDNA probe in Southern blots and by sequence analysis (see Fig. 1). The nucleotide sequence of a 1.5-kb HindIII-XhoI fragment containing the putative first exon was obtained by subcloning random fragments produced by sonication in M13 phage and sequencing the single-stranded phage DNA by the dideoxy chain termination method with universal primers (15). Ambiguous sequences were resolved by sequencing plasmid DNA with synthetic oligonucleotides and avian myeloblastosis virus reverse transcriptase (16). Residual ambiguities were resolved by chemical degradation methods (see Fig. 2).
The 5' boundary of the exon was determined by S1 nuclease mapping performed according to the method of Berk and Sharp (17). A NcoI-XbaI fragment 5' end-labeled at the NcoI site (10⁶ dpm) was precipitated with 10 µg of tRNA or poly(A+) RNA from differentiated F9 cells (14); dissolved in 80% formamide, 0.4 M NaCl, 40 mM PIPES, pH 6.4, 1 mM EDTA; denatured at 90 °C for 10 min; and hybridized at 52 °C. After 3 h, three equal aliquots were diluted 10-fold in ice-cold S1 buffer (0.25 M NaCl, 30 mM potassium acetate, pH 4.6, 1 mM ZnSO4) containing 100, 500, or 1000 units/ml S1 nuclease. After incubation at 37 °C for 30 min, the digestion was stopped by phenol extraction, and the DNA fragments were precipitated. To identify the 5' end of the murine α1(IV) transcript, an analytical primer extension experiment was performed using an end-labeled synthetic oligonucleotide (5'-CATGGTGGCGCGCCCGGGGC-3') corresponding to the 5' 20 bp of the genomic fragment used for S1 nuclease mapping described above. The oligonucleotide primer was annealed with 2 µg of poly(A+) RNA from differentiated F9 cells, and the primer was extended with 10 units of avian myeloblastosis virus reverse transcriptase at 43 °C in 50 mM Tris-HCl, pH 8.3, 80 mM NaCl, 40 mM KCl, 6 mM MgCl2, 10 mM dithiothreitol. After 1 h, the reaction was stopped by phenol extraction, and the reaction products were precipitated with ethanol.

Table I. Structure and activity of the α1(IV) collagen gene promoter and first intron-chloramphenicol acetyltransferase constructs. DNA fragments from the promoter and first intron were subcloned into the promoterless pSVOCAT-X plasmid. These constructs were transfected into F9 and NIH 3T3 cells as described under "Experimental Procedures," and chloramphenicol acetyltransferase (CAT) activity was determined 48 h later. The values shown are percentages of total chloramphenicol converted to acetylated derivatives. This percentage was calculated from data obtained by counting radioactive material in the appropriate regions of the thin-layer chromatograms.

¹ The abbreviations used are: kb, kilobase pair(s); bp, base pair(s); PIPES, 1,4-piperazinediethanesulfonic acid.
² Killen, P. D., Burbelo, P. D., Sakurai, Y., and Yamada, Y. (1988)

The DNA fragments from S1 nuclease digestion and primer extension were resuspended in 80% formamide; denatured at 90 °C; and analyzed in parallel on a 6% polyacrylamide, 7 M urea gel with products of chemical degradation of the end-labeled NcoI-XbaI fragment used in sequencing (18).
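As a quick check on the primer-extension oligonucleotide given above, the following minimal Python sketch computes its reverse complement (the sense-strand sequence to which the primer anneals) and its G + C content. The function names are illustrative and are not part of the original methods; only the 20-mer sequence is taken from the text.

```python
# Illustrative helpers; the names are hypothetical and not from the paper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C residues in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# The 20-mer primer quoted in the text, written 5' to 3'.
PRIMER = "CATGGTGGCGCGCCCGGGGC"

sense_strand = reverse_complement(PRIMER)
print(sense_strand)         # GCCCCGGGCGCGCCACCATG
print(gc_fraction(PRIMER))  # 0.85
```

The sense-strand sequence ends in ATG and the primer begins with CATGG, which is consistent with the primer being derived from the NcoI site (CCATGG) in the first exon.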
Chimeric Gene Construction-To subclone the 5'-flanking sequences of the murine α1(IV) transcript, pSVOCAT (19) was linearized at the HindIII site, the site was filled in with the Klenow fragment of DNA polymerase, and the product was ligated with XhoI linkers (designated pSVOCAT-X). All subsequent constructs were obtained in this derivative of pSVOCAT. To obtain α1(IV) promoter chloramphenicol acetyltransferase constructs, a plasmid containing the 2.7-kb BamHI-XhoI genomic fragment was linearized at the unique NcoI site present within the first exon. Brief digestion with Bal31 exonuclease was used to remove the translation initiation site and a short portion of the 5'-untranslated sequence. Fragments containing the residual first exon and approximately 2.5 kb of the 5'-flanking sequence were excised with BamHI, the ends were polished with S1 nuclease and the Klenow fragment of DNA polymerase, and XhoI linkers were attached. Following removal of excess linkers, the fragments were cloned in pSVOCAT-X. The orientation of the genomic fragments and the extent of Bal31 deletion were determined by nucleotide sequencing. One of these constructs (p47) containing the sequence from positions -2500 to +74 was selected for further constructs. A truncated promoter construct (at positions -715 to +74), p47A, was prepared by cutting p47 with XbaI, filling in the ends with Klenow fragment, and attaching XhoI linkers. Following removal of excess linkers, the 0.8-kb fragment was subcloned in pSVOCAT-X, and its orientation was determined as described above.
To assess the role of regulatory elements within the first intron, various restriction fragments were subcloned into the unique BamHI site located 3' to the chloramphenicol acetyltransferase gene. Derivatives of p47 and p47A containing various restriction fragments from the first intron of the gene are listed in Table I. A 5-kb XhoI-EcoRI fragment was subcloned into p47 and p47A after attachment of BamHI linkers and designated p48 and p52, respectively. A 2.3-kb XhoI-XbaI fragment and a 2.7-kb XbaI-EcoRI fragment were prepared similarly and subcloned in both orientations in the BamHI site of p47A. All plasmids used for transfection were banded by equilibrium centrifugation in cesium chloride density gradients three times.
Transient Transfection and Chloramphenicol Acetyltransferase Assay-F9 teratocarcinoma and NIH 3T3 cells were grown in Dulbecco's modified Eagle's medium supplemented with fetal calf serum (10%), glutamine (300 µg/ml), penicillin (400 units/ml), and streptomycin (50 µg/ml). One day before transfection, 250,000 cells were plated in 60-mm tissue culture dishes. The cells were transfected with 5 µg of plasmid DNA by the calcium phosphate method (20). Twenty-four hours after transfection, the DNA was removed by five washes with serum-free medium, and the cells were incubated either in normal medium or in the presence of dibutyryl cAMP and 10⁻⁷ M retinoic acid. Forty-eight hours after treatment, the cells were harvested, and chloramphenicol acetyltransferase activity was determined by the method of Gorman et al. (19).
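Chloramphenicol acetyltransferase activity in this study is reported as the percentage of total chloramphenicol converted to acetylated derivatives, calculated from counts in the corresponding regions of the thin-layer chromatograms. The sketch below shows that arithmetic and how a fold difference between two constructs could be derived from it. The function names and all of the counts are made up for illustration; they are not data from this study.

```python
def percent_conversion(acetylated_cpm: float, unacetylated_cpm: float) -> float:
    """Percentage of total chloramphenicol counts found in acetylated forms."""
    total = acetylated_cpm + unacetylated_cpm
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * acetylated_cpm / total


def fold_change(test_percent: float, reference_percent: float) -> float:
    """Ratio of conversion for a test construct to a reference construct."""
    if reference_percent <= 0:
        raise ValueError("reference conversion must be positive")
    return test_percent / reference_percent


# Hypothetical counts (cpm), chosen only to make the arithmetic easy to follow.
promoter_only = percent_conversion(acetylated_cpm=500, unacetylated_cpm=99_500)
promoter_plus_intron = percent_conversion(acetylated_cpm=5_000, unacetylated_cpm=95_000)

print(promoter_only)                                     # 0.5
print(promoter_plus_intron)                              # 5.0
print(fold_change(promoter_plus_intron, promoter_only))  # 10.0
```

A ratio of this kind is only meaningful while the assay remains in its linear range, so in practice extracts are usually diluted or incubation times shortened when conversion is high.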

RESULTS
Identification of First Exon-We previously isolated a 530-bp cDNA clone for the murine α1(IV) chain from a library constructed by specific primer extension utilizing poly(A+) RNA from differentiated F9 teratocarcinoma cells. This cDNA encodes 141 bp of the 5'-untranslated sequence, a signal peptide, and a portion of the N-terminal cross-linking domain.² We used a 140-bp fragment of the 5'-untranslated portion of this cDNA to obtain two distinct but overlapping genomic clones which contained the first exon and 17 kb of the 3'-flanking intron sequence. The nucleotide sequence of this exon agrees perfectly with the 5' 225 bp of the cDNA up to a typical splice donor consensus sequence. This exon included portions of the 5'-untranslated sequence and 84 bp coding for the first 28 amino acids of the protein (Fig. 1). By using a 128-bp fragment of the α1(IV) cDNA as probe, a third genomic clone containing the second exon was isolated. This clone contained approximately 13 kb of the 5'-flanking sequence which showed no overlap with the two genomic clones containing exon 1, suggesting that the first intron is at least 30 kb.
S1 nuclease protection and primer extension were utilized to identify the 5' boundary of the exon and the transcription initiation site(s) of the gene. For S1 nuclease analysis, an 868-bp NcoI-XbaI genomic fragment 5' end-labeled at the NcoI site was annealed to mRNA isolated from F9 teratocarcinoma cells treated with retinoic acid and dibutyryl cAMP and digested with different concentrations of S1 nuclease. Two fragments at +1 and -184 bp were protected even at high S1 nuclease concentrations (Fig. 2, lower, lanes 5-7). As expected, no protected fragments were observed when the genomic fragment was annealed with tRNA (Fig. 2, lower, lanes 2-4). Both the S1 nuclease-protected fragments and the primer extension products showed microheterogeneity around the initiation sites. The absence of primer-extended products longer than those predicted by S1 nuclease analysis excludes the possibility that S1 nuclease analysis was detecting intron boundaries rather than transcription initiation sites.

Fig. 2. Upper, schematic representation of probes for S1 nuclease mapping and primer extension experiments. The asterisks show labeling position. H, HindIII; X, XbaI; N, NcoI. Lower, S1 nuclease and primer extension studies. Primer extension studies were carried out with a 20-mer oligonucleotide derived from the NcoI site and labeled at the 5' end as primer (lanes 1, 8, and 9). The template for reverse transcriptase was 5 µg of poly(A+) RNA from differentiated F9 cells. Hybridization for S1 nuclease mapping was with a NcoI-XbaI fragment 5' end-labeled at the NcoI site and 10 µg of tRNA (lanes 2-4) or differentiated F9 poly(A+) RNA (lanes 5-7). Lanes 2 and 5, S1 nuclease, 1000 units/ml; lanes 3 and 6, S1 nuclease, 250 units/ml; lanes 4 and 7, S1 nuclease, 100 units/ml. The Maxam and Gilbert (18) A + G reactions were carried out with the same NcoI-XbaI fragment as used for S1 nuclease mapping (lane 10). Electrophoresis was with a 6% acrylamide, 7 M urea sequencing gel. Arrows denote products detected in both primer extension and S1 nuclease protection lanes. The numbers on the right represent the major transcriptional initiation sites. The major site is designated +1.
Nucleotide Sequence of Promoter-The nucleotide sequence of the 5'-flanking region of the murine α1(IV) gene is shown in Fig. 3. The major transcription initiation sites are shown. The 5'-flanking DNA has a high G + C content (75%). DNA sequence analysis identified numerous inverted repeats clustered close to exon 1. GC boxes, i.e. GGCGGG and CCGCCC, occur at positions -4 and -63. Other repeats such as the sequence from positions -301 to -316 showed an inverse complementarity to the sequence from positions -247 to -262. These repeats and others could form stable hairpin structures affecting chromatin structure in this region. No TATA box was observed 5' to either initiation site, whereas CAAT boxes (at positions -26, -224, and -340) were found only on the noncoding DNA strand. In addition, a sequence similar to the SV40 enhancer core sequence (21) was present at positions -200 to -207, and a sequence from positions -368 to -381 was similar to the consensus sequence of a cAMP regulatory element (22).

[Fig. 3, partial. First sequence line, with position labels -1170, -1140, and -1110: 5'-AAGCTTGGCAGCAAAAGGTGCCTGAGGCTACGTTTTAATTATTAGCTCCCTAGAGAGGGCCGACTCCTGCACTGCACGGGACGAGCGGCT-3'. GC boxes and CAAT boxes are marked in the figure.]
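The promoter features described above (G + C content, GC box motifs, and positions numbered relative to the major initiation site at +1) can be tabulated with a few lines of code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the toy sequence and its start-site index are invented for illustration, and the motif strings are simply those written in the text. It is not the authors' analysis procedure.

```python
import re


def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def find_motifs(seq: str, motifs: list[str]) -> list[tuple[int, str]]:
    """Return sorted (0-based start index, motif) pairs, overlaps included."""
    hits = []
    for motif in motifs:
        # A zero-width lookahead reports every start position, even overlapping ones.
        for match in re.finditer(f"(?={re.escape(motif)})", seq.upper()):
            hits.append((match.start(), motif))
    return sorted(hits)


def promoter_coordinate(index: int, tss_index: int) -> int:
    """Convert a 0-based index to +1/-1 numbering, which has no position 0."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


# GC box motifs as written in the text.
GC_BOXES = ["GGCGGG", "CCGCCC"]

# Invented toy sequence; the initiation site is assumed to be at index 12.
toy = "AACCGCCCTTTTAGGCGGGA"
tss_index = 12

print(gc_percent(toy))  # 60.0
for index, motif in find_motifs(toy, GC_BOXES):
    print(motif, promoter_coordinate(index, tss_index))
# CCGCCC -10
# GGCGGG 2
```

Reporting the start of each motif relative to +1 mirrors the numbering convention used for the GC boxes and CAAT boxes in the text.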

[Fig. 3, partial. Sequence line with position label 1440 and the amino acid residues printed beneath it: 5'-CTTCTGCTGCTCTTCGCCGCCCTTCTGCTCCACGAGGAGCGCAGCCGAGCAGCTGCGAAGGTGAGTTCCCTGCGGGCGGTGCGCCCCCAG-3'; Leu Leu Leu Leu Phe Ala Ala Leu Leu Leu His Glu Glu Arg Ser Arg Ala Ala.]
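The sequence line from Fig. 3 reproduced above can be checked against the residues printed beneath it. The sketch below translates the line with the standard genetic code up to the first GTGAGT, which resembles a splice donor consensus. Two points are assumptions rather than statements from the paper: that the reading frame starts at the first nucleotide of the line (as the residue list implies), and that this GTGAGT marks the end of the coding portion. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a DNA sequence in frame 1, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Sequence line from Fig. 3 as it appears in the text, split into 30-nt chunks.
FRAGMENT = (
    "CTTCTGCTGCTCTTCGCCGCCCTTCTGCTC"
    "CACGAGGAGCGCAGCCGAGCAGCTGCGAAG"
    "GTGAGTTCCCTGCGGGCGGTGCGCCCCCAG"
)

# Assumption: the first GTGAGT is the splice donor that ends the coding portion.
donor_index = FRAGMENT.find("GTGAGT")
coding = FRAGMENT[:donor_index]

print(donor_index)        # 60
print(translate(coding))  # LLLLFAALLLHEERSRAAAK
```

The first 18 residues of the translation (LLLLFAALLLHEERSRAA) match the residues listed under the sequence line, and the 60 nt preceding GTGAGT correspond to 20 codons.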
Promoter and Enhancer Activity-To test the functional activity of the promoter, chimeric gene constructs were prepared with the 5'-flanking sequence of the murine α1(IV) gene directing the transcription of the prokaryotic gene, chloramphenicol acetyltransferase. These constructs were transfected into F9 and control cells. In preliminary experiments, we found that differentiated F9 cells, which synthesize copious amounts of basement membrane proteins, took up the DNA poorly in comparison to undifferentiated F9 cells. For this reason, DNA was transfected into undifferentiated F9 cells, which subsequently were induced to differentiate by treatment with retinoic acid and dibutyryl cAMP. After 48 h of exposure to these agents and 72 h after transfection, the mRNA for the α1(IV) chain is significantly increased relative to untreated cells (14). In contrast to β-actin and Rous sarcoma virus long terminal repeat promoters, which show a high level of expression in F9 cells under these conditions, p47 and p47A showed only low chloramphenicol acetyltransferase activity which was little different from the promoterless pSVOCAT-X plasmid.
Since minimal promoter activity was detected, we looked for other sequences within the gene which might enhance transcription. In this series of constructs, approximately 5 kb of the intron immediately downstream from exon 1 were placed downstream from the chloramphenicol acetyltransferase gene. Regardless of whether 2.5 or 0.8 kb of the 5'-flanking sequence were present, a high level of chloramphenicol acetyltransferase expression was noted in both differentiating (see Table I) and undifferentiated F9 cells. Little chloramphenicol acetyltransferase activity was noted when these constructs were transfected into NIH 3T3 cells, which synthesize approximately 1/40 the amount of collagen IV as differentiated [...]

DISCUSSION
[...] very large (>30 kb), we decided to obtain the promoter region in genomic clones utilizing a specific cDNA probe representing the 5' portion of the α1(IV) chain mRNA. We obtained overlapping genomic clones that contain 19 kb of sequence including 2.5 kb of the 5'-flanking region, the 234 bp of the first exon, and at least 15 kb of the first intron. Primer extension and S1 mapping were used to confirm that we had isolated the first exon of the murine α1(IV) collagen gene and the promoter for the gene. About 1.5 kb of genomic DNA containing the first exon were sequenced. Unlike the promoters for the interstitial collagens (23-26), the α1(IV) collagen gene does not contain a TATA box. The lack of a TATA box may explain the occurrence of multiple transcription initiation sites observed from primer extension and S1 protection experiments since this sequence is important for the precise localization of the RNA polymerase on the promoter (27). Three CAAT boxes were found on the noncoding strand within the promoter for the α1(IV) collagen gene and could be of functional significance. Several other genes, including Rous sarcoma virus (28), hsp 70 (29), and the α2(I) chain of the collagen I promoter (25), have functional CAAT boxes on the noncoding strands.
The 5'-flanking region of the α1(IV) gene is G + C-rich and resembles the promoter of housekeeping genes such as hydroxymethylglutaryl-CoA reductase (30), hypoxanthine phosphoribosyltransferase (31), c-myc protooncogene (32), and the epidermal growth factor receptor (33). The α1(IV) promoter also contains two SP1 binding regions. In addition, this promoter contains a sequence at position -368 which matches the cAMP consensus sequence, found necessary for the cAMP modulation of the phosphoenolpyruvate carboxykinase (22), somatostatin (34), and fibronectin (35) genes. It should be noted that cAMP increases collagen IV synthesis when added to cultures of differentiating F9 teratocarcinoma cells (13,36). The enhancer core sequence (21) is similar to the sequence GTGGAATC found at position -200 in the promoter. Thus, this promoter appears to have a number of potential regulatory sites. Recent sequence analysis⁴ of the promoters of the B1 and B2 chains of laminin has shown several regions of homology including the consensus sequence for cAMP, which suggests that the basement membrane proteins laminin, fibronectin, and collagen IV may be regulated by cAMP.
To test the putative promoter for transcriptional activity, chimeric chloramphenicol acetyltransferase expression plasmids containing the α1(IV) promoter region were transfected into F9 teratocarcinoma and NIH 3T3 cells. The studies showed that the 5'-flanking region of the α1(IV) gene acts as a very weak promoter, not unlike the observations made with the promoter of the α2(I) (37), α1(I) (38,39), and α1(II) (40) collagen chain genes. Constructs with portions of the first intron plus the promoter coupled to the chloramphenicol acetyltransferase gene were prepared, and these showed a much higher level of transcriptional activity in teratocarcinoma cells, but not in 3T3 cells. The active portion of the intron was localized to a 2.7-kb fragment 2.3 kb downstream from the first exon. This fragment was active when placed in either orientation in the construct, suggesting that it could be an enhancer element. Since enhancers have been identified in the first intron in the α1(I) and α1(II) chain genes, this regulatory element could be a general feature of collagen genes.
Interestingly, the transcriptional activity of the active construct was as high in undifferentiated as in differentiated F9 teratocarcinoma cells, which differ greatly in the amount of collagen IV they synthesize. Possibly, undifferentiated F9 cells produce factors which bind to the enhancer region necessary for the high level of transcription. The endogenous gene may fail to respond to these factors, perhaps because of altered chromatin structure and/or methylation state (41,42). Retinoic acid and cAMP treatment could induce changes in chromatin structure and methylation pattern in the regulatory region of the gene which allow for the high levels of expression. As shown here, the collagen IV gene contains a number of potential regulatory regions. Alternatively, as in other genes, negative regulatory factors may limit the transcription of collagen IV genes in undifferentiated F9 cells, and these are not present in the portions of the gene used in our constructs. The identification of such elements as well as the detailed characterization of the regulatory regions described here should help clarify the factors controlling the synthesis of collagen IV.

⁴ Y. Yamada, manuscript in preparation.