Structural and Functional Characterization of the Human Decorin Gene Promoter
A HOMOPURINE-HOMOPYRIMIDINE S1 NUCLEASE-SENSITIVE REGION IS INVOLVED IN TRANSCRIPTIONAL CONTROL*

Decorin is a leucine-rich chondroitin/dermatan sulfate proteoglycan that binds collagen and growth factors. We have recently completed the genomic organization of human decorin and discovered two alternatively spliced leader exons, designated exon Ia and Ib, in the 5'-untranslated region. Initial analysis of the sequences upstream of these two exons showed that promoter Ia contained only two GC boxes, while promoter Ib contained a CAAT and two TATA boxes in close proximity to the transcription start site. To determine if these 5'-flanking sequences exhibited promoter activity, chimeric chloramphenicol acetyltransferase expression plasmids containing the promoter region of either exon Ia or Ib were transfected into HeLa and MG-63 osteosarcoma cells. The results showed that only the region flanking exon Ib was functional. In vitro transcription assay generated two transcripts of 92 and 82 base pairs (bp), indicating that both TATA boxes could be used. Using stepwise 5' deletion analysis, we found that the minimum promoter region at -140 bp from the transcription start site, which contained only the CAAT and the two TATA boxes, exhibited strong promoter activity. When a larger construct containing an additional 800 bp of upstream region was tested, a significant increase in transcriptional activity was observed. Interestingly, this promoter region contained several putative binding sites for ubiquitous factors (AP1, AP5, and NF-κB) and for transforming growth factor-β and a 150-bp homopurine-homopyrimidine element with several mirror repeats. When contained in a supercoiled plasmid, this sequence exhibited sensitivity to endonuclease S1, an enzyme that preferentially digests single-stranded DNA. Precise S1 mapping, obtained by direct sequencing of nine distinct S1-generated clones, revealed that in all cases the borders of the sensitive sequence resided within the pur/pyr segment. We propose that this region of the promoter could adopt an intramolecular hairpin triplex structure in vivo and may play a role in the chromatin organization at the decorin gene locus. In addition, this region was able to up-regulate a minimal heterologous promoter in transient transfection assays. The results show that the structure of the decorin gene promoter is different from that of any other proteoglycan promoter characterized so far and indicate that the pur/pyr segment plays a role in the regulation of gene transcription.

* This work was supported in part by National Institutes of Health Grants CA-39481 and CA-47282 (to R. V. I.). The costs of publication of this article were defrayed in part by the payment of page charges.
Interest in decorin, a leucine-rich proteoglycan expressed by both vascular and avascular connective tissues, has recently been heightened by discoveries that this molecule displays diverse functions, serving as a biological ligand for both matrix molecules and growth factors (1-3). Specifically, decorin binds collagen type I in vitro, thereby inhibiting collagen fibrillogenesis (4, 5), and also binds TGF-β1, removing this potent cytokine from the immediate microenvironment and thus neutralizing its biological activity (6). This binding of decorin to TGF-β has recently been confirmed in an experimental animal model of glomerulonephritis in which infusion of recombinant decorin prevented the fibrotic degeneration of renal glomeruli (7). These biological attributes of decorin suggest that this molecule may also play a role in the development of cancer, since constant remodelling of extracellular matrix is required for tumor growth and development (8). Indeed, we have discovered a marked up-regulation of decorin gene expression in the stroma of human colon cancer (9, 10) and have demonstrated that this phenomenon could be reproduced in an in vitro system utilizing mesenchymal cells and colon cancer cells (11, 12). Subsequently, we have shown that the aberrant expression of decorin in colon cancer is associated with a hypomethylation of the decorin gene (13, 14).
We have recently characterized the genomic structure of the human decorin gene and discovered two novel leader exons in the 5'-untranslated region of decorin (15). Over 13 kb of intervening sequence separate these two exons, which are alternatively spliced to exon II, a feature also present in avian and bovine decorin genes (15). Because of the existence of two leader exons, we considered the possibility of a two-promoter system regulating cell- or tissue-specific expression of decorin.
To make possible a detailed molecular understanding of the regulation of decorin gene expression, we investigated the structural and functional activity of the two putative promoter regions (Ia and Ib). Using transient transfection assays and stepwise 5' deletion analysis, we found transcriptional activity only in promoter Ib. A number of positive and negative cis-acting elements were identified, including a TGF-β negative element and a homopurine-homopyrimidine stretch in the distal promoter region that was sensitive to S1 nuclease. This region may adopt an intramolecular hairpin triplex in vivo and could play a role in the organization of the chromatin at the decorin gene locus. The results of this study identify sequences important for the transcriptional control of the decorin gene and offer the opportunity to investigate the abnormal expression of this gene during fibrosis and tumor growth.

The abbreviations used are: TGF-β, transforming growth factor β; PCR, polymerase chain reaction; CAT, chloramphenicol acetyltransferase; kb, kilobase pair(s); bp, base pair(s).
Genomic Clones and DNA Sequencing-DEC-10 consisted of a 3.4-kb fragment containing the first exon (Ib), the 5'-flanking DNA, and the second exon (244 bp) of the human decorin gene (15). DEC-1 consisted of a 4.0-kb fragment of the human decorin gene containing an alternative first exon (Ia) and the 5'- and 3'-flanking DNA. DNA isolated from both sources was subcloned into pBluescript KS (Stratagene). Plasmids were sequenced by a modified dideoxynucleotide chain termination method (16) using either polylinker primers T3 and T7 or synthetic oligonucleotide primers as described before (17, 18). In addition, automated sequencing of constructs was performed using the Applied Biosystems model 373A DNA sequencing system according to the manufacturer's instructions. Ambiguities were resolved by modifying sequencing reactions or electrophoretic conditions to enhance DNA sequences proximal or distal to the primers. Alignment of nucleotide sequences and comparison with EMBL and NBRF data bases were performed utilizing the programs contained in PC/GENE (Intelligenetics), including the CD-ROM data base (release 7.1), and the programs contained in the GCG package of the Jefferson Cancer Institute.
In Vitro Transcription Assay-A PCR-generated 1108-bp fragment containing exon Ib and ~1 kb of 5'-flanking sequence was phosphorylated at the 3' end and digested with restriction endonuclease SacI at the 5' end. This fragment was subcloned into pBluescript KS digested with SmaI and SacI. Plasmid DNA of this construct was purified by affinity chromatography on Qiagen columns (Qiagen Inc., Chatsworth, CA) according to the manufacturer's protocol and digested with EcoRI in order to linearize the plasmid at the terminal end of exon Ib. HeLa nuclear extract was prepared according to the method of Stein et al. (21). Run-off transcription products were generated by in vitro transcription of this linearized DNA (1 μg) in reactions containing either HeLa nuclear extract or 90 μg of HeLaScribe nuclear protein extract (Promega, in vitro transcription grade) according to the method of the manufacturer and analyzed on a denaturing 8% polyacrylamide gel.
Construction of Nested Deletions within the Decorin Promoter-Using DEC-10 plasmid as a template, a 1.43-kb fragment of the human decorin gene was amplified by PCR from the 5'-flanking region of exon Ib using the T7 primer and an antisense exon Ib primer which contained an additional BamHI sequence at its 5' end. This PCR fragment was digested with SalI and BamHI and subcloned into the pUC-CAT vector in the reading frame of the chloramphenicol acetyltransferase (CAT) reporter gene. This construct was designated pUCDEC-10/CAT. The pUC-CAT vector contained a promoterless CAT reporter gene within pUC-9. A 400-bp HindIII-SalI fragment was subcloned into pUCDEC-10/CAT. This construct was digested to completion with SalI and SacI, which generated 3' overhanging SacI and 5' overhanging SalI ends. This allowed us to generate deletions at regular intervals from the SalI site (nucleotide position -983 relative to the transcriptional start site), which was sensitive to exonuclease III. Deletions from this end, spanning from -983 to +35, were made by exonuclease III digestion with the Erase-a-Base kit (Promega) according to the protocol of the manufacturer. The exact 5' end was determined by sequencing each individual clone. Another deletion was made by restriction endonuclease digestion of the pUCDEC-10/CAT plasmid with SalI. This linearized DNA was incubated with deoxynucleotides and Klenow polymerase to generate blunt ends (22). This blunt-ended, linearized DNA was digested with SmaI, producing three DNA fragments. One of them, spanning from -983 to -140 in the decorin promoter, was discarded, while the other two fragments were purified by electrophoresis in a 1% agarose gel, isolated from the gel using the Geneclean II Kit (BIO 101 Inc., La Jolla, CA), and ligated together. The resulting plasmid contained a decorin promoter sequence spanning from -140 to +35 bp which was fused to the CAT reporter gene. Additional constructs containing the pur/pyr segment linked to a minimal heterologous promoter of DNA polymerase α are described under "Results and Discussion" and the legend to Fig. 10. Each construct was sequenced to determine correct ligation.
S1 Nuclease Sensitivity Assay-Supercoiled pUCDEC-10/CAT plasmid containing the decorin promoter and exon Ib (-983 to +35) in the pUC-CAT vector was prepared by the alkaline lysis method and purified by two successive CsCl density gradient ultracentrifugation steps (22). Approximately 2 μg of DNA were digested with 25 units of S1 nuclease (Promega) in 40 mM potassium acetate, pH 4.6, 300 mM NaCl, and 1.3 mM ZnSO4. After incubation for 5 min at 37 °C, the reaction was terminated by adding EDTA to a final concentration of 5 mM and heating the reaction at 70 °C for 10 min to inactivate the S1 nuclease. The DNA was then subjected to phenol extraction and ethanol precipitation. Half of this DNA was digested with EcoRI, while the remainder was incubated with deoxynucleotides and Klenow polymerase to generate blunt ends (22). Following ligation, the resulting plasmids were sequenced to determine the precise locations of the S1 nuclease cleavage sites.
Transient Transfection and CAT Assays-Transient transfections of HeLa and osteosarcoma MG-63 cells in suspension were performed by the calcium phosphate procedure as described (22). Briefly, the cells were transfected with 20 μg of DNA mixed with 2 μg of the pSV-βGAL plasmid DNA in order to quantify transfection efficiencies and incubated at 37 °C for 12 h in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum. After 12 h, the growth medium was changed, and the cells were incubated at 37 °C for an additional 48 h. The cells were then washed twice with phosphate-buffered saline, removed from the culture vessel by treatment with trypsin, washed twice, and assayed for CAT reporter gene activity using thin layer chromatography (23). The β-galactosidase activities were measured according to standard protocols (22).

RESULTS AND DISCUSSION
The discovery of two leader exons in the 5'-untranslated region of the human decorin gene (15) has raised the possibility that a two-promoter system and alternative splicing could be responsible for the presence of heterogeneous transcripts from a single gene. To identify active promoter sequences in the decorin gene, we analyzed the structural and functional activity of the DNA regions flanking the two exons (Ia and Ib). Below we provide evidence for promoter activity in the region 5' to exon Ib only, identify novel structural motifs that may be involved in the decorin gene transcriptional control, and clarify some of the issues raised by recent studies on the cDNA cloning of the 5' termini of human decorin.
Complexity of Exon Ia Across Species-Previously, we characterized exon Ia as an 81-bp region which we amplified by anchored PCR from a human embryonic skin fibroblast cDNA library (15). Fig. 1 schematically illustrates the organization of the human decorin gene and summarizes the size and homology of exon Ia in a variety of recently described cDNAs from human and non-human cells. It is notable that the decorin cDNA sequence from human embryonic skin fibroblasts originally described by Krusius and Ruoslahti (24) contained a highly divergent region at the 5' end. Specifically, the initial 25-bp sequence was not found in any other cDNAs, including cDNAs isolated from murine and avian species, nor was it found in the genomic clone DEC-1 which contained exon Ia. This sequence (Fig. 1B) contains a terminal hexamer (GAATTC) recognized by the restriction enzyme EcoRI and, thus, it may represent a truncated cDNA. Interestingly, the sequence recently reported by Fisher (25) for the human bone exon Ia revealed a longer exon extending to 169 bp. This sequence matched perfectly our genomic sequence in the DEC-1 clone, with the exception of the initial 23 bases (Fig. 1). To assess whether there was another exon in the upstream region of the decorin gene, we rescreened all our genomic clones (which include ~10 kb of 5'-flanking region and >13.2 kb of 3'-flanking region relative to exon Ia) using inverse complementary oligonucleotide probes (23-mer) based on the sequence of either the 5' end of human skin fibroblast decorin cDNA (24) or the bone cell exon Ia (25). In either case, Southern blotting of our genomic clones failed to reveal any additional reactive band (not shown).
To determine whether the two divergent sequences of exon Ia discussed above represented novel 5' exons, we amplified a λZAP cDNA library from embryonic fibroblasts using anchored PCR (15). In this strategy, we took advantage of the fact that the library contained the pBluescript plasmid between the phage arms, and thus we used as sense primers the T3 or T7 sequences and as antisense primers the two 23-mer sequences described above. In the case of the sequence from embryonic lung fibroblasts (24), we had no specific amplification. In contrast, in the case of bone cells (25), several fragments were amplified (Fig. 2A). Specifically, two fragments of ~630 and ~940 bp were predominant (Fig. 2A, lane 3). Southern blotting using the 32P end-labeled 23-mer (the same primer used for amplification) showed a strong reactivity with the amplified bands (Fig. 2B, lanes 2 and 3), but not with 500 bp of decorin cDNA which was amplified at the same time (Fig. 2B, lane 4). To test whether the two major amplified bands were related to decorin (i.e. they could represent a novel 5'-untranslated exon of decorin), we purified the two DNA fragments from the gel and used them as probes for Northern blotting of mRNA known to contain decorin. Interestingly, both bands hybridized with a 4.2-kb transcript in skin fibroblasts (Fig. 2, C and D, lane 1) and rhabdomyosarcoma cells (Fig. 2, C and D, lane 2), but failed to react with human prostate (Fig. 2D, lane 3), a tissue that is highly enriched in decorin mRNA (15). Since decorin mRNA is encoded by two transcripts of ~1.9 and ~1.6 kb (15, 24), these findings indicate that the 23-bp sequence reported by Fisher (25) codes for an mRNA species different from decorin and presumably their cDNA clone of decorin contains a "scrambled" 5' terminus. Taken together, the results presented above favor the absence of additional leader exons, at least within 10-20 kb of the upstream region. This notion is corroborated by the following observations. First, anchored PCR amplification of the fibroblast cDNA library used before to recover the 5' region of decorin showed no additional 5' clones. Second, PCR amplification of decorin from several cDNA libraries, using exon Ia-specific primers, always yielded fragments of the predicted length and always in frame to exon II (15). Third, as illustrated in Fig. 1, exon Ia sequences with a high degree of conservation (85-90%) were also observed in decorin cDNAs isolated from chicken cornea (26), rat uterus (27), and mouse 3T3 cells.² Interestingly, in the murine sequence the homology with the human genomic clone extended for 207 bp, thus suggesting that the size of exon Ia may be significantly greater than 80 bp as originally described (15).

FIG. 2 (legend, in part). Notice the presence of a single hybridizing band of 4.2 kb, markedly different from the two typical transcripts of ~1.9 and ~1.6 kb found in decorin (15, 24), in both skin fibroblasts and rhabdomyosarcoma cells.
Ribonuclease Protection Assay for Exon Ia and Exon II-In order to establish the transcription start site for exon Ia, we performed a series of experiments using primer extension of various RNAs and RNase protection assays. Primer extension often gave products significantly shorter than predicted, probably due to premature termination of the reverse transcriptase enzyme in areas of complex RNA secondary structure (22). In contrast, RNase protection assay of RNA from human prostate, a tissue enriched in decorin message (15), gave a protected band of 173 bp (Fig. 3, lane 2) based on the migration of the 32P-labeled DNA standard from φX174 virus. Because it is well established that RNA has a lower mobility than DNA of the same length when run on urea-polyacrylamide gels (22), the corrected size is likely to be ~5% smaller. Using this correction, then, the size of the protected exon Ia is 164 bp, in close agreement with the size reported for exon Ia in human bone cells (25).
Exon II was easily detected in a variety of tissues and cells expressing decorin. Specifically, exon II was detected in prostate tissue (not shown), embryonic lung fibroblasts (Fig. 3, lane 4), and human adult kidney (Fig. 3, lane 5), but was not found in prostate cancer PC-3 cells (Fig. 3, lane 6), which do not express decorin, or when yeast tRNA was used instead of human RNA (Fig. 3, lane 7). The protected exon II (lanes 4 and 5) gave a value of 255-256 bp, slightly higher than the 244 bases, the actual value obtained by cloning and sequencing the gene in two separate laboratories (15, 27). After correction for different migration as described above, the size of the protected exon II was 243 bp, remarkably close (within one nucleotide) to that determined by direct sequencing. Thus, based on these data, we feel that our estimates of the size of exon Ia are accurate and within experimental error.
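
The size correction used for the protected fragments is simple arithmetic and can be sketched as follows. This is only an illustration, assuming a flat 5% reduction as quoted in the text; the function name `corrected_size` and the rounding to the nearest base are assumptions made for the example and are not part of the original analysis.

```python
# Minimal sketch of the RNA-versus-DNA mobility correction described above.
# Assumption: apparent sizes read against DNA markers are reduced by a flat
# 5% (the figure quoted in the text) and rounded to the nearest base.

def corrected_size(apparent_bp: float, overestimate: float = 0.05) -> int:
    """Return the corrected fragment length in bases."""
    return round(apparent_bp * (1.0 - overestimate))


if __name__ == "__main__":
    # Apparent sizes quoted in the text for the protected exon Ia and exon II.
    for label, apparent in [("exon Ia", 173), ("exon II", 256)]:
        print(label, apparent, "->", corrected_size(apparent))
```

With these inputs the sketch gives 164 and 243 bases, the corrected values reported for exon Ia and exon II.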
The 5'-Flanking Sequence to Exon Ia Does Not Show Functional Promoter Activity-Analysis of the putative promoter Ia, a 2.1-kb sequence 5' to exon Ia (15), showed that it lacked the CAAT or TATA box found in most eukaryotic promoters. Only a few possible regulatory cis-acting elements were observed, including two [...] Ia does not function as a decorin promoter. Although the possibility of an additional exon upstream of exon Ia cannot be totally excluded, the evidence discussed above suggests that no exons related to decorin have been discovered. A more likely possibility is the presence of a distal promoter several kilobases from the actual transcription start site. This question could be better addressed once the structural and functional activities of the decorin promoter regions from different species are characterized.
The Structure of Promoter Ib: Several Regulatory Cis-acting Sequences-In contrast to the putative promoter Ia, the region 5' to exon Ib (Fig. 4) contained several regulatory cis-acting elements known to play a role in transcriptional activity. Computer-assisted analysis using the GCG FIND and PCGENE EUKPROM programs revealed a number of interesting features. Two TATA-like motifs (TATAAT and TTTAAA) and a CAAT box were detected within the first 95 bp from the putative cap site (Fig. 4). By analogy with other well characterized eukaryotic promoters (29), these TATA boxes may serve to fix the start site for decorin transcription (see below). Distal to this region, there were other cis-acting motifs originally described as SV40 enhancer sequences, AP5 and AP1, located at -525 and -813 bp, respectively (for a review, see Ref. 30). Two direct repeats of 18 and 19 bp were found upstream of the AP5 motif. Interestingly, both repeats contained sequences that conform to the consensus site for NF-κB (31). These sequences have been involved in interleukin-1-induced stimulation of transcription in a variety of genes (31). It is noteworthy that interleukin-1 stimulates the synthesis of chondroitin/dermatan sulfate proteoglycans in human synovial fibroblasts (32) and that decorin mRNA levels are increased 3-fold following interleukin-1 treatment of human skin fibroblasts (33).

FIG. 4. Sequence of the decorin gene promoter and exon Ib. Two TATA boxes and a CAAT box are present in the proximal promoter region and are correctly predicted by the computer programs FIND and EUKPROM contained in the GCG and PCGENE packages, respectively. An AP5 sequence is present at -527, a TGF-β negative element is positioned at -688, an AP1 motif is located at -816, and two direct repeats (double underlined) are present between the AP1 and AP5 motifs and include one binding site each for the NF-κB transcription factor (31). The long homopyrimidine stretch in the distal promoter region is bold with two direct repeats bold underlined. The major transcription start site is indicated by +1, while the other initiation site is indicated by a triangle. The sequence of exon Ib is shaded with the possible hairpin loop labeled by stars. The sequence has been edited, and several bases have been substituted and corrected. This sequence has now been replaced in the GenBank/EMBL Data Bank but has retained the original accession number (M98262).
A TGF-β-negative element was found at position -685. This motif conforms to the consensus sequence GnnTTGGtGa that has been found in the promoter region of transin/stromelysin, and its expression is mediated through the proto-oncogene c-Fos (34). This putative TGF-β-negative element could function as a transcriptional silencer and suppress decorin gene activity in TGF-β-sensitive cells. Although it was originally reported that TGF-β up-regulates dermatan/chondroitin sulfate proteoglycan (35), it is now becoming clear that decorin mRNA is either unaffected or markedly suppressed by TGF-β, while biglycan and versican are more likely to be the gene products induced by the cytokine (36-38).
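
A search for a degenerate consensus such as the TGF-β-negative element can be sketched in a few lines. The code below is an illustration only and is not the GCG FIND or PCGENE EUKPROM procedure used in this study. The helper names (`consensus_to_regex`, `reverse_complement`, `find_sites`) are hypothetical, and two assumptions are made: "n" matches any base, and lower-case letters in the consensus are treated as required positions like upper-case ones.

```python
import re

# Hypothetical helper: translate a consensus such as "GnnTTGGtGa" into a
# regular expression. Assumptions: "n" matches any base, and lower-case
# letters are treated like upper-case ones (i.e. as required positions).
def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    parts = ["[ACGT]" if ch in "nN" else ch.upper() for ch in consensus]
    # A lookahead allows overlapping matches to be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, consensus: str):
    """Return (0-based start, strand, matched sequence) for each hit."""
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + len(consensus))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence invented for the example; it is not decorin promoter sequence.
    toy = "AAAGCATTGGTGACCC"
    print(find_sites(toy, "GnnTTGGtGa"))
```

Scanning both strands matters because a regulatory element may lie on either strand relative to the direction of transcription.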
Finally, in the distal 5' region of the promoter Ib there were two direct repeats of 18 bp which were contained in a 150-bp region composed of homopyrimidine residues (CT) in the coding strand. This region contained mirror repeats and, as demonstrated below, was sensitive to S1 nuclease digestion.
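
A homopyrimidine stretch of this kind can be located computationally with a simple scan. The sketch below is illustrative only; the function name `pyrimidine_runs`, the default threshold of 10 bases, and the toy sequence are assumptions made for the example and do not reproduce the analysis programs used in this study.

```python
import re

def pyrimidine_runs(seq: str, min_len: int = 10):
    """Return (start, end, length) for every uninterrupted run of C/T bases
    that is at least min_len long.

    Coordinates are 0-based and end-exclusive, on the strand supplied.
    """
    pattern = re.compile(r"[CT]{%d,}" % min_len)
    return [(m.start(), m.end(), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence invented for the example: a 12-base C/T run with mixed flanks.
    toy = "AGGA" + "CTCTTCCTCTCC" + "GATTACA"
    print(pyrimidine_runs(toy))
```

To look for homopurine runs on the same strand, the character class can be changed to `[AG]`.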

Determination of the Transcription Start Site of Exon Ib by in Vitro Transcription Assay-We have previously detected exon Ib in a number of cells and human tissues using either reverse transcriptase-PCR or Northern blotting with exon-specific probes (15). Indeed, using either technique, it appeared that exon Ib is the most widely distributed exon among tissues, with an average expression that is about 10 times greater than that of exon Ia (15). In addition, the exon Ib sequence has been independently discovered in human adult skin fibroblasts⁴ and in decorin cDNA from human (24) and bovine (39) bone cells. In the human bone cells, a much longer exon Ib has been reported (24) that follows exactly our genomic sequence of DEC-1 and extends for about 200 bp including the two TATA boxes. This long transcript, as recently discussed for the osteopontin gene (40), may actually represent an incompletely processed nuclear transcript present in the RNA preparation that was used in the original cloning of the cDNA library (41).

⁴ L. A. Beavan, personal communication.
In spite of the wide expression of exon Ib, both primer extension and RNase protection assays failed to provide consistent results. A likely explanation is that this region can form a hairpin loop structure similar to that observed in the c-myc gene (42). To circumvent this problem, we utilized an in vitro transcription assay using HeLa nuclear extracts and, as template, the construct containing promoter Ib and the entire exon Ib (cf. Fig. 4). The results, using commercially available purified nuclear extracts (Fig. 5, lane 1) or HeLa extracts purified by us as described before (43) (Fig. 5, lane 2), gave two transcripts of 98 and 86 bp. If one corrects for the overestimate due to the difference between RNA and DNA mobility as above, then two possible transcription start sites, giving transcripts of 93 and 82 bp, are detected for this decorin promoter. A closer analysis of the two transcripts revealed that the 82-bp transcript was the predominant one and was remarkably similar in size to exon Ib (81 bp), which we previously PCR amplified from a human fibroblast cDNA library (15). Of note, the minor transcript of 93 bp has been found in a cDNA isolated from human adult skin fibroblasts.⁴ These data are compatible with the notion that the two transcripts are driven by RNA polymerase and that the two TATA boxes are responsible for the presence of two separate transcription start sites. A detailed analysis of HeLa transcription factors that bind to TATA boxes has revealed a remarkable conservation in DNA motifs between human and yeast (44). The two TATA-like motifs located in the upstream region of the decorin gene are indeed highly conserved and active in a variety of eukaryotic and prokaryotic promoters (44).
Significantly, mutational analysis of the consensus TATA box sequence (TATAAA) has shown that motifs identical to those found at -66 (TATAAT) and -50 (TTTAAA) of the decorin promoter maintain 21 and 54% of the transcriptional activity as compared to the wild type (44). This is in agreement with our finding that the 82-bp transcript is more actively transcribed than the 93-bp transcript (Fig. 5). It has been demonstrated that the presence of a T or an A instead of a G at the seventh position of TATA elements enhances transcription (44). It is noteworthy that the TATA-like box of decorin at -66 contains a G as the seventh nucleotide, while the one at -50 contains an A, thus further indicating that the latter TATA element should exhibit more transcriptional activity. Finally, because the studies discussed above (44) have used HeLa nuclear extracts and in vitro transcription assays very similar to our study, we conclude that the genomic sequence immediately 5' to exon Ib is capable of directing accurate initiation of decorin transcription.

FIG. 5. In vitro transcription of the decorin promoter. This autoradiogram of an 8% acrylamide denaturing gel shows two transcripts of 98 and 86 bp (arrowheads) generated by using as template the exon Ib and the 5'-flanking region and HeLa nuclear extracts which were either purified by us (lane 1) or purchased from Promega (lane 2). In either instance, a major band of 86 bp and a minor one of 98 bp are identified. As discussed in the text, correction for slower RNA migration as compared to DNA standard gave a value of 93 and 82 bp, respectively. The bottom panel is a schematic representation of the linearized plasmid DNA containing pBluescript KS (BS, dark bar), decorin promoter (narrow light bar) with the two TATA boxes indicated by open squares, and exon Ib (shaded bar). The restriction endonuclease EcoRI (E) cleaves the plasmid DNA immediately after exon Ib. The start positions of the two transcripts are shown by arrows. The numbers on the left indicate the migration of molecular weight DNA markers derived from HinfI-digested φX174 viral DNA fragments which were end-labeled with [γ-32P]ATP following dephosphorylation.
Collectively, these data indicate that the two distinct cap sites are due to the usage of two closely spaced TATA boxes and suggest that they are both operational in vivo. As we formally demonstrate in the experiments described below, the 5'-flanking region to exon Ib exhibits strong promoter activity in transient transfection assays.

A Homopurine/Homopyrimidine Region in Promoter Ib Is Sensitive to S1 Nuclease Digestion-The 5'-flanking region to exon Ib (Fig. 4) contained a unique 150-bp segment featuring pyrimidine deoxynucleotides (CT) in the coding strand and purine deoxynucleotides (AG) in the non-coding strand. In addition, there were mirror repeats on the 3' side of the homopyrimidine stretch (Fig. 4). Pur/pyr sequences have recently attracted significant interest because they can adopt a novel hairpin triplex referred to as the H form of DNA (45, 46). In this unusual structure, half of the pyrimidine strand forms a normal Watson-Crick duplex with the corresponding homopurine strand, while the remaining (palindromic) homopyrimidine stretch folds back and binds into the major groove of DNA via Hoogsteen base pairing. The peculiar feature of such a structure is that half of the homopurine strand remains unmatched and thus becomes accessible to cleavage by single strand-specific endonucleases such as S1 nuclease (46). We, therefore, inspected the 5'-flanking region to decorin exon Ib for its sensitivity to S1 endonuclease. In these experiments, we banded supercoiled plasmid DNA twice by CsCl density gradient ultracentrifugation to avoid any shearing or nicking of the DNA, which would preclude any meaningful conclusions since nicked DNA is highly sensitive to S1 nuclease (22). A typical DNA preparation of this construct migrated as a single band of ~3.1 kb in a supercoiled conformation (Fig. 6, lane 2). When this DNA preparation was incubated with S1 nuclease at pH 4.5, it was rapidly converted into a linearized plasmid which now migrated as a ~6-kb fragment (Fig. 6, lane 4). To identify the S1-sensitive site, we sequentially digested this plasmid with the restriction endonucleases EcoRI and HindIII, which gave the predicted fragments of 2.9, 1.9, and 1.3 kb (Fig. 6, lane 3). If the S1-sensitive site were located in the homopyrimidine stretch (i.e. at the beginning of the insert as illustrated in the top panel of Fig. 6), one would expect that digestion with S1 nuclease followed by digestion with EcoRI should specifically cleave only at the 5' end of the 1.3-kb fragment. The results supported this notion since S1/EcoRI treatment of the construct reduced the 1.3-kb band by ~100 nucleotides (Fig. 6, lane 5, labeled by arrowhead). These results are consistent with the presence of S1-hypersensitive sites in supercoiled DNA containing the decorin promoter region. S1 sensitivity required supercoiled plasmid DNA since predigestion with either EcoRI or HindIII prevented S1 digestion (not shown).
To establish more precisely the sites of S1 sensitivity in the decorin promoter, we briefly digested the construct with S1 nuclease and recovered the linearized plasmids from the gel. The ends were filled with Klenow polymerase and blunt end-ligated. A number of clones were isolated and sequenced with a reverse primer 3' to the homopyrimidine stretch. The results showed that in all nine clones isolated (Fig. 7) there were deletions varying in size between 27 and 95 bp, which were located exclusively within the homopurine-homopyrimidine region.
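The two sequence features discussed above, a long pur/pyr tract and a mirror repeat within it, can be located computationally. The following is only a minimal sketch under stated assumptions: the function names (`pur_pyr_tracts`, `longest_mirror_repeat`) and the toy sequence are hypothetical and are not taken from the decorin promoter, and a mirror repeat is simplified here to a contiguous stretch that reads the same in both directions on one strand (no spacer allowed).

```python
import re

# Hypothetical toy sequence for illustration only; NOT the decorin promoter.
TOY_SEQ = "ACGT" + "TCTCCTTCCTCT" + "GACA"


def pur_pyr_tracts(seq, min_len=10):
    """Return sorted (start, end, kind) for runs made only of purines (A/G)
    or only of pyrimidines (C/T) that are at least min_len long.
    Coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    tracts = []
    for pattern, kind in ((r"[AG]+", "purine"), (r"[CT]+", "pyrimidine")):
        for match in re.finditer(pattern, seq):
            if match.end() - match.start() >= min_len:
                tracts.append((match.start(), match.end(), kind))
    return sorted(tracts)


def longest_mirror_repeat(seq):
    """Return (start, end) of the longest stretch that reads identically
    5'->3' and 3'->5' on the same strand (simplified mirror repeat)."""
    seq = seq.upper()
    best = (0, 0)
    for center in range(len(seq)):
        # Try an odd-length center and an even-length center at each position.
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(seq) and seq[left] == seq[right]:
                left -= 1
                right += 1
            # The symmetric stretch found is seq[left + 1:right].
            if right - (left + 1) > best[1] - best[0]:
                best = (left + 1, right)
    return best


tracts = pur_pyr_tracts(TOY_SEQ)
mirror = longest_mirror_repeat(TOY_SEQ)
print(tracts, mirror, TOY_SEQ[mirror[0]:mirror[1]])
```

On a real promoter sequence, one would compare the reported tract and mirror-repeat coordinates with the boundaries of the S1-generated deletions; the thresholds used here are arbitrary choices for the sketch.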
Significance of the Homopurine/Homopyrimidine Region in the Decorin Promoter-It is now well established that triplex (H) forms of DNA develop by the unpairing and folding back of the pyrimidine strand, which forms Hoogsteen hydrogen bonds in parallel orientation to the opposite half of the purine strand (46). This creates a triplex stem of CGC and TAT triplets, a loop of pyrimidine residues, and a single-stranded stretch comprising half of the purine mirror repeat (45,46). In general, low pH, negative supercoiling, and increasing length of the pur/pyr segment act interdependently to stabilize intramolecular triplex formation. Because cytosine residues are protonated at low (<5.0) pH, it has been proposed that these residues are involved in Hoogsteen pairing, for which protonation is required (45). Interestingly, pur/pyr structures frequently occur upstream from eukaryotic genes and in recombination sites. Similar repetitive structures have been found in a number of proto-oncogenes including the epidermal growth factor receptor (47,48), c-Ets-2 (49), and c-Ki-ras (50,51). In addition, extracellular matrix proteins such as chicken collagen α2(I) (52,53) and α2(VI) (54), bovine osteonectin (55), and bone sialoprotein (56) have similar structures in the 5'-flanking region of their promoters.
FIG. 7 (legend, beginning missing). ... (top panel) was linearized by a brief treatment with S1 nuclease (cf. Fig. 6, lane 4) and recovered from the gel. The protruding termini were filled with nucleotide triphosphates using the Klenow polymerase. Following blunt end ligation of the plasmids, a number of clones were isolated and sequenced. The sequence data showed that only the homopurine/homopyrimidine region of the promoter was sensitive to S1 nuclease. The top sequence is the homopurine/homopyrimidine region of the decorin promoter in the original supercoiled plasmid, with the arrows indicating the cleavage sites. The dotted lines indicate the portions deleted from the homopurine/homopyrimidine region after S1 nuclease digestion.

FIG. 8 (legend, beginning missing). ... terminus of the promoter (empty bar) and is fused to the CAT gene at position +35 relative to the transcription start site in the noncoding exon Ib sequence (shaded bar). The regulatory motifs of the decorin promoter are schematically represented at the top by various symbols whose key is provided at the bottom left. See the text for additional details.

What could be the role of such repetitive elements near the promoter region of genes? Although not completely established, current evidence indicates that pur/pyr stretches are directly involved in making the chromatin "active" by functioning as cis-active motifs that bind specific protein moieties. For example, in Drosophila a DNA-binding protein with an affinity for the GAGA consensus sequence, similar to that present in the noncoding strand of the decorin pur/pyr segment, regulates the transcriptional activity of the Ultrabithorax gene (57). It is a relatively abundant sequence-specific DNA-binding protein and has been shown to bind a number of promoter regions in Drosophila (58). This protein recognizes the sequence C/AGAGAGAGC and differs from the GAGA factor described by Young and collaborators (55,56,59), which recognizes GGGA or GGA repeats. In addition, another GAGA-binding protein has been recently characterized in Drosophila that binds to purine-rich regions in a homeobox gene designated Krüppel (60) and shows homology to the osteonectin factor (56). Finally, several proteins from Drosophila nuclei bind to regions of alternating C and T residues present in the promoters of the heat shock genes hsp70 and hsp26 and the histone genes his3 and his4 (61).
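A consensus such as C/AGAGAGAGC can be searched for on both strands with a short script. This is a sketch only, and it rests on an assumption about notation: "C/A" is read here as "C or A at the first position", giving the pattern `[CA]GAGAGAGC`. The function names and the test sequences are hypothetical and do not come from the decorin promoter.

```python
import re

# Assumption: "C/AGAGAGAGC" means C or A followed by GAGAGAGC (9 nt total).
# A lookahead is used so that overlapping matches are also reported.
MOTIF = re.compile(r"(?=([CA]GAGAGAGC))")
MOTIF_LEN = 9


def revcomp(seq):
    """Reverse complement of an uppercase/lowercase A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_gaga_sites(seq):
    """Return sorted (start, strand) pairs; start is the 0-based position
    of the leftmost base of the site on the forward strand."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in MOTIF.finditer(seq)]
    # Scan the reverse complement and map coordinates back to the forward strand.
    rc = revcomp(seq)
    hits += [(len(seq) - (m.start() + MOTIF_LEN), "-") for m in MOTIF.finditer(rc)]
    return sorted(hits)


print(find_gaga_sites("TTAGAGAGAGCTT"))  # motif on the given strand
print(find_gaga_sites("AAGCTCTCTCTAA"))  # motif on the opposite strand
```

Searching both strands matters here because the purine-rich motif lies on the noncoding strand of a segment whose coding strand is pyrimidine-rich.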

Reporter Gene Analysis of the Decorin Promoter Ib and 5' Deletion Constructs-To prove that the putative promoter region upstream of exon Ib does in fact exhibit transcriptional activity, we performed reporter gene analysis using various decorin promoter-CAT fusion plasmids. Fig. 8 diagrams the decorin promoter-CAT fusion plasmids used in these experiments. Reporter gene analyses were performed using several controls and in at least two types of cells, HeLa (primarily) and MG-63 osteosarcoma cells. In each transfection experiment we used a CAT fusion plasmid driven by the SV40 promoter and enhancer (pSV2-CAT) as a positive control and a promoterless CAT plasmid as a negative control. In addition, the cells were cotransfected with a plasmid containing the β-galactosidase gene driven by the same strong viral promoter (pSV40-βGAL), and the transfection efficiency for each reaction was normalized for β-galactosidase activity. The relative activity of eight constructs in five independent experiments is summarized in Fig. 9. Although strong promoter activity was detected with the -140 bp construct, the smallest construct containing the two TATA boxes and the CAAT box (cf. Fig. 4), the activity was increased when the distal promoter region between -879 and -983 was included. Interestingly, this region contained the entire homopyrimidine segment, thus indicating that this pur/pyr region has enhancer activity. A reproducible, although moderate, decrease in functional activity of the promoter was observed when the construct including the TGF-β negative element (the region extending to -735, Fig. 9) was used. These data suggest that this region may represent a negative cis-active element as described in stromelysin (34) and provide a molecular correlation of previous studies showing that decorin can be down-regulated by TGF-β (36-38).

FIG. 9. Summary of CAT expression assays of decorin gene promoter and various stepwise 5' deletion constructs. HeLa cells were co-transfected with decorin gene promoter CAT constructs of various lengths and the pSV-βGal plasmid. After transfection, the cells were incubated for 48 h, and the cell extracts were assayed for CAT activity by thin layer chromatography and normalized for β-galactosidase activity. The numbers at the bottom represent the size of each construct relative to the transcription start site. Promoter activity is expressed as a percentage relative to maximum CAT activity as produced by the full-length decorin promoter (-983). The values represent the normalized mean of five independent experiments run in duplicate or triplicate with S.D. < 15%.
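The normalization applied to the CAT assays (CAT activity divided by the β-galactosidase activity of the same extract, then expressed as a percentage of the full-length -983 construct) can be written out as a small calculation. This is a sketch under stated assumptions: the function name and all numerical readings below are invented for illustration and are not the values plotted in Fig. 9.

```python
def relative_promoter_activity(cat, bgal, reference):
    """Normalize raw CAT readings by beta-galactosidase readings and express
    each construct as a percentage of the reference construct.

    cat, bgal: dicts mapping construct name -> raw reading (same keys).
    reference: name of the construct defined as 100%.
    """
    normalized = {name: cat[name] / bgal[name] for name in cat}
    ref_value = normalized[reference]
    return {name: 100.0 * value / ref_value for name, value in normalized.items()}


# Hypothetical readings (arbitrary units), NOT data from the paper.
cat_readings = {"-983": 8.0, "-140": 3.0}
bgal_readings = {"-983": 2.0, "-140": 1.5}

activity = relative_promoter_activity(cat_readings, bgal_readings, reference="-983")
print(activity)
```

Dividing by the co-transfected β-galactosidase signal corrects for differences in transfection efficiency between dishes before constructs are compared.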
The Homopurine-Homopyrimidine Segment Affects the Activity of a Minimal Heterologous Promoter-In order to prove directly a functional role of the decorin homopurine/homopyrimidine region, we made a construct containing the entire homopyrimidine region from -781 to -1003 (Fig. 10), coupled to a minimal heterologous promoter, that of the human DNA polymerase α gene (62). This construct was then fused to the CAT gene and tested in transient transfection assays as described above. The human DNA polymerase α has been previously shown to be constitutively expressed in dividing cells and is ubiquitous in nature (62), thus providing the opportunity to test whether or not the pur/pyr stretch is involved in regulating decorin gene expression. In two independent transient transfection experiments, the pur/pyr segment was capable of inducing a 3-fold increase in CAT activity (Fig. 10). These results corroborate the data provided above and show a direct functional role for such a DNA sequence in the transcriptional regulation of the human decorin gene. It is noteworthy that a recent study has shown that a 178-bp homopurine-homopyrimidine region located in the upstream region of the rat neural cell adhesion molecule (N-CAM) gene was capable of down-regulating the activity of a luciferase reporter gene driven by the basal SV40 promoter (63). In addition, the pur/pyr region of N-CAM bound specifically to nuclear proteins extracted from the cells that expressed the gene but not from those cell lines that did not express N-CAM (63). The differential effects of these DNA sequences, together with the demonstrated cell specificity, underline the complexity of these regulatory domains and warrant future studies aimed at elucidating their role in gene transcription.

FIG. 10 (legend, beginning missing). ... The homopurine-homopyrimidine segment (between -781 and -1003) was subcloned into a vector containing the human DNA polymerase α (DPA) gene promoter spanning -140 to +40 relative to the transcriptional start site (62), which was fused to the CAT gene (schematically represented in the top panel). The left bottom panel represents a photograph of a representative thin layer chromatogram of [14C]chloramphenicol and its acetylated products produced in CAT reaction assays. Lanes 1 and 2 correspond to constructs lacking or containing the homopyrimidine region, respectively. Lane 3 is the promoterless CAT gene construct. The bottom right panel is a summary of CAT activity normalized for β-galactosidase activity. Promoter activity is expressed as a percentage relative to maximum CAT activity as produced by the CAT construct containing the decorin homopurine-homopyrimidine segment (+).
Conclusions-In this paper we have characterized the structure and functional activity of the promoter of the human decorin gene, which encodes a proteoglycan involved in the regulation of matrix assembly and cytokine binding. The results have shown that the organization of the human decorin promoter is unusual and differs significantly from that of any other proteoglycan gene promoter so far described, including those of the serglycin (64), biglycan (65), syndecan (66), versican,⁵ and perlecan (67) genes. We have demonstrated that, although decorin contains two alternatively spliced leader exons with flanking putative promoter regions, only one of these regions, that upstream of exon Ib, exhibits strong transcriptional activity. The elucidation of key regulatory elements in the active decorin promoter provides new insights into the control of decorin gene expression and corroborates previous observations on modulation of decorin gene expression by growth hormones and cytokines.