Structural Analysis of the Gene Encoding Human Aromatase Cytochrome P-450, the Enzyme Responsible for Estrogen Biosynthesis*

The structural gene encoding aromatase cytochrome P-450 (P-450AROM) was isolated from human genomic DNA. The gene spans at least 52 kilobases and is composed of 10 exons, the first of which is untranslated. Analysis of the transcription initiation site of human P-450AROM mRNA reveals the differential use of 1 of 3 consecutive G residues at the cap site. DNA sequence analysis indicates that the gene has a putative TATA (ATAAAA) sequence at -23 base pairs (bp) and putative CAAT binding sequences beginning at -41, -67, and -83 bp. The 5'-flanking region contains sequences similar to consensus sequences of cis-acting elements defined as regulators of aromatase gene expression. These putative sequences include a cAMP regulatory element at -211 bp, an AP1 (protein kinase C) site at -54 bp, and glucocorticoid regulatory elements at -352 bp and within the first intron at +346 bp. There appears to be only one gene encoding P-450AROM in the human genome. Two major species of human P-450AROM mRNA (3.4 and 2.9 kilobases) are derived from the use of two polyadenylation signals.

(1-5). This enzyme utilizes molecular oxygen and reducing equivalents provided by a ubiquitous NADPH-cytochrome P-450 reductase to catalyze a series of three sequential hydroxylations that results in loss of the angular methyl group at carbon 19 and phenolization of the A ring of the steroid (6)(7)(8)(9).
In addition to ovarian granulosa cells, aromatase is expressed in several tissue sites including the placenta, Sertoli (10) and Leydig cells (11,12) in the male, and adipose tissue (13) and several sites in the brain including the hypothalamus, hippocampus, and amygdala (14,15) in both sexes. Aromatase expression in adipose tissue has been implicated in the development of endometrial cancer (16) as well as estrogen-dependent breast cancer (17). In adipose tissue, the principal estrogen formed is estrone, whereas in granulosa cells, the primary estrogen formed is estradiol-17β and in the placenta, estriol. Expression of a full-length cDNA encoding human P-450AROM suggests that a single enzyme is capable of catalyzing the aromatization of all three classes of androgen substrates, namely androstenedione, testosterone, and 16α-hydroxylated androgens (18). It is likely therefore that the formation of different estrogen products in the various sites of expression reflects the presentation of these different substrates to the same enzyme, rather than the presence of different forms of aromatase in each tissue site.
In previous studies (19,20), we have shown that aromatase activity of human adipose and ovarian granulosa cells is subject to complex and multifactorial regulation which is correlated with changes in the levels of mRNA encoding this protein. In order to examine, in greater detail, the regulation of aromatase expression in the human, we have isolated and characterized the gene encoding P-450AROM. Utilizing the full-length cDNA (18) and a primer-extended cDNA insert as hybridization probes, four genomic clones encoding the entire P-450AROM structural gene have been isolated from two different human genomic libraries. The gene spans at least 52 kb and contains an untranslated first exon as well as two polyadenylation signals. Characterization of the regulatory sequences of this gene should pave the way to understanding the multifactorial regulation and tissue-specific expression of human P-450AROM.
EXPERIMENTAL PROCEDURES AND RESULTS*

DISCUSSION
Structure of Human Aromatase Cytochrome P-450 Gene-This study presents an analysis of the gene encoding human P-450AROM, the enzyme responsible for the conversion of androgens to estrogens. The gene is similar to that of other cytochrome P-450 species in that the structural gene comprises 10 exons (most cytochrome P-450 genes contain between 8 and 10 exons); the heme-binding region and the entire 3'-untranslated region are encoded by the last exon (29). However, the gene is much larger than those of other steroidogenic cytochrome P-450 species and may in fact be the largest cytochrome P-450 gene analyzed at this time. The entire gene spans at least 52 kb; and since there are two regions where the clones do not overlap, the actual size is unknown. By comparison, the genes for the other two microsomal steroidogenic cytochrome P-450 species, namely P-450C21 (30) and P-450 17α (31), span 3.7 and 6.5 kb, respectively. Of the mitochondrial steroidogenic cytochrome P-450 species, the gene for bovine 11β-hydroxylase cytochrome P-450 (P-450 11β) is 8 kb long (32), whereas that for human cholesterol side-chain cleavage cytochrome P-450 (P-450SCC) is at least 20 kb long (33), and this also has an intron in which the clones do not overlap.
Analysis of the intron/exon boundaries of human P-450AROM reveals rather poor correlation with other microsomal steroidogenic cytochrome P-450 species. Poor intron/exon boundary alignment between the various genes is a characteristic feature of the cytochrome P-450 superfamily, leading to speculation that the ancestral common gene had many more exons than the modern counterparts (29).
Number of Aromatase Cytochrome P-450 Genes Present in Human Genome-The issue of whether there is more than one aromatase enzyme in the human is an important one for several reasons. In the first place, it has been suggested (34) that different aromatase enzymes exist in the placenta, ovary, and adipose tissue since the major estrogen produced in each of these tissues is different, namely estriol in the placenta, estradiol in the ovary, and estrone in the adipose tissue. Equally, however, this could be due to the presentation to the same enzyme of different substrates, namely 16α-hydroxylated androgens in the case of the placenta, testosterone in the case of the ovary, and androstenedione in the case of adipose tissue. The issue is also of importance clinically. At present, there is much interest in the development of more effective inhibitors of aromatase for use clinically in the management of patients with breast cancer (35). However, the only source of aromatase available for the testing of such inhibitors is that derived from human placenta. Moreover, there is evidence to suggest that the estrogen which may be of consequence in the development of breast tumors is that produced in breast adipose tissue surrounding the developing tumor. If the enzyme responsible for such estrogen in adipose tissue were different from that in the placenta, then clearly, inhibitors developed against the placental enzyme might be less efficacious toward the enzyme present in adipose tissue.
Our previous work (18) on the expression of the cDNA encoding P-450AROM together with previous work (3,4) on the purified enzyme suggests that a single enzyme is capable of metabolizing all three categories of C19-steroid substrate and that it is not necessary to postulate the presence of different enzymes in the different tissues which synthesize estrogens. This work on the characterization of the gene encoding P-450AROM is consistent with this view. In all of our restriction mapping and Southern analyses, we have obtained no evidence to suggest that there is more than one P-450AROM gene within the human genome. Our conclusions in this context therefore differ from those of Chen et al. (36), who suggested, on the basis of Southern mapping, that there exist two human P-450AROM genes.
Portions of this paper (including "Experimental Procedures," "Results," and Figs. 1-9) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.
Characterization of 5'-Untranslated Exon-Although DNA sequence upstream from the bases encoding the start of translation contained putative CAAT and TATA boxes, primer extension failed to reveal any start of transcription associated with these. Moreover, an oligonucleotide prepared against a region commencing 39 bp downstream from this putative TATA box failed to hybridize in Northern analysis of poly(A+) RNA from human placenta (Fig. 3, lane 7). We conclude therefore that an intron is present in the DNA 5' of the exon encoding the start site of translation and that the intron/exon boundary occurs at the point where the genomic sequence and those of the cDNAs all become identical. A sequence similar to the splice junction consensus sequence is also present at this site (Ref. 24 and Fig. 8). The region of the gene that we have called exon I therefore fulfills the criteria of a 5'-untranslated exon, although the clone that contains it does not overlap with the clones containing the exons encoding the translated gene product. This conclusion is also supported by the following criteria. First, the genomic clone contains sequence identical to that of the 5'-end of a cDNA (clone cDNA-2) isolated from a primer-extended human placental library and is identical to the human P-450AROM cDNA sequence published by Harada (28). Second, an oligonucleotide prepared complementary to this region hybridizes to poly(A+) RNA from human placenta to give bands identical to those observed using the full-length cDNA as a hybridization probe (Fig. 3, lane 5). Third, primer extension products of human placental poly(A+) RNA initiated using an oligonucleotide complementary to exon II also contain DNA corresponding to exon I (Fig. 7). These findings indicate that exon I is present on the same mRNA species as is the first translated exon.
Putative Regulatory Sequences within Human Aromatase Cytochrome P-450 Gene-5'-Flanking sequence comprising 918 bp upstream of exon I has been sequenced. Analysis of the 5'-flanking region of exon I indicates the presence of a putative TATA (ATAAA) box. Downstream from this (23 bp) is the site of transcription initiation as revealed by primer extension. Sequences similar to putative CAAT binding elements are present at -41, -67, and -83 bp. P-450AROM expression in vivo and in vitro is under the control of a number of hormonal regulators including factors that act via cAMP-dependent protein kinases (e.g. gonadotropins) and protein kinase C as well as growth factors such as epidermal growth factor, basic fibroblast growth factor, transforming growth factor β, tumor necrosis factor (37), and glucocorticoids, which act via specific receptors that bind responsive cis-acting elements (38). The 5'-flanking sequence of P-450AROM was therefore evaluated for sequences that have been shown to confer responsiveness to the above factors or their second messengers. A putative cAMP-responsive element was found at -211 bp. This sequence (TGTGGTCA) is identical to the published consensus sequence for cAMP-responsive elements (TGACGTCA) (39) at six of eight of the positions. A putative AP1-like cis-regulatory sequence (TCAGTCA) that would confer protein kinase C responsiveness was found at -54 bp. A single deviation from the consensus sequence occurs: the C at position 2 is a T/G in the consensus AP1 heptamer (40). A putative glucocorticoid-responsive element (TGTCCATGGTTCT) was identified at -358 bp. This element is notable because it is similar to the repeat palindrome in the mouse mammary tumor virus 5'-flanking sequences described by Strahle et al. (41) that was the most effective in terms of glucocorticoid responsiveness. A canonical, second glucocorticoid-responsive element sequence is indicated in the first intron (TGTTCT), at +346 bp.
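The position-by-position comparisons quoted above can be checked with a short script. The following is only an illustrative sketch: the function names are hypothetical, the sequences are those given in the text, and the two AP1 consensus variants are inferred from the statement that position 2 of the consensus heptamer is T or G.

```python
def count_matches(candidate: str, consensus: str) -> int:
    """Count positions at which two equal-length sequences carry the same base."""
    if len(candidate) != len(consensus):
        raise ValueError("sequences must be the same length")
    return sum(a == b for a, b in zip(candidate.upper(), consensus.upper()))


def mismatch_positions(candidate: str, consensus: str) -> list[int]:
    """Return the 1-based positions at which the two sequences differ."""
    if len(candidate) != len(consensus):
        raise ValueError("sequences must be the same length")
    return [i for i, (a, b) in enumerate(zip(candidate.upper(), consensus.upper()), start=1)
            if a != b]


# Putative cAMP-responsive element versus the published consensus (from the text).
putative_cre = "TGTGGTCA"
consensus_cre = "TGACGTCA"
print(count_matches(putative_cre, consensus_cre), "of", len(consensus_cre), "positions match")

# Putative AP1-like site versus the two consensus variants implied by the text
# (T or G at position 2); both variants are assumptions of this sketch.
putative_ap1 = "TCAGTCA"
for variant in ("TGAGTCA", "TTAGTCA"):
    print(variant, "differs at position(s)", mismatch_positions(putative_ap1, variant))
```

Run as written, the script reports six matching positions for the cAMP-responsive element and a single mismatch at position 2 for each AP1 variant, in line with the counts stated in the text.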
Is There a Second Untranslated Exon?-The presence within a P-450AROM cDNA (clone cDNA-3) of yet another divergent 5'-region upstream from the splice junction is suggestive of the presence of a second untranslated exon within the human P-450AROM gene which may be subject to differential splicing. This, of course, raises the exciting possibility that these exons may be differentially spliced in different tissues which express aromatase and that such differential splicing may account for some of the complexities of the regulation of expression of this gene which have been observed in these tissues. However, the possibility of a second untranslated exon is rendered less likely by the following considerations. First, this sequence could not be amplified out of human genomic DNA, although the corresponding sequence of clone cDNA-2 could readily be amplified. Second, we were unable to detect this sequence in our genomic libraries. Third, this sequence could not be amplified out of human placental poly(A+) RNA by primer extension using oligonucleotides complementary to either exon I or II. Fourth, an oligonucleotide complementary to this region did not hybridize to placental poly(A+) RNA (Fig. 3, lane 6) or to human ovarian poly(A+) RNA (data not shown), as indicated by Northern analysis. We conclude therefore that although this sequence is present within the 5'-region of a human P-450AROM cDNA upstream from a splice junction, it does not appear to be a component of the normal pattern of expression of the P-450AROM gene. Its presence within the cDNA therefore may result either from an unlikely artifact of the original cloning or from an extremely rare polymorphism, perhaps involving an extra piece of DNA inserted 3' of exon I.
Nature of Transcribed RNA Message-Although there appears to be only one P-450AROM gene within the human genome, Northern analysis of RNA from tissues in which the gene is expressed reveals the presence of two hybridizable mRNA species, one of 3.4 kb and one of 2.9 kb (Fig. 3). We believe that this is due to the use of alternative polyadenylation signals for the following reasons. In the first place, there are two putative polyadenylation signals within the 3'-untranslated region of the human P-450AROM gene, and we have isolated cDNAs that contain polyadenylated tails corresponding to the use of each of these polyadenylation signals (18, 42). In all other respects, the sequences of the cDNAs are identical, including the sequence of the coding region. Second, when an oligonucleotide complementary to the area of the 3'-untranslated region between these polyadenylation signals is used as a hybridization probe in Northern analysis of human placental poly(A+) RNA, only one of these mRNA species hybridizes, namely the 3.4-kb band (Fig. 3, lane 3). This proves convincingly that the RNA species of 2.9 kb does not contain the 3'-untranslated region between the polyadenylation signals. Third, an estimate of the size of the mRNA that would be expected based on the length of these cDNAs is in good agreement since sizes of the corresponding cDNAs are 3.0 and 2.7 kb, and most polyadenylated tails are 200-400 bases long.
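The size argument in the third point can be written out as simple arithmetic. The sketch below is illustrative only: the names are hypothetical, the sizes are those quoted in the text, and it assumes that the quoted cDNA lengths do not already include the polyadenylated tail.

```python
# Sizes quoted in the text, expressed in bases.
CDNA_SIZES = {"long": 3000, "short": 2700}      # cDNA lengths (3.0 and 2.7 kb)
OBSERVED_MRNA = {"long": 3400, "short": 2900}   # Northern bands (3.4 and 2.9 kb)
TAIL_RANGE = (200, 400)                         # typical poly(A) tail length


def expected_mrna_range(cdna_bases: int, tail_range: tuple[int, int] = TAIL_RANGE) -> tuple[int, int]:
    """Return the (minimum, maximum) expected mRNA length in bases.

    Assumes the cDNA length excludes the poly(A) tail (an assumption of this sketch).
    """
    return cdna_bases + tail_range[0], cdna_bases + tail_range[1]


def consistent(cdna_bases: int, observed_bases: int) -> bool:
    """Check whether an observed mRNA size falls inside the expected range."""
    low, high = expected_mrna_range(cdna_bases)
    return low <= observed_bases <= high


for name, cdna in CDNA_SIZES.items():
    low, high = expected_mrna_range(cdna)
    print(f"{name}: expected {low}-{high} bases, observed {OBSERVED_MRNA[name]}, "
          f"consistent={consistent(cdna, OBSERVED_MRNA[name])}")
```

Under these assumptions, the 3.0-kb cDNA predicts an mRNA of 3.2-3.4 kb and the 2.7-kb cDNA predicts 2.9-3.1 kb, so both observed bands fall within the expected ranges.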
Conclusions-Understanding of the regulation of the biosynthesis of estrogens in the human is of great importance for a number of reasons. First, the ratio of androgen to estrogen is responsible for a number of important physiological parameters such as the expression of the appropriate sexually dimorphic phenotype as well as reproductive capacity. The expression of the gene within the hypothalamus and other regions of the brain is probably required for the imprinting of sex-related behavior as well as the pattern of gonadotropin secretion by the hypothalamic-pituitary axis. The expression of aromatase in the preimplantation blastocyst may well provide a signal for implantation, and this could account for the observation that, at present, no mutations resulting in a loss of aromatase activity have been characterized. Last, a number of common human cancers, in particular endometrial and breast cancer, are estrogen-dependent. For these reasons, understanding the differential regulation of aromatase in the various tissues in which it is expressed in the human as well as understanding the developmental and tissue-specific expression of the enzyme are of great importance and interest. Characterization of the gene encoding human P-450AROM should pave the way for studies addressing the issue of the different modes of regulation of this gene.

Acknowledgments-We gratefully acknowledge the assistance of Dr. Michael McPhaul (Department of Internal Medicine, University of Texas Southwestern) in the preparation of the primer-extended cDNA library and the expert advice and help of Dr. Sandra Graham-Lorence (Green Center, University of Texas Southwestern) as well as the expert editorial assistance of Sandra Finley.
Addendum-Subsequent to submission of this manuscript, we became aware of the publication of Toda et al. (43). These authors used a similar strategy to ours to characterize the two P-450AROM mRNA species. They also stated, without showing substantiating data, that they had isolated a second cDNA differing from their published insert in terms of the sequence of the 5'-untranslated region.