Structural analysis of the gene encoding rat cholesterol alpha-hydroxylase, the key enzyme for bile acid biosynthesis.

The gene encoding cholesterol 7 alpha-hydroxylase (P450VIIA) was isolated from rat genomic DNA. The gene spanned about 11 kilobases and contained six exons. Blotting analysis of genomic DNA and complete matching of restriction maps of several isolated genomic clones indicated that there appeared to be only one gene in the rat genome. The putative transcription initiation site was present 61 base pairs upstream from the ATG codon. The typical TATA sequence and CCAAT promoter element were found at 24 and 47 base pairs upstream from the transcription initiation site, respectively. Alignment of several P450 proteins showed that the cholesterol 7 alpha-hydroxylase gene shared location of introns with none of the other P450 genes except for intron 5, which was in the same position as intron 10 of the gene encoding P450IVA1. The alignment also indicated that the distal helix of cholesterol 7 alpha-hydroxylase contained an asparagine in place of the well conserved threonine that is postulated to be involved in the O2 binding site. Unusual residues, Asn-126 and Thr-442, were also found at the sites where all other P450s have positively charged amino acids, which are considered to be involved in interaction with heme propionate. These replacements may be related to the unique function and unusual lability of the hydroxylase. Analysis of evolutionary distance between the cholesterol 7 alpha-hydroxylase gene and other known P450 genes indicated that yeast P450LIA is most closely related to P450VIIA. This finding suggests that the cholesterol 7 alpha-hydroxylase gene is an evolutionarily old P450 gene.


Masazumi Nishimoto, Osamu Gotoh‡, Kyuichiro Okuda, and Mitsuhide Noshiro§
From the Department of Biochemistry, Hiroshima University School of Dentistry, Kasumi 1-2-3, Hiroshima 734 and the ‡Department of Biochemistry, Saitama Cancer Center Research Institute, Saitama 362, Japan
* This study was supported in part by a grant-in-aid for scientific research on Priority Areas P450 from the Ministry of Education, Science and Culture of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M59184-M59189.
§ To whom correspondence should be addressed: Dept. of Biochemistry, Hiroshima University School of Dentistry, Kasumi 1-2-3, Minami-ku, Hiroshima 734, Japan.
¹ The abbreviations used are: P450VIIA, protein or amino acid sequence of cholesterol 7α-hydroxylase (product of CYP7 gene); kbp, kilobase pair(s).

Cholesterol 7α-hydroxylase catalyzes the initial hydroxylation of cholesterol at the 7α-position, which constitutes a rate-limiting step for cholesterol catabolism (Bjorkhem, 1985). The enzyme is a microsomal monooxygenase system comprising cytochrome P450VIIA¹ and NADPH-cytochrome P450 reductase (Wada et al., 1968; Bjorkhem et al., 1974). The enzyme activity in the rat exhibits a circadian rhythm, with maximum activity at night and minimum activity during the day, and is regulated by the level of bile acids returning to the liver via enterohepatic circulation. The enzyme activity is also reported to be affected by administration of hormones such as adrenal cortical hormone and thyroxine (Balasubramaniam et al., 1975), cytosolic factors (Kwok et al., 1981), and phosphorylation (Scallen and Sanghvi, 1983).
Recently P450VIIA has been purified to homogeneity and its enzymatic properties characterized, showing strict substrate specificity and stereo- and regioselectivity (Ogishima et al., 1987). Using specific antibodies against the enzyme as a probe, a cDNA clone encoding P450VIIA has been isolated and its complete amino acid sequence predicted from the nucleotide sequence of the cDNA (Noshiro et al., 1989). Subsequent studies using the specific antibodies and the cDNA have demonstrated that the circadian rhythm of cholesterol 7α-hydroxylase activity is mainly pretranslationally regulated and that both the enzyme protein and mRNA of cholesterol 7α-hydroxylase show rapid turnover. Further understanding of this regulation may require characterization of both the protein structure and the regulatory elements of the gene. In this study, we determined the structure of the rat cholesterol 7α-hydroxylase gene and performed an alignment analysis of the protein sequence.

EXPERIMENTAL PROCEDURES*

RESULTS AND DISCUSSION
Number of Cholesterol 7α-Hydroxylase Genes in Rat Genome. Eight positive clones were isolated from 2 × 10⁶ plaques by screening λEMBL genomic libraries using the insert of p7α-11 cDNA as a probe and were then subjected to restriction mapping analysis. We constructed the cholesterol 7α-hydroxylase gene by overlapping two clones (λG-3 and λG-4) as shown in Fig. 1. Other isolated clones showed restriction sites for BamHI, EcoRI, HindIII, PstI, and XbaI that overlapped with parts of the constructed gene (data not shown). To examine the size and complexity of the cholesterol 7α-hydroxylase gene, we performed Southern blot analysis of total genomic DNA using the total insert of p7α-11 cDNA or fragment I or II (shown in Fig. 1) as a probe. The signals obtained with the total cDNA insert showed a simple hybridization pattern on digestion of genomic DNA by BamHI, EcoRI, or HindIII (Fig. 2A). When fragment I, which contains the first exon and the flanking sequence (700 base pairs) prepared from λG-4, was used as a probe, HindIII digests of genomic DNA showed only one hybridized band (2.2 kbp) corresponding to the expected DNA fragment of the cholesterol 7α-hydroxylase gene (Fig. 1). When fragment II, a 3'-noncoding region of p7α-11 cDNA (1.1 kbp), was used to hybridize EcoRI digests of genomic DNA, only one hybridized band (4.0 kbp), which was also expected from the restriction map, was observed (Fig. 2C). The sizes of the bands hybridizing with fragments I and II were the same as those of the corresponding DNA fragments obtained from HindIII-digested λG-4 and EcoRI-digested λG-3, respectively (data not shown). Taken together, the data indicate that there appears to be only one gene encoding P450VIIA in the rat genome. Accordingly, the two clones, λG-3 and λG-4, may have been derived from the same gene and together cover the entire transcribed region.

* Portions of this paper (including "Experimental Procedures" and Figs. 1-6) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.
Introns 1-5 were 1.7, 0.7, 0.9, 1.9, and 0.5 kbp long, respectively; none was particularly large, and they did not differ greatly in size. Nucleotide sequences of all exon/intron boundaries followed the canonical GT/AG rule. Flanking sequences of the 5'- and 3'-splicing sites in the introns were in good accordance with the splice junction consensus sequences proposed by Mount (1982), except for the sequence (5'-tattttaaaatatctg-3') of the 3'-splicing site of the fourth intron, which was rich in nucleotide A. This sequence may nevertheless be usable for splicing according to the pre-mRNA splicing mechanism proposed by Chabot et al. (1985), in which small nuclear ribonucleoprotein particles recognize not the primary structure but the secondary or tertiary structure of the 3'-splicing region.
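The intron observations above can be checked with a few lines of Python. This is only an illustrative sketch: the intron lengths and the 3'-splicing-site sequence of intron 4 are taken from the text, whereas the function names and the short example intron are hypothetical and are not sequences from the gene.

```python
# Intron lengths (kbp) for introns 1-5, as reported in the text.
INTRON_LENGTHS_KBP = [1.7, 0.7, 0.9, 1.9, 0.5]

# 3'-splicing-site flanking sequence of intron 4, as quoted in the text.
INTRON4_3PRIME_FLANK = "tattttaaaatatctg"


def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if an intron begins with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def base_fraction(seq: str, base: str) -> float:
    """Fraction of positions in seq occupied by the given base."""
    seq = seq.upper()
    return seq.count(base.upper()) / len(seq)


def total_intron_length_kbp(lengths) -> float:
    """Sum of intron lengths, rounded to the precision used in the text."""
    return round(sum(lengths), 1)


if __name__ == "__main__":
    # Hypothetical toy intron, used only to exercise the GT/AG check.
    toy_intron = "gtaagtccccttttcag"
    print(follows_gt_ag_rule(toy_intron))
    print(total_intron_length_kbp(INTRON_LENGTHS_KBP))
    print(base_fraction(INTRON4_3PRIME_FLANK, "a"))
```

With the reported lengths, the five introns add up to 5.7 kbp, roughly half of the approximately 11-kbp gene, and 6 of the 16 nucleotides in the quoted intron 4 sequence are A.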
The poly(A)+ attachment site at the 3' end of the genomic sequence was assigned by comparison with the cDNA sequence. The putative poly(A)+ attachment signal, AGTAAA, was located 12 base pairs upstream from the poly(A)+ attachment site. Li et al. (1990) suggested that there are two major cholesterol 7α-hydroxylase mRNA species of different size in rat liver, which may be derived from the use of two poly(A)+ attachment signals. However, our Northern hybridization analysis has shown only one major mRNA species, although another polyadenylation signal sequence corresponding to the second signal reported by Li et al. (1990) (CATAAA at position 1964 in their cDNA sequence) is recognizable in our nucleotide sequences of both the cDNA and the gene. The reason for this discrepancy is presently unknown.
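Both hexamers mentioned above are single-base variants of the canonical polyadenylation signal AATAAA. The sketch below makes that comparison explicit. The canonical hexamer is general background knowledge rather than something stated in this paper, and the helper name `mismatches` is hypothetical.

```python
# Canonical polyadenylation signal (background knowledge, not from this paper).
CANONICAL_SIGNAL = "AATAAA"


def mismatches(hexamer: str, reference: str = CANONICAL_SIGNAL):
    """List (position, reference_base, observed_base) for each difference.

    Positions are 1-based and counted from the 5' end of the hexamer.
    """
    hexamer = hexamer.upper()
    if len(hexamer) != len(reference):
        raise ValueError("hexamer and reference must have the same length")
    return [
        (i + 1, ref_base, obs_base)
        for i, (ref_base, obs_base) in enumerate(zip(reference, hexamer))
        if ref_base != obs_base
    ]


if __name__ == "__main__":
    # AGTAAA: putative signal reported in this paper.
    # CATAAA: second signal reported by Li et al. (1990).
    for signal in ("AGTAAA", "CATAAA"):
        print(signal, mismatches(signal))
```

Under this comparison AGTAAA differs from the canonical hexamer at position 2 and CATAAA at position 1.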
To define the start site for transcription, S1 nuclease mapping was performed. As shown in Fig. 4, the signal of the S1 nuclease-resistant fragment corresponding to nucleotide G, which was considered to be the nucleotide capped in the mRNA, was observed 61 base pairs upstream from the ATG translation initiation codon. In the region preceding the putative cap site, the sequence TATAAA, the expected TATA box, occurred between positions -29 and -24. The sequence ATTGG, complementary to the CCAAT consensus sequence, was present between positions -51 and -47. However, we have not yet discovered a regulatory element that may be sensitive to bile acid(s), a putative negative feedback effector for cholesterol catabolism. The region further upstream is now under investigation.
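The position numbering used above (cap site as +1, no position 0, negative numbers upstream) can be illustrated with a short Python sketch. The upstream sequence here is a synthetic placeholder built only so that the two motifs fall at the reported coordinates; it is not the actual 5'-flanking sequence of the gene, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_element(upstream: str, motif: str):
    """Locate the first occurrence of motif in the sequence upstream of the cap site.

    `upstream` must end at position -1, the base immediately 5' of the
    cap site (+1). Returns (start, end) as negative positions, or None.
    """
    index = upstream.upper().find(motif.upper())
    if index < 0:
        return None
    start = index - len(upstream)
    return start, start + len(motif) - 1


# Synthetic placeholder: 60 bases of filler with the two motifs inserted
# at the coordinates reported in the text. NOT the real gene sequence.
TOY_UPSTREAM = "C" * 9 + "ATTGG" + "C" * 17 + "TATAAA" + "C" * 23

if __name__ == "__main__":
    print(find_element(TOY_UPSTREAM, "TATAAA"))
    print(find_element(TOY_UPSTREAM, reverse_complement("CCAAT")))
```

In this toy sequence TATAAA occupies positions -29 to -24 and ATTGG, the reverse complement of CCAAT, occupies -51 to -47, matching the coordinates given in the text.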
Analysis of Amino Acid Sequence. Li et al. (1990) and Jelinek et al. (1990) have also recently reported the nucleotide sequence of cDNA for rat cholesterol 7α-hydroxylase. Their coding sequences are identical to our sequence except for a few nucleotide differences, which do not alter the amino acids encoded. These are likely due to the different strains of rats used. Accordingly, the amino acid sequences of P450VIIA reported by all three laboratories are completely identical. Fig. 5 shows the alignment of P450 amino acid sequences with the exon/intron boundaries superimposed. No intron in the cholesterol 7α-hydroxylase gene is equivalent to any intron in other P450 genes with respect to location on the aligned amino acid sequences, except for intron 5 of the cholesterol 7α-hydroxylase gene, which upon alignment is at the same position as intron 10 of the gene encoding P450IVA1. This poor correlation favors the hypothesis that the addition of introns after divergence of P450 families may be a mechanism for generation of these variations in gene structure (Sogawa et al., 1984).
Several amino acid residues are highly conserved in all P450 species to maintain the function of the hemoprotein as a monooxygenase (Gotoh and Fujii-Kuriyama, 1989). Most of the conserved amino acids were also observed in P450VIIA, as indicated at the bottom of each alignment in Fig. 5. However, the alignment analysis indicated that the distal helix of P450VIIA contained an asparagine in place of the well conserved threonine residue (alignment position 355), which is common to almost all P450s so far sequenced, except for three: P450IIA2 (replaced by serine (Matsunaga et al., 1988)), P450IIB3 (replaced by serine (Labbé et al., 1988)), and P450IIIA1 (replaced by proline (Gonzalez et al., 1985)). In P450CIA1 the hydroxyl group of the side chain of this amino acid residue makes a hydrogen bond with the carbonyl oxygen atom of glycine (alignment position 351), which is considered to be important in forming the O2 binding site (Poulos et al., 1987). Although substitution by an asparagine residue at position 355 has not been observed in the other P450 species so far sequenced, it does not seem to have a fatal effect on enzyme activity, since Imai and Nakamura (1989) have demonstrated that replacement of the corresponding threonine 301 of P450IIC2 with asparagine by site-directed mutagenesis did not abolish the activity but retained 38% of that of the wild type. It should be noted that this asparagine residue is also conserved in the amino acid sequence of human P450VIIA, which was predicted from the cDNA. Furthermore, the amino acid sequence of P450VIIA lacks basic amino acids such as arginine or lysine at alignment positions 157 and 508 (as indicated in Fig. 5), which are likely to interact with heme propionate (Poulos et al., 1985); i.e. alignment positions 157 and 508 are occupied in rat P450VIIA by asparagine 126 and threonine 442, respectively. The same substitutions have also been observed in human cholesterol 7α-hydroxylase.
These substitutions may explain the unique function and unusual lability of cholesterol 7α-hydroxylase described by Ogishima et al. (1987).
Fig. 6 shows a phylogenetic tree constructed based on the analysis of the P450 protein sequences. Among the known P450 families, P450VIIA is most closely related to yeast P450LIA. These two P450s are estimated to have diverged 1.0-1.2 × 10⁹ years ago. It has been proposed (Gotoh and Fujii-Kuriyama, 1989) that the substrate specificity of the ancestral enzyme might have been stringent, but that duplications and mutations produced P450 species with relaxed specificities. Accordingly, early divergence of P450VIIA may be a reason for its stringent substrate specificity (Ogishima et al., 1987).
It has been reported that the circadian rhythm of cholesterol 7α-hydroxylase activity and the modulation of the activity by bile acid(s) (Jelinek et al., 1990) are mainly regulated at the pretranslational level. The enzyme activity is also modulated by various other factors such as thyroxine, glucocorticoids, and insulin. Characterization of the regulatory sequences in the gene that are responsive to these factors should be helpful for understanding the multifactorial regulation and tissue-specific expression of liver cholesterol 7α-hydroxylase, which may lead to the ultimate clarification of the etiology of a number of important human diseases related to disorders of cholesterol metabolism.