Complete cDNA and Protein Sequence of a Pregnenolone 16α-Carbonitrile-induced Cytochrome P-450
A REPRESENTATIVE OF A NEW GENE FAMILY*

A full-length cDNA complementary to rat liver mRNA coding for pregnenolone 16α-carbonitrile-induced cytochrome P-450 (P-450PCN) was isolated and completely sequenced. P-450PCN mRNA is 2038 nucleotides in length and has a continuous reading frame (82-1596) that encodes a protein of 504 amino acids (Mr = 57,917). The amino-terminal sequence of 18 residues of the purified P-450PCN protein agrees with the open reading frame of the cDNA sequence.
The P-450PCN mRNA nucleotide and amino acid sequences clearly establish that this cytochrome is a member of a separate P-450 family different from the phenobarbital-induced (e.g. P-450e) and 3-methylcholanthrene-induced (e.g. P-450c) P-450 gene families. P-450PCN shares 38 and 37% nucleotide similarity and 33 and 33% amino acid similarity with P-450e and P-450c, respectively. P-450PCN, P-450e, and P-450c exhibit greater homology in the C-terminal half than in the N-terminal half of the proteins. Included in this region is the cysteinyl fragment (surrounding residue 443 in P-450PCN), which appears to be the most conserved among all fragments of other P-450 proteins. Of interest, the N-terminal region of P-450PCN does not contain the cysteine residue previously thought to contribute the thiolate ligand to the heme iron in P-450 proteins; these data establish more firmly that the cysteine residue located in the carboxyl-terminal region serves this function. These sequence studies further support the conclusion derived from chromosomal localization studies and Southern blot analyses that P-450PCN represents a member of a distinct third family of P-450 genes, which diverged from a common ancestor more than 200 million years ago.
Cytochromes P-450 represent an unknown number of NADPH-dependent CO-inhibitory heme-containing enzymes involved in the metabolism of endogenous compounds, such as steroids and fatty acids, and numerous foreign chemicals, such as drugs and chemical carcinogens (1-5). Although several hundred chemicals are known to induce one or more P-450-mediated enzyme activities (1, 4), the number of classes of inducers is not known. Prototypes of the first two major classes of P-450 inducers are 3-methylcholanthrene (6) and phenobarbital (7, 8). TCDD¹ was subsequently shown to be far more potent than 3-methylcholanthrene as an inducer of P-450 (9) and as a ligand for the Ah receptor, which mediates the induction response for this class of inducers (10-12). The cDNA nucleotide and deduced amino acid sequences have now been reported for several members of the phenobarbital-inducible (13, 14) and TCDD-inducible (15-18) P-450 gene families, and it is clear that these two families diverged more than 200 million years ago.
* The costs of publication of this article were defrayed in part by the payment of page charges.
A distinct third class of P-450 inducers is the steroids, and PCN is a prototype for this class (19). Glucocorticoids such as dexamethasone, as well as PCN, have been shown to regulate de novo protein synthesis of one or more forms of P-450 (20, 21). Although only one PCN-induced P-450 protein has been purified and characterized to date (22), Southern blot hybridization reveals that at least 50 kilobases of genomic DNA hybridizes with a PCN-inducible P-450 cDNA clone, suggesting the presence of multiple genes (23, 24). Evidence from Northern blot hybridization (23) and direct annealing of the P-450PCN cDNA against a cDNA clone homologous to phenobarbital-induced P-450b (24) suggests that the PCN-inducible cytochromes P-450 are products of a gene family distinct from either the phenobarbital- or TCDD-inducible P-450 gene families. Furthermore, chromosomal localization studies have demonstrated that the P-450PCN and the phenobarbital-inducible P-450 gene families map to mouse chromosomes 6 and 7, respectively (24, 25), and the TCDD-inducible P-450 gene family maps to mouse chromosome 9 (26).
In this paper the full-length cDNA nucleotide and deduced amino acid sequences of PCN-inducible P-450PCN are reported and compared with those of the phenobarbital-inducible and TCDD-inducible P-450 gene families.

EXPERIMENTAL PROCEDURES
Total polysomal mRNA was isolated from 150-g male rats that had been administered PCN 24 h prior to killing (23). Double-stranded cDNA was produced and ligated into λgt11 (27) by use of EcoRI linkers (28). Plaque hybridization (29) was then carried out with pP450PCN-10 to isolate phage containing P-450PCN cDNA inserts. pP450PCN-10 contains a 3' portion of about 1600 bp of its corresponding 2300-nucleotide mRNA (23). Digestion of several purified phage DNA preparations with EcoRI revealed the presence of a few isolates with approximately 2100-bp inserts consisting of separate 500- and 1600-bp fragments. These fragments were cloned into pBR322, amplified, and subsequently isolated. M13 shotgun libraries of the two fragments were prepared by use of the sonication method (30). Briefly, the separate inserts were circularized by ligation and sonicated by four 10-s bursts of 50 watts each with a Branson Sonifier. The resultant fragments were electrophoresed on a 1% agarose gel; DNA fragments between 400 and 800 bp were collected by excision and electroelution. DNA was made blunt-ended with T4 DNA polymerase and ligated into M13 mp11 replicative form that had been linearized with SmaI. Transformation was carried out with Escherichia coli JM103, and plaques with inserts were confirmed by amplifying individual phage in liquid culture, spotting the phage supernatant on nitrocellulose filters, and annealing with 32P-labeled nick-translated P-450PCN cDNA inserts. Sequencing was carried out by the dideoxy chain terminator method (31) with the use of the 17-bp primer from P-L Biochemicals. Each nucleotide was read an average of 7 times and a minimum of once in each direction (opposite DNA strands). Sequence data were analyzed by use of standard nucleotide (32-34) and protein (34) computer programs.
¹ The abbreviations used are: TCDD, 2,3,7,8-tetrachlorodibenzo-p-dioxin; PCN, pregnenolone 16α-carbonitrile; P-450PCN, that form of cytochrome P-450 inducible by PCN and the cDNA of which has been sequenced in this report; bp, base pairs.

RESULTS
Sequence of P-450PCN Full-length cDNA. The λgt11 vector (27) was used because a large number of clones can be generated and easily screened by plaque hybridization, thereby increasing the probability of obtaining a particular full-length cDNA. Indeed, from 5 µg of poly(A) RNA, 1 × 10⁷ total recombinant phage were produced, and approximately one-half contained rat cDNA inserts. When 20,000 plaque-forming units were screened with pP450PCN-10, 20 positives were selected. This percentage of positive clones is proportional to the estimated abundance of induced P-450PCN mRNA in total poly(A) RNA (23). Six phage were isolated and grown; three had cDNA inserts of about 2.1 kilobases consisting of two EcoRI fragments. The two EcoRI inserts were then subcloned into pBR322 and completely sequenced. The sequence across the single EcoRI site in the cDNA was obtained by restriction site-specific cloning of a fragment from the intact phage cDNA insert.
Rat P-450PCN full-length cDNA contains 2038 bp (Fig. 1) and, therefore, is of similar length to rat P-450d and P-450e cDNAs (13-15) but shorter than rat P-450c (16). Since poly(A) tracts are typically 200 to 300 residues, the P-450PCN mRNA would be 2.3 to 2.4 kb in length; this estimate agrees well with the size of P-450PCN mRNA, as determined by Northern blot analysis (23). A continuous reading frame of P-450PCN cDNA spans nucleotides 82 through 1596, leaving 81 bp of 5' leader sequence and more than 400 bp of 3' noncoding sequence. Moreover, the codon usage for rat P-450PCN (Table I) was consistent with the usage of more than 20,000 codons in the rat gene data base. This is particularly evident when the codon usage is examined for leucine, isoleucine, valine, serine, proline, threonine, alanine, and arginine; in these cases the most infrequently used codons in P-450PCN translation are also the most infrequently used codons in the cumulative data base (Table I). The codon usage of P-450PCN is also compared with that of rat P-450c and P-450e in Table I. The UGA termination codon of P-450PCN and P-450e is the same as that of other rat genes in the data base, whereas P-450c uses UAG for termination. Otherwise, those codons used most infrequently in P-450c and P-450e are similar to those used most infrequently in P-450PCN. A single poly(A) addition site in the P-450PCN cDNA (Fig. 1) was found in its expected position 25 to 30 bp from the poly(A) tract.
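The coordinates quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming that positions are 1-based and inclusive and that the 82-1596 span includes the termination codon; the function name `orf_summary` is introduced here for illustration only and is not part of the original analysis.

```python
# Coordinates quoted in the text (assumed 1-based, inclusive).
CDNA_LENGTH = 2038
ORF_START = 82
ORF_END = 1596


def orf_summary(cdna_length, orf_start, orf_end):
    """Derive codon and flanking-region counts from reading-frame coordinates."""
    coding_nt = orf_end - orf_start + 1
    if coding_nt % 3 != 0:
        raise ValueError("reading frame is not a whole number of codons")
    codons = coding_nt // 3
    return {
        "coding_nt": coding_nt,
        "codons_including_stop": codons,
        # Assumption: the last codon in the span is the stop codon.
        "amino_acids": codons - 1,
        "five_prime_leader_nt": orf_start - 1,
        "three_prime_noncoding_nt": cdna_length - orf_end,
    }


summary = orf_summary(CDNA_LENGTH, ORF_START, ORF_END)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions the span corresponds to 505 codons, that is, 504 amino acids plus a stop codon, with an 81-nucleotide leader and a 3' noncoding region of more than 400 nucleotides, in agreement with the figures given in the text.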
Analysis of the P-450PCN Protein. Amino-terminal sequencing was carried out on purified P-450PCN in order to verify the correct cDNA reading frame. Except for positions 1 and 5, perfect agreement was obtained between the automated Edman degradation and the cDNA encoding the first 20 amino acids (Table II). For unknown reasons, the first and fifth cycles did not yield clearly identifiable phenylthiohydantoin derivatives. Comparison of the amino acid composition of P-450PCN, P-450c, and P-450e (Table II) shows several striking differences, giving further evidence that these three proteins are derived from distinct gene families. For example, P-450c and P-450e histidines are almost twice as abundant as P-450PCN histidine; P-450PCN and P-450c tryptophans are at least five times as common as P-450e tryptophan.
The P-450PCN protein contains 504 residues and has a calculated molecular weight of 57,917. This value is almost 6 kDa higher than the estimated molecular mass of 52,000 determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (22). This type of variation has also been noted for the molecular masses derived from cDNA sequences of rat P-450d (15), mouse P3-450 (17), mouse P1-450 (18), and rat P-450e (13), in which cases the deviations were in the range of 3 to 5 kDa.
Comparison of the hydropathy indices for P-450PCN, P-450c, and P-450e reveals many dissimilarities (Fig. 2). This is not unexpected, however, since these three proteins represent products of three different gene families. Conversely, hydropathy index patterns for proteins within the same gene family (e.g. P1-450 and P3-450; P-450e and rabbit form 2) tend to be quite similar (18). Two similarities are noted, however, between the three proteins in Fig. 2. The first is a stretch of 15 to 25 hydrophobic amino acids present at the amino terminus of each of the three proteins; this region may serve as a common functional domain for all of the P-450 proteins, or it may simply function as a membrane-binding signal during translation (38). The second point of similarity is the large segment of hydrophilic amino acids (residues ~400 to ~440) appearing just before the conserved C-terminal cysteinyl peptide (Fig. 2, bar 3) in all three proteins. This segment may serve some function related to the enzyme active site, or it may be required for proper recognition of the oxidoreductase.
Comparison of P-450PCN Nucleotide and Protein Sequences with Other P-450 cDNA and Protein Sequences. The nucleotide and amino acid sequences of P-450PCN were compared with those of other P-450 cDNAs and proteins published to date (13-18, 39-42). The data are summarized in Table III. The P-450PCN cDNA nucleotide sequence has diverged more than 60% from the other P-450 cDNAs, whereas the P-450PCN amino acid sequence has diverged 67% or more from the other P-450 proteins.
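The relation between the similarity and divergence figures (for example, 33% amino acid similarity corresponding to 67% divergence) can be illustrated as follows. This is a minimal sketch, assuming that divergence is simply 100 minus percent identity over the aligned, ungapped positions; the alignment programs actually used (32-34) may treat gaps differently, and the two short sequences below are made-up toy strings, not P-450 data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned positions without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


def percent_divergence(aligned_a, aligned_b, gap="-"):
    """Divergence defined here as the complement of percent identity."""
    return 100.0 - percent_identity(aligned_a, aligned_b, gap)


# Hypothetical toy alignment (not taken from the paper).
toy_a = "MDLLSAL-TL"
toy_b = "MELLTALATL"
identity = percent_identity(toy_a, toy_b)
divergence = percent_divergence(toy_a, toy_b)
print(f"identity {identity:.1f}%, divergence {divergence:.1f}%")
```

Whether gapped columns are skipped or counted as mismatches changes the result, so values computed this way are comparable only when the same convention is used throughout.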
Amino acid sequences of the so-called N-terminal and C-terminal cysteinyl fragments were also compared for nine P-450 proteins (13-18, 39-42). It can be seen (Fig. 3) that P-450PCN represents a member of a distinct P-450 gene family, although all eight eukaryotic proteins exhibit some relatedness to one another, as well as to the prokaryotic P-450cam (42). One striking finding in the P-450PCN sequence is the absence of cysteine in the so-called "N-terminal conserved cysteinyl fragment," postulated (41) to be the donor of the thiolate ligand to the heme in the enzyme active site. In the case of P-450PCN, isoleucine (using the codon ATT) replaces cysteine. An identical amino acid substitution appears to have occurred in P-450scc, the only other P-450 protein reported to date lacking a cysteine in its N-terminal conserved cysteinyl fragment (39); however, the isoleucine codon in this case was ATC.
Two alignments are shown for P-450c, P1-450, P-450d, and P3-450 (Fig. 3, top). Of interest, the N-terminal fragment centered around Cys-160 or Cys-158 has considerably less homology with the other P-450 proteins than an N-terminal fragment centered around Tyr-172 or His-170 having no cysteine.
In contrast, the P-450PCN C-terminal cysteinyl fragment (Fig. 3, bottom) exhibits considerably more homology with the corresponding fragment from the other eight entries. Particularly noteworthy is the presence of phenylalanine, glycine, glycine, arginine, glycine, alanine, glutamine, leucine, and phenylalanine in positions -7, -6, -4, -2, +6, +9, +12 and +13, respectively, relative to cysteine. In this C-terminal fragment P-450PCN appears to be about equally similar to members of the TCDD-inducible and phenobarbital-inducible P-450 gene families. The data in Fig. 3, therefore, lend further support to the hypothesis (17, 18, 37) that the C-terminal conserved cysteinyl fragment is important in the enzyme active-site.
Global nucleotide and amino acid alignments between P-450PCN and members of the phenobarbital- and TCDD-inducible P-450 gene families appeared to show more homology in the C-terminal half of the protein. This finding is particularly well illustrated by the dot matrix analysis in Fig. 4. Homology across the C-terminal region of the P-450 proteins suggests that this portion of the molecule may play a more important role than the N-terminal half of the molecule in some common P-450 function or property, such as heme binding or interaction with NADPH-cytochrome P-450 oxidoreductase or cytochrome b5. Segments of homology in the N-terminal half of the proteins can be seen in Fig. 4, however, especially when P-450PCN is compared with P-450e.
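The principle of a dot matrix comparison can be sketched as follows. This is an illustrative outline only: the window size and match threshold are assumed parameters, the exact settings used for Fig. 4 are not stated in the text, and the two sequences are made-up toy strings rather than P-450 data.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return (i, j) start positions where a window of seq_a matches seq_b.

    A dot is recorded when at least `min_matches` of the `window`
    aligned residues are identical (positions are 0-based).
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


def render(dots, len_a, len_b, window):
    """Draw the dot matrix as text, one row per window start in seq_a."""
    dot_set = set(dots)
    rows = []
    for i in range(len_a - window + 1):
        row = "".join(
            "*" if (i, j) in dot_set else "."
            for j in range(len_b - window + 1)
        )
        rows.append(row)
    return "\n".join(rows)


# Hypothetical toy sequences (not taken from the paper).
toy_a = "MKTAYFGRGA"
toy_b = "QQTAYWFGRG"
toy_dots = dot_matrix(toy_a, toy_b, window=3, min_matches=3)
print(render(toy_dots, len(toy_a), len(toy_b), window=3))
```

Runs of dots along a diagonal mark stretches of similarity; a shift from one diagonal to another, as between the two matching segments in the toy example, corresponds to an insertion or deletion between the sequences.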

DISCUSSION
The complete cDNA and deduced amino acid sequences of PCN-inducible P-450PCN have been determined. Although the P-450PCN amino acid sequence has diverged at least 67% from every other P-450 sequence reported to date, it has still retained certain features in common with all other members of the P-450 superfamily. These include (i) the hydrophobic N-terminal region (except in the case of the mitochondrial P-450scc, in which post-translational processing occurs (39)) and (ii) the conserved C-terminal cysteinyl fragment containing Cys-443.
The comparative data in this report also allow an estimate of when the P-450PCN gene separated from the other P-450 gene families that have been sequenced to date. The unit evolutionary period is the time, in millions of years, required for 1% divergence in the amino acid sequence of a species-related protein (43). This rate of divergence becomes increasingly nonlinear as one goes further back in time (43, 44). Fossil data and other protein sequence data indicate that the rabbit-rodent split occurred about 60 million years ago (43, 44) and the rat-mouse split occurred about 17 million years ago (45). Based on phenobarbital-inducible rabbit and rat P-450 amino acid sequence data (46) and mouse and rat P-450 amino acid sequence data (18), the P-450 unit evolutionary period has been estimated as 2.1 (46) and 2.4 (18), respectively. Since the P-450PCN protein has diverged at least 67% from any other P-450 protein reported to date, it can be concluded that the steroid-inducible P-450 gene family separated from the phenobarbital- and TCDD-inducible P-450 gene families much more than 200 million years ago.
All P-450 families so far characterized have been found on different chromosomes. The phenobarbital-inducible P-450 genes have been linked to the Coh locus near the proximal end of mouse chromosome 7 (25), whereas the TCDD-inducible P-450 gene family has been localized to mouse chromosome 9 (26). The gene encoding the P-450 responsible for C-21 hydroxylation of steroids maps very close to the H-2 locus on mouse chromosome 17 (47). The PCN-inducible P-450 gene family has recently been assigned to mouse chromosome 6 (24), the same chromosome to which NADPH-cytochrome P-450 oxidoreductase maps (24). It therefore appears that the above-mentioned P-450 gene families diverged from one another so long ago that the members of the superfamily are no longer localized to one chromosomal region. The same conclusion can be drawn for the globin and immunoglobulin gene superfamilies.
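The divergence-time argument given earlier in this Discussion (unit evolutionary period multiplied by percent divergence) can be explored numerically. The text does not state how the nonlinearity at large divergence was handled, so the sketch below makes an explicit assumption: it applies a simple Poisson correction for multiple substitutions at the same site, a common textbook choice that is not necessarily the method behind the authors' conclusion. The function names are introduced here for illustration only.

```python
import math

# Values quoted in the text.
PERCENT_DIVERGENCE = 67.0          # minimum amino acid divergence of P-450PCN
UNIT_PERIODS = (2.1, 2.4)          # million years per 1% divergence (46, 18)


def linear_time(percent_divergence, unit_period):
    """Uncorrected estimate: observed divergence times the unit period."""
    return percent_divergence * unit_period


def poisson_corrected_divergence(percent_divergence):
    """Assumed correction for repeated substitutions at the same site.

    Converts the observed fraction of differing sites p into an estimated
    number of substitutions per 100 sites, -100 * ln(1 - p).
    """
    p = percent_divergence / 100.0
    if not 0.0 <= p < 1.0:
        raise ValueError("divergence must be at least 0% and below 100%")
    return -100.0 * math.log(1.0 - p)


corrected = poisson_corrected_divergence(PERCENT_DIVERGENCE)
linear_estimates = [linear_time(PERCENT_DIVERGENCE, u) for u in UNIT_PERIODS]
corrected_estimates = [linear_time(corrected, u) for u in UNIT_PERIODS]

for u, lin, cor in zip(UNIT_PERIODS, linear_estimates, corrected_estimates):
    print(f"unit period {u}: linear {lin:.0f} My, corrected {cor:.0f} My")
```

Because observed divergence saturates as substitutions accumulate, the uncorrected product understates the elapsed time; under the assumed correction, both unit evolutionary periods give estimates above 200 million years.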
Although only a single PCN-induced P-450 cDNA has been cloned and sequenced, Southern blot analysis (23,24) suggests the presence of multiple P-450PCN-related genes. This question awaits further structural analysis of other PCN-inducible P-450 cDNA and genomic clones. Finally, it should be noted that the precise relationship of the P-450PCN reported in the present study to the cytochrome described by Elshourbagy and Guzelian (22) remains to be established.