Structure and Organization of the Microsomal Xenobiotic Epoxide Hydrolase Gene*

The gene for the microsomal xenobiotic rat liver epoxide hydrolase has been isolated and characterized. Clones were obtained from a Wistar Furth Charon 35 genomic library by hybridization with a full-length epoxide hydrolase cDNA. The gene for the xenobiotic epoxide hydrolase is approximately 16 kilobases in length and consists of 9 exons ranging in size from 109 to 420 base pairs and 8 intervening sequences, the largest of which is 3.2 kilobases. S1 nuclease mapping, primer extension studies, and sequence analysis were used to determine the 5' cap site and the size of the first exon (170 base pairs). Regulatory sequences analogous to TATA, CCAAT, and core enhancer sequences were noted in the 5'-flanking region of the gene. The cDNA and gene for epoxide hydrolase displayed nucleotide sequence identity although they were isolated from different rat strains. Also, Southern blot analysis of restricted liver DNA from inbred Fischer 344 and Wistar Furth rat strains and outbred Sprague-Dawley rats indicated a high degree of structural similarity for the epoxide hydrolase gene within these three strains. Only a single functional epoxide hydrolase gene was identified, and no evidence of hybridization to the genes for the microsomal cholesterol epoxide hydrolase or the cytosolic epoxide hydrolase was observed. However, a pseudogene for the microsomal xenobiotic epoxide hydrolase was isolated and characterized from the genomic library.


The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 50291 1.
‡ Postdoctoral fellow of the American Cancer Society. Present address: Dept. of Pharmacology and the Cancer Research Center, University of Rochester, Rochester, NY 14642.
and the nuclear envelope of several mammalian cell types (7-11) and is readily inducible by phenobarbital (10, 12), N-OH-2-acetylaminofluorene (13), and trans-stilbene oxide (10, 14). Although the function of microsomal epoxide hydrolase is essentially the hydration of reactive epoxides, the enzyme plays a dual role in the metabolism of large polycyclic hydrocarbons which possess a bay region. Epoxide hydrolase inactivates monofunctional epoxides of these compounds and produces the precursor for conversion to a very reactive dihydrodiol bay region epoxide by the monooxygenase system (15).
Cloning of the cDNA for the Sprague-Dawley rat liver microsomal xenobiotic epoxide hydrolase was first reported by Gonzalez and Kasper (16), and the nucleotide sequence and translation of the cDNA were subsequently determined by Porter et al. (17). The cDNA contains an open reading frame of 1365 nucleotides coding for a 52,581-dalton protein consisting of 455 amino acids (17). The amino acid composition of the deduced protein agrees well with the amino acid composition determined by direct analysis of the purified rat protein (17).
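The figures quoted for the cDNA can be cross-checked with simple arithmetic. The sketch below is illustrative only: the three constants are the values cited in the text (17), while the function names are hypothetical helpers introduced here. It confirms that 1365 nucleotides correspond to 455 codons, which suggests that the quoted reading frame length does not count the termination codon, and it derives the mean residue mass implied by the quoted protein mass.

```python
# Values quoted in the text (from ref. 17); everything else is illustrative.
ORF_NUCLEOTIDES = 1365
AMINO_ACIDS = 455
PROTEIN_MASS_DA = 52_581


def codons_in(orf_length_nt: int) -> int:
    """Return the number of complete codons in a reading frame of this length."""
    if orf_length_nt % 3 != 0:
        raise ValueError("reading frame length must be a multiple of 3")
    return orf_length_nt // 3


def mean_residue_mass(protein_mass_da: float, residues: int) -> float:
    """Return the average mass per amino acid residue in daltons."""
    return protein_mass_da / residues


if __name__ == "__main__":
    print("codons:", codons_in(ORF_NUCLEOTIDES))
    print("mean residue mass (Da):",
          round(mean_residue_mass(PROTEIN_MASS_DA, AMINO_ACIDS), 2))
```

A mean residue mass near 115 Da is in the range commonly seen for proteins, so the three quoted numbers are mutually consistent.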
In this report, we describe the organization and partial sequence of the microsomal xenobiotic rat liver epoxide hydrolase gene. The cDNA was used to isolate a family of clones which contained the xenobiotic epoxide hydrolase gene from a Charon 35 Wistar Furth rat liver genomic DNA library. Only a single functional gene has been identified; however, during the characterization of the gene, a pseudogene of epoxide hydrolase was also isolated and characterized.

Structure and Organization of the Epoxide Hydrolase Gene-The intron-exon distribution of the microsomal epoxide hydrolase gene was determined from a family of five unique Charon 35 genomic clones (Fig. 1a). The gene spans approximately 16 kb² of DNA and consists of nine exons and eight introns. Clones λEHE-1 and -2 were used to localize the first three exons while three different overlapping clones were utilized to position the final six exons. Approximately 11 kb of flanking DNA lie 5' of the first exon while the 3'-flanking region extends 9 kb downstream from the polyadenylation site (Fig. 1).
¹ Portions of this paper (including "Materials and Methods" and Figs. 2, 4-7, and 9) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 86M-3804, cite the authors, and include a check or money order for $4.00 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.
² The abbreviations used are: kb, kilobase; bp, base pair; AMV, avian myeloblastosis virus (Miniprint); PIPES, 1,4-piperazinediethanesulfonic acid (Miniprint).
Of particular interest is the fact that a small segment of DNA (approximately 1.5 kb) within the third intron remains uncloned (dashed line, Fig. 1a). Even though the genomic library was exhaustively screened, no clones were isolated that linked λEHE-2 and λEHE-3. Numerous attempts were made to isolate specific restriction fragments suspected of containing this region of the gene, but cloning these fragments into Charon 35 or M13mp vectors proved unsuccessful. A variety of bacterial hosts were also used in these cloning attempts, including Escherichia coli strains K802 (recA+), HB101 (recA), and CES 200 (recBC, sbcB). In order to analyze the size and restriction sites present in the gap, genomic rat liver DNA was hybridized with probes specific for exons 3 and 4 (Fig. 1a). Fig. 2I shows the hybridization of restricted genomic rat liver DNA with the probe specific for exon 4. This probe, transcribed as described under "Materials and Methods," hybridized with a single 2.6-kb DNA fragment in the BamHI digest (Fig. 2I) but did not hybridize with the epoxide hydrolase pseudogene (discussed later in text). The probe is specific for the portion of exon 4 which is 5' to the BamHI site in exon 4 and therefore also specific for the 1.1-kb Sau3A-BamHI fragment at the 5'-end of λEHE-3 (Fig. 1a). Since this probe recognizes a 2.6-kb BamHI fragment in genomic DNA (Fig. 2I), the gap is at least 1.5 kb in length. As expected from the restriction map of the genomic clones (Fig. 1), the probe also hybridized to a 1.4-kb fragment in the XbaI digest. Hybridization to a 4.5-kb fragment in the HindIII digest suggests that a HindIII site is present in the missing DNA segment. Also, a 10.5-kb fragment was detected in the EcoRI digest (Fig. 2I), consistent with the location of EcoRI sites in the genomic clones (Fig. 1) and a gap of approximately 1.5 kb of DNA. Fig. 2II shows the hybridization of digested genomic DNA with the 1.7-kb BamHI fragment of λEHE-2 which contains exon 3 (Fig. 1). These restriction digests contain multiple bands because this probe also hybridizes to the pseudogene. In the BamHI digest, the 1.7-kb fragment is detected as well as the cross-hybridizing 1.65-kb BamHI fragment derived from the pseudogene (Fig. 2II). The 10.5-kb EcoRI fragment previously detected by the exon 4-specific probe also hybridized with the exon 3 probe (Fig. 2II). These results suggest that this EcoRI fragment contains the missing DNA and extends into regions of the gene detected by both probes. Furthermore, the exon 3-specific probe hybridized to 6.0- and 9.0-kb XbaI fragments and to 10- and 13-kb fragments in the HindIII digest (Fig. 2II). The 6.0-kb XbaI fragment and the 13-kb HindIII fragment are apparently derived from the pseudogene (to be discussed). Detection of a 9-kb XbaI fragment and a 10-kb HindIII fragment with the exon 3-specific probe also agrees with the restriction map of the gene (Fig. 1), assuming the gap is only 1.5 kb in length. A gap of 1.5 kb would also mean that intron 3 is approximately the same size as intron 7 and shorter than intron 1, the largest intron (Fig. 1).
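The lower bound on the size of the uncloned segment follows from a subtraction: the genomic BamHI fragment detected by the exon 4 probe (2.6 kb) is longer than the portion of that fragment already present in the cloned DNA (the 1.1-kb Sau3A-BamHI fragment). A minimal sketch of that reasoning is given below; the function name is a hypothetical helper, and only the two fragment sizes come from the text.

```python
def minimum_gap_kb(genomic_fragment_kb: float, cloned_portion_kb: float) -> float:
    """Return the least amount of uncloned DNA implied by a Southern blot band.

    genomic_fragment_kb: size of the band detected in digested genomic DNA.
    cloned_portion_kb:   size of the part of that band accounted for by clones.
    """
    gap = genomic_fragment_kb - cloned_portion_kb
    if gap < 0:
        raise ValueError("cloned portion cannot exceed the genomic fragment")
    # Round to the precision at which gel fragment sizes are reported.
    return round(gap, 2)


if __name__ == "__main__":
    # 2.6-kb BamHI genomic fragment minus the 1.1-kb cloned Sau3A-BamHI fragment.
    print(minimum_gap_kb(2.6, 1.1))
```

The result is only a lower bound because the estimate fixes the position of one BamHI site in the gap; it does not by itself exclude additional uncloned DNA beyond that site.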
In order to determine the number of exons and to identify the intron-exon junctions, all regions of λEHE-2 and -3 hybridizing with the full-length cDNA were sequenced. Fig. 3 displays the sequence and translation of the exons, the intron-exon junctions, and the immediate 5'- and 3'-flanking regions. The gene consists of 9 exons ranging in size from 109 to 420 base pairs and 8 intervening sequences. The largest intron is approximately 3.2 kb in length. All of the intron-exon junctions are completely consistent with published consensus sequences for donor and acceptor sites (26), and four of the splice sites occur within glycine codons. Furthermore, the exonic sequences account for the complete cDNA sequence; hence, the gap is located entirely within intron C (Fig. 3).
Primer Extension and S1 Nuclease Analysis of the Cap Site-The location of the cap site and the size of the first exon were determined by primer extension analysis and S1 nuclease experiments. S1 nuclease experiments were done with the 565-bp BamHI-PstI fragment from the 5'-end of the 1.95-kb BamHI piece containing exon 1 (Fig. 1a, expanded scale). The BamHI-PstI fragment was labeled with [γ-32P]ATP and polynucleotide kinase, then hybridized to epoxide hydrolase-immunoenriched poly(A)+ mRNA from trans-stilbene oxide-treated Wistar Furth rats and to poly(A)+ mRNA from Sprague-Dawley rats induced with 2-acetylaminofluorene. Fig. 4A shows that RNA from both rat strains (lanes 4 and 5) protected an approximately 161-bp fragment from S1 nuclease digestion. Also, the immunoenriched poly(A)+ mRNA gave a greater signal than the 2-acetylaminofluorene-induced poly(A)+ mRNA even though 20 times less RNA was used. Poly(A)+ mRNA from Sprague-Dawley rats treated with trans-stilbene oxide also gave identical results in the S1 nuclease experiments (data not shown).
The primer used in the primer extension studies was a 78-bp SphI-SinI fragment from the 5'-end of pEH52 (17). Furthermore, sequence analysis demonstrated that this fragment was completely contained within the first exon of the gene. Fig. 4B displays the extension of the primer by reverse transcriptase following hybridization to the poly(A)+ mRNAs from both the Wistar Furth and Sprague-Dawley rats which were used in the S1 nuclease experiments. In each case, the 78-bp primer was extended 75 bases (Fig. 4B). Both the S1 nuclease-protected fragment and the extended primer specified the same cap site. Furthermore, S1 nuclease and primer extension experiments, plus the sequence of the first exon, indicated that the first exon is 170 bp in length. Also, identical results were obtained in both types of experiments using RNA from two different strains of rats which had been treated with different inducing agents.
Organization and Structure of a Microsomal Epoxide Hydrolase Pseudogene-During the screening of the Wistar Furth genomic DNA library with pEH52, a second overlapping family of clones (λEHP) was isolated which also hybridized to the cDNA. Analysis of these clones subsequently indicated that they represent a pseudogene of microsomal epoxide hydrolase. Fig. 6 shows the five clones used to characterize the pseudogene and a partial restriction map of these clones. The lower case letters define sequences in the pseudogene hybridizing with the epoxide hydrolase cDNA.
Hybridization analysis of these clones indicated that the pseudogene spanned approximately 14 kilobases of DNA and possessed exons separated by intervening sequences. However, when small fragments of pEH52 were used as probes, several regions of pEH52 failed to hybridize to the pseudogene clones. Fig. 7 shows the result of hybridization of the subcloned 5'-end of pEH52, which corresponds to the first 136 base pairs of the first exon, to λEHP-2 and λEHE-1. Even though the λEHP clones possessed approximately 8 kb of DNA 5' to the first site of hybridization with pEH52, no hybridization signal was observed corresponding to the first exon of pEH52. The 5'-most hybridization signal observed in the λEHP clones (Fig. 6, region a) is homologous to the second exon of the gene. Further hybridization analysis indicated that in Fig. 6, regions a, b, c, d, and e correspond to exons 2, 3, 5, 7, and 9, respectively, of the epoxide hydrolase gene (data not shown).
[FIG. 3. Nucleotide sequence and translation of the epoxide hydrolase gene exons (cDNA positions 1040 to 1365 and adjacent 3'-flanking sequence); sequence not legible in this copy.]
To further analyze the structure of the pseudogene and its relationship to the epoxide hydrolase cDNA, the region around the HindIII site in the 2.3-kb BamHI fragment of λEHP-5 was sequenced (Fig. 6). This HindIII site is analogous to the HindIII site in exon 9 of the gene (Fig. 1). Fig. 8 is a best fit computer analysis of this fragment of the pseudogene and the last 700 bases at the 3'-end of the epoxide hydrolase cDNA (17), which contains the sequences coded for by part of exon 6 and all of exons 7, 8, and 9. The sequence of the pseudogene indicates that this DNA fragment contains the acceptor splice site analogous to the acceptor splice site for exon 7 of the gene and a high degree of sequence identity with the cDNA in the region corresponding to exon 7. After 81 nucleotides, this identity abruptly ends prior to the donor splice site for exon 7 (base 155, Fig. 8). In this continuous fragment of the pseudogene, no regions of similarity are found with the cDNA corresponding to the last third of exon 7 or to exon 8 (Fig. 8). However, at the site where the acceptor splice site for exon 9 should occur, a high degree of identity (94%) is again observed between the pseudogene and the region corresponding to exon 9 in both the translated and 3'-nontranslated regions of the cDNA. These results indicate that the pseudogene does not include part of exon 7 and all of exon 8 in their proper locations. Also, the sequence present between exons 7 and 9 in the pseudogene which is not homologous with the cDNA contains at least two termination codons in all three possible reading frames (Fig. 8). Hence, the results strongly suggest that the λEHP family of clones represents a pseudogene of the epoxide hydrolase gene and is missing all or part of several exons.
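The statement that the nonhomologous segment carries at least two termination codons in all three reading frames can be checked mechanically for any sequence. The sketch below is illustrative and is not the analysis program used for Fig. 8; the function names and the short test sequence are hypothetical, chosen so that each forward frame contains exactly two stop codons.

```python
from typing import Dict, List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_by_frame(seq: str) -> Dict[int, List[int]]:
    """Map each forward reading frame (0, 1, 2) to the 0-based start
    positions of the termination codons found in that frame."""
    seq = seq.upper()
    result: Dict[int, List[int]] = {}
    for frame in range(3):
        result[frame] = [
            i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS
        ]
    return result


def blocked_in_all_frames(seq: str, minimum: int = 2) -> bool:
    """Return True if every forward frame has at least `minimum` stop codons."""
    return all(len(hits) >= minimum for hits in stops_by_frame(seq).values())


if __name__ == "__main__":
    # Hypothetical sequence, not taken from the pseudogene.
    example = "TAATAACTAGTGACTGATAA"
    print(stops_by_frame(example))
    print(blocked_in_all_frames(example))
```

A segment that passes this test cannot encode a continuous polypeptide in any forward frame, which is the basis for treating it as noncoding.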
Analysis of Genomic DNA from Different Rat Strains-Since the cDNA, isolated from outbred Sprague-Dawley rats, displays sequence identity with the gene from inbred Wistar Furth rats, the organization of epoxide hydrolase-hybridizable sequences in genomic DNA of these rat strains and inbred Fischer 344 rats was investigated. Total liver DNA from the three strains was digested with several restriction enzymes and hybridized with pEH52. Fig. 9 shows a high degree of similarity between the three strains, and in contrast to patterns typical of multigene families such as the cytochromes P-450 (30), a relatively simple pattern of restriction fragments was observed. HindIII digests of DNA from all three strains exhibited the same number and pattern of fragments (Fig. 9); however, several fragments of different sizes were observed among the strains in the BamHI and EcoRI digests. It is not known if these differences are due to fragments generated from the epoxide hydrolase gene or pseudogene; however, the number and sizes of restriction fragments in Fig. 9 as well as the gene-cDNA sequence homology (Fig. 3) strongly suggest that there is only one functional epoxide hydrolase gene in these rat strains.

DISCUSSION
Several forms of epoxide hydrolase have been identified and characterized in rat liver. These include at least two microsomal forms (3-6) and a cytoplasmic form (1, 2). The three forms of epoxide hydrolase have been considered to be distinctly different since they display different substrate specificities and antibodies raised against any of the different forms do not recognize the other forms (4). During the cloning of both the cDNA and gene for the microsomal xenobiotic epoxide hydrolase, no evidence was obtained to suggest that the forms of epoxide hydrolase are related at the molecular level. The cDNA and gene for the microsomal xenobiotic epoxide hydrolase did not detect any other sequences which were derived from the other forms of epoxide hydrolase. Whether the different forms of epoxide hydrolase have been produced by divergent or convergent evolution is still not known. Their relationship will not be better understood until the other forms of epoxide hydrolase are cloned and sequenced.
FIG. 8. Comparison of nucleotide sequences of the epoxide hydrolase pseudogene and the epoxide hydrolase cDNA. The upper strand is the sequence of the final region (Fig. 6, segments d and e) of λEHP-5 observed to hybridize to the epoxide hydrolase cDNA. The lower strand is the sequence of the final 700 bases of the epoxide hydrolase cDNA (17). Arrows in the cDNA sequence indicate the sites where introns were located (Fig. 3). The two sequential termination codons (TGA and TAG) present in the cDNA are underlined. Sequences were aligned using the best fit analysis of the Staden program (25).
Since the cDNA was cloned from an outbred Sprague-Dawley strain and the gene was obtained from the inbred Wistar Furth strain, the nucleotide sequences were compared in order to detect possible differences. With one exception, the nucleotide sequences of the exons and cDNA were found to be identical (Fig. 3). The only region of nonidentity between the cDNA and gene was observed within the first 18 nucleotides of the cDNA, pEH52 (Fig. 5). However, this stretch of 18 nucleotides is the reverse complement of a sequence occurring in the 3'-nontranslated region of the cDNA (17). This particular region is not present in pEH52 but is found in an overlapping cDNA (pEH4) (17). The position of the 18 bases in pEH4 coincides with the sequence immediately following the 3'-end of pEH52. It is possible that this difference in the sequence between the cDNA and gene represents a strain difference in the rats used to obtain the cDNA and gene; however, this conclusion is not supported by the primer extension and S1 nuclease analysis (Fig. 4) of the cap site of the gene using poly(A)+ mRNA from both rat strains. Since poly(A)+ mRNAs from two different rat strains were used in these experiments, if the sequence of the 18 bases at the 5'-end of pEH52 (Fig. 5) were correct, then the poly(A)+ mRNA from the Sprague-Dawley rats should have given a DNA fragment 45 bases shorter in the S1 nuclease experiments than that obtained with the Wistar Furth mRNA. Similarly, in the primer extension experiments it would have been unlikely that the primer, which was identical in both the cDNA and gene, would have given the same size extended fragment using both Wistar Furth and Sprague-Dawley mRNA as templates. However, fragments of identical length were detected with both types of mRNA in both the S1 nuclease and primer extension experiments. These results then suggest that the 18 nucleotides at the 5'-end of pEH52 represent a cloning artifact and not a strain difference.
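The observation that the anomalous 18 nucleotides are the reverse complement of a 3'-nontranslated sequence is the kind of relationship that a short script can test. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the two sequences are invented placeholders, since the actual 18-nucleotide stretch is not reproduced in the text.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_reverse_complement(query: str, target: str) -> int:
    """Return the 0-based position in `target` at which the reverse
    complement of `query` occurs, or -1 if it is absent."""
    return target.upper().find(reverse_complement(query).upper())


if __name__ == "__main__":
    # Hypothetical 18-nucleotide 5' stretch and 3'-nontranslated region.
    five_prime_stretch = "GATTACAGGCTTAACCGT"
    three_prime_region = "CCCCACGGTTAAGCCTGTAATCTTTT"
    print(reverse_complement(five_prime_stretch))
    print(find_reverse_complement(five_prime_stretch, three_prime_region))
```

A non-negative result would indicate that the 5' stretch could have arisen by inverted copying of a downstream sequence, which is the pattern expected of a cloning artifact rather than of a strain polymorphism.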
The sequence GATAAA, which is analogous to the AATAAA polyadenylation identifier sequence, is found 15 bases upstream from the site of polyadenylation in both the cDNA and gene (Fig. 3). Also, downstream from the polyadenylation site in the gene are two sequences, GTGTGTG and TGTTTCTT (29), reported to be involved in the normal cleavage and polyadenylation of 3' termini of other cDNAs.
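A variant signal such as GATAAA differs from the canonical AATAAA hexamer at a single position, so it can be located by scanning the region upstream of the polyadenylation site for hexamers within one mismatch of the canonical sequence. The sketch below illustrates this; the function names, the window size, and the test sequence are assumptions made here, and the distance is measured from the first base of the hexamer to the polyadenylation site, which may not be the convention used in the text.

```python
from typing import List, Tuple

CANONICAL_SIGNAL = "AATAAA"


def hamming(a: str, b: str) -> int:
    """Return the number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def signal_candidates(seq: str, polya_site: int, window: int = 30,
                      max_mismatch: int = 1) -> List[Tuple[int, str]]:
    """List (distance_upstream, hexamer) pairs for hexamers that lie within
    `window` bases upstream of `polya_site` and differ from AATAAA at no
    more than `max_mismatch` positions."""
    seq = seq.upper()
    size = len(CANONICAL_SIGNAL)
    hits = []
    for i in range(max(0, polya_site - window), polya_site - size + 1):
        hexamer = seq[i:i + size]
        if hamming(hexamer, CANONICAL_SIGNAL) <= max_mismatch:
            hits.append((polya_site - i, hexamer))
    return hits


if __name__ == "__main__":
    # Hypothetical 3' end; the polyadenylation site is taken as index 20.
    example = "CCGCGGATAAACCGCGCCGC"
    print(signal_candidates(example, polya_site=20))
```

Allowing one mismatch keeps the scan specific while still recovering the single-base variant described above.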
During the isolation of the epoxide hydrolase gene, no other genomic clones were identified to suggest the presence of more than a single functional microsomal xenobiotic epoxide hydrolase gene in Wistar Furth rats. The presence of a single gene in rats is also consistent with the apparent inability to differentiate multiple forms of this microsomal epoxide hydrolase by the use of different inducing agents (31). The observation that identical results were obtained in the S1 nuclease and primer extension studies (Fig. 4) even though RNA from rats treated with different inducing agents was utilized also supports this idea. Our data also suggest that the report by Guengerich et al. (32) of the purification of multiple forms of epoxide hydrolase from rat liver is most likely an artifact of the purification procedures (33). Using a monoclonal antibody to rat liver microsomal epoxide hydrolase, Telakowski-Hopkins et al. (34) also failed to identify more than a single form of epoxide hydrolase in rat liver microsomes.
Although microheterogeneity has been observed in microsomal epoxide hydrolase activity in different strains of mice (35), no evidence for multiple functional alleles of the epoxide hydrolase gene has been observed in rats during the cloning of the cDNA or gene. Oesch et al. (36) also previously examined 22 strains of both inbred and outbred rats and did not observe genetic polymorphism in epoxide hydrolase activity. The sequence homology of the cDNA and gene as well as the lack of complexity of the restricted genomic DNA from the different rat strains (Fig. 9) is also consistent with a single allele coding for microsomal epoxide hydrolase in these rat strains. However, it is possible that other rat strains may possess an allele not represented in the outbred Sprague-Dawley rats. Rampersaud et al. (37) have demonstrated three isoforms of P450h in rats and only two of the isoforms were present in outbred Sprague-Dawley rats.
The presence of a single epoxide hydrolase gene in rat liver means that the promoter region of the gene must be capable of responding to the known inducers of microsomal epoxide hydrolase, such as phenobarbital, trans-stilbene oxide, and 2-acetylaminofluorene (12-14). Dexamethasone has also recently been shown to decrease the transcription of the epoxide hydrolase gene (38). Differences that are observed in the level of the response of different rat strains to the inducing agents of epoxide hydrolase (36) are probably then related to the level or activity of trans-acting regulatory factors. In the 5'-flanking region of the gene (Fig. 3), sequences analogous to the TATA (CATAAAAA) and CCAAT boxes are present at -22 to -29 and -132 to -136 base pairs, respectively, upstream from the cap site. Sequences showing greater than 85% identity to known enhancer core sequences (27, 28) are also present in both the first intron and the 5'-flanking region (Fig. 3). Furthermore, a 20-base pair indirect repeat (-121 to -141 bp) has also been identified in the 5'-flanking region of the gene that may also have a regulatory function. The position of this sequence is interesting since it is located 10 bases 3' from the CCAAT sequence and 8 bases 5' of a possible enhancer sequence. The repeated sequence has two dissimilar bases in the middle and then a 9-base sequence (CCAGGTCAC) extending in both the 5' and 3' directions. Another point of interest is the ATG sequence located in nontranslated exon 1 (positions 77-79). Although identification and characterization of the sequences involved in regulating expression of the gene have not yet been accomplished, a fragment of the gene containing 160 bp of exon 1 and approximately 1000 bp of the 5'-flanking region of the gene is capable of promoting the transcription of the bacterial chloramphenicol acetyltransferase gene when cloned into pSVO-CAT (39) and transfected into H4IIE C3-V rat liver hepatoma cells.
Hybridization and sequence studies revealed that the λEHP family of genomic clones represented a pseudogene of epoxide hydrolase. The conclusion that the pseudogene is nonfunctional is based on the fact that it lacks DNA sequences corresponding to exon 1 as well as all or part of several other exons (Fig. 8). The similarity in the hybridization pattern of the epoxide hydrolase cDNA to restricted genomic DNA from the different rat strains (Fig. 9) also suggests that the pseudogene is present in all three strains. The pseudogene appears to have arisen by gene duplication and subsequent deletion of one or more exons.