The Structure of the Rat Glutathione S-Transferase P Gene and Related Pseudogenes*

We have isolated the rat placental-type glutathione S-transferase (GST-P) gene from a λ phage library using the GST-P cDNA clone pGP5 (Sugioka, Y., Kano, T., Okuda, A., Sakai, M., Kitagawa, T., and Muramatsu, M. (1985) Nucleic Acids Res. 13, 6049-6057) as a probe. The rat GST-P gene is about 3 kilobase pairs long and contains 7 exons and 6 introns, encoding the same GST-P protein specified by pGP5. The cap site maps 70 nucleotides upstream from the translation initiation site. The canonical promoter "TATA" box was found 27 base pairs upstream from the putative cap site. The 200 base pairs upstream from the cap site are rich in G + C residues (61%), and the hexanucleotide sequence 5'-GGGCGG-3' is found at position -47 to -42. We have also isolated several processed-type pseudogenes, which presumably originated by reverse transcription followed by insertion at target sites.

(Received for publication, August 18, 1986)

Akihiko Okuda, Masaharu Sakai, and Masami Muramatsu
From the Department of Biochemistry, The University of Tokyo Faculty of Medicine, Hongo, Bunkyo-ku, Tokyo 113, Japan
In the course of an attempt to identify a specific pattern of gene expression during chemical hepatocarcinogenesis, we found a protein that increased dramatically in precancerous liver cells, as seen by O'Farrell-type two-dimensional gel electrophoresis (1). This protein, designated originally as p26-6.9 by its molecular weight and pI, was found to be identical with the placental-type glutathione S-transferase (GST-P)¹ reported by Sato et al. (1-3). The glutathione S-transferases are a group of dimeric multifunctional proteins involved in drug biotransformation and xenobiotic metabolism (4-6). GST-P was first found in placenta, but later also in kidney, lung, and testis (3). The concentrations in these normal cells, however, are much lower than in precancerous and cancerous liver (2, 3). GST-P protein, which is hardly detectable in normal rat liver, becomes constitutively expressed in hyperplastic nodules and hepatocellular carcinomas at concentrations nearly two orders of magnitude higher, in every focus examined, irrespective of the kind of carcinogen used (1-3). This extremely high coincidence between hepatocarcinogenesis and the derepression of this protein prompted us to study the induction mechanisms of this enzyme during cancerous changes of the liver cells.

¹ The abbreviations used are: GST-P, glutathione S-transferase P; kb, kilobases; bp, base pair; SDS, sodium dodecyl sulfate.
We have recently isolated a cDNA clone, pGP5, complementary to GST-P mRNA and determined the primary structure of this protein (7). We have also demonstrated that the dramatic increase in the enzyme activity and the protein of GST-P parallels the amount of GST-P mRNA (7). The obvious next step is to isolate the genomic clones of this gene and study their structure and regulation. We report here the isolation and characterization of several genomic clones that hybridize to pGP5. We show by several criteria that one of these clones is most likely the normal gene encoding GST-P and that the others are processed-type pseudogenes.

EXPERIMENTAL PROCEDURES
Materials-Restriction enzymes, polynucleotide kinase, T4 DNA ligase, DNA polymerase (Klenow fragment), bacterial alkaline phosphatase, and nuclease S1 were purchased from Takara Shuzo (Kyoto, Japan), Sankyo (Tokyo, Japan), and Bethesda Research Laboratories and used according to the manufacturers' specifications. [γ-³²P]ATP (specific activity 7000 Ci/mmol) and [α-³²P]dCTP (specific activity 3000 Ci/mmol) were from New England Nuclear. A nick translation kit was purchased from Amersham Corp. A partial HaeIII + AluI rat genomic DNA library cloned into Charon 4A was a generous gift of Thomas Sargent, National Institutes of Health.
DNA Blot Hybridization-DNA isolated from Sprague-Dawley rat liver was digested with restriction enzymes and electrophoresed on a 0.8% agarose gel. The gel was soaked in 0.25 N HCl for 15 min at room temperature, and the DNA was transferred to a nitrocellulose filter after alkali treatment and neutralization (8). Hybridization and washing were carried out as described for the plaque hybridization.
DNA Sequencing-Fragments of DNA digested with various restriction endonucleases were ligated to the replicative form of M13 mp10 or mp11 cleaved with restriction enzymes that produce ends complementary to those of the fragments (9). Competent Escherichia coli cells, strain JM105, were transfected with the ligated DNA, and β-galactosidase-negative plaques were screened. Single-stranded phage DNAs were prepared from these clones and used as templates for the chain terminator sequencing procedure of Sanger et al. (10).
Preparation of RNA and 5' End Analysis-Total cellular RNA was prepared from an acetylaminofluorene-induced rat hepatoma using the guanidine thiocyanate extraction method (11). The location of the cap site of GST-P mRNA was determined by nuclease S1 protection mapping according to Berk and Sharp (12). A HinfI fragment (101 bp) of the GST-P genomic clone corresponding to the 5' end was labeled at the 5' end with [γ-³²P]ATP, denatured with 0.3 N NaOH, and loaded on an 8% polyacrylamide strand separation gel. Both strand-separated fragments were recovered from the gel and hybridized with 50 µg each of total hepatoma RNA in 0.5 M NaCl, 50 mM Tris-HCl, pH 8.0, 1 mM EDTA for 3 h at 65 °C. The reaction mixture was then diluted 10-fold in ice-cold S1 buffer (0.25 M NaCl, 0.03 M sodium acetate, pH 4.6, 1 mM ZnSO4, 100 µg/ml denatured salmon sperm DNA) containing 2000 units/ml of nuclease S1. The S1 digestion was performed at 30 °C for 60 min. After the reaction, DNA was precipitated with ethanol and analyzed on an 8 M urea, 10% polyacrylamide gel (13).

RESULTS AND DISCUSSION
Southern Blot Analysis of Rat Genomic DNA-To estimate the number of GST-P genes in the rat genome, we carried out Southern blot hybridization with probes I and II, which cover the amino- and carboxyl-terminal portions of GST-P, respectively (Fig. 1A). These probes were hybridized to rat DNA digested with the restriction endonuclease SacI. Fig. 1B shows that five SacI fragments (5.9, 5.6, 3.8, 3.6, and 0.95 kb) of rat DNA hybridize to both probes I and II, suggesting that these fragments contain the entire GST-P coding sequence. Five bands common to both probes were also detected with other enzymes, such as BamHI or BglII (data not shown). These results suggest that there are at least five gene sequences homologous to the GST-P mRNA in the rat genome.

FIG. 1. Solid lines represent pUC8 sequences. cDNA probes I and II were prepared by digestion of pGP5 (7) with the indicated restriction endonucleases and isolation of appropriate fragments from preparative agarose gel using DE81 paper. Restriction enzymes are abbreviated as follows: Bg, BglI; Bs, BstEII; E, EcoRI; H, HindIII; S1, SalI. B, Southern blot hybridization patterns of rat DNA with GST-P cDNA fragments. Total DNA was extracted from a Sprague-Dawley rat liver and cleaved to completion by SacI. The digest (15 µg per lane) was electrophoresed on a 0.8% agarose gel and then transferred to a nitrocellulose filter (8). Hybridization with nick-translated DNA probes I and II (indicated on the top of each lane) and washing were performed as described under "Experimental Procedures."

pGP5 (7), as a probe. Seven positive plaques were detected.
Two of them contained an identical insert. Six nonidentical clones were designated as ChGSTP11, 12, 22, 32, 62, and 71 and were further characterized. The restriction maps of these clones are shown in Fig. 2. Also shown in these maps are the regions that hybridized to pGP5. From the results of restriction mapping analysis, ChGSTP22 insert seems to be included in ChGSTP32, while the other clones show cleavage maps completely different from each other.
As discussed below, ChGSTP22 and 32 contain an active gene and the other clones contain processed-type pseudogenes.
Sequence Analysis of the Active Gene-A 3.8-kb BamHI fragment of ChGSTP22 containing the entire coding region was electrophoretically isolated for DNA sequencing. Several different restriction enzymes were used to yield small overlapping pieces of this fragment. These were then individually

[Fig. 3: nucleotide sequence of the rat GST-P gene with the deduced amino acid sequence]

found in this genomic DNA sequence except for one nucleotide substitution. The nucleotide G at number 6 of the cDNA (the first letter of the initiation codon being number 1) is changed to A at the corresponding position of the genomic DNA (a dot is placed in Fig. 3). This difference is possibly due to strain or allelic polymorphism. The nucleotide occupies the third position of a codon and thus does not change the amino acid (proline). Second, this DNA sequence contains appropriate signals for RNA transcription, splicing, and polyadenylation, as discussed below. Third, this DNA fragment is expressed in cultured hepatoma cells when transfected. The GST-P mRNA sequence in this gene is encoded in seven exons that are interrupted by six introns. All the introns have the consensus splicing signals, GT---AG, at their 5' and 3' boundaries, permitting assignment of their positions in the coding sequence of GST-P.
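The silent nature of the substitution at nucleotide 6 can be checked with a short sketch. The two six-nucleotide strings follow from the description above (the initiation codon followed by a proline codon whose third position is G in the cDNA and A in the genomic DNA); the reduced codon table and the function names are illustrative assumptions, not part of the original analysis.

```python
# Reduced codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "ATG": "Met",
    "CCA": "Pro", "CCC": "Pro", "CCG": "Pro", "CCT": "Pro",
}

def codon_at(seq, nt_position):
    """Return (codon, position within codon) for a 1-based nucleotide
    position, numbered from the first letter of the initiation codon."""
    index = nt_position - 1
    start = index - index % 3
    return seq[start:start + 3], index % 3 + 1

def is_silent(cdna, genomic, nt_position):
    """True if both sequences encode the same amino acid at the codon
    containing nt_position."""
    cdna_codon, _ = codon_at(cdna, nt_position)
    genomic_codon, _ = codon_at(genomic, nt_position)
    return CODON_TABLE[cdna_codon] == CODON_TABLE[genomic_codon]

cdna_start = "ATGCCG"     # cDNA: G at nucleotide 6
genomic_start = "ATGCCA"  # genomic DNA: A at nucleotide 6

codon, pos_in_codon = codon_at(cdna_start, 6)
silent = is_silent(cdna_start, genomic_start, 6)
```

Because all four CCN codons specify proline in the standard genetic code, any third-position change in this codon leaves the protein unchanged.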

Determination of the Transcription Initiation Site and Putative Regulatory Sequences for Transcription-To define precisely the start site for transcription, we performed nuclease S1 protection mapping (12) (Fig. 4). A 101-bp HinfI fragment (Fig. 4B) was terminally labeled and then electrophoresed on a strand separation gel. Both strand-separated fragments were hybridized to total RNA extracted from acetylaminofluorene-induced hepatocellular carcinoma cells and digested with nuclease S1. With one of the single-stranded DNAs, S1-resistant fragments were observed, but the other strand was completely digested by the enzyme. S1-resistant fragments were sized on a polyacrylamide gel in parallel with the chemical degradation products of the same DNA preparation (13). Fig. 4A shows that several protected fragments were generated, beginning from one of the bases in the sequence 5'-TATTC-3' in the genomic DNA. However, the signal of the band corresponding to A was the strongest of all. Taken together with the fact that transcription of most eukaryotic mRNAs begins with purine residues (14), the A of the TATTC sequence, 70 nucleotides upstream from the AUG translation initiation codon, is most likely the nucleotide to be capped in this mRNA. Multiple bands were probably produced by the nibbling of protected DNA, as is frequently the case, although the possibility that transcription starts from multiple sites cannot be ruled out completely. In the region preceding the putative cap site, the sequence TATAA, the expected "TATA" box, occurs between -27 and -23. On the other hand, a CAAT sequence, which is also frequently found in eukaryotic genes and is thought to be related to the promotion of transcription, is not evident in this gene. The 200-nucleotide region upstream from the transcription initiation site of this gene is GC-rich (61%), and there is a GC stretch located at position -50 to -37. This GC stretch contains the hexanucleotide sequence 5'-GGGCGG-3'.
This hexanucleotide sequence and its inverted complement, 5'-CCGCCC-3', are known to exist in a number of cellular genes as well as in the 21-bp repeats of the SV40 genome and have been demonstrated to interact with a cellular transcription factor, Sp1 (15-19). In most cases, several copies of these hexanucleotide sequences are present in each promoter region, but there is only one GGGCGG sequence between -398 and -1 of the rat GST-P gene. Whether this hexanucleotide sequence is related to transcriptional activation in this gene is therefore presently unknown. However, this GC box completely fits the consensus derived through comparison of GC boxes of many Sp1-responsive promoters (20).

FIG. 4. A, nuclease S1 protection mapping of the 5' terminus of the rat GST-P mRNA. A 101-bp HinfI fragment from genomic clone ChGSTP22 extending from -56 to +45 was terminally labeled and then strand-separated. The anticoding strand was hybridized with 50 µg of total RNA of hepatoma and digested with nuclease S1. The nuclease S1-resistant fragments were sized on a 10% polyacrylamide urea gel (lane c). The same fragment was subjected to chemical degradation at adenine and guanine residues (lane a) and thymine and cytosine residues (lane b). Arrows indicate the possible transcription start sites predicted from the size of protected fragments against nuclease S1 treatment. B, schematic representation of the probe for nuclease S1 mapping. The asterisk shows the labeling position of the probe. Horizontal lines represent part of the GST-P gene. The solid box indicates the first exon. The transcript is shown by a horizontal arrow.
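The two promoter features discussed here, G + C content and the presence of the GGGCGG hexanucleotide or its inverted complement, are simple to compute from an upstream sequence. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the coordinate convention (last base of the input is position -1, with no position 0) is our reading of the numbering used in the text, and the example sequence is a toy string, not the GST-P promoter.

```python
import re

GC_BOX = "GGGCGG"
GC_BOX_INVERTED = "CCGCCC"  # inverted complement of the GC box

def gc_percent(seq):
    """Percentage of G + C residues in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def find_gc_boxes(upstream):
    """Locate GC boxes in a sequence that ends at position -1, the base
    immediately preceding the cap site.

    Returns a sorted list of (start, end, motif) tuples with negative
    coordinates relative to the cap site.
    """
    upstream = upstream.upper()
    length = len(upstream)
    hits = []
    for motif in (GC_BOX, GC_BOX_INVERTED):
        # The lookahead pattern also reports overlapping occurrences.
        for match in re.finditer("(?=" + motif + ")", upstream):
            start = match.start() - length
            hits.append((start, start + len(motif) - 1, motif))
    return sorted(hits)

# Toy upstream region (assumption for illustration only): one GC box
# placed so that it spans -47 to -42.
toy_upstream = "A" * 10 + GC_BOX + "T" * 41
toy_hits = find_gc_boxes(toy_upstream)
```

Applied to the real 200-nucleotide upstream region, `gc_percent` would be expected to reproduce the reported 61%, and `find_gc_boxes` the single GGGCGG site, provided the same numbering convention is used.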
Analysis of Other Clones Representing Four Pseudogenes-Since multiple loci were suggested by the genomic Southern blot analyses, we decided to analyze more clones that hybridized with pGP5 in order to characterize the nature of these loci.
ChGSTP11: A 9.0-kb EcoRI fragment was subcloned in the plasmid pUC8. As the hybridizing regions lay in a 3.8-kb BamHI fragment, we isolated this fragment for DNA sequencing. Several different restriction enzymes were used to yield small overlapping pieces of this fragment. These fragments were then individually cloned into M13 mp10 or mp11 vectors and sequenced by Sanger's dideoxynucleotide sequencing method (9, 10). Sequence data shown in Fig. 5 confirm that ChGSTP11 is a pseudogene of the processed type, a type that has been proposed to originate from reverse transcription of mRNA and reinsertion of a cDNA into the genome. It has no intron and carries a poly(A)-like stretch at the end of the 3'-untranslated sequence, which is possibly derived from the poly(A) of the mRNA. In addition, the homologous sequence ends precisely at the limits of the mRNA and is bordered by an almost complete direct repeat, which presumably originated from a staggered break at the insertion site. Upstream and downstream sequences of the pGP5-homologous region of ChGSTP11 are completely different from the corresponding regions of the active gene. Although ChGSTP11 is highly homologous (91%) to the cDNA clone, this gene cannot be functional. This clone has an opal stop codon in the coding region, thus preventing the sequence from being translated.

ChGSTP12: Southern blotting of an EcoRI digest of this clone revealed that the 2.0-kb EcoRI fragment contains the entire hybridizing region. This fragment was isolated and subcloned in pUC8. There is one HindIII site in this 2.0-kb EcoRI fragment. We sequenced about 350 bp in each direction from this HindIII site. Sequencing data confirmed that the HindIII site of this fragment corresponds to that of pGP5. There is a poly(A)-like stretch from 273 bp downstream of the HindIII site, and no intron was present in the sequenced 700 bp. Homology to pGP5 is about 90%, but several stop codons were found in every reading frame (data not shown).
ChGSTP62: The hybridizing 3.7-kb EcoRI fragment was subcloned in pUC8. This fragment has one HindIII site in it. We again sequenced up to 350 bp in each direction from the HindIII site (data not shown). This clone also has many features of a processed-type pseudogene, i.e. it has a poly(A)-like sequence and no intron. Homology of this clone to pGP5 is 91%. ChGSTP71 was also found to be a processed-type pseudogene by sequencing analysis (data not shown). Homology of this clone to pGP5 is 78%.
Relationship of Cloned Genes to Genomic Southern Blot Bands-Genomic Southern blot analyses show that at least five gene sequences exist in the rat genome that are closely related to the pGP5 sequence. The existence of several loci that contain sequences homologous to pGP5 in human and mouse genomic DNA is also demonstrated by Southern blot analyses. ChGSTP11 contains a 5.9-kb SacI fragment which probably corresponds to the 5.9-kb band detected in Fig. 1. ChGSTP22 and 32 contain a 3.6-kb SacI fragment which is also detected in Fig. 1 as such. ChGSTP12 contains a 5.6-kb SacI fragment which appears in the genomic Southern blot (Fig. 1). However, ChGSTP62 does not contain the entire genomic SacI fragment hybridizable to pGP5. To determine which SacI fragment detected in Fig. 1 corresponds to ChGSTP62, we searched for a unique portion between the left end of the insert and the first SacI site from it. We found that a 0.3-kb SacI/PstI fragment (probe B in Fig. 2) was unique in the genome and hybridizable to the 3.8-kb SacI fragment detected in the genomic Southern blot in Fig. 1 (data not shown). We also searched for a unique portion in ChGSTP71 and found that a 0.45-kb HindIII/SacI fragment, probe C in Fig. 2, was unique in the genome. Probe C hybridized to a 14-kb SacI fragment of rat DNA (data not shown). This band was not detected in Fig. 1, probably because the homology of ChGSTP71 to pGP5 was significantly lower than that of the other clones. From these restriction maps and Southern blot analyses, it is clear that clones have now been obtained for four of the five loci detected in genomic Southern blot analysis of total rat DNA (Fig. 1). Hybridization signals of the band comprising ChGSTP22 and 32, which are now identified as the active gene, are the strongest of all, and those of ChGSTP11, 12, and 62, which are determined to be pseudogenes, are less strong.
The signal of the 0.95-kb SacI fragment is the weakest among the five fragments detected in the genomic Southern blot, and a clone containing this fragment has not been obtained so far. Weakness of the signal of this fragment may well be due to a high degree of sequence mismatch. We have a clone, ChGSTP71, which does not correspond to any fragment detected in the genomic Southern blot. Its hybridization signal was rather weak compared to other clones, suggesting that the locus corresponding to this clone may not be detected in genomic Southern blot analysis with cDNA. Although it is difficult to determine the exact size of the GST-P gene family, these results strongly suggest that only one active gene of GST-P exists in the rat genome. Further support for this conclusion was obtained in the following way. Total rat DNA was restricted with BamHI, EcoRI, or SacI, and Southern blot hybridization was performed with a 0.6-kb SmaI fragment (probe A in Fig. 2) in the fifth intron of ChGSTP22 as a probe. A single band appeared for rat DNA digested with any of these restriction enzymes (3.8-kb BamHI, 4.8-kb EcoRI, 3.6-kb SacI fragments) (data not shown). These bands are predicted from the restriction map of ChGSTP22, and the intensity of the signal corresponded to that expected from a single copy gene.
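The clone-to-band bookkeeping in this section can be summarized in a few lines. The sketch below only restates the SacI fragment sizes given in the text; the variable and function names are hypothetical, and the 5.9-kb assignment for ChGSTP11 is described in the text as probable rather than certain.

```python
# SacI fragments (kb) detected with the cDNA probes in the genomic
# Southern blot (Fig. 1).
genomic_bands_kb = [5.9, 5.6, 3.8, 3.6, 0.95]

# SacI fragment (kb) assigned to each clone in the text.
clone_to_band_kb = {
    "ChGSTP11": 5.9,   # "probably corresponds" to the 5.9-kb band
    "ChGSTP12": 5.6,
    "ChGSTP22": 3.6,   # active gene
    "ChGSTP32": 3.6,   # active gene, overlaps ChGSTP22
    "ChGSTP62": 3.8,   # assigned via unique probe B
    "ChGSTP71": 14.0,  # detected only with unique probe C
}

def unassigned_bands(bands, assignments):
    """Genomic bands for which no clone has been obtained."""
    assigned = set(assignments.values())
    return [band for band in bands if band not in assigned]

def clones_without_band(bands, assignments):
    """Clones whose SacI fragment is not among the cDNA-detected bands."""
    return sorted(clone for clone, band in assignments.items()
                  if band not in bands)

missing = unassigned_bands(genomic_bands_kb, clone_to_band_kb)
orphans = clones_without_band(genomic_bands_kb, clone_to_band_kb)
```

The result matches the summary above: four of the five cDNA-detected bands are accounted for, the 0.95-kb band remains uncloned, and ChGSTP71 represents a locus not visible in Fig. 1.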
The mechanism by which dramatic increase of GST-P protein occurs in hyperplastic nodule and hepatocellular carcinoma is presently unknown. Northern blot analysis shows that the content of GST-P mRNA in normal liver, hyperplastic nodules, and hepatocellular carcinoma is proportional to this protein content, showing a two orders of magnitude