DNA Sequence Abnormalities of Human Glucose-6-phosphate Dehydrogenase Variants*

Over 400 supposedly biochemically and genetically distinct variants of glucose-6-phosphate dehydrogenase (G6PD) have been described in the past. In order to investigate these variants at the DNA sequence level we have now determined the relevant sequences of introns of G6PD and describe a method which allows us to rapidly determine the sequence of the entire coding region of G6PD. This technique was applied to six variants that cause G6PD deficiency to be functionally so severe as to result in nonspherocytic hemolytic anemia. Although the patients were all unrelated, identical mutations were found in G6PD Marion, Gastonia, and Minnesota: a G → T at nt 637 producing a Val → Leu substitution at amino acid 213. The mutations of G6PD Nashville and Anaheim were also identical to each other, viz. G → A at nt 1178 in exon 10 producing an Arg → His substitution at amino acid 393. G6PD Loma Linda had a C → A substitution at nt 1089 in exon 10, producing an Asn → Lys change at amino acid 363. The results confirm our earlier results suggesting that the NADP-binding site is in a small region of exon 10 and suggest the possibility that this area is also concerned with the binding of glucose-6-P.

makes it possible to identify the mutations of variants with established kinetic abnormalities. For example, we were recently able to demonstrate that a unique class of G6PD variants, characterized by a requirement for high levels of NADP to maintain enzyme stability, had mutations clustered in a small region of exon 10 (5). Two strongly basic amino acids were identified as the presumed NADP-binding site.
In order to facilitate identification of the mutations in more of the over 400 variants that have been described to date, it is necessary to develop more facile methods for sequence analysis. Accordingly, we have determined the relevant intron sequences and used the polymerase chain reaction (PCR) to amplify segments of DNA that encompass the entire coding region of G6PD. We now describe this technology and apply it to the identification of mutations from six additional patients with hereditary nonspherocytic hemolytic anemia due to G6PD deficiency.

* This work was supported by National Institutes of Health Grants HL25552 and RR00833 and by the Sam Stein and Rose Stein Charitable Trust Fund. This is Publication 6479-MEM from the Research Institute of Scripps Clinic. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M57553-M57563.

The abbreviations used are: G6PD, glucose-6-phosphate dehydrogenase; PCR, polymerase chain reaction; nt, nucleotide.

MATERIALS AND METHODS
Determining Intron Sequences
25 μg of genomic DNA was digested to completion with EcoRI. It was phenol/chloroform-extracted, ethanol-precipitated, and redissolved in 50 μl of 5 mM Tris, pH 7.5, with 0.1 mM EDTA. Two hundred sixty-eight ng of the DNA was ligated to 1 μg of λgt10 arms having an EcoRI ligation site (Promega, Madison, WI). The phage was packaged using Gigapack Gold (Stratagene, La Jolla, CA) according to the manufacturer's directions and plated with top agarose after infection into K802 bacteria. The library obtained with this material contained 3 × 10⁶ phages. It was screened without amplification using cDNA probes. Two clones were purified and subcloned in a plasmid (Bluescript, Stratagene) for sequencing according to standard procedures. One contained exons 3-6 and one contained exons 7-13 (6).
The clones were sequenced using Sequenase (U. S. Biochemical Corp.) priming with oligonucleotides to known exon sequences (2) to determine intron sequences. Only one strand was read; ambiguous nucleotides were recorded as "N." We did not clone the fragment containing exons 1 and 2, but obtained the relevant intron sequences through the kindness of Dr. A. Yoshida (City of Hope Medical Center).

Biochemical Characterization of Deficient G6PD Variants
G6PD from washed erythrocytes of six unrelated male patients with hereditary nonspherocytic hemolytic anemia due to G6PD deficiency was partially purified and characterized according to the standard methods recommended by a scientific group of the World Health Organization (7). Km values and their standard errors were computed using the program published by Page (8). Reactivation of partially purified G6PD Nashville was performed at room temperature using a solution containing 200 μM NADP+, 10 mM MgCl2, pH 7.0, as previously described (9).
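As an illustration of what a Km computation involves, the following is a minimal sketch and not the program of Page (8), whose algorithm is not described here. It assumes simple Michaelis-Menten kinetics and uses the Hanes-Woolf linearization, S/v = S/Vmax + Km/Vmax, fitted by ordinary least squares. The function name and the substrate and velocity values are hypothetical, and no standard error is computed.

```python
def fit_km_hanes_woolf(substrate, velocity):
    """Estimate (Km, Vmax) from the Hanes-Woolf line S/v = S/Vmax + Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of S/v against S.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope        # slope = 1/Vmax
    km = intercept * vmax     # intercept = Km/Vmax
    return km, vmax


# Hypothetical, noise-free data generated from Km = 16 (arbitrary
# concentration units) and Vmax = 1, used only to exercise the fit.
true_km, true_vmax = 16.0, 1.0
s_values = [2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
v_values = [true_vmax * s / (true_km + s) for s in s_values]

km_est, vmax_est = fit_km_hanes_woolf(s_values, v_values)
print(f"Km = {km_est:.2f}, Vmax = {vmax_est:.2f}")
```

With real, noisy measurements a weighted or nonlinear fit would usually be preferred, since the linearization distorts the error structure.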

DNA Coding Sequence Analysis
Amplification. DNA was isolated from EDTA-anticoagulated blood using standard methods. The G6PD gene was amplified in five segments using PCR in two stages to give single-stranded DNA. The PCR reaction contained 34 mM Tris-HCl, pH 8.8, 8.3 mM ammonium sulfate, 3.4 mM MgCl2, 85 μg/ml bovine albumin, 5% dimethyl sulfoxide, 1 μg of genomic DNA, 0.5 mM of each dNTP, 5 units of Taq polymerase, and oligonucleotides as specified in Table I. For the first stage 50 ng of the sense and antisense oligonucleotides were used for 15 cycles (92 °C for 30 s, 58 °C for 30 s, and 72 °C for 30 s). The second stage of amplification, designed to make single-stranded DNA, was initiated by adding 300 ng of the appropriate nested oligonucleotide (Table I) to the PCR reaction and continuing with 35 additional cycles of the same cycling program. Because of the high T/A content of the nested oligonucleotide used to make the antisense strand of exon 2, an annealing temperature of 50 °C was used instead of 58 °C. Upon completion of the PCR, the reaction was phenol/chloroform-extracted and chloroform-extracted, sodium acetate was added to a final concentration of 0.2 M, and the mixture was precipitated with 2 volumes of ethanol. The DNA was recovered by centrifugation and lyophilized to dryness. It was then dissolved in 28 μl of water.
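The two-stage cycling program described above can be written down as a small data structure. This is only a sketch for bookkeeping: the class and function names are hypothetical, ramp times between temperatures are ignored, and it is assumed that the 50 °C annealing exception for the exon 2 antisense strand applies to the second stage only, which the text does not state explicitly.

```python
from dataclasses import dataclass

STEP_SECONDS = 30                 # each of the three steps is held for 30 s
DENATURE_C, EXTEND_C = 92, 72     # denaturation and extension temperatures


@dataclass(frozen=True)
class Stage:
    name: str
    cycles: int
    anneal_c: int          # annealing temperature in degrees C
    added_primer_ng: int   # oligonucleotide amount given in the text


def amplification_program(exon2_antisense: bool = False) -> list[Stage]:
    """Return the two PCR stages; the 50 C exception is assumed to affect stage 2 only."""
    second_anneal = 50 if exon2_antisense else 58
    return [
        Stage("first stage (double-stranded DNA)", 15, 58, 50),
        Stage("second stage (single-stranded DNA)", 35, second_anneal, 300),
    ]


def total_cycles(stages: list[Stage]) -> int:
    return sum(stage.cycles for stage in stages)


def hold_minutes(stages: list[Stage]) -> float:
    """Total programmed hold time, excluding temperature ramps."""
    return total_cycles(stages) * 3 * STEP_SECONDS / 60


program = amplification_program()
print(total_cycles(program), "cycles,", hold_minutes(program), "min of hold time")
```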
Purification of Single-stranded PCR DNA by Spin Dialysis. A modification of a protocol obtained from Caput² was used to purify the single-stranded DNA. Glass beads (212-300 μm, Sigma) were washed with 1 N HCl, thoroughly rinsed with sterile water, and resuspended in sterile 10 mM Tris, pH 7.5, 0.1 mM EDTA to cover the beads by 1-2 cm. Bio-Gel P-10 beads (200-400 mesh, Bio-Rad) were suspended in 10 mM Tris, pH 7.5, 0.1 mM EDTA, autoclaved for 15 min, and after cooling made to a 50% suspension. Tubes were prepared by puncturing the bottom of a 0.5-ml microcentrifuge tube with a 25-gauge needle. Each tube was then suspended inside a 1.5-ml microcentrifuge tube from which the bottom had been removed with a knife blade. One drop of glass beads was put into the bottom of the 0.5-ml tube, and the remainder was filled completely with the 50% Bio-Gel P-10 beads. The tube was suspended in a 12 × 100-mm glass tube and centrifuged at room temperature at 400-500 × g for 3 min. The solution of amplified single-stranded DNA was immediately layered on top of the bed of beads, and the 0.5-ml tube was then suspended in an intact 1.5-ml tube and again centrifuged at the same speed for 5 min. Under these conditions the DNA passes through the column into the 1.5-ml tube, leaving behind the free trinucleotides.
Sequencing. 100 pmol of the sequencing oligonucleotides specified in Table I were labeled using polynucleotide kinase and 2 μl of [γ-32P]ATP (800 Ci/mmol) in a 25-μl system. The 32P-labeled oligonucleotide was purified by spin dialysis as above. The purified single-stranded DNA was sequenced using a commercially available Taq polymerase sequencing kit (Promega) with the following modifications. 10-20 μl of the PCR DNA and 5 μl of 5× buffer were diluted to 25 μl, 1 μl of 4 pmol/μl labeled sequencing oligonucleotide was added, and the tube was incubated at 50-55 °C for 1½ min and transferred to a 72 °C water bath. The reaction mixture was aliquoted into the termination mix tubes and Taq polymerase was added as directed by the manufacturer. The reaction was allowed to proceed at 72 °C for 5-7 min. The stop solution was then added and the samples were placed on ice. The mixtures were electrophoresed on standard 8% urea sequencing gels and dried, and radioautography was performed.

RESULTS
Intron Sequences. The intron sequences are presented in Fig. 1.
Biochemical Characteristics of the G6PD Variants. The characteristics of the mutant G6PDs studied are summarized in Table II. All of the variants were extremely thermolabile.
G6PD Nashville was sufficiently labile even at 0 °C that it proved rather difficult to characterize, much of the activity being lost while the characterization was under way. We studied the capacity of NADP to reactivate G6PD Nashville because of its marked instability and because we realized that its mutation was very near those of other variants that had been shown to undergo reactivation with high concentrations of NADP (5). There was a 10-fold loss of activity during purification, another 25-fold loss during 3 h of dialysis against a buffer containing 10 μM NADP, and the activity of the mutant enzyme decreased 3-fold after storage in the cold in the presence of 10 μM NADP. It could be activated about 15-fold by incubation at 25 °C for 3 h in a solution containing 200 μM NADP (9). The results of these studies are presented in Fig. 2.
We found that the kinetic characteristics of reactivated G6PD Nashville differed from those of the unreactivated enzyme. Before reactivation the Km of the partially purified mutant enzyme for NADP was 16 ± 1.0 μM (mean ± S.E.) and that for glucose-6-P was 87 ± 8.8 μM. After incubation with NADP the Km for NADP had fallen to 3.8 ± 0.13 μM and that for glucose-6-P had risen to 120 ± 1.4 μM. The lower precision of the estimates before reactivation was due, of course, to the meager amount of activity available for estimation of the Km values.
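A rough way to gauge the size of these shifts relative to their standard errors is sketched below. It is not part of the original analysis: it assumes that the two estimates in each pair are independent and approximately normally distributed, so the ratio of the difference to its combined standard error is only an informal indicator. The function name is hypothetical; the numbers are the Km values quoted above.

```python
import math


def difference_in_se_units(a, se_a, b, se_b):
    """Difference between two estimates divided by its combined standard error."""
    return (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Km for NADP (micromolar): before vs. after reactivation.
z_nadp = difference_in_se_units(16.0, 1.0, 3.8, 0.13)
fold_drop_nadp = 16.0 / 3.8

# Km for glucose-6-P (micromolar): after vs. before reactivation.
z_g6p = difference_in_se_units(120.0, 1.4, 87.0, 8.8)

print(f"NADP: {fold_drop_nadp:.1f}-fold lower Km, {z_nadp:.1f} combined S.E.")
print(f"glucose-6-P: Km higher by {z_g6p:.1f} combined S.E.")
```

Under these assumptions both changes exceed three combined standard errors, with the change in the Km for NADP being by far the larger.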
Coding Sequences. The sequences of the entire coding regions of G6PD Minnesota, Anaheim, Nashville, and Loma Linda were determined. Problems in reading the sequences were encountered only at nucleotides 1349-1353 when sequencing with an antisense primer. The sequence was read with much less difficulty when using the sense primer. A C/G-rich region in exon 5 was sometimes difficult to interpret, and in these cases addition of 5% dimethyl sulfoxide to the sequencing reaction was helpful.

² D. Caput, personal communication.
Except for single base pair substitutions, as shown in Table II, each corresponded exactly to the consensus coding sequence (15) as corrected for the error in nt 33 (16). Each had a C at the polymorphic site at nt 1311 and a G at the polymorphic site at nt 1116. In the case of G6PD Marion and Gastonia, sequencing was carried out only until a point mutation producing an amino acid change had been encountered.
G6PD Marion, Gastonia, and Minnesota each had identical mutations, a G → T at nt 637 leading to a Val → Leu substitution at amino acid 213. The mutations of Nashville and Anaheim were identical to each other, viz. G → A at nt 1178 producing an Arg → His substitution at amino acid 393.
G6PD Loma Linda had a C → A substitution at nt 1089, producing an Asn → Lys change at amino acid 363 (Fig. 3). None of these mutations created or destroyed restriction sites; each putative mutation was verified by sequencing both strands.
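The correspondence between the nucleotide and amino acid positions reported above can be checked with a few lines of code. The sketch below assumes that nt 1 is the A of the initiator ATG and that the initiator methionine is amino acid 1; the reported numbers are consistent with that convention. It uses the standard genetic code and, because the actual codons are not quoted in the text, only lists which wild-type codons would be compatible with each reported change. The function and variable names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position(nt):
    """Return (codon number, position within codon), both 1-based."""
    return (nt - 1) // 3 + 1, (nt - 1) % 3 + 1


def compatible_codons(nt, ref_base, alt_base, ref_aa, alt_aa):
    """List (wild-type codon, mutant codon) pairs consistent with the change."""
    _, pos = codon_position(nt)
    pairs = []
    for codon, aa in CODON_TABLE.items():
        if aa == ref_aa and codon[pos - 1] == ref_base:
            mutant = codon[:pos - 1] + alt_base + codon[pos:]
            if CODON_TABLE[mutant] == alt_aa:
                pairs.append((codon, mutant))
    return pairs


# (variants, nt, reference base, mutant base, reference aa, mutant aa, reported aa number)
MUTATIONS = [
    ("Marion/Gastonia/Minnesota", 637, "G", "T", "V", "L", 213),
    ("Nashville/Anaheim", 1178, "G", "A", "R", "H", 393),
    ("Loma Linda", 1089, "C", "A", "N", "K", 363),
]

for name, nt, ref, alt, ref_aa, alt_aa, reported in MUTATIONS:
    number, pos = codon_position(nt)
    pairs = compatible_codons(nt, ref, alt, ref_aa, alt_aa)
    print(name, number == reported, f"codon position {pos}", pairs)
```

Each of the three reported changes maps to the stated amino acid number and is achievable by the stated single-base substitution at that codon position.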

DISCUSSION
It is likely that all or nearly all of the human G6PD variants have defects in the coding region, since the residual enzyme virtually always has demonstrably abnormal properties; thus it is unlikely that regulatory mutations, which should produce decreased quantities of normal enzyme, will be found. Indeed, all of the variants that have been sequenced to date have been found to have point mutations in the coding region (5, 16-20). Therefore, to aid in determining the sequence of additional variants, we have developed a facile technique for sequencing the complete coding region of human G6PD from single-stranded DNA produced by a PCR strategy. Such a strategy depends upon determining the sequence of introns sufficiently distant from the intron/exon junction so as to avoid consensus sequences that might be present in many genes and would therefore yield multiple amplified segments. The sequences that we have determined (Fig. 1) served as a basis for the amplification of the coding sequence of G6PD. They agree quite well with the more limited intron sequences recorded by Viglietto et al. (11).
The six variants we have studied here are no exception to the rule that point mutations cause G6PD variants. Three were found to have a point mutation at nt 637, one at nt 1089, and the other two at nt 1178.
Since several of the variants selected for this study manifested an increased Km for glucose-6-P, we thought that they might identify the glucose-6-P-binding site. Lysine 205 reacts with pyridoxal phosphate (21, 22), and it has been suggested that because glucose-6-P inhibits this reaction this amino acid may represent the binding site of this substrate. Only seven amino acids separate the Marion/Gastonia/Minnesota mutation from this reactive lysine. However, the increase in the Km of variants with this mutation was borderline. It is interesting and somewhat puzzling that the conservative Val → Leu substitution had a marked effect upon the stability of the enzyme and that red cells from individuals who inherited this abnormality manifested a marked impairment of their intravascular survival without any drastic changes in kinetic properties. In contrast, G6PD Mediterranean, which manifests a much more severe degree of deficiency, is not associated with chronic hemolytic anemia. It is possible that the slightly greater bulk of the leucine interferes with the proper folding of the G6PD molecule and renders it very unstable. Thus, it may be that even though the mean activity of circulating red cells is higher than that of G6PD Mediterranean, the older members of the red cell population become totally devoid of enzyme activity and thus perish in the circulation.

TABLE I. The primers used in sequencing the entire coding region of G6PD are shown. The first stage oligonucleotides are used to amplify by 15 PCR cycles double-stranded DNA including the exons indicated in the first column. Single-stranded DNA is produced by adding the second stage oligonucleotides and continuing PCR for an additional 35 cycles. Sequencing is then carried out using the sequencing oligonucleotides shown. Many of the sequencing oligonucleotides are identical with the second stage primers. Details of the methodology are given in the text.
The second mutation detected is in the region of exon 10 that we have previously identified as an NADP-binding domain (5). The two examples of this variant, G6PD Nashville and G6PD Anaheim, both had a high Km for NADP. More impressive was the very high Km for glucose-6-P manifested by these variants. A similarly high Km for glucose-6-P was previously observed in the case of G6PD Riverside, which is due to a mutation only 17 amino acids downstream from these variants. We had previously noted (5) that the mutations at amino acids 386 and 387, although resulting in the loss of a positive charge, were anomalously associated with slower than normal migration toward the anode. We interpreted this as support for the concept that the enzyme normally contained bound NADP, contributing to its negative charge, and that this binding was lost in these mutations. The same phenomenon is evident in the case of G6PD Nashville/Anaheim. A positively charged arginine has been lost, and yet the migration of the enzyme is essentially unaltered. This finding is consistent with the possibility that arginine 393 is involved in the binding of one of the phosphates of NADP. The marked instability of G6PD Nashville, even in the cold, and the nearness of its mutation to the cluster of mutations that produce an enzyme that is reactivated by NADP (Fig. 3) led us to explore the effect of incubation with 200 μM NADP on the activity of this enzyme. Striking reactivation was observed (Fig. 2). The reactivated enzyme had kinetic properties that were significantly different from those of the enzyme that had been allowed to lose activity. The Km for NADP became normalized while the Km for glucose-6-P became even higher. Thus, the same region of the molecule that seems to be involved in NADP binding is also associated with a decreased affinity of the enzyme for glucose-6-P. However, a review of the characteristics of published G6PD variants (12) indicates that a relationship between the Km for glucose-6-P and that for NADP is generally not manifested. It is entirely possible that in the native folded state of the enzyme the NADP-binding area together with the glucose-6-P-binding area forms a site that binds both substrates and thus facilitates the movement of an electron from glucose-6-P to NADP. The crystal structure of the enzyme has not yet been solved.

G6PD Loma Linda is characterized by a mutation that is only 22 amino acids upstream from the putative NADP-binding site of the enzyme. The Km of this enzyme for NADP was modestly increased, but at the time that this enzyme was available to us its possible reactivation with NADP was not investigated.

FIG. 1. The central unsequenced portion of the introns is denoted by . . . . The amplifying primers shown in Table I were based upon these sequence data. The sequence of intron 2 was kindly provided to us by Dr. A. Yoshida. More limited G6PD intron sequence data have also recently been published by Viglietto et al. (11).
It is interesting and somewhat surprising that among six unrelated patients from different parts of the country only three different point mutations were found. All of the deficient patients had hereditary nonspherocytic hemolytic anemia, a disorder which would subject the individual to adverse selection. Such disease-producing mutations are therefore relatively rapidly eliminated from the population, and a high proportion of those that are encountered represents new mutational events. The mutation at nt 1178 represents a change from CpG to CpA, which can result from the common CpG to TpG mutation (23-27) on the opposite DNA strand. Thus, one might postulate that a hot spot giving rise to recurrent mutations is present at this location. However, the mutation at nt 637 did not involve the loss of a CpG dinucleotide. We documented similar recurrent mutations in sequencing the DNA of seven unrelated patients with G6PD that was reactivated by NADP; only four different mutations were found (5). In the case of polymorphic variants of G6PD, too, less variability than had been expected has been encountered. With very few exceptions G6PD-deficient persons in the Mediterranean area have the same mutation at nt 563 (5, 11, 28-30), and the same double mutations at nt 202 and nt 376 are present in many parts of the world in persons whose G6PD had been thought to be unique on the basis of biochemical characterization (12, 16, 31-33).
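The strand argument for the nt 1178 mutation can be made concrete with a short sketch. It shows only the dinucleotide logic: a CpG to TpG change on one strand reads as CpG to CpA on the complementary strand. The function name is hypothetical and no flanking G6PD sequence is assumed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Change as reported on the sense (coding) strand at nt 1178.
sense_before, sense_after = "CG", "CA"

# The same change read on the opposite strand.
opposite_before = reverse_complement(sense_before)
opposite_after = reverse_complement(sense_after)

print(f"sense strand:    {sense_before} -> {sense_after}")
print(f"opposite strand: {opposite_before} -> {opposite_after}")  # CG -> TG
```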
In some cases G6PD variants that appeared to be very different when characterized by standard biochemical methods (7) have proven to be the same on sequence analysis. This has been true especially when the variants were characterized in different laboratories but is the case even when the same investigators characterize different samples at different times (33). The present investigation sheds some light on these discrepancies. The kinetic properties of the variants that are reactivated by NADP change markedly during reactivation, and the reciprocal change presumably occurs when the enzyme is becoming inactivated during purification. Thus, the properties that are documented may to a large extent be a function of the extent to which the enzyme has become altered at the moment that its characteristics are being determined. This may account for the fact that the reported properties of G6PD Genova (10), a recently reported variant, are completely different from those of G6PD Beverly Hills (5), one of the NADP-sensitive mutants. Yet, both have been found to have identical mutations at the DNA level.³ It is also important to realize that the partially purified enzyme samples that are characterized are from blood samples drawn from patients who are usually at a considerable distance from the laboratory where the studies are performed. The time that elapses between the obtaining of the sample and its characterization may vary from a few hours to a week, and during this time and during the 4-24 h required to purify and characterize the enzyme, there is ample opportunity for minor proteolytic changes and for changes in the folding of the protein to occur.

Thus, we are forced to the conclusion that G6PD variation is not nearly as great as was once believed. Further investigation of the actual degree of heterogeneity should be speeded by sequence analysis of additional variants.

³ A. Argusti, A. Ahluwalia, P. Mason, and L. Luzzatto, personal communication.

TABLE II. The properties of G6PD Gastonia, Minnesota, and Loma Linda had been included in previous compilations of variants (12). G6PD Marion, Anaheim, and Nashville have not previously been studied. At the time that G6PD Marion was characterized the similarity of its properties to those of G6PD Gastonia was recognized, but since the patients were unrelated it was regarded as quite possible that they were the results of different mutations. The values given for G6PD Nashville are those on the enzyme that had not been reactivated with NADP. The methods used are those recommended by a World Health Organization scientific group (7), and the normal values are taken from Yoshida et al. (13, 14). Amino acid substitutions listed: Val213 → Leu (G6PD Marion, Gastonia, and Minnesota), Arg393 → His (G6PD Nashville and Anaheim), and Asn363 → Lys (G6PD Loma Linda). Electrophoresis was in EDTA-borate-Tris. N, normal.

FIG. 2. Reactivation of G6PD Nashville by NADP. Horizontal axis: reactivation time (h).