Identification of a second mutation in the protein-coding sequence of the Z type alpha 1-antitrypsin gene.

This study reports the entire nucleotide sequence of the protein coding region sequence of the alpha 1-antitrypsin (alpha 1AT) Z gene, a common form of the alpha 1AT gene associated with serum alpha 1AT deficiency. In addition to Glu342 to Lys342 mutation in exon V which has been previously identified by peptide analysis, another point mutation (GTG to GCG in exon III) in the gene sequence predicts a second amino acid substitution (Val213 to Ala213) in the Z protein. This Val213 to Ala213 mutation was confirmed to be a general finding in Z type alpha 1AT gene by evaluating genomic DNA from 40 Z haplotypes using synthetic oligonucleotide gene probes directed toward the mutated exon III sequences in the Z gene. Furthermore, the exon III Val213 to Ala213 mutation eliminates a BstEII restriction endonuclease site in the alpha 1AT Z gene, allowing rapid identification of this Val213 to Ala213 substitution at the genomic DNA level. Surprisingly, when genomic DNA samples from individuals thought to be homozygous for the M1 gene (the most common alpha 1AT normal haplotype) were evaluated with BstEII, 23% of the M1 haplotypes were BstEII site negative, thus identifying a new form of M1 (i.e. M1(Ala213)), likely identical to M1 but with an isoelectric focusing "silent" amino acid substitution (Val213 to Ala213). Although the relative importance of the newly identified exon III Val213 to Ala213 mutation to the pathogenesis of the abnormalities associated with the Z gene is not known, it is likely that M1(Ala213) gene represents a common "normal" polymorphism of the alpha 1AT gene that served as an evolutionary intermediate between the M1(Val213) and Z genes.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
4 To whom reprint requests should be addressed: Bldg. 10, Rm. 6D03, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892.
¹ For clarity, we have adopted the following nomenclature. If the IEF is M1 but the sequence unknown, the haplotype is referred to as M1. If the genomic DNA sequence around the 213 region is examined (e.g. by the restriction endonuclease BstEII or with oligonucleotide probes), M1(Val213) and M1(Ala213) will be used to identify the two haplotypes (i.e. the IEF pattern is M1, but the two proteins differ at residue 213). Other abbreviations used are: kb, kilobase pairs; IEF, isoelectric focusing.

Alpha 1-antitrypsin (alpha 1AT¹) is an antiprotease that functions primarily as an inhibitor of neutrophil elastase, an omnivorous protease capable of destroying most forms of connective tissue (1). Coded for by a 10.2-kb segment of chromosome 14 at q31-32 (2), the alpha 1AT gene is comprised of five exons and four introns (3). The alpha 1AT gene is expressed in liver hepatocytes and in mononuclear phagocytes as a 1.75-kb mRNA that is translated and secreted into the rough endoplasmic reticulum as a 418-amino acid precursor protein containing a 24-residue signal peptide (3-6). In the rough endoplasmic reticulum, N-linked carbohydrates are added to each of 3 asparaginyl residues, the protein is translocated to the Golgi where the high mannose carbohydrates are trimmed, and the glycosylated alpha 1AT is secreted as a mature protein of 394 amino acids (5-8). The mature protein circulates in the plasma with a half-life of approximately 5 days (9, 10), and it diffuses into all tissues where it functions to inhibit neutrophil elastase.
In contrast to the M-family haplotypes, the Z haplotype is associated with low plasma levels of alpha 1AT, i.e. "alpha 1AT deficiency" (11, 13). Typically, individuals homozygous for the Z protein have alpha 1AT levels 10-15% of normal (11). The Z gene represents 1-2% of all alpha 1AT haplotypes of individuals of European descent (14, 15). Importantly, the ZZ homozygous state is associated in children with neonatal hepatitis, cholestasis, and cirrhosis, and in adults, with emphysema developing by ages 30-40 (11, 18-20). The emphysema is thought to develop because there is insufficient alpha 1AT available to protect the fragile alveolar structures from their burden of neutrophil elastase; as a result there is slow, progressive destruction of the alveolar walls from the uninhibited elastase (19).
Since the observation in 1976 that a tryptic digest of the Z protein contained a single amino acid substitution (M1 Glu342 to Z Lys342), it has been assumed that this substitution is responsible for the deficiency and associated clinical manifestations of the ZZ homozygous state. As part of a general evaluation of alpha 1AT gene structure associated with alpha 1AT deficiency, we have cloned and sequenced the Z alpha 1AT gene.
To our surprise, we found that in addition to the "classic" exon V Glu342 to Lys342 substitution, the Z gene contains a second amino acid mutation (exon III, Val213 to Ala213). Furthermore, through the evaluation of genomic DNA of what were thought to be alpha 1AT M1 homozygotes, we have identified a new, common form of alpha 1AT type M1 which shares the Val213 to Ala213 mutation with the Z gene but has the same Glu342 sequence as the common M1 gene.

EXPERIMENTAL PROCEDURES
Sources of Genomic DNA-Genomic DNA was isolated from white blood cells of individuals with various alpha 1AT phenotypes by the method of Jeffreys and Flavell (21). The alpha 1AT phenotypes were identified by a combination of serum IEF, serum alpha 1AT levels, and family studies (13, 22). The alpha 1AT serum levels were measured by radial immunodiffusion using the commercial standard (Behring Diagnostics). In addition to the alpha 1AT heterozygote M3Z used for the cloning of the Z gene, genomic DNA was evaluated from 26 individuals with the serum phenotype M1M1 and 20 individuals with the serum phenotype ZZ. As controls, DNA was evaluated from individuals with haplotypes M2 (n = 18), M3 (n = 6), and S (n = 7).
Cloning and Sequencing of the Protein-coding Sequence of the Z Type alpha 1AT Gene-Using complete EcoRI digestion, a 10-kb EcoRI fragment of genomic DNA from an individual with the alpha 1AT phenotype M3Z encompassing the entire protein-coding regions (exons II-V) of the alpha 1AT gene (3) was cloned into λgtWES as described previously (23). The Z and M3 clones were identified by hybridization with 19-mer oligonucleotide gene probes specific for the DNA sequences complementary to the amino acid sequences centered about Lys342 (Z gene) and Glu342 (M3 gene) (23, 24). The 10-kb Z clone was digested into three fragments with PstI (1.6 kb containing exon II, 2.4 kb containing exons III and IV, and 1.1 kb containing exon V) and subcloned into pUC13. The double-stranded plasmid DNA with the insert was directly sequenced by the dideoxynucleotide chain termination method using bidirectional primers (25); twelve 15-mer oligonucleotides were used to cover the sense sequence and thirteen 15-mer oligonucleotides to evaluate the antisense sequence of exons II-V and neighboring intron regions (3).
Evaluation of alpha 1AT Genes for Restriction Fragment Length Polymorphisms-After sequencing of the Z alpha 1AT gene demonstrated a mutation in exon III at residue 213 (see "Results"), it became apparent that this mutation should result in a loss of a restriction site for the endonuclease BstEII (G↓GTNACC). To evaluate this, a 2.4-kb region encompassing exons III and IV of the cloned Z gene was placed in pUC13, and a 0.95-kb fragment encompassing the entirety of exon III was isolated. This exon III probe included the sequence of the Z gene from a PstI site 5' to exon III to a BstEII site just 3' to exon III. After labeling with [α-32P]dCTP by nick translation (26), this exon III probe was used to evaluate genomic DNA (5 μg/lane) digested with restriction endonucleases PstI and BstEII according to the manufacturer's recommendations and analyzed by the Southern procedure.
Detection of Single Point Mutations Using Oligonucleotide Probes-To determine whether the two mutations found in the cloned Z type alpha 1AT gene were universally present in all alpha 1AT genes of a population of individuals who appear to be Z homozygotes by conventional criteria utilizing serum alpha 1AT analysis, 19-mer oligonucleotide probes were constructed complementary to the two Z gene mutations coding for amino acid substitutions (exon III M1 Val213 to Z Ala213 mutation with the probes G-GAC-CAG-GTG-ACC-ACC-GTG for the M1 gene and G-GAC-CAG-GCG-ACC-ACC-GTG for the Z gene; exon V M1 Glu342 to Z Lys342 mutation with the probes ACC-ATC-GAC-GAG-AAA-GGG-A for the M1 gene and ACC-ATC-GAC-AAG-AAA-GGG-A for the Z gene). The genomic DNA to be evaluated was cut with the restriction endonucleases EcoRI and BglI; together these two enzymes conveniently cleave the alpha 1AT gene such that each of the five exons is contained within a DNA fragment of a different size (23). All oligonucleotide probes were labeled with incorporation of [32P]dNTPs by Escherichia coli DNA polymerase (Klenow fragment) using reverse complementary templates and a small 8-mer primer (27). The evaluation of various genomic DNA samples using these oligonucleotides was carried out as previously described (23), except that the washing step at 55 °C for 3 min was omitted with the exon III probes.
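The loss of the BstEII site can be checked directly on the two exon III probe sequences. The following is a minimal sketch, not part of the original methods: it assumes the recognition sequence G↓GTNACC given above, writes the Ala213 probe with the GCG codon stated for the mutation, and uses hypothetical names (`BSTEII_SITE`, `has_bsteii_site`) chosen only for illustration.

```python
import re

# BstEII recognizes GGTNACC (N = any base). The site reads the same on both
# strands, so searching one strand is sufficient.
BSTEII_SITE = re.compile(r"GGT[ACGT]ACC")

# 19-mer exon III probe sequences from the text, with hyphens removed.
EXON_III_PROBES = {
    "Val213 (M1)": "GGACCAGGTGACCACCGTG",
    "Ala213 (Z)": "GGACCAGGCGACCACCGTG",
}


def has_bsteii_site(sequence):
    """Return True if the sequence contains a BstEII recognition site."""
    return BSTEII_SITE.search(sequence.upper()) is not None


for name, sequence in EXON_III_PROBES.items():
    print(name, "BstEII site present:", has_bsteii_site(sequence))
```

In the Val213 sequence the site spans the codons CAG-GTG-ACC (GGTGACC); the T to C change in codon 213 gives GGCGACC, which no longer matches.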

RESULTS AND DISCUSSION
Soon after the discovery of alpha 1AT deficiency in 1963 (28), it was recognized that the Z protein had a different charge than what was then called the M protein (29). In 1976, a tryptic peptide of the Z protein was found to differ from the corresponding peptide of the M protein by a loss of a glutamic acid residue and an addition of a lysine (30). Subsequent sequencing of an 8-amino acid segment of this region of the M and Z proteins confirmed the Glu to Lys substitution (31).
Finally, when the entire M1 protein was sequenced by Carrell et al. (32), the partial human cDNA and the entire baboon cDNA by Kurachi et al. (33), and the entire M1 cDNA by Long et al. (3), it became apparent that the involved glutamic acid was situated at residue 342. The universality of the Glu342 to Lys342 (GAG to AAG in the genome) difference in all Z genes was confirmed at the genomic level by Kidd et al. (24) and by Nukiwa et al. (23) using 19-mer oligonucleotide probes centered about the Glu342 to Lys342 substitution.
With this as a background, despite the fact that more than 60% of the Z protein or gene had not been sequenced (12, 31, 34), it has been generally assumed that the Glu342 to Lys342 substitution was the only difference in the primary structure between the Z and M1 proteins (32). However, when we sequenced the entire protein-coding exon region of the Z alpha 1AT gene, we found that in addition to the classic exon V Glu342 to Lys342 substitution, the Z gene contained a second substitution (exon III, Val213 to Ala213; Fig. 1). In addition, the Z gene contains a silent base change in exon II (AAG to AAA) that codes for Lys129 in both the M1 and Z proteins.
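The three base changes described for the Z gene can be translated with the standard genetic code to confirm which are substitutions and which is silent. This is an illustrative sketch only: the codon table is limited to the five codons named in the text, and the names `CODON_TO_AMINO_ACID`, `Z_GENE_CHANGES`, and `describe_change` are hypothetical.

```python
# Standard genetic code, restricted to the codons mentioned in the text.
CODON_TO_AMINO_ACID = {
    "GTG": "Val",
    "GCG": "Ala",
    "GAG": "Glu",
    "AAG": "Lys",
    "AAA": "Lys",
}

# (exon, residue number, M1 codon, Z codon) as reported for the Z gene.
Z_GENE_CHANGES = [
    ("exon II", 129, "AAG", "AAA"),
    ("exon III", 213, "GTG", "GCG"),
    ("exon V", 342, "GAG", "AAG"),
]


def describe_change(residue, ref_codon, alt_codon):
    """Describe the amino acid consequence of a codon change."""
    ref_aa = CODON_TO_AMINO_ACID[ref_codon]
    alt_aa = CODON_TO_AMINO_ACID[alt_codon]
    if ref_aa == alt_aa:
        return f"silent ({ref_aa}{residue} unchanged)"
    return f"{ref_aa}{residue} to {alt_aa}{residue}"


for exon, residue, ref_codon, alt_codon in Z_GENE_CHANGES:
    print(f"{exon}: {ref_codon} to {alt_codon}: "
          f"{describe_change(residue, ref_codon, alt_codon)}")
```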
Inheritance of the Z alpha 1AT gene has several consequences: 1) the Z protein aggregates in the rough endoplasmic reticulum of the alpha 1AT-secreting cells (35, 36); 2) there is a reduced rate of secretion of the molecule by these cells (6, 37-39); 3) the plasma levels of alpha 1AT are markedly reduced (28); and 4) the Z protein does not function as well as an inhibitor of neutrophil elastase (40). All available evidence suggests that in the ZZ homozygous state the Z gene is transcribed in a normal fashion, that alpha 1AT-synthesizing cells have normal levels of alpha 1AT mRNA, and that the Z type mRNA can be translated in a normal fashion (6, 37-39). However, studies with liver and mononuclear phagocytes of such individuals have shown that these cells secrete less alpha 1AT than those of normals (6, 39). Consistent with this fact, light microscopic evaluation of biopsies of liver of ZZ individuals demonstrates intracellular accumulation of alpha 1AT, and transmission electron microscopic evaluation of these specimens has shown that the alpha 1AT accumulates in the rough endoplasmic reticulum (36). Furthermore, evaluation of the intracellular form of alpha 1AT recovered from such livers demonstrated that it contains "high mannose" carbohydrate side chains (41). Together, this evidence has led to the concept that as the Z protein is produced, N-linked carbohydrates are normally added. However, liver accumulation and the plasma deficiency associated with the homozygous Z state result from a decreased rate of folding of the high mannose form of alpha 1AT in the rough endoplasmic reticulum, allowing hydrophobic residues in adjacent molecules to interact, leading to aggregation. Interestingly, those Z type alpha 1AT molecules that are translocated to the Golgi undergo normal trimming of the carbohydrate side chains, and such molecules are normally secreted (41) and have a normal circulating half-life (9). However, a recent study by Ogushi et al. (40) has demonstrated that the Z type molecule has a significantly reduced association rate constant for neutrophil elastase. In this context, in addition to the fact that the ZZ homozygous state is associated with a marked reduction in alpha 1AT levels, on the average, the Z type molecule takes longer than does the M type molecule to inhibit an equivalent amount of neutrophil elastase.
The relative importance of the newly identified Val213 to Ala213 mutation compared to the classic Glu342 to Lys342 mutation to each of these abnormalities associated with the Z protein is not known. The normal alpha 1AT protein has been crystallized and its three-dimensional structure determined, but the three-dimensional structure of the Z protein has not been evaluated (42). In the M protein structure, the Glu342 residue is located in sheet A strand 5 and the Val213 residue at the turn of segment 202-223, which forms a strongly twisted, double-stranded antiparallel ladder. It has been hypothesized that the Glu342 to Lys342 substitution results in a loss of a critical salt bridge (Glu342 to Lys290) which has an effect on the rate of folding of the inhibitor, perhaps explaining the reduction in the rate of three-dimensional folding of the Z protein in the rough endoplasmic reticulum. The Val213 does not appear to participate in any critical salt bridges, nor does it appear in the three-dimensional structure near the active site at Met358. It is, however, reasonably close (in the tertiary structure) to ... and hence to a carbohydrate attachment site (42). Whether this has any consequence to the intracellular handling of the molecule, or whether the Val213 to Ala213 substitution (or the Glu342 to Lys342 substitution) has any effect on the association rate constant of the interaction with neutrophil elastase, is unknown.
Despite the fact that the importance of the Val213 to Ala213 substitution is not known, the knowledge of its presence has led us to the identification of a previously unrecognized, but common polymorphic form of the normal M1 gene. Evaluation of the normal M1 gene sequence in the Val213 region revealed that the endonuclease BstEII normally cuts in the sequences in exon III coding for the amino acids Gln212-Val213-Thr214. Theoretically, however, with the substitution GTG to GCG (Val213 to Ala213) in the Z gene, this BstEII restriction site would be lost. Evaluation of genomic DNA from individuals homozygous for the M1 gene and those homozygous for the Z gene demonstrated this to be the case. In this context, if the Z gene is cut with PstI and BstEII, there is no BstEII site in exon III, and thus a single 0.95-kb fragment is generated that can be detected with an exon III probe (Fig. 2, lane 1). In contrast, in the M1 gene, the presence of the exon III BstEII site leads to the generation of the 0.72-kb fragment (Fig. 2, lane 2; a 0.23-kb fragment is also generated, but it does not appear on the autoradiogram because it does not bind efficiently to the filter). We initially thought this loss of a restriction site associated with the Z gene would be useful as a method to uniquely identify the Z gene from the common M-family haplotypes. However, in evaluating this hypothesis, we soon realized that a significant proportion of genomic DNA samples that had been identified as being M1M1 homozygotes by conventional criteria (13) could be further subgrouped depending on whether they contained or did not contain the BstEII restriction site. In this context, some M1M1 samples contained the BstEII site (Fig. 2, lane 2), while others were homozygous for the absence of this site (Fig. 2, lane 3), and still others were heterozygous for this site (Fig. 2, lane 4). However, when M2, M3, and S haplotypes were evaluated, all were BstEII positive (i.e. all have the same sequence in the 213 region as the classic M1 gene; data not shown). Thus, it became apparent that alpha 1AT haplotypes thought to be M1 can actually be M1(Val213) or M1(Ala213).² Comparison of the IEF patterns of serum of individuals homozygous for M1(Val213) and M1(Ala213) demonstrated they were identical (data not shown), as might be expected by a substitution of the neutral amino acid Ala213 for the neutral amino acid Val213. However, both M1(Val213) and M1(Ala213) could be easily distinguished from the other common M-family haplotypes M2 and M3. Furthermore, the M1(Ala213) haplotype was found to be transmitted in a codominant autosomal fashion and was associated with normal serum levels of alpha 1AT (not shown).
² An unpublished sequence of an alpha 1AT cDNA (S. L. C. Woo and E. W. Davie) referred to by Carrell et al. (32) and also by Rosenberg et al. (43) showed an Ala at amino acid 213; presumably this cDNA represents M1(Ala213).
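The band patterns expected on the autoradiogram for each BstEII genotype follow from the fragment sizes given above. The sketch below is a simplified model under stated assumptions: each haplotype contributes either the uncut 0.95-kb PstI/BstEII fragment or the 0.72-kb and 0.23-kb products, the 0.23-kb product is treated as undetected as the text describes, and a heterozygote is assumed to show both detectable bands. All names are hypothetical.

```python
# Fragment sizes (kb) detected with the exon III probe after PstI + BstEII digestion.
FULL_FRAGMENT_KB = 0.95          # exon III BstEII site absent (Ala213)
CUT_FRAGMENTS_KB = (0.72, 0.23)  # exon III BstEII site present (Val213)
UNDETECTED_KB = {0.23}           # reported not to appear on the autoradiogram


def visible_bands(site_on_haplotype_1, site_on_haplotype_2):
    """Return the sorted band sizes (kb) expected for a two-haplotype genotype."""
    bands = set()
    for has_site in (site_on_haplotype_1, site_on_haplotype_2):
        if has_site:
            bands.update(kb for kb in CUT_FRAGMENTS_KB if kb not in UNDETECTED_KB)
        else:
            bands.add(FULL_FRAGMENT_KB)
    return sorted(bands)


print("+/+", visible_bands(True, True))    # site present on both haplotypes
print("-/-", visible_bands(False, False))  # site absent on both haplotypes
print("+/-", visible_bands(True, False))   # heterozygous for the site
```

Because a ZZ homozygote and an M1(Ala213) homozygote both give the -/- pattern, the BstEII pattern alone does not distinguish them; residue 342 must also be examined.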
Construction of 32P-labeled oligonucleotide probes complementary to the sequence differences among the M1(Val213), M1(Ala213), and Z genes centered about residues 213 and 342 verified that those genes corresponding to IEF patterns of serum identified as ZZ together with genomic DNA BstEII patterns that were -/- (i.e. only the 0.95-kb fragment generated) are homozygous for the Ala213 and Lys342 sequences (Fig. 3A). Those genes corresponding to IEF patterns of serum identified as M1M1 together with genomic DNA BstEII patterns that were +/+ (i.e. only the 0.72-kb fragment seen) are homozygous for the Val213 and Glu342 sequences (Fig. 3B). However, the M1(Ala213)M1(Ala213) genes (i.e. those corresponding to the IEF patterns of serum identified as M1M1 but with the genomic DNA BstEII patterns that were -/-) are homozygous for the Ala213 and Glu342 sequences (Fig. 3C). Furthermore, when 46 genomic samples identified by the combination of isoelectric focusing of serum, serum alpha 1AT levels, family studies, and BstEII restriction patterns of genomic DNA as being M1(Val213)M1(Val213), M1(Val213)M1(Ala213), M1(Ala213)M1(Ala213), or ZZ were evaluated, there was a 100% correlation with the corresponding Val213-Glu342 (M1(Val213)), Ala213-Glu342 (M1(Ala213)), or Ala213-Lys342 (Z) sequences as identified with the respective oligonucleotide probes (Table I).

FIG. 3.
Exon III Val213 probe: G GAC CAG GTG ACC ACC GTG
Exon III Ala213 probe: G GAC CAG GCG ACC ACC GTG
Exon V Glu342 probe: ACC ATC GAC GAG AAA GGG A
Exon V Lys342 probe: ACC ATC GAC AAG AAA GGG A
The exon III Val213 to Ala213 substitution (M1(Val213) to M1(Ala213) or Z) is caused by a base change of GTG to GCG. The 19-mer oligonucleotide gene probes used to detect this change are indicated as the "exon III Val213 probe" and "exon III Ala213 probe." From the BstEII data (Fig. 2), it would be expected that the exon III Val213 probe would hybridize to the M1(Val213) haplotype but not to the Z or M1(Ala213) haplotype while the exon III Ala213 probe would hybridize in the reverse order. The exon V Glu342 to Lys342 mutation (M1(Val213) or M1(Ala213) to Z) is caused by a base change of GAG to AAG; the probes used to detect this change are indicated as the "exon V Glu342 probe" and the "exon V Lys342 probe." From prior studies (23, 24), it is known that the exon V Glu342 probe hybridizes to all M1 genes tested (presumably including M1(Ala213) as well) but not to the Z gene, while the exon V Lys342 probe hybridizes to all Z genes but not M1 (and presumably M1(Ala213)). All oligonucleotide probes were labeled with incorporation of [32P]dNTPs by E. coli DNA polymerase (Klenow fragment; Pharmacia) using reverse complementary templates and a small 8-mer primer (27). Bases in the probes that are labeled are indicated by *. Genomic DNA from various sources was cut with the endonucleases BglI and EcoRI, 5 ...
Interestingly, when we used the exon III Val213 and exon III Ala213 oligonucleotide probes to evaluate genomic DNA from individuals previously identified as being M1M1 homozygotes, it became apparent that the M1(Ala213) haplotype was relatively common. In this regard, oligonucleotide evaluation of 26 genomic samples of Caucasians thought to be M1M1 homozygotes revealed that 16 were homozygous with the exon III Val213 probe (i.e. true M1(Val213)M1(Val213) homozygotes), eight were heterozygous for the exon III Val213 and exon III Ala213 probes (i.e. M1(Val213)M1(Ala213) heterozygotes), and two were homozygous for the exon III Ala213 probe (i.e. M1(Ala213)M1(Ala213) homozygotes). Assuming these frequencies hold for the Caucasian population as a whole, these data suggest haplotype frequencies (among haplotypes previously identified as M1) for M1(Val213) of 77% and for M1(Ala213) of 23%. In this context, the M1(Ala213) gene is likely as frequent as the previously identified M-family haplotype M2 and more frequent than M3 (14, 15).
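The 77% and 23% figures follow from counting haplotypes in the 26 genotyped individuals (52 haplotypes in total). A short worked calculation, with a hypothetical helper name, is:

```python
def haplotype_frequencies(n_val_val, n_val_ala, n_ala_ala):
    """Return (Val213, Ala213) haplotype frequencies from genotype counts."""
    total_haplotypes = 2 * (n_val_val + n_val_ala + n_ala_ala)
    val_haplotypes = 2 * n_val_val + n_val_ala  # two per homozygote, one per heterozygote
    ala_haplotypes = 2 * n_ala_ala + n_val_ala
    return val_haplotypes / total_haplotypes, ala_haplotypes / total_haplotypes


# Genotype counts reported for the 26 M1M1 individuals.
val_freq, ala_freq = haplotype_frequencies(n_val_val=16, n_val_ala=8, n_ala_ala=2)
print(f"M1(Val213): {val_freq:.0%}, M1(Ala213): {ala_freq:.0%}")
```

That is, 40 of 52 haplotypes carry Val213 and 12 of 52 carry Ala213.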
Like its unknown importance to the Z gene or protein, the functional importance of the Val213 to Ala213 mutation to the M1 gene or protein is unknown. However, our preliminary studies comparing the individuals homozygous for M1(Val213)M1(Val213) to those homozygous for M1(Ala213)M1(Ala213) have failed to reveal any marked differences in alpha 1AT levels or function.
The fact that the Z gene differs from the M1(Val213) gene by more than one mutation, and that some individuals thought to have the M1(Val213) alpha 1AT haplotype actually have the M1(Ala213) haplotype, leads to two interesting conclusions. First, since the Z sequence differs from the M1(Val213) sequence at two sites (amino acids 213 and 342), the Z gene could not have evolved from the M1(Val213) gene (or vice versa) directly by a single mutational event (Fig. 4). Second, the available evidence suggests that the M1(Ala213) gene was an evolutionary intermediate between the M1(Val213) and Z genes. In this regard, the M1(Ala213) gene sequence³ is identical to the baboon alpha 1AT sequence (33) at the codons Lys129 (AAG), Ala213 (GCG), and Glu342 (GAG). In contrast, the M1(Val213) sequence differs from the baboon at one codon (M1(Val213) (GTG), baboon Ala213 (GCG)) and the Z gene differs from the baboon at two codons (Z Lys129 (AAA), Z Lys342 (AAG); baboon Lys129 (AAG), baboon Glu342 (GAG)) (Fig. 4).
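The codon comparison underlying this argument can be tabulated explicitly. The sketch below uses only the three codons discussed in the text (residues 129, 213, and 342) and hypothetical names; it is an illustration of the counting, not a phylogenetic analysis.

```python
# Codons at residues 129, 213, and 342 as given in the text.
CODONS = {
    "baboon":     {129: "AAG", 213: "GCG", 342: "GAG"},
    "M1(Ala213)": {129: "AAG", 213: "GCG", 342: "GAG"},
    "M1(Val213)": {129: "AAG", 213: "GTG", 342: "GAG"},
    "Z":          {129: "AAA", 213: "GCG", 342: "AAG"},
}


def codon_differences(name_a, name_b):
    """Return the residue numbers at which two sequences carry different codons."""
    a, b = CODONS[name_a], CODONS[name_b]
    return sorted(residue for residue in a if a[residue] != b[residue])


for name in ("M1(Ala213)", "M1(Val213)", "Z"):
    print(f"{name} vs baboon: differs at {codon_differences(name, 'baboon')}")

print("Z vs M1(Val213):", codon_differences("Z", "M1(Val213)"))
print("Z vs M1(Ala213):", codon_differences("Z", "M1(Ala213)"))
```

At these three positions, M1(Ala213) lies one codon change from M1(Val213) and two from Z, whereas M1(Val213) and Z differ at all three, which is consistent with M1(Ala213) being the intermediate.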