Characterization of the human Na,K-ATPase alpha 2 gene and identification of intragenic restriction fragment length polymorphisms.

We have determined the structure of the gene that encodes the alpha 2 isoform of the human Na,K-ATPase. The gene contains 23 exons and spans approximately 25 kilobases. The amino acid sequence of the human alpha 2 isoform deduced from the genomic sequence exhibits 99% identity to the rat alpha 2 isoform. One of the nine amino acid differences between the human and rat sequences occurs at an amino acid position which is known to be involved in species differences in sensitivity of the alpha 1 isoform to cardiac glycosides. Approximately 1500 base pairs of sequence flanking the 5' end of the alpha 2 gene have been determined. This region contains numerous potential AP-1, AP-2, and NF-1-binding sites, a potential Sp1 recognition site, and several sequences that are similar to the glucocorticoid receptor-binding site. The transcription start site was mapped by primer extension and S1 nuclease protection analyses of RNA from human brain, skeletal muscle, and heart. Multiple transcription initiation sites are clustered between residues -104 to -99 relative to the translation initiation codon. A potential TATA box is located 29 base pairs upstream of the first transcription initiation site. Immediately 5' to the apparent TATA box is a 35-base pair polypurine.polypyrimidine tract containing an imperfect mirror repeat which resembles sequences that form triple-stranded structures. Two intragenic DNA probes which detect restriction fragment length polymorphisms associated with the alpha 2 gene have been identified. These probes will be useful in genetic linkage analyses designed to define the possible role of the Na,K-ATPase in certain hereditary disorders.

* This work was supported in part by National Institutes of Health Grant HL28573. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J05096.

‡ Supported by a predoctoral fellowship from the Albert J. Ryan Foundation.

To whom correspondence should be addressed.

The Na,K-transporting ATPase (EC 3.6.1.37), an integral membrane protein present in all animal cells, is responsible for maintaining Na+ and K+ gradients across the plasma membrane. The enzyme consists of two subunits, a large (Mr ~112,000) catalytic subunit (α) and a smaller (protein component, Mr ~35,000) glycoprotein subunit (β) whose function is unknown. Multiple isoforms of the α subunit (α1, α2, and α3)¹ have been identified using biochemical (Sweadner, 1979; Lytton, 1985a; Hsu and Guidotti, 1989) and cDNA cloning (Shull et al., 1986; Hara et al., 1987) techniques. These isoforms exhibit differences in tissue specificity, developmental patterns of expression, hormonal regulation (reviewed in Lingrel et al., 1989), sensitivity to cardiac glycosides (Sweadner, 1979), and affinity for Na+ (Lytton, 1985b). The α1 isoform, although most abundant in kidney, is expressed in a broad range of tissues, while the α2 and α3 isoforms exhibit a more limited tissue distribution. Studies of rat Na,K-ATPase mRNA abundance (Young and Lingrel, 1987; Herrera et al., 1987; Orlowski and Lingrel, 1988; Schneider et al., 1988) indicate that the α2 isoform is found predominantly in brain, skeletal muscle, and heart, increasing in all three tissues during development. The α3 isoform is found primarily in brain and to a lesser extent in skeletal muscle and heart. A developmentally regulated transition in the expression of α2 and α3 mRNAs occurs in heart. The α3 and α1 mRNAs are predominant in fetal and neonatal heart, while α2 and α1 predominate in juvenile and adult heart (Orlowski and Lingrel, 1988; Schneider et al., 1988).
The functioning of the Na,K-ATPase is essential for a variety of cell homeostatic processes including osmotic balance and cell volume regulation, Na+-coupled transport of nutrients including glucose and amino acids, and maintenance of the resting membrane potential (reviewed in DeWeer, 1985). In addition, certain specialized functions including electrical excitability of nerve and muscle and fluid movement across kidney and intestinal transport epithelia require Na,K-ATPase activity (DeWeer, 1985). The enzyme may also facilitate fluid absorption from the lung at birth (Bland and Boyd, 1986) and may play a role in determining the ionic composition of the cerebrospinal fluid (Vates et al., 1964) and aqueous humor (Cole, 1961).
The functional diversity of Na,K-ATPase activity is due in part to the existence of multiple α isoforms that are encoded by separate genes. Human genes encoding the α1, α2, and α3 isoforms have been isolated (Shull and Lingrel, 1987; Sverdlov et al., 1987), and one of these, α3, has been partially sequenced (Ovchinnikov et al., 1988). Two additional genomic sequences exhibiting nucleotide and deduced amino acid similarity to the α isoforms have also been identified (Shull and Lingrel, 1987; Sverdlov et al., 1987). The members of this gene family are dispersed in the human genome. The α1 and α2 genes, designated ATP1A1 and ATP1A2, are located on the short and long arms, respectively, of chromosome 1, while the α3 gene, ATP1A3, is located on chromosome 19 (Yang-Feng et al., 1988). One of the α-like genomic sequences, ATP1AL2, is physically linked to the α2 gene (Shull and Lingrel, 1987). A fifth α-like sequence, ATP1AL1, is located on chromosome 13 (Yang-Feng et al., 1988). Sequences of a human β subunit gene and pseudogene have also been determined (Lane et al., 1989) and have been localized to human chromosomes 1 and 4, respectively (Yang-Feng et al., 1988).

¹ The nomenclature is that used by a number of investigators (Felsenfeld and Sweadner, 1988; Hsu and Guidotti, 1989; Ismail-Beigi et al., 1988). α1 or αI refers to the predominant kidney isoform with NH2 terminus Met-Gly-Lys-Gly-Val; α2 or αII, previously referred to as α(+) (Sweadner, 1979; Lytton, 1985a), has NH2 terminus Met-Gly-Arg-Gly-Ala; α3, identified initially by cDNA cloning and referred to as αIII (Shull et al., 1986), has NH2 terminus Met-Gly-Asp-Lys-Lys.
In order to understand the molecular mechanisms involved in the regulated developmental and tissue-specific expression of the Na,K-ATPase α isoforms and to investigate the role of the enzyme in human disease processes, we have characterized the gene encoding the human Na,K-ATPase α2 isoform and its 5'-flanking sequences. Based on the genomic sequence, we have deduced the amino acid sequence of the human α2 protein. We have also identified intragenic probes which detect restriction fragment length polymorphisms. These probes will be useful in investigating the genetic linkage of the Na,K-ATPase to hereditary diseases potentially involving the sodium pump.

EXPERIMENTAL PROCEDURES
DNA Sequencing-Restriction fragments from phage clones CL6-2, CL23-1, and CL30-2 were subcloned into either M13mp18 and M13mp19 or pIBI31 (International Biotechnologies, Inc.), and a series of nested deletions was prepared using Cyclone or Tornado deletion subcloning kits (International Biotechnologies, Inc.). Single-stranded templates were sequenced by the dideoxy method of Sanger et al. (1977) using [α-35S]dATP and DNA sequencing kits from Amersham Corp., Pharmacia LKB Biotechnology, Inc., or United States Biochemical Corp. Custom-designed oligonucleotides synthesized on an Applied Biosystems DNA synthesizer model 380A were occasionally used as sequencing primers. Deoxyinosine triphosphate was frequently substituted for dGTP in order to relieve compressions. All restriction sites were sequenced across except the EcoRI sites in introns 1, 6, 7, 10, and 20 and a HindIII site in intron 13. With the exception of small EcoRI and HindIII fragments which may have been present at these restriction sites, the gene was sequenced in its entirety. Seventy-five percent of the sequence indicated in Fig. 1 was obtained in both strands. Computer analyses were performed using the program DNANALYZE, version 22.1 (Wernke and Thompson, 1989).
Protein Structure Analysis-The predicted secondary structure of the human α2 protein was determined using the methods of Chou and Fasman (1974) and Garnier et al. (1978). Hydrophobicity and potential membrane-associated helical regions were predicted using the algorithms of Kyte and Doolittle (1982) and Eisenberg et al. (1984).
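As an illustration of the kind of hydropathy calculation referred to above, the following is a minimal Python sketch of a Kyte and Doolittle sliding-window average. It is not the software used in this work. The function name `hydropathy_profile`, the default window of 13 residues (the window quoted for the hydropathy plot in this paper), and the toy peptide are assumptions made for the example; only the per-residue index values are the published Kyte and Doolittle scale.

```python
# Kyte-Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=13):
    """Return the mean hydropathy of every full window along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide (NOT the alpha 2 sequence): a lysine-rich hydrophilic
# stretch followed by a hydrophobic stretch of the kind expected in a
# membrane-spanning segment.
toy = "KKKQKEKELDELKK" + "LLIVALLGAVIAFL"
profile = hydropathy_profile(toy, window=13)
print(f"{len(profile)} windows; first {profile[0]:.2f}, last {profile[-1]:.2f}")
```

Windows with a strongly positive mean are candidates for membrane-spanning segments. The threshold and any smoothing applied are choices left to the analyst and are not specified here.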
RNA Isolation-Total RNA was isolated from adult human brain, skeletal muscle, and heart, and from rat brain and heart using the procedure of Chirgwin et al. (1979). Poly(A)+ RNA from human heart and rat brain and heart was obtained by affinity chromatography on oligo(dT)-cellulose.
Primer Extension Analysis-Primer extension analysis was performed according to the method described. Two synthetic oligonucleotides complementary to the human α2 gene, nucleotides -8 to -35 and -36 to -65 relative to the translation initiation site, and two oligonucleotides complementary to the rat α2 cDNA, nucleotides -8 to -36 and -37 to -66 relative to the translation initiation site, were end-labeled using [γ-32P]ATP and T4 polynucleotide kinase. The labeled primers were hybridized to RNA by incubating overnight at 30 °C in S1 hybridization solution (80% formamide, 40 mM PIPES,² pH 6.4, 400 mM NaCl, and 1 mM EDTA). After precipitation of the annealed primer and template, the hybridized primers were extended by incubating with 40 units of AMV reverse transcriptase in reverse transcriptase buffer (50 mM Tris/HCl, pH 8.0, 5 mM MgCl2, 5 mM dithiothreitol, 50 mM KCl) containing 0.55 mM of each deoxynucleotide triphosphate. The samples were precipitated by addition of ethanol, denatured, and analyzed on an 8% polyacrylamide sequencing gel.

² The abbreviations used are: PIPES, 1,4-piperazinediethanesulfonic acid; kb, kilobase pair(s); bp, base pair(s); ClRATP, γ-[4-(N-2-chloroethyl-N-methylamino)]benzylamide ATP.

S1 Nuclease Mapping-S1 nuclease protection of the 5' end of the α2 transcript was performed as described (Greene, 1987). Briefly, a 32P 5'-end-labeled synthetic oligonucleotide complementary to nucleotides -8 to -35 was annealed to a single-stranded M13mp18 template containing a 2.5-kb EcoRI insert covering the putative transcription initiation site. The primer was extended using the large fragment of DNA polymerase I and the resulting double-stranded product cleaved with SphI. After denaturing alkaline agarose gel electrophoresis, a radiolabeled single-stranded fragment, 230 nucleotides in length, was eluted from the gel. 5 × 10* cpm of probe and 50 μg of total RNA or 15 μg of poly(A)+ RNA were incubated in 40 μl of S1 hybridization solution at 65 °C for 10 min, then at 30 °C for 18 h. The samples were digested with 50, 100, or 300 units of S1 nuclease by incubating for 1 h at 30 °C in 300 μl of S1 buffer (280 mM NaCl, 50 mM sodium acetate, pH 4.5, and 4.5 mM ZnSO4) containing 6 μg of denatured salmon sperm DNA and S1 nuclease. The protected fragments were precipitated by addition of ethanol, denatured, and analyzed on an 8% polyacrylamide sequencing gel.
Preparation of Repeat-free, Gene-specific Probes-DNA was isolated from recombinant phage clones containing human Na,K-ATPase α2 sequences, digested with restriction endonucleases, separated by agarose gel electrophoresis, and transferred to nylon membranes. Total human genomic DNA, labeled with 32P by nick translation, was hybridized to the Southern blots using conditions described previously (Shull and Lingrel, 1987). Fragments which appeared to lack repetitive elements, based on the absence of an autoradiographic signal, were isolated by preparative gel electrophoresis and subcloned into the plasmid vector pIBI31. The insert-containing plasmids were labeled with 32P by nick translation and used as probes on Southern blots of restriction enzyme-digested DNAs from phage clones containing inserts representing each of the α genes ATP1A1, ATP1A2, ATP1AL1, and ATP1AL2 in order to confirm gene specificity. The same probes were also used on restriction enzyme-digested human genomic DNA to confirm gene specificity and absence of repetitive elements.

RESULTS
Structure of the Human Na,K-ATPase α2 Gene-The nucleotide sequence of the human Na,K-ATPase α2 gene was determined by sequencing genomic DNA inserts from phage clones CL6-2, CL23-1, and CL30-2, which were reported previously (Shull and Lingrel, 1987). Seventy-five percent of the sequence, including the entire 5'-flanking sequence, the 5'-untranslated region, and all coding exons, was determined in both strands. The gene contains 23 exons and spans approximately 25 kb of genomic DNA. As illustrated in Fig. 1, the exons are clustered in three regions. The first exon, containing 5'-untranslated sequences and nucleotides encoding the first four amino acids of the primary translation product, is separated from the remaining exons by a 5.0-kb intron. Exons 2 through 13 occupy the central portion of the gene, while exons 14 through 23 are clustered at the 3' end. The protein coding exons range in size from 60 to 269 bp, while introns vary in length from 129 bp to 5.0 kb. As shown in Table I, the intron/exon boundary sequences conform to published consensus sequences (Breathnach and Chambon, 1981; Mount, 1982). Each splice donor site begins with GT, while each splice acceptor site ends with an AG and is preceded by a polypyrimidine tract. The locations of introns within the protein coding sequence are illustrated in Fig. 2.
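The splice-junction criteria stated above (GT at the donor site, AG at the acceptor site, a pyrimidine-rich tract preceding the AG) can be expressed as a simple check. The sketch below is illustrative only: the function names, the 15-base tract length, and the toy intron are assumptions introduced for the example and are not sequences or parameters from Table I.

```python
def follows_gt_ag_rule(intron):
    """True if the intron starts with GT and ends with AG (case-insensitive)."""
    intron = intron.lower()
    return len(intron) >= 4 and intron.startswith("gt") and intron.endswith("ag")


def pyrimidine_fraction(intron, span=15):
    """Fraction of C/T among the `span` bases immediately 5' of the terminal AG."""
    tract = intron.lower()[-(span + 2):-2]
    if not tract:
        return 0.0
    return sum(base in "ct" for base in tract) / len(tract)


# Hypothetical intron (NOT taken from the alpha 2 gene): donor GT, an arbitrary
# interior, a 15-base polypyrimidine tract, and the acceptor AG.
toy_intron = "gtgagtacctgaattcgga" + "tctcttcctttcccc" + "ag"

n_exons = 23                 # from the text
n_introns = n_exons - 1      # 22 introns separate 23 exons

print(follows_gt_ag_rule(toy_intron), pyrimidine_fraction(toy_intron), n_introns)
```

A real comparison against the consensus sequences of Breathnach and Chambon (1981) or Mount (1982) would score several positions on each side of the junction. This sketch checks only the invariant dinucleotides and the tract composition.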
Deduced Sequence of the Human Na,K-ATPase α2 mRNA and Protein-A composite sequence representing the human α2 mRNA is shown in Fig. 2. Portions of the genomic sequence corresponding to coding regions could be unambiguously determined because of the high similarity between the human and rat α2 amino acid sequences. Also, sequences at predicted intron-exon boundary positions agree with published consensus sequences (Table I). As described below, the 5' end of the transcript was determined by S1 nuclease protection and primer extension analyses. The 3' sequence shown in Fig. 2 includes three potential polyadenylation signals analogous to those used in the rat. Based on homology to the consensus, AATAAA, and to the rat sequence around the polyadenylation signals, it is expected that the second and third sites would be used in the human. However, it is not clear if the first site is used. The human α2 protein coding sequence exhibits 90% nucleotide identity to that of the rat α2 cDNA (Shull et al., 1986). The sequences of the 5'-untranslated and 3'-untranslated regions (excluding the Alu element in the 3'-untranslated region) exhibit 89 and 66% identity, respectively, to the corresponding regions (excluding the AC repeat in the 3'-untranslated region) of the rat α2 cDNA.
The deduced human α2 amino acid sequence exhibits 99% identity to the rat sequence. There are no insertions or deletions in the human sequence relative to that of rat and only nine amino acid differences occur between the two sequences, most of which are conservative changes. These are (human amino acid → rat): Val-64 → Ile, Ile-102 → Leu, Gln-111 → Leu, Val-127 → Ile, Arg-274 → Gln, Met-644 → Val, Lys-675 → Arg, Asn-676 → Asp, and Met-887 → Thr. One of these amino acid differences, Gln-111, occurs at a position which is involved in species differences in sensitivity of the α1 isoform to cardiac glycosides (Price and Lingrel, 1988). The human α2 primary translation product, like that of the rat, is 1020 amino acids in length. The first five amino acids, numbered -5 to -1 in Fig. 2, are identical to those of the rat, which apparently are removed by posttranslational processing (Lytton, 1985a; Shull et al., 1986). If this is also true for human α2, then the mature protein would consist of 1015 amino acids and have a Mr of 111,857.
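The 99% figure follows directly from the numbers given above. The short Python sketch below reproduces the arithmetic. It assumes identity is computed over the full 1020-residue primary translation product with no gaps (the text states there are no insertions or deletions); the variable and function names are illustrative.

```python
# Human -> rat substitutions as listed in the text (mature-protein numbering).
substitutions = [
    ("Val", 64, "Ile"), ("Ile", 102, "Leu"), ("Gln", 111, "Leu"),
    ("Val", 127, "Ile"), ("Arg", 274, "Gln"), ("Met", 644, "Val"),
    ("Lys", 675, "Arg"), ("Asn", 676, "Asp"), ("Met", 887, "Thr"),
]


def percent_identity(length, differences):
    """Percent identity for an ungapped alignment of the given length."""
    return 100.0 * (length - differences) / length


primary_length = 1020                 # primary translation product
mature_length = primary_length - 5    # after removal of residues -5 to -1

identity = percent_identity(primary_length, len(substitutions))
print(f"{identity:.1f}% identity; mature protein {mature_length} residues")
```

Computing over the 1015-residue mature protein instead changes the result only in the second decimal place.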
Location of the Transcription Initiation Site and Nucleotide Sequence of the 5'-Flanking Region-To map the α2 transcription start sites, primer extension and S1 nuclease protection analyses were performed. For primer extension analysis, two primers complementary to the 5'-untranslated sequence, depicted in Fig. 3, were used to analyze human total RNA isolated from adult brain and skeletal muscle and poly(A)+ RNA from heart. As shown in Fig. 4, a cluster of apparent transcription initiation sites located within a single 6-bp region was observed in all three tissues with both primers. Four predominant bands were observed at positions -104, -103, -100, and -99 relative to the start of translation. For comparison, primer extension analysis was also performed using rat brain and heart poly(A)+ RNA. In rat brain and heart samples, a cluster of apparent initiation sites was also observed, with the major bands located at positions -108 and -104 relative to the translation start site. The site at position -104 is only 6 bp beyond the 5'-terminal end of the rat brain α2 cDNA described previously (Shull et al., 1986).
To confirm the location of the transcription initiation sites indicated by primer extension analyses, S1 nuclease protection studies were performed (Fig. 5). The S1 probe consisted of a 230-nucleotide single-stranded fragment extending from position -36 to -265 relative to the translation start site. Again, a cluster of initiation sites was identified in all tissues, with major sites at -104, -103, -100, and -99. Although in many eukaryotic genes the site of initiation is frequently an adenine (Breathnach and Chambon, 1981), this is not the case for the Na,K-ATPase α2 gene, where the apparent initiation sites occur at T, C, and G residues. The nucleotide sequence of the 5' end of the Na,K-ATPase α2 gene is shown in Fig. 3. A potential TATA sequence (Breathnach and Chambon, 1981), TATTTAAA, is located 29 bp upstream of the 5'-most transcription initiation site.
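The coordinates quoted above imply specific fragment lengths on the sequencing gel. The following Python sketch works them out under stated assumptions: positions are numbered relative to the translation start with no position 0 in the upstream range, the S1 probe spans -36 to -265 as described in the text, and the labeled 5' ends of the two human primers correspond to positions -8 and -36. The derived lengths are arithmetic consequences of those coordinates, not values reported in this paper, and the helper name `span_length` is hypothetical.

```python
def span_length(a, b):
    """Inclusive length between two upstream (negative) positions."""
    if a >= 0 or b >= 0:
        raise ValueError("only upstream (negative) coordinates are supported")
    return abs(a - b) + 1


start_sites = [-104, -103, -100, -99]   # major initiation sites from the text

# Length of the S1 probe described in the text.
probe_length = span_length(-36, -265)

# Expected S1-protected fragment for each start site (probe 3' limit at -36).
protected = {site: span_length(-36, site) for site in start_sites}

# Expected primer-extension products for primers whose labeled 5' ends
# lie at -8 and at -36, respectively.
extension_from_minus8 = {site: span_length(-8, site) for site in start_sites}
extension_from_minus36 = {site: span_length(-36, site) for site in start_sites}

print(probe_length, protected, extension_from_minus8)
```

Agreement between the start sites inferred from the two primers and from the protected fragments is what supports the assignment of the same four positions by both methods.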
Table I. Nucleotide sequences around the exon/intron boundaries are presented. Exon sequences are in upper case letters and intron sequences in lower case. Amino acids encoded by codons bordering the splice junctions are shown and the number of the amino acid immediately 5' of the splice site is indicated. The NH2-terminal amino acids cleaved to yield the mature protein are assigned negative numbers. Numbers above codons 5' of the splice sites enumerate the nucleotides within the mRNA after which splicing occurs. The first base of the ATG initiation codon is designated +1. Intron sizes (bp) are shown in parentheses.

No obvious CCAAT consensus sequence is observed within 100 bp of the transcription start site. However a sequence, [...] A potential Sp1-binding site, GGGGCGGG (Dynan et al., 1986), is located 119 bp upstream of the apparent cap site. The entire 1.6-kb 5'-flanking region exhibits 57% GC content. The 159-bp region immediately preceding the apparent TATA box and continuous with the strand corresponding to the mRNA exhibits 71% AG content and includes a polypurine-rich region, consisting of 34 purines out of 35 bases, that may have regulatory significance (see "Discussion").
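Base-composition figures such as the GC and AG (purine) contents quoted above are simple counts. The Python sketch below shows one way to compute them. The function name and the 35-base example tract are hypothetical: the tract is constructed only so that it contains 34 purines and one pyrimidine, matching the proportions described in the text, and it is not the actual α2 sequence.

```python
def base_fraction(seq, bases):
    """Fraction of positions in seq whose base is one of `bases`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(b in bases for b in seq) / len(seq)


# Hypothetical 35-base tract (NOT the alpha 2 sequence), built from seven
# 5-base groups so the length is easy to verify: 34 purines and one C.
toy_tract = "".join(["AGGAG", "GGAGA", "GGAGG", "AGGCA", "GGAGG", "GAGAG", "GAGGA"])

purine_fraction = base_fraction(toy_tract, "AG")
gc_fraction = base_fraction(toy_tract, "GC")

print(f"length {len(toy_tract)}, purines {purine_fraction:.1%}, GC {gc_fraction:.1%}")
```

Applied to the real 159-bp region or the 1.6-kb flanking sequence, the same function would be expected to return the 71% AG and 57% GC values given in the text, provided the same strand is used.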
Repetitive DNA-The α2 sequence was examined for the presence of human repetitive elements including Alu (Britten et al., 1988; Jurka and Smith, 1988), O (Sun et al., 1984), K (Sun et al., 1984), and L1 (Scott et al., 1987) repeats. Five complete Alu repeats were identified, each exhibiting 7 2 4 7 % nucleotide identity to published Alu consensus sequences. The Alu elements are present in introns 1, 3, 13, and 22, and in the 3'-untranslated region of exon 23. The Alu repeat in intron 1 is followed by an 11-nucleotide poly(A) tract and is flanked on both sides by a perfect 6-bp direct repeat. Similarly, the Alu element in intron 13 possesses a 15-bp poly(A) tract and is flanked by perfect 9-bp direct repeats. The Alu sequence in intron 3, although not followed by a poly(A) tract, is followed by a 23-nucleotide A-rich region (70% A). However, it is not flanked by short repeats. The Alu element in intron 22 is present in the reverse orientation relative to the gene. It is flanked on both sides by perfect 6-bp inverted repeats, rather than direct repeats, and is followed by a 30-nucleotide poly(A) tract. The Alu element in the 3'-untranslated region is followed by a 14-nucleotide poly(A) tract and flanked by perfect 12-bp direct repeats.
Comparison of α2 and α3 Gene Structure-Both the α2 and α3 (Ovchinnikov et al., 1988) genes contain 23 exons and 22 introns. With the exception of introns 1 and 10, the introns in the α2 and α3 genes occur in exactly the same positions. Exon 1 in α2 includes the 5'-untranslated sequence and encodes the first four amino acids that are apparently removed posttranslationally, while exon 1 in α3 contains the 5'-untranslated sequence and encodes the first two amino acids. In the α2 gene, intron 10 occurs between the codons for Lys-437 and Arg-438, while in α3 it interrupts the Arg codon (AG-G). Although the positions of introns in the α2 and α3 genes are basically the same, there appears to be little similarity in terms of intron size or sequence. The size of nine α3 introns has been determined and the sequence of five of these has been reported (Ovchinnikov et al., 1987a). There is no correlation between the sizes of the corresponding α2 and α3 introns. For example, intron 13, which is the second largest (3.9 kb) intron in the α2 gene, is the smallest (70 bp) reported α3 intron. There is also no apparent sequence similarity between corresponding α2 and α3 introns. In addition, the occurrence of repetitive elements is not conserved. Whereas α2 has five Alu repeats, one copy each in introns 1, 3, 13, 22, and in the 3'-untranslated region, an analysis of the published α3 sequence reveals four Alu repeats, all present in intron 16.
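The difference in the position of intron 10 between the two genes can be described in terms of intron phase, a standard convention that is not used in the text itself and is introduced here only for illustration. In the Python sketch below, the function name is hypothetical, and the α2 count assumes that the five NH2-terminal residues numbered -5 to -1 precede residue 1 in the coding sequence. The α3 example uses an arbitrary number of complete codons, since only the position within the Arg codon (AG-G) is stated in the text.

```python
def intron_phase(coding_bases_before_intron):
    """Phase 0: between codons; 1: after first base; 2: after second base."""
    if coding_bases_before_intron < 0:
        raise ValueError("base count must be non-negative")
    return coding_bases_before_intron % 3


# alpha 2 intron 10 lies between the codons for Lys-437 and Arg-438.
# Complete codons upstream: 5 leader residues + 437 mature residues.
alpha2_bases = 3 * (5 + 437)
alpha2_phase = intron_phase(alpha2_bases)

# alpha 3 intron 10 splits the Arg codon as AG-G: two bases of the codon
# precede the intron. The count of complete codons (here 10) is arbitrary.
alpha3_phase = intron_phase(3 * 10 + 2)

print(alpha2_phase, alpha3_phase)
```

Under these assumptions the α2 intron is phase 0 and the α3 intron is phase 2, which is consistent with the description of the two introns as occupying nearby but non-identical positions.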
Position of Intron/Exon Boundaries Relative to Structural or Functional Domains-A number of investigators have proposed that exons may represent structural units (Go, 1981), elements of sequential supersecondary structure (Blake, 1978; Lonberg and Gilbert, 1985; Blake, 1985), or structural and functional domains (Gilbert, 1978; Gilbert, 1985; Sakano et al., 1979). In addition, intron/exon splice junctions frequently map to predicted surface segments of regions predicted not to be membrane-embedded (Craik et al., 1982; Argos and Rao, 1985). Because the structural and functional domains of the Na,K-ATPase have not been fully characterized, only limited correlations between the locations of exon boundaries and structural or functional domains can be made. However, some interesting features are apparent.

[Fig. 2. Composite nucleotide sequence of the human α2 mRNA with the deduced amino acid sequence and exon boundaries.]
Positions of introns depicted on a hydropathy plot of the deduced α2 amino acid sequence are shown in Fig. 6. Also depicted on the graph are regions which exhibit amino acid similarity among eukaryotic aspartyl-phosphate transport ATPases (Korczak et al., 1988; Serrano, 1988; Shull and Greeb, 1988). Residues which may be involved in functional properties of the enzyme are also indicated. These include Asp-369, which is phosphorylated during the catalytic cycle (Bastide et al., 1973), and other residues which, based on labeling with ATP analogues, seem to form part of the ATP-binding site. Residues involved in species differences in ouabain sensitivity, located extracellularly at the borders of the first and second hydrophobic domains, are also shown. As indicated on the graph, intron positions generally do not interrupt sequences encoding predicted membrane-spanning domains (shaded regions).
The first intron interrupts the five-amino acid sequence that is apparently removed to yield the NH2 terminus of the mature protein. Exons 2 and 3 encode a hydrophilic cytoplasmic region predicted to be α-helical in structure. Within the region encoded by exon 2 is a lysine-rich sequence which is conserved across Na,K-ATPase isoforms and species. The function of this domain is unknown but it has been suggested that it may serve as an ion-selective gate controlling passage of Na+ and K+ ions to and from binding sites (Shull et al., 1985; Shull et al., 1986). The predominant feature of exon 4 is that it encodes the first transmembrane domain as well as the first extracellular domain. Located at the borders of this extracellular sequence are two amino acids which are involved in species differences in sensitivity of the α1 isoform to inhibition by ouabain (Price and Lingrel, 1988). Exon 5 encodes the second transmembrane domain. The cytoplasmic region between the second and third transmembrane domains is encoded by the 3' portion of exon 5 and by exons 6, 7, and the 5' portion of exon 8. Within this cytoplasmic region are two segments which are conserved among aspartyl-phosphate transport ATPases. Introns 6 and 7 fall within the sequences encoding these conserved regions in both the Na,K-ATPase α2 gene and the fast-twitch skeletal muscle Ca-ATPase (Korczak et al., 1988). The significance of these conserved domains is unknown. However, within the second conserved region is a glycine residue (Gly-261) that is present in all eukaryotic aspartyl-phosphate transport ATPases. Mutation of this residue to aspartate confers vanadate-resistant ATPase activity to the fungal H+-ATPase (Ghislain et al., 1987). The third and fourth hydrophobic membrane-spanning domains are encoded by the 3' portion of exon 8.

Fig. 4. [...] to the 5'-untranslated region of the human α2 mRNA, were annealed to 50 μg of total RNA from human brain or skeletal muscle or to 15 μg of poly(A)+ RNA from human heart. Two oligonucleotide primers, 30 and 29 nucleotides in length and complementary to the same region of the rat α2 cDNA (Shull et al., 1986), were annealed to 15 μg of poly(A)+ RNA from rat brain or heart. The products resulting from extension with avian myeloblastosis virus reverse transcriptase were analyzed by electrophoresis on 8% denaturing polyacrylamide gels. The human-derived synthetic oligonucleotides were also used as sequencing primers for Sanger sequencing of a fragment containing the 5' end of the gene. The sequence read off the gel is the antisense strand. Left panel, primer extension and sequencing with primer I. Right panel, primer extension and sequencing with primer II.

The large cytoplasmic portion of the enzyme located between the fourth and fifth predicted transmembrane domains is encoded by exons 9-15 and part of exon 16. Exon 9 contains the highly conserved region surrounding the aspartate residue which is phosphorylated during the catalytic cycle. Three additional regions of similarity among cation transport ATPases are located in this large cytoplasmic region (Korczak et al., 1988; Serrano, 1988; Shull and Greeb, 1988). Based on chemical labeling studies (reviewed in Lingrel et al., 1989) and predicted secondary structure analysis (Taylor and Green, 1989), these regions appear to be at or near the nucleotide-binding domain. The first of these homologous regions, located entirely within exon 12 in the α2 gene (amino acids 497-519), contains the conserved lysine (Lys-500 in α2) which is labeled by fluorescein 5'-isothiocyanate (Farley et al., 1984; Kirley et al., 1984). The labeled Lys and the Gly-Ala following it are conserved in all eukaryotic aspartyl-phosphate transport ATPases and may be analogous to the NH2-terminal ATP-binding loop of adenylate kinase (Taylor and Green, 1989; Walker et al., 1982). Interestingly, although the entire region of homology around the fluorescein 5'-isothiocyanate site (amino acids 497-519) is within a single exon in α2, in the sarcoplasmic reticulum fast-twitch Ca-ATPase gene (Korczak et al., 1988), intron 13 occurs between the highly conserved Lys and the Gly following it, interrupting the predicted ATP-binding loop.

Fig. 5. [...] Na,K-ATPase α2 gene. A single-stranded 32P-labeled fragment beginning 36 bp 5' of the translation start site and extending 230 bp 5' was hybridized to 50 μg of total RNA from human brain or skeletal muscle or to 15 μg of poly(A)+ RNA from human heart. S1 nuclease protection analysis was performed as described in "Experimental Procedures." The protected fragments were analyzed on 8% denaturing polyacrylamide gels. Lane 1, size markers; lane 2, undigested probe; lane 3, 50 μg of total RNA from human brain; lane 4, 50 μg of total RNA from human skeletal muscle; lane 5, 10 μg of yeast tRNA; lane 6, 15 μg of poly(A)+ RNA from human heart; lanes 6-9, sequencing reaction using primer I, which was used in generating the S1 probe (see "Experimental Procedures").
A second region of similarity in cation-transporting ATPases occurs between residues 574 and 625 in a2. This segment contains a 19-amino acid sequence (amino acids 594-612) within which occurs a conserved Thr-Gly-Asp which may correspond to the COOH-terminal ATP-binding loop of adenylate kinase (Taylor and Green, 1989; Walker et al., 1982). The aspartic acid residue (Asp-611) may correspond to the aspartate found in other adenine nucleotide-binding proteins which has been suggested to be involved in binding the magnesium ion in Mg-ATP (Walker et al., 1982) or, alternatively, in forming hydrogen bonds interacting with the transition state to facilitate catalysis (Al-Shawi et al., 1988). In a2, intron 13 falls rather centrally (between amino acids 604 and 605) within this 19-residue sequence. A third region highly conserved among aspartyl-phosphate transport ATPases occurs between residues 704 and 758 of a2 and contains a peptide fragment which is labeled with the ATP analog 5'-(p-fluorosulfonyl)benzoyladenosine. A second conserved Thr-Gly-Asp sequence is located at residues 707-709. The aspartate (Asp-709) is labeled with the ATP derivative ClRATP (Ovchinnikov et al., 1987b). This region may form part of the nucleotide-binding domain or it may form a hinge region contacting the phosphorylation and nucleotide-binding domains (Taylor and Green, 1989). In a2, this region is located almost entirely within exon 16. In the Ca-ATPase, intron 15 immediately precedes the Thr of the Thr-Gly-Asp sequence (Korczak et al., 1988).
vertical lines on a hydropathy plot of the predicted Na,K-ATPase a2 protein sequence. The hydropathy plot was generated using the algorithm of Kyte and Doolittle (1982) with a window of 13 residues. Hydrophobic residues are shown above and hydrophilic residues below the horizontal line. Regions which, based on the algorithms of Kyte and Doolittle and Eisenberg et al. (1984), may span the membrane are filled with black and denoted H1-H10. Regions of the a2 protein sequence which exhibit homology among mammalian cation transporting ATPases (Na,K-ATPase, Ca-ATPase, H,K-ATPase) are indicated by black bars on the lower line of the figure. Asp-369 indicates the aspartic acid residue which is phosphorylated during the reaction cycle. The binding sites for fluorescein 5'-isothiocyanate, ClRATP, and 5'-(p-fluorosulfonyl)benzoyladenosine are indicated. The amino acid residues at the H1-H2 boundary which are involved in differences in species sensitivity to inhibition by ouabain are indicated by white boxes.
DNA was isolated from peripheral leukocytes of 10 unrelated individuals, digested with BglII, size fractionated by agarose gel electrophoresis, transferred to nylon membranes, and hybridized with Na,K-ATPase a2 gene probes. Probe ATP1A2 (6-2-3) is a 2.5-kb EcoRI fragment representing the 5' end of the a2 gene. Probe ATP1A2 (30-2-6) is a 0.6-kb fragment from the 3' portion of the a2 gene. Allele sizes are indicated.
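The hydropathy plot referred to in the figure legend above uses the Kyte and Doolittle (1982) scale averaged over a 13-residue window. The following is a minimal sketch of that kind of sliding-window calculation, not the authors' program. The function name `hydropathy_profile` and the toy peptide are illustrative assumptions; the peptide is not part of the a2 sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (Kyte and Doolittle, 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=13):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KD_SCALE[residue] for residue in protein.upper()]
    if window < 1 or window > len(values):
        return []  # sequence too short for a single full window
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy peptide (assumption for illustration only).
    toy_peptide = "MGKKDDLLIVLAAGIVVAFLLKKEE"
    for start, score in enumerate(hydropathy_profile(toy_peptide)):
        # With a 13-residue window, the central residue is at 1-based
        # position start + 7.
        print(start + 7, round(score, 2))
```

Positive window averages indicate hydrophobic stretches and negative averages hydrophilic ones. Assigning membrane-spanning segments would need additional criteria, such as those of Eisenberg et al. (1984), which this sketch does not implement.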
Exon 17 encodes a large hydrophobic region which may represent one or two transmembrane passes. Exon 18 encodes a hydrophilic region which is conserved among mammalian aspartyl-phosphate transport ATPases. The significance of this conservation is not clear. Exons 19-22 encode hydrophobic domains which may represent transmembrane domains. However, the number of transmembrane passes in the COOH-terminal half of the protein is uncertain. The 3' portion of exon 20 contains a potential cAMP-dependent phosphorylation site (Shull et al., 1986). Exon 23 encodes the hydrophilic COOH-terminal region which is probably located in the cytoplasm.
Detection of Restriction Fragment Length Polymorphisms with Na,K-ATPase a2 Probes-Several restriction fragments from within the a2 gene were identified which lacked repetitive elements and were gene specific. These probes were tested for their ability to detect restriction fragment length polymorphisms in human genomic DNA that had been digested with a panel of restriction endonucleases (BamHI, BglII, EcoRI, HindIII, KpnI, MspI, PstI, PvuII, SacI, and TaqI). Two probes which detected polymorphisms are shown in Fig. 7. Probe ATP1A2 6-2-3 consists of a 2.5-kb EcoRI fragment from the 5' end of the a2 gene and includes exon 1. This probe detects a two-allele polymorphism in BglII-digested  Fig. 8. The genotype distribution for each probe does not deviate significantly from that expected for Hardy-Weinberg equilibrium.
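The comparison with Hardy-Weinberg expectations mentioned above can be illustrated with a chi-square goodness-of-fit calculation for a two-allele polymorphism. This is a minimal sketch under stated assumptions: the genotype counts are hypothetical (the observed counts are not given in this section), the function name is illustrative, and the text does not say which statistical test the authors applied.

```python
# Critical value of the chi-square distribution for 1 degree of freedom
# at the 0.05 significance level.
CHI2_CRITICAL_1DF_05 = 3.841


def hardy_weinberg_chi_square(n_hom1, n_het, n_hom2):
    """Return (frequency of allele 1, chi-square statistic) for genotype counts."""
    total = n_hom1 + n_het + n_hom2
    if total == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_hom1 + n_het) / (2 * total)  # frequency of allele 1
    q = 1 - p                               # frequency of allele 2
    expected = (total * p * p, 2 * total * p * q, total * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs, exp in zip(observed, expected)
        if exp > 0  # skip classes that cannot occur when one allele is absent
    )
    return p, chi2


if __name__ == "__main__":
    # Hypothetical counts for 10 individuals (assumption, not data from the paper).
    p, chi2 = hardy_weinberg_chi_square(3, 5, 2)
    print("allele 1 frequency:", p)
    print("chi-square:", round(chi2, 4))
    print("consistent with Hardy-Weinberg:", chi2 < CHI2_CRITICAL_1DF_05)
```

With samples as small as 10 individuals the chi-square approximation is rough, so an exact test may be preferable in practice.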

DISCUSSION
The objective of this study was to analyze the structure of the Na,K-ATPase a2 gene, to determine the 5'-flanking sequence, and to identify intragenic probes which can be used in genetic linkage studies. The human Na,K-ATPase a2 gene spans approximately 25 kb and, like the human Na,K-ATPase a3 gene (Ovchinnikov et al., 1988) and the rabbit fast-twitch skeletal muscle sarcoplasmic reticulum Ca-ATPase gene (Korczak et al., 1988), contains 23 exons. The gene structure appears typical for a eukaryotic protein-coding gene. The median exon size, 135 bp, falls within the most abundant exon size class observed in higher eukaryotic protein-coding genes (Naora and Deacon, 1982). Similarly, the median intron size, 253 bp, is close to the size of the most abundant eukaryotic intron class (Naora and Deacon, 1982). Although the positions of introns in the a2 and a3 genes are basically the same, the intron size and sequence and the position of Alu repeats are not conserved.
As the a2 isoform is one of the Na,K-ATPase isoforms expressed in adult heart, it is a receptor for cardiac glycosides used in the treatment of congestive heart failure. Therefore, the sequence of human a2 in the regions involved in cardiac glycoside binding and sensitivity is of considerable interest. Several sites that may be involved in the interaction of cardiac glycosides with the a subunit have been identified, including the extracellular junctions between the first and second transmembrane domains (H1 and H2) (Shull et al., 1986; Price and Lingrel, 1988). Two residues that account for species differences in ouabain sensitivity of the a1 isoform, located at the boundaries of the H1-H2 junction, have been determined by site-directed mutagenesis. Conversion of Gln-111 and Asn-122 of the sensitive sheep enzyme to Arg and Asp, respectively, which occur in the insensitive rat enzyme, results in a fully resistant enzyme (Price and Lingrel, 1988). One of the amino acid differences between the human and rat a2 sequences occurs at residue 111 (human, Gln; rat, Leu). Thus, there may be differences in ouabain binding between the human and rat a2 isoforms. The Gln-111 and Asn-122 residues are also found in human (Kawakami et al., 1986) and pig (Ovchinnikov et al., 1988) a1 and in human a3 (Ovchinnikov et al., 1988). The codon for Asn-122 (AAT) in the a2 gene borders intron 4. The last nucleotide of the exon bordering the splice donor site is usually a G. However, in this case the nucleotide is a T. In the a3 gene, the corresponding Asn codon is AAC (Ovchinnikov et al., 1988). If a G were present in this position in either gene, the codon would represent Lys, a charged amino acid, and the resulting enzyme might be expected to be more resistant to ouabain. This suggests the possible existence of evolutionary pressure for maintaining sensitivity to ouabain-like compounds including putative endogenous inhibitors of the Na,K-ATPase.
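The codon argument above can be checked directly against the standard genetic code. The short sketch below substitutes a G at the third position of the two Asn codons mentioned in the text and looks up the result. The helper name `replace_third_base` is an illustrative assumption, and only the four AAN codons are tabulated.

```python
# Standard genetic code for the four codons that begin with AA.
AAN_CODONS = {"AAT": "Asn", "AAC": "Asn", "AAA": "Lys", "AAG": "Lys"}


def replace_third_base(codon, base):
    """Return the codon with its third (last) nucleotide replaced."""
    return codon[:2] + base


if __name__ == "__main__":
    # AAT is the Asn-122 codon in the a2 gene; AAC is the corresponding
    # codon in the a3 gene, as stated in the text.
    for gene, codon in (("a2", "AAT"), ("a3", "AAC")):
        changed = replace_third_base(codon, "G")
        print(gene, codon, AAN_CODONS[codon], "->", changed, AAN_CODONS[changed])
```

In both genes the substitution converts the Asn codon to AAG, which encodes Lys, in agreement with the reasoning in the paragraph above.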
In order to begin addressing questions concerning the regulation of Na,K-ATPase expression at the transcriptional level, we determined the transcription initiation sites and examined the 5'-flanking sequence of the a2 gene for potential promoter elements, transcription factor-binding sites (Sp1, AP-1, AP-2, CP1/CP2, NF-1, and CACCC factor-binding sites), hormone response elements (thyroid hormone receptor and glucocorticoid receptor-binding sites), and for regions of unusual DNA structure. Primer extension and S1 nuclease protection analyses of RNA from human brain, skeletal muscle, and heart, the three major tissues in which the a2 isoform is expressed, demonstrate that there are four transcription initiation sites clustered between -104 and -99 relative to the translation start site. All four sites appear to be used in each tissue. However, in skeletal muscle, the -100 and -99 sites appear to be used less frequently than in brain or heart. The a2 5'-flanking region contains two sequences exhibiting similarity to the CCAAT element consensus sequence or its reverse complement (Chodosh et al., 1988) (Fig. 3 and Table II). This pentanucleotide is the core-binding site for the transcription factors CP1 and CP2 and is usually located within 100 bp of the transcription start site. The sequences in a2 which exhibit similarity to the consensus are located much further from the transcription start site, at -430 and -1031. Another CCAAT element, with the consensus TTGGCTNNNAGCCAA, is recognized by nuclear factor I (NF-I/CTF) (Jones et al., 1987). There are five potential NF-1-binding sites in the a2 5'-flanking region which maintain at least 4 of the 6 residues in the NF-1-binding site that, based on methylation interference analysis, seem to contact the NF-1 protein (Chodosh et al., 1988). One of these, at position -1457, has all six essential contact points, resulting in a good match to both half-sites of the palindromic CTF/NF-1-binding site. A site at -183 exhibits 100% identity to the core AGCCAA hexanucleotide, the minimal recognition site for CTF/NF-1 (Jones et al., 1987).
The CACCC element, GCCACACCC (Dierks et al., 1983), is an upstream promoter element that seems to act synergistically with glucocorticoid/progesterone receptor-binding sites in mediating hormone induction, probably via interactions of the hormone receptor and CACCC-binding factor (Schule et al., 1988). There are three sequences in the a2 5'-flanking region that exhibit a five out of five match to the core pentanucleotide CACCC sequence or its reverse complement. One of these (at -1190) overlaps a potential glucocorticoid response element.

Analysis of the 5'-flanking region of the human Na,K-ATPase a2 gene for potential transcription factor and hormone receptor-binding sites
Consensus binding sites for the glucocorticoid receptor (GRE) and for transcription factors AP-1, AP-2, CP1/CP2, NF-1, and the CACCC box factor are indicated. Sequences within the 5'-flanking region of the human a2 gene which exhibit similarity to these consensus sequences are shown. For the glucocorticoid response element and CP1/CP2 elements, matches to the reverse complements were observed and are also shown. Numbers indicate the distance of the element from the first transcription start site.
The octamer motif, ATGCAAAT, is found within SV40 and immunoglobulin enhancers and in the upstream promoter regions of a variety of other genes (see Fletcher et al., 1987). A sequence (ATTCAAAT) that exhibits a seven out of eight nucleotide match to this motif is located in the a2 gene at position -384 relative to the transcription start site.

AP-1 is a mammalian transcription factor that influences basal transcription levels and is also required for induction of transcription by phorbol ester tumor promoters (reviewed in Curran and Franza, 1988). Recently, protein products of the jun and fos proto-oncogenes have been shown to contribute to the AP-1 protein complex and to bind to the AP-1 consensus recognition sequence, TGA(G/C)TCA (see Curran and Franza, 1988). The 5' end of the a2 gene contains six sequences which exhibit at least a six out of seven match to the AP-1 consensus sequence. One of these, located at -297, exhibits 100% identity to the AP-1 consensus. Since the fos and jun proto-oncogene products have been associated with cellular processes involved in development, differentiation, and neuronal function (Curran and Franza, 1988), the possibility that the Na,K-ATPase may be regulated by fos and jun products is intriguing. In addition, the potential regulation of Na,K-ATPase expression by phorbol esters deserves investigation.
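Searches of the kind described above, in which a sequence is scored by the number of positions matching a consensus on either strand, can be sketched as follows. This is an illustrative sketch, not the authors' search procedure. The function names and the toy sequences are assumptions, the a2 5'-flanking sequence itself is not reproduced here, and the AP-1 consensus is written with the IUPAC code S for G or C.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def count_matches(window, consensus):
    """Count positions at which the window agrees with the consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def scan_for_consensus(dna, consensus, min_matches):
    """Return (start, strand, site, matches) for every window that passes.

    start is the 0-based position of the site on the given (+) strand.
    """
    dna = dna.upper()
    size = len(consensus)
    hits = []
    for strand, text in (("+", dna), ("-", reverse_complement(dna))):
        for i in range(len(text) - size + 1):
            matches = count_matches(text[i:i + size], consensus)
            if matches >= min_matches:
                start = i if strand == "+" else len(dna) - i - size
                hits.append((start, strand, text[i:i + size], matches))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence (assumption for illustration only).
    print(scan_for_consensus("GGTGACTCAGG", "TGASTCA", min_matches=6))
```

Because TGA(G/C)TCA is its own reverse complement, a perfect AP-1 site is reported once on each strand. A non-palindromic element such as CACCC is reported only on the strand on which it occurs.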
AP-2 is a stimulatory transcription factor that binds to control regions of a number of eukaryotic genes (Mitchell et al., 1987). Within the 5'-flanking region of a2, there are three sequences which exhibit at least a seven out of eight identity to the AP-2 consensus binding site, CCCCAGGC, or its reverse complement. One of these, at -1101, exhibits 100% identity to this recognition sequence.
Since Na,K-ATPase activity is modulated by hormones including glucocorticoids, mineralocorticoids, and triiodothyronine (reviewed in Lingrel et al., 1989), we examined the 5'-flanking region of the a2 gene for potential hormone response elements. A consensus recognition sequence for the mineralocorticoid receptor has not been reported. However, the mineralocorticoid receptor and glucocorticoid receptor may recognize similar regulatory elements (Arriza et al., 1987; Cato and Weinman, 1988). The glucocorticoid response element consists of a hexanucleotide core, TGTTCT, which forms part of a 15-bp imperfect dyad symmetry element, GGTACANNNTGTTCT (Karin et al., 1984; Jantzen et al., 1987). There are 14 sequences in the 5'-flanking region of a2 that exhibit at least a five out of six match to the core glucocorticoid response element or its reverse complement. One of these, at -1007, exhibits 100% identity to the hexanucleotide core. In most cases, homology to the entire 15-bp palindromic sequence is low. However, a region at -468 exhibits a 10 out of 15 match to the complete sequence, having a five out of six match to the first half of the dyad and a five out of six match to the second half. The first intron also contains numerous matches of at least five out of six to the core element.
A consensus thyroid hormone receptor-binding site has not been reported for human genes. However, for the rat growth hormone gene, the minimal sequence requirement for thyroid hormone induction of the rat growth hormone promoter (reviewed by Samuels et al., 1988) or heterologous thymidine kinase promoter (Glass et al., 1987; Brent et al., 1989) is a 23-bp region, AGGTAAGATCAGGGACGTGACCG, located -160 bp upstream of the transcription initiation site. Although there are several sequences within the 5' end of the a2 gene which exhibit partial identity to this consensus, none of these appears to be in particularly good agreement.
Because non-B DNA structures have been suggested to be involved in gene regulation, we examined the 5' end of the a2 gene for sequences potentially capable of adopting unusual conformations. A striking sequence, located immediately 5' of the TATA box, consists of a homopurine.homopyrimidine tract containing an imperfect mirror repeat sequence (see Fig. 3). Homopurine.homopyrimidine sequences are frequently found in promoter regions of eukaryotic genes and often exhibit S1 nuclease hypersensitivity (reviewed in Wells et al., 1988). Chemical modification and single strand-specific nuclease sensitivity studies indicate that such sequences, particularly those possessing mirror repeat symmetry, may form unusual DNA structures, called H-DNA, which contain both single-stranded and triple-stranded regions (see Wells et al., 1988). These structures consist of a core triple helix and a fourth strand which, although closely associated with the triplex, is exposed. It has been suggested that such structures may provide access to transcription factors, facilitate protein-protein interactions, or inhibit stable chromatin assembly, thus functioning in the regulation of gene expression (Christophe et al., 1985; Htun and Dahlberg, 1988; Wells et al., 1988).
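The two sequence properties discussed above, a strand composed entirely of purines and mirror repeat symmetry, can each be tested with a few lines of code. The sketch below is illustrative only. The function names and the toy tracts are assumptions, and the 35-bp tract from the a2 promoter is not reproduced here. A mirror repeat reads the same in both directions along one strand, which differs from an inverted repeat defined through the reverse complement.

```python
def is_homopurine(strand):
    """True if the strand consists only of purines (A and G)."""
    return len(strand) > 0 and set(strand.upper()) <= {"A", "G"}


def mirror_fraction(strand):
    """Fraction of mirrored position pairs that carry the same base.

    A perfect mirror repeat scores 1.0; an imperfect one scores lower.
    """
    strand = strand.upper()
    pairs = len(strand) // 2  # the central base of an odd-length tract is ignored
    if pairs == 0:
        return 1.0
    same = sum(strand[i] == strand[-1 - i] for i in range(pairs))
    return same / pairs


if __name__ == "__main__":
    # Hypothetical toy tracts (assumptions for illustration only).
    for tract in ("GGAGAAGAGG", "GGAGAAGGGG"):
        print(tract, is_homopurine(tract), mirror_fraction(tract))
```

A high mirror fraction in a homopurine tract flags a candidate H-DNA-forming sequence only. Whether such a structure forms would have to be established experimentally, for example by the nuclease sensitivity studies cited above.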
The Na,K-ATPase, because of its pivotal role in maintaining Na+ and K+ balance, has been implicated in a number of disease processes including essential hypertension, familial obesity, various kidney transport defects such as Liddle's syndrome and pseudohypoaldosteronism, and neuromuscular disorders such as periodic paralysis (Hilton, 1986; Layzer, 1982; Schwartz and Spitzer, 1978). One method for investigating the etiology of hereditary diseases is to examine the linkage of candidate genes to the disease. In this approach, polymorphic markers for a gene which is under consideration as the possible disease locus are used in genetic linkage studies (Gusella, 1986). We have identified two DNA probes from within the a2 gene which reveal restriction fragment length polymorphisms and thus should be useful in testing the potential role of a defective Na,K-ATPase a2 gene in the etiology of certain hereditary diseases.