Structure of a Plasma Membrane H+-ATPase Gene from the Plant Arabidopsis thaliana

Physiological and biochemical studies have suggested that the plant plasma membrane H+-ATPase controls many important aspects of plant physiology, including growth, development, nutrient transport, and stomatal movements. We have started the genetic analysis of this enzyme by isolating both genomic and cDNA clones of an H+-ATPase gene from Arabidopsis thaliana. The cloned gene is interrupted by 15 introns, and there is partial conservation of exon boundaries with respect to animal (Na+/K+)- and Ca2+-ATPases. In general, the relationship between exons and the predicted secondary and transmembrane structure of different ATPases with phosphorylated intermediate supports a somewhat degenerate correspondence between exons and structural modules. The predicted amino acid sequence of the plant H+-ATPase is more closely related to fungal and protozoan H+-ATPases than to bacterial K+-ATPases or to animal (Na+/K+)-, (H+/K+)-, and Ca2+-ATPases. There is evidence for the existence of at least three isoforms of the plant H+-ATPase gene. These results open the way for a molecular approach to the structure and function of the plant proton pump.

Biochemical studies (Serrano, 1984) have indicated that the major ATPase activity characterized in plant plasma membranes (Hodges et al., 1972) corresponds to the electrogenic proton pump identified by physiological studies (Spanswick, 1981). The plant proton pump belongs to the family of cation-pumping ATPases that are sensitive to vanadate and form a phosphorylated intermediate (Serrano, 1984). The genes of different bacterial, fungal, and animal ATPases of this family have already been cloned and sequenced.
Cloning of the plant plasma membrane ATPase gene is a prerequisite for gaining further insight into the structure and function of this important enzyme. The predicted amino acid sequence could provide clues about its secondary structure and its evolutionary relationship to other ATPases. The exon/intron structure of the plant gene could be compared with that of animal ATPases of the same family (Korczak et al., 1988; Ovchinnikov et al., 1988), in the context of a correspondence between exons and structure-function modules of proteins (Gilbert et al., 1986). Finally, the cloned gene could provide a powerful tool for studying the physiological role of the enzyme. Such studies would rely on the construction of transgenic plants (Schell, 1987) that either enhance (increased gene dosage, strong promoters) or inhibit (antisense methodologies; Lichtenstein, 1988) expression of the ATPase. This kind of analysis has already been started with the plasma membrane ATPase of the yeast Saccharomyces cerevisiae (Cid et al., 1987; Cid and Serrano, 1988). However, the suggested "high level" functions of the plant enzyme, such as transport at the whole-plant level, hormonal regulation, and development, cannot be approached with the yeast model system.
The plant Arabidopsis thaliana constitutes a convenient experimental system for plant genetics and molecular biology (Meyerowitz, 1987). We have started our genetic analysis of the plant plasma membrane ATPase by isolating both genomic and cDNA clones of an ATPase gene of A. thaliana. There is partial conservation of exon boundaries with respect to animal ATPases, although the amino acid sequence is more closely related to fungal and protozoan ATPases. The isolation of this gene opens the way for a molecular approach to the structure and function of the plant proton pump.

MATERIALS AND METHODS
Screening of the Genomic Library-A library of A. thaliana (strain Landsberg erecta) genomic DNA in bacteriophage EMBL4 (Kaiser and Murray, 1986) was a generous gift from Dr. E. Meyerowitz, California Institute of Technology. It was used to infect Escherichia coli strain TG1 (an rK− derivative of JM101; Norrander et al., 1983) and screened (Kaiser and Murray, 1986) for the ATPase gene at a density of 10,000 phage plaques/13-mm plate. The hybridization probe was a deoxyinosine-containing (Ohtsuka et al., 1985) oligonucleotide designed by Shull and Lingrel (1986) and derived from the amino acid sequence CSDKTGT, which is fully conserved in the phosphorylation site (conserved region C-2 of Figs. 3 and 4) of eukaryotic ATPases with phosphorylated intermediate. It was a 2:1 mixture of the degenerate sequences GT(I,C)CC(I,C)GT(T,C)TTITC(I,C)GAICA and GT(I,C)CC(I,C)GT(T,C)TTITCICTICA, which were synthesized with an Applied Biosystems DNA synthesizer and designated CA-1. The oligonucleotide was 5'-end labeled using T4 polynucleotide kinase and [γ-32P]ATP. Hybridization was performed overnight at 37 °C in 6× SSC (1× SSC = 150 mM NaCl and 15 mM Na citrate (pH 7)) with 0.1% SDS (Shull et al., 1985) and 2.5 ng/ml of labeled oligonucleotide (about 400,000 counts/min/ng). The filters were washed in the same medium, first at 30 °C and then at 37 °C, for two periods of 10 min. Filters were exposed for 3 days at −70 °C with an intensifying screen.
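The stated hybridization conditions imply some simple arithmetic. The sketch below, assuming only the values given in the text (1× SSC = 150 mM NaCl and 15 mM Na citrate; 2.5 ng/ml of probe at about 400,000 counts/min/ng), computes the component concentrations in 6× SSC and the approximate radioactivity per ml of hybridization solution. The helper names are illustrative.

```python
# Worked arithmetic for the CA-1 hybridization conditions.
# Assumption: 1x SSC = 150 mM NaCl + 15 mM Na citrate, as stated in the text.
SSC_1X_MM = {"NaCl": 150.0, "Na citrate": 15.0}


def ssc_composition(fold: float) -> dict[str, float]:
    """Millimolar concentration of each SSC component at the given fold strength."""
    return {name: mm * fold for name, mm in SSC_1X_MM.items()}


def probe_activity(ng_per_ml: float, cpm_per_ng: float) -> float:
    """Radioactivity per ml of hybridization solution (counts/min/ml)."""
    return ng_per_ml * cpm_per_ng


buffer_6x = ssc_composition(6)             # NaCl 900 mM, Na citrate 90 mM
cpm_per_ml = probe_activity(2.5, 400_000)  # about 1,000,000 counts/min/ml
print(buffer_6x, f"{cpm_per_ml:,.0f} cpm/ml")
```

Because the specific activity is given only approximately ("about 400,000"), the result is an order-of-magnitude figure rather than a measured value.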
To confirm the identity of the positive clones and to construct a restriction map, phage DNA was purified (Maniatis et al., 1982), digested with BglII, EcoRI, HindIII, SacI, SalI, or XbaI, and subjected to Southern analysis by hybridization either with the same CA-1 oligonucleotide as above or with the degenerate oligonucleotide TC(A,G)TT(I,C)AC(I,C)CC(A,G)TC(I,C)CC(I,C)GT (designated CA-3). CA-3 corresponds to the amino acid sequence TGDGVND, which is fully conserved in the ATP binding site (conserved region C-6 of Figs. 3 and 4) of eukaryotic ATPases and lies about 300 amino acids downstream of the phosphorylation site. Conditions for hybridization and washing were as above.
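The degenerate probes can be checked against their target peptides by expanding every species in each mixture and asking whether its complementary (sense) strand could encode the peptide. The sketch below is a hypothetical verification script, not the procedure used by Shull and Lingrel (1986). It assumes that the 3' end of each oligonucleotide pairs with the first base of the first codon, and it treats deoxyinosine (I) as pairing with any base, which simplifies its real pairing preferences.

```python
import itertools
import re

# Standard-code codons for the residues in the two target motifs.
CODONS = {
    "C": ["TGT", "TGC"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "D": ["GAT", "GAC"],
    "K": ["AAA", "AAG"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "N": ["AAT", "AAC"],
}
# For checking only: inosine is treated as complementary to any base (N).
COMP = {"A": "T", "T": "A", "G": "C", "C": "G", "I": "N"}


def expand(probe):
    """Expand notation like 'GT(I,C)CC' into every concrete oligonucleotide."""
    choices = []
    for alt, fixed in re.findall(r"\(([^)]*)\)|([ACGTI]+)", probe):
        choices.append(alt.split(",") if alt else [fixed])
    return ["".join(p) for p in itertools.product(*choices)]


def codon_fits(pattern, aa):
    """True if a (possibly partial) sense codon with N wildcards can encode aa."""
    return any(all(p in ("N", c) for p, c in zip(pattern, real)) for real in CODONS[aa])


def encodes(antisense, peptide):
    """Check that the sense strand of an antisense oligo is compatible with peptide."""
    sense = "".join(COMP[b] for b in reversed(antisense))
    codons = [sense[i:i + 3] for i in range(0, len(sense), 3)]
    return len(codons) <= len(peptide) and all(
        codon_fits(c, aa) for c, aa in zip(codons, peptide))


PROBES = {
    "CA-1a": ("GT(I,C)CC(I,C)GT(T,C)TTITC(I,C)GAICA", "CSDKTGT"),
    "CA-1b": ("GT(I,C)CC(I,C)GT(T,C)TTITCICTICA", "CSDKTGT"),
    "CA-3": ("TC(A,G)TT(I,C)AC(I,C)CC(A,G)TC(I,C)CC(I,C)GT", "TGDGVND"),
}

for name, (probe, peptide) in PROBES.items():
    members = expand(probe)
    ok = all(encodes(m, peptide) for m in members)
    print(f"{name}: {len(members)} species, all compatible with {peptide}: {ok}")
```

Run as written, the two CA-1 sequences expand to 16 and 8 species and CA-3 to 64, and every species passes the compatibility check. The two CA-1 sequences differ only in the codon family assumed for serine (TCN versus AGY).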
Screening of the cDNA Library-A cDNA library from leaf poly(A)+ RNA of A. thaliana in bacteriophage λgt10 (Huynh et al., 1986) was a generous gift of Dr. C. R. Somerville (Michigan State University).
It was used to infect E. coli strain C600 (Kaiser and Murray, 1986) carrying the hflA150 mutation (Stratagene). Screening density was 7000 plaques/13-mm plate. The hybridization probe was a 0.54-kb HindIII fragment from the largest exon (see Fig. 1) of the ATPase genomic clone, extending from nucleotide 1998 to 2542 in the sequence of Fig. 2. It was labeled by the method of Feinberg and Vogelstein (1983) as optimized by Hodgson and Fisk (1987). Hybridization was at 37 °C in 6× SSC with 0.1% SDS, 50% formamide, and 5 ng/ml of the labeled probe (about 200,000 counts/min/ng). The filters were washed in 1× SSC with 0.1% SDS for five periods of 15 min at 37 °C.
Restriction Mapping and Sequencing-The 3.5-kb EcoRI insert of the positive cDNA clone and the 9.5-kb EcoRI fragment from the positive genomic clone 28 (see Fig. 1) were isolated from low melting point agarose gels (Langridge et al., 1980) and subcloned into pBS(±) phagemid vectors (Stratagene) for restriction mapping. Sequencing was performed partly by subcloning internal restriction fragments into the same plasmids and more exhaustively by generating unidirectional deletions with exonuclease III (Henikoff, 1984), using the Bluescript Exo/Mung Sequencing System from Stratagene. Single-stranded DNA was prepared by infection with helper phage as described in the Stratagene instruction manual. The dideoxynucleotide method (Sanger et al., 1977) was used with T7 DNA polymerase (Tabor and Richardson, 1987) and the Sequenase kit from United States Biochemical Corporation. 35S label (Biggin et al., 1983) and 0.2-mm gels covalently bound to the glass plates (Garoff and Ansorge, 1981) were employed. Both strands of DNA were sequenced in their entirety, and all overlaps between partial sequences were established.
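A restriction map of the kind described in the Methods records where each enzyme's recognition sequence occurs. The sketch below, assuming a plain DNA string as input, lists site offsets for the six enzymes named above and converts cut positions into fragment sizes for a linear molecule. For simplicity it places each cut at the first base of the recognition site rather than at the enzyme's exact cleavage point, which does not matter at kilobase resolution; the toy sequence is hypothetical.

```python
import re

# Recognition sequences (5'->3'); all six are palindromic, so one strand suffices.
SITES = {
    "BglII": "AGATCT",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "SacI": "GAGCTC",
    "SalI": "GTCGAC",
    "XbaI": "TCTAGA",
}


def site_offsets(seq):
    """0-based start offset of every recognition site, per enzyme."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping occurrences be found.
    return {name: [m.start() for m in re.finditer(f"(?={site})", seq)]
            for name, site in SITES.items()}


def fragment_lengths(length, cuts):
    """Fragment sizes from cutting a linear molecule of `length` bp at `cuts`."""
    edges = [0] + sorted(cuts) + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


toy = "TTGAATTCAAAAAGCTTGG"  # hypothetical sequence
offsets = site_offsets(toy)
cuts = offsets["EcoRI"] + offsets["HindIII"]
print(offsets["EcoRI"], offsets["HindIII"], fragment_lengths(len(toy), cuts))
# [2] [11] [2, 9, 8]
```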
DNA and RNA Analysis-The Columbia strain of A. thaliana was obtained from Dr. E. Meyerowitz (California Institute of Technology). The plants were grown at 23 °C under continuous light on sterile standard plant soil. Total RNA (Wadsworth et al., 1988), poly(A)+ RNA (Aviv and Leder, 1972), and DNA (method of Hattori et al., 1987, but with the CsCl gradient of Weeks et al., 1986) were prepared from the aerial parts of the plants.
Sequence Analysis-The set of programs of the University of Wisconsin Genetics Computer Group (Devereux et al., 1984) was employed. Sequence similarity was scored according to Gribskov and Burgess (1986), with groups of similar amino acids derived from the mutational difference matrix of Dayhoff et al. (1978). These groups are (M,I,L,V,F), (F,Y,W), (S,T,P,A,G), (N,D,Q,E), (H,R,K), and (C). Secondary structure was predicted by the empirical method of Chou and Fasman (1978) as improved by Garnier et al. (1978). Transmembrane α-helices were predicted by the method of Kyte and Doolittle (1982) as modified by Engelman et al. (1986).
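Hydropathy analysis of the Kyte and Doolittle type averages a per-residue hydrophobicity scale over a sliding window and flags windows whose mean exceeds a threshold. The sketch below uses the original Kyte-Doolittle scale with a 19-residue window and a 1.6 cutoff, a convention often associated with the original paper; it does not include the modifications of Engelman et al. (1986) applied in this work, and the toy sequence is invented for illustration.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def candidate_segments(seq, window=19, threshold=1.6):
    """Merge above-threshold windows into (start, end) residue ranges, end exclusive."""
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            if segments and i <= segments[-1][1]:
                # Overlapping or adjacent window: extend the current segment.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


toy = "D" * 15 + "L" * 21 + "D" * 15  # hypothetical: one hydrophobic stretch
print(candidate_segments(toy))        # [(10, 41)]
```

The reported segment (residues 10 to 40) is wider than the leucine run itself (residues 15 to 35) because any window containing at least 14 leucines passes the cutoff; this smoothing is one reason predicted helix boundaries carry an uncertainty of several residues.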

RESULTS
Isolation of an ATPase Gene from A. thaliana-After screening 100,000 plaques of the genomic EMBL4 library (equivalent to more than 10 genomes' worth of independent inserts), we found six positive phages corresponding to three different overlapping clones. This genomic locus hybridized with both oligonucleotide probes, corresponding to the phosphorylation and ATP binding sites of eukaryotic ATPases (Fig. 1). We sequenced about 8 kb of genomic DNA surrounding the hybridizing region (from the EcoRI site upstream of the coding region to the downstream HindIII site; see Fig. 1). An open reading frame was found that contained all the sequences conserved in ATPases with phosphorylated intermediate but was interrupted by putative introns. To confirm this point, we used a probe from a large putative exon to isolate a complete clone from a cDNA library (Fig. 1). Sequencing of this clone confirmed the reading frame and intron positions deduced from the genomic sequence (Fig. 2).
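The statement that 100,000 plaques represent more than 10 genome equivalents can be made concrete with the Clarke-Carbon estimate, P = 1 − (1 − f)^N, where f is the fraction of the genome carried by one insert and N is the number of clones. The mean insert size (15 kb) and genome size (100 Mb) below are assumed round figures chosen for illustration, not values reported in this work.

```python
import math


def genome_equivalents(n_clones, insert_bp, genome_bp):
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_bp / genome_bp


def prob_represented(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon probability that a given sequence is present in the library."""
    f = insert_bp / genome_bp
    return 1 - (1 - f) ** n_clones


def clones_needed(p, insert_bp, genome_bp):
    """Clones required for probability p that a given sequence is present."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - p) / math.log(1 - f))


INSERT_BP = 15_000        # assumed mean insert size (illustrative)
GENOME_BP = 100_000_000   # assumed haploid genome size (illustrative)

print(genome_equivalents(100_000, INSERT_BP, GENOME_BP))  # 15.0
print(prob_represented(100_000, INSERT_BP, GENOME_BP))
print(clones_needed(0.99, INSERT_BP, GENOME_BP))
```

Under these assumptions a single-copy locus should be present with very high probability, so failure to recover a gene (as discussed later for the other isoforms) points to library bias rather than insufficient sampling.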
The arrow at the beginning of Fig. 2 indicates the first nucleotide of the cDNA clone found in the genomic sequence. The first 320 nucleotides of the cDNA clone could not be found in more than 4 kb of upstream sequence in the genomic clones. In addition, there is no consensus splicing sequence (see below) at the position of the arrow. Therefore, these first 320 nucleotides of the cDNA probably correspond to a cloning artefact. The predicted functional ATG is located at position 366 of the cDNA. It satisfies the two conditions of optimal context (AXXATGGC) and longest open reading frame (Joshi, 1987a). In addition, it is conserved in two other, partially sequenced isoforms of the ATPase (see below). The putative polyadenylation signal AATAAA (Joshi, 1987b) was found 50 base pairs upstream of the poly(A) tail (Fig. 2).
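The two criteria used to pick the initiation codon, longest open reading frame and optimal context (AXXATGGC), can be expressed as a short scan. The sketch below, run on a hypothetical toy sequence, searches the given strand only, ignores open reading frames that lack a stop codon, and reads the AXXATGGC pattern as A at position −3 and G, C at positions +4 and +5 relative to the A of the ATG.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """(start, end) of the longest ATG...stop ORF on this strand; end includes the stop."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOPS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def matches_context(seq, atg):
    """AXXATGGC: A at -3 and G, C at +4, +5 (the A of ATG is +1)."""
    seq = seq.upper()
    return atg >= 3 and seq[atg - 3] == "A" and seq[atg + 3:atg + 5] == "GC"


toy = "CCAAAATGGCTGCTTAAGG"  # hypothetical
start, end = longest_orf(toy)
print(start, end, matches_context(toy, start))  # 5 17 True
```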
Amino Acid Sequence Corresponds to Plasma Membrane H+-ATPase-The predicted amino acid sequence of the isolated genomic and cDNA clones was compared with other sequences of ATPases with phosphorylated intermediate using the programs Bestfit and Gap (Devereux et al., 1984). The available sequences were: H+-ATPases from the fungi S. cerevisiae (Serrano et al., 1986), Neurospora crassa (Addison, 1986; Hager et al., 1986), and Schizosaccharomyces pombe (Ghislain et al., 1987); the probable H+-ATPase from the protozoan Leishmania donovani (Meade et al., 1987); (Na+/K+)-ATPase (Shull et al., 1985), (H+/K+)-ATPase (Shull and Lingrel, 1986), and Ca2+-ATPase (MacLennan et al., 1985) from animal cells; and K+-ATPases from the bacteria E. coli (Hesse et al., 1984) and Streptococcus faecalis (Solioz et al., 1987). The cloned Arabidopsis ATPase was much more closely related to fungal and protozoan plasma membrane H+-ATPases (57-60% similarity, 30-32% identity) than to either bacterial K+-ATPases or animal ATPases involved in Na+, K+, and Ca2+ transport (46-49% similarity, 15-19% identity). This suggests that the plant sequence corresponds to a plasma membrane H+-ATPase. Fig. 3 shows the alignment of the ATPase sequences from S. cerevisiae and Arabidopsis. In addition to the regions conserved in all ATPases with phosphorylated intermediate, the hydrophobic stretches proposed to constitute transmembrane α-helices show extensive similarity in both enzymes. These regions of the sequence are usually not conserved between different members of the ATPase family. Most of the differences between the yeast and plant ATPases occur in the N- and C-terminal domains.
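Once two sequences are aligned, percent identity and percent similarity can be tallied with the amino acid groups listed under Sequence Analysis. The sketch below scores a given gapped alignment only (it does not perform the Bestfit or Gap alignment itself) and, as an assumption, uses the number of gap-free columns as the denominator; the GCG programs may normalize differently. F belongs to two groups, as in the list in the Methods, and the toy alignment is hypothetical.

```python
# Groups of similar amino acids as listed under Sequence Analysis.
GROUPS = ["MILVF", "FYW", "STPAG", "NDQE", "HRK", "C"]


def similar(a, b):
    """Identical, or both members of one of the amino acid groups."""
    return a == b or any(a in g and b in g for g in GROUPS)


def identity_similarity(aln_a, aln_b):
    """Percent identity and similarity over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0
    ident = sum(a == b for a, b in pairs)
    sim = sum(similar(a, b) for a, b in pairs)
    return 100 * ident / len(pairs), 100 * sim / len(pairs)


ident, sim = identity_similarity("MKTLDE-C", "MRSIDQGW")  # hypothetical alignment
print(f"{ident:.1f}% identity, {sim:.1f}% similarity")    # 28.6% identity, 85.7% similarity
```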
More definitive evidence for the identity of the cloned gene is provided by the presence in the Arabidopsis ATPase sequence of six tryptic peptides sequenced by Schaller and Sussman (1988) from the purified H+-ATPase of oat root plasma membranes (Fig. 3). Of 81 amino acids, 72 are identical between the Arabidopsis sequence and the oat peptides, and 5 of the 9 changes correspond to conservative replacements. This close similarity is observed even in peptides covering regions not conserved in the family of ATPases with phosphorylated intermediate, such as peptide c and most of peptide h. Therefore, the cloned Arabidopsis ATPase probably corresponds to a plasma membrane H+-ATPase of this plant.
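For the oat peptides, the counts above translate into the following percentages, counting the five conservative replacements as similar positions:

```latex
\text{identity} = \frac{72}{81} \approx 88.9\%, \qquad
\text{similarity} = \frac{72 + 5}{81} = \frac{77}{81} \approx 95.1\%
```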
Partial Conservation of Exon Boundaries in Eukaryotic ATPases-The 15 introns interrupting the coding sequence of the Arabidopsis ATPase contain the invariant splice junction sequences GT (at 5') and AG (at 3') (Fig. 2). The surrounding nucleotides also conform to the consensus sequences of plant splice junctions (Brown, 1986). Fourteen of the 15 introns are small (69-222 base pairs), as seems to be the rule for Arabidopsis genes (Chang and Meyerowitz, 1986; Ludwig et al., 1987; Nairn et al., 1988). However, the first intron is 1012 base pairs long, the largest intron found so far in Arabidopsis.
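Checking introns for the invariant GT...AG dinucleotides, and removing them to compare the genomic and cDNA sequences, is straightforward once intron coordinates are known. The sketch below assumes 0-based, end-exclusive intron coordinates on the coding strand; the toy gene is hypothetical and the consensus check is limited to the two invariant dinucleotides.

```python
def check_introns(genomic, introns):
    """Return (length, has GT...AG ends) for each intron (0-based, end-exclusive)."""
    report = []
    for start, end in introns:
        seq = genomic[start:end].upper()
        report.append((end - start, seq.startswith("GT") and seq.endswith("AG")))
    return report


def splice(genomic, introns):
    """Remove sorted, non-overlapping introns to give the spliced sequence."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        pieces.append(genomic[pos:start])  # exon preceding this intron
        pos = end
    pieces.append(genomic[pos:])           # final exon
    return "".join(pieces)


# Hypothetical toy gene: two exons around a 72-bp intron.
gene = "ATGGCC" + "GTAAGT" + "T" * 60 + "TTGCAG" + "GCTTAA"
introns = [(6, 78)]
print(check_introns(gene, introns))  # [(72, True)]
print(splice(gene, introns))         # ATGGCCGCTTAA
```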
In order to compare the exon boundaries of different eukaryotic ATPases, we aligned the sequences and located the positions of introns (exon boundaries) in the Arabidopsis ATPase, Neurospora H+-ATPase (Addison, 1986; Hager et al., 1986), rabbit Ca2+-ATPase (Korczak et al., 1988), and human (Na+/K+)-ATPase (Ovchinnikov et al., 1988). As the overall similarity between distant members of the ATPase family is low, aligning the different sequences on the basis of similarity alone may be misleading. On the other hand, even distantly related proteins of a given family share a common folding pattern (Lesk and Chothia, 1986). Therefore, we combined the alignment based on sequence similarity with one based on predicted secondary structure (Fig. 4). First, the positions of the predicted transmembrane α-helices (H-1 to 9) and of the motifs conserved in all the ATPases (C-1 to 6, see Fig. 3) were aligned for all the enzymes. Then the intervening sequences with low similarity were aligned according to the predicted regions of α-helix and β-sheet strands. Owing to the uncertainties of present methods of structure prediction (Kabsch and Sander, 1983), the results of such an analysis must be interpreted with caution. However, the pattern emerging from Fig. 4 is that, allowing for some insertions and gaps, predicted secondary structure is well conserved in all the ATPases. The main common features of the prediction are: 1) the N- and C-terminal domains contain mostly α-helices; 2) the domain between H-2 and H-3 contains mostly β-sheets; 3) the central domain between H-4 and H-5 contains both α-helices and β-sheets; 4) there is a small region of β-sheets before C-6. In addition, the positions of exon boundaries are partially conserved in relation to the predicted structural elements of the ATPases. Examples of conserved introns include those after H-2, H-6, H-7, and H-9, those before H-1 and H-5, and those close to C-3, C-5, and C-6.

FIG. 3. Alignment of the amino acid sequences of A. thaliana (upper lines) and S. cerevisiae (lower lines) plasma membrane H+-ATPases.
The programs Bestfit and Gap (Devereux et al., 1984) were employed. Similar amino acids are linked by dashes. The positions of the hydrophobic stretches proposed to constitute transmembrane α-helices (H-1 to 9) and of the motifs conserved in eukaryotic ATPases (C-1 to 6) are underlined. The peptides sequenced from the oat root plasma membrane ATPase (Schaller and Sussman, 1988) are aligned on top of the Arabidopsis sequence.

Evidence for ATPase Isoforms-That the cloned ATPase gene is expressed in plants was demonstrated not only by the isolation of a cDNA clone but also by Northern analysis (Fig. 5A). On the other hand, Southern analysis under conditions of high stringency indicated the presence of other related genes in the Arabidopsis genome (Fig. 5B). In addition to the fragments predicted from the restriction map of the cloned gene (the stronger bands in Fig. 5B; see Fig. 1), SacI and SalI produce one and HindIII three additional fragments that give a weaker hybridization signal.
This suggests that the number of related genes is at least three. Accordingly, we have isolated two different cDNA clones that cross-hybridize with the original cDNA and genomic clones. When these cDNAs were used as probes in a Southern blot similar to that of Fig. 5B, the weaker bands became the predominant ones (data not shown). Partial sequencing (corresponding to about 400 amino acids) indicates that the proteins encoded by these two genes are 96% identical to each other and slightly less related (90-92% identity) to the sequence of Fig. 2. Therefore, these two other cDNAs probably correspond to two closely similar isoforms of the H+-ATPase. A full description of these additional isoforms will be reported elsewhere.
During review of the present manuscript, we became aware that two H+-ATPase cDNA clones from Arabidopsis have been isolated in the laboratory of M. Sussman (University of Wisconsin). The sequence of one of the two has been published (Harper et al., 1989), and we have obtained this complete sequence and a partial sequence of the other cDNA from Dr. M. Sussman. The AHA-1 and AHA-2 cDNA clones of Harper et al. (1989) correspond to the two closely related cDNA clones we have partially sequenced.
It is not clear why these other isoforms were not found in the genomic library, where only overlapping clones of the same isoform were isolated. Apparently, the genomic library used for the present work is biased against the other ATPase genes. This could also explain why the gene for the Ca2+-ATPase recently identified in plant plasma membranes (Briars et al., 1988) could not be found in the genomic library. This gene should hybridize with the oligonucleotide probes employed in the screening, and therefore it is probably also poorly represented in the library.

FIG. 4. … (Addison, 1986; Hager et al., 1986); Ca, rabbit Ca2+-ATPase (Korczak et al., 1988); NaK, human (Na+/K+)-ATPase (Ovchinnikov et al., 1988). Sequences were aligned by fixing the positions of hydrophobic stretches (H-1 to 9) and conserved motifs (C-1 to 6) and then taking into account the similarities in predicted secondary structure.

DISCUSSION
The close similarity in amino acid sequence between fungal and plant plasma membrane H+-ATPases revealed by the present work supports the suggestion, made on the basis of biochemical studies (Serrano, 1984), that these enzymes correspond to the same type of proton pump. The protozoan ATPases could also correspond to the same type of enzyme (Meade et al., 1987), but there are no biochemical studies to support this point. Now that phylogenetically distant H+-ATPases have been sequenced, it is of interest to correlate electrogenic proton transport with the presence of conserved polar groups in the proposed transmembrane helices. The most conspicuous feature is an aspartate residue in the middle of proposed transmembrane helix H-6 (Asp-685 in Arabidopsis, Asp-730 in Saccharomyces), which is conserved in fungal, plant, and protozoan H+-ATPases. Recent evidence from site-directed mutagenesis of the yeast ATPase indicates that this residue is essential for activity.²
It was of interest to compare the positions of exon boundaries in plant and animal ATPases in order to investigate whether, as observed for other protein families (Gilbert et al., 1986), there is a correspondence between exons and structural modules. The partial conservation of exon boundaries detected by comparing the structures of the Ca2+-ATPase and (Na+/K+)-ATPase genes (Korczak et al., 1988) can now be extended to the plant H+-ATPase. The 15 introns of the Arabidopsis gene are located at positions in the predicted secondary and transmembrane structure that are roughly equivalent to the intron positions of animal ATPases (Fig. 4). The small differences in exon boundaries can be explained by mechanisms that alter the position or existence of introns during evolution (Traut, 1988). Sliding of exon-intron junctions has been detected in different protein families and proposed to contribute to evolutionary change (Craik et al., 1983).
Therefore, it has been suggested (Traut, 1988) that the observed relationship of exons to protein structure represents a degenerate state of an ancestral correspondence between exons and structure-function modules in proteins.
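Comparing exon boundaries across proteins, as done for Fig. 4, amounts to projecting each gene's intron positions onto the columns of a shared alignment. The sketch below is a simplified illustration: intron positions are given as "after residue k" (0-based), intron phase is ignored, and the tolerance for calling two boundaries equivalent is an arbitrary, hypothetical parameter. The toy alignment and intron positions are invented.

```python
def residue_columns(aligned):
    """Alignment column of each residue (0-based) of a gapped sequence."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def project_introns(aligned, intron_after):
    """Map introns given as 'after residue k' onto alignment columns."""
    cols = residue_columns(aligned)
    return sorted(cols[k] for k in intron_after)


def shared_boundaries(cols_a, cols_b, tolerance=3):
    """Pairs of boundaries, one from each protein, within `tolerance` columns."""
    return [(a, b) for a in cols_a for b in cols_b if abs(a - b) <= tolerance]


# Hypothetical toy alignment and intron positions.
plant = "MKT-LVDE"
animal = "MKTALV-E"
plant_cols = project_introns(plant, [2, 5])    # after T and after D
animal_cols = project_introns(animal, [3, 5])  # after A and after V
print(plant_cols, animal_cols, shared_boundaries(plant_cols, animal_cols, tolerance=1))
# [2, 6] [3, 5] [(2, 3), (6, 5)]
```

Allowing a small column tolerance mirrors the "sliding" of exon-intron junctions discussed above: boundaries displaced by a residue or two are still counted as equivalent.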
Of the 25 predicted transmembrane helices of plant and animal ATPases, only two (H-8 of the Arabidopsis ATPase and H-4 of the Ca2+-ATPase) are clearly interrupted by introns. Five other transmembrane segments are interrupted near their N or C termini (Fig. 4). This small fraction of interrupted hydrophobic stretches is within the range found in an extensive survey of membrane proteins (Argos and Rao, 1985). When the limitations of the methods employed to predict transmembrane helices are taken into account (Engelman et al., 1986), the observed intron positions in the ATPase genes support a correspondence between exons and transmembrane segments (Argos and Rao, 1985; Kopito et al., 1987). In both soluble (Craik et al., 1983) and membrane (Argos and Rao, 1985) proteins, splice junctions seem to map to surface loops that separate elements of secondary structure. The most conserved regions of the ATPases (C-1 to C-6, Fig. 4) are predicted to be surface loops, and therefore introns very often map in these regions, with some displacement between the different ATPases.
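The distinction drawn above between introns that clearly interrupt a transmembrane segment and those lying near its N or C terminus can be made explicit with a margin. The sketch below classifies residue positions against predicted segments; the three-residue margin and the example coordinates are hypothetical choices, not criteria stated in the text.

```python
from collections import Counter


def classify(intron_pos, segments, margin=3):
    """Classify an intron at residue index intron_pos against TM segments.

    segments are (start, end) residue ranges, 0-based and end-exclusive;
    'near end' means within `margin` residues of either end of a segment.
    """
    for start, end in segments:
        if start <= intron_pos < end:
            if intron_pos - start < margin or (end - 1) - intron_pos < margin:
                return "near end"
            return "interrupts"
    return "outside"


segments = [(10, 31), (60, 81)]  # hypothetical predicted helices
introns = [5, 11, 20, 45, 79]    # hypothetical intron positions
tally = Counter(classify(p, segments) for p in introns)
print(tally)
```

With these toy inputs two introns fall outside any segment, two lie near a segment end, and one interrupts a segment; changing the margin shifts introns between the last two classes, which is why such counts depend on the precision of helix-boundary prediction.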

… and developmental stages. The identification of isoforms of the plant H+-ATPase suggests that a similar complexity of regulated expression may be present in this case. It is interesting that, despite the general observation that Arabidopsis genes contain few introns and exist as only one or a few isoforms (Meyerowitz, 1987), a relative complexity is found for the H+-ATPase gene. This may be related to the important regulatory role of this enzyme in plant physiology. In addition, even more introns and isoforms may be expected in the H+-ATPase genes of other higher plants.