Gene Structure of the Human Mitochondrial Adenosine Triphosphate Synthase β Subunit*


The mitochondrial ATP synthase β subunit is encoded by a nuclear gene and assembled with the other subunits, which are encoded by both mitochondrial and nuclear genes. As the next step in analyzing the molecular mechanisms that coordinate the two genetic systems, the gene for the human β subunit was cloned, and its structure was determined. The gene contains 10 exons; the first exon corresponds to the 5' noncoding region and most of the presequence that targets this protein to the mitochondria. Eight Alu repeating sequences, including inverted repeats, were found in the 5' upstream region and introns. An S1 nuclease protection experiment revealed two transcription initiation sites. No typical TATA box was present about 30 base pairs upstream from either initiation site. Three CAT boxes (CCAAT) were found between the two initiation sites, and one more CAT box was found 41 base pairs upstream from the first initiation site. Two GC boxes (potential Sp1 binding sites) were located in the 5' upstream region, one of them linked to Alu repeating sequences. To determine promoter activity, fragments of various lengths from the 5' upstream region were fused to a chloramphenicol acetyltransferase gene and transfected into cultured cells. This experiment showed the existence of an enhancing structure(s) for transcription between nucleotides -400 and -1100 in the upstream region.
Mitochondrial ATP synthase (F0F1) catalyzes ATP formation, using the energy of the proton flux through the inner membrane during oxidative phosphorylation (1, 2). Two subunits of mammalian F0F1 are encoded by mitochondrial genes (3), and the other subunits (7-12 subunits) by nuclear genes (4). The β subunit is encoded by the nuclear genome, synthesized in the cytosol, imported into mitochondria, and then assembled with the other subunits (5). The number of mitochondria per cell varies greatly depending on the developmental stage, cell activity, and type of tissue (6, 7). These facts suggest a functional interaction between the two genetic systems; however, the molecular mechanism that coordinates them is unknown. Understanding the molecular basis of this coordination requires an analysis of the regulatory system for the subunits encoded in the nuclear genome. As the first step, we cloned the cDNA of the human β subunit (8).

* This research was supported by Grant 62617006 from the Ministry of Education, Science, and Culture of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 503906.
§ To whom correspondence should be addressed.
All F0F1 subunit structures reported to date, for either prokaryotes or eukaryotes, are very similar (1, 2). In particular, the primary structure of the β subunit is highly conserved among species. In addition, it contains sequences very similar to parts of some nucleotide binding proteins, such as recA protein, adenylate kinase, and the ras gene product (2). Thus, it may be possible to correlate the function of each region of the human β subunit with the exon structure.
Here, we report the organization of the gene, some structures involved in its expression, and the existence of an enhancing structure for transcription in the 5' upstream region.

EXPERIMENTAL PROCEDURES
Gene Cloning and DNA Sequencing-Genomic DNA encoding the β subunit was isolated from a human genomic library, using the β subunit cDNA (8) (EcoRI fragment; 1.8 kilobase pairs) as a hybridization probe. The library was a gift from Dr. Nojima (Department of Pharmacology, Jichi Medical School). Human genomic DNA was partially digested with HaeIII and AluI and ligated into the EcoRI sites of λ phage Charon 4A DNA via EcoRI linkers. Phage DNA was purified from positive plaques as described (9). DNA fragments were subcloned into plasmid pUC18 or pUC19 (Fig. 1). The nucleotide sequence was determined by the dideoxynucleotide chain termination method using M13 phage single-stranded DNA as template (10, 11). Most of the sequence was determined by the shotgun method, using sonication to shear the fragments (12).
CAT Assay-An EcoRI-BamHI (nucleotide +432, Fig. 2) fragment containing the 5' upstream region, the first exon, and part of the first intron of the β subunit gene was subcloned into plasmid pTZ18R.
The plasmid was digested with EcoRI or DraI (nucleotide -355), shortened by treatment with Bal31 exonuclease, and then digested with BamHI. The ends of the digested fragments were filled in with T4 DNA polymerase, ligated with HindIII linkers (dCAAGCTTG), digested with HindIII, and fractionated by agarose gel electrophoresis. The isolated fragments were ligated into HindIII-digested pMLCAT (15, 16). pSV2CAT was used as a positive control (17). The plasmids were isolated by the alkaline sodium dodecyl sulfate method (9), digested with DNase-free RNase A, and purified by CsCl gradient centrifugation (9) followed by Sephacryl S 2000 column chromatography (Pharmacia LKB Biotechnology Inc.). These plasmids were RNA-free. The orientations and lengths of the inserts were determined by electrophoresis after digestion with selected restriction enzymes. Plasmid DNA (10 μg) was transfected into HeLa or mouse A9 cells (2 × 10⁶ cells in 9-cm dishes) (14, 18). After 2 days, the cells were lysed, and the lysates (2 mg of protein/ml) were incubated with acetyl-CoA and [14C]chloramphenicol for 1 h at 37 °C (14). The acetylated chloramphenicol was separated by thin layer chromatography on Merck high performance thin layer chromatography precoated plates of Silica Gel 60 with a concentrating zone. For quantitative analysis, the silica gel around each radioactive spot was scraped off and assayed for radioactivity in a scintillation spectrometer.

¹ The abbreviation used is: bp, base pair(s).

Fig. 1. The fragments were subcloned as shown by long arrows. The small fragments for nucleotide sequencing were obtained by sonication or with restriction enzymes. Some parts were determined by using synthetic oligonucleotides (17 bases in length; synthesized with an automatic DNA synthesizer, Applied Biosystems Model 380B, Foster City, CA) as primers. The direction and extent of sequence determinations are indicated by short arrows; separate symbols mark sequences obtained by sonication, with restriction enzymes, or by using a synthetic oligonucleotide as a primer. The locations of the Alu repetitive sequences in the human F1-ATPase β subunit gene are also shown by arrows.
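The quantitative step above (scraping the acetylated and unacetylated spots and counting them) is usually reduced to a percent conversion, which is then compared between constructs. The sketch below assumes background-corrected counts, equal protein input per assay, and conversion within the linear range of the assay; the counts are hypothetical and the function names are illustrative, not taken from the paper.

```python
"""Sketch of CAT assay quantification from scintillation counts (hypothetical data)."""


def percent_acetylated(acetylated_cpm: float, unacetylated_cpm: float) -> float:
    """Percent of [14C]chloramphenicol converted to acetylated forms.

    Assumes both counts are already background-corrected.
    """
    total = acetylated_cpm + unacetylated_cpm
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * acetylated_cpm / total


def fold_over_reference(construct_pct: float, reference_pct: float) -> float:
    """Relative CAT activity of one construct versus a reference construct."""
    if reference_pct <= 0:
        raise ValueError("reference activity must be positive")
    return construct_pct / reference_pct


if __name__ == "__main__":
    # Hypothetical counts (cpm) for a short and a long promoter construct.
    short = percent_acetylated(acetylated_cpm=800, unacetylated_cpm=9200)   # 8.0 %
    long_ = percent_acetylated(acetylated_cpm=3200, unacetylated_cpm=6800)  # 32.0 %
    print(f"{fold_over_reference(long_, short):.1f}-fold")  # 4.0-fold
```

Expressing activity as a ratio to a reference construct (or to the pSV2CAT control) cancels much of the dish-to-dish variation in transfection efficiency, which is why fold values rather than raw counts are usually reported.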

RESULTS

Molecular Cloning-A human genomic library constructed in Charon 4A was screened with 32P-labeled human F1 β subunit cDNA (8). Of a total of 1,500,000 plaques, five were positive. The positive plaques were purified by successive rounds of plaque hybridization. The phage DNAs were isolated and subjected to Southern analysis using the 5' region (EcoRI-Aut11 fragment; 280 bp) and the 3' region (SmaI-EcoRI fragment; 886 bp) of the cloned cDNA (8) as hybridization probes. Three kinds of clones were isolated. DNA from the first clone hybridized with both the 5' and 3' regions of the cDNA. The second clone appeared to contain an allele of the gene in the first clone, because all but the KpnI fragments were identical. DNA in the third clone was shorter than that in the others, so this gene may be a pseudo-gene.² Therefore, the structure of the DNA in the first clone was determined in detail.

Organization of the human F1-ATPase β subunit gene is shown in Fig. 1, and most of the nucleotide sequence is shown in Fig. 2. The intron/exon junctions were determined by direct comparison of the genomic nucleotide sequence with that of the cloned cDNA. The sequence at the 5' end of the cDNA was not found in the gene. In addition, the sequence at the point where the cDNA and genomic sequences diverged did not agree with the consensus sequence (AG/GT rule) for intron/exon junctions (19). An S1 mapping experiment showed that mRNA from HeLa cells hybridized with the corresponding fragment of the genomic gene (Fig. 4). Therefore, we conclude that 18 nucleotides encoding the presequence were rearranged, probably during construction of the cDNA library. The presequence should be MLGFVGRVAAAPASGALRRLTPSASLPPAQLLLRAVRRRSHPVREYAAQ instead of PPAQLLLRAVRRRSHPVREYAAQ (8). Sequences at all other intron/exon junctions were consistent with the AG/GT rule. The gene is composed of 10 exons. The first exon contains the 5' noncoding region and most of the presequence that targets this protein to the mitochondria.
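Once exon boundaries have been placed by aligning the cDNA to the genomic sequence, each implied intron can be checked against the AG/GT consensus: the intron should begin with GT (5' splice site) and end with AG (3' splice site). The sketch below assumes 0-based, half-open coordinates and uses a hypothetical toy gene, not the β subunit sequence; the function names are illustrative. It tests only the canonical dinucleotides, so rarer non-canonical introns would be reported as failures.

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Intron intervals lying between consecutive exons.

    Exons are (start, end) pairs in 0-based, half-open genomic coordinates,
    sorted by start position.
    """
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def follows_gt_ag(genomic: str, intron: tuple[int, int]) -> bool:
    """True if the intron starts with GT and ends with AG."""
    start, end = intron
    seq = genomic[start:end].upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def check_junctions(genomic: str, exons: list[tuple[int, int]]) -> list[bool]:
    """Apply the GT-AG check to every intron implied by the exon coordinates."""
    return [follows_gt_ag(genomic, intron) for intron in introns_from_exons(exons)]


# Hypothetical toy gene: exon 1, one intron, exon 2.
toy = "ATGGCC" + "GTAAGTCTTTTCAG" + "GCTTAA"
print(check_junctions(toy, [(0, 6), (20, 26)]))  # [True]
print(check_junctions(toy, [(0, 5), (20, 26)]))  # [False]: 5' boundary misplaced by one base
```

A junction that fails the check, as the 5' end of the cDNA did here, is a hint that the cDNA and genomic sequences diverge for a reason other than splicing.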
The amino acid sequence of the human F1 β subunit is highly homologous to that of tobacco (20) (Fig. 3B). However, the gene organization, junction points, and lengths of exons and introns are quite different from those of the tobacco gene (Fig. 3A). Four of the splice junctions in the human gene were close to their counterparts in tobacco but were shifted in the 3' direction by three or four codons (Fig. 3B).

Structure in Noncoding Region-A stretch of 29 alternating Ts and Gs begins at nucleotide -1558. This sequence has the potential to form a left-handed helical structure, or Z-DNA (21). Eight Alu repeating sequences (22) are present in the gene (Figs. 1 and 2). Three inverted Alu repeats are located in the 5' upstream region, and the seventh intron contains two direct repeats of the Alu sequence.
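A stretch of strictly alternating Ts and Gs, like the one beginning at -1558, can be located with a single linear scan. The sketch below only finds the longest such run; it does not estimate Z-DNA-forming potential, and the input is a hypothetical toy sequence. Note that a (TG)n run on one strand reads as (CA)n on the other.

```python
def longest_alternating_run(seq: str, a: str = "T", b: str = "G") -> tuple[int, int]:
    """Return (0-based start, length) of the longest stretch alternating strictly
    between bases a and b. Returns (0, 0) if neither base occurs."""
    seq = seq.upper()
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, base in enumerate(seq):
        if base not in (a, b):
            run_len = 0          # any other base breaks the run
            continue
        if run_len > 0 and seq[i - 1] != base:
            run_len += 1         # alternation continues
        else:
            run_start, run_len = i, 1  # start a new run at this base
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start, best_len


print(longest_alternating_run("AATGTGTGTGCC"))  # (2, 8)
```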
Structure Related to Transcription-The sites of transcription initiation were determined by S1 nuclease mapping of the 5' end of the gene. The S1 nuclease protection experiment showed two initiation sites for transcription (Fig. 4), the first at +1 and the second at +197. No TATA box (19) was found about 30 nucleotides upstream from either initiation site. Three CAT boxes (CCAAT) (19, 23, 24) were found between the first and second initiation sites, and one more at -41. Two GC boxes (CCGCCC) (23, 25), potential Sp1 binding sites, were found, one of them linked to Alu repeating sequences (Fig. 2). Cis-acting sequences involved in the regulation of transcription were assayed by measuring CAT expression in transient expression assays. Fragments of various lengths from the 5'-flanking region of the β gene were inserted into a plasmid that carries the gene for chloramphenicol acetyltransferase, and these plasmids were transfected into HeLa cells or A9 cells by calcium phosphate co-precipitation (14, 17).

² H. Tomura, unpublished result.

MTSLWGKBTGCKLFKFRVAAAPASGALRRLTPSASL-
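The CAT-box and GC-box assignments above amount to finding exact occurrences of CCAAT and CCGCCC on either strand and reporting them relative to the initiation site. A minimal sketch, assuming exact matching only (real promoter analysis typically uses position weight matrices) and the usual promoter numbering in which +1 is the initiation site, -1 the base before it, and there is no position 0. The toy sequence and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based top-strand start, strand) for each occurrence of motif.

    A '-' hit means the reverse complement of the motif occurs at that position.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:  # skip duplicate hits for palindromes
            hits.append((i, "-"))
    return hits


def to_promoter_coord(index: int, tss_index: int) -> int:
    """Convert a 0-based index to TSS-relative numbering (+1 = TSS, no 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


toy = "TTCCAATTTGGGCGGTTATTGGAAA"  # hypothetical sequence
print(find_motif(toy, "CCAAT"))   # [(2, '+'), (17, '-')]
print(find_motif(toy, "CCGCCC"))  # [(9, '-')]
```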
The proximal 200-bp segment of 5'-flanking DNA was sufficient to elicit measurable expression (pHB3CAT). Inclusion of the region between -400 and -1100 (pHB1CAT) enhanced expression 9-fold over the level obtained with pHB3CAT (Fig. 5). Similar results were obtained on transfection into mouse A9 cells (data not shown). These results suggest the existence of an enhancing structure upstream from about -400.

Fig. 4. Determination of the transcription initiation sites by S1 mapping. The right panel outlines the experiment described under "Experimental Procedures." The hybrid protected from S1 nuclease digestion was subjected to electrophoresis in a sequencing gel (6% acrylamide gel with buffer gradient). Lane 1, sequence ladder obtained with a mixture of four bases. Lane 2, fragment markers obtained by digestion of φX174 DNA with HaeIII. Lane 3, labeled fragment hybridized with poly(A)+ RNA (7 μg) prepared from HeLa cells. Lane 4, fragment hybridized with 7 μg of yeast tRNA (control). The two arrows indicate the two fragments protected against S1 nuclease in lane 3.
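The text does not describe how the protected fragments in lane 3 were sized. One common approximation interpolates log10(size) linearly against migration distance between neighbouring marker bands (lane 2), then places the initiation site by counting back from the labeled probe end. The sketch below assumes an antisense probe 5'-labeled at a known top-strand position; migration distances are hypothetical, and single-nucleotide accuracy would in practice come from a sequence ladder (lane 1), not from the markers.

```python
import math

# Standard fragment sizes (bp) of a HaeIII digest of phiX174 RF DNA.
PHIX174_HAEIII_BP = [1353, 1078, 872, 603, 310, 281, 271, 234, 194, 118, 72]


def estimate_size(distance: float, marker_distances: list[float],
                  marker_sizes: list[int]) -> float:
    """Estimate a fragment's length from its migration distance, assuming
    log10(size) varies linearly with distance between neighbouring markers."""
    pairs = sorted(zip(marker_distances, marker_sizes))
    for (d1, s1), (d2, s2) in zip(pairs, pairs[1:]):
        if d1 <= distance <= d2:
            frac = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("distance lies outside the marker range")


def tss_index(labeled_end: int, protected_nt: int) -> int:
    """Top-strand index of the RNA 5' end when the protected fragment runs
    from the initiation site to the labeled probe end (both inclusive)."""
    return labeled_end - protected_nt + 1


# Hypothetical gel: distances generated from an idealized log-linear relation.
dists = [50 * (4 - math.log10(s)) for s in PHIX174_HAEIII_BP]
size = estimate_size(50 * (4 - math.log10(200)), dists, PHIX174_HAEIII_BP)
print(round(size))  # 200
```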

DISCUSSION
Organization of the Gene-The F0F1 complex is found in most organisms. The β subunit contains the catalytic center of F0F1, and its structure is highly conserved (2). Thus, the organization of the gene may provide evidence of evolutionary relationships. Boutry and Chua (20) described the organization of the gene for the β subunit of Nicotiana plumbaginifolia. The length and number of its introns and exons differ from those of the human gene. Four intron/exon junctions of the human gene were close to their counterparts in tobacco but were shifted three or four codons in the 3' direction. The shifting may be explained by mutations that block a splice junction, unmasking a cryptic splice site that adds or deletes a small amount of protein sequence (26). However, it is difficult to explain why the splice junctions were shifted in the same direction in each case.
Most mitochondrial proteins encoded in the nuclear genome have transient presequences (5). These peptides, which direct proteins into the mitochondria, are sometimes embedded in a cytosolic protein (27). Since such targeting sequences are often found even in Escherichia coli (28), they may have been fused to an existing protein during evolution, converting it into a mitochondrial protein. According to the "exon-shuffling" model (29), such an additional function should be provided by a separate exon. In fact, the first exon of the gene for the human ATP synthase β subunit corresponds to the presequence. According to the "module hypothesis" of Go (30), an exon is a structural unit that forms a compact peptide, which suggests that the presequence peptide has a compact structure. Nucleotide binding proteins share regions of sequence similarity that constitute their nucleotide binding sites. In the β subunit, such regions are found in exons 5 and 6 (8). The Alu repeating sequences are located in introns 3 and 7. Thus, the exons that code for the nucleotide binding sites may have been rearranged at the sites of the repetitive sequences during evolution. Our finding of a pseudo-gene lacking exons 4-7³ is consistent with this hypothesis. The cluster of acidic and basic amino acid residues in the β subunit is proposed to induce the conformational change driven by the proton flux (31). The region encoding this acid-base cluster was found in exon 9, which is also located between two Alu sequences.

³ H. Tomura and S. Ohta, unpublished observation.

Expression of the Gene-ATP synthase and some enzymes of the respiratory chain are complexes of the products of both nuclear and mitochondrial genes (4, 32). Mitochondria vary in number and morphology depending on intracellular and extracellular conditions. However, the fundamental molecular mechanism that coordinates the two genetic systems is unknown.
Mammalian cells in culture are good experimental systems for studying this problem for several reasons. Mammalian mitochondrial DNA has been studied as extensively (33-35) as that of yeast. Mammalian DNA fragments can be transfected into cultured cells, and promoter activity can easily be estimated (18). Several transcription factors have been purified from mammalian cells. In addition, it may be possible to study nuclear-mitochondrial interactions using abnormal mitochondria from patients with mitochondrial myopathy (36). Therefore, we used a human cell line for our analysis.

S1 mapping revealed two initiation sites for transcription. The role of the two start sites in the regulation of transcription is unknown. Three CAT boxes (CCAAT) are located between the first and second initiation sites. If a transcription factor binds to one of these CAT boxes, transcription from the second site may be enhanced. A stretch of alternating Gs and Ts was found in the 5'-flanking region. Although the tendency of this sequence to form a left-handed helix, or Z-DNA, is weaker than that of (GC)n, the sequence (GT)n is widely distributed in eukaryotic genomes (21). Possible effects of left-handed DNA on transcription have been discussed (21). The role of the alternating GT structure in the expression of the β subunit gene is under investigation.
Three inverted Alu sequences were found in the 5' upstream region, one of them linked with a GC box. The role of the repetitive sequences in regulating gene expression is not clear, but the CAT assay showed that the fragment containing the Alu repeats (-400 to -1100) enhanced transcription. Further analysis is required to determine the roles of these structures. In any event, our data demonstrate that full expression of this gene, which encodes a "housekeeping" enzyme, requires more than 400 bp of upstream sequence. It will be important to isolate the trans-acting transcription factors involved. Analysis of the gene structure may allow us to elucidate the molecular basis of the regulation and coordination of the synthesis of the ATP synthase complex in cultured cell systems.