Structural Organization of the Gene for the E1α Subunit of the Human Pyruvate Dehydrogenase Complex*

The structural organization of the X-linked gene for the E1α subunit of the human pyruvate dehydrogenase complex has been determined by restriction endonuclease mapping and DNA sequence analysis of overlapping genomic clones. The gene is approximately 17 kilobase pairs long. It contains 11 exons ranging from 61 to 174 base pairs and introns ranging from 600 base pairs to 5.7 kilobase pairs. All the splice donor and acceptor sites conform to the GT/AG rule. The transcription initiation site was determined by S1 nuclease mapping. The DNA sequence around this site is very GC-rich. A “TATA box”-like sequence and a “CAAT box”-like sequence are present 24 and 113 bases upstream from the cap site, respectively. Also upstream from the cap site are several sets of inverted repeats, direct repeats, several sequences resembling the transcription factor Sp1 binding site, a glucocorticoid-responsive element, and two cAMP receptor binding sites.
The pyruvate dehydrogenase (PDH) complex is one of the major enzyme systems involved in the control of aerobic energy metabolism. This complex, with an M_r of about 7 × 10^6, converts pyruvate to acetyl-CoA within the mitochondrial matrix. The catalytic activity of the complex is maintained by three enzyme components: pyruvate decarboxylase (E1, EC 1.2.4.1), dihydrolipoamide acetyltransferase (E2,

The main mechanism for regulation of PDH complex activity is a cycle of phosphorylation and dephosphorylation (2). Phosphorylation occurs on 3 serine residues on the E1α subunit by a specific PDH kinase. However, phosphorylation of only 1 of these residues is required for inactivation of the whole complex (3). The removal of phosphates by a specific phosphatase restores activity.
Genetic defects in the PDH complex are reported to be the most common cause of primary lactic acidosis in humans (4). In the majority of cases, the basic defect appears to be in the E1α subunit.
*This work was supported in part by the National Health and Medical Research Council of Australia. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
$ Present address: Department of Pediatrics, School of Medicine, Tohoku University, Sendai, Japan.
ll To whom correspondence and reprint requests should be addressed.
We have previously reported the isolation of cDNA clones corresponding to the entire length of the mRNA for the PDH E1α subunit (5). In addition, the gene for the functional E1α subunit has recently been localized to the p22.1-22.2 region of the human X chromosome (6). Here we report the gene structure of the human PDH E1α subunit. This will facilitate further studies on the expression and regulation of this gene and will provide necessary information for analysis of mutations in patients with PDH E1α deficiency.

EXPERIMENTAL PROCEDURES
Isolation and Screening of Cosmid Genomic Clones-The construction of a human leukocyte genomic DNA library using the cosmid vector pCV001 has been described previously (7). Approximately 90,000 recombinant colonies were transferred to nitrocellulose membranes and hybridized to a [32P]oligo-labeled human PDH E1α subunit cDNA probe (clone PDH1) (5). DNA was extracted from positive colonies by alkaline lysis (8) and characterized by restriction endonuclease mapping.
Restriction Endonuclease Mapping and Southern Blotting-Initial restriction enzyme mapping of cosmid clones was carried out by a series of partial digestions of linearized cosmid DNA (cleaved at the SalI vector site) followed by electrophoresis on 0.7% agarose gels and hybridization to [32P]oligo-labeled PstI (2.3 kb) and SalI/ClaI (2.5 kb) fragments which were isolated from the vector pCV001. These fragments flank the two ends of the polylinker cloning site of pCV001. Restriction sites were positioned in order of increasing fragment size measured from the probe. In addition, restriction enzyme sites were determined more precisely by a series of single and double digests (8).
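The logic of end-probe partial-digest mapping can be illustrated with a minimal sketch. It assumes idealized data in which every band detected by the vector end probe extends from the labeled end of the linearized cosmid to one restriction site, so that sorted band sizes give site positions directly. The function name and the band sizes are hypothetical and are not values from this study.

```python
def sites_from_partial_digest(band_sizes_kb, insert_length_kb):
    """Order restriction sites from end-probe partial-digest band sizes.

    Assumes each band detected by the end probe runs from the labeled end
    of the linearized clone to one restriction site, so each band size
    equals the distance of a site from that end (idealized assumption).
    """
    # The full-length (uncut) molecule is not a restriction site.
    sizes = sorted(s for s in set(band_sizes_kb) if s < insert_length_kb)
    # Distances between neighbouring sites give the complete-digest fragments.
    edges = [0.0] + sizes + [insert_length_kb]
    fragments = [round(b - a, 3) for a, b in zip(edges, edges[1:])]
    return sizes, fragments


# Hypothetical band sizes (kb) for one enzyme on a 40-kb linearized clone.
positions, fragments = sites_from_partial_digest([12.5, 3.0, 40.0, 21.0], 40.0)
print(positions)  # site positions measured from the probed end
print(fragments)  # fragment sizes expected in a complete digest
```

Comparing the predicted complete-digest fragments with observed single and double digests is one way such a map could be cross-checked.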
DNA fragments containing the PDH E1α gene were determined by blotting the restriction enzyme-digested DNA onto Genescreen Plus membranes followed by hybridization to a [32P]oligo-labeled PDH E1α subunit cDNA probe. A HindIII fragment (approximately 3.2 kb in size) containing the promoter region of the gene was also identified by hybridization to a [γ-32P]ATP end-labeled synthetic oligonucleotide probe specific to the 5'-untranslated end of the PDH E1α subunit cDNA. Exon- and promoter-containing fragments were subsequently subcloned into either Bluescribe M13- or pUC9 vectors for further analysis.
Restriction enzyme sites of exon-containing fragments were mapped by a series of single and double digests. Exons were located by hybridization to [32P]oligo-labeled exon regions (determined by sequencing genomic DNA as described below) or using synthetic oligonucleotide probes (determined from the PDH E1α subunit cDNA sequence). In addition, exon locations were confirmed on the basis of unique restriction endonuclease sites deduced from the cDNA sequence.
DNA Sequence Analysis-To determine exon regions, cosmid DNA was digested with Sau3AI restriction endonuclease and "shotgun" cloned into the BamHI site of M13mp19 (9). Approximately 1600 clones were screened for the presence of exon segments using PDH E1α subunit cDNA or specific cDNA restriction endonuclease fragments as probes. Single-stranded DNA isolated from positive clones was sequenced using the Sequenase kit. The products of the sequencing reactions were separated by electrophoresis on denaturing 5% polyacrylamide/urea gels. Gels were dried with heat under vacuum and autoradiographed. In all, approximately two-thirds of the PDH E1α cDNA was covered using this approach. Alternatively, for exons containing unique restriction endonuclease sites (deduced from the cDNA sequence), the appropriate enzymes were used to generate DNA fragments in the cosmid DNA so that, after cloning the purified fragments into M13mp18 or -mp19, the exons would be adjacent to the priming site in the vector for sequence analysis. In addition, some exons were sequenced directly from double-stranded cosmid DNA using exon-specific synthetic oligonucleotide primers specific to both ends of the exon. In total, ten 18-base primers were used.
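The Sau3AI shotgun step can be sketched in silico. Sau3AI cuts immediately 5' of its GATC recognition sequence, leaving a GATC overhang that is compatible with BamHI-cut (GGATCC) vector ends. The sketch below performs a complete digest of a single strand; the function name and the toy sequence are hypothetical and are not taken from the cosmid clones.

```python
import re

SAU3AI_SITE = "GATC"  # Sau3AI cuts immediately 5' of GATC


def sau3ai_fragments(seq):
    """Return the fragments of a complete Sau3AI digest of one strand (5'->3')."""
    seq = seq.upper()
    # Each match start is a cut point because the enzyme cuts before the G.
    cut_points = [m.start() for m in re.finditer(SAU3AI_SITE, seq)]
    edges = [0] + cut_points + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:]) if b > a]


# Hypothetical toy sequence containing three GATC sites (one inside GGATCC).
toy = "AAGATCTTTTGGATCCAAAAGATC"
frags = sau3ai_fragments(toy)
print(frags)
```

Every fragment produced by an internal cut begins with GATC, which is the overhang that anneals to the BamHI-cut vector.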
The nucleotide sequence of the promoter region was determined by inserting double-stranded genomic DNA into pUC9 and sequencing both DNA strands using universal sequencing primers and three 17-mer synthetic oligonucleotide primers. After determining approximately 200 bases using the first primer (specific to the 5'-untranslated end of the PDH E1α cDNA), a new 17-base primer (specific to the 5'-end of the previous nucleotide sequence) was used to sequence approximately another 200 bases and so on. Using this approach, 767 bases of the promoter region were sequenced.
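The stepwise primer strategy described above can be sketched as a simple coordinate walk. This is a minimal illustration under stated assumptions: each read yields a fixed number of new bases, the next primer is taken from the last bases of the previous read, and coordinates run along the strand being read. The function name, the fixed read length, and the 617-base template are hypothetical and do not reproduce the actual reads obtained in this study.

```python
def primer_walk(template_length, first_primer_start, read_length=200, primer_length=17):
    """Simulate primer walking along one strand of a template.

    Each round reads up to `read_length` bases downstream of the primer; the
    next primer is then placed on the last `primer_length` bases of that read.
    Returns a list of (primer_start, read_start, read_end) index tuples,
    with read_end exclusive.
    """
    rounds = []
    primer_start = first_primer_start
    while primer_start + primer_length < template_length:
        read_start = primer_start + primer_length
        read_end = min(read_start + read_length, template_length)
        rounds.append((primer_start, read_start, read_end))
        if read_end == template_length:
            break  # the template has been read to its end
        primer_start = read_end - primer_length
    return rounds


# Hypothetical 617-base template with the first primer at position 0.
walk = primer_walk(617, 0)
for primer_start, read_start, read_end in walk:
    print(primer_start, read_start, read_end)
```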

RESULTS
Isolation and Characterization of the Gene for the Human PDH E1α Subunit-A human leukocyte genomic DNA library constructed using the cosmid vector pCV001 was screened with the human PDH E1α subunit cDNA probe. Two overlapping positive clones, cPDH 4a and cPDH 3, spanning approximately 50 kb of total genomic DNA were isolated and mapped (Fig. 1). Southern blot analysis of digested cosmid DNA probed with the full length PDH E1α cDNA probe indicated that the major portion of the gene was localized within two EcoRI/BamHI restriction endonuclease fragments of 5.2 and 4.0 kb. Both fragments are contained within the region of overlap of the two cosmid clones. The cDNA probe also hybridized to an 8.8-kb EcoRI fragment located outside the overlap region on cPDH 4a, indicating that the gene extends into this fragment.
Comparison of Southern blot analyses of genomic DNA and cosmid DNA, digested with a variety of restriction enzymes and hybridized to the PDH E1α cDNA probe, suggested that the whole gene is contained within these fragments. The promoter region of the gene is present within the 8.8-kb EcoRI fragment and on a 3.2-kb HindIII fragment, as shown by hybridization to a synthetic oligonucleotide primer specific to the 5'-untranslated end of the PDH E1α subunit cDNA (data not shown).
Detailed Structure of the Gene for the Human PDH E1α Subunit-Fig. 1 and Table I show the detailed structure of the PDH E1α gene. Exon-intron boundaries were determined by sequencing the appropriate regions of cosmid DNA selected as described under "Experimental Procedures" and aligning them with the PDH E1α subunit cDNA sequence. Exon positions were subsequently mapped by alignment of restriction enzyme cleavage sites and hybridization of exon-specific fragments or exon-specific synthetic oligonucleotide probes to Southern blotted digested cosmid DNA. The gene for the human PDH E1α subunit contains 11 exons spanning approximately 17 kb of genomic DNA. As shown in Table I, the 11 exons range in size from 61 bp (exon 2) to 359 bp (exon 11, including the 3'-untranslated region) and the 10 introns range from approximately 600 bp (introns 2 and 10) to 5.7 kb (intron 1, including the 5'-untranslated region). All splice junction sequences flanking the introns conform to the consensus splice junction sequences and the GT/AG splice rule (13,14).
DNA sequence analysis of the two cosmid clones also enabled us to resolve differences between the published cDNA sequences (5,18,19). We have amended our previously published sequence as shown in Table II. The most important aspect of this correction is that the reading frame is in agreement with that of DeMeirleir et al. (19). In addition, the sequence analysis has confirmed the guanines at positions 178, 516, 823, and 1156, the cytosine at position 824, and the adenine at position 1171.
Preliminary S1 nuclease (Fig. 2) and primer extension (5) experiments, to determine the transcription initiation site, suggest that the 5'-end of the PDH E1α mRNA is the adenine residue, 105 bp upstream from the translation initiation codon. Exon 1 is therefore 158 bp, contains the 5'-untranslated region upstream of the initiation methionine codon, and codes for the first 19 amino acids (Table I).
Further characterization of the transcription initiation site is in progress.
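The GT/AG rule referred to above states that an intron, read 5' to 3' on the sense strand, begins with the dinucleotide GT and ends with AG. A minimal sketch of such a check follows. The function name and the intron sequences are hypothetical and are not the splice junction sequences of Table I.

```python
def conforms_to_gt_ag(intron):
    """True if an intron (5'->3', sense strand) begins with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Hypothetical intron sequences used only to exercise the check.
hypothetical_introns = {
    "intron_a": "GTAAGTCCTTTCTTTCAG",
    "intron_b": "GTGAGTTTTTCCCTGCAG",
    "bad_intron": "GCAAGTCCTTTCTTTCAG",  # starts with GC, so it fails the rule
}
results = {name: conforms_to_gt_ag(seq) for name, seq in hypothetical_introns.items()}
print(results)
```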
The promoter region of the gene for the human PDH E1α subunit is shown in Fig. 3. A "TATA box"-like sequence (TTATTA) is present at position -29 to -24. A "CAAT box"-like sequence (CCAAT) is present at -117 to -113. Two possible binding sites for the cellular transcription factor Sp1 (15) are present at positions -89 to -84 (GGGCGG) and -175 to -170 (CCGCCC antisense). A sequence identical with the glucocorticoid responsive element (GRE, AACCAGATGTTCT) (16) is found at position -424 to -412. Two "CACCC box"-like sequences are present at positions -81 to -77 and -178 to -174, respectively. The CACCC box element has been described previously for other genes including the human metallothionein IIA (26) and the rat tryptophan oxygenase gene (27). When situated immediately upstream of the GRE, the CACCC box element has been implicated in the cooperative induction of transcription with the GRE (27). Whether the CACCC box-like sequences in the gene for the human PDH E1α subunit downstream from the GRE are functional or not remains to be determined. Two possible cAMP-responsive elements (CCCGCGGC) specifically resembling the activator protein 2 (AP-2) binding site (17) are present at -55 to -48 and -242 to -235. Several sets of inverted and direct repeats are also present.
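The coordinates quoted above can be checked for internal consistency with a short sketch: the span of each element (end minus start plus one) should equal the length of the sequence given for it, and the antisense Sp1 site should be the reverse complement of the sense site. The element sequences and positions are those quoted in the text; the helper names are illustrative assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def span_matches(seq, start, end):
    """True if the inclusive coordinate span equals the sequence length."""
    return end - start + 1 == len(seq)


# (element, sequence, start, end) relative to the cap site, as quoted in the text.
elements = [
    ("TATA box-like", "TTATTA", -29, -24),
    ("CAAT box-like", "CCAAT", -117, -113),
    ("Sp1 (sense)", "GGGCGG", -89, -84),
    ("Sp1 (antisense)", "CCGCCC", -175, -170),
    ("GRE", "AACCAGATGTTCT", -424, -412),
    ("AP-2-like, proximal", "CCCGCGGC", -55, -48),
    ("AP-2-like, distal", "CCCGCGGC", -242, -235),
]

checks = {name: span_matches(seq, start, end) for name, seq, start, end in elements}
print(checks)
print(reverse_complement("CCGCCC"))  # antisense Sp1 site read on the other strand
```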

DISCUSSION
The functional gene for the human PDH E1α subunit has been mapped to the p22.1 to 22.2 region of the X chromosome (6). This gene is approximately 17 kb in length and contains 11 exons as characterized from two cosmid clones.
DNA sequence analysis of exon regions in the two cosmid clones provided a means of comparing exons with the previously published PDH E1α cDNA sequences (5,18,19). The sequence reported by Koike et al. (18) differs from ours (5) in that it lacks a 93-bp segment which would occupy position 621 to 714 of our sequence. This could be due to a sequencing error. However, the deleted segment corresponds exactly to the complete sequence of exon 6. The PDH E1α cDNA clone of Koike's group originated from a human foreskin fibroblast cDNA library, in contrast to our PDH E1α cDNA clone which originated from a human liver cDNA library. Whether the deleted segment in the cDNA sequence of Koike's group represents differential or incorrect splicing of a primary transcript is not known at this time. Another possibility is that the PDH E1α cDNA sequence reported by Koike's group has originated from a "processed" pseudogene. Examples of processed human genes include metallothionein (21), β-tubulin (22), immunoglobulin (23), and phosphoglycerate kinase (24). We have recently reported a second locus on chromosome 4 which hybridizes with a weaker signal to the PDH E1α subunit cDNA probe (6). This locus may be a pseudogene or a gene for a closely related protein. It has not yet been determined whether sequences at the chromosome 4 locus are transcribed. Previous Northern blot analysis with the PDH E1α cDNA probe of the RNA from human tissues (skeletal muscle, cardiac muscle, liver, kidney, and brain) and human skin fibroblast cells demonstrated two different mRNA species (5). However, these two mRNAs, which differed only in the 3'-untranslated region, were both shown to be deficient in a patient with severe PDH deficiency (20), thus making it unlikely that they are derived from different gene loci. Two mRNA species were similarly detected by DeMeirleir et al. (19) in human skin fibroblast cells. In contrast, Koike's group detected only one mRNA species, which may be characteristic of the HeLa cell line used to extract poly(A)+ RNA.
FIG. 2. S1 nuclease analysis of the 5'-end of the human PDH E1α mRNA. Lane 1 displays S1 mapping with the 96-bp HinfI genomic probe. Dideoxynucleotide sequencing reactions (lanes A, C, G, and T) of the probe are also shown. The position of the 5'-end of the mRNA is indicated by the arrow. The band above is the reannealed probe.

The transcription initiation site is located in a very GC-rich region. Appropriately located in the putative promoter region of the gene for the human PDH E1α subunit are sequences resembling the TATA box (13), CAAT box (13), and Sp1 binding elements (15). These features are characteristic of "housekeeping" genes (25). The promoter region also contains inverted and direct DNA repeats. However, the significance of these repeats, if any, is unknown. Such inverted and palindromic repeats are capable of forming secondary structures that may be involved in the control and/or modification of gene expression. The precise role of the glucocorticoid- and cAMP-responsive elements which are also present in the promoter region of the gene for the PDH E1α subunit is not known at this time. In preliminary experiments, dexamethasone alone had no significant effect on PDH E1α mRNA levels from human skin fibroblast cells after 24 h of treatment with this steroid.
No clinically useful restriction fragment length polymorphisms have so far been detected in the gene for the PDH E1α subunit. Thus, knowledge about the gene organization will facilitate the search for polymorphisms which may be useful for genetic studies and prenatal diagnosis.
Having characterized the normal gene for the human PDH E1α subunit, further studies on the expression and regulation of this gene are underway. These studies will provide necessary information for the analysis of mutations and their effects in patients with PDH E1α deficiency.

FIG. 3. Nucleotide sequence analysis of the promoter region of the gene for the human PDH E1α subunit. The promoter region of the gene is numbered -767 to -1. +1 to +13 denotes part of the first exon. The boxed areas denote the following elements: TATA box, CAAT box, Sp1 binding sites (Sp1), consensus sequence for glucocorticoid responsive element (GRE), and cAMP receptor binding sites (AP-2), respectively. The brackets with the intervening broken line marked IR-1 to -6 denote inverted repeats. Paired arrows DR-1 and -2 denote direct repeats.