Nucleotide sequence of a cDNA for branched chain acyltransferase with analysis of the deduced protein structure.

The nucleotide sequence was determined for a 1.6-kilobase human cDNA putatively encoding the branched chain acyltransferase protein of the branched chain alpha-ketoacid dehydrogenase complex. Translation of the sequence reveals an open reading frame encoding a 315-amino acid protein of molecular weight 35,759 followed by 560 bases of 3'-untranslated sequence. Three repeats of the polyadenylation signal hexamer ATTAAA are present prior to the polyadenylate tail. Within the open reading frame is a 10-amino acid fragment which matches exactly the amino acid sequence around the lipoate-lysine residue in bovine kidney branched chain acyltransferase, thus confirming the identity of the cDNA. Analysis of the deduced protein structure for the human branched chain acyltransferase revealed an organization into domains similar to that reported for the acyltransferase proteins of the pyruvate and alpha-ketoglutarate dehydrogenase complexes. This similarity in organization suggests that a more detailed analysis of the proteins will be required to explain the individual substrate and multienzyme complex specificity shown by these acyltransferases.

* This work was supported by National Institutes of Health Grant AM 38320 and by grants from the Medical Research Council, United Kingdom and the University of Newcastle upon Tyne. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J03208.
¶ Research Fellow of the University of Newcastle upon Tyne.
** To whom correspondence should be addressed.
‡‡ Lister Institute Research Fellow.
¹ The abbreviations used are: BCKD, branched chain α-ketoacid dehydrogenase; kb, kilobase; PDH, pyruvate dehydrogenase; OGDH, α-ketoglutarate dehydrogenase; SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis.

Branched chain acyltransferase (BCKD-E2)¹ forms the core unit of the mitochondrial multienzyme complex, branched chain α-ketoacid dehydrogenase (BCKD) (1)(2)(3). BCKD-E2 is nuclear coded, and the mature form of the protein migrates with an apparent Mr value of 52,000 (4). A covalently bound lipoate is positioned near the N terminus of the protein (2,5). Although the catalytic properties of BCKD-E2 are well characterized, details of the structural properties are limited. The amino acid sequence for the protein has not been determined since it has been difficult to purify BCKD-E2 free from the associated subunits. To circumvent this problem, the amino acid sequence was deduced from the nucleic acid sequence of a cDNA encoding BCKD-E2.
We have previously reported the isolation of a human liver cDNA clone which, based on antibody selection, putatively coded for the human branched chain acyltransferase (6). Here we report the complete nucleotide sequence for this cDNA with the deduced amino acid sequence. We confirm the identity of this cDNA by matching a portion of the amino acid sequence to that of a peptide fragment isolated from bovine kidney BCKD-E2 (7). The entire deduced protein structure for human BCKD-E2 is then analyzed by comparison to other acyltransferase proteins.

MATERIALS AND METHODS
The 1.6-kilobase (kb) cDNA, putatively BCKD-E2, was cloned into M13 mp18 and mp19. Nucleic acid sequencing was done according to protocols given in ATP as the labeled nucleotide. Oligonucleotide primers (17 bases) synthesized in the Microchemistry Center, Emory University, were used to minimize subcloning. Fragments for sequence analysis were also produced and subcloned using a Cyclone Kit from International Biotechnologies, Inc. according to their instructions. Genepro software programs (Riverside Scientific Enterprises) were used to translate nucleotide sequences and for comparison analyses.
BCKD was acylated using [U-14C]α-ketoisocaproate in the presence of N-ethylmaleimide. Peptic fragments were isolated and purified by high voltage electrophoresis and high pressure liquid chromatography. The amino acid sequence of these fragments was determined with an Applied Biosystems 477 pulsed liquid sequencer as previously described (8).

RESULTS
A restriction enzyme cleavage map for the 1.6-kb cDNA is shown in Fig. 1A. Based on the strategy shown in Fig. 1B, the nucleotide sequence for this cDNA was established. Predicted enzyme cleavage sites were confirmed by sequence analysis. The complete nucleic acid sequence and deduced amino acid sequence are shown in Fig. 2. The encoded protein has a molecular weight of 35,759 based on amino acid content in the longest open reading frame.
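The two steps behind the reported figure, finding the longest open reading frame and summing residue masses, can be sketched as follows. This is a minimal illustration, not the Genepro procedure used in the paper: the function names, the rounded average residue masses, and the toy cDNA string are all assumptions made for the example, and only the given strand is searched.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Average residue masses in daltons (free amino acid minus one water), rounded.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added back for the free N and C termini


def longest_orf(dna):
    """Return (0-based start, protein) for the longest ATG-initiated ORF.

    Only the given strand is searched, and a frame counts only if it is
    closed by an in-frame stop codon.
    """
    dna = dna.upper()
    best = (None, "")
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        protein = []
        for pos in range(start, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[pos:pos + 3], "X")
            if aa == "*":
                break
            protein.append(aa)
        else:
            continue  # ran off the end without a stop codon
        if len(protein) > len(best[1]):
            best = (start, "".join(protein))
    return best


def molecular_weight(protein):
    """Sum average residue masses and add one water (standard residues only)."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


# Hypothetical toy insert, NOT the BCKD-E2 sequence: an EcoRI site, a short
# spacer, a four-codon ORF, and a stop codon.
TOY_CDNA = "GAATTCGGCACGATGGCTAAAGGTTAAATTAAA"
orf_start, toy_protein = longest_orf(TOY_CDNA)
toy_mass = molecular_weight(toy_protein)
```

Applied to the real 315-residue translation, the same sum would be expected to land near the reported 35,759, which corresponds to roughly 113.5 Da per residue.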
Using acylated lipoate as a marker, a peptide fragment containing the lipoate binding site was isolated from bovine kidney BCKD. The determined amino acid sequence for this 10-residue peptide is Glu-Val-Gln-Ser-Asp-Lys-Ala-Ser-Val-Thr. As can be seen in Fig. 2, amino acid residues 95-104 of human BCKD-E2 exactly match this bovine amino acid sequence. Furthermore, the amino acid sequence flanking the lipoate binding site in human liver BCKD-E2 was compared to that of several other proteins (Table I). Both the critical lysine and a glycine, 11 residues C-terminal to the lysine, are conserved in all proteins.
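The peptide match and the lysine-to-glycine spacing described above amount to an exact substring search plus a fixed-offset check. The sketch below assumes one-letter sequence strings and 1-based residue numbering; the helper names and the toy protein (with artificial flanking residues) are hypothetical and are not the sequence of Fig. 2.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Bovine kidney peptide as given in the text.
BOVINE_PEPTIDE = "Glu-Val-Gln-Ser-Asp-Lys-Ala-Ser-Val-Thr"


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in peptide.split("-"))


def locate_peptide(protein, peptide):
    """Return 1-based (first, last) residue numbers of the first exact match."""
    index = protein.find(peptide)
    if index < 0:
        return None
    return index + 1, index + len(peptide)


def glycine_after_lysine(protein, lysine_pos, offset=11):
    """True if the residue `offset` positions C-terminal to the lysine is Gly."""
    target = lysine_pos + offset  # 1-based residue number
    return target <= len(protein) and protein[target - 1] == "G"


peptide = to_one_letter(BOVINE_PEPTIDE)  # "EVQSDKASVT"

# Hypothetical toy protein: 4 arbitrary residues, the peptide, then an
# artificial tail that places a glycine 11 residues after the lysine.
TOY_PROTEIN = "MAAL" + peptide + "AAAAAAGAAA"

span = locate_peptide(TOY_PROTEIN, peptide)
lysine_pos = span[0] + peptide.index("K")
conserved_glycine = glycine_after_lysine(TOY_PROTEIN, lysine_pos)
```

With the human sequence of Fig. 2 in place of the toy string, the match would be expected at residues 95-104, which puts the lipoate-bearing lysine at residue 100.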
The first 25 amino acids of human BCKD-E2 contain no acidic residues and 3 arginine residues separated by 3-8 amino acids (Figs. 2 and 3). Such unusual properties are hallmarks of leader sequences which direct nuclear coded proteins to the mitochondria (11). The other acyltransferase proteins we compared with human BCKD-E2 lack this region.
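The two properties cited here, no acidic residues and a few spaced arginines in the first 25 positions, are simple to tabulate. The sketch below is only a composition check under those two criteria, not a validated targeting-sequence predictor; the function name and the toy N-terminal sequence are hypothetical.

```python
def leader_profile(protein, window=25):
    """Summarize acidic and arginine content of the first `window` residues."""
    head = protein[:window]
    acidic = sum(head.count(res) for res in "DE")
    arg_positions = [i + 1 for i, res in enumerate(head) if res == "R"]
    # Number of residues lying between each consecutive pair of arginines.
    gaps = [b - a - 1 for a, b in zip(arg_positions, arg_positions[1:])]
    return {"acidic": acidic, "arginine_positions": arg_positions, "gaps": gaps}


# Hypothetical toy sequence, NOT the BCKD-E2 N terminus: 25 residues with
# three arginines and no Asp/Glu, followed by residues outside the window.
TOY_PROTEIN = "MAAVRMLARSLGPSWVKRLAGNSVT" + "DEK"

profile = leader_profile(TOY_PROTEIN)
```

For the toy input the profile reports zero acidic residues and arginine spacings of 3 and 8, the same pattern the text describes for human BCKD-E2.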

DISCUSSION
Antibodies specific for the BCKD proteins were used to isolate the 1.6-kb cDNA clone from human liver (6). The longest open reading frame within this cDNA begins at the first methionine, located 12 bases after the 5'-EcoRI cloning site, and translates into a 315-amino acid protein of 35,759 daltons (Fig. 2). Within the 3'-untranslated end there are three repeats of the polyadenylation signal hexamer, ATTAAA, followed by a poly(A) tail (12). A 10-amino acid sequence (residues 95-104) within the open reading frame exactly matches the lipoate-lysine fragment of bovine kidney BCKD-E2 and confirms that this cDNA encodes the human liver BCKD-E2. BCKD-E2 isolated from mitochondria migrates in sodium dodecyl sulfate-polyacrylamide gels (SDS-PAGE) with an Mr of 52,000. The disparity between this Mr value and the ~36-kDa size predicted from the amino acid content of the translated cDNA must be explained. Size discrepancies of this type are also reported for both PDH-E2 and OGDH-E2, the acyltransferase proteins from two analogous mitochondrial multienzyme complexes (3). Aberrantly high Mr values in these proteins are attributed to an extended region of helical structure due to proline-alanine-rich domains (13).
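Locating repeats of the ATTAAA hexamer in a 3'-untranslated region is a plain string scan. The sketch below reports 1-based start positions and allows overlapping hits; the function name and the toy 3'-end are hypothetical and do not reproduce the sequence of Fig. 2.

```python
def find_hexamers(utr, hexamer="ATTAAA"):
    """Return 1-based start positions of every occurrence of `hexamer`."""
    utr = utr.upper()
    hits = []
    pos = utr.find(hexamer)
    while pos != -1:
        hits.append(pos + 1)
        pos = utr.find(hexamer, pos + 1)  # step by one to allow overlaps
    return hits


# Hypothetical toy 3'-untranslated end with three signals and a poly(A) tail.
TOY_UTR = "GGCATTAAACTCAGCATTAAAGTCCATTAAATTGCAAAAAAAAAA"

signal_positions = find_hexamers(TOY_UTR)
```

The same call with `hexamer="AATAAA"` would search for the more common form of the signal, should a comparison be wanted.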
Since the three multienzyme complexes (BCKD, PDH, and OGDH) function in an analogous manner with the acyltransferase as the core component (3), we asked if the size difference for human BCKD-E2 could also be explained by those properties of PDH-E2 and OGDH-E2 which lead to disparate molecular weight values. To approach this question, we aligned the structures of these three proteins using the lipoate-lysine as the focal point (Fig. 3). Escherichia coli PDH-E2 contains three lipoate-lysine domains. Studies using recombinant proteins showed that a constructed protein, containing only the third lipoate-lysine domain, conferred full activity to a reconstituted PDH complex (13). Hence this third lipoate-lysine site was used to align with the other proteins. Four structural domains are defined for OGDH-E2 and PDH-E2 (9, 14, 15). The deduced protein structure for human BCKD-E2 is analyzed by comparison to these PDH-E2 and OGDH-E2 proteins. A general conservation of structure is found for all the proteins (Fig. 3).
[Fig. 2. Nucleotide sequence of the cDNA with the deduced amino acid sequence for human BCKD-E2.]
Beginning with the lipoate-lysine region as domain one, our comparison shows similarity in amino acid content among the proteins (Table I). Since the lipoate serves a similar function in all the proteins this would be expected.
A proline-alanine-rich region is the second domain to compare among the proteins. This sequence is postulated to be important in placing the lipoate-lysine into an extended arm configuration (13). It is this extended structure which is thought to contribute to the slow migration of these proteins in SDS-PAGE, resulting in abnormally high Mr values (13, 14). Functional importance for this domain may not depend on the exact number of proline or alanine residues since the predominance of these residues varies among the proteins compared. Human BCKD-E2 has only a few alanine residues while the proline content is high in this domain. The Mr value of BCKD-E2 from SDS-PAGE estimation, therefore, could be aberrantly high as is observed with PDH-E2 and OGDH-E2. The actual role of lipoate and its contribution to migration of BCKD-E2 in SDS-PAGE must be investigated. These studies are being conducted with human BCKD-E2 made in vitro.
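One way to make "proline-alanine-rich" concrete is to scan a fixed-width window along the protein and report where the combined Pro and Ala fraction peaks. This is an illustrative sketch only: the window width, the function names, and the toy protein are assumptions, and the paper does not state that its domains were delimited this way.

```python
def residue_fraction(segment, residues="PA"):
    """Fraction of `segment` made up of the given residues."""
    if not segment:
        return 0.0
    return sum(segment.count(res) for res in residues) / len(segment)


def richest_window(protein, width, residues="PA"):
    """Return (1-based start, fraction) of the window richest in `residues`.

    Ties are resolved in favor of the most N-terminal window.
    """
    best_start, best_fraction = 0, -1.0
    last_start = max(0, len(protein) - width)
    for start in range(last_start + 1):
        fraction = residue_fraction(protein[start:start + width], residues)
        if fraction > best_fraction:
            best_start, best_fraction = start, fraction
    return best_start + 1, best_fraction


# Hypothetical toy protein, NOT BCKD-E2: a proline-rich stretch with few
# alanines embedded between two unrelated segments.
TOY_PROTEIN = "MKTLLVGSEE" + "PAPSPAPKPP" + "GSTVLKDEIL"

start, fraction = richest_window(TOY_PROTEIN, width=10)
proline_only = residue_fraction(TOY_PROTEIN[10:20], residues="P")
```

Reporting proline and alanine separately, as in the last line, mirrors the observation that human BCKD-E2 is high in proline but low in alanine in this domain.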
A third domain is the binding site for the lipoamide dehydrogenase (E3) subunit (3, 10, 15). Human BCKD-E2 and the other proteins show striking conservation of the amino acid sequence in this domain (Fig. 4), and thus we postulate that this is the E3 binding site for the human acyltransferase. Since the E3 subunit is common to all three multienzyme complexes, this similarity is not unexpected. An enrichment for lysine, glutamate, and glutamine flanking this E3 binding site is also maintained for all proteins (Fig. 2 of Ref. 9).
The fourth domain, termed the catalytic domain, is located in the C-terminal portion of the proteins. This domain confers substrate specificity to the protein and may contain the binding site for the E1 subunit of the complex. Specificity of the different acyltransferases for the homologous E1 subunit is high. This specificity was suggested from experiments with isolated subunits from the BCKD and PDH complexes (17). When subunits were mixed to give chimeric complexes, little complex activity was noted (17). This low activity could be due to the substrate specificity of the acyltransferases or result from improper binding between the mixed subunits. Since antibodies specific for BCKD-E2 do not recognize either OGDH-E2 or PDH-E2 (18), the antigenic specificity may also reside in this C-terminal end. Again, additional studies are necessary to confirm or disprove this hypothesis. Since multiple E2 subunits interact in the formation of the active complex (3), binding sites for E2 to itself may also be present in this domain. Direct answers to some of our initial questions regarding human BCKD-E2 are not possible from these studies. However, these data confirm that the acyltransferase proteins are highly conserved in structure and function. These observations provide a basis for further analysis of BCKD complex assembly and subunit interaction. As clones for the other subunits of BCKD become available (19), in vitro synthesis of these proteins can be used to study these processes.
The first 25 amino acids of BCKD-E2 are unique due to the absence of acidic residues and the enrichment in arginine. These characteristics define other known leader sequences which direct nuclear coded proteins to the mitochondria (11). The N-terminal amino acid of mature BCKD-E2 is not known, so the exact length of the leader sequence cannot be determined from these data. (Attempts at end-terminal amino acid sequence analysis suggest N-terminal blockage.) In comparison with the other acyltransferases, only the rat liver PDH-E2 should contain a leader sequence since the other two proteins are of prokaryotic origin. Since data on the cDNA for the rat liver enzyme used in this report are derived from a literature report (10), we do not know if the clone encodes a full-length protein. Experiments are now being done with the in vitro synthesized protein from this human BCKD-E2 cDNA to determine if this 5'-end does indeed encode a leader sequence.
Similarities between the acyltransferases suggest a common ancestor gene for these proteins. The mitochondrial location of these proteins in mammals is consistent with a prokaryote parent gene. BCKD-E2 does not occur normally in E. coli; however, a prokaryote form has been reported in Pseudomonas (20). Until the cDNA and protein sequences are known for these prokaryote BCKD components, a complete comparison and evolutionary prediction would be premature.