The Primary Structure of the Subunits of Carbon Monoxide Dehydrogenase/Acetyl-CoA Synthase from Clostridium thermoaceticum*

CO dehydrogenase/acetyl-coenzyme A synthase (CODH) is the central enzyme in the pathway of acetyl-coenzyme A biosynthesis in Clostridium thermoaceticum. It catalyzes the interconversion of CO and CO2 and the synthesis of acetyl-coenzyme A from the methylated corrinoid/iron sulfur protein, CO, and coenzyme A. It is a nickel-iron-sulfur protein and contains two subunits in the form (alpha beta)3. Reported here is the cloning and sequencing of the genes for both subunits of CODH. The gene for the alpha subunit codes for a protein with 729 amino acids and a molecular weight of 81,730, and the beta gene for a protein with 674 amino acids and a molecular weight of 72,928. The alpha subunit follows the beta subunit by 23 bases and the genes for both subunits are preceded by a sequence which is similar to the Shine-Dalgarno sequence of Escherichia coli. No significant amino acid sequence homology has been found to any known sequence. Labeling CODH with 2,4-dinitrophenylsulfenyl chloride and isolating labeled peptide fragments demonstrated that a tryptophan, residue 418 of the alpha subunit, is protected by coenzyme A and thus may be considered a potential part of the coenzyme A site.

Foundation for a Shaw Scholars Award, the Department of Energy for Grant DE-FG02-88ER13875 (to S. W. R.), and Grant GM24913 from the National Institutes of Health (to H. G. W.). A preliminary report of this work has appeared in the 6th International Symposium on Microbial Growth on C1 Compounds, Abstract P-341. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) M62727.
Present address: Dept. of Biochemistry, University of Nebraska, Lincoln, NE 68583.
Deceased September 12, 1991.
To whom correspondence should be addressed: Dept. of Biochemistry, University of Georgia, Athens, GA 30602. Tel.: 404-542-7640; Fax: 404-542-2222.
autotrophically with CO or CO2 and H2 as the source of carbon and energy. They produce acetate and use the acetate as an anabolic source of carbon, and play a significant role in the carbon cycle (1-3). The acetate is produced by the acetyl-CoA pathway of CO2 fixation. The methyl group of acetyl-CoA is formed from CO2 by a series of reactions involving formate dehydrogenase and tetrahydrofolate-dependent enzymes. It is then transferred to a cobamide bound to a protein designated the corrinoid iron-sulfur protein (C/Fe-SP) (4). The carbonyl group of acetyl-CoA is formed by reduction of CO2 to CO. This is catalyzed by CO dehydrogenase/acetyl-CoA synthase (CODH), which also catalyzes the final step in acetyl-CoA biosynthesis by condensing the methyl, CO, and CoA to yield acetyl-CoA (5). CODH from Clostridium thermoaceticum is a hexamer with a molecular weight of 440,000 consisting of two subunits in the form (αβ)3.
It is one of the few naturally occurring nickel proteins, each αβ dimer containing two nickel, 11-13 iron, 14 inorganic sulfur, and one zinc (6). This central enzyme of the acetyl-CoA pathway, in addition to catalyzing acetyl-CoA synthesis, also catalyzes exchange reactions between acetyl-CoA and CO (5), acetyl-CoA and CoA (7-9), and acetyl-CoA and methyl groups bound to either CODH or C/Fe-SP (10). It must, therefore, have separate binding sites for the methyl, carbonyl, and CoA groups, designated X, Y, and Z sites, respectively. Recent studies indicate the methyl and CO sites may be different coordination sites of a Ni-Fe center (9), and evidence has been presented that tryptophan is at or near the CoA site (11). But nothing is known about the binding of at least four different metal centers (12), including the Ni-Fe center, and other ligands. The elucidation of the binding of the ligands will require sequence information of the subunits of CODH.
In this investigation, the genes of the α and β subunits have been sequenced using clones obtained from a genomic library constructed at the University of Georgia and from one established by Roberts et al. (13), who showed that a gene cluster contained the genes for the α and β subunits of CODH, methyltransferase, and the corrinoid (C/Fe-SP) enzymes.
In addition, the primary structure derived from the DNA has been confirmed by sequencing peptides derived from the α and β subunits, one of which may be a component of the CoA site of CODH. Thus far, combination of the expressed α and β subunits has not led to formation of an active CODH, perhaps because the cloned subunits do not contain competent Ni-Fe-sulfur or other Fe-sulfur centers.
The abbreviations used are: C/Fe-SP, corrinoid iron-sulfur protein; CODH, CO dehydrogenase/acetyl-CoA synthase; LB, Luria-Bertani; TY, tryptone-yeast extract; SDS, sodium dodecyl sulfate; PAGE, polyacrylamide gel electrophoresis; DNPS-Cl, 2,4-dinitrophenylsulfenyl chloride; TFA, trifluoroacetic acid; kb, kilobase(s); HPLC, high performance liquid chromatography.

RESULTS AND DISCUSSION
Cloning and Mapping the CODH Genes-C. thermoaceticum genomic DNA was digested with EcoRI and hybridized to an oligonucleotide probe for the β subunit of CODH. A single 3.7-kb band was observed (data not shown). A partial library using EcoRI fragments ranging in size from 2.3 to 4.3 kb was constructed in pBR322 and screened using the CODH β subunit probe. All selected positives contained a 3.7-kb insert which was mapped using three restriction enzymes (see Fig. 1, miniprint).
To test for the presence of multiple copies of the genes, genomic DNA was digested with two restriction enzymes and probed with the EcoRI fragment containing the CODH genes. Only bands with molecular weights expected for the cloned fragment were observed, consistent with the presence of only a single copy of the CODH genes in the C. thermoaceticum genome.
DNA Sequencing and Deduced Amino Acid Sequence-For sequencing, the 3.7-kb EcoRI fragment was cloned into M13mp18 in both directions, and subclones were generated by single-stranded deletions. Fig. 1 (miniprint) shows the sizes and orientations of the subclones which were sequenced. Because of the difficulty in accurately sizing the single-stranded subclones, there was considerable duplication in some regions. If the sequence from a subclone was entirely contained within another subclone, it was not included in the figure. The EcoRI fragment contained the complete gene of the β subunit, but only 60% of the α subunit. The remainder of the sequence, from 3680 to 4592, where the gene for the C/Fe-SP begins, was obtained from a 10-kb Sau3A insert in pUC9 (pCt946A, Ref. 13). The subclones in this region, also shown in Fig. 1 (miniprint), were generated by double-stranded deletions and sequenced. Gaps not covered by deletions were filled using sequence-specific primers.
The DNA sequence from the start of the EcoRI fragment to the first codon of the 55-kDa subunit of the C/Fe-SP is shown in Fig. 2, along with the derived amino acid sequence. As indicated in Fig. 1 obtained after the enzyme had been reacted with DNPS-Cl to identify peptides containing tryptophan residues were sequenced (heavy underlining of Fig. 2 which are numbered as in Fig. 4 of miniprint). In addition, the amino acid compositions for the two subunits predicted by the DNA sequence were found to be in close accord with those determined by acid hydrolysis of the protein and of the isolated subunits (data not shown).
The codon usage of the two CODH genes was used to calculate the coding probability of each of the three reading frames of the sequence by the method of Staden and . In support of the accuracy of the DNA sequence, the highest coding probability was found in the open reading frames used to deduce the protein sequence.
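The idea of scoring reading frames by codon usage can be illustrated with a minimal sketch. This is a simplified calculation in the spirit of codon-usage-based coding prediction and is not a reproduction of the procedure used in this work; the function names, the pseudocount, and the toy sequences are all hypothetical. For a window of DNA, the product of codon frequencies is computed in each of the three frames, and each product is divided by the sum of the three.

```python
import math
from itertools import product


def codon_frequencies(coding_seqs, pseudocount=1.0):
    """Relative codon frequencies from in-frame coding sequences.

    A pseudocount (an assumption of this sketch) keeps unseen codons
    from receiving a frequency of zero.
    """
    counts = {"".join(c): pseudocount for c in product("ACGT", repeat=3)}
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in counts:
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def frame_probabilities(window, freqs):
    """Return the relative coding probability of frames 0, 1, and 2.

    The same number of codons is read in every frame so that the three
    products are comparable. Log space avoids numerical underflow.
    """
    window = window.upper()
    n_codons = (len(window) - 2) // 3
    if n_codons < 1:
        raise ValueError("window must be at least 5 bases long")
    log_scores = []
    for frame in range(3):
        score = 0.0
        for j in range(n_codons):
            codon = window[frame + 3 * j: frame + 3 * j + 3]
            # Codons with ambiguous bases fall back to a uniform frequency.
            score += math.log(freqs.get(codon, 1.0 / 64))
        log_scores.append(score)
    best = max(log_scores)
    weights = [math.exp(s - best) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example (not real CODH sequence): usage table trained on GCT repeats.
toy_freqs = codon_frequencies(["GCT" * 50])
toy_probs = frame_probabilities("A" + "GCT" * 5 + "G", toy_freqs)
print(toy_probs)
```

In the toy window the GCT codons start at the second base, so frame 1 receives nearly all of the probability. With real data the usage table would be built from the genes of interest and the window slid along the sequence.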
The sequence AGGAGG (underlined in Fig. 2) was found 8 bases in front of the genes for both subunits of CODH. This is similar to the consensus E. coli ribosome binding site (15) and presumably has the same function in C. thermoaceticum. Also shown in Fig. 2 is a similar sequence (AGGAGT) preceding the gene for the 55-kDa subunit of the C/Fe-SP.
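A search for such ribosome-binding-site-like sequences upstream of a start codon can be sketched as follows. The function name, the default parameters, and the toy sequences are hypothetical, and "spacing" is defined here as the number of bases between the end of the motif and the first base of the start codon, which is an assumption about how the distance quoted above is counted.

```python
def find_rbs(dna, start_index, motif="AGGAGG", max_spacing=15, max_mismatches=1):
    """Find motif-like sites upstream of a start codon.

    dna          -- DNA string (coding strand)
    start_index  -- 0-based index of the first base of the start codon
    Returns a list of (position, spacing, site, mismatches) tuples, where
    position is the 0-based start of the site.
    """
    dna = dna.upper()
    k = len(motif)
    hits = []
    for spacing in range(max_spacing + 1):
        pos = start_index - spacing - k
        if pos < 0:
            break  # ran off the 5' end of the sequence
        site = dna[pos:pos + k]
        mismatches = sum(a != b for a, b in zip(site, motif))
        if mismatches <= max_mismatches:
            hits.append((pos, spacing, site, mismatches))
    return hits


# Toy sequences (not taken from Fig. 2): a perfect site and a one-mismatch site.
toy_perfect = "TTTT" + "AGGAGG" + "ACTAACTA" + "ATGAAA"
toy_variant = "AGGAGT" + "CCCCCCCC" + "ATG"
print(find_rbs(toy_perfect, 18))
print(find_rbs(toy_variant, 14))
```

Allowing one mismatch lets the same search pick up a variant such as AGGAGT as well as the exact AGGAGG motif.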
Protein Sequence Analysis-The sequences of the two subunits of CODH have been compared to a translation of GenBank (release 61.0) using the sequence comparison program FASTDB (Intelligenetics Inc.). No significant homology was found to any known sequence by this method. Recently, Eggen et al. (16) have determined the amino acid sequence of a CODH from Methanothrix soehngenii which consists of two large subunits of 79.4 kDa and two small subunits of 19 kDa. A comparison of the sequences reveals that no extensive regions of homology exist between the enzymes. However, residues 495-500, VVATGC, and 548-551, GSCV, of the β subunit of the C. thermoaceticum enzyme are identical with residues 545-550 and 583-586, respectively, of the large subunit of the M. soehngenii enzyme.
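Short stretches of identity such as these can be located by a simple exact-match scan, sketched here. This is not the FASTDB algorithm; the function name and the toy sequences are hypothetical, and only the motif VVATGC is taken from the text.

```python
def shared_kmers(seq_a, seq_b, k):
    """Return (kmer, pos_a, pos_b) for every exact k-residue match.

    Positions are 1-based residue numbers of the first residue of the match.
    """
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j + 1)
    hits = []
    for i in range(len(seq_a) - k + 1):
        kmer = seq_a[i:i + k]
        for j in index.get(kmer, []):
            hits.append((kmer, i + 1, j))
    return hits


# Toy sequences (not the real subunits) carrying the VVATGC motif.
print(shared_kmers("MKVVATGCLP", "QVVATGCAA", 6))
```

An exact-match scan of this kind finds only identical runs; a database search program additionally scores conservative substitutions and gaps.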
Spectrometric studies of CODH from C. thermoaceticum (12, 17-21) have demonstrated the presence of at least two [Fe4S4]-like clusters, one of which apparently interacts with nickel in the presence of CO. In addition, the enzyme may have a 2Fe complex and a FeS4 species. A total of 31 cysteines are present in the CODH, but arrangements of these as found in iron-sulfur proteins, including ferredoxins, are not evident in the primary structure (22). However, the α and β subunits each contain two cysteine-rich regions that include residues 506, 509, 518, 528, and 583, 595, 597, 608 of the α subunit and residues 59, 67, 68, 71, 76, 90, and 316, 317, 342, 350, 355, 366 of the β subunit. These are potential sites for binding metal clusters. Of interest are the two pairs of vicinal cysteines (residues 67, 68 and 316, 317 of the β subunit). Poston et al. (23) noticed that acetate synthesis in cell-free extracts of C. thermoaceticum from methyl-B12 and pyruvate is inhibited by arsenite and cadmium chloride, reagents which react with vicinal sulfhydryls. Locating the amino acid residues responsible for ligating the Fe-S centers will require further study.
Evidence has been presented that the CoA binding site is near the Ni-Fe-C center formed when CO reacts with CODH (17, 24) and involves tryptophan and arginine residues (11, 24, 25). To identify this site, CODH was labeled with DNPS-Cl, a reagent specific for tryptophan residues. In a time course study of the reaction of CODH with DNPS-Cl, it was found that 8.2 mol of DNPS-Cl were incorporated per mol of dimer during a 6-h period. The exchange activity of CO with the carbonyl group of acetyl-CoA was completely abolished (Fig. 3, miniprint) and no further incorporation of DNPS-Cl was observed up to a period of 24 h. The DNPS-Cl modification of CODH was also carried out in the presence of 100 µM CoA (Fig. 3, miniprint).
Under this condition, only 6.8 mol of DNPS-Cl was incorporated per mol of dimer after 6 h without loss of enzymatic activity. This experiment suggests that at least one tryptophan is protected by CoA and may be at the CoA site. The kinetics of the loss of activity upon reaction with DNPS-Cl appears to be zero order (Fig. 3, miniprint), whereas normally it might be expected to be first order. This indicates a more complicated reaction pattern than just a reaction with a specific tryptophan and may involve a conformational change. It should be noted that in reacting CODH with N-bromosuccinimide, another tryptophan reagent, the reaction was exponential and that the loss of activity was prevented by CoA (11).
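The distinction between zero-order and first-order loss of activity can be made concrete with a small sketch: zero-order decay is linear in time, whereas first-order decay is linear in the logarithm of activity. The code compares the two straight-line fits. The function names are hypothetical and the numbers are synthetic values generated for illustration, not measurements from Fig. 3.

```python
import math


def linear_fit_r2(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def compare_orders(times, activity):
    """Compare zero-order (activity vs t) and first-order (ln activity vs t) fits.

    All activity values must be positive for the logarithm to be defined.
    """
    zero = linear_fit_r2(times, activity)
    first = linear_fit_r2(times, [math.log(a) for a in activity])
    return {"zero_order_r2": zero[2], "first_order_r2": first[2]}


# Synthetic time course (hours, percent activity) that is exactly linear.
toy_times = [0, 1, 2, 3, 4, 5]
toy_activity = [100 - 15 * t for t in toy_times]
print(compare_orders(toy_times, toy_activity))
```

For the synthetic linear data the zero-order fit is essentially perfect and the log-linear fit is poorer; an exponential time course would give the opposite ranking.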
The CODH modified with DNPS-Cl was digested with trypsin and the resulting peptides separated using HPLC. The elution profile of the peptides is shown in Fig. 4 (miniprint). Seven major peptides, along with a few minor peptides, were labeled by DNPS-Cl. The major peptides, together accounting for more than 60% of DNPS-Cl incorporated into CODH, were repurified using reversed-phase HPLC.
The sequences of all of the peptides that were labeled with DNPS-Cl in the experiment of Fig. 4A (peptides 1-7) were determined. The N termini of the peptides of Fig. 4B were also determined. All N termini of the peptides of Fig. 4B corresponded to those of the tryptophan-labeled peptides of Fig. 4A, except that peptide 4, which has an IHDFI amino terminus, was missing. These facts show that peptide 4 of Fig. 4A was protected by CoA and thus did not become labeled with DNPS-Cl in the experiment of Fig. 4B. This peptide, identified with residues 407-423 in the α subunit of CODH, contains the tryptophan residue 418, which apparently is protected by CoA and may be a portion of the CoA binding site. Since the enzyme is not expressed in the active form, mutation of Trp-418 could not be carried out in the present study to determine if this mutation causes inactivation of the enzyme. Other portions of the enzyme that do not contain tryptophan may also be involved in the CoA site.
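Matching labeled tryptic peptides back to a deduced protein sequence amounts to an in silico digest followed by a search for tryptophan-containing fragments. A minimal sketch follows, assuming the common rule that trypsin cleaves after lysine or arginine unless the next residue is proline; the function names and the toy sequence are hypothetical and do not represent CODH.

```python
def tryptic_peptides(protein):
    """Yield (start, end, peptide) with 1-based inclusive residue numbers.

    Cleavage is assumed after K or R, except when the next residue is P.
    """
    protein = protein.upper()
    start = 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            yield start + 1, i + 1, protein[start:i + 1]
            start = i + 1


def tryptophan_peptides(protein):
    """Return tryptic peptides containing Trp, with the Trp residue numbers."""
    result = []
    for start, end, peptide in tryptic_peptides(protein):
        trp_positions = [start + j for j, aa in enumerate(peptide) if aa == "W"]
        if trp_positions:
            result.append((start, end, peptide, trp_positions))
    return result


# Toy sequence for illustration only.
for entry in tryptophan_peptides("MAKWLRPGWEKGSDFVAWR"):
    print(entry)
```

Each entry gives the residue range of the peptide and the position of every tryptophan within the full-length protein, which is the numbering used when a labeled peptide is assigned to a site such as Trp-418.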
All of the tryptic fragments were found in the CODH sequence with the exception of peptide 6, which did not correspond to the sequence in any of three reading frames of the CODH genes. This peptide was compared to the Protein Identification Resource (PIR) protein sequence data bank and was found to be very similar to a fragment of the elongation factor Tu from E. coli (26).
C. thermoaceticum was grown at 55 °C on the glucose medium of Ljungdahl and Andreesen (31).
E. coli strain HB101 was grown on LB media (32). For selection of transformants, 15 mg/l tetracycline was included in agarose plates and broth. E. coli strain TG1 was used as a host for the sequencing vector M13mp18. It was maintained on minimal medium and grown in 2 x TY medium (Amersham M13 Cloning and Sequencing Handbook, Amersham Corp., Arlington Heights, IL).
CODH from Clostridium thermoaceticum cells was prepared and assayed as described by Ragsdale and Wood (5).
The two subunits of CODH were separated using a modified version of the preparative electrophoresis procedure of Padjak (33). The N-terminal sequence of each subunit was independently determined at the Department of Genetics, University of Georgia and at the Protein and Nucleic Acid Sequencing Facility, Medical College of Wisconsin, using Applied Biosystems 470A gas phase sequencers equipped with on-line 120A PTH analyzers. Tryptic fragments of CODH were prepared, separated, and sequenced as described earlier (13). An average yield of about 89% was obtained during the sequencing. The amino acid composition of each subunit was determined on a Beckman model 199CL amino acid analyzer after hydrolysis in 6 N HCl for 24 hours. The values for threonine and serine were corrected for destruction and valine and isoleucine for incomplete release (34). Cysteine was determined after performic acid oxidation (35). Tryptophan was not measured.