Molecular Cloning and Sequencing of the Gene for Mycocerosic Acid Synthase, a Novel Fatty Acid Elongating Multifunctional Enzyme, from Mycobacterium tuberculosis var. bovis Bacillus Calmette-Guerin*

Mycocerosyl lipids are found uniquely in the cell walls of pathogenic mycobacteria. Mycocerosic acid synthase (MAS) is a multifunctional protein which catalyzes elongation of n-fatty acyl-CoA with methylmalonyl-CoA as the elongating agent (Rainwater, D. L., and Kolattukudy, P. E. (1985) J. Biol. Chem. 260, 616-623). To understand how the various domains that catalyze the reactions involved in chain elongation are organized, the mas gene from Mycobacterium tuberculosis bovis BCG was cloned. A λgt11 library of AluI partially digested genomic DNA from the organism was screened with an oligonucleotide probe designed from the N-terminal amino acid sequence of purified MAS. Using terminal segments of inserts from positive clones as the probe, the library was rescreened and the process was repeated. Sequencing of four overlapping clones revealed a contiguous sequence of 9699 base pairs (bp) of mycobacterial genome containing a 6330-bp open reading frame that could code for a protein of 2100 amino acids with a molecular mass of 225,437 daltons. The authenticity of the open reading frame as that of MAS was verified by correspondence of the amino acid sequences deduced from the gene with the directly determined amino acid sequences of the N terminus and three different internal peptide fragments. By comparing the MAS amino acid sequence with the sequences in the active site regions of known fatty acid synthases and polyketide synthases, the functional domains in MAS were identified. This analysis showed that the domains were organized in the following order: β-ketoacyl synthase, acyl transferase, dehydratase-enoyl reductase, β-ketoreductase, acyl carrier protein; no thioesterase-like domain could be found.
These results establish MAS as the first case of an elongating multifunctional enzyme composed of two identical subunits that resemble the vertebrate fatty acid synthase in size, subunit structure, and linear organization of functional domains. Southern and Western blot analyses showed absence of the mas gene and the encoded protein in Mycobacterium smegmatis and Escherichia coli. This result is consistent with the report that mycocerosic acid is present only in pathogenic mycobacteria.

* This work was supported by Grant GM-18278 from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Nucleotide sequence accession number: M95808.

Multimethyl-branched fatty acids occur in a variety of biological systems. The aglycone portion of macrocyclic lactone antibiotics such as erythromycin generated by Streptomyces (1) and multimethyl-branched fatty acids found in sebaceous glands (2, 3) are examples of such natural products. The enzymology and regulation of their biosynthesis are not well understood except in the case of the goose uropygial gland, which generates 2,4,6,8-tetramethyldecanoic acid (4). In this system, an acyl-CoA carboxylase generates malonyl-CoA and methylmalonyl-CoA (5), but the former is converted to acetyl-CoA by a cytoplasmic decarboxylase present only in the gland (4, 6). This ensures that methylmalonyl-CoA is the only substrate available for the fatty acid synthase, which is capable of synthesizing multimethyl-branched fatty acids (7).
Mycobacterium tuberculosis currently afflicts 30 million people, causing 3 million deaths per year (8, 9), and Mycobacterium leprae causes leprosy that afflicts 10-12 million people (10, 11). Even though tuberculosis was thought to have been nearly eradicated from the United States, recently there has been a resurgence of tuberculosis, and an increasing incidence of tuberculosis among patients with acquired immunodeficiency syndrome (AIDS) has been noted (12). These pathogenic mycobacteria contain a variety of unique fatty acids which have one to six methyl branches at even-numbered positions at the carboxyl end and a long n-aliphatic chain (13, 14). One such group of acids, called mycocerosic acids, is found exclusively esterified to phenolphthiocerol (15) in the cell wall of only pathogenic mycobacteria (16). A cell-free system which catalyzes the synthesis of mycocerosic acids was obtained from M. tuberculosis bovis BCG (17). Mycocerosic acid synthase (MAS), a multifunctional enzyme which catalyzes the elongation of n-fatty acyl-CoA specifically using methylmalonyl-CoA (but not malonyl-CoA) to produce primarily the corresponding tetramethyl-branched mycocerosic acids, was purified and characterized (18). It was found to be a dimer of 238-kDa protomers, each with an acyl carrier protein-like segment. Thus, this enzyme, which elongates n-fatty acyl moieties with methylmalonyl-CoA, is a multifunctional enzyme similar to vertebrate fatty acid synthase in size and subunit structure (19). To achieve elongation the dimeric enzyme must catalyze transacylation steps, condensation, keto reduction, dehydration, and enoyl reduction. How the domains that catalyze the various steps involved in this unique elongation system are arranged is not known.
In the present paper, we describe the molecular cloning and sequence of the mycocerosic acid synthase gene (mas) and present evidence that the domains are organized in this multifunctional enzyme in the following order: β-ketoacyl synthase, acyl transferase, dehydratase, enoyl reductase, β-ketoreductase, acyl carrier protein. This is the first case of structure elucidation of a fatty acid chain elongating enzyme.
Amino Acid Sequence Analysis-Mycocerosic acid synthase was purified as described before (18). The N-terminal sequencing was done in an Applied Biosystems Model 473 protein sequencer. The purified MAS was cleaved with Staphylococcus aureus V8 protease (20). The resulting peptides were loaded on a sodium dodecyl sulfate-polyacrylamide electrophoresis gel with a 3% stacking and a 4-15% resolving gel. After electrophoresis the peptides were electroblotted onto polyvinylidene difluoride membrane (Immobilon-P, Millipore), stained with Coomassie Blue, destained, rinsed with water, and air-dried (21). Membrane segments representing three internal peptides were cut out, and the peptides were sequenced in an Applied Biosystems sequencer.
Western Blots-Crude extracts from M. tuberculosis bovis BCG, M. smegmatis, and E. coli were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis on a 3% stacking and 4% resolving gel. The proteins were blotted to a polyvinylidene difluoride membrane. Western blotting was done as previously described (22) using polyclonal antibody raised against mycocerosic acid synthase in rabbit and ¹²⁵I-protein A.
Preparation of Genomic DNA Library-TICE BCG vaccine of M. tuberculosis bovis was obtained and cultivated as previously described (17). Genomic DNA from bacteria was isolated as described before (23). A size-selected AluI partial library was made in λgt11. The genomic DNA was partially digested with AluI. DNA fragments 2-7 kb in length were isolated and ligated to EcoRI linker-adapter (according to the manufacturer's instructions, Invitrogen), and the products were ligated with λgt11 arms and packaged using an in vitro Gigapack II Plus packaging extract (as described by the manufacturer, Stratagene). The titer of the unamplified AluI λgt11 genomic DNA library was 2 × 10⁶ recombinant phages; the library was then amplified in strain Y1090.
DNA Sequencing-The EcoRI-digested lambda clones from λMAS1, λMAS2, λMAS3, and λMAS4 were subcloned into M13 mp18. Single-stranded DNA was sequentially deleted with T4 DNA polymerase, generating a series of overlapping clones (24). Using the universal M13 sequencing primer as well as various additional, specifically synthesized oligonucleotide primers, the M13 mp18 clones were sequenced by the Sanger dideoxy chain termination method (27) using [α-³⁵S]dATP as label.
Southern Blot Analysis-M. tuberculosis bovis BCG genomic DNA (10 μg) was digested with various restriction enzymes, separated on 0.8% agarose gel, and transferred to Nytran (modified Nylon-66 membrane, Schleicher & Schuell) as described before (30). The filters were baked at 80 °C for 2 h. Prehybridization, hybridization, and washing were carried out as described above. A 2543-bp EcoRI fragment from λMAS2, a 1918-bp EcoRI fragment from λMAS3, and a 1297-bp EcoRI fragment from λMAS3 were used as probes for hybridization. Southern blot analysis was also done on EcoRI-digested genomic DNA from M. tuberculosis bovis BCG, M. smegmatis, and E. coli as described above.
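The restriction digests used for the Southern blots have a simple in-silico counterpart: given a determined sequence, the expected fragment sizes for each enzyme can be listed and compared with the observed bands. The following Python sketch is only an illustration of that comparison and is not part of the original procedure. The recognition sequences are the standard ones for the enzymes named in the text; the function name `fragment_lengths` and the toy sequence are hypothetical, and each site is treated as cut at its first base, so sizes are approximate to within a few bases.

```python
import re

# Standard recognition sequences for enzymes named in the text.
# HincII recognizes GTYRAC, where Y = C or T and R = A or G.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
    "HincII": "GT[CT][AG]AC",
}


def fragment_lengths(seq, enzyme):
    """Return fragment lengths (largest first) for a linear sequence.

    Simplification (assumption): every site is cut at its first base,
    so lengths can differ from true fragment sizes by a few bases.
    """
    pattern = SITES[enzyme]
    # A lookahead reports every site, including overlapping occurrences.
    cuts = [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    lengths = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(lengths, reverse=True)


# Hypothetical toy sequence for illustration; NOT taken from the mas gene.
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GGATCC" + "T" * 5 + "GAATTC" + "G" * 9

for name in ("EcoRI", "BamHI", "HincII"):
    print(name, fragment_lengths(toy, name))
```

Applied to a full determined sequence, the predicted sizes for each enzyme could be set beside the bands detected by each probe.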

RESULTS
Isolation and Sequencing of Genomic Clones-To understand the organization of the catalytic domains of the multifunctional enzyme mycocerosic acid synthase, we cloned and sequenced the gene for this enzyme from M. tuberculosis bovis BCG. To prepare oligonucleotide probes for detecting the mas gene, the purified enzyme was subjected to N-terminal sequencing. Unlike multifunctional fatty acid synthases such as those from vertebrates (19) and M. tuberculosis bovis BCG (31), in which the N terminus is blocked, the N terminus of MAS is not blocked and consequently yielded the sequence of 20 amino acid residues: H2N-MESRVTPVAVIGMGCRLPGG-COOH. Based on this sequence an oligonucleotide corresponding to Ala-9 through Gly-14 was synthesized. When a λgt11 size-selected (2-7 kb) AluI partial genomic library of M. tuberculosis bovis BCG was screened with the labeled oligonucleotide probe, two positive clones, λMAS1 and λMAS2, were obtained. EcoRI digestion of λMAS1 and λMAS2 each gave one EcoRI insert, of 3.2 and 2.5 kb, respectively. These EcoRI fragments were subcloned in M13 mp18 and sequenced. Upon sequencing the two clones were found to overlap by 1988 bp, and the sequence of λMAS2 gave a 3′-extension of 555 bp; thus a total sequence of 3758 bp was obtained from λMAS1 and λMAS2. A 724-bp HindIII-EcoRI fragment from the 3′-end of λMAS2 was used as a probe (probe B, Fig. 1) to screen the genomic library. A positive clone, λMAS3, which contained a 4-kb insert, was obtained; it gave four EcoRI fragments. These EcoRI fragments were subcloned in M13 mp18 and sequenced. The four EcoRI fragments yielded sequences of 1918, 1297, 499, and 381 bp. A total sequence of 4077 bp was obtained from λMAS3, which was found to overlap with λMAS2 by 499 bp and gave an extension of 3578 bp. A 1297-bp EcoRI fragment from the 3′-end of λMAS3 (probe D, Fig. 1) was then used as a probe to screen the genomic library.
A positive clone, λMAS4, containing a 2.8-kb insert was obtained, which gave two EcoRI fragments. These two EcoRI fragments gave sequences of 1238 and 1621 bp, yielding a total sequence of 2853 bp from λMAS4 with an overlap of 490 bp with λMAS3 and a 3′-extension of 2363 bp. Both strands of all four genomic fragments were sequenced. The nucleotide sequence of these four overlapping clones reveals a contiguous sequence of 9699 bp of the mycobacterial genome. The complete DNA sequence and deduced amino acid sequences are shown in Fig. 2.
Southern and Western Blot Analyses-Genomic DNA prepared from M. tuberculosis bovis BCG was digested with various restriction enzymes and hybridized with three different probes: a 2543-bp EcoRI fragment (probe A), a 1918-bp EcoRI fragment (probe C), and a 1297-bp EcoRI fragment (probe D), as shown in Fig. 1. Probe A hybridized to a single band (4.4 kb) in the genomic PstI digest, three bands (3.9, 2.8, and 0.45 kb) in the BamHI digest, and two bands (2.3 and 1.5 kb) in the HincII digest (Fig. 4). Probe C hybridized to a single band (3.9 kb) in the genomic BamHI digest, a single band (1.9 kb) in the genomic EcoRI digest, and five bands (2, 1.5, 0.8, 0.4, and 0.2 kb) in the genomic HincII digest. Probe D hybridized to a single band of 3.4, 2.0, and 1.4 kb in the SmaI, EcoRI, and BamHI genomic digests, respectively. The results showed perfect agreement between the restriction bands in the genomic Southern blot and the sequenced gene, thereby showing that the entire mas gene was cloned without rearrangement. The Southern analysis also strongly suggests that there is a single copy of the mas gene.
[...] M. smegmatis is known to be quite similar to that in M. tuberculosis bovis (31), and if M. smegmatis had a MAS it probably would have been immunologically similar to MAS from M. tuberculosis bovis. Thus, the results presented here are consistent with the report that mycocerosic acids, unique products of pathogenic mycobacteria, are not found in E. coli and M. smegmatis (16).
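The clone-walking arithmetic reported under Isolation and Sequencing of Genomic Clones can be checked by summing the new sequence each clone contributes (its length minus its overlap with the sequence already assembled). The Python sketch below does only that bookkeeping. The overlaps and the lengths for the second, third, and fourth clones are the values given in the text; the 3203-bp length used for the first clone is an inference (3758 bp total from the first two clones minus the 555-bp extension), since the text gives that insert only as 3.2 kb. The names `CLONES` and `assemble_length` are illustrative.

```python
# (clone, sequenced length in bp, overlap with the sequence assembled so far in bp)
CLONES = [
    ("lambda-MAS1", 3203, 0),     # length inferred: 3758 - 555 (assumption)
    ("lambda-MAS2", 2543, 1988),  # values from the text
    ("lambda-MAS3", 4077, 499),
    ("lambda-MAS4", 2853, 490),
]


def assemble_length(clones):
    """Return (contig length, per-clone extension) for a linear clone walk."""
    total = 0
    extensions = {}
    for name, length, overlap in clones:
        if overlap > length:
            raise ValueError(f"{name}: overlap exceeds clone length")
        # Only the bases beyond the overlap add new sequence to the contig.
        extensions[name] = length - overlap
        total += extensions[name]
    return total, extensions


total, extensions = assemble_length(CLONES)
print(extensions)
print("contig length:", total)
```

Under these inputs the per-clone extensions come out as 555, 3578, and 2363 bp for the second through fourth clones, and the total matches the 9699-bp contiguous sequence reported.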

[Fig. 2. Complete DNA sequence and deduced amino acid sequence.]
Identification of Catalytic Domains in MAS-MAS is a unique fatty acid-elongating enzyme which uses methylmalonyl-CoA for elongation of n-fatty acids to produce multimethyl-branched very long chain fatty acids. The various functional domains required for synthesis of mycocerosic acid are: acyl transferase, β-ketoacyl synthase, β-ketoreductase, dehydratase, enoyl reductase, and acyl carrier protein. Such reactions are catalyzed by the multifunctional fatty acid synthases, and the amino acid sequence motifs surrounding the substrate binding sites for the various domains of such synthases have been identified (33-40). Recent studies revealed that the polyketide synthase from Streptomyces erythraea, eryA, also contains domains that can be identified by the presence of amino acid sequence motifs similar to those found in the corresponding domains of fatty acid synthases (34, 41). Comparison of the amino acid sequence of MAS with the known amino acid sequence motifs of the domains of fatty acid synthases of a number of eucaryotes and procaryotes (19, 42) and of eryA helped to identify the domains in MAS.
Comparison of the ketoacyl synthase active site sequence (Table I) of MAS with that of eryA shows 93% identity, whereas the corresponding regions in FAS from chicken, goose, rat, S. cerevisiae, and E. coli showed only 71, 71, 64, 29, and 21% identity, respectively. The highly conserved sequence GPXXXXXXTACSS (43) around the active cysteine residue of the ketoacyl synthase domain of FAS and polyketide synthase that participates in thioester formation can be detected in MAS (Gly-168 to Ser-179). Table II shows that the amino acid sequence surrounding the essential serine of acyl transferase shows maximum identity with S. erythraea eryA (80%), followed by rat FAS (45%) and chicken FAS (40%). The highly conserved GHSXG motif (41, 44) of the acyl transferase domain, where S is the serine involved in the formation of the acyl-enzyme intermediate, is present in MAS (Gly-621 to Gly-625). Table III compares the amino acid sequences around the NADPH binding site of enoyl reductase and shows that 89, 83, and 67% of the amino acids of MAS are identical to the corresponding regions of fatty acid synthase of rat, chicken, and S. erythraea eryA, respectively. Comparison of amino acids around the NADPH binding site of ketoreductase shows that 81 and 56% of the amino acids of MAS are identical to the corresponding regions of vertebrate FAS and polyketide synthase from S. erythraea, respectively (Table IV). Common to the NADPH binding domains of many enzymes is a β-α-β fold, centered around a highly conserved sequence Gly-X-Gly-X-X-Gly (where X is any amino acid) that constitutes a tight turn at the end of the first strand of a β-sheet and marks the beginning of the succeeding α-helix (45). This fingerprint region GXXGXXXAXXXA of NADPH-dependent reductases found in the enoyl reductase and ketoreductase domains can be allocated to two positions in the MAS sequence: Gly-1568 to Ala-1578 and Gly-1773 to Ala-1784.
Table V shows that 54% of the amino acid sequence around the pantetheine-binding serine of the acyl carrier protein of MAS is identical to the corresponding regions of FAS of chicken, rat, and goose; 46, 38, and 23% identity were found for the corresponding regions of module IV in ORF 2 of S. erythraea eryA, FAS of E. coli, and FAS of S. cerevisiae, respectively. The pantetheine-binding serine present in acyl carrier proteins in the GLDSLXXXE motif (46) is found at position Gly-2056 to Glu-2064. A thioesterase-like domain was not found in the MAS open reading frame. Adjacent to the MAS coding region, three open reading frames were found. At the 5′-end two overlapping open reading frames, 1 and 3, were found, and at the 3′-end open reading frame 2 was found (Fig. 2).
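The two operations behind the domain assignments above, locating a consensus motif in a protein sequence and scoring percent identity between aligned active-site windows, can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for Tables I-V: the motif strings are the consensus patterns quoted in the text with X written as a wildcard, while the function names and the toy sequences are hypothetical and are not MAS sequence.

```python
import re

# Consensus motifs quoted in the text; "." stands for X (any residue).
MOTIFS = {
    "ketoacyl synthase": "GP.{6}TACSS",           # GPXXXXXXTACSS
    "acyl transferase": "GHS.G",                  # GHSXG
    "NADPH binding (reductases)": "G..G...A...A",  # GXXGXXXAXXXA
    "acyl carrier protein": "GLDSL...E",          # GLDSLXXXE
}


def find_motifs(protein):
    """Return (domain, first residue, last residue) hits, 1-based, in order."""
    hits = []
    for domain, pattern in MOTIFS.items():
        # Lookahead with a capture group reports overlapping matches too.
        for m in re.finditer(f"(?=({pattern}))", protein):
            start = m.start() + 1
            end = m.start() + len(m.group(1))
            hits.append((domain, start, end))
    return sorted(hits, key=lambda h: h[1])


def percent_identity(a, b):
    """Percent of identical positions between two equal-length aligned windows."""
    if not a or len(a) != len(b):
        raise ValueError("windows must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical toy protein for illustration; NOT the MAS sequence.
toy_protein = "MK" + "GPAAAAAATACSS" + "LL" + "GHSQG" + "VV" + "GLDSLMAVE"

for hit in find_motifs(toy_protein):
    print(hit)
print(percent_identity("GHSLG", "GHSQG"))
```

In the toy protein the hits appear in the order ketoacyl synthase, acyl transferase, acyl carrier protein, which is how a linear domain order would be read off a real sequence.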
The mycobacterial genes involved in fatty acid synthesis seem to have a high degree of resemblance to the gene coding for the enzyme that produces macrocyclic lactone in S. erythraea (Fig. 6). The eryA gene that codes for the enzyme that synthesizes erythronolide B contains three open reading frames each coding for a fused dimeric FAS-like protein (34, 44,48), and the mas gene shows a high degree of similarity to segments of eryA gene. We have recently purified FAS from M. tuberculosis bovis BCG, and this protein is composed of 500-kDa monomers which probably represent two fused monomers (31) as found in eryA gene.
The open reading frame of mas as deduced from the nucleic acid sequence lacks a thioesterase domain that resembles the thioesterase domains of vertebrate fatty acid synthase and eryA from Streptomyces. A transferase domain, like the palmitoyl transferase of yeast, that might transfer the product as CoA ester was also not found in the mas gene. The lack of a domain that would catalyze release of either free mycocerosic acids or their CoA esters helps to explain the finding that purified MAS showed extremely low specific activity and that the product was found covalently attached to the enzyme (18). This lack of release of the free product might have a biological function. In vivo, mycocerosic acids are found exclusively as glycosylated phenolphthiocerol dimycocerosate in the cell wall (15). If the mycocerosic acid were released either as free acid or as CoA ester, these acids might be incorporated into other cellular lipids. Therefore, a direct transfer from MAS to phenolphthiocerol might be an effective method to target the product to the specific glycolipid. To test whether MAS can catalyze such a transfer we tested the effect of purified phenolphthiocerol and its glycoside isolated from M. tuberculosis bovis BCG on purified MAS; but the rate of methylmalonyl-CoA incorporation was enhanced only <100%, and release of mycocerosic acid from the synthase could not be detected. Therefore, we tentatively conclude that MAS probably functions in conjunction with a separate transferase that transfers the mycocerosyl groups from MAS to the hydroxyl groups of phenolphthiocerol.
Since [...] polypeptides, respectively. Hydrophobicity plots of these three proteins indicate that all three could contain transmembrane domains (data not shown). Data bank searches did not reveal adequate similarity to enable us to classify the protein products of these open reading frames as members of any known protein classes. However, the observed limited identities are tantalizing enough to suggest that one of them might be a mycocerosyl transferase. For example, open reading frame 1, which is at the 5′-side of the mas gene, could code for a 272-amino acid protein that has a region homologous to the acyl transferase of Streptomyces. It is possible that this protein is anchored to the membrane and functions by directly transferring the mycocerosyl group from MAS to the hydroxyl group of phenolphthiocerol. Recent immunogold labeling results indicating that MAS is located in association with the membrane, whereas FAS is located in the cytoplasm, are consistent with such a hypothesis. This hypothesis is analogous to that postulated in the case of the synthesis of the cyclic peptide gramicidin S (49, 50). In this case, at the 5′-side of a large open reading frame that codes for the multifunctional gramicidin S synthase, a small open reading frame was found that could code for a thioesterase-like protein, homologous to the S-acyl fatty acid synthase thioesterase that we originally cloned from an animal tissue (51). This thioesterase-like protein was postulated to be involved in the termination of the multistep gramicidin synthesis.
Many important natural products are derived from methylmalonyl-CoA only, malonyl-CoA only, or both of these elongation substrates. However, which component enzyme involved in the process selectively chooses the branched or n-precursor and the molecular basis of the specificity are unknown. The present results raise the possibility of discovering structural factors of the enzyme that might be involved in selecting methylmalonyl-CoA as the substrate. The active site areas of the acyl transferase domain and the ketoacyl synthase domain of mycobacterial MAS showed 80 and 93% identity, respectively, to the corresponding regions of erythronolide synthase, while the corresponding regions of vertebrate FAS showed only 40 and 67% identity, respectively. On the other hand, the other active sites, such as those of the reductases of MAS, were much more similar to those of vertebrate FAS than to those of erythronolide synthase. Since the uniqueness of the erythronolide synthase and MAS is that they both use methylmalonyl-CoA as the substrate, the striking similarities in the acyl transferase and ketoacyl synthase domains suggest that either or both of these domains may be selective for methylmalonyl-CoA. The availability of the cloned domains would allow us to test this hypothesis, with the possibility of discovering methylmalonyl-CoA-selective enzyme domains. If this approach succeeds, tailor-made branching patterns could be introduced in the future into biologically active natural products, with the possibility of generating new structures with novel activities.
The present results reveal the structure of a unique multifunctional chain-elongating enzyme that catalyzes the synthesis of mycocerosic acids. These unusual fatty acids are found exclusively esterified to phenolphthiocerol that is found only in pathogenic mycobacteria. Since the mycocerosyl lipids are unique constituents of the walls of these pathogenic organisms, the synthesis of mycocerosic acids could be a possible target for novel antimycobacterial drugs that could be of great benefit to many millions of people afflicted by diseases caused by mycobacteria.