Primary Structure of the Multifunctional α Subunit Protein of Yeast Fatty Acid Synthase Derived from FAS2 Gene Sequence*

The yeast fatty acid synthase consists of two multifunctional proteins, α and β, arranged in an α₆β₆ complex with a molecular weight of 2.4 × 10⁶. Five of the seven enzymatic activities reside in the β subunit, while the remaining two activities, β-ketoacyl synthase and β-ketoacyl reductase, and the domain of the acyl carrier protein, with its prosthetic group, 4'-phosphopantetheine, are in the α subunit. The genes FAS1 and FAS2, coding for the β and α subunits, respectively, have been cloned, and the sequence of FAS1 has been reported (Chirala, S. S., Kuziora, M. A., Spector, D. M., and Wakil, S. J. (1987) J. Biol. Chem. 262, 4231-4240).
In this study, we present the nucleotide sequence of the FAS2 gene. The sequence has an open reading frame coding for a protein of 1894 amino acids with a calculated molecular weight of 207,863. The serine site of attachment of the prosthetic group of the acyl carrier protein domain and the active cysteine-SH site of β-ketoacyl synthase have been identified at residues 180 and 1312, respectively, in the deduced amino acid sequence. A putative NADPH binding site of β-ketoacyl reductase has been suggested at residue 1038 based on its similarity to the consensus amino acid sequence -Gly-Ser-Ala- of the pyridine nucleotide enzymes.
We could not find any sequence homology in the 5' flanking sequences of the FAS1 and FAS2 genes that would suggest a common regulatory function. However, in the sequence of these two genes there is an identical eight-base pair sequence, TCATTATG, at the translational initiation site, suggesting that the subunit stoichiometry probably results from equal translational efficiency of the mRNAs of the FAS1 and FAS2 genes. The S1 endonuclease mapping suggests that there is a transcriptional initiation site about 40 nucleotides upstream of the first ATG codon and a transcriptional termination site about 300 nucleotides downstream of the TAG stop codon. The gene does not contain introns, as no intron consensus sequence TACTAAC has been found in the sequence.
* This work was supported in part by Grant GM19091 from the National Institutes of Health and by the Clayton Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J03936.

The fatty acid synthase complex catalyzes the synthesis of long-chain saturated fatty acids from acetyl- and malonyl-CoA. In prokaryotes and plants the complex consists of an acyl carrier protein (ACP) and seven structurally independent monofunctional enzymes, while in animals all the component enzymatic activities and ACP are organized in one large polypeptide chain (1). In the case of lower eukaryotes, such as yeast and other lower fungi, the enzymatic activities are distributed between two large multifunctional polypeptide chains, the α and β subunits (2-4). The enzymatically active synthase is an α₆β₆ complex having an estimated molecular weight of 2.4 × 10⁶. The β subunit contains five of the activities, mapped in the following order on the polypeptide: acetyltransacylase, enoyl reductase, dehydratase, and malonyl/palmitoyl transacylases. The α subunit contains the remaining two activities, β-ketoacyl reductase and β-ketoacyl synthase, and the ACP, where the growing fatty acid chain is attached to the cysteamine thiol of the prosthetic group 4'-phosphopantetheine (1-4). In yeast, the α and β subunits are encoded by two unlinked genes, FAS2 and FAS1, respectively (5). There are indications that these genes are coordinately expressed and that the cell synthesizes equivalent amounts of the mRNAs and the subunits to form the complex α₆β₆ (6, 7).
Recently, we cloned the FAS1 and FAS2 genes using two independent methods. Immunological screening of a bank of yeast genomic DNA in the vector ColE1 yielded clones 102B5 and 33F1 (7). The other method, genetic complementation of fatty acid-requiring auxotrophs of Saccharomyces cerevisiae by plasmids selected from a bank of yeast genomic DNA sequences in the vector YEp13, yielded clones YEpFAS1 and YEpFAS2 (7). The plasmids 33F1 and YEpFAS2 contained the entire FAS1 and FAS2 genes, respectively (7, 8). The FAS1 gene of 33F1 DNA, which encodes the β subunit of fatty acid synthase, was cloned in YEp33F1, and its nucleotide sequence was determined (8).
In the present paper, we report the complete nucleotide sequence of the FAS2 gene from the clone YEpFAS2, which encodes the α subunit. The sequence shows an open reading frame that could code for a protein of 1894 amino acids. In the nucleotide-derived amino acid sequence, the serine site of attachment of the 4'-phosphopantetheine prosthetic group of the ACP domain (9) and the active cysteine-SH site of the β-ketoacyl synthase, the enzyme that condenses the acyl and malonyl moieties (10), have been identified. A putative NADPH binding site of the β-ketoacyl reductase has been proposed.

MATERIALS AND METHODS
Yeast and Escherichia coli Strains-Yeast strain X2180-B1, GRF-18, or 6657-4D was used to prepare total RNA, and E. coli strain RR1 (F pro leu thi lacY HsdR-hsd" endoI-) was used to grow plasmids from yeast transformants. E. coli JM107 was used as the host for the isolation of single-stranded DNA on infection with M13 clones.

Media-Yeast was grown on YPD or SD media (11), and E. coli was grown on L-broth or M9 media supplemented with ampicillin (40 µg/ml) as required (12).
Enzymes and Biochemicals-Restriction enzymes were purchased from New England Biolabs and the S1 endonuclease from Boehringer Mannheim. Sequencing kits for dideoxy method and radiochemicals were purchased from Amersham Corp. and Du Pont-New England Nuclear. Enzymes were used according to the recommendations of the manufacturer. Chemicals and biochemicals used were of the highest quality available.
RNA Isolation and Hybridization-RNA was isolated from yeast cells as described earlier (8). Total RNA was fractionated on agarose gels (7), transferred to nitrocellulose paper, and hybridized to different labeled probes according to Thomas (13). FAS1 and FAS2 gene probes were prepared by synthesizing the radiolabeled strands using appropriate M13 clones and universal primer.
Plasmid DNA was isolated according to Birnboim and Doly (14) as described in the Cold Spring Harbor manual (15). S1 endonuclease mapping was performed by the method of Berk and Sharp (16).
DNA Sequence Determination-Most of the DNA sequences were obtained by the chemical modification method of Maxam and Gilbert (17, 22) using various DNA fragments labeled at one end. The restriction fragments were radiolabeled either at the 5' end using [γ-³²P]ATP or at the 3' end using the fill-in reaction of the Klenow fragment of DNA polymerase and [α-³²P]dNTP (15). Wherever necessary, some of the fragments were sequenced by the dideoxy chain termination method of Sanger and Coulson (18), as described by Messing (19), using an Amersham sequencing kit. Sequencing was done with single-stranded DNA templates prepared from M13 FAS2 clones with either the 17-mer universal primer 5'-d(GTAAAACGACGGCCAGT)-3' or oligonucleotide primers custom synthesized on the basis of known sequences. The oligonucleotide primers were synthesized using an Applied Biosystems DNA synthesizer, desalted on a Sephadex P-10 column, eluted with 10 mM ammonium bicarbonate, pH 8.0, appropriately diluted (10 µg/ml) in 10 mM Tris-HCl, pH 8.0, 1.0 mM EDTA, and used for sequencing.
Autoradiography was performed with a Du Pont Cronex intensifying screen and Kodak XAR-5 films. DNA sequences were analyzed using the "Seqanal 1" program (20) on a Sperry microcomputer.
Amino Acid Sequence Determination-Yeast fatty acid synthase (100 mg), purified according to the published procedure (2), was treated with a 200-fold molar excess of CNBr over the estimated methionine content in 3 ml of 70% formic acid for 24 h at room temperature. After dilution with 3 volumes of H2O, the sample was dried in a Speedvac concentrator. The dried residue was suspended in 1.5 ml of 0.2 M ammonium bicarbonate buffer, pH 8.0, containing 0.5 mg of trypsin (Worthington) and incubated at 37 °C for 24 h. After digestion, the resulting peptides were separated by HPLC as follows. A sample containing 30 µg of peptides was applied to a C18 reverse-phase column (Vydac 218 TP54, 0.46 × 25 cm, 5-µm particle size, Separations Group) and developed at 50 °C with a linear gradient of buffer A (0.1% trifluoroacetic acid) and solvent B (95% CH3CN in 0.1% trifluoroacetic acid) from 0 to 50%, reached in 60 min at a flow rate of 1 ml/min (method A). Different fractions were arbitrarily chosen, pooled, labeled T1, T2, T3, T4, and T5, and lyophilized. Each fraction was digested with aminopeptidase M as follows. Aminopeptidase M (0.1 ml) (Sepharose CL-4B coupled) was washed twice with 1 ml of distilled water and resuspended in 0.2 ml of 0.1 M NH4HCO3, pH 8.0. An aliquot (0.3 ml) of the tryptic peptides dissolved in the same buffer was added to the suspension, and the mixture was incubated at 37 °C for 24 h with occasional shaking. The resulting peptides were separated by HPLC as follows. An aliquot of the aminopeptidase M digest containing 40-100 µg of peptides was applied to a Nova-Pak C18 column (0.46 × 7.5 cm, Waters Associates) and developed at 50 °C with various linear CH3CN gradients in 0.1% trifluoroacetic acid, pH 2.0, at a flow rate of 1 ml/min. The CH3CN gradients used were: method B, 0-30%, reached in 30 min; method C, 10-35%, reached in 20 min; or method D, 0-35%, reached in 30 min.
The purified peptides were collected and sequenced by automated Edman degradation using gas phase Protein Sequencer (Model 470 A, Applied Biosystems) with facility for on-line phenylthiohydantoin analysis (Model 120 A PTH Analyzer, Applied Biosystems), following the manufacturer's protocol.
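The two cleavage steps described above can be mimicked in silico to predict which peptides a protein sequence should yield and where a sequenced peptide falls in a deduced sequence. The following is only a minimal sketch under simplifying assumptions: CNBr is taken to cleave after every Met, trypsin after every Lys or Arg unless the next residue is Pro, and the aminopeptidase M step is ignored. The function names and the toy sequence are hypothetical and are not taken from the FAS2 data.

```python
def cleave_after(seq, residues, blocked_next=""):
    """Split seq on the C-terminal side of any residue in `residues`,
    unless the following residue is listed in `blocked_next`."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in residues and (nxt == "" or nxt not in blocked_next):
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # C-terminal remainder
    return fragments


def cnbr_tryptic_peptides(protein):
    """CNBr cleavage (after Met) followed by trypsin (after Lys/Arg, not before Pro)."""
    peptides = []
    for cnbr_fragment in cleave_after(protein, "M"):
        peptides.extend(cleave_after(cnbr_fragment, "KR", blocked_next="P"))
    return peptides


def locate(peptide, protein):
    """Return the 1-based (first, last) residue numbers of an exact match, or None."""
    index = protein.find(peptide)
    if index < 0:
        return None
    return index + 1, index + len(peptide)


# Hypothetical toy sequence, for illustration only.
toy_protein = "MKTAYMGRPLKWQR"
toy_peptides = cnbr_tryptic_peptides(toy_protein)
```

Matching an experimentally sequenced peptide with `locate` against the deduced sequence is the kind of comparison used in this work to confirm the reading frame.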

RESULTS
Sequence Analysis of YEpFAS2-The molecular cloning of the FAS2 gene was reported earlier (7). Two genomic clones, YEpFAS2 and 102B5, were isolated by genetic complementation and immunological screening, respectively (7). YEpFAS2 has a yeast genomic DNA insert of 11.4 kbp which is capable of complementing all known fas2 mutations (8) and encodes a full-length α subunit (21). The clone 102B5 has a DNA insert of 3.4 kbp which hybridizes to the YEpFAS2 yeast DNA (7). In order to determine the extent of sequence information needed from the 11.4-kbp insert and to compare the sizes of the mRNAs for the FAS1 and FAS2 genes, we performed Northern analyses. Yeast total RNA was separated on formaldehyde-containing agarose gels and transferred to nitrocellulose sheets. Radioactive probes specific for the FAS1 and FAS2 genes were prepared using the respective gene fragment cloned in M13 and universal primer. As shown in Fig. 1, the FAS1 probe hybridized to a 6.5-kb mRNA and the FAS2 probe to a 6.0-kb mRNA.
HindIII cleavage of the 11.4-kbp DNA insert of YEpFAS2 yielded five fragments with sizes of 4.0, 3.3, 0.9, 1.3, and 2.0 kbp. All of these fragments except the 4.0-kbp fragment hybridized with the 6.0-kb mRNA (data not shown), suggesting that the latter fragment is not a part of the FAS2 gene. The 4.0-kbp DNA segment was cleaved with PvuII endonuclease, yielding 2-kbp fragments. The 3' end of this fragment, along with the other four HindIII segments of the 11.4-kbp genomic insert, was subcloned into either pUC18 or pBR322 and M13 mp18/19 and used for DNA sequence determination. The restriction map and sequence strategy used in the structural analyses of the FAS2 gene are outlined in Fig. 2. Sequencing was accomplished either by the chemical modification method of Maxam and Gilbert (17, 22) or by the dideoxy method of Sanger and Coulson (18, 19). A total of 7.6 kbp, starting about 1400 bases upstream of the translational start codon and extending to about 500 bases downstream of the translational stop codon, was sequenced (Fig. 3). All segments of the sequences were overlapped. The sequences of the subcloned inserts were melded by sequencing the parental plasmid YEpFAS2. More than 95% of the sequences were confirmed by sequencing both strands; wherever only one strand was sequenced, the analyses were done more than once. Fig. 3 also shows the deduced amino acid sequence. The nucleotide numbering starts with the putative translation initiation codon ATG at base 1. Starting with this Met codon, the open reading frame could code for a protein of 1894 amino acids.

FIG. 2. Hatched boxes indicate the coding segment. Only the restriction sites used for sequencing are indicated. Open circles denote sequencing by the chemical modification method (17, 22), and closed circles denote sequencing by the dideoxy method (18, 19). Overlapping sequences were obtained by sequencing the parent plasmid YEpFAS2 by the dideoxy method, using appropriate oligonucleotide primers. The DNA fragments used for 5' and 3' end S1 endonuclease mapping are shown in p3.3 and p2.0, respectively.
Table I shows the codon usage and the predicted amino acid composition of the α subunit of yeast fatty acid synthase. The amino acid composition agreed closely with the published values (23).
The serine site (residue 180) of attachment of the prosthetic group, 4'-phosphopantetheine, of the acyl carrier protein and the cysteine-SH (residue 1312) of β-ketoacyl synthase have been located in the contiguous open reading frame presented in Fig. 3, as indicated, based on previously published amino acid sequences (9, 10).
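As an illustration of how a deduced amino acid sequence follows from an open reading frame, the sketch below translates a DNA string from its first ATG to the first in-frame stop codon using the standard genetic code. It is a simplified sketch, not the analysis program used in this work; the function name and the toy sequence are hypothetical. By the same arithmetic, an open reading frame coding for 1894 amino acids spans 1894 × 3 = 5682 coding nucleotides, followed by the stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(dna):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns (start_index, end_index, protein), where the indices are 0-based
    and end_index points just past the stop codon; returns None when there is
    no ATG or no in-frame stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start < 0:
        return None
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            return start, i + 3, "".join(protein)
        protein.append(amino_acid)
    return None


# Hypothetical toy sequence (not the FAS2 sequence), for illustration only.
toy_orf = translate_from_first_atg("TCATTATGAAAGGTTCTGCTTAGCC")
```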
In order to confirm the reading frame of the nucleotide sequence further, we sequenced some of the peptides derived from a tryptic digest of the cyanogen bromide peptides obtained from yeast fatty acid synthase. As described under "Materials and Methods," yeast fatty acid synthase was cleaved with cyanogen bromide, and the resulting peptides were digested with trypsin. The digest was then fractionated by HPLC as shown in Fig. 4. Five fractions were arbitrarily collected and labeled T1, T2, T3, T4, and T5. Each fraction was treated with aminopeptidase M² and rechromatographed on HPLC using various linear gradients of CH3CN designated as methods B, C, or D (see "Materials and Methods" for details). The pure peptides obtained from each tryptic fraction were sequenced. Comparison of the amino acid sequences of 10 peptides revealed that eight of the peptides are derived from the α subunit and the remaining two from the β subunit. As shown in Table II, the predicted amino acid sequences between residues 87 to 100, 101 to 105, and 507 to 511 near the NH2-terminal region and between residues 1657 to 1676, 1772 to 1796, 1834 to 1842, 1852 to 1863, and 1876 to 1892 near the COOH-terminal region of the α subunit protein matched exactly with the amino acid sequences of peptides 1 to 8, respectively. The amino acid sequences of peptides 9 and 10 (Table II) matched with the deduced amino acid sequence of the β subunit protein between residues 40 to 63 and 971 to 982, as reported earlier (8).

² The use of aminopeptidase was for the purpose of isolating the blocked NH2-terminal peptides of the α and β subunits (to be published). However, as a by-product of these studies, several peptides were obtained in pure state and sequenced.

FIG. 3. Nucleotide sequence of the FAS2 gene and the deduced amino acid sequence of the α subunit. Numbering of the nucleotides starts with the A of the first ATG Met codon. The numbers on the right represent either the last nucleotide or the amino acid as appropriate. In the 5' noncoding region the putative transcription initiation site and the possible translation initiation signal CAACACCAA are marked and underlined. The acyl carrier protein, the putative β-ketoacyl reductase, and the β-ketoacyl synthase sites are so indicated and underlined based on the published protein sequences (9, 10). In the 3' noncoding region the transcription termination site and the TAGT, TATGT, and TAGTAA sequences (27) are identified and underlined. The translation termination site is identified by a star. The underlined amino acid sequences represent the isolated tryptic peptides whose sequences were determined by automated Edman degradation (see "Materials and Methods").
Transcription Initiation and Termination Sites of FAS2 Gene-Since the yeast genomic insert in YEpFAS2 is about 11.4 kbp and the size of the mRNA coding for the α subunit is about 6.0 kb (Fig. 1), it was necessary to determine the beginning and the end of the transcription unit. This information would also be useful in establishing the extent of the coding region in the clone YEpFAS2.
In order to determine the transcription initiation site, we labeled the proximal AvaII site in p3.3 at the 5' end (see Fig. 2), and the AvaII-HindIII fragment was then separated and hybridized with yeast total RNA in 80% formamide at 37 or 47 °C (for details, see legend to Fig. 5). After S1 endonuclease digestion, the resistant DNA hybrid was analyzed on denaturing polyacrylamide gels. On autoradiography, two fragments with estimated sizes of 170 and 180 bases were found to be protected; however, these DNA fragments were lost when the hybridization was performed at 47 °C (Fig. 5A), presumably because these fragments are AT-rich (64%), resulting in inefficient hybridization. These results indicate that the initiation site for transcription maps at about 40 bases upstream of the first ATG codon in the sequence shown in Fig. 3.
The mapping of the 3' end of the transcript was done by labeling the distal AvaII site in the plasmid p2.0 (see Fig. 2) at the 3' end. The AvaII-HindIII fragment was excised, hybridized with yeast total RNA, and digested with S1 endonuclease as described above. The S1-resistant hybrids were analyzed by electrophoresis on a denaturing polyacrylamide gel. After autoradiography, a band with an estimated length of 800 nucleotides was obtained at hybridization temperatures of 37 and 47 °C (Fig. 5B). These results indicate that transcription termination occurs at about 300 bases downstream of the TAG stop codon in the sequence presented in Fig. 3. This was confirmed when the MluI site at the 3' end of p2.0 (Fig. 2) was labeled and the MluI-HindIII fragment was isolated and hybridized to the yeast RNA. A DNA fragment of about 90 nucleotides was protected by the mRNA (as shown in Fig. 5C).

DISCUSSION
The plasmid YEpFAS2 contains a yeast genomic fragment that codes for the entire α subunit of yeast fatty acid synthase.
The nucleotide sequence presented here shows a unique open reading frame that could code for a protein of 1894 amino acids starting with Met at position 1. There is no striking homology in the 5' and 3' flanking sequences of the FAS1 and FAS2 genes that could suggest common regulatory functions.

However, in the sequence of these two genes there is an identical 8-base pair sequence, TCATTATG (bases -5 to 3), at the site of translational initiation (8), including the Met codon. This sequence may serve as a signal that influences the translation efficiency of the mRNAs for both the α and β subunits, leading to stoichiometric accumulation of the subunits and their subsequent association into the α₆β₆ complex.
The S1 endonuclease mapping of the FAS2 gene at the 5' end indicated that there are transcriptional initiation sites at -26 and -36 bp (Fig. 5A). There is a TATATTA sequence at -51 bp that could serve as a TATA box. The next upstream TATA box is at -118 bp. Polypyrimidine tracts are present between -500 and -210 bp in the sequence presented here; such sequences are found in the 5'-untranslated region of most yeast pol II genes (26). The 3' end of the transcriptional unit appears to be about 300 bp downstream of the translational termination site (Fig. 5, B and C). The eukaryotic polyadenylation signal AATAAA has not been found in the downstream sequence. However, the signals for transcription termination and polyadenylation, TAGT and TATGT (26), are found upstream of the transcription termination site (Fig. 3, underlined).

TABLE II
Amino acid sequence of the purified peptides of yeast fatty acid synthase
Purified cyanogen bromide-tryptic peptides of yeast fatty acid synthase were sequenced by a gas-phase Protein Sequencer (Applied Biosystems) as described under "Materials and Methods." The amino acid sequences of peptides 1-8 and 9-10 are matched with the deduced amino acid sequences of the FAS2 and FAS1 genes (8), respectively.

FIG. 5. A, the AvaII site of the FAS2 DNA insert in p3.3 (see Fig. 2) was labeled at the 5' end, and the 1.1-kbp AvaII-HindIII fragment was isolated and hybridized with yeast total RNA (30 µg) at 37 or 47 °C for 16 h. S1 endonuclease (800 units) was added, and the mixture was incubated for 30 min at room temperature. The protected fragments were analyzed in 6% polyacrylamide-urea gels. Lane 1, denatured DNA; lane 2, DNA hybridized with 30 µg of yeast tRNA at 37 °C; lanes 3 and 4, DNA hybridized with yeast RNA at 37 and 47 °C, respectively; lane 5, φX174 DNA cut with HaeIII and used as standard. B, the AvaII site of the FAS2 DNA insert in p2.0 at the downstream region (see Fig. 2) was labeled at the 3' end, and the 1.2-kbp AvaII-HindIII fragment was isolated and hybridized with yeast total RNA (30 µg) at 37 or 47 °C for 16 h. S1 endonuclease (1000 units) was added, and the mixture was incubated for 45 min at room temperature. The protected DNA fragments were analyzed as in A. Lane 1, φX174 DNA cut with HaeIII and used as standard; lane 2, denatured DNA; lane 3, DNA hybridized with 30 µg of yeast tRNA at 37 °C; lanes 4 and 5, DNA hybridized with yeast RNA at 37 and 47 °C, respectively. C, the MluI site of the DNA insert in the same p2.0 fragment used in B was labeled at the 3' end, and the 600-bp MluI-HindIII fragment was isolated and hybridized with yeast total RNA (30 µg) at 37 or 47 °C. S1 endonuclease treatment and analysis of the protected DNA fragments were carried out as in B. Lanes 1 and 6, φX174 DNA cut with HaeIII and used as standard; lane 2, denatured DNA; lane 3, DNA hybridized with 30 µg of yeast tRNA at 37 °C; lanes 4 and 5, DNA hybridized with yeast RNA at 37 and 47 °C, respectively.

In the entire sequence presented in Fig. 3, there is no intron-specific sequence TACTAAC (27), indicating that the open reading frame of FAS2 represents the only exon which codes for the α subunit protein of 1894 amino acids and that the adjoining sequences are not part of the FAS2 gene. In the deduced protein sequence we could identify the serine site of attachment of the prosthetic group, 4'-phosphopantetheine, of the ACP domain and the active cysteine-SH of the β-ketoacyl synthase, the condensing enzyme, by comparison with the published amino acid sequences (9, 10). The amino acid sequence between residues 176-192 matches exactly with the sequence reported for the serine site of the ACP domain; however, leucine, residue 175, follows aspartic acid, which was missing in the sequence reported (9). The amino acid sequence between residues 1307-1312 matches exactly with the peptide sequence published for β-ketoacyl synthase (10). The amino acid sequence of β-ketoacyl reductase is not known. However, by comparison with the consensus sequence (-Gly-Ser-Ala-) for the NADPH binding site of the enoyl reductase of the yeast β subunit (8), goose uropygial enoyl reductase (24), and horse alcohol dehydrogenase (25), a site between residues 1038 and 1040 has been located (see Fig. 3 and Table III). The -Gly-Ser-Ala- sequence occurs only once in the entire amino acid sequence of the α subunit and could probably be the site of β-ketoacyl reductase.
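The observation that a short motif such as -Gly-Ser-Ala- occurs only once in a sequence is straightforward to check computationally. The sketch below lists every 1-based position at which an exact tripeptide occurs in a one-letter-code sequence. The function name and the toy sequences are hypothetical; the deduced FAS2 sequence itself is not reproduced here.

```python
def find_motif(protein, motif="GSA"):
    """Return the 1-based start positions of every (possibly overlapping)
    exact occurrence of `motif` in `protein`."""
    positions = []
    index = protein.find(motif)
    while index != -1:
        positions.append(index + 1)
        index = protein.find(motif, index + 1)
    return positions


# Hypothetical toy sequence with a single Gly-Ser-Ala tripeptide.
toy_hits = find_motif("MKGSAVLGTAGSLA")
is_unique = len(toy_hits) == 1
```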
The amino acid composition of the deduced protein matches closely that reported for the yeast fatty acid synthase α subunit (Table I) (23). The fraction of each amino acid in the deduced sequence is comparable to that of an average protein except for the content of cysteine, which is relatively low, as in the case of the β subunit of yeast synthase (8). In the α subunit there are only 13 cysteine residues out of 1894 amino acids, suggesting few disulfide bridges within the molecule. The α subunit seems to be acidic, for the number of Asp + Glu residues is 246 as compared with 211 Arg + Lys residues. In this regard, the α subunit is as acidic as the β subunit (8).
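The comparison of acidic and basic residues amounts to a simple tally. With the counts reported above, 246 Asp + Glu against 211 Arg + Lys leaves an excess of 35 acidic residues. A minimal sketch of the tally for any one-letter-code sequence follows; the function name and the toy sequence are hypothetical, and histidine is deliberately left out, as in the comparison above.

```python
from collections import Counter


def acidic_basic_balance(protein):
    """Return (Asp + Glu, Arg + Lys, excess of acidic over basic residues)."""
    counts = Counter(protein)
    acidic = counts["D"] + counts["E"]
    basic = counts["R"] + counts["K"]
    return acidic, basic, acidic - basic


# Counts reported in the text for the deduced sequence.
reported_excess = 246 - 211

# Hypothetical toy sequence, for illustration only.
toy_balance = acidic_basic_balance("MDEEKRDA")
```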
The usage of different codons reflects bias for certain codons (Table I). For example, Arg is coded by AGA and CGT more often than by AGG, CGC, or CGA. The codon CGG, on the other hand, was not used at all. Also, Asn is coded more often by AAC than by AAT. As shown in Table I, the codons that are rarely used in highly expressed yeast genes, such as alcohol dehydrogenase or glycerol-3-phosphate dehydrogenase (28, 29), are used frequently for the biosynthesis of the α subunit, as is the case for the β subunit of the yeast synthase (8). This implies that the FAS2 gene is moderately expressed. This conclusion is in line with the codon bias index of 0.40, calculated according to Bennetzen and Hall (29).
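A codon usage table of the kind summarized in Table I can be built by counting in-frame triplets and then comparing synonymous codons. The sketch below does this for the six arginine codons. It illustrates the tally only and does not compute the codon bias index of Bennetzen and Hall; the function names and the toy coding sequence are hypothetical.

```python
from collections import Counter

ARG_CODONS = ("AGA", "AGG", "CGT", "CGC", "CGA", "CGG")


def codon_counts(coding_sequence):
    """Count in-frame codons, reading from the first base; a trailing
    partial codon is ignored."""
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def synonymous_fractions(counts, codons):
    """Fraction of each codon among the given set of synonymous codons."""
    total = sum(counts[codon] for codon in codons)
    return {codon: (counts[codon] / total if total else 0.0) for codon in codons}


# Hypothetical toy coding sequence: ATG AGA AGA CGT AGA AAC AAT TAA
toy_counts = codon_counts("ATGAGAAGACGTAGAAACAATTAA")
toy_arg_usage = synonymous_fractions(toy_counts, ARG_CODONS)
```

In this toy example AGA accounts for three of the four arginine codons and CGG is not used, which mirrors in miniature the kind of bias described above.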
The hydropathic profile of the deduced amino acid sequence of the α subunit was calculated by the algorithm of Kyte and Doolittle (30), as shown in Fig. 6. Because of the large size of the molecule, the calculation was performed in four sections. The first segment (residues 1-500) shows the acyl carrier protein domain: Ser-180, the site of attachment of 4'-phosphopantetheine, is surrounded by a stretch of hydrophilic amino acids, as is the case in E. coli ACP (31, 32). In the second segment (residues 501-1000), residues 501-660 are hydrophilic, while the rest are a mixture of hydrophilic and hydrophobic amino acids. In the third segment (residues 1001-1500), the putative β-ketoacyl reductase site is in a hydrophilic region sandwiched between stretches of hydrophobic amino acids. This is, however, different from the NADPH binding site of the enoyl reductase of the yeast β subunit, which was shown to be in a hydrophobic domain (8). The cysteine-SH of the β-ketoacyl synthase is also located in this segment of the molecule and is clearly associated with a region of highly hydrophobic amino acids. The last segment (residues 1501-1894) consists predominantly of hydrophilic amino acids (Fig. 6).
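A hydropathy profile of this kind is a moving average of per-residue hydropathy values along the sequence. The sketch below uses the Kyte and Doolittle scale with a window of 9 residues; the window size is an assumption, since the value used for Fig. 6 is not stated here, and the function name is hypothetical. Positive averages indicate hydrophobic stretches and negative averages hydrophilic ones.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of each consecutive window of `window` residues.

    Entry i of the result covers residues i+1 .. i+window (1-based).
    Returns an empty list when the sequence is shorter than the window.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein]
    if window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

For a long protein the profile can be computed once and plotted in sections, for example `hydropathy_profile(sequence)[0:500]` for the first segment.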
The unique contiguous open reading frame of the FAS2 gene presented here (Fig. 3) translates into a protein of Mr 207,863. Since the NH2 termini of yeast fatty acid synthase subunits are blocked (33), no NH2-terminal sequences are available. Hence, it is possible that one of the first three methionines, Met-1, Met-56, or Met-168, is the NH2 terminus of the protein. However, Met-168 cannot be the site of translation initiation because cyanogen bromide-tryptic digestion of purified yeast synthase yielded a peptide whose amino acid sequence matched exactly with the deduced amino acid sequence between residues 87 and 100, indicating that translation initiation occurs prior to this Met. We believe that Met-1 and not Met-56 is the site of translation initiation because of the striking sequence homology of the eight nucleotides TCATTATG at the very beginning of the coding region (bases -5 to 3) of the FAS2 gene to the corresponding location in the FAS1 gene (8). In predicting the amino acid sequence of the β subunit based on the nucleotide sequence of the FAS1 gene, it was not possible to state unequivocally whether Met-1 or Met-132 was the site of translation initiation (8). However, as shown in Table II, we have isolated and sequenced a peptide from purified synthase whose amino acid sequence matches exactly with the deduced sequence of the β subunit at residues 40-63. This result, therefore, establishes Met-1 of the deduced amino acid sequence of the β subunit as the NH2 terminus of the protein.
Hence, the eight nucleotides TCATTATG at the translational start of both the α and β subunits may signify that 1) the first Met codon in the open reading frame of the FAS2 gene could very well be the NH2-terminal Met of the α subunit; and 2) the observed stoichiometry of the α and β subunits in yeast could result from the equal translational efficiency of the mRNAs of both subunits, as mentioned above.
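A shared element such as TCATTATG can be found by listing the subsequences of a fixed length (k-mers) that two sequences have in common. The sketch below does so for k = 8. Only the octamer TCATTATG is taken from the text; the flanking bases in the two toy sequences and the function name are hypothetical.

```python
def shared_kmers(seq_a, seq_b, k=8):
    """Return the sorted k-mers present in both sequences."""
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    return sorted(kmers_a & kmers_b)


# Hypothetical flanking bases around the octamer reported in the text.
toy_start_region_a = "GGCAAC" + "TCATTATG" + "AAGCC"
toy_start_region_b = "TTAGCG" + "TCATTATG" + "GACGT"
toy_shared = shared_kmers(toy_start_region_a, toy_start_region_b)
```

With real flanking sequences, the same comparison run at several values of k gives a quick indication of whether any longer identical stretch exists.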
Our calculated molecular weight for the β subunit of yeast synthase from 1980 amino acids, deduced from the nucleotide sequence, is 220,077 (8), while that of the α subunit is 207,863, as stated earlier. These values are the reverse of what had been estimated for the molecular weights of the α subunit (213,000) and the β subunit (203,000), based on their mobilities in SDS-polyacrylamide gel electrophoresis, where the β subunit always moves faster than the α subunit (2). The reason for this discrepancy is not clear at this time. It is possible, however, that the slow mobility of the α subunit in SDS gels may be due to the presence of the phosphopantetheine prosthetic group, to unique structural features, and/or to post-translational modifications such as glycosylation. Though there are six potential N-glycosylation sites (Asn-X-Ser/Thr) in the deduced amino acid sequence, our tests for the presence of carbohydrates in the protein were negative.
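Potential N-glycosylation sites of the form Asn-X-Ser/Thr can be listed with a regular expression. The sketch below uses a lookahead so that overlapping sites are all reported. By default X is any residue, as in the definition given above; the optional `exclude_pro` flag applies the commonly used refinement that X is not proline. The function name and the toy sequence are hypothetical.

```python
import re


def find_sequons(protein, exclude_pro=False):
    """Return the 1-based positions of Asn in every Asn-X-Ser/Thr tripeptide.

    With exclude_pro=True, tripeptides whose middle residue is Pro are skipped.
    """
    middle = "[^P]" if exclude_pro else "."
    pattern = re.compile(f"(?=N{middle}[ST])")  # zero-width, so overlaps are kept
    return [match.start() + 1 for match in pattern.finditer(protein)]


# Hypothetical toy sequence, for illustration only.
toy_sites = find_sequons("MNGSANNTSPNPT")
toy_sites_no_pro = find_sequons("MNGSANNTSPNPT", exclude_pro=True)
```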
It is noteworthy that E. coli ACP migrates anomalously in SDS-polyacrylamide gels (32). Therefore, the anomalous behavior of the α subunit in SDS gels could be due to its ACP domain. We find, for instance, that E. coli ACP, with a calculated molecular weight of 7985 (31), migrated in SDS gels in a distinctly anomalous fashion with an apparent molecular weight of 16,500, in agreement with Rock and Cronan (32). Performic acid oxidation of the E. coli ACP did not change its position on the gel, suggesting that the slower mobility was not due to dimerization through disulfide bond formation (data not shown). It is conceivable, therefore, that the anomalous behavior of the α subunit in SDS gel electrophoresis may be due to its ACP moiety in a manner similar to that of E. coli ACP. Neither performic acid oxidation of the α subunit protein nor β-elimination of the 4'-phosphopantetheine (33, 34) alters its mobility in SDS gels. Moreover, in SDS-urea gel electrophoresis, in which E. coli ACP migrates according to its expected molecular weight of about 8,000 (32), the α subunit shows slower mobility than the β subunit. The reason for this behavior is not apparent at this time. However, it is of interest to note that upstream of the site for attachment of 4'-phosphopantetheine (Ser-180) in the deduced amino acid sequence (residues 108-138), there is a cluster of prolines and alanines; out of the 31 amino acids in this stretch there are 20 alanines and 8 prolines. It is not known whether such a sequence motif could result in abnormal folding of the protein that would hinder its complete unfolding in SDS and hence cause its abnormal mobility in SDS-polyacrylamide gels. Moreover, anomalous behavior of proteins is not uncommon in SDS gels (35, 36). Examination of the amino acid sequence at the ACP domain of the α subunit shows no similarity to that of E. coli ACP.
However, when the predicted secondary structure of the ACP region of the α subunit, according to Finkelstein (37), was compared to the proposed secondary and tertiary structural model for E. coli ACP (32, 38, 39), a remarkable similarity was noted.
In these studies, the amino acid sequence of the ACP domain of the α subunit (residues 140-226) was analyzed. The results showed that this region is composed of four α-helical structures interrupted by three β turns. Ser-180 is located in a β turn similar to that of E. coli ACP (32). The four α-helical regions can be folded in such a way as to mimic the proposed structure of E. coli ACP based on NMR studies (39), as shown in Fig. 7. The most striking feature of the predicted model is the conservation of Phe-194 and its location at a distance from the prosthetic group comparable to that of Phe-50 in the E. coli ACP. The amino acid sequence in this region, -Glu-Phe-Gly-Thr- of the yeast ACP domain and -Glu-Phe-Asp-Thr- of the E. coli ACP (31), is highly conserved and may signify the suggested role of this Phe residue in the acylation of the prosthetic group in early stages of acyl chain elongation (38, 39). Ile-172 and Phe-217 may play roles similar to those of Ile-69 and Phe-28 of E. coli ACP by interacting with the prosthetic group (39). Even though the model proposed in Fig. 7 is very speculative, its overall similarity to the structure of E. coli ACP is striking despite the considerable variations in the primary structures of the two proteins (31).
The discrepancy between molecular weight and mobility in SDS gels does not reflect on the correctness of the sequence presented here, for the following reasons. 1) In the entire sequence there are no intron-specific sequences, as stated above. The sequence TACTAC found at -375 bases is unlikely to serve as a processing signal because of the stringent requirement for TACTAAC (27). 2) An Aut11 subclone of the FAS2 insert in YEpFAS2 containing the entire sequence starting from -700 bases complements all known fas2 mutants and codes for the full-length α subunit (data not shown).
3) The insert of the upstream p2.0* (see Fig. 2) hybridizes to an mRNA of about 1.5 kb, and not to the 6.0-kb fatty acid synthase mRNA, suggesting that p2.0* is a different gene. S1 endonuclease mapping of this p2.0* gene shows that the 1.5-kb mRNA starts at about -2000 bp upstream of the HindIII site and probably terminates at about -500 bp in the sequence shown in Fig. 3. Both genes are transcribed with the same polarity. 4) Finally, starting from the beginning of the sequence presented here (-1350 bases), there is an open reading frame terminating at -500 bases, coding for a protein of approximately 30 kDa. However, the nucleotide sequence between -500 and 1 does not appear to code for a protein of any consequence. Therefore, the sequence presented in Fig. 3 represents a unique sequence coding for the full-length yeast fatty acid synthase α subunit.
Another possibility we have considered is that the β subunit undergoes post-translational proteolysis, resulting in loss of a peptide of about 150 amino acids. However, such a peptide could not have been cleaved from the NH2-terminal end of the molecule and lost on purification, since we have isolated and sequenced a peptide whose amino acid sequence matches the predicted sequence near the NH2-terminal end (residues 40-63) (Table II). Moreover, we have cloned the FAS1 gene in an expression vector which, when expressed in E. coli, yields β subunit protein having the same mobility as the wild type β subunit (data not shown). Although it is possible, it is hard to believe that the products made in both E. coli and yeast would undergo the same modification.
Recently, Schweizer and co-workers (40) sequenced the FAS1 gene and reported a reading frame of 5535 bp corresponding to a β subunit protein of 1845 amino acids. This is shorter than the 5940 bp that we reported for the FAS1 gene, or 135 amino acids fewer than we predicted for the β subunit protein (8). In comparing the two sequences, we have reconfirmed our sequence of the FAS1 gene and found that the discrepancy lies in an additional G in the sequence CCAAGAGA G TGAGTTG reported by Schweizer et al. (40) at position-7002. This additional nucleotide in their sequence resulted in premature termination of the open reading frame as reported by them. Also, there are many differences between their nucleotide sequence and ours at the 3' end of the gene.
Our data on the S1 mapping of the 3' end and the nucleotide sequence show that the FAS1 gene codes for a protein of 1980 amino acids (8). Moreover, as shown in Fig. 1, the mRNA for the β subunit is larger than that for the α subunit, an observation which reaffirms the difference in the calculated molecular weights of the two subunits.
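The arithmetic behind the two reported FAS1 reading frames, and the way a single inserted nucleotide can truncate an open reading frame, can be sketched as follows. The helper names `codons_in_frame` and `orf_length` and the 18-base toy sequence are hypothetical and introduced only for illustration; the toy sequence is not part of the FAS1 gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_in_frame(n_bp: int) -> int:
    """Number of codons encoded by a coding stretch of n_bp base pairs."""
    if n_bp % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return n_bp // 3


def orf_length(dna: str, start: int = 0) -> int:
    """Count codons read from `start` until the first in-frame stop codon."""
    count = 0
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3].upper() in STOP_CODONS:
            return count
        count += 1
    return count


# Lengths quoted in the text: 5940 bp versus 5535 bp.
longer = codons_in_frame(5940)
shorter = codons_in_frame(5535)
print(longer, shorter, longer - shorter)  # 1980 1845 135

# Hypothetical toy sequence: ATG GCT GAC TTG AAA TAA (5 codons, then stop).
toy = "ATGGCTGACTTGAAATAA"
# Inserting one extra G after the third base shifts the frame:
# ATG GGC TGA ... so a stop codon now appears after only 2 codons.
toy_with_extra_g = toy[:3] + "G" + toy[3:]
print(orf_length(toy), orf_length(toy_with_extra_g))  # 5 2
```

The toy case shows only the general mechanism: one extra base changes every downstream codon, so a stop codon can appear in frame well before the true end of the coding sequence.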
The clone 102B5, the truncated FAS2 gene isolated by the immunological screening method that we reported earlier (7), must contain the 3' end of the gene, because the homologous region between YEpFAS2 and 102B5 shows that the truncated gene could code for the β-ketoacyl synthase and the putative β-ketoacyl reductase, but not for the acyl carrier protein domain, which lies at the 5' end of the FAS2 gene. The transcription of this clone in E. coli might have resulted from a promoter-like sequence either within the insert or in the ColE1 plasmid, resulting in the production of protein antigenically reactive to the anti-yeast fatty acid synthase antibodies.
Finally, our nucleotide-derived amino acid sequence helped establish the order of the ACP domain and the other two partial activities as 5'-ACP/β-ketoacyl reductase/β-ketoacyl synthase-3', which is different from the order determined by genetic mapping by Schweizer and co-workers (6). Our identification of the β-ketoacyl reductase, however, is tentative and is based on the consensus -Gly-Ser-Ala- sequence. Our sequence shows that the ACP domain is at the NH2-terminal end of the multifunctional molecule, with an interdomain of about 950 amino acids between it and the putative β-ketoacyl reductase region. On the other hand, a stretch of about 300 amino acids lies between the reductase and the β-ketoacyl synthase, which is about 500 amino acids from the COOH terminus of the molecule. The very long interdomain separation of the ACP and the synthase may explain why the bifunctional reagent dibromopropanone did not cross-link, within an α subunit, the pantetheine-SH and the active cysteine-SH of the β-ketoacyl synthase (41). However, it did link all α subunits by cross-linking the cysteamine-SH of one subunit to the cysteine-SH of the adjacent subunit. This long stretch of amino acids, therefore, would have facilitated this cross-linking and would also help arrange the α subunit (plate-like) with the β subunit (arch-like) in the structures depicted by our model of the yeast fatty acid synthase complex (2). This model was based on electron microscope studies of negatively stained yeast synthase and depicts the synthase as an ovate structure containing, on its short axis, plate-like proteins (α subunits) around which six arch-like proteins (β subunits) are distributed, three on either side. The plate-like proteins are organized in such a way that the cysteine-SH of the β-ketoacyl synthase is juxtaposed within 2 Å of the pantetheine-SH of the adjacent α subunit.
The location, therefore, of the pantetheine prosthetic group on Ser-180 and the active cysteine-SH of β-ketoacyl synthase on Cys-1312 would make it possible to construct the site for carbon-carbon bond formation and, hence, chain elongation to produce long chain fatty acyl groups.