Sequence of the Small Subunit of Yeast Carbamyl Phosphate Synthetase and Identification of Its Catalytic Domain*

(Received for publication, November 22, 1983)

Hiroshi Nyunoya‡ and C. J. Lusty

From the Molecular Genetics Laboratory, The Public Health Research Institute of The City of New York, Inc., New York

The yeast gene CPA1 coding for the small subunit of arginine-specific carbamyl phosphate synthetase has been cloned by complementation of a cpa1 mutant with a plasmid library of total yeast chromosomal DNA. Two of the plasmids, pJL113/ST4 and pJL113/ST15, contain DNA inserts in opposite orientations with overlapping sequences of 2.6 kilobases. The nucleotide sequence of a 2.2-kilobase region of the DNA insert carrying the CPA1 gene has been determined. The CPA1 gene has been identified to be 1233 nucleotides long and to code for a polypeptide of 411 amino acids with a calculated molecular weight of 45,358. The amino acid sequence encoded in CPA1 is homologous to the recently determined sequence of the small subunit of Escherichia coli carbamyl phosphate synthetase (Piette, J., Nyunoya, H., Lusty, C. J., Cunin, R., Weyens, G., Crabeel, M., Charlier, D., Glansdorff, N., and Pierard, A. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 4134-4138) over the entire length of the polypeptide chain. Comparison of the amino acid sequences of the small subunits of yeast and E. coli carbamyl phosphate synthetases to the sequences of Component II of anthranilate and p-aminobenzoate synthases suggests that these amidotransferases are evolutionarily related. Based on the observed homologies in the primary sequences of the other amidotransferases examined, we propose a 13-amino acid long sequence to be part of the catalytic domain of this class of enzymes.
Carbamyl phosphate is an essential precursor of both arginine and pyrimidine biosynthesis. In most bacteria capable of arginine and pyrimidine biosynthesis, carbamyl phosphate is synthesized from glutamine, HCO3⁻, and 2 molecules of ATP by a single enzyme, glutamine-dependent carbamyl phosphate synthetase (1, 2). The enzyme is an oligomeric protein composed of two nonidentical subunits, a small subunit (~42 kDa) which functions in the transfer of glutamine amide nitrogen to a large subunit (~130 kDa) that catalyzes carbamyl phosphate formation from NH3, HCO3⁻, and ATP (3).
*These studies were supported by Grant GM 25846 from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ On leave from the Department of Biology, Okayama University, Okayama, Japan.
Two different enzymes, a pyrimidine-specific and an arginine-specific carbamyl phosphate synthetase, are present in yeast (4) and other fungi (5). The two enzymes are located in different subcellular compartments (6-8) and are separately regulated (2, 4, 9). The arginine-specific carbamyl phosphate synthetase catalyzes the same overall reaction and has the same subunit structure as the prokaryotic enzyme. A small subunit (~36 kDa) transfers glutamine amide nitrogen to a large subunit with the catalytic sites for carbamyl phosphate synthesis (10). In Saccharomyces cerevisiae, the small and large subunits are encoded by two unlinked genes, CPA1 and CPA2, respectively (4). The CPA2 gene has been cloned (11) and its nucleotide sequence determined (12). The adjacent genes carA and carB coding for the small and large subunits of carbamyl phosphate synthetase of Escherichia coli have also been cloned (13, 14). The amino acid sequence of the large subunit derived from the nucleotide sequence of carB (15) has been shown to be highly homologous to the amino acid sequence of the yeast large subunit of arginine-specific carbamyl phosphate synthetase. The primary sequence homologies of the two proteins indicate that the bacterial and fungal arginine-specific carbamyl phosphate synthetases are evolutionarily related and are derived from a common ancestral gene (12).
In  (20). Our analysis of the primary sequences together with published data on affinity labeling of the active site of anthranilate synthases of Serratia marcescens (21) and P. putida (19) has allowed us to identify a catalytic domain common to these enzymes.

MATERIALS AND METHODS
Cloning of Yeast CPA1-Yeast CPA1 was isolated by a previously described protocol (11) from a recombinant plasmid pool of total yeast nuclear DNA ligated to the BamHI site of the hybrid vector YEp13 (24). The CPA1 gene was selected by transformation of yeast strain JL113 carrying a mutation in the structural gene of the small subunit of arginine-specific carbamyl phosphate synthetase, a mutation in URA2 coding for pyrimidine-specific carbamyl phosphate synthetase, and a double mutation in LEU2. JL113 was transformed with 10 μg of the recombinant plasmid pool, and 28 independent clones (Leu+ Cps+) were obtained by selection for growth on minimal medium (2% glucose, 0.67% yeast nitrogen base without amino acids (Difco)/1.2 M sorbitol, 3% agar). Plasmid DNA from each of eight different transformants (JL113/T1-JL113/T8) was used to transform E. coli RR1 to ampicillin-resistance, tetracycline-sensitivity. The transforming plasmids were found by restriction analysis with EcoRI, Sun, and EcoRI plus Sun to contain identical yeast nuclear DNA inserts of 5.7 kb.¹ Since the size of the 5.7-kb insert was much larger than the anticipated length of the CPA1 gene (~1.5 kb), one of the plasmids, pJL113/T1, was used to subclone the CPA1 gene. Plasmid DNA from pJL113/T1 was partially digested (average size of 2-3 kb) with Sau3A and ligated to the BamHI site of the YEp13 vector. Transformation of S. cerevisiae JL113 with the new plasmid pool yielded a large number of Leu+ Cps+ transformants. A screen of 20 independent yeast clones complemented in the cpa1 mutation by the pool yielded two plasmids with yeast nuclear DNA inserts of 2.6 and 3.1 kb; the other plasmids carried larger inserts. Both plasmids were presumed to have the CPA1 gene in view of their ability to complement S. cerevisiae JL113.
The growth properties of the transformants (growth on minimal medium, and growth on minimal medium plus uracil (11)) indicated that the cloned genes complemented the cpa1 rather than the ura2 mutation.
DNA Sequence Analysis-The plasmids pJL113/ST4 and pJL113/ST15 were shown by restriction analysis to have yeast DNA inserts in opposite orientations with overlapping sequences of 2.6 kb. DNA fragments from both plasmids were used to determine the nucleotide sequence of a 2.2-kb region in which the CPA1 gene is located. Plasmid DNA digested with HindII plus SphI produced two fragments of 10.2 (vector) and 3.6 kb in the case of pJL113/ST4 or 3.2 kb with pJL113/ST15 (cf. Fig. 1). The smaller fragments containing the yeast DNA inserts were separated on preparative agarose gels and isolated. The purified fragments were further digested with EcoRI, DdeI, or BstEII plus MnoI. The isolated fragments were either 5'-end labeled with [γ-32P]ATP (5000 Ci/mmol, Amersham) and polynucleotide kinase (25) or were labeled after secondary cleavage with appropriate restriction endonucleases. Single strands were obtained by electrophoresis on polyacrylamide gels (25). The nucleotide sequence of the isolated single strands was determined by the method of Maxam and Gilbert (25).
S1 Nuclease Mapping and Sizing of Yeast Transcripts-The wild type S. cerevisiae strains D273-10B/A1 (a met-6) and LL2 (α leu2-3 leu2-122) were grown to midlogarithmic phase in 1% yeast extract, 2% peptone, 2% glucose and in minimal medium (2% glucose, 0.67% yeast nitrogen base without amino acids (Difco)) supplemented with 50 μg/ml of methionine or leucine. The transformant strain JL113/ST15 was grown in minimal medium. Total yeast RNA was obtained from spheroplasts as previously described (12). The poly(A)-containing fraction was isolated by chromatography on poly(U)-Sepharose 4B (26). S1 nuclease mapping of the 5' ends of the yeast transcripts was performed as described previously (12). A 5'-end labeled single-stranded fragment of DNA extending beyond the expected transcriptional start site of the CPA1 gene (MnoI-BstEII fragment, nucleotides -510 to +127 (cf. Fig. 1)) was hybridized with 20-100 μg of total yeast RNA or with 1-10 μg of poly(A)-containing RNA. In other experiments, the DNA probes were HinfI fragments (nucleotides -290 to -180) and (nucleotides -179 to +134). RNA-DNA hybridization was carried out in a volume of 30 μl of 40% formamide containing 40 mM Pipes, pH 6.4, 0.4 M NaCl, and 1 mM EDTA for 3 h at 37 or 45 °C. The annealed samples were diluted 1:10 into S1 nuclease buffer (27) and digested with different amounts of S1 (50-2500 units/ml in the case of total RNA, 20-2500 units/ml with poly(A)-containing RNA) for 30 min at 37 °C. As a control, the probe was also digested with S1 nuclease both in the absence of RNA, and in the presence of 20 μg of total yeast RNA without prior hybridization. After ethanol precipitation, the RNA-DNA hybrids were denatured in formamide and the DNA was separated on a sequencing gel adjacent to the chemically derivatized DNA probe.
¹ The abbreviations used are: kb, kilobase pairs; Pipes, 1,4-piperazinediethanesulfonic acid.
S1 nuclease mapping of the 3' termini of yeast transcripts was performed by using the same protocol. The probe, 3'-end labeled with [α-32P]dideoxy-ATP (5000 Ci/mmol, Amersham) and terminal deoxynucleotidyl transferase, was a single-stranded Ban-SphI fragment (588 nucleotides) extending from nucleotide +1214 to the SphI site of the vector (cf. Fig. 1).
The size of RNA transcripts was estimated by Northern analysis. Total RNA (2-20 μg) and poly(A)-containing RNA (0.2-2 μg) were denatured in 2.2 M formaldehyde and separated on 1.3% agarose gels containing 2.2 M formaldehyde (28). E. coli 16 and 23 S and yeast 18 and 26 S rRNAs were used as calibrating standards. After electrophoresis, RNA was transferred to nitrocellulose filters and hybridized as described by Thomas (29) with a radiolabeled probe prepared by nick-translation (30) of a 1088-nucleotide long BstEII-Ban fragment containing almost all the coding sequence of the CPA1 gene.
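Sizing a transcript against rRNA standards, as described above, is commonly done by fitting log10(size) against migration distance. The sketch below illustrates that general approach only; it is not the authors' procedure. The rRNA lengths are approximate, commonly cited values, and the migration distances are hypothetical numbers chosen for illustration.

```python
import math

# Approximate rRNA lengths in nucleotides (commonly cited values; assumptions).
STANDARDS_NT = {"yeast 26S": 3396, "E. coli 23S": 2904,
                "yeast 18S": 1800, "E. coli 16S": 1542}
# Hypothetical migration distances in cm; NOT measurements from the paper.
MIGRATION_CM = {"yeast 26S": 4.7, "E. coli 23S": 5.4,
                "yeast 18S": 7.4, "E. coli 16S": 8.1}


def fit_log_linear(distances, sizes):
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    ys = [math.log10(s) for s in sizes]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_size(distance, intercept, slope):
    """Interpolate a transcript size (nucleotides) from its migration distance."""
    return 10 ** (intercept + slope * distance)


names = list(STANDARDS_NT)
INTERCEPT, SLOPE = fit_log_linear([MIGRATION_CM[n] for n in names],
                                  [STANDARDS_NT[n] for n in names])
```

A band migrating between the 18 S and 16 S standards would then be sized with `estimate_size(distance, INTERCEPT, SLOPE)`.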

RESULTS
Nucleotide Sequence of CPA1-The yeast gene CPA1 coding for the small subunit of arginine-specific carbamyl phosphate synthetase (4) was isolated as described under "Materials and Methods." The two recombinant plasmids pJL113/ST4 and pJL113/ST15 with the CPA1 gene have 3.1- and 2.6-kb yeast DNA inserts, respectively, in the YEp13 shuttle vector (24) (Fig. 1). The presence of the CPA1 gene in the recombinant clones was confirmed by in vivo complementation of yeast strains with mutations in the structural genes of the small subunit of arginine-specific carbamyl phosphate synthetase and in the pyrimidine-specific carbamyl phosphate synthetase. The fact that uracil did not inhibit growth of the transformants indicated complementation of the cpa1 rather than the ura2 mutation (11).
The nucleotide sequence of the CPA1 gene was derived from a 2.2-kb region of the yeast DNA inserts of both pJL113/ST4 and pJL113/ST15. Almost the entire sequence of both strands was obtained by using the restriction sites shown in Fig. 1 for 5'-end labeling. The nucleotide sequence across each of the labeled sites was confirmed with a second set of overlapping fragments. The sequence presented in Fig. 2 represents approximately 85% of the cloned fragment. It has a continuous reading frame of 1233 nucleotides. The other five frames of the sequence are interrupted by frequent termination codons, suggesting that the open reading frame codes for the small subunit of yeast carbamyl phosphate synthetase. The CPA1 gene starts with an ATG initiation codon at nucleotide +1 and ends with the termination codon TAA at nucleotide +1234. The open reading frame shown in Fig. 2 codes for a protein of 411 amino acid residues with a calculated molecular weight of 45,358. This value is larger than the size (~36 kDa) of the yeast small subunit estimated by gel filtration (10), but is consistent with the size of the small subunit of E. coli carbamyl phosphate synthetase (3). As discussed later, the identification of the coding sequence is strongly supported by the homology of the encoded amino acid sequence with the E. coli small subunit (16). The assignment of the ATG codon at +1 as the translational start site of the CPA1 gene is consistent with the results of Northern analysis and S1 mapping of the 5' termini of yeast transcripts discussed below.
S1 Mapping of CPA1 Transcripts-The 5' termini of the CPA1 transcripts were studied in the wild type strains D273-10B/A1 and LL2 grown under repressed and derepressed conditions. Messenguy et al. (31) have reported a substantive increase in the level of CPA1 message when yeast are grown in the absence of amino acids (derepressed) compared to rich medium (repressed) where the general control of amino acid biosynthesis operates.
S1 analyses of CPA1 transcripts were also extended to the transformant JL113/ST15 grown in minimal medium. The initial experiments using a MnoI-BstEII probe (nucleotides -510 to +127) showed that the major transcripts had 5' starts between -244 and -231 (Fig. 3A). Fig. 3A also shows less prominent 5' starts downstream of -231. The significance of the minor transcripts is not clear. They do not appear to be artifacts, since they are more abundant under derepressed conditions and in the transformant strain harboring the gene on a multicopy plasmid (data not shown). The major transcripts starting at -244 and their relative abundance were gauged more accurately by using a HinfI probe extending from nucleotides -290 to -180 as shown in Fig. 3B. RNA from both wild type and transformant strains grown in minimal medium exhibited strongest 5' termini at -244, -243, -241, -236, and -231. Lower abundance transcripts were also observed with 5' starts at intermediate positions. That all of these starts are real is supported by the following observations: 1) probes incubated with S1 in the absence of RNA were completely degraded; 2) increased concentrations of S1 to as much as 2500 units/ml did not change the basic S1 pattern; and 3) identical 5' starts were observed under repressed conditions (Lane 8), although their relative abundance was much lower. This was true of the transformant (Lane 9) and of wild type yeast (compare Lanes 7 and 8). It is of interest that in the wild type strain LL2, a prominent 5' end was seen at -123. The absence of this transcript in the transformant and in the other wild type strain studied suggests some strain-specific heterogeneity in the DNA sequence.
Although the 5' leader of the CPA1 message has an ATG codon, the reading frame initiated by this codon is very short. Several features in the 5' leader sequence support the idea that the CPA1 gene starts at the ATG codon (+1) that initiates the 1233-nucleotide long reading frame. Upstream of the initiation codon are found transcription- and translation-related sequences common to other known yeast genes (33). Nucleotides -1 to -25 of the initiation codon are adenine-rich (33, 34) and contain a sequence CACAAA similar to the CACACA sequence noted in other yeast genes (33). Though located 143 nucleotides upstream of the major transcriptional start, a sequence TATATAA at -387 resembles the Goldberg-Hogness box. Another TATAT sequence is observed at -133.
S1 nuclease mapping of the 3' termini showed protected ends spaced within 21 nucleotides (+1291 to +1312). These results indicate CPA1 transcripts to have a 3' untranslated sequence of  nucleotides. Assuming 50 residues of poly(A) (35), and a 5' leader sequence of 231-244 nucleotides, the length of the major CPA1 message is 1.6 kilobases. This value is in good agreement with the results of Northern blot analyses that show RNA transcripts of 1.2-1.6 kilobases in size (Fig. 4).
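The 1.6-kilobase estimate above can be reproduced from the mapped coordinates. The following is a minimal arithmetic sketch; it assumes the numbering has no position 0 (so a start at -244 contributes 244 leader nucleotides and a 3' end at +1291 contributes 1291 nucleotides), and the function name is illustrative.

```python
def message_length(leader_nt, last_transcribed_position, poly_a_nt=50):
    """Total mRNA length: 5' leader + nucleotides +1..3' end + poly(A) tail.

    Assumes positions run ..., -2, -1, +1, +2, ... with no position 0.
    """
    return leader_nt + last_transcribed_position + poly_a_nt


# Shortest major transcript: start at -231, 3' end at +1291.
shortest = message_length(231, 1291)
# Longest major transcript: start at -244, 3' end at +1312.
longest = message_length(244, 1312)
print(shortest, longest)  # 1572 1606
```

Both bounds round to 1.6 kilobases, in line with the value quoted in the text.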
Codon Usage in CPA1-Codon utilization in CPA1 is summarized in Table I. Only four of the possible 61 codons are absent in the sequence: CGA and CGG (Arg), GCG (Ala), and AGC (Ser). AGA is preferred for arginine, GGU for glycine, and the UUA and UUG codons for leucine. Based on the method of Ikemura (40), the frequency of optimal codon usage in the CPA1 gene was calculated to be 0.63. This value is similar to the values calculated for CPA2 (12), TRP5 (41), and CYC1 (40) and is typical of moderately expressed genes in yeast.
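A codon-usage tally of the kind summarized in Table I can be sketched as follows. This is an illustrative outline only: the short coding sequence is a made-up toy, not the CPA1 sequence, and the helper names are hypothetical. It does not implement Ikemura's optimal-codon frequency, which needs a list of optimal codons not given in this text.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}
# The 61 sense codons of the standard genetic code.
SENSE_CODONS = sorted(
    "".join(c) for c in product("ACGU", repeat=3)
    if "".join(c) not in STOP_CODONS
)


def codon_counts(coding_seq):
    """Count codons in an in-frame coding sequence (DNA or RNA letters)."""
    rna = coding_seq.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # drop the terminator before counting
    return Counter(codons)


def absent_codons(counts):
    """Sense codons that never occur in the counted sequence."""
    return [c for c in SENSE_CODONS if counts[c] == 0]


# Toy example (hypothetical sequence): Met-Arg-Gly-Leu-Leu-stop.
toy_counts = codon_counts("ATGAGAGGTTTATTGTAA")
```

Applied to the full 1233-nucleotide reading frame, `absent_codons` would be expected to return the four codons named above.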
Amino Acid Sequence Derived from Nucleotide Sequence-Yeast carbamyl phosphate synthetase is extremely labile and has not been purified (10); therefore, no protein sequence data are available. That the 1233-nucleotide long reading frame codes for the small subunit of yeast arginine-specific carbamyl phosphate synthetase is substantiated by the extensive homology of the derived amino acid sequence to the amino acid sequence of the small subunit of E. coli carbamyl phosphate synthetase (16). The amino acid sequences of the two proteins are shown in Fig. 5. The two sequences exhibit an overall homology of 65.3% over the entire length of the polypeptide chain, with an average of only two deletions or insertions/100 amino acid residues. Of 357 possible matches, 148 (41.6%) amino acid residues are identical and 85 (23.8%) represent conservative replacements. Of particular significance is the homology at the NH2-terminal end, showing the amino acid sequence is strongly conserved up to the NH2 terminus of the E. coli protein. The extensive amino acid sequence homology shows the small subunits of yeast and E. coli to be as homologous as the large subunits of this enzyme (12).
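The homology figures above combine identities and conservative replacements over the aligned positions. A minimal sketch of that bookkeeping is shown below, assuming a pre-computed gapped alignment. The conservative-replacement groups and the two toy sequences are assumptions made for illustration; the paper's own substitution criteria are not stated in this text.

```python
# Assumed groups of residues treated as conservative replacements.
CONSERVATIVE_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                       set("DE"), set("NQ"), set("ST"), set("AG")]


def score_alignment(seq_a, seq_b):
    """Return (compared, identical, conservative) for two aligned sequences.

    Positions with a gap ('-') in either sequence are not compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = conservative = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
        elif any(x in g and y in g for g in CONSERVATIVE_GROUPS):
            conservative += 1
    return compared, identical, conservative


# Toy aligned pair (hypothetical residues, not the real proteins).
toy = score_alignment("MKLV-TDE", "MRIVAS-E")

# Overall homology from the counts reported in the text.
reported_overall = round(100 * (148 + 85) / 357, 1)
```

With the reported counts, 148 identities plus 85 conservative replacements out of 357 compared positions gives the 65.3% overall figure.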
Catalytic Domain of Small Subunit of Carbamyl Phosphate Synthetase and Other Amidotransferases-The small subunits of E. coli (42, 43) and yeast (10) carbamyl phosphate synthetase have been shown to function in glutamine amide transfer. The glutamine amidotransferases, which also include anthranilate synthase (44), formylglycinamide ribonucleotide amidotransferase (45), p-aminobenzoate synthase (46, 47), and glutamine phosphoribosylpyrophosphate amidotransferase (48), catalyze the hydrolysis of glutamine and transfer of NH3 to a distant ammonia site located either on a nonidentical subunit or in a different catalytic domain of the same polypeptide chain. The small subunit of E. coli carbamyl phosphate synthetase has been shown to have a reactive cysteine residue at the glutamine catalytic site (49). An active site cysteine has also been identified by affinity labeling studies of anthranilate synthase Component II (19, 21).
In an attempt to identify the active site responsible for glutamine amide transfer, we have searched for primary sequence homologies in the cysteine-containing regions of the yeast and E. coli enzymes and anthranilate synthase Component II from various organisms. A comparison of the yeast and bacterial small subunits with the different anthranilate synthases Component II revealed a highly conserved region composed of 13 amino acids. This region includes the previously identified active site cysteine residue of anthranilate synthase Component II of S. marcescens (21) and of P. putida (19). As shown in Table II, eight of the 13 amino acids in this region of the E. coli carbamyl phosphate synthetase and anthranilate synthase Component II are identical. Of the remaining five residues, all are conservative substitutions. This highly conserved sequence which we propose to be the active site of yeast and bacterial carbamyl phosphate synthetase is located around Cys-264 of the yeast enzyme and Cys-269 of the E. coli enzyme. The corresponding cysteine in the E. coli anthranilate synthase Component II is Cys-83.
(Figure legend fragment, S1 mapping) 10 μg of total RNA from transformant JL113/ST15 grown in minimal medium was hybridized to the probe and treated with 500 units/ml of S1 nuclease. The chemically derivatized probe was used as the sizing ladder. The numbers of the corresponding nucleotides in the DNA sequence are shown for the major starts. One and one-half nucleotides have been subtracted from the sequence positions to correct for the displacement of the 3' terminus in the sequencing ladder (32).
Active site cysteines have also been identified in studies of formylglycinamide ribonucleotide amidotransferase of Salmonella typhimurium (50) and chicken liver (51). Although the labeled peptides that were isolated are only five and seven residues long (Table II), the L-G-V-C sequences are identical to part of the amino acid sequences of anthranilate synthase Component II and carbamyl phosphate synthetase. Also shown in Table II is the amino acid sequence of the glutamine active site of phosphoribosylpyrophosphate amidotransferase (52, 53). The active site of the bacterial phosphoribosylpyrophosphate amidotransferases appears to be different from that of the other amidotransferases. For example, the active cysteine is the NH2-terminal residue of both proteins (52, 53).  of carbamyl phosphate synthetase and other amidotransferases suggested that these proteins might be related. Kaplan and Nichols (17) have recently shown that the amidotransferases encoded by the trp(G)D gene and the small subunit of p-aminobenzoate synthase Component II encoded by pabA appear to have evolved by gene duplication. Furthermore, there have been speculations that amidotransferases in general may have evolved from a common glutamine-utilizing enzyme (42, 44, 54).
Dot matrix analysis ( able regions of amino acid homology between residues 220-382 of carbamyl phosphate synthetase and anthranilate synthase Component II. The homologies revealed by the dot matrix lie on a diagonal line (data not shown). The extent of the homology is shown in Fig. 5 where the sequences of the yeast and E. coli  shown in the figure, although the other bacterial and fungal anthranilate synthases (18-20) were used to maximize the alignment. The deletions shown preserve the alignments of the anthranilate synthase Component II with p-aminobenzoate synthase Component II previously proposed by Kaplan and Nichols (17). The NH2-terminal region of anthranilate synthase Component II was difficult to align because of the lack of significant homology. In fact, it is impossible to align the first 35 residues of anthranilate synthase Component II with any part of carbamyl phosphate synthetase without introducing extensive insertions into the carbamyl phosphate synthetase sequence. The alignment shown for this region is tentative and other alignments are equally tenable. Between residues 220 and 382 of carbamyl phosphate synthetase and residues 35 and 192 of anthranilate synthase Component II, 41 (27%) amino acids are identical and 42 (28%) represent conservative substitutions, giving an overall homology of 55%.
The most highly conserved sequences are clustered and fall into three domains of the proteins labeled as A, B, and C in Fig. 5. A relationship of carbamyl phosphate synthetase and anthranilate synthase Component II was also evident from an analysis of their gene sequences. A dot matrix program scoring a dot for 50% or greater homology in a scan of 60 nucleotides revealed three homologous sequences in the carA and trp(G)D genes (Fig. 6B). The three conserved DNA sequences lie on a diagonal line and correspond to the conserved regions A, B, and C in the protein sequences.
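The dot matrix criterion described above (a dot wherever a window of nucleotides reaches a minimum fraction of matches) can be sketched as follows. This is a straightforward, unoptimized illustration written for this text, not the program the authors used; the function name and the toy sequences are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=60, min_fraction=0.5):
    """Return (i, j) window start positions that score a dot.

    A dot is scored when at least min_fraction of the `window` aligned
    positions starting at seq_a[i] and seq_b[j] are identical.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_fraction * window:
                dots.append((i, j))
    return dots


# Toy check (hypothetical sequences): seq_b is seq_a shifted by 3 bases,
# so exact-match windows should fall on the diagonal j = i + 3.
toy_a = "ATGGCCATTGTAATGGGCCGC"
toy_b = "CCC" + toy_a
toy_dots = dot_matrix(toy_a, toy_b, window=6, min_fraction=1.0)
```

Related regions show up as runs of dots along a diagonal; with the parameters in the text one would call `dot_matrix(gene_1, gene_2, window=60, min_fraction=0.5)`.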
On the basis of DNA sequence homology, Kaplan and Nichols (17) have found that trp(G)D is related to pabA, which codes for the small subunit of p-aminobenzoate synthase Component II. Comparing the nucleotide sequences of trp(G)D and pabA with a program which scored a dot when a minimum of 50% homology occurs in a stretch of 40 nucleotides, these authors reported six regions of homology between the two genes. As shown in Fig. 6A, an even more definitive evolutionary relationship is revealed with a program scoring 50% or greater homology in 60 nucleotides. The two sets of data, namely homology of carA and trp(G)D and of trp(G)D and pabA, suggested that all three genes are related. This was tested by analyzing the sequences of E. coli carbamyl phosphate synthetase and p-aminobenzoate synthase Component II. As indicated in the previous section, both enzymes are highly homologous in the vicinity of the active site cysteine residue. The protein sequence alignments shown in Fig. 5 indicate that the two proteins are significantly homologous in other regions as well. It is of interest that the greatest homology is seen in the previously mentioned regions A, B, and C.
(Legend fragment, amidotransferases) The boxed residues indicate identical and conserved hydrophobic amino acid residues. The asterisks denote active sites of amidotransferases at which the cysteine residues become labeled with reactive glutamine analogues such as 6-diazo-5-oxo-L-norleucine.
A dot matrix comparing the gene sequences of carA and pabA shows a clear line of homology in the region of the active cysteine residue (Fig. 6C). In this region the DNA sequence homology is 67%. Although the two genes have undergone extensive divergence, as evidenced by the absence of a clear diagonal in the dot matrix (Fig. 6C), they, nonetheless, appear to be related. The three regions of amino acid conservation average 49% homology at the nucleotide level. Statistically, this value is significantly above the maximal value of 30% for nonrelated genes in E. coli (17).

DISCUSSION
In previous studies (12), we reported that the large subunits of yeast arginine-specific carbamyl phosphate synthetase and E. coli carbamyl phosphate synthetase are structurally related.
The present studies were undertaken to determine the structure of the small subunit of yeast carbamyl phosphate synthetase and to establish its relation to the small subunit of the prokaryotic enzyme as well as to other glutamine amidotransferases. The yeast CPA1 gene, encoding the small subunit of arginine-specific carbamyl phosphate synthetase, was cloned on a recombinant plasmid and its nucleotide sequence has been determined. The gene is 1233 nucleotides long and codes for a polypeptide of 411 amino acids. The amino acid sequence of the polypeptide derived from the nucleotide sequence is homologous to the derived amino acid sequence of the small subunit of E. coli carbamyl phosphate synthetase.
Unlike the large subunit of yeast carbamyl phosphate synthetase whose gene has an internal duplication, there is no evidence for a duplication in CPA1.
The small subunits of E. coli (42) and yeast (10) carbamyl phosphate synthetases function in glutamine amide transfer. The amide N derived from glutamine is transferred to the large subunit of carbamyl phosphate synthetase and is subsequently used to form carbamyl phosphate from HCO3⁻ and ATP (42). This reaction mechanism is similar to the utilization of glutamine as donor of the amide group for the synthesis of anthranilate (44) and p-aminobenzoate (47). The latter reactions are also dependent on the transfer of glutamine amide N to an active site on another subunit of the synthases (44, 46, 47). These general properties of amidotransferases have raised the intriguing possibility of a common catalytic mechanism that may, in fact, have an evolutionary basis (42-44, 54). These earlier speculations to some extent have been substantiated by recent data clearly indicating that Component II of anthranilate synthase and p-aminobenzoate synthase are closely related both on the basis of their amino acid sequences and also in their gene sequences (17). The latter evidence has been suggested to indicate that these two enzymes arose from a gene duplication event (17).
The gene sequences coding for the small subunit of yeast and E. coli carbamyl phosphate synthetases as well as the derived amino acid sequences of the two proteins reported in this paper suggest that both the prokaryotic and eukaryotic enzymes are also members of a broader class of amidotransferases with a common evolutionary origin.
Both yeast and E. coli carbamyl phosphate synthetase share three regions of homology with E. coli anthranilate synthase Component II and p-aminobenzoate synthase Component II (A, B, and C in Fig. 5). The common sequences can be aligned in these different enzymes without introducing any major deletions or insertions. Regions A, B, and C of the three enzymes comprising 62 amino acid residues share 18 identities and 11 conservative replacements. Further evidence indicating that the three enzymes have a common ancestry was obtained from a comparison of their gene sequences. Computer analysis using a dot matrix program showed three homologous segments in the sequences of trp(G)D and carA. These corresponded to the three regions of amino acid conservation. Although a dot matrix comparing carA and pabA revealed only one homologous sequence, the limits set in the scanning program lead to an underestimate of the actual extent of homology of the two genes. Thus, the three regions exhibiting the highest primary sequence conservation average 49% homology in the DNA.
These data strongly argue that carA, trp(G)D, and pabA were derived from a common ancestral gene. A tentative evolutionary scheme of how the present genes of E. coli may have evolved is presented in Fig. 7. This scheme assumes that the ancestral gene was similar in size to the present-day pabA gene. Two duplication events are necessary to explain three related genes. To account for the greater sequence divergence of carA, we propose that this gene arose from the first duplication. An early duplication leading to carA is also consistent with the extensive homology of the E. coli and yeast proteins. The duplication event must, therefore, have occurred before the emergence of eukaryotes. That the ancestral forms of trpG and pabA arose from a later duplication is consistent with the higher degree of amino acid and DNA homology as well as the similarity of the catalytic function of the proteins. Further evolution of carA and trpG must have involved fusions and/or insertions of other sequences. In the case of trp(G)D, Component II of anthranilate synthase resulted from a fusion of the trpG gene coding for an amidotransferase and the trpD gene for a phosphoribosyl transferase (56-58). In carA, the sequences fused to or inserted into the NH2 terminus are almost the length of the ancestral amidotransferase. The function of the added sequences is not known.
Several lines of evidence point to region B (Fig. 5) as the active site involved in the amidotransferase activity of the three enzymes. It is the most highly conserved region of the proteins. Thirteen amino acid residues are almost identical not only among the various anthranilate synthases Component II but in carbamyl phosphate synthetase and p-aminobenzoate synthase Component II. Of special significance is the presence of an invariant cysteine residue. Glutamine utilization by most, if not all, amidotransferases depends on the participation of a cysteine residue (19, 44, 48-53, 59-61). In two of the bacterial anthranilate synthases, the active cysteine in the conserved sequence has been shown to be  (19, 21). These findings imply that region B is part of the catalytic domain of these enzymes. Some sequence data are also available for three other amidotransferases. The partial sequences of formylglycinamide ribonucleotide amidotransferase of S. typhimurium (50) and chicken liver (51) hint that these enzymes are also related to the general class of amidotransferases proposed here. Two short peptides of five to seven amino acids with the active cysteine have been isolated and sequenced. The sequences reported have a suggestive homology to the proposed catalytic sites of anthranilate synthase Component II and carbamyl phosphate synthetase. Other examples of glutamine-dependent amidotransferases which include phosphoribosylpyrophosphate amidotransferase (52, 53) and GMP synthetase (61) are more difficult to draw conclusions about. Although both transferases have active site cysteines next to a short stretch of hydrophobic residues, in at least one case where the entire protein sequence is known (62), it is not homologous to either anthranilate synthase Component II or carbamyl phosphate synthetase.
Despite the absence of an evolutionary relatedness of these enzymes to the other amidotransferases, it is conceivable that they may nonetheless have common structural features at the active sites. This will require crystallographic data on the tertiary structure of the enzymes.