Characterization and derivation of the gene coding for mitochondrial carbamyl phosphate synthetase I of rat.

The nucleotide sequence of rat carbamyl phosphate synthetase I mRNA has been determined from the complementary DNA. The mRNA comprises minimally 5,645 nucleotides and codes for a polypeptide of 164,564 Da corresponding to the precursor form of the rat liver enzyme. The primary sequence of mature rat carbamyl phosphate synthetase I indicates that the precursor is cleaved at one of two leucines at residues 38 or 39. The derived amino acid sequence of carbamyl phosphate synthetase I is homologous to the sequences of carbamyl phosphate synthetase of Escherichia coli and yeast. The sequence homology extends along the entire length of the rat polypeptide and encompasses the entire sequences of both the small and large subunits of the E. coli and yeast enzymes. The protein sequence data provide strong evidence that the carbamyl phosphate synthetase I gene of rat, the carAB gene of E. coli, and the CPA1 and CPA2 genes of yeast were derived from common ancestral genes. Part of the rat carbamyl phosphate synthetase I gene has been characterized with two nonoverlapping phage clones spanning 28.7 kilobases of rat chromosomal DNA. This region contains 13 exons ranging in size from 68 to 195 base pairs and encodes the 453 carboxyl-terminal amino acids of the rat protein. Southern hybridization analysis of rat genomic DNA indicates the carbamyl phosphate synthetase I gene to be present in single copy.

Carbamyl phosphate synthetase I catalyzes the synthesis of carbamyl phosphate from NH3, HCO3-, and 2 ATP and requires N-acetylglutamate as an allosteric activator (1). NH3-dependent carbamyl phosphate synthetase is found only in ureotelic animals, primarily in the liver and small intestine (2), where the enzyme plays an important role in removing excess NH3 from the cell. The ability of carbamyl phosphate synthetase I to function efficiently at low concentrations of NH3 is recognized to be an important step in the evolution of ureotelic metabolism (3, 4).
In recent studies, we have described the isolation of cDNA clones complementary to rat carbamyl phosphate synthetase I mRNA (5). Partial DNA sequence analysis of the cDNA has shown the gene encoding carbamyl phosphate synthetase I to be evolutionarily related to the genes encoding the glutamine-dependent carbamyl phosphate synthetases of Escherichia coli (6, 7) and yeast (8, 9). We have proposed that the gene encoding the NH3-dependent carbamyl phosphate synthetase I evolved from the genes coding for the glutamine-dependent carbamyl phosphate synthetases by gene fusion and subsequent mutation of the glutamine binding and catalytic domain (5).

*These studies were supported by Grant GM 25846 from the National Institutes of Health.
In this communication, we present the entire nucleotide sequence of carbamyl phosphate synthetase I mRNA. The sequence of the mRNA confirms our previous conclusion that the mitochondrial enzyme is a hybrid polypeptide encoded by a gene derived from two separate genes: one gene coding for a subunit that catalyzes glutamine amide transfer (glutamine subunit) and the second coding for a synthetase subunit. Analysis of the rat gene shows it to occur in single copy and to be composed of multiple exons separated by long intervening sequences.

MATERIALS AND METHODS
DNA and Enzymes-Enzymes were purchased from P-L Biochemicals, New England Biolabs, and Amersham Corp. Plasmid DNA was isolated by the method of Birnboim and Doly (10) and purified by chromatography on Sepharose 6B.
Carbamyl Phosphate Synthetase cDNA Clones-The construction of rat (Wistar strain) liver cDNA libraries and the isolation of a set of recombinant plasmids (pKB4, pKB21, pHN107, and pHN234) carrying cDNA inserts complementary to rat liver carbamyl phosphate synthetase I mRNA have been described previously (5, 11). The cDNA insert (2.1 kb) in the plasmid pKB4 was confirmed by hybrid-selected in vitro translation to direct the synthesis of a 165-kDa polypeptide corresponding to the precursor form of rat liver carbamyl phosphate synthetase I (11). A near full-length cDNA insert (5.3 kb) was carried in plasmid pHN234. Based on the size of carbamyl phosphate synthetase mRNA, this cDNA insert was only 300-400 nucleotides shorter than the expected length of the message. To select for clones containing the 5'-terminal sequences, we screened our Okayama-Berg rat liver cDNA library (1.8 X 10' transformants) by colony hybridization using as a probe an 800-bp PstI-XhoI fragment from the 5' end of pHN234. This screen yielded four new clones, pHN291, pHN292, pHN293, and pHN295, with identical cDNA inserts of 1.5 kb. Restriction analysis of the four new clones showed that the cDNA inserts overlapped by 600 bp the 5' region of the cDNA insert in pHN234. The four clones, however, contained 500 bp of a sequence not homologous to carbamyl phosphate synthetase mRNA. A complete screen of the cDNA library (6 X 10⁷ transformants) with a 175-bp RsaI-KpnI fragment from the 5' proximal sequence of the cDNA insert of pHN234 yielded another 68 confirmed positive clones. Of 28 clones analyzed by restriction mapping, four contained cDNA inserts identical to the insert in pHN234, and 24 [...] and single strands were separated by electrophoresis on polyacrylamide gels. The nucleotide sequences of both isolated single strands were determined.
Northern Blot Hybridization-The size of carbamyl phosphate synthetase mRNA was estimated by Northern analysis of rat liver poly(A+) RNA after denaturation in 2.2 M formaldehyde and separation on agarose gels containing 1.1 M formaldehyde (13). RNA was transferred to nitrocellulose and hybridized as described by Thomas (14) with a radioactive probe prepared by nick translation. The gels were calibrated with a mixture of E. coli and yeast rRNAs as described (8), ΦX174 RF DNA (5386 bp), and mp13 RF DNA (~7200 bp).
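Size estimation from marker bands conventionally assumes that, over the working range of the gel, the logarithm of nucleic acid length varies linearly with migration distance. A minimal Python sketch of such a semilogarithmic standard curve follows. The migration distances are hypothetical, only the two DNA marker sizes come from the text, and the function names are illustrative rather than part of any published procedure.

```python
import math


def fit_semilog(standards: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(size) = slope * distance + intercept.

    `standards` holds (migration_distance, size) pairs for the marker bands.
    """
    xs = [distance for distance, _ in standards]
    ys = [math.log10(size) for _, size in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance: float, slope: float, intercept: float) -> float:
    """Interpolate the size of an unknown band from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Hypothetical distances (mm); the sizes are the two DNA markers named in the text.
markers = [(21.0, 7200.0), (25.0, 5386.0)]
slope, intercept = fit_semilog(markers)
print(round(estimate_size(23.0, slope, intercept)))
```

With more marker bands (for example the rRNAs), the same least-squares fit simply uses more points; interpolation is more reliable than extrapolation beyond the outermost markers.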
Amino Acid Composition, NH2- and Carboxyl-terminal Analyses-The amino acid composition, NH2-terminal sequence, and carboxyl-terminal amino acids of mature carbamyl phosphate synthetase were determined on the rat liver enzyme, judged by SDS-polyacrylamide gel electrophoresis to be 98% pure. Lyophilized salt-free samples (154 µg) of the protein were hydrolyzed with 6 N HCl (containing a crystal of phenol) in evacuated sealed tubes for 22 h at 110 °C. The amino acids were analyzed on an automatic amino acid analyzer (Beckman, model 119CL) which had been calibrated with 10 nmol of a standard mixture. The average of three determinations was used in the calculations. The NH2 terminus of the enzyme was identified according to the method of Gros and Labouesse (15). Dansyl-amino acids released after acid hydrolysis of the dansylated protein for 4 and 18 h at 110 °C were identified on polyamide thin layers (16). The NH2-terminal sequence of the protein was determined by automated Edman degradation with a Beckman Sequencer, model 890C, using the 0.1 M Quadrol buffer system. Carbamyl phosphate synthetase was desalted on Sephadex G-25 equilibrated with 0.06 M triethylamine/trifluoroacetic acid, pH 8.9, and a sample (4 mg) was used for sequence analysis. Thiazolinones released at each cycle were converted to phenylthiohydantoins and identified by high performance liquid chromatography. The carboxyl-terminal sequence of the enzyme was determined by digestion of the protein with phenylmethylsulfonyl fluoride-treated carboxypeptidase A according to the procedures described by Ambler (17). Amino acids released by carboxypeptidase A digestion were identified by amino acid analysis and corrected for autodigestion with two controls: one control contained all components except carbamyl phosphate synthetase; the second contained all components except carboxypeptidase A.
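Converting analyzer readings into residues per mole of enzyme takes two steps: express the protein load in nanomoles, then divide the averaged amino acid yield by it. The Python sketch below illustrates that arithmetic under stated assumptions; it is not the authors' procedure. The molecular weight used for the conversion (here the sequence-derived 160,304 of the mature enzyme) and the fractional hydrolytic loss (a 10% decomposition correction is noted with the composition data) are caller-supplied inputs, and the helper names are hypothetical.

```python
from statistics import mean


def nmol_protein(protein_ug: float, mw: float) -> float:
    """Nanomoles of protein in a load: µg / (g/mol) gives µmol, times 1000 gives nmol."""
    return protein_ug / mw * 1000.0


def residues_per_mol(replicate_nmol: list[float], protein_ug: float, mw: float,
                     decomposition: float = 0.0) -> float:
    """mol of one amino acid per mol of enzyme from replicate analyzer readings.

    decomposition: assumed fractional loss during acid hydrolysis (0.10 for 10%).
    """
    recovered = mean(replicate_nmol) / (1.0 - decomposition)  # undo the hydrolytic loss
    return recovered / nmol_protein(protein_ug, mw)


load = nmol_protein(154.0, 160_304.0)
print(round(load, 3))  # 0.961 nmol of enzyme in a 154-µg sample
```

Because the result scales inversely with the assumed molecular weight, a composition computed this way is only as good as the molecular weight used for the conversion.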
Genomic Clones of Carbamyl Phosphate Synthetase I-Genomic clones of rat carbamyl phosphate synthetase I were isolated from a recombinant phage library of rat chromosomal DNA generously provided to us by Drs. James Bonner and Thomas D. Sargent (California Institute of Technology). The genomic library consists of 10-20-kb EcoRI fragments of rat (Sprague-Dawley strain) nuclear DNA cloned in the λ phage, Charon 4A (18). Bacteriophage were grown in E. coli DP50supF in NZYCDT medium, and recombinant phage (650,000 plaques) were screened by plaque hybridization (19) using as a probe a 770-bp PstI fragment from a cDNA clone (pCPSr1) kindly provided by Dr. William E. O'Brien (Baylor College of Medicine). After plaque purification, recombinant bacteriophage DNA from E. coli DP50supF was isolated as described by Blattner et al. (20) and Maniatis et al. (21). EcoRI and HindIII fragments of the rat nuclear DNA inserts were subsequently subcloned in pUC8.
Hybridization Analysis of Genomic DNA-Rat chromosomal DNA was isolated from livers of Wistar strain rats according to the procedure of Blin and Stafford (22). Genomic DNA (20 µg) was digested to completion with appropriate restriction endonucleases, separated on 0.75% agarose gels, and transferred to nitrocellulose as described by Southern (23). Hybridization with radioactive probes was carried out overnight at 42 °C in 5 X SSC, 5 X Denhardt's, 100 µg/ml salmon sperm DNA, 50 mM sodium phosphate, pH 8, 10 mM EDTA, 0.1% SDS, 50% formamide, and 10% dextran sulfate. The blots were washed twice for 10 min at room temperature with 2 X SSC, 0.1% SDS, followed by two washes for 10-15 min at 65 °C in 0.1 X SSC, 0.1% SDS. Blots were exposed to Kodak XAR-5 film with an intensifying screen for 24-48 h.

[...] longest cDNA insert, starting with a poly(A) tract and including 4215 nucleotides of the coding sequence. This clone, however, lacked the 5' nontranslated leader and part of the NH2-terminal coding region. The 5' region missing from pHN234 was isolated on plasmid pHN291, one of four recombinant plasmids with identical cDNA inserts that overlap the 5' proximal region of pHN234 but extend an additional 400 nucleotides further upstream. This upstream region included the entire NH2-terminal coding region of the mRNA as well as 139 nucleotides of 5' nontranslated leader. pHN291 and the other three identical plasmids also contained another sequence unrelated to carbamyl phosphate synthetase. This sequence, consisting of 528 nucleotides, was found to have a reading frame encoding the sequence of low molecular weight kininogen (data not shown). The mechanism by which the hybrid cDNA was formed during reverse transcription is not clear at present.

Nucleotide Sequence of Carbamyl Phosphate Synthetase I mRNA-The nucleotide sequence of carbamyl phosphate synthetase I mRNA was derived from the cDNA inserts of the five recombinant clones, pKB4, pKB21, pHN107, pHN234, and pHN291, according to the strategy outlined in Fig. 2. The entire nucleotide sequence was confirmed from the complementary strands. All of the labeled sites were crossed with a second set of overlapping fragments. The cDNA sequence corresponding to carbamyl phosphate synthetase I mRNA is presented in Fig. 3. The sequence (not including the poly(A) tract) is 5545 nucleotides in length. The mRNA, minimally, has a 5' nontranslated leader of 139 nucleotides, a continuous reading frame of 4500 nucleotides, and a 3' nontranslated region of 905 nucleotides. The hexanucleotide AATAAA located at nucleotides +5387 through +5392 conforms to a signal for polyadenylation (25). Fourteen nucleotides downstream, at nucleotide +5406, is the poly(A) addition site. A poly(A) tract of 100-110 adenines was found in the cDNA insert from pKB4; a tract of 41 adenines was sequenced from the cDNA insert of pHN234. The coding sequence begins with the ATG at nucleotide +1 and ends with a termination codon (TAG) at nucleotide +4501, followed at nucleotide +4519 by a second termination signal (TAGTGA). The 4500-nucleotide-long reading frame initiated with the ATG at nucleotide +1 codes for a polypeptide of 1500 amino acids with a calculated molecular weight of 164,564.
This value is in excellent agreement with the molecular weight (~165,000) of the precursor form of rat liver carbamyl phosphate synthetase I (26). The ATG at nucleotide +1 is preceded by a purine (A) at -3 and a C at -4, as found in the translational start sites of most eukaryotic genes (27, 28). In contrast, an upstream ATG located at nucleotide -44 in the 5' leader sequence is preceded by a pyrimidine (C) at -3 and is followed 12 bp downstream by a termination codon. Both features are common to other eukaryotic genes that contain ATG codons upstream of the major translational initiation site (27, 28). In the carbamyl phosphate synthetase leader, the upstream ATG at nucleotide -44 is centered within an imperfect inverted repeat (underlined in Fig. 3). A similar potential stem-loop structure is also found around an upstream ATG codon in the 5' leader sequence of the yeast gene CPA1, which codes for the small subunit of carbamyl phosphate synthetase (8).

The nucleotide sequence of the carbamyl phosphate synthetase I mRNA was consistent among all of the cDNA inserts of the recombinant clones, with the exception of two differences at nucleotides +3603 and +3669 of the cDNA insert of pKB21. Both differences, C→T at position 3603 and G→A at position 3669, occur in the third position of valine codons and are most probably due to a polymorphism.

FIG. 5. NH2-terminal sequence of mature rat liver carbamyl phosphate synthetase I. Automated Edman degradation from the NH2 terminus of the purified protein (2 5 1 nmol) was performed in a Beckman Sequencer, model 890C. PTH derivatives were identified by high performance liquid chromatography, and the peak areas were quantitated by automatic integration and comparison to a mixture of known standards. The yield (nmol) of the PTH derivatives released in each cycle is indicated in the lines below each sequence. PTH-serine and PTH-threonine were qualitatively identified. The yield of PTH-serine was low (3.9 and 1.6 nmol were recovered in cycles 1 and 2, respectively). The yield of glutamine was derived from the sum of PTH-glutamine and PTH-glutamic acid.

TABLE II. Amino acid composition of mature carbamyl phosphate synthetase I. Values are given as mol of residue/mol of enzyme.
Corrected for 10% decomposition.

FIG. 4. Rat liver poly(A+) RNA (10 µg) was denatured in 2.2 M formaldehyde, separated on a 1% agarose gel under denaturing conditions, and transferred to nitrocellulose. The RNA was hybridized with a nick-translated cDNA probe internal to the coding region.
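The statement that both pKB21 differences fall in the third position of valine codons can be checked from the nucleotide numbers alone, because coding nucleotide +n lies in codon ⌈n/3⌉. A minimal Python sketch follows, assuming (as in the sequence numbering used here) that +1 is the A of the initiator ATG. Since every valine codon has the form GTN, a third-position C→T change can only convert GTC to GTT and a G→A change GTG to GTA, so both substitutions are silent. The helper names are illustrative.

```python
VALINE_CODONS = {"GTT", "GTC", "GTA", "GTG"}


def codon_position(nt: int) -> tuple[int, int]:
    """Map coding nucleotide +nt (A of the initiator ATG = +1) to (codon number, position 1-3)."""
    if nt < 1:
        raise ValueError("coding nucleotides are numbered from +1")
    return (nt - 1) // 3 + 1, (nt - 1) % 3 + 1


def silent_in_valine(codon: str, new_third_base: str) -> bool:
    """True if replacing the third base of a valine codon still encodes valine."""
    mutated = codon[:2] + new_third_base
    return codon in VALINE_CODONS and mutated in VALINE_CODONS


for nt in (3603, 3669):
    print(nt, codon_position(nt))  # both map to position 3 of their codons

print(silent_in_valine("GTC", "T"), silent_in_valine("GTG", "A"))  # True True
```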
FIG. 7. Dot matrix analyses of the homology of the amino acid sequences of carbamyl phosphate synthetases of rat, yeast, and E. coli. A, matrix of the rat carbamyl phosphate synthetase I sequence versus the sequences of the glutamine and synthetase subunits of E. coli carbamyl phosphate synthetase. B, matrix of the rat carbamyl phosphate synthetase I sequence versus the sequences of the small and large subunits of yeast carbamyl phosphate synthetase. C, matrix plot of the internal homology in the synthetase component of rat carbamyl phosphate synthetase I. The main diagonal represents the homology between the sequences of the two proteins. The shorter diagonals above and below the main line represent the reciprocal internal homologies in the synthetase components of the three enzymes. E. coli CPS, E. coli carbamyl phosphate synthetase (6, 7); yeast CPS, yeast carbamyl phosphate synthetase (8, 9); rat CPSI, rat carbamyl phosphate synthetase I (this work).

[Fig. 8 alignment: 844-885; yeast CPS 864-905; rat CPSI 1259-1302]

Codon Usage-Codon usage in the carbamyl phosphate synthetase I mRNA is summarized in Table I.

Northern Analysis-The size of carbamyl phosphate synthetase I mRNA was estimated by Northern blot hybridization of rat liver poly(A+) RNA with a nick-translated probe from the coding region of the cDNA. As shown in Fig. 4, radioautographs revealed a major radioactive band whose length was estimated to be 5.7 ± 0.3 kb. The size of the transcript is consistent with sequence analysis of the corresponding cDNA: adding a poly(A) tract of 100 adenines to the 5545-nucleotide cDNA sequence, which already includes the 139-nucleotide 5' leader, gives a minimal full-length transcript of 5645 nucleotides.
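The transcript-length bookkeeping can be reproduced with a few lines of arithmetic. A minimal Python sketch, assuming the usual convention that the numbering jumps from -1 directly to +1 with no nucleotide 0; this convention is consistent with a 139-nucleotide leader and a poly(A) addition site at +5406 together giving 5545 nucleotides. The 100-adenine poly(A) tract is the same assumption made in the text, and `span_length` is an illustrative helper.

```python
def span_length(first: int, last: int) -> int:
    """Inclusive length of a region when numbering skips 0 (..., -2, -1, +1, +2, ...)."""
    length = last - first + 1
    if first < 0 < last:
        length -= 1  # there is no nucleotide 0 between -1 and +1
    return length


sequenced = span_length(-139, 5406)  # 5' end of the leader to the poly(A) addition site
codons = span_length(1, 4500) // 3   # reading frame ending just before the TAG at +4501
signal = span_length(5387, 5392)     # the AATAAA hexanucleotide
poly_a = 100                         # assumed poly(A) tract length, as in the text

print(sequenced, codons, signal, sequenced + poly_a)  # 5545 1500 6 5645
```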
Identification of Mature Carbamyl Phosphate Synthetase I-Rat liver carbamyl phosphate synthetase I has previously been shown to be synthesized as a precursor 5 kDa larger than the mature enzyme (26). The NH2-terminal start of the mature protein was determined in the present study by analysis of the amino-terminal residue and also by a partial sequence of the NH2-terminal region of the mature protein.
Dansylation and hydrolysis of the protein yielded dansyl-leucine and dansyl-serine, although serine appeared to be a minor component. A leucine NH2 terminus has also been reported by Clarke (29) for rat liver carbamyl phosphate synthetase I prepared by a different procedure. [...] degradation (Fig. 5). The first sequence started at Leu 39 and the second at Ser 40. Both sequences matched the sequence encoded by the mRNA for the 10 steps analyzed (compare Figs. 3 and 5). The average yield calculated from the stable amino acid derivatives released in cycles 3-5 indicated an approximately 1:1 ratio of the two NH2-terminal sequences. While these results suggest two adjacent processing sites involving cleavage on the carbonyl side of Leu 38 and/or Leu 39, the possibility of proteolytic removal of the NH2-terminal leucine during the isolation of the enzyme cannot be excluded. Digestion of purified carbamyl phosphate synthetase with carboxypeptidase A released 1.6 mol of alanine/mol of protein and 0.37-0.43 mol/mol of tyrosine, lysine, glycine, and serine. Since lysine is known to be a poor substrate for carboxypeptidase A, these data indicate that the COOH-terminal sequence starts with two alanines followed by lysine, followed by glycine, tyrosine, and serine, the order of the latter three being indeterminate. These data are consistent with the sequence of carbamyl phosphate synthetase encoded in the cloned mRNA (Fig. 3).
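The carboxypeptidase A values are net releases after background correction with the two controls described in Methods. One plausible reading of that correction, offered here as an assumption rather than the authors' exact calculation, subtracts both control readings from the digest and divides by the moles of substrate protein. The Python sketch below uses hypothetical analyzer readings, and the function name is illustrative.

```python
def corrected_release(sample_nmol: float, no_substrate_nmol: float,
                      no_enzyme_nmol: float, protein_nmol: float) -> float:
    """mol of an amino acid released per mol of substrate protein (assumed correction).

    no_substrate_nmol: control lacking carbamyl phosphate synthetase (carboxypeptidase autodigestion).
    no_enzyme_nmol: control lacking carboxypeptidase A (free amino acid already in the sample).
    """
    net = sample_nmol - no_substrate_nmol - no_enzyme_nmol
    return max(net, 0.0) / protein_nmol  # negative nets are treated as no release


# Hypothetical analyzer readings for one amino acid from a digest of 1.0 nmol of enzyme.
print(round(corrected_release(2.0, 0.3, 0.1, 1.0), 2))  # 1.6
```

A net value near 2 mol/mol (such as the 1.6 reported for alanine) is what one would expect if two residues of that amino acid are released incompletely.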
The correct identification of the coding sequence is also supported by the experimentally determined amino acid composition and the molecular weight of mature carbamyl phosphate synthetase. The amino acid composition of the purified rat enzyme is in excellent agreement with the composition of the mature protein predicted from the mRNA sequence (Table II). Assuming that the mature protein starts at Leu 39 of the cDNA sequence, its overall length of 1462 amino acid residues corresponds to a molecular weight of 160,304. This agrees well with the molecular weight of 158,700 previously determined by physical methods (30).
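A sequence-derived molecular weight is the sum of the residue masses plus one water for the free termini, and the mature length follows from the processing site: 1500 - 39 + 1 = 1462 residues. A minimal Python sketch of both calculations follows. It uses commonly tabulated modern average residue masses (the values used in 1985 may differ slightly), and because the full sequence is given only in Fig. 3, it is demonstrated on a dipeptide. Function names are illustrative.

```python
# Commonly tabulated average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # added once for the free NH2 and COOH termini


def molecular_weight(seq: str) -> float:
    """Average molecular weight of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in seq.upper()) + WATER


def mature_length(precursor_length: int, first_mature_residue: int) -> int:
    """Residues remaining after removing everything before `first_mature_residue`."""
    return precursor_length - first_mature_residue + 1


print(mature_length(1500, 39))           # 1462: mature protein starting at Leu 39
print(round(molecular_weight("GG"), 2))  # 132.12 (glycylglycine)
```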
Homology of Carbamyl Phosphate Synthetase I and Glutamine-dependent Carbamyl Phosphate Synthetases of Yeast and E. coli-The derived amino acid sequence of carbamyl phosphate synthetase I is homologous to the amino acid sequences of both yeast and E. coli carbamyl phosphate synthetases. The amino acid sequences of the rat and E. coli proteins are shown in Fig. 6. The sequence homology between [...]

The evolutionary relationship of the rat, yeast, and E. coli enzymes is even more clearly demonstrated by dot matrix (31) analyses, in which the sequence homology between the proteins is scored by using a mutation data matrix (32). As shown in Fig. 7, when the amino acid sequence of carbamyl phosphate synthetase I is compared to those of the E. coli [...]
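A dot matrix compares every window of one sequence with every window of another and marks the pairs that score above a threshold; homologous segments appear as diagonals, and an internal duplication shows up as off-center diagonals when a sequence is compared with itself. The Python sketch below illustrates the idea with simple identity scoring, a deliberate substitution for the mutation data matrix used in the paper. The window size, threshold, and test sequence are all hypothetical.

```python
def dot_matrix(a: str, b: str, window: int = 5, threshold: int = 4) -> list[tuple[int, int]]:
    """Return (i, j) for every window pair a[i:i+window], b[j:j+window]
    with at least `threshold` identical positions."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Identity scoring: count matching residues across the window.
            score = sum(a[i + k] == b[j + k] for k in range(window))
            if score >= threshold:
                hits.append((i, j))
    return hits


def render(rows: int, cols: int, hits: list[tuple[int, int]]) -> str:
    """Text dot plot: '*' marks the start of a high-scoring window pair."""
    marked = set(hits)
    return "\n".join(
        "".join("*" if (i, j) in marked else "." for j in range(cols))
        for i in range(rows)
    )


# Hypothetical sequence containing an exact tandem duplication, compared with itself.
unit = "ACDEFGHIKL"
seq = unit + unit
print(render(len(seq), len(seq), dot_matrix(seq, seq)))
```

Replacing the identity test with a substitution-matrix lookup would turn this into the scored comparison described in the text.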

[...] synthetase has previously been described for the E. coli and yeast enzymes (6, 9). The present data show the synthetase component of the rat enzyme to have the same duplication. The two halves of the rat synthetase component exhibit 23% identical residues; comparable values for the yeast and E. coli synthetases are 28.5 and 35%, respectively. Since the greatest sequence conservation among the three species occurs in the NH2-terminal halves of the synthetase subunits, the decreasing internal homology is due primarily to divergence in the carboxyl-terminal halves of the three enzymes.

Functional Domains-We have previously proposed three functional domains in E. coli and yeast carbamyl phosphate synthetases. Two of the domains were suggested to be involved in the binding of ATP (9), and the third domain was identified as the site of glutamine hydrolysis (8). The main feature of the glutamine hydrolytic site was the presence of a reactive cysteine residue, which had previously been shown to be essential for amidotransferase activity (33, 34). The absence of this cysteine residue in the sequence of rat carbamyl phosphate synthetase I led us to conclude that the glutamine site is modified such that it can no longer catalyze the hydrolysis of glutamine (5).
The postulated ATP-binding sites in the E. coli and yeast enzymes (9) are essentially conserved in the amino acid sequence of rat carbamyl phosphate synthetase I. The ATP-binding sites are located in two regions of the NH2-terminal half of the E. coli synthetase subunit. These sites have their counterparts in the carboxyl-terminal half. One of the sites (residues 302-352) exhibits sequence similarities to the ATP-binding site of phosphoglycerate kinase and to the predicted dinucleotide fold of glutamate dehydrogenase (9). In rat carbamyl phosphate synthetase I, the analogous domain is located between residues 718 and 768 (Fig. 8). This domain is highly conserved between the E. coli and rat sequences. Of 50 amino acid residues, 31 are identical and 10 represent functionally conserved substitutions. The corresponding domain (residues 1259-1302) in the carboxyl-terminal half of the rat synthetase component is also conserved (19 identities and 5 functionally conserved substitutions) (Fig. 8).
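Identity and similarity figures such as "31 identical and 10 functionally conserved of 50" or "23% identical residues" depend on two choices: which substitutions count as conservative, and which positions form the denominator. A minimal Python sketch follows. The conservative groups below are an assumed, commonly used grouping, since the paper does not state its criteria, and gap positions are excluded from the count. Helper names are illustrative.

```python
# Assumed grouping of "functionally conserved" substitutions; the paper does not list its criteria.
CONSERVATIVE_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")


def same_group(a: str, b: str) -> bool:
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def compare_aligned(x: str, y: str) -> tuple[int, int, int]:
    """Count (identical, conservative, aligned) positions in two gapped, equal-length strings."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    identical = conservative = aligned = 0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            continue  # gap positions are not counted
        aligned += 1
        if a == b:
            identical += 1
        elif same_group(a, b):
            conservative += 1
    return identical, conservative, aligned


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


# Counts reported for the rat domain (residues 718-768) compared with E. coli:
print(percent(31, 50), percent(31 + 10, 50))  # 62.0 82.0
```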
The second ATP-binding site (residues 152-210 in the NH2-terminal half and residues 698-756 in the COOH-terminal half of the E. coli large subunit) is based on sequence similarities with the glycine-rich loop of the ATP-binding site of adenylate kinase and the β-subunit of F1-ATPase (9). The analogous domains in the rat sequence (residues 571-626 and 1113-1171) are less conserved. Of 55 amino acid residues, 17 are identical (data not shown).
A fourth functional domain of rat carbamyl phosphate synthetase I is absent in both the E. coli and the yeast enzymes. This domain consists of 38 residues, starting from the NH2-terminal methionine. This sequence contains 8 basic amino acid residues, 1 acidic residue, and a Pro-Gly sequence 4 residues before the start of the mature enzyme, features common to signal sequences that direct proteins for import into mitochondria [...].

The exons in both clones were localized by Southern hybridization and by sequence analysis. The two clones contained the sequence of the cDNA from nucleotide 3142 through 4529. The coding region, comprising 1359 nucleotides, was found to be split into 13 separate exons whose positions are shown in Fig. 9. With the exception of exons 9 and 10, all of the remaining 11 exons were sequenced and their boundaries identified (Fig. 10). The localization of exons 9 and 10 was determined by restriction mapping and Southern hybridization within the limits indicated by the brackets in the figure.
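The coordinates reported for the genomic clones are internally consistent, as a quick calculation shows. The sketch below assumes that the region starts at cDNA nucleotide 3142 and that the last coding nucleotide is +4500 (the TAG termination codon begins at +4501). It recovers the 1359 coding nucleotides, the 453 carboxyl-terminal amino acids, and the 1388 nucleotides of mRNA sequence present in the two clones. The helper name is illustrative.

```python
def region_length(first: int, last: int) -> int:
    """Inclusive length of a region numbered from the A of the initiator ATG (+1)."""
    return last - first + 1


CLONE_START, CLONE_END = 3142, 4529  # cDNA nucleotides present in the two genomic clones
LAST_CODING_NT = 4500                # the TAG termination codon starts at +4501

covered = region_length(CLONE_START, CLONE_END)
coding = region_length(CLONE_START, LAST_CODING_NT)
print(covered, coding, coding // 3)  # 1388 1359 453
```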
Even though the two clones did not have overlapping sequences, it was possible to show that they represented contiguous segments of the gene. Southern hybridization analysis of rat genomic DNA digested with HindIII revealed a 2.6-kb fragment that hybridized to a cDNA probe containing the sequence encompassed by exons 3 through 7 (see Fig. 9). Based on the location of the HindIII sites in λ10cps and λ1cps, the 2.6-kb HindIII fragment should include the EcoRI site common to the two clones. We cannot, however, exclude the possibility that the genomic DNA contains another EcoRI site (<100 bp) separating the two cloned fragments.
The sequences of the 11 exons, which range from 68 to 195 bp in length, showed no discrepancies with the sequence of the cDNA reported in Fig. 3. All the intron-exon boundaries conformed to the GT-AG rule (36) and to the consensus sequence compiled by Mount (37). The two clones contain 28.7 kb of genomic DNA but only 1388 bp of mRNA sequence, a ratio of intron to exon length of about 20:1. Assuming the same ratio for the rest of the gene, we estimate that the rat carbamyl phosphate synthetase I gene spans approximately 110-120 kb of rat chromosomal DNA.
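The gene-size estimate is a proportional extrapolation. First, compute the intron-to-exon length ratio in the cloned region. Then scale the full mRNA length by (1 + ratio). A minimal Python sketch follows. It treats all non-exon DNA in the 28.7 kb as intron, which slightly overstates the ratio because the clones also carry flanking DNA beyond the outermost exons. It applies the rounded 20:1 ratio to both the 5545-nucleotide sequence and the 5645-nucleotide minimal transcript. Function names are illustrative.

```python
def intron_exon_ratio(genomic_bp: int, exon_bp: int) -> float:
    """Ratio of non-exon to exon length within a cloned genomic region."""
    return (genomic_bp - exon_bp) / exon_bp


def estimated_gene_size(mrna_nt: int, ratio: float) -> float:
    """Exons plus introns, if introns are `ratio` times as long as exons throughout the gene."""
    return mrna_nt * (1 + ratio)


ratio = intron_exon_ratio(28_700, 1_388)
print(round(ratio, 1))  # 19.7
print(estimated_gene_size(5_545, 20), estimated_gene_size(5_645, 20))  # 116445 118545
```

Both extrapolations fall inside the 110-120 kb range given in the text.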
Extensive Southern hybridization analysis of rat genomic DNA indicates that the carbamyl phosphate synthetase I gene is present in single copy (data not shown).

DISCUSSION
Carbamyl phosphate synthetase of E. coli and of yeast are each composed of two different subunits encoded by two different genes (38). The smaller subunit catalyzes the transfer of the amide-N from glutamine to a catalytic center for carbamyl phosphate synthesis located on the larger synthetase subunit (39, 40). Previous studies of the E. coli (6, 7) and of the yeast (8, 9) genes have shown that the glutamine subunits are homologous and related to other amidotransferases (8). The genes for the larger synthetase subunits were found to have undergone a gene duplication resulting in a polypeptide with two homologous halves (6, 9). The present studies were undertaken to establish the relation of the mammalian NH3-utilizing enzyme carbamyl phosphate synthetase I to the glutamine-specific carbamyl phosphate synthetases of bacteria and yeast.
The entire coding sequence of the rat liver carbamyl phosphate synthetase I mRNA has been determined from overlapping cDNA clones. The message includes a short 5' leader of at least 139 nucleotides, 4500 nucleotides of coding sequence, and a long 3' nontranslated extension of 905 nucleotides. Analysis of two genomic clones selected from a λ library indicates that the gene contains multiple exons. The two clones studied represent a total of 28.7 kb of genomic DNA, of which only 1388 bp are present in the mRNA.
The predicted primary translation product of the carbamyl phosphate synthetase I mRNA is a 164,564-Da protein. This precursor is cleaved at Leu 38 and/or Leu 39 to yield a mature carbamyl phosphate synthetase I of 160,304 Da. These molecular weights are in very good agreement with previously published data on the sizes of the precursor and mature forms of the rat enzyme estimated on SDS-polyacrylamide gels (26, 29, 30) and by sedimentation equilibrium in guanidine hydrochloride (30).
The derived amino acid sequence of rat carbamyl phosphate synthetase I has revealed several important facts that bear on its evolutionary origin. First, the complete sequence of the message has confirmed our previous conclusion that the mammalian enzyme is a fusion polypeptide of a glutamine amide transfer component and of a synthetase component (5). The amino acid sequences of the glutamine component located at the NH2 end of the enzyme and of the fused synthetase component are both homologous to the separate subunits of the E. coli and yeast enzymes. The homology is unambiguous and extends across the carboxyl-terminal end of the small subunit and the NH2 terminus of the larger synthetase subunit. This suggests that the mammalian gene was formed by a simple gene fusion event, either by a mechanism similar to that proposed for the fusion of the his genes (41) or by nonhomologous (illegitimate) recombination (42, 43). Comparison of the protein sequences suggests that the fusion probably occurred some time after the fungi diverged from the animal line, but before the separation of the chordate line, which includes cartilaginous and bony fishes, amphibians, and mammals. In all these organisms, arginine-specific carbamyl phosphate synthetase consists of a single 160-kDa polypeptide, suggesting fusion to be an early evolutionary event. A probable scheme depicting the evolution of the mammalian enzyme is presented in Fig. 11. This scheme incorporates our earlier suggestion that the small glutamine subunit of glutamine-dependent carbamyl phosphate synthetase was derived from a fusion of an ancestral gene coding for the glutamine subunit with another unidentified gene (8). The synthetase subunit of the prokaryotic and fungal enzymes arose by a tandem duplication of an ancestral kinase (6). The present study provides strong evidence that the single gene of mammalian carbamyl phosphate synthetases was formed by a later fusion of the genes for the glutamine and synthetase subunits.
Since all mammalian carbamyl phosphate synthetases are localized in the mitochondrion, their evolution requires the further acquisition of a signal peptide directing the protein for import into the organelle.
Even though the two halves of the synthetase component have diverged to a greater extent in mammalian carbamyl phosphate synthetase (23% identity) than in yeast (28.5%) and E. coli (35%), certain regions are highly conserved. Among these is a domain which we previously proposed to be a possible nucleotide-binding site of the synthetase. This domain has been conserved in both the NH2-terminal (residues 718-768) and carboxyl-terminal (residues 1259-1302) halves of rat liver carbamyl phosphate synthetase I. The conservation of these sequences against a general loss of sequence identity between the two halves of the mammalian synthetase provides additional evidence that both domains serve an important catalytic function.
Another functional domain previously identified to be involved in the transfer of glutamine amide-N has also been conserved in the glutamine component of mammalian carbamyl phosphate synthetase. As noted previously, however, a cysteine residue shown in other amidotransferases to be essential for glutamine hydrolysis has been substituted by a serine (5). This substitution accounts for the inability of the mammalian synthetase to derive NH3 from glutamine. Paluh et al. (44) have recently found that the site-specific substitution of a glycine for the homologous cysteine (Cys 84) in the catalytic site of anthranilate synthase Component II abolishes glutamine but not NH3 utilization by anthranilate synthase.
An important property of mammalian carbamyl phosphate synthetase I is its almost absolute requirement for acetylglutamate for enzymatic activity. Since acetylglutamate is not consumed in the reaction, it acts as an allosteric activator of the synthetase (45). Acetylglutamate has been shown to bind with high affinity to mammalian carbamyl phosphate synthetase I (45), although the binding site has not been identified. Acetylglutamate could bind either to the modified glutamine domain or to some new site in the fused protein. Although the former possibility is attractive, there is evidence from studies of carbamyl phosphate synthetase III of teleost fish that acetylglutamate binds to a site separate from the glutamine-binding site (46). Carbamyl phosphate synthetase III utilizes glutamine like the prokaryotic enzyme but requires acetylglutamate for activity (47, 48). Casey and Anderson (46) have recently shown that acetylglutamate and glutamine bind to two separate but interacting sites. We, therefore, favor the idea that in the mammalian enzyme, acetylglutamate interacts with a site distinct from the glutamine domain.