Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian hamster multifunctional protein CAD.

Glutamine-dependent carbamoyl-phosphate synthetase (EC 6.3.5.5) catalyzes the first step in de novo pyrimidine biosynthesis. The mammalian enzyme is part of a 240-kDa multifunctional protein which also has the second (aspartate carbamoyltransferase, EC 2.1.3.2), and third (dihydroorotase, EC 3.5.2.3) activities of the pathway. Shigesada et al. (Shigesada, K., Stark, G.R., Maley, J.A., and Davidson, J.N. (1985) Mol. Cell Biol. 175, 1-7) produced a truncated cDNA clone from a Syrian hamster cell line that contained most of the coding region for this protein. We have completed sequencing this clone, known as pCAD142. The cDNA insert contained all of the coding region for the glutaminase (GLN) and carbamyl phosphate synthetase (CPS) domains but lacked a short amino-terminal segment. By comparing the primary structure of the mammalian chimera to monofunctional proteins we have identified the borders of the functional domains. The GLN domain is 21 kDa, close to the size of the functionally similar polypeptide products of the Escherichia coli pabA and hisH genes. The domain has the three regions of homology common to trpG-type glutamine amidotransferases, as well as a fourth region specific to the carbamyl phosphate synthetases. The CPSase domain is similar to other reported CPSases in size (120 kDa), primary structure (37-67% amino acid identity), and homology between its amino and carboxyl halves. Analysis of the nucleotide and amino acid sequence identities among the various carbamyl phosphate synthetases suggests that the gene fusion which joined the GLN and CPS domains was an early event in the evolution of eukaryotic organisms and that the Saccharomyces cerevisiae enzyme consisting of separate subunits arose by defusion from an ancestral multifunctional protein.

A glutamine amidotransferase domain is a common feature of a large group of biosynthetic enzymes (21,22). Based upon sequence homology these enzymes are separated into the trpG-type and purF-type glutamine amidotransferases (23) although a common mechanism has been proposed for both classes (24). The amidotransferases are noted for their structural diversity. The domain exists as a separate subunit or as part of a larger polypeptide fused in different ways to a synthetase domain or other structural elements (25). Despite these differences in structural organization the function of a glutamine amidotransferase and its associated synthetase are closely integrated.
Often the catalytic activities are mutually coupled via ligand-induced conformational changes such that the binding of substrates to one of the domains results in activation of the other (26-31).
The CPS plot compared hamster CAD, rat CPSI, and the cpa2 and E. coli carB gene products.

RESULTS
The sequence of the GLNase and CPSase domains of CAD was determined by sequencing both strands of the pCAD142 insert following the strategy outlined in Fig. 1. The nucleotide sequence (Fig. 2) begins at the end of the poly(G) segment created during cloning (53) and extends for over 3900 nucleotides to the end of the CPS coding region and confirms approximately 800 nucleotides reported at the junction of the glutaminase and synthetase domains (54). This completes the coding sequence found on pCAD142 but leaves undetermined the short 5' region of the mRNA coding for the amino terminus.
This latter region of the protein has no known catalytic activity but is necessary for the association of the E. coli GLNase and CPSase subunits (55). The deduced amino acid sequences of the glutaminase (Fig. 3) and carbamyl phosphate synthetase (Fig. 4) were aligned to those of other CPSases. Based upon the size of the E. coli pabA gene product, the GLNase domain of CAD starts at glycine 19 and ends at threonine 201. The abrupt appearance of strong homology at lysine 240 marks the amino terminus of the CPSase. Alignment of the amino half of the CAD CPSase (Lys-240 to Pro-778) with the carboxyl half (His-779 to Cys-1300) gave a 26% sequence identity and identified histidine 779 as the start of the carboxyl half. The CPSase extends to cysteine 1300, where the dihydroorotase domain begins (12). The synthetase, as defined, is 8 residues shorter on the amino terminus and 6 residues shorter on the carboxyl terminus than the carB gene product from E. coli. The connecting region between the GLNase and CPSase is 39 amino acids long. There are remnants of sequence identity in this linker (Fig. 5) among the hamster CAD, Drosophila CAD, and yeast ura2 protein products and, to a lesser extent, the Dictyostelium pyr1-3 gene product.
The domain structure of the CAD CPSase was probed by controlled proteolysis.
A 60-kDa trypsin fragment was isolated and its amino terminus sequenced by Edman degradation. Based upon the size of the polypeptide it would extend from isoleucine 600 to lysine 1140 and include the junction between the two homologous halves of the CPSase. The resistance of the 60-kDa fragment to further proteolysis suggests that no appreciable exposed linker connects the amino and carboxyl halves of the CPSase.
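The size argument above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' procedure: it assumes an average residue mass of about 110 Da (a common rule of thumb that is not stated in the text), and the function names are hypothetical.

```python
# Rough polypeptide mass from a residue range.
# ASSUMPTION: ~110 Da per residue is a rule-of-thumb average, not a value
# taken from the paper; actual masses depend on amino acid composition.
AVG_RESIDUE_MASS_DA = 110.0


def segment_length(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


def approx_mass_kda(first: int, last: int,
                    avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate mass in kDa of the inclusive residue range first..last."""
    return segment_length(first, last) * avg_mass_da / 1000.0


if __name__ == "__main__":
    # Proposed trypsin fragment: isoleucine 600 to lysine 1140.
    print(segment_length(600, 1140))             # 541 residues
    print(round(approx_mass_kda(600, 1140), 1))  # about 59.5 kDa
```

Under this assumption the Ile-600 to Lys-1140 segment comes to roughly 59.5 kDa, consistent with the 60-kDa fragment. The same estimate applied to Lys-240 to Cys-1300 gives about 117 kDa, close to the 120 kDa quoted for the CPSase domain.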
Nucleotide sequence alignments were derived directly from the aligned amino acid sequences, and both were used to calculate the identity matrices and dendrograms showing the phylogenetic relationship of the proteins (Fig. 6). The CPSases were compared to homologous regions of yeast pyruvate carboxylase (56) and chicken acetyl-CoA carboxylase (57), two enzymes which also phosphorylate bicarbonate to produce carboxy phosphate and can synthesize ATP from carbamyl phosphate and ADP (Fig. 7). These two proteins are more closely related to each other than to the CPSases (36 versus 18% amino acid sequence identity). A dendrogram based upon amino acid identity (not shown) indicates that the ancestral gene common to the yeast pyruvate carboxylase (YPCase) and chicken acetyl-CoA carboxylase (CACCase) diverged before the duplication that produced the homologous amino and carboxyl halves of the CPSases. A homology plot of the GLNase domain, computed as described above, revealed four distinct peaks (Fig. 7). In addition to the three homologous regions characteristic of trpG-like amidotransferases, there is a fourth cluster of conserved residues (threonine 136 to aspartate 145) which appears only in the carbamyl phosphate synthetases. The 10 regions of strict conservation that map within the synthetase domain (Fig. 8)
The glutaminase was compared from glycine 19 to valine 202 and the synthetase from lysine 240 to cysteine 1300. The calibration at the left of the dendrograms marks percent differences. The numbers in the dendrograms represent the average number of changes required to make a sequence on one branch identical to a sequence on another. The top row of matrices gives percent identities while the bottom row provides the number of identities (above right) over the length of the region compared (counting dashes). The abbreviations are given in the legends to Figs. 3 and 4.
discrete functional units.
These domains have acted as modules during evolution and appear in a wide variety of related proteins. By comparing the primary structure of CAD to its well characterized homologues we are able to assign likely functions to many regions of the protein. The functions of the three homologous segments (Fig. 3)
In addition to the three regions characteristic of the trpG-type amidotransferases, the CPSases have a fourth region of homology located between the cysteine and histidine boxes (Fig. 3). Although there is no direct evidence to assign a function to this region, its absence from the other amidotransferases suggests a role in proper orientation and interaction with the CPSase domain.
The likely ATP binding sites of the CPSases have been proposed based upon comparisons to other well characterized ATP binding proteins (36, 41). These sites fall within the region homologous to YPCase (56) and CACCase (57) as well as the domains photolabeled by 8-azido-ATP (41). These investigations have established the approximate location of the ATP binding domains. For hamster CAD this represents the segment from glycine 343 to proline 586 on the amino half and glycine 877 to proline 1109 on the carboxyl half of the protein (Fig. 7). There are protease-sensitive sites, probably indicative of interdomain linkers, near the carboxyl ends of the ATP binding regions of both amino and carboxyl halves of the CPS domains. Cleavage at these two sites produces the 60-kDa trypsin fragment.
Crystallographic studies suggest that ATP binding sites typically contain three segments: a glycine-rich loop, an α-helix, and a hydrophobic parallel β-pleated sheet (60). However, these structural components do not require a highly specific primary structure for function. Therefore, despite the growing data base of CPSase and carboxylase sequences, these three regions of secondary structure could not be clearly identified, nor could we confirm either of the specific binding sites already proposed (36, 41).
The ATP binding site assigned by homology to YPCase and CACCase ends about 190 residues (or 21 kDa) before the start of the DHOase. As suggested previously (61), this domain on the carboxyl end of the CPSase may act as a regulatory domain controlling the activity of the carbamyl phosphate synthetase. The phosphorylation of hamster CAD at serine 1165 enhances the binding of ATP and reduces the inhibitory effect of UTP (62). The binding sites for 5-phosphoribosyl-1-pyrophosphate and UTP may also be located in this region. These effectors must bind within the GlnCPS since regulation by UTP and 5-phosphoribosyl-1-pyrophosphate is not eliminated after the ATC and DHO domains have been removed by proteolysis (5, 63). UTP inhibits hamster CAD while protecting the carboxyl end of the CPSase from proteolytic cleavage (64). In CPSI this region binds the activator N-acetylglutamate with effects similar to 5-phosphoribosyl-1-pyrophosphate binding to CAD (61). Both ligands increase the apparent affinity of the CPSase for ATP. This region has significantly lower homology than the rest of the CPSase, possibly reflecting the diversity of effectors that bind to the different CPSases.
Because the region between the CPSase and ATCase domains, which includes the DHO domain in CAD, is homologous in hamster, Drosophila, Dictyostelium, and Saccharomyces (12, 65, 66), the 240-kDa multifunctional protein initiating pyrimidine biosynthesis was probably formed prior to the divergence of these organisms. The dendrograms based upon the amino acid comparisons show the CPSI and cpa2 gene product diverging before the formation of the chimeric proteins. The dendrogram derived from the nucleotide alignment is quite different. It shows that ura2 is significantly closer to cpa2 than to the other chimeric proteins. Similarly the nucleotide sequence of CPSI is more closely related to hamster CAD than to cpa2. Part of the reason for the differences is that codon preferences may tend to cluster nucleotide sequences from closely related organisms, while a change in ligand specificity, or other function, can increase the rate of divergence of proteins. The loss of glutaminase activity and the acquisition of N-acetylglutamate regulation in CPSI, and the lack of allosteric regulation of cpa2 should have allowed mutations that would make these proteins appear more distant in the amino acid comparison.
Although it has been suggested that CPSI was formed by the fusion of the glutaminase with the synthetase (35, 67), our analysis raises the possibility that CPSI, cpa1, and cpa2 arose by defusion from the pyrimidine chimera. It is noteworthy that the carboxyl terminus of the cpa1 and amino terminus of the cpa2 gene products are longer than expected based upon the length of the E. coli carA and carB gene products. If the homology between these extensions is the result of a common origin they too may have split from a fused protein (Fig. 5). These comparisons therefore suggest a complex evolutionary history for this family of proteins involving multiple gene duplications, fusions, defusions, and extinctions.
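The identity matrices behind these comparisons report the number of identical positions over the length of the aligned region, counting dashes. A minimal sketch of that calculation follows. It assumes that gap columns count toward the length but never count as identities, which is one reading of "counting dashes". The function names and the toy sequences are hypothetical and are not taken from the alignments in Figs. 3 and 4.

```python
from typing import Dict, Tuple

GAP = "-"


def percent_identity(seq_a: str, seq_b: str) -> Tuple[int, int, float]:
    """Return (identities, aligned length, percent identity) for two rows
    of an alignment. Gap columns count toward the length but never as
    identities (ASSUMPTION about how dashes were handled)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("aligned sequences must not be empty")
    identities = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != GAP)
    return identities, len(seq_a), 100.0 * identities / len(seq_a)


def identity_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent identity for every ordered pair of named aligned sequences."""
    return {
        (m, n): percent_identity(aligned[m], aligned[n])[2]
        for m in aligned
        for n in aligned
    }


if __name__ == "__main__":
    # Toy rows for illustration only, not real CPSase sequences.
    toy = {"seqA": "MKT-LLV", "seqB": "MRTALLV"}
    print(percent_identity(toy["seqA"], toy["seqB"]))
    print(identity_matrix(toy))
```

Because nucleotide alignments here are derived from the amino acid alignment, the same function can be applied to both kinds of sequence. The two resulting matrices can then be compared, as in the contrast drawn above between the amino acid and nucleotide dendrograms.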