Nucleotide Sequence of Escherichia coli pyrG Encoding CTP Synthetase*

The amino acid sequence of Escherichia coli CTP synthetase was derived from the nucleotide sequence of pyrG. The derived amino acid sequence, confirmed at the N terminus by protein sequencing, predicts a subunit of 544 amino acids having a calculated Mr of 60,300 after removal of the initiator methionine. A glutamine amide transfer domain was identified which extends from approximately amino acid residue 300 to the C terminus of the molecule. The CTP synthetase glutamine amide transfer domain contains three conserved regions similar to those in GMP synthetase, anthranilate synthase, p-aminobenzoate synthase, and carbamoyl-P synthetase. The CTP synthetase structure supports a model for gene fusion of a trpG-related glutamine amide transfer domain to a primitive NH3-dependent CTP synthetase. The major 5' end of pyrG mRNA was localized to a position approximately 48 base pairs upstream of the translation initiation codon. Translation of the gene eno, encoding enolase, is initiated 89 base pairs downstream of pyrG. The pyrG-eno junction is characterized by multiple mRNA species which are ascribed to monocistronic pyrG and/or eno mRNAs and a pyrG eno polycistronic mRNA. CTP catalyzes the terminal reaction in

esis to investigate the role of amino acid residues that are important for glutamine amide transfer function (13)(14)(15). In the absence of crystallographic data, identification of conserved amino acids in a homologous domain from different enzymes provides a basis for identifying potentially functional residues to be replaced by mutagenesis (15). In addition, analysis of the pattern for fusion of the glutamine amide transfer domain to other functional domains has led to a model for the evolution of amidotransferases having the capacity to utilize NH3 and glutamine (8). For these reasons, sequences of additional glutamine amidotransferases are of interest.
In this paper, we report the nucleotide sequence of pyrG and the derived CTP synthetase amino acid sequence. The CTP synthetase glutamine amide transfer domain contains three conserved regions similar to those in anthranilate synthase, p-aminobenzoate synthase, GMP synthetase, and carbamoyl-P synthetase. The CTP synthetase glutamine amide transfer domain is located at the C-terminal end of the molecule, encoded by DNA at the 3' end of the gene, consistent with a model (8) for evolution by gene fusion to augment the function of an NH3-dependent enzyme. Experiments to map the 5' and 3' ends of the pyrG mRNA led to the finding that eno, which encodes enolase, is 89 bp downstream of pyrG.

RESULTS
Subcloning pyrG-Plasmid pNF1519 contains pyrG in a 4.3-kb PstI fragment of E. coli DNA cloned in pBR322 (Fig. 1). pyrG was subcloned into vector pUC8 as shown in Fig. 1. Plasmid pMW1 was obtained by ligation of a mixture of BamHI fragments from pNF1519 into the BamHI site of pUC8. Selection for pyrG+ was by functional complementation of pyrG in strain JF646. Plasmid pMW5 was constructed by ligating the 2.6-kb SalI-PstI segment of E. coli DNA from pMW1 into the SalI and PstI polylinker sites in pUC8. Further subcloning indicated that DNA at the BamHI and KpnI sites in pMW5 was essential for pyrG function.
DNA Sequence-The DNA sequence of pyrG was initially determined using fragments isolated from plasmid pMW5. RsaI, HpaII, or TaqI digests of plasmid pMW5 or of the SalI-PstI insert were ligated into M13mp18 or M13mp19 (Fig. 2A). In addition, specific subfragments were isolated from digests that were obtained using restriction enzymes having 6-bp recognition sequences (Fig. 2B). Finally, the exonuclease III procedure (26) was employed to obtain a set of overlapping sequences from NruI-BamHI and BamHI-PstI segments of the cloned DNA (Fig. 2C). The DNA sequence shown in Fig. 3 extends from 11 bp upstream of the NruI site to the downstream PstI site at nucleotide 2442. The entire sequence was determined on both DNA strands from overlapping fragments.
The abbreviations used are: bp, base pair; kb, kilobase pair.
Portions of this paper (including "Experimental Procedures" and Figs. 1, 2, 4, 5, and 6) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 85M-3563, cite the authors, and include a check or money order for $4.40 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.
The derived amino acid sequence of CTP synthetase is shown in Fig. 3. The protein chain of 545 amino acid residues has a calculated molecular weight of 60,450. At nucleotides 2074-2076, an ATG initiates an open reading frame that extends 123 codons to the 3' end of the cloned E. coli DNA. By screening protein data banks, the downstream sequence was found to be homologous with that of yeast enolase. Thus, E. coli eno is 89 bp downstream from pyrG.
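The coordinate and mass figures in this paragraph can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the nucleotide coordinates are those quoted in the text, and the average mass of a methionine residue (131.19 Da) is a standard value that does not appear in the paper.

```python
# Cross-check of the ORF length and of the subunit mass after Met-1 removal.
# Coordinates come from the text; MET_RESIDUE_MASS is a standard value
# (average mass of a methionine residue within a peptide), not from the paper.

ENO_ATG_START = 2074       # first nucleotide of the ATG at 2074-2076
CLONE_END = 2442           # downstream PstI site, end of the cloned DNA
MET_RESIDUE_MASS = 131.19  # Da, assumption stated above


def codons_in_span(start: int, end: int) -> int:
    """Number of whole codons in an inclusive nucleotide span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a whole number of codons")
    return length // 3


def mass_without_initiator_met(full_chain_mass: float) -> float:
    """Mass of the chain after removal of the N-terminal methionine."""
    return full_chain_mass - MET_RESIDUE_MASS


if __name__ == "__main__":
    print(codons_in_span(ENO_ATG_START, CLONE_END))         # 123
    print(round(mass_without_initiator_met(60450.0), -2))   # 60300.0
```

Under these assumptions the span 2074-2442 contains exactly 123 codons, and removing one methionine residue from the 545-residue chain of 60,450 gives a value that rounds to the Mr of 60,300 quoted for the 544-residue subunit.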
CTP Synthetase-Enzyme was purified to homogeneity from cells bearing plasmid pMW5. A single stained protein band was obtained by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. By comparison with proteins of known molecular weight, the CTP synthetase subunit had an estimated molecular weight of 60,000 (Fig. 4).
The N-terminal amino acid sequence of CTP synthetase was determined by automated Edman degradation. The data for 24 cycles were indicative of a pure protein. In the first cycle, a complex mixture was obtained, apparently containing methionine, threonine, and a residue similar to alanine. From cycles 2 through 24, the sequence was identical with that shown in Fig. 3 from Thr-3 to Leu-25. We conclude that Met-1 was removed by processing and that the N terminus of the mature enzyme is threonine 2 or a modified form of threonine.
pyrG mRNA-The 5' end of pyrG mRNA was mapped by the nuclease S1 procedure (29). A 462-nucleotide NruI-BamHI DNA probe was used which extends from nucleotides . The probe was either labeled with [α-32P]dCTP by primer extension, or the double-stranded fragment was isolated and 5' end-labeled with [γ-32P]ATP and polynucleotide kinase. The results of nuclease S1 mapping are shown in Fig. 5. Two protected fragments of coding strand DNA were obtained (Fig. 5A, lane 2). Noncoding strand DNA did not anneal to RNA and was completely digested (Fig. 5A, lane 4). To confirm that the two transcripts extend into the pyrG coding sequence, nuclease S1 mapping was repeated using the 5' end-labeled NruI-BamHI probe. Fig. 5B shows that the same two protected fragments were obtained. The size of the major protected fragment is about 175 nucleotides, and that of the minor one approximately 255 nucleotides. More precise mapping was obtained by using a DNA sequencing ladder as a size standard (Fig. 5C). These results confirm those obtained with restriction fragment size standards. Corresponding sites for transcription initiation are overlined in Fig. 3.
The 3' end of pyrG mRNA was mapped with a KpnI-PstI DNA probe that extends from nucleotides 1663-2442 (Figs. 2 and 3). The results of nuclease S1 mapping are shown in Fig. 6. The major products obtained from the coding strand were undigested probe and a fragment of approximately 400 nucleotides (Fig. 6, lane 2). Minor fragments of approximately 420, 530, and 650 nucleotides were also obtained. The same pattern of fragments was obtained when the nuclease S1 concentration was increased 3-fold (data not shown). The noncoding strand DNA probe did not anneal to RNA and was completely digested (Fig. 6, lane 4). These results indicate that there are multiple species of pyrG and eno mRNA. An mRNA that anneals and fully protects the probe is suggestive of polycistronic pyrG eno mRNA.
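Each protected fragment can be read in two ways: as a transcript that enters the probe at its upstream end (nucleotide 1663) and terminates inside it, or as a transcript that starts inside the probe and runs to its downstream end (nucleotide 2442). The sketch below converts the approximate fragment sizes into the two candidate endpoints. The function names are hypothetical, and because the fragment sizes are only approximate, the computed positions should be read as accurate to within a few nucleotides at best.

```python
# Candidate transcript endpoints implied by nuclease S1-protected fragments.
# Probe coordinates and fragment sizes are taken from the text; the helper
# names are illustrative and not part of the original analysis.

PROBE_START = 1663   # KpnI end of the probe
PROBE_END = 2442     # PstI end of the probe
FRAGMENT_SIZES = (400, 420, 530, 650)  # approximate sizes in nucleotides


def three_prime_end(probe_start: int, size: int) -> int:
    """3' end of a transcript that protects `size` nt from the probe start."""
    return probe_start + size - 1


def five_prime_end(probe_end: int, size: int) -> int:
    """5' end of a transcript that protects `size` nt up to the probe end."""
    return probe_end - size + 1


def candidate_endpoints(sizes=FRAGMENT_SIZES):
    """Return (size, 3' end if pyrG transcript, 5' end if eno transcript)."""
    return [
        (s, three_prime_end(PROBE_START, s), five_prime_end(PROBE_END, s))
        for s in sizes
    ]


if __name__ == "__main__":
    print("probe length:", PROBE_END - PROBE_START + 1)
    for size, end3, end5 in candidate_endpoints():
        print(f"{size} nt: 3' end near {end3} or 5' end near {end5}")
```

For the major fragment of approximately 400 nucleotides, this gives a 3' end near nucleotide 2062 or a 5' end near nucleotide 2043, which fall in the interval between the end of pyrG and the eno ATG at nucleotides 2074-2076.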

DISCUSSION
The nucleotide sequence of E. coli pyrG was determined in order to extend our analysis of the relationship of glutamine amidotransferase structure to function. Recent sequence analyses indicate that different amidotransferases contain one of two distinct glutamine amide transfer domains (8)(9)(10)(11)(12). In all amidotransferases, a glutamine amide transfer domain is combined by various arrangements with a domain that catalyzes an NH3-dependent biosynthetic reaction. This combination endows glutamine amidotransferases with the capacity to catalyze a glutamine-dependent as well as an NH3-dependent biosynthetic reaction, both in vitro and in vivo (13,14). Both types of glutamine amide transfer domain utilize an active site cysteine to form a covalent glutaminyl intermediate for catalysis of amide transfer (13,14). Amidophosphoribosyltransferase (11) and glucosamine-6-P synthase (12) have a highly conserved glutamine amide transfer domain of approximately 190 amino acids that is characterized by an N-terminal active site cysteine. The second type of glutamine amide transfer domain, in GMP synthetase (30), carbamoyl-P synthetase (7, 10), anthranilate synthase (5), and p-aminobenzoate synthase (9), shown in Fig. 7, has three conserved segments. The active site cysteine in segment 2 is usually at a position approximately 80 to 90 amino acids from the N terminus of the domain.
The alignment in Fig. 7 localizes the CTP synthetase glutamine amide transfer domain and establishes its similarity to that in carbamoyl-P synthetase, GMP synthetase, anthranilate synthase, and p-aminobenzoate synthase. Using current nomenclature, the CTP synthetase glutamine amide transfer domain is trpG-related (8). In GMP synthetase, anthranilate synthase component II, and p-aminobenzoate synthase subunit II, the homologous trpG-related glutamine amide transfer domain (8) is initiated at the N-terminal residue of the protein chain. By noting that the first block of conserved sequence occurs 46-54 residues from the beginning of the domain in the three preceding enzymes, we estimate that the CTP synthetase glutamine amide transfer domain begins approximately at amino acids 292 to 300. Amino acid residues 1 to approximately 300 should contribute the structure needed for catalyzing the NH3-dependent reaction. The glutamine amide transfer domain is fused onto the C-terminal end of the enzyme. Likewise, in carbamoyl-P synthetase, the glutamine amide transfer domain is fused onto the C-terminal end of a protein chain, except that the function of the N-terminal 185-amino-acid segment is unknown.
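The estimate of the domain boundary amounts to subtracting the observed offsets of the first conserved block from its position in CTP synthetase. The sketch below makes that arithmetic explicit. Note that the position of the first conserved block (residue 346) is not stated in the text; it is back-calculated here from the quoted range of 292 to 300 and the offsets of 46-54, and should be treated as an assumption. The function names are likewise hypothetical.

```python
# Estimating where the glutamine amide transfer domain begins in CTP
# synthetase from the offset of the first conserved block in related enzymes.
# REGION1_START is an assumption back-calculated from the text (292 + 54 and
# 300 + 46 both give 346); it is not a value reported in the paper.

REGION1_START = 346        # assumed start of conserved region 1
REGION1_OFFSETS = (46, 54)  # offsets observed in the three N-terminal domains
ACTIVE_SITE_CYS = 379       # conserved cysteine in region 2 (from the text)


def domain_start_range(region1_start: int, offsets) -> tuple:
    """Earliest and latest domain start implied by the observed offsets."""
    return region1_start - max(offsets), region1_start - min(offsets)


def position_in_domain(residue: int, domain_start: int) -> int:
    """1-based position of a residue counted from the domain start."""
    return residue - domain_start + 1


if __name__ == "__main__":
    earliest, latest = domain_start_range(REGION1_START, REGION1_OFFSETS)
    print("domain start between residues", earliest, "and", latest)
    # Position of Cys-379 within the domain for each boundary estimate.
    print(position_in_domain(ACTIVE_SITE_CYS, latest),
          position_in_domain(ACTIVE_SITE_CYS, earliest))
```

With these inputs the domain start falls between residues 292 and 300, and Cys-379 lies 80 to 88 residues into the domain, in line with the statement that the active site cysteine is usually found approximately 80 to 90 residues from the N terminus of the domain.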
The conservation of amino acids in region 2 is sufficiently high to predict that the conserved cysteine, residue 379 in CTP synthetase, functions to form the covalent glutaminyl intermediate as has been shown for Cys-84 in anthranilate synthase component II (14). Likewise, CTP synthetase His-515 in region 3 is implicated in the proton transfer that is required for ionization of Cys-379 (15). Whereas previous experiments have provided evidence for catalytic roles of cysteinyl and histidyl side chains in regions 2 and 3, respectively, there is no evidence bearing on the possible role of region 1 in glutamine amide transfer.
In a previous analysis of the pattern for fusion of the glutamine amide transfer domain to other protein chains, a model was proposed to explain the evolution of glutamine amidotransferases from primitive NH3-dependent enzymes (8). According to this model, after duplication, genes encoding   to translocate contiguous to a promoter. The arrangement of the glutamine amide transfer domain in CTP synthetase conforms to this model. Nuclease S1 mapping indicates that the major pyrG promoter is proximal to the translation initiation codon. No other genes intervene between the promoter and pyrG. The position of the glutamine amide transfer domain in CTP synthetase is consistent with translocation and fusion of a trpG-related glutamine amide transfer domain to the 3' promoter-distal end of an existing pyrG coding sequence of approximately 300 amino acids.
To explain the trpG (5, 31, 32) and pabA (9) gene fusion pattern in several microorganisms, it was proposed that trpG-related gene fusions onto the 3' end of an existing gene were unfavorable compared to 5' end fusions (8). It is now apparent that 3' end trpG-related gene fusions occur in carbamoyl-P synthetase (7, 33) and CTP synthetase. It is uncertain why 3' end trpG or trpG-related gene fusions did not occur with trpE or pabB, respectively.
Previously, evidence was reported (
Nuclease S1 mapping of the pyrG eno boundary indicates multiple species of mRNA. One of the major products of this mapping experiment was the fully protected probe. The conventional interpretation of this result is that a polycistronic pyrG eno mRNA annealed to the probe and protected it against nuclease S1 digestion. An alternative possibility, not presently excluded, is that two overlapping monocistronic pyrG and eno RNA molecules can anneal to the probe, forming a tripartite nuclease S1-resistant structure (37). The other major mRNA of approximately 400 nucleotides should correspond either to a pyrG transcript having a 3' end at approximately nucleotide 2059 or to an eno transcript having a 5' end at approximately nucleotide 2040. Likewise, the minor mRNA species of 420, 530, and 650 nucleotides either terminate distal to pyrG or initiate upstream of eno. Further experiments are required to determine whether multiple pyrG and eno mRNA molecules arise from transcription termination after pyrG and transcription initiation prior to eno or whether a primary pyrG eno mRNA undergoes processing. Since pyrG expression appears to be constitutive (
RNA-protected fragments were resolved on 5% polyacrylamide, 7 M urea gels with