Biosynthesis of Bacterial Glycogen

The nucleotide sequence of the glg B gene, coding for branching enzyme (EC 2.4.1.18), was elucidated. It consists of 2181 base pairs specifying a protein of 727 amino acids. The deduced amino acid sequence was consistent with the amino acid analysis obtained with the pure protein as well as with the molecular weight determined from sodium dodecyl sulfate gel electrophoresis. The deduced amino acid sequence was also consistent with the amino-terminal amino acid sequence and the amino acid sequence analysis of various peptides obtained from CNBr degradation of purified branching enzyme.


Synthesis of the 1,6-α-glucosidic linkages of glycogen in bacteria or in mammals is catalyzed by 1,4-α-D-glucan:1,4-α-D-glucan 6-α-D-(1,4-α-D-glucano)-transferase. Some information has been reported on its properties and its reaction mechanism in mammalian (1-6) as well as in bacterial systems (7-9). The enzyme from Escherichia coli has been purified to near-homogeneity (10) and was shown to be a monomeric protein with a molecular weight of about 84,000. Other experiments have determined the amino-terminal amino acid sequence of the E. coli branching enzyme, as well as its immunological reactivity to branching enzyme from other bacteria (11).
Recently, the genes coding for the glycogen biosynthetic enzymes have been cloned (12). Subsequently, the nucleotide sequence of the glg C gene coding for ADP-glucose pyrophosphorylase (EC 2.7.7.27) has been determined and the amino acid sequence thus deduced (13). This report is concerned with the determination of the nucleotide sequence of glg B, the structural gene for branching enzyme, and the resultant amino acid sequence as deduced from the nucleic acid sequence. This study has also enabled us to compare the upstream nucleotide sequences for the three glycogen biosynthetic structural genes. It has been shown in in vivo experiments (14), as well as in in vitro experiments (15), that expression of the glycogen biosynthetic structural genes is stimulated by cyclic 3',5'-AMP and cyclic AMP receptor protein (CRP). It was, therefore, of interest to determine whether consensus nucleotide sequences were present, as found for CRP binding sites of other genes where cyclic AMP and CRP were required for optimal expression (16).

* This research was supported by United States Public Health Service Research Grants AI05520 and AI22835. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ Present address: Syntex, 3401 Hillview, Palo Alto, CA 94304.
§ Present address: Dept. of Biochemistry, Michigan State University, East Lansing, MI 48824. To whom correspondence and reprint requests should be addressed.

RESULTS AND DISCUSSION
Sequencing Strategy-Previous studies (13) have shown that the plasmid pOP12 contains the structural genes for all of the glycogen biosynthetic enzymes as well as the asd gene coding for aspartate semialdehyde dehydrogenase. Okita et al. (12) have shown by deletion mapping and subcloning the approximate positions of the asd, glg B, glg C, and glg A genes. Nucleotide sequencing data have further refined the positions of the glg C and glg A genes (13), as well as the position of the asd gene (17). The asd gene and its flanking DNA sequences reported by Haziza et al. (17) were of particular interest to us, since the last 58 base pairs of their sequence could be translated into a peptide sequence consistent with the reported amino-terminal sequence of the branching enzyme (11). This suggested a location for the start of the glg B structural gene which was entirely consistent with both the results of transduction mapping experiments (18) and the deletion mapping experiments of Okita et al. (13). Accordingly, the region between 2.3 and 4.9 kilobase pairs on the restriction map of pOP12 (Fig. 1) was sequenced.
The strategy used in sequencing the glg B gene is shown in Fig. 2. This strategy allowed us to sequence both strands of DNA for the entire gene. In addition, each restriction site used for labeling was sequenced through from an adjacent restriction site. The sequencing strategy made use of subclones of several blunt-ended restriction fragments in the plasmid pUC8. Subcloning these restriction fragments into the HincII site of pUC8 facilitated both 5' and 3' labeling adjacent to the region of interest, because pUC8 carries a SmaI site immediately adjacent to the HincII site. Cleavage of the SmaI site with AvaI gave DNA termini easily labeled at either the 5' or 3' ends.
DNA Sequence-Fig. 3 shows the DNA sequence of the glg B gene.

FIG. 1. Restriction map of pOP12. Plasmid pOP12 was digested with the restriction endonucleases indicated in the figure. The positions of the structural genes shown are based on deletion mapping of pOP12 with EcoRI or PstI as previously described (13), the nucleotide sequence of the asd gene (17), and the amino-terminal sequence of branching enzyme (11). The direction of transcription, shown by a horizontal arrow, is based on the sequence data obtained previously (13,17) and in this report.

FIG. 3. Nucleotide sequence of the glg B gene and deduced amino acid sequence of branching enzyme. The sequence is shown as that of the antisense strand. The amino acid sequences underlined are those that have been confirmed by amino acid sequencing. The underlined nucleotide sequence preceding the translational start is a potential ribosome binding site. Horizontal opposing sets of arrows before and after the translated region represent areas of dyad symmetry. The two 21-base pair sequences enclosed by rectangles of broken lines are potential cyclic AMP receptor protein binding sites. The numbers shown correspond to the nucleotide in the sequence shown or to the amino acid deduced starting with the known amino-terminal amino acid of branching enzyme.

Amino Acid Sequence and Composition-The amino acid sequence of branching enzyme deduced from the DNA sequence is also seen in Fig. 3. The open reading frame codes for a polypeptide with a molecular weight of 84,231, which is in agreement with the reported subunit molecular weight of 84,000 for the isolated branching enzyme (10). The branching enzyme isolated from E. coli B lacks the NH2-terminal methionine predicted in the DNA sequence. With this in consideration, there is complete agreement between the deduced amino acid sequence and the reported amino-terminal sequence of branching enzyme (11) (Table II, Miniprint), as well as with the partial amino acid sequences of peptides obtained from either CNBr degradation or CNBr degradation with further digestion with arginine-specific mouse submaxillary protease (Table II, Miniprint).
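The size figures quoted for the open reading frame can be cross-checked with simple arithmetic. The following is only a minimal sketch using the numbers given in the text; the variable and function names are illustrative and are not part of the original work.

```python
# Figures quoted in the text.
ORF_LENGTH_BP = 2181    # base pairs in the glg B coding region
RESIDUES = 727          # amino acids specified by the coding region
DEDUCED_MW = 84_231     # molecular weight deduced from the DNA sequence
REPORTED_MW = 84_000    # subunit molecular weight reported for the enzyme


def codons_in(length_bp: int) -> int:
    """Number of complete codons in a coding region of the given length."""
    if length_bp % 3 != 0:
        raise ValueError("coding region is not a whole number of codons")
    return length_bp // 3


codons = codons_in(ORF_LENGTH_BP)

# Average mass contributed per residue by the deduced polypeptide.
mean_residue_mass = DEDUCED_MW / RESIDUES

# Relative difference between deduced and reported molecular weights.
relative_difference = abs(DEDUCED_MW - REPORTED_MW) / REPORTED_MW

print(f"{codons} codons for {RESIDUES} residues")
print(f"mean residue mass: {mean_residue_mass:.1f}")
print(f"deduced vs. reported MW differ by {relative_difference:.2%}")
```

The codon count equals the stated number of residues, and the deduced and reported molecular weights differ by well under 1%.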
The amino acid compositions of 6 of the 7 peptides sequenced were also determined (Table I, Miniprint). Each composition is in very good agreement with the composition deduced for the CNBr fragment to which the peptide was assigned on the basis of its partial amino acid sequence.
The amino acid composition of branching enzyme isolated from the overproducing strain AC70R1 transformed with pOP12, as well as the amino acid composition deduced from the nucleotide sequence, is seen in Table I. There is good overall agreement between the deduced and observed amino acid compositions except for valine. The lower observed value for this residue may reflect a lack of complete hydrolysis of peptide bonds adjacent to the valine residues.
Codon Usage-The frequency of codon utilization in the glg B gene is shown in Table II. It has been shown that there is a strong positive correlation between the abundance of a given tRNA in E. coli and the frequency with which the corresponding codon is used in E. coli genes (19). The positive correlation is stronger for the genes of abundant proteins, whereas the genes of less abundant proteins utilize minor tRNAs more often and the correlation is weaker. The frequency of optimal codon usage as defined by Ikemura (19) was calculated to be 67% for the glg B gene. This value is close to that of genes coding for amino acid biosynthetic enzymes.
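A codon utilization table and an optimal-codon fraction of the kind described above can be tallied as in the sketch below. This is a simplified illustration, not the calculation used in this work: the toy sequence and the two codon sets are hypothetical placeholders, and the real optimal-codon assignments would have to be taken from Ikemura (19).

```python
from collections import Counter


def codon_usage(orf: str) -> Counter:
    """Count each codon in an in-frame coding sequence."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return Counter(orf[i:i + 3] for i in range(0, len(orf), 3))


def optimal_codon_fraction(usage: Counter, optimal: set, scored: set) -> float:
    """Fraction of scored codons that belong to the optimal set.

    Only codons listed in `scored` are counted; codons for which no
    optimal choice is defined (e.g. ATG, stop codons) are ignored.
    """
    total = sum(n for codon, n in usage.items() if codon in scored)
    if total == 0:
        raise ValueError("no scored codons present")
    best = sum(n for codon, n in usage.items() if codon in optimal)
    return best / total


# Hypothetical toy data, NOT the glg B sequence or Ikemura's table.
toy_orf = "ATGCTGCTGAAAGCGTAA"
toy_optimal = {"CTG", "AAA"}
toy_scored = {"CTG", "CTT", "AAA", "AAG", "GCG", "GCA"}

usage = codon_usage(toy_orf)
fraction = optimal_codon_fraction(usage, toy_optimal, toy_scored)
print(usage)
print(f"optimal codon fraction: {fraction:.2f}")
```

Restricting the denominator to codons with a defined optimal alternative keeps methionine, tryptophan, and stop codons from diluting the fraction.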
Sequences That May Be Concerned with Regulation of Transcription-Several short regions of dyad symmetry are found preceding the translational start of the glg B gene. These are indicated in Fig. 3 as opposing horizontal arrows above the nucleotide sequence. Alternative regions of dyad symmetry are possible. Since none of the inverted repeats are followed by a series of thymine residues, they do not appear to be ρ-independent transcriptional termination sites (20) for the asd gene. The mode of transcriptional termination of the asd gene is a topic for future research.
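Regions of dyad symmetry, and the presence or absence of a following run of thymine residues, can be located with a simple scan such as the one sketched here. The arm length, loop limit, T-run criteria, and the toy sequence are all assumptions chosen for illustration; they are not the criteria used to mark Fig. 3.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_dyads(seq: str, arm: int = 5, max_loop: int = 10):
    """Return (left_start, left_arm, loop_length) for perfect inverted repeats.

    A hit means the `arm` bases starting at left_start are followed, after
    `loop_length` bases, by their exact reverse complement.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            j = i + arm + loop
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, left, loop))
    return hits


def followed_by_t_run(seq: str, end: int, run: int = 4, window: int = 8) -> bool:
    """True if a run of `run` T residues occurs within `window` bases after `end`."""
    return "T" * run in seq.upper()[end:end + window]


# Hypothetical toy sequence: a 5-base stem, 4-base loop, then a T run.
toy = "AAGCCGCTTAAGCGGCTTTTTTAC"
hits = find_dyads(toy)
start, left_arm, loop = 2, "GCCGC", 4
stem_end = start + 2 * len(left_arm) + loop
print((start, left_arm, loop) in hits)
print(followed_by_t_run(toy, stem_end))
```

Unlike the inverted repeats described in the text, the toy stem is deliberately followed by thymine residues so that both branches of the check are exercised.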
Present evidence indicates that glycogen synthesis is stimulated by cyclic AMP and cyclic AMP receptor protein in E. coli (14,15). However, we find no strong homologies with the consensus Pribnow box sequence or with the consensus -35 region sequence (16) in any of the upstream nucleotide sequences preceding the translational starts for the glg A, B, and C genes (Fig. 4). Two potential CRP binding sites are tentatively identified in the region preceding the translational start of the glg B gene, by homology with the consensus sequence of de Crombrugghe et al. (16). The homologies are found on the transcribed strand of DNA in both instances; these potential CRP binding sites are indicated by boxes of broken lines in Fig. 3. It is perhaps of interest that one of the potential CRP binding sites (the one proximal to the translational start) shows a strong homology with the gal E CRP binding site (16). It has been shown that CRP binds to the -35 region of the gal P2 promoter (21), thereby excluding RNA polymerase from the P2 promoter and simultaneously stimulating association of RNA polymerase with the P1 promoter (21), 5 base pairs downstream from P2. Further studies will be directed toward identifying the glg B promoter region and the nucleotide sequence where CRP and cyclic AMP may bind.
A potential Shine-Dalgarno sequence (22), AGGA, is found 8 base pairs before the translational start of glg B. This sequence is underlined in Fig. 3. A spacing of 8 base pairs between the ribosome binding site and the translational start has been shown to be near the optimal spacing (23).
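The spacing between a candidate Shine-Dalgarno sequence and a start codon can be measured as in the following sketch. It assumes that spacing is counted as the number of bases between the last base of the motif and the first base of the start codon; the text does not state which convention was used, and the toy sequence is hypothetical.

```python
def sd_candidates(seq: str, start_index: int, motif: str = "AGGA",
                  max_upstream: int = 20):
    """List (motif_position, spacing) for motifs upstream of a start codon.

    `spacing` is the number of bases between the end of the motif and
    `start_index`, the 0-based position of the first base of the start codon.
    """
    seq = seq.upper()
    window_start = max(0, start_index - max_upstream)
    results = []
    pos = seq.find(motif, window_start)
    while pos != -1 and pos + len(motif) <= start_index:
        results.append((pos, start_index - (pos + len(motif))))
        pos = seq.find(motif, pos + 1)
    return results


# Hypothetical toy sequence: AGGA, an 8-base spacer, then ATG.
toy = "TTCAGGACATTATCGATGAAA"
start = toy.find("ATG")
print(sd_candidates(toy, start))
```

Under this counting convention the toy motif lies 8 bases before the start codon, matching the spacing described for glg B.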

FIG. 4. Upstream nucleotide sequences preceding the translational starts of the glg A, glg B, and glg C genes.
At the 3' end of the sequence shown in Fig. 3, a short open reading frame of 13 codons appears starting at nucleotide 2521. This open reading frame is preceded by a potential Shine-Dalgarno sequence (23), AGGA, 8 base pairs before the hypothetical translational start. In E. coli B, two regulatory elements have been identified for the glycogen biosynthetic genes, glg R and glg Q (25). The regulatory gene glg R has been linked by P1 transduction to glg B, glg C, and glg A, whereas glg Q is not linked to these genes (26). In Salmonella typhimurium, glg R is also linked to glg B, glg C, and glg A (25,27). Perhaps the open reading frame shown at the end of Fig. 3 is the 5' end of the glg R gene. If further work were to confirm this supposition, it would allow the mode of regulation of the glycogen biosynthetic enzymes to be investigated at the molecular level.

Plasmids. The pBR322 derivative pOP12 has been described (3), as has pUC8 (4).
Preparation of Plasmid DNA. Plasmid DNA of pOP12 was obtained by chloramphenicol amplification of AR1/pOP12 grown in M9 media (5). Cells were collected by centrifugation and a clear lysate prepared as described by Clewell (6). Crude plasmid DNA was extracted with an equal volume of buffer-saturated phenol, followed by two extractions with equal volumes of hydrated ethyl ether. DNA was then precipitated by bringing the NaCl concentration to 0.2 M, adding two volumes of 95% ethanol, and storage overnight at -20°C. DNA was collected by centrifugation and resuspended in 50 mM Tris-HCl, pH 7.5, 0.5 M NaCl, 1 mM Na2EDTA. DNA was then passed through a Sephacryl S-1000 column (2.5 x 30 cm) equilibrated with the same buffer. After collecting the DNA peak fractions and ethanol precipitating as described above, it was further purified by CsCl density gradient centrifugation as described (5).
Plasmid DNA of pUC8 derivatives was prepared from 10 ml overnight cultures in LB media of JM103 transformed with pUC8 containing the appropriate insert. Plasmid DNA was purified by the method of Birnboim and Doly (7). DNA was further purified of RNA fragments by passing through a Sephacryl S-1000 column (1.0 x 15 cm) as previously described.
Isolation of pOP12 Restriction Fragments. Restriction fragments of greater than one kilobase pair in length were resolved in 1% low gelling temperature agarose gels run with Tris-borate EDTA (TBE) buffer. DNA was visualized by staining with ethidium bromide (5 µg/ml) and illumination with long wave ultraviolet light. Restriction fragments of interest were excised from the gel and recovered by the technique of Langridge et al. (8). Restriction fragments less than one kilobase pair in length were resolved on 5% polyacrylamide gels in TBE buffer. DNA bands were visualized by staining with ethidium bromide (5 µg/ml) and illumination with long wave ultraviolet light. Restriction fragments of interest were excised from the gel, electroeluted, and ethanol precipitated. JM103 cells prepared by the method of Cohen et al. (9) were then transformed as described by Bolivar et al. (1).

Transformed cells were plated on LB agar plates containing 50 µg/ml ampicillin, 100 µg/ml streptomycin sulfate, 0.1 mM IPTG, and 0.01% X-gal. After growth overnight at 37°C, large colorless colonies were picked and inoculated into 10 ml cultures of LB media. After growth overnight at 37°C with shaking, miniscreen DNA was prepared from the cells by the method of Birnboim and Doly (7). After verification of the presence and orientation of the appropriate insert by restriction mapping, the DNA was prepared for sequencing by cleavage with Ava I at the Sma I site of pUC8 immediately outside the position of the insert. The Ava I cleaved DNA was then either 5' or 3' end labeled, and cleaved at the Eco RI site immediately adjacent to the Sma I site away from the position of the insert. DNA prepared in this manner could be sequenced without further restriction fragment isolation.
End Labeling of DNA. Restriction endonuclease digested DNA was 5' end labeled as described by Maxam and Gilbert (10). Terminal sequences generated by cleavage with Kpn I were 3' end labeled as described by Tu and Cohen (11) with [α-32P]cordycepin 5' triphosphate. Terminal sequences generated by cleavage with Ava I were 3' end labeled with [α-32P]dCTP and E. coli DNA polymerase (large fragment). The reaction mixture contained 5 picomoles of Ava I cleaved DNA, 50 µCi of [α-32P]dCTP (3,000 Ci/mmole), 50 mM NaCl, 10 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 0.2 mM DTT, and 1 unit of DNA polymerase I (large fragment) in a final volume of 45 µl. The reaction mixture was incubated at room temperature for 15 minutes, after which 90 picomoles of unlabeled dCTP were added. After incubating further for 15 minutes at room temperature, the reaction was stopped by phenol extraction, followed by ethanol precipitation. Terminal sequences generated by cleavage with Bss H II or Bam H I were 3' end labeled with [α-32P]dCTP or dGTP, respectively, in the same manner as were Ava I sites except that the addition of unlabeled dNTP was omitted.
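The molar amounts implied by the Ava I labeling reaction can be worked out from the stated radioactivity and specific activity. The sketch below is only a unit-conversion check on the figures given above; the function and variable names are illustrative.

```python
def picomoles_from_activity(microcuries: float, ci_per_mmol: float) -> float:
    """Convert an amount of radioactivity to picomoles of nucleotide.

    microcuries * 1e-6 gives Ci; dividing by Ci/mmol gives mmol;
    multiplying by 1e9 converts mmol to pmol.
    """
    return microcuries * 1e-6 / ci_per_mmol * 1e9


# Values stated for the Ava I 3' end-labeling reaction.
labeled_dctp_pmol = picomoles_from_activity(50, 3000)
dna_pmol = 5
chase_pmol = 90
volume_ul = 45

# pmol per microliter is numerically equal to micromolar.
labeled_dctp_um = labeled_dctp_pmol / volume_ul
label_per_dna = labeled_dctp_pmol / dna_pmol
chase_excess = chase_pmol / labeled_dctp_pmol

print(f"labeled dCTP: {labeled_dctp_pmol:.1f} pmol ({labeled_dctp_um:.2f} uM)")
print(f"labeled dCTP per pmol of DNA: {label_per_dna:.1f}")
print(f"unlabeled chase is a {chase_excess:.1f}-fold excess over label")
```

On these figures the labeled nucleotide is present at roughly 17 pmol, a few-fold molar excess over the 5 pmol of DNA, and the 90 pmol chase exceeds the label about fivefold.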
DNA Sequencing. Sequences were determined by the chemical cleavage method of Maxam and Gilbert (10) utilizing the G, G+A, A>C, T+C, and C reactions.
Purification of Branching Enzyme. The enzyme was purified from E. coli B strain AC70R1/pOP12 as previously described for purification of the enzyme from E. coli B strain AC70R1 (12). Sodium dodecyl sulfate gel electrophoresis of the E. coli B AC70R1/pOP12 branching enzyme showed that the enzyme migrated as a single homogeneous protein band. It had the same rate of migration as the E. coli AC70R1 enzyme, which was previously shown to have a subunit Mr of about 84,000 (12). Protein was determined by the method of Lowry et al. (14). The specific activity of the enzyme was 400 to 500 µmol/min per mg of protein.
Amino Acid Composition. Amino acid analyses were performed on the proteins. Appropriate extrapolations were made to correct for losses of serine and threonine. All other values were uncorrected. Cysteine was determined by analysis after carboxymethylation of the enzyme with iodoacetic acid by the method of Hirs (15). Tryptophan was determined by the method of Edelhoch (16) or by hydrolysis of the protein with 3 N mercaptoethanesulfonic acid (17). Recoveries of amino acids were 91% and 95% in the HCl and mercaptoethanesulfonic acid hydrolysates, respectively.
Preparation and Purification of Peptide Fragments Obtained by CNBr Degradation of Branching Enzyme. After carboxymethylation, 17.6 mg of protein was treated with CNBr as described (18). The CNBr reaction mixture was lyophilized, made 10 mg/ml in aqueous 0.1% trifluoroacetic acid, and centrifuged to remove the precipitate. The soluble peptides were stored frozen. The precipitate was washed twice with 0.1% trifluoroacetic acid, dried in a centrifuge under vacuum to remove the acid, and then dissolved in 0.5 ml of freshly prepared 0.5 M ammonium bicarbonate buffer, pH 8.0. Mouse submaxillary protease was added to the mixture in a ratio of 50:1 of peptide to protease, and the mixture was incubated overnight at 37°C and then centrifuged under vacuum. The peptides were then dissolved in 0.1% trifluoroacetic acid.
The soluble peptides in 100 µl aliquots were subjected to HPLC using a Waters system having models 6000A and M-45 solvent delivery systems, a 660 solvent programmer, a U6K universal injector, and a Varian RPC-18 large pore size column. A gradient elution to 70% of solvent B was run with 0.1% trifluoroacetic acid as solvent A and 0.1% trifluoroacetic acid in n-propanol as solvent B. The flow rate was 0.5 ml per min and the chromatography was for 80 min. A 5 min pre-gradient run was done prior to the chromatography run. Samples of collected peaks from 5 runs were pooled and evaporated to dryness, dissolved in 100 µl of aqueous trifluoroacetic acid, and rechromatographed using solvents A and B with a gradient elution up to 50% of solvent B. For some peptide fractions this was repeated twice. The peaks were collected and subjected to amino acid analysis. Those peptides considered pure were further analyzed for amino acid sequence.
The fractions treated with mouse submaxillary protease were subjected to HPLC using a Vydac RPC-4 5 micron column. A gradient elution from solvent A to 60% of solvent B (see above) was used with a flow rate of 0.5 ml/min for 80 minutes after a 5 min pre-gradient run. Corresponding peaks from several runs were pooled and rechromatographed with a gradient elution from solvent A to 50% of solvent B. Those peptide peaks considered pure were subjected to either amino acid analysis or peptide sequencing.
Amino Acid Analysis and Sequence Determination of Peptides. The acid hydrolysis of tryptic peptides was carried out in 5.7 N HCl in sealed ignition tubes under vacuum at 110°C for 24 hours (19). Sequence analysis of peptides as well as of the carboxymethylated protein was performed by automated Edman degradation in a Beckman model 890 M sequencer. The PTH amino acids were identified by at least two independent methods using HPLC (20), thin layer chromatography (21), or gas liquid chromatography (22).
Table I shows the amino acid compositions of the peptides purified via HPLC and the amino acid sequence in branching enzyme consistent with each composition, as obtained by actual sequencing as well as by the amino acid sequence deduced from the nucleotide sequence. Table II shows the various sequences obtained via peptide sequencing.