The Glycine Cleavage System: Molecular Cloning of the Chicken and Human Glycine Decarboxylase cDNAs and Some Characteristics Involved in the Deduced Protein Structures*

A cDNA encoding chicken glycine decarboxylase (pCP15b) was isolated using an antibody specific to this protein. Additional cDNAs were cloned with the aid of genomic fragments obtained with the pCP15b cDNA probe. No initiator methionine codon is found in the cDNA sequence elucidated so far, and an ATG codon in an exon is assigned to this role. The precursor glycine decarboxylase deduced from the 3,514-base pair nucleotide sequence consists of 1,004 amino acids (Mr = 111,848). The precursor form of human glycine decarboxylase, 1,020 amino acid residues (Mr = 112,869), is encoded in a 3,783-base cDNA sequence assembled from two 1.9-kilobase pair cDNAs that overlap by a pentanucleotide. The pyridoxal phosphate-binding lysine and a glycine-rich region, which is suggested to be responsible for the attachment of the phosphate moiety of pyridoxal phosphate, are found in close proximity in both the chicken and human enzymes. This region, essential for enzyme action, is suggested to be embedded in a segment rich in β-turns and random coils and is surrounded by conserved and repetitive amino acid sequences. It is suggested that these structures are involved in the organization of the active site of glycine decarboxylase.

The glycine cleavage system (1) (glycine synthase (EC 2.1.2.10)) is a multienzyme system composed of glycine decarboxylase (2) (tentatively known as P-protein because of its requirement for pyridoxal phosphate (1), or as P1 (3)), H-protein,¹ T-protein, and lipoamide dehydrogenase; it reversibly catalyzes the degradation of glycine in animal and plant mitochondria and in prokaryotes (1). Several interesting properties of the catalytic reaction have been well documented using glycine decarboxylase and H-protein purified from chicken liver. Glycine decarboxylase is almost inactive by itself; it forms an enzyme complex with H-protein, which causes a spectral change of its prosthetic group, pyridoxal phosphate, and conversion to the active enzyme. It is physiologically significant that H-protein functions as a regulatory protein, in addition to transferring two electrons through its prosthetic lipoyl moiety in the reversible reaction (2, 4, 5).

* ... Ministry of Education, Science, and Culture, Japan (to K. H.).

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 505742.

¹ The abbreviations used are: H-protein, hydrogen carrier protein; T-protein, tetrahydrofolate-requiring protein; kb, kilobase pair(s); bp, base pair(s).
Our long-term goal is to elucidate the mechanism of synthesis of these two proteins and their mode of interaction. However, because the glycine decarboxylase cDNA had not been cloned and the structure of this protein had not been analyzed, these issues remain unclear. In the accompanying study (6), we cloned and characterized the chicken H-protein cDNA. The primary structure of H-protein deduced there confirms that previously determined by Fujiwara et al. (7) using the protein purified from chicken liver. We therefore attempted to isolate the chicken glycine decarboxylase cDNA; if successful, this cDNA would serve as a starting point for realizing the goals listed above.
In vertebrates, glycine cleavage activity exists mainly in the liver, kidney, and brain (8), and the glycine cleavage system is the physiologically major pathway for the catabolic degradation of glycine (9). Impaired or defective breakdown of glycine causes nonketotic hyperglycinemia, an incurable disease in humans (10). Defective glycine decarboxylase is most frequently identified as the primary cause of this disease (11). Knowledge of the human glycine decarboxylase cDNA would therefore be indispensable for further study of this disease. Taking advantage of the fact that the human enzyme reacts with an antibody raised against the chicken enzyme (12), we tried to clone the cDNA encoding the human enzyme. In this paper, we report the cloning of the chicken and human glycine decarboxylase cDNAs and some characteristics expected to be responsible for the action of these enzymes.

EXPERIMENTAL PROCEDURES

Materials. Livers from White Leghorn hens (about 10 months old) were used. Radioactive nucleotides were obtained from Du Pont-New England Nuclear, and the Bluescript plasmid vector (Stratagene Cloning Systems) was used to subclone various DNA fragments. All other materials were obtained commercially.
Partial Primary Structures of Chicken Glycine Decarboxylase. Glycine decarboxylase purified from chicken liver as described previously (2) was carboxymethylated and digested with tosyl-L-phenylalanine chloromethyl ketone-treated trypsin (13). Nine peptides, chosen at random, were purified by reverse phase high performance liquid chromatography (14) and sequenced using an automated protein sequencer (model 470A) with an on-line phenylthiohydantoin analyzer (model 120A) (Applied Biosystems Inc.). The purified chicken glycine decarboxylase was also subjected to amino-terminal sequence analysis by the same method.
Although the recovery of phenylthiohydantoin amino acids was extremely low (5% relative to the amount of subunit used (10 nmol)), an amino-terminal sequence of 15 amino acids was determined. The results are summarized in Table I. Parts of peptides 6 and 7 were reverse-translated into synthetic oligodeoxynucleotides complementary to the mRNA, covering all possible codon usages, using a DNA synthesizer (model 381A; Applied Biosystems Inc.). These were designated probes 6 and 7, and the nucleotide sequence of probe 6 is 5'-
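The reverse translation into mixed probes described above can be sketched as follows. This is a minimal illustration, assuming the standard genetic code; the peptide `MWK` and the helper names are hypothetical, because the sequences of peptides 6 and 7 are not reproduced here. It enumerates every coding sequence for a peptide and returns the complementary (antisense) strand of each, which is what a mixed probe against mRNA contains.

```python
from itertools import product

# Standard genetic code; codons enumerated with the first base outermost, order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: dict[str, list[str]] = {}
for codon, aa in zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS):
    CODON_TABLE.setdefault(aa, []).append(codon)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences (product of codon counts per residue)."""
    n = 1
    for aa in peptide:
        n *= len(CODON_TABLE[aa])
    return n


def antisense_probes(peptide: str) -> list[str]:
    """Every oligonucleotide complementary to a possible mRNA encoding the peptide."""
    coding = ("".join(c) for c in product(*(CODON_TABLE[aa] for aa in peptide)))
    return sorted(reverse_complement(s) for s in coding)


# Hypothetical example peptide, not taken from Table I.
print(degeneracy("MWK"))        # 2
print(antisense_probes("MWK"))  # ['CTTCCACAT', 'TTTCCACAT']
```

Degeneracy grows multiplicatively (Leu, Ser, and Arg each have six codons, so `LSR` alone gives 216 sequences), which is why reverse-translated probes are generally drawn from stretches with few codon choices.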
Selection of Chicken and Human Glycine Decarboxylase cDNAs and Their Genes. Several cDNA clones were selected from a chicken liver cDNA expression library (6) using an immunopurified anti-chicken glycine decarboxylase antibody (12), but only the 2-kb cDNA clone pCP15b hybridized to the end-labeled probes 6 and 7. None of the clones selected from this library with the pCP15b cDNA probe had an insert longer than that of pCP15b. In parallel, we cloned the chicken glycine decarboxylase gene from a chicken genomic library (6) using the pCP15b cDNA. The pCPG101EE4.5 insert, a subcloned EcoRI fragment of this gene, hybridized to an RNA of a size similar to that detected by the nick-translated pCP15b insert. One of several immunoreactive cDNA clones from the primary selection hybridized to the nick-translated pCPG101EE4.5 insert and was subcloned (pCP23a). A commercial cDNA library (CLONTECH Laboratory Inc.) was screened with the pCP23a cDNA probe, and four additional cDNA clones were obtained and characterized.

TABLE I
The partial primary structures of chicken glycine decarboxylase. The carboxymethylated glycine decarboxylase (1 mg, 10 nmol of subunit) was digested with tosyl-L-phenylalanine chloromethyl ketone-treated trypsin. The primary structures of the peptides obtained as described under "Experimental Procedures" are shown in A, together with their positions in the deduced primary structure. The underlined sequences were reverse-translated into the synthetic DNA probes 6 and 7. The amino-terminal sequence determined for the purified glycine decarboxylase is shown in B.
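[Table I body not recoverable. Columns: Amino acid sequence; Position in deduced structure. Panel A, tryptic peptides; panel B, amino-terminal sequence.]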
A human liver cDNA expression library (16) was screened with the same antibody used for selection of the chicken cDNA, and human glycine decarboxylase cDNAs were isolated (λHGD34c and λHGD34d). The 5' EcoRI site of the λHGD34d cDNA had been altered, but the cDNA was subcloned together with part of the vector DNA, which includes a HincII site. The pHGD34d cDNA was used to clone λHGD52a from a λgt10 human liver cDNA library. The pHGD34d and pHGD52a cDNAs hybridized to the pCP15b insert, and the pHGD34c cDNA hybridized to the pCP23a insert.
Human high molecular weight DNA was prepared from the nucleated blood cells of a normal male by the method of DiLella and Woo (17). A partial Sau3AI digest of the genomic DNA was used to construct a human genomic library in λDASH (Stratagene Cloning Systems) according to the recommended method (18). A genomic clone (λHGDG27) was obtained using the nick-translated pHGD34c and pHGD34d cDNAs and partially characterized. Throughout the screening, probing was conducted as described in the preceding paper (16). Restriction maps and the relative locations of the chicken and human cDNAs are presented in Figs. 1 and 2.
DNA Sequencing. The pCP112a0.5 insert, a 0.5-kb stretch between the 5'-end EcoRI site and the EcoRV site of pCP112a, was digested with PvuII and RsaI, and the resulting fragments (120-270 bp) were subcloned. The 2.0- and 0.9-kb EcoRI fragments of pCP110b were subcloned (pCP110b2.0 and pCP110b0.9). Serial deletion mutants of both strands of the pCP110b2.0 insert were prepared as described (19). The SmaI, XhoI, and PstI fragments of the pCP110b0.9 insert were subcloned.
For the human glycine decarboxylase cDNA, a similar sequencing strategy was used. Serial deletion mutants of both strands were prepared from the pHGD34c cDNA. Several fragments of about 200-280 bp were generated from the pHGD34d, pHGD15a, and pHGD52a inserts by digestion with EcoRV, ApaI, EcoRI, SmaI, Sau3AI, and AluI, and then subcloned.
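The digests used in this strategy depend on where each enzyme's recognition sequence falls in the insert. The following is a minimal sketch, assuming exact, non-degenerate sites only (the degenerate HincII site, GTYRAC, is left out); the `SITES` table and `find_sites` helper are hypothetical names. Every site listed is palindromic, so scanning one strand also finds the sites on the other.

```python
# Recognition sequences (5'->3') of enzymes named in the text; all are palindromic.
SITES = {
    "EcoRI": "GAATTC",
    "EcoRV": "GATATC",
    "SmaI": "CCCGGG",
    "ApaI": "GGGCCC",
    "XhoI": "CTCGAG",
    "PstI": "CTGCAG",
    "PvuII": "CAGCTG",
    "XbaI": "TCTAGA",
    "RsaI": "GTAC",
    "AluI": "AGCT",
    "Sau3AI": "GATC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """0-based start positions of every recognition site, grouped by enzyme."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


print(find_sites("AAGAATTCAAGATATC"))  # {'EcoRI': [2], 'EcoRV': [10]}
print(find_sites("CGGAATTCCG"))        # {'EcoRI': [2]}  (the EcoRI linker)
```

Applied to the cDNA end sequences quoted later (AAAGAATT and GAATTTGC), this finds no EcoRI site, in line with the observation that those ends differ from the linker.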
Both the nucleotide sequences and the deduced primary structures were analyzed with the MicroGenie program developed by C. Queen and L. Korn (Beckman Instruments, Inc.).

RESULTS

Characterization of Chicken Glycine Decarboxylase cDNA. The chicken glycine decarboxylase subunit has a molecular mass of about 100 kDa (905 amino acids (2)), so the mRNA encoding this protein is expected to be at least 2.7 kb long. The pCP15b cDNA, the only clone identified with both the specific antibody and the synthetic oligonucleotide probes, was less than 2 kb in size, although this cDNA probe detects a 4-kb mRNA (Fig. 3). Additional clones, pCP110b and pCP112a, were isolated using the pCP23a cDNA, which had been obtained with the aid of a genomic fragment cloned with the pCP15b cDNA. The pCP110b cDNA had a 15-bp poly(A) region and hybridized to pCP15b in its 3' region. The other end of this cDNA hybridized to the pCP23a cDNA, and the pCP112a cDNA contains about 100 bp of sequence upstream of the pCP23a cDNA (cf. Fig. 1). The nucleotide sequence determined from these inserts and the primary structure deduced for chicken glycine decarboxylase are shown in Fig. 4, A and B. This 3,490-bp cDNA contains the sequence encoding the amino-terminal primary structure shown in Table I, B (nucleotides 79-123), but no initiator methionine codon in the upstream reading frame. All nine tryptic peptides can be located in the primary structure deduced from the same reading frame (underlined in Fig. 4B), indicating that the selected cDNAs are those for chicken glycine decarboxylase.
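The minimum mRNA length quoted above follows from three nucleotides per codon. A minimal sketch with a hypothetical helper name; it deliberately ignores untranslated regions and the poly(A) tail, so it gives only a lower bound.

```python
def min_coding_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Smallest number of nucleotides that can encode n_residues amino acids.

    Three nucleotides per codon, plus one optional stop codon. UTRs and the
    poly(A) tail are ignored, so a real mRNA is longer than this bound.
    """
    return 3 * (n_residues + (1 if include_stop else 0))


# Purified chicken subunit: 905 residues (ref. 2).
print(min_coding_length_nt(905))   # 2718, i.e. "at least 2.7 kb"
# Deduced chicken precursor: 1,004 residues.
print(min_coding_length_nt(1004))  # 3015
```

Both values sit well below the 4-kb mRNA detected with pCP15b, leaving room for untranslated sequence.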
An exon in a genomic subclone, pCPG301EE2.9, includes nucleotides 1-201 of the cDNA. An ATG codon was found 24 bp upstream in the same genomic stretch. This ATG codon lies in the reading frame encoding a peptide of 34 amino acids followed by the amino-terminal peptide of the purified protein (Fig. 4A), and it was assigned the role of the initiator methionine codon. Two additional ATG codons (275 ...

Characterization of Human Glycine Decarboxylase cDNA. Two 1.9-kb cDNAs were cloned and distinguished by restriction mapping (pHGD34c and pHGD52a in Fig. 2). The pHGD34c cDNA is 1,873 bp in size (see Fig. 4C) and shows no significant hybridization with either the pHGD52a or the pHGD34d cDNA, although these cDNAs appear to hybridize to a single RNA species (Fig. 3). The first ATG codon begins at nucleotide 151 and precedes an open reading frame encoding 574 amino acids. However, the sequences at the 3' end of the pHGD34c cDNA (AAAGAATT, nucleotides 1,870-1,877 in Fig. 4C) and at the 5' end of the pHGD34d cDNA (GAATTTGC, nucleotides 1,873-1,880, identical to that of pHGD52a) differ from the EcoRI linker structure, CGGAATTCCG, used for library construction (16). When the pHGD34c cDNA is joined to the pHGD34d cDNA at the short overlap GAATT, the resulting sequence is highly homologous at the junction to a segment of the chicken glycine decarboxylase cDNA (AAGGAATTTGC, nucleotides 1,672-1,682 in Fig. 4B), and the individual reading frames of the immunoreactive pHGD34c and pHGD34d cDNAs are preserved.
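The junction argument can be checked mechanically: join the two cDNA ends at their shared GAATT and compare the result with the chicken segment. A minimal sketch using only the sequences quoted in the text; the helper names are hypothetical, and identity is computed without gaps because the two 11-nucleotide segments are already aligned end to end.

```python
def join_at_overlap(left: str, right: str, overlap: int) -> str:
    """Join two sequences whose last/first `overlap` bases are identical."""
    if overlap <= 0 or left[-overlap:] != right[:overlap]:
        raise ValueError("sequences do not share the stated overlap")
    return left + right[overlap:]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# 3' end of pHGD34c and 5' end of pHGD34d, as quoted in the text.
junction = join_at_overlap("AAAGAATT", "GAATTTGC", 5)
print(junction)                                              # AAAGAATTTGC
print(round(percent_identity(junction, "AAGGAATTTGC"), 1))   # 90.9
```

Joining at an exact overlap removes only the duplicated bases, so each partial reading frame carries through unchanged; the joined sequence matches the chicken segment at 10 of 11 positions.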
Analysis of the genomic structure confirmed that this junction sequence is actually encoded in an exon of the genomic clone λHGDG27. An XbaI fragment from this clone was selected by hybridization with both a 3' region of the pHGD34c cDNA and a 5' region of the pHGD34d cDNA. An exon near the 3' end of this XbaI fragment consists of a 143-bp sequence in which the 5'-terminal 16 bases, corresponding to the 3' end of the pHGD34c cDNA, continue directly into the sequence corresponding to the 5' end of the pHGD34d cDNA (Fig. 5). This result indicates that there are probably no sequences inserted between the two sequences determined from the pHGD34c and the pHGD34d or pHGD52a cDNAs. The precursor form of human glycine decarboxylase deduced from the 3,783-bp cDNA sequence is composed of 1,020 amino acids ... beginning at Met1, because the amino-terminal regions of both enzymes (Met1 to Gly') are very similar (Fig. 4, B and C). The major difference is observed in the sequences from Leu" to Leusfi of the human enzyme and from Arg" to Leu" of the chicken enzyme. The human enzyme, however, contains a glycine- and alanine-rich segment similar to the amino-terminal region of the purified chicken enzyme, suggesting that the amino terminus of the human enzyme is probably Gly40 and that most of the difference lies in their mitochondrial presequences. A distinct heptapeptide insertion, V-V-Q-T-R-A-K (Val232-Lys238), is found in the human enzyme in place of Asn222 of the chicken enzyme. The carboxyl-terminal side from this position on (about 800 residues) contains no gaps. Overall, the primary structures of these two enzymes appear to be highly conserved; the structural homology between the putative mature forms is estimated to be 83.8%.
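Identity and similarity figures such as the 83.8% above, and the roughly 93% obtained when conservative substitutions are counted (next paragraph), can be computed from a pairwise alignment as sketched below. This is an illustration under stated assumptions: the conservative classes are the four named in the text (Asp/Glu, Arg/Lys, Ser/Thr, and the branched-chain Ile/Leu/Val), gapped columns are excluded from the denominator (the text does not say how gaps were treated), and the aligned sequences are hypothetical toys because the full alignment is given only in Fig. 4.

```python
# Conservative classes named in the text; the branched-chain class is Ile/Leu/Val.
CONSERVATIVE = [set("DE"), set("RK"), set("ST"), set("ILV")]


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and similarity over the gap-free columns of an alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(x == y for x, y in columns)
    conservative = sum(
        x != y and any(x in c and y in c for c in CONSERVATIVE) for x, y in columns
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n


# Hypothetical toy alignment: D/E, S/T and I/L are conservative; A/G is not.
print(identity_and_similarity("DKSIA", "EKTLG"))  # (20.0, 80.0)
```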
When the many substitutions between Asp and Glu, Arg and Lys, and Ser and Thr, and among the branched-chain amino acids, are taken into account, the structural similarity is estimated to be nearly 93%. The carboxyl-terminal side is closely conserved. A dot matrix comparison of the two cDNA sequences, in which a dot was plotted for each pair of 8-nucleotide segments containing 6 identical nucleotides, also showed a line in the coding regions for the mature protein (data not shown). This result is consistent with the finding that the human cDNAs were identified by hybridization to the chicken cDNAs under highly stringent conditions.
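The dot-matrix criterion described above can be written out directly. A minimal sketch, assuming that "contain 6 identical nucleotides" means at least 6 matches at corresponding positions in each pair of 8-nucleotide windows; the function name and test sequences are hypothetical. Related sequences produce dots along a diagonal, which is the "line" reported for the coding regions.

```python
def dot_matrix(s1: str, s2: str, window: int = 8, min_identical: int = 6) -> set[tuple[int, int]]:
    """Window start pairs (i, j) where s1[i:i+window] and s2[j:j+window]
    share at least min_identical identical nucleotides position by position."""
    dots: set[tuple[int, int]] = set()
    for i in range(len(s1) - window + 1):
        w1 = s1[i:i + window]
        for j in range(len(s2) - window + 1):
            w2 = s2[j:j + window]
            if sum(a == b for a, b in zip(w1, w2)) >= min_identical:
                dots.add((i, j))
    return dots


print(sorted(dot_matrix("AAAAAAAA", "AAAAAATT")))  # [(0, 0)]: 6 of 8 match
print(sorted(dot_matrix("AAAAAAAA", "AAAAATTT")))  # []: only 5 of 8 match
```

This brute-force version performs on the order of len(s1) * len(s2) * 8 comparisons, so it is slow in pure Python for two ~3.5-kb cDNAs; it is meant to show the scoring rule, not to be efficient.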
Repetitive amino acid sequences are found in both the chicken and human enzymes. A nonapeptide, ILSTPFKRT, is found in the chicken enzyme (amino acids 492-500), and its counterpart appears to be repeated as ILDTRPFKKT (amino acids 848-857). Additional repeats, SSAELAPISW and SSAILPISW (amino acids 548-557 and 791-799), lie between the two copies of the first repeat. The human enzyme also contains SSSELAPITW ...
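Approximate repeats like these can be located by sliding a motif along a protein sequence and counting mismatches. This is a minimal Hamming-distance sketch with a hypothetical function name and toy sequence. It finds only same-length variants, so pairs that differ in length, such as SSAELAPISW and SSAILPISW or ILSTPFKRT and ILDTRPFKKT, would need an alignment that allows gaps.

```python
def approximate_occurrences(protein: str, motif: str, max_mismatches: int) -> list[int]:
    """0-based starts of windows that differ from motif at <= max_mismatches positions."""
    k = len(motif)
    hits = []
    for i in range(len(protein) - k + 1):
        mismatches = sum(a != b for a, b in zip(protein[i:i + k], motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# Toy sequence embedding the human repeat; the chicken motif differs at 2 positions (A/S, S/T).
print(approximate_occurrences("XXSSSELAPITWXX", "SSAELAPISW", 2))  # [2]
```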
Lys738 of the chicken enzyme can be assigned to the pyridoxal phosphate binding site, because we confirmed that the phosphopyridoxyl peptide reported by Fujiwara et al. (26) is located at amino acids 704-757 of the precursor chicken enzyme. By structural similarity, Lys754 is suggested to be the pyridoxal phosphate binding site of the human enzyme. These lysine residues, each followed by the glycine-rich sequence PHGGGGPGMGPIGVKK, lie between the serine-rich and inner repetitive sequences of the chicken and human enzymes. Secondary structure analysis using the algorithm of Garnier et al. (27) shows that the active site lysine and the glycine-rich region are embedded in a segment rich in β-structure (53%), β-turn (29%), and random coil (12%) (amino acids 703-804 in the chicken enzyme). Only 6 amino acids in this segment were predicted to form an α-helix. The human enzyme has a similar structure (Fig. 6, A and B).

DISCUSSION

This is the first report on the cloning of cDNA encoding glycine decarboxylase and on its deduced structure. This protein was first found in Peptococcus glycinophilus and designated the P1 component (3). Stauffer et al. (23) also suggested that the Escherichia coli genome encodes this enzyme system as an operon. The eukaryotic enzyme was purified from chicken liver by Hiraga and Kikuchi (2). However, no study of the entire structure of this enzyme has been reported.

As shown in Table I, B, we deduced a peptide of 15 amino acids for the amino terminus of the chicken enzyme. However, the recoveries of the amino acid derivatives were extremely low throughout this experiment. It is possible that Gly35 is an artificial amino terminus formed by proteolysis during purification. Several Gln residues are found on the amino-terminal side of Gly35. If one of them cyclizes to form a pyroglutamyl residue (24), the true amino-terminal amino acid cannot be identified by the method used in the present study. Since the calculated amino acid composition of the deduced mature glycine decarboxylase beginning at Gly35 and consisting of 970 amino acids (about 108 kDa) did not differ greatly from the value reported for the protein purified from chicken liver (905 amino acids, Mr = 100,154) (2), the true amino terminus, if different, is probably not far from Gly35. By structural similarity, Gly40 was assigned as the amino terminus of the human enzyme, and the overall homology of the putative mature enzymes is estimated to be about 84%. The similar arrangement of predicted α-helices also supports the structural conservation of the two enzymes. It is likely that glycine decarboxylases from different animals are immunochemically similar and are strongly stimulated by H-proteins from different organisms (12, 25).
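The 108-kDa figure for the 970-residue mature chicken form can be reproduced from numbers given in the text. The sketch below shows two routes under stated assumptions, with hypothetical names: an exact average-mass sum for when the sequence is available (commonly tabulated average residue masses, no modifications), and a proportional estimate that assumes the mean residue mass of the 1,004-residue precursor (Mr = 111,848) also applies to the mature form.

```python
# Approximate average residue masses (Da), i.e. amino acid minus one water,
# as commonly tabulated (e.g. by ExPASy).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free amino and carboxyl termini


def average_mass(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


# Without the full sequence: scale by the precursor's mean residue mass.
precursor_mr, precursor_len = 111_848, 1_004
mean_residue = precursor_mr / precursor_len   # about 111.4 Da per residue
mature_len = precursor_len - 34               # 34-residue presequence before Gly35
print(mature_len, round(mature_len * mean_residue))  # 970 108060
```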
Two conserved and distinct structures are found in both enzymes. One is the repetitive sequence, and the other is the glycine-rich sequence located between the repeats in the carboxyl-terminal half (amino acids 743-758 and 759-774 in the chicken and human enzymes, respectively). The glycine-rich sequences of the two enzymes are identical and very similar to that of E. coli D-serine dehydratase (28, 29). This region of E. coli D-serine dehydratase is suggested to be involved in the active site or in the organization of the stereochemical structure of the active site, possibly as an attachment site for the phosphate group of pyridoxal phosphate (28, 29). Nicotiana tabacum ribosomal protein L2 (30) also has a similar glycine-rich structure, which can be aligned with that of glycine decarboxylase with 4 substituted and 1 deleted amino acids within the 16-amino acid sequence (PHGGGE-GRAPIGRKK). Ribosomal protein L2 is required for peptidyltransferase activity (31) and possibly interacts with the phosphate moiety of RNA or of nucleotides such as GTP and GDP. Other pyridoxal enzymes also contain a homologous glycine-rich region (28).
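The tally of 4 substitutions and 1 deletion can be verified column by column. A minimal sketch with a hypothetical function name that classifies each column of a gapped pairwise alignment, applied to the glycine decarboxylase motif and the ribosomal protein L2 segment aligned as PHGGGE-GRAPIGRKK.

```python
def classify_columns(a: str, b: str) -> tuple[int, int, int]:
    """Count identical, substituted, and gapped columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = substituted = gapped = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gapped += 1
        elif x == y:
            identical += 1
        else:
            substituted += 1
    return identical, substituted, gapped


# Glycine decarboxylase motif vs. the aligned L2 segment.
print(classify_columns("PHGGGGPGMGPIGVKK", "PHGGGE-GRAPIGRKK"))  # (11, 4, 1)
```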
In the enzymes mentioned above, the lysine residues binding pyridoxal phosphate are located 23-213 amino acids away from either end of this glycine-rich region, whereas in glycine decarboxylase the lysine residue and the glycine-rich region are in close proximity (5 amino acids away from the conserved proline residue of the glycine-rich region). This short segment of glycine decarboxylase is predicted to be embedded in a peptide rich in β-turns and random coils. These flexible structures might be responsible for the spectral change resulting from the interaction of glycine decarboxylase with H-protein (4). Neither rabbit serine hydroxymethyltransferase (which also binds glycine) nor other amino acid decarboxylases contain a similar glycine-rich region adjacent to their active site lysine (32, 33).
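The secondary-structure percentages reported for residues 703-804 are fractions of a per-residue prediction. A minimal sketch, assuming one state letter per residue; the encoding (h helix, b β-strand, t turn, c coil) and the example string are hypothetical, not the Garnier output for this enzyme. The arithmetic line shows that 6 helical residues are about 6% of the 102-residue segment, which together with 53%, 29%, and 12% accounts for the whole segment.

```python
from collections import Counter


def state_fractions(states: str) -> dict[str, float]:
    """Fraction of residues assigned to each secondary-structure state.

    `states` holds one letter per residue; here, hypothetically,
    h = alpha-helix, b = beta-strand, t = beta-turn, c = random coil.
    """
    counts = Counter(states)
    total = len(states)
    return {state: n / total for state, n in counts.items()}


segment_len = 804 - 703 + 1                          # chicken residues 703-804 inclusive
print(segment_len, round(100 * 6 / segment_len, 1))  # 102 5.9
print(state_fractions("bbbbttcthh"))                 # toy prediction string
```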
It is conceivable that the conserved repetitive sequences have a specific function necessary for enzyme activity, because these unique structures are present in the enzymes from different organisms. This possibility remains to be examined.