The Glycine Cleavage System

Several cDNAs encoding H-protein, a component of the glycine cleavage system, were cloned from chicken liver cDNA libraries by screening with an antibody raised against rat H-protein or with a nick-translated cDNA from an immunoreactive clone. The structure of the 910-base pair H-protein cDNA was determined from clones with overlapping nucleotide sequences. The cDNA encodes the precursor form of H-protein, which comprises a 39-residue mitochondrial presequence and a 125-residue mature protein, preceded by a 13-base pair 5' untranslated region. The 3' untranslated region of the cDNA contains two consensus sequences for cleavage/polyadenylation of the precursor H-protein mRNA. Comparison with the δ-aminolevulinate synthase gene showed that only one copy of the H-protein cDNA sequence occurs in the haploid chicken genome. Nevertheless, liver produces two types of H-protein mRNA that differ in the length of their 3' untranslated region. The chicken H-protein gene spans over 8 kilobase pairs of the genome and contains five exons that together encode the entire cDNA sequence. Both AATAAA motifs lie in the last exon of this gene, suggesting that the differently sized H-protein mRNAs are produced by alternative use of these motifs.

expression of this activity. The purified glycine decarboxylase and H-protein form an enzyme complex in vitro; 1 mol of glycine decarboxylase (a homodimer) binds 2 mol of monomeric H-protein, which induces a structural change at the active site of glycine decarboxylase and converts it to the active enzyme (8). This interaction plays a key role in catalyzing the initial step of glycine cleavage (8).
The amounts of the component proteins, as estimated from their specific enzyme activities, appear to vary from tissue to tissue in vertebrates (9). One might therefore expect the component enzymes to share closely related modes of biosynthesis. To our knowledge, no study has yet reported either the biosynthesis of the components of the glycine cleavage system or its regulation. We therefore set out to clone a cDNA encoding chicken H-protein and to use it as a probe for studying the regulatory mechanisms involved. We have already isolated a human H-protein cDNA (10), and the human H-protein gene is being characterized in our laboratory. In this paper, we report the cloning and structures of the cDNA and gene encoding chicken H-protein, and suggest that two types of H-protein mRNA are produced by alternative use of two AATAAA motifs located in a single exon.

EXPERIMENTAL PROCEDURES
Materials-One-month-old White Leghorn hens were used. Radioactive nucleotides were purchased from Du Pont-New England Nuclear, and ¹²⁵I-labeled anti-rabbit IgG F(ab')₂ was from Amersham, Japan. The plasmid vectors used were pGEM1 and pGEM3 (Promega) and Bluescript (Stratagene Cloning Systems). Other materials were obtained commercially.
Cloning of Chicken H-Protein cDNA-We first confirmed that an immunopurified antibody raised against rat H-protein (10) reacted with chicken H-protein.
A chicken liver cDNA expression library (11) was screened as described previously (11) with this antibody and with the cDNA of an immunoreactive clone (pCH3a, about 0.5 kb). A commercial library (CLONTECH Laboratory Inc.) was also screened to obtain longer cDNA inserts.
RNA and Genomic DNA Blot Hybridization-Chicken liver total RNA was prepared by the method of Fyrberg et al. (12), and an aliquot was subjected to two cycles of oligo(dT)-cellulose (Collaborative Research Inc.) column chromatography. Either 40 µg of total RNA or 4 µg of poly(A)+ RNA was denatured with formaldehyde and formamide and subjected to Northern analysis on a 1.2% agarose gel containing formaldehyde (13).
High molecular weight DNA was obtained from HD6 cells, a chicken erythroblast cell line (14), and from chicken liver as described (14). The liver cells were crudely dispersed by passage through several layers of nylon mesh and treated with proteinase K and sodium dodecyl sulfate (SDS). Aliquots (20 µg) were digested with restriction enzymes and subjected to Southern analysis.
Nick-translated probes were hybridized for about 15 h at 37 °C or 42 °C to RNA and DNA immobilized on nitrocellulose filters, in a solution composed of 50 mM Tris-HCl buffer (pH 8.0), 1 M NaCl, 5 mM EDTA, 4 mM sodium phosphate, 0.1% SDS, 50% formamide, 100 µg/ml salmon testes DNA, and 5× Denhardt's solution. The filters were washed at 37-54 °C for 6 h with several changes of a solution containing 10 mM Tris-HCl buffer (pH 8.0), 25 mM NaCl, 0.1% SDS, and 1 mM each of EDTA and sodium phosphate. RNA and DNA were located by autoradiography, and signal intensities were determined densitometrically with a Shimadzu TLC Scanner CS-910 equipped with a Chromatopac C-R1A integrator.

The abbreviations used are: H-protein, hydrogen carrier protein; SDS, sodium dodecyl sulfate; kb, kilobase pair(s).
Cloning of the Chicken H-Protein Gene-A chicken genomic library was constructed in the λEMBL3 vector by the method of Choi and Engel (14). Approximately 1 × 10⁶ phages in the initial preparation were amplified once, and 6 × 10⁵ phages were screened with the nick-translated pCH3a insert.
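How thoroughly a library of this size samples the genome can be estimated with the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one clone and N is the number of clones. The sketch below is illustrative only: the haploid chicken genome size (about 1.2 × 10⁹ bp) and the mean λEMBL3 insert size (15 kb) are assumptions not stated in the text, and the function names are hypothetical.

```python
import math

def coverage_probability(n_clones: float, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon: probability that a given sequence is present in a library
    of n_clones random clones, P = 1 - (1 - f)^N with f = insert / genome."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones

def clones_needed(prob: float, insert_bp: float, genome_bp: float) -> float:
    """Inverse form: number of clones N = ln(1 - P) / ln(1 - f)."""
    f = insert_bp / genome_bp
    return math.log(1.0 - prob) / math.log(1.0 - f)

# Assumed values (not given in the text).
GENOME_BP = 1.2e9   # approximate haploid chicken genome size
INSERT_BP = 15e3    # assumed mean lambda EMBL3 insert size

# 6e5 phages screened and 1e6 phages in the initial preparation (from the text).
for n in (6e5, 1e6):
    p = coverage_probability(n, INSERT_BP, GENOME_BP)
    print(f"N = {n:.0e}: P(sequence represented) = {p:.4f}")
print(f"N needed for P = 0.99: {clones_needed(0.99, INSERT_BP, GENOME_BP):.2e}")
```

The relation assumes randomly distributed, unbiased clones; because the library was amplified once, some sequences may be over- or under-represented, so the result is best read as an optimistic estimate.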
DNA Sequencing-Various truncated cDNA and genomic DNA fragments were subcloned. For some genomic clones, ordered sets of serial deletions were generated on both strands by the method of Henikoff (15). Nucleotide sequences were determined on the plasmids by the dideoxy chain-termination method of Sanger et al. (16) using 7-deaza-dGTP (17). Several commercially available oligonucleotides were used as primers. Where necessary, oligonucleotides complementary to sequences in the chicken H-protein gene were synthesized on a DNA synthesizer (model 381A, Applied Biosystems Inc., Japan) and used as primers.

RESULTS
Characterization of Chicken H-Protein cDNA-Several immunoreactive clones, including pCH3a, were isolated in the primary screening. None of the approximately 80 clones obtained with the pCH3a cDNA probe contained an insert longer than 800 bases. The restriction map, the relative locations of the cDNA clones, and the sequencing strategy are summarized in Fig. 1, A, B, and C.
The nucleotide sequence of the H-protein cDNA was determined from the inserts of pCH37e, pCH50, and pCH62 (Fig. 2). The pCH37e cDNA comprises 561 base pairs including a poly(A) tail, and an open reading frame begins with a methionine codon at nucleotides 14 to 16. The sequence surrounding this codon matches the consensus expected for an initiator methionine codon (18), and a protein of 164 amino acid residues is deduced from this reading frame. The entire sequence reported for mature chicken H-protein (beginning at Ser-40) by Fujiwara et al. (19) is contained in the deduced protein, indicating that all the cDNAs listed encode chicken H-protein. The first 39 amino acid residues can be assigned to a mitochondrial presequence. The pCH34d and pCH50 cDNAs were significantly longer than the others, and their 3'-end sequences differed from those of the pCH3a and pCH37e inserts. The sequence from nucleotide 506 to 542 (ending at the T marked # in Fig. 2) is present in both the pCH3a and pCH37e cDNAs and is followed directly by a poly(A) sequence; in the pCH50 cDNA, by contrast, this region extends further, to nucleotide 892. A clone designated pCH62 contained a sequence identical with that of the pCH50 insert, followed by a further 18-bp sequence containing an AATAAA motif. These results indicate that two types of cDNA clones, differing in their 3' untranslated regions, were isolated.
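The coordinates given above are mutually consistent, and the bookkeeping can be checked with a few lines of arithmetic. The sketch below uses only numbers from the text (13-nt 5' untranslated region, 39 + 125 residues, pCH50 ending at nucleotide 892, an 18-bp pCH62 extension); the function name is hypothetical, and it assumes the stop codon immediately follows codon 164. Under that assumption the stop codon falls at nucleotides 506-508, the start of the 506-542 stretch shared by pCH3a and pCH37e, and the longest cDNA totals 910 bp.

```python
# Values from the text; the function name is illustrative.
UTR5_LEN = 13      # 5' untranslated region of pCH37e (nt)
PRESEQ_AA = 39     # mitochondrial presequence (residues)
MATURE_AA = 125    # mature H-protein, beginning at Ser-40

def orf_layout(utr5_len: int, n_codons: int) -> dict:
    """1-based cDNA coordinates of the start codon, the last sense base, and the
    codon immediately after the ORF (assumed to be the stop codon)."""
    start = utr5_len + 1
    last_sense = start + 3 * n_codons - 1
    return {
        "start_codon": (start, start + 2),
        "last_sense_base": last_sense,
        "stop_codon": (last_sense + 1, last_sense + 3),
    }

precursor_aa = PRESEQ_AA + MATURE_AA
layout = orf_layout(UTR5_LEN, precursor_aa)
print("precursor length:", precursor_aa)
print(layout)
print("first mature residue (precursor numbering):", PRESEQ_AA + 1)
print("longest cDNA, pCH50 end plus pCH62 extension:", 892 + 18)
```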
Occurrence of the Two Types of H-Protein mRNAs-Northern analysis also detected multiple H-protein mRNAs in chicken liver total RNA. As shown in Fig. 3, the cDNA insert from pCH3a, most of which corresponds to the protein-coding region, detects two bands of about 1.3 and 0.9 kb (lane 1). On the autoradiogram, the 0.9-kb band appears about twice as intense as the 1.3-kb band. In contrast, the pCH50EcoT22/EcoRI cDNA fragment, which mostly covers the 3' untranslated region, hybridizes strongly to the longer RNA but gives a much weaker signal for the 0.9-kb RNA (lane 2). These results indicate that chicken liver contains two types of H-protein mRNA that differ in the length of their 3' untranslated region.
Single Genomic Locus Specifying Chicken H-Protein-Two mechanisms could account for such heterogeneity in the 3' noncoding region: alternative use of multiple poly(A) sites, or differential use of separate exons encoding different 3' untranslated regions. To distinguish between them, we first examined the copy number of the H-protein cDNA sequence in the chicken genome and then the structure of the chicken H-protein gene, especially the exon encoding the 3' untranslated region. Fig. 4 shows the results of Southern analysis. The pCH3a probe hybridizes to 11- and 1.5-kb EcoRI fragments (lane 2) and to a HindIII fragment of about 6 kb (lane 3). In contrast, the pCH50EcoT22/EcoRI probe hybridizes only to the 1.5-kb EcoRI fragment (lane 5) and gives a 5.6-kb band in HindIII-digested DNA (lane 6). The 5' region of the H-protein cDNA (the 5' EcoRI/EcoRV fragment of the pCH37e insert) gives a single 11-kb band among the EcoRI fragments (data not shown). These results demonstrate that the sequence downstream from the EcoT22I site of the H-protein cDNA lies within the 1.5-kb EcoRI fragment of genomic DNA. Comparison of the summed fragment sizes in the EcoRI and HindIII lanes suggests that the signal in lane 3 comprises two fragments of similar size. (Note: one of the HindIII sites needed to produce the 5.6-kb fragment in lane 6 appears to lie beyond the region shown in Fig. 5B.) The pCH3a probe hybridizes to much larger fragments in genomic DNA digested with BamHI (lane 1) or KpnI (lane 4).
The copy number of the H-protein cDNA sequence was determined by Southern analysis with the δ-aminolevulinate synthase gene as a standard, because this gene is specified by a single locus in the chicken genome (20). The pCH34d cDNA (a 769-bp EcoRI fragment) was ligated to the 790-bp EcoRI/SphI fragment of the δ-aminolevulinate synthase cDNA pAL10 (14), and the product was subcloned into the EcoRI and SphI sites of the pGEM3 vector (pAHlink). This insert was nick-translated and hybridized to chicken genomic DNA digested with either EcoRI or HindIII. As shown in lane 7 of Fig. 4C, the 11- and 1.5-kb fragments derive from the H-protein gene, and the bands marked A derive from the δ-aminolevulinate synthase gene (20). The integrated intensity of the H-protein gene fragments (98 arbitrary units) is similar to that of the marker gene (112 units). The genomic DNA treated with
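The copy-number argument reduces to a ratio of integrated intensities against a single-copy reference. Because both cDNA segments sit in one nick-translated insert, they presumably carry comparable specific activity, so the ratio can be read directly. The sketch below is illustrative: the function name is hypothetical, and the optional division by probe-segment length (769 vs 790 bp) is an added assumption, not a step described in the text. Either way, the ratio rounds to one copy.

```python
from typing import Optional

def relative_copy_number(target_signal: float, reference_signal: float,
                         target_probe_bp: Optional[float] = None,
                         reference_probe_bp: Optional[float] = None) -> float:
    """Target copies per copy of a single-copy reference gene.

    Optionally divides each signal by the length of the probe segment that
    produced it (an added assumption: signal scales with probe length)."""
    t, r = target_signal, reference_signal
    if target_probe_bp is not None and reference_probe_bp is not None:
        t /= target_probe_bp
        r /= reference_probe_bp
    return t / r

# Integrated intensities (arbitrary units) and probe segment lengths from the text.
raw = relative_copy_number(98, 112)
norm = relative_copy_number(98, 112, 769, 790)
print(f"raw ratio {raw:.2f}; length-normalized {norm:.2f}; nearest integer {round(norm)}")
```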

Fig. 4. ... (lanes 3, 6, and 8), and KpnI (lane 4), separated on a 0.8% agarose gel, and transferred to nitrocellulose filters. The nick-translated inserts from pCH3a (lanes 1, 2, 3, and 4), pCH50EcoT22/EcoRI (lanes 5 and 6), and pAHlink (lanes 7 and 8) were used as probes. Fragments from the δ-aminolevulinate synthase and H-protein genes are indicated as A and H in panel C.

pCH37e. Therefore, the genomic region corresponding to nucleotides 1-134 of the cDNA was designated Exon A, and the subsequent exons B, C, D, and E (Fig. 5, A, B, and C).
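Mapping an exon boundary onto the reading frame only requires the position of the initiator codon. The sketch below converts a 1-based cDNA coordinate into a codon number and a position within that codon, taking the ATG at nucleotides 14-16 from the text; the function and constant names are hypothetical. If the numbering and the Exon A boundary at nucleotide 134 are exact as given, Exon A ends one base into codon 41, so the first intron would interrupt a codon, and the codon for Ser-40 (the first residue of the mature protein, nucleotides 131-133) would lie entirely within Exon A.

```python
from typing import Tuple

CDS_START = 14  # first base of the ATG in pCH37e (nt 14-16, from the text)

def codon_position(nt: int, cds_start: int = CDS_START) -> Tuple[int, int]:
    """(codon number, base position 1-3 within that codon) for a 1-based cDNA
    coordinate inside the coding region."""
    if nt < cds_start:
        raise ValueError("nucleotide lies in the 5' untranslated region")
    offset = nt - cds_start
    return offset // 3 + 1, offset % 3 + 1

EXON_A_END = 134  # last cDNA nucleotide assigned to Exon A (from the text)
print("last base of Exon A:", codon_position(EXON_A_END))
print("Ser-40 codon spans:", codon_position(131), "to", codon_position(133))
```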
The exact boundaries of these exons are shown in Fig. 2, and the genomic sequences at the exon/intron boundaries and those flanking the 5' and 3' ends of the cDNA structure are presented in Fig. 6.
Exon A encodes the 13-bp 5' untranslated region followed by the mitochondrial presequence, and Exon D includes the codon for the lysine at the lipoic acid-binding site (Lys-98 in the precursor form). One of the HindIII sites that produce the 5.6-kb fragment in Fig. 4B is not present in the genomic fragments cloned and analyzed, suggesting that this HindIII site lies downstream ... It seems to be a feature of exon-intron boundaries that they ... sequence to Ala and Leu, respectively.

Types of H-Protein mRNAs, in a Single Exon-The last exon, Exon E, is preceded by a 0.6-kb intron and is appreciably longer than the other exons. The cDNA sequence encoding the carboxyl-terminal region of H-protein (32 amino acid residues) and the 3' untranslated region matches the genomic sequence, which continues for a further 167 bp to the EcoRI site at the 3' end of the 1.5-kb fragment shown in Fig. 4. Thus both AATAAA motifs lie in a single exon. Given that the genome contains only one copy of the H-protein cDNA sequence, alternative use of different exons can be excluded as the source of the two H-protein mRNAs. Instead, these results indicate that the two types of H-protein mRNA are formed by alternative use of the two poly(A) sites in the last exon.
Several G,T-rich sequences homologous to the second consensus element, which increases the efficiency of poly(A) site formation in precursor mRNAs (22,23), lie close to the two AATAAA motifs of the H-protein gene. As listed in Table I, TGTAGT and CTGTTTT lie 23 and 64 bp downstream of the upstream AATAAA, and TTGTTTT, CTGTGCTTT, and GCTGTGTTTT are likely candidates for the second consensus element of the downstream AATAAA.

DISCUSSION
The chicken H-protein cDNA clones have been isolated and characterized. Only the pCH37e cDNA contains an ATG codon that fulfills the requirements for an initiator methionine codon. Considering the size difference between the shorter H-protein mRNA and the pCH37e cDNA, and the usual poly(A) tail length of 100-200 nucleotides, the mRNA may carry additional 5' untranslated sequence of unknown length that is not represented in the cDNA.
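The size argument can be made explicit. The sketch below subtracts the pCH37e sequence (nucleotide 542, the last base before its poly(A) tail, per the Results) plus a 100-200-nucleotide poly(A) tail from a nominal 900-nt mRNA; treating the 0.9-kb Northern estimate as exactly 900 nt is an assumption, and the function name is hypothetical. The result, roughly 160-260 nt of unaccounted length, inherits the uncertainty of a gel-based size estimate and is only an order-of-magnitude indication.

```python
from typing import Tuple

def unaccounted_length(mrna_nt: int, cdna_body_nt: int,
                       polya_min: int = 100, polya_max: int = 200) -> Tuple[int, int]:
    """Range of mRNA length (nt) not explained by the cDNA body plus a poly(A)
    tail of polya_min..polya_max nt, returned as (smallest gap, largest gap)."""
    return (mrna_nt - (cdna_body_nt + polya_max),
            mrna_nt - (cdna_body_nt + polya_min))

# Nominal 900-nt shorter mRNA (assumed from the ~0.9-kb band);
# pCH37e sequence ends at nt 542 before its poly(A) tail.
gap = unaccounted_length(900, 542)
print(f"unaccounted length: {gap[0]}-{gap[1]} nt")
```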
Two types of H-protein cDNA were cloned, and chicken liver total RNA contains two types of H-protein mRNA that differ in their 3' untranslated regions. The present study provides two lines of evidence bearing on the mechanism by which the multiple H-protein mRNAs are produced. First, H-protein is encoded by a single locus in the chicken genome. Second, structural analysis of the cloned H-protein cDNA and gene shows that the two polyadenylation consensus sequences lie within a 371-bp span of the H-protein cDNA, in the last exon (Exon E) of the H-protein gene. These observations strongly suggest that alternative use of the two AATAAA motifs in the precursor mRNA produces the differently sized H-protein mRNAs in the chicken. In this context, several T- or G,T-rich sequences in the chicken H-protein gene may act as the second stimulatory consensus element for poly(A) site formation, as described for other genes (22,23).
Multiple transcripts have been reported for the mouse dihydrofolate reductase gene; in that case, altered consensus sequences give rise to various amounts of transcript relative to that formed by recognition of the AATAAA sequence (24). In the case of the H-protein mRNAs, chicken liver appears to be rich in the shorter mRNA (about 67% of total H-protein mRNA). Assuming equal degradation rates for the two H-protein mRNAs, this result suggests that the two consensus motifs are used with different frequencies. Run-off transcription products are being analyzed in a separate series of studies to determine whether transcription termination activity resides in the region between the two AATAAA motifs.
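The 67% figure follows from the Northern result that, with the pCH3a probe, the 0.9-kb band is about twice as intense as the 1.3-kb band. The sketch below makes the arithmetic explicit; it assumes that signal is proportional to abundance and that both mRNAs hybridize equally to a probe drawn mostly from the shared coding region. The function name is hypothetical.

```python
def short_form_fraction(short_signal: float, long_signal: float) -> float:
    """Fraction of total H-protein mRNA in the short form, assuming both forms
    hybridize equally to a coding-region probe and signal tracks abundance."""
    return short_signal / (short_signal + long_signal)

# Relative band densities with the pCH3a probe: short ~2, long ~1 (from the text).
frac = short_form_fraction(2.0, 1.0)
print(f"short mRNA: {frac:.0%} of total")
```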
The primary structure deduced for H-protein contains that of the mature protein reported by Fujiwara et al. (19) and is encoded almost entirely by four exons (B to E). The structural similarity of H-protein to other lipoyl enzymes has been partly discussed (5, 19), and Hummel et al. (25) pointed out a glycine conserved at the 11th residue on the carboxyl-terminal side of the active-site lysine (Lys-98 in the precursor form). Comparing the structures around the lipoic acid-binding sites of H-protein and of the lipoyl enzymes of α-keto acid dehydrogenases (25-31), we note that a second conserved glycine may be located at the 16th residue on the amino-terminal side of Lys-98. A further feature appears to be that the amino-terminal side of Lys-98 is rich in hydrophilic amino acids but contains no basic amino acid within the proximal 10 residues, whereas the carboxyl-terminal side consists of neutral or hydrophobic amino acids. We designate the structure between the two conserved glycine residues as a subdomain of the lipoic acid-binding site. This subdomain is encoded by Exons C and D of the chicken H-protein gene. However, the structural organization of the genes encoding other eukaryotic lipoyl enzymes has not been reported. It will be of interest to compare the exon structures of the limited number of proteins that participate in the metabolism of α-keto acids and glycine and use lipoic acid as a prosthetic group (19, 25-31). It is conceivable that this family of proteins shares a subdomain that evolved from a common ancestral protein.
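Because residue numbers switch between precursor and mature numbering, the positions of the subdomain landmarks are easy to mix up. The sketch below converts them using the 39-residue presequence from the text. It reads "the 11th residue" and "the 16th residue" as offsets of +11 and -16 from Lys-98, which is an interpretation; the function and constant names are hypothetical. On that reading the subdomain runs from Gly-82 to Gly-109 in precursor numbering (28 residues), and Lys-98 corresponds to Lys-59 of the mature protein.

```python
PRESEQ_AA = 39     # mitochondrial presequence length (from the text)
LIPOYL_LYS = 98    # lipoyl lysine, precursor numbering (from the text)

def to_mature(pos: int, preseq: int = PRESEQ_AA) -> int:
    """Precursor residue number -> mature numbering (Ser-40 becomes residue 1)."""
    if pos <= preseq:
        raise ValueError("residue lies in the mitochondrial presequence")
    return pos - preseq

gly_c = LIPOYL_LYS + 11   # Gly 11 residues C-terminal to the Lys (Hummel et al.)
gly_n = LIPOYL_LYS - 16   # proposed Gly 16 residues N-terminal to the Lys
subdomain_len = gly_c - gly_n + 1   # inclusive of both glycines

for name, pos in [("Gly (N-side)", gly_n), ("Lys", LIPOYL_LYS), ("Gly (C-side)", gly_c)]:
    print(f"{name}: precursor {pos}, mature {to_mature(pos)}")
print("subdomain length (residues, inclusive):", subdomain_len)
```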