Homologous nucleotide sequences at the 5' termini of messenger RNAs synthesized from the yeast enolase and glyceraldehyde-3-phosphate dehydrogenase gene families. The primary structure of a third yeast glyceraldehyde-3-phosphate dehydrogenase gene.

Genomic DNA containing a third yeast glyceraldehyde-3-phosphate dehydrogenase structural gene has been isolated on a bacterial plasmid designated pgap11. The complete nucleotide sequence of this structural gene was determined. The gene contains no intervening sequences, codon usage is highly biased, and the nucleotide sequence of the coding portion of this gene is 90% homologous to the other two glyceraldehyde-3-phosphate dehydrogenase genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). Based on the extent of nucleotide sequence divergence among the three glyceraldehyde-3-phosphate dehydrogenase genes, it is likely that they arose as a consequence of two duplication events and the gene contained on the hybrid plasmid designated pgap11 is a product of the first duplication event. All three structural genes share extensive nucleotide sequence homology in the 5'-noncoding regions adjacent to the three respective translational initiation codons. The gene contained on pgap11 is not homologous to the others downstream from the respective translational termination codon, however. The 5' termini of messenger RNAs synthesized from the three glyceraldehyde-3-phosphate dehydrogenase and two yeast enolase genes have been mapped to sites ranging from 36 to 82 nucleotides upstream from the respective translational initiation codons. In each case the 5' terminus of the mRNA maps to a region of strong nucleotide sequence homology which is shared by all five structural genes. These latter data confirm that all five structural genes are expressed during vegetative cell growth and further support the hypothesis that a portion of the 5'-noncoding flanking region of the yeast glyceraldehyde-3-phosphate dehydrogenase and enolase genes evolved from a common precursor sequence.

(Author footnote: American Heart Association Established Investigator.)
The existence of three yeast glyceraldehyde-3-phosphate dehydrogenase structural genes was first suggested from DNA filter-blotting experiments using restriction endonuclease-cleaved genomic DNA and hybridization probes which are complementary to glyceraldehyde-3-phosphate dehydrogenase mRNA sequences (1, 2). Two segments of genomic DNA were subsequently isolated on bacterial plasmids and shown by nucleotide sequencing to contain structural genes which are 95% homologous (3, 4). Here we report the isolation and nucleotide sequence of the third yeast glyceraldehyde-3-phosphate dehydrogenase structural gene. Glyceraldehyde-3-phosphate dehydrogenase is expressed at high levels in yeast (5, 6) and it is of interest to know if all three structural genes are expressed during vegetative cell growth. There have been numerous reports of multiple forms of the enzyme isolated from yeast, but it is unclear if these represent isozymes or forms which have been modified during isolation. The most compelling evidence that at least two of the genes are expressed is based on the fact that a limited number of ambiguities in the primary structure determined for the yeast enzyme (7) can be reconciled if the sequenced protein was a mixture of at least two of the polypeptides encoded by the isolated genes (4). Evidence is presented here that mRNA is synthesized in vivo from all three structural genes during vegetative cell growth.
The 5' termini of the three glyceraldehyde-3-phosphate dehydrogenase mRNAs and the mRNAs synthesized from the two yeast enolase genes (8) have been identified. In all cases the 5' termini map to a region of nucleotide sequence homology which is shared by all five structural genes. These data support our hypothesis that the 5'-noncoding regions of the glyceraldehyde-3-phosphate dehydrogenase and enolase genes are structurally related (8).

RESULTS
Isolation of pgap11-There are three segments of yeast genomic DNA which are complementary to cDNA hybridization probes synthesized from purified yeast glyceraldehyde-3-phosphate dehydrogenase mRNA (1, 2). Two of these regions of genomic DNA were subsequently isolated and shown to contain glyceraldehyde-3-phosphate dehydrogenase structural genes (3, 4). When yeast genomic DNA is analyzed by DNA filter blotting after EcoRI digestion, 13.2-, 5.5-, and 4.3-kb fragments form hybrids with a nick-translated hybridization probe synthesized from the 2.1-kb HindIII fragment which contains the glyceraldehyde-3-phosphate dehydrogenase gene within the plasmid pgap491 (3). The 13.2-kb EcoRI fragment has been isolated on a plasmid designated pgap492 (2, 4) and corresponds to the structural gene already isolated on the plasmid designated pgap49 (1). The 5.5-kb EcoRI fragment corresponds to the structural gene previously isolated on the plasmid designated pgap63 (4). The 4.3-kb EcoRI fragment was cloned as previously described for the isolation of pgap492 (2). EcoRI-digested yeast DNA was ligated into the EcoRI site of the vector pSF2124. Total plasmid DNA was isolated from the shotgun collection obtained, and supercoiled plasmids corresponding in size to pSF2124 plus a 4.5-kb insertion of yeast DNA were isolated by preparative agarose gel electrophoresis. After retransformation of competent Escherichia coli with this enriched fraction of plasmids, the collection was screened by colony hybridization using a nick-translated probe synthesized from the 2.1-kb HindIII fragment isolated from pgap491. A hybrid plasmid designated pgap11 was identified and analyzed as described below.
Restriction Endonuclease Mapping of pgap11: Comparison with pgap492 and pgap63-In order to confirm that the sequences isolated on pgap11 are co-linear with genomic DNA sequences, DNA filter blotting was carried out in parallel with the isolated plasmid and yeast genomic DNA. [...] fragment from pgap11 corresponds to the smallest hybrid formed in genomic DNA. These data confirm that pgap11 contains a segment of DNA which is contiguous with corresponding genomic sequences.
Similar blotting experiments were carried out with genomic DNA isolated from several haploid strains of Saccharomyces cerevisiae (data not shown). In some strains, we observed polymorphisms with respect to the location of EcoRI and HindIII restriction endonuclease cleavage sites adjacent to the glyceraldehyde-3-phosphate dehydrogenase genes. Comparison of these blots with those generated with DNA isolated from strain F1 demonstrated that there are three glyceraldehyde-3-phosphate dehydrogenase structural genes per haploid genome in all the strains tested and the three genes corresponded in all cases to those isolated on pgap492 (2, 4), pgap63 (4), and pgap11.
A restriction endonuclease cleavage map of pgap11, illustrated in Fig. 2, was generated as described under "Experimental Procedures." The location of the coding sequences in the plasmid and the direction of transcription were determined from the nucleotide sequence described below and are indicated in Fig. 2 by the shaded region and arrow, respectively. The HpaI cleavage site within the coding portion of the gene in pgap11 is homologous to HpaI cleavage sites in the genes contained within pgap492 and pgap63 (4). The restriction endonuclease cleavage map of pgap11 adjacent to the coding sequences is not homologous to the other two structural genes. These data confirm that none of the yeast glyceraldehyde-3-phosphate dehydrogenase genes are tandemly repeated in the yeast genome.
A more detailed restriction endonuclease map of the coding and adjacent noncoding sequences of the glyceraldehyde-3-phosphate dehydrogenase gene within pgap11 is shown in Fig. 3. The restriction endonuclease cleavage maps of the corresponding regions of pgap491 and pgap63 (4) are also shown for comparison. Within the coding regions of the genes, approximately 65% of the restriction endonuclease cleavage sites are present in all three structural genes. These data demonstrate, together with the primary structure described in the following section, that the three structural genes are very homologous. No homology among the genes is observed for restriction endonuclease cleavage sites which map outside of the respective coding sequences.
Primary Structure of the Glyceraldehyde-3-phosphate Dehydrogenase Gene in pgap11-The complete nucleotide sequence of the coding region of the glyceraldehyde-3-phosphate dehydrogenase gene contained within pgap11 is shown in Fig. 4, as are the sequences for the genes in pgap491 (3) and pgap63 (4). The primary structure was determined using the sequencing strategy outlined in Fig. 5. The gene in pgap11 contains a single open reading frame without intervening sequences. The primary structure of the polypeptide predicted from this gene is highly homologous to the primary structure determined for yeast glyceraldehyde-3-phosphate dehydrogenase (7), confirming that this plasmid does contain a third structural gene. The primary structure of the polypeptide encoded by the gene within pgap11 is 89 and 88% homologous to the polypeptides encoded by the genes within pgap63 and pgap491, respectively. Based on these data, it is possible that the gene within pgap11 was derived from an early duplication event and that the genes contained on pgap63 and pgap491 arose from a subsequent duplication.
Alternatively, the similarity between the genes on pgap63 and pgap491 could be due to preferential gene conversion between these two structural genes rather than a later duplication event. The possibility of gene conversion among the three glyceraldehyde-3-phosphate dehydrogenase genes is considered under "Discussion." Codon usage within the gene contained within pgap11 follows the same highly biased pattern previously described for those within pgap63 and pgap491 (3, 4). In the cases of alanine, aspartic acid, isoleucine, serine, threonine, and valine, two codons are used exclusively which contain either C or U in the third position. The remaining 14 amino acids are encoded by a single codon in approximately 98% of the cases. For the six amino acids which are encoded by two codons, alanine and aspartic acid are biased for GCU (81%) and GAC (68%), respectively. The codons for isoleucine, serine, threonine, and valine contain either C or U in the third position approximately 50% of the time. There are 84 positions within the three glyceraldehyde-3-phosphate dehydrogenase structural genes at which all three genes contain an isoleucine, serine, threonine, or valine codon. In 56 of these cases, the third positions of the respective codons are identical. In the remaining 28 cases, the nonidentical third position nucleotide occurred in pgap491, pgap63, or pgap11 in 46, 25, and 29% of the cases, respectively. The distribution of third position changes among the three structural genes will be discussed below.
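The three percentages quoted for the 28 variable third positions are consistent with whole-number counts of 13, 7, and 8. These counts are not stated in the text; they are inferred here from the rounded percentages, and the short sketch below only checks that the inference is self-consistent.

```python
# Back-calculate whole-number counts from the rounded percentages reported
# for the codon positions whose third nucleotides are not all identical.
# The counts are inferred for illustration; they are not given in the paper.
TOTAL_SHARED = 84   # positions with Ile/Ser/Thr/Val codons in all three genes
IDENTICAL = 56      # positions where all three third nucleotides agree
reported_pct = {"pgap491": 46, "pgap63": 25, "pgap11": 29}

variable = TOTAL_SHARED - IDENTICAL  # positions with a nonidentical third base


def inferred_counts(pct_by_gene, n):
    """Round each percentage of n to the nearest whole count."""
    return {gene: round(pct * n / 100) for gene, pct in pct_by_gene.items()}


counts = inferred_counts(reported_pct, variable)
for gene, k in counts.items():
    print(f"{gene}: {k}/{variable} = {100 * k / variable:.0f}%")
print("sum of inferred counts:", sum(counts.values()))
```

If the inferred counts did not sum to the number of variable positions, the percentages could not all refer to the same set of 28 codons.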
Within the portions of the three structural genes which encode the NAD-binding domains of the polypeptides (residues 1-147), the gene contained within pgap11 differs from those contained on pgap491 and pgap63 at 69 and 63 nucleotide positions, respectively. Within this same region, the genes contained on pgap491 and pgap63 differ by 30 nucleotides. Within the portions of the genes which encode the catalytic domains of the polypeptides (residues 148-331), the gene contained on pgap11 differs from those contained on pgap491 and pgap63 by 40 and 38 nucleotides, respectively. Within this region, the genes within pgap491 and pgap63 differ by 23 nucleotides. These data demonstrate that the observed rate of divergence within the portions of the genes which encode the NAD-binding domains of the polypeptides is higher than that observed within the catalytic domain.
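Because the two domains differ in length, the comparison is easier to see when the difference counts are expressed per nucleotide. The following is a minimal sketch that assumes each region spans three nucleotides per codon over the residue ranges given above; this normalization is an illustration and not necessarily the calculation the authors performed.

```python
# Per-nucleotide divergence for each domain, from the difference counts
# quoted in the text. Region lengths assume 3 nucleotides per codon.
CODON_RANGES = {"NAD-binding": (1, 147), "catalytic": (148, 331)}
DIFFERENCES = {
    "NAD-binding": {("pgap11", "pgap491"): 69,
                    ("pgap11", "pgap63"): 63,
                    ("pgap491", "pgap63"): 30},
    "catalytic": {("pgap11", "pgap491"): 40,
                  ("pgap11", "pgap63"): 38,
                  ("pgap491", "pgap63"): 23},
}


def region_length(first_codon, last_codon):
    """Number of nucleotides spanned by an inclusive codon range."""
    return 3 * (last_codon - first_codon + 1)


def percent_divergence(domain):
    """Percent of nucleotide positions that differ, for each gene pair."""
    n = region_length(*CODON_RANGES[domain])
    return {pair: 100 * d / n for pair, d in DIFFERENCES[domain].items()}


for domain in CODON_RANGES:
    for (a, b), pct in percent_divergence(domain).items():
        print(f"{domain:12s} {a} vs {b}: {pct:.1f}%")
```

Under this assumption every gene pair shows a higher per-nucleotide divergence in the NAD-binding region than in the catalytic region, which is the comparison the paragraph draws.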
The putative location of amino acid residues which differ among the three yeast glyceraldehyde-3-phosphate dehydrogenase polypeptides in the native enzyme has been ascertained by comparison with the three-dimensional structure determined for lobster glyceraldehyde-3-phosphate dehydrogenase (9,10). The majority of the amino acids which differ among the three polypeptides are predicted to reside at external regions of the enzyme. Amino acid differences at positions predicted to be within internal regions of the enzyme are chemically conservative amino acid changes. There are a limited number of amino acid substitutions which might alter the structure of the tetrameric enzyme; however, no striking structural alterations are predicted from this analysis.
The Primary Structure of the 5'- and 3'-noncoding Regions of the Glyceraldehyde-3-phosphate Dehydrogenase Gene in pgap11-The primary structure of the regions of the glyceraldehyde-3-phosphate dehydrogenase gene in pgap11 which are adjacent to the translational initiation and termination codons is shown in Fig. 6. As observed for the genes contained on pgap491 (3) and pgap63 (4), the A + T composition of the noncoding regions of the gene on pgap11 is high: 72% for 150 nucleotides upstream from the translational initiation codon and 74% for 120 nucleotides downstream from the translational termination codon. The hexanucleotide TATAAA is located 150 nucleotides upstream from the translational initiation codon in pgap11. This hexanucleotide is located 139 and 130 nucleotides upstream from the translational initiation codons in pgap491 and pgap63, respectively (4). There are two regions of strong nucleotide sequence homology between the 5'-noncoding sequences in pgap11 and those in pgap491 and pgap63. These regions are illustrated in Fig. 7. The sequences between -1 and -38 in pgap11 are homologous to the sequences between -1 and -47 in pgap491 and those between -1 and -56 in pgap63. The region between -39 and -88 in pgap11 is also homologous to these same regions in pgap491 and pgap63. Based on the strong similarities in primary structure between these two regions in pgap11, it is likely that they arose by tandem duplication.
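For readers who want to reproduce a base-composition figure of this kind from sequence data, the calculation is a simple count. The sketch below uses a short made-up sequence purely to exercise the function; it is not taken from pgap11, and the function name is hypothetical.

```python
def at_percent(seq):
    """Percentage of A and T residues in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * sum(seq.count(base) for base in "AT") / len(seq)


# Hypothetical 25-nucleotide flanking sequence (illustration only).
example = "TATAAATACCTTCAATAAGCTTATA"
print(f"A + T = {at_percent(example):.0f}%")
```

Applied to the 150 nucleotides upstream of the initiation codon or the 120 nucleotides downstream of the termination codon, the same function would give the flanking-region compositions discussed above.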
Within these same regions of nucleotide sequence homology among the glyceraldehyde-3-phosphate dehydrogenase genes, there are strong homologies with corresponding 5'-noncoding regions of the two yeast enolase genes (8). As shown in Fig. 7, all five genes contain a hexanucleotide homologous to CACACA, 5-15 nucleotides upstream from the respective translational initiation codons. A second region of strong sequence homology is located 22-41 nucleotides upstream from each respective initiation codon. Interestingly, the apparent duplication of sequences in pgap11 (nucleotides -39 to -88) begins with the CACACA hexanucleotide and ends with the second homologous region described above. The significance of these homologous regions will be discussed further in the following section.
[Figure legend] The amino acid sequence predicted from the nucleotide sequence is shown below the nucleotide sequence. Codons in pgap49 which differ from pgap11 are shown above the continuous sequence. Codons in pgap63 which differ from pgap11 are shown below the continuous sequence. In positions where the codon in pgap49 or pgap63 predicts a different amino acid than in pgap11, the amino acid is indicated above (pgap49) or below (pgap63) the codon.
The 3'-noncoding region of the gene contained in pgap11 is not homologous to the corresponding regions in pgap491 and pgap63, although the latter two genes contain significant homologies within the first 100 nucleotides following the translational termination codon (4).
Mapping the 5' Termini of the Messenger RNAs Encoded by the Yeast Enolase and Glyceraldehyde-3-phosphate Dehydrogenase Gene Families-In order to determine if the three glyceraldehyde-3-phosphate dehydrogenase structural genes are expressed in yeast, hybridization analyses were carried out with total poly(A)-containing yeast mRNA and hybridization probes containing sequences which are complementary to the 5'-noncoding portions of the three structural genes. Since the coding portions of the structural genes are extremely homologous and cross-hybridize, hybridization probes were prepared which included sequences from the 5'-noncoding regions of the genes. A parallel set of experiments was carried out with the two enolase structural genes. From the analysis, it is possible to map the 5' terminus of the mRNA and to ascertain if the mRNA is synthesized.
In the case of the yeast enolase genes, both genes are expressed during vegetative cell growth; however, the amount of mRNA synthesized from the two genes is highly dependent on the carbon source used to propagate the cells (11). In cells growing logarithmically in medium containing glucose, for example, 95% of the enolase mRNA is derived from the structural gene contained on the plasmid designated peno8, while 5% is encoded by the gene contained on the hybrid plasmid designated peno46 (8, 11). In order to map the 5' termini of the two enolase mRNAs, hybridization probes were isolated from each plasmid which extended from a common HinfI restriction endonuclease cleavage site located 34 nucleotides downstream from the translational initiation codons of both genes, to sites far upstream from the translational initiation codons. The nucleotide sequences of the two enolase structural genes are identical between the initiation codon and the HinfI cleavage site. The probes were labeled with polynucleotide kinase at the 5' termini of the HinfI cleavage sites and were then hybridized with total yeast poly(A)-containing mRNA as described under "Experimental Procedures." The poly(A)-containing mRNA was isolated from cells grown in the presence of glucose as carbon source. Under these growth conditions, the mRNA encoded by the gene contained on peno46 comprises approximately 5% of the enolase mRNA in the cell. S1 nuclease digestion of the hybrids formed between the probe derived from the gene on peno46 and this mRNA preparation should reveal at least two resistant hybrids. The major hybrid would be formed between the probe and mRNA synthesized from the gene corresponding to peno8 and should extend from the HinfI site through the homologous coding sequences to the last homologous nucleotide, which is one nucleotide upstream from the translational initiation codon. The second hybrid would be formed between the probe and mRNA synthesized from the homologous gene corresponding to peno46. As illustrated in Fig. 8, the two expected hybrids are formed. The smaller, more abundant hybrid maps to a position one nucleotide upstream from the initiation codon while a less intense hybrid is present which maps to a position 40 nucleotides upstream from the initiation codon. Neither hybrid is formed in the absence of yeast mRNA. In each case, a series of five hybrids, differing by one nucleotide from each other, is observed. This same pattern is observed for the glyceraldehyde-3-phosphate dehydrogenase mapping experiments. In the case of the hybrid formed between the probe isolated from the gene in peno46 and the mRNA synthesized from the gene corresponding to peno8, we know from the primary structures of the enolase genes that the last homologous nucleotide is one position upstream from the initiation codon (8). The series of five hybrids extends from three nucleotides upstream from the last homologous nucleotide to one nucleotide downstream from this nucleotide. These data suggest that S1 nuclease trims the hybrids within a few nucleotides of the last base pair in the hybrid. We therefore assume that the error in mapping is ±4-5 nucleotides. Based on these data, it is unlikely that the multiple hybrids reflect heterogeneity at the 5' termini of the mRNAs.

[Figure residue: aligned 5'-noncoding nucleotide sequences, positions (-88) to (-39) and (-38) onward; the sequence text is not recoverable.]
A similar experiment was carried out with the probe isolated from peno8. As illustrated in Fig. 9, a single S1 nuclease-resistant hybrid is observed which maps 36 nucleotides upstream from the translational initiation codon in peno8.
Hybridization probes were isolated from the three glyceraldehyde-3-phosphate dehydrogenase structural genes which extend from a HinfI restriction endonuclease cleavage site 29 nucleotides downstream from the initiation codon in all three genes to sites far upstream from the initiation codon in each respective gene. Hybridization was carried out as described for the enolase genes utilizing total poly(A)-containing mRNA isolated from cells grown in the presence of glucose. The probes isolated from pgap491, pgap63, and pgap11 formed hybrids which mapped to sites 44, 53, and 82 nucleotides upstream from the initiation codon of each respective gene (Fig. 9). In each case, the hybrid observed extended beyond the initiation codon and includes sequences within the 5'-noncoding region of the gene. Since the 5'-noncoding regions of the three glyceraldehyde-3-phosphate dehydrogenase genes lack sufficient homology to cross-hybridize, we conclude that the hybrids observed were formed with mRNA synthesized from the gene which corresponds to the hybridization probe. The size of the hybrid formed, therefore, corresponds to the number of nucleotides from the HinfI site to the last nucleotide which is complementary to the 5' terminus of the mRNA. The locations of the 5' termini of the enolase and glyceraldehyde-3-phosphate dehydrogenase mRNAs, calculated from the data presented in Figs. 8 and 9, are shown in Fig. 7. The solid bars above each sequence correspond to the end points of the hybrids formed with each probe. The dots above the bars indicate the most intense of the family of hybrids. The most striking feature of these data is that the 5' termini of all five mRNAs are located within a region of strong nucleotide sequence homology among the genes. In the case of the gene contained on pgap11, the 5' terminus of the mRNA synthesized from this gene maps only to one of the duplicated homologous sequences. These data demonstrate that mRNA is synthesized in vivo from all three glyceraldehyde-3-phosphate dehydrogenase genes. They also show that the 5'-nontranslated regions of the yeast enolase and glyceraldehyde-3-phosphate dehydrogenase mRNAs are homologous.
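The relationship between the size of a protected hybrid and the mapped 5' terminus can be written out as simple arithmetic. The sketch below assumes that the protected length equals the distance of the labeled HinfI end downstream of the initiation codon plus the distance of the mRNA 5' terminus upstream of it. The numbering convention at the codon boundary, the function names, and the "GAPDH" label are assumptions made for illustration; the fragment lengths it prints are derived from the reported map positions, not measured values from the paper.

```python
# Sketch of the S1-mapping arithmetic under a simple additive convention.
HINFI_OFFSET = {"enolase": 34, "GAPDH": 29}  # nt downstream of the initiation codon
MAPPING_ERROR = 5  # upper end of the 4-5 nucleotide uncertainty assumed in the text


def upstream_terminus(fragment_length, family):
    """Distance of the mRNA 5' terminus upstream of the initiation codon."""
    return fragment_length - HINFI_OFFSET[family]


def expected_fragment(upstream, family):
    """Protected fragment length expected for a given 5' terminus."""
    return upstream + HINFI_OFFSET[family]


# Map positions reported in the text (nucleotides upstream of the initiation codon).
reported = {
    "peno8": ("enolase", 36),
    "peno46": ("enolase", 40),
    "pgap491": ("GAPDH", 44),
    "pgap63": ("GAPDH", 53),
    "pgap11": ("GAPDH", 82),
}

for plasmid, (family, upstream) in reported.items():
    length = expected_fragment(upstream, family)
    print(f"{plasmid}: about {length} nt protected "
          f"(5' terminus at -{upstream}, +/- {MAPPING_ERROR})")
```

The two functions are inverses, so a measured fragment length can be converted to a map position and back without loss.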

DISCUSSION
Previous reports from this laboratory (3, 4) and the data presented here confirm that there are three nontandemly repeated glyceraldehyde-3-phosphate dehydrogenase structural genes in yeast. These genes are not alleles since they are present in haploid strains of S. cerevisiae. None of the genes contains an intervening sequence (3, 4). The gene contained on the pgap11 plasmid is homologous to but not identical with the other two (3, 4) within the coding portions of the genes. Codon usage within the gene on pgap11 follows the same highly biased pattern observed for the other glyceraldehyde-3-phosphate dehydrogenase genes (4).
Based on the primary structures of the three glyceraldehyde-3-phosphate dehydrogenase genes, none appears to be a pseudogene. The S1 nuclease-mapping data demonstrate that mRNA is synthesized from all three structural genes. It seems likely that the three polypeptides are also synthesized in vivo and that they might give rise to isozymes of the tetrameric enzyme. The most convincing evidence that there are at least two glyceraldehyde-3-phosphate dehydrogenase polypeptides present in yeast is based on the fact that two ambiguities in the primary structure of the enzyme (7) can be resolved if polypeptides are present in the cell which are encoded by at least two of the isolated genes. Jones and Harris (7) report similar molar yields of serine and threonine at position 36 and high yields of valine and isoleucine at position 328. The gene isolated on pgap491 encodes threonine and isoleucine at these positions while serine and valine are encoded at these positions by the genes on pgap63 and pgap11. The remainder of the primary structure determined by Jones and Harris (7) agrees closely with that predicted from the gene in pgap491. Based on these data, one would predict that the polypeptide encoded by the gene in pgap491 was the major polypeptide in the preparation which was sequenced. It is also reasonable to conclude that one or both of the polypeptides encoded by the genes in pgap63 and pgap11 were present in the preparation.
The simplest mechanism for the evolution of three glyceraldehyde-3-phosphate dehydrogenase structural genes in yeast would be to postulate two successive duplication events. The coding portion of the gene contained in pgap11 is 90% homologous to the coding portions of the genes contained in pgap491 and pgap63. The coding portions of the genes in pgap491 and pgap63 are 95% homologous. If one ignores the possibility of preferential recombination between specific pairs of structural genes, then one would predict that the gene contained in pgap11 is a product of the first duplication event while those in pgap491 and pgap63 are the products of a subsequent duplication. Minimum estimates of the times of these duplication events, calculated from the observed rate of divergence of glyceraldehyde-3-phosphate dehydrogenase (12), are 200 and 100 million years, respectively. These estimates would represent minimum values since concerted evolution of the family of yeast glyceraldehyde-3-phosphate dehydrogenase genes (i.e., recombination among the genes) would minimize the observed sequence divergence among the genes. It is quite possible that the yeast genes have evolved in concert since multiple forms of glyceraldehyde-3-phosphate dehydrogenase have been observed in a wide variety of eucaryotic cells, suggesting duplication events which are much earlier than those estimated above.
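The two time estimates scale in the same proportion as the two divergence values, as expected if a constant rate of divergence is assumed. As a sketch only, let $d$ be the fractional nucleotide divergence between a pair of genes and $k$ an assumed constant rate at which a gene pair diverges; the symbols and the linear form are assumptions made here, and the rate actually taken from ref. 12 is not given in the text.

```latex
\[
t = \frac{d}{k}, \qquad
\frac{t_{1}}{t_{2}} = \frac{d_{1}}{d_{2}} = \frac{0.10}{0.05} = 2
\]
\[
k \approx \frac{0.10}{200\ \text{million years}}
  = \frac{0.05}{100\ \text{million years}}
  = 5 \times 10^{-4}\ \text{per million years}
\]
```

Here $d_{1} = 0.10$ corresponds to the 90% homology between the gene in pgap11 and the other two genes, and $d_{2} = 0.05$ to the 95% homology between the genes in pgap491 and pgap63. The value of $k$ shown is only the rate implied by the quoted estimates under this linear assumption.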
The distribution of nucleotide sequence changes within specific portions of the coding regions of the three yeast glyceraldehyde-3-phosphate dehydrogenase structural genes was analyzed in order to test the possibility that preferential recombination occurs between specific pairs of structural genes. If the three genes do not undergo recombination or if recombination occurs randomly among the three genes, then the time and order of duplication estimated from the sequence divergence data determined for any statistically significant portion of the three coding regions should be similar. Initially, divergence within the portions of the genes which encode the two functional domains of the polypeptides was examined.
Within the portions of the structural genes which encode the NAD-binding domains of the polypeptides (codons 1-147), the gene contained in pgap11 differs from those contained in pgap491 and pgap63 at 69 and 63 nucleotide positions, respectively. The latter two genes differ by 30 nucleotides in this region. The portion of the gene in pgap11 which encodes the catalytic domain of the polypeptide (codons 148-331) differs from the genes in pgap491 and pgap63 at 40 and 38 nucleotide positions, respectively. The genes contained in pgap491 and pgap63 differ by 23 nucleotides within the regions encoding the catalytic domains. Based on these data, sequences encoding the NAD-binding domain in pgap11 are 2.3-fold (69:30) more diverged from the genes in pgap491 or pgap63 than are the latter genes from each other. In the case of sequences encoding the catalytic domains, the gene in pgap11 is 1.7-fold (40:23) more diverged from those in pgap491 or pgap63 than are the latter genes from each other. In both cases, the data predict that the genes contained in pgap491 and pgap63 are the products of a second duplication event. The estimated times of the two duplication events, however, are somewhat different if one considers the data from the NAD-binding domain versus the catalytic domain. The catalytic domain divergence data predict that the second duplication is closer to the first duplication than do the data from the NAD-binding domain.
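The fold values quoted above follow directly from the difference counts. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def fold_divergence(outgroup_differences, ingroup_differences):
    """Ratio of differences (pgap11 vs. another gene) to differences
    between pgap491 and pgap63 over the same region."""
    return outgroup_differences / ingroup_differences


# Ratios quoted in the text: 69:30 (NAD-binding) and 40:23 (catalytic).
nad_fold = fold_divergence(69, 30)
catalytic_fold = fold_divergence(40, 23)
print(f"NAD-binding: {nad_fold:.1f}-fold; catalytic: {catalytic_fold:.1f}-fold")
```

A smaller ratio for the catalytic domain means that, on these data, the second duplication appears relatively closer in time to the first.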
A more striking anomaly in the divergence pattern of the three yeast structural genes is observed if one considers those sequences between codons 144 and 243. This portion of the structural genes encodes the most highly conserved portion of the polypeptides. The amino acid sequences of the yeast, lobster, and pig glyceraldehyde-3-phosphate dehydrogenases are extremely homologous in this region (7, 10). Within this region, there is a single amino acid difference predicted from the sequences of the three yeast genes. The structural genes in pgap11 and pgap63 predict a methionine residue at position 178 while the gene in pgap491 predicts leucine at this position. Within this region, the sequences of the genes in pgap491 and pgap63 differ by 11 silent third position codon changes and a single first position codon change at codon 178. The gene contained in pgap11 differs within this region from the genes in pgap491 and pgap63 by 13 and 5 nucleotides, respectively. Interestingly, these data predict that the genes in pgap11 and pgap63 are the products of the second duplication. Since all but one of the nucleotide sequence differences among the three genes in this region are silent third position codon changes, it seems unlikely that the conservation of nucleotide sequence between the genes in pgap11 and pgap63 in this region of the structural genes is the result of selective pressure at the polypeptide level. An attractive explanation for the anomalous duplication times predicted from the divergence data in the NAD-binding domains versus the catalytic domains would be to postulate preferential recombination among the three structural genes within sequences encoding the catalytic domains versus the NAD-binding domains. In the case of sequences encoding residues 144-243 of the polypeptides, one would further postulate preferential recombination in this region between the genes in pgap11 and pgap63.
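The anomaly can be made explicit by asking, for each region, which pair of genes shows the fewest differences. The sketch below uses the difference counts quoted in the text; the "closest pair" rule is a simplified stand-in for the reasoning about duplication order, not a phylogenetic method used by the authors.

```python
# Pairwise nucleotide differences quoted in the text for three regions.
REGIONS = {
    "NAD-binding (codons 1-147)": {
        ("pgap11", "pgap491"): 69,
        ("pgap11", "pgap63"): 63,
        ("pgap491", "pgap63"): 30,
    },
    "catalytic (codons 148-331)": {
        ("pgap11", "pgap491"): 40,
        ("pgap11", "pgap63"): 38,
        ("pgap491", "pgap63"): 23,
    },
    "conserved (codons 144-243)": {
        ("pgap11", "pgap491"): 13,
        ("pgap11", "pgap63"): 5,
        ("pgap491", "pgap63"): 12,  # 11 silent third-position + 1 first-position change
    },
}


def closest_pair(differences):
    """Return the gene pair with the fewest nucleotide differences."""
    return min(differences, key=differences.get)


for region, diffs in REGIONS.items():
    a, b = closest_pair(diffs)
    print(f"{region}: closest pair is {a} / {b} ({diffs[(a, b)]} differences)")
```

Over the two whole domains the closest pair is pgap491 and pgap63, whereas within codons 144-243 it is pgap11 and pgap63, which is the inconsistency the paragraph attributes to preferential recombination.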
The 5'-noncoding region of the gene in pgap11 is homologous to the genes contained in pgap491 and pgap63 within the regions adjacent to the translational initiation codons. This region of homology among the three genes appears to have been tandemly duplicated in pgap11. No homology between the 3'-noncoding portion of the gene in pgap11 and corresponding regions of the other two genes was observed. The S1 nuclease-mapping data demonstrate that mRNA is synthesized from all three yeast glyceraldehyde-3-phosphate dehydrogenase genes. These data also show that the 5' termini of the glyceraldehyde-3-phosphate dehydrogenase genes and the enolase genes map to a region of nucleotide sequence homology which is shared by all five structural genes. Thus, the homologous portions of these two gene families which are adjacent to the initiation codons (8) are present within the 5'-nontranslated regions of the mRNAs synthesized from the genes. While the functional significance of this homologous region of nucleotide sequence is not known, it is likely that these sequences evolved from a common precursor. Since the coding portions of the glyceraldehyde-3-phosphate dehydrogenase gene family are unrelated to the coding regions of the enolase genes, it is likely that the complete genes evolved by a segmental process.
The 5'-noncoding regions of the glyceraldehyde-3-phosphate dehydrogenase and enolase genes have been compared to other yeast genes for which the 5' terminus of the mRNA synthesized from the gene is known. Within the sequences surrounding those corresponding to the 5' terminus of the respective mRNA, no homology was found among the genes coding for yeast iso-1-cytochrome c (13), yeast TRP5 (14), and those reported here. In contrast, the genes coding for the yeast alcohol dehydrogenases (15, 16) contain regions of significant homology. The region surrounding the sequences coding for the 5' termini of the glyceraldehyde-3-phosphate dehydrogenase and enolase genes has the general structure: AAAAAACCAAGEAACT where the underlined region indicates the mapped termini of the mRNAs. The corresponding region in ADC1 is AATATTTCAAGCTATACCAAG CATAC. In ADR2, the sequence of the corresponding region is: AGAATATCAAGCTACA. In both of these genes, the sequence CAAGC is present at or near the mapped 5' termini of the mRNAs. This sequence is homologous to sequences adjacent to the 5' termini of the glyceraldehyde-3-phosphate dehydrogenase and enolase genes. Finally, the sequence surrounding the site mapped for the 5' terminus of the yeast HIS3 gene (17) is: AAAAAATGAGCAGGC. This sequence is also homologous to the corresponding regions of the genes described here but the degree of homology with the CAAGC sequence is not as strong as for the alcohol dehydrogenase genes. Interestingly, the sequence AAAAAAC--AG-TACT is present in pgap11 between nucleotides -26 and -38. Although this sequence is extremely close to the consensus sequence for the genes, it does not correspond to the 5' terminus of an mRNA detected in the cell. This sequence lacks the CAAG portion of the homology. It is tempting to speculate that the homologous portions of these genes play some role in transcription or translation.
Correlation of these sequences with expression of the genes will require further analysis of the expression of genes containing defined alterations within these homologous sequences.
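The presence or absence of the CAAGC pentanucleotide in the sequences quoted above can be checked with a short string search. In the sketch below the sequences are copied from the text with whitespace removed; treating the gap in the printed ADC1 sequence as a typesetting artifact is an assumption, and the function name is hypothetical.

```python
# Sequences as quoted in the text, with whitespace removed (an assumption
# for the ADC1 sequence, which is printed with an internal gap).
QUOTED = {
    "ADC1": "AATATTTCAAGCTATACCAAGCATAC",
    "ADR2": "AGAATATCAAGCTACA",
    "HIS3": "AAAAAATGAGCAGGC",
}
MOTIF = "CAAGC"


def motif_offsets(seq, motif):
    """Return every 0-based offset at which motif occurs in seq."""
    last_start = len(seq) - len(motif)
    return [i for i in range(last_start + 1) if seq.startswith(motif, i)]


for gene, seq in QUOTED.items():
    hits = motif_offsets(seq, MOTIF)
    status = f"found at offsets {hits}" if hits else "not found"
    print(f"{gene}: {MOTIF} {status}")
```

The search finds CAAGC in the ADC1 and ADR2 sequences but not in the HIS3 sequence, in line with the weaker homology noted for HIS3.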