Structure of Two Unlinked Drosophila melanogaster Glyceraldehyde-3-phosphate Dehydrogenase Genes*

Two Drosophila genes that code for the enzyme glyceraldehyde-3-phosphate dehydrogenase (Gapdh) have been isolated and their structures determined by DNA sequence analysis. The two genes, Gapdh-1 and Gapdh-2, are homologous to each other in their coding regions but differ entirely in the 5' and 3' flanking regions. Both genes are expressed in flies, as determined by Northern analysis using gene-specific probes. Gapdh-1 is mapped by in situ hybridization at position 43E-F on the right arm of the second chromosome and Gapdh-2 at position 13F on the left arm of the X chromosome. Transcription initiation sites as well as polyadenylation sites for both transcripts have also been determined. Gapdh-1 lacks a sequence homologous to the TATA box in its -30-base pair region that is characteristic of many RNA polymerase II transcribed promoters. In contrast, Gapdh-2 contains a consensus TATA box sequence as well as a CAAT box in its promoter region. Furthermore, a sequence element ATTTGCAT (dc) and nontandem multiple direct repeats have been found in the -38 to -155-base pair 5' flanking region. Other than the intron located in the 5' noncoding region of Gapdh-2, both genes lack intervening sequences.

... its gene structure for two reasons. The first concerns the evolution studies of this highly conserved enzyme on the nucleotide level. Specifically, we would like to analyze the pattern of changes that are not evident on the protein level during evolution, parameters such as synonymous nucleotide substitutions, placement of introns, and nucleotide changes in the noncoding regions. Second, we are interested in the promoter function of this gene because of the constitutive nature of its high expression. We expected that the promoter of Gapdh-1 (where Gapdh represents glyceraldehyde-3-phosphate dehydrogenase gene) might offer a simple system for studies in transcription by RNA polymerase II. As a prerequisite for these kinds of studies, we isolated and sequenced the genes for glyceraldehyde-3-phosphate dehydrogenase from Drosophila melanogaster. This organism was chosen for its small genome size and the existence of techniques in tissue culture cell transfection (1) and germ-line transformation (2). The latter two techniques are essential in analyzing in vitro mutated promoters and other regulatory elements, and they are currently used in studying the expression of several Drosophila genes (3-5). In this article, we report the isolation and DNA sequence analysis of two Drosophila Gapdh genes. In addition, we characterized the transcription unit for both Gapdh genes to aid future studies in transcriptional regulation.

Screening the Drosophila Genomic and cDNA Library-The D. melanogaster Canton-S genomic library was prepared by Maniatis et al. (6). It is a Charon 4A random-shear library from embryonic DNA with inserts of about 16 kb, terminated with synthetic EcoRI linkers. Recombinant phages were propagated in E. coli strain DP50 supF (7). One genome equivalent in this library corresponds to about 10,000 ... to proceed at either room temperature (low stringency) or at 42 °C (high stringency) for 24-48 h. Filters were washed once with 50% formamide, 5 × SSC, and 0.2% SDS and three times with 2 × SSC and 0.2% SDS at the hybridization temperature prior to autoradiography.
DNA Sequence Analysis-DNA sequencing was performed by the dideoxynucleotide chain termination technique of Sanger et al. (18) as modified by Messing (19).
In Situ Hybridization to the Polytene Chromosome-The biotin-tagged DNA fragment was prepared essentially by the nick translation procedure of Rigby et al. (20) except that 14.5 µM biotinylated dUTP was used instead of dTTP. Polytene chromosomes were prepared from larval salivary gland nuclei as described in Simon et al. (15). In situ hybridization was performed with the biotin-tagged probe according to the procedure described by Enzygene Inc.
S1 Mapping of the Gapdh Transcripts-S1 nuclease mapping experiments were carried out according to the following procedure. The 32P-labeled probe was synthesized in vitro from a restriction fragment primer complementary to the DNA sequence cloned in M13 mp10 or mp11. In a typical reaction, 2 µg of template was used at a molar ratio of primer to template of 1:1. The mixture was heated at 100 °C for 3 min and annealed at 60 °C for 30 min in 10 mM Tris-HCl, pH 7.6, 50 mM NaCl, 10 mM MgCl2, and 1 mM dithiothreitol. DNA synthesis was allowed to proceed by adding to the above mixture dGTP, dCTP, dTTP (final concentration, 20 µM), [α-32P]dATP (10 µCi, 410 Ci/mol), and DNA polymerase large fragment (2 units). After incubation at 25 °C for 15 min, the reaction mixture was chased with cold dATP (final concentration, 20 µM) for 15 min. After digestion with an appropriate restriction enzyme, the reaction product was ethanol precipitated. The pellet was resuspended in 30% dimethyl sulfoxide, 10 mM Tris-HCl, pH 8.0, 0.03% xylene cyanol, 0.03% bromphenol blue, heated at 100 °C for 5 min, and loaded onto a 5% polyacrylamide gel to isolate the single-stranded DNA probe (13). 32P-labeled DNA probes were then hybridized to total Drosophila RNA (10 µg) or poly(A+) RNA (1 µg) in conditions described by Treisman et al. (21). The samples were diluted to 300 µl with S1 nuclease digestion buffer containing 250 mM NaCl, 30 mM sodium acetate, pH 4.6, 4.5 mM ZnSO4, 100 units/ml S1 nuclease, and 20 µg/ml denatured calf thymus DNA, and incubated at 25 °C for 30 min. The digestion products were ethanol precipitated and analyzed on an 8% polyacrylamide, 7 M urea gel.
Primer Extension for Mapping the 5' End of the Gapdh Transcripts-32P-labeled primers were synthesized using M13 sequencing primer in a protocol similar to that described in the last paragraph and isolated as double-stranded fragments after restriction digestion. The primers were hybridized to the Drosophila RNA as described above. Hybrids were ethanol precipitated, and primer extensions were performed according to Treisman et al. (21). The cDNA products were analyzed on an 8% polyacrylamide, 7 M urea gel.

RESULTS
Isolation of the Drosophila Gapdh-1-Amino acid sequences of glyceraldehyde-3-phosphate dehydrogenase from various organisms are highly conserved. Blocks of sequence homology containing invariant amino acid residues are observed when several glyceraldehyde-3-phosphate dehydrogenase sequences are aligned. The most conserved region is located at the catalytic site containing the essential Cys-152. A block of 12 invariant amino acid residues is contained in this region in a sequence comparison among five organisms (22). It is therefore likely that sequence homology also exists on the DNA level, and that a heterologous Gapdh gene sequence could be used as a probe to isolate the Drosophila Gapdh gene by cross-hybridization. The feasibility of this approach has recently been demonstrated by Musti et al. (23). We used a combination of full-length human and rat Gapdh cDNA (24) as hybridization probes to screen a Charon 4A-D. melanogaster Canton-S genomic library for the Gapdh gene in this organism. Fifty thousand recombinant phages, which correspond to about five times the genome size, were screened by the plaque hybridization technique under conditions of low hybridization stringency. Several putative positive signals were observed in our primary screening, but only one positive recombinant phage was isolated upon secondary screening. The isolated phage was designated λDmGAP1. It cross-hybridizes to either human or rat Gapdh cDNA under conditions of low hybridization stringency. To further localize the Gapdh gene, DNA from λDmGAP1 was digested with various restriction enzymes, and the resulting restriction fragments were subjected to Southern blot analysis. One of the hybridizing restriction fragments, a 3.2-kb XbaI-HindIII fragment, was isolated and subcloned into plasmid pUC13 for detailed restriction mapping and sequence analysis.
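The relation between the number of phages screened and the chance of recovering a given single-copy gene can be illustrated with the standard library-coverage formula, P = 1 - (1 - f)^N. The sketch below is not part of the original analysis. It takes the figure implied in the text (50,000 phages for about five genome equivalents, that is, about 10,000 phages per genome) and assumes that every genomic segment is equally represented in the library, which is an idealization. All function and constant names are hypothetical.

```python
import math

# Assumption taken from the text: 50,000 phages ~ 5 genome equivalents,
# i.e. roughly 10,000 recombinant phages per genome equivalent.
CLONES_PER_GENOME = 10_000
SCREENED = 50_000


def genome_equivalents(n_screened, clones_per_genome=CLONES_PER_GENOME):
    """Number of genome equivalents represented by n_screened phages."""
    return n_screened / clones_per_genome


def prob_present(n_screened, clones_per_genome=CLONES_PER_GENOME):
    """P = 1 - (1 - f)^N: chance that a given single-copy sequence is found
    in at least one of N clones, assuming uniform random representation."""
    f = 1.0 / clones_per_genome
    return 1.0 - (1.0 - f) ** n_screened


def clones_needed(p, clones_per_genome=CLONES_PER_GENOME):
    """Smallest N giving probability p under the same idealized model."""
    f = 1.0 / clones_per_genome
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


coverage = genome_equivalents(SCREENED)
p_found = prob_present(SCREENED)
print(f"coverage: {coverage:.1f} genome equivalents")
print(f"P(sequence present) under uniform representation: {p_found:.4f}")
print(f"clones needed for P = 0.99: {clones_needed(0.99)}")
```

Under these assumptions a five-fold screen should contain any single-copy sequence with better than 99% probability. Recovering only one positive phage is therefore compatible with the model only if representation is not perfectly uniform or if some signals are lost on secondary screening.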
Drosophila Gapdh-1 Sequence Analysis-To ascertain whether λDmGAP1 contains a Gapdh gene, we used the shotgun DNA-sequencing method (25) to locate and sequence the coding region of Gapdh in this recombinant phage. A 3.2-kb XbaI-HindIII fragment derived from λDmGAP1 was digested separately with restriction enzymes Sau3AI and HpaII, and the resulting fragments were cloned, respectively, into the BamHI site and the AccI site of M13 phage mp10. M13 recombinant phages that hybridized to the rat cDNA probe were then picked and amplified to isolate the single-stranded DNA for sequencing by the dideoxynucleotide chain termination method. Using this approach, we determined from nine independent shotgun clones nearly the entire coding sequence of the Drosophila Gapdh gene contained in λDmGAP1. Concurrently, the detailed restriction map of the 3.2-kb XbaI-HindIII fragment was determined. The restriction map and the initial shotgun sequence data allowed us to select specific restriction fragments for further sequencing, extending and completing our analysis to include the flanking regions of the Gapdh gene. A total of about 2.4 kb of DNA sequence, starting from the XbaI site and ending at the most distal ClaI site, was determined. Results of the restriction mapping and the sequencing strategy are summarized in Fig. 1a. More than 98% of the sequence was obtained from both strands of the DNA and with sufficient overlaps to minimize any sequencing mistakes. The coding sequence and the 5' and 3' flanking sequences of this D. melanogaster Gapdh gene, termed Gapdh-1, are shown, respectively, in Figs. 2, 3, and 4. Assuming that the initiating methionine residue is removed after translation, the gene would code for an enzyme of 332 amino acid residues, the sequence of which is homologous to glyceraldehyde-3-phosphate dehydrogenase from other organisms. Interestingly, no intervening sequence was found in the coding region of Gapdh-1.
Southern Blot Analysis of the Drosophila Genomic DNA-To determine the number of Gapdh genes in Drosophila, a restriction fragment from the Gapdh-1 coding region was used as a probe to examine the genomic complexity of Gapdh-related sequences. Restriction digests of chromosomal DNA from D. melanogaster Oregon-R strain were electrophoresed on an agarose gel, transferred to nitrocellulose, and hybridized to a 32P-labeled 600-bp BglI-BglI fragment (nucleotides 36-635, Fig. 2) from the coding sequence of Gapdh-1. The result of this experiment is shown in Fig. 5b. For each digest, two restriction fragments were observed to hybridize with the probe. In all cases the lower hybridizing fragment has the molecular weight corresponding to that predicted by the Southern analysis of λDmGAP1. The additional bands seen in the genomic blot could not be due to partial restriction digestion, since use of excess enzymes failed to convert them to other forms. A more plausible explanation for the presence of these additional hybridizing fragments is the existence of a second Gapdh gene in the Drosophila genome.
Isolation of the Drosophila Gapdh-2-To isolate the second Gapdh gene we rescreened the D. melanogaster Canton-S library ... (Fig. 1) under conditions of low hybridization stringency to isolate as many Gapdh gene-related sequences as possible. After screening 200,000 plaques, we isolated 10 recombinants that hybridized to the Gapdh gene probe. Nine of these isolates contain the same hybridizing EcoRI fragments as in λDmGAP1, and one contains a variant shorter EcoRI fragment of Gapdh-1 generated by random shearing. We were thus unable to isolate the second Gapdh gene in this library.
We next screened a D. melanogaster Oregon-R pupae cDNA library using the same approach and were able to isolate one recombinant phage that hybridizes to the coding sequence of Gapdh-1 after screening about 50,000 plaques. This phage contains a 1.1-kb cDNA insert but lacks many of the unique restriction sites that are in Gapdh-1. DNA sequence analysis of this cloned cDNA insert revealed that it contains a truncated Gapdh-coding sequence that codes for amino acid residues 86-332 and a 3' noncoding sequence of 328 bp before polyadenylation. The coding sequence is similar to but distinct from that of Gapdh-1 in many positions, and no sequence homology was observed at the 3' noncoding region. The cDNA clone is thus likely encoded by the second Gapdh gene seen in the Southern blot in Fig. 5b. This conclusion is confirmed by genomic Southern blot analysis using the 3' noncoding sequence of the cDNA clone as a probe. Only one fragment, corresponding to the additional band seen previously in Fig. 5b, was observed for each digest. Likewise, when the 3' noncoding sequence of Gapdh-1 was used as a probe for the genomic blot, only one hybridizing band was observed. The 3' noncoding sequences are thus specific for each Gapdh gene (Fig. 5, a and c).
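As a consistency check, the reported insert size can be compared with the length implied by the sequence features listed above. The following sketch is illustrative only. It assumes that the insert consists of the codons for residues 86-332, one stop codon, and the 328-bp 3' noncoding sequence, and it ignores any poly(A) tract or linker sequence. The function name is hypothetical.

```python
CODON_LENGTH = 3  # nucleotides per codon


def expected_insert_bp(first_residue, last_residue, utr3_bp, include_stop=True):
    """Length (bp) of a cDNA covering residues first..last plus the 3' UTR.

    Assumption: no poly(A) tail or cloning linkers are counted.
    """
    coding_bp = (last_residue - first_residue + 1) * CODON_LENGTH
    stop_bp = CODON_LENGTH if include_stop else 0
    return coding_bp + stop_bp + utr3_bp


insert_bp = expected_insert_bp(86, 332, 328)
print(f"expected insert: {insert_bp} bp (~{insert_bp / 1000:.1f} kb)")
```

The result, 1072 bp, agrees with the 1.1-kb insert size estimated from the clone.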
We finally screened a newly constructed D. melanogaster Oregon-R genomic library. Only two recombinant phages that hybridized to the 3' noncoding sequence of the second Gapdh gene cDNA were obtained after screening 100,000 plaques. On the other hand, 15 recombinant phages that hybridized to Gapdh-1 3' noncoding sequences were obtained for the same size screening. The second Gapdh gene, termed Gapdh-2, is thus under-represented in both the Canton-S and Oregon-R genomic libraries. Restriction and Southern analyses of the Gapdh-1-containing phages isolated from the Oregon-R library showed, in most cases, no restriction polymorphism when compared to λDmGAP1, which was isolated from the Canton-S library. One of the recombinant phages containing Gapdh-2 was selected for further studies; it is designated λDmGAP2. A 6-kb EcoRI-EcoRI fragment containing the Gapdh-2 gene was isolated from this phage and subcloned into plasmid pUC13 for further restriction mapping and sequence analysis.
Drosophila Gapdh-2 Sequence Analysis-We used the same approach to locate and sequence Gapdh-2 as described above for Gapdh-1. A total of 2.7 kb of DNA sequence starting from the 5' EcoRI site to a ClaI site distal to the gene was determined. Results of the restriction mapping and the sequencing strategy are summarized in Fig. 1b. The coding sequence of Gapdh-2 is shown along with that of Gapdh-1 in Fig. 2. Only those nucleotides that differ are shown. Again, no intervening sequence is found in the Gapdh-2 coding region, which codes for a protein product of the same chain length as does Gapdh-1. The 5' and 3' flanking sequences for this gene are given in Figs. 6 and 7, respectively. No sequence homology is observed when the flanking sequences of the two genes are compared.
Northern Blot Analysis of Gapdh mRNA-To estimate the size and number of the Gapdh transcripts, we analyzed the mRNA by the Northern blot method. Total and poly(A+) RNA from adult flies were electrophoresed on a formaldehyde gel, blotted on nitrocellulose, and probed with a 32P-labeled fragment. The probe used was a 1.4-kb XbaI-PvuII fragment containing all the coding sequences of Gapdh-1 from λDmGAP1. It hybridized to an RNA species of 1.6 kb in both cases (Fig. 8). This species, however, could be resolved into two bands of similar intensities upon prolonged electrophoresis. Using the 3' noncoding sequences from the two genes as specific probes, we confirmed that the upper band is the transcript of Gapdh-2 and the lower band that of Gapdh-1 (data not shown).

[Figure legend, partial] ... Fig. 9a), and the * under each nucleotide represents sites of transcription initiation as determined by S1 nuclease mapping (see Fig. 9b).

... 68-bp DdeI-ClaI fragment spanning the 5' noncoding and the coding region of the gene (nucleotide 408 in Fig. 3 to nucleotide 62 in Fig. 2). This small fragment was uniformly labeled and then hybridized to the isolated Drosophila adult mRNA and extended to the 5' cap site of the Gapdh mRNA by reverse transcriptase. The same fragment was also used as primer for chain-termination sequencing from a cloned template. The synthesized cDNA and the sequencing reaction product were then electrophoresed on an 8% polyacrylamide-urea gel. Since all end products of both reactions shared a common 5' end, the exact length of the cDNA synthesized could be accurately determined by comparison to the sequencing ladder. Results of the primer extension experiment for Gapdh-1 are shown in Fig. 9a. The 68-bp DdeI-ClaI fragment primed the synthesis of only two major extension products that are 124 and 125 bases long, respectively. The formation of a pair of extension products differing in length by one base has been reported in several cases (28). The doublets had been observed even in cases where it is known that the transcripts used had only one 5' end (29). Luse et al. (28) suggested these doublets are artifacts of the reverse transcriptase reaction related to the cap structure. If this is the case, the position of the lower band would correspond to the true mRNA start site. When compared to the sequencing ladder, the lower band corresponds to an A residue, 62 bases from the translation initiation codon ATG, on the noncoding strand (Fig. 3). Other than these doublets, no major extension product that could have resulted from cross-hybridization of the primer to Gapdh-2 mRNA was obtained.

[Figure legend, partial] ... B, BamHI; S, SstI; E, EcoRI; and X, XbaI.

The result of the primer extension experiment was confirmed by S1 nuclease mapping. The DdeI-ClaI fragment was again used, this time to synthesize a 32P-labeled coding strand probe up to the XbaI site in the noncoding region of Gapdh-1 (464 bases). The synthesized probe was hybridized to the Drosophila mRNA, and the unhybridized single-stranded region of the probe was then cut with the S1 nuclease. The digested product was electrophoresed along with the sequencing reaction products for molecular weight standards on an 8% polyacrylamide-urea gel. As shown in Fig. 9b, a cluster of fragments ranging from 124 to 127 residues long was protected by Gapdh-1 mRNA. The molecular weight of these fragments indicated the mRNA start site is about 62-65 bases from the translation initiation codon, a region that is within a range of 4 bases from the site determined by primer extension. The presence of multiple protected fragments was probably due to nondiscrete digestion of S1 nuclease at the protected end rather than multiple transcription initiation of Gapdh-1. Since Gapdh-1 and Gapdh-2 share little sequence homology beyond the coding sequences, the protected fragment could not have resulted from cross-hybridization of the Gapdh-2 mRNA and the probe used.
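The fragment lengths reported for Gapdh-1 translate into 5' noncoding lengths by a single subtraction. The sketch below is illustrative only. It assumes that nucleotide 1 of Fig. 2 is the A of the ATG codon, so that the labeled end of the DdeI-ClaI fragment lies 62 nucleotides into the coding sequence. The function and constant names are hypothetical.

```python
# Assumption: the labeled (ClaI) end of the primer/probe corresponds to
# nucleotide 62 of the coding sequence, counting the A of ATG as 1.
CODING_NT_IN_FRAGMENT = 62


def noncoding_length(fragment_len, coding_nt=CODING_NT_IN_FRAGMENT):
    """5' noncoding length implied by an extension product or an
    S1-protected fragment that ends at the mRNA 5' terminus."""
    if fragment_len < coding_nt:
        raise ValueError("fragment is shorter than its coding portion")
    return fragment_len - coding_nt


# Primer extension products of 124 and 125 bases (the doublet).
primer_extension = [noncoding_length(n) for n in (124, 125)]

# S1-protected fragments of 124 to 127 bases.
s1_protected = [noncoding_length(n) for n in range(124, 128)]

print("primer extension:", primer_extension)
print("S1 protection:  ", s1_protected)
```

Both methods place the start site 62 bases or slightly more upstream of the ATG, which is the agreement described in the text.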
[Figure legend, partial] ... The translation initiation codon ATG, the TATA box, and the CAAT box are boxed. Transcription initiation sites as determined by primer extension experiment are indicated by V. The intron in the 5' noncoding region is bracketed. The * under each nucleotide represents 3' intron splicing sites as determined by S1 nuclease mapping. The short pentanucleotide repeats in the promoter region are underlined by broken lines, and the 12-bp imperfect direct repeats are underlined by arrows. The wavy lines indicate the dc and imperfect cd sequences.

A similar strategy was used to map the mRNA start site of Gapdh-2. In this case, we used a 109-bp HpaI-ClaI fragment spanning the 5' noncoding and coding region of Gapdh-2 (nucleotide 1053 in Fig. 6 to nucleotide 105 in Fig. 2). This fragment was used either as a primer to synthesize a cDNA complementary to Gapdh-2 mRNA or to synthesize a coding-strand probe up to the XhoI site 5' to the coding sequence (319 bases long) of Gapdh-2 for S1 nuclease protection. The primer extension experiments gave four discrete cDNA products (Fig. 10) which mapped the 5' noncoding sequences of the mRNA to be 46, 48, 49, and 62 bases long, respectively. The last species is tentatively identified as an artifact resulting from cross-hybridization of the primer to the Gapdh-1 mRNA, since its length is exactly that predicted for such a case. Whether the smallest molecular weight species represents one of the true mRNA start sites or a premature extension product has not been determined. The doublets in Fig. 10a, however, are most likely the extension products to the cap structure as discussed above. A different result with regard to the transcription start site was obtained with S1 nuclease mapping experiments. This method mapped the mRNA start site to a cluster of residues around 18 bases from the ATG codon (Fig. 10b). The discrepancy between the two methods of mapping could be explained by the placement of an intron in the 5' noncoding region of Gapdh-2.
The S1 nuclease in this case did not cut the probe at the mRNA start site but rather at the intron junction, where a mismatch occurs between the spliced mRNA and the probe.

[Figure legend, partial] ... (20 µg) were electrophoresed on a 1% agarose-formaldehyde gel, transferred to nitrocellulose paper, and probed with a nick-translated 1.4-kb XbaI-PvuII fragment from λDmGAP1. Similar results were obtained when the 3' noncoding sequences from each gene were used as probes.
[Figure legend, partial] ... S1 nuclease mapping. a, a 32P-labeled 68-bp DdeI-ClaI fragment from Gapdh-1 was used as a primer (wavy lines) for cDNA synthesis after hybridization to total fly RNA (10 µg). The synthesized cDNA was electrophoresed on an 8% polyacrylamide-urea gel along with sequencing reaction products initiated with the same primer. b, a 32P-labeled XbaI-ClaI single-stranded fragment was used as S1 nuclease mapping probe (open and closed area). Total fly RNA (10 µg) was hybridized to the probe and treated with S1 nuclease (100 units/ml). The RNA-protected portion of the probe (closed area) was electrophoresed on an 8% polyacrylamide-urea gel with the sequencing reaction products mentioned in a as molecular weight markers. The numbers indicated by arrows represent fragment sizes in bp.

Indeed, a consensus 3' splicing signal is located where the S1 mapping indicated. The sequence at the putative splicing junction is CAG|AU (nucleotides 1036-1040, Fig. 6) and is preceded by a stretch of pyrimidine-rich residues (30, 31). Moreover, an internal signal sequence that is conserved in the introns of Drosophila pre-mRNA, CTAAT, can also be found 26 bases from the putative 3' splicing site (32, 33). The region to which the S1 nuclease method mapped thus contains all the characteristics of a 3' intron splicing site. We searched for the 5' intron splicing signal in a sequence of about 1 kb upstream and were able to locate only one such site. The sequence, CAG|GT (nucleotides 606-611, Fig. 6), is located 430 bases from the putative 3' splicing site and agrees well with the consensus 5' splicing signal (C/A)AG|GT (30). Since the primer extension experiments indicated that the spliced mRNAs have 5' noncoding sequences 46, 48, and 49 bases long, the transcription start sites are located about 28, 30, and 31 bases from the spliced junction (Fig. 6). A consensus TATA box sequence, TATATAT, can be found at a position about -38 from the putative transcription sites.
In addition, a CAAT box can also be found at position -90. Gapdh-2 thus contains the promoter structures characteristic of RNA polymerase II transcribed genes. Examination of the sequence upstream of the transcription initiation site of Gapdh-1, however, failed to reveal any of these features. S1 nuclease mapping was also performed to map the 3' end polyadenylation site of the Gapdh-1 mRNA. Mapping of the Gapdh-2 mRNA 3' end was not necessary, since sequencing of the Gapdh-2 cDNA already indicated that the polyadenylation site was at base number 331 (Fig. 7) after the stop codon. A coding strand probe covering the sequence between the PvuII and ClaI sites (nucleotides 38-728, 690 bases in length, Fig. 4) in the 3' noncoding region of the Gapdh-1 gene was used for S1 nuclease digestion. The length of the probe protected by mRNA from S1 nuclease digestion was determined to be around 175 bases (Fig. 11). The polyadenylation site of Gapdh-1 mRNA was calculated from this result to be around 213 bases from the stop codon (Fig. 4). Polyadenylation signal sequences, AATAAA, were observed about 62 and 15 bases upstream from the polyadenylation site, but the presence of the unique 3' end of the Gapdh-1 transcript suggests that only one of them is effective as a signal. On the other hand, multiple mRNA species had been reported because of the existence of several of these signals in the 3' noncoding region of the rat cytochrome c gene (34). The same signal can be found in the noncoding sequence of the Gapdh-2 gene 50 bases away from the polyadenylation site.
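The coordinate arithmetic behind the Gapdh-2 5' end and the Gapdh-1 3' end can be made explicit. The sketch below is a minimal illustration using only numbers quoted in the text. It assumes that the S1-mapped distance of 18 bases is the noncoding part of the second exon and that the numbering in Fig. 4 starts after the stop codon. All names are hypothetical.

```python
# --- Gapdh-2 5' end: combining primer extension with S1 mapping ---
EXON2_NONCODING = 18             # bases from the 3' splice junction to the ATG (S1 mapping)
SPLICED_5UTR = (46, 48, 49)      # 5' noncoding lengths of the spliced mRNAs

# Portion of each 5' noncoding sequence lying upstream of the intron.
exon1_lengths = [utr - EXON2_NONCODING for utr in SPLICED_5UTR]

# Distance between the putative 5' (nt 606) and 3' (nt 1036) splice signals in Fig. 6.
splice_signal_distance = 1036 - 606

# Noncoding nucleotides carried by the 109-bp HpaI-ClaI primer,
# which ends at nucleotide 105 of the coding sequence.
primer_noncoding_nt = 109 - 105

# --- Gapdh-1 3' end: S1 protection of the PvuII-ClaI probe ---
PROBE_START, PROBE_END = 38, 728  # probe coordinates in Fig. 4
PROTECTED = 175                   # approximate protected length in bases

probe_length = PROBE_END - PROBE_START
poly_a_site = PROBE_START + PROTECTED  # distance from the stop codon

print("exon 1 noncoding lengths:", exon1_lengths)
print("splice signal distance:", splice_signal_distance)
print("primer noncoding nt:", primer_noncoding_nt)
print("probe length:", probe_length, "| poly(A) site at about", poly_a_site)
```

The primer carries fewer noncoding nucleotides than the 18 bases of the second exon, so it anneals entirely downstream of the splice junction and can be extended across it on spliced mRNA. The genomic S1 probe, by contrast, loses its match to the mRNA at the junction.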

DISCUSSION
Based on the results presented in this report, D. melanogaster contains two Gapdh genes coding for two fully functional glyceraldehyde-3-phosphate dehydrogenase enzymes. Existence of isoenzymes for this protein in different organisms was investigated by Lebherz and Rutter (35). They reported that in several organisms multiple forms of glyceraldehyde-3-phosphate dehydrogenase can be found and some are expressed in a tissue-specific manner. Except for yeast, however, this phenomenon seems to be restricted to species that had ploidy changes in their evolutionary history. In a polyploid genome, multiple copies of the same gene could diverge and give rise to different isoenzymes (36). In organisms where polyploidy does not exist, the presence of isoenzymes for glyceraldehyde-3-phosphate dehydrogenase is rare. The honeybee, for example, has only one form of glyceraldehyde-3-phosphate dehydrogenase (38). In Drosophila it appears that the two genes have arisen by a duplication event. Translocation of one of the genes to another chromosome could have occurred either concomitantly or subsequently.

FIG. 10. Mapping transcription initiation site for Gapdh-2 by (a) primer extension and (b) S1 nuclease mapping. a, a 32P-labeled 109-bp HpaI-ClaI fragment from Gapdh-2 was used as primer (wavy line) for cDNA synthesis. b, a 32P-labeled XhoI-ClaI single-stranded fragment from Gapdh-2 was used as an S1 nuclease mapping probe (open and closed area). Experimental conditions are the same as described in the legend of Fig. 9. The numbers indicated by arrows represent fragment sizes in bp.
The time of divergence for the two genes can be estimated based on the number of amino acid residue substitutions. Glyceraldehyde-3-phosphate dehydrogenase is a highly conserved protein that evolved with a UEP (unit evolutionary period, defined as the time in million years required to change 1% of the sequence information between the two lines) of about 24 (23,37).
The derived primary structures of the two isoenzymes are homologous to each other (97.5%), having only 8 amino acid residue substitutions for a mature chain length of 332 residues. Based on this comparison, the two genes diverged approximately 60 million years ago, which was long before the speciation of Drosophila (38). It is thus likely that most of the Drosophila species contain two Gapdh genes. The conservation of nucleotides for the two genes in the coding region is much lower (89%). Many of the nucleotide changes can be attributed to synonymous substitutions that do not result in amino acid replacement. When nucleotide changes are divided into synonymous and replacement substitutions, we found there are 99 for the former and only 10 for the latter. The corrected percentage of divergence at the synonymous sites when calculated according to Perler et al. (39) is 60%, which corresponds to a UEP of 1 or a mutation rate of 5 × 10⁻⁹ substitutions/synonymous site/year. This fast rate of change is identical to the uniform rate determined by Miyata et al. (40) for the synonymous codon substitutions in the mammalian genome.

[Figure legend, partial] ... (open and closed area). After it hybridized to the fly total RNA (10 µg) the hybrid was digested with S1 nuclease (100 units/ml), and the RNA-protected portion of the probe was run on a 5% polyacrylamide-urea gel. The probe without S1 treatment was also run on the gel. HinfI-cut pBR322 fragments were used as molecular weight markers (MW). The size of the probe is shown in the lane marked -S1 and the protected fragment size is shown in the lane marked +S1. The sizes of fragments are indicated by the numbers in bp.

Immediately 5' and 3' to the coding regions, DNA sequences between the two genes have diverged so substantially that homology can no longer be detected. Because of this lack of homology it is not possible to estimate the unit that was copied in the ancient duplication or translocation event. The drastic changes are most evident in the 5' noncoding regions. Here the two genes differ not only in the structures of the putative promoter but also in the placement of an intron. In this region Gapdh-1 lacks the "TATA" box at the -25 to -30-bp region that is characteristic of many pol II (RNA polymerase II) promoters. Although there are pol II promoters that lack the TATA box (41), these are exceptions rather than the general rule. Based on several lines of evidence (42), the role of the TATA box seems to be to position the RNA polymerase for accurate initiation. As Gapdh-1 has very discrete transcription initiation, there must be an analogous sequence that serves this function. Gapdh-2, on the other hand, contains a consensus TATA box sequence, TATATAT, at -33 bp from the transcription initiation site. Farther upstream at -85 bp there is another sequence that has been conserved in some organisms, although it is absent in most if not all Drosophila pol II promoters. The sequence, CCAAT, is sometimes known as the CAAT box. The exact role of this sequence in promoter function is not clear.
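The divergence figures quoted in this section follow from simple arithmetic, which the sketch below reproduces. It is illustrative only. It assumes that the compared coding region is 332 codons long and that the synonymous rate is expressed per lineage, so that the pairwise divergence is divided by twice the divergence time. Variable names are hypothetical.

```python
import math

CHAIN_LENGTH = 332          # residues in the mature enzyme
AA_SUBSTITUTIONS = 8        # amino acid differences between the two isoenzymes
UEP_PROTEIN_MY = 24.0       # million years per 1% amino acid change
SYNONYMOUS_CHANGES = 99
REPLACEMENT_CHANGES = 10
SYN_DIVERGENCE = 0.60       # corrected divergence at synonymous sites
DIVERGENCE_TIME_MY = 60.0   # rounded estimate used in the text

# Protein-level divergence and the time it implies.
aa_divergence_pct = 100.0 * AA_SUBSTITUTIONS / CHAIN_LENGTH
aa_identity_pct = 100.0 - aa_divergence_pct
estimated_time_my = aa_divergence_pct * UEP_PROTEIN_MY

# Nucleotide identity over the coding region (assumed 332 codons).
coding_nt = 3 * CHAIN_LENGTH
nt_changes = SYNONYMOUS_CHANGES + REPLACEMENT_CHANGES
nt_identity_pct = 100.0 * (1.0 - nt_changes / coding_nt)

# Synonymous-site UEP and per-lineage substitution rate.
uep_synonymous = DIVERGENCE_TIME_MY / (SYN_DIVERGENCE * 100.0)
rate_per_site_year = SYN_DIVERGENCE / (2.0 * DIVERGENCE_TIME_MY * 1e6)

print(f"amino acid identity: {aa_identity_pct:.1f}%")
print(f"estimated divergence time: {estimated_time_my:.0f} million years")
print(f"nucleotide identity: {nt_identity_pct:.0f}%")
print(f"synonymous UEP: {uep_synonymous:.1f}; rate: {rate_per_site_year:.1e} per site per year")
```

With these inputs the protein comparison gives a divergence time of about 58 million years, close to the rounded value of 60 million years used in the text, and the nucleotide identity comes out at 89%.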
Also upstream of the TATA box, there are six nontandem direct repeats of a short pentanucleotide, TATGT, in the -55 to -165-bp region of the Gapdh-2 promoter. This arrangement of short direct repeats is similar to that of the SV40 early promoter in the -30 to -100-bp region, although the repeats in that case are G/C rich. Deletions in such regions in the SV40 sequence affect its transcription, indicating that they are important for efficient transcription (43). Two of the short direct repeats in Gapdh-2 fall into tandem but imperfect repeats of 12 bp that are positioned at -100 to -124 bp. In the case of the β-globin promoter, a pair of imperfect repeats at similar positions had been determined to be crucial in gene expression (44). A sequence element ATTTGCAT (dc) was also found at -132 in Gapdh-2. This element or its inverted repeat sequence (cd) has been found upstream of human and mouse immunoglobulin variable region genes, where it is needed for correct transcription (45). The same elements are present in a chicken ovalbumin gene, a sea urchin histone gene cluster (45), and the Drosophila homoeotic gene ftz (46). An imperfect homologous cd sequence, ATGAAAT, is also found in the -179-bp region. Neither these elements nor the nontandem and tandem repeats can be found in the Gapdh-1 5' flanking region. We do not know whether the drastic differences in the 5' noncoding regions of the two genes reflect differences in gene functions. Developmental profile and tissue specificity studies on their expression will be useful in answering this question. A long-term goal of this investigation will be to identify sequence information that is essential for Gapdh expression. It should be mentioned that in an independent study, Sullivan et al. (47) have demonstrated the presence of two glyceraldehyde-3-phosphate dehydrogenase isoenzymes in D. melanogaster, and they too have mapped the two Gapdh genes in locations similar to those mentioned in this report using isolated recombinant phages as probes.
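Sequence elements of the kind discussed above (the TATGT repeats, the dc octamer ATTTGCAT, and its inverted form) can be located in a flanking sequence by a simple string scan. The sketch below shows one way such a search might be done. The demonstration sequence is invented for illustration and is not the Gapdh-2 promoter. Positions are 0-based string indices, not promoter coordinates, and all names are hypothetical.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq, motif):
    """0-based start positions of every occurrence of motif, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


DC_ELEMENT = "ATTTGCAT"                      # octamer named in the text
CD_ELEMENT = reverse_complement(DC_ELEMENT)  # its inverted repeat
REPEAT = "TATGT"                             # pentanucleotide repeat named in the text

# Invented demonstration sequence (NOT the Gapdh-2 5' flanking region).
demo = ("GG" + REPEAT + "CC" + DC_ELEMENT + "AA" + REPEAT
        + "CG" + CD_ELEMENT + "T" + REPEAT + "AA")

repeat_positions = find_all(demo, REPEAT)
dc_positions = find_all(demo, DC_ELEMENT)
cd_positions = find_all(demo, CD_ELEMENT)

print("cd element:", CD_ELEMENT)
print("TATGT repeats at:", repeat_positions)
print("dc at:", dc_positions, "| cd at:", cd_positions)
```

An exact-match scan of this kind would miss imperfect copies such as the ATGAAAT element, which would require a search that tolerates mismatches.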