Orangutan Fetal Globin Genes: Nucleotide Sequences Reveal Multiple Gene Conversions during Hominid Phylogeny*

We have determined the nucleotide sequences of the linked γ1- and γ2-fetal globin genes from a single orangutan (Pongo pygmaeus) chromosome and compared them with the corresponding genes of other simian primates (γ1- and γ2-genes of human, chimpanzee, and gorilla, and the single γ-gene of the spider monkey). Previous studies have indicated that the two γ-gene loci in catarrhine primates resulted from a duplication about 25-35 million years ago. However, comparisons of aligned γ-gene sequences show that these genes contain three regions with distinct histories, of which only the 3' third clearly reflects the ancestral nature expected of the γ-gene duplication. To explain these different evolutionary histories and also hominid relationships, we provide evidence for the occurrence of sequence conversions which affect region 1 (120 base pairs 5'-flanking through exon 2) in all hominid species and extend to varying degrees into region 2 (intron 2 through exon 3). Close examination of the proposed conversions further suggests that 12 of the 13 conversions identified involved γ1 converting γ2. Polarity of these conversions may be a result of differential survival between these genes, because during human fetal development the γ1-gene is preferentially expressed over the γ2-gene and it may be subjected to greater selection pressure to remain unaltered.

The β-type globin genes show conserved chromosomal

1 The abbreviations used are: kb, kilobase pairs; bp, base pairs.

1980; Barrie et al., 1981; Zimmer, 1981; Scott et al., 1984; Goodman et al., 1984). Evolutionary analyses indicate that the gene arrangement, 5'-ε-γ-η-δ-β-3', evolved by a series of tandem duplications which occurred between 200 and 100 million years ago in the lineage to basal eutherians (Czelusniak et al., 1982; Goodman et al., 1984). The η-gene became a pseudogene in the early primates; however, the four other loci remained functional within Anthropoidea (Goodman et al., 1984; Menon and Lingrel, 1986). The new world monkeys Ateles geoffroyi (spider) and Aotus trivirgatus (owl) show only a single γ-locus (Barrie et al., 1981; Harris et al., 1984; Giebel et al., 1985). The duplicated γ1- and γ2-genes of catarrhines arose by a recombinational event about 35-25 million years ago in basal Catarrhini after divergence of Platyrrhini, but prior to the divergence of superfamilies Cercopithecoidea (old world monkeys) and Hominoidea (Barrie et al., 1981; Shen et al., 1981). This recombinational event appears to have been an unequal crossing-over resulting in the duplication of about 5 kb of DNA (Shen et al., 1981).
After duplication of the ancestral γ-gene locus, other recombinational events occurred in different species lineages between parts of the homologous duplicated sequences. Certain of these events have resulted in either the expansion or contraction of the γ-gene locus by additional unequal crossing-over (occurring outside the structural gene regions) or in the exchange of structural gene sequences by way of gene conversion. Both unequal crossing-over and gene conversion share a common recombinational mechanism (Radding, 1982). In some cases, the outcome of a recombination event has survived in a species population, thereby allowing its characterization. For example, individual human chromosomes containing triple or quadruple γ-genes as a result of unequal crossing-over events have been identified (Hill et al., 1986). Also in humans, nucleotide sequence comparison of paralogous γ1- and γ2-genes has shown that a gene conversion event resulted in a nonreciprocal transfer of γ1-sequences to the γ2-locus.
In the case of γ-gene evolution, the effects of gene conversions can be evaluated by comparing the nucleotide sequences of orthologous and paralogous γ-genes in closely related species. Previously, we determined the nucleotide sequences of the linked fetal globin genes of human, chimpanzee, and gorilla and, by using the parsimony method, identified sequence regions within the γ-loci which appear to have been involved in gene conversion events. Comparing the human and gorilla γ-gene sequences enabled us to identify three possible human gene conversion events labeled C2-C4 (Scott et al., 1984), of which C4 represents the conversion originally described by Slightom et al. (1980). Similar analysis with the chimpanzee γ-gene sequences revealed three possible chimpanzee conversion regions, C5-C7, and one gorilla conversion region, C8. The finding that a stretch of simple-sequence DNA, (TG)n, is located at the 3' boundary of conversion regions C2, C4, and C6 and at the 5' boundary of C5 supports the suggestion that this (TG)n element may provide a "hot spot" for strand breakage and transfer between the paralogous γ-genes (Kilpatrick et al., 1984). However, conversion regions C7 and C8 in chimpanzee and gorilla, respectively, encompass exons 1 and 2, intron 1, and only the 5' boundary of intron 2 and thus do not appear to be associated with the hot spot element. The polarity of these nucleotide sequence transfers appears not to be random, because all regions except C5 have γ1-gene sequences (donor sequences) superimposed onto γ2-gene sequences (acceptor sequences). Conversion C5 has this polarity reversed. In addition to identifying species-specific conversions, our previous evidence (a less-than-expected divergence between γ1- and γ2-intron 2 sequences) suggested that a conversion event (C1) occurred in a late common ancestor of Homo, Pan, and Gorilla.
The present study extends our previous findings. We have now determined the nucleotide sequences of the orangutan γ1- and γ2-fetal globin genes and added these sequences to our evolutionary analysis. For our nearest outgroup, we have also included the sequence of the single γ-gene of spider monkey (Giebel et al., 1985). As in Homo, Pan, and Gorilla, the 5' 600 bp of the two orangutan γ-genes are more similar to each other than to orthologous positions in other species, whereas the reverse relationship is found for the 3'-flanking DNA. Thus, two different evolutionary histories are indicated for the 5' and 3' regions of these genes. In addition to this orangutan 5' conversion region, labeled C11, two downstream conversion regions, C12 and C13, are indicated by shared sites between orangutan γ1- and γ2-loci. Moreover, perhaps because of having used a closer outgroup, our evidence now suggests that additional conversions, C9 and C10, may have occurred in gorilla. The present data continue to support the occurrence of a conversion, C1, in a common ancestor of Homininae (Homo, Pan, and Gorilla) and further suggest that a conversion, labeled C0, occurred in a late common ancestor of Ponginae (Pongo) and Homininae. In sorting out this history of primate gene conversion events, our evolutionary analysis supports genealogically classifying extant Hominidae into these two subfamilies, with much later divergence times for Homininae genera than for the Hominidae subfamilies, 6-8 million years ago compared to 12-16 million years ago (Koop et al., 1986a, 1986b).

EXPERIMENTAL PROCEDURES
Materials-Restriction endonucleases ApaI, AccI, BamHI, BglII, EcoRI, HindIII, Sau3AI, StuI, XbaI, and XhoI were obtained from either Promega Biotec or New England Biolabs. Polynucleotide kinase was obtained from Pharmacia and bovine alkaline phosphatase was obtained from Boehringer Mannheim. Radioactive nucleotides [α-32P]ATP (2000-3000 Ci/mM; 1 Ci = 3.7 × 10¹⁰ Bq) and T4 ligase were obtained from New England Nuclear. Chemicals used for DNA sequencing were obtained from vendors recommended by Maxam and Gilbert (1980). X-ray film rolls (20 cm × 25 m; XAR351) and sheet film (35 × 45 cm; XAR5) were obtained from Kodak. Intensifying screens (Quanta III: 35 cm × 1 m) were obtained from Du Pont. Nitrocellulose paper (BA-85) was obtained from Schleicher and Schuell and 3MM paper from Whatman. DNA sequencing gel stands and safety cabinets were obtained from Fotodyne, Inc.
DNA Isolation and Gene Cloning-Total DNA was isolated from liver removed from two nonrelated orangutans; orangutan YO-1 (Yerkes Primate Center, Emory University, Atlanta, GA) DNA was a gift from E. Zimmer (Louisiana University, Baton Rouge, LA), and orangutan NZO-1 (National Zoo, Washington, D. C.) DNA was isolated using the method described by Blin and Stafford (1976). Purified DNAs were partially digested with either EcoRI (YO-1) or Sau3AI (NZO-1), and DNA fragments of 15-20 kb were size selected on 5-20% NaCl gradients. Size-selected DNA fragments were cloned into the appropriate λ vector arms: EcoRI-cut fragments of YO-1 DNA were cloned into EcoRI-cut Charon 32 arms, and Sau3AI-cut fragments of NZO-1 DNA were cloned into BamHI-cut Charon 35 arms (Loenen and Blattner, 1983). Recombinant phage DNAs were packaged into phage capsids using the in vitro phage-packaging procedure described by Hohn (1979). Both recombinant phage libraries, a total of 1-2 × 10⁶ phage from each, were screened by plating on the RecA- Escherichia coli host ED8767 (Murray et al., 1977) followed by nitrocellulose blotting as described by Benton and Davis (1977). Nitrocellulose filters were hybridized against a 32P-labeled AvaII-EcoRI fragment (245 bp) cut from the human γ cDNA clone pJW151 (Wilson et al., 1978). Clones containing regions of the orangutan β-globin gene cluster were purified and phage DNA was isolated as described previously. Two recombinant phage clones were isolated from orangutan YO-1: PpyCh32-13.5 and PpyCh32-14.2, which contain the linked fetal globin genes and the linked ψη- and δ-globin genes, respectively. From screening the Charon 35 recombinant phage library of orangutan NZO-1 we obtained two clones. Both clones contain the linked fetal globin genes; however, clone PpyCh35-19.5 also contains the γ2- and ψη-globin gene linkage.
The 7.25- and 2.64-kb EcoRI fragments containing the orangutan γ1- and γ2-fetal globin genes, respectively, were isolated from PpyCh32-13.5 and cloned into pBR322. The 4.8-kb BamHI fragment (γ1-intron 2, intergenic DNA, and 5' third of the γ2-gene) of PpyCh32-13.5 and the 2.8-kb EcoRI fragment (γ2-exon 3, 3'-untranslated, and 3'-flanking) of PpyCh35-19.5 were cloned into the respective sites of pUC8. Transformation of E. coli K12 strain HB101 was done using the extended calcium shock method described by Dagert and Ehrlich (1979); the resulting clones were designated pPpy13.5-R7.25, pPpy13.5-R2.64, pPpy13.5-B4.8, and pPpy19.5-R2.8. Plasmid DNAs were purified using the alkaline-extraction procedure described by Birnboim and Doly (1979), followed by two CsCl-ethidium bromide gradient bandings in a type 70.1 Ti rotor (fixed angle rotor).
DNA Sequencing-Chemical sequencing was performed as described by Maxam and Gilbert (1980). Generally, 10-20 μg of plasmid or 50 μg of λ phage DNAs were digested with the appropriate restriction enzyme and end-labeled. The conditions and type of DNA sequencing gels used have been described previously. Pouring of gels (20 cm wide, 104 cm long, and 0.2 mm thick) was improved using a technique developed by Frederick R. Blattner (University of Wisconsin, Madison, WI) which utilizes a small hole (a fraction of an inch in diameter) drilled into the face plate. This hole is located 1 inch from the bottom and 1 inch from one side of the face plate. After assembly, gels are poured by laying them horizontally on a bench, fitting the hole with the barrel of a 50-ml syringe, and pouring the acrylamide gel solution into the syringe. The gel mold will fill by gravity flow, but if necessary the acrylamide solution can be forced into the mold by using the syringe plunger. After filling, the comb is set into place, the bottom spacer is removed, and the hole is plugged with grease. This method of pouring large gels avoids the necessity of lifting the gel during the pouring process, is much faster, and reduces the risk of bubble formation. Efficiency of these DNA sequencing gels was increased by using wedge-shaped spacers (Olsson et al., 1984) obtained from C. B. S. Scientific Co., Inc. The spacers measure 60 cm long, with thickness ranging from 0.2 (top) to 0.6 mm (bottom); a 0.2-mm spacer was used for the remaining (upper) 44 cm. These gels are referred to as "bellbottom gels." A typical 4% acrylamide bellbottom gel is run until the xylene cyanol dye is 65 cm from the top; nucleotide sequence reads start at about 50 bp (from the end-labeled site) and are generally readable out to about 550 bp. Nucleotide sequencing reads greater than 600 bp can be obtained by using a 4% acrylamide gel fitted with uniform 0.2-mm spacers and running the dye to the bottom.
Nucleotide sequences near the labeled site can be obtained by running a 16% acrylamide gel (30 cm long) until the dye is 8 cm from the top. The capacity of these gels can be increased by using a comb with 3-mm slots (International Biotechnologies Inc.), which allows 32 loads to be made across a 20-cm-wide gel. Thus, from a single 4% bellbottom gel loaded once with eight different sequenced fragments, as many as 4000 bp can be read.
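The capacity figure quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes four base-specific reaction lanes per fragment (the usual arrangement for chemical sequencing, not stated explicitly in the text) and uses the readable window of about 50 to 550 bp given for a bellbottom gel. The function name `gel_capacity` is hypothetical.

```python
# Assumption: each end-labeled fragment occupies four lanes,
# one per base-specific chemical cleavage reaction.
LANES_PER_FRAGMENT = 4


def gel_capacity(total_lanes, read_start_bp, read_end_bp,
                 lanes_per_fragment=LANES_PER_FRAGMENT):
    """Return (fragments per gel, total readable bp per gel)."""
    fragments = total_lanes // lanes_per_fragment
    readable_per_fragment = read_end_bp - read_start_bp
    return fragments, fragments * readable_per_fragment


# 32 loads across a 20-cm-wide gel, reads from about 50 to about 550 bp
fragments, total_bp = gel_capacity(32, 50, 550)
print(fragments, total_bp)  # 8 4000
```

Under these assumptions, 32 lanes hold eight fragments of roughly 500 readable bp each, which matches the stated 4000 bp per gel.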
Gene Conversion and Evolutionary Analysis-Using the principle of minimum change or parsimony as described in previous studies (Scott et al., 1984; Goodman et al., 1984; Slightom et al., 1985), we aligned the nucleotide sequences of 10 fetal globin genes (four γ1-genes and five γ2-genes of the four hominids and the single γ-gene of the spider monkey Ateles geoffroyi) against one another (from 154 bp 5' of the cap site to 170 bp 3' of the poly(A) site) and then determined independently at each variable position the most parsimonious branching arrangement. At each variable position where two γ-genes of a hominid taxon shared a sequence identity that the parsimony solution depicted as derived rather than primitive, the result was taken as evidence for a putative conversion encompassing that position in the γ1- and γ2-gene pair. Conversely, absence of sequence identity could be taken, depending on the parsimonious branching arrangement, as evidence against conversion for that gene pair at that position. On synthesizing the results of all the separate parsimonious solutions, we identified gene conversion regions by minimizing the number of conversion events needed to account for the overall most parsimonious solution. This procedure also resulted in the reconstruction of the course of evolution for the nine hominid fetal globin genes.

Fig. 1. Restriction-enzyme site maps of λ clones containing regions of the orangutan β-type globin gene cluster. The top line shows the organization of the human γ1-, γ2-, ψη-, and δ-globin genes; raised bars denote the location of genes. The locations of various restriction-enzyme sites found in the human β-globin gene cluster are shown (Shen et al., 1981; Collins and Weissman, 1984; Koop et al., 1986a). Insert regions from orangutan recombinant λ clones PpyCh32-13.5 and -14.2 (from orangutan YO-1) and clones PpyCh35-16.8 and -19.5 (from orangutan NZO-1) have been mapped and are shown below the corresponding regions of the human gene cluster. Many restriction-enzyme sites are shared between the human and orangutan β-globin gene regions, see text. The asterisks above HindIII sites indicate that they are polymorphic (Jeffreys, 1979). Restriction enzymes are: Ba, BamHI; Bg, BglII; H, HindIII; and R, EcoRI.

Fig. 2. Subclone regions of the orangutan γ-globin genes and strategies used to determine their nucleotide sequences. Plasmid subclones pPpy13.5-R7.25, -R2.6, and -B4.8 are from λ clone PpyCh32-13.5 and pPpy19.5-R2.8 is from λ clone PpyCh35-19.5. Horizontal arrows shown below the cloned regions denote the restriction-enzyme site labeled with 32P, the direction, and the distance sequenced. Restriction enzymes are the same as shown in Fig. 1 with the addition of: Ac, AccI; Ap, ApaI; Bs, BstEII; St, StuI; and X, XbaI.

RESULTS AND DISCUSSION
Restriction Maps of Orangutan γ1-, γ2-, ψη-, and δ-Globin Gene Regions-From screening the orangutan Ch32 library, YO-1, with 32P-labeled human γ-globin probe prepared from the cDNA clone pJW151 (Wilson et al., 1978) we isolated two γ-gene hybridizing λ clones, PpyCh32-14.2 and PpyCh32-13.5. Detailed restriction enzyme site mapping showed that clone PpyCh32-14.2 contains the orangutan ψη- to δ-globin gene linkage and that clone PpyCh32-13.5 contains the linked γ1- and γ2-fetal globin genes (Fig. 1). The nucleotide sequence of this orangutan ψη-globin gene has already been reported (Koop et al., 1986a). The linkage between the γ2- and ψη-globin genes was not obtained from this screening. We subsequently screened the second orangutan library, NZO-1, and isolated two clones. Both of these clones contain linked γ1- and γ2-fetal globin genes, and clone PpyCh35-19.5 contains the linkage region between the γ2- and ψη-genes (see Fig. 1).
The EcoRI fragments containing the γ1- and γ2-fetal globin genes, from λ clone PpyCh32-13.5, were isolated and subcloned into pBR322, yielding clones pPpy13.5-R7.25 (γ1-gene) and pPpy13.5-R2.64 (γ2-gene) (Fig. 2). Restriction enzyme site mapping reveals that most of the sites found in and around the human fetal globin genes are also present in these orangutan genes. Notable exceptions include the absence of EcoRI sites in the 3'-flanking DNAs of each γ-gene (the EcoRI fragments 3' of the γ1- and γ2-genes are 2.2 and 2.8 kb, respectively). The XhoI site in each intron 2 is also absent, as is the case in chimpanzee fetal globin genes. One additional HindIII site is located in the 3'-untranslated region of the orangutan γ1-gene (see Fig. 1).

nucleotide sequence information was obtained from the gorilla (Scott et al., 1984) and chimpanzee fetal globin genes so that the CCAAT-element could be added to our analysis. Although most of the orangutan γ1- and γ2-gene sequences presented in Fig. 3 were obtained from the same chromosome of orangutan YO-1, sequences extending 3' from the EcoRI site in exon 3 of the γ2-gene were obtained from a different orangutan chromosome (NZO-1). Because these two γ2-gene region clones differ in their intron 2 nucleotide sequences (data not shown), they certainly must represent different orangutan chromosome alleles. Nevertheless, our previous studies of this 3' noncoding and flanking DNA region indicate that the use of a "hybrid" γ2-gene nucleotide sequence will not distort our analysis, because this region is not active in gene conversions and has retained the expected paralogous relationship (see discussion below).
As evident from the nucleotide sequences shown in Fig. 3, the fetal globin genes from the five simian species share a high degree of identity not only in exon regions, but also in noncoding regions. Tables I and II summarize regional divergence results from noncoding and coding DNAs, respectively. A high degree of sequence conservation exists in and around the 5'-flanking promoter elements (CCAAT and AATAAA), 5'-untranslated, exons 1 and 2, and intron 1 sequence regions. In contrast, intron 2, 3'-untranslated, and flanking DNA regions have accumulated many more substitutions. The 3' noncoding regions all share the same poly(A) addition signal (AATAAA), with the spider monkey containing two signals; however, only the shared poly(A) signal is functional (Giebel et al., 1985). The intron 1 nucleotide sequence of the orangutan γ2-gene is unusual because it shows a one base pair insertion (T at position 256, Fig. 3) which is identical to that found in the single γ-gene of spider monkey and may have been present in the simian ancestor. The intron 2 sequences of the orangutan γ1- and γ2-genes (874 and 876 bp, respectively) are shorter than the intron 2 sequences of the other primate γ-genes due to the presence of shorter tracts of the simple sequence DNA (TG)n; values of n are 11 for the γ1-gene and 13 for the γ2-gene. The value of n for the orangutan γ1-gene is half that found in the orthologous gene of the other hominids shown in Fig. 3 (positions 1076-1137), which have n values ranging between 22 and 26. The shortness of the (TG)n repeats in the orangutan γ1- and γ2-genes may be important because there is a lack of any clear gene conversion pattern in intron 2 which has been initiated by their (TG)n sequence elements (see Fig. 4 and discussion below).
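As an illustration of how the repeat number n of a (TG)n tract might be read off a sequence, the following minimal sketch scans for the longest uninterrupted run of TG dinucleotides. It is an assumption-laden simplification: the function name `longest_tg_tract` is hypothetical, the test sequences are synthetic rather than the published intron 2 sequences, and real tracts may contain interruptions that this pattern does not bridge.

```python
import re


def longest_tg_tract(seq):
    """Return (n, start) for the longest uninterrupted (TG)n run.

    `start` is the 0-based index of the tract; (0, -1) means no TG found.
    """
    best_n, best_start = 0, -1
    for match in re.finditer(r"(?:TG)+", seq.upper()):
        n = len(match.group(0)) // 2
        if n > best_n:
            best_n, best_start = n, match.start()
    return best_n, best_start


# Synthetic examples with the n values reported for the orangutan genes
toy_gamma1 = "AACC" + "TG" * 11 + "AAC"
toy_gamma2 = "AACC" + "TG" * 13 + "AAC"
print(longest_tg_tract(toy_gamma1))  # (11, 4)
print(longest_tg_tract(toy_gamma2))  # (13, 4)
```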
Evolution of the Orangutan γ-Globin Genes-Comparison of the coding nucleotide sequences of both orangutan γ-genes reveals that they differ by a total of five nucleotide substitutions, two nonsynonymous and three synonymous. The two nonsynonymous substitutions occurred in descent of the γ2-gene; the first is at position 276 (G/C; changing amino acid position 33, Val to Leu) and the second is at position 402 (A/G; changing amino acid 75, Ile to Val). These positions in the orangutan γ1-gene are identical to those found in all the other hominid γ-genes (see Fig. 3 and Table III). The nonsynonymous substitution at amino acid position 75 is predicted by amino acid sequence data (Huisman et al., 1973; Schroeder et al., 1978), but amino acid analysis at position 33 is not available. Amino acid sequence heterogeneity is predicted for amino acid position 135, Thr/Ala (Huisman et al., 1973); however, our nucleotide sequences show that for position 135 both γ-genes encode Ala. This difference could be indicative of allelic polymorphism in the orangutan population.
Synonymous substitutions occur in both orangutan γ-genes; the first is located at nucleotide position 137 of the γ1-gene (C/T), the second is at position 377 (G/A), and the last is located at position 449 of the γ2-gene (A/G). All of these nonsynonymous and synonymous substitutions are unique to orangutan γ-genes except for the substitution at position 449, in which orangutan γ2 shares a G-nucleotide with the single γ-gene of the spider monkey. This latter substitution provides evidence indicating that a conversion occurred in the conserved 5' region of the duplicated hominine γ-genes, with γ2 accepting sequence transfer from the γ1-gene (see discussion below).
In addition to the above mentioned substitutions, the human and orangutan γ-gene coding sequences differ by six substitutions. Four of these substitutions are nonsynonymous and are located at nucleotide positions 108 (stem to both orangutan γ-genes), 408 (stem to all hominine γ-genes), 490 (stem to hominine γ1-genes including converted human and chimpanzee γ2-genes), and 1500 (stem to hominine γ2-genes). A summary of primate γ-gene nonsynonymous substitutions (most of which are at the root of the split between catarrhine and platyrrhine species) is presented in Table III. The orangutan γ1- and γ2-genes share the two synonymous substitutions (see positions 356 and 1447 in Fig. 3); the substitution at position 356 is also shared with both chimpanzee γ-genes (Fig. 3). Two more synonymous substitutions are specific for the chimpanzee genes, one at position 419 (γ2) and another at position 437 (γ1). In addition, inspection of the aligned sequences (Fig. 3) shows that a minimum of 21 mutations (at positions -69, 250, 408, 490, 937, 1138, 1240, 1241, 1383, 1500, 1537, 1538, 1539, 1540, 1551, 1612, 1695, 1712, 1724, 1768, 1776) unequivocally favor the genealogical grouping of Homo, Pan, and Gorilla into the subfamily Homininae, separate from the subfamily Ponginae (Pongo). The percent divergence values tabulated for these coding sequences (Table II) reveal that the silent site divergence between Pongo and Homininae, 3.5%, is similar to the divergence found from orangutan ε-, ψη-, α1-, and α2-globin gene nucleotide sequences and from total unique sequence genomic DNA hybridization data (Koop et al., 1986a, 1986b; Marks et al., 1986; Sibley and Ahlquist, 1984). Thus, silent sites in the coding sequences of these γ-genes are not evolving at a slower rate than average noncoding DNA of hominid genomes, although in previous studies (Scott et al., 1984; Slightom et al., 1985) this was thought to be the case. From nucleotide sequence analyses of hominid genomic DNAs, we and others calculate the overall neutral drift rate to be about 1.3 × 10⁻⁹ substitutions/site/year (Goodman et al., 1984; Slightom et al., ... one-fourth the average reported mammalian rate of 5 × 10⁻⁹ substitutions/site/year (Li et al., 1985). The divergence between γ1 and γ2 coding sequences is less within each hominid species than the divergence of positional orthologues among species (γ1 to γ1, γ2 to γ2), a finding which strongly suggests ...

Asterisks indicate positions where interspecific similarity between γ1- (or γ2-) genes is greater than intraspecific similarity between γ1- and γ2-genes. Two consecutive dots indicate greater intraspecific similarity and suggest a conversion event (labeled C1-C13). Below the map of γ1- and γ2-genes we have indicated three regions, and using regional parsimony analysis we have determined the overall phylogenetic order for these three distinct γ-gene regions. The complex nature of region 2 necessitated breaking up sequences into smaller subsequences that reflected patchy conversion regions (see text).

TABLE III
Amino acid replacements encoded in primate γ-globin gene nucleotide sequences
Amino acid residue number (nucleotide position)

... pair of hominid species (e.g. human and orangutan) is much less than that expected in the absence of gene conversion for duplicated loci which arose in the stem of the Catarrhini about 35 million years ago, long before the separation of Homininae from Ponginae. In addition, evidence supporting the occurrence of conversions comes from noncoding 5'-flanking and intron 2 sequences. For these regions, we calculate divergence values for paralogous genes that are much less than that expected for these duplicated loci (see calculation results listed in Table I).

Parsimony Analysis Reveals Gene Conversions Between γ1 and γ2 Genes-Our initial examination of the aligned γ-gene sequences in Fig. 3 reveals that these γ-gene sequences consist of three distinct regions which exhibit quite different evolutionary histories. These three regions are represented by nucleotide positions -154 to 495 (5'-flanking and untranslated, exons 1 and 2, and intron 1), 496-1535 (intron 2 and exon 3), and 1536-1792 (3'-untranslated and flanking), respectively (see Fig. 4). Of these regions, only region 3 shows evidence that the two ancient paralogous γ-gene lineages are unaffected by gene conversions. The orthologous splittings within these lineages show divergence of an orangutan clade from a human, chimpanzee, and gorilla clade (Fig. 4). Although an identical speciation pattern is evident in region 1, the sequence relationships suggest that later parallel duplications produced the duplicated γ-genes of human, chimpanzee, gorilla, and orangutan. However, this suggestion is clearly inconsistent with the pattern shown in region 3 and with the fact that two linked γ-loci are present in all extant catarrhine primates.
The pattern of sequence relationships found in region 1 can be explained if we hypothesize, in each species lineage, a transfer of genetic information via gene conversion events between paralogues of γ-genes. The complex mosaic pattern (see below) found in region 2 could then be explained by the initiation and/or termination of patchy type gene conversion events extending from or into region 1. In order to test the gene conversion hypothesis in more detail, and also to delineate additional conversion regions, we examined each variable position in the aligned γ-gene sequences for evidence of species-specific conversion events. Fig. 4 presents results from position-by-position parsimony analysis over the 1946 aligned nucleotide positions of the 10 simian fetal globin genes shown in Fig. 3. A "dot" at a variable site indicates that the positional paralogues are closer genealogically to each other than to positional orthologues, therefore supporting a hypothesized conversion, whereas an "asterisk" indicates the reverse and supports no conversion. For example, position 108 (Fig. 3) is represented by a dot in the orangutan row (O1O2) of Fig. 4 because at this position the two orangutan γ-genes share the same T-nucleotide whereas the other eight γ-genes have a G-nucleotide. In contrast, position 583 is represented by an asterisk in the O1O2 row because orangutan, chimpanzee, and gorilla γ2-orthologues share an A-nucleotide whereas the γ1-orthologues of these three species share a G-nucleotide. A 13-bp deletion over positions 572-584 in the three human γ-genes (H1, H2a, H2b) includes position 583, and this deletion is represented by a dot at position 572 in each of the two human rows (H1H2a and H1H2b).
Positions 69, 210, 250, 256, 408, and 449, each viewed independently of the others, have dots in the ancestral hominine (An1An2) row because in each case all hominine γ1- and γ2-genes share a nucleotide or deletion which is different from the deduced ancestral nucleotide (rooted with the spider monkey γ-gene sequence). Positions 256 and 449 are especially informative because they provide evidence that in conversion C1 (ancestral Homininae) γ2 sequences accepted γ1 sequences. It should be noted that on putting together these results at different positions, the most parsimonious solution does not require conversion of γ2 by γ1 in the Homininae ancestor, but it does require that γ2 sequences accept γ1 sequences in the 5' conversion region of each species (C7, C8, and C11).
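The dot and asterisk scoring can be illustrated with a deliberately simplified sketch. The analysis described here relies on a parsimony reconstruction at each position; the code below substitutes a cruder rule (an assumption, not the published procedure): a site scores as a dot when the two paralogues of a species share a base found in no other gene, and as an asterisk when the paralogues differ but each matches at least one positional orthologue. The function `classify_site` and the dictionary layout are hypothetical, the two human γ2 alleles are collapsed into one entry, and the human genes are omitted at position 583 because they carry a deletion there.

```python
def classify_site(column, species):
    """Score one alignment column for one species' gamma1/gamma2 pair.

    column maps (species, locus) -> base, with locus "g1", "g2", or "g"
    (the single outgroup gene). Returns "dot", "asterisk", or None.
    """
    g1 = column[(species, "g1")]
    g2 = column[(species, "g2")]
    others = {key: base for key, base in column.items() if key[0] != species}
    if g1 == g2:
        # Shared base counts as derived only if no other gene carries it
        return "dot" if g1 not in others.values() else None
    ortho1 = {b for (_, locus), b in others.items() if locus == "g1"}
    ortho2 = {b for (_, locus), b in others.items() if locus == "g2"}
    if g1 in ortho1 and g2 in ortho2:
        return "asterisk"
    return None


# Position 108: both orangutan genes carry T, all other genes carry G
site_108 = {
    ("orangutan", "g1"): "T", ("orangutan", "g2"): "T",
    ("human", "g1"): "G", ("human", "g2"): "G",
    ("chimpanzee", "g1"): "G", ("chimpanzee", "g2"): "G",
    ("gorilla", "g1"): "G", ("gorilla", "g2"): "G",
    ("spider_monkey", "g"): "G",
}
# Position 583: gamma2 orthologues share A, gamma1 orthologues share G
site_583 = {
    ("orangutan", "g1"): "G", ("orangutan", "g2"): "A",
    ("chimpanzee", "g1"): "G", ("chimpanzee", "g2"): "A",
    ("gorilla", "g1"): "G", ("gorilla", "g2"): "A",
}
print(classify_site(site_108, "orangutan"))  # dot
print(classify_site(site_583, "orangutan"))  # asterisk
```

This shortcut reproduces the two worked examples from the text, but it cannot distinguish derived from primitive states in harder cases, which is why a tree-based reconstruction is needed for the full alignment.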
A highly nonrandom distribution of dots and asterisks is found for the γ1- and γ2-gene comparisons shown in Fig. 4, with dots being clustered among sites in region 1 and asterisks clustered in region 3. This distribution of characters indicates that the γ-gene 5' regions are prone to gene conversions, whereas the 3' regions resist conversions (Fig. 4). Between these two regions (from the end of exon 2 to the end of exon 3) is region 2, which contains approximately 1000 bases that show a complex, mosaic arrangement of dots and asterisks. To account for the distribution of dots by the minimum number of gene conversion events, we treat two or more consecutive dots as part of the same conversion region. In fact, because of the scarcity of dots in the 3' region and the abundance of asterisks, even the presence of two widely separated dots which are not interrupted by asterisks appears to be a nonrandom occurrence. Analyses of mutational patterns in the 3' region indicate that parallel mutations resulting in the same nucleotide being present at a homologous position in γ1- and γ2-gene pairs are unlikely. As a result of these observations, our minimal criterion for assigning a hypothetical gene conversion, using the data in Fig. 4, is the presence of two consecutive dots (not necessarily adjacent or even close) which are not interrupted by an asterisk. Once a gene conversion region has been identified by this criterion, the conversion is allowed to spread until it meets the first of two consecutive asterisks or enters a sequence region of relatively frequent mismatches between positional paralogues. However, a gene conversion does not have to be continuous; we can postulate a conversion event in which converted sequences are interspersed among stretches of unconverted sequences, i.e. a patchy or bubble conversion (Kourilsky, 1983; Michelson and Orkin, 1983; Stoeckert et al., 1984; Powers and Smithies, 1986).
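The region-calling criterion just described can be sketched as a small procedure over an ordered list of scored sites. This is a minimal sketch under stated assumptions: the function name `call_conversion_regions` and the positions in the example are hypothetical, a region is reported from its first to its last supporting dot, and the second stopping rule (entering a stretch of frequent paralogue mismatches) is not modeled.

```python
def call_conversion_regions(marks):
    """Call conversion regions from [(position, symbol), ...] sorted by position.

    symbol is "." (supports conversion) or "*" (opposes it). A region is
    seeded by two consecutive dots and spreads in both directions until it
    meets the first of two consecutive asterisks.
    """
    symbols = [symbol for _, symbol in marks]
    n = len(symbols)

    def blocked(k, step):
        # True if site k is the first of two consecutive asterisks
        # when walking in direction `step` (+1 right, -1 left)
        nxt = k + step
        return symbols[k] == "*" and 0 <= nxt < n and symbols[nxt] == "*"

    regions = []
    i = 0
    while i < n - 1:
        if symbols[i] == "." and symbols[i + 1] == ".":
            lo, hi = i, i + 1
            while lo - 1 >= 0 and not blocked(lo - 1, -1):
                lo -= 1
            while hi + 1 < n and not blocked(hi + 1, +1):
                hi += 1
            # Trim isolated asterisks left at either edge
            while symbols[lo] == "*":
                lo += 1
            while symbols[hi] == "*":
                hi -= 1
            regions.append((marks[lo][0], marks[hi][0]))
            i = hi + 1
        else:
            i += 1
    return regions


# Hypothetical row of scored sites (positions are illustrative only)
row = [(10, "."), (20, "."), (30, "."), (40, "*"), (50, "."),
       (60, "*"), (70, "*"), (80, "."), (90, "."), (100, "*")]
print(call_conversion_regions(row))  # [(10, 50), (80, 90)]
```

In this toy row the single asterisk at position 40 does not break the first region, whereas the pair of asterisks at 60 and 70 does, so two regions are called.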
By using the above criteria, we have determined, for each row of paired γ1 γ2-genes in Fig. 4, the minimum number of gene conversion events needed to account for the pattern of dots, and these results are delineated (as bars) in the upper part of Fig. 4. Even though the six sets of γ1 γ2-gene pairs show 13 conversion regions (C1-C13), we need hypothesize no more than a single conversion event per γ-gene pair if we assume that the single conversion occurred via a patchy conversion mechanism.
Our present parsimony analysis confirms the presence of C2, C3, and C4 (human) suggested by Scott et al. (1984) and C5, C6, C7 (chimpanzee), and C8 (gorilla) suggested by Slightom et al. (1985). By the criterion of two consecutive dots, we further hypothesize two additional conversion regions in gorilla: C9 (extending approximately 120 bp 5' of the hot spot (TG)n tract) and C10 for positions 1240 and 1241, downstream of the hot spot (Fig. 4). From our analysis of the orangutan γ-genes, we hypothesize three conversion regions: C11, extending from the CCAAT-element through position 240 and possibly through exon 2; C12, extending from about position 620 to 720 in the upstream region of intron 2; and C13, which encompasses exon 3 and extends through position 1600 (see Fig. 4). In contrast to the hominine results, the hot spot does not border any of the orangutan conversion regions.
The hypothesized conversion, C1, which occurred in a late common ancestor of Homininae, is neither supported nor contradicted by our parsimony analysis, but is suggested by the lack of divergence in intron 2 (Table I). For instance, 3' of the hot spot in intron 2, there are no two linked dots which directly support C1 (An'An') and no linked dots which support a human conversion (H1H2A; H1H2B); yet the human γ1-gene has diverged only 3.0 and 3.8% from the γ2-alleles A and B, respectively (Table I). This degree of divergence is much less than expected (8-12%) for sequences which descended from a gene duplication that occurred in the basal catarrhines about 35 MYA (Shen et al., 1981; Scott et al., 1984). Only the region 3' of exon 3 shows the expected degree of divergence (see Table I). Even in the 5' region of intron 2, where asterisks at 11 positions oppose conversion C1, the hominine γ1- and γ2-genes still differ by less than 7%; this suggests that another ancient gene conversion (C0, not shown in Fig. 4) [...] in Fig. 5). Alternatively, the 5' intron 2 region of each hominid species may have been involved in species-specific conversions in which the γ2-gene always accepted γ1-gene sequences. One further observation is that the sequence stretches which immediately flank many of the proposed conversions have accumulated mutations at a rate greater than that found in comparable parts of these genes. This is particularly evident in orthologous comparisons of the 5' intron 2 and 3'-flanking regions (see calculation results listed in Table I). At present, it is difficult to determine whether these higher mutation rates are the cause or the result of conversion terminations.
[...] Fig. 4 and clearly show different evolutionary histories, with region 1 being involved in many gene conversions, whereby γ2-gene sequences have accepted γ1-gene sequences. Region 3 shows no history of gene conversions, whereas region 2 shows an intermediate and more complex history (see Fig. 4).
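The divergence values cited from Table I are percentages of differing positions between two aligned sequences. A minimal sketch of such a calculation follows; it assumes that alignment columns containing a gap are simply skipped, which may not match how gaps were scored for Table I, and the function name and gap symbol are hypothetical.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Percent of ungapped aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        # Assumption: columns with a gap in either sequence are not counted.
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared
```

A value computed this way for a γ1-γ2 pair can then be set against the 8-12% expected for sequences that have diverged since the duplication; a markedly lower value is the kind of evidence used above to suggest a conversion.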
To further test whether the proposed conversions shown in Fig. 4 yield a meaningful picture, we separated region 2 into sequence subregions with presumed independent histories and then determined their parsimonious branching arrangement. This branching arrangement is shown below region 2 in Fig. 4. Regions 1 and 3 were left intact, and their most parsimonious branching arrangements are also shown below the gene map in Fig. 4. Note that when conversions are accounted for in regions 1 and 2, sequence and species relationships for all three regions are compatible. Also, when conversions are accounted for, the region 2 tree depicts distinct paralogous γ1- and γ2-lineages which predate the separation of Ponginae from Homininae. Thus, the three trees shown in Fig. 4 support our conversion hypotheses and divide Hominidae into subfamilies Ponginae (Pongo) and Homininae (Homo, Pan, Gorilla).
With regard to the directions of conversions, the branching pattern of the region 2 tree shows all converted sequences joining the γ1 branch, with one exception. The exception, chimpanzee conversion C5, is clearly a member of the γ2 branch, as supported by the parsimony solutions at positions 1138, 1142, 1160, and 1172. The parsimony solutions at other positions show that γ2 sequences accepted γ1 sequences. Several examples of these positions are as follows. Position 626 provides parsimony evidence that the γ2-gene accepted γ1-gene sequences in the C12 region of orangutan and the C3 and C4 regions of human chromosomes A and B, respectively.
However, gorilla and chimpanzee γ2-genes are not converted here and contain a G-nucleotide, whereas the ancestral sequence (which contains an A-nucleotide, as deduced from the spider monkey γ-gene) was retained in the hominid γ1 lineage and transferred by conversions to orangutan and human γ2-genes. Position 1023 provides evidence that, either in an ancestral hominine conversion (C1) or in later conversions of gorilla (C9), chimpanzee (C6), and human (C2 and C4), γ2-genes accepted γ1-gene sequences. The ancestral G-nucleotide (retained in spider monkey γ and in orangutan γ1) mutated to an A-nucleotide in the stem of hominine γ1, and then this A-nucleotide (either in one early hominine conversion event (C1) or in separate later conversion events) was transferred to gorilla, chimpanzee, and human γ2-genes. By the same argument, position 449 provides evidence that, either by conversion C1 or by separate later conversions, gorilla (C8), chimpanzee (C7), and human (C3 and C4) γ2-genes accepted γ1-gene sequences.
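The reasoning at position 626 can be laid out as a small tabulation. The sketch below is not a parsimony reconstruction over a tree; it only compares the γ1 state with the outgroup state and lists which γ2-genes share the γ1 nucleotide. The nucleotide states are those given in the text for position 626, while the function name and the dictionary layout are hypothetical.

```python
def tabulate_site(outgroup_state, gamma1_state, gamma2_states):
    """Summarize one alignment position for a set of gamma-2 genes.

    outgroup_state: nucleotide in the outgroup (here the spider monkey gene).
    gamma1_state:   nucleotide shared by the gamma-1 lineage.
    gamma2_states:  mapping of gamma-2 gene label -> nucleotide.
    """
    matching = sorted(g for g, nt in gamma2_states.items() if nt == gamma1_state)
    differing = sorted(g for g, nt in gamma2_states.items() if nt != gamma1_state)
    return {
        # True when gamma-1 still carries the state seen in the outgroup.
        "gamma1_is_ancestral": gamma1_state == outgroup_state,
        # Gamma-2 genes carrying the gamma-1 nucleotide (candidate recipients).
        "gamma2_matching_gamma1": matching,
        # Gamma-2 genes carrying a different nucleotide (unconverted here).
        "gamma2_differing": differing,
    }

# States at position 626 as described in the text.
site_626 = tabulate_site(
    outgroup_state="A",
    gamma1_state="A",
    gamma2_states={
        "orangutan": "A",
        "human A": "A",
        "human B": "A",
        "gorilla": "G",
        "chimpanzee": "G",
    },
)
```

Here the γ2-genes that match γ1 are those of orangutan and of human chromosomes A and B, which correspond to conversion regions C12, C3, and C4; the direction of transfer itself still rests on the tree-based parsimony argument given above.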
Does the (TG)n Tract Initiate Fetal-Globin Gene Conversions?-As already noted, a tract of TG dinucleotide repeats (at positions 1076-1136, Fig. 3) occurs in intron 2 of all primate fetal-globin genes so far sequenced. This tract borders five of the gene conversion regions depicted in Fig. 4: two human regions (C2 and C4), two chimpanzee regions (C5 and C6), and one gorilla region (C9). Slightom et al. (1980) first suggested that these (TG)n tracts may be a hot spot for genetic recombination. Such pyrimidine/purine repeats are known to adopt the left-handed Z-DNA conformation (Wells et al., 1982). Junctions between B- and Z-DNA conformations are accessible to single-strand-specific nucleases that act during genetic recombination (Singleton et al., 1982; Kilpatrick et al., 1984). Also, tracts of pyrimidine/purine dinucleotides stimulate recombination in the Simian Virus 40 genome (Stringer, 1985) and promote homologous pairing of DNA by the Rec 1 protein from the fungus Ustilago maydis (Kmiec and Holloman, 1986). Thus, considerable evidence now supports the hypothesis that (TG)n sequence elements can act as the initiation hot spot for recombinational events.
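Locating a (TG)n tract in a sequence is a simple pattern search, sketched below. The sketch assumes an uninterrupted run of TG dinucleotides, whereas real tracts may contain interruptions; the function name, the default minimum repeat number, and the example sequence are hypothetical.

```python
import re

def find_tg_tracts(sequence, min_repeats=5):
    """Return (start, end, repeats) for each uninterrupted (TG)n run.

    start is 0-based and end is exclusive, as in Python slicing.
    """
    pattern = re.compile(r"(?:TG){%d,}" % min_repeats)
    tracts = []
    for match in pattern.finditer(sequence.upper()):
        repeats = (match.end() - match.start()) // 2
        tracts.append((match.start(), match.end(), repeats))
    return tracts

# Hypothetical sequence: one run of six TG repeats and one shorter run.
example_sequence = "AAA" + "TG" * 6 + "CCC" + "TGTG"
tracts = find_tg_tracts(example_sequence, min_repeats=5)
```

The repeat count returned for each tract gives the value of n, one of the physical parameters considered below when comparing hot spots between species.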
The nonallelic orangutan fetal globin genes lack recognizable gene conversions which have the (TG)n elements at a boundary; however, it is possible that these elements did initiate recombinational events C11, C12, and C13 even though these conversions map at considerable distance from the hot spot. For this to be feasible, these conversions would have to be patchy. Even over conserved genic regions (exons 1 and 2, and intron 1), the degree of divergence between γ1- and γ2-genes is greater in orangutan than in each of the hominines (Tables I and II), a finding which suggests that the conversions between the orangutan γ1- and γ2-genes were discontinuous and patchy, as well as infrequent.
Perhaps the orangutan (TG)n hot spot has physical features which reduce its ability to initiate genetic recombination. Two important physical parameters are the length of the (TG)n element (Stringer, 1985) and the degree of nucleotide sequence identity immediately adjacent to the sites of strand transfer and invasion (Nicolas and Rossignol, 1983; Radding, 1982). Tracts of (CG)n with values of n as small as five can adopt the Z-DNA conformation (Singleton et al., 1983); however, (CG)n repeats form a Z-DNA helix more readily than (TG)n repeats. The importance of length may rest with the increased efficiency in pairing permitted by longer tracts of a repeated dinucleotide (Stringer, 1985). The degree of sequence identity immediately adjacent to the (TG)n elements may be the more important physical parameter affecting the frequency of γ-gene conversions. We note that in the gene conversions (C2, C4, C5; Fig. 4) which extend from the (TG)n elements, the species-specific nonallelic gene pairs share complete sequence identity adjacent to both (TG)n elements (see Fig. 3). In turn, we see no conversions extending from (TG)n elements in the orangutan γ-genes, and the gene sequences immediately 5' of the (TG)n tract do not share complete identity, as they differ by a 3-bp deletion in the γ1-gene and a 6-bp insertion in the γ2-gene. The orangutan γ-gene nucleotide sequences immediately 3' of their hot spots also differ as a result of a 2-bp deletion in the γ2-gene.
Concluding Remarks-Our analyses of the closely linked γ1- and γ2-fetal globin genes in these four hominid species provide a great deal of information concerning the effect that recombinational events such as gene conversions have on the evolution of a pair of duplicated genes. The results presented here clearly show that in all four species the evolution of the paralogous γ1- and γ2-genes is affected: these gene pairs are evolving more like orthologous genes, and in addition the rate at which they are diverging is reduced. Thus, this fetal globin gene conversion mechanism is probably playing a role in the evolution of the γ-gene pairs in all catarrhine species, although the degree and frequency of conversion appear to be quite different for the different species.
Why should there be more examples of γ2-gene sequences (5' of the hot spot element) accepting γ1-gene sequences than the reverse? A possible clue comes from human clinical studies (Bunn and Forget, 1986) which show that the ratio of Gγ to Aγ chains in fetal hemoglobin is about three to one. We suggest, therefore, that harmful mutations in regulatory and coding positions are more likely to have functional effects and be selected against if they occur in the γ1 locus than if they occur in the γ2 locus. That is, selection should act more stringently on the γ1-gene than on the γ2-gene, and the transfer of γ1-sequences to γ2-sequences would be more favorable than the reverse.