Rhesus fetal globin genes. Concerted gene evolution in the descent of higher primates.

The comparison of the nucleotide sequences of closely linked duplicated genes of higher eukaryotes has been important in the identification of molecular events that shape the evolution of mammalian genes, most notably recombinational events such as unequal crossovers and gene conversions. Toward this goal we have been comparing the nucleotide sequences of the paired gamma 1- and gamma 2-fetal globin genes from species of catarrhine primates. Previous comparisons document that, within each great ape species as in humans, the paired gamma-genes have been involved in gene conversion events. We now extend our analysis to the catarrhine superfamily Cercopithecoidea by obtaining the nucleotide sequence of the paired gamma 1- and gamma 2-genes of rhesus monkey (Macaca mulatta). The rhesus gamma 1- and gamma 2-genes diverge less from each other than from human, chimpanzee, gorilla, and orangutan gamma 1- or gamma 2-genes. This finding indicates that a species-specific gene conversion occurred between rhesus gamma 1- and gamma 2-genes. This gamma-gene conversion (labeled C14 in our series) involved at least 1898 base pairs, extending across the complete transcriptional region of the rhesus gamma-genes. C14 could have resulted from a single large conversion or from several short conversion events which may have involved the (TG)n repetitive sequence element. Parsimony analysis of the enlarged body of gamma-gene sequence data also strengthens the evidence for the 14 previously suggested gamma-gene conversion events: labeled C2, C3, and C4 in Homo; C5, C6, and C7 in Pan; C8, C9, and C10 in Gorilla; C11, C12, and C13 in Pongo; C1 in the stem to Homininae (the subfamily of Homo, Pan, and Gorilla); and C0 in the stem of Hominidae (the family of Pongo and Homininae).

* This study was supported by National Institutes of Health Grant HL33940 (to M. G. and J. L. S.), National Science Foundation Grant BSR 83-07336 (to M. G.), and an Alfred P. Sloan Foundation Award (to M. G.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) J03938.
‖ Present address: Dept. of Plant Pathology, University of Wisconsin, Madison, WI 53706.

During the evolution of eukaryotic genomes, genes have duplicated via nonhomologous and homologous chromosomal breakage and rejoining. In time, the duplicate genes have diverged as a result of base pair substitutions, deletions, and insertions. These processes resulted in the establishment of multigene families that encode protein products of similar, but not necessarily identical, functions. Diverging members of gene families have in some cases become pseudogenes, if their function is not needed, and in other cases, as a result of positive selection, have become founders of new gene types which encode proteins that have acquired new functions. Recently duplicated genes share identical nucleotide sequences; thus their divergence from each other may not be independent because of their involvement in recombinational events such as unequal crossing over and gene conversion. Unequal crossing over events can increase or decrease the number of genes, whereas gene conversion events can decrease, as in the case of concerted evolution (Maeda and Smithies, 1986; Walsh, 1987), or increase, as in the case of antigenic variation in trypanosomes (Pays et al., 1983), divergence between duplicated genes. Nucleotide sequence analysis of gene members of the α- and β-globin gene families has yielded many examples of both unequal crossover events (Zimmer et al., 1980; Shen et al., 1981; Hill et al., 1986) and gene conversion events (Michelson and Orkin, 1983; Scott et al., 1984; Slightom et al., 1985, 1987). Additional examples of recombinational events have been identified in other multigene families such as haptoglobin genes (Maeda et al., 1984), human and mouse immunoglobulin genes (Bentley and Rabbitts, 1983; Flanagan et al., 1984; Hayashida et al., 1984), and mouse histocompatibility genes (Schulze et al., 1983; Weiss et al., 1983). The β-type globin gene clusters of catarrhine primates are highly conserved, consisting of six paralogously related genes which are arranged (5' to 3') in the order of their developmental expression: ε (embryonic), duplicated γ1 and γ2 (fetal), inactive ψη (a pseudogene), and δ and β (adult) (Efstratiadis et al., 1980; Barrie et al., 1981; Zimmer, 1981; Scott et al., 1984; Goodman et al., 1984; Slightom et al., 1985, 1987).
The linked γ1- and γ2-genes are physically located within a 10-kb DNA region which arose as a result of a 5-kb duplication (Shen et al., 1981). This duplication occurred about 25 to 35 million years ago in the stem of the Catarrhini branch, after the divergence of Platyrrhini but before the divergence of the superfamilies Cercopithecoidea (Old World monkeys) and Hominoidea (Homo, Pan, Gorilla, Pongo, and Hylobates) (Barrie et al., 1981; Koop et al., 1986a, 1986b; Zimmer, 1981).
The abbreviations used are: kb, kilobase pairs; bp, base pairs.
In order to determine the type of molecular events which took place between the members of a duplicated gene pair, we have isolated and sequenced the closely linked γ-fetal globin gene pair from the major groups of higher primate species. We have previously obtained the nucleotide sequences of the γ1- and γ2-genes of human (Homo sapiens), chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus) and used the parsimony method (Goodman et al., 1984) to identify, within the paired γ-genes of each hominoid species, homologous regions which have been affected by gene conversion events. Thus far we have identified conversion regions cumulatively labeled C0 to C13 (Scott et al., 1984; Slightom et al., 1985, 1987). Many of these gene conversion events appear to have been initiated at the boundaries of a stretch of simple-sequence DNA, the (TG)n element, located in intron 2. This (TG)n element may provide a "hot spot" for strand breakage and transfer between paired homologous regions in the γ-genes (Kilpatrick et al., 1984). Interestingly, our previous analyses indicate that conversion directions are not random; most γ-gene conversions, except C5 in chimpanzee (Slightom et al., 1985), have γ1-gene sequences (donor sequence) superimposed onto γ2-gene sequences (acceptor sequence). In this report we present the nucleotide sequences of the γ1- and γ2-globin genes of a rhesus monkey (Macaca mulatta) and, on the basis of these additional sequence data on γ-globin genes, extend our reconstruction of the evolutionary history of the duplicated catarrhine γ-genes to the two superfamilies that comprise the infraorder Catarrhini: Cercopithecoidea and Hominoidea. The single γ-gene sequences from spider monkey (Giebel et al., 1985), brown lemur, and dwarf lemur (Harris et al., 1986) were used as outgroups of the catarrhine γ-gene sequences.
Comparison of the rhesus γ1- and γ2-genes shows that they are nearly identical over their complete transcriptional region and therefore appear to have been involved in an extensive (over 1898 bp) gene conversion event (or a series of smaller conversions) labeled C14.

EXPERIMENTAL PROCEDURES
Materials-Restriction endonucleases were obtained from either Bethesda Research Laboratories or New England Biolabs and used as specified by the vendor. Polynucleotide kinase was obtained from U. S. Biochemicals, and bovine alkaline phosphatase was obtained from Boehringer Mannheim. Radioactive nucleotide [α-32P]CTP (2000 to 3000 Ci/mM (1 Ci = 3.7 × 10¹⁰ Bq)) was obtained from New England Nuclear, and [γ-32P]ATP (2000 to 3000 Ci/mM) was obtained from ICN Pharmaceuticals. The 3'-end labeling kit was obtained from Amersham Corp. T4 ligase was obtained from Collaborative Research, and nitrocellulose filter membranes (BA-85) were obtained from Schleicher & Schuell. Chemicals used for DNA sequencing were obtained from vendors recommended by Maxam and Gilbert (1980). X-ray roll film (Kodak XAR-351) and DNA sequencing gel stands and safety cabinets were obtained from Fotodyne, Inc. Intensifying screens (Quanta III) were obtained from Du Pont-New England Nuclear. Wedge spacers (60 cm long, with thickness ranging from 0.2 mm (top) to 0.4 mm (bottom)) for forming bell-bottom gel molds were obtained from C. B. S. Scientific Co. Inc.
Rhesus γ-Gene Cloning and DNA Isolation-Total DNA from rhesus monkey (M. mulatta) was a gift from Dr. Sandy Martin (University of California, Berkeley, CA), and its isolation has previously been described. Purified rhesus DNA was partially digested with either MboI or EcoRI, and DNA fragments in the range of 15 to 20 kb were size-fractionated on 5 to 20% NaCl gradients. Size-fractionated rhesus DNAs were cloned into the appropriate λ vector arms; MboI-cut fragments were cloned into BamHI-cut Charon 30 arms (Rimm et al., 1980), and EcoRI-cut fragments were cloned into EcoRI-cut Charon 32 arms (Loenen and Blattner, 1983). Recombinant phage DNAs were packaged into phage capsids using the in vitro phage-packaging procedure described by Hohn (1979), and both recombinant phage libraries contained between 1 and 2 × 10⁶ phage. These libraries were plated on Escherichia coli hosts (Charon 30 on DP50supF and Charon 32 on the recA- ED8767), and the resulting phage plates were screened in situ by blotting with nitrocellulose filters as described by Benton and Davis (1977). Nitrocellulose filters were hybridized against 32P-labeled AvaII to EcoRI fragments (245 bp) cut from the human γ-gene clone pJW151 (Wilson et al., 1978). Recombinant λ clones containing regions of the rhesus β-globin gene cluster were purified and phage DNAs isolated as described by Slightom et al. (1980). The recombinant rhesus clone MmuCh30-13.1 was isolated by Dr. Sandy Martin and was found to contain the complete rhesus γ1-gene. From screening the rhesus-Ch32 library we isolated numerous clones containing regions of the rhesus β-globin gene cluster, but we were unable to obtain a single clone containing the linked γ1- and γ2-globin genes. We isolated recombinant λ clones MmuCh32-14.7 and MmuCh32-4, which contain the rhesus γ-gene linkage region, as determined by sequence analysis and comparison with this region of the sequenced human γ-genes (Shen et al., 1981).
The 8.8-, 2.6-, and 10.0-kb EcoRI fragments containing the rhesus γ1-, γ2-, and ψη-globin genes, respectively, were subcloned into pBR322 to facilitate DNA sequencing.
Transformation of E. coli K12 strain HB101 was done using the extended calcium shock method described by Dagert and Ehrlich (1979), and the resulting clones were designated pMmu13.1-R8.8, pMmu14.7-R2.6, and pMmu14.7-R10.0 (see Fig. 1). Plasmid DNAs were purified using the alkaline-extraction procedure described by Birnboim and Doly (1979), followed by two CsCl-ethidium bromide gradient bandings in a type 70.1 rotor (fixed angle rotor) to ensure removal of contaminating small RNAs.
DNA Sequencing-Chemical sequencing was performed as described by Maxam and Gilbert (1980). Generally, 10 to 20 μg of plasmid or 50 μg of λ phage DNAs were digested with the appropriate restriction enzyme and either 5'- and/or 3'-end-labeled. The conditions for end labeling and chemical sequencing, to ensure long DNA sequence reads (greater than 600 bp), have been described previously. A description and detailed procedure for forming, running, and exposing the 100-cm bell-bottom sequencing gels have been presented by Slightom et al. (1987). Most DNA sequences were determined from both DNA strands, either by sequencing the strands from opposite directions or by doing both 5'- and 3'-end labelings of the same restriction enzyme site. Even where the sequencing strategy (see Fig. 1) shows an arrow pointing in only one direction, sequence information was obtained from both strands. Nucleotide positions that differ among the aligned γ-globin gene sequences (see Fig. 2) were closely analyzed to ensure the accuracy of the rhesus γ-gene sequences presented here. This alignment provides an excellent position-by-position test for judging the accuracy of any new primate γ-gene sequence, and each nucleotide of these rhesus γ-gene sequences (presented in Fig. 2) has passed this rigorous test.
Gene Conversion and Evolutionary Analyses-Using the principle of minimum change or parsimony as described in previous studies (Scott et al., 1984; Goodman et al., 1984; Slightom et al., 1985, 1987), we aligned the nucleotide sequences of 14 fetal globin genes (see Fig. 2) and then determined independently, at each variable position, the most parsimonious branching arrangement. At each variable position where the two γ-genes of a catarrhine taxon shared a sequence identity which the parsimony solution depicted as derived rather than primitive, the result was taken as evidence for a putative conversion encompassing that position in the γ1- and γ2-gene pair. Conversely, absence of sequence identity could be taken as evidence against conversion for that gene pair at that position, depending on the parsimonious branching arrangement. On synthesizing the results of all the separate parsimonious solutions, we identified gene conversion regions by minimizing the number of conversion events needed to account for the overall most parsimonious solutions.
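The position-by-position scoring described above can be illustrated with a deliberately simplified sketch. It does not reproduce the full parsimony reconstruction; instead it assumes, purely for illustration, that the bases found in outgroup sequences stand in for the primitive state. The function names, the mark labels "dot" and "square", and the toy sequences are hypothetical.

```python
from typing import List, Sequence, Tuple


def classify_position(g1: str, g2: str, outgroup: Sequence[str]) -> str:
    """Score one aligned position of a paired gamma-1/gamma-2 comparison.

    Simplifying assumption (not the authors' full parsimony method):
    a base present in any outgroup sequence is treated as primitive.
    """
    if g1 != g2:
        # Paralogues differ: counted as evidence against conversion.
        return "square"
    if g1 in outgroup:
        # Shared but apparently primitive: says nothing about conversion.
        return "uninformative"
    # Shared and apparently derived: counted as evidence for conversion.
    return "dot"


def classify_pair(gamma1: str, gamma2: str,
                  outgroups: Sequence[str]) -> List[Tuple[int, str]]:
    """Return (alignment index, mark) for every informative position."""
    if any(len(s) != len(gamma1) for s in (gamma2, *outgroups)):
        raise ValueError("all sequences must be aligned to the same length")
    marks = []
    for i, (a, b) in enumerate(zip(gamma1, gamma2)):
        mark = classify_position(a, b, [s[i] for s in outgroups])
        if mark != "uninformative":
            marks.append((i, mark))
    return marks


# Toy alignment (hypothetical data, 0-based indices).
toy_marks = classify_pair("ACGTA", "ACTTC", ["AGGTA"])
```

In this toy case only index 1 carries a shared base absent from the outgroup, so it is the single position scored as support for conversion.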

RESULTS AND DISCUSSION
Isolation, Nucleotide Sequence, and Derived Amino Acid Sequence-The λ and plasmid clones used to isolate and sequence the rhesus γ-globin genes are shown in Fig. 1. Rhesus γ1- and γ2-genes are located on 8.8- and 2.6-kb EcoRI fragments, respectively, an organization similar to that found in other higher primates (Barrie et al., 1981; Zimmer, 1981; Slightom et al., 1980; Scott et al., 1984; Slightom et al., 1985, 1987). The λ clones shown in Fig. 1 do not physically link the two rhesus γ-genes, but λ clone MmuCh32-4 does contain a fragment that links λ clones MmuCh30-13.1 and MmuCh32-14.7. The nucleotide sequence of this fragment is orthologous (data not shown) to the nucleotide sequence of the linkage region between the human Gγ- and Aγ-genes (Shen et al., 1981). In addition, our sequence analysis of the 3'-flanking DNAs (Fig. 2) clearly indicates that these two rhesus γ-genes are not alleles. The organization of the rhesus γ-genes presented here is similar to that found in a closely related species, the yellow baboon (Papio cynocephalus) (Barrie et al., 1981; Zimmer, 1981). Fig. 1 shows the strategy used to obtain the nucleotide sequence of the rhesus γ1- and γ2-genes; in Fig. 2, γ-gene sequences from rhesus and other species are aligned against one another.
FIG. 1. The top line shows the organization of the human Gγ-, Aγ-, ψη-, and δ-globin genes. Raked bars denote the location of gene transcriptional regions. The locations of the various restriction enzyme sites found in the human β-globin cluster are in accordance with published findings (Shen et al., 1981; Collins and Weissman, 1984; Miyamoto et al., 1987). Insert regions from the rhesus recombinant λ clones MmuCh30-13.1, MmuCh32-14.5, and MmuCh32-4 are positioned below the corresponding regions of the human gene cluster. The plasmid subclones that contain the rhesus globin genes are shown below the corresponding region on the λ clones: γ1, pMmu13.1-R8.8; γ2, pMmu14.5-R2.6; and ψη, pMmu14.5-R10.0. Horizontal arrows shown below the rhesus λ and plasmid clones denote the restriction enzyme sites which were 5'- and 3'-end-labeled with 32P (to allow both strands to be sequenced) and the direction and distance sequenced from the end-labeled site. Restriction enzymes are denoted as follows: Ac, AccI; Ap, ApaI; Ba, BamHI; Bg, BglII; H, HindIII; R, EcoRI.
Amino acid sequence analysis of the γ chains isolated from fetal hemoglobin of a rhesus monkey revealed no amino acid heterogeneity at any of the 146 chain positions (Mahoney and Nute, 1980). However, the derived amino acid sequences of the two γ-genes (Fig. 2) differ due to the nucleotide replacement substitution at position 1487 (C/T), which results in an amino acid change in residue 117 (His/Tyr). The amino acid sequence determined by Mahoney and Nute (1980) indicates that a γ2 allele which encodes His at this position apparently also exists in the rhesus monkey population; however, the predominant allele remains to be determined. The γ chains encoded by rhesus and human differ at two amino acid positions, 77 and 104, where rhesus shares amino acid residues Asn and Arg with orangutan and spider monkey (see Fig. 2 and Table I). Both rhesus γ-genes encode Gly at position 136 (consistent with the findings of Mahoney and Nute (1980)), as do both γ-genes of orangutan and the single γ-gene of spider monkey (Table I).
Conservation of γ-Gene Structure and Regulatory Elements-From the 14 aligned nucleotide sequences of γ-genes, obtained from eight different primate species (Fig. 2), it is apparent that all share a high degree of nucleotide sequence conservation of DNA elements known to be important for γ-gene expression and mRNA processing. For example, the transcriptional promoter elements, CCAAT and AATAAA, are located within stretches of sequence which are virtually invariant. The imperfect direct repeats that resulted in the placement of two consecutive CCAAT elements (Efstratiadis et al., 1980) are present in all of these genes except that of brown lemur, which supports the suggestion of an ancient origin for the duplicated CCAAT-containing sequence (Efstratiadis et al., 1980; Harris et al., 1986). The position and sequence of the 3'-regulatory poly(A) signal (Proudfoot and Brownlee, 1976) are also invariant. All of these genes contain GU-rich and U-rich sequence elements (positions 1669 to 1679 and 1686 to 1695, respectively) which may be essential for mRNA 3'-end formation (Gil and Proudfoot, 1987).

FIG. 2. Nucleotide sequence alignment of primate γ-globin genes. From a single human the Gγ- and Aγ-genes from chromosome "A" are denoted Hsa aG and Hsa aA, respectively, and the Aγ-gene from chromosome "B" is denoted Hsa bA (Shen et al., 1981); gorilla, Ggo (Scott et al., 1984); chimpanzee, Ptr (Slightom et al., 1985); orangutan, Ppy; spider monkey, Age (Giebel et al., 1985); and brown lemur, Lfu, and dwarf lemur, Cme (Harris et al., 1986). The numbering system used was set by the overall alignment of the primate γ-gene sequences; asterisks indicate gaps which were used to increase sequence identities among these genes. For any position where one sequence differs from another, the nucleotide (or asterisk) for that position is given for each gene sequence. Note nucleotide sequence elements that may have biological importance: promoter elements CCAAT and AATAAA and poly(A) addition signals are overlined. The γ-gene amino acid sequences are shown below the lower counting line, and amino acid differences are printed below the appropriate codon. The translation initiator codon is the first Met, and the terminator codon is designated TER. Vertical arrows show the location of exon-intron boundaries, all of which conform to the GT/AG rule (Breathnach et al., 1978). Horizontal arrows indicate the sizes expected for the 5'- and 3'-untranslated regions; however, note that the length of both these regions has only been substantiated for the human genes (see Slightom et al., 1980) and the 3'-region of the spider monkey gene (Giebel et al., 1985).

TABLE I. Amino acid replacements encoded in primate γ-globin gene nucleotide sequences. Entries are given as amino acid residue number (nucleotide position).
only one difference is observed in the lemur sequences, a transversion (G to T/C) in position 7 of the donor consensus sequence ((C or A)-A-G/g-t-(a or g)-a-g-t; Mount, 1982). Similarly, variation in intron 1 acceptor sequences (from the consensus sequence (c or t)n-n-(c or t)-a-g/G) is found only in the lemur sequences, which have a transition just after the splice junction (position 268 in Fig. 2, A to G).
In contrast, intron 2 sequences show a considerable amount of variation in both length and shared nucleotide sequence identity (Fig. 2). The shortest intron 2 sequence (849 bp) is found in dwarf lemur (Harris et al., 1986) and the longest in a gorilla γ-gene (906 bp) (Scott et al., 1984); rhesus intron 2 sequences contain 895 and 887 bp for the γ1- and γ2-genes, respectively. However, intron 2 donor and acceptor splice sequences show few substitutions, and the observed substitutions retain nucleotide identity with the consensus sequences, e.g. the A or G nucleotides in positions 2 and 6 of the donor sequence (Fig. 2). The intron 2 acceptor sequences are invariant in and near the splice junction; their only differences, most of which are transitions, are found in the pyrimidine-rich region and do not alter the nucleotide composition of the intron acceptor sequences. These comparative observations are consistent with the intron splice sequence requirements found by van , in which nucleotide sequences adjacent to the donor (3' 6 bp) and acceptor (5' 20 bp) were found to be important components of the splicing mechanism and thus are strictly conserved. Despite the nearly 20% divergence among the primate γ-gene intron 2 sequences presented in Fig. 2, the mechanism for splicing out these introns probably remains unaltered, following the excision order described by Lang et al. (1985). Mutations in the essential splice donor and/or acceptor nucleotides, or mutations which create new internal donor or acceptor splice sites, can have devastating consequences, such as those resulting in β-thalassemia (Orkin and Kazazian, 1984; Dobkin and Bank, 1985).
Extensive Gene Conversion between Rhesus γ-Globin Genes-Fig. 3 presents the conversion regions for the duplicated catarrhine γ-genes which are suggested by the position-by-position parsimony analysis over the 1994 aligned nucleotide positions of the 14 primate γ-genes shown in Fig. 2. A "dot" at a variable site indicates that the positional paralogues of γ1- and γ2-genes of the same species are closer genealogically to each other than to positional orthologues, therefore supporting a hypothesized conversion. In contrast, an open "square" indicates the reverse and supports the absence of conversion. Eight rows of these intraspecies γ1- to γ2-gene comparisons are presented; two of the rows are for the human genes (consisting of the sequenced γ1- or Gγ-gene from chromosome A, the γ2- or Aγ-gene also from chromosome A, and the Aγ-gene from a second chromosome, B). Four rows are for the paired γ1- and γ2-genes of chimpanzee, gorilla, orangutan, and rhesus monkey. The seventh row is for the deduced ancestral hominine (human, chimpanzee, and gorilla; HCG in Fig. 3) paired genes, and the eighth row is for the deduced ancestral hominid (hominine and orangutan; HCGO in Fig. 3) paired genes. Each of the eight rows shows a highly nonrandom distribution of dots and squares which identify the same three distinct regions suggested by our earlier analysis. The distribution of dots indicates that the γ-gene 5'-regions (the approximately 650 bp extending from the 5'-flanking region through exon 1, intron 1, and exon 2; positions -153 to 495 in Fig. 2) are prone to gene conversions (see region 1, Fig. 3). In contrast, the 3'-regions (3'-untranslated and flanking DNAs, labeled region 3 in Fig. 3; positions 1582 to 1839 in Fig. 2) are not prone to conversions. Between these 5'- and 3'-regions is located region 2 (which includes intron 2; Fig. 2), which shows a mosaic arrangement of dots and squares in rows representing paired hominid genes but essentially only dots in the row for the paired rhesus γ-genes.
FIG. 3. These regions were derived using the single γ-gene sequence alignment of spider monkey, brown lemur, and dwarf lemur (as shown in Fig. 2) to identify where intraspecific similarity of the duplicated γ1- and γ2-genes (paralogous positions) is greater than the interspecific similarity of γ1 or γ2 (orthologous positions). The open boxes indicate positions where interspecific similarity between γ1 (or γ2)-genes is greater than intraspecific similarity between γ1- and γ2-genes. Two consecutive dots indicate greater intraspecific similarity and suggest a conversion event, labeled C0 to C14. These suggested conversion regions are shown above the lines containing the position-by-position analysis, represented as boxes or dots. Below the map of the γ1- and γ2-genes are indicated the three γ-gene regions which show different evolutionary histories, and using regional parsimony analysis the overall phylogenetic order for these regions has been constructed. The complex nature of region 2 necessitated breaking up sequences into small subregions that appear to reflect patchy conversion regions.
The distribution of dots can be accounted for by the minimum number of hypothesized gene conversions, if we consider two or more consecutive dots as part of the same conversion region. Each hypothesized gene conversion region is allowed to spread until it meets the first of two consecutive squares or enters a sequence region of frequent mismatches between positional paralogues. According to this procedure the entire transcriptional region of the rhesus γ-genes has been involved in a conversion, labeled C14 in Fig. 3. Of the intraspecific gene conversions identified so far, C14 is the largest and the only one postulated for extant duplicated γ-genes which involves the complete transcriptional region. In the 3'-direction the rhesus γ-gene conversion appears to terminate near position 1696 (Fig. 2), which is 500 bp downstream from the 3'-end of the (TG)n element (position 1190 in Fig. 2). Downstream, into region 3, the two rhesus genes show a divergence of 18.9%, which is somewhat higher than the average level expected (about 14 to 15%) for paralogous γ sequences that diverged about 25 to 35 million years ago (Shen et al., 1981; Slightom et al., 1985, 1987). This higher level of divergence in this short stretch of rhesus monkey DNA suggests that the rate of neutral DNA evolution is higher in cercopithecoids than in hominoids (Koop et al., 1986a; Miyamoto et al., 1987). Additional nucleotide sequencing of extensive stretches of nonconverted regions of the rhesus γ-genes is needed to test the validity of the higher neutral rate suggested by this comparison.
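The rule for delimiting conversion regions can be sketched as a short routine. This is a minimal illustration under stated assumptions: each informative site is given as a (position, mark) pair with hypothetical mark labels "dot" and "square", a run of two or more squares acts as a barrier, and the second stopping criterion (a sequence region of frequent mismatches between positional paralogues) is not modeled. The function names and example positions are illustrative only.

```python
from typing import List, Tuple

Mark = Tuple[int, str]  # (alignment position, "dot" or "square")


def _region_from_segment(segment: List[Mark]) -> List[Tuple[int, int]]:
    """Report one region if the segment holds two consecutive dots."""
    seeded = any(a[1] == "dot" and b[1] == "dot"
                 for a, b in zip(segment, segment[1:]))
    if not seeded:
        return []
    dots = [pos for pos, mark in segment if mark == "dot"]
    # The region spreads from the first to the last dot of the segment,
    # absorbing any isolated single squares in between.
    return [(dots[0], dots[-1])]


def conversion_regions(marks: List[Mark]) -> List[Tuple[int, int]]:
    """Split marks at runs of >= 2 squares and report seeded regions."""
    regions: List[Tuple[int, int]] = []
    segment: List[Mark] = []
    i = 0
    while i < len(marks):
        is_barrier = (marks[i][1] == "square"
                      and i + 1 < len(marks)
                      and marks[i + 1][1] == "square")
        if is_barrier:
            regions.extend(_region_from_segment(segment))
            segment = []
            # Skip the whole run of consecutive squares.
            while i < len(marks) and marks[i][1] == "square":
                i += 1
        else:
            segment.append(marks[i])
            i += 1
    regions.extend(_region_from_segment(segment))
    return regions


# Hypothetical positions, chosen only to exercise the rule.
example = [(10, "dot"), (20, "dot"), (30, "square"), (40, "dot"),
           (50, "square"), (60, "square"),
           (70, "dot"), (80, "square"), (90, "dot")]
found = conversion_regions(example)
```

In the example, the first four sites form one region despite the isolated square at position 30, whereas the sites after the two-square barrier never contain two consecutive dots and so yield no region.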
In the 5'-direction rhesus conversion C14 extends beyond the γ-gene sequences aligned in Fig. 2. Alignment of the presently available 200 bp of additional 5'-flanking DNA sequence from the two rhesus γ-genes (Fig. 4) shows a continuation of sequence identity, with the first difference being located at position -157 and a second difference being located 56 bp farther upstream, at position -214. This extensive nucleotide sequence identity between the rhesus γ1- and γ2-5'-flanking DNAs extends up to position -282, after which three closely spaced sequence differences are found (Fig. 4). The original γ-gene duplication encompasses about 5 kb (Shen et al., 1981), and its 5'-boundary is located about 1200 bp 5' farther than the sequences shown in Fig. 4. The degree of divergence between the 5'-most 100 bp of these rhesus γ-gene regions presented in Fig. 4 is only about 4%, much below the expected level of 14 to 15% for nonconverted γ-gene regions (Shen et al., 1981). Thus, position -281 is unlikely to be the termination point for all conversion events which extended into these 5'-flanking regions, but it may represent the 5'-termination point for the most recent conversion. If we assume that the 5'- and 3'-boundaries of converted sequence in

FIG. 4. Possible location of the 5'-termination point of C14. Additional nucleotide sequences of the 5'-flanking region of the rhesus γ1- and γ2-genes have been obtained (Fig. 1) and compared. The compared sequences begin at the capped nucleotide, also shown in Fig. 2, and extend 200 bp farther in the 5' direction than the sequences shown in Fig. 2. Gene regulatory elements, CCAAT and AATAAA, are underlined, and the horizontal arrows denote a region within the imperfect direct repeat (Efstratiadis et al., 1980) which, in the case of these rhesus genes, is a perfect direct repeat.

Is C14 the Result of a Single or Multiple Conversion Events?-If the converted region of the rhesus γ-genes labeled C14 arose from a single conversion event that included the (TG)n elements, no significant variation in the degree of divergence over this region would be expected. However, if the divergence pattern within C14 shows disparity among regional comparisons, C14 may actually be the result of more than one conversion event. Uniformity of divergence within C14 can be tested by analyzing divergence within specific gene regions; we selected genic regions with natural boundaries (exons, introns, untranslated, and flanking) and subdivided intron 2 by using the 5'- and 3'-ends of the (TG)n element as arbitrary boundaries. Both γ-genes are active; thus, coding sequences were examined separately from noncoding sequences because exons are subject to selective pressures. The divergence values are as follows: 5'-flanking (3/281 = 1.0%); 5'-untranslated + exon and intron 1 + exon 2 (0/491 = 0%; these regions were grouped because they show no divergence); 5'-end of intron 2 to 5'-end of (TG)n element (4/578 = 0.7%); 3'-end of (TG)n element to 3'-end of intron 2 (3/248 = 1.2%); exon 3 (1/136 = 0.7%); and 3'-untranslated region (3/86 = 3.5%). This analysis shows disparity among the various regions of C14; in intron 2 the degree of divergence 3' and 5' of the (TG)n element differs by almost a factor of 2, which supports the possibility that the conversion region designated C14 may be the result of more than one conversion event. The most recently converted part of C14 appears to be located 5' of the (TG)n elements, a possibility supported by the fact that the two rhesus (TG)n elements are identical in their 5'-regions but differ in length (4 bp) and composition in their 3'-regions (Fig. 2). Two possible histories of conversion events could account for the disparity in the divergence of the regions of C14.
Fig. 5. To account for divergence disparity in different regions of C14, we present two possible histories of conversion events. Frame A shows two conversion events which start at the boundary of the (TG)n element (shown as XX) and spread in opposite directions. The 3'-conversion C14a occurred prior to the 5'-conversion C14b. Frame B represents conversion C14a as a more ancient conversion than the later conversion C14b, which is initiated at the 5'-boundary of the (TG)n element and spreads only in the 5'-direction. In both schemes a third conversion, C14c, is suggested to have occurred in the region which shows no divergence.
Frame B depicts two successive events, whereby the first conversion would cover the same region identified for C14 (labeled C14a in frame B) and is followed by a later conversion (labeled C14b in frame B) that is initiated at the 5'-end of the (TG)n element but extends only in the 5'-direction. Both conversion histories account for the disparity in divergence found 5' and 3' of the (TG)n element; to account for the part of C14 which shows no divergence, a third conversion (labeled C14c in both frames A and B) must be added. Thus, even though conversion C14 appears to be straightforward, it may have been the result of a complex history of conversion events. Further delineation of the actual history of conversion events will require additional γ-gene nucleotide sequence information from other rhesus individuals and closely related catarrhine species.
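The regional divergence values quoted above are simple ratios of differences to aligned positions. As a minimal sketch (the function and variable names are illustrative and not part of the original analysis), they can be recomputed from the counts given in the text. Because the percentages are recalculated from the raw counts, a value may differ in the last digit from the rounded figures quoted above.

```python
# Differences / aligned positions between the rhesus gamma-1 and gamma-2
# genes for each region of C14, copied from the counts quoted in the text.
REGIONS = [
    ("5'-flanking", 3, 281),
    ("5'-untranslated + exon and intron 1 + exon 2", 0, 491),
    ("intron 2, 5' of (TG)n", 4, 578),
    ("intron 2, 3' of (TG)n", 3, 248),
    ("exon 3", 1, 136),
    ("3'-untranslated", 3, 86),
]


def percent_divergence(differences, length):
    """Percent of aligned positions at which the two genes differ."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * differences / length


def pooled_divergence(regions):
    """Divergence over all regions combined (total differences / total length)."""
    total_diff = sum(d for _, d, _ in regions)
    total_len = sum(n for _, _, n in regions)
    return percent_divergence(total_diff, total_len)


def intron2_ratio(regions):
    """Ratio of 3' to 5' divergence around the (TG)n element in intron 2."""
    by_name = {name: (d, n) for name, d, n in regions}
    five_prime = percent_divergence(*by_name["intron 2, 5' of (TG)n"])
    three_prime = percent_divergence(*by_name["intron 2, 3' of (TG)n"])
    return three_prime / five_prime


if __name__ == "__main__":
    for name, d, n in REGIONS:
        print(f"{name:45s} {d}/{n} = {percent_divergence(d, n):.2f}%")
    print(f"pooled: {pooled_divergence(REGIONS):.2f}%")
    print(f"intron 2, 3'/5' ratio: {intron2_ratio(REGIONS):.2f}")
```

With so few differences per region, these ratios are descriptive only; the sketch does not attempt a statistical test of heterogeneity.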
Using the above divergence data and the regional molecular clock derived by Koop et al. (1986a), we estimate that the events responsible for converting the 5'- and 3'-regions of C14 occurred about 3 and 6 million years ago, respectively (Fig. 5). The deduced time for the 5'-region of C14 is almost as recent as the time proposed for C4 of humans, which may have occurred within the last million years (Shen et al., 1981). The age of the latest 3'-region conversion event is also supported by the replacement substitution at position 1487 (Fig. 2) of the rhesus γ1- and γ2-genes. At the replacement rate of 1% per 10 million years derived by Efstratiadis et al. (1980), one replacement site substitution in exon 3 is expected to occur about every 5 million years. The 5'-part of C14, which includes the 5'-untranslated region + exon and intron 1 + exon 2 + 293 bp of intron 2, shows no divergence and thus might have been involved in a very recent conversion, as depicted in Fig. 5, frames A and B. The fact that the corresponding regions from the paralogous γ-gene pairs of Homo, Pan, Gorilla, and Pongo have also shown lower than expected levels of divergence suggests that this region may be prone to undergo conversions (Fig. 3; Slightom et al., 1987). Furthermore, these 5'-conversions may not require initiation by the (TG)n element.
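The dating step above amounts to dividing an observed divergence by a clock rate. The sketch below illustrates only that arithmetic. The rate used is a hypothetical placeholder, chosen so that the intron 2 divergences land near the ages quoted in the text; it is not the regional clock of Koop et al. (1986a), whose value is not given in this section.

```python
# ASSUMPTION: an illustrative clock rate, in percent divergence between the
# two paralogs per million years. It is NOT the value of Koop et al. (1986a).
ASSUMED_RATE_PERCENT_PER_MYR = 0.2


def conversion_age_myr(percent_divergence, rate_percent_per_myr):
    """Time since the last conversion, assuming divergence accumulates
    linearly at the given rate after the two genes were made identical."""
    if rate_percent_per_myr <= 0:
        raise ValueError("clock rate must be positive")
    return percent_divergence / rate_percent_per_myr


if __name__ == "__main__":
    # Divergences 5' and 3' of the (TG)n element in intron 2, from the text.
    for label, divergence in (("5' of (TG)n", 0.7), ("3' of (TG)n", 1.2)):
        age = conversion_age_myr(divergence, ASSUMED_RATE_PERCENT_PER_MYR)
        print(f"{label}: about {age:.1f} million years")
```

Whatever rate is used, the ratio of the two ages equals the ratio of the two divergences, so the relative order of the 5'- and 3'-events does not depend on the assumed rate.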
Evolution of the (TG)n Element-The simple sequence repeat, the (TG)n element, is not present in the brown and dwarf lemur genes nor in the rabbit γ-globin gene (Hardison, 1981), but is present in spider monkey and catarrhine γ-genes. Thus, it emerged in the lineage to basal Anthropoidea after separation of the lineage to Lemuriformes. This (TG)n element in a common ancestor of platyrrhines and catarrhines might have emerged from random substitutions which generated a stretch of alternating pyrimidine/purine nucleotides, or it may have emerged suddenly from an ancient transposition event which left a proto-(TG)n element upon its departure. Evidence for the latter possibility comes from the observation that the (TG)n elements are flanked by short repeats. After this proto-(TG)n element appeared, its development probably continued as a result of mispairings during exchanges between homologous sequences in the single γ-locus of the early simians. After γ-gene duplication in the stem of catarrhines (Barrie et al., 1981; Shen et al., 1981), the developed (TG)n element could also initiate exchanges between the paralogous γ1- and γ2-loci, thus reducing divergence between these genes and increasing their chance of evolving in concert.
The (TG)n elements present in the rhesus γ-genes are among the largest found thus far, with n = 24 and n = 22 for the γ1- and γ2-genes, respectively (Fig. 2). Moreover, the DNA sequences flanking these (TG)n elements share a high degree of identity. These rhesus (TG)n elements are considerably larger than those found in the orangutan genes, which have n values of 11 and 13. Our analysis of the orangutan γ-genes suggests that this pair of genes has experienced a low number of gene conversions, none of which appear to have been initiated by the (TG)n elements. This low level of conversions between the orangutan γ-genes correlates with their short (TG)n elements and the lack of shared identity between nucleotide sequences adjacent to the (TG)n elements. In contrast, the γ-genes of rhesus appear to have experienced a recent and extensive gene conversion. The long length of the rhesus (TG)n elements and the high degree of identity shared between sequences adjacent to the (TG)n elements (especially 5' of the element) support our hypothesis that these elements play an important role in some of the γ-gene conversions. This hypothesis is also supported by analysis of the role that (TG)n elements play in recombinations between SV40 molecules (Stringer, 1985) and by the reduced level of conversion in Ascobolus immersus as a result of mispairing between exchanging DNA strands (Nicolas and Rossignol, 1983). The occurrence of a recent and extensive conversion between the rhesus γ-genes obscures any evidence of more ancient conversions in the ancestry of these genes; thus, it is impossible to determine if the ancestral gene pairs were frequently involved in conversion events. However, the fact that the rhesus γ-genes not only contain the elements favoring the initiation and extension of conversion events but also have been involved in a recent conversion event suggests that these genes may frequently have engaged in gene conversions.
Thus, an important extension to our hypothesis is that if the (TG)n element is long and its flanking DNAs match, then frequent conversions can be expected. In other words, the lack of divergence between a gene pair indicates their involvement in frequent conversion events, which is consistent with the gene conversion model proposed by Walsh (1987).
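The repeat numbers compared above (n = 24 and 22 in rhesus, 11 and 13 in orangutan) are counts of consecutive TG dinucleotides. A minimal sketch of how such a count could be taken from a sequence string follows; the function name and the toy sequences are hypothetical, and the toy sequences are synthetic stand-ins rather than the rhesus sequences of Fig. 2, whose (TG)n elements also differ in composition in their 3'-regions.

```python
import re


def longest_tg_run(sequence):
    """Return (start, n) for the longest uninterrupted (TG)n run.

    start is the 0-based index of the run and n the number of TG repeats;
    (-1, 0) is returned if the sequence contains no TG dinucleotide.
    """
    best_start, best_n = -1, 0
    for match in re.finditer(r"(?:TG)+", sequence.upper()):
        n = len(match.group(0)) // 2
        if n > best_n:
            best_start, best_n = match.start(), n
    return best_start, best_n


if __name__ == "__main__":
    # Synthetic examples only: flanking bases are arbitrary.
    toy_gamma1 = "ACGT" + "TG" * 24 + "AAC"
    toy_gamma2 = "CC" + "TG" * 22 + "CA"
    print(longest_tg_run(toy_gamma1))
    print(longest_tg_run(toy_gamma2))
```

A run interrupted by a single substitution is reported as two shorter runs, so imperfect elements would need a more tolerant definition than the one assumed here.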
Evolution and Polarity of γ-Gene Conversion-With the addition of the rhesus sequence to the γ-gene alignment (Fig. 2), our parsimony analyses continue to support the previously identified intraspecific γ-gene conversion regions for the four hominid species: human conversions C2, C3, and C4; chimpanzee conversions C5, C6, and C7; gorilla conversions C8, C9, and C10; and orangutan conversions C11, C12, and C13. Also, inclusion of the paired rhesus γ-genes in the parsimony analyses supports the somewhat patchy ancestral hominine conversion C1 and provides evidence for an ancestral hominid conversion, C0, which continuously spans (except for one short interruption) all of regions 1 and 2 (starting in the 5'-flanking DNA region and terminating in the 3'-untranslated region, Fig. 3). As can be seen in the evolutionary trees constructed by the parsimony method for regions 1, 2, and 3 of the aligned γ-gene sequences, only region 3 (3'-untranslated and flanking DNAs) shows complete absence of gene conversions.
Although identification of the converted region of the rhesus γ-gene pair was relatively straightforward, assignment of the γ-gene donor and acceptor sequences within the subregions is difficult. Not only does C14 span the complete transcriptional region of the paired rhesus genes, but so does the ancestral conversion C0 (the gene conversion hypothesized to have occurred in the hominid stem), except for one location. A three-base position gap occurs at nucleotide positions 503 to 506 in the γ2-gene of rhesus monkey, orangutan, and gorilla and in the γ1-gene of rhesus monkey, but not in any of the other γ-genes (see Fig. 3). If this location is excluded from the hypothesized conversion C0 in the hominid stem (i.e. C0 is considered to be a patchy conversion), this gap can be explained as resulting from a three-base deletion in the nascent γ2-locus which arose in the catarrhine stem after the γ-gene duplication. Much later in the lineage to rhesus monkey, γ2-donor sequence containing this deletion replaced, by a conversion event, the homologous γ1-acceptor sequence. This three-base deletion indicates that, for the rhesus γ-genes, in the region of C14 which is 5' of the (TG)n element, γ2-gene sequence was superimposed onto the γ1-gene. The 5'-region conversions in human (C3 and C4) and chimpanzee (C7) genes apparently involved γ1-sequences superimposed onto γ2-genes, which would account for the absence of the 503- to 506-position deletion in human and chimpanzee γ-genes. The 5'-conversion regions in gorilla (C8) and orangutan (C11) genes end before reaching position 503; thus, the deletion at positions 503 to 506 in the gorilla and orangutan γ2-genes would be expected to remain.
This postulated occurrence of a patchy-type conversion pattern in the hominid lineages has allowed us to deduce that in these lineages γ1-donor sequences were superimposed onto γ2-acceptor sequences in most conversions; of the hominid intraspecific conversions, only chimpanzee conversion C5 shows the opposite polarity (Slightom et al., 1985). From this lengthy argument we conclude that the three-base deletion at positions 503 to 506 provides the only evidence for the polarity of at least part of C14, suggesting that, like C5 in chimpanzee, the rhesus γ2-donor sequence was superimposed onto the γ1-acceptor gene sequence.
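The polarity argument above can be restated as a small decision rule on whether each gene of a pair carries the gap at positions 503 to 506. The sketch below encodes only the presence and absence pattern reported in the text and assumes, as the text does, that the deletion arose once in the nascent γ2-locus; the names and the wording of the outcomes are illustrative.

```python
# Presence (True) or absence (False) of the three-base gap at positions
# 503 to 506, as reported in the text for each gene of each species.
GAP_503_506 = {
    "human": {"gamma1": False, "gamma2": False},
    "chimpanzee": {"gamma1": False, "gamma2": False},
    "gorilla": {"gamma1": False, "gamma2": True},
    "orangutan": {"gamma1": False, "gamma2": True},
    "rhesus": {"gamma1": True, "gamma2": True},
}


def infer_polarity(gamma1_has_gap, gamma2_has_gap):
    """Infer conversion polarity at the gap site.

    ASSUMPTION: the deletion originated in the gamma-2 locus after the
    gene duplication, so gamma-1 can only acquire it by conversion.
    """
    if gamma1_has_gap and gamma2_has_gap:
        return "gamma2 donor -> gamma1 acceptor"   # deletion copied into gamma-1
    if not gamma1_has_gap and not gamma2_has_gap:
        return "gamma1 donor -> gamma2 acceptor"   # deletion erased from gamma-2
    if gamma2_has_gap:
        return "site not converted"                # ancestral state retained
    return "not explained by the assumed history"  # gap in gamma-1 only


def polarity_by_species(gap_table):
    """Apply infer_polarity to every species in the table."""
    return {
        species: infer_polarity(genes["gamma1"], genes["gamma2"])
        for species, genes in gap_table.items()
    }


if __name__ == "__main__":
    for species, outcome in polarity_by_species(GAP_503_506).items():
        print(f"{species:11s} {outcome}")
```

The rule speaks only to the conversions that cover this one site, which is why the text describes the deletion as evidence for the polarity of at least part of C14 rather than all of it.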
Concluding Remarks-The sequence of the paired γ-globin genes from rhesus has provided us with the most striking example so far of γ-gene conversion. Conversion C14 is even more extensive than conversion C4, which occurred between the human γ-gene pair on chromosome A and involved about 1500 bp (Shen et al., 1981), because it extended across the complete length of the rhesus γ-gene transcript and into the 5'-flanking DNAs. Whether such extensive gene conversions are common to γ-gene pairs of other rhesus individuals or other cercopithecoid species remains to be determined. Further investigation of γ-genes from cercopithecoid lineages could be quite rewarding in view of the fact that this superfamily contains many more living species than does Hominoidea. Thus, it should be possible to determine if extensive species-specific conversions are common in cercopithecoids or if their conversion patterns will be similar to the patchy type found in hominoids. Walsh (1987) has recently presented a mathematical gene conversion model in which conversion events are considered to be frequent if members of a gene pair are found to be identical. Identity of the gene pair members is maintained by frequent conversions until mutations occur (base mutations, insertions, or deletions). As the members of a gene pair accumulate such mutations, conversion frequencies are reduced until the individual gene members "escape" from the conversion process responsible for their concerted evolution (Walsh, 1987). The γ-gene pairs of the catarrhine primates studied so far appear to be in different stages of the gene conversion process; some gene pairs are accumulating mutations in such important elements as the (TG)n stretch and its flanking sequences and thus may be escaping from the conversion process.
The human and rhesus γ-gene pairs each appear to be continually involved in conversions and thus are not accumulating the mutations necessary to escape the conversion process, whereas the orangutan and gorilla γ-gene pairs each appear to be escaping the conversion process. Evolutionary divergence between the two nonallelic γ-genes is subject to selective pressures because at least one γ-globin gene must remain functional. The other γ-gene member must also remain functional if expressed, but if silenced it could become another β-type globin pseudogene. We previously suggested that the 5'-γ-gene, which in hominoids is expressed at three-fold higher levels than the 3'-γ-gene, is accordingly subject to more stringent selective pressure. If the γ-gene conversion process were to become completely blocked in a species such as gorilla or orangutan, the γ2-gene in that species would have the likely fate of degenerating into a pseudogene. Alternatively, a much rarer fate would be for the γ2-gene to acquire a new function advantageous to the species.