Boundaries of gene conversion within the duplicated human alpha-globin genes. Concerted evolution by segmental recombination.

The human adult alpha-globin genes, alpha 1 and alpha 2, are embedded in homologous duplication units, each of which spans approximately 4 kilobase pairs of chromosomal DNA. Previous studies established that the 3'-ends of the duplication units are located adjacent to the polyadenylation sites of the two genes. We have now determined the 5'-boundary of the homology which includes both the structural genes and their upstream sequences. The 5'-flanking regions of alpha 1 and alpha 2 are perfectly homologous for 868 base pairs, with the exception of two single nucleotide differences. This is in contrast to the considerable divergence of the 3'-ends of these loci. Since the alpha-genes undergo concerted evolution by homologous unequal crossing over and/or gene conversion, the presence of adjacent regions with different degrees of homology indicates that this process is segmental. Furthermore, we have determined that an alpha-thal-2 gene, a variant alpha-globin allele resulting from unequal crossing over between normal alpha 1 and alpha 2 genes, has a mosaic arrangement of parental sequences. This patchwork structure may have arisen from a single recombination event which was limited in both the 5' and 3' directions by flanking non-homologies and in which mismatch repair occurred in a heteroduplex intermediate. Unequal crossing over and gene conversion of this type may effect the segmental concerted evolution of the human alpha-globin locus. Restriction mapping of additional alpha-thal-2 genes and of the reciprocal triplicated alpha-gene complex was consistent with this hypothesis.

* This work was supported by grants from the National Institutes of Health and from the National Foundation-March of Dimes. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The genes encoding the α-like chains of human hemoglobins form a small multigene family on the short arm of chromosome 16 (1-6). The α-gene cluster spans approximately 30 kb and includes a single functional embryonic locus (ζ), two functional adult genes (α1 and α2), and two pseudogenes. Both adult α-loci are expressed (9) and direct the synthesis of identical polypeptides (10). The DNA sequences of the α1 and α2 genes confirm these conclusions (11-13) and have permitted quantitation of α1- and α2-specific RNA transcripts in human erythroid cells (14, 15). In contrast to the amino acid sequence identity of the α-chains encoded by the nonallelic α-loci within a species, significant divergence occurs in α-globin peptide sequences between species, including closely related primates (16). The maintenance of such sequence homology among nonallelic members of a multigene family within a single species has been termed coincidental evolution (17) and, more recently, concerted evolution (18). Several mechanisms have been proposed to account for this phenomenon, including gene conversion and unequal crossing over (17-24). It is believed that both of these processes involve the exchange of DNA strands between homologous parental molecules (25, 26), underscoring the importance of sequence homology in mediating concerted evolution. In this context, it is of interest that electron microscopic heteroduplex analysis of cloned fragments from the human α-globin cluster revealed that the adult genes, each of which spans only 850 bp, are embedded in homologous duplication units of approximately 4 kb (3).
This large stretch of sequence homology may mediate unequal crossing over and gene conversion, the repeated occurrence of which would maintain the evolutionary homogeneity of DNA sequences which otherwise would diverge in the absence of selective pressures (17-24).
The identification of individuals possessing one (27-29) or three (30-33) adult α-genes on a single chromosome instead of the normal two genes provides strong genetic evidence for the occurrence of unequal crossing over in the human α-globin complex. The alignment of the restriction maps of the α-loci residing on the one-, two-, and three-gene chromosomes reinforces this hypothesis. Additional evidence that sequence homology in the α-cluster promotes unequal recombination is the production of deletions, which are indistinguishable from those found in the human population, upon propagation of the cloned α-gene region in Escherichia coli (3).
Although the duplicated human α-globin genes encode identical polypeptides, we (11) and others (12, 13) have recently demonstrated that the α1 and α2 genes are not identical at the DNA sequence level. Whereas the 5'-untranslated regions, the three coding blocks, all of the first and the 5' four-fifths of the second intervening sequences (IVS1 and IVS2) are highly homologous, the 3'-ends have markedly diverged. This finding must be accounted for by any mechanism purported to effect the concerted evolution of the α-globin genes.
The available DNA sequences only define the 3'-extent of the α-gene homology. To precisely identify the 5'-homology boundary of the α-genes, we have sequenced approximately 900 bp upstream of both α1 and α2. In addition, we have cloned and sequenced a naturally occurring product of recombination between the normal α-genes in an attempt to define the molecular mechanisms which underlie this process. Consideration of these results allows us to propose an evolutionary model to account for the differing degrees of sequence conservation within adjacent segments of the α-globin genes.

EXPERIMENTAL PROCEDURES
Normal α-Globin Gene Clones-The normal α1 and α2 genes whose 5'-flanking sequences we determined were originally cloned by J. Lauer and T. Maniatis (Harvard University) and were supplied to us as plasmid subclones (3). The regions included in these clones have been described (3, 11). These plasmid subclones were constructed from the same bacteriophage recombinant and therefore the α-genes which were examined derive from the same chromosome of a single individual.
Cloning of a Rightward Deletion α-Thal-2 Gene-Lymphocyte DNA from a Chinese patient with classical hemoglobin H disease α-thalassemia (genotype α-/--) was digested to completion with EcoRI and BamHI. DNA fragments of about 10 kb were purified by sucrose gradient centrifugation and were ligated to EcoRI-BamHI phage arms of the bacteriophage cloning vector, Charon 30 (34). The subsequent cloning and screening procedures were as previously described (35). The 5'-portion of an α-specific phage clone was then subcloned as a 1.6-kb PvuII-HindIII fragment in pBR322.
DNA Sequence Analysis-DNA restriction site end-labeling with [γ-32P]ATP and polynucleotide kinase, fragment isolation, and chemical sequencing reactions were carried out as described by Maxam and Gilbert (36). Blunt SmaI termini were labeled with [α-32P]dCTP and T4 DNA polymerase (37), while [α-32P]cordycepin triphosphate and terminal transferase were used to label PstI ends, as described (38). The strategy used to sequence the 5'-flanking regions of the normal α1 and α2 genes is shown in Fig. 1. The -733 and -634 nucleotides were sequenced in the α-thal-2 plasmid subclone from the SmaI site at position -700. The IVS2 and 3'-untranslated sequences of the α-thal-2 gene were determined from intragenic HindIII and DdeI sites, respectively, using appropriate restriction fragments isolated directly from the recombinant phage DNA.
Southern Blot Hybridization-Human genomic and α-globin phage DNAs were digested to completion with ApaI (Boehringer Mannheim), subjected to electrophoresis through horizontal agarose slab gels, and transferred to nitrocellulose filters by the method of Southern (39). Filter-bound α-globin restriction fragments were subsequently detected by hybridization with either the 1.5-kb PstI-PstI or 1.0-kb PstI-HindIII probes illustrated in Figs. 4 and 6, respectively. These probes were isolated from polyacrylamide or low melting point agarose gels following digestion of the normal α1 and α2 plasmid subclones (see above) with the appropriate restriction enzymes and were labeled by nick translation in the presence of [α-32P]dCTP (40).

RESULTS
Sequence Comparison of Normal α1- and α2-Globin Genes-Previous electron microscopic heteroduplex analysis of cloned human α1- and α2-globin genes revealed the presence of extensive homology both within and 5' to the structural gene sequences (3). This was referred to as the Z-homology. To precisely determine the upstream boundary of Z, we sequenced approximately 900 bp of the 5'-flanking regions of both α1 and α2 according to the strategy depicted in Fig. 1. Comparison of these flanking sequences reveals that they are almost identical (Fig. 2). No gaps are required to align the proximal 868 bp of the α1 and α2 genes and only two single nucleotide differences are found in these sequences: position -634 relative to the cap sites is A in α1 and G in α2, and position -733 is C in α1 and T in α2. Further 5', a 2-bp gap must be introduced in α1 (at position -869 of α2) to maintain the sequence alignment. In contrast to the strong conservation in the proximal 5'-flanking sequences, the portions distal to the short gap abruptly diverge due to a 224-bp insertion/deletion difference between α1 and α2. These results are in agreement with the previous heteroduplex analysis of these regions (3).
Consideration of the α1 and α2 structural gene sequences (11-13; Fig. 2) permits a complete description of the Z-homology at the nucleotide level. The two genes are identical in their 5'-untranslated regions, IVS1, and all three coding blocks. The 5' four-fifths of IVS2 are also the same in the two genes except for a single base difference: position 55 is G in α1 but T in α2. In contrast, the 3'-ends of the genes have markedly diverged. The 3'-untranslated regions differ by 19 of 113 nucleotides, a total of 17% divergence (11, 13). The 3'-portions of IVS2 also have several differences, the most notable being the absence from α2 of 7 contiguous base pairs corresponding to nucleotides 115 to 121 of α1's IVS2. In addition, a C is found in α1 at IVS2 position 126 while this nucleotide is G in α2. Finally, a short region of homology adjacent to the polyadenylation sites terminates in the sequence CCTG(TG)3CCTG, which is also located at the ψα1/α2 boundary (7). This finding led to the suggestion that this short repeated sequence represents the ends of the α-duplication units (7).
In summary, there is a 1436-bp sequence, extending from nucleotide 868 upstream of the α1 and α2 cap sites to nucleotide 114 of their large introns, in which the nonallelic α-genes have continuous and uninterrupted homology with the exception of three single base differences (Fig. 3). That is, the limited (0.2%) divergence of these coding and noncoding regions is solely due to point mutations; no insertions or deletions are present, as is characteristic of the noncoding portions of other duplicated genes (41). Each end of this homology block is flanked by a short gap in one gene relative to the other and beyond these points the two sequences are considerably more divergent. In particular, the segments between the 7-bp gap in IVS2 and the repeated sequence marking the ends of the duplication units contain 20 variant nucleotides representing 7.2% divergence.
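The divergence values quoted above are simple ratios of variant nucleotides to aligned length. As an arithmetic check, the following minimal Python sketch recomputes two of them from the counts given in the text. The function name and the dictionary labels are illustrative assumptions and are not part of the original analysis.

```python
def percent_divergence(differences: int, length: int) -> float:
    """Percentage of aligned positions at which two sequences differ."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * differences / length


# (variant nucleotides, aligned length in bp), taken from the text.
regions = {
    "1436-bp homology block": (3, 1436),
    "3'-untranslated region": (19, 113),
}

for name, (diffs, length) in regions.items():
    print(f"{name}: {percent_divergence(diffs, length):.1f}% divergence")
```

To one decimal place these ratios agree with the rounded values of 0.2% and 17% stated in the text.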
A Recombinant α-Globin Gene Is a Mosaic of Normal α1 and α2 Sequences-The extensive sequence homology in and around the human α-globin genes represents a large target for homologous recombination. Crossing over in the Z-homologies of unequally paired α1 and α2 genes generates one chromosome with three α-genes and a second chromosome with one α-gene (27-31). The latter recombinant allele is referred to as the rightward deletion α-thalassemia-2 (α-thal-2) gene. Given the α1- and α2-specific sequence markers described above, we reasoned that the fine structure of such an α-thal-2 allele might reveal some of the molecular mechanisms underlying the recombination event which led to its creation. Furthermore, since homologous but unequal recombination is believed to have occurred during the evolution of the normal α-loci (3, 18), the structure of the α-thal-2 gene might provide insight into the origin of the contemporary α-complex.

We therefore cloned a rightward deletion α-thal-2 gene and sequenced the recombinant in the vicinity of all the α1 and α2 markers. The results of this analysis are summarized in Fig. 3. Rather than having a discretely polar arrangement of α1 and α2 markers, the α-thal-2 gene is an unexpected mosaic of normal α-sequences. The α-thal-2 gene contains the AC dinucleotide found only in the α2 sequence at position -869/-870, as well as an adjacent α2-specific Y block, indicating that the most distal 5'-flanking DNA is derived from α2. The more proximal upstream sequences are alternately α1- and α2-specific since position -733 of the α-thal-2 gene corresponds to the α1 nucleotide and position -634 is equivalent to that of α2 (Fig. 3). In contrast to this patchwork organization of the 5'-flanking region, the 3'-end of the α-thal-2 allele is identical with a normal α1 gene. That is, both the α-thal-2 IVS2 and 3'-untranslated region share homology with α1 only. No additional sequence variations from normal α-sequences were present in the portions of the α-thal-2 gene that were examined.

Independent α-Thal-2 Genes Have α1-specific IVS2 and 3'-Untranslated Sequences-Sequence analysis of a single cloned α-thal-2 gene revealed that its 3'-end is identical with a normal α1 gene. Since it is of interest to determine whether additional independent α-thal-2 genes have a common IVS2 and 3'-untranslated region structure, we developed a rapid screening procedure for examining this possibility. The strategy employed is based on the placement of ApaI restriction sites both within and surrounding the normal α-globin genes. As indicated in Fig. 4A, ApaI cleaves a short distance upstream of α2 and downstream of α1, once between α2 and α1, once within the 3'-untranslated region of α2 only, and once within IVS2 of α1 only. These sites were identified by inspection of the available DNA sequences of these regions (Fig. 2) and were confirmed by Southern blotting (Fig. 4B). Thus, ApaI recognizes multiple sites of sequence divergence between α1 and α2.

Fig. 2. A dash signifies identity between α1 and α2. Asterisks designate gaps in one sequence relative to the other and have been introduced to maximize sequence homology. The alignment of the most distal 5'-flanking sequences is confirmed by the additional data of Shen and co-workers. The sequences shown here encompass the entire Z-homology which was defined by heteroduplex analysis (3) and extend into the Y-homology of α2 and the nonhomologous segment between Y and Z of α1 (Fig. 7). The translational initiation and termination codons are underlined and the intervening sequences are in lower case letters. The 5'-flanking sequences were determined in the present study, while the remaining sequences were compiled from previous work (7, 11-13) and are included here for comparative purposes (see text for details). We note one correction of our previously published α1 sequence (11): nucleotide 110 of IVS2 should be G, not A. In addition, our α2 sequence differs from that of Liebhaber et al.

Fig. 3. The 3'-untranslated region divergence is not shown here in detail. The IVS2 positions surrounding the gap are included to emphasize the 7-bp gap in α2 relative to α1 and the presence of a short direct repeat which may have mediated the deletion of these 7 bp from α2. The nucleotide assignment of each α1- and α2-specific marker is given in the lower line for the α-thal-2 gene. The arrangement of parental sequences in the α-thal-2 gene is 5'-α2-α1-α2-α1-3'. The α-thal-2 sequence upstream of position -869 is entirely α2-specific.

Fig. 4. B, blot hybridization of ApaI-digested phage clones containing duplicated α-globin genes and an α-thal-2 allele.

Given this ApaI restriction map of the normal α-globin genes and the sequence of the cloned α-thal-2 gene, we anticipated that the latter would contain the normal 0.89-kb fragment diagnostic of the 3'-end of α1, plus a new 2.5-kb fragment resulting from fusion of the upstream α2 sequences to the Z-homology of α1 (see "Discussion"). At the same time, the intergenic ApaI fragments should be deleted from the α-thal-2 gene. These predictions are substantiated by the blot shown in Fig. 4B. Alternative α-thal-2 3'-ends can also be identified by this approach. For example, 2.7- and 0.69-kb fragments would replace the 2.5- and 0.89-kb bands if the α-thal-2 IVS2 and 3'-untranslated region were derived from an α2 rather than from an α1 gene.
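The diagnostic logic of this ApaI screen can be summarized as a lookup from the observed fragment sizes to the parental origin of the α-thal-2 3'-end. The short Python sketch below illustrates that reasoning only. The fragment sizes are those quoted in the text, whereas the function name and the 0.05-kb matching tolerance are assumptions introduced for illustration and do not describe how the blots were actually scored.

```python
# Expected ApaI fragments (kb) for each possible origin of the alpha-thal-2
# IVS2 and 3'-untranslated region, as quoted in the text.
EXPECTED_FRAGMENTS_KB = {
    "alpha1": (2.5, 0.89),
    "alpha2": (2.7, 0.69),
}


def classify_three_prime_end(observed_kb, tolerance=0.05):
    """Return the origin whose expected fragments are all observed, else None.

    `tolerance` (kb) is an assumed allowance for sizing error on a gel.
    """
    matches = [
        origin
        for origin, expected in EXPECTED_FRAGMENTS_KB.items()
        if all(
            any(abs(size - band) <= tolerance for band in observed_kb)
            for size in expected
        )
    ]
    # Report a call only when exactly one pattern fits the observed bands.
    return matches[0] if len(matches) == 1 else None


print(classify_three_prime_end([2.5, 0.89]))  # pattern predicted for an alpha1-type 3'-end
print(classify_three_prime_end([2.7, 0.69]))  # pattern predicted for an alpha2-type 3'-end
```

A set of bands that fits neither pattern, or both, yields no assignment, which mirrors the need to inspect such a blot directly.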
Having established a simple method for determining the 3'-structure of an α-thal-2 gene, we next screened a panel of human DNAs isolated from individuals of different ethnic groups, each of whom carries such a variant allele. These subjects all have the genotype α-/--, that is, they each have a single structural α-locus on one chromosome (the rightward deletion α-thal-2 gene), with both α-loci deleted from its homologue. As shown in Fig. 5, 16 unrelated individuals representing 8 different ethnic groups all have the 2.5- and 0.89-kb ApaI α-globin fragments. Thus, they share the α-thal-2 gene structure that was previously identified by sequence analysis of an α-thal-2 allele cloned from a Chinese patient.
The Middle α-Gene of the Triplicated α-Complex Has α2-specific IVS2 and 3'-Untranslated Sequences-The middle α-globin gene of the triplicated α-complex and the α-thal-2 allele should have precisely reciprocal structures, a consequence of unequal crossing over between parental α1 and α2 genes. Based on the above data for a series of α-thal-2 variants, the middle α-gene should have an α1-specific distal 5'-flanking region and α2-specific IVS2 and 3'-untranslated sequences. As a result, the ApaI fragment encompassing the middle α-gene should be 1.9 kb long (Fig. 6A). To test this possibility, we used a variation of the previously described ApaI blotting strategy in which the 1.5-kb PstI-PstI α-gene probe (Fig. 4A) was replaced with a 1.0-kb probe that extends from a PstI site in the 5'-flanking region of α1 to an intragenic HindIII site (Fig. 6A). When a Southern blot of ApaI-digested DNA containing the triplicated α-gene chromosome was hybridized with this probe, a 1.9-kb band was detected, in addition to the 2.7- and 1.7-kb fragments characteristic of the normal α-genes (Fig. 6B). Since the probe lacks 3' α-sequences, the new 1.9-kb ApaI fragment must extend from an α1-specific 5'-end of the middle α-gene to its α2-specific IVS2 and 3'-untranslated region; it cannot correspond to the similarly sized fragments from the 3'-ends of both the 5' and middle α-genes (Fig. 6A). Of the three individuals with triplicated α-complexes who were examined, two have the ααα/αα genotype (Fig. 6B, lanes 3 and 4), while the third must be ααα/α- since an additional 2.5-kb band diagnostic of an α-thal-2 gene is present (Fig. 6B, lane 5). In summary, the middle α-gene on a chromosome bearing three α-loci has an IVS2 and 3'-noncoding region that are exactly reciprocal in sequence to those of the α-thal-2 allele.

Fig. 6. A, the ApaI restriction map of the three α-globin genes from a triplicated α-complex is illustrated, assuming that the middle α-gene has an α1-specific distal 5'-flanking region and α2-specific IVS2 and 3'-untranslated sequences. The middle gene is designated α2 to indicate the origin of its distinctive structural gene sequences. The sizes of the restriction fragments are given in kilobases. B, blot hybridization analysis of DNAs containing triplicated α-globin genes using ApaI. DNAs isolated from individuals with normal, α-thal-2, and triplicated α-globin genes were digested with ApaI and subjected to Southern blot analysis using the 1.0-kb α1-specific PstI-HindIII fragment shown in A as the hybridization probe. Lane 1, normal; lane 2, Southeast Asian α-thal-2; lanes 3, 4, Jamaican (genotype ααα/αα); lane 5, Jamaican (genotype ααα/α-). The presence of the 1.9-kb band in each of the triplicated α-complexes substantiates the restriction map shown in A.

DISCUSSION
Boundaries of Sequence Homology between α1 and α2 Genes-We have demonstrated that almost 900 bp of the proximal 5'-flanking regions of the human α-globin genes are highly homologous, whereas the sequences immediately upstream of nucleotide -869 diverge significantly (Fig. 2). This information, combined with the sequences of the structural genes themselves (11-13), permits us to define regions of segmental homology in and around the α1- and α2-globin genes. Since these homologies encompass sequences which are under no apparent constraint from divergence, they are presumed to result from specific rectification mechanisms which effect concerted evolution (17-24). These include gene conversion, unequal crossing over, or a combination of these two processes.
Three segments, each having a different degree of uninterrupted sequence homology, can be distinguished in the homologous region previously identified by electron microscopic heteroduplex analysis and referred to as Z. By uninterrupted, we mean maximal homology alignment without the introduction of gaps. The first segment extends for 1436 bp from a dinucleotide gap at position -869 to a 7-bp gap at nucleotide 115 of IVS2 and contains 3 single nucleotide mismatches (0.2% divergence). The second spans the 229 bp between the IVS2 gap and a single base length difference at nucleotide 76 of the 3'-untranslated regions, while the third continues for an additional 53 bp from the latter position to the presumptive ends of the α-duplication units (7). For simplicity of further discussion, we will consider the second and third blocks as one continuous stretch of 283 bp containing 20 single nucleotide differences (7.1% divergence). However, the interpretation which follows can be simply modified to accommodate the additional complexity.
The most striking features of the two homologous segments defined above are their differing extents of sequence homology (>99% as opposed to 93% for the regions 5' and 3' to IVS2, respectively) and their sharp demarcation by flanking nonhomologies. If gene conversion and unequal crossing over serve to homogenize these sequences during the course of evolution, then these processes must be acting in a differential manner on the two adjacent regions. We therefore propose that there are two contiguous but independent conversion units encompassing the human α-globin loci, the boundaries of which are the dinucleotide gap at position -869 of the 5'-flanking regions, the 7-bp gap in IVS2, and the direct repeats at the 3'-ends of the α-duplication units. In keeping with the established nomenclature for the homology blocks of the α-genes, we refer to these as ZLC and ZSC for the long and short conversion units on the 5' and 3' sides of the IVS2 gap, respectively (Figs. 3 and 7). A specific model for the independent concerted evolution of ZLC and ZSC is outlined below, the details of which derive in part from the structure of a variant α-globin allele that is a product of unequal recombination between normal α1 and α2 genes.
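The definition of uninterrupted homology used here, namely alignment without the introduction of gaps, can be expressed as a short procedure that splits a gapped pairwise alignment into gap-free blocks and counts the mismatches within each. The following Python sketch is offered only as an illustration of that definition. The function name is hypothetical and the two sequences are invented toy data, not the α-globin sequences of Fig. 2.

```python
def ungapped_blocks(seq_a: str, seq_b: str, gap: str = "-"):
    """Split a pairwise alignment into gap-free blocks.

    Returns a list of (start_column, length, mismatches) tuples, where
    columns are 0-based positions in the alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    start = None
    mismatches = 0
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == gap or b == gap:
            # A gap in either sequence closes the current block.
            if start is not None:
                blocks.append((start, i - start, mismatches))
                start = None
        else:
            if start is None:
                start, mismatches = i, 0
            if a != b:
                mismatches += 1
    if start is not None:
        blocks.append((start, len(seq_a) - start, mismatches))
    return blocks


# Invented toy alignment: a 2-column gap separates two 8-column blocks.
toy_a = "ACGTACGT--ACGTTCGA"
toy_b = "ACGTACGTGGACGTACGA"
for start, length, mismatches in ungapped_blocks(toy_a, toy_b):
    print(f"block at column {start}: {length} bp, {mismatches} mismatch(es)")
```

Applied to a real alignment, each block's mismatch count divided by its length gives the per-segment divergence on which the distinction between the two conversion units rests.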
Molecular Basis for the Mosaic Structure of an α-Thal-2 Gene-The cloned α-thal-2 allele is composed of a mosaic of parental α1 and α2 sequences (Fig. 3). The 5'-end of this recombinant gene is alternately α2-, α1-, and α2-specific, while its 3'-end (including IVS2 and the 3'-untranslated region) is derived entirely from α1. The α1-specific identity of the 3'-noncoding sequence of the α-thal-2 gene was anticipated from earlier studies of α-globin mRNA obtained from individuals carrying this allele (14, 15). However, the present results establish that the maximum 3'-extent of the crossover which led to the creation of the α-thal-2 gene is the 7-bp gap in IVS2 of α2, as depicted by the hatched box in Fig. 7. It should be noted that this site corresponds to our assignment of the 3'-border of the long conversion unit, ZLC, and that multiple α-thal-2 alleles from individuals of different ethnic origins have the same sequence in this region. Based on the size of the α-thal-2 deletion (27-29), its 5'-boundary must lie 3.7 kb (that is, the intergenic distance) upstream in the homologous position of α2. The deletion breakpoints cannot be defined more precisely due to the absence of additional α1- and α2-specific sequence markers within IVS1 and the first two exons. Thus, the deletion spans regions A + B or B + C of Fig. 7, where A and C are the ZLC homologies of α2 and α1, respectively. In either case, the α-thal-2 gene represents a fusion of the ZLC portions of α1 and α2, although not necessarily a structural gene fusion (14).

Fig. 8. Recombination between unequally paired α1 (QR) and α2 (Q'R') genes (I) is presumed to occur initially by asymmetric strand exchange followed by symmetric heteroduplex formation (II; 30). The resulting branch or joint between the interacting DNA molecules is free to migrate through regions of sequence homology. The arrow in II indicates the direction of net movement. However, upon reaching a nonhomologous segment such as the 7-bp gap in IVS2 of α2 (indicated by the loop in α1), further movement of the branch is inhibited (III) and resolution of the recombination intermediate produces the structures depicted in IV. Q'R corresponds to the α-thal-2 gene and QR' corresponds to the middle α-gene of the triplicated chromosome. The non-homology at the 5'-ends of ZLC would serve a similar function but has been omitted here for clarity. Although reciprocal exchange of the flanking arms is depicted, a similar model can be constructed in which only patches are exchanged with maintenance of the parental configuration of the flanking sequences. The latter model would account for segmental gene conversion events which are limited in extent by sequence non-homologies and which do not involve expansion and contraction of gene number. Such a mechanism is applicable to the human γ-globin genes (23) and may also occur in the human α-globin gene complex.
Although three separate exchanges within ZLC could account for the observed mosaic sequence arrangement of the cloned α-thal-2 gene, we propose that this novel structure was generated by a single recombination event between unequally paired α1 and α2 genes, a process which leads to the deletion of the intergenic region that was excluded from the original pairing (27-29). The molecular details of this proposal can be considered separately for the 5'- and 3'-ends of the α-thal-2 gene.
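The count of three exchanges follows directly from the order of parental markers along the recombinant gene. As an illustration, the Python sketch below collapses the marker assignments reported for the cloned α-thal-2 gene into runs of common origin and counts the switches between them. The marker labels and the function name are illustrative assumptions; the origins themselves are those stated in the text.

```python
from itertools import groupby

# Parental origin of each marker in the cloned alpha-thal-2 gene, listed
# 5' to 3' as reported in the text (labels are informal).
MARKERS = [
    ("-870/-869 dinucleotide", "alpha2"),
    ("-733", "alpha1"),
    ("-634", "alpha2"),
    ("IVS2 3'-portion", "alpha1"),
    ("3'-untranslated region", "alpha1"),
]


def mosaic_segments(markers):
    """Collapse consecutive markers of the same parental origin into one segment."""
    return [origin for origin, _ in groupby(origin for _, origin in markers)]


segments = mosaic_segments(MARKERS)
print("5'-" + "-".join(segments) + "-3'")
print("origin switches:", len(segments) - 1)
```

The resulting 5'-α2-α1-α2-α1-3' arrangement contains three switches of origin, which is why three separate exchanges would be needed if each switch were an independent crossover rather than the outcome of one event followed by mismatch repair.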

a barrier to branch migration as a mechanism to limit the extent of hybrid DNA formation; this in turn restricts marker exchange. This barrier or obligatory recombinational boundary is the non-homology at the 3'-end of IVS2 and consists of a 7-bp insertion/deletion difference between α1 and α2 (Figs. 2 and 3). As noted earlier, this region defines the maximum 3'-extent of the α-thal-2 deletion, as well as the 3'-boundary of ZLC. In Fig. 8, QR and Q'R' represent homologously but unequally paired α1 and α2 genes, respectively, with the IVS2 non-homology denoted by the loop in α1. Recombination is presumed to occur by homologous strand displacement and uptake, loop cleavage, strand assimilation and isomerization with reciprocal exchange of the flanking arms (Fig. 8II; Ref. 25). The resulting heteroduplex joint is then free to migrate bidirectionally through regions of sequence homology (42, 43). However, this symmetric exchange cannot penetrate the IVS2 non-homology, resulting in the termination of branch migration at or 5' to the point marked T (Fig. 8III), and the resolution of the recombination intermediate into the reciprocal products, QR' and Q'R (Fig. 8IV). The α-thal-2 gene corresponds to Q'R and thus must include the additional 7 bp characteristic of α1's IVS2, as well as the α1-specific sequences 3' to this point. This prediction is satisfied by the structures of the α-thal-2 genes that we have examined. Furthermore, this model requires the reciprocal crossover product to have an IVS2 and 3'-untranslated sequence characteristic of a parental α2 gene. This is indeed the case, as demonstrated by the ApaI mapping analysis of the triplicated α-globin genes (Fig. 6).
The dinucleotide gap at the 5'-end of ZLC could serve a similar function as that elaborated for the IVS2 non-homology. As predicted by our model, the cloned α-thal-2 gene contains these two nucleotides and the more distal α2-specific sequences. This finding reinforces our earlier definition of the 5'-boundary of ZLC. Although the above model is presented as a reciprocal exchange of the flanking DNA sequences with the resulting expansion and contraction of gene number, the process could also take place as a patch exchange in which the parental configuration of the flanking arms is maintained. Gene conversion could occur by this type of interaction between α1 and α2 sequences (see below). Alternatively, the recombination may take place intrachromosomally with concomitant intergenic deletion but without the production of the reciprocal duplication (29). The actual occurrence of interchromosomal or interchromatid unequal crossing over is inferred from the existence of chromosomes bearing either one or three α-genes in the human population (27, 31). However, the more frequent occurrence of intrachromosomal recombination could combine with selective pressures to produce the observed predominance of the α-thal-2 chromosome over the triplicated state (44).
The unique feature and major proposal of our molecular model is that a non-homology is capable of blocking branch migration. Evidence has been obtained that this may indeed occur in E. coli (45-48) and in certain fungi (49). Whether such a process also occurs in mammalian cells remains to be determined.
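The central assumption of this model can be stated operationally: a heteroduplex joint migrates freely through aligned columns, including single nucleotide mismatches, but stops at the first insertion/deletion difference in either direction. The Python sketch below is a deliberately simplified toy rendering of that assumption. It treats every gap as an absolute barrier, uses invented sequences rather than the α-globin data, and the function name is hypothetical.

```python
def heteroduplex_extent(seq_a: str, seq_b: str, start: int, gap: str = "-"):
    """Return (left, right), the 0-based alignment columns bounding the region
    reachable from `start` without crossing a gapped column.

    Mismatched columns do not stop migration; gapped columns do.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    def blocked(i: int) -> bool:
        return seq_a[i] == gap or seq_b[i] == gap

    if blocked(start):
        raise ValueError("strand exchange cannot initiate in a non-homology")

    left = start
    while left > 0 and not blocked(left - 1):
        left -= 1
    right = start
    while right < len(seq_a) - 1 and not blocked(right + 1):
        right += 1
    return left, right


# Invented toy alignment: a 2-column gap, a 14-column homology block with one
# mismatch, then a 7-column gap (loosely echoing the layout of the long unit).
toy_a = "TT" + "--" + "GATTACAGATTACA" + "CCCTGGG" + "AAT"
toy_b = "TT" + "AC" + "GATTACAGGTTACA" + "-------" + "AAT"
print(heteroduplex_extent(toy_a, toy_b, start=10))
```

In this toy case the reachable region is exactly the homology block between the two gaps, and the single mismatch inside it would be left in the heteroduplex as a substrate for repair.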
If the heteroduplex which formed as a recombination intermediate between paired α1 and α2 genes (Fig. 8II) extended into the 5'-flanking regions, then it would include two single nucleotide mismatches, one at position -634 and the other at position -733 (Fig. 9). Tracts of hybrid DNA can indeed be sufficiently long to encompass the entire ZLC homology (50-52). Repair enzymes could then act at these sites to correct the mispairing using one of the two parental strands as template. Because one of the two nucleotides is α1-specific and the other is α2-specific in the α-thal-2 product, both of the heteroduplex strands must have been selected as the repair template, one in each of the adjacent segments. These two sites are separated by only 100 bp and therefore could be included in a single excision tract produced during mismatch repair (53,54). Thus, either excision tracts on each of the complementary strands must have terminated within the 100 bp separating the two mismatches or a more site-directed mechanism was operative. Whatever the details of its origin, the 5'-flanking region of the α-thal-2 gene reflects the two genetic consequences of gene conversion: sequence homogenization and the generation of a new nucleotide sequence (56).

A History of the Segmental Concerted Evolution of the α-Globin Genes-The nonallelic human α-globin genes have >99% sequence homology in their proximal 5'-flanking regions and in most of their coding and noncoding segments. This is the hallmark of concerted evolution (18). In contrast, the sequences contained within the duplication units 3' to the previously described non-homology boundary in IVS2 are significantly less conserved. These findings led us to propose that the IVS2 non-homology delimits two independent conversion units, ZLC and ZSC (Figs. 3 and 7). Expanding on the proposal that non-homologies inhibit branch migration during genetic recombination (Fig. 8), we now describe an evolutionary history of the α-globin genes in which segmental strand exchanges account for the observed polarity of sequence conservation within these loci.
A single ancestral α-globin gene (ag, Fig. 10) is presumed to have inserted into staggered chromosomal breaks so as to produce a short direct repeat on either side (7, 57, 58). Subsequent duplication could occur by homologous recombination between the flanking repetitive sequences (59), giving rise to two initially identical, closely linked repeat units (αP5' and αP3', Fig. 10). At some time following the duplication of the original α-gene, 7 bp were deleted from IVS2 of the 5' member of the pair, perhaps by slipped mispairing mediated by a short direct repeat (GGCC, Fig. 3) during replication (60,61); insertion of the 7-bp sequence in the 3' gene cannot be excluded, however. These duplicated genes are the immediate ancestors of the contemporary α1 and α2 sequences.
FIG. 10. ... undergoes duplication to produce two initially identical precursors to the contemporary α-globin loci (αP5' and αP3' evolve to α2 and α1, respectively). A deletion of 7 bp (indicated by the "v" in α2 and the corresponding mark in α1) then occurs in IVS2 of αP5', which corresponds to the IVS2 gap in the current α2 gene. This sequence non-homology now demarcates two adjacent homologous segments within α2 and α1: the long 5' ZLC unit and the shorter 3' ZSC unit (each drawn with distinct symbols in α2 and α1). Unequal recombination between the ZLC segments of α2 and α1 matches these sequences during the course of evolution by the combined processes of random sequence shuffling (shown here) and gene conversion (Fig. 9). The IVS2 non-homology prevents recombination events which initiate in ZLC from propagating into ZSC. Similarly, independent events that occur in ZSC cannot include ZLC (not shown). Thus, although both ZLC and ZSC undergo concerted evolution, they do so independently by obligatory segmental recombination mediated by flanking regions of non-homology.

Due to the length of the sequence homology both within and flanking the α-genes, a product of the original amplification, unequal alignments between α1 and α2 genes on the same or different DNA duplexes (that is, intrachromosomal, interchromosomal, or interchromatid pairings) are possible. This is shown in the middle of Fig. 10 as an unequal pairing of an α2 gene on chromosome A and an α1 gene on chromosome B. Homologous recombination can then occur, the probability of which is much greater on the 5'-side of the IVS2 gap than on the 3'-side due to the larger target size of the former region. The reciprocal products of such an unequal crossover are chromosomes A' and B' containing one and three α-genes, respectively (Fig. 10). If a second unequal crossover takes place between the single α-gene on chromosome A' and the middle α-gene on chromosome B', the products are chromosomes A" and B", each containing two α-genes once again. The net result is that through expansion and contraction of gene number mediated by homologous but unequal crossing over, one of the two original α-sequences has become predominant on a single chromosome. This can be seen clearly in chromosomes A" and B". Although this basic mechanism has been elaborated by others (17-22, 24), the unique feature that we have introduced is the recombinational barrier in IVS2 (Fig. 8). Thus, the IVS2 non-homology isolates the 3'-ends of the genes from the processes that homogenize their 5' sequences. In a similar manner, strand exchanges that initiate within ZSC cannot propagate into ZLC. Repeated rounds of such segmental recombination lead to the independent concerted evolution of the adjacent homology blocks. Since crossovers should occur less frequently in the shorter ZSC segment, these sequences should be less homologous than those of ZLC. This is indeed the case: the two ZLC units diverge by only 0.2% while the two ZSC units diverge by 7.1%.
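The two rounds of expansion and contraction described above can be traced with a toy model. The following Python sketch is illustrative only and rests on several assumptions that are not part of the original analysis: each gene is a string of position labels written 5' to 3' ("2" for α2-derived and "1" for α1-derived sequence), the segment lengths `ZLC_LEN` and `ZSC_LEN` are arbitrary, the crossover offsets are chosen by hand, and exchange is permitted only inside ZLC to mimic the proposed IVS2 barrier. All names are hypothetical.

```python
ZLC_LEN, ZSC_LEN = 8, 3   # arbitrary toy lengths; Z_LC is the longer unit


def gene(label):
    """A gene whose Z_LC and Z_SC positions all carry the same origin label."""
    return label * (ZLC_LEN + ZSC_LEN)


def unequal_crossover(x, i, y, j, pos):
    """Reciprocal exchange between gene i of chromosome x and gene j of
    chromosome y at offset `pos`, which must lie inside Z_LC.

    Chromosomes are lists of gene strings ordered 5' to 3'. Returns the two
    reciprocal products (x-derived 5' arm first).
    """
    if not 0 < pos < ZLC_LEN:
        raise ValueError("crossover must fall inside Z_LC")
    hybrid_x = x[i][:pos] + y[j][pos:]
    hybrid_y = y[j][:pos] + x[i][pos:]
    return x[:i] + [hybrid_x] + y[j + 1:], y[:j] + [hybrid_y] + x[i + 1:]


# Chromosomes A and B each carry alpha2 (5') followed by alpha1 (3').
chrom_a = [gene("2"), gene("1")]
chrom_b = [gene("2"), gene("1")]

# Round 1: alpha2 on A pairs with alpha1 on B -> one-gene and three-gene products.
a_prime, b_prime = unequal_crossover(chrom_a, 0, chrom_b, 1, pos=5)

# Round 2: single gene on A' pairs with the middle gene on B'.
a_dprime, b_dprime = unequal_crossover(a_prime, 0, b_prime, 1, pos=2)

for name, chrom in [("A'", a_prime), ("B'", b_prime),
                    ('A"', a_dprime), ('B"', b_dprime)]:
    print(name, [g[:ZLC_LEN] + "|" + g[ZLC_LEN:] for g in chrom])
```

In this toy run both final chromosomes again carry two genes. Within ZLC, positions 2 to 4 are α1-type in both genes of A" and α2-type in both genes of B", whereas the ZSC labels of every gene remain as they started. The model tracks only the origin of each position and says nothing about the rates of these events.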
Significantly, all of the α1 genes that have been analyzed have the same IVS2 sequence including the 7 bp of interest, while all of the examined α2 genes lack these same 7 bp (11-13, 35, 62). Similarly, the 3'-untranslated regions are unique to α1 and α2 with no interchanges apparent (14,15). If branch migration proceeded past the IVS2 non-homology, then we would expect to find α1 and α2 genes resembling each other with respect to their 3'-ends. The common 3'-end structure of the multiple α-thal-2 genes that was revealed by the ApaI blot analysis (Fig. 5) also supports the concept of segmental recombination between α1 and α2.
In addition to the homogenizing effects of unequal crossing over, the physical interaction between parental DNA strands in the recombination intermediate can lead to sequence correction by mismatch repair (Fig. 9). This represents a true gene conversion mechanism (24,25,63), as manifest in the structure of the cloned α-thal-2 gene. It should be emphasized that although gene conversion can occur independently of unequal crossing over, these two processes may function simultaneously in matching related sequences. Since the relative contribution of each cannot be assessed in the case of the human α-globin genes, we have referred to ZLC and ZSC as conversion units.
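How independent repair at two mismatches can yield a sequence unlike either parent may be illustrated schematically. The Python sketch below is a hedged toy example, not a model of the repair enzymology: the two strands are hypothetical six-base sequences written in the same orientation, and the choice of template strand at each mismatch is supplied by hand. The function name `repair_heteroduplex` is introduced here for illustration only.

```python
def repair_heteroduplex(strand_1, strand_2, templates):
    """Resolve a heteroduplex into a single corrected sequence.

    strand_1, strand_2: equal-length parental sequences, same orientation.
    templates: dict mapping each mismatch index to 1 or 2, naming the strand
               assumed to serve as repair template at that site.
    """
    if len(strand_1) != len(strand_2):
        raise ValueError("strands must have equal length")
    mismatches = [i for i, (p, q) in enumerate(zip(strand_1, strand_2)) if p != q]
    if set(templates) != set(mismatches):
        raise ValueError("a template must be given for every mismatch, and only those")
    repaired = []
    for i, (p, q) in enumerate(zip(strand_1, strand_2)):
        if p == q:
            repaired.append(p)                      # already paired correctly
        else:
            repaired.append(p if templates[i] == 1 else q)
    return "".join(repaired)


# Hypothetical toy strands with two mismatches (indices 1 and 4).
parent_1 = "GATCCA"
parent_2 = "GTTCGA"
product = repair_heteroduplex(parent_1, parent_2, templates={1: 1, 4: 2})
print(product)
```

When a different strand serves as template at each site, the product matches neither parent, which corresponds to the generation of a new sequence; when the same strand is used at both sites, the product is simply that parental sequence.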
It is striking that although the ZSC segments of α1 and α2 are significantly divergent, no nucleotide differences are found in exon 3. If this region belongs to ZSC, then silent substitutions would be expected to accumulate at a rate similar to that of noncoding regions (64). However, sequence comparisons of pseudogenes and their functional counterparts have revealed that codon usage by active genes is non-random (65, 66). Functional constraints against synonymous codon changes may therefore explain the lack of divergence in this short coding block. An alternative possibility that is consistent with our model is that the exon 3 homology is due to a recent conversion which was limited to this coding block.
The noncoding regions 5' to the Z blocks of α1 and α2 contain several sequence discontinuities interspersed with the X and Y homologies (3, 7), the evolution of which is consistent with a segmental correction mechanism. In agreement with the expected correlation between conversion unit length and degree of homology (see above), the relatively short Y blocks are more divergent than either ZSC or ZLC. Also relevant to this point is the finding from statistical analysis of the human γ-globin gene sequences that regions lacking sequence gaps have a reduced number of single nucleotide substitutions (67).
The evolution of the human γ-globin (23, 68) and goat α-globin (69) genes can be traced in a manner similar to that of the human α-loci. In each case, a gap in an intervening sequence forms a boundary between independent conversion units having homologies proportional to their respective lengths. A 20-bp gap is found in IVS2 of the human Aγ-gene relative to its Gγ counterpart, while a 7-bp gap is present in IVS1 of the goat IIα-gene compared to the linked Iα-locus. Additional non-homologies are located at the opposite ends of these presumptive conversion units. Length-dependent segmental recombination during the course of evolution would then produce the patterns of sequence homology evident in these contemporary genes. The closely linked human ζ- and ψζ-globin genes also possess regions of homology flanked by non-homologous sequences (8). Rapid drift in the intervening sequences and flanking regions of other gene pairs may similarly accelerate the divergence of their duplication units if the types of mutations and their rates of accumulation have been sufficient to overcome homogenizing influences (41,61,70). The frequent occurrence of short insertions and deletions (41,71), as well as the transposition of repeated elements to intervening or flanking sequences (72,73), would effectively disrupt gene correction mechanisms according to our proposal. Thus, conversion units, which are initially co-linear with the original duplications, are gradually reduced in size by sequence alterations that interfere with homogenizing recombination events. The human α-globin genes represent one stage of such a process in which considerable flanking homology is preserved along with that of the structural genes. However, the extreme 5'-portions of the α-duplication units have begun to diverge (3, 7). The duplicated goat α-globin genes may have reached a later stage of divergence in that their only conserved regions are those contained within the transcription units (69).
An even more advanced stage in which structural gene sequences diverge is typified by the mouse βmaj- and βmin-globin genes (41), the chicken αA- and αD-globin genes (74), and the human δ- and β-globin genes (61). Finally, the formation of pseudogenes in some multigene families may be an extreme consequence of the isolation from sequence rectification mechanisms.
Acknowledgments-We thank Joyce Lauer and Tom Maniatis for providing the normal α-globin gene clones used in our sequence analysis and Douglas Higgs and John Phillips for the genomic DNAs used in the mapping studies. We are grateful to James Shen and coworkers for communicating their work prior to publication and to James Shen, Jack Szostak, Matthew Meselson, and Richard Kolodner for stimulating discussions on the mechanisms of genetic recombination and the evolution of multigene families. The excellent technical assistance of Sabra Goff is greatly appreciated.