Concerted Evolution of the Primate Immunoglobulin α-Gene through Gene Conversion

We determined four nucleotide sequences of the hominoid immunoglobulin α (Cα) genes (chimpanzee Cα2, gorilla Cα2, and gibbon Cα1 and Cα2 genes), which made it possible to examine gene conversions in all hominoid Cα genes. The following three methods were used to detect gene conversions: 1) phenetic tree construction; 2) detection of a DNA segment with extremely low variability between duplicated Cα genes; and 3) a site by site search for shared nucleotide changes between duplicated Cα genes. Results obtained from method 1 indicated concerted evolution of the duplicated Cα genes in the human, chimpanzee, gorilla, and gibbon lineages, while results obtained from method 2 suggested gene conversions in the human, gorilla, and gibbon Cα genes. With method 3 we identified clusters of shared nucleotide changes between duplicated Cα genes in the human, chimpanzee, gorilla, and gibbon lineages, and in their hypothetical ancestors. In the present study converted regions were identified over the entire Cα gene region, excluding a few sites in the coding region which have escaped gene conversion. This indicates that gene conversion is a general phenomenon in evolution that can be clearly observed in nonfunctional regions.

The abbreviations used are: CH, immunoglobulin heavy chain constant region; bp, base pair(s); CH, domain of immunoglobulin heavy chain constant region.
with that of the A2m(1) allele of the Cα2 gene, but is different from that of the A2m(2) allele. Thus, they argued that there was a localized transfer of genetic information from the 3' end of the Cα1 gene to the A2m(1) allele of the Cα2 gene through gene conversion.
Gene conversion is broadly defined as a nonreciprocal superposition of the information of one piece of DNA onto another (3). Although its molecular mechanism is not well understood, segmental homology between related genes is often attributed to gene conversion. It can occur between alleles on homologous chromosomes or between related genes on the same, sister, or nonhomologous chromosomes of an organism at mitosis or meiosis (4). Gene conversion has been suggested as a process of concerted evolution which maintains or creates sequence homogeneity in the related genes (5). The existence of concerted evolution can be shown when duplicated genes share the same nucleotide changes within a species but different changes between species, and gene conversion is suggested when the nucleotide positions with such changes are contiguous (6).
We previously studied the organization of the CH gene cluster in non-human primates (7-10). Chimpanzee and gorilla were found to have the Cε2-Cα1 and Cε1-Cα2 regions, as does human, although all exons and introns of the chimpanzee Cε2 gene are deleted (7). Orangutan, gibbon, and Old World monkey have one, two, and one Cα genes, respectively (8), and each has one Cε gene in its CH gene cluster (9). Our recent study shows that a duplication including the Cγ-Cγ-Cε-Cα genes occurred in the common ancestor of hominoids (human, chimpanzee, gorilla, orangutan, and gibbon), followed by the deletion of the gibbon Cε gene upstream from the Cα1 gene and also of one of the Cε-Cα regions in orangutan. Support for this gene phylogeny can also be found when the evolution of the Cα hinge region is considered. In human, chimpanzee, and gorilla, the hinge structures of the Cα1 and Cα2 genes differ in the same way (four tandem repeats of a 15-bp unit with a 6-bp overlap in the Cα1 genes and only one 15-bp unit in the Cα2 genes) (7, 10). The hinge region of the orangutan Cα gene is of the Cα1 type found in human, chimpanzee, and gorilla, while that of the gibbon Cα1 gene consists of two tandem repeats of a 15-bp unit with a 6-bp deletion. The hinge region of the gibbon Cα2 gene is of the Cα2 type found in human, chimpanzee, and gorilla (8). The most parsimonious explanation for the evolutionary history of the hominoid Cα hinge region is a gene phylogeny in which duplication of the Cα gene occurred in the common ancestor of hominoids and the deletion of the Cα2 gene occurred in the orangutan lineage (Fig. 1).
On the basis of the Cα gene phylogeny described above, we examined in the present study whether gene conversion has occurred in the duplicated Cα genes of primates. We sequenced the chimpanzee and gorilla Cα2 genes as well as the Cα1 and Cα2 genes of gibbon. These new sequence data made it possible to compare all the nucleotide sequences of hominoid and Old World monkey Cα genes. Our phylogenetic analysis showed that gene conversions occurred in the duplicated Cα genes of the chimpanzee, gorilla, gibbon, and their ancestors. We also found that human Cα gene conversion occurred not only between the Cα1 and the Cα2 [A2m(1)] genes but also between the Cα1 gene and the ancestral Cα2 gene of the two alleles.

FIG. 1. Phylogeny of the hominoid Cα genes. For convenience we used a trichotomous topology for the relationship among human, chimpanzee, and gorilla. Hu, Ch, Go, Or, Gi, and OWM indicate human, chimpanzee, gorilla, orangutan, gibbon, and Old World monkey, respectively. This tree topology is most parsimonious for the evolution of the Cα hinge region in hominoids. Since the hinge region of the ancestral Cα gene of hominoids is thought to consist of two tandem repeats of a 15-bp unit (8), four duplication or deletion events are required in its evolution: 1) a deletion of the first 15-bp unit of the Cα2 gene; 2) a 6-bp deletion from the first unit of the gibbon Cα1 gene; 3) a duplication of 2 units resulting in 4 units with a 6-bp overlap; and 4) a 6-bp deletion from the first unit of the orangutan Cα gene.

FIG. 2. Aligned nucleotide sequences of the primate Cα genes. The human Cα1 and Cα2 [A2m(1) and A2m(2) alleles (2)], the chimpanzee Cα1 and Cα2, the gorilla Cα1 and Cα2, the orangutan Cα, the gibbon Cα1 and Cα2, and the crab-eating macaque Cα genes (designated by Hu1, Hu2-1, Hu2-2, Ch1, Ch2, Go1, Go2, Or, Gi1, Gi2, and Cr, respectively) are shown. For clarity, the only complete sequence shown is the human Cα1 gene, and only nucleotides different from it are shown in the other Cα genes. Gaps are designated by hyphens. Dots represent positions for which no data are available. Noncoding and coding sequences are denoted by lowercase and capital letters, respectively. Landmarks for CH exons, introns, and the hinge region are indicated above the human Cα1 sequence, and those for the termination codon, the putative polyadenylation signal (aataaa), and the alternating purine and pyrimidine sequence are represented by Ter, poly-A, and Pu/Py, respectively. Human Cα1 and Cα2 gene sequences are from Flanagan et al. (2), and the nucleotide sequences of the chimpanzee Cα1, the gorilla Cα1, the orangutan Cα, and the crab-eating macaque Cα genes are from Kawamura et al. (28).

MATERIALS AND METHODS
DNA Manipulations-Recombinant phage clones containing the chimpanzee (Pan troglodytes) Cα2, gorilla (Gorilla gorilla) Cα2, and gibbon (Hylobates lar) Cα1 and Cα2 genes were isolated previously from their genomic libraries (7, 8). A restriction fragment containing the Cα gene from each phage clone was subcloned into pUC119 or Bluescript plasmid vectors in both orientations. Using exonuclease III and mung bean nuclease (Toyobo Co., Ltd.), varying degrees of unidirectional deletions were introduced into the insert of each plasmid. Single-stranded DNAs were rescued using a helper phage, and their nucleotide sequences were determined by the dideoxynucleotide chain termination method. All sequencing runs were performed on both sense and antisense strands. These DNA manipulations were performed using standard procedures (11).

The estimated number of nucleotide substitutions per site (K^c) between two DNA sequences was computed using the formula

$$K^c = -\frac{3}{4}\ln\left(1 - \frac{4}{3}K\right),$$

where K is the number of nucleotide differences per site (12). The variance of the K^c value is given by

$$V(K^c) = \frac{K(1-K)}{n\left(1 - \frac{4}{3}K\right)^{2}},$$

where n is the number of nucleotides compared (13). The standard error (S.E.) of the K^c value is given by the square root of V(K^c). The human Cα2 [allele A2m(2)] was excluded from this computation because its nucleotide sequence upstream from position 783 of Fig. 2 has not yet been published. In the computation, gap sites were excluded and the remaining 2354 bp were used.
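The distance and its standard error described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `proportion_different` and `jukes_cantor` are hypothetical, the sequences are assumed to be pre-aligned strings of equal length, and gaps are assumed to be written as hyphens.

```python
import math


def proportion_different(seq_a: str, seq_b: str, gap: str = "-") -> tuple[float, int]:
    """Return (K, n): the proportion of differing sites and the number of
    compared sites, after dropping every column that has a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    n = len(pairs)
    # Case is ignored because coding and noncoding letters may differ in case.
    diffs = sum(a.lower() != b.lower() for a, b in pairs)
    return diffs / n, n


def jukes_cantor(k: float, n: int) -> tuple[float, float]:
    """Return (K^c, S.E.) from the observed proportion k over n compared sites."""
    if not 0 <= k < 0.75:
        raise ValueError("k must lie in [0, 0.75) for the correction to be defined")
    kc = -0.75 * math.log(1 - 4 * k / 3)
    variance = k * (1 - k) / (n * (1 - 4 * k / 3) ** 2)
    return kc, math.sqrt(variance)
```

For a pair of aligned sequences, `jukes_cantor(*proportion_different(seq_a, seq_b))` would give the corrected distance together with its S.E.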
A relative rate test (14) was performed by comparing the evolutionary distances (K^c) from the crab-eating macaque Cα gene to the hominoid Cα genes, applying the statistical method of Wu and Li (15). When K^c between hominoid gene i and the crab-eating macaque gene (m) is denoted K^c_im, the variance of K^c_im - K^c_jm is given by the formula of Wu and Li (15).
A phenetic gene tree was constructed from the K^c distance matrix using the unweighted pair group method with arithmetic mean (16), and the S.E. value of each branching point of this tree was calculated using the method described by Nei et al. (17). Using these S.E. values we tested the statistical significance of the tree topology (see Ref. 17 for a detailed explanation of the method).
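As an illustration of the clustering step, the following minimal Python sketch builds an unweighted pair group (arithmetic mean) tree from a distance matrix. It is not the authors' program: the function name `upgma` and the dictionary-of-`frozenset` input format are assumptions made for this example, and the S.E. calculation for branching points (17) is not included.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by the unweighted pair group method with arithmetic mean.

    names: list of taxon labels.
    dist:  dict mapping frozenset({x, y}) to the distance between x and y.
    Returns a list of (cluster, height) in merge order, where cluster is a
    nested tuple of labels and height is half the distance at the merge.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        na, nb = sizes.pop(a), sizes.pop(b)
        new = (a, b)
        # Distance to the merged cluster is the size-weighted mean of the
        # two old distances, i.e. the mean over all original taxon pairs.
        for c in sizes:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[new] = na + nb
        merges.append((new, height))
    return merges
```

The method assumes a roughly constant substitution rate across lineages, which is why a relative rate test is applied before it is used.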
Regions of extremely low variability between the Cα1 and Cα2 genes of a species were identified using Stephens' (18) method under an assumption of random distribution of varied and unvaried sites. The probability (p) of observing at least one unvaried segment as long as or longer than the observed one is given by Stephens' formula (18), in which s is the number of varied sites, r is the total number of unvaried sites placed between varied sites, and g0 is the length of consecutive unvaried sites. An insertion/deletion site was regarded as one varied site regardless of its length.
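The idea behind this test can be illustrated with a short Python sketch. It is not a transcription of Stephens' formula, which is not reproduced in this text. Instead it assumes that the r unvaried sites are distributed over the s - 1 intervals between varied sites with every arrangement equally likely, and counts by inclusion-exclusion the arrangements in which at least one interval holds g0 or more unvaried sites. The function name `prob_long_unvaried_run` is hypothetical.

```python
from math import comb


def prob_long_unvaried_run(s: int, r: int, g0: int) -> float:
    """Probability that at least one of the s - 1 intervals between varied
    sites contains g0 or more of the r unvaried sites, assuming every
    arrangement of the sites is equally likely (an assumption of this sketch)."""
    if s < 2 or g0 < 1 or r < 0:
        raise ValueError("need s >= 2, g0 >= 1, and r >= 0")
    if g0 > r:
        return 0.0
    bins = s - 1
    # Number of ways to place r identical sites into `bins` intervals.
    total = comb(r + bins - 1, bins - 1)
    count = 0
    k = 1
    # Inclusion-exclusion over the number k of intervals forced to hold >= g0 sites.
    while k <= bins and k * g0 <= r:
        count += (-1) ** (k + 1) * comb(bins, k) * comb(r - k * g0 + bins - 1, bins - 1)
        k += 1
    return count / total
```

A small p for the observed longest unvaried segment would indicate that the segment is longer than expected under random placement of varied sites.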
A site by site reconstruction of gene conversion was performed using the method of Fitch et al. (6). At each aligned position (Fig. 2), a parsimony reconstruction was made of the nucleotide changes which occurred, assuming the Cα gene phylogeny shown in Fig. 1. Each insertion/deletion was regarded as 1 nucleotide change regardless of its length. At every varied site, each variant sequence was classified into one of four categories: (a) variants shared between the Cα1 and Cα2 genes of a single species (shown as open circles in Figs. 5a and 6); (b) variants shared among orthologous Cα1 or Cα2 genes of different species, but not between paralogous Cα1 and Cα2 genes within a species, or those shared between the two alleles of the human Cα2 gene but not with the human Cα1 gene (shown as open squares in Figs. 5b and 6); (c) variants in which paralogous genes of the same species had the same nucleotide (shown as triangles when the Cα1 gene changed, or inverted triangles when the Cα2 gene changed, in Figs. 5c and 6); and (d) all other variants (shown as full circles in Figs. 5d and 6). (Orthologous genes are related genes which are derived by speciation from a single common ancestral gene, whereas paralogous genes are those which arose from gene duplication events within a species.) In cases where there were multiple equally parsimonious solutions, we chose category d because this provides a conservative answer concerning the occurrence of gene conversion (e.g. position 47 of Fig. 2). When either category a or c was possible, category a was chosen as a conservative determination of gene conversion direction (e.g. position 18 of Fig. 2). When two equally parsimonious solutions fell into the same category, both solutions were indicated by a pair of bars (left and right, up and down, left-up and right-down, or left-down and right-up) (see Fig. 6). If two or more a or c sites were contiguous, a gene conversion event over the region containing those sites was inferred. On the other hand, if b sites were contiguous, a gene conversion over the region was not inferred (6).
It should be noted that the boundaries of the converted regions were defined as the outermost a or c sites lacking two or more consecutive b sites. In the original method (6, 19-21), boundaries were defined as the b sites which flank a cluster of a or c sites, but this definition gives an upper limit for a converted region and there is a risk of overestimating the span of conversions.
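The boundary rule can be made concrete with a small Python sketch. It encodes one possible reading of the rule and is not the authors' procedure: a cluster of a or c sites is assumed to continue across a single b site, to be closed by two or more consecutive b sites, and to require at least two a or c sites; d sites are assumed to be ignored. The function name `converted_regions` and the input format are hypothetical.

```python
def converted_regions(sites):
    """Infer candidate converted regions from classified varied sites.

    sites: list of (position, category) sorted by position, with category
           one of "a", "b", "c", "d".
    Returns a list of (start, end) positions, each spanning the outermost
    a or c sites of a cluster (assumed reading of the boundary rule).
    """
    regions = []
    cluster = []  # positions of a/c sites in the currently open cluster
    b_run = 0     # consecutive b sites seen since the last a/c site
    for position, category in sites:
        if category in ("a", "c"):
            # Two or more consecutive b sites close the open cluster.
            if b_run >= 2 and cluster:
                if len(cluster) >= 2:
                    regions.append((cluster[0], cluster[-1]))
                cluster = []
            cluster.append(position)
            b_run = 0
        elif category == "b":
            b_run += 1
        # Category d sites are skipped (assumption of this sketch).
    if len(cluster) >= 2:
        regions.append((cluster[0], cluster[-1]))
    return regions
```

Because the region ends at the outermost a or c site rather than at the flanking b sites, the spans returned are lower rather than upper estimates of the converted stretch.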

RESULTS AND DISCUSSION
We determined the nucleotide sequences of the chimpanzee Cα2, gorilla Cα2, and gibbon Cα1 and Cα2 genes, and compared them with the human Cα1 and Cα2 [A2m(1) and A2m(2) alleles], chimpanzee Cα1, gorilla Cα1, orangutan Cα, and crab-eating macaque Cα genes (Fig. 2). All sequences have the same exon-intron organization and have neither a termi-

Phenetic Tree Construction-Table I shows the K^c distance matrix for primate Cα genes. Other estimation methods (22, 23) for superimposed nucleotide substitutions gave essentially the same K^c values as did the Jukes-Cantor method used in the present study (data not shown). A relative rate test (14) was performed for the hominoid genes with the crab-eating macaque gene as an outgroup reference using Wu and Li's (15) statistical method (Table II). No significant variation was observed among hominoid genes in their evolutionary distances from the crab-eating macaque gene (see Table II), indicating an almost constant evolutionary rate for hominoid Cα genes. Therefore, we used the unweighted pair group method with arithmetic mean (16) to construct a phenetic gene tree (see Fig. 3), because this method produces a good evolutionary tree when the expected rate of nucleotide substitution is constant and because clusterings can be easily tested using S.E. values for each branching point (17).
In the phenetic tree, the two Cα genes of human, chimpanzee, gorilla, and gibbon formed statistically significant clusters in every species (clusters A, C, B, and D of Fig. 3, respectively) at the 5, 5, 0.5, and 0.01% levels, respectively, when the one-tailed t test was used with infinite degrees of freedom. A clustering of the two paralogous Cα genes in every species was also observed in gene trees constructed using the neighbor-joining method (24) and the maximum parsimony method (25), neither of which assumes rate constancy of nucleotide substitution (data not shown). Because of the Cα gene duplication in the common ancestor of hominoids and the deletion of the orangutan Cα2 gene (see Introduction and Fig. 1), these results indicate that concerted evolution of the Cα genes occurred in the human, chimpanzee, gorilla, and gibbon lineages.
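A t test with infinite degrees of freedom is equivalent to a test against the standard normal distribution, so the one-tailed significance level can be computed as in the Python sketch below. The function name `one_tailed_p` and its two inputs (a tested quantity such as a difference between branching-point heights, and its S.E.) are assumptions made for illustration; how those inputs are derived from the tree follows Ref. 17 and is not shown here.

```python
import math


def one_tailed_p(difference: float, standard_error: float) -> float:
    """Upper-tail probability of the standard normal distribution for
    z = difference / standard_error (t test with infinite degrees of freedom)."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = difference / standard_error
    # P(Z >= z) for a standard normal variable Z.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Under this convention a ratio of about 1.645 corresponds to the 5% level.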
Statistical Test for Detecting a Cluster of Unvaried Sites-To examine whether gene conversion is the underlying mechanism for the concerted evolution of hominoid Cα genes and to elucidate where it occurred in the gene region, we searched for consecutive unvaried regions between paired human, chimpanzee, gorilla, and gibbon Cα sequences. Observation of such regions is expected when gene conversion has occurred between two gene loci. Fig. 4 shows the distribution of the varied sites between the Cα1 and Cα2 genes. Varied sites include sites of nucleotide substitution and insertion/deletion. An insertion/deletion site was regarded as one varied site regardless of its length. Under the assumption of random distribution of varied and unvaried sites, regions of extremely low variability can be identified using Stephens' (18) method. Therefore, we applied this method to our data. In the computation we excluded the hinge region because this region is known to have evolved through frequent recombinational events (8). The longest unvaried segment was significantly long in the human, gorilla, and gibbon Cα genes (see Fig. 4). Since the next longest segments in human, gorilla, and gibbon were not significantly long, we excluded the longest segment and recalculated the probability of the next longest segment. In this procedure, r is replaced by (r - longest g0) and s by (s - 1) (18). The next longest segment was significantly long only in gibbon (Fig. 4). Thus we identified the regions which have extremely low variability under the assumption of random distribution of variable sites (regions 2, 4, 6, 8, and 9 in Fig. 4; see the figure legend for details).
Site by Site Reconstruction of Gene Conversions-Stephens' (18) method is effective in detecting gene conversions spanning long stretches of sequence, but we cannot rule out the possibility that low variability is attributable to a higher selection pressure or a lower mutation rate than in other regions. Therefore, in the next step, we applied the site by site reconstruction method which was used to identify γ1-/γ2-globin conversions and δ-/β-globin conversions during primate evolution (6, 19-21). This method 1) is effective in detecting gene conversions both in short stretches of sequence and in long ones with nucleotide changes after conversion, both of which Stephens' method fails to detect; 2) is capable of examining whether low variability between the two Cα genes of a species is due to a lack of nucleotide changes or to shared nucleotide changes between them; and 3) is capable of detecting gene conversions in ancestral species.
Once a gene tree is determined, we can identify where a nucleotide change occurred in the tree and the nature of the sequence change on the basis of the maximum parsimony principle. Each insertion/deletion was regarded as 1 nucleotide change regardless of its length. Each variant site was classified into four categories, a, b, c, and d (see "Materials and Methods" and Fig. 5), assuming the gene tree of Fig. 1. By mapping these sites along the Cα gene region for the paralogous gene pair of each species including ancestors, we identified clusters of shared nucleotide changes between paralogous Cα genes of each species (Fig. 6).
In gorilla and gibbon, the regions of extremely low variability identified in Fig. 4 overlap the clusters of shared nucleotide changes categorized as a (Fig. 6, V and VI). This indicates that the low variability between the duplicated Cα genes is due not to a scarcity of nucleotide changes in these regions but to shared nucleotide changes, supporting the gene conversions in these regions suggested by Stephens' (18) method. Besides this large cluster of a sites in the region downstream from intron 2, small stretches of possible gene conversions were observed in the region upstream from the CH2 exon (Fig. 6, V and VI).
In human genes, however, the conversion in region 2 of Fig. 4, which was identified using allele A2m(1) as the Cα2 gene, is indicated only by a short cluster of two a sites (Fig. 6I). Furthermore, the conversion in region 4 of Fig. 4, which was identified using allele A2m(2) as the Cα2 gene, is not indicated by any such cluster (Fig. 6II). Instead, we found a long stretch of a sites in the gene pair of the human Cα1 gene and the ancestral Cα2 gene of the two alleles, which overlaps regions 2 and 4 of Fig. 4 (Fig. 6III). Therefore, this conversion appears to have occurred before the divergence of the human Cα2 gene into the present two alleles. In the Cα1 and Cα2 [allele A2m(1)] gene pair, three possible conversions with short stretches were found (Fig. 6I).

Although Stephens' (18) method detected no significant stretch of unvaried sites in chimpanzee (Fig. 4), we identified three possible regions of gene conversion with the site by site method (Fig. 6IV). The distribution of these regions is similar to that observed in human, gorilla, and gibbon, i.e. the largest one lies in the region downstream from intron 2 and smaller ones lie in the region upstream from the CH2 exon. Therefore, the lack of a significantly long stretch of unvaried sites in this downstream portion in chimpanzee is considered to be due to nucleotide changes after conversion. Possible converted segments were also observed in the hypothetical gene pair of the common ancestor of human, chimpanzee, and gorilla, and in that of orangutan and human, chimpanzee, and gorilla (see Fig. 6, VII and VIII), the distributions of which are similar to those in extant species with the feature described above.

FIG. 5. Classification of varied sites into categories a, b, c, and d. In order to find gene conversions, we use four hypothetical varied sites. In case a, the root nucleotide is c, thus the variants are the gorilla Cα1 and Cα2 genes. These variants are caused by shared nucleotide changes (c to g) between the paralogous genes in a single species, suggesting a gene conversion in gorilla over this site, and they are represented by open circles. In case b, the root nucleotide is g and a change (g to a) occurs in the common ancestral Cα2 gene of hominoids (this change is not shared with its paralog and is indicated by a full circle). This varied nucleotide is retained during hominoid evolution among the Cα2 genes, and no additional change occurred in the Cα1 genes of hominoids, suggesting no gene conversion after the hominoid divergence over this site. Therefore, we give open squares to the Cα2 genes of human, chimpanzee, gorilla, gibbon, and hypothetical ancestors at this varied site. Case c is different from case b in that additional nucleotide changes occur in the chimpanzee Cα1 and the gorilla Cα2 genes after a nucleotide change occurred in the common ancestral Cα1 gene of hominoids. These additional changes result in the same nucleotide as in their paralogs, suggesting a gene conversion of the chimpanzee Cα1 gene by the Cα2 gene and of the gorilla Cα2 gene by its Cα1 gene over this site. Therefore, triangles are given to the chimpanzee Cα1 gene and inverted triangles are given to the gorilla Cα2 gene. In case d, the root is t and changes from t to c occur in the gibbon Cα1 and the gorilla Cα2 genes. These nucleotide changes are not shared either by paralogs or orthologs, and they are given full circles. Hu, Ch, Go, Or, Gi, and Cr indicate human, chimpanzee, gorilla, orangutan, gibbon, and crab-eating macaque, respectively. 1 and 2 indicate Cα1 and Cα2 genes, respectively. The A2m(1) and A2m(2) alleles of the human Cα2 gene are shown as 2(1) and 2(2), respectively.
Both the largest segments of possible gene conversions suggested by the site by site method and the significantly long segments of unvaried sites detected by Stephens' (18) method were always found downstream from intron 2. In addition, no category b site, which does not support the occurrence of gene conversion, was observed in this downstream region in any case except for the two human cases using the two Cα2 alleles (Fig. 6). These results suggest that the frequency of gene conversion is higher in this downstream portion than in the remaining upstream portion. As Flanagan et al. (2) pointed out, an alternating purine and pyrimidine sequence (positions 2277-2316 of Fig. 2) might trigger frequent gene conversions in the downstream portion.
Directions of some conversions were estimated based on category c sites as shown in Fig. 6. Conversions of the Cα1 gene by the Cα2 gene were identified in three clusters containing sites marked with triangles: two in chimpanzee and one in HCG. In gorilla, conversion of the Cα2 gene by the Cα1 gene was identified in a short cluster containing sites marked with inverted triangles. However, it is not clear whether a preferred direction of gene conversion exists in Cα gene conversions, unlike those observed in the γ1-/γ2-globin and δ-/β-globin gene regions (6, 19-21).
One drawback of the site by site reconstruction method is that the parsimonious solution is not always true, and we might miss or erroneously identify gene conversions. Human Cα2 [allele A2m(2)] has nine changes of category d (see region A in Fig. 6II), which are regarded by the site by site method as substitutions after the divergence of the A2m(1) and A2m(2) alleles, while the A2m(1) allele has no such change in this region. Therefore, there is a possibility that, as Flanagan et al. (2) proposed, the A2m(1) allele was converted by the Cα1 gene in region A after divergence into the two alleles, while the A2m(2) allele remains unconverted. Nevertheless, as shown in the present study, the site by site method, which uses a phylogenetic approach, gives more information about gene conversion than a mere comparison of the duplicated genes of a single species.
Gene conversion has been proposed as a process of concerted evolution of related genes. When the conversions involve a donor gene that is a more distantly related family member, they might generally result in short regions with high diversity within a family of closely related genes (see Refs. 5 and 26 for reviews). In the Cα gene loci, gene conversion apparently causes the concerted evolution. The phenomenon of concerted evolution has been considered an efficient mechanism for fixing selectively advantageous mutations among all of the members of a multigene family (5). However, category b positions 309 and 310 in the CH1 exon and 693-707 in the hinge region have escaped gene conversion in all the hominoid species with two Cα genes. It has been suggested that the long hinge region of IgA1 is effective in allowing the antigen-binding arms to move and thus bind to antigens, whereas the short hinge region of IgA2 is important for protection from proteases produced by pathogenic bacteria in the secretory fluids where IgA is characteristically found (2, 27). Thus, these category b sites might be the consequence of some functional difference between IgA1 and IgA2 that has been maintained by selection. Moreover, the converted regions identified in the present study cover a large part of the Cα gene, including the downstream region encompassing the polyadenylation signal. These observations suggest that most gene conversions that have spread through a lineage are selectively neutral and that disadvantageous gene conversions have been eliminated by natural selection. Therefore, we propose that gene conversion is a general phenomenon in evolution and will be found wherever duplicated sequences occur in the genome.