The complete murine immunoglobulin class switch region of the alpha heavy chain gene: hierarchic repetitive structure and recombination breakpoints.

A 7255-base pair (bp) sequence, including the previously sequenced murine segments of I alpha, S alpha, and C alpha, has been completed. Homology matrix comparison revealed a switch repetitive region of 4.2 kilobases (kb) composed of 20-80-bp homology runs, including the previously assigned S alpha region. We distinguished several stretches of duplication, i.e. the central 0.8-kb repetitive region, with some 80-bp staggered consensus repeats containing 20-30-bp subsets, made up of the primordial pentamers CTG(A/G)G. All the breakpoints of the S alpha switch recombination, including those generated by the translocation of the c-myc protooncogene and those catalyzed by bacterial extracts, are located within the consensus sequence subsets of the 4.2-kb repetitive region.

the primary switch recombination, because chromosomal breakpoints in myeloma cells, virus-transformed cell lines, and hybridomas (4, 9-14) were often found outside such structurally distinct switch-repetitive regions, possibly due to frequent secondary deletions. Upon analyzing lipopolysaccharide/transforming growth factor β-induced switch circular DNA, we found that two Sα breakpoints mapped to a new region, outside the previously assigned Sα region, that consisted of a simple repetition of 5-base consensus sequences (15). Since only limited sequence data on Sα regions are available, it has been difficult to relate the Sα breakpoints to longer stretches of switch-repetitive sequences. We isolated several overlapping deletion fragments located between already sequenced segments and connected four data base sequences into a total of 7255 bp. We searched for repetitive sequences by a computerized sorting method and found a switch-repetitive sequence of 4.2 kb that displays an underlying hierarchic repetitive structure defined by the primordial pentamers CTG(A/G)G, which organize into larger repetitive units of 10, 20, 30, 80, and 800 bp.

EXPERIMENTAL PROCEDURES
The nucleotide sequencing strategy for repetitive sequences is described under "Results." Several deletion fragments of pCS16y and pCS14a derived from BALB/c mice were prepared as follows. Both fragments were recloned into the polylinker cloning sites of pHSG399 and digested with SphI. Purified SphI fragments were digested with XbaI. These SphI-XbaI fragments were treated with exonucleases III and VII as described (16). Deletion fragments were self-ligated after deleting a stretch of nucleotides with T4 DNA polymerase using a DNA blunting kit (Takara Shuzo Ltd., Kyoto, Japan) and transfected into Escherichia coli DH5α recA1 endA1. A computer program was used to generate diagonal lines indicating segments 20 bases long that show homology above a threshold level (17).

RESULTS
Sequencing Strategy-The 10-kb EcoRI fragment (IgH703) isolated previously from BALB/c mice (18) contains the region between the Iα segment and the Cα region in which the entire Sα region was supposed to be located. Four segments of this fragment have been sequenced and registered in the data bases M29011, X62548, J00474, and J00475, as shown in Fig. 1. The sequence data available under accession number X62548 were previously constructed from the overlapping Sαμ circular DNA clones (15). To connect these four segments, we subcloned five gene fragments covering the three sequence gaps between the sequenced segments (a 4.2-kb XbaI fragment (pCS15), a 0.6-kb XbaI fragment (pCS51), a 2.5-kb SacI-XbaI fragment (pCS16y), a 0.6-kb PstI-XbaI fragment (pCS16a), and a 1.8-kb XbaI-EcoRI fragment (pCS14a)) (Fig. 1). These genomic subclones were stable during their propagation in the E. coli DH5α (RecA-, EndA-) strain, whereas Cα genomic clones were unstable during propagation in E. coli RecA+ strains (19). Since the Sα region has been shown to have strong internal homologies (repeats), one cannot sequence it by shotgun cloning or by chromosome walking with synthetic oligonucleotides; these approaches are technically easier but tend to misalign the repetitive sequences. To avoid such misalignment, we prepared several overlapping fragments, which were deleted at different distances from the 3'-end of pCS16y and the 5'-end of pCS14a (a-g and A-J). Nucleotide sequences of these fragments were determined by the dideoxy chain termination method (20) using the primers indicated in Fig. 1. Several sequence discrepancies, probably due to polymorphism between mouse strains or sequencing errors, were found in the overlapping sequences of data bases M29011, J00474, and J00475. Nucleotides were numbered consecutively from the start of data base M29011.
A new complete 7255-bp data base (accession number D11468) was assembled comprising the four already registered data bases and new sequence stretches (positions 357-3029 and 4013-6374) connecting these registered sequences.
Internal Homologies-To find internal homologies, we generated a homology matrix by comparing the complete sequence with itself. Segments of 20 bp having more than 80 and 90% homology to other segments are shown by thin and bold (double width) diagonal lines, respectively (Fig. 2a). Internal homologies are clustered in a 4.2-kb area. We found three long direct repeats, DR1, DR2, and DR3, in the 5' to 3' orientation. DR1 and DR3 repeats share homology with other repetitive sequences, whereas the DR2 repeat shows a unique sequence unit. We looked back directly at the original data to find the maximum unit length of these repeats. DR2 is composed of homopurine stretches of 130 bp at position 3140-3269 reading (AGGAG)2AAGAG(AGGAG)23, and DR1 consists of 65 nucleotides at position 3070-3134 composed of (CTGAG)13, as found previously (21). DR3 comprises 132 nucleotides at position 4369-4500 reading (('M'AGT)(CTGG-
Internal homologies of longer stretches are most enriched in the central part at position 2.3-3.1 kb, followed by the downstream region. Internal homologies of only shorter stretches were found in the region upstream of the central part. We aligned the long internal homologies at position 2269-3871, excluding DR2, and tried to maximize sequence similarity in the alignment by manually inserting gaps (Fig. 3a). A consensus sequence of an 80-bp repeating unit showing more than 60% homology to its individual repeats was identical to the prevalent sequence of Sα (18) previously processed from data base J00474 (Fig. 3b). The 80-bp consensus sequence contains two 30-bp repeating units, which are synonymous to the 5' Cα consensus sequence (9), and three 20-bp repeating units (Fig. 3b). Further analysis of the 80-bp consensus sequence revealed eight 10-bp repeats (NTGRGCTGGG) as well as the primordial pentamers (CTGRG). These pentamers were nearly homologous to both of the pentamer units of the Sμ region, CTGAG and GTGGG.
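The windowed self-comparison described above can be illustrated with a minimal sketch. It is not the program of reference 17, whose scoring details are not given here; it assumes simple ungapped identity counting over 20-bp windows, and the function names and the toy sequence are hypothetical.

```python
def count_matches(a, b):
    """Number of positions at which two equal-length strings agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def homology_hits(seq_x, seq_y, window=20, min_percent=80):
    """List (i, j, matches) for window pairs whose identity exceeds min_percent.

    i and j are 0-based window starts in seq_x and seq_y. Each hit would be
    drawn as one short diagonal segment in a homology matrix. The scan is
    brute force (every window against every window), which is enough for a
    sketch but slow for long sequences.
    """
    hits = []
    n_x = len(seq_x) - window + 1
    n_y = len(seq_y) - window + 1
    for i in range(n_x):
        wx = seq_x[i:i + window]
        for j in range(n_y):
            m = count_matches(wx, seq_y[j:j + window])
            # Integer comparison avoids floating-point edge cases at the cutoff.
            if m * 100 > min_percent * window:
                hits.append((i, j, m))
    return hits


# Toy input (illustrative only): eight tandem CTGAG copies plus unrelated sequence.
toy = "CTGAG" * 8 + "ATATATCGCGTTAACCGGTA"
thin = homology_hits(toy, toy, min_percent=80)   # "thin line" threshold
bold = homology_hits(toy, toy, min_percent=90)   # "bold line" threshold
off_diagonal = sorted({(i, j) for i, j, _ in thin if i != j})
```

In a self-comparison the main diagonal (i equal to j) is trivially present; tandem repeats show up as the off-diagonal hits, displaced by multiples of the repeat unit length.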
Recombination Breakpoints-To analyze the prevalence of the 80-bp consensus sequence motif enriched in the central part of the internal homology area, we compared it with the whole 7255-bp sequence to see how far the matches would extend. The matches and their lengths were displayed in a homology matrix in which diagonal lines indicate segments of 20 bp that show more than 75% homology (thin line) or more than 80% homology (double-width bold line) (Fig. 2b).
These consensus sequence motifs are most strikingly distributed within the internal homology area of 4.2 kb between positions 627 and 4806, excluding DR2 (Fig. 2b). On the basis of nucleotide sequence matching, we assigned every Sα recombination breakpoint sequenced to date to its position in the germ line 7255-bp sequence, except for two breakpoints, M104E (22) and T2-4 (6), whose sequences were not found at the expected positions 2.3 and 2.8 kb, respectively (Fig. 2b). All breakpoints were located within stretches of at least 20 bp homologous to the consensus sequence. Our sample size was 35 (12 for switch circular DNA, 9 for switch recombination in myeloma, 10 for myc translocation or deletion in myeloma, and 4 for the recombination catalyzed by bacterial extracts). Furthermore, one of the primordial pentamers (CTGGG) was found within 10 bp of most breakpoints. This suggests the involvement of the longer stretches of consensus sequences, or of the subsets including CTGGG, in the recombination of Sα switch-repetitive regions. The internal homology area of 4.2 kb, which we now term the Sα region, contains subsets of consensus sequences enriched in recombination hot spots. This result clearly indicates that the region defined by its repetitive structure and homology to the consensus sequence actually acts as a functional Sα region.
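The observation that CTGGG lies within 10 bp of most breakpoints amounts to a nearest-motif distance check, which can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: breakpoints are treated as single 0-based coordinates on one strand, and the function names, the toy sequence, and the toy breakpoint positions are all hypothetical.

```python
import re


def motif_starts(seq, motif="CTGGG"):
    """0-based start of every occurrence of motif, overlapping ones included."""
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]


def distance_to_nearest(position, starts, motif_len=5):
    """Distance in bp from a 0-based position to the closest motif copy.

    Returns 0 when the position lies inside a motif copy and None when
    no motif copy was found at all.
    """
    best = None
    for s in starts:
        end = s + motif_len - 1  # last base of this motif copy
        if position < s:
            d = s - position
        elif position > end:
            d = position - end
        else:
            d = 0
        if best is None or d < best:
            best = d
    return best


# Toy data (illustrative only, not taken from the 7255-bp sequence).
toy_seq = "A" * 5 + "CTGGG" + "A" * 10 + "CTGGG" + "A" * 5
toy_breakpoints = [0, 7, 12, 14]
starts = motif_starts(toy_seq)
distances = [distance_to_nearest(p, starts) for p in toy_breakpoints]
near = [p for p, d in zip(toy_breakpoints, distances)
        if d is not None and d <= 10]
```

With real data, the same check would be run on the germ line coordinates assigned to each breakpoint; scanning the reverse strand as well would be a separate design decision.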
We screened the complete 7255-bp sequence for a heptamer consensus motif (YAGGTTG), which had been found near the majority of switch recombination sites in plasma cell tumors and hybridoma lines (23, 24). Six of the seven heptamers were found in a small region within Sα from positions 3281 to 3766, and the remaining heptamer was outside Sα (5240-5246). This paucity of the heptamer consensus motif in the flanking region of Sα might correlate with the fact that all the switch α breakpoints in myeloma were located within the Sα region, whereas myeloma switch μ breakpoints were often located outside the Sμ region (6, 15).
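A degenerate-motif screen of this kind can be sketched with a regular expression built from IUPAC codes (Y = C or T, R = A or G, N = any base). The sketch is an assumption-laden illustration: it scans the given strand only, reports 1-based inclusive coordinates in the style used above, and the helper names and toy sequence are hypothetical.

```python
import re

# Subset of the IUPAC nucleotide codes needed for the motifs in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as YAGGTTG into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq, motif):
    """Return 1-based (start, end) coordinates of every motif occurrence.

    A lookahead is used so that overlapping occurrences are all reported.
    Only the strand given in seq is scanned.
    """
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    size = len(motif)
    return [(m.start() + 1, m.start() + size)
            for m in pattern.finditer(seq.upper())]


# Toy sequence (illustrative only) holding one CAGGTTG and one TAGGTTG.
toy = "GGCAGGTTGAATAGGTTGCC"
heptamers = find_motif(toy, "YAGGTTG")
```

The same helper accepts the 10-bp repeat NTGRGCTGGG or the pentamer CTGRG, so one function covers all the degenerate motifs named in this section.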

DISCUSSION
Three major classes of highly repetitive DNAs have been identified in primates: tandemly repeated sequences, such as the α-satellite, as well as both short (SINE) and long (LINE) interspersed repetitive nucleotide sequences (25). However, the Sα switch-repetitive region described here is different from these highly repetitive DNAs, since the Sα sequence repeats are well organized in different unit lengths of hierarchic classes (4.2 and 0.8 kb and 80, 30, 20, 10, and 5 bp). This suggests the duplication of an ancestral chromosomal region. Two long internal repeats, DR1 and DR2, are conserved in the central part of the internal homology region. DR1 is composed of tandem primordial pentamers constituting an 80-bp consensus sequence of recombination hot spots. The central 0.8-kb region containing DR1 is rich in this consensus sequence. Several duplications of the 0.8-kb region may have formed the 4.2-kb Sα switch-repetitive region, while DR2, which lacks recombination hot spots, remained unduplicated. This duplication of the Sα switch region may have proceeded independently of the evolution of the CH gene locus, presumably generated by Cγ duplication in the ancestral chromosomal region containing Cμ-Cδ-Cγ-Cε-Cα (26, 27). The higher order structure of switch-repetitive regions found in the Sα region seems to be a common feature of the longer S regions, Sγ3, Sγ1, and Sγ2b (28-30). Duplication events in switch regions may have elongated S regions. This can be correlated with the relative contents of immunoglobulin classes in mouse serum (3). During duplication events, the primordial pentamer sequences degenerated in all the Sγ regions but remained well conserved in the Sμ, Sε, and Sα regions (3). The Sα region resembles the Sμ and Sε regions with respect to the conservation of primordial pentamers and the Sγ regions with respect to the higher order duplication structure.
Switch recombination sites tend to share little homology, as shown previously (3, 6-8, 15). Thus, the enzymes that mediate the cutting-religation reaction do not seem to recognize an obviously conserved sequence. However, the close correlation between recombination breakpoints and the Sα consensus sequence suggests that switches to Sα involve recognition of at least a 20-bp motif or a subset of the 80-bp consensus sequence including the primordial pentamer CTGGG. Longer stretches of the GC-rich sequence may be required for the particular pairing of switch regions in recombination. Thus, the class switch region may be defined as the sequence enriched in consensus sequences targeted by switch recombinase. This is the first evidence of an Sα region in which the sequence structure defined by internal homology agrees with the functional structure defined by recombination breakpoints.