A 66-base pair insert bridges the deletion responsible for a mouse model of beta-thalassemia.

The breakpoints of the deletion responsible for the Hbb(th-1) mouse model of beta-thalassemia have been isolated. A 3709 (+/- 2)-base pair (bp) region, including the entire beta major globin gene and 2 kilobases of 5' flanking region, is deleted. A novel 66 (+/- 2)-bp sequence, ending in a stretch of 25 dA:dT base pairs, was found to bridge the deletion. A region of the normal murine genome, containing the first 43 bp of the deletion-associated insert (DAI), but lacking the 25-bp dA:dT sequence, was isolated. All normal mice tested contain this DAI-like element and several inbred strains contain an additional DAI-like element. The sequence spanning the Hbb(th-1) deletion may be a reverse transcript of this region.

Many deviations from normal heredity are the result of recombinatorial events (Lewin, 1985). These may be manifest as an unusual combination of traits within an individual, the production of novel protein products (e.g. immunoglobulins; Tonegawa, 1983), or the appearance of mutations (e.g. hemoglobin Lepore; Baird et al., 1981). The analysis of the primary nucleotide sequence at the sites of recombination has provided clues to the mechanisms by which such rearrangements might take place (Baltimore, 1985; Höchtl and Zachau, 1983).
Deletions in the globin loci can produce easily identified, non-lethal mutations (Collins and Weissman, 1984). Indeed, many of the mammalian deletions for which primary sequence data are available involve the human globin loci. Some of these are believed to confer a selective advantage upon the heterozygote (Weatherall and Clegg, 1972), and many of the best characterized deletions have persisted in populations for long periods of time. Over time, the regions surrounding these events may have undergone further mutation and/or recombination, making it difficult to define the original event.
A mouse model of β-thalassemia has been described, and the genetic locus associated with it has been designated Hbb(th-1) (Skow et al., 1983). In the homozygous state these animals have a microcytic, hypochromic anemia with a profound reticulocytosis. Their peripheral smears are remarkably similar to those seen in human β-thalassemia (Skow et al., 1983). Chromatographic separation of denatured globin chains reveals that the β major globin chain, which normally represents 80% of the total adult β-globin in this (Hbb(d)) variety of mouse (Russell and McFarland, 1974), is absent.
Despite their anemia, mice homozygous for the Hbb(th-1) allele survive to adulthood and are able to reproduce (Popp et al., 1984).
The underlying defect responsible for the Hbb(th-1) phenotype, which arose spontaneously, is the deletion of the β major globin gene (Skow et al., 1983). Both the β minor globin gene (3' to β major) and the β-H3 pseudogene (5' to β major) are intact. On the basis of the restriction and hybridization patterns of genomic DNA, the deletion was estimated to be 3.3-3.6 kb,¹ covering the entire β major globin coding region (Skow et al., 1983). The chromosomal region corresponding to the Hbb(th-1) locus has now been isolated, and the sequence of the DNA spanning the breakpoints of the deletion has been determined. Analysis of this portion of the genome of the mouse model of β-thalassemia provides several clues to the mechanism by which this, and possibly other, deletions arose.

MATERIALS AND METHODS

Restriction Fragment Probes-Probes for areas related to the murine β major globin gene were prepared as restriction enzyme digests of the 7.4-kb EcoRI fragment described by Konkel et al. (1978). Restriction enzymes were purchased from New England Biolabs or Bethesda Research Laboratories, and digestions were performed according to the conditions specified by the supplier. The fragments were separated by agarose gel electrophoresis, electroeluted, extracted with phenol and chloroform, and then ethanol-precipitated. Radiochemicals were purchased from Amersham Corp. Restriction enzyme fragment probes were labeled by nick translation (Rigby et al., 1977) using [α-³²P]dATP as substrate.
Southern Blotting-DNA was extracted from the livers of the various mouse lines by the method of Gross-Bellard et al. (1973). Restriction enzyme digests of genomic DNA were transferred to nitrocellulose (Schleicher & Schuell, BA 85) after agarose gel electrophoresis as described by Southern (1975). Restriction digests of recombinant phage DNA were transferred simultaneously to two sheets of nitrocellulose by sandwiching the gel between them after soaking both the gel and the nitrocellulose in 20× SSC (SSC = 0.15 M NaCl, 0.015 M sodium citrate). Blots were baked under vacuum at 80 °C for 2 h and stored desiccated until use. They were then prehybridized for 4-18 h in 6× SSC, 5× Denhardt's solution (Denhardt, 1966), 0.5% sodium dodecyl sulfate (SDS). Labeled probe was denatured by boiling for 10 min and added to the prehybridization mixture, and the incubation was continued for an additional 12-18 h. Blots were washed four times for 20 min at room temperature in 2× SSC, 0.5% SDS and twice for 20 min in 0.2× SSC, 0.5% SDS at the hybridization temperature, air-dried, and exposed to x-ray film (XAR, Kodak) for periods ranging from 5 min (for phage DNA) to 5 days (for murine genomic DNA).
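The SSC strengths used in these blotting and washing steps all derive from the 1× definition given above. The short Python sketch below, offered only as an illustration, converts an n× SSC strength into molarities and grams per litre and computes the stock volume needed for a dilution. It assumes the citrate salt is trisodium citrate dihydrate (294.10 g/mol), which the text does not specify, and the function names are hypothetical.

```python
# Sketch: SSC buffer arithmetic. 1x SSC = 0.15 M NaCl, 0.015 M sodium citrate (from the text).
NACL_1X_M = 0.15
CITRATE_1X_M = 0.015

# Molar masses in g/mol. Assumption: trisodium citrate dihydrate is the citrate salt used.
NACL_G_PER_MOL = 58.44
CITRATE_DIHYDRATE_G_PER_MOL = 294.10


def ssc_molarities(strength: float) -> tuple[float, float]:
    """Return (NaCl, sodium citrate) molarities for an n x SSC solution."""
    return strength * NACL_1X_M, strength * CITRATE_1X_M


def ssc_grams_per_litre(strength: float) -> tuple[float, float]:
    """Return grams of NaCl and of trisodium citrate dihydrate per litre of n x SSC."""
    nacl_m, citrate_m = ssc_molarities(strength)
    return nacl_m * NACL_G_PER_MOL, citrate_m * CITRATE_DIHYDRATE_G_PER_MOL


def stock_volume_ml(stock_strength: float, final_strength: float, final_volume_ml: float) -> float:
    """Volume of concentrated stock needed for a dilution, using C1*V1 = C2*V2."""
    return final_strength * final_volume_ml / stock_strength


if __name__ == "__main__":
    for n in (20, 6, 2, 0.2):  # strengths used in the transfer and wash steps
        nacl, citrate = ssc_molarities(n)
        print(f"{n:>4}x SSC: {nacl:.3f} M NaCl, {citrate:.4f} M sodium citrate")
    grams_nacl, grams_citrate = ssc_grams_per_litre(20)
    print(f"20x SSC per litre: {grams_nacl:.1f} g NaCl, {grams_citrate:.1f} g citrate dihydrate")
    print(stock_volume_ml(20, 6, 500))  # ml of 20x stock for 500 ml of 6x SSC
```

For example, 500 ml of 6× SSC requires 150 ml of a 20× stock.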
Genomic Libraries-A genomic library of liver DNA extracted from a β-thalassemic mouse was constructed from an EcoRI partial digestion of the DNA cloned into a Charon 30 vector (Rimm et al., 1980) and plated on the BNN45 Escherichia coli substrain. A library of a Sau3AI partial digest of DBA/2J liver DNA was constructed in an EMBL-3B vector (Frischauf et al., 1983) and grown on BNN45. Plaques were transferred to nitrocellulose and hybridized to probe by the method of Benton and Davis (1977). Positive plaques were picked, grown in the same substrain of E. coli, and characterized by restriction enzyme digestion and hybridization pattern.

FIG. 1. Deletion in the murine β-globin locus in the Hbb(th-1) mouse. a shows the map of the Hbb(d) murine β-globin locus (Edgell et al., 1981). The bar above the map shows the extent of plaque 5156, isolated from the Hbb(th-1) genome and spanning the deletion. Successive expansions of the region near the Hbb(th-1) deletion show the restriction sites in this area. The hatched area is preserved and the thin line is deleted in the Hbb(th-1) genome. b shows a Southern blot of plaque 5156 (which spans the deletion breakpoint) digested with the enzymes indicated and hybridized to the XbaI-BglII fragment 3' to the breakpoint. Restriction enzyme sites: C, BglII; E, EcoRI; H, HindIII; X, XbaI.

¹ The abbreviations used are: kb, kilobase(s); DAI, deletion-associated insert; bp, base pairs; SDS, sodium dodecyl sulfate.
Sequence Determination-Restriction fragments of isolated λ clones were subcloned into pUC8. These fragments were further subcloned into M13mp18 and M13mp19. Primary nucleotide sequences were then determined by the technique of Sanger et al. (1977) using a synthetic 15-base primer (Bethesda Research Laboratories).
Synthesis of Oligomeric DNA Probes-The 18-base sequence, TAAGTGTAAATTTTTGTT, was synthesized by a modification of the solid-phase phosphotriester technique using a Vega Polynucleotide Synthesizer (Miyoshi et al., 1980a). The 20-base sequence, AATGTGTATTAAGCAAAAAGA, was synthesized by the phosphite-triester method (Beaucage and Caruthers, 1980). The oligonucleotides were deprotected by acidification and purified by reverse phase high pressure liquid chromatography using a modification of the methods of Dembek et al. (1981) and Kempe et al. (1982). The fragments were terminally labeled with T4 polynucleotide kinase (P-L Biochemicals) (Maniatis et al., 1978) using [γ-³²P]ATP (3000 Ci/mmol, 110 TBq/mmol) as the phosphate donor. The sequences of the oligonucleotides were verified by chemical sequence analysis (Maxam and Gilbert, 1980).

Hybridization with Oligomeric Probes-Southern blots were prepared and prehybridized as described above for hybridization with restriction fragments. The probe was diluted 100-fold in hybridization solution and incubated at 65 °C for 10 min. The filters were then hybridized at 47 °C (18-mer) or 52 °C (20-mer) for 18-24 h. The filters were then washed three times in 6× SSC at 0 °C for 15 min each, twice at the hybridization temperature with 3× SSC, 0.5% SDS, and twice at room temperature with 6× SSC. The filters were dried in air, covered with plastic wrap, and exposed to x-ray film for periods of 5 min (phage) to 4 days.
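The text does not state how the 47 °C and 52 °C hybridization temperatures were chosen. As a rough point of comparison only, the sketch below applies the Wallace rule, a common rule of thumb for short oligonucleotides in high salt (Tm ≈ 2 °C per A or T plus 4 °C per G or C), to the two probe sequences exactly as printed. It is not presented as the authors' procedure: the rule ignores nearest-neighbor effects and the exact salt concentration, and the helper names are hypothetical.

```python
# Sketch: Wallace-rule melting temperature estimate for short oligonucleotides.
# Tm ~ 2 C per A/T + 4 C per G/C. A rough rule of thumb, not the authors' method.


def base_counts(seq: str) -> dict[str, int]:
    """Count each DNA base in an upper- or lower-case sequence."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def wallace_tm(seq: str) -> int:
    """Return the Wallace-rule Tm estimate in degrees Celsius."""
    counts = base_counts(seq)
    return 2 * (counts["A"] + counts["T"]) + 4 * (counts["G"] + counts["C"])


def gc_fraction(seq: str) -> float:
    """Return the fraction of G + C in the sequence."""
    counts = base_counts(seq)
    return (counts["G"] + counts["C"]) / len(seq)


# Probe sequences exactly as printed in the text.
PROBES = {
    "18-base probe": "TAAGTGTAAATTTTTGTT",
    "20-base probe": "AATGTGTATTAAGCAAAAAGA",
}

if __name__ == "__main__":
    for name, seq in PROBES.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}, Wallace Tm ~ {wallace_tm(seq)} C")
```

Because both probes are A:T-rich, their estimates are dominated by the A and T counts; the figures are approximations to compare against, not derivations of the temperatures used.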
FIG. 2. Genomic Southern blot probed with a fragment spanning the deletion breakpoint. DNA extracted from β-thalassemic (Hbb(th-1)) mice (T) and normal DBA/2J mice (D) was digested with the enzymes indicated and probed with a 456-bp HindIII-XbaI fragment isolated from plaque 5156, which spans the deletion breakpoint. The difference in the patterns of hybridization reflects the deletion that has taken place in the β-thalassemic mouse (see Fig. 1b). The BamHI sites are outside the map shown in Fig. 1.

Computer Analysis-Some of the computer resources used to carry out these studies were provided by the BIONET National Computer Resource for Molecular Biology, whose funding is provided by the Biomedical Research Technology Program, Division of Research Resources, National Institutes of Health, Grant 1U41RR-01685-02. Sequence data were input to an APPLE II computer using the Gel-Pad System (K&H BioSoft, Frederick, MD). The sequence data were then transmitted to the BIONET system for analysis. The GEL program was used for sequence assembly and IFIND for comparison to the databases.
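The assembly algorithm of the GEL program is not described here. As a generic illustration of what assembling overlapping M13 subclone reads involves, the sketch below implements a simple greedy suffix-prefix merge in Python: it repeatedly joins the two reads with the longest exact overlap. The reads are invented, the minimum overlap is an arbitrary assumption, and real assemblers also handle sequencing errors, reads from the opposite strand, and reads contained within others, none of which this sketch does.

```python
# Sketch: greedy assembly of overlapping reads by exact suffix-prefix overlap.
# Generic illustration only; not the algorithm of the GEL program.


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a suffix matching b's prefix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


# Invented reads, not sequence data from this study.
EXAMPLE_READS = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]

if __name__ == "__main__":
    print(greedy_assemble(EXAMPLE_READS))  # ['ATTAGACCTGCCGGAATAC']
```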

FIG. 3. Sequence at the Hbb(th-1) deletion breakpoint. The sequences of the HindIII-XbaI fragment isolated from the Hbb(th-1) genome (th-1) and of the 5' and 3' flanks of the normal DBA/2J β major globin gene (βmajor) were determined using the clones described. Only a small portion of the alignment with the normal sequences is shown. The 4 A:T base pairs beginning at position 1725 in the normal 3' flanking sequence make the assignment of the 3' breakpoint ambiguous. The numbering of the normal sequence begins with +1 at the cap site. The underlined sequence is the inserted region. The sequence in italics (GGTTTC) is the hexanucleotide common to several deletions (see "Discussion").

RESULTS

It has been shown previously that a simple 3.3-3.6-kb deletion spanning the β major globin gene accounts for the absence of this protein in mice homozygous for the Hbb(th-1) allele (Skow et al., 1983). Analysis of the novel band pattern generated by DNA extracted from β-thalassemic mice on genomic Southern blots probed with the non-deleted, β major-specific 5' and 3' flanking regions generated the restriction map shown in Fig. 1. A fragment extending from an XbaI site (2 kb 3' to the cap site) to a BglII site (1 kb further downstream) was used to screen a genomic phage library of the β-thalassemic mouse. This region lies beyond the deletion in the Hbb(th-1) mutation and produces a unique set of bands on genomic Southern blots of β-thalassemic DNA that differ from those generated by normal DBA/2J DNA (data not shown).
The clones identified by this probe had the restriction and hybridization patterns expected at the breakpoint of the deletion in the Hbb(th-1) genome. The insert contained an 8.8-kb EcoRI fragment and a 1.85-kb BglII fragment, both of which hybridized to the XbaI-BglII fragment 3' to the β major globin gene (Fig. 1b). The pattern of the cloned region was consistent with that previously determined by genomic Southern blots: a simple 3.5-kb deletion extending 1.5 kb 5' and 2 kb 3' to the β major globin cap site appeared to have occurred in the Hbb(th-1) genome.
A 456-bp (see below) HindIII-XbaI fragment, isolated from the β-thalassemic genome and spanning the Hbb(th-1) deletion, was subcloned and used as a probe for genomic Southern blots (Fig. 2). Different bands were detected in the β-thalassemic and DBA/2J genomes, reflecting the change that has taken place near the β major globin gene. The β major gene has a single BamHI site in the second exon. Because this probe contains both 5' and 3' flanking sequences, it detects two BamHI bands in normal DNA, one on each side of that site. In the thalassemic lane, where the site has been deleted, a single band is seen. There are no BglII sites in the β major globin gene, so only a single band is seen in both normal and thalassemic DNA, but the bands differ in size by approximately 3.5 kb, the size of the deletion.
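The band logic above can be made concrete with a toy model of a linear region cut at restriction sites. In the Python sketch below, all coordinates are hypothetical and chosen only for illustration (they are not taken from the maps in Fig. 1): a BamHI site lies inside the deleted interval, so the two fragments that a spanning probe detects in normal DNA become a single fragment in the deleted allele, shorter than their sum by the size of the deletion.

```python
# Sketch: how deleting a restriction site changes the bands a spanning probe detects.
# All coordinates (in bp) are hypothetical and chosen only for illustration.


def fragments(sites: list[int], length: int) -> list[tuple[int, int]]:
    """(start, end) of each fragment when a linear region of given length is cut at sites."""
    cuts = [0] + sorted(sites) + [length]
    return list(zip(cuts, cuts[1:]))


def apply_deletion(sites: list[int], length: int, del_start: int, del_end: int) -> tuple[list[int], int]:
    """Delete [del_start, del_end): drop sites inside it and shift sites downstream of it."""
    size = del_end - del_start
    kept = [s if s < del_start else s - size for s in sites if s < del_start or s >= del_end]
    return kept, length - size


def bands(sites: list[int], length: int, probe_segments: list[tuple[int, int]]) -> list[int]:
    """Lengths of the fragments overlapping any probe segment, largest first."""
    hits = {(s, e) for s, e in fragments(sites, length)
            for p0, p1 in probe_segments if s < p1 and e > p0}
    return sorted((e - s for s, e in hits), reverse=True)


# Hypothetical normal region: 20 kb with BamHI sites at 2, 11, and 18 kb.
LENGTH, BAMHI = 20_000, [2_000, 11_000, 18_000]
DEL_START, DEL_END = 8_000, 11_500  # hypothetical 3.5-kb deletion containing the 11-kb site
# A spanning probe carries sequence from both sides of the deletion.
PROBE_NORMAL = [(7_800, 8_000), (11_500, 11_750)]
PROBE_MUTANT = [(7_800, 8_250)]  # the same probe once the deletion joins the two flanks

if __name__ == "__main__":
    print("normal:", bands(BAMHI, LENGTH, PROBE_NORMAL))  # two bands
    mut_sites, mut_length = apply_deletion(BAMHI, LENGTH, DEL_START, DEL_END)
    print("deleted:", bands(mut_sites, mut_length, PROBE_MUTANT))  # one band
```

With no site inside the deleted interval, as for BglII, the same model gives one band in each genome, differing in length by exactly the deletion size.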
A 66-bp Novel Sequence Is Inserted at the Deletion Breakpoint-The sequence of the Hbb(th-1) HindIII-XbaI fragment spanning the deletion breakpoints was determined. Both the 5' and 3' breakpoints in the normal genome lay outside the previously reported sequence. To obtain the normal sequence, an EMBL-3B (Frischauf et al., 1983) library of the normal DBA/2J genome was constructed and screened with portions of the β major globin coding region. Hybridizing plaques were isolated and their identities confirmed by restriction digest pattern. A clone that extends more than 2 kb 5' to the cap site was used to determine the location of the 5' breakpoint. Portions of the normal 3' flanking region of the β major globin gene from the BALB/c mouse had previously been cloned (Hofer and Darnell, 1981). These fragments were also sequenced. By comparing the sequence from the β-thalassemic mouse with those from the normal progenitor, the breakpoints of the deletion could be identified (Fig. 3). Assigning +1 to the position of the cap site, the 5' breakpoint is at nucleotide -1984 and the 3' breakpoint is between nucleotides +1727 and +1731. Thus the deletion is 3709 ± 2 base pairs long. (The ambiguity is caused by the 4 dA:dT base pairs at the 3' breakpoint in the normal sequence, any one of which could be the breakpoint.) The BALB/c and DBA/2J sequences are identical 3' to the breakpoint for at least 250 bp, confirming the conservation of β-globin sequences between these two strains of mice. Sixty-six (±2) base pairs are inserted at the deletion site. This deletion-associated insert (DAI) is very A:T-rich (82% A+T) and ends in a 25-bp dA:dT sequence, but it does not contain an identifiable poly(A) addition signal (AATAAA). No similar sequences have been described in the vicinity of any murine β-globin gene (Konkel et al., 1978; Citron et al., 1984).
Comparison of this sequence with the GenBank® database (release 31.0) using the algorithm of Wilbur and Lipman (1983) found no similar sequence.
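The ± 2 uncertainty at the 3' breakpoint can be illustrated with a toy comparison. The Python sketch below uses invented sequences, not the Hbb(th-1) data. It takes the longest stretch shared at the 3' ends of a mutant sequence and of the normal 3' region as retained flank; if the last base attributed to the insert also begins a homopolymer run in that flank (as the terminal dA:dT run of the DAI meets four dA:dT base pairs in the normal sequence), every split point within the run is an equally valid breakpoint. The function name and this rule for handling the run are assumptions made for illustration.

```python
# Sketch: candidate 3' breakpoints when a homopolymer run spans the junction.
# The sequences are invented; they are not the Hbb(th-1) or DBA/2J sequences.


def common_suffix_len(a: str, b: str) -> int:
    """Number of identical bases shared at the 3' ends of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def three_prime_breakpoint_candidates(mutant: str, normal_3prime: str) -> list[int]:
    """Indices in normal_3prime at which the retained 3' flank could begin.

    The shared 3' stretch is treated as retained flank. If the last base assigned to
    the insert also starts a run of the same base inside that stretch, each base of the
    run could belong to either side, so every split point within the run is returned.
    """
    shared = common_suffix_len(mutant, normal_3prime)
    junction_mutant = len(mutant) - shared
    junction_normal = len(normal_3prime) - shared
    if junction_mutant == 0:  # the whole mutant sequence matches: no insert to compare
        return [junction_normal]
    last_insert_base = mutant[junction_mutant - 1]
    run = 0
    while run < shared and mutant[junction_mutant + run] == last_insert_base:
        run += 1
    return [junction_normal + k for k in range(run + 1)]


# Toy normal 3' region: deleted bases, then a flank beginning with four A residues.
NORMAL_3PRIME = "TTGCCTG" + "AAAACGTTGCA"
# Toy mutant: an insert ending in a run of A residues, joined to the same flank.
MUTANT = "CCGT" + "AAAAAAAA" + "AAAACGTTGCA"

if __name__ == "__main__":
    print(three_prime_breakpoint_candidates(MUTANT, NORMAL_3PRIME))  # [7, 8, 9, 10, 11]
```

A four-base run gives five possible split points, the same situation as the +1727 to +1731 range reported above.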
The Normal Murine Genome Contains a Small Number of DAI-like Sequences-Insertions often accompany deletions but they are frequently so small that their origins cannot be identified. The Hbb(th-1) DAI sequence was large enough to allow its use as a probe. The 124-base pair MnlI fragment consisting of the 66-base pair insert, 21 base pairs of 5' flanking DNA, and 35 base pairs of 3' flanking DNA (described above), was used as a probe for genomic Southern blots. A diffuse hybridization pattern, characteristic of a repeat sequence, was obtained (data not shown).
Since this restriction fragment contains a 25-bp dA:dT stretch that could be expected to hybridize with repeat sequences (Schmid and Jelinek, 1982), the oligonucleotide TAAGTGTAAATTTTTGTT, corresponding to the (-) strand of the left-most 18 base pairs of the insert (orientation defined by the direction of β-globin transcription), was synthesized, purified by high pressure liquid chromatography, end-labeled with polynucleotide kinase, and used to probe restriction enzyme digests of DNA extracted from normal DBA/2J mice and β-thalassemic mice (Fig. 4). Hybridization of this probe produced a relatively complex pattern with some apparent differences between normal and thalassemic samples. A second probe, AATGTGTATTAAGCAAAAAGA, corresponding to the (-) strand of the 20 base pairs adjacent to the dA:dT region of the insert, was then synthesized. This oligonucleotide was purified by high pressure liquid chromatography, labeled, and similarly used as a probe for restriction digests of genomic DNA (Fig. 4). This sequence, which does not overlap the 18-base probe, generates a less complex pattern. BamHI generates only two hybridizing bands in both the β-thalassemic and the DBA/2J genomes. A 15-kb band is present in both genomes. The hybridizing band at 8.3 kb in the β-thalassemic DNA corresponds to the location of the deleted β major globin gene in the Hbb(th-1) genome. A 2.7-kb BamHI band is present in normal DBA/2J DNA but absent in the β-thalassemic genome. The 20-base probe hybridized to a subset of the bands that hybridize to the 18-base probe. As expected, the longer probe is more specific. It is surprising, however, that all fragments that hybridized to the 20-base probe also hybridized to the 18-base probe. This suggests that the association of the 18- and 20-base sequences is not limited to the DAI: the sequences recognized by these two independent probes also lie close to each other elsewhere in the genome.
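Because each probe corresponds to the (-) strand, the matching (+) strand sequence of the insert is its reverse complement. The Python sketch below computes it, assuming the printed sequences are written 5' to 3' (the text does not say so explicitly); the helper names are hypothetical.

```python
# Sketch: relate the (-) strand probes to the (+) strand of the insert.
# Assumes the printed probe sequences are written 5' to 3'.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_to(probe: str, plus_strand: str) -> bool:
    """True if the probe is exactly complementary to some stretch of plus_strand."""
    return reverse_complement(probe).upper() in plus_strand.upper()


PROBE_18 = "TAAGTGTAAATTTTTGTT"
PROBE_20 = "AATGTGTATTAAGCAAAAAGA"  # as printed in the text

if __name__ == "__main__":
    print(reverse_complement(PROBE_18))  # AACAAAAATTTACACTTA
    print(reverse_complement(PROBE_20))  # TCTTTTTGCTTAATACACATT
```

The exact-match check in `anneals_to` is a simplification: hybridization on a blot tolerates some mismatches, which is one reason the A:T-rich 18-mer detects more bands.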
A Normal Genomic Sequence Homologous to the DAI Lacks a String of dA:dT-The normal genomic sequence that corresponds to the DAI may represent the recombinatorial partner which, along with the β major globin gene, generated the Hbb(th-1) allele. On the basis of Southern blots (Fig. 4) more than two such areas exist in the normal genome. By isolating such a region, we might determine the relationship between the inserted fragment and its precursor.
The previously described EMBL-3B library of the normal DBA/2J genome was screened with the 18- and 20-base oligomeric probes. A plaque that hybridized to both oligomers, designated A42, was selected for further analysis (Fig. 5). DNA was isolated from this phage, digested with a variety of restriction enzymes (both singly and in combination), and transferred to nitrocellulose as described under "Materials and Methods." The resulting Southern blots were then independently hybridized to the two oligomeric probes described above. With every combination of restriction enzymes tried, the same restriction fragments hybridized to both probes (data not shown). Thus, the two probed sequences appeared to be close to one another.
A 410-base pair region from plaque A42, extending from an RsaI site to a BglII site and hybridizing to both oligomeric probes, was sequenced by subcloning several subfragments into an M13 vector (Fig. 6). This fragment contains a sequence identical to the first 43 base pairs of the DAI found at the site of the Hbb(th-1) deletion. However, it contains neither a dA:dT stretch nor a canonical poly(A) addition signal.
The DAI-like Loci Are Polymorphic-The 190-base pair Sau3A-AluI fragment isolated from the normal DBA/2J genome, containing a region identical to the 43-bp DAI sequence, was radiolabeled by nick translation and used to probe restriction enzyme digests of DNA extracted from a β-thalassemic mouse, a normal DBA/2J mouse, and a heterozygote between Hbb(th-1) and C57BL/6 mice (Fig. 4). The 43-bp insertion sequence constitutes less than 25% of this probe and is its most A:T-rich region. This explains the weak signal at 8.3 kb on a BamHI digest of β-thalassemic DNA, the region that corresponds to the locus of the deletion, and the stronger signal at 15 kb. DBA/2J DNA generates the same pair of bands, 15 and 2.7 kb, whether the probe is genomic or a synthetic 20-mer. The heterozygote shares a common band of 15 kb with the other mice. The 8.3-kb band, which corresponds to the single Hbb(th-1) allele in the genome of the heterozygous mouse, is even fainter. The 2.7-kb band, so prominent in the DBA/2J genome, is absent from the heterozygote as well as from the β-thalassemic homozygote, suggesting that the 2.7-kb band is polymorphic.
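The dilution argument can be put in numbers. Under a deliberately simple assumption of our own, namely that the signal from a locus sharing only the 43-bp DAI-like stretch scales with the fraction of the probe made up of that stretch (ignoring specific activity, base composition, and mismatch tolerance), the sketch below compares the 190-bp probe with the 3.7-kb EcoRI-BamHI fragment. The function name is hypothetical.

```python
# Sketch: fraction of a probe made up of the 43-bp DAI-like stretch.
# Simplifying assumption (ours): locus signal scales with this fraction.

DAI_SHARED_BP = 43  # length of the stretch identical to the DAI, from the text


def homologous_fraction(shared_bp: int, probe_bp: int) -> float:
    """Fraction of the probe homologous to a locus that shares only shared_bp bases."""
    if not 0 < shared_bp <= probe_bp:
        raise ValueError("shared_bp must be positive and no longer than the probe")
    return shared_bp / probe_bp


if __name__ == "__main__":
    for label, probe_bp in [("190-bp Sau3A-AluI probe", 190), ("3.7-kb EcoRI-BamHI probe", 3700)]:
        print(f"{label}: {homologous_fraction(DAI_SHARED_BP, probe_bp):.1%}")
```

The 190-bp probe is about 22.6% DAI-like sequence, consistent with "less than 25%", while the 3.7-kb fragment is only about 1.2%, so under this model the signal at the Hbb(th-1) locus should fall as the probe grows.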
The 3.7-kb EcoRI-BamHI fragment that surrounds this 190-bp probe (Fig. 5) does not reveal any additional bands in any of the DNAs tested (Fig. 4). Increasing the size of the probe does not uncover a repeat element, nor does it reveal additional areas in which the homology with the 2.7-kb fragment continues.
Since the β-thalassemic mutation had arisen in a DBA/2J mouse and the offspring had been crossed to C57BL/6 mice, the reason for the absence of the 2.7-kb band from the Hbb(th-1) genome was unclear. It could have been lost along with the β-globin gene, or its absence could reflect a polymorphism at that site. To identify the genealogy of these elements, Southern blots of a variety of mouse strains were probed with the 190-bp Sau3A-AluI fragment (Fig. 7). The 15-kb BamHI fragment was present in all strains tested. The 2.7-kb fragment was present in some strains (DBA/2J, DBA/1J, NZB/J, AKR/N, STS/J, and C58/J) and absent from others (C57BL/6J, C57/W, and BALB/cJ). Closely related strains share the same pattern: neither C57BL/6J nor C57/W has the 2.7-kb band, whereas both DBA/1J and DBA/2J have it. This pattern of divergence is consistent with at least one ancestral rearrangement in this region, prior to the divergence of BALB/cJ and AKR/J.

DISCUSSION
A previously described mouse model of β-thalassemia (Skow et al., 1983) is the result of a deletion of the β major globin gene. The molecular details of the deletion have now been investigated. The deleted DNA (3709 ± 2 bp) is replaced by 66 ± 2 bp of DNA that apparently originated in another part of the murine genome.
Several Deletions Share a Common Hexanucleotide-The hexanucleotide at the 5' breakpoint, G^GTTTC (^ marks the breakpoint), which includes the first 5 bases deleted in the Hbb(th-1) mutation, is found close to the breakpoints of several other deletions (Table I). The cleavage between the two guanines and the six nucleotides that follow are identical to those of the deletion in an analbuminemic rat described by Esumi et al. (1983). In these rats, a deletion of 7 base pairs in the eighth intron of the albumin gene appears to cause a splicing error that is responsible for the analbuminemia. Of the seven nucleotides deleted, the first six and the preserved guanine immediately preceding the deletion correspond exactly to the 5' end of the sequence that is deleted in the β-thalassemic mouse described here. Thus, both heritable deletion mutations of rodents for which primary sequence data are available involve identical sequences at their 5' breakpoints. Both of these mutations were also spontaneous.
Three independent deletions in the human δβ-globin region have their 5' breakpoints close to the hexanucleotide GGTTTC (Table I). This hexanucleotide appears 12 times within the 16,489 bases encompassing the human δ- and β-globin genes (GenBank®, accession number K01239, release 31.0, March 1985), and three of these occurrences are associated with deletions. Since the deletions in the human δβ-globin complex are probably much older than those described in the rodents, genetic drift has made the assignment of the breakpoints in the human mutations less precise. A 619-base pair deletion (Spritz and Orkin, 1982), associated with the gene responsible for 20% of the β0-thalassemia in India (Antonarakis et al., 1985), begins within 10 bp of this hexanucleotide. The deletion associated with the common form of hereditary persistence of fetal hemoglobin in the United States (HPFH-1) is more than 60 kb in length (Fritsch et al., 1979). Its 5' breakpoint (Jagadeeswaran et al., 1982) is 5 nucleotides from this hexanucleotide. A deletion causing δ0β0-thalassemia (Ottolenghi and Giglioni, 1982) begins within 20 base pairs of a third occurrence of this sequence.
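Counting a motif such as GGTTTC in a sequence is straightforward; deciding what the count should be compared with is less so. The Python sketch below counts (possibly overlapping) occurrences on one or both strands and gives the expected count under an idealized model of equiprobable, independent bases. The text does not say whether its count of 12 includes both strands, so both options are shown; real globin-region DNA is not uniform in composition, so the expected values are illustrative and not a significance test. The function names are hypothetical.

```python
# Sketch: count a motif on one or both strands and compare with a naive expectation.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str, both_strands: bool = True) -> int:
    """Count possibly overlapping occurrences of motif (and optionally its reverse complement)."""
    seq, motif = seq.upper(), motif.upper()
    targets = {motif, reverse_complement(motif)} if both_strands else {motif}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def expected_count(seq_len: int, motif_len: int, both_strands: bool = True) -> float:
    """Expected count if all four bases are equiprobable and independent.

    The two-strand figure assumes the motif is not its own reverse complement.
    """
    per_strand = (seq_len - motif_len + 1) / 4 ** motif_len
    return 2 * per_strand if both_strands else per_strand


if __name__ == "__main__":
    # Region length from the text; GGTTTC is not its own reverse complement (GAAACC).
    for strands in (False, True):
        print(f"both_strands={strands}: expected ~ {expected_count(16_489, 6, strands):.2f}")
    print(count_motif("GGTTTCAAGAAACC", "GGTTTC"))  # 2: one occurrence per strand
```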
This hexanucleotide resembles the nonamer signal sequence found 12 or 23 bp from sites of immunoglobulin rearrangement (GGTTTTTGT) (Early et al., 1980) and is part of the nonamer signal sequence that is believed to be involved in the rearrangement of the T cell receptor gene (Siu et al., 1984). In a human processed pseudogene that appears to be derived from an immunoglobulin gene (Hollis et al., 1982), the normal nonameric sequence is replaced (in part) by the hexamer GGTTTC. Despite (or perhaps before) this change, this pseudogene appears to have undergone a V-J joining. This same hexanucleotide appears in the murine immunoglobulin switch region, S(mu) (Gerondakis et al., 1984), and is also found 2 bp 3' to a site at which the spleen focus-forming virus genome differs from the Moloney murine leukemia virus genome by a deletion of 810 base pairs (Clark and Mak, 1983). Recently, the 9q+ (reciprocal of the Ph(1) chromosome) breakpoint has been sequenced.³ This breakpoint (which is also associated with a deletion) occurs at the same (GGTTTC) hexanucleotide.
The appearance of this hexanucleotide, GGTTTC, near several deletion breakpoints suggests that it might play some role in a recombinatorial process which has led to these heritable mutations. Based upon the limited data available, this sequence may be a recognition site for an enzyme that is active in germ cells, possibly related to meiotic recombination. The appearance of this sequence in immunoglobulin switch regions and areas involved in T cell receptor rearrangement implies that either the somatic and germ line recombinatorial mechanisms share certain features (including some degree of sequence specificity) and/or that germ line rearrangements can be the manifestations of the action of the somatic recombinatorial machinery acting on an unusual substrate.
Structure of the Insertion-The association of short inserts with heritable deletions is a common (Table I) although not invariant finding. Short, novel sequences are also associated with somatic recombinatorial events, e.g. the "N sequences" of immunoglobulin rearrangement (Alt and Baltimore, 1982) and the translocation of cellular oncogenes (Gerondakis et al., 1984). Some of these sequences seem to be duplications of neighboring regions of the genome (e.g. the analbuminemic rat (Esumi et al., 1983) and the Indian β0-thalassemia (Spritz and Orkin, 1982) genes). Others have no clear origin, but they are generally too short and contain too little information to specify where they came from.
The 43-base pair Hbb(th-1) DAI is large enough to be a meaningful probe for the detection of matching sequences in the normal murine genome. However, initial attempts to identify a genomic source for the Hbb(th-1) insertion sequence, using restriction enzyme fragments, suggested that it … (Schmid and Jelinek, 1982) and processed pseudogenes (Sharp, 1983) widely scattered through the genome.

³ D. Leibowitz, personal communication.

TABLE I
Association of the hexanucleotide GGTTTC with deletions
The first column lists the gene that is involved in the deletion. The species is listed in the second column. The "^" marks the deletion breakpoints. In all cases the GGTTTC sequence is among the deleted bases. The entire extent of the deletion is shown for the rat albumin gene; all others extend beyond the sequence shown. The GGTTTC sequence appears on the non-coding strand for the second δβ-globin deletion shown; hence the breakpoint is shown on the right. The novel DAI sequences inserted at the deletion breakpoints are shown in the last column. The sources for these data are this work; Esumi et al., 1983; Spritz and Orkin, 1982; …
The dA:dT sequence was eliminated by synthesizing oligonucleotide probes based upon the 5' portion of the DAI. These probes (especially the 20-base probe) identified a specific set of bands on genomic Southern blots (Fig. 4) and could be used to isolate a non-repetitive region of the normal genome that contained all of the 5' portion of the inserted sequence (Fig. 6). Probes ranging in size from 20 to 3700 base pairs produced identical patterns on genomic Southern blots (Fig. 4), attesting to the non-repetitive nature of the inserted sequence. Finding a closely related sequence in the normal murine genome makes it unlikely that the DAI is the product of a template-independent repair process at the site of a DNA break.
The origin of the 25 dA:dT base pairs at the 3' end of the inserted sequence, which are absent from the genomic clone, is not known. One possibility is that insertion of the DAI involved an RNA intermediate.
Genealogy of the DAI Sequence-All mice tested have a copy of the DAI sequence that migrates as a 15-kb BamHI fragment (Fig. 7). A second copy, seen as a 2.7-kb BamHI fragment, is present in some strains and absent in others. When probes isolated from the phage clone of the 15-kb copy are used to probe genomic Southern blots, the hybridization to this 2.7-kb band remains strong with increasing probe size, whereas the signal that corresponds to the Hbb(th-1) breakpoint grows weaker as the DAI sequence becomes a smaller fraction of the total probe (Fig. 4). Thus, these two normal copies of the DAI appear to share a longer region of sequence homology than the 43 base pairs inserted at the Hbb(th-1) breakpoint.
The kinship pattern suggested by the Southern blot is consistent with the genealogy of these inbred mouse strains (Staats, 1980). Both DBA strains have the additional band and both C57 strains lack it. Since the C57 and C58 strains are related only through their paternal genes, the 2.7-kb BamHI fragment must have come from the female in this 1921 mating.
The genealogy of the strains can be used to locate the point of divergence with respect to this additional copy of the DAI sequence in the evolutionary history of these inbred mouse strains. The progenitor of the AKR and DBA strains (which contain the 2.7-kb fragment) diverged from the progenitor of the BALB/c and C57 strains (which have a single copy of the DAI) in the mid-19th century. Further analyses using wild mice and related rodent species will be necessary to decide whether this divergence constituted an insertion or a deletion of the second DAI-like region.
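Whether a single change suffices to explain the distribution of the 2.7-kb fragment can be checked with Fitch parsimony, a standard method for counting the minimum number of state changes on a tree. The Python sketch below is not the authors' analysis: the branching order within the DBA/AKR and C57/BALB/c groups is an assumption, and strains the text does not place on the tree (NZB/J, STS/J, C58/J) are omitted. Under that topology one change suffices, and the root state remains ambiguous, which mirrors the point that insertion and deletion cannot be distinguished from these strains alone.

```python
# Sketch: Fitch parsimony for presence/absence of the 2.7-kb BamHI fragment.
# The tree topology within each group is an assumption, not taken from the text.


def fitch(tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Return (possible states at this node, minimum number of changes in the subtree).

    A tree is either a strain name (leaf) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Assumed topology: DBA/AKR lineage versus C57/BALB/c lineage, as described in the text.
TREE = ((("DBA/1J", "DBA/2J"), "AKR/N"), (("C57BL/6J", "C57/W"), "BALB/cJ"))
# Presence of the 2.7-kb BamHI fragment as reported in the text (Fig. 7).
STATES = {
    "DBA/1J": "present", "DBA/2J": "present", "AKR/N": "present",
    "C57BL/6J": "absent", "C57/W": "absent", "BALB/cJ": "absent",
}

if __name__ == "__main__":
    root_states, changes = fitch(TREE, STATES)
    print(changes, sorted(root_states))  # 1 ['absent', 'present']
```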
How Did the β-Thalassemic Mouse Arise?-Models of the events that generated the Hbb(th-1) allele must account for the deletion, the insertion, and the dA:dT stretch (a putative reverse transcript). Many heritable deletions can be understood as mismatched meiotic recombinatorial events (e.g. α-thalassemia (Higgs and Weatherall, 1983)) or intramolecular gene conversions (Michelson and Orkin, 1983). There is no evidence that the process that generated the mouse model of β-thalassemia involved homologous recombination. The ends of the deletion are not homologous with one another, and the only detectable region of homology between the DAI and the β major globin region is between the dA:dT sequence of the DAI and the 4 dA:dT base pairs between +1728 and +1731 at the 3' insertion site of the DAI.
The dA:dT stretch suggests that the DAI arose from an RNA precursor. Three models (not mutually exclusive) account for all of the data: damage repair, immunoglobulin-type rearrangement, and transposition. The data are insufficient to distinguish between these models. The characterization of other deletions and further analysis of this element should help to clarify the mechanisms that generate these events.