Molecular characterization of the lymphoid V(D)J recombination activity.

Antibodies mediate humoral immunity and are produced by cells of the B-lymphoid lineage. The basic subunit of an antibody molecule is a pair of identical heavy (H) and light (L) immunoglobulin (Ig) polypeptide chains. The N-terminal regions of H and L Ig chains vary in amino acid sequence from one species of molecule to another; this variation provides diversity in the immune response as H and L chain variable regions interact to form the antigen-binding domain. Cellular immunity is directly mediated by T-lymphocytes.
The T cell antigen receptor (TCR) is either of two distinct but related heterodimers that again mediate antigen binding through the interaction of N-terminal variable regions. Ig H chain variable region genes are encoded by germ line variable (V), diversity (D), and joining (J) segments (Fig. 1), those of Ig L chains by V and J segments (1), and those of TCR chains by analogous sets of germ line gene segments (2). Ig and TCR variable region genes are assembled from these segments by a site-specific recombination mechanism (V(D)J recombination) that operates during the differentiation of lymphocyte precursors (pre-B and pre-T cells). Variable region genes are assembled upstream from exons that encode the remainder (constant region) of a given type of chain (Fig. 1), the complete mRNA being generated by processing of the precursor mRNA.
Within Ig H and L chain variable regions are three areas of greatest sequence variability (hypervariable regions (3)) that together form the antigen recognition site and are therefore referred to as complementarity-determining regions (CDRs) (4). CDRs 1 and 2 are encoded within the V segment, and CDR 3 by the site at which V, (D), and J segments are juxtaposed, including the D segment (1). Depending on the receptor family, each type of segment may be present in the germ line in single or multiple (different) copies (e.g. Fig. 1 (1, 2)). In mammals, combinatorial assortment among variable region gene segments and mechanisms (described below) that create imprecision at the site of joining generate considerable diversity within CDRs (1). Additional mechanisms that may generate diversity within CDRs (gene conversion, somatic mutation) operate to varying degrees in different species and have been reviewed (4-6). This review will be confined to a summary of our current understanding of the mechanism and control of V(D)J recombination in mammals. Although most work has focused on murine systems, general features of Ig and TCR loci appear to be analogous in humans (6).
To date, despite substantial effort, V(D)J recombination has not been achieved in vitro. However, less direct experimental approaches have identified some molecular details of this process. Comparison of the products of V(D)J recombination events in lymphoid tumors with germ line (unrearranged) counterparts from non-lymphoid cells (1) suggested a general outline for the recombination mechanism. More recent technical advances have allowed additional insights and revealed certain details; in particular, the most rewarding experimental approach has involved the introduction of germ line Ig or TCR variable region gene segments within plasmid or viral vectors (recombination substrates) into permanent pre-B cell lines that have active V(D)J recombination activity. In such lines, V(D)J recombination appears to occur by the normal mechanism within either chromosomally integrated (7, 8) [...] surrounding chromosomal environment. Another potentially important recent advance was the identification of the lesion in the scid (severe combined immune deficient) mouse as a defect in the V(D)J recombination process, providing the first defined mutant that affects this recombination system (10).

Mechanistic Aspects Revealed from Structural Analyses of Germ Line and Rearranged Variable Region Gene Elements
Comparison of the DNA sequences of germ line and assembled H and L chain variable region genes yielded many fundamental insights into the V(D)J recombinase mechanism. All germ line Ig H and L chain variable region gene segments were found to be immediately flanked by a conserved palindromic heptamer and an AT-rich nonamer separated by a relatively nonconserved spacer region of either 12 ± 1 or 23 ± 1 base pairs in length (Fig. 2 (1)). The location and degree of conservation of these "signal sequences" suggested that they are target recognition elements for a common site-specific recombination machinery (11, 12). In this context, highly related sequences also have been found to flank the germ line TCR variable region gene segments (2). For individual loci, all segments of a particular type (e.g. J segments) have signal sequences with spacers of the same length, and segments that are joined are flanked by recognition elements separated by spacers of different length (Fig. 1). These findings suggested that spacer length in the flanking signal sequence dictates which segments can be joined (the 12/23 rule (11, 12)). Relative to the location of the signal sequences, joining between coding regions was found to be imprecise, with potential coding bases lost in some junctions and not in others (1). In addition, many H chain (and TCR) joins contained nucleotides at the junction that could not be ascribed to known germ line elements (13). The imprecision inherent in coding join formation appears to be a significant source of diversity in Ig and TCR variable regions (6). Finally, although assembly of H chain variable region genes is generally accompanied by deletion of signal sequences and intervening DNA from the chromosome, many rearranged κ L chain loci (and one H chain gene rearrangement) retained the intervening sequence along with complementary signal sequences precisely fused (without base loss or addition) back to back (14-16).
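The 12/23 rule described above can be written as a small check. This is only an illustrative sketch: the function names are hypothetical, and the spacer lengths in the usage note (23 for VH and JH, 12 on both sides of DH) are the commonly cited arrangement for the Ig H locus rather than values stated in this text.

```python
from typing import Optional


def spacer_class(spacer_len: int) -> Optional[int]:
    """Map a spacer length to its nominal class (12 or 23), allowing +/- 1 bp."""
    for nominal in (12, 23):
        if abs(spacer_len - nominal) <= 1:
            return nominal
    return None  # not a recognizable signal-sequence spacer


def obeys_12_23_rule(spacer_a: int, spacer_b: int) -> bool:
    """True if two signal sequences have spacers of different classes."""
    class_a = spacer_class(spacer_a)
    class_b = spacer_class(spacer_b)
    if class_a is None or class_b is None:
        return False
    return class_a != class_b


# Assumed Ig H chain spacer lengths (illustration only).
VH_SPACER, DH_SPACER, JH_SPACER = 23, 12, 23
print(obeys_12_23_rule(DH_SPACER, JH_SPACER))  # D-to-J joining permitted
print(obeys_12_23_rule(VH_SPACER, JH_SPACER))  # direct V-to-J joining excluded
```

Under these assumed spacer lengths the check permits D-to-J and V-to-D joining but excludes direct V-to-J joining at the H chain locus.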
Together, the observations outlined above suggested that V(D)J recombination involves a multistep mechanism in which coding and signal segment ends are processed and joined together in a nonreciprocal fashion ((16) Fig. 2). Currently available evidence suggests that this site-specific recombination event is initiated by introduction of an endonucleolytic break on both recombining segments precisely at the border of signal and coding sequences. Bases are then removed from coding but, in general, not from signal sequence ends. Removal of bases may occur by exonuclease activity although other possibilities are conceivable. Extra nucleotides at junctions originally were proposed to derive from D-to-D joining (13, 17), but corresponding D segments were not found. An alternative proposal for the origin of the extra nucleotides (referred to as "N" regions) was that they are added de novo to liberated coding sequence ends, possibly by the lymphoid-specific enzyme terminal deoxynucleotidyltransferase (TdT) (16); TdT can add nucleotides to free DNA ends in a 5'-to-3' direction without a template. In the context of this recombination model, the event would be completed when complementary bases are added to single-stranded regions and the signal and coding segment ends are ligated to form "signal" and "coding" joins (Fig. 2). The frequent occurrence of "redundant" nucleotides in junctional regions of coding sequences (8, 17) suggested the possibility that synthesis of complementary strands might be primed by transient base pairing between potential single-stranded tails (16), although once again other possible mechanisms can be readily imagined.
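The coding-join steps in this model (base removal from coding ends, optional N region addition, ligation) can be mimicked on plain strings. This is a deliberately simplified sketch: the function names and the toy sequences are hypothetical, strands are treated as single strings, and no claim is made about the actual enzymology.

```python
from typing import Set


def coding_join(left: str, right: str, trim_left: int = 0,
                trim_right: int = 0, n_region: str = "") -> str:
    """Join two coding ends after trimming bases and inserting an N region."""
    if trim_left < 0 or trim_right < 0:
        raise ValueError("trim lengths must be non-negative")
    if trim_left > len(left) or trim_right > len(right):
        raise ValueError("cannot trim more bases than the segment contains")
    kept_left = left[:len(left) - trim_left]   # bases lost from the 3' end
    kept_right = right[trim_right:]            # bases lost from the 5' end
    return kept_left + n_region + kept_right


def junction_variants(left: str, right: str, max_trim: int) -> Set[str]:
    """All distinct joins obtainable by trimming 0..max_trim bases per end."""
    return {
        coding_join(left, right, a, b)
        for a in range(max_trim + 1)
        for b in range(max_trim + 1)
    }


# Toy coding ends (hypothetical sequences).
print(coding_join("ACGT", "TTGA", trim_left=1, trim_right=1, n_region="GG"))
print(sorted(junction_variants("ACGT", "TTGA", max_trim=1)))
```

Even with at most one base trimmed per end, the toy segments yield several distinct junctions, and different trimming events can converge on the same sequence, so the original break points cannot always be inferred from a join.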
V(D)J recombination can either delete or invert intervening DNA by the same mechanism, the outcome depending on the orientation of the segments involved (16). Inversional joining was demonstrated experimentally in a recombination substrate (7) but, based on structural studies of Ig gene organization, has been confirmed to occur frequently within endogenous Ig κ L chain and TCR loci (18-22). In support of the notion that inverted and deletional joining involves a common V(D)J recombination mechanism, analyses of extrachromosomal circular DNA in the thymus confirmed the prediction that deletional joining events would generate circles with a novel joint consisting of fused heptamers (23-26). Fusion of signal sequences is essential to maintain chromosomal integrity during inversional joining, but there is no clearly obvious advantage for ligating signals in deleted DNA; the latter process may simply reflect the common mechanism.
[Figure legend fragment: Consensus recombination recognition signals that are separated by a 12-base [...] respectively. The assembly process is ordered so that DH-to-JH rearrangement occurs first, followed by appendage of a VH segment to this DHJH complex (49).]
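The dependence of outcome on segment orientation can be illustrated with a toy locus model. In this sketch (all names and the strand convention are assumptions made for illustration), a locus is a list of (label, strand) elements, and a signal sequence with strand +1 has its heptamer on its left side, next to the coding segment it flanks; strand -1 means the heptamer is on its right side.

```python
from typing import List, Optional, Tuple

Element = Tuple[str, int]  # (label, strand), strand is +1 or -1


def invert(segment: List[Element]) -> List[Element]:
    """Reverse the order of the elements and flip each strand."""
    return [(label, -strand) for label, strand in reversed(segment)]


def recombine(locus: List[Element], i: int, j: int
              ) -> Tuple[List[Element], Optional[List[Element]]]:
    """Recombine the signal sequences at indices i < j.

    Returns (chromosome, excised_circle); the circle is None for inversion.
    Cuts are placed at the heptamer side of each signal sequence.
    """
    if not 0 <= i < j < len(locus):
        raise ValueError("need 0 <= i < j < len(locus)")
    strand_i, strand_j = locus[i][1], locus[j][1]
    cut1 = i if strand_i == 1 else i + 1
    cut2 = j if strand_j == 1 else j + 1
    left, middle, right = locus[:cut1], locus[cut1:cut2], locus[cut2:]
    if strand_i != strand_j:
        # Opposite orientations: the DNA between the cuts is deleted.
        return left + right, middle
    # Same orientation: the DNA between the cuts is inverted in place.
    return left + invert(middle) + right, None


deletional = [("V", 1), ("RSS_V", 1), ("intervening", 1), ("RSS_J", -1), ("J", 1)]
inversional = [("V", 1), ("RSS_V", 1), ("intervening", 1), ("J", -1), ("RSS_J", 1)]
print(recombine(deletional, 1, 3))
print(recombine(inversional, 1, 4))
```

In the deletional case the chromosome keeps only the coding join and the excised element carries both signal sequences; in the inversional case both the coding join and the back-to-back signal join remain on the chromosome, consistent with the point that signal fusion is needed for chromosomal integrity during inversion.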

Recognition Sequence Requirements for a V(D)J Recombination Event
Within V(D)J recombination substrates, joining was found to occur between recognition signals flanked by a very limited amount of adjacent sequences (7-9, 27-30); most strikingly, site-specific joining was found to occur between synthetic oligonucleotides that represent only recognition elements separated by appropriate spacers (31). The latter results confirm that these sequences serve to target the V(D)J recombinase and further demonstrate that this activity can operate on the consensus nonamer and heptamer elements in the absence of other sequences. However, it should be noted that, with respect to normal physiology, other elements may be necessary for increasing the efficiency of V(D)J recombinase action (see below). The exact sequence of the heptamer and nonamer can vary, but certain bases are generally conserved (31, 32); modifications of oligonucleotide substrates confirmed the critical bases within the heptamer (Fig. 2) and also further supported the prediction that recombination does not occur between sets of recognition elements that are separated by spacers of the same length (31). Joining that may have violated the 12/23 rule has been observed in the endogenous TCR α J region, but neither the frequency nor mechanism of such rearrangements is known (33). Recombination apparently occurs between certain endogenous elements in which one partner consists of a heptamer not accompanied by a nonamer (34-40). Although some of these rearrangements appear to occur infrequently (39, 40), others seem to occur at a frequency that is significant relative to that observed between two complete sets of recognition elements (36-38, 41). Regardless, the presence of an appropriately spaced nonamer appears to greatly enhance the frequency at which V(D)J recombination can occur and may also act as a restricting element in the context of the 12/23 rule (42).
If heptamer and nonamer elements are recognized by separate factors, these effects might be mediated by cooperative binding between these factors, as is characteristic of many transcriptional regulatory proteins (81). Cooperative binding has been described that can occur only when the two binding sites are separated by integral numbers of turns of the DNA helix (82). Related mechanisms might account for the importance of spacer pairing in V(D)J recombination.
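The heptamer-spacer-nonamer layout discussed in this section lends itself to a simple sequence scan. The sketch below is an assumption-laden illustration: the consensus strings CACAGTG and ACAAAAACC are the widely cited consensus sequences and are not given in this text, the mismatch tolerances are arbitrary, and only one strand is scanned.

```python
from typing import List, Tuple

HEPTAMER = "CACAGTG"    # widely cited consensus (assumption, not from this text)
NONAMER = "ACAAAAACC"   # widely cited consensus (assumption, not from this text)


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_signals(seq: str, max_hept_mm: int = 1,
                 max_non_mm: int = 2) -> List[Tuple[int, int, int]]:
    """Return (heptamer_start, spacer_length, spacer_class) for each hit."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(HEPTAMER) + 1):
        hept = seq[start:start + len(HEPTAMER)]
        if mismatches(hept, HEPTAMER) > max_hept_mm:
            continue
        for nominal in (12, 23):
            for spacer in (nominal - 1, nominal, nominal + 1):
                non_start = start + len(HEPTAMER) + spacer
                non = seq[non_start:non_start + len(NONAMER)]
                if len(non) == len(NONAMER) and mismatches(non, NONAMER) <= max_non_mm:
                    hits.append((start, spacer, nominal))
    return hits


# Toy sequence with one signal sequence carrying a 12-bp spacer.
toy = "GGG" + HEPTAMER + "T" * 12 + NONAMER + "GGG"
print(find_signals(toy))
```

Setting max_non_mm to the nonamer length would accept heptamer-only partners of the kind described above, at the cost of many more spurious hits.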

Formation of Coding and Signal Sequence Joins
Numerous experiments have confirmed that base loss is frequent at coding joins and exceedingly rare with respect to signal joins (7, 14-16, 23-26, 43-46), indicating that signal join partners are not acted on by the mechanisms that remove nucleotides from coding segment ends. The major difference between reported Ig H and L chain coding joins is the frequent appearance of N regions (not associated with D segments) in the former but not the latter. Considerable evidence now supports the proposal that N regions are inserted by TdT. Structurally, N regions are usually GC-rich, consistent with the known activity of TdT. Furthermore, the presence of N regions in joined endogenous and introduced chromosomal elements closely correlates with TdT expression (8, 27, 28, 47), and a pre-B cell line that rarely inserts N regions can be induced to do so at higher frequency by introduction of a TdT expression vector (48). Thus, H and L (and TCR) chain gene assembly are the same mechanistically, the qualitative difference in the two joints probably reflecting the relative participation of TdT (16). In this regard, H chain variable region genes are generally assembled before L chain variable region genes (49) and where examined TdT has been found to be expressed primarily at the earlier (H chain rearranging) stages of B cell differentiation (16, 48). Thus, the relative frequency at which N regions are added to H and L chain variable region joins could be controlled by regulating the expression of TdT at the stage when the two types of joining events occur (16).
N regions (GC-rich) were found within signal joins of extrachromosomal substrates and correlated with TdT expression in host cell lines (46). In general, chromosomal signal joins that have been isolated did not contain inserted nucleotides; however, most were L chain signal joins and therefore occurred in cells that probably did not express TdT. GC-rich inserts that did not correspond to germ line coding sequences have been observed in one L chain signal join (50) and in a number of TCR signal joins (20, 22, 24). Thus, the relative frequency of N region addition into chromosomal coding and signal joins is unknown. However, in extrachromosomal substrates N regions occur more frequently in coding than in signal joins, but these particular coding join insertions did not correlate well with TdT expression and were not GC-rich (46). Mammalian cells efficiently join DNA ends, often inserting nucleotides by a TdT-independent pathway without a preference for G and C residues (51). Many of the extrachromosomal coding join inserts are more consistent with these latter types of structures than with TdT-mediated insertions (46). It should be emphasized that in general the presence of N regions at recombination sites does not unequivocally demonstrate that joins were mediated by V(D)J recombinase (or TdT), although factors such as length and base composition may be indicative (51).
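Because base composition is cited as one indicator of how a junctional insert arose, a GC fraction is the obvious first quantity to compute for an insert. The helper below is a minimal sketch with a hypothetical name; the text gives no numerical cutoff, so none is applied here.

```python
def gc_fraction(insert: str) -> float:
    """Fraction of G and C residues in a junctional insert."""
    insert = insert.upper()
    if not insert:
        raise ValueError("insert must contain at least one base")
    return sum(base in "GC" for base in insert) / len(insert)


# Hypothetical inserts, for illustration only.
for candidate in ("GGCA", "ATTA"):
    print(candidate, gc_fraction(candidate))
```

As the text cautions, a GC-rich insert is suggestive of TdT involvement rather than proof of it, so a value like this would only be one line of evidence alongside insert length and TdT expression data.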
Recent experiments have demonstrated that the V(D)J recombinase does not appear to be absolutely constrained with respect to juxtaposition of ends for ligation; "hybrid" joins, in which a signal heptamer is joined to the coding end of the other partner (Fig. 3), may occur at a significant frequency (about 10%) relative to normal joins in both extrachromosomal and integrated recombination substrates (62, 63). It is not yet clear how often hybrid joins occur in vivo.
Although potential examples of endogenous hybrid joins exist (34, 57, 64, 65), relatively few such structures have been identified. Some hybrid joins could conceivably contribute to antibody diversity.

The scid Defect Involves the V(D)J Recombinase
scid mice do not generate functional B or T lymphocytes due to a defect in the V(D)J recombinase activity shared by these two lineages (10). Analyses of scid mice and lymphoid cell lines derived from them demonstrated that not only are signal and coding segment ends processed differently during V(D)J recombination but that they are also joined together by distinct mechanisms. scid pre-B and pre-T cells assemble grossly aberrant rearrangements at respective Ig and TCR loci (10), but these mice lack obvious signs of a general defect in DNA repair or ligation (52, 53). Chromosomal scid coding joins involve appropriate partners but generally result in either deletion of both recombining segments or joining of one to sequences distal to the other ((10, 54-58) Fig. 3). In contrast, in scid pre-B cells endogenous signal joins are normal in most respects and are formed at an approximately normal efficiency, but the corresponding coding joins once again are abnormal ((58) Fig. 3). Within extrachromosomal substrates scid pre-B cells also efficiently form signal joins that are nearly normal in structure (except for loss of a few bases in several) but are unable to form coding joins even though these cells can join free DNA ends efficiently (45). The scid defect therefore does not greatly impair V(D)J recombinase activities that recognize, site specifically cut at, juxtapose, and ligate recombination recognition signal sequences; its impact is largely restricted to coding join formation. Because the scid V(D)J recombinase appears to be unable to join coding segment ends at a significant rate, it has been proposed that aberrant scid chromosomal coding joins derive from "rescue" of an otherwise lethal double strand break by an illegitimate recombination mechanism (56).
Joins that are analogous in structure to aberrant scid coding joins have also been identified in B lineage cells from normal mice (59, 60) and in chromosomal translocations that are characteristic of certain lymphoid malignancies (61). Apparently, at a certain frequency, coding segment ends may "escape" the V(D)J recombinase in normal lymphoid cells and undergo recombination with other chromosomal sequences by an alternative pathway.

Identification of V(D)J Recombinase Components
The V(D)J recombinase activity has been generally believed to consist of multiple components (68). Yet, surprisingly, this activity has been generated in a fibroblast cell line by transfection of genomic DNA (69). Sequences that confer the activity are contained within a 40-50-kilobase segment of transfected DNA, making it unlikely that they include two unlinked genes (69). The transfected region conceivably might encode a protein that together with factors present in the fibroblast can perform V(D)J recombination or else may encode a regulatory factor that activates expression of V(D)J recombinase components (69). In either case, this experiment may represent a significant step toward isolation of genes that encode these components. In addition, recent studies have detected factors that bind specifically to heptamer or nonamer signal sequences (66, 67). Efforts to purify these factors or to isolate the genes that encode them are in progress.

Factors That Direct V(D)J Recombination in Vivo
During B and T lymphocyte differentiation Ig and TCR variable region genes are assembled by an ordered and regulated process, apparently to ensure that these cells each express an appropriate set of receptor polypeptides (2, 6). However, in a pre-B cell line that assembles endogenous Ig H chain genes only, TCR D and J or Ig L chain variable region gene segments within a recombination substrate recombine frequently, apparently because the introduced segments are "accessible" for recombination (27, 28). These experiments provided the first experimental evidence indicating that assembly of Ig and TCR variable region genes is controlled by modulation of their "accessibility" to a common V(D)J recombinase (72) rather than by expressing segment-specific recombinase activities. Several germ line variable region gene segment loci are transcribed when they undergo rearrangement (6); these events may either directly or indirectly reflect control of recombinational "accessibility."
Although transcription has been associated with enhancement of joining within a recombination substrate (28), the molecular mechanisms that determine recombinational "accessibility" have not yet been defined. In one recent set of transgenic experiments, it was clearly demonstrated that a transcriptional enhancer element was required for access of the recombinase to the substrate gene segments (73). It remains to be seen whether these mechanisms "open" Ig and TCR loci in a "nonspecific fashion" (e.g. to multiple enzyme systems) or whether they specifically influence V(D)J recombinase activity.
Assembly of the gene segments within a particular locus may also be an ordered process. This is best exemplified by the assembly of an Ig H chain variable region gene; in general, a DH segment is first joined to the 5' end of a JH segment, followed by appendage of a VH segment to the DHJH complex ((49) Fig. 1). A similar ordering of rearrangement has been observed even within the limited confines of recombination substrates that contain V, D, and J segments and was correlated with increased nuclease sensitivity of the D and J-containing region (74), suggesting that ordered rearrangement might occur by sequential activation of regions of the locus to recombinase action. An alternative and not mutually exclusive possibility is that order is effected by a tracking mode of recombinase action. A tracking mechanism was first proposed to explain the observation that although DH segments are flanked on both 5' and 3' sides by recognition sequences compatible with DH to JH joining, joins utilizing the 5' recognition sequences (inversions) rarely occur (75). According to a tracking model, recombinase enters the locus within the JH region and "tracks" along the DNA until it binds to a set of recognition signals, this mode of action yielding a preference for proximal signal sequences (75). Additional support for a tracking mechanism has come from the observation that the most JH-proximal VH segments are used highly preferentially in VHDJH rearrangements and that the relative usage of VH segments is in inverse proportion to their distance from the JH region (76-78). In the context of such a mechanism, recombinase activity could also be directed by the modulation of recognition sites independent of the signal sequences that could mediate its "entry" into variable region loci. However, order, directionality, and segment usage preference during V(D)J recombination could be mediated by other mechanisms.
For example, differences in "accessibility" might account for preferences in VH segment usage (49,79). Directionality in DH segment recognition might be mediated by slight differences that exist between the 5' and 3' sets of recognition elements that flank these segments (17) or by other nearby sequences yet to be identified. Thus, although the V(D)J recombination event itself appears to involve recognition of only the recombination signal sequences, control of this process appears to be complex and to involve as yet uncharacterized mechanisms and factors.
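The inverse-distance relationship reported for VH segment usage can be stated as a simple normalization. The sketch below assumes a strict 1/distance weighting, which is an idealization of the reported trend; the function name, segment names and distances are hypothetical.

```python
from typing import Dict


def relative_usage(distance_kb: Dict[str, float]) -> Dict[str, float]:
    """Predicted usage fractions, proportional to 1 / distance from JH."""
    if any(d <= 0 for d in distance_kb.values()):
        raise ValueError("distances must be positive")
    weights = {name: 1.0 / d for name, d in distance_kb.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical segments and distances (illustration only).
usage = relative_usage({"VH_proximal": 100.0, "VH_distal": 200.0})
for name, fraction in usage.items():
    print(name, round(fraction, 3))
```

A segment twice as far from the JH region is predicted to be used half as often under this idealization; departures from that prediction in real data would point toward the alternative explanations mentioned above, such as differences in accessibility.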