V(D)J recombination is the process that generates the diversity among T cell receptors and is one of three mechanisms that contribute to the diversity of antibodies in the vertebrate immune system. The mechanism requires precise cutting of the DNA at segment boundaries followed by rejoining of particular pairs of the resulting termini. The imprecision of aspects of the joining reaction contributes significantly to increasing the variability of the resulting functional genes. Signal sequences target DNA recombination and must participate in a highly ordered protein–DNA complex in order to limit recombination to appropriate partners. Two proteins, RAG1 and RAG2, together form the nuclease that cleaves the DNA at the border of the signal sequences. Additional roles of these proteins in organizing the reaction complex for subsequent steps are explored.
The adaptive immune system of vertebrate animals achieves its protective goals largely through the recognition of target antigens by antigen-binding proteins. These antigen-binding proteins belong to two families: the T cell receptors (TCRs), expressed by and retained on the surface of T cells, and the immunoglobulins (Igs), expressed by B cells and maintained as both cell surface receptors and secreted circulating proteins (antibodies). The overall structures of TCRs and Igs are conserved and are, in both cases, largely composed of repeated domains of the so-called immunoglobulin fold. One special property distinguishes these molecules from any other proteins in vertebrates. In addition to the conserved domains, these molecules also contain loops of peptide near their N-termini that fold into an antigen-binding pocket. The primary sequence of these peptide loops endows each pocket with selective specificity for binding to a particular antigen. In general, a given receptor molecule is very specific in binding to a target. To allow recognition of many antigens, the immune system creates a large population of receptor molecules that differ in the primary sequence of their antigen-binding regions. This population is called the ‘immunologic repertoire’ and is formed by a process of diversification and selection quite different from the way other gene products are produced in our cells.
For most genes, two copies per locus are sufficient. Regulation is largely concerned with turning these genes on or off at the right time and in the right cells. Even given the additional complexity introduced by alternative processing pathways, the number of different gene products that are produced from one gene cannot exceed a small number. However, the number of specificities needed by TCR or Ig molecules vastly exceeds this. The total diversity anticipated in a human is estimated at around 10¹⁴ for Ig and a whopping 10¹⁸ for TCR molecules [see Janeway et al (1), figure 4.34]. Clearly, with only approximately 10⁵ genes in our entire genome, this kind of diversity cannot be achieved simply by inheriting genes preformed for each specificity. Furthermore, inheriting preformed genes suffers from the additional difficulty of a lack of plasticity. For example, the entire species would be continually at risk from variant viruses that were not recognized by an inherited repertoire. In actuality, when new viral variants do arise, we usually respond to them with new antigen receptors.
Both of these challenges are met by clever DNA sequence modification strategies that allow the immune system to generate a large, expressed repertoire from a much more limited, inherited germline set of sequences. T and B cells each use a site-specific DNA recombination mechanism called V(D)J recombination to cut and rejoin segments of DNA that, when assembled, encode the N-terminal variable portion of the TCR or Ig molecules. This is the portion of the antigen receptors that binds antigen. Elaborating upon the mechanism that coordinates this recombination will occupy most of this review. In addition, two other mechanisms, unique to B cells, further diversify the Ig gene products. One is called ‘Class’ or ‘Isotype’ switching, which is an independent (region targeted) recombination pathway that, upon induction, replaces one gene segment encoding a constant region with another. This allows a B cell (or its descendants) to utilize the same antigen-binding specificity in the context of different constant regions with important immunologic consequences. The mechanism of this reaction is yet to be fully determined, but two recent reviews describe intriguing links between this pathway and mismatch repair, non-homologous end joining (NHEJ) and mRNA processing (2,3). The third diversity-generating pathway present in B cells is termed ‘Somatic Hypermutation’, in which seemingly random single base changes are directed at high frequency into the gene segments encoding the antigen-binding pocket. The consequence is a proliferation of sequence changes that alter the specificity or affinity of an Ig already selected by antigen. If the resulting mutated version of the Ig displays a greater affinity for antigen, the new version of the gene is preferentially selected. Again, the mechanism is still uncertain but candidate error-prone or mutagenic reactions have been suggested (4–6), perhaps also associated with double-strand DNA breaks (7,8).
Much of the diversity in both TCR and Ig proteins is generated by the combinatorial recombination at the DNA level of inherited gene segments. Both TCR and Ig function as heterodimers, and each polypeptide is composed of a variable region [assembled by V(D)J recombination] and a constant region (joined to the variable region by conventional mRNA splicing). The variable portion of one polypeptide in each heterodimer (Ig heavy chain and TCR β or δ chains) is assembled from three elements named ‘V’, ‘D’ and ‘J’ segments. The partner polypeptide (one of the Ig light chains κ or λ, or TCR α or γ) is assembled from V and J segments only. In mice and humans these segments are usually inherited as tandem clusters of related elements (Fig. 1A). The organization has evolved and is quite different among phylogenetically older vertebrates such as sharks. A detailed discussion of the origin of this recombination system and the evolution of the chromosomal organization of the various rearranging loci has recently appeared (9 and references therein, 10).
A brief discussion of the reaction steps in V(D)J recombination is necessary to introduce the key concepts, but these have been reviewed extensively. The reader is referred elsewhere for a comprehensive introduction to the process from the perspective of the substrate loci and the recombinant products (11,12). The key terminology, elaborated in Figure 1, is that the various ‘coding’ segments (drawn as colored vertical bars) are physically cut and joined together. The cuts occur in two steps, first by nicking a particular DNA strand adjacent to the heptamer, followed by cleavage of the second strand to yield a terminal hairpin on the coding DNA. The ligation of two coding segments forms a ‘coding joint’, which is deliberately imprecise. Small deletions, random insertions and occasional small palindromic additions are common at these junctions and arise through the processing of the intermediate hairpin end shown in Figure 1C. The junctional diversity greatly increases the yield of different receptor molecules, which otherwise would be limited to the multiplicative product of the number of segments. The DNA ligation step is dependent on proteins already known to participate in the repair of other DNA double-strand breaks, the so-called NHEJ pathway (reviewed in 13–19).
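The statement that the repertoire would otherwise be limited to the multiplicative product of the number of segments can be illustrated with a short calculation. The sketch below is only an illustration: the segment counts are hypothetical placeholders rather than values taken from this review, and the function and variable names are invented for the example. Only the estimate of 10¹⁴ for total Ig diversity comes from the text.

```python
from math import prod


def combinatorial_diversity(segment_counts):
    """Return the number of segment combinations for one chain.

    With one segment chosen from each cluster, the count is simply the
    product of the cluster sizes.
    """
    return prod(segment_counts.values())


# Hypothetical cluster sizes, chosen only to illustrate the arithmetic.
heavy_chain = {"V": 50, "D": 25, "J": 6}
light_chain = {"V": 40, "J": 5}

heavy_combinations = combinatorial_diversity(heavy_chain)
light_combinations = combinatorial_diversity(light_chain)

# Pairing any heavy chain with any light chain multiplies the two totals.
paired_combinations = heavy_combinations * light_combinations

# Estimated total Ig diversity quoted in the text.
ESTIMATED_IG_DIVERSITY = 10**14

# Factor left to be supplied by junctional imprecision and other mechanisms.
shortfall_factor = ESTIMATED_IG_DIVERSITY / paired_combinations

print(heavy_combinations, light_combinations, paired_combinations)
print(f"shortfall factor: {shortfall_factor:.2e}")
```

Under these assumed counts, segment choice alone falls many orders of magnitude short of the estimated repertoire, which is the gap that imprecise coding joints help to close.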
The cleavage is directed to a precise location by the presence of recombination signal sequences (RSSs). These are represented by triangles in Figure 1. The RSS is composed of conserved heptamer and nonamer elements separated by an intervening spacer of fixed length. The consensus sequence is shown in Figure 1D, but variations are frequent, and deviation from an optimal sequence may modulate the efficiency with which particular sites are used (20), perhaps through the intervention of additional protein factors (21). The RSS elements have evolved over time (22). There are two classes of RSS, which are distinguished by the length of their spacer regions. Spacer length centers on 12 or 23 nt and the resulting signals are referred to as 12-RSS or 23-RSS (Fig. 1, white and black triangles, respectively). The existence of the two classes helps solve a potential pitfall in the recombination system. While it is desirable to allow recombination between, for example, any one V segment and any one D segment, it would serve no useful purpose to recombine two V segments with each other. This is avoided by organizing the RSS elements such that only one length class is used within a cluster of segments. In the case of the mouse Ig heavy chain locus, each of the V regions is associated with 23-RSS elements. The D regions in this locus are flanked on both sides by 12-RSS elements, thus allowing recombination to occur both upstream and downstream. Finally, the J segments of this locus utilize purely 23-RSS signals (Fig. 1B). The mechanism of recombination incorporates a ‘12/23 rule’ (23), which specifies that recombination be permitted only between segments of complementary RSS length. This allows V to D and D to J recombination but neatly prevents the undesired occurrences. D segments contribute only short lengths of coding information to the assembled polypeptide, typically from two to nine amino acids.
True devotees will appreciate the totally different configuration of RSS elements at the mouse TCR δ locus. The D elements there, in contrast, are flanked by one 12-RSS and one 23-RSS, permitting D to D recombination and resulting in both VDJ and VDDJ products. Both result in functional molecules [see Lewis (11), figure 1]. The molecular implementation of the 12/23 rule is not fully determined, but will be addressed later.
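The logic of the 12/23 rule, as applied to the two loci just described, can be written as a small check. This is a minimal sketch of the rule as a bookkeeping constraint, not a model of the molecular mechanism; the function name and data layout are invented for the example, and the spacer assignments are those given in the text for the mouse Ig heavy chain locus and for the D elements of the TCR δ locus.

```python
def obeys_12_23_rule(spacer_a, spacer_b):
    """Return True when one signal is a 12-RSS and the other a 23-RSS."""
    return {spacer_a, spacer_b} == {12, 23}


# Mouse Ig heavy chain locus: V segments carry 23-RSS, D segments are
# flanked on both sides by 12-RSS, and J segments carry 23-RSS.
IGH_SPACERS = {"V": 23, "D": 12, "J": 23}

# Mouse TCR delta D elements: one flank is a 12-RSS and the other a 23-RSS.
TCRD_D_FLANKS = (12, 23)

results = {
    "IgH V-D": obeys_12_23_rule(IGH_SPACERS["V"], IGH_SPACERS["D"]),
    "IgH D-J": obeys_12_23_rule(IGH_SPACERS["D"], IGH_SPACERS["J"]),
    "IgH V-V": obeys_12_23_rule(IGH_SPACERS["V"], IGH_SPACERS["V"]),
    "IgH V-J": obeys_12_23_rule(IGH_SPACERS["V"], IGH_SPACERS["J"]),
    "IgH D-D": obeys_12_23_rule(IGH_SPACERS["D"], IGH_SPACERS["D"]),
    # One D contributes its 12-RSS flank, the other its 23-RSS flank.
    "TCRd D-D": obeys_12_23_rule(TCRD_D_FLANKS[0], TCRD_D_FLANKS[1]),
}

for pairing, allowed in results.items():
    print(f"{pairing}: {'allowed' if allowed else 'forbidden'}")
```

The check reproduces the pattern described above: V to D and D to J joining pass at the heavy chain locus, V to V joining fails, and D to D joining passes only where the D elements carry one signal of each class.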
While the joining of coding sequences, as portrayed in Figure 1C, creates the assembled receptor gene, it is important for several reasons that the DNA cleaved from the coding sequence, which carries the RSS elements, also be rejoined. This DNA is shown as the circular product in Figure 1C. The cleavage reaction leaves blunt double-strand breaks at the border of the heptamer (24,25), and these are typically joined to each other in precise heptamer to heptamer juxtaposition. These so-called ‘signal joints’ also use the NHEJ pathway, but are distinguishable from coding joints. Signal joints only rarely lose or gain bases and do not require the protein DNA-PKcs (26,27) for their formation. This probably reflects the additional processing demanded by the coding ends. Forming signal joints serves two apparent purposes. First, it prevents the undesirable integration of these DNA ends into other locations as studied in vitro (28–34), and indirectly, in vivo (35,36). The second need for signal joint formation reflects other topologies of the reaction. In the typical rendering of the reaction, as shown in Figure 1C, the opposing orientation of the RSS elements creates a deletion circle. This is the most common configuration in natural loci. However, functional recombination does occur among gene segments in which the participating RSS elements are in direct orientation. Under this circumstance, rearrangement demands inversion of the DNA between the RSS elements rather than deletion, and signal joints are required for chromosomal integrity.
The molecular biology of V(D)J recombination experienced a revolution with the isolation of the two genes RAG1 and RAG2 (37,38). The name is derived from the acronym of ‘recombination activating genes’ and they remain the only identified tissue-specific genes required for V(D)J recombination. Extracts containing the encoded proteins, and later, purified versions of these two proteins, are able to cut DNA containing RSSs (39,40). Truncated versions of both RAG proteins are sufficient to catalyze the complete reaction and have proved to be more tractable for expression and purification (41–44).
Biochemical analysis of the recombination pathway is subject to the same fate as the proverbial blind investigators describing an elephant. Each assay only recognizes part of the total. Table 1 lists the many functions to which the recombinase should contribute. Not every step need be an explicit function of the RAG proteins. The table is divided into two categories: reactions operating on one DNA substrate and therefore only requiring one set of interactions, and reactions that demand information about the other DNA and therefore require a more elaborate complex. These steps will be addressed in the discussion that follows.
The mouse RAG1 protein, as a reference sequence, contains 1040 residues (119 kDa). A previous sequence alignment of five species (41) revealed a central core (residues 384–1008) with a higher degree of conservation than the surrounding sequence. In Table 2, the amino acid alignment has been updated to include more species over the region numbering 354–411 of the mouse sequence. This segment spans the N-terminal border of the RAG1 core to show how striking the evolutionary conservation is between sequences within and outside the core. In this PSI-BLAST (45) alignment, certain residues show absolute conservation and are listed on the top line. For example, the left half of Table 2 shows complete preservation of two cysteine and two histidine residues that form part of a zinc-finger motif (46,47). This motif is believed to participate in RAG1 dimerization, but it does not appear essential for this function since the RAG1 core region dimerizes on its own (48,49). Except for the direct metal-binding amino acids, the N-terminal region shows a tolerance for sequence variation, including insertions, when compared to the mouse sequence. The conservation of sequence becomes almost absolute upon entering the core region and this property extends to the C-terminal end of the core, mouse residue 1008, with only a few islands of variation. The core region of RAG1 is sufficient (along with RAG2) to perform all the known enzymatic functions, and to catalyze the complete recombination of test substrates in cells. The core region alone also localizes to the nucleus and has a short half-life, as does full-length RAG1 (41). However, the N-terminal third of the protein clearly has functional significance. Restoring parts of this region to the core has increased the efficiency of the complete recombination reaction using test substrates (50–52).
Naturally arising human RAG1 or RAG2 mutations fall into two classes. Null mutants of either protein result in complete immunodeficiency, but other mutations, which seem to retain low levels of activity, lead to the immunodeficiency called Omenn syndrome (53). The recent finding that some patients demonstrate complete failure of Ig rearrangements but retain some level of TCR rearrangements (54–57) suggests that there may be additional regulatory functions to the RAG proteins that are not identical in the different cell environments of B and T cells, despite the fundamental enzymatic similarity of the recombination reactions.
A basic sequence beginning 389-GGRPR can be found near the start of the core region (Table 2). Deletion of the region from residues 389–486 has been shown to interfere with DNA binding. Thus, this region is commonly termed the NBD (for nonamer-binding domain) although only indirect evidence strictly links these amino acids with that DNA contact (34,58,59). One might anticipate that several other parts of the molecule also contribute to DNA binding. Site-directed mutagenesis has identified three conserved acidic amino acids, D600, D708 (60–62) and E962 (60), that are each essential for the nuclease activity. This triad and some evolutionary considerations support the proposal that the RAG proteins share a mechanism with bacterial transposases and HIV integrase (34 and references therein). One would expect that if these amino acids form the catalytic active site, they also must achieve close contact to the DNA. In support of these additional DNA interactions, mutations near the D600 site (substitutions and a deletion in residues 606–611) affect the target specificity of recombination (63–65). Direct contact between RAG1 and DNA has also now been assigned to an N- and C-terminal region of the core protein (66), which will be further discussed below.
The mouse RAG2 protein is 527 residues in length and, like the RAG1 protein, possesses a core region (1–382) that is sufficient to evoke the complete recombination of test substrates in cells (43,44). However, in contrast to RAG1, the core does not display the same coincidence of sequence conservation with enzymatic function. The C-terminal quarter of the molecule includes a striking acidic region and, as is the case for the N-terminal region of RAG1, may contribute a regulatory role to the reaction. It is reported that a certain stage in B cell rearrangement, the joining of heavy chain V region to the already assembled DJ segment, requires the C-terminal region of RAG2 (67). The authors of this last study suggest a role of RAG2 in governing access to particular loci. Mutations in RAG2 can also lead to Omenn immunodeficiency syndrome and the location of these mutations has been interpreted to support the prediction of a structural model for the RAG2 core (53,68–70).
The direct demonstration that the two RAG proteins together were capable of cleaving a DNA target at the border of the heptamer (40) (Fig. 1D) settled for many the question of whether double-strand breaks in the DNA were a reaction intermediate. Since then, there has been progress in analyzing the interactions between the proteins and substrate. It is now clear that the two proteins cooperate in binding, and both are required for each step of cleavage. RAG1 protein alone exhibits DNA binding with some intrinsic affinity for the nonamer versus random DNA (58,71–73). There is a difference in reports of whether RAG2 alone binds DNA. Mo et al. (73) detected DNA binding by RAG2 alone under different binding conditions and with a lower concentration of non-specific competitor than that used by other investigators. All agree that together both proteins exhibit a higher affinity, specificity and stability in assembling a complex on DNA containing an RSS. It is worth noting, however, that the specificity of these two proteins binding to the target over competitor appears to be ~10–20-fold. This would not be sufficient to prevent frequent recombination activity at undesired locations, such as sequences resembling an RSS in unrelated genes (74).
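Why a 10–20-fold preference is modest can be seen from a simple competition estimate. The sketch below assumes a deliberately crude equilibrium picture in which every site competes independently and binding is far from saturation; the function name and the ratio of one target per 1000 competitor sites are hypothetical choices made for illustration, not measurements from the cited work.

```python
def fraction_on_target(fold_specificity, n_target_sites, n_competitor_sites):
    """Estimate the share of bound protein that sits on true target sites.

    Each target site is weighted by the fold preference over a competitor
    site; competitor sites carry a weight of 1.
    """
    weighted_target = fold_specificity * n_target_sites
    return weighted_target / (weighted_target + n_competitor_sites)


# Hypothetical scenario: one RSS among 1000 non-specific competitor sites.
for fold in (10, 20):
    share = fraction_on_target(fold, n_target_sites=1, n_competitor_sites=1000)
    print(f"{fold}-fold specificity: {share:.1%} of bound protein on target")
```

In this toy scenario most bound protein would reside on competitor DNA, which is consistent with the point that binding specificity alone is unlikely to confine recombination to appropriate sites.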
Studies using oligonucleotide cleavage assays and either mutating the RSS or modifying the DNA structure with single-stranded regions (75,76) indicate that, in addition to base-specific sequence recognition, there is a structural component that favors unpairing of the DNA at the border of the coding region and the heptamer. This is where the hairpin DNA structure forms during cleavage. Distortion of the helical geometry of the DNA at this position in a complex containing both proteins was also detected by footprinting assays (72,77). This distortion might be accompanied by bends in the helical axis, and, in fact, such bends were detected (78,79). Additional footprinting and crosslinking studies provide further details of the interactions of each protein with the DNA. RAG1 was found to make direct contacts at both the nonamer and the heptamer (the latter only in the presence of RAG2) using crosslinking strategies (49,73,80). Footprinting RSS substrates and modification/protection strategies definitely indicate contacts by the proteins at the nonamer and heptamer, and suggest protection over the spacer of the 12-RSS. This last point is subtly contrasted by our own work using crosslinking assays (79), which indicated a much reduced interaction between the RAG proteins in the spacer region. We also detected enhanced binding of the HMG1 protein (which bends DNA and binds more stably at sites that are already distorted) at the 3′ side of the heptamer in the 12-RSS and within the spacer of the 23-RSS. We suggest that the DNA is severely bent between the heptamer and nonamer with the spacer regions looping away from the RAG proteins. This allows similar protein–DNA interactions to occur at the heptamer and nonamer on both types of RSS, with the difference in sequence length accommodated by the size of the loop (Fig. 3).
This view is consistent with the previous observation that the length of the spacer appears to demand an integral number of full helical rotations but can tolerate an extra full rotation to 34 bp (76,81).
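The relationship between spacer length and helical rotation can be checked with simple arithmetic. The sketch below assumes an average of 10.5 bp per turn for B-form DNA, a textbook approximation that is not taken from the studies cited here, and the function name is invented for the example. It reports how far the nonamer would be rotated around the helix axis, relative to its position with a 12 bp spacer, as the spacer length changes.

```python
BP_PER_TURN = 10.5  # assumed average helical repeat of B-form DNA


def phase_shift_degrees(spacer_bp, reference_bp=12):
    """Rotation of the nonamer relative to the reference spacer length.

    The result is folded into the range [-180, 180), so values near 0 mean
    the heptamer and nonamer keep roughly the same rotational relationship.
    """
    fraction_of_turn = ((spacer_bp - reference_bp) / BP_PER_TURN) % 1.0
    if fraction_of_turn >= 0.5:
        fraction_of_turn -= 1.0
    return fraction_of_turn * 360.0


for spacer in (12, 17, 23, 34):
    print(f"{spacer} bp spacer: {phase_shift_degrees(spacer):+.0f} degrees")
```

Under this assumption, spacers of 23 and 34 bp leave the nonamer within a few tens of degrees of its position with a 12 bp spacer, whereas an intermediate length such as 17 bp would place it on nearly the opposite face of the helix.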
Using crosslinking strategies, RAG2 was detected at several nucleotides that also contacted RAG1 (79,80). This supports the idea that RAG2 does make independent DNA contacts, although there still exists the formal possibility that RAG2 is positioned on the DNA purely by tethering through RAG1.
As mentioned above, the protein HMG1 was used in certain binding studies. HMG proteins are abundant and ubiquitous nuclear proteins found throughout eukaryotic evolution (82), and which play a structural role in chromatin. The closely-related proteins HMG1 and HMG2 were each found to stimulate the cleavage reaction with various substrates (83–85). While its effect is more pronounced on 23-RSS substrates, the presence of HMG1 stabilizes the binding and cleavage of both classes of RSS (78,79). One laboratory (78) showed a direct interaction between the HMG proteins (1 or 2) and RAG1, with a relatively modest dissociation constant of 10⁻⁵ M. Our laboratory also noted this interaction in pull-down assays (unpublished results).
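To give a feel for what a dissociation constant of 10⁻⁵ M implies, the sketch below evaluates a simple one-site binding isotherm. It assumes 1:1 binding with the HMG protein in large excess over RAG1, which is a simplification not established by the cited work; the function name and the concentrations sampled are illustrative choices.

```python
KD_MOLAR = 1e-5  # dissociation constant quoted in the text


def fraction_bound(free_ligand_molar, kd_molar=KD_MOLAR):
    """Fraction of RAG1 occupied under a simple one-site binding model."""
    return free_ligand_molar / (free_ligand_molar + kd_molar)


# Illustrative free HMG concentrations spanning four orders of magnitude.
for concentration in (1e-7, 1e-6, 1e-5, 1e-4):
    occupied = fraction_bound(concentration)
    print(f"[HMG] = {concentration:.0e} M: {occupied:.0%} of RAG1 bound")
```

Under this model, half-maximal occupancy requires roughly 10 µM free HMG protein, and occupancy falls to about 1% at 0.1 µM.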
Joining is not coupled directly to the cleavage reaction, but rather follows additional processing of the coding ends. In theory, signal ends could be joined instantly to each other, but this also does not necessarily occur (86). Once the coding DNA is detached from the adjacent heptamer, there is no longer any feature that identifies this sequence. In model recombination reactions, the coding DNA can be completely replaced by an experimentally-selected sequence (11). There are consequences to some choices, and these will be discussed further below. Rampant chromosomal breakage could occur if the broken DNA ends were to escape from each other, so the retention of coding ends in a RAG protein complex after cleavage satisfies a conceptual need. The first evidence of an interaction between RAG1 and the coding DNA was the indirect observation that mutants of RAG1 altered the efficiency of recombination as a function of the coding sequence (63,64). Subsequent crosslinking studies directly detected contacts to both RAG1 and, more weakly, to RAG2 using an iodinated base analog at the –1 position (relative to the cut site) (80), or at the –2 position, but not the –4 position, using an azido-bearing adduct attached to the phosphate backbone (79). Consistent data were also obtained independently by crosslinking and immunoprecipitation studies (49). This contact is sufficient to prevent dissociation of the coding end. A gel-shift of the complex following cleavage in vitro still retains the cleaved signal and coding DNA (48). Subsequent joining of coding ends in a cell-free assay fails if the DNA is deproteinized following cleavage (87).
Since RAG proteins constitute an endonuclease capable of nicking DNA, and since the protein is already associated with the hairpinned coding end, it is enticing to picture this activity being used a second time to open the hairpin. The purified proteins are capable of this activity (88,89) and can also act as endonucleases on 3′ flap structures (90). This does not rule out the possibility that other DNA repair pathways, such as that involving MRE-11, could also perform this function in vivo (91,92), perhaps as alternative pathways. An important issue in this regard is the desire for the processing of the coding end to create a variety of products including deletion, palindromic insertion and random insertion through cooperation with terminal deoxynucleotidyl transferase.
The sequence of the coding flank DNA, while not engaging in any base-specific interactions, may influence the efficiency and outcome of the recombination reaction in several ways. This effect was first detected in comparison to the efficiency of recombination of plasmid substrates (93). As has been addressed above, distortion leading to single-stranded bubbles at the coding end/heptamer border may precede the nicking step. Certain coding sequences, originally identified as ‘good flanks’ with respect to recombination by mutants of RAG1 (63,64), are likely to promote this process, and these were found to be better substrates for the cleavage reaction at single sites in vitro (75,76). This issue was revisited very recently (65), to show that substrates containing unpaired coding flanks rescue hairpin formation using the mutant RAG1 proteins. These particular effects on hairpin formation may not be as pronounced when measured by concerted cleavage (81), and careful work seems to distinguish effects at nicking from the subsequent second cleavage step associated with hairpin formation (94,95). This will be discussed further.
The coding end sequence seems to have additional influence at later processing steps. This could affect hairpin opening, exonucleolytic polishing of the opened hairpin and pairing of the coding ends (which may be sensitive to microhomologies) (96–98). Therefore, the designation of coding ends as good or bad may vary considerably based on the assay employed. As mentioned previously, the joining of coding ends is at some stage dependent upon the activity of DNA-PKcs. An interesting connection between signal joint formation and coding end processing was observed in pre B cells and fibroblasts (99) carrying the murine scid mutation, which results in a C-terminal truncation of that kinase (100,101). With certain homopolymeric coding sequences adjacent to both RSSs, scid cells showed a reduction in the abundance of signal joints, arguing that the two processes are linked. Conclusions about the joining reactions depend, of course, on the cleavage efficiency remaining constant across the experiment.
Theorists would support the existence of a four-end complex that retains signal ends as well as coding ends based on the observation that these ends occasionally pair indiscriminately, in violation of the usual rules (11,102,103). This implies that all four ends remain in physical proximity for some time after cleavage.
The reader has been spared certain details about reaction conditions. The previous in vitro cleavage reactions, performed on single RSS substrates, proceed to double-strand breaks in reactions using Mn²⁺ as divalent cation (39). This ion seems to relax a constraint in the nuclease that exists when cleavage is undertaken in the presence of the more physiologic ion Mg²⁺ (104). The constraint is not absolute in activity on single RSS substrates. Nevertheless, when both 12- and 23-RSSs are present, double cleavage coordinated at both sites is obtained (105,106) in Mg²⁺, especially when supplemented by either HMG1 or HMG2. A detailed review of the role of metal ions in nucleic acid chemistry is available (107), and this topic has also been addressed specifically in V(D)J recombination (34). The demonstration that cleavage at two sites can be concerted under any condition profoundly alters our understanding of the reaction pathway. It means that a protein–DNA complex must assemble at the RSS elements, but the complex must wait for permission to cleave until a partner that satisfies the 12/23 rule is identified. Formation of this complex may be a rate-limiting step (108). Bacterial transposition (109–111), retroviral integration (112) and bacteriophage λ (which assembles distinct complexes under different conditions; 113) also exhibit analogous behaviors, though not always requiring distinguishable ends. In principle, this coordination of cleavage is a very good thing. It prevents chromosomal breakage until the partner for rejoining is already identified, thereby minimizing the duration of broken intermediates. It also provides an increase in specificity since isolated RSS-like sequences may be less likely to result in chromosome breaks. The initial observation that optimal concerted cleavage required both classes of RSS (105,106) showed a few-fold preference over cleavage with only one RSS available.
Reality seems to be more complicated, providing uncharted territory for future investigation. Several investigators find that complete cleavage can be detected using substrates that only provide one RSS (81,104,114). The first cleavage step, that of nicking one strand 5′ to the heptamer, may in fact occur in the absence of any requirement for interaction with a second RSS. One recent study demonstrated this by inhibiting potential like-substrate pairing (12/12 or 23/23) by immobilizing the oligonucleotides. Kinetic analysis of cleavage of these substrates was consistent with this interpretation (95). It appears that it is actually the second step in the cleavage reaction, the formation of the hairpin, which is most sensitive to correct pairing (115). This study contends that in a strict sense, pairing can occur equally well among the various combinations of RSS substrates, but that only the 12/23 pairing promotes the reaction. One caveat might be a concern that the short tethered substrates used may impose unnatural constraints on the mechanism, though substrates with the same spacing seem to function in cells (116).
The first evidence that RAG2 participated in some manner in assisting or stabilizing RAG1-binding to DNA was acquired through a transcriptional reporter assay (58). Subsequent gel mobility shift experiments showed that one or more protein–DNA complexes that were formed were dependent on the presence of RAG2 in solution (71,72,117). Indirect assays using resistance to nucleases (118) or accessibility to terminal deoxynucleotidyl transferase (119) suggested similar conclusions. We excised the gel mobility shift bands that formed on 12- or 23-RSS-containing oligonucleotides, and showed that each contained RAG1 and RAG2 in stoichiometric equivalence (73). A study using the additive contribution of multiple copies of the maltose-binding domain to distinguish the mobility of each subunit (49) found RAG1 plus RAG2-containing complexes in both a 2:1 and 2:2 ratio. Bailin et al. (48) showed that RAG1, in solution and in the absence of DNA, existed as a protein dimer, but when isolated in the presence of RAG2, existed as a 2:2 tetramer. Bound to DNA, RAG1 in the absence of RAG2 formed an array of an increasing number of dimers, but in the presence of RAG2, formed two discrete species. In our hands, these represented the RAG1 dimer (only) and the tetramer comprised of RAG1 dimer plus two RAG2 molecules. Ferguson plot analysis suggests that this tetramer of proteins assembles on one DNA target (48). Gel filtration analysis of RAG2 showed monomer and higher forms in solution, but we are not convinced that the higher forms represent meaningful interactions. A current consistent picture is a tight RAG1 dimer with each partner associating independently with RAG2. RAG2 also either caps a surface on RAG1 or changes RAG1 conformation in a manner that prevents multimerization of the RAG1 dimer along one DNA molecule.
The peptide sequences contributing to the RAG1/RAG2 interface have been addressed in deletion and mutation studies (120,121) and are not entirely in agreement. It is, however, impossible at this time to exclude the possibility that deletions perturb the global protein architecture in a manner that would interfere with a distant protein interaction. The later study identifies one residue in RAG2 which, when mutated, interferes with complex formation with RAG1 as measured by coimmunoprecipitation and gel shift assays.
The tetramer of RAG1 and RAG2 can be viewed as the fundamental active enzyme unit in the recombination reaction. As suggested by Bailin et al. (48), one or two of these tetramers may participate in binding to the two RSSs. Very recent work (122) provides some additional insight by using mixed dimers of RAG1 in which one partner is defective in DNA binding (through replacement of 10 consecutive residues in the NBD with alanines) and the other is defective in nuclease activity (by single alanine substitution of individual critical acidic residues). The mixtures of mutations show that within the mixed RAG1 plus RAG2 complex, one RAG1 does the binding while the other provides catalysis for cleavage. New in this study is the evidence that all three acidic residues are provided from the same polypeptide, reducing the dimensions of cis–trans relationships within the dimers.
Figure 1C is drawn with the implicit assumption that a synaptic structure is constructed in two steps: first by assembling parallel complexes on the two RSSs, and second by docking these together to make a larger protein–DNA complex. If each RSS interacts with a RAG1 dimer (previously referred to as a single tetramer of RAG1 and RAG2) then the synaptic complex would represent the dimer of dimers of RAG1, plus the associated RAG2 proteins (Fig. 2A). That model would be driven forward by raising the concentration of single-RSS protein-bound intermediates. This is not the only possibility. If a complex assembling on one RSS only occupies one-half of the binding sites within the dimer of RAG1, then a second DNA interaction could occur on that same complex. This would, in essence, require the first protein–DNA complex to sample additional DNA sequences and recruit the second target sequentially into the complex (Fig. 2B). Under the simplest of binding interactions, this process might be expected to be inhibited by high protein concentrations since the free sites would be saturated with RAG complexes that might interfere with successful scanning. This pitfall might be averted by positing a mechanism to hand off the DNA from one complex to another (Fig. 2C). Distinguishing between one dimer of RAG1 or two in the synaptic complex is currently under investigation.
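The contrasting concentration dependence of the first two models can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not an analysis taken from the cited work. It assumes simple, independent one-site binding of the RAG complex to each RSS, with a hypothetical dissociation constant `kd`. It also assumes that synapsis by parallel docking scales with the product of the two occupancies, and that sequential capture needs one occupied RSS and one free RSS. The function names and numerical values are illustrative only.

```python
def occupancy(protein, kd=1.0):
    """Fraction of RSS sites bound by a RAG complex.

    Assumes simple one-site binding; kd is a hypothetical dissociation
    constant in the same arbitrary units as the protein concentration.
    """
    return protein / (kd + protein)


def parallel_docking_rate(protein, kd=1.0):
    """Relative synapsis rate when two pre-bound RSS complexes dock.

    Both RSSs must be occupied, so the rate scales with occupancy squared.
    """
    theta = occupancy(protein, kd)
    return theta * theta


def sequential_capture_rate(protein, kd=1.0):
    """Relative synapsis rate when one bound complex captures a free RSS.

    One RSS must be occupied and its partner must still be free, so the
    rate scales with occupancy times (1 - occupancy).
    """
    theta = occupancy(protein, kd)
    return theta * (1.0 - theta)


if __name__ == "__main__":
    # Print both relative rates over a range of protein concentrations.
    for protein in (0.1, 1.0, 10.0, 100.0):
        print(protein,
              round(parallel_docking_rate(protein), 4),
              round(sequential_capture_rate(protein), 4))
```

Under these assumptions the parallel docking rate rises steadily toward saturation as protein is added, whereas the sequential capture rate peaks when half of the sites are occupied and then falls. That fall corresponds to the inhibition at high protein concentration described above for the simplest sequential scheme.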
The suggestion has already been introduced that a protein complex contacts the RSS at the nonamer and heptamer and, in addition, holds the coding DNA through predominantly sequence-independent contacts. An attractive proposal is that the complex undergoes an isomerization following cleavage, which reorients the DNA ends for subsequent joining and may bias which members of the four-end complex rejoin. This isomerization, when executed correctly, would also retain the broken chromosome pieces in a single complex. One difficulty with this model would arise if all the DNA contacts occurred within a single, rigid, DNA-binding region of the protein. Reorientation of DNA ends under this circumstance would require physical dissociation of the DNA from the protein, with increased risk of escape from the complex altogether. Our recent work (66) relieves one of these difficulties. We find a trypsin-accessible site in the middle of the RAG1 core that may divide it into two separate protein domains. Using the chemical crosslinking approach, we find that the coding DNA is associated with the C-terminal region, while heptamer and nonamer make contact with the N-terminal region (Fig. 3). DNA binding would therefore be divided between two parts of RAG1, and allow reconfiguration by conformational changes without dissociation.
A minimal model of the joining reaction might suggest that once cleavage has occurred, rejoining of the ends becomes the full responsibility of the NHEJ pathway. Alternatively, the RAG proteins may remain associated with the DNA ends and play an active role in recruiting the repair proteins and directing the fidelity of the reaction. Three recent reports show that the minimal model is not sufficient (123–125). The first two reports, using different systems, appear to reach opposing conclusions regarding the effect of topology (cis versus trans) of the coding DNA at the time of joining. Additional support for a role of the RAG proteins in the later steps of the reaction comes from recent reports that mutant versions of either RAG1 (126) or RAG2 (127) permit the initial cleavage of DNA, including hairpin formation at the coding end, but fail at the stage of hairpin opening. If the same nuclease active site is used for each of these steps, as preferred, then these observations may be tied together by requiring a conformational change to expose the hairpin to the nuclease. Mutations that interfere with the conformational change would be consistent with the phenotype.
Broken chromosomes commonly induce apoptosis through p53 signaling. Clearly it benefits our cells to abstain from this response during V(D)J recombination. Nevertheless, interfering with the normal pathway does provoke apoptosis (128). One possibility is that a well-formed complex following cleavage sequesters the ends and prevents activation of any damage sensors in the cell. This simple model is not strictly true, as evidenced by a recent report that can serve as a gateway into the world of how DNA breaks are recognized by the cell (129).
The DNA recombination process that assembles V, D and J segments encoding the antigen receptors of our immune system requires a series of well-coordinated steps. At the heart of the reaction is the cleavage of the DNA at the appropriate sequences, but other steps are also necessary to generate junctional diversity and to assure the coordinated joining of the intermediates. These steps are performed by the two RAG proteins in association with other factors, some of which have yet to be identified. Overall, the reaction requires an elaborate machine with precise protein–DNA contacts and protein–protein interactions to handle the intermediates, without allowing them to diffuse away from each other.
This review benefited from discussion with others and especially the reviewers' comments, but the author apologizes for any omissions. M.J.S. is a scholar of the Leukemia and Lymphoma Society and is supported by NIH grant AI41711.