Recombinant inbred strains are currently one of the best genetic resources for exploring phenotypic variance modulated by complex mixtures of genetic and environmental factors. A renewable resource of genetically defined genomes is an important advantage in exploring gene pleiotropy, genetic correlation and reaction norms [
1,
2,
3,
7,
8,
10,
11,
12,
27]. For example, Eleftheriou and colleagues [
31] exploited the CXB set to test effects of subtle environmental differences (animals reared in Italy or at the Jackson Laboratory) on brain weight, and we have been able to revisit this same phenotype in the CXB set after an interval of 25 years. With the improved set of fully typed markers it is now feasible to map sets of QTLs under different environmental conditions, including temperature, pathogen load and food source, using RI strains. However, the modest number of RI strains, among other considerations, has hindered their widespread adoption by mammalian geneticists. To improve the utility and power of complex trait analysis and to provide a better basis for collaborative QTL mapping, we have increased marker density in several of the major sets of RI lines and have merged data from over 100 mouse RI strains using a framework based on 490 shared markers. Approximately 1,000 unique SDPs (an average of about one per 1.5 cM) have been defined and mapped in the collected set. Three to four times as many SDPs remain to be discovered in the BXN set.
At the current marker density the cumulative RI map is about 5,000 cM long, roughly 3.6 times the length of standard intercross or backcross maps. When corrected using the Haldane-Waddington equation, the RI maps have a cumulative length of 1,400 cM, perfectly consistent with those of chromosome committee reports. Further improvements in the power and utility of RI strains will rely primarily on increased numbers and genetic diversity of these strains. Prospects are good, and more than 150 new mouse RI strains are currently being produced and genotyped by several research groups (see [
32] for an updated list of investigators and new RI strains). For example, in collaboration with J.L. Peirce and L.M. Silver (Princeton University, USA), we are now producing over 40 new BXD RI strains. The first 20 lines have already been typed at over 600 markers. A set of approximately 85 RI strains, the LXS set, has recently been completed by B. Bennett and T.E. Johnson (University of Colorado, Boulder, USA) and these lines are currently being genotyped at approximately 400 markers.
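The Haldane-Waddington correction mentioned above can be illustrated with a minimal sketch. It assumes the standard sib-mating form of the equation, R = 4r/(1 + 6r), where r is the recombination fraction in a single meiosis and R is the fraction observed between the same two loci in fully inbred RI strains. The function names are hypothetical and are not taken from any mapping package; the cumulative map lengths quoted in the text come from the data, not from this code.

```python
def ri_recombination_fraction(r):
    """Expected recombination fraction R in sib-mated RI strains,
    given the single-meiosis recombination fraction r.
    Assumes the sib-mating Haldane-Waddington form R = 4r / (1 + 6r)."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    return 4.0 * r / (1.0 + 6.0 * r)


def meiotic_recombination_fraction(R):
    """Inverse correction: recover r from the RI fraction R,
    using r = R / (4 - 6R)."""
    if not 0.0 <= R <= 0.5:
        raise ValueError("R must lie between 0 and 0.5")
    return R / (4.0 - 6.0 * R)


if __name__ == "__main__":
    for r in (0.001, 0.01, 0.05, 0.20):
        R = ri_recombination_fraction(r)
        # The expansion factor R/r approaches 4 for tightly linked loci
        # and falls as the interval grows.
        print(f"r = {r:.3f}  R = {R:.4f}  expansion = {R / r:.2f}")
```

Because the expansion factor is close to four only for very short intervals, an observed cumulative expansion somewhat below four-fold is what one would expect from a map built on intervals of finite length.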
Information content of RI strain sets
Despite the accumulation of genotypes in RI strains, these genetic resources have often not been typed with sufficient density to accurately define the frequency and positions of recombination breakpoints. For example, in the venerable set of 13 CXB strains, only 11 unique SDPs had been assigned to chromosome 1 before our work. With a denser map of chromosome 1 that is now based on approximately 60 markers, we have recovered a total of 38 recombinations on this chromosome - approximately three recombinations per strain. The positions of these recombinations have been defined with a precision that ranges from 0.5 to 6.0 cM (2.3 cM average) as referenced to standard CCR maps. Twenty-one of the 38 SDPs are represented by one or more markers, but at least 17 SDPs remain to be defined and these SDPs unfortunately cannot be predicted unambiguously. For example, if two adjacent markers P and D have genotypes BBCCC and CCCCC, then there must be at least one unrecovered SDP between P and D. Until we actually type markers in the P-D interval, we do not know whether the intercalated SDP is BCCCC or CBCCC. Discovering the undefined SDP could require considerable effort, especially if available polymorphic markers in the P-D interval have been exhausted. All unrecovered SDPs lower the information content of an RI set. Their absence can significantly reduce linkage of both Mendelian and quantitative traits that are unlucky enough to be controlled by loci in the intervals with ambiguous SDPs.
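The ambiguity in the P-D example can be made concrete with a short sketch. It assumes the most parsimonious case, in which each strain that differs between the two flanking SDPs has exactly one recombination in the interval, and it enumerates every possible order in which those strains could switch genotype. The function name and the string encoding of SDPs (one letter per strain) are illustrative choices, not part of any published tool.

```python
from itertools import permutations


def candidate_sdp_paths(left, right):
    """List every chain of unrecovered SDPs that could lie between two
    adjacent observed SDPs, assuming one recombination per differing strain.

    Each SDP is a string with one allele letter per strain. The returned
    value is a list of paths; each path is the ordered list of intermediate
    SDPs (excluding the two observed endpoints)."""
    if len(left) != len(right):
        raise ValueError("SDPs must cover the same strains")
    differing = [i for i, (a, b) in enumerate(zip(left, right)) if a != b]
    paths = []
    for order in permutations(differing):
        current = list(left)
        path = []
        # The final switch reproduces `right` itself, so it is skipped.
        for strain_index in order[:-1]:
            current[strain_index] = right[strain_index]
            path.append("".join(current))
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in candidate_sdp_paths("BBCCC", "CCCCC"):
        print(path)
```

For the genotypes in the text, the two candidate paths contain BCCCC and CBCCC respectively. When k strains differ between adjacent markers there are k - 1 unrecovered SDPs and k! possible orderings under this assumption, which is why the missing patterns cannot be inferred without typing additional markers.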
How dense should a marker map be to define 95% of the total number of SDPs? With 862 markers, we were able to define approximately 60% of all likely SDPs among the 13 CXB strains. In the collected set of BXN RI strains, approximately 23% of the estimated 5,000 possible SDPs have been confidently defined with MIT microsatellites. We can estimate the density of the marker map that would be necessary to define 95% of all SDPs. For example, for the BXD set, if one assumes a random and independent distribution of breakpoints across strains and a random distribution of markers, it would take a map with about 2,700 markers to define 95% of the 1,536 SDPs.
Use of the BXN set
Most mapping software applications used by mouse geneticists are adapted for diallele crosses of various types. The BXN data set was therefore formatted in a way that collapses all non-B6 alleles into a single
N class. The collected set of just over 100 strains can be used without complication with software such as Map Manager QTX [
33,
34]. This procedure was used largely as a convenience to integrate RI genetic maps. There are self-evident limitations that follow from the collapse of all non-
B alleles (A/J, DBA/2J, C3H/HeJ and BALB/cByJ) into a single category. Geneticists using the BXN set should therefore begin virtually all studies by mapping with the individual component RI sets (AXB-BXA, BXD, BXH and CXB) to detect possible levels of allele effects (an allelic series). The
B allele is a common feature and may be a useful reference point for estimating hierarchies among the five parental alleles. This separate, set-by-set analysis prevents the
N alleles from averaging out, as they might in a cumulative analysis (the
N alleles will often have effects that are both higher and lower than that of the
B allele). Because the BXN set includes 490 common marker loci and a consistent alignment and integration of the component RI maps, it is now much easier to combine linkage likelihood ratios from the component RI sets. A simple approach based on Fisher's method is described by Williams and colleagues [
8] in a study that pooled data from BXD and BXH sets. More sophisticated methods for automatically extracting and combining linkage statistics from the multi-allele BXN sets will require modification of mapping application programs. Pooling data in this way will require judicious and well justified statistical procedures. Combining data across the BXN sets can easily degrade a linkage analysis. The statistical exploration of different combinations of RI sets provides new degrees of freedom which may generate false-positive results, but which may also generate interesting hypotheses regarding QTL action.
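As a rough illustration of the pooling idea, the following sketch implements the textbook form of Fisher's method for combining independent p-values, for example point-wise p-values obtained at a shared marker in separate RI sets. It is not a reproduction of the procedure in the cited study, whose details are not given here, and it assumes that the tests being combined are independent. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where the statistic -2 * sum(ln p)
    follows a chi-square distribution with 2k degrees of freedom under the
    null hypothesis, k being the number of p-values."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2k) the chi-square upper
    # tail has a closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values for one marker in two component RI sets.
    statistic, combined = fisher_combined_p([0.05, 0.05])
    print(f"chi-square = {statistic:.3f}, combined p = {combined:.4f}")
```

Note that this form ignores the direction of allele effects, so two sets with opposite effects of the B allele would still reinforce each other; that is one reason the set-by-set inspection recommended above should come first.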
The BXN map could be refined further by interpolating genotypes of other markers and genes that have been mapped independently by many investigators in single RI sets. Our BXD database includes only microsatellite loci, for example, and excludes hundreds of potentially informative polymorphic loci, many in interesting genes. We regret having to use this procrustean approach, but because of the difficulty of verifying genotypes and because numerous loci introduce improbable double-recombinant haplotypes, we have used exclusive criteria to ensure high-quality maps. Investigators interested in recovering some of this lost data should refer to the comprehensive lists of genotypes maintained by the Mouse Genome Database [
35]. However, genotypes of any marker and strain that introduce new double-recombinants into the BXN map should be regarded with a high level of suspicion.
Power and precision of RI strains
A set of 100 conventional RI strains will have twice the genetic variance of a matched set of 100 F2 progeny and four times that of 100 backcross progeny. This increased genetic variance comes at some cost: 100 F2 animals represent 200 meioses and contain almost 200 unique haplotypes per chromosome (the non-recombinant chromosomes reduce this number somewhat). RI strains are fully inbred and 100 lines represent almost 100 unique haplotypes per chromosome. Each RI haplotype, however, carries roughly four times the recombination density of a single meiotic product (the map expansion described above), so a set of 100 RI strains has approximately twice the load of recombinations of 100 F2s. For a semidominant Mendelian trait, 100 RI strains therefore provide roughly twice the precision of 100 F2 progeny and four times that of 100 N2 progeny. When both genetic variance and recombination load are considered together, a set of 100 RI strains should be approximately four times as effective (precise) for mapping complex traits as an F2, and eight times as effective as a backcross. This estimate assumes that only a single RI animal is sampled per line, a strategy that is appropriate for mapping SNPs, microsatellites and other Mendelian loci.
The gain for mapping quantitative traits will be greater and will depend strongly on the heritability and to a lesser extent on the degree of dominance at each locus. Belknap [
3] has compared the relative power of RI strains and F2 intercrosses under several models and assuming different levels of heritability. For morphometric traits such as brain weight, with narrow sense heritabilities of around 0.5, 100 RI strains will provide a level of precision and power that is conservatively equivalent to that of 600-1,000 F2 intercross progeny. The advantage shifts further in favor of RI strains for traits with lower heritability. Power is one key issue in QTL mapping, but at present, precision - the ability to fine-map QTLs to subcentimorgan intervals suitable for candidate gene analysis - is the hurdle, and one that would be less imposing with improved RI resources [
36].
Making better RI resources
The usefulness of RI strains for mapping is largely a function of the number of known recombination breakpoints and useful polymorphisms that they harbor. All current mouse RI sets are small, and consequently the most common criticism leveled at QTL mapping with RI strains is that the precision and power are poor and that only those QTLs with unusually large effects can be detected reliably. The BXN set provides only a partial solution to this problem by expanding the set of RI strains that can be treated statistically as a complex cross. A much better long-term solution is to generate larger sets of RI strains for high-precision complex trait analysis. RI sets consisting of 100 to 1,000 lines could provide very impressive power and subcentimorgan precision. The LXS set (80-90 strains) and the enlarged BXD set (70-80 strains) mentioned above will soon provide practical demonstrations. Generating large sets is a substantial undertaking, but the effort is dwarfed by ongoing mutagenesis and sequencing efforts. Generating, maintaining and storing 1,000 RI lines could be a well justified expense given the long-term utility of large RI sets in tackling otherwise intractable problems in functional genomics - gene pleiotropy, genetic correlations, epistasis and reaction-norm genetics - in a mammal.
Several other factors make this idea significantly more attractive. First, an RI set can be produced using more than two inbred strains. Four to eight strains could in principle be combined to make RI sets that segregate for a greater variety of polymorphisms. This addresses the concern that a single conventional diallele RI set may not be useful for studying particular traits because of a paucity of relevant polymorphisms. Such multi-way RI lines buck the reductionist trend of eliminating genetic complexity by isolating gene variants on inbred backgrounds, but such complexity has its advantages and these strains would provide welcome models for exploring genetic background effects that plague much of the current work on transgenic and knockout mice [
37]. Second, by genotyping and selectively breeding the most highly recombinant animals it should be possible to generate RI strain sets with map expansions that significantly exceed that predicted by the Haldane-Waddington equation, an equation that assumes random mating of sibs. A six- to eight-fold expansion should be attainable, particularly if recombinations are tracked before and during the inbreeding process (Figure ).
Recombination density could be further increased by starting RI strains from either advanced intercross progeny [
36] or heterogeneous stock (Figure ) as was done in making the new set of 40 BXD strains mentioned above. Third, the power of RI sets can now be amplified significantly by use of RI intercross (RIX) and RI backcross (RIB) designs [
19,
20]. Finally, large RI sets will largely eliminate the problem of non-syntenic association.
A second well justified objection to using RI strains to map quantitative traits is that fully inbred strains may not provide representative phenotypes precisely because they are inbred and subject to often severe inbreeding depression. The abnormal genetic architecture of inbred strains and the fixation of multiple alleles that affect fitness will almost inevitably produce unusual pleiotropic and epistatic effects on a range of complex traits. Outliers are common among these and other inbred lines. Can the strain means be trusted?
RIX progeny provide a surprisingly simple solution to this problem [
19,
20]. RIX progeny made among members of a single diallele RI set will be similar to an F2 intercross with an inbreeding coefficient of 0.5. Crosses between members of completely different RI sets (for example, AXB1 crossed to LXS80) will have an inbreeding coefficient close to zero. In this respect they will be more appropriate models of human genetic variation, but with the remarkable advantages of completely defined genometypes and the option of generating large numbers of isogenic individuals.
Using the BXN and their RIX progeny
QTLs mapped using RI sets can be quickly verified and positionally refined by generating sets of RIX and RIB lines between those parental strains that have recombinations in critical QTL intervals. The RIX method has already proved a highly effective way of extracting QTLs from the tiny set of 13 CXB strains [
19,
20]. The 13 inbred lines have the potential to be converted to as many as 156 F1 lines, of which small subsets can be selected based on parental genotypes to test particular candidate QTLs and to simultaneously recover gene dominance signal by generating F1 heterozygotes. This greatly increases the power to detect QTLs in the presence of strong genetic, parental and developmental background noise, and at the same time exposes gene dominance deviations to help refine QTL effect and position. The BXN opens up a huge RIX domain for analysis. Approximately 88 BXN RI strains are now available from the Jackson Laboratory, and these strains can be crossed to generate about 88 × 87/2 (3,828) genetically unique recombinant inbred intercross progeny (RIX progeny) with breakpoints in precisely defined intervals. Each one of these F1s can be made in reciprocal pairs to assess the role of parental effects (for example, a BXD1 mother crossed to an AXB2 father or vice versa) and, like RI strains, many isogenic individuals can be typed to reduce non-genetic variance.
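The two counts quoted above follow different conventions, which a small sketch makes explicit. The figure of 156 for 13 CXB lines corresponds to counting each reciprocal cross separately (13 × 12), whereas 3,828 for 88 strains counts each pair of parents once (88 × 87/2). The function name is hypothetical.

```python
def rix_cross_counts(n_strains):
    """Return (unordered_pairs, reciprocal_crosses) for n RI strains.

    unordered_pairs counts genetically distinct F1 nuclear genomes;
    reciprocal_crosses counts dam-by-sire combinations, so each pair of
    parents appears twice."""
    if n_strains < 2:
        raise ValueError("at least two strains are needed to make a cross")
    unordered_pairs = n_strains * (n_strains - 1) // 2
    return unordered_pairs, 2 * unordered_pairs


if __name__ == "__main__":
    for label, n in (("CXB", 13), ("BXN", 88)):
        pairs, reciprocal = rix_cross_counts(n)
        print(f"{label}: {n} strains, {pairs} pairs, {reciprocal} reciprocal crosses")
```

For 13 strains this gives 78 pairs and 156 reciprocal crosses; for 88 strains it gives 3,828 pairs and 7,656 reciprocal crosses.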
Selected subsets of this huge pool of 3,828 unique RIX genomes can be made by crossing those RI strains with breakpoints in intervals thought to harbor QTLs. These interval-specific RIX progeny can be phenotyped and used to refine the genetic analysis of complex traits. Once QTLs have been mapped to candidate intervals, the subset of strains with recombinations within those intervals becomes an important resource for confirming and refining QTL location [
33]. This is especially the case if one exploits the RIX method. For example, if a QTL maps between 10 and 25 cM on chromosome 1 in the BXD set (that is between
D1Mit430 and
D1Mit375), and if
B alleles in this interval are associated with high phenotypes, then the cross of BXD15 with BXD20 may be particularly informative because the F1 hybrid is an obligatory
B homozygote on a short interval between 15 cM and 17 cM and is also an obligatory
D homozygote proximal to 13 cM and distal to 18 cM. A set of isogenic F1 RIX progeny made by crossing several RI lines with recombinations in a critical interval can be used to refine the probable position of a QTL. Map Manager QTX has now been updated to automatically generate the genotypes of the RIX progeny produced by a one-generation cross of RI parents [
34]. Given this huge sample of unique RIX genomes, even modest quantitative differences between C57BL/6 and other strains should be readily mapped (or confirmed) using the BXN and RIX mapping.
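The genotype of an RIX F1 follows directly from the genotypes of its two fully inbred parents, as the sketch below shows. It is only an illustration of the principle and does not reproduce how Map Manager QTX performs the calculation. The function name, the one-letter-per-marker encoding, and the two parental genotype strings are invented for the example and are not the actual genotypes of BXD15 or BXD20.

```python
def rix_genotypes(dam, sire):
    """Infer F1 genotypes at each marker from two fully inbred RI parents.

    Each parent is a string with one allele letter per marker (for example
    'B' or 'D'). Where the parents agree, the F1 is an obligatory homozygote
    (reported as 'BB' or 'DD'); where they differ, it is an obligatory
    heterozygote (reported as 'H')."""
    if len(dam) != len(sire):
        raise ValueError("parents must be typed at the same markers")
    genotypes = []
    for dam_allele, sire_allele in zip(dam, sire):
        if dam_allele == sire_allele:
            genotypes.append(dam_allele * 2)
        else:
            genotypes.append("H")
    return genotypes


if __name__ == "__main__":
    # Hypothetical genotypes at six ordered markers spanning a QTL interval.
    strain_x = "DDBBBD"
    strain_y = "DBBBDD"
    print(rix_genotypes(strain_x, strain_y))
```

In this invented case the F1 is a B homozygote only at the two central markers, is heterozygous at the markers that bracket the parental breakpoints, and is a D homozygote at both ends, which is the kind of configuration the text describes as particularly informative.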
Mapping modifiers of dominant alleles using RI backcrosses
Knowing the precise location of breakpoints in RI lines also makes it possible to map modifier loci of mutations by making and phenotyping a set of different F1 crosses made between inbred carrier stock (for example, a knockout carried on a C57BL/6 background) and fully typed RI lines. A set of these RI backcrosses (RIB) has a genetic structure similar to a conventional N2 backcross, but there is no need to genotype any of the RIB progeny and they have the major advantage that isogenic progeny can be typed to obtain much more reliable trait scores. This method does depend on either a dominant or semidominant mutant allele, since the phenotype must be detectable in a significant fraction of the RIB progeny. Provided that this condition is met, the costs and logistics of this type of screen may be more modest than a typical screen for modifier loci. The analysis can be carried out without genotyping and using replicated genomes to test for environmental modulators.
BXN and sequencing efforts
Five of the widely used sets of RI strains that we have typed and analyzed share C57BL/6 as a parental strain. The genome of C57BL/6J is currently being sequenced as part of a public effort [
38] and for this reason, the utility of the BXN set for converting QTLs to strong candidate genes will increase significantly in the next few years [
37]. It will become far easier to generate complete lists of positional candidate genes and then to obtain data on gene and protein expression patterns. The two other major strains incorporated into the BXN set - A/J and DBA/2J - are also being sequenced by Celera Genomics and, in principle, it will be possible to compare sequences of these three major strains to generate lists of possible allelic variants in positional candidate genes. The recent cloning of the
Sac locus that controls sugar and saccharin preference on distal chromosome 4 provides a good example of the increased power of candidate gene analysis. This locus was initially mapped using 20 BXD strains [
39,
40]. In the absence of high-resolution mapping, but with astute analysis of human and mouse sequence data,
Sac was identified almost simultaneously by several groups as the gene for the T1R3 receptor [
41,
42,
43,
44,
45,
46,
47]. In a few years, the identification of genes associated with QTLs will probably be no more of a special exception than the cloning of Mendelian genes was in the mid-1990s.