We have generated a novel set of 46 RI lines based on progenitors from two B6 × D2 advanced intercrosses. These lines have considerably more recombinations than the BXD RI set of lines, archiving an estimated 2.1-fold more recombinations per line. Over the subset of 22 genotyped ARI lines, the ARI strains archive a minimum of 1.4-fold and an estimated 1.6-fold to 1.9-fold increase in unique recombinations per line. Even using the more conservative estimate and considering only the 20 well-genotyped ARI lines, we have at least doubled the number of available recombinations in the BXD RI background. The additional 26 strains will approximately triple the number of unique and nearly quadruple the number of total recombinations available for analysis in this background.
BXD ARI advantages
The advantages and issues involved in utilizing RI strains for mapping have previously been extensively discussed [26]. The ARI strains retain many of these characteristics. The chief advantages of ARI strains versus conventional RI strains, however, are greater potential mapping precision and lower cost per archived total and unique recombination, both of which stem from the higher recombination density. We estimate that the set of 46 ARI strains will ultimately provide a number of unique, characterized recombinations equivalent to at least 90 conventional F2-derived BXD RI strains – a considerable saving in facility space and costs per archived recombination. The increased recombination density in these strains is also ideal for mapping techniques such as RIX mapping, providing better mapping resolution with a greater fraction of useful strains. We strongly recommend this method as a means of developing a long-term resource from any currently existing advanced intercross lines.
In addition, extending the number of available BXD RI strains allows researchers to take advantage of the extensive work that has already been done using these strains and their parental lines. Since the BXD RI strains were also the largest previously available mouse RI population, they have been extensively phenotyped. At least 626 phenotypes, including a large number of alcohol-related phenotypes [6] and a wide variety of observations from methamphetamine response [33] to stem cell number [34], have been studied in the currently available BXD RI strains. These data are easily accessible via the published phenotypes database and QTL analysis tools that are part of the WebQTL project [35]. The availability, also via WebQTL, of a large set of forebrain gene expression phenotypes derived from Affymetrix expression studies of the previously available BXD RI strains further increases the value of this extended RI set. Previously, the 34 existing BXD RI strains had the power to reliably detect (power = 0.80, p < 0.0001) only QTLs accounting for 47% of between-strain variance [38]! These additional strains make it possible to reliably detect QTLs accounting for only 24% of genetic variance. With a second, independent population for statistical confirmation (power = 0.80, p < 0.05), the additional strains allow reliable detection of QTLs accounting for as little as 9% of genetic variance. Having more strains available will also give us sufficient power to characterize some simple epistatic interactions for loci with relatively large effects.
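The power figures quoted above can be roughly reproduced with a simple approximation. The sketch below is not the calculation used in [38]; it assumes the QTL test behaves like a two-sided test of a marker-trait correlation across strain means and applies the Fisher z-transform. The function name `min_detectable_variance` is hypothetical, and the strain counts (34 original strains, 80 after adding the 46 ARI strains) are taken from the text.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_variance(n_strains: int, alpha: float, power: float = 0.80) -> float:
    """Approximate smallest fraction of between-strain variance (r^2)
    detectable with the given number of strains.

    Assumption (not from the source): a two-sided correlation test on
    strain means, approximated with the Fisher z-transform.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # significance threshold
    z_power = std_normal.inv_cdf(power)              # quantile for desired power
    # Fisher z: atanh(r) has standard error 1 / sqrt(n - 3)
    r = tanh((z_alpha + z_power) / sqrt(n_strains - 3))
    return r * r


original = min_detectable_variance(34, 1e-4)   # 34 existing BXD RI strains
extended = min_detectable_variance(80, 1e-4)   # 34 + 46 ARI strains
confirm = min_detectable_variance(80, 0.05)    # confirmation threshold

for label, value in [("34 strains, p < 0.0001", original),
                     ("80 strains, p < 0.0001", extended),
                     ("80 strains, p < 0.05", confirm)]:
    print(f"{label}: ~{100 * value:.0f}% of between-strain variance")
```

Under these assumptions the three values fall close to the 47%, 24%, and 9% figures given in the text, which suggests that the gain in power is driven largely by the increase in strain number.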
Additionally, sequence data is available for both parental strains and, by imputation, for all well-characterized BXD RI strains. B6 sequence data [39] is publicly available and D2 sequence data is available via Celera Discovery System [40] subscription. Since a QTL should typically relate to a DNA polymorphism between the parental strains, a list of all such polymorphisms in a QTL region is a valuable tool.
The chief disadvantage of ARI strains compared to RI strains is the relatively complicated relationship between ARI strains and the inability to assume that ARI recombinations are unique, which is particularly important in fine mapping efforts. Another major disadvantage is that ARI strains are more difficult and time consuming to create than conventional RI strains, since they require a well developed AIL cross. Also, a given AIL can only be profitably used to create a limited number of ARI strains.
In addition, the AIL population from which the ARI progenitors were drawn is not in Hardy-Weinberg equilibrium. While the overall frequency of B6 and DBA alleles in the AILs is similar (55% and 53% B6 alleles for Group B and Group A, respectively), the frequency of alleles at a given locus varies widely, affecting the likely composition of the resulting ARI population on a per locus basis. For instance, the genotypes for the Group A proximal chromosome 4 and the majority of the Group B chromosome 10 are almost entirely B6 derived.
In our analysis of heterozygosity and evaluation of unique recombinations (using the set of pairs of very closely spaced markers), there was a larger than expected number of cases where one parent had a B, H or D, H genotype while the other was B, B or D, D. These cases represent a considerable fraction of recombinations in the ARI population, and are somewhat surprising given the inbreeding of most of these strains, suggesting that either heterozygotes have some selective advantage or that a small number of genotyping errors have occurred. Unfortunately in most cases there are not flanking markers close enough to meaningfully check these data and distinguish between these possibilities. These cases increase heterozygosity of the population, decrease unique recombinations, and generally provide a conservative bias to these measures.
Early genotyping and heterozygosity
Full inbreeding (20 generations) of a mouse inbred line takes an average of four to five years, though the great majority of inbreeding is accomplished in the first half of that time. In order to gain several potential years of useful analysis and to make our strains available to the community more quickly, we genotyped a total of 22 strains relatively early in the inbreeding process. Naturally, there are a significant number of heterozygous regions still present, and, in fact, there was considerable variation in the number and size of these regions between lines, suggesting that some lines, BXD48 and BXD65 for example, may actually have experienced several generations of cousin-cousin, rather than brother-sister mating. Ultimately this will only serve to increase the number of recombinations in these strains, but proximally they have fewer defined recombinations.
Ultimately, we will re-genotype all strains after full inbreeding is achieved. Early genotypes greatly facilitate the current usefulness of the strains, but must be treated with caution. For instance, heterozygous regions in these strains should be treated as unknown regions, and researchers should be aware of potential mis-assignment of homozygotes in a small number of cases. Likewise, caution should be exercised in comparing phenotypes between animals at intermediate stages of inbreeding and animals comprising the resulting fully inbred lines, though for highly polygenic traits this will be less important. An easy precaution is to take DNA samples from phenotyped animals and confirm genotypes at loci of interest via pooled genotyping if needed. For applications where a somewhat higher noise level is tolerable, early genotyping is a valuable means of accelerating the usefulness of RI-like lines by several years. We have, for instance, successfully used 20 ARI lines in a small QTL mapping study of alcohol preference (manuscript in preparation). There was some indication that a QTL was present for 4 of 8 previously observed QTLs [41] in this small set, a reasonable result given that some of the previously observed QTLs may be false and that we do not expect to reliably detect real QTLs of modest effect size with this limited number of strains.
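The recommendation to treat heterozygous regions as unknown can be applied as a simple preprocessing step. The sketch below is illustrative only: it assumes genotypes for one chromosome are stored as an ordered string of single-letter calls (B, D, H for heterozygous, U for unknown), and the function names and example calls are hypothetical rather than taken from our data.

```python
def mask_heterozygous(calls: str, het: str = "H", unknown: str = "U") -> str:
    """Recode heterozygous calls as unknown, leaving other calls unchanged."""
    return "".join(unknown if call == het else call for call in calls)


def min_recombinations(calls: str, unknown: str = "U") -> int:
    """Count the minimum number of recombinations implied by the known calls.

    Unknown calls are skipped, so a B...D switch spanning an unknown
    region is counted once (the minimum consistent with the data).
    """
    known = [call for call in calls if call != unknown]
    return sum(1 for left, right in zip(known, known[1:]) if left != right)


# Hypothetical marker calls along one chromosome of an early-generation line
raw_calls = "BBBHHDDDBB"
masked_calls = mask_heterozygous(raw_calls)
print(masked_calls)                      # BBBUUDDDBB
print(min_recombinations(masked_calls))  # 2
```

Counting only switches between known homozygous calls gives a conservative tally, in keeping with the fewer defined recombinations expected in lines that are not yet fully inbred.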
Unique recombinations and saturation of the AIL-derived recombination pool
The ARI strains archive a large number of recombinations per line. However, making ARI strains based on AIL progenitors is not a fully extensible strategy for making strains with high recombination densities. Since there is a limited, constant pool of AIL-derived recombinations, only inbreeding-derived recombinations will be novel once sampling of the AIL-derived recombinations is saturated.
In the currently genotyped lines the saturation level is quite low – between 0% (Group A) and 8% (Group B). This is reasonable considering that the initial pool from which the ARI progenitors were drawn consisted of 90–100 animals in the case of Group A and 40–60 animals in the case of Group B. The degree of saturation is an important issue for other investigators considering creation of similar strains from pre-existing AILs, since eventually the process will yield returns of unique recombinations approaching F2-based RI strains. Group B includes a total of 33 strains, and can serve as a partial model for this decision, albeit an imperfect one because different AILs will be based on different family sizes, breeding schemes, and generations.
Since it is difficult, in the absence of extremely precise genotyping information, to determine which recombinations are unique, we developed several approaches to this problem. Ultimately, the true average number of unique recombinations present in the genotyped ARI lines will fall between the average determined using our conservative estimate (59) and the estimated total average number of recombinations (85). It is more likely, however, that the number of unique recombinations will resemble our proportionate or experimental estimates given the known number of total recombinations. That is, the conservatively corrected estimate serves as a reasonable minimum number of recombinations/strain that can be expected for our current strain set, while our best guess at the actual number of recombinations is considerably higher.
In either case, it is clear we have not yet reached a point of seriously diminished return on the creation of new ARI lines. Investigators considering this approach can expect to generate at least 30–40 valuable strains from a single AIL population.
A similar heterogeneous stock based approach
A similar approach to the problem of archiving large numbers of recombinations per strain would be to use a heterogeneous stock (HS) [42] as a progenitor. This approach would have a recombination density likely to be superior to an AIL-based approach, especially for longstanding HS populations, and has the additional advantage and complication of incorporating chromosomal segments from multiple strains. Because of the incorporation of input from many strains, mapping with these strains is likely to be both more versatile and more complex. The limitations of this approach with respect to expense, time of initial establishment, determination and treatment of unique recombinations and eventual diminishing returns, are quite similar to ARI lines. Additionally, however, detection of rare alleles could be problematic.
A seemingly similar approach using a HS population has been taken by Bennett and colleagues (Bennett, personal communication), who created a large 76-strain RI population (LXS) from a pair of inbred strains (ILS, ISS) derived from a randomly mated HS population based on 8 progenitor strains. This HS population was used to select populations that differed with respect to long and short sleep time in response to a hypnotic dose of ethanol [42] and members of these selected populations were subsequently inbred. Because this effort started with two fully inbred strains and immediately commenced inbreeding, it is actually much more similar to an F2-based standard RI approach than to an AIL-based approach or the theoretical HS-based approach above. These RI strains will be extremely useful, especially in research on alcohol-related phenotypes.
Improving on the ARI model
Ideally, an RI-like mapping population should maintain a high density of archived, fully independent recombinations. One approach to generating such a population would be to start with $2^n$ F2 × F2 breeding cages, where n/2 is the desired genome expansion prior to inbreeding. Breeding would then proceed as illustrated in Fig. 4. Briefly, each of the F2 animals carries an independent set of recombinations. The $2^n$ initial crosses generate an independent set of F3 animals that will carry half of the recombinations present in the F2 population in addition to recombinations from the current generation cross. In each subsequent generation there will be a total of $2^n/2^g$ breeding cages per line, where g is the number of generations following the initial F2 cross. Each F3 animal can then be crossed with another F3, and so on. Since these animals share no common ancestors, all accumulated recombinations will be independent, and since at each generation half of the novel recombinations will be passed to the following generation, the genome expansion should proceed at a predictable rate of n/2. As an example, with an initial set of 32 crosses per resulting strain (n = 5) it should be possible to achieve a 2.5-fold expansion from pre-inbreeding breeding in addition to the usual inbreeding expansion. This expansion, approximately a 75% improvement on the usual 3.3–3.4-fold expansion from inbreeding, is not as large as that of the current ARI lines, but all recombinations will be independent and unique, so the technique is extensible to any desired number of strains. While the initial number of breeding cages and animals per line may seem excessive, this number decreases rapidly, and investigators can set up lines sequentially to minimize needed space and funding.
Figure 4. New RI-like breeding scheme. Novel proposed method for maximizing unique recombinations archived in a 2-way RI-like cross. The breeding scheme shown above and discussed more generally in the text results in a single strain with 75% more unique recombinations ...
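The cage requirements of the proposed scheme can be tabulated directly. The sketch below reads the cage counts in the text as powers of two (32 initial crosses corresponding to n = 5 and a 2.5-fold expansion). The function names are hypothetical, and the 3.35-fold inbreeding expansion is an assumed midpoint of the 3.3–3.4-fold range quoted in the text.

```python
from typing import List


def cage_schedule(n: int) -> List[int]:
    """Breeding cages per line at each generation g = 0..n after the
    initial F2 x F2 cross, i.e. 2**n / 2**g cages at generation g."""
    return [2 ** n // 2 ** g for g in range(n + 1)]


def pre_inbreeding_expansion(n: int) -> float:
    """Expected genome expansion from the pre-inbreeding crosses (n / 2)."""
    return n / 2


n = 5                               # 2**5 = 32 initial crosses, as in the text
schedule = cage_schedule(n)
expansion = pre_inbreeding_expansion(n)
inbreeding_expansion = 3.35         # assumed midpoint of the 3.3-3.4-fold range
relative_gain = expansion / inbreeding_expansion

print("cages per generation:", schedule)      # [32, 16, 8, 4, 2, 1]
print("total cages:", sum(schedule))          # 63
print(f"pre-inbreeding expansion: {expansion:.1f}-fold")
print(f"gain over inbreeding alone: ~{100 * relative_gain:.0f}%")
```

Because the number of cages halves each generation, most of the space is needed only briefly, which is why lines can be set up sequentially to limit the facility footprint.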
Another, potentially even more valuable approach is the creation of RI-like lines based on a number of initial progenitors larger than two. Such lines will include a denser set of recombinations than the typical two-progenitor approach and will allow analysis of a wider array of traits, especially given the rather limited diversity of a cross incorporating two parental strains that may often already share common ancestry. A large number of such strains would be a suitable community-wide resource for efficient fine mapping of complex traits, analysis of epistasis, and a wide variety of other interesting approaches and questions that currently await an appropriate tool. Of course, the proposal above is compatible with multiple strains by generating a population as described with each pair of animals to be included. The outputs of the population would replace the F1 animals in the multi-way cross for any cross design. The 1K Collaborative Cross proposed by the Complex Trait Consortium (CTC) uses such a design [43].
Strength in numbers
The cross proposed by the CTC has another important aspect – it would consist of at least 1000 independent lines. As has been amply demonstrated in plants and other organisms, RI strains have many advantages as a mapping resource when a sufficient number exist to adequately power the investigations in question. This strain set, 80 lines in combination with the original BXD RI strains, will be the largest and most recombinant strain set available in mice but will still be much smaller than the strain set available in maize. If this strain set and the 77 member LXS strain set show promise at all, relative to the much smaller strain sets currently available, they should be considered proof-of-principle for a much larger enterprise. The sketchy reputation of RI-based complex trait mapping in the mouse genetics community will evaporate rapidly if we borrow a leaf from our colleagues in the plant genetics community and create a tool adequate to the statistical requirements of our desired results.
Availability of strains
We intend to make the BXD ARI lines widely available to the academic community. The first set of lines available will be those inbred at Princeton, as these are already extensively genotyped. Since the Princeton facility has a number of pathogens, which prevent export to most other animal facilities, we have rederived all but four of these strains and are establishing breeding colonies in the SPF facility at UTHSC. Once these colonies are established, strains will be made available to the academic research community both prior to and after complete inbreeding. We expect that at least 17 genotyped strains will be available by publication, with the remaining densely genotyped strains available within a few months. Subject to breeding constraints, we intend to make additional strains available as rapidly as possible.