State-of-the-art, genome-wide assessment of mouse genetic background uses single nucleotide polymorphism (SNP) PCR. As SNP analysis can use multiplex testing, it is amenable to high-throughput analysis and is the preferred method for shared resource facilities that offer genetic background assessment of mouse genomes. However, a typical individual SNP query yields only two alleles (A vs. B), limiting the application of this methodology to distinguishing contributions from no more than two inbred mouse strains. By contrast, simple sequence length polymorphism (SSLP) analysis yields multiple alleles but is not amenable to high-throughput testing. We sought to devise a SNP-based technique to identify donor strain origins when three distinct mouse strains potentially contribute to the genetic makeup of an individual mouse. A computational approach was used to devise a three-strain analysis (3SA) algorithm that would permit identification of three genetic backgrounds while still using a binary-output SNP platform. A panel of 15 mosaic mice with contributions from BALB/c, C57Bl/6, and DBA/2 genetic backgrounds was bred and analyzed on a genome-wide panel of 1449 SNP markers. The 3SA algorithm was applied and then validated using SSLP. The 3SA algorithm assigned 85% of 1449 SNPs as informative for the C57Bl/6, BALB/c, or DBA/2 background. When the panel of 15 F2 mice was tested, the 3SA algorithm predicted donor strain origins genome-wide. Donor strain origins predicted by the 3SA algorithm correlated perfectly with results from individual SSLP markers located on five different chromosomes (n=70 tests). We have established and validated an analysis algorithm based on binary SNP data that can successfully identify the donor strain origins of chromosomal regions in mice that are bred from three distinct inbred mouse strains.
The laboratory mouse has emerged as the principal animal model for a large number and variety of preclinical studies in biomedical research.1,2 The influence of genetic background on phenotype is an important concern, and variation in genetic background can confound the interpretation of experimental data.3–9 Recently developed analytical tools have allowed the routine and inexpensive monitoring of mouse genetic background, for example, through submission of tail clippings to, and analysis at, shared resource facilities, such as DartMouse, the Speed Congenic Core Facility at the Geisel School of Medicine at Dartmouth.10
In the 1990s, the molecular analysis of genetic background advanced with the advent of PCR to query microsatellites or simple sequence length polymorphism (SSLP) markers.11–13 SSLP-based PCR analyses use primers that flank short repeats of di- or trinucleotides, most typically repeats of the dinucleotide CA.14,15 For a given SSLP, the number of repeats generally varies widely among different mouse strains. The large number of possible alleles (amplicon lengths) makes the technique useful for distinguishing the genetic origin of a specific chromosomal region of interest in a given genetically hybrid mouse. This is particularly advantageous when a mouse is a genetic mosaic bred from the interbreeding of three or more possible founder inbred strains. The mouse genome contains many thousands of SSLPs. A selection of 50–100 SSLPs located at regular intervals along the genome allows for a reasonably complete first-pass assessment of the genetic origin of a mouse.16 However, as SSLP alleles are typically assessed using separation technologies (e.g., electrophoresis through agarose or acrylamide gels), SSLP-based assays are not particularly amenable to high-throughput approaches, as each PCR is carried out in an individual tube and analyzed on an individual basis. Thus, a complete genome-wide assessment of genetic origin requires a minimum of 50–100 distinct PCR reactions and electrophoreses. This approach is labor-intensive and time-consuming, and therefore expensive; accordingly, it has not been used routinely by the typical biomedical research laboratory interested in profiling the genetic backgrounds of the mice in its colony.
The development of multiplex single nucleotide polymorphism (SNP) PCR and chip-based technology has revolutionized the field,17 rendering the genetic analysis of the mouse genome accessible and affordable for the typical biomedical research laboratory. Genome-wide analysis using mouse SNP chips, containing >1400 SNPs covering the genome at an average density of well under 5 cM, is available to the wider research community at low cost and with rapid turnaround time. For example, as of this writing, DartMouse provides this service at $149/tail sample submitted, with a turnaround time of sample submission-to-report of 14 business days. The assay is multiplex, with the PCRs for all SNPs done simultaneously in a single tube and interpreted using a bead-chip reader. Thus, this technique lends itself well to high-throughput analysis, which in large part, accounts for the relative affordability of the technique for the typical biomedical research laboratory.
The inherent output of an individual SNP is binary. Whereas in theory, a given nucleotide in genomic DNA can be A, C, G, or T, in practice, each SNP typically has only two alleles (e.g., G substituted with C), and primers used in SNP assays are appropriately designed to distinguish only between alleles A and B (in the context of SNP alleles, the letter designations are arbitrary, and “A” is not meant to indicate the presence of adenine). One consequence is that SNP genotyping traditionally has not allowed for the identification of more than two alleles/SNP. If, for example, three inbred strains of mice are known to be involved in the generation of a genetically mosaic mouse, then for a single locus in the mouse, SSLP genotyping is the superior analytical technique, as it will yield unequivocal results.
In this study, our objective was to devise and validate a technique that would allow the use of a high-throughput SNP platform to identify genetic origin when three different inbred strains are to be considered as possible donors of genetic material in a given mouse.
Two F1 female (C57BL/6J female×DBA/2J male) mice and two F1 male (BALB/cJ female×C57BL/6J male) mice were purchased from The Jackson Laboratory (Bar Harbor, ME, USA) and bred to one another at Dartmouth Medical School, according to the Association for Assessment and Accreditation of Laboratory Animal Care practices, to produce 15 F2 mice. This project has been approved by the Institutional Animal Care and Use Committee at the Geisel School of Medicine.
Genomic DNA from parental inbred C57BL/6J, DBA/2J, and BALB/cJ mice was purchased directly from The Jackson Laboratory. Genomic DNA from F1 mice and from F2 mice was prepared in-house: mouse tail clippings (5–10 mm) were placed into a Maxwell 16 instrument (Promega, Madison, WI, USA), and genomic DNA was prepared according to the manufacturer's recommendations.
Genomic DNA was prepared using the GoldenGate multiplex (Illumina, San Diego, CA, USA) genotyping assay and analyzed using a Mouse Medium Density (MD) Linkage Panel (Illumina) consisting of 1449 SNP loci. In brief, 250 ng DNA was activated, precipitated, and biotinylated and then hybridized with query oligonucleotides and paramagnetic particles according to the manufacturer's recommendations. Following washing, extension, and ligation, DNA was fluorescently labeled and amplified using PCR, according to the manufacturer's recommendations. The PCR product was washed and rendered single-stranded via exposure to sodium hydroxide. The solution was pipetted onto BeadChips (Illumina) and hybridized overnight. Following washing, the BeadChips were coated and placed into a BeadArray Reader (Illumina), which captured fluorescence by high-resolution imaging. Images were interpreted using GenomeStudio software (Illumina), which provided the binary output (A vs. B) used for the donor strain assignment (DSA) algorithm.
Genomic DNA was subjected to standard PCR using the SSLP markers (see Table 3). Markers were selected on the basis of predicted polymorphisms among all three parental strains (C57BL/6J, DBA/2J, and BALB/cJ). Amplicons were separated on a 4% low-melting agarose gel or by capillary electrophoresis on a QIAxcel Advanced instrument (Qiagen, Valencia, CA, USA).
Cognizant that the primary readout of each SNP assay is inherently binary, we took a computational approach to address the problem of distinguishing among three possible strains of origin. The interpretation of results in a three-strain analysis (3SA) begins with a DSA algorithm (Fig. 1). The DSA algorithm applies consecutive tests to each individual SNP in the array, based on the results of analysis of the three donor strains proposed to contribute to the genomes of the unknown samples.
The first step is a quality-control test, which for each individual SNP determines whether genotyping of the donor strains yields interpretable results. An uninterpretable SNP is one for which the genotype result of any one or more of the three donor strains is no call (NC) or heterozygous (AB). NC indicates a technical error, whereas AB indicates heterozygosity in that donor strain. An AB result from a donor strain is unexpected, as by definition, a pure inbred strain should yield a homozygous genotype (AA or BB) for each SNP queried. A SNP result of NC or AB for one or more of the three donor strains renders that SNP uninterpretable, and the SNP is ignored in subsequent steps. In a typical run, 97–100% of SNPs pass Step I (data not shown) and are allowed to move to Step II.
If a SNP passes Step I (i.e., is accepted as interpretable), it is then queried as to whether it is uninformative or informative. A SNP is deemed uninformative when all three donor strains (DS1, DS2, DS3) yield the identical genotype (AA, AA, AA or BB, BB, BB), as progeny will type identically at this SNP, regardless of the donor-strain origin of the chromosomal region harboring the SNP; an uninformative SNP is likewise ignored in subsequent steps. Conversely, a SNP is deemed informative when the genotype of one donor strain is different from the genotypes of the other two strains. A SNP that passes the tests applied in Steps I and II is interpretable and informative. Considering 12 different commonly used inbred strains of mice (C57Bl/6, 129/Sv1, BALB/c, FVB, C3H, NOD, A, C57Bl/10, DBA/2, SJL, AKR, SWR), there are 220 possible three-strain combinations. For any given three-strain combination in this set, approximately two-thirds (942±154; range, 511–1249) of the 1449 SNPs available on the Mouse MD Linkage Panel are informative (unpublished data).
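As a hedged illustration of Step II, the sketch below tests whether an interpretable SNP is informative and confirms the count of three-strain combinations among 12 strains. The function name `is_informative` is hypothetical, and the code assumes that its input has already passed Step I (all three donor genotypes are "AA" or "BB").

```python
from math import comb

def is_informative(donor_genotypes):
    """Step II: informative if the donor strains do not all share one genotype.

    Assumes the SNP already passed Step I, so each genotype is "AA" or "BB".
    With three homozygous genotypes, two distinct values means that exactly
    one strain differs from the other two.
    """
    return len(set(donor_genotypes)) == 2

print(is_informative(("AA", "AA", "BB")))  # one strain differs
print(is_informative(("BB", "BB", "BB")))  # all identical

# Number of ways to choose 3 donor strains from 12 inbred strains.
print(comb(12, 3))  # 220
```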
Based on the genotype data of the three donor strains, a DSA is given to each interpretable informative SNP. The DSA corresponds to the donor strain result for which a homozygous genotype result in the unknown sample may be considered unambiguous. Of the eight possible genotype combinations, six are informative and allow a DSA to be rendered (Table 1).
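The eight possible donor genotype combinations can be enumerated programmatically. The following sketch is an illustration under stated assumptions rather than the software used here: the function `assign_dsa`, the strain order in `STRAINS`, and the returned (strain, genotype) pair are all hypothetical choices, and the resulting dictionary is not guaranteed to match the layout of Table 1.

```python
from itertools import product

# Assumed order of donor strains (DS1, DS2, DS3) for this example.
STRAINS = ("C57Bl/6", "BALB/c", "DBA/2")

def assign_dsa(genotypes, strains=STRAINS):
    """Return (strain, genotype) for the one donor strain whose homozygous
    genotype differs from both other strains, or None if no DSA applies."""
    if any(g not in ("AA", "BB") for g in genotypes):
        return None  # uninterpretable (fails Step I)
    for i, g in enumerate(genotypes):
        others = [x for j, x in enumerate(genotypes) if j != i]
        if all(o != g for o in others):
            return strains[i], g
    return None  # uninformative (all three strains identical)

# Enumerate all 2**3 = 8 homozygous combinations.
table = {combo: assign_dsa(combo) for combo in product(("AA", "BB"), repeat=3)}
for combo, dsa in table.items():
    print(combo, "->", dsa)

informative = sum(dsa is not None for dsa in table.values())
print(informative, "of", len(table), "combinations yield a DSA")
```

The genotype returned alongside the strain is the homozygous result that, if seen in an unknown sample, would be unambiguous for that strain.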
Once a DSA has been rendered for each interpretable informative SNP, the next step is to examine the genotype of the corresponding SNP within each unknown sample and to compare it with the DSA to determine inheritance. If the SNP genotype result is homozygous and corresponds to its DSA, then the SNP is considered homozygous for that donor strain. If the genotype result is homozygous for the alternative allele, then the SNP is considered homozygous for either of the other two donor strains. Similarly, a result of AB indicates heterozygosity, with one allele donated by the DSA and the other by either of the other two strains. Hypothetical examples of SNP interpretations using the algorithm in an unknown genome containing contributions from three inbred strains (BALB/c, DBA/2, and C57Bl/6) are shown in Table 2.
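The comparison of an unknown sample's genotype with the DSA can be sketched as follows. The function name `interpret_snp`, its argument names, and the wording of the returned labels are assumptions made for illustration; they are not taken from Table 2.

```python
def interpret_snp(sample_genotype, dsa_strain, dsa_genotype, strains):
    """Interpret one SNP in an unknown sample, given the SNP's DSA.

    dsa_genotype is the homozygous genotype ("AA" or "BB") carried by
    dsa_strain; the other two strains carry the alternative allele.
    """
    others = [s for s in strains if s != dsa_strain]
    either = " or ".join(others)
    if sample_genotype == dsa_genotype:
        return f"homozygous {dsa_strain}"
    if sample_genotype == "AB":
        # One allele from the DSA strain, the other from either remaining strain.
        return f"heterozygous {dsa_strain} x ({either})"
    if sample_genotype in ("AA", "BB"):
        # Homozygous for the alternative allele: no DSA-strain contribution.
        return f"homozygous ({either})"
    return "no call"

strains = ("BALB/c", "DBA/2", "C57Bl/6")
for call in ("AA", "AB", "BB", "NC"):
    print(call, "->", interpret_snp(call, "BALB/c", "AA", strains))
```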
We tested the performance of the DSA algorithm by applying it to an actual mouse cross. Three inbred donor strains (C57Bl/6, BALB/c, and DBA/2) were interbred to create genetically mosaic mice with distinct contributions from each strain (Fig. 2). BALB/c females were crossed with C57Bl/6 males to generate F1B×C mice, and DBA/2 males were crossed with C57Bl/6 females to generate F1D×C mice. Then, the two strains of F1 mice were bred together to generate F2 mice. Each F2 mouse is unique and genetically mosaic with distinct genetic contributions from each of the three donor strains. Genomic DNA samples were prepared from one mouse of each donor strain, two mice each of the two F1 strains, and 15 individual F2 mice and then subjected to multiplex PCR and analysis using a 1449 SNP Mouse MD Linkage Panel. Genotype results were interpreted according to the DSA algorithm. Results are displayed as genome maps, with colors used to indicate SNP interpretation (Figs. 3–5; and data not shown). By definition (i.e., derived from the DSA algorithm), each inbred donor strain mouse genotyped as homozygous for a subset of interpretable SNPs (Fig. 3), with each yielding a distinct genomic SNP map. Of 1410 interpretable SNPs queried, a total of 1204 (85.4%) could be assigned DSAs. Of these, 558 (39.6%), 307 (21.8%), and 339 (24.0%) were assigned DSAs of C57Bl/6, BALB/c, and DBA/2, respectively. The remaining interpretable SNPs (206, or 14.6%) were uninformative for this three-strain combination. Each F1 mouse genotyped as heterozygous at nearly all interpretable autosomal SNPs; the specific output correlated with the expected strain combination (Fig. 4). For the F1 samples, five to 10 (<1%) SNPs appeared to type as homozygous; such results are likely erroneous, although we did not attempt to test this assumption further. As expected, male F1B×C mice, sired by a male C57Bl/6 mouse, showed only BALB/c homozygosity on the X chromosome (Fig. 4).
Fifteen F2 mice were also assessed, and as predicted, each was a genetic mosaic, showing a unique pattern of contributions from the C57Bl/6, BALB/c, and DBA/2 genetic backgrounds (Fig. 5; and data not shown).
By itself, a SNP that types as heterozygous is necessarily ambiguous, as one allele is donated by the donor strain to which that SNP is assigned, whereas the other allele can be donated by either of the other two donor strains. The ambiguity is resolved by assessing the genotype of the SNP in the context of those of closely linked SNPs; this process is facilitated by the visual inspection of the color-coded genomic SNP maps. For example, a run of SNPs, in which some are heterozygous BALB/c × either C57Bl/6 or DBA/2 (pastel red), and others are heterozygous C57Bl/6 × either BALB/c or DBA/2 (pastel green), reflects a region of heterozygosity with contributions from the BALB/c and C57Bl/6 strains and no contribution from the DBA/2 strain. This is depicted in more detail in Fig. 6, which displays genomic SNP maps corresponding to Chromosome 4 [0–80 megabases (Mb)]. Note that the SNP map of the F1B×C sample includes interpretations at every SNP with a DSA of BALB/c or C57Bl/6 but no interpretation at any SNP with a DSA of DBA/2. Similarly, the SNP maps of the F1D×C and F1B×D samples include interpretations at every SNP corresponding to the relevant DSAs but not for SNPs with a DSA of the third donor strain.
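In this study, the ambiguity was resolved by visual inspection of the color-coded maps. The same reasoning could, in principle, be automated; the sketch below is one hypothetical way to do so and was not part of the analysis reported here. The function `infer_region` and the call labels ("het", "hom_dsa", "hom_other") are assumptions introduced for the example, and the code assumes that the calls passed in come from a run of closely linked SNPs without an intervening crossover.

```python
def infer_region(calls):
    """Combine per-SNP interpretations across a run of linked SNPs.

    calls: list of (dsa_strain, call) pairs, where call is
      "het"       - heterozygous (one allele from the DSA strain),
      "hom_dsa"   - homozygous for the DSA strain's allele,
      "hom_other" - homozygous for the alternative allele.
    Returns (status, strains) for the region.
    """
    het = {strain for strain, call in calls if call == "het"}
    hom = {strain for strain, call in calls if call == "hom_dsa"}
    if len(hom) == 1 and not het:
        return "homozygous", frozenset(hom)
    if len(het) == 2 and not hom:
        # Two different DSA strains both type heterozygous: the region is
        # heterozygous for exactly those two strains.
        return "heterozygous", frozenset(het)
    return "unresolved", frozenset(het | hom)

region = [
    ("BALB/c", "het"),
    ("C57Bl/6", "het"),
    ("DBA/2", "hom_other"),  # no DBA/2 contribution at this SNP
    ("BALB/c", "het"),
]
print(infer_region(region))
```

A run containing heterozygous calls for only one DSA strain remains unresolved under this sketch, mirroring the ambiguity of an isolated heterozygous SNP.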
We used SSLP typing to validate the predictions made using the SNP-based 3SA. For example, a detailed view of Chromosome 3 (60–120 Mb) for each of the 15 F2 mice analyzed (Fig. 7A) allows for the assignment of strain origin at the specific chromosomal location 90 Mb (Fig. 7B). To validate these strain origin assignments, we used SSLP D3Mit28 (located at 90.4 Mb). The D3Mit28 PCR reaction yields amplicon sizes corresponding to the C57Bl/6, BALB/c, and DBA/2 alleles of 150, 202, and 188 base pairs, respectively (Table 3). The expected bands were observed upon PCR analysis of the three donor strains and the two F1 strains (Fig. 7C); analysis of DNA from the 15 F2 mice yielded a variety of band patterns. For each F2 mouse, the genotype determined from the SSLP analysis matched exactly that predicted from the SNP-based 3SA algorithm. We repeated this analysis using four additional SSLPs located on Chromosomes 5, 11, 12, and 16, respectively. In all cases, there was perfect correlation between the two techniques (n=70 individual 3SA calls; Table 3).
We have established and validated an analysis algorithm based on SNP data that can successfully identify the strain origins of chromosomal regions in mice that have been bred from three distinct inbred strains. A standard individual SNP assay distinguishes only two possible alleles. We designed a computational approach to extend the capabilities of traditional SNP analysis to include an ability to distinguish genetic contributions from more than two inbred strains of mice. We subsequently validated this approach using SSLP analysis, which directly distinguishes multiple alleles. In all cases examined (n=70), there was 100% correlation between the two techniques. The ability to distinguish among three distinct genetic backgrounds has heretofore been the purview of molecular biology assays that are not inherently limited to only two alleles/polymorphism queried. Such assays include not only SSLP analysis but also the older technique of restriction fragment-length polymorphism (RFLP) analysis.18,19 As SSLP and RFLP analyses are not readily adapted to high-throughput technologies, they are not widely used, and SNP-based analysis has become the predominant method for genome-wide strain background analysis of laboratory mice. The work presented here therefore overcomes one of the salient disadvantages of traditional SNP analysis: its apparent inability to distinguish among more than two strains of mice.
There are several applications for the 3SA technique. First, a more complete understanding of the genetic makeup of mice used in experiments can facilitate the interpretation of results from experiments in which genetic background may not be particularly well controlled. For example, mouse-to-mouse phenotypic variability within the same experimental group may be more sensibly interpreted if phenotype can be shown to be associated with the presence of a specific genetic background on a particular chromosomal region. Second, the 3SA technique can identify mistakes in breeding during mouse colony development or maintenance. If a breeding involves two inbred strains of mice, a suspicion that a previous generation may have inadvertently included a mouse from a third strain can be easily tested and readily confirmed or refuted. Third, the 3SA can substitute for genotyping assays designed to screen for transgenic/knockout carrier status in mice. Indeed, we recently assisted a colleague who was having technical difficulty screening mice to identify carriers of a particular gene knockout allele. The breeding scheme was to backcross a specific knockout allele from one inbred strain (C57Bl/6) into another (FVB). During the breeding process, the PCR used to identify knockout carrier status, for unknown reasons, had begun to fail, yielding no bands at gel electrophoresis. We knew that the knockout mouse had been generated originally with the use of homologous recombination in embryonic stem cells from a distinct, third background (129/Sv1) and therefore, applied the 3SA technique to the progeny to identify carriers. SNPs immediately flanking (i.e., tightly linked to) the locus of the knockout allele genotyped as heterozygous (FVB×129/Sv1) in approximately one-half of mice tested, as expected. Used in this way, the 3SA technique readily serves as a surrogate assay to direct genotyping to identify mice that are carriers of knockout alleles.
There are inherent limitations to the 3SA algorithm. First, its effectiveness relies on correctly identifying the three specific strains to be assessed. Second, genomic regions that show contributions from all three donor strains defy interpretation. However, in an F2 cross, beginning with inbred strains, such regions are expected to be rare or nonexistent, based on the infrequency of double meiotic recombinants within ~60 Mb of one another.20 Such double-recombinants are rare, owing to the phenomenon of crossover interference, in which one meiotic crossover event inhibits the development of a second meiotic crossover event in cis.21 A chromosomal region larger than ~60 Mb, lacking a single informative SNP, is of more concern (an example of such a region is shown in Fig. 8). As the size of such “empty” regions increases, the uncertainty associated with determining genetic origin increases as well. Finally, there are occasional isolated cases of individual SNPs of one genetic background flanked by SNPs of distinct background origin. In the context of a backcross, these likely represent miscalls and can be safely ignored. We typically encounter fewer than five to 10 miscalled SNPs/genome.
Notably, the number of strains that can be distinguished by the algorithm described here need not be limited to three. Applying a similar computational logic to four strains, one can assign a DSA to each SNP if the SNP is informative in a 4SA; that is, if one strain genotypes differently (e.g., as AA) from each of the other three strains (e.g., each genotyping as BB). Applying this algorithm, approximately one-half of the SNPs on the Mouse MD Linkage Panel are informative, as compared with approximately two-thirds for three-strain combinations (unpublished data). Moreover, the DSAs for the informative SNPs must be distributed among four genetic backgrounds, rather than three, lowering the density of informative SNPs/strain. The practical implication is that as the number of strains to be considered increases, fewer informative SNPs are assigned to each donor strain. More specifically, in a 3SA, each strain is assigned ~320 SNPs (out of 1449), whereas in a 4SA, each strain is assigned only ~180 SNPs (unpublished data). Depending on the chromosomal distribution of informative SNPs, the Mouse MD Linkage Panel can be useful for a 4SA. Beyond four strains, the density of assignable informative SNPs/strain becomes even lower, further reducing the usefulness of the approach. However, this limitation could be overcome fairly easily (albeit not cheaply) by increasing many-fold the number of SNPs on the panel.
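The extension to more than three strains can be sketched by generalizing the DSA rule: a SNP receives a DSA only when exactly one donor strain differs from all the others. The function `assign_dsa_n` below is a hypothetical illustration, not the code used for the unpublished 4SA data. Note that the pattern counts it prints are purely combinatorial (how many homozygous genotype patterns admit a DSA) and should not be confused with the empirical fractions of informative SNPs on the panel, which depend on actual allele frequencies among strains.

```python
from collections import Counter
from itertools import product

def assign_dsa_n(genotypes):
    """Return the index of the single donor strain whose homozygous genotype
    differs from that of every other strain, or None if there is no such strain.

    Intended for three or more donor strains.
    """
    if any(g not in ("AA", "BB") for g in genotypes):
        return None  # uninterpretable
    counts = Counter(genotypes)
    if len(counts) != 2:
        return None  # all strains identical: uninformative
    minority = [allele for allele, n in counts.items() if n == 1]
    if len(minority) != 1:
        return None  # e.g., a 2-vs-2 split among four strains
    return genotypes.index(minority[0])

for n in (3, 4):
    patterns = list(product(("AA", "BB"), repeat=n))
    assignable = sum(assign_dsa_n(p) is not None for p in patterns)
    print(f"{n} strains: {assignable} of {len(patterns)} patterns admit a DSA")
```

With four strains, patterns in which two strains share one genotype and two share the other admit no DSA, which is one reason fewer SNPs are assignable as strains are added.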
This work was supported by NIHP30RR032136-01 and NIHP20RR016437 [including Centers of Biomedical Research Excellence (COBRE) Supplement 09S2].