A significant challenge facing high-throughput phenotyping of in-vivo knockout mice is ensuring phenotype calls are robust and reliable. Central to this problem is selecting an appropriate statistical analysis that models both the experimental design (the workflow and the way control mice are selected for comparison with knockout animals) and the sources of variation. Recently we proposed a mixed model suitable for small batch-oriented studies, where controls are not phenotyped concurrently with mutants. Here we evaluate this method both for its sensitivity in detecting phenotypic effects and for its control of false positives, across a range of workflows used at mouse phenotyping centers. We found that the sensitivity and control of false positives depend on the workflow. We show that the phenotypes in control mice fluctuate unexpectedly between batches and that this can inflate the false positive rate of phenotype calls when only a small number of batches are tested, because the effect of the knockout then becomes confounded with temporal fluctuations in control mice. This effect was observed in both behavioural and physiological assays. Based on this analysis, we recommend two approaches (workflow and accompanying control strategy) and associated analyses, which would be robust, for use in high-throughput phenotyping pipelines. Our results show the importance of modelling all sources of variability in high-throughput phenotyping studies.
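The confounding of knockout effects with batch fluctuations described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the evaluation used in the study: the function name `false_positive_rate`, the batch and group sizes, the Gaussian batch shifts, and the naive two-sample test with a normal approximation are all hypothetical choices made for illustration. There is no true genotype effect in the simulated data, so every call is a false positive.

```python
import random
from statistics import NormalDist, mean, variance


def false_positive_rate(batch_sd, n_sims=500, n_control_batches=20,
                        controls_per_batch=10, n_knockouts=14,
                        residual_sd=1.0, alpha=0.05, seed=1):
    """Fraction of null simulations in which a naive test calls a phenotype.

    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _sim in range(n_sims):
        # Controls are pooled from many batches, each with its own random shift.
        controls = []
        for _batch in range(n_control_batches):
            shift = rng.gauss(0.0, batch_sd)
            for _mouse in range(controls_per_batch):
                controls.append(shift + rng.gauss(0.0, residual_sd))
        # Knockouts all come from one batch; there is no genotype effect.
        ko_shift = rng.gauss(0.0, batch_sd)
        knockouts = [ko_shift + rng.gauss(0.0, residual_sd)
                     for _mouse in range(n_knockouts)]
        # Naive two-sample statistic that ignores batch structure
        # (normal approximation used instead of a t distribution).
        se = (variance(controls) / len(controls)
              + variance(knockouts) / len(knockouts)) ** 0.5
        z = (mean(knockouts) - mean(controls)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("no batch variation:", false_positive_rate(0.0))
    print("batch sd = residual sd:", false_positive_rate(1.0))
```

Under these assumptions the rate with `batch_sd=0.0` should stay close to the nominal level, while a batch standard deviation equal to the residual standard deviation should push it far higher, because the single knockout batch carries a shift that the naive test attributes to genotype.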
Klebsiella pneumoniae (Kp) is a bacterium causing severe pneumonia in immunocompromised hosts and is often associated with sepsis. With the rise of antibiotic-resistant bacteria, there is a need for new effective and affordable control methods; understanding the genetic architecture of susceptibility to Kp will help in their development. We performed the first quantitative trait locus (QTL) mapping study of host susceptibility to Kp infection in immunocompetent Collaborative Cross mice (CC). We challenged 328 mice from 73 CC lines intraperitoneally with 10⁴ colony forming units of Kp strain K2. Survival and body weight were monitored for 15 days post challenge. 48 of the CC lines were genotyped with 170,000 SNPs, with which we mapped QTLs.
CC lines differed significantly (P < 0.05) in mean survival time, which ranged from 1 to 15 days post infection, and broad sense heritability was 0.45. Distinct QTL were mapped at specific time points during the challenge. A QTL on chromosome 4 was found only on day 2 post infection, and QTL on chromosomes 8 and 18 only on day 8. By using the sequence variations of the eight inbred strain founders of the CC to refine QTL localization, we identify several candidate genes.
Host susceptibility to Kp is a complex trait, controlled by multiple genetic factors that act sequentially during the course of infection.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-865) contains supplementary material, which is available to authorized users.
Klebsiella pneumoniae; Mouse model; Collaborative cross mice; Host susceptibility; QTL mapping; Candidate genes
The extent to which sex-specific genetic effects contribute to phenotypic variation is largely unknown. We applied a novel Bayesian method, sparse partitioning, to detect gene by sex (GxS) and gene by gene (GxG) quantitative trait loci (QTLs) in 1,900 outbred heterogeneous stock mice. In an analysis of 55 phenotypes, we detected 16 GxS and 6 GxG QTLs. The increase in the amount of phenotypic variance explained by models including GxS was small, ranging from 0.14% to 4.30%. We conclude that GxS interactions rarely make a large overall contribution to the heritability of phenotypes; however, there are cases where they will be individually important.
Post-translational protein modifications such as acetylation have significant regulatory roles in metabolic processes, but their relationship to both variation in gene expression and DNA sequence is unclear. We address this question in the Goto-Kakizaki (GK) rat inbred strain, a model of polygenic type 2 diabetes. Expression of the NAD-dependent deacetylase Sirtuin-3 is down-regulated in GK rats compared to normoglycemic Brown Norway (BN) rats. We show first that a promoter SNP causes down-regulation of Sirtuin-3 expression in GK rats. We then use mass-spectrometry to identify proteome-wide differential lysine acetylation of putative Sirtuin-3 protein targets in livers of GK and BN rats. These include many proteins in pathways connected to diabetes and metabolic syndrome. We finally sequence GK and BN liver transcriptomes and find that mRNA expression of these targets does not differ significantly between GK and BN rats, in contrast to other components of the same pathways. We conclude that physiological differences between GK and BN rats are mediated by a combination of differential protein acetylation and gene transcription and that genetic variation can modulate acetylation independently of expression.
Oligonucleotide microarray-based comparative genomic hybridization (CGH) offers an attractive possible route for the rapid and cost-effective genome-wide discovery of deletion mutations. CGH typically involves comparison of the hybridization intensities of genomic DNA samples with microarray chip representations of entire genomes, and has widespread potential application in experimental research and medical diagnostics. However, the power to detect small deletions is low.
Here we use a graduated series of Arabidopsis thaliana genomic deletion mutations (of sizes ranging from 4 bp to ~5 kb) to optimize CGH-based genomic deletion detection. We show that the power to detect smaller deletions (4, 28 and 104 bp) depends upon oligonucleotide density (essentially the number of genome-representative oligonucleotides on the microarray chip), and determine the oligonucleotide spacings necessary to guarantee detection of deletions of specified size.
Our findings will enhance a wide range of research and clinical applications, and in particular will aid in the discovery of genomic deletions in the absence of a priori knowledge of their existence.
Mutation; Deletion; Microarray; Genome; Comparative genomic hybridization; Probe density
Genetic variation in the major histocompatibility complex (MHC) affects CD4∶CD8 lineage commitment and MHC expression. However, the contribution of specific genes in this gene-dense region has not yet been resolved. Nor has it been established whether the same genes regulate MHC expression and T cell selection. Here, we assessed the impact of natural genetic variation on MHC expression and CD4∶CD8 lineage commitment using two genetic models in the rat. First, we mapped Quantitative Trait Loci (QTLs) associated with variation in MHC class I and II protein expression and the CD4∶CD8 T cell ratio in outbred Heterogeneous Stock rats. We identified 10 QTLs across the genome and found that QTLs for the individual traits colocalized within a region spanning the MHC. To identify the genes underlying these overlapping QTLs, we generated a large panel of MHC-recombinant congenic strains, and refined the QTLs to two adjacent intervals of ∼0.25 Mb in the MHC-I and II regions, respectively. An interaction between these intervals affected MHC class I expression as well as negative selection and lineage commitment of CD8 single-positive (SP) thymocytes. We mapped this effect to the transporter associated with antigen processing 2 (Tap2) in the MHC-II region and the classical MHC class I gene(s) (RT1-A) in the MHC-I region. This interaction was revealed by a recombination between RT1-A and Tap2, which occurred in 0.2% of the rats. Variants of Tap2 have previously been shown to influence the antigenicity of MHC class I molecules by altering the MHC class I ligandome. Our results show that a restricted peptide repertoire on MHC class I molecules leads to reduced negative selection of CD8SP cells. To our knowledge, this is the first study showing how a recombination between natural alleles of genes in the MHC influences lineage commitment of T cells.
Peptides from degraded cytoplasmic proteins are transported via TAP into the endoplasmic reticulum for loading onto MHC class I molecules. TAP is encoded by Tap1 and Tap2, which in rodents are located close to the MHC class I genes. In the rat, genetic variation in Tap2 gives rise to two different transporters: a promiscuous A variant (TAP-A) and a more restrictive B variant (TAP-B). It has been proposed that the class I molecule in the DA rat (RT1-Aa) has co-evolved with TAP-A and it has been shown that RT1-Aa antigenicity is changed when co-expressed with TAP-B. To study the contribution of different allelic combinations of RT1-A and Tap2 to the variation in MHC expression and T cell selection, we generated DA rats with either congenic or background alleles in the RT1-A and Tap2 loci. We found increased numbers of mature CD8SP cells in the thymus of rats which co-expressed RT1-Aa and TAP-B. This increase of CD8 cells could be explained by reduced negative selection, but did not correlate with RT1-Aa expression levels on thymic antigen presenting cells. Thus, our results identify a crucial role of the TAP and the quality of the MHC class I repertoire in regulating T cell selection.
The number of imprinted genes in the mammalian genome is predicted to be small, yet we show here, in a survey of 97 traits measured in outbred mice, that most phenotypes display parent-of-origin effects that are partially confounded with family structure. To address this contradiction, using reciprocal F1 crosses, we investigated the effects of knocking out two candidate genes, Man1a2 and H2-ab1, that reside at nonimprinted loci but show parent-of-origin effects. We show that expression of multiple genes becomes dysregulated in a sex-, tissue-, and parent-of-origin-dependent manner. We provide evidence that nonimprinted genes can generate parent-of-origin effects by interaction with imprinted loci and deduce that the importance of the number of imprinted genes is secondary to their interactions. We propose that this gene network effect may account for some of the missing heritability seen when comparing sibling-based to population-based studies of the phenotypic effects of genetic variants.
• Heritability of murine complex traits has a significant parent-of-origin effect
• Many mouse quantitative trait loci show parent-of-origin effects
• Gene knockouts induce parent-of-origin-like expression changes in reciprocal crosses
A surprisingly large proportion of traits exhibiting inheritance patterns based on parent of origin indicates a network of interactions between imprinted and nonimprinted genes. The results suggest that these interactions may account for some of the missing heritability seen when comparing sibling-based to population-based studies of the phenotypic effects of genetic variants.
Genetic mapping on fully sequenced individuals is transforming our understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating novel genes in models of anxiety, heart disease and multiple sclerosis. The relation between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show the extent and spatial pattern of variation in inbred rats differ significantly from those of inbred mice, and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species.
The function of adult neurogenesis in the rodent brain remains unclear. Ablation of adult born neurons has yielded conflicting results about emotional and cognitive impairments. One hypothesis is that adult neurogenesis in the hippocampus enables spatial pattern separation, allowing animals to distinguish between similar stimuli. We investigated whether spatial pattern separation and other putative hippocampal functions of adult neurogenesis were altered in a novel genetic model of neurogenesis ablation in the rat. In rats engineered to express thymidine kinase (TK) from a promoter of the rat glial fibrillary acidic protein (GFAP), ganciclovir treatment reduced new neurons by 98%. GFAP-TK rats showed no significant difference from controls in spatial pattern separation on the radial maze, spatial learning in the water maze, or contextual or cued fear conditioning. Meta-analysis of all published studies found no significant effects for ablation of adult neurogenesis on spatial memory, cue conditioning or ethological measures of anxiety. An effect on contextual freezing was significant at a threshold of 5% (P = 0.04), but not at a threshold corrected for multiple testing. The meta-analysis revealed remarkably high levels of heterogeneity among studies of hippocampal function. The source of this heterogeneity remains unclear and poses a challenge for studies of the function of adult neurogenesis.
Adult neurogenesis occurs in the rodent brain, but its function remains unclear. Current theories hold that adult neurogenesis in the hippocampus supports pattern separation, thereby allowing animals to distinguish between similar, overlapping inputs. However, the effects of pharmacological, radiation and genetic ablation of adult neurogenesis on putative hippocampal functions have been inconsistent. We developed a novel genetic model to ablate adult neurogenesis in the rat. We found that we could reduce adult neurogenesis by 98%. Rats without adult neurogenesis showed no significant difference from controls in learning and memory tasks or in spatial pattern separation. We investigated the sources of heterogeneity in published results using a meta-analysis. The source of this heterogeneity remains unclear and poses a challenge for studies of the function of adult neurogenesis.
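Heterogeneity among studies in a meta-analysis is commonly summarized with Cochran's Q and the I² statistic. The text above does not say which measure was used, so the following is only a sketch of that common approach under a fixed-effect, inverse-variance weighting; the function name `heterogeneity` and the example inputs are hypothetical.

```python
def heterogeneity(effects, std_errors):
    """Return (pooled_effect, Q, I2) for per-study effects and standard errors.

    Fixed-effect inverse-variance pooling; I2 is returned as a fraction in [0, 1].
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


if __name__ == "__main__":
    # Two made-up studies with equal precision and very different effects.
    print(heterogeneity([0.0, 4.0], [1.0, 1.0]))
```

A value of I² near 1 would indicate that most of the observed variation between studies exceeds what sampling error alone would produce.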
Periodontal infection (Periodontitis) is a chronic inflammatory disease, which results in the breakdown of the supporting tissues of the teeth. Previous epidemiological studies have suggested that resistance to chronic periodontitis is controlled to some extent by genetic factors of the host. The aim of this study was to determine the phenotypic response of inbred and Collaborative Cross (CC) mouse populations to periodontal bacterial challenge, using an experimental periodontitis model. In this model, mice are co-infected with Porphyromonas gingivalis and Fusobacterium nucleatum, bacterial strains associated with human periodontal disease. Six weeks following the infection, the maxillary jaws were harvested and analyzed for alveolar bone loss relative to uninfected controls, using computerized microtomography (microCT). Initially, four commercial inbred mouse strains were examined to calibrate the procedure and test for gender effects. Subsequently, we applied the same protocol to 23 lines (at inbreeding generations 10–18) from the newly developed mouse genetic reference population, the Collaborative Cross (CC) to determine heritability and genetic variation of control bone volume prior to infection (CBV, naïve bone volume around the teeth of uninfected mice), and residual bone volume (RBV, bone volume after infection) and loss of bone volume (LBV, the difference between CBV and RBV) following infection.
BALB/cJ mice were highly susceptible (P<0.05) whereas DBA/2J, C57BL/6J and A/J mice were resistant. Six lines of the tested CC population were susceptible, whereas the remaining lines were resistant to alveolar bone loss. Gender effects on bone volume were tested across the four inbred and 23 CC lines, and found not to be significant. Based on ANOVA, broad-sense heritabilities were statistically significant and equal to 0.4 for CBV and 0.2 for LBV.
The moderate heritability values indicate that the variation in host susceptibility to the disease is controlled to an appreciable extent by genetic factors. These results strongly support the possibility of using the Collaborative Cross, as well as developing dedicated F2 (resistant x susceptible inbred strains) resource populations, for future dissection of genetic factors in periodontitis.
Periodontal infection; Experimental periodontitis; microCT; Collaborative cross; Genes; Heritability
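Broad-sense heritability in a panel of inbred lines is often estimated from a one-way ANOVA with line as the grouping factor. The abstract above does not give the exact estimator, so the sketch below assumes a common one: the between-line variance component divided by the sum of the between-line and within-line components. The function name `broad_sense_heritability` and the input layout are hypothetical.

```python
from statistics import mean


def broad_sense_heritability(lines):
    """Estimate H2 from a dict mapping line name -> list of phenotype values.

    Assumed estimator: var_G / (var_G + var_E), with var_G taken from the
    one-way ANOVA mean squares and truncated at zero.
    """
    groups = [vals for vals in lines.values() if vals]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two lines and replicate animals")
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective number of animals per line (equals n for a balanced design).
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_g = max(0.0, (ms_between - ms_within) / n0)
    total = var_g + ms_within
    return var_g / total if total > 0 else 0.0


if __name__ == "__main__":
    # Made-up bone volume values for three hypothetical lines.
    example = {"line_a": [1, 2, 3], "line_b": [4, 5, 6], "line_c": [7, 8, 9]}
    print(broad_sense_heritability(example))
```

Truncating the genetic variance component at zero is a design choice that keeps the estimate within [0, 1] when lines differ less than expected from within-line noise.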
Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.
Regulatory sites of the genome affect gene expression and complex traits, including disease susceptibility. Variable regulatory sites are potentially interesting because they are a likely cause of phenotypic variation, providing a bridge between sequence and transcriptional variation. In this paper we identify regions of the genome where DNA is not wrapped up in chromatin (hence potentially regulatory) in eight inbred strains of mice. We identify sites that vary among strains and compare them to non-variable sites. We show that more than half of variable sites cannot be attributed to local sequence variation. Functional consequences (in terms of readily detectable changes in gene expression) are associated with less than 10% of variable DNase I hypersensitive sites. We show that variable sites are enriched for sequence variants contributing to complex traits in mice.
A significant challenge of in-vivo studies is the identification of phenotypes with a method that is robust and reliable. The challenge arises from practical issues that lead to experimental designs which are not ideal. Breeding issues, particularly in the presence of fertility or fecundity problems, frequently lead to data being collected in multiple batches. This problem is acute in high throughput phenotyping programs. In addition, in a high throughput environment operational issues lead to controls not being measured on the same day as knockouts. We highlight how application of traditional methods, such as a Student’s t-Test or a 2-way ANOVA, in these situations gives flawed results and should not be used. We explore the use of mixed models for the analysis of mouse knockout data, using worked examples from the Sanger Mouse Genome Project that focus on Dual-Energy X-Ray Absorptiometry data, and compare them to a reference range approach. We show that mixed model analysis is more sensitive and less prone to artefacts, allowing the discovery of subtle quantitative phenotypes essential for correlating a gene’s function to human disease. We demonstrate how a mixed model approach has the additional advantage of being able to include covariates, such as body weight, to separate the effect of genotype from these covariates. This is a particular issue in knockout studies, where body weight is a common phenotype; accounting for it will enhance the precision of assigning phenotypes and the subsequent selection of lines for secondary phenotyping. The use of mixed models with in-vivo studies has value not only in improving the quality and sensitivity of the data analysis but also ethically, as a method suitable for small batches which reduces the breeding burden of a colony. This will reduce the use of animals, increase throughput, and decrease cost whilst improving the quality and depth of knowledge gained.
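A mixed model of the kind described above can be written, as a sketch, with genotype and body weight as fixed effects and batch as a random effect. The symbols below are assumptions introduced for illustration and are not taken from the paper, which may include further terms.

```latex
% y_{ij}: phenotype of mouse i measured in batch j (notation assumed)
y_{ij} = \beta_0
       + \beta_G \,\mathrm{genotype}_{ij}
       + \beta_W \,\mathrm{weight}_{ij}
       + b_j + \varepsilon_{ij},
\qquad
b_j \sim \mathcal{N}(0, \sigma_b^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this formulation a phenotype call corresponds to testing whether \beta_G differs from zero, while \sigma_b^2 absorbs the day-to-day variation that a t-test would otherwise attribute to genotype.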
The genes involved in conferring susceptibility to anxiety remain obscure. We developed a new method to identify genes at quantitative trait loci (QTLs) in a population of heterogeneous stock mice descended from known progenitor strains. QTLs were partitioned into intervals that can be summarized by a single phylogenetic tree among progenitors and intervals tested for consistency with alleles influencing anxiety at each QTL. By searching for common Gene Ontology functions in candidate genes positioned within those intervals, we identified actin depolymerizing factors (ADFs), including cofilin-1 (Cfl1), as genes involved in regulating anxiety in mice. There was no enrichment for function in the totality of genes under each QTL, indicating the importance of phylogenetic filtering. We confirmed experimentally that forebrain-specific inactivation of Cfl1 decreased anxiety in knockout mice. Our results indicate that similarity of function of mammalian genes can be used to recognize key genetic regulators of anxiety and potentially of other emotional behaviours.
Thousands of small effect loci are believed to contribute to behavioural variation in mammals. Their abundance and small size frustrate gene identification and make it difficult to know which among them are central to the responsible biological mechanisms. Using imputed genome sequences from 2,000 outbred mice and by testing for an enrichment of functional annotations, we identify 167 candidate genes involved in anxiety. Unexpectedly, annotations implicate actin depolymerizing factors (ADFs), including cofilin-1 (Cfl1), as being involved with the expression of anxiety phenotypes in mice. We confirmed that forebrain-specific inactivation of Cfl1 decreased anxiety in knockout mice.
Structural variation is widespread in mammalian genomes [1,2] and is an important cause of disease [3], but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear [4,5]. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 0.71M SVs at 0.28M sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 0.16M SVs were mapped to base pair resolution allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene-expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One third of the genes so affected have immunological functions.
The Collaborative Cross (CC) is a panel of recombinant inbred lines derived from eight genetically diverse laboratory inbred strains. Recently, the genetic architecture of the CC population was reported based on the genotype of a single male per line, and other publications reported incompletely inbred CC mice that have been used to map a variety of traits. The three breeding sites, in the US, Israel, and Australia, are actively collaborating to accelerate the inbreeding process through marker-assisted inbreeding and to expedite community access of CC lines deemed to have reached defined thresholds of inbreeding. Plans are now being developed to provide access to this novel genetic reference population through distribution centers. Here we provide a description of the distribution efforts by the University of North Carolina Systems Genetics Core, Tel Aviv University, Israel and the University of Western Australia.
We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
Multicellular organisms can be regenerated from totipotent differentiated somatic cell or nuclear founders [1–3]. Organisms regenerated from clonally related isogenic founders might a priori have been expected to be phenotypically invariant. However, clonal regenerant animals display variant phenotypes caused by defective epigenetic reprogramming of gene expression, and clonal regenerant plants exhibit poorly understood heritable phenotypic (“somaclonal”) variation [4–7]. Here we show that somaclonal variation in regenerant Arabidopsis lineages is associated with genome-wide elevation in DNA sequence mutation rate. We also show that regenerant mutations comprise a distinctive molecular spectrum of base substitutions, insertions, and deletions that probably results from decreased DNA repair fidelity. Finally, we show that while regenerant base substitutions are a likely major genetic cause of the somaclonal variation of regenerant Arabidopsis lineages, transposon movement is unlikely to contribute substantially to that variation. We conclude that the phenotypic variation of regenerant plants, unlike that of regenerant animals, is substantially due to DNA sequence mutation.
► Regenerant Arabidopsis lineages display heritable phenotypic variation
► Regenerant Arabidopsis lineages display elevated genome-wide DNA sequence mutation
► Regenerant DNA sequence mutations comprise a distinct molecular spectrum
► Regenerant base substitution mutations confer heritable phenotypic variation
During a meeting of the SYSGENET working group ‘Bioinformatics’, currently available software tools and databases for systems genetics in mice were reviewed and the needs for future developments discussed. The group evaluated interoperability and performed initial feasibility studies. To aid future compatibility of software and exchange of already developed software modules, a strong recommendation was made by the group to integrate HAPPY and R/qtl analysis toolboxes, GeneNetwork and XGAP database platforms, and TIQS and xQTL processing platforms. R should be used as the principal computer language for QTL data analysis in all platforms and a ‘cloud’ should be used for software dissemination to the community. Furthermore, the working group recommended that all data models and software source code should be made visible in public repositories to allow a coordinated effort on the use of common data structures and file formats.
QTL mapping; database; mouse; systems genetics
The onset of flowering is an important adaptive trait in plants. The small ephemeral species Arabidopsis thaliana grows under a wide range of temperature and day-length conditions across much of the Northern hemisphere, and a number of flowering-time loci that vary between different accessions have previously been identified. However, only a few studies have addressed the species-wide genetic architecture of flowering-time control. We have taken advantage of a set of 18 distinct accessions that represent much of the common genetic diversity of A. thaliana and mapped quantitative trait loci (QTL) for flowering time in 17 F2 populations derived from these parents. We found that the majority of flowering-time QTL cluster in as few as five genomic regions, which include the locations of the entire FLC/MAF clade of transcription factor genes. By comparing effects across shared parents, we conclude that in several cases there might be an allelic series caused by rare alleles. While this finding parallels results obtained for maize, in contrast to maize much of the variation in flowering time in A. thaliana appears to be due to large-effect alleles.
A large correlation between variation in T cell subsets and hippocampal neurogenesis suggests that the immune system has an unexpectedly large influence on the brain.
Neurogenesis continues through the adult life of mice in the subgranular zone of the dentate gyrus in the hippocampus, but its function remains unclear. Measuring cellular proliferation in the hippocampus of 719 outbred heterogeneous stock mice revealed a highly significant correlation with the proportions of CD8+ versus CD4+ T lymphocyte subsets. This correlation reflected shared genetic loci, with the exception of the H-2Ea locus that had a dominant influence on T cell subsets but no impact on neurogenesis. Analysis of knockouts and repopulation of TCRα-deficient mice by subsets of T cells confirmed the influence of T cells on adult neurogenesis, indicating that CD4+ T cells or subpopulations thereof mediate the effect. Our results reveal an organismal impact, broader than hitherto suspected, of the natural genetic variation that controls T cell development and homeostasis.
In adult mice new neurons are produced in the hippocampus, where they are thought to influence learning, memory, and emotional regulation. The mechanisms and functions of this neurogenesis, however, remain unclear. Here we report that in different strains of mice, variation in cellular proliferation in the hippocampus (an index of neurogenesis) correlates with variation in the ratio of CD4+ to CD8+ T cells (an immunology phenotype). We also show that T cells can influence neurogenesis (but that neurogenesis does not influence T cells) by analyzing knockouts, depleting mice of T cells, and repopulating alymphoid animals. The strong genetic correlation between T cells and cellular proliferation in the hippocampus contrasts with the weak, often non-significant, correlation with behavioral phenotypes. Of significance, the findings here suggest that modulation of the functions of the hippocampus to influence behavior is not the primary role of neurogenesis.
Array comparative genomic hybridization (aCGH) to detect copy number variants (CNVs) in mammalian genomes has led to a growing awareness of the potential importance of this category of sequence variation as a cause of phenotypic variation. Yet there are large discrepancies between studies, so that the extent of the genome affected by CNVs is unknown. We combined molecular and aCGH analyses of CNVs in inbred mouse strains to investigate this question.
Using a 2.1 million probe array we identified 1,477 deletions and 499 gains in 7 inbred mouse strains. Molecular characterization indicated that approximately one third of the CNVs detected by the array were false positives and we estimate the false negative rate to be more than 50%. We show that low concordance between studies is largely due to the molecular nature of CNVs, many of which consist of a series of smaller deletions and gains interspersed by regions where the DNA copy number is normal.
Our results indicate that CNVs detected by arrays may be the coincidental co-localization of smaller CNVs, whose presence is more likely to perturb an aCGH hybridization profile than the effect of an isolated, small, copy number alteration. Our findings help explain the hitherto unexplored discrepancies between array-based studies of copy number variation in the mouse genome.
Genome-wide association studies using commercially available outbred mice can detect genes involved in phenotypes of biomedical interest. Useful populations need high-frequency alleles to ensure high power to detect quantitative trait loci (QTLs), low linkage disequilibrium between markers to obtain accurate mapping resolution, and an absence of population structure to prevent false positive associations. We surveyed 66 colonies for inbreeding, genetic diversity, and linkage disequilibrium, and we demonstrate that some have haplotype blocks of less than 100 Kb, enabling gene-level mapping resolution. The same alleles contribute to variation in different colonies, so that when mapping progress stalls in one, another can be used in its stead. Colonies are genetically diverse: 45% of the total genetic variation is attributable to differences between colonies. However, quantitative differences in allele frequencies, rather than the existence of private alleles, are responsible for these population differences. The colonies derive from a limited pool of ancestral haplotypes resembling those found in inbred strains: over 95% of sequence variants segregating in outbred populations are found in inbred strains. Consequently it is possible to impute the sequence of any mouse from a dense SNP map combined with inbred strain sequence data, which opens up the possibility of cataloguing and testing all variants for association, a situation that has so far eluded studies in completely outbred populations. We demonstrate the colonies' potential by identifying a deletion in the promoter of H2-Ea as the molecular change that strongly contributes to setting the ratio of CD4+ and CD8+ lymphocytes.
We show that commercially available mice are a resource for detecting single genes by genome-wide association. We surveyed 66 populations and identified those with properties conducive to high-resolution mapping. Importantly, we show that the same alleles contribute to variation in different colonies, so that when mapping progress stalls in one colony, another can be used in its stead. As a proof of principle, we detect the same QTL in different colonies influencing CD4+/CD8+ ratios and refine this mapping to the gene level. We show that a deletion in the promoter of H2-Ea is the molecular change that strongly contributes to setting the ratio of CD4+ and CD8+ lymphocytes. Our results make it possible for geneticists to make informed choices on the use of colonies for genome-wide association studies of complex traits in mice.
The 1001 Genomes project for Arabidopsis thaliana could provide an enormous boost for plant research for a modest financial investment.
We advocate here a 1001 Genomes project for Arabidopsis thaliana, the workhorse of plant genetics, which will provide an enormous boost for plant research with a modest financial investment.