|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: MY PL. Performed the experiments: PL HV. Analyzed the data: PL DW YL. Wrote the paper: PL. Other: Revise the manuscript HV.
Understanding the genetic basis of common disease and disease-related quantitative traits will aid in the development of diagnostics and therapeutics. The processs of gene discovery can be sped up by rapid and effective integration of well-defined mouse genome and phenome data resources. We describe here an in silico gene-discovery strategy through genome-wide association (GWA) scans in inbred mice with a wide range of genetic variation. We identified 937 quantitative trait loci (QTLs) from a survey of 173 mouse phenotypes, which include models of human disease (atherosclerosis, cardiovascular disease, cancer and obesity) as well as behavioral, hematological, immunological, metabolic, and neurological traits. 67% of QTLs were refined into genomic regions <0.5 Mb with ~40-fold increase in mapping precision as compared with classical linkage analysis. This makes for more efficient identification of the genes that underlie disease. We have identified two QTL genes, Adam12 and Cdh2, as causal genetic variants for atherogenic diet-induced obesity. Our findings demonstrate that GWA analysis in mice has the potential to resolve multiple tightly linked QTLs and achieve single-gene resolution. These high-resolution QTL data can serve as a primary resource for positional cloning and gene identification in the research community.
Humans and mice last shared a common ancestor 75 million years ago. Of mouse genes, 99% have homologues in humans and 96% are in the same syntenic location. Regions of the genome that are highly conserved between the two species align well with functionality . The mouse has been a powerful force in elucidating the genetic basis of human physiology and pathophysiology . Many genetic alterations identified in mouse models have a common genetic equivalent (or ortholog) in humans. In the past decades, quantitative trait locus (QTL) mapping has identified hundreds of chromosomal regions containing genes affecting cancer, diabetes, hypertension, obesity and other disease-related phenotypes in mice. Most of these complex traits show clear genetic components, yet the underlying genes remain largely elusive.
QTL analysis using mouse models can potentially identify genes that are important for human diseases, which are often difficult to identify in human populations, due to the complexity of human genetic data. However, a major obstacle of identifying QTL genes is the difficulty of resolving these chromosomal regions (10~20 cM) into sufficiently small intervals to make positional cloning possible. Recent advances in genomic sequence analysis and SNP discovery have provided researchers with the necessary resources to explore a wide range of genetic variation in laboratory inbred mice . The use of dense SNP maps in laboratory inbred mice has proven successful in the refinement of previous QTL regions and the identification of new genetic determinants of complex traits –. Recently, positional cloning of a novel gene underlying a complex trait was done immediately after a genome-wide association (GWA) analysis .
A large and diverse set of traits has been systematically organized in the publicly accessible Mouse Phenome Database (MPD) (www.jax.org/phenome), housed at The Jackson Laboratory (Bar Harbor, Maine, United States). Rapid and effective integration of mouse genome and phenome data resources can dramatically speed up the process of gene discovery (Figure 1). Here, we describe an in silico gene-discovery strategy through GWA scans to identify high-resolution QTLs in inbred mice. We further demonstrate that GWA analysis in mice has the potential to resolve multiple tightly linked QTLs and achieve single-gene resolution.
We first evaluated the laboratory inbred mouse haplotype map that was used in the analysis. This haplotype map was generated by a joint effort of The Broad Institute of Harvard and MIT, and the Wellcome Trust Center for Human Genetics (Text S1). In total, 148,062 evenly distributed single-nucleotide polymorphisms (SNPs), which span the mouse genome at an average density of ~18 kb per SNP, were genotyped in classical inbred mouse strains (Figure S1). Of these SNPs, 96.2% have minor allele frequencies greater than 0.05 and 76.7% of the inter-SNP distances are smaller than 20 kb. LD decay over the physical distance was observed where the mean r2 fell to <0.5 within 2 kb and to less than 0.23 within 200 kb. To capture large-scale patterns of LD, we also measured the number of proxies, i.e. SNPs, showing a strong correlation with one or more others , with a window size of 500 kb across the genome (Figure S2). Using a stringent threshold of r2=1.0 (that is complete LD), the average number of proxies per SNP is 2.5, and one in two SNPs have five or more proxies. Of all the SNPs, 89.5% have an r2=1.0 with at least one other SNP. Using a looser threshold of r2>0.5, the average number of proxies is increased to 6.5, and two in three SNPs have more than five proxies. Of all the SNPs, 91.3% have an r2>0.5 with at least one other SNP. These data indicate that the mouse haplotype map provides substantial coverage of genetic variation in inbred mice and is useful for high-resolution genetic mapping studies.
We performed a comprehensive survey of 173 complex quantitative traits from the mouse PHENOME projects instituted by the Jackson Laboratory (www.jax.org/phenome). In most PHENOME projects, each phenotype was measured for about 40 strains from a total of 59 inbred strains of mice with a wide range of genetic variation (Figure S3 and Text S1). The strains used for phenotyping varied slightly among different PHENOME projects. Generally, at least 10 males and 10 females from each strain were measured for each trait. These complex traits include models of human disease (such as atherosclerosis, cardiovascular disease, cancer, diabetes, obesity, and osteoporosis) as well as behavioral, biochemical, hematological, immunological, metabolic, neurological, and reproductive traits (Table 1). Most of these 173 mouse quantitative traits are highly heritable, with a mean heritability of 0.51, and 74% of these traits have a heritability larger than 0.40 (Figure S4). In general, cancer, taste and drinking preferences show less genetic control (h2<0.3), while body composition and plasma lipid concentrations show larger genetic control (h2>0.7). Of these mouse traits, 47% display significant phenotypic differences between males and females (P<0.05), contributing to 1–31% of the phenotypic variation in the inbred mouse populations. Notably, obesity-related traits such as body weight and fat mass have larger sex effects and explain more than 20% of the phenotypic variation in inbred mice. Nevertheless, sex-specific genetic variations are generally small (less than 6%) in the inbred mouse populations.
To identify the genetic basis of this large and diverse set of mouse phenotypes, we designed an in silico approach for gene identification through GWA scans in inbred mice. In our approach, GWA scans were implemented in an automatic processing pipeline which constitutes data retrieving, outlier detection, data preprocessing, hypothesis testing, permutation testing and QTL identification. The potential problem of population structure in inbred mice was also inspected (Figure S5 and Text S1). Totally, we identified 937 QTLs from the survey of 173 complex quantitative traits in inbred mice. These QTLs affect almost all types of mouse phenotypes and are distributed unevenly on all of the mouse chromosomes except chromosome Y (Figure 2A, and Table S1 and S2). The number of QTLs per trait ranges from 0 to 38, with a mean of 5.4 (Figure 2B). The average size of a QTL region is 0.54 Mb and 67% of the QTLs were mapped to genomic regions less than 0.5 Mb (Figure 2C). This translates to an extraordinary ~40-fold increase in mapping precision as compared with clasical linkage analysis. In addition, about 20% of the QTLs were located within genomic regions less than 100 kb away from at least one other QTL, implying the prevalence of pleiotropic effects of genetic manipulation in mice.
Figure S6 shows a general QTL landscape from our GWA scans of several obesity-related phenotypes, including body weight and lean mass on normal chow and high fat, high cholesterol atherogenic diets. The analyses identified a large number of QTLs, most of which contained multiple significant SNPs clustering in the same genomic region. Several QTLs affecting multiple obesity phenotypes were repeatedly mapped to the identical locations on distal chromosome 1 and proximal chromosome 13 in different datasets. Most of these QTLs are broadly concordant with those detected previously by linkage analyses which, however, normally encompassed 10–20 cM (about 20 to 40 Mb on the mouse genome) . In some scenarios, several separated QTLs from the GWA scans were mapped to a single linkage-defined QTL region. This lack of precision is mainly due to limited recombinants in these linkage mapping populations. Our GWA scans also found several novel QTLs that do not overlap with previous linkage analysis regions (Figure S6). For example, we fine mapped a novel obesity QTL to a region of approximately 700 Kb (33.98–34.71 Mb) on proximal chromosome 9, which covers two candidates: Kirrel3 and A130066N16Rik; and mapped a novel QTL to a region of approximately 100 Kb (56.28–56.37 Mb) on distal chromosome 19, which covers Habp2 and Nrap. These novel QTLs identified in the GWA are not unexpected since association mapping detects allelic associations at the level of the population, which is normally comprised of 30–40 genetically diverse inbred mouse strains, rather than allelic difference between two specific parental strains in linkage mapping. Furthermore, all the obesity QTL regions detected by the GWA analyses using natural inbred mouse population samples are much smaller (less than 0.6 Mb on average with the present SNP density of 18 kb per SNP) as compared with linkage analyses. This makes positional cloning feasible immediately after GWA scans without the need for further time-comsuming fine mapping with congenic strains.
GWA scans help resolve multiple, tightly linked QTLs. We again used obesity QTLs to demonstrate this potential. So far, at least 12 obesity QTLs have been mapped to distal mouse chromosome 1, and most of them overlap with each other and may be linked to the same gene (Figure 3A). Among these, Bwtq1 was originally identified in a cross between DBA/2J and C57BL/6J  and later refined to an 8-cM region (~33 Mb) between D1Mit30 and D1Mit57 . This QTL has a general effect on body size, affecting the length of the tail and most bones studied, as well as having a weaker effect on mass. Using congenic strains, Bwtq1 was narrowed down to a 1.4-cM region (~4 Mb), approximately the region from D1Mit451 to D1Mit219 . The location of Bwtq1 was further refined to 0.91 Mb between D1Icp7 and D1Icp10 using interval-specific subcongenic strains, which led to the identification of a strong candidate, Pappa2, encoding an enzyme that cleaves an insulin-like growth factor binding protein (Figure 3B) . However, our GWA scans identified two closely linked QTLs within 5 Mb of each other on distal chromosome 1. One is associated with several different obesity phenotypes in three independent datasets and is located within 153.5–155.5 Mb (–log(P)=8.2–11.4) and the other reproduced the refined Bwtq1 QTL, showing significant effects on body weight and bone mineral content (–log(P)=7.2). These results illustrate the robustness and reproducibility of GWA scans for complex quantitative traits in inbred mice.
The refined region of previous QTLs was reduced, on average, to 0.54 Mb, which covers on average 6 genes on the mouse genome. Characterization of such a small number of genes in each of these QTLs can greatly accelerate the discovery of genes underlying complex traits. We further performed a detailed survey of QTLs for all obesity-related phenotypes such as body weight, fat mass, lean mass and leptin levels. Of the QTLs detected in the present study, we identified 10 candidate genes within these QTLs that, when mutated or overexpressed as transgenes in the mouse, result in phenotypes affecting body weight and adiposity. These genes include Gpr39 (G protein-coupled receptor 39), Crhr2 (corticotropin releasing hormone receptor 2), Cdkn1b (cyclin-dependent kinase inhibitor 1B), Adam12 (ADAM metallopeptidase domain 12), Rasgrf1 (RAS protein-specific guanine nucleotide-releasing factor 1), Chrm3 (cholinergic receptor, muscarinic 3), Mapk8 (mitogen activated protein kinase 8), Wnt10b (wingless related MMTV integration site 10b), Dusp1 (dual specificity phosphatase 1) and Cdh2 (cadherin 2) (Table 2). For example, an obesity QTL on the proximal chromosome 13 has been replicated in at least four linkage studies –, and has not yet been narrowed down. Our GWA scans identified a refined QTL (8.0–10.0 Mb) covering the gene Chrm3 to be associated with four obesity phenotypes in three independent datasets (-log(P)=6.7–8.6) (Figure 4A). Chrm3 is a member of the muscarinic acetylcholine receptors that play critical roles in regulating the activity of many important functions of the central and peripheral nervous system . Chrm3 knockout mice had a significant reduction in food intake, body weight (25%), and peripheral fat deposits. The lack of Chrm3 also led to pronounced reductions (~5–10-fold) in serum leptin and insulin levels .
Similarly, we performed a survey of QTLs for plasma lipid levels which are correlated with the risk of atherosclerotic coronary heart disease . We identified 11 candidate genes in the refined regions of QTLs affecting levels of high density lipoprotein (HDL) cholesterol, low density lipoprotein (LDL) cholesterol, and triglycerides. These genes are Gpr39 (G protein-coupled receptor 39), Apoa2 (apolipoprotein A-II), Cd36 (CD36 antigen), Cckar (cholecystokinin A receptor), Npy (neuropeptide Y), Pparg (peroxisome proliferator activated receptor gamma), Ccr2 (chemokine (C-C motif) receptor 2), Acaca (acetyl-Coenzyme A carboxylase alpha), Apob (apolipoprotein B), Soat2 (sterol O-acyltransferase 2) and Lipg (endothelial lipase) (Table 3). All of these genes were observed to have strong involvement in plasma lipid phenotypes in murine knockout and/or transgenic models. Notably, we identified a QTL containing Apoa2 on chromosome 1 located at 173.0–173.9 Mb, which affects the percent of total plasma cholesterol in the HDL fraction after 8 weeks on an atherogenic diet (-log(P)=7.9) (Figure 4B). Apoa2 is located within a strong HDL QTL, hdlq5, which has been repeatedly detected on distal chromosome 1 by linkage analyses in 12 different genetic crosses . Homozygous Apoa2 knockout mice had 67% and 52% reductions in HDL cholesterol levels in the fasted and fed states, respectively, and HDL particle size was also reduced .
GWA scans have the potential to achieve single-gene resolution, depending on local LD patterns and gene density on the mouse genome. A closer view of the above 10 obesity candidate-gene regions (Table 2) further identified two QTLs that were mapped with single-gene resolution. One QTL significantly affects the percent of fat mass after 8 weeks on an atherogenic diet (-log(P)=6.5) and is located on chromosome 7 spanning less than 100 kb where Adam12 is the only candidate gene (Figure 5A). Meltrin alpha (Adam12) is a metalloprotease-disintegrin that belongs to the ADAM (for “a disintegrin and metalloprotease”) family. Compared with wild-type mice, meltrin alpha(-/-) mice displayed moderate resistance to weight gain induced by a high-fat diet, mainly because of an increase in the number of adipocytes . Another QTL is significantly associated with mouse liver weight after 8 weeks on an atherogenic diet (-log(P)=8.2) and is located on chromosome 18, spanning less than 100 kb, where Cdh2 is the only candidate gene (Figure 5B). N-cadherin (Cdh2) is a classical cadherin from the cadherin superfamily. Targeted expression of a dominant-negative N-cadherin in vivo delays peak bone mass and increases adipogenesis when compared with wild-type littermates . Since Adam12 and Cdh2 are the only candadiate gene in each of their QTL regions and further supported by the evidence from knockout and/or transgenic mice, we can safely establish these two QTL genes as casual genetic variants for atherogenic diet-induced obesity in natural inbred mouse populations. It should be noted that evidence solely from knockout or transgenic mice is not sufficient to infer causal relationship between a particular gene and disease in natural populations since the gene that was knocked out or overexpressed in mice may be only one of the several genes involved in the same biological pathway leading to the disease, rather than the true gene that is mutated in natural populations. Only when genetic mapping data, especially from high-resolution QTL analyses that identify allelic variants associated with the disease, are incorporated with murine knockout or transgenic models can the casual relationship be firmly established.
Among these 21 candidate genes (Table 2 and and3),3), Gpr39 has knockout phenotypes on both obesity and plasma lipid levels. Our GWA scans identified a QTL on chromosome 1 affecting percent fat mass on atherogenic diet (-log(P)=6.1). This QTL colocalizes with another QTL identified by our GWA scans, which affects total cholesterol and non-HDL cholesterol levels, and fold change in total cholesterol levels after 17 weeks on atherogenic diet (-log(P)=6.6–9.9) (Figure 6). The genomic locations of these two overlapped QTLs are indistinguishable which are very likely controlled by the same gene with pleiotropic effects on obesity and plasma lipid levels. The critical region of this pleiotropic QTL spans about 0.7 Mb (127.2–127.9 Mb) and covers six annotated genes based on the latest NCBI mouse genome build 36.1, including Actr3, Slc35f5, Gpr39, LOC667043, Lypd1 and E030049G20Rik. Gpr39 is a member of a family that includes the receptors for ghrelin and motilin . Mature body weight and percent fat mass were increased by 38% and 5%, respectively, in Gpr39 loss-of-function (Gpr39−/−) mice; cholesterol levels were increased by 11% and 42% in male and female Gpr39−/− mice, respectively . This establishes Gpr39 as a candidate for this pleiotropic QTL affecting both obesity and plasma lipid levels, even in light of the fact of recent observations clouding its role –. Nevertheless, these remarkable phenotypic differences between Gpr39−/− and wild-type mice mentioned above were only observed at old age (about 50 weeks for obesity and 25 weeks for cholesterol) , suggesting that deleterious mutations in Gpr39 may increase the risk for late-onset obesity and atherosclerotic coronary heart disease.
We have demonstrated the robustness and reproducibility of GWA analysis by comparing QTLs derived from GWA with those derived from previous linkage analyses in inbred mice. In a recent GWA analysis of lung tumor incidence, we reproduced the pulmonary adenoma susceptibility 1 (Pas1) locus identified in previous linkage studies and further narrowed this QTL to a region of less than 0.5 Mb in which at least two genes, Kras2 and Casc1, are strong candidates . The refined region is completely coincident with that from traditional fine mapping by using congenic strains of mice . Casc1 knockout mouse tumor bioassays further confirmed Casc1 as a primary candidate for the Pas1 locus . In the present study, we again reproduced and narrowed the Bwtq1 locus to a region of less than 1.0 Mb where Pappa2 is a primary candidate. The refined Bwtq1 region is identical to that achieved by developing a series of congenic mice and interval-specific subcongenic mice , . These results demonstrate that GWA scans using dense SNP maps in laboratory mice are a powerful tool for the refinement of previously identified QTL regions. These results can be also served as an internal control for the methods that we used in our GWA scans. Given that known refined QTLs were exactly reproduced in the analyses, the newly refined QTLs identified by our GWA scans are expected to be a useful resource for further positional cloning and gene discovery in the mouse. It should be also noted that 937 QTLs identified in our study represent about one half of all QTLs derived from linkage analyses in mouse cross-breeding experiments in the past decades (http://www.informatics.jax.org).
Research tools such as genetrap mutations, transgenes, and standard knockout technology can be used to verify causal variants from highly-refined QTL regions immediately after GWA scans , . Using resources from murine gene-deficiency and/or transgenic models, we have identified 10 candidate genes affecting obesity-related phenotypes and 11 candidate genes affecting plasma cholesterol levels from a survey of several highly refined QTLs (Table 2 and and3).3). These QTL genes are most likely the causal genes based on the following evidence. First, the genomic regions where these genes are located are not only highly associated with the analyzed phenotypes in the GWA scans, but are also under the linkage peak from two or more independent linkage studies with crossing-breeding experiments , . Second, these candidates were identified from a highly refined genomic region, as narrow as 0.54 Mb (with an average of 6 genes per region), which is extremely more precise than those chosen from a much broader linkage region (about 20 Mb). Third, and more importantly, relevant phenotypes for these genes were observed in murine knockout and/or transgenic models. Notably, due to the single-gene resolution achieved by our GWA scans, we firmly establish two QTL genes, Adam12 and Cdh2, as causal variants for obesity in inbred mice. It is also worth noting that in the present study we only focused on 28 obesity and cholesterol QTLs out of 937 QTLs, which have available data from murine knockout and/or transgenic models (Table 2 and and3).3). Other QTL genes identified in this study are now ready to be evaluated for their functional relevance to the analyzed mouse phenotypes. We have summarized this extremely valuable QTL resource in online Table S1 and S2; this resource can greatly facilitate positional cloning and the identification of new genetic determinants of complex traits.
Finally, several caveats for our findings should be mentioned. Firstly, selection and inbreeding play important roles in the formation of the genetic architecture of mouse inbred strains. Potential population structure may arise in these mouse strains, which inflates type I error rates and may lead to spurious associatons in the analysis . Wild-derived inbred mice have divergent evolutionary histories and thus have strong potential to generate spurious assocations. There are systematic differences in obesity-related trait values between wild-derived and other inbred strains. Therefore, the population structure was carefully inspected in our association analysis, and we removed wild-derived inbred strains from the analysis when population structure was detected. Spurious associations were largely reduced in the analysis after the removal of wild-derived inbred mouse strains (Figure S7 and Text S1). Several methods have been developed that adjust for the effect of population structure on spurious assocations –, but these may not be sufficient for the analysis of a small number of mouse inbred strains (~30–40 strains) . Secondly, we used permutations to establish stringent genome-wide thresholds for declaring significant association for each phenotype. Although most genomic regions show correct type I error rates, some regions may show increased error rates due to unequal relatedness among the strains . The risk of these spurious associations can be alleviated when incorporating prior linkage evidence into the analysis. The identified QTL genes located within previous linkages that have been replicated in two or more independent studies should be prioritized for further investigation. The associated QTL alleles need to be checked to see if they are segregated in linkage mapping populations. We have demonstrated this strategy in a recent study in which the positional cloning of a novel QTL gene, from a previous linkage-defined region, was done immediately after a GWA analysis . Thirdly, we estimated power of association analysis using classical inbred mouse strains through simulation studies (Figure S8 and Text S1). GWA scans have reasonable power to detect QTL genes with major effects, while are limited to detect QTL genes with moderate effects. However, focusing association analysis on linkage-defined regions can dramatically increase statistical power and has sufficient power to detect moderate-effect QTL genes. In our results, the great majority of the QTLs identified in the GWA scans overlapped with previous linkage-defined regions from mouse crosses which were retrieved from the Mouse Genome Informatics (MGI) at http://www.informatics.jax.org. Nevertheless, association results for small-effect QTLs from non-linkage regions should be interpreted with caution. It should also be noted that statistical power of association analysis in inbred mice can be significantly improved by increasing the number of mice per strain used in phenotype measurement. Fourthly, most SNPs were discovered by comparison of the genomes of several classical inbred laboratory mouse strains (such as C57BL/6J, DBA/2J, A/J and 129S1/SvImJ). Very few SNPs show polymorphisms among wild-derived strains. The ascertainment bias of SNPs will affect population inference including linkage disequilibrium and population structure. This ascertainment bias will also likely erode the power of tests of association between genotype and phenotype. However it is unlikely that the ascertainment bias will introduce false-positive inferences .
Mouse phenotypes were generated by the mouse PHENOME projects (http://phenome.jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). In most PHENOME projects, each phenotype was measured for about 40 strains from a total of 59 inbred strains of mice. The strains used for phenotyping varied slightly among different projects. Generally, at least 10 males and 10 females from each strain were measured for each phenotype when 10–14 weeks of age. Before statistical analysis was conducted, the phenotypic data were subjected to detection of outliers using box plot (http://www.itl.nist.gov/div898/handbook/) (Text S1). The SNP data were obtained from the Wellcome Trust Centre for Human Genetics (WTCHG) at Oxford University (http://www.well.ox.ac.uk/mouse/INBREDS) and the Broad Institute of Harvard and MIT (http://www.broad.mit.edu/personal/claire/MouseHapMap/Inbred.htm). The genomic positions (in bp) of the SNPs for the two data sets were unified based on the latest NCBI mouse genome map build 36.1. These SNPs were further selected by removing SNPs with fewer than 20 strains typed or without genetic mapping information. The resulting SNP data include 148,062 SNPs spanning the mouse genome at an average density of ~18 kb per SNP. The SNP genotyping accuracy reported by the WTCHG and the Broad Institute is over 99.8%. It is worth noting that several strains used in the analysis only have available WTCHG SNP data. Thirty-seven additional SNPs chosen from Perlegen Sciences were genotyped to fill gaps in QTL regions covering Adam12 and Cdh2.
The commonly used pairwise LD measure, r2, which is the squared correlation coefficient of alleles at two loci , was calculated for pairwise SNPs on the mouse genome. To capture large-scale patterns of LD, we measured the number of proxies, i.e. SNPs showing a strong correlation with one or more others , with a window size of 500 kb across the genome.
A full description of statistical methods for GWA scans is provided in Text S1. Briefly, the associations of a mouse phenotype with SNP markers on the genome were tested by using linear regression models. To maximize power in GWA scans, we examined several competitive models with different variables based on Bayesian information criteria. The log-likelihood-ratio test statistic for the existence of a QTL is calculated by comparing the likelihood values under full model with null model. In the results, the negative 10-base logarithmic p value from x2 tests, i.e. –log(P), was presented. Population structure was investigated by the cumulative distribution of p values in genome-wide association analysis for each phenotype . We removed wild-derived inbred strains in the association analysis when population structure was detected.
In the GWA scans, 1,000 permutations were used to establish a genome-wide threshold (a global p value of 0.05) for declaring significant associations for each phenotype. Specifically, the analyzed phenotype was randomly reshuffled among subjects while fixing the genotypes. For each of the 1,000 permutations, the GWA scan was repeated, and the most significant –log(P) was recorded. Sorting the maximum –log(P) from large to small, the 5% quantile of the empirical distribution was taken as the genome-wide threshold (a global p value of 0.05). Since LD extends a very short genomic region (normally less than a few hundred kb) on the mouse genome, once a significant SNP is identified, a causal variant responsible for the QTL should be close to the identified SNP. To define approximate QTL regions, we first identified significant SNPs with an established empirical threshold. If two significant SNPs were less than 1,000 kb from each other, they were treated as one association signal. Then these genomic regions were extended to cover closely linked SNPs that showed suggestive associations with the analyzed traits. We also reported p values, physical positions and genomic domains for each significant SNP. The identified QTLs were compared with those detected by previous linkage analyses from mouse crosses which were retrieved from Mouse Genome Informatics (MGI) at http://www.informatics.jax.org.
To avoid the violation of assumption of normal distribution in the above statistical analysis, 173 quantitative phenotypes were checked for their normality by Shapiro-Wilk tests  before GWA scans. Phenotypes with Shapiro-Wilk tests P<0.05 were converted to approach normality by Box-Cox transformation or normal scores , . For those phenotypes still showing marked departure from the normal distribution (p<0.01) after data transformation, the Mann-Whitney rank tests were performed in the GWA scans .
The heritability of inbred strains can be estimated by linear regression models with sex, strain and sex-strain interaction as independent variables. Each strain was treated as a categorical variable in the linear model analyses. The difference between the adjusted R2 of the regression models with and without strain and its associated sex-strain interaction is approximately equal to the percentage of genetic variation (that is, heritability) in inbred mouse populations. Similarly, the phenotypic variation due to sex differences can also be estimated.
The phylogenetic tree was constructed with 148,062 SNPs from 59 inbred mouse strains, implemented in the dnadist program in the PHYLIP 3.66 package (http://evolution.genetics.washington.edu/phylip.html). Branch length information was used to plot evolutionary distance between strains in the drawtree program in the PHYLIP 3.66 package.
Supporting methods: A detailed description of methods and materials.
(0.11 MB DOC)
173 mouse quantitative traits analyzed in genome-wide association analyses. The table gives empirical genome-wide threshold, the number of QTLs and significant SNPs detected in GWA scans for each trait. The table also lists project names, mouse phenome database (MPD) accession numbers, short and full name of traits, and categories of traits.
(0.36 MB DOC)
Summary of 937 QTL identified in genome-wide association analyses. The table gives the chromosome, position of QTL peak (bp, based on the NCBI genome build 36), QTL interval, and -log (P) for each QTL identified in GWA scans. The table also provides the SNP ID of the most significant SNP and its dbSNP's annotation if available, and total number of significant and suggestive SNPs for each QTL.
(1.99 MB DOC)
Distribution of SNP positions and LD structure across the mouse genome. For each chromosome, the top panel shows number of SNPs in a window size of 500 kb. The bottom panel shows number of proxies (red lines for r^2>0.5 and blue lines for r^2>0.8) per SNP in a window size of 500 kb.
(0.41 MB PDF)
Characteristics of the mouse SNP map. (A) Distribution of SNP allele frequency. (B) Distribution of inter-SNP distances. (C) LD decay as a function of physical distance. (D) Number of proxies per SNP in a window size of 500 kb, as a function of the threshold for correlation (r^2).
(0.90 MB PDF)
Phylogenetic tree of 59 inbred mouse strains. The phylogenetic tree was constructed with a total of 148,062 SNPs from the WTCHG and Broad Institute, implemented in the dnadist program in the PHYLIP 3.66 package (http://evolution.genetics.washington.edu/phylip.html). The branch length information was used to plot evolutionary distance between strains in the drawtree program. 59 inbred mouse strains are organized into seven groups: Bagg albino derivatives, C57-related strains, Castle's mice, Japanese and New Zealand inbred strains, Little's DBA and related strains, Swiss mice, and wild-derived strains.
(1.41 MB PDF)
Characteristics of 173 complex quantitative traits in inbred mice. (A) Heritability of quantitative traits. (B) Phenotypic variation due to sex effects.
(0.70 MB PDF)
An in silico strategy for high-throughput gene discovery in inbred mice. GWA scans were implemented in an automatic processing pipeline which constitutes data retrieving, outlier detection, data preprocessing, hypothesis testing, permutations and QTL identification.
(0.21 MB PDF)
Genome-wide association analysis of several obesity-related phenotypes in inbred mice. The obesity-relaed phenotypes are (A) body weight at the start of testing (8 weeks) (Tordoff3_bw_start); (B) calculated weight of lean tissue (14 weeks) (Tordoff3_lean_wt); (C) body weight after 8 weeks on an atherogenic diet (Naggert1_bw_fat8); (D) total tissue mass after 8 weeks on an atherogenic diet (Naggert1_tissuemass_fat8); (E) weight of lean portion of tissue mass after 8 weeks on an atherogenic diet (Naggert1_leanwt_fat8); (F) bone mineral content after 8 weeks on an atherogenic diet (Naggert1_BMC_fat8); (G) initial body weight (7–9 weeks), day 0 of an atherogenic diet (Paigen1_initbw); and (H) final body weight after 8 weeks on an atherogenic diet (Paigen1_finalbw). In Paigen1 and Naggert1 projects, mice at 7–9 weeks of age were weighed and then administered a high fat, high cholesterol atherogenic diet. The scatter plots were drawn for -log (P) against the SNP position in the chromosomes. The horizontal gray lines indicate genome-wide empirical thresholds (global p value=0.05). The horizontal coordinates were plotted using physical distance (Mb) in each panel.
(7.96 MB PDF)
Potential population structure in inbred mice. Body weight after 8 weeks on altherogenic diet (Naggert1_bw_fat8) was used to illustrate the population structure in inbred mice. (A) Cummlative distribution of p values from GWA analysis of samples with (red line) and without (blue line) wild-derived inbred strains. cdf, cumulative distribution function. (B) Comparison of GWA analysis of samples with (upper, red) and without (lower, blue) wild-derived inbred strains. The two green horizontal lines are genome-wide thresholds (a global p value of 0.05). Spurious assocaitons were largely reduced in the analysis after the removal of wild-derived inbred mouse strains.
(1.78 MB PDF)
Power of association analysis in inbred mice. Power was estimated for different trait heritabilities under genome-wide and QTL-wide thresholds (P=0.05). The genome-wide threshold is used to declare a significant association on the genome without any prior genetic evidence; while the QTL-wide threshold is used to declare a significant association on a privious linkage-defined region.
(0.48 MB PDF)
We thank the Jackson Laboratory and investigators (a detailed list of investigators of the mouse PHENOME projects is provided in the Text S1) for collecting and phenotyping inbred mouse strain characteristics. We thank the Broad Institute of Harvard/Massachusetts General Hospital and MIT, the Wellcome Trust Center for Human Genetics at Oxford University, and Perlegen Sciences for releasing inbred laboratory mouse SNP data. The mouse phenotype and SNP data were crucial for the key findings described in this paper.
Competing Interests: The authors have declared that no competing interests exist.
Funding: Our research was supported by grants from the US National Institutes of Health (CA099187, CA099147, ES012063 and ES013340).