We have demonstrated the robustness and reproducibility of GWA analysis by comparing QTLs derived from GWA with those derived from previous linkage analyses in inbred mice. In a recent GWA analysis of lung tumor incidence, we reproduced the pulmonary adenoma susceptibility 1 (Pas1) locus identified in previous linkage studies and further narrowed this QTL to a region of less than 0.5 Mb in which at least two genes, Kras2 and Casc1, are strong candidates
[4]. The refined region coincides exactly with that obtained by traditional fine mapping using congenic strains of mice
[32]. Casc1 knockout mouse tumor bioassays further confirmed Casc1 as a primary candidate for the
Pas1 locus
[4]. In the present study, we again reproduced and narrowed the Bwtq1 locus to a region of less than 1.0 Mb where Pappa2 is a primary candidate. The refined Bwtq1 region is identical to that achieved by developing a series of congenic mice and interval-specific subcongenic mice
[13],
[14]. These results demonstrate that GWA scans using dense SNP maps in laboratory mice are a powerful tool for the refinement of previously identified QTL regions. These results also serve as an internal control for the methods that we used in our GWA scans. Given that known refined QTLs were exactly reproduced in the analyses, the newly refined QTLs identified by our GWA scans are expected to be a useful resource for further positional cloning and gene discovery in the mouse. It should also be noted that the 937 QTLs identified in our study represent about one half of all QTLs derived from linkage analyses in mouse cross-breeding experiments in the past decades (
http://www.informatics.jax.org).
Research tools such as gene-trap mutations, transgenes, and standard knockout technology can be used to verify causal variants from highly refined QTL regions immediately after GWA scans
[33],
[34]. Using resources from murine gene-deficiency and/or transgenic models, we have identified 10 candidate genes affecting obesity-related phenotypes and 11 candidate genes affecting plasma cholesterol levels from a survey of several highly refined QTLs ( and ). These QTL genes are most likely the causal genes based on the following evidence. First, the genomic regions where these genes are located are not only highly associated with the analyzed phenotypes in the GWA scans, but also lie under the linkage peak from two or more independent linkage studies based on cross-breeding experiments
[10],
[22]. Second, these candidates were identified from a highly refined genomic region, as narrow as 0.54 Mb (with an average of 6 genes per region), a far more precise interval than the much broader linkage regions (about 20 Mb) from which candidates are typically chosen. Third, and more importantly, relevant phenotypes for these genes were observed in murine knockout and/or transgenic models. Notably, due to the single-gene resolution achieved by our GWA scans, we firmly establish two QTL genes, Adam12 and Cdh2, as causal variants for obesity in inbred mice. It is also worth noting that in the present study we focused on only the 28 obesity and cholesterol QTLs, out of 937 QTLs, for which data from murine knockout and/or transgenic models are available ( and ). Other QTL genes identified in this study are now ready to be evaluated for their functional relevance to the analyzed mouse phenotypes. We have summarized this extremely valuable QTL resource in online
Table S1 and
S2; this resource can greatly facilitate positional cloning and the identification of new genetic determinants of complex traits.
Finally, several caveats for our findings should be mentioned. Firstly, selection and inbreeding play important roles in the formation of the genetic architecture of mouse inbred strains. Population structure may arise in these mouse strains, which inflates type I error rates and may lead to spurious associations in the analysis
[3]. Wild-derived inbred mice have divergent evolutionary histories and thus have strong potential to generate spurious associations. There are systematic differences in obesity-related trait values between wild-derived and other inbred strains. Therefore, we carefully inspected population structure in our association analysis and removed wild-derived inbred strains from the analysis when population structure was detected. Spurious associations were largely reduced after the removal of wild-derived inbred mouse strains (
Figure S7 and
Text S1). Several methods have been developed that adjust for the effect of population structure on spurious associations
[35]–
[38], but these may not be sufficient for the analysis of a small number of mouse inbred strains (~30–40 strains)
[39]. Secondly, we used permutations to establish stringent genome-wide thresholds for declaring significant association for each phenotype. Although most genomic regions show correct type I error rates, some regions may show increased error rates due to unequal relatedness among the strains
[40]. The risk of these spurious associations can be alleviated by incorporating prior linkage evidence into the analysis. Identified QTL genes located within previous linkage regions that have been replicated in two or more independent studies should be prioritized for further investigation. The associated QTL alleles need to be checked to see whether they segregate in the linkage mapping populations. We have demonstrated this strategy in a recent study in which the positional cloning of a novel QTL gene, from a previous linkage-defined region, was done immediately after a GWA analysis
[4]. Thirdly, we estimated the power of association analysis using classical inbred mouse strains through simulation studies (
Figure S8 and
Text S1). GWA scans have reasonable power to detect QTL genes with major effects, but limited power to detect QTL genes with moderate effects. However, focusing association analysis on linkage-defined regions can dramatically increase statistical power and provides sufficient power to detect moderate-effect QTL genes. In our results, the great majority of the QTLs identified in the GWA scans overlapped with previous linkage-defined regions from mouse crosses, which were retrieved from the Mouse Genome Informatics (MGI) database at
http://www.informatics.jax.org. Nevertheless, association results for small-effect QTLs from non-linkage regions should be interpreted with caution. It should also be noted that the statistical power of association analysis in inbred mice can be significantly improved by increasing the number of mice per strain used in phenotype measurement. Fourthly, most SNPs were discovered by comparison of the genomes of several classical inbred laboratory mouse strains (such as C57BL/6J, DBA/2J, A/J and 129S1/SvImJ). Very few SNPs show polymorphisms among wild-derived strains. The ascertainment bias of SNPs will affect population inference, including linkage disequilibrium and population structure. This ascertainment bias will also likely erode the power of tests of association between genotype and phenotype. However, it is unlikely that the ascertainment bias will introduce false-positive inferences
[41].
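The permutation-based genome-wide threshold mentioned in the second caveat can be illustrated with a minimal sketch. This is not the procedure used in the study, whose details are given in Text S1; it assumes biallelic SNPs coded 0/1 per strain, one phenotype value per strain, and a simple difference-in-means statistic, and the function names `assoc_stat`, `max_stat`, and `permutation_threshold` are hypothetical. Shuffling phenotypes across strains also assumes that strains are exchangeable, which is precisely the assumption that unequal relatedness can violate.

```python
import random
from statistics import mean


def assoc_stat(genotypes, phenotypes):
    """Absolute difference in mean phenotype between the two allele groups at one SNP."""
    group0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    group1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    if not group0 or not group1:
        return 0.0  # monomorphic SNP among these strains: no test possible
    return abs(mean(group1) - mean(group0))


def max_stat(snps, phenotypes):
    """Largest single-SNP statistic across the whole scan."""
    return max(assoc_stat(snp, phenotypes) for snp in snps)


def permutation_threshold(snps, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the permutation
    distribution of the maximum statistic over all SNPs."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        null_max.append(max_stat(snps, shuffled))
    null_max.sort()
    idx = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[idx]
```

An observed SNP statistic exceeding the returned value would be declared genome-wide significant at level `alpha` under this sketch. Taking the maximum over all SNPs within each permutation is what controls the error rate across the whole scan rather than per SNP.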