QTL mapping in rodents has been an important strategy for narrowing the genome to relatively small regions containing genes relevant to a phenotype of interest. To date, the majority of QTL have been identified using populations based on F2 crosses. However, these methods are time-consuming and expensive. Furthermore, only a small percentage of the QTL identified using F2 crosses have been mapped down to the causative gene or polymorphism, at least in part because of the comparatively large size of the QTL regions.
Here, we present a comparative analysis of methods that utilize the genetic and phenotypic diversity present in common laboratory inbred mouse strains. The potential advantages of this type of approach are two-fold. First, phenotype-specific mouse crosses are not required to generate the genetic and phenotypic diversity needed for initial QTL identification. Phenotype data still need to be measured on the panel of inbred mouse lines, but assuming an appropriate range of phenotype values exists, the rest of the association analysis can be performed in silico. Second, large-scale genotype data can be generated and combined in a phenotype-independent manner, making this approach amenable to collaborative efforts that will benefit the entire mouse genetics community. Furthermore, given the existing data sets that we and others have produced, our haplotype association mapping method allows association studies to be performed quickly over a number of available phenotypes (MPD, for example).
QTL mapping has also been performed using RI lines, which share the benefit of combining genotype data in community efforts. However, because commonly available RI lines are derived from only two parents, regions in which the parental strains are identical-by-descent (IBD) cannot be probed for QTL. For example, when comparing C57BL/6J and DBA/2J (the parents of the BXD RI panel), only 6292 loci have a different inferred haplotype. In contrast, the full panel of laboratory inbred mouse strains interrogates 11182 loci, even after filtering out loci with trivially small haplotype group sizes. In addition, QTL mapping with RI lines is currently constrained by the limited availability of specific crosses.
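The distinction between loci that differ between two RI parents and loci that are informative across a full panel can be illustrated with a small sketch. It assumes a hypothetical matrix `hap` of inferred haplotype labels (strains in rows, loci in columns); the function names and the `min_group_size` cutoff are illustrative assumptions, not the filter used in this study.

```python
import numpy as np

def informative_between(hap, i, j):
    """Count loci at which strains i and j carry different inferred haplotypes."""
    return int(np.sum(hap[i] != hap[j]))

def informative_in_panel(hap, min_group_size=2):
    """Count loci with at least two haplotype groups of size >= min_group_size.

    min_group_size is an assumed cutoff for "trivially small" groups.
    """
    count = 0
    for locus in hap.T:  # iterate over loci (columns of hap)
        _, sizes = np.unique(locus, return_counts=True)
        if np.sum(sizes >= min_group_size) >= 2:
            count += 1
    return count
```

Calling `informative_between(hap, i, j)` with the row indices of the two parental strains gives the number of loci a two-parent RI panel can probe, while `informative_in_panel(hap)` gives the corresponding count for the whole strain panel.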
Although the set of inbred mouse strains used in our analysis contains greater genotypic and phenotypic diversity than the currently available RI lines, the proposal by the Complex Trait Consortium to create an additional 1000 RI strains could serve as an even more powerful resource for genome-wide association algorithms [3]. Since these strains will be derived from crosses of eight parental strains, they will certainly represent genotypic and phenotypic diversity as broad as that of our panel of inbred strains. In addition, the controlled randomization of the genome will lead to a more controlled population structure than is currently found in the common laboratory inbred strains.
Here, we have explored variants of the association mapping algorithm we originally reported [9] using different test statistics and methods of calculating significance in currently available inbred strains. In addition, we have investigated the use of generalized FWER thresholds for setting genome-wide significance thresholds. Although the lack of a true gold standard prevents a definitive comparison between these methods, two general trends are observed that can likely be extrapolated to all association mapping in inbred strains. First, since the haplotype block structure in inbred strains is complex relative to RI or F2 populations, the use of multi-SNP windows to assign haplotype groups is more appropriate than simply using the genotype at a single locus. Second, because population structure is clearly evident in these inbred lines, methods to account for that structure must be incorporated into association mapping algorithms. Here, we utilize a modified F-statistic that factors into the calculation the average pairwise genetic similarity within a haplotype group.
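The ingredients named above (multi-SNP windows, an F-statistic on strain means, and a generalized FWER threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the window width, the minimum group size, and all function names are hypothetical; the F-statistic is the plain one-way ANOVA form, because the similarity weighting of the modified statistic is not specified here; and the threshold uses a simple permutation of strain means, which does not itself account for population structure.

```python
import numpy as np

def haplotype_groups(genotypes, start, width=3):
    """Label strains by their allele pattern over `width` adjacent SNPs.

    genotypes: 2-D array, strains in rows, SNPs in columns (e.g. 0/1 alleles).
    """
    window = genotypes[:, start:start + width]
    # Strains with identical rows in the window share a haplotype group.
    _, labels = np.unique(window, axis=0, return_inverse=True)
    return labels.reshape(-1)

def f_statistic(strain_means, labels, min_group_size=2):
    """Plain one-way ANOVA F on strain means (no similarity weighting)."""
    groups = [strain_means[labels == g] for g in np.unique(labels)]
    # Assumed filter: ignore haplotype groups that are trivially small.
    groups = [g for g in groups if len(g) >= min_group_size]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    if ss_within == 0:
        return 0.0 if ss_between == 0 else float("inf")
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

def gfwer_threshold(genotypes, strain_means, k=0, alpha=0.05,
                    n_perm=200, width=3, seed=0):
    """Permutation threshold controlling P(more than k false positives) <= alpha.

    For each permutation, the (k+1)-th largest genome-wide statistic is
    recorded; the threshold is the (1 - alpha) quantile of those values.
    k = 0 gives the usual FWER. Requires at least k + 1 windows.
    """
    rng = np.random.default_rng(seed)
    n_windows = genotypes.shape[1] - width + 1
    labels = [haplotype_groups(genotypes, s, width) for s in range(n_windows)]
    null = []
    for _ in range(n_perm):
        shuffled = rng.permutation(strain_means)
        stats = sorted((f_statistic(shuffled, lab) for lab in labels),
                       reverse=True)
        null.append(stats[k])
    return float(np.quantile(null, 1 - alpha))
```

In this sketch, a window would be called significant when its observed `f_statistic` exceeds the value returned by `gfwer_threshold`; raising `k` tolerates more false positives and therefore yields a threshold no higher than the one for `k = 0`.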
Despite the potential advantages of haplotype association mapping, the limitations in the experimental design relative to traditional cross-based QTL mapping must be acknowledged. As noted above, there is significant population structure in these inbred mice that is not present in either F2 or RI populations. This structure complicates the analysis, and in some cases prevents this strategy from being meaningfully applied to certain phenotypes. The association metric of our haplotype association method also uses a relatively simple ANOVA model (in comparison to more complex maximum likelihood estimation). Traditional linkage analyses base their estimates on regression models that incorporate individual animals, whereas our ANOVA methodology utilizes strain means. Further, the sizes of haplotype groups are small compared to the much larger number of individuals utilized in typical linkage studies. All of these factors can lead to a loss of power.
It is also important to note the strong dependence this haplotype association mapping method has on the inferred haplotype block structure in the mouse genome. While the existence of haplotype block structure is generally accepted, there is ongoing debate regarding the size of these blocks and the ability of haplotype association mapping methods to detect associations. More recent results of Frazer et al. and Yalcin et al. indicate that the haplotype structure of inbred mice may contain regions of complexity which prevent even dense SNP maps from detecting meaningful associations between genotype and phenotype. We have also encountered this complexity when investigating certain loci that contain known quantitative trait genes. In some cases, we have observed that higher SNP densities are necessary to detect the known loci, possibly indicating a more fragmented haplotype in this region. Clearly those who utilize these algorithms must be cognizant of the limitations of their SNP set, but as SNP density increases the effect of these issues will be mitigated.
Regardless of the relative strengths and weaknesses of haplotype association and traditional QTL mapping, these methods are intermediate steps in pursuit of the final goal: identification of a gene that directly affects the phenotype of interest. In this study, we have used HDLC levels as the primary phenotype to assess the performance of our algorithms. This phenotype was chosen because it has been extensively studied and many QTL have been previously identified. However, the list of QTL that influence HDLC levels is certainly not exhaustive, and in most cases the specific genes in the QTL regions have not been identified. While the ability of haplotype association methodologies to replicate loci identified by traditional QTL methods is encouraging, this comparison is not an ideal way to assess their specificity and sensitivity. Ultimately, a comprehensive comparison of these approaches may come only after the genetic basis for multiple complex traits has been exhaustively studied.