Genetic mapping studies in the mouse and other model organisms are used to search for genes underlying complex phenotypes. Traditional genetic mapping studies that employ single-generation crosses have poor mapping resolution and limit discovery to loci that are polymorphic between the two parental strains. Multiparent outbreeding populations address these shortcomings by increasing the density of recombination events and introducing allelic variants from multiple founder strains. However, multiparent crosses present new analytical challenges and require specialized software to take full advantage of these benefits. Each animal in an outbreeding population is genetically unique and must be genotyped using a high-density marker set; regression models for mapping must accommodate multiple founder alleles, and complex breeding designs give rise to polygenic covariance among related animals that must be accounted for in mapping analysis. The Diversity Outbred (DO) mice combine the genetic diversity of eight founder strains in a multigenerational breeding design that has been maintained for >16 generations. The large population size and randomized mating ensure the long-term genetic stability of this population. We present a complete analytical pipeline for genetic mapping in DO mice, including algorithms for probabilistic reconstruction of founder haplotypes from genotyping array intensity data, and mapping methods that accommodate multiple founder haplotypes and account for relatedness among animals. Power analysis suggests that studies with as few as 200 DO mice can detect loci with large effects, but loci that account for <5% of trait variance may require a sample size of up to 1000 animals. The methods described here are implemented in the freely available R package DOQTL.
diversity outbred; haplotype reconstruction; quantitative trait locus mapping; Multiparent Advanced Generation Inter-Cross (MAGIC); multiparental populations; MPP
A general Bayesian model, Diploffect, is described for estimating the effects of founder haplotypes at quantitative trait loci (QTL) detected in multiparental genetic populations; such populations include the Collaborative Cross (CC), Heterogeneous Stocks (HS), and many others for which local genetic variation is well described by an underlying, usually probabilistically inferred, haplotype mosaic. Our aim is to provide a framework for coherent estimation of haplotype and diplotype (haplotype pair) effects that takes into account the following: uncertainty in haplotype composition for each individual; uncertainty arising from small sample sizes and infrequently observed haplotype combinations; possible effects of dominance (for noninbred subjects); genetic background; and that provides a means to incorporate data that may be incomplete or has a hierarchical structure. Using the results of a probabilistic haplotype reconstruction as prior information, we obtain posterior distributions at the QTL for both haplotype effects and haplotype composition. Two alternative computational approaches are supplied: a Markov chain Monte Carlo sampler and a procedure based on importance sampling of integrated nested Laplace approximations. Using simulations of QTL in the incipient CC (pre-CC) and Northport HS populations, we compare the accuracy of Diploffect, approximations to it, and more commonly used approaches based on Haley–Knott regression, describing trade-offs between these methods. We also estimate effects for three QTL previously identified in those populations, obtaining posterior intervals that describe how the phenotype might be affected by diplotype substitutions at the modeled locus.
QTL mapping; Collaborative Cross; haplotype effects; multiparent lines; Multiparent Advanced Generation Inter-Cross (MAGIC); multiparental populations; MPP; mixed models; genetic architecture; heterogeneous stocks
The Collaborative Cross (CC) is an emerging panel of recombinant inbred
mouse strains. Each strain is genetically distinct but all descended from the
same eight inbred founders. In 66 strains from incipient lines of the CC
(pre-CC), as well as the 8 CC founders and some of their F1 offspring, we
examined subsets of lymphocytes and antigen-presenting cells. We found
significant variation among the founders, with even greater diversity in the
pre-CC. Genome-wide association using inferred haplotypes detected highly
significant loci controlling B-to-T cell ratio, CD8 T-cell numbers, CD11c and
CD23 expression. Comparison of overall strain effects in the CC founders with
strain effects at QTL in the pre-CC revealed sharp contrasts in the genetic
architecture of two traits with significant loci: variation in CD23 can be
explained largely by additive genetics at one locus, whereas variation in B-to-T
ratio has a more complex etiology. For CD23, we found a strong QTL whose
confidence interval contained the CD23 structural gene Fcer2a.
Our data on the pre-CC demonstrate the utility of the CC for studying
immunophenotypes and the value of integrating founder, CC, and F1 data. The
extreme immunophenotypes observed could have pleiotropic effects in other CC
Collaborative Cross; FcεR; QTL
Obesity in human populations, currently a serious health concern, is considered to be the consequence of an energy imbalance in which more energy in calories is consumed than is expended. We used interval mapping techniques to investigate the genetic basis of a number of energy balance traits in an F11 advanced intercross population of mice created from an original intercross of lines selected for increased and decreased heat loss. We uncovered a total of 137 quantitative trait loci (QTLs) for these traits at 41 unique sites on 18 of the 20 chromosomes in the mouse genome, with X-linked QTLs being most prevalent. Two QTLs were found for the selection target of heat loss, one on distal chromosome 1 and another on proximal chromosome 2. The number of QTLs affecting the various traits generally was consistent with previous estimates of heritabilities in the same population, with the most found for two bone mineral traits and the least for feed intake and several body composition traits. QTLs were generally additive in their effects, and some, especially those affecting the body weight traits, were sex-specific. Pleiotropy was extensive within trait groups (body weights, adiposity and organ weight traits, bone traits) and especially between body composition traits adjusted and not adjusted for body weight at sacrifice. Nine QTLs were found for one or more of the adiposity traits, five of which appeared to be unique. The confidence intervals among all QTLs averaged 13.3 Mb, much smaller than usually observed in an F2 cross, and in some cases this allowed us to make reasonable inferences about candidate genes underlying these QTLs. This study combined QTL mapping with genetic parameter analysis in a large segregating population, and has advanced our understanding of the genetic architecture of complex traits related to obesity.
QTL by sex interactions; Metabolic rate; Feed intake; Body weight and body composition
Haloperidol is an efficacious antipsychotic drug that has serious, unpredictable motor side effects that limit its utility and cause noncompliance in many patients. Using a drug–placebo diallel of the eight founder strains of the Collaborative Cross and their F1 hybrids, we characterized aggregate effects of genetics, sex, parent of origin, and their combinations on haloperidol response. Treating matched pairs of both sexes with drug or placebo, we measured changes in the following: open field activity, inclined screen rigidity, orofacial movements, prepulse inhibition of the acoustic startle response, plasma and brain drug level measurements, and body weight. To understand the genetic architecture of haloperidol response we introduce new statistical methodology linking heritable variation with causal effect of drug treatment. Our new estimators, “difference of models” and “multiple-impute matched pairs”, are motivated by the Neyman–Rubin potential outcomes framework and extend our existing Bayesian hierarchical model for the diallel (Lenarcic et al. 2012). Drug-induced rigidity after chronic treatment was affected by mainly additive genetics and parent-of-origin effects (accounting for 28% and 14.8% of the variance), with NZO/HlLtJ and 129S1/SvImJ contributions tending to increase this side effect. Locomotor activity after acute treatment, by contrast, was more affected by strain-specific inbreeding (12.8%). In addition to drug response phenotypes, we examined diallel effects on behavior before treatment and found not only effects of additive genetics (10.2–53.2%) but also strong effects of epistasis (10.64–25.2%). In particular: prepulse inhibition showed additivity and epistasis in about equal proportions (26.1% and 23.7%); there was evidence of nonreciprocal epistasis in pretreatment activity and rigidity; and we estimated a range of effects on body weight that replicate those found in our previous work.
Our results provide the first quantitative description of the genetic architecture of haloperidol response in mice and indicate that additive, dominance-like inbreeding and parent-of-origin effects contribute strongly to treatment effect heterogeneity for this drug.
diallel; MCMC; Collaborative Cross; inbred strains; haloperidol; causal modeling; treatment effect heterogeneity; pharmacogenetics
X chromosome inactivation (XCI) is the mammalian mechanism of dosage compensation that balances X-linked gene expression between the sexes. Early during female development, each cell of the embryo proper independently inactivates one of its two parental X-chromosomes. In mice, the choice of which X chromosome is inactivated is affected by the genotype of a cis-acting locus, the X-chromosome controlling element (Xce). Xce has been localized to a 1.9 Mb interval within the X-inactivation center (Xic), yet its molecular identity and mechanism of action remain unknown. We combined genotype and sequence data for mouse stocks with detailed phenotyping of ten inbred strains and with the development of a statistical model that incorporates phenotyping data from multiple sources to disentangle sources of XCI phenotypic variance in natural female populations on X inactivation. We have reduced the Xce candidate interval 10-fold to a 176 kb region located approximately 500 kb proximal to Xist. We propose that structural variation in this interval explains the presence of multiple functional Xce alleles in the genus Mus. We have identified a new allele, Xce^e, present in Mus musculus and a possible sixth functional allele in Mus spicilegus. We have also confirmed a parent-of-origin effect on X inactivation choice and provide evidence that maternal inheritance magnifies the skewing associated with strong Xce alleles. Based on the phylogenetic analysis of 155 laboratory strains and wild mice we conclude that Xce^a is either a derived allele that arose concurrently with the domestication of fancy mice but prior to the derivation of most classical inbred strains or a rare allele in the wild. Furthermore, we have found that despite the presence of multiple haplotypes in the wild, Mus musculus domesticus has only one functional Xce allele, Xce^b. Lastly, we conclude that each mouse taxon examined has a different functional Xce allele.
Although mammalian females have two X chromosomes in each cell, only one is functional, while gene expression from the other is silenced through a process called X chromosome inactivation. Little is known about the early stages of this process, including how one parental X chromosome is inactivated over the other on a cell-by-cell basis. It has been shown, however, that certain inbred mouse strains are functionally different at a locus that controls this choice, which provides an opportunity to identify the locus and determine its molecular mechanism. This has been the goal of many researchers over the past 40 years with incremental success. Here we took advantage of new mouse genotype and whole genome sequencing data to pinpoint the locus controlling choice. Our results identified a smaller region on the X chromosome that contains large duplicated sequences. We propose an explanation for multiple functional alleles in mouse and provide insight into the possible molecular mechanism of X chromosome inactivation choice. Our evolutionary analysis reveals why functional diversity at this locus appears to be common in laboratory mice and offers an explanation as to why we do not see this level of diversity in humans.
Genetic variation contributes to host responses and outcomes following infection by influenza A virus or other viral infections. Yet narrow windows of disease symptoms and confounding environmental factors have made it difficult to identify polymorphic genes that contribute to differential disease outcomes in human populations. Therefore, to control for these confounding environmental variables in a system that models the levels of genetic diversity found in outbred populations such as humans, we used incipient lines of the highly genetically diverse Collaborative Cross (CC) recombinant inbred (RI) panel (the pre-CC population) to study how genetic variation impacts influenza associated disease across a genetically diverse population. A wide range of variation in influenza disease related phenotypes including virus replication, virus-induced inflammation, and weight loss was observed. Many of the disease associated phenotypes were correlated, with viral replication and virus-induced inflammation being predictors of virus-induced weight loss. Despite these correlations, pre-CC mice with unique and novel disease phenotype combinations were observed. We also identified sets of transcripts (modules) that were correlated with aspects of disease. In order to identify how host genetic polymorphisms contribute to the observed variation in disease, we conducted quantitative trait loci (QTL) mapping. We identified several QTL contributing to specific aspects of the host response including virus-induced weight loss, titer, pulmonary edema, neutrophil recruitment to the airways, and transcriptional expression. Existing whole-genome sequence data was applied to identify high priority candidate genes within QTL regions. A key host response QTL was located at the site of the known anti-influenza Mx1 gene. 
We sequenced the coding regions of Mx1 in the eight CC founder strains, and identified a novel Mx1 allele that showed reduced ability to inhibit viral replication, while maintaining protection from weight loss.
Host responses to an infectious agent are highly variable across the human population; however, it is not entirely clear how various factors such as pathogen dose, demography, environment and host genetic polymorphisms contribute to variable host responses and infectious outcomes. In this study, a new in vivo experimental model was used that recapitulates many of the genetic characteristics of an outbred population, such as humans. By controlling viral dose, environment and demographic variables, we were able to focus on the role that host genetic variation plays in influenza virus infection. Both the range of disease phenotypes and the combinations of sets of disease phenotypes at 4 days post infection across this population exhibited a large amount of diversity, reminiscent of the variation seen across the human population. Multiple host genome regions were identified that contributed to different aspects of the host response to influenza infection. Taken together, these results emphasize the critical role of host genetics in the response to infectious diseases. Given the breadth of host responses seen within this population, several new models for unique host responses to infection were identified.
Susceptibility to inflammatory arthritis is determined by a complex set of environmental and genetic factors, but only a portion of the genetic effect can be explained. Conventional genome-wide screens of arthritis models using crosses between inbred mice have been hampered by the low resolution of results and by the restricted range of natural genetic variation sampled. We sought to address these limitations by performing a genome-wide screen for determinants of arthritis severity using a genetically heterogeneous cohort of mice.
Heterogeneous Stock (HS) mice derive from eight founder inbred strains by serial intercrossing (N>60), resulting in fine-grained genetic variation. With a cohort of 570 HS mice, we performed a genome-wide screen for determinants of severity in the K/BxN serum-transfer arthritis model.
We mapped regions on chromosomes 1, 2, 4, 6, 7 and 15 that contain QTLs influencing arthritis severity at a resolution of a few Mb. In several instances, these regions proved to contain 2 QTLs: the region on chromosome 2 includes the C5 fraction of complement known to be required for K/BxN arthritis, but also contained a second adjacent QTL, for which an intriguing candidate is Ptgs1 (Cox-1). Interesting candidates on Chr4 include the Padi gene family, encoding peptidyl-arginine-deiminases responsible for citrulline protein modification; suggestively, Padi2 and Padi4 RNA expression was correlated with arthritis severity in HS mice.
These results provide a broad overview of the genetic variation that controls the severity of K/BxN arthritis and suggest intriguing candidate genes for further study.
A number of recent works have introduced statistical methods for detecting genetic loci that affect phenotypic variability, which we refer to as variability-controlling quantitative trait loci (vQTL). These are genetic variants whose allelic state predicts how much phenotype values will vary about their expected means. Such loci are of great potential interest in both human and non-human genetic studies, one reason being that a detected vQTL could represent a previously undetected interaction with other genes or environmental factors. The simultaneous publication of these new methods in different journals has in many cases precluded opportunity for comparison. We survey some of these methods, the respective trade-offs they imply, and the connections between them. The methods fall into three main groups: classical non-parametric, fully parametric, and semi-parametric two-stage approximations. Choosing between alternatives involves balancing the need for robustness, flexibility, and speed. For each method, we identify important assumptions and limitations, including those of practical importance, such as their scope for including covariates and random effects. We show in simulations that both parametric methods and their semi-parametric approximations can give elevated false positive rates when they ignore mean-variance relationships intrinsic to the data generation process. We conclude that choice of method depends on the trait distribution, the need to include non-genetic covariates, and the population size and structure, coupled with a critical evaluation of how these fit with the assumptions of the statistical model.
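To make the idea of a variance-controlling locus concrete, the following minimal sketch computes a Brown–Forsythe statistic (the median-centered variant of Levene's test) across genotype classes. This is one widely used test for heterogeneity of variance, chosen here for illustration; that it stands in for the classical non-parametric group mentioned above is an assumption, and the function name and toy phenotype values are hypothetical.

```python
from statistics import mean, median

def brown_forsythe(groups):
    """F statistic of a one-way ANOVA on absolute deviations from group medians.

    groups: list of lists of phenotype values, one list per genotype class.
    A large value suggests the classes differ in spread, not just in mean.
    """
    # Absolute deviation of each observation from its own group's median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    n = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n - k)
    return between / within

# Toy data (made up): same mean, different spread.
tight = [9, 10, 11, 10, 9, 11]
wide = [4, 16, 10, 2, 18, 10]
# Same spread as `tight`, shifted mean: a mean effect only.
shifted = [y + 5 for y in tight]

print("different spread:", brown_forsythe([tight, wide]))
print("shifted mean only:", brown_forsythe([tight, shifted]))
```

The statistic responds to the difference in spread but not to the pure mean shift, which is the distinction between a vQTL and an ordinary mean-controlling QTL. Converting it to a P-value would require an F distribution with (k − 1, n − k) degrees of freedom, omitted here to keep the sketch dependency-free.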
The classic diallel takes a set of parents and produces offspring from all possible mating pairs. Phenotype values among the offspring can then be related back to their respective parentage. When the parents are diploid, sexed, and inbred, the diallel can characterize aggregate effects of genetic background on a phenotype, revealing effects of strain dosage, heterosis, parent of origin, epistasis, and sex-specific versions thereof. However, its analysis is traditionally intricate, unforgiving of unplanned missing information, and highly sensitive to imbalance, making the diallel unapproachable to many geneticists. Nonetheless, imbalanced and incomplete diallels arise frequently, albeit unintentionally, as by-products of larger-scale experiments that collect F1 data, for example, pilot studies or multiparent breeding efforts such as the Collaborative Cross or the Arabidopsis MAGIC lines. We present a general Bayesian model for analyzing diallel data on dioecious diploid inbred strains that cleanly decomposes the observed patterns of variation into biologically intuitive components, simultaneously models and accommodates outliers, and provides shrinkage estimates of effects that automatically incorporate uncertainty due to imbalance, missing data, and small sample size. We further present a model selection procedure for weighing evidence for or against the inclusion of those components in a predictive model. We evaluate our method through simulation and apply it to incomplete diallel data on the founders and F1's of the Collaborative Cross, robustly characterizing the genetic architecture of 48 phenotypes.
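As a small illustration of the design itself (not of the Bayesian model), the sketch below enumerates the cells of a full diallel for eight inbred parents and reports which cells are empty in an incomplete data set. The parent labels, the sample sizes, and the helper name `missing_cells` are all hypothetical.

```python
from itertools import product

# Placeholder labels for eight inbred parents (hypothetical, not real strain names).
parents = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]

# Full diallel: every ordered (dam, sire) pair, so reciprocal crosses are distinct cells.
cells = list(product(parents, repeat=2))
inbred_cells = [(dam, sire) for dam, sire in cells if dam == sire]
f1_cells = [(dam, sire) for dam, sire in cells if dam != sire]

def missing_cells(observed_counts):
    """Return the diallel cells that have no phenotyped offspring."""
    return [cell for cell in cells if observed_counts.get(cell, 0) == 0]

# Made-up sample sizes for an incomplete, imbalanced design.
observed = {cell: 4 for cell in cells}
observed[("P1", "P2")] = 0   # a cross that produced no offspring
observed[("P3", "P3")] = 1   # an under-sampled inbred cell

print(len(cells), len(inbred_cells), len(f1_cells))
print(missing_cells(observed))
```

Keeping (dam, sire) ordered is what allows parent-of-origin effects to be separated from strain dosage: the cross P1 × P2 and its reciprocal P2 × P1 carry the same nuclear genotypes but occupy different cells.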
Recent developments in high-density genotyping and statistical analysis methods that have enabled genome-wide association studies in humans can also be applied to outbred mouse populations. Increased recombination in outbred populations is expected to provide greater mapping resolution than traditional inbred line crosses, improving prospects for identifying the causal genes. We carried out genome-wide association mapping by using 288 mice from a commercially available outbred stock; NMRI mice were genotyped with a high-density single-nucleotide polymorphism array to map loci influencing high-density lipoprotein cholesterol, systolic blood pressure, triglyceride levels, glucose, and urinary albumin-to-creatinine ratios. We found significant associations (P < 10^−5) with high-density lipoprotein cholesterol and identified Apoa2 and Scarb1, both of which have been previously reported, as candidate genes for these associations. Additional suggestive associations (P < 10^−3) identified in this study were also concordant with published quantitative trait loci, suggesting that we are sampling from a limited pool of genetic diversity that has already been well characterized. These findings dampen our enthusiasm for currently available commercial outbred stocks as genetic mapping resources and highlight the need for new outbred populations with greater genetic diversity. Despite the lack of novel associations in the NMRI population, our analysis strategy illustrates the utility of methods that could be applied to genome-wide association studies in humans.
Mouse Genetic Resource
Traditional methods for detecting genes that affect complex diseases in humans or animal models, milk production in livestock, or other traits of interest, have asked whether variation in genotype produces a change in that trait’s average value. But focusing on differences in the mean ignores differences in variability about that mean. The robustness, or uniformity, of an individual’s character is not only of great practical importance in medical genetics and food production but is also of scientific and evolutionary interest (e.g., blood pressure in animal models of heart disease, litter size in pigs, flowering time in plants). We describe a method for detecting major genes controlling the phenotypic variance, referring to these as vQTL. Our method uses a double generalized linear model with linear predictors based on probabilities of line origin. We evaluate our method on simulated F2 and collaborative cross data, and on a real F2 intercross, demonstrating its accuracy and robustness to the presence of ordinary mean-controlling QTL. We also illustrate the connection between vQTL and QTL involved in epistasis, explaining how these concepts overlap. Our method can be applied to a wide range of commonly used experimental crosses and may be extended to genetic association more generally.
The high and low alcohol preferring (HAP1 and LAP1) mouse lines were selectively bred for differences in alcohol intake. The HAP1 and LAP1 mice are essentially noninbred lines that originated from the outbred colony of HS/Ibg mice, a heterogeneous stock developed from intercrossing 8 inbred strains of mice.
A total of 867 informative SNPs were genotyped in 989 HAP1 × LAP1 F2, 68 F1s, 14 parents (6 LAP1, 8 HAP1), as well as the 8 inbred strains of mice crossed to generate the HS/Ibg colony. Multipoint genome wide analyses were performed to simultaneously detect linked QTLs and also fine map these regions using the ancestral haplotypes.
QTL analysis detected significant evidence of association on 4 chromosomes: 1, 3, 5, and 9. The region on chromosome 9 was previously found linked in a subset of these F2 animals using a whole genome microsatellite screen.
We have detected strong evidence of association to multiple chromosomal regions in the mouse. Several of these regions include candidate genes previously associated with alcohol dependence in humans or other animal models.
Quantitative Trait Locus; Alcohol Consumption; Association
Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations; therefore, alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.
Most traits of economic and evolutionary interest vary quantitatively and have multiple genes affecting their expression. Dissecting the genetic basis of such traits is crucial for the improvement of crops and management of diseases. Here, we develop a new resource to identify genes underlying such quantitative traits in Arabidopsis thaliana, a genetic model organism in plants. We show that using a large population of inbred lines derived from intercrossing 19 parents, we can localize the genes underlying quantitative traits better than with existing methods. Using these lines, we were able to replicate the identification of previously known genes that affect developmental traits in A. thaliana and identify some new ones. This paper also presents all the biological and computational material necessary for the scientific community to use these lines in their own research. Our results suggest that the use of lines derived from a multiparent advanced generation inter-cross (MAGIC lines) should be very useful in other organisms.
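The idea of reconstructing each line's genome as a mosaic of the founders can be sketched with a small hidden Markov model solved by the forward-backward algorithm. This is a toy illustration under strong simplifying assumptions (fully inbred lines treated as haploid, biallelic 0/1 markers, one fixed switch probability between adjacent markers, and a fixed genotyping error rate); it is not the method used in the study, and the function name, parameters, and data are hypothetical.

```python
def founder_posteriors(founder_alleles, observed, switch=0.1, error=0.01):
    """Posterior probability of each founder at each marker for one inbred line.

    founder_alleles: dict mapping founder name -> list of 0/1 alleles per marker.
    observed: list of 0/1 alleles for the line, same marker order.
    switch: assumed probability of a recombination between adjacent markers.
    error: assumed probability that an observed allele mismatches its true founder.
    """
    founders = list(founder_alleles)
    k, m = len(founders), len(observed)

    def emit(f, t):
        return 1 - error if founder_alleles[f][t] == observed[t] else error

    def trans(f, g):
        # After a recombination the new founder is drawn uniformly at random.
        return (1 - switch) * (f == g) + switch / k

    # Forward pass, rescaled at each marker to avoid numerical underflow.
    fwd = []
    for t in range(m):
        if t == 0:
            cur = {g: emit(g, 0) / k for g in founders}
        else:
            cur = {g: emit(g, t) * sum(fwd[-1][f] * trans(f, g) for f in founders)
                   for g in founders}
        total = sum(cur.values())
        fwd.append({g: v / total for g, v in cur.items()})

    # Backward pass, also rescaled.
    bwd = [None] * m
    bwd[-1] = {f: 1.0 for f in founders}
    for t in range(m - 2, -1, -1):
        cur = {f: sum(trans(f, g) * emit(g, t + 1) * bwd[t + 1][g] for g in founders)
               for f in founders}
        total = sum(cur.values())
        bwd[t] = {f: v / total for f, v in cur.items()}

    # Combine and normalise to get per-marker posteriors.
    posteriors = []
    for t in range(m):
        w = {f: fwd[t][f] * bwd[t][f] for f in founders}
        total = sum(w.values())
        posteriors.append({f: v / total for f, v in w.items()})
    return posteriors

# Toy founders (made up). The line below copies A for four markers, then B.
toy_founders = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 0, 0, 0, 0],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
line = [0, 0, 0, 0, 0, 0, 0, 0]
post = founder_posteriors(toy_founders, line)
for t, p in enumerate(post):
    print(t, max(p, key=p.get), round(max(p.values()), 3))
```

The per-marker posteriors returned here are the kind of probabilistic founder assignment that a downstream regression on founder probabilities would consume. A realistic implementation would also scale the switch probability by genetic distance and, for outbred animals, model pairs of founders rather than single founders.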
Genetic and environmental factors have important roles in multiple sclerosis (MS) susceptibility. A clear parent of origin effect has been shown in several populations, perhaps resulting from factors operating during gestation. Preterm birth (birth at less than 37 weeks gestational age) has been shown to result in long-term health problems, including impaired neurological development. Here, in a population-based cohort, we investigate whether preterm birth increases the risk to subsequently develop MS.
We identified 6585 MS index cases and 2509 spousal controls with preterm birth information from the Canadian Collaborative Project on Genetic Susceptibility to MS. Rates of individuals born preterm were compared for index cases and controls.
There were no significant differences between cases and controls with respect to preterm births. 370 (5.6%) MS index cases and 130 (5.2%) spousal controls were born preterm, p = 0.41.
Preterm birth does not appear to contribute to MS aetiology. Other factors involved in foetal and early development need to be explored to elucidate the mechanism of the increased risk conferred by the apparent maternal effect.
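The reported comparison can be checked from the counts given above. The abstract does not say which test was used; the sketch below assumes an uncorrected Pearson chi-square test on the 2×2 table (that choice and the function name are assumptions), and uses only the standard library.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts taken from the abstract.
cases_preterm, cases_total = 370, 6585
controls_preterm, controls_total = 130, 2509

chi2, p = pearson_chi2_2x2(
    cases_preterm, cases_total - cases_preterm,
    controls_preterm, controls_total - controls_preterm,
)
print(f"cases: {cases_preterm / cases_total:.1%}, "
      f"controls: {controls_preterm / controls_total:.1%}")
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Under this assumption the proportions round to the reported 5.6% and 5.2%, and the P-value rounds to the reported 0.41. A continuity-corrected test would give a somewhat larger P-value.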
The development of the mammalian brain is dependent on extensive neuronal migration. Mutations in mice and humans that affect neuronal migration result in abnormal lamination of brain structures with associated behavioral deficits. Here, we report the identification of a hyperactive N-ethyl-N-nitrosourea (ENU)-induced mouse mutant with abnormalities in the laminar architecture of the hippocampus and cortex, accompanied by impaired neuronal migration. We show that the causative mutation lies in the guanosine triphosphate (GTP) binding pocket of α-1 tubulin (Tuba1) and affects tubulin heterodimer formation. Phenotypic similarity with existing mouse models of lissencephaly led us to screen a cohort of patients with developmental brain anomalies. We identified two patients with de novo mutations in TUBA3, the human homolog of Tuba1. This study demonstrates the utility of ENU mutagenesis in the mouse as a means to discover the basis of human neurodevelopmental disorders.
The molecular recognition and discrimination of adenine and guanine ligand moieties in complexes with proteins have been studied using empirical observations on carefully selected crystal structures. The distribution of protein folds that bind these purines has been found to differ significantly from that across the whole PDB, but the most populated architectures and folds are also the most common in three genomes from the three different domains of life. The protein environments around the two nucleic acid bases were significantly different, in terms of the propensities of amino acid residues to be in the binding site, as well as their propensities to form hydrogen bonds to the bases. Plots of the distribution of protein atoms around the two purines clearly show different clustering of hydrogen bond donors and acceptors opposite complementary acceptors and donors in the rings, with hydrophobic areas below and above the rings. However, the clustering pattern is fuzzy, reflecting the variety of ways that proteins have evolved to recognise the same molecular moiety. Furthermore, an analysis of the conservation of residues in the protein chains binding guanine shows that residues in contact with the base are in general better conserved than the rest of the chain.
Significance testing one SNP at a time has proven useful for identifying genomic regions that harbor variants affecting human disease. But after an initial genome scan has identified a “hit region” of association, single-locus approaches can falter. Local linkage disequilibrium (LD) can make both the number of underlying true signals and their identities ambiguous. Simultaneous modeling of multiple loci should help. However, it is typically applied ad hoc: conditioning on the top SNPs, with limited exploration of the model space and no assessment of how sensitive model choice was to sampling variability. Formal alternatives exist but are seldom used. Bayesian variable selection is coherent but requires specifying a full joint model, including priors on parameters and the model space. Penalized regression methods (e.g., LASSO) appear promising but require calibration, and, once calibrated, lead to a choice of SNPs that can be misleadingly decisive. We present a general method for characterizing uncertainty in model choice that is tailored to reprioritizing SNPs within a hit region under strong LD. Our method, LASSO local automatic regularization resample model averaging (LLARRMA), combines LASSO shrinkage with resample model averaging and multiple imputation, estimating for each SNP the probability that it would be included in a multi-SNP model in alternative realizations of the data. We apply LLARRMA to simulations based on case-control genome-wide association studies data, and find that when there are several causal loci and strong LD, LLARRMA identifies a set of candidates that is enriched for true signals relative to single locus analysis and to the recently proposed method of Stability Selection. Genet. Epidemiol. 36:451–462, 2012. © 2012 Wiley Periodicals, Inc.
GWAS; case-control; genotype imputation; model averaging; LASSO; Stability Selection