Admixture mapping is a rapidly developing method to map susceptibility alleles in complex genetic disease associated with continental ancestry. Theoretically, when admixture between continental populations has occurred relatively recently, the chromosomal segments derived from the parental populations can be deduced from the differences in genotype allele frequencies. Progress in computational algorithms, in identification of ancestry informative single nucleotide polymorphisms, and in recent studies applying these tools suggests that this approach will complement other strategies for identifying the variation that underlies many complex diseases.
The approach of using admixture for disease studies was first advanced to examine linkage disequilibrium (LD) between susceptibility genes and markers in a process termed ‘mapping by admixture disequilibrium’ (MALD) [1,2]. In brief, the LD created by admixture between alleles with large frequency variations in the different populations can facilitate the mapping of traits in admixed populations, if any disease susceptibility alleles or disease protective alleles are present in a sufficiently different frequency distribution in the parental populations.
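The dependence of admixture-generated LD on the frequency differences between parental populations can be illustrated with the standard two-locus result for a single pulse of admixture. The sketch below assumes no LD within either parental population, a single admixture event followed by random mating, and a recombination fraction `r` between the loci; the function name and parameters are illustrative and are not taken from any of the programs discussed in this review.

```python
def admixture_ld(m, p_a1, p_a2, p_b1, p_b2, r=0.0, generations=0):
    """Disequilibrium coefficient D between loci A and B in an admixed population.

    m            -- proportion of the admixed gene pool contributed by population 1
    p_a1, p_a2   -- frequency of the reference allele at locus A in populations 1 and 2
    p_b1, p_b2   -- frequency of the reference allele at locus B in populations 1 and 2
    r            -- recombination fraction between A and B (0.5 for unlinked loci)
    generations  -- generations of random mating since the admixture event

    Assumes no LD inside either parental population and a single admixture pulse.
    """
    # LD created at the moment of mixing: product of the two frequency differences
    d0 = m * (1.0 - m) * (p_a1 - p_a2) * (p_b1 - p_b2)
    # Recombination erodes the initial LD by a factor (1 - r) each generation
    return d0 * (1.0 - r) ** generations


# Toy values: large frequency differences at both loci, tight linkage
print(admixture_ld(0.2, 0.9, 0.1, 0.8, 0.2, r=0.01, generations=6))
```

Under these assumptions D vanishes when either locus has the same allele frequency in both parental populations, which is why only alleles with sufficiently different parental frequencies can be mapped in this way.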
Alternatively, admixture mapping can be viewed and implemented by examining the linkage between a trait and ancestry segments from one or another founding population. Instead of testing for allelic association, markers provide ancestry information and this information is used to examine the linkage with the trait. This concept, originally advanced by McKeigue, underlies the approach that my co-workers and I, and others, have taken to address the etiopathogenesis of complex genetic disease [4–7]. Here, I consider the situation where admixture has occurred between two continental populations. Although it is theoretically possible to perform these studies in more complex admixed populations, achieving statistical confidence in ancestry assignment becomes more difficult.
Modeling studies suggest that between 2000 and 5000 well-distributed ancestry informative markers (AIMs) distinguishing parental origins are sufficient for admixture mapping when two continental populations have admixed within the last 15 generations [4–6,8••,9,10]. This number contrasts with the 250 000 or more markers suggested to be necessary for association studies. Admixture mapping has, however, two disadvantages: it can map only disease-associated alleles that are present at different frequencies in the parental populations, and the extended regions of LD can hinder fine-scale mapping in the later stages of identifying the causative genetic variation underlying disease susceptibility. Admixture mapping has an advantage over general association studies in that it is not deterred by multiple independent mutational events, because only the ancestral identity of an allele is used in computations. General association studies have been criticized because of their decreased power in the presence of allelic heterogeneity, particularly because allelic heterogeneity is likely to be very common in complex genetic diseases.
Admixture mapping also has the potential to map genes that are not sufficiently polymorphic within a non-admixed population to be detected by either association or linkage studies. Some functional polymorphisms might be fixed or nearly fixed in one of the parental populations. For the cell-surface glycoprotein Duffy, for example, the null allele is fixed in the African population and a functional allele is fixed in the European population. This particular example is a result of positive selection and is thought to be due to Duffy’s role as a receptor for Plasmodium vivax. Furthermore, selection has been suggested to have been an important factor in shaping the differences between the major ancestral groups; in other words, many AIMs might have acquired their ethnic differences in allele frequencies owing to selection. This possibility is supported by recent studies [14,15••,16] and might enhance the possibility of successful admixture mapping for some diseases.
Although definitive studies demonstrating that admixture mapping is an effective strategy for identifying genetic factors in complex disease are still lacking, studies in African Americans for skin pigmentation, hypertension and multiple sclerosis have provided data suggesting the applicability of these methods [17–19,20•]. More recent studies have generated further enthusiasm for admixture mapping. Reich’s group [21••] has reported mapping a strong susceptibility locus for early onset prostate cancer in African Americans. In addition, studies led by Ziv report the ability of the admixture mapping approach to identify an amino acid variant linked to African ancestry that determines the levels of soluble interleukin-6 receptor. This work, presented at the 2006 annual meeting of the American Society of Human Genetics, might provide the best proof of concept for the admixture mapping approach (http://www.ashg.org).
In the American continents, admixture has occurred between continental groups of European, Amerindian and African ancestry. Different ethnic groups – for example, Mestizo Mexicans, African Americans and Puerto Ricans – have different allelic contributions from the ‘parental’ populations [22•]. In addition, the admixture characteristics differ with respect to the number of generations and probably the history of gene flow. These differences have an impact on power characteristics and on the number of AIMs needed for defining chromosomal segment ancestry.
Table 1 provides a partial list of diseases that might be particularly amenable to study in specific admixed populations on the basis of the prevalence in founding populations.
Over the past several years, many groups have developed and characterized sets of markers that can distinguish among major ethnic groups in their genetic ancestry (i.e. AIMs) [8••,23–26]. Using HapMap data, for example, my co-workers and I [8••] have recently identified and validated a set of 4222 single nucleotide polymorphisms (SNPs) distributed throughout the genome that have very large differences in allele frequencies between African and European continental populations. These SNPs have been selected to show little difference between disparate African populations and also show very few differences in allele frequencies between different European subpopulations.
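The two selection criteria described above (a large frequency difference between continents, and little variation among subpopulations within each continent) can be expressed as a simple filter. The following is a minimal sketch only: the `Snp` record, the thresholds `min_delta` and `max_within_range`, and the toy frequencies are all hypothetical and are not the criteria or data used to build the published AIM set.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snp:
    snp_id: str
    freqs_pop1: List[float]  # allele frequency in each sampled subpopulation of continent 1
    freqs_pop2: List[float]  # allele frequency in each sampled subpopulation of continent 2


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def select_aims(snps: List[Snp],
                min_delta: float = 0.5,
                max_within_range: float = 0.1) -> List[Tuple[str, float]]:
    """Return (snp_id, delta) pairs for SNPs passing both criteria, largest delta first.

    min_delta and max_within_range are illustrative thresholds, not published values.
    """
    selected = []
    for snp in snps:
        # Between-continent difference in mean allele frequency
        delta = abs(_mean(snp.freqs_pop1) - _mean(snp.freqs_pop2))
        # Within-continent heterogeneity, measured here as the frequency range
        spread1 = max(snp.freqs_pop1) - min(snp.freqs_pop1)
        spread2 = max(snp.freqs_pop2) - min(snp.freqs_pop2)
        if delta >= min_delta and max(spread1, spread2) <= max_within_range:
            selected.append((snp.snp_id, delta))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Made-up frequencies for illustration only
toy = [
    Snp("rs_toy_1", [0.95, 0.92], [0.05, 0.08]),  # large delta, homogeneous: kept
    Snp("rs_toy_2", [0.90, 0.50], [0.10, 0.10]),  # heterogeneous within continent 1: dropped
    Snp("rs_toy_3", [0.55, 0.50], [0.45, 0.50]),  # small delta: dropped
]
print(select_aims(toy))
```

A genome-wide set would additionally need a spacing criterion so that the selected markers are well distributed along each chromosome; that step is omitted here.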
Current efforts are nearing completion with respect to a similar set of AIMs to distinguish European and Amerindian ancestry throughout the genome. Our recent studies and those of other groups indicate that the differences among Amerindian groups are usually larger than those found in other subpopulations. Our studies suggest that using a single specific Amerindian group to represent one of the parental populations contributing to the Mexican American admixed population can be problematic. By screening multiple Amerindian populations, however, it is possible to identify sufficient numbers of SNPs that have large differences between Amerindian and European populations and yet show little variation among different Amerindian groups. These studies should enable the development of sets of AIMs for admixture mapping in the Mexican, Mexican American and other ‘Latino’ populations.
Admixture mapping is based on the assumption that some susceptibility variants will be associated with continental ancestry and that this association can be discerned in admixed populations by examining linkage to ancestry. Several different algorithms and computational programs have been developed to facilitate admixture mapping [3–7]. Each of these relies on a hidden Markov model (HMM) to infer ancestral states along the chromosome on the basis of the typing results of markers that are informative for ancestry. The model formulation relies on the prior probability that any given locus in the current generation is derived from one of the founder populations, and on the switches between ancestral states along a chromosome that result from recombination events in the generations since admixture. The HMM approach is designed to infer the unobserved local ancestries for each individual and uses multipoint information from linked markers. Thus, the transition probabilities in the HMM, modeled as Poisson arrivals of ancestry switches, provide an approximation to the correlations in ancestry between linked markers.
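To make the HMM formulation concrete, the following is a minimal sketch of a forward-backward calculation of local ancestry on a single haploid chromosome with two parental populations. It is a deliberate simplification and not the implementation of any of the programs cited here: it assumes phased (haploid) data, known parental allele frequencies strictly between 0 and 1, a known admixture proportion `m` and a single parameter `tau` (generations since admixture), with ancestry breakpoints arriving as a Poisson process at rate `tau` per Morgan. All names are illustrative.

```python
import math


def local_ancestry_posteriors(alleles, positions_morgans, freqs, m, tau):
    """Posterior probability of population-0 ancestry at each marker.

    alleles           -- observed allele (0 or 1) at each marker on one haploid chromosome
    positions_morgans -- genetic map position of each marker, in Morgans, ascending
    freqs             -- (p_pop0, p_pop1): frequency of allele 1 in each parental population
    m                 -- prior probability that a locus derives from population 0
    tau               -- generations since admixture (Poisson breakpoint rate per Morgan)
    """
    prior = [m, 1.0 - m]
    n = len(alleles)

    def emission(i, k):
        p = freqs[i][k]
        return p if alleles[i] == 1 else 1.0 - p

    def transition(i, j, k):
        # Probability of state k at marker i given state j at marker i - 1
        d = positions_morgans[i] - positions_morgans[i - 1]
        no_break = math.exp(-tau * d)  # no ancestry breakpoint in the interval
        stay = no_break if j == k else 0.0
        # After at least one breakpoint, ancestry is redrawn from the prior
        return stay + (1.0 - no_break) * prior[k]

    # Forward pass, rescaled at each marker to avoid numerical underflow
    fwd = []
    f = [prior[k] * emission(0, k) for k in (0, 1)]
    fwd.append([x / sum(f) for x in f])
    for i in range(1, n):
        f = [emission(i, k) * sum(fwd[i - 1][j] * transition(i, j, k) for j in (0, 1))
             for k in (0, 1)]
        fwd.append([x / sum(f) for x in f])

    # Backward pass, also rescaled
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(transition(i + 1, j, k) * emission(i + 1, k) * bwd[i + 1][k]
                 for k in (0, 1))
             for j in (0, 1)]
        bwd[i] = [x / sum(b) for x in b]

    # Combine both passes into a per-marker posterior for population 0
    posteriors = []
    for i in range(n):
        g = [fwd[i][k] * bwd[i][k] for k in (0, 1)]
        posteriors.append(g[0] / sum(g))
    return posteriors


# Toy example: five tightly linked markers, each carrying the allele common in population 0
positions = [0.00, 0.01, 0.02, 0.03, 0.04]
freqs = [(0.9, 0.1)] * 5
print(local_ancestry_posteriors([1] * 5, positions, freqs, m=0.8, tau=6.0))
```

Because the transition term depends on the genetic distance between adjacent markers, closely spaced AIMs reinforce one another, which is the multipoint information referred to above. Handling unphased diploid genotypes and uncertainty in `m`, `tau` and the parental frequencies requires considerably more machinery than this sketch shows.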
Although the actual underlying model is unknown, simulation studies have shown the ability of these methods to discern ancestry linkage in various admixture models and conditions. These approaches use either case–control analyses or both case–control and case-only analyses. For case-only algorithms, the detection of linkage to ancestry is based on the difference in distribution of the ancestry of chromosomal segments for the loci associated with disease as compared with those in which there is no association. We note that the appropriate genome-wide α level (i.e. the meaningful significance level) should be lower when both case-only and case–control algorithms are used to analyze the same set of probands; however, extensive simulations will be necessary to establish this relationship.
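The distinction between the two kinds of analysis can be illustrated with a deliberately simple pair of statistics. This sketch is a stand-in for exposition and is not the test statistic used by any of the programs discussed in this review: it assumes that a local ancestry estimate (the proportion of the two alleles at the locus derived from one parental population) and a genome-wide ancestry estimate are already available for each subject, and it uses a plain normal approximation. All function names and toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def case_only_z(locus_ancestry, genome_ancestry):
    """Case-only comparison: locus ancestry versus each case's genome-wide ancestry."""
    diffs = [loc - gen for loc, gen in zip(locus_ancestry, genome_ancestry)]
    # Standardized mean excess of ancestry at the locus among cases
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def case_control_z(case_diffs, control_diffs):
    """Case-control comparison of the locus-minus-genome ancestry deviation."""
    se = math.sqrt(stdev(case_diffs) ** 2 / len(case_diffs)
                   + stdev(control_diffs) ** 2 / len(control_diffs))
    return (mean(case_diffs) - mean(control_diffs)) / se


# Made-up values for six cases and six controls
case_locus = [1.0, 0.5, 1.0, 1.0, 0.5, 1.0]
case_genome = [0.8] * 6
control_diffs = [-0.3, 0.2, -0.3, 0.2, -0.3, 0.2]
case_diffs = [loc - gen for loc, gen in zip(case_locus, case_genome)]

print(case_only_z(case_locus, case_genome))
print(case_control_z(case_diffs, control_diffs))
```

The case-only statistic uses the rest of each case's genome as its own control, whereas the case–control statistic also absorbs any ancestry deviation at the locus that is shared by cases and controls and is therefore unrelated to the phenotype.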
Computational programs that are readily available include AncestryMap, Structure/MALDsoft and AdmixMap. In our application of these programs to simulated data based on real genotypes, my colleagues and I [8••] have found that, although each algorithm can yield appropriate results in many models, the AdmixMap program performs the best when the admixture model is more complex (more generations and different gene flow schemes). Unlike the other methods, AdmixMap estimates both the admixture proportion and the number of generations for each gamete in each individual. We have also noted that the potential issue of LD within parental populations does not seem to be a problem with our current AIM sets using this algorithm [8••].
A new Markov-HMM (MHMM) algorithm has been recently developed that explicitly accounts for LD in parental populations [27•]. Its power in real data sets, or in simulated data sets based on real genotyping data, has not, however, been robustly examined; thus, the efficacy of this algorithm in admixture mapping is not yet clear. This study has also suggested that the MHMM algorithm will enable admixture mapping without the use of AIMs. In practice, this approach might be problematic because some or many SNPs might not have the appropriate characteristics – namely, little variation in allele frequencies within multiple parental populations that could have contributed to one continental founder population (see ‘Ancestry informative markers’ above).
The power of admixture mapping is determined by several factors: (i) the sample size; (ii) the risk, here termed the ‘ethnicity risk ratio’ (ERR), conferred by one parental ancestry as compared with the other parental ancestry in the admixed population; (iii) the admixture characteristics (e.g. continuous gene flow and number of generations since admixture); and (iv) the ability to define the ancestry of each chromosomal segment derived from both parental gametes in the study subjects. Importantly, modeling studies suggest that multiple waves of parental contribution to the admixed population might enhance rather than diminish the ability of admixture mapping to identify chromosomal regions of interest [5,6,28]. The reader should note that ERR equals the genotypic risk ratio (GRR) in the admixed population if the responsible allelic variant is fixed in opposite directions in the founding populations. This method is generally limited to detecting genes with a GRR of more than 1.5 in the admixed population in studies using ~1000 cases.
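The way in which the ERR translates into an ancestry signal among cases can be worked through under one simple model. The sketch below assumes that each copy of population-1 ancestry at the disease locus multiplies risk by the ERR (a multiplicative model), and that the two alleles of a subject are drawn independently with probability `theta` of population-1 ancestry; these modeling choices and the function name are assumptions made for illustration and are not taken from the cited power studies.

```python
def expected_case_ancestry(theta, err):
    """Expected proportion of population-1 ancestry at the disease locus among cases.

    theta -- genome-wide proportion of population-1 ancestry in the admixed population
    err   -- ethnicity risk ratio per copy of population-1 ancestry (multiplicative model)

    Weighting 0, 1 and 2 copies of population-1 ancestry by 1, err and err**2
    and normalizing gives theta * err / (theta * err + 1 - theta).
    """
    return theta * err / (theta * err + (1.0 - theta))


# Example: 80% genome-wide ancestry from population 1 and an ERR of 1.5
theta = 0.8
locus = expected_case_ancestry(theta, 1.5)
print(locus, locus - theta)
```

Under these assumptions an ERR of 1.5 raises the expected ancestry at the locus from 0.80 to about 0.86 among cases. Detecting a shift of this size against sampling noise, imperfect ancestry inference and a genome-wide significance threshold is what drives the sample-size requirement discussed above.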
For admixture mapping in African Americans, my co-workers and I [8••] have recently examined both the admixture mapping information of AIM sets and the power as a function of admixture mapping information. A set of 4222 AIMs extracts ~70% admixture mapping information genome-wide and provides good power. A subset of 2000 AIMs selected from this set provides only marginally less coverage. Figure 1 shows the power relationship using real marker sets and using case-only and case–control algorithms under an admixture model (six generations, continuous gene flow, 80:20 admixture ratio) consistent with both our own and other assessments of African American admixture characteristics. Although the case-only algorithms provide greater power, it is not yet certain whether segregation distortion or other factors unrelated to phenotype selection can result in false positives. For Mexican American subjects these relationships are different. Our current studies suggest that a total of ~15 generations fits the admixture characteristics of this population and that the admixture proportion is close to 50% European and 50% Amerindian. Under these conditions, substantially more markers (~5000) are necessary for adequate admixture mapping extraction. Simulations under these conditions, however, show that the power is greater and that the critical region defined by ancestry segments is substantially smaller. In addition, for Mexican Americans the power for case–control algorithms is nearly equal to that for case-only algorithms.
The above sections have focused on identification of the chromosomal region containing susceptibility loci linked to continental ancestry. The size of the critical region identified by these studies depends on the ERR, the sample size and the admixture characteristics that resulted in the population being studied. Identification of the actual susceptibility gene within the critical region requires the same approach used in general association tests or to identify candidate genes; in other words, further narrowing depends on LD within the ancestry-associated segment that is not due to continental ancestry or identification of particular nucleotide variations that can be shown to have a physiological effect.
This brief review has highlighted an emerging methodology – admixture mapping – that is a promising approach in many complex common diseases. Advances in high-throughput genotyping might facilitate the extension of this methodology to admixed populations with several rather than two founding populations. Many ongoing studies using admixture mapping will indicate whether differences between continental populations are particularly useful in identifying variations that result in disease, response to pharmacological therapy and toxicity.
This work was supported by grants from the National Institutes of Health (AR050267 and DK071185).
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest