There is great interest in characterizing the genetic architecture underlying drug response. For many drugs, gene-based dosing models explain a considerable amount of the overall variation in treatment outcome. As such, prescription drug labels are increasingly being modified to contain pharmacogenetic information. Genetic data must, however, be interpreted within the context of relevant clinical covariates. Even the most predictive models improve with the addition of data related to biogeographical ancestry. The current review explores analytical strategies that leverage population structure to more fully characterize genetic determinants of outcome in large clinical practice-based cohorts. The success of this approach will depend upon several key factors: (1) the availability of outcome data from groups of admixed individuals (i.e., populations recombined over multiple generations), (2) a measurable difference in treatment outcome (i.e., efficacy and toxicity endpoints), and (3) a measurable difference in allele frequency between the ancestral populations.
Many genes underlying drug response remain largely uncharacterized. Single nucleotide polymorphisms (SNPs) are the marker of choice for studying genes related to such complex traits because they are abundant, stable across generations, and informative for linkage disequilibrium mapping when selected for appropriate allele frequencies1–2. Currently, there are more than 12 million SNPs deposited in GenBank, 6.5 million of which have been validated (http://www.ncbi.nih.gov/SNP). SNPs occurring every 100–300 bases along the genome capture ~80% of the total genetic diversity in humans3, whereas copy-number variation (CNV), the most prevalent type of structural variation in the human genome, contributes much of the remaining 20%4. These and other forms of genetic variation within individuals and between individuals contribute to variability in treatment response in the context of relevant environmental factors5–6.
The Human Genome Project was established (www.genome.gov; final draft, April 2003) to serve as ‘biology’s periodic table of genes’ in the dissection of complex traits7. The HapMap project (http://www.hapmap.org) provides a validated SNP map in four populations, at an approximate density of one SNP per kb8. The resources available from the HapMap project provide information on allele frequency, Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD), haplotype structure, and tagSNPs for common variants. Thus, this database is an important resource in the selection of markers for characterization of complex traits like drug outcome9–10.
Rare variants can also contribute to variability within complex traits11. Through the use of next-generation sequencing technologies, the 1000 Genomes Project is expected to identify over 95% of all variants with allele frequencies greater than 1% within eleven geographically diverse populations. The importance of world-wide sequencing efforts such as these cannot be overstated. Many rare variants have occurred in recent human history, and they therefore tend to have greater population diversity than common variants12. Such variants are ideal for admixture mapping.
Admixture, a common form of gene flow between populations, refers to the process whereby two or more genetically and phenotypically diverse populations begin to mate and form a new, mixed or ‘hybrid’ population13–14. Each chromosome of an admixed individual resembles a mosaic of chromosomal segments derived from a particular ancestral population (Figure 1). The use of such populations to map genes was proposed over 50 years ago15, but has come into prominence recently due to the availability of genome-wide sets of highly informative markers and adequate statistical tools to successfully conduct these studies16–19. Markers with different frequency distributions among ancestral populations can be used to adjust for population stratification among admixed populations. Such markers are often referred to as ancestry informative markers (AIMs), because they can distinguish the ancestral origin of the haplotype on which they reside. If an ancestral population carries a genetic risk allele at a higher frequency than other(s), then genomes of affected “mixed-ancestry” offspring will share a greater level of ancestry (DNA) from that population around that disease susceptibility locus, compared with the background ancestry level in the genome-wide average or compared with the ancestry sharing among discordant individuals around the same location20.
Populations like African Americans, African Caribbeans and Mexican Americans were formed within the past 400 years (i.e., within approximately 15 generations)21. Stretches of DNA with contiguous European and African ancestry have therefore not had sufficient time to break up due to recombination; hence, allelic associations in these populations typically extend over distances as large as 20–30 cM22–23. For studies mapping genes contributing to variability in drug outcome within these populations, the large amount of linkage disequilibrium (i.e., linkage between markers with ancestral information) in these admixed populations will translate into smaller requirements for both marker saturation and sample size19, 24–25. This principle is important, since it is estimated that about 20% of the genetic material in today’s African-American population originated from a non-African, predominantly Caucasian, source26.
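The persistence of admixture-generated linkage disequilibrium can be illustrated with a short calculation. The sketch below assumes a single pulse of admixture followed by random mating, under which the association between two loci shrinks by a factor of (1 − θ) per generation, and it converts map distance to the recombination fraction θ with Haldane's map function. Both simplifications, and the function names, are assumptions made for illustration; ongoing gene flow from the parental populations would be expected to sustain more long-range association than this model predicts.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Convert a map distance in centimorgans to a recombination fraction
    using Haldane's map function (assumes no crossover interference)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def admixture_ld_remaining(distance_cm, generations):
    """Fraction of the initial admixture LD between two loci that remains
    after the given number of generations (single-pulse model)."""
    theta = haldane_recombination_fraction(distance_cm)
    return (1.0 - theta) ** generations


# Roughly 15 generations, as described for populations formed within
# the past 400 years; the distances are illustrative.
for distance in (1.0, 5.0, 20.0, 30.0):
    remaining = admixture_ld_remaining(distance, generations=15)
    print(f"{distance:5.1f} cM: {remaining:.3f} of initial LD remains")
```

The output makes the qualitative point: after a limited number of generations, associations between markers several centimorgans apart have decayed only partially, which is why a comparatively sparse marker panel can suffice.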
Gene mapping can be accomplished within admixed populations through the application of three fundamental steps. First, a panel of AIMs that differentiate well between ancestral populations must be designed. Next, individuals are genotyped using the panel (e.g., following case control design), and the mosaic of ancestries is inferred for each individual. Finally, the inferred ancestral profiles are scanned in search of an aberration skewed toward the ancestral population with the higher risk, as expected based upon prior association with the locus of interest. Given the recent and common origin of all human populations, any two unrelated individuals share more than 99.8 % of their DNA sequence, and variation within populations is by far greater than variation between populations27. The admixture between geographically isolated populations such as Europeans, Africans and Native Americans, has only a small average effect (0.2%) on the genetic variation of the gene pool. For most genomic regions, the parental populations have similar allele frequencies and, at these frequencies, admixture may be of limited consequence28. However, at other loci, there have been marked changes in allele frequency in the time since the separation of parental populations, and it is at these loci that difference in frequencies of risk alleles can be leveraged to identify the loci impacting clinically recognizable traits9, 29–31. Stated another way, admixture mapping is most applicable when variability in a given drug outcome is significantly different between the ancestral populations from which the admixed population has been formed. When such a trait is studied, admixed individuals demonstrating greater variability are expected to show an elevated genomic contribution from the ancestral population with the higher prevalence of the trait around the associated genetic loci.
The arguments in favor of admixture mapping are compelling, and the statistical methods are improving rapidly. Until recently, the availability of admixed populations suitable for study (with differences in disease and/or allele frequencies) has been somewhat limited18, 32. Most pharmacogenetic association studies conducted to date have involved retrospective genotyping of archived DNA from previous randomized treatment trials33. Nearly a decade ago, international efforts were initiated to launch the construction of DNA biobanks that were either based in the community (population-based) or in the context of routine clinical care (practice-based)34. Biobanks typically include biological samples (e.g., serum, plasma and DNA) linked to structured clinical databases (e.g., comprehensive electronic medical records). These biobanks are uniquely suited for studies quantifying the impact of ancestry in admixed populations, and they play a role in the pathway to personalized medicine, in which treatments will no longer be “one size fits all” but will instead be “tailored” to the molecular and genetic profile of each patient based on pharmacogenetic predictors, yielding treatment plans that fall in line with a “one-size-fits-one” approach. For example, the eMERGE network (electronic Medical Records and Genomics) represents a group of five large academic medical institutions within the United States that have collected DNA linked to secure encrypted clinical data extracted from dense electronic medical records33. The participating institutions, Vanderbilt University (coordinating center), Northwestern University, Marshfield Clinic, Mayo Clinic, and Group Health, have begun the process of standardizing clinical phenotypes within the context of disease (onset and progression), as well as treatment outcome (efficacy and toxicity), for ongoing genome-wide association studies using densely populated high-throughput SNP scans (http://www.gwas.net).
In many cases, these data are available along with self-reported race and family structure across multiple generations. Similar efforts are also underway at Harvard Pilgrim and Fallon Healthcare (in the Northeastern United States), Kaiser Permanente Georgia (in the Southeastern United States), HealthPartners, Henry Ford, and Marshfield Clinic (in the Midwestern United States), and Kaiser Permanente Colorado, Kaiser Permanente Northwest, Group Health Cooperative, Lovelace, Kaiser Permanente Southern California, Kaiser Permanente Northern California, and Kaiser Permanente Hawaii (in the Western United States)35. In some cases, race-specific biobanks are also being developed. The African-American DNA biobank at Howard University in Washington, D.C. represents the largest resource of its kind36.
Most biobanks collect only limited historical information on the ancestry (origin) of the donors. In the United States, it would be inaccurate to assume homogeneity of population ancestry in the context of clinical practice based biobanks. Most often race/ethnicity-ancestry-data are missing from the electronic medical records; in some cases, race/ethnicity is estimated by a study coordinator’s visual inspection at time of enrollment; in others, study participants are asked directly to self-identify a single race/ethnicity that they feel best identifies them. While self-reported race and ancestry can predict general ancestral clusters, self-reported race does not reveal the extent of admixture. Furthermore, group identity (e.g., the Hispanic American identity often indicated in queries of ethnicity) can be much more complex than self-identity, and it is therefore even less feasible to rigorously define an ethnic group by its genome.
Hence, there is “missing ancestry” in these biobank resources. Currently, two main approaches are used for inferring genetic ancestry. If the ancestral populations of the cohort being studied are known, AIMs can be genotyped to infer global ancestry using Principal Components Analysis (PCA) or other clustering algorithms37–39. Often, however, the ancestral population information is not available or not known with confidence. This is particularly true for populations of relatively complicated admixture or unknown origins. In these cohorts, a large number of loci need to be genotyped, followed by application of PCA to individual-level genetic data, to fully characterize biogeographical ancestry. Adjustment made through this approach increases investigator confidence that pharmacogenetic findings are not spurious associations.
In the hypothetical example shown in Figure 2, the presence or absence of toxicity could be adjusted using a dummy variable in an admixed population with parental (ancestral) populations A1 and A2. Any allele showing higher frequency in the A1 population than in A2, could in theory show association with toxicity. Thus, the stratification variable would be a confounding variable, associated with both the exposure (alleles, haplotypes or genotypes) and the outcome of interest (higher incidence of toxicity in the A1 population than in the A2). This problem is compounded when analyses are conducted on a genome-wide scale. The extrapolation of pharmacogenomic data from genetically-homogeneous populations to admixed populations would be expected to generate large numbers of false positive and false negative results. The failure to recognize admixture can thus prevent proper characterization of the genetic structure underlying ADRs in populations such as these, leading to inaccurate prediction of outcome as well as incorrect inferences about the evolutionary factors driving patterns of diversity40.
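The confounding described above can be made concrete with expected counts. The sketch below uses entirely hypothetical numbers: two parental strata (labelled A1 and A2, as in the text) in which allele carrier status and toxicity are independent, but in which both the carrier frequency and the toxicity rate are higher in A1. The pooled odds ratio is compared with a Mantel-Haenszel odds ratio that stratifies on ancestry; the function names are illustrative.

```python
def expected_table(n, carrier_freq, toxicity_rate):
    """Expected 2x2 counts (a, b, c, d) when carrier status and toxicity
    are independent within a stratum.
    a: carriers with toxicity      b: carriers without toxicity
    c: non-carriers with toxicity  d: non-carriers without toxicity"""
    carriers = n * carrier_freq
    non_carriers = n - carriers
    a = carriers * toxicity_rate
    b = carriers - a
    c = non_carriers * toxicity_rate
    d = non_carriers - c
    return a, b, c, d


def odds_ratio(table):
    a, b, c, d = table
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical strata: no allele effect within either ancestral group.
a1 = expected_table(1000, carrier_freq=0.8, toxicity_rate=0.3)
a2 = expected_table(1000, carrier_freq=0.2, toxicity_rate=0.1)
pooled = tuple(x + y for x, y in zip(a1, a2))

print("odds ratio within A1:", round(odds_ratio(a1), 3))
print("odds ratio within A2:", round(odds_ratio(a2), 3))
print("pooled odds ratio:", round(odds_ratio(pooled), 3))
print("ancestry-stratified odds ratio:", round(mantel_haenszel_or([a1, a2]), 3))
```

Within each stratum the allele is unrelated to toxicity, yet ignoring ancestry produces an apparent association; stratifying on ancestry removes it.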
Consequently, an in silico method of data mining is needed to extract and utilize non-redundant AIMs from the ever-increasing wealth of genomic data in public domains41. Currently, the amount of information characterizing variants across the human genome is somewhat limited for the majority of racial and ethnic groups10, 42. Nonetheless, utilization of these data represents an economical, rapid, and practical strategy for developing a comprehensive and informative panel of AIMs28.
Admixture-based methods rely on the use of suitable markers, and accurate estimation of allele frequencies, from appropriately identified parental populations (Shriver and Kittles, 2004). Theoretically, any marker that has a large allele frequency difference between ancestral populations can be used for estimating individual ancestry and for admixture mapping. These include SNPs, restriction fragment length polymorphisms (RFLPs), microsatellites/simple tandem repeats (STRs), insertion/deletion polymorphisms (INDELs), and copy number variations (CNVs)4, 43–47. The ideal AIM has one allele that is fixed in one population (p = 1.0) and absent from the other48. However, most alleles are shared between populations, and alleles common in one population are often common in other populations. Since most genetic markers are unaffected by admixture, it is imperative to identify and choose the most informative markers that show high levels of absolute difference in allele frequency between the parental populations30.
To date, several measures of marker informativeness (ability of markers to differentiate between populations) have been developed to select the most informative AIMs (reviewed in 48) from an ever-increasing wealth of genomic databases49–52. These measures include: absolute allele frequency differences (δ), Shannon Information Content (SIC), Fisher Information Content (FIC), measure of Informativeness for assignment (In) and F statistics Index (FST). The cutoff values for δ are highly subjective, and have steadily decreased over time from 0.5 (ref. 43) to 0.4 (ref. 44) to ≥ 0.3 (ref. 47). Cutoffs that have been used for other measures are FST ≥ 0.4 (ref. 47), FIC ≥ 2.0 (ref. 48), and In or SIC ≥ 0.3 (refs. 21, 48). With so many methods available, it is important to understand their similarities and differences in practice. Each approach applies a different computer algorithm for marker selection. For example, measures such as δ can be used for only two ancestral populations at a time48, while measures such as FST, FIC, and SIC can be applied to select informative markers for admixed populations formed from two or more ancestral populations. Further, the estimation of FIC, In and SIC requires known parental proportional contributions to the admixture. Some analytical methods require this degree of granularity while others do not. If a study is properly designed, the AIM selection method can often lead to reductions in the amount of genotyping required for inference.
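Three of these measures are simple enough to compute directly from parental allele frequencies. The following is a minimal sketch for a biallelic marker, assuming equally weighted parental populations: δ as the absolute frequency difference, FST from expected heterozygosities as (HT − HS)/HT, and In using the commonly cited entropy-based expression with natural logarithms. The function names are illustrative, and the published selection procedures cited above may differ in weighting and estimation details.

```python
import math


def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def fst_two_populations(p1, p2):
    """FST for a biallelic marker in two equally weighted populations,
    computed from expected heterozygosities as (HT - HS) / HT."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # marker is monomorphic overall: no differentiation
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def _xlogx(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0.0 else 0.0


def informativeness_for_assignment(freqs):
    """In for a biallelic marker; freqs holds the frequency of one allele
    in each of K equally weighted populations."""
    k = len(freqs)
    total = 0.0
    # Sum the contribution of each of the two alleles.
    for allele_freqs in (freqs, [1.0 - p for p in freqs]):
        p_bar = sum(allele_freqs) / k
        total += -_xlogx(p_bar) + sum(_xlogx(p) for p in allele_freqs) / k
    return total


# Hypothetical parental frequencies for a candidate AIM.
p_a1, p_a2 = 0.9, 0.2
print("delta:", round(delta(p_a1, p_a2), 3))
print("FST:", round(fst_two_populations(p_a1, p_a2), 3))
print("In:", round(informativeness_for_assignment([p_a1, p_a2]), 3))
```

Because `informativeness_for_assignment` accepts a list of any length, it extends to more than two parental populations, whereas `delta` is inherently pairwise, mirroring the distinction drawn in the text.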
Several SNP panels have been utilized to adjust the results of genetic association studies according to population admixture over the past few years21, 28, 30, 43–44, 53–55 focusing on identifying panels of markers suitable for admixture studies as well as in developing measures of ancestry information content of markers. Smith et al.46 screened 744 microsatellite markers for composite δ values in four different populations and identified a genome spanning set of 315 markers (average spacing 10 cM, δ>0.3) for mapping in African Americans and 214 markers (average spacing of 16 cM, δ>0.25) for mapping in Hispanics. A DNA pooling method was used to identify 151 AIMs (microsatellites and short insertion/deletion polymorphisms), with δ= 0.3 for mapping in Mexican American populations to distinguish between European-American and Native-American contributions. Ninety-seven AIMs were identified for mapping in African-American populations that show limited variation within Africa47. However, it is likely that different sets of markers will be needed for different populations20.
Understanding the genetic consequences of admixture in large populations is important because admixture can be both a confounding factor and a source of statistical power56. Admixture is a distinct disadvantage within some analytical contexts. Admixture causes false-positive associations in genetic studies, subjecting case-control findings to stratification bias. Population stratification occurs when the population under study is assumed to be homogeneous with respect to allele frequencies but in fact is comprised of subpopulations that have different allelic frequencies for a particular gene of interest. If these subpopulations also have differing frequencies for the primary trait of interest, then subpopulation membership is a confounder57. As such, an association between a gene and drug outcome may be incorrectly estimated without properly accounting for population structure29, leading to both false positive and false negative associations58. In order for the bias due to population structure to exist, both of the following conditions must be true: (a) the frequency of the marker of interest varies significantly by race/ethnicity, and (b) the background prevalence of the trait varies significantly by race/ethnicity59.
Currently, three fundamentally different methods are used to correct for confounding induced by admixture within association studies37, 39, 60. These methods are (1) genomic control, (2) structured association, and (3) principal component analysis. Genomic control uses a set of non-candidate, unlinked loci to estimate an inflation factor, λ, which reflects the population structure present, and then corrects the standard chi-square test statistic for this inflation factor. The structured association method utilizes Bayesian techniques to assign individuals to “clusters” or subpopulation classes using information from a set of non-candidate, unlinked loci and then tests for an association within each “cluster” or subpopulation class. As noted earlier, PCA can also be used to identify and adjust for population substructure due to admixture37. In PCA, genotype data are used to infer axes of genetic variation and determine principal component scores, which are then included as covariates in the analysis model. Thus, PCA collectively scores a set of alleles whose allele frequencies are correlated because they derive from the same parental population.
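The genomic control adjustment is compact enough to sketch. The version below assumes 1-degree-of-freedom chi-square association statistics and estimates the inflation factor as the median statistic at the null (non-candidate, unlinked) loci divided by the median of a chi-square distribution with 1 degree of freedom, approximately 0.455. Flooring the factor at 1 is a common convention that is assumed here, and the input statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_chi2_stats):
    """Estimate the inflation factor from association statistics at
    non-candidate, unlinked loci; values below 1 are floored at 1."""
    lam = statistics.median(null_chi2_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)


def genomic_control_adjust(chi2_stat, lam):
    """Deflate a candidate-locus chi-square statistic by the inflation factor."""
    return chi2_stat / lam


# Hypothetical chi-square statistics at five null markers.
null_stats = [0.1, 0.5, 0.91, 1.5, 3.0]
lam = inflation_factor(null_stats)
print("inflation factor:", round(lam, 3))
print("adjusted candidate statistic:", round(genomic_control_adjust(10.0, lam), 3))
```

In practice many more null markers would be used; five are shown only to keep the example readable.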
In other situations, however, admixture may be an advantage for pharmacogenomic research. Ancestry-phenotype correlation studies are useful for examining the role of genetic factors underlying observed racial/ethnic differences in loci influencing complex phenotypes like drug response. For example, population admixture results in longer linkage disequilibrium (LD) segments than in the previously isolated populations, enabling fewer genetic markers to be used in association studies23. This property is valuable in investigating causality61 and is thus highly useful in pharmacogenomic research. Admixed populations in clinical practice-based cohorts can also be utilized to gain additional information on people, such as African groups, that are typically under-represented in clinical drug trials. Studies in African American populations in particular will fill important pharmacogenomic information gaps. Estimation of the genetic admixture proportion in the admixed population can be performed at the population level using allele frequencies or at the individual level using genotype frequencies13, 62–63. In the former case, the focus is on estimating the fraction of genes in the admixed population that come from each parental population. In the latter, the proportion of loci in the genome of a single individual that come from a parental population is used to adjust for admixture fractions in association studies64.
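At the population level, the admixture proportion can be approximated from allele frequencies alone. The sketch below assumes a two-way admixture in which the frequency in the admixed population is a weighted average of the parental frequencies, p_admixed = m·p1 + (1 − m)·p2, and solves for m by least squares across markers. The function name and the frequencies are hypothetical, and the estimator ignores sampling error and drift.

```python
def admixture_proportion(p_admixed, p_parent1, p_parent2):
    """Least-squares estimate of the contribution m of parental population 1,
    assuming p_admixed = m * p1 + (1 - m) * p2 at every marker."""
    num = 0.0
    den = 0.0
    for pm, p1, p2 in zip(p_admixed, p_parent1, p_parent2):
        diff = p1 - p2
        num += (pm - p2) * diff
        den += diff * diff
    if den == 0.0:
        raise ValueError("markers are uninformative: parental frequencies are identical")
    # Keep the estimate inside the admissible range [0, 1].
    return min(1.0, max(0.0, num / den))


# Hypothetical allele frequencies at three AIMs.
parent1 = [0.9, 0.1, 0.6]
parent2 = [0.1, 0.7, 0.2]
admixed = [0.26, 0.58, 0.28]
m_hat = admixture_proportion(admixed, parent1, parent2)
print("estimated contribution of parental population 1:", round(m_hat, 3))
```

Markers with a larger frequency difference between the parental populations carry more weight in the estimate, which is one reason highly informative AIMs are preferred.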
As shown in Table 1, two types of approaches are typically used to estimate admixture: Bayesian methods, and assignment of maximum likelihood (ML)18, 65. Both approaches examine the transmission of alleles according to parental population (e.g., African- versus European-by-descent) at specific loci along each chromosome66. Programs such as AdmixMap, AncestryMap and STRUCTURE leverage data from the founding populations (e.g., in the form of individuals for STRUCTURE), to provide an informative prior probability and calculate the posterior distribution of admixture estimates using a Markov Chain Monte Carlo (MCMC) method within the Bayesian framework39, 67–70. Conversely, ML estimation methods, such as IBGA, PSMIX and FRAPPE, fall under the frequentist framework, where prior information is not required54, 71–72.
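To illustrate the maximum likelihood idea in its simplest form, the sketch below estimates one individual's ancestry proportion q from genotypes at AIMs with known parental allele frequencies. It assumes two parental populations, independent markers, and that each allele copy is drawn with frequency q·p1 + (1 − q)·p2; the maximum is found by a plain grid search. This is an illustrative toy, not the algorithm implemented in IBGA, PSMIX, FRAPPE or any other named program, and all names and values are hypothetical.

```python
import math


def log_likelihood(q, genotypes, p_parent1, p_parent2):
    """Binomial log-likelihood of genotypes (0, 1 or 2 copies of the
    reference allele) given ancestry proportion q from parental population 1.
    The binomial coefficient is omitted because it does not depend on q."""
    eps = 1e-9
    total = 0.0
    for g, p1, p2 in zip(genotypes, p_parent1, p_parent2):
        f = q * p1 + (1.0 - q) * p2
        f = min(1.0 - eps, max(eps, f))  # guard against log(0)
        total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


def ml_individual_ancestry(genotypes, p_parent1, p_parent2, grid_points=1001):
    """Grid-search maximum likelihood estimate of q on [0, 1]."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, p_parent1, p_parent2))


# Ten hypothetical AIMs that are nearly fixed for opposite alleles.
p1 = [0.99] * 10
p2 = [0.01] * 10
genotypes = [2] * 5 + [0] * 5
print("estimated ancestry proportion:", ml_individual_ancestry(genotypes, p1, p2))
```

A Bayesian treatment would place a prior on q and sample its posterior distribution rather than report a single maximizing value.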
While genome-wide association studies are being conducted on an unprecedented scale in cohorts that are disease-based, most of the work conducted in the context of treatment outcome has been limited to biological candidate genes73. The application of genome-wide SNP scanning platforms to studies of drug response has occurred only recently, and the number of such studies in the literature remains relatively small. Because several GWAS have recently been reported for outcomes related to lipid-lowering therapy74–76, the following example explores the need for analytical strategies that deal with admixture specifically within the context of cardiovascular risk reduction.
Cardiovascular disease is responsible for 1 in every 3 deaths in the U.S., and it affects patients of all races and ethnicities77. HMG-CoA reductase inhibitors (statins) are highly efficacious in the primary and secondary prevention of cardiovascular disease in patients at risk78. The efficacy of these drugs is directly proportional to the amount by which they decrease low density lipoprotein (LDL) cholesterol79. Statins are currently the most widely prescribed class of medications in the world, and the last decade has seen a trend toward much more aggressive dosing80. In Treating to New Targets (TNT), a randomized treatment trial of more than 10,000 subjects with known coronary artery disease, high dose atorvastatin was clinically superior to low dose atorvastatin81. However, when 1,984 rigorously selected participants from this trial recently underwent whole genome scanning (322,000 SNP scan from Perlegen), no variants from this scan were associated with LDL cholesterol reduction at a level that reached genome-wide significance75. Although subsequent efforts reflecting combined cohorts from multiple statin treatment trials (TNT, atorvastatin 10 mg daily; CAP, simvastatin 40 mg daily; and PRINCE, pravastatin 40 mg daily) are beginning to identify variants associated with lipid lowering efficacy, their effect size appears to be relatively small82. This may in part be due to misclassification bias introduced by utilizing single-dose efficacy. The source of this bias can be explained as follows.
Individual trials such as those introduced above typically have lipid data available for only a single treatment dose. However, if GWAS are to resolve genetic determinants of outcome for these drugs, a more thorough characterization of phenotype may be necessary. To reduce misclassification, data should be collected across multiple doses, and full dose-response relationships should be leveraged to extract rigorous phenotypic traits representing potency (ED50) and maximal efficacy (Emax)73. Such an approach is possible within the context of clinical practice-based datasets33, 35, 83. Biobanks such as those contained within the NIH-funded eMERGE Network (http://www.gwas.net) represent an unprecedented opportunity to characterize treatment outcome across large academic medical centers with DNA samples linked to secure encrypted copies of comprehensive electronic medical records. Genome wide association studies are already being conducted within these combined biobanks, for six primary study outcomes (including modifiable cardiovascular risk factors).
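The dose-response phenotypes mentioned above can be extracted by fitting a curve to responses observed at several doses. The sketch below assumes the simplest hyperbolic form, E(D) = Emax·D / (ED50 + D), with no baseline term or Hill coefficient, and fits it by scanning candidate ED50 values and solving for Emax by least squares at each one. The doses, responses and function names are hypothetical, and a real analysis would need to handle noise, covariates and repeated measures.

```python
def emax_response(dose, emax, ed50):
    """Hyperbolic Emax model: response at a given dose."""
    return emax * dose / (ed50 + dose)


def fit_emax(doses, responses, ed50_grid):
    """Return (emax, ed50) minimizing the sum of squared errors.
    For each candidate ED50 the model is linear in Emax, so Emax has a
    closed-form least-squares solution."""
    best = None
    for ed50 in ed50_grid:
        x = [d / (ed50 + d) for d in doses]
        sxx = sum(v * v for v in x)
        emax = sum(v * y for v, y in zip(x, responses)) / sxx
        sse = sum((y - emax * v) ** 2 for v, y in zip(x, responses))
        if best is None or sse < best[2]:
            best = (emax, ed50, sse)
    return best[0], best[1]


# Hypothetical noise-free data: percent LDL reduction at four daily doses (mg),
# generated from Emax = 60 and ED50 = 15.
doses = [10.0, 20.0, 40.0, 80.0]
responses = [emax_response(d, 60.0, 15.0) for d in doses]
grid = [0.5 * k for k in range(1, 401)]  # candidate ED50 values, 0.5 to 200 mg
emax_hat, ed50_hat = fit_emax(doses, responses, grid)
print("Emax estimate:", round(emax_hat, 2), "ED50 estimate:", ed50_hat)
```

With data at a single dose, Emax and ED50 cannot be separated, which is the misclassification problem the text attributes to single-dose trials.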
Datasets such as these are typically comprised of individuals with large variability in ancestry. Thus, analytical strategies adjusting for admixture will further improve resolution of the genetic architecture underlying lab-based efficacy for drugs used to reduce cardiovascular risk. With the statistical power that accompanies these large multi-institutional networks, investigative teams will also be able to characterize the impact of admixture on cardiovascular efficacy (i.e., the ability of pharmacologic intervention to reduce hard cardiovascular events). Figure 3 shows the distribution of death from acute myocardial infarction across the United States. While disparities in cardiovascular death rate clearly reflect complex interactions between cultural factors (e.g., systems of care) and operational factors (e.g., access to care), the contribution made by genetic factors is substantial. Robust admixture mapping strategies will therefore be essential to the successful characterization of genetic determinants of treatment outcome within this context.
The pharmacologic treatment of cardiovascular disease involves optimization of hemostasis, hemodynamics, and lipid homeostasis (Table 2). Antiplatelet agents are a mainstay of therapy. Although aspirin is efficacious in the secondary prevention of coronary artery disease, there is wide variability in the degree to which this drug attenuates platelet aggregation84–86. Aspirin-induced changes in platelet function are heritable86, and the genetic architecture underlying aspirin response has therefore been an intense focus of investigation84, 87–88. Clopidogrel is used to prevent stent restenosis following percutaneous intervention, and candidate genes contributing to its efficacy are rapidly coming into view89–91. Pharmacologic modulation of clotting factor synthesis is highly efficacious in the prevention and treatment of thromboembolic disease, and rigorous gene-based dosing models (including VKORC1, CYP2C9 and CYP4F2) are being developed for warfarin therapy (based on age, gender, ancestry, and a variety of clinical covariates) (http://www.warfarindosing.org). These models will be stronger when the data are corrected for admixture92. For example, CYP2C9 inactivates S-warfarin, and the distribution of variant CYP2C9 alleles is influenced by ancestry. The defective CYP2C9*5 allele, previously found in African-Americans and sub-Saharan Africans, but not in Europeans and their descendants, has recently been detected in a white Brazilian man. The relative contributions of European, African and Native American ancestry to his genetic pool were 92.0%, 7.5% and 0.5%, respectively93–94. The CYP2C9*5 allele was also detected in the proband's mother and in one of his brothers, consistent with inheritance through matrilineal African ancestry93. Thus, while the contribution of African alleles to his genome was limited to 7.5%, this relatively small contribution will be critical to predicting his response to medications.
The genetic underpinning of outcomes related to antihypertensive therapy has also been an intense focus of investigation over the past decade. First-line agents recommended by the Seventh Report of the Joint National Committee (JNC-VII) include β blockers and thiazide diuretics. Like warfarin, variance in outcome related to β blocker use can be partly explained by a combination of well-characterized pharmacodynamic and pharmacokinetic candidate genes (ADRB1, ADRB2, CYP2D6, UGT1A1)95–96. However, current models are limited97, in part due to the impact of race98. Factors contributing to the efficacy of thiazide diuretics are also influenced by ancestry, and some have been partly characterized (e.g., polymorphisms in the G-protein β3 subunit)99–100.
As noted earlier, lipid-lowering therapy is also a pivotal component in the overall prevention and treatment of cardiovascular disease, and prior work with biological candidate genes has clearly demonstrated that both pharmacodynamic and pharmacokinetic markers are associated with the degree to which statins lower LDL cholesterol (HMGCR, and CYP3A4 and CYP3A5, respectively)101–103. In 2005, we reported that variability in CYP3A5 was associated with the severity of statin-related muscle toxicity104, and distribution of the causative allele (CYP3A5*3) is strongly influenced by ancestry. The CYP3A family of enzymes catalyzes the oxidative Phase I metabolism of nearly half of all known therapeutic agents105. Interindividual variability in CYP3A activity exceeds 80%106, and major interethnic differences in the pharmacokinetics of some CYP3A substrates have been reported107. Allele frequency differences for representative variants in CYP3A4 and CYP3A5 are shown in Table 2. Across the pharmacogenetic functional variants investigated, allele frequencies within the admixed African American population are intermediate between those of the respective ancestral (European and African) populations, confirming our initial hypothesis that admixed populations require a different drug evaluation strategy than the relatively homogeneous ancestral populations.
It is important to recognize that variability in CYP-dependent drug oxidation only represents a single component of an individual’s genetically predetermined capacity for drug disposition (Wilke 2005). For example, many statins (and CYP-derived hydroxyl-statin intermediates) undergo additional modification through Phase II conjugation. Promoter SNPs in UGT1A1 are associated with Gilbert’s disease, and some predict outcome in the context of cancer drugs95. Recent data indicate that UGT1A3 variants impact statin kinetics108.
Membrane transporters are also known to influence statin kinetics. Genetic variability in the organic anion transporter (solute carrier) SLCO1B1 has been associated with differential modulation of the cellular transport of statins into a variety of tissues, making it very tempting to speculate that SLCO1B1 genotype might influence subject risk for the development of statin-related outcomes. To date, one of the most noteworthy successes of the application of genome-wide SNP scanning to the characterization of drug outcome has been the identification of SLCO1B1 variants associated with statin-induced myopathy74. In this effort, the SEARCH collaborative group applied a 317,000 SNP scan (Affymetrix) to combined cases of definite myopathy (serum creatine kinase, CK >10 fold ULN) and “incipient” myopathy (CK >3 fold ULN). A single variant survived statistical correction for multiple testing. After re-sequencing studies revealed that this variant was in linkage disequilibrium with a previously characterized non-synonymous coding variant (V174A) in SLCO1B1, the putative causative allele was further tested for association in a subset of 49 definite myopathy cases from the original study population, revealing an odds ratio for myopathy of 4.5 per copy of the C allele (95% C.I. 2.6 –7.7), in patients exposed to high-dose simvastatin (80 mg daily). Replication efforts conducted in the Heart Protection Study (HPS) cohort, using data from trial participants exposed to a lower dose of simvastatin (40 mg daily) revealed a more modest risk, RR = 2.6 (95% C.I. 1.3 – 5.0)74. Efforts are now underway to quantify the effect of this variant on myopathy risk in a clinical practice-based setting (www.pharmgkb.org). Like other efforts being conducted within the world’s growing biobanks, success will depend upon a thorough consideration of admixture (Figure 4).
There is a range of possible scenarios by which admixture dynamics could impact outcome. Two extreme cases occur when (1) parental populations contribute to the hybrid in a single generation of admixture (i.e., an intermixture or hybrid-isolation model); and (2) there is continuous gene flow across several generations from the ancestral populations to the admixed one (i.e., a continuous gene flow model). It should be emphasized, however, that virtually no admixed human population exhibits the single-time, discrete admixing of the hybrid-isolation model, of the type that can be artificially produced with inbred strains of model organisms. Rather, human admixing is continuous over time, a phenomenon that creates variation in the degree of admixture among individuals in an admixed population. This variation can be large and complicates the analysis of admixed populations, because it creates patterns of linkage disequilibrium that are difficult to interpret in simple association studies109. Current population genetic models and software are often limited to the island model or continuous gene flow, neither of which is likely to accurately represent the complexity of the process by which the majority of admixed human or non-human populations are formed110. Even in the data-rich field of modern genomics, no amount of data can make up for an inappropriate model111.
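The contrast between the two extreme models can be illustrated with a toy simulation. The sketch below is not drawn from the cited work and makes strong simplifying assumptions: each individual is summarized by a single ancestry proportion, a child's proportion is the exact average of two randomly chosen parents (recombination and segregation variance are ignored), population size is constant, and the parameter values (20% pulse, 2.2% migrants per generation, 10 generations) are hypothetical values chosen so that both models end with similar mean ancestry.

```python
import random
import statistics


def next_generation(parents, size, rng):
    """Random mating: a child's ancestry is the mean of two random parents."""
    return [(rng.choice(parents) + rng.choice(parents)) / 2 for _ in range(size)]


def hybrid_isolation(pulse, generations, size, rng):
    """Single admixture pulse, then random mating with no further gene flow."""
    # Founders carry ancestry 1.0 (source population A) or 0.0 (population B).
    pop = [1.0 if rng.random() < pulse else 0.0 for _ in range(size)]
    for _ in range(generations):
        pop = next_generation(pop, size, rng)
    return pop


def continuous_gene_flow(rate, generations, size, rng):
    """Each generation, a fraction `rate` of parents are new migrants from A."""
    pop = [0.0] * size
    n_migrants = round(rate * size)
    for _ in range(generations):
        parents = pop[: size - n_migrants] + [1.0] * n_migrants
        pop = next_generation(parents, size, rng)
    return pop


rng = random.Random(42)
hi = hybrid_isolation(pulse=0.2, generations=10, size=2000, rng=rng)
cgf = continuous_gene_flow(rate=0.022, generations=10, size=2000, rng=rng)

for label, pop in (("hybrid isolation", hi), ("continuous gene flow", cgf)):
    print(f"{label}: mean {statistics.mean(pop):.3f}, "
          f"variance {statistics.pvariance(pop):.5f}")
```

Under these assumptions, individual ancestry in the hybrid-isolation population converges quickly toward the population mean, whereas ongoing gene flow keeps reintroducing recently admixed individuals and so maintains a much wider spread of individual admixture.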
Many complex traits with a large impact on public health are sexually dimorphic (e.g., HDL cholesterol). Males or females may contribute disproportionate amounts of admixture, or in some cases admixture may be restricted to one gender only112. Unequal contributions of the different genders of an ancestral population to an admixed population result in a phenomenon known as ‘gender-biased admixture’. Differential susceptibility to a disease in males and females is a common observation in human populations (a case of parent of origin effect)113–114. Whether gender-biased admixture influences treatment outcome (e.g., the ability of pharmacological intervention to increase circulating HDL cholesterol level) remains to be determined35.
Gender-biased admixture will cause admixture estimates from loci with different patterns of inheritance to differ markedly. Thus, admixture estimates are not pooled from all loci, but are compared between loci with different patterns of inheritance. As a result, if females from a given ancestral population contribute more than males to an admixed population, estimates of admixture from that ancestral population will lie on a gradient: mitochondrial DNA > X chromosome > autosome > Y chromosome. By contrast, if males contribute a greater proportion, the gradient is reversed: Y chromosome > autosome > X chromosome > mitochondrial DNA111.
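The gradient described above follows from how each class of locus is transmitted. As a minimal sketch, assume a simple model in which a fraction f of the founding females and a fraction m of the founding males derive from a given ancestral population; mitochondrial DNA then reflects f, the Y chromosome reflects m, autosomes reflect the average of the two, and the X chromosome is weighted two-thirds toward females. These weights are a standard approximation rather than values taken from the cited work, and the inputs used below are hypothetical.

```python
def expected_admixture(female_contribution, male_contribution):
    """Expected ancestry fraction from one source population, by locus type.

    Simplified model (an assumption, not from the source text):
    mtDNA is maternally inherited, the Y chromosome paternally inherited,
    autosomes equally from both sexes, and two-thirds of X chromosomes
    are carried by females.
    """
    f, m = female_contribution, male_contribution
    return {
        "mtDNA": f,
        "X chromosome": (2 * f + m) / 3,
        "autosome": (f + m) / 2,
        "Y chromosome": m,
    }


# Hypothetical female-biased contribution from the source population.
for locus, value in expected_admixture(0.30, 0.10).items():
    print(f"{locus:>12}: {value:.3f}")
```

Swapping the two arguments reverses the ordering, and equal contributions collapse all four estimates to the same value, which is why a discrepancy between locus classes is informative about gender bias.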
Over the past 500 years, the European explorers, traders, and missionaries who traveled the world were predominantly male. The resulting gender-biased gene flow to African Americans, with elevated European male and African female contributions44, 115–117, has produced an admixed population likely to exhibit male-biased admixture111. A gender-biased pattern of gene flow with an excess of European male and African female ancestry was also shown recently in a study in which the African-American X chromosome showed consistently elevated levels of African ancestry compared with the variability in African ancestry across autosomal regions118. Due to a threefold higher European male contribution compared with European females to the genomes of African American individuals (Y chromosome vs. mtDNA), admixture-based gene discovery will have the most power for the autosomes and will be more limited for X chromosome analysis119. This principle has been illustrated above for warfarin outcomes in the context of an autosomal variant, CYP2C9*5, transmitted through matrilineal African ancestry93. Further, any genome scan aimed at identifying variants on the X chromosome that predict treatment response in African Americans should account for its lower admixture fraction67, 69. The current admixture approach assumes that all neutrally evolving loci give the same estimate of admixture proportions, an assumption that gender-biased admixture violates118.
The term race (often used to imply geographic or genetic ancestry) reflects population clusters based on genetic differences due to evolutionary pressure. These differences could indicate variation in allele frequencies and/or patterns of polymorphism. Inter-racial variations in response to medication were observed nearly a century ago. In 1920, Paskind investigated the effect of atropine sulphate on 20 Caucasian and 20 African American men in Cook County Hospital, Chicago, USA. Initial slowing of the heart rate, reaching a maximum of 10–15 minutes, was observed frequently in European American but not in African American subjects. More recently, self-reported race has been leveraged to direct specific therapies impacting other hemodynamic parameters (both physiologic and pathophysiologic). The combination of hydralazine and isosorbide dinitrate has clearly been associated with greater clinical benefit in patients of specific races, within the context of congestive heart failure (CHF)120–121. While prescribing based upon self-reported race remains controversial, targeted application of such an approach has been driven by well-designed trials in which self-reported black patients randomized to hydralazine and isosorbide dinitrate realized a 43% reduction in mortality compared with standard therapy (95% CI, 11% to 63%)120.
Multiple ancestry-dependent associations and disease susceptibility loci have been reported in African Americans specifically, using admixture mapping for cardiovascular disease122–123, dysmetabolic traits124–125, and cancer66. The success of each effort was partly due to marked differences in the frequency of each respective trait in ancestral populations. However, none of these studies has led to finely mapped genes. Thus, admixture mapping is an intellectually appealing concept, but stringent criteria have yet to be achieved31, and independent replication is pending for these reports. Recently, Reich et al.126 mapped a clinically relevant immune response trait to chromosome 1 and replicated this association in an independent sample, demonstrating that admixture mapping can not only coarsely localize traits but can also fine map a phenotypically important variant.
Ethnicity is defined by commonality in culture and social practices, including diet127. Examining allele frequencies in different ethnic groups can help differentiate functional polymorphisms (i.e., those causing a change in phenotype) from marker polymorphisms (i.e., those in linkage disequilibrium with functional polymorphisms in a selected subset of individuals). As the linkage disequilibrium of a marker polymorphism is unlikely to remain true across the breadth of human diversity, identifying the correct, causative polymorphism is important for designing accurate genetic tests for people of all ethnic backgrounds.
Historically, the North American population was formed by episodic migration flows and subsequent admixture. While the majority of these flows originated in Europe, America’s current demographic makeup is much more diverse. In 2008, the US Census Bureau (www.census.gov/popest/estimates.php) estimated the national population to be 304.3 million. Of this total, 15.4% self-identified as being of Hispanic or Latino origin, 12.8% as African American, and 4.4% as Asian. This heterogeneity is readily apparent in measures ranging from health risk factors to disease prevalence. Admixture among these population groups is ongoing, and the country is becoming a multi-racial genomic mosaic. It is time to develop and standardize treatment and drug testing procedures for this admixed population. Interethnic differences in drug responses are a well-recognized problem resulting in both undertreatment and overtreatment of individuals receiving similar doses of drugs, with the potential for lack of therapeutic effect as well as toxicity. The recent report of significant pharmacogenetic differences in β2-adrenergic receptor polymorphisms and bronchodilator responses to albuterol between the two largest Latino admixed groups, namely Mexicans and Puerto Ricans94, is a striking example of the hazards associated with ignoring admixture within ethnic groups, as is often done in the pharmacogenetic literature.
The genetic architecture underlying treatment response remains largely uncharacterized for most prescribed drugs. As candidate gene and genome-wide approaches are increasingly being applied to archived biological materials from existing clinical trial databases, the world’s growing practice-based biobanks will represent a powerful resource for characterizing the generalizability of these findings within the community. These databases contain large numbers of individuals with admixed ancestry. In the United States, there has been significant intermixing among racial/ethnic groups, creating populations that are a mosaic of multiple continental ancestral populations (European, African, and Native American). This admixture creates long segments of DNA (haplotypes) that have distinguishable ancestral origins; hence, genetic differences in the prevalence of polymorphisms affecting drug-metabolizing enzymes, drug transporters, receptors, and signal transduction mechanisms must be considered when individualizing pharmacotherapy. Failure to adequately control for population stratification due to ancestry may lead to spurious results in large-scale pharmacogenomic association studies.
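The way uncontrolled stratification can generate a spurious association can be shown with a small worked example. The sketch below uses entirely hypothetical numbers: two ancestry strata of equal size that differ in both the carrier frequency of a variant and the baseline risk of an outcome, with no effect of the variant within either stratum. It works with expected proportions rather than sampled data, so the result is deterministic.

```python
def odds_ratio(risk_exposed, risk_unexposed):
    """Odds ratio comparing two outcome risks."""
    return (risk_exposed / (1 - risk_exposed)) / (risk_unexposed / (1 - risk_unexposed))


# Hypothetical strata: (share of cohort, carrier frequency, outcome risk).
# Within each stratum the outcome risk is the same for carriers and
# non-carriers, i.e., the variant has no true effect.
strata = {
    "ancestry A": (0.5, 0.8, 0.30),
    "ancestry B": (0.5, 0.2, 0.10),
}


def pooled_odds_ratio(strata):
    """Odds ratio obtained when ancestry is ignored and strata are pooled."""
    carriers = noncarriers = carrier_events = noncarrier_events = 0.0
    for share, carrier_freq, risk in strata.values():
        carriers += share * carrier_freq
        noncarriers += share * (1 - carrier_freq)
        carrier_events += share * carrier_freq * risk
        noncarrier_events += share * (1 - carrier_freq) * risk
    return odds_ratio(carrier_events / carriers, noncarrier_events / noncarriers)


def stratum_odds_ratios(strata):
    """Within-stratum odds ratios (equal to 1 by construction here)."""
    return {name: odds_ratio(risk, risk) for name, (_, _, risk) in strata.items()}


print("pooled OR:", round(pooled_odds_ratio(strata), 2))
print("stratified ORs:", stratum_odds_ratios(strata))
```

In this constructed example, the pooled analysis suggests that carriers are at elevated risk even though the variant is inert in both strata; the apparent signal arises only because carrier frequency and baseline risk both track ancestry.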
This work was supported by K01HL103165, P30HL10133, U19A170235, U01HG004608, U01HL069757, and R01DK080007.