As genotyping technology has progressed, genome-wide association studies (GWAS) have matured into efficient and effective tools for mapping genes underlying human phenotypes. Recent studies have demonstrated the utility of the GWAS approach for examining pharmacogenomic traits, including drug metabolism, efficacy, and toxicity. Application of GWAS to pharmacogenomic outcomes presents unique challenges and opportunities. In the current review, we discuss the potential promises and potential caveats of this approach specifically as it relates to pharmacogenomic studies. Concerns with study design, power and sample size, and analysis are reviewed. We further examine the features of successful pharmacogenomic GWAS, and describe consortia efforts that are likely to expand the reach of pharmacogenomic GWAS in the future.
Since 2005, genome-wide association studies (GWAS) have matured into a powerful tool to identify single nucleotide polymorphisms (SNPs) that can be reproducibly associated with a variety of human phenotypes. Currently, well over 300 papers have reported significant associations of common variants with a range of phenotypes and diseases. These successes have provided numerous insights into the relationship among genetic variants, biological pathways, and human traits, and have shown how proper study design and analysis can lead to the success of GWAS. A key lesson from this first generation of GWAS is that no single approach will be appropriate for all phenotypes.
The genetics of drug-response outcomes, broadly referred to here as pharmacogenetic/pharmacogenomic outcomes, are a particular category of phenotypes that present unique challenges and opportunities in gene discovery. In this review we discuss the advantages and limitations of GWAS as applied to pharmacogenomic outcomes. Some of these challenges are variations on general concerns for disease gene identification, whereas others are unique to pharmacogenomic outcomes.
Like studies of disease phenotypes, the success of any pharmacogenomic GWAS will depend on the effect size and allele frequency of genetic variants that influence the trait, the sample size available to detect those variants, the population under study (treatment protocol, dosage, patient features including self-reported race/ethnicity, etc.), and study design (observational study or randomized controlled trial). Unlike most disease phenotypes, pharmacogenomic outcomes often have clear, clinically defined phenotypes and well understood mechanisms that may underlie variation in drug response, including known systems of transport and metabolism, as well as sites of drug action. In addition, larger genetic effects may exist for pharmacogenomic traits than for disease phenotypes, providing greater statistical power for genetic association studies.
An important potential limitation for pharmacogenomic GWAS is sample size. GWAS for traits like height or QT or complex diseases like diabetes need and benefit from large numbers and currently mega-meta-analyses are identifying and validating associated loci. Such large sample sets are generally not possible for pharmacogenomic outcomes since they usually include by definition both a disease (often with low prevalence) as well as a well-curated drug response phenotype (which further reduces the available study population).
In this article, we discuss key issues for GWAS, including the strengths and limitations of this approach. We then elucidate issues of heightened importance in GWAS of pharmacogenomic traits. We discuss appropriate study designs and analysis strategies, and describe lessons from successful pharmacogenomic GWAS. We end with a discussion of ongoing efforts to develop consortia for the purpose of obtaining large sample sizes for drug response outcomes.
There are clear, well-understood advantages to a genome-wide association approach to phenotype association discovery. GWAS are conventionally intended as an unbiased scan of the genome, interrogating the majority of common genetic variation for disease association. In contrast to a candidate gene approach, whether narrow or broad in scope, GWAS allow the identification of totally novel susceptibility factors that promise to provide us with better biological understanding of phenotypes. There are many candidate mechanisms that drive variability in drug responses (metabolism, transport, targets, target partners, immunologic pathways such as those underlying allergic reactions, etc.), and these have directed many successful candidate gene studies. However, candidate gene studies cannot identify genes outside of the current knowledge of those mechanisms. GWAS allow such novel discovery.
GWAS have distinct advantages as compared to more traditional linkage-based approaches. There are three key general advantages of GWAS approaches for gene identification, each of which is amplified for pharmacogenomic outcomes:
There are additional advantages to GWAS that are more specific to pharmacogenomic outcomes. First, GWAS provide context for understanding the relative importance of genetic contributors to pharmacogenomic traits that may otherwise be unavailable. The genetic component of human phenotypes can be assessed by estimating heritability (the proportion of variation in a trait due to genetic factors) through methods such as variance components analysis, segregation analysis, etc. Each of these methods requires family data, which, as noted above, is usually difficult to collect for pharmacogenomic outcomes.
Another specific application of GWAS in pharmacogenomics is the ability to rule out, with pre-specified confidence intervals, contributions by unidentified genes to a drug response phenotype. Because pharmacogenomic GWAS can directly investigate the role of genetic variation on clinical outcomes, the findings from pharmacogenomic GWAS can be more rapidly translated to clinical practice. As translation to the bedside is one of the goals of pharmacogenomic gene mapping, it is important to ensure that any unanticipated important genetic contribution to variability in a drug response is not missed. Of equal importance is the identification of novel mechanisms, both for drug response and for adverse drug reactions. So, having identified variants in gene X or Y as contributors to a variable drug response, it is key to ensure that there is no other important genetic contributor before mounting a trial. Understanding the influence of genetic variants in drug response can limit unanticipated variability in a drug treatment. The role of GWAS in this process is evident in the evaluation of the genetic component of warfarin dosing. The strong association of variants in VKORC1 and CYP2C9 with stable warfarin dosing was well established [10–12], but before the National Heart, Lung and Blood Institute (NHLBI) in the United States would mount a large clinical trial it was important to determine if there were other genetic variants that also had large effects on stable warfarin dosing. GWAS [13, 14] have now ruled out large contributions by other loci, thereby allowing clinical trials to proceed. Similarly, a GWAS for clopidogrel effect on ADP-induced platelet aggregation identified only one associated locus, at CYP2C9/19, laying the groundwork for design of clinical trials.
As genotyping platforms with increased SNP density become available, the coverage of genetic variation in the human genome will become more complete, providing greater confidence that clinically important genetic effects on pharmacogenomic traits will not be missed. Thus, while many variants in drug metabolism genes that confer large clinical effects have been identified without GWAS (e.g. by well-informed candidate gene studies), even GWAS with “negative” results add this crucial additional information.
Despite the advantage of GWAS studies discussed above, there are important caveats that must be remembered in their design and application. While many of these caveats are true of GWAS in general, the impact of these concerns may be different in pharmacogenomic studies than in general trait mapping.
A key assumption in GWAS is what is known as the common disease/common variant hypothesis. The common disease/common variant hypothesis proposes that most of the genetic risk for common, complex diseases is attributable to relatively common (minor allele frequency > 0.05) polymorphisms. The alternative to the common disease/common variant hypothesis is that multiple rare variants cause disease at high prevalence in the population through a variety of mechanisms. Such variants can represent genetic heterogeneity of variants in a single gene, or multiple rare variants within genes in the same pathway that have cumulative effects. These two hypotheses have important implications: common variants are thought to impart subtle effects on gene function, often through changes to gene regulation. Rare variants may have larger effects on gene function, such as nonsynonymous variants that alter the amino acid sequence of the resulting protein, and as a result lead to large changes in disease risk or trait values. It is likely that both common and rare variants will contribute to common phenotypes, but the relative proportions will influence the appropriate methods for discovering associated variants. The GWAS approach is well powered to detect common variants with modest effects. GWAS is less effective at testing rare variation, a problem that is compounded by the DNA microarrays used in these studies, which have been designed to capture common variation. Even “next generation” GWAS that will reliably interrogate (directly or indirectly) all variation with minor allele frequency > 0.005 may be insufficient to identify enough of the contributory variation to allow us to understand biology if most of that variation has minor allele frequency < 0.005, as the sample sizes required to achieve sufficient statistical power for such effects may be prohibitive.
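The dependence of required sample size on allele frequency can be made concrete with a rough calculation. The sketch below is illustrative only and is not taken from any of the studies reviewed here. It assumes a quantitative trait, an additive 1-df test, Hardy-Weinberg proportions, a normal approximation to power, and the conventional genome-wide threshold of 5e-8. The function name `required_n` and the per-allele effect of 0.2 standard deviations are hypothetical choices made for the example.

```python
from statistics import NormalDist


def required_n(maf, beta_sd, alpha=5e-8, power=0.80):
    """Approximate sample size for a 1-df additive test on a quantitative trait.

    maf      -- minor allele frequency (Hardy-Weinberg proportions assumed)
    beta_sd  -- per-allele effect in trait standard deviations (hypothetical)
    alpha    -- two-sided significance threshold
    power    -- desired probability of detecting the variant
    """
    norm = NormalDist()
    # Fraction of trait variance explained by the variant under an additive model.
    q2 = 2.0 * maf * (1.0 - maf) * beta_sd ** 2
    if not 0.0 < q2 < 1.0:
        raise ValueError("variance explained must lie strictly between 0 and 1")
    # Non-centrality needed for the requested power (normal approximation,
    # ignoring the negligible contribution of the opposite tail).
    ncp = (norm.inv_cdf(1.0 - alpha / 2.0) + norm.inv_cdf(power)) ** 2
    return ncp * (1.0 - q2) / q2


if __name__ == "__main__":
    # Same hypothetical effect size, decreasing allele frequency.
    for maf in (0.30, 0.05, 0.005):
        print(f"MAF {maf:<6} approx. N = {required_n(maf, 0.2):,.0f}")
```

Under these assumptions, holding the per-allele effect fixed, the required sample size grows roughly in proportion to 1 / (MAF × (1 − MAF)), which is why rare variants are only detectable in modest samples if their effects are correspondingly large.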
As “next generation” sequencing becomes more accessible, and whole genome sequencing becomes more affordable, more rare variant analysis will be possible in pharmacogenomics.
An important concern in GWAS for pharmacogenomics is the potential for bias in the selection of genetic variants. Although large numbers of variants with low minor allele frequency (MAF) are included in the densest GWAS platforms, GWAS have little power, given the sample sizes available, to detect significant associations with low-MAF SNPs. Additionally, it is widely recognized that genotype quality is not as high for rare variants as it is for more common variants. Consequently, a common approach is to not assess the significance of associations with rare variants (MAF < 0.01). This further compounds the limited statistical power to detect associations with less common genetic variants. Moreover, SNPs included on high-throughput platforms must pass stringent tests for ease of genotyping, which leads regions with gene duplications (and pseudogenes) to be poorly represented on high-throughput genotyping products, and many of these, such as the CYPs or the HLA locus, are precisely the genes of greatest interest for pharmacogenomic study. The human cytochrome P-450 (CYP) family of genes, which encode enzymes active in xenobiotic metabolism, has been associated with a large number of pharmacogenomic outcomes. These genes are known to be highly polymorphic, with a wide range of allele frequencies across populations, and contain complex structural variation, with unique haplotypic structure and copy number variations. The coverage of these types of variation is limited on current GWAS genotyping platforms.
Experimental design is a crucial component of any successful GWAS, and pharmacogenomic studies have different advantages and limitations than traditional disease studies. The importance of proper definition and collection of phenotype data has become increasingly appreciated in the context of GWAS. An important advantage in pharmacogenomic studies is that multiple response phenotypes are often collected within the same study, such as efficacy and adverse events, allowing a broader dissection of trait genetics in a single study.
However, because all pharmacogenomic outcomes are responses to the environmental exposure of the drug and because these drugs are given in response to a disease condition, there may be complex interactions between disease and drug response relevant in phenotype definition. Precise definitions are essential for both the disease and drug response phenotypes, which are often discrete diagnoses from these complex relationships. For example, in some but not all cases, rare adverse drug reactions may represent a “tail” of response distributions, and where to define that cut-off within the distribution can be a challenge. The SEARCH Collaborative Group demonstrated a successful approach to address this issue by combining subjects with both definite and incipient statin-induced myopathy into a single case definition. In other cases, a rare adverse reaction is an unexpected outcome often unrelated to the desired mechanism of action.
One efficient use of resources to collect pharmacogenomic phenotypes is to collect samples within the context of clinical trials, which streamlines the collection procedures. The use of clinical trial data for GWAS is not only an efficient use of resources, but has the advantage that similarly treated “controls” for the phenotype of interest are built into the trials. However, because some trials are not designed for GWAS mapping, the study designs used for collection may not be ideal for pharmacogenomic analysis (e.g. multiple drugs used in treatment arms). This “challenge” is inherent to the treatment of diseases like cancer or end-stage congestive heart failure, in which it would be unethical to fail to treat patients with the current standard of care. If pharmacogenomic efforts are sub-studies of clinical trials, sample sizes may decrease, which reduces the power of the pharmacogenomic component. Because meeting recruitment targets is a primary goal in most clinical trials, genomic and pharmacogenomic efforts are often included only as sub-studies to which subjects may or may not consent; as a result, the power and generalizability of genomic studies are compromised. Genetic studies added as an afterthought may be viewed as creating a barrier to recruitment and thus may not be a priority for sponsors. Collecting drug response phenotypes in health care systems with electronic medical records is another method of accruing subjects that is now being explored.
Sample size limitations are a challenge in any GWAS, but are amplified in many pharmacogenomic studies. Particularly when studying rare drug reactions or adverse events, it is by definition not feasible to recruit thousands of patients with rare outcomes. This is a particular limitation in pharmacogenomic GWAS, as the replication of association results in independent populations has become the “gold standard” for validation of results. If the collection of a reasonable sample size for a discovery cohort is at the edge of practicality, the collection of a well-powered replication cohort is often impossible. Consortia efforts (discussed below) have been motivated by this limitation: they combine samples from across the world to increase power and potentially to identify replication cohorts that can validate associated signals. However, even the establishment of networks of investigators cannot necessarily overcome these limitations, and the field must look to creative approaches of validation/replication, possibly involving functional studies or examination of related intermediate phenotypes.
There are unique “challenges” associated with validation/replication for pharmacogenomics. Clinical trials are expensive, and every study is unique since each is designed to represent an advance over previous studies and to answer novel therapeutic questions. Therefore, in pharmacogenomics greater emphasis may have to be placed on functional validation of GWAS “signals” and on biological plausibility. Additionally, one must recognize that the larger the sample size, the more likely that features which confound the genotype/phenotype relationship will be undocumented or uncontrolled, thus diluting the “purity” of the phenotype and potentially reducing power.
Besides sample size, there are other practical limitations in study design for pharmacogenomic studies. As mentioned previously, family-based designs are generally impractical with drug response outcomes, which means the field relies heavily on cohort or case-control studies for GWAS. While the number of cases may be limited by event frequency as discussed above, finding and selecting appropriate controls presents additional challenges. While GWAS of common diseases have taken advantage of the use of shared controls across studies, this is not often possible in pharmacogenomic studies, as typically controls must also be exposed to the drug of interest (though this may not be necessary in all cases). Other matching criteria must also be considered, such as disease interactions, population admixture, and additional environmental and clinical exposures.
As GWAS have become more prevalent, methodologies for the analysis and interpretation of results have co-evolved. Many tools have been developed and evaluated in the context of GWAS studies, and have resulted in the many successes seen to date. However, there are still many challenges in the analysis strategies used for GWAS in general, as well as particular challenges for pharmacogenomics, as discussed below.
The majority of previous GWAS have relied on the use of traditional statistical methodologies for analysis, and several tools have become widely used in the field. Software packages such as PLINK have become very popular for implementing logistic regression (for case-control or cohort studies), linear regression (for quantitative traits), and family-based association tests for GWAS.
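To illustrate what a single-SNP case-control test involves, the sketch below implements a simple allelic chi-square test together with a Bonferroni threshold. It is a simplified stand-in for the regression models named above, not a reproduction of how PLINK computes its statistics, and it does not adjust for covariates or population structure. The function names and the genotype counts are hypothetical.

```python
import math


def allelic_test(case_counts, control_counts):
    """Allelic chi-square test (1 df) from genotype counts.

    Each argument is a tuple (n_AA, n_Aa, n_aa), where 'a' is the minor allele.
    Returns (odds_ratio, chi2, p_value) for the minor allele.
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa  # (major, minor)

    case_major, case_minor = allele_counts(case_counts)
    ctrl_major, ctrl_minor = allele_counts(control_counts)
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table of allele counts.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical genotype counts for one SNP: 100 cases, 100 controls.
    odds_ratio, chi2, p = allelic_test((40, 45, 15), (70, 27, 3))
    threshold = bonferroni_threshold(500_000)  # hypothetical number of SNPs tested
    print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p:.2e}")
    print("passes Bonferroni threshold:", p < threshold)
```

With these made-up counts the minor allele is clearly enriched in cases, yet the p-value does not reach a Bonferroni threshold for half a million tests, which is the situation small pharmacogenomic cohorts often face.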
After various types of corrections for multiple testing (Bonferroni, permutation approaches, etc.), results of these analyses are typically prioritized with replication strategies. For single samples, the union of significant results from several analytical approaches (committee-based approaches) or measures of reliability from internal model validation is often used to prioritize robust signals. When more than one sample is available, multistage replication strategies are often employed to discover, prioritize, and validate signals. Finally, when multiple samples are available, meta-analysis is often used to obtain a more comprehensive measure of association signals. Challenges in sample collection (discussed above) can limit the use of such multistage replication and meta-analysis strategies in pharmacogenomics. One alternative approach for replication, or at least prioritization, of association signals in pharmacogenetic studies is to utilize non-clinical GWAS of large collections of human tissue, cell lines, and genetic model organisms.
Such traditional approaches have been very powerful for identifying strong single-locus associations (“low-hanging fruit”) for a wide range of phenotypes in both common diseases and pharmacogenomic outcomes (reviewed below), and are typically applied in a way that fits within the “unbiased” intentions of GWAS. Despite the successes of these approaches, their limitations for detecting and prioritizing more complex models have become a hot topic in the literature.
Although many successful GWAS have been published, the sum of the genetic contributions of associated variants in many common traits is far below the estimated heritability of the traits. Several explanations have been proposed for these gaps in explained heritability. Low power to detect small effect sizes, the presence of rare variants contributing to phenotypes, unmeasured nucleotide or structural variation, complex methylation/epigenetic mechanisms, and gene-gene/gene-environment interactions are all hypothesized to contribute to the unexplained trait variation. In response to these limitations, new analytical approaches are evolving to detect complex genetic risk models, discussed below. These limitations are leading to refinement of methods for GWAS analysis, and these may be especially appropriate for pharmacogenomic studies.
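As a concrete instance of combining evidence across cohorts, the following sketch shows a fixed-effect, inverse-variance-weighted meta-analysis of per-cohort effect estimates for one SNP. This is one common way of pooling results and is offered only as an illustration. It assumes the cohorts estimate the same underlying effect on the same scale (for example, log odds ratios), and the function name and input values are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas            -- per-cohort effect estimates (e.g. log odds ratios)
    standard_errors  -- matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # More precise cohorts (smaller standard error) receive more weight.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical discovery and replication estimates for the same SNP.
    beta, se, z, p = fixed_effect_meta([0.5, 0.3], [0.1, 0.2])
    print(f"pooled beta = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

A fixed-effect model is the simplest choice; when treatment protocols or doses differ between cohorts, as they often do in pharmacogenomics, a random-effects model or a heterogeneity test may be more appropriate.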
While this “unbiased” intent of GWAS studies is to detect potentially new genetic associations that might not have been considered as candidate genes, there has been a recent appreciation for the fact that these simple analytical approaches ignore the large amount of expert knowledge available for a particular outcome. In response, there has recently been rapid development in the use of network and pathway analysis for analysis of GWAS data [30–33]. Literature searches (automated or hand-curated), databases of previous results, etc. are being exploited to improve the power of GWAS. Because there is much known about the mechanism and metabolism of many of the drugs evaluated in pharmacogenomic studies, there is very well directed guidance for such knowledge-driven analysis. The Pharmacogenomics Knowledge Base (PharmGKB) is an important resource and data repository that summarizes and curates drug response/gene relationships via gene variant annotation, hand-curated literature review, and important pharmacogenomic genes and pathways. An example of the potential of pathway-based analysis is discussed below.
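One simple form of pathway analysis asks whether genes from a curated pathway are over-represented among the top-ranked genes of a GWAS. The sketch below uses a one-sided hypergeometric test for that question. It is a generic illustration, not the method used in any study cited here, and it treats genes as independent and equally likely to rank highly, which ignores gene size and linkage disequilibrium. The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p(n_genes, n_pathway, n_top, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_genes    -- total number of genes tested
    n_pathway  -- genes annotated to the pathway
    n_top      -- genes in the top-ranked GWAS list
    n_overlap  -- pathway genes observed in the top-ranked list
    Returns P(X >= n_overlap) when n_top genes are drawn at random.
    """
    upper = min(n_pathway, n_top)
    # Count the ways to draw k pathway genes and (n_top - k) other genes.
    tail = sum(
        comb(n_pathway, k) * comb(n_genes - n_pathway, n_top - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_genes, n_top)


if __name__ == "__main__":
    # Hypothetical numbers: 2,000 genes tested, a 40-gene drug-metabolism
    # pathway, and 6 of its genes among the 50 top-ranked genes.
    print(f"enrichment p = {enrichment_p(2000, 40, 50, 6):.2e}")
```

Curated resources of the kind described above would supply the pathway membership lists; the statistical test itself is independent of where that annotation comes from.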
Arguably the most important demonstrations of the utility and challenges of GWAS studies in pharmacogenomics are the empirical results of successful studies. A brief description of the outcomes evaluated in pharmacogenomic GWAS and the strongest signals identified is listed in Table 1. Details of each study can be found in the references provided.
The potential and drawbacks of an agnostic, unbiased approach for genetic association studies in pharmacogenetics are illustrated by a GWAS of the activity of a well-known polymorphic drug metabolizing enzyme, thiopurine methyltransferase (TPMT), in lymphoblastoid cell lines from the HapMap project. The goal of the experiment was to assess whether the TPMT polymorphism could be “rediscovered” in this fashion. Although common polymorphisms in TPMT were well tagged, and TPMT polymorphisms were associated with TPMT activity, the GWAS indicated that 96 genes were ranked higher than was TPMT itself. The extent to which these higher ranked genes are false vs. true positives is not yet clear, but the result indicates the difficulty of using GWAS approaches even for putatively monogenic traits.
An example of a GWAS for drug pharmacokinetics is provided by an analysis of methotrexate clearance determined in over 3000 courses of the drug given to 434 children with leukemia . Many candidate gene studies have previously been conducted to identify genetic variation associated with methotrexate pharmacokinetic variability, with limited success. Using GWAS, the SLCO1B1 gene was represented by multiple polymorphisms in several LD blocks, a finding that was replicated in an independent cohort of patients, suggesting that there are multiple mechanisms by which alteration of OATP1B1 (encoded by SLCO1B1) could affect methotrexate pharmacokinetics. Although methotrexate had been shown to be an OATP1B1 substrate, it was a rather weak one [38, 39], and so the gene had not risen to the top of candidate gene lists. This finding has implications for both toxicity to methotrexate, and to possible drug interactions with widely used OATP1B1 substrates, such as statins.
The utility of pathway-based analysis is demonstrated by Hartford et al. 2007, who performed a GWAS examining etoposide-induced leukemia with MLL. They prioritized variant associations based on expression results, to identify alterations in three biological pathways: adhesion, Wnt signaling and regulation of actin. Results in an independent validation cohort confirmed the alterations in the adhesion pathway. None of the alterations identified were significant based on traditional association analysis, demonstrating the potential of more complex modeling to identify pathway-level associations.
While most of the published studies identified variants at a genome-wide significance level, many of them found strong potential signals that did not stand up to traditional analyses [41–43]. These negative results may represent true negative results, but it is highly likely that many of these studies were limited by many of the challenges discussed above (power, coverage, etc).
In order to address many of the limitations discussed above, particularly in regards to limited sample sizes and lack of traditional replication cohorts, researchers are successfully combining resources and establishing worldwide collaborations to support large-scale GWAS. Given the complexities of drug response phenotypes, this approach seems especially appealing in the application of GWAS to pharmacogenomics. By combining cohorts from around the globe, pharmacogenomic studies will have higher power to detect and validate response-determining variants.
The SEARCH Collaborative Group demonstrates the success of such a collaboration. The group examined a rare outcome of statin therapy: myopathy, defined as markedly elevated creatine kinase. In its most extreme form, this can result in the potentially fatal adverse effect of rhabdomyolysis, but these cases are exceedingly rare. The SEARCH Collaboration also found that myopathy was rare (~0.1%) with low-dose simvastatin, so they focused their efforts on 98 cases identified in 6031 patients receiving high doses (80 mg/day) of the drug. A GWAS that studied 85 of these cases and 90 controls identified rs4363657, in perfect LD with a known functional non-synonymous SNP in SLCO1B1, at genome-wide significance. The 5-year incidence of myopathy was 18% in individuals homozygous for the risk allele (2.1% of the study group), 3% in heterozygotes, and 0.6% in those with no risk allele. The result was replicated in a separate cohort of patients receiving a lower dose of 40 mg/day (relative risk 2.6 per C allele).
The success of this study illustrates several important points in the study design of pharmacogenomic GWAS. First, large collaborative samples can provide a valuable resource for collecting a critical mass of subjects with a rare phenotype. Second, rare phenotypes are sampled from the extreme tail of drug response distributions. As a result, genetic variants that influence these traits may have larger effect sizes than variants influencing more common outcomes, and may therefore be detectable with small sample sizes. Third, similar outcomes can sometimes be combined into a single case group. Here, in the initial association phase, definite and incipient myopathy patients were considered together. Fourth, replication of an association should take place in a similar population. In this study, the replication cohort was treated with a lower dose, 40 mg of simvastatin daily as compared to 80 mg in the initial group. We note that selecting cases from a lower-dose regimen for a follow-up study may be preferable to the converse (i.e., higher doses in the follow-up cohort), as those cases have a more extreme phenotype (by developing toxicity at a lower dose). This can limit the dilution of the association signal in the confirmatory study.
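The size of the genotype effect in the high-dose cohort can be read directly from the incidences quoted above. The short sketch below simply takes ratios of those three figures; these are crude, unadjusted ratios computed for illustration and should not be confused with the per-allele relative risk of 2.6 reported for the 40 mg/day replication cohort. The function name is hypothetical.

```python
def relative_risks(incidence_by_risk_alleles):
    """Risk in each genotype group relative to the group with no risk allele.

    incidence_by_risk_alleles -- dict mapping number of risk alleles (0, 1, 2)
                                 to cumulative incidence as a proportion
    """
    baseline = incidence_by_risk_alleles[0]
    return {n: risk / baseline for n, risk in incidence_by_risk_alleles.items()}


if __name__ == "__main__":
    # Five-year myopathy incidences quoted in the text for the 80 mg/day cohort,
    # keyed by the number of risk alleles carried.
    incidence = {0: 0.006, 1: 0.03, 2: 0.18}
    for n_alleles, rr in relative_risks(incidence).items():
        print(f"{n_alleles} risk allele(s): {rr:.0f}-fold risk relative to non-carriers")
```

Effects of this magnitude, roughly 5-fold in heterozygotes and 30-fold in homozygotes relative to non-carriers, help explain why fewer than 100 cases were sufficient for a genome-wide significant result.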
Several additional pharmacogenomics consortia have been established to evaluate a number of drug response outcomes, including the International Severe Irinotecan Neutropenia Consortium (http://www.pharmgkb.org/views/project.jsp?pId=69), and the International Tamoxifen Pharmacogenomics Consortium (http://www.pharmgkb.org/views/project.jsp?pId=63). These groups have pooled data from around the world to investigate genetic predictors of drug response with high power and comparison across global populations. While the initial work of these consortia has typically been focused on candidate/known genetic effects, they are moving towards GWAS. For example, the International Warfarin Pharmacogenetics Consortium (IWPC) (http://www.pharmgkb.org/views/project.jsp?pId=56) originally combined data for over 4000 individuals from 24 international sites to develop and test warfarin dosing algorithms, and is currently using the cohort data for a GWAS (through the IWPC-GWAS consortium) to confirm previous findings and potentially discover novel variants that explain trait variation across multiple populations. Such collaborations are extremely important for rare outcomes, such as adverse events. The International Serious Adverse Events Consortium (www.saeconsortium.org) represents one important effort in pharmacogenomics for adverse events, pulling together commercial, academic, and industry partners to collect data for well-powered GWAS.
These combined datasets represent exciting resources for pharmacogenomics GWAS, but are not without challenges. Consistency of data collection, data storage, and data ownership can all be concerns in these collaborative efforts.
Genome-wide association studies have proven to be an exciting tool for gene mapping in common human traits, and are demonstrating their potential in pharmacogenomic outcomes as well. As pharmacogenomic GWAS mature, there is an increased appreciation for issues that are specifically related to these unique phenotypes. Practical considerations, related to study design and available sample sizes highlight the need for creative methods of replication, beyond the traditional replication cohorts that are used for common disease genetics, and the necessity of combining samples across consortia. The complex physiology of drug response outcomes highlights the need for analytical methods that incorporate this complexity, using the wealth of information available about drug mechanisms and pathways.
The authors are grateful to the Pharmacogenetics Research Network (PGRN) Publications Committee for helpful advice. Some of the work described in this review was funded by the NIH/NIGMS Pharmacogenetics Research Network grants U01 GM61393, U01 HL65962, U01 HG04603, U01 GM61388, U01 GM63340, U01 GM61390 and Database (U01 GM61374, http://www.pharmgkb.org). This work was additionally supported by CA21765, the American Lebanese Syrian Associated Charities (ALSAC), and by CureSearch. Dr. Relling receives a portion of the income St. Jude receives from licensing patent rights related to TPMT polymorphisms. This publication was partially supported by NIH/NCRR/OD UCSF-CTSI Grant Number KL2 RR024130. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.