Genome-wide association studies (GWAS) of antidepressant treatment outcome have been at the forefront of psychiatric pharmacogenetics. Such studies may ultimately help match medications with patients, maximizing efficacy while minimizing adverse effects. The hypothesis-free approach of the GWAS has the advantage of interrogating genes that otherwise would not have been considered as candidates due to our limited understanding of their function, and may also uncover important regulatory variation within the large regions of the genome that do not contain protein-coding genes. Three independent samples have so far been studied using a genome-wide approach: the Sequenced Treatment Alternatives to Relieve Depression sample (STAR*D) (n=1953), the Munich Antidepressant Response Signature (MARS) sample (n=339) and the Genome-based Therapeutic Drugs for Depression (GENDEP) sample (n=706). None of the studies reported results that achieved genome-wide significance, suggesting that larger samples and better outcome measures will be needed. This review discusses the published GWAS, their strengths and limitations, and possible future directions.
The pursuit of genetic predictors of treatment response or adverse events has been, in recent years, the focus of many investigators around the world. These pharmacogenetic studies have aimed not only at discovering clinically useful predictors but also at untangling pathophysiology and shedding light on mechanisms of drug action. So far, few studies have succeeded in identifying genetic markers with enough clinical utility to be used in pre-treatment pharmacogenetic tests. Perhaps the best-known example is warfarin therapy, where three polymorphisms predictive of optimal dose have been identified (Wadelius et al., 2005). The U.S. Food & Drug Administration now reminds prescribers that patients who carry certain alleles may require dose adjustment, although the cost-effectiveness of genetic testing for these alleles is not clear (Patrick et al., 2009).
Major depressive disorder (MDD), an important public health priority that is predicted to be the second leading cause of death and disability in the coming decade (Murray and Lopez, 1996), has been the focus of multiple genetic and pharmacogenetic studies. So far, candidate gene studies have yielded results that lack sufficient predictive power to be useful in a clinical setting. The candidates for these studies were derived from our understanding of depression and of the mechanism of action of antidepressant drugs, both of which are quite limited. Variation in genes involved in the synthesis, transport and metabolism of amines, as well as in pharmacokinetic and pharmacodynamic targets of antidepressants, has been at the forefront of these investigations (reviewed in Drago et al., 2009). With the development of genome-wide platforms, a "hypothesis-free" approach that interrogates, at least in part, all known genes became the favored strategy for uncovering novel and more clinically representative candidates. This review discusses the results and limitations of genome-wide association studies of antidepressant treatment in major depressive disorder and potential future directions in this field.
Three antidepressant outcome studies have implemented genome-wide approaches to detect common variation associated with antidepressant response: the Sequenced Treatment Alternatives to Relieve Depression study (STAR*D) (n=1953) (Garriock et al., 2010), the Munich Antidepressant Response Signature study (MARS) (n=339) (Ising et al., 2009), and the Genome-based Therapeutic Drugs for Depression study (GENDEP) (n=706) (Uher et al., 2010). These studies differ considerably in design, genotyping platform, sample size, and outcome measures. None of the studies reported findings that were genome-wide significant. Nevertheless, "top hits" were reported and interpreted in the context of their plausible biological roles in antidepressant response. Table 1 compiles the "top hits" from all three studies.
The Sequenced Treatment Alternatives to Relieve Depression study (STAR*D) is the largest antidepressant trial conducted to date. It included 4041 participants, of whom 1953 provided DNA samples (Garriock et al., 2010). Although STAR*D was not originally designed for pharmacogenetic studies, its thorough characterization of efficacy and tolerability, together with treatment with a single drug (citalopram) in the first level, provided a large sample suitable for GWAS (for a review of strengths and limitations, see Laje et al., 2009). STAR*D aimed to assemble a sample representative of the diverse ancestry of patients in the United States. From a genetic standpoint, this diversity requires corrections during analysis that adjust association signals for differences in allele frequencies between people of differing ancestry (population stratification). The genotyping platforms used in STAR*D, now considered obsolete, introduced a number of genotyping problems that required very strict quality control measures, which in turn discarded a large number of markers.
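A Cochran-Mantel-Haenszel (CMH) test can illustrate the ancestry adjustment described above. It compares responders with non-responders inside each ancestry group and then combines the groups. The sketch below uses hypothetical allele counts. It is not necessarily the correction used in the STAR*D analyses, because genome-wide studies more often adjust for ancestry using principal components of the genotype data. Allele frequencies and response rates both differ between the two groups here, so a pooled test reports an association that disappears once ancestry is accounted for.

```python
import math


def cmh_test(strata):
    """Cochran-Mantel-Haenszel test over 2x2 allele-count tables, one per stratum.

    Each stratum is ((a, b), (c, d)):
        a = minor alleles in responders,      b = major alleles in responders
        c = minor alleles in non-responders,  d = major alleles in non-responders
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    diff_sum = 0.0  # sum over strata of (observed a - expected a)
    var_sum = 0.0   # sum over strata of the hypergeometric variance of a
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        expected_a = (a + b) * (a + c) / n
        var_a = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected_a
        var_sum += var_a
    stat = diff_sum ** 2 / var_sum
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: within each ancestry group the minor-allele frequency is
# the same in responders and non-responders, but the groups differ in both
# allele frequency and the proportion of responders.
group_1 = ((80, 320), (40, 160))    # minor-allele frequency 0.2 in both
group_2 = ((120, 80), (240, 160))   # minor-allele frequency 0.6 in both

# Ignoring ancestry merges the two groups into a single table.
pooled = ((80 + 120, 320 + 80), (40 + 240, 160 + 160))

naive_stat, naive_p = cmh_test([pooled])
adjusted_stat, adjusted_p = cmh_test([group_1, group_2])
print(f"pooled p={naive_p:.2e}, ancestry-stratified p={adjusted_p:.2f}")
```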
The STAR*D results were derived from Level 1, in which all participants were treated with citalopram. Two main phenotypes were defined. Response was defined as a 50% or greater reduction in Quick Inventory of Depressive Symptomatology-Self Report (QIDS-SR) score from baseline to final visit. Remission was defined as a QIDS-SR score of 5 or less at follow-up. The response phenotype included responders (n=883) and non-responders (n=608). The remission phenotype included 743 remitters, but the comparison group was limited to the non-responders (n=608).
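The two STAR*D Level 1 phenotypes can be written as simple rules on QIDS-SR scores. The minimal sketch below assumes one baseline and one final score per participant, and its function names are hypothetical. It also encodes the remission comparison, which contrasts remitters with non-responders and sets aside responders who did not remit.

```python
def qids_outcomes(baseline, final):
    """Classify one participant with the STAR*D Level 1 definitions.

    response:  >= 50% reduction in QIDS-SR score from baseline to final visit
    remission: QIDS-SR score of 5 or less at follow-up
    """
    if baseline <= 0:
        raise ValueError("baseline QIDS-SR score must be positive")
    reduction = (baseline - final) / baseline
    return {"response": reduction >= 0.5, "remission": final <= 5}


def remission_analysis_group(outcome):
    """Assign a participant to the remission comparison.

    Remitters are compared with non-responders only. Responders who did not
    remit are excluded (None). Giving remission precedence for someone who
    remits without a 50% reduction is an assumption of this sketch.
    """
    if outcome["remission"]:
        return "remitter"
    if not outcome["response"]:
        return "non-responder"
    return None


print(qids_outcomes(20, 4))  # {'response': True, 'remission': True}
```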
The results reported by Garriock et al. (2010) and reviewed by McMahon (2010) identified SNPs associated with response near the Ubiquitin protein ligase E3C (UBE3C) gene (rs6966038, p=4.65×10⁻⁷) and the Bone morphogenetic protein 7 (BMP7) gene (rs6127921, p=3.45×10⁻⁶), and a third, intronic SNP in the RAR-related orphan receptor alpha (RORA) gene (rs809736, p=8.19×10⁻⁶). These SNPs were also associated with remission.
The Munich Antidepressant Response Signature study (MARS; n=339) (Ising et al., 2009) used a replication sample of similar design, as well as STAR*D, to validate its findings. Replication in independent samples is a powerful approach and may yield more generalizable results. However, the discovery sample (n=339) may not be large enough to detect markers with small effects. MARS defined three outcome phenotypes:
- early partial response, a 25% or greater reduction in Hamilton Depression Rating Scale (HAM-D) score after 2 weeks of treatment;
- response, a 50% or greater reduction in HAM-D score after 5 weeks of treatment;
- remission, a HAM-D score of 10 or less after 5 weeks or before discharge.

All MARS participants were inpatients, in contrast with STAR*D and GENDEP, which enrolled only outpatients.
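The three MARS phenotypes differ in time point as well as threshold, and the sketch below makes both explicit. It assumes HAM-D scores are available at baseline, at week 2, and at week 5 (or at the last assessment before discharge). The function name is hypothetical.

```python
def mars_outcomes(baseline, week2, final):
    """Apply the three MARS outcome definitions to HAM-D scores.

    final is the week-5 score or, for a patient discharged earlier, the last
    score before discharge (how that score is chosen is an assumption here).
    """
    if baseline <= 0:
        raise ValueError("baseline HAM-D score must be positive")

    def reduction(score):
        # Fractional improvement relative to baseline
        return (baseline - score) / baseline

    return {
        "early_partial_response": reduction(week2) >= 0.25,  # week 2
        "response": reduction(final) >= 0.5,                  # week 5
        "remission": final <= 10,                             # week 5 / discharge
    }


print(mars_outcomes(28, 20, 13))
```

In this example, mars_outcomes(28, 20, 13) classifies the patient as an early partial responder and as a responder, but not as a remitter. The three phenotypes can therefore disagree for the same patient.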
The MARS results included two markers. The first, rs6989467 in the Cadherin-17 gene (CDH17), was associated with early partial response (p=7.6×10⁻⁷). The second, rs1502174 in the Ephrin type-B receptor 1 gene (EPHB1) (p=8.5×10⁻⁵), was associated with all three phenotypes (early partial response, response and remission).
The GENDEP study (Uher et al., 2010) was the only one of the three expressly designed for pharmacogenetics. It too was smaller (n=706) than would be required to detect associations with small effects. Participants were treated with either escitalopram (n=394) or nortriptyline (n=312). The study's outcome was the percentage change in Montgomery–Asberg Depression Rating Scale (MADRS) score from baseline to week 12. It tested for genetic association in the whole sample, in the escitalopram-treated cohort, and in the nortriptyline-treated cohort, and it also tested for a genotype-by-drug interaction.
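The GENDEP outcome and the genotype-by-drug test can be sketched as a linear model. In this model, the interaction coefficient measures how much the per-allele effect on percentage change differs between the two drugs. The sketch is a minimal illustration with several assumptions: additive genotype coding (0, 1 or 2 minor alleles) and a 0/1 drug indicator. It returns point estimates only and omits covariates and significance testing. It is not the GENDEP authors' actual model, and the function names are hypothetical.

```python
def percent_change(baseline, week12):
    """Percentage change in MADRS score from baseline to week 12 (negative = improvement)."""
    return 100.0 * (week12 - baseline) / baseline


def ols(rows, y):
    """Ordinary least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry into the pivot row
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def genotype_by_drug_fit(genotypes, drugs, outcomes):
    """Fit outcome ~ genotype + drug + genotype:drug.

    genotypes: minor-allele counts (0, 1, 2); drugs: 0 = escitalopram,
    1 = nortriptyline (coding chosen for illustration).
    Returns [intercept, genotype, drug, genotype_x_drug].
    """
    rows = [[1.0, g, d, g * d] for g, d in zip(genotypes, drugs)]
    return ols(rows, outcomes)


# Hypothetical, noise-free data generated from known coefficients.
genotypes = [0, 1, 2, 0, 1, 2]
drugs = [0, 0, 0, 1, 1, 1]
outcomes = [-40 - 5 * g + 3 * d + 6 * g * d for g, d in zip(genotypes, drugs)]
print(genotype_by_drug_fit(genotypes, drugs, outcomes))
```

In this coding, the genotype coefficient is the per-allele effect under escitalopram. The sum of the genotype and interaction coefficients is the per-allele effect under nortriptyline.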
GENDEP reported results of interest for two markers. The first, rs2500535 in the Uronyl 2-sulphotransferase gene (UST) (p=3.56×10⁻⁸), was associated with nortriptyline response. The second, rs1126757 in the interleukin-11 gene (IL11) (p=2.83×10⁻⁶), was associated with escitalopram response.
None of the GWAS results survive standard genome-wide correction for the number of markers and phenotypes tested, and the top results do not overlap at the level of either alleles or genes. There are many possible explanations for this, which we discuss in the following sections.
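A Bonferroni-style adjustment shows one simple reason why the top hits fall short. It divides the 5×10⁻⁸ genome-wide threshold by the number of outcome analyses each study ran. The counts below come from the phenotypes and analyses described for each study: two for STAR*D, three for MARS and four for GENDEP. Treating these analyses as independent is conservative, because response and remission are correlated. This is an illustration, not the correction any of the studies actually applied.

```python
GENOME_WIDE_ALPHA = 5e-8  # conventional single-phenotype genome-wide threshold

# Top hit per study as reported, with the number of outcome analyses
# used here for the adjustment (illustrative counts).
top_hits = {
    "STAR*D rs6966038": (4.65e-7, 2),  # response, remission
    "MARS rs6989467": (7.6e-7, 3),     # early partial response, response, remission
    "GENDEP rs2500535": (3.56e-8, 4),  # whole sample, escitalopram, nortriptyline, genotype-by-drug
}


def survives(p_value, n_analyses, alpha=GENOME_WIDE_ALPHA):
    """Bonferroni-style check: divide the genome-wide threshold by the number of analyses."""
    return p_value < alpha / n_analyses


for name, (p, k) in top_hits.items():
    print(f"{name}: p={p:.2e}, adjusted threshold={GENOME_WIDE_ALPHA / k:.2e}, "
          f"passes single-analysis threshold={p < GENOME_WIDE_ALPHA}, "
          f"passes adjusted threshold={survives(p, k)}")
```

The GENDEP marker passes the single-analysis threshold but not the adjusted one, which shows why the number of phenotypes tested matters.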
One very important issue in all genetic studies is phenotype definition. Studies of antidepressant treatment outcome involve two primary phenotypes: depression and antidepressant outcome. Neither is trivial to define. Major depressive disorder is clearly defined in the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) and in the International Classification of Diseases (ICD-10). However, these definitions are designed to be reliable and relatively broad, consistent with the needs of clinicians, whereas genetic studies work best with phenotypes that are not just reliable but also biologically valid and relatively narrow.
Antidepressant treatment response has no universally agreed definition. Studies therefore differ in how they define and measure outcome, and inter-study reliability is unknown. To compound this problem, remission in MDD can occur spontaneously, and placebo reductions in symptom severity of up to 40% may occur (Khan et al., 2003). It can therefore be difficult to demonstrate that antidepressants are efficacious except in relatively severe depression (Fournier et al., 2010). Variable treatment adherence, medication tolerability, and many typically unmeasured variables, such as adverse life events, pose further complications. Ideally we would have quantitative, biological measures of response, but we are instead forced to work with arbitrary clinical measures.
Patients with major depression often have other medical and psychiatric problems. Large studies such as STAR*D have identified clinical variables associated with poorer outcome, such as the frequency and severity of concomitant medical illness and an anxious depression subtype (Trivedi et al., 2006). Other variables, such as personality disorders and alcohol and drug abuse, have also been reported as relevant to antidepressant treatment outcome (reviewed in Serretti et al., 2008). To the extent that these factors vary across samples, they may produce very different association results. On the other hand, it might be unrealistic to conduct a study that accounts for all of these factors. Such a study would need a very large number of participants, and an even larger number of potential participants would have to be screened. The findings would also generalize poorly, because the large majority of MDD patients who require treatment are likely to have one or more complicating factors. The samples in the available pharmacogenetic studies represent "typical" MDD patients. This is reasonable, but it probably has significant implications for the strength of any genetic association signals these samples produce.
The well-known serotonin-transporter-linked polymorphic region (5-HTTLPR) illustrates the impact of phenotype definition. The serotonin transporter gene (SLC6A4) is a natural candidate for antidepressant response, since it encodes the proximal target of selective serotonin reuptake inhibitors. Multiple studies and a subsequent meta-analysis by Serretti et al. (2007) suggested an association between the 5-HTTLPR and antidepressant outcome, but the signal has not been universal. For example, another meta-analysis that included a larger cohort found no association (Taylor et al., 2010). Other studies (e.g., Murphy et al., 2004) had suggested a role for the 5-HTTLPR in antidepressant side effects. Hu et al. (2007) examined both tolerability and outcome in the STAR*D sample and found an association only with tolerability, not with outcome. Participants who could not tolerate citalopram had lower response and remission rates. Pharmacogenetic studies of antidepressant response should therefore consider tolerability when defining outcome phenotypes.
Varying phenotype definitions have a direct but complex impact on statistical power. More stringent criteria or narrower phenotype definitions will likely yield fewer cases to analyze, usually reducing power. However, a more selective definition may strengthen association signals (and thus true power) by reducing heterogeneity. There is always a balance to strike between narrow phenotypes and larger sample sizes, and the ideal solution will vary from study to study.
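An approximate power calculation for an allelic test makes this trade-off concrete. Every frequency and sample size in the scenario below is a hypothetical assumption chosen for illustration. The broad definition admits twice as many responders, but only half of them carry the genetic effect, which dilutes the allele-frequency difference. The narrow definition keeps only that subgroup. The calculation uses a normal approximation that treats each person's two alleles as independent observations.

```python
from statistics import NormalDist


def allelic_power(p_case, p_control, case_alleles, control_alleles, alpha=5e-8):
    """Approximate power of a two-sided two-proportion z-test on allele frequencies."""
    se = (p_case * (1 - p_case) / case_alleles
          + p_control * (1 - p_control) / control_alleles) ** 0.5
    z = abs(p_case - p_control) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z - z_crit) + NormalDist().cdf(-z - z_crit)


# Hypothetical scenario (all values are assumptions):
P_NON_RESPONDER = 0.30  # risk-allele frequency in non-responders (controls)
P_SUBGROUP = 0.40       # frequency in the genetically influenced responders
CONTROLS = 600          # number of non-responders

# Broad definition: 900 responders, half of them from the subgroup, so the
# observed frequency is diluted to 0.5 * 0.40 + 0.5 * 0.30 = 0.35.
broad = allelic_power(0.5 * P_SUBGROUP + 0.5 * P_NON_RESPONDER, P_NON_RESPONDER,
                      2 * 900, 2 * CONTROLS)

# Narrow definition: only the 450 subgroup responders qualify.
narrow = allelic_power(P_SUBGROUP, P_NON_RESPONDER, 2 * 450, 2 * CONTROLS)

print(f"broad definition power:  {broad:.3f}")
print(f"narrow definition power: {narrow:.3f}")
```

With these particular numbers the narrow definition has higher power. With less heterogeneity, or with a narrow definition that also excludes some true cases, the balance can tip the other way.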
To maximize the information extracted from smaller cohorts of patients, clinical trials routinely use methods such as mixed models to infer or impute outcome information when some data are missing. This approach may be useful in clinical trials, but pharmacogenetic studies rest on different assumptions. Phenotypic imputation increases the probability of case (here, responder) misclassification. Case misclassification decreases the power to detect a genetic association more than control misclassification does (Edwards et al., 2005). Power to detect genetic associations depends not only on effect size (as in clinical trials) but also on allele frequency and marker coverage. In the next section, we discuss genotyping issues as they relate to power to detect association in the three GWAS of antidepressant response published to date.
Genotyping technologies have experienced exponential growth over the past few years. In 2005, state-of-the-art technologies could genotype 100,000 markers, then considered a large number, in a single run at an affordable price. Today, we can routinely genotype 2.5 million markers in one experiment, covering common variants (as the older platforms did) as well as rare and copy number variants. Because these technologies evolve so rapidly, the three published antidepressant response samples were genotyped on several different platforms, all of which would now be considered obsolescent.
The STAR*D sample used the Affymetrix 500K and 5.0 arrays, MARS used the Illumina exon-centric Human-1 and 330 arrays, and GENDEP used the Illumina 610 array. Each of these arrays contains a different number of markers, and marker selection methods also varied. As a result, the degree to which various parts of the genome are interrogated (known as the genomic coverage) varies widely. Genotyping platforms have become more reliable as the technology has evolved. Earlier arrays have higher rates of genotyping errors, which may generate spurious associations when not well handled and reduce genomic coverage even when well handled. For example, Garriock et al. (2010) report that the arrays they used provided less than 50% genomic coverage in the STAR*D sample.
Genomic coverage is important because it has a major impact on power to detect association. Markers on genome-wide arrays are meant to tag nearby functional variation in genes. The markers themselves are generally not functional, so power to detect association with an ungenotyped functional variant depends on how well the marker tags that variant. This is usually measured by the linkage disequilibrium (r²) between the two. To a first approximation, testing a marker with a given r² is equivalent to testing the functional variant itself in a sample r² times as large. If the marker tags the variant perfectly, power is what would be expected from sample and effect sizes. If the marker tags the variant only very poorly, power remains poor even in a very large sample.
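A common first-order approximation treats a marker in linkage disequilibrium r² with an ungenotyped functional variant like the variant itself tested in a sample r² times as large. Under this approximation, the test's noncentrality parameter is multiplied by r². The sketch applies the approximation to a hypothetical noncentrality of 40 at the functional variant and a threshold of 5×10⁻⁸. It ignores genotyping error and other refinements.

```python
from statistics import NormalDist


def power_1df(ncp, alpha=5e-8):
    """Power of a two-sided 1-df test whose z-statistic is N(sqrt(ncp), 1) under the alternative."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


NCP_AT_FUNCTIONAL_VARIANT = 40.0  # hypothetical value if the variant itself were genotyped

for r2 in (1.0, 0.8, 0.5, 0.2):
    # Approximation: a tag marker with LD r^2 scales the noncentrality by r^2,
    # so matching the power of a direct test needs about 1/r^2 times the sample.
    print(f"r^2={r2:.1f}  power={power_1df(NCP_AT_FUNCTIONAL_VARIANT * r2):.3f}  "
          f"sample-size inflation={1 / r2:.2f}x")
```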
In recent years, in silico imputation of common variants has become routine in GWAS. This method uses a statistical algorithm that assigns the most likely genotypes based on haplotype structure in both the study sample and a reference set (Li et al., 2009). Imputation can cost-efficiently increase the number of common markers available to over 2 million, facilitating direct comparisons between samples genotyped on different platforms. Unlike phenotype imputation, which we criticized above, marker imputation can be verified by direct genotyping of the sample under study.
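The toy function below is a deliberately simplified illustration of the idea behind genotype imputation. It fills untyped sites in a phased haplotype by copying alleles from the reference haplotypes that best match at the typed sites. This nearest-haplotype rule is a stand-in chosen for readability and is not the algorithm used by imputation software. Such software typically models the target as a mosaic of reference haplotypes with a hidden Markov model and outputs genotype probabilities. The function name and the reference panel are hypothetical.

```python
def impute_haplotype(target, reference_panel):
    """Fill missing alleles (None) in a haplotype from the closest reference haplotypes.

    Toy rule: count mismatches against each reference haplotype at the typed
    sites, keep the best-scoring references, and take a majority vote at each
    untyped site (ties go to allele 0).
    """
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref):
        return sum(ref[i] != target[i] for i in typed)

    best = min(mismatches(ref) for ref in reference_panel)
    closest = [ref for ref in reference_panel if mismatches(ref) == best]

    imputed = []
    for i, allele in enumerate(target):
        if allele is not None:
            imputed.append(allele)  # keep directly genotyped alleles
        else:
            ones = sum(ref[i] for ref in closest)
            imputed.append(1 if 2 * ones > len(closest) else 0)
    return imputed


# Hypothetical reference panel: 0 = reference allele, 1 = alternate allele.
panel = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
print(impute_haplotype([0, None, 1, 1], panel))  # [0, 0, 1, 1]
```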
Inadequate sample size is perhaps the most common reason for a GWAS to fail. A typical GWAS carries a large multiple-testing burden, and the effect sizes most often observed for genetic associations are modest, so large samples are required to achieve statistical significance. To put this in perspective, for a marker to achieve a corrected statistical significance of about p=0.05 in a GWAS, its uncorrected p-value must be under 5×10⁻⁸ (Dudbridge and Gusnanto, 2008). In a typical GWAS of 1 million markers, about 100 markers will return a p-value of 10⁻⁴ or less purely by chance, and about one will return a p-value of 10⁻⁶ or less.
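The arithmetic behind the multiple-testing burden is simple. When no marker is truly associated, p-values are uniformly distributed, so the expected number of markers reaching a threshold is the number of markers times the threshold. The sketch assumes 1 million tests. It also shows that a Bonferroni correction of 0.05 over 1 million tests gives the 5×10⁻⁸ threshold.

```python
def expected_null_hits(n_markers, p_threshold):
    """Expected number of markers reaching p_threshold when no marker is truly associated.

    Under the null hypothesis each p-value is uniform on (0, 1), so each marker
    reaches the threshold with probability p_threshold. Correlation between
    markers changes the variance of this count but not its expectation.
    """
    return n_markers * p_threshold


def bonferroni_threshold(family_alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


m = 1_000_000
for threshold in (1e-4, 1e-6, 5e-8):
    print(f"p <= {threshold:g}: about {expected_null_hits(m, threshold):g} chance hits")
print(f"Bonferroni threshold for {m:,} tests: {bonferroni_threshold(0.05, m):g}")
```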
Of the issues discussed so far (phenotype definition, genotyping, and sample size), inadequate sample size is often the hardest to remedy. For this reason, meta-analytic techniques that combine results across many samples are rapidly gaining popularity in the GWAS field (de Bakker et al., 2008). Meta-analysis may provide a more definitive answer about the effects of common genetic variation on antidepressant treatment outcome. Any meta-analysis, however, has to overcome significant differences between studies in design, phenotype definition, and genotyping. Relevant design differences include treatment setting, medications used, outcome measures, depression severity, tolerability assessment, and treatment adherence, among others.
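The simplest way to combine per-marker results across samples is an inverse-variance-weighted fixed-effect meta-analysis, sketched below with hypothetical effect estimates. The method assumes that all studies report effects for the same (aligned) effect allele and that they estimate one common underlying effect. Given the between-study differences in design and phenotype, a random-effects model and heterogeneity statistics such as Cochran's Q or I² would usually be examined as well.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one marker.

    Assumes every study reports its effect for the same effect allele and that
    all studies estimate one common effect.
    Returns (pooled beta, pooled standard error, z, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled_se, z, p_value


# Hypothetical per-study log odds ratios and standard errors for one marker.
print(fixed_effect_meta([0.20, 0.10, 0.15], [0.10, 0.10, 0.05]))
```

More precise studies (smaller standard errors) receive proportionally more weight. The pooled standard error is therefore smaller than that of any single study, which is where the gain in power comes from.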
The most difficult problems to overcome are those of phenotype definition. Studies differ in the information they gathered, and consensus is needed on a definition that captures the essence of antidepressant response and is applicable across studies. Several factors limit what a response phenotype can establish: the lack of placebo arms, spontaneous remission, and treatment intervals that may be shorter than the time an antidepressant needs to exert its therapeutic effect. Paradoxically, this may make "non-response" the more reliable definition to employ. To maximize power, it may be tempting to use definitions that rely on imputed outcomes, as discussed above. This may increase the number of individuals available for analysis, but the resulting misclassification may reduce power to detect true associations. Well-defined non-responders to one or two adequately dosed antidepressants may provide a more solid phenotype to test. In any case, findings from meta-analyses of large samples should be promising candidates for replication studies. If replicated, they would be candidates for further studies devoted to characterizing the genes and pathways involved.
Does the lack of consistent results in the published GWAS of antidepressant response reflect a lack of genetic effects? This seems unlikely. Although the heritability of antidepressant response is not known, response does seem to run in families (Franchini et al., 1998), suggesting a genetic component. Larger sample sizes are clearly needed. Meta-analyses that include the available large samples provide a possible solution, potentially together with some of the many smaller samples already held by industry. It is also possible that common genetic variation accounts for only a small proportion of individual variation in antidepressant response. In that case, a more complete picture will come from studies of less common alleles, generally using newer arrays that include rare markers, or large-scale sequencing.
Psychotropic drugs such as fluoxetine may affect RNA editing (Englander et al., 2005), and non-coding RNAs influence gene expression (He and Hannon, 2004). In light of this evidence, we may also need to broaden our horizons to include transcriptomic and proteomic data. These methods are currently better suited to in vitro and post-mortem studies, which yield only limited information about what is happening in different areas of the living brain.
The gap between post-mortem studies and depressed patients may be bridged by neuroimaging. Novel methods that integrate genetics with neuroimaging can provide some insights into the in-vivo functional consequences of genetic variation, such as potential regulatory interactions between genes, for example, in the serotonin pathways (HTR2A and SLC6A4) (Laje et al., 2010).
Another source of variation in antidepressant efficacy may be parent-of-origin and/or environmental factors that influence epigenetic variation. Epigenetic factors may be heritable and can influence gene expression. Recent findings suggest that antidepressants can affect epigenetic signatures, and that agents that modify epigenetic signatures can exert antidepressant effects (reviewed in Duman and Newton, 2007).
In summary, the next decades should bring significant progress in our understanding of antidepressant response and tolerability. This progress will require a significant effort in sample collection, characterization and clinical validation; genomics, transcriptomics and proteomics; and neuroimaging. For the promise of personalized medicine to be realized in the realm of antidepressant treatment, the field must produce clinically meaningful findings and not just statistically significant associations. Success could bring more personalized care, more effective medications, fewer adverse events, and a reduction in the burden of major depressive disorder for both patients and society.
Competing interests: Drs. Laje and McMahon are listed as co-inventors in patent applications filed by the NIH related to genetic markers of antidepressant treatment outcome. Under federal law the NIH is required to pay inventors a portion of any royalties the NIH receives through licenses granted under patents. Funded by the Intramural Research Program of NIMH and K99MH085098 (GL), NIH, US DHHS. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.