A central issue in any genetic study is phenotype definition. Studies of antidepressant treatment outcome involve two primary phenotypes, depression and antidepressant outcome, and neither is simple to define. Major depressive disorder is clearly defined in the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) and in the International Classification of Diseases (ICD-10), but these definitions are designed to be reliable and relatively broad, consistent with the needs of clinicians. Genetic studies, however, work best with phenotypes that are not only reliable but also biologically valid and relatively narrow.
Antidepressant treatment response has no universally agreed definition, so studies measure outcome in different ways, and the comparability of results across studies is unknown. To compound this problem, remission in MDD can occur spontaneously, and placebo treatment may reduce symptom severity by up to 40% (Khan et al., 2003). As a result, it can be difficult to demonstrate that antidepressants are efficacious except in relatively severe depression (Fournier et al., 2010). Variable treatment adherence, medication tolerability, and many typically unmeasured variables, such as adverse life events, complicate matters further. Ideally we would have quantitative, biological measures of response; instead we must work with arbitrary clinical measures.
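Because studies do not share one outcome definition, the same patients can be labeled differently depending on which definition is chosen. The minimal sketch below classifies hypothetical patients under two kinds of definition often used in antidepressant studies: response as a reduction of at least 50% from the baseline score, and remission as an endpoint score at or below a fixed cutoff. The scores, the cutoff of 7, and the function names are assumptions chosen for illustration only.

```python
# Hypothetical (baseline, endpoint) scores on a clinician-rated depression
# scale where higher means more severe. All values are illustrative only.
patients = {
    "P1": (24, 11),
    "P2": (20, 7),
    "P3": (30, 14),
    "P4": (18, 10),
    "P5": (22, 6),
    "P6": (12, 7),
}


def is_responder(baseline: int, endpoint: int, min_reduction: float = 0.5) -> bool:
    """Relative definition: fractional score reduction of at least min_reduction."""
    return (baseline - endpoint) / baseline >= min_reduction


def is_remitter(endpoint: int, cutoff: int = 7) -> bool:
    """Absolute definition: endpoint score at or below an assumed cutoff."""
    return endpoint <= cutoff


responders = {pid for pid, (b, e) in patients.items() if is_responder(b, e)}
remitters = {pid for pid, (b, e) in patients.items() if is_remitter(e)}

# Patients whose outcome label depends on which definition is used.
discordant = responders ^ remitters

if __name__ == "__main__":
    print("responders:", sorted(responders))
    print("remitters: ", sorted(remitters))
    print("discordant:", sorted(discordant))
```

In this toy data the two definitions are not nested: P6 meets the remission cutoff without a 50% reduction because its baseline score was low, while P1 and P3 show a 50% reduction without reaching the cutoff.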
Patients with major depression often have several co-occurring conditions. Large studies such as STAR*D have identified clinical variables associated with poorer outcome, such as the frequency and severity of concomitant medical illness and the anxious depression subtype (Trivedi et al., 2006). Other variables, including personality disorders and alcohol and drug abuse, have also been reported as relevant to antidepressant treatment outcome (reviewed in Serretti et al., 2008). To the extent that these factors vary across samples, very different association results may be produced. On the other hand, it may be unrealistic to conduct a study that controls for all of these factors by enrolling only patients who are free of them. The number of participants needed would be very large, and an even larger number of potential participants would need to be screened. Moreover, the generalizability of any findings would be very limited, because the large majority of MDD patients who require treatment are likely to have one or more complicating factors. The samples in the available pharmacogenetic studies represent “typical” MDD patients, which is reasonable, but the resulting clinical heterogeneity probably weakens any genetic association signals these samples produce.
An illustration of the impact of phenotype definition is offered by the well-known serotonin-transporter-linked polymorphic region (5-HTTLPR, referred to here as the LPR) in the serotonin transporter gene, SLC6A4. Because this gene encodes the direct target of selective serotonin reuptake inhibitors, it is a natural candidate for antidepressant response. Multiple studies and a subsequent meta-analysis by Serretti et al. (2007) suggested an association with antidepressant outcome, but the signal has not been consistent. For example, another meta-analysis that included a larger cohort found no association (Taylor et al., 2010). Other studies (e.g., Murphy et al., 2004) had suggested a role for the LPR in antidepressant side effects.
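Conflicting meta-analyses are easier to interpret with the pooling arithmetic in view. The sketch below is a standard inverse-variance, fixed-effect meta-analysis of log odds ratios that also reports Cochran's Q and the I² statistic as measures of between-study heterogeneity, one plausible source of which is differing phenotype definitions. The study estimates are invented for illustration, and this is not necessarily the method used by Serretti et al. (2007) or Taylor et al. (2010).

```python
import math

# Hypothetical per-study estimates: (log odds ratio, standard error).
# Illustrative values only; not data from any study cited in the text.
STUDIES = [
    (0.40, 0.20),
    (0.35, 0.25),
    (-0.05, 0.15),
    (0.10, 0.30),
]


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total_w = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, studies)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations of each study from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, studies))
    df = len(studies) - 1
    # I^2: share of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


pooled, pooled_se, q, i2 = fixed_effect_meta(STUDIES)
ci = (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se))

if __name__ == "__main__":
    print(f"pooled OR = {math.exp(pooled):.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
    print(f"Q = {q:.2f}, I^2 = {i2:.0%}")
```

With these made-up inputs the pooled odds ratio is about 1.16, its 95% confidence interval spans 1, and I² is about 25%: one precise but discordant study is enough to pull an otherwise positive signal toward the null. A random-effects model would widen the interval further whenever the estimated between-study variance is greater than zero.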
Hu et al. (2007), in their study of the STAR*D sample, examined both tolerability and outcome and found an association signal only with tolerability, not with outcome. Participants who could not tolerate citalopram had lower response and remission rates, so an outcome phenotype that ignores tolerability may partly reflect side effects rather than treatment efficacy. Pharmacogenetic studies of antidepressant response should therefore consider tolerability when defining outcome phenotypes.
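One way to act on this recommendation is to keep intolerance separate from non-response when coding the outcome. The sketch below assumes a hypothetical per-participant flag for discontinuation due to side effects and contrasts a naive binary phenotype (intolerant participants counted as non-responders) with an efficacy phenotype that sets them aside and a separate tolerability phenotype. It is one possible coding scheme, not necessarily the one used by Hu et al. (2007).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    pid: str
    discontinued_for_side_effects: bool  # hypothetical tolerability flag
    responded: bool  # result of whatever response definition the study adopts


def naive_outcome(p: Participant) -> int:
    """1 = responder, 0 = non-responder; intolerant participants fall into 0."""
    return 1 if p.responded else 0


def efficacy_outcome(p: Participant) -> Optional[int]:
    """Like naive_outcome, but intolerant participants are excluded (None)."""
    if p.discontinued_for_side_effects:
        return None
    return 1 if p.responded else 0


def tolerability_outcome(p: Participant) -> int:
    """1 = tolerated treatment, 0 = discontinued because of side effects."""
    return 0 if p.discontinued_for_side_effects else 1


# Illustrative cohort, not real data.
cohort = [
    Participant("A", False, True),
    Participant("B", False, False),
    Participant("C", True, False),
    Participant("D", True, False),
    Participant("E", False, True),
]

naive_counts = Counter(naive_outcome(p) for p in cohort)
efficacy_counts = Counter(efficacy_outcome(p) for p in cohort)
tolerability_counts = Counter(tolerability_outcome(p) for p in cohort)

if __name__ == "__main__":
    print("naive:", dict(naive_counts))
    print("efficacy (None = excluded):", dict(efficacy_counts))
    print("tolerability:", dict(tolerability_counts))
```

Under the naive coding, two of the three "non-responders" are really intolerant participants, which is exactly the mixing of phenotypes the paragraph above warns about.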
Varying phenotype definitions have a direct but complex impact on statistical power. More stringent criteria or narrower phenotype definitions will usually leave fewer cases to analyze, which reduces power. However, a more selective phenotype definition may strengthen association signals (and thus increase effective power) by reducing heterogeneity. There is always a trade-off between narrow phenotypes and large sample sizes, and the best balance will vary from study to study.
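The trade-off can be made concrete with a standard normal-approximation power calculation for comparing response rates between two genotype groups. The minimal sketch below assumes, purely for illustration, that a narrow phenotype yields a homogeneous sample in which response rates differ by genotype (0.65 vs 0.45), while a broader phenotype yields a larger sample in which only a fraction of patients show that genetic effect and the rest respond at 0.5 regardless of genotype. All rates, sample sizes, and function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation and ignores the negligible contribution
    of the opposite rejection tail.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    return _STD_NORMAL.cdf(abs(p1 - p2) / se - z_crit)


def diluted_rates(p1: float, p2: float, p_background: float,
                  frac_informative: float) -> tuple:
    """Genotype-group response rates when only part of the sample carries the effect.

    Patients outside the informative fraction respond at p_background
    regardless of genotype, which shrinks the observed difference.
    """
    f, b = frac_informative, p_background
    return f * p1 + (1 - f) * b, f * p2 + (1 - f) * b


# Narrow phenotype: smaller, homogeneous sample.
narrow = power_two_proportions(0.65, 0.45, n_per_group=150)
# Broad phenotype: larger sample, effect present in 40% or 70% of patients.
broad_40 = power_two_proportions(*diluted_rates(0.65, 0.45, 0.5, 0.4), n_per_group=400)
broad_70 = power_two_proportions(*diluted_rates(0.65, 0.45, 0.5, 0.7), n_per_group=400)

if __name__ == "__main__":
    print(f"narrow: {narrow:.2f}, broad (40%): {broad_40:.2f}, broad (70%): {broad_70:.2f}")
```

Under these assumed numbers the powers are roughly 0.94, 0.62, and 0.98: when only 40% of the broad sample carries the effect, the smaller narrow sample is more powerful, but at 70% the larger broad sample wins. Which side of the balance a real study falls on depends on quantities that are rarely known in advance.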
To maximize the information extracted from relatively small cohorts, clinical trials routinely use methods such as mixed models to infer or impute outcome information when some data are missing. While this approach may be useful in clinical trials, pharmacogenetic studies rest on different assumptions. Phenotypic imputation increases the probability of case (here, responder) misclassification, and errors in case assignment reduce the power to detect a genetic association substantially more than misclassification of controls does (Edwards et al., 2005). Power to detect genetic associations depends not only on effect size (as in clinical trials), but also on allele frequency and marker coverage, that is, how well the genotyped markers tag the causal variant through linkage disequilibrium. In the next section, we discuss genotyping issues as they relate to power to detect association in the three genome-wide association studies (GWAS) of antidepressant response published to date.
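The dependence of power on allele frequency, marker coverage, and case misclassification can be sketched with an allele-count case-control test, where cases might be responders and controls non-responders. The sketch below rests on stated assumptions: power at a marker in linkage disequilibrium r² with the causal variant is approximated by scaling the sample size by r², misclassification is modeled only as controls contaminating the case group (it does not reproduce the full analysis of Edwards et al., 2005), and the threshold is the conventional genome-wide level of 5 × 10⁻⁸. The odds ratio, allele frequencies, sample sizes, and function names are hypothetical and are not drawn from the GWAS discussed here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_allele_freq(control_freq: float, allelic_or: float) -> float:
    """Risk-allele frequency in true cases implied by an allelic odds ratio."""
    odds = allelic_or * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def allelic_test_power(control_freq, allelic_or, n_cases, n_controls,
                       r2=1.0, case_error=0.0, alpha=5e-8):
    """Approximate power of a two-sided allelic case-control test.

    r2: LD between genotyped marker and causal variant; approximated by
        scaling the sample size by r2 (a common approximation, not exact).
    case_error: fraction of the observed case group that are truly controls.
    alpha: significance threshold (5e-8 is the conventional genome-wide level).
    """
    p_case_true = case_allele_freq(control_freq, allelic_or)
    # Controls mixed into the case group pull its allele frequency toward controls.
    p_case = (1 - case_error) * p_case_true + case_error * control_freq
    p_ctrl = control_freq
    # Two alleles per person; imperfect LD coverage shrinks the effective sample.
    a_case, a_ctrl = 2 * n_cases * r2, 2 * n_controls * r2
    se = (p_case * (1 - p_case) / a_case + p_ctrl * (1 - p_ctrl) / a_ctrl) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(p_case - p_ctrl) / se - z_crit)


# Hypothetical scenarios: allelic OR 1.5, 1000 cases and 1000 controls.
scenarios = {
    "common": dict(control_freq=0.30),
    "rare": dict(control_freq=0.05),
    "tagged": dict(control_freq=0.30, r2=0.5),
    "misclassified": dict(control_freq=0.30, case_error=0.2),
}
results = {
    name: allelic_test_power(allelic_or=1.5, n_cases=1000, n_controls=1000, **kw)
    for name, kw in scenarios.items()
}

if __name__ == "__main__":
    for name, power in results.items():
        print(f"{name:>13}: power = {power:.2f}")
```

With the same odds ratio and sample size, power under these assumptions drops from about 0.74 for a directly genotyped common allele to about 0.13 when the marker only tags it at r² = 0.5, about 0.29 when 20% of the case group is misclassified, and below 0.01 when the risk allele is rare.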