Perhaps the greatest advantage of STAR*D from a clinical perspective is also its greatest limitation for study of pharmacogenetics. Specifically, as an effectiveness study, STAR*D by design included patients with substantial medical and psychiatric comorbidity. Because participants were drawn from primary as well as specialty care settings, the general medical burden exceeds that of a typical clinical trial (18). Participants could also receive concomitant treatment with medications, such as beta blockers, that could have an impact on mood or treatment response. Similarly, individuals with ongoing substance misuse were eligible for participation as long as they did not require additional treatment targeting their co-occurring conditions (19). Consistent with other clinical trials, STAR*D excluded patients with severe suicidality and limited the sample to nonpsychotic outpatients. Although the broad inclusion criteria greatly increase the generalizability of STAR*D findings to clinical practice, they also introduce heterogeneity that may make genetic effects more difficult to detect. On the other hand, if one goal of pharmacogenetic testing is to develop clinically useful diagnostic tools, clinically representative populations are precisely the ones that require study.
Several other features of STAR*D may increase sample heterogeneity and thereby diminish power to detect genetic associations. First, STAR*D did not include detailed assessments of medication adherence at level 1 (citalopram), such as pill counts or measurement of blood citalopram levels. Failing to consider nonadherence could lead to misclassification of outcomes (for example, when poorer outcome is a result of treatment nonadherence). This concern is likely to be more than theoretical, because adherence to treatment with antidepressants is known to be poor in general practice (21). Second, although most of the relevant sociodemographic characteristics were ascertained, such as gender, race-ethnicity, income and employment, and marital status, others that have been previously linked to antidepressant outcome, such as social support or religiosity, were not included in STAR*D. Personality disorders have been reported as potential confounders of antidepressant treatment outcome in some studies, including recent meta-analyses (23), but not all (24). STAR*D did not assess personality traits; however, it is important to note the well-known inaccuracy of assessment of trait characteristics, such as personality disorders, in the context of depressive states (26).
Another STAR*D limitation is the absence of a placebo arm at level 1. As a result, participants classified as treatment responders can be considered as two admixed populations: those whose improvement was attributable to citalopram and those whose improvement was attributable to placebo. (In fact, the placebo literature suggests that the latter group can be subdivided further—for example, by identifying those whose improvement is attributable to time or regression to the mean and those whose improvement is a result of the placebo effect.) A very simplistic response to this aspect of the design is to raise concern about the specificity of any associations: for example, might variation in genes associated with treatment response simply be linked to shorter episodes of depression regardless of intervention? It can also be argued that specificity is a second-order question—that is, “after” finding effects one could then proceed to determine specificity in other data sets. Indeed, this next step would be necessary even with a placebo arm, because there was also no active comparator at level 1.
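The dilution implied by this admixture can be illustrated with simple arithmetic. The sketch below is not an analysis from STAR*D; it assumes, purely for illustration, that placebo responders carry a hypothetical variant at the same frequency as nonresponders, and every name and number in it is invented for the example.

```python
def observed_carrier_frequency(p_true_drug, freq_drug_responders, freq_background):
    """Carrier frequency among all observed responders under a two-group mixture.

    p_true_drug          -- hypothetical fraction of responders whose improvement
                            is attributable to the drug
    freq_drug_responders -- hypothetical carrier frequency among true-drug responders
    freq_background      -- hypothetical carrier frequency among placebo responders
                            (assumed equal to that of nonresponders)
    """
    if not 0.0 <= p_true_drug <= 1.0:
        raise ValueError("p_true_drug must be between 0 and 1")
    return p_true_drug * freq_drug_responders + (1.0 - p_true_drug) * freq_background


def observed_difference(p_true_drug, freq_drug_responders, freq_background):
    """Responder-minus-nonresponder difference in carrier frequency."""
    mixed = observed_carrier_frequency(
        p_true_drug, freq_drug_responders, freq_background
    )
    return mixed - freq_background
```

Under these assumptions the observed difference equals the true difference multiplied by the fraction of true-drug responders, so if half of the responders were placebo responders, a true difference of 0.20 would appear as roughly 0.10.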
It also bears emphasis that strategies exist to partially address the problem of placebo response. One approach applied in clinical investigations is to examine “patterns” of response that are characteristic of “true-drug” response and patterns characteristic of placebo response (27), although more recent data cast some doubt on the utility of these parameters (30). Alternatively, one might focus on response to next-step treatments (that is, level 2 and subsequent levels), presuming that placebo response should be greatly diminished after an initial treatment failure (31).
One further limitation in the STAR*D genetic data set is the difference between individuals who participated in the genetics study and the STAR*D cohort as a whole, which is discussed in detail elsewhere (32). Because the genetics portion of the study was added after study initiation, participants in that portion would be skewed toward those who entered the study later (which should not introduce bias) and those who remained in the study longer, either because of good treatment response and participation in follow-up or because of poor treatment response leading to continued participation in subsequent levels (which could well introduce bias). Moreover, a substantial literature documents that ethnic and racial groups may differ in their willingness to participate in genetics studies (33). In general, although these distinctions should have little impact on the detection of associations in most cases, they could certainly have an impact on the generalizability of results.
Biomarkers of depression treatment outcome have been scarce. Thus, there was no justification for collecting serum or whole blood in STAR*D. However, new and more sophisticated techniques may arise that could provide important information about these phenotypes. The lack of these biological materials for STAR*D participants may limit efforts to link genetic variations to their function.
STAR*D has some limitations for investigation of tolerability outcomes that bear consideration. In particular, the primary measure of adverse effects by bodily system, the Patient-Rated Inventory of Side Effects, was not administered at study entry. Therefore, subsequent reports of adverse effects cannot be distinguished from preexisting symptoms. This consideration was apparent in analyses of sexual symptoms, where it was impossible to determine whether these symptoms were truly treatment emergent (34). To circumvent this problem, one approach is to consider only adverse effects not present at the initial postbaseline visit. Alternatively, some potential adverse effects, such as insomnia, can be identified by using items on rating scales that were completed at baseline (Laje G et al., unpublished data).
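The first workaround can be sketched as a simple filter. This is a minimal illustration, not the procedure used in any STAR*D analysis; the data layout (one set of reported side effects per postbaseline visit, in chronological order) and the function name are assumptions made for the example.

```python
def treatment_emergent(visits):
    """Return side effects first reported after the initial postbaseline visit.

    `visits` is a hypothetical list of sets, one per postbaseline visit in
    chronological order; each set holds the side effects reported at that visit.
    """
    if not visits:
        return set()
    # Anything reported at the first postbaseline visit is treated as
    # possibly preexisting and is excluded.
    present_at_first_visit = visits[0]
    # Collect everything reported at any later visit.
    later_reports = set().union(*visits[1:])
    return later_reports - present_at_first_visit
```

For a participant reporting nausea at the first visit, nausea and insomnia at the second, and sexual dysfunction at the third, the filter would retain insomnia and sexual dysfunction. A cost of this choice is that genuine adverse effects arising before the first postbaseline visit are discarded along with preexisting symptoms.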