Following the drug-approval process, concerns remain regarding the safety of new drugs that are introduced into the marketplace. In the case of rare adverse events, the number of subjects that are treated in randomized controlled trials is invariably inadequate to determine the safety of the new pharmaceutical. Identifying safety signals for new and/or existing drugs is a major priority in the protection of public health. Unfortunately, design, analysis, and available data are often quite limited for detecting in a timely fashion any potentially harmful effects of drugs. In this review, we examine a variety of approaches for determining the possibility of adverse drug reactions. Our review includes spontaneous reports, meta-analysis of randomized controlled clinical trials, ecological studies, and analysis of medical claims data. We consider both experimental design and analytic problems as well as potential solutions. Many of these methodologies are then illustrated through application to data on the possible relationship between taking antidepressants and increased risk of suicidality.
Although premarketing clinical trials are required for all new drugs before they are approved for marketing, with the use of any medication comes the possibility of adverse drug reactions (ADRs) that may not be detected in the highly selected populations recruited into randomized clinical trials. A primary aim in pharmacovigilance is the timely detection of either new ADRs or a relevant change in the frequency of ADRs that are already known to be associated with a certain drug. Such ADRs may be detectable only in more typical clinical populations, which have a greater range of illness severity, more comorbid illness, and more use of other medications (14). Moreover, less common ADRs will require larger populations to be detected. Historically, pharmacovigilance relied on case studies, such as those collected through the Yellow Card system in Britain, and on case-control studies (41). The Uppsala Monitoring Centre reports (http://www.who-umc.org) and classic Venning publications also highlighted the importance of individual case reports for signal detection (59).
More recently, several large-scale postmarketing studies have been designed to detect ADRs. However, these studies are often unrepresentative of the potential users of a drug, have incomplete data, have short follow-up, and have inadequate sample size for rare ADRs. Furthermore, a control group (i.e., patients suffering from same disease but undergoing no active treatment) is often unattainable except in very special circumstances (14).
In the following sections, we review a variety of approaches (Table 1) for studying ADRs ranging from spontaneous reports to ecological studies to analyses of medical claims databases. Our review focuses on both design and analytic issues, highlights strengths and limitations, and is illustrated with examples of the relationship between antidepressant use and increased suicidality. We begin our discussion with perhaps the weakest data: spontaneous reports of AEs, which are subject to numerous sources of bias (under-reporting, media attention effects, poor quality of the data in terms of large amounts of missing data on relevant demographic characteristics, and duplication of reports). Next, we move to ecological data, which have the benefit of often covering an entire population of interest and therefore permit analysis of extremely rare events (e.g., child suicide) but are limited by the fact that we do not know if the same individual who experienced an AE is the same individual who took a particular medication. Next, we consider meta-analysis of randomized controlled trials (RCTs). This is a favorite approach of the psychopharmacological division of the U.S. Food and Drug Administration (FDA) and involves pooling information from multiple RCTs, typically placebo controlled. The obvious advantage of this approach is that it enjoys the scientific and statistical benefits of randomization; however, there are numerous limitations, including entrance criteria that may exclude those patients at highest risk of the AE (e.g., suicidality), small sample sizes consisting of patients monitored for short time periods, and ascertainment biases that are associated with a focus on spontaneously reported AEs. Finally, we review various approaches to the design and analysis of studies that are based on large-scale medical claims databases. Medical claims data are often based on large enough samples to evaluate all but the rarest AEs. 
Furthermore, they are more generalizable to routine practice than RCTs are because they do not have exclusion criteria beyond that which gets one into the specific health care system in the first place. The limitation is that these studies are not randomized; therefore, results can be biased owing to factors such as confounding by indication, in which patient characteristics lead to treatment of a particular type and it becomes difficult to disentangle the effects of treatment from the characteristics of the patients that lead to treatment.
Most ADRs are the product of clinician observation or patient self-report. Recently, the World Health Organization (WHO) and the U.S. FDA have used automated methods to mine spontaneous report databases. Modern data mining combines statistics with ideas, tools, and methods from computer science, machine learning, database technology, and other classical data-analytical technologies (25). In the context of drug safety, the objective is to detect local structures or patterns and to determine if they are inconsistent with chance occurrence. Patterns are usually embedded in a mass of irrelevant data. Interesting patterns can arise from artifacts of the data-recording process or from genuine discoveries about underlying mechanisms. Therefore, deciding whether a pattern is interesting requires knowledge from experts to understand exactly what is being described. The increasing number of large databases maintained by various regulatory agencies and pharmaceutical companies around the world provide the opportunity for some novel exploration in post-approval drug safety.
The WHO has the largest international database of case reports of spontaneous adverse events (AEs). The U.S. FDA introduced the MedWatch program in June 1993 to expand reporting of suspected ADRs in the Adverse Event Reporting System (AERS), which contains more than two million reports of suspected ADRs. Similar databases exist in various European and other countries including India, China, Taiwan, and Iran. Data-mining algorithms (DMAs) have been developed to screen large spontaneous reporting system (SRS) databases for statistical dependencies between drugs and AEs in hope of improving the ability to identify ADRs. Each of the signal-detection methods has distinguishing features, and no single method is suitable for all circumstances.
Several statistical methods have previously been suggested for postmarketing safety surveillance. Hauben & Zhou (29) grouped these methods into two categories: numerator-based methods and denominator-dependent methods; the former category makes no adjustments for the population at risk (e.g., number of prescriptions sold for each drug), whereas the latter category makes direct or indirect adjustments. Spontaneous reporting centers and drug safety research units routinely use various numerator-based methods such as empirical Bayes screening (EBS) (a variant of which is used by the FDA), Bayesian Confidence Propagation Neural Network (BCPNN used by WHO), and proportional reporting ratio (PRR). Denominator-based methods include Cumulative Sum, Time Scan, and Poisson methods.
Proportional Reporting Ratio (PRR) is the simplest method available for signal detection. Its computational form is similar to the well-known Relative Risk calculation for 2 × 2 tables in epidemiology (15). It is the proportion of all AE reports for a particular drug that mention a specific AE, divided by the corresponding proportion for all other drugs in the database. Of concern is that large numbers of AE reports of a particular kind effectively inflate the denominator for that drug and thereby reduce sensitivity for detecting other signals associated with that drug. PRRs also produce large numbers of false-positive signals because they provide no adjustment for multiple comparisons.
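As a minimal sketch of the PRR calculation, assuming the conventional 2 × 2 layout of report counts (the function name, the cell labels a, b, c, d, and the example counts are illustrative and are not taken from any cited study):

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2 x 2 table of spontaneous report counts.

    a: reports with the drug of interest and the AE of interest
    b: reports with the drug of interest and any other AE
    c: reports with all other drugs and the AE of interest
    d: reports with all other drugs and any other AE
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for an empty row or zero comparator events")
    drug_proportion = a / (a + b)        # share of the drug's reports that mention the AE
    comparator_proportion = c / (c + d)  # same share among all other drugs
    return drug_proportion / comparator_proportion


# Hypothetical counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=180, c=100, d=9700)
```

The sketch also makes the masking problem visible: increasing b (other AE reports for the same drug) lowers the drug's proportion and therefore the PRR for the AE of interest.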
Bate (2) and his colleagues (3) developed the Bayesian Confidence Propagation Neural Network (BCPNN), which can handle large data sets and is robust to missing data. It is based on identifying relationships between a drug and AE that differ significantly from the background interrelationship in the database. The information stored as the weights in BCPNN is used for quantifying drug-ADR dependencies. The algorithm computes the information component (IC) and its interval estimates between specific drugs and AEs present on the same report. To detect a signal based on the IC of drug-event association, the analyst performs sequential time scans of the database. An IC whose lower 95% confidence limit is greater than 0 and that increases with sequential time scans is the criterion for signal detection.
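The IC can be illustrated with a deliberately simplified point estimate: the base-2 logarithm of the observed joint reporting proportion over the proportion expected if drug and AE were reported independently. The sketch below omits the Bayesian priors and the interval estimates that BCPNN itself uses, so it should be read as an approximation of the idea rather than the algorithm; the function name and counts are hypothetical.

```python
import math


def information_component(n_drug_ae, n_drug, n_ae, n_total):
    """Raw (unshrunk) information component for one drug-AE pair.

    n_drug_ae: reports mentioning both the drug and the AE
    n_drug:    reports mentioning the drug
    n_ae:      reports mentioning the AE
    n_total:   all reports in the database
    """
    observed = n_drug_ae / n_total
    expected = (n_drug / n_total) * (n_ae / n_total)  # independence assumption
    return math.log2(observed / expected)


# Hypothetical counts, for illustration only.
ic = information_component(n_drug_ae=20, n_drug=200, n_ae=120, n_total=10000)
ic_null = information_component(n_drug_ae=2, n_drug=200, n_ae=100, n_total=10000)
```

An IC near 0 (as in `ic_null`) means the pair is reported about as often as independence predicts; positive values indicate disproportionate reporting.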
EBS (11) computes the baseline (expected) frequency under a row (drug) and column (event) independence assumption for multiple two-way tables. If the drug and event are independent, the proportional representation of that event for a specified drug should be the same as the proportional representation of that event in the entire database. Three distance measures, (a) relative risk, (b) logP, and (c) the geometric mean of the posterior distribution of the true relative reporting ratio, are used to rank drug-event frequencies according to their magnitude (27). The likelihood function assumes each observed count is a draw from a Poisson distribution with varying unknown means with a common prior distribution: a mixture of two gamma distributions. The U.S. FDA currently uses a Multi-Item Gamma Poisson Shrinker (MGPS) (12), which is a variant of the Gamma Poisson Shrinker (GPS). The MGPS algorithm computes signal scores for pairs and for higher-order combinations of drugs and events that are significantly more frequent than their pair-wise association would predict (55).
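The independence baseline and the effect of shrinkage can be sketched as follows. This is a simplification under stated assumptions: it uses a single gamma prior with hypothetical parameters (alpha, beta) rather than the two-gamma mixture described above, and it reports the posterior mean rather than the geometric mean, so it is not the GPS or MGPS algorithm itself.

```python
def expected_count(n_drug, n_event, n_total):
    """Expected drug-event count if drug (row) and event (column) are independent."""
    return n_drug * n_event / n_total


def shrunken_reporting_ratio(observed, expected, alpha=1.0, beta=1.0):
    """Posterior mean of the relative reporting ratio lambda.

    Assumes observed ~ Poisson(lambda * expected) and a single
    Gamma(shape=alpha, rate=beta) prior on lambda (an illustrative choice),
    which gives a Gamma(alpha + observed, beta + expected) posterior.
    """
    return (alpha + observed) / (beta + expected)


# Hypothetical counts, for illustration only.
e = expected_count(n_drug=200, n_event=120, n_total=10000)
raw_ratio = 20 / e
shrunk_ratio = shrunken_reporting_ratio(observed=20, expected=e)
```

With small expected counts the shrunken ratio is pulled toward the prior mean, which is the property that dampens spurious signals from sparse drug-event cells.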
The Cumulative Sum (CUSUM) method is based on the cumulative sum of differences between observations and their expected values. A signal is detected if the signal statistics exceed the threshold value (53). The threshold is determined by average run length (ARL) based on the mean and variance of the background incidence. The method requires a background comparison time interval and may therefore limit the timely identification of safety problems.
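A one-sided CUSUM of this kind can be sketched in a few lines. The allowance and threshold are taken as given inputs here; in practice the threshold would be chosen from the average run length as described above. All names and numbers are hypothetical.

```python
def cusum_signals(observed, expected, allowance, threshold):
    """Return the indices of periods at which an upper CUSUM exceeds the threshold.

    observed:  AE counts per period
    expected:  background (expected) counts per period
    allowance: slack subtracted each period so small excesses do not accumulate
    threshold: decision limit for signalling
    """
    signals = []
    s = 0.0
    for t, (obs_t, exp_t) in enumerate(zip(observed, expected)):
        # Accumulate excess over expectation, never dropping below zero.
        s = max(0.0, s + (obs_t - exp_t - allowance))
        if s > threshold:
            signals.append(t)
    return signals


# Hypothetical monthly counts with a background of 3 events per month.
monthly_counts = [2, 3, 2, 6, 7, 8]
signal_periods = cusum_signals(monthly_counts, [3] * 6, allowance=0.5, threshold=4.0)
```

In this example the statistic stays at zero for the first three months and crosses the threshold only after two consecutive elevated months, which illustrates why the method can delay detection.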
The Poisson method is a denominator-based method that requires some estimate of the population at risk (e.g., national prescription rates). Gibbons et al. (21) developed a random-effects Poisson regression model for the simultaneous analysis of large numbers of spontaneous reports of possible ADRs. Parameters are estimated using marginal maximum likelihood, and individual drug-AE rate ratios are estimated using either empirical Bayes or parametric or nonparametric full Bayes methods. Confidence (posterior) intervals that do not include 1.0 provide evidence for either significant protective or harmful associations of the drug and the AE. The Veterans Administration (VA) has recently developed an SRS that has denominators based on all outpatient prescriptions, making denominator-based methods even more attractive (F. Cunningham, personal communication).
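To show what a denominator buys, the sketch below computes a crude reporting rate ratio for one drug against a reference drug, with a Wald interval on the log scale. It is far simpler than the random-effects Poisson model described above (no random effects, no empirical Bayes estimation); the function name and counts are hypothetical.

```python
import math


def reporting_rate_ratio(events_drug, rx_drug, events_ref, rx_ref, z=1.96):
    """Rate ratio of AE reports per prescription, with an approximate 95% interval.

    events_*: number of AE reports (treated as Poisson counts)
    rx_*:     number of prescriptions (the population-at-risk proxy)
    """
    rate_ratio = (events_drug / rx_drug) / (events_ref / rx_ref)
    se_log = math.sqrt(1 / events_drug + 1 / events_ref)  # SE of log rate ratio
    lower = math.exp(math.log(rate_ratio) - z * se_log)
    upper = math.exp(math.log(rate_ratio) + z * se_log)
    return rate_ratio, lower, upper


# Hypothetical counts, for illustration only.
rr, rr_lower, rr_upper = reporting_rate_ratio(
    events_drug=30, rx_drug=100_000, events_ref=60, rx_ref=400_000
)
```

As in the text, an interval that excludes 1.0 would be read as evidence of an association, subject to all the reporting biases of spontaneous data.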
Yang et al. (62) applied PRR, a variant of PRR based on the odds ratio [reporting odds ratio (ROR)], and MGPS to the FDA AERS database through the second quarter of 2007. They concluded that PRR and ROR identify 100% and 99.8% of the MGPS signals, respectively, and signals from both algorithms overlap by ~95%. MGPS identified only 30% of PRR and ROR signals. However, MGPS reduced the signal review workload by ~70% at the risk of missing potential signals (62). At GlaxoSmithKline, MGPS and PRR were evaluated, and based on the properties of these two methods, the company chose MGPS over its competitor. It has also developed a user-friendly and flexible interface that can perform disproportionality analysis in real time (1). Shalviri et al. (49) compared the performance of PRR, ROR, and BCPNN in the Iranian Pharmacovigilance Center and concluded that PRR and ROR detected all the signals that were detected by BCPNN as well as those that were not detected by BCPNN. However, the proportion of serious signals detected by BCPNN appears to be higher (49), which suggests that BCPNN has fewer false positives. Puijenbroek et al. (40) examined the concordance of the various estimates with the WHO-adopted BCPNN and found that different measures are broadly comparable when four or more cases per combination have been collected (40). No single algorithm is best for all situations.
SRS data have numerous limitations. These include (a) confounding by indication [i.e., patients taking a particular drug may have a disease that is itself associated with a higher incidence of the AE (e.g., antidepressants, depression, and suicide)], (b) systematic under-reporting, (c) questionable representativeness of patients, (d) effects of media publicity on numbers of reports, (e) extreme duplication of reports, and (f) attribution of the event to a single drug when patients may be exposed to multiple drugs. Also, nearly all the AERS analyses now being used fail to account in some way for the number of prescriptions for each drug. Finally, spontaneous reports do not reliably detect ADRs that occur widely separated in time from the original use of the drug (5). Nowhere is this more problematic than in detecting the effects of drugs on the fetus or long-term effects such as malignancy in patients who take immunosuppressant medications or pulmonary hypertension and cardiac valvular effects from fenfluramine. These limitations can degrade the capacity for optimal data mining and analysis (28). Despite these known limitations, these are the data on which regulatory agencies around the world primarily rely for the purpose of postmarketing surveillance. Fortunately, other methods can provide useful information that is complementary to that produced using existing methods. In the following sections we discuss alternatives to the use of spontaneous reports for identifying ADRs.
For very rare events (e.g., death by suicide) that occur at rates on the order of 1 in 10,000 or less, there may be few options for routine drug surveillance. One approach is to use ecological data that relate changes in drug prescription rates to AE rates. These more global associations do not support causal inferences, but the availability of large denominators and close-to-complete enumeration of events such as suicides can generate hypotheses and help support inferences drawn from other studies. In some cases, natural experiments, such as black box warnings, which the FDA placed on antidepressants because of their potential link to suicide attempts (24), provide an opportunity to evaluate the positive or negative consequences of decreased access to the drug resulting from the event of interest. Here we compare national rates of the AE before and after the public health warning to determine if the warning has had the anticipated effect. If the warning is specific to a stratum of the population, comparison of changes in that stratum versus those for which the warning did not apply provides stronger inferences.
The traditional approach to ecological analysis typically involves either log-linear or Poisson regression analysis of rates over time using exposure based on prescription rates during the same time period. Serial correlation can be accommodated using Huber-White robust standard errors, which allow for an arbitrary autocorrelation pattern (31, 60, 61). Where data from multiple countries are combined, both fixed-effects (34) and random-effects models (30) can be used to allow each country to have its own linear time trend. For the fixed-effects log-linear model, weighted least squares can be used to adjust for heterogeneity in the residual errors using each country’s population as a weight. A similar approach can be taken using a mixed-effects Poisson regression model where each country’s population is used as an offset (see Reference 30).
A more informative approach was suggested by Goldsmith et al. (22) in which AE rates are stratified by demographic characteristics such as age, race, and sex within counties, and a mixed-effects Poisson regression model is used to analyze the data, treating the county as the unit of analysis. County population is used as an offset in the Poisson regression model. To evaluate drug-AE interactions, county-level prescription rates are added to the model to determine if changes in prescription rates are associated with changes in the AE rate. When longitudinal data are available, between-county effects and within-county effects can be uniquely estimated by expressing prescription rates as two variables, one for the mean over time (between) and the other for the yearly deviations from the mean (within). The methodology has been described in detail by Gibbons et al. (19, 20).
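The between-county and within-county decomposition of prescription rates amounts to simple centering, which can be sketched as follows. The data structure (a dictionary of yearly rates per county) and all values are assumptions made for illustration; the resulting two variables would then enter the mixed-effects Poisson model as separate covariates.

```python
def between_within(rates_by_county):
    """Split each county's yearly prescription rate into two covariates.

    rates_by_county: {county: {year: rate}}
    Returns {county: {year: (between, within)}}, where `between` is the
    county's mean rate over time and `within` is that year's deviation
    from the county mean.
    """
    result = {}
    for county, yearly in rates_by_county.items():
        mean_rate = sum(yearly.values()) / len(yearly)
        result[county] = {
            year: (mean_rate, rate - mean_rate) for year, rate in yearly.items()
        }
    return result


# Hypothetical prescription rates per 1,000 residents.
decomposed = between_within({"A": {2001: 10.0, 2002: 14.0, 2003: 12.0}})
```

Because the within-county deviations sum to zero for each county, their coefficient reflects only changes over time within a county, free of stable differences between counties.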
Prior to drug approval and/or release of a new drug, a series of RCTs are conducted that include frequencies of spontaneously reported adverse events. Whereas an individual study typically has an insufficient sample size to detect a statistically significant drug-AE relation, synthesis of data from several trials may provide a more powerful statistical inference. In many cases, the research synthesis is performed using meta-analysis. For example, given concerns about the safety of drug-eluting stents, Stettler et al. (54) conducted a meta-analysis of 38 trials in 18,023 patients comparing sirolimus-eluting stents, paclitaxel-eluting stents, and bare-metal stents. A brief overview of statistical methodologies commonly used in meta-analysis of binary outcomes follows. It is important to note that owing to variability in (a) treatment outcomes, (b) indication for treatment, and (c) mode of treatment, meta-analysis is hypothesis generating and does not itself provide a causal inference.
The Mantel-Haenszel (MH) method assumes a fixed effect and combines studies using the inverse variance of the study-specific odds ratio to determine the weight given to each study. It was originally developed to analyze odds ratios, but it has been extended to include other measures. The basic idea for a fixed-effects model involves the calculation of a weighted average of the treatment effect across all the eligible studies. The MH test assumes that the odds ratio is equal across all studies.
DerSimonian & Laird (DL; see Reference 10) provide an estimate of the combined effect of multiple trials incorporating heterogeneity across studies. They assume a normal distribution for the treatment effects across studies with common mean θ and variance τ². This method provides a simple noniterative way to compute the heterogeneity parameter τ² and adjusts the weight given to each study for the estimated heterogeneity across studies. This method is more generalizable than the MH test, which assumes a common effect across all studies. However, a number of simulation studies (35, 50, 51) have shown that the heterogeneity estimate has a large negative bias, leading to a biased estimate of the pooled treatment effect, as well. In addition, the Q statistic that is used to test heterogeneity has low power to detect departure from homogeneity (26, 37).
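The noniterative moment estimator can be sketched as follows, assuming each trial contributes an effect estimate (for example, a log odds ratio) and its within-study variance. The function name and the inputs are illustrative; the example values are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    effects:   study-level effect estimates (e.g., log odds ratios)
    variances: their within-study variances
    Returns (pooled_effect, standard_error, tau2, q).
    """
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed to estimate heterogeneity")
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]    # heterogeneity-adjusted weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2, q


# Hypothetical log odds ratios from two equally precise trials.
pooled, se, tau2, q = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
```

When the estimated τ² is zero the adjusted weights reduce to the fixed-effect inverse-variance weights, so the two analyses coincide for homogeneous trials.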
Another approach to account for heterogeneity while combining results across trials is mixed-effects logistic regression. The estimation procedure, marginal maximum likelihood (MML), used is somewhat complicated; however, several standard statistical packages are available to perform these types of analysis. The statistical properties of estimates of treatment effect and between-study variance using this method are well established. Hedeker & Gibbons (30) provide detailed theoretical treatment of mixed-effect logistic regression. In addition to unbiased estimates, it also allows trial-level covariates in the analysis. Therefore, if needed, it is possible to relate the size of effect to one or more characteristics of the trials involved. Finally, unlike the previous two methods, studies with zero events can be included in the analysis, and there is no need to add a constant to studies that have a single arm with no events (i.e., continuity correction).
An extension of the use of random effects in meta-analysis to incorporate heterogeneity of effects across studies is the use of discrete mixtures of random effects. Among other effects, discrete mixtures of random effects can capture multimodality of intervention effects, which may be harmful in some circumstances and beneficial in other contexts or vary over time (7).
The use of medical claims data for postmarketing drug surveillance offers several advantages. First, medical claims represent person-level data, similar to RCTs and spontaneous reports, but unlike spontaneous reports, we know the population at risk. Second, several medical claims databases such as the VA or PharMetrics databases contain longitudinal information on AEs, concomitant medications, and comorbid diagnoses both before and after the drug exposure. Third, the populations that can be sampled are often large enough to study even the rarest of events. Their primary limitation is that they are observational, and any association identified may or may not represent a causal link between the drug and the AE because of potential unmeasured confounding. The primary objective is to design an observational study such that many of the benefits of an RCT are preserved.
Cases are defined as patients who have experienced the AE of interest, and controls are similar to the cases but have not experienced the AE. The goal of the analysis is then to compare the rate of drug exposure between cases and controls. If a significant difference is identified, then there is evidence of an ADR. In some cases, propensity score matching (44) can be used to identify controls that are matched in probability on a large number of potential confounders to the cases. More often, however, the cases and controls are matched on a smaller set of observable characteristics (e.g., age and sex), and other potential confounders are included as covariates in the analysis. A major limitation of case-control studies in drug surveillance is that the available potential confounders are inadequate for matching the cases and controls in terms of severity of illness. As such, the resulting comparison may still represent confounding by selection (i.e., sicker patients are more likely to be treated and exhibit the AE).
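For an unmatched case-control comparison, the difference in drug exposure between cases and controls is usually summarized as an exposure odds ratio. The sketch below uses the standard Woolf (log-scale) interval and ignores matching and covariate adjustment, which a real analysis would need; the function name and counts are hypothetical.

```python
import math


def exposure_odds_ratio(exposed_cases, unexposed_cases,
                        exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio of drug exposure in cases versus controls, with a Woolf interval."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    se_log = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
odds_ratio, or_lower, or_upper = exposure_odds_ratio(40, 60, 20, 80)
```

An interval above 1.0 indicates that cases were more often exposed than controls, but, as noted above, it cannot by itself separate a drug effect from selection of sicker patients into treatment.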
A cohort study identifies a sample from a well-defined population according to predetermined criteria. In drug surveillance, investigators use two general approaches to conduct a cohort study. The cohort can be defined in terms of an illness (e.g., cardiovascular disease) or in terms of an exposure (e.g., all patients taking a particular drug), in either case within a given timeframe. In some cases, we may wish to identify new cases of an illness or an exposure in patients who have neither been diagnosed with the illness nor treated for the illness for quite some time (e.g., several years). This strategy works well for databases with long-term enrollment patterns such as the VA but may be less ideal for managed-care databases where patients may not be continuously enrolled for long enough periods of time. In this case, the cohort study can be designed to have a fixed time window before and after the indication (either diagnostic date or first treatment date), for example, a period of one year before and after the indication. The primary advantage of collecting data before and after diagnosis or drug exposure is that we can evaluate the rate of the AE both before and after the start of treatment. If the drug is producing the AE, then rates should generally be higher following initiation of the drug compared with before initiation.
Within-subject designs are those in which the same patients are repeatedly measured over time, typically before and after initiating drug treatment. The basic idea is to compare the rate of a nonfatal AE before and after exposure to the drug. The strength of the design is that it is restricted to only those patients who ultimately take the drug, thereby minimizing selection effects. A good example is a set of studies of lithium and suicide risk (56). However, if patients with the disease are at higher risk of the AE in general (e.g., depression and suicidality), then a limitation of the design is that the natural course of the disease (e.g., decrease in the severity of depressive symptomatology over time) can become confounded with the pre-post nature of the design. In some cases, the emergence of the AE may even lead to treatment. For example, a suicide attempt may lead to identification of the depressive disorder, which may in turn lead to treatment. Owing to regression to the mean alone, we would expect the AE rate on average to decrease, and this decrease could incorrectly be attributed to a protective effect of the drug. Fortunately, it is unlikely that such regression effects would mask an adverse drug effect. However, when the indication for treatment is related to the likelihood of the AE, one must take great care in properly adjusting for the natural course of the disease (e.g., by using person-time logistic regression).
Within-subject designs are also useful for understanding the effects of pattern or intensity of exposure on the AE rate. For example, breaking the surveillance time into discrete intervals (e.g., months) allows the analyst to explore the temporal association between drug exposure and the AE. This method permits simultaneous evaluation of both within-subject and between-subject effects and affords the investigator a finer view of drug-AE interactions, which can also adjust for the natural course of the illness.
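Breaking surveillance time into discrete intervals produces one record per patient per interval, which is the data layout that person-time models operate on. A minimal sketch for a single patient follows, assuming months are indexed from 0 and exposure is continuous once treatment starts (both simplifying assumptions); the function and field names are hypothetical.

```python
def person_month_records(treatment_month, event_months, n_months):
    """Expand one patient's follow-up into monthly records.

    treatment_month: index of the month in which drug treatment starts
    event_months:    months in which the AE occurred
    n_months:        total months of surveillance
    Each record flags whether the month is on-drug and whether an AE occurred.
    """
    events = set(event_months)
    return [
        {"month": m, "exposed": int(m >= treatment_month), "event": int(m in events)}
        for m in range(n_months)
    ]


def event_rate(records, exposed):
    """AE rate per month among records with the given exposure status."""
    subset = [r for r in records if r["exposed"] == exposed]
    return sum(r["event"] for r in subset) / len(subset)


# Hypothetical patient: treated from month 3, AEs in months 1 and 4.
records = person_month_records(treatment_month=3, event_months=[1, 4], n_months=6)
pre_rate, post_rate = event_rate(records, 0), event_rate(records, 1)
```

Stacking such records across patients, with month included as a covariate, is what allows the natural time course of the illness to be modeled separately from the exposure effect.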
Between-subject designs involve comparing patients who took the drug with those who did not. Often it is most useful to compare monotherapy with no therapy, at least with respect to the other drugs within the relevant class of drugs. Concomitant drug therapy with related classes of drugs can be included in the statistical model as covariates or as a separate comparison group. The importance of considering monotherapy is that patients using multiple drugs within the same class may be of greater initial severity and/or treatment resistant and may in general have higher rates of AEs. The primary limitation of between-subject designs is that they are subject to confounding by indication and/or severity of the indication. In general, more severely ill patients will be treated with pharmacotherapy; therefore, we would expect them to have a greater incidence of AEs related to the severity of illness. Various matching strategies, such as propensity score matching, described in the following section, are helpful for reducing bias; however, these techniques can be done only to the extent that the potential confounders are available and measurable. The need to adjust for potential confounders in observational studies is yet another reason why it is so important to obtain data on AEs, concomitant treatment, and concomitant diagnoses prior to treatment with the drug of interest and to include them as covariates in between-subject analyses.
A final complication of between-subject designs is time. Assume that the cohort is defined in terms of an index diagnosis (e.g., depression) and that the likelihood of the AE changes with time from the index episode. In a naturalistic study, the initiation of treatment may not coincide with the diagnosis, leading to a possible further confound between treated and untreated patients. To solve this problem, we can also match treated and untreated patients in terms of the timing of treatment in addition to other demographic and prognostic factors. However, the untreated patients do not have a time of treatment. In this case, we must match pairs of treated and control patients on other potential confounders and then match each pair in terms of time of risk based on the treated patient (i.e., start the surveillance period for the matched pair at the time of treatment for the treated patient). In this way, confounding between time from index diagnosis and treatment initiation is no longer a factor in the comparison of control and treated patients.
Propensity score matching (44) provides a way to balance cases (treated) and controls when numerous potential confounders have been measured. The first step is to develop a model for the probability of being treated with a certain drug (i.e., the propensity score) based on all measured confounders. Then, matching on the propensity score rather than on each confounder can provide balance between the groups, leading to between-group comparisons with greatly reduced bias. Equally important, matching on the propensity score can also identify the presence of bias that is inherent in the particular between-group comparison of interest that cannot be controlled through simple adjustment. For example, in the now-classic example of relationship between smoking and mortality (9), pipe smoking is seen to have the highest risk of mortality, not cigarette smoking. However, pipe smokers were on average older than cigarette smokers, so the actual comparison was potentially between a 30-year-old cigarette smoker and a 70-year-old pipe smoker. Stratification on age produced an unbiased comparison (at least with respect to observed covariates) that showed a clear effect of cigarette smoking on mortality, above that for both cigar and pipe smoking throughout the life cycle. Propensity score matching is therefore a multivariate extension of the single univariate stratification procedure described by Cochran (9).
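The matching step can be sketched as greedy 1:1 nearest-neighbor matching on an already estimated propensity score. This assumes the scores have been obtained beforehand (for example, from a logistic regression of treatment on the measured confounders); the caliper value, identifiers, and scores are hypothetical, and greedy matching is only one of several matching schemes.

```python
def greedy_match(treated_scores, control_scores, caliper=0.05):
    """Match each treated patient to the nearest unused control.

    treated_scores, control_scores: {patient_id: propensity score}
    caliper: maximum allowed difference in scores for a match
    Returns a list of (treated_id, control_id) pairs; treated patients
    with no control inside the caliper are left unmatched.
    """
    available = dict(control_scores)
    pairs = []
    for t_id, t_score in sorted(treated_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.70, "t3": 0.95}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.72, "c4": 0.10}
matched_pairs = greedy_match(treated, controls)
```

Here the treated patient with a score of 0.95 has no comparable control and stays unmatched, which illustrates how matching exposes regions where the between-group comparison cannot be supported by the data.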
A limitation of using propensity score matching in drug surveillance studies is that medical claims data often have limited information regarding potential confounders. Good choices include demographic variables, diagnostic variables, and concomitant medications, as well as comorbidities. In some cases, longitudinal medical records can be used to identify whether there is a history of related AEs prior to exposure to the drug of interest. The key here is to adjust for the severity of illness, which may be related to both the use of the drug and the AE.
Consider two different pharmacologic treatments given for the same or similar clinical indications. Although the observed response of each treatment relative to no treatment may be subject to confounding by indication, the relative effect of these two treatments may be less biased by unmeasured prognostic factors (called generic biases) that promote the use of either treatment (48).
As an illustration, there has been concern at the FDA that selective serotonin reuptake inhibitors (SSRIs) may be causally linked to suicidality (ideation, behavior, and completion) (57, 58) in children and in young adults. To study this notion using differential effects, we can contrast the frequency of suicide attempts in depressed patients who received either an SSRI or an older tricyclic antidepressant (TCA), but not both, adjusting for known prescribing predictors such as age. In parallel, the frequency of suicide attempts is then also contrasted for patients who received SSRI + psychotherapy or TCAs + psychotherapy, but not both. If SSRIs are stimulating suicide attempts, we would expect to see an excess of suicide attempts in many, if not most, comparisons of SSRIs with other treatments for depression. Otherwise this pattern is compatible with confounding by indication. Rosenbaum (48) describes sensitivity analyses and adjustment for additional covariates using differential propensity score adjustment.
The term coherence is used both informally and formally, and both formal and informal definitions are relevant in the current context. Informally, we may predict that a treatment will produce several observable associations, and there is coherence if each prediction is checked and confirmed. Formally, we may devise a measure of how much the outcomes of a patient resemble those predicted for the treatment in question and use that measure of coherence as an outcome in the analysis (45–47). In drug surveillance, examples of relevant predicted associations are as follows:
Rosenbaum (46, 47) has suggested methods for creating a coherence score that measures the degree to which these predictions hold and how it can be used as an outcome to study its sensitivity to unobserved biases.
Logistic regression can be used when interest is restricted to the first AE, whereas Poisson regression can be used when the focus of analysis is on multiple AEs within a given timeframe. Fixed-effects models are used for between-subject comparisons that consider a single drug and a single AE. For simple within-subject comparisons, a conditional logistic or Poisson regression model can be used (4), the parameters of which can be estimated using generalized estimating equations. Hedeker & Gibbons (30) provide a general overview of these models.
When the data are clustered and/or longitudinal, mixed-effects logistic or Poisson regression models can be used (30). An example of where mixed-effects models are of particular importance is when multiple drugs and/or AEs are considered simultaneously. Here, the unit of clustering is the drug-AE interaction. We may compare the rate of each AE for each drug before and after exposure in an overall analysis, where time (pre- versus post-exposure) is treated as a random effect in the model. Empirical Bayes estimates of the random time effect for each drug-AE interaction and corresponding posterior variances can be used to construct confidence intervals that can, in turn, be used to screen large numbers of drug-AE interactions simultaneously. Similar approaches can be used for between-subject comparisons (nested within drugs and AEs), and coherence between within-subject and between-subject results (i.e., drugs for which both approaches reveal similar findings of increased risk) can be used as a guide for selecting drug-AE interactions that are of concern and are in need of further study.
Person-time logistic regression (13, 18) allows us to use drug exposure as a time-varying covariate in estimating the hazard rate of an AE on a month-by-month basis. Unlike the previous analyses during which exposure was considered constant from the time of treatment initiation through the end of the follow-up period, in this analysis, treatment is evaluated on a month-by-month (or any other fixed time window) basis. This analysis combines patients who did not take the drug with nonmedicated months for patients who did take the drug and compares them with active treatment months. This analysis determines the effects of duration and pattern of exposure on our overall conclusions. This model also adjusts for month, which allows the risk of the AE to decrease (or increase) over time. The general algorithm is as follows:
The proportional hazards model assumes that the effect of the medication is constant over time, whereas the non-proportional hazards model allows the effect of the medication on the AE to vary over time.
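As a rough illustration of the person-time data layout described above, the sketch below expands each patient into one record per month at risk, with exposure recorded month by month. This is a generic person-period expansion, not necessarily the authors' algorithm; the function names, field names, and patients are hypothetical.

```python
def person_months(patient_id, followup_months, exposed_months, event_month=None):
    """Expand one patient into one record per month at risk.

    exposed_months: set of 1-based month numbers in which the drug was taken.
    event_month: month of the first AE, or None if no AE occurred.
    Follow-up stops at the month of the first AE.
    """
    last = followup_months if event_month is None else min(event_month, followup_months)
    return [
        {
            "id": patient_id,
            "month": month,
            "exposed": int(month in exposed_months),  # time-varying covariate
            "event": int(month == event_month),       # 1 only in the AE month
        }
        for month in range(1, last + 1)
    ]

# Hypothetical patients, for illustration only.
records = []
records += person_months("A", 12, set())                   # never treated, no AE
records += person_months("B", 12, {3, 4, 5, 6})            # treated in months 3-6
records += person_months("C", 12, {1, 2}, event_month=5)   # AE in an untreated month

def crude_monthly_hazard(rows, exposed):
    """Proportion of person-months with an AE, by exposure status."""
    at_risk = [r for r in rows if r["exposed"] == exposed]
    return sum(r["event"] for r in at_risk) / len(at_risk)

print(len(records), crude_monthly_hazard(records, 0), crude_monthly_hazard(records, 1))
```

A logistic regression of `event` on `exposed` and `month` fitted to records of this form would correspond to the proportional hazards version; adding an `exposed` by `month` interaction term would give the non-proportional version.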
Worldwide, approximately one million people commit suicide annually. In the past 25 years, ~750,000 people committed suicide in the United States, and suicides outnumbered homicides by a ratio of at least 3:2. Deaths from suicide in the United States exceeded deaths from AIDS by 200,000 in the past 20 years (22). The estimated cost to the nation in lost income alone is $11.8 billion annually. Nonetheless, suicide is a rare event with an annual rate in the United States of 12/100,000, making it an extremely difficult if not impossible phenomenon to study using conventional approaches (see 8 and 22 for reviews of design, sample size, and statistical/methodological issues related to suicide research).
Gibbons et al. (21) analyzed data from the FDA AERS from 1998 to 2004 for all antidepressants and completed suicides. The dataset consisted of a total of 28,317,382 records, which included all reported AEs and drug combinations. The denominator for the analyses was based on national prescription rates for each antidepressant by year.
Figure 1 presents a plot of the empirical Bayes (EB) rate multiplier estimates and their confidence limits for each drug for all ages combined.
Figure 1 reveals that, as a class, SSRIs (drugs 1–5) and serotonin and norepinephrine reuptake inhibitors (SNRIs; drugs 6–9) have rate multipliers that are significantly less than 1.0, which is lower than the national average suicide AE report rate for antidepressants. By contrast, as a class, TCAs (drugs 10–18) have rate multipliers that are significantly higher than the national average suicide AE report rate for antidepressants. TCAs have significantly higher risk of suicide AE reports as compared with SSRIs and SNRIs, although these data do not support a causal inference because of potential confounding by indication. This finding is striking because one might predict an increase in reports related to SSRIs given the highly publicized concern over a possible link between suicide and use of SSRIs.
The recent decrease in suicide rate over time correlates with increased antidepressant use in Europe (33, 36, 38, 42, 43), Scandinavia (32), the United States (39), and Australia (23). A doubling of prescriptions for SSRIs correlated with a 25% decrease in the suicide rate in Sweden (32). In an analysis of 27 countries, Ludwig & Marcotte (34) showed that an increase of one pill per capita (a 13% increase over 1999 levels) was associated with a 2.5% reduction in suicide rates, a relationship that was more pronounced in adults than in children.
Ecological modeling from large numbers of small areas can provide a stronger basis for understanding the association between antidepressant medication utilization and suicide completion. Gibbons et al. (19) obtained U.S. county-level data on suicide rates and antidepressant prescription rates for 1996–1998 and analyzed their relationship using mixed-effects Poisson regression adjusted for sex, race, age, income, and unobservable county-level effects. Results of the analysis revealed that increases in SSRI and SNRI prescriptions were associated with decreases in suicide rates both among counties and within counties over time. Conversely, counties with higher rates of TCA prescriptions were associated with higher suicide rates, which may be a function of their greater toxicity upon overdose and/or their increased use in areas with poorer access to quality mental health services. Subsequently, Gibbons and coworkers (20) replicated these findings in children and young adolescents (aged 5–14 years).
Gibbons et al. (16) showed that there were significant reductions in antidepressant prescriptions in children following the public health advisory in March of 2004 and record increases in youth suicide rates during the same time period. By contrast, in older adults aged 60 and older, the rate of antidepressant treatment continued to increase and the rate of suicide continued to decrease. In 2005, increases in youth suicide rates continued to be significantly above projected levels from pre-2004 trends, as well (6).
Hammad et al. (24) reported data on suicidal ideation and behavior for 24 pediatric antidepressant trials (n = 4582). This study is the key analysis that led to the black box warning in the United States. Only 20 of the studies were used in that analysis because 4 studies had no events in the drug or placebo arms; none of the studies reported any suicides. We have reanalyzed these data using MH, DL, and MML methods. Estimated odds ratios and probability values for suicidal ideation or behavior as reported in the clinical records were OR = 1.9967 (p < 0.0018) for MH, OR = 1.8013 (p < 0.0139) for DL, and OR = 1.5599 (p < 0.1692) for MML; the first two indicate a significantly higher rate in those randomly assigned to an antidepressant relative to those assigned to placebo. Note that the MML method, which includes all 24 studies and directly incorporates study-to-study variability into the overall treatment effect, does not reveal a significant treatment effect. The differences among the three methods appear to be related to the random-effect variance estimates, which are (a) fixed at zero for MH, (b) estimated at 0 (p < 0.8784) for DL, and (c) estimated at 0.60 (p < 0.1025) for MML. Although not significant, the estimated treatment effect variance of 0.60 is, in fact, quite large, indicating considerable heterogeneity across the 24 studies.
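To show how two of these pooled estimates are formed, the sketch below computes a fixed-effect and a random-effects pooled odds ratio, reading MH as Mantel-Haenszel and DL as DerSimonian-Laird (the usual expansions of those abbreviations). The 2x2 tables are hypothetical and are not the trial data analyzed above; the MML approach is not shown. The sketch assumes no zero cells, which in practice would require a continuity correction or exclusion of the study.

```python
import math

# Hypothetical 2x2 tables, invented for illustration only.
# Each tuple: (drug events, drug non-events, placebo events, placebo non-events)
tables = [(6, 94, 3, 97), (4, 146, 2, 148), (9, 191, 5, 195), (3, 77, 2, 78)]

def mantel_haenszel_or(tables):
    """Fixed-effect pooled odds ratio (between-study variance fixed at zero)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def dersimonian_laird(tables):
    """Random-effects pooled odds ratio and moment estimate of the
    between-study variance (tau^2) on the log odds ratio scale."""
    y = [math.log((a * d) / (b * c)) for a, b, c, d in tables]   # log odds ratios
    v = [1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in tables]  # their variances
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))     # heterogeneity statistic
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Truncate at zero; guard against a degenerate denominator.
    tau2 = max(0.0, (q - (len(tables) - 1)) / c_term) if c_term > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]
    y_random = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return math.exp(y_random), tau2

print(mantel_haenszel_or(tables))
print(dersimonian_laird(tables))
```

When the moment estimate of the between-study variance is truncated at zero, the DL weights reduce to the fixed-effect inverse-variance weights, which is one reason the MH and DL results can be close to each other while a likelihood-based estimate differs.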
Gibbons et al. (17) studied 226,866 veterans with a depressive diagnosis in 2003–2004, with at least six months of follow-up, and with no history of depression or treatment from 2000–2002. They compared suicide attempt rates between patients receiving SSRIs, SNRIs, and TCAs versus patients not receiving antidepressant therapy. They also compared suicide attempt rates before and after initiating antidepressant therapy in the same individuals. Age-specific analyses were also performed.
The overall rate of suicide attempts for patients following initiation of SSRI treatment was 364/100,000. This group includes patients who were treated with an SSRI alone or in combination with a TCA and/or an SNRI. By comparison, the rate of suicide attempts for all other patients was 1057/100,000, nearly three times this rate. This comparison group includes depressed patients who were not treated as well as those treated with an SNRI and/or a TCA. The overall odds ratio for the comparison of SSRI treatment versus all other treatment or nonantidepressant treatment was OR = 0.34 (CI = 0.31–0.38), p < 0.0001. More specifically, comparison of those not treated with any antidepressant (335/100,000) to those treated with an SSRI alone (123/100,000) also revealed a statistically significant association between SSRI treatment and decreased rates of suicide attempts (OR = 0.37; CI = 0.29–0.47, p < 0.0001).
Comparison of the rate of suicide attempts before and after treatment with an SSRI only revealed that the rate of suicide attempts was significantly lower following treatment (123/100,000) than before treatment (221/100,000), RR = 0.56; CI = 0.44–0.71, p < 0.0001.
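The reported point estimates can be checked against the reported rates with simple arithmetic. The sketch below computes crude (unadjusted) odds ratios and a crude rate ratio directly from the rates per 100,000 quoted above; the function names are ours, and the confidence intervals cannot be reproduced this way because they depend on the underlying group sizes.

```python
def odds_ratio(rate_a, rate_b, per=100_000):
    """Crude odds ratio of group A versus group B from rates per `per` patients."""
    p_a, p_b = rate_a / per, rate_b / per
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

def rate_ratio(rate_a, rate_b):
    """Crude ratio of two rates expressed on the same scale."""
    return rate_a / rate_b

print(round(odds_ratio(364, 1057), 2))  # any SSRI vs. all other patients: 0.34
print(round(odds_ratio(123, 335), 2))   # SSRI alone vs. no antidepressant: 0.37
print(round(rate_ratio(123, 221), 2))   # SSRI alone, after vs. before: 0.56
```

The crude values agree with the reported OR = 0.34, OR = 0.37, and RR = 0.56 to two decimal places.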
Similar effects were found for SNRIs and TCAs. Overall, for patients treated with an SSRI, regardless of whether it was given alone or with a TCA and/or an SNRI, no significant difference was found in suicide attempt rates before (402/100,000) versus after (363/100,000) SSRI treatment (p < 0.11), although the direction of the effect remained protective for treatment with SSRIs. This latter example is a differential effects analysis. Similar results have been reported by Simon et al. (52).
Using the same VA cohort, a person-time logistic regression analysis was conducted using a 12-month follow-up period from the index depressive episode. Comparison of the suicide attempt rate was restricted to patients with SSRI treatment (monotherapy) and patients with no medication treatment, and covariates included sex, age, race, and previous suicide attempts (prior to diagnosis). The person-time analysis revealed a significant decrease in suicide attempt rate for patients with SSRI monotherapy treatment [hazard ratio (HR) = 0.17, CI = 0.10–0.28, p = 0.0001]. This result compares favorably with the observed data, for which there were a total of 1,134,173 observations (i.e., number of months). Of these, 805,525 months were without medication and 328,648 months were with medication. The corresponding number of suicide attempts was 207 (0.026%) and 17 (0.005%), respectively, yielding an observed HR = 0.19 (estimated HR = 0.17).
Overall, the suicide attempt rate decreased with time from the index episode (see Figure 2a). Figure 2b provides the estimated hazard functions for the non-proportional hazards model. A significant interaction was found, which indicated that the magnitude of the difference between SSRI monotherapy and no antidepressant treatment groups decreased over time (p < 0.0001). Figure 2b reveals that for the non-proportional hazards model the difference in hazard rates is largest early in treatment (HR = 0.08) but decreases by ~30% per month (HR = 1.29 for the drug-by-month interaction), such that the monthly hazard rates are essentially equivalent by ~9 months following the index episode. Figures 2a and 2b do not support the conclusion that antidepressants are causally linked to suicide attempts in adults, and if anything, the effects are protective, particularly early in the course of treatment when suicide attempt rates are highest.
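The time course implied by these two estimates can be sketched numerically. The code below assumes that the drug-by-month interaction acts multiplicatively on the hazard ratio and that HR = 0.08 applies at month zero; the coding of the month variable is not stated in the text, so both are assumptions, and the function names are ours.

```python
import math

BASE_HR = 0.08         # reported hazard ratio early in treatment
MONTHLY_FACTOR = 1.29  # reported drug-by-month interaction (as a hazard ratio)

def hazard_ratio(month):
    """Hazard ratio (SSRI monotherapy vs. no antidepressant) at a given month,
    assuming the interaction multiplies the hazard ratio once per month."""
    return BASE_HR * MONTHLY_FACTOR ** month

def months_until_equal():
    """Month at which the implied hazard ratio reaches 1.0."""
    return math.log(1 / BASE_HR) / math.log(MONTHLY_FACTOR)

for month in (0, 3, 6, 9):
    print(month, round(hazard_ratio(month), 2))
print(round(months_until_equal(), 1))
```

Under these assumptions the implied hazard ratio is roughly 0.8 at month 9 and reaches 1.0 at about month 10, which is broadly consistent with the statement that the monthly hazard rates are essentially equivalent by ~9 months.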
We have shown that a rich arsenal of experimental and statistical methods exists for the analysis of postmarketing drug surveillance data. Each method has its own strengths and weaknesses, and it is only through coherence of findings that reasonable scientific inferences can be drawn. These analyses are further complicated when the AE is rare and when the AE is related to the indication for treatment. To illustrate the use of several of these methods, we have provided an overview of some of the existing literature on the relationship between antidepressants and suicidality and have included some new analyses, not previously performed, that illustrate some of the newer methods described here. Overall, the data do not support a risk of suicidality (ideation, attempts, or completion) in adults, and if anything, the data suggest a protective effect. There are conflicting data on the relationship between suicidal thinking and behavior and antidepressant use in children, but the available data to date do not support the hypothesis that antidepressant use leads to increases in youth completed suicide rates.
The purpose of this review is to stimulate further experimental and statistical developments in this critically important area, and we hope that the review lays the foundation for improved applications to identifying new drug-AE interactions.
This work was supported by NIMH grants MH062185 (J.J.M.), MH8012201 (R.D.G. and C.H.B.), and MH40859 (C.H.B.) and AHRQ grant 1U18HS016973 (R.D.G.).
R.D.G. has served or is currently serving as an expert witness for the U.S. Department of Justice and Wyeth and Pfizer Pharmaceuticals on cases related to antidepressants and antiepileptic drugs and suicide. J.J.M. has received research support from GlaxoSmithKline and Novartis for unrelated brain-imaging studies and has served as an unpaid adviser to Eli Lilly and Lundbeck Pharmaceuticals. C.H.B. directed a suicide prevention program at the University of South Florida that received funding from JDS Pharmaceuticals.