Background: Employers recently requested a valid metric of depression treatment quality. Such an indicator needs to measure the proportion of the population in need who receive high-quality care, and to predict clinical improvement. Methods: We constructed an administrative database indicator derived from HEDIS criteria for antidepressant medication management, and tested it in 230 employed patients in five health plans. Results: Indicator rates were 7.0% in the population in need. Conformance to indicator criteria in this population was associated with 23.0% improvement in depression severity over 1 year (p = .02). Conclusions: Administrative database indicators that predict clinical improvement are a very rare accomplishment. Existing depression indicators may need to be calculated for the population in need to provide a valid metric for employer purchasers.
In a move towards value-based purchasing, a recent forum of employers announced they were prepared to actively negotiate with their health plans to improve the quality of depression treatment provided to their workforce ‘if a valid metric of depression treatment could be identified’ (Apgar, 2002). Among the most promising candidates for this metric are the National Committee for Quality Assurance's (NCQA) Health Plan Employer Data and Information Set (HEDIS) outpatient depression indicators. Derived from guidelines summarizing expert consensus on treatment research and clinical care (Depression Guideline Panel, 1993), HEDIS depression indicators are currently constructed from administrative databases in over 300 health plans to characterize the quality of care a health plan provides to the population in treatment, that is, diagnosed patients who initiate antidepressant medication (Thompson, Bost, Ahmed, Ingalls, & Sennett, 1998).
To be a valid metric of the quality of depression care from the perspective of a purchaser, an indicator needs to meet two criteria: (1) it should measure the quality of care a health plan provides to the population in need, and (2) it should predict improvement in clinical outcomes in this population. Little is known about whether indicators currently calculated for the population in treatment accurately characterize the population in need because (unlike other chronic conditions) most depressed patients are not in treatment (Rost, Smith, Guise, & Matthews, 1994; Rost et al., 2001). In addition, it is unclear whether HEDIS-based outpatient depression indicators predict improved outcomes in either the population in treatment or the population in need (Rost, Williams, Wherry, & Smith, 1995; Simon et al., 1995; Melfi et al., 1998; Katon et al., 2000; Fortney, Rost, Zhang, & Pyne, 2001; Bull et al., 2002; Schoenbaum et al., 2002). Moreover, multiple studies question whether administrative databases contain too much measurement error to yield valid quality metrics for any condition (Hunt et al., 2000; Cotter, Smith, Rossiter, Pugh, & Bramble, 1999; Bloom, Harris, Thompson, Ahmed, & Thompson, 2000; Kobak et al., 2002; Jones et al., 2000). In this secondary analysis of a cooperative study database (Rost et al., 2001), our first objective was to calculate HEDIS-based outpatient depression indicator rates for the population in need. Our second objective was to investigate whether this indicator significantly predicted improved health in this population.
Our previously described methods (Rost et al., 2001; Wells, 1999), approved by the Human Research Advisory Committees at the University of California Los Angeles, RAND, and the VA Greater Los Angeles, are summarized here. Research teams from Partners in Care (PIC) and the Mental Health Awareness Project (MHAP) administered a depression screener to a consecutive sample of adult patients insured by five health plans that provided administrative data for participants recruited at an index primary care visit from 27 staff/group and network primary care practices. Patients who screened positive on the depression screener at the index visit were eligible for the parent study if they reported that: (1) they intended to receive care in the clinic on an ongoing basis; (2) they had telephone access; (3) they were not pregnant/breastfeeding, cognitively impaired or seriously physically ill; and (4) they spoke English or Spanish. In addition, one of the parent studies excluded patients who: (5) failed to meet criteria for past-year major depression; (6) screened positive for bereavement, mania or alcohol problems; or (7) did not speak English. Because HEDIS depression indicators are currently calculated for patients beginning treatment, the study team excluded patients who entered the parent study in treatment (see Fig. 1). Because this manuscript is part of a larger series of studies examining employer health benefit purchasing (Rost, Smith, & Fortney, 2000), the study team also excluded patients who reported no full- or part-time employment. Note that we included 32 depressed patients who did not meet criteria for current major depression because: (1) HEDIS-eligible patients include diagnosed patients with minor depression and (2) quality improvement initiatives improve outcomes comparably across the spectrum of depression severity, including minor depression (Wells et al., 2000; Rost, Elliott, & Dickinson, 2002).
After completing baseline, patients from the parent study completed structured interviews at 6 and 12 months, with response rates of 97.4 and 88.7% in this cohort (see Fig. 1).
The indicator was calculated from complete medical, mental health specialty care, and pharmacy data for each participant for 1 year following the index visit. Parallel to HEDIS 3.0 criteria, patients were included in the indicator's numerator if: (a) a prescription for an antidepressant medication was noted from up to 30 days before to 14 days after index episode start date; (b) the prescription was filled a sufficient number of times for patients to be able to take the medication for 84 out of 114 days following the first prescription; and (c) three non-emergency room visits with any mental health diagnosis (ICD9 code between 290 and 319 inclusive) to a primary care or mental health provider (if all three visits were to mental health providers, at least one of them had to be to a prescribing provider) were noted during the 12 weeks following the index episode start date. The index episode start date was defined as the date of the first primary care visit with a depression diagnosis, or the index visit for depressed patients who were not diagnosed. We elected to test an indicator combining criteria (a) through (c) above because the current HEDIS indicator requires all three criteria be met, and multicollinearity prohibited us from drawing meaningful conclusions about which criteria made stronger contributions to improved outcomes.
All patients meeting the study's eligibility criteria were included in the indicator's denominator as members of the population in need. Thus, all subjects received an administrative database diagnosis of depression (ICD-9 codes 296.2, 296.3, 298.0, 300.4, 309.1 and 311) during the 3 months following the index visit and/or reported at screening 2 or more weeks of feeling depressed or losing interest in the past year with 1 or more weeks in the past month (Rost et al., 2001). We selected 3 months as the cutoff for diagnosis to be able to assess whether patients met numerator criterion (b).
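The numerator and denominator rules described above can be sketched in code. The following is a minimal illustration only, not the study's actual programming: the record layouts (`Fill`, `Visit`), the function names, and the example patients are all hypothetical, and criterion (b) is approximated by summing the days' supply dispensed within the 114-day window, which is one of several plausible readings of the rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Fill:                      # hypothetical pharmacy record
    fill_date: date
    days_supply: int

@dataclass
class Visit:                     # hypothetical encounter record
    visit_date: date
    icd9: float
    provider: str                # "primary_care" or "mental_health"
    prescriber: bool
    emergency: bool = False

def first_qualifying_fill(fills, index_date):
    """Criterion (a): antidepressant fill from 30 days before to 14 days after the index date."""
    lo, hi = index_date - timedelta(days=30), index_date + timedelta(days=14)
    in_window = [f.fill_date for f in fills if lo <= f.fill_date <= hi]
    return min(in_window) if in_window else None

def meets_b(fills, start):
    """Criterion (b): at least 84 days of supply within 114 days of the first fill.
    Assumption: supply is summed per fill and clipped at the end of the window."""
    end = start + timedelta(days=114)
    supply = sum(min(f.days_supply, (end - f.fill_date).days)
                 for f in fills if start <= f.fill_date < end)
    return supply >= 84

def meets_c(visits, index_date):
    """Criterion (c): three non-emergency visits with a mental health diagnosis in 12 weeks."""
    end = index_date + timedelta(weeks=12)
    ok = [v for v in visits
          if index_date <= v.visit_date <= end
          and not v.emergency
          and 290 <= v.icd9 < 320          # ICD-9 290 through 319, including subcodes
          and v.provider in ("primary_care", "mental_health")]
    if len(ok) < 3:
        return False
    if all(v.provider == "mental_health" for v in ok):
        return any(v.prescriber for v in ok)   # need at least one prescribing provider
    return True

def indicator_positive(fills, visits, index_date):
    start = first_qualifying_fill(fills, index_date)
    return start is not None and meets_b(fills, start) and meets_c(visits, index_date)

def indicator_rate(patients):
    """Numerator over denominator; every eligible patient counts in the denominator."""
    return sum(indicator_positive(*p) for p in patients) / len(patients)

# Made-up example patients (not study data).
index = date(2000, 1, 10)
pc_visits = [Visit(date(2000, 1, 10), 311.0, "primary_care", True),
             Visit(date(2000, 2, 1), 311.0, "primary_care", True),
             Visit(date(2000, 3, 1), 311.0, "primary_care", True)]
adherent = ([Fill(date(2000, 1, 12), 30), Fill(date(2000, 2, 11), 30),
             Fill(date(2000, 3, 12), 30)], pc_visits, index)
early_stop = ([Fill(date(2000, 1, 12), 30)], pc_visits, index)
rate = indicator_rate([adherent, early_stop])
```

In this toy example one of two patients in need satisfies all three criteria, so `rate` is 0.5. Because untreated patients remain in the denominator, the same function applied to the population in need gives a lower rate than it would for the population in treatment.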
Depression severity was measured at each wave by the 23-item modified Center for Epidemiological Studies—Depression (mCES-D) (Rost, Nutting, Smith, Werner, & Duan, 2001) 100-point scale examining DSM-IV depressive symptoms (α = .80), with higher scores indicating greater severity. Depression severity scores were log-transformed before analysis to achieve a more normally distributed dependent variable.
Multivariate models presented in this manuscript used baseline measures of gender, minority status, education, depression and/or dysthymia diagnoses derived from structured interviews, role functioning, household income, physical comorbidities, and health plan. We selected these covariates from more than twenty sociodemographic, occupational, and clinical characteristics by using stepwise regression methods to identify all variables that entered the depression severity model at p < .20.
To address the first objective, we calculated indicator rates in the population in need as defined above. To address the second objective, we investigated the relationship of the indicator to depression severity change using multilevel longitudinal models (Gibbons & Hedeker, 1994; Singer, 1998; Raudenbush & Bryk, 2002). Because both projects randomized patients to an effectiveness intervention, the dependent variable across all waves was modeled with a random intercept as a linear combination of indicator, time, intervention, indicator × time, intervention × time, and the covariates listed above, to control for possible intervention effects that were not mediated by the indicator (Kraemer, Wilson, Fairburn, & Agras, 2002). We tested the relationship between the indicator and depression severity change by examining the parameter estimate and associated statistical significance of the indicator × time term, retransforming parameter estimates using smearing retransformation (Duan, 1983). We characterized the strength of the relationship between the indicator and depression severity change by calculating the area between the curves, reporting the percent improvement in depression severity over 1 year attributable to indicator–concordant care. To rule out the possibility that the indicator's impact on severity reflected concurrent psychotherapy, we conducted an analysis in the subgroup of patients whose administrative database records indicated they had no psychotherapy visits in the 6 months following the index visit. The small sample size prevented us from using instrumental variables to correct for selection bias; however, previous research suggests that non-instrumented estimates of indicator–outcome relationships are conservative (Schoenbaum et al., 2002; Fortney et al., 2001). Power analyses indicated that the sample size provided us greater than 80% power to detect a .8 effect of the indicator on severity change using a two-sided test with alpha set at .05.
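The smearing retransformation mentioned above can be illustrated with a small sketch. This is a simplified example under stated assumptions, not the study's model: it fits an ordinary least-squares line to log-transformed scores rather than a multilevel model, the function names are hypothetical, and the severity values are invented for illustration. The idea it shows is Duan's (1983) estimator: predictions made on the log scale are exponentiated and then multiplied by the mean of the exponentiated residuals.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def smearing_factor(residuals):
    """Duan's smearing estimate: the average of exp(residual) on the log scale."""
    return sum(math.exp(r) for r in residuals) / len(residuals)

# Invented severity scores at 0, 6 and 12 months (not study data).
months = [0, 0, 0, 6, 6, 6, 12, 12, 12]
severity = [44.0, 47.0, 45.0, 36.0, 40.0, 33.0, 30.0, 38.0, 25.0]

log_severity = [math.log(s) for s in severity]
intercept, slope = fit_line(months, log_severity)
residuals = [ly - (intercept + slope * m) for m, ly in zip(months, log_severity)]
smear = smearing_factor(residuals)

def predict_raw(month):
    """Expected severity on the original scale, corrected for retransformation bias."""
    return math.exp(intercept + slope * month) * smear

naive_12 = math.exp(intercept + slope * 12)   # ignores retransformation bias
smeared_12 = predict_raw(12)
```

Simply exponentiating a log-scale prediction estimates something closer to the geometric mean than the arithmetic mean of the original scores. The smearing factor corrects for this without assuming normally distributed errors, and it is greater than 1 whenever the residuals average zero and are not all identical.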
At baseline, the 230 patients eligible for this analysis were on average 41.6 years old (SD = 11.2); 66.5% were female, 39.1% were minority, 48.7% were currently married, and 74.4% had some college education. Occupationally, 43.5% of participants were employed at baseline as professionals/administrators, 22.2% as managers/salespeople, and 34.3% as clerical/service workers. The majority of subjects met criteria for recent major depression and/or dysthymia on structured interviews, with 74.8% meeting criteria for 1-year major depression, 11.3% meeting criteria for 1-year dysthymia, and 13.9% meeting criteria for substantial depressive symptoms. Patients reported an average of 1.2 (SD = 1.4) physical comorbidities.
Among the 230 patients comprising the population in need, 7.0% were positive on the indicator, with 18.3% starting antidepressant medications (criterion a), 13.0% continuing antidepressant medications that were started (criterion b), and 11.7% receiving follow-up visits (criterion c). As Table 1 and Figure 2 show, conformance with the indicator was associated with an average annual improvement in depression severity of 23.0% (Indicator × Time t = −2.37, p = .019, df = 639). By 1 year, patients who were indicator positive had realized a 51.8% reduction in depressive symptoms (from 45.4 to 21.9), while patients who were indicator negative realized a 16.5% reduction in depressive symptoms (from 45.4 to 37.9). Thus, patients who were indicator positive realized 3.1 times the improvement in depressive symptoms that their indicator negative counterparts realized. Conformance with the indicator was associated with a 32.1% improvement in the 203 patients in need who received no psychotherapy (Indicator × Time t = −2.29, p = .023, df = 563).
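The percent reductions and the 3.1-fold ratio reported above follow directly from the baseline and 1-year severity scores. The short check below reproduces that arithmetic; the helper name `percent_reduction` is ours, introduced only for illustration. It does not reproduce the 23.0% figure, which is a separate quantity based on the area between the two curves.

```python
def percent_reduction(baseline, followup):
    """Percent decline in severity from baseline to follow-up."""
    return 100.0 * (baseline - followup) / baseline

# Severity scores reported in the text.
positive = percent_reduction(45.4, 21.9)   # indicator-positive patients
negative = percent_reduction(45.4, 37.9)   # indicator-negative patients
ratio = positive / negative                # relative improvement

print(round(positive, 1), round(negative, 1), round(ratio, 1))
# prints: 51.8 16.5 3.1
```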
To be a valid metric of depression treatment for a purchaser, an indicator should meet two criteria. First, the indicator should characterize the quality of care provided to the population in need. Second, the indicator should predict improved clinical outcomes in this population. In terms of the first criterion, the HEDIS indicator currently in use potentially overestimates the rate of high quality medication management in the population in need by as much as four-fold (NCQA, 2003). In terms of the second criterion, the HEDIS indicator currently in use predicts a 23% improvement in clinical outcomes in the broader population in need. This degree of improvement is clinically as well as statistically significant, associated with six times the average effect size of antidepressant medication compared to placebo in patients with major depression (Walsh, Seidman, Sysko, & Gould, 2002).
While providing evidence that depression indicators developed in administrative databases ‘work,’ these findings suggest that the indicator may need to be calculated in both narrowly and broadly defined populations to be relevant for the range of stakeholders whose decisions quality indicators are meant to influence (Hermann, 2002; Hermann & Palmer, 2002). Applied narrowly to the population in treatment, the indicator provides adherence data that health plans have used to guide adherence improvement initiatives (Roberts, Cockerham, & Waugh, 2002; Hoffman et al., 2003). Applied broadly to the population in need, the indicator provides data on diagnosis and medication initiation/adherence, for broader quality improvement initiatives. From the employer perspective, an indicator characterizing the population in need is more meaningful because companies absorb considerable costs when depressed workers are not diagnosed and effectively treated (Druss et al., 2001; Kessler & Frank, 1997; Kessler et al., 1999; Greenberg, Kessler, Nells, Finkelstein, & Berndt, 1996; Russell, Patterson, & Baker, 1998; Martin, Blum, Beach, & Roman, 1996; Burton, Conti, Chen, Schultz, & Edington, 1999; Ettner, Frank, & Kessler, 1997). Employer purchasers need to evaluate the quality of the entire process, rather than the quality of any single part of the process to judge how depression treatment quality is affecting these costs.
Calculating depression indicators for the population in treatment can be done in pre-existing administrative databases. Calculating depression indicators for the population in need is more cumbersome because it requires a population-based survey. Economies of effort may be obtained by adding a highly sensitive and specific 2-item depression screener (Kroenke & Spitzer, 2002) to telephone surveys that NCQA may adopt as part of its accreditation process. The potential benefit of calculating depression indicators for the population in need is that the resulting indicator could motivate employer purchasers to support cost-effective quality improvement initiatives which modestly increase health care expenditures (Rost, Pyne, Dickinson, & LoSasso, 2005; Schoenbaum et al., 2001; Lave, Frank, Schulberg, & Kamlet, 1998; Simon et al., 2001).
The internal validity of the findings we report is strengthened by our careful construction of a ‘real world’ indicator using ‘real world’ databases. However, we recognize that future research needs to address challenges to the internal validity of these findings from selection bias and causal inference issues. In terms of selection bias (Newhouse & McClellan, 1998; Sturm, Unutzer, & Katon, 1999), our inability to use instrumented models in all likelihood downwardly biases the indicator–outcome relationships we report by an estimated 20% or greater (Fortney et al., 2001; Schoenbaum et al., 2002). In terms of inference, we recognize that the non-experimental design prevents us from definitively concluding that high-quality depression care improves clinical outcomes; however, experimental studies showing that interventions that improve antidepressant medication management significantly reduce depressive symptoms (Wells et al., 2000; Rost, Nutting, Smith, Elliott, & Dickinson, 2002; Katon et al., 1995; Katon et al., 1996; Schulberg et al., 1996; Katzelnick et al., 2000; Katon et al., 1999; Simon, VonKorff, Rutter, & Wagner, 2000; Hunkeler et al., 2000) strengthen the causal link between quality and outcomes that this study and other investigators observe (see Table 2).
The external validity of the study is strengthened by its ability to employ longitudinal models to test a quality indicator derived from administrative databases in a highly representative population of depressed primary care patients; however, we recognize limitations to external validity. Because we lacked pre-index visit administrative data that HEDIS 3.0 currently relies on, we had to use patient report to identify patients beginning a new treatment episode, allowing us to test the indicator in a comparable but not completely equivalent population. Because the parent study recruited a consecutively screened primary care sample, we could not include the estimated 6% of depressed patients who get their care exclusively from specialty care settings without seeing a primary care provider for any reason during their episode (Rost, Zhang, Fortney, Smith, & Smith, 1998). While it is possible the population we analyzed had a less severe form of depression than the population in which HEDIS indicators are currently calculated, we do not have sufficient power to determine whether the indicator is differentially effective in more or less severe populations. Future investigators are encouraged to examine this important question. While our methods can be criticized for proposing too broad a definition of the population in need, we note that quality improvement interventions targeting this very population achieve comparable improvements in clinical outcomes across the severity spectrum including those with significant depressive symptoms only (Wells et al., 2000; Rost, Elliott, & Dickinson, 2002). While there are clear advantages in demonstrating the value of quality indicators to employer purchasers, we recognize that the indicator–outcome relationships we report do not generalize to the non-employed patients insured through Medicare and/or Medicaid, some of whom may have more severe disorder (Marcotte & Wilcox-Gok, 2001).
Although it is reassuring that we can demonstrate the expected indicator–outcome relationships, the fact that we have administrative data from only five health plans limits the generalizability of our results. We encourage future investigators to explore these relationships within and across a considerably larger and more diverse group of health plans and patients using analytic strategies capable of precisely differentiating the relative contribution of acute and continuation indicators to a broad range of outcomes. Such databases can also make an important contribution to determining whether a broader application of the indicator is more susceptible to case mix differences across health plans than the narrower application.
Given the enormous challenges in creating ‘real world’ metrics, the definition of an administrative database indicator that predicts clinical improvement is an almost unique accomplishment in health care (Krumholz et al., 1998; Ahmed, Elbasha, Thompson, Harris, & Sneller, 2002), particularly in mental health (Hermann et al., 2000; Buchanan, Kreyenbuhl, Zito, & Lehman, 2002). This study contributes to the quality improvement literature by being the first study to our knowledge to demonstrate that a HEDIS-derived indicator predicts patient-reported clinical improvement (Melfi et al., 1998; Krumholz et al., 1998; Ahmed et al., 2002). The study also raises an important question about whether HEDIS-derived depression indicators need to be calculated in both broadly and narrowly defined populations to be relevant to employers as well as the health plans they sponsor.
We greatly appreciate the expert programming and encouragement that Carl Elliott contributed to this project. We also appreciate the help that Martha Mancewicz, Bernadette Benjamin, and the VA Center for the Study of Healthcare Provider Behavior provided with database construction. We gratefully acknowledge the efforts of Stephanie Mitchell, Mary Abdun-Nur, Maureen Carney, Carole Oken, Sarah Scholle, and our colleagues in the Quality Improvement for Depression (QID) Cooperative Study. We are also grateful to the managed care plans, clinicians, and patients who participated in this study from the Alamo Mental Health Group (San Antonio TX), Allina Medical Group (Twin Cities MN), Magellen/GreenSpring Behavioral Health (MD), Humana Health Care Plans (San Antonio TX), Kaiser Permanente (Northern CA), MedPartners (Los Angeles CA), PacifiCare of Texas (San Antonio TX), Patuxent Medical Group (MD), San Luis Valley Mental Health/Colorado Health Networks (San Luis Valley CO), Valley-Wide Health Services (San Luis Valley CO), and the Veterans Administration Greater Los Angeles Healthcare System (Los Angeles CA). This work was funded by the Agency for Healthcare Research and Quality (HS08349), by the National Institute of Mental Health (MH54444, MH54443, MH01170, MH63651, the Research Center on Managed Care for Psychiatric Disorders MH54623), and the John D. and Catherine T. MacArthur Foundation (96-42901A-HE).