Our work demonstrates the difficulty in identifying patients with depression from administrative data. Recognizing that depression is underreported in administrative data, we specifically designed Algorithm 1 to explore the effects of using pharmacy codes as primary identifiers of depression. The pharmacy inclusion criteria for Algorithm 1 were broad, thus capturing more members (increased sensitivity) at the expense of generating more false positive diagnoses (decreased specificity and positive predictive value). The more stringent Algorithm 2, which required a diagnosis of depression, had a better specificity, but much lower sensitivity.
Both algorithms suffered from low positive predictive value and thus frequently misclassified patients as having depression. Positive predictive value depends on the prevalence of the underlying condition: the lower the prevalence, the larger the share of algorithm-positive members who are false positives. Only in highly selected primary care patient populations does the prevalence of depression approach 20 percent (Pearson et al. 1999), corresponding to a positive predictive value of less than 53 percent for both algorithms. One study found the prevalence to range from 5 to 10 percent among unselected elderly patients (Barry et al. 1998), corresponding to a positive predictive value of less than 33 percent for both algorithms. This means that if administrative data were used to derive quality measures for depression, fewer than one-third of the patients in the denominator would actually have a physician diagnosis of depression on chart review.
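The dependence of positive predictive value (PPV) on prevalence follows directly from Bayes' theorem. The minimal sketch below computes PPV from an algorithm's sensitivity, its specificity, and the prevalence of depression in the population screened. The sensitivity and specificity of 0.80 used in the loop are hypothetical placeholders chosen for illustration, not the operating characteristics measured for Algorithm 1 or Algorithm 2.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Return PPV = P(depression | algorithm positive) via Bayes' theorem."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Expected share of the population flagged correctly and incorrectly.
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    flagged = true_positives + false_positives
    if flagged == 0.0:
        raise ValueError("the algorithm flags no one, so PPV is undefined")
    return true_positives / flagged


# Hypothetical operating characteristics, for illustration only.
for prevalence in (0.05, 0.10, 0.20):
    ppv = positive_predictive_value(sensitivity=0.80, specificity=0.80,
                                    prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

With these placeholder values, PPV falls from 50 percent at 20 percent prevalence to roughly 31 percent at 10 percent and 17 percent at 5 percent prevalence, even though the algorithm itself is unchanged.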
These findings are especially important given the close relationship of our quality measures to the HEDIS measure for Antidepressant Medication Management. Linking Algorithm 2 with quality measures 1, 2, and 3 approximates the current HEDIS technical specifications (Allison, Wall et al. 2000). Algorithm 2 uses the same pharmacy and ICD-9 codes as HEDIS for denominator construction. Both approaches require each member to have at least 12 months of continuous enrollment in a managed care plan with pharmacy benefits.
However, there are some differences between our quality measures and the HEDIS measure. Our quality measures were intended to reflect the 1993 AHCPR guidelines and not to duplicate the HEDIS Antidepressant Medication Management measure, which was in draft format at the time of data collection for this study. Similar to the HEDIS measure, we required a new diagnosis of depression because many of our quality measures apply to the acute phase of depression. In contrast to the HEDIS measure, which allows either a primary or secondary diagnosis of depression, we required a primary diagnosis.
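As a rough illustration of how these denominator criteria combine, the sketch below screens one member for one index date. The record layout, the field names, the placement of the 12-month enrollment window after the index date, and the 120-day clean period used to define a *new* diagnosis are all hypothetical assumptions; neither the HEDIS specification nor this study's exact rules are reproduced here. The diagnosis codes are passed in rather than listed, since the code set is not given in this section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Claim:
    service_date: date
    icd9_code: str
    is_primary: bool  # True when the code is the claim's primary diagnosis


@dataclass
class Member:
    enrolled_from: date  # start of continuous enrollment with pharmacy benefits
    enrolled_to: date    # end of that enrollment span (inclusive)
    claims: list[Claim] = field(default_factory=list)


def in_denominator(member: Member, index_date: date, depression_codes: set[str],
                   clean_period_days: int = 120) -> bool:
    """Hypothetical Algorithm 2-style screen for one member and index date."""
    clean_start = index_date - timedelta(days=clean_period_days)
    # Enrollment must cover the clean period (so "new" can be checked) and
    # 12 months starting at the index date (365 days including that date).
    if (member.enrolled_from > clean_start
            or member.enrolled_to < index_date + timedelta(days=364)):
        return False
    # A primary (not secondary) depression diagnosis on the index date.
    has_primary_dx = any(
        c.service_date == index_date and c.is_primary
        and c.icd9_code in depression_codes
        for c in member.claims
    )
    # "New" diagnosis: no depression claim, primary or secondary, in the clean period.
    prior_dx = any(
        clean_start <= c.service_date < index_date and c.icd9_code in depression_codes
        for c in member.claims
    )
    return has_primary_dx and not prior_dx
```

Accepting only primary diagnoses mirrors our stricter criterion; relaxing the `is_primary` check would approximate the HEDIS allowance for secondary diagnoses.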
Quality measure 1 differs from the corresponding HEDIS measure by requiring one visit to a primary care provider within 6 weeks of diagnosis, whereas the HEDIS measure requires three visits within 12 weeks. We made our follow-up criterion more lenient because, in certain cases, telephone contact without an office visit is appropriate and would not be reflected in the administrative data. We included two measures of antidepressant medication adherence. Our measure 2, Adherence during the Acute Phase of Treatment, is similar to the HEDIS measure: both reflect adherence during the first three months of treatment, allowing for gaps in medication supply. Our measure 3, Adherence during the Maintenance Phase of Treatment, examines adherence within the first four months of treatment, whereas the HEDIS measure examines adherence during the first six months. We constructed measure 3 to reflect the minimum duration recommended by the 1993 AHCPR guideline (i.e., four months).
Our work also reveals important differences in quality measurement according to which algorithm was used to define the denominator. Algorithm 1, with its lower positive predictive value and correspondingly higher false positive rate, led to lower rates for each quality measure. In fact, three of the measures differed by 7–8 percent. Such underreporting of quality performance matters because it may undermine the credibility of provider feedback and cripple improvement efforts (Allison, Calhoun et al. 2000). Therefore, when planning a quality improvement project, positive predictive value is probably the most important operating characteristic of a disease-identification algorithm.
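The follow-up and adherence criteria for measures 1 through 3 can be sketched as below. This is a hypothetical illustration rather than the study's or NCQA's code: pharmacy fills are assumed to arrive as (fill date, days' supply) pairs, "three months" and "four months" are approximated as 84 and 120 days, and the 30-day allowable gap is a placeholder, since the exact gap allowance is not stated in this section.

```python
from __future__ import annotations

from datetime import date, timedelta

ACUTE_PHASE_DAYS = 84         # "first three months", assumed here to be 12 weeks
MAINTENANCE_PHASE_DAYS = 120  # "first four months", assumed here to be 120 days


def has_follow_up_visit(pcp_visit_dates: list[date], diagnosis_date: date,
                        within_days: int = 42) -> bool:
    """Measure 1 sketch: at least one primary care visit within 6 weeks (42 days)
    after the diagnosis date. A visit on the diagnosis date itself does not count."""
    deadline = diagnosis_date + timedelta(days=within_days)
    return any(diagnosis_date < visit <= deadline for visit in pcp_visit_dates)


def is_adherent(fills: list[tuple[date, int]], start: date, window_days: int,
                max_gap_days: int = 30) -> bool:
    """Measures 2 and 3 sketch: antidepressant supply covers the window beginning
    at `start`, tolerating gaps of up to `max_gap_days` (a hypothetical allowance).

    `fills` holds (fill date, days' supply) pairs from pharmacy claims."""
    window_end = start + timedelta(days=window_days)
    covered_until = start  # first day not yet covered by any fill
    for fill_date, days_supply in sorted(fills):
        if fill_date >= window_end:
            break
        # Uncovered days between the end of the previous supply and this fill.
        if (fill_date - covered_until).days > max_gap_days:
            return False
        # Early refills extend the existing supply instead of overlapping it.
        covered_until = max(covered_until, fill_date) + timedelta(days=days_supply)
    # An uncovered stretch at the end of the window is also a gap.
    return (window_end - covered_until).days <= max_gap_days
```

The same `is_adherent` function serves both adherence measures; only the window length differs, which is the essential distinction between measures 2 and 3.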
The number of identified members, which increases with the sensitivity of the algorithm, has implications for generating performance profiles. Creating valid physician profiles requires sufficient patient numbers. In this study, requiring a diagnostic code to identify members with depression reduced the eligible member population by two-thirds. Consequently, fewer practices meet minimum volume criteria for individualized performance profiles.
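The volume constraint can be made concrete with a small filter. In the sketch below, each eligible member is assumed to be attributed to exactly one practice, and `min_members` stands for a hypothetical minimum-volume threshold; the section does not state the criterion actually used.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def practices_meeting_minimum(member_practices: Iterable[str],
                              min_members: int) -> set[str]:
    """Return the practices with at least `min_members` eligible members.

    `member_practices` lists the attributed practice for each eligible member."""
    counts = Counter(member_practices)
    return {practice for practice, n in counts.items() if n >= min_members}
```

For example, under a hypothetical 30-member threshold, a practice attributed 60 members under a pharmacy-based denominator would keep only about 20 if its eligible population also shrank by two-thirds, and it would drop out of individualized profiling.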
Even beyond the difficulties imposed by administrative data, several factors contribute to the diagnostic challenges of depression. Although depression affects up to 10 percent of the U.S. population at an estimated annual cost of $44 billion (Hall and Wise 1995) and produces impairment in quality of life similar to that of other serious chronic diseases (Wells, Stewart et al. 1989), the diagnosis is often missed by physicians. Primary care physicians recognize only about one-half of all depressed patients in the outpatient setting (Wells, Hays et al. 1989; Kessler, Cleary, and Burke 1985; Borus et al. 1988). The detection rate by primary care physicians falls to 30 percent for patients with significant medical comorbidity (Tylee, Freeling, and Kerry 1993). This may result from physicians attributing signs and symptoms of depression to other medical illness. Somatic symptoms used in making the diagnosis of depression (e.g., fatigue, sleep disturbance, weight loss) are also presenting features of many other medical illnesses. Subjective symptoms such as depressed mood and anhedonia may also be inappropriately regarded as an understandable reaction to illness.
Concern about patient confidentiality and the potential for jeopardizing reimbursement and other benefits may also lead physicians to deliberately substitute alternative diagnoses on claims and encounters. In a survey of 440 primary care physicians randomly selected from the membership lists of professional organizations, 50 percent of respondents reported substituting another diagnostic code in the prior two weeks for one or more patients who met the criteria for major depression (Rost et al. 1994). Physicians may underreport depression to protect patients from social stigma and possible occupational and legal consequences (Hoyt et al. 1997; Hirschfeld et al. 1997). For example, medical records are often subpoenaed during custody hearings. In addition, many physicians may be uncertain about making such diagnoses because of limited training in behavioral health disorders. As a result, many valid cases of depression are not identified by physicians, much less by algorithms based on administrative data. Furthermore, patients identified from administrative data may represent more severe cases (Valenstein et al. 2000).
Some patients are reluctant to express psychological symptoms and may deny mood changes. These patients may instead present with a variety of nonspecific somatic complaints such as headache, abdominal pain, insomnia, weight loss, or low energy. This symptom overlap leads to a complex interaction between depression and medical comorbidity: medical illness frequently presents as depression, and depression as other medical illness. This problem is especially troubling in the elderly, who carry a higher burden of comorbidity (Coulehan et al. 1990).
The overlap in treatment of depression and other medical conditions makes identification of depressed patients from pharmacy data difficult. Antidepressants are now used for a wide variety of conditions other than depression, such as chronic pain, neuropathic pain, fibromyalgia, chronic fatigue syndrome, migraine and tension headaches, irritable bowel syndrome, premenstrual dysphoric disorder, insomnia, eating disorders, premature ejaculation, panic disorder, post-traumatic stress disorder, social phobia, and anger attacks (Barkin et al. 1996a, 1996b; Compas et al. 1998; Davies et al. 1996; Fishbain et al. 1998; Keck and McElroy 1997; McQuay et al. 1996; Merskey 1997; Metz and Pryor 2000; Moreland and St. Clair 1999; Pappagallo 1999; Simon and Von Korff 1997). These syndromes overlap variably with clinical depression, and it is often difficult to discern which problem is primary (Keck and McElroy 1997; Clarke 1998). Often no corresponding ICD-9 diagnosis of depression can be found when an antidepressant prescription is written at an office visit. Given the tolerability and safety of the newer antidepressants, clinicians may tend to prescribe these agents for nonspecific psychiatric symptoms or behavioral problems (e.g., stress) without making a clear diagnosis. As a result, there is concern about inappropriate use of antidepressants (Bouhassira et al. 1998).
The use of antidepressants for disorders other than depression and the treatment of depression without a diagnosis are both reflected in the rate of antidepressant use among members with a false positive diagnosis. Algorithm 2 produced fewer cases of false positive identification and, even among the false positive cases, the rate of antidepressant use was lower for Algorithm 2 (53 percent versus 84 percent).
The limited observation period available through the administrative databases of health plans is both a strength and a limitation. Administrative data permit longitudinal observation at the member level, unlike certain other data types. However, annual member disenrollment averaged 29 percent in 1999 for HMOs reporting to NCQA's Quality Compass 2000 (2000). This turnover limits the identification of new cases: although recently enrolled members may appear to be newly identified with a disease, the disease may be long-standing. Variation in quality performance for new cases of major depression may therefore partly result from undetected variation in disease chronicity. Katon did not find important differences in antidepressant treatment patterns in a staff-model HMO after adjusting for multiple factors, including prior history of depressive episodes (Katon et al. 2000).
Our paper raises the need for caution in interpreting quality measures based on administrative data. For example, we found that rates of appropriate antidepressant treatment (e.g., follow-up visits, appropriate dosage) were substantially higher when the specifications for the member population required a diagnosis of depression. Likewise, Kerr found that variations in the specifications of quality-of-care measures for depression treatment influenced conclusions about the adequacy of antidepressant prescribing patterns in two managed care practices (Kerr et al. 2000). Kerr varied the definition of a new episode of depression (four-month versus nine-month clean period) and the minimum number of visits with a diagnosis of depression. Patients with two or more visits coded for depression were more likely to receive antidepressants at the appropriate dosage than those with only one visit coded for depression. This may occur in part because requiring a diagnosis code increases algorithm specificity, favoring the identification of patients with more severe depression over those experiencing a temporary crisis or generalized stress.
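Kerr's comparison amounts to running the same data through parameterized case definitions. The sketch below makes explicit the two parameters Kerr varied: the clean-period length and the minimum number of depression-coded visits. The 365-day follow-up window for counting visits and the 30-day month are hypothetical simplifications, and the function is not Kerr's or this study's actual specification.

```python
from __future__ import annotations

from datetime import date, timedelta


def meets_specification(dx_visit_dates: list[date], index_date: date,
                        clean_months: int, min_coded_visits: int,
                        follow_days: int = 365) -> bool:
    """Hypothetical case definition with an adjustable clean period and visit count."""
    # Months are approximated as 30 days for simplicity.
    clean_start = index_date - timedelta(days=30 * clean_months)
    # New episode: no depression-coded visit during the clean period.
    if any(clean_start <= d < index_date for d in dx_visit_dates):
        return False
    # Count depression-coded visits from the index date through the follow-up window.
    follow_end = index_date + timedelta(days=follow_days)
    coded = sum(1 for d in dx_visit_dates if index_date <= d < follow_end)
    return coded >= min_coded_visits
```

Under these assumptions, a patient with a depression-coded visit six months before the index date counts as a new episode under a four-month clean period but not under a nine-month one, and a patient with a single coded visit passes a one-visit rule but fails a two-visit rule.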
Our study also has specific implications for the training of primary care physicians. We found that primary care physicians who documented a diagnosis of depression in the medical record rarely documented the symptoms required to make that diagnosis under the DSM-IV criteria for major depression. This finding may reflect poor documentation rather than poor interviewing skills. Other studies suggest that the interviewing style of primary care physicians is related to their recognition of depression (Badger et al. 1994; Robbins et al. 1994). Training and feedback based on medical record review have been shown to increase both recognition of depression and documentation of symptoms (Linn and Yager 1980). In addition, automated office screening tools have been shown to increase recognition of depression without placing excessive demands on physicians (Zung et al. 1983; Moore, Silimperi, and Bobula 1978; Hoeper et al. 1984; Magruder-Habib, Zung, and Feussner 1990).