To develop a model for adjusting patients' reports of behavioral health care experiences on the Experience of Care and Health Outcomes (ECHO™) survey to allow for fair comparisons across health plans.
Survey responses from 4,068 individuals enrolled in 21 managed behavioral health plans who received behavioral health care within the previous year (response rate=48 percent).
Potential case-mix adjusters were evaluated by combining information about their predictive power and the amount of within- and between-plan variability. Changes in plan scores and rankings due to case-mix adjustment were quantified.
The final case-mix adjustment model included self-reported mental health status, self-reported general health status, alcohol/drug treatment, age, education, and race/ethnicity. The impact of adjustment on plan report scores was modest, but large enough to change some plan rankings.
Adjusting plan report scores on the ECHO survey for differences in patient characteristics had modest effects, but still may be important to maintain the credibility of patient reports as a quality metric. Differences between those with self-reported fair/poor health compared with those in excellent/very good health varied by plan, suggesting quality differences associated with health status and underscoring the importance of collecting quality information.
Although effective treatments exist for most mental health disorders, many individuals with mental illness receive inadequate care (McGlynn 2003). This is worrisome because mental illnesses and substance abuse impose substantial burdens on patients, employers, and the health care system (Kessler et al. 1999; Pincus and Pettit 2001; Wang, Simon, and Kessler 2003). Patients' reports about their care experiences can provide important information that is not available from other sources (Cleary and Edgman-Levitan 1997) and provide insight into access and delivery concerns that can influence patients' decisions to seek care and adhere to treatment, which can affect health outcomes (Fremont et al. 2001; Wells et al. 2004).
Increasingly, plan-level summaries of patient experiences are being used in public reports about care quality (Goldstein et al. 2001) and by accreditation agencies (National Committee on Quality Assurance [NCQA] 2005). Policy makers and payers have become interested in assessing quality by comparing the experiences of patients across behavioral health organizations in an effort to increase accountability and better inform quality improvement efforts (Eisen et al. 2001). The Experience of Care and Health Outcomes (ECHO™) survey was developed under the CAHPS® quality measurement program (Shaul et al. 2001) to assess the experiences of patients receiving behavioral health care through managed care plans, specialized managed behavioral health organizations, and other organizations (Eisen et al. 2001).
Plan differences in patient reports about aspects of care such as patient–provider interactions and customer service may be meaningful to plans engaged in quality improvement or accreditation processes, to payers and consumers choosing among plans, and to regulators and policy makers interested in determining whether plans are meeting requirements or benchmarks. However, comparisons may be misleading if scores are not adjusted for differences in the underlying characteristics of the patient populations across plans (Zaslavsky 2001a). For example, patients' health status or age may influence how patients evaluate their experiences independently of the quality of care received. Moreover, quality of care may differ between certain types of patients (e.g., younger and older patients) in a consistent way within all health plans. If plans are compared using the average score of all enrollees, relative rankings may not accurately reflect quality-of-care differences. Ideally, health plan ratings would reflect how plans would be rated by an identical patient population. Thus, it is important to control for underlying patient characteristics when calculating plan-level scores. Even if the adjustment to scores is modest, it is important to adjust scores for differences in patient characteristics to ensure both the actual and perceived fairness of the comparisons.
The goals of this study are twofold. First, we identify patient characteristics that are associated with consumer ratings of behavioral health plans. Second, we develop a statistical model to adjust plan-level summaries of patients' ratings of behavioral health care plans so that ratings more accurately reflect quality of care received. Because commercial and Medicaid plan populations may systematically differ, we develop separate models for these two plan types.
A priori, we expect self-reported mental health status, general health status, and age to be important adjusters based on prior research in the general health care sector (Zaslavsky et al. 2001b). Education, income, gender, race/ethnicity, and seeking care for alcohol or drug use may also be important case-mix adjusters and are tested.
This study used data collected during the field test of the ECHO survey (Shaul et al. 2001), as well as survey data submitted voluntarily by health plans to the National CAHPS Benchmarking Database (National CAHPS Benchmarking Database 2005). These data were combined to achieve the largest sample of ECHO survey respondents available to date. The field test data included responses of enrollees in six Medicaid plans in Minnesota and seven commercial plans in New Jersey. The NCBD data included responses of enrollees in three Medicaid plans and five commercial plans in Colorado, Florida, New York, and Ohio.
Health plan enrollees were eligible to receive the survey if they were 18 years or older, had been continuously enrolled during the previous year, and had a diagnostic or procedural code in administrative records indicating they had received behavioral health services in the preceding year. Eligible services included treatment for mental illness, personal or family problems, and/or treatment for alcohol or drug use that was provided by a specialty behavioral health care provider in an outpatient, inpatient, or partial or day treatment setting. Enrollees who received behavioral health services only in primary care settings (e.g., enrollees who received psychotropic medications from their primary care physician) were not eligible. These criteria were consistent with those used by the NCQA to compute mental health and chemical dependency utilization rates (National Committee on Quality Assurance 2000).
Sampling procedures varied across the sites. For the field test data, Minnesota pooled eligible enrollees from six Medicaid plans and drew a random sample of 2,500 individuals. In New Jersey, four commercial plans selected simple random samples of 1,400 enrollees from their eligible population, while three plans with insufficient enrollment selected all eligible enrollees. The resulting samples ranged from 842 to 1,426. For the NCBD data, in New York, 300 survey recipients were sampled in each of the two Medicaid plans, and 750 were sampled in each of the two commercial plans. In Ohio, 300 survey recipients were sampled from the larger of the two commercial plans, while 200 respondents were sampled from the smaller plan. Nearly 1,300 were sampled in a commercial plan in Florida, and 1,110 were sampled in a Medicaid plan in Colorado. In total, 14,482 behavioral health care patients were sampled. In all sites, only one plan enrollee was selected from each household. In all cases, the surveys were administered by independent survey vendors between September 2000 and February 2002. Half of the sites sent one or two mailings with telephone follow-up of nonrespondents (at least six calls), while half relied solely on mailings.
The ECHO survey is a registered CAHPS instrument that can be used to elicit reports and ratings of care experiences from patients who receive behavioral health services through a managed behavioral health plan or managed care plan (http://www.cahps.ahrq.gov). The sites used various versions of the instrument ranging from 50 to 88 items; longer versions included supplemental items about the health plan and about symptoms and functional status. Only the items that were common to all (or nearly all) of the survey instruments were used in this study.
The dependent variables used to develop the case-mix adjustment models consisted of a global rating of all counseling or treatment received in the previous 12 months, a global rating of the health plan that managed the care, and 22 questions that asked about specific experiences in four domains (timely access to care, patient–provider interaction, treatment information, and health plan approval and service). Specific questions are noted in an appendix available on the journal website. To determine plan scores for the four domains, we calculated the mean of mean responses to the questions in each domain.
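The "mean of mean responses" construction can be sketched in a few lines. The following is a minimal illustration, not the CAHPS analysis program itself; the item names and response values are hypothetical.

```python
# Sketch: a domain composite computed as the mean of per-item mean responses,
# as described above. Item names and data are illustrative only.
def domain_score(item_responses):
    """item_responses: dict mapping item name -> list of valid responses
    (missing answers already removed). Returns the mean of item means."""
    item_means = [sum(v) / len(v) for v in item_responses.values()]
    return sum(item_means) / len(item_means)

# Hypothetical "timely access to care" items on a 1-4 frequency scale.
timely_access = {
    "got_appointment_quickly": [3, 4, 4, 2],
    "got_help_when_needed":    [4, 4, 3],
}
print(round(domain_score(timely_access), 3))  # -> 3.458
```

Averaging item means (rather than pooling all responses) keeps each question equally weighted in the composite even when items differ in the number of valid responses.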
Eight variables were tested as potential case-mix adjusters: self-reported general health status; self-reported mental health status; whether the respondent's reasons for receiving counseling or treatment in the past year included getting help for alcohol or drug use; age; gender; race/ethnicity; education; and income. Income was available only for enrollees in seven commercial plans. Table 1 indicates means for these variables.
Because public reports and accreditation agencies use plan summaries to assess plan quality, we focus on the adjustment of such summary statistics. To be a relevant adjuster, a patient characteristic must be a significant predictor of scores and vary in distribution across plans. For example, if self-reported health status was not correlated with CAHPS scores, or the distribution of health status did not differ across health plans, it would not be an important case-mix adjuster. Measures of these two criteria can be combined into a summary measure of “explanatory power” (EP) that can be used to compare the relative impact of potential adjusters (Zaslavsky 1998; O'Malley et al. 2005). To determine predictive power, each individual-level rating was regressed on each adjuster in separate linear models. Dummy variables for plans were included in the predictive models, so the resulting coefficients were estimates of the within-plan effects. To measure how much the patient characteristic varied across plans, a variance ratio (the ratio of the adjuster's between-plan variance to its within-plan variance) was calculated. The EP of each adjuster relative to each rating and report item was calculated by multiplying the adjuster's predictive power (contribution to R2) by its variance ratio (Zaslavsky 1998). While predictive power refers to the contribution of the variable to predicting individual responses, EP refers to the contribution of the variable to explaining differences among mean responses for groups (plans, in this case).
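The EP calculation described above can be sketched as follows. This is a simplified illustration on toy data, not the study's code: within-plan effects are obtained by demeaning within plans (equivalent to including plan dummies for a single adjuster), predictive power is approximated by the squared within-plan correlation, and EP is that quantity times the between-/within-plan variance ratio.

```python
# Sketch of the explanatory-power (EP) measure: within-plan predictive power
# times the adjuster's between-/within-plan variance ratio. Toy data only.
def mean(xs):
    return sum(xs) / len(xs)

def explanatory_power(ratings, adjuster, plan_ids):
    plans = set(plan_ids)

    def demean_within_plans(v):
        # Subtracting plan means is equivalent to including plan dummies.
        pm = {p: mean([x for x, q in zip(v, plan_ids) if q == p]) for p in plans}
        return [x - pm[q] for x, q in zip(v, plan_ids)]

    y, x = demean_within_plans(ratings), demean_within_plans(adjuster)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    r2 = (sxy * sxy) / (sxx * syy)  # within-plan predictive power

    # Variance ratio: between-plan variance / within-plan variance of adjuster.
    grand = mean(adjuster)
    pm = {p: mean([a for a, q in zip(adjuster, plan_ids) if q == p]) for p in plans}
    between = mean([(pm[q] - grand) ** 2 for q in plan_ids])
    within = mean([(a - pm[q]) ** 2 for a, q in zip(adjuster, plan_ids)])
    return r2 * (between / within)

plan_ids = [0, 0, 0, 0, 1, 1, 1, 1]
adjuster = [1, 2, 1, 2, 3, 4, 3, 4]   # differs in level across plans
ratings  = [5, 6, 5, 6, 7, 8, 7, 8]
print(explanatory_power(ratings, adjuster, plan_ids))  # -> 4.0
```

In this toy example the adjuster predicts ratings perfectly within plans (predictive power 1) and its plan means are widely separated (variance ratio 4), so EP = 4; an adjuster with identical distributions across plans would have a variance ratio of zero, and hence EP of zero, no matter how predictive it was.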
Adjusters that were significantly associated with patient reports and had high EP were selected for the base model. Once the base model was determined, the remaining variables were retested by computing their incremental EP after controlling for adjusters in the base model.
In the initial models, linear specifications of ordinal variables were included. This specification assumes that the difference between the levels is constant (i.e., the difference in ratings between poor and fair health is the same as the difference in ratings between very good and excellent health). Two advantages of this specification are that it results in a more parsimonious model, allowing for testing of interactions of case-mix adjusters with plan once the base model is specified, and that it is consistent with other CAHPS case-mix adjustment models. The disadvantage of this specification is that it may lead to a suboptimal specification if the assumptions regarding the effect of differences in ratings are incorrect. To test the appropriateness of using linear specifications, we re-estimated our initial prediction regressions using categorical specifications of ordinal variables. We then compared the adjusted R2 of the linear specification with the adjusted R2 of the categorical specification to determine whether use of the linear specification was appropriate.
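The linear-versus-categorical comparison can be illustrated on toy data. This sketch uses a single ordinal predictor and plain (not adjusted) R2 for simplicity; the study compared adjusted R2 within full prediction models. The categorical fit reduces to per-level means, so its R2 can never fall below the linear fit's on the same data, and the question is whether the gain justifies the extra parameters.

```python
# Illustrative comparison of linear vs categorical specifications of an
# ordinal variable. Toy data; the study used adjusted R^2 in full models.
def r2_linear(y, x):
    # Simple regression of y on the ordinal level treated as numeric.
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def r2_categorical(y, x):
    # Dummy-variable fit: the fitted value for each level is its group mean.
    my = sum(y) / len(y)
    fit = {l: sum(b for a, b in zip(x, y) if a == l) /
              sum(1 for a in x if a == l) for l in set(x)}
    sse = sum((b - fit[a]) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

health  = [1, 1, 2, 2, 3, 3, 4, 4]   # e.g. 1=poor ... 4=excellent
ratings = [5, 6, 6, 7, 8, 8, 9, 10]
print(round(r2_linear(ratings, health), 3),
      round(r2_categorical(ratings, health), 3))  # -> 0.917 0.925
```

Here the categorical specification improves fit only modestly, the same pattern that led the authors to retain the more parsimonious linear specification.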
Once the final adjustment model(s) was specified, the CAHPS analysis program (http://www.cahps.ahrq.gov) was used to calculate unadjusted and adjusted plan means for the two global ratings and four composite summary measures. Two plans with fewer than 30 respondents were excluded, leaving 19 plans in the sample for the impact analysis. The unadjusted and adjusted plan scores were centered to have a mean of zero so that the impact of alternative models could be compared. The impact of adjustment on plan scores was summarized by three measures: the mean absolute adjustment, the largest positive adjustment and the largest negative adjustment. In public reports, the relative ranking of plans may be important to consumers. The impact of adjustment models on plan rankings was summarized by two measures: Kendall τ correlations and the percentage of all possible pairs of plans whose rank-order changed postadjustment. Finally, the uniformity of each adjuster's coefficient across plans was assessed by testing the interaction between the adjuster and a group (plan) variable in each model.
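The two ranking-impact measures are closely related: with no ties, Kendall τ equals 1 minus twice the fraction of plan pairs whose order switched, so a τ of 0.80 corresponds to switches in (1 − 0.80)/2 = 10 percent of pairs. A minimal sketch, on hypothetical plan scores rather than the study's data:

```python
# Sketch of the ranking-impact measures: the share of plan pairs whose rank
# order switched after adjustment, and the corresponding Kendall tau.
from itertools import combinations

def pair_switch_fraction(before, after):
    pairs = list(combinations(range(len(before)), 2))
    switched = sum(1 for i, j in pairs
                   if (before[i] - before[j]) * (after[i] - after[j]) < 0)
    return switched / len(pairs)

def kendall_tau(before, after):
    # tau = (concordant - discordant) / total pairs; assumes no tied scores.
    return 1 - 2 * pair_switch_fraction(before, after)

# Hypothetical scores for five plans; adjustment swaps plans 1 and 2.
unadjusted = [7.9, 8.1, 7.6, 8.4, 7.2]
adjusted   = [8.0, 7.9, 7.7, 8.3, 7.3]
print(pair_switch_fraction(unadjusted, adjusted))  # -> 0.1 (1 of 10 pairs)
print(kendall_tau(unadjusted, adjusted))           # -> 0.8
```

This is the same arithmetic invoked later in the Results, where τ values above 0.80 are interpreted as rank changes in fewer than 10 percent of plan pairings.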
Of the 14,482 individuals in the initial sample, 18 percent were ineligible to receive the survey because of incorrect contact information, language barriers, illness, or death. The percentage of ineligible recipients ranged from 2 to 62 percent across the six sites and was generally lower among commercial plans (2–34 percent) than among Medicaid plans (25–62 percent). Among the 11,855 eligible survey recipients, 48 percent responded. Among the 5,671 respondents, 28 percent were excluded from the analysis because they reported that they either were no longer enrolled in the plan, had been enrolled for less than the survey period of 1 year, had not received behavioral health services in the past 12 months as administrative records had indicated, or had someone else complete the survey for them (respondents who reported only receiving some help with reading, writing, or translation were not excluded). The remaining 3,067 enrollees in commercial plans and 1,001 enrollees in Medicaid plans (4,068 total) who reported receiving behavioral health services within the past 12 months comprised the analysis sample.
Data were available to compare respondents and nonrespondents in three of the six sites. Respondents were significantly more likely to be older and to be female than nonrespondents (data not shown). In Minnesota, respondents had significantly more mental health visits than nonrespondents. In New Jersey, respondents were significantly more likely to have an alcohol/drug-related visit. No other significant differences between respondents and nonrespondents were found.
In models that only controlled for plan effects, self-reported general health status and mental health status had positive, statistically significant associations with global ratings and with nearly all of the report questions in the commercial and Medicaid samples (Table 2). When tested simultaneously, mental health status was statistically significant more often than general health status, but general health status remained significant in a substantial number of the models and sometimes was significant when mental health status was not (results not shown). Consequently, both health status measures were retained for the base model.
After controlling for mental health, general health, and plan effects, older age was frequently associated with more positive ratings and reports, while having more education was often associated with less positive ratings and reports (Table 2). Although not significantly related to either of the global ratings, alcohol/drug use was frequently associated with reports about experiences, sometimes positively and sometimes negatively. Significant associations with gender and race/ethnicity were generally less frequent, although black race/ethnicity was negatively associated with several reports in the commercial sample. Income was a significant predictor of plan ratings and several reports about experiences in the commercial sample.
We next examined whether the linear specification of case-mix adjuster variables was appropriate. The increase in adjusted R2 from using a categorical specification was modest. Considering the Global Rating of Behavioral Health Care, the linear specification performed the worst for education, with a 15 percent increase in EP from a categorical specification. Considering the Global Rating of Behavioral Health Plan, for three of the five ordinal variables, the linear specification had a higher adjusted R2 than the categorical specification. The linear specification of income performed somewhat poorly relative to the categorical specification, with an increase in adjusted R2 from 0.0236 to 0.0363. Given the generally modest improvement in model fit due to categorical specifications, the desire to keep the model parsimonious to allow for plan interactions, and the desire to have a model as consistent with other CAHPS adjustment models as possible, we chose to keep the ordinal specification in the model.
In models that controlled only for plan effects, mental health status had the highest levels of EP for both the ratings and reports in the commercial and Medicaid samples with one exception (results not shown). The mean EP levels for Hispanic ethnicity were higher in both the samples, but Hispanic ethnicity was related to considerably fewer report questions compared with mental health status, so was not chosen for the base model. General health status, the other variable significantly related to nearly all the ratings and reports, had high mean levels of EP compared with most other variables, so it was also chosen along with mental health status for the base model (results not shown).
After controlling for variables in the base model, education had sizeable EP for ratings in both the samples (Table 2). Additionally, age and black race/ethnicity were important in the Medicaid sample, and income was important in the commercial sample. With respect to reports, Hispanic ethnicity had the highest mean EP in both the commercial and Medicaid samples, followed by age in the Medicaid sample and income in the commercial sample. Although the mean EP value for age was low in the commercial sample, age was significantly related to half of the report questions. Alcohol/drug use had the third highest mean EP level in both the Medicaid and commercial samples. Gender was important in the Medicaid sample, but only modestly influential in the commercial sample. Education had relatively low levels of mean EP, but was significantly associated with a relatively high number of reports in each sample. For reports, black race/ethnicity was more important than age and gender in the commercial sample and more important than education in the Medicaid sample.
The final adjustment model included mental and general health status, education, age, black race, Hispanic ethnicity, and alcohol/drug use. Alternative models adding income for the commercial sample and gender for the Medicaid sample were also evaluated.
Table 3 compares unadjusted and adjusted scores for the six reporting measures. On average, the effects of the case-mix adjustment on plan scores generally appear to be modest. For example, for the Global Rating of Behavioral Health Care, which has the largest adjustment, the mean absolute adjustment was 0.08 in the commercial sample and 0.10 in the Medicaid sample. For some plans, adjustment is important. For example, the largest positive adjustment was 0.2, which is equivalent to the mean difference between commercial and Medicaid plans. When considering changes in plan rankings, the effect was greater, although still modest. Most Kendall τ correlations were above 0.80, indicating that adjustment altered plan rankings in <10 percent of the possible pairings between plans ([1−0.80]/2). Slightly greater percentages of plan pairs (13–14 percent) switched their internal ranked order for the Global Rating of Behavioral Health Plan and Plan Approval and Service in the Medicaid sample.
Adding gender to the model for the Medicaid plans resulted in no substantial change in the average magnitude of adjustments for the ratings and report composites (results not shown). However, changes in the largest positive and negative adjustments led to an increase in the percentage of pairs that switched order for the Global Rating of Behavioral Health Care (from 0 to 5 percent) and the Global Rating of Behavioral Health Plan (from 13 to 20 percent). Similarly, adding income to the model did not have much impact on scores for the commercial plans. The mean absolute adjustment and the largest positive and negative adjustments were generally greater with income in the model. However, the adjustments increased the percentage of plan pairs that switched order only in the case of Timely Access to Care, and the increase was modest (from 0 to 5 percent).
Plan interactions with mental and general health status were statistically significant in the models predicting Global Rating of Behavioral Health Care, and only with mental health status for Global Rating of Behavioral Health Plan (results not shown). In addition, seven plan-by-mental-health interactions, and eight plan-by-general-health interactions, were statistically significant in the models predicting the 22 report questions. In contrast, fewer plan interactions with the demographic characteristics were statistically significant—three with age, two with gender, two with black race/ethnicity, four with Hispanic race/ethnicity, two with education, and two with income. In 24 tests, one or two interactions would be expected by chance.
If we assume that the effect of health status on individual reports is consistent across plans independent of actual experience with the plan, the significant interaction effects of plan and health status imply that there are quality differences among health plans. To illustrate the nature of plan differences, we stratified plan scores for Global Rating of Behavioral Health Care by mental health status (Figure 1). In nearly all the plans, individuals self-reporting excellent or very good mental health status rated their care more highly than individuals self-reporting fair or poor mental health status. However, the difference between the average rating given by respondents in the “poor/fair” category and respondents in the “very good/excellent” category ranged from 0.0 to 2.6 points on the 0–10 response scale. To put this in context, the overall average rating was about 7.9. Mean ratings among the sickest respondents varied more across plans than mean ratings among the healthiest respondents (SD=0.72 and 0.29, respectively).
Measures of medical care processes, including patient reports about their care experiences, are now widely used by consumers, purchasers, and accreditation organizations (NCQA 2005). When comparing plans using patient reports of care experiences, it is important to adjust for patient characteristics that affect scores but are unrelated to differences in quality of care. In this study, we use data from more than 4,000 patients who had received care in 21 behavioral health care plans to develop a statistical model for adjusting scores for patient characteristics and examine the effects of adjustments using this model on plan scores and relative rankings.
Consistent with other CAHPS studies (Zaslavsky et al. 2001b; Kim, Zaslavsky, and Cleary 2005; O'Malley et al. 2005), the average impact of case-mix adjustment on plan scores for ratings and reports collected from behavioral health care patients was modest. Adjustments did change plan rankings in a few cases for both the commercial and Medicaid plans, with adjustments typically being larger for the Medicaid plans. For a few individual plans, the change in some summary scores was large.
Although the effects in these data are modest, adjustments would be larger in groups of plans with greater interplan heterogeneity in patient characteristics. Whether the impact is large or small, case-mix adjustment may still be important to maintain the credibility of patient reports as a quality metric. In the absence of case-mix adjustment, plans that believe their patients have worse health status than patients in other plans may believe summary scores are suspect and be reluctant to rely on them as a quality indicator. The fact that case-mix adjustment can have a meaningful effect on plan scores and rankings, and requires only a small amount of information that is typically collected for other purposes such as subgroup analyses, makes it worthwhile to carry out. Doing so can preserve the face validity of the results for plans who might otherwise argue that their patients are more severely ill than is typical.
This study had several limitations. The sample consisted of plans that participated in the field test of the ECHO survey or voluntarily submitted their survey data to the National CAHPS Benchmarking Database. We do not know how well the findings of this study generalize to other plans. We suspect, however, that the main difference between participating and nonparticipating plans would be in the average scores, rather than in different patterns of associations. Response rates varied considerably. Limited analyses comparing respondents and nonrespondents indicated that younger individuals and men may be underrepresented in the sample, patterns that have been found in other behavioral health studies (Rosenheck, Wilson, and Meterko 1997).
Although prospective studies also have shown that health status is associated with care experiences, a potential problem with case-mix adjustment based on self-reported health status is that the quality of behavioral health care received may lead to changes in health status. By controlling for health status when reporting scores, plans may fail to benefit from improving the health of their enrollees. While this effect may be real, the average change in health status due to differences in quality of care is likely to be small, relative to the underlying health status of the individual.
As expected, the self-reported health status measures were the strongest and most consistent predictors of ratings and reports among the personal characteristics included in this study. Mental health status was frequently a strong predictor, as in other studies (Zaslavsky et al. 2001b), although general health status remained important in several cases after controlling for mental health status. This association may be due to general reporting tendencies that are associated, for instance, with general life satisfaction (Rohland, Langbehn, and Rohrer 2000) or with effects of mental illness on mood and perception. Patients in worse mental health may also receive lower quality of care than patients in better mental health. Providers are likely to have more difficulty communicating with patients who are distressed and may tend to unconsciously convey negative attitudes and behaviors toward patients in poor mental health (Hall, Milburn, and Epstein 1993). The chronic and recurring nature of many behavioral health conditions, and the uncertainty involved in determining the best treatment strategy for a particular patient, increase the likelihood that multiple treatment approaches will be attempted before symptoms are relieved, which may cause dissatisfaction with care.
Differences in reported care experiences between patients who received treatment for alcohol or drug use and other patients in the sample may reflect different treatment patterns for these two groups of conditions. For instance, those being treated for alcohol/drug use were more likely to report that providers had discussed different kinds of counseling or treatment and whether to include family and friends. In contrast, they gave less favorable reports about whether providers listened carefully, explained things, and showed respect for what they had to say. Furthermore, these respondents were less likely to report feeling confident that information about them was kept private by providers and more likely to report experiencing problems with delays in treatment while waiting for plan approval. Given the differences in these scores, and the types of treatments received, users of patient reports of care might consider separate ratings for individuals who report treatment for alcohol or drug use. In the current study, this group was too small to be analyzed separately.
As in other studies, older respondents tended to give more positive ratings and reports than younger respondents (Zaslavsky et al. 2001b). Older respondents may have lower expectations regarding their care and/or more respect for providers, leading them to give more positive reports and ratings. Older patients also may receive better care than younger patients if, for instance, providers tend to be more attentive to older patients (Cleary et al. 1992). Education was negatively associated with positive ratings. Because it is unlikely that respondents with more education receive worse care than respondents with less education, the findings are consistent with the hypothesis that highly educated respondents have higher expectations regarding their care, which result in less favorable assessments. The mixed findings related to income are consistent with other studies (Dow, Boaz, and Thornton 2001; Heflinger et al. 2004). Although a substantial number of respondents are reluctant to provide income data, an income item with coarse categories may be useful for case-mix adjustment when the income distributions of plan populations vary greatly. Because the effects of race/ethnicity were not the same in commercial and Medicaid plans, it may be important to estimate separate models for the two types of plans (as we do here) and make comparisons only among plans of the same type.
The regression models used to control for case-mix differences assume that the effects of adjusters are equal within each plan. While the effects of most demographic adjusters were similar across plans, the effects of mental health status varied across plans for several rating and report questions. Therefore, the adjusted plan scores, and possibly the rankings of plans based on those scores, would depend on whether the plans were compared with respect to how patients with poor, average, or excellent mental health status would rate them (Zaslavsky 1998). Reporting separate summaries for patients in relatively poor and good health (provided that sample sizes for these groups were sufficient within each plan) may be important for identifying performance differences among plans.
In summary, mental health status, general health status, alcohol/drug use, age, education, and race/ethnicity were identified as relevant case-mix adjusters for the ECHO survey, although the case-mix adjustment model resulted in only modest changes to plan ratings and rankings.
Joint Acknowledgement/Disclosure Statement: This study was supported with funding from grants provided by the National Institute of Mental Health (T-32 MH 19733 and K01 MH66109) and the Alfred P. Sloan Foundation (98-12-7). We thank the Executive Research Committee of the National CAHPS Benchmarking Database (NCBD) for granting access to the ECHO survey data. The NCBD is funded by the U.S. Agency for Healthcare Research and Quality and is administered by Westat under Contract No. 290-01-0003. We are not aware of any financial or advocacy-related conflicts of interest.
Disclaimers: No disclaimers are required.
Additional supporting information may be found in the online version of this article:
ECHO™ Global Ratings and Reports.
Reviewer Appendix A: Comparisons of Linear and Categorical Specifications of Ordinal Variables.
Please note: Blackwell Publishing is not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.