The collection of health plans commonly referred to as “managed care” has come to include an astonishing variety of forms. Although a few are tightly integrated prepaid group practices, a much larger number reflect the complex mixes of associations of clinicians and institutions into provider groups and insurers that face myriad, sometimes conflicting, incentives and employ widely disparate information systems. Managed care plans also differ in the mix of prepaid and fee-for-service patients they enroll and the associated payor sources with which they must interact.
Given this heterogeneity, it is difficult to meaningfully compare the quality of managed care plans as a group to fee-for-service plans or to assess the relative performance among types of managed care plans. (In a similar vein, “fee-for-service” is no longer just represented by the solo practitioner paid directly by patients who then might be reimbursed by their insurers.) To begin to understand why there is so much variability in the reported quality among plans and providers one needs to delve below simple plan-level labels and address the enormous diversity in plan and provider structure and function. Furthermore, plan heterogeneity makes it generally true that if you have seen one health plan … you have seen one health plan. In fact, one may even need to ask which aspect of that one health plan you have seen. Therefore, to reach generalizable findings, it may make sense to have coordinated research initiatives that ensure that similar questions are asked and answered across a range of plans and settings.
In 1998, the Agency for Healthcare Research and Quality (AHRQ), the American Association of Health Plans (AAHP), and the Health Resources and Services Administration (HRSA) developed a collaborative initiative to examine the impact of various features of managed care on the management of chronic disease.1 A joint request for applications was issued, from which proposals were selected using a cooperative agreement format. While the project methods were not uniform across sites (as they would have been in a centrally coordinated multicenter, randomized trial), the hypotheses tested across the seven projects awarded were related. The goal of this approach was to stimulate research that explicitly would consider individual care management features of plans and providers, measured at various levels (plans, the business units of providers, and even the specific physical sites).
A key feature of interest was the specifics of the payment arrangements for individual physicians, some of whom worked in medical groups that were paid by the health plans, while others had direct contracts with the plans or with intermediary individual practice associations (IPAs). Furthermore, the number of plans with which these provider groups had contracts could vary from one to many, and the proportion of patients seen on a fee-for-service basis could also vary enormously. A second key feature was the breadth of the provider networks offered by the plans, an issue that might affect both the attractiveness of a plan to enrollees with chronic illnesses and the ways in which those chronic illnesses were managed for patients in the plan. A third area of interest was the set of managerial approaches used by the plans (or medical groups) to select clinicians; to monitor or profile their practice patterns; and to develop, disseminate, and encourage adherence to practice guidelines. One could hypothesize various ways in which all these factors would affect resource use, processes of care, patient outcomes, patient assessments of care, and satisfaction. The different incentives arising from the various payment arrangements could be hypothesized to alter resource use by clinicians and across clinicians in response to the need for referrals, but here networks would also play a role. Likewise, plan incentives and techniques to promote clinical guidelines might be affected by the extent to which their providers dealt with multiple plans. Ideally, the projects would have used similar definitions of these plan and provider components and have examined overlapping conditions and clinical measures to allow direct comparisons across projects.
From this very cursory description of the major factors and potential linkages among them, it should be apparent that an integrated set of projects to collect and analyze the necessary data would probably rival the Human Genome Project (HGP) in scope. Suffice it to say that the funding available was a minuscule fraction of that of the HGP. This meant not just a far smaller scale, but also a need to do the best that one could with the available resources. In some instances, projects were able to offer much greater depth of analysis by piggybacking on ongoing data collection efforts, but at the cost of having to use predetermined variables and categories. In other situations, a project could offer a very “tight” analytic design, but this could only be done with conditions that did not overlap with those of other projects. Furthermore, unlike the HGP, which could focus on a set of DNA samples that were fixed and disseminated for collective analysis, these projects often needed to collect data from plans and providers caught in the midst of the managed care backlash of the late 1990s. Thus, the analytic findings of the various projects are beginning to appear in various venues or are in progress. Because these reports will continue to enter the literature for some time, an updated list can be found at the AHRQ website (http://www.ahrq.gov/research/managix.htm). In addition to such “primary” findings, the projects have developed various methodological insights that are of interest not only to the study of managed care, but to issues of quality measurement, patient and consumer surveys, and complex study design.
In this issue, we bring together from these projects papers addressing key methodological issues raised as one attempts to measure and analyze performance in such complex systems (Adams et al. 2003; Kahn et al. 2003; Keating et al. 2003; Lozano et al. 2003; Shenkman et al. 2003; Stuart et al. 2003). Not surprisingly, the investigators have identified several important barriers to accurately assessing performance. Keating et al. (2003) and Stuart et al. (2003) use data generated primarily by providers (administrative data and abstractions of providers' charts) and find that the source and availability of data used to measure quality have implications for apparent performance. The medical record is often considered the “gold standard” for documenting what was done to (for) a patient. This may well be reasonable in the hospital setting; although things are sometimes done but not recorded, these are likely to be chance events and less likely to be significant. In the context of patients with chronic conditions who may receive their care from a wide variety of providers, examining just the chart of the principal physician may systematically omit certain types of treatment. Billing information in such settings may be more complete than the primary charts, but it, too, may have consistent biases, especially for services that may not be reimbursed. Thus, not only is there no “best” data system across services and conditions, but the strengths and weaknesses of each data system are likely to vary across plans. While integrated delivery systems are more likely to have consolidated medical records, they may be far less able to access data from services provided outside the core providers of the plan. Furthermore, the quality of the data systems may vary in idiosyncratic ways across health plans of a given type.
These issues are probably more significant for studies of chronic illness than acute care, since patients with chronic illness often see many providers over time and their experiences with care are spread out. This increases the probability that chart review alone, when performed on the chart of the primary care provider or dominant provider, will understate quality or create the need to abstract multiple charts per patient.
Given these results, one might decide to seek data directly from patients instead of from providers. (In the post-HIPAA environment this may also appear to be attractive.) However, Kahn et al. (2003) find that this approach is hampered by poor response rates in any one survey wave and the fact that response rates decline over time in longitudinal cohorts of patients. Furthermore, there may be bias in this dropout rate, as patients who have lower income, worse health status, or who are less satisfied with their care at the initial survey have disproportionately lower subsequent response rates. Thus, longitudinal patient surveys may be limited by loss of those patients who are of perhaps the greatest policy interest. Furthermore, there appear to be differences across health plans in the likelihood that people will respond over time, further biasing estimates of quality and process. If one attempts to take account of these response biases by weighting the regression results, the estimated effects of certain variables differ, further indicating that methods matter. Kahn and colleagues also point out that the high rate of apparent “gender switching” from one survey wave to the next suggests that many questionnaires are filled out by someone else in the household, even when the instructions are quite explicit that the patient is to fill it out. This finding has implications for all types of survey work, not just longitudinal ones.
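The logic of nonresponse weighting can be made concrete with a small sketch. The data and strata below are entirely hypothetical and are not drawn from Kahn et al.; the sketch simply shows how weighting wave-2 respondents by the inverse of their estimated response propensity restores the influence of groups, such as less-satisfied patients, who drop out disproportionately.

```python
# Illustrative sketch of nonresponse weighting (hypothetical data).
# Respondents are weighted by 1 / Pr(response | stratum), so that strata
# prone to dropout are not underrepresented in follow-up estimates.

# Hypothetical wave-1 cohort: (satisfaction score, responded at wave 2?)
cohort = [
    (9, True), (8, True), (9, True), (7, True),
    (3, True), (2, False), (4, False), (3, False),
]

def propensity(satisfied):
    """Estimated response rate within a stratum (satisfied >= 5 or not)."""
    stratum = [r for s, r in cohort if (s >= 5) == satisfied]
    return sum(stratum) / len(stratum)

# Weight each wave-2 respondent by the inverse of its stratum's response rate.
weighted = [(s, 1.0 / propensity(s >= 5)) for s, r in cohort if r]

# The unweighted respondent mean overstates satisfaction; the weighted mean
# restores the dissatisfied stratum's original share of the cohort.
unweighted_mean = sum(s for s, _ in weighted) / len(weighted)
weighted_mean = sum(s * w for s, w in weighted) / sum(w for _, w in weighted)
print(unweighted_mean, weighted_mean)
```

Because the single responding low-satisfaction patient stands in for four such patients at wave 1, that respondent receives a weight of four, pulling the weighted mean below the naive respondent mean.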
Shenkman et al. report that the difficulties of sample selection and data collection are further exacerbated when the focus is on children. In the pediatric population, each individual chronic disease is rare, so that using typical approaches to defining a patient category (e.g., “all patients with diagnosis code XXX.X”) does not yield a large enough population among which performance can be evaluated (Shenkman et al. 2003). Rather, the approach taken by some policymakers is to identify pediatric patients by their needs instead of their diagnoses. One is left, then, attempting to assess the performance of health plans and the impact of health plan characteristics on broad populations such as “children with special health care needs.” Although all such children may have significant problems, the specific diagnoses may have important effects on the ability of practitioners to provide what is, or is perceived to be, quality of care. Since there is little reason to expect that the mix of diagnoses within the category of children with special health care needs is the same across plans or provider groups (in fact, one would expect differences across provider groups based on their expertise), then assessments of performance will be influenced by these unmeasured differences in case mix.
Although there is general agreement that quality measures should include patient outcomes, perceived functional status, satisfaction with care processes, and adherence to process guidelines, there has been less focus on the level—health plan or medical group or individual provider or some combination of these—at which to measure quality. Most of the resources devoted to quality measurement and the development of quality indicators to date have been directed at health plan assessment (e.g., the Healthplan Employer Data and Information Set [HEDIS] and the Consumer Assessment of Health Plans Survey [CAHPS]). In the special section, however, Lozano et al. (2003) show that much of the variability in performance is at the clinic level, even within a plan.
Moreover, there is clearly no single conceptually correct level for analysis. In some instances, the variables and outcomes of interest are best conceptualized at the health plan level because they deal with issues of coverage or coordination. In other instances, the clinic level may be most relevant for various reasons. For example, if the goal of the research is to identify factors or processes that may be improved, then the focus should be at the organizational level at which they take place. Similarly, Lozano et al. show that the apparent differences in performance across plans in the quality delivered to certain racial groups can be partially explained by the clinics used by those subpopulations.
This raises yet another issue. To a far greater extent than is the case in acute inpatient care, the quality of chronic care involves patient actions in a wide variety of ways, from changing risk factors and behaviors, to adherence to prescription drug regimens and diet, to perceptions of processes and quality. If a health plan (or provider group) happens to enroll a population that is especially nonadherent, should it be “blamed” for the poor outcomes? Alternatively, should plans and providers have no responsibility for learning how to tailor messages and recommendations so they can be understood by a wide range of patients?
These issues are surely difficult, but are they completely intractable? Fortunately, the studies coming out of this initiative also offer some instruction for how to proceed in this often-messy environment. Stuart et al. (2003) find that the specific construction of the quality indicator is important, and that proportional measures may be more robust to missing data. Keating et al. (2003) provide a good rationale for using data from multiple sources when assessing performance and show that this approach can yield much more complete data (at least in the current information system context) than relying on a single source. Kahn et al. (2003) have suggestions for improving the use of patient survey data, including using nonresponse weighting and explicitly reporting response rates and characteristics tied to nonresponse. The data offered by Lozano and colleagues imply that measuring performance at the provider level is crucial, even if it is to be subsequently aggregated to the plan level. On the predictor variable side, Shenkman and colleagues (2003) demonstrate that data reduction techniques (in particular, principal components and factor analysis) can be used to parse the impact of health plan factors from patient characteristics on outcomes such as the need for an emergency room visit or inpatient admission.
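The data reduction idea attributed above to Shenkman and colleagues can be illustrated with a minimal principal components sketch. The two plan-level variables and all numbers below are hypothetical stand-ins, not their actual measures; the point is simply that correlated plan characteristics can be collapsed into a single composite score before being related to outcomes.

```python
# Minimal principal components sketch (hypothetical data): two correlated
# plan-level variables, e.g. network breadth and guideline-dissemination
# effort, are reduced to one composite score per plan.
import math

# Hypothetical centered plan measures: (network breadth, guideline effort)
plans = [(-1.2, -0.9), (-0.4, -0.6), (0.1, 0.3), (0.5, 0.2), (1.0, 1.0)]

n = len(plans)
mx = sum(x for x, _ in plans) / n
my = sum(y for _, y in plans) / n

# Entries of the 2x2 sample covariance matrix
sxx = sum((x - mx) ** 2 for x, _ in plans) / (n - 1)
syy = sum((y - my) ** 2 for _, y in plans) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in plans) / (n - 1)

# Leading eigenvalue/eigenvector of a symmetric 2x2 matrix (closed form)
lam = ((sxx + syy) + math.hypot(sxx - syy, 2 * sxy)) / 2
v = (sxy, lam - sxx)                      # unnormalized eigenvector
norm = math.hypot(*v)
v = (v[0] / norm, v[1] / norm)

# Composite score for each plan, and the share of variance PC1 captures
scores = [(x - mx) * v[0] + (y - my) * v[1] for x, y in plans]
share = lam / (sxx + syy)
print(f"PC1 captures {share:.0%} of total variance")
```

When the two measures move together, as here, a single component captures most of the variance, and the composite score can replace both variables as a predictor, reducing collinearity in models of outcomes such as emergency room use.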
Finally, Adams et al. (2003) note that simulating the impact of different study designs could improve research. They focus on using simulation to achieve more efficient sampling designs. One could also use this approach, moreover, combined with plausible assumptions about the issues at hand (and sensitivity analyses), to do simulations before engaging in research to determine which projects and types of information are most likely to generate useful information. Indeed, this strategy could be extrapolated to selecting approaches to data collection for performance reporting itself.
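A toy Monte Carlo exercise shows the kind of design question such simulation can answer before any data are collected. The variance components and sample allocations below are assumed for illustration only and do not come from Adams et al.: with patients clustered within clinics, the same total sample size yields very different precision depending on how it is spread across clinics.

```python
# Hedged design-simulation sketch (all parameters hypothetical): compare
# the precision of a quality estimate under two allocations of the same
# total sample when patients are clustered within clinics.
import random
import statistics

random.seed(1)
CLINIC_SD = 0.5       # assumed between-clinic variation
PATIENT_SD = 1.0      # assumed within-clinic variation

def simulate_mean(n_clinics, per_clinic):
    """Draw one simulated study: clinic effects plus patient noise."""
    obs = []
    for _ in range(n_clinics):
        clinic = random.gauss(0, CLINIC_SD)
        obs += [clinic + random.gauss(0, PATIENT_SD) for _ in range(per_clinic)]
    return statistics.mean(obs)

def se_of_design(n_clinics, per_clinic, reps=2000):
    """Empirical standard error of the estimate across replications."""
    return statistics.stdev(simulate_mean(n_clinics, per_clinic) for _ in range(reps))

# Same total n = 200 patients, allocated two ways
se_few = se_of_design(n_clinics=5, per_clinic=40)
se_many = se_of_design(n_clinics=40, per_clinic=5)
print(f"SE with  5 clinics x 40 patients: {se_few:.3f}")
print(f"SE with 40 clinics x  5 patients: {se_many:.3f}")
```

Under these assumed variance components, spreading the sample across more clinics produces a markedly smaller standard error, the sort of finding that can steer sampling design, and, by extension, the choice of data collection strategies for performance reporting, before resources are committed.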
Taken together, the research reported in this special section represents an important step forward in our understanding of performance assessment. Subsequent analyses and publications from these studies will add to our knowledge about the specific importance of various factors in explaining variations in quality of care for chronic illness in managed care settings. There are many issues still unresolved, however. Given the complexity of the remaining issues, it is likely that independent, small, inexpensive studies will offer only very limited answers, and such studies may produce conflicting results over time as they will be unlikely to account for the multiple layers in the system. As an alternative, the developers of the initiative from which the papers in this special section arose sought a series of projects, but did not require that the studies be tightly coordinated, in part because in 1998 the conceptual models were not available to specify all the pieces of a single jigsaw puzzle. Furthermore, as substantial as it was, the funding was far short of what one would need to pull together all the pieces. In fact, the set of projects seems to provide selected pieces from different puzzles. However, their methodological contributions go a long way toward offering insights into what would be needed in a coordinated approach: one that takes account of the range of managed care systems, the various layers and players involved, and the challenges of determining the effects of biases in the types of data that can be collected.
1The Robert Wood Johnson Foundation provided additional support for one project after the design of the overall initiative.