Measuring and reporting patients' experiences with health plans has been routine for several years. There is now substantial interest in measuring patients' experiences with individual physicians, but numerous concerns remain.
The Massachusetts Ambulatory Care Experiences Survey Project was a statewide demonstration project designed to test the feasibility and value of measuring patients' experiences with individual primary care physicians and their practices.
Cross-sectional survey administered to a statewide sample by mail and telephone (May–August 2002).
Adult patients from 5 commercial health plans and Medicaid sampled from the panels of 215 generalist physicians at 67 practice sites (n=9,625).
Ambulatory Care Experiences Survey produces 11 summary measures of patients' experiences across 2 domains: quality of physician-patient interactions and organizational features of care. Physician-level reliability was computed for all measures, and variance components analysis was used to determine the influence of each level of the system (physician, site, network organization, plan) on each measure. Risk of misclassifying individual physicians was evaluated under varying reporting frameworks.
All measures except 2 achieved physician-level reliability of at least 0.70 with samples of 45 patients per physician, and several exceeded 0.80. Physicians and sites accounted for the majority of system-related variance on all measures, with physicians accounting for the majority on all “interaction quality” measures (range: 61.7% to 83.9%) and sites accounting for the largest share on “organizational” measures (range: 44.8% to 81.1%). Health plans accounted for negligible variance (<3%) on all measures. Reporting frameworks and principles for assuring misclassification risk ≤2.5% were identified.
With considerable national attention on the importance of patient-centered care, this project demonstrates the feasibility of obtaining highly reliable measures of patients' experiences with individual physicians and practices. The analytic findings underscore the validity and importance of looking beyond health plans to individual physicians and sites as we seek to improve health care quality.
The past dozen years have seen unprecedented investments in developing and testing health care quality measures. Yet, by most accounts, we remain in the very early stages of these endeavors.1,2 The Institute of Medicine (IOM) report Crossing the Quality Chasm highlighted the challenges of achieving a truly outstanding health care delivery system.3 Patient-centeredness was 1 of 6 priority areas identified as having vastly inadequate performance. With this, patient-centered care went from being a boutique concept to a widely sought goal among health care organizations nationwide.
These events heightened the importance of patient surveys as essential tools for quality assessment. A well-constructed survey offers a window into patient experiences that is otherwise unavailable. Surveys measuring patients' experiences with health plans are already widely used, with results reported annually through the National Committee for Quality Assurance and the Centers for Medicare and Medicaid Services. However, the limitations of measuring quality only at the health plan level have become clear.4–6 In the late 1990s, several initiatives began developing surveys measuring patients' experiences with medical groups and then, increasingly, with individual physicians. The Massachusetts Ambulatory Care Experiences Survey (ACES) Project, a statewide demonstration project conducted in 2002 to 2003, is the most extensive such effort completed to date.
The ACES Project involved a collaboration among 6 payers (5 commercial health plans and Medicaid), 6 physician network organizations, and the Massachusetts Medical Society working through Massachusetts Health Quality Partners. The project sought to address 3 principal concerns related to the feasibility and merit of this form of measurement: (1) ascertaining sample sizes required to achieve highly reliable physician-specific information; (2) evaluating the need for plan-specific samples within physicians' practices; and (3) establishing the extent to which patients' experiences are influenced by individual physicians, practice sites, physician network organizations, and health plans.
Sample frames were provided by the 6 payer organizations. Files from the 5 commercial plans included all adult members in the plan's HMO product, and for each member, provided a primary care physician identifier. Because the Medicaid Primary Care Clinician (PCC) Plan registers members to sites, not physicians, Medicaid provided primary care site identifiers for each member. A linked sampling file was created using available physician and site identifiers.
Because a principal objective of the study was to estimate plan-physician interaction effects, physicians had to have at least 75 patients in each of two or more commercial plans to be eligible. Practice sites were eligible if they had at least 2 adult generalist physicians, and if at least two-thirds of the site's physicians were eligible. Because the majority of adult generalists in Massachusetts are in settings with at least 4 physicians (81%) and accept multiple payers, these criteria caused few exclusions except among physicians in solo practice (6.9%) and in two-to-three physician practices (12.1%) wherein one or more physicians had an extremely limited panel. We worked with leadership at each of the 6 Physician Network Organizations to identify eligible practice sites statewide.
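The eligibility rules above can be expressed as a short filter. The following is a minimal sketch, not the project's actual procedure: the data layout (a mapping from site to physician to per-plan patient counts) and all identifiers such as `site_a` and `plan_x` are hypothetical, and only the thresholds (75 patients, 2 plans, 2 physicians, two-thirds) come from the text.

```python
# Thresholds taken from the eligibility criteria described in the text.
MIN_PATIENTS_PER_PLAN = 75
MIN_PLANS = 2
MIN_PHYSICIANS_PER_SITE = 2


def physician_is_eligible(plan_counts):
    """plan_counts maps a commercial plan name to the physician's patient count in it."""
    qualifying = [plan for plan, n in plan_counts.items() if n >= MIN_PATIENTS_PER_PLAN]
    return len(qualifying) >= MIN_PLANS


def eligible_sites(panels):
    """panels maps site -> physician -> plan_counts (a hypothetical layout)."""
    result = []
    for site, physicians in panels.items():
        n_total = len(physicians)
        n_eligible = sum(physician_is_eligible(c) for c in physicians.values())
        # "At least two-thirds eligible", compared in integers to avoid rounding error.
        if n_total >= MIN_PHYSICIANS_PER_SITE and 3 * n_eligible >= 2 * n_total:
            result.append(site)
    return sorted(result)


# Hypothetical example panels.
panels = {
    "site_a": {
        "dr_1": {"plan_x": 120, "plan_y": 80},
        "dr_2": {"plan_x": 90, "plan_y": 75},
        "dr_3": {"plan_x": 200, "plan_y": 40},
    },
    "site_b": {"dr_4": {"plan_x": 300, "plan_y": 150}},
    "site_c": {
        "dr_5": {"plan_x": 100, "plan_y": 10},
        "dr_6": {"plan_x": 80, "plan_y": 95},
    },
}
print(eligible_sites(panels))
```

In this invented example, a solo practice fails the two-physician rule, and a two-physician site with one limited panel fails the two-thirds rule, which mirrors the kinds of exclusions the text reports.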
We randomly sampled 200 commercially insured patients from each physician's panel, stratifying by health plan, with plan allocations as equal as possible within physician samples. Most physician samples drew patients from 2 commercial plans, but some panels afforded sample from 3. Primary Care Clinician enrollees were selected by identifying sites from the sample that had at least 200 PCC enrollees and drawing a starting sample of approximately 100 PCC enrollees per site. The study's starting sample (n=44,484) included 39,991 commercially insured patients from the practices of 215 generalist physicians at 67 sites, and 4,493 Medicaid PCC enrollees from 47 of these sites (see Fig. 1).
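Allocating a physician's 200-patient sample "as equally as possible" across plans can be sketched as follows. This is one plausible reading of the stratification rule, not the project's documented algorithm: the handling of plans with too few patients (their shortfall is redistributed to the remaining plans) is an assumption, as are the function names and the example panel.

```python
import random


def allocate_equally(total, capacities):
    """Split `total` across strata as evenly as possible without exceeding
    any stratum's capacity. `capacities` maps stratum -> available count."""
    allocation = {s: 0 for s in capacities}
    remaining = total
    open_strata = sorted(s for s in capacities if capacities[s] > 0)
    while remaining > 0 and open_strata:
        share, extra = divmod(remaining, len(open_strata))
        for i, s in enumerate(open_strata):
            want = share + (1 if i < extra else 0)
            take = min(want, capacities[s] - allocation[s])
            allocation[s] += take
            remaining -= take
        # Strata that are exhausted drop out; any shortfall is re-split among the rest.
        open_strata = [s for s in open_strata if allocation[s] < capacities[s]]
    return allocation


def draw_physician_sample(panel_by_plan, total=200, seed=0):
    """Simple random sample within each plan stratum of one physician's panel."""
    rng = random.Random(seed)
    capacities = {plan: len(ids) for plan, ids in panel_by_plan.items()}
    allocation = allocate_equally(total, capacities)
    return {plan: rng.sample(panel_by_plan[plan], k) for plan, k in allocation.items()}


# Hypothetical panel: patient identifiers grouped by commercial plan.
panel = {
    "plan_x": [f"x{i}" for i in range(400)],
    "plan_y": [f"y{i}" for i in range(60)],
}
sample = draw_physician_sample(panel)
print({plan: len(ids) for plan, ids in sample.items()})
```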
The ACES administered to study participants was developed for the project, drawing substantially upon existing validated survey items and measures.4,7,8 The survey's conceptual model for measuring primary care corresponds to the IOM definition of primary care.9 The Ambulatory Care Experiences Survey extends beyond previous similar measures of these domains in ways suggested by our National Advisory Committee, which included national and local organizations representing physicians, patients, purchasers, health plans, and accreditors. The survey produces 11 summary measures covering 2 broad dimensions of patients' experiences: quality of physician-patient interactions (communication quality, interpersonal treatment, whole-person orientation, health promotion, patient trust, relationship duration) and organizational features of care (organizational access, visit-based continuity, integration of care, clinical team, and office staff). All measures range from 0 to 100 points, with higher scores indicating more favorable performance. The initial survey question named the primary care physician indicated by the respondent's health plan and asked the respondent to confirm or correct this. For Medicaid, respondents confirmed or corrected the primary care site, and then wrote in their primary care physician's name.
Data were obtained between May 1 and August 23, 2002 using a 5-stage survey protocol involving mail and telephone.10 The protocol included an initial survey mailing, a reminder postcard, a second survey mailing, a second reminder postcard, and a final approach to all nonrespondents using either mail (commercially insured) or telephone (Medicaid). For Medicaid enrollees, mailings included both English and Spanish materials, and telephone interviews were available in both languages. Costs per completed survey were $10 for commercial and $24 for Medicaid.
This protocol yielded 12,916 completed surveys (11,416 commercial, 1,304 Medicaid) for a response rate of 30% after excluding ineligibles (n=1,716). On average, 58 completed questionnaires per physician were obtained (SD=14.4). Nonresponse analyses are summarized here, with further detail provided in supplementary material (available online at http://www.blackwell-synergy.com/doi/suppl/10.1111/j.1525-1497.2005.00311.x/suppl_file/JGI+311+Supplementary+Material.doc). As is customary in survey research, nonresponse was higher among younger, poorer, less educated, and nonwhite individuals.11–13 To evaluate whether nonresponse threatened the data integrity for physician-level analysis, we evaluated whether the nature or extent of nonresponse differed by physician. Using U.S. Census block group-level data for the full starting sample (n=44,484), we evaluated whether the factors associated with individuals' propensity to respond (e.g., age, race, education, socioeconomic status) differed by physician. There were no significant interactions between propensity to respond14 and physician, signifying that the lower propensity of certain subgroups applied equally across physicians. To evaluate whether differential response rates among physicians threatened data integrity, we evaluated whether the timing of survey response differed by physician. Previous research suggests that late responders report more negative experiences than initial responders, and that late responders' characteristics and experiences approximate those of nonrespondents.15–17 While some evidence suggests that early and late responders are equivalent,18–20 no research has found more favorable views among late responders. We found no significant interactions between stage of survey response and physician (P>.20).
Overall, nonresponse analyses suggest that differential response rates across physicians (15% to 35%) were associated with effects no greater than 0.8 points, which are approximately one-seventh the size of true physician-level differences observed for the ACES measures. Thus, despite a relatively low response rate, nonresponse analyses suggested that the nature of nonresponse does not vary significantly by physician, and that the extent of nonresponse does not vary sufficiently to threaten the integrity of physician-level analyses.
Analyses included all respondents who confirmed having a primary care physician, whose physician was part of the original sample, and who reported at least one visit with that physician in the past 12 months (n=9,625). In the commercial sample, these criteria excluded 790 respondents who indicated that the plan-named physician was incorrect (6.9%) and 1,129 respondents who reported no qualifying visits (11%). For Medicaid, the criteria excluded 732 respondents who either did not name a primary physician or named a physician not included in the study sample.
We compared the sociodemographic profiles, health, and care experiences of commercial and Medicaid samples. Next, we calculated the physician-level reliability (αMD) of items and scales under varying sample size assumptions using intraphysician correlation and the Spearman Brown Prophecy Formula.21 The physician-level reliability coefficient (αMD) signifies the level of concordance in responses provided by patients within physician samples (range: 0.0 to 1.0). The Spearman Brown Prophecy Formula estimates reliability for projected sample sizes in the same way that estimated standard error allows studies to estimate the accuracy of a mean in a proposed experiment, and thus allowed us to determine measure reliability under varying sample size assumptions.
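The relationship between intraphysician correlation, sample size, and physician-level reliability can be illustrated with the standard Spearman-Brown formula, αMD = nρ / (1 + (n − 1)ρ), where ρ is the intraphysician correlation and n the number of patients per physician. The sketch below is illustrative only: the function names are hypothetical, and the value of ρ is back-calculated from a reliability of 0.70 at 45 patients rather than taken from the study's data.

```python
import math


def physician_reliability(icc, n):
    """Spearman-Brown: reliability of a physician's mean score over n patients."""
    return n * icc / (1 + (n - 1) * icc)


def icc_from_reliability(reliability, n):
    """Invert Spearman-Brown to recover the single-patient correlation."""
    return reliability / (n - (n - 1) * reliability)


def patients_needed(icc, target):
    """Smallest n whose projected reliability reaches `target`."""
    n_exact = target * (1 - icc) / (icc * (1 - target))
    return math.ceil(n_exact - 1e-9)  # small tolerance guards against float noise


# Illustrative value: the correlation implied by reliability 0.70 at n = 45.
icc = icc_from_reliability(0.70, 45)
for n in (25, 45, 100, 200):
    print(n, round(physician_reliability(icc, n), 3))
print("patients needed for 0.90:", patients_needed(icc, 0.90))
```

Because reliability rises with diminishing returns as n grows, a measure that only just reaches 0.70 at 45 patients needs several times that sample to reach 0.90.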
Next, we estimated the probability of incorrectly classifying a physician's performance (“risk of misclassification”) using the Central Limit Theorem of statistics to compute the expected percentage of physicians who would be misclassified to the next lower performance category under varying levels of measurement reliability (αMD=0.70 to 0.90). The Central Limit Theorem asserts that, for an individual physician, the observed mean score across multiple independent patients is approximately normally distributed around a physician's true mean score. This applies here because scores are limited to a 0 to 100 range, the patients are independent, and the statistic in question is a sample average for an individual physician. This fact allows us to estimate, for an individual physician, the probability of being misclassified within any range of scores. With additional information on the distribution of true mean scores in a population of physicians, we can make unconditional misclassification probability estimates for that physician population.
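The conditional calculation described above can be sketched in a few lines. The sketch assumes the usual definition of reliability as the ratio of between-physician variance to between-physician variance plus sampling variance of the mean, which gives a standard error of SDMD × sqrt((1 − αMD) / αMD). That link, the function names, and the example values are assumptions for illustration, not details given in the text.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standard_error(sd_physician, reliability):
    """Standard error of an observed physician mean implied by
    reliability = var_between / (var_between + se**2)."""
    return sd_physician * math.sqrt((1.0 - reliability) / reliability)


def prob_below_cutpoint(true_score, cutpoint, se):
    """Probability that the observed mean falls below the cutpoint, i.e. that a
    physician truly at or above it is placed in the next-lower category."""
    return normal_cdf((cutpoint - true_score) / se)


# Hypothetical example: physician SD of 6 points, cutpoint at 80, true score 85.
for reliability in (0.70, 0.80, 0.90):
    se = standard_error(6.0, reliability)
    print(reliability, round(se, 2), round(prob_below_cutpoint(85.0, 80.0, se), 3))
```

The risk falls both as reliability rises (smaller standard error) and as the true score moves away from the cutpoint, and it is always 50% for a true score exactly at the cutpoint.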
Finally, we used Maximum Likelihood Estimation methods22 to determine the influence of each level of the system (physician, site, network organization, plan) on each ACES measure, controlling for patient characteristics (age, sex, race, years of education, number of chronic medical conditions) and for system interaction effects (i.e., plan-site and plan-MD interactions).
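Once variance components have been estimated, each level's share of system-related variance is that level's component divided by the sum of the system components, with patient-level residual variance left out of the denominator. The sketch below shows only this final step, not the maximum likelihood estimation itself, and the component values are invented for illustration.

```python
SYSTEM_LEVELS = ("physician", "site", "network", "plan")


def system_variance_shares(components):
    """Percent of system-related variance attributable to each system level.
    `components` maps a level name to its estimated variance component; any
    level not in SYSTEM_LEVELS (e.g. 'residual') is ignored."""
    system = {level: components.get(level, 0.0) for level in SYSTEM_LEVELS}
    total = sum(system.values())
    if total <= 0:
        raise ValueError("no system-related variance to apportion")
    return {level: 100.0 * value / total for level, value in system.items()}


# Hypothetical variance components for one measure (not study results).
components = {"physician": 30.0, "site": 15.0, "network": 4.0, "plan": 1.0, "residual": 450.0}
shares = system_variance_shares(components)
print({level: round(pct, 1) for level, pct in shares.items()})
```

A level can therefore hold a large share of system-related variance even when the system as a whole explains a modest fraction of total variance.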
Commercial and Medicaid samples differed significantly on most sociodemographic and health characteristics (Table 1). Medicaid enrollees reported shorter primary care relationships than commercially insured (P≤.05), but otherwise reported similar primary care experiences. Where differences occurred, trends favored the experiences reported by Medicaid enrollees somewhat.
With samples of 45 patients per physician, all ACES measures except 2 (clinical team, office staff) achieved physician-level reliability (αMD) exceeding 0.70, and several measures exceeded αMD=0.80 (Table 2). Sample sizes required to achieve αMD≥0.90 generally exceeded 100 patients per physician. Variability among physicians, as measured by the standard deviation of the physician effect (SDMD), averaged 6.0 points on “organizational” features of care (range: 3.7 to 10.5) and 7.2 points on “interaction quality” measures (range: 4.1 to 13.9).
Analyses of misclassification risk highlighted 3 important influences: (1) measurement reliability (αMD), (2) proximity of a physician's score to the performance cutpoint, and (3) number of performance cutpoints. Table 3 illustrates diminishing probability of misclassification for scores more distal to a performance cutpoint (column 1), and with higher measurement reliability (αMD). The findings suggest the usefulness of establishing a “zone of uncertainty” around performance cutpoints to denote scores that cannot be confidently differentiated from the cutpoint score. The logic is very similar to that of confidence intervals and, in the illustration provided (Table 3, diagram), is equivalent to a 95% confidence interval. Note that higher measurement reliability affords smaller uncertainty zones (i.e., ±3.26 points vs 6.3 points). In this illustration, 3 performance categories are established, and risk of misclassification to the next-lower category is no greater than 2.5%. That is, a physician whose true score is “above average” has no more than 2.5% risk of being observed as “average” with a given respondent sample. A performance report imposing multiple cutpoints would require this buffering around each cutpoint in order to maintain the desired level of classification certainty. Thus, even with highly reliable measures, going beyond 2 or 3 cutpoints appears ill-advised given the complexity and overlapping uncertainty zones that would likely result.
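A reporting rule built on uncertainty zones can be sketched as follows. This is an illustration under stated assumptions rather than the framework shown in Table 3: the half-width is taken as 1.96 standard errors, the standard error is derived from SDMD and αMD as SDMD × sqrt((1 − αMD) / αMD), scores inside a zone are simply left unclassified, and the cutpoints and physician SD are hypothetical.

```python
import math


def zone_half_width(sd_physician, reliability, z=1.96):
    """Half-width of the uncertainty zone around a cutpoint (z standard errors)."""
    se = sd_physician * math.sqrt((1.0 - reliability) / reliability)
    return z * se


def classify(score, cutpoints, half_width):
    """Return the category index (0 = lowest), or None when the score lies
    inside the uncertainty zone of any cutpoint."""
    for cut in cutpoints:
        if abs(score - cut) < half_width:
            return None
    return sum(score >= cut for cut in cutpoints)


# Hypothetical setup: two cutpoints define three performance categories.
cutpoints = [80.0, 90.0]
for reliability in (0.70, 0.90):
    width = zone_half_width(6.0, reliability)
    labels = [classify(s, cutpoints, width) for s in (70.0, 85.0, 91.0, 95.0)]
    print(reliability, round(width, 2), labels)
```

With the lower reliability in this invented example, the two zones overlap and a mid-range score can no longer be assigned to any category, which illustrates why adding cutpoints quickly erodes classification certainty.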
Table 4 reveals that, on average, 11% of variance in the measures (range: 5.0% to 22.1%) was accounted for by our models. Of the variance accounted for by the delivery system (physician, site, network, plan), individual physicians and practice sites were the principal influences. For organizational features of care, sites accounted for a larger share of system-related variance than physicians (range: 44.8% to 81.1%), but the physician contribution was substantial (range: 17.8% to 39.4%). For all measures of “interaction quality,” physicians accounted for the majority of system-related variance (range: 61.7% to 83.9%), though sites played a substantial role (range: 22.4% to 38.3%). Networks accounted for no variance except on 3 measures (organizational access, visit-based continuity, office staff). Health plans accounted for negligible variance on all measures.
Plan-physician interaction effects were less than 2 points for all measures, and were zero for 5 measures, suggesting that experiences in a physician's practice do not differ meaningfully by plan (commercial payers). ANOVA models testing Medicaid-physician interaction effects suggest that these findings generalize to Medicaid. That is, within a physician's panel, Medicaid enrollees produced results that were identical or nearly identical to those of commercial patients.
This demonstration project addresses several important questions concerning the feasibility and merit of measuring patients' experiences with individual physicians. Perhaps most importantly, the measures showed high physician-level reliability with samples of 45 patients per physician, and at these levels of reliability, the risk of misclassification was low (≤2.5%) given a reporting framework that differentiated 3 levels of performance. However, the results also underscored that high misclassification risk is inherent for scores closest to the performance cutpoint (“benchmark”) and where multiple cutpoints are introduced. Thus, reporting protocols will need to limit the number of performance categories, and determine how to fairly handle cases most proximal to performance cutpoints, where misclassification risk will be high irrespective of measurement reliability. Also importantly, the study showed that individual physicians and practice sites are the principal delivery system influences on patients' primary care experiences. Network organizations and health plans had little apparent influence. These findings accord with previous analyses of practice site, network, and health plan influences on patients' experiences,4,23 but extend beyond previous work by estimating the effects of individual physicians within practice sites.
The results address several persistent concerns about physician-level performance assessment. First, the difficulties of achieving highly reliable performance measures have been noted for condition-specific indicators, given limited sample sizes in most physicians' panels, even for high prevalence conditions like diabetes and hypertension.2,24,25 Because the measures evaluated here apply to all active patients in a physician's panel, sample sizes required for highly reliable measures are easily met. Second, questions about whether measurement variance can be fairly ascribed to physicians themselves versus to the systems in which they work and the patients they care for have been a prominent concern.24–27 In this study, physicians and sites accounted for the vast majority of delivery system variance for all measures, but there was substantial sharing of variance between physicians and sites. The results suggest a shared accountability for patients' care experiences and for improving this aspect of quality.
Importantly, though, a substantial amount of variance on all measures remained unaccounted for. Although the delivery system variance accounted for here is considerably higher than that accounted for in studies of other types of performance indicators,24,26,28 it nonetheless raises a critical question for the quality field: Is it legitimate to focus on the performance of physicians and practices when there are so many other influences at play? This seems analogous to questioning whether clinicians ought to focus on particular known clinical factors, such as blood cholesterol levels, even when these factors only account for a modest share of disease risk. As with health, the influences on quality are multifactorial. Because there is not likely to be a single element that substantially determines any dimension of quality, we must identify factors that show meaningful influence and are within the delivery system's purview. In this study, average performance scores across the physician population spanned more than 20 points out of 100. This suggests that meaningful improvement can be accomplished simply by working to narrow this differential. The well-documented benefits of high quality clinician-patient interactions, including patients' adherence to medical advice,29–33 improved clinical status,34–37 loyalty to a physician's practice,38 and reduced malpractice litigation,39–41 suggest the value of doing so.
As with any area of quality measurement, however, there are costs that must be considered against the value of the information. Data collection costs associated with this study suggest that obtaining comparable information for adult primary care physicians statewide (n=5,537) would cost approximately $2.5 million, or 50 cents per adult resident. Extrapolating to adult primary care physicians nationally (n=227,698), “per capita” costs across U.S. adults appear similar. Of course, costs of such an initiative are highly sensitive to numerous variables (including the frequency, scope, and modes of data collection6) and thus can only be very roughly gauged from this initiative. But a serious investment would clearly be required to accomplish widespread implementation of such measures, and while several such initiatives are currently underway,42–45 the potential for sustaining and expanding upon these over the long term remains unclear.
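The text does not state how the statewide figure was derived. One combination of numbers reported elsewhere in the article that lands close to it is 45 completed surveys per physician at the commercial cost of $10 per completed survey. The sketch below works through that arithmetic; treating these two inputs as the basis of the estimate is an assumption, not something the text confirms.

```python
# Inputs reported in the article; combining them this way is an assumption.
COST_PER_COMPLETED_SURVEY = 10   # dollars, commercial sample
SURVEYS_PER_PHYSICIAN = 45       # sample size at which most measures reached 0.70
PHYSICIANS_STATEWIDE = 5_537
PHYSICIANS_NATIONAL = 227_698


def projected_cost(n_physicians,
                   surveys_per_physician=SURVEYS_PER_PHYSICIAN,
                   cost_per_survey=COST_PER_COMPLETED_SURVEY):
    """Data collection cost in dollars for one round of physician-level surveys."""
    return n_physicians * surveys_per_physician * cost_per_survey


statewide = projected_cost(PHYSICIANS_STATEWIDE)
national = projected_cost(PHYSICIANS_NATIONAL)
print(f"statewide: ${statewide:,}")
print(f"national:  ${national:,}")
```

Under these assumptions the statewide total comes to just under $2.5 million. Converting either total to a per capita figure requires an adult population count, which the article does not give.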
There are several relevant study limitations. First, the study included only patients of managed care plans (commercial and Medicaid). For other insurance products, where payers are not explicitly aware of members' primary care arrangements, a different sampling methodology would be required. Second, the initiative was limited to one state. In other geographic areas, individual plan effects could be larger, although previous evidence from national studies suggests these are unlikely to be of a magnitude necessitating plan-specific samples in a physician's practice.4,23 Finally, the measures here do not afford information on technical quality of care. Methodologies other than patient surveys are required for that area of assessment.
In conclusion, with considerable national attention focused on providing patient-centered care, this project demonstrates the feasibility of obtaining highly reliable measures of patients' experiences with individual physicians and practices. Physician-specific samples were created by pooling sample across multiple payers, and with samples of 45 completed surveys per physician, highly reliable indicators of patients' experiences were established. The finding that individual physicians and sites account for the majority of system-related variance indicates the appropriateness of focusing on these levels for measuring and improving these dimensions of quality.
The importance of adding measures of patients' experiences of care to our nation's portfolio of quality measures is underscored by recent evidence that we are losing ground in these areas.46,47 The erosion of quality on those dimensions stands in sharp contrast to recent improvements noted in the technical quality of care.48–50 The improvements in technical quality, seemingly spurred by the routine monitoring and reporting of performance on these measures, lend credence to the aphorism that “what gets measured gets attention.” With key methodological barriers to measuring patients' experiences with individual primary care physicians and practices addressed, it is time to add this balance and perspective to our nation's portfolio of quality measures.
This research was supported by grants from the Commonwealth Fund and the Robert Wood Johnson Foundation. The authors gratefully acknowledge members of our National Advisory Committee, our Massachusetts steering committee, and our project officers (Anne-Marie Audet, M.D., Steven Shoenbaum, M.D., and Michael Rothman, Ph.D.) for their invaluable advice and guidance throughout the project period. The authors also gratefully acknowledge Paul Kallaur, Nina Smith and their project staff at the Centers for the Study of Services (CSS) for their technical expertise and commitment to excellence in obtaining the data from the study sample. Finally, we thank Ira B. Wilson, M.D., M.S. for comments on an earlier draft and Jamie Belli for technical assistance in preparing this manuscript.