|Home | About | Journals | Submit | Contact Us | Français|
Many wish to change incentives for primary care practices through bundled population-based payments and substantial performance feedback and bonus payments. Recognizing patient differences in costs and outcomes is crucial, but customized risk adjustment for such purposes is underdeveloped.
Using MarketScan’s claims-based data on 17.4 million commercially-insured lives, we modeled bundled payment to support expected primary care activity levels (PCAL) and 9 patient outcomes for performance assessment. We evaluated models using 457,000 people assigned to 436 primary care physician panels, and among 13,000 people in a distinct multi-payer medical home implementation with commercially-insured, Medicare and Medicaid patients.
Each outcome is separately predicted from age, sex, and diagnoses. We define the PCAL outcome as a subset of all costs that proxies the bundled payment needed for comprehensive primary care. Other expected outcomes are used to establish targets against which actual performance can be fairly judged. We evaluate model performance using R2s at patient- and practice-levels, and within policy-relevant subgroups.
The PCAL model explains 67% of variation in its outcome, performing well across diverse patient ages, payers, plan types and provider specialties; it explains 72% of practice-level variation. The outcome-specific models explain 17% to 86% of practice-level variation in 9 performance measures, often substantially outperforming a “one-size-fits-all” CMS-like HCC score, eg, with grouped R2s of 47% versus 5% for predicting “prescriptions for antibiotics of concern.”
Existing data can support the risk-adjusted bundled payment calculations and performance assessments needed to encourage desired transformations in primary care.
The goal of primary care payment reform is to achieve “better value—defined as better outcomes at less cost— … by rewarding physicians for prevention and coordination rather than volume of services.”1 While many have argued the importance of risk adjustment for calculating bundled payments and bonuses for good performance,2-4 little guidance exists regarding how to do so. This paper addresses that gap.
In a 2010 survey, 25 of 26 patient-centered medical home (PCMH) pilots principally relied on fee-for-service (FFS) payments, typically augmented with a management fee under ten dollars per member per month.5 The fee is often slightly higher for the “very sick” than for others, as in the AMA’s 2008 RUC calculations for Medicare’s 2008 Medical Home Demonstration Project.6 Reforms envisioning larger bundled payments typically acknowledge the need for stronger risk adjustment. For example, the Center for Medicare and Medicaid Innovations 2011 Comprehensive Primary Care Initiative (CPCI) retains FFS payments supplemented with a management fee averaging up to $20 per-patient-per-month, and ranging between $60 and $480 per-patient-per-year, depending on the patients’ CMS-HCC score.7 The CPCI also proposes significant bonuses, to be calculated based on shared savings and performance measures. A more radical reform proposed by Goroll et al2 would replace all primary care FFS income with comprehensive monthly bundled payments plus substantial performance-based bonuses. Our current work directly supports the Goroll framework. Its monthly payments are neither intended to cover all services (full capitation), nor to be just an add-on to existing fee-for-service revenues. Rather, we sought to develop a principled approach to computing the “primary care activity level” (PCAL) needed, that is, the cost of all services that primary care practitioners (PCPs) should provide. These payments rightly vary hugely between fundamentally healthy and highly complex patients.
Although we focus here on primary care payments, our approach is relevant to many other settings. An accountable care organization (ACO) could use our PCAL calculation to set budgets and incentives for its PCPs.8 Or, a model of the outcome “PCAL minus FFS reimbursement” could be used to calculate risk-adjusted case-management supplements to FFS, as proposed for the CPCI. Our paradigm also aligns well with the goals of value-based purchasing, and can be used to produce a risk-adjusted expected value for any population-based outcome that can be modeled in existing large databases.9,10
Our bundled payment model was implemented in 2009 by the Capital District Physician’s Health Plan (CDPHP), a not-for-profit, network-model, physician-guided health plan with 350,000 members concentrated in upstate New York. CDPHP implemented an early version to pay 3 PCMH pilot practices for their CDPHP patients (private HMO, private non-HMO, Medicare Advantage, and Medicaid HMO enrollees) in January 2009.11 This pilot was organized as a “virtual all-payer” system, in that CDPHP financed practices to implement the PCMH as if CDPHP had insured all their patients.3
Another leg of envisioned reform is outcome-based bonuses. Goroll and others have called for large risk-adjusted bonuses (up to 25% of total income) for achieving desired outcomes in cost, quality, and patient experience.2 Although using non-adjusted performance measures may create undesirable incentives for practices to avoid the sickest patients, even crude adjustments are rare.12-15 Here, we explore the importance of risk adjustment for assessing provider performance and examine our models’ performance for patient panels assigned to primary care practices. Our approach is population-based and empirical; it seeks to encourage improved management and outcomes for whole persons. Risk adjustment rewards practices when their patients’ outcomes are better than expected. Here, “what-is-expected” reflects patient-specific normative relationships calculated via a model. When the model is refit to new data, the norm shifts to reflect the “new normal;” thus, as a delivery system improves, “the bar” rises with it. These models currently rely only on age, sex, and claims-based diagnoses to define both predictors and outcomes. Soon, electronic health records and patient surveys must also be used, both to include non-medical factors as predictors, and quality- and patient-experience data as outcomes.
While bundled base payments allow a PCP to allocate resources efficiently, bonus payments can directly discourage low-value services and encourage activities that promote clinical quality, patient well-being, and satisfaction. Risk-adjusted bonuses are intended to ensure that each practice can earn rewards for doing a good job with its patients, and to mitigate incentives for cherry-picking easy patients and dumping difficult ones.
For each performance measure, we first build a patient-level model to predict its associated outcome from patient characteristics (age, sex and diagnoses). A practice is judged by comparing its patients’ aggregated observed outcome (O) to its model-based expected (E), or predicted, level. We acknowledge, but do not address here, the many issues associated with separating “signal” from “noise” when judging single practices on individual outcomes, or when creating a useful composite score (leading to a practice-level bonus payment) based on multiple measures.13,16,17 Our aim is to demonstrate the feasibility of risk-adjusted performance assessment, and its importance, given that fixed targets punish good providers whose complex patients, even if doing “better than expected,” don’t hit targets that are easier to achieve with healthier patients.
Each practice receives a monthly base payment to support providing its patients with comprehensive primary care. For a complex patient, this might need to be ten or even fifty times larger than for a healthy one. We must “get the price right” for highly diverse individual patients.
It is now quite standard to develop full capitation payments (eg, to a Medicare Advantage plan), by first using a large benchmark dataset to fit a model to predict Y0 = total cost from age, sex, and diagnoses. The purpose of modeling is to establish the relative amounts of resources that are typically used for different kinds of patients. The mean value of the predictions into payments. For example, we might specify where and are used to ensure that a one unit increase in translates into an appropriate additional payment for an individual who needs more attention, and that the total of all payments matches budgeted funds.
Using existing claims data to calculate bundled primary care payments is similar but harder. The main problem is that, unlike total cost, the actual primary care activity level (or PCAL), that is, the money spent on providing comprehensive services – cannot be observed directly. Why? Because today’s billing data reflect the sorry state that reform seeks to redress: many services that the bundled payment is intended to encourage are often not done, or even when done, are either undercompensated or not billable at all.18
Since the PCAL outcome cannot be observed directly in claims data, we collaborated with researchers at Verisk Health, Inc., of Waltham, Massachusetts, and the Massachusetts Coalition for Primary Care Reform (MACPR) to create an outcome Y described in detail below, as a proxy. We used regression to predict Y, calling this prediction PCAL.
We also developed risk-adjustment models for 9 utilization and efficiency measures. One predicts total health spending, an important target for reduction. Three relate to pharmaceuticals. Total “spending on prescription drugs” is a poor performance measure because it reflects both valuable and wasteful spending. Nonetheless, it may be useful to know when a practice’s pharmaceutical spending is far from expected. More focused performance measures include “number of prescriptions for ‘antibiotics of concern’” and “total number of antibiotic scripts,” each based on a HEDIS definition from the National Committee for Quality Assurance.19 We also modeled 3 hospitalization count measures, ranging from all admissions to only ambulatory-care sensitive (ACS) ones.20 Two additional models count: relative value units (RVUs) for advanced imaging, and emergency department (ED) visits. We evaluate all measures at both individual and practice levels.
We estimated PCAL models using 2007 Thomson Reuters MarketScan Commercial Claims and Encounter data. MarketScan contains age, sex, eligibility information, and medical and pharmacy claims for beneficiaries mostly in large, well-insured firms. The estimation sample included 17.4 million commercially-insured people with at least six months of eligibility, non-missing age and sex, and prescription drug coverage; over 166 thousand were age 65 and over. We calculated number of covered months (eligibility), used for analytic weighting, and various components of total expenditures (eg, specialist care, hospital care, outpatient drugs, and emergency department visits), used for constructing several outcome variables and PCAL.
To evaluate model performance for practices, we created a Practice-Based subset from among the 1,668,486 people in MarketScan who could be assigned to a PCP (multi-person practices could not be identified). We selected patients with known county-of-residence assigned to health plans that: were not consumer-directed or exclusive provider organizations, had at least 1000 enrollees, and had acceptable data (at least half of professional claims with a valid provider ID, specialty, and county). Based on the plurality of PCP dollars in 2007 (and, if none, then 2006 dollars), we assigned each patient uniquely to a family medicine, internal medicine, geriatric, pediatric or other primary care physician (PCP). This method resembles CMS’s proposed rule for ACOs.21 Those with no PCP visits in either year (29%) were randomly assigned to a practice in the same county, in proportion to numbers of patients already assigned to that PCP. Restricting to PCPs with between 500 and 5000 assigned patients left 456,781 patients and 436 PCPs. Except for randomly assigning unassigned patients, these are real physician panels, including all insured individuals, even those with no primary care contacts. We used this subsample to evaluate practice-level performance.
We also validated PCAL predictions in a distinct 13,000-person Capital District Physician’s Health Plan database with enrollees in HMO, PPO, and POS plans covered by private employers, Medicare, and Medicaid. After describing the PCAL model and its properties, we present results from CDPHP’s internal validation studies on its 2006-07 data, before PCMH implementation.
The PCAL outcome Y is a subset of a person’s current spending designed to represent the dollars that should have been available for delivering comprehensive primary care. PCAL is the from regressing this Y on age, sex and diagnoses in benchmark data, typically after dividing by its sample average to achieve a normalized risk score, nrs. In the risk-adjustment literature, nrs is commonly called a relative risk score.22 After normalization, 1.0 denotes average primary care need, while, say, 1.5 describes someone with 50% greater need. We “lightly recalibrate” PCAL for use in a new population or subpopulation by regressing the same PCAL outcome Y on nRS, yielding . For example, a and b may be payer- or plan-specific constants. The resulting PCAL predictions can be divided by their sample mean in the new population, producing a normalized risk score there.
Our idea for specifying the proxy outcome Y for PCAL is to use resources spent on other kinds of care to “signal” the need for primary care services, eg, to handle simple problems in-house that might otherwise be referred out; to avert crises by attentively managing chronic problems; or to coordinate care for patients during and after hospitalizations and other crises. Specifically, we define Y for each person during a year as the following dollar amount:
Where did these numbers come from? First, we consulted with five practicing primary care clinicians, asking them to estimate how much of their time was spent on various activities. Their average, rounded responses are shown in Table B1 of a Supplemental Digital Content Appendix, referenced hereafter as SDC. http://links.lww.com/MLR/A291 We then calculated the fractions of observable spending variables needed to reflect these allocations. For example, given that about 50% of PCP time is spent on core primary care services and 10% (1/5 as much) on managing prescription drugs, we calculated that 12% of prescription drug spending needed to be included in Y to make pharmacy spending contribute about 1/5 as much as core primary care spending. Thus, for every $100 of pharmacy spending in the data we added $12 to Y, envisioning that a comprehensive primary care provider would have needed that level of resources to manage the medications. Prior to making these allocations, we had top-coded each subcomponent at its 99.9th percentile; this limited the effect of extreme outliers while only reducing the overall mean by 1.7%. We included $65 to recognize fixed overhead costs of activities such as monitoring, email or phone consultations, and encouraging prevention, even for people with no current claims. We frequently shared the implications of choices with our clinician panel, thereby allowing practicing doctors to examine the face-validity of the resulting relationships. For example, before settling on the above formula for Y, our physicians reviewed and verified the plausibility of the resulting normalized PCAL scores for several dozen patient illness profiles in which various medical conditions were added to or subtracted from realistic patient profiles.
In summary, our physician panel examined for plausibility both the process for choosing model parameters (eg, the fractions used to define Y and the flat base payment amount) and their consequences for PCAL scores. We took additional comfort from unpublished sensitivity analyses suggesting that alternate PCAL models, based on fairly different choices, lead to highly correlated practice-level payments. However, if another group implementing these ideas preferred different parameters, it is not hard to derive a PCAL based on their choices. We make no claim that our choices are optimal, merely that they are reasonable; our key innovation is in conceptualizing, implementing and testing a credible and flexible approach to predicting primary care need from age and sex and the diagnoses and costs recorded in claims data. Summary statistics for the PCAL model are in the SDC, Supplemental Digital Content-1 http://links.lww.com/MLR/A291.
A practice’s base payment is the sum of the expected cost of all PCMH services (that is, the PCALs) for its assigned patients, not fees for actual services; PCAL comes from regressing the just-described Y on age, sex, and a vector of 394 hierarchical condition categories (HCCs) recorded during the same year and populated using Verisk Health’s DxCG Version 7 clinical classification. These categories refine the CMS-HCC model (with only 70 to 86 HCCs). That model is currently used to calculate full capitation payments for Medicare Advantage plans and has been proposed for risk-adjusting care-management payments in Medicare’s Comprehensive Primary Care Initiative.7,23-25
Unlike CMS’s Medicare implementation, but following the Massachusetts Alternative Quality Contract ACO8, we used a concurrent model (relying on demographic and diagnostic data to predict same-year costs) to increase PCAL’s accuracy in estimating this year’s needs, and to limit financial risk for small practices. Verisk Health provides a web-accessible description of its Version 7 release, including its differences from CMS’s HCC model.26 In the online SDC, http://links.lww.com/MLR/A291 we describe the MarketScan data and demonstrate the stability of large-parameter models estimated on it across 6 years and diverse plan types.
The PCAL model includes interactions between age groups and diseases and across disease clusters based on statistical significance and face validity with our physician panel, who also reviewed PCAL model parameters, especially examining very high-cost and relatively rare conditions for which empirically estimated costs are least precise. The initial regression model contained 569 parameters. Second-stage regressions on these fitted values for each age-sex group ensured that all predictions are non-negative and that final predictions reflect actual differences in resource use for men and women of all ages and risk scores. Plan type was ignored during model estimation, but examined for validity. For comparison, we also estimated and evaluated models predicting total health spending, total spending on all PCPs, and total spending on primary care evaluation and management services by PCPs. Following CMS’s HCC modeling procedures, all regressions annualize spending for people with partial-year eligibility and weight observations by eligible months.
Using similar methods and identical data as above, researchers at Verisk Health, Inc., with input from our physician panel and us, estimated linear regression models (including zero-cost cases) in our full claims database. We examined how well these models explain variation in outcomes for individuals and practices in the Practice-Based subset defined above. The more strongly patient characteristics predict an outcome, the more important risk adjustment becomes. Because the normalized risk score that predicts total health spending is a good proxy for total morbidity burden, we distinguish this outcome by calling it and the normalized risk score that predicts it Y0 and nRS0, respectively.27 For each outcome Yi we consider both a “tailored” specification, regressing it on the nRSi from the model calibrated specifically to predict it (that is, Yi = a + b nRSi), and a “generic” one, regressing Yi on nRS0, the normalized risk score for total spending. By definition, these regressions coincide for Y0. Comparing the predictive power (R2) of generic and tailored specifications quantifies the value of outcome-specific risk adjustment over a one-size-fitsall “risk” calculation for all outcomes, as CMS contemplates using in its Comprehensive Primary Care Initiative. We also calculated practice-level grouped R2s by reducing the data set to one observation per practice (n = 436) and using practice-specific average values for the Y and nRS variables; that is, making predictions of the form , for various outcomes among the 436 PCPs.
The model predicting the Primary Care Activity Level (PCAL) proxy Y uses 653 parameters and explains 0.67 of the variation in Y at the individual patient level (ie, R2 = 67%). In the same sample, R2s for concurrent models predicting total health spending and all PCP payments are 57% and 32%, respectively. Because the development data set is huge, the PCAL model is not overfit; the R2s when fitting it to half the data and when applying this fitted model to the other half are both 67%. (See the SDC for details http://links.lww.com/MLR/A291.) Such high R2 values result from: use of a concurrent model, top-coding the individual components of the dependent variable, and the predictability of outpatient services and pharmacy spending (which contribute most of the dollars to Y).
To test how well the model applies to plans of different types, we examined mean actual and predicted PCAL normalized risk scores for 5 MarketScan plan types (SDC http://links.lww.com/MLR/A291 ). After rescaling, that is, predicting within each type using type-specific intercepts and slopes with the PCAL normalized risk score, we can predict uniformly well for all 5 plan types (R2s = 66% to 68%). Rescaling avoids underpaying for enrollees in consumer-directed health plans and non-capitated point-of-service plans in this sample. Using separate regressions on each of 22 age-gender groups, ranging from age 1 or less to age 65 and over, model R2s also remain high, explaining 60% to 66% of the variation within these age-gender groups (SDC, http://links.lww.com/MLR/A291).
The model strongly differentiates among patients: PCALs for the 0.5% of the population with the highest predicted primary care need are 16 times average, versus 1/10 of average for the 30% with the lowest predicted need: a 160-fold variation! Furthermore, across the risk spectrum, PCAL closely tracks what it is designed to predict (see SDC Figure B2 http://links.lww.com/MLR/A291). The largest absolute deviation between the PCAL proxy (Y) and PCAL is found for the top 0.5%, where mean is about 8 percent lower than that for the mean fitted PCAL.
To the extent that Y is a good estimate of the level of primary care needed for patients, a ratio of observed (Y) to expected equal to 1 is ideal, while a ratio of, say, 2 for a group suggests that the real need for their primary care is twice what is predicted. Figure 1’s left-most panel thus suggests that, with payments based on an age-sex prediction of Y, nearly 75% of the groups defined by the presence of an HCC are “underpriced” by a factor of 2 or larger. Also, while CMS-HCC-like model predictions are far more accurate than age-sex-based predictions, about half the O to E ratios for it are bigger than 1.4 (middle panel). Practices should not be asked to care for a patient expected to require over $1400 worth of work for only $1000 of payment! With our PCAL model, however, practices can assume that they will get approximately the right resources when enrolling people with medical problems in just about any HCC. Underpriced medical problems penalize practices that care for sick people and allow practices to achieve unearned profits by focusing on the healthy. Figures B3 and B4 in the SDC http://links.lww.com/MLR/A291 further show that the age-sex models tend to underpay more for rarer conditions while the CMS-HCC-based model underpays fairly uniformly for both common and uncommon medical conditions.
To assess the financial risk that a PCAL payment would impose on practices, we examined PCAL and its predicted values at the PCP-level, using the PCP-assigned subsample of the MarketScan data. When individual predictions are summed to the PCP-level and the results multiplied by a normalizing constant that makes the sum of the PCALs equal to the sum of the proxy values it predicts, the HCC model explains 72% of the variation in the PCP-level average of Y (Figure 2A), versus only 42% for a model to predict from age and sex alone (Figure 2B). Figure 2A also shows that efficient practices, as measured by the constructed PCAL proxy, are not concentrated among either simple or complex patient panels.
The needs of pediatric patients differ from those of older patients. To evaluate how well PCAL serves for different practice types, we first classified each of our 436 practices with at least 80% of its services assigned to a single primary care specialty to that specialty. Remaining practices were classified as “multispecialty” or “other” (eg, acute care, emergency, inpatient or radiology). As seen in Figure 3, while pediatric practices (19% of our sample practices) had far lower average risk scores and PCAL proxies than other practices, the model fit to their data alone had essentially the same slope and intercept as the model fit to all the data. Family medicine (29% of our sample) and internal medicine (14% of our sample) had more complex and higher average cost enrollees, but again, no obvious bias was found within or between these specialties.
CDPHP replicated the predictive power of the PCAL model at the PCP level using 2 prior years of data (2006-07) on 13 physicians (22,800 patient-years). At the individual level, the PCAL model explains 54% of the variation in PCAL services provided to commercial, Medicaid, and Medicare patients. The PCP-level R2 in this outside sample is 73%, as compared to the 72% achieved in the commercially-insured development sample, even though that sample included no Medicare or Medicaid enrollees. To see if models calibrated on MarketScan’s data predict well across payer types, we regressed total spending on a single predictor (concurrent nRS) in CDPHP’s 75% commercial, 7% Medicare and 18% Medicaid enrollees. Individual-level R2 values are high in each subpopulation: 60%, 65% and 56% respectively (SDC http://links.lww.com/MLR/A291).
High R2s suggest that risk adjusting is important for PCAL capitation. Holding PCPs responsible for all spending (full capitation) imposes sizeable risks on individual practices. The average practice size in our sample was 1048 people, somewhat smaller than a typical PCP patient panel, but realistic if PCPs only receive bundled payments for some patients. For each of 4 dependent variables, we calculated both age-sex models and Verisk Health HCC models that also used diagnoses. VH-HCC risk adjustment meaningfully reduces unexplained practice-level variations in spending relative to non-risk-adjusted variation, with the largest reductions in the PCAL and total-spending models. Standard deviations (SDs) in Table 1 also show that financial risks under the PCAL model are far less than under full capitation, where practices are at risk for total health spending. Thus, full capitation — even with sophisticated risk adjustment that reduces the PCP-level average per capita SD of total health spending from $1,438 to $682 — still leaves a PCP exposed to an SD 225% higher than the annual PCP-specific revenue of $303 per patient. Practice-level PCAL payments are only slightly more risky than spending narrowly defined on core primary care services to all PCPs (SDs = $76 and $66, respectively).
Table 2 summarizes results for 9 potential performance measures. Along with sample means and standard deviations, 4 R2s are shown for each model. The individual-level R2s from generic models range from 3% to 42%, while those from the outcome-specific models are much higher, explaining 19% to 53% of patient-level variation in outcomes. The final 2 columns present the corresponding grouped R2 values at the practice level, for which the generic model R2s range from 0% to 78% while the tailored models range from 17% to 86%. Whereas the generic models explain a large fraction of the variation in broad measures such as total drug spending, hospitalizations not related to childbirth and pregnancy, and advanced imaging RVUs, they predict some other measures quite poorly.
Consider predicting number of prescriptions for antibiotics of concern, a HEDIS quality measure. The low R2s for the generic model mean that such prescriptions are poorly related to predicted total health spending, while the model that specifically identifies the effect of condition categories on antibiotic use is highly predictive. Figure 4A shows the data behind the tailored model’s grouped R2 of 47%; Figure 4B, for the generic model’s grouped R2 of 5%. In each figure, the underlying mean normalized risk scores vary considerably at the practice level (roughly 4-fold from low to high), but the tailored model is far more predictive.
Table 2 explores 3 commonly used hospital-admission models. Total admissions is the broadest measure; admissions excluding behavioral health hospitalizations (which are often contracted separately) and maternity hospitalizations is intermediate; the narrowest counts only ambulatory-care sensitive (ACS) admissions, as defined by AHRQ.20 Tailored models do only modestly better than generic models for these measures, and broader measures are generally more predictable than narrower ones. While conceptually attractive, ACS admissions are too rare in this commercially-insured sample to reliably predict, even at the practice level.
The emergency department visit data are provocative. First, predicted total spending at both the individual and practice level is essentially uncorrelated with ED visit use. Second, tailored models are only modestly predictive for individuals (R2 = 25%), with an even lower grouped R2 (17%). Perhaps other variables — such as income, education, proximity, and payer type — are important for risk-adjusting ED visits. Alternatively, although there are no theorems allowing us to directly interpret differences in the values of an individual versus a grouped R2, the highly unusual drop when moving from the individual to the practice-level measure may mean that practice-level factors strongly influence ED visit use. To the extent that the PCMH can control these factors, large unexplained practice-level variance could make risk-adjusted ED visits rates a particularly good performance measure.
Another advantage of risk-adjusted over non-risk-adjusted measures, and of the choice of measures such as prescriptions for antibiotics of concern or imaging RVUs over total health spending, is that they permit incentives to target rewarding good performance not just more fairly, but also more precisely. For a fixed amount of bonus money, a predictive model that reduces the unexplained standard deviation of the performance measure by half will enable the payment per unit of the outcome to be about twice as large, strengthening incentives to do well.
Zaslavsky and Epstein, examining HEDIS quality measures for individuals and plans, showed that socioeconomic variables can strongly predict quality outcomes.28,29 Although little SES data are coded in large populations, health plans or provider networks can use SES proxies (such as payer-type and geography-based variables) to modify PCAL and other claims-based predictions.
There is a growing consensus that “improving the health of patients and the viability of the healthcare delivery system [requires] a better model of compensating clinicians.”1 One key component is monthly care-coordination payments for primary care teams to support “up-front costs to maintain the required level of care;” such payments “should be risk-adjusted to ensure that there are no inherent incentives to avoid the treatment of more complex, costly patients.”1 Another is performance-based payments for achieving quality and efficiency goals. To protect providers with complex patients, these should also be risk-adjusted. Whether or not the desired primary care transformation takes place in a patient-centered medical home,2,4,30,31 proposed reforms all recognize the importance of paying for care coordination and providing credible performance assessments.
We developed and evaluated risk-adjusted PCAL base payments and performance measures using empirical criteria to estimate essentially all the resources needed for care and to determine what constitutes good performance. Empirical models, based on observed-to-expected comparisons, can be derived, tailored and updated more quickly than resource-intensive and subjective target-setting based on expert and stakeholder panels. Our work suggests that claims-based models may provide “good enough” incentives to start, much as claims-based risk adjustment has been used in the Medicare Advantage program. One early adopter of our claims-based PCAL, CDPHP, is expanding its use into a large second phase PCMH pilot.11
By calculating a bundled payment for only a particularly relevant subset of spending for primary care, we avoid the problem of full capitation imposing unreasonable financial risk on typical primary care practices whose incomes currently comprise only 5-7% of total health spending in the US. 32,33 This is an important motivation for our narrower, less financially risky measure: the primary care activity level (PCAL).
Risk adjustment is central to creating fair bundled primary care payments, because the costs and complexities of caring for patients vary enormously. We found that the predicted and apparent costs of providing comprehensive primary care vary more than 100-fold across patients and showed that sophisticated risk adjustment (here, a 394-category HCC model) is required to adequately distinguish across such huge differences.
While estimating a PCAL-like model is relatively straightforward, implementation in a multi-stakeholder environment is complex. Although many choices were needed to define the particular models shown here, we did not systematically explore all alternatives. Future research should study, for example, prospective vs. concurrent PCAL models, different top-coding choices, and employing other fractions for various kinds of health care spending in the PCAL proxy outcome. In large-scale implementation of any PCAL model it will be important to explore and address the implications for all kinds of patients. For example, for a patient with multiple complications of diabetes, the elevated PCAL dollars might not be appropriate if all care has been transferred to a specialist, but could be extremely useful if used to promote coordinated care between a PCMH and an endocrinologist. Another tool for fine-tuning the basic PCAL logic would be to place clinically-determined “credibility constraints” on unreasonably high, or low, model coefficients or predictions.
Risk adjustment is also important for performance assessment, as we demonstrate for several cost- and utilization-based performance measures – explaining about half of all practice level variation. Its importance for clinical quality and patient experience measures can be determined in patient-level databases that can link such outcomes to claims. We posit that any measure should be risk adjusted “until proven otherwise” – that is, unless it is shown that patient factors cannot predict it.
We have demonstrated the utility of claims-based risk adjustment across diverse provider specialties, health plan types, payers, age, sex, various outcomes and in distinct datasets. Although models in this paper were designed to support replacing fee-for-service payments in a medical home entirely with bundled care-coordination payments and large bonuses,2 the approach applies more widely. Risk adjustment for fundamental payment reform is ready for implementation.
This research was supported by Verisk Health Inc. and a grant from The Commonwealth Fund. Dr. Ash was partially supported by UL1RR031982 from the National Center for Research Resources. We are grateful to Yelena Shulga of Verisk Health for modeling work, and John Ayanian, Jim Burgess, Mike Chernew, Catarina Kiefe, Andrea Kronman, Lisa Lines, Tom McGuire, and staff from The Commonwealth Fund for useful insights. The ideas here are preliminary and do not necessarily reflect the opinions of the University of Massachusetts, Boston University, Verisk Health, The Commonwealth Fund, or any other organization or entity.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.