Taiwan has instituted a pay-for-performance (P4P) program for diabetes mellitus (DM) patients that rewards doctors based in part on outcomes for their DM patients. Doctors are permitted to choose which of their DM patients are included in the P4P program. We test whether seriously ill DM patients are disproportionately excluded from the P4P program.
This study utilizes data from the National Health Insurance (NHI) database in Taiwan for the period of January 2007 to December 2007. Our sample includes 146,481 DM-P4P patients (16.56 percent of the total) and 737,971 non-DM-P4P patients.
We use logistic and multilevel models to estimate the effects of patient and hospital characteristics on P4P selection.
The results show that older patients and patients with more comorbidities or more severe conditions are more likely to be excluded from P4P programs.
We found that seriously ill DM patients are disproportionately excluded from P4P programs. Our results point to the importance of mandated participation and risk adjustment measures in P4P programs.
Pay-for-performance (P4P) programs that reward providers on the basis of outcome-based performance measures, or of other "external" incentives that are not determined solely by provider behavior, can produce unintended consequences (Epstein, Lee, and Hamel 2004). The problem is especially acute when performance measures lack risk adjustment for patient comorbidity or the severity of patients' conditions (Shen 2003). Providers who treat patients with more severe conditions may fear unfair penalization because these patients are likely to lower providers' performance scores (Werner and Asch 2005). Providers therefore sometimes have an incentive to reap greater rewards by inappropriately excluding patients with more severe conditions. This potential for gaming the system through adverse selection is a problem whenever providers are allowed to select the patients in their P4P programs.
Of course, P4P programs may exclude patients appropriately in a number of circumstances. Programs might, for example, benefit from systematically excluding patients whose characteristics make them inappropriate for the measurement tools being used, or who require unique treatments (British Medical Association [BMA] 2009; Centre for Studies in Social Sciences 2009). However, such active exclusions, for example "exception reporting," may also serve as a vehicle for adverse selection.
One study found little evidence of adverse selection, with hospitals reporting low rates of patient exclusion in P4P programs (Doran et al. 2008). Other studies reached the opposite conclusion, showing that adverse selection does pose a significant problem (Doran et al. 2006; Sigfrid et al. 2006; Gravelle, Sutton, and Ma 2010). These latter three U.K. studies, however, were conducted in a context that generally differs from ours. For instance, providers in the United Kingdom were not allowed to select the patients included in the P4P program, and these studies focused mostly on each hospital's exclusion rate rather than on patient comorbidities or complications (Ryan 2009). A key objective of this paper is to evaluate whether diabetes mellitus (DM) patients with comorbidities or severe conditions are excluded from P4P programs. We hypothesize that providers are likely to exclude older patients and patients with more comorbidities or more severe conditions from P4P programs.
The DM-P4P program designed by Taiwan's National Health Insurance (NHI) has been the most comprehensive and mature P4P program in Taiwan. The program is voluntary and has evolved through roughly two periods since 2001. In the first period, the program required health care providers to complete clinical training to become certified in Taiwan's Diabetes Shared Care System (Chiou et al. 2001) and also encouraged them to increase monitoring and follow-up care for patients. In addition, providers were free to decide which patients to enroll in their P4P programs. Before the end of 2006, financial incentives were given only for process-based services (e.g., hemoglobin A1C testing): both "increased physician fees" and "case management fees" were paid in addition to regular fee-for-service reimbursements (Lee et al. 2010). This focus on rewarding process-based services carried over into the next period.
In the second period, which began toward the end of 2006, the NHI started paying extra bonuses for treatment outcome measures. In particular, two poor-outcome indicators were treated as signs of poor care: providers were not rewarded for patients with poor outcomes, on the premise that these patients had not received good care. The two indicators were the "Percentage of A1C ≥9.5 percent" and the "Percentage of low-density lipoprotein (LDL) ≥130 mg/dl." Importantly, these measures are not risk adjusted.
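As an illustration only, the two poor-outcome indicators can be computed for one provider's patient panel as in the Python sketch below. It assumes, hypothetically, that each patient has exactly one recorded A1C value (percent) and one LDL value (mg/dl) and that the denominator is all such patients; the program's actual denominator rules are not described here. The names PatientResult and poor_outcome_rates are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

A1C_POOR_CUTOFF = 9.5   # percent; A1C >= 9.5 percent counts as a poor outcome
LDL_POOR_CUTOFF = 130   # mg/dl; LDL >= 130 mg/dl counts as a poor outcome


@dataclass
class PatientResult:
    # Hypothetical record: one A1C and one LDL value per patient.
    a1c_percent: float
    ldl_mg_dl: float


def poor_outcome_rates(results: List[PatientResult]) -> Tuple[float, float]:
    """Return (% with A1C >= 9.5, % with LDL >= 130) for one provider's panel."""
    if not results:
        raise ValueError("provider has no patients with recorded results")
    n = len(results)
    poor_a1c = sum(1 for r in results if r.a1c_percent >= A1C_POOR_CUTOFF)
    poor_ldl = sum(1 for r in results if r.ldl_mg_dl >= LDL_POOR_CUTOFF)
    return 100.0 * poor_a1c / n, 100.0 * poor_ldl / n


panel = [
    PatientResult(7.2, 110),
    PatientResult(9.5, 140),
    PatientResult(10.1, 95),
    PatientResult(8.0, 125),
]
print(poor_outcome_rates(panel))  # (50.0, 25.0)
```

Lower values are better on both indicators, so a provider whose panel contains more patients with poorly controlled disease scores worse regardless of the care delivered, which is why the absence of risk adjustment matters.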
In this period, the design rewarded not only process services but also intermediate outcome measures. The incentive structure for intermediate outcome measures adopted a so-called quality tournament, in which only the highest-performing 25 percent of providers received rewards. Rewards were paid according to a composite score, calculated in a way similar to the indicator average (Reeves et al. 2007). However, the DM-P4P program in Taiwan uses each provider's rank on each indicator rather than its rate. For example, with two intermediate outcome indicators, each provider is ranked among all providers on each indicator, and its composite score is the mean of its two ranks.
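The rank-based composite and the top-25-percent tournament can be sketched in Python as follows. Assumptions, not NHI rules: both indicators are poor-outcome percentages, so a lower value earns a better rank (rank 1); tied values share the average of their ranks; and the number of winners is the floor of 25 percent of providers. All provider data are made up.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values ascending (1 = lowest = best); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def composite_mean_rank(indicators: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Mean of each provider's per-indicator ranks among all providers."""
    providers = list(indicators)
    n_indicators = len(indicators[providers[0]])
    totals = dict.fromkeys(providers, 0.0)
    for m in range(n_indicators):
        ranks = average_ranks([indicators[p][m] for p in providers])
        for p, r in zip(providers, ranks):
            totals[p] += r
    return {p: totals[p] / n_indicators for p in providers}


def tournament_winners(mean_ranks: Dict[str, float], top_fraction: float = 0.25) -> List[str]:
    """Providers in the best-ranked top_fraction (lowest mean rank wins)."""
    k = max(1, math.floor(len(mean_ranks) * top_fraction))
    return sorted(mean_ranks, key=mean_ranks.get)[:k]


# Made-up data: (% A1C >= 9.5, % LDL >= 130) for eight providers.
rates = {
    "A": (10, 20), "B": (12, 18), "C": (15, 25), "D": (20, 30),
    "E": (8, 35), "F": (25, 10), "G": (30, 40), "H": (18, 22),
}
scores = composite_mean_rank(rates)
print(scores["A"], scores["E"])    # 2.5 4.0
print(tournament_winners(scores))  # ['A', 'B']
```

Providers E and F are each best on one indicator but poor on the other; they tie at a mean rank of 4.0 and miss the cut, showing that the composite rewards consistent relative performance rather than any absolute level.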
Our study was based on this current design that emphasizes intermediate outcomes and uses a quality tournament to disburse rewards.
Our data come from two secondary databases. The first, the regular NHI claims data for January 2007 to December 2007, was used to obtain patient and hospital characteristics. The second, the P4P database, was intended to supplement the regular claims data: DM patient outcome data, such as A1C or LDL values, were reported by the hospitals themselves and entered automatically into this P4P-specific database.
One Taiwanese study using a survey questionnaire found that the accuracy of diabetes diagnoses in the Taiwan NHI database was only 74.6 percent. Because of this limited accuracy, our study included only patients who had received a diagnosis of diabetes (ICD-9-CM 250) and had more than four outpatient visits. With this criterion, the accuracy of diagnosis is 99.16 times greater than that for patients with ≤1 outpatient visit (Lin et al. 2005). We identified all DM patients in the regular NHI claims data for 2007 who met these strict criteria and divided them into two groups for comparison (P4P patients versus non-P4P patients). After the plurality algorithm (described below) was applied, the total number of DM patients was 884,452. Membership in the P4P patient group required two criteria. First, the patient's outcome data had to be available in the P4P database. Second, the patient had to have at least one "P14 ×" code (an internal code) in the regular claims database. In total, our sample included 146,481 DM-P4P patients and 737,971 non-DM-P4P patients. Although the physician participation rate was around 47 percent (10,720/22,952), only about 16.56 percent (146,481/884,452) of DM patients were enrolled in the NHI P4P program, because many of the physicians who saw large numbers of DM patients did not participate.
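The cohort and group definitions can be expressed as simple filters, sketched in Python below under several labeled assumptions: the visit records are a hypothetical structure; "more than four outpatient visits" is assumed to count visits carrying a 250.x diagnosis; and the "P14 ×" internal code is hypothetically matched as any fee code beginning with "P14". None of these details is specified in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DM_ICD9_PREFIX = "250"  # ICD-9-CM 250.x (diabetes mellitus)
MIN_DM_VISITS = 5       # "more than four" outpatient visits
# Hypothetical: assumes the "P14 ×" internal code family means codes starting with "P14".
P4P_CODE_PREFIX = "P14"


@dataclass
class OutpatientVisit:
    # Hypothetical claim record: diagnosis codes plus billed fee codes.
    icd9_codes: List[str]
    fee_codes: List[str] = field(default_factory=list)


def in_dm_cohort(visits: List[OutpatientVisit]) -> bool:
    """Assumes the visit count refers to visits that carry a 250.x diagnosis."""
    dm_visits = sum(
        1 for v in visits if any(c.startswith(DM_ICD9_PREFIX) for c in v.icd9_codes)
    )
    return dm_visits >= MIN_DM_VISITS


def classify_patient(visits: List[OutpatientVisit], has_p4p_outcome_data: bool) -> Optional[str]:
    """Return 'P4P', 'non-P4P', or None when the patient is outside the DM cohort."""
    if not in_dm_cohort(visits):
        return None
    has_p4p_code = any(
        code.startswith(P4P_CODE_PREFIX) for v in visits for code in v.fee_codes
    )
    # Both criteria must hold: outcome data in the P4P database and a P4P claim code.
    return "P4P" if has_p4p_outcome_data and has_p4p_code else "non-P4P"


# Four plain DM visits plus one visit billed with a hypothetical P14 code.
visits = [OutpatientVisit(["250.00"]) for _ in range(4)]
visits.append(OutpatientVisit(["250.02"], ["P14XX"]))
print(classify_patient(visits, has_p4p_outcome_data=True))      # P4P
print(classify_patient(visits[:4], has_p4p_outcome_data=True))  # None
```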
To handle patients' visits to multiple hospitals and to establish provider accountability, we applied the plurality provider algorithm (Pham et al. 2007). The algorithm assigns each patient to the physician (or practice) who billed for the greatest number of care visits in a given year. Ties between physicians are resolved in favor of the physician with the greatest total charges for that patient (Pham et al. 2007).
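The assignment rule can be sketched in Python as below. Each visit is a hypothetical (provider_id, charge) pair for one patient. The text does not say how an exact tie on both visit count and total charges is broken; this sketch simply keeps the provider encountered first, which is our assumption.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple


def plurality_provider(visits: Iterable[Tuple[Hashable, float]]) -> Hashable:
    """Assign a patient to the provider with the most visits in the year.

    Ties on visit count go to the provider with the larger total charges.
    """
    counts = defaultdict(int)
    charges = defaultdict(float)
    for provider, charge in visits:
        counts[provider] += 1
        charges[provider] += charge
    if not counts:
        raise ValueError("patient has no visits")
    # Tuples compare element by element: visit count first, then total charges.
    # max() keeps the first maximal provider, so any exact tie goes to the one seen first.
    return max(counts, key=lambda p: (counts[p], charges[p]))


year = [("clinic_A", 100.0), ("hospital_B", 300.0), ("clinic_A", 50.0),
        ("hospital_B", 20.0), ("hospital_C", 1000.0)]
print(plurality_provider(year))  # hospital_B
```

Here clinic_A and hospital_B each have two visits, so the charge tie-breaker selects hospital_B (320 versus 150); hospital_C's single expensive visit does not count against the visit plurality.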
The independent variables that may affect the likelihood of a patient being enrolled in a DM-P4P program include age, gender, comorbidity (Meduru et al. 2007), severity/complication (Selby et al. 2001; Rosenzweig et al. 2002), and number of visits. These variables characterize the patient. Hospital characteristics (Doran et al. 2006; Doran et al. 2008) include patient volume, the hospital's summary (baseline) score in the prior year, and hospital level. Definitions for some of these independent variables are given below.
We adopt the chronic illness with complexity (CIC) method to adjust for comorbidity in patients with multiple chronic diseases. This index includes nondiabetes physical illness complexity (e.g., any cancer), diabetes-related complexity, and mental illness and substance abuse complexity. We omit the diabetes-related complexity component because it covers only three kinds of diabetes complications; instead, we measure diabetes complications with the diabetes complications severity index (DCSI) and calculate the comorbidity count from the remaining CIC components. For patient severity/complication, we adopted the DCSI of Selby and colleagues and Rosenzweig and colleagues (Rosenzweig et al. 2002; Young et al. 2008), which covers seven categories of complications: retinopathy, nephropathy, neuropathy, cerebrovascular complications, cardiovascular complications, peripheral vascular disease, and metabolic complications.
The number of visits for each patient (an individual variable) and the P4P patient volume at each hospital (a hospital variable) are counted after patients are assigned to a specific physician by the plurality algorithm. The remaining variable, the baseline score, is derived from the hospital's Raw Sum Score in the previous year (Reeves et al. 2007). The Centers for Medicare and Medicaid Services (CMS) recommends the Raw Sum Score method for aggregating indicators within conditions: the numerators are summed, the denominators are summed, and the score is the ratio of the summed numerators to the summed denominators (Shwartz et al. 2008). We constructed the Raw Sum Score (baseline score) from two dichotomous outcome measures (A1C <9.5 percent and LDL <130 mg/dl) and two process measures (annual A1C and LDL tests). The baseline score captures an influence that the CIC and DCSI do not: hospitals with a lower baseline score in the previous year were more likely to exclude patients in the current year under some conditions (Doran et al. 2008).
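The Raw Sum Score calculation, as described above, can be sketched in Python. The counts are made up, and the denominators for the two outcome measures (patients who were tested) are our assumption. For contrast, the sketch also computes the unweighted indicator average.

```python
from typing import Sequence, Tuple


def raw_sum_score(measures: Sequence[Tuple[int, int]]) -> float:
    """Sum of numerators divided by sum of denominators across all measures."""
    numerator = sum(n for n, _ in measures)
    denominator = sum(d for _, d in measures)
    if denominator == 0:
        raise ValueError("no eligible patients across measures")
    return numerator / denominator


def indicator_average(measures: Sequence[Tuple[int, int]]) -> float:
    """Unweighted mean of the per-measure rates, shown for contrast."""
    return sum(n / d for n, d in measures) / len(measures)


# Made-up counts for one hospital with 200 DM patients in the prior year.
hospital_2006 = [
    (180, 200),  # annual A1C test performed
    (150, 200),  # annual LDL test performed
    (160, 180),  # A1C < 9.5 percent, among patients tested (assumed denominator)
    (100, 150),  # LDL < 130 mg/dl, among patients tested (assumed denominator)
]
print(round(raw_sum_score(hospital_2006), 4))      # 0.8082
print(round(indicator_average(hospital_2006), 4))  # 0.8014
```

Because the raw sum pools numerators and denominators, measures with more eligible patients carry more weight than in the simple average of rates.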
We estimate logistic regression models to study how patient inclusion in the P4P program is related to hospital factors and patient characteristics. The dependent variable is a dummy variable taking the value 1 for patients excluded from the P4P program and 0 for those included. We analyzed the data using SAS, version 9.1. Because patients are nested within hospitals, the effects of hospital factors and patient characteristics pose a multilevel problem (Young 2008), so we also estimated hierarchical models using HLM 6 to check the sensitivity of our logistic model results. The results are quite similar. The interpretation of the hierarchical model is available in the Appendix to this paper.
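A minimal sketch of a patient-level logistic model, assuming NumPy is available, is shown below. It fits the model by Newton-Raphson (iteratively reweighted least squares) on simulated data with exclusion coded 1 and inclusion 0, and reports odds ratios as exponentiated coefficients. The covariates and "true" coefficients are invented; the authors used SAS 9.1, and this code reproduces neither their specification nor the HLM hierarchical model.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X must already contain an intercept column; y holds 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))     # predicted P(excluded)
        gradient = X.T @ (y - p)                   # score vector
        weights = p * (1.0 - p)
        information = X.T @ (X * weights[:, None])  # negative Hessian of the log-likelihood
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated patients (all values invented): standardized age and a DCSI-like score 0..5.
rng = np.random.default_rng(0)
n = 20_000
age_z = rng.standard_normal(n)
dcsi = rng.integers(0, 6, size=n)
X = np.column_stack([np.ones(n), age_z, dcsi])
true_beta = np.array([1.2, 0.3, 0.25])
excluded = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic(X, excluded)
odds_ratios = np.exp(beta_hat[1:])  # OR per 1 SD of age and per DCSI point
print(np.round(odds_ratios, 2))
```

With the outcome coded as exclusion, an odds ratio above 1 means that higher values of the covariate are associated with a greater chance of being excluded.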
Table 1 presents the distribution of patient characteristics in the study group (P4P enrollment) and the reference group (no P4P enrollment). Among the patients enrolled in P4P, 51.92 percent were female, 41.34 percent had DCSI scores >0, and 34.50 percent had at least one additional medical condition (comorbidity). The average patient age was 61.9 years, and the average number of visits was 9.37. Compared with the non-P4P group, the P4P group had a lower average age and smaller proportions of patients with high comorbidity (CIC count, χ2-test, p<.001) or severe conditions (DCSI score, χ2-test, p<.001). The P4P group also had a higher average number of visits (t-test, p<.001) and a greater proportion of females (χ2-test, p<.001).
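The χ2 comparisons in Table 1 are Pearson tests of independence. For a 2×2 table (for example, group by sex), the statistic and its p-value can be computed in plain Python as sketched below. The counts are hypothetical, and the sketch omits the continuity correction; the text does not say whether one was used.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value); with 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts: rows = P4P / non-P4P, columns = female / male.
stat, p = chi_square_2x2(10, 20, 20, 10)
print(round(stat, 3), round(p, 4))  # 6.667 0.0098
```

With a sample of about 884,000 patients, even small differences in proportions yield very small p-values, so the size of each difference is the more informative quantity.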
We estimate a logistic model to determine whether patients with higher comorbidity or severity are more likely to be excluded from P4P programs by hospitals, including both patient and hospital characteristics in the regression equation (Table 2). Age, DCSI score, CIC count, and hospital level have significant odds ratios (ORs) for exclusion from P4P. The results show that older patients and patients with higher comorbidity or severity are more likely to be excluded from P4P programs: as the DCSI score and CIC count increase from zero to four or five, the probability of exclusion also increases. Hospitals with lower baseline scores in the previous year (2006) are more likely to exclude patients from P4P programs in the current year (2007) (OR=0.98, p<.001). Hospital size has a significant negative effect on patient participation in P4P programs: compared with clinics, the larger the hospital, the higher the probability that DM patients are excluded. Discrimination in the logistic model is assessed with the C index, which is equivalent to the area under the receiver-operating characteristic curve; the model shows acceptable discrimination (C=0.72). The mixed-effects model yields results similar to those of the logistic model, with very similar magnitudes (see Table SA1 in the Appendix for a detailed interpretation and explanation).
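The C index can be computed directly from its pairwise definition, as in the Python sketch below. Scores are predicted probabilities of exclusion and outcomes are 1 for excluded and 0 for included patients; the example values are hypothetical.

```python
from typing import Sequence


def c_index(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (C) index for a binary outcome, equal to the ROC AUC.

    Share of (excluded, included) patient pairs in which the excluded patient
    has the higher predicted risk; tied predictions count as half.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one excluded and one included patient")
    concordant = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


print(c_index([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 means no discrimination and 1.0 perfect discrimination. The pairwise loop is quadratic in sample size; for hundreds of thousands of patients, a rank-based (Mann-Whitney) computation gives the same value far more cheaply.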
Whether we used the logistic or the mixed-effects model, patients with greater severity or comorbidity were more likely to be excluded from P4P programs. We also found that hospitals with a lower baseline score in the previous year (2006) were more likely to exclude patients in the current year (2007), perhaps because these hospitals wanted to increase their rewards in the next year. This result is consistent with other related studies (Doran et al. 2008). In addition, like other studies (Doran et al. 2008), ours found that larger hospitals were more likely to exclude patients from P4P programs. Hospital factors matter considerably overall: 65 percent of the variance in patient participation is explained by hospital characteristics and only 35 percent by patient characteristics.
The primary policy implication of this study is a pronounced need to prevent adverse selection in the patient selection process of P4P programs. Several approaches have been proposed. The first is to set target thresholds below 100 percent, so that physicians can earn the maximum financial reward without achieving the target for all patients (Fleetcroft et al. 2008). For example, the United Kingdom sets thresholds of 40–50 percent for the measure A1C ≤7 percent (BMA 2009). This approach does not suit Taiwan, whose design is based on competition among hospitals (a dynamic threshold) rather than on a fixed threshold that hospitals can exceed to earn more money.
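To make the contrast between a fixed and a dynamic threshold concrete, the Python sketch below compares a payment that scales between fixed lower and upper thresholds with a tournament that rewards only the top 25 percent of providers. Linear scaling between 40 and 50 percent is our simplified assumption about the U.K. design; function names and achievement rates are hypothetical.

```python
import math
from typing import Dict, List


def fixed_threshold_share(achievement: float, lower: float = 0.40, upper: float = 0.50) -> float:
    """Fraction of the maximum reward under fixed thresholds (assumed linear scale)."""
    if achievement <= lower:
        return 0.0
    if achievement >= upper:
        return 1.0
    return (achievement - lower) / (upper - lower)


def tournament_winners(achievement: Dict[str, float], top_fraction: float = 0.25) -> List[str]:
    """Only the best top_fraction of providers are rewarded, whatever their absolute level."""
    k = max(1, math.floor(len(achievement) * top_fraction))
    return sorted(achievement, key=achievement.get, reverse=True)[:k]


# Hypothetical share of patients meeting the target for four providers.
providers = {"P1": 0.72, "P2": 0.65, "P3": 0.58, "P4": 0.55}
print({p: fixed_threshold_share(a) for p, a in providers.items()})  # every provider earns 1.0
print(tournament_winners(providers))                                # ['P1']
```

Under fixed thresholds every provider above the upper threshold earns the full reward, leaving room to keep difficult patients; under a tournament, a provider's reward depends on outperforming others, so the slack a fixed threshold provides does not exist.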
A second approach allows physicians to remove inappropriate patients from the calculation of quality achievement (exception reporting) (BMA 2009). Several authors (Doran et al. 2008; Fleetcroft et al. 2008) have noted three advantages of this approach: it is precise, its active exclusion design can increase the acceptance rate of the P4P program, and it may help prevent patients from being refused care because of severe medical conditions. However, evidence continues to indicate that abuse of the system may undermine these benefits (Doran et al. 2006; Sigfrid et al. 2006; Gravelle et al. 2010).
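The arithmetic behind the concern about abuse is simple, as this hypothetical Python example shows: when the excepted patients are ones who missed the target, removing them from the denominator raises reported achievement without any change in care. The function name and counts are illustrative.

```python
def achievement_rate(met: int, eligible: int, excepted: int = 0) -> float:
    """Share of eligible patients meeting a target after removing excepted patients.

    excepted: patients removed from the denominator by exception reporting;
    assumed here to be patients who did not meet the target, so `met` is unchanged.
    """
    denominator = eligible - excepted
    if denominator <= 0:
        raise ValueError("no patients left in the denominator")
    return met / denominator


# Hypothetical practice: 100 eligible patients, 60 meet the target.
print(achievement_rate(60, 100))               # 0.6
# Excepting 20 patients who missed the target raises the reported rate.
print(achievement_rate(60, 100, excepted=20))  # 0.75
```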
The final approach is to apply risk adjustment to outcome or process measures (Landon et al. 2003; Asch et al. 2006). For payers, an appropriate risk adjustment framework is important because incentives can change physicians' behavior only when physicians consider the data complete and accurate and the score calculation fair; otherwise, physicians may react against the system or try to game it (Bokhour et al. 2006). As Ryan (2009) notes, "in the absence of complete risk adjustment, providers may engage in statistical discrimination: the application of perceived group characteristics to individuals." Statistical discrimination may lead providers to avoid patients on the basis of unmeasured severity.
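One common form of risk adjustment is indirect standardization, which compares each provider's observed poor outcomes with the number expected given its patients' risk. The Python sketch below assumes that each patient's expected risk is already available from some risk model (for example, one using age, CIC count, and DCSI); that model and all numbers are hypothetical, and the text does not prescribe this particular method.

```python
import math
from typing import Sequence


def risk_adjusted_rate(poor_outcome: Sequence[int], expected_risk: Sequence[float],
                       overall_rate: float) -> float:
    """Indirectly standardized rate: (observed / expected) * overall rate.

    poor_outcome: 1 if the patient had the poor outcome (e.g., A1C >= 9.5), else 0.
    expected_risk: each patient's predicted probability of that outcome from a
        hypothetical risk model.
    """
    observed = sum(poor_outcome)
    expected = math.fsum(expected_risk)
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected * overall_rate


overall = 0.20  # hypothetical program-wide poor-outcome rate
# Provider A treats sicker patients: 4 poor outcomes, 4.0 expected.
sicker = risk_adjusted_rate([1] * 4 + [0] * 6, [0.4] * 10, overall)
# Provider B treats healthier patients: 2 poor outcomes, 1.0 expected.
healthier = risk_adjusted_rate([1] * 2 + [0] * 8, [0.1] * 10, overall)
print(round(sicker, 2), round(healthier, 2))  # 0.2 0.4
```

Unadjusted, provider A's poor-outcome rate (40 percent) looks twice as bad as provider B's (20 percent); after adjustment the ranking reverses, which weakens the incentive to exclude high-risk patients.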
The United States and the United Kingdom have faced political problems in implementing P4P programs (Gulland 2003a, b; Tanenbaum 2009), and Taiwan's reforms face similar difficulties. In Taiwan, the design of the original 2001 P4P program itself ran into interest group politics (Chang 2004). To overcome opposition to the P4P program and to encourage providers to participate, the original design was compromised to allow voluntary provider participation and provider discretion over patient enrollment. In addition, there was no timetable for completing the evolution of the P4P program. In May 2009, the latest version of the DM-P4P program introduced some reforms, requiring providers to reach a new P4P patient enrollment rate of 30 percent and to enroll a volume of P4P patients at or above the mandated threshold (50 or more).
Other factors also hinder implementation, and insufficient funding for the P4P program is one of them. The yearly DM-P4P cost is only about 3–5 percent of total expenditure on DM care in Taiwan (Lai et al. 2009). With such limited NHI investment, it is difficult to follow the United Kingdom's example of mandated participation, because the payments may not cover the additional expenses hospitals incur in implementing the P4P program (Epstein 2006), which involves procedural changes such as reporting clinical data (Halladay et al. 2009).
In the United Kingdom, universal electronic health record data made it possible to measure processes and outcomes for all DM patients, so exclusions occurred only through an active process (Curcin et al. 2010). In Taiwan, by contrast, hospitals must report outcome data themselves; hence, although patient complexity may lead to exclusion, we cannot rule out the cost of reporting clinical data as an additional motive for excluding patients. At the very least, this study demonstrates an association between patient complexity and exclusion from the P4P program.
Our study has some limitations. First, our data contain no recorded reasons for exclusion, so we cannot determine why specific patients were excluded from P4P programs. Second, because of patient privacy considerations, we cannot link our claims data to the "cause of death" data file provided by the Department of Health. We can calculate only all-cause inpatient mortality, so the total number of deaths was likely underestimated in this study. However, Table SA2 in the Appendix shows that excluding deaths changed the logistic model results only slightly. Third, although our interpretation and motivation concern physicians' individual practices of inclusion and exclusion, our analysis is conducted at the hospital level rather than the physician level.
We performed our analysis at the hospital level for several reasons. First, although incentive calculations in Taiwan were oriented toward physicians, the NHI's payments actually went to the hospital, which in theory then passed them on to its physicians who treated P4P patients. However, we do not know whether hospital administrators actually passed this P4P benefit on to physicians; every hospital has its own physician fee policy. Second, some U.K. studies of P4P exclusion also performed their analysis at the practice level (Doran et al. 2008; Gravelle et al. 2010). Third, our physician-level data are incomplete: our database records only physician IDs. For these reasons, we analyzed exclusion behavior at the hospital level. Moreover, hospital characteristics explain 65 percent of the variance in patient participation, which suggests that they are an important factor in P4P enrollment.
Based on our findings, we recommend that the government carry out a deliberate, stepwise reform: first implementing risk adjustment in the DM-P4P program, and then gradually investing more money to cover hospitals' costs of running the P4P program. Finally, having observed providers' reactions to the incentives and estimated the costs, the government would be well positioned to fully implement mandated participation in the P4P program.
Joint Acknowledgment/Disclosure Statement: The authors would like to thank the Bureau of National Health Insurance (No. DOH97-NH-1006) and the Ministry of Education of Taiwan (No. 99RH0021), through its "Aiming for the Top University and Elite Research Center Development Plan," for their financial support. The authors are also very grateful to Tung-Liang Chiang, Ph.D., for his support.
Additional supporting information may be found in the online version of this article:
Appendix SA1: Author Matrix.
Table SA1: Factors Associated with the Exclusion of DM Patients from P4P Programs.
Table SA2: Factors Associated with the Exclusion of DM Patients from P4P Programs Using a Logistic Model.
Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.