The first stage of analysis was the development of the Rasch model. When a Rasch model can be constructed, it indicates that an underlying dimension is common to all of the variables. Several indicators are used to assess the overall fit of the data to the Rasch model: model reliability, item quality using point measure correlations, and measure quality using infit and outfit statistics. Winsteps version 3.63.2 was used to analyze the data.
Using RMA, we are able to assess the reliabilities for hospitals and improvement programs independently. The reliability analyses reveal the extent to which the program usage items yield an internally consistent measure. Hospital reliability is r=0.83 and program reliability is r=0.97. These values indicate a high degree of fit and thus provide an initial indication that a unidimensional representation of the data exists.
Next, the relationship of each individual improvement program variable to the overall model is examined. The point measure correlation of each item with the total score indicates how well a program predicts the total number of programs supported. For this measure, the sign of the correlation, rather than its magnitude, is the relevant aspect of the analysis, and it should be positive. As shown in , all signs are positive with the exception of ISO/TS-certified programs. The negative correlation indicates that the ISO/TS-certified program variable does not fit the overall model for hospitals, and this variable was removed from further analysis.
Finally, how well the individual items fit the model is considered. Mean square fit statistics are used to assess whether observations are in agreement with the Rasch model values. Two types of fit statistics can be used in the Rasch model: outfit and infit. The outfit measure is sensitive to unexpected observations by hospitals and programs that are relatively far from their position. The infit measure is sensitive to unexpected patterns of observations by hospitals and programs that are close to their position (Wright and Masters 1982). One way of examining individual item fit is by means of z-scores. The standardized z-scores for the fit statistics should fall within a −2 to +2 range for acceptable fit. As shown in , all of the remaining improvement programs are within the −2 to +2 standardized z-score range defining acceptable fit. Based on these results, we conclude that the data provide a good fit to the Rasch model, thus revealing the existence of a unidimensional latent trait that can be described as the program capability of hospitals.
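The outfit and infit mean squares described above can be computed directly from the response matrix once ability and difficulty estimates are available. The sketch below assumes a dichotomous Rasch model; the function name and data layout are illustrative, not taken from the study or from Winsteps.

```python
import numpy as np

def rasch_fit_statistics(responses, abilities, difficulties):
    """Outfit and infit mean squares per item (program) in a dichotomous
    Rasch model. responses: hospitals x programs 0/1 matrix;
    abilities and difficulties are in logits."""
    # Expected probability of success for each hospital-program pair
    p = 1.0 / (1.0 + np.exp(difficulties[None, :] - abilities[:, None]))
    variance = p * (1.0 - p)
    z_sq = (responses - p) ** 2 / variance  # squared standardized residuals
    outfit = z_sq.mean(axis=0)              # unweighted mean square
    # Information-weighted mean square gives more weight to well-targeted pairs
    infit = ((responses - p) ** 2).sum(axis=0) / variance.sum(axis=0)
    return outfit, infit
```

Values near 1.0 indicate observations consistent with the model; the standardized z-scores reported in the text are transformations of these mean squares.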
Although a Rasch model could be effectively constructed based on improvement program usage, its correspondence to the stage model presented in also needs to be assessed. The measure scores in report the difficulty of each program; the scores are ordered according to level of difficulty, with the easier programs at the bottom. Assuming that all lower stage programs are implemented before the programs of the next higher stage from , the Wilcoxon test, a nonparametric analog of the t-test, was used to analyze whether a difference exists between the results of the Rasch model and the four-stage model presented in . No statistically significant differences were observed (W=−6, ns/r=8) between the improvement programs in the proposed four-stage model and the results produced by the Rasch model. Although some differences were present, the Rasch model results appeared to be consistent with the four-stage view of cumulative development.
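A comparison of this kind can be run as a Wilcoxon signed-rank test on the paired program orderings. The sketch below uses hypothetical rankings for eight programs (the values are illustrative, not the study's data):

```python
from scipy.stats import wilcoxon

# Illustrative paired rankings for eight improvement programs: the order
# implied by the four-stage model versus the order implied by the Rasch
# difficulty scores (hypothetical values).
stage_model_order = [1, 2, 3, 4, 5, 6, 7, 8]
rasch_order = [2, 1, 4, 3, 6, 5, 8, 7]

stat, p_value = wilcoxon(stage_model_order, rasch_order)
# A large p-value means the two orderings do not differ significantly.
print(f"W = {stat}, p = {p_value:.3f}")
```

Here a failure to reject the null hypothesis is the desired outcome: it indicates the Rasch ordering is consistent with the proposed stage sequence.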
The end result of the Rasch analysis is that all improvement programs receive a difficulty score and all hospitals receive an ability score. Both the difficulty score and the ability score are reported as logits. The results for both program location and hospital location are presented graphically in , the variable map. The vertical axis in the center of the variable map serves as a ruler, with logits as a common unit of measurement, allowing the comparison of both hospitals and improvement programs.
On the right side of the map, the distribution of the improvement programs is shown. Improvement programs located higher on the map are accomplished less frequently than those lower on the map. The Rasch modeling literature refers to this aspect as a program's difficulty. Within the health care context, two significant sources of difficulty arise: (1) an improvement program can be difficult to deploy throughout a hospital, or (2) a hospital is unable to understand how a particular improvement program can be effectively employed. The hierarchy from the easiest to the most difficult programs to implement spans a range of more than 6 logits. This result implies that a well-defined hierarchical structure exists among the improvement programs. Based upon the Rasch difficulty scores, there is a statistically significant difference in difficulty (~z=3.85) between the bottom 25 percent of the programs and the top 25 percent of the programs, which approximate the first stage (quality conformance) and the highest stage (low cost) presented in .
The distribution of hospitals is located on the left side of the map. The hospitals in this study had widely varying abilities in carrying out programs. Hospitals located higher on the left side of the map have been able to adopt more programs than hospitals that are located lower on the map. Rasch modeling literature refers to this aspect as the hospital's ability to carry out a program. Based upon the Rasch ability scores, there is a statistically significant difference (~z=2.13) between the bottom 10 percent of the hospitals and the top 10 percent of the hospitals in their ability to carry out programs. That is, the hospitals with the most ability were able to implement more programs than the hospitals with lesser levels of ability.
What does the preceding statistical analysis tell us? First, the ordering of improvement programs obtained in this study is consistent with the stages presented in the Competitive Progression Theory. As a result, we would expect improvement programs representing the highest stage activities to be accomplished less often than those representing the lowest stage activities (because they are more difficult), which we found to be true. Second, because of the range of difficulty presented by the improvement programs, we would expect hospitals to have varying levels of capability in carrying out improvement programs. Indeed, the most capable hospitals were able to implement more programs than the least capable hospitals. Aside from the theoretical framework, which proposes general reasons for the success or failure in implementing improvement programs, the practical significance of this analysis lies in the interpretation of the variable map for individual hospitals.
The variable map is interpreted in terms of both hospitals and improvement programs. A program 1 logit above a hospital's position is roughly twice as difficult for that hospital to accomplish as a program at its own level; conversely, a program 1 logit below the hospital's position is roughly twice as easy. The difficulty of adopting a program can also be described in terms of its probability of success. Understanding these relationships can guide an organization in making more effective operating decisions and more efficient resource allocations.
For example, if Hospital A, as denoted by the arrow in (at the level of the Balanced Scorecard), decided to pursue FOCUS PDSA, there is a 27 percent probability that it would be able to successfully implement the program. If Hospital A decided to pursue the Malcolm Baldrige Award, the chances of success would be even lower, <5 percent. However, if the hospital did not have an employee suggestion system and decided to implement that type of program, the probability of successful implementation would be quite high, nearly 75 percent.
The probabilities of success cited in the previous paragraph are related to the difficulty of achieving a particular program. Probability of success, like difficulty, is determined by distances on the vertical ruler (Bond and Fox 2007). If a program that a hospital wants to accomplish is 1 logit higher than where the hospital is currently situated in terms of its ability, there is only a 27 percent probability that the program will be accomplished. If the program is 2 logits higher, the probability drops to 12 percent, and if it is 3 logits higher, to only 5 percent. The farther a program is from the ability of the hospital, the greater the chance of failure.
It is important to note that these are only probabilities. Any selection of a program will generally require devoting a substantial amount of financial, human, and time resources in order to ensure that the program is successful. It should be noted that low probabilities do not preclude a hospital from attempting to implement more difficult programs. It simply means that greater effort and more resources will need to be expended by a hospital to ensure the successful implementation of the program.
The second stage of analysis examines the overall impact of a hospital's ability to implement programs on its operating performance. To measure the impact on performance, we used the results of the Rasch analysis in conjunction with the Leapfrog score. Because inclusion in the Leapfrog database is voluntary, only 48 of the hospitals taking part in this study also participated in the Leapfrog survey, which is 44 percent of the responding hospitals. A regression model was built with the Rasch hospital ability score as the independent variable and the hospital's Leapfrog score as the dependent variable. Initially, we included case-mix index, number of beds, and urban or rural location in the model to control for differences among hospitals. However, none of these control variables were found to be significant, and they were removed from the model. The results from the final model indicate that the Rasch ability score is significantly related to the Leapfrog score, F(1,47)=16.846, p<.01. The r2=0.264 indicates that over 25 percent of the total variance in the Leapfrog score is explained by the Rasch ability score. That is, hospitals with greater numbers of implemented programs tended to have higher scores on the Leapfrog Hospital Quality and Safety Survey.
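A one-predictor regression of this kind, with its F statistic and r2, can be reproduced in a few lines. The sketch below uses hypothetical ability and Leapfrog values, not the study's data; for a single predictor, F(1, n−2) follows directly from r2.

```python
import numpy as np
from scipy import stats

# Hypothetical data: Rasch ability scores (logits) and Leapfrog scores
# for eight hospitals (illustrative values only).
ability = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5, 2.1, 2.6])
leapfrog = np.array([12.0, 15.0, 14.0, 18.0, 19.0, 22.0, 21.0, 25.0])

result = stats.linregress(ability, leapfrog)
r_squared = result.rvalue ** 2
n = len(ability)
# F statistic for a one-predictor model: F = r2 * (n - 2) / (1 - r2)
f_stat = r_squared * (n - 2) / (1 - r_squared)
print(f"R^2 = {r_squared:.3f}, F(1, {n - 2}) = {f_stat:.2f}, "
      f"p = {result.pvalue:.4f}")
```

With the study's larger sample, the same calculation yields the reported significance of the ability score as a predictor of the Leapfrog score.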