Examining a single underlying construct
Among the 45 items, 18 items (shown in Table ) met the model's expectations fairly well (infit and outfit MNSQ between 0.5 and 1.5). These 18 items also covered the seven specified categories of Picker's original inpatient questionnaire and had point-biserial correlations ranging from 0.66 to 0.84. A principal component analysis of the Rasch residuals showed no additional factors, in that (a) the variance explained by the items was more than 4 times that of the first principal component (68% divided by 4.6% yields 14.78); (b) the variance explained by the Rasch factor was 68%, greater than the cutoff of 50%; (c) the first eigenvalue was 2.6, less than the cutoff of 3; and (d) the percentage of variance explained by the first principal component was 4.6%, less than 5%. These results indicated a good model-data fit and that the assumption of unidimensionality held for these 18 polytomous items. The category Rasch-Andrich thresholds (step difficulties) were ordered: -3.76, -1.91, 1.57 and 4.11.
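The four unidimensionality criteria and the threshold ordering described above can be checked mechanically. The following is a minimal sketch in Python, not the authors' procedure: the function name `residual_pca_checks` and the dictionary keys are hypothetical, and the input values are simply those reported in the text.

```python
def residual_pca_checks(rasch_var_pct, first_contrast_pct, first_eigenvalue):
    """Apply the four cutoffs quoted in the text to residual-PCA summary values.

    All arguments are summary statistics reported by the Rasch software;
    this function only compares them against the stated cutoffs.
    """
    ratio = rasch_var_pct / first_contrast_pct
    return {
        "ratio": ratio,
        "ratio_above_4": ratio > 4,                       # criterion (a)
        "rasch_variance_above_50pct": rasch_var_pct > 50,  # criterion (b)
        "eigenvalue_below_3": first_eigenvalue < 3,        # criterion (c)
        "first_contrast_below_5pct": first_contrast_pct < 5,  # criterion (d)
    }


def thresholds_ordered(thresholds):
    """Return True when each step difficulty is larger than the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Values reported in the text.
checks = residual_pca_checks(rasch_var_pct=68.0,
                             first_contrast_pct=4.6,
                             first_eigenvalue=2.6)
steps_ok = thresholds_ordered([-3.76, -1.91, 1.57, 4.11])

print(round(checks["ratio"], 2))  # 14.78
print(all(v for k, v in checks.items() if k != "ratio"), steps_ok)  # True True
```

With the reported values, every criterion passes and the thresholds advance monotonically, which is what the paragraph concludes.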
Item difficulties with DIF-free and Fit MNSQ statistics of the 15 items in the 2003 English inpatient questionnaire of the Picker Institute Europe
Overall, these 18 items exhibited a good model-data fit. Hence, they measured a single construct for patient satisfaction, and an interval scale of logits was achieved for further comparison and analysis [38]. The hospital measures ranged from -1.59 to 9.63 with a mean of 2.64 and a standard deviation of 2.09, indicating that the items were relatively easy for these hospitals and that the hospitals were widely dispersed along the interval scale.
The hospital sample separation reliability was 0.94 (Cronbach's α [50] = 0.96), indicating that these 18 items could differentiate the hospitals very well. The separation index for the items (a measure of the spread of the estimates relative to their precision) was as high as 4.01, allowing us to differentiate between five statistically distinct strata of item difficulties with the formula of strata = (4 × 4.01 + 1)/3 [51].
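The strata formula quoted above is simple arithmetic and can be reproduced as follows. This is a small sketch; the function name `strata` is hypothetical, and rounding down to a whole number is an assumption about how the text arrives at "five", not something the source states.

```python
def strata(separation_index):
    """Number of statistically distinct strata: (4 * G + 1) / 3."""
    return (4 * separation_index + 1) / 3


h = strata(4.01)        # separation index reported in the text
print(round(h, 2))      # 5.68
print(int(h))           # 5 whole strata (assumed truncation)
```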
Analysis of variance on the hospital measures revealed a significant difference (F = 8.318; p < 0.001) among types of hospitals: General practices (M = 4.72 logits) performed the best, followed by Large hospitals (M = 3.47 logits), Teaching hospitals (M = 2.62 logits), Small hospitals (M = 2.25 logits), and Medium hospitals (M = 0.82 logits).
The three items on which it was most difficult to satisfy patients were item 39 (Staff told you about any medication side effects to watch when going home), item 41 (Doctors or nurses gave your family information needed to help you) and item 27 (Hospital staff talks about your worries and fears). The easiest was item 34 (Hospital staff did everything they could to help you control your pain). The mean and standard deviation of the item difficulties were 0.00 and 1.52, respectively. All items fell within an absolute range of 4.9 logits. The item difficulties were well spread across the range of hospital measures, so these items could differentiate hospitals fairly well, consistent with the hospital separation reliability of 0.94.
Item invariance refers to the fact that the estimated item location parameters should not depend on the sample used to calibrate the estimates [50]. Table shows that no DIF items were found across different types of hospitals, suggesting that these items measure the same construct across types of hospitals such that their performances can be directly compared.
KIDMAP used for diagnosing hospitals
Figure shows a Web-KIDMAP for a particular hospital. In the right-hand bottom corner (the 4th quadrant), there are six 'easier not achieved' items that the hospital was expected to have achieved given the performance estimate of 6.14 logits and the percentile rank of 10 (see to the right of the percentile column in Figure ).
KIDMAP profile of an actual assessed hospital.
Three items show the most unexpected responses: one is marked with an asterisk (*) and two with a caret (^), indicating statistically significant departures from expectation (p < 0.05 and p < 0.01, respectively). The label 6.5.4 in the 4th quadrant means that the 5th category of item 6 was endorsed as 4 by the hospital. In fact, the hospital (6.41 logits) had a very good chance of achieving a score higher than 4 but failed to do so (e.g., see the left-hand side in Figure ). These unexpected responses are informative and worth noting, because the hospital's weakness did not match the patients' perception. Note that the unexpected response was identified by inspecting the hospital's own performance level (i.e., self-comparison), rather than the averaged scores in Picker's item-by-item diagram, Figure .
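To illustrate why a score of 4 can be flagged as unexpected for a hospital at 6.41 logits, the sketch below computes category probabilities and a standardized residual under the Andrich rating scale model, using the thresholds reported earlier. Several things here are assumptions rather than facts from the source: the item difficulty `ITEM_DIFFICULTY = 0.0` is hypothetical (the difficulty of item 6 is not given in this section), the categories are assumed to be scored 1 to 5, and the function names are illustrative. It is not the Web-KIDMAP implementation.

```python
import math


def category_probabilities(theta, delta, thresholds):
    """Rating scale model: P(category k) for k = 0..len(thresholds).

    The log-numerator of category k is the cumulative sum of
    (theta - delta - tau_j) over the first k thresholds.
    """
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (theta - delta - tau))
    peak = max(log_num)                      # subtract max for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def standardized_residual(observed, theta, delta, thresholds, lowest=1):
    """Return (expected score, model variance, z) for one observed rating."""
    probs = category_probabilities(theta, delta, thresholds)
    scores = [lowest + k for k in range(len(probs))]
    expected = sum(s * p for s, p in zip(scores, probs))
    variance = sum((s - expected) ** 2 * p for s, p in zip(scores, probs))
    z = (observed - expected) / math.sqrt(variance)
    return expected, variance, z


THRESHOLDS = [-3.76, -1.91, 1.57, 4.11]  # reported in the text
HOSPITAL_MEASURE = 6.41                  # reported in the text
ITEM_DIFFICULTY = 0.0                    # hypothetical: not given for item 6

probs = category_probabilities(HOSPITAL_MEASURE, ITEM_DIFFICULTY, THRESHOLDS)
expected, variance, z = standardized_residual(
    4, HOSPITAL_MEASURE, ITEM_DIFFICULTY, THRESHOLDS)
print([round(p, 3) for p in probs])
print(round(expected, 2), round(z, 2))
```

Under these assumptions the top category is by far the most probable, so an observed rating of 4 falls below the expected score and yields a negative standardized residual. The actual magnitude depends on the real difficulty of the item.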
Three steps to read the Web-KIDMAP
We analyzed the data from the 2003 EPIE inpatient questionnaire [17] and developed a Web-KIDMAP diagram that could be visualized on the Internet for (a) inter-hospital comparison (by inspecting performance levels along the logit scale), (b) intra-hospital comparison (by inspecting response patterns and residual Z-scores), and (c) model-data fit checking (by inspecting MNSQ statistics).