In this study, the psychometric properties of the item bank were examined using a sample of 555 respondents and an incomplete calibration design. Each item was presented to between 262 and 293 respondents. These figures are above the minimum of 200 respondents regarded as necessary to implement the models used in this paper [12]. It could be argued that it would have been desirable to present all items to all respondents, but this would have placed an unacceptable burden on the often frail population in this study. Incomplete calibration designs are regularly implemented in the development and maintenance of item banks used in educational testing [4] and have gained some recognition in health-related applications [15]. Developments in psychometric theory mean that it is now possible to perform the same types of analysis on data from incomplete designs as on data from complete calibration designs [22]. The number of items in the anchors following the analysis indicates that the design was still amply linked [9].
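The two design checks mentioned above (enough respondents per item, and a design that remains linked) can be illustrated with a minimal sketch. The toy design, the function names and the definition of "linked" used here (every item can be reached from every other through respondents who saw both) are assumptions made for illustration, not the procedure used in this study.

```python
def item_counts(presented):
    """Number of respondents to whom each item was presented.

    presented: list of rows (one per respondent), each a list of booleans
    that are True where the item was shown to that respondent.
    """
    return [sum(column) for column in zip(*presented)]


def is_linked(presented):
    """True if all items are connected through shared respondents."""
    n_items = len(presented[0])
    # Items shown to the same respondent are treated as directly linked.
    neighbours = [set() for _ in range(n_items)]
    for row in presented:
        shown = [j for j, flag in enumerate(row) if flag]
        for j in shown:
            neighbours[j].update(shown)
    # Depth-first search starting from item 0.
    seen, stack = {0}, [0]
    while stack:
        for j in neighbours[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == n_items


# Hypothetical toy design: two overlapping booklets of three items each.
booklet_a = [True, True, True, False]
booklet_b = [False, True, True, True]
design = [booklet_a] * 3 + [booklet_b] * 3

counts = item_counts(design)
MIN_RESPONDENTS = 200  # minimum quoted in the text
print(counts, all(c >= MIN_RESPONDENTS for c in counts), is_linked(design))
```

In the toy design, items 1 and 2 act as anchors between the two booklets. Removing them from both booklets would leave two disconnected sets of items, which could no longer be placed on a common scale.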
One of the major assumptions underlying the use of the item response theory models described in this paper is that the items reflect a single latent trait (θ). This was examined using item response theory based full-information factor analysis. Part of the full-information factor analysis was performed on subsets of the data, as exploratory analyses on incomplete designs may lead to unstable results. However, the confirmatory factor analysis was performed on all data. The results, together with the high level of internal consistency, as measured by Cronbach's alpha, and the acceptable fit of the two-parameter logistic item response theory model to the data, indicate that the items presented in this paper probably represent a unidimensional construct in a population of respondents requiring residential care.
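As a reminder of what the internal consistency coefficient measures, the following minimal sketch computes Cronbach's alpha from a complete block of dichotomous responses. It assumes a respondents-by-items table without missing values (for example, the responses within a single subset of the design); how alpha was obtained from the incomplete design in this study is not described here, and the data shown are invented.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a complete respondents x items table of scores."""
    k = len(scores[0])
    # Sample variance of each item, and of the respondents' sum scores.
    item_vars = [variance(column) for column in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented responses: 1 = 'can', 0 = 'cannot'.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(responses))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances. A high value is consistent with, but does not by itself demonstrate, unidimensionality.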
Another important assumption when using item response theory models in conjunction with marginal maximum likelihood estimation procedures is that the values of the latent trait (θ) follow a pre-specified, usually Normal, distribution. In this study, there was no evidence that these values did not follow a Normal distribution. This is in contrast to many previously published studies into health and quality of life outcomes, where a strongly skewed distribution was found. The authors feel that there are two reasons for this contrast. Firstly, in this study, the respondents all had some level of restriction in their ability to perform activities of daily life. Secondly, the item bank includes items well above and well below the level of functional status enjoyed by the respondents. This means that the item bank did not have a ceiling or floor effect with respect to this population.
In this study, 81 (51%) of 160 items were removed from the item bank because they did not conform to the psychometric standards required of the item bank. This is a much higher proportion than would be expected in the calibration of an item bank for use in educational measurement. However, when the results are examined more carefully, 28 items were removed because they were too difficult or too easy for the population in this study. In addition, 26 items were removed because they had different item parameters for different groups of respondents. These problems would have been identified much earlier in an educational item bank. Hence, only 26 (25%) of the remaining 106 items were removed due to item misfit. The number of items retained in the item bank might have been higher if a more flexible model, based on, for example, non-parametric smoothing techniques, had been used [26]. However, this type of model is less suitable as a base for implementing modern testing algorithms, such as computerised adaptive testing. In addition, it is possible that more items could be made available if the items demonstrating differential item functioning were included in the item bank with different item location parameters (βi) for males and females or for younger and older respondents. This may seem complicated, but is straightforward in the framework of a computerised item bank.
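To illustrate why group-specific location parameters are straightforward to handle in a computerised item bank, the sketch below stores one βi per group for a single item and looks up the appropriate value when computing the response probability. The logistic parametrisation, the parameter values and the names `p_can`, `p_can_for_group` and `item` are assumptions made for illustration only.

```python
import math


def p_can(theta, alpha, beta):
    """Two-parameter logistic probability of responding 'can'."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


# Hypothetical item showing differential item functioning by sex:
# one discrimination parameter, one location parameter per group.
item = {"alpha": 1.2, "beta": {"female": -0.3, "male": 0.4}}


def p_can_for_group(theta, item, group):
    """Look up the group-specific location before evaluating the model."""
    return p_can(theta, item["alpha"], item["beta"][group])


for group in ("female", "male"):
    print(group, round(p_can_for_group(0.0, item, group), 3))
```

Under this arrangement, the scoring or item selection routine only needs to know the respondent's group, and the rest of the algorithm is unchanged.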
This paper has concentrated on the two-parameter logistic item response theory model. However, the one-parameter logistic item response theory model was also fitted to the 79 items remaining in the item bank. This model fitted the data significantly less well than the two-parameter model, even after 3 items demonstrating misfit at the item level were removed. This confirms the choice of the two-parameter model. This model was chosen because it allows the probability of responding in the category 'can' to be modelled more flexibly than when the one-parameter logistic model is used. This enables a more realistic model for the data to be built than when the more restrictive approach associated with the one-parameter model is chosen [18].
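For reference, the difference in flexibility between the two models can be seen from their usual textbook forms, given here as an assumption about the parametrisation rather than a quotation of the definitions used elsewhere in this paper. Here $X_{ni}$ is the response of respondent $n$ to item $i$ (1 for 'can'), $\theta_n$ the latent trait, $\beta_i$ the item location (written βi in the text) and $\alpha_i$ the item discrimination.

```latex
% Two-parameter logistic model: each item has its own discrimination \alpha_i
P(X_{ni} = 1 \mid \theta_n)
  = \frac{\exp\{\alpha_i(\theta_n - \beta_i)\}}
         {1 + \exp\{\alpha_i(\theta_n - \beta_i)\}}

% One-parameter logistic model: the special case \alpha_i = 1 for all items
P(X_{ni} = 1 \mid \theta_n)
  = \frac{\exp(\theta_n - \beta_i)}
         {1 + \exp(\theta_n - \beta_i)}
```

Because the one-parameter model is the special case in which all items share the same discrimination, the two-parameter model can only fit at least as well, and the comparison reported above concerns whether the improvement is larger than chance would explain.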
This paper has examined the calibration of the ALDS item bank in a population requiring residential care. It has been shown that the item bank has sound psychometric properties and could form a stable base for a wide range of applications. However, it is possible that the items will have different measurement characteristics for patients requiring treatment for specific chronic conditions or for respondents in other countries. Hence, it is important that the ALDS item bank is tested carefully before it is used to assess functional status in other groups of respondents or in other countries.