This study showed that CAT can be applied successfully to shorten the administration of the MASQ-AD in ROM. CATs were simulated under five stopping rules (required standard errors in decreasing steps of 0.10). As expected, the average number of administered items increased as the required measurement precision increased. Likewise, the correlation between the latent depression estimates from the full and the adaptive assessment increased with the required precision. Moreover, the criterion validity of the latent depression estimate was attenuated less by measurement error as the required precision increased.
Despite the inevitable loss of information as the requirements on measurement precision were relaxed, the magnitude of this loss was surprisingly small. For example, the CAT requiring SE(θ) to be below 0.4, which administered on average only about a fifth of the items per respondent, gave depression scores that correlated 0.946 with the full assessment score and 0.651 with the MASQ-GD scale score, the latter being only marginally smaller than the original concurrent validity (0.739). In addition, the CAT estimates under the 'SE(θ) < 0.5' stopping rule, which administered on average only about a seventh of the items per respondent, correlated 0.917 with the original score and had an AUC (0.769) for predicting depression that was only marginally smaller than that of the full assessment estimates (0.810). This study therefore agrees with other studies on CAT [1, 6, 7, 10] in concluding that CAT is a fruitful way of increasing the efficiency of self-report questionnaires.
Before discussing which stopping rule is best for a real MASQ-AD CAT in ROM, it should be stressed that the present outcomes were based on a CAT simulated on data obtained in a standard assessment. Simulated and actual adaptive administrations may yield different item reductions, because respondents may behave differently in a real adaptive test. Therefore, in addition to the present study, an actual MASQ-AD CAT administration should be studied as well. Fortunately, others have shown that the outcomes of simulated and actual CAT administrations can be very similar [37], so the present study may be instructive nevertheless. The stopping rules 'SE(θ) < 0.4' and 'SE(θ) < 0.5' seem to be the best for real MASQ-AD CAT administrations in patient populations. CATs under these rules used a relatively small number of items, correlated substantially with the original latent depression estimates, and showed only a small attenuation in criterion-related validity.
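The logic of a simulated CAT with an SE-based stopping rule can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes a dichotomous two-parameter logistic model rather than a polytomous IRT model, a hypothetical randomly generated 30-item bank, maximum-information item selection, and EAP scoring with a standard normal prior, with the posterior standard deviation taken as SE(θ). All function and variable names are illustrative. As in a post hoc simulation, the answers are looked up from a complete response pattern instead of being collected adaptively.

```python
import math
import random

GRID = [-4.0 + 0.1 * k for k in range(81)]  # quadrature points for theta

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (discrimination a, location b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def eap_estimate(responses, bank):
    """EAP estimate and posterior SD under a standard normal prior.

    responses maps item index -> 0/1 answer; bank is a list of (a, b) pairs.
    """
    weights = []
    for t in GRID:
        log_w = -0.5 * t * t  # log of the (unnormalised) normal prior
        for i, x in responses.items():
            p = p_endorse(t, *bank[i])
            log_w += math.log(p if x == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, GRID)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, GRID)) / total
    return mean, math.sqrt(var)

def simulate_cat(answer, bank, se_stop):
    """Administer items until SE(theta) < se_stop or the bank is exhausted.

    answer(i) returns the stored 0/1 response to item i.
    Returns (theta estimate, SE, number of items administered).
    """
    responses = {}
    theta, se = eap_estimate(responses, bank)
    while se >= se_stop and len(responses) < len(bank):
        remaining = [i for i in range(len(bank)) if i not in responses]
        # Select the unused item that is most informative at the current estimate.
        nxt = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        responses[nxt] = answer(nxt)
        theta, se = eap_estimate(responses, bank)
    return theta, se, len(responses)

# Hypothetical item bank and one simulated respondent (not MASQ-AD parameters).
rng = random.Random(1)
bank = [(rng.uniform(1.0, 2.5), rng.uniform(-2.0, 2.0)) for _ in range(30)]
true_theta = 0.5
full_answers = [int(rng.random() < p_endorse(true_theta, a, b)) for a, b in bank]

for se_stop in (0.5, 0.4, 0.3):
    theta, se, n_items = simulate_cat(lambda i: full_answers[i], bank, se_stop)
    print(se_stop, n_items, round(theta, 2), round(se, 2))
```

Because the item sequence is the same for a given response pattern, a stricter stopping rule simply continues the same test for longer, so the number of administered items cannot decrease as the required SE is lowered.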
In addition to improving the efficiency of MASQ-AD assessment, IRT modeling provided estimates of test information (see Appendix). These showed that the information peaked somewhat below the middle of the latent trait scale. The figure shows that for θ > 1, information is relatively low. In addition, the lower panel shows the estimated distribution of the latent depression variable for each of the two MINI groups separately. It is apparent that the test is informative in the region where these two groups are to be told apart; in the region where patients suffering from depression prevail (i.e., on the right-hand side), information is lower. Consequently, small differences are not detected as easily in that region as they are for respondents with lower depression scores (which is why the MINI depression group required, on average, about four items more than the no-depression group under the 'SE(θ) < 0.3' rule). MASQ users should ask themselves whether the test information presented here fits their testing goal. For example, when CAT is used for monitoring the development of mental health and for obtaining reliable change scores, an item bank with more uniform test information may be preferred. By contrast, in other situations MASQ-AD assessment may be aimed mainly at deciding whether or not a respondent scores high on depression (see [38]). In such cases CAT as described in this study seems to be a sound choice. In fact, when predictive utility is the only goal, it may be even better to adjust the CAT algorithm and use so-called clinical decision adaptive testing [39, 40], in which items with threshold parameters around a cut-score are needed. When MASQ-AD users are satisfied with the test information presented in this study, however, they can take advantage of CAT as a very efficient method of MASQ-AD assessment in ROM.
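The link between test information and the number of items needed can be made explicit with a standard IRT approximation. The relation below holds asymptotically for maximum likelihood scoring; it is stated here as an assumption, since with Bayesian scoring the prior also contributes and the thresholds are only approximate.

```latex
\[
  SE(\hat{\theta}) \approx \frac{1}{\sqrt{I(\theta)}}
  \quad\Longrightarrow\quad
  SE(\hat{\theta}) < c \iff I(\theta) > \frac{1}{c^{2}}
\]
\[
  c = 0.5:\; I(\theta) > 4, \qquad
  c = 0.4:\; I(\theta) > 6.25, \qquad
  c = 0.3:\; I(\theta) > \frac{1}{0.09} \approx 11.1
\]
```

Under this approximation, a respondent located where individual items carry little information needs more items to accumulate the required total, which is consistent with the larger number of items administered to the MINI depression group under the strictest rule.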
Although CAT offers an opportunity to improve the efficiency of patient assessment substantially, its implementation requires more than knowledge of item bank construction and psychometric analysis. An actual implementation also needs a test delivery system that is flexible enough to accommodate the dynamic nature of CAT. The investment required to switch to CAT depends on the current assessment method. For example, when ROM is performed using web-based software, as is the case at the Rivierduinen Centers, the adaptation does not have to be extensive. By contrast, institutions still using paper-and-pencil questionnaires may have to build such a system from scratch. For these institutions it may be fruitful to register at the Assessment Center (http://www.assessmentcenter.net), a free online research management tool sponsored by PROMIS, which enables mental health workers and researchers to create websites and CATs for their own patients and scales.
In this study we sought a solution to the main disadvantage of ROM: its time consumption [5]. Although we restricted our attention to the efficiency of a single scale, the ultimate solution would be to convert all scales used for assessment in ROM to CAT versions. One could combine this with smart testing designs [41-43], for example by collecting scores at only some of the time points for each subject. Such a hybrid system would reduce time consumption even further. It should be noted that efficient assessments have at least two other advantages. First, for some patient groups, such as severely ill patients, administering many questions may decrease the quality of the answers given [3]; shorter assessments will yield data of higher quality. Second, ROM data can be used by researchers for effectiveness studies [4]. In such studies, it is important that patients who entered treatment remain in treatment. Because the willingness of patients to cooperate in research is known to decrease with the size and number of questionnaires [44, 45], shorter ROM assessments may lead to lower drop-out rates. We hope that these outcomes will convince both mental health researchers and practitioners to implement the MASQ-AD CAT, or to develop a CAT for their preferred mental health instruments, for use in ROM.