This evaluation has shown good agreement between reviewers and the final consensus rating for most QUADAS items, and feedback from reviewers who have used QUADAS was very positive. Two items, uninterpretable results and withdrawals, were found to be problematic. Agreement among reviewers, and between reviewers and consensus, was poorer for these items than for other items; feedback from reviewers also suggested problems with these items. One reviewer suggested that this might be because it is difficult to know what to do if it is unclear whether there are any uninterpretable results or withdrawals. Our own use of QUADAS supports this: we have found it very difficult to know how to score these items if the study does not report whether there were any uninterpretable results or withdrawals, and all patients who entered the study appear to be accounted for. In such situations it is often unclear whether the study authors simply excluded uninterpretable results or withdrawals from their reports, or whether there truly were none. We have handled this problem by giving more explicit instructions for scoring these QUADAS items: we state that they should be scored as "yes" if it appears that all patients who entered the study completed it.
The assessment of inter-rater reliability also highlighted possible problems with the items on the availability of clinical information and on selection criteria. The item on clinical information is very specific to each review, so it is essential to provide clear guidelines on scoring this item, outlining exactly what information should be available to the person interpreting the results of the index test. This definition should be agreed a priori. This was done for the review used in this evaluation and is reflected in the very high levels of agreement between two of the reviewers and the final consensus. It is unclear why the third reviewer showed much poorer agreement (50%) with the final consensus rating. The reason for the poorer agreement with the consensus rating on the selection criteria item is similarly unclear: this item was not highlighted as problematic in the feedback from reviewers, and the poorer agreement may be related to the fact that no review-specific information was provided for this item.
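For context, the agreement figures quoted here are simple percentages, i.e. the proportion of studies on which a reviewer's rating matched the consensus. A minimal sketch of that calculation, using entirely hypothetical ratings for ten studies on a single QUADAS item, might look as follows:

```python
# Percent agreement between one reviewer's ratings and the consensus,
# using hypothetical ratings for ten studies on a single QUADAS item.
reviewer  = ["yes", "no", "yes", "unclear", "yes", "no", "yes", "yes", "no", "unclear"]
consensus = ["yes", "no", "no", "unclear", "yes", "yes", "yes", "no", "no", "yes"]

# Proportion of studies where the reviewer's rating matches the consensus.
agreement = sum(r == c for r, c in zip(reviewer, consensus)) / len(consensus)
print(f"Percent agreement: {agreement:.0%}")  # 60% for these hypothetical data
```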
All additional items suggested for inclusion in QUADAS had been considered during the development of QUADAS, but were not selected by the panel of experts for inclusion in the final tool. One of the suggested items, relating to the threshold for the index test, could be covered as part of item 8 (description of index test details); this is worth considering when producing review-specific guidelines for scoring this item.
There was substantial variation in the time taken to complete QUADAS, ranging from less than 10 minutes to over 1 hour. This may be explained by the fact that some reviewers counted the time taken for the whole process of data extraction, including reading the paper, whereas others counted only the time taken to complete QUADAS. Despite this, half the reviewers took less than 15 minutes, and 17/20 took less than half an hour, suggesting that QUADAS is relatively quick to complete.
Strengths and weaknesses of the study
The major strength of this study is that we carried out a detailed evaluation of QUADAS, which specifically included the views and experience of users. We are unaware of any other quality assessment tools for diagnostic accuracy studies that have undergone any process of evaluation.
Ideally, we would have liked to assess the "construct validity" of the tool – "the degree to which a test measures what it claims, or purports, to be measuring" [6]. As QUADAS aims to provide an indication of the quality of a study, one way to assess this would be to take a set of "high quality" studies and a set of "low quality" studies and determine whether QUADAS can distinguish between them. This is known as the "extreme groups" approach [6]. The problem with this approach is determining which studies are high quality and which are low quality: there is no objective way of doing this. In addition, a systematic review is likely to include studies covering a range of quality. A quality assessment tool needs to be able to distinguish subtle differences across this full range of study quality, not just the extremes. We therefore decided against this method of evaluation.
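To make the idea concrete, a minimal sketch of what an extreme-groups comparison might look like is given below, assuming a total QUADAS score (number of items rated "yes") had been computed per study. The study scores and the choice of a Mann-Whitney test are illustrative assumptions, not part of our evaluation.

```python
# A hypothetical "extreme groups" comparison: total QUADAS scores (number of
# the 14 items scored "yes") for two sets of studies judged, by some external
# standard, to be of high or low quality.
from scipy.stats import mannwhitneyu

high_quality_scores = [12, 13, 11, 14, 12]  # hypothetical totals out of 14 items
low_quality_scores = [6, 7, 5, 8, 6]

# If the tool discriminates, scores in the high-quality group should be
# systematically larger than those in the low-quality group.
stat, p_value = mannwhitneyu(high_quality_scores, low_quality_scores,
                             alternative="greater")
print(f"Mann-Whitney U = {stat}, p = {p_value:.4f}")
```

The difficulty noted above remains, of course: the grouping itself must be asserted rather than measured objectively, which is why we did not pursue this approach.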
Unanswered questions and future research
We originally proposed to carry out a meta-epidemiological regression analysis to investigate the association of individual QUADAS items with estimates of test performance. However, due to limited time and resources such an evaluation was not feasible; this is an area where future research would be beneficial. The Cochrane Collaboration is planning to extend its database to include diagnostic test accuracy reviews and is in the process of producing a handbook providing guidelines for the conduct of such reviews. The recommendations on quality assessment include a modified version of QUADAS (items 2, 8 and 9, which relate to reporting rather than quality, have been removed), and this will be built into the new Cochrane software. All diagnostic reviews included in the new Cochrane Database will therefore include an assessment using QUADAS, with the results entered into the Review Manager software in a structured way. In the future, once a number of Cochrane Test Accuracy Reviews have been completed, a meta-epidemiological regression analysis can be pursued.
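As a rough sketch of what such an analysis might involve, the example below regresses the log diagnostic odds ratio (lnDOR) from each study on indicators of whether individual QUADAS items were fulfilled, weighting studies by inverse variance. All data, the item choices, and the simple fixed-effect weighting are hypothetical assumptions, not a specification of the planned analysis.

```python
# Hypothetical sketch of a meta-epidemiological regression: does fulfilling
# a given QUADAS item predict the (log) diagnostic odds ratio across studies?
import numpy as np
import statsmodels.api as sm

# Hypothetical per-study data: log diagnostic odds ratio (lnDOR), its
# variance, and 0/1 indicators for two illustrative QUADAS items.
ln_dor = np.array([2.1, 1.4, 2.8, 1.1, 1.9, 2.5])
var_ln_dor = np.array([0.10, 0.25, 0.15, 0.30, 0.12, 0.20])
item_spectrum = np.array([1, 0, 1, 0, 1, 1])      # e.g. representative spectrum
item_verification = np.array([1, 0, 1, 0, 0, 1])  # e.g. complete verification

X = sm.add_constant(np.column_stack([item_spectrum, item_verification]))

# Inverse-variance weighted regression, as in a fixed-effect meta-regression;
# the coefficients estimate how lnDOR differs when an item is fulfilled.
model = sm.WLS(ln_dor, X, weights=1.0 / var_ln_dor).fit()
print(model.summary())
```

A full analysis would also need to model between-study heterogeneity (for example with a random-effects term) and would require many more studies than this toy example, which is precisely why a body of completed Cochrane Test Accuracy Reviews would make such work feasible.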