We designed a 37-item assessment tool by combining items from two available instruments: the enhanced Overview Quality Assessment Questionnaire (OQAQ) [8], which contains 10 items, and a checklist created by Sacks [7], which contains 24 items. We supplemented these with three additional items reflecting methodological advances in the field since the two original instruments were developed:
1) Language restriction: Language restriction in systematic reviews remains controversial. Some studies have suggested that systematic reviews that include only English-language publications tend to overestimate effect sizes [10], whereas other studies suggest that such restrictions may not lead to overestimation [11]. An item was added to determine whether a language restriction was applied in selecting studies for the systematic review.
2) Publication bias: Publication bias refers to the tendency for research with negative findings to be published less frequently, less prominently, or more slowly, and for research with positive findings to be published more than once. Publication bias has been identified as a major threat to the validity of systematic reviews. Empirical research suggests that publication bias is widespread, and a variety of methods are now available to assess it [12]. An item was added to determine whether the authors assessed the likelihood of publication bias.
3) Publication status: Research on the publication status of studies suggests that published trials are generally larger and may show an overall greater treatment effect than studies published in the 'grey' literature [20]. The importance of including grey literature in all systematic reviews has been discussed [21]. The assessment of the inclusion of grey literature considers whether or not the authors reported searching for grey literature.
The 37-item assessment tool was used to appraise 99 paper-based reviews identified from a database of reviews and meta-analyses [22] and 52 Cochrane systematic reviews from the Cochrane Database of Systematic Reviews [9]. After the list of selected systematic reviews was generated, full copies of the reviews were retrieved, copied, and masked to conceal author, institution, and journal. Reviews in languages other than English (i.e., French, German, and Portuguese) were translated into English with the assistance of colleagues before masking [23].
For each included systematic review, two reviewers (CH, BS) independently assessed methodological quality using the 37 items.
Statistical analyses and graphs of the results were produced using SPSS version 13.0 for Windows. The 37 items were subjected to principal components analysis, and the components were rotated using Varimax rotation. Items with factor loadings below 0.50 were removed.
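The analysis itself was run in SPSS; what follows is only a minimal Python/NumPy sketch of the same sequence of steps, not the authors' code. It assumes item ratings are coded numerically in an array with one row per review and one column per item, extracts principal components from the item correlation matrix, and applies Varimax rotation with Kaiser row normalization (the normalization SPSS reports by default for Varimax). Treating an item as having a "low loading" when its largest absolute rotated loading is below 0.50 is our interpretation. The text does not state the component-retention rule, so `n_components` is a parameter; when omitted, the sketch falls back to the eigenvalue-greater-than-1 rule, another assumption. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def principal_components(scores, n_components=None):
    """Unrotated principal-component loadings from the item correlation matrix.

    scores has one row per review and one column per item. If n_components
    is None, components with eigenvalue > 1 are kept (an assumed rule).
    """
    corr = np.corrcoef(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is None:
        n_components = int(np.sum(eigvals > 1.0))
    # A loading is an eigenvector entry scaled by sqrt(eigenvalue)
    return eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation with Kaiser row normalization."""
    p, k = loadings.shape
    # Kaiser normalization: scale each item's row to unit length before rotating
    h = np.sqrt(np.sum(loadings ** 2, axis=1, keepdims=True))
    h[h == 0] = 1.0
    normed = loadings / h
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = normed @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)
        # SVD step of the standard iterative Varimax algorithm
        u, s, vt = np.linalg.svd(normed.T @ (rotated ** 3 - (gamma / p) * rotated * col_ss))
        rotation = u @ vt
        new_criterion = np.sum(s)
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return (normed @ rotation) * h               # undo the row normalization


def retained_items(scores, threshold=0.50, n_components=None):
    """Indices of items whose largest absolute rotated loading is >= threshold."""
    rotated = varimax(principal_components(scores, n_components))
    max_loading = np.max(np.abs(rotated), axis=1)
    return [i for i, m in enumerate(max_loading) if m >= threshold], rotated


# Synthetic illustration (not the study data): two clusters of three items
# plus one pure-noise item, for 500 hypothetical reviews
rng = np.random.default_rng(0)
n = 500
factors = rng.standard_normal((n, 2))
scores = np.column_stack(
    [factors[:, 0] + 0.3 * rng.standard_normal(n) for _ in range(3)]
    + [factors[:, 1] + 0.3 * rng.standard_normal(n) for _ in range(3)]
    + [rng.standard_normal(n)]
)
kept, rotated = retained_items(scores, n_components=2)
print(kept)
```

With two planted clusters of three items each plus one pure-noise column, the sketch should print `[0, 1, 2, 3, 4, 5]`, dropping only the noise item. Because Varimax is an orthogonal rotation, each item's communality (row sum of squared loadings) is unchanged by it.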
We convened an international panel of eleven experts in methodological quality assessment and systematic reviews. Panel members were drawn from three organizations involved both in conducting systematic reviews and in assessing methodological quality. The group included clinicians, methodologists, epidemiologists, and reviewers who were new to the field. Some members had previously been involved in the Cochrane Collaboration, while others had not. After examining the results of the factor analysis, the panel reflected critically on the components identified and decided which items to include in the new instrument. The nominal group process took place in San Francisco during a one-day session.
We used the nominal group technique (NGT) to reach agreement; its aim was to structure interaction within the group. After an overview of the project and the planned process for the day, the panel reviewed the results of the factor analysis. First, each participant was asked to record his or her ideas independently and privately. The ideas were then listed in a round-robin format: the facilitator collected one idea from each individual in turn and listed it in front of the group, continuing until all ideas had been listed. Individuals then privately recorded their judgements, and further discussion followed. The individual judgements were aggregated statistically to derive the group judgements. The nominal group was also asked to agree on a final label for each of the 11 components. A description was formulated for each item, and a penultimate version of the instrument was assembled. This was circulated electronically to the group for a final round of fine-tuning.
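The text does not say how judgements were scored or which statistic was used to aggregate them. Purely as an illustration, the sketch below models the round-robin listing and one possible aggregation rule under stated assumptions: each participant's private ideas are an ordered list, ideas already on the shared list are skipped when a participant's turn comes up, each participant rates each idea on a numeric scale, and the group judgement is the median rating. The function names, the rating scale, and the example data are hypothetical.

```python
from collections import deque
from statistics import median


def round_robin(ideas_by_participant):
    """Build the shared list by taking one idea from each participant in turn.

    ideas_by_participant maps a participant to the ideas they recorded
    privately, in their own order. Skipping ideas that are already on the
    shared list is an assumption made for this sketch.
    """
    queues = {who: deque(ideas) for who, ideas in ideas_by_participant.items()}
    listed = []
    while any(queues.values()):          # stop once every private list is exhausted
        for queue in queues.values():
            # Offer this participant's next idea that is not yet on the list
            while queue:
                idea = queue.popleft()
                if idea not in listed:
                    listed.append(idea)
                    break
    return listed


def group_judgements(ratings):
    """Aggregate private ratings into one group judgement per idea.

    ratings maps a participant to {idea: numeric rating}. Using the median
    as the aggregation rule is a hypothetical choice, not the paper's method.
    """
    ideas = {idea for per_person in ratings.values() for idea in per_person}
    return {
        idea: median([r[idea] for r in ratings.values() if idea in r])
        for idea in sorted(ideas)
    }


# Hypothetical example data: three participants, made-up ideas and 1-9 ratings
ideas = {
    "P1": ["idea A", "idea C"],
    "P2": ["idea B", "idea A"],
    "P3": ["idea C"],
}
print(round_robin(ideas))

ratings = {
    "P1": {"idea A": 8, "idea B": 3},
    "P2": {"idea A": 7, "idea B": 5},
    "P3": {"idea A": 9, "idea B": 4},
}
print(group_judgements(ratings))
```

A median is used here because it is robust to a single outlying rating in a small panel; a mean or a vote count would be equally plausible readings of "aggregated statistically".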