BMC Health Serv Res. 2005; 5: 25. | PMCID: PMC1084246 |
Copyright © 2005 Atkins et al; licensee BioMed Central Ltd.
Systems for grading the quality of evidence and the strength of recommendations II: Pilot study of a new system
David Atkins,1 Peter A Briss,2 Martin Eccles,3 Signe Flottorp,4 Gordon H Guyatt,5 Robin T Harbour,6 Suzanne Hill,7 Roman Jaeschke,8 Alessandro Liberati,9 Nicola Magrini,10 James Mason,3 Dianne O'Connell,11 Andrew D Oxman,4 Bob Phillips,12 Holger Schünemann,5,13 Tessa Tan-Torres Edejer,14 Gunn E Vist,4 John W Williams, Jr,15 and The GRADE Working Group
1Center for Practice and Technology Assessment, Agency for Healthcare Research and Quality, 540 Gaither Rd., Rockville, MD 20852, USA
2Community Guide Branch, Centers for Disease Control and Prevention, MS K73, 4770 Buford Highway, Atlanta, GA 30341, USA
3Centre for Health Services Research, University of Newcastle upon Tyne, 21 Claremont Place, Newcastle upon Tyne NE2 4AA, UK
4Informed Choice Research Department, Norwegian Health Services Research Centre, Pb. 7004 St. Olavs Plass, 0130 Oslo, Norway
5Departments of Clinical Epidemiology and Biostatistics and Medicine, McMaster University, 1200 Main Street West, Hamilton, Ontario L8N 3Z5, Canada
6Scottish Intercollegiate Guidelines Network, 9 Queen Street, Edinburgh EH2 1JQ, UK
7Department of Clinical Pharmacology, Faculty of Medicine and Health Sciences, University of Newcastle, Level 5, New Med 2 Building, Newcastle Mater Hospital, Waratah, NSW 2298, Australia
8Department of Medicine, McMaster University, 1200 Main Street West, Hamilton, Ontario L8N 3Z5, Canada
9Department of Oncology and Hematology, Università di Modena e Reggio Emilia, Azienda Ospedaliera Policlinico, Via dal Pozzo 41, 41100 Modena, Italia and Centro per la Valutazione della Efficacia della Assistenza Sanitaria (CeVEAS), Modena, Italy
10Centro per la Valutazione della Efficacia della Assistenza Sanitaria (CeVEAS), NHS Centre for the Evaluation of the Effectiveness of Health Care, Viale Muratori 201, Modena 41100, Italy
11Cancer Epidemiology Research Unit, Cancer Research and Registers Division, The Cancer Council NSW, PO Box 572, Kings Cross NSW 1340, Australia
12Centre for Evidence-based Medicine, University Department of Psychiatry, Warneford Hospital, Oxford OX3 7JX, UK
13Departments of Medicine and Social & Preventive Medicine, University at Buffalo, State University of New York, ECMC-CC142, 462 Grider St, Buffalo, NY 14215, USA
14Global Programme on Evidence for Health Policy, World Health Organisation, CH-1211 Geneva 27, Switzerland
15The Center for Health Services Research in Primary Care, HSR&D, Department of Veterans Affairs Medical Center and Duke University Medical Center, 508 Fulton St., Durham, NC 27705, USA
Received January 23, 2004; Accepted March 23, 2005.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This pilot study of the GRADE approach to grading the quality of evidence and strength of recommendations helped to identify problems with the approach and enabled us to address these. We found that it was possible to resolve most of the disagreements we had when making judgements independently and there was agreement that this approach warrants further development and evaluation.
Many of the disagreements were a direct result of a lack of information. We concluded that there is a need for detailed additional information in evidence profiles, and have modified the evidence profiles accordingly. When we have found an empirical basis or compelling arguments, we have also provided precise definitions. For example, we have agreed on a basis for defining strong and very strong associations. However, in many cases we continue to rely on judgement. We have addressed this by always including the rationale for such judgements in footnotes attached to the evidence profile.
The evidence profiles used in the pilot study were based on systematic reviews [2-13]. Much of the information we found lacking was missing from these original systematic reviews, particularly information about harms and side effects. It was outside the scope of this study to collect this information systematically. However, systematic reviews of evidence of harms, as well as benefits, are essential for guideline development panels. If reviews, such as Cochrane reviews, are going to meet the needs of guideline development panels, and others making decisions about health care, it is essential that evidence of adverse effects is systematically included in them.
An important benefit of the approach to grading evidence and recommendations that we used in this study is that it clarifies the source of true disagreements, as well as helping to resolve disagreements through discussing each type of judgement sequentially. Judgements about the relative importance of different outcomes and about trade-offs, as well as about the quality of evidence, are made explicitly, rather than implicitly. This facilitates discussion and clarification of these judgements. It may be helpful to guideline panels and others to use this approach before making decisions and recommendations.
The most common source of disagreement that we encountered was differences in what we consider to be sparse data. We have not reached a consensus on a definition of sparse data, but have acknowledged that we have different thresholds and now recognize this when we make judgements about the quality of evidence [16].
As a result of this pilot study, we have been able to make considerable improvements to our system for grading the quality of evidence and strength of recommendations. The evidence profiles used in the pilot study have been modified and now include information that was missing and was found to be an important source of disagreement, as illustrated in Table 9 and Table 10, and the criteria used for grading the quality of evidence for each important outcome have been modified as summarised in Table 11. Guideline development involves judgement. Residual individual judgements affect the agreement we measured in this study, so lower kappa values are to be expected. Further refinement of the GRADE system and additional instructions will improve agreement.
Table 9. Example of a modified GRADE evidence profile: quality assessment. Tables 9 and 10 show what Tables 1 and 2 became after incorporating the improvements made based on the pilot study experience.
Table 10. Example of a modified GRADE evidence profile: summary of findings. Tables 9 and 10 show what Tables 1 and 2 became after incorporating the improvements made based on the pilot study experience.
Table 11. Modified GRADE quality assessment criteria.
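To make the kappa values mentioned above more concrete, the following minimal sketch computes Cohen's kappa, a chance-corrected agreement statistic, for two raters who each assign a quality grade to the same outcomes. It is an illustration only: the two-rater form, the function name `cohen_kappa`, and the ratings are assumptions for the example, not the procedure or data used in this study, where more than two people made judgements and a multi-rater variant would be needed.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters grading the same items (hypothetical helper)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given the same grade by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: expected overlap given each rater's own grade frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[grade] * counts_b[grade] for grade in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical grade throughout; kappa is undefined,
        # so complete agreement is reported by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented ratings for six outcomes; not taken from the pilot study.
rater_1 = ["high", "moderate", "low", "low", "moderate", "high"]
rater_2 = ["high", "low", "low", "very low", "moderate", "moderate"]

kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))  # 0.308
```

In this invented example the raters agree on three of six outcomes (observed agreement 0.5), but about 0.28 agreement would be expected by chance alone, so kappa is only about 0.31. This shows how residual differences in individual judgement can pull kappa well below the raw proportion of agreement.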
Judgements about confidence in evidence and recommendations are complex. The GRADE system represents our current thinking about how to reduce errors and improve communication of these complex judgements. Ongoing developments include:
• Exploring the extent to which the same system should be applied to public health and health policy decisions as well as clinical decisions
• Developing guidance for when and how costs (resource utilisation) should be considered
• Developing guidance for judgements regarding sparse data
• Adapting the approach to accommodate recommendations about diagnostic tests when these are based on evidence of test accuracy
• Incorporating considerations about equity
• Preparing tools to support the application of the GRADE system
Plans for further development include studies of the reliability and sensibility of this approach and a study comparing alternative ways of presenting these judgements [17]. We invite other organisations responsible for systematic reviews of the effects of healthcare or practice guidelines to work with us to further develop and evaluate the system described here.