Contributors: RH and JM coordinated the work on this project and drafted the final grading system. The review process and agreement of the underlying principles was carried out by the grading review group which, in addition to the authors, comprised Dr Grahame Howard (chair) (Western General Hospital, Edinburgh); Dr Doreen Campbell (Scottish Executive Department of Health); Professor Adrian Grant (University of Aberdeen); Professor Jeremy Grimshaw (University of Aberdeen); Professor Phillip Hannaford (University of Aberdeen); Dr Chris Kelnar (University of Edinburgh); Professor Julian Little (University of Aberdeen); Professor Gordon Lowe (University of Glasgow); Ms Jill Mollison (University of Aberdeen); Dr Leo Murray (Ayr Hospital); Dr Moray Nairn (SIGN); Professor Gillian Needham (University of Aberdeen); Dr Gillian Penney (University of Aberdeen); Professor James Petrie (SIGN; guarantor); Professor Nigel Pitts (University of Dundee); Dr David Signorini (formerly with Western General Hospital, Edinburgh); Professor Frank Sullivan (University of Dundee); and Ms Gail Topping (Fife Health Board).
The Scottish Intercollegiate Guidelines Network (SIGN) develops evidence based clinical guidelines for the NHS in Scotland. The key elements of the methodology are that (a) guidelines are developed by multidisciplinary groups; (b) they are based on a systematic review of the scientific evidence; and (c) recommendations are explicitly linked to the supporting evidence and graded according to the strength of that evidence.
Until recently, the system for grading guideline recommendations was based on the work of the US Agency for Healthcare Research and Quality (formerly the Agency for Health Care Policy and Research).1,2 However, experience over more than five years of guideline development led to a growing awareness of this system's weaknesses. Firstly, the grading system was designed largely for application to questions of effectiveness, where randomised controlled trials are accepted as the most robust study design with the least risk of bias in the results. However, in many areas of medical practice randomised trials may not be practical or ethical to undertake, and for many questions other types of study design may provide the best evidence. Secondly, guideline development groups often fail to take adequate account of the methodological quality of individual studies and of the overall picture presented by a body of evidence, as opposed to individual studies, or they fail to apply sufficient judgment to the overall strength of the evidence base and its applicability to the target population of the guideline. Thirdly, guideline users are often not clear about the implications of the grading system. They misinterpret the grade of a recommendation as relating to its importance, rather than to the strength of the supporting evidence, and may therefore fail to give due weight to low grade recommendations.
In 1998, SIGN undertook to review and, where appropriate, to refine the system for evaluating guideline evidence and grading recommendations. The review had three main objectives. Firstly, the group aimed to develop a system that would maintain the link between the strength of the available evidence and the grade of the recommendation, while allowing recommendations to be based on the best available evidence and be weighted accordingly. Secondly, it planned to ensure that the grading system incorporated formal assessment of the methodological quality, quantity, consistency, and applicability of the evidence base. Thirdly, the group hoped to present the grading system in a clear and unambiguous way that would allow guideline developers and users to understand the link between the strength of the evidence and the grade of recommendation.
The review group decided that a more explicit and structured approach (figure) to the process of developing recommendations was required to address the weaknesses identified in the existing grading system. The four key stages in the process identified by the group are shown in the box.
The strength of the evidence provided by an individual study depends on the ability of the study design to minimise the possibility of bias and to maximise attribution. The hierarchy of study types adopted by the Agency for Health Care Policy and Research is widely accepted as reliable in this regard and is given in box B1.1
The strength of evidence provided by a study is also influenced by how well the study was designed and carried out. Failure to give due attention to key aspects of study methods increases the risk of bias or confounding and thus reduces the study's reliability.3 The critical appraisal of the evidence base undertaken for SIGN guidelines therefore focuses on those aspects of study design which research has shown to have a significant influence on the validity of the results and conclusions. These key questions differ between types of studies, and the use of checklists is recommended to ensure that all relevant aspects are considered and that a consistent approach is used in the methodological assessment of the evidence.
We carried out an extensive search to identify existing checklists. These were then reviewed in order to identify a validated model on which SIGN checklists could be based. The checklists developed by the New South Wales Department of Health were selected because of the rigorous development and validation procedures they had undergone.4 These checklists were further evaluated and adapted by the grading review group in order to meet SIGN's requirements for a balance between methodological rigour and practicality of use. New checklists were developed for systematic reviews, randomised trials, and cohort and case control studies, and these were tested with a number of SIGN development groups to ensure that the wording was clear and the checklists produced consistent results. As a result of these tests, some of the wording of the checklists was amended to improve clarity.
A supplementary checklist covers issues specific to the evaluation of diagnostic tests. This was based on the New South Wales checklist,4 adapted with reference to the work of the Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests and Carruthers et al.5,6
The checklists use written responses to the individual questions, with users then assigning studies an overall rating according to specified criteria (see box B2).4 The full set of checklists and detailed notes on their use are available from SIGN.7
The next step is to extract the relevant data from each study that was rated as having a low or moderate risk of bias and to compile a summary of the individual studies and the overall direction of the evidence. A single, well conducted, systematic review or a very large randomised trial with clear outcomes could support a recommendation independently. Smaller, less well conducted studies require a body of evidence displaying a degree of consistency to support a recommendation. In these circumstances an evidence table presenting summaries of all the relevant studies should be compiled.
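As an illustration only, the filtering and summarising step described above can be sketched in a few lines of Python. The `Study` fields, the rating labels (`"low"`, `"moderate"`, `"high"`), and the direction labels are hypothetical names chosen for this sketch, not part of the SIGN checklists. The simple majority count used for the overall direction is a crude stand-in for the considered judgment that a guideline development group would apply.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for the overall checklist rating; the real criteria
# are defined in the SIGN checklists, not in this sketch.
ACCEPTED_RISK = {"low", "moderate"}


@dataclass
class Study:
    name: str
    design: str        # e.g. "randomised trial", "cohort"
    risk_of_bias: str  # "low", "moderate" or "high" (assumed labels)
    direction: str     # "benefit", "no effect" or "harm" (assumed labels)


def build_evidence_table(studies):
    """Keep studies rated low or moderate risk of bias and summarise them."""
    rows = [s for s in studies if s.risk_of_bias in ACCEPTED_RISK]
    directions = Counter(s.direction for s in rows)
    # Most frequent direction among retained studies; None if none remain.
    overall = directions.most_common(1)[0][0] if rows else None
    return {"rows": rows, "directions": dict(directions), "overall": overall}


studies = [
    Study("Trial A", "randomised trial", "low", "benefit"),
    Study("Cohort B", "cohort", "moderate", "benefit"),
    Study("Trial C", "randomised trial", "high", "harm"),
    Study("Cohort D", "cohort", "low", "no effect"),
]
table = build_evidence_table(studies)
print([s.name for s in table["rows"]])  # studies kept for the evidence table
print(table["overall"])                 # most frequent direction of effect
```

In this invented example the study rated as high risk of bias is excluded, and the remaining three rows form the evidence table on which the group's judgment would then be exercised.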
Having completed a rigorous and objective synthesis of the evidence base, the guideline development group must then make what is essentially a subjective judgment on the recommendations that can validly be made on the basis of this evidence. This requires the exercise of judgment based on clinical experience as well as knowledge of the evidence and the methods used to generate it. Although it is not practical to lay out “rules” for exercising judgment, guideline development groups are asked to consider the evidence in terms of quantity, quality, and consistency; applicability; generalisability; and clinical impact.
Increasing the role of subjective judgment in this way risks the reintroduction of bias into the process. It must be emphasised that this is not the judgment of an individual but of a carefully composed multidisciplinary group. An additional safeguard is the requirement for the guideline development group to present clearly the evidence on which the recommendation is based, making the link between evidence and recommendation explicit and explaining how they interpreted that evidence.
The revised grading system (box B3) is intended to strike an appropriate balance between incorporating the complexity of type and quality of the evidence and maintaining clarity for guideline users. The key changes from the Agency for Health Care Policy and Research system are that the study type and quality rating are combined in the evidence level; the grading of recommendations extrapolated from the available evidence is clarified; and the grades of recommendation are extended from three to four categories, effectively by splitting the previous grade B which was seen as covering too broad a range of evidence type and quality.
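The first of these changes, combining study type and quality rating in a single evidence level, can be illustrated with a short Python sketch. The notation is an assumption: the numbered study-type tiers (1 to 4) and the quality ratings (`++`, `+`, `-`) follow the labels commonly seen in SIGN guidelines, but the box that defines them is not reproduced in this text, so the values below should be read as illustrative rather than as the published definitions. The function name `evidence_level` is hypothetical.

```python
# Assumed notation (not taken from the text above): study-type tiers 1-4
# and quality ratings "++", "+", "-".
QUALITY_RATINGS = ("++", "+", "-")
RATED_TIERS = (1, 2)    # assumption: only tiers 1 and 2 carry a quality rating
UNRATED_TIERS = (3, 4)  # assumption: tiers 3 and 4 are reported without one


def evidence_level(tier, quality=None):
    """Combine a study-type tier and a quality rating into one label."""
    if tier in RATED_TIERS:
        if quality not in QUALITY_RATINGS:
            raise ValueError(f"tier {tier} needs a rating from {QUALITY_RATINGS}")
        return f"{tier}{quality}"
    if tier in UNRATED_TIERS:
        return str(tier)
    raise ValueError(f"unknown study-type tier: {tier}")


print(evidence_level(1, "++"))
print(evidence_level(2, "-"))
print(evidence_level(3))
```

The sketch deliberately stops at the evidence level. Mapping levels to grades of recommendation also depends on applicability, consistency, and extrapolation, which require the group judgment described earlier and are not reducible to a lookup.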
Inevitably, some compromises had to be made, and for some areas of practice, such as diagnosis, recommendations higher than grade B are unlikely because of the type of study that can feasibly be conducted in those areas. More generally, the review group expects that grade A recommendations will become relatively rare under the new system, and that grade B will come to be regarded as the best achievable in many areas. Early results from applying this system in practice suggest that this expectation is well founded. Further research will be required to establish the extent to which this new system meets the objectives set for it.
Funding: Scottish Intercollegiate Guidelines Network.
Competing interests: None declared.