For clinical practice guidelines to be an effective tool for improving the care of (cancer) patients, they should meet specific quality criteria (Feder et al, 1999; Shekelle et al, 1999). This concern is shared worldwide and has been underlined by renewed calls for internationally recognised standards to promote the rigorous development of clinical guidelines and to assess their quality (Shaneyfelt et al, 1999; Grilli et al, 2000). Clearly, these standards should be valid, reliable and feasible.
The AGREE Collaboration has recently developed such criteria in the context of an EU-funded research project. Bringing together researchers and policy makers from 12 countries (UK, The Netherlands, Denmark, Finland, France, Switzerland, Spain, Canada, Italy, Germany, USA, New Zealand), the collaboration aims to establish comparable frameworks for assessing and monitoring the quality of clinical practice guidelines, including the process of development and the reporting of that process. The AGREE Instrument was developed through a multistage process of item generation, selection and scaling, field testing and refinement. A small working group first compiled a comprehensive checklist of 82 items that covered recognised components of guideline quality. The term ‘quality’ was defined as the confidence that the biases linked to the rigour of development, presentation and applicability of a guideline had been minimised during the development process. Most of the items were derived from existing lists or instruments (e.g. Lohr and Field, 1992; Grol et al, 1998b; Cluzeau et al, 1999) to cover all aspects of the concept of quality. Following preliminary testing, the checklist was reduced to 32 items classified into five quality domains. This shorter checklist was then circulated to all members of the AGREE collaboration and to other international experts for comment. The resulting ‘first’ version of the instrument was field tested for reliability and validity on 100 guidelines with 195 appraisers from 11 countries; the sample contained 31 cancer guidelines, including guidelines from the FNCLCC and from the Canadian Cancer Care Ontario Practice Guidelines Initiative. After refinement, the instrument was field tested again, with a new set of appraisers, on a random sample of 33 guidelines (including 14 cancer guidelines) drawn from the first field test. The results were encouraging and demonstrated that the instrument was easy to use and could be applied consistently to a broad range of guidelines across different countries (AGREE Collaboration, 2003). Cancer guidelines generally scored highly on the instrument (for example, they scored higher than guidelines on diabetes and asthma for rigour of development).
The final AGREE instrument consists of 23 key items categorised into six domains (see: http://www.agreecollaboration.org). Each domain is intended to measure a separate dimension of guideline quality.
Scope and purpose (items 1–3): These items are concerned with the overall aim of the guideline, the specific clinical questions and the target patient population.
Stakeholder involvement (items 4–7): These items focus on the extent to which the guideline represents the views of its intended users. Guideline development needs to be carried out by a multidisciplinary group involving all stakeholders whose clinical activities are likely to be covered in the proposed guideline. This also includes patient groups.
Rigour of development (items 8–14): These items relate to the process used to gather and synthesise the evidence, and the methods used to formulate the recommendations and to update them. The recommendations should be explicitly linked to the supporting evidence. However, because most current guidelines use a mixture of ‘expert’ judgement and literature review, disclosure of disagreements or uncertainties encountered during development may help to clarify the process. Guidelines should be reviewed externally before publication, and the review process should be clearly described. They should also always include a date of publication and, because guidelines need to reflect current research, a clear statement about the updating procedures.
Clarity and presentation (items 15–18): These items deal with the language and format of the guidelines. Since the main role of guidelines is to help clinicians and patients make better decisions, busy clinicians need simple, patient-specific, user-friendly guidelines that are easy to understand. Good guidelines present clear information about the management options available and the likely consequences of each. This information can be presented in a variety of formats to suit the needs of the user.
Applicability (items 19–21): These items cover the likely organisational, behavioural and cost implications of applying the guidelines. Guidelines should be feasible to use in the current organisation of care and must fit into routine practice and its time constraints. In addition, review criteria should be developed that link guideline use to audits and other quality improvement initiatives.
Editorial independence (items 22–23): These items assess the independence of the recommendations and the acknowledgement of possible conflicts of interest among the members of the guideline development group. An increasing number of guidelines are funded, directly or indirectly, by external sources. There should be an explicit statement that the views and/or interests of the funding body have not influenced the final recommendations.
To help users understand the items, the instrument contains a users' guide with explanatory notes. Each item is scored on a reduced four-point Likert scale, and there is an overall rating as to whether the guideline should be recommended or not for use in practice.
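As an illustration of how item ratings might be summarised per domain, the sketch below sums the four-point ratings of all appraisers within a domain and rescales the total to a percentage of the possible range. The aggregation rule is an assumption, since the text above does not specify one; it follows the standardisation commonly described for the AGREE instrument. The names `AGREE_DOMAINS` and `domain_scores`, the numeric anchors 1 to 4, and the input format are hypothetical.

```python
# Item numbers for the six AGREE domains, as listed in the text above.
AGREE_DOMAINS = {
    "Scope and purpose": range(1, 4),          # items 1-3
    "Stakeholder involvement": range(4, 8),    # items 4-7
    "Rigour of development": range(8, 15),     # items 8-14
    "Clarity and presentation": range(15, 19), # items 15-18
    "Applicability": range(19, 22),            # items 19-21
    "Editorial independence": range(22, 24),   # items 22-23
}

# Four-point scale; coding the anchors as 1..4 is an assumption.
MIN_SCORE, MAX_SCORE = 1, 4


def domain_scores(appraisals):
    """Return a standardised score (0-100) for each domain.

    appraisals: list with one dict per appraiser, mapping item number
    (1..23) to that appraiser's rating on the four-point scale.
    """
    if not appraisals:
        raise ValueError("at least one appraisal is required")

    results = {}
    for domain, items in AGREE_DOMAINS.items():
        obtained = 0
        for ratings in appraisals:
            for item in items:
                score = ratings[item]
                if not MIN_SCORE <= score <= MAX_SCORE:
                    raise ValueError(f"item {item}: rating {score} out of range")
                obtained += score

        # Lowest and highest totals possible for this domain and panel size.
        n_ratings = len(items) * len(appraisals)
        lowest = MIN_SCORE * n_ratings
        highest = MAX_SCORE * n_ratings

        # Assumed standardisation: position of the total within its possible range.
        results[domain] = 100.0 * (obtained - lowest) / (highest - lowest)
    return results


if __name__ == "__main__":
    appraiser_a = {item: 3 for item in range(1, 24)}
    appraiser_b = {item: 2 for item in range(1, 24)}
    for name, value in domain_scores([appraiser_a, appraiser_b]).items():
        print(f"{name}: {value:.1f}")
```

In this example, two appraisers who rate every item 3 and 2 respectively give each domain a standardised score of 50. Domain scores are kept separate rather than combined into a single total, in keeping with the statement that each domain measures a separate dimension of guideline quality.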
The AGREE instrument was developed through a detailed and lengthy process that took many years to complete. Despite this, most of the AGREE quality criteria are still based on theoretical assumptions rather than on empirical evidence. They were developed through discussions between researchers from several countries who have extensive experience and knowledge of clinical guidelines. It remains to be shown that these criteria are actually linked to ‘better’ quality guidelines that lead to improved patient care and outcomes. Another issue is that the AGREE instrument relies heavily on the quality of the background documentation on which the guidelines are based. Although defining quality by the rigour of reporting rather than the rigour of content may not provide information on the intrinsic quality of the guidelines, it is clear that without some information about the development process it is impossible to assess the quality of guidelines (Hayward et al, 1995). Finally, guidelines need to be used if they are to assist decision-making in practice. Our understanding of which attributes of guidelines determine this complex process is limited, although important research is emerging in the field (Grol et al, 1998b; Foy et al, 2002). The quality of a guideline is affected by scientific considerations as well as by human and practical factors. Future validation research will need to focus on how these elements interact in clinical practice.