The Canadian Medical Association (CMA) maintains a national online database of clinical practice guidelines known as CMA Infobase (www.cma.ca/cpgs/index.asp). The database contains over 2000 guidelines that can be searched electronically by keyword and medical subject heading. For a guideline to be entered into the CMA Infobase, it must have been produced, reviewed or endorsed in Canada by a national, provincial or territorial medical or health organization, professional society, government agency or expert panel within 5 years of the current date. A database of all such organizations is maintained, and through regular contact with them, all new guidelines and revisions are added to the database as they become available. The extent to which the guidelines in the CMA Infobase are representative of all guidelines developed or used in Canada is unknown.
We searched the CMA Infobase for all English-language or bilingual guidelines produced or reviewed from 1994 to 1998 and coded in the database as having a pharmacological (drug therapy) focus or identified as drug therapy guidelines through a manual search. We excluded guidelines that may have been produced during this period but that subsequently expired and were not updated; we also excluded immunization guidelines.
Descriptive information was collected about each guideline. This included language, year of development, developer, type of developer (e.g., professional organization, government agency), endorsement by a professional organization, publication status (peer-reviewed publication or not published), stated drug company sponsorship, disease topic (classified according to the tenth revision of the International Statistical Classification of Diseases and Related Health Problems20) and drug focus (classified according to the Comparative Drug Index Therapeutic Classification System as used by the American Hospital Formulary System).
The standardized instrument used to assess the guidelines was the Appraisal Instrument for Clinical Guidelines, a 37-item instrument developed by Cluzeau and associates.21 This instrument was selected after a systematic search of the literature22 revealed that it was the best-developed guideline appraisal instrument available. Although limited, the available data show that the instrument has acceptable reliability, and there is preliminary evidence of criterion validity.9
The appraisal instrument is currently being used by the Independent Appraisal Service of the National Health Service (NHS) in the United Kingdom to assess all guidelines funded by the NHS through the National Clinical Guidelines Group.24
Cluzeau and associates developed the appraisal instrument so that it could be applied by anyone (general practitioner, specialist, health care manager, policy-maker or researcher) interested in assessing guidelines without prior training in how to use the instrument. For each of the 37 items the appraiser is asked to indicate whether information is present (Yes, No or Not sure) and then to judge the quality of the information. To ensure that all questions are interpreted consistently, the instrument comes with a user manual. We merged the user manual and the instrument into one document for ease of use by the appraisers. Minor modifications were made to some of the definitions of what constituted a Yes response.
To allow comparison of guideline performance, the 37 items in the instrument are separated into 3 dimensions. The first, rigour of guideline development (20 items), reflects attributes necessary to enhance guideline validity and reproducibility. The second, context and content of the guideline (12 items), addresses the attributes of reliability, applicability, flexibility and clarity. The third, application or implementation of the guideline (5 items), assesses the implementation, dissemination and monitoring strategies.
Although not formally part of the appraisal instrument, we also included a global assessment of guidelines that Cluzeau and associates9 have used previously. The appraiser was asked whether he or she would “strongly recommend this guideline for use in practice without modifications,” “recommend this guideline for use in practice on condition of some alterations or with provisos” or “not recommend this guideline” (not suitable for use in practice). We supplemented this global assessment by asking respondents to provide a global quality rating using an 11-point scale. The wording of this question was as follows: “Overall, how would you rate the quality of this guideline on a scale ranging from 0 to 10, with 0 indicating the lowest possible quality and 10 representing the highest possible quality?”
Each guideline was assessed independently by 3 appraisers (a physician, a pharmacist and a methodologist [an individual with graduate training in research methods]). In total, 56 appraisers (19 physicians, 29 pharmacists and 8 methodologists) were assigned to assess the guidelines. The physicians and pharmacists were assigned guidelines related to their area of expertise. Using the assessments, we calculated the frequency with which the guidelines adhered to each of the 37 appraisal items. Adherence was defined by agreement of at least 2 of the 3 appraisers.
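The 2-of-3 adherence rule can be sketched as follows; the function name and ratings are illustrative, not taken from the study.

```python
def item_adherence(appraisals: list[list[bool]]) -> list[bool]:
    """For each appraisal item, return True when at least 2 of the
    3 appraisers judged the information to be present (a Yes response).

    appraisals: one list of Yes/No judgments per appraiser,
    all of equal length (37 items in the study)."""
    n_items = len(appraisals[0])
    return [sum(a[i] for a in appraisals) >= 2 for i in range(n_items)]

# Example: 3 appraisers, 4 items (made-up judgments)
ratings = [
    [True, True, False, False],   # physician
    [True, False, False, True],   # pharmacist
    [False, True, False, True],   # methodologist
]
print(item_adherence(ratings))  # [True, True, False, True]
```

Item 3 counts as adherent because two of the three appraisers answered Yes, matching the majority definition used in the study.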
The 3 summary scores of guideline quality (rigour of development [dimension 1], context and content [dimension 2] and application [dimension 3]) assigned by each appraiser were calculated by summing the values for the items constituting each dimension. A Yes response was assigned a value of 1, and all other responses were given a value of 0. A dimension quality score was then obtained by calculating the mean of the appraisers' scores. This mean was expressed as a percentage of the maximum possible score for each dimension, so that scores could be compared across the 3 dimensions, as done by Cluzeau and associates.9
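This scoring can be sketched in a few lines, assuming responses are recorded as strings; the function name and example responses are our own, not the study's code.

```python
def dimension_score(responses: list[list[str]]) -> float:
    """responses: one list per appraiser, holding that appraiser's
    responses ('yes', 'no' or 'not sure') for the items in one dimension.
    Yes = 1, all other responses = 0; appraiser sums are averaged and
    the mean is expressed as a percentage of the maximum possible score."""
    n_items = len(responses[0])
    sums = [sum(1 for r in appraiser if r == "yes") for appraiser in responses]
    mean = sum(sums) / len(sums)
    return 100 * mean / n_items

# Example for a hypothetical 5-item dimension
score = dimension_score([
    ["yes", "yes", "no", "not sure", "yes"],  # appraiser 1: 3 Yes
    ["yes", "no", "no", "yes", "yes"],        # appraiser 2: 3 Yes
    ["yes", "yes", "yes", "no", "no"],        # appraiser 3: 3 Yes
])
print(score)  # 60.0 (mean of 3/5 across appraisers, as a percentage)
```

Expressing each dimension as a percentage of its maximum is what makes the 20-, 12- and 5-item dimensions directly comparable.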
Mean dimension quality scores were calculated with 95% confidence intervals for guideline groups having similar descriptive characteristics.
We assessed appraiser agreement by calculating the percentage of guidelines for which the 3 appraisers scored the quality of each dimension within 20 percentage points of each other and by calculating the intraclass correlation coefficients for the 3 dimensions. We assessed the reliability of the instrument by examining the internal consistency (Cronbach's α) of each dimension, calculating the correlations among all items within a dimension to determine the extent to which they measured the same underlying concept.
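Cronbach's α for a dimension can be computed from an item-by-guideline matrix of scores; the following is a minimal sketch with made-up binary scores, not the study's code.

```python
def cronbach_alpha(items: list[list[float]]) -> float:
    """items: one list per item, each holding that item's score for
    every guideline. Returns alpha = k/(k-1) * (1 - sum(var_i)/var_total),
    where k is the number of items, var_i the sample variance of item i,
    and var_total the sample variance of the per-guideline total scores."""
    k = len(items)
    n = len(items[0])

    def variance(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Example: 3 items scored 0/1 across 4 guidelines (illustrative data)
alpha = cronbach_alpha([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
])
print(round(alpha, 3))  # 0.75
```

Higher α indicates that the items within a dimension vary together, i.e., that they tap the same underlying concept.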