The Canadian Medical Association maintains a national online database of clinical practice guidelines developed, endorsed or reviewed by Canadian organizations within 5 years of the current date. This study was designed to identify and describe guidelines in the database that make recommendations related to the use of drug therapy, and to assess their quality using a standardized guideline appraisal instrument.
Drug therapy guidelines in the database were identified with the use of search terms and hand searching. Descriptive information about the developers, endorsement by other organizations, publication status, disease and drug focus was abstracted. Each guideline was independently assessed by 3 appraisers (a physician, a pharmacist and a methodologist) with the use of the Appraisal Instrument for Clinical Guidelines. Conditions were classified according to the tenth revision of the International Statistical Classification of Diseases and Related Health Problems.
We identified 217 drug therapy guidelines produced or reviewed from 1994 to 1998. Guideline developers included national organizations (47.0%), paragovernment organizations (39.6%) and professional associations (30.9%); 31.3% of the guidelines were published, and 10.6% stated drug company sponsorship. The most common conditions addressed by the guidelines were infections and parasitic diseases (39.6%), neoplasms (11.5%) and diseases of the circulatory system (11.5%). Drugs most commonly cited were anti-infective agents (42.9%), antiviral agents (15.2%) and cardiovascular drugs (16.1%). Eleven organizations produced 176 (81.1%) of the guidelines. In all, 14.7% of the guidelines met half or more of the 20 items assessing rigour of guideline development on the appraisal instrument (mean quality score 30.0% [95% confidence interval (CI) 27.5%–32.6%]), 61.8% met half or more of the 12 items assessing guideline context and content (mean score 57.0% [95% CI 54.6%–59.3%]), and none met half or more of the 5 items assessing guideline application (mean score 5.6% [95% CI 4.7%–6.5%]). Overall, 64.6% of the guidelines were recommended with modification by at least 2 of the 3 appraisers, 9.2% were recommended without change, and 26.3% were not recommended. The quality of the guidelines assessed varied significantly by developer, publication status and drug company sponsorship. No substantial improvement in guideline quality was observed over the 5-year study period.
Developers of Canadian drug therapy guidelines are producing guidelines that are often perceived to be clinically useful to physicians and pharmacists, although the methods (or the description of the methods) by which they are developed need to be more rigorous and thorough.
Clinical practice guidelines are “systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances.”1 When acted upon, they have been shown to have the potential to improve both the process of care and patient health outcomes.2,3,4,5 However, these beneficial effects will not be realized unless well-developed and valid guidelines are implemented by clinicians and policy-makers.
With the exponential growth in guidelines development, clinicians are increasingly being confronted with differing and sometimes contradictory disease-specific guidelines.6,7,8 In one study from Britain,8 the recommendations from 20 practice guidelines on anticoagulation treatment in atrial fibrillation were applied to 100 consecutive patients. The proportion of patients requiring anticoagulant treatment varied from 13% to 100%, depending on the guideline followed. The authors of the study attributed the variation to the nonsystematic development of the guidelines. Others have also raised serious concerns about the quality of guidelines being developed.9,10,11,12,13,14,15 Some have suggested that the lack of ability to critically appraise the quality of clinical practice guidelines has been a barrier to their use.16 Others have proposed that, if physicians were given instruments to appraise guidelines systematically or were given guidelines that had been systematically appraised, the adoption of high-quality and useful guidelines would be increased.17,18,19
The objectives of this study were to identify and describe Canadian clinical practice guidelines that make recommendations related to the use of drug therapy and to assess systematically the quality of these guidelines with the use of a standardized guideline appraisal instrument.
The Canadian Medical Association (CMA) maintains a national online database of clinical practice guidelines known as CMA Infobase (www.cma.ca/cpgs/index.asp). The database contains over 2000 guidelines that can be searched electronically by key word and medical subject heading. For a guideline to be entered into the CMA Infobase, it must have been produced, reviewed or endorsed in Canada by a national, provincial or territorial medical or health organization, professional society, government agency or expert panel within 5 years of the current date. A database of all such organizations is maintained, and through regular contact with the organizations, all new guidelines and revisions are added to the database as they become available. The extent to which the guidelines in the CMA Infobase are representative of all guidelines developed or used in Canada is unknown.
We searched the CMA Infobase for all English-language or bilingual guidelines produced or reviewed from 1994 to 1998 and coded in the database as having a pharmacological (drug therapy) focus or identified as drug therapy guidelines through a manual search. We excluded guidelines that may have been produced during this period but that subsequently expired and were not updated; we also excluded immunization guidelines.
Descriptive information was collected about each guideline. This included language, year of development, developer, type of developer (e.g., professional organization, government agency), endorsement by a professional organization, publication status (peer-reviewed publication or not published), stated drug company sponsorship, disease topic (classified according to the tenth revision of the International Statistical Classification of Diseases and Related Health Problems20) and drug focus (classified according to the Comparative Drug Index Therapeutic Classification System as used by the American Hospital Formulary System).
The standardized instrument used to assess the guidelines was the Appraisal Instrument for Clinical Guidelines, a 37-item instrument developed by Cluzeau and associates.21 This instrument was selected after a systematic search of the literature22 revealed that it was the most well-developed guideline appraisal instrument available. Although limited, there are data showing that the instrument has acceptable reliability, and there is preliminary evidence of criterion validity.9 The appraisal instrument is currently being used by the Independent Appraisal Service of the National Health Service (NHS) in the United Kingdom to assess all guidelines funded by the NHS through the National Clinical Guidelines Group.24
Cluzeau and associates developed the appraisal instrument so that it could be applied by anyone (general practitioner, specialist, health care manager, policy-maker or researcher) interested in assessing guidelines without prior training in how to use the instrument. For each of the 37 items the appraiser is asked to indicate whether information is present (Yes, No or Not sure) and then to judge the quality of the information. To ensure that all questions are interpreted consistently, the instrument comes with a user manual. We merged the user manual and the instrument into one document for ease of use by the appraisers. Minor modifications were made to some of the definitions of what constituted a Yes response.
To allow comparison of guideline performance, the 37 items in the instrument are separated into 3 dimensions (Table 1). The first, rigour of guideline development (20 items), reflects attributes necessary to enhance guideline validity and reproducibility. The second, context and content of the guideline (12 items), addresses the attributes of reliability, applicability, flexibility and clarity. The third, application or implementation of the guideline (5 items), assesses the implementation, dissemination and monitoring strategies.
Although not formally part of the appraisal instrument, we also included a global assessment of guidelines that Cluzeau and associates9 have used previously. The appraiser was asked whether he or she would “strongly recommend this guideline for use in practice without modifications,” “recommend this guideline for use in practice on condition of some alterations or with provisos” or “not recommend this guideline” (not suitable for use in practice). We supplemented this global assessment by asking respondents to provide a global quality rating using an 11-point scale. The wording of this question was as follows: “Overall, how would you rate the quality of this guideline on a scale ranging from 0 to 10, with 0 indicating the lowest possible quality and 10 representing the highest possible quality?”
Each guideline was assessed independently by 3 appraisers (a physician, a pharmacist and a methodologist [an individual with graduate training in research methods]). In total, 56 appraisers (19 physicians, 29 pharmacists and 8 methodologists) were assigned to assess the guidelines. The physicians and pharmacists were assigned guidelines related to their area of expertise. Using the assessments, we calculated the frequency with which the guidelines adhered to each of the 37 appraisal items. Adherence was defined by agreement of at least 2 of the 3 appraisers.
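The adherence rule described above (an item counts as met when at least 2 of the 3 appraisers answered Yes) can be illustrated with a minimal sketch. The function names and the example responses are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def item_adherent(responses: Sequence[str], min_agree: int = 2) -> bool:
    """Return True when at least `min_agree` appraisers answered "Yes"."""
    return sum(r == "Yes" for r in responses) >= min_agree


def adherence_rate(guidelines: Sequence[Sequence[str]]) -> float:
    """Percentage of guidelines judged adherent on a single appraisal item."""
    adherent = sum(item_adherent(r) for r in guidelines)
    return 100.0 * adherent / len(guidelines)


# Hypothetical responses of 3 appraisers to one item, for four guidelines.
example = [
    ("Yes", "Yes", "No"),       # adherent (2 of 3)
    ("Yes", "No", "Not sure"),  # not adherent (1 of 3)
    ("Yes", "Yes", "Yes"),      # adherent (3 of 3)
    ("No", "No", "No"),         # not adherent
]
print(adherence_rate(example))  # 50.0
```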
The 3 summary scores of guideline quality (rigour of development [dimension 1], context and content [dimension 2] and application [dimension 3]) assigned by each appraiser were calculated by summing the values for the items constituting each dimension. A Yes response was assigned a value of 1, and all other responses were given a value of 0. A dimension quality score was then obtained by calculating the mean of the appraisers' scores. To allow comparison across the 3 dimensions, this mean was expressed as a percentage of the maximum possible score for each dimension, as done by Cluzeau and associates.9 Mean dimension quality scores were calculated with 95% confidence intervals for guideline groups having similar descriptive characteristics.
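As a rough illustration of this scoring scheme, the sketch below computes a dimension quality score for one guideline and a mean with a 95% confidence interval for a group of guidelines. The text does not state how the confidence intervals were derived, so the normal approximation (z = 1.96) used here is an assumption, as are the function names and the example data.

```python
from math import sqrt
from statistics import mean, stdev


def appraiser_score(responses):
    """Sum of item values for one appraiser: Yes = 1, anything else = 0."""
    return sum(1 for r in responses if r == "Yes")


def dimension_quality_score(appraisals):
    """Mean of the appraisers' scores as a percentage of the maximum score.

    `appraisals` holds one list of responses per appraiser, all covering
    the same items of a single dimension.
    """
    n_items = len(appraisals[0])
    mean_raw = mean(appraiser_score(a) for a in appraisals)
    return 100.0 * mean_raw / n_items


def mean_with_ci(scores, z=1.96):
    """Group mean with an approximate 95% CI (normal approximation, assumed)."""
    m = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return m, m - half_width, m + half_width


# Hypothetical 5-item dimension rated by 3 appraisers (raw scores 2, 2 and 1).
appraisals = [
    ["Yes", "No", "No", "Yes", "Not sure"],
    ["Yes", "Yes", "No", "No", "No"],
    ["No", "No", "No", "Yes", "No"],
]
score = dimension_quality_score(appraisals)
print(round(score, 1))  # 33.3
```

With real data, `mean_with_ci` would be applied to the dimension quality scores of all guidelines sharing a characteristic, for example all published guidelines.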
We assessed appraiser agreement by calculating the percentage of guidelines for which the 3 appraisers scored the quality of each dimension within 20 percentage points of each other and by calculating the intraclass correlation coefficients for the 3 dimensions. We assessed reliability of the instrument by examining the internal consistency (Cronbach's α) of each dimension. This was done by calculating the correlation between all items within a dimension to test to what extent they measured the same underlying concept.
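Two of these checks can be sketched in a few lines. The agreement check assumes that "within 20 percentage points of each other" means the spread between the highest and lowest of the 3 scores is at most 20, and the internal consistency function uses the standard variance form of Cronbach's α. Both are illustrative assumptions rather than the authors' code, the data are hypothetical, and the intraclass correlation coefficient is not shown.

```python
from statistics import variance


def within_tolerance(scores, tolerance=20.0):
    """True when the highest and lowest appraiser scores differ by <= tolerance."""
    return max(scores) - min(scores) <= tolerance


def cronbach_alpha(rows):
    """Cronbach's alpha for a set of appraisals.

    `rows` holds one list of item values (0 or 1) per appraisal; every row
    covers the same k items of a single dimension.
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical dimension scores (%) from the 3 appraisers of one guideline.
print(within_tolerance((30.0, 45.0, 50.0)))  # True

# Hypothetical item-level data for a 3-item dimension; the items move
# together perfectly, so alpha is at its maximum.
rows = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
alpha = cronbach_alpha(rows)
```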
We identified 217 clinical practice guidelines in the CMA Infobase that met our inclusion criteria. Nearly two-thirds of the guidelines were developed between 1996 and 1998. Guideline developers included national organizations (47.0%), paragovernment organizations (39.6%) and health professional associations (30.9%). In all, 31.3% of the guidelines were published, and 10.6% reported receiving drug company sponsorship. The most common health conditions addressed by the guidelines were infections and parasitic diseases (39.6% of the guidelines), neoplasms (11.5%) and diseases of the circulatory system (11.5%). The most common drugs dealt with in the guidelines were anti-infective agents (42.9%), antiviral agents (15.2%), cardiovascular drugs (16.1%), gastrointestinal drugs (13.4%), and corticosteroids and antineoplastics drugs (12.0% each). Eleven organizations produced 176 (81.1%) of the guidelines.
Table 1 presents the proportion of guidelines that met each of the 37 quality criteria in the appraisal instrument. Fig. 1 shows the distribution of the guidelines by dimension quality score. In all, 14.7% of the guidelines met half or more of the 20 items assessing rigour of guideline development (mean quality score 30.0% [95% confidence interval (CI) 27.5%–32.6%]), 61.8% met half or more of the 12 items assessing guideline context and content (mean score 57.0% [95% CI 54.6%–59.3%]), and none met half or more of the 5 items assessing guideline application (mean score 5.6% [95% CI 4.7%–6.5%]). A selected list of guidelines produced in 1998 is presented in Table 2 along with the appraisers' mean dimension quality scores (a complete list of all the guidelines appraised and their dimension quality scores can be found at www.cma.ca/cmaj/vol-165/issue-2/grahamtable2s.pdf).
The mean global quality rating (range of scores 0–10) for the 217 guidelines was 4.8 (95% CI 4.6–5.1; median 4.7, standard deviation 1.9); this finding suggested that, overall, the appraisers perceived the guidelines to be of medium quality. Similarly, the mean global assessment rating (strongly recommend as is, recommend with modification or not recommend) revealed that nearly three-quarters of the guidelines were recommended by at least 2 of the 3 appraisers (9.2% without change, 64.6% with modifications); 26.3% of the guidelines were not recommended for use in practice.
Rigour of guideline development (dimension 1) and context and content (dimension 2) varied significantly among the guidelines. Factors significantly related to higher scores in dimension 1 were publication status (mean score 41.6% [95% CI 35.9%–47.3%] for published guidelines v. 24.7% [95% CI 22.6%–26.9%] for unpublished guidelines) and type of developer (mean score 46.5% [95% CI 37.7%–55.3%] for guidelines produced by an organization other than government, paragovernment or professional association v. 27.1% [95% CI 24.7%–29.4%] for those produced by any of these 3 types of developers). The rigour of guideline development also differed by specific developer; for example, guidelines produced by Health Canada's Steering Committee for Clinical Practice Guidelines for the Care and Treatment of Breast Cancer and those from Cancer Care Ontario received significantly higher quality scores than guidelines from other bodies. In dimension 2, higher scores were significantly related to the same factors as those associated with higher scores in dimension 1 as well as to the following factors: being produced by a national organization (mean score 63.3% [95% CI 60.3%–66.3%] v. 51.3% [95% CI 48.1%–54.6%] for guidelines not produced by a national organization), being produced by a government agency (mean score 64.3% [95% CI 59.6%–68.9%] v. 55.4% [95% CI 52.8%–58.0%] for guidelines not produced by a government agency), not being produced by a paragovernment agency (mean score 59.7% [95% CI 56.7%–62.8%] v. 52.7% [95% CI 49.1%–56.3%] for guidelines produced by a paragovernment agency), endorsement by a health professional organization (mean score 66.1% [95% CI 62.6%–69.7%] v. 55.7% [95% CI 53.1%–58.3%] for guidelines not receiving an endorsement) and drug company sponsorship (mean score 64.9% [95% CI 58.7%–71.0%] v. 56.1% [95% CI 53.5%–58.6%] for guidelines not receiving drug company sponsorship).
Quality scores were not found to be related to year of publication or release, production by a health care professional organization or language (English v. bilingual). (A complete list of dimension quality scores by guideline characteristic is available online [www.cma.ca/cmaj/vol-165/issue-2/grahamtable3s.pdf].)
Examination of agreement between the appraisers revealed that the appraisers' scores were within 20 percentage points of each other for 85%, 34% and 87% of the guidelines for dimensions 1, 2 and 3, respectively. The intraclass correlation coefficients for the 3 dimension scores were 0.80, 0.42 and 0.02, respectively.
Assessment of the reliability of the guideline appraisal instrument showed that 2 of the 3 dimensions (1 and 2) were internally consistent. The Cronbach's α was 0.85 for rigour of guideline development and 0.74 for context and content. Too few guidelines met any of the criteria in dimension 3 (application) for useful data on reliability to be generated.
The Pearson's correlation coefficients between appraisers' dimension scores and their global assessment (recommend or not) were 0.37 (n = 647) for dimension 1, 0.54 (n = 647) for dimension 2 and 0.18 (n = 647) for dimension 3. All coefficients were highly significant (p < 0.001), which indicated criterion validity. The Pearson's correlation coefficients between appraisers' dimension scores and their global quality rating score (0–10) were 0.45 (n = 646), 0.63 (n = 646) and 0.24 (n = 646), respectively; again, all correlations were highly significant (p < 0.001).
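For readers unfamiliar with the statistic, a Pearson's correlation coefficient between dimension scores and global ratings can be computed as in the sketch below. The paired values are hypothetical, and the text does not describe how the 3-level global assessment was coded numerically, so any such coding would be an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical pairs: one appraiser's dimension score (%) and the same
# appraiser's global quality rating (0-10) for five guidelines.
dimension_scores = [20.0, 35.0, 50.0, 65.0, 80.0]
global_ratings = [3, 4, 4, 6, 8]
r = pearson_r(dimension_scores, global_ratings)
```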
The rigour of development was low for the 217 Canadian clinical practice guidelines produced between 1994 and 1998 that made recommendations related to the use of prescription medications. Only 14.7% of the guidelines met half or more of the criteria for rigour of development. The quality of the guidelines in terms of their context and content was considerably higher, with 61.8% of the guidelines meeting half or more of the criteria in this dimension. The quality of the guidelines varied by developer, publication status and drug company sponsorship. There was little evidence that the quality had improved substantially over the 5-year study period or that it was higher for guidelines produced by a health professional association than for those produced by another organization. Overall, 64.6% of the guidelines were recommended for use with modification by at least 2 of the 3 appraisers, and 9.2% were recommended for use in their current form; the appraisers did not recommend 26.3% of the guidelines. On the whole, it would seem that developers of Canadian drug therapy guidelines are producing guidelines that are often perceived to be clinically useful to physicians and pharmacists, although the methods (or the description of the methods) by which they are developed must become more rigorous and thorough.
An important caveat about the results is that the guideline appraisal instrument actually assesses the quality of the reporting of the guideline development process rather than the actual quality of the process. Although we attempted to locate and provide all documentary material describing the development process, even this information may not have adequately explained the actual process. Until there is widespread appreciation of the elements comprising guideline quality, there is little incentive for developers to report their methods in detail, especially when journal editors often prefer short reports. Although guideline developers who failed to adequately describe their development process might challenge our findings, we are comfortable assessing the quality of guidelines on the basis of only the written material available. Practitioners and policy-makers wishing to make a decision about using or endorsing a particular guideline would, in all likelihood, not have any additional information other than what we provided to our appraisers.
Our findings are in keeping with those reported by Cluzeau and associates9,23 for 60 guidelines in the United Kingdom and those reported by Grilli and colleagues15 for 431 international guidelines developed by specialty societies. In all 3 studies, only a minority of developers used (or described the use of) rigorous development methods. In fact, Grilli and colleagues suggested that 3 items are particularly related to the scientific quality of guidelines: the description of the types of professionals and stakeholders involved in the guideline's development; the description of the sources of information used to retrieve the relevant evidence; and explicit grading of the strength of evidence supporting the recommendations. The proportions of the guidelines we reviewed that met these criteria were 86.6%, 17.6% and 17.9%, respectively. Nevertheless, compared with the guidelines evaluated by Cluzeau and associates and Grilli and colleagues, the Canadian guidelines we appraised had higher scores for items related to guideline context and content (i.e., clinical usefulness).
The results of our study lead us to conclude that the quality of all clinical practice guidelines in Canada should be assessed in a systematic fashion by an independent body using a standardized appraisal instrument, similar to what is done by the Independent Appraisal Service in the United Kingdom. The resulting quality assessments should be made available to practitioners, policy-makers and the public to facilitate informed decision-making about the quality and usefulness of particular guidelines. The appraisal instrument and the criteria used to assess guideline quality should be decided upon through an evidence-based consensus process involving guideline sponsors, developers and users and be disseminated to guideline developers so that they are aware of how their guidelines will be assessed. Journal editors should be encouraged to use a guideline appraisal instrument as a publication criteria checklist when reviewing clinical practice guideline reports for possible publication. Finally, appraisals of guideline quality should be incorporated into the CMA Infobase so that anyone retrieving a guideline from this database will have access to this information.
This article has been peer reviewed.
Acknowledgements: This project was supported by a financial contribution from the Health Transition Fund, Health Canada. The views expressed herein do not necessarily represent the official policy of federal, provincial or territorial governments.
Dr. Graham is a Medical Research Council of Canada Scholar and was an Ontario Ministry of Health Career Scientist when the study was initiated. Dr. Hébert is an Ontario Ministry of Health Career Scientist. Dr. McAlister is a Population Health Investigator for the Alberta Heritage Foundation for Medical Research.
Competing interests: None declared.
Correspondence to: Dr. Ian D. Graham, Clinical Epidemiology Unit, Rm. 410, C-4 North, Ottawa Hospital — Civic Campus, 1053 Carling Ave., Ottawa ON K1Y 4E9; fax 613 761 5492; igraham/at/ohri.ca