Although several tools to evaluate the credibility of health care guidelines exist, guidance on practical steps for developing guidelines is lacking. We systematically compiled a comprehensive checklist of items linked to relevant resources and tools that guideline developers could consider, without the expectation that every guideline would address each item.
We searched data sources, including manuals of international guideline developers, literature on guidelines for guidelines (with a focus on methodology reports from international and national agencies, and professional societies) and recent articles providing systematic guidance. We reviewed these sources in duplicate, extracted items for the checklist using a sensitive approach and developed overarching topics relevant to guidelines. In an iterative process, we reviewed items for duplication and omissions and involved experts in guideline development for revisions and suggestions for items to be added.
We developed a checklist with 18 topics and 146 items and a webpage to facilitate its use by guideline developers. The topics and included items cover all stages of the guideline enterprise, from the planning and formulation of guidelines, to their implementation and evaluation. The final checklist includes links to training materials as well as resources with suggested methodology for applying the items.
The checklist will serve as a resource for guideline developers. Consideration of items on the checklist will support the development, implementation and evaluation of guidelines. We will use crowdsourcing to revise the checklist and keep it up to date.
Systematic reviews and meta-analyses of randomized trials that include patient-reported outcomes (PROs) often provide crucial information for patients, clinicians and policy-makers facing challenging health care decisions. Based on emerging methods, guidance on improving the interpretability of meta-analyses of patient-reported outcomes, typically continuous in nature, is likely to enhance decision-making. The objective of this paper is to summarize approaches to enhancing the interpretability of pooled estimates of PROs in meta-analyses. When differences in PROs between groups are statistically significant, decision-makers must be able to interpret the magnitude of effect. This is challenging when, as is often the case, clinical trial investigators use different measurement instruments for the same construct within and between individual randomized trials. For such cases, in addition to pooling results as a standardized mean difference, we recommend that systematic review authors use other methods to present results such as relative (relative risk, odds ratio) or absolute (risk difference) dichotomized treatment effects, complemented by presentation in either: natural units (e.g. overall depression reduced by 2.4 points when measured on a 50-point Hamilton Rating Scale for Depression); minimal important difference units (e.g. where 1.0 unit represents the smallest difference in depression that patients, on average, perceive as important, the depression score was 0.38 (95% CI 0.30 to 0.47) units lower than in the control group); or a ratio of means (e.g. where the mean in the treatment group is divided by the mean in the control group, the ratio of means is 1.27, representing a 27% relative reduction in the mean depression score).
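As a minimal sketch of the presentation formats described above, the following Python snippet computes a standardized mean difference, a difference in minimal important difference (MID) units, and a ratio of means from arm-level summaries. The `ArmSummary` type, the function names, the MID value and all numbers are hypothetical illustrations, not data or methods taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ArmSummary:
    """Summary statistics for one trial arm (hypothetical structure)."""
    mean: float
    sd: float
    n: int


def pooled_sd(a: ArmSummary, b: ArmSummary) -> float:
    """Pooled standard deviation of two arms."""
    num = (a.n - 1) * a.sd ** 2 + (b.n - 1) * b.sd ** 2
    return (num / (a.n + b.n - 2)) ** 0.5


def standardized_mean_difference(treat: ArmSummary, control: ArmSummary) -> float:
    """Difference in means expressed in pooled-SD units."""
    return (treat.mean - control.mean) / pooled_sd(treat, control)


def mid_units(treat: ArmSummary, control: ArmSummary, mid: float) -> float:
    """Difference in means expressed in multiples of the instrument's MID."""
    return (treat.mean - control.mean) / mid


def ratio_of_means(treat: ArmSummary, control: ArmSummary) -> float:
    """Mean in the treatment arm divided by mean in the control arm."""
    return treat.mean / control.mean


# Hypothetical depression scores (lower is better); not from any trial.
treat = ArmSummary(mean=12.0, sd=6.0, n=50)
control = ArmSummary(mean=15.0, sd=6.0, n=50)
assumed_mid = 3.0  # assumed MID for this hypothetical instrument

print("Natural units:", treat.mean - control.mean)
print("SMD:", standardized_mean_difference(treat, control))
print("MID units:", mid_units(treat, control, assumed_mid))
print("Ratio of means:", ratio_of_means(treat, control))
```

In a real review each instrument would need its own empirically established MID, and the per-trial estimates would then be pooled with appropriate weights; this sketch covers only the per-trial conversion.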
To inform clinical guidelines and patient care we need high quality evidence on the relative benefits and harms of intervention. Patient reported outcome (PRO) data from clinical trials can “empower patients to make decisions based on their values” and “level the playing field between physician and patient”. While clinicians have a good understanding of the concept of health-related quality of life and other PROs, evidence suggests that many do not feel comfortable in using the data from trials to inform discussions with patients and clinical practice. This may in part reflect concerns over the integrity of the data and difficulties in interpreting the results arising from poor reporting.
The new CONSORT PRO extension aims to improve the reporting of PROs in trials to facilitate the use of results to inform clinical practice and health policy. While the CONSORT PRO extension is an important first step in the process, we need broader engagement with the guidance to facilitate optimal reporting and maximize use of PRO data in a clinical setting. Endorsement by journal editors, authors and peer reviewers is a crucial step. Improved design, implementation and transparent reporting of PROs in clinical trials are necessary to provide high quality evidence to inform evidence synthesis and clinical practice guidelines.
Quality of life; CONSORT PRO; Reporting; Clinical trials
Cochrane Reviews are intended to help providers, practitioners and patients make informed decisions about health care. The goal of the Cochrane Applicability and Recommendation Methods Group (ARMG) is to develop approaches, strategies and guidance that facilitate the uptake of information from Cochrane Reviews and their use by a wide audience with specific focus on developers of recommendations and on healthcare decision makers. This paper is part of a series highlighting developments in systematic review methodology in the 20 years since the establishment of The Cochrane Collaboration, and its aim is to present current work and highlight future developments in assessing and presenting summaries of evidence, with special focus on Summary of Findings (SoF) tables and Plain Language Summaries.
A SoF table provides a concise and transparent summary of the key findings of a review in a tabular format. Several studies have shown that SoF tables improve accessibility and understanding of Cochrane Reviews.
The ARMG and GRADE Working Group are working on further development of the SoF tables, for example by evaluating the degree of acceptable flexibility beyond standard presentation of SoF tables, developing SoF tables for diagnostic test accuracy reviews and interactive SoF tables (iSoF).
The plain language summary (PLS) is the other main building block for dissemination of review results to end-users. The PLS aims to summarize the results of a review in such a way that health care consumers can readily understand them. Current efforts include the development of a standardized language to describe statistical results, based on effect size and quality of supporting evidence.
Producing high quality PLS and SoF tables and making them compatible and linked would make it easier to produce dissemination products targeting different audiences (for example, providers, health policy makers, guideline developers).
Current issues of debate include optimal presentation formats of SoF tables, the training required to produce SoF tables, and the extent to which the authors of Cochrane Reviews should provide explicit guidance to target audiences of patients, clinicians and policy-makers.
Health care professionals worldwide attend courses and workshops to learn evidence-based medicine (EBM), but evidence regarding the impact of these educational interventions is conflicting and of low methodologic quality and lacks generalizability. Furthermore, little is known about determinants of success. We sought to measure the effect of EBM short courses and workshops on knowledge and to identify course and learner characteristics associated with knowledge acquisition.
Health care professionals with varying expertise in EBM participated in an international, multicentre before–after study. The intervention consisted of short courses and workshops on EBM offered in diverse settings, formats and intensities. The primary outcome measure was the score on the Berlin Questionnaire, a validated instrument measuring EBM knowledge that the participants completed before and after the course.
A total of 15 centres participated in the study and 420 learners from North America and Europe completed the study. The baseline score across courses was 7.49 points (range 3.97–10.42 points) out of a possible 15 points. The average increase in score was 1.40 points (95% confidence interval 0.48–2.31 points), which corresponded with an effect size of 0.44 standard deviation units. Greater improvement in scores was associated (in order of greatest to least magnitude) with active participation required of the learners, a separate statistics session, fewer topics, less teaching time, fewer learners per tutor, larger overall course size and smaller group size. Clinicians and learners involved in medical publishing improved their score more than other types of learners; administrators and public health professionals improved their score less. Learners who perceived themselves to have an advanced knowledge of EBM and had prior experience as an EBM tutor also showed greater improvement than those who did not.
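The relation between the reported mean increase and the effect size can be checked with simple arithmetic. The sketch below assumes the effect size was computed as the mean change divided by a standard deviation, and that the confidence interval is a symmetric normal-approximation interval. Neither assumption is stated in the text, and the function names are illustrative only.

```python
def implied_sd(mean_change: float, effect_size: float) -> float:
    """SD implied by a mean change and an effect size in SD units."""
    return mean_change / effect_size


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


# Values reported in the abstract.
mean_change = 1.40           # points on the Berlin Questionnaire
effect_size = 0.44           # standard deviation units
ci_low, ci_high = 0.48, 2.31

sd = implied_sd(mean_change, effect_size)
se = se_from_ci(ci_low, ci_high)

print(f"Implied SD: {sd:.2f} points")
print(f"Implied SE of the mean change: {se:.2f} points")
print(f"Change as share of the 15-point scale: {mean_change / 15:.1%}")
```

Under these assumptions the implied standard deviation is a little over 3 points on the 15-point scale.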
EBM course organizers who wish to optimize knowledge gain should require learners to actively participate in the course and should consider focusing on a small number of topics, giving particular attention to statistical concepts.
Systematic reviews and meta-analyses of randomized trials that include patient-reported outcomes (PROs) often provide crucial information for patients and clinicians facing challenging health care decisions. Based on emerging methods, guidance on combining PROs in meta-analysis is likely to enhance their usefulness.
The objectives of this paper are: i) to describe PROs and why they are important for health care decision-making, ii) to illustrate the key risk of bias issues that systematic reviewers should consider, and iii) to address outcome characteristics of PROs and provide guidance for combining outcomes.
We suggest a step-by-step approach to addressing issues of PROs in meta-analyses. Systematic reviewers should begin by asking themselves if trials have addressed all the important effects of treatment on patients’ quality of life. If the trials have addressed PROs, have investigators chosen the appropriate instruments? In particular, does evidence suggest the PROs used are valid and responsive, and is the review free of outcome reporting bias? Systematic reviewers must then decide how to categorize PROs and when to pool results.
Patient-reported outcomes; Health-related quality of life; Meta-analysis; Systematic review; Health care decision-making
Clinical practice guidelines (CPGs) recommend universal prenatal screening for Group B Streptococcus (GBS) to identify candidates for intrapartum antibiotic prophylaxis to prevent early onset neonatal GBS infection. Interventions to promote physician adherence to these guidelines are imperative. This study examined the effectiveness of academic detailing (AD) of obstetricians, compared with CPG mailshot and no intervention, on the screening of pregnant women for GBS.
A randomized controlled clinical trial was conducted in the medical cooperative of Porto Alegre, Brazil. All obstetricians who assisted in a delivery covered by private health insurance managed by the cooperative in the 3 months preceding the study (n = 241) were invited to participate. The obstetricians were randomized to three groups: direct mail (DM, n = 76), AD (n = 76) and control (C, n = 89, no intervention). Those in the DM group were sent guidelines on GBS. The AD group received the guidelines and an educational visit detailing the guidelines, which was conducted by a trained physician. Data on obstetrician age, gender, time since graduation, whether patients received GBS screening during pregnancy, and obstetricians who requested screening were collected for all participating obstetricians for 3 months before and after the intervention, using the database of the private health insurance information system.
Three months post-intervention, the data showed that the proportion of pregnant women screened for GBS was higher in the AD group (25.4%) than in the DM (15.9%) and C (17.7%) groups (P = 0.023). Similar results emerged when the three groups were taken as a cluster (pregnant women and their obstetricians), but the difference was not statistically significant (Poisson regression, P = 0.108). Additionally, when vaginal deliveries were analyzed separately, the proportion screened was higher in the AD group (75%) than in the DM group (41.9%) and the C group (30.4%) (chi-square, P < 0.001).
The results suggest that AD increased the prevalence of GBS screening in pregnant women in this population.
Guidelines; Physicians; Pregnancy; Screening; Streptococci
Randomized controlled trials (RCTs) that are inappropriately designed or executed may provide biased findings and mislead clinical practice. In view of recent interest in the treatment and prevention of thrombotic complications in cancer patients, we evaluated the characteristics, risk of bias and their time trends in RCTs of anticoagulation in patients with cancer.
We conducted a comprehensive search, including a search of four electronic databases (MEDLINE, EMBASE, ISI the Web of Science, and CENTRAL) up to February 2010. We included RCTs in which the intervention and/or comparison consisted of: vitamin K antagonists, unfractionated heparin (UFH), low molecular weight heparin (LMWH), direct thrombin inhibitors or fondaparinux. We performed descriptive analyses and assessed the association between the variables of interest and the year of publication.
We included 67 RCTs with 24,071 participants. In 21 trials (31%) DVT diagnosis was triggered by clinical suspicion; the remaining trials either screened for DVT or were unclear about their approach. Major bleeding, minor bleeding, and thrombocytopenia were reported in 41 (61%), 22 (33%), and 11 (16%) trials, respectively. The percentages of trials satisfying risk of bias criteria were: adequate sequence generation (85%), adequate allocation concealment (61%), participants’ blinding (39%), data collectors’ blinding (44%), providers’ blinding (41%), outcome assessors’ blinding (75%), data analysts’ blinding (15%), intention to treat analysis (57%), no selective outcome reporting (12%), no stopping early for benefit (97%). The mean follow-up rate was 96%. Adequate allocation concealment and the reporting of intention to treat analysis were the only two quality criteria that improved over time.
Many RCTs of anticoagulation in patients with cancer appear to use insufficiently rigorous outcome assessment methods and to have deficiencies in key methodological features. It is not clear whether this reflects a problem in the design, the conduct or the reporting of these trials, or a combination of these. Future trials should avoid the shortcomings described in this article.
China is experiencing increased health care use and expenditures, without sufficient controls to ensure quality and value. Transparent, cost-conscious and patient-centered guidelines based on the best available evidence could help establish these quality and practice measures.
We examined how guidelines could support the Chinese health reform. Specifically, we summarized the current state of the art and related challenges in guideline development and explored possible solutions in the context of the Chinese health reform.
China currently lacks capacity for evidence-based guideline development and coordination by a central agency. Most Chinese guideline users rely on recommendations developed by professional groups that lack demonstration of transparency (including conflict of interest management and evidence synthesis) and quality. These deficiencies appear larger than in other regions of the world. In addition, misperceptions about the role of guidelines in assisting practitioners as opposed to providing rules requiring adherence, and a perception that traditional Chinese medicine (TCM) cannot be appropriately incorporated in guidelines are present.
China’s capacity could be strengthened by a central guideline agency to provide or coordinate evidence synthesis for guideline development and to oversee the work of guideline developers. China can build on what is known and work with the international community to develop methods to meet the challenges of evidence-based guideline development.
Venous thromboembolism (VTE) is a common preventable cause of mortality in hospitalized medical patients. Despite rigorous randomized trials generating strong recommendations for anticoagulant use to prevent VTE, nearly 40% of medical patients receive inappropriate thromboprophylaxis. Knowledge-translation strategies are needed to bridge this gap.
We conducted a 16-week pilot cluster randomized controlled trial (RCT) to determine the proportion of medical patients who were appropriately managed for thromboprophylaxis (according to the American College of Chest Physicians guidelines) within 24 hours of admission, through the use of a multicomponent knowledge-translation intervention. Our primary goal was to determine the feasibility of conducting this study on a larger scale. The intervention comprised clinician education, a paper-based VTE risk assessment algorithm, printed physicians’ orders, and audit and feedback sessions. Medical wards at six hospitals (representing clusters) in Ontario, Canada were included; three were randomized to the multicomponent intervention and three to usual care (i.e., no active strategies for thromboprophylaxis in place). Blinding was not used.
A total of 2,611 patients (1,154 in the intervention and 1,457 in the control group) were eligible and included in the analysis. This multicomponent intervention did not lead to a significant difference in appropriate VTE prophylaxis rates between intervention and control hospitals (appropriate management rate odds ratio = 0.80; 95% confidence interval: 0.50, 1.28; p = 0.36; intra-class correlation coefficient: 0.022), and thus was not considered feasible. Major barriers to effective knowledge translation were poor attendance by clinical staff at education and feedback sessions, difficulty locating preprinted orders, and lack of involvement by clinical and administrative leaders. We identified several factors that may increase uptake of a VTE prophylaxis strategy, including local champions, support from clinical and administrative leaders, mandatory use, and a simple, clinically relevant risk assessment tool.
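To illustrate what the reported intra-class correlation coefficient means for precision, the sketch below applies the standard design-effect formula for cluster randomized trials, DEFF = 1 + (m - 1) × ICC, where m is the mean cluster size. It assumes equal cluster sizes across the six hospitals, which the abstract does not state (the arms differ in size), so the result is only a rough approximation and not the authors' analysis. Function names are illustrative.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation due to clustering, assuming equal cluster sizes."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_total: int, n_clusters: int, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    mean_size = n_total / n_clusters
    return n_total / design_effect(mean_size, icc)


# Values reported in the abstract.
n_total = 2611     # patients analysed
n_clusters = 6     # hospitals
icc = 0.022        # intra-class correlation coefficient

deff = design_effect(n_total / n_clusters, icc)
ess = effective_sample_size(n_total, n_clusters, icc)

print(f"Approximate design effect: {deff:.1f}")
print(f"Approximate effective sample size: {ess:.0f}")
```

Even a small ICC inflates the variance substantially when clusters are large, which is consistent with the wide confidence interval around the odds ratio.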
Hospitals allocated to our multicomponent intervention did not have a higher rate of medical inpatients appropriately managed for thromboprophylaxis than did hospitals that were not allocated to this strategy.
Thromboprophylaxis; Medical patients; Anticoagulants; Venous thromboembolism; Cluster randomization; Standard orders
Many academic medical centres have introduced strategies to assess the productivity of faculty as part of compensation schemes. We conducted a systematic review of the effects of such strategies on faculty productivity.
We searched the MEDLINE, Healthstar, Embase and PsycInfo databases from their date of inception up to October 2011. We included studies that assessed academic productivity in clinical, research, teaching and administrative activities, as well as compensation, promotion processes and satisfaction.
Of 531 full-text articles assessed for eligibility, we included 9 articles reporting on eight studies. The introduction of strategies for assessing academic productivity as part of compensation schemes resulted in increases in clinical productivity (in six of six studies) in terms of clinical revenue, the work component of relative-value units (these units are nonmonetary standard units of measure used to indicate the value of services provided), patient satisfaction and other departmentally used standards. Increases in research productivity were noted (in five of six studies) in terms of funding and publications. There was no change in teaching productivity (in two of five studies) in terms of educational output. Such strategies also resulted in increases in compensation at both individual and group levels (in three studies), with two studies reporting a change in distribution of compensation in favour of junior faculty. None of the studies assessed effects on administrative productivity or promotion processes. The overall quality of evidence was low.
Strategies introduced to assess productivity as part of a compensation scheme appeared to improve productivity in research activities and possibly improved clinical productivity, but they had no effect in the area of teaching. Compensation increased at both group and individual levels, particularly among junior faculty. Higher quality evidence about the benefits and harms of such assessment strategies is needed.
Clinical practice guidelines are one of the foundations of efforts to improve health care. In 1999, we authored a paper about methods to develop guidelines. Since it was published, guideline development has progressed in terms of both methods and necessary procedures, and the context for guideline development has changed with the emergence of guideline clearing houses and large scale guideline production organisations (such as the UK National Institute for Health and Clinical Excellence). It therefore seems timely to update and extend our earlier paper in a series of three articles. In this second paper, we discuss issues of identifying and synthesizing evidence: deciding what type of evidence and outcomes to include in guidelines; integrating values into a guideline; incorporating economic considerations; synthesis, grading, and presentation of evidence; and moving from evidence to recommendations.
Clinical practice guidelines are one of the foundations of efforts to improve health care. In 1999, we authored a paper about methods to develop guidelines. Since it was published, guideline development has progressed in terms of both methods and necessary procedures, and the context for guideline development has changed with the emergence of guideline clearing houses and large scale guideline production organisations (such as the UK National Institute for Health and Clinical Excellence). It therefore seems timely to update and extend our earlier paper in a series of three articles. In this third paper we discuss the issues of: reviewing, reporting, and publishing guidelines; updating guidelines; and the two emerging issues of enhancing guideline implementability and how guideline developers should address co-morbid conditions among the patients who are the subject of guidelines.
Clinical practice guidelines are one of the foundations of efforts to improve health care. In 1999, we authored a paper about methods to develop guidelines. Since it was published, guideline development has progressed in terms of both methods and necessary procedures, and the context for guideline development has changed with the emergence of guideline clearing houses and large scale guideline production organisations (such as the UK National Institute for Health and Clinical Excellence). It therefore seems timely to update and extend our earlier paper in a series of three articles. In this first paper we discuss: the target audience(s) for guidelines and their use of guidelines; identifying topics for guidelines; guideline group composition (including consumer involvement) and the processes by which guideline groups function; and the important procedural issue of managing conflicts of interest in guideline development.
Accurate diagnosis is a fundamental aspect of appropriate healthcare. However, clinicians need guidance when implementing diagnostic tests given the number of tests available and resource constraints in healthcare. Health practitioners often feel compelled to implement recommendations in guidelines, including recommendations about the use of diagnostic tests. However, guideline panels' understanding of diagnostic tests and the methodology for developing recommendations about them are far from completely explored. Therefore, we evaluated the factors that guideline developers and users need to consider for the development of implementable recommendations about diagnostic tests.
Using a critical analysis of the process, we present the results of a case study using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to develop a clinical practice guideline for the diagnosis of Cow Milk Allergy with the World Allergy Organization.
To ensure that guideline panels can develop informed recommendations about diagnostic tests, it appears that more emphasis needs to be placed on group processes, including question formulation, defining patient-important outcomes for diagnostic tests, and summarizing evidence. Explicit consideration of concepts of diagnosis from evidence-based medicine, such as pre-test probability and treatment threshold, is required to facilitate the work of a guideline panel and to formulate implementable recommendations.
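The concepts of pre-test probability and treatment threshold mentioned above can be made concrete with a small sketch. It converts a pre-test probability into a post-test probability using likelihood ratios (Bayes' theorem in odds form) and compares the result with a treatment threshold. The sensitivity, specificity, pre-test probability and threshold below are hypothetical values chosen for illustration; none of them come from the case study, and the function name is an assumption.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool = True) -> float:
    """Post-test probability of disease after a positive or negative result."""
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        likelihood_ratio = (1 - sensitivity) / specificity
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical inputs, for illustration only.
pre_test = 0.20             # assumed prevalence in the tested population
sensitivity = 0.80          # assumed test sensitivity
specificity = 0.90          # assumed test specificity
treatment_threshold = 0.50  # assumed probability above which one would treat

p_pos = post_test_probability(pre_test, sensitivity, specificity, positive=True)
p_neg = post_test_probability(pre_test, sensitivity, specificity, positive=False)

print(f"After a positive test: {p_pos:.2f}, treat: {p_pos >= treatment_threshold}")
print(f"After a negative test: {p_neg:.2f}, treat: {p_neg >= treatment_threshold}")
```

A test is most useful to a panel when its results can move the probability of disease across the treatment threshold, which depends on the pre-test probability in the population of interest.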
This case study provides useful guidance for guideline developers and clinicians about what they ought to demand from clinical practice guidelines to facilitate implementation and strengthen confidence in recommendations about diagnostic tests. Applying a structured framework like the GRADE approach with its requirement for transparency in the description of the evidence and factors that influence recommendations facilitates laying out the process and decision factors that are required for the development, interpretation, and implementation of recommendations about diagnostic tests.
Guideline panellists have differing opinions on whether resource use should influence decisions on individual patients. As medical care costs rise, resource use considerations become more compelling, but panellists may find dealing with such considerations challenging.
The GRADE system can be used to grade the quality of evidence and strength of recommendations for diagnostic tests or strategies. This article explains how patient-important outcomes are taken into account in this process.
The GRADE system classifies recommendations made in guidelines as either strong or weak. This article explores the meaning of these descriptions and their implications for patients, clinicians, and policy makers.
Guideline developers use a bewildering variety of systems to rate the quality of the evidence underlying their recommendations. Some are facile, some confused, and others sophisticated but complex.
Guidelines are inconsistent in how they rate the quality of evidence and the strength of recommendations. This article explores the advantages of the GRADE system, which is increasingly being adopted by organisations worldwide.
Lower urinary melatonin levels are associated with a higher risk of breast cancer in postmenopausal women. Literature for premenopausal women is scant and inconsistent.
In a prospective case–control study we measured the concentration of 6-sulphatoxymelatonin (aMT6s) in the 12-hour overnight urine of 180 premenopausal women with incident breast cancer and 683 matched controls.
In logistic regression models, the multivariate odds ratio (OR) of invasive breast cancer for women in the highest quartile of total overnight aMT6s output compared with the lowest was 1.43 [95% confidence interval (CI) = 0.83–2.45; Ptrend = 0.03]. Among current non-smokers there was no association (OR, 1.00, 95% CI, 0.52–1.94; Ptrend = 0.29). We observed an OR of 0.68 between overnight urinary aMT6s level and breast cancer risk in women with invasive breast cancer diagnosed >2 years after urine collection and a significant inverse association in women with a breast cancer diagnosis >8 years after urine collection (OR, 0.17, 95% CI = 0.04–0.71; Ptrend = 0.01). There were no important variations in ORs by tumor stage or hormone receptor status of breast tumors.
Overall we observed a positive association between aMT6s and risk of breast cancer. However, there was some evidence to suggest that this might be driven by the influence of subclinical disease on melatonin levels, with a possible inverse association among women diagnosed further from recruitment. Thus, the influence of lag time on the association between melatonin and breast cancer risk needs to be evaluated in further studies.
melatonin; aMT6s; premenopausal; night work; breast cancer
Overactive bladder (OAB) affects the lives of millions of people worldwide and antimuscarinics are the pharmacological treatment of choice. Meta-analyses of all currently used antimuscarinics for treating OAB found similar efficacy, making the choice dependent on their adverse event profiles. However, conventional meta-analyses often fail to quantify and compare adverse events across different drugs, dosages, formulations, and routes of administration. In addition, the assessment of the broad variety of adverse events is unsatisfactory. Our aim was to compare adverse events of antimuscarinics using a network meta-analytic approach that overcomes shortcomings of conventional analyses.
Cochrane Incontinence Group Specialized Trials Register, previous systematic reviews, conference abstracts, book chapters, and reference lists of relevant articles were searched. Eligible studies included randomized controlled trials comparing at least one antimuscarinic for treating OAB with placebo or with another antimuscarinic, and adverse events as outcome measures. Two authors independently extracted data. A network meta-analytic approach was applied allowing for joint assessment of all adverse events of all currently used antimuscarinics while fully maintaining randomization.
69 trials enrolling 26,229 patients were included. Similar overall adverse event profiles were found for darifenacin, fesoterodine, transdermal oxybutynin, propiverine, solifenacin, tolterodine, and trospium chloride but not for oxybutynin orally administered when currently used starting dosages were compared.
The proposed generally applicable, transparent network meta-analytic approach summarizes adverse events in an easy-to-grasp way, allowing straightforward benchmarking of antimuscarinics for treating OAB in clinical practice. Most currently used antimuscarinics seem to be equivalent first-choice drugs to start the treatment of OAB, except for oral oxybutynin dosages of ≥10 mg/d, which may have more unfavorable adverse event profiles.
Systematic reviews of randomized trials that include measurements of health-related quality of life potentially provide critical information for patients and clinicians facing challenging health care decisions. When, as is most often the case, individual randomized trials use different measurement instruments for the same construct (such as physical or emotional function), authors typically report differences between intervention and control in standard deviation units (so-called "standardized mean difference" or "effect size"). This approach has statistical limitations (it is influenced by the heterogeneity of the population) and is non-intuitive for decision makers. We suggest an alternative approach: reporting results in minimal important difference units (the smallest difference patients experience as important). This approach provides a potential solution to both the statistical and interpretational problems of existing methods.
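The statistical limitation noted above can be shown with a short sketch: the same raw treatment effect yields different standardized mean differences when the trial populations differ in heterogeneity, whereas the estimate in minimal important difference (MID) units is unchanged. The two trials, their standard deviations and the MID are hypothetical values for one instrument, and the function names are illustrative.

```python
def smd(mean_difference: float, sd: float) -> float:
    """Mean difference expressed in standard deviation units."""
    return mean_difference / sd


def in_mid_units(mean_difference: float, mid: float) -> float:
    """Mean difference expressed in multiples of the MID."""
    return mean_difference / mid


# Hypothetical trials using the same instrument with the same raw effect.
raw_difference = 5.0
sd_homogeneous = 10.0    # trial enrolling a narrow, homogeneous population
sd_heterogeneous = 20.0  # trial enrolling a broad, heterogeneous population
assumed_mid = 10.0       # assumed MID of the instrument

print("SMD, homogeneous trial:", smd(raw_difference, sd_homogeneous))
print("SMD, heterogeneous trial:", smd(raw_difference, sd_heterogeneous))
print("MID units, either trial:", in_mid_units(raw_difference, assumed_mid))
```

The standardized effect halves when the standard deviation doubles, although patients in both trials experienced the same change on the instrument.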
In the last few years, a new non-pharmacological treatment, termed apheresis, has been developed to lessen the burden of ulcerative colitis (UC). Several methods can be used to establish treatment recommendations, but over the last decade an informal collaboration group of guideline developers, methodologists, and clinicians has developed a more sensible and transparent approach known as the Grading of Recommendations, Assessment, Development and Evaluation (GRADE). GRADE has mainly been used in clinical practice guidelines and systematic reviews. The aim of the present study is to describe the use of this approach in the development of recommendations for a new health technology, and to analyse the strengths, weaknesses, opportunities, and threats found when doing so.
A systematic review of the use of apheresis for UC treatment was performed in June 2004 and updated in May 2008. Two related clinical questions were selected, the outcomes of interest defined, and the quality of the evidence assessed. Finally, the overall quality of each question was taken into account to formulate recommendations following the GRADE approach. To evaluate this experience, a SWOT (strengths, weaknesses, opportunities and threats) analysis was performed to enable a comparison with our previous experience with the SIGN (Scottish Intercollegiate Guidelines Network) method.
Application of the GRADE approach allowed recommendations to be formulated and the method to be clarified and made more explicit and transparent. Two weak recommendations were proposed to answer the formulated questions. Some challenges, such as the limited number of studies found for the new technology and the difficulties encountered when searching for the results for the selected outcomes, none of which are specific to GRADE, were identified. GRADE was considered to be a more time-consuming method, although it has the advantage of taking into account patient values when defining and grading the relevant outcomes, thereby avoiding any influence from literature precedents, which could be considered to be a strength of this method.
The GRADE approach could be appropriate for making the recommendation development process for Health Technology Assessment (HTA) reports more explicit, especially with regard to new technologies.