In summary, we found that while two-thirds of cardiovascular risk management therapy recommendations made in the nine different guidelines we examined were based on RCT evidence, less than half of these RCT-based recommendations were deemed “high quality” using an evidence-grading scheme that went beyond considerations of internal validity alone to take into account clinical relevance and direct applicability of the RCT to that recommendation. As a result, less than one-third of recommendations that advocated specific cardiovascular risk management therapies in these evidence-based guidelines were actually based on high-quality evidence.
The most frequent reason for RCT-based recommendations to be downgraded was that the RCT was conducted to answer a particular question in a restricted study population but was then extrapolated in the guideline to justify using the tested intervention in a related, but different, clinical scenario and/or in a more general population. In a similar vein, other investigators have recently questioned whether the evidence cited in the Third Report of the National Cholesterol Education Program, or NCEP III, as support for recommendations to use statins for primary prevention of cardiovascular disease is directly applicable, since one-tenth of the patients in the 16 “primary prevention” trials cited in that guideline had cerebrovascular or peripheral vascular disease at baseline [24].
As a corollary, it is evident that while a particular RCT may be used as the basis for multiple recommendations, RCTs will not provide the same quality of evidence for each recommendation (and in some cases guideline developers may extrapolate beyond the limits of the evidence in making particular recommendations). For example, the 2003 Kidney Disease Outcome Quality Initiative guidelines [25
] recommended statins for all patients with chronic kidney disease and LDL > 2.59 mmol/l, including those with end-stage renal disease, on the basis of RCTs such as the Heart Protection Study which were positive, but excluded patients with end-stage renal disease [26
]. However, a recently published RCT conducted in 1255 hemodialysis patients with type 2 diabetes mellitus found no reduction in the primary outcome of cardiovascular events or death but instead an unexpected increase in the risk of stroke with statin therapy [27].
We do not mean to imply that recommendations should not be made in the absence of high-quality evidence or that RCT evidence should not be extrapolated beyond the limits of trial eligibility criteria. Indeed, we recognize that trialists design RCTs with relatively homogeneous populations in order to maximize internal validity (at the expense of external validity), and there are published guides on how and when to extrapolate RCT evidence to individual patient situations [28
]. However, we do believe that transparency about any extrapolation of RCT evidence is critical, particularly in light of studies demonstrating that the composition and interpersonal dynamics of a guideline panel influence the extent to which their consensus recommendations diverge from the available evidence base [29].
Our findings that only some guidelines linked their recommendations to citations and that only some used explicit grading systems to communicate the quality of the evidence echo earlier reviews [7
]. Similarly, our finding that many treatment recommendations are not based on RCT evidence has been reported before in other fields [35
]. However, our unique finding is that even those recommendations in evidence-based guidelines that cite internally valid RCTs as support may not be underpinned by high-quality evidence. We did not compare the countries of origin for cited studies, because previous studies have already established that local evidence tends to be over-represented in guidelines [7
]. While the guidelines we studied were published at different times, the range was narrow (2003–2006), and our study focused on the type of evidence cited by each guideline rather than on the specific recommendations made.
Despite a number of strengths, our study has some limitations. First, we did not systematically search for different guidelines, but instead examined only a small sample of guidelines; future studies should expand our work to explore the quality of evidence underlying guidelines in other topic areas. For example, we believe a systematic examination of all guidelines produced by a particular organization (or, alternatively, all published guidelines in a particular topic area) would provide useful additional insights. To do so, we advocate the use of explicit grading schemes such as those of CHEP or GRADE (as in the recently reported framework for World Health Organization Rapid Advice Guidelines [38
]). However, while we found a high degree of inter-rater reliability for assessing whether RCT evidence was high quality or not (kappa 0.78 after all investigators completed a training set), future studies should also assess inter-rater reliability for the GRADE or CHEP schemes when used by investigators less familiar with them. Second, because the guidelines were inconsistent in how they cited studies in support of therapy recommendations (with some providing the citation directly with the recommendation and others providing numerous citations at the end of supporting text associated with recommendations), we may have misattributed citations to particular recommendations. We attempted to minimize this risk by having two investigators extract recommendations and citations independently for each guideline and by always biasing in favor of the guideline (i.e., if several citations were attached to a recommendation, we assigned the highest evidence rating achieved by any of the cited studies to that recommendation). However, future researchers may want to consider prospectively surveying guideline developers to determine exactly which pieces of evidence were considered for each recommendation; documentation of the debate around the evidence for particular recommendations in different overlapping guidelines would also provide potentially interesting insights. Finally, although our choice to restrict our analysis to cardiovascular therapy recommendations may be perceived as a limitation, it in fact strengthens our conclusions, since therapy recommendations are those most likely to be based on RCT evidence. Thus, our findings represent a “best-case” scenario, insofar as very few preventive or diagnostic guideline recommendations are based on RCT evidence.
In conclusion, our finding that less than one-third of treatment recommendations (and less than half of those citing RCTs in support of the advocated treatment) were based on high-quality evidence in national evidence-based guidelines for common conditions should sound a note of caution to consumers of clinical practice guidelines who assume that the sobriquet “evidence based” means that all recommendations contained therein are derived from high-quality evidence. In particular, we have documented that even evidence arising from internally valid RCTs may not be directly applicable to the populations, interventions, and outcomes specified in a guideline recommendation. As a recent editorial noted, “external validity is the neglected dimension in evidence ranking” [39
]. Indeed, to make the evidence base underlying therapy recommendations more transparent in future guidelines, we advocate wider adoption of evidence-rating schemes (such as the CHEP or GRADE systems) that go beyond judging the internal validity of supporting evidence to also incorporate the clinical relevance and applicability of that evidence to the clinical scenario for which the recommendation is made. A clearer understanding of the strengths and limitations of the underlying evidence base will then permit clinicians to individualize the application of practice guideline recommendations to their patients.