A core activity of evidence-based practice is the search for and appraisal of evidence on specific clinical issues. Clinicians vary in their competence in this process; we therefore developed a 16-item checklist for quality of content (relevance and validity) and presentation (useability, attribution, currency and contact details). This was applied to a set of 55 consecutive appraisals conducted by clinicians and posted at a web-based medical journal club site.
Questions were well formulated in 51/55 (92%) of the appraisals. However, 22% of appraisals missed the most relevant articles to answer the clinical question. Validity of articles was well appraised, with methodological information and data accurately extracted in 84% and accurate conversion to clinically meaningful summary statistics in 87%. The appraisals were presented in a useable way with appropriate and clear bottom-lines stated in 95%.
The weakest link in production of good-quality critical appraisals was identification of relevant articles. This should be a focus for evidence-based medicine and critical appraisal skills.
Evidence-based medicine demands the search for relevant information, appraisal of the evidence and dissemination of the resultant information [1,2]. Sometimes these ‘critical appraisals’ are conducted in the context of a medical journal club run on problem-based and evidence-based lines [3-5]. Many clinicians lack the necessary skills [6,7], and the quality of critical appraisals in current clinical practice cannot be relied upon [8]. Therefore we developed a quality checklist and applied it to a set of appraisals generated in a medical journal club.
We performed an Internet search to identify databanks of critical appraisals relevant to our specialty (obstetrics and gynaecology). This was done with the meta-search engine Copernic [9], which simultaneously consults Yahoo, Infoseek, Altavista, Webcrawler, Hotbot, Lycos, Northern Light, Deja.com, Excite, Directhit, Euroseek, Fast Search, MSN Websearch, Netscape Netcentre and Magellan. Copernic not only searches the most commonly used engines [10] but also removes duplicate web pages automatically. The following terms were used for the search: ‘cats’, ‘critically appraised topics’, ‘evidence-based medicine’, ‘evidence-based practice’, ‘critical appraisals’ and ‘evidence-based journal clubs’. This search yielded 53 hits, but only 14 were databanks of critical appraisals (CAT banks) and just 3 of these were related to obstetrics and gynaecology—namely, the Birmingham Women's Hospital website (55 appraisals), the University of Rochester website (4) and the University of North Carolina website (3). The University of Rochester and University of North Carolina CAT banks were excluded from our study because of their small sample sizes.
A journal club based on principles of evidence-based practice [1,11] and effective adult learning [3,7] was started at the Birmingham Women's Hospital in July 1998. The aim was to identify, appraise, summarize and disseminate evidence for guiding decisions in specific clinical settings [2]—for example, the delivery suite, clinics or wards. First, clinical questions based on patients' problems were framed in an answerable form; then electronic bibliographic databases were searched for relevant articles; the retrieved articles were appraised for validity, clinical importance of results and applicability; and, finally, the information was recorded in electronic format suitable for storage and retrieval on the hospital intranet and the Internet [www.thenhs.com/bham-womens/cats/index.htm]. The clinician preparing the evidence summaries had access to advice on literature searching, critical appraisal and the use of computer software. The appraisals were peer reviewed and refined during the journal club meetings. The process led to one-page summaries that have been termed critically appraised topics, or CATs [12].
Our 16-item checklist focuses on six domains—relevance, validity, useability, attribution, currency and contact details (Table 1). The first two domains assess content and the remaining four presentation. The domains and items for assessment of quality of content were developed from published work [1,11]. The domains and items for assessment of quality of presentation were generated from evaluations of health-related websites [13-15]. Relevance (items 1-5, Table 1) was a measure of the adequacy of procedures for formulating clinical questions and for searching for and selecting appropriate articles. Validity (items 6-8, Table 1) was determined by assessing the adequacy of the critical appraisal for the different types of clinical questions [1,11]. We defined useability (items 9-11, Table 1) as a measure of the ease with which the presented evidence could be used in clinical practice; this depended on reporting clinically meaningful summary statistics and on the appropriateness and clarity of bottom-lines for clinical use. Attribution (item 12, Table 1) was evaluated by the presence or absence of clear references to the data source. Currency (items 13 and 14, Table 1) was a measure of whether the appraisal was up to date, and depended on the provision of creation and ‘kill-by’ dates. We also sought contact details (items 15 and 16, Table 1), since these would help with feedback and clarification of the material contained within the appraisal. For each of the 16 items in the checklist we developed criteria for adequacy, as shown in Table 1. The checklist was then piloted by two reviewers (AC and PL) on ten appraisals from our hospital's bank of appraisals. Agreement between the reviewers was excellent and the original checklist needed only minor modifications. Any disagreements between the reviewers were resolved by consensus.
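Useability, as defined above, hinges on converting raw trial results into clinically meaningful summary statistics. As a minimal sketch (the event counts are hypothetical, not data from this study), the following snippet shows how figures such as relative risk, absolute risk reduction and number needed to treat are derived from a two-group trial:

```python
# Hypothetical example: deriving clinically meaningful summary statistics
# (relative risk, absolute risk reduction, number needed to treat)
# from the event counts of a two-group trial.

def summary_statistics(events_treated, n_treated, events_control, n_control):
    cer = events_control / n_control   # control event rate
    eer = events_treated / n_treated   # experimental event rate
    rr = eer / cer                     # relative risk
    arr = cer - eer                    # absolute risk reduction
    nnt = 1 / arr                      # number needed to treat
    return rr, arr, nnt

# Hypothetical trial: 10/100 events with treatment vs 20/100 with control
rr, arr, nnt = summary_statistics(10, 100, 20, 100)
print(f"RR={rr:.2f}, ARR={arr:.2f}, NNT={nnt:.0f}")  # RR=0.50, ARR=0.10, NNT=10
```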
Three reviewers—two trained in critical appraisal (AC and PL) and one in literature searching (MP)—were involved in data extraction. We began the examination of each appraisal by evaluating its clinical question. Keywords based on the various parts of the question were selected and searches were conducted in the relevant electronic bibliographic databases. The five most relevant citations were selected on the basis of the directness with which they addressed the clinical question, their methodological rigour and their sample sizes. Systematic reviews, if available, were ranked highly. The reviewers conducted the searches independently and the top five citations (the ‘gold standard’) were agreed by consensus. If the appraisal was based on one of these five articles, we considered it appropriate. However, any of the five selected articles that had not been available to the author of an appraisal at the time of its production were excluded from the gold-standard list. In addition, when a systematic review was available but was not selected, the appraisal was regarded as inadequate. Two independent reviewers (AC and PL) then critically appraised the identified studies. These appraisals served as the standard against which the clinician's appraisal (CAT) and its bottom-line were compared. Any disagreements between the reviewers were resolved by consensus or by arbitration by a senior author (HG or KSK).
Scores for the quality items in our checklist were recorded as percentages of the total number of appraisals, except for the items dealing with the choice of keywords and the use of electronic bibliographic sources, where the denominator was the number of appraisals in which a deliberate electronic search had been conducted. The analysis was conducted for all appraisals as well as for the subgroup in which an appropriate article had been obtained, since appraisal of an irrelevant article cannot be of value in clinical practice. Agreement between the reviewers was assessed by percentage agreement and the kappa statistic. Minimally acceptable agreement was set at a kappa level of 0.6 [16].
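The kappa statistic corrects observed inter-rater agreement for the agreement expected by chance. As an illustration (with hypothetical ratings, not the study's data), Cohen's kappa for two reviewers scoring each item as adequate (1) or inadequate (0) can be computed as:

```python
# Illustrative Cohen's kappa for two reviewers rating items as
# adequate (1) or inadequate (0). Ratings below are hypothetical.

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal proportions
    p_a1 = sum(rater_a) / n
    p_b1 = sum(rater_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
b = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
print(round(cohens_kappa(a, b), 2))  # 0.74: well above the 0.6 threshold
```

Raw percentage agreement here is 90%, but kappa is lower (0.74) because some of that agreement would occur by chance alone; this is why the study reports both measures.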
Between July 1998 and December 2000, 55 appraisals were posted on the intranet and Internet sites hosting the medical journal club. The appraisals had been authored by 41 different clinicians (21 senior house officers, 17 registrars and 3 consultants). Among the types of clinical question that led to the appraisals, therapy issues were the commonest (30), followed by prognosis (17) and diagnosis (8) (Table 2). Findings on quality of content and presentation are summarized in Table 3.
Agreement between reviewers on the various quality items ranged from 95% to 100% (kappa 0.77-1.0). For formulation of clinical questions the agreement was 95% (kappa 0.77), for correct conversion to clinically meaningful figures it was 96% (kappa 0.84), and for attribution, currency and contact details it was 100% (kappa 1.0).
The findings of this study indicate that the weakest link in the generation of critical appraisals in clinical practice is the selection of articles. For appraisals of therapy questions the Cochrane Library, a rich source of systematic reviews and randomized trials, would be the most appropriate database, yet it was searched in only 5 of 36. In addition, appraisals were sometimes inaccurate. However, when an appropriate article had been retrieved for appraisal, useability was high, with appropriate and clear bottom-lines generated in the majority.
A possible source of bias was overlap between the study authorship and appraisal authorship. In 12 (22%) of the appraisals, 4 of the study authors (AC, PL, KSK and HG) were directly involved. We compared the quality of these 12 appraisals with that of the 43 in which they had no direct involvement. There was no difference in measured quality.
One reason for a poor-quality appraisal could be abbreviated reporting rather than poor execution. However, in our journal club all appraisals were generated with the CATMaker software [8], which prompts the author to enter all relevant details without any major restriction on length of text. In addition, the appraisals were posted on the web without truncation and, if required, we used web-design software (Microsoft FrontPage) to incorporate additional information into the appraisal. It is therefore unlikely that abbreviated reporting affected quality in our study.
To our knowledge, this is the first study to assess the quality of web-based appraisals generated by clinicians. Currently, the quality of such information is assessed through indirect measures such as the authority of the source (for example, a medically qualified doctor or a professional body) and attribution (whether reference to the original article is given) [13]. Our checklist incorporated these indirect measures of accuracy, but in addition we developed items that directly measured content for relevance and validity. Our checklist also shares several items (searching, selection and validity assessment) with the QUOROM statement [17], which is used to assess the quality of systematic reviews; this improves the face validity of our instrument. We believe our quality instrument may contribute to the further development of quality-assessment tools for web-based health information.
Currently, training in evidence-based medicine focuses on the teaching of critical appraisal, but our study shows that selection of appropriate articles must also be addressed. Clinicians should approach appraisals by fellow clinicians with healthy scepticism, since they are sometimes irrelevant and occasionally inaccurate.