Valid and reliable ischemic stroke subtype determination is crucial for well-powered multicenter studies. The Causative Classification of Stroke System (CCS, available at http://ccs.mgh.harvard.edu) is a computerized, evidence-based algorithm that provides both causative and phenotypic stroke subtypes in a rule-based manner. We determined whether CCS demonstrates high interrater reliability in order to be useful for international multicenter studies.
Twenty members of the International Stroke Genetics Consortium from 13 centers in 8 countries, who were not involved in the design and development of the CCS, independently assessed the same 50 consecutive patients with acute ischemic stroke through reviews of abstracted case summaries. Agreement among ratings was measured by kappa statistic.
The κ value for causative classification was 0.80 (95% confidence interval [CI] 0.78–0.81) for the 5-subtype, 0.79 (95% CI 0.77–0.80) for the 8-subtype, and 0.70 (95% CI 0.69–0.71) for the 16-subtype CCS. Correction of a software-related factor that generated ambiguity improved agreement: κ = 0.81 (95% CI 0.79–0.82) for the 5-subtype, 0.79 (95% CI 0.77–0.80) for the 8-subtype, and 0.79 (95% CI 0.78–0.80) for the 16-subtype CCS. The κ value for phenotypic classification was 0.79 (95% CI 0.77–0.82) for supra-aortic large artery atherosclerosis, 0.95 (95% CI 0.93–0.98) for cardioembolism, 0.88 (95% CI 0.85–0.91) for small artery occlusion, and 0.79 (95% CI 0.76–0.82) for other uncommon causes.
CCS allows classification of stroke subtypes by multiple investigators with high reliability, supporting its potential for improving stroke classification in multicenter studies and ensuring accurate means of communication among different researchers, institutions, and eras.
Stroke research requires valid and reliable stroke subtype determination, particularly for well-powered multicenter studies. The Causative Classification of Stroke System (CCS) is a Web-based, semiautomated, evidence-based classification system constructed upon the simple, useful, and time-tested Trial of ORG-10172 in Acute Stroke concept of categorizing ischemic stroke etiology into 5 major subtypes.1,2 CCS (available free for academic use at http://ccs.mgh.harvard.edu) is a decision-making algorithm that incorporates information from the diagnostic evaluation of each patient in a rule-based manner to assign the most likely causative mechanism. Patients can be categorized crudely into 1 of 5 subtypes, or, in a more refined approach, into 8 or even 16 subtypes that specify the level of confidence of an assignment (table). An internal assessment of CCS by 2 raters (κ = 0.90)1 and a subsequent external assessment by 5 raters from 4 US centers (κ = 0.86)2 demonstrated excellent reliability. In order to ensure its utility in multicenter clinical studies, we sought to determine multirater reliability of CCS in a more diverse setting within the network of the International Stroke Genetics Consortium (ISGC).
Raters from 13 centers in 8 countries independently assessed abstracted data from 50 patients with acute ischemic stroke using CCS software. Raters were recruited from among members of the ISGC through a letter of invitation, and included 16 stroke neurologists, 2 clinical neuroscientists, 1 stroke fellowship-trained emergency physician, and 1 neurology resident. Six centers provided ratings by a single rater, 5 centers provided consensus ratings from paired raters, and 2 centers provided separate and independent ratings by 2 raters.
Abstracted case summaries were prepared by an experienced stroke neurologist who did not participate in the assessment process. Abstracted data were derived from medical records of 50 consecutive patients presenting to Massachusetts General Hospital with ischemic stroke during a 1-month period. Abstracted data included original reports of brain imaging, vascular imaging, cardiac evaluation (EKG, echocardiography), and other laboratory tests. A copy of the clinical description of the index stroke and the past medical history, as reported by the admitting physician, was also included. All data were in English.
Each ISGC rater was provided with prior CCS publications1,2 and a summary of the operational aspects of the system. They then completed an interactive online CCS training module designed to develop consistency among users in identifying critical data elements for subtype assignment. Raters completed assessments by filling out a standard form for each case vignette, which included all the activated data entry fields on the classification form as well as the final CCS subtype assignment. Raters were also asked to provide their expert opinion on stroke subtype independent of their CCS assignment.
Clinical characteristics of the 50 case vignettes have been described in detail elsewhere.2 All patients had brain imaging (MRI in 45, CT in 43, both in 37), vascular imaging (CT angiography in 40, magnetic resonance angiography in 25, both in 17), and electrocardiography; 41 had echocardiography and 7 had vascular ultrasound studies. Etiologic investigations revealed a high-risk cardiac source in 14 and low-risk cardiac source in 20 (per CCS criteria), intracranial or extracranial atherosclerotic stenosis in 14, lacunar infarction in 8, arterial dissection in 3, findings suggestive of primary antiphospholipid syndrome in 2, and angiographic moyamoya pattern and intracranial aneurysm in 1 patient each. Eleven patients had multiple potential causes. Etiologic investigations failed to reveal an etiology in 5 patients.
An updated version of CCS (version 2.0) was used for the present study (figure 1), which provided phenotypic in addition to causative subtyping (figure 1B). Phenotypic categories included supra-aortic large artery atherosclerosis, cardioembolism, small artery occlusion, and other uncommon causes. There were 4 possible states for supra-aortic large artery atherosclerosis and cardioembolism (major, minor, absent, incomplete evaluation), 3 for small artery occlusion (major, absent, incomplete evaluation), and 2 for the other uncommon causes group (major, absent). Possible states for “other uncommon causes” did not include “incomplete evaluation” because there was no standard minimum diagnostic evaluation for the group of disorders in this category. A mechanism was considered major if its potential to cause stroke was high, minor if it was low, and absent if relevant diagnostic investigations were normal. High-risk sources corresponded to evident causes and low-risk sources corresponded to possible causes, as described in detail in prior CCS publications.1,2 Because phenotypic classification depended on only the presence of a potential source, complex aortic atheroma was grouped under supra-aortic large artery atherosclerosis in the phenotypic classification, as opposed to cardioaortic embolism in the causative classification.
Data analyses were performed by investigators who were not involved in rating case vignettes. Reproducibility of both causative and phenotypic subtype assignments was evaluated by the kappa statistic for multiple raters.3 The sample size (number of case vignettes) required to achieve a κ of 0.75 (α = 0.05, power = 0.80) was 50 for 15 raters. Kappa values were compared using the z test. One-way analysis of variance was used to assess differences in the concordance of each rater with the other raters.
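As an illustration of a kappa statistic for multiple raters, the following is a minimal Python sketch of Fleiss' kappa, one common statistic of this kind. It is an assumption that this corresponds to the method of reference 3; the function name `fleiss_kappa` and the toy labels are hypothetical, and confidence intervals are not computed.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of labels (one per rater).

    Every case must be rated by the same number of raters (at least 2).
    """
    n_cases = len(ratings)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    # How many raters chose each label, per case.
    counts = [Counter(case) for case in ratings]

    # Observed agreement: proportion of agreeing rater pairs within each case.
    per_case = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall share of each label.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_chance = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy example with made-up labels: 3 cases rated by 3 hypothetical raters.
toy = [
    ["cardioembolism", "cardioembolism", "cardioembolism"],
    ["small artery occlusion", "small artery occlusion", "undetermined"],
    ["large artery atherosclerosis"] * 3,
]
print(round(fleiss_kappa(toy), 3))
```

In the study design described here, the input would hold 50 cases with 15 ratings each; the toy data above are only for demonstration.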
Sources of disagreement among raters were investigated through systematic comparison of data entry fields (checkboxes) activated by each rater with a standard template produced by the data abstractor. Disagreements were classified as 1) failure to enter crucial data elements (for instance, atrial fibrillation) supplied in case summaries (errors of omission) and 2) entry of data not reported in the vignette (errors of substitution). After an initial collection of ratings, refinements to the text and to the software were made in order to eliminate identified sources of these errors.
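The comparison of each rater's activated checkboxes against the abstractor's template can be sketched with set differences. This is a minimal illustration, not the study's actual procedure; the function name and the field names are hypothetical.

```python
def classify_field_errors(rater_fields, template_fields):
    """Compare a rater's activated data entry fields with the standard template.

    Returns the fields the rater missed (errors of omission) and the fields
    the rater activated although the template did not (errors of substitution).
    """
    rater = set(rater_fields)
    template = set(template_fields)
    return {
        "omission": sorted(template - rater),
        "substitution": sorted(rater - template),
    }


# Hypothetical field names, for illustration only.
template = {"atrial_fibrillation", "lacunar_infarct"}
rater = {"lacunar_infarct", "intracranial_stenosis"}
print(classify_field_errors(rater, template))
# {'omission': ['atrial_fibrillation'], 'substitution': ['intracranial_stenosis']}
```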
The study was approved by the local Human Studies Committee.
There were a total of 750 stroke subtype assignments for 50 case vignettes. Concordance among ratings was high for the 5-causative subtype CCS (figure 2). Each rater was concordant with other raters on an average of 84% (SD ±3%) of the ratings (p = 0.837 by analysis of variance). The reliability analysis revealed a κ of 0.80 (95% confidence interval [CI] 0.78–0.81) for the 5-subtype, 0.79 (95% CI 0.77–0.80) for the 8-subtype, and 0.70 (95% CI 0.69–0.71) for the 16-subtype causative CCS. The kappa across 5 consensus ratings was similar to that for 10 single-rater diagnoses (p = 0.752 for 5-item CCS, 0.343 for 8-item CCS, and 0.655 for 16-item CCS). Expert global judgment of stroke subtype was elicited for 10 of the 15 sets of vignettes. It differed from the 5-causative subtype CCS in 22 of the 500 ratings (4%).
There was excellent agreement for phenotypic subtype designation. The κ value for phenotypic classification was 0.79 (0.77–0.82) for supra-aortic large artery atherosclerosis, 0.95 (0.93–0.98) for cardioembolism, 0.88 (0.85–0.91) for small artery occlusion, and 0.79 (0.76–0.82) for other uncommon causes. One feature of the CCS system is its ability to assign patients with multiple competing etiologies into known subtypes. Twenty-two percent of cases in the current study had more than one etiology. However, despite this, the size of unclassified category in 15 sets of ratings ranged between only 0% and 8%.
The mean agreement rate between the data abstractor and the study raters on 5-subtype CCS was 88% (95% CI 86%–90%). Rater errors of omission or substitution that altered the causative subtype assignment occurred in 74 case ratings (out of 750 case ratings); 50 were errors of omission and 24 were errors of substitution. Seventy percent of all errors were related to data entry fields for imaging evaluation of the brain and brain vessels, 13% for cardiac evaluation, 10% for clinical evaluation, and 7% for evaluation for other uncommon causes. Interpretation of imaging findings from reports led to disagreement as to whether a vascular cutoff was due to embolus or local atherosclerosis, a vascular stenosis was due to atherosclerosis or nonocclusive nonatherosclerotic stenosis, and a small deep infarct described in the radiology reports was lacunar. Other examples of differences in interpretation of the abstracted data included whether calcific aortic stenosis indicated rheumatic aortic valve disease, whether migraine-related stroke was a diagnosis of exclusion, and whether prothrombotic abnormalities were the underlying mechanism of stroke in patients with no other evident cause.
In addition to differences in interpretation of the abstracted data, one software-related factor contributed to interobserver disagreement. This was caused by automatic disabling of data entry fields for cardiac emboli sources when the field for incomplete cardiac investigation was activated. This led to inability to enter cardiac sources revealed by clinical history, physical examination, or EKG (for instance, atrial fibrillation) into the system when a rater judged that cardiac investigations were not complete (for instance, due to absence of echocardiography).
Several refinements and changes in the CCS system were therefore made, in order to eliminate rater-related and software-related sources of disagreement and accommodate suggestions from participating raters.
Etiologic stroke subtype designations are elemental to clinical investigation of cerebrovascular diseases given the heterogeneity of biologic mechanisms underlying stroke. It is, therefore, imperative to have an easily replicable classification system in which all the terms used are sufficiently clear that they can be used interchangeably by different investigators. The CCS appears to provide a satisfactory basis for communication among multiple raters. Taken together with prior internal1 and external US2 reliability studies, the present international study further demonstrates that the CCS can be used with high reliability by multiple raters involved in the study of stroke.
Interrater reliability is an important measure of the quality of classification systems. Most systems or scales currently used in clinical stroke research fail to achieve excellent reliability. Kappa values reported by independent investigators in multirater settings range from 0.42 to 0.68 for TOAST,5–8 0.25 to 0.64 (unweighted kappa) for the modified Rankin Scale score,9–11 and 0.27 to 0.68 for the Barthel Index.12 A computerized algorithm that used original TOAST rules revealed a relatively higher κ value (0.68, 95% CI 0.44–0.91) but there were only 2 raters and 20 cases.6 Disagreement is likely to be greater in larger and unselected cohorts, in cohorts with diverse etiologies, and in settings where multiple raters (>2) are involved. Deviations from perfect reliability introduce measurement or misclassification error to stroke research and this, in turn, erodes the efficiency and power of clinical studies, a critical issue for genetic research in particular, where the effect sizes of genetic variants are presumed to be small or moderate.13 Depending on the study design and variability in outcome measure, an improvement in κ value from 0.50 to 0.80 will permit a reduction in sample size by up to 40% to achieve the same study power.14 CCS reduces the variance from subjective interpretation of clinical data by introducing a well-referenced, well-defined, and evidence-based subtype assignment, particularly in patients with multiple competing etiologies or incomplete diagnostic investigation. Application of the CCS in multicenter studies therefore offers the potential for increasing the efficiency of those studies by reducing the sample size needed and, hence, the cost.
Classification errors generally arise from 3 sources: the abstracted patient data, the rater, and the input system. Data-related errors include ambiguities in the abstracted data (for instance, inconsistencies between 2 similar tests, such as CT and MRI) and lack of data that are critical for subtyping (which may prompt raters to use their best guess). In the current study, we used a manual, which required documentation of clinical findings and original test report results in a regularized and standard manner, to guide the data abstraction process in order to minimize such potential sources of variance. Reliability can be artificially high if classification systems are tested in cohorts that disproportionately consist of subjects that strongly match a specific category. Such subjects are unequivocally classified into that category. As employed in the current study, selection of consecutive patients who harbor a wide spectrum of stroke etiologies including multiple competing mechanisms as well as rare and uncertain causes may reduce this source of bias.
Rater-related errors arise from between-rater differences in the classification and assessment of abstracted data. Such variation in expert opinion is probably to be expected, given that judgments regarding etiologic stroke subtype assignment depend on raters' experience, knowledge, and understanding of the classification rules. An inevitable source of rater-related error is the overlooking of important abstracted information. This is a particular concern in circumstances where assessment of a cluster of patients is needed and focus and attention must be maintained for long periods of time. Another important basis for rater-related disagreement noted in the present study was the unavailability of radiographic images for visual assessment. Reports of radiographic images do not always provide the most critical information necessary for accurate subtyping. Interpretation of radiographic images by multiple raters, however, also introduces variability to research studies, and its overall impact on the classification results will have to be tested in a relevant setting.15,16
CCS offers a number of features to prevent users from entering inconsistent data. These include input error checking, automatic disabling, enabling, checking and unchecking of dependent elements, and tool-tips to provide more detailed explanations of terms and definitions used in the text. The only system-related factor for disagreement in the present study was the inability of system logic functions to account for differences in expert opinion for the need for echocardiography for stroke evaluation. High variance in expert opinion for echocardiography is not unanticipated because there is no uniform recommended approach for performance of echocardiography for evaluation of stroke.17,18 CCS requires echocardiography only if there is clinical suspicion of cardiac embolism and if clinical history, cardiac examination, and EKG do not reveal a source. CCS now allows classification of patients with known sources of cardiac embolism into relevant subtypes regardless of the expert opinion for echocardiography. The resultant improvement in the reliability from the simulation analysis suggests that the CCS can be used in multicenter research with minimal level of inconsistency.
The CCS offers subtype information in 2 different formats: the causative subtype and phenotypic subtype. Identification of the causative subtype requires integration of multiple aspects of ischemic stroke evaluation including symptom characteristics, vascular risk factors, diagnostic test results, response to treatment, and prognosis. In other words, designation of the causative subtype requires a decision-making process. For instance, the diagnosis of small vessel occlusion requires not only the presence of a lacunar infarct in a clinically relevant location but also exclusion of conditions such as dissection, atherosclerosis, vasculitis, or vasospasm of the parent artery at the origin of the penetrating artery, major cardiac emboli sources, and other relevant uncommon causes of stroke.
The process of phenotypic subtyping, on the other hand, does not require any judgment on the part of the clinician-investigator. Furthermore, there are no tradeoffs among positive test findings and thereby inadvertent loss of information. For instance, in a patient with a lacunar infarction in the pons, multiple stereotypic lacunar TIAs during the preceding days, patent foramen ovale, and moderate to severe stenosis in the origin of one of the vertebral arteries, causative CCS subtype is “probable small vessel occlusion,” whereas phenotypic subtype is “atherosclerosis + small artery occlusion + cardiac embolism.” The phenotypic subtyping allows the study of interactions among etiologic subtypes, patient selection in large-scale epidemiologic and genetic studies, as well as coding for administrative purposes. The CCS contains a total of 96 possible phenotypic combinations. Another algorithm that has recently been developed a priori for the purpose of phenotypic subtyping offers 4 subtypes and 5 possible states for each subtype, resulting in 625 possible combinations.19 Caution, therefore, should be exercised in using phenotypic subtypes in research projects with limited sample size or where the primary purpose is to assess etiologic stroke subtypes in simultaneous context with other covariates of interest.
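The combination counts quoted above follow from multiplying the number of possible states per phenotypic category. The short Python sketch below enumerates them; the CCS state names are those listed in the Methods, while the state names for the comparison algorithm are hypothetical placeholders, since only their number (5 per subtype) is given here.

```python
from itertools import product

# States per phenotypic category in CCS, as listed in the Methods.
CCS_STATES = {
    "supra-aortic large artery atherosclerosis":
        ("major", "minor", "absent", "incomplete evaluation"),
    "cardioembolism":
        ("major", "minor", "absent", "incomplete evaluation"),
    "small artery occlusion":
        ("major", "absent", "incomplete evaluation"),
    "other uncommon causes":
        ("major", "absent"),
}

# Comparison algorithm: 4 subtypes with 5 states each (placeholder names).
OTHER_STATES = {
    f"subtype {i}": tuple(f"state {j}" for j in range(1, 6))
    for i in range(1, 5)
}


def phenotype_combinations(states):
    """Enumerate every possible assignment of one state to each category."""
    categories = list(states)
    return [dict(zip(categories, combo)) for combo in product(*states.values())]


print(len(phenotype_combinations(CCS_STATES)))    # 4 * 4 * 3 * 2 = 96
print(len(phenotype_combinations(OTHER_STATES)))  # 5 ** 4 = 625
```

With 50 cases, as in the present study, most of the 96 cells would necessarily be empty, which is the practical reason for the caution about limited sample sizes.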
This study does not address the reliability of the CCS in settings where there are raters from different professional backgrounds, including, for example, nurses, residents, general neurologists, and emergency physicians. Reliability analyses were performed using unweighted kappa. Because 16-item CCS takes into account the level of confidence in subtype assignments, disagreements can occur within each category as well as between different categories. Weighted kappa penalizes disagreements in terms of their relative importance. Given that between-category disagreements are generally accepted to be more serious than intracategory disagreements, the use of weighted kappa would have resulted in higher agreement rates compared to the unweighted approach.20 The use of weighted kappa in multicategory nominal scales, however, requires subjective weighting of disagreements and this largely depends on the clinical or research setting where the scale is being used. The use of abstracted case vignettes generally inflates kappa values. The reliability of the CCS might vary in settings where real patients or actual medical records are used, diagnostic investigations are cursory, and the causative spectrum of stroke is different.
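The effect of weighting can be illustrated with a small Python sketch. It uses the two-rater (Cohen) form of weighted kappa rather than the multirater statistic used in this study, and the half-penalty for within-category disagreements is an arbitrary, hypothetical weight chosen only to show the direction of the effect.

```python
from collections import Counter


def weighted_kappa(rater1, rater2, disagreement):
    """Two-rater weighted kappa: 1 - observed / chance-expected disagreement.

    `disagreement(a, b)` returns 0 for identical labels and up to 1 for the
    most serious disagreement.
    """
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("both raters must rate the same non-empty set of cases")
    observed = sum(disagreement(a, b) for a, b in zip(rater1, rater2)) / n
    m1, m2 = Counter(rater1), Counter(rater2)
    expected = sum(
        m1[a] * m2[b] * disagreement(a, b) for a in m1 for b in m2
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - observed / expected


def unweighted(a, b):
    """Every disagreement counts fully."""
    return 0.0 if a == b else 1.0


def confidence_weighted(a, b):
    """Hypothetical weights: same subtype but different confidence counts half."""
    if a == b:
        return 0.0
    return 0.5 if a[0] == b[0] else 1.0


# Labels are (subtype, confidence level) pairs; the data are made up.
r1 = [("CE", "evident"), ("CE", "probable"), ("LAA", "evident"), ("LAA", "evident")]
r2 = [("CE", "evident"), ("CE", "evident"), ("LAA", "evident"), ("CE", "evident")]

print(round(weighted_kappa(r1, r2, unweighted), 3))           # 0.273
print(round(weighted_kappa(r1, r2, confidence_weighted), 3))  # 0.368
```

On these toy ratings the weighted value is higher than the unweighted one, consistent with the expectation stated above, but the size of the difference depends entirely on the chosen weights.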
The automated CCS offers high reliability for stroke subtyping with kappa values that compare very favorably with other classification algorithms. This demonstrates that classification of stroke subtypes by investigators from different countries can achieve sufficient comparability, and suggests the potential utility for CCS in improving stroke classification in multicenter trials in which accurate subtyping is critical.
Statistical analysis was conducted by Drs. Arsava and Ay.
The authors thank Dr. Bo Norrving for comments on the manuscript.
Dr. Arsava and Dr. Ballabio report no disclosures. Dr. Benner serves/has served as a consultant for Siemens Medical Solutions, Bayer Schering Pharma, and Perceptive Informatics. Dr. Cole receives royalties from the publication of Stroke Essentials for Primary Care (Springer, 2009); received a speaker honorarium from Concentric Medical Inc.; serves as a consultant for Wyeth; and receives research support from the US Department of Veterans Affairs and the NIH (NINDS R01 NS045012-05 [Co-I]). Dr. Delgado-Martinez reports no disclosures. Dr. Dichgans served as Genetics Section Editor for Stroke and receives research support from BMB, NGFN-Plus, Wellcome Trust, and the Foundation for Vascular Dementia Research. Dr. Fazekas serves on scientific advisory boards for Biogen Idec and Teva Pharmaceutical Industries Ltd./Sanofi-Aventis; serves on the editorial advisory boards of Cerebrovascular Diseases, the Polish Journal of Neurology and Neurosurgery, the Journal of Neurology, Stroke, and the Swiss Archives of Neurology and Psychiatry; and has received consulting fees, speaker honoraria, and grant support from Bayer Schering Pharma, Bayer Vital GmbH, Baxter International Inc., Biogen Idec, Merck Serono, and Teva Pharmaceutical Industries Ltd./Sanofi-Aventis. Dr. Furie has served on scientific advisory boards for Novartis and GE Healthcare and receives research support from the NIH/NINDS [R01-HS011392 (PI) and P50-NS051343 (PI)], the American Heart Association, and the Deane Institute. Dr. Illoh reports no disclosures. Dr. Jood has received research support from the Yngve Land foundation for neurological research. Dr. Kittner receives research support from the NIH (NINDS 1RO1NS045012-01A1 [PI], NINDS 1U01NS069208 [Co-PI], and NHGI 1U01HG004436-01A1 [Co-I]) and the US Department of Veterans Affairs; and has provided expert witness testimony in medico-legal cases. Dr. Lindgren serves on scientific advisory boards for Boehringer Ingelheim and Sanofi-Aventis/Bristol-Myers Squibb; has received speaker honoraria from AstraZeneca; and receives research support from the Swedish Research Council, Lund University, Region Skåne, and Strokeriksförbundet. Dr. Majersik and Dr. Macleod report no disclosures. Dr. Meurer receives research support from the NIH (NINDS P50NS044283 [Site PI]) and the Emergency Medicine Foundation. Dr. Montaner, Dr. Olugbodi, Dr. Pasdar, and Dr. Redfors report no disclosures. Dr. Schmidt serves on a scientific advisory board for Novartis; has received funding for travel and speaker honoraria from Pfizer Inc., Novartis, Merz Pharmaceuticals, LLC, Lundbeck Inc., and Takeda Pharmaceutical Company Limited; and serves on the editorial boards of Clinical Neurology and Neurosurgery and Neuropsychiatrie. Dr. Sharma has served on a scientific advisory board for Boehringer Ingelheim; has received speaker honoraria from Sanofi-Aventis and Bristol-Myers Squibb; and receives fellowship support from the UK Department of Health. Dr. Singhal has received research support from the NIH/NINDS (P50 NS051343 [Project PI], R01NS051412 [PI], P01 NS035611 [Co-I], R01NS059775-01 [Co-I], and R01NS38477 [Co-I]); his wife is an employee of and holds stock options in Vertex Pharmaceuticals; and has served as a medical expert witness in medicolegal cases concerning stroke. Dr. Sorensen has served on scientific advisory boards for Olea Medical and Breakaway Imaging; has received funding for travel from Genentech, Inc., Siemens Medical Solutions, Millennium Pharmaceuticals, Inc., and AstraZeneca; serves as a Section Editor of Stroke and on the editorial boards of The Oncologist and the Journal of Clinical Oncology; holds patents re: Method for evaluating novel, stroke treatments using a tissue risk map, Imaging system for obtaining quantitative perfusion indices, Delay-compensated calculation of tissue blood flow, High-flow oxygen delivery system and methods of use thereof, and Magnetic resonance spatial risk map for tissue outcome prediction; receives royalties from the publication of Cerebral MR Perfusion Imaging (Thieme, 2000); has received speaker honoraria from Siemens Medical Solutions, Novartis, and GE Healthcare; has served as a consultant to Mitsubishi Tanabe Pharma Corporation, AstraZeneca, and Genentech, Inc.; receives research support from Millennium Pharmaceuticals, Inc., Siemens Medical Solutions, AstraZeneca, Genentech Inc., Novartis, Merck Serono, Schering Plough Corp, and the NIH [NINDS NS38477 (PI), NCI CA137254 (PI), NINDS NS063925 (PI), and NINDS NS061119 (PI)]; and holds stock and stock options in Epix Pharmaceuticals. Dr. Sudlow serves/has served on the editorial boards of Stroke, BioMed Central Cardiovascular Disorders, the British Medical Journal, and the Cochrane Stroke Group; receives royalties from the publication of Stroke: A Practical Guide to Management, 3rd ed. (Blackwell, 2008); and receives research support from the Scottish Executive Health Department, the Wellcome Trust, and the UK Binks Trust Research Fellowship. Dr. Thijs serves/has served on scientific advisory boards for Shire plc, Merck Serono, and SYGNIS AG; had received funding for travel from Boehringer Ingelheim and Shire plc; served as an Associate Editor for Acta Neurologica Belgica; has received speaker honoraria from Shire plc, Abbott, Pfizer Inc., Boehringer Ingelheim, and Medtronic, Inc.; receives research support from SERVIER, Schering-Plough Corp., SYGNIS AG, CoAxia, Inc., Medtronic, Inc., Novo Nordisk, Bristol-Myers Squibb, Pfizer Inc., Shire plc, AstraZeneca, ThromboGenics NV, Eli Lilly and Company, Sanofi-Aventis, Boehringer Ingelheim, Daiichi Sankyo, Asubio Pharmaceuticals, Inc., FWO Flanders, and Vlaams Instituut voor Biotechnologie; and holds stock in Novo Nordisk. Dr. Worrall serves as an Associate Editor of Neurology® and on the editorial board of Seminars in Neurology; receives royalties from the publication of Merritt's Neurology, 10th, 11th, and 12th ed. (chapter author); receives/has received research support from the NIH (NHGRI/NIH U-01 HG005160 [Co-PI], U01 NS069208-01[Co-PI], NHLBI contract [PI], NINDS R25 NS065733 [Mentor], NINDS R01 NS42147 [Site PI], NINDS R01 NS 39987 [Executive committee, Site PI], NINDS R01 NS039512 [Executive committee, Co-I], NHLBI contract [Physician investigator], K08-NS45802 [PI], and R01-NS42733 [Site PI]), and the University of Virginia-CTSA Pilot Project. Dr. Rosand serves on the editorial board of Stroke and receives research support from the NIH (5R01NS059727-02 [PI], 3RO1NS059727-01A1S1 [PI], and U01 NS069208-01 [Site PI]), the American Heart Association, and the Deane Institute for Integrative Research in Atrial Fibrillation and Stroke. Dr. Ay receives research support from the NIH (R01-NS059710 [PI] and U01 NS069208-01.
Address correspondence and reprint requests to Dr. Hakan Ay, Massachusetts General Hospital, Harvard Medical School, 13th Street, Bldg. 149, Room 2301, Charlestown, MA 02129; hay@partners.org
Disclosure: Author disclosures are provided at the end of the article.
Received January 31, 2010. Accepted in final form June 15, 2010.