Self-report post-concussion symptom scales have been a key method for monitoring recovery from sport-related concussion and for assisting in medical management and return-to-play decision-making. To date, however, item selection and scaling metrics for these instruments have been based solely upon clinical judgment, and no one scale has been identified as the “gold standard”. We analyzed a large set of data from existing scales obtained from three separate case–control studies in order to derive a sensitive and efficient scale for this application by eliminating items that were found to be insensitive to concussion. Baseline data from symptom checklists including a total of 27 symptom variables were collected from a total of 16,350 high school and college athletes. Follow-up data were obtained from 641 athletes who subsequently incurred a concussion. Symptom checklists were administered at baseline (preseason), immediately post-concussion, post-game, and at 1, 3, and 5 days post-injury. Effect-size analyses resulted in the retention of only 12 of the 27 variables. Receiver-operating characteristic analyses were used to confirm that the reduction in items did not reduce sensitivity or specificity. The newly derived Concussion Symptom Inventory is presented and recommended as a research and clinical tool for monitoring recovery from sport-related concussion.
The medical management of sport-related concussion has suffered from a dearth of empirical data from prospective controlled outcome studies. This has led to a burgeoning number of conflicting injury classification systems and return-to-play guidelines. Although there are now well over a dozen different proposed sets of guidelines, it has been recognized that few, if any, of these are evidence-based, and none has been universally accepted (Aubry et al., 2002; Guskiewicz et al., 2004). The various guidelines are all in agreement, however, that a player should be symptom-free before returning to play (Cantu, 1998; Kelly et al., 1991; Kelly & Rosenberg, 1997; LeBlanc, 1998). Although the rationale for this recommendation also remains poorly substantiated by any evidence to date, the primary concern is that players may be at an elevated risk of repeat concussion during the symptomatic post-concussive period. There is some preliminary evidence that such a period of vulnerability may exist, and that recovery following a second concussion may be somewhat more prolonged (Guskiewicz et al., 2003). In prospective controlled studies, the risk of a second concussion within the same season in American football has been reported to range from approximately 3%–6% for players who suffer an initial concussion (Guskiewicz et al., 2003; Macciocchi, Barth, Littlefield, & Cantu, 2001). A second concern is the risk of delayed brain swelling, or “second-impact syndrome”. This can be a life-threatening condition, but it is extremely rare. There are fewer than 20 documented cases in the world literature to date, the causative mechanism remains unclear, and it can occur without a second injury (McCrory, 2001; McCrory & Berkovic, 1998; Mori, Katayama, & Kawamata, 2006).
There is a general consensus, however, that until these risks are clarified, concussed players should be free of the residual effects of concussion before returning to competition. A number of methods have been explored to measure concussion-related symptoms or impairments, including brief “sideline” neurocognitive examinations (McCrea, 2001; McCrea, Kelly, Kluge, Ackley, & Randolph, 1997; McCrea, Kelly, Randolph, Cisler, & Berger, 2002), balance testing (Guskiewicz, 2001, 2003; Guskiewicz, Ross, & Marshall, 2001), and more extensive neuropsychological testing (Barr, 2001; Bleiberg et al., 2004; Echemendia & Julian, 2001; Erlanger et al., 2003; Hinton-Bayre & Geffen, 2002; Lovell & Collins, 1998; Macciocchi, Barth, Alves, Rimel, & Jane, 1996; Peterson, Ferrara, Mrazik, Piland, & Elliot, 2003; Randolph, 2001) designed to detect changes in cognitive functioning by comparing players with their own preseason baseline. The use of self-report subjective symptom checklists or scales has also been a consistent component of concussion management, and these have repeatedly been demonstrated to be sensitive to the effects of concussion (Macciocchi et al., 1996; Maroon et al., 2000; McCrea et al., 2003; McCrory, Ariens, & Berkovic, 2000; Peterson et al., 2003).
Self-report symptoms are also the primary decision-making factor in the most commonly used guidelines for return-to-play. Concussed athletes typically show elevated scores on concussion symptom checklists for at least as long as impairment is detectable via more time-consuming and expensive methodologies (e.g., neuropsychological testing), despite concerns that players might under-report symptoms in order to be cleared to return-to-play (Peterson et al., 2003; Randolph, McCrea, & Barr, 2005). In addition, recent publications, including a consensus paper, have recommended that players should be asymptomatic before screening for impairment using any type of neuropsychological testing (McCrea et al., 2005; McCrory et al., 2005). Finally, serious doubts have been raised regarding the reliability and incremental utility of neuropsychological testing in detecting recovery from sport-related concussion (Randolph et al., 2005). In a recent study that was the first to explore the use of computerized neurocognitive tests utilizing “real-world” retest intervals to measure test reliability, the stability coefficients of these measures proved to be extremely poor (Broglio, Ferrara, Macciocchi, Baumgartner, & Elliott, 2007). For the most widely used of these tests (ImPACT), stability coefficients ranged from only 0.15 to 0.39, with an average of 0.29. This is far below the level of stability needed for individual decision-making (usually recommended to be at least 0.8), and suggests that these instruments lack sufficient reliability to be of use in establishing cognitive recovery. This type of finding further underscores the central role of subjective symptom checklists in monitoring recovery from concussion.
A variety of subjective symptom scales have been used in the study of sport-related concussion. These scales typically overlap substantially in item content, which to date has been chosen on the basis of clinical experience with concussion-related symptoms. The overall sensitivity of these scales to the effects of sport-related concussion has been repeatedly demonstrated (Barth et al., 1989; Erlanger et al., 2001; Lovell & Collins, 1998; Lovell et al., 2003; McCrea et al., 2003; McCrory et al., 2000; Mrazik et al., 2000). Until recently, however, the psychometric properties of these checklists/scales have remained largely unexamined. In a recent study, data from one of these scales were reported for approximately 1700 high school and college athletes, and compared with data from a concussed sample of 260 athletes surveyed within five days of injury (Lovell et al., 2006). This paper provided only descriptive statistics regarding the scale, did not involve a prospective controlled study, and did not explore the relative utility of individual items in differentiating concussed from non-concussed athletes. Piland, Motl, Ferrara, and Peterson (2003) reviewed data from a group of 279 college athletes who were administered a 16-item symptom scale at baseline to explore the factor structure of the scale, which was hypothesized to consist of three relatively distinct domains. They eliminated seven items, primarily on the basis of face/content validity, and achieved a better fit to their model. Clinical validity was explored with a small sample of concussed players (N = 17).
Although this latter study involved a sophisticated approach to exploring certain psychometric properties of a concussion symptom scale, the primary application of a symptom scale in the medical management of sport-related concussion is in the efficient and sensitive detection of the effects of concussion, as opposed to the characterization of these effects. In this context, item selection should be driven primarily by sensitivity to concussion, requiring an empirical approach to determine item retention. In addition, the study of Piland and colleagues did not explore the scaling characteristics of their instrument, perhaps because of the relatively small number of injured players in their sample. As a result, it remains unclear which symptoms are actually sensitive to the effects of concussion, and whether or not a 7-point Likert-type scale is necessary for the detection and tracking of concussion-related symptomatology.
We recently completed three separate studies, involving the use of largely overlapping symptom scales, with data on over 16,000 athletes at baseline and over 600 athletes following concussion. We combined these datasets in order to enable an empirical study of each item's value to the scale. The purpose of this paper was to derive the most sensitive and efficient scale possible for the detection and tracking of self-reported symptoms following sport-related concussion.
The data for this study were derived from three separate projects: the Concussion Prevention Initiative (CPI), the NCAA Concussion Study (NCAA), and Project Sideline (Sideline). The protocols and subject inclusion for each of these projects are described subsequently. The symptom scales employed in each project are contained in Table 1. Each symptom, in each project, was scored on a 7-point Likert-type scale from 0 (absent) to 6 (severe). Although the symptom scales for each project differed slightly, they did have substantial overlap with one another, and with symptom scales used in earlier studies (Lovell & Collins, 1998; Macciocchi et al., 1996).
The CPI involved the collection of prospective data from 14 colleges and 110 high schools from 2000 to 2003, involving athletes from football, men's soccer, women's soccer, men's lacrosse, women's lacrosse, men's ice hockey, and women's ice hockey. The total number of athletes examined at baseline was 9,094 (72.7% male), with 375 subsequent concussions. The data from this project have not yet been published, and the symptom checklist was only one component of this study. Detection of concussion was made on a clinical basis in accordance with procedures followed by the NCAA and Sideline studies (essentially, evidence of an alteration in mental status as the result of a mechanical insult to the head), referenced subsequently.
The NCAA study involved 4,238 male football players from 15 US colleges. All players underwent preseason baseline testing in 1999, 2000, and 2001. There were 196 subsequent concussions, with assessment points at the time of injury, 3 hr post-injury, and at 1, 2, 3, 5, 7, and 90 days post-injury. Portions of the data from this study have been reported elsewhere (Guskiewicz et al., 2003; McCrea et al., 2003).
Project Sideline, a Milwaukee-based project, began in 2000 and involved a total of 18 high schools in the southeastern Wisconsin area, including athletes from football, hockey, and soccer teams. The baseline sample included a total of 3,018 athletes (97% male), with a total of 70 subsequent concussions. Portions of these data have been presented elsewhere (McCrea, Hammeke, Olsen, Leo, & Guskiewicz, 2004). The overall concussion rate across studies (2%–5% per season) is consistent with epidemiological survey data (Powell & Barber-Foss, 1999).
There were a total of five post-injury assessments that were common to all three studies: immediately post-injury, post-game (approximately 3 hr post-injury), Day 1, Day 3, and Day 5. The primary analysis was designed to eliminate any items that proved to be insensitive to concussion. The criterion for retention was an effect size of at least 0.3 on at least two of the five post-injury assessment points. This essentially requires an increase in the average level of symptomatology of 0.3 SD over the baseline mean for that item. The baseline means for all items were less than 0.5 on the 7-point Likert scale (0–6), and the standard deviations associated with these means ranged from 0.3 to 1.0. Achieving this retention criterion, therefore, required only a very modest increase over the baseline level of symptomatology for any variable: an effect size increase of 0.3 over baseline could be achieved with a mean rating score at that assessment point that was still less than 1. Given this rather liberal retention criterion, we felt that requiring this effect size to be reached on at least two assessment points was an adequately conservative approach to preclude retaining a variable as the result of a spurious finding (false-positive).
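The retention rule described above can be illustrated with a short sketch. This is not the authors' analysis code; it assumes the effect size is the post-injury mean minus the baseline mean, divided by the baseline standard deviation, which is how the text paraphrases the criterion. The function names, the timepoint labels, and the example ratings are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline_scores, post_scores):
    """Standardized change: (post mean - baseline mean) / baseline SD.

    Assumption: the baseline SD is the scaling unit, as implied by
    "0.3 SD over the baseline mean" in the text.
    """
    sd = stdev(baseline_scores)
    if sd == 0:
        raise ValueError("baseline SD is zero; effect size is undefined")
    return (mean(post_scores) - mean(baseline_scores)) / sd


def retain_item(baseline_scores, post_by_timepoint, threshold=0.3, min_points=2):
    """Return (keep, sizes): keep is True when the effect size reaches
    `threshold` at `min_points` or more post-injury assessment points."""
    sizes = {
        label: effect_size(baseline_scores, scores)
        for label, scores in post_by_timepoint.items()
    }
    hits = sum(1 for d in sizes.values() if d >= threshold)
    return hits >= min_points, sizes


# Hypothetical 0-6 ratings for a single symptom (illustration only).
baseline = [0, 0, 0, 0, 1, 0, 0, 2, 0, 0]
post = {
    "immediate": [2, 3, 1, 2],
    "day1": [1, 1, 0, 2],
    "day5": [0, 0, 1, 0],
}
keep, sizes = retain_item(baseline, post)
```

With the bounds reported in the text (baseline means below 0.5, standard deviations no larger than 1.0), the mean rating needed to reach the threshold stays below 0.5 + 0.3 × 1.0 = 0.8, which is why the criterion can be met with a mean rating under 1.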
Applying this retention rule (an effect size change of 0.3 from baseline on at least two observations) resulted in the elimination of 13 variables, leaving the following 14 variables: Headache, nausea, balance/dizziness, fatigue, drowsiness, feeling slowed down, in a fog, difficulty concentrating, difficulty remembering, neck pain, blurred vision, sensitivity to light, sensitivity to noise, and sensitivity to light/noise. Because sensitivity to light and sensitivity to noise were independently sensitive, and these variables were combined in only the NCAA dataset, sensitivity to light/noise was also eliminated as a separate variable. In addition, it seemed likely to us that neck pain was attributable to cervical strain and not a direct result of concussion; as a result, this variable was eliminated as well, leaving a total of 12 symptoms.
Rasch rating scale analysis (Linacre, 2004; Rasch, 1960), one of the models within the Rasch measurement family, was used to explore the utility of the 7-point Likert scale. There were several indicators suggesting that there was insufficient information in the data to yield reliable parameter estimates if a 7-point scale were used; for example, (a) a number of rating categories had fewer than 10 observations; (b) irregularity in observation frequency across categories was found that signaled aberrant category endorsement by subjects; (c) average measures did not advance monotonically with category; and (d) some step categories advanced by less than 1.4 logits, whereas others advanced by more than 5.0 logits. These findings suggested that the number of categories could be reduced below 7 for most of the remaining 12 variables. After a great deal of discussion, however, a decision was made to retain the original Likert scaling.
This decision was made on several bases. First, if the number of categories was reduced for each item, it would require an assumption about how the players in the study would have responded to a scale with a more limited range (e.g., would a response of 1 on the original 7-point scale have remained a 1 if the scale became dichotomous, or might that player now respond with a 0?). The only alternative to this assumption would be re-validating the new scaling with a new sample of players. It was ultimately concluded that this assumption was not one that could be comfortably made, and the labor-intensive nature of these studies makes a follow-up validation project with a reasonable sample size rather impractical. In addition, the use of a Likert-type scaling to monitor symptom recovery had an intuitive appeal to the clinicians in our group, who felt that the information regarding symptom severity might have clinical significance in some cases (e.g., in detecting a worsening headache in the rare player who suffers a delayed deterioration in neurological status). Finally, some of the items in the scale did appear to meet assumptions for 7-point scaling, suggesting that reducing the scaling for these items might result in some loss of information.
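Two of the category diagnostics mentioned in the Rasch analysis, sparse categories and non-monotonic average measures, can be checked with a few lines of code once a Rasch program has produced the measures. The sketch below is only an illustration under stated assumptions: it does not fit a Rasch model, the `measures` input (one logit value per observation, taken here as person measure minus item difficulty) is assumed to come from separate software, and the function name and example values are hypothetical.

```python
def category_diagnostics(ratings, measures, n_categories=7, min_count=10):
    """Summarize rating-category usage for one item.

    ratings  -- observed categories (integers 0 .. n_categories - 1)
    measures -- assumed input: logit measure for each observation,
                estimated elsewhere by Rasch software
    Returns (counts, sparse, monotonic).
    """
    if len(ratings) != len(measures):
        raise ValueError("ratings and measures must have the same length")
    counts = [0] * n_categories
    sums = [0.0] * n_categories
    for rating, measure in zip(ratings, measures):
        counts[rating] += 1
        sums[rating] += measure
    # Categories with fewer than min_count observations.
    sparse = [k for k, n in enumerate(counts) if n < min_count]
    # Average measure per observed category, in category order.
    averages = [sums[k] / counts[k] for k in range(n_categories) if counts[k] > 0]
    # True when each observed category has a higher average than the one before.
    monotonic = all(a < b for a, b in zip(averages, averages[1:]))
    return counts, sparse, monotonic


# Hypothetical three-category example (illustration only).
example_ratings = [0] * 12 + [1] * 11 + [2] * 3
example_measures = [-1.0] * 12 + [0.5] * 11 + [0.2] * 3
counts, sparse, monotonic = category_diagnostics(
    example_ratings, example_measures, n_categories=3
)
```

In this made-up example the top category is both sparse and out of order, the kind of pattern that would argue for collapsing categories; the step-advance criterion in logits is not covered by this sketch.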
To ensure that we did not lose substantial sensitivity by eliminating items, we conducted receiver-operating characteristic (ROC) analyses of data from two post-injury assessment points: Day 1 post-injury and Day 5 post-injury. Scores for all concussed players using both the original scales and the newly derived 12-item scale were compared with the scores for the entire baseline sample on the original scales.
Although sensitivity and specificity are commonly used to assess the diagnostic efficiency of tests, both sensitivity and specificity rely on a single cut-off score. A more complete description of classification accuracy is given by the area under the ROC curve (AUC) (Zhou, Obuchowski, & McClish, 2002). The curve plots the probability of detecting a disorder (sensitivity) against the probability of a false signal (1 − specificity, i.e., the false-positive rate) for an entire range of possible cut-off scores (Hsiao, Bartko, & Potter, 1989). The AUC provides a measure of the ability of the model to discriminate between persons with a disorder and persons without the disorder. Perfect discrimination is achieved at an AUC of 1.00, with chance falling at an AUC of 0.50, represented as the area under the diagonal line traversing from zero false-positive rate, and zero sensitivity, to perfect sensitivity and 100% false-positives. AUCs of 0.7–0.79 have been characterized as acceptable, 0.8–0.89 as excellent, and 0.9 or above as outstanding.
There are different methods to calculate AUCs. Parametric methods are based on the bivariate normal distribution, which assumes a normal distribution for cases with the disease and a normal distribution for cases without, or that the data have been monotonically transformed to normal. Parametric methods also assume homoscedasticity. These assumptions can be restrictive; thus, we elected to use a non-parametric approach (DeLong, DeLong, & Clarke-Pearson, 1988).
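As an illustration of a non-parametric AUC, the sketch below computes the empirical (Mann-Whitney) estimate: the proportion of case/control pairs in which the case scores higher, with ties counted as one half. It is a minimal sketch rather than the authors' analysis; it does not compute the confidence intervals or the comparison of correlated curves that the DeLong et al. approach also provides, and the function name and example scores are hypothetical.

```python
def empirical_auc(control_scores, case_scores):
    """Empirical AUC: P(case > control) + 0.5 * P(case == control).

    Makes no distributional assumption about either group.
    """
    if not control_scores or not case_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical total symptom scores (illustration only).
baseline_totals = [0, 0, 1, 2, 0, 3, 1, 0]
concussed_totals = [5, 9, 2, 14, 7]
auc = empirical_auc(baseline_totals, concussed_totals)
```

The pairwise loop is quadratic in the sample sizes, so for samples as large as those in this study a rank-based implementation would be a more practical choice; the pairwise form is used here because it mirrors the definition directly.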
Fig. 1 shows the ROC curves for both the CSI Day 1 and the original full-scale Day 1. They are nearly identical. CSI Day 1 had an AUC of 0.867 (95% CI 0.85, 0.88), whereas the full-scale Day 1 had an AUC of 0.871 (95% CI 0.85, 0.89). Both tests showed excellent diagnostic discrimination. Fig. 2 shows the ROC curves for both the CSI Day 5 and the full-scale Day 5. Again, both are essentially identical. CSI Day 5 had an AUC of 0.689 (95% CI 0.67, 0.71), whereas the full-scale Day 5 had an AUC of 0.71 (95% CI 0.69, 0.73). Not surprisingly, the diagnostic discrimination is poorer on Day 5 than on Day 1, as symptoms have substantially resolved by the fifth day post-injury.
Descriptive statistics for the new 12-item scale, which we have termed the Concussion Symptom Inventory (CSI), are presented in Table 2 for the concussed sample at all common assessment points, and are graphically depicted in Fig. 3. Scores were significantly elevated from the immediate post-injury assessment point through Day 3, returning to baseline levels by Day 5.
Self-report post-concussion symptom scales have been a key methodology in monitoring recovery from sport-related concussion, and assisting in return-to-play decision-making. To date, however, the scales employed for this purpose have been constructed on the basis of clinical judgment and no single scale has been identified as a “gold standard” for this purpose. The newly derived Concussion Symptom Inventory (CSI) is presented in the Appendix. To our knowledge, this is the first scale that has been empirically derived for the purpose of monitoring subjective symptoms following concussion. The source data also constitute the largest sample of prospectively studied cases of concussion in the literature to date, with a concussed sample size of 641 athletes compared with a baseline sample of 16,350 athletes. We have elected to include space on the form for the scale for the recording of any additional subjectively reported symptom that a clinician believes may have been due to the concussion. This will allow for a full clinical documentation of subjective symptomatology, including symptoms that were very rarely reported (or not queried) in our prospective studies. We are not proposing specific “cut-off” scores, because we lack sufficient empirical data at this point to suggest that there actually is a quantifiable risk of returning a player to competition based upon a particular CSI score. This is, of course, true of any symptom scale or other technique for measuring impairment following concussion.
We propose that athletic trainers and team medical personnel employ the CSI as a standardized methodology for tracking symptom resolution following concussion, and incorporate the information from the CSI into clinical decision-making regarding return-to-play. This recommendation is appropriate, given the lack of empirically derived alternative scales to date. The decision-making process regarding return-to-play should be informed by the evolving literature on the natural history and outcome of sport-related concussion, and by the specific clinical circumstances of the individual player. It is important to emphasize that the CSI is not intended to constitute the sole basis for clinical decision-making in the medical management of sport-related concussion, and that individual players may also experience concussion-related symptoms (e.g., sleep disturbance) that are not recorded within the CSI owing to the relative infrequency with which they occurred in our concussed sample. We do not intend for athletic trainers or team medical personnel to rely solely upon the results of the CSI to determine recovery.
The CSI does, however, provide an empirically based, rapid, and systematic methodology for tracking subjective symptoms following sport-related concussion. To date, a variety of symptom scales have been used in clinical assessments and studies of the natural history of concussion, and item selection and scaling have been driven by clinical judgment rather than empirical data. This large sample of players with baseline and post-concussion data allowed us to eliminate a number of items that proved to be largely insensitive to concussion.
The risks of “premature” return-to-play following sport-related concussions are as yet poorly delineated, and none of the many guidelines that have been promulgated for this purpose are evidence-based. They are all in agreement, however, that players should be symptom-free before being cleared to return. This would seem to be a reasonable and appropriately conservative approach to concussion management, particularly in younger athletes, until additional data regarding risks are accrued and clinical decision-making can be driven by reliable evidence. The CSI constitutes a relatively rapid, standardized way of monitoring symptom recovery, with no loss in sensitivity or reliability in comparison with much longer inventories. It has been demonstrated that concussed athletes routinely endorse subjective symptoms for at least as long as impairment is typically detectable by more time-consuming and expensive methodologies (e.g., neuropsychological testing), despite concerns that players might under-report symptoms in order to be cleared to return-to-play. Finally, recent publications, including a consensus paper, have recommended that players should be asymptomatic before screening for impairment using any type of neuropsychological testing, further underscoring the central role of subjective symptom checklists in monitoring recovery from concussion.
Although the CSI was derived from sport-related concussion data and the primary utility of the scale is intended to be for this purpose, it is conceivable that the scale might also prove useful in studies of the natural history of concussion due to other causes. To date, there is no clear consensus regarding a specific scale for use in tracking recovery of symptoms from concussion/mild traumatic brain injury in clinical (non-athletic) populations. Although the CSI has not yet been validated for applications outside of sports, it may have some appeal owing to the fact that it was empirically derived from data obtained from injured athletes. This population is typically highly motivated to recover, and therefore would be unlikely to over-endorse symptoms as a result of psychological factors (e.g., depression, somatoform tendencies, malingering). Although the CSI is clearly composed of a number of symptoms that are not exclusively specific to concussion (e.g., headache, drowsiness), it is reasonably safe to assume that the symptoms that were retained for this scale were in fact generated by concussion and not by other factors that might be operating in some proportion of mild TBI patients recruited from a non-sports setting.
This project was supported in part by funding from the NCAA, the National Operating Committee on Standards for Athletic Equipment (NOCSAE), Center for Disease Control's National Center for Injury Prevention and Control (NCIPC), National Academy of Neuropsychology, Waukesha Memorial Hospital Foundation, National Federation of State High School Associations, NFL Charities, Green Bay Packer Foundation, Milwaukee Bucks, Herbert H. Kohl Charities, Waukesha Service Club, and the Medical College of Wisconsin General Clinical Research Center (M01-RR00058 from the National Institutes of Health).
The authors would also like to acknowledge the invaluable assistance of Amy Mathews, MSW, and Stephen Marshall, PhD, in data management.