Our findings provide some insight into the psychometric and measurement properties of various concussion assessment tools that could be used to evaluate concussion in young athletes. Although more evidence exists on the use of various assessments in professional and collegiate athletes and although high school athletes are increasingly being studied, our investigation is one of the first to research the measurement properties of neuropsychological and balance tests in a youth sports population.
Although both age and sex effects on neuropsychological test performance have been studied in high school athletes, published data on these variables in professional, collegiate, or younger athletes are limited. We did find significant sex differences on performance of the SAC, with females scoring higher than males. Although previous authors have indicated a trend toward females achieving slightly higher scores, these findings did not reach statistical significance. Our lack of significant differences with respect to the verbal memory tests (Buschke SRT Sum, CLTR, Delayed Recall) is surprising considering that both age and sex affect performance on the Buschke SRT.
However, the SRT is also moderately related to IQ, which was not measured in this investigation. Our findings indicate that separate norms for males and females should be used on the SAC for younger athletes, as reported previously for high school athletes.
We also found that older athletes (aged 12 to 14 years) performed better on the BESS, Coding, and Trails B than did the younger athletes in our sample. Although our groups are close in age from a developmental standpoint, the finding that the older group performed better on the Trails B and Coding is substantiated by improvement in the age-appropriate norms cited in the literature for those tests.
The age-appropriate norms for the Trails B indicate that performance improves with each year's increase in age, whereas the norms for the WISC-III Coding improve with each 3-month increase in age.
With respect to the BESS, previous authors have demonstrated improved performance on postural stability tests with increasing age. In addition, investigators who were specifically using the BESS as a measure of postural stability have found lower BESS scores in healthy college and high school athletes compared with our findings.
Test-retest reliability is important in all measures as a means to identify practice effects, a factor that could influence the test result. During serial testing, assessments with low reliability can cause profound variability in the scores of individuals with no alterations in cognitive function or deficits in balance ability. Test-retest reliability is the first step in the process to validate cognitive batteries for the assessment of concussion.
In the sport-related concussion literature, reports have been published on the test-retest reliability of paper-and-pencil neuropsychological assessments, computerized test batteries, and the SAC in older populations.
However, to date, no authors have reported the test-retest reliability of the BESS or neuropsychological assessments specific to the pediatric athletic population.
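As a minimal sketch of how the two kinds of test-retest coefficient discussed here can be computed from a pair of sessions, the following calculates a Pearson r and an intraclass correlation coefficient. The ICC form used (two-way random effects, absolute agreement, single measure) is an assumption made for illustration, and the scores are hypothetical, not data from this investigation.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(x, y):
    """ICC (two-way random, absolute agreement, single measure), k = 2 sessions.

    Assumed form for illustration only.
    """
    x, y = list(x), list(y)
    n, k = len(x), 2
    grand = mean(x + y)
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    session_means = [mean(x), mean(y)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in session_means)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_error = ss_total - ss_subjects - ss_sessions
    ms_subjects = ss_subjects / (n - 1)
    ms_sessions = ss_sessions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_sessions - ms_error) / n
    )


# Hypothetical scores: every subject improves by exactly 2 points at retest.
test = [10, 12, 14, 16, 18]
retest = [12, 14, 16, 18, 20]
print(pearson_r(test, retest))  # rank order is fully preserved
print(icc_2_1(test, retest))    # the uniform practice effect lowers agreement
```

In this hypothetical case the Pearson r is insensitive to the uniform shift between sessions, whereas the absolute-agreement ICC is reduced by it, which is one reason the two coefficients can differ for the same test.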
With respect to the cognitive assessments, we found poor to good reliability coefficients, ranging from .46 to .83. We noted lower test-retest reliability (ICC = .51 to .65, r = .51 to .65) in the tests that assessed verbal learning and memory and the SAC (ICC = .46, r = .46). We did show differences in the reliability of the verbal learning tests between male and female subjects, with females exhibiting better test-retest reliability on the Buschke SRT, CLTR, and Delayed Recall. Further investigation of these differences revealed an outlier in the males, which likely led to lower reliability in the male subjects. Slight practice effects were observed on most of the tests. Significant changes at time 2, indicating practice effects, were observed on only 2 of the measures (Trails B, BESS). These findings are consistent with those of previous authors who were testing both uninjured high school athletes and healthy adults.
In a sample of high school athletes, Barr reported reliability coefficients of r = .54 for the Hopkins Verbal Learning Test total score and r = .56 for the Hopkins Verbal Learning Test Delayed Recall score. Similarly, adults demonstrated poor to moderate test-retest reliability on the Buschke SRT total (r = .62), CLTR (r = .54), and SRT Delayed Recall (
tests and on the California Verbal Learning Test (r = .29 to .67).
We did find moderate to good test-retest indices (ICC = .65 to .83, r = .71 to .83) for the tests that assessed attention, concentration, and visual processing, including the Trails B, Coding, and Symbol Search tests. Two groups studying adult populations reported higher reliability coefficients than we found; however, this could be due to the increased variability in performance often seen on these tests in younger subjects.
Dikmen et al noted good test-retest reliability for the Trails B (r = .89) and the Digit Symbol Test (r = .89) in subjects with a mean age of 43.6 years after a test-retest interval that ranged from 2 to 12 months, whereas Hinton-Bayre et al also found good reliability coefficients in professional rugby players (19.4 ± 2.1 years of age) on the Speed of Comprehension Test (r = .78), Digit Symbol (r = .74), and Symbol Digit (r = .72) assessments after a 1- to 2-week test-retest interval. In contrast, Barr found lower test-retest reliability on the Symbol Search (r = .58), Trails B (r = .65), and Digit Symbol (r = .73) tests in high school athletes with a mean age of 15.9 ± 0.98 years, with a test-retest interval of 60 days.
Test-retest reliability of composite scores for various computerized neuropsychological platforms has also been reported. Using a 2-week test-retest interval on the HeadMinder Concussion Resolution Index (HeadMinder Inc, New York, NY), good reliability was found on indices for Simple Reaction Time (r = .70) and Processing Speed (r = .82), whereas moderate reliability was noted for Complex Reaction Time (
Similarly, Iverson et al reported moderate to good reliability on the composite scores for Verbal Memory (r = .70), Visual Memory (r = .67), Reaction Time (r = .79), and Processing Speed (r = .86) of ImPACT (version 2.0; ImPACT Applications Inc, Pittsburgh, PA) after a mean test-retest interval of 5.8 days. Using a 1-week test-retest interval, moderate to good reliability has been reported using CogSport for speed indices (r = .69 to .82); however, lower reliability coefficients were noted for the accuracy indices (r = .31 to .51).
Although most of the measures reported from the computerized assessments demonstrate moderate to good reliability, all of the authors mentioned above studied test-retest reliability over a shorter period of time (1 to 2 weeks) rather than the 60 days used in our investigation. It should be noted that both the age of the participants and the test-retest interval may affect the test-retest reliability coefficients reported across the various studies.
One contributing factor that may explain our low test-retest reliability on some assessments is the large SEM value. For example, our scores for the CLTR ranged from 28 to 91 on the initial test and from 12 to 96 on the retest, with an SEM of 12.12 and an Sdiff of 17.15. Similarly, we had large SEM (11.67) and Sdiff (16.50) values with the Trails B. However, these findings are less surprising for the latter test, given the results of other sport-related concussion studies in which the Trails B was administered without prior administration of the Trail Making Test A. Guskiewicz et al reported SDs ranging from 14.09 to 18.23 for the Trails B in a collegiate population, and McCrea et al noted SDs of 18.69 and 22.12 in collegiate control athletes and those with concussions, respectively.
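The relationship between the SEM and the standard error of the difference can be sketched as follows. This assumes the conventional definitions (SEM as the SD multiplied by the square root of 1 minus the reliability coefficient, and the standard error of the difference as the square root of twice the squared SEM); the function names are hypothetical. Under that assumption, the SEM values of 12.12 and 11.67 reproduce the paired difference values of 17.15 and 16.50 to within rounding.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement (assumed conventional definition)."""
    return sd * math.sqrt(1.0 - reliability)


def s_diff(sem_value):
    """Standard error of the difference between two scores on the same test."""
    return math.sqrt(2.0 * sem_value ** 2)


# SEM values reported in the text for the CLTR and the Trails B.
print(round(s_diff(12.12), 2))  # 17.14
print(round(s_diff(11.67), 2))  # 16.5
```

Because the difference error is about 1.41 times the SEM, a test with a large SEM requires a correspondingly large change in score before the change can be distinguished from measurement error.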
Another explanation for the lower reliability of some measures is that our subjects scored within a truncated range on tests that offer only a restricted range of possible scores.
Having subjects who score within a truncated range has been shown to produce lower reliability correlations.
For example, the SAC is scored between 0 and 30 points. In our specific population, the range of scores for the SAC was 20 to 30 on the initial test and 23 to 30 on the retest. Our subjects represented a more homogeneous group for this test and presented less variability and, thus, a lower test-retest reliability coefficient.
On the other hand, assessments without a limited scoring range, such as the Trails B, usually provide higher test-retest reliabilities.
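The effect of a truncated range on a reliability correlation can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of our data: each simulated subject has a stable true score, both sessions add independent measurement noise, and range restriction is imitated by keeping only subjects who scored at or above the median on the first session. All parameter values are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, noise_sd=0.5, seed=1):
    """Return (full-range r, truncated-range r) for simulated test-retest data."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n)]
    test = [t + rng.gauss(0, noise_sd) for t in true_scores]
    retest = [t + rng.gauss(0, noise_sd) for t in true_scores]
    r_full = pearson_r(test, retest)
    # Imitate a homogeneous, high-scoring sample: keep the upper half only.
    cutoff = sorted(test)[n // 2]
    kept = [(a, b) for a, b in zip(test, retest) if a >= cutoff]
    r_truncated = pearson_r([a for a, _ in kept], [b for _, b in kept])
    return r_full, r_truncated


r_full, r_truncated = simulate()
print(f"full range r = {r_full:.2f}, truncated range r = {r_truncated:.2f}")
```

The measurement noise is identical in both cases; only the between-subject variability differs, which is why a homogeneous sample yields a lower coefficient even when the test itself is unchanged.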
Although several groups have investigated the test-retest reliability of various cognitive assessments in athletes, the literature is somewhat lacking with respect to the test-retest reliability of postural stability assessments in athletes. Previous authors have found that eyes-closed balancing tasks are novel for most children; therefore, the possibility of a learning effect exists, which can influence the retest scores.
Such a finding has been noted with the BESS: a learning effect has been reported upon serial testing.
We found acceptable (r = .70) reliability of BESS performance in our subjects, but the significant improvement during the retest session likely affected our reliability. Additionally, increased variability in children with regard to measures of balance is well documented until children reach adult-like postural stability, near the age of 11 years.
In 2 studies testing younger children than we used in our investigation, the test-retest reliabilities on an eyes-closed, single-leg stance ranged from .59 to .77, and on a tiltboard test of balance, for both eyes-open and eyes-closed conditions, reliabilities were low (
Similarly, Westcott et al, using the Pediatric Clinical Test of Sensory Interaction and Balance, found reliability coefficients for combined sensory condition scores ranging from .45 to .69 with the feet together and from .44 to .83 during a heel-to-toe (tandem) stance in children between 4 and 9 years of age. It is important to note that no authors have compared the BESS to the aforementioned balance tests; therefore, one explanation for the different reliabilities could be that the tests assess balance differently. Additionally, our subjects were older than those reported in the above studies; it is plausible that they exhibited better balance ability and, therefore, improved reliability.
Reliable Change Indices
Although use of the RCI has become more popular in determining change in cognitive function after concussion, no method has yet been accepted for determining how many “points” on a particular assessment indicate a cognitive or balance deficit. Some authors have identified the change score as clinically meaningful if it lies outside of the 90% CI. However, the use of the 90% CI might be too conservative for sport-related concussion, because the impairments are often subtle and resolve rather quickly.
Yet this recommendation was based on the results of a study of high school and collegiate athletes. Recently, it has been suggested that sports medicine clinicians should be more conservative when making decisions regarding a young athlete who has sustained a concussion; therefore, our 70% CI scores may not be too conservative to use with our younger population. It should also be noted that the RCI method we used included a correction for practice effects, as suggested by Chelune et al.
Clinicians should not use a single RCI as the sole determinant in return-to-play decision making after a concussion. The RCI values are intended to help the clinician decide what constitutes a meaningful change in an athlete's score and should be interpreted along with the individual's clinical examination, concussion history, presenting symptoms, and other assessment data. Additionally, the limitations of using the RCI as a means of determining change include the need to understand the statistical procedure and alternate methods of detecting change (ie, standard regression) that may provide better results.
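A practice-corrected reliable change interval of the kind described above can be sketched as follows. This is an illustrative outline, assuming the interval is centered on the mean practice effect and extends by a z value multiplied by the standard error of the difference; the function names, the practice effect of 3.0 points, and the example scores are hypothetical. The difference error of 17.15 is the CLTR value reported earlier.

```python
from statistics import NormalDist


def rci_interval(mean_practice, s_diff, confidence=0.70):
    """Bounds on retest-minus-baseline change expected without true change."""
    # Two-tailed z value for the requested confidence level (about 1.04 at 70%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean_practice - z * s_diff, mean_practice + z * s_diff


def is_reliable_change(baseline, retest, mean_practice, s_diff, confidence=0.70):
    """True when the observed change falls outside the expected interval."""
    low, high = rci_interval(mean_practice, s_diff, confidence)
    change = retest - baseline
    return change < low or change > high


low, high = rci_interval(mean_practice=3.0, s_diff=17.15)
print(f"70% interval for change: {low:.1f} to {high:.1f}")
print(is_reliable_change(60, 40, mean_practice=3.0, s_diff=17.15))  # 20-point drop
print(is_reliable_change(60, 65, mean_practice=3.0, s_diff=17.15))  # 5-point gain
```

Raising the confidence level widens the interval, so fewer score changes are flagged; this is the trade-off behind choosing a 70% rather than a 90% CI.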
Relationship Between the Standard Assessment of Concussion and Neuropsychological Tests
We found that the SAC was significantly correlated with 4 of 6 neuropsychological measures; however, these correlations demonstrated a weak relationship, accounting for only 8.2% to 13.3% of the variance in SAC scores. Two possible reasons for the weak relationships are the restricted range of scores and the domains tested. The SAC is known to have a restricted range of scores in a normative sample, as was the case in our study. This factor may help explain why we found only weak relationships between it and the neuropsychological tests.
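For readers who think in terms of correlation coefficients rather than shared variance, the percentages above can be converted back by taking the square root, assuming they are squared bivariate correlations. The sketch below recovers only the magnitude; the direction of each correlation cannot be determined from the variance explained.

```python
import math


def r_magnitude(variance_explained):
    """Absolute correlation implied by a proportion of variance explained."""
    return math.sqrt(variance_explained)


print(round(r_magnitude(0.082), 2))  # 0.29
print(round(r_magnitude(0.133), 2))  # 0.36
```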
Another possible explanation for the weak relationships is that these measures likely assess somewhat different domains of cognitive function. The gap here may lie between the measurement of global cognitive functioning and specific cognitive abilities. The correlations between total SAC score (ie, as an indicator of overall cognitive functioning) and scores on measures of specific cognitive functions (eg, memory) were weak. The SAC is advocated as a sideline mental status assessment tool useful in the first 48 hours after injury and is a valid and moderately reliable gross measure of cognitive functioning in a mixed sample of high school and collegiate athletes during this acute phase of injury.
In contrast, more advanced neuropsychological testing is often useful in detecting subtle cognitive abnormalities in specific cognitive domains, in determining prolonged deficits in cognitive function, or in aiding in return-to-play decision making once the athlete is asymptomatic.
Cognitive areas, including verbal learning and memory (Buschke SRT), attention and concentration (Trails B, Coding, Symbol Search), visual-motor function (Trails B, Coding, Symbol Search), sequencing (Trails B), and processing speed (Trails B, Coding, Symbol Search), were assessed with the neuropsychological battery.
Although the SAC assesses the domains of memory and concentration, the brevity of those sections as well as the entire SAC may prevent an extensive assessment of those cognitive areas further out from the acute injury. These results support the notion that both the SAC and more complex neuropsychological assessments should be used in the evaluation of an athlete after concussion.
In an attempt to better isolate various cognitive domains, we performed a post hoc analysis of the relationship between each of the 4 SAC domains and the 6 neuropsychological measures. No meaningful relationships were revealed; thus, it is likely that these assessments do serve different purposes in a concussion assessment protocol.
Also of clinical importance are the high correlations found among some of the neuropsychological assessments, specifically the Trails B, which was highly correlated with 2 of the other neuropsychological measures. Those clinicians using a neuropsychological test battery should select tests that will assess the various cognitive domains typically affected by concussion and that can be administered in a short amount of time. Whereas a standard neuropsychological examination consists of multiple cognitive domains and requires 3 to 6 hours to administer, baseline and follow-up examinations for sport-related concussion typically last 20 to 30 minutes and target neurocognitive functioning most sensitive to impairments after a concussive injury.
We feel that the tests used in our study could be added to a battery for prospective injury investigations. These tests can be administered in 20 minutes and assess the domains typically affected by concussion: memory (Buschke SRT), attention (Coding, Symbol Search, Trails B), and speed and flexibility of cognitive processing (Coding, Trails B).
The inclusion of additional tests related to reaction time, visual memory, and complex attention would strengthen the battery and provide a more global measure of cognitive function. Future authors should address the psychometric properties of additional assessments.
Although we determined that we cannot predict SAC scores very well based on the neuropsychological test scores, we found that improved performance on the neuropsychological tests correlated with improved performance on the SAC. Our results indicate that psychometrically the SAC behaves similarly in young athletes, adolescents, and adults, in whom the instrument has demonstrated reliability, validity, clinical sensitivity, and specificity in detecting neurocognitive impairment after concussion. However, further research in injured subjects is required to determine the sensitivity and specificity in detecting cognitive dysfunction during the acute period after concussion in younger athletes.