The calibrated severity metric based on ADOS raw totals offers a method of quantifying ASD severity with relative independence from individual characteristics such as age and verbal IQ. It should have utility in various genetic, neurobiological, and clinical research endeavors, including treatment trials, that otherwise would use unstandardized ADOS raw totals. Calibrated scores have more uniform distributions across age- and language-groups compared to raw totals, making it possible to compare children’s scores longitudinally across distinct algorithms. In part because of the modular system of the ADOS, chronological age, nonverbal IQ, and verbal and nonverbal mental age did not predict either raw totals or severity scores in this sample. The severity metric builds on this modular system to reduce the influence of participants’ verbal IQ, which accounted for 10% of the variance in severity scores versus 43% of the variance in raw totals, a reduction from a large to medium effect size. The remaining influence of verbal IQ on the severity metric can be seen in the drift of mean scores toward greater severity in older age groups with lower language levels (Modules 1 and 2). This apparent age effect seems likely to be explained by lower verbal IQ in the older children without fluent speech. Though this effect has not been eliminated entirely, the calibrated metric is better able to measure autism severity beyond verbal impairment than are raw ADOS totals.
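The comparison of variance explained by verbal IQ can be illustrated with a minimal sketch. It assumes a simple linear regression with a single predictor, in which the proportion of variance explained equals the squared Pearson correlation; the analysis reported here may have used a different model. The function name `variance_explained` and all data values are hypothetical and do not reproduce the 10% and 43% figures.

```python
from statistics import mean


def variance_explained(predictor, outcome):
    """Proportion of outcome variance explained by one predictor.

    For a simple linear regression this is R^2, the squared Pearson r.
    """
    mx, my = mean(predictor), mean(outcome)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predictor, outcome))
    sxx = sum((x - mx) ** 2 for x in predictor)
    syy = sum((y - my) ** 2 for y in outcome)
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and outcome must both vary")
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy data, invented for illustration only.
verbal_iq = [45, 60, 72, 85, 98, 110]
raw_totals = [22, 19, 17, 12, 11, 8]   # made-up ADOS raw totals
severity = [8, 7, 9, 6, 8, 6]          # made-up calibrated severity scores

r2_raw = variance_explained(verbal_iq, raw_totals)
r2_severity = variance_explained(verbal_iq, severity)
print(f"raw totals: {r2_raw:.2f}, severity scores: {r2_severity:.2f}")
```

A smaller value for the calibrated scores than for the raw totals is the pattern described above; in this toy example the gap is built into the invented data.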
Calibrating scores within narrowly-defined age/language cells achieved the reduction in verbal IQ effects within the new metric and corrected for artificial variability in individuals’ scores across time. Unfortunately, a greater number of calibration cells precludes a user-friendly age/language ‘prefix’ to the severity score, as mentioned in the introduction. The method described here necessarily defines autism severity in relation to individuals of similar age and language ability. When using these scores clinically and for research, one must keep in mind the age/language level of the child/sample, as there clearly will be developmental and adaptive functioning differences among children with the same severity score on this 10-point scale. This is true of all standardized scores. Calibrated severity scores do not measure functional impairment, but are intended to provide a marker of severity of autism symptoms relative to age and language level. The module a child can be given depends on his/her expressive language level, and thus will continue to be an important indicator of adaptive functioning for most children.
The dataset described here included children from various areas in the United States, both urban and rural. The sample comprised both consecutive clinic referrals and research participants. While this is likely a representative sample for a North American clinical research center, it is worth examining how referral bias might have influenced these calibrated scores. Though the dataset was large (N=1807 assessments from children with ASD), its division into age/language cells for calibration resulted in a few small cell sizes. Referral patterns also differed across cells. For example, children under age 5 who are not language delayed are unlikely to be referred for an evaluation unless they exhibit notable ASD symptomatology, so we would expect scores in these cells to be concentrated in the higher end of the range of ADOS scores. Another referral bias involved the tendency for children of higher severity to have more clinic reevaluations than those with less pronounced features of ASD. Indeed, the mean severity scores across the 18 calibration groups ranged from 6.64 (in young children with fluent speech) to 8.10 (in older children with phrase speech only), indicating that severity scores are still somewhat influenced by developmental level and referral bias.
After attempting a number of methods for standardizing ADOS scores, we believe that the present method of using ADOS diagnostic classifications to ‘anchor’ severity scores best controls for recruitment effects that would be present in any large clinical research sample, and therefore results in a metric more likely to be generalizable across datasets. If a cell in this calibration sample had predominantly high- or low-scoring children, this restricted range would be assigned only to severity scores associated with one classification (autism, ASD, or nonspectrum), allowing for more variability in other datasets across the other possible classifications. Ideally this method circumvents to some degree the inevitable effects of recruitment. Anchoring severity scores to ADOS classification instead of clinical diagnosis also avoids conflicting dimensional and diagnostic assignment. Within the present method, severity scores reflect ADOS raw totals regardless of the participant’s diagnosis, so a child with a non-ASD best estimate diagnosis potentially could receive a score of 6 on the metric while a child with autism receives a 3, if the former child showed more autistic symptomatology relative to his/her age and language within that 45-minute assessment than did the child with autism.
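The anchoring idea can be sketched for a single calibration cell as follows. This is a simplified illustration, not the calibration procedure itself. The severity bands assigned to each classification (1-3, 4-5, 6-10), the raw-total cutoffs, the equal-width percentile slices, and the names `BANDS`, `classify`, and `build_cell_mapping` are all assumptions made for the example.

```python
from bisect import bisect_right

# Assumed severity bands per ADOS classification (illustrative only).
BANDS = {"nonspectrum": (1, 3), "asd": (4, 5), "autism": (6, 10)}


def classify(raw, asd_cutoff, autism_cutoff):
    """Assign an ADOS classification from a raw total and two cutoffs."""
    if raw >= autism_cutoff:
        return "autism"
    if raw >= asd_cutoff:
        return "asd"
    return "nonspectrum"


def build_cell_mapping(raw_totals, asd_cutoff, autism_cutoff):
    """Map each observed raw total in one cell to a severity score.

    Raw totals are ranked only against other totals with the same
    classification, so a cell dominated by high scorers affects only
    the scores in the 'autism' band.
    """
    mapping = {}
    for label, (low, high) in BANDS.items():
        in_band = sorted(
            r for r in raw_totals
            if classify(r, asd_cutoff, autism_cutoff) == label
        )
        if not in_band:
            continue  # no calibration data for this classification
        n_scores = high - low + 1
        for raw in set(in_band):
            at_or_below = bisect_right(in_band, raw)
            # Integer form of ceil(percentile * n_scores) - 1.
            step = (at_or_below * n_scores - 1) // len(in_band)
            mapping[raw] = low + step
    return mapping


# Hypothetical raw totals and cutoffs for one age/language cell.
cell_totals = [2, 3, 5, 7, 8, 9, 10, 12, 14, 15, 16, 18, 20, 22]
cell_mapping = build_cell_mapping(cell_totals, asd_cutoff=7, autism_cutoff=9)
```

Because ranking happens within each classification, a raw total just above the autism cutoff always receives a score of at least 6 in this sketch, however the rest of the cell is distributed.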
More work is needed to test the validity and utility of this calibrated severity metric. Module change, especially into Module 3 (fluent speech), may inflate an individual’s severity score. Some longitudinal variation in these scores is expected, but the purpose of the metric is to measure change beyond typical variation in ASD. For this reason, we accepted a ceiling effect, in which approximately 20% of ASD assessments with ‘autism’ ADOS classifications receive the highest severity score of 10, rather than drawing out the distribution of the metric at the cost of less meaningful differences between scores. We hope to further examine patterns of severity score change over time in a longitudinal sample, identifying trajectory classes and the risk variables that predict class membership.
Another future direction is to calibrate the Social Affect and Restricted, Repetitive Behavior (RRB) domains of the revised ADOS algorithms separately in order to measure severity within these symptom domains. This process will need to employ a different method of mapping raw scores onto a severity metric, because each domain has a smaller range of possible raw totals than the overall score (with a maximum of only 8 points for the RRB domain).
Although based on a large sample, this is not a metric of symptom severity in a “true” ASD population because ADOS data on such samples do not exist at present. As larger population studies become available, the metric should be recalibrated within those samples for a more accurate reflection of the distribution of ADOS scores in the ASD population.
These results also may be influenced by the historical period in which some of the data were collected. This sample grew over a 16-year period in which patterns in ASD identification evolved. As greater numbers of children are identified at earlier ages (thus including milder cases at younger ages), it is possible that severity scores might have been assigned differently to raw totals if only recently collected data were used.
The ADOS calibrated severity metric represents a step towards achieving greater comparability of scores across time, age, and module, and is less influenced by verbal IQ than raw scores. Therefore, it should provide a better measure of ASD severity than other methods currently available, including ADOS raw total scores. This metric must be replicated in a large independent sample. To test the validity of the metric, calibrated scores should be used to track observed changes in ASD severity against sources of convergent validity.
Calibrated scores could be used to predict outcome, changes in adaptive skills over time, and associations between severity of core features and clinical characteristics such as behavior problems, peer relationships, and school achievement. This metric may also prove useful in interpreting results from studies of the effectiveness of interventions, and in characterizing samples for genetic and neurobiological research. An important reminder, however, is that the calibrated severity metric is based on a relatively brief, office-based observation with a clinician, and thus is only one part of a necessarily broader picture of the strengths and difficulties of a child with ASD.