The current study was undertaken to evaluate the revised ADOS algorithms proposed by Gotham et al. (2007). The aim was to investigate the sensitivity and specificity of the revised algorithms in an independent sample of 558 children, and to investigate their contributions to a clinical ASD classification. The independent sample consisted of children referred for child-psychiatric problems/ASD and children from an epidemiological study of ASD in mental retardation. Thus, not all participants were referred for problems in the autism spectrum, yet all were evaluated by experienced clinicians. The sample contained children from three of the five divisions made by Gotham et al.: those administered module 1 (Some Words group only), module 2 (5 and Older only), and module 3. In general, the current sample was older and contained more clinical non-autism ASD classifications and lower-functioning non-spectrum cases than the sample of Gotham et al. (2007).
It is promising that the correlation between IQ and the revised algorithms reported by Gotham et al. (2007) in all modules (based on Verbal IQ and Nonverbal IQ) was not replicated in the current study, comparable to the replication study in the US (Gotham et al. 2008). The fact that no correlation was found with IQ or with age in any module indicates that the revised algorithm domains seem to be independent of these variables in the current sample, which is what Gotham et al. strove for. Nevertheless, this cannot be put forward as an improvement attributable to the new algorithms, since the same pattern was found for the original algorithm domains, with no correlations with age or IQ in any module. The comparison of data may be complicated by the fact that Gotham et al. (2007) measured VIQ and NVIQ, whereas the current IQ measures were based on various tests, resulting in differences between the outcomes. For some of the children, TIQs based on verbal and nonverbal subtests were available, whereas others could only complete nonverbal IQ tests. However, in the sample of Gotham et al. the correlations of both VIQ and NVIQ with the ADOS algorithms were significant, whereas the correlations in the current sample were not.
With respect to sensitivity and specificity, our results indicate that applying the revised algorithms improves the balance between these in modules 2 and 3, without losing strength with respect to efficiency of the classification. This is comparable to the results reported by Overton et al. (2008). The efficiency of the revised algorithms (i.e., the percentage of cases classified correctly) was better for AD than for non-autism ASD. This may be because some of the non-spectrum children (the clinical group) were referred for developmental or behavioral problems that gave reason to investigate whether or not ASD was apparent, and therefore may have scored on the ADOS without receiving an ASD diagnosis, or may not have scored on the ADOS while receiving an ASD diagnosis. Amongst children with MR, the distinction between AD and non-autism ASD is even less clear than in a normally intelligent population, as is the discrimination between ASD and non-spectrum in some cases. It is important to keep in mind that the ADOS does not reflect the distinctions made by clinicians between AD and non-autism ASD (see also Lord et al. 2000). Additionally, even the combination of ADOS and ADI-R does not lead to a perfect discrimination between ASD and non-spectrum cases, which indicates that the actual criteria for such a diagnosis are not yet clear enough (Risi et al. 2006).
The balance between sensitivity and specificity is important, since a very specific instrument without sufficient sensitivity tends to miss cases, whereas a very sensitive measure without sufficient specificity would be overinclusive. In an earlier study in a Dutch, low-functioning population, the ADOS tended to be overinclusive, i.e., to have a high sensitivity without a correspondingly high specificity (De Bildt et al. 2004). The currently reported balance implies that this is less the case with the revised algorithms in the current (only partially overlapping) sample. However, in module 1, Some Words, of the current sample, the balance did not improve, because specificity decreased when sensitivity increased.
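The three quantities discussed here can be made concrete with a minimal sketch. It assumes a simple 2x2 cross-tabulation of ADOS classification against clinical classification; the class name, function names, and the counts in the example are hypothetical and are not taken from the study data. Efficiency is computed as the proportion of cases classified correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 table: ADOS classification vs. clinical classification."""
    true_pos: int   # clinical ASD, ADOS classifies as ASD
    false_neg: int  # clinical ASD, ADOS classifies as non-spectrum
    true_neg: int   # clinical non-spectrum, ADOS classifies as non-spectrum
    false_pos: int  # clinical non-spectrum, ADOS classifies as ASD


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of clinical ASD cases that the instrument identifies."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of clinical non-spectrum cases that the instrument excludes."""
    return c.true_neg / (c.true_neg + c.false_pos)


def efficiency(c: ConfusionCounts) -> float:
    """Proportion of all cases classified correctly."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return (c.true_pos + c.true_neg) / total


# Illustrative counts only (not study data): an overinclusive pattern,
# with high sensitivity but a clearly lower specificity.
example = ConfusionCounts(true_pos=45, false_neg=5, true_neg=30, false_pos=20)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
print(f"efficiency  = {efficiency(example):.2f}")
```

In this invented example the instrument misses few ASD cases but classifies many non-spectrum cases as ASD, which is the kind of imbalance described as overinclusive above.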
The question is whether this balance is reasonable to aim for in this sample. Not only is the population as a whole lower functioning than the previously reported samples, but the clinically classified non-spectrum cases are also lower functioning than the A(S)D cases and, in modules 1 and 3, older as well. This is because the majority of the non-spectrum cases were recruited from a population-based study of ASD in MR rather than referred for ASD (De Bildt et al. 2005).
Given the specific character of this population, it may be more difficult to obtain a satisfactory specificity in the current sample, which may in turn affect the balance between sensitivity and specificity: compared to the findings of Gotham et al. (2007, 2008), the current study reports lower values of specificity for all algorithms. As reported by Gotham et al. (2007), among nonverbal module 1 participants with very low nonverbal mental ages (≤15 months), specificity was 50% or more lower than among nonverbal children administered module 1 with mental ages of more than 15 months. Although all children in the current sample were verbal, the same issue may affect the results because of the very low IQs reported. This may especially be the case in module 1, Some Words, where the non-spectrum cases have a mean IQ of 30.
Another interesting issue with respect to distinguishing ASD from low-functioning children with MR without ASD is the addition of the RRB domain to the SA domain for a classification based on the ADOS. The current study indicates that including the RRB domain in addition to SA in the algorithm contributes significantly to the clinical ASD classifications in module 2, 5 and Older, and module 3, yet does not increase correct classification of ASD in module 1, Some Words. For modules 2 and 3 these results resemble what Gotham et al. (2007, 2008) reported. For module 1, Some Words, the outcome is not surprising, since RRBs are important features of low-functioning children with MR as well, and may not be specific for or point to an ASD diagnosis in such a population.
Due to the restricted behavioural repertoire shown by low-functioning children, together with the overlap with behaviours in children with ASD, it may be ambitious to strive for an instrument that is able to distinguish between the two without missing or overincluding cases. However, this should not be judged as a flaw of the algorithms or the ADOS; rather, it is inherent to the nature of the two disorders, because of their behavioural resemblance.
Nevertheless, for at least some clinicians diagnosing ASD, this is daily practice: distinguishing the specific developmental disorder ASD from a more general developmental disorder or MR. Standardized and valid instruments would be of great value in this process.
The currently reported improved balance between sensitivity and specificity in relatively low-functioning children administered module 2 (5 and Older) and module 3 indicates that in these groups the revised algorithms do not lose any strength compared to the original ones, and therefore should be preferred. Adding RRB to the algorithm increases the discriminative power. For module 1, Some Words, an even lower-functioning group, the revised algorithms are less clearly preferred. Although the sensitivity increased, the specificity decreased, and the RRB domain did not add to distinguishing between ASD and MR. Further research in very low-functioning children is needed in order to obtain the perfect algorithm, or perhaps even a more applicable ADOS module. After all, the original algorithm of module 1 is not applicable to children with mental ages under 24 months, and although 15 months seems feasible with the revised algorithms, MR and ASD lead to the same behaviours in such low-functioning children.
Whether a high sensitivity (revised algorithms) or a high specificity (original algorithm) is preferred when a balance between the two is not feasible (which needs to be investigated further before drawing that conclusion) depends on the question behind the administration of the particular ADOS. For research it may be important to include only definite cases of ASD, resulting in the need for a higher specificity. Conversely, for other uses it may be important not to miss any possible ASD case, resulting in a need for a higher sensitivity. In any case, this discussion once again emphasizes that the ADOS should never be used as the only indication of whether an ASD is present, including in low-functioning children.
Unfortunately, we were not able to reliably investigate the children with MR separately, since their number in the current sample was too small, too thinly distributed across the various modules, and insufficiently spread across the classifications within these groups. With respect to the further development of the ADOS algorithm, investigating it in a large (combined) group with MR will increase knowledge of the value of the ADOS and its algorithms in this specific group.
Additionally, the number of cases that could be included in the current study may limit the interpretation of the findings. We could not include module 1, No Words, or module 2, Younger than 5. The current study therefore leaves open questions about the value of the revised ADOS algorithms in very young or low-functioning children. Additional studies of these subpopulations will contribute to the investigation of the value of the revised algorithms.
To conclude, our results corroborate the findings reported by Gotham et al. (2008) that the advantages of the revised algorithm (i.e., better representation of observed diagnostic features, increased comparability between modules in algorithm content and number, and improved predictive value for autism) have not come at great expense to sensitivity and specificity for module 2, 5 and Older, and module 3. With respect to module 1, in low-functioning children, more research is needed in order to reach a good balance between high sensitivity and high specificity.