Clinical best estimate diagnoses of specific autism spectrum disorders (autistic disorder, pervasive developmental disorder-not otherwise specified, Asperger’s disorder) have been used as the diagnostic gold standard, even when information from standardized instruments is available.
To determine if the relationships between behavioral phenotypes and clinical diagnoses of different autism spectrum disorders vary across 12 university-based sites.
Multi-site observational study collecting clinical phenotype data (diagnostic, developmental and demographic) for genetic research. Classification trees were employed to identify characteristics that predicted diagnosis across and within sites.
Participants were recruited through 12 university-based autism service providers into a genetic study of autism.
2102 probands (1814 males) between 4 and 18 years of age (M age=8.93, SD=3.5 years) who met autism spectrum criteria on the Autism Diagnostic Interview–Revised and Autism Diagnostic Observation Schedule and had a clinical diagnosis of an autism spectrum disorder.
Best estimate clinical diagnoses predicted by standardized scores from diagnostic, cognitive, and behavioral measures.
Though distributions of scores on standardized measures were similar across sites, significant site differences emerged in best estimate clinical diagnoses of specific autism spectrum disorders. Relationships between clinical diagnoses and standardized scores, particularly verbal IQ, language level and core diagnostic features, varied across sites in weighting of information and cut-offs.
Clinical distinctions among categorical diagnostic subtypes of autism spectrum disorders were not reliable even across sites with well-documented fidelity using standardized diagnostic instruments. Results support the move from existing sub-groupings of autism spectrum disorders to dimensional descriptions of core features of social affect and fixated, repetitive behaviors, together with characteristics such as language level and cognitive function.
In the field of autism spectrum disorders (ASD), diagnostic instruments have been helpful in defining populations,1 merging samples,2 and comparing results across studies.3,4 Nevertheless, best estimate clinical diagnoses (BEC) have long been the gold standard.5,6,7 In single-site studies, BEC diagnoses added information to standardized instruments to predict later diagnoses8,9 and classify children according to developmental trajectories of adaptive and language functioning.10,11 However, researchers have recently expressed skepticism about the scientific and clinical value of categorical ASD groupings in DSM-IV-TR12 and ICD-1013 (i.e., autistic disorder (AUT), pervasive developmental disorder-not otherwise specified (PDD-NOS), Asperger’s disorder (ASP)), upon which BEC diagnoses are based.5,14,15
The Simons Simplex Collection (SSC) is a multi-site project, aiming to study de novo genetic variations in families that have one child with ASD and one or more unaffected siblings. Diagnostic parameters for probands were intentionally set to include common forms of ASD: AUT, PDD-NOS and ASP. Stringent requirements for training and maintenance of reliability in the selection, administration and scoring of standardized instruments and cognitive tests were set. However, there was a deliberate decision to provide no specific training in diagnosis; rather, senior clinicians were asked to consider all available information to make BEC diagnoses (AUT, PDD-NOS, ASP) using DSM-IV-TR criteria as they normally would in their practices, thereby allowing examination of relationships between BEC diagnoses of different ASDs, demographics, and standardized developmental and behavioral phenotype measures across sites. This design allows us to assess whether there are differences in BEC diagnoses of children with ASD across sites that are not associated with differences in characteristics of the children, but rather that are associated with site- and clinician-based differences in how information is used to make diagnoses.
A total of 2102 probands aged 4 to 18 years were evaluated at 12 university-based centers. To prioritize children more likely to have de novo copy number variations, inclusion criteria for probands were: a) meeting criteria for ASD on the Autism Diagnostic Observation Schedule (ADOS16), b) meeting Collaborative Programs for Excellence in Autism (CPEA) ASD criteria on the Autism Diagnostic Interview-Revised (ADI-R17), which has less stringent cut-offs for social and communication domains than “autism” criteria and no requirement for repetitive behaviors or age of onset,3,18 c) having a nonverbal mental age of at least 18 months and d) a BEC diagnosis of AUT, PDD-NOS or ASP (see www.sfari.org15). Families were excluded if the proband had significant hearing, vision or motor problems likely to affect interpretation of behavioral data. Because of the focus on de novo variations, families were also excluded if any known relative, third degree or closer, had ASD; if a sibling had substantial language or psychological problems related to ASD; or if the proband had Fragile X, Tuberous Sclerosis, Down Syndrome or a significant early medical history (e.g., very low birthweight). Sites contributed between 97 and 229 families.
Each proband was administered the ADOS, and a hierarchy of cognitive tests was implemented across sites, with 88% receiving the Differential Ability Scales, Second Edition (DAS-II19), 7% the Mullen Scales of Early Learning (Mullen20) and 2-3% each receiving the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV21), Wechsler Abbreviated Scale of Intelligence (WASI22), or other scales. Parents were interviewed using the ADI-R and Vineland Adaptive Behavior Scales, Second Edition (Vineland-II23), and completed questionnaires, including the Aberrant Behavior Checklist (ABC24). Parents provided informed consent and children provided assent, approved by Institutional Review Boards at each university.
Examiners attended standard research trainings and maintained research reliability with project consultants through semi-annual workshops and video scoring (details in eMethods). Following review of all information and observing the proband in person or on video, the senior clinician (47 psychologists, 6 physicians -- psychiatrists, pediatricians, a clinical geneticist -- and 3 master’s level clinicians) specified a BEC diagnosis of AUT, PDD-NOS or ASP according to DSM-IV-TR criteria. Clinicians’ years of experience in ASD ranged from less than 5 to more than 20 (see Table 1). Because one goal was to examine the contribution of BEC diagnoses in a protocol that asked experienced clinicians to consider information as they would in other research or their own practice, no training was provided in clinical diagnoses of ASD.
Relevant proband characteristics were classified as diagnostic [ADI-R standard algorithm domain totals: ADI-Social, ADI-R Verbal Communication (ADI-VC), ADI-R Nonverbal Communication (ADI-NVC), Restricted and Repetitive Patterns of Behavior (ADI-RRB); ADOS domain scores: Social + Communication (ADOS-S+C) and Restricted Repetitive Behavior (ADOS-RRB) totals from Modules 1-4, Social Affect (ADOS-SA) from revised algorithms25 for Modules 1-3; Calibrated Severity Scores (ADOS-CSS) from Modules 1-326] or demographic, developmental and behavioral [age, gender, race, ethnicity, maternal education, site, verbal IQ (VIQ), performance IQ (NVIQ), Vineland Adaptive Behavior Composite (Vineland-Composite), and irritability and hyperactivity scores from the ABC]. We also considered diagnosticians’ characteristics (type of degree; years of experience).
Differences between sites were assessed as follows. Continuous variables were described through minimum and maximum values, means, and standard deviations within each site. Distributions of continuous characteristics were approximated using kernel density estimation27; site densities were overlaid for visualization. Variance was partitioned into within- and between-site variances using mixed effects models28 for means that included random site effects. Intra-class correlation coefficients (i.e., the ratios of between-site to total variance) are reported. Sites significantly deviating from the rest with respect to mean values were identified based on tolerance bands under the assumption of no differences between sites, employing permutation tests.29,30 Categorical measures were described through ranges of proportions across sites; site differences were assessed with χ2 tests for independence.
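As an illustration of the intra-class correlation described above, the following minimal sketch estimates the ratio of between-site to total variance with a one-way ANOVA (method-of-moments) estimator. This is a simplification swapped in for readability: the study fit mixed effects models with random site effects, and the function name and toy scores below are hypothetical.

```python
from statistics import fmean


def icc_oneway(groups):
    """Between-site / total variance, estimated by one-way ANOVA moments.

    `groups` is a list of lists, one list of scores per site. This is NOT the
    mixed-model fit used in the study, only a simple stand-in estimator.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size for unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    # Negative moment estimates are truncated at zero.
    var_between = max(0.0, (ms_between - ms_within) / n0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


# Hypothetical scores from three sites (toy numbers, not study data).
toy_sites = [[10, 12, 14, 11], [11, 13, 12, 10, 14], [12, 11, 13]]
print(round(icc_oneway(toy_sites), 3))
```

A value near zero means that almost all variation lies between children within sites rather than between sites, which is the pattern the text reports for the standardized scores.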
To investigate how BEC diagnosis was associated with behavioral domains from diagnostic measures of ASD and whether there were differences between sites in using demographic, developmental and behavioral measures in making BEC diagnoses, we employed the recursive partitioning technique, CART31 (classification and regression tree). CART is a statistical technique for discovering relationships between variables. It contrasts with more familiar linear and generalized linear models, which evaluate and test the significance of relationships of known forms. CART is particularly well suited here because we do not know how various specific diagnostic features influence clinicians’ decisions about distinctions among ASDs, whether scores on standardized instruments are linearly related to BECs, or if the same relationship between one scale and diagnosis exists for all levels of other scales (e.g., interactions between scales). In such situations, CART can reveal relationships between variables that might go unnoticed using other analytic techniques and generates empirically-derived cut-points within continuous variables. It is important to note, however, that CART is not a probabilistic model, which means that formal inferences regarding the significance of predictors cannot be made (see details in eMethods).
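To make the idea of an empirically derived cut-point concrete, the sketch below searches one continuous score for the threshold that minimizes weighted Gini impurity, which is one common splitting criterion for classification trees. It shows a single split only, not a full tree and not the rpart implementation; the function names and the toy scores and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_cut_point(scores, labels):
    """Return (threshold, weighted_impurity) for the best binary split.

    Candidate thresholds are midpoints between adjacent distinct scores.
    Returns (None, parent_impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # cannot split between identical scores
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = ((low + high) / 2, weighted)
    return best


# Toy data: hypothetical social-communication scores and diagnoses.
toy_scores = [5, 7, 9, 13, 15, 18]
toy_labels = ["PDD-NOS", "PDD-NOS", "PDD-NOS", "AUT", "AUT", "AUT"]
print(best_cut_point(toy_scores, toy_labels))
```

A tree-growing procedure repeats this search over every predictor at every node, which is how different cut-points on the same scale can arise in different branches.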
In the CART analyses, we sequentially fit models, adding groups of predictors at each step. This is akin to forward variable selection in classic regression analysis. The order for inclusion of sets of predictors of BEC diagnosis was: CART.1 included only diagnostic scales and clinician characteristics; CART.2 included diagnostic scales, clinician characteristics and site; CART.3 included diagnostic scales, clinician characteristics and site, as well as proband demographic, developmental and behavior characteristics. Finally, separate CART models were fit for each site.
Tree models were first fully ‘grown’ and then ‘pruned’ (see eMethods). All analyses were performed with R32 using the recursive partitioning library rpart. Due to space constraints, the main text focuses on the CART.2 model, with a brief discussion of CART.1 and CART.3 (see eResults for details).
Results from parametric models regarding site differences are also presented using classic inferential procedures. After diagnostic scales associated with BEC as outcome were identified in CART.1, we fit logistic regression models for AUT vs. PDD-NOS or ASP and for ASP vs. AUT or PDD-NOS as functions of these scales and clinician characteristics. We then fit models that added site as either a fixed or random effect and tested interactions for site by each scale. Finally, the first model was compared to the second two models to assess the effect of site, using likelihood ratio tests.
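For readers less familiar with likelihood ratio tests, the comparison of nested models described above takes the standard form sketched below. The notation is introduced here for illustration and is not the authors': \ell_0 and \ell_1 denote the maximized log-likelihoods of the model without site and the model with site, respectively.

```latex
\[
\Lambda \;=\; -2\,\bigl[\ell_0(\hat\theta_0) - \ell_1(\hat\theta_1)\bigr]
\;\overset{\text{approx.}}{\sim}\; \chi^2_{q},
\qquad
q \;=\; \dim(\theta_1) - \dim(\theta_0).
\]
```

With site entered as a fixed effect across 12 sites, the main effect alone adds 11 parameters, and each site-by-scale interaction adds more. When site is a random effect, the variance parameter under test lies on the boundary of its parameter space, so the \chi^2 reference distribution is only approximate.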
As shown in Figure 1, statistically significant differences emerged across sites in the proportion of probands assigned to the three ASD diagnostic categories (AUT, ASP and PDD-NOS) using BEC diagnoses, χ2(22)=358, p<0.001. Two out of 12 sites gave fewer than half of the probands AUT diagnoses, while one site gave AUT diagnoses to all probands (see Table 1). Two sites gave PDD-NOS diagnoses to more than 40% of probands. Sites also showed significant differences in the proportion of probands receiving diagnoses of ASP, ranging from 0 to nearly 21%.
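The degrees of freedom reported above follow from the shape of the contingency table: 12 sites by 3 diagnoses gives (12 − 1) × (3 − 1) = 22. The minimal sketch below computes the Pearson χ2 statistic and its degrees of freedom for a table of counts. The function name and the toy counts are hypothetical, and the p-value is omitted because the Python standard library has no χ2 distribution function.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts (e.g. one row per site,
    one column per diagnosis). Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Toy counts for three hypothetical sites (columns: AUT, PDD-NOS, ASP).
toy_table = [
    [80, 15, 5],
    [45, 45, 10],
    [60, 20, 20],
]
statistic, df = chi_square_independence(toy_table)
print(f"chi2({df}) = {statistic:.2f}")
```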
Because the sites were clinics known for different strengths, differences in recruitment were expected to yield site differences in behavioral phenotypes and demographics. The question is the degree to which differences in particular ASD diagnoses across sites related to differences in the children, either in specific diagnostic, or other features, or to differences in the clinicians and their use of information about the children.
In contrast to differences in BEC diagnoses, sites showed no statistically significant differences in ASD diagnostic classifications yielded by standardized instruments (see Table 1). In part, this was a function of the CPEA-defined ASD diagnostic criteria (see 18), which require relatively mild social-communication deficits on both the ADI-R and ADOS, but do not require the presence of any repetitive behavior.
Though there was substantial variation in measures of core features of ASD and developmental scores across individual children within sites, distributions were surprisingly similar across sites (see Table 2), with only one site falling outside a 99.5% tolerance band (compared to 11 other sites) on the ADI-Social and ADI-Communication domains and none on the ADI-RRB score. Site density distributions and permutation tolerance bands of ADI-Social, ADOS-RRB domains and NVIQ are shown online in eFigure 1 as examples (additional figures available upon request). All but 14 participants met ADI-R criteria for onset of symptoms before 3 years.17
Patterns of across-site variability for ADOS domain scores were similar. No site-related intraclass correlation exceeded 0.07 (see eResults for further explanation). Thus, the large site differences in BEC diagnoses were not accompanied by equivalent differences in standardized diagnostic scores.
Mean chronological age was 8.93 years (SD 3.5), with similar distributions of age across sites (see Table 2). The sample comprised 1814 males and 288 females; male:female ratios ranged from 5:1 to 9:1 across sites, but these differences were not statistically significant. Maternal education was high and homogeneous. Participants from all but three sites were 70% to 90% Non-Hispanic Caucasian, with 4% indicating Asian-American and 4% African-American ancestry and 8% more than one race. Mean IQs were relatively high; Vineland-Composite scores were lower, with less variation within and across sites.
Following the sequential model fitting strategy outlined earlier, classification trees were grown for BEC diagnosis using different sets of predictors. Details of CART.1 are presented in eResults and eFigure 2, with a brief description here. The most powerful predictor selected was ADOS-S+C, a standard measure of clinician-observed social-communication available for all participants. The 61% of children with moderate to severe social-communication deficits were primarily classified as AUT; diagnoses for the remaining 39% of children with milder social-communication deficits, including most of the children with BEC PDD-NOS and ASP diagnoses and about one-third of the children with BEC AUT, showed interactions with a series of predictors, including ADOS module and calibrated severity scores, each of the ADI-R domains, and clinicians’ years of experience and type of degree. Even the smallest nodes were heterogeneous across different ASDs. More experienced diagnosticians gave a higher proportion of AUT diagnoses; Ph.D.-level clinicians used PDD-NOS as a diagnosis more often than M.D.s or master’s level clinicians. This model reduced the misclassification error from 0.30 (with random assignment based on prevalence) to 0.24, a 20% reduction in misclassification rate (explained error, which corresponds to percent-explained variation in a linear regression).
The CART.2 model (Figure 2) added site as a predictor. The first branching was identical to CART.1. However, in CART.2, the second step in both right and left branches was site, indicating that site differences accounted for more variance in BEC diagnoses than any other factor after ADOS-S+C. When site was included in the model, most effects of clinician characteristics disappeared.
In general, similar biases affected several sites at a time. For example, ADOS-S+C ≥12 (left branch) was the only information used in 9 of 12 sites; of the children who had moderate to high observed social-communication deficits at these 9 sites, 91% were given BEC diagnosis of AUT. In the 3 remaining sites, additional information was associated with differentiation of PDD-NOS, ASP, and AUT.
Site differences also appeared at several steps in the right branch, indicating interactions between site and diagnostic scales for children with less severe social-communication deficits. “Walking through” the first few steps of the right branch of CART.2 (which includes the 825 children with relatively mild social-communication scores, <12, on the ADOS), five sites in the left sub-branch (acfgi) made proportionately more AUT diagnoses than the other seven sites, with one site (g) giving only AUT. Four of these five sites further differentiated children using ADOS-CSS (which takes into account age, language level and RRBs). Children with less severe ADOS-CSS scores were split by site again, with two sites (af) further split by abnormalities in parent reports of children’s verbal communication (ADI-VC).
The seven sites in the rightmost sub-branch (bdehjkl) predominantly gave children with milder social-communication impairments PDD-NOS BEC diagnoses. The ADOS-CSS was again taken into account, with children scoring <6 (milder severity) receiving mostly PDD-NOS diagnoses and those ≥ 6 given any of the three ASD diagnostic classifications depending on parent-reported historical accounts (ADI-Social and ADI-RRB), as well as diagnosticians’ years of experience. When differentiation by site was included, misclassification error rate improved from 0.24 to 0.21 (29% reduction of the total misclassification rate, which constitutes 9% improvement over CART.1).
Site was a very important factor, both as a main effect and in interaction with diagnostic scales, based on comparisons of CART.1 (using only the diagnostic scales and clinician characteristics) to CART.2 (which also included site and site-by-scale interactions). All p-values comparing the respective nested models were highly significant (p<1e−10). From the models where site was treated as a random factor, the variances of the random effects for site and site-by-covariate were quite large - the coefficient of variation (CV=SD of the random effect/mean effect of the covariate) ranged from 0.33 to 4.95, with the largest CV corresponding to the interaction between site and ADOS-CSS, indicating variability between sites in interpreting observed overall severity of autism symptoms in the context of children’s ages and language levels.
In CART.3 (see eFigure 3), demographic, developmental and specific behavioral characteristics were added. The primary difference from previous CART models was that, among children with moderate to severe social-communication deficits, the most important factor for BEC diagnosis became VIQ. When children had ADOS-S+C >12 and VIQs <85, 93% received AUT diagnoses across all sites.
In contrast, BEC diagnoses of children with ADOS-S+C >12 and VIQ >85, or children with milder ADOS-S+C (<12; right branch), were affected by site differences and many different interactions with each of the diagnostic variables at different stages, as shown in CART.3. Splits were also made on VIQ and NVIQ at a number of places in the tree with cut-offs in IQ ranging from 85 to 122, depending on site. There were no effects of gender, ethnicity/race or maternal education, but there were effects of chronological age, adaptive behavior, and hyperactivity. When demographic, developmental and behavioral measures were included, the misclassification rate decreased to 0.17 (43% reduction of the total misclassification rate, which is an improvement of 23% compared to CART.1 and 14% compared to CART.2).
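The explained-error figures reported for the three models can be approximated from the rounded misclassification rates given in the text. The sketch below is illustrative only, and the function name is hypothetical. Because it starts from rounded rates, it gives 30% rather than the reported 29% for CART.2, which was presumably computed from unrounded rates. The reported improvements of 9%, 23% and 14% appear to be differences in percentage points between the explained-error values (29 − 20, 43 − 20 and 43 − 29).

```python
def explained_error(baseline_rate, model_rate):
    """Proportional reduction in misclassification relative to a baseline."""
    return (baseline_rate - model_rate) / baseline_rate


# Rounded rates as reported in the text; 0.30 is the prevalence-based baseline.
BASELINE = 0.30
REPORTED_RATES = {"CART.1": 0.24, "CART.2": 0.21, "CART.3": 0.17}

for name, rate in REPORTED_RATES.items():
    print(f"{name}: {explained_error(BASELINE, rate):.1%}")
# CART.1: 20.0%
# CART.2: 30.0%
# CART.3: 43.3%
```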
Individual trees were generated for each site using diagnostic, developmental and demographic variables as predictors. The numbers, although smaller compared to the number of participants used for CART.1-3, are sufficient to have relative confidence in the results (n from 97 to 229). In order to test the stability of models, results for CART.2 generated from the first half of the sample (n=933) were applied to the second half of the sample (n=1169). Misclassification rates were nearly identical (0.23 vs. 0.25; see eMethods). Presented online in eFigure 4 are models for 11 out of 12 sites, omitting the site where all probands had AUT.
Several findings from the 11 individual-site CART models were striking. As shown in eTable 1, VIQ was the single feature most related to BEC diagnoses in five sites and the second or third strongest predictor in five others (see eFigure 4). However, sites differed markedly in VIQ cut-points and in whether IQ was associated with differentiating AUT from PDD-NOS/ASP or AUT/PDD-NOS from ASP. The next most frequent predictors across sites were ADOS social-communication or repetitive behaviors, emerging first in four and two sites respectively. For 9 sites, one of these three measures predicted an entire “node” of diagnosis, in most cases AUT, but in one case ASP. Six sites had age effects, primarily such that ASP diagnoses were given to older children, though the age cut-points varied from 5.25 to 12 years. Cut-points for AUT vs. PDD-NOS/ASP for the ADOS-S+C domain varied from 8 to 16. Only one site had an effect of gender and also of maternal education.
Findings of differences in BEC diagnoses related to the training or level of experience of senior diagnosticians appeared to be accounted for by site differences in almost all cases, though the direction of effect (whether senior clinicians influenced others in their sites) cannot be determined. Within sites, clinician differences did not have significant effects on BEC diagnoses.
Several conclusions are inescapable. In these 12 university-based sites, with research clinicians selected for their expertise in ASD and trained in using standardized diagnostic instruments, there was great variation in how best estimate clinical (BEC) diagnoses within the autism spectrum (i.e., autistic disorder, PDD-NOS, Asperger’s disorder) were assigned to individual children. Clinical diagnoses were not random. It is not surprising that clinicians often feel strongly that their distinctions among the various ASD diagnoses mean something. However, while patterns within and across the sites were clearly discernible, they were idiosyncratic and complex.
Despite the fact that the sample was somewhat restricted in age and skewed in IQ, and that children were required to meet minimal ASD criteria on the ADI-R and ADOS, we anticipated recruitment differences associated with different referral populations. Had these restrictions not been in place, even greater site differences might have been expected. Nevertheless, in contrast to differences in BEC diagnoses, differences in distributions among children’s scores on standardized diagnostic measures across sites were almost never significant. Observational (ADOS) summary scores and verbal IQ, as well as children’s ages, parent-reported (ADI-R, ABC) information about repetitive behaviors, communication abnormalities and hyperactivity influenced diagnoses in many sites. However, careful examination suggested that patterns within sites varied considerably in how and when (along a decision tree), they took into account different factors in deciding which diagnosis to apply to children within the spectrum. Though predictors overlapped across sites, they also differed markedly in “cut-points” (e.g., individual site VIQ cut-points between AUT/PDD-NOS and ASP ranged from 62 to 127) and the order in which information was used.
Differences in BEC diagnosis could reflect regional variation. For example, in some regions, children with diagnoses of AUT receive different services than children with other ASD diagnoses; elsewhere, AUT diagnoses may be avoided as more stigmatizing than diagnoses of PDD-NOS or ASP.
An important concern is the stability of findings based on CART models, which are tools for discovery rather than for hypothesis testing and inference. To assess this, we evaluated how well the models developed on the 1,169 most recent participants compared to those developed on the first 933 subjects in the data collection. The misclassification rates for both models were very similar (see eMethods).
Another potential concern for the results presented in eFigure 4 is the relatively small sample size for the individual sites. Again, results for models developed on the most recent 1,169 participants were nearly identical to those from the first 933 participants, except that maternal education played less of a role in the larger sample and one site had a gender effect.
Previous research2,9 has shown that within a site, clinicians’ diagnoses can add information to standardized scores. With consistent application of BEC decision rules, or if standard training had been offered, BEC diagnoses might have been an important source of information in this study. However, given the evidence that there is little standard meaning of BEC diagnoses across sites, their utility in research is questionable.
These results have implications for revisions of current diagnostic frameworks such as DSM-V and ICD-11. Recurrent evidence of the importance of information external to a psychiatric diagnosis, particularly verbal IQ and current language level (e.g., ADOS module), supports the need for cognitive function and language level to be considered as essential to BEC diagnoses of ASD. Diagnostic classifications based upon retrospectively-recalled information from the ADI-R were not as useful as expected in discriminating groupings within the autism spectrum in this selected population, perhaps because there was so little variability. However, dimensional observational and parent-report measures of social communication and repetitive behaviors clearly contributed to clinical diagnoses. Within these 12 sites with experienced and well-trained staff, distributions of dimensional measures of standardized instruments were much more consistent than categorical BEC diagnoses. More precise diagnostic criteria might have improved them, but how to do this succinctly and address the range of developmental and individual variability in ASD is not clear. As others have suggested,33 the conceptualization and measurement of ASD as a behavioral diagnosis, based on different dimensions (e.g., social-communication and repetitive behaviors) that are strongly influenced by intelligence and language skills, may be more useful in providing links to brain function,34 genetics35 and services36 than clinical categorical diagnoses of autistic disorder, PDD-NOS or Asperger’s disorder.
We are grateful to all of the families at the participating SFARI Simplex Collection (SSC) sites. We appreciate obtaining access to phenotypic data on SFARI Base. Approved researchers can obtain the SSC population dataset described in this study [https://ordering.base.sfari.org/~browse_collection/archive[sfari_collection_v10_1]/ui:view()] by applying at https://base.sfari.org.
This research was funded by the Simons Foundation and NIMH to CL (R01 MH081873-01A1). The Simons Foundation had a role in the design and conduct of the study, including independent funding of a data management core to store data. Neither funding organization had a role in the analysis or interpretation of the data, or in the preparation, review, or approval of the manuscript.
CL receives royalties from the publisher of diagnostic instruments described in this paper. She gives all profits generated by the University of Michigan Autism and Communication Disorders Center (UMACC) and this and all other UMACC projects, including the SSC, to charity.