Functional connectivity MRI data comprising 26.4 million “connections” per subject successfully classified subjects as autistic or typically developing, using a leave-one-out approach, with an accuracy of 60.0% (p < 2.2 × 10^-10) across a set of 964 subjects contributed by 16 different international sites. Overall specificity was 58.0% and overall sensitivity was 62.0%. The classifier consisted of a weighted average of connections and used no information about the left-out subject except for age, gender, site, and handedness. Using a weighted average of all 26.4 million connections resulted in a classification accuracy of 55.7% (p = 0.00017); the best accuracy (60.0%) was achieved for the subset of connections that satisfied p < 10^-4 for a difference between autism and control among the remaining subjects for each left-out subject. Classification scores significantly covaried with metrics of current disease severity, including ADOS-G (as opposed to ADI-R, which incorporates disease severity at early ages), SRS, and verbal IQ metrics. Classification accuracy was significantly higher at sites that used longer BOLD imaging times, but no relationship was found between the number of subjects contributed by a site and classification accuracy.
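The leave-one-out scheme summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the Welch t statistic with a hypothetical cutoff (`t_threshold`) stands in for the p < 10^-4 selection criterion, the t statistic itself is used as the connection weight, the covariates (age, gender, site, handedness) are omitted, and the data are synthetic. All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for the difference in means between two samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def leave_one_out_predict(X, y, t_threshold=3.0):
    """Classify each subject using only the remaining subjects.

    X: list of subjects, each a list of connection values.
    y: list of labels (1 = autism, 0 = control).
    t_threshold: hypothetical cutoff standing in for a p-value criterion.
    """
    n_subjects, n_connections = len(X), len(X[0])
    predictions = []
    for left_out in range(n_subjects):
        train = [s for s in range(n_subjects) if s != left_out]
        score = 0.0
        for c in range(n_connections):
            autism = [X[s][c] for s in train if y[s] == 1]
            control = [X[s][c] for s in train if y[s] == 0]
            t = welch_t(autism, control)
            if abs(t) < t_threshold:
                continue  # connection not selected for this left-out subject
            # Weight the subject's deviation from the group midpoint by t,
            # so a positive score means "more autism-like".
            midpoint = (mean(autism) + mean(control)) / 2.0
            score += t * (X[left_out][c] - midpoint)
        predictions.append(1 if score > 0 else 0)
    return predictions


def make_synthetic(n_per_group=20, n_connections=30, n_informative=5,
                   shift=2.0, seed=0):
    """Synthetic data: the first n_informative connections differ by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_connections)]
            if label == 1:
                for c in range(n_informative):
                    row[c] += shift
            X.append(row)
            y.append(label)
    return X, y


def accuracy(predictions, y):
    """Fraction of subjects classified correctly."""
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


if __name__ == "__main__":
    X, y = make_synthetic()
    print(accuracy(leave_one_out_predict(X, y), y))
```

The key property is that connection selection and weighting are recomputed inside the loop for every left-out subject, so the held-out subject never influences which connections are used to classify it.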
Classification accuracy was lower in this multisite study, despite its much larger sample size, when compared with a prior study using similar methods from a single site (Anderson et al., 2011d). The prior study achieved ~80% accuracy, with 90% accuracy for subjects under 20 years of age in both a primary cohort and a replication sample of affected and unaffected individuals from multiplex families. Several reasons may explain this difference. Expanding a classifier to accommodate multisite data necessarily involves dealing with many additional sources of variance. The pulse sequence, magnetic field strength, scanner type, patient cohort and recruitment procedures, scan instructions (eyes open vs. closed vs. fixation), BOLD imaging length, age distribution, gender differences, and population ethnicity all varied across sites. Each of these variables has the potential to decrease sensitivity and specificity of functional connectivity measurements for autism. Nevertheless, a multisite cohort helps test generalizability of the results across different samples, making it more likely that connections identified as discriminatory between autism and control reflect disease properties rather than particulars of a single dataset.
Classification accuracy in the multisite cohort varied with the subset of connections used to construct the classifier. This finding reflected a tradeoff between the improved accuracy gained by using more connections and the decreased accuracy caused by including less specific connections in the classifier. This result argues against a homogeneous regional distribution of connectivity abnormalities in autism and in favor of a heterogeneous spatial distribution of connectivity disturbances that involves specific brain regions. Analysis of the brain regions most affected by abnormal connections confirms the findings of previous reports: areas of greatest abnormality included the insula, regions of the default mode network including posterior cingulate and medial prefrontal cortex, fusiform and parahippocampal gyri, Wernicke's area (posterior middle and superior temporal gyrus), and intraparietal sulcus (Anderson et al., 2011a; Gotts et al., 2012). All of these regions correspond to functional domains that are known to be impaired in autism, including attention, language, interoception, and memory. We note that some of these regions are in brain areas with relatively high susceptibility artifact and sensitivity to changes in brain shape (such as the medial prefrontal cortex). However, given the coherent distribution of the default mode network, we favor an interpretation of network-based differences attributable to autism rather than underlying structural or artifactual sources of these findings.
In a previous study that interrogated subsets of connections from an independent dataset, grouped by the Euclidean distance between ROIs and by connection strength, we found that the most informative connections were typically strong connections between distant ROIs that were weaker in autism, and typically negatively correlated connections that were less negative (less anti-correlated) in autism (Anderson et al., 2011d). In the current study, the connection bins based on strength and distance that showed greatest classification accuracy were not precisely the same connection bins found previously. Rather, they were adjacent to the bins in the previous study. This is because the classification algorithm in the current study takes advantage of larger numbers of connections. There was again a tradeoff between using more connections, given that individual connections carried relatively little information, and using sets of connections that differed more in autism. Thus, bins of medium strength connections (0.3 < z < 0.5) outperformed the more specific bins of stronger connections (z > 0.5) because the slightly weaker bins included many more connections. This cautionary finding is relevant when attempting to identify the “optimal” set of connections for constructing candidate brain imaging biomarkers for ASD. Although specific regions appear to show connectivity abnormalities in autism, classification schemes using only a small number of connections are likely to suffer from the high variance in metrics for individual connections.
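The tradeoff between the number of connections and the specificity of each connection can be illustrated with a toy model that is not part of the original analysis. It assumes N statistically independent connections, each with the same standardized group difference d and unit variance, combined by simple averaging and classified by a midpoint threshold on balanced groups. Under those assumptions the effect size of the average grows as d√N. Real connections are correlated, so the gain in practice would be smaller. The function names and the values of d and N are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_effect_size(d, n_connections):
    """Effect size of the mean of n independent connections, each with effect d."""
    return d * math.sqrt(n_connections)


def expected_accuracy(d, n_connections):
    """Expected accuracy of a midpoint threshold on the averaged connections.

    Assumes two balanced, equal-variance Gaussian groups (toy model only).
    """
    return NormalDist().cdf(pooled_effect_size(d, n_connections) / 2.0)


if __name__ == "__main__":
    # Hypothetical bins: few strongly discriminating connections
    # versus many weakly discriminating connections.
    small_specific_bin = expected_accuracy(d=0.10, n_connections=4)
    large_weaker_bin = expected_accuracy(d=0.05, n_connections=100)
    print(small_specific_bin, large_weaker_bin)
```

In this sketch the larger bin of weaker connections yields the higher expected accuracy, which mirrors the pattern described for the medium-strength bins.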
This point is reinforced by a significant positive relationship between classification accuracy across sites and the length of BOLD imaging time per subject. Previous studies of test-retest reliability using functional connectivity MRI have shown that measurement error scales as one over the square root of BOLD imaging time (Van Dijk et al., 2010; Anderson et al., 2011c), with only moderate reproducibility when short BOLD imaging times such as 5 min are used (Shehzad et al., 2009; Van Dijk et al., 2010; Anderson et al., 2011c). This relationship suggests that classifiers using information from many brain regions continue to benefit from much longer imaging times, with continued improvements even after hours of imaging across multiple sessions per subject, to the extent this is practical (Anderson et al., 2011c). Improvements in pulse sequence technology may also facilitate acquisition of greater numbers of volumes in shorter periods of time (Feinberg and Yacoub, 2012). The correlation between total imaging time and accuracy was more significant than the correlation between number of volumes used after scrubbing and accuracy. This might indicate that imaging time is more important than the number of volumes used. As multiband acquisition protocols become more prevalent (Setsompop et al., 2012), it will be important to determine the extent to which finer sampling vs. longer imaging time will contribute to specificity of BOLD fcMRI measurements.
In a prior study that examined the effect of BOLD imaging time on the ability to distinguish functional connectivity values obtained from a single individual from a group mean, individual “connections” could only be reliably distinguished after 25 min of BOLD imaging time. The number of connections that could be reliably distinguished increased exponentially with imaging time up to at least 10 h of total imaging time (Anderson et al., 2011c). Indeed, there is a good theoretical basis for expecting that any desired accuracy can be obtained with sufficient imaging time, stretching into many hours. Although Van Dijk and colleagues report that the intrinsic connectivity measurements stabilize around 5 min of imaging time, they also state that noise continues to decrease at a rate of 1/sqrt(n), where n is the amount of imaging time (Van Dijk et al., 2010), which is in accordance with our findings (Anderson et al., 2011c). Moreover, they report that the stabilization is of composite network-level metrics rather than connections between small individual ROIs. In contrast, we have found that coarse network-level measurements are not particularly informative in classification compared to fine-grained metrics that take into account specific differences in the spatial distribution of connectivity. There may be no upper limit to the improvement obtainable with additional imaging time.
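The 1/sqrt(n) relationship has a simple practical consequence: halving the noise requires four times the imaging time. The sketch below works through that arithmetic relative to a 5 min reference scan. It assumes the 1/sqrt(n) scaling holds exactly, which may not be the case for temporally autocorrelated BOLD data. The function names and the 5 min default are illustrative choices, not values prescribed by the cited studies.

```python
import math


def relative_noise(minutes, reference_minutes=5.0):
    """Noise relative to a reference scan, assuming noise ~ 1/sqrt(imaging time)."""
    if minutes <= 0 or reference_minutes <= 0:
        raise ValueError("imaging times must be positive")
    return math.sqrt(reference_minutes / minutes)


def minutes_for_noise_fraction(fraction, reference_minutes=5.0):
    """Imaging time needed to reduce noise to `fraction` of the reference level."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return reference_minutes / fraction ** 2


if __name__ == "__main__":
    for minutes in (5, 20, 80, 600):
        print(minutes, round(relative_noise(minutes), 3))
```

Under this model, 20 min of imaging halves the noise of a 5 min scan, and each further halving again costs a fourfold increase in time, so improvements continue indefinitely but with diminishing returns per minute.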
We found significant relationships between the classification score and some behavioral measures, such as social function and daily living skills. However, the proportion of variance in the behavioral measures that was explained by the linear relationship with the classification score was small (between 0.5 and 2.9%). This may be due to the overall poor accuracy of the classification approach. As accuracy and techniques for combining multisite data improve, we also expect an increase in the proportion of variance accounted for by the correlations.
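For a simple linear relationship, the proportion of variance explained is the square of the Pearson correlation coefficient, so a range of 0.5 to 2.9% corresponds to correlation magnitudes of roughly 0.07 to 0.17. The sketch below shows the conversion; the function names are hypothetical, and the code only illustrates the arithmetic rather than reproducing the study's analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def variance_explained(x, y):
    """Proportion of variance in y explained by a linear fit on x (r squared)."""
    return pearson_r(x, y) ** 2


def correlation_magnitude(r_squared):
    """Magnitude of the correlation implied by a proportion of variance explained."""
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # Bounds reported in the text: 0.5% and 2.9% of variance explained.
    print(correlation_magnitude(0.005), correlation_magnitude(0.029))
```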
Additional benefits may be achieved through improved classification algorithms that take advantage of machine learning techniques to allow more effective weighted combinations of connections. Similarly, multimodal classifiers remain a promising, relatively untapped method for characterizing diagnostic and prognostic information about autism. Given classification accuracies of single site datasets exceeding 80% for structural MRI (Ecker et al., 2010a; Jiao et al., 2010; Uddin et al., 2011; Calderoni et al., 2012; Sato et al., 2013), diffusion tensor MRI (Lange et al., 2010; Ingalhalikar et al., 2011), positron emission tomography (Duchesnay et al., 2011), and magnetoencephalography (Roberts et al., 2010; Tsiaras et al., 2011; Khan et al., 2013), it would be of great interest to determine whether different modalities identify similar cohorts of subjects correctly, and whether a combination neuroimaging approach that leverages these different features might be able to achieve even greater accuracy than any one alone.
Although multisite datasets such as those in ABIDE are invaluable for testing replicability of neuroimaging findings in autism, they contain inherent limitations that should be recognized. Large inhomogeneities in acquisition parameters, subject populations, and research protocols limit the sensitivity for detecting abnormalities. These inhomogeneities may overwhelm the ability to discriminate many findings, and the large sample of subjects may lead to overconfidence that a result is definitive. There remains a need for replicating results in high-quality, carefully controlled individual datasets that may show increased sensitivity for some results compared to multisite data, as exhibited by classification accuracy in the present study. Preprocessing methods may also bias results in unpredictable ways, as has been suggested for head motion correction strategies (Power et al., 2012; Van Dijk et al., 2012) and regression procedures (Murphy et al., 2009; Anderson et al., 2011b; Saad et al., 2012). Datasets such as those in ABIDE will be of great value in testing multiple procedural manipulations in relatively large samples, allowing determination of optimal processing methods for specific questions. Ultimately, it is unknown whether differences in resting state functional connectivity in autism arise from differential performance of the “resting” task or from underlying differences in structural connectivity reflected in the measurements. Continuing comparison with structural metrics such as diffusion tensor imaging will help to clarify this point.
Nevertheless, it remains an attractive hypothesis that with longer imaging times, controlled acquisition strategies, integration of multimodal features, and improvement in classification methodology, neuroimaging may be able to contribute useful biological information to the clinical diagnosis and care of individuals with ASD and further elucidate pathophysiology and brain-based intermediate phenotypes.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.