Although previous studies have applied various machine-based classification techniques to brain imaging measures in attempts to diagnose people with various neuropsychiatric illnesses, none to our knowledge have achieved a similar degree of accuracy across as wide a range of neuropsychiatric illnesses as those of the present study. We attribute this success to several unique features of our classification strategy. First, we applied these classification techniques across multiple individually and accurately defined brain regions, rather than to a single image of the entire brain, as is common using techniques such as voxel-based morphometry. Second, we used spherical wavelet transforms to capture spatial patterns of variation in local morphological features, rather than relying on individual and group variability of those local features alone when measured at single isolated voxels. Third, we applied hierarchical clustering techniques to identify natural groupings of spatial patterns of variation in morphological features of the brain across participants, rather than applying those clustering techniques to measures at each individual voxel of the brain. This approach was intended to classify brains according to normal and pathological spatial variations in morphological features that would identify unique, distributed, circuit-based disturbances across the brain associated with specific neuropsychiatric illnesses. 
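As a rough illustration of the third point, the following minimal sketch groups per-participant feature vectors by hierarchical clustering. It is not the pipeline used in this study: the feature vectors are synthetic stand-ins for wavelet scaling coefficients, and the Ward linkage, the two-cluster cut, and the function name `cluster_participants` are assumptions made only for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_participants(features, n_groups=2):
    """Cluster rows of `features` (one row per participant) into n_groups.

    Ward linkage is an assumption of this sketch, not a detail taken
    from the study.
    """
    tree = linkage(features, method="ward")  # agglomerative merge tree
    return fcluster(tree, t=n_groups, criterion="maxclust")


# Synthetic stand-ins for two groups of participants, 8 features each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 8))
group_b = rng.normal(loc=3.0, scale=0.5, size=(20, 8))
features = np.vstack([group_a, group_b])

labels = cluster_participants(features, n_groups=2)
print(labels)
```

In this toy setting the two synthetic groups are well separated, so the cut of the tree into two clusters recovers them; real morphological feature vectors would be expected to overlap far more.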
The high diagnostic sensitivity and specificity of our classification algorithms across a wide range of disorders and across cohorts of varying sizes, demographic characteristics, and treatment histories, and even in a high-risk sample (most of whom were unaffected by manifest illness in their lifetime), demonstrate the exceptional robustness of these methods for imaging-based diagnostic classification of individuals with chronic, well-characterized illness.
Our ability to use only morphological features of the brain to classify and diagnose individuals accurately as having a specific neuropsychiatric illness suggests that the brains of the individuals who share a primary clinical diagnosis also likely share a common core neurobiological substrate for that illness, despite the widely known and undeniable etiologic heterogeneity of virtually all neuropsychiatric disorders. This shared substrate does not mean that the brains of people who have a given neuropsychiatric diagnosis are identical. Indeed, visual inspection of the classification trees shows evidence for variability of feature vectors within diagnostic groupings, and even evidence for the presence of morphological subtypes within clusters of a single clinical diagnosis. That variability could represent either the presence of differing etiologic subtypes within a single diagnostic label or the presence of additional, co-occurring illnesses for persons who share a single primary clinical diagnosis, which is common in clinical samples.
Potential sources of error in classification included errors in the methods for extracting feature vectors from the images. Errors in extracted features would have increased their variance and therefore reduced the statistical power of our algorithms for accurately classifying and diagnosing individual brains. We have previously demonstrated, however, that our methods for spatial normalization of brain regions to the template brain are highly accurate. Similarly, the methods that we used to map surface features conformally onto a unit sphere have also been previously validated. Finally, we computed scaling coefficients using the well-validated wavelet transform. The methods applied at each of the various steps have therefore been extensively validated previously, and they yielded highly accurate scaling coefficients. Another source of potential error in classification was the overlap of the feature spaces across disorders. We validated the structures discovered in our datasets by using leave-one-out and split-half cross-validation procedures, which generally demonstrated very low rates of misclassification in independent datasets and a high level of reproducibility in generating the algorithms used for group classifications.
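The two validation procedures named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the validation code of this study: a simple nearest-centroid rule stands in for the actual classifier, the data are synthetic, the split-half procedure is stratified by group, and all function names are hypothetical.

```python
import numpy as np


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test row to the class whose training centroid is closest."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def leave_one_out_error(x, y):
    """Hold out each participant once, train on the rest, count mistakes."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        pred = nearest_centroid_predict(x[keep], y[keep], x[i:i + 1])
        errors += int(pred[0] != y[i])
    return errors / n


def split_half_error(x, y, rng):
    """Train on one half and test on the other, in both directions."""
    first = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):  # stratify so each half contains every class
        idx = rng.permutation(np.flatnonzero(y == c))
        first[idx[: len(idx) // 2]] = True
    err_ab = np.mean(nearest_centroid_predict(x[first], y[first], x[~first]) != y[~first])
    err_ba = np.mean(nearest_centroid_predict(x[~first], y[~first], x[first]) != y[first])
    return (err_ab + err_ba) / 2


# Synthetic feature vectors for two diagnostic groups (illustrative only).
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (20, 6)), rng.normal(3.0, 1.0, (20, 6))])
y = np.array([0] * 20 + [1] * 20)

loo = leave_one_out_error(x, y)
sh = split_half_error(x, y, rng)
print(f"leave-one-out error: {loo:.3f}, split-half error: {sh:.3f}")
```

Both estimates measure misclassification on participants that were not used to build the classifier, which is the property the validation procedures rely on.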
Although we expected that the accuracy of our classification algorithms would improve significantly by including as many as possible of the brain regions that are components of the many neural circuits distributed across the brain, cost constraints limited our inclusion to only those brain regions that already had been delineated in sufficient number for each disorder at the time when we trained and validated each classification algorithm. We have listed in the brain regions initially assessed in the training of each classification algorithm and those that contributed significantly to accurate classification. If and when these algorithms are ever used in future real-world applications, a clinician who had narrowed the diagnostic field for a patient to one of two disorders and who wanted assistance in determining which of those disorders was most likely present would use to determine the brain regions that need to be defined on the brain images of that patient and then enter those definitions into the relevant diagnostic algorithm. To determine whether a patient's most likely diagnosis is schizophrenia or bipolar disorder, for example, a clinician would need to obtain precisely defined boundaries of the amygdala, hippocampus, and cerebral hemispheres (regions that contribute to accurate discrimination of these disorders) and then enter those boundary definitions into the classification algorithm for schizophrenia and bipolar disorder.
Because our classification algorithm requires for its input the highly precise delineation of several brain regions, it is not yet suited for dissemination as a complete and practical tool to aid in clinical diagnosis, for three reasons. First, the manual delineation of brain regions is onerous, requiring 3–4 days of rater time to define the various brain regions used to train and validate each of our classification algorithms. Any potential future application of the classification algorithms in practical clinical settings will require either the development of a more automated tool for region delineation or the availability of a central expert processing center that can define these brain regions with a high degree of precision and at relatively low cost, neither of which is currently available. Nevertheless, even with these caveats limiting the real-world clinical implementation and dissemination of these techniques and algorithms, we have provided compelling proof-of-concept evidence that accurate neuropsychiatric diagnosis in single individuals is possible using anatomical MRIs alone. Given that demonstration, the way forward to real-world clinical implementation and dissemination is clear: we must now develop methods that will make precise region delineations widely available for front-line clinicians. Therefore, along with other investigators, we are designing and testing algorithms that automatically delineate brain regions and extract their image features with sufficiently high precision to maintain the performance of our diagnostic algorithms. Second, the critical assessment, comparison, and benchmarking of various combinations of algorithms for region delineation and diagnostic classification will be important in determining which of these combinations is most accurate and cost-effective for future applications within real-world settings.
Third, because our algorithms were tested in chronically ill patients, they will need to be tested in new-onset patients if ever they are to be used in real-world practice, as the greatest diagnostic confusion tends to arise clinically with newly presenting patients. It is possible that the brain features that our algorithms found to be most discriminating among the various diagnostic categories were the consequences of the chronicity of the illnesses or their treatments, in which case they may not be as accurately discriminating in new-onset patients. This possibility notwithstanding, it is worth noting that most psychiatric illnesses are already chronic by the time patients present for initial clinical evaluation, with the duration of illness prior to first treatment ranging from 2 years for schizophrenia to 6–8 years for mood disorders and 9–23 years for anxiety disorders.
Although our classification algorithms identified patients in specific diagnostic groups with remarkable accuracy, they did so in individuals who had chronic, well-defined illness. We intentionally assessed performance of these algorithms in patients whose illnesses were diagnostically clear and unambiguous because we needed confidence in the accuracy of the ground truth clinical labels with which the results of our automated classification could be compared. Those ground truth clinical labels generally are clearest and least ambiguous in chronic patients, for whom the range of symptoms and their clinical courses have fully evolved over time. 
The accuracy of those ground truth clinical labels was essential for this initial proof-of-concept demonstration that anatomical images alone can accurately diagnose neuropsychiatric illness. The clinical diagnoses were accurate and unambiguous for the participants with TS, ADHD, SZ, or BD in our cohort because the participants had chronic, well-characterized psychiatric illnesses that had evolved over an average duration of illness of more than 10 years, and their diagnoses had been established using carefully applied, research-based diagnostic instruments (SCID) using DSM-IV-TR criteria. Diagnoses were confirmed by two senior, board-certified clinicians who reached clear consensus for the neuropsychiatric diagnoses for each participant. Moreover, we note that our classification accuracies were greatest, and in fact near perfect, when discriminating between two disorders (i.e., between TS and ADHD, SZ and BD, or SZ and TS), compared with rates when we were discriminating between one patient group and healthy participants. We attribute this remarkable accuracy to the fact that when discriminating two patient groups, morphological abnormalities in both groups deviated from normal, so that the vectors representing those abnormalities in feature space for each group were farther from one another than they were from the vectors in feature space for the healthy participants.
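The geometric argument in the preceding paragraph can be made explicit with a short worked relation. This is an illustrative simplification rather than an analysis performed in this study: it assumes that the centroid of the healthy participants sits at the origin of feature space, and the symbols $\mathbf{a}$ and $\mathbf{b}$ (the mean deviations of two patient groups from that centroid) are introduced here only for the example.

```latex
\[
\lVert \mathbf{a} - \mathbf{b} \rVert^{2}
  = \lVert \mathbf{a} \rVert^{2} + \lVert \mathbf{b} \rVert^{2}
    - 2\,\mathbf{a}\cdot\mathbf{b}
\]
\[
\lVert \mathbf{a} - \mathbf{b} \rVert
  > \max\bigl(\lVert \mathbf{a} \rVert,\ \lVert \mathbf{b} \rVert\bigr)
\quad\Longleftrightarrow\quad
\mathbf{a}\cdot\mathbf{b}
  < \tfrac{1}{2}\,\min\bigl(\lVert \mathbf{a} \rVert^{2},\ \lVert \mathbf{b} \rVert^{2}\bigr)
\]
```

Under this simplification, two patient groups lie farther from each other than either lies from the healthy group whenever their deviations point in sufficiently different directions, for example when they are orthogonal or opposed.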
Diagnosing neuropsychiatric illnesses using brain imaging measures alone has the potential to transform clinical care for, and research into, these conditions. If imaging-based diagnoses prove to be as accurate at the stage of initial diagnosis as they seem to be in the diagnostic classification of our chronically ill patients, they will offer the promise of reducing the cost and morbidity associated with inappropriate treatments that are begun following an incorrect initial clinical diagnosis. In addition, imaging-based classifications will likely facilitate the development of primary or secondary prevention strategies for persons who are at increased risk for developing a neuropsychiatric illness. We have demonstrated the feasibility of identifying people who are at risk for becoming ill by discriminating individuals at high or low risk for familial MDD, a sample that includes many individuals who had not yet manifested overt symptoms of illness. One of the most important potential future applications of this work is the use of the natural groupings of brains generated by our algorithms to identify brain-based subtypes within a single clinical diagnosis. Differing neurobiological subtypes of illness likely have differing natural histories and respond differentially to specific therapeutic interventions. Identifying those neurobiological subtypes would thereby facilitate the development of truly individualized plans for clinical care. Finally, brain-based diagnoses and the identification of biological subtypes will reduce the presence of phenocopies that are detrimental to the discovery of the genes that predispose to the development of neuropsychiatric illness.