Multi-voxel pattern analysis (MVPA) has been applied successfully to a variety of fMRI research questions in healthy participants. The full potential of applying MVPA to functional data from patient groups has yet to be fully explored. Our goal in this study was to investigate whether MVPA might yield a sensitive predictor of patient symptoms. We also sought to demonstrate that this benefit can be realized from existing datasets, even when they were not designed with MVPA in mind. We analyzed data from an fMRI study of the neural basis for face processing in individuals with an Autism Spectrum Disorder (ASD), who often show fusiform gyrus hypoactivation when presented with unfamiliar faces, compared to controls. We found reliable correlations between MVPA classification performance and standardized measures of symptom severity that exceeded those observed using a univariate measure; a relation that was robust across variations in ROI definition. A searchlight analysis across the ventral temporal lobes identified regions with relationships between classification performance and symptom severity that were not detected using mean activation. These analyses illustrate that MVPA has the potential to act as a sensitive functional biomarker of patient severity.
In 2001, Haxby and colleagues demonstrated that information could be decoded from patterns of fMRI activation across voxels that was not evident from univariate analyses (Haxby et al., 2001); since the publication of this seminal study, a class of techniques referred to as multi-voxel pattern analysis (MVPA) has been applied to a variety of questions (e.g., Cox and Savoy, 2003; O’Toole et al., 2005; Spiridon and Kanwisher, 2002; for reviews see Haynes and Rees, 2006; Norman et al., 2006; O’Toole et al., 2007). These techniques analyze recorded activity patterns using tools such as machine learning classifiers, to measure the information present within populations of voxels.
Despite the success of applying MVPA in a wide range of contexts, few studies have extended the method to investigations of atypical neural activity. Several functional studies have used multivariate approaches to classify individuals into different groups (in contrast to classifying trials into conditions) for depression (Fu et al., 2008) and drug addiction (Zhang et al., 2005). Similarly, functional models have predicted future responses to Cognitive Behavioral Therapy (full vs. partial) in depressed patients (Costafreda et al., 2009) and estimated years-to-onset of Huntington’s disease symptoms (Rizk-Jackson et al., 2010). Only a small number of clinical studies have conducted within-subject between-condition MVPA: abnormal activity patterns have been reported during object representation and working memory processes in schizophrenia (Kim et al., 2010; Yoon et al., 2008), and unusual patterns have been detected in the medial prefrontal cortex of participants on the autism spectrum during mental state reflections (Gilbert et al., 2009). None of these patient studies used MVPA measures to predict individual differences in clinical symptom severity.
The primary goal of this paper is to report the potential for MVPA to give high levels of sensitivity in relating fMRI data to patient symptoms. Through incorporating the unique contributions of individual voxels, subtleties within activation patterns are reflected in MVPA outcome measures, such as classification performance. Such subtleties are often ignored in univariate analyses, where the levels of voxel activation, or mean activation of a region, are evaluated. In a region that is functionally relevant to a disorder with atypical cognitive or behavioral symptoms, this multivariate characterization could act as a sensitive measure of variation among affected individuals. To investigate this possibility here, we use MVPA to examine a dataset from a study of fusiform gyrus activation in individuals with an Autism Spectrum Disorder (ASD; Schultz et al., 2008).
Investigating the face processing differences in people with autism has been a very active area of research, not least because of the importance of face processing to successful social functioning. Among other deficits, ASD patients show large impairments in recognizing facial identity across changes in viewing conditions (Wolf et al., 2008), despite typical performance at processing complex objects (Boucher and Lewis, 1992; Wolf et al., 2008). A large number of functional neuroimaging investigations have studied the neural substrates of these behavioral abnormalities, particularly in the fusiform gyrus, a highly face-selective brain region, which has come to be known as the fusiform face area (FFA; Kanwisher et al., 1997; Kanwisher and Yovel, 2006; Winston et al., 2004), although the specificity of fusiform computations is much debated (Gauthier et al., 1999; Kanwisher and Yovel, 2006; Schultz et al., 2003). The FFA is strongly activated when typically-developing individuals view faces, but is frequently hypoactive when individuals with an ASD view unfamiliar faces (Critchley et al., 2000; Deeley et al., 2007; Grelotti et al., 2005; Hall et al., 2003; Hubl et al., 2003; Koshino et al., 2008; Pierce et al., 2001; Piggot et al., 2004; Schultz et al., 2000, 2008; Wang et al., 2004). The processes responsible for this relative hypoactivation are an area of ongoing debate (see footnote 1). Although it is a related and important question, the present study is neutral on the proximate causes of hypoactivation, focusing instead on the potential for MVPA to give a sensitive functional biomarker.
A secondary aim of the paper is to provide an example of the way MVPA can be used to realize this benefit in studies designed without MVPA in mind. This suggestion may resonate with patient-group investigators looking to make the most of existing datasets, which are often expensive, time-consuming and logistically difficult to obtain. In this study, we illustrate this point with an extreme case; analyzing a dataset from an fMRI study that was not planned with MVPA in mind and that was, in many ways, suboptimal for this purpose. Designing a study for subsequent MVPA typically involves a number of considerations: The established sensitivity of MVPA to subtle visual differences (e.g., Kamitani and Tong, 2005) makes controlling visual properties, such as luminance and the visual angle of presented images, particularly important. Additionally, in order to draw conclusions about a specific category, an appropriate number of classes are required. Any two-way classification is affected by the activity patterns of both classes, so multiple comparisons are required for drawing conclusions about one condition of interest. For example, above-chance classification between class A and class B could be due to encoded class A information (where the classifier succeeds based on ‘A vs. not-A’) or class B information (‘B vs. not-B’). The successful separation of A vs. B, C and D, but not among B, C and D, however, gives some confidence that A, or at least certain features of A, are central to an area’s encoded information (as applied in O’Toole et al., 2005 and Spiridon and Kanwisher, 2002).
The above design considerations are recommended where possible; however, it is still possible to benefit from the MVPA approach when a dataset has been designed for univariate analyses. The dataset analyzed here was collected to examine how fusiform activation varies during face tasks that differ in their attentional and perceptual loads (Schultz et al., 2008), in individuals with autism. Designed specifically for univariate analyses, the study was organized in a way contrary to the optimal design considerations reviewed above: the stimuli in each condition were not individually matched for luminance, and two categories of stimuli, faces and houses, were presented to participants. Additionally, the house condition required participants to make a ‘same’ vs. ‘different’ judgment about two side-by-side houses, while the face condition was intentionally varied along several dimensions by run, including the perceptual judgments required, number of face stimuli, and presence or absence of emotional expression. A constant house condition was included in the study to act as a common baseline for between run comparisons of the different face tasks. Finally, each face condition was allocated a relatively short amount of fMRI time (five 20-second blocks each) giving a small number of trials for each of the face conditions.
In this study, we applied MVPA to four of the six runs in this fMRI dataset by classifying the activity patterns for viewing faces and houses within the participants. By grouping together the different face trials into one class, we were able to increase the number of trials to a suitable level for performing MVPA (see Pereira et al. (2009) for a discussion of the factors relevant to selecting classifier exemplars), while allowing us to investigate underlying commonalities in face activity patterns between the different face tasks. Our results showed that classification performance was more strongly related to symptom severity than a univariate measure of mean activation. The greater sensitivity of MVPA was consistent across a variety of approaches to defining the regions of interest, including an anatomical definition, face-responsive voxels defined in the control group, and even in an area defined based on the mean activation difference itself. Furthermore, using a roaming searchlight analysis across the ventral temporal (VT) lobe, we found a symptom severity relationship with MVPA in regions of cortex that were not highlighted using a univariate measure. This is the first study, to our knowledge, that reports a link between functional MVPA results and standardized measures of symptom severity: an important target for many patient-based investigations. We also hope this study will be encouraging to patient-group researchers who are looking to maximize the utility of existing fMRI datasets, and to those searching for functional techniques that are sensitive to patient symptoms.
Twelve males on the autism spectrum (ages 9.3 – 24.2, mean (M) = 13.9 years) and twelve typically-developing male controls (ages 9.4 – 23.3, M = 13.6 years) were selected for these analyses from a total sample of more than twenty in each group on the basis of having the lowest scanner movement, while matching the groups by age. All participants were recruited and studied at the Yale Child Study Center. All participants or their legal guardians gave informed written consent and were compensated for their participation, in accordance with procedures and protocols approved by the Institutional Review Board of the Yale University School of Medicine. Each ASD diagnosis was confirmed by a consensus diagnosis process involving two Ph.D.-level clinicians experienced with ASD differential diagnoses, using results from the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1989; Lord et al., 2000) and Autism Diagnostic Interview-Revised (ADI-R; Le Couteur et al., 1989; Lord et al., 1994). Total ADOS scores ranged between 9 and 22 (M = 15.9, standard deviation (s.d.) = 4.3). The ADOS is a set of standardized semi-structured interactions between a clinician and the relevant individual. The participant’s behaviors and responses are recorded and scored by the clinician against a series of standardized categories. An algorithm is subsequently used to combine these scores, where a cut-off helps determine each diagnosis. The ADI-R is an extensive structured interview, conducted by a clinician, with a parent of the patient. The interviewee is questioned extensively about their knowledge and experiences of the patient’s current and prior behaviors, and developmental trajectory. Responses are scored according to an established list of criteria, which are then combined through a standardized algorithm.
These forms of assessment are suitable for participants of all ages: the clinician administering the ADOS selects one of four different modules of interactions, based on the mental age of the individual being assessed.
Control participants were screened for personal and family histories of psychiatric disorders and neurological trauma, and current axis I disorders with standardized symptom inventories that cover all DSM-IV axis I disorders: the parent report Childhood Symptom Inventory for children aged 5 - 11 years (Gadow and Sprafkin, 1994), the Adolescent Screening Inventory for individuals aged 12 – 18 years (Gadow et al., 2002), and the Adult Self-Report Inventory for individuals older than 18 years (Gadow et al., 1999).
Participants’ IQs were assessed using the WASI, WISC-IV or WAIS-III. The ASD group’s mean IQ (M = 101.6, s.d. = 21.8) was lower than the control group’s (M = 112.8, s.d. = 10.1), although not significantly so (t22 = 1.60, p = 0.12). The groups did not differ in chronological age (t22 = 0.19, p = 0.85). All participants were administered the Benton Test of Facial Recognition (Benton et al., 1994), the Edinburgh Handedness Inventory (Oldfield, 1971) and a computerized battery of face recognition, perception and memory (Wolf et al., 2008) as part of the original study. All participants, except for one in each group, were right handed. Scores on the Benton Facial Recognition Task were significantly higher (t22 = 3.45, p = 0.002) in controls (M = 43.1, s.d. = 2.8) than in ASD individuals (M = 38.1, s.d. = 4.2), as expected (Wolf et al., 2008).
Participants were presented with gray-scale face and house images. The facial expressions and behavioral task differed between the four runs (see Figure 1). Images were presented in alternating 20-second blocks separated by 12 seconds of rest, giving a total of five face and five house blocks in each run. In the first face task, participants viewed two neutral expression faces side-by-side and indicated with a button press whether the images showed the same or different people. The pictures were presented for 3500 ms, followed by a 500 ms inter-stimulus interval (ISI). Participants were asked to answer as quickly and confidently as possible. The second task followed the same format as the first but with fearful instead of neutral faces. In the third task, participants passively viewed single neutral faces for 1750 ms, followed by a 250 ms ISI. The fourth task had participants passively view dynamic movies of faces changing their expression from neutral to fearful. The neutral face appeared for 1250 ms, followed by 500 ms of an emotion morph and then 1750 ms of a fearful expression. The house stimuli in each run were presented side-by-side for 3500 ms and participants were asked to indicate with a button press whether the images showed the same or different houses. Run order was counterbalanced across participants by reversing the task order in half the sample. The block order varied across runs within participants (see footnote 2). All participants practiced the tasks for 20 to 40 minutes before the scanning session until it was clear that they understood the tasks. All participants also underwent mock scanning to habituate them to the scanning environment and to train them to stay still.
Neutral face stimuli were taken from a single standardized set (Endl et al., 1998). Fearful face images were from multiple sources (Gur et al., 2002; Karolinska directed emotional faces image set, Lundqvist and Litton, 1998; NimStim Face Stimulus Set, Tottenham et al., 2002; Japanese and Caucasian Facial Expressions of Emotion; Matsumoto and Ekman, 1988, California Facial Expressions, Dailey, Cottrell & Reilly, 2001; Pictures of Facial Affect, Ekman and Friesen, 1976; in-house database). The dynamic faces were from a custom stimuli collection created with MorphMan 2000 Software (STOIK Imaging, Moscow). All stimuli were presented once during the experiment. Hair, ears and any other peripheral identifying features were cropped from all faces. The house stimuli were custom photographs of houses in New Haven, CT. All stimuli were converted to grayscale and resized to 150 × 210 pixels. Stimuli were presented using E-Prime (Psychology Software Tools Inc., Pittsburgh, PA) and PsyScope 1.2.5 for the dynamic faces (Cohen et al., 1993).
Functional T2-weighted images were acquired using a Siemens Trio 3-T scanner with a standard quadrature head coil. Forty axial slices were acquired parallel to the AC-PC plane, with whole-brain coverage. In-plane voxel size = 3.516 × 3.516 mm, slice thickness = 3.5 mm with no gap, TR = 2320 ms, TE = 25, flip angle = 60°. T1-weighted anatomical images with the same thickness were acquired in the same session (TR = 300, TE = 2.43, flip angle = 60°), as was a T1-weighted 3D anatomical data set (MP-RAGE, TR = 2530, TE = 3.66, TI = 1100, flip angle = 7°, resulting in 1 mm³ voxels). Stimuli were presented on a translucent screen at the rear of the scanner and viewed through a periscopic prism system mounted on the head coil. Singly-presented faces subtended approximately 8° of visual angle horizontally in the middle of the screen. Side-by-side stimuli subtended 19° of visual angle horizontally. During trials requiring a response, participants used fiber-optic button boxes in each hand to indicate their choice.
Imaging data were preprocessed using the Analysis of Functional NeuroImages (AFNI) software package (Cox, 1996). All functional images were slice time corrected and deobliqued to bring them in line with the axial plane. The first four volumes of each functional run were removed to allow the signal to homogenize, and signal spikes were removed (using AFNI’s 3dDespike). A motion correction algorithm was applied to register all volumes to the closest functional volume to the anatomical scan (Cox and Jesmanowicz, 1999) and the anatomical image was aligned to this same functional volume. Linear and quadratic trends were removed from each run and low frequency patterns were removed using a high-pass filter threshold of 0.0125 Hz. Voxel activation was scaled to have a mean of 100, with a maximum limit of 200. The skull was removed from the anatomical images, and all images were transformed into a standardized space (Talairach and Tournoux, 1988). Voxels in the functional datasets were resampled in the process to 3.5 mm × 3.5 mm × 3.5 mm. All participants displayed less than 3.6 mm of movement. The two groups did not differ in mean (t22 = 0.87, p = 0.39) or maximum (t22 = 1.30, p = 0.21) movement in the scanner.
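The scaling step described above (a mean of 100 with a ceiling of 200) can be illustrated with a minimal sketch. This is not the AFNI command the authors ran; the function name `scale_voxel` and the example values are hypothetical, and the sketch assumes the scaling is applied separately to each voxel's time series.

```python
def scale_voxel(timeseries, target_mean=100.0, ceiling=200.0):
    """Scale one voxel's time series to a mean of `target_mean`, capped at `ceiling`."""
    mean = sum(timeseries) / len(timeseries)
    if mean <= 0:
        raise ValueError("expected a positive mean signal")
    # Values are expressed relative to the voxel's own mean; the cap limits outliers.
    return [min(ceiling, target_mean * v / mean) for v in timeseries]


# Made-up raw signal values for a single voxel.
raw = [820.0, 800.0, 790.0, 810.0, 780.0]
scaled = scale_voxel(raw)
```

After this step each value reads as a percentage of the voxel's mean signal. When the cap is triggered, the mean of the scaled series falls slightly below 100.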
All analyses were conducted within the boundaries of the VT cortex, which was manually defined for each participant in the same manner as previous studies (Haxby et al., 2001), extending 70 to 20 mm posterior to the anterior commissure in Talairach brain atlas coordinates, consisting of lingual, parahippocampal, fusiform and inferior temporal gyri. To examine the area of the fusiform gyrus typically activated by faces, we defined regions of interest in three distinct ways. Although a face localizer is a frequent method for locating the fusiform area responsive to faces, this is not suitable for a patient group known to show reduced fusiform activation to faces. In the first approach for sampling the relevant fusiform area, we placed three spheres (radius 0.5cm each) at the average Talairach coordinates for the centers of right hemisphere FFA activation from face vs. object comparisons in previous studies (+40x, -55y, -10z from Kanwisher et al., 1997; +38x, -58y, -10z and +36x, -50y, -10z from Schultz et al. (2000)’s control groups 1 and 2). The overlapping spheres covered a total region of 29 voxels. Although the FFA’s exact location varies between individuals, this approach was one way to approximate the area typically involved in face processing. We focused on the right hemisphere in this approach, as left hemisphere coordinates were not given in one of the published studies (Schultz et al., 2000), and only a small proportion of participants contributed to the average left hemisphere coordinates in the other (Kanwisher et al., 1997).
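As an illustration of the coordinate-based approach, the following sketch builds the union of three 5 mm spheres on a 3.5 mm grid. It assumes, hypothetically, that voxel centres sit at integer multiples of 3.5 mm from the Talairach origin. The authors' actual grid alignment is not stated, so the sketch is not expected to reproduce the reported 29-voxel count.

```python
import math

VOXEL_MM = 3.5    # resampled voxel size reported in the text
RADIUS_MM = 5.0   # sphere radius (0.5 cm)
CENTERS_MM = [(40, -55, -10), (38, -58, -10), (36, -50, -10)]  # published FFA centres


def sphere_voxels(center, radius=RADIUS_MM, voxel=VOXEL_MM):
    """Return grid indices (i, j, k) of voxels whose centres lie within `radius` of `center`."""
    reach = int(math.ceil(radius / voxel)) + 1
    base = [int(round(c / voxel)) for c in center]
    found = set()
    for di in range(-reach, reach + 1):
        for dj in range(-reach, reach + 1):
            for dk in range(-reach, reach + 1):
                idx = (base[0] + di, base[1] + dj, base[2] + dk)
                pos = [n * voxel for n in idx]  # assumed grid origin at (0, 0, 0)
                if math.dist(pos, center) <= radius:
                    found.add(idx)
    return found


def union_roi(centers):
    """Union of the spheres; overlapping voxels are counted once."""
    roi = set()
    for center in centers:
        roi |= sphere_voxels(center)
    return roi


roi = union_roi(CENTERS_MM)
```

Because the spheres overlap, the union is smaller than the sum of the three sphere volumes.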
In the second approach, we isolated the area of significantly greater activation to faces than to houses in the right and left VT cortex of the control group. For this purpose we performed a traditional univariate analysis on images smoothed with a Gaussian filter (full width at half maximum = 8 mm) and extracted beta values from a face vs. house contrast using the four runs (with six nuisance vectors for translation and rotation movement), in the VT lobe of control participants. Corrected significance was established with a Monte Carlo voxel-cluster threshold technique (program AlphaSim) in the intersection of participants’ VT regions, giving an overall corrected alpha level of 0.05 (voxelwise p < 0.001; cluster size ≥ 3 voxels). This resulted in a 25-voxel right fusiform gyrus cluster (center of mass at +41x, -39y, -17z) and a 29-voxel left fusiform gyrus cluster (center of mass at -47x, -41y, -16z) that were significantly more active to faces than houses in the control group. Conducting MVPA within these clusters would be circular for the controls, so they were only used for assessing the relationship between classification accuracies and symptom severity in patients (an independent group from the control participants).
In the third approach to defining relevant regions, we directly examined the area of face hypoactivation, by isolating the cluster of voxels that showed significantly less face activation in the ASD group than in the control group. We used the univariate approach described above, with the same corrected threshold, to identify the fusiform hypoactive area, by applying a two-tailed between-groups t-test to the face vs. house coefficients in the intersection of participants’ VT regions. A 23-voxel cluster of face hypoactivation was detected in the right fusiform gyrus of the ASD group (center of mass at +41x, -38y, -18z). A 6-voxel cluster was identified in the left fusiform gyrus (center of mass at -49x, -41y, -16z). We believe the area of fusiform hypoactivation is a very suitable area to test the sensitivity of MVPA, given that there is theoretical interest in the hypoactivation, and considering that defining the region based on univariate activation should only help the univariate relationship in a comparison to MVPA. The fusiform regions of interest are shown in Figure 2.
Although ASD individuals frequently show reduced fusiform face activity, their parahippocampal place area (PPA; Epstein and Kanwisher, 1998) demonstrates robust activity to images of places (Humphreys et al., 2008). We assessed the right and left PPA regions for a relationship between classification performance and symptom severity. We expected that faces and houses would be successfully classified within the PPA, however by examining the relationship with symptom severity, we could determine if any such link extended to other relevant areas of VT cortex. We localized PPA clusters in the right and left hemispheres by first identifying the peak voxel in a house > face activation contrast (in the manner described above for the second fusiform approach) in the right and left medial ventral temporal lobe of each participant. We then centered a 3 × 3 × 3 voxel cluster at each peak voxel, creating a 27-voxel cluster in each hemisphere: a volume similar in size to the fusiform regions.
Finally, to verify that any relationship to symptoms was specific to this latter stage of visual processing, we analyzed two areas in the occipital lobe sensitive to basic visual features: a voxel cluster with the same volume as one of the fusiform regions of interest (ROIs), and a larger area based on the approximate location of Brodmann’s area 17 (BA17). We defined the volume-matched cluster by localizing the cluster of 23 voxels (the same volume as the right hypoactive fusiform region) with the greatest response to face and house trials compared to the fixation period, near to the calcarine sulcus in each participant. This anatomical restriction helped localize the cluster to the typical region of V1. The BA17 region was defined through the AFNI implementation of the Talairach daemon database (Lancaster et al., 2000). This area was included to ensure that we checked for symptom relationships with classification performance in activity patterns that may be distributed across a larger volume.
All pattern analyses were implemented in MATLAB using custom scripts and the framework provided by the Princeton Multi-Voxel Pattern Analysis toolbox (Detre et al., 2006). MVPA was conducted on spatially unsmoothed data. All time points were first convolved with a model of the hemodynamic response and thresholded at 0.8 for each slice repetition time (TR), giving eight TRs in each block. The multi-voxel analyses were performed in a 5-fold cross-validation procedure: the data were separated into five across-run folds so that each fold included data from every run and, therefore, examples from all face conditions. A classifier was then trained on four folds and tested on the independent fifth, where the testing set was alternated for each of five iterations. We performed z-scoring within each cross-validation fold to preserve fold independence. To reduce the risk of the classifier model overfitting the training data in the large BA17 region, an ANOVA feature selection method was employed for this area, which selected voxels that differed significantly between face and house conditions at a liberal threshold of p < 0.05. Crucially, the ANOVA was only conducted on the training data for each fold, ensuring that feature selection did not peek at the test data. This particular selection approach preserves a reasonable number of features and has been used successfully in a number of MVPA studies (e.g., Diana et al., 2008; McDuff et al., 2009; Polyn et al., 2005). This feature selection procedure yielded a mean of 58 voxels (s.d. = 12) for controls and 66 voxels (s.d. = 19) for patients.
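The two safeguards described above, z-scoring within each fold and restricting feature selection to training data, can be sketched as follows. This is an illustrative Python outline, not the authors' MATLAB code. The toy data and all function names are hypothetical, and a fixed F threshold (about 4.6, the approximate 0.05 critical value for 1 and 14 degrees of freedom in this toy case) stands in for the p < 0.05 criterion.

```python
from statistics import mean, pstdev


def zscore_fold(rows):
    """Z-score each voxel (column) using only this fold's own rows."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant voxels
    return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]


def f_statistic(a, b):
    """One-way ANOVA F for two groups of values from a single voxel."""
    grand = mean(a + b)
    between = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    within = sum((v - mean(a)) ** 2 for v in a) + sum((v - mean(b)) ** 2 for v in b)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / (within / (len(a) + len(b) - 2))


def select_voxels(train_rows, train_labels, f_crit):
    """Keep voxels whose face/house F statistic exceeds f_crit (training data only)."""
    keep = []
    for j in range(len(train_rows[0])):
        a = [r[j] for r, y in zip(train_rows, train_labels) if y == 1]
        b = [r[j] for r, y in zip(train_rows, train_labels) if y == 0]
        if f_statistic(a, b) > f_crit:
            keep.append(j)
    return keep


def cv_splits(folds):
    """Yield (train_rows, train_labels, test_rows, test_labels), holding out each fold once."""
    for held_out in range(len(folds)):
        train_rows, train_labels = [], []
        for i, (rows, labels) in enumerate(folds):
            if i != held_out:
                train_rows += zscore_fold(rows)
                train_labels += labels
        test_rows, test_labels = folds[held_out]
        yield train_rows, train_labels, zscore_fold(test_rows), test_labels


def toy_folds(n_folds=5):
    """Synthetic folds: voxel 0 separates the classes, voxel 1 does not."""
    folds = []
    for f in range(n_folds):
        rows = [[2.0 + 0.01 * f, 1.0], [2.1, 0.0], [0.0, 1.0], [0.1 + 0.01 * f, 0.0]]
        folds.append((rows, [1, 1, 0, 0]))  # 1 = face, 0 = house
    return folds


for train_rows, train_labels, test_rows, test_labels in cv_splits(toy_folds()):
    kept = select_voxels(train_rows, train_labels, f_crit=4.6)
    print(kept)
```

The held-out fold never contributes to the selection statistics, which is the property the text calls avoiding peeking.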
A ridge regression classifier was used for the classification procedure; an attractive choice for its ability to compensate for multicollinearity among features (a property of fMRI data). The method has also achieved success in previous machine learning applications (Zhang and Yang, 2003). Classification performance was recorded as the proportion of correct guesses from all iterations of the cross-validation procedure. Ridge regression requires a value to be selected for its penalty parameter. A custom script was developed to select this parameter using an embedded cross-validation technique, where the classifier was trained on part of the training data and tested on the remaining training time-points using a range of penalty values. By restricting the penalty-search process to training data, we ensured that the testing data remained independent. The process was performed for a broad range of penalty values (0, 0.01, 0.1, 1, 10, 100, 1000, 10000) and then for a narrower search of ten penalties around the broad-search penalty that gave the highest classification performance. The penalty value giving the highest overall classification performance on the training data was selected for the test set classification.
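The embedded penalty search can be sketched in plain Python as below. This is a simplified illustration under stated assumptions, not the authors' custom script. Labels are coded +1/-1, the model has no intercept, the inner folds are formed by a hypothetical interleaving rule, and only the broad search is shown. The toy data are synthetic, and the small linear solver assumes a non-singular system (full-rank data or a positive penalty).

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(rows, targets, penalty):
    """Closed-form ridge weights: (X'X + penalty * I) w = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (penalty if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def accuracy(weights, rows, labels):
    """Proportion of rows whose predicted sign matches the +1/-1 label."""
    hits = sum((sum(w * v for w, v in zip(weights, r)) >= 0) == (y > 0)
               for r, y in zip(rows, labels))
    return hits / len(rows)


def pick_penalty(rows, labels, candidates, n_inner=4):
    """Choose the penalty with the best mean inner-fold accuracy, using training data only."""
    best, best_score = None, -1.0
    for penalty in candidates:
        scores = []
        for k in range(n_inner):
            test_idx = [i for i in range(len(rows)) if i % n_inner == k]
            train_idx = [i for i in range(len(rows)) if i % n_inner != k]
            w = ridge_fit([rows[i] for i in train_idx],
                          [labels[i] for i in train_idx], penalty)
            scores.append(accuracy(w, [rows[i] for i in test_idx],
                                   [labels[i] for i in test_idx]))
        score = sum(scores) / len(scores)
        if score > best_score:  # ties keep the earlier (smaller) penalty
            best, best_score = penalty, score
    return best, best_score


# Synthetic training patterns (two voxels); labels alternate +1 / -1.
ROWS = [[1.0, 0.2], [-1.1, 0.1], [0.9, -0.3], [-0.8, 0.4], [1.2, 0.5], [-1.0, -0.2],
        [0.7, 0.1], [-1.3, 0.3], [1.1, -0.4], [-0.9, -0.1], [0.8, 0.3], [-1.2, 0.2]]
LABELS = [1, -1] * 6
CANDIDATES = (0, 0.01, 0.1, 1, 10, 100, 1000, 10000)  # broad search values from the text
best_penalty, inner_score = pick_penalty(ROWS, LABELS, CANDIDATES)
```

A narrower second pass could call `pick_penalty` again with a finer list of candidates around `best_penalty`. The spacing of that list is not specified in the text, so it is left out here.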
To compare classification performance of the control and patient groups, we used two-tailed between-group t-tests on classification accuracy. We employed permutation testing (Golland and Fischl, 2003) to assess whether each individual’s classification performance was greater than expected by chance, by randomly permuting class labels one thousand times to simulate the null distribution. All permutations included the same number of face and house blocks in each fold to avoid biasing the classifier, and the complete cross-validation process was conducted for each permutation, including the described penalty-search procedure. Classification performance in the top 5% of the random permutations indicated above-chance accuracy (at p < 0.05).
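A minimal sketch of the label-permutation test follows. It assumes labels are shuffled within each fold (so every fold keeps the same number of face and house blocks) and uses a simple nearest-class-mean rule on a synthetic one-dimensional signal in place of the ridge classifier and penalty search. The (count + 1) / (n + 1) p-value convention is also an assumption, and all names and data are hypothetical.

```python
import random

# Synthetic one-voxel "signal" for five folds of four blocks each.
SIGNAL = [[2.0, 1.8, 0.1, 0.3], [1.9, 2.2, 0.2, 0.0], [2.1, 1.7, 0.4, 0.1],
          [1.6, 2.0, 0.3, 0.2], [2.3, 1.9, 0.0, 0.2]]
TRUE_LABELS = [[1, 1, 0, 0] for _ in range(5)]  # 1 = face, 0 = house


def cv_accuracy(fold_labels):
    """Leave-one-fold-out accuracy of a nearest-class-mean rule on SIGNAL."""
    hits = total = 0
    for test in range(len(SIGNAL)):
        by_class = {0: [], 1: []}
        for f in range(len(SIGNAL)):
            if f != test:
                for v, y in zip(SIGNAL[f], fold_labels[f]):
                    by_class[y].append(v)
        means = {y: sum(vs) / len(vs) for y, vs in by_class.items()}
        for v, y in zip(SIGNAL[test], fold_labels[test]):
            guess = min(means, key=lambda c: abs(v - means[c]))
            hits += guess == y
            total += 1
    return hits / total


def permutation_p_value(score_fn, fold_labels, n_perm=1000, seed=0):
    """Compare the observed score with scores obtained under shuffled labels."""
    rng = random.Random(seed)
    observed = score_fn(fold_labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = []
        for labels in fold_labels:
            perm = labels[:]
            rng.shuffle(perm)  # within-fold shuffle keeps class counts per fold
            shuffled.append(perm)
        if score_fn(shuffled) >= observed:  # full cross-validation rerun per permutation
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


observed, p_value = permutation_p_value(cv_accuracy, TRUE_LABELS)
```

Rerunning the whole cross-validation for each permutation, as the text describes, keeps the null distribution comparable to the observed score.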
We also examined within- and between-category pattern correlations, in a similar manner to Haxby et al. (2001), to explore the basis for the significant MVPA–symptom relationships we report. Specifically, we calculated mean face and house patterns for each of the four runs and then correlated every face and house trial’s activity pattern with the other runs’ average patterns, in each participant. The resulting correlations were averaged to give a measure of within-category reliability for faces (face trials correlated with mean face patterns) and houses (house trials correlated with mean house patterns), and a measure of between-category similarity (face trials correlated with mean house patterns and house trials correlated with mean face patterns).
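The correlation measures described above can be sketched as follows. This is an illustrative outline with synthetic four-voxel patterns, assuming each run is represented as a mapping from category to a list of trial patterns. The function names and data layout are hypothetical.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length patterns."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def mean_pattern(trials):
    """Voxel-wise average of a list of trial patterns."""
    return [mean(col) for col in zip(*trials)]


def pattern_similarity(runs):
    """Return (within-face r, within-house r, between-category r).

    Each trial is correlated only with mean patterns from the *other* runs.
    """
    means = [{cat: mean_pattern(trials) for cat, trials in run.items()} for run in runs]
    within = {"face": [], "house": []}
    between = []
    for r, run in enumerate(runs):
        for cat, trials in run.items():
            other = "house" if cat == "face" else "face"
            for trial in trials:
                for s in range(len(runs)):
                    if s == r:
                        continue
                    within[cat].append(pearson(trial, means[s][cat]))
                    between.append(pearson(trial, means[s][other]))
    return mean(within["face"]), mean(within["house"]), mean(between)


# Synthetic data: two runs, two trials per category, four voxels.
RUNS = [
    {"face": [[1.0, 0.1, 0.9, 0.0], [0.8, 0.0, 1.1, 0.2]],
     "house": [[0.1, 1.0, 0.0, 0.9], [0.0, 0.9, 0.2, 1.1]]},
    {"face": [[0.9, 0.2, 1.0, 0.1], [1.1, 0.0, 0.8, 0.1]],
     "house": [[0.2, 0.8, 0.1, 1.0], [0.0, 1.1, 0.1, 0.9]]},
]
face_r, house_r, between_r = pattern_similarity(RUNS)
```

Excluding the trial's own run from the comparison keeps the reliability estimate from being inflated by within-run noise shared between a trial and its run average.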
We employed a spherical searchlight analysis (Kriegeskorte et al., 2006) to explore how the relationship to symptoms varied for the univariate and multivariate results across the VT lobe. Three-dimensional searchlight clusters were first mapped onto the VT area of each participant, creating a series of voxel clusters that cover the VT cortex. For the voxels within each searchlight, activity patterns for the face and house trials were classified with a Gaussian Naive Bayes classifier in a 5-fold cross-validation procedure, with all types of face stimuli in each fold, as described above. Classification performance was allocated to the central voxel of each searchlight, giving a map of accuracies. This analysis was performed three times for each participant, using radii of 2, 3 and 4 voxels, producing clusters with volumes of 33, 123 and 257 voxels respectively, when not restricted by the VT region’s boundaries. To directly compare the multivariate searchlights with a univariate approach, we also recorded the mean activation to faces and to houses within each searchlight. We compared the methods’ sensitivities to symptom severity by correlating the ASD individuals’ multivariate result (classification performance) and then univariate result (mean activation to faces minus mean activation to houses) from each searchlight, with the ADOS social scores, where higher scores indicate a greater number and severity of social symptoms indicative of autism. It is not appropriate to report the maximum correlation values from an extensive analysis such as this (Vul et al., 2009); however, it is possible to compare the sensitivities of the univariate and multivariate measures to symptom severity. To achieve this aim, we permuted the clinical scores 10,000 times and then correlated the MVPA and univariate results with the permuted clinical scores. This created a null distribution for testing the significance of the correlation with the actual clinical scores.
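Two parts of this procedure lend themselves to a short sketch: the searchlight volumes and the permuted-score null distribution. The first function counts the voxels within a given radius of a centre voxel, which reproduces the reported volumes of 33, 123 and 257. The second builds a null distribution by shuffling clinical scores. The accuracy and score values are synthetic, and the two-tailed comparison and the p-value convention are assumptions rather than details given in the text.

```python
import random
from itertools import product
from statistics import mean


def searchlight_offsets(radius):
    """Voxel offsets whose centres lie within `radius` voxels of the central voxel."""
    span = range(-radius, radius + 1)
    return [(i, j, k) for i, j, k in product(span, repeat=3)
            if i * i + j * j + k * k <= radius * radius]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def permuted_correlation_p(values, scores, n_perm=10000, seed=0):
    """Correlate values with scores, then compare against shuffled-score correlations."""
    rng = random.Random(seed)
    observed = pearson(values, scores)
    shuffled = list(scores)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(values, shuffled)) >= abs(observed):  # two-tailed (assumed)
            count += 1
    return observed, (count + 1) / (n_perm + 1)


volumes = [len(searchlight_offsets(r)) for r in (2, 3, 4)]

# Synthetic example for twelve participants: accuracy falls as the score rises.
ACCURACY = [0.80, 0.78, 0.77, 0.74, 0.73, 0.70, 0.69, 0.66, 0.65, 0.62, 0.60, 0.58]
SCORES = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
r_value, p_value = permuted_correlation_p(ACCURACY, SCORES)
```

Running the same function on the univariate measure from each searchlight would give a directly comparable null distribution for the two methods.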
Before relating MVPA results to symptom severity, we tested for significant classification performance in the regions of interest. Three approaches were taken to define these regions, as described in the Methods, in part because of the difficulty of using a traditional face localizer in a group characterized by face hypoactivation. In the first approach, three overlapping spheres were placed at coordinates from previous FFA studies, giving a 29-voxel cluster in the right fusiform gyrus. Both the typically-developing (M = 0.74, s.d. = 0.09) and ASD (M = 0.78, s.d. = 0.07) groups showed high face vs. house classification performance, where chance was 0.50. Permutation tests revealed that classification performance was significantly above chance for all typically-developing and ASD participants (p < 0.05 for all but one control who had a trend at p = 0.06). There were no significant group differences in classification performance (t22 = 1.24, p = 0.23).
The second approach employed a 25-voxel cluster in the right fusiform gyrus and a 29-voxel cluster in the left fusiform gyrus, reflecting significant face activation in the control group. These were not suitable regions of analysis for the control participants, but ASD participants demonstrated high classification performance in the right (M = 0.69, s.d. = 0.11) and left (M = 0.64, s.d. = 0.06) regions. Permutation testing revealed significant above-chance performance for all but one ASD individual in the right cluster (ten participants at p < 0.02, one at p = 0.05, one at p = 0.38) and all but two in the left cluster (ten participants at p < 0.02, one at p = 0.09, one at p = 0.39). The participant with the least significant result in the right cluster (p = 0.38) and a trend in the left cluster (p = 0.09) corresponded to the left-handed ASD individual in the sample.
The final fusiform region of interest was the fusiform hypoactivation detected in the ASD group through a univariate group comparison: a 23-voxel cluster in the right fusiform gyrus and a 6-voxel cluster in the left fusiform gyrus. The control group showed high classification performance in the right (M = 0.71, s.d. = 0.08) and left (M = 0.71, s.d. = 0.08) clusters. The ASD group had lower performance in the right hypoactive cluster (M = 0.66, s.d. = 0.09) than controls, although not significantly so (t22 = 1.53, p = 0.14). The ASD group’s classification performance in the smaller left fusiform cluster (M = 0.54, s.d. = 0.04) was significantly lower than the control group’s (t22 = 6.51, p < 0.001). Permutation testing showed that classification accuracies were significantly above chance in the right and left regions for all controls (p < 0.04). In the ASD group, classification performance was significant or approaching significance for the right cluster in all but one participant (nine at p < 0.05; two at p < 0.08, including the left-handed ASD participant; one at p = 0.18). For the left hypoactive cluster, however, ten of the ASD participants’ classification accuracies were not significantly above chance (two at p < 0.03, one at p = 0.06, nine at p > 0.13). As performance in the left hypoactive region was not significant for the majority of the ASD group, possibly because of the small size of the cluster (6 voxels), we did not analyze this area further.
To assess the importance of the voxel patterns, we replaced the voxel responses with the regions’ mean activation levels at each time point, and repeated the above classifications. This replacement produced a substantial reduction in ASD classification performance for the coordinate-defined spheres (M = 0.59, s.d. = 0.07; two-tailed paired comparison: t11 = 9.43, p < 0.001), the area of control-group right face activation (M = 0.52, s.d. = 0.07; t11 = 6.99, p < 0.001), control-group left face activation (M = 0.54, s.d. = 0.06; t11 = 6.61, p < 0.001), and the right hypoactive cluster (M = 0.50, s.d. = 0.07; t11 = 5.56, p < 0.001). Control participants also experienced significant reductions in classification performance for the coordinate-defined spheres (M = 0.56, s.d. = 0.07; t11 = 5.62, p < 0.001) and right hypoactive cluster (M = 0.64, s.d. = 0.05; t11 = 3.59, p = 0.004). It is noteworthy that in the region of right hypoactivation, using mean activation values instead of the voxel patterns was particularly detrimental for ASD classification performance: the mean reduction in performance was 0.07 (s.d. = 0.07) for controls and 0.16 (s.d. = 0.10) for ASD participants, giving a significant group-difference in the size of the decrease (t10 = 2.83, p = 0.02). Using mean activation values here is conceptually similar to a typical univariate analysis, but has an advantage of producing results on the same scale as MVPA. The MVPA and mean-replaced classification results are shown in Figure 3.
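The logic of the mean-replacement control can be illustrated with a toy example in which two classes share the same regional mean but differ in their spatial pattern. The sketch below is an assumption-laden illustration, not the study's pipeline: it uses synthetic data, a simple leave-one-out nearest-centroid classifier in place of the Gaussian Naive Bayes classifier, and hypothetical names throughout.

```python
import random

def loo_nearest_centroid(X, y):
    """Leave-one-out accuracy of a nearest-class-centroid classifier."""
    correct = 0
    for i, (x, label) in enumerate(zip(X, y)):
        best, best_dist = None, float("inf")
        for c in sorted(set(y)):
            # Class centroid computed without the held-out trial.
            rows = [r for j, (r, t) in enumerate(zip(X, y)) if t == c and j != i]
            centroid = [sum(col) / len(rows) for col in zip(*rows)]
            dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
            if dist < best_dist:
                best, best_dist = c, dist
        correct += best == label
    return correct / len(y)

# Synthetic templates: identical regional mean (zero), opposite voxel patterns.
rng = random.Random(1)
templates = {"face": [1, -1, 1, -1], "house": [-1, 1, -1, 1]}
X, y = [], []
for _ in range(20):
    for label, template in templates.items():
        X.append([v + rng.gauss(0, 0.3) for v in template])
        y.append(label)

# Mean replacement: each trial's voxel responses collapse to the regional mean.
X_mean = [[sum(x) / len(x)] for x in X]

pattern_acc = loo_nearest_centroid(X, y)
mean_acc = loo_nearest_centroid(X_mean, y)
print(f"voxel patterns: {pattern_acc:.2f}, mean-replaced: {mean_acc:.2f}")
```

In this constructed case all class information is carried by the pattern, so accuracy falls toward chance after replacement; in real data the size of the drop indicates how much the classifier relies on pattern information beyond the regional mean.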
We conducted face vs. house classifications in PPA clusters for each participant, for our subsequent analysis of the relationship between MVPA results and symptom severity. In a 27-voxel cluster centered in the right PPA of each individual, control (M = 0.84, s.d. = 0.07) and ASD (M = 0.87, s.d. = 0.05) participants showed high classification performance. Similarly, a 27-voxel cluster in the left PPA gave high classification performance in the control (M = 0.82, s.d. = 0.08) and ASD (M = 0.85, s.d. = 0.06) individuals. Permutation testing revealed greater-than-chance accuracy for both regions in every participant (p < 0.04). We also conducted the face vs. house classification within two visually-responsive occipital areas, a 23-voxel cluster near the calcarine sulcus of each participant, and a larger approximation of BA17, for subsequently relating performance to symptom severity. In the 23-voxel cluster, classification performance was high in control (M = 0.67, s.d. = 0.04) and ASD (M = 0.69, s.d. = 0.06) participants, with no significant difference between the groups (t22 = 0.90, p = 0.38). Permutation testing showed greater-than-chance accuracy in all participants (p < 0.05). Similarly for the BA17 region, classification performance was high in control (M = 0.77, s.d. = 0.04) and ASD (M = 0.76, s.d. = 0.04) participants, with no significant difference between the groups (t22 = 0.48, p = 0.64). Permutation testing revealed greater-than-chance accuracy in all participants (p < 0.002). We had predicted above-chance performance in these regions in advance, because of the visual differences in the stimuli, as described in the introduction. 
Verifying above-chance classification performance is important for the later link to symptom severity: as the two stimuli classes can be distinguished in these visually-responsive areas, any lack of a relationship with symptom severity in the next stage of the investigation cannot be because there is no relevant information in these regions.
We investigated the sensitivity of MVPA to individual variation in patient symptoms by examining the relationship of classification performance to standardized measures of clinical severity. We also assessed the relationship between symptoms and a univariate measure. Face vs. house classification accuracy was significantly negatively correlated with patients’ ADOS total scores for all the right fusiform regions (Table 1). These scores are a measure of severity from a structured extended interaction with the patient by an experienced clinical professional. Higher ADOS scores indicate greater severity of symptoms, such that lower classification accuracies were found in more severely affected ASD individuals. Significant negative correlations were also found between the ADOS social component sub-scores and classification accuracies in the right hypoactive and control-group right face activation clusters (Figure 4). The coordinate-defined spheres relationship approached significance. The social component score of the ADI, an additional assessment of social symptom severity from rated interviews with one of the patients’ parents, was significantly related to classification performance in all three right fusiform regions. Performance in the left area of control-group face activation was not significantly related to the clinical measures, although approached significance for the ADOS total score. The regions’ mean activation values (face z-scores – house z-scores) were not significantly correlated with any measure of clinical severity. Table 1 lists the statistical values for these results.
Analyses of the PPA clusters showed that classification performance was not significantly correlated with ADOS total or social scores (Table 1). Classification accuracies were, however, significantly correlated with the ADI social scores in the right and left PPA clusters (both at p = 0.02). Despite this latter significant result, the weak relationship between PPA classification accuracy and symptom severity as measured by the ADOS suggests a degree of specificity for the fusiform areas. Classification performance within the 23-voxel cluster near the calcarine sulcus, and within the approximate BA17 region, was not significantly correlated with ADOS total or social scores (all p > 0.7), suggesting the significant relationships in the fusiform regions do not result from differences in basic visual processing. This is also evidence against scanner motion acting as a mediating factor in the significant relationships with symptoms: any systematically-varying scanner motion would affect other brain areas encoding stimuli differences. Although BA17 classification performance was unexpectedly significantly negatively correlated with ADI social scores (p = 0.05), the very weak correlations with ADOS scores (p = 0.78 and p = 0.83), and a weak ADI relationship with performance in the 23-voxel occipital cluster (p = 0.68), give confidence that the strong fusiform and ADOS correlations are not because of early visual processing or motion differences. Motion effects are further ruled out by the very weak correlations between scanner movement and classification performance in all the fusiform regions (coordinate-defined spheres: r = -0.06, p = 0.84; control-group right face activation: r = -0.14, p = 0.66; control-group left face activation: r = 0.00, p > 0.99; right hypoactive cluster: r = -0.07, p = 0.84).
Additionally, neither age (coordinate-defined spheres: r = 0.04, p = 0.90; control-group right face activation: r = 0.13, p = 0.69; control-group left face activation: r = -0.11, p = 0.74; hypoactive cluster: r = 0.16, p = 0.63) nor IQ (coordinate-defined spheres: r = 0.37, p = 0.24; control-group right face activation: r = 0.47, p = 0.12; control-group left face activation: r = 0.18, p = 0.58; hypoactive cluster: r = 0.42, p = 0.17) was significantly correlated with performance in the fusiform regions, suggesting these variables were not driving the significant effects. Finally, we examined the signal-to-noise ratio (SNR) to ensure that the relationships between classification performance and symptom severity were not driven by a systematically lower SNR in participants with greater symptom severity. For each ASD individual, we calculated a value for the SNR by dividing the mean baseline (an estimate of the signal) by the standard deviation of the residual time series (an estimate of the noise). There were no significant relationships between symptom severity, measured through the ADOS social scores, and mean SNRs in the VT lobe (r = 0.03, p = 0.93), right control-group face activation (r = -0.36, p = 0.25), left control-group face activation (r = -0.46, p = 0.13) or right hypoactive cluster (r = -0.31, p = 0.33). The SNR in the coordinate-defined spheres was close to being significantly related to symptom severity (r = -0.57, p = 0.055); however, the weak relationships for the other regions suggest that systematic differences in SNR cannot account for the MVPA – symptom severity relationships.
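The SNR definition above amounts to a simple ratio per voxel, averaged over a region. A minimal sketch follows, with hypothetical function names and invented numbers; the use of the population standard deviation and the averaging of per-voxel SNRs across a region are assumptions, since the text does not specify them.

```python
from statistics import mean, pstdev

def voxel_snr(baseline, residuals):
    """Mean baseline signal divided by the standard deviation of the residual time series.

    Assumption: population standard deviation (pstdev); the source does not say which form was used.
    """
    return mean(baseline) / pstdev(residuals)

def region_snr(voxels):
    """Average the per-voxel SNR over a region, given (baseline, residuals) pairs."""
    return mean(voxel_snr(b, r) for b, r in voxels)

# Invented values for two voxels (not study data).
voxels = [
    ([100.0, 102.0, 98.0, 100.0], [1.0, -1.0, 1.0, -1.0]),  # SNR = 100 / 1
    ([200.0, 200.0, 200.0, 200.0], [2.0, -2.0, 2.0, -2.0]),  # SNR = 200 / 2
]
snr_value = region_snr(voxels)
print(snr_value)
```

Each participant's regional value would then be correlated with the ADOS social scores in the same way as the classification accuracies.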
Scores on the Benton face recognition task, where higher scores indicate greater face recognition ability, were not significantly related to classification performance in the coordinate-defined spheres (ASD: r = 0.29, p = 0.35; controls: r = -0.32, p = 0.31) or the left cluster of control-group face activation (ASD: r = 0.46, p = 0.13), but approached significance in the right cluster of control-group face activation (ASD: r = 0.54, p = 0.07) and right hypoactive cluster (ASD: r = 0.56, p = 0.06; controls: r = -0.28, p = 0.37) for ASD participants.
Behavioral performance for the in-scan ‘same vs. different’ task with neutral faces was not significantly correlated with classification performance in the fusiform regions for ASD participants (coordinate-defined spheres: r = 0.10, p = 0.76; control-group right face activation: r = -0.23, p = 0.47; control-group left face activation: r = -0.33, p = 0.30; right hypoactive cluster: r = -0.15, p = 0.63) or controls (coordinate-defined spheres: r = -0.27, p = 0.40; right hypoactive cluster: r = 0.05, p = 0.88). It is possible that these weak correlations are due to behavioral performance approaching ceiling, although controls (face M = 0.88, s.d. = 0.07; house M = 0.96, s.d. = 0.03) showed greater task accuracy than ASD participants (face M = 0.80, s.d. = 0.09; house M = 0.92, s.d. = 0.05) for faces (t22 = 2.34, p = 0.03) and houses (t22 = 2.66, p = 0.01). It is also possible that the influence of the passive viewing task activation on the classification results dilutes a link between behavioral and classifier performance.
We also explored whether lower classification accuracies in more severely-affected participants result from greater variability in their multi-voxel face patterns, or because their face and house patterns are less discriminable (more positively correlated). To examine this, we performed within- and between- category correlation analyses in the fusiform regions with MVPA – symptom relationships. The right hypoactive region’s face patterns were significantly or close-to-significantly less correlated with average face patterns in individuals with increased symptom severity (ADOS social: r = -0.63, p = 0.03; ADI social: r = -0.64, p = 0.03; ADOS total: r = -0.54, p = 0.07). There were no significant relationships between house correlations and ADOS total or social scores, with just a trend for ADI social scores (r = -0.55, p = 0.07). The between-category correlations were more positive (reflecting less discriminable patterns) in individuals with higher ADI social scores (r = 0.71, p = 0.01; r = 0.69, p = 0.01), but this did not reach significance for the ADOS total or social scores. In the right cluster of control-group face activation, only the ADI scores gave significant relationships: increased symptom severity was associated with less correlated within-category face patterns (r = -0.63, p = 0.03) and higher between-category correlations (r = 0.78, p = 0.003; r = 0.73, p = 0.01), with a weak trend for house patterns (r = -0.52, p = 0.09). The coordinate-defined spheres had no significant symptom relationships for within- or between-category correlations, excepting between house correlations and ADOS total scores (r = -0.59, p = 0.04), although this was not significant for the ADOS or ADI social scores.
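The two quantities contrasted here can be sketched as follows. This is an illustration under stated assumptions rather than the procedure from the Methods: within-category consistency is taken as the mean correlation of each pattern with the average of the other patterns in its category (a leave-one-out choice made here to avoid circularity), between-category similarity is taken as the mean correlation of each face pattern with the average house pattern, and all names and values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def average_pattern(patterns):
    """Voxel-wise mean across a list of multi-voxel patterns."""
    return [sum(col) / len(col) for col in zip(*patterns)]

def within_category(patterns):
    """Mean correlation of each pattern with the average of the other patterns."""
    rs = [
        pearson(p, average_pattern(patterns[:i] + patterns[i + 1:]))
        for i, p in enumerate(patterns)
    ]
    return sum(rs) / len(rs)

def between_category(patterns_a, patterns_b):
    """Mean correlation of each pattern in A with the average pattern of B."""
    avg_b = average_pattern(patterns_b)
    rs = [pearson(p, avg_b) for p in patterns_a]
    return sum(rs) / len(rs)

# Invented four-voxel patterns for one participant (not study data).
faces = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 2, 3, 4]]
houses = [[4, 3, 2, 1], [5, 3, 2, 1], [4, 3, 2, 2]]

face_consistency = within_category(faces)
face_house_similarity = between_category(faces, houses)
print(face_consistency, face_house_similarity)
```

Lower within-category values indicate more variable face patterns, whereas more positive between-category values indicate less discriminable face and house patterns; either could reduce classification accuracy.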
We examined regional variation of multivariate and univariate sensitivities to symptom severity in the VT lobes using a spherical searchlight analysis. We ran the searchlight technique with multivariate (face vs. house classification accuracy) and univariate (mean activation to faces minus mean activation to houses) measures for each ASD participant. The recorded values for each sphere were correlated with the ADOS social scores. The searchlight procedure was conducted using a radius of 2, 3 and then 4 voxels to examine how the results vary with searchlight size. The strength of the relationships between the searchlight measures and symptom severity was greater overall for the multivariate measure than for mean activation, as indicated by two-tailed paired t-tests on the absolute correlation coefficients of searchlights with a 2-voxel (t970 = 3.96, p < 0.001), 3-voxel (t970 = 3.06, p = 0.002) and 4-voxel radius (t970 = 2.47, p = 0.01).
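The comparison of overall sensitivity can be sketched as a paired t-test on the absolute correlation coefficients, with one pair per searchlight. The example below uses invented correlation values and a hypothetical function name; it computes only the t statistic and degrees of freedom, and the two-tailed p-value would then be read from a t distribution with n - 1 degrees of freedom (the reported 970 degrees of freedom correspond to 971 paired searchlights).

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Invented searchlight-by-symptom correlations (not study data).
mvpa_r = [-0.6, 0.4, -0.5, 0.3]
univariate_r = [0.2, -0.1, 0.3, 0.1]

# Strength of relationship is compared irrespective of sign.
t_stat, df = paired_t(
    [abs(r) for r in mvpa_r],
    [abs(r) for r in univariate_r],
)
print(f"t({df}) = {t_stat:.2f}")
```

Taking absolute values before the test asks whether one measure is more strongly related to symptoms overall, regardless of the direction of each searchlight's correlation.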
We also examined which searchlights were significantly related to symptom severity by permuting the participants’ ADOS social scores 10,000 times and computing the searchlights’ correlations for each permutation. Comparing the correlation obtained with the correct ADOS scores against this null distribution gave a map of p-values for each searchlight size, for the univariate and multivariate measures. No clusters of more than one searchlight were significantly correlated with ADOS social scores when mean activation was employed as the searchlight dependent variable, for any of the three radii (at a liberal threshold of p < 0.01). In contrast, when classification performance was employed as the searchlight measure, using a 2-voxel radius detected three searchlight clusters (of at least 2 contiguous central voxels) that were significantly related to symptom severity: a 10-voxel cluster centered in the right fusiform gyrus (center of mass: x = 39, y = -36, z = -18), a 4-voxel cluster centered in the left parahippocampal gyrus (center of mass: x = -25, y = -39, z = -13) and a 3-voxel cluster centered in the right inferior temporal gyrus (center of mass: x = 58, y = -31, z = -19). These results are shown in Figure 5. Using a 3-voxel searchlight radius revealed a 4-voxel cluster in the right fusiform gyrus (center of mass: x = 41, y = -35, z = -20) that partially overlapped with the 10-voxel cluster reported in the 2-voxel radius analysis. No significant searchlights were detected with a 4-voxel radius. We also applied a cluster-based correction for multiple comparisons, although this did not produce significant results for any of the searchlight radii. Despite this null finding, the detection of these regions at a liberal threshold, using MVPA results but not the univariate measure, suggests that symptoms are more strongly related to the MVPA results than to the univariate measure employed here.
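For a single searchlight, the permutation procedure can be sketched as below. The values are invented for illustration and are not study data; the function names are hypothetical, and the two-tailed comparison on the absolute correlation and the +1 correction in the p-value are assumptions rather than details taken from the text.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def permutation_correlation(measure, scores, n_perm=10000, seed=0):
    """Observed correlation and a two-tailed permutation p-value.

    The clinical scores are shuffled across participants to build the null distribution.
    """
    rng = random.Random(seed)
    observed = pearson(measure, scores)
    shuffled = list(scores)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(measure, shuffled)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Invented values for 12 participants (not study data).
accuracy = [0.82, 0.80, 0.79, 0.75, 0.76, 0.72, 0.70, 0.71, 0.66, 0.65, 0.62, 0.60]
severity = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

r, p = permutation_correlation(accuracy, severity)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

Repeating this for every searchlight, with the same set of permutations, yields the map of p-values described above.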
The primary aim of this study was to examine if MVPA measures can be sensitive to patient symptom severity. We found that classification performance, a multivariate measure of separability for the face and house fMRI patterns, was strongly related to standard measures of clinical severity in ASD participants. Specifically, in both anatomically and functionally defined clusters of right fusiform voxels, classification performance was significantly negatively correlated with symptom severity, while mean activation levels were not. This greater sensitivity of pattern analyses extended to voxels that were defined using differences in univariate measures. Assessments of the PPA region showed that this sensitivity is not a general property of VT cortex. Analyzing two occipital areas additionally confirmed that the finding did not generalize to activity patterns involved in early visual processing. A searchlight analysis across the ventral temporal lobes detected regions where classification performance was significantly related to symptom severity, which were not detected using the searchlights’ mean activation levels, although only when a liberal threshold was employed.
We have provided an example of obtaining these benefits from a functional dataset not designed with MVPA in mind. By combining multiple face conditions into one face class, we were able to utilize a large number of trials for classifier training and testing. Although the design of the experiment placed limitations on the conclusions that can be drawn from the results (discussed below), multivariate classification performance was still more sensitively related to symptom severity than the univariate measures we employed.
The findings in this study provide, to the best of our knowledge, the first evidence that MVPA functional results can reliably predict clinical symptoms. The sensitivity to clinical severity obtained using MVPA supports the idea that subtle variations in activity patterns, reflected in MVPA results, can in some cases more sensitively reflect individual variation in an area’s functional characteristics, than certain measures of mean activation. It is noteworthy that this stronger link to symptom severity was also found in a set of voxels that was defined based on a univariate statistic (i.e., a significant difference in activation values between groups). This latter finding gives confidence that the greater sensitivity does not reflect the activity patterns of voxels that are separate from those demonstrating univariate differences.
The greater sensitivity to individual differences reported here will be of interest to researchers who are involved in characterizing variation across participants in a wide range of fields. Among clinical investigators, this interest may even extend to those looking to select individuals for future interventions. As Scherf et al. (2010) noted when discussing the failure to find a significant relationship between (univariate) fusiform gyrus activation and ADOS scores, “such predictability could have substantial implications for identifying individuals who might benefit from a behavioral intervention designed to improve face processing” (p.13). Future research will be required to establish if the sensitivity reported here extends to other patient groups, and to other regions with reduced univariate activation.
The findings reported in this study are also relevant to clinical researchers looking to make the most of existing functional datasets. The detection of regions with patterns of activity that reflect variations in patient symptoms, without a corresponding significant univariate relationship, suggests the encouraging possibility that additional regions of interest may be identifiable in previously-collected datasets. In this dataset, we found that symptom severity was related to face vs. house classification performance in a coordinate-defined area (based on face activity coordinates in prior literature), which neighbors hypoactivation in this particular group of ASD participants. A significant relationship here suggests that areas of the fusiform gyrus, without a significant difference in univariate activation, may nevertheless show activity patterns that vary systematically with symptom severity. This may be expected from an activity pattern perspective, where a significant group difference in univariate activation can be conceptualized as two very distinct activity patterns. Variations in face perception-related activity may still be present in nearby areas of cortex, even if undetectable with univariate techniques.
Linking multivariate searchlight results to individual differences gives further potential for revealing new regions of interest. In the context of this dataset, our searchlight finding of a symptom relationship in the inferior temporal gyrus (ITG) fits with several previous studies that have suggested the region may play a role in face processing in this patient group (e.g., Koshino et al., 2008; Schultz et al., 2000). Some univariate studies have not detected ITG involvement (e.g., Pierce et al., 2001), raising the possibility that systematic differences in ITG activity – differences that may not always be detectable with univariate analyses – could have been present, undetected, in the functional data of such studies. Although the identified searchlight locations have backing from prior literature, the failure to detect these regions at a more stringent threshold means these results should be interpreted with some caution. Despite this caveat, the multivariate searchlight approach has the potential to highlight new regions that are functionally related to patient symptoms.
Our examination of within- and between- category correlations may be of interest to investigators exploring the basis for MVPA–symptom relationships. In this dataset, the within-participant consistency of multi-voxel face patterns was lower in individuals with greater symptom severity in the hypoactive fusiform region. This may be an underlying factor in the MVPA–symptom relationships, although our finding that greater symptom severity, when measured by the ADI, is accompanied by less discriminable face and house patterns is also suggestive. Future studies may wish to examine further the relative contributions of face pattern consistency and face / non-face discriminability, including whether the variations in multi-voxel face patterns reflect larger differences across face types (which varied by run here), or within face types.
Although in this particular context we found multivariate measures to be a strong predictor of symptoms, there may be other contexts and questions where univariate measures are more sensitive. In a recent MVPA study, Quamme et al. (2010) reported that univariate measures were more sensitive than MVPA to task behavior, at the group level, in several of the regions they examined. As Quamme and colleagues described, the complexity of MVPA is accompanied by a vulnerability to over-fitting noise, which is less likely to occur with across-voxel averages (Quamme et al., 2010). It is therefore very possible that univariate measures could provide a more sensitive measure of individual differences than MVPA in some circumstances. For this reason, it should not be concluded from this paper that MVPA will always be a more sensitive measure for tracking functionally-relevant individual differences, but that in certain circumstances it can be. We further note that a variety of univariate measures are available for fMRI analyses. Although we found that MVPA results were a stronger predictor of symptom severity than the univariate measure we employed (mean face activation – mean house activation), this may not apply to all univariate measures. This is additionally relevant as the MVPA and univariate measures employed here differ in the number of free parameters. Overall, we view the univariate and multivariate approaches as complementary, with each adding its own value.
Despite the findings we report here, it must be acknowledged that the design of the original fMRI study places a limit on interpretations. Specifically, the inclusion of two stimulus categories, faces and houses, limits the condition-specific conclusions that can be drawn, as discussed in the introduction. Future studies of neural differences in face processing may consider including additional stimulus classes, such as scrambled images and object categories (as in Haxby et al., 2001; Spiridon and Kanwisher, 2002). Assessing classifications of faces vs. non-faces, alongside the classification of different non-face categories, would further investigations of face-specific activity patterns in ASD individuals. Employing alternative processing and classification methods may also contribute additional insights (see O’Toole et al. (2007) for a discussion of different classification approaches). Another approach for future research would be to perform MVPA within anatomically-defined ROIs. For example, ASD symptom severity should be correlated with face-related classification performance in the fusiform, but not parahippocampal, gyrus. As PPA activity can sometimes extend into the fusiform gyrus (Epstein and Kanwisher, 1998), it may be desirable to restrict such anatomically-defined fusiform ROIs to certain sub-sections of the gyrus for face vs. house classifications, or expand the non-face classes as discussed above. Future studies may also wish to use MVPA to study activity pattern variations for different face identities in individuals with an ASD. Investigating the nature of the multi-voxel patterns generated by different faces in this group could advance our understanding of their face processing differences.
We have shown that MVPA can act as a sensitive fMRI predictor of patient symptoms. We believe this study highlights an important use of MVPA techniques for the study of autism and other clinical conditions. The application of pattern analysis techniques to patient differences is still in its infancy, but this investigation shows that the approach has the potential to measure clinically relevant patterns. Furthermore, MVPA combined with mapping techniques can identify brain regions that may not be revealed with certain univariate approaches.
This work was supported by NIMH grant R01MH073084 (R.T. Schultz), with further support from NIH grant R01MH070850 (S. L. Thompson-Schill). We thank members of the Thompson-Schill lab and CHOP Center for Autism Research for helpful discussions. We thank Lauren Hallion for valuable comments on an earlier version of the manuscript, and the anonymous reviewers for their insightful suggestions.
1. For example, recent evidence has suggested that hypoactivation may be less robust for familiar faces (Pierce et al., 2004; Pierce and Redcay, 2008), but this is not always found (Grelotti et al., 2005). A further debate concerns the source of fusiform signal variance in ASD patients: while some studies have suggested that individual differences in looking behavior may drive signal variance (Bookheimer et al., 2008; Dalton et al., 2005; Hadjikhani et al., 2004; Hadjikhani et al., 2007; Perlman et al., 2010), the paradigms used in several of these studies have been critiqued (Klin, 2008; Schultz et al., 2008) and not all investigations have been able to confirm this (Humphreys et al., 2008; Schultz et al., 2008).
2. The original experiment also contained a fifth run investigating image fixation on a cross hair presented between the eyes on each face, and a sixth run with overlapping faces and houses, which were not analyzed here.