Object selectivity in human visual cortex
Eleven subjects (6 male, 9 right-handed, 12 to 34 years old) were presented with digital photographs of objects belonging to 5 categories (“animals”, “chairs”, “human faces”, “fruits” and “vehicles”) while we recorded electrophysiological activity from 48 to 126 intracranial electrodes implanted to localize epileptic seizure foci (Experimental Procedures). There were 5 different exemplars per category (Figure S1B). Images were grayscale and contrast-normalized, and were presented for 200 ms in pseudo-random order with a 600 ms interval between images (Figure S1A) while subjects performed a one-back task. We first describe the physiological responses from two example recording sites. An electrode located in the left temporal pole (Talairach coordinates: −27.0, −11.8, −35.3; area number 43 in Table S2) showed differential activity across object categories, with an enhanced response to images from the “fruits” category (black trace). To quantify the degree of selectivity, we defined the intracranial field potential (IFP) response as the range of the IFP signal, that is, max(IFP) − min(IFP), in a time window from 50 to 300 ms after stimulus onset. This window was chosen to account for the approximate latency of the neural response (Richmond et al., 1983; Schmolesky et al., 1998) while remaining within the typical interval between saccades (Rayner, 1998). We note that in many cases the selectivity extended well past the 300 ms boundary (e.g., see Figure S1 and F for two particularly clear examples). Throughout the manuscript, we focus on the early part of the physiological responses (see Discussion). We also quantitatively compared different time windows and other possible definitions of the IFP response (see Figure S3 and Supplementary Material). For the temporal pole electrode, a one-way ANOVA across categories on the IFP responses yielded p < 10⁻⁶ (Experimental Procedures). The selective response was consistent across repetitions (z-score = 3.2, Supplementary Material) and across the different exemplars within each category. When we ranked the objects according to the IFP response, the 5 best objects were the 5 exemplars from the “fruits” category (bootstrap p < 10⁻⁵, Experimental Procedures). Therefore, the category selectivity in this electrode cannot be ascribed to responses to only one or a few particular exemplars. A second example electrode, located in the left inferior occipital gyrus (Talairach coordinates: −48.8, −69.1, −11.8; area number 14 in Table S2) in a different subject, showed stronger responses to images from the “human faces” category (blue trace). More examples are shown in Figure S1D–G.
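The response definition and the across-category ANOVA can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the IFP is stored as a trials × samples NumPy array with a known sampling rate and stimulus-onset sample, and the function names (`ifp_response`, `one_way_anova_f`) and the synthetic data are hypothetical. Only the F statistic is computed; a p-value would additionally require the F distribution (for example, `scipy.stats.f_oneway` returns both).

```python
import numpy as np


def ifp_response(ifp, fs, onset_idx=0, t_start=0.050, t_stop=0.300):
    """Per-trial IFP response: max(IFP) - min(IFP) in [t_start, t_stop) s after onset.

    ifp: array of shape (n_trials, n_samples); fs: sampling rate in Hz;
    onset_idx: sample index of stimulus onset (a hypothetical data layout).
    """
    i0 = onset_idx + int(round(t_start * fs))
    i1 = onset_idx + int(round(t_stop * fs))
    window = np.asarray(ifp, dtype=float)[:, i0:i1]
    return window.max(axis=1) - window.min(axis=1)


def one_way_anova_f(responses, labels):
    """One-way ANOVA F statistic of the responses grouped by category label."""
    responses = np.asarray(responses, dtype=float)
    labels = np.asarray(labels)
    groups = [responses[labels == g] for g in np.unique(labels)]
    k, n = len(groups), responses.size
    grand_mean = responses.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Synthetic example: 5 categories x 20 trials, 1 kHz sampling, onset at sample 100.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 20)
ifp = rng.normal(0.0, 1.0, size=(labels.size, 500))
ifp[labels == 3, 250] += 10.0   # extra deflection 150 ms after onset for category 3
resp = ifp_response(ifp, fs=1000, onset_idx=100)
print("F =", one_way_anova_f(resp, labels))
```

Because the response is a single number per trial, the same vector `resp` can feed both the ANOVA and the single-trial classifiers described next.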
While many studies have focused on analyzing responses averaged across multiple repetitions, the brain needs to discriminate object information in real time, from single trials. This imposes a strict criterion for selectivity. We used a statistical learning approach (Bishop, 1995; Hung et al., 2005) to quantify whether the information available from IFPs in single trials was sufficient to distinguish one category from the other categories (Experimental Procedures). For the temporal pole electrode, we compared the distribution of IFP responses on trials in which a fruit was presented (black trace) with that on all other trials (gray trace), and computed the corresponding ROC curve (Green and Swets, 1966; Kreiman et al., 2000). We used a Support Vector Machine (SVM) classifier with a linear kernel to compute the proportion of single trials in which we could detect the preferred category (in this example, fruits). Other ways of defining selectivity, including a one-way ANOVA on the IFP responses (Kreiman et al., 2000), a point-by-point ANOVA (Thorpe et al., 1996) and other statistical classifiers (Bishop, 1995), yielded similar results (Supplementary Material). The repetitions used to train the classifier were kept separate from those used to test its performance, so that overfitting could not inflate the performance estimates. The output of the classifier is a performance value (labeled “Classification Performance” throughout the text) that reaches 100% if all test trials are correctly classified and is 50% at chance. For the two example electrodes, we could decode the preferred object categories (“fruits” and “human faces”, respectively) from single-trial IFP responses with performance levels of 69±6% and 65±5%, respectively.
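A minimal sketch of single-trial decoding for one electrode follows, assuming one feature per trial (the IFP response) and the binary problem “preferred category versus all other categories”. The linear SVM is fit by plain subgradient descent on the hinge-loss objective rather than with a dedicated SVM library, and the two classes are weighted equally during training; performance is summarized as the mean of the per-class hit rates, so chance is 50% even though only 1 in 5 trials shows the preferred category. These choices, the hyperparameters and all function names are assumptions for illustration. The ROC area is computed as the probability that a preferred-category trial has a larger response than a non-preferred trial.

```python
import numpy as np


def roc_auc(pos, neg):
    """Area under the ROC curve: probability that a preferred-category trial has a
    larger response than a non-preferred trial (ties count as 1/2)."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()


def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=1000):
    """Linear SVM fit by full-batch subgradient descent on the class-weighted primal
    objective lam/2 * ||w||^2 + mean_i(s_i * max(0, 1 - y_i * (w . x_i + b))).
    X: (n_trials, n_features); y: labels in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    # Equal total weight per class, so the more numerous "other category"
    # trials do not dominate the loss.
    s = np.where(y > 0, n / (2 * (y > 0).sum()), n / (2 * (y < 0).sum()))
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        inside = y * (X @ w + b) < 1          # trials violating the margin
        coef = (s * y)[inside]
        w -= lr * (lam * w - coef @ X[inside] / n)
        b += lr * coef.sum() / n
    return w, b


def balanced_accuracy(y_true, y_pred):
    """Mean per-class hit rate: 1.0 if every test trial is correct, ~0.5 at chance."""
    return np.mean([(y_pred[y_true == c] == c).mean() for c in (-1, 1)])


# Synthetic electrode: 40 preferred-category trials vs. 160 trials of the other
# 4 categories, one feature (the IFP response) per trial. Values are made up.
rng = np.random.default_rng(1)
y = np.repeat([1, -1], [40, 160])
X = rng.normal(size=(y.size, 1)) + 0.8 * (y[:, None] > 0)
train = np.arange(y.size) % 2 == 0                      # disjoint train / test trials
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)    # training data only
w, b = train_linear_svm((X[train] - mu) / sd, y[train])
pred = np.where(((X[~train] - mu) / sd) @ w + b >= 0, 1, -1)
print("classification performance:", balanced_accuracy(y[~train], pred))
print("ROC area:", roc_auc(X[y > 0, 0], X[y < 0, 0]))
```

Standardizing with statistics from the training trials only keeps the test trials fully unseen, in the spirit of the train/test separation described above.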
We recorded data from 912 electrodes in 11 subjects (see Table S1 for the distribution of recording site locations). To determine whether a given classification performance value was significantly different from chance, we used a bootstrap procedure in which we randomly shuffled the object category labels (Supplementary Material). Based on this procedure, we considered a performance value significant if it exceeded the mean of the null distribution by more than 3 standard deviations (Figure S2A). We observed selective responses in at least one electrode in every subject. Most of the selective electrodes (67%) responded preferentially to one rather than multiple object categories (Figure S2B). Across all electrodes, we observed selective responses to each of the 5 object categories tested (Figure S2C), expanding on previous IFP reports that described mostly selectivity to faces (McCarthy et al., 1999). The distribution of selective responses across categories was not uniform: more electrodes responded to the “human faces” category than to any of the other four categories (Figure S2C). We note that observing an electrode that is selective for a particular object category does not necessarily imply the existence of a “brain area” devoted exclusively to that category (see Discussion). Classification performance values ranged from 58% to 82% (Figure S2D; 61%±4%, mean±s.d.). Other statistical classifiers and other definitions of the IFP response yielded similar conclusions (Figure S3 and Supplementary Material).
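The shuffle-based significance threshold can be illustrated as follows. To keep the example short and fast, a nearest-class-mean rule on the one-dimensional IFP response stands in for the SVM; the function names, the number of shuffles and the synthetic data are assumptions, and only the logic (permute the category labels, recompute performance, take the null mean plus 3 standard deviations) follows the text.

```python
import numpy as np


def nearest_centroid_performance(x, y, train):
    """Mean per-class hit rate of a nearest-class-mean rule on 1-D responses x.

    y: 1 for the preferred category, 0 otherwise; the class means are estimated
    on the `train` mask and performance is evaluated on the remaining trials.
    (A stand-in for the SVM, used here only to keep the example short.)
    """
    m_pos = x[train & (y == 1)].mean()
    m_neg = x[train & (y == 0)].mean()
    test = ~train
    pred = (np.abs(x[test] - m_pos) < np.abs(x[test] - m_neg)).astype(int)
    yt = y[test]
    return 0.5 * ((pred[yt == 1] == 1).mean() + (pred[yt == 0] == 0).mean())


def shuffle_threshold(x, y, train, n_shuffles=1000, n_sd=3.0, seed=0):
    """Performance threshold = mean + n_sd * s.d. of the null distribution obtained
    by randomly permuting the category labels."""
    rng = np.random.default_rng(seed)
    null = np.array([nearest_centroid_performance(x, rng.permutation(y), train)
                     for _ in range(n_shuffles)])
    return null.mean() + n_sd * null.std()


# Synthetic electrode: 40 "preferred" and 160 "other" trials (made-up numbers).
rng = np.random.default_rng(2)
y = np.repeat([1, 0], [40, 160])
x = rng.normal(size=y.size) + 1.0 * y
train = np.arange(y.size) % 2 == 0
perf = nearest_centroid_performance(x, y, train)
threshold = shuffle_threshold(x, y, train, n_shuffles=200)
print(f"performance={perf:.2f}, threshold={threshold:.2f}, significant={perf > threshold}")
```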
As illustrated by the two example electrodes, the category selectivity could not be explained by preferences for individual exemplars. Very few electrodes showed statistically significant performance when the classifier was trained to discriminate among individual exemplars instead of across categories (Supplementary Material). Classification performance also dropped significantly when each category was arbitrarily defined by randomly drawing 5 of the 25 exemplars (Supplementary Material). We also asked whether the physiological differences across object categories could be explained by low-level image properties shared by the exemplars of a given category. To address this question, we considered fifteen image properties (including contrast and the number of bright or dark boxes) and, for each electrode, computed the correlation coefficient between the IFP responses and each image property. Although a few electrodes showed significant correlations, on average the low-level image properties that yielded the strongest correlations accounted for no more than 10% of the variance in the IFP responses (Supplementary Material).
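The low-level control amounts to correlating an electrode's responses with each image property and squaring the strongest correlation, since for a single linear predictor r² is the fraction of variance explained. The sketch below assumes one response value per image (for example, averaged over repetitions) and uses random placeholder values for the fifteen properties, whose exact definitions are not given here; the function name is hypothetical.

```python
import numpy as np


def max_variance_explained(responses, properties):
    """responses: (n_images,) IFP response per image;
    properties: (n_images, n_properties) low-level image descriptors.
    Returns the index of the most correlated property, its Pearson r, and r**2,
    the fraction of response variance explained by a linear fit on that property."""
    responses = np.asarray(responses, dtype=float)
    r = np.array([np.corrcoef(responses, properties[:, j])[0, 1]
                  for j in range(properties.shape[1])])
    best = int(np.argmax(r ** 2))
    return best, r[best], r[best] ** 2


# Hypothetical data: 25 images x 15 made-up image properties.
rng = np.random.default_rng(4)
props = rng.normal(size=(25, 15))
resp = rng.normal(size=25)
best, r, r2 = max_variance_explained(resp, props)
print(f"best property #{best}: r = {r:.2f}, variance explained = {100 * r2:.0f}%")
```

With only 25 images and fifteen candidate properties, the largest r² is biased upward by chance, so comparing it against a label-shuffled null is a sensible safeguard.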
Of the 912 electrodes, 111 (12%) showed selectivity. The proportion of selective electrodes differed across lobes (occipital: 35%, temporal: 14%, frontal: 7%, parietal: 4%). To determine the location of the electrodes, we co-registered the pre-operative MR images with the post-operative CT images and mapped each electrode onto one of 80 possible brain locations (Table S2). The areas with the highest proportions of selective electrodes were the inferior occipital gyrus (86%, area 14 in Table S2), fusiform gyrus (38%, area 17), parahippocampal portion of the medial temporal lobe (26%, area 19), mid-occipital gyrus (22%, area 15), lingual gyrus in the medial temporal lobe (21%, area 18), inferior temporal cortex (18%, area 31) and temporal pole (14%, area 43). The number of selective electrodes in each of the areas that we studied is shown in Table S1.
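Tabulating the proportion of selective electrodes per area is simple bookkeeping once each electrode has been assigned an area number and a selectivity flag. The sketch below assumes exactly that input; the function name and the toy inputs are hypothetical. The first printed line reproduces the overall proportion quoted above.

```python
from collections import Counter


def selective_fraction_by_area(areas, selective):
    """areas: area number of each electrode (hypothetical codes, e.g. from an atlas table);
    selective: True/False per electrode.
    Returns {area: (n_selective, n_total, fraction_selective)}, sorted by area."""
    total = Counter(areas)
    hits = Counter(a for a, s in zip(areas, selective) if s)
    return {a: (hits[a], n, hits[a] / n) for a, n in sorted(total.items())}


# Overall proportion reported in the text: 111 of 912 electrodes.
print(f"{100 * 111 / 912:.1f}%")   # 12.2%
print(selective_fraction_by_area([14, 14, 17, 17, 17], [True, True, True, False, False]))
```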
The analyses reported thus far describe the selectivity of individual electrodes, but the brain has access to information from many different locations. To quantify how well we could decode information from the ensemble activity across electrodes or across brain regions, we constructed a vector containing the responses of multiple electrodes and used the same statistical classifier approach described above (see Hung et al., 2005, and Experimental Procedures). Data from multiple electrodes were concatenated under the assumption that their activity was independent (Supplementary Material). Figure S4A shows one way to build such an ensemble vector: selecting electrodes based on their rv value, that is, the ratio of the variance across object categories to the variance within object categories. Electrode selection was performed using only the training data. With an ensemble of 11 electrodes selected in this way, classification performance ranged from 74% to 95%. These values correspond to binary classification for each object category, where chance performance is 50%; the values for multiclass classification, where chance performance is 20%, are shown in Figure S4B. These values provide only a lower bound on the information available in the neural ensemble, because adding more electrodes, using more complex non-linear classifiers and incorporating interactions across areas could further enhance classification performance. To further evaluate the spatial specificity of the selective responses, we built neural ensembles by drawing electrodes based on their location (Table S1). The regions that yielded the highest classification performance included the inferior occipital gyrus (area 14 in Table S2), fusiform gyrus (area 17), lingual gyrus (area 18), parahippocampal gyrus (area 19), inferior temporal cortex (area 31) and temporal pole (area 43). The classification performance values for all areas are shown in Figure S5. These ensemble results are consistent with the location results presented above for individual electrodes.
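Electrode selection by rv on training data can be sketched as follows. Here rv is computed as the variance of the category means divided by the mean within-category variance, which is one plausible reading of the definition above; the function names, the default k = 11 and the synthetic pseudo-population are assumptions. The sketch also assumes that trials from different electrodes have already been aligned by category into one ensemble vector per trial, as the independence assumption permits.

```python
import numpy as np


def rv_values(X, labels):
    """X: (n_trials, n_electrodes) IFP responses; labels: category of each trial.
    rv per electrode = variance of the category means / mean within-category variance."""
    cats = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in cats])
    within = np.array([X[labels == c].var(axis=0) for c in cats]).mean(axis=0)
    return means.var(axis=0) / within


def select_ensemble(X_train, y_train, k=11):
    """Indices of the k electrodes with the largest rv, using training trials only."""
    return np.argsort(rv_values(X_train, y_train))[::-1][:k]


# Hypothetical pseudo-population: 200 trials (5 categories x 40) x 30 electrodes,
# where only electrodes 0-4 carry category information.
rng = np.random.default_rng(6)
labels = np.repeat(np.arange(5), 40)
X = rng.normal(size=(labels.size, 30))
X[:, :5] += 1.5 * (labels[:, None] == 2)       # electrodes 0-4 prefer category 2
train = np.arange(labels.size) % 2 == 0
chosen = select_ensemble(X[train], labels[train], k=11)
ensemble_train, ensemble_test = X[train][:, chosen], X[~train][:, chosen]
print(sorted(chosen.tolist()))
```

The selected columns (`ensemble_train`, `ensemble_test`) would then be passed to the same classifier used for single electrodes; selecting on the test trials as well would leak information and inflate performance.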
Timing of selective responses
Intracranial field potentials have a temporal resolution of milliseconds. We therefore used the detailed time course of the IFP to estimate the latency of the selective responses, following the procedure of Thorpe et al. (1996): (i) we defined an electrode to be “selective” if there were 25 consecutive bins (bin size = 2 ms at BWH and 3.9 ms at CHB) in which a one-way ANOVA on the IFP responses across object categories yielded p < 0.01; (ii) for the selective electrodes, we defined the latency as the earliest time point at which 10 consecutive bins yielded p < 0.01 in the same analysis (Figure S6A). The latencies depend on these threshold parameters, as described in Figure S6B–C. The values 25, 10 and 0.01 are conservative cut-offs that yield a low probability of false alarms. This selectivity criterion yielded 94 selective electrodes (cf. the 111 selective electrodes reported above); when the category labels were shuffled (1000 shuffles), no electrode passed the criterion. The mean latency across the selective electrodes was 115±44 ms. Latencies for electrodes in the inferior occipital gyrus (area 14 in Table S2) were shorter than those for electrodes in temporal lobe locations (Figure S6D). To evaluate the earliest time point at which we could reliably decode category information in single trials, we binned the IFP responses using 25 ms windows and computed the IFP range in each bin. We then used an SVM classifier to decode category information in each time bin. This is a demanding task for the classifier, because only a very small amount of data (25 ms) was used to train the classifier and to test its performance (see Figure S7A–C for results for each object category with bin sizes of 25, 50 and 100 ms). In spite of the task difficulty, we could discriminate category information as early as ~100 ms after stimulus onset. The latencies and decoding results are compatible with single neuron studies in macaque inferior temporal cortex (Hung et al., 2005; Keysers et al., 2001), with psychophysical observations in humans (Potter and Levy, 1969) and with scalp EEG recordings in humans (Thorpe et al., 1996).
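The two latency criteria reduce to searching for runs of consecutive significant bins. The sketch below assumes that a per-bin p-value from the across-category ANOVA is already available (for example, from `scipy.stats.f_oneway` applied bin by bin) and that the first bin is aligned with stimulus onset unless an offset is given; the function names are hypothetical.

```python
import numpy as np


def first_run_start(mask, run_length):
    """Index of the first bin that starts `run_length` consecutive True bins, else None."""
    count = 0
    for i, flag in enumerate(mask):
        count = count + 1 if flag else 0
        if count == run_length:
            return i - run_length + 1
    return None


def selectivity_latency(p_values, bin_ms, alpha=0.01, n_select=25, n_latency=10,
                        t0_ms=0.0):
    """Thorpe-style criteria applied to per-bin ANOVA p-values.

    The electrode counts as selective if at least `n_select` consecutive bins have
    p < alpha; its latency is the start of the earliest run of `n_latency`
    consecutive significant bins. Returns the latency in ms relative to stimulus
    onset (t0_ms is the time of the first bin), or None if not selective.
    """
    significant = np.asarray(p_values) < alpha
    if first_run_start(significant, n_select) is None:
        return None
    return t0_ms + first_run_start(significant, n_latency) * bin_ms


# Hypothetical p-values for one electrode recorded with 3.9 ms bins (CHB),
# first bin at stimulus onset.
p = np.ones(120)
p[30:60] = 0.001
print(selectivity_latency(p, bin_ms=3.9))   # latency = start of the run at bin 30
```

Note that, as in criterion (ii), the 10-bin run that defines the latency need not be part of the 25-bin run that makes the electrode selective.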
Invariance to scale and viewpoint
A key aspect of visual recognition involves not only discriminating among different objects but also achieving invariance to object transformations (Desimone et al., 1984; Hung et al., 2005; Logothetis et al., 1995; Riesenhuber and Poggio, 1999; Rolls, 1991). We considered two important and typical object transformations: scaling and depth rotation. Each exemplar object was presented in a “default” rotation and size (3 degrees of visual angle) and also at two additional rotations (~45 and ~90 degrees) and two additional scales (1.5 and 6 degrees of visual angle; Figure S1C). Except in the case of human faces, the definition of a “default” viewpoint is arbitrary (these “default” viewpoints are shown in Figure S1B). We asked whether selective responses, such as those of the two example electrodes described above, would be maintained in spite of these object transformations. One recording site showed strong robustness to scale and depth rotation. This electrode, located in the parahippocampal portion of the right medial temporal lobe (Talairach coordinates: 32.6, −34.8, −13.6), showed an enhanced response to “human faces” at all the scales and depth rotations that we tested. The selective responses were consistent across repetitions, exemplars and object transformations (Figure S8). The distributions of IFP responses for the preferred and non-preferred categories, and the corresponding ROC analyses, suggested that it would be possible to discriminate the preferred category across object transformations in single trials. To evaluate the degree of extrapolation across changes in scale and depth rotation, we trained the classifier using only the IFP responses to the “default” scale and rotation and tested its performance using the IFP responses to the transformed objects (Experimental Procedures). Other train set / test set combinations yielded similar results (Supplementary Material). The decoder showed invariance, with an average classification performance of 73±8% across the four transformed conditions. Overall, invariance to scale and depth rotation was observed in at least one electrode in 10 of the 11 subjects. Forty-nine electrodes (44% of the selective electrodes) showed invariance to scale, with classification performance ranging from 60% to 82% (Figure S2F; 64%±4%, mean±s.d.). Sixty-four electrodes (58% of the selective electrodes) showed invariance to depth rotation, with classification performance ranging from 60% to 81% (Figure S2E; 65%±4%, mean±s.d.). Most of the electrodes that showed invariance to scale changes also showed invariance to depth rotation changes, and vice versa: the classification performance values for the two transformations were strongly correlated (Pearson correlation coefficient = 0.70). The majority of the electrodes that showed invariance to scale or depth rotation were located in the occipital lobe (inferior occipital gyrus, area 14 in Table S2) and in the temporal lobe (parahippocampal gyrus, area 19; inferior temporal cortex, area 31; fusiform gyrus, area 17; lingual gyrus in the medial temporal lobe, area 18; and temporal pole, area 43). The number of electrodes in each location that showed invariance to scale and depth rotation is shown in Table S1. Classification performance values for a neural ensemble of 11 electrodes ranged from 58% to above 90% (Figure S4C–D). Neural ensembles formed by electrodes in the inferior occipital gyrus or in the temporal lobe showed the strongest robustness to scale and depth rotation changes (see Figure S5 for the classification performance values for ensembles from all brain regions).
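The train-on-default, test-on-transformed scheme can be sketched as below. A nearest-class-mean classifier stands in for the linear SVM to keep the example short, and the condition labels (`"default"`, `"scale_1.5deg"` and so on) and synthetic responses are hypothetical; performance is reported separately for each transformed condition as the mean of the per-class hit rates.

```python
import numpy as np


def centroid_classifier(X_train, y_train):
    """Return a nearest-class-mean predictor (a simple stand-in for the linear SVM)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])

    def predict(X):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return classes[np.argmin(dist, axis=1)]

    return predict


def extrapolation_performance(X, y, condition, train_condition="default"):
    """Train only on trials from `train_condition`; report the mean per-class hit
    rate separately for every other condition (e.g. other scales or rotations)."""
    is_train = condition == train_condition
    predict = centroid_classifier(X[is_train], y[is_train])
    out = {}
    for cond in np.unique(condition):
        if cond == train_condition:
            continue
        m = condition == cond
        pred, yt = predict(X[m]), y[m]
        out[cond] = np.mean([(pred[yt == c] == c).mean() for c in np.unique(yt)])
    return out


# Hypothetical 1-D responses: 5 conditions x 40 trials (20 per class); each
# transformation shifts all responses but preserves the category difference.
rng = np.random.default_rng(8)
conds = np.array(["default", "scale_1.5deg", "scale_6deg", "rot_45", "rot_90"])
condition = np.repeat(conds, 40)
y = np.tile(np.repeat([1, 0], 20), 5)
offset = {c: 0.3 * i for i, c in enumerate(conds)}
X = (2.0 * y + np.array([offset[c] for c in condition]) + rng.normal(size=y.size))[:, None]
print(extrapolation_performance(X, y, condition))
```

Averaging the four returned values gives a single extrapolation score per electrode, analogous to the average over the four transformed conditions quoted above.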
We estimated the latency of the IFP responses as described in the previous section (Figure S6A), separately for each of the 5 combinations of scale and depth rotation. There was no significant difference among the latencies for the different object transformations. Robust and selective responses were apparent within ~100 ms after stimulus onset, suggesting that neural responses in the temporal lobe may not require additional time to process different scales or depth rotations. Note that this statement refers to categorizing the images based on the physiological responses and may not extrapolate to identifying individual exemplars. Consistent with the short latencies, we could also decode the object category as early as ~100 ms after stimulus onset even when extrapolating across scaled and rotated versions of the objects (see Figure S7D–F).