Recent fMRI studies suggest that cortical face processing extends well beyond the fusiform face area (FFA), including unspecified portions of the anterior temporal lobe. However, the exact location of such anterior temporal region(s), and their role during active face recognition, remain unclear. Here we demonstrate that (in addition to FFA) a small bilateral site in the anterior tip of the collateral sulcus (‘AT’; the anterior temporal face patch) is selectively activated during recognition of faces but not houses (a non-face object). Contrary to the psychophysical prediction that inverted and contrast reversed faces are processed like other non-face objects, both FFA and AT (but not other visual areas) were also activated during recognition of inverted and contrast reversed faces. However, response accuracy was better correlated with recognition-driven activity in AT than in FFA. These data support a segregated, hierarchical model of face recognition processing, extending to the anterior temporal cortex.
In humans and other primates, face recognition is vital for social interaction, in a way distinct from recognition of non-face objects. Consistent with this distinction, evidence from clinical (Moscovitch et al., 1997) and psychophysical (Farah et al., 1998; Hole et al., 1999) studies suggests that visual object recognition includes a specialized (e.g. holistic) process for recognizing faces, and a more generic (or ‘analytic’) process for recognizing non-face objects. Furthermore, specific image transformations (e.g. inversion and contrast reversal) differentially affect these two recognition mechanisms (Kemp et al., 1996; Farah et al., 1998; Hole et al., 1999). Such evidence raises several questions. First, are these face versus non-face recognition mechanisms mediated by: a) two anatomically distinct pathways, or b) a largely shared and overlapping pathway? Second, if such segregated pathways exist, are inverted and/or contrast reversed faces processed in the non-face stream, or via the face pathway (with lower accuracy)?
Based on sensory selectivity, fMRI studies have described several cortical areas that respond more to faces, compared to non-face objects, including (but not limited to) the Fusiform Face Area (FFA) (Kanwisher et al., 1997; McCarthy et al., 1997). Recently, an additional sensory-driven face-selective patch (‘AT’) was reported in the anterior temporal lobe (Tsao et al., 2008; Rajimehr et al., 2009; see also Kriegeskorte et al., 2007; Nestor et al., 2011). Across a wide range of tests, the anterior temporal lobe has been implicated in object recognition in humans (Sergent et al., 1992; Price et al., 1996; Allison et al., 1999; Kriegeskorte et al., 2008) and monkeys (Mishkin et al., 1983; Tanaka, 1997). However, less is known about activity in the anterior temporal lobe in general, and in AT in particular, during tasks such as facial recognition, and/or related tasks of object-based attention. Early fMRI studies reported that FFA activity increased during attention to faces in general (Wojciulik et al., 1998; O’Craven et al., 1999), and/or during recognition of famous/familiar faces (Grill-Spector et al., 2004). However, those studies did not test activity in the then-unknown AT. Accordingly, one goal here was to test whether this anterior temporal face area responds specifically during facial recognition tasks.
A related goal was to test both the face and non-face streams of recognition processing. The non-face recognition targets fell into two subcategories, and we tested the effects of both. First, recognition can be directed to the infinite range of objects that are obviously not faces; here we tested houses as a commonly used example. Secondly, we tested whether non-typical (transformed) face images are processed in the face or non-face streams. Familiar faces are difficult to recognize when they are inverted (Murray et al., 2000; Tanaka and Farah, 2003), or when luminance contrast is reversed (Kemp et al., 1996; Hole et al., 1999; Itier and Taylor, 2004). Many psychophysical studies suggest that such image-transformed faces may be processed outside the face recognition network (e.g. Farah et al., 1998; Rhodes et al., 2006). However, such transformed faces are otherwise identical to normal faces. Thus as an alternative, they could be processed within the standard face-selective pathways of the brain, but at lower signal strength (e.g. Valentine et al., 1991; Freiwald et al., 2009; Eimer et al., 2010). A third hypothesis suggests that recognition of these transformed faces relies on both selective and non-selective areas (e.g. Moscovitch et al., 1997; Pitcher et al., 2011). In any case, these transformed faces offer a valuable window onto the neural mechanisms underlying normal facial recognition.
17 subjects (11 female), aged 20–37 years, participated in this study. All subjects had normal or corrected-to-normal visual acuity and radiologically normal brains, without any history of neuropsychological disorder. All experimental procedures conformed to NIH guidelines and were conducted under protocols approved by Massachusetts General Hospital. Written informed consent was obtained from all subjects.
Computer-generated 3D images of faces (FaceGen, Singular Inversions, Canada) and houses (Google SketchUp software) were used as recognition targets (Figure 1a). The face set included 10 distinct, emotionally neutral, adult Caucasian male faces without hair. The house set included images of 10 different houses, of ranch/colonial style.
The differences between face identities reflected corresponding variation in the size/location of facial features (e.g. eyes, lips and nose), whereas other factors, such as age, ethnicity, skin color and overall face size, were matched across faces. We used a similar approach to generate the house image set: all houses were based on a common 3D template, and differences between individual houses were produced by varying the size and/or location of architectural features (e.g. windows, doors and stairs). All images were achromatic, and presented at the center of the screen against a light gray uniform background. Each image subtended 10° of visual angle (average of horizontal and vertical extent). For each individual face and house, we generated images in three views: frontal and rotated ±45° in depth. We also generated two additional image sets based on the 10 original faces: inverted faces and contrast reversed faces.
As a detection target, we also presented a small, low-contrast dot on all stimuli (Figure 1b). To manipulate the detectability of the dot, the luminance of pixels within the dot was slightly decreased, relative to the original pixel values in the underlying face/house. The location of this dot was randomly selected in each trial. Dot size varied between 0.4° and 2°, depending on visual field eccentricity, according to the cortical magnification factor (Yue et al., 2011). The levels of contrast between 1) the dot and the background face/house, and 2) the face/house relative to the uniform gray background, were systematically varied according to the subjects' performance, using a staircase procedure (see below).
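Although the specific staircase rule is not detailed here, this kind of adaptive contrast adjustment can be sketched with a generic n-down/1-up staircase. The rule, step size, and starting contrast below are illustrative assumptions, not the study's actual parameters:

```python
# Hypothetical sketch of an adaptive staircase adjusting contrast.
# The 3-down/1-up rule (which converges near 79% correct; weighted
# step sizes can shift convergence toward ~75%), the step size, and
# the starting contrast are all illustrative assumptions.

def run_staircase(subject_response, start_contrast=0.5, step=0.05,
                  n_down=3, n_trials=100):
    """Lower contrast after n_down consecutive correct trials;
    raise it after any incorrect trial.

    subject_response(contrast) -> True if the trial was answered correctly.
    Returns the final contrast and the per-trial history.
    """
    contrast = start_contrast
    correct_streak = 0
    history = []
    for _ in range(n_trials):
        correct = subject_response(contrast)
        history.append((contrast, correct))
        if correct:
            correct_streak += 1
            if correct_streak == n_down:   # task too easy: harder
                contrast = max(0.01, contrast - step)
                correct_streak = 0
        else:                              # task too hard: easier
            contrast = min(1.0, contrast + step)
            correct_streak = 0
    return contrast, history
```

With a simulated observer whose threshold sits at some fixed contrast, the staircase settles into small oscillations around that threshold, which is what keeps performance near the targeted accuracy level.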
We used an independent set of retinotopically-matched face and scene images to localize face-selective (Fusiform Face Area (FFA), Occipital Face Area (OFA)) and scene-selective (Parahippocampal Place Area (PPA), Transverse Occipital Sulcus (TOS), Retro-Splenial Cortex (RSC)) areas. Face images were based on group photographs, and scene images were indoor scenes (for further details see Rajimehr et al., 2009, and Nasr et al., 2011). An independent set of intact and scrambled everyday objects (including cars, furniture, tools, vehicles, etc.) was used to localize the object-selective Lateral Occipital Complex (LOC). Details of those localizing stimuli are described in Yue et al. (2011).
Stimuli were presented via LCD projector (Sharp XG-P25, 1024 × 768 pixels resolution, 60 Hz refresh rate) onto a rear-projection screen. Matlab 7.8 (MathWorks, US) and Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) were used to control stimulus presentation.
Subjects performed one of two tasks (1-back recognition or 1-back dot location), both requiring a two alternative forced choice (2AFC) response. The task remained constant within a given block, but varied pseudo-randomly across blocks. During the recognition task, subjects reported whether consecutively presented face/house images were drawn from a common face/house identity, or from two different faces/houses. In this task, the location of the simultaneously presented dots was irrelevant (Figure 1b). In the dot location task, subjects compared the dot location between two consecutive stimuli and reported whether the two dots were presented in the same (left-right) hemifield, or in different hemifields. During this task, face and house identities were irrelevant. Dot location was confined to the visual field region in which the face or house stimulus was also presented; this matched the spatial extent over which attention was distributed in the recognition versus dot location tasks.
Subjects were familiarized to the tasks and stimuli by practicing for ~ 45 minutes prior to scanning. After their performances stabilized, subjects were scanned. For each individual subject, task difficulty for face stimuli was adjusted by setting the level of stimulus/dot contrast so that subjects’ performance converged at ~75%, for the one-back recognition and dot location tasks on inverted faces. To minimize sensory variations, stimulus/dot contrast for remaining stimuli (i.e. normal and contrast reversed faces) was set to an overlapping range, irrespective of performance. Task difficulty for house stimuli was adjusted similarly by setting the level of stimulus/dot contrast so that subjects’ performance converged at ~75%, for the one-back recognition and dot location tasks on houses.
To localize relevant cortical areas (FFA, OFA, PPA, TOS, RSC and, LOC), subjects participated in a separate scan session using a different dummy task, in which they only had to fixate a central dot and report a color change in that dot (red to green or vice versa), while objects from different visual categories were presented on the screen.
Figure 1b shows a schematic example of the experimental procedure within a run. Each run started with presentation of a small green fixation point at the center of the screen, which remained visible for 10 s. Then the color of the fixation dot changed to either blue or red, cueing subjects to perform either the recognition task or the dot location task. The relationship between the color of the fixation point and the subjects' task was counterbalanced between subjects. Stimulus presentation began 10 s after this color change. Within each block, 15 stimuli were presented one-by-one at the center of the screen. To avoid luminance aftereffects, each stimulus was presented very briefly (0.2 s), followed by a decrease in contrast based on a time-dependent exponential function (1/(1 + 50 * exp(−1.5/t)); total duration 1.3 s). A uniform gray ISI of 3.5 s separated each pair of stimuli. Consecutive stimuli within each pair always differed from each other and were rotated 45° from each other, always including a frontal view. Subjects were frequently reminded to fixate the central target, which remained visible throughout the run. At the end of each block, following the last stimulus offset and subsequent blank interval, the color of the fixation point changed to the opposite color (i.e. from red to blue or vice versa). Ten seconds after this color change, stimulus presentation began again and subjects performed the alternative task. Subjects reported their answers by pressing one of two keys on a keypad. Accuracy was stressed more than speed, but subjects had to report their answers before presentation of the following stimulus (i.e. within 5 s of stimulus onset). For each session, we analyzed the BOLD signal changes during task execution, but not during preparatory (fixation only) intervals.
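As a concrete illustration, the contrast-decay function above can be evaluated directly. Here t is assumed to be the time in seconds from the start of the decay, and the function is assumed to scale the stimulus contrast multiplicatively (both assumptions; the text gives only the formula itself):

```python
import math

def contrast_scale(t):
    """Contrast scale factor at time t (seconds) after the onset of
    the decay, following the function given in the text:
    1 / (1 + 50 * exp(-1.5 / t))."""
    return 1.0 / (1.0 + 50.0 * math.exp(-1.5 / t))

# Near t = 0 the factor is ~1 (full contrast); by the end of the
# 1.3 s ramp it has fallen to roughly 6% of the original contrast.
for t in (0.05, 0.2, 0.6, 1.3):
    print(f"t = {t:.2f} s -> contrast x {contrast_scale(t):.3f}")
```

Evaluating the function this way makes the shape of the ramp explicit: contrast stays near full strength for the first ~0.2 s, then falls off steeply, so the stimulus fades out rather than offsetting abruptly.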
Since the overall scan sessions lasted more than 5 hours, we divided the experiment into three sessions (i.e. face recognition, house recognition and localizer sessions). Several precautions were used to reduce possible differences between sessions: 1) for each subject, the sequence of scan sessions was selected randomly; 2) for all subjects, fMRI activity was registered to a high resolution structural scan of the same individual (see below), so we were able to compare the location of activity across sessions; 3) all activity contrasts and comparisons of activity amplitude across conditions were based on fMRI signals collected within a given session. Face recognition sessions consisted of 12 runs (4 runs for each face type). Each run consisted of 4 blocks: 2 blocks for the recognition task and 2 blocks for the dot location task; the stimulus type (i.e. either normal, inverted or contrast reversed images) remained the same between the two tasks. The sequence of tasks within each run was selected pseudo-randomly (i.e. either recognition, dot location, recognition, dot location, or dot location, recognition, dot location, recognition). Details of the house recognition session were similar, except that house sessions (lacking the inverted and contrast reversed conditions) were based on 4 runs.
All but one subject were scanned three times: one session for the face stimuli, one session for the house stimuli, and another session with localizer stimulus sets. The remaining subject was not able to participate in the house session. Each scan session lasted less than 1.5 hours, and was conducted immediately after ~30–45 minutes training on the recognition and dot location tasks.
All subjects were scanned in a horizontal 3T scanner (Siemens Tim Trio). Gradient echo EPI sequences were used for functional imaging during the task (TR 2500 ms, TE 30 ms, flip angle 90°, 3.0 mm isotropic voxels, and 41 axial slices) and localizer (TR 2000 ms, TE 30 ms, flip angle 90°, 3.0 mm isotropic voxels, and 33 axial slices) sessions. In task sessions, the field of view included the whole brain, for all subjects. In the localizer session, the field of view included the whole brain except the superior tip of parietal cortex. A 3D MP-RAGE sequence (1.0 mm isotropic voxels) was also used for high-resolution anatomical imaging in the same subjects. Functional and anatomical data were preprocessed and analyzed using FreeSurfer and FS-FAST (version 5.1; http://surfer.nmr.mgh.harvard.edu/).
For each subject, we reconstructed the inflated cortex based on a high resolution anatomical scan (Dale et al., 1999; Fischl et al., 1999; 2012). All functional images were corrected for motion artifact, spatially smoothed using a 3D Gaussian kernel (2.5 mm HWHM), and normalized across scans. To estimate the intensity of the hemodynamic response, a model based on a γ function was fit to the fMRI signal, and then the average signal intensity maps were calculated for each condition (Friston et al., 1999; Nasr et al., 2011). Voxel-wise statistical tests were conducted by computing contrasts based on a univariate general linear model. Finally, the significance levels were projected onto the inflated/flattened cortex after a rigid co-registration of functional and anatomical volumes. To generate group-average maps, functional maps were spatially normalized across sessions and across subjects using FreeSurfer. Next, activity within each individual's brain was transformed spatially onto the ‘averaged human brain’ using a spherical transformation (Fischl et al., 1999), then averaged using both fixed and random effects models (Friston et al., 1999).
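The γ-function model mentioned above can be sketched as follows. This is a generic gamma-variate response model of the kind FS-FAST fits to the BOLD signal; the parameter values (onset delay, time constant, exponent) are illustrative assumptions, not the settings used in the study:

```python
import math

def gamma_hrf(t, delta=2.25, tau=1.25, exponent=2):
    """Gamma-variate hemodynamic response model: zero before the
    onset delay delta, then ((t-delta)/tau)**exponent * exp(-(t-delta)/tau).
    delta, tau and exponent are illustrative values (assumptions)."""
    if t < delta:
        return 0.0
    x = (t - delta) / tau
    return (x ** exponent) * math.exp(-x)

# Sampling the model at the scanner TR (2.5 s for the task sessions)
# yields a regressor that a general linear model can fit to the
# measured fMRI signal, estimating response intensity per condition.
tr = 2.5
regressor = [gamma_hrf(i * tr) for i in range(10)]
```

With these parameters the modeled response peaks a few seconds after stimulus onset and decays back toward baseline, capturing the sluggish shape of the BOLD response that the voxel-wise GLM contrasts rely on.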
For each individual subject, we defined regions of interest (ROIs) for FFA, OFA, PPA, TOS, RSC and LOC, based on independent localizing images (see above). For the lesser-known area AT, ROIs were defined based on even versus odd runs from the recognition task (Kriegeskorte et al., 2007). Since this analytic difference could lead to differential ROI-based results between FFA and AT, we also re-defined ROIs for FFA based on even versus odd runs, as used for AT.
Figure 2 shows the map of group-averaged (n=14) activity differences throughout cortex during recognition of normal faces (accuracy (mean ± s.d.): 88.4 ± 5.8%), relative to activity during the dot location task (80.7 ± 8.3%), based on identical stimuli in matching regions of the visual field (see Methods). Among visual sensory areas, activity increases during the recognition task (i.e. recognition-related activity) were essentially confined to a string of patches along the ventral temporal lobe (Figures 2 and 3A–B). The most posterior patch coincided with the Fusiform Face Area (FFA), which was localized with an independent set of scans and stimuli (see Methods and Supp. Figure 1A). The most anterior patch was located near the tip of the temporal lobe, at the anterior terminus of the collateral sulcus (CoS). Based on a corresponding sulcal/gyral location and Talairach coordinates (Table 1), this latter site is presumptively the anterior temporal face patch (AT; Rajimehr et al., 2009), as defined based on sensory stimuli. In both FFA and AT, recognition-related activity was bilateral, without obvious hemispheric bias. More variably, additional patches of face recognition activity were also found between FFA and AT, especially in the left hemisphere (e.g. Figure 3A). The activation in AT and the other intermediate patches between FFA and AT was further confirmed in group-averaged maps based on random effects analysis (Supp. Figure 2A–B).
Since response accuracy differed between the recognition versus dot location tasks (t(16)=4.27, p<10−3), we tested whether these residual performance differences contributed to the striking differences in the maps. Group-averaged activity maps were regenerated based on the subset of subjects (n=6) who showed the least (<4%) difference between their performances in the recognition versus dot location tasks (t(5)=1.44, p=0.21). The group-averaged map based on this subset of subjects (n=6; Supp. Figure 3) showed the same basic features as the larger group averaged maps (n=17; e.g. Figure 3A–B). This result supports the conclusion that the maps reflected the task differences in this comparison, rather than differences in performance and/or task difficulty. Note that the distribution of spatial attention was also equated between the two tasks (see Methods).
The house recognition task (see Methods) produced a quite different pattern of cortical activity, when similarly compared to the dot location task (Figures 3C–D, Table 2). Here, the response accuracy between the house recognition (77.3 ± 9.6%) and dot location (77.6 ± 8.5%) tasks did not differ significantly (t(15)=0.16, p=0.88). The activity increase during house recognition was centered in the Parahippocampal Place Area (PPA, sometimes referred to as the building selective region (Levy et al., 2004)), and in the Lateral Occipital Complex (LOC), the Transverse Occipital Sulcus (TOS) and the Retro-Splenial Cortex (RSC) - but not in AT (also see section 3.6). All these features were confirmed in random effect maps (Supp. Figure 2C–D).
Although the level of activity was generally higher during house recognition compared to face recognition, we did not find any activity in area AT even at a threshold of p=10−5 (i.e. lower than that used for the face recognition maps) (Supp. Figure 4A–B and Table 2). Therefore, the lack of AT activity during house recognition is likely unrelated to threshold difference between the face versus house recognition maps (see Discussion section 4.2).
As expected from psychophysics (Kemp et al., 1996; Itier and Taylor, 2004), the subjects’ response accuracy for contrast reversed faces (77.4 ± 7.8%) was significantly impaired (t(16)=6.89, p<10−3), compared to that for normal faces. Also consistent with the psychophysics (Kemp et al., 1996; Itier and Taylor, 2004), response accuracy during recognition of inverted faces (71.2 ± 6.9%) decreased even further (t(16)=3.1, p<0.01), relative to contrast reversed faces. As expected, subjects’ performance on the dot location task remained constant (t(16)=0.43, p=0.68) during inverted (76.2 ± 9.1%) and contrast reversed (76.9 ± 7.3%) face conditions.
Interestingly, we found that the activity produced by recognition of inverted (Figure 3E–F and Supp. Figure 2E–F) and contrast reversed (Figure 3G–H and Supp. Figure 2G–H) faces was confined to the areas activated by normal faces (Figure 3A–B and Supp. Figure 2A–B). That result does not support psychophysical predictions (Farah et al., 1998; Rhodes et al., 2006) that recognition of such transformed faces is mediated by generic non-face mechanisms, or by a combination of face and non-face selective areas (Moscovitch et al., 1997; Pitcher et al., 2011). Instead, it supports the hypothesis that the same patches activated during recognition of normal faces were also modulated during recognition of transformed faces.
In independent scan sessions using a dummy task (see Methods), we presented retinotopically-matched face versus scene stimuli to localize areas FFA, PPA, TOS and RSC (Supp. Figure 1A). Consistent with previous sensory tests (Rajimehr et al., 2009), activity differences in FFA and PPA were robust (p<10−15), whereas activity in AT was relatively weak: the latter was found only at lower threshold levels (p<10−2) in the right hemisphere of the group-averaged maps (Supp. Figure 5). The fMRI maps indicated that these particular face images were more effective than these place images in V1 and other lower tier visual areas. However there was no retinotopic bias in the observed pattern of activity, nor in the face/place localizing stimuli, and the topography of all the face/place-biased areas was consistent with the literature (Kanwisher et al., 1997; McCarthy et al., 1997; Haxby et al., 1999; Grill-Spector et al., 2004; Rajimehr et al., 2009).
Figure 4 shows the location of the anterior temporal patch across different experimental conditions. Importantly, we found no overlap between AT and the activity evoked during house recognition within the temporal lobe – even though the two maps were generated using independent data sets. Instead, the location of AT remained remarkably consistent across different experimental conditions (Figure 4 and Table 1), and in comparisons of even versus odd runs (Supp. Figure 6). Activity amplitudes sampled in AT during odd versus even runs were significantly correlated in both the left (r=0.32, p<10−5) and right (r=0.39, p<10−8) hemispheres.
In a further confirmation, we found that the location of AT (i.e. the anterior tip of the collateral sulcus) remained quite consistent across individual subjects (Supp. Figure 7). Thus the overall data confirmed the existence of reliable AT activity in a discrete location within the anterior tip of the collateral sulcus, based on: 1) group-averaged activity maps using both fixed (Figure 3) and random (Supp. Figure 2) effects; 2) comparison of even versus odd runs; 3) consistency across individual subjects; and 4) consistency across both hemispheres.
In contrast to the consistent location of AT, the location of additional face patches between FFA and AT varied from one subject to another, and such intermediate patches were not found in all subjects (Supp. Figure 7).
Next we compared the dependence of activity within the three main face-selective areas (AT, FFA and OFA) on task-based versus sensory-based factors. Figure 5A–B shows the group-averaged activity maps during either the face recognition or the dot location task, both measured relative to baseline (i.e. fixation only). In these maps, FFA activity was significant during both the dot location task (i.e. sensory input without a face-relevant task) and the face recognition task (sensory input plus a relevant task), but the AT response was mainly confined to recognition blocks. Figure 5C shows the activity within each region of interest (ROI) in the left and right hemispheres. Independent applications of a two-factor repeated measures ANOVA (‘task’ (recognition versus dot location) and ‘brain hemisphere’ (left versus right)) to fMRI activity within each area showed a significant effect of ‘task’ in AT (F(1, 16)=20.41, p<10−3) and FFA (F(1, 16)=33.54, p<10−4) but not in OFA (F(1, 16)=2.65, p=0.12). Although there was a tendency for stronger activity in right relative to left FFA and OFA, and an opposite tendency in AT, the effect of ‘brain hemisphere’ and the ‘task’ × ‘brain hemisphere’ interaction were not significant in any of the three areas (p>0.10).
To compare the level of task-related activity increase between the face-selective areas (i.e. OFA, FFA and AT), we defined a task-dependency ratio as (100 × (ActivityRecognition − ActivityDot Location)/(ActivityRecognition + ActivityDot Location)) based on the measured fMRI activity within these areas (Figure 6). Application of a two-factor repeated measures ANOVA (‘brain area’ (AT versus FFA versus OFA) and ‘brain hemisphere’ (left vs. right)) showed a significant effect of ‘brain area’ (F(2,32)=9.10, p<0.01) due to higher AT task-dependency ratio compared to FFA and OFA (post-hoc Bonferroni; p<0.05). This comparison also showed that the task dependency did not vary significantly between the two hemispheres (F(1,16)=0.11, p=0.74). The ‘brain area’ × ‘brain hemisphere’ interaction also remained insignificant (F(2,32)=0.06, p=0.94).
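The task-dependency ratio defined above is straightforward to compute from the measured activity values; the numbers in the usage example below are made up for illustration and are not data from the study:

```python
def task_dependency_ratio(activity_recognition, activity_dot_location):
    """Task-dependency ratio, as defined in the text:
    100 * (R - D) / (R + D), where R and D are fMRI activity during
    the recognition and dot location tasks, respectively."""
    r, d = activity_recognition, activity_dot_location
    return 100.0 * (r - d) / (r + d)

# Hypothetical activity values: an area whose response is driven
# mostly by the recognition task yields a higher ratio than one that
# responds strongly to the sensory input alone.
print(task_dependency_ratio(1.2, 0.4))  # strongly task-dependent
print(task_dependency_ratio(1.5, 1.1))  # mostly sensory-driven
```

Because the difference is normalized by the sum, the ratio compares task dependence across areas while discounting overall differences in response amplitude (relevant here, since AT responses were generally weaker than those in FFA and OFA).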
Although the amplitude of activity in AT was generally weaker than that in FFA and OFA (Figure 5C), the above analysis indicates that the recognition task modulates AT significantly. Moreover, the AT modulation is stronger than that seen in more posterior face-selective areas FFA and OFA (Figure 6).
Although FFA and AT were both activated during recognition of faces, ROI analysis revealed a difference between the pattern of task-driven responses (i.e. ActivityRecognition − ActivityDot Location) within AT and FFA across face types (Figure 7). In AT (Figure 7A), recognition of normal and inverted faces evoked the largest and the smallest responses respectively (i.e. normal faces > contrast reversed faces > inverted faces); qualitatively this matched subjects’ response accuracy across face types (Figure 7D). In FFA (Figure 7B), this pattern was different; the contrast reversed faces evoked the weakest response (i.e. normal faces > inverted faces > contrast reversed faces). In contrast to these two areas, task-driven activity within OFA (Figure 7C) was weak, showing little variation across face types.
To further evaluate differences between the task-driven responses in AT, FFA, and OFA, we applied a three-factor repeated measures ANOVA, with ‘brain area’ (AT versus FFA versus OFA), ‘face type’ (normal vs. inverted vs. contrast reversed) and ‘brain hemisphere’ (left versus right) as independent factors. We found a significant effect of ‘brain area’ (F(2, 32)=25.98, p<10−4), mainly reflecting stronger activity in AT and FFA relative to OFA (post-hoc Bonferroni; p<10−4). Confirming the difference between patterns of task-driven responses within these areas, we also found a significant ‘brain area’ × ‘face type’ interaction (F(4, 64)=4.26, p<0.01). We also found a significant ‘brain area’ × ‘brain hemisphere’ interaction (F(2, 32)=4.02, p=0.02), reflecting a stronger response in right relative to left FFA and OFA, and a weak opposite tendency in AT. Other effects and interactions were insignificant (p>0.20). Consistent with these results, separate application of a two-factor repeated measures ANOVA (‘face type’ and ‘brain hemisphere’) to AT, FFA and OFA activity showed a significant effect of ‘face type’ in AT and FFA (F(2, 32)>4.25, p<0.02) but not in OFA (F(2, 32)=1.00, p=0.37). The effect of ‘brain hemisphere’ was significant in FFA and OFA (F(1, 16)>7.44, p<0.03) but not in AT (F(1, 16)=2.14, p=0.16). In all three brain areas, the ‘face type’ × ‘brain hemisphere’ interaction was not significant (F(2, 32)<1.17, p>0.32).
When comparing the patterns of response to the three face types in AT, FFA and OFA (Figure 7A–C), we found that the response pattern in AT (i.e. normal faces > contrast reversed faces > inverted faces) was the most similar to the pattern of subjects' response accuracy across face types (Figure 7D), compared with that in FFA or OFA. This qualitative correspondence between behavior and fMRI was further confirmed using a linear regression model. In this model, we used subjects' response accuracy (across the three face types) as the dependent variable, and FFA, AT and OFA activity as independent variables. Since the effect of hemisphere was considerable in FFA and OFA, we applied separate tests to activity measured in the left and right hemispheres. These results showed that AT activity may account for the variation of response accuracy across face types, in both hemispheres (Table 3). Unlike the activity in AT, the activity in FFA and OFA did not contribute significantly to the model. The same result was also found based on activity averaged across the two hemispheres.
Since the ROIs for AT were defined differently compared to those in FFA (see Methods), we conducted a further control analysis to confirm that this difference did not affect the overall results. As in AT, we defined a ROI in FFA based on odd versus even runs (Supp. Figure 8). This re-analysis did not change the results: we still found a significant effect of brain area (F(2,32)=27.78, p<10−4) and a significant ‘face type’ × ‘brain area’ interaction (F(4, 64)=5.24, p<0.01) which confirmed that the profile of activity varied between these three brain areas. Again, we also found a significant ‘brain area’ × ‘brain hemisphere’ interaction (F(2,32)=5.21, p<0.01). Other effects and interactions remained insignificant (p>0.20). Also, repetition of this test based on FFA activity measured within alternative ROIs (defined similar to AT) showed similar results (Supp. Table 1).
This study shows that the face recognition-related activity detected previously in FFA also extends anteriorly to area AT. The general presence of recognition-related activity in anterior temporal cortex has been described previously (Sergent et al., 1992; Price et al., 1996; Allison et al., 1999; Kriegeskorte et al., 2007). The present study clarified and revealed several additional features. First, activity increases during face recognition are concentrated in specific small patches in anterior temporal cortex, including the anterior tip of the collateral sulcus. Secondly, face inversion and contrast reversal produce differential effects on the amplitude of task-driven activity within the anterior and posterior face patches (AT and FFA), and the pattern of activity in AT showed a better correlation with the accuracy of face recognition. Third, although subjects' performance was severely impaired by inversion and contrast reversal, those image transformations did not change the brain regions activated during such perception; recognition of inverted and contrast reversed faces still activated face-selective areas.
One notable site activated by our facial recognition task was the anterior temporal face patch (AT). This area was first demonstrated using fMRI in awake behaving monkeys (Tsao et al., 2003; 2008; Rajimehr et al., 2009; Pinsk et al., 2009; Ku et al., 2011; Nasr et al., 2011), based on sensory comparisons alone (faces versus scenes, objects, etc.) while monkeys performed a fixation-only task. A recent fMRI study (Rajimehr et al., 2009) confirmed a presumptive homologue of AT in human cortex, using identical sensory stimulus comparisons and a cross-species cortical mapping transformation.
However, previous studies could not localize human AT precisely, due either to a limited number of subjects (Tsao et al., 2008), weak evoked activity (Tsao et al., 2008; Rajimehr et al., 2009) or analysis limitations (Kriegeskorte et al., 2007). Thus, none of those studies could measure the reliability of AT activation. Here we showed that AT is consistently and reliably localized in the anterior terminus of the collateral sulcus (section 3.6 and Figure 3). This localization was confirmed by comparison of activity maps between different experimental conditions, based on either fixed (Figures 3) or random effects (Supp. Figure 2) analysis, and also between odd and even runs (Supp. Figure 6). In all of these tests, the location of AT remained essentially constant. Moreover, this localization was also confirmed in individual subjects (Supp. Figure 7). The consistent location of AT in the individual and group-averaged map was especially remarkable, because the relatively small size of AT (~8 mm at the thresholds used here) makes it more vulnerable to misalignment across the individual cortical maps. In all these respects, AT appeared to be a distinct, stable, cortical area, like those in lower level visual cortex.
We tested the selectivity of AT for face recognition by scanning subjects during an analogous house recognition task. Variations in house stimuli were confined to feature size and configuration, analogous to the variations in the face images (Figure 1). Although house recognition evoked a very strong response, that house-driven response did not propagate into area AT. Based on this comparison, AT activity is most selective for faces. However, further tests with different stimuli/tasks will help to resolve this point.
The recognition task for normal faces revealed an additional string of patches located between FFA and AT, in both hemispheres. The number and exact location of these intermediate patches (e.g. the distance of each patch from FFA and AT) varied from one subject to another (Supp. Figure 7). However, the location of these intermediate patches remained constant across even-versus-odd measures of facial recognition, indicating that such patches were not statistical noise (Supp. Figure 6). Moreover, the intermediate patches found in the fixed effect group maps (n=17) also survived random effects analysis (Supp. Figure 2A–B), suggesting that such patches would also be found in the broader population.
This string of intermediate patches produced by our recognition task (demonstrated in both group (Figure 3) and individual (Supp. Figure 7) maps) may be related to the patches described previously based on purely sensory contrasts in humans (Tsao et al., 2008; Pinsk et al., 2009; Rajimehr et al., 2009; Weiner and Grill-Spector, 2010). As in the present data, those previously described patches were located on the anterior portion of the fusiform gyrus, anterior to FFA. It is difficult to accurately compare these patches across studies, for several reasons. First, as described above, the intermediate face patches here were based on task differences, rather than on the sensory differences used in the previous studies. Second, the location of the intermediate patches was variable across subjects, along the length of the fusiform gyrus. With these caveats, the location of the intermediate patch(es) in our study best matched the location of FFA-2 in Pinsk et al. (2009), and the middle fusiform face patch in Weiner and Grill-Spector (2010). Comparing the present data to the two individual hemispheres labeled in Tsao et al. (2008) revealed an anterior-to-posterior shift in the functional labeling: the intermediate patches reported here best matched the location of the area labeled FFA in Tsao et al. (2008). Similarly, a more posterior patch in Tsao et al. (2008) was designated OFA, despite the fact that it was located clearly on the fusiform gyrus (Weiner and Grill-Spector, 2010; Pitcher et al., 2011b).
Previous psychophysical studies have suggested two alternative theories about how image-transformed (i.e. inverted and/or contrast reversed) faces are processed in the brain. One theory suggests that face transformations such as inversion produce qualitative changes in perception, so that these transformed faces are either processed outside face-selective areas, like non-face objects (Farah et al., 1998; Rhodes et al., 2006), or depend at least partly on activity within non-face-selective areas (Moscovitch et al., 1997; Pitcher et al., 2011). An alternative theory suggests that face image transformations do not qualitatively change the underlying neural processes. Instead, such transformations simply add more noise to the process, which reduces recognition accuracy (Valentine et al., 1991).
Here, we found that the pattern of recognition-related activity remained similar between normal, inverted and contrast reversed faces; it was essentially confined to face-selective areas FFA and AT at the thresholds used here (p < 10⁻⁷). Although this approach revealed no activity outside face-selective areas, it is possible that the transformed faces evoked weak (sub-threshold) activity outside face-selective areas. To the extent that this is the case, our data cannot rule out the possibility that inverted and contrast reversed faces are also processed to some extent outside face-selective areas. Our conclusion here is that the reduced activity in AT and FFA to inverted and contrast reversed faces largely supports the second theory described above: such manipulated faces are processed at a lower signal/noise ratio, compared to normal faces (Valentine et al., 1991), resulting in lower response accuracy.
In our data (Figure 7), the task-driven activity in FFA decreased significantly in response to contrast reversed faces compared to normal contrast faces, but showed less sensitivity to face inversion. This variation in recognition-related activity is generally consistent with sensory variations reported earlier in FFA, for both inverted and contrast reversed faces. Specifically, published reports have consistently shown decreased activity in FFA to contrast-reversed faces (George et al., 1999; Gilad et al., 2009; Yue et al., 2011). Similarly, most previous reports did not show a decrease in activity for inverted faces relative to normal faces (Haxby et al., 1999; Kanwisher et al., 1998; Aguirre et al., 1998; but see also Yovel and Kanwisher, 2005; Rossion, 2009). Reasons for the contradictory fMRI reports are not yet clear (for discussion see Yovel and Kanwisher, 2005 and Rossion, 2009), and factors other than sensory input (e.g. attention and task) may play a significant role in reducing the FFA responses to inverted faces.
In contrast to the activity profile in FFA, we found that AT activity decreased significantly in response to face inversion, and showed less dependence on face contrast. In addition, AT showed relatively stronger task-related modulation compared to FFA and OFA (section 3.7), plus a significant relationship with subjects' response accuracy. Consistent with the anterior-posterior location of these areas, all these functional data suggest that (compared to FFA and OFA) AT contributes more directly to the final stages of face processing, which may also include holistic face encoding. This interpretation is consistent with the results from single cell recordings in non-human primates showing a robust representation of facial identity in the anterior face patch, which is considered homologous to human AT (Freiwald and Tsao, 2010). By comparison, in the more posterior face patches (homologous to human FFA), even specific non-face stimuli (e.g. oval shaped stimuli with a feature contrast) could evoke strong activity (Tsao et al., 2006; Ohayon et al., 2012).
The existence of recognition-driven activity in FFA and AT is generally consistent with previous studies showing face-selective activity in these areas, based on sensory-driven responses (e.g. Kanwisher et al., 1997; Tsao et al., 2008; Rajimehr et al., 2009). Grill-Spector and colleagues showed further that FFA activity is stronger during correct (rather than incorrect) recognition of faces (Grill-Spector et al., 2004), but they did not measure activity in the then-unknown area AT. Using a different approach, our results clarify previous findings by showing that face recognition-driven activity is not confined to FFA (Grill-Spector et al., 2002); instead it extends to a discrete face-selective site in the anterior temporal lobe (i.e. AT).
Although we showed that the difference in task difficulty between dot location and face recognition cannot account for the recognition-related activity reported here (see section 3.2), it can be argued that object-based attention (as a part of the face recognition process) may contribute to the evoked responses. However, this possible involvement of object-based attention in the recognition process does not weaken our main conclusions that: 1) the patterns of FFA and AT responses differed significantly from each other during the face recognition task, and 2) the pattern of task-driven activity in AT more closely matched the pattern of subjects' recognition accuracy across face types.
The current data showed that the level of task-related activity modulation in AT is larger than that in FFA and/or OFA. Since BOLD changes in AT can be diminished by the susceptibility effect in the nearby petrous skull bone (Ojemann et al., 1997), incorporating a simple recognition task can increase the contrast-to-noise ratio, thus helping to detect AT in group-averaged (Figure 3 and Supp. Figure 2) and individual (Supp. Figure 7) maps in future studies.
Our findings, and much existing data, support a straightforward model: face information is partially extracted from non-face stimuli in FFA; that information is then sent to AT and subsequent areas such as prefrontal (Heekeren et al., 2004) and/or orbitofrontal (Bar et al., 2006) cortex for further analysis. Presumably the ascending connections from FFA to AT are matched by reciprocal projections from AT back to FFA; such reciprocal projections are almost universally present throughout the brain (Felleman and Van Essen, 1991). The role of the intermediate face patches lying between FFA and AT is not yet clear; perhaps their function is correspondingly intermediate between that of FFA and AT. Of course, many additional brain areas must also be involved in facial recognition (e.g. OFA, LOC, prefrontal cortex); this model is limited to the roles of these temporal cortical areas.
This model is consistent with the empirical findings from recent fMRI studies of facial recognition. Grill-Spector et al. (2004) reported stronger activation in FFA during trials in which famous faces were correctly recognized, compared with incorrect trials. However, that study did not sample fMRI activity from more anterior temporal cortex, including AT. Multivoxel pattern analysis (MVPA) has been used to show identity encoding either in anterior temporal cortex (Kriegeskorte et al., 2007) or in both anterior temporal cortex and FFA (Nestor et al., 2011). Those MVPA data are consistent with the present data, although those studies did not precisely localize a specific face-selective region in anterior temporal cortex (i.e. AT). Moreover, those studies did not test for a correlation between AT activity and subjects' recognition accuracy. Finally, neither of those MVPA studies reported recognition-related amplitude modulation anywhere in the brain, perhaps reflecting the low task demand (Kriegeskorte et al., 2007) and/or the decreased sensitivity of the event-related design (Nestor et al., 2011; Kriegeskorte et al., 2007), compared to the block design used here.
This model (and our results) is consistent with clinical studies of prosopagnosia, a syndrome in which patients cannot recognize familiar faces (Bodamer, 1947; Damasio et al., 1990). Previous studies of acute prosopagnosia reported deficits in facial recognition following damage to human temporal cortex that apparently included FFA and/or AT (Damasio et al., 1990). Some fMRI studies reported little or no activity in FFA in patients with developmental (Furl et al., 2011), early-onset acquired (Hadjikhani and de Gelder, 2002) or congenital (Bentin et al., 2007) prosopagnosia, compared to control subjects. However, other studies maintained that prosopagnosia can be accompanied by normal FFA activity (Marotta et al., 2001; Behrmann and Avidan, 2005; Steeves et al., 2006). According to the above model, damage to either FFA or AT (and/or the reciprocal connections between these two areas) could produce prosopagnosia. However, our model also suggests that AT is necessarily activated during (successful) face recognition, unless faces are differentiable on the basis of lower-level features (Rotshtein et al., 2005). Extensions of this model could also include OFA, to the extent that damage to OFA contributes to prosopagnosia.
The fact that AT activity covaried with subjects' recognition accuracy across different experimental conditions (i.e. normal, inverted and contrast reversed faces) suggests a link between face recognition and evoked activity in AT (see Yovel and Kanwisher, 2005 for a similar approach in FFA). However, we were not able to compare evoked activity during correct versus incorrect recognition trials (Grill-Spector et al., 2004), due to the blocked design used here. Also, the method that we used to adjust recognition accuracy (to 75% accuracy) reduced variability between subjects; therefore it was not possible to test the correlation between AT (or FFA) activity and subjects' response accuracy within a given face type (rather than between face types; section 3.8). In future studies, a larger sample may make it possible to test whether AT activity predicts face recognition accuracy within face types as well.
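The between-face-type regression of recognition accuracy on ROI activity can be sketched as follows. All values below are synthetic placeholders (not the study's data), generated so that accuracy tracks AT activity most closely; the ordinary least squares fit then recovers a larger coefficient for AT than for FFA or OFA.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-observation data: mean task-driven percent signal change
# in each ROI (AT, FFA, OFA) and recognition accuracy, pooled over subjects
# and the three face types (normal, inverted, contrast-reversed).
n_obs = 60
at = rng.normal(0.8, 0.2, n_obs)
ffa = rng.normal(1.2, 0.2, n_obs)
ofa = rng.normal(1.0, 0.2, n_obs)
# Accuracy is constructed to depend mainly on AT activity, plus noise.
accuracy = 0.4 + 0.45 * at + 0.05 * ffa + rng.normal(0, 0.03, n_obs)

# Ordinary least squares: accuracy ~ b0 + b1*AT + b2*FFA + b3*OFA
X = np.column_stack([np.ones(n_obs), at, ffa, ofa])
beta, *_ = np.linalg.lstsq(X, accuracy, rcond=None)

# Goodness of fit (R^2) for the full model.
predicted = X @ beta
ss_res = np.sum((accuracy - predicted) ** 2)
ss_tot = np.sum((accuracy - accuracy.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print("betas (intercept, AT, FFA, OFA):", np.round(beta, 2))
print(f"R^2 = {r_squared:.2f}")
```

In a design where accuracy is adjusted to a fixed level within each face type, such a regression is only informative across face types, which is why the within-type correlation could not be tested here.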
Supp. Table 1- Parameters of a regression model to predict recognition accuracy across face types based on variation of AT, OFA and FFA activity*
We thank Xiaomin Yue, Garth Coombs and Ellen Lau for help with data collection, and Jonathan Polimeni for help with imaging protocols. We thank Daphne Holt for review and suggestions on the manuscript. This study was supported by the National Institutes of Health (NIH Grants R01 MH67529 and R01 EY017081 to RBHT), the Martinos Center for Biomedical Imaging, the NCRR, the MIND Institute, and the NIMH Intramural Research Program.