NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
 
Nat Neurosci. Author manuscript; available in PMC Feb 10, 2010.
Published in final edited form as:
PMCID: PMC2819705
NIHMSID: NIHMS173109
A face feature space in the macaque temporal lobe
Winrich A Freiwald,1,2,3,4,6 Doris Y Tsao,1,2,3,5,6 and Margaret S Livingstone1
1Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, USA.
2Centers for Advanced Imaging & Cognitive Sciences, Bremen University, Bremen, Germany
3Division of Biology, California Institute of Technology, Pasadena, California, USA.
4The Rockefeller University, New York, New York, USA.
5Athinoula A. Martinos Center for Biomedical Imaging, Charlestown, Massachusetts, USA.
Correspondence should be addressed to W.A.F. (winrich.freiwald/at/googlemail.com)
6These authors contributed equally to this work.
The ability of primates to effortlessly recognize faces has been attributed to the existence of specialized face areas. One such area, the macaque middle face patch, consists almost entirely of cells that are selective for faces, but the principles by which these cells analyze faces are unknown. We found that middle face patch neurons detect and differentiate faces using a strategy that is both part based and holistic. Cells detected distinct constellations of face parts. Furthermore, cells were tuned to the geometry of facial features. Tuning was most often ramp-shaped, with a one-to-one mapping of feature magnitude to firing rate. Tuning amplitude depended on the presence of a whole, upright face, and features were interpreted according to their position in a whole, upright face. Thus, cells in the middle face patch encode axes of a face space specialized for whole, upright faces.
Viewing the world, we are confronted by myriad visual objects. How does the brain extract these objects from the incoming bits and pieces of information? The representation of an object (as opposed to a spot, edge or smear of color) must involve a mechanism for representing its gestalt. What is the neural mechanism by which curves and spots are assembled into coherent objects? And how does the brain preserve fine distinctions between individual objects throughout this process?
We have a good understanding of how edges, a form common to all objects, are coded by cells in area V1 (ref. 1), but the mechanisms by which the brain analyzes shapes at the next level are less understood. One major experimental difficulty is that there are so many different forms and no clear approach to choosing one set of forms over another for testing each cell. It is clear, however, that any study of object recognition must employ a restricted set of all possible forms. The challenge, then, is to find a way to constrain the stimulus space by incorporating prior knowledge about the cells’ stimulus preferences.
Functional magnetic resonance imaging (fMRI) provides a solution to this challenge2. Using fMRI in macaque monkeys, we found a cortical area in the temporal lobe that is activated much more by faces than by nonface objects3. Subsequent single-unit recordings showed that this area, the middle face patch, consists almost entirely of face-selective cells4. Targeting single-unit recordings to this area provides a powerful strategy for dissecting the mechanisms of high-level form coding in a homogeneous population of cells that are selective for a single type of complex form.
The space of faces still contains an infinite variety of particular forms (as it must for face perception to be useful). An effective strategy to further reduce the stimulus space is to represent faces as cartoons5. This approach has several justifications. First, the nameable features making up a cartoon (eyes, nose, etc.) correspond to the brightness discontinuities of real faces6, and thus approximate the representation relayed by early visual areas. Second, a cartoon face is clearly perceived as a face, and cartoons therefore effectively convey the overall gestalt of a face. Thus, cartoons constitute appropriate stimuli for studying the neural mechanisms of face detection. Third, cartoons convey a wealth of information about individual identity and expression through both the shape of individual features (for example, mouth curvature), and the configuration of features (for example, inter-eye distance). Therefore, cartoons constitute appropriate stimuli for studying the neural mechanisms of face differentiation. Finally, cartoon shapes can be completely specified by a much smaller set of parameters than would be required to specify individual pixel values of images of real faces, thus simplifying analysis. For all these reasons, cartoons provide a powerful and effective way of simplifying the space of faces.
We asked how cells in the middle face patch detect and differentiate faces. We used fMRI to localize the middle face patch and then targeted it for single-unit recording. We first measured responses of cells to photographs of faces and other objects; the results of this test confirmed the selectivity of the middle face patch for faces. We next measured the responses of these cells to pictures of both real and cartoon faces and found that responses to cartoon faces were comparable to responses to real faces. Armed with this knowledge, we then probed cells with systematically varying cartoon faces to address three fundamental questions: what is the mechanism for face detection, what is the mechanism for face differentiation, and what role, if any, facial gestalt plays in face differentiation.
Selectivity for real and cartoon faces
We determined the locations of the middle face patches in the temporal lobes of three macaque monkeys with fMRI (Fig. 1a) and then targeted one middle face patch in each monkey for electrophysiological recordings. For every cell that we recorded (286 total), we first determined the face selectivity of the cell by measuring its response to images of 16 frontal faces, 64 nonface objects and 16 scrambled patterns (Fig. 1b; examples of stimuli are shown in Supplementary Fig. 1). Across the population, 94% of the cells were face selective (Fig. 1b).
Figure 1
Selectivity of the middle face patch for real and cartoon faces. (a) fMRI-defined middle face patches (P < 10⁻⁴) shown on coronal slices (millimeters anterior to inter-aural line indicated in bottom left) for the three monkeys used in …
We then compared the responses of middle face patch neurons to cartoon faces and to real faces. We recorded responses of 66 cells to images of 16 real faces, 16 nonface objects, 16 cartoon faces and 16 isolated parts of cartoon faces. The cartoon faces were constructed from seven elementary parts (hair, face outline, eyes, irises, eyebrows, mouth and nose) whose shape and position were matched to those in the real faces (examples are shown in Supplementary Fig. 1). Across the population, the mean response magnitude to cartoon faces was 83% of the response to real faces, whereas the mean response to nonface objects was 17% and the mean response to cartoon parts was 24% of the response to real faces (Fig. 1c). Response ranges to real and cartoon stimuli were largely overlapping; all but one cell responded more to at least one cartoon face than to at least one of the real faces, and a cartoon face elicited the best or second-best response in 45% of the cells. Furthermore, the selectivity of cells for real and cartoon faces was correlated (r = 0.62, P < 0.001) and response time courses were similar (mean correlation across cells, r = 0.90, P < 0.001; Fig. 1c). Thus, although cartoon faces lack many of the details found in real faces, such as pigmentation, texture and three-dimensional structure, they are effective substitutes for real faces as far as middle face patch neurons are concerned. It is therefore appropriate to use cartoon stimuli to probe the detailed mechanisms of face representation by these neurons (in addition, Supplementary Fig. 1 provides psychophysical evidence that our cartoon stimuli captured essential aspects of face identity).
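The population comparison above can be illustrated with a short sketch. It is not the authors' analysis code: the helper names `normalized_category_means` and `matched_face_correlation`, the dict-of-arrays data layout, and the reading of real/cartoon selectivity as a per-cell correlation across matched real and cartoon faces are all assumptions made for illustration.

```python
import numpy as np


def normalized_category_means(responses, reference="real_faces"):
    """Mean response to each stimulus category as a fraction of the mean
    response to the reference category (hypothetical helper).

    responses: dict mapping category name -> array of shape
    (n_cells, n_stimuli) holding baseline-subtracted firing rates.
    """
    ref = np.mean(responses[reference])
    return {name: float(np.mean(r) / ref) for name, r in responses.items()}


def matched_face_correlation(real, cartoon):
    """Pearson correlation, for one cell, between its responses to the real
    faces and to the cartoon faces built to match them (same order)."""
    return float(np.corrcoef(real, cartoon)[0, 1])


# Usage with synthetic data: 5 cells x 16 stimuli per category.
rng = np.random.default_rng(0)
real = rng.uniform(10.0, 20.0, size=(5, 16))
data = {"real_faces": real, "cartoon_faces": 0.8 * real, "objects": 0.2 * real}
ratios = normalized_category_means(data)  # ratios["cartoon_faces"] is approximately 0.8
```

Normalizing to the real-face response, rather than reporting raw rates, makes the category comparison independent of each cell's overall excitability only if responses are pooled as shown; per-cell normalization would be an equally reasonable alternative.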
Face detection: selectivity for face parts
Cartoon faces can easily be decomposed into parts (without introducing additional edges, as would be the case with cutting up images of real faces) and therefore are ideally suited for studying the mechanisms of face detection. We presented all 128 (2⁷) possible decompositions of a seven-part cartoon face to 33 middle face patch neurons (Fig. 2a; stimuli are shown in Supplementary Fig. 1). ANOVA revealed that, across the population, each cell was directly influenced by at least one, and at most four, face parts (Fig. 2b). This first-order effect explained half of the response variance (52%) on average. In addition, a majority of cells (78%) showed significant pairwise interactions between their part responses (P < 0.005; Fig. 2b); these second-order effects explained an additional 18% of the variance. This dependence of responses on multiple parts and part interactions shows that middle face patch neurons are not simple feature detectors. However, because 70% of the response variance was explainable by first- and second-order effects alone, middle face patch cells are not highly nonlinear holistic cells either.
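The decomposition experiment is a full 2⁷ factorial design, so the split of variance into first-order (single-part) and second-order (pairwise) effects can be sketched as nested least-squares fits. This is a minimal sketch that assumes one mean response per condition and a hypothetical bit order for the parts; the paper's own ANOVA may differ in detail. With ±1 coding the effects of a full factorial are mutually orthogonal, which is why the sequential fits reproduce the factorial ANOVA sums of squares here.

```python
import itertools

import numpy as np

# Hypothetical bit order: hair, outline, eyes, irises, eyebrows, nose, mouth.
N_PARTS = 7

# All 2**7 = 128 presence/absence combinations; bit i of k says whether part i is shown.
DESIGN = np.array([[(k >> i) & 1 for i in range(N_PARTS)] for k in range(2 ** N_PARTS)])


def explained_variance(condition_means, design=DESIGN):
    """Fraction of response variance explained by main effects (first order)
    and, additionally, by pairwise interactions (second order).

    condition_means: mean response to each of the 128 conditions, ordered as
    the rows of `design`.
    """
    y = np.asarray(condition_means, dtype=float)
    y = y - y.mean()
    x = 2.0 * design - 1.0  # +1 = part present, -1 = part absent
    pairs = np.column_stack(
        [x[:, i] * x[:, j] for i, j in itertools.combinations(range(x.shape[1]), 2)]
    )

    def fitted_ss(columns):
        # Every +/-1 column sums to zero in a full factorial, so no intercept is needed.
        beta, *_ = np.linalg.lstsq(columns, y, rcond=None)
        fitted = columns @ beta
        return float(fitted @ fitted)

    total = float(y @ y)
    first = fitted_ss(x) / total
    second = fitted_ss(np.column_stack([x, pairs])) / total - first
    return first, second


# Usage: a synthetic cell driven by eyes (bit 2) plus an eyes-by-eyebrows interaction (bit 4).
resp = 3.0 * DESIGN[:, 2] + 2.0 * DESIGN[:, 2] * DESIGN[:, 4]
first, second = explained_variance(resp)
```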
Figure 2
Selectivity for face parts. (a) Stimulus conditions for the cartoon face decomposition experiment (left) and responses of four example cells. All combinations of seven face parts (hair, outline, irises, eyes, eyebrows, nose and mouth) were shown, including …
Notably, middle face patch neurons did not have a single best stimulus that uniquely elicited the maximum firing rate. In particular, the response magnitude to whole cartoon faces was only 42% of that of the summed response to the seven face parts on average (Fig. 2c; see ref. 8). As a consequence, the same cell often fired at its maximum rate to both the whole face and to a variety of partial faces (Fig. 2d), a property that is useful for face detection.
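The whole-versus-parts comparison, and the observation that several conditions can drive a cell near its maximum, can be expressed as two small helpers. Both are hypothetical: conditions are indexed by a 7-bit mask of the parts present, responses are assumed to be baseline subtracted, and the 90% criterion for "near maximal" is an arbitrary choice, not one taken from the paper.

```python
import numpy as np

N_PARTS = 7
FULL_FACE = (1 << N_PARTS) - 1  # bitmask with all seven parts present


def summation_ratio(resp):
    """Whole-face response divided by the summed responses to the seven
    single-part conditions; resp[k] is the response to bitmask k."""
    singles = sum(resp[1 << i] for i in range(N_PARTS))
    return resp[FULL_FACE] / singles


def near_maximal_conditions(resp, fraction=0.9):
    """Bitmasks of all conditions whose response is at least `fraction` of
    the cell's maximum response."""
    resp = np.asarray(resp, dtype=float)
    return np.flatnonzero(resp >= fraction * resp.max()).tolist()


# Usage with a hypothetical cell.
resp = np.zeros(128)
for i in range(N_PARTS):
    resp[1 << i] = 10.0   # each part shown alone
resp[0b0001100] = 30.0    # one two-part combination (bits 2 and 3)
resp[FULL_FACE] = 29.4    # whole face
```

A ratio below 1, as in this toy cell, corresponds to sublinear summation; the list returned by `near_maximal_conditions` makes explicit that a whole face and a partial face can both sit at the top of a cell's response range.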
Face differentiation: encoding of facial features
In the previous experiment, we determined selectivity for the presence of various face parts. We next investigated selectivity for the geometric shape of various face parts (for example, nose width). For this purpose, we used the same cartoon face stimulus described above (comprising seven parts), but now specified the geometry of parts and part relations by 19 different parameters, each with 11 values (six of these parameters are illustrated in Fig. 3a and all 19 parameters are listed in Fig. 3b and are defined in the Online Methods). The stimulus was presented in rapid serial visual presentation mode, with each parameter being updated randomly and independently every 117 ms (Supplementary Video 1). This approach allowed us to probe in detail the mechanisms by which cells distinguish different faces.
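A sketch of how such a stimulus sequence could be generated, assuming each of the 19 parameters is drawn uniformly and independently from its 11 values on every 117-ms frame (so a value may repeat on consecutive frames). The function name `rsvp_sequence` is hypothetical, and the paper's exact sampling scheme may differ.

```python
import numpy as np

N_PARAMS = 19              # feature dimensions varied simultaneously
LEVELS = np.arange(-5, 6)  # 11 values per dimension, -5 ... +5
FRAME_MS = 117             # update interval in milliseconds


def rsvp_sequence(n_frames, seed=None):
    """Return (onsets_ms, values): frame onset times and an
    (n_frames, N_PARAMS) array of feature values, each drawn uniformly and
    independently for every frame and every dimension."""
    rng = np.random.default_rng(seed)
    values = rng.choice(LEVELS, size=(n_frames, N_PARAMS))
    onsets = np.arange(n_frames) * FRAME_MS
    return onsets, values


onsets, values = rsvp_sequence(1000, seed=1)
```

Independent sampling is what later allows each dimension's tuning to be estimated by averaging over all the other dimensions.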
Figure 3
Cartoon tuning. (a) Example cartoon stimuli for six dimensions (hair width, eyebrow slant, eyebrow height, inter-eye distance, iris size and mouth top shape) with seven feature values each, spanning the entire range of values. (b) Tuning curves of three …
We first asked whether cells in the middle face patch are tuned to simple facial features. For each stimulus dimension, we computed a time-resolved tuning curve (see Online Methods and Supplementary Fig. 2) that captures how feature values (−5 to +5) influence the firing rate as a function of poststimulus time (0–400 ms). Comparing each time-resolved tuning curve to a distribution of shuffle predictors allowed us to determine whether a given feature dimension exerted a significant influence on the spiking of a cell (P < 0.001, see Online Methods). Tuning started as early as 75 ms after feature change. The time window over which tuning occurred overlapped with the time window of face selectivity, although the latter (Fig. 1d) was determined with a very different stimulus presentation regime (Supplementary Fig. 3).
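The time-resolved tuning curve and its shuffle test can be sketched as below. This is an illustration under stated assumptions, not necessarily the Online Methods procedure: spike counts are binned at 10 ms after each frame onset, the modulation statistic (variance across feature values, maximized over lag) and all helper names are choices made here, and the shuffle predictor reassigns feature values to frames at random. With 1,000 shuffles the smallest attainable p-value is just under 0.001.

```python
import numpy as np


def lagged_counts(spike_times, onsets, max_lag=400, bin_ms=10):
    """Spike counts in consecutive lag bins [0, max_lag) after each frame onset.

    Returns an integer array of shape (n_frames, max_lag // bin_ms).
    """
    edges = np.arange(0, max_lag + bin_ms, bin_ms)
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    # For every frame and bin edge, count spikes earlier than that edge;
    # differences between neighboring edges give the count in each bin.
    idx = np.searchsorted(spikes, np.asarray(onsets, dtype=float)[:, None] + edges[None, :])
    return np.diff(idx, axis=1)


def tuning_from_counts(counts, values, levels):
    """Time-resolved tuning curve: mean count per (feature value, lag bin)."""
    return np.array([counts[values == v].mean(axis=0) for v in levels])


def modulation_index(tuning):
    """Variance across feature values, taking the largest value over lag bins."""
    return float(tuning.var(axis=0).max())


def shuffle_test(counts, values, levels, n_shuffles=1000, seed=0):
    """Compare the observed modulation with a shuffle predictor in which
    feature values are randomly reassigned to frames."""
    rng = np.random.default_rng(seed)
    observed = modulation_index(tuning_from_counts(counts, values, levels))
    null = np.array([
        modulation_index(tuning_from_counts(counts, rng.permutation(values), levels))
        for _ in range(n_shuffles)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, float(p_value)


# Usage with a synthetic cell: (v + 5) spikes about 100 ms after each frame with value v.
rng = np.random.default_rng(0)
levels = np.arange(-5, 6)
onsets = np.arange(2000) * 117.0
values = rng.choice(levels, size=onsets.size)
spikes = np.concatenate([t0 + 100.0 + 0.1 * np.arange(v + 5) for t0, v in zip(onsets, values)])
counts = lagged_counts(spikes, onsets)
tuning = tuning_from_counts(counts, values, levels)
observed, p = shuffle_test(counts, values, levels, n_shuffles=200)
```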
Individual cells were modulated by different subsets of features (all 19 tuning curves for three example cells are shown in Fig. 3b). Across the population of 272 cells recorded in this experiment, 90% showed tuning to one or more stimulus dimensions (Fig. 3c). Individual cells were tuned to between one and eight stimulus dimensions (on average, 2.8 dimensions for cells that showed any tuning at all). Thus, each cell was specialized for a small number of feature dimensions. We did not find cells tuned to all aspects of a face.
Some features were represented more frequently in the population than others (Fig. 3d). The most popular parameter was face aspect ratio, to which more than half the cells (59%) were tuned, followed by iris size (46%), height of feature assembly (39%), inter-eye distance (31%) and face direction (27%). Thus, the most important feature categories were facial layout geometry and eye geometry; the feature categories mouth and nose were represented by five cells only. This pattern of results was robust and observed for different fixation conditions (Supplementary Text 1 and Supplementary Fig. 4). The two most prominent dimensions were associated with the largest and the second smallest physical differences between feature values (face aspect ratio and iris size, respectively). Thus, the incidence of tuning was not directly related to the magnitude of physical change associated with each dimension, but the two were positively correlated overall (Supplementary Text 2 and Supplementary Fig. 5). The low incidence of tuning for entire feature categories (mouth and nose) leads to a reduction in the dimensionality of the face space coded by the middle face patch.
Face differentiation: shape of feature tuning
In our example tuning curves (Fig. 3b), seven out of ten of the significantly modulated tuning curves were strictly ramp shaped, with a maximal response at one extreme of the feature range and a minimal response at the opposite extreme. Such ramp-shaped tuning dominated across the population as well. We sorted all of the significantly modulated tuning curves by the feature value that elicited the maximal response (Fig. 4a). Most tuning curves (62%) peaked at an extreme feature value. The distribution of maximal response values was so highly skewed to the extremes that tuning to the most extreme values was sixfold more frequent than tuning to the physically similar second-most extreme values. These extreme values extended to or even transgressed the limits of realistic face space. For example, inter-eye distances ranged from almost cyclopean to abutting the edges of the face, and the most extreme face aspect ratios were outside those of any living primate. Preference for ramp-shaped tuning was found in all feature dimensions (Fig. 4b). For most feature dimensions, both ends of the feature range were well represented, except for eyebrow slant and iris size; tuning to the popular parameter iris size was biased to the maximal iris size, with 33% of all recorded middle face patch neurons responding maximally to the largest irises.
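Sorting tuning curves by the position of their peak, as in Fig. 4a, reduces to an argmax and a histogram. In this minimal sketch the helper name and the summary ratio (peaks at the extremes versus peaks at the second-most extreme values) are illustrative choices.

```python
import numpy as np


def peak_position_summary(curves):
    """curves: (n_curves, 11) array of significant tuning curves, with feature
    values -5 ... +5 along axis 1. Returns the histogram of peak positions,
    the fraction of curves peaking at either extreme, and how many times more
    often curves peak at the extremes than at the second-most extreme values."""
    curves = np.asarray(curves, dtype=float)
    peaks = np.argmax(curves, axis=1)
    hist = np.bincount(peaks, minlength=curves.shape[1])
    at_extremes = hist[0] + hist[-1]
    next_to_extremes = hist[1] + hist[-2]
    fraction_extreme = at_extremes / len(curves)
    ratio = at_extremes / next_to_extremes if next_to_extremes else float("inf")
    return hist, fraction_extreme, ratio


# Usage: two ramps, one centered bump and one curve peaking at -4.
x = np.arange(-5, 6)
curves = np.vstack([x, -x, -x ** 2, -(x + 4) ** 2])
hist, fraction_extreme, ratio = peak_position_summary(curves)
```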
Figure 4
Preference for extreme feature values. (a) For each monkey, (left) all significantly modulated tuning curves are shown (area normalized, see Online Methods), sorted from top to bottom by feature value eliciting maximal response, (right) frequency distributions …
Extreme feature values also suppressed activity more than intermediate ones did; 76% of all tuning curves exhibited minima at extremes, 14 times as many as for other feature values (Fig. 4a). Because each feature extreme elicited maximal responses in some cells and minimal ones in others, the population response to a set of face characteristics was amplified for extreme compared with intermediate feature values (Fig. 4c). Such amplification predicts that caricatured representations should be distinguished more easily than average faces near the center of face space7,8.
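The amplification argument can be checked with the simplest possible population: two cells with opposite ramps. This toy example uses hypothetical data with curves scaled to 0..1; it shows why variance across cells is largest at the extremes when each extreme drives some cells maximally and others minimally.

```python
import numpy as np


def population_variance(curves):
    """Variance across cells of the response to each feature value.

    curves: (n_cells, n_values) array, each row scaled to the range 0..1.
    """
    return np.asarray(curves, dtype=float).var(axis=0)


ramp = np.linspace(0.0, 1.0, 11)
variance = population_variance(np.vstack([ramp, ramp[::-1]]))
# Both cells give the same response at the midpoint, so variance is 0 there,
# and the responses are 0 and 1 at each extreme, so variance is 0.25 there.
```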
The bias for response minima and maxima at extreme feature values was characteristic not only of the population of cells, but also of individual cells’ tuning curves. Two thirds (67%) of tuning curves with a maximum at one extreme showed the minimum at the opposite extreme (Fig. 4d). These tuning curves were, on average, almost linear in shape (Fig. 4d), thereby establishing a one-to-one mapping between the entire range of feature values and firing rate. Figuratively speaking, these cells measure feature dimensions.
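A sketch of how ramp-shaped curves could be identified and averaged. The criterion (maximum at one extreme, minimum at the other), the orientation and 0..1 rescaling step, and the use of a correlation with a straight line as a linearity measure are assumptions for illustration, not the paper's exact normalization. At least one ramp-shaped curve is needed for the average to be defined.

```python
import numpy as np


def is_ramp(curve):
    """True if the maximum lies at one extreme and the minimum at the other."""
    curve = np.asarray(curve, dtype=float)
    last = len(curve) - 1
    return {int(np.argmax(curve)), int(np.argmin(curve))} == {0, last}


def mean_ramp(curves):
    """Average shape of the ramp-shaped curves after orienting each one to
    increase and rescaling it to 0..1, plus the correlation of that average
    with a straight line as a simple linearity measure."""
    oriented = []
    for c in [np.asarray(c, dtype=float) for c in curves]:
        if not is_ramp(c):
            continue
        if np.argmax(c) == 0:
            c = c[::-1]  # flip decreasing ramps so all increase
        oriented.append((c - c.min()) / (c.max() - c.min()))
    mean = np.mean(oriented, axis=0)
    linearity = float(np.corrcoef(mean, np.arange(len(mean)))[0, 1])
    return mean, linearity


x = np.arange(-5, 6)
mean, linearity = mean_ramp([x, -x, -x ** 2])  # the inverted U is excluded
```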
Because response minima only rarely occurred near the midpoint of the cartoon face space, adaptation to the average face cannot account for the shape of these tuning curves. This is in contrast with the case of face-responsive (not necessarily face selective) cells anterior to the middle face patch, which have been reported to respond minimally (or maximally) to the average face9. We also tested for possible short-term adaptation effects or coding of feature changes rather than coding of features per se. Alternatively, a response could be maximal to a feature extreme because extremes are preceded, on average, by the largest physical changes, not because extremes are special shapes. We performed a two-way ANOVA to test for interactions between responses to successive feature values of the same dimension. Of the 514 dimensions tested, only seven showed a significant interaction between subsequent feature values (ANOVA, P < 0.05). Thus, we do not find evidence for adaptation or change magnitude being involved in generating feature tuning.
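The adaptation control compares a model in which the current and the preceding value of a dimension act additively with one that allows them to interact. A minimal numpy sketch of that two-way comparison follows; the per-frame response measure and the helper name are assumptions, and a p-value could be obtained from the F distribution (for example with `scipy.stats.f.sf(f_stat, df1, df2)`), which is left out here to keep the example dependency-free.

```python
import numpy as np


def successive_value_interaction(resp, values, levels=np.arange(-5, 6)):
    """F statistic for the interaction between the current and the preceding
    value of one feature dimension in a two-way (current x previous) model.

    resp: response to each frame (for example, spike count in a fixed window);
    values: feature value shown on each frame. The first frame is dropped
    because it has no predecessor.
    """
    y = np.asarray(resp, dtype=float)[1:]
    cur = np.asarray(values)[1:]
    prev = np.asarray(values)[:-1]
    n = len(y)

    # Additive model: intercept plus indicators for current and previous value
    # (first level of each factor omitted as the reference).
    additive = np.column_stack(
        [np.ones(n)]
        + [(cur == v).astype(float) for v in levels[1:]]
        + [(prev == v).astype(float) for v in levels[1:]]
    )
    # Full model: one indicator per occupied (current, previous) cell.
    cells = np.column_stack(
        [((cur == a) & (prev == b)).astype(float) for a in levels for b in levels]
    )
    full = cells[:, cells.any(axis=0)]

    def rss_rank(x):
        beta, _, rank, _ = np.linalg.lstsq(x, y, rcond=None)
        residual = y - x @ beta
        return float(residual @ residual), int(rank)

    rss_add, p_add = rss_rank(additive)
    rss_full, p_full = rss_rank(full)
    df1, df2 = p_full - p_add, n - p_full
    f_stat = ((rss_add - rss_full) / df1) / (rss_full / df2)
    return f_stat, df1, df2


# Usage: an additive cell versus a cell whose response depends on the value pair.
rng = np.random.default_rng(0)
vals = rng.choice(np.arange(-5, 6), size=5000)
prev = np.roll(vals, 1)  # prev[i] = vals[i - 1]; the wrapped first entry is dropped
noise = rng.normal(size=vals.size)
f_add, df1, df2 = successive_value_interaction(vals + 0.5 * prev + noise, vals)
f_int, _, _ = successive_value_interaction(0.1 * vals * prev + noise, vals)
```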
Face differentiation: frequency of feature combinations
The typical middle face patch neuron is tuned to approximately three feature dimensions. Such conjunctive feature representation can aid in solving the binding problem10, as cells tuned to overlapping feature constellations can uniquely represent a particular face. To achieve this computational goal, the population of neurons should create all possible conjunctions of feature tuning. Because cells were tuned to 14 of the 19 stimulus features, there were C(14, 2) = 91 feature combinations possible. We found only one feature combination that occurred more often than would be expected by chance combinations (8 times, compared with a P = 0.01 significance threshold of 7.7, details in Online Methods). Thus the population approaches maximal coverage of feature combinations even across different face parts. Furthermore, even neighboring cells often encoded different feature combinations (Supplementary Text 3 and Supplementary Fig. 6).
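Counting how often pairs of features share a cell, and estimating a chance level, can be sketched with a boolean cells-by-features matrix. The shuffling scheme below (permuting each feature's tuning labels independently across cells and pooling all pairs into one null distribution) is an assumption; the Online Methods describe the procedure actually used to obtain the threshold of 7.7.

```python
import math

import numpy as np

N_PAIRS = math.comb(14, 2)  # pairs among the 14 features to which any cell was tuned


def pair_counts(tuned):
    """Entry [i, j] counts the cells tuned to both feature i and feature j.

    tuned: boolean array of shape (n_cells, n_features).
    """
    t = np.asarray(tuned, dtype=int)
    return t.T @ t


def chance_threshold(tuned, q=0.99, n_shuffles=1000, seed=0):
    """q-quantile of pair counts when each feature's tuning labels are shuffled
    independently across cells. Shuffling keeps how often each feature is
    tuned but destroys any systematic co-occurrence."""
    rng = np.random.default_rng(seed)
    tuned = np.asarray(tuned, dtype=bool)
    upper = np.triu_indices(tuned.shape[1], k=1)
    null = [
        pair_counts(np.column_stack([rng.permutation(col) for col in tuned.T]))[upper]
        for _ in range(n_shuffles)
    ]
    return float(np.quantile(np.concatenate(null), q))


# Usage with a random (hypothetical) tuning matrix for 272 cells and 19 features.
rng = np.random.default_rng(0)
tuned = rng.random((272, 19)) < 0.15
counts = pair_counts(tuned)
threshold = chance_threshold(tuned, n_shuffles=200)
n_above = int(np.sum(counts[np.triu_indices(19, k=1)] > threshold))
```

A per-pair threshold (one null distribution per feature pair) would be a stricter alternative to the pooled threshold used in this sketch.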
We next asked how middle face patch neurons integrate different features. If feature integration is to preserve faithful measurement of individual features, then tuning to feature combinations should be separable into tuning to individual features11. Joint tuning functions computed by multiplying single-feature tuning curves were almost identical to the actual joint tuning functions (correlation coefficients between 0.90 and 0.97; Fig. 5a). In our sample of 771 pairs of significantly modulated marginal single-dimension tuning curves, the average correlation coefficient between the actual joint tuning functions and multiplicative predictors was 0.89 (an alternative analysis of separability is provided in Supplementary Text 4 and Supplementary Fig. 7). This result does not imply, however, that feature combination was precisely multiplicative: prediction of joint tuning functions by additive predictors was almost as good as that by multiplicative predictors (average correlation coefficient of 0.88, which was significantly lower (P < 0.01) than multiplicative predictors, Wilcoxon rank sum test; Fig. 5b). Separable joint tuning allows for faithful readout of individual feature values and confirms the importance of feature representation in the middle face patch.
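The separability test can be sketched by building both predictors from the marginals of a joint tuning table. This is an illustrative version: the paper built predictors from single-dimension tuning curves, whereas here the marginals of the joint table are used, which should be similar when values are sampled uniformly and independently. The function name is hypothetical, and responses are assumed positive so that the grand mean is nonzero.

```python
import numpy as np


def separability(joint):
    """Compare a joint tuning function with multiplicative and additive
    predictions built from its marginals.

    joint: (n_a, n_b) mean responses for every combination of values on two
    feature dimensions. Returns the Pearson correlations between the actual
    joint function and each predictor.
    """
    joint = np.asarray(joint, dtype=float)
    grand = joint.mean()
    row = joint.mean(axis=1)  # marginal tuning to feature A
    col = joint.mean(axis=0)  # marginal tuning to feature B
    multiplicative = np.outer(row, col) / grand
    additive = row[:, None] + col[None, :] - grand

    def r(prediction):
        return float(np.corrcoef(joint.ravel(), prediction.ravel())[0, 1])

    return r(multiplicative), r(additive)


a = np.linspace(1.0, 3.0, 11)
b = np.linspace(2.0, 5.0, 11)
r_mult, r_add = separability(np.outer(a, b))  # a perfectly multiplicative cell
```

As in the data, an additive predictor can come close to a multiplicative one: for this purely multiplicative toy cell the additive correlation is still about 0.98.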
Figure 5
Joint tuning to feature dimension pairs. (a) Left, six joint tuning functions of an example neuron tuned to four features (tuning curves shown in marginals) and (right) predicted joint tuning functions based on a multiplicative model (correlation coefficients …
Mechanisms for holistic processing: facial context
Face recognition has been characterized as holistic in that the face is processed as a whole unit without breakdown into parts or part relations12,13. What role, if any, does the facial whole have in the representation of face identity in the middle face patch? We addressed this question in two experiments that are similar to two classical psychophysical tests of holistic processing. In the first, we tested how coding of individual facial features depended on the presence or absence of the rest of the face. This experiment was motivated by the finding that humans recognize parts of faces best when they are embedded in a whole face13. In the second experiment, we tested the effect of face inversion on tuning. This was motivated by the fact that humans recognize a facial feature embedded in an upright face better than the same feature embedded in an inverted face14.
For the first experiment, we measured cartoon tuning in 49 neurons under three different contexts. In context 1 (the original tuning experiment), all 19 dimensions were simultaneously varied and a single dimension showing significant tuning was identified. In context 2, this one feature dimension was then varied while all others were kept constant (at their mean value). In context 3, the same feature dimension was varied, but all nonvarying face parts were removed (tuning of five example cells measured under these three contexts is shown in Fig. 6a; Supplementary Videos 2–4). The shapes of the tuning curves determined in contexts 1 and 2 were similar (r = 0.93, P ≪ 0.001). In the absence of the rest of the face (context 3), 14 cells (29%) lost any significant tuning. The tuning curves of the remaining 35 cells that kept their tuning were similar in shape across all three contexts (mean r = 0.93, 0.66 and 0.66 between contexts 1 and 2, 2 and 3, and 1 and 3, respectively, P ≪ 0.001; Fig. 6b,c). The gain of tuning curves, however, was larger when the whole face was present (average gain ratio of 2.2 for context 2 versus 3; Fig. 6c). In addition, firing rates were higher when the whole face was shown (14.7 Hz in context 2 versus 9.3 Hz in context 3). These results indicate that the overall gestalt of a face exerts an influence on tuning to individual feature dimensions by gain modulation; when the whole face is present, gain increases and tuning becomes more robust, whereas the shape of the tuning curve remains constant. A further prediction of a gain modulation model is the separability of joint tuning functions (for which we present evidence above).
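One way to quantify "same shape, different gain" is to regress the tuning curve measured with the whole face on the curve measured with the feature alone. In this sketch the slope, the ratio of peak-to-peak amplitudes and the correlation are illustrative measures with a hypothetical helper name; the paper's exact definition of the gain ratio is not given in this section and may differ.

```python
import numpy as np


def gain_and_shape(with_face, parts_only):
    """Relate two tuning curves for the same feature dimension measured in
    different contexts. Returns (slope, amplitude_ratio, r): the slope of a
    linear fit of `with_face` on `parts_only`, the ratio of their peak-to-peak
    amplitudes, and their Pearson correlation (shape similarity)."""
    with_face = np.asarray(with_face, dtype=float)
    parts_only = np.asarray(parts_only, dtype=float)
    slope, _intercept = np.polyfit(parts_only, with_face, 1)
    amplitude_ratio = (with_face.max() - with_face.min()) / (parts_only.max() - parts_only.min())
    r = np.corrcoef(with_face, parts_only)[0, 1]
    return float(slope), float(amplitude_ratio), float(r)


# Usage: a pure gain change (plus offset) leaves the shape intact, so r stays at 1.
parts_only = np.linspace(5.0, 10.0, 11)
with_face = 2.2 * parts_only + 1.0
slope, amplitude_ratio, r = gain_and_shape(with_face, parts_only)
```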
Figure 6
Integration of features and effects of face context. (a) Time-resolved tuning curves from five cells (left to right) showing the effect of three different contexts (top to bottom) on face feature tuning. The top row shows the responses to varying a single …
Mechanisms for holistic processing: face inversion
What brings about this gain modulation? Is it the particular gestalt of a face or simply the presence of other features in close proximity? Face inversion offers a unique opportunity to address this question, as upright and inverted stimuli are similar at the feature level, but substantially different in their overall gestalt14–16. Our cartoon stimuli are especially well suited for studying inversion effects because face inversion does not physically alter facial layout, eye- and eyebrow-related features, but merely flips the order of feature values. For example, raised eyebrows turn into lowered eyebrows and vice versa. Nose, hair and mouth, on the other hand, are physically changed by inversion.
We tested 48 neurons with rapid serial visual presentation sequences of upright and inverted cartoon stimuli. We found a 25% reduction in the incidence of tuning with inversion (131 significantly modulated tuning curves for upright faces and 98 for inverted faces), which affected all important feature dimensions (Fig. 7a). However, there were two notable exceptions to the general reduction: tuning to eyebrow parameters was not just reduced, but was lost entirely with inversion, and substantial tuning to mouth-related parameters emerged de novo (both significant at P = 0.01 using bootstrapping controls). Because mouth and eyebrow features remained physically identical on inversion, it must have been the change of their placement inside the face with inversion that caused loss of tuning to the former and emergence of tuning to the latter. If middle face patch cells match the incoming stimulus against an upright face template, this would parsimoniously explain the emergence of tuning to mouths (as they appear at locations expected for eye- and eyebrow-related features), the loss of tuning to eyebrows (resulting from mismatch between actual and expected position) and the general decrease in tuning to the other features (resulting from partial mismatch with expected position) on inversion. The template hypothesis makes a further prediction for how inversion should change the shape of tuning:
Figure 7
Face inversion and feature tuning. (a) Distributions of number of significantly tuned dimensions per cell for upright (black, dark gray) and inverted (light gray) cartoon stimuli (data are presented as in Fig. 3d). (b) Tuning curve of an example neuron …
When inversion merely flips the order of feature values, the tuning curve to that feature should flip as well. We examined a neuron tuned to height of feature assembly (Fig. 7b). This neuron responded maximally to the feature assembly located near the chin inside an upright face and to the feature assembly located near the forehead inside an inverted face. In both cases, this neuron preferred the feature assembly toward the bottom (in gravitational terms) of the facial outline. In the population, we found the predicted negative correlation between the shape of tuning to height of feature assembly for upright and inverted faces (r = −0.50; Fig. 7c,d). On the other hand, when inversion leaves both the ordering and physical appearance of a feature unchanged, tuning curves to the feature should stay unchanged on inversion. Indeed, we found robust positive correlations between tuning curve shapes for upright and inverted faces to the popular features of face aspect ratio, face direction, inter-eye distance and iris size (Fig. 7d).
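The template prediction can be made concrete for a single cell: for a feature whose value order is reversed by inversion, the inverted-face tuning curve should anticorrelate with the upright one as measured, and correlate positively once its value axis is reversed. This helper is hypothetical; it assumes both curves are sampled at the same symmetric set of values (−5 to +5), so reversing the array implements v → −v.

```python
import numpy as np


def inversion_correlations(upright, inverted):
    """Correlation between upright and inverted tuning curves for one feature,
    as measured and after reversing the value axis of the inverted curve."""
    upright = np.asarray(upright, dtype=float)
    inverted = np.asarray(inverted, dtype=float)
    r_as_measured = np.corrcoef(upright, inverted)[0, 1]
    r_axis_reversed = np.corrcoef(upright, inverted[::-1])[0, 1]
    return float(r_as_measured), float(r_axis_reversed)


# Hypothetical cell whose preference follows the gravitational position of the features.
upright = np.linspace(2.0, 10.0, 11)
inverted = upright[::-1]
r_as_measured, r_axis_reversed = inversion_correlations(upright, inverted)
```

For features that inversion leaves unchanged in order and appearance (such as face aspect ratio), the same helper would be expected to return a positive correlation as measured.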
In this study, we took advantage of the rich, but simple, face code supplied by cartoon faces to probe strategies for face detection and differentiation in the middle face patch. By studying this problem in a small cortical volume, we identified new coding principles that may be of general importance to the extraction of complex form in inferotemporal cortex.
Cells in the middle face patch detect a wide range of faces, as evidenced by their vigorous responses to both real and cartoon faces compared with objects (Fig. 1c,d). However, different cells accomplish this by different means. No cell required the presence of a whole face to respond, indicating that the detection process is not strictly holistic. Instead, responses to systematically decomposed cartoon faces showed that different cells were selective for different face parts and interactions between parts (Fig. 2a,b), and even the same cell can respond maximally to different combinations of face parts (Fig. 2d). Thus, there is no single blueprint for detecting the form of a face in the middle face patch.
The mechanism for distinguishing between individual faces appears to rely on a division of labor among cells tuned to different subsets of facial features. This was revealed by dense parametric mapping17; responses were measured to a cartoon stimulus in which all of the face parameters were independently varied. Tuning to individual features was almost universal (found in 90% of cells) and each cell was tuned, on average, to only three feature dimensions (Fig. 3c), which were integrated in a separable manner (Fig. 5a). Together, cells in the middle face patch span a face space18,19 with three salient characteristics. First, the axes of the space, represented by the tuning curves of individual cells, correspond to basic face features and not to holistic exemplars. Second, the dimensionality of the space is reduced compared with the physical face space20 (Fig. 3d) and the population focuses on features related to the eye and face layout geometry. Third, the location in the space is coded predominantly by the firing rates of cells with broad, monotonic tuning curves (Fig. 4d). A majority of tuning curves peaked at one extreme and showed a minimal response at the opposite extreme, and the dynamic range of tuning often spanned or slightly exceeded the range of physical plausibility. Monotonic tuning allows for simple readout21 and may be a general principle for high-level coding of visual shapes22,23. It may also aid in emphasizing what makes an individual face unique (that is, what separates it from the standard face)7,24–26, as population response variance is highest to such ‘unusual’ features (Fig. 4c). Finally, the breadth of tuning underscores the fact that cells in the middle face patch encode axes and not individual faces.
This finding indicates that coding in the middle face patch is coarse, unlike the sparse coding found at higher stages of the processing hierarchy27, and substantiates theoretical proposals that coarse population codes are advantageous for representing high-dimensional stimulus spaces28,29.
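The claim that monotonic tuning permits simple readout can be illustrated with a linear decoder applied to a simulated population of ramp-tuned cells. Everything here is hypothetical (population size, slopes, baselines, Gaussian noise, helper names) and is meant only to show that a weighted sum of firing rates recovers the feature value; it is not an analysis from the paper.

```python
import numpy as np


def make_population(n_cells=40, seed=0):
    """Hypothetical ramp-tuned population: a baseline rate and a signed slope per cell."""
    rng = np.random.default_rng(seed)
    baseline = rng.uniform(5.0, 15.0, n_cells)
    slope = rng.choice([-1.0, 1.0], n_cells) * rng.uniform(0.5, 2.0, n_cells)
    return baseline, slope


def respond(values, baseline, slope, rng, noise_sd=1.0):
    """Firing rates (trials x cells): linear in the feature value plus Gaussian noise."""
    rates = baseline + np.outer(values, slope)
    return rates + rng.normal(0.0, noise_sd, rates.shape)


def fit_linear_readout(rates, values):
    """Least-squares weights (last entry is the intercept) mapping rates to feature values."""
    design = np.column_stack([rates, np.ones(len(rates))])
    weights, *_ = np.linalg.lstsq(design, values, rcond=None)
    return weights


def read_out(rates, weights):
    """Apply the linear readout to new population responses."""
    return np.column_stack([rates, np.ones(len(rates))]) @ weights


# Usage: train on one set of trials, decode a fresh set.
baseline, slope = make_population()
rng = np.random.default_rng(1)
train_values = rng.choice(np.arange(-5, 6), 500).astype(float)
test_values = rng.choice(np.arange(-5, 6), 200).astype(float)
weights = fit_linear_readout(respond(train_values, baseline, slope, rng), train_values)
decoded = read_out(respond(test_values, baseline, slope, rng), weights)
r = np.corrcoef(decoded, test_values)[0, 1]
```

Because every cell's rate changes monotonically across the whole feature range, a single set of weights works for all values; with peaked (non-monotonic) tuning a linear readout would be ambiguous on either side of each peak.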
Psychophysicists have long proposed that perception of face identity has a holistic component12,13,30, in which a face is obligatorily processed as a whole. We found two lines of evidence for holistic processing of face geometry. First, we found that the presence of a whole, upright face increased the gain of feature tuning curves by an average factor of 2.2 (Fig. 6c). Our finding that holistic coding uses gain modulation supports the idea that gain modulation may be a computational mechanism of general importance to cortical function31,32, extending beyond its established roles in coordinate transformations33,34 and attention35,36. Second, by comparing responses to upright and inverted cartoon faces, we found that the identity of individual features is interpreted according to the heuristics of an upright face template (Fig. 7). These two results demonstrate specific neural mechanisms by which the presence of an upright facial gestalt influences feature measurement in single cells. In our experiments, cartoons were presented rapidly, testing the system in a feedforward mode37–39 in the sense that no expectations about the upcoming feature values could be formed. In real-world face perception, top-down feedback40 is important and may be necessary for additional effects of holistic processing.
We found a high incidence of tuning to some facial features, mostly to eyes and facial layout, and a paucity of tuning to others, mostly mouth- and nose-related ones. It is conceivable that such a spatial bias in tuning preference could result from attention or preferential looking rather than from a computational strategy for face processing, as attention has been shown to augment feature tuning35,36,41. Several results are, however, incompatible with a spatial attention or preferential looking account of tuning biases. First, these accounts would predict stronger tuning to isolated face parts as a result of the absence of other potentially distracting visual stimuli (the rest of the face). However, we found the opposite (Fig. 6a). Second, preference for face aspect ratio over both internal features (nose, mouth, eyes and eyebrows) and external ones (hair) could only be explained by a donut-shaped spotlight of attention, and such a spotlight would not cause preferential tuning for face direction or height of feature assembly, two popular parameters defined by the position of the internal features relative to the face layout. Third, the feature tuning bias occurred independently of slight gaze-direction biases above or below the fixation spot (Supplementary Text 1); for example, the preference for eye parameters remained even during fixations below the fixation spot. This rules out the possibility of a preferential looking account and renders a spatial attention confound unlikely, as spatial attention is tied to eye movements. Thus, preferential tuning for facial layout and eye parameters seems to be influenced little, if at all, by attention and eye positioning, but instead seems to be the result of computational mechanisms of shape analysis in the middle face patch.
For the same reasons, it seems unlikely that the preferential representation of extreme feature values is a byproduct of attentional capture. Furthermore, an attentional capture account would predict response maxima at both extremes, that is, U-shaped tuning curves, because in most feature dimensions both ends of the shape spectrum are equally extreme. Instead, tuning curves were ramp shaped, and response minima occurred at the extremes even more often than maxima. Nor can the special status of extreme feature values be explained by responses to shape changes (attention capturing or not) rather than genuine shape preferences: significant interactions between responses to successive feature values were found in less than 2% of all tuned feature dimensions (P < 0.05).
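The argument turns on the shape each account predicts: attentional capture predicts elevated responses at both extremes (U-shaped), whereas a ramp has its maximum at one extreme and its minimum at the other. The toy rule below labels a tuning curve by where its extrema fall, to make that distinction concrete. The labels, the rule and the example curves are hypothetical; this is not the classification procedure used in the study, and real tuning curves would need a statistical criterion to cope with noise.

```python
# Toy rule for labelling tuning-curve shape; the labels and the rule are hypothetical
# and are not the classification procedure used in the study.


def classify_tuning(rates):
    """Label responses ordered along a feature axis by where their extrema fall."""
    n = len(rates)
    if n < 3:
        raise ValueError("need at least three feature values")
    i_max = rates.index(max(rates))
    i_min = rates.index(min(rates))
    ends = (0, n - 1)
    centre = rates[n // 2]
    if i_max in ends and i_min in ends and i_max != i_min:
        # Maximum at one extreme, minimum at the opposite extreme.
        return "ramp"
    if rates[0] > centre and rates[-1] > centre and i_min not in ends:
        # Both extremes exceed the centre: the shape attentional capture would predict.
        return "U-shaped"
    if i_max not in ends:
        # Preferred value lies inside the range rather than at an extreme.
        return "peaked"
    return "other"


examples = {
    "ramp": [4, 6, 9, 12, 15, 18, 20],
    "U-shaped": [18, 12, 8, 6, 8, 12, 18],
    "peaked": [5, 9, 14, 18, 14, 9, 5],
}
for expected, curve in examples.items():
    print(f"{expected:>9}: {classify_tuning(curve)}")
```

A decreasing ramp (maximum at the low extreme) is also labelled "ramp", which matches the text's point that extremes often carry response minima as well as maxima.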
Our results expand existing conceptions about inferotemporal organization42–44 in two major ways. First, it has been suggested that an IT cell can be characterized by its ‘critical feature’, defined as the simplest stimulus that still elicits a maximal response42. Our results suggest that such a characterization is incomplete and needs to be augmented by a description of the cell’s feature tuning and its full selectivity for parts and part interactions. Cells in the middle face patch are not only selective for the presence of subsets of face parts (Fig. 2), but also show tuning to subsets of face features (Fig. 3). The critical feature for a cell would be a face optimized along all dimensions to which the cell is tuned. However, knowing this single best image would not allow one to distinguish between features to which the cell is tuned and parts that are simply required to be present (in whatever shape). Furthermore, the predominance of broad, ramp-shaped tuning suggests that all levels of response to a tuned feature, including minimal responses, are important: minimal responses are just as informative about which feature value is present as maximal responses (see ref. 8 for a related idea). This notion, that all levels of response to a tuned feature are informative, is absent from the critical feature account of IT. On average, the response to a full face was less than the sum of the responses to each part, and cells often fired maximally to different combinations of face parts (Fig. 2d). Therefore, an IT cell, at least in the middle face patch, is only incompletely characterized by a single critical feature; instead, it is necessary to describe all of the parts and part combinations for which the cell is selective.
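The claim that minimal responses are as informative as maximal ones follows from the one-to-one mapping of a ramp: any rate within the curve's range corresponds to a single feature value. The sketch below inverts a tuning curve by linear interpolation to show this. It assumes a hypothetical noise-free, strictly increasing tuning curve sampled at 11 feature values (a decreasing ramp could be handled by reversing both lists); it is an illustration, not a decoder used in the study.

```python
from bisect import bisect_left

# Hypothetical increasing ramp tuning curve: feature values and mean rates (spikes/s).
FEATURES = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
RATES = [4.0, 5.5, 7.0, 9.0, 11.0, 13.5, 16.0, 18.0, 20.5, 23.0, 25.0]


def decode_feature(rate, features=FEATURES, rates=RATES):
    """Map a firing rate back to a feature value by inverting an increasing ramp."""
    if any(later <= earlier for earlier, later in zip(rates, rates[1:])):
        raise ValueError("tuning curve must be strictly increasing to be invertible")
    if not rates[0] <= rate <= rates[-1]:
        raise ValueError("rate lies outside the range of the tuning curve")
    j = bisect_left(rates, rate)  # first index with rates[j] >= rate
    if rates[j] == rate:
        return float(features[j])
    # rate falls strictly between rates[j - 1] and rates[j]: interpolate linearly.
    t = (rate - rates[j - 1]) / (rates[j] - rates[j - 1])
    return features[j - 1] + t * (features[j] - features[j - 1])


# A minimal response is as informative as a maximal one: each maps to one value.
for r in (4.0, 12.25, 25.0):
    print(f"{r:5.2f} spikes/s -> feature value {decode_feature(r):+.2f}")
```

In this toy curve, the weakest response pins the feature to one extreme just as unambiguously as the strongest response pins it to the other, whereas a description that records only the best stimulus (the +5 value here) discards that information.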
The second major insight from our findings concerns the functional organization of IT. It has been suggested that cells selective for visually similar critical features are grouped into columns. Our results indicate that what cells in the middle face patch have in common is a strong preference for faces over other objects, but this preference is a true form selectivity that cannot be captured by a shared selectivity for any fixed visual feature. Part selectivity and feature tuning were markedly diverse in the middle face patch, and the tuned features of two neighboring face cells often shared no visual similarity at all (for example, hair width versus eyebrow slant). This diversity of feature tuning provides the brain with a rich vocabulary to describe faces and shows how a high-dimensional parameter space may be encoded even in a small region of IT. The macaque temporal lobe contains three face patches anterior to the middle face patch, and future experiments may reveal how the vocabulary of the middle face patch is used by the anterior face patches.
METHODS
Methods and any associated references are available in the online version of the paper at http://www.nature.com/natureneuroscience/.
ACKNOWLEDGMENTS
We are grateful to the late D. Freeman and N. Nallasamy for stimulus programming; R. Tootell, W. Vanduffel and members of the Massachusetts General Hospital monkey fMRI group for assistance with scanning; A. Dale and A. van der Kouwe for allowing us to use their multi-echo sequence and undistortion algorithm; T. Chuprina and N. Schweers for animal care and Guerbet for providing Sinerem contrast agent. This work was sponsored by US National Institutes of Health grant EY16187 and German Science Foundation grant FR 1437/3-1. D.Y.T. was supported by a Sofja Kovalevskaja Award from the Alexander von Humboldt Foundation.
Footnotes
Note: Supplementary information is available on the Nature Neuroscience website.
1. Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat's striate cortex. J. Physiol. (Lond.) 1959;148:574–591.
2. Kanwisher N, McDermott J, Chun MM. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 1997;17:4302–4311.
3. Tsao DY, Freiwald WA, Knutsen TA, Mandeville JB, Tootell RBH. Faces and objects in macaque cerebral cortex. Nat. Neurosci. 2003;6:989–995.
4. Tsao DY, Freiwald WA, Tootell RBH, Livingstone MS. A cortical region consisting entirely of face-selective cells. Science. 2006;311:670–674.
5. Brunswik E, Reiter L. Eindruckscharaktere schematisierter Gesichter. Z. Psychol. 1937;142:67–135.
6. Biederman I. Recognition-by-components: a theory of human image understanding. Psychol. Rev. 1987;94:115–147.
7. Rhodes G, Brennan S, Carey S. Identification and ratings of caricatures: implications for mental representations of faces. Cogn. Psychol. 1987;19:473–497.
8. Benson PJ, Perrett DI. Perception and recognition of photographic quality facial caricatures: implications for the recognition of natural images. Eur. J. Cogn. Psychol. 1991;3:105–135.
9. Leopold DA, Bondar IV, Giese MA. Norm-based face encoding by single neurons in monkey inferotemporal cortex. Nature. 2006;442:572–575.
10. Mel B, Fiser J. Minimizing binding errors using learned conjunctive features. Neural Comput. 2000;12:247–278.
11. Grunewald A, Skoumbourdis EK. The integration of multiple stimulus features by V1 neurons. J. Neurosci. 2004;24:9185–9194.
12. Young AW, Hellawell D, Hay DC. Configural information in face perception. Perception. 1987;16:747–759.
13. Tanaka JW, Farah MJ. Parts and wholes in face recognition. Q. J. Exp. Psychol. A. 1993;46:225–245.
14. Thompson P. Margaret Thatcher: a new illusion. Perception. 1980;9:483–484.
15. Yin RK. Looking at upside-down faces. J. Exp. Psychol. 1969;81:141–145.
16. Bartlett JC, Searcy J. Inversion and configuration of faces. Cogn. Psychol. 1993;25:281–316.
17. Brincat SL, Connor CE. Underlying principles of visual shape selectivity in posterior inferior temporal cortex. Nat. Neurosci. 2004;7:880–886.
18. Valentine T. A unified account of the effects of distinctiveness, inversion and race in face recognition. Q. J. Exp. Psychol. A. 1991;43:161–204.
19. McKone E, Aitkin A, Edwards M. Categorical and coordinate relations in faces, or Fechner’s law and face space instead? J. Exp. Psychol. Hum. Percept. Perform. 2005;31:1181–1198.
20. Fraser IH, Craig GL, Parker DM. Reaction time measures of feature saliency in schematic faces. Perception. 1990;19:661–673.
21. Guigon E. Computing with populations of monotonically tuned neurons. Neural Comput. 2003;15:2115–2127.
22. Kayaert G, Biederman I, Op de Beeck H, Vogels R. Tuning for shape dimensions in macaque inferior temporal cortex. Eur. J. Neurosci. 2005;22:212–224.
23. De Baene W, Premereur E, Vogels R. Properties of shape tuning of macaque inferior temporal neurons examined using rapid serial visual presentation. J. Neurophysiol. 2007;97:2900–2916.
24. Rhodes G. Superportraits: Caricatures and Recognition. East Sussex, UK: Psychology Press Publishers; 1996.
25. Webster MA, MacLin O. Figural aftereffects in the perception of faces. Psychon. Bull. Rev. 1999;6:647–653.
26. Leopold DA, O’Toole AJ, Vetter T, Blanz V. Prototype-referenced shape encoding revealed by high-level aftereffects. Nat. Neurosci. 2001;4:89–94.
27. Quiroga RQ, Reddy L, Kreiman G, Koch C, Fried I. Invariant visual representation by single neurons in the human brain. Nature. 2005;435:1102–1107.
28. Zhang K, Sejnowski TJ. Neuronal tuning: to sharpen or broaden? Neural Comput. 1999;11:75–84.
29. Eurich CW, Wilke SD. Multidimensional encoding strategy of spiking neurons. Neural Comput. 2000;12:1519–1529.
30. Galton F. Composite portraits, made by combining those of many different persons into a single, resultant figure. J. Anthropol. Inst. 1879;8:132–144.
31. Salinas E, Thier P. Gain modulation: a major computational principle of the central nervous system. Neuron. 2000;27:15–21.
32. Liu Y, Jagadeesh B. Neural selectivity in anterior inferotemporal cortex for morphed photographic images during behavioral classification or fixation. J. Neurophysiol. 2008;100:966–982.
33. Andersen RA, Bracewell RM, Barash S, Gnadt JW, Fogassi L. Eye position effects on visual, memory, and saccade-related activity in areas LIP and 7a of macaque. J. Neurosci. 1990;10:1176–1196.
34. Zipser D, Andersen RA. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature. 1988;331:679–684.
35. Treue S, Martínez Trujillo JC. Feature-based attention influences motion processing gain in macaque visual cortex. Nature. 1999;399:575–579.
36. Koida K, Komatsu H. Effects of task demands on the responses of color-selective neurons in the inferior temporal cortex. Nat. Neurosci. 2007;10:108–116.
37. Thorpe S, Fize D, Marlot C. Speed of processing in the human visual system. Nature. 1996;381:520–522.
38. Keysers C, Xiao D-K, Földiák P, Perrett DI. The speed of sight. J. Cogn. Neurosci. 2001;13:90–101.
39. Fabre-Thorpe M, Richard G, Thorpe SJ. Rapid categorization of natural images by rhesus monkeys. Neuroreport. 1998;9:303–308.
40. Cox D, Meyers E, Sinha P. Contextually evoked object-specific responses in human visual cortex. Science. 2004;304:115–117.
41. McAdams CJ, Maunsell JHR. Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. J. Neurosci. 1999;19:431–441.
42. Tanaka K, Saito H-A, Fukada Y, Moriya M. Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J. Neurophysiol. 1991;66:170–189.
43. Fujita I, Tanaka K, Ito M, Cheng K. Columns for visual features of objects in monkey inferotemporal cortex. Nature. 1992;360:343–346.
44. Wang G, Tanaka K, Tanifuji M. Optical imaging of functional organization in the monkey inferotemporal cortex. Science. 1996;272:1665–1668.
45. Bruce C, Desimone R, Gross CG. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J. Neurophysiol. 1981;46:369–384.