The ability of primates to effortlessly recognize faces has been attributed to the existence of specialized face areas. One such area, the macaque middle face patch, consists almost entirely of cells that are selective for faces, but the principles by which these cells analyze faces are unknown. We found that middle face patch neurons detect and differentiate faces using a strategy that is both part based and holistic. Cells detected distinct constellations of face parts. Furthermore, cells were tuned to the geometry of facial features. Tuning was most often ramp-shaped, with a one-to-one mapping of feature magnitude to firing rate. Tuning amplitude depended on the presence of a whole, upright face and features were interpreted according to their position in a whole, upright face. Thus, cells in the middle face patch encode axes of a face space specialized for whole, upright faces.
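The ramp-shaped tuning described above, and its dependence on a whole, upright face, can be sketched with a toy model. This is purely illustrative: the baseline, gain, and amplitude-reduction factor below are assumptions for the sketch, not values from the study.

```python
import numpy as np

def ramp_tuning(feature, baseline=10.0, gain=8.0, whole_upright=True):
    """Toy ramp-shaped tuning curve for a hypothetical face-patch cell.

    feature       : feature magnitude normalized to [-1, 1]
                    (e.g., inter-eye distance relative to the mean face)
    baseline      : firing rate (Hz) at the mean feature value
    gain          : slope of the ramp (Hz per unit of feature)
    whole_upright : amplitude shrinks when the feature is not embedded
                    in a whole, upright face (illustrative factor)
    """
    amplitude = gain if whole_upright else 0.25 * gain
    return max(baseline + amplitude * feature, 0.0)  # rates are non-negative

# A monotonic, one-to-one mapping of feature magnitude to firing rate:
rates = [ramp_tuning(f) for f in np.linspace(-1.0, 1.0, 5)]
```

The monotonic ramp is what makes the cell suitable as an axis of a face space: each firing rate corresponds to one feature magnitude along the encoded dimension.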
Faces are among the most informative stimuli we ever perceive: Even a split-second glimpse of a person's face tells us their identity, sex, mood, age, race, and direction of attention. The specialness of face processing is acknowledged in the artificial vision community, where contests for face recognition algorithms abound. Neurological evidence strongly implicates dedicated machinery for face processing in the human brain, which would explain the double dissociation between face and object recognition deficits. Furthermore, it has recently become clear that macaques too have specialized neural machinery for processing faces. Here we propose a unifying hypothesis, deduced from computational, neurological, fMRI, and single-unit experiments: that what makes face processing special is that it is gated by an obligatory detection process. We will clarify this idea in concrete algorithmic terms, and show how it can explain a variety of phenomena associated with face processing.
Face processing; Face cells; Holistic processing; Face recognition; Face detection; Temporal lobe
Faces are generally perceived as thinner when viewed upside down. When a face is viewed upright, the internal features are thought to influence the perception of face shape. However, when a face is inverted, it has been proposed that disruption to holistic processing means that these cues can no longer be used to judge its shape. We show that an inverted face does not simply revert to some average shape, whereby fat faces would appear thinner upside down and thin faces would appear fatter. The fact that the illusion appears to occur for most face shapes is discussed with regard to the horizontal–vertical illusion.
face perception; inversion; size; shape
Upright faces are thought to be processed holistically. However, the range of views within which holistic processing occurs is unknown. Recent research by McKone (2008) suggests that holistic processing occurs for all yaw-rotated face views (i.e., full-face through to profile). Here we examined whether holistic processing occurs for pitch, as well as yaw, rotated face views. In this face recognition experiment: (i) participants made same/different judgments about two sequentially presented faces (either both upright or both inverted); (ii) the test face was pitch/yaw rotated by between 0° and 75° from the encoding face (always a full-face view). Our logic was as follows: if a particular pitch/yaw-rotated face view is being processed holistically when upright, then this processing should be disrupted by inversion. Consistent with previous research, significant face inversion effects (FIEs) were found for all yaw-rotated views. However, while FIEs were found for pitch rotations up to 45°, none were observed for 75° pitch rotations (rotated either above or below the full face). We conclude that holistic processing does not occur for all views of upright faces (e.g., not for uncommon pitch rotated views), only those that can be matched to a generic global representation of a face.
face recognition; inversion; holistic processing; pitch and yaw axes
Primates possess the remarkable ability to differentiate faces of group members and to extract relevant information about the individual directly from the face. Recognition of conspecific faces is achieved by means of holistic processing, i.e. the processing of the face as an unparsed, perceptual whole, rather than as the collection of independent features (part-based processing). The most striking example of holistic processing is the Thatcher illusion. Local changes in facial features are hardly noticeable when the whole face is inverted (rotated 180°), but strikingly grotesque when the face is upright. This effect can be explained by a lack of processing capabilities for locally rotated facial features when the face is turned upside down. Recently, a Thatcher illusion was described in the macaque monkey analogous to that known from human investigations. Using a habituation paradigm combined with eye tracking, we address the critical follow-up questions raised in the aforementioned study to show the Thatcher illusion as a function of the observer's species (humans and macaques), the stimulus' species (humans and macaques) and the level of perceptual expertise (novice, expert).
Thatcher illusion; monkey; face recognition; holistic perception
While own-age faces have been reported to be better recognized than other-age faces, the underlying cause of this phenomenon remains unclear. One potential cause is holistic face processing, a special kind of perceptual and cognitive processing reserved for perceiving upright faces. Previous studies have indeed found that adults show stronger holistic processing when looking at adult faces compared to child faces, but whether a similar own-age bias exists in children remains to be shown.
Here we used the composite face task – a standard test of holistic face processing – to investigate whether, for child faces, holistic processing is stronger in children than in adults. Results showed that child participants (8–13 years) had a larger composite effect than adult participants (22–65 years).
Our finding suggests that differences in strength of holistic processing may underlie the own-age bias on recognition memory. We discuss the origin of own-age biases in terms of relative experience, face-space tuning, and social categorization.
Current models of face processing support an orientation-dependent expert face processing mechanism. However, even when upright, faces are encountered from different viewpoints, across which a face processing system must be able to generalize. Different computational models have generated competing predictions of how viewpoint variation might affect the perception of upright versus inverted faces. Our goal was to examine the interaction between viewpoint variation and orientation on face discrimination. Sixteen normal subjects performed an oddity paradigm requiring them to discriminate changes in three simultaneously viewed morphed faces presented either upright or inverted. In one type of trial all the faces were seen in frontal view; in the other, all faces varied in viewpoint, rotated 45° from each other. After the effects of orientation were adjusted for perceptual difficulty, there were only main effects of orientation and viewpoint, with no interaction between the two. We conclude that the effects of viewpoint variation on the perceptual discrimination of faces are no different for upright versus inverted faces, indicating that these effects are independent of the expertise that exists for upright faces.
Face processing; holistic processing models; viewpoint invariance; visual agnosia; prosopagnosia
It has long been argued that face processing requires disproportionate reliance on holistic or configural processing, relative to that required for non-face object recognition, and that a disruption of such holistic processing may be causally implicated in prosopagnosia. Previously, we demonstrated that individuals with congenital prosopagnosia (CP) did not show the normal face inversion effect (better performance for upright compared to inverted faces) and evinced a local (rather than the normal global) bias in a compound letter global/local (GL) task, supporting the claim of disrupted holistic processing in prosopagnosia. Here, we investigate further the nature of holistic processing impairments in CP, first by confirming, in a large sample of CP individuals, the absence of the normal face inversion effect and the presence of the local bias on the GL task, and, second, by employing the composite face paradigm, often regarded as the gold standard for measuring holistic face processing. In this last task, we show that, in contrast with normal individuals, the CP group performed equivalently with aligned and misaligned faces and was impervious to the interference from the task-irrelevant bottom part of faces seen in normal observers. Interestingly, the extent of the local bias evident in the composite task is correlated with the abnormality of performance on diagnostic face processing tasks. Furthermore, there is a significant correlation between the magnitude of the local bias in the GL task and performance on the composite task. These results provide further evidence for impaired holistic processing in CP and, moreover, corroborate the critical role of this type of processing for intact face recognition.
configural; faces; face perception; global processing; acquired prosopagnosia
How a visual stimulus is initially categorized as a face in a network of human brain areas remains largely unclear. Hierarchical neuro-computational models of face perception assume that the visual stimulus is first decomposed into local parts in lower order visual areas. These parts would then be combined into a global representation in higher order face-sensitive areas of the occipito-temporal cortex. Here we tested this view in fMRI with visual stimuli that are categorized as faces based on their global configuration rather than their local parts (two-tone Mooney figures and Arcimboldo's facelike paintings). Compared to the same inverted visual stimuli that are not categorized as faces, these stimuli activated the right middle fusiform gyrus (“Fusiform face area”) and superior temporal sulcus (pSTS), with no significant activation in the posteriorly located inferior occipital gyrus (i.e., no “occipital face area”). This observation is strengthened by behavioral and neural evidence for normal face categorization of these stimuli in a brain-damaged prosopagnosic patient whose intact right middle fusiform gyrus and superior temporal sulcus are devoid of any potential face-sensitive inputs from the lesioned right inferior occipital cortex. Together, these observations indicate that face-preferential activation may emerge in higher order visual areas of the right hemisphere without any face-preferential inputs from lower order visual areas, supporting a non-hierarchical view of face perception in the visual cortex.
face perception; visual cortex; Mooney; fusiform gyrus; prosopagnosia; FFA
Recognition and individuation of conspecifics by their face is essential for primate social cognition. This ability is driven by a mechanism that integrates the appearance of facial features with subtle variations in their configuration (i.e., second-order relational properties) into a holistic representation. So far, there is little evidence of whether our evolutionary ancestors show sensitivity to featural spatial relations and hence holistic processing of faces as shown in humans. Here, we directly compared macaques with humans in their sensitivity to configurally altered faces in upright and inverted orientations using a habituation paradigm and eye tracking technologies. In addition, we tested for differences in processing of conspecific faces (human faces for humans, macaque faces for macaques) and non-conspecific faces, addressing aspects of perceptual expertise. In both species, we found sensitivity to second-order relational properties for conspecific (expert) faces, when presented in upright, not in inverted, orientation. This shows that macaques possess the requirements for holistic processing, and thus show similar face processing to that of humans.
Holistic coding for faces is shown in several illusions that demonstrate integration of the percept across the entire face. The illusions occur upright but, crucially, not inverted. Converting the illusions into experimental tasks that measure their strength – and thus index degree of holistic coding – is often considered straightforward yet in fact relies on a hidden assumption, namely that there is no contribution to the experimental measure from secondary cognitive factors. For the composite effect, a relevant secondary factor is size of the “spotlight” of visuospatial attention. The composite task assumes this spotlight can be easily restricted to the target half (e.g., top-half) of the compound face stimulus. Yet, if this assumption were not true then a large spotlight, in the absence of holistic perception, could produce a false composite effect, present even for inverted faces and contributing partially to the score for upright faces. We review evidence that various factors can influence spotlight size: race/culture (Asians often prefer a more global distribution of attention than Caucasians); sex (females can be more global); appearance of the join or gap between face halves; and location of the eyes, which typically attract attention. Results from five experiments then show inverted faces can sometimes produce large false composite effects, and imply that whether this happens or not depends on complex interactions between causal factors. We also report, for both identity and expression, that only top-half face targets (containing eyes) produce valid composite measures. A sixth experiment demonstrates an example of a false inverted part-whole effect, where encoding-specificity is the secondary cognitive factor. 
We conclude the inverted face control should be tested in all composite and part-whole studies, and an effect for upright faces should be interpreted as a pure measure of holistic processing only when the experimental design produces no effect inverted.
face perception; inversion effects; holistic processing; composite task; part-whole task; culture differences; attention; global-local
During the first year of life, infants’ face recognition abilities are subject to “perceptual narrowing,” the end result of which is that observers lose the ability to distinguish previously discriminable faces (e.g. other-race faces) from one another. Perceptual narrowing has been reported for faces of different species and different races, in developing humans and primates. Though the phenomenon is highly robust and replicable, there have been few efforts to model the emergence of perceptual narrowing as a function of the accumulation of experience with faces during infancy. The goal of the current study is to examine how perceptual narrowing might manifest as statistical estimation in “face space,” a geometric framework for describing face recognition that has been successfully applied to adult face perception. Here, I use a computer vision algorithm for Bayesian face recognition to study how the acquisition of experience in face space and the presence of race categories affect performance for own and other-race faces. Perceptual narrowing follows from the establishment of distinct race categories, suggesting that the acquisition of category boundaries for race is a key computational mechanism in developing face expertise.
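The face-space framing above can be illustrated with a minimal sketch. This is not the paper's Bayesian algorithm: the 2-D space, the sample counts, and the spread-based scaling rule are illustrative assumptions. The sketch shows how a fixed physical difference between two faces is compressed when judged against a broad, undifferentiated statistical estimate, rather than a tight category learned from dense experience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D "face space": each face is a point whose coordinates
# are arbitrary appearance dimensions (illustrative, not from the study).
own_race = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # dense experience
other_race = rng.normal(loc=4.0, scale=1.0, size=(20, 2))  # sparse experience

def discriminability(face_a, face_b, experienced):
    """Distance between two faces, scaled by the spread of the experienced
    population along each dimension (a crude stand-in for statistical
    estimation in face space)."""
    sigma = experienced.std(axis=0)
    return float(np.linalg.norm((face_a - face_b) / sigma))

# A fixed physical difference between two faces...
pair = np.array([0.5, 0.0]), np.array([-0.5, 0.0])
d_own = discriminability(*pair, own_race)

# ...is compressed when judged against a broad estimate that pools the
# sparse other-race experience with the dense own-race experience.
pooled = np.vstack([own_race, other_race])
d_pooled = discriminability(*pair, pooled)
```

Under this toy scaling rule, `d_pooled < d_own`: without a distinct category for the second face population, the broadened spread estimate shrinks perceived differences, which is one way to picture the narrowing effect.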
Understanding how the human visual system recognizes objects is one of the key challenges in neuroscience. Inspired by a large body of physiological evidence, a general class of recognition models has emerged, which is based on a hierarchical organization of visual processing, with succeeding stages being sensitive to image features of increasing complexity. However, these models appear to be incompatible with some well-known psychophysical results. Prominent among these are experiments investigating recognition impairments caused by vertical inversion of images, especially those of faces. It has been reported that faces that differ 'featurally' are much easier to distinguish when inverted than those that differ 'configurally', a finding that is difficult to reconcile with the physiological models. Here, we show that after controlling for subjects' expectations, there is no difference between 'featurally' and 'configurally' transformed faces in terms of inversion effect. This result reinforces the plausibility of simple hierarchical models of object representation and recognition in the cortex.
It is widely agreed that the human face is processed differently from other objects. However there is a lack of consensus on what is meant by a wide array of terms used to describe this “special” face processing (e.g., holistic and configural) and the perceptually relevant information within a face (e.g., relational properties and configuration). This paper will review existing models of holistic/configural processing, discuss how they differ from one another conceptually, and review the wide variety of measures used to tap into these concepts. In general we favor a model where holistic processing of a face includes some or all of the interrelations between features and has separate coding for features. However, some aspects of the model remain unclear. We propose the use of moving faces as a way of clarifying what types of information are included in the holistic representation of a face.
holistic; configural; relational; moving faces; composite task; part-whole task; inversion
Face processing relies on a distributed, patchy network of cortical regions in the temporal and frontal lobes that respond disproportionately to face stimuli, other cortical regions that are not even primarily visual (such as somatosensory cortex), and subcortical structures such as the amygdala. Higher-level face perception abilities, such as judging identity, emotion and trustworthiness, appear to rely on an intact face-processing network that includes the occipital face area (OFA), whereas lower-level face categorization abilities, such as discriminating faces from objects, can be achieved without OFA, perhaps via the direct connections to the fusiform face area (FFA) from several extrastriate cortical areas. Some lesion, transcranial magnetic stimulation (TMS) and functional magnetic resonance imaging (fMRI) findings argue against a strict feed-forward hierarchical model of face perception, in which the OFA is the principal and common source of input for other visual and non-visual cortical regions involved in face perception, including the FFA, face-selective superior temporal sulcus and somatosensory cortex. Instead, these findings point to a more interactive model in which higher-level face perception abilities depend on the interplay between several functionally and anatomically distinct neural regions. Furthermore, the nature of these interactions may depend on the particular demands of the task. We review the lesion and TMS literature on this topic and highlight the dynamic and distributed nature of face processing.
faces; lesion studies; transcranial magnetic stimulation; fusiform face area
Face perception is a critical social ability and identifying its neural correlates is important from both basic and applied perspectives. In EEG recordings, faces elicit a distinct electrophysiological signature, the N170, which has a larger amplitude and shorter latency in response to faces compared to other objects. However, determining the face specificity of any neural marker for face perception hinges on finding an appropriate control stimulus. We used a novel stimulus set consisting of 300 images that spanned a continuum between random patches of natural scenes and genuine faces, in order to explore the selectivity of face-sensitive ERP responses with a model-based parametric stimulus set. Critically, our database contained “false alarm” images that were misclassified as faces by a computational face-detection system and varied in their image-level similarity to real faces. High-density (128-channel) event-related potentials (ERPs) were recorded while 23 adult subjects viewed all 300 images in random order, and determined whether each image was a face or non-face. The goal of our analyses was to determine the extent to which a gradient of sensitivity to face-like structure was evident in the ERP signal. Traditional waveform analyses revealed that the N170 component over occipitotemporal electrodes was larger in amplitude for faces compared to all non-faces, even those that were high in image similarity to faces, suggesting strict selectivity for veridical face stimuli. By contrast, single-trial classification of the entire waveform measured at the same sensors revealed that misclassifications of non-face patterns as faces increased with image-level similarity to faces. These results suggest that individual components may exhibit steep selectivity, but integration of multiple waveform features may afford graded information regarding stimulus appearance.
Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and is a focus of research in fields as diverse as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to increase the natural aspect of interaction and to improve the performance of interactive computer systems. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made using the full appearance information of the face, or whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State of the art machine learning methods are applied to a) derive a facial trait judgment model from training data and b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that a) prediction of perception of facial traits is learnable by both holistic and structural approaches; b) the most reliable prediction of facial trait judgments is obtained by certain type of holistic descriptions of the face appearance; and c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.
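The structural approach can be sketched in miniature. This is illustrative only, not the authors' pipeline: the landmark set, the normalization, and the synthetic ratings below are assumptions. The idea is that pairwise distances between salient points form a feature vector, and a linear model maps that structure to a trait rating.

```python
import numpy as np
from itertools import combinations

def structural_features(landmarks):
    """Pairwise distances between facial salient points, normalized by the
    first distance -- one simple 'structural' representation (the landmark
    set and the normalization are illustrative choices)."""
    d = np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                  for i, j in combinations(range(len(landmarks)), 2)])
    return d / d[0]  # d[0] plays the role of an inter-ocular distance

# Toy data: 5 (x, y) landmarks per face, each face paired with a synthetic
# trait rating generated from a hidden linear rule (for illustration only).
rng = np.random.default_rng(1)
faces = rng.normal(size=(40, 5, 2))
X = np.stack([structural_features(f) for f in faces])
w_true = rng.normal(size=X.shape[1])
y = X @ w_true

# Least-squares fit: learn a linear map from structure to trait rating.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ w
```

A holistic model would instead feed the full pixel appearance (e.g., a PCA of face images) into the learner; the abstract's comparison is between those two feature spaces, not between learning algorithms.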
Multicomponent signals consist of several traits that are perceived as a whole. Although many animals rely on multicomponent signals to communicate, the selective pressures shaping these signals are still poorly understood. Previous work has mainly investigated the evolution of multicomponent signals by studying each trait individually, which may not accurately reflect the selective pressures exerted by the holistic perception of signal receivers. Here, we study the design of the multicoloured face of an Old World primate, the mandrill (Mandrillus sphinx), in relation to two aspects of signalling that are expected to be selected by receivers: conspicuousness and information. Using reflectance data on the blue and red colours of the faces of 34 males and a new method of hue vectorisation in a perceptual space of colour vision, we show that the blue hue maximises contrasts to both the red hue and the foliage background colouration, thereby increasing the conspicuousness of the whole display. We further show that although blue saturation, red saturation and the contrast between blue and red colours are all correlated with dominance, dominance is most accurately indicated by the blue-red contrast. Taken together, our results suggest that the evolution of blue and red facial colours in male mandrills is not independent and is likely driven by the holistic perception of conspecifics. In this view, we propose that the multicoloured face of mandrills acts as a multicomponent signal. Lastly, we show that information accuracy increases with the conspicuousness of the whole display, indicating that both aspects of signalling can evolve in concert.
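The contrast logic can be sketched as follows. This is a minimal illustration, not the paper's hue-vectorisation method: the 2-D perceptual space and the chromaticity coordinates are invented placeholders, chosen so the qualitative pattern echoes the reported result rather than derived from measured reflectance data.

```python
import numpy as np

# Invented chromaticity coordinates in a hypothetical 2-D perceptual
# colour space (placeholders for illustration, not measured data).
blue    = np.array([-0.6,  0.1])
red     = np.array([ 0.7,  0.2])
foliage = np.array([ 0.4, -0.5])

def contrast(a, b):
    """Chromatic contrast as Euclidean distance in the perceptual space."""
    return float(np.linalg.norm(a - b))

# Conspicuousness of the whole display depends on both the internal
# blue-red contrast and each colour's contrast against the background.
blue_red     = contrast(blue, red)
blue_foliage = contrast(blue, foliage)
red_foliage  = contrast(red, foliage)
```

With these placeholder coordinates, the blue hue yields the larger contrasts against both the red hue and the foliage background, which is the geometric sense in which a hue can "maximise" the conspicuousness of the whole display.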
An object or feature is generally more difficult to identify when other objects are presented nearby, an effect referred to as crowding. Here, we used Mooney faces to examine whether crowding can also occur within and between holistic face representations (C. M. Mooney, 1957). Mooney faces are ideal stimuli for this test because no cues exist to distinguish facial features in a Mooney face; to find any facial feature, such as an eye or a nose, one must first holistically perceive the image as a face. Through a series of six experiments we tested the effect of crowding on Mooney face recognition. Our results demonstrate crowding between and within Mooney faces and fulfill the diagnostic criteria for crowding, including eccentricity dependence and lack of crowding in the fovea, critical flanker spacing consistent with less than half the eccentricity of the target, and inner-outer flanker asymmetry. Further, our results show that recognition of an upright Mooney face is more strongly impaired by upright Mooney face flankers than inverted ones. Taken together, these results suggest crowding can occur selectively between high-level representations of faces and that crowding must occur at multiple levels in the visual system.
peripheral vision; spatial vision; object recognition; inversion; asymmetry
Understanding the developmental origins of face recognition has been the goal of many studies employing various approaches. Contributions of experience-expectant mechanisms (early component), like perceptual narrowing, and lifetime experience (late component) to face processing remain elusive. By investigating captive chimpanzees of varying age, a rare case of a species with lifelong exposure to non-conspecific faces at distinct levels of experience, we can disentangle developmental components in face recognition. We found an advantage in discriminating chimpanzee above human faces in young chimpanzees, reflecting a predominant contribution of an early component that drives the perceptual system towards the conspecific morphology, and an advantage for human above chimpanzee faces in old chimpanzees, reflecting a predominant late component that shapes the perceptual system along the critical dimensions of the faces it is exposed to. We simulate the contribution of early and late components using computational modeling and mathematically describe the underlying functions.
People with autism and schizophrenia have been shown to have a local bias in sensory processing and face recognition difficulties. A global or holistic processing strategy is known to be important when recognizing faces. Studies investigating face recognition in these populations are reviewed and show that holistic processing is employed despite lower overall performance in the tasks used. This implies that holistic processing is necessary but not sufficient for optimal face recognition and new avenues for research into face recognition based on network models of autism and schizophrenia are proposed.
vision; face recognition; autism; schizophrenia; holistic coding; configurational coding
Considerable evidence suggests that qualitatively different processes are involved in the perception of faces and objects. According to a central hypothesis, the extraction of information about the spacing among face parts (e.g., eyes and mouth) is a primary function of face processing mechanisms that is dissociated from the extraction of information about the shape of these parts. Here, we used an individual-differences approach to test whether the shape of face parts and the spacing among them are indeed processed by dissociated mechanisms. To determine whether the pattern of findings that we reveal is unique for upright faces, we also presented similarly manipulated nonface stimuli. Subjects discriminated upright or inverted faces or houses that differed in parts or spacing. Only upright faces yielded a large positive correlation across subjects between performance on the spacing and part discrimination tasks. We found no such correlation for inverted faces or houses. Our findings suggest that face parts and spacing are processed by associated mechanisms, whereas the parts and spacing of nonface objects are processed by distinct mechanisms. These results may be consistent with the idea that faces are special, in that they are processed as nondecomposable wholes.
The face inversion effect has been used as a basis for claims about the specialization of face-related perceptual and neural processes. One of these claims is that the fusiform face area (FFA) is the site of face-specific feature-based and/or configural/holistic processes that are responsible for producing the face inversion effect. However, the studies on which these claims were based almost exclusively used stimulus manipulations of whole faces. Here, we tested inversion effects using single, discrete features and combinations of multiple discrete features, in addition to whole faces, using both behavioral and fMRI measurements. In agreement with previous studies, we found behavioral inversion effects with whole faces and no inversion effects with a single eye stimulus or the two eyes in combination. However, we also found behavioral inversion effects with feature combination stimuli that included features in the top and bottom halves (eyes-mouth and eyes-nose-mouth). Activation in the FFA showed an inversion effect for the whole-face stimulus only, which did not match the behavioral pattern. Instead, a pattern of activation consistent with the behavior was found in the bilateral inferior frontal gyrus, which is a component of the extended face-preferring network. The results appear inconsistent with claims that the FFA is the site of face-specific feature-based and/or configural/holistic processes that are responsible for producing the face inversion effect. They are more consistent with claims that the FFA shows a stimulus preference for whole upright faces.
Individuals with body dysmorphic disorder (BDD) are preoccupied with perceived defects in appearance. Preliminary evidence suggests abnormalities in global and local visual information processing. The objective of this study was to compare global and local processing in BDD subjects and healthy controls by testing the face inversion effect, in which inverted (upside-down) faces are recognized more slowly and less accurately relative to upright faces. Eighteen medication-free subjects with BDD and 17 matched, healthy controls performed a recognition task with sets of upright and inverted faces on a computer screen that were presented for either short duration (500 msec) or long duration (5000 msec). Response time and accuracy rates were analyzed using linear and logistic mixed effects models, respectively. Results indicated that the inversion effect for response time was smaller in BDD subjects than controls during the long duration stimuli, but was not significantly different during the short duration stimuli. The inversion effect on accuracy rates did not differ significantly between groups during either of the two durations. The smaller inversion effect in BDD subjects may be due to greater detail-oriented and piecemeal processing for long duration stimuli. Similar results between groups for short duration stimuli suggest that BDD subjects may engage configural and holistic processing normally for brief presentations. Abnormal visual information processing in BDD may contribute to distorted perception of appearance; this may not be limited to their own faces, but may extend to others’ faces as well.
body dysmorphic disorder; inverted faces; face inversion effect; face processing; global and local
How does the brain learn to recognize objects visually, and perform this difficult feat robustly in the face of many sources of ambiguity and variability? We present a computational model based on the biology of the relevant visual pathways that learns to reliably recognize 100 different object categories in the face of naturally occurring variability in location, rotation, size, and lighting. The model exhibits robustness to highly ambiguous, partially occluded inputs. Both the unified, biologically plausible learning mechanism and the robustness to occlusion derive from the role that recurrent connectivity and recurrent processing mechanisms play in the model. Furthermore, this interaction of recurrent connectivity and learning predicts that high-level visual representations should be shaped by error signals from nearby, associated brain areas over the course of visual learning. Consistent with this prediction, we show how semantic knowledge about object categories changes the nature of their learned visual representations, as well as how this representational shift supports the mapping between perceptual and conceptual knowledge. Altogether, these findings support the potential importance of ongoing recurrent processing throughout the brain’s visual system and suggest ways in which object recognition can be understood in terms of interactions within and between processes over time.
object recognition; computational model; recurrent processing; feedback; winners-take-all mechanism
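The claim that robustness to occlusion derives from recurrent connectivity can be illustrated with a minimal Hopfield-style sketch. This is not the paper's model: the network size, the stored patterns, and the update rule are illustrative stand-ins for the general principle that recurrent dynamics can complete a partially occluded input toward a stored representation.

```python
import numpy as np

# Three mutually orthogonal +/-1 patterns (rows of a 64x64 Hadamard
# matrix built by the Sylvester construction) serve as stored "objects".
H = np.array([[1]])
for _ in range(6):
    H = np.block([[H, H], [H, -H]])
patterns = H[1:4].astype(float)

# Hebbian recurrent weights over the stored patterns, no self-connections.
W = patterns.T @ patterns / 64.0
np.fill_diagonal(W, 0.0)

def settle(x, steps=5):
    """Iterate the recurrent dynamics: each pass pulls the state
    toward the nearest stored attractor."""
    for _ in range(steps):
        x = np.sign(W @ x)
    return x

occluded = patterns[0].copy()
occluded[:20] = 1.0                    # "occlude" roughly a third of the input
recovered = settle(occluded)
overlap = float((recovered * patterns[0]).mean())   # 1.0 = perfect recall
```

The occluded input differs from the stored pattern on ten units, yet the recurrent settling restores it exactly; feed-forward read-out alone would have no such error-correcting step.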