Event-related potentials (ERPs) associated with face perception were recorded with scalp electrodes from normal volunteers. Subjects performed a visual target detection task in which they mentally counted the number of occurrences of pictorial stimuli from a designated category such as butterflies. In separate experiments, target stimuli were embedded within a series of other stimuli including unfamiliar human faces and isolated face components, inverted faces, distorted faces, animal faces, and other nonface stimuli. Human faces evoked a negative potential at 172 msec (N170), which was absent from the ERPs elicited by other animate and inanimate nonface stimuli. N170 was largest over the posterior temporal scalp and was larger over the right than the left hemisphere. N170 was delayed when faces were presented upside-down, but its amplitude did not change. When presented in isolation, eyes elicited an N170 that was significantly larger than that elicited by whole faces, while noses and lips elicited small negative ERPs about 50 msec later than N170. Distorted human faces, in which the locations of inner face components were altered, elicited an N170 similar in amplitude to that elicited by normal faces. However, faces of animals, human hands, cars, and items of furniture did not evoke N170. N170 may reflect the operation of a neural mechanism tuned to detect (as opposed to identify) human faces, similar to the “structural encoder” suggested by Bruce and Young (1986). A similar function has been proposed for the face-selective N200 ERP recorded from the middle fusiform and posterior inferior temporal gyri using subdural electrodes in humans (Allison, McCarthy, Nobre, Puce, & Belger, 1994c). However, the differential sensitivity of N170 to eyes in isolation suggests that N170 may reflect the activation of an eye-sensitive region of cortex.
The voltage distribution of N170 over the scalp is consistent with a neural generator located in the occipitotemporal sulcus lateral to the fusiform/inferior temporal region that generates N200.
Face recognition has been investigated extensively in humans and monkeys using behavioral (e.g., Bruce, 1988), neuroimaging (Haxby, Grady, Horwitz, Salerno, Ungerleider, Mishkin, & Schapiro, 1993; Puce, Allison, Gore, & McCarthy, 1995; Sergent, Ohta, & MacDonald, 1992), and electrophysiological methods (Allison, Ginter, McCarthy, Nobre, Puce, Luby, & Spencer, 1994a; Allison, McCarthy, Belger, Puce, Luby, Spencer, & Bentin, 1994b; Allison et al., 1994c; Desimone, 1991; Gross, Rodman, Gochin, & Colombo, 1993; Jeffreys, 1993; Kendrick & Baldwin, 1987; Perrett, Mistlin, Chitty, Smith, Potter, Broennimann, & Harries, 1988; Seeck & Grüsser, 1992). That specialized brain areas contribute to face recognition is suggested by occasional patients with posterior lesions who suffer a specific deficit in identifying familiar faces, a syndrome labeled prosopagnosia (Bodamer, 1947; see reviews by Damasio, Tranel, & Damasio, 1990; Whiteley & Warrington, 1977). Initial studies indicated that lesions within parietooccipital cortical regions were responsible (Benton & Van Allen, 1972), but more recent studies have demonstrated that prosopagnosia is prevalent following lesions in the inferior occipitotemporal region (Damasio, Damasio, & Van Hoesen, 1982; Meadows, 1974). Although frequently observed with bilateral damage (e.g., Damasio et al., 1982; Nardelli, Buonanno, Coccia, Fiaschi, Terzian, & Rizzuto, 1982), there is evidence that prosopagnosia may occur following unilateral damage to the right, but not the left, cerebral hemisphere (De Renzi, 1986; Landis, Cummings, Christen, Bogen, & Imhof, 1986; Michel, Poncet, & Signoret, 1989).
Additional information concerning the localization of neural mechanisms for face recognition has been provided by neuroimaging studies using positron emission tomography (PET). Haxby et al. (1993) found that face matching was associated with bilateral increases of blood flow in occipital-temporal cortex. To evaluate blood flow changes specific to faces, Haxby et al. (1993) compared face and location matching tasks and found that regions within the anterior and posterior fusiform gyrus showed the greatest differential activity for faces. These regions were activated bilaterally, but were larger in the right hemisphere. Sergent et al. (1992) compared two active face tasks (face identity and gender discrimination) and found significant differential activity bilaterally in the medial anterior temporal gyri, fusiform gyri, and temporal pole. Again, activation was somewhat larger on the right side. The right lingual and parahippocampal gyri also showed increased activity, as did the left middle temporal gyrus. Bilateral, but right predominant activation of the fusiform gyri was common to both Haxby et al. (1993) and Sergent et al. (1992). Recent studies using functional magnetic resonance imaging (fMRI) have also shown activation of the fusiform gyri to unfamiliar faces (Clark, Keil, Lalonde, Maisog, Courtney, Karni, Ungerleider, & Haxby, 1994; Puce et al., 1995).
One conclusion that can be drawn from the existing neuropsychological and neuroimaging literature is that face recognition utilizes a specialized neural subsystem for processing physiognomic information and relating the perceived input to prestored face representations. This subsystem appears to be localized in posterior temporal and inferior occipitotemporal regions, particularly in the right hemisphere (e.g., Kay & Levin, 1982; Overman & Doty, 1982). Such an interpretation is consistent with some cognitive models of face recognition (e.g., Bruce & Young, 1986). However, other investigators have questioned whether face recognition is performed by a specialized neural subsystem. Deficits in face recognition may result from a mild form of a more general visual agnosia that appears specific simply because faces are more complex than other objects (Gloning, Gloning, Jellinger, & Quatember, 1970). Alternatively, deficits in face recognition may reflect a general difficulty in discriminating within-category items, with preserved ability to discriminate between categories (Damasio et al., 1982). However, these interpretations are inconsistent with the double dissociation found among studies of those (relatively rare) patients who exhibit object agnosia with intact face recognition (e.g., McCarthy & Warrington, 1986), and patients who suffer exclusively from an inability to recognize familiar faces (De Renzi, 1986).
Data concerning the neural mechanisms of face recognition have also been obtained in electrophysiological studies in monkeys. Single unit recordings have revealed cells in the inferotemporal cortex that respond to monkey and human faces (Bruce, Desimone, & Gross, 1981; Desimone, Albright, Gross, & Bruce, 1984; Young & Yamane, 1992) and to face components (Perrett, Rolls, & Caan, 1982), but not to other complex stimuli such as snakes, spiders, or food (Baylis, Rolls, & Leonard, 1985; Desimone et al., 1984; Rolls & Baylis, 1986; Saito, Yukie, Tanaka, Hikosaka, Fukada, & Iwai, 1986). This pattern supports the existence of a neural circuit specialized for face recognition that includes the inferior temporal gyrus and the banks of the superior temporal sulcus (Baylis, Rolls, & Leonard, 1987). Face-specific cells are highly sensitive to the natural appearance of the stimuli; line drawings or schematic representations of faces elicit only weak responses (Bruce et al., 1981; Perrett et al., 1982). Although most face-specific cells respond to rotated or inverted faces (Bruce, 1982; Hasselmo, Rolls, Baylis, & Nalwa, 1989; Overman & Doty, 1982), those responses are weaker and longer in latency compared to those evoked by upright faces (Perrett et al., 1988). Some units respond differentially depending upon the angle at which the face is viewed (e.g., Desimone et al., 1984). Some cells in inferotemporal cortex respond selectively to face components such as eyes, mouth, or hair (Perrett, Mistlin, & Chitty, 1987), but none responded to pictures of faces in which these components were spatially rearranged (Desimone et al., 1984; Perrett et al., 1988). Other authors reported that small spatial distortions in the distances between the eyes, or between the eyes and mouth, reduced the overall probability of a cell’s response (Yamane, Kaji, & Kawano, 1988).
Recording brain electrical activity elicited by faces in humans may identify the neuroanatomical organization and the functional properties of the putative face recognition subsystem. In a recent study Allison et al. (1994a) recorded evoked field potentials directly from the surface of the occipitotemporal cortex and found that faces evoked a negative component with a mean latency of 192 msec (N200). Face-specific N200s were recorded from discrete regions that were not activated by other complex stimuli. However, nearby regions were selectively activated by other stimulus categories, such as letterstrings or colored checkerboards (Allison et al., 1994c). These results suggest a considerable degree of functional specialization within the ventral visual pathway.
In the present studies, we have recorded event-related potentials (ERPs) from scalp electrodes in normal volunteer subjects. ERPs recorded from electrodes over the lateral posterior scalp were elicited by faces and some face components, but not by other complex stimuli. The sensitivity of these ERPs to manipulations of face stimuli provides additional information regarding the functional properties of human neuronal subsystems related to face recognition.
Experiment 1 was conducted to determine whether face-specific ERPs could be recorded from scalp electrodes using the task of Allison et al. (1994a) in which face-specific ERPs were recorded from subdural electrodes. Subjects were presented with five categories of visual stimuli (faces, scrambled faces, cars, scrambled cars, and butterflies) and asked to mentally count the number of occurrences of a specified target category (butterflies). ERPs were averaged separately for each category.
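The per-category averaging step just described can be sketched in a few lines. The epoch representation below, a list of (category, samples) pairs, is a hypothetical simplification for illustration, not the authors' actual recording pipeline.

```python
# Hypothetical sketch of per-category ERP averaging (illustration only,
# not the authors' code). Each epoch is one trial's voltage samples (µV)
# from a single electrode, tagged with its stimulus category.

def average_erps(epochs):
    """Average epochs separately for each stimulus category.

    epochs: iterable of (category, samples) pairs; all samples lists
    must have equal length. Returns {category: averaged waveform}.
    """
    sums, counts = {}, {}
    for category, samples in epochs:
        if category not in sums:
            sums[category] = [0.0] * len(samples)
            counts[category] = 0
        for i, v in enumerate(samples):
            sums[category][i] += v
        counts[category] += 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}

# Example: two "face" trials average pointwise; the single "car" trial
# is returned unchanged.
erps = average_erps([("face", [1.0, -2.0]),
                     ("face", [3.0, -4.0]),
                     ("car",  [0.5, 0.5])])
```

Averaging each category separately in this way is what allows a component such as N170 to be compared across stimulus conditions.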
Figure 1A presents grand-averaged ERPs elicited by faces (solid line) and scrambled faces (dashed line) from a 14-electrode montage. The ERPs in Figure 1 are arranged to approximate the electrode locations on the scalp. Of particular interest is the large negative ERP with a peak latency of 172 msec (N170) recorded from T5 and T6, which was largest for faces and smaller for the equally luminant scrambled faces. An earlier negative ERP (N100) was recorded from Oz (Fig. 1A), which was larger for scrambled than unscrambled stimuli [F(3,27) = 2.99, MSe = 6.54, p < 0.05]. A positive ERP at longer latency (P190) was evoked by faces at frontocentral scalp locations (Fig. 1A, Cz). ERPs elicited by faces and by scrambled faces also differed in the 250–500 msec latency range; at frontal locations (e.g., Fz) a positivity evoked by faces was not seen for scrambled faces, and at temporal locations (e.g., T6) scrambled faces evoked a larger positivity than did faces. The target category butterflies elicited a large P300, which was maximal over the posterior scalp (Pz) at 490 msec (Fig. 1C). In this paper we will focus upon N170, the earliest ERP related to face processing.
Figure 1B compares N170 amplitudes elicited by all nontarget stimulus categories. An ANOVA showed that the amplitude of N170 at T5 (left hemisphere) and T6 (right hemisphere) was significantly larger for faces (−3.55 µV) than scrambled faces (0.20 µV), cars (−0.22 µV), and scrambled cars (1.38 µV) [F(1,9) = 11.43, MSe = 30.37, p < 0.0001], with no significant differences among these latter three categories. The target category butterflies also did not evoke an appreciable N170 (not shown). N170 for faces was larger (i.e., more negative) over the right (−4.07 µV) than the left hemisphere (−3.02 µV), but this difference failed to reach statistical significance (p = 0.3).
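Peak measurements like those reported above can be sketched as the most negative sample within a fixed post-stimulus window at a given electrode. The 130–210 msec window and the 4 msec sampling period below are illustrative assumptions, not parameters taken from the study.

```python
# Hypothetical sketch of N170 peak measurement (illustration only).
# waveform: averaged voltages (µV) sampled every sample_ms milliseconds,
# with index 0 at stimulus onset.

def negative_peak(waveform, sample_ms, window=(130, 210)):
    """Return (latency_ms, amplitude_uv) of the most negative sample
    inside the given post-stimulus window (ms). Window edges are
    truncated to sample boundaries."""
    i0 = int(window[0] / sample_ms)
    i1 = int(window[1] / sample_ms) + 1
    segment = waveform[i0:i1]
    amplitude = min(segment)
    latency = (i0 + segment.index(amplitude)) * sample_ms
    return latency, amplitude

# Example: a synthetic waveform with a -3.6 µV trough at 172 msec,
# sampled at 250 Hz (one sample every 4 msec).
wave = [0.0] * 100
wave[43] = -3.6
peak = negative_peak(wave, 4)
```

Amplitudes measured this way, one value per subject, electrode, and category, are the inputs to the ANOVAs reported in the text.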
Experiment 1 demonstrated that ERPs differentially sensitive to face stimuli can be recorded with scalp electrodes. Faces elicited a negative ERP, N170, which was distributed focally over the lateral posterior scalp. N170 was not elicited by stimuli of another complex stimulus category, cars, which shared some face-like characteristics by virtue of their grilles and headlamps, nor by butterflies, an animate category. This pattern of results for N170 is similar to the subdural recordings of Allison et al. (1994a) in which faces (but not cars, butterflies, or scrambled stimuli) elicited an N200 from the inferior surface of the temporal lobe. The relationship of the N170 recorded in the present study to the N200 recorded by Allison et al. (1994a) will be considered in the General Discussion. Scrambled stimuli did not elicit N170, although scrambling increased the amplitude of the occipital N100. Allison et al. (1994c) reported that scrambled stimuli elicited larger ERPs from peristriate cortex, presumably due to their many high-contrast edges. N170 was present at both T5 (left hemisphere) and T6 (right hemisphere), and was (nonsignificantly) larger in amplitude over the right hemisphere.
A positive ERP (P190) was recorded from the fronto-central scalp. P190 was also face-specific, and is similar to the vertex-positive potential recorded in previous ERP studies of face perception (e.g., Jeffreys, 1993; Seeck & Grüsser, 1992). The relationship between N170 and P190 is unclear. It is possible that N170 and P190 reflect a dipolar pattern due to a single neural generator, but latency differences and the reported sensitivity of the vertex-positive potential to animal faces and other complex stimuli (Jeffreys & Tukmachi, 1992) suggest that they may reflect different neural activity. In this and the following experiments, target stimuli generated a robust P300 (reviewed by Donchin & Coles, 1988). P300 was not elicited by faces or other nontarget stimuli, demonstrating that the subjects were correctly performing the target detection task.
All of the faces in the present study were unfamiliar to the subjects and their physiognomic features were irrelevant to the task. It is unlikely that subjects were engaged in an extensive process of face recognition, suggesting that the neural activity associated with N170 was activated automatically, perhaps reflecting mandatory processing of facial information. Such a mechanism might provide the neural basis for the “structural encoding” stage of face processing suggested by Bruce and Young (1986). Experiment 2 sought to examine further this hypothesis.
Experiment 1 demonstrated that human faces evoke N170. In Experiment 2 we sought to determine whether N170 is specific for faces per se, or could be evoked by any familiar body part such as hands. The tuning of N170 for human faces was also tested by comparing N170s elicited by human and animal faces.
In this experiment, cars were designated as targets and subjects were required to keep a mental count of the number presented. Nontarget categories included human faces, animal faces, and human hands. All animal faces had distinct eyes. Faces of nonhuman primates were excluded because of their similarity to human faces. Individual items of furniture were also presented to further test the apparent lack of responsiveness of N170 to inanimate stimulus categories.
As in Experiment 1, human faces elicited a robust N170 (at 176 msec) that was slightly larger at T6 than T5 (Fig. 2). The amplitude at T6 of the N170 elicited by faces was −2.18 µV, significantly more negative than the negative-going ERPs elicited by animal faces, human hands, and furniture (0.30, 0.11, and 0.71 µV, respectively). Repeated-measures ANOVA showed that these amplitude differences were statistically significant [F(4,44) = 13.7, MSe = 7.3, p < 0.0001]. Post-hoc comparisons revealed that while the N170 elicited by faces was significantly larger than that of all other stimulus categories, animal faces, human hands, and furniture did not significantly differ among themselves. The small variation in the peak latency of N170 across stimulus conditions was not significant [F(4,44) = 1.9, MSe = 136.3, p > 0.12].
Finally, it is interesting to note that despite the large difference in N170 amplitude between human faces and animal faces, at longer latencies (300–600 msec) the ERPs elicited by both types of faces were similar compared to ERPs elicited by nonface stimuli (Fig. 2).
The present experiment demonstrated that the N170 elicited by human faces was significantly larger than that elicited by animal faces and human hands. In the N170 latency range, the negative-going ERPs elicited by animal faces, hands, and furniture were statistically indistinguishable. The specificity of N170 to human faces and its insensitivity to human hands suggests that it reflects the activity of cells tuned to detect human faces and/or face components rather than being a general detector of information about body parts.
Normal subjects and prosopagnosic patients are worse at recognizing inverted faces than upright faces (reviewed by Benton & Van Allen, 1972; Valentine, 1988). If N170 reflects activity associated with face recognition, it should therefore be affected by face inversion. However, if N170 reflects activity in a neural circuit tuned to detect facial features prior to face recognition, it may not be sensitive to face inversion.
A 28-electrode montage was used in this experiment to provide a more complete description of the voltage distribution of N170 and related ERPs. As in Experiment 1, subjects were instructed to mentally count the number of target butterflies. Faces and cars were presented in both upright and inverted positions.
The ERPs elicited by upright faces in the present experiment were similar to those observed for faces in Experiments 1 and 2. Figure 3A–D presents the distribution of voltage in color-coded topographic maps for selected latencies for the ERPs elicited by upright faces. The distribution at 88 msec corresponds to N100 (Fig. 3A). The distribution at 128 msec (Fig. 3B) corresponds to the positive peak recorded from the lateral posterior scalp that preceded N170. A broadly distributed frontal midline negative region was also evident at the same latency. The distribution at 172 msec (Fig. 3C) corresponds to the N170 elicited by faces in Experiments 1 and 2. A focal negative region over the posterior lateral scalp was present bilaterally, but was largest at T6. This hemispheric asymmetry was significant (p < 0.05). The positive focus over the anterior midline corresponded to the onset of P190. The distribution at 230 msec (Fig. 3D) corresponds to a positive peak in the ERPs recorded from the posterior lateral scalp. The distribution at 230 msec was similar to that at 128 msec (Fig. 3B) except that the positive voltage extended more posteriorly at this later latency.
The voltage distributions for the other stimulus categories of Experiment 3 were virtually identical prior to 170 msec. The focal negative distribution maximal at T6 was obtained only for upright faces and inverted faces. The N170 elicited by inverted faces was distributed similarly to that elicited by upright faces and was significantly more negative at T6 (−3.18 and −3.93 µV, for upright faces and inverted faces, respectively) than at T5 (−2.11 and −2.2 µV, respectively) [F(1,11) = 5.32, MSe = 18.2, p < 0.05]. At T6, inverted faces elicited a slightly larger N170 than upright faces, while at T5 the N170 elicited by faces and inverted faces was equal in amplitude (Fig. 4). At both T5 and T6, the N170 elicited by inverted faces peaked 10 msec later than that elicited by upright faces. The latency difference was statistically significant [F(1,11) = 9.00, MSe = 147, p < 0.012]. Cars and inverted cars elicited identical ERPs with no N170.
Both upright faces and inverted faces elicited an N170 while cars, inverted cars, and butterflies did not. The scalp voltage distribution for all nontarget stimuli was similar up until the latency of N170 for faces and inverted faces. At this point a focal negative region was observed bilaterally over the lateral posterior scalp, which was significantly larger in amplitude over the right hemisphere. A positive region over the frontocentral midline region corresponding to P190 made this distribution appear dipolar as noted in the discussion of Experiment 1.
The overall similarity between the N170 elicited by faces and inverted faces suggested that N170 was not related to an attempt to recognize particular faces. It is instead congruent with a view that the neural mechanism generating N170 is involved in the structural analysis of visual stimuli leading to the categorization of a pictorial stimulus as “face.” N170 to inverted faces was delayed relative to faces. A delayed N200 for inverted relative to upright faces was also recorded subdurally in one patient by Allison et al. (1994b). In that patient, however, the difference was found only in the right hemisphere. Allison et al. (1994b) also found an amplitude difference in the right posterior fusiform gyrus where the amplitude of N200 elicited by inverted faces was smaller than that elicited by upright faces. We did not find that the N170 elicited by inverted faces was smaller than that elicited by faces; indeed, the N170 to inverted faces was somewhat larger in amplitude.
The similarity of the N170s elicited by inverted and upright faces raises the question of N170’s sensitivity to the integrity of facial components. Experiment 4 examined this question by comparing the N170 elicited by full-face stimuli with those elicited by isolated face components.
As in Experiments 1 and 3, subjects kept a mental count of the number of butterflies appearing among non-target categories including full human faces and isolated eyes, lips, and noses. We compared the ERPs elicited by faces to the ERPs elicited by each of the face components.
The ERPs elicited by faces were similar in shape and scalp distribution to those elicited in the previous experiments. Again N170 was elicited by faces with a focal distribution at posterior-lateral sites (T5 and T6). As evident in Figure 5, the peak latency of N170 elicited by eyes (186 msec) was later than that elicited by faces (173 msec), and its amplitude was larger both at T5 (−4.2 vs. −2.8 µV) and T6 (−5.9 vs. −3.6 µV). The negative ERPs elicited by lips and noses were considerably later (215 and 210 msec, respectively) and were smaller (−3.1 and −2.0 µV, respectively) than the N170s elicited by faces and eyes. Repeated-measures ANOVA showed that both the latency and the amplitude differences were significant [F(3,33) = 26.43, MSe = 293, p < 0.0001, and F(3,33) = 8.30, MSe = 17.3, p < 0.0001, for latencies and amplitudes, respectively]. Post-hoc comparisons revealed that the N170 latencies elicited at T6 by faces and eyes were not significantly different, but both were significantly shorter than the latencies of the negative ERPs elicited by lips and noses. Eyes elicited a significantly larger N170 at T6 than did faces, lips, or noses; faces elicited a significantly larger N170 than did noses or lips.
Figure 6A–D presents voltage distributions corresponding to the peak latencies of the negative ERPs elicited by faces, eyes, lips, and noses, respectively. The distribution for faces (Fig. 6A) is shown for 172 msec and is similar to that obtained for faces in Experiment 3 (Fig. 3C) at the same latency. The difference between the amplitude of N170 at T5 (−2.8 µV) and T6 (−3.6 µV) was statistically significant (p < 0.05). The distribution for eyes (Fig. 6B) at 188 msec was similarly asymmetric (p = 0.02). The scalp distributions for lips (212 msec) and noses (232 msec) were different from those obtained for faces and eyes, but were similar to each other. For both lips and noses, the negative focus at T6 was less extensive and the positive midline focus was centered more posteriorly at Pz. The amplitude asymmetry between T6 and T5 was not statistically significant for either lips or noses.
The pattern of responses for faces and face components supports the conclusion of Experiment 2 that N170 reflects activity in a neural mechanism involved in the early detection of structural features characterizing human faces. The delayed and attenuated ERPs to lips and noses relative to the vigorous response to eyes is consistent with previous studies of the relative salience of facial features (reviewed by Shepherd, Davies, & Ellis, 1981). Most of these studies were concerned with the recognition of particular faces and showed that the facial outline was the most important feature followed in decreasing order of importance by the eyes, mouth, and nose (Davies, Ellis, & Shepherd, 1977; Fraser & Parker, 1986; Haig, 1986). The present results complement those findings by suggesting that eyes are the most representative facial feature even when the response does not include the recognition of particular faces (cf. Bruce, 1988).
The scalp distributions for faces and eyes were similar to each other, but different from those elicited by lips and noses. This may indicate similarly located and oriented neural generators for the N170s elicited by whole faces and isolated eyes, whereas the scalp distributions for lips and noses indicate a different configuration of neural generators.
The significantly larger N170 evoked by isolated eyes compared to whole faces argues against the notion that face integrity is the critical factor in the appearance of N170. Indeed, Experiment 4 raises the possibility that the neural mechanism generating N170 may be specific to eyes—whether present in isolation or in face context. If this is so, it is possible that the presence of additional face components in the same display modulates the response to the eyes. Experiment 5 was designed to examine these questions further by presenting eyes in a distorted face context.
This experiment addressed the question as to whether a normal face context is necessary to elicit N170. Face components were presented as in Experiment 4, but distorted faces, created by dislocating inner face components (Fig. 12), were substituted for normal faces. If N170 requires integrity of face components for its appearance, then dislocating those components should diminish its amplitude. If, however, N170 primarily reflects the detection of eyes, then the presence of eyes among the dislocated components should be sufficient to elicit N170.
Our finding in Experiment 4 that isolated eyes evoked a larger N170 than normal faces (which also contained eyes) indicates that if N170 primarily reflects eye-specific processing, then the context in which eyes are presented is important. Perhaps the presence of normally configured face features diminishes the activity of the eye detector through priming or a related process. If so, distorting the normal configuration of face features may eliminate this priming effect and the resulting N170 may be equal in amplitude to that elicited by isolated eyes.
ERPs elicited by each stimulus category are shown in Figure 7. The N170 amplitudes for distorted faces (−2.9 µV at T6, −2.0 µV at T5) and eyes (−3.2 µV at T6, −2.3 µV at T5) were significantly larger than those for lips (−0.70 µV at T6, −1.0 µV at T5) and noses (−1.3 µV at T6, −1.0 µV at T5) [F(3,33) = 7.78, MSe = 10.7, p < 0.001]. While the N170s elicited by eyes were slightly larger and later than those elicited by distorted faces, post-hoc comparisons revealed no significant difference in either measure. For both categories, N170 amplitudes were greater at T6 than T5, but this difference also did not reach statistical significance.
Faces in which the inner components were dislocated elicited a robust N170. It seems, therefore, that N170 is not dependent upon the spatial integrity of facial components, as would be predicted for a holistic face-processing mechanism. This result underscores our earlier deduction that N170 is not related to face recognition per se, but to the detection of facial features. Eyes elicited a larger N170 than distorted faces, but unlike in Experiment 4, this difference was not statistically reliable. As predicted, this may indicate that the response to eyes presented within a distorted face was less influenced by the other face components than when presented within a normally configured face. This conclusion is tempered by the overall smaller N170 amplitudes obtained in the present experiment than those obtained in Experiment 4, including the N170 amplitude to the identical category of isolated eyes. However, as the pattern of N170 amplitudes within experiments has been consistent despite variation in absolute amplitude across experiments, we believe that between-subject variability is the likely cause of these differences.
The present study was designed to examine, by non-invasive scalp recordings in normal subjects, some of the functional characteristics of the face-specific perceptual mechanism suggested by subdural recordings in patients. In a series of experiments we found a posterior-lateral N170, which, like the subdural N200, was elicited by human faces but not by animal faces, cars, scrambled faces, scrambled cars, items of furniture, or human hands.
N170 was larger over the right than over the left hemisphere in all experiments, although this difference did not always reach statistical significance. When tested across all subjects, N170 at T6 (−3.23 µV) was significantly larger than at T5 (−2.41 µV) [t(47) = 2.845, p < 0.01]. The peak latency at T6 (173 msec) was similar to that at T5 (171 msec) [t(47) = 1.028, p > 0.30]. N170 was as large for inverted as for upright faces. An even larger N170, similarly distributed over the scalp, was elicited by isolated eyes. In contrast to eyes, the negative ERPs elicited by isolated lips and noses were significantly smaller and delayed. Although still largest in amplitude over the posterior-lateral scalp, the ERP distribution for isolated lips and noses suggested a different configuration of neural generators than for faces and eyes.
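The hemisphere comparison above is a paired t-test across subjects. A minimal sketch follows; the per-subject amplitudes in the example are invented for illustration and are not the study's data.

```python
# Hypothetical paired t-test sketch (illustration only; the amplitude
# values below are invented, not data from this study).
import math

def paired_t(x, y):
    """t statistic for paired samples x and y (df = len(x) - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the pairwise differences (n - 1 denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Example: invented per-subject N170 amplitudes (µV) at T6 and T5.
# A negative t means T6 is the more negative (i.e., larger) N170.
t6 = [-4.0, -3.1, -2.5, -3.8]
t5 = [-2.9, -2.6, -2.2, -2.7]
t = paired_t(t6, t5)
```

Because each subject contributes an amplitude at both electrodes, pairing the samples removes between-subject amplitude variability from the comparison.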
In all experiments, stimuli were presented while subjects monitored the screen for the appearance of a target unrelated to faces. These attended targets did not evoke an appreciable N170. Since there were no task-related differences among the nontarget stimulus categories, any specific response to faces or to face components was likely due to a neural mechanism tuned to detect human physiognomic features automatically. The existence of such a mechanism is suggested by data showing that newborn infants look at human faces more than any other complex visual stimulus (Goren, Sarty, & Wu, 1975).
Behavioral studies of human face processing have focused on the identification of particular faces either in regard to their semantic representation, as in studies in which famous faces had to be distinguished from unfamiliar faces (e.g., Bruce, 1979; Bruce & Valentine, 1985, 1986; Valentine & Bruce, 1986a, 1986b) or in regard to recently formed representations in episodic memory (e.g., Shapiro & Penrod, 1986). Similarly, ERP studies of face recognition have investigated the electrophysiological correlates of face familiarity and unfamiliarity (Barrett, Rugg, & Perrett, 1988; Begleiter, Porjesz, & Wang, 1995; Hautecœur, Debruyne, Forzy, Gallois, Hache, & Dereux, 1993; Renault, Signoret, Debruille, Breton, & Bolgert, 1989; Small, 1986; Smith & Halgren, 1987) and face priming and recognition (Barrett & Rugg, 1989; Hertz, Porjesz, Begleiter, & Cholian, 1994; Schweinberger & Sommer, 1991). Less attention has been directed to the investigation of basic mechanisms for face detection and categorization among other complex visual stimuli (for reviews see Bruce, 1990a,b). Perhaps one reason for this emphasis is that even patients with severe prosopagnosia can distinguish faces from other visual stimuli (Farah, 1990). Nevertheless, an influential model of face perception assumes the existence of a “structural encoding” stage at which physiognomic information is detected and initially processed independently of recognition of personal identity or facial expression (Bruce & Young, 1986). It is possible that this function is carried out by a dedicated neural mechanism; the present results suggest that N170 might reflect part of its operation.
Many studies have demonstrated that recognition of faces is impaired by inversion more than recognition of other visual stimuli (reviewed by Valentine, 1988). In light of these findings, the similarity in amplitude and spatial distribution between the N170s elicited by upright and inverted faces suggests that the function of the structural encoder is to detect and initially encode face structural information, without being directly involved in the process of face recognition. Moreover, the similarity in latency, amplitude, and spatial distribution among the N170s elicited by faces, inverted faces, distorted faces, and isolated eyes suggests that the underlying neural generators may be fully activated by salient face characteristics. This pattern suggests that the structural encoder is sensitive primarily to the presence of these characteristic features rather than to their orientation or their interrelationship in space. Other findings suggest that spatial-configural processing may be less useful when faces are inverted (Sergent, 1984; Young, Hellawell, & Hay, 1987), yet the general similarity of the N170 for upright and inverted faces supports the view that at this processing stage both stimulus types are processed similarly (see also Valentine, 1988; Valentine & Bruce, 1988).
The relationship between the scalp-recorded N170 and the subdurally recorded N200 (Allison et al., 1994a, 1994b) is uncertain. The approximately 20 msec longer latency of N200 in patients with implanted electrodes could be attributed to differences between neurologically normal subjects and epileptic patients (e.g., anticonvulsant medications). However, the fusiform and inferior temporal region from which N200 is recorded is inferior to the T5 and T6 electrodes (from which N170 is best recorded), which are usually positioned over the middle temporal gyrus (Homan, Herman, & Purdy, 1987). It is therefore unlikely that neurons oriented so as to produce a cortical surface negativity over the ventral brain surface could simultaneously produce a negative ERP over the more superior scalp. Recent functional magnetic resonance imaging studies of face perception (Clark et al., 1994; Puce et al., 1995) demonstrated that faces often activate cortex within the occipitotemporal sulcus, which separates the fusiform and inferior temporal gyri. This deep sulcus is oriented obliquely, and neural generators within the bank of the sulcus would be oriented toward the middle temporal scalp. These considerations suggest that the subdural N200 is due to radial generators located mainly in the fusiform gyrus, whereas the scalp N170 is due to oblique generators located mainly within the adjacent occipitotemporal sulcus. Inverse generator models (e.g., Probst, Plendl, Paulus, Wist, & Scherg, 1993) will be required to determine the locations and orientations of the neural generators of N170 and to test this assumption quantitatively.
The proposed differences in location of the N170 and N200 generators, and differences in their responsivity to faces and face components, suggest an extension of the “structural encoding” model reviewed above. Specifically, these results suggest that a portion of cortex within the occipitotemporal sulcus is not simply a lateral portion of the fusiform face-specific processor, but instead constitutes a separate eye-specific processor. The hypothesis is summarized schematically in Figure 8. In this model, N200 recorded from the fusiform/inferior temporal region may reflect the operation of the “structural encoding” stage of face processing, whereas N170 may reflect the operation of a putative eye processor whose function may be to analyze the direction of gaze and other information conveyed by the eyes. It seems unlikely that a large population of cells responsive to the eyes (judging by the amplitude of N170 to eyes alone) is involved only in face identification, particularly since the eyes are less important in this task than are other facial features such as contour, hair line, and hair style (Haig, 1984, 1986; Shepherd et al., 1981). In monkeys, cells in the superior temporal sulcus are sensitive to the eyes and to the direction of gaze (Perrett, Hietanen, Oram, & Benson, 1992; Perrett et al., 1988). Perrett et al. (1992) note that cells sensitive to face identity may be more frequent in inferior temporal cortex, whereas cells sensitive to gaze direction are more frequent in the superior temporal sulcus. It is plausible that a similar spatial differentiation of cell types exists in the human temporal lobe. Indeed, in subdural recordings, locations lateral to those from which N200 is recorded often respond better to eyes alone than to the entire face (Allison et al., 1994c). Whether this lateral region extends into the occipitotemporal sulcus or onto the lateral surface of the temporal lobe is currently unknown.
If N170 primarily reflects eye-specific neural activity, then the context in which the eyes appear modulates that activity. Isolated eyes evoked a significantly larger N170 than a normal full face (Experiment 4) but not than a distorted face (Experiment 5). Inverted faces evoked a larger N170 than upright faces, although this difference was not significant (Experiment 3). This pattern suggests that the presence of other face features in an appropriate spatial arrangement diminishes the N170 evoked by the eyes contained within the face, and that increasing feature dislocation or isolation increases N170 amplitude. This electrophysiological result is reminiscent of behavioral results demonstrating that processing of a target embedded within a face is impeded, suggesting that a gestalt such as a face is first processed holistically and inhibits lower-level feature processing (e.g., Mermelstein, Banks, & Prinzmetal, 1979; Suzuki & Cavanagh, 1995). This modulation may reflect priming of an eye detector by a holistic face processor, or inhibitory interactions among face features. The lack of N170 to animal faces (Experiment 2) is problematic because the animal faces contained distinct eyes. It may be that N170 is tuned to process human eyes, perhaps by detecting the contrast between the iris and white sclera. Alternatively, the configuration of other animal face features may inhibit this putative eye processor. None of the present experiments was designed explicitly to test the eye-processor hypothesis. Such experiments are now underway and may help to resolve these issues.
Volunteer subjects were recruited from among the undergraduate and graduate students of the Hebrew University. Twelve subjects participated in each experiment, each of which lasted approximately 2 hr. Subjects were tested singly and none participated in more than one experiment.
Original stimuli were photographs that were digitally scanned, processed by graphics software, and presented as gray-scale images on a computer monitor. In each experiment, lists of 330 images were presented in three consecutive equal blocks. Each image was exposed for 250 msec with an interval of 1500 msec between the onsets of successive images. Stimulus timing was controlled by the MEL software system (Psychology Software Tools, Pittsburgh, PA).
Subjects were engaged in a target detection task in which stimuli from one stimulus category were designated as targets. The subjects were required to keep a mental count of the number of targets presented in each run and to report that count at the end. Targets were presented on approximately 11% of trials.
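The trial-sequencing logic described above (330 images per list, roughly 11% targets, remaining trials split among the nontarget categories, randomized order) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the original presentation was controlled by the MEL system, and the function name and even-split rule here are assumptions.

```python
import random

def build_trial_list(categories, n_trials=330, target_cat="butterflies",
                     target_prop=0.11, seed=0):
    """Sketch of a randomized trial sequence: ~11% targets, with the
    remaining trials divided as evenly as possible among the
    nontarget categories (hypothetical helper, not the MEL script)."""
    rng = random.Random(seed)
    n_targets = round(n_trials * target_prop)           # ~36 of 330 trials
    nontargets = [c for c in categories if c != target_cat]
    per_cat, rem = divmod(n_trials - n_targets, len(nontargets))
    trials = [target_cat] * n_targets
    for i, cat in enumerate(nontargets):
        trials += [cat] * (per_cat + (1 if i < rem else 0))
    rng.shuffle(trials)                                 # randomized order
    return trials

cats = ["faces", "scrambled faces", "cars", "scrambled cars", "butterflies"]
seq = build_trial_list(cats)
```

With a 250 msec exposure and a 1500 msec stimulus-onset asynchrony (from the Methods above), each 110-trial block would last about 2.75 min.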
EEG was recorded simultaneously from 14 (Experiment 1) or 28 scalp locations using tin electrodes attached to electrode caps (Electro-Cap International, Ohio). Additional channels recorded the electrooculogram (EOG) from the supraorbital ridge and outer canthus of the left eye. All electrodes were referenced to an electrode placed on the tip of the nose. A ground electrode was placed on the forehead. The EEG was amplified by battery-operated amplifiers with a gain of 20,000 through a bandpass of 0.01–100 Hz. Electrode impedances were below 5 kΩ.
EEG epochs were acquired beginning 100 msec prior to stimulus onset and continuing for 1024 msec. These epochs were digitized at a rate of 250 Hz (4 msec/sample/channel) and stored on disk for off-line averaging. Codes synchronized to stimulus delivery were used to selectively average epochs associated with different stimulus types. During this averaging procedure, epochs contaminated with EOG artifacts were eliminated using root mean square EOG values as criteria.
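The selective-averaging procedure just described can be sketched as below: epochs are cut from −100 to +1024 msec around onset at 250 Hz, matched to a stimulus code, screened by EOG root-mean-square amplitude, and averaged. The paper does not report the RMS rejection criterion, so the 50 µV threshold and the function signature here are illustrative assumptions.

```python
import numpy as np

FS = 250                 # sampling rate, Hz (4 msec/sample)
PRE, POST = 0.1, 1.024   # epoch window: 100 msec before to 1024 msec after onset

def selective_average(eeg, eog, onsets, codes, code, rms_criterion=50.0):
    """Average the epochs for one stimulus code, rejecting epochs whose
    EOG RMS exceeds a criterion (threshold value is an assumption).
    eeg: (n_channels, n_samples); eog: (n_samples,); onsets in samples."""
    pre, post = int(PRE * FS), int(POST * FS)      # 25 and 256 samples
    kept = []
    for onset, c in zip(onsets, codes):
        if c != code:
            continue                               # not this stimulus type
        seg = eeg[:, onset - pre:onset + post]
        eog_seg = eog[onset - pre:onset + post]
        if np.sqrt(np.mean(eog_seg ** 2)) < rms_criterion:  # artifact check
            kept.append(seg)
    return np.mean(kept, axis=0) if kept else None
```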
The mean and peak amplitudes and peak latencies of particular ERP components were obtained within bounding intervals by computer program for each stimulus condition for each subject. Significant differences among these measures were tested using repeated-measures analysis of variance (ANOVA). Post-hoc comparisons were evaluated using Tukey’s test (Tukey A). In addition, across-subjects (grand-averaged) mean ERPs were computed and used to make comparison plots and topographic maps. The topographic maps represent the ERP voltage distribution flattened onto two dimensions of space with the amplitude at any time represented by color coding. Voltages between electrode locations were interpolated using a spherical spline technique.
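The component measurement step (mean amplitude, peak amplitude, and peak latency within a bounding interval) can be sketched for a negative component such as N170. The 130–200 msec window and the function name are illustrative assumptions; the paper does not specify its bounding intervals here.

```python
import numpy as np

def peak_measures(erp, fs=250, baseline_samples=25, window_ms=(130, 200)):
    """Peak (most negative) amplitude and latency of a negative component
    within a bounding interval, plus the mean amplitude over the interval.
    Latency is returned relative to stimulus onset; `baseline_samples`
    is the 100 msec prestimulus portion of the epoch at 250 Hz."""
    lo = baseline_samples + int(window_ms[0] * fs / 1000)
    hi = baseline_samples + int(window_ms[1] * fs / 1000)
    seg = erp[lo:hi]
    i = int(np.argmin(seg))                        # most negative sample
    peak_amp = float(seg[i])
    peak_lat_ms = (lo + i - baseline_samples) * 1000.0 / fs
    mean_amp = float(np.mean(seg))
    return peak_amp, peak_lat_ms, mean_amp
```

These single-subject measures would then enter the repeated-measures ANOVA described above, with stimulus category as a within-subject factor.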
Five categories of stimuli used by Allison et al. (1994a) were presented (Fig. 9): (1) faces, (2) scrambled faces, (3) cars, (4) scrambled cars, and (5) butterflies. Equal numbers of male and female faces were scanned from a college yearbook. None had eyeglasses or jewelry, and all males were clean shaven. Front views of cars were scanned from automotive magazines. Images of cars and faces were scrambled by digitally cutting the images into rectangular sections and then rearranging the sections until the identity of the resulting image could not be discerned. This procedure retained the luminance of the original image, but the random juxtaposition of rectangular edges tended to increase local spatial contrast. Butterflies, the target category, were scanned from a field guide. The order of presentation of images drawn from the different stimulus categories was randomized and no stimulus was repeated.
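The scrambling procedure can be sketched as follows for a gray-scale image held as a 2-D array: the image is cut into rectangular sections, which are then randomly rearranged. Because every pixel is retained, mean luminance is preserved exactly, while the new tile borders introduce the local contrast edges noted above. The grid size and function name are assumptions for illustration; the original sections need not have formed a regular grid.

```python
import random
import numpy as np

def scramble_image(img, block=4, seed=0):
    """Scramble a gray-scale image (2-D array) by cutting it into a
    block x block grid of rectangular sections and randomly rearranging
    them. Mean luminance is preserved; grid size is an assumption."""
    h, w = img.shape
    bh, bw = h // block, w // block
    tiles = [img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
             for r in range(block) for c in range(block)]
    random.Random(seed).shuffle(tiles)             # rearrange sections
    rows = [np.hstack(tiles[r * block:(r + 1) * block]) for r in range(block)]
    return np.vstack(rows)
```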
Five categories of stimuli were presented (Fig. 10): (1) human faces, (2) animal faces, (3) human hands, (4) furniture, and (5) cars. The human faces were the same as in the previous experiments. Animal faces were front views scanned from books and magazines. Animals without front-set eyes and nonhuman primates were excluded. In this study, cars were presented on 11% of the trials and comprised the target category. All other categories were presented with equal frequency.
Five categories of stimuli were presented: (1) faces, (2) inverted faces, (3) cars, (4) inverted cars, and (5) butterflies. Stimuli were identical to those used in Experiment 1 except that the categories of scrambled faces and scrambled cars were replaced by inverted (by 180°) representations of the same faces and cars used in categories 1 and 3. Butterflies were the target category. Stimulus proportions were as in Experiment 1.
Five categories of stimuli were presented (Fig. 11): (1) faces, (2) eyes, (3) mouths, (4) noses, and (5) butterflies. Faces and butterflies were as in the previous experiments. Face components (eyes, mouths, and noses) were cropped from the images used in the faces category and were centered within a gray enclosing rectangle identical in size to the rectangle enclosing the complete faces. All categories were presented in equal proportion except for the target butterflies.
The only difference between the stimuli used in Experiment 5 and those used in Experiment 4 was that a distorted faces category substituted for the face category. Distorted faces were constructed using computer graphics to displace and dislocate inner facial components of the regular faces used in the other experiments. The displacement of inner components was done without changing their vertical orientation and without affecting the face contour (Fig. 12).
The subjects in this experiment were 12 undergraduates who did not participate in the previous experiments. Their task and all other methodological details were identical to those used in Experiment 4.
This work was supported by the United States-Israel Binational Science Foundation, the Department of Veterans Affairs, and by NIMH Grant MH-05286.
Shlomo Bentin, Hebrew University, Israel.
Truett Allison, West Haven VA Medical Center and Yale University School of Medicine.
Aina Puce, West Haven VA Medical Center and Yale University School of Medicine.
Erik Perez, Hebrew University, Israel.
Gregory McCarthy, West Haven VA Medical Center and Yale University School of Medicine.