Of key relevance for the present discussion, the model thus provides a simple candidate mechanism for “holistic” processing in the form of a neural representation tuned to upright faces: While these model neurons respond well and selectively to particular upright faces, they respond less to and differentiate poorly between inverted faces and thus do not provide as good a basis to discriminate inverted faces, accounting for the behavioral FIE, as shown in (
Jiang, et al., 2006). Thus, this model in which upright face perception is mediated by neurons tuned to whole faces is far from “a toning down of the integrative views of face perception” ((
Rossion, 2008), p. 284): Rather, as we stated in 2004 (
Riesenhuber, et al., 2004), a representation composed of neurons tuned to upright faces provides a neural substrate for such a “holistic representation” argued for by the behavioral experiments. Specifically, illustrates how these model units tuned to upright faces show holistic tuning by replicating the classic experiment by Tanaka and Farah (
Tanaka & Farah, 1993), which found that subjects were much better at recognizing a particular face part (e.g., an eye region) in a complete face than when the same part was presented in isolation. Similarly, the left two bars in show that the activation of model face units changes much more between two whole-face images that differ only in the eye region than between the same two eye regions presented separately (i.e., without a whole-face context). Thus, our 2004 paper and the 2006 model anticipate Rossion’s statement in his recent paper that “faces are processed holistically because of their holistic representation in the visual system” ((
Rossion & Boremanse, 2008), p. 10), and agree even more closely with the idea of “a framework according to which holistic perception of faces is highly dependent on visual experience” (ibid., p. 10), as the tightly tuned face representation in the FFA (whose tight tuning has been suggested to be a result of extensive experience with faces (
Jiang, et al., 2006)) provides just such a framework. This experience-based model not only fits with recent developmental data (
Golarai, et al., 2007), but also offers straightforward explanations for other face-specific phenomena such as the other-race effect (see discussion in (
Jiang, et al., 2006)), which would be difficult to account for with a model in which experience served to, e.g., develop a general-purpose “configural module” for “expertise processing” or the like (given that faces in general do not differ appreciably in their facial “configurations” across race groups).
Note that the model (see white bars in ) predicts no inversion effect for eye regions presented as isolated parts but a strong inversion effect for the same eye regions embedded in a whole face context (gray bars), compatible with the experimental data (
Tanaka & Farah, 1993). Moreover, inversion strongly reduces holistic processing (compare the VTU activation distance in the “whole face” condition for upright vs. inverted faces in ), as face units tuned to upright faces respond only at low levels to inverted faces, their response to inverted faces being driven by afferent features less affected by image plane rotation (e.g., the putative “mouth” feature described above).
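The qualitative pattern described above can be sketched with a toy calculation. This is an illustrative sketch only, not the model of (Jiang, et al., 2006): the feature vectors, the tuning width, the per-feature rotation tolerances, and all function names are assumptions chosen for readability. Two face units with Gaussian tuning over a handful of hypothetical afferent features are probed with two faces that differ only in the eye region, either in a whole-face context or as isolated parts, and either upright or inverted.

```python
import math

SIGMA = 0.6  # tuning width of the face units (assumed value)

# Hypothetical afferent feature activations for two upright faces.
# Indices 0-1 stand for eye-region features, 2-5 for the rest of the face.
FACE_A = [1.0, 0.2, 0.8, 0.6, 0.9, 0.5]
FACE_B = [0.2, 1.0, 0.8, 0.6, 0.9, 0.5]  # differs from FACE_A only in the eyes
EYE_FEATURES = (0, 1)

# Assumed fraction of each feature's activation that survives inversion.
# Index 4 plays the role of a rotation-tolerant, "mouth"-like feature.
ROTATION_TOLERANCE = [0.3, 0.3, 0.3, 0.3, 1.0, 0.3]


def isolate_eyes(face):
    """Keep only the eye-region features (no whole-face context)."""
    return [a if i in EYE_FEATURES else 0.0 for i, a in enumerate(face)]


def invert(face):
    """Crude stand-in for image-plane inversion: scale each afferent."""
    return [t * a for t, a in zip(ROTATION_TOLERANCE, face)]


def face_unit_response(preferred, stimulus):
    """Gaussian tuning around the unit's preferred upright face."""
    d2 = sum((p - s) ** 2 for p, s in zip(preferred, stimulus))
    return math.exp(-d2 / (2 * SIGMA ** 2))


def population_response(stimulus):
    """Responses of two face units, one tuned to each upright face."""
    return [face_unit_response(pref, stimulus) for pref in (FACE_A, FACE_B)]


def activation_distance(stim_1, stim_2):
    """Euclidean distance between the two population response patterns."""
    r1, r2 = population_response(stim_1), population_response(stim_2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))


whole_upright = activation_distance(FACE_A, FACE_B)
part_upright = activation_distance(isolate_eyes(FACE_A), isolate_eyes(FACE_B))
whole_inverted = activation_distance(invert(FACE_A), invert(FACE_B))
part_inverted = activation_distance(
    invert(isolate_eyes(FACE_A)), invert(isolate_eyes(FACE_B))
)

for label, value in [
    ("whole, upright", whole_upright),
    ("part, upright", part_upright),
    ("whole, inverted", whole_inverted),
    ("part, inverted", part_inverted),
]:
    print(f"{label:16s} {value:.3f}")
```

With these assumed numbers, the activation distance is largest for upright whole faces, and inversion costs far more in the whole-face condition than for isolated parts. The toy is not tuned to reproduce the exact absence of an inversion effect for isolated parts; it only shows how units tuned to whole upright faces can yield the qualitative pattern.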
Interestingly, the model gives no special status to face “parts” or “configuration”: both are simply examples of shape changes that cause distributed and overlapping changes in the activation patterns at the intermediate feature level. The “simple-to-complex” account therefore predicts that comparable FIEs can be associated with “featural” as well as “configural” changes, as the shape-based neuronal face representation is well suited for upright but not inverted faces, irrespective of whether face pairs differ by features or configuration.
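A toy calculation can make this prediction concrete. It is a sketch under stated assumptions, not the published model: the afferent activations, the two change vectors, the tuning width, and the rotation tolerances are all hypothetical. A “featural” and a “configural” change are represented simply as two equally large, partly overlapping perturbations of the afferent activity pattern, so the face unit has no way of knowing which label applies.

```python
import math

SIGMA = 0.6  # assumed tuning width of the face unit

BASE_FACE = [0.8, 0.8, 0.8, 0.8]          # hypothetical afferent activations
FEATURAL_CHANGE = [0.3, 0.3, 0.0, 0.0]    # e.g., a replaced eye region
CONFIGURAL_CHANGE = [0.0, 0.3, 0.3, 0.0]  # e.g., displaced eyes; overlaps on feature 1


def apply_change(face, change):
    """Add a shape change to the afferent activity pattern."""
    return [a + d for a, d in zip(face, change)]


def invert(face, tolerance):
    """Crude stand-in for inversion: scale each afferent by its tolerance."""
    return [t * a for t, a in zip(tolerance, face)]


def face_unit_response(stimulus):
    """Gaussian tuning of a single face unit around the upright base face."""
    d2 = sum((p - s) ** 2 for p, s in zip(BASE_FACE, stimulus))
    return math.exp(-d2 / (2 * SIGMA ** 2))


def inversion_effect(change, tolerance):
    """Upright minus inverted response difference for base vs. changed face."""
    changed = apply_change(BASE_FACE, change)
    upright = abs(face_unit_response(BASE_FACE) - face_unit_response(changed))
    inverted = abs(
        face_unit_response(invert(BASE_FACE, tolerance))
        - face_unit_response(invert(changed, tolerance))
    )
    return upright - inverted


UNIFORM = [0.3, 0.3, 0.3, 0.3]  # all afferents equally disrupted by inversion
MIXED = [1.0, 0.3, 0.3, 0.3]    # feature 0 assumed rotation-tolerant

fie_featural_uniform = inversion_effect(FEATURAL_CHANGE, UNIFORM)
fie_configural_uniform = inversion_effect(CONFIGURAL_CHANGE, UNIFORM)
fie_featural_mixed = inversion_effect(FEATURAL_CHANGE, MIXED)
fie_configural_mixed = inversion_effect(CONFIGURAL_CHANGE, MIXED)

print(f"uniform tolerance: featural {fie_featural_uniform:.3f}, "
      f"configural {fie_configural_uniform:.3f}")
print(f"mixed tolerance:   featural {fie_featural_mixed:.3f}, "
      f"configural {fie_configural_mixed:.3f}")
```

When all afferents are equally affected by inversion, the two changes, matched for their upright effect, produce the same inversion effect in this toy. When one afferent is assumed to be rotation-tolerant, the two inversion effects come apart, and which one is larger depends entirely on the assumed numbers rather than on the “featural” or “configural” label.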
This was the main prediction of the 2004 paper (
Riesenhuber, et al., 2004). At the time, this was a somewhat unfashionable prediction. The prevailing view was that there was little, if any, inversion effect for “featural” changes, with the FIE being due to “configural processing” not being available for inverted faces. This widespread view was expressed, e.g., in a 2002 review paper (
Rossion & Gauthier, 2002): “First, the FIE for full faces seems to be entirely accounted for by the distinctive relational information present locally” (p. 64).
In youthful exuberance, however, we also wrote in (
Riesenhuber, et al., 2004): “If two modifications to the shape of a face – be they a result of changes in the ‘configuration’ or in the ‘features’ – influence discrimination performance to an equal degree for upright faces, they should also have an equal effect on the discrimination of inverted faces.” While this is likely true on average if changes are distributed over many features (as our data and those of others (
Yovel & Kanwisher, 2004) have shown), it is possible to come up with scenarios in which the FIE can differ even for equalized performance in the upright orientation, depending on the tolerance of afferent feature detectors to rotations in the image plane: Neurons tuned to some intermediate features might also respond well to inverted faces (e.g., the hypothetical two-horizontal-lines feature from above), whereas others would not (e.g., the horizontal-to-upper-left-of-vertical feature). Depending on which of these intermediate units provide input to particular face units, the responses of those face units can be more or less affected by specific transformations, be they changes in “features”, “configuration”, or image-plane orientation. Again, the FIE is not determined by how much a shape change affects a face’s “configuration”, but rather by how it changes the activity distribution over the neurons tuned to intermediate features providing input to the face neurons. Interestingly, in their recent paper (
Goffaux & Rossion, 2007), Goffaux and Rossion provide strong support for this model prediction, showing that replacing the eye region of a face (a bona fide “featural” change) causes an FIE that is as strong as (if not slightly stronger than) that caused by a horizontal displacement of the eyes (a bona fide “configural” change); see Fig. 3 in (
Rossion, 2008). As our (
Riesenhuber, et al., 2004), Yovel and Kanwisher’s (
Yovel & Kanwisher, 2004) and now also Goffaux and Rossion’s data (
Goffaux & Rossion, 2007) show, featural changes can be associated with substantial FIEs of comparable magnitude to those caused by “configural” changes, which argues strongly against “distinctive relational information” as the only contributor to an FIE and against a special status of “configural” information in producing the FIE. This has resulted in a welcome convergence of views. Nevertheless, despite this increasing agreement in spirit, (
Rossion, 2008) contained some specific criticisms of our 2004 paper. We will next address these criticisms in detail, and then show how the model of (
Jiang, et al., 2006) can provide a framework to discuss the recent results of (
Rossion & Boremanse, 2008).