Visual area V4 is a midtier cortical area in the ventral visual pathway. It is crucial for visual object recognition and has been a focus of many studies on visual attention. However, there is no unifying view of V4’s role in visual processing. Neither is there an understanding of how its role in feature processing interfaces with its role in visual attention. This review captures our current knowledge of V4, largely derived from electrophysiological and imaging studies in the macaque monkey. Based on the recent discovery of functionally specific domains in V4, we propose that the unifying function of V4 circuitry is to enable selective extraction of specific functional domain-based networks, whether it be by bottom-up specification of object features or by top-down attentionally driven selection.
Area V4 is a midtier visual cortical area in the ventral visual pathway. Although believed to be important in object recognition, V4’s functional role remains elusive. This is in part due to the complexity of V4. V4 comprises cells that exhibit diverse receptive field preferences related to surface properties (color, brightness, texture), shape (orientation, curvature), motion and motion contrast, and depth. V4 has also been a focus of studies on visual attention. It is widely acknowledged that visual attention enhances neuronal firing to relevant stimuli in V4 and suppresses responses to distractor stimuli. However, the relationship between V4’s contribution to attentional processes and its role in object recognition processing is unclear. What is the unifying role of V4 across these multiple feature response dimensions? What ties feature and attention together in V4? This review attempts to bring the reader up to date on our current understanding of V4. We propose that a primary role of V4 is to facilitate figure-ground segmentation of the visual scene. We further propose that the same network that mediates figure-ground computation also enables attentional filtering. We suggest that the unifying function of V4 circuitry is to enable “selective extraction,” whether it be by bottom-up feature-specified shape or by attentionally driven spatial or feature-defined selection. As the bulk of knowledge regarding V4 derives from electrophysiological and functional magnetic resonance imaging (fMRI) studies in the macaque monkey, the emphasis of this review will be on monkey studies. However, where appropriate, reference to human studies is made.
In the macaque monkey, V4 is located on the prelunate gyrus and in the depths of the lunate and superior temporal sulci and extends to the surface of the temporal-occipital gyrus (Figure 1A). V4 contains representations of both the superior (ventral V4) and the inferior (dorsal V4) visual fields (Gattass et al., 1988). Recent retinotopic mapping (Figure 1B) of this region using fMRI has provided evidence that it is bounded posteriorly by V3 and anteriorly by dorsal and ventral V4A. While gross retinotopy in V4 is well understood, some important aspects of its organization are still debated. These issues include the location of V4 borders (see Stepniewska et al., 2005 for review), whether it is one area or more, and whether it comprises multiple functional maps. Physiologically guided injections of tracer into central and peripheral locations in V4 reveal that only central V4 receives direct input from V1 (Zeki, 1969; Nakamura et al., 1993; Yukie and Iwai, 1985). Central V4 also exhibits strong connections with temporal areas such as TE and TEO, suggesting that it plays an important role in object recognition. Peripheral V4 shows strong connections with dorsal stream areas such as DP, VIP, LIP, PIP, and MST (Baizer et al., 1991; Ungerleider et al., 2008), suggesting that V4 plays a role in spatial vision and spatial attention.
Neurons in V4 have diverse response preferences. Originally, V4 was characterized as a color area by Zeki (1973, 1983) based on the predominance of color-selective receptive fields recorded. However, subsequent studies also found prominent orientation selectivity among V4 cells, suggesting a role in the processing of shape information (Van Essen and Zeki, 1978; Schein et al., 1982; Mountcastle et al., 1987). As will be seen in the next section, the diversity of response properties (which include selectivity for color, orientation, depth, and motion) has led to competing notions of the function of V4. It is our hope that this review will offer insights that help make these differing views of V4 compatible.
Lesions of V4 lead to specific deficits in pattern recognition. Monkeys with V4 lesions are moderately impaired in a variety of simple 2D-shape detection and discrimination tasks. However, the V4 lesion literature is somewhat mixed on this issue, perhaps due to differences in the mediolateral extent of the lesions (Heywood and Cowey, 1987; Walsh et al., 1992; Merigan, 2000; Walsh et al., 2000; Girard et al., 2002; Schiller, 1993). In contrast, V4 lesions produce striking deficits in more complex perceptual tasks. For example, V4 lesions lead to loss of ability to discriminate images of 3D objects (Merigan and Pham, 1998), loss of color constancy (Walsh et al., 1993), and deficits in the ability to select relatively less salient objects from an array, or to generalize across different stimulus configurations (Schiller, 1993; De Weerd et al., 1996, 1999).
A very large number of neurophysiological experiments on V4 have focused on attention. In fact it would not be an exaggeration to say that much of our understanding of the neural mechanisms mediating attention has been informed by neurophysiological studies in monkey V4. Note that, while under natural behavioral conditions primates foveate objects of attention, most neurophysiological studies have been conducted in extrafoveal regions of V4 in monkeys performing covert attention tasks (e.g., attending to nonfoveal stimuli while maintaining fixation on a central location).
Development of fMRI over the past 15 years has dramatically advanced our understanding of human V4 and indicates that, to a large extent, human V4 is organizationally and functionally analogous to macaque V4. The retinotopic organization of area V4 and nearby visual areas appears similar in humans (Sereno et al., 1995; Hansen et al., 2007) and macaques (Fize et al., 2003; Gattass et al., 1988). That is, humans appear to possess an inferior field representation of V4 dorsally and a superior field representation ventrally (Hansen et al., 2007). However, some others report a complete hemifield representation within ventral human V4 and conclude that no dorsal V4 exists in humans (Wade et al., 2002; Winawer et al., 2010; Goddard et al., 2011). Beyond retinotopy, many fMRI studies of V4 are broadly consistent with what would be expected based on neurophysiological studies in monkey V4. However, this comparison is difficult to make because interpretation of fMRI results in terms of the underlying neural mechanisms is problematic (Buxton et al., 2004; Logothetis and Wandell, 2004). In any case, from a comparative evolutionary viewpoint, it is likely that many commonalities exist between monkey V4 and human V4, but there may also be specializations in the human that are not present in the monkey.
There have long been suggestions that V4 contains functional compartments. The original evidence for this idea comes from anatomical studies in which retrograde tracer injections in V4 labeled either predominantly thin stripes (associated with color) or pale stripes (associated with form) in area V2 and did not label thick stripes (associated with depth) (DeYoe et al., 1994). Furthermore, tracer injections in inferotemporal areas (PITv and PITd) result in interdigitated segregated label in V4 (DeYoe et al., 1994), indicating some degree of continued functional streaming in the ventral pathway. Intrinsic connections within V4 also have a patchy appearance, with clusters that measure 200–300 μm in size (Yoshioka et al., 1992). Electrophysiological studies have also reported functional clustering of color-selective neurons in V4 (Zeki, 1973; Conway et al., 2007; Conway and Tsao, 2009; Harada et al., 2009). However, in contrast to classic electrophysiological studies in V1 (Hubel and Wiesel, 1977), V2 (Hubel and Livingstone, 1987; Roe and Ts’o, 1995), and MT (DeAngelis and Newsome, 1999), efforts to map V4 with dense grids of electrophysiological penetrations have failed to reveal clear functional organization (cf. Youakim et al., 2001).
Recent advances in fMRI and optical imaging methods have provided new information about functional organization of V4. These studies show that V4 in monkeys is not a homogeneous visual area. fMRI studies in alert macaque monkeys reveal color-selective functional domains in several regions of the temporal lobe (including V4, posterior inferior temporal (PIT) cortex, and TE) (Figure 2A, Conway et al., 2007; Conway and Tsao, 2006; Harada et al., 2009). To draw an analogy with the color blobs of V1, these regions have been dubbed “globs” (Figures 2B and 2C) and the nonglob regions “interglobs.” The imaging results are supported by single-unit recordings showing that glob cells are spatially clustered by color preference and may be arranged in “chromotopic” maps (Figure 2D, Conway and Tsao, 2009). Glob cells are narrowly tuned for hue, tolerant to changes in luminance, and less orientation-selective than are interglob cells (Conway et al., 2007). The identification of such globs suggests that V4 is not a homogeneous area and may comprise a collection of modules. It also highlights the need to further investigate the functional organization of V4 and the adjoining brain regions, and to elucidate their relationship with retinotopic definitions of V4, PIT, and TEO.
The first optical imaging study of V4 in anesthetized monkeys revealed orientation-selective domains that were a few hundred microns in size (Ghose and Ts’o, 1997). More recent studies using isoluminant color and achromatic gratings revealed clear functional domains with preference for surface properties (color and luminance, Figures 3B and 3D) and for shape (contour orientation; Figures 3C and 3E) in foveal regions of V4 (Tanigawa et al., 2010). These feature preference domains are segregated within V4 (Figure 3F, pixels in B and D coded pink, pixels in C and E coded green) and measure ~500 μm or less. Within color preference domains are maps for hue (Tanigawa et al., 2010), akin to the hue maps found in V2 (Xiao et al., 2003). No direct comparison between the optical imaging and fMRI studies has yet been made. However, it seems possible that the color/luminance and orientation regions identified with optical imaging correspond to some of the globs and interglobs found in V4 with fMRI. The finding that V4 contains functional domains reconciles a long-standing debate concerning the color-coding properties of V4: studies claiming that all of V4 was a color area were likely biased toward microelectrode recordings from color modules, while those reporting clear shape selectivity were likely biased toward recordings from noncolor modules (Conway and Tsao, 2006; Tanigawa et al., 2010).
How does the presence of segregated functional domains impact topography? Topographic representation in V4 appears to employ the same strategy present in other areas where there are segregated functional domains. In V1, where there are segregated left and right eye ocular dominance columns, the visual map is repeated, once for the left eye and once for the right eye (Hubel and Wiesel, 1977). In V2, where there are three functional stripe types (thin, pale, and thick, Hubel and Livingstone, 1987), the visual field is represented three times, once each for representation of color, form, and depth (Roe and Ts’o, 1995; Shipp and Zeki, 2002a, 2002b). This type of repeated representation results in an interdigitation of different feature maps. A complete visual field representation in one feature modality is achieved by a collection of discontinuous functional domains (e.g., a complete right eye visual field is achieved by coalescing all right eye ocular dominance columns in V1; a complete color visual field is achieved by coalescing all thin stripes in V2). Topographic representation within V4 is similar (Tanigawa et al., 2010; Conway et al., 2007). In the iso-eccentric axis, color maps and orientation maps are interdigitated within V4, and continuity of feature-specific representation is achieved across bands of similar functional preference (Figure 4A). Iso-polar representations map in the orthogonal axis along the color and orientation bands (Figure 4B). In sum, it appears that, at least in foveal regions of V4, the mapping strategy parallels that observed in earlier visual areas.
In this section, we summarize the current knowledge about object feature selectivities in V4. In doing so, we hope to underscore certain aspects of V4 processing that may guide our understanding of what makes V4, V4. That is, we ask, given the diversity of response types in V4, what common transformation(s) underlie these various computations and make them part of this single area? This line of questioning has been successfully used to examine transformations that occur in other visual areas. For example, by identifying transformations across different submodalities of color (thin stripes), contour (thick/pale), depth (thick), and motion (thick), the functional transformations unique to V2 were characterized (Roe, 2003; Roe et al., 2009; Lu et al., 2010). An important viewpoint that emerged is that functional organization matters. That is, specific clustering of neurons provides insight into the functional computations that are emphasized (or more readily made) in a particular cortical area. Indeed, it is possible that such functional organizational signatures, unique to each area, may, if sufficiently understood, be used to characterize the fundamental computations performed by each cortical area. We will return to this notion at the end of this review.
Color vision begins with the L, M, and S cones in the retina. The cone names derive from their peak wavelength sensitivities (approximately 562 nm, 535 nm, and 440 nm, respectively). The cone classes do not correspond to our perception of “red,” “green,” and “blue”; rather, our perception of color requires multiple stages of L, M, S input integration (Chatterjee and Callaway, 2003; Gegenfurtner and Kiper, 2003; Solomon and Lennie, 2007; Conway et al., 2010). An important early stage is the generation of color-opponency: red-green neurons detect differences in L and M cone inputs, blue-yellow neurons compare S and L+M inputs, and light-dark neurons sum L and M cone inputs. These comparisons form the two cardinal color axes and the orthogonal luminance axis, and are represented by discrete classes of neurons in the lateral geniculate nucleus (Derrington et al., 1984). Within V1, color opponency is further elaborated and is dominated by cells with responsiveness along the blue-yellow and red-green axes (Dow and Gouras, 1973; Livingstone and Hubel, 1984; Ts’o and Gilbert, 1988; Lennie et al., 1990; Hanazawa et al., 2000; Conway, 2001; Conway and Livingstone, 2006; Xiao et al., 2007). While V1 plays an important role in generating color, it does not contain a representation corresponding to perception (e.g., perception of hues, color constancy, Brouwer and Heeger, 2009; Parkes et al., 2009). It is not until V2 that the first evidence for hue maps (i.e., red, orange, yellow, green, blue, purple, etc.) arises; these hue maps are found in V2 thin stripes (Xiao et al., 2003). An important open question concerns the mechanisms that transform the cone signals into neurons that code hue, and whether the color-tuned neurons in V4 inherit their color preferences or compute them within V4 (Conway, 2009).
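For readers who prefer a compact summary, the three opponent comparisons above correspond to the standard cardinal-axis formulation (a textbook abstraction, not a formula taken from the studies cited), in which L, M, and S denote the cone signals:

\[ \text{red--green} \propto L - M, \qquad \text{blue--yellow} \propto S - (L + M), \qquad \text{light--dark} \propto L + M. \]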
Both color and achromatic brightness (light-dark) are important stimulus features that define object surfaces. Brightness perception is subject to many of the same types of contextual influences as color perception (e.g., filling in, Krauskopf, 1963; contextual effects such as lightness constancy and color constancy effects, MacEvoy and Paradiso, 2001; edge-induced percepts such as the Cornsweet brightness illusion, Roe et al., 2005, and the watercolor illusion, Pinna et al., 2001). Evidence from human functional imaging (Engel and Furmanski, 2001) and from electrophysiological studies in monkeys (Livingstone and Hubel, 1984; Roe and Ts’o, 1995) suggests that, at the level of V1, color and brightness are largely encoded independently. Little is known about brightness representation in V4.
In his early studies, Zeki reported that the bulk of recorded cells in macaque V4 were color selective and that cells with similar color preferences were clustered in columnar fashion. Subsequent studies have also found a significant clustering of color cells in V4: some clusters contain neurons with similar color preferences and others with diverse color preferences (Kotake et al., 2009). This is consistent with the color domains revealed by functional imaging and may suggest further diversity of color domains. Earlier studies reported that V4 neurons were in fact no more selective for color than were neurons in other visual areas and that they were also selective for other stimulus dimensions (Schein et al., 1982; Tanaka et al., 1986) (although there may be a greater degree of multipeaked chromatic tuning curves than in V1 or V2) (Hanazawa et al., 2000). However, note that differences may only emerge under stimulus conditions that elicit differences (e.g., color constancy, see below). Despite the diversity of response types in V4, the association of V4 with color processing remains and is even used as a strong criterion for identifying V4 response in human studies.
An important aspect of V4 is its role in color constancy. Color constancy refers to the perception that objects maintain a constant color despite the fact that under different illumination conditions the wavelength composition of light reflected from the object changes significantly. In other words, a red apple remains red whether it is in shadow or under sunlight. Although some aspects of color and brightness constancy (Figure 5A) may arise as early as V1 or V2 (MacEvoy and Paradiso, 2001; Conway, 2001; Moutoussis and Zeki, 2002; Shapley and Hawken, 2002), the effects become quite prominent in V4 (Zeki, 1983). Using Mondrian-type stimuli (see Figure 5A, left), Kusunoki and colleagues (2006) found that changing the illumination of the background shifted the tuning of V4 neurons; cells shifted their color-tuning function in the direction of the chromatic change of the illuminant, matching the psychophysical percept (also Schein and Desimone, 1990). In contrast, colored surrounds do not change the chromatic tuning of cells in V1 or V2 (Solomon et al., 2004). Consistent with this finding, lesions of V4 leave color discrimination intact but lead to loss of color constancy perception (in monkeys: Heywood et al., 1992; Walsh et al., 1993; in man: Vaina, 1994). These findings give rise to the speculation that the color domains identified in V4 by functional imaging could be color-constant domains and not hue-selective domains per se, something which remains to be examined. Another open question concerns the degree to which the color modules within V4 are functionally independent of other feature domains. Psychophysically color constancy is dependent on scene segmentation; that is, color constancy is maintained within a depth plane but less so across depth planes (Werner, 2006). This suggests the presence of significant interactions between representations of different feature modalities (e.g., color and depth) in V4.
A recent finding points to the distinction between objects defined by high-contrast achromatic borders and equiluminant color-contrast borders. Bushnell et al. (2011b) report that roughly a quarter of cells in V4 exhibit their greatest response when shapes are presented at equiluminance with the background, and that their responses decrease with increasing figure-ground luminance contrast. This response type, which has not been observed in either V1 or V2, suggests that chromatically defined boundaries and shapes are a defining feature of V4 and further strengthens the role of V4 in color processing. It also introduces the concept that there may be two distinct form pathways, one for high-contrast-defined form and another for color-defined form.
There is evidence from humans that favors the existence of an extrastriate “color area.” Stroke patients with particular circumscribed lesions of the ventral cortex acquire a deficit of color vision (achromatopsia) yet retain the ability to perceive shape, motion, and depth. Imaging studies of healthy human brains show localization of extrastriate color responses to a region on the ventral surface of the brain (although whether this area is within V4 proper or is an area anterior to V4 remains debated) (Barbur and Spang, 2008; Bartels and Zeki, 2000; Hadjikhani et al., 1998; Mullen et al., 2007; Wade et al., 2008). Note that the correspondence between monkey V4, the proposed human “color area,” and human cerebral achromatopsia remains in question (cf. Cowey and Heywood, 1997). Importantly, pattern analysis of fMRI responses to colored gratings in humans has shown that the spatial distribution of responses within this region covaries with perceived color, a result that is not found for other visual areas such as V1 (Brouwer and Heeger, 2009). Moreover, microstimulation of this region in humans elicits a color percept (Murphey et al., 2008). To the extent that color is considered a surface property, activation in V4 also appears to correlate with surface perception (Bouvier et al., 2008).
Thus, in the larger debate over whether there are cortical areas specialized for processing color information, the weight of the evidence suggests that V4 performs a transformation that is unique and central to color perception. Such an important stage is also distinct from higher areas in inferotemporal cortex where functions such as color categorization occur (Koida and Komatsu, 2007) and where color and other object features are combined to generate recognition of objects.
A number of studies have demonstrated that V4 neurons are at least as selective for shape as they are for color. Similar to earlier processing stages, V4 cells are tuned for orientation and spatial frequency of edges and linear sinusoidal gratings (Desimone and Schein, 1987). However, V4 neurons are also sensitive to more complex shape properties (Kobatake and Tanaka, 1994).
One advanced shape property represented in V4 is curvature. Curvature, which can be considered an integration of oriented line segments, is a prominent feature of object boundaries. V4 cells (receptive fields typically 2–10 deg in size) can be strongly selective for curvature of contours (Pasupathy and Connor, 1999, 2001) as well as curved (i.e., non-Cartesian) gratings (Gallant et al., 1993, 1996). Interestingly, a similar curvature-based coding strategy appears to be used at intermediate levels of the somatosensory system (Yau et al., 2009). One proposal suggests that curvature tuning in V4 helps provide an efficient way to encode shape. In fact, recordings from V4 neurons reveal that not all curvatures are equally represented: there is a stronger representation of acute curvatures across the neural population (Carlson et al., 2011) (Figure 5B, right). In visual scenes, acute curvatures are statistically relatively rare but highly diagnostic, so, quite distinct from V1 where all local contour segments are faithfully represented, the V4 bias can be characterized as a sparse, discriminative representation of object shape (Carlson et al., 2011).
Another important aspect of shape coding that emerges in V4 is the transition from retinotopic coordinates to object-centered coordinates. Several lines of evidence suggest that V4 cells are very sensitive to the relative position of texture and contour features within the receptive field, rather than the absolute position of those features. For example, the relative responses of a V4 neuron to a variety of non-Cartesian grating patterns remain constant as those patterns are shifted across the receptive field (Gallant et al., 1996). V4 cells are extremely sensitive to the position of contour fragments within objects. For example, a given V4 cell may respond to convex contour fragments near the top of a shape but not near the bottom (Pasupathy and Connor, 2001). This sensitivity to relative position may be related to the observation that V4 neurons encode information about the position of stimuli relative to the center of attention (Connor et al., 1996, 1997). Tuning for relative position appears to extend across larger regions of retinotopic space at subsequent stages of processing in inferotemporal cortex (Brincat and Connor, 2004; Yamane et al., 2008). Representation of relative position is critical for any structural shape coding scheme, and current evidence suggests that V4 cells carry sufficient contour shape and relative position information for reconstruction of moderately complex shape boundaries at the population level (Pasupathy and Connor, 2002).
Until relatively recently most of the work on area V4 came from studies using animal models, particularly the macaque monkey. Human studies relied on the rare observation of neurological patients with circumscribed lesions to V4 (Rizzo et al., 1992; Gallant et al., 2000; Merigan et al., 1997). These lesion studies showed that lesions to human V4 have effects similar to V4 lesions in nonhuman primates (De Weerd et al., 1996, 2003; Merigan, 1996, 2000; Merigan and Pham, 1998; Schiller, 1995) and that human V4 lesions affect curvature discrimination (Gallant et al., 2000). More recent fMRI studies suggest that area V4 in humans is activated preferentially by concentric and radial gratings (Wilkinson et al., 2000) and textures (Dumoulin and Hess, 2007).
Computational models have been used to predict object shape from activity of neuronal populations in V4. Responses of V4 neurons have been characterized within a stimulus subspace, such as contour curvature; the aim is to use responses to a specific subset of (basis) curves to read out contour curvature from a population of V4 neuronal responses (Pasupathy and Connor, 2002). Unfortunately, no current neuronal model of V4 provides good predictions of responses to natural images (David et al., 2006). Voxel-based models of V4 developed using fMRI also provide poor predictions of responses to natural scenes, though they perform as well as neuronal models in both earlier and later areas (Naselaris et al., 2009).
There are several possible reasons why current computational models of V4 perform poorly. It could be that V4 represents complex aspects of shape that cannot be captured by the second-order nonlinearities assumed in current models (David et al., 2006). Preliminary reports suggest that this may be true for at least a subset of V4 neurons (J.L.G. and C.E.C., unpublished data). Another possibility is that V4 represents aspects of shape that are more complex than current mathematical models allow. For example, if V4 neurons represent the three-dimensional structure of occluded surfaces then there would be no way to represent this aspect of selectivity using current computational models (Lee et al., 2001).
We perceive depth in visual scenes by detecting small positional differences between corresponding visual features in the left eye and right eye images. This difference is called binocular disparity and permits binocular depth perception, or stereopsis. Disparity-selective response is initially established in V1 (Poggio and Fischer, 1977), where single neurons exhibit sensitivity to a narrow range of depths (measured by the width of disparity tuning curves). In V2, disparity-selective neurons are found throughout the thin, pale, and thick stripes, but are most prevalent in the thick stripes (Livingstone and Hubel, 1988; Peterhans and von der Heydt, 1993; Roe and Ts’o, 1995; Ts’o et al., 2001). The association with thick stripes in V2 is reinforced by the presence of functional maps for near-to-far depth in thick stripes (Chen et al., 2008). Since V2 provides inputs to both the dorsal (via thick stripes, DeYoe and Van Essen, 1985; Shipp and Zeki, 1985, 1989; Nascimento-Silva et al., 2003) and ventral (via thin and pale stripes, DeYoe and Van Essen, 1985; Nakamura et al., 1993; Nascimento-Silva et al., 2003) pathways, this raises the issue of how these disparity signals are used differentially in the two pathways.
Although binocular disparity has traditionally been considered a dorsal pathway function (e.g., Livingstone and Hubel, 1988; Sakata et al., 1997; Gonzalez and Perez, 1998), recent physiological studies are overturning this long-standing belief. Indeed, V4 cells exhibit selectivity for binocular disparity (Hinkle and Connor, 2001, 2005; Watanabe et al., 2002; Tanabe et al., 2004, 2005; Hegdé and Van Essen, 2005a), disparity-defined shape in random-dot stereograms (Hegdé and Van Essen, 2005b), and 3D orientation of bars (Hinkle and Connor, 2002). As shown by studies in both monkeys and humans, these response characteristics are consistent with the use of disparity cues in the ventral pathway for object recognition (fine stereopsis involving higher spatial frequencies, retinal disparities < 0.5 deg, stationary or slowly moving objects), and are distinct from those in the dorsal pathway for vision related to motion, self-motion, and visually guided behavior (coarse stereopsis involving lower spatial frequencies, larger retinal disparities of 0.5–10 deg, and moving targets) (Neri et al., 2004; Parker, 2007; Preston et al., 2008). Further confirming V4’s role in fine depth perception, microstimulation in V4 biases behavioral judgment of fine depth (Shiozaki et al., 2012), whereas microstimulation of MT biases behavioral judgment of coarse but not fine depth (Uka and DeAngelis, 2006). Consistent with these results, V4 and IT neurons show trial-by-trial response variation correlated with fine depth judgment (Uka et al., 2005; Shiozaki et al., 2012), while MT neuron responses correlate with coarse depth judgment (Uka and DeAngelis, 2004).
To calculate binocular disparity, how does the visual system find the appropriate matching between left and right eye images? A very useful tool for investigating this “binocular correspondence problem” is the random dot stereogram (RDS, Figure 5C, left), a stimulus in which 3D structure is perceived only with appropriate matching of dots in left and right eyes (Julesz, 1972). The degree of spatial shift of dots between the left and right eyes determines the depth plane perceived. To probe at what stage in the visual system binocular correspondence is computed, a control (anticorrelated RDS, aRDS) was designed in which the matching dots were reversed in contrast (e.g., black in left eye to white in right eye, called “false matches”); this contrast reversal prevents binocular correspondence and results in the lack of a surface-in-depth percept (Julesz, 1972; Tanabe et al., 2008; Doi et al., 2011). Neurophysiological studies revealed that cells in V1 exhibit similar responsiveness to RDS and aRDS stimuli (Ohzawa et al., 1990; Qian and Zhu, 1997; Cumming and Parker, 1997). In contrast, V4 cells substantially reduce their selectivity to disparities in aRDSs, suggesting that the false matching responses elicited in V1 are largely rejected by the stage of V4 (Tanabe et al., 2004; Kumano et al., 2008). Although there is no evidence from single-unit studies of any difference in binocular correspondence between V1 and V2 (Okazaki and I.F., unpublished data), Chen et al. (2008) reported that in V2 thick stripes, near-to-far maps are imaged in response to RDSs, but not aRDSs, suggesting that V2 also plays an important role in rejecting false matches.
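To make the RDS/aRDS construction concrete, the following is a minimal sketch in Python/NumPy under illustrative assumptions (the image size, dot density, and disparity values are arbitrary choices, not parameters from the studies cited):

```python
import numpy as np

def make_rds(size=200, square=80, disparity=4, density=0.5,
             anticorrelated=False, seed=0):
    """Generate left/right-eye images of a random-dot stereogram (RDS).

    A central square of dots is shifted horizontally by `disparity` pixels in
    the right-eye image; with correct binocular matching this square is
    perceived as a surface floating in depth. If `anticorrelated` is True, the
    dot contrasts in the right eye are reversed (black <-> white), creating an
    aRDS: the correct matches become "false matches" and the depth percept is
    abolished.
    """
    rng = np.random.default_rng(seed)
    dots = (rng.random((size, size)) < density).astype(float)  # 1 = white dot
    left, right = dots.copy(), dots.copy()

    # Place the central square, shifted horizontally, into the right-eye image.
    r0 = (size - square) // 2
    patch = dots[r0:r0 + square, r0:r0 + square]
    right[r0:r0 + square, r0 + disparity:r0 + square + disparity] = patch

    if anticorrelated:
        right = 1.0 - right  # contrast reversal -> anticorrelated RDS (aRDS)
    return left, right

left, right = make_rds()                          # correlated RDS: square in depth
left_a, right_a = make_rds(anticorrelated=True)   # aRDS: no coherent depth percept
```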
Disparity cues can be used to calculate absolute distance from the observer. However, a more important function is the determination of distance relative to a background or another object (Westheimer, 1979; Erkelens and Collewijn, 1985; Regan et al., 1986). This requires calculation of relative disparity. Whereas cells in V1 encode local absolute disparity within their receptive field (Cumming and Parker, 2000), the computation of relative disparity begins in V2 (Cumming and Parker, 1999; Thomas et al., 2002). Some cells in V2 exhibit shifts in disparity tuning with shifts in plane of the background, thereby signaling depth relative to the background depth plane (Thomas et al., 2002). In V4, a much higher proportion of cells display such shifts in disparity tuning, and, furthermore, the magnitudes of these shifts are greater than those for V2 (Umeda et al., 2007). Thus, V4 is a stage central to the calculation of relative disparity between spatially adjacent visual planes, a function highly important for fine depth perception and figure-ground segregation.
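In standard notation (a textbook formulation rather than one specific to the studies above), a feature projecting to horizontal retinal positions x_L and x_R in the two eyes has absolute disparity d = x_L − x_R; the relative disparity between two features A and B is then

\[ d_{\mathrm{rel}} = d_A - d_B = (x_L^A - x_R^A) - (x_L^B - x_R^B), \]

a quantity that, unlike absolute disparity, is unaffected by vergence eye movements and therefore directly signals depth relative to an adjacent surface.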
Size constancy refers to our ability to perceive the size of an object as constant despite different viewing distances (Figure 5C, right). To achieve this, information regarding the differences in retinal image size at different viewing distances must be incorporated with information about object distance. Where and how does this computation occur? The first electrophysiological study to address this question found that neurons in V4 vary their responses relative to size and distance of the viewing plane (Dobbins et al., 1998). More recently, Fujita and Tanaka hypothesized that V4 compensates for change in retinal image size by using visual cues for depth, and then calibrates for the perceived size. In the majority of V4 neurons studied, when stimuli were presented with larger crossed disparities (nearer), the size tuning curves of these cells shifted toward larger size preferences. Thus, V4 neurons may contribute to size constancy by systematically changing their size tuning depending on the viewing distance from objects. Consistent with these findings, lesions of V4 (prestriate cortex) (Ungerleider et al., 1977), but not parietal cortex (Humphrey and Weiskrantz, 1969), result in loss of size constancy.
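The geometry underlying this computation can be stated simply (a standard approximation, not a model proposed in the studies cited): an object of physical size S at viewing distance D subtends a retinal angle of roughly

\[ \theta \approx S / D, \]

so recovering a distance-invariant size estimate requires rescaling the retinal signal by an estimate of distance, \(\hat{S} \approx \theta \cdot \hat{D}\), where depth cues such as binocular disparity can supply \(\hat{D}\). The disparity-dependent shifts in V4 size tuning described above are consistent with such a rescaling.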
While most neurophysiological research has focused on 2D shape representation, recent work has demonstrated strong representation of 3D shape information in V4 and elsewhere in the ventral pathway. Many V4 neurons are robustly tuned for 3D surface/edge orientation, in a depth-invariant manner (Hinkle and Connor, 2002). V4 neurons are also sensitive to more complex 3D surface shape based on binocular disparity and shading cues (Hegdé and Van Essen, 2005b; Arcizet et al., 2009). Explicit coding of 3D surface shape in IT (Janssen et al., 1999; Yamane et al., 2008) is likely supported by inputs from such V4 neurons.
Due to the strong association of motion with the dorsal pathway, the role of V4 in motion processing has long been neglected. This has been true despite the number of studies that have shown considerable direction selectivity in V4 (Mountcastle et al., 1987; Desimone and Schein, 1987; Ferrera et al., 1994; Tolias et al., 2005). Depending on the directional criterion used, up to a third of V4 neurons have been characterized as direction selective. Estimates range from about 5% if assessed within the globs (Conway et al., 2007) or 13% overall (preferred: null direction criterion of 10:3, Desimone and Schein, 1987) to about 33% (preferred: null criterion 2:1, Ferrera et al., 1994) (see also Tolias et al., 2005). Although the proportion of direction-selective neurons in V4 is much less than in MT where roughly 90% of neurons exhibit direction selectivity (Albright et al., 1984), it is not dissimilar from that in V1 (20%–30%, e.g., Orban et al., 1986) or V2 (~15%, e.g., Levitt et al., 1994).
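To compare these criteria on a common scale, one can express them as a direction index (a commonly used convention, assumed here rather than taken from the original reports),

\[ \mathrm{DI} = 1 - \frac{R_{\mathrm{null}}}{R_{\mathrm{pref}}}, \]

under which a preferred:null ratio of 10:3 corresponds to DI ≥ 0.7 and a 2:1 ratio to DI ≥ 0.5; the looser criterion naturally classifies more neurons as direction selective.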
In monkey early visual cortex, clustering of direction selective neurons was observed in V2 thick/pale stripes, but not in V1 (Lu et al., 2010). Recent optical imaging studies in anesthetized monkeys (H.L., Chen, and A.W.R., unpublished data) reported clustering of direction-selective response in foveal regions of V4. The presence of directional domains suggests that motion information plays a significant role in V4 processing and that directionality is not merely a residual signal inherited from earlier visual areas.
If there is such a significant presence of directional response in V4, what role does it play in the ventral processing stream? One possibility is that motion information in V4 is used for figure-ground discrimination during object motion (Figure 5D). As elegantly put forth by Braddick (1993), a moving object gives rise to a velocity map that separates it from its background. Based on this velocity map, two types of information are extracted: one for determining the direction and speed of the moving object, and one for distinguishing the object from its background. To determine the global direction and speed of an object, a motion integration process is required because early direction-selective neurons detect only local motion (i.e., the “aperture problem”). On the other hand, to distinguish an object from its background, a differential process is required (cf. Zhou et al., 2000). As previously hypothesized, these two motion functions may be subserved by two different motion pathways, a motion integration process in the dorsal stream (V1→MT→MST) and a motion differentiation process in the ventral stream (V1→V2→V4) (Braddick, 1993).
There is some evidence to support this hypothesis. Ventral and dorsal stream motion signals are anatomically distinguishable from the initial stages of cortical processing. As early as V1, two classes of directional cells can be distinguished in different sublayers of layer 4B (Nassi and Callaway, 2007). MT-projecting V1 cells, which are large cells in lower layer 4B underlying blobs, mediate fast transmission of magnocellular-driven input. V1 neurons projecting to the ventral stream are smaller, slower, and positioned to integrate magnocellular- and parvocellular-derived inputs. At the next stage in the ventral pathway, in V2, neurons in the thick stripes are known to be sensitive to coherent-motion-defined lines (Peterhans and von der Heydt, 1993) and exhibit orientation selectivity for both differential motion-defined borders and luminance contrast-defined borders (Marcar et al., 2000). Consistent with these electrophysiological findings, optical imaging studies demonstrate that orientation domains in thick/pale stripes are invariant across luminance-defined and motion contrast-defined borders (H.L. et al., unpublished data), suggesting a common functional organization for contour processing in V2 thick stripes.
This cue-invariant border recognition process is also found in V4. Mysore et al. (2006) examined V4 responses to motion contours (borders between two patches of random dots drifting in different directions). They found that a significant proportion of V4 neurons showed selectivity to the orientation of such second-order contours and similar orientation selectivity to first- and second-order contours. Imaging studies have also revealed motion-contour orientation maps in V4 similar to conventional orientation maps (H.L. et al., unpublished data). Thus, the nature of motion signals described thus far is consistent with the role of V4 in detecting differential motion. Such a “motion differentiation process” may play a central role in figure-ground segregation.
The summaries presented here suggest that V4 plays a role in the representation of a complex array of visual stimulus features. These include: surface features such as color, luminance, shading, texture (Arcizet et al., 2008, 2009); color constancy and lightness constancy (Figure 5A); contour features such as orientation and curvature (Figure 5B); depth cues that contribute to binocular correspondence, relative disparity, and size constancy (Figure 5C); and motion cues that are used for motion contrast border identification (Figure 5D). However, the functional role of V4 in visual processing is not yet clear. Is there a common functional transformation that V4 performs across these multiple feature modalities? A better understanding of V4 function may come from studies that directly compare responses to multiple featural spaces, akin to those that have been conducted in V2 (e.g., Roe et al., 2009 for review) and in inferotemporal areas (e.g., Vinberg and Grill-Spector, 2008).
Although we as yet lack a unifying hypothesis of V4 function, several lines of evidence point to V4’s role in figure-ground segregation. Such a role would require at minimum the following computations (depicted in Figure 6):
As early as V1, neurons exhibit enhanced activity when their receptive fields lie in figure regions compared to ground regions (Lamme, 1995; cf. Knierim and van Essen, 1992; Kastner et al., 1999), consistent with placing greater emphasis on figure over ground.
In V2, studies suggest that associations are first created between borders and surfaces. By measuring responses to Cornsweet stimuli (a stimulus in which a luminance contrast at an edge induces an illusory surface brightness contrast across the edge), studies using both imaging (Roe et al., 2005) and neuronal cross-correlation (Hung et al., 2007) showed that edges “capture” surfaces, and thereby lead to integration of border and surface. These Cornsweet responses were found in thin stripes of V2, a well-known source of inputs to V4. Such surface capture has also been described with disparity cues for V2 cells (Bakin et al., 2000). In this case, Kanizsa-induced illusory edges perceived in depth due to disparity cues “capture” texture elements on the surface despite the fact that those elements lack any disparity cues. Border-surface association has also been demonstrated by von der Heydt and colleagues. In what they call “border ownership” response, they find that responses in V2 and V4 depend on the side to which a luminance-defined figure belongs (Zhou et al., 2000). Such surface capture is also associated with stereoscopic depth, as near disparity response at edges tends to be associated with the figure-side of displays (described for V2 cells in Qiu and von der Heydt, 2005). Thus, using different feature cues, V4 enhances “figureness” by differential neuronal response to the figure versus the ground side of the border.
Featural integration has been examined in studies of collinearity (e.g., Li et al., 2006) and contour completion. The existence, in the early visual pathway, of neural responses underlying contour completion across gaps is well described (e.g., in V1, Sugita, 1999; in V2, Peterhans and von der Heydt, 1989; Bakin et al., 2000). There are to date few studies of the role of V4 in figural completion behind occluders. However, one recent study compared responses of V4 neurons to real and “accidental” contours (contours produced by the occluder that do not provide information about the true shape of the object) (Bushnell et al., 2011a). This study found that responses to accidental contours were suppressed relative to real object contours, a suppression that disappeared with the introduction of small gaps between the occluder and occluded objects. This suggests that V4 is an important stage in image segmentation.
As objects typically can be defined by multiple features (e.g., color, motion, depth, contour), another important step in figure-ground segregation involves border-surface associations across multiple cues. As shown in Figure 6D, a square shape can be defined by luminance contrast, color, depth, or motion contrast cues. Whether such invariance at mid-level processing stages is established by integration across multiple feature-specific input maps from V2 or via intra-V4 circuitry is unknown. Although the number of studies examining invariance in V4 is still limited, recent reports do support cue-invariant shape coding in V4. Mysore et al. (2008) have described invariant V4 responses to shapes defined by either static or moving cues. In a study by Handa et al. (2010), monkeys were trained on a cue-dependent shape discrimination task (dependent on either a motion cue or luminance cue). About a third of the neurons in V4 responded selectively to a shape under both the motion and the luminance cue conditions. Further studies are needed to establish V4’s role in cue-invariant shape recognition.
Central to the task of figure-ground segregation is the ability to modify what is perceived as figure and ground depending on situational cues such as stimulus context and attention. Indeed, there are numerous demonstrations of the ability of the visual system to modify the interpretation of what is figure and what is ground (e.g., the classic vase/face example where the figure is perceived as either a vase or as a pair of face profiles). That neuronal response in V4 is highly adaptable and modifiable will become particularly evident in the following section on attentional modulation. In particular, the role of top-down and bottom-up attentional influences on V4 activity has been a topic of intense investigation in the last two decades. However, only recently has the relationship between object representation and attention come into sharper focus.
In the sections above, we have summarized studies on V4’s role in processing object features. There is also a vast literature on attentional effects in V4 (for reviews, Desimone and Duncan, 1995; Chelazzi et al., 2011). Our purpose here is to try to draw ties between these two disparate bodies of literature. In some sense, these two bodies of investigation—object feature encoding and attentional effects—could be classified as studies on “bottom-up” and “top-down” influences in V4, respectively. Both interface in V4 and both selectively shape networks in V4 (cf. Reynolds and Desimone, 2003; Qiu et al., 2007). (Note that for the purposes of this review, although object “salience” may influence attention, we consider this part of the bottom-up process. Here, we use the term “attention” to refer to internally generated, top-down influences.)
We frame our conception of V4 function in terms of “selection”. The visual attention literature commonly uses the term “select” to indicate either a region of space that is selected (spatial attention) or specific object features that are selected (feature attention). In the same vein, objects in the visual scene “select” the neuronal networks in V4 that encode their features. We propose that these two “selection” processes share a common framework. More specifically, we propose that the functional architecture in V4 is the substrate through which both sets of influences are mediated and that, at the neural level, selective modulation of networks in V4 may be fundamentally the same, albeit directed from different sources.
Our perceptual system is continuously confronted with much more information than it can actively deal with. One way to reduce processing load is to select a fraction of the incoming visual information for closer scrutiny. Visual attention achieves this by focusing on a particular location in space (spatial attention) or on certain features of objects (feature attention). The ability to attend appropriately can be negatively affected by having other competing objects (distractors) in the visual field. In the biased competition model of visual attention (Bundesen, 1990; Desimone and Duncan, 1995; Grossberg, 1980), attentional selection is achieved via a competition for neural resources; this competition can be biased in several ways. One source of this bias comes from involuntary, sensory-driven bottom-up mechanisms (e.g., salient attention-attracting stimuli). Another biasing mechanism is voluntary attentional top-down feedback (e.g., internally generated goal-directed attention), which presumably originates in areas outside the visual cortex. The biased competition model states that only those stimuli that win the competition against surrounding distractor stimuli will have further access to higher-order neural mechanisms linking percepts to mechanisms sustaining goal-directed actions, including systems involved in memory, decision making, and the generation of motor plans (Desimone and Duncan, 1995; Luck et al., 1997; Moran and Desimone, 1985). One goal of this review is to consider this integrative bottom-up and top-down view in the context of functional organization in V4.
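One way to make the competition idea concrete is the weighted-average account associated with biased competition (e.g., Reynolds et al., 1999). The sketch below is an illustrative toy implementation in Python, not the published model; the response values (80 and 20 spikes/s) and the attentional gain of 5 are arbitrary choices for illustration:

```python
def pair_response(r_pref, r_poor, w_pref=1.0, w_poor=1.0,
                  attn_gain=1.0, attend="none"):
    """Toy weighted-average account of biased competition.

    When two stimuli fall in a neuron's receptive field, the response
    approximates a weighted average of the responses to each stimulus alone.
    Attention is modeled as multiplying the weight of the attended stimulus,
    biasing the competition so the pair response moves toward the response
    evoked by the attended stimulus presented alone.
    """
    if attend == "pref":
        w_pref *= attn_gain
    elif attend == "poor":
        w_poor *= attn_gain
    return (w_pref * r_pref + w_poor * r_poor) / (w_pref + w_poor)

r_pref, r_poor = 80.0, 20.0  # responses (spikes/s) to each stimulus alone
print(pair_response(r_pref, r_poor))                               # 50.0: unattended pair
print(pair_response(r_pref, r_poor, attn_gain=5, attend="pref"))   # 70.0: attend preferred
print(pair_response(r_pref, r_poor, attn_gain=5, attend="poor"))   # 30.0: attend poor
```

In this toy account, attending to the preferred stimulus pulls the pair response toward the response to that stimulus presented alone, which is the signature attentional effect described for V4 in the studies cited above.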
Spatial attention has often been characterized as a “spotlight” on a region in space where visual processes appear heightened (e.g., Posner, 1980). At the neural level, attention to a location in space boosts the neural activity of neurons with receptive fields within that attended locale. Such effects have been observed both in dorsal and ventral pathways. It is reported that the levels of enhancement are low in V1 and V2 (typically < 5%) and more robust in areas such as V4 and IT (15%–20% with single stimuli) (e.g., Moran and Desimone, 1985; Desimone and Duncan, 1995; Treue and Maunsell, 1996; Reynolds et al., 1999, 2000; Kastner and Ungerleider, 2000; McAdams and Maunsell, 2000; Chelazzi et al., 2001; Mitchell et al., 2003; Reynolds and Chelazzi, 2004; Cohen and Newsome, 2004). Usually the presence of distractors leads to a reduction of neural response to the stimulus in the receptive field. However, attention can significantly enhance neural response in the presence of distractors (or to low contrast stimuli). This boost in neural activity by spatial attention has been equated with increased sensitivity to stimuli (e.g., to contrast levels), thereby, in a sense, boosting the apparent visibility of an object (e.g., Reynolds et al., 2000; Carrasco et al., 2004).
Although the relationship between neuronal response and BOLD response is still not well understood, boosting of response by attention is also observed in imaging studies. When human subjects attend to cued locations in the visual field, regions of visual cortex that are topographically mapped to these locations exhibit elevated BOLD response (Tootell et al., 1998; Brefczynski and DeYoe, 1999; see also Sasaki et al., 2001; Buracas and Boynton, 2007). In macaque monkeys, optical imaging of V4 during the performance of spatial attention tasks also reveals topographically appropriate elevations of hemodynamic signals (Tanigawa and A.W.R., unpublished data). With respect to functional organization in V4, spatial attention enhances the activity of all functional domains falling topographically within the attended locale. Thus, although there are many questions surrounding the relationship between neuronal spiking activity and hemodynamic response, both measures indicate enhancement of response by spatial attention.
The term “feature attention” has been used to refer to both feature value (e.g., red, green, blue) and feature dimension (e.g., color). Thus, the oft-used phrase “feature-based attention” can be understood as both “feature selection” and “dimension selection”. These are two sets of perceptual phenomena with distinct underlying neural mechanisms; both involve specific modulations of V4 neuronal activity. At the neural level, when multiple stimuli are simultaneously presented within the neuron’s RF (e.g., a red vertical bar and a green horizontal bar), attention to the item matching the cell’s preferred stimulus enhances the neuronal response compared with when attention is directed to the other item, driving the response toward that evoked by the preferred stimulus presented alone.
One form of feature-based attention uses feature values to identify relevant items in the scene (e.g., the red item; Motter, 1994; Bichot et al., 2005; Chelazzi et al., 1998, 2001). In contrast to spatial attention, behavioral evidence in humans indicates that feature-based attention can affect processing throughout the entire visual field, in a parallel fashion (Sàenz et al., 2003; Maunsell and Treue, 2006). Consistent with this, single-unit recordings from area V4 of macaques performing feature-based search tasks have revealed that neuronal responses to elements that share the target-defining features are enhanced during the search process, even before the animal locates the designated target. Motter (1994) demonstrated that V4 neurons are differentially activated depending on a match or nonmatch between an instructional cue and the receptive field stimulus. In other words, regardless of spatial geometry, this form of feature-based attention is able to “highlight” all the objects in the visual array that are potentially relevant for the task at hand (Motter, 1994; Chelazzi et al., 1998, 2001; Bichot et al., 2005). Essentially, the mechanism allows privileged processing of these objects, while other objects are effectively filtered out in parallel across the visual array.
An important aspect of V4 function is its dynamic and context-dependent response to the visual scene. Here we supply three examples.
Evidence from a recent study (David et al., 2008) showed that feature-based attention can alter the tuning properties of neurons in area V4. Neuronal responses were recorded while animals were deploying both spatial and feature-based attention within the context of a modified match-to-sample task or a free-viewing visual search task. It was found that the orientation and spatial frequency tuning of many V4 neurons tended to shift in the direction of the orientation and spatial frequency content of the sought target. The data appeared to be consistent with a matched-filter mechanism in which neurons shift tuning to increase the neural representation of relevant features, at the cost of representation of irrelevant features. Thus, feature “highlighting” can occur not only by response enhancement but also by biasing the sensitivity of the neuronal population toward attended features.
Enhancing activity of neurons that encode attended features allows the system to also enhance the representation of whole (or bound) objects containing that feature. For instance, feature-based attention of this sort can aid selection of a designated target element on the basis of color information, e.g., the red item, which then translates into selective processing and discrimination of another feature of the same item, e.g., its shape (e.g., Sohn et al., 2004).
A second form of feature-based attention involves identifying a specific object feature while at the same time ignoring other features of the same object (Fanini et al., 2006; Nobre et al., 2006; Mirabella et al., 2007; Wegener et al., 2008). In contrast to feature tagging of objects, sorting or classifying of objects on the basis of one elemental feature dimension (e.g., color of fruit or shape of fruit, Figure 7A) requires that the unity of perceptual objects be broken down. This type of feature attention was directly examined in a study in which the activity of V4 neurons was recorded while an animal was attending to either the color or the orientation feature dimension of colored oriented bars (four possible colors and four possible orientations) (Mirabella et al., 2007). Monkeys were trained to turn a response lever to the left in response to two of the four colors (red and blue) and two of the four orientations (0° and 45°), and to the right in response to the other two colors (yellow and green) and two orientations (90° and 135°). Monkeys responded (left or right) to color-orientation pairings. To perform the task correctly, the monkeys had to selectively attend to the feature dimension that was cued, while ignoring the other feature dimension. The study reported that responses of V4 neurons to otherwise identical stimuli are modulated depending on the task cue (Figure 7B); remarkably, the task-relevant features were grouped into one of two behaviorally relevant response categories (left versus right). This type of task-dependent neuronal response grouping provides the first evidence that network associations in V4 can be directed, not only by sensory-defined features, but also by top-down motor output categories.
Neuronal firing rate may not be the only means by which attentional signals are mediated. Recent findings suggest that feature-based attention may also act by increasing synchronization among the neurons selective for the relevant features, particularly in the gamma-band (35–70 Hz) frequency range (Bichot et al., 2005; Taylor et al., 2005; Womelsdorf and Fries, 2007). In a visual search task that contained an array of objects defined by both color and shape, Bichot et al. (2005) showed that gamma-band oscillations occurred more frequently when attended targets fell in the receptive field (both initially and prior to eye movements), suggesting a role for synchrony in feature-guided serial and parallel search. Such enhancements in gamma-band oscillation have also been reported to occur during spatial attention tasks (Fries et al., 2001). Furthermore, other studies report that attentional modulation leads to decreased firing rate synchronization in V4, and propose this as a way to reduce correlated noise and thus enhance the signal-to-noise ratio (Mitchell et al., 2009; Cohen and Maunsell, 2009). Whether correlations are enhanced or decreased may be cell-type specific. Mitchell et al. (2007) demonstrated that attentional modulation acts particularly strongly on fast-spiking neurons (interneurons) in V4, highlighting a potential role of interneurons in enhancement of gamma synchrony (Tiesinga et al., 2004; Buia and Tiesinga, 2006). Anderson et al. (2011a) found that attention affected firing rates differently for bursty versus nonbursty pyramidal cells. How the effects on gamma synchrony relate to neuronal firing enhancement during attention is not clear.
Synchrony has also been proposed to underlie binding of object features, thereby enabling perceptual unity (e.g., Singer and Gray, 1995). Neuronal oscillations of cells in different cortical columns in cat visual cortex may or may not synchronize depending on stimulus geometry (such as spatial separation and feature orientation) (Gray et al., 1989). Enhanced neural synchrony has also been demonstrated when contours are perceived to be part of the same surface but not when interpreted as belonging to different surfaces (Castelo-Branco et al., 2000). Thus, synchrony is a potential way to temporally bind different stimulus features in a cell assembly and provide coherent global percepts. Although our current understanding of the role of synchrony is still evolving (indeed synchrony has been implicated in many mental processes), perhaps it can be viewed as a mechanism for establishing relations (Singer, 1999), whether it be relations within a shape, within an attentional focus, or within a memory trace (e.g., Harris et al., 2003).
One hint comes from the association of gamma band oscillation with hemodynamic signals. Hemodynamic signals are thought to be more closely related to local field potentials (LFPs) than to action potentials (Logothetis et al., 2001). In fact, Niessing et al. (2005) reported that optically imaged hemodynamic response strength correlated better with the power of high-frequency LFPs than with spiking activity. Optical imaging of attentional signals in V4 in monkeys has shown enhancement of the hemodynamic response during spatial attention tasks (Tanigawa and A.W.R., unpublished data). This is consistent with reported enhancements in gamma band synchrony (Fries et al., 2001) and predicts that spatial attention acts by elevating response magnitude in all functional domains within the attended locale (Figure 8A). This study also showed that feature-based attention (e.g., attention to color) may be mediated, not via enhancement of imaged domain response, but rather via enhanced correlations between task-relevant functional domains (e.g., color domains) in V4. Thus, feature attention may be mediated via correlation change across the visual field, but only within domains encoding the attended feature (Figure 8B). These differential effects of spatial and feature attention suggest that domain-based networks are dynamically configured in V4.
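To illustrate the distinction being drawn between the two readouts, the toy simulation below is entirely synthetic (it does not use the unpublished imaging data); it shows how feature attention could, in principle, raise the correlation between two color-domain time courses without changing their response amplitude.

```python
# Illustrative sketch: feature attention modeled as increased coupling between
# two color-domain time courses, with no change in their amplitude.
import numpy as np

rng = np.random.default_rng(1)
n_frames = 200
shared = rng.standard_normal(n_frames)  # common task-related drive

def domain_timecourse(coupling):
    """Zero-mean domain signal: a mix of shared drive and private noise."""
    private = rng.standard_normal(n_frames)
    return coupling * shared + (1 - coupling) * private

# With coupling values c and 1 - c, each time course has the same variance
# (c**2 + (1 - c)**2), so only the domain-domain correlation differs.
corr_attend = np.corrcoef(domain_timecourse(0.7), domain_timecourse(0.7))[0, 1]
corr_ignore = np.corrcoef(domain_timecourse(0.3), domain_timecourse(0.3))[0, 1]
print(f"color-domain correlation, attend color: {corr_attend:.2f}")
print(f"color-domain correlation, ignore color: {corr_ignore:.2f}")
```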
We now briefly consider how attentionally mediated reconfiguration of networks in V4 might be directed by top-down influences. V4 receives feedback from temporal (DeYoe et al., 1994; Felleman et al., 1997), prefrontal, and parietal areas (Stepniewska et al., 2005; Ungerleider et al., 2008; Pouget et al., 2009). In this sense, V4 is well positioned to integrate top-down influences with bottom-up information about stimuli.
Although imaging and neuropsychological studies have strongly suggested that feedback signals from frontoparietal cortex interact with sensory signals in visual areas such as V4, it has been difficult to establish a causal link between activity in frontal (or parietal) cortex and modulation of visually driven activity. One area in prefrontal cortex that has been proposed as a source of top-down influence is the frontal eye fields (FEF), a cortical area responsible for directing eye movements. During overt attention, FEF engages circuits that direct the center of gaze toward salient objects. During covert attention, similar neuronal mechanisms may be at play, an idea that has led to the "premotor theory of attention" (Corbetta et al., 1998; Corbetta, 1998; Hoffman and Subramaniam, 1995; Kustov and Robinson, 1996; Moore et al., 2003; Moore and Armstrong, 2003; Moore and Fallah, 2001; Moore and Fallah, 2004; Nobre et al., 2000; Rizzolatti et al., 1987). If so, then FEF should play a causal role both in directing attention and in influencing V4 activity.
Currently, the only evidence of causal influences from FEF comes from studies of spatial attention. Moore and colleagues provided the first elegant evidence of such a causal link (Moore and Fallah, 2004). First, using microstimulation of FEF, they showed that altered FEF activity produces spatially specific enhancement of visual representations within V4. Second, they showed that FEF microstimulation improved perceptual performance at the stimulated visuotopic locations.
More recently, using fMRI, Ekstrom and colleagues examined the effect of electrical microstimulation in FEF on visually driven responses in V4 and other extrastriate cortical areas of behaving monkeys (Ekstrom et al., 2008, 2009). They found that the V4 voxels showing the strongest enhancement of fMRI activity following FEF microstimulation were not the voxels with the strongest visual responses but rather adjacent voxels; in fact, strongly visually driven voxels were unaffected or even suppressed by FEF microstimulation. These results led them to test whether the effects of electrical stimulation on visually driven activity in V4 would be stronger in the presence of "distractor" stimuli. Without distractors, electrical stimulation increased fMRI activity in V4. With distractors (which normally decrease activity), V4 voxel activity increased substantially beyond the effect seen without distractors. These results are consistent with neurophysiological studies showing stronger attentional enhancement in the presence of competing distractors. In a second study, the same authors showed that FEF microstimulation leads to a nonproportional scaling of the contrast-response curve in visual areas including V4, suggesting that attention-induced increases in neural sensitivity can be mediated by top-down influences.
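As a rough illustration of what a "nonproportional scaling" of the contrast-response function can look like, the sketch below uses a generic Naka-Rushton function with a hypothetical reduction in the semisaturation contrast (c50), one common way of modeling an increase in sensitivity; the parameter values are illustrative and are not the fitted values from the fMRI study.

```python
# Minimal sketch of a Naka-Rushton contrast-response function with a
# hypothetical contrast-gain (c50) shift; parameters are illustrative only.
import numpy as np

def naka_rushton(contrast, r_max=1.0, c50=0.3, n=2.0):
    """Response as a saturating function of stimulus contrast."""
    return r_max * contrast**n / (contrast**n + c50**n)

contrasts = np.array([0.05, 0.1, 0.2, 0.4, 0.8])
baseline = naka_rushton(contrasts)             # no stimulation
stimulated = naka_rushton(contrasts, c50=0.2)  # increased sensitivity (lower c50)

for c, b, s in zip(contrasts, baseline, stimulated):
    print(f"contrast {c:.2f}: stimulated/baseline response ratio = {s / b:.2f}")
# The ratio is largest at low contrasts and approaches 1 near saturation,
# i.e., the modulation is not a uniform multiplicative scaling of all responses.
```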
In sum, existing data indicate that top-down feedback modulates activity in V4 in a way that parallels spatial attention effects and that the magnitude of this effect depends on the specifics of the bottom-up stimulus (i.e., the presence or absence of distractors, salience). This is clear evidence that V4 integrates sensory and attentional effects. It remains unknown how such specificity is achieved via anatomical feedback that is typically described as diffuse, broad, and divergent (cf. Rockland and Drash, 1996; Pouget et al., 2009; Anderson et al., 2011b).
We conclude by trying to link the feature-encoding and attention-encoding (cf. Reynolds and Desimone, 2003; Qiu et al., 2007) aspects of V4 with its functional organization. We have seen that V4 encodes a range of stimulus properties (contour, color, motion, disparity) and have proposed that these contribute to figure-ground segregation processes. We have also seen that V4 is prime real estate for mediating bottom-up and top-down attentional effects. We propose (1) as suggested by the studies cited in this review, that these feature representations are tied to feature-specific domains within V4, (2) that domains of shared feature selectivity are anatomically and/or functionally linked into feature-specific networks, and (3) that attentional mechanisms map onto these domain networks and shape them in spatially and featurally specific ways.
We suggest that the unifying function of V4 circuitry is to enable selective extraction, whether by bottom-up feature-specified shape or by attentionally driven spatial or feature-defined selection (Figure 9). Thus, during bottom-up driven processes, stimulus features select which domains to modulate; during top-down attentional processes, feedback influences make this selection. This selective modulation creates an active network of functional domains that can be dynamically configured. Under what conditions such selection is mediated by enhancement of activity versus by domain-domain correlation requires further investigation. For example, in the case of spatial attention, all domains within a restricted region of V4 are networked. In the case of color constancy, a color network is selected. In the case of shape representation, orientation domains are networked. In the case of color search, a color network is also selected, albeit driven by top-down sources. Subsets of color, shape, depth, and motion domains can all be dynamically reconfigured into stimulus-specific or task-specific networks. Shifting attention from one feature to another would then be implemented by enhancement of one feature-domain network and suppression of another.
When viewed in this fashion, there may be surprisingly little difference between object-induced effects and attentionally induced effects. In brief, in terms of functional organization in V4, attending to an object (considered a mental state) may be very similar to making it more visible (considered an object state). Of course, finer neuronal selection is expected beyond domain-based selection. However, when viewed from a domain-based perspective within V4, vision and visual attention may not be so different and may differ largely by association with other brain regions.