When an image is presented to one eye and a very different image is presented to the corresponding location of the other eye, they compete for perceptual dominance, such that only one image is visible at a time while the other is suppressed. Called binocular rivalry, this phenomenon and its variants have been extensively exploited to study the mechanisms and neural correlates of consciousness. In this paper, we propose a framework, the unconscious binding hypothesis, to distinguish unconscious and conscious processing. According to this framework, the unconscious mind not only encodes individual features but also temporally binds distributed features to give rise to cortical representations; unlike conscious binding, however, such unconscious binding is fragile. Under this framework, we review evidence from psychophysical and neuroimaging studies, which suggests that: (1) for invisible low-level features, prolonged exposure to visual patterns and simple translational motion can alter the appearance of subsequent visible features (i.e. adaptation); for invisible high-level features, although complex spiral motion cannot produce adaptation, nor can objects/words enhance subsequent processing of related stimuli (i.e. priming), images of objects such as tools can nevertheless activate the dorsal pathway; and (2) although invisible central cues cannot orient attention, invisible erotic pictures in the periphery can nevertheless guide attention, likely through emotional arousal; reciprocally, the processing of invisible information can be modulated by attention at both perceptual and neural levels.
In everyday life, our two eyes usually receive similar inputs from the visual environment. What if each eye views a dissimilar image, as shown in Figure 1? Rather than melding into a stable composite, the two images rival for visibility, with one dominating perception for several seconds before being replaced by the other, and so on in alternation (Figure 1a). This perceptual illusion is binocular rivalry (BR; for a review, see Blake, 2001). According to Wade (1998), the observation was first reported by Porta in 1593, who viewed different pages from two books with a partition between the two eyes.
Recently, there has been growing interest in BR as a tool for exploring the dynamical properties of visual awareness and its neural concomitants (for a review, see Tong et al., 2006). The neural correlates of consciousness are defined by Koch (2004, p. 16) as “the minimal set of neuronal events and mechanisms jointly sufficient for a specific conscious percept”. Fundamental to this quest, however, is to understand the neural correlates of processing with and without awareness. In other words, if two experimental conditions differ only in awareness, with sensory inputs kept constant, then the neural differences between the two conditions should correlate with awareness. Unlike backward masking or crowding, wherein awareness is manipulated by changing visual stimulation (e.g. timing and spacing, respectively), in BR visual stimulation is invariant yet the observer’s conscious state is continually in flux (for a review of different psychophysical techniques for manipulating visual awareness, see Kim and Blake, 2005). Moreover, some limitations of BR for studying the neural correlates of awareness, such as unpredictable switches in perception and relatively short suppression durations, can be surmounted by a recent technique derived from rivalry—continuous flash suppression (CFS, Fang and He, 2005; Tsuchiya and Koch, 2005). In CFS, a series of different, contour-rich, high-contrast patterns is continuously flashed to one eye at ~10 Hz to suppress information presented to the other eye (Figure 1b). CFS is effective and reliable in suppressing even highly salient images throughout an entire viewing period, sometimes longer than 3 minutes, whereas visual masking renders visible information invisible only when the stimuli are presented for less than about 33 ms, as required to establish objective unawareness (Box 1). Such long periods of subliminal processing in CFS might produce robust behavioral and neurophysiological effects, such as priming and subliminal conditioning.
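The timing contrast in this paragraph can be made concrete with a small arithmetic sketch. The 60 Hz display refresh rate is an assumption for illustration, not a value prescribed by the studies cited:

```python
# Rough timing arithmetic for CFS vs. backward masking (illustrative values).
REFRESH_HZ = 60          # assumed display refresh rate
FLASH_HZ = 10            # CFS mask patterns flashed at ~10 Hz

frames_per_flash = REFRESH_HZ // FLASH_HZ      # 6 refresh frames per pattern
flash_duration_ms = 1000 / FLASH_HZ            # 100 ms per mask pattern

# A 3-minute CFS trial cycles through many distinct mask patterns:
trial_s = 180
n_flashes = trial_s * FLASH_HZ                 # 1800 pattern presentations

# Backward masking, by contrast, must keep the target under ~33 ms
# (about 2 frames at 60 Hz) to establish objective unawareness.
masking_limit_ms = 33
masking_frames = round(masking_limit_ms / (1000 / REFRESH_HZ))

print(frames_per_flash, flash_duration_ms, n_flashes, masking_frames)
```

The point of the comparison: CFS holds a stimulus outside awareness roughly four orders of magnitude longer than a single masked presentation, which is why it can support effects that need extended exposure, such as adaptation and conditioning.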
On the other hand, CFS entails deeper suppression than BR does. For example, when measured with gratings in a probe detection task, contrast increment thresholds are elevated by 1.4 log units under CFS and by 0.5 log units under BR, relative to non-rivalry conditions (Tsuchiya et al., 2006). For these reasons, CFS, albeit new, is now widely used to suppress visual stimuli from awareness (Bahrami et al., 2007; Fang and He, 2005; Gilroy and Blake, 2005; Jiang et al., 2006; Jiang et al., 2007; Jiang and He, 2006; Moradi et al., 2005; Pasley et al., 2004; Tsuchiya and Koch, 2005; Yang et al., 2007). The main finding is that a weak signal that loses the competition for conscious representation can still produce significant behavioral effects and neural activations.
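Because log units are multiplicative, the gap between 1.4 and 0.5 log units is larger than it may appear. A minimal conversion, using only the two values quoted from Tsuchiya et al. (2006) above:

```python
# Convert log-unit threshold elevations to multiplicative factors.
def log_units_to_factor(log_units: float) -> float:
    """A threshold elevation of x log units means thresholds rise 10**x-fold."""
    return 10 ** log_units

cfs_factor = log_units_to_factor(1.4)   # CFS: ~25x higher contrast thresholds
br_factor = log_units_to_factor(0.5)    # BR:  ~3.2x higher contrast thresholds

# CFS suppression is roughly an order of magnitude deeper than BR:
depth_ratio = cfs_factor / br_factor    # 10**(1.4 - 0.5), about 7.9
print(round(cfs_factor, 1), round(br_factor, 1), round(depth_ratio, 1))
```

In other words, a probe must be made roughly 25 times higher in contrast to be detected under CFS, versus about 3 times under BR.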
One of the central questions in interocular suppression (i.e. when an image is suppressed from awareness during BR or CFS), and in consciousness research in general, is the processing level of suppressed information. In other words, what is the fate of unconscious information, and where does it reach within the brain? This question is essential to consciousness because it constrains the distinction between consciousness and unconsciousness. If unconscious information cannot be processed at all regardless of how it is rendered invisible (i.e. it is as if no information were presented), this would show that consciousness and unconsciousness differ profoundly at the earliest stage and are easy to distinguish at both behavioral and neural levels. If, on the other hand, unconscious information is processed to the same extent as conscious information in the brain except for the conscious state itself, this would imply that the only difference between consciousness and unconsciousness lies in subjective experience. As one can imagine, the real story is more complicated than either extreme: unconscious information can be processed to some extent, contingent on the type of stimuli, attentional resources, and other factors.
We propose that the brain can not only encode invisible features (orientation, motion direction, etc.) but also temporally bind distributed invisible features to give rise to cortical representations, although such unconscious binding is fragile. In the sections to follow, we will first briefly review the scope and limits of unconscious processing during BR (see Kouider and Dehaene, 2007 for a review on visual masking), with an emphasis on human behavioral and neuroimaging studies. Then we will discuss in detail the depth of invisible information processing for different types of information, ranging from features, objects, tools, and faces to affective information. This will be followed by a discussion of the functional role (especially attentional guidance) of invisible information, and of how invisible processing can be modulated by top-down attention. Finally, we will close the article with the take-home messages from this area of research.
Imagine that a triangle or a square is presented so briefly that you feel you are unable to see it (the subjective criterion); your accuracy in guessing the identity of the image may nevertheless be much better than chance level (the objective criterion). But if you objectively cannot see the images in a forced-choice procedure (which is what must be established), it seems paradoxical to study the effects of invisible stimuli. How can humans be affected by stimuli that “absolutely” cannot be perceived? A central idea in consciousness research is that, across the multiple stages of processing, consciousness emerges only after elaborate perceptual processing (Erdelyi, 1974). When the processing stage that gives rise to consciousness is interrupted, information is still processed unconsciously to a degree contingent on factors such as stimulus saliency and attentional capacity. Theoretically, the ideal technique to characterize the depth of processing is to block the stage just prior to the emergence of consciousness without suppressing the stages before it. It is unclear, however, exactly which stage (and which parts of the brain) gives rise to consciousness. A fruitful approach is to probe the depth of processing in the unconscious state at both behavioral and neural levels. This is somewhat similar to how attention researchers tackle the debate over early vs. late selection in attention (i.e. whether attention exerts its modulatory effect at an early sensory stage or at a late stage)—the crux is to understand the processing fate of unattended stimuli (Kanwisher and Wojciulik, 2000; Lavie, 1995). What can BR and CFS tell us in this sense?
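The objective criterion above is usually operationalized statistically: unawareness is claimed when forced-choice accuracy does not reliably exceed chance. A hedged sketch of one common check, an exact binomial test; the observer and trial counts are made up for illustration:

```python
# Check whether 2AFC guessing accuracy exceeds chance (objective criterion).
from math import comb

def binomial_p_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-tailed exact binomial p-value for accuracy > chance."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical observer: 56 correct out of 100 triangle/square guesses.
p = binomial_p_above_chance(56, 100)
# p is about 0.14: accuracy does not reliably exceed chance, so the
# objective criterion for unawareness is (provisionally) satisfied.
print(round(p, 3))
```

Note that 60/100 correct would already yield p < 0.05, illustrating how a modest above-chance sensitivity can undermine a claim of objective unawareness even when observers report seeing nothing.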
BR is effective in blocking information from reaching awareness. Observers are often unable to detect changes in a suppressed target unless those changes are accompanied by abrupt transient changes in luminance or contrast (Blake and Fox, 1974b; Blake et al., 1998). To further assess visual sensitivity during suppression, Fox and colleagues developed the test-probe procedure—probes (i.e. targets) are briefly presented to an eye during either the dominance or the suppression phase to assess visual sensitivity to them (for a review, see Blake, 2001). They found that suppression entails a general, non-selective loss in visual sensitivity of the suppressed eye—probes presented to the suppressed eye are more difficult to detect than those presented to the dominant eye (Wales and Fox, 1970), even when the probes differ significantly from the original suppressed stimuli (Fox and Check, 1968; Fox and Check, 1972).
To evaluate what can be processed under suppression, besides the test-probe procedure, researchers tap into several techniques including adaptation (Box 2) and priming (i.e. prior experience increases sensitivity to subsequent related stimuli). To what extent can interocularly suppressed information be processed? The results are mixed. After adaptation, some behavioral aftereffects (AEs) (Box 3), especially low-level AEs (e.g. pattern and translational motion AEs), can be largely preserved during interocular suppression under certain conditions; certain brain areas, particularly the amygdala and dorsal cortical areas, exhibit robust activity in response to fearful faces and tools, respectively, as measured by functional magnetic resonance imaging (fMRI). Other behavioral AEs and priming, however, are severely disturbed, especially high-level AEs (e.g. complex motion and face identity AEs), semantic priming, and cueing effects. To explain the behavioral findings, especially why simple features but not complex features and objects can be processed, Blake (1997) proposed that 1) rivalry suppression disrupts the binding of local features into coherent, global representations; and 2) suppression transpires within visual areas forming a pathway into the parietal lobe, several stages beyond V1 (also known as primary visual cortex, striate cortex, or Brodmann’s area 17), where local features are registered. In the decade since this account was postulated, studies, especially those using fMRI, have accumulated; some findings, however, do not fit this proposal and await theoretical explanation. For example, why do behavioral studies fail to find object priming effects and face adaptation, whereas fMRI studies show neural activity in response to objects and fearful faces in the dorsal stream and the amygdala/superior temporal sulcus (STS), respectively? This mirrors the complexity of the question of the depth of unconscious processing in BR.
In this paper, we argue that both the conscious and the unconscious mind face the binding problem (Treisman, 1996): a distributed hierarchical network in the brain processes different features of an object, and it is the brain’s job (in particular, that of the parietal and frontal cortex) to correctly select and integrate separate neural representations of features to form coherent object representations. We propose that binding during unconscious processing is possible, albeit fragile: the brain can associate, group, or bind certain features in an invisible scene to form certain cortical representations, and such binding can be detected under optimal conditions. Under this framework, we review evidence from unconscious processing of low-level visual features (e.g. orientation, spatial frequency), and then proceed to high-level visual categories (e.g. objects, tools, and faces) and affective and attentional processing. In the following, the scope and limits of unconscious processing in interocular suppression will be discussed and organized into five themes: 1) feature analysis; 2) object (semantic) processing; 3) emotional processing; 4) attentional guidance by invisible information; and 5) attentional modulation of invisible information processing.
To what extent are cortical areas supporting feature analysis (e.g. V1) spared during interocular suppression and thus not directly involved in consciousness (Crick and Koch, 1995; Lin, 2008)? At one extreme, all basic features (orientation, spatial frequency, etc.) can be processed in the suppressed condition to the same extent as in the dominant condition, and thus the cortical areas supporting such feature analysis are not directly involved in consciousness (the all-exemption hypothesis). At the other extreme, no basic feature can be processed under any circumstances, and thus all responsible cortical areas are involved in consciousness (the null-exemption hypothesis). An intermediate position is that features can be processed when suppressed, but to a lesser extent, and thus the underlying cortical areas are involved in consciousness (the partial-exemption hypothesis). The critical test is to determine the levels of perceptual and neural processing across a range of features under suppressed and dominant conditions.
The null-exemption hypothesis is unambiguously falsified by behavioral adaptation studies, as summarized in Table 1. Early studies show that, under some conditions, interocular suppression does not reduce the strengths of AEs after adaptation to a variety of low-level features (for definitions of the AEs mentioned in the following, see Box 3): tilted lines (tilt AE, Wade and Wenderoth, 1978), squarewave gratings (spatial frequency AE, including contrast threshold elevation and spatial frequency shift, Blake and Fox, 1974a), McCollough-type gratings (orientation-contingent color AE, White et al., 1978), and translational motion (motion AE, Lehmkuhle and Fox, 1975; O’Shea and Crassini, 1981). Is this evidence for the all-exemption hypothesis, namely that the neural bases of these AEs, such as V1 and MT+, are not directly related to visual awareness? The critical test to tease apart the all-exemption and partial-exemption hypotheses is to compare the relative strengths of AEs after adaptation to suppressed and dominant stimuli. Two recent studies provide behavioral evidence for the partial-exemption hypothesis: the strength of negative afterimages after adaptation to suppressed (vs. visible) oriented gratings was significantly weaker during BR (Gilroy and Blake, 2005) or CFS (Tsuchiya and Koch, 2005). Moreover, monkey single-unit recordings (Leopold and Logothetis, 1996; Sheinberg and Logothetis, 1997), human electroencephalogram recordings (Cobb et al., 1967; Lansing, 1964), and brain imaging (Haynes et al., 2005; Lee et al., 2005; Lumer et al., 1998; Polonsky et al., 2000; Tong and Engel, 2001; Wunderlich et al., 2005) show robust awareness-dependent modulations in V1—neural events in V1 are attenuated in response to suppressed (vs. dominant) visual stimuli. Thus, these studies demonstrate that V1 is directly involved in visual awareness, supporting the partial-exemption hypothesis (Lin, 2008).
How can the discrepancy between the behavioral adaptation and neurophysiological studies be reconciled? Blake et al. (2006) provided a nice resolution to this debate (Figure 2). Their study taps into the finding that some visual AEs depend critically on the contrast of the adaptor, with the strength of adaptation saturating at moderate to high contrast levels (Figure 2a). Critically, the full-strength AEs observed in previous studies might hold only for high-contrast adaptors. Indeed, using high-contrast adaptors, Blake et al. (2006) replicated previous studies; using low-contrast adaptors, however, they showed that interocular suppression did weaken the strength of the threshold elevation AE and the motion AE (Figure 2c–d). This implies that at least some of the neural events underlying rivalry suppression transpire before or at the site(s) of threshold elevation and motion adaptation. Presumably, the neural mechanisms of the threshold elevation and motion AEs are closer to the neural correlates of consciousness than are those of AEs that are not modulated by awareness.
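The logic of this resolution can be illustrated with a simple saturating contrast-response function. Here a Naka-Rushton form is used as a stand-in; the function and its parameters are our illustrative assumptions, not the model fitted by Blake et al. (2006). Treating suppression as an attenuation of the adaptor's effective contrast shows why only low-contrast adaptors reveal the effect:

```python
# Illustrative saturating adaptation-strength function (Naka-Rushton form).
def adaptation_strength(contrast: float, c50: float = 0.1, n: float = 2.0) -> float:
    """Hypothetical AE strength as a function of adaptor contrast (0..1)."""
    return contrast**n / (contrast**n + c50**n)

SUPPRESSION_ATTENUATION = 0.5   # assume suppression halves effective contrast

for c in (0.8, 0.05):           # high- vs. low-contrast adaptor
    visible = adaptation_strength(c)
    suppressed = adaptation_strength(c * SUPPRESSION_ATTENUATION)
    print(f"adaptor contrast {c}: visible {visible:.2f}, suppressed {suppressed:.2f}")

# High contrast: both values sit on the saturated plateau, so suppression
# leaves the AE apparently intact. Low contrast: the attenuated adaptor
# falls on the rising limb, and the AE is measurably weakened.
```

Under these assumptions, a halving of effective contrast changes AE strength by only a few percent at high adaptor contrast but by a factor of three or more at low contrast, mirroring the pattern in Figure 2.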
Further support for the partial-exemption hypothesis calls for evidence that some (presumably complex) features might not be processed at all when suppressed. The motion AE (MAE) is an excellent candidate for testing this idea because of its rich variety. As mentioned above, the translational (i.e. linear) MAE is largely spared during suppression, similar to the finding of preserved motion priming after suppression (Blake et al., 1999). Importantly, however, Wiesenfelder and Blake (1990) observed that the duration of the spiral AE after adaptation to spiral motion was proportional to the total duration of spiral visibility during adaptation, similar to the disruption of the drifting plaid-induced MAE during suppression (van der Zwan et al., 1993). Further evidence comes from illusory AEs: the subjective (i.e. illusory) contour AE (van der Zwan and Wenderoth, 1994) and the square-wave illusion AE (Blake and Bravo, 1985), which are believed to arise from intercortical interactions in early visual areas (e.g. V1 and V2, Lee and Nguyen, 2001), are also disrupted during suppression. Thus, some complex features, such as spiral motion, drifting plaids, illusory contours, and the square-wave illusion, are almost completely disrupted during interocular suppression.
The picture that emerges from these adaptation studies is that all visual features can be modulated by visual awareness, to different degrees. Processing of basic features (e.g. tilts and simple motions) is modulated by visual awareness to a lesser extent than processing of complex features (e.g. complex motions); when the contrast of basic features is high, such processing can even be exempt from modulation by consciousness. In neural terms, V1 feature analyzers, albeit inhibited to a certain extent, remain largely responsive to suppressed visual features compared with later visual areas (such as MT+). This might reflect the possibility that interocular suppression originates at early stages of processing and deepens at later cortical stages, as elaborated in the following sections. A more complete picture entails moving beyond feature analysis to examine the processing of higher-level visual inputs.
To what extent can objects and semantic information be processed unconsciously? Given that complex features are deeply suppressed during BR (see 3.1. Feature analysis), it seems that object representation and semantic analysis are almost impossible. Indeed, accumulating evidence suggests that 1) although a face identity-specific AE can be observed in the visible condition, this AE is effectively abolished by interocular suppression (Moradi et al., 2005); 2) pictorial object priming in a naming task is found only for stimuli that are processed sufficiently to be identified in the priming stage (Cave et al., 1998); and 3) the word-priming effect in a word/non-word decision task is measurable only if observers consciously perceive the prime words (Zimba and Blake, 1983). These observations raise an intriguing question regarding the neural basis of the disruption of object processing and semantic priming during suppression. According to Lamme and Roelfsema (2000), encoding of simple features (e.g. spatial frequency) is hardwired and occurs without awareness when information first enters early visual cortex in a feedforward sweep. These features are then attentively grouped to enter consciousness through recurrent processing, by means of horizontal connections and feedforward/feedback projections. Building on this, there are two possible accounts for the disruption of object processing and semantic priming: one account suggests that all basic visual features of objects are processed to some extent, with the disruption due to inefficient attentive grouping or recurrent processing (i.e. the grouping disruption account); the other account, which we favor, maintains that some critical features of objects are disrupted such that binding/grouping is impossible simply because critical features are missing (i.e. the feature disruption account).
The latter account is consistent with our unconscious binding framework: binding becomes possible once the critical features can be registered and attentively grouped. To date, there is no convincing evidence to adjudicate between these accounts. Below, we discuss some recent advances regarding the neural representation of object and semantic information during suppression. Note that this approach is descriptive rather than explanatory, because it cannot falsify either the grouping disruption account or the feature disruption account.
The current state-of-the-art regarding the mechanisms of BR is that interocular suppression per se occurs at early stages of processing (e.g. the lateral geniculate nucleus (LGN), Haynes et al., 2005; V1, Polonsky et al., 2000) and deepens at subsequent cortical stages in both the ventral (form) and the dorsal (motion) pathways, possibly due to cumulative lateral competition (Nguyen et al., 2003). In particular, neural activity within object-selective areas in the ventral pathway, especially the inferotemporal cortex (IT), is almost completely suppressed (Sheinberg and Logothetis, 1997; Tong et al., 1998). The IT plays an important role in both object recognition and semantic processing. First, the IT is critical for object recognition in non-human primates and humans. In primates, the IT codes complex objects and is responsible for view- and position-invariant object representations (Tanaka, 1996; Ungerleider and Mishkin, 1982). In humans, the IT, which occupies the ventral surface of the brain, extending from around the occipito-temporal border to the middle part of the temporal cortex (Tanaka, 1997), is also vital for object perception, as demonstrated by numerous studies such as those using fMRI adaptation—the observation of decreased neural activity for repeated versus novel stimuli (Grill-Spector et al., 2006; for an overview, see Box 2). For example, repetition priming for objects is observed at mid-levels of the neural processing hierarchy, including extrastriate visual cortex extending into the IT and left dorsal prefrontal cortex, but not in early visual areas or motor areas (Buckner et al., 1998). The disruption of the object priming effect during BR, therefore, might be due to the suppression of the ventral pathway, where the human homologue of the monkey IT is located.
Second, the IT also plays an important role in semantic processing: the semantic neural network extends from left inferior frontal cortex into the inferotemporal lobe, and includes occipital cortex and the fusiform gyrus (Tyler et al., 2001). Although several studies using backward masking have observed object recognition priming, word priming, or both (Dehaene et al., 2001; Dehaene et al., 2004; Dell’Acqua and Grainger, 1999; Devlin et al., 2004; Gaillard et al., 2006; Kiefer and Brendel, 2006; Naccache et al., 2005; Nakamura et al., 2005), these findings are probably due to incomplete disruption of the IT. For example, IT neurons in macaque monkeys retain substantial information about target images despite the masking procedure (Rolls et al., 1999).
Although the proposal that interocular suppression deepens at later stages (Nguyen et al., 2003) has received substantial support in the ventral pathway, as reviewed above, it is not clear whether the same holds for the dorsal pathway. In particular, the functional organization of the visual pathways in the cerebral cortex comprises not only the ventral “vision-for-perception” pathway (i.e. for obtaining information about the features of objects) but also the dorsal “vision-for-action” pathway (i.e. for guiding movements) (Goodale and Milner, 1992). The ventral and dorsal pathways carry out different computations on visual information from the retina: the former recognizes an object independent of its size, momentary orientation, and position; the latter computes the absolute metrics of target objects in a frame of reference centered on specific effectors (i.e. egocentric coding). The failure to observe object/semantic priming effects, together with the disruption of ventral visual areas (e.g. the IT) during suppression, therefore need not be interpreted as evidence that objects or words cannot be processed at all. In particular, although areas such as the lateral occipital complex (LOC) in the ventral pathway show preferential activation to images of objects (Malach et al., 1995), the dorsal pathway also contains several object-sensitive areas, including V3A/V7 (Grill-Spector et al., 1998) and the intraparietal sulcus (Grill-Spector et al., 2000). Importantly, the object-sensitive regions in the dorsal pathway differ from those in the ventral pathway: the dorsal object areas, presumably because of their important role in reaching and grasping, prefer manipulable objects such as man-made tools, which are commonly associated with specific hand movements (Chao and Martin, 2000). Thus tools are a unique category of objects and serve as an excellent candidate for testing the level of object processing in the dorsal pathway.
Indeed, a study by Fang and He (2005) revealed that suppressed, invisible images of tools can activate the dorsal pathway. Using fMRI, they showed that dorsal cortical areas (including V3A, V7, and part of the intraparietal areas) responded strongly to different types of visual objects suppressed by CFS, with stronger responses to images of tools than to human faces (Figure 3). This study thus fits nicely with the perception–action model (Goodale and Milner, 1992), and is further supported by the observation that when the motion of a rival stimulus is consistent with self-generated actions during BR, such actions can extend the dominance durations and shorten the suppression durations of that stimulus (Maruya et al., 2007).
How can the discrepancy between ventral and dorsal activity to invisible objects be reconciled? That ventral activity is almost abolished whereas dorsal activity is somehow preserved during interocular suppression might reflect the functional differences between the parvocellular (P) and magnocellular (M) channels in terms of selectivity for spatial and temporal frequency, color, motion, and luminance contrast (Box 4). The P and M pathways are preferentially associated with the ventral and dorsal cortical pathways, respectively. Such distinctions between the P and M pathways potentially form part of the anatomical basis of the differential sensitivity of the ventral and dorsal pathways to interocular suppression. It has been proposed that rivalry transpires mainly in the P pathway, with visual information in the M pathway escaping rivalry suppression (Carlson and He, 2000; He et al., 2005). Building on this sensitivity account, Fang and He (2005) suggested that dorsal activation to tools might arise from residual signals after incomplete suppression in visual cortex or from subcortical projections. At least two important issues, however, remain unclear. The first concerns the functional significance of such dorsal activity for tools: what are the behavioral consequences, and why? One approach to this issue is to characterize the levels of representation for tools, using the adaptation and priming methods mentioned above. The second involves the neurophysiological origins of dorsal activity for invisible tool images: does tool information reach the dorsal pathway through V1, through subcortical projections, or both? Neither the grouping disruption account (i.e. suppression disrupts grouping of simple features of objects) nor the feature disruption account (i.e. suppression disrupts processing of critical features of objects) can disambiguate the subcortical vs. cortical origins. Similarly, the P and M sensitivity account does not resolve this issue.
Such ambiguity about neurophysiological origins is a general issue in neuroscience and might reflect the complicated connections of the nervous system. For example, area MT receives not only inputs from V1, V2, and V3 (DeYoe and Van Essen, 1988), but also a direct projection from the LGN in macaque monkeys (Sincich et al., 2004). To distinguish subcortical and cortical contributions, several important questions warrant empirical investigation. First, if rivalry indeed transpires mainly in the P pathway but not the M pathway, then why? It is unclear how P and M cells differ in this respect, and how the cells within each pathway differ from each other. Given that the ventral pathway receives inputs from both P and M cells (Merigan and Maunsell, 1993), it is reasonable to speculate, based on the P and M sensitivity account, that there should be some M-cell-driven activity in the ventral pathway during interocular suppression. Second, how can the interconnections between different brain areas be disentangled? Such interconnections make it difficult to distinguish subcortical vs. cortical contributions: it is almost impossible to exclude subcortical contributions to cortical activity, including activity in the dorsal pathway; similarly, it is difficult to rule out contributions from cortical areas such as V1. Current neuropsychological research seems to shed little light on this issue. For instance, although blindsight patients with a lesion in V1 can display preserved visually guided action, this is not direct evidence that subcortical pathways suffice to explain the results of Fang and He: it is simply unlikely that V1 is totally damaged in these patients. Monkey lesion studies will help to resolve this issue. In humans, a possible approach might be to examine attentional modulation of dorsal activity to tools, and to elucidate and quantify the differences in attentional modulation of subcortical and cortical pathways (see 3.3. Face perception).
In summary, behaviorally, object identification and semantic analysis are largely abolished during interocular suppression. Neural activity in the ventral pathway is almost completely disrupted; however, there is still a considerable amount of activity in the dorsal pathway in response to images of some categories of objects, such as tools, which provides neural evidence for the neuropsychological observation of action without identification. That invisible images of tools can activate the dorsal pathway supports our unconscious binding hypothesis in that it at least suggests that certain features are bound to drive dorsal object-sensitive areas.
The face, as a special category of objects, is processed differently from other categories (Farah et al., 1998), with dedicated neural substrates (e.g. the fusiform face area, FFA, Kanwisher et al., 1997; the occipital face area, OFA, Halgren et al., 1999). Although activity in the FFA for suppressed faces is almost entirely abolished (Fang and He, 2005; Pasley et al., 2004; Tong et al., 1998; but see Jiang and He, 2006), invisible fearful faces can robustly activate the left amygdala (Jiang and He, 2006; Pasley et al., 2004; Williams et al., 2004) and the STS (Jiang and He, 2006). For instance, using CFS, Jiang and He (2006) observed that visibility did not modulate amygdala activity for fearful faces but had a profound effect for neutral faces. Similarly, in the STS, activity was robust to invisible fearful faces but not to neutral faces. In contrast, in the FFA, activity to both fearful and neutral faces was still measurable, albeit much reduced.
Reminiscent of the unclear origins of dorsal activity for tools (see 3.2. Object and semantic processing) is the debate regarding whether activity in the amygdala to invisible fearful faces is due to projections from cortical or subcortical pathways. Accumulating evidence from both experimental animals and humans seems to favor the subcortical account. Rodents, for example, exhibit fear conditioning with auditory or visual stimuli in the absence of the respective sensory cortex (Armony et al., 1997; Romanski and LeDoux, 1992). Similarly, blindsight patients with a lesion in V1 nevertheless exhibit residual abilities to detect and localize visual stimuli (Weiskrantz, 1997) and recognize facial expressions (de Gelder et al., 1999); engagement in the latter task activates the amygdala (Morris et al., 2001). Convergent evidence comes from healthy humans. For example, Morris et al. (1999) measured neural activity for two angry faces, one of which was associated with a burst of white noise through previous classical conditioning. When rendered invisible by backward masking with a neutral face, the conditioned angry face (compared with the unconditioned angry face) showed increased connectivity among the right amygdala, the pulvinar, and the SC but decreased connectivity among the right amygdala, the fusiform, and the orbitofrontal cortex. When the conditioned angry face was visible, however, such co-variation disappeared. In contrast, the left amygdala could not differentiate aware and unaware conditioned angry faces; its connectivity with the pulvinar and the SC showed no context-specific co-variation. These data suggest a subcortical pathway that enables invisible stimuli to access the amygdala. A further demonstration of the involvement of subcortical but not cortical pathways in invisible fearful face processing is provided by Pasley and colleagues (2004).
They found that invisible fearful faces (compared with non-face objects) activated the amygdala but not the IT, which suggests that rudimentary discrimination of certain complex visual patterns does not require a high-level cortical representation.
However, competing evidence argues otherwise. For instance, anatomically the crucial link between the pulvinar and the amygdala has not yet been demonstrated in primates (Pessoa, 2005). In addition, results from blindsight patients are ambiguous because in the same patients invisible information can also activate cortical areas. This alternative explanation makes it impossible to rule out the contribution of cortical pathways (e.g. extrastriate areas) to processing fearful faces. For example, images of complex objects presented in the blind visual field activate several visual areas: MT+/V5 in response to a rotating spiral stimulus, and the lateral occipital cortex (LOC) and posterior fusiform gyrus (V4/V8) in response to coloured images of natural objects (Goebel et al., 2001). It is therefore desirable to test whether invisible fearful faces can activate the amygdala in blindsight patients with complete lesions of the visual pathways.
How, then, can the two accounts be teased apart? To tackle the cortical-subcortical debate, one approach is to use features that can distinguish cortical from subcortical processing. First, although anatomically both cortical and subcortical pathways terminate at the amygdala, they have different transmission properties (Pessoa, 2005). For instance, in primates the IT (Nakamura et al., 1992; Stefanacci and Amaral, 2002), but not earlier visual cortical areas (Iwai and Yukie, 1987; Webster et al., 1991), slowly passes detailed information to the amygdala. By contrast, information transmission in the retinotectal pathway (i.e. the retino-collicular-pulvinar-amygdala pathway, an important subcortical pathway which proceeds from the retina to the SC, posterior nuclei of the thalamus such as the pulvinar, and then onto the amygdala) is rapid and coarse (LeDoux, 2000). Critical to the current reasoning, this means that subcortical pathways are able to (relatively) bypass attentional modulation (Anderson et al., 2003; Vuilleumier et al., 2001) and the visibility constraint (Morris et al., 1998; Whalen et al., 1998), but unable to analyze visual input at a fine-grained scale (Anderson et al., 2003; Williams et al., 2004). It has been observed, for example, that activity in the amygdala increased significantly for happy versus neutral faces only when the faces were invisible (Williams et al., 2004). Williams et al. argue that although the amygdala still encodes affective information from the face stimuli, it has a limited capacity to differentiate affective valence when it must rely on information from subcortical inputs. Second, spatial frequency is another feature that can potentially distinguish the cortical and subcortical projections (for a review, see Johnson, 2005). Specifically, the fusiform cortex favors high over low spatial frequency faces regardless of emotional expression, whereas the amygdala favors low over high spatial frequency fearful faces.
Evidence for the subcortical account comes from the observation that, critically, low frequency but not high frequency fearful faces could activate the pulvinar and the SC (Vuilleumier et al., 2003). Third, and more controversially, susceptibility to attentional modulation may provide another tool to distinguish subcortical and cortical processing. Pessoa and colleagues forcefully argue that a strongly automatic process should be largely independent of attention, among other top-down factors including task context, interpretation, and visual awareness (Pessoa, 2005). Indeed, using fMRI Pessoa and colleagues (Pessoa et al., 2002a; Pessoa et al., 2002b) found that, if the task was sufficiently demanding, activity in the amygdala and other areas was modulated by attention even for visible emotional faces. When participants were totally unaware of fearful faces that flashed for 33 ms, no differential activation was observed in the amygdala (Pessoa et al., 2006). Event-related potential (ERP) studies provide complementary evidence for this attentional modulation argument. For example, a greater frontal positivity in response to arrays containing fearful faces, relative to neutral faces, was obtained about 100 ms after stimulus onset only under attended conditions (Eimer et al., 2003; Holmes et al., 2003). Pessoa and colleagues reason that, given the rich details contained in facial expressions, the critical pathway involved in processing emotional expressions is cortical, running from V1 through extrastriate cortex, the fusiform, the STS, and the IT to the amygdala, and that subcortical routes are insufficient to give rise to amygdala activity to invisible emotional faces. Thus, they argue for the cortical account.
Although there is no consensus regarding which pathways are more responsible for processing invisible emotional expressions, we believe that accumulating evidence favors the subcortical account. In particular, before studies showing attentional modulation of invisible emotional face processing can be taken as evidence against the subcortical argument, the assumption that a truly subcortical pathway should be automatic and free of attentional modulation needs to be firmly grounded. To us, this assumption has not been well supported. For example, although subcortical pathways are less susceptible to attentional modulation than cortical pathways, they are not immune to it (Kastner and Pinsk, 2004). Attention is regarded as the gatekeeper of sensory inputs: in vision, attention modulates neural activity as early as the LGN (Chen et al., 1998; O’Connor et al., 2002) and the pulvinar (Kastner et al., 2004; Robinson and Petersen, 1992); in audition, it starts as early as 20 ms after stimulus onset in auditory cortex (Woldorff et al., 1993). As such, attentional modulation should not be taken as evidence against the involvement of subcortical pathways in invisible emotion processing. More research is needed, though, to quantify attentional modulation of subcortical and cortical processing. Moreover, the subcortical retinotectal pathway has yet to be anatomically established in primates. It is likely that the contributions from cortical and subcortical routes are relative and quantitative rather than mutually exclusive.
In conclusion, interocular suppression disrupts ventral temporal activity for faces, but not amygdala activity for fearful faces. It remains a matter of debate, however, what the relative contributions of subcortical and cortical projections to amygdala activity for invisible fearful faces are. Overall, accumulating evidence seems to favor the subcortical account, but more research is needed to quantify the relative contributions of the two pathways, and to elucidate whether subcortical inputs can be sufficient for processing invisible fearful faces. We speculate that unconscious detection of fearful expressions might result from binding of critical features (e.g. mouth shape and eye shape), regardless of whether these critical features are conveyed through subcortical or cortical pathways, or both.
At any single moment, our environment bombards us with far more information than can be consciously registered and effectively processed. Attention, the ability to focus on a small portion of behaviorally relevant information while ignoring irrelevant information, acts as an information-processing bottleneck and is fundamental to subsequent processing. But as elaborated above, information kept out of consciousness can nevertheless enter into the brain and be processed at multiple stages. This raises an intriguing question about the functional and ecological significance of unconscious processing. In particular, can unconscious information guide conscious processing? For example, can information suppressed during BR/CFS influence allocation of visual attention?
To address this, the brain machinery that processes invisible information must be linked to its potential role in deploying attention. At the neural level, although there is no evidence that invisible information can activate the frontoparietal network, which is important for controlling attention (for an overview of the neural mechanisms of top-down and bottom-up control of attention, see Box 5), there is evidence that invisible information can activate dorsal areas such as V3A, V7, and part of the intraparietal areas (Fang & He, 2005). Behaviorally, the possibility of attentional guidance by invisible information was first examined by Schall and colleagues (1993). In their study, a black dot was used as an orienting cue, which appeared on either the left or right side of the central circle, signifying that the target would appear on either the left or right side of the screen, respectively. The cue was either 80% or 90% valid (i.e. accuracy in predicting the target location) and was presented during either the dominant (i.e. visible) or suppressed (i.e. invisible) phase of BR. They found that reaction times to the targets were significantly affected by cue validity only during the dominant phase, not the suppressed phase, which implies that symbolic central cues, when rendered invisible, cannot direct top-down attention. It is unclear, however, whether this negative finding is due to ineffective guidance of attention (i.e. the cue is processed but not to the degree of serving as an effective cue) or to ineffective cue processing (i.e. the cue is not processed at all). If the latter account is correct, then the null attentional effect should be attributed to a failure of sensory analysis rather than of attentional guidance per se. More studies are needed to clarify this issue, the critical test being to specify the processing level of the cueing information.
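The cue-validity logic of such a paradigm can be made concrete with a short computation: the validity effect is the difference in mean reaction time between invalidly and validly cued trials. The sketch below uses made-up reaction times and a hypothetical `validity_effect` helper, purely for illustration (these are not data from Schall et al.):

```python
# Illustrative cue-validity analysis for a spatial cueing paradigm.
# All reaction times (ms) below are made up for demonstration only.

def validity_effect(rt_valid, rt_invalid):
    """Mean RT on invalidly cued trials minus mean RT on validly cued
    trials; a positive value indicates that the cue guided attention."""
    return sum(rt_invalid) / len(rt_invalid) - sum(rt_valid) / len(rt_valid)

# Visible cue: valid trials are faster, so the effect is positive...
visible = validity_effect(rt_valid=[420, 410, 430], rt_invalid=[480, 470, 490])
# ...invisible (suppressed) cue: no RT difference, so the effect is null.
invisible = validity_effect(rt_valid=[450, 455, 445], rt_invalid=[452, 448, 450])

print(visible)    # 60.0
print(invisible)  # 0.0
```

A reliably positive validity effect for visible cues combined with a null effect for suppressed cues is the pattern Schall and colleagues reported.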
Since attention includes not only top-down attention, as explored by Schall and colleagues (1993), but also bottom-up attention, it remains important to ask whether invisible information can “capture” attention in a bottom-up manner. At the cortical level, it is unclear whether suppressed information can activate the right ventral frontoparietal network, a critical network for bottom-up attention (Box 5). At the subcortical level, although it is also unclear whether the SC or the pulvinar is engaged during suppression, there is some evidence that the amygdala, which is responsive to invisible fearful faces, is able to guide attention. For example, in the attentional blink (Raymond et al., 1992), when several stimuli are briefly displayed in sequence (i.e. rapid serial visual presentation), observers usually fail to detect a second target when it is presented 200–500 ms after the first one; importantly, compared with neutral stimuli, negative stimuli usually show a preferential ability to break through into awareness. In support of the current reasoning, it has been shown that the amygdala is important for such attenuation of the attentional blink. For example, when exposed to aversive words, patients with left anterior-medial temporal lesions or bilateral amygdala lesions failed to show attenuation of the attentional blink (Anderson and Phelps, 2001), implying a causal role of the amygdala or the anterior-medial temporal cortex in enhancing emotion-related information processing. The amygdala’s ability to modulate the attentional blink may derive from its reception of rapid and crude information from the pulvinar in the thalamus through the retinotectal pathway (Zald, 2003; see also 3.3 Face perception). Since suppressed fearful faces can activate the amygdala, it seems logical to speculate that some invisible emotional information might be able to attract attention automatically. Indeed, using invisible erotic pictures, a recent study by Jiang et al.
(2006) lends support to this hypothesis. As illustrated in Figure 4, they presented participants with an erotic picture and a scrambled picture, both rendered invisible by CFS, next to the central fixation point, one on each side. To assess whether the invisible erotic picture could guide attention, they asked participants to indicate the perceived orientation (clockwise or counterclockwise) of a briefly presented Gabor patch following the presentation of the suppressed erotic images, which could be on the left or on the right (Figure 4a). The logic is that if the erotic picture can automatically capture attention, then participants should perform better when the Gabor patch is presented on the same (vs. different) side as the erotic picture. The difference between the two conditions can be used to index the amount of automatic attentional guidance (i.e. the implicit attentional effect). In other words, a positive attentional effect means that attention is attracted to the erotic picture, whereas a negative attentional effect means that attention is repelled from the erotic picture. Interestingly, they found that invisible erotic pictures could either attract or repel observers’ spatial attention depending on their gender and sexual orientation. Specifically, for heterosexual participants, attention was attracted to invisible erotic pictures of the opposite gender (and for males, attention was repelled from invisible erotic pictures of the same gender; Figure 4b). Gay males were similar to heterosexual female participants in that their attention was attracted to male but not female erotic pictures. Bisexual females fell in between the heterosexual male group and the heterosexual female group.
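The implicit attentional effect index described above can be sketched as a simple accuracy difference. The trial format and the `attentional_effect` helper below are hypothetical illustrations, not the analysis code of Jiang et al.:

```python
# Sketch of the implicit attentional effect index: accuracy when the
# Gabor probe appears on the same side as the invisible erotic picture,
# minus accuracy when it appears on the opposite side. The trial records
# below are hypothetical, not data from Jiang et al. (2006).

def attentional_effect(trials):
    """trials: list of (picture_side, gabor_side, correct) tuples.
    Positive -> attention attracted to the invisible picture;
    negative -> attention repelled from it."""
    same = [correct for pic, gabor, correct in trials if pic == gabor]
    diff = [correct for pic, gabor, correct in trials if pic != gabor]
    return sum(same) / len(same) - sum(diff) / len(diff)

trials = [
    ("left", "left", True), ("left", "left", True),     # same side: 2/2
    ("left", "right", True), ("left", "right", False),  # opposite side: 1/2
]
print(attentional_effect(trials))  # 0.5 -> attention attracted
```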
By showing the power of invisible information in orienting attention, this study is consistent with earlier studies, including those showing that 1) exogenous cues rendered invisible by visual masking can capture attention (McCormick, 1997), and 2) oriented Gabor patches, with spatial frequency so high that they are perceptually indistinguishable from a uniform field, can generate orientation-dependent spatial cueing effects (Rajimehr, 2004). It remains unclear, however, what kind of information in the erotic pictures is responsible for orienting attention. According to the unconscious binding hypothesis, one possibility is that nude bodies in the erotic pictures, albeit invisible, increase the arousal levels of observers through feature binding. Future research is needed to specify the conditions that can generate top-down and bottom-up attention, and to reveal the neural mechanisms that support such implicit attentional guidance.
In short, although invisible dot cues fail to produce a cueing effect, invisible erotic images can attract attention and boost performance at the locations where these images appear. Together with other studies using masking to show orienting of attention, these results provide evidence for the existence of implicit attention. According to the unconscious binding hypothesis, it is likely that binding of certain critical features in the invisible erotic images generates a representation of arousal value. How the brain binds these critical features, and the exact mechanisms of attentional guidance by implicit information, await future research.
It is now clear that information rendered invisible by BR or CFS can be processed to several levels and can functionally guide attention. But what are the limits of such unconscious processing? In particular, is invisible information processing constrained by concurrent processing resources, or is it instead so automatic that it escapes the control of attention?
There is no unified view regarding the relationship between attention and awareness. On the one hand, since both attention and awareness are selective in nature due to limited resources, some argue that they are identical (O’Regan and Noe, 2001; Posner, 1994). On the other hand, since we are able to attend to the locations of invisible images and can also become conscious of the gist of a scene in the near absence of attention, others maintain that attention and awareness can be dissociated (Koch and Tsuchiya, 2007; Lamme, 2003). To resolve this debate, it is necessary to address some conceptual issues. First, is attention necessary for awareness? At first glance this seems true (Dehaene et al., 2006): we become aware of what the paper is talking about only if we pay attention to it. To falsify this, we need to search for “awareness without attention”; the critical test is whether we can perceive an image without attention. This is partly supported by a study showing that we can be aware of the gist of a scene almost without attention (Li et al., 2002). Second, is attention sufficient for awareness? It is obvious that attention is insufficient for awareness: we can attend to the locations of invisible information but still be unaware of it, as in CFS. Thus, it seems that attention is neither necessary nor sufficient for awareness.
The dissociable relationship between attention and awareness provides conceptual grounds to ask how attention can modulate both aware and unaware visual processing. In the aware condition, attention plays a critical role in determining the quality of representation of incoming information. On philosophical grounds, Block (2005) argues that awareness without attention is short-lived and vulnerable (“phenomenal awareness”), analogous to retinotopic fleeting memory (i.e. iconic memory for briefly presented visual stimuli); only with attention can awareness become stable and deep (“access awareness”), analogous to durable non-retinotopic memory (i.e. working memory). In neurobiological terms, when feedforward processing occurs among early visual areas, phenomenal awareness arises; only when recurrent interactions grow to include executive or mnemonic areas (frontal, prefrontal, and temporal cortex) can access awareness take place (Lamme, 2003). In other words, recurrent interactions between higher brain areas and visual areas are necessary for awareness, with attention playing a critical role.
In the unaware condition, does attention still play a critical role in determining how much we can process? In other words, can invisible information processing be modulated by attention? The critical test is to contrast invisible information processing under attended and unattended conditions (for a discussion of unattended vs. irrelevant stimuli, see Box 6). A recent study with CFS (Kanai et al., 2006) suggests that spatial attention could not modulate the strength of the tilt AE induced by invisible adaptors, whereas feature-based attention could. However, because they manipulated spatial attention by instructing the observers to attend to one of two spatial markers drawn on top of the Mondrian patterns, this may not have been effective in optimizing the unattended condition: attention to one spatial marker can still spill over to the other marker (Lavie, 1995; see also Box 6). Indeed, a recent study (Bahrami et al., 2007) made the perceptual load of the central task very high to deplete the attentional resources available in each trial, such that little attention would spill over to the task-irrelevant distractors. Using such a demanding task to load attention, they showed that attention in the foveal task strongly modulated retinotopic activity in V1 evoked by invisible objects. However, in the same study attention failed to modulate V1 activity for the noise stimuli used for CFS, making it difficult to interpret the positive results of attentional modulation of invisible objects. It thus remains unclear why attention fails to modulate neural activity for noise but succeeds in modulating neural activity for invisible objects. Future research should address whether and how attention dynamically modulates invisible information processing. For example, in the visual cortex, attentional modulation of visible stimuli increases from early to late processing stages, yet the attention effect in the LGN is larger than that in V1 (Kastner and Pinsk, 2004).
Whether this holds for invisible stimuli awaits empirical investigation. Interestingly, Bahrami et al. (2007) showed that the attentional modulation effect was larger in V1 than in V2 or V3. The time is ripe for further investigation.
It thus appears that explicit attention can modulate invisible information processing. Several questions remain open, though. In particular, it is unclear whether, and how, different cortical areas differ in their sensitivity to attentional modulation during suppression. In addition, it is not known how different types of attention, such as space-based attention (i.e. attention to locations) and feature-based attention (i.e. attention to features), might show different modulation properties. We suspect that attention is necessary for unconscious binding; moreover, without attention, unconscious processing of features is not possible.
Human mental life extends well beyond conscious experience. Although much has been learned about BR, the mechanism of information processing during suppression remains elusive. To understand this mechanism, we must understand the depth of information processing during suppression at behavioral, neural, and theoretical levels. In this paper, we have advocated the unconscious binding hypothesis: that binding of invisible features is possible, albeit susceptible to interference. Although this hypothesis is still in its infancy, the studies reviewed here provide important insights. First, low level features can be processed unconsciously, with the processing level modulated by awareness and attention. Second, high level representations of objects and faces in the ventral visual areas are dramatically suppressed, but tools and emotional faces can still activate dorsal areas and the amygdala, respectively. Third, invisible information can serve as an implicit cue to guide attention, which we refer to as “implicit attention”. Last, invisible information processing can in turn be modulated by external explicit attention. Understanding the mechanisms subserving invisible information processing will bring new insights into how the visual system operates without consciousness, as well as into the neural correlates of consciousness in general.
The notion that consciousness reflects subjective experience is central to nearly all theories of consciousness. On the other hand, the scientific quest for perception without awareness and its neural correlates requires establishing objective unawareness of the stimuli. In fact, the lack of an accepted measure of awareness has made any claim of perception without awareness controversial. We believe that to be objective, measures should be both reliable and valid (cf. Lovibond and Shanks, 2002). To be reliable, measures should not be contaminated by demand characteristics (i.e. an experimental artifact where observers change their behavior to conform to the experimenter’s expectations) or response bias (e.g. individual differences in reporting thresholds). To be valid, measures should truly tap into the presumed theoretical construct of awareness. In other words, assessment should be both relevant and sensitive to the question being investigated; at the same time, assessment should be sensitive only to aware but not unaware processes (Merikle and Reingold, 1992; Wiens and Ohman, 2002).
In practice, measures of awareness can be classified into two types: subjective (self-report) and objective (forced-choice). In subjective measurement, a report of seeing (or not seeing) the stimuli is taken as being aware (or unaware) of the stimuli; in objective measurement, better than chance (or around chance) performance in discriminating between alternative stimuli is regarded as being aware (or unaware) of the stimuli (Merikle et al., 2001). Subjective measures, albeit intrinsic to the concept of awareness, are potentially confounded by response bias (Green and Swets, 1966; Macmillan and Creelman, 1991): people who are under-confident tend to set a high criterion and report stimuli invisible even when the stimuli are above visibility thresholds, making subjective report an unreliable measurement. Instead, forced-choice procedures, which yield more criterion-independent measures of awareness, are routinely used in the quest for neural correlates of awareness (Eriksen, 1960; Holender, 1986). At the same time, to preserve the merits of subjective measures while avoiding their confounds (e.g. subjective criteria), it is also worthwhile to use signal detection theory to characterize behavioral performance with receiver operating characteristic curves in detection tasks (Evans and Azzopardi, 2007; Kunimoto et al., 2001; Pessoa et al., 2005). In general, for studies strongly based on the prerequisite that the stimuli are invisible, objective measures of awareness should be used. Moreover, validity issues should be considered carefully.
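For concreteness, the signal-detection quantities mentioned above can be computed from hit and false-alarm rates with the standard formulas d' = z(H) - z(FA) and c = -(z(H) + z(FA))/2, where z is the inverse of the standard normal CDF. The sketch below uses Python's standard library and made-up rates for illustration:

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, fa_rate):
    """Signal detection theory: sensitivity d' = z(H) - z(FA) and
    criterion c = -(z(H) + z(FA)) / 2, where z is the inverse of the
    standard normal CDF. d' indexes discriminability free of response
    bias, the factor that contaminates simple subjective reports."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate), -(z(hit_rate) + z(fa_rate)) / 2

# Made-up rates: an observer with above-chance sensitivity but a
# conservative criterion, who would often report stimuli as "unseen".
d, c = dprime_and_criterion(hit_rate=0.60, fa_rate=0.30)
print(round(d, 2), round(c, 2))  # 0.78 0.14
```

This illustrates why above-chance objective sensitivity (d' > 0) can coexist with subjective reports of not seeing (a conservative positive criterion c).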
Sensory systems are constantly adapting to changes in the environment and adjusting their sensitivities accordingly. In fact, adaptation is such a ubiquitous property that it occurs at multiple stages of processing and has been studied with multiple techniques, ranging from psychophysics to single unit recordings and fMRI. When measured with psychophysics, visual adaptation refers to the phenomenon that prolonged exposure to a visual stimulus (i.e. the adaptor) alters the visual system’s sensitivity to, or the appearance of, a subsequent related stimulus (i.e. the test), with the altered appearance called a visual aftereffect (AE). When measured with fMRI, adaptation refers to the observation of decreased neural activity for repeated versus novel stimuli (i.e. fMRI adaptation, Krekelberg et al., 2006; Lin, 2007). Specifically, adaptation is termed pattern adaptation if an effective pattern (e.g. tilt) serves as an adaptor to reduce the responsiveness to a subsequent test, and contrast adaptation if an effective contrast image (usually but not necessarily a high contrast one) serves as an adaptor to reduce contrast sensitivity to a subsequent test. A special category of AE, called the afterimage (AI), does not require a particular test to observe the effect; in other words, an image continues to appear in one’s vision after the original image has ceased.
What is the neural mechanism of adaptation? Generally, adaptation to bright and dark environments (i.e. light adaptation and dark adaptation, respectively) is believed to occur entirely in the retina (Shapley and Enroth-Cugell, 1984). Similarly, the negative AI (see also Box 3f) is largely attributed to retinal mechanisms with some contribution from post-retinal processes (e.g. Shimojo et al., 2001). Yet there are at least two reasons to believe that, in general, AEs due to pattern and contrast adaptation are mainly a cortical phenomenon with some limited subcortical contributions (for a review, see Graham, 1989). First, AE in one eye affects the response to an un-adapted stimulus presented to the other eye (Gibson, 1937), implying a binocular mechanism. Although neurons in the lateral geniculate nucleus (LGN) display interocular transfer of information (e.g. with their receptive field surrounds, McClurkin and Marrocco, 1984; Sillito et al., 1994), and the LGN is reciprocally connected to other thalamic nuclei that contain binocular neurons (e.g. the perigeniculate nucleus, Steriade and Deschenes, 1984), excitatory binocular processing within the geniculocortical pathway occurs first in primary visual cortex (V1, Hubel, 1960). Second, AE is orientation specific such that a horizontal adapting grating does not influence the threshold or the apparent spatial frequency of vertical test gratings (Blakemore and Nachmias, 1971). Critically, orientation selectivity and tuning are not found before V1 (Hubel and Wiesel, 1962, 1968). Thus, AEs have been used to infer the properties of cortical feature analyzers (Gibson and Radner, 1937). On the other hand, AEs do have some subcortical mechanisms. For instance, most neurons in the LGN still show adaptation to contrasts of drifting sinusoidal gratings, albeit to a lesser degree than neurons in visual cortex (Ohzawa et al., 1985).
More strikingly, a recent study in macaque monkeys found that magnocellular (but not parvocellular) LGN neurons showed strong contrast adaptation that originated in the ganglion cells, pushing the mechanisms of contrast adaptation to subcortical pathways (Solomon et al., 2004; but see Mante et al., 2005). That said, spatial frequency specific contrast adaptation, and presumably other types of pattern specific visual adaptation, are still believed to originate in V1 (Duong and Freeman, 2007), which is selective for visual features such as orientation, direction, position, and speed. For example, it has been shown that neural activity in V1 is substantially reduced after a few seconds of visual stimulation with an effective pattern, which is thought to be the neural substrate of a variety of perceptual AEs. Similarly, motion adaptation in motion area V5 (Culham et al., 1999; He et al., 1998; Theoret et al., 2002; Tootell et al., 1995) and in early visual areas (V1, V2 and V3, which possess direction-selective neurons, Huk et al., 2001) is thought to be responsible for the motion AE.
Based on its neural underpinnings, adaptation, “the psychologist’s microelectrode” (Frisby, 1979), acts as a probe for inferring the relative contributions of V1 and other visual areas to visual awareness (e.g. orientation-selective adaptation, He et al., 1996; He and MacLeod, 2001). Specifically, when similar or equal strength of adaptation is found for visible and invisible inputs, the neural correlates of such unperturbed adaptation are inferred to be uncorrelated with visual awareness.
There are numerous kinds of visual aftereffects (AEs). Throughout this article, we mention several of them in the context of binocular rivalry (BR) and continuous flash suppression (CFS). Understanding these AEs is important to appreciate how they serve as tools in probing the depth of unconscious processing. Toward that end, we describe several important AEs below in order of their appearance in the text. In addition, we have prepared a webpage with demonstrations. The URL for that webpage is http://zhichenglin.googlepages.com/demonstrations.
Prolonged adaptation to an oriented visual stimulus causes a subsequent image to appear repulsed away from the adapting orientation (Gibson and Radner, 1937); this is the tilt aftereffect (TAE). For example, after prolonged viewing of an inclined grating, a vertically oriented test grating appears tilted in the opposite direction. The TAE also occurs after adaptation to illusory contour tilt (Paradiso et al., 1989). The TAE is believed to result from altered patterns of activity in orientation-selective neurons in V1 and V2, most likely due to inhibitory interactions (Blakemore et al., 1970; Carpenter and Blakemore, 1973; Magnussen and Kurtenbach, 1980a, b; Morrone et al., 1982; Wenderoth and Johnstone, 1987).
This AE comes in two forms: contrast threshold elevation and spatial frequency shift. Contrast threshold elevation is measured with the contrast sensitivity function (CSF; Blakemore and Campbell, 1969). The CSF is determined by finding the contrast threshold (i.e. the minimal amount of contrast needed to make a grating look striped) at different spatial frequencies; a typical finding is that the threshold is lowest (i.e. sensitivity is highest) at intermediate spatial frequencies, around 4 to 5 cycles per degree of visual angle. After prolonged exposure to a high-contrast grating of a particular spatial frequency, more contrast is required to detect a grating of the same spatial frequency than before adaptation (i.e. the contrast threshold is elevated), whereas contrast thresholds for quite different spatial frequencies are unaffected. In other words, the minimal intensity difference between light and dark bars needed to detect the grating is elevated. This contrast threshold elevation occurs only for gratings similar to the adapting pattern in orientation. Spatial frequency shift (Blakemore and Sutton, 1969), on the other hand, refers to the finding that prolonged adaptation to a high-contrast grating causes a subsequent grating to appear shifted away from the adapting spatial frequency: a grating with spatial frequency higher (or lower) than that of the adaptor appears with an even higher (or lower) spatial frequency than it actually has. This AE is generally attributed to neural activity in V1/V2 (De Valois et al., 1982; Maffei and Fiorentini, 1973).
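The quantities involved here are simple to compute. The sketch below is a hypothetical illustration: the CSF functional form and all parameter values are invented for demonstration, not fitted to data. It computes Michelson contrast and a toy band-pass CSF peaking near 4 cycles/degree; the contrast threshold, the reciprocal of sensitivity, is then lowest at intermediate spatial frequencies.

```python
# A hypothetical numerical sketch of the concepts above. The CSF shape and
# all parameter values are invented for illustration, not fitted to data.
import math

def michelson_contrast(l_max, l_min):
    """Michelson contrast of a grating: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)

def csf(spatial_freq, peak_freq=4.0):
    """Toy band-pass contrast sensitivity function peaking near 4 c/deg."""
    return 200.0 * (spatial_freq / peak_freq) * math.exp(1.0 - spatial_freq / peak_freq)

# Contrast threshold is the reciprocal of sensitivity, so the threshold is
# lowest (sensitivity highest) at intermediate spatial frequencies.
threshold = {f: 1.0 / csf(f) for f in (1.0, 4.0, 16.0)}
```

Contrast threshold elevation after adaptation would correspond to a selective increase of `threshold` at (and near) the adaptor's spatial frequency.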
In the McCollough effect (McCollough, 1965), prolonged exposure to a pair of colored gratings (e.g. a vertical green grating and a horizontal red grating) causes a pair of colorless test gratings to appear tinged with the opposite colors, contingent on orientation (e.g. the vertical grating appears reddish whereas the horizontal grating appears greenish). The AE can last for a long time, but it requires a period of adaptation to manifest. Although its exact neural mechanisms are still disputed, accumulating evidence suggests that they may be located early in the cortical visual pathways, probably in V1 (e.g. Humphrey and Goodale, 1998; but see Siegel and Allan, 1992 for an associative learning explanation).
Prolonged adaptation to a regularly moving stimulus causes a subsequent, physically stationary test pattern to appear to move in the opposite direction (Addams, 1834; Mather et al., 1998). Known as the motion aftereffect (MAE), it comes in several forms. One type is the translational (i.e. linear) MAE. A well-known example is the waterfall illusion: after prolonged viewing of a waterfall, the stationary rocks beside the falls appear to move upward. Whether the translational MAE reflects low-level or high-level motion mechanisms depends on the nature of the test pattern: MAE measured with a dynamic test pattern is considered to reflect higher stages of motion processing than MAE measured with a static test pattern (Fang and He, 2004; Nishida et al., 1997). Another type of MAE is the spiral AE (Plateau, 1849): after adaptation to a rotating spiral, a subsequent stationary spiral (or other stationary pattern) appears to move in the opposite direction. Still another type is the plaid-induced MAE: motion stimuli composed of moving gratings of different orientations are perceived as a coherent plaid pattern moving with a single direction and speed, and prolonged exposure to such a moving plaid can also generate an MAE similar to the translational MAE. A related type is the transparent MAE: bivectorial motion stimuli composed of two sets of randomly positioned dots moving in different directions and at different speeds are perceived as two overlapping surfaces moving transparently over each other (if the dots are locally paired, however, the two dot fields are not segmented into two separate surfaces but are perceived as a single surface moving with the vector-average velocity of the two component vectors; Qian et al., 1994; Snowden and Verstraten, 1999). Adaptation to such transparent motion results in an MAE in the direction opposite to the vector sum of the two inducing patterns (Riggs and Day, 1980; Verstraten et al., 1994).
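The vector-combination rules just mentioned (a vector-average percept for locally paired dots, and an MAE opposite the vector sum of the inducing motions) amount to simple arithmetic. The sketch below illustrates them with two hypothetical motion vectors; it is a demonstration of the arithmetic, not a model from the literature.

```python
# Illustrative sketch with hypothetical motion vectors (vx, vy):
# two dot fields moving obliquely up-left and up-right at equal speed.
import math

def vector_sum(v1, v2):
    return (v1[0] + v2[0], v1[1] + v2[1])

def vector_average(v1, v2):
    s = vector_sum(v1, v2)
    return (s[0] / 2.0, s[1] / 2.0)

def direction_deg(v):
    """Direction of a 2-D motion vector, in degrees (0 = rightward)."""
    return math.degrees(math.atan2(v[1], v[0]))

left_field = (-1.0, 1.0)
right_field = (1.0, 1.0)

# Locally paired dots are seen as one surface moving at the vector
# average: here, straight up (90 degrees).
percept = vector_average(left_field, right_field)

# The MAE direction is opposite the vector sum of the two inducing
# motions: here, straight down (-90 degrees).
s = vector_sum(left_field, right_field)
mae = (0.0 - s[0], 0.0 - s[1])
```

With equal-speed components, as here, the vector-average and vector-sum directions coincide; they differ once the two fields move at different speeds.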
In general, the neural mechanisms of MAE include area V5 (Culham et al., 1999; He et al., 1998; Theoret et al., 2002; Tootell et al., 1995) and early visual areas (V1, V2 and V3, which possess direction-selective neurons, Huk et al., 2001).
In a triangular-wave spatial luminance grating, the locations of peak luminance appear as thin, bright stripes, with luminance falling off gradually and symmetrically on both sides of these peaks. After a few moments of adaptation, however, alternating light and dark illusory bars appear to be illuminated from either the right or the left, resembling a square-wave grating with rounded edges (Leguire et al., 1981). In other words, adaptation makes a triangular-wave grating appear like a square-wave grating. This AE may reflect the operation of cortical phase-selective mechanisms.
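For concreteness, one cycle of each of the two luminance profiles can be sketched as follows; the sample count and the 0-to-1 luminance range are arbitrary choices for illustration.

```python
# A hypothetical sketch of the two luminance profiles (one cycle each);
# the sample count and the 0-to-1 luminance range are arbitrary choices.

def triangular_wave(phase):
    """Triangular luminance profile: peak at phase 0.5, troughs at 0 and 1."""
    return 1.0 - 2.0 * abs(phase - 0.5)

def square_wave(phase):
    """Square-wave profile that the adapted percept comes to resemble."""
    return 1.0 if 0.25 <= phase < 0.75 else 0.0

n = 8
tri = [triangular_wave(i / n) for i in range(n)]  # gradual, symmetric ramps
sq = [square_wave(i / n) for i in range(n)]       # abrupt light/dark bars
# Both profiles share the same mean luminance; only the shape differs.
```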
After exposure to an image, an illusory percept continues to appear in one's vision even though the original stimulus has ceased (Craik, 1940). Known as the afterimage (AI), it comes in two types: positive and negative. In a positive AI, bright areas remain bright and dark areas remain dark; in a negative AI, bright areas turn dark and dark areas turn bright. Most research has focused on negative AIs. As with color AEs, if the adaptor is a saturated color, adaptation to it will generate an illusory percept of the complementary color on a uniform gray field. Unlike color AEs, however, most AIs last for only a few seconds to a minute: positive AIs, thought to be associated with retinal latency, last only tens of milliseconds; negative AIs, attributed to photoreceptor fatigue due to photopigment bleaching, can last longer (e.g. tens of seconds). Negative AIs are largely due to retinal mechanisms, with some contribution from post-retinal processes (e.g. Shimojo et al., 2001).
Anatomically, the parvo and magno cells in the lateral geniculate nucleus (LGN) originate from morphologically distinct retinal ganglion cells, midget cells and parasol cells, respectively (Perry et al., 1984). Note that the initial letters are reversed across the two stages: parvo (P) cells receive input from midget cells, whereas magno (M) cells receive input from parasol cells, so an unqualified "M system" could in principle refer to either midget or magno cells. The convention is to use P and M to refer to parvo and magno, respectively. The names of the channels derive from the relative sizes of the cells in the segregated laminae of the dorsal LGN (dLGN) to which they project: P cells have small cell bodies, thin axons, and slow axonal conduction speed, whereas M cells have large cell bodies, thick axons, and fast axonal conduction speed (Schiller and Malpeli, 1978). The P and M pathways are segregated in the LGN between its four dorsal layers and two ventral layers, respectively. This segregation continues up to primary visual cortex (V1), with the P pathway terminating primarily in layers 4A and 4Cβ and the M pathway in layers 4Cα and 6 (Fitzpatrick et al., 1985). The P and M pathways are preferentially associated with the ventral and dorsal cortical pathways, respectively; however, they are not confined exclusively to either pathway (Felleman and Van Essen, 1991; Merigan and Maunsell, 1993). For example, visual cortical area 4 (V4), including its ventral and dorsal parts (Hansen et al., 2007), receives mixed input from both the M and P systems. Besides these geniculo-striate pathways, it should be noted that the dorsal stream, especially the posterior parietal cortex (Pare and Wurtz, 1997), also receives visual input from the superior colliculus (SC) through the pulvinar (i.e. a subcortical projection).
Functionally, the P pathway is color sensitive, is tuned to higher spatial frequencies, is sensitive to lower temporal frequencies, and has lower contrast sensitivity; the M channel responds very poorly to isoluminant stimuli, even when moving, but is responsive to lower spatial frequencies, is sensitive to higher temporal frequencies, and has higher contrast sensitivity (Schiller and Malpeli, 1978). Thus, a common strategy to preferentially activate either pathway is to stimulate the P pathway with stimuli that are defined by color or have high spatial frequency, and the M pathway with stimuli that have low contrast or low spatial frequency. However, it should be noted that in reality the responses of the two pathways to most visual stimuli overlap significantly; one has to go to the very extremes of the response spectrum to get good differential activation. Additionally, although P cells have low contrast sensitivity, a high-contrast stimulus will not activate them preferentially (M cells respond well to high contrasts). Similarly, low temporal frequencies or luminance-defined (rather than isoluminant) stimuli will not evoke preferential activation of P cells (the M system has high sensitivity, and it continues to respond to isoluminance at low spatial frequencies, albeit at a reduced rate). Moreover, stimuli that may preferentially activate individual P or M cells do not necessarily preferentially activate the P or M cell populations as a whole. For example, although individual P cells have lower contrast sensitivity than individual M cells, collectively they match the performance of M cells because there are so many more of them. For this reason, an M-cell lesion will not reduce behavioral contrast sensitivity (J. H. R. Maunsell, personal communication, December 10, 2007).
Efficient computation of perceptual priority is a hallmark of adaptive behavior for at least two reasons. First, while sensory inputs are massive, attentional capacity is limited. Competition for limited representational resources calls for a gating mechanism to prioritize relevant information and thus reduce information overload. Such competition is biased not only by sensory saliency, whose weight decreases in the neural hierarchy, but also by visual attention, whose weight increases in the hierarchy (Kastner and Pinsk, 2004; O'Connor et al., 2002; Serences and Yantis, 2006). Second, to interpret sensory inputs, it is necessary to first assign features to either figure or ground and then integrate multiple features across space and time for perceptual coherence. This is further constrained by the distinct preferences of neurons in the hierarchy: neurons in early visual areas respond to small areas of visual space (receptive fields, RFs) and code simple features (e.g. orientation and spatial frequency), whereas neurons in later areas have large RFs and code more complex features. Attention serves to integrate distributed neural representations of features to form coherent object representations (Treisman, 1996). Two distinct forms of attention subserve such adaptive behavior. A knock on the door, for example, may distract you from focusing on a paper; or you may decide to check the time because a meeting is coming up. In the former case, salient events (e.g. transient changes in luminance or contrast) capture attention; this is termed bottom-up (or transient/stimulus-driven/exogenous/reflexive) attention. In the latter, goals and expectations drive attention; this is dubbed top-down (or sustained/goal-driven/endogenous/voluntary) attention. Although orienting of attention is usually accompanied by eye movements (i.e. overt attention), covert orienting of attention without eye movements is possible, especially in laboratory settings (Posner, 1980).
Given its important role in performing a variety of tasks, it is not surprising that attention is not a single entity, but a set of interacting cortical and subcortical processes. First, at the cortical level, the sources of top-down and bottom-up attention are generally believed to comprise two networks: 1) top-down attention originates from the dorsal posterior parietal cortex (e.g. the intraparietal sulcus) and the frontal cortex (e.g. the frontal eye field), forming the so-called dorsal frontoparietal network; 2) bottom-up attention stems from the temporoparietal junction and the ventral frontal cortex (largely lateralized to the right hemisphere), constituting the so-called ventral frontoparietal network (Corbetta and Shulman, 2002). Second, at the subcortical level, several regions have been identified as important for the control of attention. For example, the visual grasp reflex, reflexively orienting the eyes toward salient events in the visual periphery, is supported by phylogenetically primitive midbrain circuits in all vertebrates (Ingle, 1973). Later studies pinpointed the superior colliculus (SC) in the midbrain and the pulvinar in the thalamus as important for both overt and covert attention. Specifically, the retinal projection to the SC is critical for attentional orienting and involuntary capture of attention (Rafal et al., 1991). In addition, the amygdala plays an important role in orienting attention by projecting to cholinergic and noradrenergic cells, which are capable of exerting widespread effects on attention (Aston-Jones et al., 1999), and to cells in cortical sensory regions (Amaral et al., 1992). Third, the cortical and subcortical attention networks interact extensively; attentional selectivity can be achieved through an orchestration of subcortical reflex circuits by cortical processes that can activate or inhibit them (Easton, 1973).
Indeed, anatomically the subcortical and cortical attention areas are inter- and intra-connected. For instance, the SC receives direct descending inputs from cortical visual areas and the dorsal frontoparietal network; it returns its outputs through numerous thalamic sites, including the visual components of the thalamus (e.g. the LGN and the pulvinar). On the other hand, the pulvinar (especially its ventral division) receives its major inputs from the visual cortex and returns virtually all of its output to the cortex, serving as a hub for cortico-cortical communication (for a review, see Shipp, 2004). Note that the dorsal pulvinar (similar to the "medial pulvinar" of histological brain atlases) has connections with the cingulate, frontal, and (auditory) superior temporal areas, so its range of inputs is probably just as diverse as those to the SC (S. Shipp, personal communication, December 11, 2007).
In the attention literature, it is crucial to distinguish between unattended and irrelevant stimuli: irrelevant stimuli are not necessarily unattended. This makes it important to consider whether a manipulation of selective attention is adequate to render irrelevant stimuli truly unattended. For example, in a typical attention task where observers have to identify a central target while ignoring flanking distractors (which can be compatible or incompatible with the target), several steps should be considered to make attentional selection efficient (Lachter et al., 2004; Miller, 1991; Yantis and Johnston, 1990):
In particular, the load theory (Lavie, 1995, 2005), as described in item 6, specifies how capacity limitation determines the level of distractor processing, and serves as a powerful paradigm to render irrelevant stimuli either unattended or involuntarily attended while keeping the distractors constant across conditions. The key tenet is that as long as the central task does not consume all or most of the available capacity, one cannot help but process the distractors (e.g. Volker et al., under review). Importantly, however, when steps like those listed above are taken to optimize selection efficiency, it is possible to render irrelevant stimuli unattended (as indexed by minimal processing of distractors) even under low perceptual load conditions (Lachter et al., 2004).
We thank E. A. DeYoe, V. A. F. Lamme, G. E. Legge, J. H. R. Maunsell, L. Pessoa, P. H. Schiller, S. Shipp, and K. Tanaka for helpful comments; B. Bahrami, F. Fang, and M. Williams for helpful discussion about their research; M. Reinke for proofreading. Supported by Graduate School Fellowship, Graduate Research Partnership Program Fellowship, Student Research Award from the University of Minnesota (Z.L.), the James S. McDonnell Foundation, and the US National Institutes of Health (S.H.).
1The inferotemporal cortex (IT) is composed of the middle temporal gyrus and inferior temporal gyrus in humans. Although "inferior temporal cortex" may strictly denote the ventral part of the inferotemporal cortex in humans, in practice the distinction is rarely observed, and both terms are used interchangeably (K. Tanaka, personal communication, November 26, 2007).