Single cell recordings in monkey inferior temporal cortex (IT) and area V4 during visual search tasks indicate that modulation of responses by the search target object occurs in the late portion of the cell’s sensory response (Chelazzi et al. in J Neurophysiol 80:2918–2940, 1998; Cereb Cortex 11:761–772, 2001) whereas attention to a spatial location influences earlier responses (Luck et al. in J Neurophysiol 77:24–42, 1997). Previous computational models have not captured differences in the latency of these attentional effects and yet the more protracted development of the object-based effect could have implications for behaviour. We present a neurodynamic biased competition model of visual attention in which we aimed to model the timecourse of spatial and object-based attention in order to simulate cellular responses and saccade onset times observed in monkey recordings. In common with other models, a top-down prefrontal signal, related to the search target, biases activity in the ventral visual stream. However, we conclude that this bias signal is more complex than modelled elsewhere: the latency of object-based effects in V4 and IT, and saccade onset, can be accurately simulated when the target object feedback bias consists of a sensory response component in addition to a mnemonic response. These attentional effects in V4 and IT cellular responses lead to a system that is able to produce search scan paths similar to those observed in monkeys and humans, with attention being guided to locations containing behaviourally relevant stimuli. This work demonstrates that accurate modelling of the timecourse of single cell responses can lead to biologically realistic behaviours being demonstrated by the system as a whole.
The biased competition hypothesis has stimulated much interest in the visual attention and modelling literature. This theory suggests that competition between features in the cortical visual hierarchy is subject to a number of biases, such as “top-down” target object feedback from working memory in prefrontal cortex during visual search (Desimone 1998; Desimone and Duncan 1995; Duncan and Humphreys 1989; Duncan et al. 1997). Influential modelling work using biased competition principles produced compelling results in terms of simulating both cellular and whole system behaviours (e.g. Deco 2001; Deco and Lee 2002; Corchs and Deco 2002; Usher and Niebur 1996). However, there is scope to further explore biologically plausible computational systems based on this hypothesis. In particular, there is a need to accurately simulate the time course of attentional modulation of neuronal responses because knowledge of a target object during visual search does not immediately influence initial sensory responses. We present simulations of monkey cell responses within a system capable of biologically realistic behaviours. This shows how these cellular responses collectively might lead to visual behaviours in monkeys and humans.
A number of monkey single cell studies in ventral stream cortical areas supported the biased competition hypothesis (Chelazzi et al. 1993, 1998, 2001; Miller et al. 1993; Moran and Desimone 1985; Motter 1993, 1994a, b; Reynolds et al. 1999). These experiments typically involved placing multiple stimuli within a cell’s receptive field during a visual search task for a memorised target object. A preferred stimulus for a neuron is one that causes a strong response from the cell when presented alone in its receptive field. Conversely, a non-preferred stimulus causes only a weak response when presented alone in the receptive field. When a preferred and a non-preferred stimulus are presented together in a receptive field, the cell’s response is determined by which object matches the previously memorised target object, i.e. cellular responses are influenced by the monkey’s top-down attentional goal during visual search. These target object effects take time to develop, with the initial sensory response to the search array being indifferent to which of the objects is attended. Then, after a period of 150–200 ms, the influence of the non-target object is effectively filtered out of the cell’s receptive field in anterior inferior temporal cortex (Chelazzi et al. 1993, 1998) and extrastriate area V4 (Chelazzi et al. 2001; Motter 1994a, b). When the preferred stimulus is the target, the response approaches that when this stimulus is presented alone. However, if the non-preferred stimulus is attended, responses are severely suppressed despite the presence of the preferred stimulus in the receptive field. Hence, there is a significant modulation of responses according to the target object in the late portion of the cell’s response. In contrast to the protracted development of this target object effect, spatial attention appears to influence earlier responses.
Attention to a memorised spatial location that is inside a V4 cell’s receptive field modulates baseline responses preceding stimulus onset. This effect continues so that the earliest stimulus-evoked response is modulated by attention (Luck et al. 1997).
Previous computational models have not captured differences in the latency of these attentional effects and yet the more protracted development of the object-based effect could have implications for behaviour. Accurate modelling of the timing of cellular behaviours potentially leads to systems level behaviours that are more biologically realistic. We aimed to model the time course of spatial and object-based effects in cellular responses in V4 and inferior temporal cortex (IT) in order to simulate responses observed in the monkey single cell recordings and then to simulate saccadic behaviours (saccade onset times and search scan paths). We adopt a biologically plausible active vision paradigm in which the retina moves around the scene, i.e. attention is overt and saccades are made, whereas most previous models of visual attention (e.g. Deco and Lee 2002; Itti and Koch 2000) use covert attention where fixation and the retina are fixed in position. Our model makes autonomous decisions to saccade based on the activity of its cells. By simulating attentional effects observed in single cell recordings under conditions of visual search, the behaviour of cells within our model is biologically constrained. We have previously reported the search scan path characteristics of the system (Lanyon and Denham 2004a), showing that scan path behaviour mimics that observed in humans (Scialfa and Joffe 1998; Williams and Reingold 2001; Williams 1967) and monkeys (Motter and Belky 1998b) during active visual search for a feature conjunction target. Scan paths are guided to locations containing potential target features due to the attentional effects happening at the cellular level of the model. We have not previously presented the details of these cellular behaviours and the results of simulating the attentional effects and saccade latencies observed in monkey cell recordings. 
Here we report these single cell simulations in order to demonstrate that accurate modelling of single cell responses leads to biologically realistic behaviours being demonstrated by the system as a whole. Also, we show that the top-down prefrontal object bias to the ventral visual stream is likely to be more complex than modelled elsewhere.
We extend the seminal biased competition modelling approach of Deco and colleagues (Deco 2001; Deco and Lee 2002; Rolls and Deco 2002, Chap. 9), in which attention arises as a result of the on-going competitive dynamics of the system. Neuronal activities within each cortical area are defined by differential equations describing inputs from other neurons in the same cortical area and from those in other regions. Simulations are run by iteratively updating activities over the time of the fixation. Hence, neurons in various cortical regions interact dynamically with one another. Within a particular region, neurons interact in a competitive manner. Using this dynamic systems approach, the evolution of neuronal activity across the cortical network can be examined over time. Similar to Deco’s model, our model comprises a ventral pathway for feature/object processing and a dorsal pathway relating to spatial processing (cf. Milner and Goodale 1995; Ungerleider and Mishkin 1982), as depicted in Fig. 1. However, in contrast to Deco’s model, spatial and object-based biases to the system’s dynamics operate concurrently here. Other details of the models, such as methods of feature detection and competitive processes, also differ. Our model is described formally in the “Appendix”.
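The style of iterative update used in this class of model can be sketched in a few lines of Python. This is a minimal illustration of forward-Euler integration of biased competition between two pools, not the model's actual equations from the "Appendix": the activation function, weights, inputs and time constant are assumed values chosen only to show the mechanism.

```python
import numpy as np

def f(a):
    """Sigmoid activation converting net input to firing rate."""
    return 1.0 / (1.0 + np.exp(-a))

def simulate(n_steps=400, dt=1.0, tau=20.0):
    """Forward-Euler integration of two mutually inhibiting pools.

    All parameter values here are illustrative assumptions, not the
    parameters given in the paper's Appendix.
    """
    x = np.zeros(2)                    # activities of two competing pools
    sensory = np.array([1.0, 1.0])     # equal bottom-up drive to both pools
    bias = np.array([0.3, 0.0])        # top-down bias favouring pool 0
    w_inhib = 1.5                      # strength of mutual inhibition
    for _ in range(n_steps):
        inhibition = w_inhib * x[::-1]               # each pool inhibits the other
        dx = (-x + f(sensory + bias - inhibition)) / tau
        x = x + dt * dx                              # Euler step over the fixation
    return x

x = simulate()                         # the biased pool wins the competition
```

Despite equal sensory input, the pool receiving the top-down bias settles at a higher activity and partially suppresses its competitor, which is the essence of biased competition.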
In the model’s ventral stream, colour and orientation features are processed in a feed-forward hierarchical fashion from the retina through area V1 to V4 and then IT. Receptive field sizes increase in a biologically constrained fashion through V4 and IT, and biased competition operates between different features and different objects. The IT module represents anterior areas of IT, such as area TE, where receptive fields span much of the retinal image and populations encode objects in an invariant manner (Wallis and Rolls 1997). Within the modelled IT, objects are represented by excitatory pyramidal cells and competition between objects is mediated by a pool of inhibitory interneurons. Areas V1 and V4 encode features in a retinotopic manner with a feature map (consisting of excitatory pyramidal cells) for each feature represented. V4 neurons are known to be functionally segregated (Ghose and Ts’O 1997) and the area is involved in the representation of colour as well as form (Zeki 1993). In the modelled V4, competition occurs between different features of the same type (colour or orientation) and is mediated by an array of inhibitory interneuron pools that are retinotopically organised and spatially linked with the excitatory cells representing the features. Hence, competition operates locally between different colours and different orientations at the same retinotopic location, providing the competition between features within the same receptive field that has been found in single cell recordings (Chelazzi et al. 1993, 1998, 2001; Luck et al. 1997; Moran and Desimone 1985).
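The local V4 circuit described above, excitatory feature cells reciprocally coupled to a shared inhibitory pool at one retinotopic location, can be sketched as follows. The threshold-linear activation, weights and time constant are illustrative assumptions, not the Appendix values.

```python
import numpy as np

def rate(a):
    """Threshold-linear activation."""
    return np.maximum(a, 0.0)

def step(e, i, drive, dt=1.0, tau=20.0, w_ei=1.0, w_ie=1.2):
    """One Euler step for excitatory feature cells `e` sharing a single
    inhibitory pool `i` at one retinotopic location. Weights and time
    constant are hypothetical values for illustration."""
    de = (-e + rate(drive - w_ie * i)) / tau     # inhibition from the shared pool
    di = (-i + rate(w_ei * e.sum())) / tau       # pool driven by all feature cells
    return e + dt * de, i + dt * di

e, i = np.zeros(2), 0.0
drive = np.array([1.0, 0.6])     # two colours present at the same location
for _ in range(600):
    e, i = step(e, i, drive)
# The shared pool lets the more strongly driven feature suppress the
# weaker one, amplifying the difference between their responses.
```

Because the inhibitory pool is driven by the summed activity of all feature cells at that location, any bias favouring one feature is amplified at the expense of the others, giving competition between features within the same receptive field.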
Due to its sustained activity (Miller et al. 1996) and the fact that attention requires working memory (de Fockert et al. 2001), prefrontal cortex has been suggested as the source of a working memory object-related bias to IT. This idea has been incorporated in previous models (e.g. Deco 2001; Deco and Lee 2002; Corchs and Deco 2002; Usher and Niebur 1996). Similarly, in our model object-based attention results from a bias from prefrontal cortex (possibly ventral prefrontal area 46), which represents the search target object, fed back to influence the competition between objects in IT. In common with other similar models (e.g. Deco 2001; Deco and Lee 2002; Corchs and Deco 2002; Usher and Niebur 1996), the prefrontal bias is an external signal to the system and the dynamics of prefrontal cortex are not modelled here. It is assumed that prefrontal cells would be activated by feedforward sensory information from IT (future extensions to the model could include incorporating prefrontal cortex in the dynamics of the system and allowing autonomous determination of the target object. In this regard, the model could be extended to incorporate prefrontal modelling work such as that described by Deco and Rolls 2003, or Stemme et al. 2007). The prefrontal signal to IT acts as a bias which influences the competition between objects so that, eventually, the target object wins the competition and other objects are suppressed. Hence, the initial sensory response in IT tends to reflect the sensory information received via V4 but, over time, the prefrontal bias has the effect of causing the target object to be most strongly represented. IT feeds back to V4 so that, as the target object wins the competition within IT, the feedback bias results in target features receiving a competitive advantage in the competition within V4. Target features become enhanced and non-target features become suppressed in parallel across V4.
This results in an enhancement of target features across V4, similar to that recorded in monkey cells (McAdams and Maunsell 2000; Motter 1994a, b).
Previous models have tended not to simulate the latency of object-based effects in cells’ responses because the prefrontal bias was modelled as a static mnemonic value over time, which results in target object effects beginning from the onset of the sensory response. We incorporate a sensory component in the prefrontal response, which reflects the sensory response latency in prefrontal cortex. Following presentation of stimuli in a search array, activity of relevant prefrontal neurons increases beyond the level required to hold an object in memory. Identification of a target not already held in working memory begins approximately 135 ms after onset of the search array in prefrontal cortex (Hasegawa et al. 2000) and a match response discriminating a remembered target versus non-targets begins 110–120 ms after stimulus onset (Everling et al. 2002; Miller et al. 1996). Prefrontal cells also determine, from ~130 ms after presentation of an object, whether it matches the category of another object remembered during a delay (Freedman et al. 2003). Evidence for a delay in prefrontal signals biasing IT comes from experiments in split-brain monkeys, where the latency of IT response to top-down information from prefrontal cortex (178 ms) is longer than that resulting from bottom-up information (73 ms: Tomita et al. 1999). We model the prefrontal bias signal as having two components: a sensory response component, whose strength builds over time according to a sigmoid function reflecting the prefrontal sensory response latency, and a sustained mnemonic component that holds the target object in working memory.
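The two-component bias signal can be sketched as follows. The latency parameter name echoes the τsig of Eq. 27 in the "Appendix", but its value here, and the slope, gain and mnemonic level, are illustrative assumptions rather than the fitted model parameters.

```python
import math

def prefrontal_bias(t_ms, tau_sig=150.0, slope=0.05,
                    sensory_gain=1.0, mnemonic=0.2):
    """Two-component target-object bias at time t_ms after array onset.

    `tau_sig` echoes the latency parameter of Eq. 27 in the Appendix;
    all numeric values here are hypothetical, for illustration only.
    """
    # Sustained mnemonic component: constant working-memory signal.
    # Sensory component: sigmoid that builds as the prefrontal sensory
    # response develops, delaying the object-based effect downstream.
    sensory = sensory_gain / (1.0 + math.exp(-slope * (t_ms - tau_sig)))
    return mnemonic + sensory

early = prefrontal_bias(0.0)     # before the prefrontal sensory response
late = prefrontal_bias(300.0)    # after it has fully developed
```

Shifting `tau_sig` moves the sigmoid in time and hence the onset of the object-based effect, which is how the earlier and later effects discussed below are simulated.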
The model’s dorsal stream consists of a retinotopically organised parietal module, representing the lateral intraparietal area (LIP), in which competition operates between different spatial locations and the location becoming most active wins the competition to attract attention and, hence, becomes the target of the next saccade. LIP receives retinotopic input from V4 so that, initially, all stimulus locations are represented equally strongly within LIP. As target features begin to win the competition in V4, these locations in LIP receive a stronger bias from V4 so that they have a competitive advantage in the competition in LIP. Hence, possible target locations become enhanced and LIP acts as an indicator of behavioural relevance: it is known that the monkey LIP represents the behavioural and attentional significance of stimuli (Colby et al. 1996; Gottlieb et al. 1998; Kusunoki et al. 2000; Bisley and Goldberg 2003; Toth and Assad 2002). The influence of this process in guiding search scan paths was examined by Lanyon and Denham (2004a), where the prefrontal bias relating to the novelty value of locations in the scene was also discussed in relation to mediating ‘inhibition of return’ in the scan path.
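The spatial competition in LIP can be sketched in the same style. The lateral connectivity (each location inhibited by the mean activity of the others) and all constants are illustrative assumptions; the point is only that a location whose V4 input is enhanced by target features wins the competition and becomes the saccade target.

```python
import numpy as np

def lip_competition(v4_input, n_steps=500, dt=1.0, tau=20.0, w=0.8):
    """Retinotopic LIP map: each location is excited by V4 input and
    inhibited by the mean activity of the other locations.
    Connectivity and constants are hypothetical, for illustration."""
    x = np.zeros_like(v4_input)
    n = len(v4_input)
    for _ in range(n_steps):
        lateral = w * (x.sum() - x) / (n - 1)     # inhibition from other locations
        dx = (-x + np.maximum(v4_input - lateral, 0.0)) / tau
        x = x + dt * dx                            # forward Euler step
    return x

# Three stimulus locations; the third contains target features, so its
# V4 input is enhanced once object-based attention has developed.
lip = lip_competition(np.array([1.0, 1.0, 1.3]))
saccade_target = int(np.argmax(lip))   # the winning location attracts the saccade
```

The modest V4 advantage at the target-feature location is sharpened by the competition, making LIP a map of behavioural relevance from which the next fixation is read out.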
The premotor theory of attention (Rizzolatti et al. 1987) suggests that the neural correlates of spatial attention and eye movements are linked. Psychophysical evidence suggests that a spatial attention effect exists at the saccade target location immediately before the saccade takes place and lasts some time after the saccade (Shepherd et al. 1986). Hence, the saccade target location is “primed” as the eyes arrive there. Further, microstimulation of areas involved in generating eye movements modulates responses in area V4 and produces spatial enhancement of behavioural performance (Moore and Armstrong 2003; Muller et al. 2005). Here, a spatial attention focus, associated with the eye movement, is fed through LIP to area V4, resulting in spatial enhancement early in V4 responses, similar to the effects seen in single cell recordings in this area (Connor et al. 1996; Luck et al. 1997) and neuroimaging of human extrastriate cortex (Hopfinger et al. 2000; Kastner et al. 1999). Involvement of LIP in this spatial attention bias is proposed on the basis of single cell recordings in LIP (Colby et al. 1996; Robinson et al. 1995), functional neuroimaging of the possible human parietal homologue of LIP (Corbetta et al. 2000; Hopfinger et al. 2000; Kastner et al. 1999), positron emission tomography (Corbetta et al. 1995; Fink et al. 1997), event-related potential studies (Martinez et al. 1999) and psychophysics (Posner et al. 1984; Shimozaki et al. 2003) that suggest spatial attentional enhancements in this area and a parietal control signal for spatial attention. We term the spatial attentional focus the “Attention Window” (AW) and its aperture is scaled according to coarse resolution featural information, assumed to be conveyed rapidly by the magnocellular pathway (see “Appendix” and Lanyon and Denham 2004b, for further details).
Hence, in our model area V4, both spatial and object-based attentional effects emerge from the dynamics of the system. Responses in area V4 are known to be modulated by both attention to space and attention to features (McAdams and Maunsell 2000; Treue and Martinez Trujillo 1999). Event-related potential studies suggest that spatial effects appear earlier than feature-based attention, which may be contingent on the earlier spatial processes (Anllo-Vento and Hillyard 1996; Anllo-Vento et al. 1998; Hillyard and Anllo-Vento 1998). In our model the initial attentional effects consist of the spatial “spotlight” of the AW. Over time object-based attention develops and it is these object-based effects that drive the feature selective behaviour in search scan paths.
The single cell recordings of Chelazzi et al. (1993, 1998, 2001) show a link between the time of development of object-based attention and saccade onset. On average saccades took place ~70–80 ms after a significant target object effect was established in IT and V4. In our model, object effects are deemed to be significant when the most active object cell in IT is twice as active as its highest competitor. Such a quantitative difference is reasonable when compared to the figures in Chelazzi et al. (1993). Once a significant difference is established, a saccade is initiated 70 ms later, reflecting latency for motor preparation. Hence, the model makes an autonomous decision to saccade based on levels of activity in its cells.
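The saccade decision rule just described can be sketched directly. The 2:1 criterion and the 70 ms motor-preparation delay come from the text; the activity traces in the example are invented for illustration and are not model output.

```python
def saccade_onset(it_traces, dt_ms=1.0, motor_delay_ms=70.0):
    """Autonomous saccade decision: trigger when the most active IT
    object cell is at least twice as active as its strongest
    competitor, then add a fixed motor-preparation delay.

    The 2:1 criterion and 70 ms delay follow the text; the traces
    passed in by the caller are toy values for illustration."""
    for step, activities in enumerate(it_traces):
        ranked = sorted(activities, reverse=True)
        if ranked[0] >= 2.0 * ranked[1]:
            return step * dt_ms + motor_delay_ms
    return None                    # no significant object effect: no saccade

# Toy traces: the target cell ramps up while its competitor is
# suppressed; the 2:1 criterion is first met at t = 100 ms.
traces = [(0.2 + 0.001 * t, max(0.2 - 0.0005 * t, 0.01)) for t in range(300)]
onset = saccade_onset(traces)      # 100 ms + 70 ms motor delay = 170 ms
```

Because the criterion is evaluated on the evolving IT activities, anything that delays the resolution of the object competition (such as a longer prefrontal sensory latency) directly lengthens the simulated saccade latency.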
We now present the results of computational simulations of spatial and object-based attentional effects that have been observed in vivo.
Lanyon and Denham (2004a) illustrated that the spatial bias has the effect of increasing activity in LIP within the spatial AW and, because the AW is scaled according to density of stimuli in the scene, activity is increased across a wider area of LIP in sparse scenes. This scaling has most effect during scan path simulations in which stimulus density varies across the scene because it results in larger amplitude saccades in sparse areas of a scene whilst dense areas are inspected with a series of smaller amplitude saccades (see Lanyon and Denham 2004b, 2005, for demonstration of the effects of the AW on search scan paths).
The increase in activity in LIP results in increases in activity at locations within the AW in V4, due to the excitatory connection from LIP to V4. This modulation of responses in V4 by attention is our focus here. Responses of the model V4 cells whose receptive fields overlap the spatial AW are increased from the onset of the stimulus-evoked response, as shown in Fig. 2a. This qualitatively replicates effects observed by Connor et al. (1996) and Luck et al. (1997). Connor et al. found that V4 responses were modulated according to the spatial proximity of the location of the spatial focus of attention. Luck et al. found that spatial attention directed to a location inside a V4 cell’s receptive field modulates the earliest stimulus-evoked responses and also shifts baseline responses in advance of the sensory response. The effects recorded by Luck et al. did not depend on whether the stimulus was a preferred or non-preferred stimulus for the cell and happened only when the animal was attending to a location inside the cell’s receptive field. The effect of the spatial bias in our model is likewise independent of stimulus properties. Since features detected in V4 within the spatial AW are represented with higher levels of activity than those outside the AW, the feedforward signal to IT is strongest for those features in the AW. Hence, activity in IT is slightly stronger for objects that fall within the spatial AW, as shown in Fig. 2b.
Our simulations relate to monkey experiments conducted by Chelazzi et al. (1993, 1998, 2001). During these experiments monkeys were shown a screen containing a target object that had to be memorised across a subsequent 1,500 ms delay period. Following the delay, a search array was presented. This array contained two objects (one preferred and one non-preferred stimulus for the cell) both presented within the recorded cell’s receptive field. The monkey’s task was to make an eye movement to the memorised target object. The period of interest here is the time from search array onset until the eye movement begins. During this time the response of the recorded cell adapts from representing both objects in its receptive field to representing primarily the search target object. We simulate these target object effects in V4 and IT. In our model, prefrontal target object feedback has the effect of influencing the competitive dynamics in IT such that responses are modulated according to the target object. Feedback from IT to V4 subsequently biases competition in V4 so that target object features are represented more strongly than non-target features.
Figure 3 shows activity in IT recorded in a monkey cell (Fig. 3a) and a model cell (Fig. 3b) when a preferred and a non-preferred stimulus for the cell are present within the cell’s receptive field. One of these stimuli is the search target and is, therefore, attended. Whilst the initial responses of the cells do not depend on the search target, the late portion of the response is determined by whether the preferred or non-preferred stimulus is the target. In our model, this effect results from the sensory response signal from prefrontal cortex biasing activity in IT. For this simulation, only the sensory response in prefrontal cortex, and not the sustained mnemonic response, is included. The latency of the object-based effect is linked to the prefrontal sigmoid response function. If the signal is held constant over time, as has been implemented in previous models (e.g. Deco 2001; Deco and Lee 2002), the target effect begins from the onset of the sensory response in IT and V4. Our hypothesis is that the latency of the significant object-based effects observed in monkey IT cells (Chelazzi et al. 1993, 1998; and also V4: Chelazzi et al. 2001; Motter 1994a, b) is a result of the latency of the feedback from prefrontal cortex.
In addition to this significant target object effect late in the sensory response in IT, Chelazzi et al. (1998) found that the average response of cells showed a small target effect in the early part of the response. They suggested this might be due to a continuation of elevated firing during the delay period in their experimental paradigm. Addition of the sustained mnemonic signal to our model’s prefrontal bias allows this effect to be simulated. Figure 4b shows a simulation under the same conditions as Fig. 3b but with the inclusion of this bias signal. The early part of the response is now slightly enhanced when the target object is the preferred stimulus. However, the most significant target effect still appears later in the response (due to the prefrontal sensory response bias). This is very similar to the effect observed by Chelazzi et al. (1998), shown in Fig. 4a, and suggests that the early and late effects observed in these monkeys may be due to a prefrontal bias signal with more than one component. We will return to this topic in the discussion.
Figure 5 shows a comparison of activity in IT when two stimuli are present in the receptive field with that invoked by the individual stimuli presented alone in the receptive field. In the two stimuli case, responses are suppressed compared to the preferred stimulus being presented alone. This is due to the competing influence of the second stimulus, which suppresses activity for the first stimulus via the inhibitory interneuron pool: the inhibitory pool is reciprocally connected to the (pyramidal) cells representing the objects so that, when more objects are represented, inhibitory activity also increases. Following the onset of object-based attention, neurons representing the target stimulus out-compete those representing non-target stimuli and responses in the two stimuli case tend towards those when the attended stimulus is presented alone. This effect can be seen in the monkey recording (Fig. 5a) and the model simulations (Fig. 5b, c).
Chelazzi et al. (1998) found that a minority of monkey IT cells, for example the cell shown in Fig. 6a, do show a significant target effect in the early part of their response. We can also simulate this effect, as shown in Fig. 6b, if we assume that the sensory response of some prefrontal cells has a shorter latency and, hence, this feedback is able to influence earlier responses in some IT cells. In this case, the sigmoid prefrontal response function is shifted in time (by altering the parameter τsig in Eq. 27 of the “Appendix”) to reflect the earlier sensory response in prefrontal cortex. Hence, the IT cells shown in Fig. 6 may preferentially receive earlier prefrontal feedback, perhaps being connected to prefrontal cells with shorter sensory response latencies: prefrontal cortex receives inputs from various cortical and thalamic routes (Miller and Asaad 2002) that have differing latencies.
Object-based effects appear later than spatial effects in the model’s V4 because the object effect depends on the resolution of competition between objects in IT. IT continually feeds back to V4 and, as the target object wins the competition in IT, the effect of the feedback is to suppress non-target features in V4 and allow target features to become most active. Figures 7 and 8 show plots of activity in V4 from monkey cells recorded by Chelazzi et al. (2001) and from our model simulations. These plots appear similar to those from IT. However, the nature of the representation in our model’s V4 is different from that in IT because V4 encodes features in a retinotopic fashion, rather than object representations that are invariant to spatial translation. The object-based effect is actually a feature-based effect in V4. However, we consider the two terms to be interchangeable here since feature effects are due to their relation to the target object. These object/feature effects develop from 150 ms after the onset of the stimuli on the retina.
Figure 7 shows the case where two stimuli are present in the cell’s receptive field and Fig. 8 shows a comparison of this activity with that when these individual stimuli are presented alone in the receptive field. Due to the competing influence of the second stimulus, responses are suppressed in the two stimuli case. In our model, the additional stimulus results in more overall activity in the pyramidal cells representing features. Due to reciprocal connections this leads to an increase in activity in the inhibitory interneurons and this, in turn, results in a reduction of activity in the pyramidal cells. Due to feedback from IT, object-based attention develops from about 150 ms and target features win the competitions within V4. Cells representing target features become more active whilst those representing non-target features become suppressed. Responses in the two stimuli case tend towards those when the attended stimulus is presented alone. From this time, LIP receives larger inputs from V4 at locations that contain target features, allowing these locations to win the spatial competition in LIP. Hence, LIP becomes a map of behaviourally relevant locations and attention is attracted to those locations that contain target features, producing scan paths that investigate task-relevant stimuli in the scene (see Lanyon and Denham 2004a).
In comparison to the simulation of the recordings from IT, slightly stronger object-related feedback from IT to V4 (“Appendix” Eq. 22, η) was used for the simulation of the V4 recordings. Similar patterns of activity were generated with both settings of this parameter but the object-based modulation is greater when IT feedback is stronger. Hence, stronger feedback best replicates the monkey V4 recordings by Chelazzi et al. (2001) and weaker feedback the IT recordings by Chelazzi et al. (1993, 1998). The same animals were used in both experiments but Chelazzi et al. (2001) noted that the animals received more training prior to the 2001 V4 recordings than they did for the earlier IT recordings. On the basis of our results, this suggests that the strength of object-related feedback from IT to V4 may be tuned by learning, or affected by age.
Object-based effects began earlier (from 150 to 160 ms after array onset) in the recordings from the more highly trained/older monkeys (Chelazzi et al. 2001) compared to those recorded when the monkeys were less trained and younger, when effects began 170–180 ms after array onset (Chelazzi et al. 1993, 1998). The effects in both cases began ~70–80 ms before the saccade onset. For the simulations presented so far we have modelled the earlier object-based effects but a later effect can also be simulated by assuming that prefrontal feedback to IT takes longer to become effective, perhaps due to longer sensory response latencies in prefrontal cortex. This is achieved by shifting the sigmoid prefrontal response function in time (by altering the parameter τsig in Eq. 27 of the “Appendix”) to reflect a longer latency sensory response in prefrontal cortex. For example, Fig. 9 shows the effect of increasing prefrontal response latency so that the onset of the target object effect is delayed until ~200 ms.
Saccade onset is linked to the latency of the object-based effect in both the monkey recordings (occurring ~70–80 ms after the object-based effect) and in the model (occurring ~70 ms after the object-based effect). Hence, the factors that determine the strength and timing of object-based effects in the model also impact saccade latency. Weight of IT feedback to V4 has some impact on the strength of the object-based effect due to the recurrent connections between IT and V4. Hence, when IT feedback to V4 is weaker, saccade latency is longer (for example, compare the 252 ms latency in Fig. 3b with a 241 ms latency in Fig. 7b). If this feedback is tuned by learning, as suggested above, we predict that saccade latency would be shorter in familiar scenes or tasks. The latency of the prefrontal feedback to IT has a larger effect on saccade latency in our model, because it determines the timing of object-based attention. Longer prefrontal sensory response latency results in longer latency to saccade (for example, compare the 241 ms saccade latency in Fig. 7b with the 302 ms latency in Fig. 9). When object-based effects begin in IT and V4 at ~150 ms, saccades take place at ~240 ms, but when the prefrontal response latency is increased such that object-based effects begin at ~200 ms, saccade onset is ~300 ms. Similar differences in saccade onset between their 1993/1998 and 2001 recordings were noted by Chelazzi et al. (2001). These simulations suggest that the timing of prefrontal feedback to IT (i.e. the latency of the sensory response in prefrontal cortex and/or its effect in IT) could be tuned by learning or age. However, in addition to the difference in age and training, the difference in recording sites may be relevant and stimulus configurations were slightly different in the two monkey experiments, being closer together in the 2001 than in the 1993/1998 recordings. Therefore, these predictions are tentative.
The biased competition hypothesis inspired seminal computational modelling of visual attention (Deco 2001; Deco and Lee 2002; Usher and Niebur 1996). We extended this line of research with the aim of accurately simulating the time course of attentional effects observed in monkey single cell recordings. As we have shown previously (Lanyon and Denham 2004a), our model produces biologically realistic visual search behaviours based on these underlying cellular behaviours. Here, we simulated the latency and modulatory effect on cellular responses of spatial and object-based attention, as well as saccade onset times observed in monkey experiments. Spatial and object-based attention operate concurrently in our model due to different biases in the system. Of particular interest was the simulation of the target object effect that develops over time to become significant only in the late portion of cells’ responses in IT and V4. In order to simulate this effect we investigated the object-based prefrontal bias normally used in biased competition models to influence neurodynamical interactions in the ventral visual stream.
The main component of our object-based prefrontal bias reflects the sensory response in prefrontal cortex and its strength builds over time in a biologically plausible manner determined by a sigmoid function (Everling et al. 2002; Miller et al. 1996; Hasegawa et al. 2000). This differs from the prefrontal signal used in other models (e.g. Deco 2001; Deco and Lee 2002; Usher and Niebur 1996; Renart et al. 2001), which has a constant value and leads to an early target effect in IT, not capturing the time course of these effects found in monkey IT and V4. Using a prefrontal sensory response function we were able to accurately simulate the onset of object-based effects in the late portion of cell responses in IT (Chelazzi et al. 1993, 1998) and V4 (Chelazzi et al. 2001). An alternative suggestion is that the late effect is due to a spatial signal originating in the frontal eye fields (Hamker 2005). In this hypothesis neurons in V4 and IT receive feedback from movement cells in the frontal eye fields (FEF) at the location of an intended eye movement. Hence, neurons that represent locations of the forthcoming saccade gain an advantage in competition. This modulation from FEF becomes significant from 150 ms into the sensory response. Whilst it is possible that a spatial feedback bias contributes to the modulation of late neuronal responses, an object/feature prefrontal bias seems to be a more plausible candidate for the target object effect since Motter (1994a, b) has shown that responses across V4 are modulated by target features in a non-spatially selective manner over a time course similar to the monkey experiments simulated here. The use of an object-specific bias here produced simulation results that are very close to the monkey data in terms of the nature and onset of target object effects relative to the initial sensory response. Further studies using micro-stimulation in monkeys could provide more information about the nature and source of these top-down biases.
In order to simulate the small target object effect that is present in the earlier response (Chelazzi et al. 1998), we added a second component to the prefrontal bias in the form of a small sustained mnemonic signal, consistent with the one used in other models (Deco 2001; Deco and Lee 2002; Usher and Niebur 1996; Renart et al. 2001). Similar to the effect observed in monkeys, this mnemonic bias modulates responses in IT from its earliest response but the effect is slight compared to the significant target object related effect in the later portion of the response.
Hence, our modelling results suggest that the early and late effects observed in these monkeys may be due to more than one type of bias signal, or to a prefrontal bias signal with more than one component.
In our model, the strength of the object-based effect in V4 and IT is modulated by the strength of target object feedback in the ventral stream. The timing of the effect is linked to the response latency of the prefrontal signal. When simulating data from monkey single cell recordings in V4 and IT we found that recordings done later with older and more highly trained animals were best replicated with slightly stronger IT feedback to V4 and a prefrontal response with slightly shorter latency. This suggests that object-based feedback in the ventral stream could be tuned by learning or age. This prediction is tentative in view of the difference in recording site and stimulus configuration differences between the monkey experiments replicated here. However, in FEF, stronger responses to target as opposed to non-target stimuli occur earlier if the monkey has had long experience of searching for the same target (Bichot et al. 1996), and this suggests that the latency of object-related responses in higher stages of cortical processing might be experience-dependent. We predict that, under the same experimental conditions, recordings in V4 and IT would show stronger and earlier modulation of responses by the target object in animals more highly trained in the task and stimuli compared to naïve animals. Future work in this area should also seek to gain a better understanding of the neurodynamics relating to the sensory response in prefrontal cortex and how it interacts with IT, and other ventral stream areas, to influence the development of object-based attention. The growing use of microstimulation in monkeys, and effective connectivity in human functional MRI, as means of assessing effects of one cortical region upon another, will provide insights for modelling in this area.
On the basis of neurophysiological evidence that spatial attention is able to operate earlier than object-based selection in extrastriate cortex (Chelazzi et al. 1993, 1998, 2001; Anllo-Vento and Hillyard 1996; Anllo-Vento et al. 1998, Hillyard and Anllo-Vento 1998; Luck et al. 1997; Motter 1994a, b), the effects of spatial attention are evident in our model early in the cells’ responses whereas object-based effects take time to develop. We predict that eye movements to a target location can be made faster following a memorised spatial cue than with knowledge of target features. Spatial attention in our model results from an eye-movement related bias to LIP, which then biases processing in V4. A plausible source of the bias to LIP is FEF because microstimulation of FEF produces a spatial enhancement of signals in V4 (Moore and Armstrong 2003) and the suggested circuitry through LIP offers one possible indirect route to produce this effect in V4 when an eye movement is made. However, other sources of spatial bias to LIP could be dorsolateral prefrontal cortex, which has connections with parietal cortex (Blatt et al. 1990), or pulvinar.
Recurrent communication between the ventral stream (V4) and the dorsal stream (LIP) is a feature of our model. In addition to the bias from LIP to V4 that results in spatial attention in V4, a reciprocal connection from V4 to LIP allows object-based effects developing in the ventral stream to influence the competition to attract attention in LIP. Due to object-based attention, over time target features are enhanced across V4 and non-target features are suppressed, similar to effects observed in monkey V4 (McAdams and Maunsell 2000; Motter 1994a, b). As a result of V4 inputs to LIP, locations containing target features have a competitive advantage in LIP’s spatial competition to attract attention. Hence, LIP represents the locations of behaviourally relevant stimuli most strongly, as noted in single cell recordings in this area (Colby and Goldberg 1999; Gottlieb et al. 1998; Kusunoki et al. 2000), and guides the scan path accordingly. The search scan path behaviour resulting from the cellular effects reported here has been presented by Lanyon and Denham (2004a). We predict that deficits in white matter (for example, revealed by diffusion tensor imaging tractography linking functionally defined regions) connecting the human homologues of V4 and LIP will lead to less feature-selective search behaviour.
The decision to saccade in this model is linked to the development of object-based attention in the ventral stream. We were able to replicate saccade onset times for the single cell data we simulated here. Our simulations show that biologically constrained models can autonomously make plausible eye movement decisions on the basis of cell activity resulting from the system dynamics. Here, the decision to initiate a saccade is made on the basis of activity in the ventral stream (IT) but the choice of destination is encoded in the dorsal stream (LIP). Due to the link between object-based attention and eye movement decisions in our model, the timing of prefrontal feedback and strength of object-related feedback in the ventral stream affect saccade latency. We predict that latency to saccade to locations containing target features will be shorter for very familiar objects or in a task in which the subject is highly trained because prefrontal response latencies are likely to be shorter. Rapid saccades by less trained individuals will be more likely to land at locations that do not contain information relevant to the search target. However, differences between trained and untrained subjects will be less noticeable at longer saccade latencies. Short-term memory of target features does reduce saccade latency (McPeek et al. 1999), and familiarity with distractors and target objects (Greene and Rayner 2001; Lee and Quessy 2003; Lubow and Kaplan 1997; Wang et al. 1994) or with the search task (Sireteanu and Rettenbach 2000) makes search more efficient. For the highly trained monkeys’ data modelled here, the latency of object-based attention in the model’s ventral stream is such that LIP does not differentiate locations containing target and non-target features until at least 150 ms after fixation/stimulus onset (see Lanyon and Denham 2004a, for illustration).
Therefore, saccades made with very short latencies will be driven by bottom-up saliency rather than driven by the goals of object-based attention. Hence, we predict that saccades made after short latencies (less than ~150 ms for highly trained subjects) will not reflect object and feature-related aspects of the task. This effect has been demonstrated in human eye movement studies by van Zoest (van Zoest et al. 2004; van Zoest and Donk 2004, 2006). Our model provides a link between these eye movement behaviours and the object-based effects observed in monkey single cell recordings.
In conclusion, we have shown that accurate modelling of attentional responses in single cell data leads to a system that exhibits biologically realistic eye movement behaviours. This type of computational modelling provides valuable insights into the link between cellular responses and human and animal behaviour.
We thank the two anonymous reviewers for their supportive and constructive comments that helped us to improve this paper. We thank Ann-Marie Grbavec for insightful comments on the structure of the introduction text. We also thank Fred Hamker for helpful comments about the prefrontal object bias in an earlier version of this model. Thanks to Jason Barton in whose lab revisions to the manuscript were done. LL was funded by a University of Plymouth graduate studentship for the majority of this work and by a Michael Smith Foundation for Health Research Post-doctoral Fellowship during later stages of the write-up.
This appendix provides full details of the implementation of the model. An examination of the effects of varying certain weighting parameter values was given by Lanyon and Denham (2004a). Modules representing areas V4, IT and LIP interact dynamically to produce the attentional effects within the system. The retina and V1 do not form part of the dynamic portion of the system but act as feature detectors.
Colour processing in the model focuses on the red–green channel with two colour arrays, Γred and Γgreen, a simplification of the output of the medium and long wavelength retinal cones, being input to the retinal ganglion cells. References to red and green throughout this paper refer to long and medium wavelengths, respectively. The greyscale image, Γgrey, used for form processing, is a composite of the colour arrays and provides luminance information.
At each location in the greyscale image, retinal ganglion broad-band cells perform simple centre-surround processing, according to Grossberg and Raizada (2000) as follows.
On-centre, off-surround broadband cells:
Off-centre, on-surround broadband cells:
where Gpq (i, j, σ1) is a two-dimensional Gaussian kernel, given by:
The Gaussian width parameter is set to: σ1 = 1.
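The centre-surround equations themselves are not reproduced above. As a minimal sketch in the spirit of Grossberg and Raizada (2000), the on-centre response below is the half-wave rectified difference between each pixel (centre) and its Gaussian-weighted neighbourhood (surround, σ1 = 1), and the off-centre response is the reverse; the kernel size and the exact centre/surround weighting of the broadband cells are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalised 2-D Gaussian kernel G_pq(i, j, sigma)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def convolve2d(image, kernel):
    """Same-size 2-D convolution with zero padding (loops, for clarity)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def centre_surround(image, sigma=1.0, size=7):
    """On-centre/off-surround and off-centre/on-surround broadband
    responses: half-wave rectified differences between each pixel and
    its Gaussian-weighted surround (a simplified stand-in for the
    published equations)."""
    surround = convolve2d(image, gaussian_kernel(size, sigma))
    on = np.maximum(image - surround, 0.0)   # on-centre, off-surround
    off = np.maximum(surround - image, 0.0)  # off-centre, on-surround
    return on, off
```

The same rectified-difference pattern, with colour-specific centre and surround inputs, yields the concentric single-opponent cells described next.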
These broadband cells provide luminance inputs to V1 interblob simple cells that are orientation selective.
Retinal concentric single opponent cells process colour information as follows.
Red on-centre, off-surround concentric single-opponent cells:
Red off-centre, on-surround concentric single-opponent cells:
Green on-centre, off-surround concentric single-opponent cells:
Green off-centre, on-surround concentric single-opponent cells:
These concentric single-opponent cells provide colour-specific inputs to V1 double-opponent blob neurons.
The V1 module consists of K + C neurons at each location in the original image, so that neurons detect K orientations and C colours. At any fixation, V1 would only process information within the current retinal image. However, for the purposes of simulation, the entire original image is “pre-processed” by V1 in order to save computational time during the active vision component of the system. As V1 is not dynamically updated during active vision, this does not alter the result. Only those V1 outputs relating to the current retinal image are forwarded to V4 during the dynamic active vision processing. The size of filters used in V1 determines the ratio of pixels to degrees of visual angle so that V1 receptive fields cover approximately 1° of visual angle (Wallis and Rolls 1997).
For orientation detection, V1 simple and complex cells are modelled as described by Grossberg and Raizada (2000), with the distinction that two spatial resolutions are calculated here. Simple cells detect oriented edges using a difference-of-offset-Gaussian (DOOG) kernel.
The right and left-hand kernels of the simple cells are given by:
where u+ and u− are the outputs of the retinal broadband cells above, and [x]+ signifies half-wave rectification, i.e. [x]+ = max(x, 0).
and the oriented DOOG filter Dpqij(lk) is given by:
where δ = σ2/2 and θ = π(k − 1)/K, where k ranges from 1 to 2K, K being the total number of orientations (2 is used here); σ2 is the width parameter for the DOOG filter, set as below; r is the spatial frequency octave (i.e. spatial resolution), such that r = 1 and σ2 = 1.2 for high resolution processing, used in the parvocellular pathway, which forms the remainder of the model; r = 2 and σ2 = 2.2 for low resolution processing, used in the magnocellular (or sub-cortical) pathway for scaling the AW.
The direction-of-contrast sensitive simple cell response is given by:
γ is set to 10.
The complex cell response is invariant to direction of contrast and is given by:
The value of the complex cells, Irijk, over the area of the current retinal image, is input to V4.
The outputs of LGN concentric single-opponent cells (simplified to be the retinal cells here) are combined in the cortex in the double-opponent cells concentrated in the blob zones of layers 2 and 3 of V1, which form part of the parvocellular system. The outputs of blob cells are transmitted to the thin stripes of V2 and from there to colour-specific neurons in V4. For simplicity, V2 is not included in this model.
Double-opponent cells have a centre-surround antagonism and combine inputs from different single-opponent cells as follows.
Red on-centre portion:
Red off-surround portion:
Green on-centre portion:
Green off-surround portion:
where σ1 = 1.2, σ2 = 1.5.
The complete red-selective blob cell is given by:
The complete green-selective blob cell is given by:
where γ = 0.2 scales the output of V1 blob cells to be consistent with that of the orientation-selective cells; c1 = K + 1 is the position of the first colour input to V4 (i.e. red); and c2 = K + 2 is the position of the second colour input to V4 (i.e. green). The blob cell outputs over the area of the current retinal image are input to V4.
The radius of the attention window is based on the density of stimuli within the retinal image at the current fixation point and is inspired by the psychophysical findings of Motter and Belky (1998a). This radius is given by AWrad below. The density is estimated using the lowest spatial resolution orientation information.
where Iijrk is the output of the orientation-selective V1 cells over the area of the current retinal image. Note, when fixation is close to the edge of the original image, the retina may “overflow” the original image. In this case, the output of the retinal and V1 stages in the “overflow” area is considered to be zero so that no “bottom-up” inputs are provided to V4 from this area. However, for the purposes of scaling the AW, this region is ignored and only that portion of the retina and V1 that does not “overflow” is processed here; r is set to 2, the lowest spatial resolution; ψ is a function that removes the lowest 95% of activity and reduces to zero the activity at points where a neighbour has been found within a Euclidean distance equal to the length of the bar stimuli.
The radius of the AW is given by
where d is the number of non-zeros in f above; m, n are the dimensions of the retinal image. Note, the size of the retinal image is flexible, with the size of areas V1, V4 and LIP being based on retinal image size.
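The printed expression for AWrad is not reproduced above. As an illustrative stand-in consistent with the description (the radius shrinks as stimulus density grows), the sketch below discards the lowest 95% of activity, counts the d surviving points, and scales the radius with the mean spacing √(mn/d); that scaling is an assumption, not the paper's formula, and the neighbour-suppression step of ψ is omitted for brevity.

```python
import numpy as np

def aw_radius(activity, keep_fraction=0.05):
    """Illustrative attention-window radius from stimulus density.
    `activity` is a 2-D array of low-resolution V1 orientation output
    over the retinal image. The psi step is approximated by keeping
    only the top 5% of activity; sqrt(m*n/d) is an assumed stand-in
    for the published radius expression."""
    m, n = activity.shape
    thresh = np.quantile(activity, 1.0 - keep_fraction)
    f = np.where(activity >= thresh, activity, 0.0)  # psi (simplified)
    d = max(int(np.count_nonzero(f)), 1)             # surviving points
    return np.sqrt(m * n / d)                        # mean spacing
```

With this stand-in, a densely populated retinal image yields a smaller attention window than a sparse one, matching the behaviour reported by Motter and Belky (1998a).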
The dynamic cortical modules follow a similar approach to that described by Deco (2001), Deco and Lee (2002) and Rolls and Deco (2002, Chap. 9), and are modelled using mean field population dynamics (Gerstner 2000; Wilson and Cowan 1972), as also used by Usher and Niebur (1996). In this mean field approach, average ensemble activity is used to represent populations, or assemblies, of neurons with similar encoding properties. Population averaging does not require temporal averaging of the discharge of individual cells and, thus, the response of the population may be examined over time, subject to the size of the time step used in the differential equations within the model. The response function, which transforms current (activity within the assembly) into discharge rate, is given by the following sigmoid function that has a logarithmic singularity (Gerstner 2000):
where Tr is the absolute refractory time, set to 1 ms, and τ is the membrane time constant (1/τ determines the cell’s firing threshold). The threshold is normally set to half the present maximum activity in the layer (in V4, this is half the maximum within the feature type, in order that the object-based attention differences between features are not lost due to the normalising effect of this function).
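The sigmoid itself is not typeset above. In the form used by this family of models (Deco 2001; Rolls and Deco 2002) it is F(x) = 1/(Tr − τ ln(1 − 1/(τx))) for τx > 1, and zero below the threshold 1/τ; the sketch below assumes that form, which matches the stated roles of Tr (saturation at 1/Tr) and 1/τ (firing threshold).

```python
import numpy as np

def response(x, tau, T_r=1.0):
    """Mean-field response function with a logarithmic singularity
    (assumed Deco-2001 form). Converts input current x into discharge
    rate: zero below the threshold 1/tau, saturating at 1/T_r, where
    T_r is the absolute refractory time (1 ms)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    above = x * tau > 1.0  # below threshold the rate is zero
    out[above] = 1.0 / (T_r - tau * np.log(1.0 - 1.0 / (tau * x[above])))
    return out
```

Note the logarithmic singularity at τx = 1: the rate rises steeply just above threshold and then saturates smoothly, giving the sigmoid shape described in the text.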
LIP contains the same number of neurons as V4 and has reciprocal connections with both the orientation and colour layers in V4. The size of V4 and LIP is determined by the size of the retinal image, which is flexible. For monochromatic simulations V4 contains only orientation selective assemblies.
The dynamic portion of the model is run such that the differential equations are solved numerically by running simulations in computer software (programmed in Matlab). The differential equations are implemented as difference equations so that the activity of neurons is updated in the software program at each time step. The order of updates is LIP, V4, IT. For simulations presented here a time step size of 1 ms is used. Typically, for larger scan path simulations a time step size of 5 ms is used to speed simulations—outputs from the system are robust across reasonable step sizes.
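As a concrete illustration of this integration scheme, the sketch below shows a generic Euler loop with the stated LIP, V4, IT update order; the derivative functions are placeholders standing in for the model's full equations, not the equations themselves.

```python
def simulate(d_lip, d_v4, d_it, lip0, v40, it0, dt=1.0, t_end=300.0):
    """Integrate coupled difference equations with time step dt (ms),
    updating modules in the order LIP, V4, IT at each step. The
    d_* arguments are hypothetical derivative functions; lip0, v40,
    it0 are the initial activities."""
    lip, v4, it = lip0, v40, it0
    t = 0.0
    while t < t_end:
        lip = lip + dt * d_lip(lip, v4, t)    # dorsal stream first
        v4 = v4 + dt * d_v4(v4, lip, it, t)   # then ventral V4 ...
        it = it + dt * d_it(it, v4, t)        # ... then IT
        t += dt
    return lip, v4, it
```

Because simple Euler updates are used, outputs are robust only across reasonable step sizes, as the text notes; 1 ms is used for the single cell simulations and 5 ms for longer scan path runs.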
V4 consists of a three dimensional matrix of pyramidal cell assemblies. The first two dimensions represent the retinotopic arrangement and the other represents the feature types. In the latter dimension, there are K + C layers of cell assemblies, as shown in Fig. 1b: the K layers each selective for an orientation, the C layers each selective for a particular colour. Two orientations (vertical and horizontal) and two colours (red and green) are normally used. Two sets of inhibitory interneuron pools exist: one set mediates competition between orientations and the other mediates competition between colours. V4 receives convergent input from V1 over the area of its receptive field with a latency of 60 ms to reflect normal response latencies (Luck et al. 1997). In order to simulate the normalisation of inputs occurring during retinal, LGN and V1 processing, the V1 inputs to V4 are normalised by passing the convergent inputs to each V4 assembly through the response function at Eq. 21 with its threshold set to a value equivalent to an input activity for approximately half a stimulus within its receptive field.
The output from the V1 simple cell process, Iijk, for each position (i, j) at orientation k, provides the bottom-up input to orientation selective pyramidal assemblies in V4 that evolve according to the following dynamics:
where τ1 is set to 20 ms; α is the weight of excitatory input from other cells in the pool, set to 0.95; β is the weight of inhibitory interneurons input, set to 10; Ipqk is the input from the V1 simple cell edge detection process at all positions within the V4 receptive field area (p, q), and of preferred orientation k; χ is the weight of V1 inputs, set to 4; Yij is the input from the posterior parietal LIP module, reciprocally connected to V4; γ is the weight of LIP inputs, set to 3; Xm is the feedback from IT cell populations via weight , described later; η is the parameter representing the strength of object-related feedback from IT, set to 5 for scan path simulations and simulations of single cell recordings in V4 (Chelazzi et al. 2001), but set to 2.5 for simulation of single cell recordings in IT (Chelazzi et al. 1993, 1998), as discussed in the text; I0 is a background current injected in the pool, set to 0.25; ν is additive noise, which is randomly selected from a uniform distribution on the interval (0, 0.1).
The dynamic behaviour of the associated inhibitory pool for orientation-selective cell assemblies in V4 is given by:
where λ is the weight of pyramidal cell assembly input, set to 1; μ is the weight of inhibitory interneuron input, set to 1.
Over time, this results in local competition between different orientation selective cell assemblies.
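The printed forms of Eqs. 22 and 23 are not reproduced above. The sketch below assembles the listed terms into the standard mean-field arrangement used by Deco (2001); that additive arrangement is an assumption, and the simple logistic F stands in for the response function of Eq. 21.

```python
import numpy as np

def F(x):
    """Placeholder response function (any monotone sigmoid serves here)."""
    return 1.0 / (1.0 + np.exp(-(x - 1.0)))

def v4_orientation_step(A, I_pool, I_v1, Y_lip, it_fb, dt=1.0,
                        tau1=20.0, alpha=0.95, beta=10.0, chi=4.0,
                        gamma=3.0, eta=5.0, I0=0.25, noise=0.1):
    """One Euler step of a V4 orientation assembly vector A and its
    shared inhibitory pool, combining the terms listed for Eqs. 22-23:
    recurrent excitation (alpha), pool inhibition (beta), V1 input
    (chi), LIP bias (gamma), IT feedback (eta), background current I0
    and additive uniform noise."""
    nu = np.random.uniform(0.0, noise, size=np.shape(A))
    dA = (-A + alpha * F(A) - beta * F(I_pool)
          + chi * I_v1 + gamma * Y_lip + eta * it_fb + I0 + nu) / tau1
    # Inhibitory pool: lambda = mu = 1 (Eq. 23)
    dI = (-I_pool + 1.0 * np.sum(F(A)) - 1.0 * F(I_pool)) / tau1
    return A + dt * dA, I_pool + dt * dI
```

Iterating this step, the assembly with the stronger V1 drive wins the local competition, since both assemblies feed (and are suppressed by) the common inhibitory pool.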
The output from the V1 simple cell process, Iijc, for each position (i, j) and colour c, provides the bottom-up input to colour selective pyramidal assemblies in V4 that evolve according to the following dynamics:
where Ipqc is the input from the V1 blob cells at all positions within the V4 receptive field area (p, q), and of preferred colour c; Xm is the feedback from IT cell populations via weight , described later.
The remaining terms are the same as those in Eq. 22.
The dynamic behaviour of the associated inhibitory pool for colour-selective cell assemblies in V4 is given by:
Parameters take the same values as those in Eq. 23.
Over time, this results in local competition between different colour selective cell assemblies.
The model IT encodes all possible objects and receives feedforward feature inputs from V4 with a latency of 80 ms to reflect normal response latencies (Wallis and Rolls 1997). V4 inputs to IT are normalised by dividing the total input to each IT assembly by the total number of active (i.e. non-zero) inputs. IT also feeds back an object bias to V4. The strength of these connections is given by the following weights, which are set by hand (to −1 or 0, as appropriate, for inhibitory feedback, or to 0 and +1 for excitatory feedback, with which the model may also be implemented) to represent prior object learning. These simple matrices reflect the type of weights that would be achieved through Hebbian learning, without the need for a lengthy learning procedure (as in Deco 2001), which is not the aim of this work. The result is that the connections that are active for excitatory feedback (or inactive for inhibitory feedback) are those for the features relating to the object.
V4 cell assemblies to IT (Feedforward)
IT to V4 cell assemblies (Feedback)
where z indicates orientation, k, or colour, c.
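A minimal sketch of how such hand-set matrices might be built; the mapping of objects to feature indices is hypothetical, and the ±1/0 entries follow the description above (feedforward +1 for an object's own features; inhibitory feedback 0 for own features and −1 for all others).

```python
import numpy as np

def object_weights(objects, n_features):
    """Hand-set connection matrices between V4 feature layers and IT
    object assemblies, mimicking the outcome of Hebbian learning.
    `objects` maps object index -> set of feature indices (orientation
    or colour layers, z = k or c)."""
    n_objects = len(objects)
    w_ff = np.zeros((n_objects, n_features))   # V4 -> IT (feedforward)
    w_fb = -np.ones((n_objects, n_features))   # IT -> V4 (inhibitory)
    for o, feats in objects.items():
        for f in feats:
            w_ff[o, f] = 1.0  # object's own feature drives it
            w_fb[o, f] = 0.0  # own features escape inhibition
    return w_ff, w_fb
```

For an excitatory-feedback variant, w_fb would instead be 0 everywhere with +1 entries for an object's own features.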
The pyramidal cell assemblies in IT evolve according to the following dynamics:
where β is the weight of inhibitory interneuron input, set to 0.01; Wijk is the feedforward input from V4 relating to orientation information, via weight ; Wijc is the feedforward input from V4 relating to colour information, via weight ; χ is the weight of V4 inputs, set to 2.5; γ is the weight of the object-related bias from prefrontal cortex, set to 1.2; PMv is the object-related feedback current from ventrolateral prefrontal cortex, injected directly into this pool.
This feedback is sigmoidal over time as follows: For the target object:
Other objects receive inhibitory feedback as follows:
where t is the time (in milliseconds) and τsig is the point in time at which the sigmoid reaches half its peak value: set to 150 ms for replication of most single cell simulations and all scan path simulations, and to 200 ms to simulate the longer latency object effects and saccades recorded in IT by Chelazzi et al. (1993). The bias acts to inhibit non-target objects in IT, but is also effective if modelled as an excitatory bias to the target object.
To simulate a sustained mnemonic prefrontal response (in order to replicate the small early target effect in IT), a constant excitatory feedback bias is also added as follows. For the target object only:
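Putting the two components together, the sketch below gives the feedback current as a function of time: a sigmoid sensory component reaching half its peak at τsig (excitatory-equivalent for the target, inhibitory for other objects) plus a small constant mnemonic bias for the target only. The slope, peak and mnemonic magnitudes are illustrative assumptions, not the paper's values.

```python
import numpy as np

def prefrontal_bias(t, is_target, tau_sig=150.0, slope=10.0,
                    peak=1.0, mnemonic=0.05):
    """Two-component prefrontal feedback current at time t (ms).
    tau_sig = 150 ms for most simulations, 200 ms for the longer
    latency variant; `slope`, `peak` and `mnemonic` are hypothetical
    magnitudes chosen for illustration."""
    sensory = peak / (1.0 + np.exp(-(t - tau_sig) / slope))
    if is_target:
        return sensory + mnemonic  # sigmoid + sustained mnemonic bias
    return -sensory                # non-target objects are inhibited
```

This reproduces the qualitative pattern described in the text: a slight target advantage from stimulus onset (the mnemonic term), with the large object-based effect developing only as the sigmoid rises around τsig.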
The remaining terms and parameters are evident from previous equations.
The dynamic behaviour of the associated inhibitory pool in IT, providing competition between objects, is given by:
where λ is the weight of pyramidal cell assembly input, set to 3; μ is the weight of inhibitory interneuron input, set to 1.
The pyramidal cell assemblies in LIP evolve according to the following dynamics:
where β is the weight of inhibitory input, set to 1; Wijk is the orientation input from V4 for orientation k, at location (i, j); Wijc is the colour input from V4 for colour c, at location (i, j); Pijd is the spatial AW bias injected directly into this pool when there is a requirement to attend to this spatial location following fixation; γ is the weight of the spatial AW bias, set to 2.5; Zpq is the bias from area pq of the novelty map (which is the size of the original image, N), where area pq represents the size of the LIP receptive field (see Lanyon and Denham 2004a, for information about the novelty bias); η is the weight of the novelty bias, normally set to 0.0009. To attract the scan path to target-coloured locations, ε > χ, so that colour-related input from V4 is stronger than orientation-related input; normally χ = 0.8 and ε = 4 (but see Lanyon and Denham 2004a for an examination of the relative weighting).
The remaining terms are evident from previous equations.
The dynamic behaviour of the associated inhibitory pool in LIP, providing competition between locations, is given by:
where λ is the weight of pyramidal cell assembly input, set to 1; μ is the weight of inhibitory interneuron input, set to 1.