A fundamental goal in vision science is to determine how many neurons in how many areas are required to compute a coherent interpretation of the visual scene. Here I propose six principles of cortical dynamics of visual processing in the first 150ms following the appearance of a visual stimulus. Fast synaptic communication between neurons depends on the driving neurons and the biophysical history and driving forces of the target neurons. Under these constraints, the retina communicates changes in the field of view, driving large populations of neurons in visual areas into a dynamic sequence of feed-forward communication and integration of the inward current of the change signal into the dendrites of higher order area neurons (30–70ms). Simultaneously, an even larger number of neurons within each area receiving feed-forward input are pre-excited to sub-threshold levels. The higher order area neurons communicate the results of their computations as feedback, adding inward current to the excited and pre-excited neurons in lower areas. This feedback reconciles computational differences between higher and lower areas (75–120ms). This brings the lower area neurons into a new dynamic regime characterized by reduced driving forces and sparse firing, reflecting the visual areas’ interpretation of the current scene (140ms). The population membrane potentials, net inward/outward currents and firing are well behaved at the mesoscopic scale, such that the decoding in retinotopic cortical space shows the visual areas’ interpretation of the current scene. These dynamics have plausible biophysical explanations. The principles are theoretical, predictive, supported by recent experiments and easily lend themselves to experimental tests or computational modeling.
The reasons that visual scientists so far have failed to give a coherent view of cortical functions and visual perception may be both conceptual and methodological. According to the traditional view, visual information reaches the primary visual cortex in parallel pathways from the retina carrying information about fine contrast, color opponency, and motion. From the primary visual cortex segregated parallel pathways convey luminance contrast, color opponent and motion signals to different higher order visual areas in the occipital-temporal lobes and occipital–parietal lobes respectively (Zeki, 1980a,b; Ungerleider and Mishkin, 1982; DeYoe and Van Essen, 1988; Livingstone and Hubel, 1988; Felleman and Van Essen, 1991; Singer, 2004). The visual information is processed according to the accepted scheme for cortical connectivity by a set of areas arranged in a parallel and hierarchical system. Primary sensory areas send information in parallel to different sets of higher order cortical areas that are specialized to compute particular aspects of the sensory information (color, motion, etc.). As the information travels feed-forward simultaneously along parallel cortico–cortical connections, the neurons in subsequent cortical areas react to more and more complex combinations of visual attributes (Movshon et al., 1985; Tanaka et al., 1991; Wang et al., 1996; Carandini et al., 1997). But since neurons in different areas represent different aspects of the visual scene (shape, color, motion, optic flow, etc.) there is a binding problem: how are these aspects put together to a unified percept, say of somebody driving a blue Volvo down main street? One proposal is that neurons representing different aspects of the visual scene fire synchronously in assemblies across areas. 
One set of assemblies will fire synchronously representing the object (the blue Volvo) and other assemblies across areas fire synchronously representing the background (Milner, 1974; von der Malsburg and Schneider, 1986; Gray and Singer, 1989; Singer, 2004). These synchronous firings are frequently in the γ-range (30–80Hz), appearing in the visual areas as γ-oscillations after stimulation (Singer, 2004; Berens et al., 2008).
Even if one assumes that the binding problem has found its solution, there are other major unresolved issues with multiple areas engaged in perception, such as delays. When an object appears in the visual field of view, it is mapped with different latencies in the six layers of cortex (Maunsell and Gibson, 1992). Furthermore, the visual signals reach the different visual areas with different delays (Raiguel et al., 1989; Maunsell and Gibson, 1992; Dinse and Kruger, 1994; Katsuyama et al., 1996; Schroeder et al., 1998; Bullier et al., 2001; Tanaka et al., 2002; Vajda et al., 2004; Chen et al., 2007). The diversity of delays is particularly troublesome for the perception of moving objects, as a moving object will continue to move while the motion signals are distributed to the multiple visual areas.
A problem of yet larger dimensions is how all these neurons, engaged with diverse delays and performing different computations, can work together to produce a coherent interpretation of the physical surround. The suggestion about synchronous firing is a proposed solution to the binding problem, but it does not explain the mechanisms by which the perceptions are created. The perceptual mechanisms are not explained by the notion that neurons in different visual areas by their firing of action potentials (APs) represent different aspects of the visual scene (Sejnowski, 2003). Neither are the perceptual mechanisms explained by the notion that the APs are the encodings of aspects of objects in the physical surround. In fact, the amount of information about objects and aspects of objects in the physical surround in any spike train from any visual area is small (Richmond and Optican, 1990; Heller et al., 1995; Eriksson and Roland, 2006; Eriksson et al., 2010). Furthermore, the spiking of single neurons is unreliable. Repetitions of an identical change in the physical scene evoke large trial-by-trial variations in the timing and rate of the spiking of single neurons in the primary visual cortex. This variance is so big that usually no neuron in a single trial can encode any aspect of the physical scene (Burns and Webb, 1976; Heggelund and Albus, 1978). Finally, the idea that neurons represent physical objects or aspects of objects in the visual field of view ignores that visual information about the physical surrounding is underdetermined (Helmholtz, 1867). The problem of how billions of neurons in mammalian brains compute and communicate their computations is a general and fundamental problem that is not solved yet. To paraphrase Helmholtz (1867), one may say that the task of the visual cortical areas rather is to compute a perceptual solution that matches the physical surrounding reasonably well. 
The neurons in the visual areas do this surprisingly fast; most changes in the visual scene are recognized in less than 150ms (Thorpe et al., 1996). How can the visual cortical areas be so fast? The answer to this is not known, which further illustrates that our current view of the neurobiology of vision is perhaps wrong, but definitely incomplete (e.g., Sejnowski, 2003; Olshausen and Field, 2006).
To understand visual perception one must discover the dynamic mechanisms of neuron communications and neuron computations that drive neurons in many visual areas towards an interpretation of the physical surrounding. Neurons send APs and release neurotransmitter as a biophysical necessity to drive their target neurons into excitation or inhibition. By doing so they change the computations of the target neurons. Eventually these changes will lead the neurons in the visual areas to collectively compute an interpretation of the visual scene. For these dynamics to work, it is not necessary that the single spike train be labelled as representing or encoding particular aspects of the physical surround. To describe how the dynamics evolve, the concepts of neural communication and neural computation will be used, because these two concepts have clear biophysical definitions (see Box for definitions and illustrations of concepts).
Communication, computation, drive, decoding and definitions
Neurons communicate by APs and transmitter release. The distinction between computation and communication may not be entirely sharp, as some transformation of spike trains could take place in axons. Furthermore, at the axon terminal, the presynaptic part indeed performs non-linear operations when the transmitter is released. The effect of communications is either excitatory postsynaptic currents or inhibitory postsynaptic currents. By measuring the membrane potential or current changes of the target neurons one can study the effects of communications (decoding) (Box Figure).
The neuron computations start by afferent inputs to the synapses, the synaptic currents propagate into the dendrites, which perform non-linear operations, and end by producing electrical spike activity, APs in the form of a spike train r(t), or no APs (Roland, 2002). Note that neurons thus compute even if they do not (immediately) send out APs. Any change in their (local) membrane potential Vm will affect also future inputs and outputs. Note that the APs do not show the computations. The APs are the results of the computations.
One can describe all important communications and computations within and between cortical areas by two variables, APs and the membrane potential Vm, at least for a few hundred ms. The neurons communicate and compute within the cortical space consisting of the mesh of axons, dendrites and the somata of the cortical neurons (Figure 1). Communication is possible because neurons can drive their 2000–3000 target neurons. The driving neurons communicate APs to their axon terminals, which drive their target neurons by releasing (many) vesicles of neurotransmitter (Box Figure). In the target neuron dendrite, the induced rate of change in membrane potential, Vm, is proportional to the inward current
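$$\frac{dV_m(t)}{dt} = \frac{1}{C_m}\,\frac{dQ(t)}{dt} \tag{1}$$
in which Cm is the membrane capacitance, which is almost constant (1μF cm−2) for all neurons, and Q is the intracellular charge. The rate of change of the intracellular charge, i.e., the net inward current, can be described as the sum of the excitatory, inhibitory and leak currents:
$$\frac{dQ(t)}{dt} = -g_E(t)\,\bigl(V_m(t)-E_E\bigr) - g_I(t)\,\bigl(V_m(t)-E_I\bigr) - g_L\,\bigl(V_m(t)-E_L\bigr) \tag{2}$$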
By convention, the current that enters the neuron, the inward current, is positive (hence the minus signs on the right side); EE is the excitatory reversal potential, EI is the inhibitory reversal potential, gE is the excitatory conductance, gI the inhibitory conductance, gL is what is termed the leak conductance and EL the leak reversal potential. The leak conductance appears as a term, because the dendrites have ion pumps that pump Na+ out of the dendrite and K+ into the dendrite as well as other ion pumps changing the membrane potential. In vivo, if the dVm(t)/dt increases, and hence the inward current increases, significantly above baseline, this means net excitation; if the dVm(t)/dt is significantly negative (i.e., below the baseline), this indicates net inhibition. Neurons may be net excited without actually firing any APs immediately. If the net excitation prevails, a later communication (often a feedback) targeting these pre-excited neurons can bring them over the threshold to fire.
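As a rough numerical illustration of the Box description, the sketch below evaluates the net inward current as the sum of excitatory, inhibitory and leak currents (inward positive) and divides by Cm to obtain dVm/dt. Only Cm = 1μF cm−2 comes from the text; the reversal potentials, the conductance values and the function names are assumptions chosen for illustration, and the conductance-times-driving-force form is the standard one implied by the definitions above.

```python
# Sketch of the Box equations. Only C_M is taken from the text;
# every other number here is an illustrative assumption.
C_M = 1.0     # membrane capacitance, uF/cm^2 (value given in the text)
E_E = 0.0     # excitatory reversal potential, mV (assumed)
E_I = -75.0   # inhibitory reversal potential, mV (assumed)
E_L = -70.0   # leak reversal potential, mV (assumed)


def net_inward_current(v_m, g_e, g_i, g_l):
    """Net inward current dQ/dt in uA/cm^2 (inward is positive).

    v_m is in mV and the conductances are in mS/cm^2.
    """
    return -g_e * (v_m - E_E) - g_i * (v_m - E_I) - g_l * (v_m - E_L)


def dvm_dt(v_m, g_e, g_i, g_l):
    """Rate of change of the membrane potential in mV/ms."""
    return net_inward_current(v_m, g_e, g_i, g_l) / C_M


# At the leak reversal potential with no synaptic input nothing changes.
rest = dvm_dt(-70.0, g_e=0.0, g_i=0.0, g_l=0.1)
# Opening an excitatory conductance gives a positive dVm/dt (net excitation).
excited = dvm_dt(-70.0, g_e=0.05, g_i=0.0, g_l=0.1)
# Opening only an inhibitory conductance at -65 mV gives a negative dVm/dt.
inhibited = dvm_dt(-65.0, g_e=0.0, g_i=0.05, g_l=0.0)

print(rest, excited, inhibited)
```

With these assumed values the sign of dVm/dt separates net excitation from net inhibition, which is the distinction the Box draws; the magnitudes have no experimental meaning.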
The difference between the reversal potential and the actual membrane potential is the input driving force. The excitatory (Vm−EE) and inhibitory (Vm−EI) driving forces of the target neurons, all other factors equal, determine how fast and to what extent the communications the neurons receive can drive the neurons towards their firing threshold or keep them away from this threshold. Note that the excitatory driving force is largest when a prevailing inhibition is just released.
Decoding is the immediate change in the inward/outward currents and membrane potentials of the target neurons. The target neurons in area 17 of the visual cortex decode the information about the changes in the visual scene contained in the ON r(t) response. The decoding is the immediate change in inward current, d(Vm(t))/dt, and the resulting immediate change in the membrane potential, ΔVm(t).
The most important variables for understanding rapid cortical dynamics are the membrane potential, Vm, with its temporal derivative, dVm(t)/dt, and the APs (Box). The neurons emit APs to communicate with other neurons. The APs are the results of the computations of a neuron. When a neuron communicates, it sends APs to the 2000–5000 neurons it communicates with. The communication with the target neurons becomes possible because the neuron, through its transmitter release, can drive the target neurons. For the many target neurons, the probability of being driven to excitation and firing depends on many factors: the type of target neuron and its dendritic tree (Contreras and Palmer, 2003; Elston, 2003; Shinomoto et al., 2009), the instantaneous rate of the afferent APs, whether the target neurons are in up-state or down-state, the recent history of the membrane currents (Truccolo et al., 2009), the rate with which the inward current increases (Eq. 2 in Box), and the membrane potential Vm. In general, the firing probability increases with increases in the dVm/dt (Azouz and Gray, 2000). The prerequisite for an excitatory neuron to drive is that the excitatory driving force Vm−EE in the target neurons is sufficiently large (Eq. 2 in Box). With the exception of the APs, the Vm excitation rarely exceeds 25mV over baseline in vivo (baseline approximately −70 to −65mV) (Contreras and Palmer, 2003; Petersen et al., 2003a,b; Crochet and Petersen, 2006; Haider et al., 2006; Ferezou et al., 2007; Rudolph et al., 2007; Monier et al., 2008). As it is possible in vitro to drive the Vm into more excitation, it is thought that there is a considerable simultaneous outward current (shunting inhibition from Cl− conductance) in vivo preventing the Vm from going into more excitation (Borg-Graham et al., 1998; Haider et al., 2006; Rudolph et al., 2007; Monier et al., 2008). 
Of importance for the following is that, if the target population Vm is high (increased by 12–20mV), the excitatory drive decreases and the inhibitory drive increases. Thus adding excitation to an already high Vm will often bring the neurons over their firing threshold, but will most likely result in sparse or moderate firing. Conversely, if the Vm is low, the excitatory driving force increases and the inhibitory driving force decreases in the target neurons (Box Figure). So the conditions for driving a target population in which inhibition was just released are good. A good example of this is the ON response in the retino-geniculate and geniculate-cortical spike trains. Also, if a population of neurons is already undergoing an increase of the inward current, an excitatory communication to these neurons from another source will add to the inward current and readily fire the neurons, especially the excitatory pyramidal neurons.
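The dependence of the driving forces on the population Vm can be made concrete with a few lines of arithmetic. The sketch below compares a baseline Vm with depolarizations of 12 and 20mV, the range mentioned above. The baseline of −70mV lies within the range quoted in the text, whereas the reversal potentials and the helper name `driving_forces` are assumptions made only for this illustration.

```python
# Driving forces as a function of the membrane potential.
# The reversal potentials are assumed, typical textbook values.
E_E = 0.0     # excitatory reversal potential, mV (assumed)
E_I = -75.0   # inhibitory reversal potential, mV (assumed)


def driving_forces(v_m):
    """Return (excitatory, inhibitory) driving force magnitudes in mV."""
    return abs(v_m - E_E), abs(v_m - E_I)


baseline = -70.0  # mV, within the in vivo baseline range quoted in the text
table = {}
for depolarization in (0.0, 12.0, 20.0):
    v_m = baseline + depolarization
    table[depolarization] = driving_forces(v_m)
    exc, inh = table[depolarization]
    print(f"Vm = {v_m:6.1f} mV  excitatory = {exc:5.1f} mV  inhibitory = {inh:5.1f} mV")
```

Under these assumptions the excitatory driving force shrinks and the inhibitory driving force grows as Vm rises, so the same excitatory conductance delivers less inward current to an already depolarized population.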
The target neurons' computations start with the synaptic currents (Box). Excitatory communication gives inward synaptic current, inhibitory communication gives outward current. The synaptic currents propagate into the dendrites. Here the complicated interactions between many membrane currents, axial currents and dendritic conductances take place in a non-linear manner. The excitatory dendritic potentials may propagate through the soma of the neuron to the axon to generate APs. Or the dendritic potentials are stopped by inhibitory conductances in the dendrites or soma, such that no APs can be generated. In both cases the neuron computes (Box). If it does not generate any APs, it does not communicate its computations.
What drives single neurons also drives populations of neurons. The two equations (Box) are valid for single neurons as well as for populations of neurons (Eriksson et al., 2008). At a certain cortical point, one can still describe the membrane potential changes of all membranes by Eq. 1 or 2 (but not assume that all individual membranes make identical contributions to the population value). In this paper I have chosen a deterministic formulation of the dynamics of visual cortical neurons, but the formulation could as well have been stochastic without any changes in the principles or their underlying mechanics.
The description of the neural communication and computation dynamics occupies large sections of this paper. The purpose is to describe how the single neuron computations and the communications to target neurons, the dVm/dt and driving forces in the target neurons and the resulting space–time dynamics of communications between neurons, within an area and between cortical areas, drive the brain to an interpretation of the physical surround.
One problem is that the dynamics of the computations and communications take place in a very complex network of neurons connected by billions of synapses.
Imagine that we put down a small probe of 50μm in diameter in the cerebral cortex. If the cortex is 1.4mm thick and has about 80000 neurons per mm3 (visual cortex), the probe will capture about 220 cell bodies of neurons. The sampled tissue cylinder with a cortical surface diameter of 50μm will contain not only the cell bodies and dendrites and axon terminals of these 220 neurons, but also the dendrites and axons of neurons surrounding the tissue cylinder. If one assumes that the average dendritic span of a neuron is 600μm (Lübke et al., 2003), the small tissue cylinder will contain the dendrites of approximately 75000 neurons. This number may be larger in higher order sensory areas and the prefrontal cortex as the dendritic span is much larger in these areas compared to the primary sensory areas (Elston, 2003). In addition the small cortical cylinder contains many axon terminals from neurons within the same area and from neurons in other cortical areas (Figure 1). In this cortical space, the mesh, the spatio-temporal changes of synaptic activity could take place in a very complicated way, making it virtually impossible to understand the cortical dynamics even if one had simultaneous measurements of many neurons in the mesh. To reveal the cortical dynamics evolving after a change in the physical surround, one must get space–time maps of the evolution of changes in Vm, i.e., dVm/dt, and changes in APs, i.e., r(t), ideally from all visual areas, because these maps will give information about the computations and communications in the visual areas. These quantitative maps must be in real time (ms) and the space must be the real space (=the mesh) as there is no known method of reducing this space. Some of the figures provide examples of such measurements of space–time maps. 
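The figure of about 220 cell bodies follows from the volume of the sampled cylinder multiplied by the neuron density. The short sketch below reproduces that arithmetic with the values stated in the text; the function name is hypothetical, and the estimate of 75000 neurons contributing dendrites is not recomputed here because its geometric assumptions are not spelled out.

```python
import math


def neurons_in_cylinder(diameter_um, thickness_mm, density_per_mm3):
    """Expected number of cell bodies in a cortical cylinder.

    The cylinder spans the full cortical thickness and has the given
    surface diameter; density is in neurons per cubic millimetre.
    """
    radius_mm = diameter_um / 1000.0 / 2.0
    volume_mm3 = math.pi * radius_mm ** 2 * thickness_mm
    return volume_mm3 * density_per_mm3


# Values from the text: 50 um probe, 1.4 mm thick cortex, 80000 neurons/mm^3.
count = neurons_in_cylinder(50.0, 1.4, 80000.0)
print(round(count))  # prints 220
```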
Surprisingly, despite the complexity of the mesh, the real-time computations and communications in the living cerebral cortex have a fairly simple spatio-temporal course in the mesh at the mesoscopic (population) scale. This implies that it is possible to derive principles of the computation and communication dynamics from measurements of Vm and r(t) in this mesh.
The ingredients in the proposal are (1) the biophysical concepts of computation, communication, driving, driving force, and decoding (see Box). (2) Two principles which summarize the biophysical description of neural communication: Neuron populations in any area can drive their target neurons to change dVm/dt with the constraints of Eq. 2 and the factors summarized earlier. Conversely, if the instantaneous rate, r(t) → 0 or r(t) is constant, i.e., dr(t)/dt→0, and the excitatory driving forces of the target neurons are small, the communication between areas diminishes. (3) One axiom for visual transients: The retina transmits changes in the visual scene. (4) The proposal of three principles of (visual) dynamic network properties: pre-excitation, that feedbacks reconcile differences in computations between higher and lower visual areas, and that the network dynamics in visual areas are well behaved. With these ingredients, one can show that a change in the visual scene is associated with single neuron computations and communications to target neurons, changes in the dVm/dt and driving forces in the target neurons and resulting space–time dynamics of communications between neurons within an area and between cortical areas driving the brain to an interpretation of the physical surround in less than 150ms.
When the retinae detect a change in the field of view, the neurons in the visual areas engage into a dynamic sequence starting by driving the neurons into feed-forward (FF) communication and pre-excitation – then integration of the change signal in higher areas – feedback – reconciling computational differences – decoding of the visual scene – reduced driving forces – decreased communication, and sparse firing lasting until offset of the stimulus or the appearance of a new change in the visual scene. The dynamic sequence is described chronologically from 0 to 150ms. For each transient state in the dynamic sequence, the neurophysiologic dynamics of the Vm, dVm/dt, and r(t) will be illustrated with recent experimental results, and the biophysical principles and mechanisms underlying this dynamics will be discussed. This constitutes the main contents. In order to keep the text within reasonable limits, only a few of the numerous predictions are mentioned. Computational models of the visual cortex, even with simpler architecture than the actual mesh, could probably express the dynamics embedded in the dynamic sequence. However, such computational models most likely have to be large-scale models, as models with small degrees of freedom do not express the transient dynamics examined in this paper (Rabinovich et al., 2008). To keep the text within reasonable limits, large-scale computational models are neither presented nor discussed here.
Neurons simply drive other neurons into excitation or inhibition. There is no need for any distributing or gating mechanisms to regulate the communications between neurons. This principle also implies that there is no default order of excitation of cortical areas. Neurons sending their axons to other cortical areas are excitatory (Maunsell and Van Essen, 1983; Ottersen and Storm-Mathisen, 1986; Loewenstein and Somogyi, 1991; Rockland and Drash, 1996; Anderson and Martin, 2002). The cell bodies of driving neuron populations can be located in any cortical area: prefrontal, motor, higher order sensory, and primary sensory. As each neuron in cortex sends axon terminals to at least 2000–3000 other cortical neurons (Colonnier, 1981; Somogyi et al., 1998), the number of target neurons will be large when even a small population is driving (Roland, 2002) (for definitions see Box).
Two examples might illustrate the principle of neuronal driving and its consequence of no default communication between cortical areas. First, one would expect visual areas to process visual information and somatosensory areas to process somatosensory information. But in situations with simultaneous somatosensory and visual stimulation, neurons in somatosensory areas send synaptic excitations traveling into visual areas, and conversely, visual areas send synaptic excitations traveling into somatosensory areas (Takagaki et al., 2008). Second, visual areas can be very active without visual stimulation and without organized visual activity. Local dynamics may drive the neurons of a visual area into an up-state, during which the neurons are more excited with Vm around −55mV and the r(t) increases considerably (Destexhe and Paré, 1999; Petersen et al., 2003b; Crochet and Petersen, 2006; Ferezou et al., 2006). Occasionally the increased firing rates of a neuron population in an up-state drive target neurons in adjacent cortical areas to excitation and firing. Figure 2 shows an example of such a propagating net excitation. Here the higher parietal visual areas are in an up-state which spreads over the cortex in feedback direction to the primary visual area 17. Similar propagating up-states have also been observed in anesthetized rodents and turtles (Prechtl et al., 1997; Petersen et al., 2003a,b; Crochet and Petersen, 2006; Ferezou et al., 2006; Lippert et al., 2007; Xu et al., 2007; Takagaki et al., 2008).
In experiments, the Vm dynamics disappears when glutamate antagonists are applied to the cortex in vivo (Berger et al., 2007). This indicates that synaptic activity is the major factor in producing the rich Vm and dVm/dt dynamics in vivo.
The dynamics in communication within and between cortical areas is limited only by how the neurons interconnect by synapses (and by gap-junctions). Within this anatomical constraint, spontaneous as well as evoked APs can drive neuron populations in target areas.
Only a minor proportion of the physical surround can be seen. The term the visual scene refers to the part of the physical surround that can be detected by vision. Under normal viewing conditions the rods are saturated, so the cones convey information about the visual scene. Cones only react to changes, i.e., an increase or decrease in the luminance or wavelength composition of the reflected light. To this the retinal ganglion cells react with an ON response. This ON response signals the change in the visual scene and not the visual scene itself (Kuffler, 1953; Enroth-Cugell and Jones, 1961; Hughes and Maffei, 1966; Enroth-Cugell et al., 1983; Heggelund et al., 1989). Similarly, the information that is conveyed by the lateral geniculate nucleus (LGN) ON response is about changes in the visual scene and not the actual new scene itself.
This means that the neurons in the primary visual cortex, area 17, initially after a change, should decode changes in the visual scene, but not the actual scene (Kinoshita and Komatsu, 2001; Eriksson et al., 2010). This hypothesis was tested by Eriksson et al. (2010). For the first 20ms after the change, the r(t) was correlated with the previous scene. Thereafter, for the next 50–60ms, as predicted by Eriksson et al. (2010), the r(t) was correlated with the difference in luminance between the previous and current scene, i.e., correlated with the change in the visual scene (Figure 3). Both the r(t) of single neurons and the r(t) of a large neuron population in area 17 correlated significantly more with the luminance differences than with the previous or current visual scene (Figure 3). Only 90ms after the sudden change in the visual scene was the r(t) correlated with the current scene.
In retrospect, these findings by Eriksson et al. (2010) explain several observations. When we wake up from a dreamless sleep, the visual cortex will undergo the transition from no visual content to receiving all changes present in the surroundings. This difference pattern will be very close to the actual scene content. Similarly, most visual physiologists present a homogeneous screen and on top of this they present a stimulus. In this case, the changes in luminance and contrast introduced by the stimulus would be close to the stimulus itself. Only when the visual scene is shifted from one ms to the next would one measure the correlations to the difference between the past and the present scene in the spike trains. Under such conditions, the orientation preference of the area 17 neurons will initially reflect the luminance contrast change rather than the true orientation of the stimulus (Ringach et al., 2003; Xing et al., 2005; Nikolic et al., 2007; McLelland et al., 2009; Eriksson et al., 2010). Note however that the average spike correlation to the difference pattern as well as the current scene pattern is small (Figure 3). This is in accordance with other studies (Richmond and Optican, 1990; Heller et al., 1995; Kinoshita and Komatsu, 2001; Eriksson and Roland, 2006) and further emphasizes that much of the spiking activity has other purposes.
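The argument about homogeneous screens can be checked with a toy calculation on luminance patterns alone. The sketch below computes the difference pattern between a previous and a current scene and correlates it with the current scene. All luminance values and function names are invented for illustration; no spike trains are modeled, so this only shows why a difference signal and a scene signal are indistinguishable when the previous scene is uniform.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def difference(current, previous):
    """Luminance change at each location between two scenes."""
    return [c - p for c, p in zip(current, previous)]


# Hypothetical luminance values at six retinotopic locations.
current = [0.2, 0.9, 0.4, 0.7, 0.1, 0.6]
uniform_previous = [0.5] * 6                        # homogeneous screen
structured_previous = [0.1, 0.8, 0.9, 0.2, 0.3, 0.4]  # another patterned scene

r_uniform = pearson(difference(current, uniform_previous), current)
r_structured = pearson(difference(current, structured_previous), current)
print(r_uniform, r_structured)
```

With a uniform previous scene the difference pattern is the current scene shifted by a constant, so the correlation is essentially 1; with a structured previous scene it is clearly lower, and only then can the two hypotheses be separated experimentally.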
To each change in the visual scene, the retinal ganglion cells make ON responses. The ON response is a rapid increase in the firing rate, r(t), followed by a slightly slower decrease (Figure 4). The retinal ON responses drive neurons in early visual areas to fire similar ON responses and communicate ON responses to several visual areas with neurons having favorable driving forces. Within each area, the population receiving the ON response drives neurons in the surround into a pre-excitation. When the pre-excitation is maximal, higher visual areas send feedback (FB) to all neurons that received the communication of the visual change. The targets of the FB are the large populations of neurons in lower order areas, which at this moment are pre-excited or firing. Figure 4 shows the r(t) ON response and related variables in the primary visual area, area 17, during the feed-forward and the FB phases.
The dynamic sequence is a consequence of the principle that the retina transmits changes in the visual scene as driving ON responses and the principle that neurons in any area can drive their target neurons into excitation if the driving forces of the target neurons are favorable. Examples of changes in the visual scene are when an object suddenly appears or an object moves into the visual field of view. Sudden changes in the visual scene are transients that are especially advantageous for the study of cortical dynamics. In artificial vision, scientists often use stimuli with continuous oscillating changes, for example gratings drifting with a constant velocity. For such stimuli, the dynamic sequence will appear in the first 100–120ms after the start of the stimulus. It is outside the scope of this article to analyze the further dynamics associated with such stimuli.
Figure 5 shows a cartoon of the whole dynamic sequence when a new object appears in the field of view. This may be divided into the spatio-temporal cortical dynamics of the communications within and between areas (Figure 5A), and the temporal dynamics in Figures 5B,C of the local computations where the object is mapped in each area. Area 17, the primary visual area, has detailed retinotopy, i.e., the topology of the area resembles the topology of the retina. In area 17, the neurons have small receptive fields. The second area, exemplified by area 21, is also a retinotopic area, but with less distinct retinotopy and neurons with large receptive fields. The two last example areas are PP in the parietal cortex and one area in the inferior temporal cortex, IT. The IT area is a non-retinotopic area (Nelson and Bower, 1990), like that found in the inferior part of the temporal lobe of carnivores and primates (Figure 5A).
Consider first the temporal changes in driving forces, membrane currents and firing frequency, r(t), when an object pops up in the visual scene. When the object pops up it elicits ON responses in retinal ganglion cells. The retinal ON responses drive the neurons of the LGN to generate similar ON responses. If neurons in area 17 fire sparsely, the ON responses from the LGN drive the neurons in area 17 efficiently because most neurons are likely to have a Vm varying around −65 to −75mV (Destexhe et al., 2003; Crochet and Petersen, 2006; Ferezou et al., 2006; Rudolph et al., 2007; Eriksson et al., 2008). The neurons located at the retinotopic cortical site corresponding to the retinal change compute a similar ON response, a sharp increase in r(t) and a sharp net increase in the inward current (excitation) (Figure 5B). The outward current also increases simultaneously, but not to the extent that it prohibits the net excitation (Borg-Graham et al., 1998; Monier et al., 2008). The ON responses from the layer IV neurons communicate net excitation to the dendrites of layer III neurons in the output layer. The layer III neurons, in turn, compute similar ON responses and communicate these to areas 19 and 21 and eventually to parietal visual areas (PP) and to scattered populations of neurons in the inferior temporal cortex (IT) (Figure 5B).
If a moving object appears in the field of view, the temporal dynamics of areas 17, 18, 19, and 21 are quite similar, suggesting that the biophysical mechanisms in area 17 may also be similar (Figure 5C). A moving object also elicits a retinal ON response that, in most trials, drives the area 17 neurons into a similar excitatory ON response with increases in dVm/dt, r(t) and Vm (Figure 5C). From area 17, areas 18, 19, and 21 are also driven into excitatory ON responses by FF communications, albeit with a lag of 10ms for areas 19 and 21 and parietal visual areas (Figure 5C). In the particular case of objects moving over the retina, the driving forces of the retinal input are now severely reduced where the object was first mapped in area 17 (because the object has moved on).
The spatio-temporal dynamics is known from recent publications. It evolves as follows for a static object appearing in the visual scene. First the object is mapped retinotopically in area 17 as a dynamic bump in dVm/dt, r(t), and Vm (a bump is a smooth mathematical function with a precise topology) (Figure 6). Locally, in the bump, some neurons fire ON responses as already mentioned, but the number of neurons that just increase their inward current, but do not fire, by far exceeds the number of neurons firing (Roland et al., 2006; Berger et al., 2007). This is probably because each neuron sends axons to 2000–5000 target neurons, most of these in the local surround. This makes the population net dVm/dt lag the r(t) by a few ms in Figures 4 and 5B,C (Eriksson et al., 2008). The layer III object-mapping neurons drive target neurons surrounding the bump (Nauhaus et al., 2009), because the Vm in the remainder of areas 17 and 18 is not directly driven by the retinal communication and the target neurons in the surround thus have favorable excitatory driving forces (Figures 5A and 6). This lateral excitation spreads in all directions from the population of mapping neurons (Grinvald et al., 1994; Bringuier et al., 1999; Slovin et al., 2002; Roland et al., 2006) (Figure 5). As seen from Figure 6, the net excitation is larger where the object is mapped, but not negligible in the rest of areas 17 and 18 (where the object background is mapped). Pre-excitation in the form of lateral spreading net excitation is also seen in other species (Petersen et al., 2003a; Homma and Tanifuji, 2006; Benucci et al., 2007; Ferezou et al., 2006, 2007; Lippert et al., 2007; Xu et al., 2007; Han et al., 2008; Nauhaus et al., 2009).
Pre-excitation is proposed as a general dynamic mechanism, bringing the target neurons into a state of net-inward currents that is insufficient to make them fire, but large enough for them to fire when neurons from another area communicate. For example, most neurons in areas 17 and 18 are pre-excited. When the feedback to these early visual areas arrives from higher order areas, many are brought over their firing threshold. At 65–70ms the lateral spreading pre-excitation reaches its maximal amplitude.
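The pre-excitation idea can be illustrated with a minimal sketch of a single leaky membrane, under assumptions that are not taken from the experiments cited here. The function name `simulate_vm`, the resting potential, threshold, time constant, and input amplitudes are all hypothetical values chosen only so that lateral input alone, or feedback alone, stays below threshold while their sum crosses it.

```python
def simulate_vm(i_lateral, i_feedback, fb_onset_ms=75.0, t_end_ms=150.0,
                dt_ms=0.1, v_rest=-70.0, v_thresh=-55.0, tau_ms=10.0):
    """Forward-Euler integration of a leaky membrane without spike reset.

    All parameters are illustrative assumptions. Inputs are expressed
    directly in mV of steady-state depolarization (input resistance = 1).
    Returns (time of first threshold crossing in ms or None, peak Vm in mV).
    """
    v = v_rest
    peak = v
    crossing = None
    for k in range(int(t_end_ms / dt_ms)):
        t = k * dt_ms
        # Lateral pre-excitation is present throughout; feedback arrives later.
        i_total = i_lateral + (i_feedback if t >= fb_onset_ms else 0.0)
        v += dt_ms * (-(v - v_rest) + i_total) / tau_ms
        peak = max(peak, v)
        if crossing is None and v >= v_thresh:
            crossing = t
    return crossing, peak


if __name__ == "__main__":
    for label, lat, fb in [("pre-excitation only", 10.0, 0.0),
                           ("feedback only", 0.0, 8.0),
                           ("pre-excitation + feedback", 10.0, 8.0)]:
        crossing, peak = simulate_vm(lat, fb)
        print(f"{label}: crossing={crossing}, peak Vm={peak:.1f} mV")
```

In this toy setting only the combined case reaches threshold, and it does so a few milliseconds after the assumed feedback onset. The sketch ignores inhibition, spike generation, and population effects, so it shows the logic of the proposal rather than a model of the data.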
The layer III neurons are also the output neurons to other areas (Rockland, 1997) and drive the retinotopically corresponding neurons in areas 19 and 21 to net excitation and firing (Figures 5B and 6). This produces a smaller bump where the object is mapped, and subsequently a more moderate lateral pre-excitation (Figure 6). Thus during the first 70ms large populations of neurons in many visual areas become excited just below threshold or fire because the excitatory driving forces are favorable to the FF communication of the retinal change in these areas. In most of the areas receiving the ON response, some neurons fire, but many more become net-excited below the firing threshold (Figures 5A and 6).
If a moving object appears in the field of view, the spatio-temporal dynamics roughly follows a similar scheme. Communications from the LGN drive an initial bump in area 17 mapping the moving object. The neurons in the bump drive a lateral spreading pre-excitation and communicate FF to higher order areas (Figure 7). Figure 5C shows the driving neurons communicating the r(t) ON response to the target population in area 21 and the net-inward currents in the neurons mapping the moving object at the area 17/18 border and the 19/21 border during FF and FB. If the object moves over the retina, the LGN-cortical communication of r(t) will target neuron populations located at successively more lateral positions from the cortical point where the moving object was initially mapped. As the neurons at the cortical site of the initial mapping fire and drive their target neurons outside this site into a lateral spreading excitation (Figure 7A), the moving ON input to neurons in the cortical direction of motion adds net-inward current to the already excited neurons, but only in the direction of cortical motion. When the ON communication from the LGN moves further, only the target neurons in the direction of cortical motion get additional inward current. This produces a pre-excitation and sparse firing ahead of the moving map of the object in areas 17/18 and 19/21, where most of the neurons are excited below firing threshold but waiting for an additional excitation to bring them over threshold (Figures 7A,B). According to Harvey et al. (2009) this mechanism repeats and is enhanced in higher order areas 19 and 21, which compute an extended excitation ahead of the mapping of the object. This computation is communicated as an excitatory FB to areas 17 and 18 at the time when the excitatory driving forces are favorable (84–99ms; Figure 7).
As the specific directional pre-excitation has just evolved in areas 17 and 18, the inward current from the excitatory FB now adds to it and fires the neurons as far as 8° ahead (Harvey et al., 2009) (Figures 7A,B and 10).
Despite the uncertainty in predicting dynamics of complex systems in general, the dynamic sequence is the most likely outcome from a sudden change in the visual surround. The neurons in visual areas are driven by retinal-LGN ON responses, if the driving forces are favorable. In each visual area the neurons fire at the retinotopic sites and from these mapping neurons, many more neurons become pre-excited either by lateral net excitation or specifically in the cortical direction of object motion during the first 70ms.
In order to compute the current scene under sudden visual scene shifts, the visual cortical neurons must integrate the change signal with the activity remaining from the previous scene in order to gain information about the present scene. Thus an optimal decoding will integrate the inward current induced by the ON response with the current membrane potential. Although the pyramidal neurons in area 17 receiving the ON response from the LGN have some integrating properties, the area 17 neurons tend to respond with similar ON responses as the neurons in the LGN (Jung et al., 1952; Hubel and Wiesel, 1962; Hirsch et al., 1998; Martinez et al., 2002).
In the higher order inferior temporal visual areas, the constituents of the visual scene are mapped according to shape, not according to retinal position (Tanaka et al., 1991; Wang et al., 1996; Tsunoda et al., 2001; Yamane et al., 2006). This mapping shows size and gaze shift invariance (Tovee et al., 1993; Ito et al., 1995) meaning that the IT neurons lost information of the relations between figure and background, retinal position, and object size (Baylis and Driver, 2001). This information however is present in the parietal visual areas and may be present in the FB from these areas (Eriksson and Roland, 2006; Roland et al., 2006).
Although there is sparse information about the membrane dynamics in IT, the driving by afferent axons is such that the whole area becomes net excited with some maxima scattered over the area and neurons firing most at the locations of the maxima (Tanaka et al., 1991; Wang et al., 1998; Homma and Tanifuji, 2006; Yamane et al., 2006) (Figure 5A).
The neurons in the inferior temporal cortex integrate over longer time-intervals compared to those in the primary and early visual areas (Kovács et al., 1995; Schroeder et al., 1998; Freedman et al., 2003; Brincat and Connor, 2006; Chen et al., 2007; Akrami et al., 2009) (Figure 5B). As the inferior temporal neurons start their integration some 50ms after the sudden scene change, it takes some time to compute the current scene. In IT cortex the neurons integrate from 50 to 70ms the change in the visual scene communicated to them in the form of an ON r(t) from lower order visual areas. The layer III neurons are the first to be driven in IT by the FF communication (Schroeder et al., 1998; Chen et al., 2007). The integration may take place in the dendrites of the neurons in supragranular layers. Simultaneously, memories could be retrieved from the dendrites in supragranular layers (Roland, 2002). The retrieved memories are r(t)s which correlate with the recognized objects (Liu et al., 2009). The proposal is that this information is then communicated as a FB to lower order areas. The dendrites of the neurons of the IT network integrate what is communicated to them by increasing their inward currents to provide a net increase in the excitation (dVm/dt) peaking around 70–80ms (Figure 5B). The net increases in the inward current drive the IT neurons to fire and in turn their target neurons to fire above their spontaneous rate (Li et al., 1993; Nakamura et al., 1994; Kovács et al., 1995; Schroeder et al., 1998; Rolls and Deco, 2001; Tompa et al., 2005; Chen et al., 2007). It is assumed that some of the target neurons have FB axons to areas 21, 19, and 17 (Rockland, 1997).
As seen in Figure 3, the r(t) correlation to the current scene in area 17 dominates after 90ms. This is just after the FB from the higher order visual areas has reached area 17 (Figures 5B, 6, and 7). The FB arrival in area 17 gives rise to a second and smaller increase in r(t) in area 17 peaking just after 100ms (Roland et al., 2006) (Figures 4–6 and 8). Since the IT neurons may be better suited than the area 17 neurons to compute the objects of the current scene, the area 17 neurons may depend on the FB from IT to convert their change-decoding into a decoding of the current scene.
In addition to the neurophysiological evidence for change–integration decoding by visual areas, there are other arguments. The first is that if the perception of the scene is to be completed in 120–150ms, there is no time for any further cycles of FB and FF for recognizing the new items and the new scene. Indeed humans are able to recognize objects shown for 60–70ms even when such short exposure is followed by a mask (Kovács et al., 1995; Thorpe et al., 1996; Delorme et al., 2000), whereas recognition of the whole scene takes 40–60ms more (Rieger et al., 2005). Nor is there any chance that a second ON response to the same change in the visual scene should develop which could drive the integration of the change once more. The purpose of the FB from IT cortex to lower order visual areas is then suggested to be communication of the interpretation and recognition of current objects to lower order areas.
In cortical dynamics, feedback (FB) is the communication of a higher order area's computational results to a lower order area. The FB described here is a cortico–cortical FB appearing usually 70–120ms after the change in the visual scene. This FB is an expression of the dynamics of the network of neurons in the visual areas and is independent of whether the animal is awake or anesthetized (Eriksson and Roland, 2006; Roland et al., 2006; Xu et al., 2007; Takagaki et al., 2008; Harvey et al., 2009). This distinguishes FB from other “top–down” phenomena associated with attention, expectation and anticipation (Roland, 1981; Friedman-Hill et al., 2003; Ro et al., 2003; Maunsell and Treue, 2006). Attention-related FB also seems to have a different time course, 150–300ms (Mehta et al., 2000a,b). The FB is a consequence of the principle that neurons drive their targets if the driving forces are favorable.
Neurons in higher areas from 45 to 75ms compute what is communicated to them to arrive at results that differ from those computed in the primary and lower order visual areas (Shinomoto et al., 2009). Examples are categorization of objects, memory retrieval and memory formation, interpretations and predictions (Freedman et al., 2003; Brincat and Connor, 2006; Ahmed et al., 2008; Akrami et al., 2009; Harvey et al., 2009; Liu et al., 2009). In addition the receptive field size increases from 1° in area 17 to 60° in higher order retinotopic areas (Rosa et al., 1997; Gattass et al., 2005). The span of the dendritic branches and their number of synapses increases from lower to higher order areas (Elston, 2003), and the anatomical circuitry also changes (Barone et al., 2000). Neurons in the inferior temporal cortex integrate over longer times than do area 17 and 18 neurons.
There is increasing evidence that populations of neurons in higher order areas send FB to early areas 17 and 18 in the time interval 70–120ms (Lamme, 1995; Vanduffel et al., 1997; Hupé et al., 1998; Angelucci et al., 2002; Salazar et al., 2004; Eriksson and Roland, 2006; Roland et al., 2006; Lippert et al., 2007; Xu et al., 2007; Ahmed et al., 2008; Scholte et al., 2008; Takagaki et al., 2008; Harvey et al., 2009; Wibral et al., 2009). The direct evidence for FB stems from observations of (“waves” of) net synaptic excitation (Berger et al., 2007; Eriksson et al., 2008) moving with relatively high velocity over the cortex from higher order areas to lower order areas (Eriksson and Roland, 2006; Roland et al., 2006; Lippert et al., 2007; Xu et al., 2007; Ahmed et al., 2008; Takagaki et al., 2008; Harvey et al., 2009). The spatial progress of the FB over the cortex matches the course of the FB axons, in species in which there is anatomical evidence (Cantone et al., 2005, 2006; Eriksson and Roland, 2006; Roland et al., 2006; Ahmed et al., 2008; Harvey et al., 2009) (Figures 6–8). This suggests that the FB axons make synapses on their way to lower areas (Rockland, 1997). Other, indirect, evidence for FB stems from cross-correlations between local field potentials in higher order areas and area 17 with a lag of some 5–10ms (Von Stein et al., 2000; Eckhorn et al., 2004; Salazar et al., 2004; Wibral et al., 2009) and cross-correlations in the firing of neurons in supra- and infra-granular layers with the laminar firing in higher order areas (see below). Yet other evidence for FB stems from temporary or permanent interference with the function of higher order areas (Mignard and Malpeli, 1991; Vanduffel et al., 1997; Hupé et al., 1998; Galuske et al., 2002).
The neurons in the temporal cortex and the neurons in parietal cortex generate FB to area 17 and lower order sensory areas (Figures 5A and 8). There is at least one FB to any change in the visual scene (Roland et al., 2006; Xu et al., 2007; Ahmed et al., 2008; Harvey et al., 2009). Feedback is the rule, rather than the exception (1st principle). Since the FBs are the computational results of the populations of neurons in higher order areas, and since the higher areas should be able to compute interpretations of any visual scene, the feedbacks should be diverse in origin, shape, amplitude, timing, cortical trajectory, and targets. This diversity in turn should depend partly on the changes in the visual scene, for example whether the objects in the scene are moving or the scene is stationary. It is not known exactly how the excitatory FB is computed, but laminar recordings of local field potentials from temporal visual areas show excitatory activity of infra-granular neurons just prior to the FB (Schroeder et al., 1998; Chen et al., 2007). Experimentally the feedbacks are fast, propagating with velocities of 0.2–0.3 mm/ms over the cortex, and in general excite large populations of neurons in lower order areas (Eriksson and Roland, 2006; Roland et al., 2006; Xu et al., 2007; Ahmed et al., 2008; Takagaki et al., 2008; Harvey et al., 2009).
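To give a feeling for what a propagation velocity of 0.2–0.3 mm/ms means in time, the following sketch converts a cortical path length into a travel time. The 3 mm path length is a hypothetical value chosen for illustration, not a measured distance between any of the areas discussed, and `travel_time_ms` is an illustrative helper name.

```python
def travel_time_ms(distance_mm, velocity_mm_per_ms):
    """Time in ms for a wave front to cover distance_mm at a constant velocity."""
    if velocity_mm_per_ms <= 0:
        raise ValueError("velocity must be positive")
    return distance_mm / velocity_mm_per_ms


if __name__ == "__main__":
    path_mm = 3.0  # hypothetical cortical path length, for illustration only
    for v in (0.2, 0.3):  # velocity range quoted in the text, in mm/ms
        print(f"{path_mm} mm at {v} mm/ms takes {travel_time_ms(path_mm, v):.1f} ms")
```

Under this assumed path length the travel time is on the order of 10–15ms, which is the same order as the lags between areas mentioned in the text. Real FB trajectories are not straight lines at constant speed, so the figure is only a rough scale.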
The principle states that computational differences between higher and lower order areas are reconciled by FB. When the FB arrives in the lower order area, the neurons in the lower order area will decode the FB and compute a result, r(t), that in most cases will give rise to cortical spatio-temporal dynamics of the Vm and r(t) similar to that in the higher order areas. This is particularly true for those populations of neurons in lower order areas that are pre-excited. Higher order areas may also reconcile their mutual computational differences by mutual excitatory communications, as may lower order areas.
When a stationary object appears in the field of view, this elicits FB from temporal and parietal visual areas at 60–75ms. The FB first aims at the cortical sites mapping the change in areas 19/21 and 17/18 (Figures 5A and 8A). This is the focused FB. The focused FB first reaches the mapping sites in areas 19 and 21. At the mapping sites in areas 21 and 19, the inward current is still increased at 75ms (Figure 5B). The additional inward current provided by the FB may further accelerate the inward current (Figure 6). The r(t) increases, most pronounced in the infra- and supragranular layers, reach a new maximum at 80ms (Figure 9).
Meanwhile the FB continues towards the mapping sites in areas 17 and 18. Here the effects of the ON response have faded. First, the intensive firing associated with the ON response may have opened K+ channels in the spiking neurons, which brought down the dVm/dt towards baseline (sometimes called spike rate adaptation). Second, with the ON response, the inhibitory drive increased, which attenuated the net-inward current and reduced the r(t) (Figure 5B). At 75ms the inward current at the mapping site has diminished considerably to the level of the pre-excited neurons. From then on the dVm/dt in area 17 does not change much. The pre-excited neurons, however, should have been free from spike rate adaptation and significant inhibition (Figures 5B, 6, and 8). Thus when the FB arrives, the target neurons in areas 18 and 17 should have reasonable excitatory driving forces. The excitatory FB increases the inward current of the pre-excited neurons and those neurons not firing, but as the Vm is still relatively high in area 17 at 75ms (Figure 4), the increases in dVm/dt and Vm become more moderate (Figures 5B, 6, and 8). For this reason the increase in r(t) in area 17 to the FB is also moderate (Figures 5B, 8, and 9). This second and moderate increase in the firing rate at the object-mapping site in area 17 has been known for many years, but it has not previously been associated with cortical FB. What speaks against thalamo–cortical origin and local generation of the 100–110ms firing is that this r(t) increase is most pronounced in supra- and infra-granular layers (Figure 9). Moreover, the r(t) increase lags the increase in dVm/dt induced by the FB by 5–8ms (Figures 5B and 8B).
Furthermore, in area 21 at the mapping site, the increase of the r(t) peaked at 80ms and preceded the weaker increases at the mapping site in area 17 in the supra- and infra-granular layers peaking at 100ms (Figure 9). In area 17 at the mapping site, the many pre-excited neurons decode the FB, i.e., the communication of the computational results of higher order area neurons. Since the FB, hypothetically, drives mainly pre-excited and non-firing neurons, these neurons will compute the computational results of the higher areas, and when they fire communicate these results to their local surround.
The firing associated with the FB again increases the inhibitory driving forces in area 17. The pyramidal neurons in layers II and III are monosynaptically connected to nearby inhibitory neurons and most of these connections are mutual (Holmgren et al., 2003). As many pyramidal neurons connect to one and the same interneuron (Holmgren et al., 2003), the more pyramidal neurons fire the stronger the inhibition that is returned. This means that when the firing of such an interneuron becomes sufficiently strong, it can silence many surrounding pyramidal neurons. As the FB also ceases and the retinal communication of APs to area 17 is now moderate and steady state, the outward currents drive the mapping neurons in area 17 into net inhibition (100–130ms) (Figures 5B, 6, and 8B).
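The convergence of many pyramidal neurons onto one interneuron can be sketched with a small rate model, offered here only as an illustration under stated assumptions. The function `steady_state`, the weights `w_exc` and `w_inh`, the time constant, and the drive values are hypothetical and are not fitted to the data of Holmgren et al. (2003). Each pyramidal rate relaxes towards its rectified drive minus the returned inhibition, and the interneuron rate relaxes towards a weighted sum of the pyramidal rates.

```python
def steady_state(drives, w_exc=0.5, w_inh=1.0, tau=10.0, dt=0.1, steps=5000):
    """Relax a toy network of pyramidal neurons sharing one interneuron.

    drives: external drive to each pyramidal neuron (arbitrary units).
    Returns (list of pyramidal rates, interneuron rate) after relaxation.
    All weights and constants are illustrative assumptions.
    """
    pyr = [0.0] * len(drives)
    inh = 0.0
    for _ in range(steps):
        # The interneuron pools excitation from all pyramidal neurons.
        inh_target = w_exc * sum(pyr)
        # Each pyramidal neuron receives the same returned inhibition.
        pyr_target = [max(0.0, d - w_inh * inh) for d in drives]
        inh += dt * (inh_target - inh) / tau
        pyr = [p + dt * (pt - p) / tau for p, pt in zip(pyr, pyr_target)]
    return pyr, inh


if __name__ == "__main__":
    weak = 6.0  # a weakly driven neuron in the surround (hypothetical value)
    for n_strong in (2, 10):
        rates, inh = steady_state([10.0] * n_strong + [weak])
        print(f"{n_strong} strongly driven neurons: interneuron rate={inh:.2f}, "
              f"weak neuron rate={rates[-1]:.2f}")
```

In this toy network the weakly driven neuron keeps firing when only two neurons are strongly driven, and is silenced when ten are, because the pooled inhibition then exceeds its drive. The example uses a single interneuron and identical weights for simplicity, whereas real circuits are heterogeneous.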
The inhibition starts at the mapping site, driving the dVm/dt below baseline (Figures 6 and 7). At this moment the temporal and parietal areas provide a broad FB exciting the cortex that was pre-excited in areas 21, 19, 18, and 17 and mapping the object background (Figure 8A). When the broad excitatory FB sweeps over areas 18 and 17, the excited neurons located in the cortex mapping the object background fire sparsely (Figure 8B). The firing in the cortical domain mapping the background in areas 19, 18, and 17 also follows the same pattern of a clear, but sparse increase in r(t) followed, after 15ms, by a short significant decrease of the r(t) (Figure 8B). As the broad FB lags the FB to the 17/18 mapping sites by 8–10ms (see Figure 5A), the two feedbacks and their related firing provide a segmentation of figure from ground in time and cortical space (Roland et al., 2006) (Figures 8A,B). This is one example of reconciling computational differences between higher and lower order visual areas, because the IT neurons have no information about the relations between object and background (Baylis and Driver, 2001).
That the FB to the mapping sites may contain (integrated) information about the current visual scene has support from studies showing that the information about the current visual scene in the r(t) in area 17 increases in the interval 90–120ms (Heller et al., 1995; Eriksson et al., 2010). Under the assumption that the FB to areas 17 and 18 contains information about the current visual scene, the FB serves the purpose of reconciling the differences in the computational results by the higher areas (having integrated the change of scene with the previous scene) and the lower order areas (having just finished computing the change) (Figure 5). After the decoding of the FB by the lower order areas, the discrepancy disappears. This also implies that the lower order areas, after the feedbacks, compute visual scenes as opposed to just computing visual changes by appearing or disappearing objects.
For objects moving over the retina, lower order areas compute a pre-excitation in the direction of cortical motion (Figure 7). The higher order areas 19 and 21 compute a more extended excitation predicting the future course of the object mapping over the cortex (Movie 1 in supplementary material, 104–120ms) (Harvey et al., 2009). This predictive net excitation is sent as a FB at 115–120ms exciting the moving object map in areas 17 and 18 (Figure 7 and Movie 1 in supplementary material). But mainly the FB targets the cortex in areas 17 and 18 where the neurons were specifically pre-excited, i.e., ahead of the object mapping (Movie 1 in supplementary material and Figure 7B) (Harvey et al., 2009). The additional inward current from the excitatory FB seemingly brings the pre-excited neurons over their firing threshold, especially in the infra-granular layers (Figure 10). Here, in area 17, the neurons fire along the predicted future cortical path (Movie 1 in supplementary material and Figure 10). The FB thus also in this condition reconciles the computational differences between higher areas (19/21) and lower visual areas (17/18) by targeting the pre-excited neuron population, which then computes a result similar to that of areas 19 and 21.
As the moving object was mapped with different delays in different visual areas, these mappings were out of phase. After 60–70ms the mapping in areas 19/21 appeared ahead of that in 17/18 (Movie 1 in supplementary material). However, after the FB from areas 19 and 21 has reached areas 18 and 17, the excitation across areas 17, 18, 19, and 21 progresses in phase over the cortex from 150ms onwards (Movie 1 in supplementary material and Figure 7B). Whether the cortico–cortical FB alone is instrumental in synchronizing the progress of the excitatory activity over several visual areas remains to be tested experimentally. Cortico–thalamo–cortical FB may also be instrumental in reconciling the mappings and bringing them in phase (Sillito et al., 2006).
One may question whether this principle of neuron populations in the (visual) cortical areas reconciling differences in computations between higher and lower areas also works if the FB is carrying information about the visual scene that violates the physical content of that scene. This question was examined in recent studies of a visual illusion, apparent motion. Primates and carnivores perceive apparent motion when a series of static images is displayed in quick succession with a spatial shift, such as in movies, video and TV. If a static object is displayed this way, the neurons in the primary visual cortex, area 17, correctly map the object in distinct and successively different retinotopic sites as increases in the inward current and an r(t) ON-response (Ahmed et al., 2008). The firing is very sparse between these mapping sites (Figure 11). However, after the offset of the object in one position, neurons in higher order visual areas 19 and 21 map the object as moving over the cortex between its retinotopic mapping positions (Figure 11). This is presumably due to the larger receptive fields of the neurons in areas 19 and 21 (Manger et al., 2002; Philipp et al., 2006). This implies that computation of motion takes place in the higher order area at retinotopic coordinates in area 19 between the successive object-mapping sites, whereas the neurons in corresponding retinotopic sites of area 17 show just spontaneous activity and resting baseline Vm (Figure 11). The 19/21 neurons, mapping the object as moving between the object-mapping sites, send a FB to the neurons located in between the retinotopic mapping sites in areas 17/18 (Figure 11C). This FB excites and fires the neurons in area 17 in between the retinotopic map of the object in one stationary position and the next (Ahmed et al., 2008) (Figures 11E,F).
This replicates the computational result of areas 19/21 and reconciles the computational difference between areas 17/18 and 19/21 and probably other visual areas within 40ms after the offset of the object at one position.
That the FB from higher order areas was instrumental in producing apparent motion in areas 17/18 is supported by the finding that the net excitation from the FB propagated from the mapping site of the object at the off-going position to the cortical position where the object just went on. Moreover, this was followed within 5–10ms by an increase in r(t) between these object-mapping positions (Figures 11E,F). Further, if first one object is shown in the center of the field of view, and at the offset of this object two identical objects are shown flanking the center of the field of view, this elicits, in humans, the perception of the first object being split and moving as two objects to the flanking positions. In the ferret visual cortex, the mapping in area 17 is first as three stationary objects. In areas 19 and 21 the objects are mapped as one object being split and then moving to the flanking positions. Areas 19 and 21 then, 32ms after the offset of the center object, send precisely timed FBs to excite and fire neurons in between the center of field of view representation in area 17 and the retinotopic flanking positions (Ahmed et al., 2008). Prior to the FB, the mapping of the object in different areas was out of phase, but after the FB at 116–126ms the excitation in cortex across the four visual areas progressed in phase, similar to what happens in continuous motion (Figure 11D).
If a new object, object 2, appears in the field of view say 40ms after object 1, the neurons mapping object 1 in area 17 have already communicated an ON response to the higher order areas. Object 2 will start to be mapped as an ON response in area 17 at 65ms, i.e., just before the FB (about the change in the scene induced by object 1) arrives. The prediction is that the excitatory FB will then increase the inward current of the neurons in area 17 also mapping object 2 and hence increase the ON response and feed-forward r(t) associated with object 2. This increased ON response in turn will lead to a second FB to the lower order areas reconciling the presence of both object 2 and object 1. In this case the higher order area's FB cannot overrule the appearance of a salient change in the visual scene. If a mask of the same size replaces object 1 after 40ms, the second FB will excite the neurons in cortex mapping the change induced by the mask and its background. The computational differences between higher and lower order areas would be reconciled (the interpretation of the scene would be mask+background). If object 2 appears at a position beside or partly overlapping object 1, the second FB will target the object 2 mapping site as normal and the computational differences between higher and lower areas will be reconciled by a second FB.
All computational differences must be reconciled if the network of many neuron populations in the visual areas should arrive at an unambiguous computational result about the visual scene. In ambiguous scenes, Rubin's vase and Necker cube for example, the lower order areas will map the physical contours correctly. The FB communicated from the higher order areas contains their current computational result. This changes when the computation in higher order areas changes to give the alternative interpretation. Therefore the alternative result will be communicated as FB to the lower order areas. In either case the mappings in area 17 prior to the arrival of the FB will be identical and the computation of the FB by lower order areas in any case will reconcile the computational difference.
The experimental examples showed that the feedbacks from higher order areas are diverse in origin, direction, shape, cortical trajectories, and targets. The decoding of the FB provides information about the higher order areas' interpretation of the current visual scene (Figures 5–8 and 11). In the examples with moving objects and apparent motion, the lower order areas, after the FB, indeed computed dynamics in cortical space and time similar to that of the higher areas. The diversity of the FBs seems related to the differences in the visual scenes. In each case it is possible to decode the FB by measuring the Vm or dVm/dt dynamics. The decoding in retinotopic cortical space shows the visual areas' interpretation of the scene. The effect of the FB on the target neuron populations in lower areas does not always improve the brain's interpretation of physical realities in the visual surroundings (apparent motion, split motion), but the prediction is that FBs reconcile computational discrepancies between the higher and lower areas, often in favor of the higher order areas' computations.
The principle of FB reconciling the computational difference between higher and lower visual areas may seem similar to a proposal by Mumford (1992) and similar to predictive coding (Rao and Ballard, 1999). However, in both these proposals area 17 sends error signals to the higher order areas, in contrast to the present proposal. Furthermore, the FB in the model of Rao and Ballard (1999) is inhibitory. This is in conflict with the observations that cortico–cortical axons are excitatory. Moreover, the predictive coding scheme assumes that the cortex possesses an internal representation of the visual world and from this makes predictions that are compared with the sensory signals. For the principle of reconciliation of computational differences (Box), assumptions of prior knowledge, representations, and prediction are not needed. The only assumption made is that the IT neurons integrate the communicated ON response. The FB excitation is needed to communicate the computational results of the higher order areas to the lower areas. The space–time dynamics of the decodings of all communications in the interval when the FB is active (80–130ms) constitutes the interpretation of the scene by the lower areas.
One may ask how the higher order areas reconcile their computational results. They may do this by communicating their results directly. In general, two areas linked by mutual excitatory connections will, after a while, develop increases in dVm/dt, r(t), and Vm in the cortical zones their connections target. As pre-excitation is assumed to be present in all visual areas engaged by the FF communication, this might facilitate the reconciling. This is more easily seen in retinotopically organized areas, where similar dynamics evolve after both FF and FB (Figures 5, 6, and Movie 1 in supplementary material). If the areas do not communicate directly, they may communicate FB to the same lower areas (Figure 5A). Here their results are reconciled.
As long as populations of neurons in different areas continue to communicate their different computational results, the network of interconnected cortical areas has not arrived at a coherent interpretation of the current visual scene. So, what stops or minimizes the communications between areas? Generally, the effect on the communications between areas depends on the biophysical state of the communicating and driving neurons and the biophysical state of the target neurons, especially their dVm/dt and Vm (1st principle, Box). In most cases the FB will reconcile the computational differences between higher and lower areas. The excitatory driving forces of the populations in lower visual areas decrease after the FB if no new changes appear in the field of view and if the computational differences between cortical areas are reconciled. Contributing factors are the high Vm, the fact that the neurons have just fired, and the prevailing inhibition. This brings the network into a new dynamic regime, in which neurons in most visual areas fire sparsely. The sparse firing prevents the transition to spontaneous ongoing dynamics. When the driving forces decrease, the communications between areas cease or diminish radically.
The net excitation of the feedback arriving in areas 18 and 17 increases the Vm to its maximum at 90–130ms where the object is mapped (Figures 4, 6, and 7A; Harvey et al., 2009). The firing rate r(t) also increases in the interval 100–150ms (Figures 9 and 10). If no further changes in the field of view are communicated to these sites, the r(t) and subsequently also the Vm(t) decrease (Figures 4, 5, 9, and 10). The dVm(t)/dt turns negative in the supragranular layers, well below the baseline (Figures 6 and 7A: 100–110ms). This net outward current spreads out from the sites mapping the object as a lateral spreading inhibition in the supragranular layers. The lateral spreading inhibition could be the result of several inhibitory mechanisms. It does not seem to be induced by the FB, as lateral spreading inhibition also appears systematically at cortical sites where an object moving over the retina was mapped as increases in Vm(t) and r(t) (Figure 7B; Harvey et al., 2009). So in general (lateral spreading) inhibition seems to follow peak firing with a delay of 80–150ms. One plausible mechanism could be that the increase in Vm and r(t) opens Ca2+ channels and increases Ca2+ conductance. The increased Ca2+ activates an afterhyperpolarization K+ conductance that, with a delay, brings the dVm/dt down below baseline. This is most likely to happen where the firing has been most intense, which is where the object was mapped (Figures 6 and 7). The result is a net outward current that would be maximal at the cortical point(s) of maximal net excitation (Figures 6 and 7).
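The delayed undershoot described above can be illustrated with a deliberately simple simulation. The sketch below is not a model taken from the cited experiments: it assumes a linear two-variable system in which a slow, activity-dependent outward current `w` stands in for the Ca2+-activated K+ afterhyperpolarization conductance, and every parameter (`tau_v`, `tau_w`, `gain_w`, the pulse timing) is hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_ahp(t_end=400.0, dt=0.1, pulse=(20.0, 70.0),
                 tau_v=10.0, tau_w=100.0, gain_w=1.0):
    """Euler integration of a linear two-variable sketch (all values hypothetical).

    v: population Vm relative to baseline (arbitrary units)
    w: slow activity-dependent outward current, a stand-in for a
       Ca2+-activated K+ afterhyperpolarization conductance
    """
    n = int(round(t_end / dt))
    t = np.arange(n) * dt                      # time in ms
    v = np.zeros(n)
    w = np.zeros(n)
    for i in range(1, n):
        # Transient excitatory drive, present only during the pulse window.
        drive = 1.0 if pulse[0] <= t[i - 1] < pulse[1] else 0.0
        dv = (-v[i - 1] + drive - w[i - 1]) / tau_v
        dw = (gain_w * v[i - 1] - w[i - 1]) / tau_w
        v[i] = v[i - 1] + dt * dv
        w[i] = w[i - 1] + dt * dw
    return t, v, w

t, v, w = simulate_ahp()
dvdt = np.gradient(v, t)                       # analogue of dVm/dt
t_peak = t[np.argmax(v)]                       # time of maximal excitation
t_trough = t[np.argmin(v)]                     # time of maximal undershoot
print(f"peak at {t_peak:.1f} ms, trough at {t_trough:.1f} ms")
```

In this sketch the outward current builds up while `v` is high and outlasts the drive, so `v` and its derivative fall below baseline only after the excitation has peaked. The delay is set by the assumed time constants and is not fitted to the 80–150ms reported in the text.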
For the perception of stationary objects, the layer IV neurons, after the effect of the FB has subsided, only receive a sustained input from the retina and LGN, i.e., dr(t)/dt≈0. Such an input could drive some regular spiking excitatory neurons and perhaps some inhibitory neurons to moderate firing (Contreras and Palmer, 2003; Shinomoto et al., 2009). In addition, the remaining non-firing neurons may be under a regime of slight overweight of shunting inhibition (Monier et al., 2008). The lateral spreading inhibition inhibits the population in the output layer, layer III. Together these factors severely reduce the FF communication to other visual areas. Furthermore, the FF communication will consist of spike trains with little frequency modulation (dr(t)/dt→0), which will have difficulty driving target populations. Finally, in the higher order areas, the neurons will also tend to go to a more sustained firing with little modulation of r(t) (dr(t)/dt→0) (Figures 5B,C). This reduces the probability of further FBs. The neurons mapping the object will now fire sparsely, but at a rate above the spontaneous activity. As the mapping neurons and the neurons mapping the object background decoded the FB, this indicates that the sparse firing from 90 to 120ms and onwards is to some extent related to the current visual scene (see also Eriksson et al., 2010). This regime may continue until the object disappears and the inhibition associated with the OFF response allows the neurons to return to the state of spontaneous ongoing firing. Pernberg et al. (1998) and Eriksson et al. (2008) showed that the initial effect of the offset of a stimulus was most likely inhibition.
For the perception of moving objects, the scene is not coherent until a catch-up saccade brings the moving object into the center of the field of view. When the object is in the center of the field of view, it can be segmented from its (now moving) background and, if it is pursued with the eyes, treated almost as a stationary object. Accelerations or decelerations of the object may induce retinal slips that are corrected by minor saccades.
Indeed, the dVm/dt decreases below baseline for both stationary and moving stimuli some 80ms after the peak firing (Figures 4–7). For stationary stimuli, the Vm after 150ms also goes below baseline in layers I–III and stays there until the stimulus goes OFF (Eriksson et al., 2008). This also indicates that no further FBs are issued towards areas 17 and 18 from higher order areas and that the layer IV driving of supragranular layers is very modest in the simple case of no further changes in the visual scene. As there is still some sparse steady rate firing after 150ms even in the supragranular layers (Figures 9 and 10), one should probably interpret this as specific firing driven by layer IV in a supragranular environment of net inhibition. The perception of the detailed visual scene may start at 120ms (Rieger et al., 2005), when these communications and driving forces reduce to leave a constant input to area 17, weakly driving the network of visual areas by FF communications. This stage of the cortical dynamics continues until the stimulus goes off, or until a new object appears, or until a saccade is made to other targets in the field of view.
The retina and LGN communicate changes in the visual scene. Neurons at any site can drive neurons to which they are directly connected. Neurons in areas receiving feed-forward input are pre-excited to sub-threshold levels. Feedbacks reconcile computational differences between higher and lower visual areas. Conversely, if the driving decreases and the excitatory driving forces decrease, the communications between areas diminish or cease. These five principles govern the most common dynamic sequence associated with a change in the visual scene.
Thus, despite the limited information about the change or the current scene in the individual spike trains in visual areas, the target neurons in the mesh, after FB, arrive at an interpretation of the current visual scene after some 100–120ms. One may regard the instantaneous spike rate, r(t), as a dynamic mechanism of driving the target neurons by inward and outward currents. The r(t) of neurons in lower visual areas may first carry certain information about the change in the visual scene and after the FB certain information about the current scene, but mainly the r(t) is a dynamic mechanism of driving target neurons. Because the r(t)s locally and from higher order areas at 90–110ms communicate to the target neurons in the mesh in a spatially coherent fashion (Figures 6–8 and 11), the lower visual areas can compute the interpretation of the current scene. This spatial coherence is one sign of well-behaved dynamics. The 6th principle states that the dynamics of the visual neurons are well behaved at the mesoscopic scale.
For the sake of clarity I emphasized the most frequent dynamics in the description of the dynamic sequence. But in reality, the LGN ON responses communicated to area 17 must interact with the ongoing spontaneous activity of the neurons in area 17. Moreover, depending on the viewing conditions, contrast and luminance changes can be subtle, giving rise to only weak retinal ON responses, for example from the peripheral retina. Thus under several circumstances the ON responses are weak and can fail to drive the area 17 neurons to FF communication to other areas. On these occasions, there would be no dynamic sequence of FF–FB and no integration of visual changes. One should not forget that the biophysical principles of neural communication determine the directions of communications, implying that there are conditions associated with other dynamic sequences. This notwithstanding, the network of cortical areas at the mesoscopic scale displays a relatively stable dynamic sequence associated with the appearance of objects in the visual scene for its main variables Vm, dVm/dt, and r(t). The experimental results showed that the space–time dynamics of this sequence change depending on the nature of the change in the visual scene. The surprising fact is that the biophysical mechanisms underlying the dynamic sequence are so robust that they are effective in most trials, even under anesthesia (Grinvald et al., 1994; Bringuier et al., 1999; Eriksson and Roland, 2006; Roland et al., 2006; Lippert et al., 2007; Ferezou et al., 2007; Xu et al., 2007; Ahmed et al., 2008; Harvey et al., 2009).
It is not known from experiments how large a proportion of the dendrites undergo the spatio-temporal dynamics shown in Figures 5–8 and 11. Theoretically, however, coherent Vm changes in dendrites and coherent r(t) increases in neurons at even small proportions can be quite forceful at the population scale (Zohary et al., 1990; Averbeck et al., 2006; Chen et al., 2006). The dynamics evolving from a change in the visual scene is likely to engage very large populations of neurons, at least in the order of millions in carnivores and primates.
One implication of the well-behaved cortex is that significant proportions of its neurons should not bifurcate into oscillations and universal synchronous firing, as oscillations and synchronous firing represent stable fixed points precluding further dynamics. It is unlikely that the fast propagation of excitation over the cortex from FB, lateral spreading excitation, and pre-excitation could be due to millions of neurons with intrinsic oscillating Vms that suddenly produced phase-shifts mimicking the shape and precision of the diverse dynamics characteristic of different visual scenes. Although one could, in principle, describe any change in Vm and r(t) in Fourier terms, it is peculiar to use the term oscillations for dynamics that at most constitute one cycle (Figures 3–8, 11, and Movie 1 in supplementary material). Although the dynamic sequence of the communications and computations after a change in the field of view seems stable in most trials, it is important that the neurons can escape into new trajectories when new changes appear or when attention or thinking so demands. This suggests that the well-behaved visual cortex will not end up in stable fixed points (strong attractors).
The reason why the visual cortex is well behaved is that population Vm recordings from the cortex (often done with voltage-sensitive dyes) stem from the cortical mesh of dendrites and axon terminals from very many neurons at each cortical point (Figure 1). This means that measurements of dVm/dt are weighted means of the excitation or reduction of excitation (inhibition) of dendrites and axon terminals from more than 75000 neurons. Due to the large span of the dendrites compared to the size of the somata, the population of neurons to which the dendrites belong will change only slowly from one cortical point to the next. This aspect of the well-behaving population Vm, the spatial coherence, was first detected by Arieli et al. (1995) (but see London et al., 1989) and confirmed by other groups (Slovin et al., 2002; Chen et al., 2006; Roland et al., 2006; Lippert et al., 2007). However, even if the mesh gives spatial coherence at the mesoscopic scale, this coherence increases after the FB (Figures 4 and 6–11). Another reason why the cortex populations are well behaved is that well-timed cortical communications and FB can address large populations of pre-excited neurons.
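The argument that overlapping dendritic spans alone produce spatial coherence can be checked with a toy calculation. The sketch below is only an illustration under stated assumptions: neurons are laid out along one cortical axis, their fluctuations are statistically independent, and each cortical point reads an unweighted mean over a fixed number of neighbouring neurons (the text speaks of weighted means; uniform weights are a simplification). The values of `n_neurons`, `n_samples`, and `span` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 1000   # neurons along one cortical axis (hypothetical)
n_samples = 500    # time samples
span = 100         # neurons contributing dendrites to one cortical point (hypothetical)

# Independent fluctuations per neuron: no spatial structure is built in.
single = rng.standard_normal((n_samples, n_neurons))

# Each cortical point reads the mean over the `span` neurons around it,
# so adjacent points share all but one of their contributing neurons.
kernel = np.ones(span) / span
pooled = np.apply_along_axis(
    lambda x: np.convolve(x, kernel, mode="valid"), 1, single
)

def neighbour_corr(x):
    """Mean Pearson correlation between time courses at adjacent positions."""
    a = x[:, :-1] - x[:, :-1].mean(axis=0)
    b = x[:, 1:] - x[:, 1:].mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return float((num / den).mean())

corr_single = neighbour_corr(single)   # adjacent single neurons
corr_pooled = neighbour_corr(pooled)   # adjacent pooled cortical points
print(f"single-neuron: {corr_single:.3f}, pooled: {corr_pooled:.3f}")
```

Under these assumptions neighbouring single neurons are essentially uncorrelated, while neighbouring pooled signals are strongly correlated simply because they share most of their inputs. This reproduces only the baseline coherence given by the mesh, not the further increase after the FB.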
The independence of the proposed principles and the possibilities of generalizing the principles to all forms of cortical dynamics could easily be questioned. Should one, instead of having two principles dealing with the biophysics of neural communication, simply state that all cortical dynamics must follow the laws of biophysics? Well then, which are the laws of biophysics? Similarly should the idea that “all computational differences must be reconciled if the network of many neuron populations in the visual areas should arrive at an unambiguous computational result about the visual scene” be a principle? I chose to emphasize the cortical biophysical dynamics that accomplished this, instead of proposing this idea as a principle. Could one generalize the principles of pre-excitation and reconciling FB in the visual areas, to one principle stating that computational differences among directly connected neuron populations in two different cortical areas will reconcile because of their mutual excitatory connections? Perhaps, but so far there are no experimental results that support such a generalization. Furthermore, dynamics can usually not be predicted from anatomy.
As almost all immediate Vm changes, dVm/dt, are postsynaptic, it follows that observing the dVm/dt is observing the decoding (Box). It is not possible to measure these variables in each and every neuron in the cortical mesh of axons and dendrites in the cortex, but it is possible to measure the changes in Vm simultaneously over very large neuron populations (see Figures 2, 6, 7, 8, and 11). Due to the nature of the mesh and the well-behaving population Vm dynamics, one can observe what the neurons decode. For the retinotopically organized areas, how dVm/dt, r(t), and Vm evolve in cortical space–time bears a direct relation to the change in the visual scene, i.e., the decoding in retinotopic space shows the interpretation of the scene. For example, an emerging bump means that an object appeared in the field of view. Two moving bumps mean that two objects are moving in the field of view over the retina. Feedback from areas 19 and 21 SSY targeting between stationary bumps in area 17 and moving to adjacent positions in one direction means that apparent motion is taking place (Figure 11). Feedback targeting the retinotopic mapping of an object and the cortex representing the background differently means segmentation of object from background (Figure 8). A further example is the prediction of the future cortical trajectory of an object moving in one direction (Movie 1 in supplementary material). It is thus possible to decode (part of) the internal dynamics of the cortex in terms of interpretations of (changes in) the visual scene. The prediction from the principles is that the change in the visual scene is integrated with the firing related to the previous scene and communicated as FB to lower order areas, implying that the visual areas together after the FBs compute the visual scene. These principles obviate any binding problem.
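The idea that an emerging bump in dVm/dt can be read out directly in retinotopic space can be sketched on synthetic data. The example below is not the analysis used in the cited experiments. It assumes a noise-free, one-dimensional population Vm(t, x) containing a single Gaussian bump that emerges at a hypothetical onset time and position; the function names `synthetic_vm` and `decode_bump` and all parameter values are introduced here for illustration only.

```python
import numpy as np

def synthetic_vm(n_t=150, n_x=100, onset_ms=40, center=60,
                 width=5.0, rise_ms=20.0):
    """Hypothetical population Vm(t, x): a Gaussian bump emerging at `center`."""
    t = np.arange(n_t, dtype=float)[:, None]     # time in ms, 1 ms steps
    x = np.arange(n_x, dtype=float)[None, :]     # cortical position, arbitrary units
    # Saturating rise after onset; zero before onset.
    time_course = 1.0 - np.exp(-np.clip(t - onset_ms, 0.0, None) / rise_ms)
    spatial = np.exp(-0.5 * ((x - center) / width) ** 2)
    return time_course * spatial

def decode_bump(vm, dt_ms=1.0):
    """Return (time index, position index) of the largest dVm/dt and the dVm/dt map."""
    dvm_dt = np.gradient(vm, dt_ms, axis=0)      # temporal derivative at each position
    t_idx, x_idx = np.unravel_index(np.argmax(dvm_dt), dvm_dt.shape)
    return int(t_idx), int(x_idx), dvm_dt

vm = synthetic_vm()
t_idx, x_idx, dvm_dt = decode_bump(vm)
print(f"largest net inward current near t={t_idx} ms at position {x_idx}")
```

In this sketch the location of the maximal dVm/dt recovers where and roughly when the synthetic object appeared. Real voltage-sensitive dye recordings are noisy and two-dimensional, so an actual analysis would need statistical thresholds, which are omitted here.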
The picture emerging from these new experimental studies is that the visual cortical areas through FF, lateral/pre-excitation and FB may arrive at a uniform coherent computation reflecting the visual scene in less than 150ms. The spread of the retinal information to large populations in many areas initially increases the computational dimensions. The pre-excitation in combination with the FB and the resulting sparse firing increases the pair-wise correlation in r(t) among neurons and reduces the dimensionality considerably (Figures 3–6). The properties of the visual cortical dynamics summarized above in this section imply that one can describe the Vm, dVm/dt and r(t) dynamics as well behaved at the mesoscopic scale. This does not exclude that the Vm, dVm/dt, and r(t) dynamics of single neurons in the cortical mesh could be bad, incoherent and unrelated to the communications or computations reflected at the mesoscopic scale; one does not know. Even if this were so, it might not matter that much because the overall dynamics is well behaved and reflects the interpretation of large populations of neurons of the current visual scene.
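The link between higher pair-wise correlation and lower dimensionality can be made concrete with a small calculation. The sketch below uses the participation ratio of the covariance eigenvalues as a measure of effective dimensionality. This measure is an assumption made here for illustration and is not necessarily the one underlying the cited figures. The population size `n` and the uniform correlation values are hypothetical.

```python
import numpy as np

def participation_ratio(cov):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    eig = np.linalg.eigvalsh(cov)
    return float(eig.sum() ** 2 / (eig ** 2).sum())

def uniform_corr_cov(n, rho):
    """Covariance of n unit-variance rates r(t) sharing one pair-wise correlation rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

n = 50  # hypothetical population size
dims = {rho: participation_ratio(uniform_corr_cov(n, rho))
        for rho in (0.0, 0.1, 0.5)}
for rho, d in dims.items():
    print(f"rho={rho:.1f}: effective dimensionality {d:.1f}")
```

With uncorrelated rates the effective dimensionality equals the number of neurons, and it falls steeply as the shared correlation grows, because one common mode comes to dominate the population variance.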
The six principles are theoretical proposals relying on the assumption that the experimentally observed dynamics of populations of cortical neurons can be explained from the relatively simple biophysics underlying excitation and inhibition. I tried to provide plausible biophysical descriptions of the mechanisms underlying the principles. These descriptions have some support from recent experiments in which the dynamics in response to transients were examined, but such experiments are still few. The principles carry many predictions, but in this text I mentioned only a few to avoid overloading the text. The principles are theoretical and predictive and easily lend themselves to experimental tests. They point to mechanisms for the computation of perceptual unity. They explain how the brain could compute the current visual scene in less than 120ms, by essentially one sequence of FF, lateral spreading excitation/specific pre-excitation, and FB.
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at http://www.frontiersin.org/systemsneuroscience/paper/10.3389/fnsys.2010.00028/
Movie 1. The relative Vm in response to a bar moving downwards from the peripheral field of view (statistically significant p<0.01 after Bonferroni correction). The black holes show the electrode penetration sites. When the multiunit activity at any of the 16 leads at a penetration site becomes statistically significant (p<0.01) the hole turns white. Note the mapping of the future trajectory of the bar mapping first from the higher order areas (19 and 21) at 100–130ms, and the FB from these areas to the bar mapping in lower order areas 17/18 at 116–130ms. Then, when the bar mapping in 17/18 has reached the left arrow, the future cortical trajectory of this mapping is maximal and the multiunit activity increases significantly ahead of the mapping. And indeed at 620ms the bar map reaches the predicted site. Note also that from 160ms and onwards the cortical excitations progress in phase in higher and lower areas. (From Harvey et al., 2009).
AP, action potential; FB, feedback; IT, inferior temporal; FF, feed-forward; LGN, lateral geniculate nucleus; Vm, population membrane potential.