During natural viewing, the eyes are never still. Even during fixation, miniature movements of the eyes move the retinal image across tens of foveal photoreceptors. Most theories of vision implicitly assume that the visual system ignores these movements and somehow overcomes the resulting smearing. However, accumulating evidence indicates that the visual system cannot ignore fixational eye movements if it is to resolve fine spatial details. We argue that, given its fixational movements, the only way the visual system can achieve its high resolution is by seeing via these movements. Seeing via eye movements also eliminates the image instability that these movements would otherwise induce. Here we present a hypothesis for vision in which coarse details are spatially encoded in gaze-related coordinates, and fine spatial details are temporally encoded in relative retinal coordinates. The temporal encoding presented here achieves its highest resolution by encoding along the elongated axes of simple-cell receptive fields, not across these axes as suggested by spatial models of vision. According to our hypothesis, fine details of shape are encoded by inter-receptor temporal phases, texture by instantaneous intra-burst rates of individual receptors, and motion by inter-burst temporal frequencies. We further describe how the visual system can read out the encoded information and recode it internally. We show how readout of retinal signals can be facilitated by neuronal phase-locked loops (NPLLs), which lock to the retinal jitter; this locking enables recoding of motion information and temporal framing of shape and texture processing. A possible implementation of this locking-and-recoding process by specific thalamocortical loops is suggested. Overall, we suggest that high-acuity vision is based primarily on temporal mechanisms of the sort presented here, whereas low-acuity vision is based primarily on spatial mechanisms.
During fixation, the eyes move across several arcminutes with amplitudes that fall off with the scanning frequency (Findlay, 1971; Eizenman et al., 1985). These fixational eye movements (FeyeM) cover the entire spectrum between ~1 and more than 100 Hz, with increased power in two main frequency ranges: one between ~1 and ~20 Hz (“drifts”), with amplitudes from less than ten to a few tens of arcminutes (′), and another between ~40 and ~100 Hz (“tremor”), with amplitudes from a few arcseconds (″) to a few arcminutes (Ratliff and Riggs, 1950; Barlow, 1952; Yarbus, 1967; Bengi and Thomas, 1972; Shakhnovich, 1977; Coakley, 1983; Eizenman et al., 1985). Microsaccades (brief movements of a few to a few tens of arcminutes) usually interrupt these movements a few times per second and move both eyes simultaneously in the same direction (Krauskopf et al., 1960; St Cyr and Fender, 1969; Moller et al., 2002; Engbert and Kliegl, 2004). Drifts and microsaccades usually counteract each other, such that the eyes repeatedly scan the same fixational area (St Cyr and Fender, 1969; see reviews by Steinman and Levinson, 1990; Martinez-Conde et al., 2004).
Since the time constants of retinal responses are on the order of 30–100 ms (Sakuranaga et al., 1987; Nirenberg and Meister, 1997; O’Brien et al., 2002), snapshots of the spike activity of retinal ganglion cells, even if taken within a brief time interval, usually include spikes that were evoked by FeyeM occurring tens of ms earlier (Barlow, 1952). These slow retinal traces would smear the perceived image if the readout circuits assumed spatial coding alone, namely, that the spatial map of retinal spikes at a given moment provides all the information required to perceive the external image accurately. However, smearing could be avoided if the visual system appropriately processed the temporal delays between retinal spikes (Ahissar and Arieli, 2001). Traditionally, temporal mechanisms have not been considered for the processing of stationary objects, although they have been considered for the processing of moving objects, perhaps because stationary objects were intuitively assumed to induce stationary stimuli on the retina. Currently, however, it is widely accepted that temporal aspects, both in the external world and in the encoding, need not be averaged out when analyzing or modeling visual coding because, as far as the retina is concerned, stationary stimuli do not exist during natural viewing: retinal images are always moving (Kagan, 2012).
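As a rough illustration of the smearing argument, the sketch below multiplies a FeyeM velocity by a retinal integration time to estimate how far the image travels during one retinal time constant. The 30–100 ms time constants and the 0.3′ hyperacuity scale are from the text, and the ~100′/s velocity is the FeyeM estimate used later in this section; treating the eye velocity as constant over the integration window is a simplifying assumption.

```python
# Rough estimate of how far the retinal image moves during one retinal
# integration time. Assumption: constant eye velocity over that window.

HYPERACUITY_ARCMIN = 0.3  # hyperacuity scale quoted in the text
FEYEM_VELOCITY = 100.0    # arcmin/s, the FeyeM velocity used in this section


def smear_extent_arcmin(velocity_arcmin_per_s, time_constant_ms):
    """Distance (arcmin) the image travels during one retinal time constant."""
    return velocity_arcmin_per_s * time_constant_ms / 1000.0


if __name__ == "__main__":
    for tau_ms in (30.0, 100.0):  # retinal time constants from the text
        smear = smear_extent_arcmin(FEYEM_VELOCITY, tau_ms)
        print(f"tau = {tau_ms:5.1f} ms: smear {smear:4.1f} arcmin, "
              f"{smear / HYPERACUITY_ARCMIN:4.1f} x the hyperacuity scale")
```

Under these assumptions the image moves roughly 3 to 10′ per retinal time constant, about 10 to 33 times the 0.3′ hyperacuity scale.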
In fact, our vision of stationary images depends crucially on these movements. Virtually all retinal ganglion cells respond to transients of light, and many respond only to transients (Hartline, 1938; Hubel, 1988). The dependency of vision on transients is so strong that stabilizing a retinal image not only impairs visual performance (Kelly, 1979; Tulunay-Keesey and VerHoeve, 1987; Rucci and Desbordes, 2003; Rucci et al., 2007), but eventually causes the image to fade away (fading time can range from 80 ms to a few seconds, depending on the stabilization method; Riggs et al., 1953; Pritchard, 1961; Yarbus, 1967; Ditchburn, 1973; Coppola and Purves, 1996). Thus, FeyeM may have been selected during evolution to prevent retinal adaptation when viewing stationary objects. Whether FeyeM emerged first and retinal sensitivities adapted, or vice versa, is not known and is not relevant to the current discussion. Similarly, whether FeyeM result from uncontrolled muscular noise (Carpenter, 1988) and/or are centrally controlled (Shakhnovich, 1977; Coakley, 1983; Eizenman et al., 1985; Martinez-Conde et al., 2004) is not directly relevant to the current discussion, although it is certainly relevant to the understanding of vision in general. In this respect, it is important to mention recent convincing evidence supporting internal control of FeyeM in a way that optimizes retinal processing of natural images (Kuang et al., 2012). In any case, what is directly relevant here is that the two phenomena are congruent: FeyeM continuously provide the transient stimuli required to activate retinal cells. This congruency constrains the possible ways in which retinal information can be encoded accurately. On the one hand, it precludes a pure spatial code at the retina, since the spatial image is smeared. On the other hand, it generates accurate temporal coding of the image, since retinal cells are predominantly activated when their receptive field (RF) crosses an edge of an image.
What are the relative contributions of spatial and temporal codes to retinal outputs? There is no doubt that spatial coding, in which the spatial location of external stimuli is coded by ganglion cell identity and illumination intensity by the cell’s firing intensity, would work well for low-resolution vision, where low resolution here means resolution coarser than the extent of FeyeM (around 10′). This is because integration of retinal responses over fields larger than the extent of FeyeM is hardly affected by FeyeM. The question is whether spatial coding would work for fine vision, i.e., for spatial details smaller than 10′, and specifically for hyperacuity vision, i.e., for spatial details smaller than 0.3′. It might be argued that the visual system still uses the spatial code for fine and hyperacuity vision, while applying some error-correction mechanism to overcome the effect of FeyeM. However, the error-correction mechanisms proposed in the past for the visual system, such as shifter circuits (Anderson and Van Essen, 1987) or interpolation circuits (Barlow, 1979; Crick et al., 1981), appear to be unsuitable for FeyeM (Ahissar and Arieli, 2001). Recently, an elegant mechanism that uses FeyeM statistics to correct for their effect in a discrimination task has been suggested (Pitkow et al., 2007). However, it is not yet clear whether mechanisms of this type can correct for FeyeM when the object is not known in advance, when more than one feature of the object is to be perceived, or when multiple objects are present on a complex background, as in natural vision. But perhaps the strongest argument against the use of spatial-rate coding for fine vision is that the retinal rate code is an order of magnitude less accurate and reliable than the retinal temporal code (Berry et al., 1997).
In every response to an illumination transient, the timing of the first spike in a burst is accurate and reliable, whereas the number and rate of the following spikes are variable and inconsistent (Reich et al., 1997). It would take strong justification for the visual system to ignore the most informative code and rely on the less informative one when processing fine spatial details (Van Rullen and Thorpe, 2001; see A Simple Quantitative Account Against Spatial Retinal Coding of Fine Spatial Details in Appendix 2, and Meister and Berry, 1999).
We present here a mechanism for temporal encoding and decoding in the visual system. We consider it a new dynamic theory of vision, following earlier original proposals by Marshal and Talbot (1942), Bryngdahl (1961), Arend (1973), Ahissar and Arieli (2001), and Rucci (2008). The analysis presented here does not aim to disprove the possibility of spatial coding for fine details; this remains an experimental question. Rather, it aims to show that a temporal encoding-decoding scheme is possible in the visual system and to describe one specific implementation; we do not claim that this is the only possible implementation, or that the system must implement dynamic processing this way. We do claim that the mechanism presented here is a plausible and efficient way to process retinal outputs and that it is consistent with a large body of data.
We refer here to the entire spectrum of FeyeM, excluding microsaccades, and to all two-dimensional visual cues that define visual objects (which we term 2D cues), excluding depth cues. The analysis that follows shows that different 2D cues can be encoded by different variables of retinal responses: shape by inter-receptor temporal phases, texture by instantaneous intra-burst rates, and motion by inter-burst temporal frequencies. The information carried by these retinal signals is valid only during an individual cycle of eye movement and thus must be read out during that cycle. We will show how an NPLL-like locking-and-decoding process, implemented by thalamocortical loops of the visual system, can set the timing for meaningful cortical processing of 2D cues and decode some of these cues. Decoding of relative depth information is postulated to be implemented by an independent mechanism, possibly one that relies on receptor activation during microsaccades, since these movements are predominantly horizontal (Liang et al., 2005) and are highly coordinated between the two eyes. An outline of such an algorithm is given in Sampling of Relative Depth Information by Microsaccades in Appendix 2; the rest of the article deals with the processing of 2D details via non-saccadic FeyeM.
The hypothesis is described primarily through statements and derivations expressed in the common terminology of experimental neuroscience, supported by schematic figures. Its internal consistency is demonstrated by formal definitions and derivations (presented in Appendix 1 and referenced at the relevant places in the text) and by computer simulations. Data that support our assumptions are presented in Appendix 2.
2D spatial details are sampled by each eye independently. The information conveyed to the visual system depends on the external scene, the pattern of FeyeM of that eye, and the structure of afferent RFs. According to the afferent scheme suggested by Hubel and Wiesel (1962), which is consistent with a large body of data (reviewed in Hubel, 1996; Ferster and Miller, 2000; Reid, 2001), cortical simple cells receive inputs from elongated arrays of retinal ganglion cells via parallel channels through the LGN. These elongated arrays are composed of ganglion cells that share the same polarity, ON or OFF; interestingly, this segregation seems to be refined by FeyeM during development (Rucci et al., 2000). As a result of this anatomical arrangement, simple cells respond most vigorously to bars of the appropriate polarity that are oriented parallel to the elongated axis of their RF and are flashed on or moved across it (Hubel and Wiesel, 1962). However, they do not respond only to oriented bars. In both cats and primates, simple cells also respond to single dots flashed within their RF (Hubel and Wiesel, 1962; Gur and Snodderly, 1987; Hirsch et al., 1995; Tsao et al., 2003). Simple cells and other cortical cells respond vigorously to single spots or random dot patterns moved across their RFs in various directions (Hubel, 1958; Skottun et al., 1988; Pack et al., 2003a; Grunewald and Skoumbourdis, 2004). Interestingly, simple cells even show selectivity for dots moving along the elongated axis of their RFs (Geisler et al., 2001). As a result, when complex stimuli are moved across a stationary retina, simple cells respond under a variety of conditions and are not limited to lines of a specific orientation (Creutzfeldt and Nothdurft, 1978). Scanning velocities in the above experiments were a few degrees per second, comparable with eye velocities during FeyeM (Riggs et al., 1954; Bengi and Thomas, 1968).
Indeed, cortical neurons also respond reliably in the reverse, natural case, in which a moving eye scans stationary images during fixation (Snodderly et al., 2001). This study clearly demonstrates that the primate visual system accurately tracks the contrast changes scanned by the eye during FeyeM.
The size of simple-cell RFs decreases toward the fovea. However, as recordings approach the fovea, measurements of RF size become difficult because of eye movements, especially in awake primates. Measurements made with the aid of image stabilization, i.e., while moving the image along the online-recorded trajectory of the FeyeM, show that at eccentricities of 2–9°, the width of subfields of simple RFs in layer 4 averages around 12 arcmin and can be as low as 5 arcmin or less (Kagan et al., 2002). We are not aware of stabilized data from smaller eccentricities. Extrapolation of these data suggests that the width of subfields of foveal simple cells is expected to be at the level of a single cone, which is consistent with a number of anatomical, physiological, and psychophysical indications (Polyak, 1941; Daniel and Whitteridge, 1961; Smallman et al., 1996; McMahon et al., 2000). It is not yet known whether foveal simple cells possess a single subfield (monocontrast cells; Kagan et al., 2002) or more, and what aspect ratios characterize their subfields. In the following, we will thus analyze the responses of single subfields of foveal simple cells and will term such a subfield an sRF. An ON sRF is a retinal field in which a given simple cell responds to transitions from dark to light, and an OFF sRF is a retinal field in which a given simple cell responds to transitions from light to dark. Our examples depict arrays of foveal sRFs of a similar polarity and a size of 1×4 cones. Whenever relative cortical coding is addressed, it refers to relative coding among sRFs of the same polarity.
In the natural case, the entire visual field is sampled synchronously by all retinal cells at the frequency of the FeyeM, and each location in the visual field is sampled in succession by neighboring retinal cells (Figure 1). During fixation, a single external dot “draws” contours on the retina, which cross sRFs in various directions, determined by the direction of eye movement (Figure 1). If responses of cortical simple cells were based on feedforward (FF) circuits only (“FF response”), their duration would be determined by the duration of the intersection of the dot trajectory and the sRF. As we will see later, the actual responses of simple cells involve additional components contributed by the thalamocortical closed-loop circuitry. However, in order to understand retinal encoding along and across sRFs, we virtually open these loops and analyze the FF responses as if no corticothalamic feedback were functioning.
Feedforward responses contain information about fine spatial relationships within the image. The resolution of this information depends on the orientation of the sRF with respect to the scanned image. Consider the schematic example in Figure 2A. When traversed by the eye, the external image is encoded by sRFs oriented in various directions, all of which provide useful information to the brain. Traditionally, processing of oriented edges, such as the patterned edge, has been assumed to rely on spike-rate (or count) information, which represents the match between the orientations of the sRF and the edge. For example, the global orientation of the patterned edge in Figure 2A can be determined by the identity of the maximally active sRFs (vertical ones in this case; blue ellipses). The orientation angle of the edge can be approximated with high resolution by interpolating the firing intensities of simple cells having nearby orientations (Wilson, 1986). However, this kind of coding is limited in spatial resolution. Simple cells oriented parallel to edges would respond with the same mean intensity to edges whose general orientations are similar, regardless of their fine shape (e.g., Figure 2B). In other words, the sampling resolution of an array of sRFs is lowest along the axis parallel to their orientation. Furthermore, the span and resolution of this intensity coding are limited by the maximal number of spikes that can be generated during a single scan of the edge. With foveal sRFs one to two ganglion cells wide and a FeyeM velocity of about 100′/s, the entire width of the sRF crosses the edge within about 5 ms, leaving time for one to two spikes per sRF. This places significant constraints on the number and reliability of sRFs required to reach perceptual resolution (see also A Simple Quantitative Account Against Spatial Retinal Coding of Fine Spatial Details in Appendix 2).
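The spike-budget argument above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 0.5′ foveal cone spacing and the 200–400 Hz range of peak firing rates are illustrative values that do not come from the text, and the helper names are hypothetical; only the ~100′/s FeyeM velocity and the 1×4-cone sRF size are taken from this section.

```python
import math

# Illustrative values (assumptions, not taken from the source):
CONE_SPACING_ARCMIN = 0.5       # assumed foveal cone spacing
PEAK_RATES_HZ = (200.0, 400.0)  # assumed range of peak ganglion-cell rates
# Value quoted in the text:
V_EYE_ARCMIN_PER_S = 100.0      # typical FeyeM velocity


def crossing_time_ms(extent_arcmin, v=V_EYE_ARCMIN_PER_S):
    """Time for an edge moving at speed v to sweep across a field of this extent."""
    return 1000.0 * extent_arcmin / v


def spike_budget(duration_ms, peak_rate_hz):
    """Upper bound on the number of spikes that fit within one crossing."""
    return math.floor(peak_rate_hz * duration_ms / 1000.0)


if __name__ == "__main__":
    width = 1 * CONE_SPACING_ARCMIN   # across the elongated axis
    length = 4 * CONE_SPACING_ARCMIN  # along the elongated axis (1x4 cones)
    for label, extent in (("across", width), ("along", length)):
        t = crossing_time_ms(extent)
        lo, hi = (spike_budget(t, r) for r in PEAK_RATES_HZ)
        print(f"{label:6s}: {t:4.1f} ms, {lo} to {hi} spikes")
```

With these values, an edge sweeps across the sRF width in 5 ms, leaving room for one to two spikes, as estimated in the text; the same edge takes 20 ms to travel along the 4-cone length of the sRF.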
Another important factor that strongly constrains how this information can be read is the existence of FeyeM. Simple cell responses depend crucially on the direction of eye movement. Thus, for example, if the movement of the eye is perpendicular to the orientation of the edge (e.g., Figure 2) and its velocity is appropriate, vertical sRFs will fire at the maximal rate. However, the response rate will decrease significantly as the direction of FeyeM becomes more parallel to that orientation. Moreover, movements in opposite directions would yield significantly different responses in most sRFs (see Snodderly et al., 2001). Thus, reading the intensity code of simple cells must take into account the direction of FeyeM at any given moment, or employ a mechanism that somehow corrects for them (e.g., Anderson and Van Essen, 1987). To retain high resolution, such a mechanism must operate pre-cortically, which does not seem plausible (Ahissar and Arieli, 2001).
So how does the visual system restore the information that is lost through FeyeM smearing? The simple answer is that the visual system does not need to restore this information because it is not lost. This information is conveyed in the form of a temporal code that is encoded along the elongated axes of sRFs. In this paper, we thus suggest an alternative to the classical encoding scheme, one that we believe is consistent with existing data at least to the same degree as the classical one. We suggest that information obtained across the elongated axes of sRFs (e.g., blue vertical sRFs in Figure 2) is used for coarse image analysis, while fine details are encoded along the elongated axes of the sRFs (Figure 2, red horizontal sRFs). With this encoding scheme, spatial information is encoded by temporal neuronal variables, and hyperacuity resolution is an intrinsic property of the encoded signals (Ahissar and Arieli, 2001). As we will see below, this temporal encoding captures the fine differences between the edges presented in Figure 2B.
While the mechanisms of spike generation in the retina are not completely understood, they are known to involve integration times of tens of ms (Meister et al., 1994; Chichilnisky, 2001). Importantly, however, onset timing uncertainty in the retina and LGN is on the order of 1 ms (Berry et al., 1997; Reinagel and Reid, 2000). Here, therefore, we assume that the spike timing of retinal ganglion cells, and the relative delays between neighboring ganglion cells, are reliable at the level of 1 ms. On this basis, the entire set of spatial relationships in an image can be encoded by a set of temporal delays between neuronal events (Ahissar and Arieli, 2001). Figure 3 schematically demonstrates such precise temporal encoding for stationary (Figures 3A,B) and moving (Figures 3C,D) stimuli: a horizontal sRF (red ellipse), composed of six OFF-type retinal ganglion RFs, scans an image during three cycles of an oscillatory epoch of horizontal eye movement (Figure 3A). Epochs of two to four oscillatory cycles of fairly constant frequency are abundant in human and monkey FeyeM data (Barlow, 1952; Byford and Stuart, 1961; Matin and Pearce, 1964; Bengi and Thomas, 1968; Coakley, 1983; Moller et al., 2002; Bosman et al., 2009); one such example is depicted in Figure 3E (see also Figure A2 in Appendix 2).
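A worked version of this encoding, assuming as a simplification that the retinal image moves at a roughly constant velocity V_eye during protraction, relates a spatial offset Δx between two edges to the delay Δt between the bursts they evoke, and converts the ~1 ms timing precision δt into a spatial resolution δx:

```latex
\Delta t = \frac{\Delta x}{V_{\mathrm{eye}}}, \qquad
\delta x_{\min} \approx V_{\mathrm{eye}}\,\delta t
  = 100\,\frac{\mathrm{arcmin}}{\mathrm{s}} \times 1\,\mathrm{ms}
  = 0.1' = 6''
```

With the ~100′/s FeyeM velocity quoted in this section, 1 ms of timing precision corresponds to about 6″, below the 0.3′ (18″) hyperacuity scale.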
Figure 3B shows a burst of five spikes during the rightward movement, generated by the retinal ganglion cells incorporated within the sRF that scans the image. Note that with this horizontal eye movement, the response is generated only during rightward movement, because this sRF is composed of OFF-type fields. With ON-type fields, responses would have been generated only during leftward movement. In the following, we refer to the direction of FeyeM that generates responses in a particular sRF as its “protraction direction,” and to the opposite direction as its “retraction direction.”
Movements in the visual field produce Doppler shifts of the retinal activation frequencies. For example, a light-to-dark edge that is moving away from an OFF-type sRF (Figure 3C) lengthens the intervals between successive retinal activations caused by the FeyeM oscillation (Figure 3D; compare the phases of burst onsets in the FeyeM cycle). This lengthening results in decreased inter-burst and intra-burst frequencies, and the decreases are proportional to the velocity of the object (Vx). Movement in the opposite direction increases these frequencies (Retinal Frequencies and Encoding of Motion in Appendix 1, Eq. 3). These Doppler shifts represent the component of velocity parallel to the trajectory of the eye. Movement components perpendicular to the trajectory of the eye might be detected by other mechanisms, possibly non-foveal ones. For example, ganglion cells of the salamander and rabbit retina detect relative motion induced by FeyeM-like movements (e.g., Olveczky et al., 2003). Their detection mechanism probably relies on the integration of many photoreceptors by each ganglion cell, an integration that does not occur in the human fovea (Polyak, 1941). Such retinal mechanisms could thus serve peripheral motion detection.
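To make the Doppler argument concrete, the sketch below uses a deliberately simple geometric model that is our assumption, not the model of Appendix 1, Eq. 3: in every FeyeM cycle of period T the sRF sweeps from the same starting point at constant speed V_eye, and a single edge moves at constant velocity Vx (positive when it recedes from the approaching sRF, i.e., moves in the sweep direction). Solving V_eye(t − nT) = x0 + Vx·t for the crossing time gives burst onsets spaced by T/(1 − Vx/V_eye), so a receding edge lengthens the inter-burst period and an approaching edge shortens it. The function names are hypothetical, and the limited sweep amplitude is ignored.

```python
def burst_onsets_ms(n_cycles, period_ms, v_eye, x0, v_obj):
    """Burst onset times (ms) evoked by one edge over successive FeyeM cycles.

    Simplified model (assumption): in cycle n the sRF starts at position 0 at
    t = n * period_ms and sweeps at constant speed v_eye (arcmin/ms); the edge
    is at x0 + v_obj * t (arcmin). The onset is the time at which they meet,
    i.e. the solution of v_eye * (t - n * period_ms) = x0 + v_obj * t.
    """
    if v_obj >= v_eye:
        raise ValueError("the edge must move slower than the sweep to be reached")
    return [(v_eye * n * period_ms + x0) / (v_eye - v_obj) for n in range(n_cycles)]


def inter_burst_periods(onsets):
    """Differences between successive burst onsets (ms)."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


if __name__ == "__main__":
    T, V_EYE = 80.0, 0.1  # 12.5 Hz cycles; 4 arcmin per 40 ms, as in the text
    for v_obj in (0.0, 0.003, -0.003):  # arcmin/ms; +: receding, -: approaching
        periods = inter_burst_periods(burst_onsets_ms(4, T, V_EYE, 1.0, v_obj))
        print(f"v_obj = {v_obj:+.3f} arcmin/ms -> period {periods[0]:.2f} ms")
```

In this toy model, an edge drifting at 3′/s lengthens the 80 ms inter-burst period to about 82.5 ms when receding and shortens it to about 77.7 ms when approaching.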
Encoding of shape, i.e., of the outline of an object, is demonstrated by the scanning of a stationary terraced edge (Figure 4A). Since the retinal movement of the stationary object is spatially coherent, the relative temporal phases of the bursts across the corresponding sRFs encode the relative spatial locations of the terraces (Temporal Encoding in Appendix 1, Eq. 1). In this example, a single cycle of FeyeM (4′ at 12.5 Hz) is shown. A burst of six spikes per FeyeM cycle occurs as the six horizontal OFF-type sRFs traverse the object. Spatial offsets of 15″ are encoded by temporal delays of ~2.5 ms (Veye = 4′/40 ms). Note that with encoding along the elongated axes of sRFs, neither the firing rate nor the total spike count provides information about the relative locations of the edges in the external image.
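The numbers in this example can be reproduced by a direct conversion between spatial offsets and burst delays. The sketch assumes, as a simplification, a constant scanning velocity during protraction (Veye = 4′/40 ms, from the text); the terrace offsets in the usage lines and the function names are hypothetical. Decoding is simply the inverse mapping, which holds within a single FeyeM cycle, where the same Veye applies to all sRFs.

```python
V_EYE_ARCMIN_PER_MS = 4.0 / 40.0  # 4 arcmin swept in 40 ms (half of a 12.5 Hz cycle)


def encode_offsets(offsets_arcsec, v_eye=V_EYE_ARCMIN_PER_MS):
    """Map relative edge offsets (arcsec) to relative burst-onset delays (ms)."""
    return [(offset / 60.0) / v_eye for offset in offsets_arcsec]


def decode_delays(delays_ms, v_eye=V_EYE_ARCMIN_PER_MS):
    """Inverse mapping: relative burst delays (ms) back to offsets (arcsec)."""
    return [delay * v_eye * 60.0 for delay in delays_ms]


if __name__ == "__main__":
    terraces = [0.0, 15.0, 30.0, 45.0]  # hypothetical terrace offsets, 15 arcsec apart
    delays = encode_offsets(terraces)
    print("delays (ms):", [round(d, 2) for d in delays])
    print("decoded (arcsec):", [round(x, 2) for x in decode_delays(delays)])
```

Each 15″ step maps to a 2.5 ms delay, matching the value given in the text, and the decoded offsets recover the original terrace positions.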
Encoding of texture, i.e., of the characteristics of surfaces such as patterning and spatial frequencies, is demonstrated by the scanning of two gratings (Figure 4). Here, the outline of the object (its shape) is again encoded by the relative onset phases of the FF bursts. However, unlike the case of a smooth surface (Figure 4A), a patterned surface induces additional spikes in each ganglion cell: each ganglion cell fires every time it crosses a luminance change while scanning the surface (Figure 4B). The mean spatial frequency of the surface is encoded by the mean firing rate during protraction (compare Figures 4A,B). The exact pattern is encoded by the temporal phases. In the following we will assume smooth surfaces for simplicity.
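Under the idealization that a ganglion cell emits one spike per effective luminance transition (for an OFF cell, once per grating period), the intra-burst rate is the eye velocity divided by the spatial period of the texture, and the spatial frequency can be read back from the mean interspike interval. The sketch below illustrates this relation with hypothetical helper names; the ~100′/s velocity is the value used in this section, and the one-spike-per-transition assumption ignores the burst variability discussed earlier.

```python
V_EYE_ARCMIN_PER_S = 100.0  # FeyeM velocity used in this section


def intra_burst_rate_hz(spatial_period_arcmin, v_eye=V_EYE_ARCMIN_PER_S):
    """Idealized firing rate during protraction: one spike per grating period."""
    return v_eye / spatial_period_arcmin


def spike_train(spatial_period_arcmin, n_spikes, v_eye=V_EYE_ARCMIN_PER_S):
    """Spike times (ms) of the idealized cell as it scans the grating."""
    isi_ms = 1000.0 / intra_burst_rate_hz(spatial_period_arcmin, v_eye)
    return [i * isi_ms for i in range(n_spikes)]


def spatial_frequency_from_burst(spike_times_ms, v_eye=V_EYE_ARCMIN_PER_S):
    """Estimate spatial frequency (cycles/arcmin) from the mean interspike interval."""
    isis = [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]
    mean_rate_hz = 1000.0 / (sum(isis) / len(isis))
    return mean_rate_hz / v_eye


if __name__ == "__main__":
    for period in (0.5, 1.0):  # hypothetical grating periods (arcmin)
        burst = spike_train(period, n_spikes=4)
        print(period, "arcmin ->", intra_burst_rate_hz(period), "Hz,",
              spatial_frequency_from_burst(burst), "cycles/arcmin")
```

In this idealization, a finer grating (0.5′ period) produces a 200 Hz intra-burst rate and a coarser one (1′ period) produces 100 Hz, so the mean rate during protraction tracks the spatial frequency.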
Most sRFs contain a “rod” of one polarity (ON or OFF) surrounded by flanks of the opposite polarity (Hubel and Wiesel, 1962). This structure produces a kind of isolated corridor through which image details are encoded in only one direction: along the elongated axis of the sRF. With typical FeyeM velocities, an image contrast scanned by an sRF in a direction not parallel to its elongated axis will not excite the central “rod,” owing to the inhibition activated at the flank a few ms earlier. This protected corridor serves two important functions. First, it allows independent processing of different axes of the visual image, e.g., vertical vs. horizontal (1D vision). Second, it provides information about the direction of the FeyeM. For example, when the scanning velocity is within the working range of the flank-center inhibition, a vertical FeyeM will not activate a horizontal sRF, and a horizontal FeyeM will not activate a vertical sRF, whatever image is scanned. Note also that complex cells are even better encoders of FeyeM direction (and perhaps also speed) because they integrate several simple cells of the same orientation. Moreover, the integrated activity of horizontal complex cells can signal the onset times of FeyeM cycles in the horizontal direction, and that of vertical complex cells in the vertical direction.
Decoding of temporally encoded signals can be implemented by various neuronal algorithms (Carr, 1993; Mauk and Buonomano, 2004; Theunissen et al., 2004). Specifically, encoding by a single common oscillatory modulation, as induced here by FeyeM, can be decoded efficiently by simple FF neural networks (Hopfield, 2004). The efficiency of several of these decoding algorithms has been demonstrated for sensory systems other than the visual system. While some of these algorithms could also be plausible for the visual system, we prefer to suggest a specific algorithm based on the circuitry and cell physiology found in the primary visual thalamocortical system.
For the description of the proposed decoding process, we depict an example similar to that of Figure 4A, but with the addition of a moving object (Figure 5A). Three of the horizontal bars that compose the external image are stationary and one (dashed rectangle) is moving leftward (Figure 5A). As a result, the FF frequencies of the sRF scanning the moving bar increase (Figure 5B and inset): the inter-burst period is shortened by ~2.5 ms in each cycle, and the intra-burst period is shortened by a similar ratio. These changes are specific to the sRF scanning the moving bar; inter- and intra-burst periods remain unchanged for the FF signals that represent the stationary parts of the external image.
Thus, shape and velocity are encoded at the retina by the temporal phases and frequencies of retinal bursts, respectively. Texture, as shown above, is encoded by the intra-burst firing pattern. The temporal structure of retinal activities is extremely reliable (Berry et al., 1997) and is well preserved up to the thalamus (Levick et al., 1972; Lee et al., 1981; Reinagel and Reid, 2000), and thus can be efficiently decoded by thalamic or thalamocortical circuits. In the following text and Figures 5 and 6 we demonstrate that decoding by thalamocortical circuits is possible.
A constraint of the FeyeM-driven encoding scheme is that decoding must be locked to the movements of the eye. Comparing onset times or temporal periods of neighboring retinal cells is meaningless if done across different cycles of FeyeM, because eye position and velocity are not constant across cycles. Moreover, decoding of shape and velocity requires an accurate comparison, in each FeyeM cycle, of the burst onset times of retinal sRFs. These constraints suggest that the brain should either use efference copies of the signals that control the FeyeM (Shakhnovich, 1977; Coakley, 1983; Eizenman et al., 1985), if they are available, or employ a mechanism that can lock to the afferent sensory signals; a combination of both strategies is also possible. Here we show how the brain could use a mechanism that locks to the sensory signals without the need for an efference copy.
Existing data from the mammalian visual system indicate that the visual cortex employs oscillatory mechanisms that can be used by the visual system to decode the information encoded by FeyeM (see Cortical Oscillations below). The most efficient oscillation-based temporal decoding mechanism is probably the phase-locked loop (Gardner, 1979), whose neuronal implementation is termed the neuronal phase-locked loop (NPLL; Ahissar, 1998). In its basic form, a single NPLL consists of a phase detector (PD), whose output firing rate reflects the temporal phase difference between its inputs, and a rate-controlled oscillator (RCO), whose oscillation frequency is controlled by its input firing rate; the circuit forms a negative closed loop (Figure 5C). Experimental evidence suggests that thalamocortical NPLLs are involved in temporal decoding in the mammalian somatosensory system (Ahissar and Vaadia, 1990; Ahissar et al., 1997, 2000; Kleinfeld et al., 1999, 2002; Ahissar and Zacksenhouse, 2001; Sosnik et al., 2001; Moore, 2004; Zacksenhouse and Ahissar, 2006; Fox, 2008). We thus explore how visual NPLLs, if they exist, could facilitate the readout of the information encoded by FeyeM.
Below (text and Figures 5 and 6), we describe how a single NPLL circuit, when implemented within a visual thalamocortical loop, could decode retinal information that is temporally encoded by FeyeM. We suggest that many such NPLLs are implemented in parallel, each with a slightly different intrinsic frequency, together covering the entire spectrum of FeyeM frequencies (see The Visual Motor-Sensory Loop and Figure 9 below, and A Global View on the Perceptual Process in the Discussion; Ahissar, 1998). We suggest an implementation that relies on the same experimental data that often provide the basis for purely spatial models. The suggested implementation, however, involves new functional interpretations of classical terms, such as orientation selectivity and optimal response. For example, one important departure from classical convention is that optimal responses do not necessarily involve the maximal possible firing rates, and computations are not necessarily based on feature detection signaled by firing rates.
The core idea behind oscillation-based decoding is that the oscillations establish a local predictor of input timing (Figure 5C). By comparing the predicted time (tosc) with the actual time of the input (tin), a difference signal (Rout) is generated. In closed-loop oscillation-based decoding, such as in the NPLL, the difference signal modulates the local oscillator in a negative feedback manner; namely, it drives the oscillator in the direction that brings the difference toward a constant value. If parameters are tuned correctly, this negative feedback forces the local oscillations to track the timing of the input (see Loop Operation in Appendix 1 and Gardner, 1979). This generates a delayed internal replica of input times, which is locked to the FeyeM (Figure 5D). The local oscillations, being the best internal predictors of input timing, provide an internal marker for downstream processing, a kind of processing clock. In addition, when the loop is locked, the difference signal, Rout, represents the deviations of the input timing from its own history (i.e., the input’s temporal modulations).
A simplified model of a thalamocortical NPLL, which includes the minimal number of basic components and obeys the constraints of known thalamocortical circuitry (White, 1989; Douglas and Martin, 1991; Lund and Yoshioka, 1991; Sherman and Guillery, 1996), is depicted in Figure 6. Thalamocortical projections converge on simple cells (SC, Hubel, 1996) in layer IV. These simple cells can be inhibitory, as in Figure 6, or could drive inhibitory interneurons (INH, Anderson et al., 1994). The inhibitory outputs project onto oscillatory neurons (such as “chattering” or “fast rhythmic bursting” cells; Gray and McCormick, 1996; Cardin et al., 2005; see also Ahissar and Vaadia, 1990; Llinas et al., 1991; Silva et al., 1991), which then close the loop by projecting onto the same thalamic cells that drive the loop’s simple cells. The corticothalamic projection might be implemented directly by oscillatory neurons in layer 6 (Cardin et al., 2005), or it might be mediated by other corticothalamic neurons if chattering cells of layers 2/3 (Gray and McCormick, 1996) or cortical interneurons (Cardin et al., 2009) function as the local oscillators in specific circuits. In principle, the only component in the loop whose implementation requires more than a single neuron is the thalamic array. However, it is postulated that the entire thalamocortical loop has a “width” of more than one neuron, i.e., every component is implemented by a group of coupled neurons. Such an implementation is both more realistic and more efficient (Ahissar, 1998).
In order to obtain temporal decoding, the following three basic principles should be obeyed by a single loop (see Gardner, 1979): (i) the output of the simple cells (Rout) should decrease as the retino-cortical delay (tD) increases (Figure 6B, red curve); (ii) the period (inter-burst interval) of the oscillatory cells should increase as the output of the simple cells increases (Figure 6B, blue dashed curve), and this change increases the delay between the retinal output and the cortical oscillatory cells (tD; note that for this function the ordinate describes the input, Rout, and the abscissa describes the output, tD); and (iii) consequently, the product of the gains (slopes) of the two transfer functions described in (i) and (ii), the so-called open-loop gain (G), should be negative. In Figure 5D, an optimal set of loop parameters (G=−1) was used, such that the first correction of the oscillating period is the required one. When these requirements are obeyed, the thalamocortical loop is forced to “lock-in” to the retinal rhythm and to track its fluctuations (for constraints on locking dynamics, see Ahissar, 1998). The range of retinal frequencies for which these requirements are met by a specific NPLL is called the working range of that NPLL. The phase-locking machinery of NPLLs ensures that processing is done cycle by cycle, with every processing cycle matched to a specific FeyeM cycle. When an NPLL is locked to the FeyeM, the outputs of its cortical simple cells recode the retinal frequency by spike count (per cycle) and preserve the temporal phase relationships between different retinal outputs (see Cortical Representations below).
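The role of G can be made explicit with a linearized sketch. This is our simplification, not the Appendix 1 equations: we assume that near the working point both transfer functions are linear, define the delay in cycle n as $d_n = t_{osc,n} - t_{in,n}$, and let the oscillator period $P_n$, set after cycle n, determine the timing of the next cortical burst while the input period $T_{in}$ stays constant. Because a change in $P_n$ shifts the next delay one-for-one, $g_2$ is also the slope of the function in (ii).

$$
\begin{aligned}
R_n &= R^* + g_1\,(d_n - d^*), \qquad g_1 = \frac{\partial R_{out}}{\partial t_D} < 0,\\
P_n &= P^* + g_2\,(R_n - R^*), \qquad g_2 = \frac{\partial P}{\partial R_{out}} > 0,\\
d_{n+1} &= d_n + P_n - T_{in},\\
\varepsilon_{n+1} \equiv d_{n+1} - d^* &= \varepsilon_n + g_1 g_2\,\varepsilon_n = (1 + G)\,\varepsilon_n, \qquad G = g_1 g_2,
\end{aligned}
$$

where the last line uses $P^* = T_{in}$ at the working point (frequency locking). In this sketch the deviation shrinks geometrically whenever $-2 < G < 0$. $G = -1$ cancels it in a single cycle, which is why that value is optimal. $G \ge 0$ (positive feedback) or $G \le -2$ (overcorrection) prevents convergence.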
Furthermore, when an NPLL is locked to the FeyeM, the retino-cortical delay will be directly related to the inter-burst frequency of the retinal output (Figure 6B). This holds both within each cycle and across cycles. Within a cycle, an instantaneous increase in retinal frequency will advance the next retinal spike, while the next cortical oscillatory spike will occur at its previously predicted time. Across cycles, owing to the negative feedback of the loop, the new working point involves less inhibition of the cortical oscillatory cells, a reduced Rout, and an increased tD (Ahissar, 1998).
As long as a neuronal loop obeys the requirements mentioned above, its exact implementation is not important, and could vary among different species or even among different individuals. For example, the circuit can be simple in cases where simple cells are by themselves inhibitory or when the oscillatory neurons project directly to the thalamus (as in Figure 6; Cardin et al., 2005), or it could be more complex if additional neurons relay information between the basic components, for example, between thalamocortical neurons and simple cells (Blasdel and Fitzpatrick, 1984).
In our model, the cortical oscillations produce internal expectations regarding the timing of the next retinal output. These temporal expectations are manifested by the output of the oscillatory neurons, and thus also by the onset time of the corticothalamic gating signal (blue rectangles in the Figure 6 inset and in Figure 5D). For every given retinal location, and its corresponding thalamocortical loop, thalamic “relay” neurons (i.e., PDs) detect deviations of retinal output from its expected timing. These deviations affect the spike count of simple cells, and thereby readjust the period of the subsequent cortical oscillation cycle. For example, assume that the input period changes from 80 to 77.5ms, as is the case with the moving bar in Figures 5A,B. Since the cortical expectation for the second cycle in Figure 5 is still 80ms, the phase difference between the input (Figure 5D, black spikes) and the cortical oscillations (blue rectangles) will increase (from 7.5 to 10ms; second cycle in Figure 5D). Consequently, the output of the simple cell(s) will decrease (from three to two spikes; Figure 5E). Inhibition of the oscillatory neurons will then decrease, and therefore the cortical oscillating period of the next cycle will decrease, thus tracking the change at the input. In this example, optimal loop parameters (G=−1) were used, such that the first correction of the oscillating period is the required one. Thus the loop is already stabilized at its new working point one cycle after the change in the input occurred (i.e., at period=77.5ms, delay=10ms, output=2 spikes/cycle).
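The single-cycle correction in this example can be reproduced with a few lines of Python. This is a minimal sketch, not the simulation behind Figure 5. It makes three assumptions. First, the two transfer functions are hypothetical straight lines through the two working points quoted above (7.5ms → 3 spikes → 80ms; 10ms → 2 spikes → 77.5ms). Second, spike counts are treated as continuous values. Third, the helper names are illustrative. The slopes are −0.4 spikes/ms and 2.5ms/spike, so G = −0.4 × 2.5 = −1.

```python
def simple_cell_output(delay_ms):
    """Assumed PD/simple-cell transfer: spikes per cycle fall by 0.4 per ms of delay."""
    return 3.0 - 0.4 * (delay_ms - 7.5)


def oscillator_period(spikes):
    """Assumed oscillator transfer: more simple-cell output (more inhibition) -> longer period."""
    return 80.0 + 2.5 * (spikes - 3.0)


def run_loop(input_periods, delay=7.5):
    """Return (delay ms, spikes, next period ms) for the locked initial cycle and each later cycle."""
    spikes = simple_cell_output(delay)
    period = oscillator_period(spikes)
    history = [(delay, spikes, period)]
    for t_in in input_periods:
        # The oscillator keeps its predicted period while the input now arrives after t_in,
        # so the retino-cortical delay changes by (period - t_in).
        delay = delay + period - t_in
        spikes = simple_cell_output(delay)
        period = oscillator_period(spikes)  # correction used in the following cycle
        history.append((delay, spikes, period))
    return history


for delay, spikes, period in run_loop([77.5, 77.5, 77.5]):
    print(f"delay={delay:.1f} ms  spikes={spikes:.1f}  next period={period:.1f} ms")
```

After one cycle the delay settles at 10ms and the output at 2 spikes per cycle. The next oscillator period equals the new input period of 77.5ms, and all three values remain unchanged in later cycles.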
With less optimal parameters, complete stabilization might require more than a single cycle (Ahissar, 1998). In any case, if the retinal frequency is within the working range of the loop, the cortical period is forced to converge eventually to the retinal period, since (due to the negative feedback) the thalamic “error signal” will always drive the cortical oscillations in the direction that tends to cancel the “error.” Note that the stabilized output spike count is determined by the inter-burst frequency; for decreasing retinal periods (increasing frequencies), the loop working point (Figure 6B) will move to the right, toward longer retino-cortical delays. Thus, when the frequency of FeyeM changes continuously, as it does during natural viewing, the cortical oscillatory frequency will track those changes with a lag of at least one FeyeM cycle, and in each cycle the tracking error is reported by the output of simple cells. The emergence of phase-locking from the transfer functions of the loop is described in the following sections.
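The effect of less optimal gains, and the one-cycle lag under continuously changing input, can be illustrated by parameterizing a linear loop by its open-loop gain G. This is again a hypothetical sketch. The oscillator period is assumed to be a linear function of the delay, P = 80 + G·(d − 7.5) ms. The working range (saturation of the transfer functions) is ignored, so the sketch does not capture loss of lock.

```python
def track(input_periods, gain, delay=7.5, p_ref=80.0, d_ref=7.5):
    """Hypothetical linearized NPLL; returns the oscillator period set after each input cycle.

    gain is the open-loop gain G (product of the two transfer-function slopes).
    """
    period = p_ref + gain * (delay - d_ref)
    periods = []
    for t_in in input_periods:
        delay += period - t_in                   # timing error accumulated during this cycle
        period = p_ref + gain * (delay - d_ref)  # correction applied to the next cycle
        periods.append(period)
    return periods


# Step from 80 to 77.5 ms: G = -1 corrects in one cycle, G = -0.5 halves the period error
# each cycle, and G = -2.5 overcorrects so that the error grows.
for g in (-1.0, -0.5, -2.5):
    print(g, [round(p, 3) for p in track([77.5] * 5, gain=g)])

# Continuously shortening input periods with G = -1: each new period copies the previous input period.
print(track([80.0, 78.0, 76.0, 74.0], gain=-1.0))
```

In this linear sketch the period error obeys e(n+1) = (1 + G)·e(n). Because of this relation, the cortical period always trails a continuously changing input by at least one cycle, even with G = −1.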
Cortical oscillations can track input frequencies if their frequencies (i) are in the range of the input frequencies, and (ii) can be modulated by local cortical inputs (Cortical Oscillations in Appendix 1, Eq. 4). Both requirements are fulfilled by mammalian cortical oscillations. The spectral densities of human FeyeM and of neuronal oscillations in the monkey visual cortex (Eckhorn, 1994) exhibit a striking similarity; both emphasize alpha and gamma modes (see Ahissar and Arieli, 2001). Oscillations at frequency ranges that match those of FeyeM have been observed in the visual cortex of cats, ferrets, and monkeys (Gray et al., 1989; Eckhorn, 1994; Gray and McCormick, 1996; Brumberg et al., 2000; Cardin et al., 2005; Bosman et al., 2009). The frequencies of visual cortical oscillations can be controlled locally (Gray and McCormick, 1996; Brumberg et al., 2000; Cardin et al., 2005) and can be modulated by and locked to external stimuli (Eckhorn et al., 1993; Gray and Viana Di Prisco, 1997; Cardin et al., 2005; Bosman et al., 2009), as is the case for other modalities (for review, see Ahissar, 1998).
Besides frequency ranges, the main difference observed between visual and other cortical oscillations is that visual oscillations have not been observed so far in the absence of visual stimuli. This might indicate that expression of cortical oscillations (e.g., translation of sub-threshold oscillations to spike activity) requires an additional excitatory input (Gray and McCormick, 1996; Cardin et al., 2005). Such an input (“M” in Figure 6) can be provided by an internal preparatory signal or by an afferent stimulus-driven signal, and could be shaped during development and by learning (Ahissar et al., 1998).
Thalamocortical neurons of an NPLL are required to generate an output whose spike count decreases as the retino-cortical delay increases. This function is easily implemented by corticothalamic gating (Figure 6), in which thalamic transfer strongly depends on the cortical signal (Diamond et al., 1992; McCormick and Bal, 1994; Sherman and Guillery, 1996). In this gating mode, thalamic neurons fire only, or mostly, when retinal and cortical inputs overlap. When thalamic neurons are in this mode and their cortical input is periodic, the result is periodic gating. In each oscillation cycle, thalamic output will be maximal when the two inputs fully overlap and will gradually decrease as the delay increases, until no overlap occurs (Thalamic Phase Detection in Appendix 1, Eq. 5). Thus, in each oscillation cycle the phase difference between retinal and cortical activities is computed and recoded by spike counts. Operation of the PD is described schematically in Figures 5 and 6, in which the FF and feedback signals are illustrated as a spike burst and a rectangular function, respectively. These activity symbols are used for illustrative purposes only; both signals are conveyed by discrete spikes, which in turn induce continuous post-synaptic potentials in their post-synaptic PD cells. The mechanism of thalamic phase detection resembles that of coincidence detection suggested for the auditory brainstem (Carr and Konishi, 1990), but with a wider response range: the response of thalamic phase detection decreases gradually as the phase difference increases, whereas that of coincidence detectors decreases sharply. The gradual response profile is essential for the operation of the NPLL (see Phase Locking above).
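The difference between graded phase detection and sharp coincidence detection can be sketched by counting how many spikes of a retinal burst fall inside a cortical gating window. The parameter values are hypothetical, chosen for illustration: a burst of six spikes at 2.5ms intervals, a 20ms gate, and a 1ms coincidence tolerance. The spike-burst/rectangle idealization is the same simplification as the one used in Figures 5 and 6.

```python
BURST_MS = (0.0, 2.5, 5.0, 7.5, 10.0, 12.5)  # hypothetical retinal burst, times from burst onset


def phase_detector_count(delay_ms, burst=BURST_MS, gate_ms=20.0):
    """Graded PD: number of burst spikes inside a gate that opens delay_ms after burst onset."""
    return sum(delay_ms <= t < delay_ms + gate_ms for t in burst)


def coincidence_detector(delay_ms, tolerance_ms=1.0):
    """Sharp detector: responds (1) only when the two inputs nearly coincide."""
    return 1 if abs(delay_ms) <= tolerance_ms else 0


for d in (0.0, 2.5, 5.0, 7.5, 10.0, 12.5, 15.0):
    print(f"delay={d:4.1f} ms  PD spikes={phase_detector_count(d)}  coincidence={coincidence_detector(d)}")
```

With these values the PD count drops by one spike per 2.5ms of delay, from 6 spikes at zero delay to none at 15ms. This graded, monotonic function is the kind required by principle (i). The coincidence detector, by contrast, is all-or-none around zero delay.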
In each oscillation cycle, loop variables depend on the input and on their values during the previous cycle (Appendix 1, Eqs 6 and 7). Thus, for any given constant input, the loop variables, such as simple-cell spike count and oscillator phase, will continue changing until frequency locking is achieved. For each input frequency, the loop will stabilize at a different set point, i.e., a different set of variable values (Appendix 1, Eqs 8–10). When the frequency of the input is not constant, cortical oscillations will track the changes in the input periodicity, aiming at (though never achieving) a new set point upon each change (Appendix 1, Eq. 11).
The dynamics of the loop following changes in FeyeM frequency are similar to those occurring following external motion (Figure 5; see Phase Locking above). Upon increased FeyeM frequency, the FF signals (of all NPLLs) arrive earlier than expected, while the gating signals appear at the time corresponding to the on-going expectation. The result is a decreased overlap in the thalamus (e.g., Figure 5D, third row), thus decreased SC outputs, decreased inhibition on the oscillatory neurons, and thus increased oscillatory frequency. A similar process, with opposite signs, occurs with decreased FeyeM frequency.
Thalamic gating is not operational following sustained silent periods (McCormick and Bal, 1994; Sherman and Guillery, 1996). Therefore, when stimulating paralyzed eyes at low frequencies (<2Hz), thalamocortical neurons would often be activated by retinal outputs even in the absence of cortical activity. If a burst of stimuli follows the first stimulus, a locking-in process is expected to be initiated in the corresponding NPLLs (see Figure 9 in Sosnik et al., 2001).
Retinal periodicities are translated into latencies and spike counts of cortical simple cells (Figure 6B). Since the external velocity (Vx) is encoded at the retina by a frequency shift, it is represented in the cortex by a decrement (Figure 5E) or an increment in the number of spikes per cycle in the output of the corresponding simple cells. Note that the coding of retinal velocity is differential: local motion is coded by the difference between the spike counts of neighboring simple cells. Thus, although absolute spike counts can vary with variations in FeyeM, the local difference due to external local motion will remain. In fact, differential coding applies to FeyeM-induced temporal coding in general. Image shape and texture, as already mentioned, are also coded differentially in this scheme. They are represented by temporal phase relationships among neighboring simple cells (shape, Figure 5E, first cycle; Cortical Representations in Appendix 1) and by relative intensities of intra-burst firing (texture; not shown).
Central processing of 2D details, like that of depth cues, is often described in terms of spatiotemporal filtering (Barlow, 1979; Fahle and Poggio, 1981; Burr et al., 1986; DeAngelis et al., 1993; Morgan and Castet, 1995). This description does not specify the actual processing carried out by central circuits, but it indicates that the processing should be carried out over both dimensions interactively. The NPLL circuits suggested here can be described as adaptive temporal filters whose center frequency follows that of FeyeM. The outputs of these NPLLs, i.e., the outputs of cortical simple cells, are processed by complex cells across space. Adjacent simple cells feed the decoded temporal information (now in a spike-count code) to individual complex cells. Complex cells process (usually integrate) these inputs across the spatial dimension. The exact computation performed by complex cells, and their feedback interactions with thalamocortical NPLLs, are not yet clear. In any case, according to the scheme presented here, thalamocortical NPLLs are responsible for the temporal component of the implied spatiotemporal central processors of visual inputs, whereas cortical complex cells, interneurons, and the associated circuitry are part of their spatial component.
Although typically cyclic, FeyeM are not purely periodic. Trajectories of FeyeM are strongly modulated in both amplitude and frequency (see Examples of Human FeyeM in Appendix 2). Since the induced eye movement is the same for the entire retina, relative time coding, i.e., coding spatial offsets by relative temporal delays across neighboring cells, should not depend crucially on the exact pattern of FeyeM. We tested this prediction using computer simulations. Three gray images with patterned left edges (Figure 7B, images 1–3) were scanned by a small retinal field that was moved according to the FeyeM recorded in a human subject (Figure 7A and Examples of Human FeyeM in Appendix 2). Ganglion cell firings were determined at a resolution of 1ms (Meister and Berry, 1999; Reinagel and Reid, 2000) using a simple threshold-crossing mechanism. Note that adding integration-based mechanisms would be expected to affect the absolute firing times of ganglion cells, although not the timing reliability, and thus not the relative temporal code. The FF signals were conveyed by OFF-center subfields of foveal simple cells (sRFs) of 4×1 (horizontal) and 1×4 (vertical) receptors, 1arcmin each, flanked by symmetric inhibitory fields (1 receptor wide, Figure 2D in Hubel and Wiesel, 1962). Each sRF fed a “thalamocortical” loop that contained an RCO and an SC and obeyed the NPLL equations (Eqs 4 and 5 in Appendix 1).
The dynamics of the simulated responses were analyzed in relation to FeyeM cycles. In each cycle, the delay to the first spike (del) and the total spike count (sp) were calculated. The vertical sRFs were blind to the fine offsets of the image’s left edge: the delays and spike counts conveyed by these sRFs (Figure 7B depicts one column of vertical sRFs) were almost identical for the three images. In contrast, horizontal sRFs exhibited different dynamics for each edge, resulting in a unique temporal representation during the 3rd cycle. The images were represented by the relative firing times of four horizontal sRFs (sRFs #7–10 in Figure 7B, enlarged in a separate panel). For image 1, sRFs #7 and 8 exhibited relatively short delays from cycle onset, while sRFs #9 and 10 exhibited relatively long delays (short-short-long-long). Similarly, images 2 and 3 were represented by short-long-short-long and short-long-long-short delay patterns, respectively. These representations were preserved at the outputs of the NPLLs (Figure 7B, SC horizontal).
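A toy model reproduces the logic of this result: relative firing times along the horizontal sRFs distinguish edges that a pooled response cannot. It is a sketch under stated assumptions, not the simulation of Figures 7 and 8. Each of the four horizontal sRFs is reduced to a single receptor, which fires at the first 1-ms sample at which a sinusoidal horizontal eye movement carries it past the edge in its row. The edge offsets, amplitude, and frequency are hypothetical values. A vertical, pooling sRF is crudely stood in for by the sorted list of delays, which discards which row fired when.

```python
import math

# Hypothetical parameters, not those of the published simulation.
AMPLITUDE_ARCMIN = 0.5   # horizontal FeyeM amplitude
FREQ_HZ = 40.0           # FeyeM frequency (sinusoidal stand-in for recorded FeyeM)
EDGE_OFFSETS = {         # edge position (arcmin) under four adjacent horizontal sRFs
    "image 1": [0.1, 0.1, 0.3, 0.3],
    "image 2": [0.1, 0.3, 0.1, 0.3],
    "image 3": [0.1, 0.3, 0.3, 0.1],
}


def eye_position(t_ms):
    """Horizontal eye displacement (arcmin) t_ms after cycle onset."""
    return AMPLITUDE_ARCMIN * math.sin(2 * math.pi * FREQ_HZ * t_ms / 1000.0)


def first_spike_delay(edge, max_ms=12):
    """Threshold crossing at 1-ms resolution: first sample at which the receptor passes the edge."""
    for t in range(max_ms + 1):
        if eye_position(t) >= edge:
            return t
    return None  # edge not reached during this cycle


def pattern(delays):
    """Label each sRF's delay as short (equal to the minimum) or long."""
    shortest = min(delays)
    return "-".join("short" if d == shortest else "long" for d in delays)


for name, edges in EDGE_OFFSETS.items():
    delays = [first_spike_delay(e) for e in edges]
    # Pooling across rows keeps only the multiset of delays, which is the same for all images.
    print(name, delays, pattern(delays), "pooled:", sorted(delays))
```

With these values the per-row delays are [1, 1, 3, 3], [1, 3, 1, 3] and [1, 3, 3, 1]ms, reproducing the three delay patterns described above. The pooled lists are identical for all three images.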
The dynamics of spike generation in the relevant horizontal NPLLs are depicted in Figure 8. The four coding NPLLs (#7–10) received sRF inputs that allowed phase-locking to the horizontal FeyeM. As a result, the RCO bursts (red vertical lines) were phase-locked to the FeyeM cycle onsets (dotted green vertical lines). This resulted in a reduced frequency in the RCOs of these NPLLs: from 4 to ~3 cycles in the time window depicted. RCO locking to FeyeM enabled the translation of sRF firings (magenta vertical lines) into SC firings (blue vertical lines). The other four NPLLs fired out of phase with the input and thus blocked their corresponding sRF signals.
These computer simulations demonstrate three points. First, they show the validity of temporal encoding of shape by FeyeM. Second, for fine spatial details, they show the superiority of temporal coding along the elongated axes of sRFs over rate coding across these axes. Third, they show that NPLLs can phase-lock to retinal outputs with natural FeyeM, and can thus recode shape by relative time coding. These simulations do not address the encoding and decoding of object motion or texture.
One long-standing unresolved puzzle in vision is: why does the world appear stable if our eyes move all the time? The answer to this puzzle is surprisingly simple: the puzzle exists only if one assumes that FeyeM are not part of the visual mechanism. For example, if the visual system were to read a spatial retinal code, this code would move across the retina according to the trajectory of FeyeM. In that case, some mechanism in the brain would have to correct for eye movements, and it is totally unclear what mechanism could do that. However, if vision is done via FeyeM, no correction is needed. These very movements sample the information; this is the way the visual system acquires information, exactly as the tactual system acquires information by moving the fingers (or the whiskers in rodents). As the system is tuned to move its sensors in order to acquire information, nothing in the system should interpret a coherent movement of the entire sensory field as a movement of the world. Information about the actual movement of the eye is continuously conveyed to the brain by the spatiotemporal activation of horizontal and vertical retinal sRFs (see Encoding of FeyeM Direction and Resolving Image Details by sRF Structure: 1D Vision above). Interference with this spatiotemporal pattern contributes to motion illusions such as autokinesis and induced movement (Poletti et al., 2010), suggesting their inclusion in the seeing loop. A prediction of seeing via FeyeM is thus that small or slow FeyeM-like movements of the entire visual field should not be perceived [see prediction (ii) below].
Optimal functioning of NPLLs requires their operation within a motor-sensory feedback loop (Ahissar and Vaadia, 1990; Ahissar, 1998). Schematically, the loop level containing the NPLLs can be described as shown in Figure 9. The movements of the eye [Ve(t)] and of external objects [Vx(t)] induce movements of the image on the retina [Vi(t)], which generate afferent visual signals (green arrows). The afferent signals are filtered along their afferent pathway (IFs) and fed into thalamocortical NPLLs. The loop contains many NPLLs, each tuned to a different intrinsic frequency and thus capable of decoding visual information carried by a different range of temporal frequencies. Available data on thalamocortical visual oscillations suggest that this bank of NPLLs spans the range of frequencies between 1 and 100Hz, with some tendency for frequencies in the δ, α, and γ ranges (Gray et al., 1989; Eckhorn, 1994; Gray and McCormick, 1996; Brumberg et al., 2000; Cardin et al., 2005; Bosman et al., 2009). The outputs of the NPLLs drive circuits that decode shape, texture, and local motion information. In addition, they close the motor-sensory loop by driving motor circuits that control eye velocity. This entire motor-sensory loop functions as a negative feedback loop: a change in eye velocity induces a change in the NPLLs’ outputs [Rout(t)], which generates motor signals that oppose the change in eye velocity (insets of transfer functions). Such a negative feedback loop can monitor the mean output of the NPLLs and modify eye velocity accordingly. In this way it can maintain the mean temporal frequency of the retinal output in a desired range (The Motor-Sensory Loop in Appendix 1).
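A toy version of this negative feedback can be written under a simplifying assumption not taken from Appendix 1: the retinal temporal frequency is taken to equal eye drift velocity divided by the dominant spatial period of the image. That frequency is then used directly as a stand-in for the mean NPLL output. The target frequency, initial velocity, gain k, and function name below are all hypothetical.

```python
def regulate_eye_velocity(spatial_periods, target_hz=40.0, v0=60.0, k=1.0):
    """Hypothetical proportional motor-sensory loop.

    spatial_periods: dominant spatial period of the image (arcmin) at each step.
    Returns a list of (eye velocity in arcmin/s, retinal temporal frequency in Hz).
    """
    v = v0
    history = []
    for lam in spatial_periods:
        f = v / lam               # temporal frequency of a drift at v over a period lam
        history.append((v, f))
        v += k * (target_hz - f)  # motor command opposes the deviation (negative feedback)
    return history


# The image's dominant spatial period doubles partway through.
trace = regulate_eye_velocity([2.0] * 30 + [4.0] * 60)
print(trace[0], trace[29], trace[30], trace[-1])
```

In this sketch the frequency error shrinks by a factor (1 − k/λ) per step, so the loop is stable for 0 < k/λ < 2. When the spatial period doubles, retinal frequency first drops to about 20Hz. The loop then raises eye velocity (here toward 160arcmin/s) until the retinal temporal frequency returns to the 40Hz target.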
At any given viewing period there should be one or several NPLLs, out of the many NPLLs included in the bank, that dominate the motor-sensory loop; the motor-sensory loop functions to optimize the input for these NPLLs during that period. Given the dynamics of frequency variation in FeyeM (e.g., Moller et al., 2002), it would make sense to assume a dynamic shift of control from one pool of NPLLs to another every three or four FeyeM cycles. With optimal tuning, such a period should be sufficient for efficient temporal decoding (Figures 7 and 8; Ahissar, 1998). The mechanism that determines the shift of control within the NPLL bank is not yet clear, although a few potential principles for its operation have been suggested (Ahissar, 1998).
Natural vision is a continuous process that, at any given moment, has to deal with retinal activity that has been affected by recent FeyeM and by optical blurring. FeyeM enforce temporal encoding at the retina. Spikes of retinal outputs indicate times in which image contrasts are crossed by moving receptors, much like spikes of mechanoreceptors indicate times in which their RF crosses a ridge. Such temporal encoding has a hyperacuity resolution and is resistant to optical blurring. Metaphorically, vision can be described as an active process in which the retina “palpates” external objects like rat whiskers and human fingers do. In both vision and touch, the sensory organs are mainly activated when they encounter changes during their scanning movements.
We suggest here a specific way in which the visual system could decode temporally encoded retinal signals. The proposed active decoding scheme involves cortical oscillations (which function as “temporal rulers”) and thalamic phase comparators (which compare input timing against the “ruler”). This scheme is consistent with a large body of anatomical and physiological data. Although the data supporting it are compelling, we do not claim that this proposed scheme is the only possible decoding mechanism or that there is a single decoding mechanism for retinal outputs. Several such mechanisms are available to the visual system, and brains must emphasize one or another mechanism, depending on the visual stimulus, context, and previous experience. During natural viewing, we suggest, the active decoding mechanism described here plays a major role.
The temporal encoding-decoding scheme presented here provides a mechanism that is free from the retinal smearing that would otherwise be caused by FeyeM. According to this scheme, local hyperacuity is resistant to optical blurring because of the differential nature of retinal temporal encoding. Utilization of the temporal domain by the visual system allows information to be accumulated, rather than averaged, during the entire fixation period. Furthermore, such utilization of temporal coding enables control of visual resolution by simply controlling eye velocity (Saig et al., 2012); such control can be used to adapt visual resolution to the spatial frequencies of the external image. The decoding by intrinsic oscillations provides an adaptive band-pass filtering mechanism in which the center frequency of the filter tracks the dominant frequency in the input. Note that this mechanism also provides a “temporal extrapolation” mechanism, which can predict the time at which a moving object will cross a given retinal RF by locking to the Doppler-shifted frequency (see Phase Locking above). Finally, decoding by intrinsic oscillations provides internal temporal markers that can facilitate serial processing of sequential chunks of information. These chunks contain inter-related information, such as the information sampled in parallel across the retina during each FeyeM cycle.
Accurate vision, if mediated by NPLLs, should require continuous fixation. This is not inconsistent with common experience and with controlled studies indicating that visual acuity does improve with longer fixation periods (Riggs et al., 1953; Keesey, 1960; Morgan et al., 1983; Packer and Williams, 1992). Following stimulus onset, visual accuracy is incrementally impaired and starts to improve again only after about 200ms (Packer and Williams, 1992). Interestingly, image stabilization during the initial periods of fixation facilitates accurate vision. Beyond ~100ms, stabilization is detrimental and retinal motion is helpful for accurate vision (Riggs et al., 1953). All these data are consistent with visual accuracy being impaired by the smearing caused by FeyeM until thalamocortical circuits lock-in and start to decode the temporally encoded information. According to this scenario, it makes sense for the visual system to ignore cortical outputs for periods of up to about 200ms immediately after a saccade, a phenomenon known as saccadic suppression (Carpenter, 1988).
Aliasing occurs when the sampling frequency is too low relative to the sampled frequency, i.e., below twice its value. To prevent it, temporal encoding should, and probably does, rely on multiple frequencies (see Ahissar and Arieli, 2001). During a fixation period, different frequencies can be superimposed or can appear sequentially. Based on accumulated data (Examples of Human FeyeM in Appendix 2 and published FeyeM traces), our decoding example (Figure 5) assumed the latter. However, when different FeyeM frequencies are superimposed, and head movements are added (Steinman and Levinson, 1990), the eye drifts spatially, and consecutive cycles do not start at the same retinal location. Thus, thalamocortical circuits should be coupled such that locking information propagates through neighboring circuits (Hoppensteadt and Izhikevich, 1998). In fact, if coupling strengths can change, networks of NPLLs can learn to process complex patterns of oscillatory inputs generated by various interactions of FeyeM with external images (Hoppensteadt and Izhikevich, 2000). If efferent copies of FeyeM exist, they can significantly facilitate this process, as well as initial locking-in after saccades. In any case, the challenge of continuous processing of a drifting retinal image is not unique to the present model; it is a general challenge that every model of vision must deal with.
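The aliasing problem mentioned here can be illustrated with the standard folding rule: a component at frequency f sampled at rate fs appears at |f − fs·round(f/fs)|. The example frequencies below are hypothetical, and the function name is illustrative. The sketch shows only why a single sampling frequency can confuse two inputs that a second sampling frequency separates, which is one possible reading of the benefit of relying on multiple frequencies.

```python
def apparent_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz appears when sampled at fs_hz (folding rule)."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


# Two hypothetical input components, 10 Hz and 30 Hz, under two sampling frequencies.
for fs in (40.0, 50.0):
    print(fs, apparent_frequency(10.0, fs), apparent_frequency(30.0, fs))
```

At 40Hz both components appear at 10Hz and cannot be told apart. At 50Hz they appear at 10 and 20Hz, so combining the two sampling frequencies removes the ambiguity.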
In fact, the complication induced by slow eye drifts can be better handled by temporal decoders of the type presented here than by spatial decoders that are based on cell identity. Since temporal changes induced by eye drifts are common to the entire visual field, they can be decoded by widely tuned NPLLs, low-pass integration of NPLL outputs (Ahissar, 1998), or by other specialized networks (e.g., Pitkow et al., 2007). Once decoded, this information can be used to control the coupling between neighboring fine-tuned NPLLs (Ahissar, 1998). Note that in any case this global drift, being common to all NPLLs, should not distort the differential temporal coding of local spatial details described here.
Reading out cortical differential representations should involve lateral comparisons of outputs of simple cells, which could be ambiguous if the response polarities (i.e., ON vs. OFF) of the compared cells are not known. This problem is probably circumvented by segregation of thalamocortical circuits to ON-center and OFF-center clusters (McConnell and LeVay, 1984; Lund et al., 1985; Zahs and Stryker, 1988); within each such cluster response timing can be compared across cells with a remarkably high precision (Reinagel and Reid, 2002). A similar segregation, according to color sensitivity, is expected to circumvent ambiguities due to chromatic aberration. Such segregation was observed in the human retina (Roorda and Williams, 1999).
Cortical (and retinal) representations of external velocities are unique only if the amplitude of the FeyeM is smaller than both the sRF length and the external spatial periods. Otherwise, the transformation is not unique; different combinations of external spatial periods and external velocities could induce similar cortical spike counts. When the sRF is longer than the external spatial periods, aliasing problems are introduced, which cause additional ambiguities. The visual system could avoid such ambiguous coding by relying on high frequency low-amplitude FeyeM for foveal vision, and on FeyeM with increasing amplitudes (associated with decreasing frequencies) for increasingly eccentric vision. The finding that rod monochromat subjects, who lack foveal receptors, exhibit large-amplitude-low-frequency nystagmus (Yarbus, 1967; pp. 119–122) is in agreement with such a scheme, which suggests that the lack of foveal reception eliminates the need for high frequency FeyeM, leaving only low frequency FeyeM, which become more coherent. Further support comes from stabilization experiments that showed that when only large-amplitude movements are compensated, only peripheral vision fades away (Gerrits, 1978).
During natural fixation, drift and tremor movements are often interrupted by brief microsaccades (see Introduction), which bounce retinal RFs to new locations. Obviously, computations of 2D details by NPLLs cannot continue across microsaccades, and should not include data acquired during microsaccades. Interestingly, many cortical neurons respond either during microsaccades (“saccade cells”) or during drift (“position/drift cells”), but not during both (Snodderly et al., 2001). This indicates that the visual system contains independent channels that enable independent processing of visual information acquired during drift-tremor and during microsaccades.
Every sensory system contains multiple motor-sensory loops that together perceive components of the external world. Such loops contain sub-cortical and cortical stations and pathways which interact in complex ways (see for example the scheme of the vibrissal motor-sensory system in Kleinfeld et al., 2006). The encoding-decoding scheme proposed here belongs to an intermediate level of the visual motor-sensory system; there are lower loops that control more basic functions and higher loops that interpret the outputs of the circuits described here. At all loop levels, the processed sensory information affects next eye movements which in turn affect future sensory inputs (Figure 9; see also Uchida et al., 2006). The current paper focuses on the encoding and decoding of temporally coded information generated by FeyeM. However, it is assumed that this processing is one component of the entire system of motor-sensory loops composing the visual system, and that this entire system works in coordination when perceiving external objects. Thus, the interpretations of the outputs of the circuits described here are components in the working of higher (and perhaps also lower) motor-sensory visual loops. Another assumption is that at each level there are many parallel circuits and loops that either compete or cooperate in controlling motor functions, depending on the conditions. According to this scheme, the chaotic-like patterns of eye movements typically observed reflect superposition of many competing and cooperating processes implemented along many motor-sensory loops. How, eventually, a coherent perception emerges from such a seemingly messy process is of course not yet clear. Still, the leading assumption here is that although the process looks chaotic to an external observer, there is a hidden order here as each of the operating loops is tuned to process the outcome of its own controlled components of eye movements.
We describe the principles of the encoding-decoding scheme using examples of periodic FeyeM (e.g., Figures 3 and 5). Still, pure periodicity is not a critical requirement of the proposed scheme. First, as explained in Section “Phase Locking” above and shown in Section “Loop Dynamics” in Appendix 1, the decoding circuit can track changes in the input frequency up to a certain limit (see also Ahissar, 1998; Zacksenhouse and Ahissar, 2006). Second, the elongated structure of sRFs also allows a large tolerance for amplitude modulations: when the amplitude of the FeyeM changes, the temporal information is not affected; the only changes are in the number of sRFs that represent the stimulus and in the number of spikes conveyed by the FF signals. Since these changes are global to the entire retina, relative coding will hardly be affected. Similarly, the component of FeyeM that is perpendicular to the elongated axis of the sRF does not impair temporal coding. In fact, this perpendicular drift allows scanning of many different alignments between sRFs and a given spatial offset in the image. Third, we propose that every decoding loop processes sensory information in a relatively narrow band (about one octave around its intrinsic frequency) and ignores fluctuations outside this band (see The Visual Motor-Sensory Loop above). The simulations with natural FeyeM presented in Figures 7 and 8, and simulations with amplitude- and frequency-modulated synthetic FeyeM (not shown), demonstrate that temporal coding functions fairly well with repetitive FeyeM that deviate significantly from pure periodicity.
Electrophysiological and anatomical:
Alleged inconsistencies with relevant experimental data:
Critical for temporal encoding at the retina:
Critical for decoding by thalamocortical NPLLs:
Additional predictions are described in Ahissar (1998).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We wish to thank Merav Ahissar, Shabtai Barash, Rafi Malach, Nava Rubin, and Dov Sagi for stimulating discussions and comments on the manuscript, Flemming Møller for providing the data for Figure 3E, Naama Rubin for technical assistance, and Barbara Schick for reviewing and editing. This work was supported by the Israel Science Foundation, The Minerva Foundation funded by the Federal German Ministry for Education and Research, the US-Israel Binational Science Foundation, the Israeli Ministry of Health and the Israeli Ministry of Science and Technology. Ehud Ahissar holds the Helen Diller Family Professorial Chair of Neurobiology.
A given spatial offset in a stationary image, Δx, is translated into a neuronal temporal delay, Δt, by the velocity of the eye in the same direction, Veye, such that:
$$\Delta t = \frac{\Delta x}{V_{eye}}$$
For clarity, we ignore here the dependence of these variables on time.
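As a quick numerical illustration of this relation, the sketch below converts a spatial offset into a temporal delay. The function name and the example numbers (a 0.1 arcmin offset scanned at 100 arcmin/s) are hypothetical, chosen only to show the arithmetic, not measured values.

```python
def offset_to_delay_ms(offset_arcmin: float, eye_velocity_arcmin_per_s: float) -> float:
    """Delay (ms) between the responses of two receptors: dt = dx / V_eye."""
    if eye_velocity_arcmin_per_s == 0:
        raise ValueError("a stationary eye does not translate offsets into delays")
    return 1000.0 * offset_arcmin / eye_velocity_arcmin_per_s


# Hypothetical example: a 0.1 arcmin (6 arcsec) offset scanned at 100 arcmin/s.
print(offset_to_delay_ms(0.1, 100.0))  # → 1.0
```

Halving the eye velocity in this sketch doubles the delay, which is why the same offset is represented by different delays at different phases of the FeyeM.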
Determination of retinal frequencies depends on the relationship between the texture of the image (Sx, spatial period) and the peak-to-peak amplitude of FeyeM (Aeye). We limit our discussion to cases in which the amplitude of eye movement is smaller than a single cycle of contrast change (Sx>Aeye). In these cases, the retinal periodicity (Ir) follows the periodicity of the FeyeM (Ieye):
$$I_r = I_{eye}$$
Note that due to the dependency of Aeye on Ieye (see Ahissar and Arieli, 2001), different spatial periods will fulfill the above criterion (Sx>Aeye) at different FeyeM frequencies (see also Discussion: Complications and Challenges).
With moving images, the resulting retinal period for a given sRF will be:
$$I_r = I_{eye}\,\frac{V_{eye}}{V_{eye} - V_x}$$
where Veye represents the velocity of the eye in the protraction direction (approximated here as constant), and Vx is the external velocity in the same direction (Figure 3D). We do not deal here with |Vx|≥|Veye|.
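How image motion stretches or compresses the retinal period can be checked with a toy kinematic model. The sketch assumes an idealized sawtooth FeyeM (constant-velocity protraction followed by an instantaneous return) and an edge moving at constant Vx; both the model and the function names are illustrative assumptions, and the numbers are hypothetical. Under these assumptions, successive edge crossings are spaced by Ieye·Veye/(Veye − Vx), so Vx = 0 reduces to Ir = Ieye.

```python
def crossing_times(i_eye, v_eye, v_x, x0, n_cycles):
    """Edge-crossing times (ms) of one sRF under an idealized sawtooth FeyeM.

    Assumed model: in cycle k the sRF position is v_eye * (t - k * i_eye) and the
    edge sits at x0 + v_x * t. Solving v_eye*(t - k*i_eye) = x0 + v_x*t gives the
    crossing time; it is kept only if it falls within cycle k's sweep.
    """
    if abs(v_x) >= abs(v_eye):
        raise ValueError("|Vx| >= |Veye| is outside the case treated in the text")
    times = []
    for k in range(n_cycles):
        t = (x0 + v_eye * k * i_eye) / (v_eye - v_x)
        if k * i_eye <= t < (k + 1) * i_eye:
            times.append(t)
    return times


def retinal_period(i_eye, v_eye, v_x):
    """Interval between successive crossings implied by the same model."""
    if abs(v_x) >= abs(v_eye):
        raise ValueError("|Vx| >= |Veye| is outside the case treated in the text")
    return i_eye * v_eye / (v_eye - v_x)


# Hypothetical: 100 ms cycle, eye at 0.05 arcmin/ms (5 arcmin sweep), edge 1 arcmin ahead,
# edge drifting at 0.01 arcmin/ms in the protraction direction.
times = crossing_times(100.0, 0.05, 0.01, 1.0, 3)
print([b - a for a, b in zip(times, times[1:])], retinal_period(100.0, 0.05, 0.01))
```

Both printed values are about 125 ms in this example: motion in the protraction direction lengthens the retinal period, and motion in the retraction direction shortens it.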
We use a linear version of the equation presented by Perkel et al. (1964) to describe the control of neuronal oscillations via inhibitory cells:
$$I_o(n) = T_c + \gamma\,R_{out}(n-1)$$
where Io(n) is the oscillation period at cycle n, Rout(n−1) is the number of spikes driving the inhibitory neurons during cycle n−1, and Tc is the intrinsic oscillation period (i.e., the oscillation period when Rout=0). The constant γ, whose units are ms/spikes, determines the gain of this control.
In the gating mode, thalamic neurons fire only when the retinal and cortical inputs overlap. Thalamic output is maximal when the two inputs fully overlap (τD=0, where τD is the delay between the two inputs) and decreases gradually as the delay increases, until no overlap occurs. If the gradual decrease is linear, the thalamic PD can be approximated as:
$$R_{out}(n) = R - \alpha\,\tau_D(n)$$
where Rout(n) is the number of spikes generated by the PD during cycle n, R is the maximal number of spikes that can be generated by the PD [when τD(n)=0], and α is the gain of this function. R, and possibly also α, depend on the contrast; this dependency is ignored in the present discussion. The output signal is expressed in units of spike count, and thus cannot be assigned non-integer values; for clarity, this limitation is omitted from the equation.
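The integer constraint mentioned above, and the fact that output stops once the inputs no longer overlap, can be made concrete in a small sketch of a linear gating PD of the form R − α·τD. Rounding down to whole spikes and clipping at zero are our assumptions (the text leaves the rounding rule unspecified), as are the function name and the parameter values.

```python
import math


def phase_detector(tau_d_ms, r_max, alpha):
    """Linear gating phase detector (sketch).

    tau_d_ms: retino-cortical delay tau_D in ms (non-negative in this sketch).
    r_max:    R, spikes at full overlap (tau_D = 0).
    alpha:    gain, in spikes per ms of delay.
    Returns a whole number of spikes: R - alpha*tau_D, rounded down and clipped at 0.
    """
    if tau_d_ms < 0:
        raise ValueError("this sketch only handles non-negative delays")
    return max(0, math.floor(r_max - alpha * tau_d_ms))


# Hypothetical PD: R = 20 spikes, alpha = 0.5 spikes/ms.
print([phase_detector(t, 20, 0.5) for t in (0, 10, 15.5, 50)])  # → [20, 15, 12, 0]
```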
The retino-cortical delay at cycle n, τD(n), depends on the delay at cycle n−1 and on the difference between the retinal [Ir(n)] and cortical [Io(n)] periods at cycle n:
$$\tau_D(n) = \tau_D(n-1) + I_o(n) - I_r(n)$$
This equation shows that τD(n)=τD(n−1), i.e., that the loop is in a steady state, only when the oscillation period equals the retinal period:
$$I_o(n) = I_r(n)$$
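To see the loop settle into this steady state, the cycle-by-cycle dynamics can be iterated numerically. The sketch assumes the linear forms Io(n) = Tc + γ·Rout(n−1), Rout(n) = R − α·τD(n) and τD(n) = τD(n−1) + Io(n) − Ir(n), which are our reading of the definitions above; it ignores the integer and overlap limits of the PD, and the function name and all parameter values are hypothetical.

```python
def run_npll(retinal_periods, t_c, gamma, r_max, alpha, tau0=0.0):
    """Iterate a linearized thalamocortical NPLL cycle by cycle (sketch).

    Assumed update rules (continuous spike counts, no clipping):
      oscillator: I_o(n)   = T_c + gamma * R_out(n-1)
      delay:      tau_D(n) = tau_D(n-1) + I_o(n) - I_r(n)
      PD:         R_out(n) = R - alpha * tau_D(n)
    Returns a list of (I_o, tau_D, R_out) tuples, one per cycle.
    """
    tau = tau0
    r_out = r_max - alpha * tau
    history = []
    for i_r in retinal_periods:
        i_o = t_c + gamma * r_out      # period set by last cycle's inhibitory drive
        tau = tau + i_o - i_r          # delay accumulates the period mismatch
        r_out = r_max - alpha * tau    # gating output falls linearly with delay
        history.append((i_o, tau, r_out))
    return history


# Hypothetical loop: Tc = 100 ms, gamma = 1 ms/spike, R = 20 spikes, alpha = 0.5 spikes/ms,
# driven by a constant 105 ms retinal period.
hist = run_npll([105.0] * 40, t_c=100.0, gamma=1.0, r_max=20.0, alpha=0.5)
print(hist[-1])
```

With these numbers the oscillation period converges to the 105 ms retinal period, with τD near 30 ms and Rout near 5 spikes per cycle: the retinal period is now carried by a delay and a spike count.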
If Vx=0, Rout represents the frequency of eye movements or, more accurately, the deviation of the eye frequency (1/Ieye) from the NPLL’s intrinsic frequency (1/Tc). If |Vx|<|Veye| and the image component is moving in the protraction direction, Vx is positive and Rout will increase (compared to a stationary image). If it is moving in the retraction direction, as in Figure 5A, Rout will decrease, as in Figure 5E. The dependency of Rout on Vx, for various Veye values, is described in Dependency of Rout on Image Velocity in Appendix 2.
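The direction of these effects can be reproduced by combining two assumptions: the steady-state condition Io = Ir with an oscillator of the form Io = Tc + γ·Rout, which gives Rout = (Ir − Tc)/γ, and the idealized sawtooth model of the retinal period, Ir = Ieye·Veye/(Veye − Vx). The function name and the parameter values below are hypothetical; the sketch only illustrates the qualitative ordering stated above.

```python
def steady_state_rout(i_eye, v_eye, v_x, t_c, gamma):
    """Steady-state spikes per cycle, R_out = (I_r - T_c) / gamma (sketch).

    Assumes I_r = I_eye * V_eye / (V_eye - V_x) (idealized sawtooth FeyeM) and
    I_o = T_c + gamma * R_out with I_o = I_r at steady state.
    """
    if abs(v_x) >= abs(v_eye):
        raise ValueError("|Vx| >= |Veye| is outside the case treated in the text")
    i_r = i_eye * v_eye / (v_eye - v_x)
    r_out = (i_r - t_c) / gamma
    if r_out < 0:
        raise ValueError("I_r < T_c: outside the range this sketch can represent")
    return r_out


# Hypothetical loop: Tc = 80 ms, gamma = 1 ms/spike; 100 ms FeyeM at 0.05 arcmin/ms.
for v_x in (-0.01, 0.0, 0.01):   # retraction, stationary, protraction
    print(v_x, round(steady_state_rout(100.0, 0.05, v_x, 80.0, 1.0), 2))
```

Here Rout is about 3.3, 20 and 45 spikes per cycle for retraction, a stationary image and protraction, respectively.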
The loop can converge only on a limited set of input frequencies, termed “the working range.” The working range depends on G: it is empty for G≥0 and widens as G becomes more negative. Thus, for all loops in which the working range is not empty, G<0, and τD decreases with retinal periodicity (Ir; see Eq. 8) or, equivalently, increases with retinal frequency (1/Ir; Figure 6B).
The only case in which τD(n) does not depend on τD(n−1), i.e., on the history of the loop, is G=−1. In this case, the loop can track changes in the input periodicity with a delay of only a single cycle (see the case depicted in Figure 5D). As G diverges from −1, tracking becomes less and less accurate, since convergence to a fixed point takes more and more cycles (Ahissar, 1998).
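The role of G = −1 can be illustrated with a one-line recursion. If the oscillator, PD and delay equations take the linear forms Io(n) = Tc + γ·Rout(n−1), Rout(n) = R − α·τD(n) and τD(n) = τD(n−1) + Io(n) − Ir(n), substituting gives τD(n) = (1 − γα)·τD(n−1) + Tc + γR − Ir(n), so under these assumptions G plays the role of −γα. This identification, the function name and the numbers below are our assumptions, not quoted from the text.

```python
def tau_trajectory(G, drive, tau0, n_cycles):
    """Iterate tau_D(n) = (1 + G) * tau_D(n-1) + drive.

    drive stands for T_c + gamma*R - I_r (held constant here); for G != 0 the
    fixed point is tau* = -drive / G, and the distance to it shrinks by a
    factor (1 + G) per cycle.
    """
    taus = [tau0]
    for _ in range(n_cycles):
        taus.append((1.0 + G) * taus[-1] + drive)
    return taus


# Same hypothetical fixed point (30 ms) reached with two different gains.
print(tau_trajectory(-1.0, 30.0, 0.0, 3))   # reaches 30 ms after one cycle
print(tau_trajectory(-0.2, 6.0, 0.0, 3))    # approaches 30 ms only gradually
```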
Retinal periodicities [Ir(n)] are translated into latencies [τD(n); Eqs 8 and 11] and spike counts [Rout(n); Eq. 10] of cortical simple cells (Figure 6B). The steady-state cortical representations are described by Eqs 8 and 9. These cortical representations should be valid at the input layers of V1 and might, of course, be further transformed at successive processing levels.
These cortical (and retinal) representations depend on the ratios between the amplitude of the FeyeM (Aeye), the sRF length (LRF), and the external spatial periods (Sx=1/spatial frequency). In this paper we focused on cases in which Aeye<Sx and LRF<Sx. In these cases, external velocities (Vx) are represented by cortical spike counts. In cases in which Aeye>Sx, the transformation is not unique; different combinations of Sx and Vx could induce similar spike counts. In cases in which LRF>Sx, aliasing problems are introduced, which cause additional ambiguities. There is probably enough information available to the visual system to resolve all these ambiguities by utilizing intra-burst frequencies, lateral comparisons, and orientations of sRFs. Alternatively, the visual system could avoid this ambiguous coding by relying only on those thalamocortical circuits for which, in a given epoch, Aeye≈LRF<Sx (see Discussion: Complications and challenges).
The visual motor-sensory loop (Figure 9) controls the input periodicity by monitoring the output of NPLLs and modifying eye velocity accordingly. The loop equations are:
where Ve(t) is the velocity of the eye in one dimension (e.g., horizontal), gpll and gv are the transfer functions of the NPLL and of the velocity controller, respectively (Figure 9), Ir(t) is the inter-spike interval of the retinal output, Tc is the NPLL’s intrinsic period, Rdc(t) is the mean integrated output of NPLLs, Rmax is the maximal possible Rdc(t), and Vother is the additive velocity component caused by the “Other velocity control” input.
Our ability to detect relative offsets is an order of magnitude better than the separation power of the eye. For example, we can detect an offset of a few arcseconds between two co-linear lines (vernier hyperacuity), whereas we can detect a separation between two parallel lines only if they are about an arcminute apart (separation acuity; Westheimer, 1976). Hyperacuity is achieved only in relative coordinates, i.e., only between co-existing elements (McKee and Levi, 1987; McKee et al., 1990). It has been shown that separation acuity is limited by the amount of smearing caused by the optics of the eye and by cone spacing in the fovea, two variables that match each other (Westheimer, 1987). It is not known what limits hyperacuity and how it is achieved. It is widely assumed that hyperacuity benefits from the very factor that limits separation acuity: optical smearing. Since the optics of the eye spread each tiny spot of light over several cones, the location of that spot can be obtained by computing the center of cone activation. The resolution of such localization depends not on cone granularity but on the granularity of cone responses, i.e., on how finely each cone's response resolves the intensity of the light it absorbs.
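The centroid idea can be illustrated with a toy model in which a point of light, blurred by an assumed Gaussian point-spread function one cone spacing wide, falls on a row of nine cones. The names and parameters are hypothetical, and the sketch is noise-free: it shows only that the centroid is not tied to the cone grid, not how finely real cone responses can resolve intensity.

```python
import math


def cone_responses(spot_pos, n_cones=9, spacing=1.0, blur_sigma=1.0):
    """Noise-free responses of a row of cones to a blurred point of light.

    Assumed model: Gaussian point-spread function of width blur_sigma (in cone
    spacings); each cone's response is the PSF value at its center.
    """
    centers = [(i - n_cones // 2) * spacing for i in range(n_cones)]
    responses = [math.exp(-((c - spot_pos) ** 2) / (2 * blur_sigma ** 2)) for c in centers]
    return centers, responses


def centroid(centers, responses):
    """Response-weighted mean position: the 'center of cone activation'."""
    total = sum(responses)
    return sum(c * r for c, r in zip(centers, responses)) / total


# Spots displaced by fractions of a cone spacing are localized to those fractions.
for pos in (0.0, 0.1, 0.25):
    print(pos, round(centroid(*cone_responses(pos)), 4))
```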
Thus, with the aid of optical smearing, and perhaps overlapping RFs, the challenge of obtaining hyperacuity resolution can be transferred to the intensity domain. However, with all-or-none spikes, response intensity can only be measured over time. How much time is required to collect enough spikes to detect a vernier offset? A retinal network simulated under realistic conditions can generate a difference of one to two spikes for a near-threshold vernier offset if it is given 60 ms (Wachtler et al., 1996). To check whether this time period depends on the actual detection mechanism, we performed a simple calculation. We examined the time required for generating a one-spike difference in the ensemble firing probability, where the only assumption is that the response of each cell in the population decreases monotonically with the visual angle of the stimulus. Interestingly, this time window hardly depends on the size of the population; it depends only on the maximal firing rate evoked in that population and on the vernier offset (see Spatial Encoding of Hyperacuity by Firing Rates in Appendix 3). With realistic response values (Wachtler et al., 1996) and an offset of 1/4 of a ganglion RF diameter, this time window is estimated as ≥80 ms. This time window could decrease significantly if the retinal signal were duplicated over several such ganglion populations. However, vernier acuity hardly improves when the vernier components are elongated. In fact, excellent vernier acuity is obtained with just two dots. This observation by itself already contradicts simple accumulation of retinal firing rates, or of other purely spatial signals, as a mechanism for hyperacuity.
The main consideration that invalidates a purely spatial encoding of fine details is the following. As the above observations show, it would take 60 ms or more to obtain a minimally detectable difference in the spike counts of ganglion cells, whose RF diameters are smaller than 1 arcmin (foveal RFs get as small as 0.3 arcmin). However, during these 60 ms the eye travels over a significant number of such RFs (estimates vary between 2 and 10 arcmin, equivalent to 6–30 RFs, depending on the study; Ratliff and Riggs, 1950; Barlow, 1952; Moller et al., 2002; Cherici et al., 2012; see Figures 1A and A1). Thus, individual ganglion cells never get enough time to express light intensity in spike counts with sufficient resolution. Moreover, integration over different RFs would be possible only if the cortical trajectory of the relevant signal were known to the readout circuit ahead of time, a mechanism that does not seem feasible.
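The arithmetic behind this argument can be made explicit. Assuming, for illustration only, that the eye moves at constant speed along a straight path, the sketch converts the travel reported for a 60 ms window into the number of 0.3 arcmin RFs crossed and the time the image dwells on each one. The function name is hypothetical.

```python
def rf_crossings(window_ms, travel_arcmin, rf_diameter_arcmin=0.3):
    """Return (RFs crossed, dwell time per RF in ms), assuming constant-speed straight travel."""
    n_rfs = travel_arcmin / rf_diameter_arcmin
    return n_rfs, window_ms / n_rfs


for travel in (2.0, 10.0):  # range of estimates cited in the text
    n, dwell = rf_crossings(60.0, travel)
    print(f"{travel} arcmin: ~{n:.0f} RFs, ~{dwell:.1f} ms per RF")
```

This gives roughly 7 to 33 RFs, in line with the 6–30 range quoted above, and a dwell time of only about 2 to 9 ms per RF, far short of the 60 ms needed for a detectable spike-count difference.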
High-resolution depth information depends on fine coordination between the eyes. During fixation, eye movements are well coordinated only during microsaccades, during which the eyes move simultaneously in the same direction and with roughly the same amplitude (Krauskopf et al., 1960; St Cyr and Fender, 1969). Luminance changes scanned by such brief microsaccades evoke responses in the retina, as is evident from the responses of cortical “saccade cells” (Snodderly et al., 2001). The binocular disparity of these responses is translated into a depth value in the brain (Poggio, 1995; Qian, 1997). This depth value carries an error in absolute coordinates due to misalignment of the eyes, which continuously drift along different paths (McKee et al., 1990). However, since this error is common to the entire image, relative depth relationships can still be resolved, and with the maximal resolution allowed by stereoptic mechanisms (e.g., Archie and Mel, 2000).
How relative 2D and relative depth computations interact in the visual system is not yet known. However, the complementary nature of the responses of cortical “position/drift cells” and “saccade cells” (Snodderly et al., 2001) suggests that 2D and depth computations might function in parallel, employing a time-sharing mechanism: 2D relationships are computed periodically as described above, whereas depth relationships are updated during brief interleaved microsaccades, to which the 2D mechanisms are “blind.” An illustrative analogy is the “cycle stealing” mechanism serving direct memory access in computers: memory is updated directly by a peripheral device during brief episodes to which the central processing unit is blind (see, for example, Horowitz and Hill, 1980).
FeyeM were recorded from the right eye of human subjects at 240 Hz with a resolution of 0.01° by the iView-X system (SensoMotoric Instruments GmbH). Subjects looked at a computer screen placed 1 m in front of them and either tried to maintain fixation on a single spot marked with a cross or viewed an image freely. Figure A1 depicts 10 FeyeM traces recorded from one human subject while fixating on a single spot on an otherwise blank screen. Figure A2 depicts (i) 50 epochs of 300 ms each from traces of subjects wearing masks with artificial eyes (Figure A2A; these traces reflect the noise induced by the recording system and by head movements); (ii) 50 epochs of 300 ms each from traces of subjects fixating on a single spot surrounded by horizontal or vertical gratings (Figure A2B); and (iii) 50 fixational pauses of subjects freely viewing a natural image (Figure A2C; note the different time scale) and their power spectra (Figure A2D). A fixational pause was defined as the period starting 30 ms after a saccade and ending 10 ms before the next saccade. The distribution of fixational pauses had a mode around 300 ms; only pauses longer than 250 ms are depicted. In all three cases, the 50 epochs were selected by the same criterion (the largest modulation depths in each data set) and were de-trended.
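The fixational-pause definition translates directly into a short filter. In this sketch, saccade onset and offset times are assumed to be already detected, the 30 ms margin is assumed to be measured from saccade offset, and the function name and example times are hypothetical.

```python
def fixational_pauses(saccades, start_margin_ms=30.0, end_margin_ms=10.0, min_duration_ms=250.0):
    """Fixational pauses between consecutive saccades (sketch).

    saccades: (onset_ms, offset_ms) pairs sorted by time. A pause runs from
    start_margin_ms after one saccade ends (assumed reference point) to
    end_margin_ms before the next one starts; only pauses longer than
    min_duration_ms are kept.
    """
    pauses = []
    for (_, prev_offset), (next_onset, _) in zip(saccades, saccades[1:]):
        start = prev_offset + start_margin_ms
        end = next_onset - end_margin_ms
        if end - start > min_duration_ms:
            pauses.append((start, end))
    return pauses


# Hypothetical saccade times (ms): only the first inter-saccade interval qualifies.
print(fixational_pauses([(0, 40), (400, 430), (600, 620)]))  # → [(70.0, 390.0)]
```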
Dependency of Rout on image velocity is depicted in Figure A3.
Excellent vernier alignment is achieved with just two or three dots (e.g., Westheimer, 1987; Carney et al., 1995). The location of each dot can be estimated from the firing rates of all ganglion cells activated by that dot. We are not interested in the exact estimation mechanism here. We ask: for how long should that neuronal population be active before a detectable difference in spike count emerges for a hyperacuity offset? For simplicity, we approximate the point spread function by a linear decay. We assume that the center of the activated population fires at R spikes/ms and that this rate decreases linearly to 0 across N neurons, where N is the total number of ganglion neurons that can potentially participate in this computation.
We express the vernier offset, d, in units of cell numbers (d<1); thus, N and d share the same units. We look at the resolution of a population of N neurons: what is the minimal offset of a stimulus that will change the population spike count by one spike during time T? The difference (D) in population spike count induced by an offset d during 1 ms of response is the difference in the areas of two triangles, of lengths N and N−d and heights R and R(N−d)/N. These triangles represent the firing probabilities of the population of neurons whose activity would change in the same direction (e.g., increase) had their stimulating dot moved by d:
$$D = \frac{NR}{2} - \frac{N-d}{2}\cdot\frac{R(N-d)}{N} = Rd - \frac{Rd^{2}}{2N}$$
Thus, the difference depends mainly on the offset and on the maximal firing rate in response to such stimuli, and only weakly on N. This would be true in general for any gradually decreasing function, since as the spread becomes wider, more cells are involved, but at the same time the slope becomes smaller and the difference in activity induced by the two positions decreases. Indeed, numerical solutions for Gaussian-like decaying functions, such as the point- or line-spread functions specified for pupil diameters around 3 mm (Wachtler et al., 1996), show a similar independence of N for N>~4, and similar or smaller D values.
The time window required for achieving a one-spike difference in such retinal populations is thus:
$$T = \frac{1}{D} \geq \frac{1}{Rd}$$
The maximal retinal response rate observed with line vernier stimuli was R = 1/20 spikes/ms (Wachtler et al., 1996). Vernier offsets at threshold are smaller than 1/4 of the smallest cone diameter. For d = 1/4, we get T ≥ 80 ms. Thus, it would take at least 80 ms to obtain a difference of one spike across the entire population of the relevant retinal ganglion cells.
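The triangle calculation and the resulting time window can be checked numerically. The sketch computes D directly from the two triangle areas described above and T = 1/D, using R = 1/20 spikes/ms and d = 1/4 from the text; the function names and the particular N values are illustrative.

```python
def spike_difference_per_ms(r_max, d, n):
    """D: difference between the two triangle areas described in the text.

    Larger triangle: base N, height R. Smaller: base N - d, height R*(N - d)/N.
    Algebraically D = R*d - R*d**2 / (2*N).
    """
    big = 0.5 * n * r_max
    small = 0.5 * (n - d) * (r_max * (n - d) / n)
    return big - small


def time_for_one_spike_ms(r_max, d, n):
    """T = 1/D: time needed for the population count to differ by one spike."""
    return 1.0 / spike_difference_per_ms(r_max, d, n)


R = 1 / 20   # spikes/ms, maximal rate quoted from Wachtler et al. (1996)
d = 1 / 4    # offset, in units of ganglion cells
for n in (4, 10, 100, 1000):
    print(n, round(time_for_one_spike_ms(R, d, n), 2))
```

T stays between about 80 and 83 ms across a 250-fold range of N, approaching the 80 ms bound from above as N grows.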
FeyeM, fixational eye movements; FF, feedforward; NPLL, neuronal phase-locked loop; PD, phase detector; RCO, rate-controlled oscillator; SC, simple cell; sRF, subfield of a simple receptive field.