Although flying insects have limited visual acuity (approx. 1°) and relatively small brains, many species pursue tiny targets against cluttered backgrounds with high success. Our previous computational model, inspired by electrophysiological recordings from insect ‘small target motion detector’ (STMD) neurons, did not account for several key properties described from the biological system. These include the recent observations of response ‘facilitation’ (a slow build-up of response to targets that move on long, continuous trajectories) and ‘selective attention’, a competitive mechanism that selects one target from alternatives. Here, we present an elaborated STMD-inspired model, implemented in a closed loop target-tracking system that uses an active saccadic gaze fixation strategy inspired by insect pursuit. We test this system against heavily cluttered natural scenes. Inclusion of facilitation not only substantially improves success for even short-duration pursuits, but it also enhances the ability to ‘attend’ to one target in the presence of distracters. Our model predicts optimal facilitation parameters that are static in space and dynamic in time, changing with respect to the amount of background clutter and the intended purpose of the pursuit. Our results provide insights into insect neurophysiology and show the potential of this algorithm for implementation in artificial visual systems and robotic applications.
Many animals have evolved the ability to visually detect moving targets, often selecting a single target from amidst many distracters. Furthermore, these animals may interact with the target by initiating motor commands (e.g. eye or body movements), which then modulate visual inputs underlying the detection and selection task (via closed-loop feedback) . While such ‘active vision’ is almost ubiquitous in guiding complex animal behaviour, it remains uncommon in artificial vision systems. However, active vision may be key to exploiting the highly nonlinear pre-processing of visual information by the simple insect brain for complex tasks. For example, dragonflies detect, select and then chase prey or conspecifics within a visually cluttered surround for predation, territorial or mating behaviour . Remarkably, despite limited visual resolution (approx. 1°), they perform this task even in the presence of other distracting stimuli, such as swarms of prey and conspecifics [2,3]. Recent studies show that dragonflies rely on both predictive and reactive control for accurate target tracking , similar to those involved in hand-reaching by primates .
The accessibility of the dragonfly for stable electrophysiological recordings makes this insect an ideal and tractable model system for investigating the neuronal correlates for complex tasks such as target pursuit. Our laboratory has identified and characterized a set of neurons likely to mediate target detection and pursuit. These ‘small target motion detector’ (STMD) neurons of the insect lobula (third optic neuropil) are selective for tiny targets, on the same scale as the optical resolution of the eye. STMDs are velocity-tuned, contrast-sensitive and respond robustly to targets even against the motion of high-contrast background features [6–9]. Individual STMDs may have very small receptive fields, corresponding to just a few dozen facets of the compound eye, or view an entire visual hemisphere, suggesting a complex hierarchy in their contributions to underlying control systems for target pursuit [6–8,10].
Our recent electrophysiological data lend strong support for an underlying algorithm for local target discrimination based on an ‘elementary-STMD’ (ESTMD) operation at each point in the image to provide a matched spatio-temporal filter for small moving targets embedded within natural scenery . The ESTMD model reliably predicts several properties of STMD neurons, including their spatio-temporal tuning, their rejection of background motion  and even the selectivity for dark targets seen in some STMDs . However, this remains a model only for the elementary operation of local target discrimination: it makes no attempt to account for how information integrated across a large visual field is used to control visual gaze or target pursuit. Moreover, the ESTMD model does not explain several recently described features of STMD physiology and insect behaviour required by such a control system:
To test these hypotheses and address the limitations of the ESTMD, we present here an elaborated model for a control and pursuit system inspired directly by these latest physiological findings. This is based on inputs from ESTMDs , but incorporates additional processing to permit (i) prediction of target direction (ii) robust responses independent of the luminance contrast and (iii) a competitive selection mechanism that exploits response facilitation. We incorporate this front-end processing into a closed-loop simulation of dragonfly prey pursuit that incorporates an active saccadic gaze fixation strategy inspired by studies of insect pursuit behaviour [18–20]. We then use this system to explore the effect of varying the spatial and temporal scale of response facilitation on pursuit success and target discriminability under different environmental conditions.
Despite a relatively simple image processing strategy compared with traditional machine vision approaches, this system proves to be remarkably robust, achieving high prey capture success even with complex background clutter, low contrast and high relative speed of pursued prey. Hence, our results should be of interest to robotics engineers looking for computationally simple yet robust systems for figure/ground segregation. Moreover, while our results are consistent with emergent hypotheses for the role of hierarchical elements of the insect STMD system, we also identify several key principles for optimal performance of such a system that are directly testable in future physiological experiments.
Electronic supplementary material, figure S1 shows an overview of the target pursuit model implemented in MATLAB/Simulink. The model is composed of five subsystems: (i) a virtual reality environment to model insect position (predator and prey) and environmental parameters, (ii) an early visual processing stage, (iii) a target matched filtering (‘ESTMD’) stage, (iv) a position selection and facilitation mechanism, and (v) a saccadic pursuit algorithm based on insect behaviour.
A virtual reality (VR) environment (Simulink 3D animation toolbox, MathWorks Inc.) was composed of a cylindrical arena (radius 6 m), rendered with textures derived from natural panoramic image data from four natural scenes (see  for details on acquisition of the original HDR image data). Within this arena, we generated randomized ‘prey’ paths (100 times for each condition tested) with biologically plausible saccadic turns  initiated when the target approached within 50 cm of the cylindrical wall. At each saccade, the new heading was constrained in a range where targets turned between 30° and 150° in the horizontal plane (i.e. as viewed from above) and to a heading in the vertical dimension that ensured that it would not be lost from the limits of the camera viewport. Response transients from initial filter conditions were disregarded by introducing the 40 mm-sized target after a delay of 40 ms. Initial target location was 4 m away from the pursuer (angular size approx. 0.6°). Video was sampled at 1000 Hz from a 40° × 98°-sized viewport to represent the visual field of the pursuer, which moved at a velocity of 8 m s−1.
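The quoted initial angular size follows directly from the target size and viewing distance. The short sketch below reproduces that arithmetic; the function name is hypothetical and the calculation assumes the target is viewed face-on. It gives roughly 0.57°, which the text rounds to 0.6°.

```python
import math


def angular_size_deg(width_m: float, distance_m: float) -> float:
    """Angle (degrees) subtended by an object of width_m viewed face-on at distance_m."""
    return math.degrees(2.0 * math.atan(width_m / (2.0 * distance_m)))


# Values quoted in the text: a 40 mm target first seen from 4 m.
initial_size_deg = angular_size_deg(0.040, 4.0)

# Pursuer displacement per video frame at 8 m/s sampled at 1000 Hz.
step_per_frame_m = 8.0 / 1000.0

print(f"initial angular size: {initial_size_deg:.2f} deg")
print(f"pursuer step per frame: {step_per_frame_m * 1000:.0f} mm")
```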
We tested target velocities between 50% and 200% of the pursuer velocity moving against one of four natural images. The images (Image A–D, electronic supplementary material, figure S1b) varied in clutter, but all had 1/f power spectra, a statistical property of natural scenes. The average clutter value (see electronic supplementary material, text S3 for details of the quantifying method) and intensity of each image are presented in electronic supplementary material, table S1. We simulated a range of target intensities set specifically for each image (electronic supplementary material, table S1).
To investigate competitive selection, we simulated a black target against Image B (electronic supplementary material, figure S1b) with a second distracter target introduced 100 ms later. At the time of appearance of the second target, both targets had the same size, luminance and distance from the pursuer. Both targets travelled in mirrored paths (electronic supplementary material, figure S1c) at a velocity of 6 m s−1.
The optics of flying insects are limited by diffraction and other forms of optical interference within the facet lenses . This optical blur was modelled with a Gaussian low-pass filter (electronic supplementary material, text S1). The green spectral sensitivity of the insect motion pathway was simulated by processing only the green channel from the original RGB images .
In biological vision, redundant information is removed with neuronal adaptation (temporal high-pass filtering) and centre–surround antagonism (spatial high-pass filtering) observed in physiological recordings of photoreceptors and the first-order interneurons, the large monopolar cells (LMCs). Subsequent stages of our model simulate these properties (electronic supplementary material, text S1).
Rectifying transient cells (RTCs) within the insect second optic neuropil (medulla) exhibit partial rectification properties well suited as additional input processing stages for an STMD pathway . Similar processing properties were implemented in our model by modelling RTC-like independent adaptation to light increments or decrements [26,27] with strong spatial centre–surround antagonism (electronic supplementary material, text S2).
At a single location, small targets are characterized by an initial rise (or fall) in brightness, and after a short delay are followed by a corresponding fall (or rise), irrespective of the direction of travel. This property of small features is exploited in the original ESTMD model  to provide selectivity for dark objects, by multiplying each ON channel with a delayed version of the OFF channel. Sensitivity to targets independent of their polarity was provided by multiplying each contrast channel (ON or OFF) with a delayed version of the opposite polarity (via a low-pass filter, τ = 25 ms) and then summing the outputs.
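The polarity-independent correlation described above can be sketched for a single pixel as follows. This is a minimal illustration, not the MATLAB/Simulink implementation: it assumes a discrete first-order low-pass filter as the delay, a 1 ms time step, simple half-wave rectification in place of the full RTC stage, and hypothetical function names.

```python
def low_pass(signal, tau_ms, dt_ms=1.0):
    """Discrete first-order low-pass filter, used here as the delay element."""
    alpha = dt_ms / (tau_ms + dt_ms)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def estmd_response(signal, tau_ms=25.0, dt_ms=1.0):
    """Multiply each contrast channel by the delayed opposite channel, then sum."""
    on = [max(x, 0.0) for x in signal]    # brightness increments
    off = [max(-x, 0.0) for x in signal]  # brightness decrements
    on_delayed = low_pass(on, tau_ms, dt_ms)
    off_delayed = low_pass(off, tau_ms, dt_ms)
    return [on[i] * off_delayed[i] + off[i] * on_delayed[i]
            for i in range(len(signal))]


# A dark target: a fall in brightness followed shortly by a rise.
dark_target = [0.0] * 10 + [-1.0] * 5 + [0.0] * 5 + [1.0] * 5 + [0.0] * 10
# A dark edge: a fall in brightness with no matching rise.
dark_edge = [0.0] * 10 + [-1.0] * 5 + [0.0] * 20

target_peak = max(estmd_response(dark_target))
edge_peak = max(estmd_response(dark_edge))
```

In this sketch the target produces a positive peak while the isolated edge produces none, and inverting the target's polarity leaves the response unchanged because the two product terms simply swap.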
Despite its elaboration to permit detection of both contrast polarities, our ESTMD model only provides local target discrimination. To generate an input to the subsequent control system for target pursuit, we therefore added additional stages to integrate target motion across the full visual field of the camera. Our target selection and integration stages were inspired by the observed hierarchy of insect STMD neurons and the recent evidence for facilitation within their receptive fields [15–17].
Wide-field integration in our model begins with neuron-like soft saturation of ESTMD outputs, modelled as a hyperbolic tangent function, ensuring all signals lie between 0 and 1. A simple competitive selection mechanism is then added to the target detection algorithm by choosing the maximum of the output values across the full visual field of the input camera (equivalent to the receptive field of a wide-field STMD neuron). In our model, the location of this maximum is assumed to be the target location. In the insect STMD system, such local position information could be provided by the retinotopic array of small-field STMDs (SF-STMDs) which integrate local outputs of a small number of underlying ESTMDs, as observed in the hoverfly STMD pathway .
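The saturation and maximum-selection steps can be illustrated with a few lines of code. The sketch assumes the ESTMD outputs are non-negative (so that the hyperbolic tangent maps them into the range 0 to 1) and uses a hypothetical nested-list image format and function name.

```python
import math


def select_winner(estmd_map):
    """Return ((row, col), value) of the largest soft-saturated ESTMD output.

    ASSUMPTION: estmd_map is a non-empty list of rows of non-negative floats.
    """
    best_pos, best_val = None, -1.0
    for r, row in enumerate(estmd_map):
        for c, value in enumerate(row):
            saturated = math.tanh(value)  # neuron-like soft saturation
            if saturated > best_val:
                best_pos, best_val = (r, c), saturated
    return best_pos, best_val


example_map = [[0.00, 0.20, 0.10],
               [0.05, 3.00, 0.40]]
winner_pos, winner_val = select_winner(example_map)
```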
We implemented facilitation as observed in dragonfly CSTMD1 neurons (figure 1a,b) [15–17] with a Gaussian-weighted ‘map’ dependent on the location of the winning feature, but shifted by the target velocity vector. The ESTMD output was multiplied with a low-pass filtered version of this facilitation map with a time constant that controls duration of the enhancement around the predicted location of the winning feature. This was varied in the range 40–2000 ms (13 values) thus spanning the typical facilitation time course (approx. 200 ms) observed in dragonfly STMDs . We varied the size of this map via two-dimensional Gaussian kernels from 3° to 15° (half-width at half maximum, six values), thus spanning the observed size (approx. 7°) of the receptive fields of the SF-STMD neurons .
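A rough sketch of such a facilitation map is given below. Several details are assumptions rather than statements from the text: the gain applied to the ESTMD output is taken to be 1 plus the filtered map (so that unfacilitated regions pass unchanged), the low-pass filter is a discrete first-order filter with a 1 ms step, one pixel is treated as one degree, and all function names are hypothetical. The conversion from half-width at half maximum to standard deviation, σ = HWHM / sqrt(2 ln 2), is standard for a Gaussian.

```python
import math


def hwhm_to_sigma(hwhm):
    """Convert a Gaussian half-width at half maximum to a standard deviation."""
    return hwhm / math.sqrt(2.0 * math.log(2.0))


def gaussian_map(shape, centre, hwhm_px):
    """2-D Gaussian with peak 1 at centre (row, col); sizes in pixels."""
    sigma = hwhm_to_sigma(hwhm_px)
    rows, cols = shape
    return [[math.exp(-((r - centre[0]) ** 2 + (c - centre[1]) ** 2)
                      / (2.0 * sigma ** 2))
             for c in range(cols)] for r in range(rows)]


def update_facilitation(state, winner, shift, hwhm_px, tau_ms, dt_ms=1.0):
    """One low-pass step of the facilitation state towards the shifted Gaussian."""
    centre = (winner[0] + shift[0], winner[1] + shift[1])
    target = gaussian_map((len(state), len(state[0])), centre, hwhm_px)
    alpha = dt_ms / (tau_ms + dt_ms)
    return [[s + alpha * (t - s) for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(state, target)]


def apply_facilitation(estmd_map, state):
    """ASSUMPTION: gain = 1 + state, so unfacilitated regions pass unchanged."""
    return [[v * (1.0 + g) for v, g in zip(v_row, g_row)]
            for v_row, g_row in zip(estmd_map, state)]


# Winner at (10, 10) with a predicted shift of 2 px in azimuth, tracked for
# 200 frames (200 ms at 1 kHz) with a 200 ms facilitation time constant.
state = [[0.0] * 21 for _ in range(21)]
for _ in range(200):
    state = update_facilitation(state, (10, 10), (0, 2), hwhm_px=7.0, tau_ms=200.0)
```

After one time constant the facilitation at the predicted location has reached about 63% of its steady-state value, which illustrates why long time constants build up slowly over a pursuit.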
Predicting the future target location required estimation of a velocity vector (electronic supplementary material, figure S2), calculated using a traditional bioinspired direction-selective model, the Hassenstein–Reichardt elementary motion detector (HR-EMD). In this case, the EMD was applied as a second-order motion detector on the ESTMD outputs: it correlates adjacent inputs after a delay (via a low-pass filter, τ = 40 ms), resulting in a direction-selective output tuned to the velocity of small objects. The spatial shift of the facilitated area was determined by segmenting the output of the HR-EMD into three equal intervals (estimating the range of target velocity).
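The correlation step of an HR-EMD can be sketched for one pair of adjacent inputs as below. This is an illustration under assumptions: the delay is a discrete first-order low-pass filter with a 1 ms step, the two mirror-symmetric half-detectors are subtracted in the usual way, and the binning helper (including its name and the idea of a known maximum response) is hypothetical, since the text states only that the output range was split into three equal intervals.

```python
def low_pass(signal, tau_ms, dt_ms=1.0):
    """Discrete first-order low-pass filter, used here as the delay element."""
    alpha = dt_ms / (tau_ms + dt_ms)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def hr_emd(left, right, tau_ms=40.0, dt_ms=1.0):
    """Opponent correlator: positive for left-to-right motion, negative for the reverse."""
    left_delayed = low_pass(left, tau_ms, dt_ms)
    right_delayed = low_pass(right, tau_ms, dt_ms)
    return [left_delayed[i] * right[i] - left[i] * right_delayed[i]
            for i in range(len(left))]


def velocity_bin(magnitude, max_magnitude, n_bins=3):
    """Hypothetical helper: index (0..n_bins-1) of the equal interval containing magnitude."""
    return min(int(n_bins * magnitude / max_magnitude), n_bins - 1)


# ESTMD output at two neighbouring locations as a target passes left to right.
first = [0.0] * 10 + [1.0] * 5 + [0.0] * 45
second = [0.0] * 30 + [1.0] * 5 + [0.0] * 25

forward = sum(hr_emd(first, second))   # target moves from `first` to `second`
backward = sum(hr_emd(second, first))  # same stimulus, detector wired the other way
```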
We previously described CSTMD1 responses to the presentation of a moving target with a slow onset time course and a fast offset decay . Figure 1c shows curve fits modified from individual CSTMD1 responses that we recorded to target onset (as described in reference ). We previously suggested that this asymmetry reveals that the neuron is not merely ‘sluggish’, but rather that the slow facilitation in responses enhances encoding of the target trajectory. Further research supports this, showing that local discontinuities in the target trajectory reset the response to a naive, non-facilitated state [16,17]. Furthermore, even with this slow build-up in response, the underlying velocity tuning of the neuron is similar to other STMD neurons .
To test whether our model of facilitation emulates this asymmetry in onset/offset time course under similar open-loop stimulus conditions, we first simulated an immobilized pursuer viewing a grey (50%) target that moved horizontally along the circumference of the arena against a white background. Our model (figure 1e) reproduces the slow onset and rapid decay of the average CSTMD1 time course (figure 1c,d red lines). In CSTMD1, the offset time course is consistently fast from neuron to neuron. Although CSTMD1 shows an average 200 ms time constant in the response onset, individual examples also show large variability (figure 1c, blue lines). The variation in onset kinetics is readily simulated by our alteration of the facilitation time constant, but does not affect the offset time course (figure 1e).
Flying insects implement different pursuit strategies by maintaining the target at a specific angular position on the eye. For example, a male housefly will chase another fly by fixating it frontally (i.e. at azimuth 0°), thus ‘tracking’ the target in looping paths [19,20]. On the other hand, an aerial predatory dragonfly will intercept prey, with a recent study indicating that their head movements maintain the target in a frontal region (approx. 5°) . Similar to these pursuit strategies, we implemented a hybrid ‘tracking’, initiating a frontal fixation saccade whenever the winning feature of the ESTMD output (electronic supplementary material, figure S1a) moved more than 5° from the centre of the field of view. This strategy keeps the target close to the pole of expansion in the flow-field generated by the pursuer's own motion through the world, i.e. where local background image speeds are lowest. This discontinuous position-servo approach then allows the nonlinear spatio-temporal filtering inherent to the ESTMD pathway to enhance target ‘pop out’ against a highly cluttered background during the intersaccade period.
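The discontinuous position-servo rule can be written in a few lines. The sketch below assumes that the 5° criterion is a radial distance from the centre of the field of view and that a saccade re-centres the winning feature exactly and instantaneously; gaze and offsets are (azimuth, elevation) pairs in degrees, and the function name is hypothetical.

```python
import math

SACCADE_THRESHOLD_DEG = 5.0  # value quoted in the text


def fixation_saccade(gaze_deg, winner_offset_deg,
                     threshold_deg=SACCADE_THRESHOLD_DEG):
    """Return (new_gaze, saccade_made) for one frame.

    gaze_deg: current (azimuth, elevation) of the pursuer's gaze.
    winner_offset_deg: position of the winning feature relative to the
    centre of the field of view.
    """
    azimuth, elevation = winner_offset_deg
    if math.hypot(azimuth, elevation) > threshold_deg:
        # Frontal fixation saccade: rotate gaze so the winner is re-centred.
        return (gaze_deg[0] + azimuth, gaze_deg[1] + elevation), True
    # Intersaccade period: gaze is held, letting the ESTMD filtering operate.
    return gaze_deg, False


held_gaze, held = fixation_saccade((0.0, 0.0), (3.0, 1.0))
moved_gaze, moved = fixation_saccade((10.0, 0.0), (6.0, -2.0))
```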
STMD neurons are tuned to small objects, with peak responses to an optimum target size of 1.6° × 1.6° . In the biological system, it is presumed that other neurons, such as the ‘looming’ system observed in locusts [31,32], could be recruited to finalize prey capture as the target nears and thus becomes larger than optimal for an STMD. For the sake of clarity, we therefore limited our modelling efforts to the STMD pathway and declared a successful capture when the pursuer came within 1.2 m of the frontally fixated prey in less than 2 s. To exclude ‘fortuitous’ last minute saccades towards background features in the vicinity of the intended target, we arbitrarily included the additional criterion that the target had to be the winner in the output of the ESTMD array for more than 50% of the last 6 ms of tracking.
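The capture criterion can be expressed as a single predicate. The sketch takes the thresholds from the text and assumes one winner flag per 1 ms video frame, so that the final 6 ms correspond to the last six flags; the function and parameter names are hypothetical.

```python
def is_capture(distance_m, elapsed_s, target_was_winner,
               capture_radius_m=1.2, time_limit_s=2.0,
               window_samples=6, min_fraction=0.5):
    """Decide whether a pursuit counts as a successful capture.

    target_was_winner: per-frame booleans (most recent last) recording whether
    the true target was the winner of the ESTMD array in that frame.
    """
    if distance_m > capture_radius_m or elapsed_s >= time_limit_s:
        return False
    recent = target_was_winner[-window_samples:]
    if not recent:
        return False
    # The target must win in strictly more than half of the final window.
    return sum(recent) / len(recent) > min_fraction


history = [False] * 10 + [True, True, True, True, False, True]
captured = is_capture(1.1, 0.7, history)
```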
While analysis of biological neurons is limited to studying facilitation as an ‘always on’ feature of the underlying detection pathway, our model allows us to simulate the fate of pursuit flights from identical starting points but with the facilitation mechanism turned on or off. This allows us to explore effects of facilitation both locally (target discriminability) and globally (average effect on pursuit success).
Figure 2 shows three individual examples of pursuit simulations where targets pass across challenging parts of the background. In all three, the non-facilitated pursuit fails, whereas facilitation results in successful target capture. Figure 2a,b shows two estimates of target contrast during the pursuit. The first is based on the input imagery, i.e. a simple measure of the signal-to-noise ratio based on the luminance difference between the target and its near background. The second is a weighted signal-to-noise ratio (WSNR) calculated (electronic supplementary material, text S4) from target and background luminance, as well as target angular size (i.e. distance between target and pursuer). This takes account of the optical blur that demodulates the target at large distances owing to its small angular size. In all these examples, the WSNR gradually improves throughout the duration of the pursuit as the pursuer nears the target. This is because its angular size increases from the initial value (0.6°)—well below the 1.4° optical blur introduced by the optical sampling. Figure 2c shows target discriminability (electronic supplementary material, text S5) in the ESTMD output for both facilitated and non-facilitated simulations. Discriminability values below 1 (dashed line) are due to detection failure (discriminability = 0) or detection of stronger false positives (0 < discriminability < 1). In example 1, the non-facilitated pursuit fails rapidly owing to low target discriminability, whereas addition of facilitation leads to successful target capture. In examples 2 and 3, the pursuit failure in non-facilitated simulations is due to detection of strong false positives generated by highly contrasting features of the background scene. In these examples, facilitation boosts the local response in the vicinity of the previously tracked target, maintaining focus on the target even though it is inherently less salient than such distracters.
Although we observe clearly improved target discriminability in selected individual pursuits, facilitation might conceivably have a negative influence on target discrimination in some situations, e.g. by enhancing responses to background features. We therefore examined the pursuit success (%) with the addition of facilitation in different environmental scenarios. For this purpose we varied target to pursuer velocity ratio (|Vt|/|Vp|) and target intensity (electronic supplementary material, table S1) against different images (electronic supplementary material, figure S1b). Additionally, we tested the effect of duration of enhancement (facilitation time constant) in these simulations.
Figure 3 shows that pursuit success (averaged over four target intensities) varied across the four images. The more cluttered images, Image A and Image D, yield maximum success rates of approximately 50%. Both include features that evoke false positives, either naturalistic (e.g. foliage in Image A) or man-made (e.g. windows in Image D). Image B is sparse and Image C contains predominantly straight edges. Neither evokes many false positives, resulting in maximum success rates exceeding 75%.
As target velocity exceeds that of the pursuer, pursuit success decreases, because the pursuer is simply not fast enough to catch the target (figure 3). However, given the randomized nature of its path, targets sometimes move towards the pursuer at some point during the simulation so pursuit success is greater than zero. Moreover, pursuit success exhibits tuning to optimal velocity ratios because if a target moves too slowly or too quickly, it falls out of the velocity tuning range that is an inherent property of the ESTMD model .
The optimum facilitation time constant (dashed line) changes in response to variation of the target–pursuer velocity ratio. As target velocity increases, the optimum time constant decreases. This reflects the fact that faster targets require a facilitated area of enhancement to rapidly ‘keep up’ and have less ‘sluggish’ kinetics (i.e. lower time constant). Hence, the ideal time constant is likely to depend on the goal of the pursuit system (e.g. slower-moving prey versus fast-moving conspecifics).
In addition to the consistent influence of target speed, the optimum facilitation time constant changes across images. Pursuit success at any particular target speed improves with a longer time constant in Image A and Image D, compared with the qualitatively less cluttered Image B and Image C (figure 3). These data also suggest a relationship between the kinetics of facilitation and the average amount of background clutter. However, this is complicated by the fact that targets may sometimes be viewed against locally less cluttered parts of generally high-clutter scenes and vice versa. To quantify this relationship and investigate how it may evolve during a pursuit, we developed a robustness metric (electronic supplementary material, text S6). This represents the real performance of the model as a percentage of the ideal performance. The minimum criterion for an acceptable performance is a successful pursuit; however, a higher target discriminability throughout the pursuit indicates more robust performance. Therefore, our robustness metric calculates success rate, with individual successes weighted by their average target discriminability.
We examined the evolution of pursuit robustness in response to changes in background clutter (electronic supplementary material, text S3) and facilitation kinetics. Pursuit durations varied across all images with an overall mean of 0.6561 ± 0.0031 s (electronic supplementary material, figure S3). Figure 4 shows pursuits segmented into three equal periods (determined for each individual pursuit depending on its duration), as well as the whole pursuit (|Vt|/|Vp| = 3/4). Robustness is initially less than 15% (figure 4a), but it increases as the pursuit progresses (figure 4b,c), owing to the growth of target angular size and the build-up of facilitation. These data confirm that as background clutter increases, target discrimination decreases. More interestingly, the optimum facilitation time constant (dashed line) shifts towards longer values as background clutter increases. This trend can be explained by more frequent camouflage of the target in more cluttered backgrounds: a longer time constant enhances the area of target disappearance for a longer duration, thus improving target discrimination when it reappears. This is analogous to the ‘expectation’ human observers have for the reappearance of a target following disappearance behind an occluding object. However, when the background is less cluttered, facilitation with a long time constant lags behind a visible target, with the potential to enhance false positives. Consequently, faster facilitation performs better for smaller clutter values.
Our rationale for using a Gaussian-weighted patch as the basis for facilitation is that position information must be represented in the insect brain via retinotopically organized local feature detectors . It is most likely that facilitation operates at the level of these units. In hoverflies, retinotopically organized SF-STMD neurons have approximately Gaussian receptive fields with a half-width of approximately 7° . It is interesting to consider whether there is a clear optimum scale for this operation. Figure 5 displays the effect of varying facilitation kernel size on pursuit success for different time constants, averaged over target intensities. Small sizes of the kernel (half-width less than 7°) diminish the model's ability to successfully track the target. Because the velocity vector used to shift the facilitation map (§2.4.2) is only an estimate of the target velocity, too small a facilitation patch might not accurately enhance the area around the target, particularly when the target trajectory is unpredictable (e.g. during a prey saccade). Although larger kernels increase the probability of enhancing the correct location of the target, they also boost false positives in a larger area of the background. This likely explains declining pursuit success for kernel sizes above 9° (figure 5). Pursuit success reaches its maximum at the size of 7–9° for all images, remarkably similar to the size of insect SF-STMD receptive fields .
The preceding sections explored optimal parameters for the facilitation mechanism, but do not tell us about the resulting gain in performance. We therefore ran pursuits using optimal parameters for each image and then re-ran them from identical starting points with facilitation turned off, to quantify the absolute improvement in pursuit success (figure 6). With less cluttered images (Image B and Image C), high-contrast targets led to very high capture success even without facilitation, leaving little headroom for improvement. We therefore also ran simulations across a larger range of target intensities than in the earlier simulations. The success rate in non-facilitated simulations (contour lines) improves as target intensity is increased or decreased away from the mean background (dashed line). The difference between the facilitated and non-facilitated model is shown with the colour map. The largest improvement from facilitation (hot colours) happens when the non-facilitated model is only successful in 30–60% of simulations. When the target contrast is too low, facilitation has no effect owing to the complete failure of target detection (grey area in the plot).
Dragonflies can feed among swarms of prey, requiring an attention mechanism to select one target. A likely neuronal correlate is the competitive selection previously observed in CSTMD1 electrophysiological recordings . These neurons respond to only one of the competing stimuli at any point in time, although they may switch from one to the other. To examine the interplay between facilitation and competitive selection in our model, we introduced a distracter target to the simulations. We determined whether facilitation results in ‘locking’ on to one target, thus reducing the number of switches between two targets.
Figure 7 shows discriminability of the targets in the ESTMD output and their WSNR value (electronic supplementary material, text S4) for simulations both with and without facilitation. As intuitively expected for two identical targets, in the non-facilitated case, both targets have similar discriminability and thus compete as ‘winners’ in the ESTMD output (figure 7a). At any instant, the winner depends only on the local properties of the background, leading to a competitive switching between the two. As a consequence, any given saccade is more likely to result from a switch in ‘attention’ between the two targets (i.e. change in the current winner indicated by a discriminability value greater than 1) than a ‘normal’ fixation saccade to re-centre the tracked target in the frontal visual field. However, the addition of facilitation reduces both these attentional switches (figure 7a–c) and resulting switching saccades (figure 7a–c, small coloured markers). Switches are reduced even further by increasing the facilitation time constant. In the example shown in figure 7c, the longer time constant leads to the model focusing exclusively on the first target after 170 ms, with only normal fixation saccades after this time. In the same simulation with a shorter facilitation time constant (figure 7b), a lock-on to the second target occurs after 240 ms, following several prior switches between the two. Considering the WSNR value, in the facilitated simulations, even when the winning target moves across an area of the background that causes its contrast to decrease below that of the alternative, the model does not necessarily switch to the inherently more ‘salient’ target. An example is shown in figure 7c, where the discriminability of target 1 is boosted by facilitation to a winning level (greater than 1) even though target 2 has a higher WSNR at the corresponding time frame (the dashed box). 
These examples support the idea that facilitation plays a potential role in the selective attention observed in the STMD neurons .
To further quantify the effect of facilitation on competitive selection, we calculated the frequency of both fixation saccades (saccades towards the same target) and switching saccades (from one target to the other). Figure 8 clearly shows that the frequency of switching saccades decreases with longer facilitation time constants. Because more time is spent with an individual target as the winner, the frequency of fixation saccades increases (figure 8a, open symbols). Interestingly, we see this increase in fixation saccades even for relatively short facilitation time constants (less than 100 ms), even though these do not lead to a significant reduction in the corresponding frequency of switching saccades. This reflects the fact that the boost in local ‘salience’ to one or other (or both) targets resulting from even short facilitation time constants leads to a decrease in the time during a pursuit when neither target is the winner (figure 8b). As expected, for long time constants, when fixation saccades are dominant, average saccadic turn angles approach the angle defined for initiating a frontal fixation saccade (5°, figure 8c). Although the second target always appears as a mirror image of the first target and thus initially at a small angular separation (between 3° and 10°), as pursuits continue the different paths of the two targets lead to an increase in their angular separation. Consequently, the average saccadic turn angle increases dramatically for shorter time constants where switches between the two targets dominate saccades.
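The distinction between fixation and switching saccades can be made concrete with a small counting routine. The sketch assumes every saccade in a pursuit has already been attributed to one of the two targets (saccades towards background features are ignored) and leaves the first saccade unclassified because it has no predecessor; the function name and input format are hypothetical.

```python
def count_saccades(saccade_targets):
    """Count (fixation, switching) saccades from a sequence of target labels.

    saccade_targets: label of the target each saccade was directed at, in
    temporal order. A saccade towards the same target as the previous one is
    a fixation saccade; a saccade towards the other target is a switch.
    """
    fixation = switching = 0
    previous = None
    for target in saccade_targets:
        if previous is not None:
            if target == previous:
                fixation += 1
            else:
                switching += 1
        previous = target
    return fixation, switching


def saccade_frequencies(saccade_targets, duration_s):
    """Convert the counts to frequencies (saccades per second of pursuit)."""
    fixation, switching = count_saccades(saccade_targets)
    return fixation / duration_s, switching / duration_s


n_fixation, n_switching = count_saccades([1, 1, 2, 2, 2, 1])
```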
Our data clearly show that an elaborated version of the ESTMD model incorporating summation of feature detectors for both dark and light-contrasting targets provides robust detection of varying target intensities against a wide range of backgrounds. This rivals the remarkable sensitivity for low-contrast targets of the insect visual system upon which it is based .
The newly discovered facilitation in CSTMD1 neurons is suspected to play a role in enhancing sensitivity for targets moving along long trajectories, possibly contributing to the high capture rate in dragonflies. Supporting this hypothesis, inclusion of facilitation in our closed-loop model substantially improves pursuit success. This particularly interesting improvement was even observed for very slow facilitation time constants despite the average duration of successful pursuits remaining relatively short. In this regard, our model optimization parallels analysis of both physiology and behaviour in dragonflies. Dragonfly pursuit flights are typically very short, with an average of 184 ms  (although we note that this was calculated from the time the dragonfly commenced pursuit, so it is likely that the underlying neurons were already encoding target motion for some time prior). Yet the physiologically measured facilitation time reported in our earlier work is on the order of 300–500 ms ([15–17] and figure 1a–c).
One might intuitively expect that time constants governing biological image processing should be at least as fast as the behaviour for which they are employed. Indeed, the same expectation is fundamental to control theory: reducing the phase delay so that the system reaches steady state in the shortest possible time is one of the main concerns in the design of closed-loop systems [34,35]. However, our observations, in both the biological system and our model of it, suggest that this need not always be the case. The reduced effectiveness of facilitation with very short time constants most likely reflects numerous contributing factors, including uncertainty about the actual target location at any given instant and periods during which the target is camouflaged in the background clutter. A sluggish time constant for facilitation allows ‘persistence’ in enhancement of the estimated vicinity of a temporarily invisible target. As a consequence, however, the fully facilitated ‘steady-state’ response of neurons observed in the laboratory after 500 ms of motion against a blank background may rarely be experienced in nature, where the entire pursuit may be shorter. On the other hand, prey pursuit is not the only target detection and pursuit task that insects engage in. For example, dragonflies also pursue fast-moving conspecifics for several seconds in highly cluttered environments. Targets would be frontally fixated for much of such pursuits, providing ample time for STMD neurons to become fully facilitated.
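The trade-off between build-up and persistence can be made concrete with a small sketch. It assumes, purely for illustration, that facilitation behaves like a first-order low-pass filter with a single time constant; the actual kinetics may well differ, and the function names `buildup` and `retained` are hypothetical.

```python
import math


def buildup(t_ms, tau_ms):
    """Fraction of the fully facilitated level reached after t_ms of
    continuous target motion (assumed first-order step response)."""
    return 1.0 - math.exp(-t_ms / tau_ms)


def retained(gap_ms, tau_ms):
    """Fraction of existing facilitation that survives a gap of gap_ms
    during which the target is not detected (assumed exponential decay)."""
    return math.exp(-gap_ms / tau_ms)


# A short pursuit (184 ms) with a slow time constant (400 ms, assumed value
# within the 300-500 ms range quoted in the text).
short_pursuit_level = buildup(184.0, 400.0)

# Persistence across a 50 ms gap for a slow and a fast time constant
# (both gap and fast value are illustrative).
slow_persistence = retained(50.0, 400.0)
fast_persistence = retained(50.0, 40.0)
```

Under these assumptions, a 184 ms pursuit with a 400 ms time constant reaches only about 37% of the fully facilitated level, yet the same slow time constant retains most of its facilitation across a brief gap in target visibility, whereas a fast one loses most of it.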
As mentioned earlier, the actual optimum in this relationship between facilitation kinetics and pursuit behaviour likely depends on many factors and should be dynamic, changing with the amount of background clutter and the target velocity. It is therefore possible that the facilitation time course reflects different ‘modes’ of behaviour adopted by different species. Some dragonflies capture small moving prey above water, where low vegetation provides a less cluttered environment. The rapid prey pursuit flights analysed by Olberg et al. were from perching dragonfly species that view prey 45–90° above the horizon at their perch location, allowing them to use the sky as a clear background for detection of small moving prey. However, our physiological recordings are taken from a hawking dragonfly (Hemicordulia), which feeds while continuously flying and is frequently observed catching prey in complex visual clutter. We predict that the optimum facilitation mechanism might thus differ substantially between species that adopt such varied behaviour. Testing this hypothesis awaits further experiments on other dragonfly species.
A further question thus arises as to whether facilitation observed in CSTMD1 neurons indeed exploits dynamic kinetics given different environments or target speeds (as our model predicts), or whether the insect has simply evolved a static facilitation time constant for its natural habitats. Because species like Hemicordulia must deal with a variety of background scenes, an intriguing possibility is that a dynamic facilitatory time constant would accommodate variation in both the background clutter and the purpose of the pursuit (e.g. prey or conspecific). The variation in onset time course observed in individual CSTMD1 recordings (figure 1c) may indicate a dynamically adaptable facilitation mechanism, possibly modulated by factors such as preceding visual stimuli (clutter) and behavioural states (e.g. attention). Further physiological and behavioural experiments will be required to address these questions more directly.
Our implementation of facilitation involved an element with a Gaussian receptive field, inspired by recordings of small-field elements of the insect STMD pathway. We hypothesized that such elements represent a level at which target location is encoded by this pathway, and our simulations allowed us to explore a range of widths for the ‘receptive field’ of these facilitating elements. Our results reveal an optimum kernel size for facilitation close to the size of the SF-STMD receptive field (approx. 7°) observed in hoverflies. The fact that this receptive field size is intermediate between the size of the local elements (ESTMDs) that are actually responsible for local target motion detection and the larger receptive fields of STMD neurons like CSTMD1 suggests a hierarchy in the organization of STMD neurons. This intermediate scale for SF-STMDs within the hierarchy most likely results from a trade-off between uncertainty in the estimation of target location and the value of long-lasting facilitation in maintaining improved sensitivity for targets in clutter. This would be particularly important when features are temporarily obscured by their low contrast against the background or when passing behind occluding foreground features. Although a larger facilitation kernel may increase the probability of enhancing the appropriate target despite uncertainty in its precise location, it also increases the chance of detecting false positives in the background. We have not yet tested this hypothesis for occluding features, but our model architecture is well suited to elaborated simulations of pursuit in more structured three-dimensional scenarios.
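A facilitating element of this kind can be sketched as a Gaussian gain centred on the estimated target location and smoothed in time. The sketch below is illustrative only and is not the implementation used in our simulations. It assumes that the kernel size is specified as a full width at half maximum, that facilitation acts as a multiplicative gain of the form 1 + peak × Gaussian, and that the temporal smoothing is a discrete first-order low-pass filter; the function and parameter names are hypothetical.

```python
import math


def gaussian_gain(x_deg, y_deg, cx_deg, cy_deg, fwhm_deg=7.0, peak=1.0):
    """Facilitation gain at (x, y) for an estimated target at (cx, cy).

    fwhm_deg -- kernel width, assumed here to be a full width at half maximum
    peak     -- extra gain at the kernel centre (assumed value)
    """
    # Convert full width at half maximum to a Gaussian standard deviation.
    sigma = fwhm_deg / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    d2 = (x_deg - cx_deg) ** 2 + (y_deg - cy_deg) ** 2
    return 1.0 + peak * math.exp(-d2 / (2.0 * sigma ** 2))


def update_facilitation(prev, target, dt_ms, tau_ms):
    """One step of a discrete first-order low-pass filter that moves the
    current gain `prev` towards `target` with time constant tau_ms."""
    alpha = dt_ms / (tau_ms + dt_ms)
    return prev + alpha * (target - prev)


# Gain at the kernel centre and at half the kernel width from the centre.
centre_gain = gaussian_gain(0.0, 0.0, 0.0, 0.0)
edge_gain = gaussian_gain(3.5, 0.0, 0.0, 0.0)
```

Widening `fwhm_deg` in this sketch enhances a larger region around the estimated location, which tolerates more positional uncertainty but also boosts more of the background, the same trade-off described above.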
Our data support a possible role for facilitation in selective attention, even though we did not implement it as an attention mechanism per se. We noted that the addition of facilitation decreases both the proportion of time during which neither target is the winner and the frequency of switches between fixating the two alternatives. We also see clear examples (figure 7) where the previously fixated target remains the winner despite not always being the inherently more salient of the two, a classic hallmark of attention [36,37]. In this respect, our data mirror the response of the dragonfly CSTMD1 neuron, which displays selective attention in response to two targets moving simultaneously in its receptive field. In CSTMD1, there are also clear instances where the initially fixated target remains the ‘winner’ even when the alternative would have produced a stronger response had it been presented alone. In our model, both the improvement in the relative frequency of normal fixation saccades towards the selected target (as opposed to switches) and the decrease in the average saccadic turn angle saturate as the facilitation time constants approach the order of 100–300 ms. Once again, this is a close fit to the observed time constant in CSTMD1 [15,17].
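How facilitation can bias a simple competitive selection towards the previously fixated target can be sketched as follows. This is a toy illustration rather than our model: it assumes that each target has a scalar salience, that facilitation acts as a multiplicative gain on that salience, and that selection is a plain winner-takes-all over the products. The function name `select_winner` and the numerical values are hypothetical.

```python
def select_winner(saliences, gains):
    """Return the index of the target with the largest facilitated response.

    saliences -- inherent salience of each target (assumed scalar values)
    gains     -- facilitation gain currently applied to each target
    """
    scores = [s * g for s, g in zip(saliences, gains)]
    return max(range(len(scores)), key=scores.__getitem__)


# Target 1 is inherently more salient than target 0.
saliences = [0.8, 1.0]

# Without facilitation, the more salient target wins.
unfacilitated = select_winner(saliences, [1.0, 1.0])

# With target 0 previously fixated and therefore facilitated (assumed gain
# of 1.5), it remains the winner despite its lower inherent salience.
facilitated = select_winner(saliences, [1.5, 1.0])
```

In this toy case the facilitated response of the previously fixated target (0.8 × 1.5) exceeds that of the more salient alternative, so selection does not switch.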
Selective attention in insects and other animals undoubtedly involves processes additional to the relatively simple selection mechanism we implemented here. Nevertheless, our results support a potentially important role for a ‘bottom-up’ competitive process in attention as an emergent property of lower-level processing in the STMD pathway. However, given the relatively simple winner-takes-all mechanism that we implemented for target selection, we cannot reject the possibility that target selection in biological STMDs also reflects a top-down attention process. Testing such mechanisms in physiological recordings is severely restricted in the richness of stimuli that can be presented, because the animal is restrained with wax and can only experience stimuli imposed upon it in open loop. A major advantage of our computational model is that it allows us to investigate questions that are difficult or impossible to address in physiological recordings. One example is the effect of facilitation on both attentional switches and physical saccades. Given its ability to reproduce so much of the lower-level behaviour of the physiological system, our model thus provides a promising platform to further explore the possible ‘higher-order’ network interactions that may be involved in target selection. As we have demonstrated here, a further advantage of our modelling approach is that it also generates hypotheses that can be tested by further physiological experiments feasible in open-loop STMD recordings, such as the possibility that facilitation time constants may be influenced by the statistics of the background clutter against which stimuli are presented.
S.D.W., B.S.C. and D.C.O. conceived of the study. Z.M.B. developed the code and ran the model simulations. Z.M.B., S.D.W., B.S.C., S.G. and D.C.O. designed the experiments, analysed and interpreted the data and helped draft the manuscript. All authors gave final approval for publication.
We declare we have no competing interests.
This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP130104572).