The first stage of processing converts digital images into a list of feature points using an elaboration of a background subtraction algorithm. Because the image of a target is usually only a few pixels in area, an individual feature point from a given camera characterizes that camera's view of the target. In other words, neglecting missed detections or false positives, there is usually a one-to-one correspondence between targets and extracted feature points from a given camera. Nevertheless, our system is capable of tracking successfully despite observations missed owing to occlusion or low contrast (§3.1), and of rejecting false positive feature detections (§3.2).
In the Bayesian framework, all feature points for time $t$ are the observation $\mathbf{z}_t$. The $i$th of $n$ cameras returns $m$ feature points, with each point $\mathbf{z}_{ij}$ being a vector $\mathbf{z}_{ij} = (u, v, \alpha, \theta, \varepsilon)$, where $u$ and $v$ are the coordinates of the point in the image plane and the remaining components are local image statistics described below. $\mathbf{z}_t$ thus consists of all such feature points for a given frame, $\mathbf{z}_t = \{\mathbf{z}_{11}, \ldots, \mathbf{z}_{1m}, \ldots, \mathbf{z}_{n1}, \ldots, \mathbf{z}_{nm}\}$. (In the interest of simplified notation, our indexing scheme is slightly misleading here: there may be varying numbers of features for each camera rather than always $m$.)
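For concreteness, the structure of $\mathbf{z}_t$ can be sketched in Python as nested per-camera feature lists. The names below (`Feature2D`, `ObservationT`) are our own illustration, not flydra's actual data structures.

```python
# Minimal sketch of the observation notation above; the names are
# hypothetical, not flydra's actual data structures.
from dataclasses import dataclass

@dataclass
class Feature2D:
    u: float      # image-plane x-coordinate (after distortion correction)
    v: float      # image-plane y-coordinate
    alpha: float  # feature area (0th moment)
    theta: float  # feature orientation
    eps: float    # feature eccentricity

# z_t: one feature list per camera; list lengths may differ between
# cameras, which is the indexing caveat noted in the text.
ObservationT = list[list[Feature2D]]
```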
Conversion of a new image into a series of feature points is based on background subtraction using the running Gaussian average method (reviewed in [39]). To achieve the fast image processing required for real-time operation, many of these operations are performed using the high-performance single instruction, multiple data (SIMD) extensions available on recent x86 CPUs. Initially, an absolute difference image is made, in which each pixel is the absolute value of the difference between the incoming frame and the background image. Feature points that exceed some threshold difference from the background image are noted, and a small region around each such pixel is subjected to further analysis. For the $j$th feature, the brightest point has value $\beta_j$ in this absolute difference image. All pixels below a certain fraction (e.g. 0.3) of $\beta_j$ are set to zero to reduce moment arms caused by spurious pixels. Feature area $\alpha_j$ is found from the 0th moment, the feature centre $(\tilde{u}_j, \tilde{v}_j)$ is calculated from the 1st moment, and the feature orientation $\theta_j$ and eccentricity $\varepsilon_j$ are calculated from higher moments. After correcting for lens distortion (§4), the feature centre is $(u_j, v_j)$. Thus, the $j$th point is characterized by the vector $\mathbf{z}_j = (u_j, v_j, \alpha_j, \theta_j, \varepsilon_j)$. Such features are extracted on every frame from every camera, although the number of points $m$ found on each frame may vary. We set the initial thresholds for detection low to minimize the number of missed detections; false positives at this stage are rejected later by the data association algorithm (§3.2).
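As an illustration of the moment-based measurements just described, the following Python sketch computes $\alpha_j$, $(\tilde{u}_j, \tilde{v}_j)$, $\theta_j$ and $\varepsilon_j$ for one candidate region of the absolute difference image. It is a sketch only: the function name is ours, the eccentricity is computed here from the eigenvalues of the second-moment matrix (one common definition), and the real system performs such operations with SIMD-optimized routines.

```python
import numpy as np

def extract_feature(absdiff_roi, frac=0.3):
    """Moment analysis of a small region of the absolute difference image
    surrounding one above-threshold pixel. Returns (alpha, (u, v), theta,
    eccentricity) in ROI coordinates, before lens-distortion correction."""
    roi = absdiff_roi.astype(float)
    beta = roi.max()                   # brightest value, beta_j (> 0 because
    roi[roi < frac * beta] = 0.0       # the ROI contains a detected pixel);
                                       # suppress spurious dim pixels

    ys, xs = np.mgrid[0:roi.shape[0], 0:roi.shape[1]]
    alpha = roi.sum()                  # 0th moment: feature "area" alpha_j
    u = (xs * roi).sum() / alpha       # 1st moments: centre (u~_j, v~_j)
    v = (ys * roi).sum() / alpha

    # Central 2nd moments give orientation theta_j and eccentricity eps_j.
    mu20 = ((xs - u) ** 2 * roi).sum() / alpha
    mu02 = ((ys - v) ** 2 * roi).sum() / alpha
    mu11 = ((xs - u) * (ys - v) * roi).sum() / alpha
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

    # Eigenvalues of the second-moment matrix act as squared axis lengths.
    common = np.sqrt(((mu20 - mu02) / 2.0) ** 2 + mu11 ** 2)
    lam1 = (mu20 + mu02) / 2.0 + common
    lam2 = (mu20 + mu02) / 2.0 - common
    ecc = np.sqrt(1.0 - lam2 / lam1) if lam1 > 0.0 else 0.0

    return alpha, (u, v), theta, ecc
```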
Our system is capable of dealing with illumination conditions that vary slowly over time by using an ongoing estimate of the background luminance and its variance, which are maintained on a per-pixel basis by updating the current estimates with data from every 500th frame (or other arbitrary interval). A more sophisticated two-dimensional feature extraction algorithm could be used, but we have found this scheme to be sufficient for our purposes and sufficiently simple to operate with minimal latency.
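A minimal sketch of such a per-pixel running Gaussian average follows, assuming Python with NumPy; the update weight, initial variance and class name are illustrative choices, while the every-500th-frame interval follows the text.

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel running Gaussian average background model (cf. [39])."""

    def __init__(self, first_frame, alpha=0.05, interval=500):
        self.mean = first_frame.astype(np.float32)  # luminance estimate
        self.var = np.full_like(self.mean, 25.0)    # variance (initial guess)
        self.alpha = alpha        # exponential update weight (illustrative)
        self.interval = interval  # update with every Nth frame
        self.count = 0

    def update(self, frame):
        """Fold every `interval`-th frame into the mean/variance estimates,
        tracking slow illumination changes."""
        self.count += 1
        if self.count % self.interval:
            return
        d = frame.astype(np.float32) - self.mean
        self.mean += self.alpha * d
        self.var = (1.0 - self.alpha) * self.var + self.alpha * d * d
```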
While the real-time operation of flydra is essential for experiments that modify sensory feedback, another advantage of an online tracking system is that the amount of data that must be saved for later analysis is greatly reduced. If two-dimensional feature extraction is performed in real time and three-dimensional trajectories are reconstructed later, only the vectors $\mathbf{z}_j$ need be saved, resulting in orders of magnitude less data than the full-frame camera images. Thus, to obtain this low data rate alone, the three-dimensional methods of the following sections need not be implemented in real time. Furthermore, raw images from the neighbourhood of the feature points could also be extracted and saved for later analysis; this requires slightly more data, but still at rates substantially lower than the full camera frames. This capability is particularly useful for cameras whose data rate exceeds what hard drives can sustain, and such a feature is implemented in flydra.
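A back-of-envelope calculation makes the savings concrete; every number below (camera resolution, frame rate, number of targets, bytes per value) is a hypothetical illustration rather than a flydra specification.

```python
# Hypothetical figures for illustration only.
width, height, fps = 640, 480, 100        # 8-bit monochrome camera
raw_rate = width * height * fps           # ~30.7 MB/s of full frames

targets, values, bytes_each = 3, 5, 4     # (u, v, alpha, theta, eps) floats
feature_rate = targets * values * bytes_each * fps  # ~6 kB/s of vectors z_j

print(raw_rate / feature_rate)            # a factor of several thousand
```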
Figure 4 shows the parameters $(u, v, \theta)$ from the two-dimensional feature extraction algorithm during a hummingbird flight. These two-dimensional features, in addition to three-dimensional reconstructions, are overlaid on raw images extracted and saved using the real-time image extraction technique described above.
Figure 4. Raw images, two-dimensional data extracted from the images, and overlaid computed three-dimensional position and body orientation of a hummingbird (Calypte anna). In these images, a blue circle is drawn centred on the two-dimensional image coordinates.