Sensory integration is more complicated for movement planning than for a simple perceptual task. The problem is that movement planning and execution rely on a number of different computations, and estimates of the same spatial variable may be needed for several of these. For example, there is both psychophysical (
Rossetti et al., 1995) and physiological (
Batista et al., 1999;
Buneo et al., 2002;
Kakei et al., 1999,
2001) evidence for two separate stages of movement planning, as illustrated in . First, the movement vector is computed as the difference between the target location and the initial position of the hand. Next, the initial velocity along the planned movement vector must be converted into joint angle velocities (or other intrinsic variables such as muscle activations), which amounts to evaluating an inverse kinematic or dynamic model. This evaluation also requires knowing the initial position of the arm.
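The two planning stages can be made concrete with a minimal sketch for a planar two-link arm. The arm geometry, segment lengths, joint angles, and target location below are hypothetical values chosen for illustration; they are not taken from the studies cited above. The first stage subtracts the initial hand position from the target location. The second stage converts a desired hand velocity along that vector into joint angle velocities by inverting the arm's Jacobian, which itself depends on the initial arm posture.

```python
import math

# Hypothetical segment lengths (m) of a planar two-link arm; illustrative only.
LEN_UPPER, LEN_FORE = 0.30, 0.33

def hand_position(q1, q2):
    """Forward kinematics: hand (x, y) for shoulder angle q1 and elbow angle q2 (radians)."""
    x = LEN_UPPER * math.cos(q1) + LEN_FORE * math.cos(q1 + q2)
    y = LEN_UPPER * math.sin(q1) + LEN_FORE * math.sin(q1 + q2)
    return (x, y)

def movement_vector(target, hand):
    """Stage 1: movement vector = target location minus initial hand position."""
    return (target[0] - hand[0], target[1] - hand[1])

def joint_velocities(q1, q2, velocity):
    """Stage 2: inverse kinematics. Solve J(q) * dq = velocity for the joint velocities dq."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    # Jacobian of hand_position with respect to (q1, q2).
    j11, j12 = -LEN_UPPER * s1 - LEN_FORE * s12, -LEN_FORE * s12
    j21, j22 = LEN_UPPER * c1 + LEN_FORE * c12, LEN_FORE * c12
    det = j11 * j22 - j12 * j21
    if abs(det) < 1e-9:
        raise ValueError("arm is at a kinematic singularity")
    vx, vy = velocity
    dq1 = (j22 * vx - j12 * vy) / det
    dq2 = (-j21 * vx + j11 * vy) / det
    return (dq1, dq2)

# Illustrative posture and target (assumed values).
q = (0.4, 1.2)
hand = hand_position(*q)
target = (0.10, 0.45)
mv = movement_vector(target, hand)       # stage 1 output
dq = joint_velocities(q[0], q[1], mv)    # stage 2 output
```

Both functions take the initial arm state as input, which is why an estimate of initial hand position is needed at each stage.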
When planning a reaching movement, humans can often both see and feel the location of their hand. The ML model of sensory integration would seem to predict that the same weighting of vision and proprioception should be used for both of the computations illustrated in . However, we have previously shown that when reaching to visual targets, the relative weighting of these signals was quite different for the two computations: movement vector planning relied almost entirely on vision of the hand, and the inverse model evaluation relied more strongly on proprioception (
Sober and Sabes, 2003). We hypothesized that the difference was due to the nature of the computations. Movement vector planning requires comparing the visual target location to the initial hand position. Since proprioceptive signals would first have to be transformed, this computation favors vision. Conversely, evaluation of the inverse model deals with intrinsic properties of the arm, favoring proprioception. Indeed, when subjects are asked to reach to a proprioceptive target (their other hand), the weighting of vision is significantly reduced in the movement vector calculation (
Sober and Sabes, 2005). We further hypothesized that these results are consistent with “local” ML integration, performed separately for each computation, provided that sensory transformations inject variability into the transformed signal.
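The following sketch illustrates how local ML integration could yield different weightings for the two computations. It assumes that each signal is weighted in inverse proportion to its variance, and that a signal which must be transformed into the frame of the computation acquires an additional, independent variance. The function names and all numerical values are hypothetical and are not fitted to any data.

```python
def ml_weights(variances):
    """ML (inverse-variance) weights for a set of independent estimates."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]

def vision_weight(var_vision, var_proprio, var_transform, frame):
    """Weight given to vision of the hand in a computation performed in `frame`.

    Assumption: the signal that is not native to the frame must be transformed,
    and the transformation adds `var_transform` to its variance.
    """
    if frame == "visual":        # e.g. movement vector planning to a visual target
        var_v, var_p = var_vision, var_proprio + var_transform
    elif frame == "intrinsic":   # e.g. evaluation of the inverse model
        var_v, var_p = var_vision + var_transform, var_proprio
    else:
        raise ValueError("frame must be 'visual' or 'intrinsic'")
    return ml_weights([var_v, var_p])[0]

# Hypothetical variances: equally reliable senses, costly transformation.
w_visual_frame = vision_weight(1.0, 1.0, 3.0, "visual")        # 0.8
w_intrinsic_frame = vision_weight(1.0, 1.0, 3.0, "intrinsic")  # 0.2
```

With identical sensory reliabilities, the weighting differs between the two computations solely because of the variance added by the transformation.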
In order to make this hypothesis quantitative, we must understand the role of sensory transformations during reach planning and their statistical properties. We developed and tested a model for these transformations by studying patterns of reach errors (
McGuire and Sabes, 2009). Subjects made a series of interleaved reaches to visual targets, proprioceptive targets (the other hand, unseen), or bimodal targets (the other hand, visible), as illustrated in . These reaches were made either with or without visual feedback of the hand prior to reach onset, and in particular during an enforced delay period after target presentation (after movement onset, feedback was extinguished in all trials). We took advantage of a bias in reaching that naturally occurs when subjects fixate a location distinct from the reach target. In particular, when subjects reach to a visual target in the peripheral visual field, reaches tend to be biased further from the fixation point (
Bock, 1993;
Enright, 1995). This pattern of reach errors is illustrated in the left-hand panels of : when reaching left of the fixation point a leftward bias is observed, and similarly for the right. Thus, these errors follow a retinotopic pattern, i.e. the bias curves shift with the fixation point. The bias pattern changes, but remains retinotopic, when reaching to bimodal targets () or proprioceptive targets (). Most notably, the sign of the bias switches for proprioceptive reaches: subjects tend to reach closer to the point of fixation. Finally, the magnitude of these errors depends on whether visual feedback of the reaching hand is available prior to movement onset (compare the top and bottom panels of ; see also
Beurze et al. (2007)).
While these bias patterns might seem arbitrary, they suggest an underlying mechanism. First, the difference in the sign of errors for visual and proprioceptive targets suggests that the bias arises in the transformation from a retinotopic (or eye-centered) representation to a body-centered representation. To see why, consider that in its simplified one-dimensional form, the transformation requires only adding or subtracting the gaze location (see the box labeled “Transformation” in ). This might appear to be a trivial computation. However, the internal estimate of gaze location is itself an uncertain quantity. We argued that this estimate relies on current sensory signals (proprioception or efference copy) as well as an internal prior that “expects” gaze to be coincident with the target. Thus, the estimate of gaze would be biased toward a retinally peripheral target. Since visual and proprioceptive information about target location travels in different directions through this transformation, a biased estimate of gaze location results in oppositely signed errors for the two signals, as observed in . Furthermore, because the internal estimate of gaze location is uncertain, the transformation adds variability to the signal (see also
Schlicht and Schrater, 2007), even if the addition or subtraction operation itself can be performed without error (not necessarily the case for neural computations,
Shadlen and Newsome, 1994). One consequence of this variability is that access to visual feedback of the hand would improve the reliability of an eye-centered representation (upper pathway in ) more than it would improve the reliability of a body-centered representation (lower pathway in ), since the latter receives a transformed, and thus more variable, version of the signal. Therefore, if the final movement plan were constructed from the optimal combination of an eye-centered and a body-centered plan (rightmost box in ), the presence of visual feedback of the reaching hand should favor the eye-centered representation. This logic explains why visual feedback of the reaching hand decreases the magnitude of the bias for visual targets (when the eye-centered space is unbiased; ) but increases the magnitude of the bias for proprioceptive targets (when the eye-centered space is biased; ).
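The sign reversal can be checked with a minimal one-dimensional sketch. It assumes Gaussian statistics, so that the gaze estimate is a precision-weighted average of the sensed gaze and a prior centered on the target, and it considers only the mean of the target signal, ignoring sensory noise and the hand. The function names and numerical values are hypothetical.

```python
def gaze_estimate(gaze, target, var_sense, var_prior):
    """Mean gaze estimate: sensed gaze combined with a prior centered on the target.

    Assumes a Gaussian sensory signal (variance var_sense) and a Gaussian prior
    (variance var_prior), so the estimate is pulled toward the target by factor k.
    """
    k = var_sense / (var_sense + var_prior)
    return gaze + k * (target - gaze)

def visual_target_in_body_frame(gaze, target, var_sense, var_prior):
    """Eye-to-body transformation: add the estimated gaze to the retinal location."""
    retinal = target - gaze  # the retinal signal itself is taken as unbiased
    return retinal + gaze_estimate(gaze, target, var_sense, var_prior)

def proprioceptive_target_in_eye_frame(gaze, target, var_sense, var_prior):
    """Body-to-eye transformation: subtract the estimated gaze from the body location."""
    return target - gaze_estimate(gaze, target, var_sense, var_prior)

# Hypothetical values: fixation at 0 deg, target 10 deg to the right.
gaze, target = 0.0, 10.0
visual_error = visual_target_in_body_frame(gaze, target, 1.0, 9.0) - target
proprio_error = proprioceptive_target_in_eye_frame(gaze, target, 1.0, 9.0) - (target - gaze)
```

Here `visual_error` is positive (the transformed visual target lies further from fixation) and `proprio_error` is negative (the transformed proprioceptive target lies closer to fixation), even though the same biased gaze estimate is used in both directions.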
Together, these ideas form the Bayesian integration model of reach planning with ‘parallel representations’, illustrated in . In this model, all sensory inputs related to a given spatial variable are combined with weights inversely proportional to their local variability (
Equation 1), and a movement vector is then computed. This computation occurs simultaneously in an eye-centered and a body-centered representation. The two resultant movement vectors have different uncertainties, depending on the availability and reliability of the sensory signals they receive in a given experimental condition. The final output of the network is itself a weighted sum of these two representations. We fit the four free parameters of the model (corresponding to values of sensory variability) to the reach error data shown in solid lines in . The model captures those error patterns (dashed lines in ), and predicts the error patterns from two similar studies described above (
Beurze et al., 2007;
Sober and Sabes, 2005). In addition, the model predicts the differences we observed in reach variability across experimental conditions ().
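The final weighting stage can be sketched as follows. This is a deliberately reduced illustration rather than the fitted model: it tracks only the uncertainty in hand position, treats the eye-centered and body-centered plans as independent, and uses hypothetical variances. It shows how the availability of visual feedback of the hand would shift the weight of the final output toward the eye-centered plan.

```python
def combined_variance(variances):
    """Variance of the inverse-variance-weighted combination of independent signals."""
    return 1.0 / sum(1.0 / v for v in variances)

def eye_plan_weight(var_vision, var_proprio, var_transform, hand_visible):
    """Weight of the eye-centered plan in the final weighted sum.

    Assumptions: proprioception is native to the body-centered frame and vision
    to the eye-centered frame; crossing frames adds var_transform to a signal.
    """
    eye_inputs = [var_proprio + var_transform]    # transformed proprioception
    body_inputs = [var_proprio]                   # native proprioception
    if hand_visible:
        eye_inputs.append(var_vision)                   # native vision
        body_inputs.append(var_vision + var_transform)  # transformed vision
    var_eye = combined_variance(eye_inputs)
    var_body = combined_variance(body_inputs)
    return (1.0 / var_eye) / (1.0 / var_eye + 1.0 / var_body)

# Hypothetical variances (arbitrary units).
w_without_feedback = eye_plan_weight(1.0, 4.0, 2.0, hand_visible=False)
w_with_feedback = eye_plan_weight(1.0, 4.0, 2.0, hand_visible=True)
```

With these values the eye-centered weight rises from 0.4 without feedback to about 0.67 with feedback, in line with the qualitative argument that visual feedback favors the eye-centered representation.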
These results challenge the idea that movement planning should begin by mapping the relevant sensory signals into a single common reference frame (
Batista et al., 1999;
Buneo et al., 2002;
Cohen et al., 2002). The model shows that the use of two parallel representations of the movement plan yields a less variable output in the face of variable and sometimes missing sensory signals and noisy internal transformations. It is not clear whether or how this model can be mapped onto the real neural circuits that underlie reach planning. For example, the two parallel representations could be implemented by a single neuronal population (
Pouget et al., 2002;
Xing and Andersen, 2000;
Zipser and Andersen, 1988). Before addressing this issue, though, we consider the question of how single neurons or populations of neurons should integrate their afferent signals.