“Relative Value” Biases Choice

illustrate psychometric functions (PMFs) depicting the observed proportion of T1 choices as a function of signed motion coherence. A separate PMF is plotted for each of the four reward conditions: high-high (HH; large rewards available for both targets), low-low (LL; small rewards available for both targets), high-low (HL; large reward for target 1 and small reward for target 2) and low-high (LH; vice versa) (see Methods). The sigmoidal curves are logistic regression fits to the observed data (Methods). depict data from a representative experiment for monkey A and monkey T, respectively. depict the average PMF across all behavioral sessions for monkeys A (n = 33) and T (n = 24), respectively.

Two features of the data in are notable. First, the PMFs for the unbalanced reward conditions are shifted horizontally with respect to the balanced conditions, revealing a systematic choice bias toward the larger reward. Both monkeys chose T1 more frequently when it was associated with a high reward relative to T2 (black symbols and lines), and chose T2 more frequently in the converse condition (green symbols and lines). Second, the observed behavior for the balanced conditions (HH and LL reward conditions—red and blue circles, respectively) is nearly identical, indicating that the monkeys' probability of choosing T1 is unaffected by changes in the absolute size of the reward. One might have expected the PMFs to steepen for the HH condition if the monkeys were motivated by the larger rewards to discriminate the motion stimulus more carefully. Instead, both monkeys appear to discriminate as well as they possibly can in both conditions, suggesting a high baseline level of motivation throughout the experiments. Both monkeys, however, were significantly more likely to break fixation during LL trials than during HH trials (LL trials: monkey A, 2.73% ± 0.13; monkey T, 3.19% ± 0.35; HH trials: monkey A, 1.77% ± 0.12; monkey T, 1.66% ± 0.27; two-sample t-test: monkey A, p<10^−4; monkey T, p<0.002).
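The logistic fits described above can be sketched as follows. This is a minimal illustration on simulated choices, not the paper's actual fitting code; the bias parameterization (a single reward-bias regressor taking +1 for HL, −1 for LH, 0 for balanced conditions) and all names are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_pmf(coh, reward_bias, chose_t1):
    """Fit P(choose T1) = logistic(b0 + b1*coh + b2*reward_bias) by
    maximum likelihood. coh: signed coherence; reward_bias: +1 (HL),
    -1 (LH), 0 (HH/LL); chose_t1: 0/1 choices. Returns (b0, b1, b2)."""
    def nll(b):
        p = np.clip(logistic(b[0] + b[1] * coh + b[2] * reward_bias),
                    1e-9, 1 - 1e-9)
        return -np.sum(chose_t1 * np.log(p) + (1 - chose_t1) * np.log(1 - p))
    return minimize(nll, x0=np.zeros(3), method="Nelder-Mead").x

# Simulated session: choices driven by coherence plus a reward bias.
rng = np.random.default_rng(0)
coh = rng.choice([-0.48, -0.24, -0.12, 0.0, 0.12, 0.24, 0.48], size=4000)
bias = rng.choice([-1.0, 0.0, 1.0], size=4000)   # LH, balanced, HL
choice = (rng.random(4000) < logistic(8.0 * coh + 1.5 * bias)).astype(float)
b0, b1, b2 = fit_pmf(coh, bias, choice)
```

A positive fitted b2 corresponds to the horizontal PMF shift toward the larger reward described above, while b1 captures psychophysical sensitivity to coherence.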

For both monkeys, average behavior across all experiment sessions () was very similar to the individual session examples (). Thus the effects of coherence and reward size on psychophysical choices were robust and consistent within and across the data sets for the two monkeys. As we have reported previously [42], the observed choice biases are nearly optimal in terms of maximizing reward collection across an experimental session. On average, both monkeys harvested rewards at ~98% of the theoretical maximum given their underlying psychophysical sensitivity to the motion stimulus.

The Representation of Choice, Absolute Value, and Relative Value in LIP

Our analysis of LIP activity during this task revealed three primary effects that varied dynamically over the course of a typical trial: 1) the well known effect of decision outcome (choice), particularly in the later stages of the trial, 2) an effect of the “absolute value” of the target in the neuron's RF (T1) irrespective of the value of T2, and 3) an effect of the “relative value” of the target in the neuron's RF (whether it was larger than, smaller than, or equal to the value of T2). In the following three sections, we will illustrate each of these effects and its dynamics qualitatively by inspection of average PSTH's for each monkey. In the fourth section we will analyze these effects quantitatively by means of a multiple regression analysis.

As described in Methods, we always positioned one response target (T1) within the RF of the neuron under study, while positioning the other target (T2) 180° away in the opposite hemifield. The axis of stimulus motion was defined by these two target positions so that motion discrimination choices corresponded to saccades into or out of the RF. In the following sections, we denote choices into the RF as “T1 choices” and those to the opposite target as “T2 choices”.

Representation of Choice: Qualitative Description

3A (monkey A) and 3B (monkey T) depict mean LIP firing rate as a function of time for all successfully completed trials in the HH (red) and LL (blue) reward conditions. Data are plotted separately for trials in which the monkey chose T1 (solid lines) and T2 (dashed lines). Both 3A and 3B consist of two panels: a left panel with responses aligned to the time of target onset and a right panel (labeled “late delay epoch”) with responses aligned to the time of the saccade. The black vertical lines in both figures denote relevant task epochs.

Note first that in both 3A and 3B, the solid and dashed lines are initially identical (for each color), but diverge approximately 200 ms into the motion period. Thus, shortly after the onset of the motion stimulus, LIP neurons in both monkeys begin to signal choice—whether the monkey will choose T1 or T2. This result is not surprising. We explicitly selected for study neurons that responded differentially to oppositely directed eye movements in the delayed saccade task, and it is well known from previous work that such LIP neurons typically exhibit “choice predictive” activity during a variety of forced-choice tasks [30], [49], [51]. The data in demonstrate that this property of LIP neurons holds for a task in which decisions are based on a combination of visual motion and reward information. The effect of behavioral choice in the LIP data is robust, consistent across neurons and monkeys, and present for all four reward conditions as demonstrated below.

An unanticipated difference between the two monkeys was the absence of an initial visual “burst” in monkey T. The burst was absent during the delayed saccade task as well (data not shown). Although LIP neurons lacking the visual burst have been observed by our lab and others previously, we have never recorded from a monkey in which the burst appeared to be absent across the population. We do not believe this result is due to oversampling from a few unusual locations in LIP; our recording sites were distributed reasonably widely along the lateral bank of the intraparietal sulcus. While this appears to be a genuine difference between the monkeys, it does not affect any of our key results pertaining to the accumulation of motion information or the influence of reward condition on LIP activity, since these results are present in both monkeys.

Representation of Absolute Value: Qualitative Description

Any differences in neural activity between the HH and LL conditions indicate an effect of absolute reward value since the relative reward value (compared to the value of T2) is identical in the two conditions. By comparing the red and blue lines in we can see the extent to which LIP represents absolute reward value. Consider first the data from monkey A in . The solid red and blue traces (T1 choices) separate with very short latency following presentation of the reward cues at 250 ms. Thus the LIP population rapidly encodes the absolute value of T1, producing elevated firing rates when a high value target is presented within the RF. Following their initial separation, the red and blue traces converge briefly near the beginning of the motion epoch, but then separate again for the duration of the trial. Qualitatively, then, except for a brief interval near the onset of the motion stimulus, LIP neurons from monkey A encode a signal concerning the absolute value of the reward available in the RF throughout the trial. Note that this representation of absolute value is present for T2 choices as well (dashed traces).

shows a similar pattern of activity for the LIP population recorded from monkey T. Even though LIP neurons in monkey T do not respond as rapidly or robustly as those in monkey A, all major features of the absolute value signal observed in monkey A are replicated in monkey T: 1) the effect of absolute value begins during the reward cue period, 2) greater absolute value is represented by higher firing rates, 3) the effect is maintained until the end of the trial and 4) the effect is present for T2 choice trials as well. A minor difference is that the absolute reward signal does not “disappear” at any point in the trial for monkey T. It is interesting that absolute reward value exerts a substantial effect on LIP activity even though it exerts little if any effect on choice (). We will consider this point further in the simulations and in the Discussion.

Representation of Relative Value: Qualitative Description

As revealed by the behavioral data, the relative reward value of the two targets exerts a substantial impact on choice behavior. We first examine the effect of relative value on LIP by comparing neuronal responses in the HH and HL reward conditions. In these conditions, the value of T1 is constant (high value) while the value of T2 differs (high in HH, low in HL). Thus, any LIP modulation between these two conditions indicates a relative effect of T2 value on the response to the high value target present in the RF. depict LIP responses for monkeys A and T, respectively, to the HH (red traces) and HL (black traces) reward conditions. The format of these figures is identical to , and the red curves are the same as in .

In , the black and red traces separate late in the reward cue epoch, with the average firing rate being higher for the HL condition (arrow). This difference indicates that on average, LIP neurons respond more strongly to a target in the RF (T1) when it has a larger value *relative* to that of the T2 target. This “relative value” signal is present throughout most of the motion epoch but disappears early in the delay epoch, after the choice has presumably been determined. The same dynamics are evident both for T1 and T2 choices (solid and dashed lines, respectively).

A similar pattern of activity is present for the population data from monkey T, illustrated in . As for monkey A, the relative reward signal emerges late in the reward cue epoch (black arrow), with average firing rate being higher for larger relative value. For monkey T, however, the relative reward signal fades more rapidly than for monkey A. Additionally, for T1 choices, the relative reward signal inverts during the second half of the motion epoch and remains inverted throughout the delay epoch. This inversion is not present for T2 choices, however.

We can take a second look at the effects of relative reward by comparing the LL and LH reward conditions. As in the previous comparison of HH and HL trials, the value of T1 is identical (low) for the LL and LH conditions. The two conditions differ only in the value of T1 *relative* to the value of T2, which is equal in the LL condition but lower in the LH condition. Again, any modulation of LIP activity between these two conditions constitutes a signal of relative reward value.

compare average LIP responses in the LL (blue traces) and LH (green traces) conditions for monkeys A and T, respectively. Note that the blue curves in these figures are the same as the blue curves in . The data for monkey A show an effect of relative reward similar to that seen in . The green trace drops below the blue trace during the reward cue epoch (black arrow), indicating again that average LIP firing rates fall as the relative value of the target in the RF decreases. The green and blue traces converge again during the motion period and remain together throughout the delay period, indicating a diminished representation of relative reward. As shown in , the effect of relative reward is similar, although weaker, in monkey T (black arrow).

Quantifying LIP Dynamics: Absolute Value, Relative Value, Motion Coherence and Choice

As is evident from the qualitative evaluation above, LIP population responses are highly dynamic, representing behaviorally relevant variables to differing degrees at different times during the trial. To quantify these trends we applied a multiple-variable linear regression model to LIP activity over a sliding temporal window as described in Methods. For each LIP neuron we applied the model (equation 3) to the average firing rate over a 50 ms window that was progressively slid, in 1 ms intervals, across the duration of a trial. This generated a time vector of coefficients (β_{coh}, β_{t1}, β_{t2} and β_{choice}) for each neuron describing the influence of each factor on the mean firing rate at successive time points. Because the values of the different variables were scaled appropriately (−1 to +1), comparison of the coefficients provides an accurate comparison of the effects of each variable on the firing rate of LIP neurons.
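The sliding-window regression can be sketched as follows. This is a simplified stand-in for the paper's equation 3, applied to one neuron; the array shapes, helper name, and simulated inputs are our own assumptions.

```python
import numpy as np

def sliding_regression(rates, X, win=50, step=1):
    """Regress single-neuron firing rates on task variables in sliding windows.

    rates: (n_trials, n_ms) firing-rate matrix for one neuron.
    X: (n_trials, k) design matrix with columns scaled to [-1, 1]
       (e.g. coherence, T1 value, T2 value, choice); an intercept is added here.
    Returns an (n_windows, k) array: one coefficient vector per window."""
    n_trials, n_ms = rates.shape
    Xd = np.column_stack([np.ones(n_trials), X])
    betas = []
    for t0 in range(0, n_ms - win + 1, step):
        y = rates[:, t0:t0 + win].mean(axis=1)      # mean rate in this window
        b, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # ordinary least squares
        betas.append(b[1:])                         # drop the intercept
    return np.array(betas)
```

Stacking these coefficient vectors over windows yields the time courses of β_{coh}, β_{t1}, β_{t2} and β_{choice} described in the text.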

plot the mean regression coefficients (± s.e.m.) across neurons for β_{coh} (black), β_{t1} (red), β_{t2} (blue) and β_{choice} (green) as a function of time for monkeys A and T, respectively. The same basic trends are evident in the two monkeys, although the coefficients are smaller and more variable in monkey T. The smaller coefficients in monkey T result from the lower overall firing rate modulation (see –); the greater variance results partly from the smaller sample size (monkey A: n = 51; monkey T: n = 31) and partly from greater intrinsic variability between neurons in this animal.

The quantitative data confirm the general impressions derived from qualitative inspection of the average firing rates in –. The first variables to be reflected in the dynamics of LIP firing rates are the absolute reward value of the target in the RF (β_{t1}) and the value of the target outside the RF (β_{t2}), which indicates the effect of “relative reward” on firing rate. The effect of T1 value (red curve) rises with very short latency (~100 msec for monkey T; even faster for monkey A); the effect of T2 value (blue curve) arises more slowly, but is clearly present in both animals by the time of onset of the motion stimulus. The sign of β_{t2} is predominantly negative because a high reward value for T2 decreases the probability of a T1 choice, and thus decreases the firing rate of the LIP neuron. For both animals, a small but significant reversal in the sign of β_{t2} is present later in the trial—during the delay period for monkey A and late in the motion period for monkey T. Notably, both data sets also exhibit a significant but small effect of β_{t1} (absolute reward) throughout the delay period, in contrast to the report of Dorris and Glimcher [32]. This is a significant observation that we will consider further in the Discussion.

Following onset of the motion stimulus, the effects of motion coherence (β_{coh}, gray curve) and behavioral choice (β_{choice}, green curve) arise—essentially simultaneously given the time resolution of our analysis—with a latency of approximately 200–250 msec, as reported previously [22], [49], [51], [52]. Thus the decision appears to begin forming in the system as soon as evidence about the direction of stimulus motion is present in LIP. Interestingly, the effect of motion coherence abates near the end of the motion period and is completely absent during the delay period. Under the conditions of our experiment, therefore, information about the stimulus seems to be discarded once the decision is formed, consistent with previous observations by Roitman and Shadlen ([22]; their ). As the effects of coherence and target value diminish during the delay period, the effect of choice continues to grow, reaching its peak immediately before the operant saccade. For both monkeys, the peak effects of choice near the end of the trial are nearly equal to the peak effects of absolute value near the beginning.

Quantitatively, our coherence effects, although highly significant, are smaller than those reported in previous studies of LIP. Shadlen and Newsome [49] reported that a range of coherence from 0% to 51.2% modulated LIP activity by 2.7 spikes/sec for T1 choices and 4.2 spikes/sec for T2 choices. Based on our regression model of LIP activity (Equation 3; β_{coh}), we calculate that the range of coherences employed in our study (0%–48%) modulated LIP activity by 2.0 spikes/sec in monkey A and by 0.78 spikes/sec in monkey T. (Because we fit the data with a single model (eq. 3), we did not obtain separate estimates for T1 and T2 choices.) Roitman and Shadlen [22] reported substantially larger modulations for the same coherence range: 13.2 and 5.2 spikes per second for T1 and T2 choices, respectively (their ).

Possible Effects of Eye Movements

The operant saccades to T1 or T2 targets can vary slightly from trial to trial in latency, amplitude, velocity and accuracy. Thus it is possible that these small variations in saccade parameters might account for the change in neural response we have associated with absolute value, relative value, and motion coherence. To assess this possibility, we extended our linear regression model to incorporate various parameters of the operant saccade. For each trial, we calculated five parameters from the stored eye position traces: latency, amplitude, accuracy, maximum speed and duration. We included these factors, along with factors for absolute value, relative value, motion coherence and choice, in an extended regression model given by equation 4 (Methods). We fitted this model, and the original model as well (equation 3), separately to the mean firing rate during three trial epochs: reward cue (250–500 ms), motion (500–1000 ms), and late delay (1000–1550 ms). For all epochs in each monkey, the average values of β_{coh}, β_{t1}, β_{t2} and β_{choice} were unaffected by inclusion of the saccade parameters in the regression model (paired t-test, p>0.05). Coefficient values sometimes changed significantly for individual experiments after including the saccade parameters in the model, but the direction of the change was not systematic (values could increase as well as decrease) and changes occurred rarely (reward epoch β_{t1}: 4.87%, β_{t2}: 3.65% of cells; motion epoch β_{t1}: 4.87%, β_{t2}: 3.65%, β_{coh}: 6.09%, β_{choice}: 24.29% of cells; delay epoch β_{t1}: 7.35%, β_{t2}: 10.9%, β_{coh}: 1.21%, β_{choice}: 8.5% of cells). We therefore conclude that variation in saccade metrics does not explain the response modulation accompanying variations in T1 value, T2 value and motion coherence.
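The logic of this control, comparing task-variable coefficients with and without saccade metrics as nuisance regressors, can be sketched as follows; the design matrices, helper name, and simulated data are our own simplification, not the paper's equations 3 and 4.

```python
import numpy as np

def coef_with_nuisance(y, X_task, X_sacc=None):
    """OLS coefficients for the task variables, optionally controlling for
    saccade metrics (latency, amplitude, accuracy, speed, duration) by
    including them as additional regressors. Returns task coefficients only."""
    cols = [np.ones(len(y)), X_task]
    if X_sacc is not None:
        cols.append(X_sacc)
    Xd = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return b[1:1 + X_task.shape[1]]   # skip intercept, drop nuisance terms

# Simulated epoch: rates depend on 4 task variables, not on 5 saccade metrics.
rng = np.random.default_rng(0)
X_task = rng.normal(size=(500, 4))   # coherence, T1 value, T2 value, choice
X_sacc = rng.normal(size=(500, 5))   # trial-by-trial saccade parameters
y = 5 + X_task @ np.array([2.0, 1.0, -1.0, 3.0]) + 0.5 * rng.normal(size=500)
b_plain = coef_with_nuisance(y, X_task)
b_ctrl = coef_with_nuisance(y, X_task, X_sacc)
```

When the saccade metrics carry no systematic information about firing rate, the task coefficients are essentially unchanged by their inclusion, which is the pattern reported above.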

Do Individual LIP Neurons Integrate Sensory and Value Information?

The data in show that LIP neurons, on average, are influenced simultaneously by several variables—absolute value, relative value and motion coherence—and that the relative influence of these variables changes dynamically during the trial. The averaged data presented thus far, however, do not address the issue of whether these variables are similarly mixed at the level of single neurons, or whether population multiplexing emerges from averaging across neurons which are individually more selective. To address this issue, we analyzed data within the late motion (750–1000 ms) and early delay epochs (1000–1300 ms) to determine how many neurons exhibited significant regression coefficients for one factor alone, any two factors, or all three factors. depicts the results for the two epochs; data from monkeys A and T are shown in blue and red, respectively. In the late motion epoch a substantial number of neurons in both monkeys were influenced by only one factor, but a roughly equal number of neurons represented multiple factors simultaneously. By the early delay epoch, however, most neurons were influenced by multiple factors. Evidently, the dynamic multiplexing of signals in the average data is characteristic of single LIP neurons as well.

Relation to the Integrator/Accumulator Model of Decision-Making

For the motion discrimination task with balanced rewards, both psychophysical performance and neural activity in LIP have been modeled by a process in which noisy information is integrated over time [22], [23], [52]. In these models, temporally varying motion information originating in visual area MT is accumulated by competing pools of LIP neurons. In reaction time experiments, a response is triggered when one of the accumulators reaches a bound. In experiments in which the duration of the integration period is fixed by the experimenter, as in the present study, two possibilities have been discussed. According to the first [23], [52], a bound is still used, and the response is determined by the accumulator that reaches the bound first. According to the second (also considered by [11], [12], [15], [23], [42]), the state of the accumulators continues to evolve until the go cue is presented, at which time the accumulator with the largest activation is selected. We couch the following discussion in terms of the first of these two possibilities, returning to the second possibility below.

Applying the bounded integration model to the behavioral paradigm of our experiment, as sketched in , two pools of LIP neurons, representing the leftward and rightward saccade targets, would accumulate information from pools of leftward and rightward direction selective neurons in MT. A decision would be reached when the accumulated signal in one pool of LIP neurons reaches the bound.

The accumulation process is schematized by the cartoon of . This trace illustrates an idealized average firing rate for one pool of LIP neurons under balanced reward conditions. LIP activity departs from steady state shortly following the onset of the motion stimulus (time 0), integrating incoming motion information until a bound (dashed line) is reached. Under balanced reward conditions, the two accumulators compete on equal footing (the other accumulator is not shown), and the outcome of the decision process is therefore determined by the relative strength of the motion input to the two LIP accumulators plus stochastic variability in the sensory evidence and in the accumulation process itself. In the unbalanced reward conditions (HL and LH), decisions are biased strongly toward the higher value target (), but the neural mechanisms underlying this behavioral bias are unknown.

illustrate three possible mechanisms suggested by Diederich and Busemeyer [1] that could account for the choice bias in the unbalanced reward conditions. The first possibility () is that the reward cue produces an offset in the initial value of the accumulator, granting a relative advantage to the accumulator corresponding to the high value target. In the HL condition, for example, the high value target is in the RF of the LIP pool under study, and the offset is thus positive relative to the other accumulator, which is in the LH condition (compare the black and green traces). The accumulator with the positive offset will therefore tend to reach the bound sooner, resulting in more choices of the RF target in the HL condition. Conversely, the RF target will be at a relative disadvantage in the LH condition (green trace), resulting in more choices of the non-RF target. A second possibility () is that the reward information has no effect on the starting point of the accumulation process, but rather affects the drift rate of the diffusion process by contributing an additional input to the accumulator when the high value target is in the RF (black trace) and/or a negative input to the accumulator when the low value target is in the RF (green trace). An effect of payoff information on drift rate would increase the slope of the accumulator activation curve for the HL condition and/or decrease the slope for the LH condition. Both effects would increase the likelihood of a choice of the high value target. A third possibility () is that the reward information affects the bound, not the state of the accumulators. Thus the bound would be lowered when the high value target is positioned in the RF (HL bound, ) and raised when the low value target is in the RF. These potential mechanisms, of course, are not mutually exclusive, nor are they exhaustive.
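The three candidate mechanisms can be illustrated with a toy race simulation. All parameter values below are arbitrary choices of ours for illustration, not fits to the data; each manipulation (starting-point offset, added drift, or lowered bound for the high-value accumulator) increases the fraction of high-value-target choices.

```python
import numpy as np

def race(p_offset=0.0, drift_bonus=0.0, bound_hv=0.5, coh=0.0,
         alpha=0.3, sigma_w=0.5, n_steps=500, n_trials=4000, seed=1):
    """Fraction of trials on which the high-value accumulator wins a race to
    the bound (the low-value accumulator's bound is fixed at 0.5). p_offset,
    drift_bonus and bound_hv implement the three candidate reward mechanisms:
    starting point, drift rate, and bound, respectively."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    dW = sigma_w * np.sqrt(dt)
    # Accumulator paths: drift plus within-trial Gaussian noise, cumulated.
    a_hv = p_offset + np.cumsum(
        (alpha * coh + drift_bonus) * dt
        + dW * rng.normal(size=(n_trials, n_steps)), axis=1)
    a_lv = np.cumsum(
        -alpha * coh * dt + dW * rng.normal(size=(n_trials, n_steps)), axis=1)
    # First bound-crossing time per trial (n_steps means "never crossed").
    hit_hv = np.where((a_hv >= bound_hv).any(1),
                      (a_hv >= bound_hv).argmax(1), n_steps)
    hit_lv = np.where((a_lv >= 0.5).any(1), (a_lv >= 0.5).argmax(1), n_steps)
    undecided = (hit_hv == n_steps) & (hit_lv == n_steps)
    # Win by reaching the bound first, or by larger final activation.
    wins = (hit_hv < hit_lv) | (undecided & (a_hv[:, -1] > a_lv[:, -1]))
    return wins.mean()
```

At zero coherence the unbiased race is near 50/50; any of the three biases pushes choices toward the high-value target, which is why behavior alone cannot distinguish the mechanisms.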

Our LIP data allow us to evaluate contrasting predictions of the first two candidate mechanisms. illustrates data from monkey A and monkey T that are directly analogous to the idealized traces for the HL and LH conditions in . These data are for T1 choices and appeared previously in and ; the key elements of the data are reproduced here for ease of comparison, focusing on the motion presentation epoch (time 500–1000 ms) when the accumulation process actually occurs. For both monkeys, it is clear that the traces are offset with the expected sign during the first 200 milliseconds of the motion epoch, confirming the prediction of the “offset” mechanism illustrated in .

The data in do not conform to the prediction of the “drift rate” mechanism in . In fact, the slope appears *shallower* for the HL condition compared to the LH condition. However, these traces are averaged across all motion coherences for each animal (T1 choices only). The HL traces are thus enriched in low coherence stimuli compared to the LH traces because of the strong behavioral bias toward T1 choices in the HL condition. In the LH condition, there are fewer T1 choices overall, and these T1 choices tend to occur when the motion information is sufficiently strong (high positive coherences) to override the reward bias. Strong positive coherences will drive the accumulation process more rapidly than weaker coherences, leading to the slope effect observed in .

To factor out the effect of coherence on the accumulation slopes, we first normalized firing rates within each stimulus condition (signed coherence) before averaging across trials to obtain PSTHs for each reward condition (see Methods for details). To visualize the normalized data, we then averaged the resulting PSTHs for a given coherence across the population of neurons for the HL and LH reward conditions. show the results of this analysis for +48% coherence, which resulted in T1 choices on nearly every trial for both monkeys (). On the horizontal axis, the trials are aligned to the time of the ‘dip’ measured for each neuron (see Methods). On the vertical axis, all firing rates begin at zero at the dip, as described in Methods. Thus the curves illustrate the accumulation process for the HL (black) and LH (green) reward conditions from the time of the dip to the end of the measurement window (defined in Methods). The basic result is clear and somewhat surprising, even though the data are noisier for monkey T (due in part to the smaller number of neurons contributing to the analysis). The traces for the two reward conditions are indistinguishable for the first 200 milliseconds following the dip, contradicting the prediction of the drift rate mechanism in . Similar trends are evident when the data are averaged across all positive coherences as illustrated in . The slope for HL still becomes shallower than the slope for LH, but only toward the end of the motion integration period. To confirm this impression statistically, we measured the slope of the LIP PSTHs in the HL and LH conditions for each neuron, after averaging traces like those in across all positive coherences for each cell (see Methods). During the first 200 milliseconds following the dip, there was no significant difference in the distribution of slopes in the HL and LH conditions for either monkey (paired t-test, p>0.05 for both monkeys). When slopes were calculated across the entire motion integration period (as defined in Methods), however, the distributions differed significantly between the HL and LH conditions in both monkeys, with slopes being *shallower* in the HL condition (paired t-test, p<0.002 for monkey A, p<0.02 for monkey T).
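The slope measurement described above can be sketched as follows. The dip alignment, zeroing at the dip, and the 200 ms early window follow the text; the implementation itself is our own minimal guess, not the paper's analysis code.

```python
import numpy as np

def early_slope(psth, dip_idx, win_ms=200):
    """Slope of a dip-aligned PSTH over the first `win_ms` ms after the dip.

    psth: 1-D firing-rate trace sampled at 1 ms resolution.
    dip_idx: sample index of the measured dip for this neuron.
    The trace is zeroed at the dip, then fit with a least-squares line;
    the returned value is in firing-rate units per ms."""
    seg = psth[dip_idx:dip_idx + win_ms] - psth[dip_idx]
    t = np.arange(len(seg))
    return np.polyfit(t, seg, 1)[0]   # slope of the fitted line
```

Applying this per neuron in the HL and LH conditions, then comparing the two slope distributions with a paired t-test, reproduces the logic of the statistical comparison above.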

The evidence in and provides direct support for the view that the hypothesized LIP accumulator starts higher in the HL condition than in the LH condition, and that the drift rate of the accumulators is initially unaffected by the reward condition. As we shall discuss more fully below, the shallower slope for the HL condition toward the end of the integration period is consistent with the presence of an integration bound, which is reached sooner in the HL than in the LH condition. With this encouragement, we conducted mathematical analysis and simulations that we now describe to determine whether a bounded integration model can account for our experimental data—both behavioral and physiological.

Mathematical Analysis and Simulation

For several reasons, the analysis above points toward a bounded integration model, with relative reward affecting the starting point of the integration process. However, the behavioral data pose an immediate challenge to the bounded integration model. The presence of the bound renders an integration process imperfect because it limits the amount of information accumulated, and this can produce distortion in the pattern of behavioral results. This distortion is most obvious if we consider the HH vs. LL conditions in the data for monkey A. These data indicate that the accumulators start closer to the bound in HH than they do in the LL condition. In that case, the bound is reached sooner on average in the HH condition; less information is therefore integrated, with the result that behavioral performance should be less accurate. Yet there is no difference in the behavioral performance between the HH and the LL conditions ().

To address this issue, we must consider two distinct possible sources of variability in the decision process. The first of these—and the one generally receiving the greater emphasis in the literature—is moment-by-moment noise in the input to the accumulators. Let us consider an accumulator model with two accumulators racing to a decision bound. One can characterize the input to each accumulator by the following simple equation:

Equation 5: a(t+1) = a(t) + αC + σ_{w}η(t)

where *a(t)* represents the activation of the accumulator at time t, α is an integration rate parameter (it also indexes sensitivity to the stimulus), C represents the coherence such that positive values excite the accumulator, η(t) represents a sample of noise from the standard normal distribution taken at time t, and σ_{w} is a scale factor representing the standard deviation of the within-trial, moment-by-moment noise in the integration process. For our case, we are considering a situation in which there are two accumulators, one for each alternative. Equation 5 applies to the accumulator corresponding to the neuron recorded in the physiological experiment. For the other accumulator, C is replaced by −C, so that values exciting one accumulator inhibit the other. For such an accumulator model, the distortions discussed above arise, and preliminary simulations (not shown) resulted in very poor fits to the behavioral data.

While some models include only the moment-by-moment variability discussed above, others include a second source of variability, namely between-trial variability in the strength of the sensory evidence reaching the accumulators. This idea was first employed by Ratcliff [5] in accounting for human behavioral data, and has since been incorporated in many other models, including the LATER model, which has been used to account for the correlation between the slope of activation in FEF and the latency of eye movement responses [18], [53], [54]. In our case, we capture between-trial variability in the motion-dependent input signal to the accumulators by assuming that the value of C is perturbed, for the duration of a whole trial, by a sample from the standard normal distribution scaled by σ_{b} (the between-trial standard deviation parameter), so that the integration equation becomes:

Equation 6: a(t+1) = a(t) + αC′ + σ_{w}η(t)

where *C′* = *C* + σ_{b}η represents the perturbed value of *C*. For the other accumulator we replace *C′* with *−C′*, so that the same perturbation affects the input to both accumulators. Importantly, we do not ascribe this between-trial variability to any particular cortical area or processing stage. It may originate in the motion output of MT (due, for example, to the stochastic stimuli employed in these experiments) or at any additional stage of the pathways linking MT to LIP.

A consequence of between-trial variability is that the accuracy of the outcome of the information integration process is less dependent on the duration of integration. It can be shown that the mean of the accumulated sensory information is a simple linear function of *t*,

Equation 7: mean(t) = aCt

while the standard deviation in the accumulated information after *t* seconds is given by

Equation 8: σ(t) = √(σ_{w}²t + a²σ_{b}²t²)

As a result, the signal-to-noise ratio can be expressed as:

Equation 9: SNR(t) = aCt/σ(t) = aC/√(σ_{w}²/t + a²σ_{b}²)

Two points follow from this equation. First, if between-trial variability is high relative to within-trial variability, the signal-to-noise ratio can easily be dominated by the between-trial variability. Second, as time goes by, the relative importance of the within-trial variance decreases. Thus, if between-trial variability is relatively high and the accumulation process starts far enough from the decision bound, moving the starting point even further from the bound makes very little difference in the accuracy of behavioral choices. Based on this insight, our bounded integration model incorporates the assumption that between-trial variability is relatively high.
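This saturation can be checked numerically. The sketch below assumes the accumulated evidence has mean aCt and standard deviation √(σ_{w}²t + a²σ_{b}²t²), consistent with the description above; the parameter values are hypothetical, chosen only so that between-trial variability dominates:

```python
import math

def snr(t, C, a, sigma_w, sigma_b):
    """SNR of accumulated evidence after t seconds, assuming mean a*C*t and
    standard deviation sqrt(sigma_w**2 * t + a**2 * sigma_b**2 * t**2)."""
    return (a * C * t) / math.sqrt(sigma_w**2 * t + a**2 * sigma_b**2 * t**2)

# Hypothetical values chosen so that between-trial variability dominates.
C, a, sigma_w, sigma_b = 0.128, 1.0, 0.5, 2.0

# SNR grows with t but quickly saturates near its asymptote C / sigma_b:
# beyond a point, integrating longer (or starting the accumulation farther
# from the bound) buys almost no additional accuracy.
for t in (0.1, 0.5, 1.0, 5.0):
    print(f"t = {t:3.1f} s: SNR = {snr(t, C, a, sigma_w, sigma_b):.4f}")
print(f"asymptote C/sigma_b = {C / sigma_b:.4f}")
```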

Simulation Model Details

We simulate data from monkey A, for whom we have the largest and cleanest data set. As we shall see, it is possible to provide a good qualitative fit to the data from this monkey within the framework of the ideas described above. After considering monkey A, we will return to consider the data from monkey T, which is both noisier and more perplexing in certain ways.

Our model shares many features with the LIP portion of the model presented by Mazurek et al. [23], but we do not directly simulate the sensory inputs from MT. Rather, we simply consider the input to the accumulators to have both within- and between-trial variability, as indicated above. Our simulation incorporates the following features:

- The starting point of each of the two accumulators is affected by both relative and absolute reward. In our simulations, the starting point of the T1 accumulator is initialized to the empirically observed activation level at the time the motion stimulus begins to affect activation (the “dip”), approximately 200 msec after motion onset. The starting value assigned to the T2 accumulator is based on the empirically measured T1 values, assuming that the values for T2 are symmetric to those measured for T1: (T2 HL = T1 LH, T2 LH = T1 HL; T2 LL = T1 LL, T2 HH = T1 HH).
- The information accumulation process is affected by both within- and between-trial variability and also by an urgency signal. The activation of the T1 accumulator is updated according to:
Equation 10: x(t+Δt) = x(t) + aC′Δt + bΔt + σ_{w}√Δt·η(t)

where, as previously discussed, *C′* is equal to the stimulus coherence C perturbed by a sample of Gaussian noise with standard deviation *σ*_{b}. For the T2 accumulator, *C′* is replaced with *–C′*. Within-trial, moment-to-moment Gaussian noise with standard deviation *σ*_{w} is added independently to each of the two accumulators. The parameter *b* is a positive constant reflecting an overall tendency for activation to increase during the motion period, corresponding to the “urgency” signal of Mazurek et al. [23] and other investigators. (See the caption of for parameter values.)

- Information integration occurs for a period of time equal to the duration of the motion stimulus, unless the bound is reached before the end of the integration period (see below). In comparing the simulation to data, we treat integration as beginning after a 200 msec propagation delay, so that the simulated processing interval corresponds approximately to the period from 200 to 700 msec after stimulus onset.
- Integration is bounded, so that when the activation of one accumulator reaches the bound value θ, integration of sensory information in both accumulators ceases. The bound is viewed, not as an upper limit on neural activity, but as an internal benchmark on activation, such that when this benchmark is reached, the process of integration ceases, affecting both accumulators equally. Although integration ceases when the bound is reached, the influence of the urgency signal continues until the end of the trial.
- The behavioral choice is assigned to the accumulator whose activation value is highest at the end of the motion period. In cases where the bound is reached, the winning accumulator always corresponds to the accumulator that reached the bound and caused integration to cease.
- Parameter values were estimated according to the following procedure. Parameters *a* and *σ*_{b} were first approximated by selecting values that permitted a good fit to the behavioral data, ignoring the effect of the bound and of the within-trial variability *σ*_{w}, which has a negligible effect after 500 msec of integration. These approximate values could be estimated directly, without the need to run the simulation. Estimation of the urgency signal, *b*, and the activation bound value, θ, required searching the parameter space via simulation. Simulation results reported in the figure were based on 25,000 simulated trials for each of the 52 combinations of stimulus and reward conditions.
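The trial loop described by the features above can be sketched as follows. This is a minimal illustration, not the fitted model: all numeric parameter values are hypothetical placeholders (the fitted values appear in the figure caption), and the starting points are passed in directly rather than taken from the empirical "dip" measurements.

```python
import math
import random

def simulate_trial(C, s1, s2, a=1.0, b=0.8, sigma_w=0.1, sigma_b=0.25,
                   theta=1.0, dt=0.005, duration=0.5, rng=random):
    """One trial of the bounded two-accumulator model (cf. Equation 10).

    C        signed motion coherence (positive favors T1)
    s1, s2   reward-dependent starting points of the T1/T2 accumulators
    theta    activation bound: when either accumulator reaches it, evidence
             integration stops in BOTH accumulators, but the urgency
             signal b continues to drive them until the trial ends
    Returns 1 for a T1 choice, 2 for a T2 choice.
    """
    Cp = C + sigma_b * rng.gauss(0.0, 1.0)  # between-trial perturbation of C
    x1, x2 = s1, s2
    integrating = True
    for _ in range(int(duration / dt)):
        if integrating:
            x1 += a * Cp * dt + sigma_w * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            x2 += -a * Cp * dt + sigma_w * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            if x1 >= theta or x2 >= theta:
                integrating = False      # bound reached: stop integrating
        x1 += b * dt                     # urgency acts throughout the trial
        x2 += b * dt
    return 1 if x1 > x2 else 2           # choose the more active accumulator

rng = random.Random(1)
n = 4000
# Strong rightward motion, equal starting points: mostly T1 choices.
p_t1 = sum(simulate_trial(0.512, 0.0, 0.0, rng=rng) == 1 for _ in range(n)) / n
# Zero coherence, head start for T1 (high relative reward): biased choices.
p_bias = sum(simulate_trial(0.0, 0.2, 0.0, rng=rng) == 1 for _ in range(n)) / n
print(f"P(T1 | C=51.2%) = {p_t1:.3f}, P(T1 | C=0, reward bias) = {p_bias:.3f}")
```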

Simulation Results

shows the results of the simulation (right column), along with the comparable psychophysical and neural data from monkey A (left column). The model captures several important features of both the behavioral and neural data. We begin with a consideration of the behavioral performance: the probability of a T1 choice as a function of coherence is identical in the HH and LL conditions (), even though the accumulation process starts at a higher level in the HH than in the LL condition (); the choice curves for the HL and LH conditions are simply shifted to the left or right compared to the HH and LL curves, in the simulation as in the behavioral data. These patterns are expected based on our analysis above. The high between-trial variability in the drift parameter is the dominant source of variability affecting the choice outcome, so that the outcome is relatively immune to the location of the bound. In essence, the slopes of the behavioral curves depend on the ratio of the parameters *a* and *σ*_{b}, and the bound on integration has relatively little importance.

The relative positions of the curves along the x-axis reflect the difference in the starting values of the accumulators, which persists throughout the integration period. Ignoring the bound, and taking the choice to be determined by the accumulator that is more active at the end of the motion period, the magnitude of the shift can be directly calculated from the ratio *S*_{d}/σ(500), where *S*_{d} is the difference in the starting points of the accumulators (*S*_{d} = S1–S2) and σ(500) is the standard deviation of the difference in activation of the two accumulators at the end of the motion period (see Equation 8). Once again, because of the high between-trial variability, the presence of the bound and the within-trial variability have a negligible effect on behavior, with the result that the curves are simply shifted left or right by an amount determined by the above ratio.
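Under the same no-bound Gaussian approximation, the reward-induced bias and the horizontal shift of the psychometric function can be computed directly. The values below are again hypothetical, for illustration only:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical values, for illustration only.
a, t = 1.0, 0.5            # drift scale and integration time (seconds)
sigma_w, sigma_b = 0.1, 0.25
s_d = 0.2                  # starting-point difference S_d = S1 - S2

# Standard deviation of the difference in the two accumulators' activations:
# the shared between-trial perturbation enters with opposite signs, while
# the within-trial noise is independent in the two accumulators.
sigma_diff = sqrt(2 * sigma_w**2 * t + 4 * a**2 * sigma_b**2 * t**2)

def p_t1(C):
    """P(T1 accumulator is higher at time t), ignoring the bound."""
    return phi((s_d + 2 * a * C * t) / sigma_diff)

# The starting-point offset shifts the psychometric function: the point of
# subjective equality moves from C = 0 to C = -s_d / (2 * a * t), and the
# bias at 0% coherence is phi(s_d / sigma_diff).
pse = -s_d / (2 * a * t)
print(f"P(T1 | C=0) = {p_t1(0.0):.3f}, P(T1 | C=PSE) = {p_t1(pse):.3f}")
```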

Because the bound does not impact behavior, an alternative account of the behavioral data would be to suppose that there is no bound on the integration process, and that the monkey simply chooses the most active accumulator at the time he receives the signal to respond. We would not rule out such an account, and we consider such a possibility further in the Discussion. However, including a bound on integration helps us to account for many features of the physiological data, which we now consider:

- For trials ending in T1 choices, the model captures the negative acceleration (i.e. saturation) of the slopes of the neural activation curves near the end of the motion integration period (). This negative acceleration reflects the effects of reaching the decision bound, which occurs on many but not all trials. For the activation bound parameter value that was used in the simulation, one of the two accumulators reaches a bound on approximately 70% of the trials on average, although the fraction increases with the absolute value of the stimulus coherence C and when the direction of motion is congruent with reward bias. The bound is reached at different times on different trials, accounting for the gradual flattening of all four activation curves for T1 choices.
- In the model as in the data, the neural activation curves for T1 choices in the HL and LH conditions converge noticeably but not completely during the motion integration period (, compare the solid green and solid black curves). The difference in the T1 activation curves is due in part to the different mixture of coherences contributing to T1 choice trials, as discussed above in conjunction with , and also to the fact that activations tend to reach the T1 bound, and thus stop growing, sooner (and more often) on average for HL than for LH choices. Convergence to exactly the same level would be expected if the bound were reached on all T1 choice trials. In the model, however, the bound is not actually reached on all trials (point 4 above); on these trials the decision is simply cast in favor of the accumulator with the higher activation level (point 5 above). Thus the T1 activation curves in the model tend toward convergence in the HL and LH conditions without actually reaching the same level.
- In both the model and the data, the HL and LH activation curves converge for T2 choices as well (). The same factors that affect convergence of the T1 choice curves are also in play in the T2 choice curves.
- In the model, there is a subtle trend toward convergence of the HH and LL curves for T1 choices (); this effect is due to the fact that accumulation terminates at the bound for the T1 accumulator sooner on average (and on a higher number of trials) in the HH condition. The effect is subtle because the initial offset between the HH and LL curves is smaller than between the HL and LH curves, resulting in a smaller difference in termination times, and because there is no difference in the mix of coherence values terminating in T1 choice for the HH and LL conditions. In the data, the HH and LL curves are similar in shape for T1 choices, as in the simulation; the difference between the curves seems slightly smaller toward the end of integration than at the beginning. While the effect in the data is unlikely to be statistically reliable, the subtlety of the effect in the simulation is such that a statistically reliable effect in the data would not be expected.
- The model reproduces the rising slope of the accumulation curves for T2 choices near the end of the motion period in all four reward conditions (, dotted curves). This is due to the “urgency” signal represented by parameter *b* in Equation 10, which continues to affect activation in the model after integration stops. The urgency signal captures the intuition that a premium exists on reaching decisions within a finite time, even on low coherence trials when evidence may accumulate very slowly [23]. Thus both accumulators are driven toward their bounds at a slow but steady rate throughout the trial, independent of evidence accumulation. This factor is less apparent earlier in the trial, where activations reflect both the stimulus effect and the urgency signal.

Overall, the simulation captures both the behavioral data and most of the main features of the physiological data from monkey A. Thus, for this monkey at least, the behavioral and physiological findings appear to be consistent with the hypothesis that reward affects the starting point of an integration process that is subject to high between-trial variability and that employs a decision bound placed such that it is reached only on a subset of trials.

We now consider briefly whether the model described here can account for the data from monkey T. As indicated earlier, several features of monkey T's data are consistent with monkey A's data and thus with the model: 1) both absolute and relative reward effects are present (–), 2) the offset in the starting point of the accumulation process is clear (), and 3) the dynamics of the accumulation process are similar to those of monkey A once the effects of coherence are controlled for (). The most perplexing aspect of monkey T's data is seen most clearly in . The firing rate trace for the HH condition (solid red curve) is initially lower than the trace for the HL condition (solid black curve), consistent with the relative reward effect. About 200 msec into the motion viewing period, however, the traces reverse order, with the average firing rate becoming higher for the HH condition, and the reversal holds for the duration of the trial. This crossover would not be expected in our model; as in the data and the simulation of monkey A, we would expect the curve for the HH condition to converge toward, but not cross, the curve for the HL condition. The crossover is anomalous not only from the point of view of the bounded integration model, but also from the point of view of the overall pattern of findings, in which high relative reward of the RF target is typically associated with greater LIP activity.

A variety of approaches might be taken to account for this perplexing result from monkey T. For example, the data could be explained if we relax the assumption that the integration bound is kept the same across all of the reward conditions. If the bound were adjusted upward in the HH condition, keeping it low (and approximately constant) in all of the other reward conditions, the model might then provide a reasonably good approximate account of all facets of monkey T's data. We emphasize that this is only one possible account, and we do not specifically wish to advocate for it. We mention it only to make the point that there is at least one way to explain the anomalous results seen in with a bounded integration framework.