To parameterize performance on the direction discrimination task, we must express *P*(*T*) explicitly in terms of motion strength or coherence *C* (the fraction of random dots moving coherently in one direction) and other parameters that enter the DDM, as well as viewing time *T*. In §2 we observed that the PMF predicted by the DDM depends on drift rate *A*(*t*) and noise level *c* via equations (7–9). We now describe a model that relates these quantities to neural firing rates and to *C*.

Following [21, 22], and drawing on psychophysical, electrophysiological and computational studies of monkeys performing the motion detection task [27, 28, 32], we assume that performance is based on the accumulation, in LIP or other areas, of coherence-dependent responses from motion-sensitive neurons in area MT. We set

$$\bar{\mu} = \langle M_c \rangle - \langle M_i \rangle, \qquad \sigma^2 = \phi \left( \langle M_c \rangle + \langle M_i \rangle \right) \tag{11}$$

(a bar is added to the *μ* of [21] to distinguish it from the function *μ*(*t*) of §2). Here *M*_{c} and *M*_{i} represent the accumulated responses of pools of MT neurons that respectively encode the correct and incorrect directions of motion; *φ* is a dimensionless constant derived from neural recordings that scales the variance by the mean, and angle brackets ⟨·⟩ indicate expected values. Accumulated MT responses are modeled as (*r*_{0} + *aC*^{m})*T*, where *r*_{0} is the baseline firing rate (in spikes/s) for zero coherence and *C* is motion coherence (which for simplicity is set to zero for sensors tuned for the incorrect direction). The mean responses in (11) therefore depend upon *C* and *T* as follows:

$$\langle M_c \rangle = (r_0 + aC^m)\,T, \qquad \langle M_i \rangle = r_0 T. \tag{12}$$

Note that the expected difference ⟨*M*_{c}⟩ − ⟨*M*_{i}⟩ = *aC*^{m}*T* is directly proportional to *C*^{m}, i.e. the drift rate *A* = *aC*^{m} reflects the combined contributions of excited responses from the correct sensors and inhibited responses from the incorrect (opposite) sensors.

The neural parameters can be derived from direct recordings, and following [21, 22], we take *r*_{0} = 10 spikes/s and *φ* = 0.3 for the remainder of this paper. The motion coherence *C* of the stimulus is set by the experimenter, and the drift rate scale factor *a* and exponent *m* will be fitted to behavioral data. All the parameters in (10) being thus determined via (11–12), the integral can be evaluated explicitly as in (9) to yield

$$P(T) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\bar{\mu}}{\sqrt{2\sigma^2}}\right)\right] = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{aC^m \sqrt{T}}{\sqrt{2\phi\,(2r_0 + aC^m)}}\right)\right]. \tag{13}$$

3.1 Effect of stimulus strength

The drift rate scale factor *a* has the most striking effect of any parameter in this paper, as illustrated in the accompanying figure, where it varies from 2 to 20 in increments of 2 for *C* = 1 and *m* = 1. As we show in §4, an increase in *a* also accounts for most of the improvement in performance seen in the data. The middle panel shows the rapid drop in threshold as *a* increases for four different viewing times *T*. As expected, shorter times have higher thresholds.
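The dependence of accuracy and threshold on *a* can be explored numerically. The following is a minimal sketch, assuming the constant-drift PMF takes the form *P*(*T*) = ½[1 + erf(*aC*^{m}√*T* / √(2*φ*(2*r*_{0} + *aC*^{m})))] with *r*_{0} = 10 spikes/s and *φ* = 0.3. The function names, the 75% threshold criterion and the four viewing times are illustrative assumptions, not values taken from the text.

```python
import math

R0 = 10.0   # baseline firing rate r_0 (spikes/s), as quoted in the text
PHI = 0.3   # variance-to-mean scale factor, as quoted in the text


def pmf(C, T, a, m=1.0):
    """Probability correct for constant drift A = a*C**m (assumed erf form)."""
    drift = a * C ** m
    noise_sd = math.sqrt(PHI * (2.0 * R0 + drift))  # assumed noise level c
    return 0.5 * (1.0 + math.erf(drift * math.sqrt(T) / (math.sqrt(2.0) * noise_sd)))


def threshold(T, a, m=1.0, criterion=0.75, tol=1e-9):
    """Coherence at which pmf reaches `criterion` (hypothetical definition).

    Uses bisection on C in [0, 1]; returns NaN if the criterion is not
    reached even at full coherence.
    """
    if pmf(1.0, T, a, m) < criterion:
        return float("nan")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pmf(mid, T, a, m) < criterion:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    viewing_times = (0.2, 0.6, 1.0, 1.4)  # illustrative values, in seconds
    for a in range(2, 21, 2):
        row = [round(threshold(T, a), 3) for T in viewing_times]
        print(a, round(pmf(1.0, 1.0, a), 3), row)
```

Because the PMF is monotone in *C* under this form, bisection is sufficient to locate the threshold; a different criterion only shifts the curves.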

The coherence exponent *m* allows nonlinear scaling of stimulus strength *S*. Its value is near unity for trained animals [21], as in the proportional-rate diffusion model of [23], but it can differ from unity early in training. As *m* decreases, high coherences are squeezed into the upper range of drift rates and low coherences are stretched over the lower range, implying that it is easier to distinguish among low coherences, as illustrated in the accompanying figure. The left panel shows the change in accuracy with *m*, for *a* = 2 and three coherence values. Decreases in *m* improve performance for all *C* < 1, although the effect is stronger for small *C* and performance is independent of *m* for *C* = 1. Threshold increases with *m*: higher coherences are required to achieve the same accuracy.
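The effect of the exponent can be checked with a short calculation. This sketch assumes the same constant-drift erf form of the PMF with drift *aC*^{m} and noise variance *φ*(2*r*_{0} + *aC*^{m}); the coherence values, the exponent grid and the viewing time *T* = 1 are illustrative choices rather than those used for the figure.

```python
import math

R0, PHI = 10.0, 0.3  # r_0 (spikes/s) and phi, as quoted in the text


def pmf(C, T, a, m):
    """Assumed constant-drift PMF with drift a*C**m."""
    drift = a * C ** m
    c = math.sqrt(PHI * (2.0 * R0 + drift))
    return 0.5 * (1.0 + math.erf(drift * math.sqrt(T) / (c * math.sqrt(2.0))))


def accuracy_vs_m(coherences, exponents, a=2.0, T=1.0):
    """Map each coherence to its list of accuracies over the exponent grid."""
    return {C: [pmf(C, T, a, m) for m in exponents] for C in coherences}


exponents = [0.5, 0.75, 1.0, 1.25, 1.5]           # illustrative grid
table = accuracy_vs_m([0.25, 0.5, 1.0], exponents)
for C, row in table.items():
    print(C, [round(p, 3) for p in row])
```

For *C* < 1 each row decreases as *m* grows, while the *C* = 1 row is flat because 1^{m} = 1 for every *m*.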

3.2 Power-law drift rates

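Gold and Shadlen [21, 22] modify (12) as follows:

$$\langle M_c \rangle = (r_0 + aC^m)\,T^n, \qquad \langle M_i \rangle = r_0 T^n, \tag{14}$$

introducing the exponent *n* to more closely examine the time dependence of behavioral accuracy (note that this differs from the power-law dependence of response time on stimulus strength in Piéron’s Law [23]). Under this modification, equation (13) becomes

$$P(T) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{aC^m\, T^{n/2}}{\sqrt{2\phi\,(2r_0 + aC^m)}}\right)\right]. \tag{15}$$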

For trained animals *n* is found to be near unity, implying linear accumulation of information over time as in equation (12) [21, 22]. This strategy is appropriate for the task because the visual stimulus provides motion information uniformly over time. However, earlier in training attention might wax or wane during motion viewing, resulting in values of *n* that differ from one. For example, *n* < 1 would imply that the subject extracts most of the motion information used for the decision early in stimulus viewing. Conversely, *n* > 1 implies that more attention is paid later in stimulus viewing.

We now show that the *T*^{n} time dependence in (14–15) is equivalent to a variable drift rate. To determine the functional form for *A*(*t*) that yields *T*^{n} dependence, we observe that the PMFs are determined solely by the arguments of the error function in equations (13) and (9); i.e., the ratios

$$\frac{\bar{\mu}}{\sigma} \qquad \text{and} \qquad \frac{\mu(T)}{\sqrt{\nu(T)}} \tag{16}$$

in the Gold-Shadlen and OU/DD cases respectively. Assuming that these expressions hold for all *T* = *t* ≥ 0 and equating them produces equations that determine, or at least constrain, *A*(*t*) and other parameters describing the OU/DD process.

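For example, taking λ = 0 and assuming that *ν*_{0} = 0 (a DD process with all trials starting from identical initial conditions *x*(0) = *μ*_{0}), we obtain:

$$\frac{\mu_0 + \int_0^t A(s)\,ds}{c\sqrt{t}} = \frac{aC^m\, t^{n/2}}{\sqrt{\phi\,(2r_0 + aC^m)}}, \tag{17}$$

which may be rearranged to read

$$\mu_0 + \int_0^t A(s)\,ds = \frac{c\,aC^m}{\sqrt{\phi\,(2r_0 + aC^m)}}\; t^{(n+1)/2}. \tag{18}$$

Setting *t* = 0 we find that *μ*_{0} = 0 (the initial condition should be unbiased [19]) and differentiating (18) with respect to *t* yields the power law drift rate:

$$A(t) = \frac{(n+1)\,c\,aC^m}{2\sqrt{\phi\,(2r_0 + aC^m)}}\; t^{(n-1)/2}. \tag{19}$$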

To match the two formulations given the parallel construction of the noise term above and in §2, we set

$$c = \sqrt{\phi\,(2r_0 + aC^m)} \tag{20}$$

in the LHS of (17), so that

$$A(t) = \frac{n+1}{2}\, aC^m\, t^{(n-1)/2}. \tag{21}$$

For *n* = 1 we have the constant-drift DDM with *A*(*t*) ≡ *aC*^{m}, and for any *n* < 1, *A*(*t*) has an integrable singularity at *t* = 0 and decays as a power law with increasing *t*. The accompanying figure shows examples of drift rates computed from (19) and the corresponding PMFs (15) for *n* ranging from −0.4 to 1.4 and with *C* = 1, *m* = 1 and *a* = 2, so that *c*^{2} = 6.6 (recall that *r*_{0} = 10 spikes/s and *φ* = 0.3 throughout).

With other parameters fixed, all accuracy curves cross at *t* = 1 because the argument of the error function is *aC*^{m}*T*^{(n+1)/2} (cf. (18)), although drift rates do not intersect at a common point. The *n* = 0.2 and 0.4 cases (dotted) show rapid increases in accuracy followed by a sharp leveling off, but, in addition to the singularity in the drift rate at *t* = 0, a potentially troubling result is that *P*(*T*) increases more rapidly for *T* < 1 as *n* decreases. It seems unreasonable that subjects that benefit little from longer viewing times, perhaps due to loss of attention, should outperform better integrators at short times. Since the exponent *n* appears only in the PMF as a power of *T*, this model always exhibits such “crossover” behavior. The thick solid line corresponds to constant drift (*n* = 1), and the dashed-dotted lines to values of *n* > 1.

The bottom row of the figure illustrates the crossover behavior in terms of threshold and slope. For *T* < 1 (dash-dot and solid), threshold *increases* with *n*, indicating a decrease in performance, although *n* → 1 ostensibly represents improvement. In contrast, the *T* = 1.4 s case shows a decrease in threshold as *n* increases, as expected. The interpretation of *n* thus changes with the viewing time *T* chosen for the threshold, and for most viewing times that interpretation is wrong. Note that the slopes all cross at *n* = 0, for which the psychometric function does not change with time.
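The crossover can be reproduced with a few lines of code. This is a sketch under two stated assumptions: that the matched power-law drift is *A*(*t*) = ((*n* + 1)/2) *aC*^{m} *t*^{(n−1)/2}, so that its integral over [0, *T*] is *aC*^{m}*T*^{(n+1)/2}, and that the resulting PMF is ½[1 + erf(*aC*^{m}*T*^{n/2} / √(2*φ*(2*r*_{0} + *aC*^{m})))]. The function names and the midpoint-rule integrator are illustrative.

```python
import math

R0, PHI = 10.0, 0.3  # r_0 (spikes/s) and phi, as quoted in the text


def power_law_drift(t, n, a=2.0, C=1.0, m=1.0):
    """Assumed drift A(t) = ((n+1)/2) a C^m t^((n-1)/2); singular at t = 0 for n < 1."""
    return 0.5 * (n + 1.0) * a * C ** m * t ** (0.5 * (n - 1.0))


def integrated_drift(T, n, a=2.0, C=1.0, m=1.0, steps=20000):
    """Midpoint-rule integral of A(t) over [0, T]; midpoints avoid evaluating at t = 0."""
    h = T / steps
    return h * sum(power_law_drift((k + 0.5) * h, n, a, C, m) for k in range(steps))


def power_law_pmf(T, n, a=2.0, C=1.0, m=1.0):
    """Assumed PMF whose error-function argument scales as T^(n/2)."""
    drift = a * C ** m
    c = math.sqrt(PHI * (2.0 * R0 + drift))
    return 0.5 * (1.0 + math.erf(drift * T ** (0.5 * n) / (c * math.sqrt(2.0))))


if __name__ == "__main__":
    for n in (-0.4, 0.2, 1.0, 1.4):
        print(n, [round(power_law_pmf(T, n), 3) for T in (0.5, 1.0, 1.5)])
```

All curves coincide at *T* = 1 because 1 raised to any power is 1; below that point smaller *n* gives higher accuracy, and above it the ordering reverses.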

3.3 Exponential drift rates

The temporal evolution of many neural and other biological processes can be modeled by exponential functions. Moreover, differential equations linearized near equilibria necessarily have exponential solutions, prompting us to consider drift rates of the form

$$A(t) = b\left[d + (1 - d)\,e^{-\alpha t}\right]. \tag{22}$$

These have initial and asymptotic values

$$A(0) = b, \qquad \lim_{t \to \infty} A(t) = bd, \tag{23}$$

provided *α* ≥ 0, and drift rates are constant if *d* = 1 or *α* = 0.

With zero initial bias and variance (*μ*_{0} = *ν*_{0} = 0) substitution of (22) into (9) yields:

$$P(T) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{b}{c\sqrt{2T}}\left[dT + \frac{(1-d)\left(1 - e^{-\alpha T}\right)}{\alpha}\right]\right)\right]. \tag{24}$$

The error function argument rises initially like *b*√*T*/(*c*√2) and then approaches *bd*√*T*/(*c*√2). Lowering *d* from 1 toward 0 mirrors the decreased improvement in accuracy for longer viewing times for *n* < 1 in (15). Of course, to see if such a form fits the data as well as or better than power laws, the parameters *b*, *d* and *α* defining *A* must be related to coherence, baseline firing rates, etc. via assumptions similar to those of equations (11–15). Since fully-trained animals approximate the constant DDM with *n* = 1, we set *b* = *aC*^{m} and *c*^{2} = *φ*(2*r*_{0} + *aC*^{m}) as in (11–12), thus incorporating parameters available from direct experimental observation and matching the constant drift limit to the previous model. Hence, *a* and *m* remain to account for performance changes during training, and two parameters, *d* and *α*, replace *n*.

Balancing detail with parsimony, we suggest several versions of this model. The parameter *d* can be set to 0, resulting in three parameters *a*, *m*, and *α*. Alternatively *α* can be fixed and *d* varied, but the choice of *α* remains problematic, which makes it not a true three-parameter model. Finally, all four parameters can vary, but as explained below, this typically causes overfitting problems.

The accompanying figures show examples of exponential drift rates and the corresponding PMFs (24). In one set of examples we fix *d* = 0, *a* = 2, *m* = *C* = 1, and let *α* vary from −0.4 to 2. At *t* = 0, the drift rate is *b* = *aC*^{m} for all *α*, so all PMFs have the same initial slope. This removes the crossover seen with power-law drift rates: poorer integrators with higher *α* do not outperform better ones on shorter time scales, and accuracies retain the same ordering for all *T*. Accuracies for high *α* exhibit a downturn, which may appear counterintuitive. However, this is a direct consequence of the form (22) of *A*(*t*), since noise dominates as *A*(*t*) → 0 after the exponential decays. For *d* = 0, the integrated evidence approaches a constant *b*/*α* and accuracy rises to a maximum and thereafter falls toward 50% at long times, following (24). This “rise-and-fall” shape, which cannot be produced using power laws, is useful for fitting data that require *n* < 0 in the power law model of §3.2. In contrast to the high initial accuracy for *n* < 0, the PMF for exponential drift rates starts at chance and approaches chance again for high *α*, after reaching a maximum whose value depends on initial drift and decay. The maximum occurs at *T*_{max} given by:

$$\left(1 + 2\alpha T_{\max}\right) e^{-\alpha T_{\max}} = 1, \qquad \text{i.e.} \qquad T_{\max} \approx 1.26/\alpha. \tag{25}$$

Apart from this key difference, the power-law and exponential models are broadly similar, each region of the power law *n* parameter space being reproduced in an appropriate *α* range. Thus, *α* < 0 corresponds to *n* > 1, *α* = 0 is equivalent to *n* = 1 (the standard DDM), 0 < *α* < *α*_{0} corresponds to 1 > *n* > 0, and *α* > *α*_{0} produces the eventual decrease in accuracy with time obtained for *n* < 0. The value of *α*_{0} depends on *d* and the viewing time at which decay is desired: in the examples with *d* = 0, *α*_{0} ≈ 0.7 for decreases in accuracy within two seconds. A final advantage of exponential decay is that, provided *α* ≥ 0, the drift rate is bounded by its starting value *aC*^{m} for all viewing times, thereby removing the physiologically unreasonable singularity at *T* = 0. This feature does not influence our data fits substantially, because all viewing times are greater than 300 ms early in training where *n* < 0 fits occur and greater than 105 ms throughout.

In the lower left panel of the figure, threshold increases with *α* for all times *T*, as desired. A crossover occurs for large enough *α*, in which longer times give higher thresholds, apparently countering intuition. But this results from the rise-and-fall shape noted above: for faster decay rates, the PMF drops as time increases, although longer times are still better than *T* = 0.2. Hence the exponential drift model offers much of the desired behavior and flexibility.
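The rise-and-fall shape and the location of the maximum can be illustrated numerically. The sketch below assumes an exponential drift of the form *A*(*t*) = *b*[*d* + (1 − *d*)e^{−αt}] with *b* = *aC*^{m} and noise variance *c*^{2} = *φ*(2*r*_{0} + *b*), integrated in closed form. For *d* = 0 and *α* > 0, setting the derivative of (1 − e^{−αT})/√*T* to zero gives (1 + 2*αT*)e^{−αT} = 1, which the hypothetical helper `t_max` solves by bisection.

```python
import math

R0, PHI = 10.0, 0.3  # r_0 (spikes/s) and phi, as quoted in the text


def exp_drift(t, alpha, d=0.0, b=2.0):
    """Assumed exponential drift: starts at b and tends to b*d for alpha > 0."""
    return b * (d + (1.0 - d) * math.exp(-alpha * t))


def integrated_exp_drift(T, alpha, d=0.0, b=2.0):
    """Closed-form integral of exp_drift over [0, T]."""
    if alpha == 0.0:
        return b * T  # constant-drift limit
    return b * (d * T + (1.0 - d) * (1.0 - math.exp(-alpha * T)) / alpha)


def exp_pmf(T, alpha, d=0.0, b=2.0):
    """Assumed PMF: erf of integrated drift over c * sqrt(2 T)."""
    c = math.sqrt(PHI * (2.0 * R0 + b))
    return 0.5 * (1.0 + math.erf(integrated_exp_drift(T, alpha, d, b) / (c * math.sqrt(2.0 * T))))


def t_max(alpha, tol=1e-12):
    """Viewing time of peak accuracy for d = 0, alpha > 0 (bisection on x = alpha*T)."""
    f = lambda x: (1.0 + 2.0 * x) * math.exp(-x) - 1.0
    lo, hi = 0.5, 3.0  # f(lo) > 0 > f(hi) brackets the nonzero root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / alpha


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0, 2.0):
        print(alpha, [round(exp_pmf(T, alpha), 3) for T in (0.2, 0.6, 1.0, 1.4, 2.0)])
    print("T_max for alpha = 2:", round(t_max(2.0), 3))
```

Under these assumptions the ordering of accuracies by *α* is the same at every viewing time, in contrast to the power-law case.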

In a further set of examples we fix *a* = 2 and *α* = *m* = *C* = 1 and vary the asymptotic drift rate *d* from 0.2 to 2, allowing for increases (*d* > 1) or decreases (*d* < 1) with *t* (*d* = 1 corresponds to constant drift). For *d* = 0.1 the error function argument with rapid decay becomes

$$\frac{b}{c\sqrt{2}}\left[d\sqrt{T} + \frac{1-d}{\alpha\sqrt{T}}\right], \tag{26}$$

which decreases before ultimately increasing again like *bd*√*T*/(*c*√2) as the first term dominates. One can view the function

$$\frac{1}{c\sqrt{T}}\int_0^T A(s)\,ds$$

as an integrated SNR, which in this case initially increases, then decreases, and finally increases again, providing the rise-and-fall shape. As expected, thresholds decrease as *d* increases.

Increasing drift rates are easily implemented by setting *d* > 1. In principle, *d* could also vary as a function of *C*, with rising drift rates for low *C* and falling ones for high *C*, although we do not explore this further.

3.4 An Ornstein-Uhlenbeck model

In §§ 3.2–3.3 we showed that the initial improvement in accuracy for shorter viewing times followed by leveling off can be obtained using decaying drift rates. This pattern can also appear for constant drift rates if the parameter λ ≠ 0 in (3). Returning to the general OU solution of (6–9) and setting the drift rate *A*(*t*) = *aC*^{m} and noise *c* = √(*φ*(2*r*_{0} + *aC*^{m})), we obtain

$$\mu(T) = \mu_0 e^{\lambda T} + \frac{aC^m}{\lambda}\left(e^{\lambda T} - 1\right), \qquad \nu(T) = \nu_0 e^{2\lambda T} + \frac{c^2}{2\lambda}\left(e^{2\lambda T} - 1\right). \tag{27}$$

Setting *m* = 1 and *μ*_{0} = *ν*_{0} = 0 leaves two parameters, *a* and λ, and substitution of (27) into (9) shows that accuracy depends only on |λ|, so we may take λ ≤ 0 or λ ≥ 0. (This property does not hold for *μ*_{0} or *ν*_{0} ≠ 0 or *A*(*t*) non-constant.) Here λ ≤ 0 was used, consistent with a stable process having a stochastic attractor [33].

The accompanying figure shows the effect of varying λ for fixed *aC*^{m}. The PMFs exhibit an initial sharp increase in accuracy followed by leveling off, without crossover. Also, unlike previous models, accuracies do not asymptote at 1, but at

$$\lim_{T \to \infty} P(T) = \Phi(q), \qquad q = \frac{aC^m}{c}\sqrt{\frac{2}{|\lambda|}}. \tag{28}$$

Because the standard normal cumulative distribution function Φ(*q*) is necessarily less than 1 for |λ| ≠ 0, OU processes can produce errors at the highest stimulus strengths without appeal to special procedures such as that of equation (34) in the next section.
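The dependence on |λ| and the sub-unity asymptote can be checked with a short sketch. It assumes the standard OU form d*x* = (λ*x* + *A*)d*t* + *c* d*W* with constant drift *A* = *aC*^{m}, noise variance *c*^{2} = *φ*(2*r*_{0} + *aC*^{m}) and *μ*_{0} = *ν*_{0} = 0. Under these assumptions the ratio of mean to standard deviation simplifies to (*A*/*c*)√((2/λ) tanh(λ*T*/2)), which is even in λ. The function names are illustrative.

```python
import math

R0, PHI = 10.0, 0.3  # r_0 (spikes/s) and phi, as quoted in the text


def ou_pmf(T, lam, a=2.0, C=1.0, m=1.0):
    """Assumed PMF for an OU process with constant drift and zero initial mean/variance."""
    drift = a * C ** m
    c = math.sqrt(PHI * (2.0 * R0 + drift))
    if lam == 0.0:
        snr = drift * math.sqrt(T) / c  # pure drift-diffusion limit
    else:
        # mean = (A/lam)(e^{lam T} - 1), variance = (c^2 / (2 lam))(e^{2 lam T} - 1);
        # their ratio reduces to the tanh expression below for either sign of lam
        snr = (drift / c) * math.sqrt(2.0 * math.tanh(0.5 * lam * T) / lam)
    return 0.5 * (1.0 + math.erf(snr / math.sqrt(2.0)))


def ou_asymptote(lam, a=2.0, C=1.0, m=1.0):
    """Long-time limit of ou_pmf for lam != 0; strictly below 1."""
    drift = a * C ** m
    c = math.sqrt(PHI * (2.0 * R0 + drift))
    return 0.5 * (1.0 + math.erf(drift / (c * math.sqrt(abs(lam)))))


if __name__ == "__main__":
    for lam in (0.0, -0.5, -2.0, -8.0):
        print(lam, [round(ou_pmf(T, lam), 3) for T in (0.2, 0.6, 1.0, 2.0)])
```

Writing the ratio through tanh avoids overflow for large |λ|*T* and makes the symmetry in λ explicit.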

3.5 Other parameters

Choice bias can be implemented by changing the starting point of the diffusion process. A fixed offset corresponds to a global preference for one decision versus the other. For example, by adding a parameter *D* to , with *D* = ±1 for right- and leftward stimuli respectively, and *μ*_{0} = *D* in (18), when *a* > 0 and > 0 the PMF shifts upward for trials with rightward motion and downward for trials with leftward motion. A variable offset, which has been used to model fast errors on reaction-time tasks [34], can also be included. Specifically, a term _{var} represents the variance at *T* = 0, so that and *σ*^{2} are given by:

Allowing variability in initial data (*ν*_{0} ≠ 0) via the parameter _{var}, (18) becomes

Setting *t* = 0 we conclude that

and

For the OU case with λ ≠ 0, after multiplying through by *e*^{−λt}, (30) becomes

from which one also obtains

and

Lapses are errors caused by factors other than limitations in perceptual processing, including inattention and incomplete knowledge of the stimulus-response association. They are typically quantified as the error rate at the highest stimulus strengths (the upper asymptote of the PMF) but are assumed to contribute equally to performance at all stimulus strengths. To account for lapses, the PMF can be modified so that its upper asymptote changes from 1 to 1 − Λ while its lower asymptote remains at 0.5, cf. [35]: