Bayesian model maximum a posteriori estimate (MAP To compute the posterior probability distribution over the four possible saccade choices we first consider Bayes’ rule which states:

*P(s|r)* is the conditional probability of observing a particular saccade choice given a particular discharge rate, also called the posterior.

*P(r|s)* is the conditional probability of observing a particular discharge rate when a particular saccade occurs. This value is known as the likelihood.

*P(s)* is the probability of a saccade choice or the prior.

*P(r)* is the probability of a particular discharge rate in (sp/s). A Bayesian framework provides a way to quantify guesses about events when faced with uncertainty. The probabilities in Bayes’ rule indicate the strength of a belief from 0 to 1. Since the probability of the discharge rate

*P(r)* is independent of the saccade choice, we can restate the posterior as proportional to the product of the likelihood and the prior (

Földiak, 1993;

Oram et al., 1998):

In words, the probability of a saccade choice given the observation of a particular discharge {

*P(s|r)*} is proportional to the conditional probability of the discharge given the saccade choice {

*P(r|s)*} multiplied by the prior {

*P(s)*}. Thus, the posterior is proportional to the product of the likelihood and the prior. Note that for display, we include the normalization factor:

*P(r)* so the scaling ranges from 0 – 1 and so that bona fide probabilities can be compared across conditions (

Oram 1998).

We implemented two prior {

*P(s)*} functions. In one, we used a discrete uniform prior with four Dirac delta functions at each of four possible target choices:

*P(s)* is a discrete prior function describing the four possible saccade target choices (

*s*) separated by 90°. The four delta functions (δ) for each choice are defined by shifts from the first by 90°(π/

*2*). We summed the four delta functions and multiplied by 0.25 (four possible choices) so the prior function had uniform probabilities for the four possible target/saccade choices (). This reflects the experimental situation used. Each of the four possible saccade choices represents one of the four possible target locations and each of these occurred with an equal (25%) probability.

For the second implementation, we used a simulated

*P(s)* and determined the distribution that maximized the MAP model’s performance. To do this, we generated four random values that summed to 1.0 to simulate probability distributions:

Equation 4 defines a discrete prior function as a summation of four delta functions as in

equation 3. Each value in

equation 4 (rand

_{1–4}) varied independently from 0.01 to 0.97 with an interval of 0.01. To ensure that the sum of this distribution was 1.0, we divided each of four random values (rand

_{i}) by the sum of all values

.This was then multiplied by the 90° shifted delta functions yielding a prior probability distribution that is a discrete, non-uniform distribution with four values, one for each of four possible saccade choices (). Because they were generated randomly,

equation 4 produced the same combinations in some cases. Therefore we selected only the unique combinations. This left a total of 156,941 unique combinations of four values. With the simulated prior distributions in hand, we then recomputed the MAP estimate using each one of these 156,941 simulated prior distributions. We indentified the simulated prior function that when used to recover the MAP estimate, resulted in the same or better prediction accuracy as the MAP estimate with the non-uniform prior function. The actual prior function that monkeys might use is unknowable. The prior function used, however, is likely to be related to the final choice behavior (distribution of saccade choices). To test this we calculated the distribution of saccade choices using the actual behavior of the monkeys on a trial by trial basis. In we show through simulations of multiple possible prior functions, that there is a strong relationship between the simulated prior distribution and the distribution of saccade choices. These correlations validate our use of the simulated prior to recover the MAP estimate of the saccade choice.

A critical aspect of computing the MAP estimate is how to determine

*P(r|s)*. In recent work it was shown that a good characterization of

*P(r|s)* can be obtained by assuming a Poisson probability distribution or any distribution of the exponential family with linear sufficient statistics (

Ma et al., 2006;

Beck et al., 2007). So our first approach was to use a Poisson probability distribution constrained by the tuning properties of our SC neurons to estimate

*P(r*_{1–4}|s). Indeed as a first approximation, our neurons behaved in a linear sufficient fashion as determined by assessing the relationship of the variance of action potential counts across trials to the mean of the action potential counts (see

Supplemental Figure 1a). We used the Poisson probability density function in place of the likelihood,

*P(r*_{1–4}|s) where lambda is the expected number of action potential occurrences in a Poisson probability distribution and in our implementation was the tuning curve {

*fi(s)* } of the

*i*_{th} neuron. The exponent and the denominator (

*r*_{i}) are the numbers of action potentials measured in a 20ms time epoch (28ms before to 8 ms before the onset of the saccade):

The posterior probability of a saccade choice (

*s*) given the discharge of all four neurons {

*P(s|r*_{1–4})} was estimated by computing the conditional probability for each of the four neurons and multiplying by the prior probability. In order to combine the neuronal activity linearly we made the reasonable assumption that the neurons in our sample were statistically independent. We describe how we deal with this assumption in

Supplemental Figure 1b,

Supplemental Figure 2 and in the results. To allow summation rather than multiplication, we took the logarithm of

equation (5)
can be ignored because it is independent of the saccade choice. However, we maintained the

term because across our neurons, the tuning curves were different (). As a result this term does not sum to a constant and must remain in the model. The calculation simplifies to:

Note that for

*fi(s)*we also implemented a version of the MAP model in which we used identical Gaussian functions (

Edelman and Keller, 1998) peak shifted by 90° to simulate SC tuning curves (). In this case, the

term was omitted from the model because summing over these functions is a constant. To avoid overestimation of the model, we implemented a leave-one-out cross validation procedure. For this, we extracted one trial from each data set and used the remaining trials to estimate

*P(r|s)* from each set. We then recovered the posterior from the extracted trial. This procedure was repeated for all trials for each data set.

Equation 7 returns one value for each of the possible saccade choices. Computing this value for each of the four possible saccade choices (

*s*_{j}) where

*j* = 1–4, defines the posterior distribution across the four possible saccade choices. In the case of the uniform prior function, the result is a Bayesian estimator that yields the same result as a maximum likelihood estimator as formalized by others (

Sanger, 1996;

Sanger, 2002;

Jazayeri and Movshon, 2006). In the case of the non-uniform prior, the result is a Bayesian estimator distinct from a maximum likelihood. To determine how well the posterior distribution predicted monkeys’ actual choices we compared the maximum

*a posteriori* estimate (MAP) with the saccade choice on a trial by trial basis:

When the saccade choice and the maximum *a posteriori* estimate corresponded, we considered the model to have a correct prediction. provides a graphic depiction of the MAP model along with the different *P(r|s)* and *P(s)* implementations.

Determining the likelihood using a Poisson probability density function relies on two assumptions (

Földiak, 1993;

Sanger, 1996;

Oram et al., 1998;

Sanger, 2002;

Jazayeri and Movshon, 2006;

Ma et al., 2006;

Beck et al., 2008). The first is that the occurrence of each action potential in a spike train is independent of the occurrence of other action potentials in the train. If time between successive action potentials is random we can consider the train of action potentials as a Poisson process. A common way to assess whether discharge statistics can be described as a Poisson process is to determine the index of proportionality, also referred to as the fano factor which is the ratio of the variance of the number of action potentials in an epoch to the number of action potentials in an epoch across trials. On a linear plot, a slope of 1.0 indicates linearity. To determine the fano factor of SC neurons, we counted the number of action potentials within the 28 to 8ms epoch before the onset of a saccade for each trial in a data set. We then determined the trial to trial variance of the action potential counts by subtracting individual trial counts from the mean count and squaring that quantity. This was done for the set of trials across all neurons. We then computed the mean of the difference measure and the mean of the action potential count and plotted these values for each neuron.

Supplemental Figure 1a shows the action potential count variance against the mean count across all 120 SC neurons when the stimulus in the neurons’ RF was either a target or a distractor. The fano factor for neurons when targets were in their RFs was 1.44 (n=120). The fano factor for neurons when distractors were in their RFs was 1.03 (n=360). These observations are consistent with the assumption of linear sufficient statistics, at least for distractor activity.

Supplemental Figure 1 shows that when targets appeared in the RF, the variance to mean relationship diverged from linearity. This is because SC neurons are exhibiting rapid increases in discharge associated with the saccade to the target in the RF. To deal with this deviation from linearity, we extended our probabilistic model to eliminate the Poisson probability distribution to estimate P(r

_{1–4}|s). Instead, we determined

*P(r*_{1–4}|s) directly by using a non-parametric density estimation procedure (

Optican and Richmond, 1987;

Scott, 1992). Panels e and f of show graphically how this was performed. Non-parametric density estimation is simply smoothing a frequency histogram. This procedure is similar to that used to calculate spike density functions from raster plots (

MacPherson and Aldridge, 1979). We first plotted the distribution of discharge rates measured in the four possible target condition during the 20ms epoch measured 28 to 8ms before saccade onset. We applied a smoothing kernel (

*k*[ ]):

where

*h* is the number of bins,

*r* is the discharge rate measured in the 20ms epoch and the domain of

*x* is the set of all numbers defined by the discharge rate. From here, the Gaussians are summed over the discharge rates and the sum is weighted by the number of bins in the frequency distribution (

*n*). Assuming a normal probability density,

*h* can be estimated by minimizing the (averaged) mean integrated squared error AMISE,(

Scott, 1992):

Convolving the histograms with the smoothing kernel in

equation 9 yields the empirical probability density distribution:

This procedure was done to obtain a probability density function for each neuron. From here we could extract the

*P(r|s)* directly to compute the posterior distribution over the four possible saccade choices again on a trial by trial basis:

As in the model shown in

equation 8, when the saccade choice and the maximum

*a posteriori* estimate determined from

equation 12 agreed, we considered the model to have a correct prediction.

The second assumption that is required to compute the posterior probability is that the noise correlations between the four neurons should be independent. Because we were careful to record from neurons with non-overlapping RFs, we assumed independence of the neuronal discharge. We calculated the noise correlation coefficients between neuronal responses to confirm our assumption (

Averbeck et al., 2006). Since we have four neurons, combining each into unique pairs resulted in six pairs allowing us to test all possible noise correlations between the four neurons

*(total conditions = 6(pairs)×4(target conditions)×30(data sets) = 720)*. Neuronal activity was measured 28 to 8 ms before saccade onset for all six pairs. Across our sample of 720 pairs, only 9.86% (71/720) of the pairs had statistically significant noise correlations (

Supplemental Figure 1b). To confirm that the noise correlations were accounted for in our model or did not contribute much to the result of the model we performed a shuffled analysis of our data (as shown in

Supplemental Figure 2).

It is important to note that although our implementation is Bayesian in the sense that we calculated a posterior probability by combining likelihoods and prior functions, there are some differences between our model and true Bayesian estimator. First, the likelihood functions in our model are discrete and are estimated from four individual neurons. Any additional variability that may be conveyed leading to a saccade choice is ignored. Second, the prior functions we implemented are also discrete and deterministic. Third, and as noted above, since we cannot ever know the true prior function, we simulated it. As shown in the simulated prior function correlates with the distribution of saccade choices made by the monkeys. This validates our approach and indicates that the simulated prior function we used to recover the MAP estimates of saccade choice was a good approximation to the actual prior used by the monkeys while performing this task.

Population Vector Average (PVA) To compute the population vector average (

*V⃗.gif" border="0" alt="V" title=""/>*_{population}) we considered each of the four neurons simultaneously recorded as one of a larger population of neurons representing one of the possible saccade choices. We computed the (

*V⃗.gif" border="0" alt="V" title=""/>*_{population}) for each trial using the four neurons, one representing the target (target neuron) and the other three representing the distractors (distractor neurons). We implemented a similar procedure for computing the (

*V⃗.gif" border="0" alt="V" title=""/>*_{population}) as used previously in motor cortex (

Georgopoulos et al., 1986) and SC (

Port and Wurtz, 2003). However, we adopted a normalization procedure suggested by

Salinas and Abbott (1994) to avoid obtaining negative vectors:

We computed the neuronal population vector average using

equation (13) where

*r*_{i} was the mean discharge rate for each

*i*_{th} neuron measured during the 20ms interval immediately before the onset of the saccade during the selection task (28ms to 8ms before saccade onset; note that this is the same interval as used for MAP and WTA). Since each neuron had different discharge rates and different baseline rates, it was necessary to normalize the neuronal responses to avoid arbitrary biases in (

*V⃗.gif" border="0" alt="V" title=""/>*_{population}). The normalized neuronal response was determined by calculating

. This term was then multiplied by the unit vector

*S⃗.gif" border="0" alt="S" title=""/>*_{i} which was defined as the average saccade direction determined by the actual eye movements made by the monkeys. This result was then summed and divided by four to obtain (

*V⃗.gif" border="0" alt="V" title=""/>*_{population}). The saccade choice with the smallest angular difference between it and the (

*V⃗.gif" border="0" alt="V" title=""/>*_{population}) was considered a correct model prediction. One assumption of this version of the PVA is that the SC contains a homogenous representation of all possible saccades. So we also implemented a model developed by Salinas and Abbott (1994) that optimizes the neuronal population vector to take into account inhomogeneous distributions. For this model we determined (

*V⃗.gif" border="0" alt="V" title=""/>*_{population}) by summing the discharge rates weighted by an optimized saccade vector:

where

*i* = neuron number and

*j* = saccade choice. The optimized saccade vector (

*D⃗.gif" border="0" alt="D" title=""/>*_{j(optimized)} was determined by multiplying an inverse correlation matrix (

Supplemental Figure 3a) of the discharge rates for all possible saccade choices by another correlation matrix (

Supplemental Figure 3b) between four saccade unit vectors (

*S⃗.gif" border="0" alt="S" title=""/>*_{j}) and neuronal responses (

*r*_{ij}). Once we obtained the optimized saccade vectors, we multiplied the neuronal responses (

*r*_{i}) by the optimized saccade vectors and averaged them. This calculation resulted in an optimized

*V⃗.gif" border="0" alt="V" title=""/>*_{population} that took into account the inhomogeneity of the saccade choice locations represented by the sample of neuronal discharges. As for the PVA, we considered the optimal linear estimator (OLE) as correctly predicting the saccade choice when the angle between the

*V⃗.gif" border="0" alt="V" title=""/>*_{population} and the actual saccade choice was smallest (

Supplemental Figure 4).