Subjects
16 subjects (11 females) were recruited from the University of California, San Diego (UCSD, La Jolla, CA) community. All had normal or corrected-to-normal vision. Each subject gave written informed consent per Institutional Review Board requirements at UCSD and completed a single 1 hour session in a climate and noise controlled subject room outside of the scanner and a single 1.5–2 hour session in the scanner. Compensation for participation was $10/hour for behavioral training and $20/hour for scanning. Subjects received an additional reward for task compliance according to a point system described below (mean additional compensation: $6.64). Data from two subjects were discarded due to improper slice stack selection during fMRI scanning that resulted in no data being collected from large portions of primary visual cortex, the main area of interest in this study.
Stimuli and task
Visual stimuli were generated using the Psychophysics Toolbox (
Brainard, 1997;
Pelli, 1997) implemented in Matlab (version 7.1; The Math Works, Natick, MA), presented at a frame rate of 60 Hz, and projected onto a screen at the entrance of the scanner bore that subjects viewed through a mirror. Button-press responses were made on an fMRI-compatible response box using the index and middle fingers of the right hand.
Subjects were shown a centrally presented oriented grating (with a diameter of approximately 14°) at full contrast which flickered at 6 Hz (83.33 milliseconds on, 83.33 milliseconds blank interval). On each trial, the orientation of the grating was pseudorandomly selected with equal probability from one of nine possible orientations (0°, 20°, 40°, 60°, 80°, 100°, 120°, 140°, 160°) with a small amount of pseudo-random jitter added (between ±0°–6°, selected from a uniform distribution). On half the trials, the same stimulus was presented at every ‘flicker’ (match trials), but for the remaining trials (mismatch trials), the orientation of the grating was offset by 5° on every other flicker, with the rotational offset of the deviant grating (i.e. clockwise or counterclockwise) fixed on a given trial and counterbalanced across trials (see ). The subject’s objective was to make a match/mismatch judgment by pressing one of two buttons held in the right hand. The order of match and mismatch trials was pseudo-randomized and counterbalanced within each run. Subjects were allowed to make a response any time after the stimulus onset; the stimulus was present for 3 seconds, after which it was replaced with a white centrally presented fixation circle for 3.5 seconds. We omitted all trials in which a subject failed to give a response (less than 1%) or emitted a response faster than 200 milliseconds (less than 1.05%). Since the second grating did not appear until 166.67 milliseconds into the trial, we reasoned that responses quicker than 200 milliseconds should be regarded as definite blind guesses. In total, a single run consisted of 50 trials (36 experimental trials and 14 null trials consisting of just a fixation circle) and lasted 336 seconds including an 11 second period of passive fixation at the end of each run. Across the 36 experimental trials, each of the 9 possible orientations was presented 4 times. Prior to each run, subjects were instructed by the experimenter to emphasize either response accuracy or speed. Subjects earned points based on their performance: +10 for correct responses on accuracy trials, −10 for incorrect responses on accuracy trials, +10 for correct responses on speed trials within the response deadline, +0 for incorrect responses on speed trials, −10 for any responses exceeding the response deadline. At the end of the experiment, subjects were paid an additional $1 for every 100 points earned during their performance while being scanned (rounded to the nearest dollar). During training in the lab, subjects were given trial-by-trial feedback, but feedback was delayed until the end of each run during the scanning session.
Response deadline on speeded trials
All participants were trained prior to the scan session for a minimum of 180 trials. During training, subjects received point feedback on a trial-by-trial basis according to the reward scheme outlined above. Participants practiced the task without any speed pressure until they felt comfortable and performed at approximately 90% accuracy. Subjects were then asked to repeat the task by responding as quickly as they could without guessing. The median of their RT distribution on this block was then set as their response deadline for both the subsequent speeded training blocks and the speeded blocks during the fMRI session.
Linear ballistic accumulation (LBA) model
Behavioral data were modeled using the LBA, which is a simplified version of the ballistic accumulator model and the leaky competing accumulator model (see
Brown and Heathcote, 2005;
2008;
Usher and McClelland, 2001). illustrates the LBA model schematically. On each trial, two racing accumulators begin with a random activation level (the
starting point) that is independently drawn from a uniform distribution on [0,
A]. Activity in each accumulator increases linearly, and a response is triggered as soon as one accumulator crosses the
response threshold. The predicted response time is the time taken to reach the threshold, plus a constant offset time (
non-decision time). The rate at which activation increases in each accumulator is termed the
drift rate for that accumulator. These drift rates are drawn from independent normal distributions for each accumulator (with the standard deviation of these distributions being arbitrarily fixed at 1). The means of the normal distributions reflect the quality of the perceptual input. For example, a salient mismatch between the orientated gratings would lead to a large mean drift rate in the accumulator corresponding to a mismatch response (and vice versa). Hence, the LBA model estimates the mean of this drift rate distribution for each accumulator (“match” or “mismatch”).
The distance from the starting point to the response threshold is a measure of
response caution as this distance quantifies the average amount of evidence that needs to be accumulated before a response is initiated. Changes in response caution are usually assumed to originate from adjusting the response threshold; however, adjusting the response threshold is mathematically equivalent to adjusting the starting point, therefore we chose to fix the height of the uniform distribution (
A) from which the starting points were drawn (although the starting points nevertheless vary trial to trial; see also
Ratcliff 1978;
Ratcliff & Rouder, 1998;
Forstmann et al., 2010;
Van Maanen et al., 2011;
Wolfe and Van Wert, 2010). As a result, we hereon use the response threshold to represent “response caution” since the maximum of the start point distribution is fixed across the SE and AE conditions.
LBA model analyses
The parameters of the LBA model were estimated using the method of maximum likelihood. Likelihood was optimized using simplex searches (
Nelder and Mead, 1965). Initial parameter values for searches were generated two ways: using heuristic calculations based on the data, and using start points determined from the end points of searches for simpler, nested models (full details of these methods and extensive discussion of alternative approaches are provided by
Donkin et al., 2011a). We fit the “match” and “mismatch” trials simultaneously, fixing all parameters between these two trial types to be constant except for the drift rate (which is presumably determined by the stimulus). Different drift rates were estimated for the accumulator corresponding to a “mismatch” response on trials with a “mismatch” stimulus (i.e., “correct” drift rate) and on trials with a “match” stimulus (i.e., “incorrect” drift rate; see ). Similarly, different drift rates were estimated for the accumulator corresponding to a “match” response on trials with a “match” stimulus (“correct” drift rate) and on trials with a “mismatch” stimulus (“incorrect” drift rate; see ). Each different design for constraining model parameters across task conditions was fit separately to each individual subject's data. One subject, however, only made one incorrect response among the AE mismatch trials, thereby providing little constraint on the model estimate for that condition. The parameter estimates for that subject were therefore set to the group average for that condition. The overall grouped BIC value provided very strong support for the design that allowed three parameters to vary between SE and AE conditions (response threshold (
b), drift rate (
v), and non-decision time (
t0)). To quantify that support, we approximated posterior model probabilities based on BIC assuming a fixed effect for subjects (see
Burnham and Anderson, 2002), which showed this design to be more than 10
10 times more likely than the next best design (see
Results section)
| Table 2Average LBA parameter estimates for the best BIC model on mismatch and match trials |
Single-trial linear ballistic accumulator (STLBA)
Response times and accuracy vary on each trial due to environmental changes and/or internal noise in a subject’s cognitive state. It is therefore important to not only map overall mean behavioral patterns with parameters that quantify relevant cognitive processes (as can be done with the standard LBA), but also to link estimates of these parameters and BOLD responses on a trial-by-trial basis.
In the standard LBA model (as in other decision making models, see
Ratcliff, 1985;
Ratcliff and Rouder, 1998), drift rates are normally distributed across trials, with different distributions for each respective accumulator. This assumption of normally distributed drift rates implies that drift rates which are close to the mean of the distribution are more likely than values from the tails of the distribution. In addition, the uniform distribution [0,
A] restricts the range of starting points for each accumulator. These considerations yield the following maximum-likelihood estimates (MLE) for a single-trial drift rate (
di) and a single-trial starting point (
ai) given a trial with response time (
ti):
where
b,
v,
A, and
t0 are the parameters estimated using the standard LBA that correspond to the response threshold (
b), the drift rate (
v), the height of the distribution of starting points (
A), and the non-decision time (
t0), respectively. Note that the assumed independence between estimated parameters that is found in the standard LBA model is no longer preserved with the STLBA. Nevertheless, parameter recovery studies show that the STLBA can explain much of the variance in the true parameter values (see the text surrounding Figure 3 in
Van Maanen et al., 2011).
As in the main LBA analysis, we computed single-trial estimates of drift rate based on a model where response threshold (b), drift rate (v), and the non-decision time (t0) were free to vary between SE and AE trials, whereas the height of the uniform distribution of starting points (A) was fixed (see for exact values). Constraining the model in other reasonable ways (e.g., fixing the non-decision time parameter) yielded qualitatively similar results. Note also that the single-trial estimates for the starting point here are mathematically equivalent to single-trial estimates of the response threshold since what is actually being calculated is the relative distance between the two.
fMRI Data Acquisition and Analysis
All scanning was carried out on a General Electric MR750 3T scanner equipped with an 8-channel head coil at the W.M. Keck Center for Functional MRI on the main campus at UCSD. Anatomical images were acquired using a SPGR T1-weighted sequence that yielded images with a 1×1×1 mm resolution. Whole brain echo planar functional images (EPI) were acquired in 28 (for 8 of the subjects) or 26 (for the remaining subjects) oblique slices (TR = 1500 ms, TE = 30 ms, flip angle = 90°, image matrix = 64 × 64, field of view = 192 mm, slice thickness = 3 mm, 0 mm gap)
Data analysis was performed using BrainVoyager QX (v 1.91; Brain Innovation, Maastricht, The Netherlands) and custom timeseries analysis routines written in Matlab (version 7.11.0.584; The Math Works, Natick, Massachusetts). Data from the main experiment were collected in 8 or 10 runs per subject (i.e., either 4 or 5 runs per response instruction type, respectively). EPI images were slice-time corrected, motion-corrected (both within and between scans) and high pass filtered (3 cycles/run) to remove low frequency temporal components from the timeseries. The timeseries from each voxel in each observer was then z-transformed on a run-by-run basis to normalize the mean response intensity across time to zero. This normalization was done to correct for differences in mean signal intensity across voxels (e.g., differences related to a voxel’s composition or by its distance from the coil elements). We then estimated the magnitude of the BOLD response on each trial by shifting the timeseries from each voxel by four 1.5 second TRs (6 seconds) to account for the temporal lag in the hemodynamic response function, and then extracting data from the two consecutive 1.5 second TRs that correspond to the duration of each 3 second trial (see
Kamitani and Tong, 2005;
Serences and Boynton, 2007a,
b;
Serences et al., 2009). The two data points extracted from these two consecutive TRs were then averaged together to compute a single estimate of the response in each V1 voxel on each trial. These trial-by-trial estimates of the BOLD response amplitude were subsequently used as inputs to the forward encoding model (see
Estimating feature-selective BOLD response profiles using a forward encoding model).
Functional localizer scans
Each subject participated in two runs of an independent functional localizer scan to identify voxels within primary visual cortex that were responsive to the spatial position occupied by the oriented grating stimulus employed in the primary experiment. The localizer stimulus was comprised of a full-contrast counter-phase modulated (8Hz) checkerboard that exactly matched the size of the oriented grating stimulus used in the main task. On each trial, the checkerboard stimulus was presented continuously for 10s, and the contrast of the checkerboard was reduced by 30% for a single video frame at 6 pseudo-randomly selected time-points. Subjects were instructed to make a button-press response with their right index finger every time they detected a contrast decrement. Each 10s trial was then followed by 10s of passive fixation. Visually responsive regions of primary visual cortex were identified using a general linear model (GLM) with a single regressor that was constructed by convolving a boxcar model of the stimulus sequence with a standard model of the hemodynamic response function (a difference-of-two gamma function model implemented in Brain Voyager, time to peak of positive response: 5s, time to peak of negative response: 15s, ratio of positive and negative responses: 6, positive and negative response dispersion: 1). Voxels were retained for analysis in the main experimental task if they passed a false discovery rate corrected single-voxel threshold of p<0.05
Retinotopic mapping
A meridian mapping procedure consisting of a checkerboard wedge flickering at 8 Hz and subtending 60° of polar angle was used to identify V1 (
Engel et al., 1994;
Sereno et al., 1995). Subjects were instructed to fixate on the center of the screen and to passively view the peripheral stimulus. The data collected during these scans was then projected onto a computationally inflated representation of each subject’s gray/white matter boundary. V1 in each hemisphere was then manually defined according to the representations of the upper and lower vertical meridian following standard practices (
Wandell et al., 2007)
Estimating feature-selective BOLD response profiles using a forward encoding model
The goal of encoding models is to adopt an
a priori assumption about the important features that can be distinguished using hemodynamic signals within an ROI, and then to use this set of features (or basis functions) to predict observed patterns of BOLD responses (
Brouwer and Heeger, 2009,
2011;
Dumoulin and Wandell, 2008;
Gourtzelidis, et al., 2005;
Kay and Gallant, 2009;
Kay et a., 2008; Mitchell, et al., 2008;
Naselaris, et al., 2009;
Thirion, et al., 2006; reviewed in
Naselaris, et al., 2011; Saproo and Serences, 2011). Here, we assumed that the BOLD response in a given V1 voxel represents the pooled activity across a large population of orientation selective neurons, and that the distribution of neural tuning preference is biased within a given voxel due to large-scale feature maps (
Freeman et al., 2011) or to random anisotropies in the distribution of orientation-selective columns within each voxel (
Kamitani and Tong, 2005;
Swisher et al., 2010). Thus, the BOLD response measured from many of the voxels in V1 exhibit a robust orientation preference (
Haynes and Rees, 2005;
Kamitani and Tong, 2005;
Serences et al., 2009;
Brouwer and Heeger, 2011;
Freeman et al., 2011)
To estimate orientation-selective response profiles based on activation patterns in V1, we first separated the data from the 8–10 scanning runs obtained for each subject into train and test sets using a ‘leave two-out’ cross-validation scheme (i.e., all but one SE and one AE run were used as a training set, and the held-out SE and AE runs were used as a test set). By holding one AE and one SE run out for use as a test set, we ensured that the training set had an equal number of trials of each type. For each run in the training set, we then computed the mean response evoked by each of the 9 orientations, separately for each voxel. The mean responses were then sorted based on stimulus orientation and run (i.e. mean response to orientation 1 was first, then orientation 2, …, orientation 9). Thus, each training set had 54 observations for subjects who underwent 8 runs in the scanner (6 runs in training set × 9 orientations), and 72 observations for subjects that underwent 10 runs in the scanner (8 runs in the training set × 9 orientations). Note that, as described below, data in the test set were not averaged across trials, and a unique channel response function was estimated for every trial.
Adopting the terminology of
Brouwer and Heeger (2009;
2011), let
m be the number of voxels in a given visual area,
n1 be the number of observations in the training set (either 54 or 72),
n2 be the number of
trials in the test set, and
k be the number of hypothetical orientation channels. Let
B1 (
m ×
n1 matrix) be the training data set, and
B2 (
m ×
n2 matrix) be the test data set. Under the assumption that the observed BOLD signal is a weighted sum of underlying orientation selective neural responses, we generated a matrix of hypothetical channel outputs (C
1,
k ×
n1) comprised of nine half-sinusoidal functions raised to the 6
th power as a basis set (see ). The training data in B1 were then mapped onto the matrix of channel outputs (C
1) by the weight matrix (W,
m ×
k) that was estimated using a GLM of the form:
where the ordinary least-squares estimate of W is computed as:
The channel responses
C2 (
k ×
n2) were then estimated based on the test data (B
2) using the weights estimated in
(2):
The first steps in this sequence (
equations 1–
2) are similar to a traditional univariate GLM in that each voxel is assigned a weight for each feature in the model (in this case, one weight for each hypothetical orientation channel).
Equation 3 then implements a multivariate computation because the channel responses estimated on each trial (in C
2) are constrained by the estimated weights assigned to each voxel
and by the vector of responses observed across all voxels on that trial in the test set. Thus, one key feature of this approach is that a set of estimated channel responses can be obtained on a trial-by-trial basis so long as the number of voxels is greater than the number of channels. If there are fewer voxels than channels, then unique channel response estimates cannot be derived as the number of variables being estimated exceeds the number of available measurements. This ability to estimate the orientation-selective tuning profile on each trial is exploited when comparing channel responses on correct and incorrect trials and when correlating channel responses with accuracy and drift rates on a trial-by-trial basis (see
Results)
The shape of the basis functions used in C
1 has a large impact on the resulting channel response estimates. In the present experiment, we used half-sinusoidal functions that were raised to the 6
th power to approximate the shape of single-unit tuning functions in V1, where the 1/√2 half-bandwidth of orientation tuned cells is approximately 20° (although there is a considerable amount of variability in bandwidth, see Ringach et al., 2002a; Ringach et al., 2002b;
Gur et al, 2005;
Schiller 1976). Given that the half-sinusoids were raised to the 6
th power, a minimum of seven linearly independent functions was required to adequately cover orientation space (
Freeman and Adelson, 1991); however, since we presented nine unique orientations in the experiment, we used a set of nine evenly distributed functions. The use of more than the required seven basis functions is not problematic so long as the number of functions does not exceed the number of measured stimulus values, in which case the matrix C
1 would become rank deficient. While we selected the bandwidth of the basis functions based on physiology studies, all results that we report are robust to reasonable variations in this value (i.e., raising the half-sinusoids to the 5
th–8
th power, all of which are reasonable choices based on the documented variability of single-unit bandwidths). Note that since the magnitude of the channel responses is scaled by the amplitude of the basis functions (which was set to 1 here), the units along the y-axes of all data plots are in arbitrary units. Importantly, however, scaling the basis functions to some other common value would not change the
differential response between conditions.
Using this modeling approach, the center position of each function in the basis set can be systematically shifted across orientation space to estimate the response in a channel centered at any arbitrary orientation (as long as the channels remain linearly independent;
Freeman and Adelson, 1991). While this method of shifting the center of each channel across orientation space could in principle be used to generate channel response profiles with a resolution of 1° (or even smaller), we opted to reconstruct the response functions in 5° steps as no additional insights were gained by estimating the responses at a higher resolution. After generating a channel response function on each trial in 5° steps across orientation space, each function was circularly shifted to a common stimulus-centered reference frame, and these re-centered response functions were averaged across left and right V1 and across all trials of a like kind. Thus, by convention the 0° point along the x-axis in all plots refers to the stimulus that evoked the response profile. Finally, since all channel response functions were found to be symmetrical about their center point, we averaged data from corresponding offsets on either side of the 0° point (i.e., data were averaged from the channels offset by +5° and −5° from the stimulus, +10° and −10°, and so forth) to produce the reported orientation tuning functions. Note that in the process of collapsing across channels centered on both positive and negative offsets from 0°, we necessarily collapsed across mismatch trials in which there was either a clockwise or a counterclockwise offset between sequentially presented gratings within a trial. However, sorting the data by the rotational offset of the deviant grating had no qualitative impact on any of our results, presumably because the two gratings were flickering back and forth on sequential presentations over the course of the 3s trial (see ) and because there was a random jitter of up to ±6° introduced on each trial (see task description above), which was on the same order as the offset between sequential gratings on mismatch trials (±5°)
Bootstrapping/randomization procedure for evaluating statistical significance
Because the basis functions used to estimate channel responses overlapped – thus violating the independence assumption of traditional statistical tests – we estimated statistical significance using a non-parametric bootstrapping/randomization procedure. Note that this bootstrapping/randomization procedure was used for all comparisons related to BOLD tuning functions (see and , AE v. SE, correct AE v. incorrect AE, correct SE v. incorrect SE, the interaction between AE v. SE based on accuracy, AE logistic regression beta weights v. SE logistic regression beta weights, and single-trial correlations between AE responses and drift rate v. single-trial correlations between SE responses and drift rate). First, a series of standard paired t-tests was performed to determine which points along the two tuning curves differed significantly (using a threshold of p<0.05 for each individual t-test). We then generated a new data set by randomly selecting 14 participants with replacement and then re-assigning the condition label associated with data from each participant with a probability of 0.5. A series of paired t-tests was performed on the re-sampled and randomized data set using the same procedure applied to the observed data. This re-sampling plus randomization procedure was then iterated 10,000 times to determine the probability of obtaining the pattern of significant differences obtained using the intact data set under the null hypothesis that the two conditions are equivalent (i.e., interchangeable). The reported p-values in the main text thus reflect the proportion of times we observed a pattern of significant t-tests in the re-sampled data that matched the pattern obtained in the observed data. Note that the behavioral data were evaluated using conventional parametric statistical techniques.