Among the fundaments of adaptive behavior is the ability to predict future events. This ability is crucial to functions ranging from sensory processing to decision making. In psychology and neuroscience, prediction has been studied most extensively in the context of Pavlovian and instrumental conditioning tasks, which measure how organisms anticipate (and act on) affectively significant events such as food delivery or electric shocks. A recent series of functional neuroimaging studies has investigated the neurophysiological basis of prediction and learning in humans. Using Pavlovian and instrumental conditioning tasks, these studies have identified several areas where blood oxygenation level–dependent (BOLD) signals correlate with trial-wise estimates from formal learning models like temporal difference (TD) learning (Sutton and Barto 1998) or the Rescorla–Wagner (RW) model (Rescorla and Wagner 1972). In particular, BOLD activity in areas including the striatum and the dorsolateral prefrontal cortex (DLPFC) (key dopaminergic targets) has been shown to covary with both predictions and prediction errors (Fletcher et al. 2001; McClure et al. 2003; Corlett et al. 2004; O'Doherty et al. 2004; Seymour et al. 2004; Turner et al. 2004; Gläscher and Büchel 2005; Pessiglione et al. 2006; Jensen et al. 2007).
In all of these previous studies, the learned associations had direct relevance for behavior, either because they were linked to rewarding or punishing outcomes (e.g., McClure et al. 2003; O'Doherty et al. 2004; Seymour et al. 2004) or because subjects received feedback on their performance (Fletcher et al. 2001; Aron et al. 2004; Corlett et al. 2004; Turner et al. 2004). In contrast, it is unclear whether incidental learning of stimulus–stimulus associations, i.e., learning of associations that are irrelevant for current behavioral goals, draws upon the same neuronal mechanisms. A paradigm that shows that these types of associations are learned is sensory preconditioning. Here, in a first stage, the subject is exposed to behaviorally meaningless CS1–CS2 associations and, in a second stage, to CS1–US (unconditioned stimulus) pairings. In a third and final stage, the presentation of CS2 alone generates a conditioned response, indicating that the subject must have learned the initial CS1–CS2 association (Brogden 1939; Gewirtz and Davis 2000).
In this study we used a factorial design that extended the first stage of classical sensory preconditioning paradigms. Healthy volunteers performed an audio-visual target-detection task while being exposed to a stream of concurrent audio-visual “distractor” stimuli. These stimuli possessed statistical regularities, which enabled prediction of the visual distractor from the preceding auditory cue. Critically, however, these statistical associations were completely irrelevant to the target-detection task. Any learning of these associations would therefore be of an incidental (task-unrelated) nature and, in the absence of behavioral responses to the learned associations, could only be inferred neurophysiologically. This paradigm capitalized on previous work by McIntosh et al. (1998), who used positron emission tomography (PET) to show that learning of associations between sensory stimuli was reflected by activity in early visual cortex. However, the use of PET permitted only a simple conditioning scheme and precluded a full investigation of dynamic changes in the brain's representation of the learned association. Here, we employed a more refined conditioning scheme and used functional magnetic resonance imaging (fMRI) to study learning-dependent changes in brain activity over time. Additionally, we assessed learning-dependent changes in effective connectivity between auditory and visual cortex using dynamic causal modeling (DCM).
Using a 4-factorial design, this study characterized learning in terms of the temporal evolution (learning; factor 1) of both brain activity and interregional connectivity in response to a visual stimulus whose presence or absence (V+ vs. V−; factor 2) was predicted in 2 contexts, established by 2 types of auditory conditioning stimuli (CS+ vs. CS−; factor 3), each of which could be present or absent on each trial (A+ vs. A−; factor 4). In other words, in contrast to a classical sensory preconditioning paradigm, we could not only investigate differential learning depending on CS type, but could also assess whether the consequences of an absent CS were learned. It should be noted that both the CS+ and CS− contexts (or blocks) were balanced in terms of stimuli; the a priori probabilities of the auditory CS and of the visual stimulus occurring on a given trial were always 50%. Critically, the task was not related to these auditory and visual stimuli; subjects performed a target-detection task on unrelated stimuli that were presented sporadically.
One of the features of our factorial paradigm is that on half the trials the auditory CS is absent. This necessitates an additional cue that marks the beginning of each trial; we used a visual trial onset (TO) cue for this purpose. In other words, learning of stimulus associations in this paradigm has 2 components, one related to the auditory CS and another related to the visual TO cue. As a consequence, any model of the learning process must be able to formulate how a net prediction is computed from the associative strengths of the 2 cue components. Here we chose the RW model because it is the simplest and most generic model of associative learning that accounts for cue interactions (see Discussion for details). The RW model has been validated extensively using behavioral data from both humans and animals, and it can account for many aspects of associative learning (Schultz and Dickinson 2000; Pearce and Bouton 2001). In our study, the trial-wise associative strength predicted by the RW model was used to construct regressors for a voxel-wise general linear model (GLM) of fMRI data and modulatory inputs for dynamic causal models (Friston et al. 2003) of the effective connectivity between auditory and visual areas. Specifically, we addressed the following 2 questions:
1) In the absence of any behavioral responses to the audiovisual stimulus associations, can we obtain neurophysiological evidence that the brain learns these associations? Specifically, can we find brain regions whose activity correlates with the learning curve predicted by a generic model of associative learning (i.e., the RW model)? (Throughout the paper, we use the colloquial term “learning curve” to denote the vector of predicted associative strength over time, as defined in eq. 1.) Candidate areas included early visual cortex and the striatum. Furthermore, do these areas show a response profile across cue–outcome combinations that reflects a match between prediction and outcome or rather a prediction-error response?
2) Because the predictive auditory cue temporally precedes the visual outcome, learning should modify neuronal activity in early visual cortex in response to auditory cues. Can these putative learning-related changes in visual cortex activity be explained by changes in the effective connectivity from auditory to visual cortex (cf. McLaren et al. 1989; McIntosh et al. 1998)? Specifically, do these changes conform to changes in associative strength under an RW model of learning?
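To make the notion of a net prediction from 2 cue components concrete, the following is a minimal sketch of the standard RW update for a compound of a TO cue (present on every trial) and an auditory CS (present on half the trials). It is an illustration only, not the analysis code used in this study: the function name, the learning rate `alpha`, the zero initial associative strengths, and the deterministic toy contingency (visual stimulus present exactly when the CS is present) are all assumptions chosen for clarity.

```python
import numpy as np

def rescorla_wagner(cues, outcomes, alpha=0.1):
    """Standard Rescorla-Wagner learning for compound cues.

    cues:     (n_trials, n_cues) array, 1 if a cue is present on a trial, else 0
    outcomes: (n_trials,) array, 1 if the outcome occurs on a trial, else 0
    alpha:    learning rate (hypothetical value, not taken from the paper)
    Returns the trial-wise net predictions, the prediction errors, and the
    final associative strengths.
    """
    cues = np.asarray(cues, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n_trials, n_cues = cues.shape
    v = np.zeros(n_cues)              # associative strength of each cue
    predictions = np.zeros(n_trials)
    errors = np.zeros(n_trials)
    for t in range(n_trials):
        # Net prediction: sum of the strengths of all cues present on trial t
        predictions[t] = cues[t] @ v
        # Prediction error: actual outcome minus net prediction
        errors[t] = outcomes[t] - predictions[t]
        # Only cues that were present share in the update
        v += alpha * errors[t] * cues[t]
    return predictions, errors, v

# Toy block (assumed contingency): the auditory CS is present on every
# second trial, and the visual stimulus appears only when the CS is present.
n_trials = 400
cs_present = np.arange(n_trials) % 2
cues = np.column_stack([np.ones(n_trials), cs_present])  # columns: TO cue, CS
outcomes = cs_present.astype(float)

predictions, errors, v = rescorla_wagner(cues, outcomes)
print("final associative strengths (TO cue, CS):", np.round(v, 3))
```

Under this toy contingency the TO cue, which carries no information about the outcome, ends up with an associative strength near 0, whereas the CS approaches 1. A vector such as `predictions` is the kind of trial-wise quantity that could, in principle, be used as a parametric regressor in a GLM.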
Before describing our experiment, 2 important issues should be highlighted. First, the goal of this fMRI study was not to pinpoint the exact mathematical form of incidental learning by comparing different models of associative learning. Instead, we used the simplest model of associative learning that could accommodate our paradigm, i.e., the RW model. In the Discussion, we argue why the RW model can be considered an appropriate a priori learning model for our particular paradigm, relative to other models of associative learning. Second, it is important to note that within a given experimental condition the predicted outcomes and prediction errors are perfectly anticorrelated (see Supplementary Material for details). This means they cannot be distinguished as alternative predictors of observed brain responses. However, with our factorial design one can analyze the pattern of parameter estimates across experimental conditions, contrasting expected and unexpected cue–outcome combinations. This enabled us to distinguish, voxel by voxel, brain responses that reflected a match between predicted and actual trial outcomes from responses that encode prediction error or surprise.
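The within-condition anticorrelation noted above follows directly from the standard form of the RW prediction error. As a brief sketch, using generic notation that is assumed here and not taken from eq. 1 ($V_t$ for the net prediction on trial $t$, $\lambda$ for the value of the outcome, and $\delta_t$ for the prediction error): within a single condition the outcome is the same on every trial, so $\lambda$ is a constant and the prediction error is simply a sign-flipped, shifted copy of the prediction.

```latex
\delta_t = \lambda - V_t
\quad\Longrightarrow\quad
\operatorname{corr}(V_t, \delta_t) = \operatorname{corr}(V_t, -V_t) = -1
\qquad \text{(for constant } \lambda \text{ and non-constant } V_t\text{)}
```

Across conditions, by contrast, $\lambda$ differs between trials with and without the visual stimulus, which is why the pattern of parameter estimates across cue–outcome combinations can separate the two accounts.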