Confronted with a rich sensory environment, the brain must learn statistical regularities across sensory domains to construct causal models of the world. Here, we used functional magnetic resonance imaging and dynamic causal modeling (DCM) to furnish neurophysiological evidence that statistical associations are learnt, even when task-irrelevant. Subjects performed an audio-visual target-detection task while being exposed to distractor stimuli. Unknown to them, auditory distractors predicted the presence or absence of subsequent visual distractors. We modeled incidental learning of these associations using a Rescorla–Wagner (RW) model. Activity in primary visual cortex and putamen reflected learning-dependent surprise: these areas responded progressively more to unpredicted, and progressively less to predicted visual stimuli. Critically, this prediction-error response was observed even when the absence of a visual stimulus was surprising. We investigated the underlying mechanism by embedding the RW model into a DCM to show that auditory to visual connectivity changed significantly over time as a function of prediction error. Thus, consistent with predictive coding models of perception, associative learning is mediated by prediction-error dependent changes in connectivity. These results posit a dual role for prediction-error in encoding surprise and driving associative plasticity.
Among the fundaments of adaptive behavior is the ability to predict future events. This ability is crucial to functions ranging from sensory processing to decision making. In psychology and neuroscience, prediction has been studied most extensively in the context of Pavlovian and instrumental conditioning tasks, which measure how organisms anticipate (and act on) affectively significant events such as food delivery or electric shocks. A recent series of functional neuroimaging studies has investigated the neurophysiological basis of prediction and learning in humans. Using Pavlovian and instrumental conditioning tasks, these studies have identified several areas where blood oxygenation level–dependent (BOLD) signals correlate with trial-wise estimates from formal learning models like temporal difference (TD) learning (Sutton and Barto 1998) or the Rescorla–Wagner (RW) model (Rescorla and Wagner 1972). In particular, BOLD activity in areas including the striatum and the dorsolateral prefrontal cortex (DLPFC) (key dopaminergic targets) has been shown to covary with both predictions and prediction errors (Fletcher et al. 2001; McClure et al. 2003; Corlett et al. 2004; O'Doherty et al. 2004; Seymour et al. 2004; Turner et al. 2004; Gläscher and Büchel 2005; Pessiglione et al. 2006; Jensen et al. 2007).
In all of these previous studies, the learned associations had direct relevance for behavior, either because they were linked to rewarding or punishing outcomes (e.g., McClure et al. 2003; O'Doherty et al. 2004; Seymour et al. 2004) or because subjects received feedback on their performance (Fletcher et al. 2001; Aron et al. 2004; Corlett et al. 2004; Turner et al. 2004). In contrast, it is unclear whether incidental learning of stimulus–stimulus associations, i.e., learning of associations that are irrelevant for current behavioral goals, draws upon the same neuronal mechanisms. A paradigm that shows that these types of associations are learned is sensory preconditioning. Here, in a first stage, the subject is exposed to behaviorally meaningless CS1–CS2 associations and, in a second stage, to CS1–US (unconditioned stimulus) pairings. In a third and final stage, the presentation of a CS2 alone generates a conditioned response, indicating that the subject must have learned the initial CS1–CS2 association (Brogden 1939; Gewirtz and Davis 2000).
In this study we used a factorial design that extended the first stage of classical sensory preconditioning paradigms. Healthy volunteers performed an audio-visual target-detection task, while being exposed to a stream of concurrent audio-visual “distractor” stimuli (Fig. 1). These stimuli possessed statistical regularities, which enabled prediction of the visual distractor from the preceding auditory cue (Fig. 2). Critically, however, these statistical associations were completely irrelevant to the target-detection task. Any learning of these associations would therefore be of an incidental (task-unrelated) nature and, in the absence of behavioral responses to the learned associations, could only be inferred neurophysiologically. This paradigm capitalized on previous work by McIntosh et al. (1998), who used positron emission tomography (PET) to show that learning of associations between sensory stimuli was reflected by activity in early visual cortex. However, the use of PET permitted only a simple conditioning scheme and precluded a full investigation of dynamic changes in the brain's representation of the learned association. Here, we employed a more refined conditioning scheme and used functional magnetic resonance imaging (fMRI) to study learning-dependent changes in brain activity over time. Additionally, we assessed learning-dependent changes in effective connectivity between auditory and visual cortex using dynamic causal modeling (DCM).
Using a 4-factorial design (cf. Fig. 2), this study characterized learning in terms of the temporal evolution (learning; factor 1) of both brain activity and interregional connectivity in response to a visual stimulus whose presence or absence (V+ vs. V−; factor 2) was predicted in 2 contexts, established by 2 types of auditory conditioning stimuli (CS+ vs. CS−; factor 3), each of which could be present or absent on each trial (A+ vs. A−; factor 4). In other words, in contrast to a classical sensory preconditioning paradigm, we could not only investigate differential learning depending on CS type, but could also assess whether the consequences of an absent CS were learned. It should be noted that the CS+ and CS− contexts (blocks) were balanced in terms of stimuli: the a priori probabilities of the auditory CS and of the visual stimulus occurring on a given trial were always 50%. Critically, the task was not related to these auditory and visual stimuli; subjects performed a target-detection task on unrelated stimuli that were presented sporadically.
One of the features of our factorial paradigm is that the auditory CS is absent on half the trials. This necessitates an additional cue that marks the beginning of each trial; here, this was a visual trial onset (TO) cue. In other words, learning of stimulus associations in this paradigm has 2 components, one related to the auditory CS and another related to the visual TO cue. As a consequence, any model of the learning process must be able to specify how a net prediction is computed from the associative strengths of the 2 cue components. Here we chose the RW model because it is the simplest and most generic model of associative learning that accounts for cue interactions (see Discussion for details). The RW model has been validated extensively, using behavioral data from both humans and animals, and can account for many aspects of associative learning (Schultz and Dickinson 2000; Pearce and Bouton 2001). In our study, the trial-wise associative strength predicted by the RW model was used to construct regressors for a voxel-wise general linear model (GLM) of fMRI data and modulatory inputs for dynamic causal models (Friston et al. 2003) of the effective connectivity between auditory and visual areas. Specifically, we addressed the following 2 questions:
1) In the absence of any behavioral responses to the audiovisual stimulus associations, can we obtain neurophysiological evidence that the brain learns these associations? Specifically, can we find brain regions whose activity correlates with learning (throughout the paper, we will use the colloquial term “learning curve” to denote the vector of predicted associative strength over time, i.e., $\hat{V}_{j,t}$ in eq. 1) as predicted by a generic model of associative learning (i.e., the RW model)? Candidate areas included early visual cortex and the striatum. Furthermore, do these areas show a response profile across cue–outcome combinations that reflects a match between prediction and outcome or rather a prediction-error response?
2) Because the predictive auditory cue temporally precedes the visual outcome, learning should modify neuronal activity in early visual cortex in response to auditory cues. Can these putative learning-related changes in visual cortex activity be explained by changes in the effective connectivity from auditory to visual cortex (cf. McLaren et al. 1989; McIntosh et al. 1998)? Specifically, do these changes conform to changes in associative strength under a RW model of learning?
Before describing our experiment, 2 important issues should be highlighted. First, the goal of this fMRI study was not to pinpoint the exact mathematical form of incidental learning by comparing different models of associative learning. Instead, we used the simplest (i.e., the RW) model of associative learning that could accommodate our paradigm. In the Discussion, we argue why the RW model can be considered an appropriate a priori learning model for our particular paradigm, relative to other models of associative learning. Second, it is important to note that within a given experimental condition the predicted outcomes and prediction errors are perfectly anticorrelated (see Supplementary Material for details). This means they cannot be distinguished as alternative predictors of observed brain responses. However, with our factorial design one can analyze the pattern of parameter estimates across experimental conditions, contrasting expected and unexpected cue–outcome combinations. This enabled us to distinguish, voxel by voxel, brain responses that reflected a match between predicted and actual trial outcomes from responses that encode prediction error or surprise.
Sixteen healthy volunteers (8 female; mean age ± SD, 25.3 ± 3.3 years) participated in the study. The subjects had no history of psychiatric or neurological disorders. Written informed consent was obtained from all volunteers prior to the study, which was approved by the National Hospital for Neurology and Neurosurgery Ethics Committee.
The central idea of this study was to present subjects with “distractor” stimuli that were linked by predictive associations: 2 auditory stimuli served as CS and differentially predicted whether or not a visual stimulus would follow. Critically, the volunteers performed an unrelated detection task on separate auditory and visual targets; for this task, the predictive relationships between the distractor stimuli were completely irrelevant. Stimuli were presented using Cogent2000 (www.vislab.ucl.ac.uk/Cogent/index.html). An initial sound matching task and the subsequent learning study (4 × 10 min) were all completed inside the scanner. Subjects were debriefed with a postscan questionnaire to assess whether they had learned the experimental contingencies.
Preceding the learning experiment, subjects had to match the 2 CS (450 and 1000 Hz) and the auditory target stimulus (white noise burst) for perceived loudness. Stimuli were presented sequentially and dichotically. Subjects adapted the volume of the 1000-Hz tone to the 450-Hz tone until they perceived them to be of equal loudness. This procedure was repeated 8 times and the results averaged. Subsequently, subjects matched the perceived loudness of the white noise burst to the pure tones, each repeated 4 times. The adapted volumes, as a percentage of the volume of the low tone, were 94.0 ± 6.2% (mean ± SD) for the high tone and 104 ± 4.9% for the white noise burst.
During the experiment, subjects were exposed to alternating blocks of trials in which one of 2 auditory CS (high and low tone) predicted the presence (CS+) or omission (CS−) of a subsequent visual stimulus with a fixed probability of 80% (Figs. 1 and 2). On each trial, a CS was presented (A+) with 50% probability. On 50% of all trials, a visual stimulus was present (V+). Every trial was preceded by a visual TO cue.
Our paradigm thus used a 4-factor design with the following factors for each trial: 1) CS context (CS+ vs. CS−), 2) CS presence (A+ vs. A−), 3) visual outcome (V+ vs. V−), and 4) learning (or time). We used a mixed event and epoch design in which CS type was blocked, whereas the presentation of the CS and visual outcome were randomized (event-related) within blocks. CS+ and CS− blocks were completely balanced so that each block of 10 trials contained 5 CS and 5 visual stimuli. Within each subject, the auditory CS+ and CS− and their probabilistic relation to subsequent visual stimuli were fixed throughout the experiment. The assignment of tones to the 2 CS was counterbalanced across subjects; that is, in half the subjects the high tone served as CS+ (and the low tone as CS−), and vice versa in the other half. Each of the 4 sessions consisted of 20 blocks of 10 trials, interspersed with periods of rest (12 s) in which subjects fixated on a fixation cross. Blocks and sessions were balanced across and within subjects.
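The block structure can be illustrated with a minimal, hypothetical trial generator in Python. It assumes (this is not stated explicitly above) that the 80% contingency was realized exactly within each 10-trial block, i.e., 4 of the 5 CS trials and 4 of the 5 no-CS trials follow the contingency; the function name `make_block` and the random seed are illustrative only.

```python
import random


def make_block(context, rng, n_trials=10, p_valid=0.8):
    """Return a shuffled list of (cs_present, visual_present) pairs for one block.

    context: "CS+" (CS predicts a visual stimulus) or "CS-" (CS predicts its omission).
    Assumes the contingency is realized exactly within each block.
    """
    n_cs = n_trials // 2               # 5 trials with a CS (A+), 5 without (A-)
    n_valid = round(p_valid * n_cs)    # 4 of 5 trials follow the contingency
    n_invalid = n_cs - n_valid         # 1 of 5 trials violates it
    if context == "CS+":
        trials = ([(True, True)] * n_valid + [(True, False)] * n_invalid
                  + [(False, False)] * n_valid + [(False, True)] * n_invalid)
    elif context == "CS-":
        trials = ([(True, False)] * n_valid + [(True, True)] * n_invalid
                  + [(False, True)] * n_valid + [(False, False)] * n_invalid)
    else:
        raise ValueError(f"unknown context: {context}")
    rng.shuffle(trials)                # randomize trial order within the block
    return trials


# One hypothetical session: 20 alternating CS+ / CS- blocks of 10 trials.
rng = random.Random(0)
session = [make_block("CS+" if b % 2 == 0 else "CS-", rng) for b in range(20)]
```

In either context the marginal probabilities of A+ and V+ are both 50%, so only the conditional relation between CS and visual outcome distinguishes the 2 block types.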
To ensure continuous attention to auditory and visual targets per se (but not their statistical associations), subjects performed a concurrent target-detection task. The target stimuli were randomly interspersed between trials and consisted of either a white noise burst or a circle. Target stimuli occurred on average once per block (at most 2 times). In total, 40 auditory and 40 visual target stimuli were presented, randomized within conditions and sessions.
A 3 Tesla Siemens Allegra MRI scanner (Siemens, Erlangen, Germany) was used to acquire T1-weighted fast-field echo structural images and multislice T2*-weighted echo-planar volumes with BOLD contrast (repetition time [TR] = 2.08 s). For each subject, functional data were acquired in 4 scanning sessions of approximately 10 min each, with 306 volumes per session (1224 scans in total per subject). The first 6 volumes of each session were discarded to allow for T1 equilibrium effects. Each functional brain volume comprised 34 axial slices of 2-mm thickness with a 2-mm interslice gap and an in-plane resolution of 3 × 3 mm. The field of view covered the whole brain, except for the cerebellum and brainstem. The total duration of the experiment was approximately 60 min per subject.
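The acquisition figures are internally consistent, as a quick worked check shows (the session duration is derived from the stated TR and volume count):

$$
\begin{aligned}
4 \times 306 &= 1224 \ \text{volumes acquired per subject} \\
1224 - 4 \times 6 &= 1200 \ \text{volumes retained for analysis} \\
306 \times 2.08\ \text{s} &\approx 636.5\ \text{s} \approx 10.6\ \text{min per session}
\end{aligned}
$$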
fMRI data were analyzed using the statistical software package SPM5 (Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.ion.ucl.ac.uk/spm). The 1200 images from each subject were realigned to correct for head movements, corrected for movement-by-distortion interactions (Anderson et al. 2001), spatially normalized to the Montreal Neurological Institute (MNI) template brain, smoothed spatially with a 3-dimensional Gaussian kernel of 8-mm full width at half maximum, and resampled to 3 × 3 × 3 mm voxels. The data were then modeled voxel-wise, using a GLM that included regressors for all experimental trials as well as regressors for the target-detection task. Trial-specific effects were modeled by trains of delta functions convolved with 3 hemodynamic basis functions (a canonical hemodynamic response function, and its temporal and dispersion derivatives). Additionally, the time-dependent associative strengths from the RW model ($\hat{V}_{j,t}$; see eq. 1) and their partial derivatives with respect to learning rate (see next section) were used as parametric modulators of each trial-specific regressor. The data were high-pass filtered (cut-off 128 s) to remove low-frequency signal drifts, and a first-order autoregressive model was used to model the remaining serial correlations (Friston et al. 2002). Contrast images of parameter estimates encoding trial-specific effects were created for each subject and entered separately into voxel-wise one-sample t-tests (df = 15) to implement a second-level random effects analysis. We report regions that survive cluster-level correction for multiple comparisons (family-wise error, FWE) across the whole brain at P < 0.05. Because previous studies demonstrated the role of the striatum and the prefrontal cortex in associative learning (e.g., Fletcher et al. 2001; O'Doherty et al. 2004; Corlett et al. 2004), we performed an additional restricted search in these areas, using anatomical masks generated with the PickAtlas toolbox (Maldjian et al. 2003). Again, we only report activations that survived a small volume correction (SVC) at P < 0.05.
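To make the parametric-modulator construction concrete, here is a minimal Python sketch, not SPM5's implementation. It places stick functions at hypothetical trial onsets, scales them by a mean-centred modulator (e.g., trial-wise associative strengths), convolves them with a double-gamma function approximating SPM's canonical HRF (peak shape 6, undershoot shape 16, undershoot ratio 1/6), and samples the result once per TR. For brevity it uses only the canonical HRF, whereas the analysis above also included its temporal and dispersion derivatives; all function names are illustrative.

```python
import math


def canonical_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (s), approximating SPM's canonical HRF."""
    if t <= 0:
        return 0.0
    g1 = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    g2 = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return g1 - ratio * g2


def parametric_regressor(onsets, modulator, n_scans, tr, dt=0.1, hrf_len=32.0):
    """Mean-centred parametric modulator convolved with the HRF, one value per scan.

    onsets: trial onsets in seconds; modulator: one value per onset
    (e.g., trial-wise associative strengths); tr: repetition time in seconds.
    """
    mean = sum(modulator) / len(modulator)
    n_fine = int(round(n_scans * tr / dt))          # high-resolution time grid
    sticks = [0.0] * n_fine
    for onset, m in zip(onsets, modulator):
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += m - mean                  # delta function scaled by centred value
    kernel = [canonical_hrf(k * dt) for k in range(int(hrf_len / dt))]
    conv = [0.0] * n_fine
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k >= n_fine:
                break
            conv[i + k] += s * h
    # sample the convolved signal at each scan time n * tr
    return [conv[min(int(round(n * tr / dt)), n_fine - 1)] for n in range(n_scans)]
```

Because the modulator is mean-centred, the resulting regressor captures only trial-to-trial variation in associative strength, leaving the average response to the unmodulated trial regressor.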
We used a RW model of associative learning to generate predictors of learning-dependent changes in brain activity (as indexed by the BOLD signal) and inter-regional connectivity over time. The basic principle of this model is that the size of the trial-specific prediction error, that is, the degree of surprise incurred by an event, determines the change in associative strength. From the train of observed events a learning curve was computed and fitted to the fMRI data. Trial-specific cueing was modeled by means of 2 separate components (see Fig. 1): the visual TO cue, which was present on every trial, and the auditory CS per se, which was present on half the trials. This allowed us to model learning effects on trials where no CS was present. In the RW framework, the associative strength of each cue component is updated by the prediction error, and the predicted outcome on trial t, $\hat{V}_{j,t}$, is the sum of the associative strengths of the cue components present:

$$V_{i,t+1} = V_{i,t} + \epsilon_i \, u_{i,t} \left( \lambda_t - \hat{V}_{j,t} \right) \tag{1}$$

$$\hat{V}_{j,t} = \sum_i u_{i,t} \, V_{i,t} \tag{2}$$

On each trial t, equation (1) is calculated separately for each cue component, indexed by i (i.e., the auditory CS and the TO cue), whereas $u_{i,t}$ indicates which of the cue components is actually present on trial t (see the Supplementary Material). $\lambda_t$ indicates the actual outcome on trial t, being 1 for V+ and 0 for V−; $\epsilon_i$ is the learning rate that determines how strongly the prediction error affects the update of the prediction. Separate components are summed in equation (2), where $\hat{V}_{j,t}$ is the summed prediction of whether a visual stimulus will be presented on trial t, and j indexes whether this is a CS+ or CS− trial. (When considered for a single cue per trial, eq. 1 can also be seen as a simple model of Hebbian or associative plasticity. In this context, $V_{i,t}$ encodes the associative strength, which changes according to the second term in eq. 1. This associative term comprises a (presynaptic) input encoding the outcome on any trial, and a (postsynaptic) prediction error.)
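The RW update with 2 cue components can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' code: associative strengths start at zero, only cue components present on a trial are updated, and the cue labels `"CS"`/`"TO"` and the example learning rates (ϵCS = 0.075 and ϵTO = ϵCS/4, the values used in this study; see below) are for illustration.

```python
def rescorla_wagner(cues, outcomes, eps):
    """Rescorla-Wagner learning with several cue components.

    cues: one dict per trial mapping component -> 1 (present) or 0 (absent).
    outcomes: one outcome per trial, 1 for V+ and 0 for V-.
    eps: dict mapping component -> learning rate.
    Returns the summed prediction on each trial (computed before the update)
    and the final associative strength of each component.
    """
    V = {i: 0.0 for i in eps}                    # associative strengths, start at zero
    predictions = []
    for u, lam in zip(cues, outcomes):
        v_hat = sum(V[i] * u[i] for i in V)      # summed prediction over present cues
        predictions.append(v_hat)
        delta = lam - v_hat                      # prediction error
        for i in V:
            V[i] += eps[i] * u[i] * delta        # absent cues (u = 0) are not updated
    return predictions, V


eps = {"CS": 0.075, "TO": 0.075 / 4}
# Idealized CS+ context: CS + TO followed by V+, TO alone followed by V-.
cues = [{"CS": 1, "TO": 1}, {"CS": 0, "TO": 1}] * 1000
outcomes = [1, 0] * 1000
predictions, strengths = rescorla_wagner(cues, outcomes, eps)
```

With this deterministic sequence the strengths approach V_CS ≈ 1 and V_TO ≈ 0: because the 2 components share one prediction error, the CS takes over the predictive value and the TO cue alone comes to predict no visual stimulus.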
A challenge when applying the RW model to our experiment was to determine an appropriate learning rate. In principle, this could be done by fitting the model to behavioral data and using the resulting learning rate to construct regressors for the fMRI analysis. However, our experimental design deliberately precluded behavioral responses; instead, learning could only be assessed neurophysiologically in terms of changes in cortical activity and inter-regional connectivity. Alternative strategies are to choose the learning rate based on principled considerations (e.g., O'Doherty et al. 2004) or to use model comparison (Gläscher and Büchel 2005). Because a previous study led us to expect learning effects in the visual cortex (McIntosh et al. 1998), we adopted the approach of Gläscher and Büchel (2005) and optimized the value of ϵi to best explain putative learning-induced responses within the main area of interest, the visual cortex. Given that our volunteers did not notice the statistical associations (so that learning was presumably slow), and given that another study of perceptual association learning reported small learning rates (ϵCS below 0.1; Gläscher and Büchel 2005), we tested the following values of ϵCS in separate models: 0.01, 0.025, 0.05, 0.075, and 0.1. We found that ϵCS = 0.075 gave the best fit to the data in primary visual cortex for the main contrast of interest (i.e., the 4-way interaction in a random effects second-level analysis); this learning rate was then used for further analysis across the entire brain and for the connectivity analyses described below. Importantly, we used a first-order Taylor expansion around the learning rate ϵCS = 0.075 to make the model less dependent on the particular choice of learning rate and to account for intersubject variability in the shape of the learning curves.
This was implemented by including the partial derivative of the learning curve with respect to the learning rate ϵi as an additional parametric modulator in the GLM for the fMRI data.
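The role of the derivative regressor can be illustrated with a simplified, single-cue version of the RW recursion (a simplification made for brevity; the model above has 2 cue components). Differentiating $V_{t+1} = V_t + \epsilon(\lambda_t - V_t)$ with respect to ϵ gives the recursion $D_{t+1} = (1 - \epsilon) D_t + (\lambda_t - V_t)$ for $D_t = \partial V_t / \partial \epsilon$, so a learning curve at a nearby learning rate is approximately $V_t(\epsilon_0) + (\epsilon - \epsilon_0) D_t$. In a GLM, the weight fitted to the derivative regressor plays the role of $(\epsilon - \epsilon_0)$. The function name below is hypothetical.

```python
def rw_curve_and_derivative(outcomes, eps):
    """Single-cue RW learning curve V_t and its derivative dV_t/d(eps).

    V_{t+1} = V_t + eps * (lam_t - V_t), hence
    D_{t+1} = (1 - eps) * D_t + (lam_t - V_t), with V_0 = D_0 = 0.
    Both lists hold the value before each trial's update.
    """
    v, d = 0.0, 0.0
    curve, deriv = [], []
    for lam in outcomes:
        curve.append(v)
        deriv.append(d)
        d = (1.0 - eps) * d + (lam - v)   # uses the old v, so update d first
        v = v + eps * (lam - v)
    return curve, deriv


outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 5      # illustrative outcome sequence
curve, deriv = rw_curve_and_derivative(outcomes, 0.075)
# First-order Taylor approximation of the learning curve at eps = 0.08
approx = [c + (0.08 - 0.075) * d for c, d in zip(curve, deriv)]
```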
These analyses assumed that the optimal learning rate was identical for CS+ and CS− trials. In additional analyses suggested by our reviewers, we tested this assumption. We examined whether 1) a selective decrease of the learning rate for CS− trials improved our ability to detect learning effects during this trial type, and, more generally, whether 2) trial-type specific tests of the partial derivatives indicated a learning rate that was different from ϵCS = 0.075. As detailed in the Supplementary Material, neither of these analyses provided any evidence for a differential learning rate over stimuli or regions.
Owing to its short duration and small size, the TO cue is less salient than the CS. Because in the RW model the learning rate reflects stimulus properties including salience (Rescorla and Wagner 1972), ϵTO can be assumed to be considerably smaller than ϵCS. In this study, ϵTO was set to one quarter of ϵCS. Violations of this assumption are unlikely to have a dramatic effect, because the inclusion of the derivatives enables the model to cope with deviations from the assumed learning rates (see above). The resulting learning curves are shown in Figure 3 (see Supplementary Fig. 1A for a breakdown of the learning curves with regard to the 2 cue components).
In our factorial design, learning is reflected by time-evolving, context-dependent brain responses to visual stimuli. Specifically, over time, learning should change how differential brain responses to visual stimuli depend on the presence of an auditory CS and on whether it is presented in a CS+ or CS− context. Furthermore, the emergence of differential responses should follow the time course predicted by the RW model. In other words, learning is expressed as a 4-way interaction CS type × CS presence × visual outcome × RW learning. (Note that when the CS is absent on a specific trial, this trial can be assigned unambiguously to the CS+ or CS− factor because this factor was blocked.) The primary goal of our GLM analyses was therefore to test this interaction. To establish which CS was driving this interaction, we also tested the simple (3-way) interactions CS presence × visual outcome × RW learning within each CS type. Finally, to test for responses reflecting the prediction ($\hat{V}_{j,t}$) entailed by the auditory CS, independently of the prediction error elicited by the visual outcome, we tested the simple 3-way interaction CS type × CS presence × RW learning, which is independent of visual outcome.
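The contrasts above can be written as Kronecker products of difference vectors over the 8 RW-modulated (learning) regressors, one per combination of CS type, CS presence, and visual outcome. The sketch below is illustrative: the column ordering (CS type slowest, visual outcome fastest) is hypothetical, and in an actual SPM design matrix the weights would be placed on the corresponding parametric-modulator columns, with zeros elsewhere.

```python
from itertools import product


def kron(*vectors):
    """Kronecker product of 1-D weight vectors (first vector varies slowest)."""
    result = [1]
    for v in vectors:
        result = [a * b for a in result for b in v]
    return result


# Hypothetical ordering of the 8 RW-modulated regressors:
# CS type (slowest) x CS presence x visual outcome (fastest).
levels = list(product(["CS+", "CS-"], ["A+", "A-"], ["V+", "V-"]))

diff = [1, -1]
four_way = kron(diff, diff, diff)          # CS type x presence x outcome (x learning)
simple_cs_plus = kron([1, 0], diff, diff)  # presence x outcome (x learning), CS+ only
prediction = kron(diff, diff, [1, 1])      # CS type x presence (x learning), outcome pooled

for level, weight in zip(levels, four_way):
    print(level, weight)
```

With this ordering, the 4-way contrast gives +1 to the expected cue–outcome combinations in both contexts (A+V+ and A−V− under CS+; A+V− and A−V+ under CS−) and −1 to the unexpected ones, which is why a single test can pick up both directions of a learning effect.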
An important feature of our factorial design is that it enabled us to determine whether the responses of a particular brain region reflected the prediction of the visual stimulus or the prediction error. This is important because one cannot include separate regressors based on predictions and prediction errors in the same design matrix: given the form of the RW equation, predictions and prediction errors are perfectly anticorrelated within a given experimental condition after mean correction (see Supplementary Material for details). However, in a factorial design like ours, such a distinction can be made by analyzing the pattern of parameter estimates across conditions, contrasting conditions that correspond to expected and unexpected cue–outcome combinations. Specifically, our factorial design provided us, in a mirror-symmetric fashion, with 2 expected outcomes and 2 unexpected outcomes for each CS type. For example, on CS+ trials, A+V+ and A−V− trials represented expected cue–outcome combinations (conditional probability = 80%), whereas A+V− and A−V+ trials constituted unexpected cue–outcome combinations (conditional probability = 20%; cf. Fig. 2). This means one can effectively compare expected and unexpected trials (with low and high prediction error, respectively) with a contrast that is orthogonal to the presence or absence of the visual outcome and its prediction. This enabled us to distinguish, voxel by voxel, brain responses that reflected expected visual outcomes from those that represented unexpected or surprising outcomes. During learning, brain regions encoding prediction errors should show increasing activation on trials where the outcome was unexpected according to the learned contingencies and decreasing (or nonchanging) activation on trials where the outcome was expected. We will call such an activation pattern a “prediction-error response”; this activation pattern would be expected if surprise was the driving force for learning.
In this case, surprising events, or prediction errors, signal the need for learning in order to update predictions. This idea is not only a core component of associative learning models (Shanks 1995; Schultz and Dickinson 2000) but is also central to predictive coding theories of perception (Rao and Ballard 1999; Friston 2005), which propose that the brain should concentrate resources on representing surprising sensory events.
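The within-condition collinearity of predictions and prediction errors is easy to verify numerically. In the hypothetical sketch below, the predictions follow an illustrative rising learning curve; because λ is constant within a condition, the prediction error λ − V̂ is an affine function of V̂ with slope −1, so the 2 candidate regressors correlate at −1 whatever the outcome. Only comparisons across conditions, as in the factorial contrasts, can separate them.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Illustrative (hypothetical) predictions on successive trials of one condition,
# rising towards 1 as the association is learned.
predictions = [1.0 - 0.925 ** k for k in range(1, 21)]

correlations = {}
for outcome in (1.0, 0.0):                 # a V+ condition (lambda = 1) or a V- condition (lambda = 0)
    errors = [outcome - v for v in predictions]
    correlations[outcome] = pearson(predictions, errors)
```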
Note that our factorial analysis was not geared towards detecting prediction-error responses only. It was equally capable of finding the opposite activation pattern, that is, increasing activation on trials where the prediction based on the learned contingencies matched the outcome, and decreasing (or nonchanging) activation on trials where the prediction did not match the outcome (cf. Baier et al. 2006). Notably, for our particular design, both types of responses could be identified by the same statistical test, that is, the 4-way interaction CS type × CS presence × visual outcome × learning (see above). Because only the direction of the interaction differs between the 2 types of responses, our factorial design enabled an analysis that simultaneously tested for these 2 aspects of associative learning.
In DCM, the states of multiple interacting brain regions are modeled as a set of coupled bilinear differential equations (Friston et al. 2003). The neuronal states, which represent the neuronal population activity of the modeled brain regions, change in time according to the system's connectivity and experimentally controlled inputs u. These inputs can enter the model in 2 different ways; they can either elicit responses through direct influences on specific regions (“driving inputs,” e.g., sensory inputs) or they can change the strength of connections between regions (“modulatory inputs,” e.g., task effects or learning). The hidden neural dynamics (i.e., not directly observed by fMRI) are modeled by the following bilinear differential equation:

$$\frac{dz}{dt} = \left( A + \sum_{j=1}^{m} u_j B^{(j)} \right) z + C u \tag{3}$$

Here, z is the state vector (with each state variable representing the population activity of one region in the model, in this study the auditory and visual cortex), t is continuous time, and $u_j$ is the j-th input to the modeled system (here the stimuli and learning curve). In this state equation, the A matrix represents the fixed (endogenous) strength of connections between regions, and the $B^{(1)}, \ldots, B^{(m)}$ matrices represent the modulation of these connections by (exogenous) inputs (in this case, learning) as an additive change. Finally, the C matrix represents the influence of exogenous inputs on each area (here the auditory and visual stimuli). Note that DCM allows one to make inferences about changes in effective connections between areas, which do not necessarily correspond to direct anatomical connections but may be mediated by intermediary regions.
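The bilinear neuronal state equation of DCM can be illustrated by simple Euler integration for a hypothetical 2-region A1/V1 model with 3 inputs: an auditory stimulus, a visual stimulus, and a modulatory learning input acting on the A1 → V1 connection. All parameter values are made up, and the hemodynamic forward model and Bayesian estimation used in DCM are omitted; the sketch only shows how a B matrix changes effective coupling. Matrix entries follow the convention `A[target][source]`.

```python
def simulate_bilinear(A, B, C, inputs, dt=0.01):
    """Euler integration of dz/dt = (A + sum_j u_j B[j]) z + C u.

    A: n x n endogenous connections; B: list of m n x n modulatory matrices;
    C: n x m driving-input weights; inputs: one length-m input vector per step.
    Returns the list of state vectors, one per time step.
    """
    n = len(A)
    z = [0.0] * n
    states = []
    for u in inputs:
        dz = []
        for r in range(n):
            total = 0.0
            for c in range(n):
                # effective connection c -> r: endogenous plus input-dependent change
                coupling = A[r][c] + sum(u[j] * B[j][r][c] for j in range(len(B)))
                total += coupling * z[c]
            total += sum(C[r][j] * u[j] for j in range(len(u)))
            dz.append(total)
        z = [z[r] + dt * dz[r] for r in range(n)]
        states.append(z)
    return states


# Hypothetical 2-region model: index 0 = A1, index 1 = V1.
# Inputs per step: [auditory stimulus, visual stimulus, learning modulator].
A = [[-1.0, 0.0],                # self-decay in A1; no V1 -> A1 connection
     [0.2, -1.0]]                # A1 -> V1 connection; self-decay in V1
B = [[[0.0, 0.0], [0.0, 0.0]],   # auditory input modulates nothing
     [[0.0, 0.0], [0.0, 0.0]],   # visual input modulates nothing
     [[0.0, 0.0], [0.5, 0.0]]]   # learning input modulates A1 -> V1
C = [[1.0, 0.0, 0.0],            # auditory stimulus drives A1
     [0.0, 1.0, 0.0]]            # visual stimulus drives V1


def run(modulator):
    # 0.5-s auditory pulse, no visual stimulus, constant modulatory input
    inputs = [[1.0 if k < 50 else 0.0, 0.0, modulator] for k in range(500)]
    return simulate_bilinear(A, B, C, inputs)


peak_without = max(state[1] for state in run(0.0))
peak_with = max(state[1] for state in run(1.0))
```

In this toy model, a positive modulatory input strengthens the A1 → V1 coupling, so the same auditory pulse evokes a larger V1 response.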
In DCM, the hidden neuronal dynamics described by equation (3) are linked to predicted BOLD responses by a hemodynamic forward model (Friston et al. 2003). Given measured BOLD responses, maximum a posteriori estimates of the parameters in equation (3) can be obtained through an optimization scheme based on variational Bayes (Friston et al. 2003).
The goal of the present DCM analysis was to explain the (3-way) simple interaction CS presence × visual outcome × RW learning for CS+ trials in V1 (see SPM findings in the Results section) by a simple model in which the strength of the A1 → V1 connection was modulated as a function of the RW predictions (i.e., learning curves; Fig. 3). Representative A1 time series were chosen by testing for the main effect of CS presence, and V1 time series were selected by testing for the simple interaction described above. (The goal of DCM is to explain regional effects [as detected in a voxel-wise GLM analysis] in terms of interregional connectivity and its experimentally induced changes. This puts congruence constraints on the contrast used to identify a regional time series and the mechanisms in a DCM that are proposed to model this time series. Therefore, different contrasts are typically required for selecting time series representing the different areas in a model; cf. Stephan, Harrison, et al. 2007.) We did not model the 4-way interaction with DCM because the SPM analysis showed that the learning effect was driven by the CS+ (see Results section).
As the exact locations of activation maxima varied over subjects, we ensured the comparability of our models across subjects by using combined anatomical–functional constraints in selecting the subject-specific time series (c.f. Stephan, Marshall, et al. 2007). Specifically, we thresholded the subject-specific SPMs at P < 0.05 and chose the local maximum within 8 mm of the group activation maxima in primary auditory cortex (A1) and primary visual cortex (V1) as inferred by a probabilistic cytoarchitectonic atlas in MNI space (Eickhoff et al. 2005). As a summary time series, we computed the first eigenvector across all suprathreshold voxels within a radius of 4 mm around the chosen local maximum. Overall, we were able to extract time series in 14 out of 16 subjects. In 2 subjects, V1 could not be defined due to the lack of a significant interaction that met the anatomical and functional criteria described above. These 2 subjects were excluded from the DCM analysis.
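A summary time series of this kind (the first eigenvector across voxels) can be approximated as follows. This is a simplified sketch, not SPM's own implementation: it mean-centres each voxel, finds the leading eigenvector of the voxel-by-voxel covariance by power iteration, projects the data onto it, and fixes the arbitrary sign so that the result correlates positively with the regional mean. SPM additionally rescales the eigenvariate, which is omitted here; the function name is hypothetical.

```python
import math


def first_eigenvariate(Y, n_iter=500):
    """Project mean-centred data onto its first principal component.

    Y: list of time points, each a list of voxel values (T x V).
    Returns one summary value per time point.
    """
    T, V = len(Y), len(Y[0])
    means = [sum(row[v] for row in Y) / T for v in range(V)]
    X = [[row[v] - means[v] for v in range(V)] for row in Y]       # centre each voxel
    cov = [[sum(X[t][a] * X[t][b] for t in range(T)) for b in range(V)]
           for a in range(V)]                                       # V x V scatter matrix
    w = [1.0] * V                                                   # starting vector
    for _ in range(n_iter):                                         # power iteration
        w = [sum(cov[a][b] * w[b] for b in range(V)) for a in range(V)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    series = [sum(X[t][v] * w[v] for v in range(V)) for t in range(T)]
    mean_series = [sum(row) / V for row in X]
    if sum(a * b for a, b in zip(series, mean_series)) < 0:         # resolve sign ambiguity
        series = [-s for s in series]
    return series
```

Power iteration converges to the leading eigenvector unless the starting vector happens to be orthogonal to it; for voxels sharing a common signal, a vector of ones is a reasonable start.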
The question addressed by DCM was whether learning effects in V1 could be explained by changes in the connectivity of a simple auditory–visual network. Our DCMs modeled the entire time series, that is, data from all trials and conditions, and sought to explain regional activations by condition-dependent changes in connectivity. We tested 3 simple models that could potentially account for the interaction we found in V1. These models were fitted separately to each subject's data and compared using Bayesian model selection (Penny et al. 2004). In these models, auditory and visual stimuli from all trials elicited activity directly in their respective primary sensory areas (see Fig. 4). These driving inputs were modeled as individual events. The first model only had a connection from A1 to V1, whereas the second and third models included the reciprocal connection (see Fig. 5). The A1 → V1 connection in models 1 and 2, and the V1 → A1 connection in model 3, were modulated by the Hadamard product (point-wise multiplication) of the RW associative strength and a vector encoding visual outcome (1 for visual stimulus present, −1 for visual stimulus absent) during CS+ trials. In the first 2 models, this modulatory effect corresponds to the interaction of the auditory CS+ prediction with the visual outcome and models a learning-dependent contribution from CS+ responses in auditory cortex to visual cortex responses that depends on whether the visual stimulus was present or not (cf. a prediction error that rests on top-down signals from auditory areas). In the third model, which represented a control suggested by one of our reviewers, this modulatory effect acted on the reverse connection, V1 → A1.
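The modulatory input described above can be sketched as a trial-wise vector. Two assumptions are made that the text does not spell out: the input is zero on CS− trials, and it is given as one value per trial, whereas in a DCM it would be placed at the corresponding trial onsets. The function name is hypothetical.

```python
def modulatory_input(rw_strength, visual_present, is_cs_plus):
    """Trial-wise modulatory input: RW associative strength multiplied
    point-wise (Hadamard product) by +1 for V+ and -1 for V- trials,
    restricted to CS+ trials (assumed zero on CS- trials)."""
    return [v * (1.0 if vis else -1.0) if cs_plus else 0.0
            for v, vis, cs_plus in zip(rw_strength, visual_present, is_cs_plus)]


# Illustrative values for 4 trials (hypothetical).
u_learning = modulatory_input([0.2, 0.5, 0.7, 0.9],
                              [True, False, True, False],
                              [True, True, False, True])
```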
The postscan debriefing questionnaire showed that none of the subjects had become aware of the contingencies between the auditory and visual stimuli. Prior to the fMRI data analysis, we verified subjects’ performance on the target-detection task. On average, subjects responded to 93 ± 3% of the target stimuli. Following Gläscher and Büchel (2005), we determined an optimal learning rate for the RW model by evaluating the primary contrast of interest (i.e., the 4-way interaction in a random effects second-level analysis) under different learning rates in the primary visual cortex, as defined by a probabilistic cytoarchitectonic atlas (Eickhoff et al. 2005). Model fits under 5 different learning rates suggested that ϵ_CS = 0.075 was the optimal learning rate (see Fig. 3 and Methods section for details).
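For reference, the single-cue RW update applied on each trial is V ← V + ϵ(λ − V), where λ is 1 when the outcome occurs and 0 when it is omitted. The sketch below returns trial-wise predictions and prediction errors, which are the kind of regressors one would enter into a learning-rate comparison like the one described above. The function name is hypothetical. Only ϵ = 0.075 comes from the text; the other rates in the sweep are illustrative placeholders. The sketch does not reproduce the group-level model fitting.

```python
def rescorla_wagner(outcomes, learning_rate, v0=0.0):
    """Trial-wise RW predictions and prediction errors for a single cue.

    outcomes: sequence of lambda_n values (1.0 = outcome occurred, 0.0 = omitted).
    predictions[n] is the associative strength *before* the update on trial n.
    """
    v = v0
    predictions, errors = [], []
    for lam in outcomes:
        delta = lam - v              # prediction error on this trial
        predictions.append(v)
        errors.append(delta)
        v += learning_rate * delta   # RW update
    return predictions, errors


# Illustrative sweep: only 0.075 is the rate reported in the text.
outcomes = [1.0] * 40
curves = {eps: rescorla_wagner(outcomes, eps)[0] for eps in (0.025, 0.05, 0.075, 0.1, 0.2)}
```

Smaller rates produce slower, smoother learning curves. The choice of rate therefore changes the shape of any learning-dependent regressor built from these values.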
First, we examined the 4-way interaction CS type × CS presence × visual outcome × RW learning. We found learning-dependent responses in the primary visual cortex and putamen that survived whole-brain correction for multiple comparisons (see Fig. 5A,B). To characterize the nature of this interaction, we tested the simple interaction (CS presence × visual outcome × RW learning) within each CS type. This showed that the 4-way interaction was driven mainly by learning during the CS+ blocks (see Supplementary Fig. 1B for the parameter estimates). As shown in Figure 5A,B, testing the simple interaction for CS+ trials yielded results in the visual cortex and the putamen that were almost identical to those of the 4-way interaction (see also Table 1). In contrast, no evidence of learning, that is, no significant interaction of CS presence and outcome with learning, was found for CS− trials.
The nature of the simple 3-way interaction was such that V1 and the putamen showed an increased response when an expected visual stimulus was omitted, or when an unexpected visual stimulus was presented (i.e., A+V− and A−V+ trials). Critically, this response to surprising visual outcomes increased over time as the association was learned, following the form of the RW learning curve. Conversely, V1 responses to predicted stimuli diminished during learning. The putamen showed the same pattern of responses bilaterally; this activation extended into the insula bilaterally (see Table 1).
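One way to express the simple 3-way interaction as a parametric regressor has three steps. First, effect-code CS presence and visual outcome as ±1. Second, take their negated product, so that surprising combinations (A+V−, A−V+) score +1 and predicted ones (A+V+, A−V−) score −1. Third, scale by a mean-centred learning curve. The sketch below assumes this coding and uses a hypothetical function name. It illustrates the logic of the contrast rather than the exact design matrix used in the study.

```python
def interaction_regressor(cs_present, visual_present, learning):
    """Effect-coded CS presence x visual outcome x learning regressor.

    +1 for surprising trials (A+V-, A-V+), -1 for predicted trials (A+V+, A-V-),
    multiplied by a mean-centred learning curve, so the regressor grows over
    time for surprising trials and shrinks for predicted ones.
    """
    mean_l = sum(learning) / len(learning)
    out = []
    for a, v, l in zip(cs_present, visual_present, learning):
        a_code = 1.0 if a else -1.0
        v_code = 1.0 if v else -1.0
        out.append(-a_code * v_code * (l - mean_l))
    return out
```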
Because previous studies have implicated the right DLPFC in prediction (error) processing (Fletcher et al. 2001; Corlett et al. 2004), we used an anatomically defined fronto-striatal mask to test the 3-way interaction CS type × CS presence × RW learning, which characterizes responses to the prediction entailed by the auditory CS, independent of the visual outcome. During learning, the right DLPFC became increasingly active when a visual stimulus was predicted compared to when it was not; activity was higher for CS+A+ and CS−A− trials compared with CS+A− and CS−A+ trials (compare the probabilities in Fig. 2). As above, we characterized the nature of the 3-way interaction by testing the associated simple interactions, confirming it was also driven by CS+ trials (Fig. 4C). The same pattern of activation was found in the left putamen, but this activation did not survive correction for multiple comparisons.
Because the learning effect was driven mainly by CS+ blocks, we focused on changes in connectivity between auditory and visual cortices during incidental learning of the predictive attributes of CS+ trials (see Fig. 6). Bayesian model comparison showed that a DCM with a single connection from A1 to V1 (model 1) was superior to alternative models with reciprocal connections (group Bayes factor in favor of model 1: 2.1 × 10¹⁷ and 2.2 × 10¹⁸ when compared with model 2 and model 3, respectively). Across subjects, the A1 → V1 connection in the optimal model had an average strength of 0.10 s⁻¹ (p = 0.003, df = 13, t = 3.57). During CS+ trials, this connection was significantly modulated by learning, depending on whether the visual stimulus was present or not (i.e., CS+ × (V+ vs. V−) × ϕ in Fig. 6). Note that the modulatory variable in the DCM corresponds to the interaction of the auditory prediction with the visual outcome during CS+ trials. It accounts for a learning-dependent contribution from CS+ responses in auditory cortex to visual cortex responses that depends on whether the visual stimulus was present or not (c.f., a prediction error mediated by top-down signals from auditory areas). Quantitatively, the strength of this modulation was −0.01 s⁻¹ (p = 0.028, df = 13, t = 2.49). This corresponds to learning-induced changes in connectivity ranging from 2% (for CS+A− trials) to 8% (for CS+A+ trials) (Fig. 6). (As shown by eq. 3, the overall strength of a connection, given a single modulatory parameter, is the sum of the intrinsic connection strength [A] and the modulatory parameter [B] multiplied by its associated input [u]. In the present case, the asymptotic magnitude of the input function is 0.8 for CS+A+ trials and 0.2 for CS+A− trials [see Fig. 5].)
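The percentages quoted above follow directly from the parenthetical arithmetic. The sketch below uses the group means reported in the text (A = 0.10 s⁻¹, B = −0.01 s⁻¹) and the asymptotic input magnitudes (0.8 and 0.2). The function name is hypothetical.

```python
def effective_strength(a_intrinsic, b_mod, u):
    """Strength of one connection in a bilinear DCM with a single modulatory input: A + B * u."""
    return a_intrinsic + b_mod * u


A, B = 0.10, -0.01  # group-mean intrinsic and modulatory parameters (s^-1) from the text

for label, u in (("CS+A+", 0.8), ("CS+A-", 0.2)):
    change = effective_strength(A, B, u) - A
    print(label, f"{100 * change / A:+.0f}%")
# prints:
# CS+A+ -8%
# CS+A- -2%
```

The input u carries the ±1 visual-outcome code. The same magnitudes therefore apply with the opposite sign (an increase) on trials in which the visual stimulus is absent.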
Critically, the negative sign of the modulatory parameter reflects the nature of the visual responses to auditory afferents under CS+ trials: V1 responses to predicted visual stimuli diminished during learning, and the DCM explained this through a decrease in the strength of the A1 → V1 connection. This is exactly consistent with an increase in the “explaining away” of predicted visual input under predictive coding; in other words, if top-down predictions (see eq. 2) from auditory cues decrease the amplitude of V1 prediction error, a better prediction corresponds to a decrease in effective connectivity. Conversely, V1 responses to unpredicted (i.e., absent) visual stimuli increased during learning. This was modeled in the DCM through an increase in the A1 → V1 connection strength; again, this is consistent with an increase in V1 prediction-error amplitude when predictions are violated. In summary, A1 → V1 influences depended on whether the visual outcome was expected or surprising and were consistent with an “explaining away” role. The emergence of this effect conformed to the learning curve provided by the RW model.
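The sketch below makes the link between connection strength and V1 responses concrete. It integrates the neuronal part of a 2-region bilinear model, dx/dt = (A + u·B)x + Cu, with only an A1 → V1 connection (as in model 1). Several assumptions are made for illustration only:

- hemodynamics are omitted;
- the self-connections (−1 s⁻¹), unit input weights, pulse timing, and step size are arbitrary choices;
- the function names are hypothetical.

With no direct visual drive, V1 activity scales linearly with the effective A1 → V1 strength. A modulation of −0.01 × 0.8 therefore reduces it by 8%.

```python
def simulate_v1(a21, b21, u_mod, drive_a1, drive_v1, dt=0.01, t_end=4.0, self_conn=-1.0):
    """Forward-Euler integration of a 2-region bilinear neuronal model.

    A1 receives drive_a1(t); V1 receives drive_v1(t) plus input from A1 through
    the connection a21 + u_mod * b21 (the only modulated connection).
    Returns the V1 activity trace.
    """
    x_a1 = x_v1 = 0.0
    trace = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        dx_a1 = self_conn * x_a1 + drive_a1(t)
        dx_v1 = self_conn * x_v1 + (a21 + u_mod * b21) * x_a1 + drive_v1(t)
        x_a1 += dt * dx_a1
        x_v1 += dt * dx_v1
        trace.append(x_v1)
    return trace


def auditory_pulse(t):
    return 1.0 if t < 0.1 else 0.0  # brief auditory event at trial onset


def no_visual(t):
    return 0.0  # isolate the cross-modal contribution to V1


baseline = sum(simulate_v1(0.10, -0.01, 0.0, auditory_pulse, no_visual))
learned = sum(simulate_v1(0.10, -0.01, 0.8, auditory_pulse, no_visual))
print(learned / baseline)  # approximately 0.92
```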
McIntosh and colleagues showed that after a predictive relationship between an auditory stimulus and a visual stimulus had been learned, the auditory stimulus alone was able to evoke responses in the visual cortex (McIntosh et al. 1998). The current study extended this work, pairing a visual stimulus with a predictive auditory stimulus in a 4-factor design, with the factors CS type (CS+, CS−), CS presence (A+, A−), visual stimulus presence (V+, V−), and learning (over time). Both CS+ and CS− blocks were exactly balanced in terms of sensory stimulation, so that the a priori probabilities of the auditory CS and of the visual stimulus occurring on a given trial were always 50%. Critically, the volunteers did not make any responses to the stimuli whose associations were being learned; instead, they performed a target-detection task on unrelated stimuli. Our factorial design enabled us 1) to characterize changes in neurophysiological responses due to learned associations that were incidental to behavior, and 2) to investigate whether activity in specific brain areas, and the connection strengths amongst them, reflected a match between predictions and outcome or prediction errors, respectively.
Our results demonstrate that during incidental learning of audio-visual associations changes in both regional activity and underlying connectivity reflect prediction errors. Furthermore, we show that learning-dependent responses in visual cortex can be elicited, even in the absence of visual stimuli. This finding can be explained by changes in top-down influences from auditory regions that are consistent with predictive coding models of perceptual inference.
The goal of this study was not to pinpoint the exact mathematical form of learning by comparing different models of associative learning. Instead, we focused on changes in regional activity and interregional connectivity that could be explained by a specific learning model, namely the RW model. The RW model is a generic and well-established model of associative learning that has been successful in modeling a wide range of learning processes (Rescorla and Wagner 1972; Schultz and Dickinson 2000; Pearce and Bouton 2001). We chose this model because it is the simplest learning model appropriate for our particular paradigm. In the absence of interactions among multiple cues per trial, the RW model is mathematically equivalent to a Hebbian model of associative learning (Montague and Berns 2002). A crucial aspect of our paradigm, however, is that on each trial the net prediction resulting from 2 interacting cue components (the auditory CS and the visual TO cue) must be considered (see Methods section for details). This excludes the use of any associative learning model that cannot accommodate cue interactions (e.g., Hebbian models). In contrast, the RW model accommodates this aspect gracefully. Another learning model, TD learning, can also deal with multiple cues and their temporal relationships; however, under our design with temporally overlapping cue and outcome, the TD model is effectively equivalent to the simpler RW model. Finally, the associative learning models of Pearce and Hall (1980) and Mackintosh (1975) assume that prediction errors affect the amount of attention that is allocated to stimuli and that the more attention is allocated to a specific stimulus, the more strongly it becomes associated with an outcome or reinforcer. This is not relevant to our experimental paradigm, in which attention is actively directed away from the stimuli whose associations are learned.
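The role of cue interactions can be illustrated with a classic two-phase simulation. In the RW rule, every cue present on a trial is updated by a shared error: λ minus the summed prediction of all present cues. The contrasting rule updates each cue against λ alone and ignores the other cue. It stands in here for the non-interacting models mentioned above and is not a specific published Hebbian model. The phase lengths, learning rate, and function names are illustrative assumptions.

```python
def rw_compound(trials, lr, n_cues=2):
    """Rescorla-Wagner with cue interaction: all cues present on a trial share
    the error lambda - (sum of their current associative strengths)."""
    v = [0.0] * n_cues
    for cues, lam in trials:
        delta = lam - sum(v[i] for i in cues)
        for i in cues:
            v[i] += lr * delta
    return v


def independent_update(trials, lr, n_cues=2):
    """Per-cue delta rule that ignores what the other cues already predict."""
    v = [0.0] * n_cues
    for cues, lam in trials:
        for i in cues:
            v[i] += lr * (lam - v[i])
    return v


# Phase 1: cue 1 alone predicts the outcome; phase 2: cues 0 and 1 together.
trials = [((1,), 1.0)] * 20 + [((0, 1), 1.0)] * 20
print(rw_compound(trials, 0.5))         # cue 0 gains almost nothing (blocking)
print(independent_update(trials, 0.5))  # cue 0 approaches 1
```

Under the shared-error rule, cue 0 acquires almost no associative strength because cue 1 already predicts the outcome (blocking). Under the per-cue rule, cue 0 learns as if it were presented alone.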
The RW model has one problematic limitation, however: as detailed in the supplementary materials, its equation uses both predictions and prediction errors, which are perfectly correlated under mean-correction. In situations where mean-correction is mandatory (e.g., when these quantities are used to form interaction terms), this makes it impossible to disambiguate or interpret their contributions to a dependent variable. However, the factorial design in our study allows us to circumvent this problem, as it comprises conditions that correspond to congruent and incongruent prediction/outcome combinations, respectively. Analyzing the 4-way interaction between our experimental factors, we found that responses in the primary visual cortex and the putamen were sensitive to surprising events; over time, these areas became significantly more active when presented with a surprising cue–outcome combination. Learning was stronger for the CS+ blocks than for the CS− blocks, which is in line with previous behavioral evidence (Wasserman et al. 1993; Fletcher et al. 2001). Previous fMRI studies in humans have demonstrated that BOLD activity in the striatum is correlated with (signed) prediction errors during reinforcement learning (O'Doherty et al. 2003; McClure et al. 2003; O'Doherty et al. 2004; Seymour et al. 2004; Jensen et al. 2007; Menon et al. 2007) and other associative learning tasks (Corlett et al. 2004). In these studies, the learned associations, and the sign of the resulting prediction errors, were of direct relevance for behavior. The current study shows that the putamen is sensitive to unexpected outcomes even when the cue-stimulus association is learned incidentally and has no relevance to behavior. However, in contrast to the previous studies, the pattern of putamen activity does not appear to be sensitive to the direction of the prediction error, only to its amplitude. This difference may reflect the fact that learning was perceptual as opposed to operant.
In other words, the occurrence of an unpredicted or surprising event may play the role of negative reward, irrespective of whether the surprising event entailed the presence or absence of a stimulus. This issue will be discussed further in the section on predictive coding below.
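The collinearity between RW predictions and prediction errors can be seen directly. Within any set of trials that share the same outcome λ, the prediction error δ = λ − V is a linear function of the prediction V. After mean-correction, the two regressors therefore differ only in sign. The minimal check below assumes a single cue that is reinforced on every trial (λ = 1) and uses the learning rate reported earlier.

```python
import math


def pearson(x, y):
    """Pearson correlation; mean-centres both inputs first."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


lr = 0.075
v, predictions, errors = 0.0, [], []
for _ in range(30):  # lambda = 1 on every trial (outcome always present)
    predictions.append(v)
    errors.append(1.0 - v)  # prediction error delta = lambda - V
    v += lr * (1.0 - v)

print(pearson(predictions, errors))  # approximately -1
```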
Our finding that learning-induced responses in primary visual cortex and the putamen reflected prediction errors accords with a basic principle emerging from many previous studies: prediction errors, or surprise, constitute a driving force for learning because they signal the need for learning in order to update predictions (Shanks 1995; Schultz et al. 1997; Schultz and Dickinson 2000). Although the role of prediction errors has been mainly explored for reinforcement learning so far, there is growing evidence that prediction errors may be equally important for learning statistical relationships that are affectively neutral and behaviorally irrelevant. In other words, the same mechanisms that optimize the learning of stimulus–response links may operate during the perceptual learning of stimulus–stimulus associations (Rao and Ballard 1999; Friston 2005). Evidence that organisms learn predictive associations between initially neutral stimuli is seen in classical conditioning effects such as sensory preconditioning (Brogden 1939). Some forms of sensory learning also exhibit such features, for example, the mismatch negativity (MMN) paradigm, in which responses to sensory stimuli decrease with predictability (Friston 2005; Baldeweg 2006), regardless of whether stimuli are attended. A mechanism similar to predictive coding has been proposed in the motor domain for cancellation of self-generated events (Wolpert et al. 1995; Blakemore et al. 1998; Shergill et al. 2005). Moreover, the learning of predictive relationships that are affectively neutral and task-irrelevant may engage similar computational and neural mechanisms as those for predicting significant events (Zink et al. 2006; Wittmann et al. 2007).
The results of the present study support the notion that the role of prediction errors in learning transcends the simple reinforcement of stimulus–response links and plays a more pervasive and general role in various forms of learning. Indeed, a hallmark of adaptive systems is their ability to minimize surprising exchanges with their environment (Friston et al. 2006). This entails adjustments to their internal models of the environment so that potentially surprising events can be predicted. Almost universally, this adjustment involves changes in the system's connections; it is therefore perhaps a little surprising that most previous imaging studies on learning and conditioning have exclusively searched for brain areas whose activity correlated with specific variables of a particular learning model (e.g., prediction or prediction error), but have not investigated how these variables change interactions among areas (but see McIntosh et al. 1998; Büchel et al. 1999). Functional interactions are central to the physiological implementation of learning; it has long been suggested that plasticity in connection strengths between neurons underlies the learning of predictive associations (Hebb 1949). Put simply, 2 neural units encoding associated entities increase their synaptic connections to encode the learned associative strength of the stimuli. More precisely, for RW and similar “caching” models (Daw et al. 2005) the connection strength at time t should carry the predicted association at time t (McLaren et al. 1989; Schultz and Dickinson 2000). This hypothesis requires models of effective connectivity, in which connection strengths vary as a function of the associative strength predicted by the learning model. To our knowledge, the present study has implemented this approach for the first time, modeling how learning, as described by a RW model, modulates the effective connectivity, as assessed by a DCM, between primary auditory and visual areas.
In accordance with the considerations above, we investigated whether the learning-related changes in visual cortex responses could be explained by a simple model of effective connectivity, in which the strength of the A1 → V1 connection changed as a function of the associative strength predicted by the RW model. We modeled observed responses in the primary visual cortex by means of a simple 2-area DCM in which activity in the visual cortex was modeled by 2 components: 1) a direct effect of visual stimulation and 2) a modulation of the A1 → V1 connection by the interaction of the time-evolving prediction with the visual input (in CS+ blocks; see Fig. 6). Across subjects, this DCM showed a significant change in the strength of the A1 → V1 connection congruent with the pattern of responses in V1: the A1 → V1 connection strength increased on trials where the visual outcome did not match the auditory prediction and decreased on trials where prediction and outcome matched. In other words, the learning-induced changes in A1 → V1 connection strength reflected the same pattern of surprise or prediction errors as the regional activity in V1. This demonstrated that the response of V1 to visual stimuli was modulated by learning-dependent changes in top-down auditory influences that were consistent with the notion of predictive coding, a general framework for perceptual inference and learning that is discussed in the next section (Friston 2005).
Although connections in models of effective connectivity do not need to correspond to monosynaptic anatomical connections, it is of interest to note that the surprise-related response in visual cortex appears to be in the peripheral visual field (Fig. 3A), and anatomical connections from primary auditory cortex to peripheral visual cortex have been demonstrated in recent monkey studies (Falchier et al. 2002; Rockland and Ojima 2003). Additionally, numerous fMRI studies have demonstrated that auditory stimulation or auditory attention affect activity in visual cortices during simultaneous processing of visual stimuli (e.g., McIntosh et al. 1998; Baier et al. 2006; Watkins et al. 2006).
In previous neurophysiological studies of reinforcement learning, a negative prediction error, in the form of unexpected absence of a reinforcer (e.g., a reward), often led to a decrease in neuronal or BOLD activity (Schultz 1998; McClure et al. 2003; Tobler et al. 2007). Such directed excursions are thought to reflect the fact that the prediction error is a signed quantity: it signals not just that predictions need to be updated, but in which direction. In contrast, in our study we found an increase in striatum and visual cortex activity not only for unexpectedly presented stimuli, but also for the unexpected absence of a stimulus. Similarly, the strength of the A1 → V1 connection decreased whenever the visual outcome was expected, and it increased whenever the outcome was surprising.
A useful perspective that explains our 2 main findings, the implicit encoding of surprise by V1 responses and its mediation by learning-dependent changes in input from the auditory cortex, is provided by the framework of predictive coding. Predictive coding posits a hierarchy of connected brain areas in which each level strives to attain a compromise between information about sensory inputs provided by the level below and predictions (or priors) provided by the level above (Rao and Ballard 1999; Murray et al. 2002; Friston 2003; Summerfield et al. 2006). The central learning principle is to establish a good model of the world, which is achieved by changing connection strengths such that prediction errors are minimized at all levels of the hierarchy. The hierarchy of a predictive coding architecture is often defined anatomically (in terms of forward and backward connections) and within one sensory modality, but it is equally possible to examine cross-modal predictive coding relationships (c.f. von Kriegstein and Giraud 2006). In the present study, a temporal hierarchical relation between auditory and visual areas is induced by presenting the auditory cue prior to the visual stimulus.
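The “compromise” between bottom-up input and top-down prediction can be written as gradient descent on precision-weighted prediction errors. The sketch below assumes a single hidden state under Gaussian assumptions; the precisions, step size, and function name are illustrative. The estimate settles at the precision-weighted average of the sensory input and the prior prediction. At that point the two weighted errors balance.

```python
def infer_state(sensory, prior_mean, sensory_precision, prior_precision,
                rate=0.05, n_steps=500):
    """Gradient descent on F = 0.5*pi_s*(s - x)**2 + 0.5*pi_p*(x - mu)**2.

    Returns the state estimate and the bottom-up and top-down prediction errors.
    Stable when rate * (pi_s + pi_p) < 2.
    """
    x = prior_mean
    for _ in range(n_steps):
        err_bottom_up = sensory - x     # input not explained by the current estimate
        err_top_down = x - prior_mean   # deviation of the estimate from the prediction
        x += rate * (sensory_precision * err_bottom_up - prior_precision * err_top_down)
    return x, sensory - x, x - prior_mean


x, e_s, e_p = infer_state(sensory=1.0, prior_mean=0.0, sensory_precision=1.0, prior_precision=3.0)
print(round(x, 6))  # 0.25 = (1 * 1.0 + 3 * 0.0) / (1 + 3)
```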
Predictive coding may be a general principle of brain function in which statistical relationships in the world are monitored, even when they are not attended and not relevant for ongoing behavior. This would allow the brain to ignore predictable and therefore uninteresting events in the environment, thereby enhancing the saliency of unexpected events. A good example of this notion is given by the mismatch negativity (MMN), the difference between the event-related potentials to unexpected “deviant” and predictable “standard” stimuli (Naatanen et al. 2001). Importantly, the relationship between the MMN and learning was not established on the basis of behavioral data; in fact, it was initially not even recognized (Naatanen et al. 1978). This relationship was only subsequently inferred from striking relationships between the probability of deviants and neurophysiological time series (e.g., Csepe et al. 1987; Pincze et al. 2002). Current theories of MMN, which interpret it as a paradigmatic example of learning based on predictive coding (Friston 2005; Baldeweg 2006), have recently received empirical support from DCM studies of electroencephalographic measurements (David et al. 2006; Garrido et al. 2007). These studies demonstrated that MMN can be understood as a prediction-error signal, which results from deviant-induced changes in inter-regional connection strengths. A similar conclusion is offered by the present study. Here, we found that, at least during CS+ trials, BOLD responses in area V1 increased when the prediction provided by the auditory cue did not match the subsequent visual stimulus (analogous to MMN elicited by deviants). This surprise signal progressively increased as the predictive properties of the auditory cue were learnt. Moreover, in direct analogy to DCM studies of the MMN (David et al. 2006; Garrido et al. 2007), we found a decrease in the A1 → V1 connection strength on “standard” trials (where the prediction by the auditory cue was correct), and an increase on “deviant” trials where the visual outcome did not match the prediction by the auditory cue. In the context of predictive coding, learning involves a more efficient suppression of sensory events, which is manifest by an apparent reduction in evoked responses, mediated by top-down predictions (which explain away bottom-up sensory afferents). Within the framework of our bilinear DCM, this is modeled as a decrease in top-down effective connectivity for visual stimuli that match the current prediction.
We conclude this article by discussing a number of limitations of the present study. First, because we wished to study brain responses to stimulus associations that were irrelevant to behavior, we did not obtain behavioral evidence for learning. Instead, as with the MMN paradigm described above, learning is characterized neurophysiologically as a change in activity over time. We are currently conducting similar experiments with stimuli that do require a behavioral response, providing us with a behavioral assessment of the learning process. It might be useful to emphasize that a neurophysiological characterization of incidental associative learning processes only requires that the statistical associations between the CS/US stimuli are irrelevant for task performance. In contrast, it is not essential that the CS and US stimuli themselves are behaviorally irrelevant. In fact, in our experiment these stimuli have some behavioral relevance insofar as they constitute distractors to which responses must be suppressed.
A second limitation is that the magnitude of the learning effects (i.e., changes in A1 → V1 connection strength in the range of 2–8%) was rather modest at the single-subject level. This is likely to be due to the incidental nature of the learning in the present study, with attention being directed away from stimulus associations and none of the subjects noticing the contingencies. However, the expression of these learning effects was highly consistent across subjects.
Finally, the dynamic causal model presented here does not make any assumptions about where in the brain the predicted associative strength is calculated, that is, which brain area exerts the modulatory influence onto the A1 → V1 connection. Given the responses that we observed in the putamen, it is possible that the modulation of the A1 → V1 connection is mediated via this region. Testing this hypothesis, however, requires the inclusion of nonlinear terms in the neuronal state equation of DCM, which goes beyond its bilinear mathematical framework. However, very recently, there has been methodological progress in nonlinear extensions of DCM (Stephan, Harrison, et al. 2007), and once this approach is firmly established and accepted, it should be possible to investigate the source of the modulatory influences we observed. Notwithstanding this limitation, the current study has presented a novel combination of dynamic system models and formal learning theory, which were used to model human neuroimaging data. This is a further step toward the long-term goal of constructing invertible models that unite the neurophysiological and computational aspects of learning (c.f. Stephan 2004).
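The difference between the bilinear framework used here and a nonlinear extension can be written compactly. In the bilinear model, modulatory effects enter only through experimental inputs u_j acting on the B matrices. In a nonlinear formulation, as we understand the nonlinear DCM literature, activity x_k in one region can itself modulate connections via D matrices. A putamen-gated A1 → V1 connection would require exactly this kind of term. The notation below is a sketch of that general form, not a model fitted in this study.

```latex
\begin{aligned}
\text{bilinear:}\quad \dot{x} &= \Big(A + \sum_{j=1}^{m} u_j B^{(j)}\Big)\,x + C\,u \\
\text{nonlinear:}\quad \dot{x} &= \Big(A + \sum_{j=1}^{m} u_j B^{(j)} + \sum_{k=1}^{n} x_k D^{(k)}\Big)\,x + C\,u
\end{aligned}
```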
Wellcome Trust (ref: 0856780/Z/99/B); Wellcome Trust PhD studentship (ref: 078047/ZS/04/Z) supported H.D.O.; and University Research Priority Program “Foundations of Human Social Interactions” at the University of Zurich supported K.E.S.
We thank Quentin Huys for helpful discussions of the manuscript.
Conflicts of Interest: None declared.