|Home | About | Journals | Submit | Contact Us | Français|
Marr–Albus adaptive filter models of the cerebellum have been applied successfully to a range of sensory and motor control problems. Here we analyze their properties when applied to classical conditioning of the nictitating membrane response in rabbits. We consider a system-level model of eyeblink conditioning based on the anatomy of the eyeblink circuitry, comprising an adaptive filter model of the cerebellum, a comparator model of the inferior olive and a linear dynamic model of the nictitating membrane plant. To our knowledge, this is the first model that explicitly includes all these principal components, in particular the motor plant that is vital for shaping and timing the behavioral response. Model assumptions and parameters were systematically investigated to disambiguate basic computational capacities of the model from features requiring tuning of properties and parameter values. Without such tuning, the model robustly reproduced a range of behaviors related to sensory prediction, by displaying appropriate trial-level associative learning effects for both single and multiple stimuli, including blocking and conditioned inhibition. In contrast, successful reproduction of the real-time motor behavior depended on appropriate specification of the plant, cerebellum and comparator models. Although some of these properties appear consistent with the system biology, fundamental questions remain about how the biological parameters are chosen if the cerebellar microcircuit applies a common computation to many distinct behavioral tasks. It is possible that the response profiles in classical conditioning of the eyeblink depend upon operant contingencies that have previously prevailed, for example in naturally occurring avoidance movements.
A striking feature of cerebellar physiology is that the cortical, nuclear and olivary territories are connected in register to form a basic microcircuit that is repeated many-fold across the cerebellum to reveal, at the cortical level, a “crystalline” array of processing elements. Numerous studies have shown that small sets of these microcircuits are fundamental for fast and accurate control of various individual motor tasks-for example, smooth pursuit eye movements (Zee et al., 1981), ocular following (Shidara et al., 1993), saccades (Optican and Robinson, 1980) the vestibulo-ocular reflex (VOR) (Ito, 1982) and limb movements (Gilbert and Thach, 1977). This combination of uniformity and general applicability suggests that the cerebellar microcircuit implements a common algorithm that is useful for a wide range of sensory and motor control tasks (Ito, 1984, 2006).
A conceptual framework that is widely used in characterizing this cerebellar algorithm is based on the models of Marr (1969) and Albus (1971). In the Marr–Albus framework, the cerebellum learns to construct appropriate outputs from input “contexts,” which are represented by those afferent sensory or motor signals on the mossy fiber inputs that occur reliably before the required output (Albus, 1971; Gilbert, 1974; Marr, 1969). Learning is driven by a second type of input, a “teaching” signal carried by climbing fibers. A simple yet general implementation of this conceptual framework is the adaptive filter, a generic signal-processing device that has wide utility in control theory and other applications (Widrow, 1985), and was first formally applied to the cerebellum by Fujita (1982). Because the cerebellum has such a homogeneous structure, it can be hypothesized that each microcircuit implements the same basic algorithm and that a particular behavioral task is specified by the information content of its climbing fiber input and by the target of its output. Adaptive filter models have been used successfully for many of the cerebellum-dependent tasks referred to above (references given in Dean et al., 2010).
Classical conditioning of eyeblink and skeletal muscle reflexes has been shown to be dependent on the cerebellum (see review by Yeo and Hesslow, 1998) and experimental analysis of the neural circuitry underlying eyeblink conditioning suggests mechanisms that fit well within the Marr–Albus framework. It appears that the conditioned stimulus (CS) is a “context” delivered to the cerebellum by the mossy fiber system, the unconditioned stimulus (US) is a “teaching” signal borne by the climbing fiber input from the inferior olive, and the resultant cerebellar output drives the conditioned response (CR) as a motor output (Hesslow and Yeo, 2002; Yeo and Hesslow, 1998). If the cerebellar microcircuit implements a single basic algorithm, as has been suggested, then this cerebellum-dependent associative learning task should have an adaptive filter implementation consistent with that of other cerebellar tasks.
However, we propose here that there is a fundamental difficulty in applying the Marr–Albus framework to classical conditioning. All Marr–Albus based models of the cerebellar role in motor control recognize that the olivary signal conveys a sensory or motor error related to movement inaccuracy, and that by reducing this signal the cerebellum learns accurate commands. But in classical conditioning of a movement there is apparently no signal related to movement inaccuracy, because by definition the US is delivered independently of the subject's response. In a carefully controlled classical conditioning study, the US is not ameliorated and there appears no a priori reason why the system produces an accurately-timed, amplitude-scaled conditioned response. Moreover, a teaching signal that cannot diminish presents severe theoretical problems for models of the Marr–Albus type, because a meaningful endpoint to learning can never be reached. The only way to achieve stable learning is for the system itself to construct an internal estimate of movement inaccuracy, for example by comparing the US with the CR motor drive. To date, all other Marr–Albus type models of classical conditioning have used this comparator hypothesis (e.g., Grossberg and Schmajuk, 1989; Medina and Mauk, 2000; Moore et al., 1989), and yet the implications for motor learning have not been investigated systematically.
We raise this concern in the particular context of classical conditioning of the nictitating membrane (third eyelid) response in rabbits. This experimental system is well-suited for studying cerebellar function because learning is robust and tractable, the behavioral response is well-characterized and significant parts of the essential neuronal circuitry have been identified (Thompson, 1983; Yeo and Hesslow, 1998). In addition, there is strong evidence that the error signal driving the learning results from a comparison of the US and CR drive. For example, US-driven olivary responses, recorded as climbing fiber field potentials in the cerebellar cortex, and which drive learning in the cerebellar cortex, do diminish during CR acquisition (Hesslow and Ivarsson, 1996; Rasmussen et al., 2008). Finally, we have recently characterized the dynamics of the nictitating membrane response and its motor drive (Lepora et al., 2007b, 2009). Two relations were evident from this analysis: (1) the temporal profile of the motor drive (from retractor bulbi EMG spike-rate profiles) was well-approximated by a Gaussian function, whose peak amplitude was linearly related to the peak amplitude of the conditioned response; (2) the NM response profiles could be generated from this drive by passing the latter through a first-order linear differential equation equivalent to a first-order filter with time constant of the order of 100–200ms.
Based on these principles, we construct a system-level model of NM conditioning (Figure (Figure1)1) in which the cerebellum is modeled as an appropriate adaptive filter, the motor plant as a first-order filter, and the inferior olive as a comparator of the US signal and cerebellar output (as conveyed by the nucleo-olivary pathway). Then we vary the model assumptions and parameters to examine systematically how the resulting conditioned response depends on the interaction between the comparator architecture and the other principal components of the system-level neuroanatomy, in order to disambiguate the general properties of the model's behavior from its specific dependence on parameter values. In consequence, we identify the computational consequences of a having a comparator for the inferior olive, and how these relate to the absence of sensory or motor error in a cerebellar model of classical conditioning.
We show that: (1) Trial-level features, such as acquisition, extinction and blocking, are emergent properties of the classical conditioning model and largely independent of parameterization details; (2) Real-time features of classical conditioning, such as the CR amplitude and timing, could be accurately reproduced by appropriate tuning of the model parameters in the plant, inferior olive and cerebellum. The tuning required for (2) is a direct consequence of the olive not conveying motor error to the cerebellum, but rather an error signal from a comparison between the US and cerebellar output. We suggest that all previous models of classical conditioning employing a comparator architecture suffer from this same “fine-tuning” problem. A possible solution to this problem is that CR profiles in classical conditioning of the NM are acquired through operant contingencies that have previously prevailed, for example in naturally occurring avoidance movements. In principle, a Marr–Albus model utilizing sensory or motor error could learn these operant relationships prior to classical conditioning.
Parts of this work have been submitted previously in abstract form Lepora et al. (2007a).
Neuronal signals traveling down the “wires” in Figure Figure11 represent mean firing rates for the associated neuron or population of neurons, so that all information is assumed rate-coded. The systems corresponding to the “boxes” in Figure Figure11 are each represented by a filter transformation, which is a mathematical tool for describing the transformation of time-varying functions that is equivalent to a differential equation relating input and output signals. Most systems in Figure Figure11 are modeled as non-adaptive, fixed transformations. Only the (cerebellar) system C is adaptive, with its transformation continually updating according to a separate “teaching” signal.
For computational purposes, all signals are discrete over time at intervals of Δt=1ms. Simulations are set up over a number of trials, typically 100, each lasting for 1s and starting at time zero. Model inputs are one CS or more, each labeled by their subscripts 1,…,n, and a single US. These stimuli are of constant intensity between their onset and offset times
where the stimulus intensities are initially normalized to Ii=IUS=1. Accordingly, the inter-stimulus interval (ISI) equals USonset–CSonset, the time difference between CS- and US-onset times, which is here set to a standard value of 500ms common in experimental eyeblink conditioning. The US duration USoffset–USonset is set to 10ms, and CSonset is set to occur at t=0ms.
The cerebellar cortex system C represents the processes that transform the neuronal drive from the CS into the (Purkinje cell) output of the cerebellar cortex c(t). A Marr–Albus adaptive filter model of the cerebellum is used to represent this transformation (Figure (Figure22 and see Introduction). The input expansion-recoding layer of the adaptive filter represents the transformation from the input signals (corresponding to the CS neuronal drive) into an expanded distribution of parallel signals (corresponding to parallel fiber activity):
where each basis filter G1 to GN transforms the input CS signal into a parallel fiber signal p1(t),…,pN(t) and p0 is a constant offset (representing the background firing rate of granule cells). Our initial choice of basis filters is designed to transform a square-wave CS into a set of Gaussians beginning from CS-onset (Figure (Figure3A)3A) that have some similarity with the spectral basis functions used by Bullock et al. (1994) in their model of eyeblink conditioning. Later in this paper we will also consider other recoding strategies (Figure (Figure3).3). The mathematical definition of these basis filters is covered in the Appendix. The output layer of the adaptive filter is the site of adaptation in this model, corresponding to weight multiplication and summation:
where wk are a set of adjustable weights, representing synaptic efficacies, and c0 is a constant offset (representing the background firing rate of the cerebellar output).
Climbing fiber activity e(t) originating in the inferior olive is assumed to drive learning in accordance with the covariance learning rule in which weight changes are proportional to the temporal correlation between parallel fiber and climbing fiber signals,
where β is the learning rate (set to a nominal value of 10−4 in the simulations) and the weights update at each time step wk(t+Δt)=wk(t)+δwj(t). Weight changes may be positive or negative, representing long-term depression (LTD) or long-term potentiation (LTP) respectively at the parallel fiber/Purkinje cell synapse, although other sites of plasticity could be included in the same formalism if they are modified by the same error signal. Note that the learning rule depends only on the change of its inputs from their mean firing rates to ensure that the system is stable in the absence of a change of drive from the parallel and climbing fibers.
The result of conditioning is generally accepted to be a pause in Purkinje cell output (Jirenhed et al., 2007). With the sign conventions chosen here, this requires some weights wk(t) to become negative. Although parallel fiber/Purkinje cell synapses are excitatory, parallel fibers also modulate Purkinje cell output via inhibitory interneurons. Effective negative weights can thus be obtained either by allowing learning in this indirect pathway (Ekerot and Jorntell, 2003) or by assuming a constant weight on this pathway that is modulated by the excitatory input. Yet another alternative is to have parallel fibers carry both bursts and pauses generated by CS activity in Eq. 2 above. All these schemes are computationally equivalent to allowing both positive and negative weights in Eq. 3.
The deep cerebellar nucleus system N receives inhibitory input from the cerebellar cortex and is activated by a pause in cortical firing to excite sites outside the cerebellum. The simplest representation of this process is sign inversion relative to a threshold:
where n0 is the threshold firing rate. As we will show, this assumption leads to unobserved oscillations in the deep cerebellar nucleus output which can only be corrected by introducing a threshold non-linearity into the system. This can be achieved in various ways, but the simplest choice is to allow the deep cerebellar nucleus to activate only if the cerebellar cortical drive falls below a threshold:
This threshold non-linearity allows only a pause from the cerebellar output to activate the deep nucleus. The implications of this assumption are discussed later (see Results).
The inferior olive system O is assumed to construct its output signal e(t) by comparing the US input signal with a copy of the cerebellar nucleus output, interpreted as modulation of the excitatory US-drive by inhibitory nucleo-olivary feedback (Andersson et al., 1988). The most basic representation of this comparison is signal subtraction:
where and are gains and and are time delays of the US and cerebellar components respectively, and e0 is a constant offset (representing the background firing rate of neurons in the inferior olive). Both gains are initially set to unity and time delays initially set to zero. The brainstem system B transforms the US input and cerebellar output signals into the motor command m(t) that drives the output (eyeblink) system, interpreted as the convergence of these two neuronal drives at the premotor sites in the brainstem. The simplest representation of this combination is a linear sum of the two signals:
where and are the gains of the US and cerebellar components, which are initially set to unity. The constant offset m0 represents the background firing rate of premotor neurons, which is assumed to produce zero initial displacement of the eyeblink plant described below. Its value does not affect model performance, and is set to zero.
The eyeblink plant system P transforms the motor command m(t) into the output (blink) response r(t), and represents the dynamics of the blink-related muscles and peripheral tissues. A simple, experimentally verified model of this system for the nictitating membrane (third eyelid) response in rabbit is a first-order linear filter (Lepora et al., 2007b):
where ap=exp(−Δt/tp), bp=gp are the filter parameters defined by the gain gp=1 and time constant tP=100ms that is typical from studies of this plant. Note that unlike the external eyelids the nictitating membrane in rabbit has no antagonist muscle, and thus can only move passively as it re-opens on the return phase of the blink.
The main aim of this paper is to investigate systematically how the performance of the above system-level model of eyeblink conditioning depends on its parameters and underlying assumptions. In particular, this approach will allow us to disambiguate features of the model performance that depend on the model's basic computational capacity from those requiring hand-tuning of parameter values. For the sensitivity analysis, we generally take three parameter values, consisting of the default with higher and lower values. Because most CR properties change monotonically with parameter value, this is enough to qualitatively describe the model sensitivity. The variants of the eyeblink conditioning model that we consider are as follows.
The plant gain is fixed at gp=1 to give a peak UR amplitude of ~10mm that gives a standard baseline for changes to the CR profile. The plant time constant is varied across values tp of 50, 100, and 200ms to span the range observed in behavioral studies (Lepora et al., 2007b).
It is useful to define a relative scale between the gains of the US-driven and cerebellar-driven components. Then the effect of the two drives can be compared with values rO of 0.5, 1, and 2, and the gain of the US component (with rO=1) is a measure of the overall comparator gain. In addition we also varied time delays in the US-driven and cerebellar-driven components by 0, 50, and 100ms. Values from 20 to 90ms are compatible with the physiologically identified delays in nucleo-olivary modulation of climbing fiber responses (see Discussion).
The gain of the US-driven component is fixed at one to keep the peak UR amplitude fixed at ~10mm, which provides a standard baseline to compare changes in the CR amplitude with. The gain of the cerebellar-driven component is varied around its initial value of one.
The basis filters used initially in our simulations are designed to give a set of Gaussians from a square-wave CS input (Figure (Figure3A).3A). The first two variants of the adaptive filter model consider changes to this initial set of basis filters: (i) a more closely spaced set of widths for the Gaussians (Figure (Figure3B)3B) and (ii) a set of Gaussians that also vary in peak amplitude (Figure (Figure3C).3C). The other variants consider completely different basis filters: (iii) a tapped delay line (Figure (Figure3D),3D), as considered in early models of eyeblink conditioning (Barto and Sutton, 1982); Moore et al., 1989); (iv) a set of decaying exponential profiles (Figure (Figure3E);3E); and (v) a delta function basis (Figure (Figure3F).3F). Mathematical details of how these basis filters are defined are given in the Appendix.
To give a standard measure of trial-level behavior in classical conditioning, we implemented the Rescorla–Wagner (RW) model (Rescorla and Wagner, 1972), as it is generally regarded as a good overall account of trial-level effects in classical conditioning (e.g., Brandon et al., 2002). The RW model describes how US reinforcement of one or more CSs (as occurs in various contingency learning situations, such as blocking and conditioned inhibition) changes the CS–US association strengths V of the set S of CSs present on that trial, with learning rule:
where 0<αi≤1 is the salience for the ith CS, λ is the asymptotic value of the associations, and β1 and β2 are learning rates when the US is present/absent respectively. Our simulations typically considered 100 trials with learning rates β1=β2=0.1, salience αi=0.05 and asymptotic acquisition strength fixed to λ=4.5.
The first set of simulations was of delay conditioning with CS-onset preceding US-onset by 500ms and CS-offset concurrent with US-offset. Behavioral experiments typically consider blocks of several acquisition trials interspersed with single CS-alone test trials. These test trials are inserted to view CR topography that is otherwise hidden by the UR (although they cannot be too frequent, otherwise CR acquisition is affected). Because a simulation can use test trials for which the system is artificially set to not learn, we followed every CS–US paired trial with a CS-only test trial to track acquisition over the entire learning process.
Simulations were considered first for trial-level features of the model where only the CR peak amplitude is examined for each trial. The more detailed real-time features of the CR topography within a trial were then considered. Dependency upon the modeling parameters was considered for both trial-level and real-time studies.
To test the basic model for appropriate acquisition of a CR, we simulated 100 CS+US paired trials with ISI of 500ms. Appropriate extinction was tested by following these 100 acquisition trials with 100 unpaired CS-alone extinction trials. As in behavioral experiments, with CS and US intensities fixed, the amplitude of the CR peak gives a measure of CS–US associative strength. We considered first the adaptive filter model described for the basic real-time model (see Methods).
Acquisition training caused the model CR peak to increase from 0mm on the first trial to a stable maximum of ~4.5mm (blue curve in Figure Figure4A).4A). This maximum value was a consequence of parameter choices described in Section “Methods,” and is typical of conditioned eyeblink responses in the rabbit. Extinction training then caused the model CR peak to decrease steadily to zero response (blue curve in Figure Figure4B).4B). Both the acquisition and extinction learning curves approximated exponential functions that asymptote to a stable endpoint of learning, which is the fully acquired CR in acquisition learning and the absence of a CR in extinction learning. This learning stability is an important feature that follows directly from the inferior olive comparator model in Eq. 6: as the nucleo-olivary inhibition becomes sufficiently intense to cancel the US excitation to the olive, the climbing fiber error signal approaches zero and learning ends. For comparison, the acquisition learning curve was considered when the nucleo-olivary pathway is removed from the model. CR acquisition then became unstable because there is then no inhibitory feedback to drive down climbing fiber activity as learning proceeds (red line in Figure Figure44A).
Robustness of the NM conditioning model over acquisition and extinction was tested with alternatives that incorporated changes to the plant, olive comparator and cerebellar model (see Figure Figure6).6). All of the adaptive filter models that we considered approached a stable endpoint to learning in a manner qualitatively similar to that of the initial model. Features that varied included the overall learning rate, as measured by the number of trials to approach the final state, and the amplitude at full acquisition. Thus the basic acquisition and extinction behavior of the model was robust to changes in model specifics. Moreover, in all cases considered here, the acquisition and extinction learning curves had exponential profiles that approached stable asymptotes after many trials. Such learning curves are typical of models that use a supervised learning algorithm based on a covariance learning rule, of which a well-known example is the RW model (Rescorla and Wagner, 1972) of classical conditioning (dashed blue lines on Figure Figure44).
Important examples of contingency learning are overshadowing, blocking and conditioned inhibition effects with respect to the presentation of two conditioned stimuli, CSA and CSB (assumed here to be of equal onset and duration, with ISI of 500ms). These effects were examined firstly in the system-level model (see Methods) and then robustness to the changes in parameters was considered (see Methods for parameter details). One hundred training trials were typically used, with the training broken into two stages of 50 trials if necessary.
Overshadowing was tested by first training with compound CSA+CSB+US paired trials and then looking for a diminished CR to independent presentation of CSA and CSB (compared with CRs to each stimulus had each been trained alone) with the more salient CS dominating. In our model, salience corresponded to the amplitude of the CS drive to the cerebellum, with CSA here considered to have twice the salience of CSB. Model results were consistent with overshadowing: CR peak amplitudes of ~3.7 and ~0.9mm for the individual stimuli CSA and CSB were diminished relative to the ~4.5-mm peak amplitude for training a single CS (c.f. Figures Figures4A,C).4A,C). Hence, with compound conditioning, the more salient CSA elicited a CR that is greater, or overshadows, that ofCSB.
Blocking was tested by training a CR to the first stimulus over CSA+US paired trials, and then looking for blocked acquisition to a second stimulus CSB when training with the compound CSA+CSB+US in paired trials. Model results exhibited a clear blocking effect with the CR peak to the pre-trained stimulus CSA reaching a maximum of ~4.5mm, while that of the second stimulus CSB reached only ~0.5mm after compound training (Figure (Figure44D).
Conditioned inhibition was tested by training the first stimulus CSA to reliably predict the US and the compound CSA+CSB to reliably predict no US (e.g., by interspersing paired CSA+US trials with unpaired CSA+CSB trials (Pavlov, 1927)). Inhibition was then manifest by, for example, requiring an extended training period to acquire an overt CR to the second stimulus CSB. Model results exhibited two aspects of conditioned inhibition: (i) inhibitory training led the first stimulus CSA to reliably predict the US and the compound CSA+CSB to reliably predict no US; (ii) acquisition training to CSB was then retarded by ~5 trials relative to a naive CS (Figure (Figure44E).
Robustness of contingency learning for the adaptive filter model was tested with the alternative models described in Model Variants (see Methods). All models displayed qualitatively similar trial-level behavior to the initial model, showing that the contingency learning phenomena of blocking, overshadowing and conditioned inhibition are robust features of this class of models. In addition, trial-level results were qualitatively similar to those of the RW model of classical conditioning (dashed lines on Figure Figure44).
The previous results revealed how the NM conditioning model performs at the trial-level, where only information about the CR peak amplitude is used to describe the learning process. The following simulations examined the real-time behavior of the adaptive filter model, such as the shape of the learnt CR profile and how this profile changed during learning.
For an overall illustration of the real-time behavior during learning, consider the cascade diagram of the evolving CR profile (Figure (Figure5A)5A) during the acquisition training represented at the trial-level in Figure Figure4A.4A. The behavior agreed qualitatively with experimental studies (Figure (Figure5B):5B): during acquisition training, the CR rose from an initial zero response, with the peak response occurring near the time of US-onset. However, the model CR peak was delayed by ~75ms after US-onset, a delay rather greater than those seen in the behavioral responses that contributed to our model of the nictitating membrane plant (Lepora et al., 2007b; see Figure Figure5B).5B). The CR peak delays observed in other behavioral studies are considered in the discussion, but for the time we note that this model delay is closer to the values described in some other reports (see Hardiman and Yeo, 1992; Kehoe et al., 2009). In terms of response duration, the simulated and experimental responses were very similar, with both taking approximately 500ms to rise and fall. Adjustments to the basic model for timing the peak CR are addressed later.
Firstly, we examined how the CR profile was affected by changing the parameter values of the eyeblink plant (Eq. 8). For the basic model, we used a first-order linear filter with gain gp=1 and time constant tp=100ms, consistent with nictitating membrane (third eyelid) response data in rabbit (Lepora et al., 2007b). Our simulations considered the effect on the CR of varying these two parameters, the results of which are shown in Figure Figure6.6. The other default parameters in the basic model were: relative olive gain rO=1; nucleo-olivary delay, ; and CS–US ISI, tISI=500ms.
The gain of the eyeblink plant determined the linear scale between the motor drive and eyelid displacement for both the CR and UR. Therefore, varying this gain simply scaled the CR amplitude (Figure (Figure6A)6A) and the UR amplitude equally (figure not shown). Note that in our model the plant gain has no effect on the ratio of CR to UR peak amplitude. In the remaining simulations the plant gain was fixed at gP=1, which gives a peak UR amplitude of ~10mm. Variations in the CR profile were then examined relative to this fixed UR.
The plant time constant determined the time-course of the passive return to rest of the CR after the motor drive has terminated, which relates to the visco-elastic properties of the eyeblink plant. As expected, increasing the time constant across the range of physiologically observed values (Lepora et al., 2007b) of tp=50, 100, and 200ms led to progressively longer return times (Figure (Figure6B).6B). Note also that the peak CR amplitude became larger with longer time constants, because of a corresponding decrease in elasticity or increase in viscosity of the plant. In addition, the CR peak was delayed for longer time constants, with latencies from US-onset to peak CR of 43, 70, and 98ms for the three time constant values above. This increase in delay occurs because the plant becomes slower to react to its input as its time constant increases, resulting in a progressively longer lag between the peak drive and peak response. Note that the lag also increases with the time taken for the input drive to peak (from zero lag for an instantaneous delta function drive).
CR profiles were also affected by changing the inferior olive model. The effect of changing gain parameters in the inferior olive model in Eq. 6 was more easily seen by considering an overall gain and a relative gain of the cerebellar output and US-drive. Varying the overall gain simply scaled the climbing fiber signal e(t) and hence the learning rate β in Eq. 4; therefore, it does not affect directly the CR profile. However, the relative gain determined the cancellation of the US-drive and cerebellar drive at learning completion, so that scaling the relative gain changed the CR amplitude proportionally to 1/rO (Figure (Figure6C).6C). Therefore, increasing the plant gain was equivalent to decreasing the relative olive gain (Figures (Figures6A,C),6A,C), even though they were caused by different mechanisms (the first fixed the cerebellar output and changed the plant, while the second fixed the plant and changed the cerebellar output).
Increasing the time delay for the US-drive was equivalent to delaying US occurrence, which delayed CR occurrence by the same duration (Figure not shown). Biologically, this delay is only a few milliseconds. A more important parameter was the delay in nucleo-olivary inhibition. A key feature of the model is that the olive functions as a comparator of the CR drive and the US signal to control learning rates. Without any delays in the nucleo-olivary pathway, olivary output would only be small when the CR drive peaks at US-onset (or a few ms later to allow for neural conduction delays in the US pathway). However, as we have shown (Lepora et al., 2007b), the peak CR drive, measured as the EMG response in the retractor bulbi muscle, reliably preceded US-onset by 50–100ms (Figure (Figure44 of Lepora et al., 2007b). Consistent with these previous results, here we find that introducing a delay in the nucleo-olivary pathway causes the Gaussian CR drive to reliably precede the US-onset. Then due to the visco-elastic properties of the NM plant (see previous section), the peak CR lags the peak drive by a time that depends upon both the width of the Gaussian and the plant time constant (c.f. Figure Figure6B)6B) and which can produce CR peaks around the US-onset or later.
The actual nucleo-olivary delay in the model is a free parameter, with values of 0, 50, and 100ms leading to latencies from US-onset to CR peak of 70, 37, and 6ms, respectively (Figure (Figure6D)6D) for a plant time constant tp of 100ms. Note that these latencies depend on the plant dynamics, with a larger plant time constant tp of 200ms giving longer CR peak latencies of 98, 61, and 27ms, respectively (Figure not shown). In setting a realistic value for the nucleo-olivary delay, it should be noted that electrophysiological evidence has revealed delays of 25–90ms in nucleo-olivary inhibition for eyeblink-related climbing fiber responses (Hesslow, 1986; Hesslow and Ivarsson, 1996).
Next we examined how the CR profile is affected by changing the ISI from CS-onset to US-onset, while keeping the US duration fixed at 10ms. An important aspect of this dependence in the model is that the adaptive filter basis elements were Gaussian, with widths that increased according to their time-to-peak after CS-onset (Figures (Figures3A–C).3A–C). This model feature was included to simulate a broadening of the CR profile as the ISI became longer, which is fully in accord with experimental evidence across a range of studies on both the width of the EMG drive and the shape of the CR (Lepora et al., 2007b, Figure 11).
The model results are that at three different ISIs (350, 500, 650ms) the CR peak shifted in time relative to US-onset (Figure (Figure6E).6E). These finding are broadly in accord with experimental manipulations of the CS–US interval, which also reveal longer delays of CR peak with longer ISIs (Kehoe et al., 2009). Note that the profiles in Figure Figure6E6E are considered independently of nucleo-olivary delay which is set here to zero. For Gaussian basis functions of constant peak amplitude (Figure (Figure3A),3A), the resulting CRs showed little variation in peak amplitude (Figure (Figure6E).6E). However, if Gaussian basis functions of varying peak amplitude (Figure (Figure3C)3C) were considered, then the amplitude of the CR peak varied with ISI (Figure not shown). This effect was most pronounced over a fixed number of acquisition trials, because the model then had a slower rate of CR acquisition at longer ISIs. Details of the response latencies and other timing aspects were otherwise little affected.
Peak CR latency increases with ISI because the CR drive becomes broader at longer ISI, and this effect interacts with the plant kinetics to give a longer latency between the peak drive and response. For a plant time constant tp of 100ms, as in Figure Figure6E,6E, the CR peak latencies were 65, 70, and 74ms for ISI of 350, 500, and 650ms, respectively. These values were longer for a larger plant time constant tp of 200ms, with corresponding latencies of 88, 98, and 107ms consistent with our previous results on the plant dynamics.
Next we examined how the CR profile is affected by changing the brainstem model in Eq. 7. The brainstem model used here linearly combines the drive from the US and cerebellar output, with a gain for the US-drive and a gain for the cerebellar drive. The effect of these two parameters was to scale the plant gain for the UR and CR respectively, which simply scaled the response amplitude by the same amount. We set the gain for the US-drive to unity to keep the UR fixed. Then the gain for the cerebellar drive scaled the CR amplitude relative to the UR (results identical to Figure Figure66A).
The CR profile was also affected by changing the adaptive filter model of the cerebellum in Eqs 2 and 3. In particular, we considered the effect on the CR of different sets of basis elements, corresponding to sets of parallel fiber activity activated by the neuronal drive from the CS (see Figure Figure33).
All of the simulations considered thus far used basis elements that were Gaussian functions of varying width (Figure (Figure3A).3A). One simple change is to decrease the differences in width of the Gaussian functions to give more basis elements across the CS duration (Figure (Figure3B).3B). Another simple change is to modify the peak amplitude of the Gaussian, for example with an alpha function envelope (Figure (Figure3C).3C). Neither of these changes appreciably affected the profile of the fully acquired CR (Figure (Figure6F),6F), as might be expected because the range of possible CR motor drives that can be constructed from these elements is the same in each case.
More drastic changes to the basis elements did affect the CR profiles. A tapped delay line gives a set of basis elements that are delayed versions of the CS drive (Figure (Figure3D)3D) – these basis elements led to an unrealistic profile with an early onset artifact for the CR (Figure (Figure6G,6G, blue curve), which occurred because the broad profile with sharp onset of these basis functions led to a large initial contribution that could not be canceled in combination with later elements. Decaying exponential basis elements have earlier onsets than the Gaussians (Figure (Figure3E)3E) – these led to a CR with a very gradual rise to peak amplitude (Figure (Figure6G,6G, green curve) because more of the CR drive is happening earlier in the response. Delta function basis elements (Figure (Figure3F)3F) also generated unrealistic profiles (Figure (Figure6G,6G, red curve), with the sharp onset of the element giving a sharp onset to the CR. Our initial choice of Gaussian basis elements was motivated from their production of realistic CR profiles that do not have the artifacts described above.
Finally, we examined the effect on CRs of removing the threshold non-linearity introduced in the model of the deep cerebellar nucleus. This threshold ensured only pauses in cortical firing can activate extra-cerebellar sites (Eq. 5).
Removing the DCN non-linearity resulted in small oscillations of the conditioned response profile, which produced early and late response artifacts (Figure (Figure6H).6H). For the simulations we assumed these oscillations are not realistic and included the deep cerebellar nucleus threshold in the basic model. However, it is possible such small oscillations are difficult to identify behaviorally because of their small amplitude compared to noise. In the simulations these disturbances were most evident just after CS-onset and in the response tail. The reason for the wave-like artifacts is that the cerebellar output and US-representation did not cancel perfectly in the inferior olive, because the parallel fiber activity to the cerebellar cortex contained components wider than the US. This non-cancellation interacted with the covariance learning rule to produce a wave-like disturbance spreading out from the US. Introducing the above threshold stopped the non-cancellation from being communicated to the learning rule, which prevented the oscillations.
A widely used conceptual framework for understanding cerebellar function is that the cerebellum learns to construct accurate motor commands from input “contexts,” using a teaching signal provided by a second input (Albus, 1971; Marr, 1969). A simple yet general implementation of this conceptual framework is the adaptive filter (Widrow, 1985), which was first formally applied to the cerebellum by Fujita (1982), and used subsequently to model many cerebellar tasks. Since classical eyeblink conditioning is a preparation that has been used extensively to investigate cerebellar function (Hesslow and Yeo, 2002; Thompson, 2005; Yeo and Hesslow, 1998), it is natural to ask whether the adaptive filter formalism can be applied to this paradigm also. As we have indicated, there is a serious problem with this application. Other Marr–Albus based models recognize that the inferior olive conveys sensory or motor signals related to movement inaccuracy, and that the cerebellum reduces this signal to learn accurate commands. However, in classical conditioning of a movement there are no such signals, but instead the inferior olive is interpreted as a comparator of the US signal and cerebellar output (e.g., Andersson et al., 1988; Jirenhed et al., 2007; Kim et al., 1998; Medina et al., 2002; Rasmussen , 2008; Sears and Steinmetz, 1991).
To investigate the implications of this difference in error signal we constructed a system-level model of classical conditioning consistent with the anatomy of the eyeblink/NM response circuitry, an adaptive filter for the cerebellum coupled to a comparator signal from the inferior olive, and a linear dynamic model representing the NM motor plant (Figure (Figure1).1). The plant model and drive followed from our recent study of this motor system (Lepora et al., 2007b, 2009). Thus, to our knowledge, this is the first model to explicitly include all of these components, in particular the motor plant that is vital for shaping and timing the CR profile. A number of previous classical conditioning models are also based on adaptive filter and comparator architectures (Table (Table1),1), but these components have often been implicit, rather than explicit, in the model structure (Bartha et al., 1991; Bullock et al., 1994; Grossberg and Schmajuk, 1989; Medina and Mauk, 2000; Moore et al., 1989).
Given the explicit separation of our model into the plant, adaptive filter and comparator components, and the relatively few parameters of our system-level model, we were then able to investigate systematically the model assumptions and parameters to uncover the implications of using a comparator error signal in a Marr–Albus type model. This investigation gave two main results:
The implications of these results for understanding the cerebellar role in classical conditioning are as follows.
The central feature of classical conditioning is that by definition the relation of the CS to the US is totally unaffected by the subject's response (Gormezano et al., 1983; Pavlov, 1927). Delivery of the US is entirely outside the subject's control, in contrast to operant conditioning where reward is contingent on the subject's behavior (e.g., Mackintosh, 1974). In the particular case of eyeblink conditioning, the use of an airpuff CS can potentially allow lid closure to partially avoid the airpuff effects if precautions are not taken. But classical conditioning of the eyeblink/NM develops robustly with a periocular electrical US that presents no opportunity for response-related amelioration of the US, and all of the empirical data used in our modeling is from such experiments. This special feature of classical conditioning means that if the US were to be interpreted as the teaching signal there would be an immediate problem: the teaching signal continues to be delivered even when the CR is fully acquired, which would either drive the CR ever larger without limit or imply that asymptotic learning is governed by synaptic saturation, against which there is substantial evidence. This problem is reflected in the unstable behavior of the adaptive filter model when the US alone is used as a teaching signal (Figure (Figure4A,4A, red trace). The teaching signal continues to be delivered whatever response the model makes, so weight change continues without limit and the learning becomes unstable.
A widely accepted proposal is that the teaching signal conveyed to the cerebellum from the inferior olive is the result of a comparison between the US and CR drive (e.g., Andersson et al., 1988; Jirenhed et al., 2007; Kim et al., 1998; Medina et al., 2002; Rasmussen et al., 2008; Sears and Steinmetz, 1991). Then a copy of the CR command is subtracted from the US signal in a comparator. Thus, as learning proceeds, the CR command becomes larger until it reaches a value at which the US signal is canceled, after which learning ceases. The problem of unstable learning is therefore solved. Consistent with this suggestion, experimental evidence indicates that the signal conveyed from inferior olive to cerebellar cortex does not remain constant, but rather declines in step with the acquisition of the CR. Multiunit olivary activity (Sears and Steinmetz, 1991), cortical climbing fiber field potentials (Hesslow and Ivarsson, 1996) and complex-spike activity (Rasmussen et al., 2008) all decline with conditioning and within each trial. The temporal profile of olivary suppression is correlated with CR amplitude but with a phase shift of around 90ms (Hesslow and Ivarsson, 1986), which is consistent with the inferior olive firing in the model discussed here.
Evidence suggests that this comparator signal could account not only for stable acquisition, but also for classical conditioning phenomena such as extinction (Bengtsson et al., 2007; Medina et al., 2002), blocking (Andersson et al., 1988; Kim et al., 1998; Rasmussen et al., 2008), and indeed all the contingencies learnt by the RW trial-level model (Sears and Steinmetz, 1991). Therefore, the classical conditioning paradigm provides information about the predictive relation between the CS and US, which can be used to modulate the representation of the US within the nervous system itself. This is the information exploited by the comparator architecture, so that by subtracting a CS-derived internal signal from the internal representation of the US, an error signal is generated that in effect represents the extent to which the US is not predicted. The performance of the system is thus driven by the intrinsic characteristics of the learning rule and error signal, so that parameterization details of the other model components are not crucial to the outcome.
This reasoning explains why models of classical conditioning based on a comparator architecture account for the trial-level performance, but real-time features of the CR profile need to be supplied by alternative means, such as tuning the model components.
In eyeblink conditioning, the timing of the CR peak is governed by the ISI. In well-conditioned subjects, the CR peak on a CS-alone trial is seen to have reasonably constant latencies from the normal onset of US delivery. For an ISI of 500ms, average reported values in well-trained subjects include about 0ms (Lepora et al., 2007b; see Figure Figure5B),5B), about 75ms (Hardiman and Yeo, 1992) and Kehoe et al. (2009) reported an average CR peak latency lag of 114ms when all responses are considered. Responses <0.5mm are often excluded from analysis in NMR conditioning studies because the frequency of non-associative responses (non-CRs) increases somewhere below this value (e.g., Hardiman and Yeo, 1992). If these smaller responses were excluded, consistent with other studies, the average peak CR lag at 500ms reported by Kehoe et al. (2009) would shorten toward 75ms. For an ISI of 350ms, CR peak lags are reported between 0ms (Lepora et al., 2007b) and 50ms (Kellett et al., 2010). The cerebellum is widely considered to play a crucial role in this CR timing but, as discussed above, classical conditioning neither provides nor requires sensory information from the periphery about the actual CR and its coincidence with the US. We now discuss how this lack of timing information implies that classical conditioning models based on a comparator architecture require tuned model parameters to achieve the observed behavior.
A key point is that in the comparator the critical timing relationship is between two internal signals, one derived by the cerebellum from the CS and the other representing the US. However, if the internal CS-derived signal is also the CR motor command, then the nature of the internal timing relationship creates a problem. For accurate timing of the CR peak to US-onset, the motor drive must peak before the US to compensate for the visco-elastic plant dynamics. Though individual subjects appear to time their CR peaks fairly consistently, the common assertion that nictitating membrane CR peaks are always timed very closely to US-onset appears to apply only to some experimental subjects. Here, the basic model produced delays in CR peak from US-onset of 70–98ms for an ISI of 500ms and of 65–88ms for ISI 350ms with realistic plant dynamics (Figure (Figure6B).6B). Thus the basic model does not account for the shorter CR peak lags of 0–75ms and 0–50ms respectively, which are seen behaviorally with these two ISIs. In previously reported cases where the US to CR peak lags were close to zero, the peak CR drive, measured as the EMG response in the retractor bulbi muscle, reliably preceded US-onset by 50–100ms (Figure 4 of Lepora et al., 2007b). Thus, evidence from behavioral and electrophysiological analyses suggests that a realistic model should provide a mechanism for advancing the CR drive and response by approximately 50ms.
We simulated this feature by inserting a delay in the pathway conveying a copy of the CR command to the comparator. This can be understood by first considering the system with zero delay in the pathway from cerebellum to comparator, so that the learnt cerebellar activity coincides with the US-related activity. If the delay is turned on, the signal from cerebellum to comparator then occurs after the US-related activity in the comparator, which would cause Purkinje cell activity in cerebellar cortex, and hence cerebellar output, to adapt to give a new peak of activity that occurs before the old peak activity. The general idea that a delay in the cerebellar-comparator pathway would improve CR timing is consistent with the results of recent experimental studies (Bengtsson and Hesslow, 2006; Svensson et al., 2006).
To first approximation, CR latencies for a given plant time constant are roughly equal for ISIs above about 300ms, so that a delay in the nucleo-olivary pathway of 50ms (near the middle of the 25–90ms observed range of Hesslow, 1986; Hesslow and Ivarsson, 1996) will function effectively to reduce the 65–98ms CR peak latencies in the basic model to the range of 32–61ms that is closer to the range of behavioral data (Hardiman and Yeo, 1992; Kehoe et al., 2009; Lepora et al., 2007b). Thus, by incorporating realistic plant dynamics, the model naturally reproduces the findings that nictitating membrane CR peaks are usually delayed beyond US-onset and the magnitude of this delay correlates with the ISI, a feature of conditioning that has escaped attention until relatively recently (see Kehoe et al., 2009). The implications of these empirical and computational findings are discussed below.
A wide variety of evidence suggests that, as reflected in Figure Figure1,1, the nucleo-olivary pathway is part of the comparator (Andersson and Hesslow, 1987; Andersson et al., 1988; Bengtsson and Hesslow, 2006; Bengtsson et al., 2007; Hesslow and Ivarsson, 1996; Nicholson and Freeman, 2003; Svensson et al., 2006). However, a number of other functions have been proposed for this pathway (Bengtsson and Hesslow, 2006), including control of electrotonic coupling between inferior olive neurons (Lang et al., 1996) and feedback regulation of tonic simple-spike firing rates in Purkinje cells (Bengtsson et al., 2004). These rates are depressed by tonic increases in climbing fiber firing, but this depression in turn disinhibits cells in the deep cerebellar nuclei, whose increased activity then acts via the nucleo-olivary pathway to reduce olivary firing to a suitable value. Such a feedback mechanism could act to maintain stable baseline firing rates throughout the olivo-cerebellar circuit (Attwell et al., 2001). It is currently unclear to what extent a feedback mechanism of this kind would be compatible with a comparator function. Certainly its tonic feedback effects have complicated the interpretation of experiments designed to investigate the effects of olivary manipulations upon learning in eyeblink conditioning (Kim et al., 1998; Medina et al., 2002; Yeo et al., 1986; Zbarska et al., 2007, 2008). Note that the present model, in common with many others (Table (Table1),1), consider in effect only variations about an unchanging, tonic olivary firing rate. Therefore, it does not encompass these other functions of the nucleo-olivary pathway and does not deal with the severe effects of reducing the inferior olivary firing rate to zero by anesthesia (Cerminara and Lawson, 2004).
It has also been proposed that pathways other than the direct nucleo-olivary projection play a role in the comparator, for example the pathway from deep cerebellar nucleus to inferior olive via the red nucleus, and possibly also the trigeminal complex (Davis and Dostrovsky, 1986; Horn et al., 1998; Liu et al., 2000; Sears and Steinmetz, 1991). Furthermore, cerebellar modulation of sensory inputs to the olive could enable comparator function without direct modulation at the olive itself. For eyeblink conditioning, the excitatory and inhibitory influences on the trigeminal complex from the red nucleus (Davis and Dostrovsky, 1986), may be highly influential. However, from a computational perspective, the important features of the pathway that conveys cerebellar output (directly or indirectly) to the olive are its signal-processing capabilities. For example, the nucleo-olivary projection appears to have a hard-wired delay: brief (20ms, 200Hz) stimulation in appropriate regions of the brainstem produces a long-lasting inhibition which peaks 25–50ms after stimulation onset (Hesslow and Ivarsson, 1996; Svensson et al., 2006) and nucleo-olivary inhibition during the production of a conditioned response peaks with a delay of 90ms from CS-onset (Hesslow, 1986). These response characteristics appear qualitatively similar to those observed in vitro (Best and Regehr, 2009) and are attributed to the very unusual properties of the nucleo-olivary synapse. This delay could help with the accuracy of CR timing, as suggested previously (Svensson et al., 2006), shown explicitly here, and discussed above. However, a fixed delay could also have computational disadvantages, since the desired value of the delay would depend upon which particular part of the body (motor plant) was being controlled by any given cerebellar microzone. In addition, the eyeblink plant dynamics underlying CR production are known to vary across subjects, leading to questions about how the delay in the nucleo-olivary pathway is tuned for each particular animal. Some remarks on the information required for such tuning are discussed in the final section on avoidance learning.
Models of Pavlovian conditioning have been under development for many years (see, e.g., Brandon et al., 2002). For comparison with the model discussed here, we consider only those models of eyeblink conditioning based on adaptive filter architectures. For clarity, the properties of these models are summarized in tabular form (Table (Table1),1), from which we make several observations.
The first three observations relate to the computational consequences of having an adaptive filter structure with a comparator error signal.
First, all previous models based on adaptive filters use comparator error signals, in common with the model analyzed here. As explained in the Section “Introduction,” this is because a comparator allows stable learning in Marr–Albus type models of classical conditioning.
Second, their outputs are then also not fully specified by the teaching signal, so that CR profiles depend on the choice of model parameters. Real-time CR profiles can be obtained with a wide range of different assumptions, with particular focus on the signals in parallel fibers that correspond to the choice of basis function for the adaptive filter (Table (Table1).1). The reason for this richness is that CR profiles can be approximated with few parameters, whereas the details of processing in cerebellar cortex require very many parameters for their description, many of which are unspecified because the relevant evidence is not yet availabel. Thus, when the properties of the system output are constrained not by the requirements of the motor task, but solely by the properties of the modeled cerebellar microcircuit, there are many ways to achieve realistic-looking outputs.
Third, the principal computational properties of these models (namely acquisition/extinction of a timed response to the US) in fact follow from any adaptive filter model of the cerebellum coupled to an appropriate comparator signal. One other group of adaptive filter models (Desmond and Moore, 1988, 1991; Moore et al., 1989) has considered training with multiple stimuli, and as here found the model exhibited overshadowing, blocking and conditioned inhibition effects. However, a main finding of the present study is that any adaptive filter model of eyeblink conditioning that uses a comparator error signal will also lead to these types of contingency learning. Therefore, we expect that all models in Table Table11 can learn these contingencies if they are extended to admit inputs from multiple conditioned stimuli.
The final two observations relate to how the adaptive filter architecture has been embedded within previous models.
First, the fact that some of the models in Table Table11 consider a second adaptive layer in addition to the principal adaptive filter layer, either to represent deep cerebellar nuclear plasticity (e.g., Grossberg and Schmajuk, 1989; Medina and Mauk, 2000) or cerebellar granule layer plasticity (Desmond and Moore, 1988, 1991; Moore et al., 1989) does not help with the problem of principled specification of CR profiles since the additional layer provides an additional source of free parameters. We find these extra layers of plasticity are not required to model the basic trial-level and real-time aspects of eyeblink conditioning, but do not rule out that they could be necessary for other aspects of the motor task not considered here.
Second, none of the other models of eyeblink conditioning have considered the adaptive filter model of the cerebellum to control a model of the motor plant. In principle, a plant model of the type considered here could be included in these models, but those models that do not give Gaussian outputs would not give realistic CRs. For example, in Figure 9 of Grossberg and Schmajuk (1989) the model output is tuned to resemble the actual CR, and in Desmond and Moore (1988) the model outputs do not have a smooth Gaussian profile. Furthermore, none of the other models has a delay in the nucleo-olivary component of the comparator and hence all would result in a CR peak that is significantly delayed after the US-onset.
The above results emphasize that the central difficulty of modeling CR production in eyeblink conditioning arises because Pavlovian conditioning does not allow a response to the CS to affect the actual US, and therefore there is no information to precisely specify the CR profile.
One solution to the problem of learning the real-time features of the CR could lie in the relation between classically conditioned eyeblinks and those learned as a result of avoidance training through operant conditioning (such as occur naturally in responding to aversive stimuli and thus would have occurred previously). Because the conditioned response ameliorates the US in avoidance learning, comparison of the CR and US can take place outside the brain rather than inside it as in the comparator architecture. Such an external comparison would retain the contingency learning capacities of comparator-based models and, in addition, provide genuine “error” information from corneal sensors and other periorbital receptors to specify the CR profile. Comparison of avoidance and classically conditioned eyeblinks from this perspective raises two important questions.
First, little is known about the computational requirements of protective responses. For eyeblinks, there should be a trade-off between protecting the eye (long blink) and maintaining vision (short blink), but the precise relation is unclear. Experimental evidence concerning the effects of eyelid restraint indicates that the amplitude of unconditioned blinks is under adaptive control by the cerebellum (Chen and Evinger, 2006; Pellegrini and Evinger, 1997), and further investigation of this control may help to clarify its computational basis.
The second question concerns similarities between the temporal profiles of eyeblinks produced by classical conditioning and those resulting from avoidance training. There appears to have been little systematic investigation of this issue, but one report compares conditioned responses obtained using periocular electrical US with those obtained using an airpuff US that was directed to the unrestrained eyelids and so permitted a degree of US avoidance. The two response types were very similar (Mauk and Ruiz, 1992). If there were some avoidance component in the airpuff condition, as the authors suggest, then it would appear that the performance of the eyeblink control system is little affected when the internal comparator is substituted for external comparison, which in turn suggests that the comparator architecture is accurately tuned to mimic the external world.
A key feature of simulating these features with a future Marr–Albus type model of the cerebellum is that it should also be able to permit accurate performance on other motor control tasks, such as VOR calibration. Accumulating evidence suggests that even for precise motor control, the relevant “error” signal is not the difference between desired and actual motor outputs, not least because the desired motor commands are usually unknown. Instead, the cerebellum appears to use sensory indicators of inaccurate motor commands as teaching signals (e.g., retinal slip for the VOR). Then adaptive filter models of the cerebellum can learn to reduce motor error indirectly, by decorrelating their inputs from the sensory-derived teaching signal (Dean et al., 2002; Dean and Porrill, 2008; Porrill and Dean, 2007; Porril l et al., 2004).
One unexpected outcome of modeling the neural substrate of NM conditioning relevant to this issue is that it became increasingly evident the widely-held view that NM CR peaks are precisely timed to US-onset is inconsistent with the literature. It is much more common for NM CRs to lag the US by a delay, consistent in individual subjects, of up to 50ms at ISIs around 350ms, with longer lags for extended ISIs. In advancing arguments for a calibration of the NM response system under normally-occurring operant conditions, it would appear that such calibration is only partly successful if the resultant behavior has consistent timing lags. However, it should be recognized that the NM response is only one part of the eyeblink defensive system that importantly includes the external eyelid system too. For rabbits, extensive experimental evidence (e.g., Ruiz and Mauk, 1992; Zbarska et al., 2007) suggests this response system has a very similar dependence upon the cerebellum but CR peak latencies have much shorter lags from US-onset at 10–20ms at an ISI of 250ms even with an airpuff US (Gruart et al., 2000). Thus, if there is common drive for eyeblink and NM conditioned responses (see van Ham and Yeo, 1996) the optimization of a response system with fast (eyelid) and slower (NM) time constants might lead to the NM CR peak delays seen in the experimental literature and revealed here in our model.
In considering how the system matches CR peak to delivery of the US it should be noted that trains of electrical pulses are used as the US in behavioral experiments with typical durations of 60ms (e.g., Kehoe et al., 2009; Kellett et al., 2010) and an airpuff US is typically longer at 100ms (e.g., Welsh, 1992). Thus, CR peak delays similar to the US duration may reflect strategies that better optimize US cancellation through peripheral and central mechanisms. To advance our understanding of these problems, further studies with very short US durations, with detailed electrophysiological analysis of Purkinje cell activity in the critical cerebellar control regions during conditioning and with careful monitoring of eyelid and nictitating membrane responses will be needed.
Whatever the details, a marked similarity between classically conditioned blinks and those produced by avoidance training would suggest the comparator architecture illustrated in Figure Figure11 lacks a crucial adaptive ingredient. Understanding the nature of this adaptation would shed light not only on the prediction of sensory signals in classical conditioning, but also on the nature of cerebellar control in general.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was supported by the United Kingdom Biotechnology and Biological Research Council (Grant BBS/B/17026 under the Integrative Analysis of Brain and Behaviour Initiative) and the Wellcome Trust (VIP award to N. Lepora).
The basis filters used initially in our simulations are designed to give a set of Gaussians from a square-wave CS input (Figure (Figure3A).3A). A suitable filter transformation of the CS is
with linearly spaced means and widths μk=0.05k, σk=μk/5 and hk=1 where k runs from 1 to 20. The first couple of variants of the adaptive filter model consider changes to this initial set of basis filters. First, a set of Gaussians also varying in peak amplitude (Figure (Figure3B),3B), with an envelope function hk(μk) assumed to depend only on the Gaussian mean (we used to have unit peak amplitude at ~400ms). Second, a more closely spaced set of widths for the Gaussians (Figure (Figure3C),3C), with μk=0.025k and σk=μk/5. The other variants consider are of the form:
First, a set of decaying exponential with αk=0.1/k the time constant of the decaying exponential. Second, a tapped delay line (Figure (Figure33E), with delay dk=0.05k. Third, a delta function basis (Figure (Figure3F),3F), with delay dk=0.01k.