The mediodorsal thalamus (MD) is a crucial component of the neural network involved in the learning and generation of goal-directed actions. A series of experiments reported here examined the contributions of MD to the temporal differentiation of reward-guided actions. In Experiment 1, we trained rats on a discrete-trial, fixed-criterion temporal differentiation task, in which only lever presses exceeding a threshold duration value were rewarded. Pre-training MD lesions impaired temporal differentiation of action duration, by increasing the dispersion of the duration distribution. Post-training MD lesions also impaired differentiation, but by reducing the average emitted press durations, thus shifting the distribution without increasing the dispersion. In Experiment 2, we trained rats to space their lever pressing above criterion inter-press-intervals in order to earn rewards. Both pre-training and post-training MD lesions impaired the differentiation of inter-press-intervals. These results show that MD plays an important role in the acquisition and expression of action differentiation.
The mediodorsal nucleus of the thalamus (MD), which participates in both the associative and limbic cortico-basal ganglia networks, has been implicated in learning and memory (Markowitsch, 1982; Aggleton and Mishkin, 1983a; Hunt and Aggleton, 1991, 1998a; Oyoshi et al., 1996; Parker et al., 1997; Gaffan and Parker, 2000; Mitchell and Dalrymple-Alford, 2005; Mitchell et al., 2007a,b). MD receives inputs from the medial substantia nigra pars reticulata and ventral globus pallidus (Haber et al., 1985; Ilinsky et al., 1985) and sends direct projections to the striatum (Cheatwood et al., 2003, 2005); it is also strongly and reciprocally connected with the prefrontal cortex, including anterior cingulate cortex and orbital frontal cortex (Goldman-Rakic and Porrino, 1985; Giguere and Goldman-Rakic, 1988; Ray and Price, 1992, 1993). In agreement with the anatomical connectivity, MD lesions have been reported to disrupt the acquisition of stimulus-reward associations (Gaffan and Murray, 1990; Gaffan et al., 1993) and action-outcome associations (Corbit et al., 2003; Mitchell et al., 2007b; Ostlund and Balleine, 2008), which involve the limbic and associative cortico-basal ganglia networks, respectively (Yin et al., 2008). MD lesions also cause deficits in recognition memory (Aggleton and Mishkin, 1983b; Zola-Morgan and Squire, 1985; Parker and Gaffan, 1997; Parker et al., 1997; Hunt and Aggleton, 1998a), Pavlovian fear conditioning, stress responses (Herry et al., 1999; Chauveau et al., 2005), and limbic motor seizures (Cassidy and Gale, 1998; Popken et al., 2000; Byne et al., 2001; Volk and Lewis, 2003).
As reported by Corbit et al. (2003), MD plays an important role in the acquisition of goal-directed behavior, which is sensitive to outcome devaluation and instrumental contingency degradation. This study focuses on the role of MD in the timing of actions, using the action differentiation paradigm. Action differentiation is the process by which different actions are generated despite identical stimulus conditions (Platt et al., 1973; Kuch, 1974; Kuch and Platt, 1976; Yin, 2009). Here we examined the temporal differentiation of action duration (the period between the pressing and the release of a lever) and inter-response-times (IRT, time between two adjacent lever presses) to further investigate the role of the MD in instrumental learning.
We used 65 male Long–Evans rats (~7 weeks at the beginning of the experiments). All procedures followed Duke University Animal Care and Use Committee guidelines. Surgery was performed under general anesthesia with isoflurane (2%). Anesthetized animals were mounted in a stereotaxic device (Kopf, CA, USA). The scalp was cut to expose the skull surface and small burr holes were drilled bilaterally (AP −3.0; ML ±0.7−0.8; DV −5.5). Lesions were created by infusing 0.4μl NMDA (0.1M in PBS) per side over 2min; and to allow the drug to diffuse, the needle was left in the brain for another 5min after the end of injection. Sham lesions were created using the same procedures except 0.9% saline was injected instead of NMDA.
Training took place in six Med Associates (St. Albans, VT, USA) operant chambers. Rats were food deprived by feeding them 10–12g of home chow each day after training and testing, to maintain their body weight at about 85% of their normal weight. Water was always available in the home cages. Each chamber was equipped with a food magazine that received pellets from a pellet dispenser (Bio-Serv 45mg dustless precision pellets, Bio-Serv, NJ, USA). An infrared beam crossed the magazine opening to record head entries into the magazine. Each chamber contained two retractable levers on either side of the magazine and a 3W 24V house light mounted on the wall opposite the levers and magazine. A computer with the Med-PC-IV program was used to control the equipment and record behavior. The duration of each lever press was measured at a resolution of 10ms using custom-written programs.
Twenty-three rats were used in this experiment (sham, n=11; lesion, n=12). Before the temporal differentiation training, we replicated the previously reported deficits on the outcome devaluation test following MD lesions. Both groups of rats were trained with the same two-action, two-outcome training design from a previous study (Corbit et al., 2003). Briefly, rats were given two 30-min sessions of magazine training before lever press training, in which the food pellets and 20% sucrose were delivered on a random time schedule (on average every 60s), with no levers extended. On the next day, half of the rats in each group received pellets by pressing the left lever and sucrose by pressing the right lever. The rest were trained with the opposite pairing. Initial lever press training began with 2 days of continuous reinforcement (CRF, each press earns one reward), and then shifted to a random ratio (RR) schedule, which consisted of 2 days of RR5 (0.2 probability of reward for each press), 2 days of RR10 (0.1 reward probability) and 2 days of RR20 (0.05 reward probability). The rats were trained with two sessions each day, one with each pairing. The sessions were separated by at least 1h. Each session started with illumination of the house light and insertion of the lever, and ended with turning off the house light and retraction of the lever after 90min or 30 rewards.
After the last day of RR20 training, the rats were given two consecutive days (one session per day) of outcome devaluation test. Before each test, the rats were pre-fed on either pellet or sucrose for at least 1h. After pre-feeding, a 5-min choice extinction test was conducted. During the test, both levers were inserted, but no reward was delivered. The number of presses on each lever was recorded.
After the devaluation test, the rats were retrained with 1 day of CRF with pellets, during which half the rats pressed the left lever and the other half pressed the right lever. Sucrose was not used in this experiment. Following CRF, the rats were successively shifted to three different temporal response differentiation schedules, in which the rats were trained to produce lever presses with a minimum duration at 400, 800 and 1600ms (Yin, 2009). A discrete-trial program was used. Each trial began with the insertion of a lever, and ended with its retraction as soon as the lever was pressed and released. The trial was repeated, with an inter-trial-interval of 8s. If the press lasted longer than the criterion duration, following the release of the lever a food pellet was delivered immediately into the food magazine. If not, no pellet was given. The session was terminated after 90min or 50 earned pellets. The rats were trained for six sessions on each criterion.
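The reward rule of the discrete-trial duration schedule can be illustrated with a minimal Python sketch. It is not the Med-PC program used in the experiments: the function name `score_duration_session` is hypothetical, press durations are assumed to be already measured in milliseconds, and the 90-min time limit and the 8-s inter-trial interval are omitted for brevity.

```python
from typing import List, Tuple


def score_duration_session(
    durations_ms: List[int],
    criterion_ms: int,
    max_pellets: int = 50,
) -> Tuple[List[bool], int]:
    """Score one session of the duration differentiation schedule.

    A press is rewarded only if it lasted longer than the criterion.
    Scoring stops once `max_pellets` rewards have been earned
    (the 90-min session limit is not modeled in this sketch).
    """
    rewarded: List[bool] = []
    pellets = 0
    for duration in durations_ms:
        hit = duration > criterion_ms  # strictly longer than the criterion
        rewarded.append(hit)
        if hit:
            pellets += 1
            if pellets >= max_pellets:
                break
    return rewarded, pellets
```

For example, with a 400-ms criterion, presses of 300, 450, 400 and 900ms would yield two pellets under this rule, because a press exactly at the criterion is not counted as exceeding it.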
Twenty-six rats were used in this experiment. Before surgery, naïve rats were trained with temporal differentiation schedules: >400, >800 and >1600ms for six sessions on each criterion after four sessions of CRF. Following the last session of 1600-ms duration training, the rats were given free access to food in their home cage for 1 day. At the time of the surgery, rats were divided into two groups based on their baseline instrumental performance during their duration differentiation training. Half the rats from each cage (the rats were paired in home cage) received MD lesion with 0.4μl NMDA, and the remaining received 0.9% saline. The surgery procedure was the same as described above. After surgery, the rats were allowed to recover for 5 days, and then returned to the food deprivation schedule for 2 days before the test. During the test, the rats were placed under >1600ms schedule for four sessions.
After duration differentiation training, 16 of the rats from the post-training duration differentiation experiment (sham, n=8; lesion, n=8) were retrained with CRF for four sessions, and then used for IRT differentiation schedule at >10 and >20s successively. Rats were required to press the lever with a minimum delay above a criterion value (10 or 20s) after the last press. If they pressed earlier than the required delay, the reward was canceled. The rats were trained for five sessions at 10s, and then shifted to 20s for five sessions. Each session terminated after 90min or 30 earned pellets.
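As a rough illustration of the IRT schedule, the following Python sketch derives IRTs from a list of press timestamps and marks which presses would have met the criterion. The function name `score_irt_presses` is hypothetical, timestamps are assumed to be in seconds, and the first press of a session is left unscored because it has no preceding press; how the actual program handled that press is not specified here.

```python
from typing import List, Tuple


def score_irt_presses(
    press_times_s: List[float],
    criterion_s: float,
) -> List[Tuple[float, bool]]:
    """Return (IRT, rewarded) for every press after the first one.

    The IRT is the time elapsed since the previous press. A press is
    rewarded only if its IRT exceeds the criterion (e.g. 10 or 20 s).
    """
    scored: List[Tuple[float, bool]] = []
    for previous, current in zip(press_times_s, press_times_s[1:]):
        irt = current - previous
        scored.append((irt, irt > criterion_s))
    return scored
```

With presses at 0, 2.5, 15 and 20s and a 10-s criterion, only the third press (IRT of 12.5s) would be rewarded in this sketch.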
Sixteen naïve rats were used in the post-training MD lesion experiment. Lever press training began with four sessions of CRF. Half of the rats in each group earned pellets by pressing the left lever. The remaining rats were trained with the right lever. They were all shifted to two sessions of RR5, two sessions of RR10 and two sessions of RR20. They were then trained on the IRT differentiation schedule at >10 and >20s for six sessions on each criterion. The session ended with 30 earned pellets or after 90min.
Before surgery, the rats had free access to food for 1 day. They were then divided into two groups based on their baseline instrumental performance during the pre-surgery training. Half of the rats from each cage received MD lesion, and the remaining were chosen as controls. The surgical procedure is the same as described above. Similarly, the rats were given a recovery period of 5 days and then food deprived before testing. The >20-s schedule was used during the test.
Data analysis was performed using Microsoft Excel, Graphpad Prism and Matlab.
Histological analysis showed that NMDA infusions caused substantial neuronal damage in the MD, with limited damage to surrounding thalamic nuclei in some subjects (Figure 1). Four rats with inaccurate lesions (lesion of deeper and more caudal thalamic nuclei, or unilateral damage of MD) were excluded.
To make sure we were targeting the same area as previous work, we replicated the earlier experiment using two-action, two-outcome training, and performed a 5-min devaluation test before the duration differentiation training. Our data indicated that both groups of rats learned to press the lever for both pellets and sucrose after 8 days of training. The mean press rates increased across the days. A two-way ANOVA with group and days as factors showed no main effects of group (pellet, F1,105=2.35, p>0.05; sucrose, F1,105=2.44, p>0.05), main effects of days (pellet, F7,105=90.83, p<0.0001; sucrose, F7,105=67.98, p<0.0001), and no interactions between them (pellet, F7,105=0.24, p>0.05; sucrose, F7,105=1.79, p>0.05) on pellet and sucrose. During the devaluation test, MD lesioned rats displayed reduced sensitivity to outcome devaluation: their mean response rate did not differ between devalued and non-devalued levers (planned comparison, p>0.05). By contrast, the sham control group exhibited a normal devaluation effect, pressing less frequently on the devalued lever (planned comparison, p<0.05). We were therefore able to replicate the results from previous work (Corbit et al., 2003) showing that MD lesion impaired acquisition of the action-outcome contingency.
To examine the effect of MD lesion on temporal differentiation, the rats were retrained with 1 day of CRF with pellets as reinforcers, and then began the duration differentiation task. Figure 2 shows that all rats (n=11 for sham and n=11 for MD) learned to perform the task and produced duration distributions peaking around criterion durations within six sessions (Figure 2A).
As the press durations exhibited non-Gaussian distributions, to quantify the performance, the median press duration value of each rat was used as a measure of the timing of the action and interquartile range (IQR) of the duration distribution was used as a measure of dispersion. For all three criterion durations, the median duration of both groups increased within six sessions (Two-way ANOVA, main effects of time: 400ms, F5,100=2.48, p<0.05; 800ms, F5,100=8.74, p<0.0001; 1600ms, F5,100=24.74, p<0.0001), and reached a steady state across last three sessions (no effects of time: Fs<1.2, ps>0.05, Figure 2B). There was no significant difference between sham and MD lesion rats for all three criterion durations (no effects of group: 400ms, F1,100=1.14, p>0.05; 800ms, F1,100=0.74, p>0.05; 1600ms, F1,100=0.38, p>0.05); no interaction between time and group (Fs <1).
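The two summary measures described above can be computed as in the following Python sketch. It is illustrative only: the analyses reported here were run in Excel, Prism and Matlab, the function name `summarize_durations` is hypothetical, and the quartile convention (the default "exclusive" method of `statistics.quantiles`) is an assumption, since quartile definitions differ between software packages.

```python
import statistics
from typing import Sequence, Tuple


def summarize_durations(durations_ms: Sequence[float]) -> Tuple[float, float]:
    """Return (median, interquartile range) of one rat's press durations.

    The median indexes the timing of the action; the IQR (third quartile
    minus first quartile) indexes the dispersion of the distribution.
    """
    median = statistics.median(durations_ms)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(durations_ms, n=4)
    return median, q3 - q1
```

Both measures are rank-based, so a few very long or very short presses have little influence on them, which is the reason they suit skewed duration distributions.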
However, as shown in Figure 2B, pre-training MD lesion produced higher variability in press durations. This was confirmed by a two-way ANOVA. The IQRs of lesioned rats were higher than those of sham rats at 400 and 1600ms (main effects of lesion, for 400ms, F1,100=5.20, p<0.05; for 1600ms, F1,100=9.32, p<0.01; no effects of time, for 400ms, F5,100=0.74, p>0.05; for 1600ms, F5,100=0.82, p>0.05; no interactions, for 400ms, F5,100=1.12, p>0.05; for 1600ms, F5,100=1.14, p>0.05). For 800ms, the IQR was not significantly different (no effect of lesion, F1,100=2.74, p>0.05; effect of time, F5,100=2.34, p<0.05; no interaction, F5,100=0.87, p>0.05) although the IQRs were numerically higher for MD lesions.
We also compared the proportion of presses rewarded and the rate of lever presses (rLP) for each criterion duration (Figure 3A). For both groups, the proportion of rewarded presses and rLP increased over six sessions (Two-way ANOVA, main effects of time, 400ms, Fs5,100>6.70, ps<0.0001; 800ms, Fs5,100>19.3, ps <0.0001; 1600ms, Fs5,100>23.6, ps<0.0001; no interactions between time and group, Fs<1.62, ps>0.05) at each criterion. Similar to the median duration measure, the transition to steady state occurred around the 4th session for both sham and lesion groups (no effects of time for last three sessions: Fs2,40<1.38, ps>0.05 for proportion of rewarded presses; Fs2,40<1.18, ps>0.05 for rLP; except at 1600ms, F2,40=3.68, p<0.05 for proportion of rewarded presses; no interactions at all criteria, Fs <1.21, ps >0.05). As shown in Figure 3A, lesioned rats produced a lower proportion of correct (rewarded) presses and showed a reduced rate of pressing at 1600ms across six sessions (main effect of group, F1,100=5.06, p<0.05 for proportion of rewarded presses; F1,100=5.58, p<0.05 for rLP). This effect was smaller for shorter duration criteria. At 400ms, although the proportion of presses rewarded and the press rate of the lesion group were not significantly lower than those of the control group across six sessions (no effect of group, Fs1,100<3.70, ps >0.05), they did reach statistical significance over the last three sessions (main effect of group, F1,40=4.72, p<0.05 for proportion of presses rewarded; F1,40=7.01, p<0.05 for rLP). There was no significant lesion effect for the 800-ms criterion duration, however (no effect of group, Fs1,100<0.87, ps >0.05). Thus pre-training MD lesions impaired the accuracy of action timing during the initial differentiation learning and when the performance became more difficult.
Interval timing often exhibits the scalar property, i.e. noise is proportional to the average. A recent study in mice indicated that motor cortical lesions have no impact on the scalar property of press duration (Yin, 2009), which was previously suggested to be a basic property of the psychophysical judgment of temporal duration. To assess the effect of MD lesions on the scalar property of timing, coefficient of variation (CV, standard deviation/mean) across six sessions was analyzed (Figure 3B). A two-way ANOVA analysis with group and duration as factors showed a main effect of group (F1,40=7.50, p<0.05), a main effect of duration (F2,40=13.91, p<0.05), and no interaction between these two factors (F2,40=1.06, p>0.05). In other words, MD lesion resulted in a general increase in the CV of press durations, and the CV changes depending on the duration criterion. Post hoc tests revealed a difference between 400 and 800ms sessions (p<0.05), but no difference between 800 and 1600ms (p>0.05). This observation suggests that, in rats, the distribution of lever press durations may exhibit the scalar property only for relatively long durations.
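The CV measure and its relation to the scalar property can be sketched in a few lines of Python. This is a hedged illustration rather than the analysis code used here: the function name `coefficient_of_variation` is hypothetical, and whether the sample or the population standard deviation was used is not stated, so the sketch exposes the choice as a parameter.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float], sample: bool = True) -> float:
    """Return CV = standard deviation / mean.

    `sample=True` uses the sample standard deviation (n - 1 denominator);
    `sample=False` uses the population standard deviation. Which one the
    original analysis used is an assumption left open here.
    """
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / statistics.mean(values)
```

Under the scalar property, the standard deviation grows in proportion to the mean, so the CV stays constant when every duration is multiplied by the same factor. A CV that differs between two criteria, as between 400 and 800ms above, is therefore evidence against scalar timing over that range.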
To investigate how post-training MD lesions affect the expression of temporal differentiation, after training with three criterion durations, the rats were separated into two groups (n=13 for sham and n=12 for MD lesion) based on their press duration distributions (Figure 4A). Planned comparisons revealed no significant difference in median (p>0.05), IQR (p>0.05), rate of lever pressing (p>0.05), and proportion of rewarded presses (p>0.05) during the last session. After surgery, rats were re-tested with the 1600-ms duration task. The distribution of the lesioned group immediately shifted to the left (Figure 4B) during the first session of post-lesion tests. By contrast, there was no change in the sham group. Yet the dispersion of the distribution did not change in either group. A two-way ANOVA with group and session (pre-lesion vs. post-lesion) as factors showed an interaction between these two factors for median duration (F1,23=11.2, p<0.05) and proportion of rewarded presses (F1,23=31.8, p<0.05). Furthermore, post hoc analysis showed that the median duration of the lesioned group was significantly reduced (p<0.01, Figure 4C). Moreover, the proportion of rewarded presses decreased immediately after surgery (p<0.0001). No differences were found for the sham group (ps>0.05). In comparison, there was no interaction between time and group, no effect of session and no effect of group for IQR (Fs<1.86, ps>0.05) and rate of pressing (Fs<2.21, ps>0.05). Interestingly, the deficit in the lesioned group disappeared after three additional training sessions (4th session after surgery, ps>0.05 for all measures, data not shown). In short, post-training MD lesions caused an immediate deficit in the capacity of timing the required action duration, but this deficit could be reduced by additional learning.
The IRT distribution is shown in Figure 5A. At the beginning of training, both groups showed peak values that were much lower than the criterion value (data not shown). After five sessions of 10s IRT training, the sham group learned the task and showed a bi-modal distribution with the second mode above the criterion IRT. The MD lesion group still produced IRTs with a single mode below criterion (Figure 5A). After shifting to the new criterion of 20s, the difference between the two groups became significant. In the first session after shifting, the sham group immediately shifted their second peak above 20s, whereas the lesioned group showed lower IRTs (data not shown). During the last session, the lesion group slightly increased the proportion of long IRTs (Figure 5A).
Acquisition was quantified using three measures: Median, IQR, and proportion of rewarded presses. Here better performance is indicated by higher IRTs. A two-way ANOVA analysis of the >10s training data (Figure 5B, left), with time and lesion as factors, revealed a main effect of time across sessions (median, F4,54=13.53, p<0.0001; IQR, F4,54=11.71, p<0.0001; proportion of rewarded press, F4,54=46.71, p<0.0001), but no main effect of lesion (median, F1,54=0.76, p>0.05; IQR, F1,54=2.89, p>0.05; proportion of rewarded press, F1,54=2.41, p>0.05), and no interaction between these factors (Fs<2.12, ps>0.05).
When the criterion IRT shifted to 20s, lesioned rats immediately showed a deficit. This observation was confirmed by a two-way ANOVA performed on the data from first three sessions of 20s IRT training (Figure 5B, right). MD lesions reduced IQR and proportion of presses rewarded (main effects of lesion on IQR: F1,26=5.95, p<0.05; on proportion of rewarded presses, F1,26=10.37, p<0.05). There was also a significant effect of time (IQR, F2,26=4.79, p<0.05; proportion of rewarded presses, F2,26=3.83, p<0.05) and no interactions between these factors (IQR, F2,26=0.33, p>0.05; proportion of rewarded presses, F2,26=0.76, p>0.05). There was no difference in median IRT (no main effect of lesion, F1,26=1.10, p>0.05). These findings indicated that MD lesion has an important effect on IRT differentiation, especially when the task became presumably more difficult (>20s).
Furthermore, the lever press durations were examined. Figure 6 shows that sham and MD lesion groups exhibited similar distributions of press duration during the IRT differentiation test. There was no significant group difference in median duration and IQR (Fs1,52<1.93, p>0.05; no effect of time, Fs4,52<1.66, ps >0.05; no interaction between lesion and time, Fs4,52<2.04, p>0.05). Thus, when the time between presses is differentially reinforced, MD lesions did not affect the press durations themselves. Such results show that press duration and IRTs are independently controlled.
Similarly, after training with both criterion IRTs, the rats were separated into two groups. No differences existed between two groups for median IRT (t-test, p>0.05), IQR of IRT (p>0.05), rate of pressing (p>0.05) and proportion of rewarded presses (p>0.05) at the last pre-surgery session of 20s. After recovery from surgery, rats were retested with 20s sessions (Figure 7A). A two-way ANOVA analysis with session and group as factors showed a main effect of session (F1,13=6.02, p<0.05), no effect of group (F1,13=0.69, p>0.05), but an interaction between these factors (F1,13=8.49, p<0.05) for IQR of IRT. Significantly, a post hoc analysis on the pre- and post-lesion training session revealed that the dispersion of lesion group was reduced by the lesion (Figure 7B, lesion IQR, p<0.05). There were no effects of session and group, and no interactions for median IRT (Fs <2.44, ps >0.05), proportion of presses rewarded (Fs <3.46, ps >0.05), and rate of pressing (Fs <4.64, ps >0.05). These results indicated that post-training MD lesion impaired expression of IRT differentiation.
Previous studies have found a range of drug effects on duration and IRT differentiation of behavior (Schulze and Paule, 1990, 1991; Buffalo et al., 1993, 1994; Hudzik and McMillan, 1994a,b, 1995; McMillan et al., 1994; McClure and McMillan, 1997; McClure et al., 1997). Yet the neural substrates important for these two temporal dimensions of action have not been examined in any systematic fashion (Yin, 2009).
In this study, we found that MD plays an important role in both acquisition and expression of temporal differentiation in instrumental learning. More specifically, (i) pre-training MD lesion impaired the differentiation of action durations, producing higher variability in press duration; (ii) post-training MD lesion reduced the action duration and accuracy, but did not affect variability; (iii) pre-training MD lesion impaired the acquisition of IRT selection at longer required IRT (20s), resulting in lower IRTs and probability of rewarded presses; (iv) post-training MD lesion also impaired expression of IRT differentiation. Overall, there is a general impairment in the formation of an operant, i.e. any behavioral parameter that increases the frequency of reward delivery. The effects of MD lesions are specific – limited to the shaping of the appropriate operant, be it press duration or IRT. When press duration was the operant, it was affected by MD lesions; but when IRT was the operant, MD lesions impaired IRT differentiation without having any effect on duration distribution.
Differentiation is to be distinguished from discrimination. In discrimination, behavior is generated in response to some discriminative stimulus, e.g. green light go, red light stop. In differentiation, the external stimuli do not provide any instruction about what the animal should do. Rather, the animal must use learned criteria to produce appropriate behavior. Most previous lesion studies of reward-guided behaviors use some variant of cue discrimination procedure, but the study of differentiation has largely been neglected.
The current study focuses on temporal differentiation, which is concerned with the questions of ‘when’ and ‘how long.’ In the absence of instructions the organism can select the appropriate behavioral parameter based on experienced consequences. Action duration and spacing are two basic temporal dimensions of behavior, known to be modifiable by learning (Skinner, 1938; Kuch, 1974; Kuch and Platt, 1976). In the former, the duration is the operant, whereas in the latter, the time between presses is the operant. Press duration differentiation restricts the animal's behavioral repertoire (it is impossible to enter the magazine or groom while holding down the lever), whereas IRT differentiation does not restrict the range of behaviors to fit the required temporal interval. Because IRT can be affected by a variety of uncontrolled variables, it is thought to be a noisy index of temporal differentiation (Platt et al., 1973; Kuch, 1974; Kuch and Platt, 1976). Our results support this assumption. In duration differentiation, the improvement in performance and the impairments produced by MD lesion were relatively consistent. However, in IRT differentiation, the IRTs between two presses showed a bimodal distribution. Obviously, the second peak near the criterion IRTs was due to reinforcement, while the first peak of short IRTs (~2.5s) was relatively independent of reinforcement. Nevertheless, historically the more commonly studied temporal dimension of actions has been IRTs, in part for technical reasons. Here we showed that MD lesions impaired performance on both tasks, and that duration differentiation is a more convenient and reliable method for studying temporal differentiation.
In this study the temporal differentiation experiments were conducted with a single action and a single reward. Extended training under these conditions has been shown to result in lever pressing that is not explicitly goal-directed, as indicated by insensitivity to outcome devaluation treatments (Yin and Knowlton, 2006). But devaluation is not a convenient test to assess the goal-directedness of differentiation, as the operant in question here is not the rate of action but the form of action. It is certainly possible that devaluation can reduce the rate of lever pressing on this task, which would indicate the goal-directed nature of the differentiated action. This analysis can be difficult to perform, however, as rate-based devaluation relies on the use of extinction tests to probe the remembered action-outcome representation, and the lack of reinforcement in extinction tests may produce other effects on the form of the action after temporal differentiation. A possible solution is the use of partial reinforcement schedules for specific duration criteria, so that the animals are used to performing non-reinforced but nevertheless correct actions. Such a method should be effective when combined with a short extinction test in revealing the goal-directedness of differentiated actions.
Pre-training and post-training MD lesions produced different effects. Both impaired accuracy of performance; both increased number of errors (presses not long enough to be rewarded), but in different ways. Pre-training lesions increased the dispersion of the lever press duration distribution, i.e. increasing ‘noise’ in performance. Post-training lesions, on the other hand, did not affect dispersion, but simply reduced the median duration. One obvious alternative account of these findings, especially the deficits after post-training lesions, is that MD may be needed for the inhibitory control of instrumental actions, which explains the premature release of the lever after post-training lesions. This account, however, failed to explain the very different results obtained after pre-training lesions, namely increased dispersion of press durations with no significant change in average duration. Thus, whatever role the MD may play in the inhibition of actions cannot easily explain our results.
With pre-training lesions, it is possible that animals were able to make use of alternative systems to acquire the action duration, as the duration distribution is more dispersed. In this connection it should be noted that for the 800-ms duration criterion, pre-training lesions did not produce significant deficits, even though the lesioned rats showed numerically higher IQR. With post-training lesions, there appeared to be a direct effect on the acquired memory of the criterion duration. The variability, however, was not affected. To our knowledge, such observations have never been reported in any previous lesion study for any brain region. But their significance remains unclear, as the neural circuitry underlying duration differentiation is still poorly defined.
As previously reported, pre-training but not post-training MD lesions significantly reduced sensitivity of instrumental performance to outcome devaluation and instrumental contingency degradation, suggesting a crucial contribution of MD to the acquisition of action-outcome contingencies (Corbit et al., 2003). Furthermore, others have found MD is important for new learning, but not for retrieval of previously learned scene discrimination (Mitchell et al., 2007a; Mitchell and Gaffan, 2008). Consistent with previous work, our data revealed that the MD is critical for new learning: Pre-training lesions of the MD impaired acquisition. Unlike previous work, however, our data also revealed that post-training lesions impaired the expression of temporal differentiation, for both press durations and IRTs (Figures 4 and 7). Thus, our behavioral measures permit the discovery of new effects of post-training lesions.
Our data also suggest that the largest effects of MD lesions are found when animals have to re-adjust their behavior to new and more difficult contingencies, for example when the criterion duration shifted to 1600ms and the IRT shifted to 20s (Figures 2, 3 and 5). This is consistent with studies which suggested that the MD plays a particular role in certain forms of behavioral flexibility (Hunt and Aggleton, 1998b; Block et al., 2007). In this study, MD is required when animals have to adjust the timing of their actions.
Despite the range of deficits produced by MD lesions, they did not impair general sensorimotor functions or motivation (Figure 6). Rather, the MD may be critical for the acquisition and retention of the operant – an arbitrary set of behavioral parameters that lead to the goal. Thus the present results extend previous work on the role of MD in action-outcome learning. But a major difference lies in the significant effects of post-training lesions reported here using temporal differentiation. Previous work (Ostlund and Balleine, 2008) did not identify any effect on sensitivity to devaluation following post-training lesions. In traditional rate measures of instrumental performance, the operant is less constrained. As such it could be redundantly represented by numerous brain areas; and once acquired, the expression of the press rate-outcome knowledge does not appear to require the MD. On the other hand, the present temporal differentiation procedure requires more specific representation of the duration of the required lever press, which may require the MD.
The cortical-basal ganglia networks are functional units for behavioral integration (Yin and Knowlton, 2006; Haber and Calzavara, 2009). How different substrates within the networks contribute to temporal differentiation of action is not yet known. One obvious candidate, in light of our data, is the associative cortico-basal ganglia network, which has access to motor initiation networks in the brainstem. In addition to MD, the medial prefrontal cortex, the major target of MD outputs, is important for learning of new action-outcome contingencies but not for expression of learned associations (Corbit and Balleine, 2003; Ostlund and Balleine, 2005). As suggested by the similar pre-training effects of the dorsomedial striatum and the prefrontal cortex (Corbit and Balleine, 2003; Yin et al., 2005), initial acquisition of action-outcome contingencies is mediated by the associative cortico-basal ganglia network including medial prefrontal cortex, associative striatum, and MD. While our results have uncovered a novel role for MD in action differentiation, additional research will be needed to clarify the specific contributions of thalamic, striatal, and cortical regions to this important adaptive function.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work is supported by National Institute on Alcohol Abuse and Alcoholism Grants 016991 to H.H.Y. We would like to thank Oksana Shelest and Alberto Lopez for their help with the experiments.
CRF, continuous reinforcement; CV, coefficient of variation; IRT, inter-response-time; MD, mediodorsal; NMDA, N-methyl-D-aspartic acid; rLP, rate of lever presses; RR, random ratio.