

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Neuron. Author manuscript; available in PMC 2012 December 22.
Published in final edited form as:
PMCID: PMC3246213

NMDA Receptors in Dopaminergic Neurons are Crucial for Habit Learning


Dopamine is crucial for habit learning. Activities of midbrain dopaminergic neurons are regulated by the cortical and subcortical signals among which glutamatergic afferents provide excitatory inputs. Cognitive implications of glutamatergic afferents in regulating and engaging dopamine signals during habit learning however remain unclear. Here we show that mice with dopaminergic neuron-specific NMDAR1 deletion are impaired in a variety of habit learning tasks while normal in some other dopamine-modulated functions such as locomotor activities, goal directed learning, and spatial reference memories. In vivo neural recording revealed that DA neurons in these mutant mice could still develop the cue-reward association responses, but their conditioned response robustness was drastically blunted. Our results suggest that integration of glutamatergic inputs to DA neurons by NMDA receptors, likely by regulating associative activity patterns, is a crucial part of the cellular mechanism underpinning habit learning.


Many acts, after repetitive practice, transform from goal-directed actions into automated habits, which can be carried out efficiently and subconsciously. Habits free up cognitive load from routine procedures and allow us to focus on new situations and tasks. Despite breakthroughs revealing the participation of different anatomical structures in habit formation (Knowlton et al., 1996; Yin and Knowlton, 2006), the underlying physiological mechanisms and the ways in which different network circuitries integrate remain unclear.

Dopamine is an important regulator of synaptic plasticity, especially in the basal ganglia, a structure essential for habit learning. In both human patients (Fama et al., 2000; Knowlton et al., 1996) and rodents (Faure et al., 2005), habit learning is often found impaired following dopaminergic neuron degeneration. Dopamine has thus been postulated as a main modulator in the mechanisms underpinning habit learning (Ashby et al., 2010). Despite this importance, the mechanisms modulating dopamine during habit learning have yet to be fully investigated. Studies have shown that habit learning deficits caused by dopamine deafferentation could not be rescued by simple intra-striatal injections of DA agonists (Faure et al., 2010). These observations suggested that dopamine, the modulator itself, might need to be regulated during normal habit learning. Anatomically, along with cholinergic inputs, glutamatergic afferents from brain structures such as the pedunculopontine tegmental nucleus (PPTg), subthalamic nucleus (STN) and prefrontal cortex (PFC) provide the main forms of excitatory input to the midbrain DA neurons (Grace et al., 2007). As members of the ionotropic glutamate receptor family, NMDA receptors are important regulators of DA neuron activity. Firstly, the synaptic plasticity in the glutamatergic afferents to the dopamine neurons depends on NMDA receptors (Bonci and Malenka, 1999; Overton et al., 1999; Ungless et al., 2001). This plasticity can be modulated by experiences, environmental factors, and psychostimulant drugs (Bonci and Malenka, 1999; Kauer and Malenka, 2007; Saal et al., 2003). Secondly, iontophoretic administration of NMDAR antagonists, but not AMPAR-selective antagonists, attenuated phasic firing of DA neurons, an activity linked to reward/incentive salience (Schultz, 1998), without changing the frequency of tonic firing (Overton and Clark, 1992). Thirdly, in drug addiction studies, NMDARs in DA neurons are essential for developing nicotine conditioned place preference (Wang et al., 2010) and are likely also involved in cocaine conditioned place preference (Engblom et al., 2008; Zweifel et al., 2008). Thus, we postulated that modulation of DA neurons by NMDA receptors might be important for engaging DA neurons in habit learning. Here, we set out to examine the roles of NMDA receptors in DA neurons by generating DA neuron-specific NR1 knockout mice and testing them in a variety of habit learning paradigms (Devan and White, 1999; Dickinson et al., 1983; Packard et al., 1989; Packard and McGaugh, 1996). To understand the cellular mechanisms, we also recorded from the DA neurons in these mice using multi-electrode in vivo neural recording techniques (Wang and Tsien, 2011b).


Production and basic characterization of DA neuron selective NR1 knock-out mice

These mice, named “DA-NR1-KO”, were produced by crossing floxed NR1 (fNR1) mice (Tsien et al., 1996) with Slc6a3+/Cre transgenic mice, which express Cre recombinase under the DA transporter promoter (Zhuang et al., 2005) (Figure 1A, Figure 1B). The DA neuron-specific deletion of the NR1 gene was confirmed by both the reporter gene method (Figure 1C) and immunohistochemistry (Figure 1D), which showed that the gene deletion was restricted to the dopamine neurons in regions such as the VTA and the substantia nigra. No obvious changes were observed in the expression of tyrosine hydroxylase, a catecholaminergic neuronal marker, suggesting that there was no obvious loss of dopaminergic neurons (Figure S1).

Figure 1
Generation and characterization of DA-NR1-KO mice

DA-NR1-KO mice were born in the expected Mendelian ratios and visually indistinguishable from the controls. Additionally, they were normal in locomotor activities in a novel open field (Figure 2A), in learning the rotarod tests (Figure 2B), in an anxiety test using the elevated plus maze (Figure 2C), and in the novel object recognition tests (Figure 2D). These results showed that many of the behavioral functions that were sensitive to dopamine dysfunctions were preserved in the DA-NR1-KO mice.

Figure 2
Basic behavioral characterization of DA-NR1-KO mice

DA neurons in the DAT-NR1-KO mice show normal tonic firing, reduced phasic firing and reduced responses to the reward-predicting cue

To investigate the impact of NR1 deletion on the cellular properties of DA neurons, we recorded the activities of these neurons in both the DAT-NR1-KO mice and wild-type control littermates. Movable bundles of 8 tetrodes (32 channels) were implanted into the ventral midbrain, primarily the VTA. The putative DA neurons were identified based on their firing patterns and their sensitivity to the dopamine receptor agonist apomorphine (1 mg/kg, i.p.) at the end of each recording session (Figure 3A, B, C and D).

Figure 3
Bursting firing by DA neurons is impaired in KO mice

Fourteen putative DA neurons from 4 mutant mice and 16 from 6 wild-type controls were recorded and analyzed. Phasic firing, or bursting, was defined as a spike train beginning with an interspike interval (ISI) smaller than 80 ms and terminating with an ISI greater than 160 ms. Compared with the control neurons, phasic firing was greatly reduced in the NR1 KO DA neurons. The observed median frequency of phasic firing decreased from 0.78±0.09 Hz in the control DA neurons to 0.36±0.09 Hz in KO DA neurons (Mann-Whitney U test, P<0.01) (Figure 3E). A significant reduction was also observed in the percentage of spikes fired in phasic activities (34.7% in the controls vs 21.2% in the DAT-NR1-KO, Mann-Whitney U test, P<0.01) (Figure 3F). The total firing rate was also reduced in the mutant DA neurons. This appeared to be correlated with the reduced burst set rate (5.18±0.59 Hz, control, vs. 3.85±0.38 Hz, KO; r=0.7719, Mann-Whitney U test, P<0.01) (Figure 3G). No significant difference was observed in tonic firing between the mutant and control groups (4.42±0.44 Hz in control vs. 3.29±0.36 Hz in KO, Mann-Whitney U test, P>0.05) (Figure 3H).
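The burst criterion stated above (onset at an ISI below 80 ms, termination at an ISI above 160 ms) can be illustrated with a minimal Python sketch. The function names, the example spike train, and the choice to count a burst that is still open when the recording ends are assumptions made for illustration; this is not the analysis code used in the study.

```python
from typing import Dict, List

BURST_ONSET_ISI_S = 0.080   # a burst starts at an ISI smaller than 80 ms
BURST_OFFSET_ISI_S = 0.160  # a burst ends at an ISI greater than 160 ms


def detect_bursts(spike_times: List[float]) -> List[List[float]]:
    """Group sorted spike times (in seconds) into bursts using the ISI criterion."""
    bursts: List[List[float]] = []
    current: List[float] = []
    for prev, nxt in zip(spike_times, spike_times[1:]):
        isi = nxt - prev
        if not current:
            # Not in a burst: a short ISI opens one with both spikes.
            if isi < BURST_ONSET_ISI_S:
                current = [prev, nxt]
        elif isi > BURST_OFFSET_ISI_S:
            # In a burst: a long ISI closes it; the later spike is excluded.
            bursts.append(current)
            current = []
        else:
            current.append(nxt)
    if current:
        # Assumption: a burst still open at the end of the recording is kept.
        bursts.append(current)
    return bursts


def burst_summary(spike_times: List[float], duration_s: float) -> Dict[str, float]:
    """Burst rate (Hz) and percentage of spikes fired within bursts."""
    bursts = detect_bursts(spike_times)
    spikes_in_bursts = sum(len(b) for b in bursts)
    n_spikes = len(spike_times)
    return {
        "burst_rate_hz": len(bursts) / duration_s,
        "pct_spikes_in_bursts": 100.0 * spikes_in_bursts / n_spikes if n_spikes else 0.0,
    }


# Hypothetical spike train (seconds), not recorded data.
example_spikes = [0.00, 0.05, 0.10, 0.30, 0.70, 0.75, 1.20]
example_bursts = detect_bursts(example_spikes)
example_summary = burst_summary(example_spikes, duration_s=2.0)
```

In this hypothetical train, two bursts are detected and five of the seven spikes fall inside them; the isolated spikes at 0.30 s and 1.20 s count as tonic firing.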

To further evaluate the response of DA neurons in a learning task, mice were trained for 40 trials per day in a Pavlovian conditioning paradigm in which a 5 kHz tone lasting 1 second immediately preceded the delivery of a food pellet. DA neurons from both genotypes were able to associate the tone with phasic firing, but the conditioned responses were much weaker in the DAT-NR1-KO group (Figure 4A). Thus, while DAT-NR1-KO neurons showed increased firing over the days of training, their responses were significantly reduced compared with the controls on day 1 (19.21±3.24 Hz, control, vs. 9.74±0.30 Hz, KO; p<0.01), day 2 (36.33±4.39 Hz, control, vs. 16.43±4.01 Hz, KO; p<0.01) and day 3 (59.38±3.82 Hz, control, vs. 33.88±4.30 Hz, KO; p<0.01) (Figure 4B). These data suggested that while NMDAR1 deletion did not prevent DA neurons from developing conditioned responses (bursting) towards reward-predicting cues, it did greatly lower the robustness of the bursting response, a phenomenon which we call DA neuron blunting.

Figure 4
Responses of putative dopamine (DA) neurons in three days reward test

Habit learning, but not goal directed learning, was impaired in the operant appetitive conditioning

To assess habit learning, we first tested the mice in a lever-pressing operant conditioning task. In this task, an instrumental action, pressing a lever to obtain food, can transform from a goal-directed to a habitual response after extensive training and become progressively less sensitive to devaluation of the outcome (Dickinson et al., 1983). The decreased sensitivity can thus be measured as a behavioral readout of habit learning (Figure 5A). Both mutant and control mice learned to press the lever on an extensive training protocol consisting of four days of continuous reinforcement (CRF), two days of random interval (RI) 30 s, and six days of RI 60 s schedules (Dickinson et al., 1983). Mice in both groups increased lever press rates during the training (CRF, days 1 through 4; RI 30 s, days 5 and 6; RI 60 s, days 7 through 12) (Figure 5B). A two-way repeated-measures ANOVA with days and genotype as factors showed no effect of genotype (F(1, 231) = 0.07), a main effect of days (F(11, 231) = 51.4, P<0.01), and no interaction between these factors (F(11, 231) = 0.269). This result suggested that the DA-NR1-KO mice have normal wanting of the pellet reward and exhibit normal goal-directed learning.

Figure 5
Habit and goal directed learning test with operant appetitive conditioning

Lever pressing was then tested after outcome devaluation. Mice were pre-fed with either regular mouse chow, to which they had been exposed in their home cages (non-devalued condition/control), or purified high-energy pellets identical to the rewards earned during lever-press sessions (devalued condition). Feeding with mouse chow served as a control for the overall level of satiety, causing little reduction in the rewarding value of the purified high-energy pellets. Levers were inserted in the 5-minute probe test, which immediately followed the hour-long unlimited food exposure (pellets or chow). No pellets were given during the tests. Comparing the numbers of lever presses during the tests showed that, while no differences were found between the mutant and the control mice in the non-devalued condition (p=0.94) or between the devalued and non-devalued conditions in the control group (p = 0.153), there was a significant difference in the mutant mice between the devalued and non-devalued conditions (p < 0.01). Furthermore, there was also a significant difference between the mutant and control mice in the devalued condition (p < 0.05). A two-way repeated-measures ANOVA with treatment and genotype as factors showed an interaction between the two factors (F(1, 21) = 4.98, p<0.05) (Figure 5C). These results suggested that the conditional knockout mice failed to develop the lever-pressing habit despite extensive training, and that their action stayed goal directed.

Spatial navigation habit, but not spatial memory, was impaired in the positively reinforced plus maze

Habit learning was then assessed in a navigation-based paradigm using plus maze place/response learning tasks (Devan and White, 1999; Packard, 1999; Packard and McGaugh, 1996). Littermates in genotypes Slc6a3+/Cre; fNR1/+, Slc6a3+/Cre and wild type served as three control groups for the DA-NR1-KO mice. The maze was built with transparent walls and placed in a room furnished with spatial cues. The schematic training and testing schedules are shown in Figure 6A. Naïve animals, always starting from the same location in the maze (the “south” arm), were trained to find a fixed target site (in the “east” arm) (training I in Figure 6A). In order to facilitate developing habit based navigation, the north and the west arm were both closed. It has been shown that under this paradigm, normal mice would learn to search the target using spatial reference memory after moderate training but would switch to habitual navigation after extensive training (Packard and McGaugh, 1996). Probe trials, during which the start location switched from the “south” arm to the “north” arm, were given at different time points to allow dissociation of the spatial and habitual strategies. Thus mice using the “habit strategy” were predicted to turn right (into the “west” arm) while the “spatial” mice, guided by distal spatial cues, were predicted to go to the “east” arm, where the target resided during training.
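The logic of the probe trial, in which the same egocentric turn leads to a different arm once the start location changes, can be checked with a small Python sketch. The arm ordering and the function name are illustrative assumptions; the sketch only encodes the maze geometry described above.

```python
# Arms listed clockwise as seen from above (illustrative convention).
ARMS = ["north", "east", "south", "west"]


def arm_entered(start_arm: str, turn: str) -> str:
    """Arm a mouse enters after running from the distal end of `start_arm`
    to the center and making an egocentric `turn` ("left" or "right")."""
    i = ARMS.index(start_arm)
    if turn == "right":
        # Facing the center, a right turn leads to the previous arm in clockwise order.
        return ARMS[(i - 1) % 4]
    if turn == "left":
        return ARMS[(i + 1) % 4]
    raise ValueError("turn must be 'left' or 'right'")


# Training I: start in the south arm, target in the east arm (a right turn).
training_choice = arm_entered("south", "right")
# Probe trial: start in the north arm.
habit_choice = arm_entered("north", "right")    # same turn, now the west arm
spatial_choice = arm_entered("north", "left")   # same place, now a left turn
```

A mouse that repeats the trained right turn ends up in the west arm on the probe trial, whereas a mouse that returns to the trained place must turn left to reach the east arm, which is how the two strategies are dissociated.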

Figure 6
Habit learning analyzed using plus maze

All mice were trained for 10 trials per day for five consecutive days before the first probe trial on day 6 (Probe 1 in Figure 6A). During this probe trial, the DA-NR1-KO group and control mice showed similar preferences [χ2(3, n = 43) = 0.346, P= 0.951] for the “spatial” strategy, opting to turn left towards the “east” arm (Figure 6B), suggesting that they had similarly acquired the spatial memory and that they shared comparable motivation. All mice were then trained for 10 additional days before the second probe trial (Probe 2 in Figure 6A) on day 17. During this probe trial, no significant differences were found among the three control groups (χ2 (2, n=29) = 0.499, p= 0.779). As a group, control mice opted to “turn right” (into the “west” arm) significantly more on day 17 than on day 6 (χ2 (1, n = 29) = 22.587, p = 0.00000201), indicating a learned “habit”-based searching strategy. In contrast, fewer than 10% of the DA-NR1-KO mice (compared with 80% of control mice) opted to turn “right” on day 17 (mutants vs controls: χ2 =7.244; p =0.007) (Figure 6B), suggesting that they failed to learn the “habit”-based strategy and instead kept using the “spatial” strategy.
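As a cross-check, the P values reported with the χ2 statistics above can be recomputed from the statistic and its degrees of freedom alone. The following Python sketch uses the closed-form upper-tail probabilities of the χ2 distribution for 1, 2 and 3 degrees of freedom; the function name is hypothetical, and the sketch assumes an uncorrected Pearson χ2 test, which the text does not specify.

```python
from math import erfc, exp, pi, sqrt


def chi2_pvalue(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic for df = 1, 2 or 3."""
    half = stat / 2.0
    if df == 1:
        return erfc(sqrt(half))
    if df == 2:
        return exp(-half)
    if df == 3:
        return erfc(sqrt(half)) + sqrt(2.0 * stat / pi) * exp(-half)
    raise ValueError("only df = 1, 2 or 3 are handled in this sketch")


# Statistics quoted in the text above, as (chi-square value, degrees of freedom).
p_probe1 = chi2_pvalue(0.346, 3)          # four genotypes, first probe trial
p_controls = chi2_pvalue(0.499, 2)        # three control groups, second probe trial
p_day17_vs_day6 = chi2_pvalue(22.587, 1)  # pooled controls, day 17 vs day 6

for label, p in [("probe 1", p_probe1), ("controls", p_controls), ("day 17 vs 6", p_day17_vs_day6)]:
    print(f"{label}: p = {p:.3g}")
```

The closed forms avoid a dependency on a statistics package; for other degrees of freedom a library routine for the regularized incomplete gamma function would be needed.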

To confirm that the deficits in the plus maze tasks indeed reflected habit learning, right after the second probe trial mice were further challenged in a “re-learn after 90° rotation” procedure (training II, Figure 6A), three trials a day for two days within the exact same maze and surrounding cues. During the training, both the west and south arms were blocked. The start box was placed in the “east” arm and the food rewards in the “north” arm. Mice were tested in a rotation test on day 19, and their accuracy in locating the food was scored. Mice started from the “east” arm with all arms open during the test (Figure 6A). For “habit” mice that had learned to “turn right” during previous training sessions (day 1 through 16), this new learning was simply a re-training, in which the same habit response (turning “right”) would lead them to the new food location. However, for the “spatial” mice, switching the target location from the “east” arm to the “north” arm conflicted with the previously learned spatial relationship and thus was predicted to inhibit new learning. As shown in Figure 6C, the mutants showed significantly less success (turning “right” or into the “north” arm) (χ2(3, n = 42) = 11.667, P= 0.0006), while no difference was found (χ2(3, n = 42) = 0.73, P= 0.694) among the three control groups. This supported the notion that mutant mice failed to learn the habit strategy, even after extensive training.

Spatial navigational habit learning, but not spatial memory, was impaired in the negatively reinforced plus maze

Since many studies have suggested that dopamine is important for reward pathways, we asked whether the habit learning deficits seen in the DA-NR1-KO mice hinged on the nature of the reinforcement. The aforementioned experiments were replicated in a water-based plus maze, in which the sole escape from the water was for mice to locate and climb onto a hidden platform at the end of one arm. This water-based plus maze behavior was driven by the desire to escape from the negative environment and offered an additional opportunity for comparison with habit learning based on positive reinforcement, such as the seeking of a food reward. All parameters, such as maze dimensions, cues used, starting and target locations, number of trials per day and number of days in training, remained the same as those in the previous food-rewarded experiments (Figure 6A).

The first probe trial revealed no significant differences between any two of the four genotypes (χ2(3, n = 43) = 0.346, P= 0.951). The second probe trial showed that over 80% of the control mice had adopted the “habit” strategy, while the mutant mice remained strongly “spatial” (Figure 6D). No differences were found among the three control groups (χ2 (2, n=29) = 0.499, p= 0.779). As a group, the control mice opted for the “habit” strategy significantly more on day 17 than on day 6 (χ2 (1, n = 29) = 22.587, p = 0.00000201). A significantly lower percentage of DA-NR1-KO mice opted to “turn right” (7.14% vs. 80% in the control mice; χ2 (1, n = 43) = 20.904, p = 0.00000483). The deficits in habit learning were further confirmed in the rotation test given after two days of the “re-learn after 90° rotation” challenge learning (Training II, Figure 6A). A significantly smaller proportion of the mutant mice (28.6%) in contrast to 80% of the controls, were able to successfully locate the new platform position (One tailed probability = 0.000388, Fisher’s exact test). These data thus agreed with the findings from the food-rewarded tasks suggesting that the learning deficits were unlikely contingent on the types of reinforcement employed in the training process.
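For readers unfamiliar with the one-tailed Fisher's exact test used above, the following Python sketch computes it from a 2×2 table with the hypergeometric distribution. The function name is hypothetical, and the counts in the usage lines are assumptions: 4 of 14 mutants would correspond to the reported 28.6%, whereas 23 of 29 controls is only a guess at the reported "80%". The sketch is therefore not expected to reproduce the published probability.

```python
from math import comb


def fisher_one_tailed(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Probability that group A shows `success_a` or fewer successes,
    given fixed row and column totals (hypergeometric lower tail)."""
    total = n_a + n_b
    k = success_a + success_b          # total successes across both groups
    denom = comb(total, k)
    lowest = max(0, k - n_b)           # fewest successes group A could have
    tail = sum(comb(n_a, x) * comb(n_b, k - x) for x in range(lowest, success_a + 1))
    return tail / denom


# Assumed counts for illustration only (see the lead-in above).
p_rotation = fisher_one_tailed(success_a=4, n_a=14, success_b=23, n_b=29)
print(f"one-tailed p (assumed counts) = {p_rotation:.4g}")
```

The test is exact in the sense that it sums the probabilities of every table at least as extreme as the observed one, which makes it suitable for the small group sizes used in these maze experiments.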

Due to the significant involvement of spatial learning in the plus maze task, mice were tested in a spatial version of the plus maze (Figure 7A). They were trained six trials per day for six days to find a hidden platform in the water filled plus maze. With all four arms open, starting points switched between trials in each day rotating among the distal ends of three arms that did not contain the platform, following a semi-random order. The platform location remained fixed throughout. A probe test was given on day 10, three days after the training session ended. During the test, with the platform removed, mice were released to the center of the maze and allowed to search for 60 seconds. Durations spent by each mouse in each arm were recorded (Figure 7B). Mice from all four groups spent significantly more time searching in the target arm [mutants, F(3,32) = 101.292, p <0.001; Cre, fNR1/+, F(3,28) = 134.996, p<0.001; cre, F(3,36) = 147.806, p <0.001; wild type, F(3, 36) = 294.358, p < 0.001; Newman-Keuls post hoc comparison (the target arm compared to all the other arms), P < 0.01 for all genotypes]. No differences were found between the mutant and any control groups, suggesting that spatial learning abilities were unlikely a factor causing the habit learning deficits observed in the DA-NR1-KO mice.

Figure 7
Spatial memory test using plus maze and habit learning test using zigzag maze

Habit learning in a nonspatial zig-zag maze-based habit task was impaired

Rather than compromising habit learning per se, DA-specific NR1 deletion could have skewed the competition between “spatial” and “habit” memory systems in the plus maze task. To investigate this possibility, we designed a nonspatial “zigzag maze” task as a more direct measurement of habit learning. As shown in Figure 8A, the water-filled zigzag maze consisted of eight arms similar in length. Mice were trained to escape onto a hidden platform. Six different starting points were chosen, each paired with its own location of the hidden platform. The platform locations were chosen so that they would be reached after two consecutive right turns from the start point. All mice were trained for 12 trials per day for 10 days. To facilitate development of the turning habits, some arms were blocked (red lines) so that mice were only allowed the correct turn at each intersection. A probe test was given on day 11 in which mice were placed at a random start location. Some arms in the maze remained blocked (red lines) but, unlike in training, mice were allowed to choose between turning “left” or “right” at two intersections (Figure 8A). Mice were scored for whether they completed the two consecutive right turns (counted as “successful”). No differences were found among the three control genotypes (all between 90% and 100%, χ2(2, n = 29) = 1.968, P= 0.374) (Figure 8B), and they were pooled. The conditional knockout mice showed a significantly lower success rate in making the two consecutive right turns (one-tailed probability = 0.000196, Fisher’s exact test), again suggesting that the DA-NR1-KO mice are defective in developing the navigation habit.

Figure 8
Habit learning test using zigzag maze


Here we studied mutant mice with DA neuron-selective NR1 deletion using a set of behavioral tasks as well as in vivo neural recording techniques. Behavioral analysis revealed that the DAT-NR1-KO mice were impaired in several forms of habit learning. In an operant task, both the mutant and control mice learned a goal-directed action in the initial training, but extensive training shifted the learned action from goal directed to habitual only in the control mice. In the mutant mice, this action remained goal directed and thus sensitive to reward devaluation. Similarly, in plus maze tasks, while both mutants and controls learned to navigate based on spatial cues in initial training, extensive training shifted navigation from spatial to habitual only in the controls, whereas the mutants’ navigation remained spatially oriented. Such deficits in habit learning were observed in both positively reinforced and negatively reinforced tasks. This is consistent with our recent recordings showing that DA neurons employ a convergent encoding strategy for processing both positive and negative values (Wang and Tsien, 2011b). One notable feature of those in vivo recording experiments is that some of the DA neurons exhibited stimulus-suppression-then-rebound-excitation in response to negative experiences (Wang and Tsien, 2011b). This offset-rebound excitation may encode information reflecting not only relief at the termination of such fearful events but perhaps also some sort of motivational signal (e.g., motivation to escape).

Therefore, our data strongly suggest that NMDA receptor function in DA neurons is essential for habit learning. A previous study (Zweifel et al., 2009) reported that DA neuron-selective NR1 KO mice were impaired in learning a water maze task and also impaired in learning a conditioned response in an appetitive T-maze task, seemingly in disagreement with our results of normal spatial learning and goal-directed learning. The experimental conditions used in that study were, however, quite different from ours. The water maze deficit was transient and detectable only during the very early part (day 2 in a five-day session) of the training. The T-maze was a goal-directed paradigm that likely also involved mice learning contextual associations between landmarks and rewards. Additionally, the action-reward contingency was different from that in the operant paradigm which we used. It is very likely that factors such as task difficulty, amount of training, cue salience, and the temporal and spatial contingencies between the CS and the rewards can affect the type and amount of involvement by DA neurons. Using in vivo neural recordings, we observed that although the response to cue-reward association is much attenuated in NR1 KO DA neurons in terms of both response peak amplitude and duration, these neurons nonetheless could still form the cue-reward association. Interaction between the blunted responsiveness of DA neurons and test conditions may leave some forms of goal-directed learning impaired by the NR1 deletion while sparing others. A good example is that, in a report published later by the same group, the authors found that the NR1 KO mice were normal in a goal-directed learning paradigm (Parker et al., 2010). In our study, the test conditions and amount of training we used allowed the controls as well as the mutants to acquire goal-directed and spatial learning normally and indistinguishably. After extensive training under these conditions, the mutant mice could not develop habits while the controls clearly did.

Dopamine is an important modulator of habit learning (Wickens et al., 2007; Yin and Knowlton, 2006). Most of the current understanding of its involvement in habit learning has so far centered on the downstream pathways and structures such as the dorsal striatum and, more recently, the prefrontal cortex (Wickens et al., 2007; Yin and Knowlton, 2006). Our finding here highlights the importance of glutamatergic modulation of the DA neuron circuitry itself, in this case mediated by NMDA receptors, and suggests that this upstream pathway should be considered an integral part of the habit learning networks. With perception of environmental stimuli likely carried by glutamatergic signals, it is conceivable that NMDA receptors in dopaminergic neurons participate in controlling and fine-tuning dopaminergic neuron activity patterns during habit formation. An important part of this regulation is perhaps to create the cue-reinforcement association at an appropriate level in terms of response robustness and overall DA neuron network patterns, so that DA neurons would respond accordingly to procedures and cues with higher incentive salience. NMDA receptors are required for mediating synaptic plasticity in glutamatergic synapses onto DA neurons (Bonci and Malenka, 1999). Our results showed that modulation by NMDA receptors facilitates bursting of DA neurons towards the learned reward-predicting cues. It is conceivable that the function of NMDA receptors in regulating phasic firing may be closely linked to their role in regulating synaptic plasticity. In fact, studies have shown that enhanced synaptic strength onto dopamine neurons may act to facilitate their phasic firing (Stuber et al., 2008).

The blunting of the phasic firing of DA neurons in the mutant mice can contribute to, or even result in, the habit learning deficits. Several brain regions involved in habit learning can be affected by this blunting. The most intuitive one is the striatum. Dopamine signaling has been postulated as the mechanism that trains the striatum, which in turn trains the cortex to establish the appropriate sensorimotor associations required for developing habits (Ashby et al., 2010; Wickens et al., 2007). Dopamine modulates the plasticity of the corticostriatal synapses, facilitating induction of LTP in conditions that would otherwise induce LTD. This facilitation requires the dopamine D1 receptor (Calabresi et al., 2000). The low affinity of D1 receptors for dopamine, coupled with the fast dopamine reuptake in the striatum (Cragg et al., 1997), likely makes the dopamine modulation sensitive to the blunting of phasic release. From a network point of view, it has been reported that dopamine can cause changes in the coordinated activity of neuronal ensembles in corticostriatal circuits and by doing so “gate” the inputs in those downstream regions (Costa et al., 2006). Thus when the dopamine level is low, such as when bursting activities are insufficient, it fails to produce and reinforce the network connectivity underlying habit formation. Beyond the striatum, reduced bursting of DA neurons may also affect the activity of structures such as the prefrontal cortex, where lesions of the medial infralimbic area were reported to impair expression of a learned habit (Coutureau and Killcross, 2003). Studies have shown that tonic dopamine concentration in the prefrontal area, likely due to the relatively slower dopamine reuptake (Seamans and Yang, 2004), may be affected by previous phasic dopamine release (Matsuda et al., 2006). The presence of a background dopamine signal converts LTD to potentiation. This “priming” requires time to develop and requires D1 and D2 receptors, both of which have low affinity for dopamine. It is very likely that this phasic release-induced “priming” could also be affected by the amount of DA neuron bursting and thus by blunting of the DA response. It will be of great interest to dissect the various roles of those different brain regions in habit formation in future studies.

It is also important for future research to further analyze the contributions of NMDARs within different dopamine subpopulations, and temporally within different phases of habit learning. The potential sub-regional circuitry within the DA neuron populations in the VTA and SNr regions can be highly crucial for integrating distinct cortical and subcortical inputs (Grace et al., 2007; Lammel et al., 2011; Lisman and Grace, 2005). Thus, it is conceivable that additional sub-regional specific manipulations and analyses could further elucidate how the glutamatergic regulation of DA neurons, as revealed by our current study, modulates habit formation.

In summary, our study has provided several important insights about NMDA receptors in DA neurons and habit learning. First, NMDA receptors in DA neurons are required for learning habits, including appetitive lever pressing and spatial navigational habits. Second, the dependence of habit learning on NMDA receptors in DA neurons was observed in both positively and negatively reinforced training. Third, DA neurons lacking NMDA receptors can still form the cue-reward association but with greatly reduced phasic activity as well as conditioned response robustness. Taken together, our results suggest that the NMDA receptors in DA neurons are an important modulator of DA neurons’ response robustness in cue-reward association and an essential element underpinning habit learning.



Mice carrying alleles of NMDAR1 flanked by loxP sites (fNR1; “floxed” NR1) were bred with Slc6a3 Cre transgenic mice. Offspring were genotyped by PCR for both the Cre transgene and the floxed NMDAR1 (fNR1) locus. Mice used in these experiments had been bred for at least five generations onto the C57BL/6 background. Animals were maintained on a 12 hr light/dark cycle in the Georgia Health Sciences University animal care facility. Except where specified in an experiment, such as when food pellets were used as rewards, food and water were given ad libitum. All procedures relating to animal care and treatment conformed to the Institutional and NIH guidelines. For behavioral tests in the study we used male mice around 1 year of age. These animals had been prescreened to make sure that they had normal vision and hearing capacity.


Mice were perfused transcardially with 4% paraformaldehyde (PFA) in 1× PBS followed by post-fixation in 4% paraformaldehyde overnight. Coronal sections (50 μm thick) were cut on a Vibratome, collected in 0.5% PFA in 1× PBS and stored at 4°C before use. For double immunofluorescent staining of beta-galactosidase and tyrosine hydroxylase (TH), sections were incubated at 4°C overnight with gentle shaking in primary antibody [anti-beta-galactosidase (pAb), 1/5000, Invitrogen; anti-tyrosine hydroxylase (TH) (monoclonal antibody), 1/1000] in a buffer containing 0.05% Tween 20, 10% normal goat serum and 1× PBS, following pre-incubation in 10% normal goat serum and 1× PBS at room temperature for 2 hours. The sections were then incubated with Alexa-conjugated secondary antibodies (1/200, Invitrogen) at room temperature for 2 hours. Beta-galactosidase-IR was visualized by Alexa 568, TH-IR by Alexa 488. A similar procedure was employed to double stain NMDAR1 and tyrosine hydroxylase, except that anti-NR1 (polyclonal antibody, 1:100; Chemicon, Temecula, CA) was used as the primary antibody for NMDAR1. The sections were incubated with Alexa-conjugated secondary antibodies (1/200, Invitrogen) at room temperature for 2 hours. NMDAR1 was visualized by Alexa 488, TH-IR by Alexa 594. Fluorescent images were captured with a confocal laser scanning microscope and an epifluorescence microscope.

Elevated plus maze

This apparatus consists of a center platform (5 cm × 5 cm) 37 cm off the ground with four branching arms (30 cm long and 5 cm wide). Two of the four arms are open and the other two are enclosed by black walls (20 cm high). Testing was performed during the light phase in a dimly lit room (50 lux). Animals were placed on the center platform and scored for arm entries and time spent in each arm. The percentage of time spent in the open arms was calculated as the final readout of anxiety. Unpaired t-tests were used to compare the genotypes.
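
The open-arm readout and the genotype comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the denominator is assumed to be the total session time (open arms, closed arms, and center), and the t statistic assumes the pooled-variance (Student) form, which the text does not specify.

```python
from math import sqrt
from statistics import mean, variance


def percent_open_time(open_s, closed_s, center_s=0.0):
    """Percentage of the session spent in the open arms.

    Assumption: the denominator is total time (open + closed + center).
    """
    total = open_s + closed_s + center_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * open_s / total


def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Each animal contributes one percentage, and the two genotype groups are passed to `unpaired_t` as lists; the p-value would then be read from a t distribution with the returned degrees of freedom.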


Rotarod analysis was performed using the mouse version of the Rotarod manufactured by San Diego Instruments. Mice were trained by allowing them to run on a rotarod rotating at 30 rpm for a total of 5 minutes. (When a mouse dropped, timing was paused until it was placed back onto the rotarod.) During the tests, mice were again placed on the rotarod rotating at 30 rpm, and the duration each mouse stayed on the rotarod (latency to fall) was recorded. Any mouse remaining on the apparatus 300 s after the start was removed, and its time was scored as 300 s. Unpaired t-tests were used to compare latencies between genotypes.

Open field activity test

Locomotor activity was measured by scoring beam breaks in activity chambers (San Diego Instruments, San Diego, CA). Prior to open field tests, animals were handled for two consecutive days. Standard rat cages were used as the novel open field for the mice tested. Locomotor activity was recorded for one hour and scored over both five-minute and one-hour periods. Unpaired t-tests were used to compare fine movements, ambulatory movements, and rearing between genotypes.

Instrumental training

Mice were placed on a food deprivation schedule to reduce their weight to 80–85% of their baseline weight. They were fed mouse chow for 2 hours in their home cages each day after training. Water was available at all times in the home cages.

Training and testing took place in eight Med Associates operant chambers (21.6 cm length × 17.8 cm width × 12.7 cm height) housed in boxes with sound-attenuating walls. Each chamber was equipped with a food magazine, two retractable levers (one on each side of the magazine), and a 3 W, 24 V house light mounted on the same wall above the food magazine. Bio-Serv 20 mg pellets, delivered from a dispenser into the magazine, were used as the reward. Med-PC-IV software from Med Associates was used for equipment control and behavior recording.

Lever-press training

At the beginning of each session, the house light was turned on and the lever inserted. At the end of each session, the light was turned off and the lever retracted. Initial lever-press training consisted of 4 consecutive days of continuous reinforcement (CRF), during which the mice received a pellet for each lever press. A session ended after 60 min or after the mouse had collected 30 rewards, whichever came first. After CRF, mice were trained on random interval (RI) schedules to generate habitual lever pressing (Dickinson et al., 1983). The training started with 2 days on RI 30 s, with a 0.1 probability of reward availability every 3 seconds contingent on lever pressing, followed by 6 days on RI 60 s, with a 0.1 probability of reward availability every 6 seconds contingent on lever pressing. Repeated measures ANOVA was used to compare lever pressing between genotypes.
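
To make the random interval schedules concrete, the sketch below simulates the rule described above: every 3 s (RI 30) or 6 s (RI 60) a reward becomes available with probability 0.1, so the expected interval is 3/0.1 = 30 s or 6/0.1 = 60 s. The function names are hypothetical, and the assumption that an armed reward stays available until the next lever press is ours; the actual Med-PC-IV program may differ.

```python
import random


def simulate_ri(press_times, step_s, p=0.1, seed=0):
    """Return the press times that earned a pellet under a random-interval schedule.

    Every `step_s` seconds a reward is armed with probability `p`.
    Assumption: once armed, it stays available until the next press collects it.
    """
    rng = random.Random(seed)
    rewarded = []
    armed = False
    next_tick = step_s
    for t in sorted(press_times):
        # Process every schedule tick that elapsed before this press.
        while next_tick <= t:
            if not armed and rng.random() < p:
                armed = True
            next_tick += step_s
        if armed:
            rewarded.append(t)
            armed = False
    return rewarded


def mean_interval(step_s, p=0.1):
    """Expected time for a reward to become available: step / probability."""
    return step_s / p
```

For example, `mean_interval(3)` gives 30.0 and `mean_interval(6)` gives 60.0, matching the RI 30 s and RI 60 s labels. Because reward availability depends on elapsed time rather than on the number of presses, pressing faster earns little extra reward, which is the property that makes RI schedules favor habitual responding.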

Devaluation tests

A specific satiety procedure was used for outcome devaluation. Mice were given unlimited access, within a fixed duration, to either the mouse chow to which they had been exposed in their home cages (non-devalued condition/control) or the purified pellets they normally earned during lever-press sessions (devalued condition). The mouse chow served as a control for the overall level of satiety. This procedure controls the overall level of satiety and motivational state while altering the current value of a specific reward. Immediately after 1 hour of unlimited exposure to the pellets or chow, the mice were subjected to a 5 min probe test. During the probe test, the lever was inserted, but no pellet was delivered in response to lever pressing. This brief extinction test was designed to test whether the acquired lever pressing was controlled by the action–outcome instrumental contingency or by habit (e.g., in response to an antecedent stimulus). On the second day of outcome devaluation, the same procedure was used, except that animals that had received mouse chow on day 1 received pellets on day 2, and vice versa. Mice were counterbalanced across genotypes and treatments when grouped. Repeated measures ANOVA and unpaired Student's t-tests were used to compare lever pressing between genotypes, as specified in the text.

Plus Maze

The maze consisted of four arms, each 35 cm long, 6 cm wide, and 35 cm deep, with high transparent walls made of clear Plexiglas. For training positively reinforced with food pellets (20 mg per pellet), animals were maintained at 75–80% of their free-feeding weight throughout the experiment. For training negatively reinforced with water, the water was stained opaque white with titanium dioxide, and a hidden platform was placed 1 inch under the water surface. Training and testing were as described in the text. For plus maze assays, littermates of the Slc6a3+/Cre, fNR1/+ (control), Slc6a3+/Cre (Cre control), and wild type genotypes were chosen as three control groups. Turning choices of mice from different genotypes were compared using Chi square tests, as specified in the text. Additionally, repeated measures ANOVA and unpaired Student's t-tests, as specified in the text, were used to compare time spent in different arms among the genotypes.

Zigzag maze

The shape of the zigzag maze is illustrated in Figure 8A. Each arm measures about 30 cm long, 6 cm wide, and 35 cm deep. The maze was filled with water stained opaque white with titanium dioxide. A hidden platform was placed at a designated location 1 inch under the water surface. Training and tests were done as described in the text. Chi square tests and Fisher’s exact tests were used to compare the performance of mice from different genotypes.
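
For readers unfamiliar with Fisher’s exact test, the following sketch shows how a two-sided p-value can be computed for a 2×2 table of genotype by maze outcome (for example, mice that made one turning choice versus the other). It is a generic textbook implementation using only the standard library, not the authors' analysis code, and the counts used in the usage note are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )
```

With hypothetical counts of 8 versus 2 animals in one group and 1 versus 5 in the other, `fisher_exact_two_sided(8, 2, 1, 5)` returns about 0.035. The exact test is preferable to the Chi square approximation when group sizes are small, as is typical for behavioral cohorts.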


A 32-channel (a bundle of 8 tetrodes), ultra-light (weight <1 g), movable (screw-driven) electrode array was constructed similar to that described previously (Lin L, Chen G, Xie K, Zaia KA, Zhang S, et al. Large-scale neural ensemble recording in the brains of freely behaving mice. J Neurosci Methods. 2006;155:28–38.). Each tetrode consisted of four 13-μm diameter Fe-Ni-Cr wires (Stablohm 675, California Fine Wire; with impedances of typically 2–4 MΩ for each wire) or 17-μm diameter Platinum wires (90% Platinum 10% Iridium, California Fine Wire; with impedances of typically 1–2 MΩ for each wire). One week before surgery, mice (3–6 months old) were removed from the standard cage and housed in customized homecages (40×20×25 cm). On the day of surgery, mice were anesthetized with Ketamine/Xylazine (80/12 mg/kg, i.p.); the electrode array was then implanted toward the VTA in the right hemisphere (3.4 mm posterior to bregma, 0.5 mm lateral and 3.8–4.0 mm ventral to the brain surface) and secured with dental cement.

Tetrode recording and units isolation

Two or three days after surgery, electrodes were screened daily for neural activity. If no dopamine neurons were detected, the electrode array was advanced 40–100 μm daily until we could record from a putative dopamine neuron. Multi-channel extracellular recording was similar to that described previously [49]. In brief, spikes (filtered at 250–8000 Hz; digitized at 40 kHz) were recorded during the whole experimental process using the Plexon multichannel acquisition processor system (Plexon Inc.). Mouse behavior was simultaneously recorded using the Plexon CinePlex tracking system. Recorded spikes were isolated using the Plexon OfflineSorter software: multiple spike sorting parameters (e.g., principal component analysis, energy analysis) were used for the best isolation of the tetrode-recorded spike waveforms. By combining the stability of multi-tetrode recording with the multiple unit-isolation techniques available in OfflineSorter, individual VTA neurons can be studied in great detail, in many cases for days.

Reward conditioning of DA neurons

Mice were slightly food restricted before reward association training. For reward conditioning, mice were placed in the reward chamber (45 cm in diameter, 40 cm in height). Mice were trained to associate a tone (5 kHz, 1 sec) with subsequent sugar pellet delivery for at least three days (40 trials per day, with an interval of 1–2 min between trials). The tone was generated by the A12–33 audio signal generator (5-ms shaped rise and fall; about 80 dB at the center of the chamber) (Coulbourn Instruments). A sugar pellet (14 mg) was delivered by a food dispenser (ENV-203-14P, Med Associates Inc.) and dropped into one of two receptacles (12×7×3 cm) at the termination of the tone (the other receptacle served as a control, where a sugar pellet was never delivered).
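
The timing of one conditioning session can be laid out as in the sketch below: 40 trials, a 1 s tone, pellet delivery at tone termination, and 1–2 min between trials. The function name is hypothetical, and two details are assumptions of this sketch rather than statements from the protocol: the inter-trial interval is drawn uniformly from 60–120 s, and it is measured from pellet delivery to the next tone onset.

```python
import random


def conditioning_schedule(n_trials=40, tone_s=1.0, iti_range_s=(60.0, 120.0), seed=0):
    """Return (tone_onset, pellet_time) pairs in seconds from session start."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    for _ in range(n_trials):
        # Assumption: uniform 1-2 min wait before the next tone.
        t += rng.uniform(*iti_range_s)
        onset = t
        # The pellet drops at the termination of the tone.
        pellet = onset + tone_s
        events.append((onset, pellet))
        t = pellet
    return events
```

Aligning spike times to the returned tone onsets is what allows peri-event histograms of the cue and reward responses to be built for each recorded unit.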

Analysis of in vivo recording data

Sorted neural spikes were processed and analyzed in NeuroExplorer (Nex Technologies) and Matlab. Dopamine neurons were classified based on the following three criteria: 1) low baseline firing rate (0.5–10 Hz); 2) relatively long inter-spike interval (ISI): all classified putative dopamine neurons had ISIs >4 ms within a ≥99.8% confidence level. The shortest ISI we recorded under any condition in our experiments was 4.1 ms (only well-isolated units with amplitude ≥0.4 mV were used to calculate the shortest ISI), and the average shortest ISI was 6.8±2.2 ms (mean ± s.d.; n = 36). In contrast, the ISI for non-dopamine neurons could be as short as 1.1 ms; 3) regular firing pattern when mice were freely behaving (fluctuation <3 Hz). Here, fluctuation represents the standard deviation (s.d.) of the firing rate histogram bar values (bin = 1 sec; recorded for at least 600 sec).
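
The three classification criteria can be expressed as a simple filter over one unit's spike times, as sketched below. This is an illustration in Python rather than the NeuroExplorer/Matlab analysis actually used. The function name is hypothetical, and two interpretations are assumptions: the "≥99.8% confidence level" is read as at least 99.8% of ISIs exceeding 4 ms, and the fluctuation is computed as the population standard deviation of the 1 s firing-rate bins.

```python
from statistics import pstdev


def is_putative_da(spike_times_s, duration_s, bin_s=1.0):
    """Apply the three criteria to one unit's spike times (in seconds)."""
    spikes = sorted(spike_times_s)
    if len(spikes) < 2 or duration_s < 600:
        return False  # the fluctuation criterion needs at least 600 s of data

    # 1) Low baseline firing rate, 0.5-10 Hz.
    rate = len(spikes) / duration_s
    if not 0.5 <= rate <= 10.0:
        return False

    # 2) Long inter-spike intervals (assumed: >= 99.8% of ISIs exceed 4 ms).
    isis = [b - a for a, b in zip(spikes, spikes[1:])]
    frac_long = sum(isi > 0.004 for isi in isis) / len(isis)
    if frac_long < 0.998:
        return False

    # 3) Regular firing: s.d. of the 1 s firing-rate histogram below 3 Hz.
    n_bins = int(duration_s // bin_s)
    counts = [0] * n_bins
    for t in spikes:
        i = int(t // bin_s)
        if 0 <= i < n_bins:
            counts[i] += 1
    rates = [c / bin_s for c in counts]
    return pstdev(rates) < 3.0
```

A unit firing regularly at 5 Hz for 600 s passes all three checks, whereas a 50 Hz unit fails the rate criterion and a unit firing doublets 2 ms apart fails the ISI criterion.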


  • Many DA-regulated functions were spared in DA-selective NR1 KO mice.
  • NR1 modulates, but is not required for, cue-reward association in DA neurons.
  • NR1 in DA neurons is required for habit learning.
  • DA-NR1-KO mice can learn goal-directed actions.

Supplementary Material



We thank Fengying Huang and Brianna Klein for technical assistance. This work was supported by funds from NIMH, NIA, and Georgia Research Alliance (all to JZT).


Author Contributions

Conceived and designed the experiments: LPW, FL, XS, JZT. Performed the experiments: LPW, FL, DW, KX, DHW. Analyzed the data: LPW, FL, DW, KX, JZT. Contributed reagents/materials/analysis tools: LPW, FL, XS, JZT. Wrote the paper: LPW, FL, KX, JZT.



  • Ashby FG, Turner BO, Horvitz JC. Cortical and basal ganglia contributions to habit learning and automaticity. Trends Cogn Sci. 2010;14:208–215.
  • Bonci A, Malenka RC. Properties and plasticity of excitatory synapses on dopaminergic and GABAergic cells in the ventral tegmental area. J Neurosci. 1999;19:3723–3730.
  • Calabresi P, Gubellini P, Centonze D, Picconi B, Bernardi G, Chergui K, Svenningsson P, Fienberg AA, Greengard P. Dopamine and cAMP-regulated phosphoprotein 32 kDa controls both striatal long-term depression and long-term potentiation, opposing forms of synaptic plasticity. J Neurosci. 2000;20:8443–8451.
  • Costa RM, Lin SC, Sotnikova TD, Cyr M, Gainetdinov RR, Caron MG, Nicolelis MA. Rapid alterations in corticostriatal ensemble coordination during acute dopamine-dependent motor dysfunction. Neuron. 2006;52:359–369.
  • Coutureau E, Killcross S. Inactivation of the infralimbic prefrontal cortex reinstates goal-directed responding in overtrained rats. Behav Brain Res. 2003;146:167–174.
  • Cragg S, Rice ME, Greenfield SA. Heterogeneity of electrically evoked dopamine release and reuptake in substantia nigra, ventral tegmental area, and striatum. J Neurophysiol. 1997;77:863–873.
  • Devan BD, White NM. Parallel information processing in the dorsal striatum: relation to hippocampal function. J Neurosci. 1999;19:2789–2798.
  • Dickinson A, Nicholas DJ, Adams CD. The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Q J Exp Psychol B. 1983;35:35–51.
  • Engblom D, Bilbao A, Sanchis-Segura C, Dahan L, Perreau-Lenz S, Balland B, Parkitna JR, Luján R, Halbout B, Mameli M, et al. Glutamate receptors on dopamine neurons control the persistence of cocaine seeking. Neuron. 2008;59:497–508.
  • Fama R, Sullivan EV, Shear PK, Stein M, Yesavage JA, Tinklenberg JR, Pfefferbaum A. Extent, pattern, and correlates of remote memory impairment in Alzheimer’s disease and Parkinson’s disease. Neuropsychology. 2000;14:265–276.
  • Faure A, Haberland U, Conde F, El Massioui N. Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J Neurosci. 2005;25:2771–2780.
  • Faure A, Leblanc-Veyrac P, El Massioui N. Dopamine agonists increase perseverative instrumental responses but do not restore habit formation in a rat model of Parkinsonism. Neuroscience. 2010;168:477–486.
  • Grace AA, Floresco SB, Goto Y, Lodge DJ. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci. 2007;30:220–227.
  • Kauer JA, Malenka RC. Synaptic plasticity and addiction. Nat Rev Neurosci. 2007;8:844–858.
  • Knowlton BJ, Mangels JA, Squire LR. A neostriatal habit learning system in humans. Science. 1996;273:1399–1402.
  • Lammel S, Ion DI, Roeper J, Malenka RC. Projection-specific modulation of dopamine neuron synapses by aversive and rewarding stimuli. Neuron. 2011;70:855–862.
  • Lisman JE, Grace AA. The hippocampal-VTA loop: controlling the entry of information into long-term memory. Neuron. 2005;46:703–713.
  • Matsuda Y, Marzo A, Otani S. The presence of background dopamine signal converts long-term synaptic depression to potentiation in rat prefrontal cortex. J Neurosci. 2006;26:4803–4810.
  • Overton P, Clark D. Iontophoretically administered drugs acting at the N-methyl-D-aspartate receptor modulate burst firing in A9 dopamine neurons in the rat. Synapse. 1992;10:131–140.
  • Overton PG, Richards CD, Berry MS, Clark D. Long-term potentiation at excitatory amino acid synapses on midbrain dopamine neurons. Neuroreport. 1999;10:221–226.
  • Packard MG. Glutamate infused posttraining into the hippocampus or caudate-putamen differentially strengthens place and response learning. Proc Natl Acad Sci U S A. 1999;96:12881–12886.
  • Packard MG, Hirsh R, White NM. Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: evidence for multiple memory systems. J Neurosci. 1989;9:1465–1472.
  • Packard MG, McGaugh JL. Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning. Neurobiol Learn Mem. 1996;65:65–72.
  • Parker JG, Zweifel LS, Clark JJ, Evans SB, Phillips PE, Palmiter RD. Absence of NMDA receptors in dopamine neurons attenuates dopamine release but not conditioned approach during Pavlovian conditioning. Proc Natl Acad Sci U S A. 2010;107:13491–13496.
  • Saal D, Dong Y, Bonci A, Malenka RC. Drugs of abuse and stress trigger a common synaptic adaptation in dopamine neurons. Neuron. 2003;37:577–582.
  • Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol. 1998;80:1–27.
  • Seamans JK, Yang CR. The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Prog Neurobiol. 2004;74:1–58.
  • Stuber GD, Klanker M, de Ridder B, Bowers MS, Joosten RN, Feenstra MG, Bonci A. Reward-predictive cues enhance excitatory synaptic strength onto midbrain dopamine neurons. Science. 2008;321:1690–1692.
  • Tsien JZ, Huerta PT, Tonegawa S. The essential role of hippocampal CA1 NMDA receptor-dependent synaptic plasticity in spatial memory. Cell. 1996;87:1327–1338.
  • Ungless MA, Whistler JL, Malenka RC, Bonci A. Single cocaine exposure in vivo induces long-term potentiation in dopamine neurons. Nature. 2001;411:583–587.
  • Wang DV, Tsien JZ. Conjunctive processing of locomotor signals by the ventral tegmental area neuronal population. PLoS One. 2011a;6:e16528.
  • Wang DV, Tsien JZ. Convergent processing of both positive and negative motivational signals by the VTA dopamine neuronal populations. PLoS One. 2011b;6:e17047.
  • Wang LP, Li F, Shen X, Tsien JZ. Conditional knockout of NMDA receptors in dopamine neurons prevents nicotine-conditioned place preference. PLoS One. 2010;5:e8616.
  • Wickens JR, Horvitz JC, Costa RM, Killcross S. Dopaminergic mechanisms in actions and habits. J Neurosci. 2007;27:8181–8183.
  • Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat Rev Neurosci. 2006;7:464–476.
  • Zhuang X, Masson J, Gingrich JA, Rayport S, Hen R. Targeted gene expression in dopamine and serotonin neurons of the mouse brain. J Neurosci Methods. 2005;143:27–32.
  • Zweifel LS, Argilli E, Bonci A, Palmiter RD. Role of NMDA receptors in dopamine neurons for plasticity and addictive behaviors. Neuron. 2008;59:486–496.
  • Zweifel LS, Parker JG, Lobb CJ, Rainwater A, Wall VZ, Fadok JP, Darvas M, Kim MJ, Mizumori SJ, Paladini CA, et al. Disruption of NMDAR-dependent burst firing by dopamine neurons provides selective assessment of phasic dopamine-dependent behavior. Proc Natl Acad Sci U S A. 2009;106:7281–7288.