Remembering experiences that lead to reward is essential for survival. The hippocampus is required for forming and storing memories of events and places, but the mechanisms that associate specific experiences with rewarding outcomes are not understood. Event memory storage is thought to depend on the reactivation of previous experiences during hippocampal sharp wave-ripples (SWRs). We used a novel sequence switching task that allowed us to examine the interaction between SWRs and reward. We compared SWR activity after animals traversed spatial trajectories and either received or did not receive a reward. Here we show that rat hippocampal CA3 principal cells are significantly more active during SWRs following receipt of reward. This SWR activity was further enhanced during learning and reactivated coherent elements of the paths associated with the reward location. This enhanced reactivation in response to reward could be a mechanism to bind rewarding outcomes to the experiences that precede them.
How do we remember experiences that lead to reward? Although the hippocampus is required for storing memories of the places and events that make up these experiences (Squire, 1982), little is known about the mechanisms that associate specific experiences with their outcomes. Studies in rodents examining hippocampal responses to different outcomes have generally focused on the presence or absence of a reward such as food or an escape platform in a watermaze. These reports analyzed place field activity, where hippocampal excitatory cells (“place cells”) fire in particular locations in space during active exploration. These studies found that the presence of reward or differences in motivational state can alter the firing rate or location of hippocampal place fields (Breese et al., 1989;Kobayashi et al., 1997;Fyhn et al., 2002;Tabuchi et al., 2003;Holscher et al., 2003;Kennedy and Shapiro, 2009). As the presence or absence of visual cues has a similar effect on place cell firing (Hetherington and Shapiro, 1997), these studies suggest that reward can act like other sensory cues to alter the activity of place cells. Place field changes could signal the presence of something “interesting” when the animal is in the vicinity of the reward, but it is not clear how this activity would help the animal learn to navigate from distant locations to the reward.
Place cells are also active during high frequency network oscillations called sharp wave ripples (SWRs), in which sequences of cells activated during movement are reactivated (Skaggs and McNaughton, 1996;Lee and Wilson, 2002;Foster and Wilson, 2006;Ji and Wilson, 2007;Diba and Buzsaki, 2007). SWRs occur largely during sleep and awake immobility (Buzsaki et al., 1983;O’Neill et al., 2006;Cheng and Frank, 2008) and are thought to be important for spatial learning and memory formation (Redish and Touretzky, 1998;Redish, 1999;Samsonovich and Ascoli, 2005;Nakashiba et al., 2008). Hippocampal reactivation allows events that are experienced relatively briefly to be replayed over and over again on a short timescale compatible with synaptic plasticity (Buzsaki, 1986;Wilson and McNaughton, 1994;Sutherland and McNaughton, 2000). In particular, reactivation during pauses in waking behavior frequently results in the sequential activity of place cells active on paths to or from the animal’s current location (Foster and Wilson, 2006;Diba and Buzsaki, 2007;Karlsson and Frank, 2009;Davidson et al., 2009). Thus, because SWR reactivation can occur after traversing a path, it could allow the animal to learn the relationship between the path and its outcome (Johnson and Redish, 2005;Foster and Wilson, 2006;Diba and Buzsaki, 2007).
Given that rewarded events are often well remembered, we would predict that rewarded outcomes would modulate memory storage mechanisms for the associated events. In particular, we might expect that a rewarding outcome would facilitate reactivation of the experience that led to that outcome. While a recent report documented outcome related activity in the primate hippocampus (Wirth et al., 2009), the relationship between reward and reactivation has not been investigated. We therefore asked whether receipt of reward affects reactivation of place cells in the hippocampus.
SWR events generally originate in hippocampal area CA3 (Csicsvari et al., 2000), so we focused our studies on this area. We recorded from principal neurons while animals learned to switch between two spatial sequences in response to changing reward contingencies (Fig. 1a,b). This sequence switching task allowed us to compare trials when the animals performed the same behavioral sequence and either did or did not receive reward (Fig. 1c). Animals first learned a spatial alternation sequence (S1) to criterion and then learned to switch between this sequence and a new sequence (S2). This task has four features that make it appropriate for examining the effect of reward on hippocampal memory processing. First, the rapid learning of the initial sequence requires the hippocampus (Kim and Frank, 2009) and the hippocampus is required for flexibly changing behavior in response to changing reward contingencies (Hsiao and Isaacson, 1971;Hirsh et al., 1978;Ainge et al., 2007). Second, because reward contingencies change during each run session, this task provides an adequate number of unrewarded trials to allow us to compare neural activity during rewarded and unrewarded trials. Third, the presence or absence of reward drives ongoing behavior. This is in contrast to tasks where reward is randomly omitted (e.g., Tabuchi et al., 2003) and animals must learn to behave continuously regardless of each trial’s outcome. Fourth, in this task animals learn to switch between sequences in a familiar environment, allowing us to control for the effects of spatial novelty on SWR activity (Cheng and Frank, 2008;Karlsson and Frank, 2008).
All six arms were open in both sequences and animals received a liquid chocolate reward at the end of an arm if that arm was the next correct arm in the sequence. No experimenter-delivered cues indicated whether a trial was or was not rewarded other than the presence or absence of the reward itself. Here, as in other studies of reward (Tremblay et al., 1998;Fiorillo et al., 2008), receipt of reward consists of the entire reward or lack of reward experience including sensation, consumption and the affective states induced by reward presence or absence. Our goal was to determine if this reward experience changed memory processing in the hippocampus.
We focused our analysis on the sequence switching phase of the task. At the beginning of each session animals were placed in the home arm of the to-be-rewarded sequence. We distinguished between an accurate response, where the animal made a choice consistent with the rules of either S1 or S2, and a rewarded response where an animal made a choice consistent with the rules of the currently rewarded sequence. This allowed us to quantify the probability of an accurate response for both sequences on every trial (Fig. 1b). We found that when animals were first placed in the home arm of S2 they immediately performed the previously rewarded S1. Thus, animals used environmental cues and a track-based reference frame to perform the task, rather than remembering a series of right or left turns based on their body reference frame. After executing S1 for several trials animals changed their behavior and eventually learned to perform S2. For the quantification of behavior used for our analyses we employed a dynamic state-space algorithm (Smith et al., 2004) to estimate the likelihood of an accurate response on each trial (Fig. S1). This algorithm allows us to compute confidence intervals which were essential for defining periods when one sequence was performed significantly more accurately than the other.
We recorded from three animals during the sequence switching phase of the task (n = 270 single neurons, Fig. S2 for histology and LFPs in CA3). We first restricted our analyses to putative excitatory neurons that were active in restricted spatial regions (place fields) on the track. To determine whether hippocampal SWR activity varies with receipt of reward, we examined SWR activity when animals stopped at the well and were either rewarded or not rewarded. To identify SWRs, we filtered the local field potential from 150 – 250 Hz, determined an envelope by Hilbert transform, and detected when the envelope amplitude exceeded 3 standard deviations from baseline for at least 15 ms (Cheng and Frank, 2008). We chose this frequency band to be consistent with previous analysis of CA1 and CA3 SWRs from our laboratory (Cheng and Frank, 2008;Karlsson and Frank, 2009). There is, however, some controversy over the correct criteria for identifying SWRs in CA3. There is a report that the local field potential signature of SWRs is of lower frequency in CA3 than in CA1 (Csicsvari et al., 1999), which might imply that the events we recorded were distinct from CA1 SWRs. There have also been suggestions that some high frequency events in CA3 might be related to local gamma oscillations, although we are not aware of any published data linking 20 – 80 Hz gamma activity to power in the 150 – 250 Hz band during awake immobility. Finally, it is conceivable that some SWRs were actually from the reference electrode located in the corpus callosum above CA1, and appeared to be in CA3 based on our use of that reference for the recordings. We did not record in the CA1 cell layer in this study, nor did we record laminar profiles across CA3, so to address these issues we examined data from a previous dataset where we identified SWR events in both CA3 and CA1 using the same algorithm (Karlsson and Frank, 2009).
We computed the cross-correlations of the times of SWRs detected from simultaneous recordings in CA3 and CA1 and found many cases where there was a sharp peak at zero reflecting events detected in both regions (not shown). Thus, this particular definition of SWRs identifies CA3 events that are often seen in CA1. In addition, as shown below, there is substantial CA3 activity during these events, as would be expected for CA3 events that precede CA1 SWRs (Csicsvari et al., 2000). Thus, we are confident that the SWRs we refer to are population events in CA3 that have the potential to propagate out to CA1 and other brain regions.
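The thresholding step of the SWR detection criterion described above can be illustrated with a minimal sketch. It assumes the ripple-band envelope (150 – 250 Hz band-pass followed by a Hilbert transform) has already been computed, and it takes "baseline" to mean the mean of the envelope over the supplied samples; both the function name `detect_swr_events` and that definition of baseline are illustrative assumptions, not the exact procedure of Cheng and Frank (2008).

```python
from statistics import mean, pstdev


def detect_swr_events(envelope, fs_hz, n_sd=3.0, min_duration_ms=15.0):
    """Return (start_s, end_s) for each run of samples above threshold.

    envelope: ripple-band amplitude envelope, one value per sample
              (assumed to be computed upstream).
    Threshold is mean + n_sd * SD of the envelope (an assumed baseline).
    Runs shorter than min_duration_ms are discarded.
    """
    threshold = mean(envelope) + n_sd * pstdev(envelope)
    min_samples = int(round(min_duration_ms * fs_hz / 1000.0))
    events = []
    start = None  # index where the current supra-threshold run began
    for i, value in enumerate(envelope):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                events.append((start / fs_hz, i / fs_hz))
            start = None
    # Close a run that is still open at the end of the recording.
    if start is not None and len(envelope) - start >= min_samples:
        events.append((start / fs_hz, len(envelope) / fs_hz))
    return events
```

With a 1 kHz envelope, a 20 ms excursion above threshold is kept while a 5 ms excursion is rejected, which is the behavior the 15 ms minimum duration is meant to enforce.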
We found that cells with place fields on the track (n = 107) were much more likely to be active during SWRs at the well (wSWRs) on rewarded trials than unrewarded trials (Fig. 2a, 2b, p < 10^−10, Fig. S3; all statistical tests were rank sum tests and n = 107 for activation probability per pass or per wSWR unless otherwise noted, U = 31307). This enhanced activity was associated with two differences in neural responses between rewarded and unrewarded trials. First, there were more wSWRs per unit time on rewarded trials (Fig. 2c, p < 10^−10, n = 3945 rewarded trials, n = 709 unrewarded for SWR rate unless otherwise noted, U = 1151800). Second, we examined each wSWR individually to control for the greater rate of wSWRs and found that place cells were more likely to be active in any given wSWR on rewarded trials (Fig. 2d, S4a, p < 10^−10, U = 19828). As expected given the greater activation probability per wSWR, the average number of spikes each neuron fired per wSWR, the mean firing rate in wSWRs and the proportion of cells active per wSWR were also larger in rewarded than unrewarded trials (Fig. 2e, 2f, and 2g, p’s < 10^−4, n = 107 cells; prop. active n = 4427 SWRs following reward, n = 238 following no reward, U = 13865, 13979, 397170, respectively).
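As an illustration of the quantities reported here, the following sketch computes an activation probability per wSWR for one cell and the Mann-Whitney U statistic that underlies a rank sum comparison of two groups. The function names and the toy numbers in the usage note are hypothetical; the sketch does not compute p-values and is not the analysis code used for the reported results.

```python
def activation_probability(active_flags):
    """Fraction of wSWRs in which a cell fired at least one spike.

    active_flags: one boolean per wSWR (True if the cell was active).
    """
    return sum(1 for flag in active_flags if flag) / len(active_flags)


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y.

    Counts the pairs (a, b) with a > b, scoring ties as 0.5.
    This is the quadratic-time definition, adequate for small samples.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

For example, `mann_whitney_u([0.8, 0.6, 0.7], [0.1, 0.2, 0.65])` counts how often a value from the first group exceeds a value from the second; a U near its maximum (the product of the two sample sizes) indicates that the first group is consistently larger.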
The increase in wSWR rate and activation probability within individual wSWRs accounted for a four-fold increase in total activation probability on rewarded trials. There was an additional two-fold increase that resulted from longer time spent at the well on rewarded trials (p < 10^−10, n = 3945 rewarded trials, n = 709 unrewarded). We controlled for this time difference by truncating the time on each rewarded trial to match the duration of immobility on a randomly selected unrewarded trial. Cells were still significantly more likely to be active on truncated rewarded trials than unrewarded trials (Fig. 2h, p < 10^−10, U = 40392). Similarly, the differences in wSWR rate and activation probability per wSWR remained significantly higher on truncated rewarded trials (p < 10^−10 and p < 0.01, U = 1164700 and 26094, respectively).
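The truncation control can be sketched as follows. Each rewarded trial is cut to the immobility duration of a randomly drawn unrewarded trial, and only wSWRs that fall inside the shortened window are retained. The data layout (a duration plus a list of wSWR times measured from arrival at the well), the function name, and the choice to never extend a trial beyond its own duration are assumptions made for illustration.

```python
import random


def truncate_rewarded_trials(rewarded_trials, unrewarded_durations, seed=0):
    """Match time at the well on rewarded trials to unrewarded trials.

    rewarded_trials: list of (duration_s, swr_times_s) tuples, with wSWR
                     times measured from arrival at the well (assumed layout).
    unrewarded_durations: immobility durations (s) of unrewarded trials.
    Returns a list of (truncated_duration_s, kept_swr_times_s).
    """
    rng = random.Random(seed)  # seeded so the control is reproducible
    truncated = []
    for duration, swr_times in rewarded_trials:
        # Draw one unrewarded trial at random and use its duration as a cap.
        limit = min(duration, rng.choice(unrewarded_durations))
        kept = [t for t in swr_times if t < limit]
        truncated.append((limit, kept))
    return truncated
```

Rates and activation probabilities recomputed on the truncated trials can then be compared with unrewarded trials without the confound of longer time spent at the well.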
If this enhanced reactivation is important for learning about experiences that lead to reward, we would expect stronger reactivation when the animal learns new path-reward associations. Consistent with this prediction, we found that wSWR rate, activation probability per wSWR and the proportion of cells active per wSWR were higher when animals were first exposed to S2. We examined rewarded trials during periods when animals performed the rewarded sequence significantly more accurately than the unrewarded sequence. On the first day of exposure to S2, wSWR rate was higher on rewarded trials during the first session of S2 than during the rewarded trials in the previous S1 session (Fig. 3a, p < 0.0005, n = 471 S1 trials, n = 140 S2 trials, U = 110138). Similarly, both activation probability and proportion of cells active per wSWR were significantly greater in S2 than S1 on day 1 (Fig. 3b,c; activation prob. p < 0.05, Student’s paired t-test, n = 14 cells; prop. active, p < 10^−4, S1 n = 428 SWRs, S2 n = 147, U = 48223). The increase in activation probability and proportion of cells active per wSWR in S2 was above and beyond the overall increases in rewarded trials compared to unrewarded.
Finally, as we would expect if these differences were related to learning a novel sequence, there were no significant differences on the third day of exposure to S2, when S2 was more familiar (Fig. 3b,c; p’s > 0.1; SWR rate: S1 n = 353 trials, S2 n = 131; activation prob. n = 14 cells; prop. active S1 n = 455 SWRs, S2 n = 288, U = 77166). While these findings demonstrate clear differences between sessions and across days, the wSWR rate and the proportion of cells active per pass were relatively stable within individual sessions (Fig. S4b,c). Therefore this enhanced wSWR activity on S2 does not simply reflect sensitivity to the changes in reward contingencies that occur at the beginning of each session. Furthermore, this increase in wSWR activity in S2 cannot be due to the presence of consummatory behaviors, as the animal consumed reward in both sequences. Taken together, these results demonstrate that reward related SWR activity is further enhanced when animals must learn new path-reward associations.
The increase in wSWR activity on rewarded trials was not simply encoding the presence of reward; instead wSWR activity reflected structured reactivation of neurons active on the paths associated with the rewarded location. Previous reports of reactivation during pauses in behavior have documented increased coordinated activity of pairs of CA1 place cells during SWRs (Kudrimoti et al., 1999;Cheng and Frank, 2008) as well as sequential replay of CA3 and CA1 place cells during SWRs (Foster and Wilson, 2006;Diba and Buzsaki, 2007;Csicsvari et al., 2007;Karlsson and Frank, 2009;Davidson et al., 2009). We therefore asked whether the wSWR activity we saw was specific to particular place cells or pairs of cells active on the track. We computed the probability that a neuron active during a wSWR at the well was also active during the run period leading up to or away from that well. Those probabilities were both significantly higher than the probability that the neuron was active during a randomly selected run period (see methods, Fig. 4a, p < 10^−10, n = 3945 trials).
If wSWR activity resulting from reward specifically reactivates meaningful patterns of place cell activity, we would also expect greater reactivation of cells with place fields on the track. We clustered cells during both run and rest sessions, allowing us to identify neurons that were active in the rest box but did not have place fields on the track. We found that cells with place fields on the track were much more likely to be activated than cells without (Fig. 4b, ranksum test, p < 10^−5, U = 64918 rewarded trials, U = 32377 unrewarded trials). Further, the increase in activation probability per wSWR from unrewarded to rewarded trials was much larger for cells with place fields on the track, when measured as either the average across all cells (p < 10^−5) or as the increase within individual cells (Fig. 4c, p < 10^−10, U = 145963). We confirmed these effects with a 2-way ANOVA and found main effects of reward (F(1,454) = 56.71, p < 10^−5) and of the presence of a place field (F(1,454) = 94.55, p < 10^−5). There was also a highly significant interaction (F(1,455) = 27.9, p < 10^−5), due to a larger increase for cells with place fields. See the Supplementary Methods for further discussion of the measurements.
We then examined the place field locations of the cells active during wSWRs. One could imagine that SWRs preferentially reactivate cells that were most recently active on the run to the well. If so, we would expect cells that were active closest in space or time to the reward well would be more likely to fire during wSWRs. We found no such bias. During the run periods, both the general population and cells that were active during wSWRs tended to fire at the turns of the track. To visualize the spatial distribution of run period activity, we plotted population firing rate maps constructed from all the spikes for every cell that was active during wSWRs at a particular well in a single session (Fig. 4d, 7, 4 and 13 cells respectively). Each spike from each cell contributed equally. The populations of cells that were active during wSWRs were active at multiple locations on the track and they tended to fire more during the turn leading to the reward well, consistent with previous reports that place fields congregate around relevant cues (Hetherington and Shapiro, 1997). We also noted that some cells had multiple place fields, likely contributing to the activity on more distant paths.
To quantify the spatial distribution of run period activity preceding wSWRs, we calculated the location of the peak of the occupancy normalized firing rate during the run period for all cells and only for cells that were active during wSWRs. The distributions for the entire population and for the cells that fired during wSWRs on rewarded trials were very similar (Fig. 4e and S5a,b; linear regression R^2’s > 0.6, p’s < 10^−4, n = 21 spatial bins). A complementary analysis examining the time between spiking and wSWR activity also failed to show a temporal bias for cells with place fields closer to the reward locations (Fig. S5c–f).
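The peak of an occupancy normalized firing rate can be illustrated with a short sketch: spike counts in each spatial bin are divided by the time the animal spent in that bin, and the bin with the highest resulting rate is taken as the peak. The function name, the handling of unvisited bins, and the absence of smoothing are assumptions for illustration only.

```python
def peak_rate_bin(spike_counts, occupancy_s, min_occupancy_s=0.0):
    """Return (peak_bin_index, rates_hz) for one cell.

    spike_counts: spikes fired in each spatial bin during run periods.
    occupancy_s:  time (s) the animal spent in each bin.
    Bins with occupancy <= min_occupancy_s get a rate of 0 (assumed rule)
    so that rarely visited bins cannot produce spurious peaks.
    """
    rates = [
        count / time if time > min_occupancy_s else 0.0
        for count, time in zip(spike_counts, occupancy_s)
    ]
    peak = max(range(len(rates)), key=lambda i: rates[i])
    return peak, rates
```

Normalizing by occupancy matters because a cell can fire the most spikes in a bin simply because the animal lingered there; in `peak_rate_bin([2, 10, 6], [1.0, 10.0, 2.0])` the bin with the most spikes is not the bin with the highest rate.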
If reward enhances the reactivation of experiences associated with reward, we would predict that cells that fired together during the run would also fire together during wSWRs on rewarded trials. We computed the coactivity across cell pairs and found that pairs of place cells were more than twice as likely to fire together during wSWRs on rewarded than unrewarded trials (Fig. 5a, p < 10^−10, n = 498 pairs, U = 207903). This coactivation probability per wSWR was higher for cells with greater overlap between their place fields (R^2 = 0.1192, p < 10^−10 for rewarded trials and R^2 = 0.0213, p < 0.005 for unrewarded trials, n = 498 pairs).
The coactivity of cell pairs on rewarded trials was greater than expected given the firing of the individual cells (Fig. 5b). We also found a significant correlation between place field overlap and the extent to which cells were more coactive per wSWR than expected by chance (Cheng and Frank, 2008)(Fig. 5b, Fig. S6a,b, R^2 = 0.0846, p < 10^−5, n = 412 pairs on rewarded trials and R^2 = 0.0335, p > 0.4, n = 20 pairs on unrewarded trials; note that the measure is only defined if each cell is active at least once in a wSWR). Increased coactivation was present only in SWRs: there was no significant difference in coactivation during the run up to the food well for rewarded and unrewarded trials (see methods, Fig. 5c, p > 0.34, 95% confidence intervals for unrewarded: 0, 0.0191 and rewarded: 0, 0.0186, n = 498 pairs, U = 58377). We found similar results when we repeated these analyses including only cell pairs recorded from different tetrodes (Fig. S6c–e) and when we measured the joint-surprise (Grun et al., 2002;Pazienti and Grun, 2006) of cell pairs’ spiking (Fig. S6f–h). Joint surprise also measures the extent to which two cells are more coactive than expected by chance based on the individual cells’ firing, and has been used to help control for spike sorting errors. Taken together, these results confirm that SWR activity reactivated elements of an experience and that receipt of reward enhances this reactivation.
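The idea of coactivity beyond chance can be sketched for a single cell pair. Under independence, the expected probability that both cells fire in the same wSWR is the product of their individual activation probabilities, and the excess is the observed joint probability minus that product. This simple difference is an illustrative assumption; the normalized measure of Cheng and Frank (2008) and the joint-surprise statistic are defined differently.

```python
def coactivation_stats(active_a, active_b):
    """Observed and chance-level coactivation for one cell pair.

    active_a, active_b: one flag per wSWR (truthy if the cell fired),
                        aligned so index i refers to the same wSWR.
    Returns (observed, expected, excess), where expected assumes the
    two cells are activated independently.
    """
    if len(active_a) != len(active_b):
        raise ValueError("both cells need one flag per wSWR")
    n = len(active_a)
    p_a = sum(1 for a in active_a if a) / n
    p_b = sum(1 for b in active_b if b) / n
    observed = sum(1 for a, b in zip(active_a, active_b) if a and b) / n
    expected = p_a * p_b
    return observed, expected, observed - expected
```

A positive excess indicates that the pair fires together more often than their individual activation probabilities predict, which is the property that was related to place field overlap.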
Finally, we found that reactivation in CA3 was consistent with the ordered replay of place cell sequences observed in previous studies (Foster and Wilson, 2006;Diba and Buzsaki, 2007;Csicsvari et al., 2007;Karlsson and Frank, 2009). Ordered replay implies that for two cells, the further apart their place fields, the longer the time between their spikes during SWRs. We therefore took each pair of cells and measured the extent to which the distance between their place field peaks predicted the inter-cell interspike intervals during wSWRs (Karlsson and Frank, 2009). Considering only rewarded trials, we found a highly significant relationship, consistent with ordered replay (Fig. 5d, R^2 = 0.0913, p < 10^−10, n = 426 ISIs).
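The relationship tested here can be illustrated with an ordinary least squares fit of inter-cell spike interval against place field distance, reporting the slope and R^2. The sketch assumes one (distance, interval) observation per data point and uses a plain linear fit; the function name and variable units are hypothetical, and the exact regression used by Karlsson and Frank (2009) may differ.

```python
def linear_fit_r_squared(x, y):
    """Least squares slope and R^2 for y regressed on x.

    x: distance between the two cells' place field peaks (e.g. cm).
    y: interval between the two cells' spikes within a wSWR (e.g. ms).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, r_squared
```

A positive slope with a nonzero R^2 is what ordered replay predicts: cells with more distant place fields fire further apart in time within the event.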
Previous reports of “reverse replay” argued that when cells were reactivated in an order opposite to that on the path to a reward, this pattern of activity could help the animal learn the sequence of locations leading to the reward (Foster and Wilson, 2006;Diba and Buzsaki, 2007). We therefore asked whether the reactivation we saw was consistent with reverse replay. We found that in 337 of 495 cases (68.1%) where two cells were active in a wSWR, the order of activity was opposite that seen during the run (proportion > 0.5, p < 10^−10). At the same time, many cells were active at a given location in both directions of motion on the track, so these events are consistent with both replay of the path to the reward and “preplay” of paths from the reward. Nonetheless, these findings demonstrate that CA3 SWR activity following receipt of a reward reactivates coherent elements of the animal’s path associated with the rewarded location.
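The comparison of the reverse-order proportion against 0.5 can be illustrated with an exact one-sided binomial (sign) test. The text does not state which test was applied to this proportion, so the choice of test here is an assumption, as is the function name; the counts 337 and 495 are taken from the text.

```python
from math import comb


def sign_test_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5).

    Uses integer arithmetic for the binomial coefficients, so the
    result stays accurate even when n is several hundred.
    """
    favorable = sum(comb(n, i) for i in range(k, n + 1))
    return favorable / 2 ** n


reverse_cases, total_cases = 337, 495
proportion = reverse_cases / total_cases
p_value = sign_test_upper_tail(reverse_cases, total_cases)
```

Under the null hypothesis that forward and reverse orderings are equally likely, an excess this large in 495 cases would be very improbable, in line with the reported significance.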
Finally, we carried out a series of controls to determine whether increased wSWR activity on rewarded trials could be attributed to activity during the run to the well, the relative timing of rewarded and unrewarded trials, behavioral variability, the behavioral sequence the animals executed, reward expectation, or wSWR properties. In no case were we able to identify a difference in these factors that could explain the differences between neural activity during wSWRs on rewarded and unrewarded trials. Here we discuss two of these controls; the rest can be found in the Supplementary Results and Figs. S7 and S8.
We first asked whether the differences in rewarded trials could be explained by the specific spatial sequence the animal performed. We compared times when animals performed S1 when it either was or was not rewarded. For this analysis we took advantage of the fact that when reward contingencies changed to reward S2, animals continued to perform S1. We therefore compared periods when animals performed S1 accurately when reward was either delivered (e.g. Fig. 1c far left box) or omitted (e.g. Fig. 1c middle left box). Here the behavioral sequence was identical and only the receipt of reward varied. Even when S1 was performed accurately, activation probability per wSWR was only elevated when animals received reward (Fig. 6a, Fig. S7i–l, p < 10^−5, n = 106 cells recorded when there were wSWRs following reward, n = 46 unrewarded, U = 1944). We found similar results if we truncated the time spent at the well on these rewarded trials to match the unrewarded trials (p < 10^−4, U = 2821.5). This shows that the enhanced wSWR activity was due to the receipt of reward, not the behavioral demands of the task.
These results also suggest that expectation of a reward did not lead to increased activation per wSWR on rewarded trials. Based on the history of rewards in S1, the animal would expect a reward for performing S1. If SWR activity was due to reward expectation, SWR activity would be higher when the animal performed S1 when it was no longer rewarded following a session where it was rewarded. However, SWR activity is higher on rewarded trials and lower on unrewarded trials regardless of the prior history of reward.
We similarly found that in cases where a reward was delivered at a previously unrewarded location, the presence of reward rather than the history of reward was the best predictor of wSWR activity. We examined trials on the first exposure to S2 when the animal received a reward in arm E, a previously unrewarded arm (animal 1: 15 trials; animal 2: 4 trials; animal 3: 19 trials). Reward on these trials on arm E was initially unexpected given the animals’ previous experience with S1. The number of unexpected reward trials included was similar to that used in other studies of reward expectation (Fiorillo et al., 2008;Schultz et al., 1992;Tremblay et al., 1998). Neurons were significantly more active on these unexpectedly rewarded trials than unrewarded trials (Fig. 6b, p < 0.04, n = 14 cells, U = 168).
We have shown that receipt of reward enhances SWR reactivation and this reactivation is further enhanced when new reward contingencies must be learned. The activation probability per SWR, the number of SWRs per time, the number of spikes per SWR, and the mean spike rate during SWRs were significantly higher at the end of rewarded trials than unrewarded trials. This increase in SWR activity only occurred after animals reached the reward well. Furthermore, on rewarded trials occurring when the animal was learning a new sequence (S2), the rate of SWRs and the probability that cells were active in each wSWR was even higher than during rewarded trials associated with the familiar S1. The animal was receiving and consuming reward in both cases, demonstrating that the increase in neural activity could not be explained solely as a result of the presence of consummatory behaviors. We also showed that spiking during wSWRs reactivates coherent elements of the experiences that are associated with the paths to and from the rewarded location. We performed an extensive series of control analyses examining differences in activity on the path to the reward, behavioral differences at the reward location and reward expectation. None of these potential confounds could explain the enhanced activation of CA3 neurons during wSWRs following receipt of reward, indicating that receipt of reward is a key determinant of wSWR rate and firing during wSWRs.
Our results are distinct from previous demonstrations of an interaction between reward and hippocampal activity. Previous studies found that place fields can change in response to reward (Breese et al., 1989;Kobayashi et al., 1997;Tabuchi et al., 2003;Holscher et al., 2003). Our findings, in contrast, indicate that reward plays a special role in modulating the reactivation of cells associated with recent experiences. Similarly, the increase in reactivation we saw is distinct from observations of outcome selectivity in primate hippocampus (Wirth et al., 2009). That paper reported that hippocampal neuronal activity between trials was related to whether the animal made a correct or incorrect choice on the previous trial, irrespective of the specific stimuli the animal experienced during that trial. Our findings suggest that in the rodent hippocampus, activity following a reward specifically relates to the sequence of locations the animal traversed on the way to the reward. Finally, the enhanced reactivation we report is also distinct from reward “signals” in which single cells encode aspects of reward like reward expectation or reward prediction error as observed in several other brain regions (Schultz, 2000). Instead, enhanced SWR reactivation following reward is better understood as reactivating patterns of activity that reflect experiences associated with the reward location.
Previous findings have suggested a simple model whereby reactivation merely reflects recent activity within the hippocampal network. These studies have shown that reactivation during sleep reflects the structure of previous awake experience (Wilson and McNaughton, 1994), and the amount that two cells fire together during SWRs in sleep depends on the amount that the two cells fire together during prior experience (O’Neill et al., 2008). Similarly, repeated and regular traversals of the same path lead to increases in SWR rate and reactivation in familiar environments (Jackson et al., 2006).
Our results demonstrate that this simple model is not sufficient: we found an increase in SWR reactivation when animals were rewarded even though there was no increase in activity during the preceding run period. In addition, there was no apparent bias in which cells along the trajectory to the well were reactivated, indicating that on short timescales, the timing of place cell firing relative to the SWR did not determine the strength of reactivation. Finally, pairs of cells were more coactive during SWRs at the well on rewarded trials, but there was no difference in coactivity during the run period of rewarded and unrewarded trials. Thus, reactivation is not simply a reflection of recent activity.
Instead, our results argue for a more complex model where an event’s outcome modulates the strength of reactivation. We found that cells both with and without place fields were more likely to be activated during wSWRs on rewarded than unrewarded trials. These observations indicate that reward increases the likelihood of reactivation for all cells. At the same time, cells with place fields were much more likely to be reactivated than cells without place fields, and cells active on paths associated with the reward location were most likely to be reactivated. Therefore, the specific spatial sequence the animal traversed strongly influences which cells will be active during SWRs, while the presence or absence of reward modulates the amount and strength of reactivation.
We found that the rate of SWRs and the likelihood that cells would be active within SWRs increased when animals had to learn new path-reward associations, suggesting that SWR reactivation contributes to learning. More specifically, SWR reactivation is well suited to help the animal learn the paths that lead to reward. Unlike place field activity on the path up to the reward, activity in SWRs can activate specific patterns of place cells after the outcome of traversing the path is known. We found evidence for sequential activation in pairs of neurons, and other reports have established that these reactivation events frequently involve replay of entire paths along the track (Foster and Wilson, 2006;Diba and Buzsaki, 2007;Karlsson and Frank, 2009;Davidson et al., 2009).
We propose two possible mechanisms by which this SWR reactivation could facilitate learning the paths that lead to reward. First, the enhanced reactivation of rewarded paths could strengthen representations in neocortical areas of rewarded paths over unrewarded paths. Later, when animals are selecting between several possible paths, the rewarded paths could outcompete unrewarded paths in the decision process. Alternatively, the enhanced SWR reactivation could facilitate an association between a path and the outcome of that path, creating a set of path-outcome associations. Because SWR reactivation occurs after the path is complete and can be coherent with activity in neocortical and striatal areas (Chrobak and Buzsaki, 1996;Ji and Wilson, 2007;Wierzynski et al., 2009;Peyrache et al., 2009;Lansink et al., 2009), reactivation could help link a specific path with reward information encoded in other brain regions. As place fields were commonly found around turns when animals have to make arm choices, reactivation of place cells could lead to associations between reward outcome and the activity related to choices the animal made to reach the reward.
In this latter case, in which SWR activity facilitates the formation of path-outcome associations, we might wonder why reactivation occurs more on rewarded than unrewarded trials when a lack of reward is also informative. In our task, as in natural foraging, there are generally many more paths that do not lead to reward than paths that do lead to reward. Thus, it may be advantageous to preferentially encode the relatively small number of path-reward associations as compared to the large number of path-no reward associations. Indeed, when reward contingencies changed and animals encountered no reward where they previously were rewarded, the lack of reward was not so significant that animals immediately changed their behavior. Instead, they persisted in performing the unrewarded paths for many trials, indicating that the presence of reward on that path in the past may continue to influence behavior despite the current lack of reward.
Overall, our results demonstrate a new link between reward and the reactivation of recent experience. This reward-related reactivation may be a mechanism to learn and remember experiences that lead to reward.
Male Long-Evans rats were handled and food deprived to 85–90% of baseline weight. Animals were initially trained to run back and forth on a linear track for liquid chocolate reward delivered in food wells at the ends of the track. Linear track pretraining took place in a different room from the recording room. One of the animals was pretrained on S1 (Fig. 1a) in the recording room, while two animals were not exposed to the behavioral task until recording began. Following pretraining, animals were implanted with a microdrive array containing 16 independently movable tetrodes targeting CA3 (−3.6 mm AP; 3.4 mm L) using previously described methods (Frank et al., 2004). Over the next 7–10 days tetrodes were lowered first to CA1 and then to CA3. Details on data collection can be found in Karlsson and Frank (2008). CA3 was identified by depth and the characteristic EEG waveforms on each recording tetrode. Electrode positions were additionally confirmed by histology. For one animal, electrode lesions were made at the end of each tetrode and later confirmed to be in the CA3 pyramidal cell layer (Fig. S2a). For two animals, the microdrive fell off before lesions could be made. In these animals we were able to confirm that the implant site was over dorsal CA3 and that the depths were consistent with CA3 recordings (Fig. S2b,c). Furthermore, the EEG signatures characteristic of CA3 were similar in all animals (Fig. S2d), although it is possible that a small number of cells were recorded from the dentate gyrus. For all animals a reference tetrode was positioned in the corpus callosum. All neural signals were recorded relative to that reference to eliminate muscle artifacts from the recordings.
During recordings animals were rewarded with liquid chocolate for performing the behavioral paradigm shown in Figure 1. The track included 4 sequence arms (B, C, D and E) and one extra arm on each end (A and F). Arms were separated by vertical walls (0.6 cm thick, 24 cm tall and 81 cm long). Distal cues were visible above these walls, at either end of each arm, and along the straight section connecting different arms. Circles indicate food wells where animals received reward in arms B through E. Colored arrows indicate trajectories included in S1 (blue) and S2 (red).
The task consists of two rules. First, a visit to the home arm (arm C in S1 and arm D in S2) was rewarded when the animal came from any other arm (inbound trajectories). Second, a visit to an arm adjacent to the home arm was rewarded when the animal came from the home arm after having previously visited the opposite adjacent arm (outbound trajectories). Consecutive visits to the same food well were never rewarded. Together, these rules defined a correct cyclical sequence of food-well visits (Fig. 1a): right, center, left, center, right, center, left, center, etc. (Frank et al., 2000;Kim and Frank, 2009). The inbound trajectories require the animal to return to a single rewarded location, the home arm, from any other arm. The outbound trajectories require the animal to remember which arm it had just come from. For the first outbound trajectory at the start of each session, the animal was rewarded for visiting either home-adjacent arm. If the animal visited an arm not included in the rewarded sequence (e.g. arm A, E or F for S1), the animal was rewarded for returning to the home arm. On the following outbound trajectory the animal was rewarded for visiting either home-adjacent arm. We have shown that during the initial learning of the task, animals learn the inbound component first and then learn to alternate on outbound trajectories (Kim and Frank, 2009). Therefore, once animals learn to perform the outbound trajectories with high accuracy they are generally performing the entire sequence accurately. Note that we use the terms trajectory and path interchangeably throughout the manuscript.
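The reward rules above can be restated as a short scoring routine. The following is only an illustrative sketch, not the software used in the experiment: the function name `score_rewards` and its arguments are hypothetical, and it assumes that any visit to a home-adjacent arm (rewarded or not) updates the "previously visited" arm used for the alternation rule, a detail the text does not specify.

```python
def score_rewards(visits, home, adjacent):
    """Return a list of booleans: was each food-well visit rewarded?

    visits   : sequence of arm labels visited, in order
    home     : home arm of the rewarded sequence (e.g. "C" for S1)
    adjacent : the two arms adjacent to the home arm (e.g. ("B", "D"))
    Assumes the animal starts the session in the home arm.
    """
    prev = home          # the animal is placed in the home arm
    last_outer = None    # most recent home-adjacent arm (None = either is allowed)
    rewarded = []
    for arm in visits:
        if arm == prev:
            # consecutive visits to the same well are never rewarded
            r = False
        elif arm == home:
            # inbound rule: home is rewarded when coming from any other arm
            r = True
        elif arm in adjacent:
            # outbound rule: must come from home and alternate sides
            r = (prev == home) and (last_outer is None or last_outer != arm)
            last_outer = arm  # assumption: every adjacent-arm visit updates memory
        else:
            # arm outside the sequence: unrewarded, and the next outbound
            # trajectory may go to either home-adjacent arm
            r = False
            last_outer = None
        rewarded.append(r)
        prev = arm
    return rewarded


s1 = score_rewards(["D", "C", "B", "C", "D", "C", "D", "C", "A", "C", "D"],
                   home="C", adjacent=("B", "D"))
```

In this example the seventh visit (a repeat of arm D without alternating) and the visit to arm A are the only unrewarded ones.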
During each run session the animal was placed in the home arm of the to-be-rewarded sequence (arm C for Sequence 1 and arm D for Sequence 2). Each run session was between 20 and 30 minutes long; one animal performed two sessions and two animals performed three sessions per day. Thirty to forty minute rest sessions in a high walled box in the same room preceded and followed each run session. Once the animal performed S1 with 80% accuracy, measured across a run session, or had 6 full days of training and was above 75% accurate, the sequence switching phase of the task commenced.
On the first day of sequence switching animals first performed one session where S1 was rewarded. Then in the second session, reward contingencies changed such that S2 was rewarded. All subsequent sessions alternated between rewarding S1 and S2 within each day (see Fig. S1 for details). Recording continued throughout the rest and run sessions.
Reward was delivered via an air pressure / solenoid system and was triggered via key press on a keyboard. The experimenter’s back was to the animal such that the experimenter was between the animal and the keyboard during the experiment. The experimenter triggered reward release before the animal reached the reward well, and on correct outbound trials from the center arm the experimenter generally triggered both the reward in the outer arm and the subsequent center arm (inbound) reward. Thus the audible solenoid click occurred before the animal stopped at the well and there was no consistent temporal relationship between solenoid clicks and reward delivery. In rare cases reward was triggered just as the animal reached the well. Excluding these trials had no effect on the rewarded / unrewarded differences.
We distinguished between “accurate” responses that were consistent with the rules of S1 or S2, and rewarded responses. This allowed us to score behavior according to the rules of both sequences simultaneously. For Fig. 1 we used a 20 trial moving average applied to all trials to illustrate the behavior, but as this moving average does not provide confidence bounds, we also used a dynamic state-space smoothing algorithm (Smith et al., 2004;Smith et al., 2007) to estimate the animals’ probability of an accurate response for each sequence on each trial and to compute confidence intervals for the estimated probability. For the algorithm we focused on outbound passes because an outbound trajectory could be correct for S1 or S2 but not both. We scored outbound trajectories from the home arm of a sequence as accurate or inaccurate separately for S1 and S2. The estimated probability distribution produced by the algorithm was taken from the end of one run session and used as a starting value for the estimation of the next run session. This corresponds to the assumption that the animal began each session with some information from the previous session but also allows the learning state to “jump” if the animal behaves very differently at the beginning of the next session.
On the basis of these estimates we defined three stages of behavioral performance. First, animals performed the unrewarded sequence significantly more accurately than the rewarded sequence, as occurs when the reward contingencies are first changed. In this period the mode of the distribution of the estimated probability of an accurate response for the newly unrewarded sequence is greater than the 95% confidence bounds of the estimated probability of an accurate response for the rewarded sequence. Then, animals traversed a variety of spatial trajectories between food wells, and neither sequence was performed significantly more accurately than the other. In this period neither mode is greater than the confidence bounds of the other sequence. Finally, animals performed the rewarded sequence significantly more accurately than the unrewarded sequence. In this period the mode for the rewarded sequence is greater than the confidence bounds of the unrewarded sequence. We used these stages to compare accurate performance of S1 when S1 was rewarded to accurate performance of S1 when S2 was rewarded. We also calculated behavioral entropy as described previously (Jackson et al., 2006) so that we could identify any differences in behavior between rewarded and unrewarded trials (see Supplementary Methods and Results).
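The three-stage criterion can be written as a simple comparison of each sequence's mode against the other sequence's confidence interval. This is a minimal sketch under stated assumptions: the function name and stage labels are hypothetical, the state-space estimates are taken as already computed, and "greater than the confidence bounds" is read as exceeding the upper 95% bound.

```python
def performance_stage(mode_rew, ci_rew, mode_unrew, ci_unrew):
    """Classify one trial into a behavioral stage.

    mode_rew, mode_unrew : modes of the estimated probability of an accurate
                           response for the rewarded / unrewarded sequence
    ci_rew, ci_unrew     : (lower, upper) 95% confidence bounds of the same
    """
    if mode_unrew > ci_rew[1]:
        # the previously rewarded sequence is still performed more accurately
        return "unrewarded_better"
    if mode_rew > ci_unrew[1]:
        # the currently rewarded sequence is performed more accurately
        return "rewarded_better"
    # neither mode exceeds the other sequence's upper bound
    return "no_difference"


early = performance_stage(0.20, (0.10, 0.35), 0.80, (0.65, 0.90))
middle = performance_stage(0.50, (0.35, 0.65), 0.55, (0.40, 0.70))
late = performance_stage(0.85, (0.70, 0.95), 0.15, (0.05, 0.30))
```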
Only well isolated cells with tightly clustered spikes and clear refractory periods were included. Cells were clustered throughout run and rest periods, allowing us to identify cells that were active but did not have place fields on the track. As our results involved comparisons of spiking from the same clusters within a day, poor clustering cannot account for the effects we observed. We did not attempt to match cells across days, so in some cases the same cell may have been recorded across multiple days. All analyses were restricted to putative principal neurons (n = 100, 42, and 128 for animals 1, 2 and 3, respectively; Fox and Ranck, 1981;Frank et al., 2001). To identify cells with place fields we calculated the ‘linearized’ activity of each cell. Only times when animals were running forward at least 2 cm/sec were included. The behavioral data were separated into different spatial trajectories (e.g. A to B, B to A, B to C, …) and the animal’s linear position was measured as the distance in cm along the track from the reward site on the start arm. We calculated occupancy normalized firing rate maps using 2 cm spatial bins smoothed with a 4 cm standard deviation Gaussian curve with a total extent of 20 cm, excluding bins with less than 0.2 sec/bin occupancy. Cells with a mean rate greater than 0.1 Hz and a peak spatial rate greater than 3 Hz were considered to have a place field on the track. Place field overlap was calculated according to a previously established method (Battaglia et al., 2004).
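To make the rate-map and place-field criteria concrete, the sketch below computes an occupancy normalized map from per-bin spike counts and occupancy times. It is an illustration, not the original analysis code: all function names are hypothetical, and it assumes that spike counts and occupancy are smoothed separately before division and that the 20 cm kernel extent corresponds to 11 taps (±10 cm) at 2 cm per bin. Neither detail is stated in the text.

```python
import math


def gaussian_kernel(sd_cm=4.0, extent_cm=20.0, bin_cm=2.0):
    """Normalized Gaussian weights spanning +/- extent_cm / 2."""
    half = int(round(extent_cm / (2 * bin_cm)))
    w = [math.exp(-0.5 * ((k * bin_cm) / sd_cm) ** 2)
         for k in range(-half, half + 1)]
    total = sum(w)
    return [x / total for x in w]


def smooth(values, kernel):
    """Convolve with the kernel, treating out-of-range bins as zero."""
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out


def rate_map(spike_counts, occupancy_s, min_occ_s=0.2):
    """Occupancy normalized rate (Hz) per bin; None where occupancy is too low."""
    kernel = gaussian_kernel()
    s = smooth(spike_counts, kernel)
    o = smooth(occupancy_s, kernel)
    return [si / oi if occ >= min_occ_s and oi > 0 else None
            for si, oi, occ in zip(s, o, occupancy_s)]


def has_place_field(spike_counts, occupancy_s,
                    mean_thresh_hz=0.1, peak_thresh_hz=3.0):
    """Apply the mean-rate and peak-rate criteria from the text."""
    mean_rate = sum(spike_counts) / sum(occupancy_s)
    rates = [r for r in rate_map(spike_counts, occupancy_s) if r is not None]
    return bool(rates) and mean_rate > mean_thresh_hz and max(rates) > peak_thresh_hz


# Toy data: 30 bins, 1 s occupancy each except an under-sampled first bin.
occupancy = [1.0] * 30
occupancy[0] = 0.1
spikes = [0] * 30
spikes[14], spikes[15], spikes[16] = 20, 40, 20
rates = rate_map(spikes, occupancy)
valid = [(r, i) for i, r in enumerate(rates) if r is not None]
peak_bin = max(valid)[1]
```

Smoothing the numerator and denominator with the same kernel keeps the estimate well behaved at the ends of the track, where part of the kernel falls outside the sampled bins.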
SWRs were identified as described previously (Cheng and Frank, 2008). Briefly, LFPs were recorded from one channel of each tetrode. On each day, the tetrode with the largest number of isolated neurons was used for SWR detection. The LFP signal was band pass filtered between 150–250 Hz and an envelope was determined by Hilbert transform. SWR events were detected if the envelope exceeded a threshold of mean + 3 stdev for at least 15 ms. Events included times around the triggering event during which the envelope exceeded the mean. SWR amplitude was measured in standard deviations from baseline.
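The thresholding step of this procedure can be sketched as follows. The example starts from an already computed ripple-band envelope (the band-pass filter and Hilbert transform are not shown), and the function name, the sampling rate, and the use of the population standard deviation are assumptions made for illustration rather than details from the cited method.

```python
import statistics


def detect_swrs(envelope, fs_hz, n_sd=3.0, min_dur_s=0.015):
    """Return (start, end) sample indices of candidate SWR events; end is exclusive.

    An event is triggered when the envelope stays above mean + n_sd * SD for
    at least min_dur_s, and is then extended in both directions to include
    the surrounding samples where the envelope exceeds the mean.
    """
    mean = statistics.mean(envelope)
    sd = statistics.pstdev(envelope)
    thresh = mean + n_sd * sd
    min_samples = int(round(min_dur_s * fs_hz))
    events = []
    i, n = 0, len(envelope)
    while i < n:
        if envelope[i] > thresh:
            j = i
            while j < n and envelope[j] > thresh:
                j += 1
            if j - i >= min_samples:
                start = i
                while start > 0 and envelope[start - 1] > mean:
                    start -= 1
                end = j
                while end < n and envelope[end] > mean:
                    end += 1
                if events and start <= events[-1][1]:
                    # merge with the previous event if the extensions overlap
                    events[-1] = (events[-1][0], max(end, events[-1][1]))
                else:
                    events.append((start, end))
            i = j
        else:
            i += 1
    return events


# Toy envelope sampled at 1 kHz: one 20 ms event with shoulders, one 5 ms blip.
env = [1.0] * 2000
for k in range(495, 500):
    env[k] = 3.0
for k in range(500, 520):
    env[k] = 20.0
for k in range(520, 525):
    env[k] = 3.0
for k in range(1000, 1005):
    env[k] = 20.0
events = detect_swrs(env, fs_hz=1000)
```

Only the 20 ms event is kept; its boundaries extend over the shoulders where the envelope remains above the mean, while the 5 ms blip fails the duration criterion.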
We defined when animals were stopped at a food well as times when the animal was within 10 cm of the well with a linear speed (i.e. speed along the long axis of the track) equal to zero. These times include only periods when the animal had arrived at the well and could consume reward if it was present. SWRs that occur during these times are referred to as wSWRs. Linear speed was calculated as the change in linear distance between consecutive position samples divided by the time between samples (33 ms) and was not smoothed. We obtained essentially identical results when we used a two dimensional speed of zero, also not smoothed, to define periods of immobility.
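As a minimal sketch of this definition, the function below flags position samples at which the animal counts as stopped at a well. The function name is hypothetical, and two choices are assumptions: the first sample, which has no preceding sample from which to compute speed, is marked as not stopped, and "within 10 cm" is taken to include exactly 10 cm.

```python
def stopped_at_well(lin_pos_cm, well_pos_cm, dt_s=0.033, radius_cm=10.0):
    """Flag samples where unsmoothed linear speed is zero near the well."""
    if not lin_pos_cm:
        return []
    flags = [False]  # no speed estimate for the first sample
    for prev, cur in zip(lin_pos_cm, lin_pos_cm[1:]):
        speed = abs(cur - prev) / dt_s           # cm/s, not smoothed
        near = abs(cur - well_pos_cm) <= radius_cm
        flags.append(speed == 0 and near)
    return flags


flags = stopped_at_well([20, 12, 5, 5, 5, 8, 8], well_pos_cm=0)
```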
We calculated a number of measures related to activity during SWRs. The activation probability per SWR was the number of SWRs in which a cell was active divided by the total number of SWRs. The mean rate during SWRs was the total number of spikes during SWRs divided by the total duration of SWRs. The proportion of cells active per SWR was the proportion of cells with place fields on the track that were active during the SWR. The proportion of cells active per SWR was calculated for each SWR; activation probability, mean rate, and number of spikes per SWR were per cell measures and SWR rate was measured per trial.
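These measures follow directly from spike times and SWR intervals. The sketch below is illustrative only: the function names are hypothetical, spike times and SWR boundaries are assumed to be in seconds, and a spike is counted as inside an SWR when it falls in the half-open interval from start to end.

```python
def spikes_in(spike_times, start, end):
    """Number of spikes with start <= t < end."""
    return sum(1 for t in spike_times if start <= t < end)


def activation_probability(spike_times, swrs):
    """Fraction of SWRs in which the cell fired at least one spike."""
    active = sum(1 for s, e in swrs if spikes_in(spike_times, s, e) > 0)
    return active / len(swrs)


def mean_swr_rate(spike_times, swrs):
    """Total spikes during SWRs divided by total SWR duration (Hz)."""
    n_spikes = sum(spikes_in(spike_times, s, e) for s, e in swrs)
    duration = sum(e - s for s, e in swrs)
    return n_spikes / duration


def proportion_active(cells, swr):
    """Fraction of the given cells that fired during one SWR."""
    s, e = swr
    return sum(1 for st in cells if spikes_in(st, s, e) > 0) / len(cells)


swrs = [(1.0, 1.1), (2.0, 2.1), (3.0, 3.2), (4.0, 4.1)]
cell_a = [1.05, 1.06, 3.1, 5.0]
cell_b = [2.05]
p_act = activation_probability(cell_a, swrs)
rate = mean_swr_rate(cell_a, swrs)
prop = proportion_active([cell_a, cell_b], swrs[0])
```

Here cell A fires in two of four SWRs (activation probability 0.5), emits three spikes over 0.5 s of total SWR time (about 6 Hz), and is one of two cells active in the first SWR.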
To control for differences in the timing of rewarded and unrewarded trials within the session we selected single pairs of adjacent rewarded and unrewarded trials. We randomly selected the order of the pairs: rewarded followed by unrewarded or unrewarded followed by rewarded. We also controlled for differences in time spent at the well by truncating the time stopped at the well on rewarded trials to match the time spent on unrewarded trials.
We examined activity during the entire run period when animals were running at greater than 2 cm/sec and were more than 20 cm from the start or end well. For each cell we calculated peak firing rate and mean firing rate as described previously (Karlsson and Frank, 2008) for all rewarded or unrewarded trials. We then examined the coactivation probability during the run to the well. Binning the run periods into 100 msec bins, for each pair of cells we calculated the probability that the two cells were active together in a bin. We also computed two measures that quantified the extent to which the coordinated activity in SWRs was greater than that expected by chance: the coactivity z-score (Cheng and Frank, 2008) and joint surprise (Grun et al., 2002;Pazienti and Grun, 2006). Definitions of these measures are presented in the Supplementary Methods.
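The run-period coactivation probability can be illustrated with a short sketch. It reflects one plausible reading of the description, not the definition in the Supplementary Methods: the function name is hypothetical, only whole 100 ms bins within each run period are counted, and the probability is the number of bins in which both cells fired divided by the total number of bins.

```python
def coactivation_probability(spikes_a, spikes_b, run_periods, bin_s=0.1):
    """Probability that two cells both fire within the same time bin.

    spikes_a, spikes_b : spike times (s) for the two cells
    run_periods        : list of (start, end) run intervals (s)
    """
    n_bins = 0
    n_coactive = 0
    for start, end in run_periods:
        # number of whole bins in this run period (small epsilon guards rounding)
        k = int((end - start) / bin_s + 1e-9)
        bins_a = {int((t - start) // bin_s) for t in spikes_a if start <= t < end}
        bins_b = {int((t - start) // bin_s) for t in spikes_b if start <= t < end}
        valid = set(range(k))
        n_coactive += len(bins_a & bins_b & valid)
        n_bins += k
    return n_coactive / n_bins if n_bins else 0.0


p_co = coactivation_probability([0.05, 0.25, 0.55],
                                [0.07, 0.31, 0.58, 0.95],
                                run_periods=[(0.0, 1.0)])
```

In this toy example the two cells share bins 0 and 5 out of 10 bins, giving a coactivation probability of 0.2.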
We determined whether SWR activity during periods when the animal was stopped at the well was consistent with previous reports of reactivation using two complementary analyses. For all of these analyses we identified run periods during each trial as times when the animal was moving forward at greater than 2 cm/s and more than 10 cm from the start or end well. We excluded cells with place fields within 10 cm of the wells at the beginning or end of the pass to focus on the cells with place fields active during running.
First, we calculated the probabilities that a cell active on the run towards the well or on the subsequent run away from the well was also active during wSWRs. We compared those probabilities to the probability that the cell was active on a randomly chosen run between wells. Second, we used a pair-wise measure to determine whether the spiking during wSWRs was consistent with the ordered replay seen in downstream CA1. This pair-wise measure was necessary because CA3 tends to be sparsely active (Leutgeb et al., 2004) and, as our goal was to record for many days, we did not maximize the number of simultaneously recorded cells on a single day. The presence of coherent replay would predict that the time between spikes from different cells in a SWR should be related to the distance between the cells’ place fields. For every pair of place cells active in a wSWR, we measured the absolute value of the time from each reference spike of one cell to all spikes from the other cell for each trial. We restricted the time axis to values between 0 and 500 ms and plotted each time between wSWR spikes from the pair of cells against the linear distance between the place field centers on the preceding run. We used the preceding run because some cells had multiple fields on the track, and we wished to focus on activity associated with the most recent run between food wells. We calculated the R² value of a linear fit to the points in the plot, which measured the degree to which the distance between the cells’ place fields predicts the time between their wSWR spikes.
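The pair-wise measure can be sketched in two steps: collect absolute spike-time differences for a cell pair within a wSWR, then compute the coefficient of determination of a straight-line fit of time difference against place field distance. The function names are hypothetical and the values are toy data; for an ordinary least-squares line with an intercept, this coefficient equals the squared Pearson correlation, which is what the code computes.

```python
def pair_lags(spikes_a, spikes_b, max_lag_s=0.5):
    """Absolute time from each spike of cell A to every spike of cell B,
    restricted to lags between 0 and max_lag_s."""
    return [abs(tb - ta) for ta in spikes_a for tb in spikes_b
            if abs(tb - ta) <= max_lag_s]


def r_squared(x, y):
    """R^2 of a least-squares linear fit of y on x (with intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate case: no variance to explain
    return (sxy * sxy) / (sxx * syy)


# Toy example: spike times (s) of two cells within one wSWR.
lags = pair_lags([1.00, 1.02], [1.05, 1.80])

# Toy points: place field distance (cm) versus wSWR spike-time difference (s).
r2_linear = r_squared([10, 20, 30, 40], [0.01, 0.02, 0.03, 0.04])
r2_none = r_squared([1, 2, 3, 4], [1, -1, -1, 1])
```

Each lag would be paired with the distance between the two cells' place field centers on the preceding run, and the points pooled across pairs before fitting.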
We characterized the run period activity of cells that fired during SWRs using three complementary analyses. We identified run periods as above (speed > 2 cm/sec, > 10 cm from wells). We first visualized run period activity in the two dimensional track: we collected all run period spikes from the cells that fired during wSWRs at a given well, identified the firing location of each spike in two dimensions, and created occupancy normalized rate maps as though these spikes had come from a single neuron.
Second, we computed the distribution of peak firing locations for all cells and cells that fired during wSWRs. For each trajectory for each cell, we computed the location of the peak occupancy normalized firing rate and then created a distribution of peak locations from that set of cells. We computed these curves separately for rewarded and unrewarded trials.
Finally, we examined the location and timing of spikes during the run period leading up to wSWRs. For each cell that fired during wSWRs on a pass, we determined the time between the cell’s spikes during the run periods and the wSWRs in which the cell fired. This time was also decomposed into two components: the time between the spikes during the run and the animal’s arrival at the reward well, and the time between the animal’s arrival at the well and the occurrence of the wSWRs.
We would like to thank Maya Chandru for assistance with data collection, Mattias Karlsson and Anne Smith for help with data analyses, and Allison Doupe, Howard Fields, Marianne Hafting-Fyhn, Patricia Janak, Ana Nathe, Michael Stryker and the members of the Frank Laboratory for their scientific and editorial suggestions. This work was supported by NIH grant MH080283, the John Merck Fund and the McKnight Foundation.