Neuron. Author manuscript; available in PMC 2010 April 16.
PMCID: PMC2822782
NIHMSID: NIHMS102552

Capacity-enhancing synaptic learning rules in a medial temporal lobe online learning model

Abstract

Medial temporal lobe structures are responsible for recording the continuous stream of autobiographical memories that define our unique personal history. Remarkably, these areas can construct durable memories from brief exposures to the constantly changing activity patterns arriving from antecedent cortical areas. Using a computer model of the hippocampal Schaffer collateral pathway that incorporates evidence for dendritic spikes in CA1 pyramidal neurons, we searched for biologically plausible long-term potentiation (LTP) and homeostatic depression (HD) rules that maximize "online" learning capacity. We found that memory utilization is most efficient when (1) very few synapses are modified to store each pattern, (2) LTP (the learning operation) is dendrite-specific and gated by distinct pre- and postsynaptic thresholds, (3) HD (the forgetting operation) co-occurs with LTP and targets least-recently potentiated synapses, and (4) both LTP and HD are all-or-none, leading de facto to binary-valued synaptic weights. In networks containing 40 million synapses, the novel learning scheme led to order-of-magnitude capacity increases compared to conventional plasticity rules.

Introduction

A remarkable feat of biological design lies in the brain's ability to function as a sort of neural “camcorder”, laying down memory traces of ongoing experience at the speed of life. The brain areas most closely associated with online learning lie in the medial temporal lobe (MTL), including the perirhinal cortex, parahippocampal gyrus, and hippocampus (Squire et al., 2007). A distinguishing feature of online learning is that incoming patterns must in principle be encoded in a single training presentation. This denies neurons the opportunity to make multiple up-and-down adjustments to their synaptic strengths during many passes over the same information, forcing them instead to make changes that are abrupt and yet long-lasting.

Several models have been proposed for online learning in neural circuits (Nadal et al., 1986; Henson and Willshaw, 1995; Sohal and Hasselmo, 2000; Bogacz et al., 2001; Norman and O'Reilly, 2003; Fusi et al., 2005; Greve et al., 2008). However, none so far has included the assumption, supported by both physiological and modelling studies (Mel, 1993; Schiller et al., 2000; Ariav et al., 2003; Poirazi et al., 2003; Polsky et al., 2004; Losonczy and Magee, 2006; Losonczy et al., 2008; Spruston, 2008), that the thin basal and apical oblique dendrites of pyramidal neurons do not simply funnel their synaptic inputs passively to the cell body, but instead provide a layer of nonlinear integrative "subunits" that can substantially increase a cell's ability to process and store information (Koch et al., 1983; Mel et al., 1998; Archie and Mel, 2000; Poirazi and Mel, 2001; Poirazi et al., 2003). One "disadvantage" of a layered nonlinear model for online learning is that it is more difficult to analyze mathematically, given the presence of internal subunits with modifiable thresholds, and statistical dependencies between subunits arising from their common inputs. Nonetheless, it remains possible through controlled simulation studies to determine which features of the constituent neurons, dendrites, synapses, and plasticity rules have the greatest impact on the system's ability to rapidly store, and faithfully preserve, learned information. Taking this approach, we have focused on the problem of recognition memory, that is, the ability to distinguish previously learned patterns from novel patterns with low false positive and negative error rates.

Results

Measuring capacity in the basic network

We studied an online learning network whose architecture is loosely modelled after the Schaffer collateral projection from hippocampal CA3 neurons onto the apical oblique dendrites of CA1 pyramidal cells (Figure 1A), though our model could be applied to any similarly structured monosynaptic neural pathway. Only a single post-synaptic neuron is shown (Figure 1A), with each of its dendrites (vertical green “branches”) receiving synapses from a small fraction of the incoming axons drawn at random. Dendrites are assumed to function as separately thresholded subunits (Schiller et al., 2000; Ariav et al., 2003; Antic et al., 2003; Milojkovic et al., 2004; Polsky et al., 2004; Losonczy and Magee, 2006) giving rise to a functional 2-layer network (Mel, 1992; Mel et al., 1998; Archie & Mel, 2000; Poirazi et al., 2003). The interconnection matrix between axons and dendrites is assumed to be random and fixed, and only the first-layer synaptic weights wij are modifiable. Weights are randomly initialized to weak (w=0) or strong (w=1) values with equal probability. Input patterns are sparse binary-valued vectors X = {x1, … xn} whose components indicate whether the corresponding axon is firing (xi = 1) or not (xi = 0). In all simulations reported here, patterns were drawn at random with prob[xi=1]=1/64. Each dendritic subunit j first sums its inputs linearly, giving a subthreshold activation level

Figure 1
Model architecture and learning rule
a_j = \sum_{i \in D_j} w_{ij} x_i

where Dj is the set of inputs to the jth dendrite. Each training pattern is presented once. When a training pattern causes a subunit to cross the learning threshold (aj > θL), which we assume corresponds to the initiation of a local dendritic spike (Dudman et al., 2007; Remy and Spruston, 2007), all active synapses on that branch are either strengthened (w→1) or remain strong (Figure 1B). To prevent memory washout, an equal number of synapses is depressed (w→0), chosen at random from the strong synapses within each branch undergoing plasticity. A constant 1-to-1 ratio of strong to weak synapses is thus homeostatically maintained. When subunit activity crosses an even higher "firing" threshold aj > θF > θL, the local spike (sj = 1) "propagates" to the soma and causes its parent cell to fire (y = 1). The overall memory response is the combined output of all cells, r = \sum_i y_i, and a pattern is said to be recognized if r > θR, where θR is the recognition threshold.
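For concreteness, the storage cycle just described can be summarized in a short Python sketch. Sizes and thresholds here are illustrative only (the paper's default network is far larger, so these thresholds would need retuning); the `wiring` array, the restriction of depression to inactive synapses, and the per-subunit readout are simplifying assumptions of the sketch, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes only; the paper's default (4x) network is far larger.
    N_AXONS, N_DENDRITES, SYN_PER_DEND = 1024, 200, 100
    P_ACTIVE = 1.0 / 64                  # prob[x_i = 1]
    THETA_L, THETA_F = 6, 9              # learning and firing thresholds

    # Fixed random wiring: the axon feeding each synapse of each dendrite.
    wiring = rng.integers(0, N_AXONS, size=(N_DENDRITES, SYN_PER_DEND))
    # Binary weights initialized weak (0) or strong (1) with equal probability.
    weights = rng.integers(0, 2, size=(N_DENDRITES, SYN_PER_DEND)).astype(float)

    def subunit_activations(x):
        # a_j = sum over i in D_j of w_ij * x_i
        return (weights * x[wiring]).sum(axis=1)

    def train_pattern(x):
        # One-shot storage: branches crossing theta_L undergo all-or-none LTP,
        # offset by an equal number of depressions on the same branch.
        for j in np.flatnonzero(subunit_activations(x) > THETA_L):
            active = x[wiring[j]] == 1
            n_new = int((active & (weights[j] == 0)).sum())
            weights[j, active] = 1.0     # strengthen or refresh active synapses
            pool = np.flatnonzero(~active & (weights[j] == 1))
            if n_new and pool.size:      # random homeostatic depression
                drop = rng.choice(pool, size=min(n_new, pool.size), replace=False)
                weights[j, drop] = 0.0

    def response(x):
        # For brevity r counts firing subunits directly, rather than grouping
        # subunits into neurons and summing the cell outputs y_i as in the model.
        return int((subunit_activations(x) > THETA_F).sum())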

Intuitively, the memory works as follows: within each subunit participating in the storage of a pattern, the group of co-activated synapses (red circles in Fig. 1B) represents a randomly sampled “higher-order feature” contained in the pattern, written into the memory by the LTP operation. Each learned pattern is represented by a collection of such features. It is crucial that θF > θL so that novel patterns, on their first exposure to the memory, cause very few subunits and cells to fire and thus mostly fall below θR (green area in Figure 2A). When a previously-trained pattern is re-encountered within its storage lifetime, however, its stored higher-order features are “read out” by firing their respective subunits and cells, thereby producing a suprathreshold response (red area in Figure 2A). For simplicity, we assume synapses are not modified during the recognition phase. A high learning threshold ensures that only those rare subunits that are already close to representing one of the pattern's higher-order features are recruited to participate in its stored trace. This reduces the number of synaptic changes needed to store a pattern and thus conserves memory resources. If θL becomes too high, however, memory responses become too weak to reliably distinguish trained from untrained responses in the presence of noise.

Figure 2
HD-induced degradation of responses to trained patterns (red shaded area) leads to gradual merging with untrained pattern distribution (green)

A trained pattern's stored trace weakens over time as homeostatic depression events erode its internal representation (Lynch et al., 1977; Nadal et al., 1986; Morris and Willshaw, 1989; Henson and Willshaw, 1995; Fusi et al., 2005). A pattern reaches the end of its lifetime when it fails to fire enough of its trained subunits/cells to reach θR. This leads to a working definition of the capacity C: the number of sequentially trained patterns looking back in time for which the average true recognition rate remains high (we chose 99%), with few false positive responses (we chose 1%).
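Under this working definition, capacity can be computed from re-test responses alone. A minimal sketch follows, where `responses_newest_first` is a hypothetical list of responses to previously trained patterns, most recent first, and θR is assumed already fixed by the 1% false-positive criterion.

    def capacity(responses_newest_first, theta_r, hit_rate=0.99):
        # C: the largest look-back window over which at least hit_rate of the
        # trained patterns still respond above the recognition threshold.
        c, good = 0, 0
        for n, r in enumerate(responses_newest_first, start=1):
            good += r > theta_r
            if good / n >= hit_rate:
                c = n
        return c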

Performance of a network consisting of 25,600 axons forming 2.56 million synapses (100 each) onto the 10,000 dendritic subunits of 400 neurons (25 each) is shown in Figure 2A. Values of θL = 6 and θF =9 were found to optimize capacity as defined above. To more fairly compare results using different learning rules, a network-level “gain control” operation limited plasticity to L subunits drawn at random from those crossing θL for each presented pattern, where L was optimized for each learning rule. This allowed more precise control of synaptic resource consumption per pattern than was possible through adjustments in the learning threshold alone. With L=120 in the present case, pattern storage on average involved the potentiation of just 2 synapses and depression of 2 others per trained dendrite, for a total of 480 modified synapses – less than 0.02% of all synapses in the network.
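The gain-control operation amounts to subsampling the eligible subunits; a sketch under the same assumptions as above, where `eligible` is an array of indices of the subunits crossing θL for the current pattern:

    import numpy as np

    def gain_control(eligible, L, rng):
        # Limit plasticity to at most L subunits drawn at random from those
        # crossing theta_L for the current pattern.
        if eligible.size <= L:
            return eligible
        return rng.choice(eligible, size=L, replace=False)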

Declining trace strength is plotted as a function of increasing pattern age, measured as the number of patterns elapsed since training (Figure 2A). The lower boundary of the red shaded area indicates the weakest 1% of trained patterns of a given age, while the upper boundary of the green shaded area indicates the strongest 1% of untrained patterns; this latter value determined θR (black arrows). Capacity in this case was 1,100 patterns (red arrow), corresponding to a 1% cumulative false-negative (FN) rate on the rising dashed red curve. The gradually falling/broadening cumulative distributions of trained pattern responses are shown in Figure 2B for the most recent 100, 600, and 1,100 training patterns. The shaded corners denote the 1% false positive (green) and false negative (red) errors at capacity.

Improved performance using multi-valued synaptic weights

In certain neural network models, the ability to assign and stably maintain finely-graded synaptic weight levels increases capacity by enabling more subtle changes in the shape and/or orientation of the learned decision surface (Rumelhart et al., 1986; Fusi et al., 2005), though see also Schwenker et al. (1996). Multi-level synapses also increased capacity here, but for an unexpected reason. LTP remained all-or-none in these experiments; that is, any active synapse on a trained subunit was fully potentiated. This ensured relative uniformity of initial trace strength from pattern to pattern, a reasonable objective for an online memory. Multiple weight levels instead came into play during homeostatic depression: rather than depress S synapses fully within a subunit to offset S potentiations, S*W synapses were each depressed by a fractional amount 1/W, where W was the number of available weight levels. This led to steady-state weight histograms as shown in Figure 3A. As the weight resolution W increased, the variability of aj for untrained patterns dropped steadily, from a standard deviation of 1.5 for binary weights (W=2) to 1.2 for W=32 (Figure 3B). This trend gradually lightened the upper tail of the aj distribution, which, when mapped through the subunit's thresholding nonlinearity, reduced both the mean and the variance of subunit responses to untrained patterns (Figure 3B). This meant θR could be substantially lowered while maintaining the desired 1% false positive criterion. In turn, fewer subunits had to be trained (note the lower y-intercepts in Figure 3C), reducing per-pattern resource consumption and extending memory lifetimes. With 32-level weights, recognition errors rose much more slowly and capacity more than tripled compared to binary weights (Figure 3C, red vs. blue arrows; see also the orange dotted curve in 3D). Multi-level weights would not always be beneficial, however, as shown below.
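The fractional-depression variant can be sketched as follows. Random selection among nonzero weights is an illustrative choice here, and the paper's exact weight quantization is not modeled.

    import numpy as np

    rng = np.random.default_rng(1)

    def fractional_depression(w, s, n_levels):
        # Offset s full potentiations by removing s total units of weight:
        # s * n_levels decrements of size 1/n_levels spread over the branch.
        step = 1.0 / n_levels
        for _ in range(s * n_levels):
            pool = np.flatnonzero(w > 0)
            if pool.size == 0:
                break
            k = rng.choice(pool)
            w[k] = max(0.0, w[k] - step)

    w = np.ones(256)                     # an all-strong branch, for illustration
    fractional_depression(w, s=2, n_levels=32)
    print(int(np.sum(w < 1)))            # up to 64 synapses slightly weakened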

Figure 3
Multi-level weights and age-ordered depression

Age-ordered depression protects recently stored information

A major problem for an online memory with a high firing threshold θF is that the loss of a small fraction of the synapses involved in storing a pattern, caused by random depression events, can drive the pattern's evoked response below the recognition threshold even though most of its synaptic resources remain occupied. For example, using binary weights (Figure 2), we found that at steady state only 22% of all strong synapses were involved in the representation of any of the last 1,100 patterns; the remainder represented older patterns or vestiges thereof. An alternative to random depression is to target those synapses that were least-recently-potentiated or refreshed. This strategy, originally proposed in the context of a single-layer network (Henson and Willshaw, 1995), preferentially reclaims synaptic resources dedicated to the oldest patterns while protecting recently stored information – the de facto goal of an online memory. In simulations using “age-ordered depression” (AOD), memory traces remained stable far longer, consistent with the observations of Henson & Willshaw (1995) (compare orange dotted and blue solid curves in Figure 3E). The capacity boost was substantial: for binary-valued weights, AOD led to a more than 5-fold boost in capacity up to 5,800 patterns (Figure 3DE).
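The selection rule itself is simple. A sketch follows, where `last_use` is a hypothetical per-synapse record (our bookkeeping, not the paper's implementation) of the training-pattern index at which each synapse was last potentiated or refreshed:

    import numpy as np

    def age_ordered_depression(weights, last_use, n_to_depress):
        # Depress the n_to_depress least-recently potentiated/refreshed
        # strong synapses on a branch.
        strong = np.flatnonzero(weights == 1)
        oldest = strong[np.argsort(last_use[strong])][:n_to_depress]
        weights[oldest] = 0.0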

Age-ordered depression leads to a preference for binary weights

Surprisingly, however, we found that multi-valued weights were detrimental to capacity with AOD (Figure 3D), since the spreading of small weight decrements over a larger population of synapses meant that attrition of stored traces began earlier than necessary (compare blue and red curves in Figure 3E). For example, with W=32 and a corresponding weight decrement Δw = -1/32, the need to depress 2 units of weight within a dendrite undergoing plasticity (to balance 2 units of potentiation) requires that the 64 least-recently potentiated synapses each be slightly weakened. In a branch with 256 synapses, this means that half of the strong weights, and an even larger fraction of stored higher-order features, are degraded during every learning event in that branch. In contrast, with 2-level weights and full depression (Δw = -1), only 2 strong synapses are adversely affected during a learning event, leaving the majority of stored higher-order features intact. This concentrated depression strategy allows a pattern's stored trace to ride just above the recognition threshold for the longest possible time, after which all of its synaptic resources are released quasi-synchronously.

Beyond their failure to improve performance, we found that multi-valued weights were fundamentally incompatible with AOD: when old weights are systematically targeted, nearly the same population of synapses is depressed from one learning event to the next as their weights are decremented to 0 in W-1 steps. To the extent that the “depression transient” is short compared to the time a weight spends in the fully strengthened or fully weakened state, synaptic weights subject to AOD revert to essentially binary-valued quantities regardless of the value of W (Figure 3F). Thus, while the capacity-boosting effects of AOD have been previously documented (Henson and Willshaw, 1995), our results point to a surprising additional benefit of age-ordered depression in a 2-layer network: storage capacity is maximized when synapses alternate between just two long-term stable states (weak and strong). Interestingly, the hypothesis that Schaffer collateral synapses have only two long-term stable weight levels has received some experimental support (Petersen et al., 1998; Ganeshina et al., 2004; O'Connor et al., 2005; Nicholson et al., 2006).

Capacity degrades gradually with increasing “age noise”

To test whether the capacity-boosting effects of AOD depend on precise age information, we first examined the distribution of synapse lifespans (in the strong state) for the runs shown in Figure 3E with binary weights (W=2). Unlike the exponential lifespan distribution that results from random depression (Figure 3G, orange dotted curve), the distribution for AOD was peaked well away from the origin, with an initial period of virtually complete protection (Figure 3G, blue curve). The peak lifespan corresponds roughly to the storage capacity, which was 5,800 patterns in this case. We conducted additional simulation experiments in which Gaussian "age noise" was added to the protection tags assigned to synapses whenever they were potentiated or refreshed, leading to a detrimental spread in the distribution of ages at which synapses were reclaimed by AOD. A simpler depression rule was used here, not requiring that synapses be explicitly sorted by age. Instead, synapses were released into a depressible pool upon expiration of their protection tags, and subjected to random depression thereafter. Capacity was affected, but not severely, even when age noise was a substantial fraction of the optimal lifespan (see the blue histogram in Figure 3H).
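The simplified tag-based rule might be sketched as follows; the tag representation and parameter names are assumptions of the sketch.

    import numpy as np

    rng = np.random.default_rng(2)

    def tag_on_potentiation(now, protection_period, age_noise_sd):
        # On LTP/refresh a synapse receives an expiry time; Gaussian "age
        # noise" jitters when it rejoins the depressible pool.
        return now + protection_period + rng.normal(0.0, age_noise_sd)

    def depressible_pool(expiry, now):
        # Synapses with expired tags are subject to random depression.
        return np.flatnonzero(expiry <= now)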

Increasing storage efficiency through a dual learning threshold

Next, we found that relying solely on the Hebbian criterion of strong post-synaptic depolarization (aj > θL) to decide whether an active synapse gets potentiated leads to a substantial waste of synaptic resources. If θL is crossed by an unusually small cohort of presynaptic axons, because they activate an unusually high proportion of already-strong synapses, it can occur that even when all of the participating synapses are potentiated, the cohort remains too small to drive the subunit past its optimized firing threshold θF (Figure 4A). When this occurs, synaptic resources are expended, but cannot later be read out. The waste is not limited to synapses that are overtly strengthened: an already-strong synapse that is merely refreshed by the LTP signal will have its return to the depressible pool uselessly delayed. To address this, we included a second learning threshold θL-pre that places a lower limit on the number of pre-synaptic axons that must drive subunit j for learning to occur:

Figure 4
Enhancing capacity with dual threshold LTP
\sum_{i \in D_j} x_i > \theta_{L-pre}

Remarkably, when we required that both thresholds be crossed for LTP to occur within a branch, storage capacity nearly doubled again to 11,200 patterns (Figure 4B). This suggests that a biophysical mechanism capable of ensuring both a strong pre- and post-synaptic participation in order for plasticity to occur within a dendrite would lead to a substantial increase in storage capacity (see Section “Why LTP should be subject to two thresholds” for a discussion of the possible roles of AMPA- and NMDA-channels in the mediation of this dual-threshold effect).
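In code terms, the gate is a simple conjunction of the two conditions; a minimal sketch (the names are ours):

    def ltp_allowed(active_weight_sum, n_active_axons, theta_l, theta_l_pre):
        # LTP proceeds on a branch only when BOTH the summed weight of its
        # active synapses exceeds theta_L (postsynaptic condition) AND the
        # raw count of active axons exceeds theta_L_pre (presynaptic condition).
        return active_weight_sum > theta_l and n_active_axons > theta_l_pre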

Scaling of storage capacity with increasing network size

Finally, we considered the network's scaling behaviour. Given the very low overlap between stored patterns (we found each strong synapse participated in the storage of only 2.5 patterns on average within the 11,200-pattern capacity horizon), we expected storage capacity to scale roughly linearly with increasing network size. When optimized for each size, thresholds increased to maintain a roughly constant number of synapses modified per stored pattern. We simulated 6 additional network sizes from 1× to 64× (our default network was 4×), the largest of which contained N=40M synapses and had a capacity of 114,000 patterns. We plotted capacity as a function of N and found power-law scaling with exponents of 0.82 and 0.86, respectively, for the worst and best plasticity rules discussed above (Figure 5). Given the slightly different exponents, the factor of improvement seen with the augmented LTP/HD rules grew slightly over the range of network sizes tested, reaching a factor of 12 at the largest size. The largest network we simulated contained 6,400 neurons, corresponding to 1-2% of the pyramidal neurons in rat CA1 (Boss et al., 1987).

Figure 5
Capacity scaling with network size

Discussion

We have studied an online learning scenario in which each pattern is stored in a single trial by rapidly imprinting a random sampling of its higher-order features onto a few strongly activated dendrites. LTP is the learning operation that causes new features to be written into the memory in a branch-specific way, while synaptic depression is responsible for erasing old features (thereby achieving synaptic weight homeostasis). Both LTP and HD are all-or-none operations, meaning that synaptic weights alternate between 2 long-term-stable states (weak and strong) (Petersen et al., 1998; Ganeshina et al., 2004; O'Connor et al., 2005). It is interesting to note that though they play opposite roles in our model (learning vs. forgetting), synaptic potentiation and depression operations occur in the same dendrites at the same time in response to the same local signals (Lynch et al., 1977; Morris and Willshaw, 1989). The coordinated activation of long-term potentiation and depression processes at different synapses within the same dendrites within the same time window is consistent with reports of subcellular localization of synaptic plasticity (Magee and Johnston, 1997; Poirazi and Mel, 2001; Sajikumar and Frey, 2004; Froemke et al., 2005; Gordon et al., 2006; Govindarajan et al., 2006; Harvey and Svoboda, 2007; Losonczy et al., 2008), and with the evidence that the mechanisms involved in strengthening and weakening synapses share common molecular features, as in the intriguing phenomenon of synaptic “cross capture” (Sajikumar and Frey, 2004; Govindarajan et al., 2006).

Where recognition neurons are found, expect extremely sparse changes

Learning in the model occurs only in dendrites where a local spike (Gordon et al., 2006; Remy and Spruston, 2007) is triggered by an unusually strong synaptic input (Schiller et al., 2000; Poirazi et al., 2003; Polsky et al., 2004; Losonczy and Magee, 2006; Major et al., 2008). When dendritic thresholds are set high, which we found maximizes storage capacity, the physical changes in the network needed to encode a new pattern for later recognition are extremely sparse. In our largest networks containing 6,400 neurons and ~40 million synapses, the learning of a pattern involves the participation of fewer than 1% of the neurons, an even smaller fraction of dendrites (less than 0.02%), and a minute fraction of the synapses in the network (only a few per participating dendrite, and less than 0.001% overall). These values should not be viewed as literal predictions, however, since they would drop to even lower values for larger networks, and/or would climb to higher values if noise levels were increased. Nevertheless, the preference in our model for storage through all-or-none changes in a minute fraction of binary-valued synapses, rather than distributing graded synaptic weight changes across a large population of multi-valued synapses, points to a key synergy in our model between sparse binary storage and age-ordered synapse recycling (see section below on "Recognition memory, sparse plasticity, and age-ordered depression").

One consequence of the extremely sparse utilization of neuronal hardware in the storage of individual patterns is that it could be difficult to find pure recognition neurons in vivo using classical electrophysiological methods. In particular, only a small fraction of recognition/familiarity neurons would participate in the learning of any given pattern, or equivalently, any given recognition neuron would participate in the learning of only a small fraction of patterns. Even on the rare occasion that a recorded neuron actually participates in the learning of a pattern, it would not be expected to fire on the initial presentation of the pattern, but only on a subsequent presentation after LTP expression has occurred (Frank et al., 2004). For this reason, neurons optimized for old-new recognition performance, perhaps concentrated in the perirhinal cortex (Barker et al., 2007), could easily be misclassified as "unresponsive" when initially probed with large numbers of novel stimuli (Xiang and Brown, 1998). It is important to note, however, that our prediction of sparse plasticity applies only to pure recognition neurons, and would not necessarily hold for neurons involved in other types of memory tasks or combinations thereof. For example, neurons in the human hippocampus and amygdala that fire "densely" upon reactivation of stored memory traces do not carry a pure recognition signal but represent both the familiarity of a stored item as well as its recalled spatial context (Rutishauser et al., 2008). It is quite possible that the spatial context of a remembered item, which generally includes a multitude of objects distributed throughout extrapersonal space, is represented by a much denser neural code than is needed to support pure old-new pattern discrimination.

The use of in vivo imaging methods, though technically difficult, would seem the ideal approach to studying recognition memory at the neurophysiological level, since this would allow simultaneous recordings from large numbers of neurons (Ohki et al., 2005; Greenberg et al., 2008) and possibly even from the dendrites themselves (Harvey and Svoboda, 2007). This would allow rare learning events in dendritic branches, and perhaps even individual synaptic changes, to be more easily detected.

Can a single dendrite fire a neuron?

One side effect of a high dendritic learning (i.e., local spike) threshold is that rarely will more than one dendrite be activated per neuron participating in the learning process. This means that a single dendrite, when re-activated by one of its stored higher-order features, should be capable of driving output spikes in its parent neuron. There is some evidence that this is possible in CA1 pyramidal cells (Ariav et al., 2003; Losonczy et al., 2008). To investigate this issue further, we ran an additional simulation in a 4× network with the constraint that cells could only learn or fire when two or more of their dendrites crossed the (learning or firing) threshold in response to a pattern. To allow learning and readout to occur, both the dendritic learning and firing thresholds had to be lowered so far, and untrained pattern responses consequently increased so much, that storage capacity was reduced 4-fold. This result, coupled with the experimental findings cited above, suggests that pyramidal cells in MTL recognition memory structures should be fireable in vivo by a single strongly activated dendritic branch.

Why LTP should be subject to two thresholds

The standard notion underlying Hebbian LTP induction is that a synaptic weight should increase when the synapse is pre-synaptically activated on a strongly activated post-synaptic cell (Bliss and Lomo, 1973). The term “strongly activated” typically refers to the peak post-synaptic potential, which must be large enough to relieve NMDA channels of their Mg++ block, allowing them to conduct the calcium current needed for LTP induction. In our model, the threshold θL determines whether a dendrite is sufficiently “strongly activated” in the peak EPSP sense. It is in effect a measure of the total synaptic weight Σwi being presynaptically driven on a branch, regardless of the number of participating axons.

We pointed out that there is a second sense in which a dendrite can be strongly activated, measured in terms of the number of axons driving the dendrite, regardless of their post-synaptic weights. In our model, this quantity is gated by the threshold θL-pre. We showed that when LTP is gated by both thresholds, ensuring that a branch is strongly activated in both senses, a significant savings in synaptic resources is achieved, leading to a significant increase in storage capacity.

The effects of the two thresholds may be understood as follows: a high value of θL ensures that a higher-order feature to be encoded in a branch is almost encoded there already, that is, most of its participating axons already have strong weights, so that as few additional weights as possible (if any) need to be strengthened to encode the new feature. Raising the threshold is thus a means of encouraging the sharing of synaptic resources among patterns containing similar higher-order features. A high value of θL-pre, in turn, ensures that synaptic resources are only invested in the encoding of higher-order features when the features are of high enough order to be successfully read out when subsequently reactivated. θL-pre must therefore be at least as high as θF.

What biophysical mechanism(s) might make LTP dependent on both a strong pre-synaptic showing, governed by θL-pre in our model, and a strong post-synaptic showing, as governed by θL? We conjecture that the peculiar asymmetry in CA1 between NMDA conductances, which are relatively uniform across the synaptic population (Takumi et al., 1999; Nusser, 2000; Nicholson et al., 2006), and the more variable AMPA conductances, which increase and decrease in response to learning signals (Isaac et al., 1995; Malinow and Malenka, 2002) (though see Watt et al., 2004; Lisman and Raghavachari, 2006), could support a dual pre- and post-synaptic learning threshold. In particular, if a threshold level of calcium influx through NMDA channels is the key requirement for LTP induction (Malenka and Nicoll, 1999), a strong NMDAR-dependent calcium influx in an activated dendrite should require both that (1) a sufficient number of NMDA channels are activated by glutamate, which depends on having a sufficient number of pre-synaptic axons driving that branch (θL-pre); and (2) a sufficient total post-synaptic AMPA conductance is activated by those firing axons (θL), which depends on their synapses being mostly already AMPA-strong. Whether or not LTP is proven in future experimental studies to depend on both pre- and post-synaptic thresholds in this particular way, the very different biophysical properties of NMDA and AMPA channels, and their different distributions across the synaptic population, call into question the conventional concept of a scalar "synaptic weight" at each synapse, and could lead to more complex and varied plasticity rules than are contemplated by current learning models (see also Liaw and Berger, 1996; Chover et al., 2001).

Representational style: a table of binary higher-order features (and the adequacy of binary weights)

To understand why binary-valued weights outperformed multi-valued weights in our simulations, it is useful to consider the way patterns are represented in the 2-layer architecture we have studied, and to keep in mind the special nature of the old-new memory problem.

Any binary-valued pattern can be represented by a set of higher-order features. Let m be the number of active axons contained in a pattern P. Any subset of k of those active axons can be called a kth-order feature of P, and there are (m choose k) such features in a pattern (~10^19 for the parameters used in our simulations). A single dendrite in our model with a firing threshold θF = k can be viewed as a "table" of kth-order (or higher) features, where a particular feature F is said to be "contained" in the table if (1) all k of its axons make contact with the dendrite, and (2) all of its associated synapses are already strong. For k = 10, assuming feature F has appropriate connectivity to the dendrite (condition 1), and 50% of all synapses are strong, the prior probability that condition 2 is satisfied for F is 1/2^10 ≈ 1/1000. The act of adding F to the dendrite's table, assuming condition 1 is met, is to strengthen any of its k synapses that are not already strong. The network as a whole can be viewed as a super-table combining all of the dendrite sub-tables, and a pattern as a whole is represented, for purposes of old-new recognition, solely by the number R of its features F1, …, Fr that are stored in all of the dendrite sub-tables. Note that even a never-before-seen pattern has some of its features stored in the super-table (see green distribution in Figure 2B). The act of learning a pattern is simply to load an additional set of its features into the table so that its trace strength R is unambiguously higher than the random background distribution, where the cutoff for recognition is θR.
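The containment test for a single feature can be written directly from conditions (1) and (2); a sketch, with array names of our choosing:

    import numpy as np

    def feature_stored(feature_axons, wiring_j, weights_j):
        # A kth-order feature F is "contained" in dendrite j's table iff every
        # one of its axons synapses onto the dendrite (condition 1) and all of
        # those synapses are already strong (condition 2).
        for a in feature_axons:
            idx = np.flatnonzero(wiring_j == a)
            if idx.size == 0 or not np.all(weights_j[idx] == 1):
                return False
        return True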

Since the representation of a pattern depends only on whether its features are stored or not, once a feature F is initially stored in a dendritic sub-table by ensuring that all of its weights have the value 1, no representational purpose is served by later incrementing some or all of its weights (if this were biophysically possible); the feature is stored regardless, but at a greater expense of synaptic resources. Even when a particular synapse participates in the storage of multiple higher-order features Fi, Fj, … (recall that each synapse in our 4× network was used in the storage of 2.5 features on average), recognition of any of those features is not improved by increasing the synapse's weight to a value greater than 1. The actual effect of increasing any single synaptic weight relative to others (e.g., from 1 to 2) would be to expand the dendrite's feature table to include a large number of features of order k-1 that include that heavier-weighted input. This type of bulk expansion of a subunit's "receptive field" to include a large, highly structured collection of lower-order features will rarely be useful in distinguishing randomly drawn training patterns from the random background.

Thus, in a memory where the patterns to be stored are binary-valued, of uniform activation density, and statistically independent, and memory readout is strictly a count of each pattern's stored higher-order features, binary-valued synaptic weights are probably sufficient. This provides a possible explanation for the observations in CA1 that synaptic weights alternate between two long-term-stable states (Petersen et al., 1998; Ganeshina et al., 2004; O'Connor et al., 2005; Nicholson et al., 2006).

Recognition memory, sparse plasticity, and age-ordered depression

The goal of an online recognition memory as defined here is to retain the longest possible sequence of patterns stretching from the present into the past while maintaining acceptably low recognition error rates. The binary nature of the old-new classification problem means that any excess trace strength devoted to patterns within the capacity horizon C does not increase capacity, nor does the abrupt release of storage resources for patterns beyond C decrease capacity. To maximize C, therefore, the ideal use of synaptic resources involves maintaining uniform, just-suprathreshold (> θR) traces for all trained patterns regardless of their age up to C, after which all resources devoted to "expired" patterns should be turned over for reuse. This idealized scenario can be approached in practice by limiting homeostatic depression to synapses "older" than C, affording newer patterns near-absolute protection and explaining the near-constant memory trace strength in Figures 3E and 4B. The protection period T, closely related to the capacity, can be estimated by dividing the total number of strong synapses in the network (N/2) by a lower bound on the number of synapses used (i.e., strengthened or refreshed) per stored pattern:

C \approx T = \frac{N}{2 L \theta_{L-pre}}
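As a numerical illustration of this estimate (with hypothetical parameter values, not the paper's optimized settings for any particular rule):

    def protection_period(n_synapses, L, theta_l_pre):
        # T ~ (N/2) / (L * theta_L_pre): strong synapses available, divided by
        # a lower bound on synapses used (strengthened or refreshed) per pattern.
        return (n_synapses / 2) / (L * theta_l_pre)

    # e.g., a 2.56M-synapse network with L = 120 and theta_L_pre = 10
    # (illustrative values only):
    print(protection_period(2_560_000, L=120, theta_l_pre=10))  # ~1067 patterns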

The value of T is indicated by a circle in Figure 4, demarcating the onset of the memory's rapid decay phase.

Importantly, the key requirement for limiting destructive changes to patterns older than C is that some synapses must be older than C; that is, some synapses cannot have been used (either potentiated or refreshed) for at least the past C training patterns. This establishes a connection between age-ordered depression and sparse plasticity in an online memory optimized for long storage times: only when a small fraction of synapses are modified in the storage of each pattern can synapses reach old age. And only when synapses can reach old age can the memory traces associated with old patterns be selectively targeted by AOD.

For clarity, it is important to dissociate sparse plasticity, which refers to the small fraction of synapses altered in the storage of each pattern, from sparse coding, which refers to the small fraction of axons firing in each input pattern. A major advantage of a 2-layer architecture is that plasticity can be made sparse regardless of the input coding density. This can be achieved by increasing learning threshold(s) until plasticity is limited to only a small fraction of the available subunits, and hence synapses. The absolute number of synapses devoted to each stored pattern, which determines the stored trace strength, can be independently controlled by varying the number of subunits contained within the network.

Implementing age-ordered depression

Age-ordered depression with binary weights relieves the biological machinery from the need to bidirectionally adjust finely-graded weight values or "cascade" states (Fusi et al., 2005) and maintain them stably over months or years. These needs are replaced by the requirements that (1) an all-or-none change at a synapse established during LTP remains functionally intact despite a covert aging process that "counts time" in terms of dendrite-specific learning events, and (2) homeostatic depression acts preferentially on the oldest synapses, that is, synapses whose age counters have reached their limits. The ubiquitin-proteasome system (Colledge et al., 2003; Schwartz, 2003; Ding and Shen, 2008; Tai and Schuman, 2008) could provide a mechanism for age-dependent synaptic protection/elimination. In a simplified view of ubiquitin-dependent protein degradation, activated enzymes sequentially elongate poly-ubiquitin chains on targeted proteins; when the chains reach a critical length, the targeted proteins are recognized by the proteasome and degraded. The length of the poly-ubiquitin chain could thus act as an event counter, allowing the oldest synaptic proteins to be specifically targeted for breakdown. Consistent with such a mechanism, a number of synaptic proteins involved in plasticity are known to be ubiquitin-regulated (Colledge et al., 2003; Ding and Shen, 2008; Tai and Schuman, 2008), including post-synaptic proteins involved in homeostatic depression (Ehlers, 2003; Zhao et al., 2003).

Conclusions

Further experiments in vitro and in vivo will be needed to test whether the above constellation of architectural assumptions (2-layer network, random connectivity, binary-valued patterns), dendritic properties (separate plasticity and firing thresholds, branch-specific bi-directional plasticity, single branch able to fire the soma), and synaptic plasticity rules (dual threshold LTP, age-ordered homeostatic depression, sparse plasticity) provide a good model of one-shot learning in MTL recognition/familiarity neurons. A major additional challenge will be to understand how a simple online recognition memory of the kind we have studied here in isolation, consisting of a single axonal pathway projecting to a homogeneous population of dendrites fed by random patterns, fits into a larger system that involves (1) multiple areas with feedback loops within and between areas (Burwell et al., 1995; Redish and Touretzky, 1998; Squire et al., 2004; Burgess, 2007); (2) gradients of interconnection strengths (Amaral and Witter, 1989) and response properties (Brun et al., 2008; Giocomo and Hasselmo, 2008) both within and between areas; (3) multiple input pathways per neuron (Jarsky et al., 2005); (4) ion channel gradients within neurons that alter synaptic integration as a function of distance from the soma (Hoffman et al., 1997; Magee, 1998); (5) the action of various neuromodulators (Lisman and Spruston, 2005; Hasselmo, 2006); (6) various network rhythms depending on behavioral state (Huerta and Lisman, 1993; Buzsaki and Draguhn, 2004); and (7) input patterns that contain spatio-temporal correlations, (8) vary in importance and emotional valence, and once having been learned, (9) must obey intricate rules of consolidation, reconsolidation, extinction, and so on (Tronson and Taylor, 2007). A continuing dialogue between modeling and experimental efforts will be essential for resolving these many interesting questions.

Acknowledgments

We thank Dr. Fritz Sommer and the anonymous reviewers for many helpful comments, and Bardia Behabadi and Yichun Wei for technical assistance. This work was supported by NSF CRCNS grant IIS-0613583 and NIMH grant 5R01MH065918.


References

  • Amaral DG, Witter MP. The three-dimensional organization of the hippocampal formation: A review of anatomical data. Neuroscience. 1989;31:571–591. [PubMed]
  • Archie KA, Mel BW. A model for intradendritic computation of binocular disparity. Nat Neurosci. 2000;3:54–63. [PubMed]
  • Ariav G, Polsky A, Schiller J. Submillisecond precision of the input-output transformation function mediated by fast sodium dendritic spikes in basal dendrites of CA1 pyramidal neurons. J Neurosci. 2003;23:7750–7758. [PubMed]
  • Barker GR, Bird F, Alexander V, Warburton EC. Recognition memory for objects, place, and temporal order: a disconnection analysis of the role of the medial prefrontal cortex and perirhinal cortex. J Neurosci. 2007;27:2948–2957. [PubMed]
  • Bliss TVP, Lomo T. Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. The Journal of physiology. 1973;232:331–356. [PubMed]
  • Bogacz R, Brown MW, Giraud-Carrier C. Model of Familiarity Discrimination in the Perirhinal Cortex. Journal of Computational Neuroscience. 2001;10:5–23. [PubMed]
  • Boss BD, Turlejski K, Stanfield BB, Cowan WM. On the numbers of neurons on fields CA1 and CA3 of the hippocampus of Sprague-Dawley and Wistar rats. Brain Research. 1987;406:280–287. [PubMed]
  • Brun VH, Solstad T, Kjelstrup KB, Fyhn M, Witter MP, Moser EI, Moser MB. Progressive increase in grid scale from dorsal to ventral medial entorhinal cortex. Hippocampus. 2008;18:1200–1212. [PubMed]
  • Burgess N. Computational Models of the Spatial and Mnemonic Functions of the Hippocampus. In: Andersen P, Morris R, Amaral D, O'Keefe J, editors. The Hippocampus Book. Oxford: 2007. pp. 715–750.
  • Burwell RD, Witter MP, Amaral DG. Perirhinal and postrhinal cortices of the rat: a review of the neuroanatomical literature and comparison with findings from the monkey brain. Hippocampus. 1995;5:390–408. [PubMed]
  • Buzsaki G, Draguhn A. Neuronal oscillations in cortical networks. Science. 2004;304:1926–1929. [PubMed]
  • Chover J, Haberly LB, Lytton WW. Alternating dominance of NMDA and AMPA for learning and recall: a computer model. Neuroreport. 2001;12:2503–2507. [PubMed]
  • Colledge M, Snyder EM, Crozier RA, Soderling JA, Jin Y, Langeberg LK, Lu H, Bear MF, Scott JD. Ubiquitination Regulates PSD-95 Degradation and AMPA Receptor Surface Expression. Neuron. 2003;40:595–607. [PMC free article] [PubMed]
  • Ding M, Shen K. The role of the ubiquitin proteasome system in synapse remodeling and neurodegenerative diseases. Bioessays. 2008;30:1075–1083. [PMC free article] [PubMed]
  • Dudman JT, Tsay D, Siegelbaum SA. A Role for Synaptic Inputs at Distal Dendrites: Instructive Signals for Hippocampal Long-Term Plasticity. Neuron. 2007;56:866–879. [PMC free article] [PubMed]
  • Ehlers M. Activity level controls postsynaptic composition and signaling via the ubiquitin-proteasome system. Nat Neurosci. 2003;6:231–242. [PubMed]
  • Frank LM, Stanley GB, Brown EN. Hippocampal plasticity across multiple days of exposure to novel environments. J Neurosci. 2004;24:7681–7689. [PubMed]
  • Froemke RC, Poo Mm, Dan Y. Spike-timing-dependent synaptic plasticity depends on dendritic location. Nature. 2005;434:221–225. [PubMed]
  • Fusi S, Drew PJ, Abbott LF. Cascade models of synaptically stored memories. Neuron. 2005;45:599–611. [PubMed]
  • Ganeshina O, Berry RW, Petralia RS, Nicholson DA, Geinisman Y. Differences in the expression of AMPA and NMDA receptors between axospinous perforated and nonperforated synapses are related to the configuration and size of postsynaptic densities. The Journal of Comparative Neurology. 2004;468:86–95. [PubMed]
  • Giocomo LM, Hasselmo ME. Time constants of h current in layer ii stellate cells differ along the dorsal to ventral axis of medial entorhinal cortex. J Neurosci. 2008;28:9414–9425. [PMC free article] [PubMed]
  • Gordon U, Polsky A, Schiller J. Plasticity compartments in basal dendrites of neocortical pyramidal neurons. J Neurosci. 2006;26:12717–12726. [PubMed]
  • Govindarajan A, Kelleher RJ, Tonegawa S. A clustered plasticity model of long-term memory engrams. Nat Rev Neurosci. 2006;7:575–583. [PubMed]
  • Greenberg DS, Houweling AR, Kerr JND. Population imaging of ongoing neuronal activity in the visual cortex of awake rats. Nat Neurosci. 2008;11:749–751. [PubMed]
  • Greve A, Sterratt D, Donaldson D, Willshaw D, van Rossum M. Optimal learning rules for familiarity detection. Biological Cybernetics. 2008. [PubMed]
  • Harvey CD, Svoboda K. Locally dynamic synaptic learning rules in pyramidal neuron dendrites. Nature. 2007;450:1195–1200. [PMC free article] [PubMed]
  • Hasselmo ME. The role of acetylcholine in learning and memory. Curr Opin Neurobiol. 2006;16:710–715. [PMC free article] [PubMed]
  • Henson RN, Willshaw DJ. Short-term associative memory. Proceedings of the INNS World Congress on Neural Networks; Washington DC. 1995.
  • Hoffman DA, Magee JC, Colbert CM, Johnston D. K+ channel regulation of signal propagation in dendrites of hippocampal pyramidal neurons. Nature. 1997;387:869–875. [PubMed]
  • Huerta PT, Lisman JE. Heightened synaptic plasticity of hippocampal CA1 neurons during a Cholinergically induced rhythmic state. Nature. 1993;364:723–725. [PubMed]
  • Isaac JTR, Nicoll RA, Malenka RC. Evidence for silent synapses: Implications for the expression of LTP. Neuron. 1995;15:427–434. [PubMed]
  • Jarsky T, Roxin A, Kath WL, Spruston N. Conditional dendritic spike propagation following distal synaptic activation of hippocampal CA1 pyramidal neurons. Nat Neurosci. 2005;8:1667–1676. [PubMed]
  • Koch C, Poggio T, Torre V. Nonlinear interactions in a dendritic tree: localization, timing, and role in information processing. Proceedings of the National Academy of Sciences of the United States of America. 1983;80:2799–2802. [PubMed]
  • Liaw JS, Berger TW. Dynamic synapse: a new concept of neural representation and computation. Hippocampus. 1996;6:591–600. [PubMed]
  • Lisman J, Raghavachari S. A Unified Model of the Presynaptic and Postsynaptic Changes During LTP at CA1 Synapses. Sci STKE. 2006;2006:re11. [PubMed]
  • Lisman J, Spruston N. Postsynaptic depolarization requirements for LTP and LTD: a critique of spike timing-dependent plasticity. Nat Neurosci. 2005;8:839–841. [PubMed]
  • Losonczy A, Magee JC. Integrative properties of radial oblique dendrites in hippocampal CA1 pyramidal neurons. Neuron. 2006;50:291–307. [PubMed]
  • Losonczy A, Makara JK, Magee JC. Compartmentalized dendritic plasticity and input feature storage in neurons. Nature. 2008;452:436–441. [PubMed]
  • Lynch GS, Dunwiddie T, Gribkoff V. Heterosynaptic depression: a postsynaptic correlate of long-term potentiation. Nature. 1977;266:737–739. [PubMed]
  • Magee JC. Dendritic hyperpolarization-activated currents modify the integrative properties of hippocampal CA1 pyramidal neurons. J Neurosci. 1998;18:7613–7624. [PubMed]
  • Magee JC, Johnston D. A synaptically controlled, associative signal for Hebbian plasticity in hippocampal neurons. Science. 1997;275:209–213. [PubMed]
  • Major G, Polsky A, Denk W, Schiller J, Tank DW. Spatio-temporally Graded NMDA Spike/Plateau Potentials in Basal Dendrites of Neocortical Pyramidal Neurons. J Neurophysiol. 2008. [PubMed]
  • Malenka RC, Nicoll RA. Long-Term Potentiation--A Decade of Progress? Science. 1999;285:1870–1874. [PubMed]
  • Malinow R, Malenka RC. AMPA receptor trafficking and synaptic plasticity. Annu Rev Neurosci. 2002;25:103–126. [PubMed]
  • Mel BW. NMDA-based pattern discrimination in a modeled cortical neuron. Neural Comput. 1992;4:502–516.
  • Mel BW. Synaptic integration in an excitable dendritic tree. J Neurophysiol. 1993;70:1086–1101. [PubMed]
  • Mel BW, Ruderman DL, Archie KA. Translation-invariant orientation tuning in visual “complex” cells could derive from intradendritic computations. J Neurosci. 1998;18:4325–4334. [PubMed]
  • Milojkovic BA, Radojicic MS, Goldman-Rakic PS, Antic SD. Burst generation in rat pyramidal neurones by regenerative potentials elicited in a restricted part of the basilar dendritic tree. J Physiol. 2004;558:193–211. [PubMed]
  • Morris RGM, Willshaw DJ. Must what goes up come down? Nature. 1989;339:175–176. [PubMed]
  • Nadal JP, Toulouse G, Changeux JP, Dehaene S. Networks of Formal Neurons and Memory Palimpsests. EPL (Europhysics Letters). 1986;1:535–542.
  • Nicholson DA, Trana R, Katz Y, Kath WL, Spruston N, Geinisman Y. Distance-Dependent Differences in Synapse Number and AMPA Receptor Expression in Hippocampal CA1 Pyramidal Neurons. Neuron. 2006;50:431–442. [PubMed]
  • Norman KA, O'Reilly RC. Modeling hippocampal and neocortical contributions to recognition memory: a complementary-learning-systems approach. Psychol Rev. 2003;110:611–646. [PubMed]
  • Nusser Z. AMPA and NMDA receptors: similarities and differences in their synaptic distribution. Current Opinion in Neurobiology. 2000;10:337–341. [PubMed]
  • O'Connor DH, Wittenberg GM, Wang SSH. Graded bidirectional synaptic plasticity is composed of switch-like unitary events. PNAS. 2005;102:9679–9684. [PubMed]
  • Ohki K, Chung S, Ch'ng YH, Kara P, Reid RC. Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature. 2005;433:597–603. [PubMed]
  • Petersen CC, Malenka RC, Nicoll RA, Hopfield JJ. All-or-none potentiation at CA3-CA1 synapses. Proceedings of the National Academy of Sciences of the United States of America. 1998;95:4732–4737. [PubMed]
  • Poirazi P, Brannon T, Mel BW. Pyramidal neuron as two-layer neural network. Neuron. 2003;37:989–999. [PubMed]
  • Poirazi P, Mel BW. Impact of Active Dendrites and Structural Plasticity on the Memory Capacity of Neural Tissue. Neuron. 2001;29:779–796. [PubMed]
  • Polsky A, Mel BW, Schiller J. Computational subunits in thin dendrites of pyramidal cells. Nat Neurosci. 2004;7:621–627. [PubMed]
  • Redish AD, Touretzky DS. The role of the hippocampus in solving the Morris water maze. Neural computation. 1998;10:73–111. [PubMed]
  • Remy S, Spruston N. Dendritic spikes induce single-burst long-term potentiation. Proceedings of the National Academy of Sciences. 2007;104:17192–17197. [PubMed]
  • Rumelhart DE, Hinton GE, McClelland JL. A general framework for parallel distributed processing. Vol. 1. Cambridge, MA: Bradford; 1986.
  • Rutishauser U, Schuman EM, Mamelak AN. Activity of human hippocampal and amygdala neurons during retrieval of declarative memories. Proc Natl Acad Sci USA. 2008;105:329–334. [PubMed]
  • Sajikumar S, Frey JU. Late-associativity, synaptic tagging, and the role of dopamine during LTP and LTD. Neurobiology of learning and memory. 2004;82:12–25. [PubMed]
  • Schiller J, Major G, Koester HJ, Schiller Y. NMDA spikes in basal dendrites of cortical pyramidal neurons. Nature. 2000;404:285–289. [PubMed]
  • Schwartz JH. Ubiquitination, Protein Turnover, and Long-Term Synaptic Plasticity. Sci STKE. 2003:pe26. [PubMed]
  • Schwenker F, Sommer FT, Palm G. Iterative retrieval of sparsely coded associative memory patterns. Neural Networks. 1996;9:445–455.
  • Sohal VS, Hasselmo ME. A model for experience-dependent changes in the responses of inferotemporal neurons. Network: Computation in Neural Systems. 2000;11:169–190. [PubMed]
  • Spruston N. Pyramidal neurons: dendritic structure and synaptic integration. Nat Rev Neurosci. 2008;9:206–221. [PubMed]
  • Squire LR, Stark CEL, Clark RE. The Medial Temporal Lobe. Annual Review of Neuroscience. 2004;27:279–306. [PubMed]
  • Squire LR, Wixted JT, Clark RE. Recognition memory and the medial temporal lobe: a new perspective. Nat Rev Neurosci. 2007;8:872–883. [PMC free article] [PubMed]
  • Tai HC, Schuman EM. Ubiquitin, the proteasome and protein degradation in neuronal function and dysfunction. Nat Rev Neurosci. 2008;9:826–838. [PubMed]
  • Takumi Y, Ramirez-Leon V, Laake P, Rinvik E, Ottersen OP. Different modes of expression of AMPA and NMDA receptors in hippocampal synapses. Nat Neurosci. 1999;2:618–624. [PubMed]
  • Tronson NC, Taylor JR. Molecular mechanisms of memory reconsolidation. Nat Rev Neurosci. 2007;8:262–275. [PubMed]
  • Watt AJ, Sjostrom PJ, Hausser M, Nelson SB, Turrigiano GG. A proportional but slower NMDA potentiation follows AMPA potentiation in LTP. Nat Neurosci. 2004;7:518–524. [PubMed]
  • Xiang JZ, Brown MW. Differential neuronal encoding of novelty, familiarity and recency in regions of the anterior temporal lobe. Neuropharmacology. 1998;37:657–676. [PubMed]
  • Zhao Y, Hegde AN, Martin KC. The ubiquitin proteasome system functions as an inhibitory constraint on synaptic strengthening. Curr Biol. 2003;13:887–898. [PubMed]