The ability to determine the interval and duration of sensory events is fundamental to most forms of sensory processing, including speech and music perception. Recent experimental data support the notion that different mechanisms underlie temporal processing in the subsecond and suprasecond range. Here, we examine the predictions of one class of subsecond timing models: state-dependent networks. We establish that the interstimulus interval (ISI), i.e. the interval between the standard and the comparison interval in a two-interval forced-choice discrimination task, alters the accuracy of interval discrimination but not the point of subjective equality; that is, while timing was impaired, subjective time contraction or expansion was not observed. We also examined whether the deficit in temporal processing produced by short ISIs can be reduced by learning, and determined the generalization patterns. These results show that training subjects on a task using a short or long ISI produces dramatically different generalization patterns, suggesting different forms of perceptual learning are being engaged. Together, our results are consistent with the notion that timing in the range of hundreds of milliseconds is local as opposed to centralized, and that rapid stimulus presentation rates impair temporal discrimination. This interference is, however, decreased if the stimuli are presented to different sensory channels.
Timing in the range of tens of milliseconds to a few seconds is of fundamental importance for a wide range of sensory and motor tasks (Ivry & Spencer 2004; Mauk & Buonomano 2004; Buhusi & Meck 2005; van Wassenhove 2009). For example, the ability to discriminate the interval and duration of sounds is critical for speech processing (Liberman et al. 1956; Scott 1982; Drullman 1995; Shannon et al. 1995; Aasland & Baum 2003). However, the neural mechanisms involved even in a simple temporal task, such as interval discrimination, remain unknown.
Advances in the understanding of the neural basis of learning and memory benefited tremendously from the realization that memory was not a unitary process, but could be divided into declarative and non-declarative memory and each of these into further subdivisions (Squire 1986). Similarly, the emerging realization that temporal processing is not a unitary neural process, but probably encompasses a number of independent or interdependent processes, is an important factor in understanding existing data and in guiding future experiments.
The mammalian brain processes temporal information and tells time over time scales exceeding 10 orders of magnitude: from the few microseconds used for sound localization, to daily, monthly and yearly rhythms relevant to sleep–wake, menstrual and seasonal cycles, respectively (Buonomano 2007). It is well established that the neural mechanisms underlying the shortest and longest extremes of temporal processing, sound localization and circadian rhythms, are entirely distinct and independent (Carr 1993; King & Takahashi 2000; Panda et al. 2002). While the mechanisms underlying timing in the intermediary range of milliseconds to minutes are not understood, it is becoming increasingly evident that this range is likely to also encompass distinct mechanisms (Fraisse 1984; Gibbon et al. 1997), and the distinction has been made between perceptual/automatic versus cognitive timing (Michon 1985; Rammsayer 1999; Lewis & Miall 2003), millisecond timing versus interval timing (Buhusi & Meck 2005) and millisecond versus second timing (Mauk & Buonomano 2004). Thus, in order to address the neural mechanisms of timing, it is useful to distinguish between various potential divisions of temporal processing. Although the correct taxonomy of temporal processing remains an open question, relevant classification dimensions include:
Determining the correct temporal taxonomy will be critical in establishing a coherent and consistent interpretation of the increasing number of experiments aimed at understanding temporal processing. These issues and the different models of temporal processing will not be addressed in detail here, as they have been discussed in a number of recent reviews (Lewis & Miall 2003; Ivry & Spencer 2004; Mauk & Buonomano 2004; Buhusi & Meck 2005; Ivry & Schlerf 2008) as well as in the accompanying articles in this issue. Here, we will focus primarily on describing what we will refer to as the state-dependent network (SDN) model, which relates to subsecond sensory timing, and experimentally examine some of its predictions.
The SDN model proposes that temporal processing is inherently encoded in the state of neural networks (Buonomano & Merzenich 1995; Buonomano 2000). A useful analogy is dynamics in a liquid. A pebble thrown into a pond will create a spatial–temporal pattern of ripples, and the pattern produced by any subsequent pebbles will be a complex nonlinear function of the interaction of the stimulus (the pebble) and the internal state of the liquid (the current pattern of ripples). Ripples thus establish a short-lasting and dynamic memory of the recent stimulus history of the liquid. The state of a neural network includes ongoing activity (the active state) and the presence of time-dependent neuronal properties (the hidden state) (Buonomano & Maass 2009). In the case of an auditory interval discrimination task, there is an ‘empty’ period in the stimulus, during which the auditory cortex neurons generally stop firing, thus timing would rely primarily on the hidden state; i.e. the change in network state produced by properties such as short-term synaptic plasticity. In an interval discrimination task, the first tone will activate a population of neurons within a local cortical network; given the presence of many experimentally characterized neuronal and synaptic properties, with time constants in the order of hundreds of milliseconds, this local network should be in a different state before the arrival of the second pulse 100ms later. For example, as a result of short-term synaptic plasticity (Zucker 1989; Reyes & Sakmann 1999) synapses may be stronger or weaker, which should alter the population response to the same input. Differences in the population response can in turn code for time. In a sense, in the same manner that long-term potentiation provides a memory of coincident activity between groups of synapses that occurred minutes or hours in the past (Brown et al. 1990; Karmarkar et al. 2002), short-term synaptic plasticity provides a memory of an event that happened a hundred milliseconds ago.
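The idea that a hidden state can code for time may be illustrated with a minimal sketch. It is not the SDN model itself: it assumes, purely for illustration, that each synapse is transiently depressed or facilitated by the first pulse and relaxes exponentially back to baseline. The function names, the three `(change, tau_ms)` pairs and the exponential form are hypothetical choices, not values taken from the studies cited above.

```python
import math

def second_pulse_response(interval_ms, change, tau_ms):
    """Relative response of one synapse to the second pulse of a pair.

    change < 0 stands for short-term depression, change > 0 for facilitation.
    The change induced by the first pulse decays exponentially (time constant
    tau_ms) back to the baseline response of 1.0. Illustrative assumption only.
    """
    return 1.0 + change * math.exp(-interval_ms / tau_ms)

# Hypothetical population: (change, tau_ms) for three synapses.
SYNAPSES = [(-0.5, 300.0), (0.4, 150.0), (-0.3, 600.0)]

def population_response(interval_ms):
    """Vector of second-pulse responses: a toy 'population code' for the interval."""
    return [second_pulse_response(interval_ms, c, tau) for c, tau in SYNAPSES]

if __name__ == "__main__":
    for interval in (50, 100, 200, 1000):
        print(interval, [round(r, 3) for r in population_response(interval)])
```

In this toy version, each interval produces a different response vector as long as the interval is short relative to the time constants; for long intervals all responses return to baseline and the intervals become indistinguishable, which is the sense in which the hidden state acts as a short-lasting memory.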
The SDN model can be considered an intrinsic model of timing, in that it does not rely on what most would consider specialized timing mechanisms—although it could be argued that one of the specialized functions of short-term synaptic plasticity is temporal processing. Similarly, this class of models is also local, i.e. any cortical network could potentially process temporal information. Furthermore, interval discrimination could potentially rely on temporal processing at multiple sequential stages in the sensory hierarchy, and the relative contribution of low- and high-level areas could depend on the nature and design of the task.
The SDN model predicts that the arrival of each sensory event is encoded in the temporal context of previous events. Specifically, the second tone of a 100ms interval arrives in the network state established by the first tone, and thus the population response can encode this interval. However, if that 100ms interval happened to be preceded by another tone, then it will be superimposed on yet another neural network state. In the same manner that previous ripples on the surface of a pond will establish a ‘context’ or state that will alter the ripples produced by the next pebble thrown in, each sensory event will alter the response to the next. Because the superposition of these states is highly nonlinear, this model predicts that there is no built-in linear metric of time, such as the ticks of a clock. Thus, during a two-interval forced-choice interval discrimination task, the presentation of the standard interval can interfere with the processing of the comparison interval if the network has not had time to ‘reset’. Here, reset would correspond to the network returning to some baseline state; the time required for this would be determined by the time constants of the relevant time-dependent neuronal properties. For short-term synaptic plasticity, this is in the range of a few hundred milliseconds. Recent experimental results have established that short ISIs in an interval discrimination task do indeed impair temporal processing (Karmarkar & Buonomano 2007). Importantly, however, if both intervals were presented using tones of different frequencies, little or no impairment was observed. This suggests that timing occurs locally, i.e. one interval does not interfere with the timing of the next if it arrives in a different local network, as would be expected during the presentation of different tone frequencies as a result of the tonotopic organization of the auditory cortex.
Here, we examine a number of related predictions generated by the SDN model.
Subjects consisted of paid undergraduate and graduate students who reported having normal hearing, and were between the ages of 18 and 30 from the UCLA community. All experiments were run in accordance with the University of California human subjects guidelines.
As mentioned above, a previous study determined that short ISIs impaired interval discrimination if both intervals were presented at the same frequency, but not if they were presented using different frequencies (Karmarkar & Buonomano 2007). This study, however, did not examine whether the ISI effect was produced by a shift in the point of subjective equality (PSE), which would correspond to time compression or dilation. Here, we first examined the effect of ISI and of changing frequencies on the PSE and the difference limens (DLs) using a two-interval forced-choice procedure that allowed for the estimation of the psychometric functions.
The standard interval was 100ms. We used a 2×2 design, varying the ISI and frequency. The ISI was either short or long (mean of 250 or 750ms, respectively). The standard and comparison interval were of the same or different frequencies (see §2), resulting in four conditions: ShortISI-SameFr; LongISI-SameFr; ShortISI-DiffFr; and LongISI-DiffFr. The fitted psychometric functions for all subjects are shown in figure 1a. Group data suggest different DLs, but similar PSE values for all four conditions (figure 1b,c). A two-way analysis of variance with repeated measures revealed a significant interaction between ISI and frequency on the DLs, indicating an increase in threshold in the ShortISI-SameFr condition (F1,18=33, p<0.0005). By contrast, there was no significant interaction or main effects on the PSE.
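The distinction between a shift in the PSE and a change in the DL can be made concrete with a small sketch of a psychometric fit. This is not the fitting procedure used in these experiments: it assumes a logistic psychometric function, a coarse grid-search maximum-likelihood fit, and a DL defined as half the distance between the 25 and 75 per cent points. The function names and the example counts are hypothetical.

```python
import math

def logistic(x_ms, pse_ms, scale_ms):
    """Probability of judging the comparison interval longer than the standard."""
    return 1.0 / (1.0 + math.exp(-(x_ms - pse_ms) / scale_ms))

def log_likelihood(pse_ms, scale_ms, intervals_ms, n_longer, n_trials):
    """Binomial log-likelihood of the 'longer' counts under the logistic model."""
    total = 0.0
    for x, k, n in zip(intervals_ms, n_longer, n_trials):
        p = min(max(logistic(x, pse_ms, scale_ms), 1e-9), 1.0 - 1e-9)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit_psychometric(intervals_ms, n_longer, n_trials):
    """Coarse grid-search fit around a 100ms standard; returns (pse_ms, dl_ms)."""
    best_ll, best_pse, best_scale = -math.inf, None, None
    for i in range(81):                  # PSE candidates: 80.0 ... 120.0 ms
        pse = 80.0 + 0.5 * i
        for j in range(1, 61):           # scale candidates: 0.5 ... 30.0 ms
            scale = 0.5 * j
            ll = log_likelihood(pse, scale, intervals_ms, n_longer, n_trials)
            if ll > best_ll:
                best_ll, best_pse, best_scale = ll, pse, scale
    # Assumed DL definition: half the 25%-75% spread, i.e. scale * ln(3).
    return best_pse, best_scale * math.log(3.0)

if __name__ == "__main__":
    # Hypothetical counts of 'longer' responses out of 40 trials per interval.
    intervals = [70, 80, 90, 100, 110, 120, 130]
    longer = [1, 4, 10, 20, 30, 36, 39]
    pse, dl = fit_psychometric(intervals, longer, [40] * len(intervals))
    print("PSE (ms):", pse, "DL (ms):", round(dl, 2))
```

Under these assumptions, an impairment in precision appears as a shallower function (larger DL) with an unchanged 50 per cent point, whereas subjective time compression or dilation would appear as a horizontal shift of the 50 per cent point (a changed PSE).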
These results replicate the main finding of previously published experiments (Karmarkar & Buonomano 2007): that short ISIs impair interval discrimination. Since this effect is limited to cases in which both the standard and comparison intervals are presented at the same frequency, it seems that it is not a result of a general or non-specific effect of the increased stimulus presentation rate, but rather a result of the interference of the preceding stimulus on subsequent processing of intervals coming in on the same channel. Additionally, these results indicate that the impairment was not produced by a time compression or dilation effect, since there was no detectable shift in the PSE. Rather, the decrease in performance is attributable to a change in the precision of temporal discrimination. In these experiments, the subjects received feedback after each trial; thus, it is possible that the lack of a change in the PSE was due to ongoing ‘recalibration’ during each block. However, a separate set of experiments in which the feedback was omitted still revealed the same effect on the DL and no effect on the PSE.
The above results and previously published data (Rammsayer 1999) are consistent with the notion that there is a transition between different neural mechanisms underlying timing somewhere in the range of hundreds of milliseconds. In order to gain insights as to where the boundary between millisecond and second timing lies, we performed further experiments in which we varied the ISI over five different intervals (50, 250, 500, 750 and 1000ms), again using a 100ms standard. Additionally, we performed a set of control experiments in which we examined the effect of three ISIs (250, 500 and 750ms) on a frequency discrimination task.
To obtain accurate threshold estimates with fewer runs, we used the reversal values of the adaptive procedure as opposed to the estimation of the psychometric functions to quantify performance. As shown in figure 2, the thresholds were higher for the two shorter ISIs. A repeated-measure ANOVA revealed a significant effect of ISI (F4,56=6.6, p<0.001). A planned comparison revealed that the only significant difference between adjacent ISIs was between 250 and 500ms (p=0.016; Bonferroni corrected). In contrast to the effect of ISI on interval discrimination, there was no significant effect of the three ISIs examined on frequency discrimination (F2,28=0.79, p=0.46).
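A threshold estimate based on reversal values can be sketched as follows. The details of the adaptive procedure are not given in this section, so the sketch only assumes that the track is a sequence of tested levels (for example, the difference in milliseconds between comparison and standard) and that the threshold is the mean of the last few levels at which the track changed direction. The function names and the choice of four reversals are hypothetical.

```python
def reversal_values(levels):
    """Return the levels at which an adaptive track changed direction.

    `levels` is the sequence of tested stimulus levels, one per trial.
    Repeated levels (no change between trials) are skipped.
    """
    reversals = []
    last_direction = 0
    for prev, cur in zip(levels, levels[1:]):
        if cur == prev:
            continue
        direction = 1 if cur > prev else -1
        if last_direction and direction != last_direction:
            reversals.append(prev)   # the turning point is the previous level
        last_direction = direction
    return reversals

def threshold_from_reversals(levels, n_last=4):
    """Mean of the last `n_last` reversal values (n_last is an assumed choice)."""
    reversals = reversal_values(levels)
    if len(reversals) < n_last:
        raise ValueError("track contains too few reversals")
    tail = reversals[-n_last:]
    return sum(tail) / len(tail)

if __name__ == "__main__":
    # Hypothetical track of tested levels (ms).
    track = [20, 16, 12, 8, 10, 8, 6, 8, 10, 8]
    print(reversal_values(track), threshold_from_reversals(track))
```

Because the estimate uses only the turning points of a single track, it requires fewer trials than fitting a full psychometric function, at the cost of providing a threshold but no separate estimate of the PSE.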
These results demonstrate that the impairment of 100ms discrimination produced by short ISIs is strongest at 50 and 250ms. Interestingly, the magnitude of the impairments was not significantly different between the 50 and 250ms ISIs. The presence of a significant difference in performance between the 250 and 500ms ISIs, together with the absence of a difference between 500 and 750ms, suggests that in the framework of the SDN model, local networks settle back to a baseline state between 250 and 500ms.
Another approach to examining whether time is encoded in the population response of local networks, which in turn are influenced in a nonlinear fashion by the temporal context established by previous sensory events, is to examine generalization patterns of perceptual learning. Previous studies have used generalization to examine both the temporal specificity of interval learning and whether it generalizes across frequency channels and sensory modalities (Wright et al. 1997; Nagarajan et al. 1998; Westheimer 1999; Meegan et al. 2000; Karmarkar & Buonomano 2003). We next examined the results of training two groups of subjects on either the ShortISI-SameFr or LongISI-SameFr condition. The first goal of this study was to determine whether training on the ShortISI-SameFr could overcome the performance deficits observed above. The second goal was to examine, if learning occurred, whether it would generalize to the remaining three conditions.
Experiments were performed over 10 days. During the first and last days, subjects were administered three blocks of each of the four conditions. In the intervening 8 days, subjects ran 12 blocks on the trained condition (ShortISI-SameFr or LongISI-SameFr). Figure 3a,b shows the learning curves of the subjects in both conditions. In each condition, eight subjects exhibited significant learning curves as determined by a significant linear trend using a one-way repeated-measures ANOVA. The analysis of the generalization patterns was based on this subgroup of ‘learners’ (Wright et al. 1997; Karmarkar & Buonomano 2003). Importantly, however, there was a significant difference in the pre- and post-test values for both groups when tested on their trained conditions (ShortISI-SameFr or LongISI-SameFr), independent of whether all subjects or the subset of learners were considered. To determine whether learning in each group generalized to the other three conditions, we performed a two-way ANOVA (repeated measures on both factors), where one factor was pre-test versus post-test, and the other, the three naive conditions. As shown in figure 3b, in the ShortISI-SameFr group, there was no significant main effect of training (F1,7=0.82, p=0.39) or of the interaction (F2,14=0.58, p=0.57). By contrast, in the subjects trained on the LongISI-SameFr condition, there was a highly significant effect of training (F1,7=25.6, p<0.002) and no significant interaction (F2,14=0.68, p=0.52).
These results established that, independent of whether subjects were trained on the ShortISI-SameFr or LongISI-SameFr condition, they improved on the trained stimulus set. However, while the subjects trained on the ‘easy’ condition (LongISI-SameFr) showed robust generalization, those trained on the ‘hard’ condition (ShortISI-SameFr) did not show any significant transfer to the naive conditions. Interestingly, these transfer results are consistent with generalization patterns in other forms of perceptual learning; specifically, training on an easy condition produces more robust generalization (Ahissar & Hochstein 1997). Indeed, training on LongISI-SameFr seemed to be as effective as actual training on ShortISI-SameFr in improving ShortISI-SameFr performance. Specifically, the post-test ShortISI-SameFr threshold was on average lower in the LongISI-SameFr group than in the ShortISI-SameFr group.
The above results provide a new set of constraints that must be accounted for by any general model of temporal processing in the millisecond range. While the results are largely consistent with the predictions made by the SDN model, they also highlight the need to further refine this model and cannot exclude a number of additional models. Below we address the implications of the current results.
Some of the first psychophysical evidence that millisecond and second timing may rely on distinct mechanisms was provided by Rammsayer and colleagues who showed that discrimination of a 1s interval was impaired when subjects performed an additional cognitive task, but 50ms discrimination was not (Rammsayer & Lima 1991). Additionally, pharmacological manipulations of the dopaminergic system and benzodiazepines can differentially affect 50–100ms and 1s discrimination (Rammsayer 1997, 1999). The observation that the relationship between performance and the standard interval, as measured by the coefficient of variation, is higher for short intervals, has also been used to argue that there is a transition between timing mechanisms in the range of hundreds of milliseconds (Gibbon et al. 1997; Mauk & Buonomano 2004). Additionally, experiments which show that short intervals are more impaired in inter-modal timing tasks are consistent with the notion that millisecond processing may rely more on local channel-specific networks, while longer intervals may be more centralized and less influenced by channel manipulations (Rousseau et al. 1983). In a meta-analysis study, Lewis & Miall (2003) suggested that the differential patterns of blood-oxygen-level-dependent activity in short- and long-interval discrimination tasks are also consistent with distinct neural mechanisms. Recent results have further supported the presence of different mechanisms by showing that a distractor stimulus preceding the interval to be discriminated impairs 100ms, but not 1s discrimination (Karmarkar & Buonomano 2007).
While there is mounting evidence for distinct mechanisms for perceptual and cognitive timing, the boundary and degree of overlap between them are unclear. One of the goals of experiment 2 was to use the hypothesis that perceptual timing relies on local state-dependent computations, and thus is susceptible to interference by preceding stimuli, to examine where the transition between short- and long-interval mechanisms lies. The results suggest that the boundary may lie between 250 and 500ms. This range is consistent with the proposal that time-dependent neural properties such as short-term synaptic plasticity may underlie temporal processing, since many forms of short-term synaptic plasticity seem to take a few hundred milliseconds to ‘reset’, i.e. return to baseline PSP amplitude (Markram et al. 1998; Reyes & Sakmann 1999; Marder & Buonomano 2003).
An additional task that has been used to examine the boundary between different timing scales was one in which a variable ‘distractor’ is presented before the comparison interval. As in the task studied here, this distractor was predicted to alter the subsequent timing by placing the network in a different state during each trial. It was originally shown that a distractor with a 100 (50–150) ms mean significantly impaired discrimination in a 100ms task, but a proportional distractor did not impair a 1s discrimination task (Karmarkar & Buonomano 2007). A recent study replicated this finding for a 100ms standard interval, but reported that a 300 (225–375) ms distractor did not alter discrimination of a 300ms interval (Spencer et al. 2009); however, an additional study has reported a significant effect for a standard interval of 300ms using variable distractors with the same mean but with a range of 150–450ms (Rocca & Burr 2007).
Together, current studies suggest a boundary between perceptual and cognitive timing in the range of hundreds of milliseconds, and well below 1s. However, in addressing the existence of distinct mechanisms for millisecond and second timing, it is critical to note that an actual ‘hard’ boundary is unlikely; rather, a transition range with a significant degree of overlap is likely to be present. Furthermore, within this transition zone, it is likely that both mechanisms could operate in parallel and their respective contributions could depend on the nature of the task at hand.
Previous studies on perceptual learning of interval discrimination have revealed that learning is temporally specific; learning of one interval does not generalize to other intervals (Wright et al. 1997; Nagarajan et al. 1998; Karmarkar & Buonomano 2003). However, these studies and other studies have also demonstrated that interval learning can generalize to different auditory frequencies (Wright et al. 1997; Karmarkar & Buonomano 2003), visual locations (Westheimer 1999) and from one modality to another (Nagarajan et al. 1998; Meegan et al. 2000). The interval specificity could be interpreted as meaning that there are specialized timing circuits for each interval; however, this is also what is expected from the SDN model if one assumes that learning consists of an improved readout of the population code specific to each interval (Buonomano 2000). By contrast, the generalization to different spatial channels could be used to argue that there is a central timer (see below).
The perceptual learning results presented here further establish that interval discrimination undergoes learning, and demonstrate that the severe impairment produced by presenting the standard and comparison intervals in close temporal proximity can be overcome to the extent that performance becomes similar for both the short and long ISIs (figure 3b). Interestingly, however, training on the ShortISI-SameFr condition did not improve performance on the LongISI-SameFr condition. This result is unique in that it demonstrates a highly specific form of learning, i.e. there was no generalization to the same standard interval of 100ms when the ISI was 750ms; in other words, in this case learning was specific to both the ISI of 250ms and the frequency condition. By contrast, training on the LongISI-SameFr condition transferred to other conditions. Thus, LongISI-SameFr training did result in improvement in the ShortISI-SameFr condition; however, there was still a significant difference between LongISI-SameFr and ShortISI-SameFr after training (p=0.003), which was not the case after ShortISI-SameFr training. The fact that the ISI impairment was erased after training on ShortISI-SameFr but not after LongISI-SameFr training suggests that qualitatively different learning strategies are being engaged (see below).
As recently pointed out by Ivry & Schlerf (2008), a number of critical issues remain unaddressed in most models of temporal processing, including in the SDN model. One issue relates to the transfer of interval discrimination learning to different sensory channels (Wright et al. 1997). First, in interpreting these psychophysical results, it is critical to recall that the often implicit assumption that there exists a single mechanism or site of learning is unlikely to be true. Neurophysiological and psychophysical perceptual learning studies have indicated that there are probably a number of different forms and sites of plasticity operating in parallel (Gilbert et al. 2001; Ahissar & Hochstein 2004; Amitay et al. 2006). A perceptual task relies on a number of distinct cognitive mechanisms. In the case of interval discrimination, in addition to a means to measure time per se, it is also necessary to temporarily store the standard interval, compare the measured intervals and make a decision based on this comparison. While temporal perceptual learning may indeed rely primarily on improvement of the temporal component, there is little evidence that it could not be a result of improved memory of the standard interval or of an improved comparison of both intervals. Indeed, an improvement in either of these mechanisms could explain the interval specificity of learning as well as the spatial generalization. Additionally, it is important to emphasize that while the SDN model directly addresses the potential timing mechanisms, it does not make any strong predictions regarding the mechanisms of temporal perceptual learning.
Independent of the mechanisms of temporal perceptual learning, a critical question common to all local models of temporal processing, including the SDN model, remains: how are intervals on different channels compared? Specifically, if we assume that temporal computations occur in local cortical networks, how do we compare the interval at one frequency with that from another frequency or modality? The population response ‘signature’ to a 100ms interval in the auditory and somatosensory cortices should be entirely unrelated. This is a fundamental problem, but not unique to timing; it is a restatement of the problem of how the brain performs invariant pattern recognition (Olshausen et al. 1995; Buonomano & Merzenich 1999; DiCarlo & Cox 2007). How do we know that the letter ‘A’ in the left hemifield corresponds to the same symbol when it is flashed to the right hemifield? Or similarly, how do we know that the same word spoken in a low or high-pitched voice is the same word? In both cases, the sets of primary cortical neurons activated by the two stimuli are non-overlapping. Although the mechanisms underlying invariant pattern recognition remain unknown, a number of proposed solutions require experience-dependent mapping of different sensory representations to a common higher order representation. It seems inevitable that all local models of temporal processing will have to rely on some similar mapping, which would allow intervals on different sensory channels to be mapped to a shared representation. A related possibility is that generalization across different frequencies or modalities could occur despite the fact that timing per se is occurring in different networks, because the code could be the same. A simple example of an SDN model of this kind would be a ‘suppression code’. Specifically, it is well established that the neural and population response to the second of a pair of tones can be suppressed (forward masked) by the first, and the magnitude of this suppression is time dependent (Brosch & Schreiner 1997; Rennaker et al. 2007). Thus, the magnitude of the response could encode time. Such a code could be considered a type of energy model of timing, and could potentially be universally read out by downstream neurons.
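The logic of such a ‘suppression code’ can be sketched under strong simplifying assumptions. Here the response to the second tone is assumed to recover exponentially from suppression, and a downstream readout simply inverts that recovery curve to recover the interval. The exponential form, the parameter values and the function names are hypothetical and are not taken from the forward-masking studies cited above.

```python
import math

def suppressed_response(interval_ms, suppression=0.6, tau_ms=250.0):
    """Relative response to the second tone (1.0 = unsuppressed).

    Assumes suppression is maximal at interval 0 and recovers exponentially
    with time constant tau_ms. Illustrative assumption only.
    """
    return 1.0 - suppression * math.exp(-interval_ms / tau_ms)

def decode_interval(response, suppression=0.6, tau_ms=250.0):
    """Downstream readout: invert the recovery curve to estimate the interval."""
    if not (1.0 - suppression) < response < 1.0:
        raise ValueError("response lies outside the invertible range")
    return -tau_ms * math.log((1.0 - response) / suppression)

if __name__ == "__main__":
    for interval in (50, 100, 200, 400):
        r = suppressed_response(interval)
        print(interval, round(r, 3), round(decode_interval(r), 1))
```

In this toy version, any two channels that share the same recovery curve could be read out by the same decoder, which is the sense in which a magnitude code could generalize across channels. The readout also loses resolution as the response approaches baseline, so it is only informative for intervals that are short relative to the recovery time constant.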
The strong prediction of the SDN model is that there is no linear metric of time. This means that the population code for a 100ms interval is not inherently related in any linear fashion to the population code for a 200ms interval; by contrast, in a clock model, if 100 ticks correspond to 100ms it can be immediately established that 200 ticks correspond to 200ms. However, it is important to note that SDN does not imply that the appropriate mapping of the network response to a linear metric cannot be learned through experience. Indeed, we interpret the fact that subjects improved in the ShortISI-SameFr condition as evidence of this (recall that in this condition the ISI varied between approx. 190 and 310ms). Clearly, we learn to identify the same intervals in a multitude of different temporal contexts. For example, anyone fluent in Morse code must learn to identify whether the duration of a tone was short or long in the context of an extremely complex and rapid sequence of previous tones. Morse code and language are, of course, complex tasks, requiring years to learn, and some of this learning may be devoted to establishing that the same stimulus can produce different neural population codes depending on the temporal context. The interference between successive stimuli would be lessened by decreasing the presentation rate of the stimuli, which may be related to why the initial stages of Morse code and language learning are facilitated by slow rates.
It is clear the SDN and other models of temporal processing are not sufficient to explain all facets of temporal processing, particularly regarding the mechanisms underlying temporal perceptual learning. As we develop more elaborate models and theories of temporal processing, it will be important to distinguish between task components that reflect true temporal processing and those that correspond to more general cognitive components shared by non-temporal perceptual tasks, such as the buffering and comparison of stimulus features, and invariant forms of pattern recognition. Additionally, while our current focus remains on simple temporal tasks, such as interval and duration tasks, it is ultimately necessary that the same models account for complex forms of temporal processing, such as temporal sequences or Morse code. The SDN has this potential, but predicts that previous stimuli can interfere with the encoding of subsequent temporal features. This is both an inherent strength and weakness of the model. A strength because it naturally encodes complex temporal patterns as well as simple intervals (Buonomano 2000); a weakness because by encoding every object in the context of the previous, it becomes challenging to identify specific temporal objects embedded in a stream of stimuli (Knüsel et al. 2004).
This research was supported by the NIMH.
One contribution of 14 to a Theme Issue ‘The experience of time: neural mechanisms and the interplay of emotion, cognition and embodiment’.