Based on behavioral studies, several relatively distinct perceptual and cognitive functions have been defined in cognitive psychology, such as sensory memory, short-term memory, and selective attention. Here, we review evidence suggesting that some of these functions may be supported by shared underlying neuronal mechanisms. Specifically, we present, based on an integrative review of the literature, a hypothetical model wherein short-term plasticity, in the form of transient center-excitatory and surround-inhibitory modulations, constitutes a generic processing principle that supports sensory memory, short-term memory, involuntary attention, selective attention, and perceptual learning. In our model, the size and complexity of receptive fields/level of abstraction of neural representations, as well as the length of temporal receptive windows, increase as one steps up the cortical hierarchy. Consequently, the type of input (bottom-up vs. top-down) and the level of cortical hierarchy that the inputs target determine whether short-term plasticity supports purely sensory vs. semantic short-term memory or attentional functions. Furthermore, we suggest that rather than discrete memory systems, there are continuums of memory representations from short-lived sensory ones to more abstract longer-duration representations, such as those tapped by behavioral studies of short-term memory.
Making sense of our everyday environment is a complex task. The brain must map the stream of sensory stimuli to sensory and higher-order object and event representations. With the help of such representations, the brain can prioritize processing of relevant stimuli, predict what will happen next, and quickly react to unexpected events. The brain is also capable of memorizing and learning to categorize novel perceptual features and objects. In the field of cognitive psychology, these perceptual and cognitive functions have been studied as relatively distinct research questions. For example, in classical memory models, the short-lived (~seconds) highly accurate representation of elementary stimulus features has been termed sensory memory, whereas longer-duration (tens of seconds) and semantically higher-order memory representations are referred to as short-term memory. The ability to attend certain inputs has been called selective attention, and the capability of reacting to novel events outside of the focus of attention has been termed involuntary attention. Finally, the ability to learn new categories for sensory events, such as when learning to discriminate phonemes of a foreign language, has been termed perceptual learning. However, while behaviorally manifested as relatively distinct perceptual/cognitive functions, it is possible that they are supported by common underlying neural processes, instead of specialized modules. In the following, we examine the hypothesis that these cognitive and perceptual functions might be supported by similar neuronal mechanisms.
Based on the hypothesis that common neural mechanisms could support sensory memory, involuntary and selective attention, as well as perceptual learning in the auditory system, we propose a model shown in Figure 1. In this model, we hypothesize that the type of input (bottom-up vs. top-down) and the hierarchical level which the inputs target (assuming continuums of larger and more complex receptive fields, as well as progressively longer temporal receptive windows as one steps up in cortical hierarchy) determine whether the inputs give rise to representations matching the classical characteristics of sensory vs. short-term memory, or whether they support selective attention. This work builds on a model that we have proposed earlier in which auditory stimuli, auditory selective attention, and cross-modal inputs each cause transient center-excitatory and surround-inhibitory modulations (i.e., short-term plasticity) in topographically organized populations within auditory cortex, thus potentially giving rise to neural representations that underlie auditory sensory memory. In the model we further suggested that such short-term plasticity underlies visual-auditory perceptual interactions, filters relevant sounds during rapidly changing environmental and task demands, and forms the basis for longer-lasting perceptual learning (Jaaskelainen et al., 2007). For similar, earlier, theoretical formulations, see (Grossberg, 1980). In the following, we will first review findings that show how bottom-up input suppresses/adapts neurons in sensory cortices in a feature-specific manner, and how such short-term plasticity can give rise to sensory memory representations.
Neurons responding to specific stimulus features within the primary and secondary sensory cortices of each modality (deCharms et al., 1998; Hind, 1953; Hubel et al., 1968; Werner et al., 1968) are partially suppressed for brief durations (~seconds) after each stimulus. This robust phenomenon seems to be a general processing principle as it has been shown in the auditory (Calford et al., 1995; Davis et al., 1966; Fruhstorfer et al., 1970; Lu et al., 1992a; Ulanovsky et al., 2004), visual (Ohzawa et al., 1982; Desimone, 1996; Uusitalo et al., 1996), and somatosensory (Chung et al., 2002; Hellweg et al., 1977) systems. Here, we call this phenomenon stimulus-specific adaptation (SSA), after Ulanovsky and colleagues (Ulanovsky et al., 2003). SSA has also been referred to in the literature as post-stimulus inhibition, response saturation, paired-stimulus inhibition, paired-pulse inhibition, repetition suppression, adaptation, habituation, forward masking, and refractoriness. By SSA we mean that the response to a stimulus resembling the preceding stimulation is reduced in amplitude and delayed in latency compared with the response to the same stimulus presented without preceding stimuli. When the preceding stimuli are different enough to activate non-overlapping stimulus-specific neuronal populations, SSA does not occur (see Jaaskelainen et al., 2007; May et al., 1999).
The neural basis of SSA has been a topic of considerable research and there are multiple candidate neurophysiological mechanisms. Some studies suggest that surround inhibition is the major mechanism contributing to SSA via inhibitory interneurons (Krnjevic et al., 1966). There are similar observations in the somatosensory system, as the regular spiking units in the rat barrel cortex were inhibited more prominently when the whisker deflection angles were non-preferred vs. preferred (Brumberg et al., 1996). Further, in the visual system, SSA seems to utilize a surround inhibition mechanism (Shapley et al., 2007), an interpretation further supported by ERP findings (see below). Contradictory evidence, however, has been obtained in electrophysiological studies that, using pure tones, have noted the strongest SSA for stimuli presented at a given neuron’s best frequency (Bartlett et al., 2005) instead of a surround-inhibition effect. Yet, auditory cortex neurons are driven more strongly by amplitude- or frequency-modulated stimuli than by pure tones (Wang et al., 2005), tentatively suggesting that studies inspecting SSA effects with pure tones may have failed to optimally match the receptive fields. If this is the case, the effects of SSA might have been sampled at the inhibitory surround only, which would result in observing only monotonic SSA effects. The resilience of neural firing to optimal stimulus features can also be seen in studies where spectrotemporal receptive fields are obtained by presenting continuous white noise (or temporally orthogonal ripple stimuli) and cross-correlating the stimulus spectrogram with the spiking behavior of the neuron (deCharms et al., 1998; Fritz et al., 2003); that is, the neuron fires throughout the continuous stimulation whenever the noise/stimulation optimally matches the receptive field center, but becomes adapted for nonoptimal stimuli.
Visual studies have suggested that modulation/suppression of thalamocortical inputs may partially explain the surround inhibition observed in cortex (Chung et al., 2002; Durand et al., 2002; Durand et al., 2007; Ozeki et al., 2004). As a note of caution, however, whether surround inhibition constitutes the neural basis of SSA (as proposed in our model) remains a topic of speculation that should be empirically tested in future studies.
The electro- and magnetoencephalographic (EEG/MEG) N1 response, which is a correlate of stimulus detection (Parasuraman et al., 1980; Pins et al., 2003), is robustly suppressed when adaptor stimuli are introduced in auditory (Hari et al., 1982; Lu et al., 1992a; Lu et al., 1992b), visual (Kenemens et al., 2003; Manahilov et al., 1986), and somatosensory (Kekoni et al., 1997) studies. In light of the neurophysiological findings reviewed above on surround inhibition, N1 repetition suppression might be due to most of the local neuronal population being adapted by the preceding stimulus, with only neurons most optimally tuned to the eliciting stimulus left responsive. This, admittedly speculative, hypothesis is tentatively supported by findings showing that increases in auditory N1 response latencies due to preceding auditory stimulation follow a bi-phasic function: when the preceding stimuli were either identical to or widely different from the test stimulus, response latencies were slightly increased, whereas with small and intermediate differences in sound frequency the response latencies were robustly increased (May et al., 1999). Lateral inhibition effects on the N1 response have also been described in auditory MEG studies (Pantev et al., 2004). Thus, SSA might serve the purpose of preserving the firing of neurons that accurately represent the features of the incoming stimulus while dampening the activity of neurons that less well match the stimulus features, giving rise to a spatially organized memory representation (see below). Tentatively, it is possible that observations of “sparse” representations of sensory events in auditory (Hromádka et al., 2008) and visual (Vinje et al., 2000) cortices could be explained by surround inhibition. Taken together, these findings suggest that SSA tunes topographically organized neural populations to form transient representations of sensory stimuli. Importantly, such transient representations might support sensory memory.
In the following, we review empirical findings supporting this hypothesis.
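The hypothesized center-sparing, surround-suppressing pattern of SSA described above can be illustrated with a toy calculation. The following is only a minimal sketch under our own assumptions, not a model taken from any of the cited studies: the Gaussian shapes, the function names, and all parameter values (widths in octaves, suppression depths) are hypothetical and chosen purely for illustration. It shows how a broad suppressive profile combined with narrow sparing at the adaptor frequency yields little adaptation for identical and widely different probes but strong adaptation for intermediate frequency differences.

```python
import math

def gaussian(x, width):
    """Unnormalized Gaussian with peak 1 at x = 0."""
    return math.exp(-0.5 * (x / width) ** 2)

def adaptation_profile(delta_f, center_width=0.1, surround_width=0.6,
                       center_sparing=0.8, surround_depth=0.9):
    """Fraction of suppression (0..1) for a unit tuned delta_f octaves
    away from the adaptor. All parameters are hypothetical.

    A broad Gaussian models the suppressive surround; a narrow Gaussian
    models the units at the adaptor frequency that are largely spared.
    """
    surround = surround_depth * gaussian(delta_f, surround_width)
    spared = center_sparing * surround_depth * gaussian(delta_f, center_width)
    return surround - spared

def probe_response(delta_f):
    """Relative response (1 = unadapted) to a probe delta_f octaves away."""
    return 1.0 - adaptation_profile(delta_f)

if __name__ == "__main__":
    for delta_f in (0.0, 0.1, 0.3, 0.6, 1.0, 2.0):
        print(f"delta_f = {delta_f:.1f} oct -> response {probe_response(delta_f):.3f}")
```

With these illustrative values the probe response is non-monotonic in the adaptor-probe frequency difference, which is the qualitative pattern the surround-inhibition account predicts; whether real adaptation profiles have this shape is exactly what remains to be tested empirically.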
Sensory memory is an accurate memory representation of the auditory, visual, and somatosensory stimuli that lasts for a brief duration. Sensory memory representations also form a sensory-level prediction of the immediate future (Raij et al., 1997), thus allowing determination of stimulus novelty (Jaaskelainen et al., 2004). Based on behavioral studies, the auditory sensory memory (also called “echoic memory”) has been suggested to last for several seconds (Erikson and Johnson, 1964; Sams et al., 1993), whereas shorter duration has been suggested for the visual sensory memory (~hundreds of milliseconds) (Deutsch et al., 1963). There is, however, behavioral evidence suggesting that both auditory and visual sensory memories are composed of a highly transient (~few hundreds of milliseconds) and accurate representation of elementary physical stimulus features and of a longer-duration (from several seconds up to a few tens of seconds) storage that contains already somewhat more abstract features (Cowan, 1995).
In human MEG studies, the adaptation time constant of the auditory cortex N1 response has been found to closely correlate with the duration of behaviorally measured auditory sensory memory (Lu et al., 1992a). Human MEG studies have also shown that primary sensory cortices exhibit shorter adaptation time constants than secondary sensory cortices in both the auditory (Lu et al., 1992b) and visual (Uusitalo et al., 1996) modalities. Further, single-neuron recordings in the cat auditory cortex have shown that within the primary auditory cortex (A1) the duration of SSA is not uniform, but rather there is a continuum of adaptation time constants (Ulanovsky et al., 2004). It remains to be tested whether the longer average adaptation time constants, observed in non-invasive neuronal population level MEG recordings in higher- vs. lower-order cortices (Lu et al., 1992b; Uusitalo et al., 1996), reflect shifts of the distribution of adaptation time constants of individual cells towards longer ones, as we here hypothesize in our model. Tentatively, it is possible that the range of adaptation time constants within (Ulanovsky et al., 2004) and across (Lu et al., 1992b; Uusitalo et al., 1996) levels of cortical hierarchy tuned to various stimulus features (Ahveninen et al., 2006; Bendor et al., 2005; Bilecen et al., 2002; Hall et al., 2006; Obleser et al., 2006; Pulvermuller et al., 2006; Talavage et al., 2004) could form the basis for sensory memory representations (see Jaaskelainen et al., 2007; Nelken et al., 2003; for a schematic illustration, see Figure 2). If one speculates further, modulation of spontaneous oscillatory activity of the local neuronal population (Steinmetz et al., 2000; Whittington et al., 2000) by SSA might, in turn, be the mechanism that represents (Palva et al., 2005) and conveys the contents of sensory memory to other brain areas via cortico-cortical and corticofugal (Winer, 2006) connections.
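To make the notion of an adaptation time constant concrete, the sketch below assumes that response amplitude recovers from adaptation as a saturating exponential of the interstimulus interval (ISI). This functional form is an assumption made here for illustration, and the function names and the time constants of 1.5 s and 4.0 s are hypothetical placeholders, not values reported in the studies cited above. The second function shows one simple way a time constant could be estimated from amplitudes measured at several ISIs, assuming the asymptotic amplitude is known.

```python
import math

def recovered_amplitude(isi_s, tau_s, a_max=1.0, t0_s=0.0):
    """Assumed recovery curve: A(ISI) = a_max * (1 - exp(-(ISI - t0) / tau)).

    isi_s : interstimulus interval in seconds
    tau_s : adaptation (recovery) time constant in seconds
    """
    if isi_s <= t0_s:
        return 0.0
    return a_max * (1.0 - math.exp(-(isi_s - t0_s) / tau_s))

def estimate_tau(isis_s, amplitudes, a_max=1.0, t0_s=0.0):
    """Estimate tau by linearizing the assumed curve:
    -ln(1 - A / a_max) = (ISI - t0) / tau,
    then fitting a least-squares line through the origin.
    Requires every amplitude to be strictly below a_max.
    """
    sum_xy = 0.0
    sum_xx = 0.0
    for isi, amp in zip(isis_s, amplitudes):
        x = isi - t0_s
        y = -math.log(1.0 - amp / a_max)
        sum_xy += x * y
        sum_xx += x * x
    return sum_xx / sum_xy  # slope is 1 / tau, so tau = sum_xx / sum_xy

if __name__ == "__main__":
    isis = [0.5, 1.0, 2.0, 4.0, 8.0]
    # Hypothetical time constants for a lower- and a higher-order area.
    for label, tau in (("lower-order", 1.5), ("higher-order", 4.0)):
        amps = [recovered_amplitude(isi, tau) for isi in isis]
        print(label, [round(a, 3) for a in amps], "tau ~", round(estimate_tau(isis, amps), 3))
```

Under this assumed form, a longer time constant means that the response at any given ISI is still more strongly suppressed, which is one way of reading the proposal that adaptation in higher-order areas could carry longer-lasting traces.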
Psychophysical studies have described longer sensory memory durations (i.e., larger recency effect in an immediate serial recall task) for complex (speech) than for simple sounds (Surprenant et al., 1993), which is consistent with the findings of shorter adaptation time constants in primary vs. higher-order sensory areas (Lu et al., 1992b; Uusitalo et al., 1996) and behavioral evidence suggesting at least two sensory memory stores of different lengths (Cowan, 1995). Further evidence for the role of SSA in supporting sensory memory comes from studies suggesting that SSA forms the neurophysiological basis of the so-called mismatch negativity response, which is a correlate of auditory sensory memory (Jaaskelainen et al., 2004; May et al., 1999; Ulanovsky et al., 2003; for an ongoing debate on this issue, see Garrido et al., 2009; May et al., 2010; Naatanen et al., 2005). Similar evoked responses have been documented also in visual (Kenemens et al., 2003) and somatosensory (Kekoni et al., 1997) studies in human subjects, suggesting that adaptation could support sensory memory in each sensory modality.
It has been suggested that some memory representations are supported by cells exhibiting sustained firing, such as those found in monkey prefrontal cortex during the delay period in working memory tasks (Fuster et al., 1971; Rainer et al., 1998) and, recently, cells that have been observed to store a continuum of reward memory traces in prefrontal, cingulate and parietal cortical areas (Bernacchia et al., 2011). This could be interpreted to contradict the idea that cortical adaptation (a reduction in activity) could support sensory and short-term memory. However, recent models support the idea that neuronal firing per se is not sufficient to enable short-term memory representations in the prefrontal cortex, that rapid changes in synaptic weights (i.e., short-term plasticity) may give rise to distributed neuronal representations (Mongillo et al., 2008), and that oscillatory mechanisms might help activate cell assemblies that underlie different short-term memory representations (Colliaux et al., 2009). Furthermore, if indeed lateral inhibition is the primary neurophysiological mechanism underlying SSA (Jaaskelainen et al., 2007; May et al., 1999; Okamoto et al., 2004; Okamoto et al., 2005), those neurons that are most accurately tuned to the stimulus are spared by SSA, thus giving rise to a pattern of neurons that are left spontaneously active in the neural population, which might constitute the memory representation. Observations of sparse distributed representations of sounds in the auditory cortex (Hromádka et al., 2008), and recent findings of distributed patterns of suppressed auditory cortex hemodynamic activity that correlate with sensory memory (Linke et al., 2011), also support this hypothesis.
Further, even if the neurons optimally tuned to the stimulus are fully adapted as suggested by some studies (see, for instance Bartlett et al., 2005), it can be argued that this changes the activity of the neural population, thus eliciting “negative image” memory representations. For instance, recent findings in the ferret auditory cortex suggest that dynamic shaping of neuronal receptive fields can be caused by synaptic depression (David et al., 2009), which is a mechanism with a behaviorally valid time scale (i.e., up to several seconds or a few tens of seconds) and recent models have also stressed the importance of synaptic depression as a mechanism that allows neuronal gain control (Rothman et al., 2009). Thus, while the specific neurophysiological mechanisms underlying SSA remain unclear, SSA appears to have the right properties to support short-lived sensory memory representations. As a note of caution, however, empirical evidence for the relationship between adaptation time constants and memory representations is far from being unequivocal, and thus this key hypothesis of our model should be tested in future empirical work, for instance, by rigorously comparing adaptation timescales and durations of memory representations across the cortical hierarchy.
Recent functional magnetic resonance imaging (fMRI) findings have further suggested that there are increasingly long temporal receptive windows in hierarchically higher order areas (Hasson et al., 2008; Kauppi et al., 2010; Lerner et al., 2011; Raij, 2008), which is consistent with feed-forward processing with aggregate adaptation at each step and might be related to the behavioral findings showing longer-duration memory for words as opposed to simple auditory stimuli (Surprenant et al., 1993). These findings are also consistent with classification of memory systems into short-lived sensory memory retaining elementary stimulus features vs. longer-duration limited-capacity short-term memory for semantic memory representations. However, given that there appears to be a gradual increase in adaptation timescales, temporal receptive window length, and receptive field complexity, as one steps up the cortical hierarchy, it might be more appropriate to view these memory systems as a continuum from sensory to short-term memory, rather than separate modules (see Figure 1). This model is in line with arguments for division of sensory memory into subclasses of shorter and longer durations (Cowan, 1995), but goes even further by proposing that there are continuums of sensory memory traces of different durations and levels of abstraction. There is, however, abundant empirical evidence in favor of the modular memory systems (see, for instance, Baddeley, 1990) and thus what we propose should be viewed as a hypothesis that should be empirically tested in future studies by explicitly investigating whether there are memory continuums between the discrete classes of memory traces (e.g., whether there are memory traces of intermediate length and level of abstraction between those traditionally associated with sensory and short-term memory stores).
One of the perceptual functions that an accurate sensory memory representation enables is discrimination between sensory stimuli (Moore et al., 1999). In keeping with the adaptation hypothesis of sensory memory, rapid adaptation in the visual system increases the accuracy of neural population coding that correlates with behavioral performance (Gutnisky et al., 2008). Specifically, adaptation de-correlates activity of neurons responding to orientations similar and highly dissimilar from the orientation of the test stimulus, while neurons responding to intermediately different orientations were less de-correlated (Gutnisky et al., 2008). In the auditory modality, it is well known that hearing a “sample” of a target sound makes it easier for the auditory system to focus attention on subsequent presentations of the target. In other words, formation of transient sensory memory representations enables selective attention to certain stimulus features and perceptual objects, as also suggested by the presence of interaction between sound presentation rate and attention on auditory cortex activation strength (Rinne et al., 2005). Indeed, what makes the findings of SSA tuning the auditory cortex feature specific neuronal populations especially interesting is that similar receptive field tuning mechanisms have been suggested to underlie selective attention (Fritz et al., 2007). We will review these findings next.
In auditory studies in awake, behaving ferrets, spectrotemporal receptive fields (STRF) of A1 neurons demonstrated rapid plasticity, shifting to encompass the frequency of the target tone when the attended sound frequency was sufficiently close to the excitatory center of the neuron’s receptive field (Fritz et al., 2003) (see Figure 3). This effect seemed to be caused by top-down center excitation and concomitant surround suppression centered at the target sound frequency. When the animals attended to temporal sound features or to multiple-tone sound targets, STRF changes revealed task-specific “signature patterns” of top-down facilitation and suppression (Fritz et al., 2005; Fritz et al., 2007), with some neurons differentially changing their STRF depending on task requirements. This may suggest that the STRF changes increase figure/ground separation by filtering out the background while enhancing the target (Fritz et al., 2007). Further, the receptive field changes occurred relatively quickly (within the 2.5 min required for STRF measurement) and were correlated with improved behavioral performance (Fritz et al., 2003; Fritz et al., 2007). Interestingly, recent results based on simultaneous recordings of prefrontal and auditory cortex activity suggest that the task-related receptive field changes in A1 are driven by behaviorally gated prefrontal cortical neuronal activity (Fritz et al., 2010).
Supporting the findings of task-related enhancement of response selectivity in animal models, task-dependent modulation of sensory processing has been reported to be specific to particular levels in brain hierarchy, as opposed to global cortical response enhancement. In our combined fMRI/MEG study, the selectivity of human anterior secondary auditory cortex (“what” pathway) to vowels was enhanced when subjects directed their attention to phonetic features (Ahveninen et al., 2006). Conversely, selectivity of the posterior auditory cortex “where” processing pathway to sound loci was enhanced when subjects attended to locations in space (Ahveninen et al., 2006). The latter finding is supported by recent fMRI findings (Krumbholz et al., 2007). Changes in tonotopic maps (Ozaki et al., 2004; Paltoglou et al., 2009) and increased auditory cortex selectivity for sound frequency (Ahveninen et al., 2011; Kauramäki et al., 2007; Okamoto et al., 2007) have been further observed when subjects selectively attend to pitch (see Figure 4), with the enhanced sound frequency selectivity correlating with improvements in pitch discrimination task performance (Kauramäki et al., 2007). In a recent study, we observed rapid (~seconds) retuning of MEG responses, estimated to originate from non-primary auditory cortical areas, upon directing selective attention in a dichotic listening task, and this effect correlated with behavioral sound discrimination accuracy (Ahveninen et al., 2011). In human intracranial recordings, similarly enhanced responses to attended stimuli, and suppressed responses to unattended stimuli, have been noted, with the enhancement occurring at a shorter latency than the suppression (Bidet-Caulet et al., 2007). Thus, focusing attention on a given acoustic feature seems to enhance neuronal selectivity for that feature in the part of the sensory cortex that is specialized for processing it.
Recent findings in the visual and somatosensory systems suggest that feature-specific enhancement of neuronal selectivity could be a more general mechanism underlying selective attention, rather than something that is specific to the auditory modality. Feature-specific visual attention has been reported to enhance responses to preferred stimuli and to decrease responses to non-preferred stimuli (Kostandov et al., 2006; Martinez-Trujillo et al., 2004; Okazaki et al., 2008). Further, visual area MT was differentially modulated during speed vs. contrast discrimination tasks in human subjects (Huk et al., 2000), V1 differentially activated when subjects directed their attention to points in space vs. to task structure (Jack et al., 2006), and attention-driven suppression of visual responses to stimuli surrounding the attended location was recently observed in both striate and extrastriate cortex (Slotnick et al., 2003), in addition to the well-documented facilitation at the attended location (Munneke et al., 2008). A recent study suggested that rapid task-modulation of auditory cortex receptive fields similar to what has been described in Figure 3 may also occur in visual cortex. Specifically, in macaque visual area MT, spatial attention led to shifting of receptive fields towards the attended location as well as in shrinking of the receptive field sizes in the attended locations (Womelsdorf et al., 2006) (see Figure 5). In a human somatosensory MEG study, selective attention enhanced early M50 responses to stimulation of task-relevant fingers while suppressing responses to stimulation of task-irrelevant fingers (Iguchi et al., 2005). Further, attention dynamically altered spatial processing of painful stimuli (Quevedo et al., 2007). Thus, selectively attending to certain features of auditory, visual, and somatosensory stimulation tends to selectively enhance responses to those features.
The findings reviewed above are seemingly in contrast with numerous previous reports that have suggested gain changes as opposed to enhanced selectivity or receptive field modulation with visual selective attention (McAdams et al., 1999; Treue et al., 1999). Also, the vast majority of human neuroimaging studies have reported response enhancements to attended stimuli that are consistent with a simple gain enhancement model of selective attention: auditory cortex response enhancement to attended vs. non-attended stimuli has been consistently shown in human EEG (Hillyard et al., 1973; Karns et al., 2008), MEG (Poghosyan et al., 2008; Rif et al., 1991; Woldorff et al., 1993) and fMRI studies (Jancke et al., 1999; Petkov et al., 2004), and auditory cortex selective attention effects are enhanced with faster stimulus presentation rates (Rinne et al., 2005). Response enhancement with selective attention has also been observed in the visual (Mangun et al., 1988; Moran et al., 1985; Tootell et al., 1998) and somatosensory cortical areas (Chapman et al., 2005; Hsiao et al., 1993; Hyvarinen et al., 1980; Karns et al., 2008; Steinmetz et al., 2000). These effects appear to occur already within the primary sensory cortices (Hyvarinen et al., 1980; Karns et al., 2008; Poghosyan et al., 2008; Rif et al., 1991; Woldorff et al., 1993). However, common to auditory (Jancke et al., 1999; Petkov et al., 2004), visual (Kastner et al., 2000), and somatosensory (Chapman et al., 2005; Hsiao et al., 1993; Hyvarinen et al., 1980) modalities, attentional response enhancements have been reported to be significantly larger in secondary than primary sensory cortical areas.
Recent modeling work suggests a viable explanation for the seemingly conflicting findings on whether selectivity enhancement or an overall increase in gain underlies selective attention. Specifically, selective attention might work by response normalization (Lee et al., 2009). According to this hypothesis, multiplicative gain changes predict response modulation when there is a single stimulus in the receptive field of the neuron. Thus, it is possible that under conditions where there are two or more competing stimuli within the receptive field (as often is the case in auditory studies where subjects discriminate target sounds from a train of “standard” stimuli, or when temporally orthogonal ripple combination (TORC) stimuli are played in the background to estimate the STRFs), the normalization results in more complex non-linear modulations, such as shrinking of the receptive field (Lee et al., 2009). Attentional task (spatial vs. feature-based) also affects the normalization mechanism (Reynolds et al., 2009). Further, it has been suggested that synaptic depression, which is one of the candidate neurophysiological mechanisms for task-specific receptive field changes in A1, could underlie such normalization (Abbott et al., 1997), which is similar to our previously proposed conceptual model in which the pattern of post-stimulus inhibition helps refine sensory cortical representations during selective attention (Jaaskelainen et al., 2007).
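The way a single normalization mechanism can produce either gain-like or competition-like attention effects can be sketched with a deliberately simplified divisive-normalization formula. The sketch below is our own illustration in the general spirit of normalization accounts; it does not reproduce the specific equations of Lee et al. (2009) or Reynolds et al. (2009), and the function name, the semi-saturation constant `sigma`, and all numerical values are hypothetical. Attention is modeled as a gain applied to each stimulus before normalization.

```python
def normalized_response(selectivities, contrasts, attention_gains, sigma=0.1):
    """Simplified divisive normalization (illustrative assumption only).

    selectivities   : how strongly the neuron prefers each stimulus (0..1)
    contrasts       : strength of each stimulus in the receptive field
    attention_gains : attentional gain on each stimulus (1.0 = unattended)
    sigma           : semi-saturation constant (hypothetical value)
    """
    excitatory = sum(g * c * s for g, c, s in
                     zip(attention_gains, contrasts, selectivities))
    suppressive = sum(g * c for g, c in zip(attention_gains, contrasts))
    return excitatory / (suppressive + sigma)

if __name__ == "__main__":
    # One weak preferred stimulus: attention scales the response up.
    print("single, unattended:", normalized_response([1.0], [0.2], [1.0]))
    print("single, attended:  ", normalized_response([1.0], [0.2], [2.0]))

    # Preferred and non-preferred stimulus together in the receptive field.
    sel, con = [1.0, 0.1], [0.5, 0.5]
    print("pair, neutral:            ", normalized_response(sel, con, [1.0, 1.0]))
    print("pair, attend preferred:   ", normalized_response(sel, con, [2.0, 1.0]))
    print("pair, attend nonpreferred:", normalized_response(sel, con, [1.0, 2.0]))
```

In this toy version, attention to a single stimulus simply increases the response (approximately multiplicatively when the stimulus drive is small relative to `sigma`), whereas with two competing stimuli the response moves toward what the attended stimulus alone would evoke, rising when the preferred stimulus is attended and falling when the non-preferred one is.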
In human studies that have reported attentional enhancement of primary auditory cortex responses, selective attention effects may occur as early as 20–50 ms from sound onset (Poghosyan et al., 2008; Rif et al., 1991; Woldorff et al., 1993) and 55–90 ms from picture onset (Poghosyan et al., 2008). The selective attention effects on secondary auditory cortex “what” vs. “where” processing streams begin at ~100 ms from stimulus onset, with attentional enhancements related to “where” processing occurring a few tens of milliseconds earlier than those associated with “what” processing (Ahveninen et al., 2006). Similarly, visual attention to spatial location enhances responses at earlier latencies whereas attention to color causes enhancement of activity at later latencies (Anllo-Vento et al., 1996) (although see also (Zhang et al., 2009)). These findings are in keeping with the hypothesis that the visual dorsal “where” pathway is responsible for conducting a fast and coarse stimulus analysis that then influences the longer-latency, more detailed, processing in the ventral “what” stream via top-down inputs from frontal cortical object representations (Bar et al., 2006); similar evidence of fast but coarse stimulus processing in the auditory “where” pathway has also been presented (Jaaskelainen et al., 2004). Furthermore, the presence of attentional modulation already in the primary sensory cortices at early latencies supports “early selection” or “gating” (Woldorff et al., 1993) over “late selection” (Hansen et al., 1980; Naatanen, 1992) models of selective attention.
Top-down inputs can extend from sensory cortices to subcortical sensory nuclei of the thalamus and colliculi. These modulations have been studied less extensively than the cortical modulations, but they allow task-related flexibility of sensory processing at very early stages. In anatomical studies, it has been noted that the number of descending fibers can be many times greater than the number of afferent fibers, even though the descending inputs do not drive the target neurons as strongly as the ascending inputs (Winer, 2006), and the descending pathways traverse fewer synapses than the ascending connections. In the auditory system, corticofugal connections from A1 to the ventral medial geniculate body (MGBv) follow tonotopic organization, and auditory cortex has been observed to exert center excitation and surround inhibition on MGB and inferior colliculi (IC) (Suga et al., 2000). In the human visual system, hemodynamic responses in lateral geniculate nucleus (LGN) are enhanced to attended and suppressed to ignored stimuli, with these effects being larger in magnitude than those observed in the striate cortex (McAlonan et al., 2008; O’Connor et al., 2002). These findings suggest that there are significant top-down modulations from cortex to subcortical structures similar to the center-excitation surround-inhibition top-down modulations in the sensory cortical areas.
The thalamic reticular nucleus (TRN) might play a pivotal role in top-down modulation of the sensory nuclei of the thalamus. In general, central and caudal parts of TRN receive connections from the sensory (and motor) cortices, while most of the frontal cortical areas connect to anterior TRN. However, certain monkey frontal lobe areas project widely to TRN including its caudal parts that receive projections from the sensory and polymodal cortices (Zikopoulos et al., 2006). Connections from A1 and A2 via TRN have been observed to inhibit neural activity in different tonotopically organized parts of the MGBv than direct A1/A2-to-MGBv excitatory inputs (Kimura et al., 2007), making it possible to “gate” auditory processing at the level of MGBv. For similar findings in TRN regarding the barrel and visual cortex, see (Temereanca et al., 2004) and (McAlonan et al., 2008), respectively. Further, cross-modal TRN projections have been observed, supporting the view that the sensory systems are highly intertwined even at a very low level of processing (Kimura et al., 2007; Raij et al., 2010). In summary, these findings suggest that there is robust task-related short-term plasticity already in subcortical sensory nuclei that may filter stimuli before processing in the sensory cortices.
The fact that neuronal mechanisms of selective attention are similar across different sensory systems tentatively suggests that short-term plasticity could be a general organizing principle that helps filter relevant stimuli from noise. Furthermore, it seems that top-down inputs tune feature-specific sensory cortical neural populations similarly to bottom-up inputs. Sensory cortices could then be understood as interaction surfaces between the sensory environment and internal goals of the organism, potentially explaining why it is easier to direct and maintain attentional focus right after the attended stimulus has occurred (i.e., when bottom-up and top-down inputs match). On the other hand, sometimes when listening intently, one can hear sounds that are not there. Tentatively, such illusory percepts could be partially explained by mismatching top-down and bottom-up inputs in auditory cortex (Jaaskelainen et al., 2007).
Besides helping to filter stimuli during selective attention, sensory cortical short-term plasticity can underlie more lasting changes in receptive fields of neurons. In animal studies it has been observed that some of the receptive field changes linger up to several hours (Fritz et al., 2007). Short-term plasticity could thus support perceptual learning, which refers to learning of stimulus discrimination, such as when learning to differentiate between phonemes of a foreign language or between sounds made by different musical instruments. It has been noted that perceptual learning is accompanied by lasting changes in receptive field properties of sensory cortical neurons, findings that will be reviewed below.
Several studies describe training-dependent plastic modulations that resemble short-term plasticity during selective attention. For instance, in rats, robust task-specific topographic map changes were observed when identical sets of auditory stimuli were presented with only one feature, either frequency or intensity, being task relevant (Polley et al., 2006). These plasticity effects correlated with the degree of perceptual learning in the tasks (Polley et al., 2006). In classical conditioning and sensitization studies, receptive fields of auditory cortical neurons have been consistently shown to exhibit long-term modulations (Weinberger, 2007) that are similar to the transient task-related top-down modulations of STRFs observed in ferrets during attentive behavior (see Figure 6).
In humans, MEG studies have shown that musical training enlarges auditory cortical representations of sound features relevant to music perception (Pantev et al., 2003). EEG response modulation has also been reported when subjects learn to discriminate between phonetic stimuli (Alain et al., 2007; Tremblay et al., 2002). Recently, it was observed that such auditory-cortex changes can occur very rapidly, paralleling the time course (~tens of minutes) of perceptual learning (Alain et al., 2007). Experience-dependent plasticity has also been described in the somatosensory (Kerr et al., 2008) and visual (Sasaki et al., 2010) modalities.
Analogous to attentional effects, there might also be subcortical long-term plastic changes that correlate with perceptual learning. In humans, plasticity dependent on musical experience was recently documented in brainstem frequency-following EEG responses, consistent with top-down modulations of auditory subcortical nuclei (Musacchia et al., 2007) (see Figure 7). Further, in recent ferret studies, selective lesions of the descending cortico-collicular fibers fully prevented relearning of auditory localization upon occlusion of one ear (King et al., 2007; Bajo et al., 2010), while some relearning still occurred when A1 was inactivated (King et al., 2007).
Top-down influences, in particular attention to the to-be-learned stimuli, seem to be particularly important in perceptual learning (Alain et al., 2007; Amitay et al., 2006; Polley et al., 2006) (but see also (Sasaki et al., 2010; Seitz et al., 2005)), as highlighted by a recent study showing improved auditory frequency discrimination following training with physically identical tones (Amitay et al., 2006). It is possible that this resulted from tuning of the low-level representation against which subsequent discrimination is performed. However, some longer-term plasticity does occur without attention, as prolonged adaptation was observed to tune the spatial feature selectivity of cat visual cortex neurons even under anesthesia (Bouchard et al., 2008). Recent theoretical work by Ahissar et al. (2009) suggests that perception normally operates at the level of higher-order representations that do not contain low-level sensory information. For perceptual learning to occur, specific learning paradigms that allow modification of the low-level sensory representations need to be used (Ahissar et al., 2009).
Acetylcholine release from nucleus basalis (NB) onto the auditory cortex could explain why some short-term plasticity effects related to selective attention persist (Weinberger et al., 2006). Specifically, classical conditioning in rats (Weinberger, 2007) seems to rely on neuronal effects similar to the STRF changes observed in ferrets during selective attention (Fritz et al., 2007). In a related study, pairing of broad-spectrum noise with NB stimulation resulted in widening of receptive fields of rat auditory cortical neurons, and this effect was reversed by subsequent pairing of NB stimulation with tones (Bao et al., 2003). Such long-term receptive field changes seemed to depend on ascending cholinergic neurotransmission from the NB (Froemke et al., 2007; Weinberger et al., 2006). Despite these findings, many open questions remain, as other transmitter systems beyond the cholinergic system have been implicated in sensory learning (Ji et al., 2007).
In our theoretical model, schematically illustrated in Figure 1, bottom-up input driven by sensory stimuli and top-down effects caused by activation of higher-order representations interact at different brain hierarchical levels to tune topographically organized neuronal populations via mechanisms involving center-excitation and surround-inhibition. In our model, this tuning forms sensory representations that may support, depending on the type of input (bottom-up vs. top-down) and hierarchical level (sensory vs. association cortex), sensory memory, short-term memory, involuntary attention, selective attention, and perceptual learning. Further, subcortical structures might be especially important for filtering (or gating) sensory input, while cortex could play a greater role in temporal integration of sensory events into increasingly longer and more abstract memory representations (Hasson et al., 2008; Lerner et al., 2011; Nelken et al., 2003; Raij, 2008). We further propose, similar to suggestions of Bar and colleagues (Bar et al., 2006), that the “where” processing pathway of at least the visual (and possibly auditory (Jaaskelainen et al., 2004)) system provides an initial quick analysis of stimulus location and identity, and frontal object representations then feed back to the slower “what” processing streams to improve/facilitate recognition. We assume that the frontal higher-order representations are modality independent (i.e., are activated by corresponding inputs through each of the senses, see for instance (Barbey et al., 2009; Ojanen et al., 2005)), and that activation of such representations would send object-specific facilitatory feedback inputs to each of the relevant senses simultaneously. Naturally, there are numerous open questions and alternative mechanisms that need to be addressed in future experimental work.
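The idea of a continuum of memory representations with progressively longer temporal windows can be illustrated with a small toy sketch. It is not part of the model as published: it assumes that each hierarchical level holds a trace that decays exponentially, and the level names, time constants, and threshold below are hypothetical values chosen only to show the qualitative pattern.

```python
import math

def trace_strength(elapsed_s, tau_s):
    """Fraction of a trace remaining after `elapsed_s` seconds.

    Exponential decay is an assumption made for illustration only.
    """
    return math.exp(-elapsed_s / tau_s)

# Hypothetical time constants in seconds, ordered from low to high
# levels of the cortical hierarchy. These are not measured values.
levels = {"sensory": 2.0, "intermediate": 10.0, "association": 30.0}

def surviving_levels(elapsed_s, threshold=0.2):
    """Names of levels whose trace is still at or above `threshold`."""
    return [name for name, tau in levels.items()
            if trace_strength(elapsed_s, tau) >= threshold]

for delay in (1.0, 5.0, 60.0):
    print(f"after {delay:4.0f} s:", surviving_levels(delay))
```

Under these assumptions a short delay leaves traces at every level, an intermediate delay leaves only the higher-level traces, and a long delay leaves none, so what looks like separate memory stores behaviorally could reflect a single graded dimension of time constants.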
Rapid advances in functional neuroimaging techniques in humans (Ahveninen et al., 2006; Auranen et al., 2009; Lin et al., 2006; Nummenmaa et al., 2007), as well as developments in multi-neuron recordings in animals (Ohki et al., 2005), are making it possible to study the outstanding questions in detail.
Supported by the Academy of Finland, National Institutes of Health (R01 HD040712, R01 NS037462, R01MH083744-01A2, R21DC010060, 5R01NS048279-04, and P41 RR14075), Helen Hay Whitney Foundation (MA), National Center for Research Resources and Shared Instrumentation Grants S10RR14798, S10RR023401, S10RR019307, and S10RR023043.