Our results demonstrate that visual experience with an object has a highly similar influence on the dynamics of overall fMRI signal reduction and behavioral priming. Both effects (i) were relatively small for briefly presented stimuli that were hardly recognized; (ii) increased with the level of prior visual exposure, reaching a maximum at a duration of 250 ms; (iii) decreased in magnitude for prior exposures longer than 250 ms; and (iv) remained significant for at least 1900 ms of prior visual exposure. The data reported here reveal a novel and counterintuitive property of both repetition reduction and behavioral priming. Specifically, for both phenomena, this is the first demonstration that a maximal effect is obtained only for a prior exposure of 250 ms, and that the magnitude of these effects is reduced for longer durations. While our primary focus concerns experience-related reductions in cortical response and the general effect of visual exposure on object representations in the cortex, the striking similarity of the dynamics of repetition reduction and behavioral priming resonates strongly with the hypothesis that these two phenomena are critically related.
The cortical regions showing repetition-related response reduction in our fMRI results include bilateral collateral sulcus and fusiform gyrus, left lateral occipito-temporal sulcus, inferior temporal gyrus and inferior frontal cortex. Each of these regions has previously been found to exhibit reduced activity for repeated objects when compared with that for novel objects (Buckner et al., 1998
; Vuilleumier et al., 2002
; Sayres and Grill-Spector, 2003
; Maccotta and Buckner, 2004
). Although we did not test for distinct processing contributions of different regions, the sensory processing function typically associated with relatively posterior occipital-temporal regions suggests that the specific reduction found there might reflect perceptual priming. Results of previous fMRI studies also associate more anterior regions of temporal-occipital and inferior prefrontal cortices, especially in the left hemisphere, with object representations that generalize across different exemplars (Koutstaal et al., 2001
; Simons et al., 2003
) or viewpoints (Vuilleumier et al., 2002
), that involve lexical/semantic information (Demb et al., 1995
; Thompson-Schill et al., 1999
; Koutstaal et al., 2001
), or that concern task-specific (Wagner et al., 2000
) and response-related (Dobbins et al., 2004
) information associated with an object. We therefore take response reductions in these regions to reflect perceptually abstract and non-perceptual (e.g. conceptual, linguistic and response-related) components of priming (cf. Schacter et al., 2004
). As shown in , all of these regions produced similar ‘rise-and-fall’ patterns of exposure-related fMRI repetition reduction, suggesting that each of these areas either mediates, or is affected by, the processes involved in the response reduction. Thus, while the nature of the object-related information represented in these various cortical regions may differ, the processes that shape the representations found in each may be the same.
The different versions of our study yielded highly similar results. This suggests that our results are robust in the face of differences in the time interval separating the first and second presentations of each object, and differences in expectancies, strategies and contrast effects that the different experimental designs afford. In particular, the similarity across the event-related and blocked designs demonstrates that the ‘rise-and-fall’ pattern of results we obtained was not due to exposure-related differences in how subjects allocated their attention. Attentional confounds can be problematic for the results of block design experiments, because subjects typically know what condition to expect on every trial. However, there was no way for subjects in the event-related design to know what duration to expect for a forthcoming stimulus, because the presentation orders of the different conditions in these designs were intermixed. Thus, there was no way to allocate different levels of attention voluntarily when stimuli appeared for different exposure durations. The similarity in the results from the different designs therefore provides strong evidence, not only that the ‘rise-and-fall’ pattern of repetition reduction and behavioral priming replicates, but also that these effects are not an artifact of differences in the top-down allocation of attention.
Findings of repetition-related response reduction have been speculated to reflect a ‘sharpening’ of the cortical response (Desimone, 1996
). This hypothesis regarding the functional significance of repetition reduction was later interpreted (Wiggs and Martin, 1998
) to suggest that the reduced response is a manifestation of a selective representation, in which only key object features continue to be represented with repeated experience. These two proposals differ from each other in that one focuses on a ‘sharpened’, and presumably exhaustive, representation, whereas the other focuses on a selective, non-exhaustive representation. Although neither of these proposals would individually predict a pattern of exposure effects similar to that reported here, our findings support the coexistence of both mechanisms. As elaborated below, we suggest that these mechanisms operate separately from each other, and together create object representations that are both ‘sharpened’ and selective.
According to the present proposal, visual exposure to a certain object first recruits a sharpening process during which the initially broad cortical response becomes fine-tuned and maximally stimulus-specific. The cortical response to a visual input is initially driven by coarse information and global aspects of the image and, in that sense, is not optimal and therefore requires fine-tuning. Indeed, psychophysical experiments with stimuli ranging from simple gratings (DeValois and DeValois, 1988
) to complex scenes (Schyns and Oliva, 1994
) indicate that observers perceive global components considerably earlier than they perceive the stimulus-specific detail (Watt, 1987
; Bar, 2003
; Loftus and Harley, 2004
). Recent neurophysiological studies (Brown and Xiang, 1998
; Sugase et al., 1999
; Tamura and Tanaka, 2001
) support this idea by showing that activity in inferior temporal cortex is initially, at ~130 ms from stimulus onset, broad and relatively less selective for the specific stimulus, representing only its global properties (e.g. general orientation and dimensions). Then, at ~240 ms from stimulus onset, the representation becomes stimulus-specific, such that only those neurons that best represent the specific properties of the stimulus continue to respond (Tamura and Tanaka, 2001
). Fine-tuning may also benefit from the attentional selectivity of neurons in inferior-temporal cortex, which follows a comparable timecourse (Chelazzi et al., 1998
): while cells initially show a similar response, regardless of the relevance of a particular stimulus, this response becomes highly selective in accordance with attentional demands within 200 ms from stimulus onset. Taken together, these timecourses are especially compelling in their similarity to our finding that exposure effects peaked for objects previously presented for 250 ms, suggesting that maximal fMRI signal reduction coincides with the completion of fine-tuning. The outcome of this fine-tuning process is an efficient but exhaustive representation of the stimulus. The representation is efficient in that each of the object’s features is represented optimally, but is also redundant because it includes all of the features in the image. Based on the inverted U-shaped pattern of exposure effects we observed, we propose that a subsequent selection process eliminates this redundancy.
Given sufficient exposure to a specific object, this second process selects the key features from the fine-tuned, exhaustive representation of the object in a similar manner as suggested previously (Wiggs and Martin, 1998
). Subsequently, only the key features continue to be represented, while the neurons representing redundant features gradually respond less. Signals for guiding the selection of these key features may be projected back from the prefrontal cortex, which processes, among other things, semantic information about objects (Demb et al., 1995
; Wagner et al., 2000
), as well as from the amygdala, which analyzes emotionally relevant information (Hariri et al., 2002
). For the present purpose, key features are defined as diagnostic features that distinguish the specific object from other objects, features that are critical for the specific task at hand, features that remain invariant under various viewing conditions, features of outstanding interest, or odd, surprising and unexpected features. For example, while the shape of the legs of a certain chair may be considered a key property and will continue to be represented, maintaining details about all four of its similar-looking legs is not essential for an economical and reliable representation. Being selective about which information is represented may also serve to emphasize the unique features of a certain object and thus make it more recognizable, just as a caricature of a face, eliminating non-distinctive extraneous information, can be recognized more accurately than its detailed, veridical version (Rhodes et al., 1987
). Thus, allocating neurons for representing redundant or non-essential features can be seen as a waste of resources (Lennie, 2003
), and it is predicted that representations are formed to minimize such cortical commitment whenever optimization is possible.
The selection process that we describe is proposed to help shape object representations. However, the term selection has also been associated, in a different context, with a mechanism that operates in left inferior frontal cortex to select among multiple lexical/semantic representations that compete for access to further processes based on their relevance to task and stimulus demands (Thompson-Schill et al., 1999
). Greater selection in this latter regard refers to the need to select an appropriate representation from many different representations. This between-representation process is therefore notably distinct from the within-representation process that we describe. Importantly, while selection between different semantic representations may occur primarily in left inferior frontal cortex, the shaping of object-related representations by the selection of which properties should continue to be represented may occur throughout various cortical regions involved in object priming and recognition.
The exposure-related fine-tuning and selection processes described here may overlap in time, but they are completed consecutively. Fine-tuning is guided by the arrival of gradually increasing details about the visual stimulus, and is therefore an inherently bottom-up process that is completed relatively early (e.g. our results suggest by ~250 ms). The selection process, on the other hand, depends on high-level information and semantic knowledge, and is therefore predicted to be guided by top-down mechanisms and be completed relatively later (i.e. 350 ms and beyond based on our data). While future research is needed to address whether the precise time course of these processes depends on task demands or the processing complexity of individual objects, the present findings nevertheless suggest that the combined outcome of these two processes is an efficient and selective long-term representation.
How does this two-process model account for the parabolic pattern of our results? A mask presented after a picture interrupts further visual processing (Rolls and Tovee, 1994
; Kovács et al., 1995
). If we assume that priming captures the most developed representation up to this interruption, then measures of priming can be considered to reflect the latest outcome of the processes that shape visual representations (Bar, 2001
). When a mask interrupts processing at 250 ms, a comprehensive fine-tuning process has been completed, but the selection process has not yet developed. The resulting primed representation is therefore based on a fine-tuned representation of all the features. Accordingly, the next time subjects see that specific object the activation of this complete and fine-tuned object representation elicits a minimal cortical response. In other words, presenting the image first for 250 ms results in maximal repetition reduction relative to novel controls because all of the object’s features have been stored in a fine-tuned manner.
When, on the other hand, the mask interrupts visual processing at 350 ms or longer, after the subset of the relevant key features has been selected, the resulting stored representation is partial because it only includes key features. In other words, key features are primed and represented in their fine-tuned form, whereas ‘non-key’ features are no longer part of the object representation and are therefore primed relatively weakly, if at all. When a subject sees the specific object again, the primed features elicit a minimal response but the ‘non-key’ features elicit a response comparable to that of a previously unseen feature. This combination of activating primed and less-primed features results in a cortical reduction and RT improvement lower than the maximum, but higher than that obtained for a novel object.
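The qualitative logic of this account can be illustrated with a deliberately simplified toy sketch. It is not a fitted model of our data: the linear ramps, the strictly sequential timing of the two processes, and the constants `T_FINE_MS`, `T_SELECT_MS` and `KEY_FRACTION` are all assumptions made purely for illustration, and the function names are hypothetical. Priming is taken to be the product of how fine-tuned the stored representation is and what share of the object's features it still contains.

```python
# Toy illustration only: every constant is an assumed value, not an estimate from data.
T_FINE_MS = 250.0    # assumed exposure at which fine-tuning is complete
T_SELECT_MS = 100.0  # assumed extra exposure needed for selection to finish (by 350 ms)
KEY_FRACTION = 0.6   # assumed share of an object's features that count as 'key'


def fine_tuning(exposure_ms: float) -> float:
    """Degree of fine-tuning (0 to 1), rising linearly until T_FINE_MS."""
    return min(1.0, exposure_ms / T_FINE_MS)


def retained_fraction(exposure_ms: float) -> float:
    """Share of features still represented once selection has acted.

    Before T_FINE_MS nothing is discarded; afterwards 'non-key' features
    are dropped progressively until only the key features remain.
    """
    progress = min(1.0, max(0.0, (exposure_ms - T_FINE_MS) / T_SELECT_MS))
    return 1.0 - progress * (1.0 - KEY_FRACTION)


def priming(exposure_ms: float) -> float:
    """Relative priming magnitude: fine-tuning times retained features."""
    return fine_tuning(exposure_ms) * retained_fraction(exposure_ms)


if __name__ == "__main__":
    # 250, 350 and 1900 ms are durations discussed in the text;
    # 50 and 100 ms are added here only to show the rising limb.
    for duration_ms in (50, 100, 250, 350, 1900):
        print(f"{duration_ms:5d} ms -> relative priming {priming(duration_ms):.2f}")
```

Under these assumptions the toy measure rises up to 250 ms, falls once selection begins, and then levels off above zero because the key features remain primed. Allowing the two processes to overlap in time, as suggested above, would smooth the peak but would not change the overall shape.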
We have described the operations of the fine-tuning and selection mechanisms primarily in terms of the formation of perceptual representations of objects. However, the proposed fine-tuning and selective processes are also presumed to operate to shape other types of object-related representations, such as those involved in the conceptual, linguistic and response-related components of priming (for a review of priming specificity, see Schacter et al., 2004
). Support for this comes from the fact that we obtained the same ‘rise-and-fall’ pattern of exposure-related response reductions in several cortical regions, including anterior temporal and inferior frontal regions that have been implicated in non-perceptual operations. This possibility underscores the potential generality and importance of our proposal, and emphasizes the need for future research to establish the extent to which the fine-tuning and selective processes might reflect the general operating characteristics of neural ensembles in shaping different types of cortical representations.
While our primary focus in this investigation concerns experience-related reductions in cortical response, our behavioral results also merit consideration. Indeed, despite decades of research interest in the behavioral manifestations of priming, evidence regarding the effects of initial exposure duration on subsequent recognition performance for repeated objects is lacking. As a result, the ‘rise-and-fall’ pattern of exposure-related behavioral priming effects that we obtained is itself a novel finding. Despite the lack of comparable prior object recognition studies, several studies using visually abstract or linguistic stimuli have manipulated prime exposure duration and are therefore relevant to the present results. However, many of these studies used only very brief prime exposure durations (<100 ms, e.g. Frost et al., 2003
), relatively long exposures (>1000 ms, e.g. Jacoby and Dallas, 1981
; Neill et al., 1990
; Musen, 1991
) or only two different exposure durations (e.g. Hirshman and Mulligan, 1991
, Experiment 3; Versace, 1998
; Versace and Nevers, 2003
), and none used reaction times to measure priming. Unfortunately, the absence of multiple prime durations and/or lack of a similar range of durations as used in our study impedes proper comparison of these results to our general ‘rise-and-fall’ pattern of behavioral priming effects.
Of the remaining studies that used relatively more comparable procedures, three provided results that are nominally consistent with our behavioral findings. Two studies reported by Crabb and Dark (1999
, Experiment 2), for instance, together show a similar ‘rise-and-fall’ pattern of priming effects on identification accuracy for words. In their first study (Crabb and Dark, 1999
), repeated target words that were actively attended to in prime displays were correctly identified more often than new, unprimed items. Importantly, for these items there was a priming-related ‘rise’ in the proportion of identified repeated items, relative to the proportion of identified new items, when the prime exposure duration increased from 100 ms (0.095) to 200 ms (0.126), and there was a ‘fall’ in priming magnitude when the duration increased further to 300 ms (0.100). Another of their studies that used longer prime exposure durations (Crabb and Dark, 2003
, Experiment 2) showed additional evidence for the ‘fall’ of priming, with nominally greater priming for words that were initially presented individually for 200 ms than for those presented for 600 or 1000 ms. Although the statistical reliability of these prior trends was not established, the general similarity between these results and ours supports the potential generality of our findings. Even more compelling in this regard are the results of von Hippel and Hawkins (1994
, Experiment 1). Prime words in their study were presented under perceptual study conditions for 50, 100, 200, 500 or 1000 ms. The proportion of these prime words that were subsequently used to complete word fragments (e.g. ma__l_ → marble) showed a ‘rise’ in priming, with steady increases with prime exposure from 50 ms to the maximal priming effect at 200 ms. The ‘fall’ of priming was also clearly evident in these results, with consecutive decreases in the proportion of primed fragment completions following 200 ms of prime exposure to that following 500 and 1000 ms of prime exposure, respectively. Furthermore, the quadratic trend defining this ‘rise-and-fall’ pattern of priming effects was found to be statistically reliable. Although this pattern was not as clear in other conditions in von Hippel and Hawkins’s (1994
) study, such as when subjects were required to type the name of previously primed words that were briefly flashed again for 33 ms, our survey of the behavioral priming literature nevertheless suggests that our proposal is further supported by previous reports.
We have interpreted the ‘rise-and-fall’ patterns of repetition-related response reduction and behavioral priming that we obtained as reflections of how cortical representations are shaped with increasing visual experience. Our account suggests that a ‘rise-and-fall’ pattern might be expected in any situation where a repeated stimulus and task-related demands are highly similar across both presentations, where a fine-tuned response to redundant and otherwise irrelevant features and information provides a greater overlap between an object’s cortical representation and the corresponding visual input, and where selection of only ‘key’ features for continued representation reduces this overlap. Importantly, our account does not suggest that increased visual exposure inevitably decreases behavioral performance. Indeed, the effect of the proposed selection process might often make object identification more efficient; that is, retaining only the most distinctive, relevant features and information about an object will generally make it easier to distinguish from other objects. Thus, eliminating the influence of redundant, less relevant information can aid identification. However, in our task, this normally redundant and less relevant information is in fact helpful, as it provides a greater overlap between an object’s cortical representation and the corresponding visual input. Maximal priming should therefore be observed in such situations whenever the object is most accurately and exhaustively represented (i.e. following maximal fine-tuning and minimal feature selection).
The reliable correlation and striking similarity in the ‘rise-and-fall’ pattern of repetition-related response reduction and behavioral priming we observed suggest that these phenomena are critically related. If the evolution of an object’s cortical representation is related to recognition ability, then at least some level of representational fine-tuning may be required before recognition of an object is possible. Consequently, if the representation activated in a second encounter is fine-tuned, RT is shorter than that observed for a novel stimulus because less time is required for recognition. Our proposal that fine-tuning is completed by 250 ms is supported in this regard by the fact that RTs were indeed fastest for objects shown previously for 250 ms, in addition to priming being maximal in this condition. The link to behavioral RT improvement is bolstered by the finding that the cortical response to visual objects is not only reduced with repeated exposure, but also peaks earlier (Noguchi et al., 2004
). Similarly, in a study of the cell population in inferior temporal cortex (IT), activity initially distinguished between novel and familiar objects ~100 ms after the onset of the response (~180 ms from stimulus onset; Li et al., 1993
). The 100 ms delay of this diagnostic activity, however, was reduced to only 10 ms following additional presentations. This shortening of response onset to a familiar stimulus may therefore reflect the efficiency involved in behavioral RT priming.