Human perception is supported by regions of ventral visual cortex that become active when specific types of information appear in the environment. This coupling has led to a common assumption in cognitive neuroscience that stimulus-evoked activity in these regions only reflects information about the current stimulus. Here we challenge this assumption for how scenes are represented in a scene-selective region of parahippocampal cortex. This region treated two identical scenes as more similar when they were preceded in time by the same stimuli compared to when they were preceded by different stimuli. These findings suggest that parahippocampal cortex embeds scenes in their temporal context in order to determine what they represent. By integrating the past and present, such representations may support the encoding and navigation of complex environments.
The visual system contains brain regions that respond preferentially to particular types of information, such as faces, bodies, words, and scenes (Kanwisher, 2010). The fact that these responses are time-locked to the appearance of an external stimulus has led to a common assumption that internal representations of these experiences only reflect information about the current stimulus. Here we show that a scene-selective area of parahippocampal cortex (PHC) represents more than the current scene. Specifically, we demonstrate that representations of visual scenes in PHC are shaped by temporal context—a moving window of recent experience.
Temporal context has a powerful influence on cognition, affecting core abilities such as episodic memory and language. In episodic memory, the temporal context model (TCM) posits that new memories are stored by binding current experience to a mental cache of what has been experienced recently (Howard and Kahana, 2002; Sederberg et al., 2008). This mechanism explains a range of memory behavior including the contiguity effect, whereby retrieval of a context increases the probability of retrieving episodes that occurred in close temporal proximity (Kahana, 1996). For example, it is beneficial to think back to when we last saw our keys in order to find where we lost them. In language, word meanings depend on the temporal contexts in which we hear or read them (Howard et al., 2011). For example, the word “bank” can refer to a monetary institution or the edge of a river, depending on the narrative in which it occurs.
Here we test whether scene perception, like memory and language, depends on temporal context. We hypothesize that although neural responses are time-locked to the appearance of a scene, such responses (and the underlying information processing) can be contingent on visual input that is no longer available. We investigate the visual category of scenes and corresponding scene-selective PHC (parahippocampal place area, PPA; Aguirre et al., 1998; Epstein and Kanwisher, 1998) because temporal context may be especially important for representing places in the world. Temporal context may help our brains distinguish scenes that have the same visual properties but are located in different places (e.g., the insides of two McDonald’s), because what precedes these scenes in our experience is different (e.g., certain roads, stores, etc.). Likewise, temporal context may help connect scenes that are located in the same place but have different visual properties (e.g., two rooms in a house), because these scenes occur close together in our experience (e.g., when walking through the house).
To test the hypothesis that scene representations in PPA depend on temporal context, we employ repetition attenuation (RA). RA (also called repetition suppression or fMRI-adaptation) refers to the fact that brain regions respond less to repeated vs. novel stimuli (see Grill-Spector et al., 2006). RA is a powerful tool for studying the contents of neural representations: If changes to a stimulus dimension do not eliminate RA in a brain region, then it can be inferred that this region does not represent the manipulated dimension. If RA is eliminated (i.e., if the region treats the changed stimulus as novel), then it can be inferred that the manipulated dimension is represented. Thus, RA provides an index of whether a brain region represents two stimuli as the same or different (Turk-Browne et al., 2008). Using this logic, we measured RA when a scene was presented twice in the same temporal context vs. in two different temporal contexts. If PPA represents scenes in their temporal context, then a repeated scene in a repeated temporal context should elicit more RA than a repeated scene in a novel temporal context.
Twenty Princeton community members (9 female, 19 right-handed, mean age 24.1 years) participated in the main experiment. Two subjects were excluded from analysis: one withdrew halfway, and the other expressed fatigue and missed many responses (behavioral accuracy 3.0 SD below mean). Nine new subjects (5 female, 7 right-handed including 1 ambidextrous, mean age 23.9 years) participated in the control experiment. One subject was excluded from analysis because they missed many responses (behavioral accuracy 2.6 SD below mean). All subjects had normal or corrected-to-normal visual acuity, provided informed consent, and received monetary compensation. The study protocol was approved by the Institutional Review Board for Human Subjects at Princeton University.
Scene stimuli consisted of colorful photographs of indoor and outdoor places, subtending 8.8° × 8.8° on a screen behind the scanner bore and viewed through a mirror attached to the headcoil. Subjects fixated a central dot that remained onscreen throughout each run. Stimuli lasted 500ms, and subjects responded within 1500ms on a button pad.
Subjects completed two main task runs lasting 8min 52.5s each. For each scene, they judged whether it occurred indoors (e.g., living room, cockpit) or outdoors (e.g., village, landscape). In each run, 120 scenes were presented, separated by a jittered stimulus-onset-asynchrony sampled pseudorandomly from 3s (40%), 4.5s (40%), or 6s (20%).
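As a rough illustration of the jitter described above, the following Python sketch draws 120 stimulus-onset asynchronies with the stated proportions. Fixing the counts exactly (48, 48, and 24) and then shuffling is an assumption; the text says only that the values were sampled pseudorandomly. The function name and seed are hypothetical.

```python
import random

# SOA values (seconds) and their stated proportions.
SOA_PROPORTIONS = {3.0: 0.40, 4.5: 0.40, 6.0: 0.20}

def sample_soas(n_trials=120, seed=0):
    """Return a shuffled list of SOAs with exact proportions (an assumption)."""
    soas = []
    for soa, proportion in SOA_PROPORTIONS.items():
        soas.extend([soa] * round(n_trials * proportion))
    random.Random(seed).shuffle(soas)
    return soas

soas = sample_soas()
mean_soa = sum(soas) / len(soas)   # average time between scene onsets
total_time = sum(soas)             # time spanned by all 120 trials
```

Under these proportions the mean SOA is 4.2s, so 120 trials span about 504s, which fits within a run of 8min 52.5s (532.5s).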
The sequence of scenes was generated with a manipulation of temporal context. Scenes were assigned to 40 triplets (e.g., ABC, DEF). Each triplet contained two scenes (AB, DE) that appeared in advance of the first presentation of the critical third scene (C, F). The indoor-outdoor status of scenes in a triplet was counterbalanced within and across conditions by generating triplets reflecting all indoor/outdoor permutations. In the Repeated Context condition, triplets were repeated verbatim (ABC) in the second presentation, whereas in the Novel Context condition, the first two items were replaced by two novel items from the same indoor/outdoor category (preserving the sequence of responses) and the third item was left intact (GHF). For both Context conditions, the final item in the first presentation was the Novel Item and the final item in the second presentation was the Repeated Item (Fig. 1).
The triplets were sequenced in mini-blocks containing the first and second presentations of four triplets randomly interleaved. These mini-blocks were used to generate the trial order, but there were no timing or other cues to the boundaries between mini-blocks or triplets. To further obscure this algorithm, we randomly swapped triplets at the boundaries between mini-blocks with roughly 50% probability. This procedure created a distribution of repetition lags between 6 (~25s) and 21 items (~88s).
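The mini-block procedure can be sketched in Python as follows. This is only one possible reading of the description: it assumes that each triplet is kept intact as a unit, that the second presentation of a triplet must follow the first by at least two triplet slots (the hypothetical `min_gap` parameter), and it omits the random swapping at mini-block boundaries. All function and variable names are illustrative.

```python
import random

def make_miniblock(triplet_ids, rng, min_gap=2, max_tries=10_000):
    """Order the first (1) and second (2) presentations of the given triplets.

    min_gap is an assumed constraint: the second presentation must come at
    least min_gap triplet slots after the first presentation.
    """
    slots = [(t, p) for t in triplet_ids for p in (1, 2)]
    for _ in range(max_tries):
        rng.shuffle(slots)
        pos = {slot: i for i, slot in enumerate(slots)}
        if all(pos[(t, 2)] - pos[(t, 1)] >= min_gap for t in triplet_ids):
            return list(slots)
    raise RuntimeError("no valid order found")

def repetition_lags(order):
    """Lag, in items, between the two presentations of each triplet's final item."""
    pos = {slot: i for i, slot in enumerate(order)}
    triplets = {t for t, _ in order}
    # Each slot holds three items, so a gap of k slots is a lag of 3 * k items.
    return {t: 3 * (pos[(t, 2)] - pos[(t, 1)]) for t in triplets}

rng = random.Random(1)
order = make_miniblock(["T1", "T2", "T3", "T4"], rng)
lags = repetition_lags(order)
```

Under these assumptions, lags within a mini-block fall between 6 and 21 items, which is consistent with the reported range, although the constraints actually used are not stated in the text.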
During debriefing, subjects were told that many images had been presented twice and that some of these repeated images were preceded by the same two images both times. When asked about their awareness of these manipulations, most subjects reported noticing only a few repeated images, and just two subjects reported noticing any triplet repetitions.
The control experiment was identical to the main experiment except that the final item in the second presentation of each Repeated Context triplet was replaced with a new item (Novel Item 2). In other words, these triplets contained the same two context items as the first presentation (AB), but a new final item (I) in place of the original one (C). This allowed us to assess the independent effect of repeating the context items on RA in the Repeated Context condition. The Novel Context condition and all other parameters were identical.
The jittering helped separate evoked BOLD responses to individual scenes. To further improve design efficiency, the timing and order of conditions were pre-generated. We simulated many possible random designs and computed collinearity between the resulting condition timeseries, choosing designs for which pairwise correlations were minimized (rs<0.3). Several different designs were used across subjects.
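The design-selection step can be illustrated with a simplified Python sketch. It assumes only four conditions (the actual model had more regressors), a made-up HRF-like kernel, and exact SOA counts; none of these details are specified in the text, and the selection criterion (smallest maximum absolute pairwise correlation) is one plausible way to minimize collinearity.

```python
import random
from itertools import combinations

TR = 1.5          # seconds per volume
N_VOLUMES = 355   # volumes in a main task run
# Hypothetical HRF-like kernel sampled once per volume (not a fitted HRF).
KERNEL = [0.0, 0.3, 0.9, 1.0, 0.6, 0.3, 0.1]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def convolve(sticks, kernel):
    """Convolve a stick function with the kernel, truncated to the run length."""
    out = [0.0] * len(sticks)
    for i, s in enumerate(sticks):
        if s:
            for k, w in enumerate(kernel):
                if i + k < len(out):
                    out[i + k] += s * w
    return out

def random_design(rng, n_conditions=4):
    """Return one predicted timeseries per condition for a random trial order."""
    soas = [3.0] * 48 + [4.5] * 48 + [6.0] * 24
    labels = [i % n_conditions for i in range(len(soas))]
    rng.shuffle(soas)
    rng.shuffle(labels)
    sticks = [[0.0] * N_VOLUMES for _ in range(n_conditions)]
    onset = 0.0
    for soa, label in zip(soas, labels):
        sticks[label][int(onset / TR)] = 1.0
        onset += soa
    return [convolve(s, KERNEL) for s in sticks]

def max_abs_correlation(series):
    """Largest absolute pairwise correlation among condition timeseries."""
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))

rng = random.Random(0)
candidates = [random_design(rng) for _ in range(200)]
scores = [max_abs_correlation(d) for d in candidates]
best_score = min(scores)
best_design = candidates[scores.index(best_score)]
```

A design would be retained here only if `best_score` fell below the stated criterion of 0.3.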
Subjects also completed a functional localizer run lasting 6min 6s to define bilateral PPA regions of interest (ROIs). An ROI approach was used to minimize multiple comparisons and improve statistical power. The localizer run alternated between six repetitions each of scene and face blocks (Turk-Browne et al., 2010). Blocks contained 12 stimuli presented for 500ms every 1500ms. The 18s of stimulation was followed by 12s of fixation before the next block. Subjects made the same indoor/outdoor judgment for scenes, and a male/female judgment for faces.
Data were acquired with a 3T Siemens Allegra head-only scanner and a Nova Medical headcoil. Functional images were obtained with a T2*-weighted EPI sequence: TR=1500ms, TE=28ms, matrix=64, field of view=224mm, flip angle=64°, thickness=5mm (3.5×3.5×5mm voxels). Twenty-six oblique axial slices aligned to the anterior-/posterior-commissure line were collected in interleaved order. In the main and localizer runs, 355 and 244 volumes were acquired, respectively. To align scans, coplanar FLASH and high-resolution MPRAGE T1-weighted anatomicals were collected.
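The acquisition parameters can be checked against the reported run durations and voxel size with a short calculation. The Python sketch below uses only values stated in the text; the function and variable names are illustrative.

```python
TR_S = 1.5  # repetition time in seconds (TR=1500ms)

def run_duration(n_volumes, tr_s=TR_S):
    """Return (minutes, seconds) for a run of n_volumes."""
    total = n_volumes * tr_s
    return int(total // 60), total % 60

main_run = run_duration(355)       # main task run
localizer_run = run_duration(244)  # localizer run
in_plane_mm = 224 / 64             # field of view divided by matrix size
```

The results agree with the stated values: 8min 52.5s for each main run, 6min 6s for the localizer, and 3.5mm in-plane resolution.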
For T1 equilibration, the scanner discarded three volumes and we discarded six more. Functional runs were preprocessed and analyzed in FSL (www.fmrib.ox.ac.uk/fsl), including correction of head motion and slice-acquisition time, spatial smoothing (5mm FWHM Gaussian kernel; Desmond and Glover, 2002; Norman-Haignere et al., 2012), and high-pass temporal filtering (128s period). Preprocessed runs were aligned to coplanar and high-resolution anatomicals and the standard MNI brain, and interpolated to 2mm isotropic voxels.
Evoked responses were estimated by fitting finite impulse response (FIR) functions in a general linear model (GLM) for each run. Twelve explanatory variables (EVs) were specified for each combination of: three triplet positions, two triplet presentations, and two Context conditions. For example, using ABC to reflect Repeated Context triplets, DEF/GHF to reflect Novel Context triplets, and subscript 1 and 2 to reflect the first and second presentation of each triplet, the 12 EVs were: A1, B1, C1, D1, E1, F1, A2, B2, C2, G1, H1, F2. For each EV, a constant-height delta function was placed at the volume corresponding to the onset of each trial of that type and the next 11 volumes, forming an FIR basis set of 12 regressors (modeling an 18s post-stimulus timecourse). Trials with incorrect or missed behavioral responses were assigned to a 13th 'Error' EV, again modeled with 12 regressors. This produced 156 regressors (13×12), to which we added six motion regressors (162 total). Parameter estimates were normalized to percent signal change by scaling with the min/max amplitude of the predicted effect, dividing by the run mean, and multiplying by 100 (mumford.fmripower.org/perchange_guide.pdf).
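To make the structure of the FIR model concrete, the following Python sketch builds the 12 lagged regressors for a single EV and counts the regressors in the full model. The onsets are hypothetical, and the sketch covers only the construction of the design columns, not model fitting in FSL.

```python
N_LAGS = 12    # FIR regressors per EV (12 volumes = 18s at TR = 1.5s)
N_EVS = 13     # 12 condition EVs plus one Error EV
N_MOTION = 6   # motion regressors

def fir_columns(onset_volumes, n_volumes, n_lags=N_LAGS):
    """Build one 0/1 column per post-stimulus lag for a single EV."""
    columns = [[0] * n_volumes for _ in range(n_lags)]
    for onset in onset_volumes:
        for lag in range(n_lags):
            if onset + lag < n_volumes:
                columns[lag][onset + lag] = 1
    return columns

# Hypothetical trial onsets (in volumes) for one EV in a 355-volume run.
cols = fir_columns([10, 60, 200], n_volumes=355)
total_regressors = N_EVS * N_LAGS + N_MOTION
```

Each column marks one post-stimulus timepoint across all trials of that type, so the fitted parameter estimates trace out the evoked timecourse without assuming a response shape.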
There were four conditions of interest: Novel Item/Repeated Context (C1), Repeated Item/Repeated Context (C2), Novel Item/Novel Context (F1), and Repeated Item/Novel Context (F2). The control experiment was analyzed identically, except that Repeated Item/Repeated Context (C2) was replaced with Novel Item 2/Repeated Context (I1). FIR timecourses for these conditions were extracted from each subject’s PPA ROIs. The peak response in each condition was defined as the average signal over timepoints whose responses (collapsed across conditions) did not differ from the numerical peak (ps>0.05). This procedure resulted in the peak being defined as the 4.5s timepoint alone in both experiments. Peak responses were analyzed using a 2 (Hemisphere: Right, Left) × 2 (Context: Repeated, Novel) × 2 (Item: Repeated, Novel) repeated-measures ANOVA. This ROI analysis approach based on FIR models and peak timepoints has been used extensively to examine repetition effects in PPA (e.g., Turk-Browne et al., 2006; Park et al., 2007; Schmitz et al., 2010). The same pattern of results was obtained when PPA data were modeled with a canonical hemodynamic response function (HRF) and when FIR responses were quantified as area under the curve.
Exploratory whole-brain analyses were conducted to examine effects outside of the ROIs. These analyses were similar to those above, except that evoked BOLD responses were modeled with double-gamma HRFs. Although this HRF does not fit as precisely, we used it because of the difficulty of choosing which FIR timepoint(s) should serve as the peak separately for each voxel. Thus, one parameter estimate was obtained in each condition at the first level, reflecting overall response amplitude. At the second level, estimates were combined across runs within-subject using a fixed-effects GLM. At the group level, contrasts were defined for the main effect of Item (overall RA), the interaction of Item and Context, and the simple effects of Item within Context conditions (Repeated Context RA, Novel Context RA). To evaluate reliability, we conducted non-parametric permutation tests with randomise in FSL. Resulting statistical maps were corrected for multiple comparisons (p<0.05) using a non-parametric cluster-mass procedure (cluster-forming threshold, p<0.001). Given the role of the hippocampus in encoding temporal contexts (e.g., Jenkins and Ranganath, 2010; Tubridy and Davachi, 2011) and matching new stimuli to the current context (e.g., Kumaran and Maguire, 2006; Morgan et al., 2011), we also conducted a targeted analysis within an anatomical mask of the bilateral hippocampus (from the Harvard-Oxford atlas) at a liberal uncorrected threshold (p<0.01, minimum 5 voxel extent).
The localizer was analyzed with HRFs, as in the whole-brain analyses. Separate regressors were entered for face and scene blocks. Resulting parameter estimates were compared within-subject to obtain a statistical map of scene-selectivity (scene > face). To define PPA ROIs, we chose the voxel with strongest scene-selectivity in bilateral posterior PHC, often in the collateral sulcus (Turk-Browne et al., 2010; Norman-Haignere et al., 2012). ROIs were found bilaterally in all subjects. Data were extracted with a 4mm FWHM Gaussian kernel centered on the peak coordinate (mean MNI coordinates, right: 29, −47, −14; left: −27, −49, −14). The results below were robust to the choice of ROI size, replicating identically with an 8mm kernel.
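For readers unfamiliar with FWHM-specified kernels, the Python sketch below converts FWHM to the Gaussian standard deviation and computes the relative weight of a voxel at a given distance from the peak. Treating the ROI extraction as a Gaussian-weighted average is an assumption about how the kernel was applied; the function names are illustrative.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Convert full width at half maximum to the Gaussian standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_weight(distance_mm, fwhm_mm):
    """Unnormalized weight of a voxel at a given distance from the peak."""
    sigma = fwhm_to_sigma(fwhm_mm)
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma ** 2))

sigma_4mm = fwhm_to_sigma(4.0)
w_neighbor = gaussian_weight(2.0, 4.0)  # one 2mm voxel away from the peak
```

By definition of FWHM, a voxel 2mm from the peak receives half the weight of the peak voxel under a 4mm kernel, so the ROI is dominated by the peak and its immediate neighbors.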
Performance on the indoor/outdoor task was excellent, with subjects achieving overall mean accuracy of 96.6%. Error trials were excluded from analysis. Possibly due to near-ceiling accuracy, the main effects of Item (F[1,17]=1.72, p=0.21) and Context (F[1,17]=1.35, p=0.26) did not reach significance. There was, however, a marginal interaction between Item and Context (F[1,17]=3.33, p=0.09), with better accuracy for Repeated vs. Novel Items in Repeated (98.6% vs. 95.8%; t=2.26, p<0.05) but not Novel Contexts (95.8% vs. 96.1%; t<1). In median response times (RTs), there was a main effect of Item (F[1,17]=36.20, p<0.001), with faster RTs for Repeated vs. Novel Items, but no main effect of Context (F[1,17]=1.16, p=0.30). There was also no interaction between Item and Context (F<1), with faster RTs for Repeated vs. Novel items in both Repeated (695ms vs. 723ms; t=3.43, p<0.005) and Novel Contexts (680ms vs. 721ms; t=3.61, p<0.005).
Although behavioral priming can mirror RA (e.g., Turk-Browne et al., 2006), there is no necessary relationship between these two measures: equal RA occurs in PPA whether repeated scenes elicit slower or faster RTs than novel scenes (Xu et al., 2007). Moreover, behavioral priming is often more correlated with RA in prefrontal cortex than in visual cortex (Schacter et al., 2007). We thus rely on ROI analyses of PPA to test our hypothesis.
The difference in evoked BOLD responses for Novel Item and Repeated Item provides an index of RA. If PPA only represents properties of the current stimulus, then equivalent RA should be observed for all Repeated Items, irrespective of whether they occur in a Repeated Context or a Novel Context. If PPA represents the current scene in its temporal context, then RA should be greater when the temporal context 'dimension' of a Repeated Item is also repeated, and more RA should be observed in a Repeated Context than a Novel Context.
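The logic of these predictions can be written as a simple contrast. The Python sketch below computes the RA index for each Context condition and their difference, using the condition labels C1, C2, F1, and F2 defined earlier. The peak values are made up for illustration and are not data from the study.

```python
def repetition_attenuation(novel, repeated):
    """RA index: response to the Novel Item minus response to the Repeated Item."""
    return novel - repeated

def item_by_context_interaction(peaks):
    """Difference in RA between the Repeated and Novel Context conditions."""
    ra_repeated_ctx = repetition_attenuation(peaks["C1"], peaks["C2"])
    ra_novel_ctx = repetition_attenuation(peaks["F1"], peaks["F2"])
    return ra_repeated_ctx - ra_novel_ctx

# Illustrative peak responses in percent signal change (made up, not study data).
example_peaks = {"C1": 1.0, "C2": 0.6, "F1": 1.0, "F2": 0.9}
interaction = item_by_context_interaction(example_peaks)
```

A value near zero would match the stimulus-only account, whereas a positive value would match the temporal context account.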
There was a main effect of Item (F[1,17]=12.42, p<0.005), reflecting robust overall RA (Novel Item > Repeated Item). There were no main effects of Hemisphere or Context (ps>0.11). There was an interaction between Item and Hemisphere (F[1,17]=10.65, p<0.005), reflecting more RA in Right vs. Left PPA. Critically, there was a significant interaction between Item and Context (F[1,17]=4.77, p<0.05), with more RA for Repeated Context vs. Novel Context (Fig. 2). No other interactions reached significance (ps>0.85). Planned comparisons revealed RA bilaterally for Repeated Context (Right: t=3.69, p<0.005; Left: t=2.24, p<0.05) but only in right PPA for Novel Context (Right: t=2.37, p<0.05; Left: t<1).
We interpret greater RA in the Repeated Context condition as reflecting activation by the Repeated Item of a temporal-context-dependent representation. An alternative explanation is that the greater Repeated Context RA reflected carryover of RA from the preceding context items: Repeated Items in Repeated Contexts were preceded by two other repeated items, while Repeated Items in Novel Contexts were preceded by two novel items. That is, repeated context items may have suppressed PPA in advance of the Repeated Item, ensuring a weaker response.
We ran a control experiment with new subjects to evaluate this alternative. Instead of presenting a Repeated Item in the Repeated Context, a new item (Novel Item 2) was presented instead (Fig. 3A). If greater RA in the Repeated Context condition of the main experiment can be attributed to the repetition of context items, then Novel Item 2 should elicit a weaker response than the original Novel Item 1. Because the control experiment design was not factorial (Repeated Context contained only Novel Items, Novel Context contained Repeated and Novel Items), we relied on planned comparisons.
The Novel Context condition replicated the main experiment: RA (Novel Item > Repeated Item) was found in Right (t=2.78, p<0.05) but not Left PPA (t=1.86, p=0.08). Critically, in the Repeated Context condition, no difference between Novel Item 1 and Novel Item 2 was observed in Right or Left PPA (Fig. 3B; ps>0.62). Two observations already argue against a lack of sensitivity: we successfully obtained RA for Novel Context, and the Repeated Context difference was numerically in the wrong direction. In addition, a bootstrapping analysis revealed that this failure to reject the null hypothesis was unlikely to be due to a lack of statistical power (p=0 that the Novel Item 1 – Novel Item 2 difference was obtained from the same distribution as the Novel Item – Repeated Item difference for Repeated Context in the main experiment). Thus, repeated context items per se did not produce the stronger RA in the Repeated Context condition of the main experiment.
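The details of the bootstrapping analysis are not given, so the Python sketch below shows only one plausible form of such a test. It assumes that samples the size of the control group are drawn with replacement from the main-experiment RA values, and that the p-value is the proportion of resampled means at or below the control difference. The data and all names are made up for illustration.

```python
import random

def bootstrap_p(main_ra, control_mean, n_control, n_boot=10_000, seed=0):
    """Proportion of resampled main-experiment RA means at or below control_mean.

    Each resample draws n_control subjects with replacement from main_ra,
    mimicking the smaller control sample (an assumed procedure).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(main_ra, k=n_control)
        if sum(sample) / n_control <= control_mean:
            hits += 1
    return hits / n_boot

# Made-up per-subject RA values (percent signal change), not study data.
fake_main_ra = [0.2, 0.3, 0.4, 0.5, 0.6, 0.35, 0.45, 0.25, 0.55,
                0.3, 0.4, 0.5, 0.2, 0.6, 0.45, 0.35, 0.5, 0.4]
p = bootstrap_p(fake_main_ra, control_mean=-0.05, n_control=8)
```

In a test of this kind, a proportion of exactly zero means that none of the resamples produced a mean as small as the control difference.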
Exploratory analyses were conducted to consider effects outside of PPA (corrected p<0.05, center-of-gravity MNI coordinates). The main effect of Item (Novel Item > Repeated Item) produced two clusters, right PHC (31, −42, −17) and left transverse occipital sulcus (TOS; −34, −87, 9). Like PPA/PHC, the TOS is a scene-selective region. The interaction of Item and Context (Repeated Context RA > Novel Context RA) did not reveal any clusters at the corrected threshold. The simple effect of Item for Repeated Contexts produced one cluster, right PHC (31, −46, −20), but no clusters were obtained for Novel Contexts at the corrected threshold. These results mirror the ROI data, with stronger RA in right PHC and in Repeated Contexts, although we benefitted from the increased sensitivity of ROI analyses for observing the interaction of Item and Context in PPA.
A targeted analysis of the hippocampus was also conducted based on its involvement in contextual processing (uncorrected p<0.01). The main effect of Item did not reveal any clusters above the extent threshold. The interaction of Item and Context produced one cluster in right hippocampus (28, −17, −15). This interaction did not reflect stronger Repeated Context RA, since the simple effect of Item for Repeated Contexts only produced a cluster in left hippocampus (−26, −27, −11). Rather, the interaction reflected weaker Novel Context RA, as evidenced by a negative simple effect of Item (Repeated Item > Novel Item) for Novel Contexts in the same right hippocampus region (29, −17, −16). The RA in left hippocampus for Repeated Contexts is analogous to findings that this region responds less to items expected in the current context (e.g., Kumaran and Maguire, 2006; Morgan et al., 2011). The repetition enhancement in right hippocampus may reflect the ability in TCM for items to retrieve their associated context (Sederberg et al., 2008), which was possible for Repeated Items in Novel Contexts, but impossible for all Novel Items (no associated context) and unnecessary for Repeated Items in Repeated Contexts (correct context was already active).
Our findings provide evidence that scene representations in PPA depend on temporal context. In the main experiment, the magnitude of RA was more than 4× greater in the Repeated Context condition (0.41%) than in the Novel Context condition (0.09%). Because previous studies used randomized trial histories (e.g., Turk-Browne et al., 2006), which mirrored our Novel Context condition, they may have underestimated the true magnitude of RA. These studies intended to repeat stimuli, but assumed that the brain’s definition of a repetition matched the experimenter’s: that a repetition occurred when a stimulus had been seen before, irrespective of whether it had appeared in the same temporal context. We demonstrate that the definition of a scene repetition should be broadened to include temporal context in PPA.
There are two potential explanations of how temporal context affects RA. First, PPA may exhibit hysteresis, a general property of many mechanical, biological, and economic systems in which output can only be predicted from input with knowledge about initial conditions (i.e., the starting state of the system). Recent visual experience may alter the state of PPA by inducing persistent activity, which may in turn provide scaffolding for the representation of a novel scene. When the scene is re-encountered, it may not fully benefit from this prior experience unless PPA has been returned to the same state by repeating the temporal context.
Second, temporal context may affect RA as a result of prediction. TCM provides a potential mechanism for such predictions, whereby the current context—a recency-weighted running average of experience—cues the retrieval of information that was previously bound to that context. In our case, the context includes the preceding two items, which may cue the retrieval of the third item. TCM was developed based on word list recall and other episodic memory tasks, and linked to the hippocampus and medial temporal lobe more broadly (e.g., Kumaran and Maguire, 2006; Jenkins and Ranganath, 2010; Manning et al., 2011; Morgan et al., 2011; Tubridy and Davachi, 2011). Our exploratory analyses of the hippocampus are consistent with the possibility that these mechanisms operate incidentally and may be responsible for context-dependent RA in PPA. Specifically, the hippocampus may spontaneously retrieve items associated with the current context and reinstate them in PPA (Turk-Browne et al., 2010), such that greater RA occurs when a predicted scene appears.
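The idea of context as a recency-weighted running average can be illustrated with a toy Python sketch. It uses a plain exponential moving average over one-hot item vectors, which is a simplification rather than the full TCM update rule; the drift rate `beta` and all names are hypothetical.

```python
import math

ITEMS = "ABCDEFGH"  # item labels following the triplet notation in the text

def one_hot(index, size):
    """Vector with a 1.0 at the given index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def update_context(context, item, beta=0.5):
    """Exponential moving average: new = (1 - beta) * old + beta * item."""
    return [(1 - beta) * c + beta * x for c, x in zip(context, item)]

def context_after(sequence):
    """Context vector after presenting a sequence of item labels."""
    context = [0.0] * len(ITEMS)
    for name in sequence:
        context = update_context(context, one_hot(ITEMS.index(name), len(ITEMS)))
    return context

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Repeated Context: ABC both times. Novel Context: DEF, then GHF.
repeated_similarity = cosine(context_after("ABC"), context_after("ABC"))
novel_similarity = cosine(context_after("DEF"), context_after("GHF"))
```

In this toy model the context at the final item is identical across presentations when the preceding items repeat, but only partly overlapping when they are replaced, even though the final item itself is the same.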
Neural models of RA (see Grill-Spector et al., 2006) often focus on local processes and interactions (i.e., fatigue and sharpening models). In contrast, the prediction account fits best with a facilitation model in which hippocampal or other feedback potentiates the representation of the predicted item, reducing processing latency/duration (or prediction error; Friston, 2005). The fact that expectations about stimulus repetition enhance RA has been interpreted similarly (Summerfield et al., 2008). Our findings represent a novel discovery because the general probability of item repetition was held constant throughout our study, with predictability depending only on the identities of the preceding context items and prior learning of specific item-item associations.
As an implicit measure, RA may provide a new way to answer fundamental questions about context. For example: How many preceding items need to be repeated to elicit maximal RA? Do they need to be repeated in the same order? How is the length of the context window affected by stimulus complexity? Are there stable individual differences in context window length and do they predict memory abilities? Answers to such questions have important implications for theoretical models of memory such as TCM, where a definition of temporal context has proven elusive.
The current findings only license conclusions about scenes and PPA. However, they are consistent with three broader interpretations: First, temporal context may be an important property of how all objects are represented in their preferred category-selective areas. Second, effects of temporal context may be limited to PPA and to scenes, possibly reflecting a greater importance of temporal context for that category. Third, effects may be limited to PPA but occur there for objects from any category (cf. debate about semantic context in PHC; Bar et al., 2008; but see Epstein and Ward, 2010). By establishing this novel phenomenon, we hope to stimulate future research to adjudicate between these possibilities. For instance, each account makes a different prediction about whether temporal context should influence RA for other objects (e.g., faces): in their preferred area (e.g., fusiform face area), at all, or in PPA, respectively.
By integrating past and present, PPA and PHC more generally may serve as a bridge between perception and memory. Given the importance of temporal context in episodic memory (Howard and Kahana, 2002; Sederberg et al., 2008), our findings suggest a parsimonious explanation for why this region is frequently implicated in both domains (e.g., Brewer et al., 1998; Wagner et al., 1998; Turk-Browne et al., 2006). Ultimately, the context-dependence of scene representations in PHC may help us encode and navigate complex environments. Indeed, PHC has been implicated in spatial navigation beyond the processing of scenes (e.g., Aguirre et al., 1996; Janzen and van Turennout, 2004; Zhang and Ekstrom, 2012). Temporal context may help PHC to encode the structure of the environment in the service of navigation, both by differentiating similar looking scenes that are located in different places and by stitching together different looking scenes that appear nearby in space.
We thank Jon Cohen, Lila Davachi, and Ken Norman for helpful conversations, and Alexa Tompary for technical assistance. This work was supported by National Eye Institute grant R01EY021755 to NTB, and does not necessarily represent the official views of the National Eye Institute or the National Institutes of Health.