Memory for naturalistic events over short delays is important for visual scene processing, reading comprehension, and social interaction. The research presented here examined relations between how an ongoing activity is perceptually segmented into events and how those events are remembered a few seconds later. In several studies participants watched movie clips that presented objects in the context of goal-directed activities. Five seconds after an object was presented, the clip paused for a recognition test. Performance on the recognition test depended on the occurrence of perceptual event boundaries. Objects that were present when an event boundary occurred were better recognized than other objects, suggesting that event boundaries structure the contents of memory. This effect was strongest when an object’s type was tested, but was also observed for objects’ perceptual features. Memory also depended on whether an event boundary occurred between presentation and test; this variable produced complex interactive effects that suggested that the contents of memory are updated at event boundaries. These data indicate that perceptual event boundaries have immediate consequences for what, when, and how easily information can be remembered.
One function of perception is to divide continuous experience into discrete parts, providing a structure for selective attention, memory, and control. This is readily observed in scene perception, in which objects are segmented from backgrounds (e.g., Biederman, 1987; Vecera, Behrmann, & McGoldrick, 2000; Woodman, Vecera, & Luck, 2003) and in discourse processing, in which transitions between clauses and narrated situations influence reading times and discourse memory (e.g., Clark & Sengul, 1979; Glenberg, Meyer, & Lindem, 1987). Similarly, an online perceptual process called event segmentation divides ongoing activity into events (see Zacks, Speer, Swallow, Braver, & Reynolds, 2007, for an in-depth review). For example, when watching someone boil water, an observer might divide the actor’s activity into getting a pot from a rack, filling the pot with water, setting the pot on the burner, turning on the burner, and bringing the water to a boil. The experiments presented in this paper investigated whether event segmentation also provides a structure for event memory: Because event segmentation separates “what is happening now” from “what just happened,” it may impact the ease with which recently encountered information is remembered. For example, it may be more difficult for an observer to retrieve information about the pot-rack once the “getting-a-pot” activity has ended and the “filling-the-pot-with-water” activity has begun.
Previous research on event segmentation provides compelling evidence that it is an important and ongoing component of perception (Zacks & Swallow, 2007). Event segmentation is commonly measured by asking participants to explicitly identify event boundaries, which separate natural and meaningful units of activity (Newtson, 1973). However, functional neuroimaging data indicate activities are segmented even as naïve observers passively view activities (Speer, Swallow, & Zacks, 2003; Zacks et al., 2001; Zacks, Swallow, Vettel, & McAvoy, 2006). In addition, observers tend to agree about when event boundaries occur (Newtson, 1976). This likely reflects observers’ tendency to segment events at points of changes. Changes may be in perceptual information, such as an actor’s position and object trajectories (Hard, Tversky, & Lang, 2006; Newtson, Engquist, & Bois, 1977; Zacks, 2004), or in conceptual information, such as an actor’s location, intentions, or goals (Speer, Zacks, & Reynolds, 2007). For example, when watching a person read a book on a couch, observers might identify an event boundary when the actor changes his position from sitting to lying down and again when he closes the book, signaling a change in his goals. However, event boundaries are not identified when large shifts in visual input that accompany cuts in film occur (e.g., a cut from a wide-angle shot to a close-up), unless the cut coincides with a change in the scene or activity (Schwan, Garsoffky, & Hesse, 2000). These data suggest that event boundaries may be characterized as points of perceptual and conceptual changes in activity separated by periods of relative stability.
Event Segmentation Theory (EST) offers a theoretical perspective on how the neurocognitive system implements event segmentation (Zacks et al., 2007). At its core, EST claims that segmentation is a control process that regulates the contents of active memory. According to EST, observers build mental models of the current situation (event models) to generate predictions of future perceptual input. Event models are based on current perceptual input and semantic representations of objects, object relations, movement and statistical patterns, and actor goals for the type of event currently perceived (event schemata, Bartlett, 1932; Glenberg, 1997; Johnson-Laird, 1989; Rumelhart, Smolensky, McClelland, & Hinton, 1986; Zwaan & Radvansky, 1998). For as long as they accurately predict what is currently happening, event models are shielded from further modification by a gating mechanism. When the event changes, the accuracy of predictions generated from the event model decreases and prediction error increases. High levels of prediction error trigger the gating mechanism to open, causing the event model to be reset and rebuilt. When event models are rebuilt, incoming perceptual information (such as information about objects and actors) is processed in relation to other elements of the event and to semantic representations. Once accurate perceptual predictions can be generated from the event model, the gate closes to prevent further modification of the event model. EST proposes that event boundaries correspond to those moments when event models are reset and updated with new information.
EST draws on several theories of discourse processing, comprehension, and cognitive control. The Structure Building Framework (Gernsbacher, 1985) and Event Indexing Model (Zwaan & Radvansky, 1998) indicate that observers build mental models of the current situation in order to comprehend a narrated situation. They also propose that mental models are either rebuilt or updated when information that is incongruent with the current model is encountered. However, EST proposes that event perception is predictive (cf., Wilson & Knoblich, 2005), rather than integrative (Gernsbacher, 1985; Zwaan & Radvansky, 1998). Several models of working memory and cognitive control also posit that a gating mechanism shields active representations of one’s own goals from perceptual input (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Frank, Loughry, & O’Reilly, 2001; O’Reilly, Braver, & Cohen, 1999). The gating mechanism proposed in EST is similar to these in implementation (Reynolds, Zacks, & Braver, 2007). However, unlike these models EST claims that active memory for the current, observed event is flushed when the model is updated.
EST has strong implications for long-term memory for events (episodic memory) and for the short-term accessibility of information relevant to those events. First, event boundaries should have a privileged status in long-term memory. When the gating mechanism opens at event boundaries, boundary information should be processed more fully, making greater contact with relevant semantic knowledge of the current event and activities. Individual objects and actors that are present at boundaries should also be processed in relation to each other as a part of the formation of a new event model. Second, EST’s claim that models of the current event are actively maintained in memory suggests that different mechanisms are used to retrieve information from previous events (stored in long-term, weight-based representations) than are used to retrieve information from the current event.
Evidence in favor of the prediction that long-term memory is better for boundaries than for nonboundaries is strong: Information is better encoded and later retrieved if it is presented at event boundaries rather than at nonboundaries (Baird & Baldwin, 2001; Boltz, 1992; Hanson & Hirst, 1989, 1991; Lassiter, 1988; Lassiter & Slaw, 1991; Lassiter, Stone, & Rogers, 1988; Newtson & Engquist, 1976; Schwan & Garsoffky, 2004; Zacks, Speer, Vettel, & Jacoby, 2006). For example, after watching a film showing goal-directed activities, observers better recognize movie frames from boundary points than from nonboundary points (Newtson & Engquist, 1976). In another study, participants were asked to view complete films, films that preserved event boundaries and omitted nonboundaries, and films that preserved nonboundaries and omitted event boundaries (Schwan & Garsoffky, 2004). Event recall and recognition were similar for complete movies and for movies that preserved event boundaries, but were poor for movies that omitted event boundaries. Thus, memory for events appears to rely on the information that is presented at event boundaries.
The majority of evidence that event boundaries affect memory for recent information comes from research on the mental representations used to understand text and discourse. Early work on this topic demonstrated that people’s ability to reproduce discourse verbatim was markedly compromised after syntactical boundaries (e.g., the end of a clause or sentence; Clark & Sengul, 1979; Jarvella, 1979). More recently, others have examined how changes in the situation described in text and discourse influence the accessibility of recently encountered information. This work has been primarily driven by the proposal that readers’ comprehension relies on models that represent the current described situation (situation models; Gernsbacher, 1985; Glenberg, 1997; Johnson-Laird, 1989; Zwaan & Radvansky, 1998). The claim is that situation models are updated when the situation changes. Therefore, situation changes should take longer to process and should mark when information from the previous situation becomes more difficult to retrieve. Reading time data are consistent with this proposal (Mandler & Goodman, 1982; Zwaan, Magliano, & Graesser, 1995; Zwaan, Radvansky, Hilliard, & Curiel, 1998) and a large body of research shows that situation changes alter the accessibility of information recently presented in text (Glenberg, 1997; Johnson-Laird, 1989; Levine & Klin, 2001; Rapp & Taylor, 2004; Zwaan, Langston, & Graesser, 1995; Zwaan & Radvansky, 1998). Thus, when a protagonist is described as taking off his sweatshirt, putting it down, and then running away, readers have more difficulty recognizing “sweatshirt” than if the protagonist had been described as putting on the sweatshirt (Glenberg et al., 1987).
Other work has tied situation changes specifically to event boundaries in text and in film (Magliano, Miller, & Zwaan, 2001; Speer & Zacks, 2005; Speer et al., 2007) and has shown that readers have difficulty accessing information that was encountered prior to event boundaries in text and picture stories (Gernsbacher, 1985; Speer & Zacks, 2005). Finally, there is limited evidence that changes in observed activity and in one’s own spatial location reduce the accessibility of information encountered before the change occurred (Carroll & Bever, 1976; Radvansky & Copeland, 2006). These studies offer indirect support for the hypothesis that boundaries in perceived events impact the retrieval of event information. However, none have directly evaluated the relationship between event segmentation in perception and memory for recently encountered information.
These data suggest that information from the current event should be more quickly and accurately retrieved than information from a previous event. However, it is also possible that information maintained in event models may interfere with retrieval from the current event, but not with retrieval from previous events. Interference from multiple competing representations may increase with increased similarity between the to-be-retrieved item and other information maintained in memory, and when the learning and retrieval situations are similar (Anderson & Neely, 1996; Bunting, 2006; Underwood, 1957). Therefore, under some circumstances information from the current event may be more difficult to retrieve than information from the previous event.
In summary, activities are segmented into smaller events as they are perceived. People tend to segment events when there are changes in actors’ positions or movement characteristics, changes in locations, and changes in the goals or intentions of actors. According to EST, when an event boundary occurs mental representations of the current event are updated and actively maintained until the next boundary. This theory suggests that information should be better encoded at event boundaries and that the accessibility of recently encountered information should change once an event boundary occurs.
The goals of these experiments were twofold. First, three experiments investigated the association between event segmentation and encoding and retrieval. EST proposes that active representations of “what is happening now” are built at event boundaries. If this is the case, the occurrence of event boundaries during object presentation (presentation-boundaries) should lead to additional processing of those objects. Objects present when an event boundary occurs (boundary objects) should therefore be better encoded than objects for which no boundaries occurred during presentation (nonboundary objects). For example, in one of our stimulus movies a man and his sons are gathering bed sheets. After they gather the sheets, the film cuts to a shot of the actors carrying the laundry down the stairs. An event boundary occurs soon after the cut (reflecting a change in activity and location). A chandelier is on the screen when the boundary occurs (making it a boundary object) and it is later tested. According to EST a new event model should be constructed at the event boundary, and it should contain information about the chandelier and other objects in the scene (such as the pictures on the walls), the objects’ configuration, and the actors’ new inferred goals. The chandelier is processed more as a part of the formation of an event model. As a result, it should be remembered better than if it had been presented when no boundaries occurred (e.g., after the actors started going down the stairs).
A second prediction of EST is that event boundaries should influence retrieval by changing the accessibility of recently presented objects. Event boundaries that occur between object presentation and test (delay-boundaries) determine whether an object must be retrieved from the current event or from a previous event. Delay boundaries could have two effects on accessibility: They could reduce the accessibility of objects from previous events (which need to be retrieved from long-term memory) or they could increase the accessibility of objects from previous events (which should be less susceptible to interference from similar information in active memory). Finally, because the accessibility of an object from a previous event should depend on whether it has been encoded into long-term memory, the effect of delay-boundaries should differ for boundary and nonboundary objects. In the previous example, after the actors go down the first set of stairs the film cuts to a shot showing them carrying the laundry into a basement. An event boundary occurs at this point, reflecting the change in the actors’ location. At this point in time the model for the previous event would be flushed, and a new event model built. Because the chandelier was presented in the previous event it would now need to be retrieved from long-term memory. If the chandelier had not been encoded into long-term memory (which is likely for nonboundary objects), it should be poorly recognized; if it had been encoded into long-term memory (which is likely for boundary objects), the chandelier should still be recognizable.
The second goal of these studies was to characterize the types of information that are stored in event models and that contribute to event memory. Early work on discourse memory demonstrated a dissociation between memory for the lexical content and syntactic structure of a sentence and memory for the meaning of a sentence. Although participants’ ability to recognize changes to the surface features of a sentence drops once a second sentence is presented, their ability to recognize changes to semantic content is well preserved (Sachs, 1967, 1974). The second and third experiments investigated memory for two similar types of information. Like semantic content, conceptual information in scenes consists of object, character, and scene types (Rosch, 1978). Like surface information, perceptual information in scenes allows one to discriminate between individuals in a category; it includes color, shape, orientation, size, and statistical structures. Perceptual information is distinguished from sensory primitives that form an image-like representation of the scene and that have undergone little processing (Oliva, 2005; Schyns & Oliva, 1994; Tarr & Bülthoff, 1998). In tests of conceptual memory, participants chose between a picture of an object that was the same type of object as the one being tested (e.g., a different chandelier) and a picture of an object that was a different type of object (e.g., a ceiling fan). In tests of perceptual memory, participants chose between a picture of the object from the movie (e.g., the chandelier) and a picture of the same type of object (e.g., a different chandelier). EST suggests that memory for both conceptual and perceptual information will be better for boundary objects than for nonboundary objects (event models should be constructed from semantic and perceptual input at event boundaries).
Furthermore, if the purpose of event models is to generate perceptual predictions, memory for perceptual information may be more susceptible to delay-boundaries than memory for conceptual information (see also Gernsbacher, 1997).
To examine the relationship between event segmentation and memory, these experiments used clips from commercial films that presented objects within the context of common, everyday activities. The movies were engaging and activities complex, encouraging attention to the activities in the films. This should reduce the role of strategies that ignore the activities in favor of attending to objects. It also avoids concerns over the realism of materials constructed in the laboratory, which may appear contrived and could lack the variety and complexity of events encountered in everyday life. An important drawback to this approach, however, is that it precludes the assignment of individual objects to each of the experimental conditions. Therefore, pre-existing differences in the objects (e.g., the speed with which the object can be identified in a scene) were evaluated in two pilot experiments and the analyses used regression to statistically control for these and other object features (e.g., object size).
Experiment 1 tested recognition memory for objects presented in film to evaluate the two predictions derived from EST: first, that information encoded during an event boundary would be better remembered; second, that retrieving information from a previous event would differ from retrieving information from the current event. Previous research has demonstrated that long-term memory for movie frames is better when the frames come from event boundaries than when they come from nonboundaries (Newtson & Engquist, 1976). However, the extant data on the effect of event boundaries on episodic memory say little about its underlying mechanisms: Because whole scenes were tested with foils chosen unsystematically, little is known about the content of the affected representations. Previous tests have all been conducted with delays greater than several minutes, so it is unclear how quickly these effects appear. In addition, previous work has not directly examined the relationship between event boundaries and object memory in ongoing, perceived events at short delays.
For this experiment, participants watched movie clips that showed actors engaged in everyday activities in naturalistic environments. About once a minute the clips were stopped for a forced choice recognition test measuring memory for an object presented five seconds earlier. Gaze position was monitored while the movie played. After each movie, participants answered questions about the activities and goals of the characters in the clips; these were included to motivate participants to attend to the actors’ activities. Prior to the experiment, a separate group of observers segmented the movies, allowing us to vary two factors across trials. The presentation-boundaries factor described whether an event boundary occurred during object presentation, and the delay-boundaries factor described whether an event boundary occurred during the five second delay between object presentation and test.
Fifty-two participants (33 female, 18–28 years old) from the Washington University community participated in this study for partial fulfillment of course credit or for pay. The Washington University Human Research Protection Office approved all consent, recruitment, and experimental procedures. Data from an additional seven participants were collected and replaced due to noisy eye data (n = 4), failure to follow instructions (n = 2), or failure to complete the experiment (n = 1).
Stimuli were presented with PsyScope X software (Cohen, MacWhinney, Flatt, & Provost, 1993) on a 22″ NEC Multisync monitor and were controlled by a Power Macintosh G4 (Apple Inc., Cupertino, CA). Headphones were used for auditory presentation. Gaze position and pupil diameter were measured and recorded with an IScan RK-426PC Pupil/Corneal Reflection Tracking System (ISCAN, Inc., Burlington, MA) connected to a Pentium II Personal Computer. Eye data recording was synchronized to the onset of the clips via the PsyScope button box.
Participants were seated with their eyes approximately 86 cm from the computer monitor. Gaze position and pupil diameter were recorded at 60 samples per second. Gaze position was calibrated for 9 equidistant points on the screen at the beginning of the experiment and was converted to visual degrees offline. Blinks were identified with an algorithm that searched for rapid, transient drops in pupil diameter. Gaze position was interpolated over blinks (Beatty & Lucero-Wagoner, 2000).
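The blink-handling step described above can be illustrated with a minimal sketch. The paper does not specify the detection algorithm, so the rule used here (flagging samples whose pupil diameter falls below a fraction of the median) and the names `find_blinks`, `interpolate_over_blinks`, and `drop_fraction` are assumptions made for illustration only. Gaze position is then linearly interpolated across each flagged run.

```python
from statistics import median

def find_blinks(pupil, drop_fraction=0.5):
    """Flag samples whose pupil diameter drops below drop_fraction * median.

    The threshold rule is a hypothetical stand-in for the unspecified
    blink-detection algorithm.
    """
    baseline = median(pupil)
    return [d < drop_fraction * baseline for d in pupil]

def interpolate_over_blinks(gaze, is_blink):
    """Linearly interpolate one gaze coordinate across flagged runs.

    Runs at the start or end of the record copy the nearest valid sample.
    """
    out = list(gaze)
    n = len(out)
    i = 0
    while i < n:
        if not is_blink[i]:
            i += 1
            continue
        start = i
        while i < n and is_blink[i]:
            i += 1
        end = i  # index of the first valid sample after the run, or n
        left = out[start - 1] if start > 0 else None
        right = out[end] if end < n else None
        if left is None and right is None:
            break  # no valid samples at all; leave the data unchanged
        for k in range(start, end):
            if left is None:
                out[k] = right
            elif right is None:
                out[k] = left
            else:
                frac = (k - start + 1) / (end - start + 1)
                out[k] = left + frac * (right - left)
    return out
```

In practice the function would be applied separately to the horizontal and vertical gaze coordinates, which were sampled at 60 Hz.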
Clips from four commercial movies were selected according to the following criteria: the clips depicted characters engaged in everyday activities with little dialogue; the clip’s setting was natural and realistic; the clips presented objects suitable for testing once per minute, on average; the clips from a given film lasted at least 7 minutes, though the footage need not be contiguous in the original film. Five clips from the films Mr. Mom (Dragoti, 1983), Mon Oncle (Tati, 1958), One Hour Photo (Romanek, 2002), and 3 Iron (Ki-Duk, 2004) met these criteria. Scenes from the film 3 Iron (2004) were presented in two clips to allow the experimenter to introduce a new character in the second clip. Clips were edited to respect the natural breaks in the film. The beginnings and ends of each clip were padded with 5 seconds of a black screen. A practice clip was taken from the movie The Red Balloon (size, in visual degrees: 17.8° × 13.3°; Lamorisse, 1956). Clip properties and content are described in Tables 1 and 2.
To determine when normative event boundaries occurred in the clips, an independent group of 16 participants watched the clips twice; once to indicate when they believed boundaries between large units of activity occurred, and again to indicate when they believed boundaries between small units of activity occurred (coarse- and fine-segmentation tasks). For a given clip and grain, the button-press data from all participants were combined and smoothed with a Gaussian kernel (bandwidth = 2.5 seconds for coarse boundaries and 1 second for fine boundaries) to obtain the density of button presses for each millisecond of the clip. Peaks in the smoothed density functions were identified and sorted in decreasing order of density. Normative event boundaries were defined as the N time points with the highest peaks, where N equaled the mean number of times participants pressed the button for that clip and segmentation grain (see Table 3). This procedure identified event boundaries that were most characteristic of observers’ segmentation in terms of temporal location and number. 1
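The boundary-identification procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the density is sampled on a coarser grid (`step`, in seconds) than the per-millisecond grid described above to keep the pure-Python example fast, the kernel bandwidth is treated as the Gaussian standard deviation, and the mean number of button presses is rounded to the nearest integer to obtain N.

```python
import math

def smoothed_density(press_times, duration, bandwidth, step=0.1):
    """Sum of Gaussian kernels centered on each press, sampled every `step` s."""
    n = int(round(duration / step)) + 1
    grid = [i * step for i in range(n)]
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    density = [
        sum(norm * math.exp(-0.5 * ((t - p) / bandwidth) ** 2)
            for p in press_times)
        for t in grid
    ]
    return grid, density

def normative_boundaries(presses_by_participant, duration, bandwidth, step=0.1):
    """Return the N highest density peaks, in temporal order.

    N is the mean number of presses per participant, rounded (an assumption).
    """
    pooled = [p for presses in presses_by_participant for p in presses]
    grid, density = smoothed_density(pooled, duration, bandwidth, step)
    # Local maxima of the smoothed density function.
    peaks = [
        i for i in range(1, len(grid) - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    ]
    # Sort peaks in decreasing order of density and keep the top N.
    peaks.sort(key=lambda i: density[i], reverse=True)
    n_boundaries = round(len(pooled) / len(presses_by_participant))
    return sorted(grid[i] for i in peaks[:n_boundaries])
```

For example, with three hypothetical observers who press near 10 s and 30 s, and one who adds a stray press at 20 s, a bandwidth of 1 s yields two normative boundaries, close to 10 s and 30 s.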
Two-alternative forced choice recognition tests were administered during the first viewing of each clip. Most of the time (74.5%), the clip stopped for an object test, which tested memory for objects presented in the film. Other times (25.5%) the clip stopped for an event test, which tested memory for the activities depicted in the film. The object test alternatives were an image of an old object, which was an object presented in the clip and an image of a different type object, which was an object that was not in the clip but was contextually appropriate. For example, in one scene a rectangular aquarium is shown in a child’s room. Participants were asked to choose between a picture of the rectangular aquarium and a picture of a baseball glove, which was not shown in the clip. In all, there were 35 object tests and 12 event tests.
To select the old objects, all objects in the clips that were continuously visible for at least one second and that were dissimilar to other objects in the clip were identified. Objects that were considered likely to be fixated and were identifiable outside the clip were selected. When possible, test images were made by tracing the objects and pasting them onto a white background. When this resulted in pictures that were deemed difficult to identify (usually due to scene lighting), objects were cropped with a rectangular bounding box, which preserved a small amount of the surrounding scene.
The foil for the recognition test consisted of an object (token) from a category of objects that was deemed contextually appropriate for the clip (e.g., a baseball glove for a scene in a child’s room) but that did not occur in the clip for up to five seconds after the old object was presented. Foils therefore differed from the old objects in category (different type objects). A second set of recognition test alternatives was generated at the same time for use in later experiments. These same type objects were tokens from the same category of objects as the old object (e.g., a round aquarium instead of a rectangular aquarium). Images for the different type and same type objects were obtained from pictures that featured the objects as the primary subject. These pictures were obtained from the Big Box of Art (Hemera, 2003), online searches, and from personal photographs. Test alternatives were constructed with the following steps: 1) the object was traced and cut from the original photograph; 2) the object was pasted into the frame containing the old object; 3) the object was moved and resized to fit within the scene; 4) the color values, saturation levels, contrast, resolution and noise of the object were adjusted until they matched the rest of the frame; and 5) the object was cut from the frame with the procedure used to cut the old object from the frame.
Two control experiments were conducted. In a modified match-to-sample task participants were shown the frame from the clip from which the old object image was constructed. Below the frame, two objects were presented and participants were asked to determine which object matched an object in the frame. One group of 10 participants was shown the old object and same type object and was asked to indicate which exactly matched an object in the frame (perceptual match). Another group of 10 participants was shown the same type object and the different type object and was asked to indicate which object was the same type as an object in the frame (conceptual match). Objects were excluded if fewer than 80% of participants were able to correctly perform the perceptual and conceptual match tasks or if response times for that object were greater than two standard deviations above the mean response time for all objects.
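The exclusion rule can be expressed as a short sketch. Several details are assumptions: the accuracy criterion is applied to each match task separately, response times are summarized as one mean per object, the sample standard deviation is used, and the function name and data layout are hypothetical.

```python
from statistics import mean, stdev

def objects_to_exclude(accuracy, mean_rt, min_accuracy=0.80, sd_cutoff=2.0):
    """Return the set of object names that fail either criterion.

    accuracy: {object: {"perceptual": proportion, "conceptual": proportion}}
    mean_rt:  {object: mean response time in ms}
    """
    rts = list(mean_rt.values())
    # Objects slower than this limit are excluded (assumed: sample SD).
    rt_limit = mean(rts) + sd_cutoff * stdev(rts)
    excluded = set()
    for obj, acc in accuracy.items():
        too_inaccurate = min(acc.values()) < min_accuracy
        too_slow = mean_rt[obj] > rt_limit
        if too_inaccurate or too_slow:
            excluded.add(obj)
    return excluded
```

With ten hypothetical objects, one matched correctly by only 70% of participants in the perceptual task and another with a mean response time far above the rest, the function returns exactly those two objects.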
The final set of old objects was restricted so no two old objects occurred in the same five second period. The event tests and descriptions of the final set of old objects, different type objects, and same type objects are available online at http://dcl.wustl.edu/stimuli/.
Participants performed a memory task for which they viewed the clips and responded to event and object recognition tests. Clip order was counterbalanced across participants. All stimuli were presented on a white background. Participants started a trial by pressing the space bar. At the beginning of the memory task participants fixated a black cross (1° × 1°) at the center of the screen for 2 s after eye calibration was checked (to check calibration, the computer determined whether the participant looked within 2.5° of the fixation as instructed). The movie clip then appeared at the center of the screen. Approximately once a minute the clip stopped for an object or event recognition test. Recognition tests are illustrated in Figure 1. For object tests, the clip stopped five seconds after the old object left the screen. The question “Which of these objects was just in the movie?” appeared 4.15° above a fixation cross at the center of the screen. The old object and its corresponding different type object were presented 4.30° to the left and right of the fixation cross. Event tests proceeded in the same manner as object tests and occurred approximately five seconds after the end of the activity that was being tested. For these tests a question about the activity that occurred five seconds before the test was presented (e.g., “Who started the music?”), and the two answer choices consisted of the correct answer and a reasonable alternative (e.g., “The man” or “The woman”). Participants responded by pressing the “J” key for the answer choice on the left or the “K” key for the answer choice on the right. The side on which the correct choice appeared was randomly determined on each trial. The test remained on the screen until the participant responded. On the next trial, the last 10 seconds of the clip were replayed to reinstate the context of the film and to provide feedback on the recognition tests. This process was repeated until the entire clip was presented. The end of the clip was not followed by either an object or event test.
Five multiple-choice comprehension questions followed the end of each clip. These questions focused on the activities, intentions, and goals of the characters. Each question had four alternatives labeled “a” through “d;” participants responded by pressing one of four correspondingly labeled keys on the computer keyboard. Following correct responses the computer beeped, and following incorrect responses the computer buzzed. To ensure that the questions were not too difficult, an independent group of eight participants answered comprehension questions after watching each clip in its entirety. For three comprehension questions from the 3 Iron clips accuracy was near chance, and these questions were replaced prior to conducting the main experiment. After removing these items, mean individual accuracy was 94.3% (SD = 5.83), and the mean of their individual median response times was 7219 ms (SD = 998 ms).
Participants performed a practice session before the memory task. To encourage them to attend to the events depicted in the clip, six of the eight trials presented during the practice clip were event tests.
Following informed consent, each participant sat in front of the computer monitor and the experimenter calibrated the eye tracker. Afterwards, the instructions appeared on the screen, and the experimenter read these aloud. Participants were told that the actions of the characters were the primary focus of each clip and were asked to pay attention to the activities in order to answer several multiple-choice questions after each clip. They were also told that the clips would stop every now and then for a question about an activity or an object that was just presented in the clip, and they should answer these questions quickly and accurately. Because the clips often came from the middle of longer movies the experimenter read a brief introduction to the clip to provide the participant with relevant background information. The experimenter left the room after the practice, but returned between the clips to read the introductions. Participants were offered a break after the first half of the clips.
Two main factors were of interest. The presentation-boundary factor designated whether an event boundary (coarse or fine) occurred while the object was on the screen. If either a coarse or fine event boundary occurred during object presentation, the object was in the boundary object condition. If no boundaries occurred during object presentation, the object was in the nonboundary object condition. The delay-boundary factor designated whether an event boundary occurred during the five second delay between object presentation and test. For some objects, no boundaries occurred during the delay and these objects were therefore presented in the current event; the rest of the objects were presented in the previous event because at least one boundary, coarse or fine, occurred during the delay. The presentation-boundary and delay-boundary factors were crossed, resulting in four object test conditions (see Figure 2). There were 7 objects in the nonboundary object, current event condition, 8 objects in the boundary object, current event condition, 9 objects in the nonboundary object, previous event condition, and 11 objects in the boundary object, previous event condition. In all cases, the delay between object presentation and test was five seconds.
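The crossing of the two factors can be sketched in code. The following is a minimal illustration, not the authors' software: the function name `classify_object`, the pooling of coarse and fine boundaries into one list of time stamps, and the treatment of interval endpoints are all assumptions made for the example.

```python
from typing import Sequence, Tuple

DELAY_S = 5.0  # delay between object offset and test, as stated in the text


def classify_object(onset: float, offset: float,
                    boundaries: Sequence[float],
                    delay: float = DELAY_S) -> Tuple[str, str]:
    """Return (presentation condition, delay condition) for one old object.

    onset, offset: times (s) at which the object entered and left the screen.
    boundaries: times (s) of event boundaries, coarse and fine pooled.
    Endpoint handling (inclusive vs. exclusive) is an assumption.
    """
    # Boundary object: at least one boundary while the object is on screen.
    during_presentation = any(onset <= b <= offset for b in boundaries)
    # Previous event: at least one boundary during the delay before the test.
    during_delay = any(offset < b <= offset + delay for b in boundaries)
    presentation = "boundary" if during_presentation else "nonboundary"
    event = "previous" if during_delay else "current"
    return presentation, event


# An object on screen from 10 s to 12 s, with boundaries at 11 s and 15 s,
# is a boundary object tested after the event has changed.
example = classify_object(10.0, 12.0, [11.0, 15.0])
```

Each old object falls into exactly one of the four cells, which is what allows the two factors to be crossed in the analyses.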
A final factor coded whether the old object was fixated. The location and size of each old object were tracked with internally developed software. For each object the time during which gaze position fell within a bounding box drawn 0.25° around the outside of the object was calculated. If the participant’s gaze fell within the bounding box for at least 200 ms the object was coded as a fixated object. If not, the object was coded as a nonfixated object. For analyses that included fixation as a factor, trials on which measures of gaze position were too noisy were excluded.² Because individual fixation patterns varied, a given object could be in different fixation conditions for different participants.
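The fixation criterion amounts to a dwell-time threshold inside a padded bounding box. A minimal sketch follows, under several assumptions that are not in the text: gaze arrives as evenly spaced samples, the object's box is static for the duration of the samples (the original software tracked a moving object), and dwell time is cumulative rather than continuous. The name `is_fixated` and the `sample_ms` parameter are hypothetical.

```python
from typing import Iterable, Tuple

MARGIN_DEG = 0.25     # bounding-box margin around the object (from the text)
MIN_DWELL_MS = 200.0  # minimum time in the box to count as fixated (from the text)


def is_fixated(gaze: Iterable[Tuple[float, float]],
               box: Tuple[float, float, float, float],
               sample_ms: float) -> bool:
    """Code an object as fixated if gaze dwelt in its padded box long enough.

    gaze: (x, y) gaze samples in degrees of visual angle.
    box: (left, top, right, bottom) of the object, in degrees.
    sample_ms: duration of one gaze sample (assumed; depends on the tracker).
    """
    left, top, right, bottom = box
    # Expand the object's box by the margin on every side.
    left, top = left - MARGIN_DEG, top - MARGIN_DEG
    right, bottom = right + MARGIN_DEG, bottom + MARGIN_DEG
    inside = sum(1 for x, y in gaze
                 if left <= x <= right and top <= y <= bottom)
    # Cumulative dwell time is an assumption; the text does not say
    # whether the 200 ms had to be continuous.
    return inside * sample_ms >= MIN_DWELL_MS


# 50 samples of 4 ms each at the object's center give 200 ms of dwell time.
fixated = is_fixated([(0.0, 0.0)] * 50, (-1.0, -1.0, 1.0, 1.0), sample_ms=4.0)
```

Because the coding is applied per participant and per object, the same object can be fixated for one participant and nonfixated for another.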
Several variables were used as predictors in regression models of individual recognition test performance to account for differences in the old objects. These variables included the average size and eccentricity of the old object³ and median response times in the perceptual and conceptual match to sample tasks. (Accuracy was near ceiling.) Object size and eccentricity were similar across conditions (largest F(1, 31) = 2.15, p = .152), but were included to better control for differences across the objects. Mean response times on the perceptual and conceptual match tasks varied across conditions after accounting for differences in the size and eccentricity of the objects (delay-boundary × presentation-boundary interaction, perceptual match: F(1, 9) = 21.3, p = .001; conceptual match: F(1, 9) = 15.5, p = .003; Table 4). Because the match to sample tasks required participants to search for and identify the objects in the movie frames, response times should capture differences in the ease with which the objects could be perceived and encoded in the scene.
For the accuracy analyses, we calculated one logistic regression model for each individual, and entered estimates from these models into t-tests to determine the reliability of the effects across participants. For the figures and post-hoc analyses, accuracy was estimated for an “average” old object (size, eccentricity, and perceptual and conceptual match response times were set to the means for all old objects) for each trial and individual using the regression coefficients from the individual models. Post hoc tests were performed on the logits of accuracy.⁴ (Unless otherwise noted, all post hoc tests were Tukey’s HSD procedure, signified by qs.) Accuracy analyses were followed by analyses of response times to evaluate speed-accuracy trade-offs. For response time analyses, linear regression was performed for each individual and the residual response times from these regressions were then analyzed with analysis of variance. Response time analyses included correct and incorrect trials.
On average, participants correctly answered 73.4% (SD = 7.54) of the object tests with a median response time of 2724 ms (SD = 744).
Because of their fixation patterns several individuals did not have data in all eight conditions resulting from the crossing of the delay-boundary, presentation-boundary, and fixation factors. For this analysis and the following analyses, only individuals who had at least one trial in each of the eight conditions were included. Thirty-six of the 52 participants met this criterion.
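The inclusion rule is a completeness check on each participant's trials. A minimal sketch, assuming each trial has already been labeled with its three condition codes; the label strings and the name `has_complete_design` are illustrative and not taken from the original analysis code.

```python
from itertools import product
from typing import Iterable, Tuple

Condition = Tuple[str, str, str]  # (delay, presentation, fixation)

# The eight cells produced by crossing the three two-level factors.
ALL_CONDITIONS = set(product(("current", "previous"),
                             ("boundary", "nonboundary"),
                             ("fixated", "nonfixated")))


def has_complete_design(trial_conditions: Iterable[Condition]) -> bool:
    """True if the participant has at least one trial in every cell."""
    return ALL_CONDITIONS <= set(trial_conditions)


# A participant who never failed to fixate a boundary object from the
# current event is missing one cell and would be excluded.
missing_cell = ALL_CONDITIONS - {("current", "boundary", "nonfixated")}
included = has_complete_design(ALL_CONDITIONS)
excluded = not has_complete_design(missing_cell)
```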
As illustrated in Figure 3a, recognition accuracy was greater for boundary objects than for nonboundary objects (presentation-boundary effect, odds ratio: 1.65; mean logistic coefficient: 0.504; SD = 0.360; t(35) = 8.42, p < .001, d = 1.40). However, this difference was greatest for objects from previous events, resulting in a reliable interaction between the delay-boundary and presentation-boundary factors (delay-boundary × presentation-boundary interaction, odds ratio: 1.44; mean logistic coefficient: 0.363; SD = 0.456; t(35) = 4.79, p < .001, d = 0.798). Post-hoc analyses indicated that accuracy was greater for boundary objects from previous events than for boundary objects from the current event and this difference approached reliability (mean difference in estimated accuracy = .098, qs(35) = 3.72, p = .058). Accuracy was worse for nonboundary objects from the previous event than for nonboundary objects from the current event (mean change in estimated accuracy = −.167, qs(35) = −4.19, p = .027). Responses were more accurate when the object was fixated (odds ratio: 2.06; mean logistic coefficient: 0.721; SD = 0.781; t(35) = 5.54, p < .001, d = 0.923). There was no reliable difference in overall accuracy for objects from previous events and objects from current events (delay-boundary effect, odds ratio: 0.997; mean logistic coefficient: −0.003; SD = 0.409; t(35) = −0.043, p = .966, d = −0.007). No other effects were reliable (largest effect, odds ratio: 1.22; mean logistic coefficient: 0.200; SD = 0.715; t(35) = 1.68, p = .102, d = 0.280).
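The statistics reported for each effect are tied together arithmetically. The sketch below is a reader's check rather than the authors' analysis code: it assumes that the odds ratio is the exponentiated mean coefficient, that d is the mean coefficient divided by its standard deviation, and that t is a one-sample test of the coefficients against zero. The function name `summarize_coefficients` is hypothetical.

```python
import math


def summarize_coefficients(mean_coef: float, sd_coef: float, n: int) -> dict:
    """Group-level summaries for per-participant logistic coefficients."""
    d = mean_coef / sd_coef   # Cohen's d for a one-sample test against 0
    t = d * math.sqrt(n)      # one-sample t statistic with n - 1 df
    return {
        "odds_ratio": math.exp(mean_coef),  # exponentiated mean log-odds
        "d": d,
        "t": t,
        "df": n - 1,
    }


# Presentation-boundary effect: mean coefficient 0.504, SD 0.360, 36 participants.
stats = summarize_coefficients(mean_coef=0.504, sd_coef=0.360, n=36)
```

Under these assumptions the function returns values close to the reported odds ratio of 1.65, d of 1.40, and t(35) of 8.42; small discrepancies are expected because the inputs are rounded.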
One concern about this analysis is that a large number of participants were excluded because their fixation patterns did not allow for a complete crossing of the three factors. Therefore, a secondary analysis excluded the fixation factor and was performed on all 52 participants. The outcome of this analysis was largely consistent with the data from the more limited sample: The interaction between delay- and presentation-boundaries was reliable, as was the main effect of the presentation-boundary factor (smallest effect, odds ratio: 1.52; mean logistic coefficient: 0.418; SD = 0.459; t(51) = 6.57, p < .001, d = 0.911). However, responses were reliably more accurate for objects from previous events than for objects from the current event (delay-boundary effect, odds ratio: 1.18; mean logistic coefficient: 0.168; SD = 0.400; t(51) = 3.02, p = .004, d = 0.419).
Response times were also examined to ensure that the observed relationship between event boundaries and response accuracy was not due to speed-accuracy trade-offs. Response times are illustrated in Figure 3b. With one exception, those conditions associated with the highest response accuracy were associated with faster response times. Responses were faster for boundary objects than for nonboundary objects (presentation-boundary effect, −182 ms, F(1, 35) = 5.59, p = .024, ηp2 = .138). In addition, retrieving objects from the previous event was associated with an increase in response times for nonboundary objects but not boundary objects, leading to a reliable interaction between the delay-boundary and presentation-boundary factors (F(1, 35) = 8.52, p = .006, ηp2 = .197; delay-boundary effect for nonboundary objects: 340 ms; delay-boundary effect for boundary objects: −55 ms). The interaction was only present for fixated objects, and the three-way interaction between the delay-boundary, presentation-boundary, and fixation factors was reliable (delay-boundary × presentation-boundary × fixation interaction, F(1, 35) = 18.1, p < .001, ηp2 = .341). Overall, responses were faster to fixated objects than to nonfixated objects (fixation effect: 334 ms, F(1, 35) = 22.6, p < .001, ηp2 = .391). Higher accuracy for objects from previous events than for objects from the current event was associated with increased response times, indicating a speed-accuracy trade-off (delay-boundary effect: 147 ms, F(1, 35) = 5.56, p = .024, ηp2 = .137).
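The effect sizes reported alongside each F statistic can be recovered from the F value and its degrees of freedom. The following is a minimal sketch using the standard identity for partial eta squared; the function name is illustrative, and the check assumes the reported values were computed this way.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Presentation-boundary effect on response times: F(1, 35) = 5.59.
eta = partial_eta_squared(5.59, 1, 35)
```

For F(1, 35) = 5.59 this gives approximately .138, matching the reported value; the same identity offers a quick consistency check on any F and effect-size pair in the results.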
Accuracy on the event tests and comprehension questions was high, indicating that participants were attending to the activities and events in the clips (event test accuracy: 95.0%, SD = 5.78%; event test RT: 3003 ms, SD = 797; comprehension test accuracy: 86.7%, SD = 6.60%; comprehension test RT: 6853 ms, SD = 1633).
In this experiment, participants’ ability to recognize objects that were presented just five seconds earlier was related to when the activities were segmented. In general, recognition was better for objects that were present when the event was segmented (boundary objects) than for other objects (nonboundary objects). However, this difference was observed only when objects from a previous event were tested. Relative to objects from the current event, memory for objects from previous events was worse for nonboundary objects and similar for boundary objects. These data provide strong initial support for the claim that the ability to remember recently encountered objects changes when an event is segmented.
The data are consistent with the predictions derived from EST (Zacks et al., 2007). In particular, the data suggest that memory for objects from previous events was limited to those that were present at event boundaries, which should be better encoded into long-term memory than nonboundary objects. EST proposes that event models are constructed at event boundaries. At these moments in time, active memory for events is more sensitive to incoming perceptual information, and this information should be processed in relation to relevant semantic knowledge of the activities and objects currently perceived. In addition, this information should be processed in relation to other information present at that time (e.g., object configurations and an actor’s placement relative to those objects may now be processed). As a result, objects that are present when an event boundary occurs should receive additional relational and associative processing, leading to durable representations that survive when active memory is reset. In line with earlier demonstrations of better long-term memory for boundary information (Baird & Baldwin, 2001; Boltz, 1992; Hanson & Hirst, 1989, 1991; Lassiter, 1988; Lassiter & Slaw, 1991; Lassiter et al., 1988; Newtson & Engquist, 1976; Schwan & Garsoffky, 2004; Zacks, Speer et al., 2006), the data were fully consistent with this hypothesis.
The data also support EST’s claim that event segmentation is a control process that clears active memory of information from past events at event boundaries. If this is the case, information that is not encoded into long-term memory should be more poorly recognized if the event is segmented between presentation and test. Therefore, nonboundary objects should be more poorly recognized after delay-boundaries. To the extent that similar representations from the same event compete (Anderson & Neely, 1996; Bunting, 2006; Underwood, 1957), nonboundary information from past events should no longer compete with boundary information after it is cleared from active memory. Therefore, it may be just as easy or easier to access information that has been encoded into long-term memory if the event is segmented between presentation and test. In this case, memory for boundary objects should be relatively intact after delay-boundaries. This is exactly what was observed in Experiment 1. Furthermore, theories of text and discourse comprehension that suggest that nonboundary information should be mapped onto existing models of the situation (Gernsbacher, 1985; Zwaan & Radvansky, 1998) do not appear consistent with the divergent effects of delay-boundaries on nonboundary and boundary objects.
The recognition tests in Experiment 1 tapped memory for the types of objects that were in the movie as well as memory for the perceptual features of those objects. However, memory for conceptual and perceptual information may be differently influenced by event segmentation. If the purpose of event models is to generate perceptual predictions, then perceptual information from the current event should be more accessible than perceptual information from previous events. With regard to conceptual information, it is possible that event segmentation may trigger a rapid and holistic extraction of conceptual and gist information from scenes (cf., Oliva, 2005; Potter, Staub, & O’Connor, 2004), but may have little effect on the degree to which perceptual information is processed. However, the data from Experiment 1 cannot rule out another possibility: Perceptually detailed representations of a scene may be formed and stored in long-term, episodic memory when event boundaries are encountered. If this is the case, then both conceptual and perceptual memory for objects should be better for boundary objects than nonboundary objects, regardless of whether the object was fixated. The next two experiments were performed to determine whether perceptual and conceptual information are stored in event models and are maintained in boundary representations.
The data from Experiment 1 suggested that how object information is retrieved from memory changes if the event has been segmented since the object was last seen. However, they do not indicate what type of information is affected by event boundaries. Performance could have been driven by at least two kinds of object information: conceptual and perceptual. Because participants chose between two objects that differed in their basic level categories, conceptual representations of the objects’ categories or other semantic features would often be sufficient to discriminate old from new objects. Participants also could have used representations of the perceptual features of the objects, because the old object and different type object also differed in their perceptual characteristics. Experiments 2 and 3 better isolated the relationship between event boundaries and the accessibility of conceptual and perceptual representations of objects. Experiment 2 focused on the role of conceptual information in recognition test performance by requiring participants to choose the type of object that matched an object recently presented in the clip.
A second group of 52 participants (30 female, 18–22 years old) were recruited from the Washington University community. Data from an additional nine participants were collected and replaced due to failure to complete the experiment (n = 4), technical difficulties or experimenter error (n = 3), noisy eye data (n = 2), or failure to follow instructions (n = 1).
The clips, event tests, and comprehension questions were described in the Methods section of Experiment 1. However, the recognition test alternatives were designed to limit participants’ use of perceptual memory for the objects. Participants were shown an image of the same type object and the different type object and were asked to indicate which one was like an object just presented in the movie (Figure 1c). Same type objects were from the same basic level category of objects as the old objects, but were not presented in the film. They could differ from old objects along several perceptual dimensions, including shape, color, orientation, and size. For example, in one scene, a stapler is shown on a counter. At test, participants chose between a different stapler (same type object) and a tape dispenser (different type object). The different type objects were the same as those that were used in Experiment 1 and did not appear in the clips. In the conceptual match to sample control task, participants were able to successfully match the same type object to the old object 95.4% (SD = 3.6) of the time.
On average, participants correctly answered 65.2% (SD = 7.41) of the object tests with an average median response time of 2902 ms (SD = 647).
Forty-four participants had at least one trial in each of the eight object test conditions. Their performance is illustrated in Figure 4. Responses were more accurate for boundary objects than for nonboundary objects (presentation-boundary effect, odds ratio: 1.55; mean logistic coefficient: 0.437, SD = 0.300; t(43) = 9.167, p < .001, d = 1.46). Responses were also more accurate for objects from previous events than for objects from the current event (delay-boundary effect, odds ratio: 1.24; mean logistic coefficient: 0.211, SD = 0.353; t(43) = 3.97, p < .001, d = 0.598). However, better recognition test performance for objects from previous events was limited to boundary objects. The differential effect of delay-boundaries on boundary and nonboundary objects was reliable (delay-boundary × presentation-boundary interaction, odds ratio: 1.36; mean logistic coefficient: 0.307, SD = 0.414; t(43) = 4.93, p < .001, d = 0.743). The presence of a delay-boundary was associated with reliably greater accuracy for boundary objects (mean change in estimated accuracy = .226, qs(43) = 5.82, p = .001), and a nonsignificant decrement in accuracy for nonboundary objects (mean change in estimated accuracy = −.109, qs(43) = −2.38, p = .346). Fixating an object was associated with greater levels of accuracy (fixation effect, odds ratio: 1.73; mean logistic coefficient: 0.546, SD = 0.806; t(43) = 4.49, p < .001, d = 0.677). No other interactions were reliable (largest effect, odds ratio: 1.21; mean logistic coefficient: 0.194, SD = 0.767; t(43) = 1.68, p = .101, d = 0.253, for the presentation-boundary × fixation interaction).
To ensure that the relationship between event boundaries and object recognition was observable in the full sample of participants, fixation was ignored and all participants were included in a secondary analysis. This analysis was consistent with the observations from the analysis of the more limited sample, indicating that responses were most likely to be accurate when boundary objects were tested, particularly when they were from the previous event (smallest effect, odds ratio: 1.26; mean logistic coefficient: 0.230, SD = 0.346; t(51) = 4.80, p < .001, d = 0.666).
Response time analyses found no evidence for speed-accuracy trade-offs (Figure 4b). When there were differences in response speed, those conditions in which responses were faster were also associated with higher rates of accuracy. Responses were faster for objects from previous events than for objects from the current event (delay-boundary effect: −156 ms, F(1, 43) = 5.70, p = .021, ηp2 = .117). Although response times were similar for boundary objects and nonboundary objects, they were fastest when boundary objects were fixated and retrieved from the previous event (delay-boundary × presentation-boundary × fixation interaction, F(1, 43) = 10.7, p = .002, ηp2 = .200; presentation-boundary × fixation interaction, F(1, 43) = 6.15, p = .017, ηp2 = .125). No other main effects or interactions were significant (largest effect, F(1, 43) = 0.524, p = .473, ηp2 = .012).
Participants accurately answered the event tests and comprehension questions, indicating that they attended to the activities and events in the clips (event test accuracy: 96.5%, SD = 4.47; event test RT: 3074 ms, SD = 712; comprehension test accuracy: 85.7%, SD = 8.17; comprehension test RT: 6717 ms, SD = 1471).
In this experiment, participants were asked to indicate which of two objects was the same type of object as an object in the clip. Thus, this experiment examined memory for conceptual information about recently encountered objects. As in Experiment 1, object recognition was associated with when boundaries occurred in the clips: When they were retrieved from a previous event rather than from the current event, conceptual memory was better for boundary objects and worse for nonboundary objects. Therefore, the data indicate that conceptual memory can support recognition memory for boundary objects, but not nonboundary objects, after an event has been segmented. Furthermore, conceptual memory for boundary objects from previous events was relatively good even for objects that were not fixated. Not only is this consistent with the proposal that event models are rebuilt from perceptual information and associated semantic knowledge at event boundaries, it is also indicative of a process that holistically processes scenes at those times.
In this and the previous experiment, boundary objects were better recognized when an event boundary occurred during the delay. This is surprising in light of previous research suggesting that information from previous events should be less accessible than information from the current event (Glenberg et al., 1987; Levine & Klin, 2001; Radvansky & Copeland, 2006; Speer, Zacks, & Reynolds, 2004; Zwaan & Radvansky, 1998). There are several possible explanations for better memory after event boundaries. One of these is that information that is actively maintained in memory could interfere with retrieval from the current event, particularly if it is similar to the information being retrieved (e.g., Anderson & Neely, 1996; Bunting, 2006; Capaldi & Neath, 1995). Because it may be more demanding to maintain information in active memory as events progress, it is also possible that when the test occurs within an event matters. Relative to the most recent event boundary, tests requiring retrieval from previous events occurred earlier than tests requiring retrieval from the current event. Another possibility is that objects may be better remembered if they are presented in more past events. It is possible that objects that are presented in multiple events are represented multiple times in episodic memory. If this is the case, then boundary objects should be easier to retrieve than objects that are present in only one event (e.g., nonboundary objects). Further investigation is needed to assess each of these and other possibilities.
These data suggest that the ability to remember the types of objects present in a previous event depends on whether those objects were present during the construction of an event model. Furthermore, the effect of event boundaries on encoding does not appear to depend on fixation. However, the data do not indicate whether representations of event boundaries contain perceptual detail as well as conceptual or gist information. A third experiment addressed this question.
The data from Experiments 1 and 2 suggest that participants can rely on conceptual information to recognize boundary objects from previous events. The final experiment examined the role of event boundaries on the ability to retrieve perceptual information about objects in events. If perceptual information is not maintained in event boundary representations, then memory for the perceptual features of objects should be worse when they are retrieved from a previous event than when they are retrieved from the current event. However, if, like conceptual information, perceptual information is encoded for objects throughout the scene when an event boundary occurs, then memory for the perceptual details of boundary objects should be good regardless of whether the object was fixated. Although EST makes no particular claims about either possibility, it does suggest that perceptual information should be better for boundary objects than for nonboundary objects. Furthermore, because perceptual information is not needed to generate perceptual predictions after the event has been segmented, then it should be less accessible following delay-boundaries.
Fifty-two participants (30 female, 18–24 years old) were recruited from the Washington University community. Data from an additional 12 participants were collected and replaced due to failure to complete the experiment (n = 5), technical difficulties (n = 4), noisy eye data (n = 2), or failure to follow instructions (n = 1).
To test memory for perceptual information about the objects presented in the clip, participants were asked to discriminate between the old objects and objects that were the same type as the old object. For example, if the old object was a chair, then participants chose between the same chair that was shown in the clip (old object) and a new chair that did not appear in the clip (same type object). The old object alternatives were the same as those used in Experiment 1 and the same type object alternatives were the same as those used in Experiment 2. Because the old object and same type object were different tokens from the same category, participants were expected to depend heavily on memory for perceptual information to perform the object tests. Otherwise, the task design and procedure were exactly as described in the Methods section of Experiment 1. In the perceptual match to sample control task, participants were able to successfully match the old object 96.8% (SD = 4.6) of the time.
Participants correctly answered an average of 71.6% (SD = 9.73) of object tests with an average median response time of 3099 ms (SD = 974).
Forty-two participants had at least one trial in each of the eight conditions. As Figure 5a illustrates, response accuracy was better for objects retrieved from the current event than for objects retrieved from a previous event (delay-boundary effect, odds ratio: 0.845; mean logistic coefficient: −0.168, SD = 0.408; t(41) = −2.66, p = .011, d = −0.411). Accuracy was also better for boundary objects than for nonboundary objects when they were fixated and retrieved from a previous event. If the object was not fixated, accuracy was numerically worse for boundary objects than for nonboundary objects regardless of delay-boundaries. This complex pattern of data resulted in a reliable three-way interaction between the delay-boundary, presentation-boundary, and fixation factors as well as several reliable two-way interactions (delay-boundary × presentation-boundary × fixation interaction, odds ratio: 1.30; mean logistic coefficient: 0.266, SD = 0.837; t(41) = 2.06, p = .046, d = 0.318; delay-boundary × presentation-boundary interaction, odds ratio: 1.14; mean logistic coefficient: 0.135, SD = 0.419; t(41) = 2.08, p = .043, d = 0.322; presentation-boundary × fixation interaction, odds ratio: 1.28; mean logistic coefficient: 0.243, SD = 0.784; t(41) = 2.01, p = .051, d = 0.310). Overall, responses were more accurate for fixated objects, though this effect did not reach significance (fixation effect, odds ratio: 1.20; mean logistic coefficient: 0.179, SD = 0.753; t(41) = 1.54, p = .131, d = 0.238). No other effects or interactions were reliable (largest effect, odds ratio: 0.909; mean logistic coefficient: −0.095, SD = 0.829; t(41) = −0.746, p = .460, d = −0.115 for the delay-boundary × fixation interaction).
To fully characterize the three-way interaction, accuracy was estimated from the individual logistic model parameters and analyzed with analyses of variance and t-tests. For fixated objects, response accuracy was worse for objects from previous events than for objects from the current event, though this difference was only apparent for nonboundary objects (delay-boundary × presentation-boundary interaction for fixated objects: F(1, 41) = 12.7, p = .001, ηp2 = .236; delay-boundary effect for fixated objects: F(1, 41) = 6.97, p = .012, ηp2 = .145). Accuracy was worse for fixated nonboundary objects from previous events than for those from the current event (mean change in estimated accuracy = −.24; qs (41) = −6.86, p < .001). Accuracy was similar for fixated boundary objects from previous events and for those from the current event (mean change in estimated accuracy = .0006; qs (41) = −0.179, p = .999). For nonfixated objects, accuracy was worse when they were from a previous event than when they were from the current event, regardless of presentation-boundaries (delay-boundary effect for nonfixated objects: F(1, 41) = 5.78, p = .021, ηp2 = .124). No other effects were reliable for nonfixated objects (largest effect, F(1, 41) = 0.980, p = .328, ηp2 = .023).
Because the previous analyses were performed on a subset of participants (those with data in all eight conditions), these results were susceptible to selection effects. A secondary set of analyses was performed to include as many participants as possible. Because fixation interacted with the delay- and presentation-boundary factors, additional analyses were restricted to certain conditions (e.g., nonfixated objects only) to avoid collapsing over the fixation factor. The analysis of nonfixated objects included all 52 participants and was consistent with the data from the limited sample: Accuracy was greater for nonfixated objects from the current event than for nonfixated objects from a previous event, regardless of presentation-boundaries (delay-boundary effect for nonfixated objects, odds ratio: 0.798; mean logistic coefficient: −0.225, SD = 0.525; t(51) = −3.09, p = .003, d = −0.429; next largest effect, odds ratio: 0.983; mean logistic coefficient: −0.017, SD = 0.578; t(51) = −0.216, p = .830, d = −0.030). The analysis of fixated objects was restricted to objects retrieved from a previous event in order to include as many participants as possible. This analysis included 50 participants and was consistent with the data from the limited sample. Accuracy was greater for fixated boundary objects than for fixated nonboundary objects when they were retrieved from the previous event, and this difference was marginally reliable (odds ratio: 1.31; mean logistic coefficient: 0.272, SD = 1.01; t(49) = 1.91, p = .062, d = 0.270). Therefore, the outcome of these analyses was consistent with observations from the limited set of participants.
Response time analyses showed no evidence of speed-accuracy trade-offs. As Figure 5b illustrates, responses tended to be faster in those conditions in which they were also the most accurate. Overall, responses were 274 ms faster for fixated objects (fixation effect, F(1, 41) = 17.0, p < .001, ηp2 = .293) and 118 ms faster for objects from the current event than for objects from a previous event (delay-boundary effect, F(1, 41) = 5.65, p = .022, ηp2 = .121). Furthermore, the delay-boundary effect was reliably greater for fixated objects (delay-boundary × fixation interaction, F(1, 41) = 15.0, p < .001, ηp2 = .267). The difference in response times for objects from a previous event and objects from the current event was greater for nonboundary objects (273 ms) than for boundary objects (−37 ms), resulting in a reliable interaction between the delay-boundary and presentation-boundary factors (delay-boundary × presentation-boundary interaction, F(1, 41) = 5.80, p = .021, ηp2 = .124). These effects were greater for fixated objects, though the three-way interaction between the delay-boundary, presentation-boundary, and fixation factors was not reliable (delay-boundary × presentation-boundary × fixation interaction, F(1, 41) = 2.73, p = .106, ηp2 = .062). No other main effects or interactions were significant (largest effect, F(1, 41) = 0.638, p = .429, ηp2 = .015).
Performance on the event tests and comprehension questions indicated that participants were attending to the activities and events (event test accuracy: 96.5%, SD = 4.77; event test RT: 3297 ms, SD = 854; comprehension test accuracy: 85.5%, SD = 6.95; comprehension test RT: 7218 ms, SD = 1714).
By asking participants to indicate which of two objects from the same basic level category was presented in the clip, this experiment examined recognition memory for the perceptual features of objects. As with the two previous experiments, the data indicated that the ability to recognize a recently encountered object was associated with event segmentation. However, in this experiment accuracy was worse for objects from previous events than for objects from the current event. This was true for nonfixated objects and for nonboundary objects. In contrast, perceptual memory for fixated boundary objects remained relatively high when they were retrieved from a previous event.
In this experiment, the relationship between event boundaries and object recognition depended on whether the old object was fixated. This was particularly striking in the case of boundary objects. When boundary objects were fixated, they were recognized well regardless of whether they were from a previous event or from the current event. However, when the old object was not fixated, accuracy was worse for objects presented in a previous event than for objects presented in the current event, regardless of whether they were boundary or nonboundary objects. These data suggest that perceptual information is cleared from active memory at event boundaries and that representations of event boundaries are limited in perceptual detail. Because the interaction of the presentation-boundary and delay-boundary effects was only observed for fixated objects, it is unlikely that event boundary representations contain perceptual detail for nonfixated objects and scene regions.
The results of Experiment 3 provide additional support for the predictions of EST (Zacks et al., 2007), and some insight into the types of information maintained in event models and in longer-lasting representations of events. In particular, they suggest that perceptual information about fixated boundary objects is encoded into long-term episodic memory. The data also suggest that perceptual information about boundary and nonboundary objects is actively maintained in event models until the event is segmented. This makes sense within the framework of EST: the maintenance of perceptual information in event models should improve the accuracy of perceptual predictions for the current event. Once the event changes, the utility of maintaining perceptual information in memory decreases.
Although the basic effects of event boundaries on memory were relatively consistent across the three experiments, several qualitative differences were observed. To characterize differences in how event boundaries related to conceptual and perceptual information, we conducted an analysis that compared the three experiments. The data were analyzed in the same manner as in Experiments 1–3, but with experiment included as a between-subjects factor with three levels describing the type of information isolated in the object tests: both, conceptual, and perceptual.
To determine whether the effects of event boundaries on accuracy depended on the type of information tested, the logistic regression coefficients obtained from each individual’s regression models were analyzed in separate ANOVAs with experiment as a between-subjects factor. This analysis confirmed that the relationship between event boundaries and accuracy varied across experiments. As is illustrated in Figure 6a, the effect of testing objects from previous events, rather than objects from the current event, depended on the experiment (delay-boundary × experiment interaction, F(2, 119) = 10.2, p < .001, ηp2 = .145). Fisher’s protected t-tests indicated that the delay-boundary effect was reliably more positive for conceptual tests than for tests of both types of information and tests of perceptual information (conceptual–perceptual: .379, t(119) = 4.51, p < .001; conceptual–both: .214, t(119) = 2.44, p = .004), and was marginally reliably more negative for tests of perceptual information than for tests of both types of information (perceptual–both: −.165; t(119) = −1.87, p = .064). There were also reliable differences in the effect of the presentation-boundary factor across experiments (Figure 6b, presentation-boundary × experiment interaction, F(2, 119) = 23.3, p < .001, ηp2 = .281). Fisher’s protected t-tests indicated that although the presentation-boundary effect was similar for tests of conceptual information and tests of both types of information, it was significantly smaller for tests of perceptual information (conceptual–both: −.067, t(119) = −0.851, p = .396; conceptual–perceptual: .422, t(119) = 5.58, p < .001; perceptual–both: −.488, t(119) = −6.14, p < .001).
The magnitude of the delay-boundary × presentation-boundary interaction also varied across the three types of tests (delay-boundary × presentation-boundary × experiment interaction: F(2, 119) = 3.10, p = .048, ηp2 = .05). For each experiment, Figure 6c illustrates recognition test accuracy for the four object test conditions defined by event segmentation, collapsing across fixation conditions. Fisher’s protected t-tests indicated that the interactive effect of the delay-boundary and presentation-boundary factors was smaller when perceptual information was tested than when both types of information were tested (both–perceptual: .228, t(119) = 2.35, p = .020). The difference in the interactive effect of delay- and presentation-boundaries in conceptual and perceptual tests was marginal (conceptual–perceptual: .173, t(119) = 1.87, p = .064). The magnitude of this effect was not significantly different for tests of both types of information and for tests of conceptual information (both–conceptual: .056, t(119) = 0.581, p = .562). The magnitude of the fixation effect was larger for tests of both types of information and for conceptual tests than it was for perceptual tests (fixation × experiment interaction, F(2, 119) = 4.98, p = .008, ηp2 = .077). In addition, although the three-way interaction between the delay-boundary, presentation-boundary, and fixation factors was larger for perceptual tests than for conceptual tests and tests of both types of information, this difference was not reliable (fixation × delay-boundary × presentation-boundary × experiment interaction, F(2, 119) = 2.08, p = .130, ηp2 = .034). There were no reliable differences between experiments in the magnitudes of the interaction of fixation with either the delay-boundary factor or the presentation-boundary factor (largest effect, F(2, 119) = 0.676, p = .512, ηp2 = .011).
The response time data were also examined with experiment as a between-subjects factor. The response time analyses can be summarized by two main findings. First, a consistent feature of all three experiments was the reliable three-way interaction between the delay-boundary, presentation-boundary, and fixation factors (Figure 7a). Testing objects from a previous event rather than from the current event had opposite effects on response times to boundary objects and nonboundary objects (delay-boundary × presentation-boundary, F(1, 119) = 7.36, p = .008, ηp2 = .058); however, this interaction was observed only for fixated objects (delay-boundary × presentation-boundary × fixation interaction, F(1, 119) = 25.6, p < .001, ηp2 = .177). Responses were 407 ms slower for fixated nonboundary objects from a previous event than for those from the current event (qs(119) = 8.91, p < .001), but did not reliably differ for fixated boundary objects from previous and current events (delay-boundary effect for fixated boundary objects: −193 ms, qs(119) = −3.66, p = .338). When the objects were not fixated, response times were relatively uniform across the four object test conditions and did not reliably differ from each other (largest qs(119) = 2.48, p = .649).
The second main finding from the response time analysis was that fixation had little effect on response time when only conceptual information was tested (Figure 7b; fixation × experiment interaction, F(2, 119) = 5.52, p = .005, ηp2 = .085). The effect of the delay-boundary factor and its interaction with fixation also were modulated by experiment (delay-boundary × fixation × experiment interaction, F(2, 119) = 4.93, p = .009, ηp2 = .077; delay-boundary × experiment interaction, F(2, 119) = 8.09, p < .001, ηp2 = .120). Fisher’s protected t-tests indicated that when objects from the current event were tested, the effect of fixation on response times was significantly lower for conceptual tests than for tests including perceptual information (fixation effect for tests of objects from the current event for tests of perceptual, conceptual, and both types of information, respectively, −575 ms, 16 ms, −424 ms; fixation effect for tests of objects from a previous event for tests of perceptual, conceptual, and both types of information, respectively, 27 ms, −85 ms, −244 ms; perceptual–conceptual for fixation effect in current events, t(119) = −3.71, p < .001; both–conceptual for fixation effect in current events, t(119) = −2.65, p = .009). No other differences in the effect of fixation across the delay-boundary and experiment factors were reliable.
These analyses indicate that although there were consistencies in the relationship between object memory and event boundaries across the three types of object tests, there were several important differences. Conceptual memory was better when an event boundary occurred during object presentation and when a new event began before the object was tested. However, perceptual memory was impaired when a new event began during the delay between object presentation and test. With regard to EST, these data suggest that perceptual information (and possibly conceptual information) is maintained in active memory until an event is segmented and that long-term representations of events are mainly conceptual in nature.5
The observed differences in the relationship between event boundaries and memory for conceptual and perceptual information imply a central role for perceptual information in the representation of current events in memory. The data indicate that, when information is retrieved from the current event, perceptual information, but not conceptual information, can be used to recognize both boundary and nonboundary objects. However, it is possible that perceptual memory for objects interfered with performance on the conceptual recognition test in Experiment 2. Because the same-type objects did not match the old objects in their perceptual details, participants would need to identify the types of objects in the current event while also ignoring the perceptual details of those objects. Indeed, as perceptual information for most objects became less available following delay-boundaries, conceptual information for boundary objects became more accessible.
The present experiments examined two specific hypotheses about the relationship between event segmentation and memory for recently encountered objects: first, that objects present when an event boundary occurs are better encoded than are other objects; second, that event boundaries alter the accessibility of recently encountered objects. The data clearly support the first hypothesis. The occurrence of event boundaries during object presentation was associated with better recognition test performance, particularly when memory for conceptual information was tested. The data also support the second hypothesis, suggesting that retrieval across events relies on long-term representations of events. These effects were observed despite the fact that the delay between object presentation and testing was brief and was held constant across conditions. The implications of these data are broad, and are consistent with claims that event segmentation influences when information is encoded and when that information is most accessible.
EST proposes that event segmentation is a control process that regulates when active representations of events are reset and updated. Better recognition memory for boundary objects than for nonboundary objects is consistent with EST’s proposal that event models are rebuilt from current input at event boundaries. A reduction in memory for nonboundary objects from previous events is consistent with EST’s proposal that event models are reset at event boundaries. In addition, EST suggests that when information is retrieved from a previous event, it must be retrieved from long-term memory. This claim was supported by the observation that memory for objects from the previous event relied on memory for event boundaries, which are better encoded into long-term memory than are nonboundaries (Boltz, 1992; Newtson & Engquist, 1976).
Although the data generally conform to the predictions derived from EST, two aspects of the data were unexpected. First, in Experiments 1 and 2 boundary objects from previous events were remembered well regardless of whether they were fixated. This is surprising in light of the important role fixation and attention play in visual memory (cf., Henderson & Hollingworth, 1999b). However, similar performance for fixated and nonfixated boundary objects was limited to tests that tapped conceptual memory. This pattern implicates the engagement of a coarse, holistic perceptual process, such as the evaluation of scene gist (cf., Oliva, 2005; Potter et al., 2004), at event boundaries. A second unexpected aspect of the data was that boundary objects from previous events were remembered better than boundary objects from current events in Experiments 1 and 2. Thus, the data suggest that resetting event models may improve conceptual memory for boundary objects. Increases in accessibility following segmentation could occur if resetting the event model also reduces interference from competing representations in active memory (Aron, Robbins, & Poldrack, 2004; Bunting, 2006; Kuhl, Dudukovic, Kahn, & Wagner, 2007) or releases attentional resources. Although these two findings fit with EST, they are not obvious consequences of the processes it describes. Additional experiments designed to address the relationship between event segmentation and holistic perceptual processing and interference are needed.
Other accounts of event segmentation have focused on how events are segmented and on why event boundaries are better represented in memory than nonboundaries. These theories have suggested that event boundaries are remembered because they have high information content (Newtson, 1998), because they are points of transition from one over-learned sequence of activity to another (Avrahami & Kareev, 1994), or because they occur when activity deviates from schematic representations of the current activity (Hanson & Hanson, 1996). Because boundary objects were better remembered than nonboundary objects, aspects of the present data are consistent with these accounts of event segmentation. However, a comprehensive theory of event segmentation must also provide an explanation for how event segmentation is associated with changes in the ability to retrieve recently encountered information. Those theories that have focused on why event boundaries are better remembered than nonboundaries offer little insight into this effect.
Like EST, several theories of discourse comprehension also propose that people build mental models of the current situation in memory, and that these models are active only for as long as they adequately represent the current situation. These theories are consistent with the observation that boundary objects are remembered well in new events. However, unlike EST, these theories also claim that information from the middle of events is integrated into mental models that represent the current situation (Gernsbacher, 1985; Zwaan & Radvansky, 1998). During retrieval, information from the middle of previous events should be accessible, though it should take longer to retrieve than information from the beginning of those events (Gernsbacher, 1985, 1997). However, we found no evidence that information from the middle of an event (nonboundary objects) is maintained in memory after an event boundary. Rather, recognition accuracy for nonboundary objects tested in new events was near chance in most cases.
The difference in the predictions of EST and theories of discourse comprehension highlights the fact that they were designed to address how people process and understand two very different types of input. Whereas most words in a communication (or pictures in a picture story) are likely to contribute to comprehension, information encountered between event boundaries may contribute relatively little to one’s understanding of an ongoing activity (Newtson & Engquist, 1976; Schwan & Garsoffky, 2004). Take for example the sentence “The man took the laundry downstairs.” Almost all words in this sentence contribute to the reader’s understanding of what is happening. However, when watching a man carrying laundry downstairs, little is learned about his activity between the time he starts down the stairs and the time he arrives at the bottom of the stairs (assuming nothing happens along the way). Furthermore, it is possible to present irrelevant information in film without calling attention to it (e.g., a ceiling fan near the stairway). In the case of the man taking laundry downstairs, there would be little need to mention or describe a ceiling fan, and mentioning one might lead readers to draw inferences about the author’s intent in doing so (Graesser, Singer, & Trabasso, 1994). Finally, in film, objects, locations, and actors can all physically persist on the screen from one event to the next. In contrast, in discourse it is not possible for a word or phrase to persist beyond a clause boundary except in memory. Thus, there is no clear analogue between boundary objects in our experiments and boundary information in discourse. Although there are many similarities between discourse processing and event perception (Gernsbacher, 1985; Magliano et al., 2001; Radvansky & Copeland, 2006; Speer & Zacks, 2005; Speer et al., 2007), these differences prevent strong conclusions about one based on data and theories regarding the other.
Because these experiments establish a relationship between event segmentation and memory for recently encountered objects, they bear on how memory operates in dynamic, naturalistic contexts. In particular, these data suggest that event segmentation influences the contents of both short-term and long-term episodic memory. They also offer insight into how event segmentation may influence processing of and memory for conceptual and perceptual information.
These experiments suggest that short-term memory may be limited by the duration of the current event. Moreover, they suggest that perceptual information is actively maintained in memory for as long as it is useful for generating perceptual predictions. In Experiment 3, the ability to recognize the perceptual features of objects was worse if the object was from a previous event than if it was from the current event. Similarly, in Experiment 1, nonboundary objects were more poorly recognized when they were from a previous event. Therefore, the perceptual details of recently encountered objects, including nonboundary objects, may be maintained in active memory (i.e., event models or the visuo-spatial sketchpad). However, perceptual memory for an object appears to persist beyond the current event only if the object was fixated and present when an event boundary occurred. These data are consistent with a body of work that indicates that clausal structure and changes in the situation depicted in film and text impact the ability to retrieve recently encountered information, particularly perceptual or surface information (Carroll & Bever, 1976; Clark & Sengul, 1979; Gernsbacher, 1985; Glenberg et al., 1987; Jarvella, 1979; Radvansky & Copeland, 2006; Sachs, 1967, 1974; Speer & Zacks, 2005). They are also similar to work that shows that only the most recent item or “processing epoch” in a study list is actively maintained in memory (McElree, 2006). Importantly, however, the data presented here extend these findings to an online perceptual process (Zacks & Swallow, 2007). They also imply that the effects of event boundaries on retrieval depend on the type of information that is to be retrieved, are observable within five seconds after the boundary has occurred, and reflect the presence of boundaries during encoding.
With regard to long-term episodic memory, event boundaries could be represented in at least two ways. Event boundary representations could be mainly schematic, capturing basic, possibly title-like, information about the types of people, objects, and/or activities in a scene as well as their basic spatial and functional relations (cf., Oliva, 2005; Potter et al., 2004). Event boundary representations could also be more image-like and contain visual or other sensory detail. The data reported here suggest that event boundary representations capture information about the types of objects present during the boundary, but not necessarily much perceptual detail about those objects. There are several possible explanations for this effect, including EST’s claim that boundary information receives additional processing as part of the construction of new event models. An intriguing possibility is that changes in events may trigger a reevaluation of scene gist (Oliva, 2005), resulting in better recall and recognition of boundaries than nonboundaries. Gist representations could contain information about high-level conceptual features of the scene as well as the spatial configuration and orientation of objects in the scene (Henderson & Hollingworth, 1999a; Hollingworth, 2007; Hollingworth & Henderson, 2002; Jiang, Olson, & Chun, 2000; Potter et al., 2004; Schyns & Oliva, 1994). These representations may provide a structure, perhaps spatial in nature, on which the perceptual details of events may be encoded and maintained in long-term memory. If this information is extracted only at event boundaries, then there would be no structure on which to encode objects that were not present during an event boundary. This may explain why perceptual memory for fixated nonboundary objects was worse when they were retrieved from previous events. Additional research examining the nature of event boundary representations is needed to evaluate this possibility.
There are other potential explanations of the effect of event boundaries on encoding. For example, boundary objects were in more events than were nonboundary objects. If events, rather than event boundaries, are the units of long-term event memory, then boundary objects should be represented more often and in more diverse contexts in memory than nonboundary objects. It is also possible that, by virtue of being present when an event is segmented, boundary objects tie the previous event to the current event in memory, making them central to coherent representations of the activities depicted throughout the film.6 A similar claim has been made for situation changes in the Event Indexing Model (Zwaan & Radvansky, 1998). However, data from the event segmentation literature argue against these explanations. For example, after watching a movie depicting a goal-directed activity filmed in a single location and with no cuts, participants more accurately recognized frames from a boundary point than frames from nonboundary points (Newtson & Engquist, 1976). Because there were no cuts or changes in camera angle, the scene was consistent and the narrative coherent throughout the film. Furthermore, functional neuroimaging data show that neural processing in some regions increases around event boundaries (Zacks et al., 2001). These data are consistent with the proposal that it is increased perceptual processing in response to changes in the activity that underlies the memory advantage for boundary information (Zacks et al., 2007).
Finally, the structure of memory, particularly as it relates to a distinction between short-term and long-term memory, has long been debated in the literature (Baddeley, 2003; Cowan, 1999; Jonides et al., 2008; McElree, 2006; Olson, Page, Moore, Chatterjee, & Verfaellie, 2006; Ranganath & Blumenfeld, 2005; Shrager, Levy, Hopkins, & Squire, 2008; Talmi, Grady, Goshen-Gottstein, & Moscovitch, 2005). Although the debate is far from resolved, there is substantial behavioral and neurophysiological evidence in favor of unitary store models (cf., Jonides et al., 2008; McElree, 2006; Olson et al., 2006; Öztekin, McElree, Staresina, & Davachi, in press; Ranganath & Blumenfeld, 2005). These models claim that short-term and long-term memory are distinguished by the activation level of a representation that is maintained by focused attention. For example, McElree and colleagues (McElree, 2006; Öztekin et al., in press) present behavioral and neuroimaging data that suggest that retrieval processes are the same for all items in study lists except for the most recently presented item or “epoch.” Similarly, the present experiments suggest that information from the most recent event is more accessible than information from previous events. However, these experiments do not speak to how information in the current event is maintained in memory. It is possible that event models exceed the capacity of focused attention. Therefore, whether information from the current event is actively maintained in memory through focused attention on an event model (as a unit) or with a specialized store such as Baddeley’s episodic buffer in working memory (Baddeley, 2003) or visual working memory for actions (Wood, 2007) is an open question.
For these experiments we used a quasi-experimental design to evaluate the effects of event segmentation with highly engaging, rich, and natural materials that are commonly encountered outside the laboratory. There are important drawbacks to this approach. First, we could not randomly assign the individual objects to the different test conditions. To statistically control for potentially confounded, superficial differences in the objects, regression analyses included variables coding for object size and eccentricity. The regression analyses also included variables coding for performance in two match-to-sample tasks designed to measure the ease with which the old objects were segregated and identified in the scenes. Second, as with most studies of event segmentation, event segmentation itself was not manipulated. Instead, we measured segmentation and used it to sort object recognition trials into different conditions. As a result, these data do not, on their own, permit claims that there is a causal relationship between event segmentation and memory.
A replication of these data with a different stimulus set and wider range of materials would be valuable. However, the effects observed here do converge with previous findings using less naturalistic materials for which true experimental manipulations were possible. These manipulations include changes in activity in film, situation changes in narrative text, and location changes in virtual reality environments (Carroll & Bever, 1976; Radvansky & Copeland, 2006; Speer, Jacoby, & Braver, 2003; Zwaan & Radvansky, 1998). Given recent advances in the literature examining the perceptual and conceptual features of activity that are associated with event segmentation, it seems likely that future research will manipulate event segmentation as well.
Finally, despite its limitations, our use of natural materials offered critical advantages for these experiments. First, in order to examine how event segmentation influences memory for objects, it was important to ensure that the activities in the movies were engaging. This decreased the likelihood that participants attended to objects rather than to the activities depicted in the film. Second, the activity depicted in the films needed to be realistic. This ensured that the activities progressed normally and did not appear contrived or awkward, which could have interfered with segmentation.
By perceptually dividing what just happened from what is happening now, event segmentation may impact the ease with which recently encountered objects are remembered. Indeed, the ability to remember an object that was presented just five seconds earlier was repeatedly found to be associated with whether an event boundary occurred between presentation and test. Under these circumstances, recognition was dependent on whether the object was present during an event boundary. These data provide evidence that perceptual event segmentation has immediate consequences for object recognition, and suggest that event boundary representations are mainly conceptual in nature but can also contain perceptual details of fixated objects. These data are consistent with the claim that perceptual event segmentation reflects a control process that regulates the contents of active memory. In short, these data suggest that boundaries in event perception are also boundaries for memory.
The authors would like to thank Deanna Barch, Nicole Speer, Randy Buckner, Stephen Lindsay, Anna MacKay, Corey Maley, Yuhong Jiang, and Tal Makovski, and two anonymous reviewers for their comments on this research, Corey Maley, Derek Holder, and Stephanie Brewer for their invaluable help with data collection, and Becky Hedden and Avis Chan for their early efforts in stimulus development. This research was funded by NIH Grant RO1 MH070674 to Jeffrey M. Zacks and a Dean’s Dissertation Research Fellowship to Khena Swallow. Khena Swallow is currently with the Department of Psychology and Center for Cognitive Science at the University of Minnesota.
Note: Criteria 2, 4, 5, 6, 9, and 10 were evaluated by the first and second authors.
1Participants in Experiments 1–3 also segmented the movies in a second session that occurred one to two days after the recognition tests. However, their performance on this task was more variable than what we observed for participants who did not perform the recognition tests. Event boundaries defined by the independent group of observers were therefore deemed more reliable indicators of segmentation than were individual boundaries (see also Speer, Swallow, & Zacks, 2003). Further detail is available from the first author upon request.
2Noise in gaze position measurements leads to excessively variable gaze position values over time. We calculated the distance between gaze position at sample s and gaze position at sample s-1 for all samples acquired while the object was on the screen. We then divided the mean of these distances by their standard deviation. The more variable gaze position, the higher the standard deviation of distances and the lower the resulting ratio. Trials on which the ratio fell below 0.5 were excluded.
3Models were also generated that included old object duration as a predictor. However, due to correlations between old object duration and fixation, and old object duration and the presentation-boundary factor, multicollinearity in these models was high. Any object that was on the screen for more than 20 seconds was also present during an event boundary (and therefore was a boundary object). Analyses that included old object duration as a nuisance variable were not substantively different with regard to the main findings of these experiments.
4The logit, or the natural log of the odds ratio between the probability of a correct response and an incorrect response, captures the likelihood of a correct response on a given trial. Because logits are more suitable to regression analyses than are proportions, post-hoc analyses were performed on the logits before transforming them into the estimated proportion of correct responses. To obtain the studentized t-statistics (qs), analysis of variance was performed on the logits and the resulting Mean Squared Error (MSE) for the appropriate term was used.
5It is important to note that encoding strategies may have varied across the different types of tests. It is not clear that these strategies would interact with segmentation processes.
6We thank an anonymous reviewer for this suggestion.
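The gaze-noise criterion described in Footnote 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: gaze samples are taken to be (x, y) pairs, distances are Euclidean, and the sample standard deviation is used because the footnote does not say which estimator was applied. The function names and the example traces are hypothetical.

```python
import math
import statistics

def gaze_stability_ratio(samples):
    """Mean sample-to-sample gaze distance divided by its standard deviation.

    `samples` is a sequence of (x, y) gaze positions recorded while the
    object was on screen. Using the sample standard deviation is an
    assumption; the footnote does not name the estimator.
    """
    distances = [math.dist(a, b) for a, b in zip(samples, samples[1:])]
    if len(distances) < 2:
        raise ValueError("need at least three gaze samples")
    sd = statistics.stdev(distances)
    if sd == 0:
        return math.inf  # no variability in the distances, so nothing to penalize
    return statistics.mean(distances) / sd

def is_excluded(samples, threshold=0.5):
    """Apply the exclusion rule: drop trials whose ratio falls below 0.5."""
    return gaze_stability_ratio(samples) < threshold

# Made-up traces for illustration only
smooth = [(0, 0), (1, 0), (2, 0), (3, 0)]
erratic = [(0, 0), (0, 0), (0, 0), (10, 0), (10, 0), (10, 0)]
print(is_excluded(smooth), is_excluded(erratic))  # prints False True
```

The ratio falls as the distances between successive samples become more variable, so a trace with irregular jumps is excluded while a regular one is retained.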