One function of perception is to divide continuous experience into discrete parts, providing a structure for selective attention, memory, and control. This is readily observed in scene perception, in which objects are segmented from backgrounds (e.g., Biederman, 1987; Vecera, Behrmann, & McGoldrick, 2000; Woodman, Vecera, & Luck, 2003), and in discourse processing, in which transitions between clauses and narrated situations influence reading times and discourse memory (e.g., Clark & Sengul, 1979; Glenberg, Meyer, & Lindem, 1987). Similarly, an online perceptual process called event segmentation divides ongoing activity into events (see Zacks, Speer, Swallow, Braver, & Reynolds, 2007, for an in-depth review). For example, when watching someone boil water, an observer might divide the actor's activity into getting a pot from a rack, filling the pot with water, setting the pot on the burner, turning on the burner, and bringing the water to a boil. The experiments presented in this paper investigated whether event segmentation also provides a structure for event memory: Because event segmentation separates "what is happening now" from "what just happened," it may impact the ease with which recently encountered information is remembered. For example, it may be more difficult for an observer to retrieve information about the pot rack once the "getting-a-pot" activity has ended and the "filling-the-pot-with-water" activity has begun.
Previous research on event segmentation provides compelling evidence that it is an important and ongoing component of perception (Zacks & Swallow, 2007). Event segmentation is commonly measured by asking participants to explicitly identify event boundaries, which separate natural and meaningful units of activity (Newtson, 1973). However, functional neuroimaging data indicate that activities are segmented even as naïve observers passively view them (Speer, Swallow, & Zacks, 2003; Zacks et al., 2001; Zacks, Swallow, Vettel, & McAvoy, 2006). In addition, observers tend to agree about when event boundaries occur (Newtson, 1976). This likely reflects observers' tendency to segment events at points of change. Changes may be in perceptual information, such as an actor's position and object trajectories (Hard, Tversky, & Lang, 2006; Newtson, Engquist, & Bois, 1977; Zacks, 2004), or in conceptual information, such as an actor's location, intentions, or goals (Speer, Zacks, & Reynolds, 2007). For example, when watching a person read a book on a couch, observers might identify an event boundary when the actor changes his position from sitting to lying down, and again when he closes the book, signaling a change in his goals. However, observers do not identify event boundaries at the large shifts in visual input that accompany cuts in film (e.g., a cut from a wide-angle shot to a close-up) unless the cut coincides with a change in the scene or activity (Schwan, Garsoffky, & Hesse, 2000). These data suggest that event boundaries may be characterized as points of perceptual and conceptual change in activity, separated by periods of relative stability.
Event Segmentation Theory (EST) offers a theoretical perspective on how the neurocognitive system implements event segmentation (Zacks et al., 2007). At its core, EST claims that segmentation is a control process that regulates the contents of active memory. According to EST, observers build mental models of the current situation (event models) to generate predictions of future perceptual input. Event models are based on current perceptual input and semantic representations of objects, object relations, movement and statistical patterns, and actor goals for the type of event currently perceived (event schemata; Bartlett, 1932; Glenberg, 1997; Johnson-Laird, 1989; Rumelhart, Smolensky, McClelland, & Hinton, 1986; Zwaan & Radvansky, 1998). For as long as they accurately predict what is currently happening, event models are shielded from further modification by a gating mechanism. When the event changes, the accuracy of predictions generated from the event model decreases and prediction error increases. High levels of prediction error trigger the gating mechanism to open, causing the event model to be reset and rebuilt. When event models are rebuilt, incoming perceptual information (such as information about objects and actors) is processed in relation to other elements of the event and to semantic representations. Once accurate perceptual predictions can be generated from the event model, the gate closes to prevent further modification of the event model. EST proposes that event boundaries correspond to those moments when event models are reset and updated with new information.
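The prediction-error-driven gating cycle that EST describes can be caricatured as a simple control loop. The sketch below is only an illustration of the logic, not the connectionist implementation explored by Reynolds, Zacks, and Braver (2007); the function names, the error measure, and the threshold value are all invented placeholders:

```python
# Illustrative sketch of EST's gating cycle: an event model is shielded
# from input while its predictions succeed, and is reset and rebuilt when
# prediction error exceeds a threshold (an event boundary).
# All names and the threshold are hypothetical placeholders.

def prediction_error(model, observed):
    """Mismatch between the model's prediction and the current input."""
    predicted = model["prediction"]
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def build_event_model(observed):
    """Rebuild the model from current perceptual input (in EST, also
    from event schemata stored in long-term memory)."""
    return {"prediction": list(observed)}

def segment(stream, threshold=0.5):
    """Return the time steps at which event boundaries are detected."""
    boundaries = []
    model = None
    for t, observed in enumerate(stream):
        if model is None or prediction_error(model, observed) > threshold:
            model = build_event_model(observed)  # gate opens: model reset
            boundaries.append(t)                 # boundary = model update
        # Otherwise the gate stays closed and the model is shielded.
    return boundaries

# A toy input stream: stable activity, then an abrupt change.
stream = [[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
print(segment(stream))  # boundaries at t=0 (initial model) and t=5
```

On this caricature, periods of relative stability leave the model untouched, and a boundary is simply the moment the model is rebuilt, mirroring EST's claim that boundaries correspond to model updates.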
EST draws on several theories of discourse processing, comprehension, and cognitive control. The Structure Building Framework (Gernsbacher, 1985) and the Event Indexing Model (Zwaan & Radvansky, 1998) propose that observers build mental models of the current situation in order to comprehend a narrated situation, and that these models are either rebuilt or updated when information that is incongruent with the current model is encountered. However, EST proposes that event perception is predictive (cf. Wilson & Knoblich, 2005), rather than integrative (Gernsbacher, 1985; Zwaan & Radvansky, 1998). Several models of working memory and cognitive control also posit that a gating mechanism shields active representations of one's own goals from perceptual input (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Frank, Loughry, & O'Reilly, 2001; O'Reilly, Braver, & Cohen, 1999). The gating mechanism proposed in EST is similar to these in implementation (Reynolds, Zacks, & Braver, 2007). However, unlike these models, EST claims that active memory for the current, observed event is flushed when the model is updated.
Event Segmentation and Memory
EST has strong implications for long-term memory for events (episodic memory) and for the short-term accessibility of information relevant to those events. First, event boundaries should have a privileged status in long-term memory. When the gating mechanism opens at event boundaries, boundary information should be processed more fully, making greater contact with relevant semantic knowledge of the current event and activities. Individual objects and actors that are present at boundaries should also be processed in relation to each other as part of the formation of a new event model. Second, EST's claim that models of the current event are actively maintained in memory suggests that different mechanisms are used to retrieve information from previous events (stored in long-term, weight-based representations) than are used to retrieve information from the current event.
Evidence in favor of the prediction that long-term memory is better for boundaries than for nonboundaries is strong: Information is better encoded and later retrieved if it is presented at event boundaries rather than at nonboundaries (Baird & Baldwin, 2001; Boltz, 1992; Hanson & Hirst, 1989; Lassiter, 1988; Lassiter & Slaw, 1991; Lassiter, Stone, & Rogers, 1988; Newtson & Engquist, 1976; Schwan & Garsoffky, 2004; Zacks, Speer, Vettel, & Jacoby, 2006). For example, after watching a film showing goal-directed activities, observers better recognize movie frames from boundary points than from nonboundary points (Newtson & Engquist, 1976). In another study, participants viewed complete films, films that preserved event boundaries and omitted nonboundaries, or films that preserved nonboundaries and omitted event boundaries (Schwan & Garsoffky, 2004). Event recall and recognition were similar for complete movies and for movies that preserved event boundaries, but were poor for movies that omitted event boundaries. Thus, memory for events appears to rely on the information that is presented at event boundaries.
The majority of evidence that event boundaries affect memory for recent information comes from research on the mental representations used to understand text and discourse. Early work on this topic demonstrated that people's ability to reproduce discourse verbatim was markedly compromised after syntactic boundaries (e.g., the end of a clause or sentence; Clark & Sengul, 1979; Jarvella, 1979). More recently, others have examined how changes in the situation described in text and discourse influence the accessibility of recently encountered information. This work has been primarily driven by the proposal that readers' comprehension relies on models that represent the currently described situation (situation models; Gernsbacher, 1985; Glenberg, 1997; Johnson-Laird, 1989; Zwaan & Radvansky, 1998). The claim is that situation models are updated when the situation changes. Therefore, situation changes should take longer to process and should mark the point at which information from the previous situation becomes more difficult to retrieve. Reading time data are consistent with this proposal (Mandler & Goodman, 1982; Zwaan, Magliano, & Graesser, 1995; Zwaan, Radvansky, Hilliard, & Curiel, 1998), and a large body of research shows that situation changes alter the accessibility of information recently presented in text (Glenberg, 1997; Johnson-Laird, 1989; Levine & Klin, 2001; Rapp & Taylor, 2004; Zwaan, Langston, & Graesser, 1995; Zwaan & Radvansky, 1998). Thus, when a protagonist is described as taking off his sweatshirt, putting it down, and then running away, readers have more difficulty recognizing "sweatshirt" than if the protagonist had been described as putting on the sweatshirt (Glenberg et al., 1987). Other work has tied situation changes specifically to event boundaries in text and in film (Magliano, Miller, & Zwaan, 2001; Speer & Zacks, 2005; Speer et al., 2007) and has shown that readers have difficulty accessing information that was encountered prior to event boundaries in text and picture stories (Gernsbacher, 1985; Speer & Zacks, 2005). Finally, there is limited evidence that changes in observed activity and in one's own spatial location reduce the accessibility of information encountered before the change occurred (Carroll & Bever, 1976; Radvansky & Copeland, 2006). These studies offer indirect support for the hypothesis that boundaries in perceived events impact the retrieval of event information. However, none have directly evaluated the relationship between event segmentation in perception and memory for recently encountered information.
These data suggest that information from the current event should be more quickly and accurately retrieved than information from a previous event. However, it is also possible that information maintained in event models may interfere with retrieval from the current event, but not with retrieval from previous events. Interference from multiple competing representations may increase with the similarity between the to-be-retrieved item and other information maintained in memory, and when the learning and retrieval situations are similar (Anderson & Neely, 1996; Bunting, 2006; Underwood, 1957). Therefore, under some circumstances information from the current event may be more difficult to retrieve than information from the previous event.
In summary, activities are segmented into smaller events as they are perceived. People tend to segment events when there are changes in actors' positions or movement characteristics, changes in locations, and changes in the goals or intentions of actors. According to EST, when an event boundary occurs, mental representations of the current event are updated and actively maintained until the next boundary. This theory suggests that information should be better encoded at event boundaries and that the accessibility of recently encountered information should change once an event boundary occurs.
Goals of the Current Studies
The goals of these experiments were twofold. First, three experiments investigated the association between event segmentation and encoding and retrieval. EST proposes that active representations of "what is happening now" are built at event boundaries. If this is the case, the occurrence of event boundaries during object presentation (presentation-boundaries) should lead to additional processing of those objects. Objects present when an event boundary occurs (boundary objects) should therefore be better encoded than objects for which no boundaries occurred during presentation (nonboundary objects). For example, in one of our stimulus movies a man and his sons are gathering bed sheets. After they gather the sheets, the film cuts to a shot of the actors carrying the laundry down the stairs. An event boundary occurs soon after the cut (reflecting a change in activity and location). A chandelier is on the screen when the boundary occurs (making it a boundary object), and it is later tested. According to EST, a new event model should be constructed at the event boundary, and it should contain information about the chandelier and other objects in the scene (such as the pictures on the walls), the objects' configuration, and the actors' new inferred goals. Because the chandelier is processed as part of the formation of an event model, it should be remembered better than if it had been presented when no boundaries occurred (e.g., after the actors started going down the stairs).
A second prediction of EST is that event boundaries should influence retrieval by changing the accessibility of recently presented objects. Event boundaries that occur between object presentation and test (delay-boundaries) determine whether an object must be retrieved from the current event or from a previous event. Delay-boundaries could have two effects on accessibility: They could reduce the accessibility of objects from previous events (which need to be retrieved from long-term memory) or they could increase the accessibility of objects from previous events (which should be less susceptible to interference from similar information in active memory). Finally, because the accessibility of an object from a previous event should depend on whether it has been encoded into long-term memory, the effect of delay-boundaries should differ for boundary and nonboundary objects. In the previous example, after the actors go down the first set of stairs, the film cuts to a shot showing them carrying the laundry into a basement. An event boundary occurs at this point, reflecting the change in the actors' location. At this point the model for the previous event would be flushed and a new event model built. Because the chandelier was presented in the previous event, it would now need to be retrieved from long-term memory. If the chandelier had not been encoded into long-term memory (which is likely for nonboundary objects), it should be poorly recognized; if it had been encoded into long-term memory (which is likely for boundary objects), it should still be recognizable.
The second goal of these studies was to characterize the types of information that are stored in event models and that contribute to event memory. Early work on discourse memory demonstrated a dissociation between memory for the lexical content and syntactic structure of a sentence and memory for its meaning. Although participants' ability to recognize changes to the surface features of a sentence drops once a second sentence is presented, their ability to recognize changes to semantic content is well preserved (Sachs, 1967). The second and third experiments investigated memory for two similar types of information. Like semantic content, conceptual information in scenes consists of object, character, and scene types (Rosch, 1978). Like surface information, perceptual information in scenes allows one to discriminate between individuals in a category; it includes color, shape, orientation, size, and statistical structures. Perceptual information is distinguished from sensory primitives that form an image-like representation of the scene and that have undergone little processing (Oliva, 2005; Schyns & Oliva, 1994; Tarr & Bülthoff, 1998). In tests of conceptual memory, participants chose between a picture of an object that was the same type of object as the one being tested (e.g., a different chandelier) and a picture of an object that was a different type of object (e.g., a ceiling fan). In tests of perceptual memory, participants chose between a picture of the object from the movie (e.g., the chandelier) and a picture of the same type of object (e.g., a different chandelier). EST suggests that memory for both conceptual and perceptual information will be better for boundary objects than for nonboundary objects (event models should be constructed from semantic and perceptual input at event boundaries). Furthermore, if the purpose of event models is to generate perceptual predictions, memory for perceptual information may be more susceptible to delay-boundaries than memory for conceptual information (see also Gernsbacher, 1997).
To examine the relationship between event segmentation and memory, these experiments used clips from commercial films that presented objects within the context of common, everyday activities. The movies were engaging and the activities complex, encouraging attention to the activities in the films. This should reduce the role of strategies that ignore the activities in favor of attending to objects. It also avoids concerns over the realism of materials constructed in the laboratory, which may appear contrived and could lack the variety and complexity of events encountered in everyday life. An important drawback to this approach, however, is that it precludes the assignment of individual objects to each of the experimental conditions. Therefore, pre-existing differences in the objects (e.g., the speed with which an object can be identified in a scene) were evaluated in two pilot experiments, and the analyses used regression to statistically control for these and other object features (e.g., object size).
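The covariate-control strategy described above, estimating a condition effect while holding nuisance object features constant, can be sketched as an ordinary least-squares regression. This is a generic illustration with invented numbers; the actual predictors, data, and model used in these experiments may differ:

```python
import numpy as np

# Hypothetical data: recognition accuracy for 8 objects, a condition code
# (1 = boundary object, 0 = nonboundary object), and a nuisance covariate
# (object size). All values are invented for illustration only.
accuracy = np.array([0.90, 0.80, 0.85, 0.95, 0.60, 0.55, 0.70, 0.65])
boundary = np.array([1,    1,    1,    1,    0,    0,    0,    0   ])
size     = np.array([2.0,  1.5,  1.8,  2.2,  1.9,  1.4,  2.1,  1.6 ])

# Design matrix: intercept, condition, covariate.
X = np.column_stack([np.ones_like(size), boundary, size])
coef, *_ = np.linalg.lstsq(X, accuracy, rcond=None)

# coef[1] is the boundary-object effect with object size held constant.
print(round(float(coef[1]), 3))
```

The same logic extends to additional covariates (e.g., how quickly an object can be identified in a scene) by adding columns to the design matrix.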